Non-Beginner Discussion

Does anyone see any issue with installing packages to Paperspace persistent storage and then adding this location to system path?

To save time I have been doing the following:

  • Create a directory in /notebooks (the home dir) called libs
  • Install packages to this dir using the --target argument: !pip install --target=./libs wandb
  • Add this location to sys.path:
import sys
pkg_path = "./libs"
sys.path.insert(0, pkg_path)

The above cell is then the only cell you need to run each time you restart a Paperspace machine.

Does anyone see any fault in this approach? It saves me precious minutes (package install is a little slow in Paperspace) when I have a quick idea to try out, but unsure if setting sys.path has any other impacts (none that I can see so far).

1 Like

@stantonius The Paperspace images come with a certain amount pre-installed, but anything beyond that unfortunately has to be reinstalled / reloaded each time.

Another approach would be to create a custom ‘notebook’ based on a custom docker image. You could potentially use the fastai docker container as a base (more on those here), but you might want to add some custom logic on top of that. I described how I did that with the IceVision library and connected it to a custom notebook on Paperspace in this blog post.

That said, I’m not sure I’d bother with it for just simple fastai experimentation. Even when I needed to do it for my side project, it still felt like a distraction from the main show in town: i.e. training my models. YMMV :slight_smile:

1 Like

@stantonius @strickvl please remember to keep non-beginner topics, like modifying the python sys path and creating docker images, away from any topic labelled “beginner”. I’ve moved your discussion to the non-beginner topic now.


I’m about to use timm, but just making sure: can I use timm models in unet_learner as well as in vision_learner, or is it just for vision_learner?

Another question from the tutorial. Say, we have this:

import timm 
import torch

model = timm.create_model('resnet34')
x     = torch.randn(1, 3, 224, 224)
model(x).shape

torch.Size([1, 1000])

What does each axis of the output’s shape mean? I figured it to be the number of layers, but I’m not sure.

Right now unet_learner does not support timm.

That shape is the logits for each of the classes, and ImageNet has 1000 classes…
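In case it helps, turning those logits into probabilities is just a softmax over the class axis. A minimal pure-Python sketch with made-up logits for four classes (a real ImageNet head outputs 1000):

```python
import math

# made-up logits for four classes; a real ImageNet model outputs 1000
logits = [2.0, 1.0, 0.5, -1.0]

# softmax: exponentiate, then normalise so the values sum to 1
exps = [math.exp(z) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

# the predicted class is the index of the largest probability
pred = probs.index(max(probs))
```

With a tensor you would do the same thing via `logits.softmax(dim=-1).argmax(dim=-1)`.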

1 Like

Hey everyone,

I have been iterating through Jeremy’s notebook Iterate like a grandmaster! | Kaggle and trying to take on the suggestion of using the pre-trained model and incorporating it with fastai. However, I am hitting a wall: when I try to cut/slice/manipulate a PyTorch model

# example of what I thought was the simplest model modification:
# deconstructing and reconstructing the model
model = nn.Sequential(*model_d.children())

and pass it a batch, I get this error:

TypeError: forward() got an unexpected keyword argument 'input_ids'

I know the batch is formatted correctly, because when I pass one batch to the HF-delivered model, I get the expected output.
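As far as I can tell, the mismatch is that the Hugging Face batch is a dict that gets unpacked as keyword arguments, while nn.Sequential.forward only accepts a positional input. A toy, framework-free sketch of that mismatch (the function names here are made up):

```python
def sequential_forward(x):
    """Stand-in for nn.Sequential.forward: one positional input, no keywords."""
    return x

batch = {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]}

# HF-style training loops unpack the batch as keyword arguments...
try:
    sequential_forward(**batch)
    raised = False
except TypeError:  # ...which reproduces "unexpected keyword argument 'input_ids'"
    raised = True

# a thin adapter that re-packs the keywords into one positional argument
def adapter(**kwargs):
    return sequential_forward(kwargs["input_ids"])

out = adapter(**batch)
```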

Any ideas what I am doing wrong here? I have searched high and low and cannot find the answer, but that usually means I have overlooked something simple.

BTW this is clearly a basic example, but this is after I had tried experimenting with custom architectures and always eventually hit this same error. I ultimately wanted to write a blog for this group that outlines how to use models from other libs in fastai, but I have fallen at the first hurdle.

For reference, my notebook is here that describes the issues I am facing in more detail (same Kaggle competition data as JH’s notebook).

Extra points: I found when I was creating the Datasets object myself for this custom task, I had to specifically move the input tensors to the GPU via the .cuda() method in my custom Transform. However when I follow the HF or any fastai tutorial, it seems this is done automatically. I tried looking in both repos but can’t seem to see where this happens. Any ideas?

Many thanks for any guidance or feedback

I just found out about this intriguing project and it seems to resonate with what Jeremy was discussing in today’s lesson about Transformers being good for GPUs but ULMFiT/RNNs being better for larger contexts. It appears to be based on something called “Attention-Free Transformers”. Is anybody familiar with this type of work? Is it worth pursuing?


Am I correct to understand that Hugging Face does not provide a facility to configure SSH keys? Lots of googling hasn’t turned up anything. Is there some alternative way to configure git so that git push doesn’t ask for my username & password each time?

[Edit:] Heh! I discovered something new… Git - gitcredentials Documentation
So all I needed to do at my local machine WSL console was…

Read the help files to consider security implications

$ git help credential-cache
$ git help credential-store

Implement the one I chose…

$ git config --global credential.helper store

and then at the next push, entered my huggingface username/password for the last time.


Maybe you can talk to BlinkDL in the EleutherAI discord, he is frequently sharing his progress there.

1 Like

Hey guys, any idea how to deploy my model as an online app? I just need a website where one can upload a photo and have it predicted and decoded by the model.

I only vaguely remember the part about Binder and the other options.

How is what you’re looking to do different from what we covered in lesson 2?


Oops! I might have forgotten that; I probably looked at an old tutorial again. Thanks for pointing me to Hugging Face, Gradio… I will check them out.

You don’t say that you’ve considered Hugging Face and decided against it, so use that!

Otherwise, I see lots of options googling for: gradio web hosting.
Can you report on three options you find there, so we can get a better feel for what sort of service you are looking for?

To be honest, I just didn’t know about Hugging Face. I was exploring different options but only found Binder in an old version of the fastai tutorials/classes and got lost. Rewatching class #2 of this year (2022) showed me new options such as Hugging Face, which is what I’m going to try first… Will keep you posted :wink:


a tip btw, in helping with the transcriptions I’m picking up details that I missed while just viewing them…


…and your help is very much appreciated!

1 Like

To continue going down the path of understanding malware: I have been fascinated by this old competition on malware from 2015. They aren’t using NLP, and it would be fantastic to see what I could do with it.

Some observations.

  1. There is a lot of renewed interest in converting malware binaries to images and trying to classify those. I have an old Medium post where I tried turning data into a visual problem (it just didn’t perform well). The spoiler: when converting data for a CNN, there are places where data gets lost or padded. It might have worked back in the day, but there is probably a better model if you don’t lose as much data.

  2. That being said, malware is different because some vanity things pop up. For example, this one that I found (more can be found here: Microsoft Malware Classification Challenge (BIG 2015) | Kaggle) suggests that this is relatively common. My favorite being:

  3. The data for the competition is huge at 500 GB. So I took the training set (~200 GB) and reduced it to ~20 GB (MicrosoftMalware2015TrainSubset | Kaggle) so that I could try using some NLP techniques on it. Hugging Face has a small CodeBERT tokenizer that I hope to run on it soon.

I am also planning to make a dataset of the malware code converted to images to put up in Kaggle so others can try with CNN. It will be an interesting “bakeoff” at the end of the day.
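To make the padding point above concrete, here is a minimal pure-Python sketch (the function name and the sample bytes are made up) of the usual binary-to-image conversion: reshape the byte stream into a fixed-width grid and zero-pad the last row:

```python
def bytes_to_image(data, width):
    """Reshape a byte sequence into a width-wide 2D grayscale grid,
    zero-padding the final row -- the step where information gets distorted."""
    rows = []
    for i in range(0, len(data), width):
        row = list(data[i:i + width])
        row += [0] * (width - len(row))  # pad the last row with zeros
        rows.append(row)
    return rows

# toy 9-byte "binary"; real samples would be megabytes
img = bytes_to_image(b"MZ\x90\x00\x03\x00\x00\x00\x04", 4)
```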


Hi,

I am trying to make an ensemble of models using DataBlock API.

I created a stratified k-fold split based on the Label values.

So now, instead of using a RandomSplitter, I want to pass an IndexSplitter along with each fold and get the outputs for each of the training folds. Any idea why I am getting this error?
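For context, the pattern I am attempting looks roughly like this; a simplified pure-Python sketch (the helper and the toy labels are stand-ins; in practice I build the folds with sklearn’s StratifiedKFold):

```python
from collections import defaultdict

def stratified_fold_indices(labels, n_splits):
    """Deal each label's items round-robin across folds, keeping
    label proportions roughly equal in every fold."""
    by_label = defaultdict(list)
    for idx, lbl in enumerate(labels):
        by_label[lbl].append(idx)
    folds = [[] for _ in range(n_splits)]
    for idxs in by_label.values():
        for i, idx in enumerate(idxs):
            folds[i % n_splits].append(idx)
    return folds

labels = [0, 0, 1, 1, 0, 1, 0, 1]   # toy Label column
folds = stratified_fold_indices(labels, 2)
# each folds[i] is then used as the validation set:
# DataBlock(..., splitter=IndexSplitter(folds[i]))
```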

Hi there,

I have a question about segmentation models in fastai. Is it possible to provide data in multi-class multi-label format? Currently, I use MaskBlock(codes=[0, 1, 2, 3]) to read masks. They look like:

0 0 1 2 1 
0 1 1 0 2
0 0 3 0 1

So one can use out-of-the-box fastai classes without any issues. However, what if I have overlapping masks? Let’s say, building on the toy example above, we have three overlapping classes:

0 0 1 1 1
0 0 1 1 0
0 0 1 0 0

0 2 2 0 0
0 2 2 0 0
0 0 0 2 0

0 0 0 0 0
0 0 3 3 0
0 3 0 3 0

One possible solution is to combine codes using bit masks and represent overlapping classes as a new class. For example:

c1 = 1 -> replace -> 0b001
c2 = 2 -> replace -> 0b010
c3 = 3 -> replace -> 0b100

c1 = 1
c2 = 2
c3 = 4
c1 and c2 = (0b001 | 0b010) = 0b011 = 3
c1 and c3 = (0b001 | 0b100) = 0b101 = 5

In this case, we can encode all these combinations as a new mask with values from 0 to 7. For example, if we have three overlapping masks, then we can represent them as a single array as follows:

0 0 1  2 0 0  4 0 0   6 0 1   
0 1 1  2 2 0  0 4 4 = 2 7 5  
1 1 0  2 2 0  0 4 4   3 7 4

But in this way, we increase the number of predicted classes, which may (or may not?) lead to decreased performance, depending on how many overlaps there are in the data.
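The combination above can be sketched in plain Python, using binary versions of the three toy masks from the combined example (real masks would be tensors):

```python
# binary presence masks for the three classes in the combined example above
m1 = [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
m2 = [[1, 0, 0], [1, 1, 0], [1, 1, 0]]
m3 = [[1, 0, 0], [0, 1, 1], [0, 1, 1]]

# remap each class to a power of two, then OR the masks together
combined = [
    [(a * 0b001) | (b * 0b010) | (c * 0b100) for a, b, c in zip(r1, r2, r3)]
    for r1, r2, r3 in zip(m1, m2, m3)
]

# decoding back to per-class channels is just bit tests
channels = [[[(v >> k) & 1 for v in row] for row in combined] for k in range(3)]
```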

So I wonder if there is some (simple) way to provide masks as N-channel images instead of just 2D mask tensors? For now, I am going to replace MaskBlock with some custom transformation that returns something different from PILMask. But maybe there is an easier way to achieve the same result.


At the moment, I plan to use a second ImageBlock:

db = DataBlock(blocks=(ImageBlock(...), ImageBlock(...)), ...)

I haven’t tried it yet, but at first glance, it looks like a workable approach.

1 Like

Hi all, I am trying to open TensorBoard in Paperspace’s Jupyter notebook. However, I get a “refused to connect” error. I have followed Paperspace’s guide as shown here: TensorBoard | Paperspace. I would greatly appreciate your help.