Getting help with fastai v1

Request for Help

Before asking for help, please search the forums. Click the magnifying glass in the top right. Try a few different searches if you don’t find anything right away. If you need help with your installation, please see the troubleshooting docs, and be sure to show the output of:

python -c 'import fastai; fastai.show_install(1)'

When you ask for help, you have a lot of context about your problem that the person you’re asking doesn’t know (and the answer depends on that context). By following these guidelines, others will be able to more quickly and effectively help you. Be sure to:

  • Be as specific as possible. Include relevant code, the full stack trace, error outputs, etc. “fastai isn’t working” is a lot worse than “I can’t figure out how to use a different sampler in my dataloader”, which is a lot worse than “Here’s a code snippet where I’m trying to use a WeightedRandomSampler, and it’s giving the following error: ...”
  • Search your question here on the forums, as well as on Google (bonus points if you include links you found in your search that were helpful but didn’t fully answer your question!)
  • Include a (the shortest possible) code snippet to demonstrate what you’re trying to do (for more info, please see https://stackoverflow.com/help/mcve). In the case of deep learning problems, consider what’s really needed for someone to reproduce your problem. Does the person helping you need to download your entire dataset, or does a single piece of data work? Or even better, does a torch.ones tensor of the same shape as your data produce the same error? (See the sketch after this list.)
  • Please include your system version and setup: are you developing locally on Windows? Remotely on AWS?
    • Be sure to mention if you’re not using our standard official setup steps, or if you’ve made some changes
    • Note that what computer you’re actually typing at is generally not what we need to know, unless your problem is with awscli or something similar on your PC; since the actual analysis generally runs on a remote machine (e.g. AWS or Paperspace), it’s the details of that machine that we need
  • Tell us:
    • What steps you’ve tried to fix the problem yourself, what you expected in each case, and what actually happened
    • Your hypotheses about what might be going wrong, and your ideas about what approaches might be able to fix it
    • What you’ve typed, particularly if you’ve done anything differently from the setup and notebooks we’ve given you
    • Exactly what error message you received
  • If relevant, show screenshots of problems that occurred
  • Mention whether the results of any earlier steps or configuration processes looked different in any way from what we did in class
  • Describe what you’ve already tried to fix your problem (for instance, if you tried restarting the kernel on your notebook, or you read a related wiki page that didn’t cover your use case)
  • If relevant, mention whether something similar did work for you (for example, you were able to get a command to work on t2, but not on p2)
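
For example, a minimal, dataset-free repro along these lines is far easier to help with than a full training script (a sketch only; the shapes and model here are illustrative, not from any particular report):

import torch
from torchvision.models import resnet34

# A tensor of ones with the same shape as a real batch often reproduces
# shape/architecture errors without sharing any data.
x = torch.ones(8, 3, 224, 224)   # (batch, channels, height, width)
model = resnet34()
out = model(x)                   # the error (if any) happens here
print(out.shape)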

Bug Report

Bug reports are welcome here. We do ask that you follow the previous instructions as appropriate – in particular, giving us the code needed to reproduce your bug is very helpful.


And remember, we’re all here to help you, but we’re not magic (and also unpaid). We’ll try our best to get you to a solution, and showing that you’re also putting in effort is appreciated.

Thanks to @danielhunter for contributing to this guide.

Hello, I have a bug report and I’m not sure where to post it: the GitHub guidelines said I should post it on your forum, but the forum doesn’t seem to be accepting new posts anymore.

Bug report
FastAI version: fastai-1.0.32

Describe the bug
I have a multi-category dataset with 14 labels.
When creating a new ImageDataBunch using the from_lists function, data_bunch.train_ds has multi-category labels of length 14, but progress_bar(test_data.train_dl) returns only labels of length 2.

This leads the create_cnn function to generate a model which has only 2 outputs instead of 14.

To Reproduce

from fastai.vision import *  # fastai v1

labels_labels = [[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
labels_files = ['./resized/00029943_006.png', './resized/00025993_000.png', './resized/00019927_009.png', './resized/00014158_000.png', './resized/00009021_001.png', './resized/00017835_008.png', './resized/00010698_003.png', './resized/00019275_004.png']

def get_data(bs):
    # horizontal flips on the training set only, no validation transforms
    ds_tfms = ([flip_lr(p=0.5)],
               [])

    data = ImageDataBunch.from_lists(DATA_PATH,
                                     labels_files,
                                     labels_labels,
                                     ds_tfms=ds_tfms, bs=bs)  # use the requested batch size

    return data

learn = create_cnn(get_data(8), models.resnet34, metrics=accuracy)

learn.model[-1]

Output:

Sequential(
  (0): AdaptiveConcatPool2d(
    (ap): AdaptiveAvgPool2d(output_size=1)
    (mp): AdaptiveMaxPool2d(output_size=1)
  )
  (1): Lambda()
  (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (3): Dropout(p=0.25)
  (4): Linear(in_features=1024, out_features=512, bias=True)
  (5): ReLU(inplace)
  (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (7): Dropout(p=0.5)
  (8): Linear(in_features=512, out_features=2, bias=True) #This should have out_features=14
)

Expected behavior
Create a learner with the last layer being a linear layer with 14 outputs:
(8): Linear(in_features=512, out_features=14, bias=True)

It accepts new posts - you just have to read a few first. But putting it here is just fine. Thanks for the bug report! 🙂

We don’t actually currently support one-hot encoded multi-labels. Instead, please use a list of labels, e.g. ["hazy","primary"]. This will be one-hot encoded for you.

We do plan to add one-hot encoded label support in a future version btw.
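
For the report above, that means turning each one-hot row into the list of label names it encodes before calling from_lists. A minimal sketch (the class names here are made up for illustration; substitute your real 14 label names):

# Hypothetical class names -- replace with the real 14 label names.
classes = [f'label_{i}' for i in range(14)]

# Turn each one-hot row into a list of the active label names,
# e.g. [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0] -> ['label_4', 'label_7', 'label_10']
labels_as_names = [[classes[i] for i, flag in enumerate(row) if flag]
                   for row in labels_labels]

data = ImageDataBunch.from_lists(DATA_PATH,
                                 labels_files,
                                 labels_as_names,
                                 bs=8)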

Hi, I’m really impressed with fastai, but I’m still having trouble creating a DataBunch from my PyTorch dataset. I tried googling through the forum, but I only found an outdated tutorial. Could you please point me to the latest custom dataset tutorial?

Thanks.

Hi there, did you get a solution?
I have 5004 classes, which are given simply as integers. I expected that they would get one-hot encoded. Please let me know what I need to do, and where, to ensure my labels are one-hot encoded during training.

Correction

python -c 'import fastai; fastai.show_install(1)' should now be

python -c 'import fastai; fastai.utils.show_install()'

I think this is just a simple git question:

I got nbdev working by cloning the nbdev_template repository into a new repository on my own GitHub user account. I then linked a new repository on my local machine to my GitHub repository.

In other words, there is a chain like this:
local machine > my GitHub repository > nbdev_template repository

git remote -v #on my local machine
origin https://github.com/me/my_repository.git (fetch)
origin https://github.com/me/my_repository.git (push)

However, when I tried to push changes from my local machine to my GitHub repository, like this:
git push -u origin master
it triggered a message saying that I had failed to push to the nbdev_template repository, which was not what I was trying to do (oops, sorry!).

Can someone please explain how to push from my local repository to my GitHub repository without also pushing to the original nbdev_template repository?
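
For anyone debugging the same thing, a minimal sketch of the generic git commands for checking where a push is actually going (standard git, nothing nbdev-specific is assumed):

git remote -v           # list every configured remote and its URL
git branch -vv          # show which remote branch each local branch tracks
git push origin master  # push explicitly to the named remote only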

Thanks!
John

Sorry, another one. Does anyone know why the count of items for a training dataset wouldn’t match the length of the predictions? @sgugger said that if you wanted the filenames associated with predictions, they were in learner.data.valid_ds.items. But I tried generating predictions for the training set of a big databunch, and the lengths don’t match (I tried the same thing with a different training dataset and the lengths didn’t match either).

preds,ys = learner.get_preds(ds_type=DatasetType.Train)
len(learner.data.train_ds.x),len(learner.data.train_ds.y)
>(216078, 216078)
preds.shape,ys.shape
>(torch.Size([216064, 50]), torch.Size([216064, 50]))

However, the lengths do match with the validation sets I tried.

preds,ys = learner.get_preds(ds_type=DatasetType.Valid)
len(learner.data.valid_ds.x),len(learner.data.valid_ds.y),len(learner.data.valid_ds.items)
>(54019, 54019, 54019)
preds.shape,ys.shape
>(torch.Size([54019, 50]), torch.Size([54019, 50]))

The training dataloader has shuffle=True (so your predictions are in a different order) and drop_last=True (so it drops the last batch if it isn’t a full batch_size). You should use fix_dl, which is the same as train_dl but with both of those options set to False and data augmentation removed.
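
In other words, something along these lines should line up (a minimal sketch based on the explanation above; DatasetType.Fix is the dataset type backed by fix_dl in fastai v1):

# Fix = the training data without shuffling, without dropping the last
# partial batch, and without augmentation, so predictions line up
# with learner.data.train_ds.items
preds, ys = learner.get_preds(ds_type=DatasetType.Fix)
len(learner.data.train_ds.x), preds.shape[0]  # these should now match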

Thanks very much for explaining that, Sylvain. The documentation has a counter-example under the get_preds() section here, which should probably be updated. (I would like to submit a pull request, but it might take me a while to figure out.)

Yes documenting that properly would be nice. If you want to tackle this, go ahead! :slight_smile:

The instructions to show the fastai version should be, as of 1.0.60 at least, the following:

python -c 'import fastai.utils; fastai.utils.show_install(1)'