Fastai v2 chat

Quick comment to mention that I had problems saving the dataloaders in my language models. But after updating to the latest Fastai2 it works fine. So good work, someone, somewhere.

3 Likes

Hi all – I just wanted to check, because I wasn’t clear from browsing the fastai v2 forum topics:

Does v2 now work with Colab, since it supports Python 3.6?

Yes, it works on Google Colab as well.
See Zachary Mueller’s repo of fastai2 notebooks; everything there works on Google Colab too.
I myself am trying out the new fastai2 on Google Colab only.

Awesome! Thanks for this info :slightly_smiling_face:

fastai2 0.0.8 and 0.1.11 released. Nothing major - just some fixes - but notable that it’s the version used for the fastai paper (preprint coming out tonight) and the reviewer’s draft of the book.

10 Likes

I had a few questions about the new paper: @jeremy
On page 7: “Numericalization and vocabulary creation often requires many lines of code, and careful management here fails and caching.” The last part is unclear; is “here fails and caching” meant to say that caching fails here?
On page 19: “This strings that represent categories cannot be used in models directly and are turned into integers using some vocabulary.” This -> The

I have the following directory structure and am using the DataBlock API of fastai v2. To create my dataloaders I use the following code.


data = DataBlock(
    blocks = (ImageBlock, CategoryBlock),              # image inputs, single-label targets
    get_x = ColReader(cols=0, pref=f"{path}/Train/"),  # file names in column 0, prefixed with the train folder
    splitter = RandomSplitter(),                       # random train/valid split
    get_y = ColReader(cols=1),                         # labels in column 1
    item_tfms = RandomResizedCrop(240, min_scale=0.75, ratio=(1.,1.)),
    batch_tfms = [*aug_transforms(), Normalize.from_stats(*imagenet_stats)]
)

dls = data.dataloaders(df_train)

How can I prepare my test_dl for creating predictions, so that I can use

learn.get_preds(dl=test_dl)

There is a test_dl method specifically designed for this: dl = dls.test_dl(new_items).

1 Like

I store the paths of all test images in test_items, then run the following code:


test_dl = dls.test_dl(test_items)
preds = learn.get_preds(test_dl)

But I get the error shown in the screenshot.

Can you do test_dl.show_batch() (test_dl being your new DataLoader)?

1 Like

test_dl.show_batch() works correctly. The notebook is attached.

https://colab.research.google.com/gist/ram-ai/4deeeb7f1c71815563ffd7ce0537420d/age-detection.ipynb

1 Like

Thanks @muellerzr and @sgugger for the help. The correct statement is

preds = learn.get_preds(dl=test_dl) 
1 Like

Hi all!

I am working on NLP. I have 100% functional code (for a test dataset) that I am trying to apply to a very large dataset.

I am still at the language model part. In particular, progress seems to have stalled at this line:

tfms = [Tokenizer.from_folder(unsupervised_folder_path), Numericalize()]

I have 3.7 million documents and I can see fastai has tokenized 3.4 million of them. However, progress has now stalled: the execution has not been interrupted, there are no error messages or anything, and the number of tokenized docs no longer increases.

Small update: in case docs are processed in order, I have checked that the last tokenized doc corresponds to the last doc in my source. So it may be that tokenization has actually completed fine? (Then what’s holding progress?) CPU use and read/write activity seem stalled as well.

Do we know of any reason why this could happen? Any hard limit on the number of docs, or perhaps if there is something wrong with one doc (like having no text?)

Update: we have been thinking that perhaps one of the threads died for whatever reason which results in some missing files and then Fastai may be waiting for this thread to complete, and thus the stall. Does this sound right?

I think that would be the most likely explanation. Note that since Tokenizer.from_folder saves all the results in a temp folder, you should only have this long wait once. You can check the dictionary lengths.pkl that should be there: its keys give a list of all processed files, so you can see which ones are missing.
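A minimal sketch of such a check (the paths and file extension are hypothetical, and the exact key format stored in lengths.pkl may differ, so adjust the comparison to whatever you see in your cache folder):

```python
import pickle
from pathlib import Path

def find_unprocessed(source_dir, lengths_pkl):
    # lengths.pkl is a dict whose keys name the files already tokenized
    with open(lengths_pkl, "rb") as f:
        lengths = pickle.load(f)
    processed = {Path(k).name for k in lengths}
    # any source file whose name is not among the keys never finished
    return sorted(p for p in Path(source_dir).glob("*.txt")
                  if p.name not in processed)
```

If this returns a short list, re-tokenizing just those documents (or inspecting them for oddities like empty text) should be much cheaper than redoing the whole corpus.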

1 Like

BTW, if you set num_workers to zero it’ll disable multiprocessing, so you’ll be able to see the error more clearly. Or else you can put all your processing inside a try/except that logs errors to a file.

2 Likes

For the moment I am just executing the code again (in another machine, which is cheaper since I don’t need GPU for this step).

@sgugger lengths.pkl was never created!

I hope it works fine, but I’ll report otherwise.

I fixed the problem with the new version of PyTorch, and made a release. So you should find it works fine with the latest PyTorch now.

4 Likes

I’m having trouble installing fastai v2. Everything seems to work fine, but many of the notebooks fail when testing.
ModuleNotFoundError: No module named ‘fastai2._pytorch_doc’
This is the only error I get, but around 10-15 notebooks fail.
I followed the instructions in the repo and on this page: http://dev.fast.ai

Not sure if this is the right place but I’ve been trying to solve this problem for a while without success.

That’s weird; all the tests run fine for me. Double-check that you do have the file fastai2/_pytorch_doc.py in your fastai2 repo, but otherwise it should work.

1 Like

I nuked my anaconda installation and re-installed everything; now my only problem is with the 09_vision.augment.ipynb notebook.
This is the error
AssertionError                            Traceback (most recent call last)
<ipython-input> in <module>
      6 y = TensorImage(stack([x1,x2,x3,x4])[:,None])
      7 y = y.warp(p=1., draw_x=[0.,0,-0.5,0.5], draw_y=[-0.5,0.5,0.,0.])
----> 8 test_eq(y[0,0], tensor([[0.,1.,0.,1.,0.], [0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.]]))
      9 test_eq(y[1,0], tensor([[0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [0.,1.,0.,1.,0.]]))
     10 test_eq(y[2,0], tensor([[0.,0.,0.,0.,0.], [1.,0.,0.,0.,0.], [0.,0.,0.,0.,0.], [1.,0.,0.,0.,0.], [0.,0.,0.,0.,0.]]))

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastcore/test.py in test_eq(a, b)
     30 def test_eq(a,b):
     31     "test that a==b"
---> 32     test(a,b,equals, '==')
     33
     34 # Cell

~/anaconda3/envs/fastai2/lib/python3.7/site-packages/fastcore/test.py in test(a, b, cmp, cname)
     20     "assert that cmp(a,b); display inputs and cname or cmp.__name__ if it fails"
     21     if cname is None: cname=cmp.__name__
---> 22     assert cmp(a,b),f"{cname}:\n{a}\n{b}"
     23
     24 # Cell

AssertionError: ==:
tensor([[0.1667, 0.1667, 0.0000, 0.1667, 0.1667],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
tensor([[0., 1., 0., 1., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

Any idea of what my problem might be?