A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

I spotted an error in your cross-validation notebooks; can I submit a PR for it, @muellerzr?

I have submitted a PR: https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/pull/17

Hi, in the multi-label classification notebook, when using the DataBlock API, a get_x method is used to get the images from the table.
Instead of this, could we use get_image_files() on the train folder and get the images from there? @muellerzr

Also, you say get_items() gets the x and y together, but in one of Jeremy’s notebooks he uses get_items() to get the images and get_y=parent_label to get the y values.

Thanks,

In the cross-validation notebook, can we create the data using the DataBlock API?
If yes, how do we deal with the splitter, since in this case we have split_list, which is of type L, while splitter expects a function?

Thank you,

get_items simply grabs whatever items are available to it. For instance, we had our get_items be get_image_files, which grabs our filenames. From that filename/location we can then add an additional get_x or get_y function to take things further. For instance, parent_label will grab the name of the folder that said filename was in.
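For a concrete picture, here’s a minimal sketch of that combination. The import path is current fastai (the fastai2 pre-release used `from fastai2.vision.all import *`), and `path` is an assumed folder where each image sits inside a folder named after its class:

```python
from fastai.vision.all import *

# A minimal sketch, assuming an Imagenette-style layout where each image
# lives in a folder named after its class; `path` points at that folder.
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,                     # grabs every image filename
    get_y=parent_label,                            # label = parent folder name
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path)
```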

I’m not 100% sure you’d want to do this, as the data is all within our DataFrame itself. To me it wouldn’t quite make sense not to use the data in the format it’s provided in.

Not really, because we need to have any number of subsets and any number of splits; the medium-level API is best for this type of problem.


Thanks for the reply,

So the medium- and high-level APIs are not interchangeable, and each is used for a specific purpose that the other API can’t cover. Am I right?

Not quite. The high-level API is built upon the medium-level API. It’s just that getting some tasks to fit into the high-level API can be a pain, so for those you should go with the medium level. A prime example is having more than one train/test set, such as this exact instance. You could generate a splitter that would then generate every single split for the cross-validation, but why should we when we already know our splits? Thus we move to the Datasets level, because we already have our splits set up for us each time.

I guess you could instead override a `DataBlock`'s splitter each time by swapping in a new IndexSplitter, but I find the Datasets easier to read in this case. Do you follow me?
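A rough sketch of what I mean, assuming `path` holds an Imagenette-style dataset; the StratifiedKFold usage mirrors the idea in the cross-validation notebook, but the exact names here are illustrative:

```python
from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold
import numpy as np

items = get_image_files(path)
labels = items.map(parent_label)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, valid_idx in skf.split(np.arange(len(items)), np.array(labels)):
    # Datasets takes our pre-computed indices directly via `splits`,
    # so no splitter function is needed at all
    dsets = Datasets(items,
                     tfms=[[PILImage.create], [parent_label, Categorize()]],
                     splits=[list(train_idx), list(valid_idx)])
    dls = dsets.dataloaders(bs=64,
                            after_item=[Resize(224), ToTensor()],
                            after_batch=[IntToFloatTensor(),
                                         Normalize.from_stats(*imagenet_stats)])
    # ... build and fit a Learner on `dls` for this fold

# The DataBlock equivalent would swap the splitter each fold, e.g.:
# dblock = DataBlock(..., splitter=IndexSplitter(valid_idx))
```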


Thanks, that clears it up :slight_smile:

Hi @muellerzr, I was trying to run your object detection notebook, and it ran fine all the way to the end. Then I tried running learn.get_preds(), which throws this error:

TypeError: object of type 'int' has no len()

Can you help me with this?

Yes, get_preds and predict will fail, IIRC. See earlier in the thread; there’s a link to an object detection thread. Along with that, though, if you want to go much deeper you can always just use raw PyTorch and convert everything back with as much fastai as possible. See my speed-up thread for that info: Speeding Up fastai2 Inference - And A Few Things Learned (though I’m not sure the decodes will work OOTB this way)
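As a rough sketch of the raw-PyTorch route (assuming `learn` is the trained Learner and `test_dl` a DataLoader over your test items; decoding the raw outputs back into boxes is model-specific and left out):

```python
import torch

model = learn.model.eval()            # drop into plain PyTorch
all_outputs = []
with torch.no_grad():
    for batch in test_dl:
        xb = batch[0]                 # first element is the input batch
        all_outputs.append(model(xb))
# `all_outputs` holds the raw network activations; turning them back into
# boxes/labels (the "decodes" step) is what may not work out of the box.
```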


In the cross-validation notebook, our final accuracy is greater than the initial one, but how do we call .predict using that final accuracy, since it is the summed-up accuracy of 10 learner models?

Thanks,

You’d need to get the raw predictions from all 10, then sum and average them together.
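A minimal sketch, assuming `learners` is a list holding the 10 trained Learners and `test_dl` is a DataLoader built over the test items:

```python
import torch

all_preds = []
for learn in learners:
    preds, _ = learn.get_preds(dl=test_dl)   # raw per-class probabilities
    all_preds.append(preds)

# Average the probabilities across the ensemble, then take the argmax
avg_preds = torch.stack(all_preds).mean(dim=0)
final_classes = avg_preds.argmax(dim=1)
```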


So to make a single new prediction, I have to train the 10 models again and get the raw predictions?

No, you’d use the 10 trained models that you have saved somewhere.

Ohhh, so the k-fold for loop should be written so that each learner is saved (like learn1, learn2, ...), saving each model with learn1.save('model1').
Then we get the raw predictions from these saved models:
learn1.load('model1')
learn1.get_preds(dl=test_dl)

Am I right?

Yes, that is correct.
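To make that concrete, here is a sketch of the loop. cnn_learner is the fastai2-era name (renamed vision_learner in later fastai), and `splits`, `get_fold_dls`, and `test_dl` are hypothetical names for illustration:

```python
# Train and save one model per fold
for i, (train_idx, valid_idx) in enumerate(splits):
    dls = get_fold_dls(train_idx, valid_idx)        # hypothetical helper per fold
    learn = cnn_learner(dls, resnet34, metrics=accuracy)
    learn.fit_one_cycle(5)
    learn.save(f'model{i}')                         # writes to dls.path/models

# Later: reload each saved model, collect raw predictions,
# then average them as in the earlier sketch
fold_preds = []
for i in range(len(splits)):
    learn.load(f'model{i}')
    preds, _ = learn.get_preds(dl=test_dl)
    fold_preds.append(preds)
```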


Thank you! :smiley:

Went through and updated all the notebooks today for vision and tabular and fixed any bugs associated with them. Most notably, any notebook that had multiple functions in get_y now has them wrapped inside a Pipeline. See ImageWoof for an example of this
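For reference, that Pipeline wrapping looks roughly like this: first grab the parent folder name (an ImageWoof WordNet id), then map it to a readable breed name. The `lbl_dict` here is abbreviated for illustration:

```python
from fastai.vision.all import *

# Map ImageWoof's WordNet folder ids to readable names (abbreviated)
lbl_dict = {
    'n02086240': 'Shih-Tzu',
    'n02087394': 'Rhodesian ridgeback',
    # ... remaining ImageWoof classes
}

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    # Two labelling steps composed into one get_y: folder name, then lookup
    get_y=Pipeline([parent_label, lbl_dict.__getitem__]),
    splitter=GrandparentSplitter(train_name='train', valid_name='val'),
    item_tfms=Resize(224),
)
```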


In the style transfer notebook, while running the _get_layers() function I get the following error:
“_vgg_config() is not defined”
although I have imported vgg19.

Unsure why; in my notebooks it shows the proper variable name (which has an _ at the beginning, as it’s a private function)


Oops, sorry, the screenshot is of the code from when I tried without the _.
If I use the _ I also get the same error: “_vgg_config() is not defined”
Also, I didn’t install nbdev, nor did I use any of the # lines. Is it because of that?