A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

I spotted an error in your cross-validation notebooks; can I submit a PR for it, @muellerzr?

I have submitted a PR: https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/pull/17

Hi, in the multi-label classification notebook, when using the DataBlock API, a get_x method is used to get the images from the table.
Instead of this, could we use get_image_files() on the train folder and get the images from there? @muellerzr

Also, you say get_items() gets the x and y together, but in one of Jeremy’s notebooks he uses get_items() to get the images and get_y=parent_label to get the y values.

Thanks,

In the cross-validation notebook, can we create the data using the DataBlock API?
If yes, how do we deal with the splitter, since in this case we have split_list, which is of type L, while splitter expects a function?

Thank you,

get_items simply grabs whatever items are available to it. For instance, we had our get_items be get_image_files, which grabs our filenames. From that filename/location we can then add an additional get_x or get_y function to take things further. For instance, parent_label will grab the name of the folder that said filename was in.
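For a concrete picture, here’s a minimal sketch of that combination. The import path is current fastai (the fastai2 pre-release used `from fastai2.vision.all import *`), and `path` is an assumed folder where each image sits inside a folder named after its class:

```python
from fastai.vision.all import *

# A minimal sketch, assuming an Imagenette-style layout where each image
# lives in a folder named after its class; `path` points at that folder.
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,                     # grabs every image filename
    get_y=parent_label,                            # label = parent folder name
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path)
```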

I’m not 100% sure you’d want to do this, as the data is all within our DataFrame itself. To me it wouldn’t quite make sense not to use the data in the format it’s provided in.

Not really, because we need to have any number of subsets and any number of splits; the medium-level API is best for this type of problem.


Thanks for the reply,

So the medium- and high-level APIs are not interchangeable, and each is used for a specific purpose that the other API can’t cover. Am I right?

Not quite. The high-level API is built upon the medium-level API. It’s just that getting some tasks to fit into the high-level API can be a pain, so for those you should go with the medium level. A prime example is having more than one train/test set, such as this exact instance. You could generate a splitter that would then generate every single split for the cross-validation, but why should we when we already know our splits? Thus we move to the Datasets level, because we already have our splits set up for us each time.

I guess you could instead override a `DataBlock`'s splitter each time by swapping in a new IndexSplitter, but I find the Datasets easier to read in this case. Do you follow me?
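A rough sketch of what I mean, assuming `path` holds an Imagenette-style dataset; the StratifiedKFold usage mirrors the idea in the cross-validation notebook, but the exact names here are illustrative:

```python
from fastai.vision.all import *
from sklearn.model_selection import StratifiedKFold
import numpy as np

items = get_image_files(path)
labels = items.map(parent_label)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, valid_idx in skf.split(np.arange(len(items)), np.array(labels)):
    # Datasets takes our pre-computed indices directly via `splits`,
    # so no splitter function is needed at all
    dsets = Datasets(items,
                     tfms=[[PILImage.create], [parent_label, Categorize()]],
                     splits=[list(train_idx), list(valid_idx)])
    dls = dsets.dataloaders(bs=64,
                            after_item=[Resize(224), ToTensor()],
                            after_batch=[IntToFloatTensor(),
                                         Normalize.from_stats(*imagenet_stats)])
    # ... build and fit a Learner on `dls` for this fold

# The DataBlock equivalent would swap the splitter each fold, e.g.:
# dblock = DataBlock(..., splitter=IndexSplitter(valid_idx))
```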


Thanks, that clears it up :slight_smile:

Hi @muellerzr, I was trying to run your object detection notebook, and it ran fine all the way to the end. Then I tried running learn.get_preds(), which throws this error:

TypeError: object of type 'int' has no len()

Can you help me with this?

Yes, get_preds and predict will fail, IIRC. See earlier in the thread; there’s a link to an object detection thread. Along with that, though, if you want to go much deeper you can always just use raw PyTorch and convert everything back with as much fastai as possible. See my speed-up thread for that info: Speeding Up fastai2 Inference - And A Few Things Learned (though I’m not sure the decodes will work OOTB this way)
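As a rough sketch of the raw-PyTorch route (assuming `learn` is the trained Learner and `test_dl` a DataLoader over your test items; decoding the raw outputs back into boxes is model-specific and left out):

```python
import torch

model = learn.model.eval()            # drop into plain PyTorch
all_outputs = []
with torch.no_grad():
    for batch in test_dl:
        xb = batch[0]                 # first element is the input batch
        all_outputs.append(model(xb))
# `all_outputs` holds the raw network activations; turning them back into
# boxes/labels (the "decodes" step) is what may not work out of the box.
```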


In the cross-validation notebook, our final accuracy is greater than the initial one, but how do we call .predict using that final accuracy, since it is the summed-up accuracy of 10 learner models?

Thanks,

You’d need to get the raw predictions from all 10, then sum and average them together.
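A minimal sketch, assuming `learners` is a list holding the 10 trained Learners and `test_dl` is a DataLoader built over the test items:

```python
import torch

all_preds = []
for learn in learners:
    preds, _ = learn.get_preds(dl=test_dl)   # raw per-class probabilities
    all_preds.append(preds)

# Average the probabilities across the ensemble, then take the argmax
avg_preds = torch.stack(all_preds).mean(dim=0)
final_classes = avg_preds.argmax(dim=1)
```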


So to make a single new prediction, I have to train the 10 models again and get the raw predictions?

No, you’d use the 10 trained models that you have saved somewhere.

Ohhh, so the k-fold for loop should be written so that each learner is saved (like learn1, learn2, ...), saving each model with learn1.save('model1').
Then we get the raw predictions from these saved models:
learn1.load('model1')
learn1.get_preds(dl=test_dl)

Am I right?

Yes, that is correct.
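To make that concrete, here is a sketch of the loop. cnn_learner is the fastai2-era name (renamed vision_learner in later fastai), and `splits`, `get_fold_dls`, and `test_dl` are hypothetical names for illustration:

```python
# Train and save one model per fold
for i, (train_idx, valid_idx) in enumerate(splits):
    dls = get_fold_dls(train_idx, valid_idx)        # hypothetical helper per fold
    learn = cnn_learner(dls, resnet34, metrics=accuracy)
    learn.fit_one_cycle(5)
    learn.save(f'model{i}')                         # writes to dls.path/models

# Later: reload each saved model, collect raw predictions,
# then average them as in the earlier sketch
fold_preds = []
for i in range(len(splits)):
    learn.load(f'model{i}')
    preds, _ = learn.get_preds(dl=test_dl)
    fold_preds.append(preds)
```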


Thank you! :smiley:

Went through and updated all the notebooks today for vision and tabular and fixed any bugs associated with them. Most notably, any notebook that had multiple functions in get_y now has them wrapped inside a Pipeline. See ImageWoof for an example of this
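For reference, that Pipeline wrapping looks roughly like this: first grab the parent folder name (an ImageWoof WordNet id), then map it to a readable breed name. The `lbl_dict` here is abbreviated for illustration:

```python
from fastai.vision.all import *

# Map ImageWoof's WordNet folder ids to readable names (abbreviated)
lbl_dict = {
    'n02086240': 'Shih-Tzu',
    'n02087394': 'Rhodesian ridgeback',
    # ... remaining ImageWoof classes
}

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    # Two labelling steps composed into one get_y: folder name, then lookup
    get_y=Pipeline([parent_label, lbl_dict.__getitem__]),
    splitter=GrandparentSplitter(train_name='train', valid_name='val'),
    item_tfms=Resize(224),
)
```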


In the style transfer notebook, while running the _get_layers() function I get the following error:
“_vgg_config() is not defined”
although I have imported vgg19.

Unsure why; in my notebooks it shows the proper variable name (which has an _ at the beginning, as it’s a private function)


Oops, sorry, the screenshot is of the code from when I tried without the _.
If I use the _ I also get the same error: “_vgg_config() is not defined”
Also, I didn’t install nbdev, nor did I use any of the # lines. Is it because of that?