Lesson 3 - Official Topic

Can we deliberately reduce the training set, relying on augmentation to compensate, and allocate more images to the validation/test sets? Will it affect the results?

No, item_tfms run on the CPU, which is why we usually only do the resize there (you need images of the same size to form a batch).
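
A plain-Python sketch of why that resize has to happen per item before batching (toy nested lists standing in for image tensors; `resize` and `stack` are hypothetical helpers, not fastai functions):

```python
def resize(img, size):
    # Toy stand-in for an image resize: crop/pad a 2-D list to size x size.
    rows = [(row + [0] * size)[:size] for row in img]
    return (rows + [[0] * size] * size)[:size]

def stack(imgs):
    # Collating into a batch only works if every image has the same shape.
    shapes = {(len(i), len(i[0])) for i in imgs}
    assert len(shapes) == 1, "cannot collate images of different sizes"
    return imgs

imgs = [[[1, 2, 3]], [[4], [5]]]             # two "images" of different sizes
batch = stack([resize(i, 2) for i in imgs])  # resize each item first, then batch
```

Without the per-item resize, `stack` would fail on the mismatched shapes — which is exactly why the resize belongs in item_tfms.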

Should we use the cleaner separately for training, validation, and for each class?

Got it. So each image is transformed randomly in an epoch; when the next epoch comes around, it is transformed randomly again, based on your transforms.
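
That per-epoch randomness can be sketched in plain Python (a hypothetical `augment` that returns the drawn parameters instead of an actual transformed image): fresh random parameters are drawn every time an image is seen, so each epoch yields different augmented versions.

```python
import random

def augment(img, rng):
    # Draw fresh random transform parameters each call (here: flip and rotation).
    flip = rng.random() < 0.5
    angle = rng.choice([0, 90, 180, 270])
    return (img, flip, angle)  # stand-in for the transformed image

rng = random.Random(42)
images = ["grizzly_1.jpg", "teddy_3.jpg"]
for epoch in range(3):
    # Same source images every epoch, but the random draws differ,
    # so the model effectively never sees the identical input twice.
    batch = [augment(img, rng) for img in images]
```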

It’s always better to have as much training data as possible (even with data augmentation). So 20% of your data for the validation set is probably all you need.
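
A sketch of that 80/20 split in plain Python (a hypothetical `split_indices` helper; fastai's RandomSplitter does the equivalent):

```python
import random

def split_indices(n, valid_pct=0.2, seed=42):
    # Shuffle all indices, then hold out valid_pct of them for validation.
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    cut = int(n * valid_pct)
    return idxs[cut:], idxs[:cut]  # train, valid

train, valid = split_indices(100)
print(len(train), len(valid))  # 80 20
```

Shuffling before the cut keeps the validation set representative, and the fixed seed makes the split reproducible across runs.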

Just wanted to say, ImageClassifierCleaner is incredible. I used it in my models, and I found some very interesting data. 🙂

If you haven’t tried nbdev yet, do it. Love it.

Already using it for the second time at work, and getting all my coworkers on it.

Is there any way to use nbdev for existing repos, where the code is already written and maintained in the traditional way?

How can we use a cleaner-like function on the raw data, rather than having to learn from the loss function?

Is there a similar function available to look at the raw images?

Why not smaller than 24x24? How small can it go?

What do you do if you’re constantly getting new data in production? For example, if you got 100 new images of bears do you retrain the model every time with all the data or just the new data?

nbdev is used to convert a series of Jupyter notebooks into a library and its documentation. So unfortunately, no, that is not the purpose of nbdev.

Are you supposed to run the cleaner utility commands for each category? I.e., those two command lines to delete and unlink: are those meant to be run multiple times, once per drop-down category, or only once?

What would happen if we put a black bear, teddy bear, and grizzly all together in one picture? Would we ever want to do this?

Usually models require a minimum of 32x32 or 64x64 (we will see why when we are studying them later in the course).

For the image cleaner, can you annotate all the training/validation images for each category, and then run the unlink command once? Or do you need to do these two steps for each training/validation set?

Well, if you keep decreasing the size, performance decreases too. In many cases 224x224 is a good tradeoff between performance and compute cost/memory usage. It will depend on the particular application, though, and you will likely have to try out different sizes.

What do we need to set up on the production server to deploy the model? Fastai 2? Anything else?

Great question. This classifier will fail (for reasons we will see later tonight) because the model will want to predict one class over the others.
You need a different kind of loss function to deal with images that can have multiple labels (we will see this next week, I think).

It’s best to retrain your model regularly, on a mix of new and old data. What percentage of each depends on your problem: bears don’t change, so you probably want everything, but if your data could shift, you probably want some mix like 80-90% new and 10-20% old.
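
One way to sketch that mix in plain Python (a hypothetical `retraining_set` helper; the 15% old-data share is just an example value from the 10-20% range above):

```python
import random

def retraining_set(new_items, old_items, old_pct=0.15, seed=0):
    # Keep all new data; sample just enough old data that it makes up
    # roughly old_pct of the combined retraining set.
    k = min(len(old_items), round(old_pct / (1 - old_pct) * len(new_items)))
    sampled_old = random.Random(seed).sample(old_items, k)
    return list(new_items) + sampled_old

new = [f"new_{i}" for i in range(85)]
old = [f"old_{i}" for i in range(200)]
mixed = retraining_set(new, old)
print(len(mixed))  # 100 items: 85 new + 15 old
```

Sampling the old data (rather than keeping all of it) keeps retraining cheap while still anchoring the model against forgetting the original distribution.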
