Lesson 3 - Official Topic

Can we deliberately reduce the training set, relying on augmentation to compensate, and allocate more images to the validation/test sets? Will it affect the results?

No, item_tfms run on the CPU, which is why we usually only do the resize there (you need images of the same size to form a batch).
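
A plain-Python sketch of why that resize has to happen per item before batching (toy nested lists standing in for image tensors; `resize` and `stack` are hypothetical helpers, not fastai functions):

```python
def resize(img, size):
    # Toy stand-in for an image resize: crop/pad a 2-D list to size x size.
    rows = [(row + [0] * size)[:size] for row in img]
    return (rows + [[0] * size] * size)[:size]

def stack(imgs):
    # Collating into a batch only works if every image has the same shape.
    shapes = {(len(i), len(i[0])) for i in imgs}
    assert len(shapes) == 1, "cannot collate images of different sizes"
    return imgs

imgs = [[[1, 2, 3]], [[4], [5]]]             # two "images" of different sizes
batch = stack([resize(i, 2) for i in imgs])  # resize each item first, then batch
```

Without the per-item resize, `stack` would fail on the mismatched shapes — which is exactly why the resize belongs in item_tfms.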

Should we use the cleaner separately for training, validation, and for each class?

Got it. So each image is transformed randomly in an epoch; when the next epoch comes around, it is transformed randomly again, based on your transforms.
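
That per-epoch randomness can be sketched in plain Python (a hypothetical `augment` that returns the drawn parameters instead of an actual transformed image): fresh random parameters are drawn every time an image is seen, so each epoch yields different augmented versions.

```python
import random

def augment(img, rng):
    # Draw fresh random transform parameters each call (here: flip and rotation).
    flip = rng.random() < 0.5
    angle = rng.choice([0, 90, 180, 270])
    return (img, flip, angle)  # stand-in for the transformed image

rng = random.Random(42)
images = ["grizzly_1.jpg", "teddy_3.jpg"]
for epoch in range(3):
    # Same source images every epoch, but the random draws differ,
    # so the model effectively never sees the identical input twice.
    batch = [augment(img, rng) for img in images]
```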

It’s always better to have as much training data as possible (even with data augmentation). So 20% of your data for the validation set is probably all you need.
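
A sketch of that 80/20 split in plain Python (a hypothetical `split_indices` helper; fastai's RandomSplitter does the equivalent):

```python
import random

def split_indices(n, valid_pct=0.2, seed=42):
    # Shuffle all indices, then hold out valid_pct of them for validation.
    idxs = list(range(n))
    random.Random(seed).shuffle(idxs)
    cut = int(n * valid_pct)
    return idxs[cut:], idxs[:cut]  # train, valid

train, valid = split_indices(100)
print(len(train), len(valid))  # 80 20
```

Shuffling before the cut keeps the validation set representative, and the fixed seed makes the split reproducible across runs.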

Just wanted to say, ImageClassifierCleaner is incredible. I used it in my models, and I found some very interesting data. 🙂

If you haven’t tried nbdev yet, do it. Love it.

Already using it for the second time at work, and getting all my coworkers on it.

Is there any way to use nbdev for existing repos, where the code is already written and maintained in the traditional way?

How can we use a cleaner-like function on the raw data, rather than having to learn from the loss function?

Is there a similar function available to look at the raw images?

Why not smaller than 24x24? How small can it go?

What do you do if you’re constantly getting new data in production? For example, if you got 100 new images of bears do you retrain the model every time with all the data or just the new data?

nbdev is used to convert a series of Jupyter notebooks into a library and its documentation. So unfortunately, no, that is not the purpose of nbdev.

Are you supposed to run the cleaner utility commands for each category? I.e., those two command lines to delete and unlink: are those meant to be run multiple times, once per drop-down category, or only once?

What would happen if we put a black bear, teddy bear, and grizzly all together in one picture? Would we ever want to do this?

Usually models require a minimum of 32x32 or 64x64 (we will see why when we are studying them later in the course).

For the image cleaner, can you annotate all the training/validation images for each category, and then run the unlink command once? Or do you need to do these two steps for each training/validation set?

Well, if you keep decreasing the size, performance decreases too. In many cases 224x224 is a good tradeoff between performance and compute cost/memory usage. It will depend on the particular application, though, and you will likely have to try out different sizes.

What do we need to set up on the production server to deploy the model? Fastai 2? Anything else?

Great question. This classifier will fail (for reasons we will see later tonight) because the model will want to predict one class over the others.
You need a different kind of loss function to deal with images that can have multiple labels (we will see this next week, I think).

It’s best to retrain your model regularly, on a mix of new and old data. What percentage of each depends on your problem: bears don’t change, so you probably want everything, but if your data could shift, you probably want some mix like 80-90% new and 10-20% old.
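
One way to sketch that mix in plain Python (a hypothetical `retraining_set` helper; the 15% old-data share is just an example value from the 10-20% range above):

```python
import random

def retraining_set(new_items, old_items, old_pct=0.15, seed=0):
    # Keep all new data; sample just enough old data that it makes up
    # roughly old_pct of the combined retraining set.
    k = min(len(old_items), round(old_pct / (1 - old_pct) * len(new_items)))
    sampled_old = random.Random(seed).sample(old_items, k)
    return list(new_items) + sampled_old

new = [f"new_{i}" for i in range(85)]
old = [f"old_{i}" for i in range(200)]
mixed = retraining_set(new, old)
print(len(mixed))  # 100 items: 85 new + 15 old
```

Sampling the old data (rather than keeping all of it) keeps retraining cheap while still anchoring the model against forgetting the original distribution.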
