Now, after thinking about it some more, I think the LSUV loop is necessary after all!
It's because of the nonlinearity. It makes the effect of dividing the linear layer's weights by the std on the activation means/stds after the nonlinearity unpredictable.
It seems reasonable that iterating several times will push the activation stats nearer to our goal, but it's not certain for every distribution of data…
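To make that concrete, here is a minimal sketch of the std part of the LSUV loop (not the fastai implementation; the helper name and tolerances are made up). With a plain ReLU and no bias a single division would be exact, but the bias term makes the post-nonlinearity std move unpredictably, which is why we loop:

```python
import torch
import torch.nn as nn

def lsuv_scale(layer, act, x, tol=1e-3, max_iters=10):
    """Iteratively divide `layer`'s weights by the post-nonlinearity std.
    The nonlinearity (plus the bias) makes a single division inexact,
    hence the loop. Hypothetical sketch, not the fastai implementation."""
    with torch.no_grad():
        for _ in range(max_iters):
            std = act(layer(x)).std()
            if abs(std - 1.0) < tol:
                break
            layer.weight /= std
    return layer

torch.manual_seed(0)
lin = nn.Linear(256, 256)
x = torch.randn(512, 256)

before = torch.relu(lin(x)).std().item()
lsuv_scale(lin, torch.relu, x)
after = torch.relu(lin(x)).std().item()
print(f"post-ReLU std before: {before:.3f}, after: {after:.3f}")
```

The fastai lesson also runs a second loop that shifts the bias to fix the mean; this sketch only shows the std part the post is discussing.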
One question re: Images / Transforms: I seem to remember Jeremy saying in the 2018 course (fastai 0.7 version) that opencv beats PIL performance-wise almost all the time, which is why fastai 0.7 used it instead of PIL. Now today Jeremy was talking again about the importance of performance in transformations etc., so I was wondering what the reason was for the switch from opencv to PIL for fastai v1 (and the course)? (I seem to remember it supports more image formats? But the performance issue would still remain!?)
I found in Part 1, when I was trying to classify sports action photos, that higher resolution was key to improving accuracy - I ended up using 448 x 448. I'm guessing that as the action could be pretty varied, it needed reasonable resolution to capture the differentiating features. Maybe shorts, socks, cap sleeves or other things that could identify a sport. Off topic, but I see some 'other' questions above - and I did end up using an 'other' class of basically my random photos (mostly landscapes and city shots), which avoided any non-sports photo with some grass in it being identified as cricket.
So one thing probably is that PyTorch torchvision just uses PIL, and that it can be a bit less complex to install. Regarding the performance, there are three things to note:
Is preprocessing the training bottleneck? Preprocessing happens in the background with PyTorch dataloaders, so unless your model is waiting for the next batch, preprocessing is probably fast enough already. Homography / rotation does cost CPU cycles, cropping not really.
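One crude way to check whether the model is waiting on the dataloader is to time how long each `next(batch)` takes relative to the compute step. A sketch (the random dataset and the `.mean()` stand-in for the training step are made up for illustration):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# crude check: how long does the loop wait for each batch?
# if these waits are near zero, preprocessing is not the bottleneck.
ds = TensorDataset(torch.randn(1024, 3, 64, 64))
dl = DataLoader(ds, batch_size=64, num_workers=0)

waits = []
t = time.perf_counter()
for (xb,) in dl:
    waits.append(time.perf_counter() - t)  # time spent fetching the batch
    _ = xb.mean()                          # stand-in for forward/backward
    t = time.perf_counter()

print(f"mean wait per batch: {sum(waits) / len(waits) * 1e3:.2f} ms")
```

With `num_workers > 0` the fetching overlaps with compute, which is the "happens in the background" point above.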
If you really want fast preprocessing, you'd probably look at the GPU. Now that Jeremy has rehabilitated my laziness of just using nearest neighbour, I should really put up my homography transform CUDA kernel (but it's really trivial to do, so if you always wanted to implement a custom CUDA thing, I can recommend it as a first project).
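For those who don't want to write a kernel, PyTorch can already do a nearest-neighbour warp on the GPU via `affine_grid` / `grid_sample`; a custom CUDA kernel would essentially implement the same indexing. A rotation sketch (the helper name is made up; it runs on whatever device the tensor is on):

```python
import math
import torch
import torch.nn.functional as F

def rotate_nearest(img, degrees):
    """Rotate a batch of NCHW images with nearest-neighbour sampling,
    on whatever device `img` lives on. Sketch using grid_sample rather
    than a hand-written CUDA kernel."""
    n = img.shape[0]
    a = math.radians(degrees)
    # 2x3 affine matrix in the normalized coordinates grid_sample expects
    theta = torch.tensor([[math.cos(a), -math.sin(a), 0.0],
                          [math.sin(a),  math.cos(a), 0.0]],
                         dtype=img.dtype, device=img.device)
    theta = theta.unsqueeze(0).expand(n, -1, -1)
    grid = F.affine_grid(theta, list(img.shape), align_corners=False)
    return F.grid_sample(img, grid, mode='nearest', align_corners=False)

x = torch.arange(16.).reshape(1, 1, 4, 4)
y = rotate_nearest(x, 90)
print(y.shape)
```

A full homography needs a perspective divide that `affine_grid` doesn't do, but you can build the grid yourself and still feed it to `grid_sample` - which is where the custom-kernel project comes in.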
The mixup paper similarly smelled tenchy to me - as it seems to produce a scrambled image by "convex combinations of pairs of examples and their labels" - so I guess this is a similar kind of regularization.
Noise like in dropout is believed to have an effect like "ensemble learning" by breaking the network up into more or less de-correlated subnetworks. It's not just dropout but also other types/amounts of noise that have this effect. Considering that, it becomes more plausible that fish images without fish could be OK - in small amounts.
Another angle is the mixup/mixin experience, which shows that "borderline images" can help refine the boundary between classes. I.e. a fisherman without a fish is still closer to the fish class than an astronaut is - in some abstract sense.
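For reference, the "convex combinations of pairs of examples and their labels" from the paper is only a few lines. A minimal sketch (not fastai's implementation, which differs in details such as drawing one lambda per item):

```python
import torch

def mixup_batch(x, y_onehot, alpha=0.4):
    """Mix each example with a randomly permuted partner using a single
    Beta-distributed lambda, mixing the one-hot labels the same way.
    Sketch of the idea from the mixup paper, not fastai's version."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.shape[0])
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

torch.manual_seed(0)
x = torch.randn(8, 3, 32, 32)
y = torch.eye(10)[torch.randint(0, 10, (8,))]  # one-hot labels
xm, ym = mixup_batch(x, y)
print(xm.shape, ym.sum(dim=1))  # mixed labels still sum to 1 per item
```

The mixed labels stay valid probability distributions, which is why the "scrambled" images still carry a sensible training signal.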
It already loads the data into memory only one batch at a time, so it has some lazy properties that make it possible to train on data larger than your RAM.
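The batch-at-a-time behaviour falls out of the `Dataset` protocol: if `__getitem__` reads from disk, only the items of the current batch ever sit in RAM. A toy sketch (the one-tensor-per-file layout is made up for illustration):

```python
import os
import tempfile
import torch
from torch.utils.data import Dataset, DataLoader

class LazyFileDataset(Dataset):
    """Reads one item from disk per __getitem__, so only the current
    batch's items need to be in memory at once. Hypothetical layout:
    one tensor per .pt file."""
    def __init__(self, paths):
        self.paths = paths
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, i):
        return torch.load(self.paths[i])

# tiny demo: write a few tensors to disk, then stream them batch by batch
tmp = tempfile.mkdtemp()
paths = []
for i in range(6):
    p = os.path.join(tmp, f"{i}.pt")
    torch.save(torch.full((4,), float(i)), p)
    paths.append(p)

dl = DataLoader(LazyFileDataset(paths), batch_size=2)
shapes = [b.shape for b in dl]
print(shapes)
```

Nothing is read until the `DataLoader` asks for it, which is the lazy property the post refers to.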
I'm not sure if I missed it, but I'm a bit concerned about the timing of the GPU ops. You would need synchronization before the measurement and at the end of the timed function if you want to use %timeit (which I personally use a lot for quick benchmarks).
Also, I'm not sure I would include the transfer to the GPU in the benchmark, as you'll be transferring your image to the GPU anyway at some point, so it's not really an overhead that you incur from the transformation.
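Both points can be folded into one small timing helper (the helper name and iteration counts are made up): CUDA kernels launch asynchronously, so you synchronize before starting the clock and again before stopping it, and you allocate the input on the device outside the timed region so the transfer isn't counted:

```python
import time
import torch

def time_gpu(fn, *args, warmup=3, iters=10):
    """Average wall time of fn(*args), synchronizing around the timed
    region so asynchronous CUDA launches are fully counted.
    Degrades to a plain timer on CPU."""
    for _ in range(warmup):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(256, 256, device=device)  # transfer happens here, outside the timing
t = time_gpu(torch.matmul, x, x)
print(f"{t * 1e6:.1f} us per matmul")
```

Without the second `synchronize`, the clock would stop while the last kernels are still running and the numbers would look implausibly fast.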
Right - it's not ideal. But by definition there won't be many items in this class, so it should be OK. Trying to predict a category we've never seen before is always going to be tricky!
Write a simple and readable version. See if it’s fast enough for what you’re doing. If it’s not, use your profiler to find what’s taking the time, and fix that bit.
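That workflow can be tried out with Python's built-in profiler; the pipeline and its deliberately slow inner loop below are made up for illustration:

```python
import cProfile
import io
import pstats

def slow_part(n):
    # deliberately quadratic; the profiler should point here
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total

def pipeline(n):
    return slow_part(n) + sum(range(n))

pr = cProfile.Profile()
pr.enable()
pipeline(300)
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
report = s.getvalue()
print(report)
```

The report makes the "find what's taking the time" step mechanical: `slow_part` dominates the cumulative column, so that's the bit to fix.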