Lesson 2 discussion - beginner

radek · November 18, 2017, 1:06pm

For larger datasets, I was experimenting with pytables - the IO on AWS with reasonable volumes seems to be very limiting, especially with smaller files. I also experimented with other types of drives, etc.

Once we go from an image to a matrix, the data suddenly becomes huge. The problem is that I never found a good way of compressing it to conserve disk space. I think with the 70GB of data for the Cdiscount competition, once read in as matrices, even with compression, the data becomes over 1TB or something like that, maybe even closer to 2TB IIRC.

Guess this is just how it is. Chances are algorithms for image compression are so highly specialized that nothing generic comes even close… and the whole idea of reading in images and saving them as matrices is just silly.
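To make the size blow-up concrete, here is a rough back-of-the-envelope sketch. The image dimensions and the on-disk JPEG size are hypothetical values picked for illustration, not measurements from the competition data.

```python
# Rough storage arithmetic for decoded images (all numbers are illustrative).

def decoded_size_bytes(height, width, channels=3, bytes_per_value=1):
    """Size of one decoded image stored as a dense, uncompressed array."""
    return height * width * channels * bytes_per_value

# Assumption: a 180x180 RGB image that takes about 5 KB as a JPEG on disk.
jpeg_bytes = 5_000
as_uint8 = decoded_size_bytes(180, 180)                       # 1 byte per channel value
as_float32 = decoded_size_bytes(180, 180, bytes_per_value=4)  # 4 bytes per channel value

print("uint8 array:  ", as_uint8, "bytes")
print("float32 array:", as_float32, "bytes")
print("uint8 / JPEG ratio:", round(as_uint8 / jpeg_bytes, 1))
```

Under these assumptions the decoded array is many times larger than the JPEG, and storing it as floats multiplies that by four again, which is why keeping the files in their image format and decoding on the fly is usually the cheaper option for disk space.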

BTW if you have a dataset this size, does shuffling still even make sense?

creviera · November 24, 2017, 2:51am

Could you explain what restarting a Jupyter notebook kernel does? How does it affect weights? If we have precompute=True then weights have already been computed and restarting the kernel shouldn’t affect this, correct? However, when I run

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

I get 83% accuracy on the third epoch. If I run the cell again, I get something like 31%. If I restart the kernel, we’re back to 83%. Why would my results be different based on the state of the kernel?

creviera · November 24, 2017, 3:07am

Is there any reason to call learn.fit when we have precompute=True? You said to skip learn.fit if you have saved weights? Does learn.fit do anything more than generate weights?

vikbehal · November 24, 2017, 4:01am

If you’ve saved weights, you don’t need to rerun. You just need to create the learn object and load the weights.

vikbehal · November 24, 2017, 4:04am

Before restarting the kernel, make sure to save the weights, and load them after the restart. If you haven’t saved and loaded them, they will be recalculated.
precompute=True caches the precomputed activations, which is why it’s fast.

creviera · November 24, 2017, 5:45am

Ah, okay. Looks like I can call learn.load before learn.fit and a learn.save afterwards, like this:

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.load('224_all-layers')
learn.fit(1e-2, 1)
learn.save('224_all-layers')

And it is a lot faster now, thanks!

[Update] Oh I think I see what you’re saying. I could do a learn.fit and then a learn.save(some_name) before I restart the kernel. Then, after a kernel restart, comment out the learn.fit and instead do a learn.load(some_name).

vikbehal · November 24, 2017, 6:41am

Yup! So, whatever your model has learnt, it’ll be saved.
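The save-then-load pattern discussed above can be sketched with a toy stand-in. This is not the fastai API: `fit`, `save`, and `load` here are hypothetical helpers operating on a plain dict, used only to show the control flow of training once, saving, and restoring after a restart.

```python
import json
import os
import tempfile

def fit(weights, lr, epochs):
    """Toy stand-in for training: nudges a single weight toward 1.0."""
    for _ in range(epochs):
        weights["w"] += lr * (1.0 - weights["w"])
    return weights

def save(weights, path):
    """Write the weights to disk so they survive a kernel restart."""
    with open(path, "w") as f:
        json.dump(weights, f)

def load(path):
    """Read previously saved weights back from disk."""
    with open(path) as f:
        return json.load(f)

checkpoint = os.path.join(tempfile.mkdtemp(), "toy_weights.json")

# First session: train, then save before restarting the kernel.
weights = fit({"w": 0.0}, lr=0.5, epochs=3)
save(weights, checkpoint)

# After a "restart": in-memory state is gone, so load instead of refitting.
restored = load(checkpoint)
print(restored == weights)
```

The point of the sketch is that anything held only in memory disappears with the kernel, while whatever was written to disk can be loaded back without repeating the training step.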

sabzo · November 24, 2017, 7:49am

Test time augmentation: I understand the data augmentation that can be done while training a model. After the model is trained, what use is test time augmentation? Is learn.TTA() re-learning or somehow improving the model? If it’s a way to make the predictions better, isn’t that training in a way? Or is it a way to increase the size of the validation set (having more images to test against)?

Thanks.

radek · November 24, 2017, 8:41am

TTA is not improving the model nor is it enlarging the validation set. All it should do is enhance our model’s predictive ability. The idea is that if we show our model a couple of slightly altered versions of the same image, we hope that overall it will do better than if it saw just a single image.

Maybe the image after transformations will be more like what our model learned on the train set? And even if not, it’s sort of like us when we take a new thing in our hand and try to recognize what it is, what it does. We turn it around in our hands and look at it from various perspectives. I think that this is somewhat like what we are doing here - probably not a perfect analogy but one way to look at this.

Either way, by transforming the image a little bit, I think the idea is to address some of the deficiencies of our model (not being that great at categorizing images where the object doesn’t appear in roughly the same spot as in the train set, etc).

sabzo · November 24, 2017, 2:24pm

Thank you for the response! You mentioned “TTA is not improving the model” and at the end of your response you mention “it addresses some of the deficiencies of our model”.

To “enhance [the] model’s predictive ability” sounds like an improvement to the model.

I’m still not sure what TTA is doing - perhaps it’s that I don’t understand the role of learn.predict()?
After training, is prediction a form of validation? Measuring the results? Or does calling learn.predict() (or learn.TTA()) change (improve) the model’s ability to predict (predict in this case meaning to “classify”)?

radek · November 24, 2017, 2:31pm

Sorry, I think I understand your question better now.

predicting - we take our trained model, show it an image, and ask it what category the image belongs to
TTA (test time augmentation) - before showing our model an image, we create a couple of other images that are similar (sort of like we augment the train data with data augmentation). We might change how light the image is, make changes to the colors, scale the image, etc. We want our model to classify the original image, but instead of showing it just the single image we started with, we show it the original + slightly altered copies. We predict on each of those images and then combine the results somehow (in our case, I believe we just take the average of the predictions).
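The two definitions above can be sketched in a few lines. This is a minimal illustration, not how learn.TTA() is implemented: `toy_model`, `augmentations`, and `predict_with_tta` are hypothetical names, the "image" is just a flat list of pixel intensities, and the choice of transformations is an assumption.

```python
def toy_model(image):
    """Hypothetical trained classifier: returns [p_cat, p_dog] from mean brightness."""
    brightness = sum(image) / len(image)
    p_dog = min(max(brightness, 0.0), 1.0)
    return [1.0 - p_dog, p_dog]

def augmentations(image):
    """The original image plus slightly altered copies (flip, brighter, darker)."""
    flipped = image[::-1]
    brighter = [min(p + 0.1, 1.0) for p in image]
    darker = [max(p - 0.1, 0.0) for p in image]
    return [image, flipped, brighter, darker]

def predict_with_tta(model, image):
    """Predict on every variant and average the class probabilities."""
    preds = [model(variant) for variant in augmentations(image)]
    n_classes = len(preds[0])
    return [sum(p[c] for p in preds) / len(preds) for c in range(n_classes)]

image = [0.2, 0.4, 0.6, 0.8]
single = toy_model(image)                      # plain prediction: one image in, one answer out
averaged = predict_with_tta(toy_model, image)  # TTA: several variants in, one averaged answer out
print(single, averaged)
```

Note that `toy_model` is never modified here: TTA only changes how many views of the input are scored and combined at prediction time, and leaves the trained model as it is.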

Would this answer your question?

sabzo · November 24, 2017, 2:38pm

Not really - I understand that process: we take the average prediction of the image + 4 transformations. I don’t understand why we need to do so. TTA seems like just testing to me: instead of testing 1 image, now we’ve tested 5 images. We’re calling TTA to see whether it can detect transformations. We’re testing how good the model is. Not making it better, but giving it different augmentations to see how good it is. I just don’t see how testing 5 images vs 1 image has any effect if the model has already been trained (after all, augmentation is done during training).

jeremy · November 24, 2017, 6:19pm

Analogy: teach a baby to recognize a chair. Show baby 3 different chairs. For each, show them from the front, and from the back. That’s training augmentation.

Now ask the baby to recognize a chair they haven’t seen before (that’s testing). Regular version: just show them the front of that one chair. TTA version: show them both the front and the back of that chair.

You would expect them to be better at recognizing it, if you show them both the front and the back.

sabzo · November 24, 2017, 7:08pm

Ah! Thank you. So to potentially get better classification results, I can take an image along with its various augmentations and the model may have a better chance to classify the image.

I’m thinking in terms of a web app where a user uploads an image to be classified (dog breed, for example). The image, once uploaded, will be transformed in various ways (sideways, for example), and the model, seeing the same image in different augmentations, might be able to classify it better.

I hope this is correct.

jeremy · November 24, 2017, 10:13pm

Exactly correct

Avhirup · December 4, 2017, 4:55am

The validation loss starts increasing… What should I do now? @jeremy

jeremy · December 4, 2017, 6:57pm

That means you’re overfitting! So try the techniques we learned in class:

Data augmentation
Dropout
SGDR
Differential learning rates
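As a small aid for spotting the symptom described above, here is a sketch that flags the point where validation loss stops improving. The function name, the `patience` parameter, and the loss values are all hypothetical; it only detects the turn, it does not apply any of the listed techniques.

```python
def first_overfit_epoch(val_losses, patience=2):
    """Return the index of the best epoch once `patience` epochs have passed
    without a new best validation loss; return None otherwise."""
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i  # new best validation loss so far
        elif i - best_idx >= patience:
            return best_idx  # no improvement for `patience` epochs
    return None

# Made-up per-epoch validation losses: improving at first, then rising.
history = [0.60, 0.45, 0.40, 0.42, 0.47]
print(first_overfit_epoch(history))

# A run that keeps improving is not flagged.
print(first_overfit_epoch([0.60, 0.50, 0.40]))
```

If the function returns an epoch index, that is roughly where the model was at its best on the validation set, and a reasonable point from which to retry with stronger regularization.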

Kaneki · January 24, 2018, 7:41pm

I’ve searched for the Dog Breed notebook but I’m not able to find it at all. I even tried git pull, but it seems my repo wasn’t updated with anything new. Can anyone provide me a way to obtain the notebook?

jonneff · January 25, 2018, 12:16am

I’m pretty sure you have to create it yourself. It’s a homework problem. I started from the lesson 2 workbook and modified it to use the dogbreeds data.