It does seem slow, and for resnet34 I had to set bs=12 instead of bs=16 like they did, because I got a ‘bus error’ (i.e. not enough GPU memory). For resnet50 they set bs=10, which works.
It feels to me that using resnet is a cheap trick. Your training is not any smarter or faster; you are just building on a huge pre-trained NN which is awe-inspiring but which you have no way to control.
The point of using a big pretrained net is that you get to a very good baseline really fast. That is the state-of-the-art recommended approach for most vision problems today. From there it’s your choice whether you dig deeper to understand what’s going on and make your models better. OTOH you can use this powerful model you’ve already trained for your problem to train smaller models much better (this knowledge distillation technique should be covered in future lessons, but you can start looking into it with this paper: https://arxiv.org/abs/1503.02531 ).
If you look at the actual implementation of the training procedure, you’ll see there’s quite a bit of “smart” in the backend (e.g. https://arxiv.org/abs/1803.09820 ).
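To make the distillation idea concrete, here is a minimal sketch of the soft-target loss from the Hinton et al. paper linked above: cross-entropy between the teacher's and student's temperature-softened outputs. The logits, temperature, and function names below are illustrative assumptions, not fastai API:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Softened probabilities: a higher temperature T flattens the distribution,
    # exposing the teacher's relative confidence over the "wrong" classes.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # Cross-entropy between the teacher's softened outputs (soft targets)
    # and the student's softened outputs.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -(p_teacher * np.log(p_student + 1e-12)).sum()

# Hypothetical logits for a 3-class problem: the student is trained to
# match the teacher's full distribution, not just its argmax.
teacher = [8.0, 2.0, 0.5]
student = [5.0, 3.0, 1.0]
loss = distillation_loss(student, teacher)
```

In the full recipe this term is combined (with a weighting factor) with the usual cross-entropy against the hard labels.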
You can get a better feel for how good the framework is if you try to match the accuracy obtained with fastai in another framework. Then compare how much time it took you to write all the code you need, and how long training takes on similar hardware.
I played around with a Kaggle competition last night, and using code based on the Lesson 1 notebook I got to around 96% accuracy in 30–40 min (downloading the dataset + coding + training). Will share once I clean it up. I have much more experience with keras & tf, but it would’ve taken me at least as much time just to write the data preparation code. Maybe I will try to match the result in keras over the weekend, just for comparison.
I have little experience in this area, but I dislike the fact that there is a fastapi python library too.
It’s easy to code a ‘happy path’ into your API, and then the examples look really short and powerful. Of course, once you need to change anything you have to dig down, and then you see whether the API is really well thought out or just veneer.
Knowledge distillation sounds cool to me, as it also hides the implementation detail; it’s a form of obfuscation. I remember reading that some Google researchers showed you could make an equivalent single-layer NN for a multi-layer NN.
I found the article you suggested ( https://arxiv.org/abs/1503.02531 ) very interesting. I was wondering whether you have already tried shrinking a model for training on a small deployment device. I often have to load deep learning solutions onto very small devices with only 4 or 8 cores and a limited amount of memory and bandwidth, but I never attempted this approach…
I just tried running ConvLearner on the “3”/“7” images dataset, and Google Colab was about 7 times faster than my CPU (I actually run Linux in VirtualBox and only gave it 4 of my 8 i7 cores, but still).
So while Colab might be slow, it sure beats the CPU.
If somebody needs the mindmap, I will share it somewhere with you.
You can modify its content as you like by installing XMind (Zen or 8). I made this one in Zen and recommend it, as it is much faster to work with.
Regular expressions get complicated really fast. Note that ImageDataBunch.from_name_func takes a label_func= argument: a function you define to label each image. I find this easier, since instead of capturing groups with a regexp I can just use str.rfind(".") and extract the substrings manually in Python.
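As a sketch of that idea, here is what such a label function could look like, assuming a hypothetical filename scheme "<class>_<index>.jpg" (e.g. the pets dataset); everything is plain string slicing, no regex:

```python
# Hypothetical scheme: "images/german_shorthaired_105.jpg" -> "german_shorthaired"
def label_from_name(fname):
    name = str(fname)
    stem = name[:name.rfind(".")]      # drop the extension, using str.rfind(".")
    stem = stem[stem.rfind("/") + 1:]  # drop any directory prefix
    return stem[:stem.rfind("_")]      # keep everything before the last underscore

# With fastai v1 this could then be passed in place of a regex (sketch only,
# path/fnames are placeholders):
# data = ImageDataBunch.from_name_func(path, fnames, label_func=label_from_name,
#                                      ds_tfms=get_transforms(), size=224)
```

The advantage is that any odd corner case (double extensions, class names containing digits) is handled in ordinary Python instead of a regex you have to decode later.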
For those who had errors on Friday during Google Cloud setup (either because of gcloud-cli or other packages), below is an alternative way to do the entire setup, from creating an instance to accessing/running a Jupyter notebook in the browser:
As far as I remember, Fedora distros were plagued with errors, and even one Ubuntu user had trouble.
Hey Teo,
I tried the tutorial you sent, but I ran into problems: I can’t select the service field “until I upgrade my free trial account” - see attached.
PS: I’m on Fedora 27.
Later edit: I upgraded the account. Now I have to wait a day for Google’s reply. Thanks for the tutorial!
Does anybody else have problems connecting to the GCP instance?
Today I kept getting the following error (after the gcloud compute ssh command), and only deleting/re-creating the instance solved the problem for a short while (after stopping the instance the problem came back):
ssh: connect to host 35.204.66.68 port 22: Resource temporarily unavailable
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
The content above can simply be saved as a new file called .bash_aliases in your home directory; under Ubuntu it will be loaded automatically at login by .bashrc.
These 4 commands may be enough for working solely from the terminal (i.e. independent of the GCP web app).
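The aliases themselves are not shown in this excerpt, so here is a hypothetical .bash_aliases along those lines; the instance name, user, and zone are placeholders you would replace with your own, and the gcloud subcommands (instances start/stop/list, compute ssh) are standard gcloud CLI:

```shell
# ~/.bash_aliases - hypothetical GCP helpers; instance name and zone are placeholders.
alias gcp-start='gcloud compute instances start my-fastai-instance --zone=europe-west4-a'
alias gcp-stop='gcloud compute instances stop my-fastai-instance --zone=europe-west4-a'
# Forward local port 8080 so Jupyter is reachable at http://localhost:8080
alias gcp-ssh='gcloud compute ssh jupyter@my-fastai-instance --zone=europe-west4-a -- -L 8080:localhost:8080'
alias gcp-list='gcloud compute instances list'
```

With something like this, starting the instance, SSH-ing in with the Jupyter port forwarded, and stopping it again are each a single short command.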
Hi, I have just launched a discussion thread on fastai study groups to gather feedback from organizers and participants, to identify best practices and avoid common gaps.
More information in this post. Thank you for taking a few minutes to participate in the discussion.