Google Colab Setup for FastAI Part 2 v2


(Naman Bhalla) #1

Hi !

As @jeremy had uploaded an early draft of the notebooks for the Part 2 v2, I decided to try these on Google Colab. For those unaware of Google Colab, it offers GPU as a backend for free for 12 hours at a time. You can check it at https://colab.research.google.com

I feel many might not be able to afford a GPU for training the models that are built in the course and it shall be a great help for them to have the notebooks and setup instructions for Colab so that they can try the teachings of the course themselves, that too for free.

I shall be maintaining a list of notebooks in this thread which can be run directly on Colab to try whatever is taught over the next 7 weeks.

Setup
To set up your Colab VM for Fast AI, you can download the linked notebook: https://github.com/Naman-Bhalla/fastAI_part2_v2_colab/blob/master/FastAI_v2_setup.ipynb . Please note that you need to run it every time a new VM instance is created.
Upload the notebook to your Google Drive. Go to https://colab.research.google.com and open FastAI_v2_setup.ipynb from File -> open Drive Notebook -> My Drive.
After it opens, go to Runtime -> Change Runtime Type and make sure Hardware Accelerator is GPU. Finally, go to Runtime and select Run All.
The cells will connect your Google Drive with Colab (for permanent file saving), set up FastAI, PyTorch and all dependencies, and download the datasets to the correct locations. (At the moment, only the dataset for pascal.ipynb is downloaded, but I shall keep it updated with all the datasets as the course progresses.) They will also save the notebooks for the lectures covered to date in the fastai_v2_colab folder in your Google Drive.
A note about Google Drive integration: there will be 2 prompts when that particular cell runs. Click on the links in the prompts, choose your Google account, allow access, and then copy the generated secure key into the box.

That’s it. This VM instance of Colab is ready to host your lecture notebooks. Go to your Drive and open the lecture notebooks directly, or check the modified ones below (ensure their Runtime Type is also GPU hardware accelerated). Make sure to run the Setup notebook each time a new VM instance is created (for example, when you open a notebook after 12 hours).

Lecture Notebooks
Due to limitations of Colab, the lecture notebooks might not always work as they are. I will try my best to make sure the notebooks run without any changes, but in case some changes are needed, I will upload the modified notebooks to the repository. For most of the notebooks, the only change needed is commenting out the magic functions (matplotlib and autoreload) in the first cell

(%matplotlib inline
%reload_ext autoreload
%autoreload 2)
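
If you would rather patch a downloaded notebook than edit the first cell by hand, something along these lines could work. This is only a rough sketch: `comment_out_magics` is a hypothetical helper name, and it assumes the notebook is a standard .ipynb JSON file whose magic lines start with `%`. Note that it would also comment out `%%` cell magics, which may not be what you want.

```python
import json


def comment_out_magics(nb_json: str) -> str:
    """Return notebook JSON in which every code-cell line starting with '%' is commented out."""
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # leave markdown and raw cells untouched
        source = cell.get("source", [])
        # The notebook format stores source either as a list of lines or as one string.
        as_list = isinstance(source, list)
        lines = source if as_list else source.splitlines(keepends=True)
        patched = [("# " + ln) if ln.lstrip().startswith("%") else ln for ln in lines]
        cell["source"] = patched if as_list else "".join(patched)
    return json.dumps(nb, indent=1)
```

To use it, read the .ipynb file as text, pass the text through the function, and write the result to a new file before uploading it to Drive.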

Just download the notebooks, upload them to a suitable directory in your Drive, open them via Colaboratory, change the Runtime Type to GPU, and that’s it! If you hit any error, double-check that you have run the Setup notebook, or run it again.

pascal.ipynb : https://github.com/Naman-Bhalla/fastAI_part2_v2_colab/blob/master/pascal.ipynb

As this is still an early draft, forum members have found an IndexError, possibly because of padding. I have modified the notebook to remove the error (the error is not fixed yet, though! I just commented out the lines mentioned by @belskikh here). This affects the final results, and the bounding boxes appear incorrectly, as can be seen here. I am sure @jeremy will provide a solution or fix this soon; I have uploaded the modified notebook in case someone wishes to try it before the lecture starts.

When I ran pascal.ipynb, it took less than 7 mins for the complete notebook to run. So, yes, Colab is very fast; after all, you have a Tesla K80 at your disposal! I hope this thread gives relief to those concerned about buying a GPU or paying for AWS to finish the course.

See you all in the course !

Regards. :slight_smile:


(Vikrant Behal) #2

Thanks, Naman for sharing.

My experience with colab so far:
I’ve tried google colab in the past and the connection gets lost in the middle of training. How is your experience so far?


(Naman Bhalla) #3

My experience has actually been quite good till now. Yes, I have also had a few situations when there was a connection loss, but I guess I am lucky that they have been quite rare for me (only about 3 times over the last 3 months). I feel this happens when the training time exceeds 3 hours. In those situations, I just end up using Azure.
The major issue I have with Colab is file management. As all data is lost after a VM instance is destroyed, I have ended up creating a lot of scripts for the different projects I work on. Also, a few times no GPU was available, probably because of limited resources.

Thanks for sharing your experience !!


(Vikrant Behal) #4

True! I also had to set up extra scripts to read and write the data to Google drive. Nevertheless, it’s a great place for learners who can’t afford paid resources!
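
For anyone wondering what such a script might look like, here is a minimal sketch. It assumes your Drive is already connected and visible as an ordinary folder on the VM; `backup_to_drive`, the file patterns and any paths you pass in are placeholders, not something from the setup notebook.

```python
import shutil
from pathlib import Path


def backup_to_drive(src_dir, drive_dir, patterns=("*.h5", "*.pth")):
    """Copy files matching `patterns` from the VM into a Drive-backed folder.

    Returns the list of destination paths that were written.
    """
    src, dst = Path(src_dir), Path(drive_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in patterns:
        for f in sorted(src.rglob(pattern)):
            # Preserve the sub-folder layout under the destination.
            target = dst / f.relative_to(src)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(target)
    return copied
```

Calling it at the end of a training cell (for example with a models folder as the source and a folder inside your mounted Drive as the destination) keeps the weights around after the VM is destroyed.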


(Arnav) #5

Great stuff @naman-bhalla. I did the beginning lessons 1-3 of Part 1 v2 on Colab and it’s a great resource.
Especially for the non-intensive playing around with the code part.
The script looks pretty exhaustive. I made a similar one for Kaggle competitions.


(Adrian Galdran) #7

Thank you very much for taking the time to translate the first lesson to google colab, Naman :slight_smile:

I’ve followed your steps, and your set-up seems to work for me. The only problem is that I’m getting some out-of-bounds errors in the second part, but I guess that will be fixed by tomorrow. One question: does the data (the pascal dataset, I mean) that we download with your notebook persist on our Google Drive after closing that particular session, or does it disappear and need to be downloaded each time? I would actually prefer it to disappear, as I don’t have much Google Drive space left…

Thanks!

Adrian


(Vikrant Behal) #8

In my experience, it disappears unless saved explicitly.


(Adrian Galdran) #9

Ok Vikrant, good to know, thanks!


(Avinash) #10

Hi Naman. Thanks a lot for putting it all together. I am relatively new to Colab. So if the steps you mentioned are followed, it gives us the fast.ai environment, which lasts for 12 hours, right? I should take the notebook you shared that contains the installation scripts, run the scripts, and then import the fast.ai notebooks and start executing them, is that correct? I tried following the steps and then imported the pascal notebook. When I run the first cell, it says it can’t find the fast.ai module. Am I missing anything here?


(Arnav) #11

@avinash3593 Every notebook you start up in Colab provides you with a different Virtual Machine. Make sure you add the installation scripts on top of the notebook you’re going to work with.
You could always !ls any_directory and poke around to see what is installed/downloaded and what’s not.
Also, since the VM resets every 12 hours or 1.5 idle hours, always keep track of the time your VM has been active using !uptime or other similar commands.
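
If you prefer a number over eyeballing `!uptime`, a small sketch like the following could do it. It assumes a Linux VM that exposes /proc/uptime and treats the 12-hour figure mentioned in this thread as the limit; the function names are made up for illustration, and VM uptime may not match your session age exactly.

```python
def read_uptime_seconds(path="/proc/uptime"):
    """Read the machine uptime in seconds (the first field of /proc/uptime on Linux)."""
    with open(path) as f:
        return float(f.read().split()[0])


def hours_left(uptime_seconds, limit_hours=12.0):
    """Hours remaining before the assumed limit is reached, never below zero."""
    return max(0.0, limit_hours - uptime_seconds / 3600.0)
```

In a notebook cell, `hours_left(read_uptime_seconds())` gives a rough idea of how long you have before the reset.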
Hope this helps.


(Avinash) #12

Thank you @keratin. I did try looking into what !ls shows. I was expecting the downloaded data and installed libs to be available for 12 hours. That wasn’t the case when I opened a new notebook. So I was wondering whether my connection to the existing VM had just been reset or whether every new notebook needs to be set up. Thanks for the clarification.


(Naman Bhalla) #13

Hi !
Make sure you have changed the runtime type of the pascal.ipynb also to GPU. In case you initialized the Setup notebook in a GPU backend, the files and other repos are installed for that particular backend. By default, a new notebook opens in a CPU only backend.
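
A quick way to confirm which kind of backend a notebook is on is to check whether the NVIDIA tools are visible. This is only a sketch and assumes that a GPU runtime ships the `nvidia-smi` command while a CPU-only one does not; `gpu_visible` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if `nvidia-smi -L` is available and exits successfully."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # tool not installed, most likely a CPU-only backend
    try:
        result = subprocess.run([exe, "-L"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If this returns False in the lecture notebook, change the Runtime Type to GPU before running anything else.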


(Naman Bhalla) #14

The data is actually never downloaded to your Drive but to the VM instance. So, it disappears and doesn’t take your Drive storage.


(Avinash) #15

This worked. When I opened a new notebook, its runtime was not GPU. Once I changed it, the downloaded data and installed packages worked fine in the new notebook. Thank you!


(Ibrahim El-Fayoumi) #16

thanks, I am downloading it now.


(vibhor sood) #17

Hello Naman,

Do you also have the updated FastAI_v2_setup.ipynb for pascal-multi.ipynb?

Thanks


(Mayank) #18

How can I download from here: http://files.fast.ai/data/ in a Colab notebook? wget isn’t working for this.


(Asif Imran) #19

You should add a few more details. Which dataset are you trying to download exactly (or the corresponding notebook)?

I use Colab quite frequently and have yet to come across any wget issues.

Best,
A


(Mayank) #20

I’m running lesson4 of Part 1 v2 on Colab and came across the imdb dataset. How can I download it from the given link into the notebook?


(Asif Imran) #21

I see. Looking at my old notes, these seemed to have done the trick for me. Give it a go …

!mkdir -p data/imdb/  
!wget http://files.fast.ai/data/aclImdb.tgz -P data/imdb/
!tar -xzf data/imdb/aclImdb.tgz -C data/imdb/
!rm -rf data/imdb/aclImdb.tgz
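
If wget itself is the part that misbehaves, roughly the same steps can be sketched in plain Python using only the standard library. `fetch_and_extract` is just an illustrative name, and it assumes the archive is a gzip-compressed tar file like the one above.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir, keep_archive=False):
    """Download a .tgz archive into dest_dir, unpack it there, and optionally delete it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)            # like: mkdir -p
    archive = dest / url.rsplit("/", 1)[-1]            # file name taken from the URL
    urllib.request.urlretrieve(url, str(archive))      # like: wget ... -P dest_dir
    with tarfile.open(archive, "r:gz") as tar:         # like: tar -xzf ... -C dest_dir
        tar.extractall(dest)
    if not keep_archive:
        archive.unlink()                               # like: rm
    return dest
```

Calling `fetch_and_extract("http://files.fast.ai/data/aclImdb.tgz", "data/imdb/")` should then mirror the four commands above.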

Good luck studying
A