Platform: Colab ✅

@mrfabulous1 that sounds to me like you’re purposefully trying to play to the test set, which isn’t what we want :wink: The test set in question was a random subsample of 10% of my data, which I realized was the wrong way to do it for this kind of tabular problem. I instead needed to do what Jeremy does in lesson 7 with Rossmann and time-series-like data. Thanks though!!! :slight_smile:

1 Like

Hi muellerzr, thanks for your comments; they will put me in a good position for when I get to lesson 7.

Cheers mrfabulous1 :smiley::smiley:

Yesterday I published a detailed description on how to set up Google Colab and have it sync with your drive, both for development and storing datasets.
When using the Drive desktop app for syncing you can even write your scripts locally and immediately use them in your Colab notebooks. I find that especially useful for part 2 of the course. Anyway, here it is:

“Setting up Google Colab for DeepLearning experiments” by Oliver Müller

https://link.medium.com/y9PP1SkYQY

Please let me know if you’re having trouble with any of the steps, I’ll see what I can do then :slight_smile:
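For anyone who just wants the gist, mounting your Drive inside a Colab notebook is only a couple of lines (a minimal sketch; the mount point is just the usual Colab convention):

from google.colab import drive

# Mount Google Drive; Colab will prompt you to authorize access the first time
drive.mount('/content/gdrive')

# Everything in "My Drive" is now reachable under this path
base_dir = '/content/gdrive/My Drive/'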

2 Likes

Hi, I went through lesson 1 and used my own dataset by importing images into Google Drive using Colab. I am curious to know if there is a way to upload and use the images directly from the local machine (without importing the images anywhere, Colab or Drive — this is needed for security reasons). Thanks in advance!

1 Like

Hi Pradi, if there is a way I would like to know too.

It is my understanding that many people have their own GPU so they can train models without using a service provider such as Colab.

I used my MacBook Pro to complete lesson 1.
Unfortunately Macs are not fully supported, so it doesn’t use the GPU.

Its specifications are:

Model Name: MacBook Pro
Model Identifier: MacBookPro13,3
Processor Name: Intel Core i7
Processor Speed: 2.6 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB

However, each epoch takes approximately 40 minutes to complete using the pets dataset.

If security is an issue then you could probably do this too.
I have created models and then gone to bed, or worked on something else, while they trained.

If no one comes up with an alternative this could be a way forward.

Cheers mrfabulous1 :smiley::smiley:

Do you need to manually download all the course files from Github and save them in your drive in order to get the Lesson 0 notebook to work (specifically, the cat example)? The only way I’ve managed to get that line to work is by running the code to mount My Drive as the base_dir, then changing the path in the open() method to match where the image is stored in My Drive (because I downloaded the entire fastai-v3 course repo and saved it).

However, I haven’t seen anyone else mention that they needed to clone the repo from Github and save it in My Drive (and it’s also not mentioned in the tutorial) so I’m afraid I’ve done something wrong. Would anyone be able to help me?

Hi pratyushmn hope all is well!

I did something similar to what you did. I saved the individual cat file in my Gdrive then changed the path in the open statement.

I didn’t download the whole repository as I only needed the single file.
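In case it helps anyone following along, the pattern looks roughly like this (a sketch only; the folder and file names are hypothetical — point them at wherever you saved the image in your Drive):

from google.colab import drive
from fastai.vision import open_image

drive.mount('/content/gdrive')

# Hypothetical path: change it to match where you copied the cat image in My Drive
img = open_image('/content/gdrive/My Drive/fastai-v3/cat_example.jpg')
img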

You’re not doing anything wrong. It happens because when you use Colab you just download the notebook you need at the time.

As you go through the course you will probably make more little tweaks to help things go smoothly.

To me the only important thing, as Jeremy said, is to get through the notebooks and write code so you understand them.

Cheers mrfabulous1 :smiley::smiley:

This works well. Where do you store all this data from other sources when you are running out of space?

Can this be used to store Kaggle datasets on Google Drive? How do you direct them to the drive?

Yes, this is possible. First you download the Kaggle dataset to your local machine. From there you upload it to your Google Drive.

If you mount the drive as described in my post, you can access any folder that is part of the drive. Since your training data is part of the drive, you just have to figure out the path. After you have mounted your drive, you can open the panel on the left (in Colab) to browse your files in the file tree. Let me know if you need help setting this up :slight_smile:
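A rough sketch of what that looks like once the dataset is on your Drive (the folder names here are just examples):

from google.colab import drive
from pathlib import Path

drive.mount('/content/gdrive')

# Hypothetical example: a Kaggle dataset uploaded to a "datasets" folder in My Drive
path = Path('/content/gdrive/My Drive/datasets/my-kaggle-dataset')
print(list(path.iterdir()))  # confirm the files are visible before building your DataBunch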

1 Like

My 2 cents: one thing I’ve found is that if you instead copy the download link when you download the dataset and wget it in Colab, it will typically download much faster (since it uses Google’s servers); then you can mv or cp the folder to your Google Drive with a bash command. If you want or need a visualization of this I can post that later on today.
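Something like this, in a Colab cell (a sketch; the URL and folder names are placeholders):

# Download straight to the Colab VM (fast, since it uses Google's bandwidth)
!wget -O dataset.zip "https://example.com/path/to/dataset.zip"
!unzip -q dataset.zip -d data

# Then copy the extracted folder into your mounted Drive if you want to keep it
!cp -r data "/content/gdrive/My Drive/datasets/"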

You’re right. Loading directly from the drive is really slow. I’ve been working with univariate timeseries lately. They’re so small in size that I kind of forgot about that problem…

Copying the data from drive to the notebook (basically what you just described) works though.
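i.e. something along these lines (the paths are placeholders):

# Copy the dataset from the mounted Drive onto the VM's local disk before training
!cp -r "/content/gdrive/My Drive/datasets/my-dataset" /content/data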

Problem: the Colab session times out after 12 hours, but the model requires 20 hours. What is the solution?

Hi everyone hope all is well!
In https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-imdb.ipynb one of the training snippets takes 1 hour 55 minutes to complete (with GPU enabled).

The snippet learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7)) takes approximately 20 hours. Google Colab’s resource limit is set at 12 hours, which means the session is terminated before it has finished. (frustrating :frowning_face:)

Any ideas how to resolve this problem?

Reading the forum, possible solutions could be:

  1. checkpoints
  2. callbacks
  3. run learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7)), saving and reloading after each epoch (see the sketch below). I haven’t got my code to work so far.
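For option 3, the idea would be something like this (an untested sketch; the checkpoint names are arbitrary):

# Train one epoch at a time so progress survives a session timeout
learn.fit_one_cycle(1, 1e-3, moms=(0.8,0.7))
learn.save('imdb_epoch_1')   # written under learn.path/models/

# In a fresh session, rebuild the learner the same way, then:
learn.load('imdb_epoch_1')
learn.fit_one_cycle(1, 1e-3, moms=(0.8,0.7))
learn.save('imdb_epoch_2')

# Note: this restarts the one-cycle schedule each epoch, so it is not
# identical to a single 10-epoch cycle.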

Any solutions, tips or ideas greatly appreciated.

mrfabulous1 :smiley::smiley:

Hey @mrfabulous1! I’d do checkpoints and model saves every x epochs, and save those checkpoints into your Google Drive. Otherwise, for language models I usually go to Paperspace for a few hours…
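Something along these lines (a sketch, assuming the Drive is already mounted; the path and checkpoint name are placeholders):

from fastai.callbacks import SaveModelCallback

# Point the learner's model directory at the mounted Drive so checkpoints survive a timeout
learn.model_dir = '/content/gdrive/My Drive/models'

# every='epoch' writes a checkpoint after each epoch instead of only the best one
learn.fit_one_cycle(10, 1e-3, moms=(0.8,0.7),
                    callbacks=[SaveModelCallback(learn, every='epoch', name='lm_checkpoint')])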

1 Like

Cheers muellerzr, I will learn checkpoints tomorrow; it’s 4:30 am here. Zzz
Many Thanks mrfabulous1 :smiley::smiley:

1 Like

Problem: the Colab session times out after 12 hours, but the model requires 20 hours. What is the solution?

To solve this problem I have created a command using callbacks (https://docs.fast.ai/callbacks.html). To test that it works I have used the multilabel example from https://docs.fast.ai/tutorial.data.html#A-multilabel-problem

learn = cnn_learner(data, models.resnet18, callback_fns=[CSVLogger])
learn.fit_one_cycle(30, 1e-2, callbacks=[ShowGraph(learn), SaveModelCallback(learn, monitor='train_loss', mode='min', name='mini_train_30_best_model')])

The command saves a list of the epoch results to a CSV file, and a model is also saved to a file after every epoch.




How can I change the command above to stop at epoch 28, when the training loss becomes less than the validation loss?

I have tried using other values such as error_rate but I get the following error, and am not sure how to change the command to achieve the result I require.

/anaconda/envs/fastai_uvicorn_0_7_1/lib/python3.6/site-packages/fastai/callbacks/tracker.py:50: UserWarning: <class 'fastai.callbacks.tracker.SaveModelCallback'> conditioned on metric error_rate which is not available. Available metrics are: train_loss, valid_loss
warn(f'{self.__class__} conditioned on metric {self.monitor} which is not available. Available metrics are: {", ".join(map(str, self.learn.recorder.names[1:-1]))}')

Thanks in advance mrfabulous1 :smiley::smiley:

Typically it is better to save the model with the best valid_loss. Then, with early stopping, it seems like you would be done much earlier.
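For what it’s worth, combining both suggestions might look roughly like this (an untested sketch; the patience value and checkpoint name are arbitrary):

from fastai.callbacks import SaveModelCallback, EarlyStoppingCallback

learn.fit_one_cycle(30, 1e-2, callbacks=[
    # keep the checkpoint with the lowest validation loss
    SaveModelCallback(learn, monitor='valid_loss', name='mini_train_best_model'),
    # stop once valid_loss has not improved for a few epochs
    EarlyStoppingCallback(learn, monitor='valid_loss', patience=3),
])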

1 Like

Cheers ilovescience, I changed the command to monitor valid_loss and removed mode='min', and it now stops at epoch 13.

Thank you very much!

mrfabulous1 :smiley::smiley:

1 Like

Hi,
Need some help with the lr_find function on Colab. I am trying to do lesson 2 of the fast.ai course - https://course.fast.ai/videos/?lesson=2
This involves collecting URLs of images from the internet and running a CNN to categorize them.

I picked 3 items - forks, ladles and spoons (attached). I ran the image download, then stored and verified the images. This is all fine.

Ran fit_one_cycle on a CNN with it, which gave some numbers.

The next steps ask you to unfreeze the model, and run a learning rate finder through it. Here is where I get stuck. Instead of giving me some numbers and a graph it gives me #na#.

I’ve tried passing explicit start and end learning rates.

I’ve also removed those parameters, and tried just lr_find()
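For context, the cells I’m running look roughly like this (a sketch of the lesson 2 steps as I understand them):

learn.unfreeze()
learn.lr_find()
# as far as I understand, the graph is produced by the recorder
# rather than by lr_find itself
learn.recorder.plot()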

Please help :slight_smile:

1 Like