Platform: Colab ✅

Does anyone have a script for benchmarking fastai on Colab already set up that I could make use of? (I made a little dummy task, but I don't trust the results; it's not a realistic test. An actual notebook would be a better test.)

Otherwise, does anyone have a one-click run-all notebook set up for lesson 1 they can share? i.e. all the bash and setup commands already done for Colab, no manual twiddling around with paths and setup?

I want to quickly benchmark my box against Colab just to see if my box is close enough to use instead of Colab.

Also, I'm having some Docker issues with shm and --ipc=host that I need a benchmark test to diagnose further.
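For concreteness, the sort of minimal timing script I have in mind (fastai v1; cnn_learner was called create_cnn in older releases; MNIST_SAMPLE is just a stand-in, the lesson 1 pets notebook would be a more realistic load):

import time
from fastai.vision import *

# rough wall-clock benchmark: one fine-tuning epoch of resnet34
path = untar_data(URLs.MNIST_SAMPLE)
data = ImageDataBunch.from_folder(path, bs=64)
learn = cnn_learner(data, models.resnet34, metrics=accuracy)

start = time.time()
learn.fit_one_cycle(1)
print(f'1 epoch took {time.time() - start:.1f}s')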

I have the same issue while running ImageCleaner on Google Colab, and I have quite a large dataset for fungi image classification.

It disconnects, and it's always busy and stuck on that step. I'm already using the GPU runtime.

Tutorial here: https://course.fast.ai/start_colab.html#step-4-saving-your-data-files

They are stored in the directory where your data is located; a sub-folder named models will be created.
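For example, with a learner built on data at path (a minimal sketch, assuming the fastai v1 defaults):

learn.save('stage-1')   # writes {path}/models/stage-1.pth
learn.load('stage-1')   # loads the same file back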

@ady_anr yes, you can. As the tutorial suggests, you can access root_dir using the standard Python file system commands.
Example:
To view ALL your GDrive files, just execute

import os
from google.colab import drive

drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
os.listdir(root_dir)

Now you can play around with your GDrive files, just like your local files.
Hope this helps!
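For instance, anything you write under root_dir survives the instance being recycled (the file name here is just an example):

# quick persistence check: this file will still be on Drive next session
with open(root_dir + 'persistence_check.txt', 'w') as f:
    f.write('written from Colab')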

Yes, Colab currently doesn’t support all ipywidgets. For some workarounds, read this thread.

A new notebook means a new instance, hence all data will be lost. To persist data you need to save it in GDrive and programmatically access it from there. Complete steps here: https://course.fast.ai/start_colab.html#step-4-saving-your-data-files
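For model weights that usually means something like this sketch (the paths are examples, not from the tutorial):

import shutil

# persist: copy saved weights out to GDrive before the instance dies
src = '/content/data/oxford-iiit-pet/models/stage-1.pth'
dst = '/content/gdrive/My Drive/fastai-v3/models/stage-1.pth'
shutil.copy(src, dst)

# restore: in the next session, after re-mounting the drive and
# re-downloading the dataset, copy the weights back
shutil.copy(dst, src)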

Has anybody found a solution to running ImageCleaner on Colab?
Each time I try running that line in the lesson 2 download notebook, the runtime gets restarted.
Stuck.

Does anyone have a solution to my problem?
When using Colab, rather than filling up my GDrive with lots of standard datasets, I want to use GDrive just for my model data.

So I want my setup to look like this:

/content/gdrive/My Drive/my_project/models/…

  • model_files

/root/.fastai/data/…

  • mnist_sample
    • train
    • valid
    • labels.csv

I am guessing running the learner on data stored on the Colab instance's file system is much faster. But I want my model saves to persist and to be able to be loaded again.
It doesn't seem like a big issue to re-download the dataset when you need it.

But it seems a Learner object, or its save/load methods, won't allow me to specify a path.

Does anyone have a solution to this?

Either in how you set up your project for persistence, or in how you save out your learner models?

If you have a solution to this, can you post the setup code, and a short example workflow of a learner save/load?

NB: I am guessing I can write something like the following,

instance_path = '/content/data/mnist_sample/models'
gdrive_path = '/content/gdrive/My Drive/fastai-v3/data/'
!cp '$instance_path/my_model.pth' '$gdrive_path'

but is there a cleaner way of doing it with the built-in functions? E.g. the ability to set a custom save location using the learner.save method:

learn.save/load('model-name', path=custom_model_path)

Thx

I did some research and overloaded the save and load methods with this code.
Anyone care to comment? :slight_smile: @jeremy

# assumes the fastai v1 star imports, which put PathOrStr, get_model,
# ifnone, defaults, warn and torch in scope
def custom_path_save(self, name:PathOrStr, path='', return_path:bool=False, with_opt:bool=True):
    "Save model and optimizer state (if `with_opt`) with `name` to `self.model_dir`."
    # delete #  path = self.path/self.model_dir/f'{name}.pth'
    # my addition: start
    if path=='': path = self.path/self.model_dir/f'{name}.pth'
    else: path = f'{path}/{name}.pth'
    # end
    if not with_opt: state = get_model(self.model).state_dict()
    else: state = {'model': get_model(self.model).state_dict(), 'opt': self.opt.state_dict()}
    torch.save(state, path)
    if return_path: return path

def custom_path_load(self, name:PathOrStr, path='', device:torch.device=None, strict:bool=True, with_opt:bool=None):
    "Load model and optimizer state (if `with_opt`) `name` from `self.model_dir` using `device`."
    if device is None: device = self.data.device
    # delete # state = torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device)
    # my addition: start
    if path=='': path = self.path/self.model_dir/f'{name}.pth'
    else: path = f'{path}/{name}.pth'
    state = torch.load(path, map_location=device)
    # end
    if set(state.keys()) == {'model', 'opt'}:
        get_model(self.model).load_state_dict(state['model'], strict=strict)
        if ifnone(with_opt, True):
            if not hasattr(self, 'opt'): opt = self.create_opt(defaults.lr, self.wd)
            try: self.opt.load_state_dict(state['opt'])
            except: pass
    else:
        if with_opt: warn("Saved file doesn't contain an optimizer state.")
        get_model(self.model).load_state_dict(state, strict=strict)
    return self

# bind the functions as methods on this learner instance
learn.save = custom_path_save.__get__(learn)
learn.load = custom_path_load.__get__(learn)
# if you don't want to overload:
#learn.custom_path_save = custom_path_save.__get__(learn)
#learn.custom_path_load = custom_path_load.__get__(learn)


model_path = '/content/gdrive/My Drive/fastai-v3/data/'
learn.save('new-model-name', path=model_path)
learn.load('new-model-name', path=model_path)
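A lighter-weight alternative might be to skip the overrides entirely: since save/load build the path as self.path/self.model_dir/'{name}.pth', and pathlib ignores the left operand when the right one is an absolute path, pointing model_dir at GDrive should redirect saves there (an untested sketch; the paths are examples):

from pathlib import Path

gdrive_models = Path('/content/gdrive/My Drive/fastai-v3/models')
gdrive_models.mkdir(parents=True, exist_ok=True)

# self.path / absolute_path == absolute_path, so saves land on GDrive
learn.model_dir = gdrive_models
learn.save('new-model-name')
learn.load('new-model-name')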

Hello everyone,

I'm trying to set up Google Drive as the default path for my data and models. As per the documentation I set up the following:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'

Now, what changes should I make so that the data downloads into my Google Drive when I run:

path = untar_data(URLs.PETS); path

My data always goes into: PosixPath('/content/data/oxford-iiit-pet')

Hi Vikrant,

Here's the documentation for untar_data:
https://docs.fast.ai/datasets.html#untar_data

accessible via doc(untar_data) in a traditional Jupyter install of fastai (not Colab).
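If I'm reading the signature right, untar_data also takes a dest argument, so you may be able to extract straight to GDrive (untested sketch; check doc(untar_data) for how dest is interpreted in your fastai version):

# extract under the mounted drive instead of /root/.fastai/data
path = untar_data(URLs.PETS, dest=base_dir + 'data')
print(path)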

Have you tried the rest of the code at the bottom of the Colab setup page?
https://course.fast.ai/start_colab.html#step-4-saving-your-data-files

path = Path(base_dir + 'data/pets')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

the Path call creates a pathlib Path object that gives you access to folder and file methods

does this help?

Hi,
How do I create directory structures in Colab and upload the text files for the images, as suggested in lesson 2?
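One way that should work (a sketch; the folder names are just examples):

from pathlib import Path
from google.colab import files  # Colab's built-in upload widget

# create the lesson 2 style folder structure
dest = Path('/content/data/bears/black')
dest.mkdir(parents=True, exist_ok=True)

# files.upload() opens a file picker and writes the chosen files
# into the current working directory; then move them into place
uploaded = files.upload()
for name in uploaded:
    Path(name).rename(dest/name)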

Hi, @kadlugan

I have a similar problem; I followed the tutorial:
https://course.fast.ai/start_colab.html#step-4-saving-your-data-files

I added this at the beginning:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'

and I imported:

from fastai.vision import *
from fastai.metrics import error_rate
bs = 64

then, following the tutorial and your post, I added:

path = untar_data(URLs.PETS); path
path = Path(base_dir + 'data/pets')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

and received this error:
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-8-41891b907d8d> in <module>()
      1 path = Path(base_dir + 'data/pets')
----> 2 dest = path/folder
      3 dest.mkdir(parents=True, exist_ok=True)
      4 path = untar_data(URLs.PETS); path

NameError: name 'folder' is not defined

I made a folder in Drive named fastai-v3. I don't quite understand where untar_data is saving the files.
How can I make it save to Google Drive, or to a folder on my PC?

Thank you very much for your help.

dest = path/folder should be dest = path/'folder' if folder is meant to be a literal string; otherwise the variable folder needs to be defined first.
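In other words (the folder name here is just an example):

path = Path(base_dir + 'data/pets')
folder = 'images'                       # define the variable...
dest = path/folder                      # ...or write path/'images' directly
dest.mkdir(parents=True, exist_ok=True)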

@salvatore.r @vikbehal hope @gamo's fix worked for you. I found that the overloaded learn.save/load code I posted a few posts above helped me split up my model saves, whilst retaining the free fastai Colab functionality.

If you are using standard datasets, the notebooks and saved weights are usually enough to learn and progress, and you can keep a GDrive copy of those.
Saved weights can be in the 250 MB range for image recognition.

If you are creating your own datasets, then having all of the folders (images, weights, notebooks) on your GDrive is a good option. Though I am not sure if you need to shift your image sets to the Colab instance for performance; anyone care to comment?
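If GDrive I/O does turn out to be a bottleneck, one rough workaround is to keep the master copy on GDrive but copy it onto the instance's local disk before training (the paths are examples):

# GDrive is network-backed, so local disk should be faster for training reads
!mkdir -p /content/data
!cp -r '/content/gdrive/My Drive/fastai-v3/data/pets' /content/data/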

As you get more sophisticated you will see that practitioners begin using paid cloud services like AWS EC2/S3 for storing models, or build their own DL computer and keep the datasets on it.

I am not sure of your tech experience (high | low), so I hope this advice hits the right level for you :slight_smile:

Thank you @kadlugan!

I did try, but the above will just create the directory structure. How do I tell fast.ai that the mentioned path is where data and models should be saved?

I followed this post to change the configuration. It works if I run the notebook as-is, but as soon as I change the runtime to GPU, it goes back to the default fast.ai path.

Any guidance will be greatly appreciated.

Gabriel, thank you. I did that. Now how do I tell fast.ai to download data to that path? Also, what changes should I make so that model data is saved in Google Drive, i.e. at that path?

Did you rerun the notebook from start after you switched to GPU?
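Switching runtimes gives you a brand-new VM, so any configuration change has to be re-applied each session. Something like this at the top of the notebook, after mounting the drive, should do it (the key names are my assumption from a fastai v1 config.yml; verify against the one generated on your instance):

from pathlib import Path

# rewrite fastai's config each session so data and models land on GDrive
cfg = Path('/root/.fastai/config.yml')
cfg.parent.mkdir(parents=True, exist_ok=True)
cfg.write_text(
    "data_archive_path: /content/gdrive/My Drive/fastai-v3/data\n"
    "data_path: /content/gdrive/My Drive/fastai-v3/data\n"
    "model_path: /content/gdrive/My Drive/fastai-v3/models\n"
)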

Yup! Did it work for you?