Platform: Colab ✅

@ady_anr yes, you can. As the tutorial suggests, you can access root_dir using standard Python file system commands.
Example: to list the top level of your GDrive, just execute

import os
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
os.listdir(root_dir)

Now you can work with your GDrive files just like local files.
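os.listdir only shows the top level. If you want to find files anywhere under root_dir, a recursive listing with pathlib can help. A minimal sketch; the helper name list_drive_files is hypothetical, and it assumes the Drive is mounted at /content/gdrive as above:

```python
from pathlib import Path


def list_drive_files(root_dir, pattern="*"):
    """Return the paths (relative to root_dir) of all files below root_dir matching pattern."""
    root = Path(root_dir)
    # rglob walks every subfolder; directories themselves are filtered out
    return sorted(str(p.relative_to(root)) for p in root.rglob(pattern) if p.is_file())
```

For example, list_drive_files(root_dir, '*.ipynb') would list every notebook in your Drive. On a large Drive a full recursive walk can be slow, so pointing it at a subfolder is usually better.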
Hope this helps!


Yes, Colab currently doesn’t support all ipywidgets. For some workarounds, read this thread.


A new notebook means a new instance, so any data stored locally on the old instance is lost. To persist data you need to save it to GDrive and access it programmatically from there. Complete steps here: https://course.fast.ai/start_colab.html#step-4-saving-your-data-files
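To make the save-to-GDrive idea concrete, here is a minimal sketch that pickles a Python object into a Drive folder and reads it back, e.g. in a later session on a fresh instance. It assumes the Drive is mounted at /content/gdrive as in the guide; the fastai-v3 folder follows the guide's base_dir, and the helper names save_to_drive and load_from_drive are hypothetical. For model weights you would normally use learn.save instead.

```python
import pickle
from pathlib import Path

DRIVE_DIR = "/content/gdrive/My Drive/fastai-v3/"  # assumed mount point + guide's folder


def save_to_drive(obj, name, drive_dir=DRIVE_DIR):
    """Pickle obj to drive_dir/name (creating drive_dir if needed); return the file path."""
    target = Path(drive_dir)
    target.mkdir(parents=True, exist_ok=True)
    path = target / name
    with open(path, "wb") as f:
        pickle.dump(obj, f)
    return path


def load_from_drive(name, drive_dir=DRIVE_DIR):
    """Load an object previously written by save_to_drive."""
    with open(Path(drive_dir) / name, "rb") as f:
        return pickle.load(f)
```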

Has anybody found a solution for running Image_Cleaner on Colab?
Each time I run that line in the lesson 2 download notebook (.ipynb), the runtime restarts.
Stuck.

Does anyone have a solution to my problem?
When using Colab, rather than filling up my GDrive with lots of standard datasets, I want to use GDrive just for my model data.

So I want my setup to look like this

/content/gdrive/My Drive/my_project/models/…

  • model_files

/root/.fastai/data/…

  • mnist_sample
    • train
    • valid
    • labels.csv

I'm guessing that running the learner on data stored on the Colab instance is much faster. But I want my model saves to persist and be loadable again later.
Re-downloading the dataset when I need it doesn't seem like a big issue.

But it seems neither the Learner object nor its save/load methods let me specify a path.

Does anyone have a solution to this, either in how you set up your project for persistence or in how you save your learner models?

If you do, can you post the setup code and a short example workflow of a learner save/load?

NB: I'm guessing I can write something like the following:

instance_path = '/content/data/mnist_sample/models/'
gdrive_path = '/content/gdrive/My Drive/fastai-v3/data/'
!cp $instance_path/my_model.pth '$gdrive_path'
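If you prefer to avoid shell quoting (the space in "My Drive" is easy to get wrong with !cp), the same copy can be done in plain Python with shutil. A sketch reusing the two paths from the post; copy_model_to_drive is a hypothetical helper name, and it assumes the model file is named <name>.pth, as in the save code further down the thread:

```python
import shutil
from pathlib import Path

instance_path = Path('/content/data/mnist_sample/models')
gdrive_path = Path('/content/gdrive/My Drive/fastai-v3/data')


def copy_model_to_drive(name, src_dir=instance_path, dst_dir=gdrive_path):
    """Copy src_dir/<name>.pth into dst_dir (created if missing); return the new file's path."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps; it returns the destination file path
    return Path(shutil.copy2(Path(src_dir) / f"{name}.pth", dst_dir))
```

In the notebook, copy_model_to_drive('my_model') would then replace the !cp line.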

But is there a cleaner way of doing it with the built-in functions? E.g., being able to set a custom save location in the learner.save method, something like:

learn.save/load('model-name', path=custom_model_path)

Thx

I did some research and overloaded the save and load methods with this code.
Anyone care to comment? :slight_smile: @jeremy

# Assumes fastai v1 is already imported via `from fastai.vision import *`
# (that provides torch, PathOrStr, get_model, ifnone and defaults).
from pathlib import Path
from warnings import warn

def custom_path_save(self, name:PathOrStr, path='', return_path:bool=False, with_opt:bool=True):
    "Save model and optimizer state (if `with_opt`) as `name`.pth in `path`, or in `self.model_dir` if no `path` is given."
    # replaces the original line: path = self.path/self.model_dir/f'{name}.pth'
    # my addition: start
    if path == '': path = self.path/self.model_dir/f'{name}.pth'
    else: path = Path(path)/f'{name}.pth'
    # end
    if not with_opt: state = get_model(self.model).state_dict()
    else: state = {'model': get_model(self.model).state_dict(), 'opt': self.opt.state_dict()}
    torch.save(state, path)
    if return_path: return path

def custom_path_load(self, name:PathOrStr, path='', device:torch.device=None, strict:bool=True, with_opt:bool=None):
    "Load model and optimizer state (if `with_opt`) from `name`.pth in `path`, or in `self.model_dir` if no `path` is given, using `device`."
    if device is None: device = self.data.device
    # replaces the original line: state = torch.load(self.path/self.model_dir/f'{name}.pth', map_location=device)
    # my addition: start
    if path == '': path = self.path/self.model_dir/f'{name}.pth'
    else: path = Path(path)/f'{name}.pth'
    state = torch.load(path, map_location=device)
    # end
    if set(state.keys()) == {'model', 'opt'}:
        get_model(self.model).load_state_dict(state['model'], strict=strict)
        if ifnone(with_opt, True):
            # create_opt sets self.opt itself, so its return value is not needed
            if not hasattr(self, 'opt'): self.create_opt(defaults.lr, self.wd)
            try:    self.opt.load_state_dict(state['opt'])
            except Exception: pass
    else:
        if with_opt: warn("Saved file doesn't contain an optimizer state.")
        get_model(self.model).load_state_dict(state, strict=strict)
    return self

# bind the functions as methods of this particular learner instance
learn.save = custom_path_save.__get__(learn)
learn.load = custom_path_load.__get__(learn)
# if you don't want to overload
#learn.custom_path_save = custom_path_save.__get__(learn)
#learn.custom_path_load = custom_path_load.__get__(learn)


model_path = '/content/gdrive/My Drive/fastai-v3/data/'
learn.save('new-model-name', path=model_path)
learn.load('new-model-name', path=model_path)
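A possibly simpler alternative, not verified on every fastai version: the code above shows that fastai builds the save location as self.path/self.model_dir/f'{name}.pth'. With pathlib, joining an absolute path discards everything to its left, so setting learn.model_dir to an absolute GDrive path may be enough for the stock learn.save/learn.load to use it. This sketch only demonstrates the pathlib behavior; default_save_path is a hypothetical helper that mirrors that expression:

```python
from pathlib import Path


def default_save_path(learner_path, model_dir, name):
    """Mirror the expression self.path/self.model_dir/f'{name}.pth' from the save code."""
    return Path(learner_path) / model_dir / f"{name}.pth"


# With the default relative model_dir the file lands next to the data...
local = default_save_path("/root/.fastai/data/mnist_sample", "models", "stage-1")
# ...but an absolute model_dir replaces everything to its left when joined with /.
drive = default_save_path("/root/.fastai/data/mnist_sample",
                          "/content/gdrive/My Drive/fastai-v3/models", "stage-1")
```

In a notebook that would mean something like learn.model_dir = '/content/gdrive/My Drive/fastai-v3/models' before calling learn.save('stage-1'); create that folder first in case fastai does not.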

Hello everyone,

I’m trying to set up Google Drive as the default path for my data and models. As per the documentation, I set up the following:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'

Now, what changes should I make so that the data downloads into my Google Drive when I run:

path = untar_data(URLs.PETS); path

My data always goes into: PosixPath('/content/data/oxford-iiit-pet')


Hi Vikrant,

Here's the documentation and code for untar_data:
https://docs.fast.ai/datasets.html#untar_data

It is also accessible via doc(untar_data) in a regular Jupyter install of fastai (not Colab).

Have you tried the rest of the code at the bottom of the Colab setup page?
https://course.fast.ai/start_colab.html#step-4-saving-your-data-files

path = Path(base_dir + 'data/pets')
dest = path/folder  # 'folder' must be defined first, e.g. folder = 'black' in the lesson 2 notebook
dest.mkdir(parents=True, exist_ok=True)

Path creates a pathlib object that gives you access to folder methods such as mkdir().
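For example, the / operator joins path parts, and mkdir(parents=True, exist_ok=True) creates any missing folders without failing if they already exist, so rerunning the cell is safe. A sketch; make_class_dir is a hypothetical name, the data/pets layout comes from the snippet above, and 'black' is just an example folder name:

```python
from pathlib import Path


def make_class_dir(base_dir, folder):
    """Build base_dir/data/pets/<folder> with the / operator and create it on disk."""
    dest = Path(base_dir) / "data" / "pets" / folder
    # parents=True creates missing intermediate folders; exist_ok=True makes reruns harmless
    dest.mkdir(parents=True, exist_ok=True)
    return dest
```

With the guide's setup you would call make_class_dir(base_dir, 'black').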

does this help?

Hi,
How do I create directory structures in Colab and upload the text files for the images, as suggested in lesson 2?
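One way to do it, sketched under assumptions: build the folder with pathlib, pick the files with google.colab's files.upload() (which returns a dict mapping each filename to its contents as bytes), and write them into that folder. save_uploads and /content/data/bears are hypothetical names; the Colab-only part is shown as comments so the helper itself runs anywhere:

```python
from pathlib import Path


def save_uploads(uploaded, dest_dir):
    """Write {filename: bytes} pairs into dest_dir (created if needed); return the written paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in uploaded.items():
        path = dest / Path(name).name  # keep only the base name of each uploaded file
        path.write_bytes(content)
        written.append(path)
    return written


# In Colab only:
#   from google.colab import files
#   uploaded = files.upload()            # opens a file picker
#   save_uploads(uploaded, '/content/data/bears')
```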

Hi, @kadlugan

I have a similar problem, I followed the tutorial:
https://course.fast.ai/start_colab.html#step-4-saving-your-data-files

I added this at the beginning:

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'

and I imported:

from fastai.vision import *
from fastai.metrics import error_rate
bs = 64

then following the tutorial and your post I added:

path = untar_data(URLs.PETS); path
path = Path(base_dir + 'data/pets')
dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

and received this error:
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-8-41891b907d8d> in <module>()
      1 path = Path(base_dir + 'data/pets')
----> 2 dest = path/folder
      3 dest.mkdir(parents=True, exist_ok=True)
      4 path = untar_data(URLs.PETS); path

NameError: name 'folder' is not defined

I created a folder named fastai-v3 in my Drive, but I don't really understand where untar_data is saving the files.
How do I make it save to Google Drive,
or to a folder on my PC?

Thank you very much for your help!

dest = path/folder should be dest = path/'folder' if you want a subfolder literally named folder; otherwise the variable folder needs to be defined first.
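Both fixes side by side, as a minimal sketch (the base path follows the thread's base_dir; 'black' is a hypothetical class name):

```python
from pathlib import Path

path = Path("/content/gdrive/My Drive/fastai-v3/data/pets")

# Option 1: you literally want a subfolder called "folder", so use a string.
dest_literal = path / "folder"

# Option 2: folder is meant to be a variable, so define it before using it.
folder = "black"
dest_variable = path / folder
```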

@salvatore.r @vikbehal, I hope @gamo's fix worked for you. I found that the learn.save/load overload I posted a few posts above helped me keep my model saves on GDrive, while everything else stays on the free Colab instance.

If you are using standard datasets, keeping a GDrive copy of just the notebooks and saved weights is usually enough to learn and progress.
Saved weights for image recognition models can be around 250 MB.
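To check how big your saved weights actually are before copying them to GDrive, a quick sketch (weight_file_sizes_mb is a hypothetical helper; it counts MB as 10^6 bytes and only looks at .pth files directly inside the folder):

```python
from pathlib import Path


def weight_file_sizes_mb(models_dir):
    """Return {filename: size in MB} for every .pth file directly inside models_dir."""
    return {p.name: p.stat().st_size / 1e6 for p in sorted(Path(models_dir).glob("*.pth"))}
```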

If you are creating your own datasets, then keeping the whole project folder (images, weights, notebooks) on your GDrive is a good option. That said, I'm not sure whether you need to move your image sets to the Colab instance for performance. Anyone care to comment?

As you get more sophisticated, you will see practitioners start using paid cloud services like AWS EC2/S3 to store models, or building their own deep learning machine and keeping the datasets on it.

I am not sure of your tech experience (high | low), so I hope this advice hits the right level for you :slight_smile:

Thank you @kadlugan!

I did try, but the code above just creates the directory structure. How do I tell fastai that this path is where data and models should be saved?

I followed this post to change the configuration. It works if I run the notebook as-is, but as soon as I change the runtime to GPU, it goes back to the default fastai path.

Any guidance would be greatly appreciated.
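A likely reason it resets: changing the runtime type gives you a fresh VM, and anything outside your mounted Drive, including ~/.fastai/config.yml, is gone, so the configuration has to be rewritten in every new session. A sketch that does this at the top of the notebook, before importing fastai. It assumes fastai v1 reads the data_path and model_path keys from ~/.fastai/config.yml, which you should check against your installed version; write_fastai_config is a hypothetical helper:

```python
from pathlib import Path


def write_fastai_config(data_path, model_path, config_dir="~/.fastai"):
    """Write a minimal config.yml pointing fastai at data_path/model_path; return its path.

    Assumption: fastai v1 reads the data_path and model_path keys from this file.
    """
    cfg_dir = Path(config_dir).expanduser()
    cfg_dir.mkdir(parents=True, exist_ok=True)
    cfg = cfg_dir / "config.yml"
    # Quote the values so paths containing spaces (e.g. "My Drive") stay valid YAML.
    cfg.write_text(f'data_path: "{data_path}"\nmodel_path: "{model_path}"\n')
    return cfg
```

Keep in mind that pointing data_path at GDrive means the training data is read over the network.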

Gabriel, thank you. I did that. Now how do I tell fastai to download data to that path? Also, what changes should I make so that model data is saved to that path in Google Drive?

Did you rerun the notebook from start after you switched to GPU?

Yup! Did it work for you?

Having fastai work directly with GDrive is probably not a good idea. You have to treat data on GDrive like data on a NAS or other remote storage: if you try to run directly off GDrive, that data has to move over the network, and it will be slow.

Keep all data and models local on Colab while you are working, then use !cp or a Python library (such as shutil) to copy data and/or models to GDrive. It is best to wrap this in a function (def); that way you can include it in your learner and have it save the model to GDrive during training. If training has to run for a long time, this gives you running backups of your model.
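A minimal sketch of such a function using shutil. backup_to_drive is a hypothetical name; it adds a timestamp to the filename so earlier backups are not overwritten. Calling it from a fastai callback after each epoch is left out because the callback API differs between versions:

```python
import shutil
import time
from pathlib import Path


def backup_to_drive(src_file, drive_dir, stamp=None):
    """Copy src_file into drive_dir with a timestamp suffix so earlier backups are kept."""
    src = Path(src_file)
    dst_dir = Path(drive_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = dst_dir / f"{src.stem}-{stamp}{src.suffix}"  # e.g. stage-1-20190101-120000.pth
    shutil.copy2(src, dst)
    return dst
```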


@gamo @kadlugan

Thank you for your help, now I understood the general concept behind.

%reload_ext autoreload
%autoreload 2
%matplotlib inline

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'

from fastai.vision import *
from fastai.metrics import error_rate
bs = 64

path = untar_data(URLs.PETS); path
path = Path(base_dir + 'data/pets')
dest = path/"folder"
dest.mkdir(parents=True, exist_ok=True)

However, what this actually does is create a folder in my Google Drive, and for some reason I can't figure out, nothing gets downloaded into it. So I end up with an empty folder inside the fastai-v3 folder on my Drive.
I think it creates the new path but doesn't download the data into it.

It is not immediately obvious to me how you expect the code you wrote to do that. untar_data can be given paths for both the downloaded archive and the untarred data (the fname and dest arguments). See https://github.com/fastai/fastai/blob/master/fastai/datasets.py#L151

If you want to untar your data directly to gdrive you should do something like:

gd_dir = Path('/content/gdrive/My Drive/fastai-v3/pets/')
path = untar_data(URLs.PETS, dest=gd_dir)

As I said in a previous post, if you are just saving the data to GDrive, that is OK. But if you then use that data on Colab, Colab has to fetch it back from GDrive over the network before it can be used, and that is slow.

Instead, download all data and models locally to your Colab instance and use them there; then, when you want to save or back up your work, copy the whole project folder to GDrive.
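Copying the whole project folder can also be done from Python. A sketch with shutil.copytree, where sync_project_to_drive is a hypothetical name; dirs_exist_ok needs Python 3.8 or newer:

```python
import shutil
from pathlib import Path


def sync_project_to_drive(project_dir, drive_dir):
    """Copy the whole project folder into drive_dir, overwriting files already there."""
    # dirs_exist_ok=True lets repeated backups go into the same destination folder
    return Path(shutil.copytree(project_dir, drive_dir, dirs_exist_ok=True))
```

Files deleted locally are not removed from the Drive copy, so this is a one-way backup rather than a true mirror.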


Hi all,

Since about 30 minutes ago, I have not been able to use the fastai library. Despite following the setup guide, I get the error below when I run: from fastai.vision import *

VersionConflict: (fastprogress 0.1.18 (/usr/local/lib/python3.6/dist-packages), Requirement.parse('fastprogress>=0.1.19'))

Is anyone else experiencing this issue?
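A small sketch for checking whether the installed fastprogress meets the required version. importlib.metadata needs Python 3.8+ (the traceback shows Python 3.6, where pkg_resources would be the equivalent); meets_minimum and check_fastprogress are hypothetical helpers that only handle purely numeric versions like 0.1.19:

```python
from importlib.metadata import PackageNotFoundError, version


def _as_tuple(v):
    """Turn '0.1.19' into (0, 1, 19) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def meets_minimum(installed, required):
    return _as_tuple(installed) >= _as_tuple(required)


def check_fastprogress(required="0.1.19"):
    """Return True if fastprogress is installed at or above the required version."""
    try:
        installed = version("fastprogress")
    except PackageNotFoundError:
        return False
    return meets_minimum(installed, required)
```

If it returns False, upgrading with !pip install -U fastprogress and then restarting the runtime is the usual approach, though this is not confirmed for this exact error.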