Platform: Colab (Free; $10/month Pro)

I was able to use this nifty hack until I decided to open it from my Drive so I could save stuff :expressionless: (got greedy :smiley: ).
Now I can’t get my notebook to run.
I tried running from /content without mounting my Drive, as seen in the picture below:

It is stuck.


sigh

My card was declined for Colab Pro too (UK, but I used a US zip code). I tried a different card and it worked.

1 Like

Same issue. I did this (note the quotes around the version specifiers, otherwise the shell treats > as a redirect):
!pip install "fastai2>=0.0.11" graphviz ipywidgets matplotlib "nbdev>=0.2.12" pandas scikit_learn azure-cognitiveservices-search-imagesearch sentencepiece

I’d forgotten about this, but it helped.

Can anyone recommend the best options for storage on Colab/Drive?

I’ve got a lot of images, approx. 250,000, and I’m looking to increase that number substantially. I’ve got a few questions:

  • What’s the best/fastest way to upload them to Google Drive? Just upload via the web browser, or should I zip them, upload the zip, and then unzip with Python once it’s copied?
  • Is there a performance hit compared to using local storage? My model is currently running on a work machine, where the data sits on an SSD, but it’s a non-work project so I need to move it.
  • Is there any advantage to moving the files locally onto Colab when training my model, i.e. to /content or somewhere else other than Drive?
1 Like

Well, I’ve done some experiments and I think I’ve answered my own question. Using training data straight from Drive is horrendously slow, so the best option seems to be: zip it up, copy it to Drive, and then use the following to copy it from Drive, unzip it, and delete the zip. For about 25k images (1 GB zipped), it takes about 20 seconds, which is tolerable (so I’m looking at about 3 minutes for my full dataset).

! cp /content/drive/My\ Drive/Colab/data.zip .
! unzip -q data.zip
! rm data.zip
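
If you re-run cells a lot, a small guard saves redoing the copy/unzip every time. Just a sketch; the Drive path and extraction folder below are placeholders, adjust them to your setup:

from pathlib import Path

zip_src  = '/content/drive/My Drive/Colab/data.zip'   # placeholder: zip sitting on Drive
data_dir = Path('/content/data')                      # placeholder: local extraction target

# only copy and unzip if the data hasn't been extracted yet
if not data_dir.exists():
    data_dir.mkdir(parents=True)
    !cp "{zip_src}" /content/data.zip
    !unzip -q /content/data.zip -d "{data_dir}"
    !rm /content/data.zip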
1 Like

Does anyone know how to increase the size of the shared memory on Colab, i.e. the volume /dev/shm? Though they’ve upped it to about 5 GB, I’m doing something fairly intensive and it’s bombing out with the following error. Jeremy previously had a script to do it, but it seems to have gone:

RuntimeError: DataLoader worker (pid 137) is killed by signal: Bus

This is what I wrote for the last course (I have made changes so that it works with the current version of Colab).

# change shm size to the same as GPU memory; default to 12GB (K80 GPU)
colab_shm_size = '12g'
colab_fstab_path = '/etc/fstab'
# sed pattern 1: print every fstab line except an existing shm entry (drops any old size setting)
colab_shm_pat = '/^shm \/dev\/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=/!p'
# sed pattern 2: append ($a) a fresh shm entry with the new size at the end of the file
colab_shm_tab_pat = f'$ashm \/dev\/shm tmpfs rw,nosuid,nodev,noexec,relatime,size={colab_shm_size} 0 0'
# rewrite /etc/fstab in place, then remount /dev/shm using the updated entry
!sed -n -e '{colab_shm_pat}' -e '{colab_shm_tab_pat}' -i '{colab_fstab_path}'
!mount -T '{colab_fstab_path}' -o remount /dev/shm

It remounts shm but I have not tried it with large datasets/batches.
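
To check whether the resize actually took effect (this just reads the mount info for /dev/shm):

!df -h /dev/shm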

Other useful commands:
!cat /etc/mtab
!cat /etc/fstab
!mount -l
!df -a

If you experiment and mess up the filesystem, just factory reset the runtime.

6 Likes

EDIT: Tried it, and it works. Thanks @gamo :smiley:

Thanks @gamo, I think I tried this (I saw it in your previous post), but I was getting an issue with the second regex. I’ll give it another shot though :slight_smile: Strangely, my error seems to be intermittent.

Another tip for others using Colab: my model training was approx. 3x slower than on the work machine, which has similar GPUs, and I figured out that it’s basically down to the measly CPU spec that Colab provides. However, I managed to make it about 50% quicker by changing the runtime shape to High-RAM; not only does this increase the RAM, it also doubles the CPU count. Obviously this only works with Colab Pro though.

2 Likes

@gamo So is the /dev/shm mount RAM? If so, might it be more efficient to put my data on there? I’ve just tried it and my epoch time has gone down from about 7 minutes to just under 6, though it’s hard to get an accurate baseline due to the performance inconsistency I’ve been seeing on Colab.

shm stands for “shared memory”. It is used to pass data between programs.
tmpfs is temporary volatile storage, also known as a ramdisk.

If I were to guess, in this case it is used to pass data (batches) between the Python part and the CUDA (GPU) part of the trainer.

If you put your unprepared training data on the shm, you take up memory that should be used for sharing data.
The speedup you see could be because the training data does not need to be read from disk, which would make preprocessing faster. If you want to do it without speed or storage penalties, you should either resize the shm to the training data size plus the GPU RAM size, or create a separate tmpfs for the training data, as sketched below.
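
Something like this should do it for the second option (a sketch; the mount point, the 2g size, and the zip path are placeholders, size it to your own dataset):

# create a separate ramdisk just for the training data (placeholder mount point and size)
!mkdir -p /mnt/train_tmpfs
!mount -t tmpfs -o size=2g tmpfs /mnt/train_tmpfs
# copy the zipped data over and unpack it there (path from the earlier post; adjust to yours)
!cp /content/drive/My\ Drive/Colab/data.zip /mnt/train_tmpfs/
!unzip -q /mnt/train_tmpfs/data.zip -d /mnt/train_tmpfs && rm /mnt/train_tmpfs/data.zip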

Has anyone come across keyboard shortcuts to use in Colab that mimic a Jupyter notebook?

(Ctrl + M) is the prefix you need to add. So, where adding a cell in Jupyter is Esc followed by A, in Colab it will be Ctrl + M followed by A.

2 Likes

Not sure if it is a Colab-related issue, but when running the following cell in the 05_pet_breeds.ipynb file:

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

I get the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-13-ead0dd2a047d> in <module>()
      3                  splitter=RandomSplitter(seed=42),
      4                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
----> 5 pets1.summary(path/"images")

6 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53             storage = elem.storage()._new_shared(numel)
     54             out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 500 and 335 in dimension 2 at /pytorch/aten/src/TH/generic/THTensor.cpp:612

Any hint on how to solve that?

@wittmannf IIRC that’s supposed to happen :wink: (notice you don’t have a Resize anywhere in there, it’s supposed to show how dblock.summary() can be used for debugging)
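
For reference, adding an item transform that resizes everything to a common size (which the notebook does a couple of cells later) is what lets the batch collate. Something like this, assuming the notebook’s fastai imports:

pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(460))   # resize every item to one size so tensors can be stacked
pets1.summary(path/"images")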

1 Like

ohhh okay, thanks @muellerzr!! :sweat_smile: :sweat_smile:

Hi. I’m running into this problem when trying to git push in a Colab cell.

!git push origin master
fatal: could not read Username for ‘https://github.com’: No such device or address

Has anyone had this problem when running git push in Colab? Thanks

Can we have a bit more detail on your steps? Did you properly configure git with your username and password? :slight_smile:

1 Like

In short, the steps are below:

# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')

%cd '/content/drive/My Drive/dhoa.github.io'
!git config --global user.email "dienhoa.t@gmail.com"
!git config --global user.name "dienhoa"

!git add .
!git commit -m "first commit"
!git push origin master

I think I forgot the step where I put in my password.
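
For anyone hitting the same thing: a plain https push from a notebook can’t prompt for the username/password (hence the “could not read Username” error), so one fix is to put a personal access token into the remote URL first. A sketch, where the user/repo are placeholders guessed from the %cd path above, so swap in your own:

from getpass import getpass
token = getpass('GitHub token: ')   # personal access token, typed in so it isn't saved in the notebook source
# placeholder remote: replace user/repo with your actual origin
!git remote set-url origin https://dhoa:{token}@github.com/dhoa/dhoa.github.io.git
!git push origin master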

Automatically saving models to gdrive.

I was annoyed when Colab lost the session during training and I had to start from scratch, so I extended the SaveModelCallback to not only save the best model after each epoch but also tar it and copy it to my Google Drive (or any other bucket store).

In case someone has faced similar problems, I’m sharing the code snippet:
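
A rough sketch of the approach (it assumes fastai2’s SaveModelCallback/TrackerCallback attributes — fname, best, path, model_dir — and uses a placeholder Drive folder, so adjust as needed):

import tarfile
from pathlib import Path
from fastai2.callback.tracker import SaveModelCallback

class SaveModelToDrive(SaveModelCallback):
    "Save the best model as usual, then tar the weights and copy the archive to Google Drive."
    def __init__(self, drive_dir='/content/drive/My Drive/models', **kwargs):
        super().__init__(**kwargs)
        self.drive_dir = Path(drive_dir)              # placeholder Drive folder

    def after_epoch(self):
        prev_best = self.best                         # best metric so far (tracked by TrackerCallback)
        super().after_epoch()                         # saves f'{self.fname}.pth' whenever it improves
        if self.best != prev_best:                    # a new best model was just written (assumes every_epoch=False, the default)
            self.drive_dir.mkdir(parents=True, exist_ok=True)
            src = self.path/self.model_dir/f'{self.fname}.pth'
            with tarfile.open(self.drive_dir/f'{self.fname}.tar.gz', 'w:gz') as tar:
                tar.add(src, arcname=src.name)

Pass it like any other callback, e.g. learn.fit_one_cycle(5, cbs=SaveModelToDrive(monitor='valid_loss')).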

6 Likes

Hi ulat, hope all is well!
Great snippet.

Cheers mrfabulous1 :smiley: :smiley:

1 Like