New Tesla T4 available in Google Colaboratory!

Just that: we have a powerful GPU now available for free on Colab.


It is a very fast card, with 16GB of memory. It is roughly 4 times faster than the old K80, and about equivalent to an RTX 2070.


Do you know of a way to import a dataset directly into Google Colab, without storing it on Google Drive? I mean, if I run wget directly, I need twice the storage: once for the original tar file and once for the extracted files. Does this make any sense?
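One possible way around the double-storage problem is to unpack the archive while it is still downloading, so the tar file itself is never written to disk. The following is only a minimal sketch using the Python standard library. The helper names stream_extract and download_and_extract are hypothetical, and it assumes the server returns a plain or compressed tar stream.

```python
import tarfile
import urllib.request


def stream_extract(fileobj, dest_dir):
    """Unpack a (possibly compressed) tar stream into dest_dir.

    Mode "r|*" reads the archive sequentially, so the input only needs
    to support read(); no seeking and no on-disk copy of the archive.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as archive:
        # Only extract archives you trust: member paths are not vetted here.
        archive.extractall(dest_dir)


def download_and_extract(url, dest_dir):
    """Hypothetical helper: stream a remote archive straight into dest_dir."""
    with urllib.request.urlopen(url) as response:
        stream_extract(response, dest_dir)
```

This does not work for zip files, which need random access to the whole archive before they can be read.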

It makes sense. I do that all the time.

I actually created a new forum and found a solution for it. Can you clear up one doubt? Google Colab gives 50GB of disk space. If I have a dataset of, say, 70GB, then after mounting my Google Drive, how fast can I access my data using PyTorch dataloaders?

The forum I created.

The problem with Colab is the 1-core CPU, so the image dataloaders are slow.


I should have thought about that. Same problem with Kaggle kernels.

Well, one thing we can benefit from is downloading the zip files, extracting them in Colab itself, and then downloading only the relevant pieces.

Not bad, but 3 times slower on the Oxford-IIIT Pet dataset compared to my local machine (8-core Xeon, 16GB RAM, 1070 Ti 8GB).


It is the CPU, the T4 is faster than the 1070

True! Is there a way to change this? Or is that in a “paid tier” of Colab somewhere?

There is no paid tier of Google Colab. For that you have to use GCP.

Actually, the CPU on Colab currently has two cores:
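For anyone who wants to check the core count on their own session, here is a small sketch. The helper suggested_num_workers is hypothetical (it is not a fastai or PyTorch function), and using one worker per core is just a common starting point, not a tuned value.

```python
import os


def suggested_num_workers(reserve=0):
    """Return a dataloader worker count based on the visible CPU cores.

    `reserve` cores are left free for the main training process.
    Always returns at least 1.
    """
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, cores - reserve)


print("CPU cores visible to Python:", os.cpu_count())
print("Suggested num_workers:", suggested_num_workers())
```

The result could then be passed as the num_workers argument when building a PyTorch DataLoader.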


Is the forum you created private? I cannot access that link.

I forgot about that. I am posting the discussion I had over there. So this is what I asked.

So I have been struggling with dataset downloads, which take days. At the same time, Google Colab has come out with Tesla T4 GPUs, so I have come up with a weird plan to download data. Can someone who uses these tools guide me?

Main Goal:- Data should not be downloaded by me.

Solution:- I am considering upgrading my Google Drive storage to 100GB, and the plan is to make Google download the datasets for me.

How:- This is where I need help. One thing that comes to mind is to use curl or wget in Google Colab; maybe that will work. I have not tried that yet. Has anyone faced this issue, or does anyone know a workaround that I have not thought of?

EDIT:- Getting another internet connection is not an option based on my current circumstances.

And the solution that I came up with for it.
The only price you have to pay is for Google Drive storage.

How it works?
I show it using Google Colab, as that is where we need the data. First, some data limits. Google Colab gives you 50GB of disk space and 12GB of RAM for free. This storage is temporary: once the kernel is terminated, the space is gone. To get permanent storage we will use Google Drive, which gives 15GB of free storage. You can create multiple accounts if your datasets are small, as in NLP. But the 100GB plan for $2/month is an offer you should consider (with a 1-year subscription you also get 2 months free).
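Given these limits, it may be worth checking the free space before starting a big download. A minimal sketch, assuming it runs in the notebook itself. The helper names free_gb and fits_on_disk are hypothetical.

```python
import shutil


def free_gb(path="."):
    """Return free space in GB (10**9 bytes) on the filesystem holding path."""
    return shutil.disk_usage(path).free / 1e9


def fits_on_disk(required_gb, path="."):
    """Return True if at least required_gb of free space is available at path."""
    return free_gb(path) >= required_gb


print(f"Free space here: {free_gb():.1f} GB")
```

Remember that an archive plus its extracted contents both count against the free space.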

General Guideline

  • Download using wget in Google Colab
  • Move it to Google Drive

Code

  • To download, use this command in Google Colab
# You may have to work around this command for your own source, but when it worked
# I got around 15MB/sec download speed.
!wget 'your_url'

Note :- I am not sharing the actual wget command because of license issues. If you run ls, you get

!ls
# MRNet-v1.0.zip	sample_data
  • To move the downloaded zip file to Google Drive
    Run this cell and you will get a verification URL.
from google.colab import drive
drive.mount('/content/gdrive')

All of your Google Drive files can be found in the directory /content/gdrive/My Drive/

!ls /content/gdrive/My\ Drive/
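Once the drive is mounted, the zip still has to be copied over. A minimal sketch of that step with the standard library, assuming the mount point shown above. The helper copy_to_drive and the datasets folder name are hypothetical.

```python
import os
import shutil


def copy_to_drive(src_path, drive_dir):
    """Copy src_path into drive_dir and return the path of the new copy.

    drive_dir is created if it does not exist yet.
    """
    os.makedirs(drive_dir, exist_ok=True)
    return shutil.copy(src_path, drive_dir)
```

In Colab this could be called as copy_to_drive('/content/MRNet-v1.0.zip', '/content/gdrive/My Drive/datasets'). Copying rather than moving keeps a local copy on the Colab disk for the current session.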

Link I found useful

Well, there are other ways to move files between Drive and Colab, such as files.upload(). I have not used this method, so I cannot comment much on it.


Brilliant, 16 GB of pure awesomeness!

Tue Apr 23 20:39:07 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   44C    P8    16W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@tcapelle do you know when exactly this became available?

It started rolling out last week.

Also, if you load data from Google Cloud Storage buckets rather than Google Drive, it’s very fast. There has been some talk among forum members (not fastai staff) about making an unofficial fastai extension to load data from GCS, but I’m not sure where it’s up to.

@ThomM Were you able to mount the buckets as drives under colab?

I tried using Google Colab, but it was similar in speed to Kaggle kernels, even with to_fp16()! How can I improve the speed? I assume this has to do with data loading on the CPU, so is it possible to do this on the GPU?
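One rough way to test the data-loading assumption is to time how long the loop waits for each batch without running the model at all. This is only a sketch. The helper time_batches is hypothetical, and it accepts any iterable of batches, for example a PyTorch DataLoader.

```python
import time


def time_batches(loader, max_batches=50):
    """Return the average seconds spent waiting for each batch from loader.

    Only the time to produce a batch is measured; no model is run.
    Returns 0.0 if the loader yields nothing.
    """
    batches = iter(loader)
    waits = []
    for _ in range(max_batches):
        start = time.perf_counter()
        try:
            next(batches)
        except StopIteration:
            break
        waits.append(time.perf_counter() - start)
    return sum(waits) / len(waits) if waits else 0.0
```

If this number is close to the time of a full training step, the CPU side is likely the bottleneck, and a faster GPU or to_fp16() will not help much.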

Am I the only one who gets slower training with to_fp16() on the new Colab GPU?

I double-checked with show_install() and I have the T4 enabled. What could be the reason?

Sorry, I was imprecise. I should’ve said “it should be very fast”. I’ve heard from a reliable source at Google that the backend infrastructure is configured so that Colab and Cloud Storage have very high bandwidth, but no, personally I haven’t tried it. Sorry to be misleading.