Bengali AI Kaggle contest

(Venkateshwar Ragavan) #1

I am new to Kaggle and am taking part in the Bengali AI contest, but I have hit a roadblock. Training on Kaggle is proving very difficult: a Kaggle kernel cannot be inactive for an hour, and training ResNet-101 for 10 epochs takes about 10 hours, which is more than the total time a Kaggle kernel can run. I thought of using Colab instead, but I am stuck getting the 200K-image dataset onto Google Drive. I downloaded it to Google Drive as a zip file and am trying to extract it, but this fails because the Colab notebook often crashes under the sheer amount of data. Any suggestions?

0 Likes

(Zachary Mueller) #2

Are you running out of room specifically in Colab for the files?

0 Likes

(Venkateshwar Ragavan) #3

Memory doesn’t seem to be the problem. The private dataset is downloaded into my Google Drive by setting up the Kaggle API on Colab. It arrives as a zip file and is stored in my Google Drive. While extracting that zip file from Google Drive, the Colab notebook crashes with a timeout because of the large number of images. I have also tried to extract it manually, but that did not work either.

0 Likes

(Zachary Mueller) #4

Can you post the exact error it winds up throwing at you?

0 Likes

(Miguel) #5

If you are using fastai2, you can check this kernel I just shared on Kaggle :)
https://www.kaggle.com/mnpinto/fastai2-starter-lb0-9598

There are other good kernels using fastai v1 that you can look at to get a good baseline.

Regarding Colab, I’m not sure, but if you try smaller models, even a resnet50, the 9h limit on Kaggle should be enough to train a single fold.

2 Likes

(Vijayabhaskar J) #6

I think you’re unzipping the files directly from Google Drive. It is better to copy the zip file to local Colab storage first and then extract it there.
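
A minimal sketch of that copy-then-extract approach, using only the Python standard library. The function name `copy_and_extract` and the paths in the usage comment are hypothetical; adjust them to wherever your Drive is mounted and where the archive actually lives.

```python
import shutil
import zipfile
from pathlib import Path


def copy_and_extract(zip_on_drive, local_dir):
    """Copy a zip archive to local disk, then extract it there."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)

    # Copy the archive as a single file first, so the extraction itself
    # reads from and writes to local disk rather than the Drive mount.
    local_zip = Path(shutil.copy(zip_on_drive, local_dir))

    # Extract into a folder named after the archive (without ".zip").
    extract_dir = local_dir / local_zip.stem
    with zipfile.ZipFile(local_zip) as archive:
        archive.extractall(extract_dir)

    # Remove the local copy of the archive to free disk space;
    # the original on Drive is left untouched.
    local_zip.unlink()
    return extract_dir


# Hypothetical usage in a Colab cell (paths are assumptions):
# data_dir = copy_and_extract(
#     "/content/gdrive/My Drive/Colab_Data/Bengali/train.zip",
#     "/content/data",
# )
```

The extracted files live on the Colab VM's local disk, so they disappear when the session is recycled and the copy has to be repeated in a new session.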

2 Likes

(Amrit) #7

Not sure if you have already found a solution, but this is how I did it. Once you have downloaded the zip files from Kaggle into Colab, move them to your Google Drive (substituting your own paths):

import shutil
# Move the downloaded archive from Colab's local disk into the mounted Drive folder
shutil.move("/content/train.csv.zip", "/content/gdrive/My Drive/Colab_Data/Bengali/")

Once moved, cd into that directory and unzip:

!unzip train_image_data_0.parquet.zip

Here is a link to the code I used: https://github.com/asvcode/BengaliAI

0 Likes