Is anybody working on the Cdiscount Kaggle competition?

I am trying to train on the images using the from-folder approach, but every time my kernel dies after about 1% of the training. If you are working on this competition, did you run into this problem? Did you do any pre-processing on the images? The images are 180x180, so they are not particularly big. I have the latest fastai code, and I am running only this single notebook on a Titan X with 12 GB of memory. I have tried rebooting my machine, but the problem persists.

Here is the relevant code:

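from fastai.conv_learner import *  # fastai 0.7-era API, as used throughout this thread

sz = 180
arch = resnet50
bs = 2

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
# PATH is the dataset root: train/ and valid/ with one subfolder per class, plus test/
data = ImageClassifierData.from_paths(PATH, tfms=tfms, test_name='test', bs=bs)

learn = ConvLearner.pretrained(arch, data, precompute=True)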

Every time, training starts, but when it reaches around 1% the kernel just dies. The same thing happens with precompute=False. I have tried bs = 2, 16, 32 and 64, but the kernel still dies. By contrast, other notebooks such as lesson1-rxt50.ipynb run smoothly.

I had the same issue using ImageClassifierData.from_paths and running on AWS.


  • I used no transformations

  • The kernel would die while running the learning rate finder

Smaller batch sizes and less-complex architectures failed to resolve the issue.

The competition deadline did not make it feasible to continue troubleshooting.


Just documenting what is happening for me, both so that I don’t forget and for anyone else who runs into this problem:

  1. If I run the training in the fastai environment, training never starts, and eventually I get a CUDA out-of-memory error.
  2. If I run it in my own environment, training starts, but the kernel dies after about 1% of training.
  3. In my own environment, if I comment out the from .dataloader import DataLoader line from file and add from import DataLoader, Dataset, TensorDataset in the file, training finally works.
  4. But then, when I try to predict on the test data with tta = learn.TTA(is_test=True), the following error occurs:
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-b9aab6f76bfd> in <module>()
----> 1 tta = learn.TTA(is_test=True)

~/fast_ai_fellowship/fastai/courses/dl1/fastai/ in TTA(self, n_aug, is_test)
    260         dl1 =     if is_test else
    261         dl2 = if is_test else
--> 262         preds1,targs = predict_with_targs(self.model, dl1)
    263         preds1 = [preds1]*math.ceil(n_aug/4)
    264         preds2 = [predict_with_targs(self.model, dl2)[0] for i in tqdm(range(n_aug), leave=False)]

~/fast_ai_fellowship/fastai/courses/dl1/fastai/ in predict_with_targs(m, dl)
    116     if hasattr(m, 'reset'): m.reset()
    117     res = []
--> 118     for *x,y in iter(dl): res.append([get_prediction(m(*VV(x))),y])
    119     preda,targa = zip(*res)
    120     return to_np(, to_np(

~/fast_ai_fellowship/fastai/courses/dl1/fastai/ in __next__(self)
    222         if self.i>=len(self.dl): raise StopIteration
    223         self.i+=1
--> 224         return next(
    226     @property

~/anaconda3/lib/python3.6/site-packages/torch/utils/data/ in __next__(self)
    193         while True:
    194             assert (not self.shutdown and self.batches_outstanding > 0)
--> 195             idx, batch = self.data_queue.get()
    196             self.batches_outstanding -= 1
    197             if idx != self.rcvd_idx:

~/anaconda3/lib/python3.6/multiprocessing/ in get(self)
    335             res = self._reader.recv_bytes()
    336         # unserialize the data after having released the lock
--> 337         return _ForkingPickler.loads(res)
    339     def put(self, obj):

~/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/ in rebuild_storage_fd(cls, df, size)
     68         fd = multiprocessing.reduction.rebuild_handle(df)
     69     else:
---> 70         fd = df.detach()
     71     try:
     72         storage = storage_from_cache(cls, fd_id(fd))

~/anaconda3/lib/python3.6/multiprocessing/ in detach(self)
     56             '''Get the fd.  This should only be called once.'''
     57             with _resource_sharer.get_connection(self._id) as conn:
---> 58                 return reduction.recv_handle(conn)

~/anaconda3/lib/python3.6/multiprocessing/ in recv_handle(conn)
    180         '''Receive a handle over a local connection.'''
    181         with socket.fromfd(conn.fileno(), socket.AF_UNIX, socket.SOCK_STREAM) as s:
--> 182             return recvfds(s, 1)[0]
    184     def DupFd(fd):

~/anaconda3/lib/python3.6/multiprocessing/ in recvfds(sock, size)
    159             if len(ancdata) != 1:
    160                 raise RuntimeError('received %d items of ancdata' %
--> 161                                    len(ancdata))
    162             cmsg_level, cmsg_type, cmsg_data = ancdata[0]
    163             if (cmsg_level == socket.SOL_SOCKET and

RuntimeError: received 0 items of ancdata

This is a known error (I found it in this issue), but the ulimit solution suggested there isn’t working for me. In all cases I have the latest fastai code, and the fastai environment is updated to the latest version. I have tried rebooting, but all of the problems persist.
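Assuming the ulimit fix from that issue means raising the per-process limit on open file descriptors (`ulimit -n`), here is a minimal sketch of doing the same from inside the notebook with Python’s standard `resource` module (Unix only). The 4096 target is an arbitrary choice, and the change only affects the current process and child processes started after it, such as new DataLoader workers.

```python
import resource

def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE limit toward `target`, like `ulimit -n target`.

    Only the current process (and children started afterwards) are affected.
    Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)

print(raise_open_file_limit())
```

If raising the limit does not help, PyTorch also offers `torch.multiprocessing.set_sharing_strategy('file_system')`, which shares tensors between worker processes by file name instead of passing file descriptors; that is a separate workaround not discussed in this thread.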

Try upgrading to PyTorch 0.3; it just came out this evening.


Hi @jeremy, I tried it with the new PyTorch version and still get exactly the same problems. At the moment I am unable to run inference on any image test set, and I’m not sure what to do next.

What if you split the test set into a few smaller folders? Does it work with a smaller folder?
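One way to try this, sketched with the standard library and assuming the test images sit flat in a single folder: move them into numbered sibling folders of at most `chunk_size` files each. The `split_into_folders` helper, the `test_000`-style names and the 100,000 default are hypothetical choices; each resulting folder could then be tried on its own as the `test_name` folder.

```python
import shutil
from pathlib import Path

def split_into_folders(src, dst_root, chunk_size=100_000, prefix="test_"):
    """Move the files in `src` into dst_root/<prefix>000, <prefix>001, ... (at most chunk_size each)."""
    src, dst_root = Path(src), Path(dst_root)
    files = sorted(p for p in src.iterdir() if p.is_file())
    folders = []
    for start in range(0, len(files), chunk_size):
        folder = dst_root / f"{prefix}{start // chunk_size:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for f in files[start:start + chunk_size]:
            shutil.move(str(f), str(folder / f.name))
        folders.append(folder)
    return folders
```

Sorting the file list first keeps the split reproducible, so predictions from each chunk can be concatenated back in a known order.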

I’m using the multi-folder approach and it’s working well. I’ve used this script to extract the images:
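The extraction script itself isn’t reproduced here. As a rough sketch of the idea, assuming each decoded BSON record is a dict with `_id`, `category_id` and an `imgs` list of `{'picture': <JPEG bytes>}` entries (my understanding of the competition’s format, not verified here), the images can be written into one folder per category, which is the layout `from_paths` expects under `train/`. `write_product_images` is a hypothetical helper.

```python
from pathlib import Path

def write_product_images(records, out_dir):
    """Write each product's images to out_dir/<category_id>/<product_id>-<n>.jpg.

    `records` is an iterable of dicts shaped like the decoded BSON documents
    (assumed keys: '_id', 'category_id', 'imgs' -> list of {'picture': bytes}).
    Returns the number of image files written.
    """
    out_dir = Path(out_dir)
    count = 0
    for rec in records:
        class_dir = out_dir / str(rec["category_id"])
        class_dir.mkdir(parents=True, exist_ok=True)
        for n, img in enumerate(rec["imgs"]):
            # The picture bytes are assumed to be encoded JPEGs, so they are written as-is.
            (class_dir / f"{rec['_id']}-{n}.jpg").write_bytes(img["picture"])
            count += 1
    return count
```

With pymongo’s `bson` package, `bson.decode_file_iter(open('train.bson', 'rb'))` yields records one at a time, so the file never has to be loaded whole. A validation split (the `valid/` folder) would still need to be carved out separately.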

The only issue I had was with precompute=True: it used all my RAM (16 GB) and all my swap (30 GB) within the first 5% of precomputing. @jeremy, is there a way to make it use less memory?

During training (about 2 hours per epoch on a GTX 1080 Ti), it uses only 7 GB of my RAM.
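A back-of-the-envelope estimate shows why precomputing is so heavy on this dataset. Assuming float32 activations with 4,096 values per image (a ResNet-50-style body’s 2,048 channels doubled by concatenated average and max pooling; an assumption about the head, not something measured here), the precomputed features for the ~12 million training images come to roughly 197 GB. `feature_cache_bytes` is a hypothetical helper.

```python
def feature_cache_bytes(n_images, n_features=4096, bytes_per_value=4):
    """Bytes needed to hold one float32 feature vector per image (assumed sizes)."""
    return n_images * n_features * bytes_per_value

print(feature_cache_bytes(12_000_000) / 1e9)  # 196.608 (GB)
```

Even a 600k-image subset would need about 9.8 GB under the same assumptions, before counting anything else in memory.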


Hi @jeremy, I tried with 30 images in the test folder, and it actually works. I was thinking of dividing the test set into many folders and running inference on each separately. But it seems that to train the learner with precompute=True, you need to specify the test folder in the ImageClassifierData.from_paths call. So do I have to retrain for each different test folder?

As for the original failure on the test data: apart from the error I posted above, it sometimes also fails with a CUDA out-of-memory error. I tried smaller batch sizes, but that doesn’t seem to be the problem.

I doubt you want to use precompute=True for the test set anyway, so just have a small test folder for that.

That shouldn’t happen - have you tried on AWS using our AMI? Do you see the same thing there?

Not yet! Only tested on my localhost.

When I run predict_with_targs I hit the same issue: it uses all the RAM (16 GB) and swap (30 GB) and can’t finish. Maybe the matrix of predictions is too big? We are talking about 3,095,080 images in the test dataset and about 5k categories.

I’m thinking about running the test in batches (taking the argmax and then deleting that batch’s matrix of predictions). Does that make sense?
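The numbers support that suspicion: a float32 prediction matrix for 3,095,080 images and 5,000 classes takes about 3,095,080 × 5,000 × 4 bytes ≈ 61.9 GB, more than 16 GB of RAM plus 30 GB of swap. Keeping only the argmax per image reduces that to one integer each. A minimal framework-agnostic sketch; `predict_labels_in_batches` and the `predict_fn` callback are hypothetical names, not part of fastai.

```python
from typing import Callable, Iterable, List, Sequence

def predict_labels_in_batches(
    batches: Iterable[Sequence],
    predict_fn: Callable[[Sequence], Sequence[Sequence[float]]],
) -> List[int]:
    """Run `predict_fn` batch by batch and keep only each row's argmax.

    Memory for the results grows with the number of images, not images x classes.
    """
    labels: List[int] = []
    for batch in batches:
        scores = predict_fn(batch)  # len(batch) x n_classes
        for row in scores:
            # index of the highest score in this row
            labels.append(max(range(len(row)), key=row.__getitem__))
        del scores  # drop this batch's score matrix before the next batch
    return labels
```

In a PyTorch loop, `predict_fn` would wrap the model’s forward pass, and `outputs.max(1)[1]` gives the per-row indices directly on each batch tensor, which avoids the Python-level loop.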


The same thing happens when I run:

x,y = next(iter(data.test_dl))

This snippet takes about 10 seconds to run; after that, memory consumption keeps increasing until it takes all the RAM and swap.
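To check whether the growth happens in the notebook’s own process, a small stdlib-only sketch (Unix only) can record the process’s peak resident memory after each batch pulled from an iterator, for example `iter(data.test_dl)`. `peak_rss_mib` and `track_memory` are hypothetical helpers. `ru_maxrss` reports only a peak, and only for the current process, so growth inside DataLoader worker processes would need `top` or a similar tool to see.

```python
import resource
import sys

def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak / (1024 * 1024) if sys.platform == "darwin" else peak / 1024

def track_memory(batches, max_batches=20):
    """Pull up to `max_batches` items from `batches`, recording peak RSS after each."""
    readings = []
    for i, _ in enumerate(batches):
        readings.append(peak_rss_mib())
        if i + 1 >= max_batches:
            break
    return readings
```

If the readings stay flat while the system monitor shows memory climbing, the growth is most likely in the worker processes rather than in the notebook kernel itself.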

@jeremy, what could be this issue?


I really don’t know - I’d need to know whether you see the same thing on AWS to help track it down.

Ok, I’ll try to reproduce it on AWS! Thanks @jeremy

Hi @jeremy! Yep, the same thing happens on AWS. After running that line, memory usage goes from 2 GB up to 60 GB of RAM. Here is my last screenshot (it was using 50 GB at that point):

Here are the notebooks: one that extracts the files from the BSON, and one that reproduces the error:

Unfortunately I can’t download the dataset until after it’s over. I’ll try to take a look once I can.

I gave Cdiscount a try using the lesson1-rxt50 notebook but ran into memory issues too, similar to @thiago.

On my local workstation I was able to run the full notebook, with RAM usage peaking at around 90 GB (out of 128 GB).
On an AWS p2.xlarge (61 GB RAM), the Jupyter kernel always froze during the precompute phase, with both PyTorch 0.2 and PyTorch 0.3.

I prepared the train and validation datasets as bcolz arrays with chunk length 1, and for training I used ImageClassifierData.from_arrays with practically no augmentation. I used a batch size small enough not to exhaust the GPU memory, but it looks like the workstation’s RAM is gradually exhausted, even with just 1 worker.

Since training on the full dataset locally takes about 4 days per epoch (12 million 180x180 images), my initial plan was to use AWS to train several rxt50 instances on different partitions (600k images each) of the full training set, then create predictions with a voting ensemble.
Right now I’m torn between also trying a p3.2xlarge just to see if it works (still 61 GB of RAM, but a 16 GB GPU, which might take some pressure off the RAM) and abandoning ship, since my free nights are almost exhausted too :sleeping:
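The partition-then-vote plan can be sketched with the standard library. This only illustrates the idea and is not the poster’s code: `make_shards` splits the ~12 million image indices into disjoint 600k ranges (20 shards), and `majority_vote` combines the labels that the per-shard models predict for one image. Both names are hypothetical, and the contiguous split assumes the indices were shuffled beforehand so each shard covers all categories.

```python
from collections import Counter

def make_shards(n_items, shard_size):
    """Split item indices 0..n_items-1 into contiguous, non-overlapping ranges."""
    return [range(start, min(start + shard_size, n_items))
            for start in range(0, n_items, shard_size)]

def majority_vote(votes):
    """Return the most common label; ties go to the label that appears first in `votes`."""
    counts = Counter(votes)
    best = votes[0]
    for label in votes:
        if counts[label] > counts[best]:
            best = label
    return best

shards = make_shards(12_000_000, 600_000)
print(len(shards))  # 20
```

Plain majority voting ignores each model’s confidence; averaging the per-model probabilities is a common alternative, but it would need the large prediction matrices that are causing trouble in this thread.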


I tried this competition two months ago but haven’t been able to spend much time on it since.
My code was based on some PyTorch example (I don’t remember which one).

The main problem is that Python only frees an object once nothing references it any more, so tensors that are still referenced (for example, variables that remain in scope) keep their GPU memory allocated. You need to call del on those variables; dropping the last reference triggers the destructor, which releases the GPU memory.
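A small CPU-only illustration of the point about references, using a hypothetical `FakeActivation` class in place of a GPU tensor: CPython reclaims an object as soon as its last reference disappears, so `del` on one name only helps if nothing else (a list of past outputs, another notebook variable) still points at it.

```python
import weakref

class FakeActivation:
    """Hypothetical stand-in for a large GPU tensor."""
    def __init__(self, nbytes):
        self.data = bytearray(nbytes)

def demo():
    freed = []
    act = FakeActivation(10_000_000)
    weakref.finalize(act, freed.append, "act freed")  # runs when the object is reclaimed
    history = [act]   # a second reference, e.g. a list of past outputs
    del act           # removes one name; the list still keeps the object alive
    print("after del:", freed)
    history.clear()   # last reference gone, so CPython reclaims the object immediately
    print("after clear:", freed)
    return freed

demo()
```

The first print shows an empty list and the second shows `['act freed']`. With real CUDA tensors, memory freed this way goes back to PyTorch’s caching allocator, so `nvidia-smi` may still report it as in use even though PyTorch can reuse it.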

Here are the scripts, if anyone is interested. One epoch takes me 22 hours on a GTX 1070.


I’m having a similar issue on a different dataset.

I’m training a G-LOC detector (G-induced Loss Of Consciousness – when a pilot blacks out in a hard turn). I put together a dataset of about 41,000 400x400 images, and I’m running the model on a local workstation.

Copying Jeremy’s approach, I started by training on smaller versions of the images and worked up to full size: 100 -> 200 -> 400.

At full size, my Jupyter kernel crashes once system memory and swap are exhausted, about a quarter of the way through an lr_find() run.

My workstation has 15.6 GB of memory and 15.9 GB of swap. I arranged the dataset into subfolders of several thousand images each, but this had no effect: memory usage still increases monotonically.

Something under the hood is holding on to data longer than it’s needed, but I don’t know enough about fastai’s internals yet to tell what it is.

The dataset is just screen-captures of YouTube videos, about 1.55 GB zipped.

Edit: I just noticed there’s a .fit_gen() method on the learner class. Was my/our issue as silly as forgetting to use that? I can’t answer that right away, since I rebuilt my dataset with ~7,600 images instead of the original ~41,000 (I think a lot of self-similarity between video frames was hurting generalization), but I’ll test it eventually.


Hi guys,

Unfortunately, Kaggle has removed the dataset from its data page. Does anyone still have a copy of this dataset? Thanks.