Training a model on a different dataset: some issues & fixes

So, I managed to set up my GCP server and go through Lesson 1. Today I’m trying to download my own dataset and train a model on it, following the instructions in the download_images notebook. I successfully downloaded two sets (got some content_length errors but am ignoring those for now), but for the third set, the download_images function always gets stuck at close to 98% done and I’m unable to run any more commands after that. Like here:

[screenshot: download_images progress bar stuck near 98%]

Anyone else face this? Any tips to move past this?

PS: also, can I ignore the content_length error? I notice it’s only on a few files, and it basically results in downloading fewer than 200 files… I’m guessing it’s OK for now.


Can you send a screenshot of the error you are getting with ‘content_length’? Are you getting this error only on the last few images?

It is normal to end up with fewer than the 200 images, since some images cannot be downloaded or opened, and we are not interested in keeping those.

This is what I see:

Also, this time my last set downloaded to 100% but I got a connection reset error as well. Not sure if it’s due to a bad file in that set or a general connection timeout.

Yeah, I wouldn’t worry too much about this. These are errors in the downloading of the images. Some images are on servers that have connection problems, and other servers respond to the HTTP requests (which is what the fastai library is making behind the scenes with requests.get) with errors such as the ‘content-length’ one.

These images will not be part of your final dataset since they cannot be downloaded.
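If you want to make sure no partially downloaded or corrupt files end up in your dataset, you can follow the download step with a cleanup pass. Here is a minimal sketch, assuming fastai v1’s verify_images; the path and class names are placeholders for your own folders:

```python
from fastai.vision import *

path = Path('data/mydataset')                 # placeholder path
classes = ['class_a', 'class_b', 'class_c']   # placeholder class names

for c in classes:
    print(c)
    # Delete any file that cannot be opened as an image; optionally shrink very large ones.
    verify_images(path/c, delete=True, max_size=500)
```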


Ok, thanks. So I’m able to create ‘data’ now and view it too. But path/‘models’.ls() fails, obviously because the ‘models’ directory hasn’t been created yet. Did I miss a step somewhere?

Also, ImageDataBunch.from_folder did not create any train/valid/test folders. Is that the expected behavior for that command?

Where are you running path/‘models’.ls()? I don’t see it in my notebook.

No, this is not expected. ImageDataBunch.from_folder should have created train and valid directories for each of your categories’ folders.

I see it in the download_images notebook checkpointed last Saturday:

I made a copy of this to create my version, and the train/valid folders are not getting created in my version… I just did an update of the fastai library… will try again now.

Yes, and try the latest version of the notebook as well if you can.

I did a git pull and also updated the fastai library, but ImageDataBunch.from_folder is still not creating any train/valid folders under my working directory. However, I was able to train a model and run interpretation on the results. What am I missing here?! What is the purpose of the train and valid folders, and where do they get used?

Nalini, I’ve heard from others with this same issue and I will look into it tomorrow. I’m curious how you ran your model.

The training folder contains the images with which we teach our network, penalizing it when it fails and rewarding it when it succeeds. With the validation set we can check whether what the model is learning generalizes, and we can tune our hyperparameters to make the network more general. Both are needed for effective training.

This is not correct. It doesn’t create any directories. (@lesscomfortable see the docs - you’ll note that it doesn’t make any claim to create any directories!)


Thanks Jeremy. Does that mean the function uses the images in the class folders for both training and validation, split according to the valid_pct value?

For images downloaded from Google, do you recommend manually pruning them and creating separate sets for training and validation to reduce the error rates? I trained the model on classes I created using Google image search, but the error rates are close to 50%.


Hi everyone, I was able to re-run lesson one with two different datasets, one from Kaggle and one downloaded from Google Images, but I have some issues with a third dataset: I get an error from data.normalize(imagenet_stats)

and another error when running learn.fit_one_cycle(4), which pops up when the first epoch is finished:

I changed the number of samples in the test and train folders and re-ran the code with different bs values, but couldn’t fix the issue.
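For reference, this is roughly the flow I’m running, adapted from the lesson notebook; the path, size and bs values are placeholders for my actual setup, and create_cnn is the learner constructor in my fastai version (renamed cnn_learner in later releases):

```python
from fastai.vision import *

path = Path('data/mydataset')    # placeholder for my dataset folder

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, bs=32)
data.normalize(imagenet_stats)
data.show_batch(rows=3)          # sanity-check that images and labels load

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)
```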

Nalini,

I am also using a GCP server. Since this is my first experiment with building an image classifier, I tried the smaller MNIST dataset (https://s3.amazonaws.com/fast-ai-imageclas/mnist_png.tgz) from fastai.

You can probably start with that and then try your own custom dataset.
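In case it helps, here is a minimal sketch of loading that dataset, assuming fastai v1’s untar_data (which appends .tgz to the URL) and the training/testing folder layout that the mnist_png archive uses:

```python
from fastai.vision import *

# untar_data appends '.tgz', so pass the URL without the extension.
path = untar_data('https://s3.amazonaws.com/fast-ai-imageclas/mnist_png')

# The archive contains 'training' and 'testing' folders, one subfolder per digit.
data = ImageDataBunch.from_folder(path, train='training', valid='testing',
                                  ds_tfms=get_transforms(do_flip=False), size=28)
data.normalize(imagenet_stats)
```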

I’m noticing the same problem, also on GCP. It gets stuck on the first set for me if that’s relevant for any reason. The last time, I restarted the kernel and ran it again a few times and it worked. Once it worked the first time, I had no trouble running it again. Now today it’s getting stuck again and restarting doesn’t seem to help. Here’s a screenshot:

NOTE: Here’s a hacky fix for anyone still having trouble getting download_images() to run all the way through. You’ll notice that if it got to 749/750, for example, those 749 images did download and are in the dest folder. You could restart the kernel and then run download_images() for any remaining classes until you have everything you need. The only consequence is you’ll end up without those last few images in each class, which in my case at least wasn’t a dealbreaker.
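To make that concrete, here is a rough sketch of what I mean after restarting the kernel; the class names and the per-class URL file names are placeholders for whatever your notebook uses:

```python
from fastai.vision import *

path = Path('data/mydataset')          # placeholder

# The class whose download got stuck already has its images in its dest
# folder, so only re-run download_images for the classes still missing.
remaining = ['class_b', 'class_c']     # placeholder: classes not yet downloaded
for c in remaining:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)
    download_images(path/f'urls_{c}.csv', dest, max_pics=200)
```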


It seems like the hang is caused by a connection timeout on a bad URL or a large file. Like you said, it doesn’t matter if you have a few fewer images to train on… I was able to run through without them.
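If your fastai version exposes the max_workers and timeout arguments on download_images (check help(download_images) to be sure), they can make the hang easier to deal with. This is only a sketch with placeholder paths and file names:

```python
from fastai.vision import *

path = Path('data/mydataset')          # placeholder
dest = path/'class_c'                  # the class that keeps getting stuck
dest.mkdir(parents=True, exist_ok=True)

# A short timeout gives up on slow servers instead of hanging, and
# max_workers=0 downloads one file at a time, which makes it easier to
# spot the offending URL in the output.
download_images(path/'urls_class_c.csv', dest, max_pics=200,
                max_workers=0, timeout=4)
```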


Hey, you can place the different image classes in different folders and then build the DataBunch with ImageDataBunch.from_folder, specifying valid_pct. This error happens when the validation set argument or folder is not found. See the sketch below.
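A minimal sketch of that setup, assuming fastai v1 and placeholder folder names; note that the split is held in memory, so no train/valid directories appear on disk:

```python
from fastai.vision import *

# Assumed layout: one subfolder per class, no separate train/valid folders.
#   data/mydataset/class_a/*.jpg
#   data/mydataset/class_b/*.jpg
path = Path('data/mydataset')

np.random.seed(42)   # make the random validation split reproducible
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224)

# The 20% validation split lives inside the DataBunch, not on disk.
print(data.classes, len(data.train_ds), len(data.valid_ds))
```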
