Lesson 3 In-Class Discussion ✅

Yup! :smile: on the nose! :slight_smile: and not annoying at all :slight_smile:

Thank you so much! :raised_hands:

Hi friends! :wave:

So I just wrapped up the homework for lesson 3. I ended up building an image classifier that recognizes the genres of a movie based on its poster. Accuracy was around 89% :nerd_face: Here's the dataset: https://www.kaggle.com/neha1703/movie-genre-from-its-poster

It took me around 8 hours to preprocess all of the data to get it to a point where it's actually usable for a model. I feel like most ML work is data engineering :sweat_smile:
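For anyone attempting the same dataset, most of that preprocessing boils down to turning the pipe-separated `Genre` column into the space-separated label strings fastai's multi-label API expects. A minimal sketch (the `Genre` column name and the `|` delimiter are assumptions about the Kaggle CSV, so check yours first):

```python
def genres_to_labels(genre_field: str) -> str:
    """Convert a pipe-separated genre string (e.g. 'Action|Comedy')
    into the space-separated multi-label format fastai v1 expects."""
    # Strip whitespace and drop empty entries left by stray delimiters.
    genres = [g.strip() for g in genre_field.split("|") if g.strip()]
    return " ".join(genres)

print(genres_to_labels("Action|Adventure|Sci-Fi"))  # Action Adventure Sci-Fi
```

You'd apply this to the labels column before handing the DataFrame to `label_from_df(label_delim=' ')`.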

I had a quick question about training/validation loss. Most of my losses up until this homework were roughly below 0.10. However, on this homework assignment it was around 0.20, which is not super high, but it's not as low as I'd like it to be. Any ideas on how to debug this? Here's a screenshot of my losses:

Another question I had was regarding plot_losses. After training a bunch, switching infrastructures, etc., I still landed around the same accuracy I had before. However, when I plotted the losses for my last model (ResNet-50), here's the chart I got:

It's a bit odd that the validation loss doesn't go up first; instead it goes down slightly. Any ideas on how to debug that, or should I even be worried about it?

Aaannnnd for fun, here are the results after I wrapped up training ResNet-34; it's pretty sweet:

Hi novarac23, hope you are well.
Having created a few image classifiers myself, I'd say 89% is not bad.
I am still learning AI/ML and am currently completing the head-pose, tabular-data, and multi-segmentation apps described in lesson 3.

If it were my model, I would check what the winning (highest) accuracy on Kaggle is, then investigate how they achieved it.
Having read a few machine learning papers now, it appears that people do many things to tweak their models, such as feature engineering, trying different algorithms, and, as you mentioned, trying different models for transfer learning.

One thing I have noticed, in a pocket-watch classifier I created, is that the images of watches are so similar that the accuracy went down for every class I added.

Have a jolly day.

mrfabulous1 :smiley::smiley:

Did you ever have any luck with this (creating a classifier for the card game SET)? I'm wondering whether I should train four models (shape/color/number/fill) or somehow just one. (I'm only on lesson 1.) I was just going to assemble a training set manually, and maybe run it through some elements of the barely-working manual approach I tried before: https://github.com/djMax/setbot
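One alternative to four separate models is a single multi-label classifier whose target carries one label per attribute. A sketch of that target encoding (the attribute vocabularies below are just the standard SET card properties; nothing here is fastai-specific):

```python
# Each SET card has four attributes; a single multi-label model can
# predict one label per attribute instead of needing four models.
SHAPES = ["oval", "diamond", "squiggle"]
COLORS = ["red", "green", "purple"]
NUMBERS = ["one", "two", "three"]
FILLS = ["solid", "striped", "open"]

def card_to_labels(shape: str, color: str, number: str, fill: str) -> str:
    """Encode a card as a space-separated label string, one label per
    attribute, suitable for a multi-label image classification CSV."""
    for value, valid in [(shape, SHAPES), (color, COLORS),
                         (number, NUMBERS), (fill, FILLS)]:
        if value not in valid:
            raise ValueError(f"unknown attribute value: {value}")
    return f"{shape} {color} {number} {fill}"

print(card_to_labels("oval", "red", "two", "striped"))  # oval red two striped
```

Since the twelve attribute values never collide across attributes, decoding the model's predictions back into (shape, color, number, fill) stays unambiguous.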

I just added this video on the Universal Approximation Theorem for neural networks to the lesson's wiki. It helped me understand the theorem very well, as he visualizes everything. Hope it helps you too.

Great find @kelwa!!! That made so much sense to me :slight_smile: Thank you!

Hi kelwa,
Great video!
A picture paints a thousand words, and a video even more!

Cheers mrfabulous1 :smiley::smiley:

Has anyone had `TypeError: argument 1 must be an iterator`? Specifically on the line `.label_from_df(label_delim=' ')`

when using:

src = (ImageList.from_csv(path, 'labels.csv', folder='train', suffix='.jpg')
       .split_by_rand_pct(0.2)           # hold out 20% for validation
       .label_from_df(label_delim=' '))  # split space-separated labels
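In case it helps to isolate the problem: `label_delim=' '` just tells fastai to split each entry of the labels column on spaces into a list of tags. A stand-alone sanity check of the CSV can rule out a malformed labels column before fastai ever sees it (the `tags` column name follows the planet dataset's convention and is an assumption; substitute your own):

```python
import csv

def check_labels(csv_path: str, delim: str = " ") -> list[list[str]]:
    """Read a labels CSV and split each label field the way
    label_from_df(label_delim=' ') would; a row whose label field
    is empty or missing will surface clearly here."""
    rows = []
    with open(csv_path, newline="") as f:
        for record in csv.DictReader(f):
            rows.append(record["tags"].split(delim))
    return rows
```

If this runs cleanly on your CSV, the error is more likely a fastai version mismatch than a data problem, so updating fastai would be my next guess.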

Hey there…
I had the same confusion about how slice works, and I had asked a generic question:

Thanks to @Honigtoast,
the docs referenced there explained it well.

Hi, I have been trying to figure out how to download the planet Kaggle dataset for a while now, and it seems the files are no longer available in the format they were in before. I can see other datasets, but this competition only has 3 MB of files, with 2 torrents instead of the actual pictures. I'm not sure how to torrent these, as I haven't used torrents in a long time, and my attempts with free Chrome software did not work.
Can someone confirm whether it's just me or whether something has changed? I could not find any posts in this thread about problems with downloading the data.

Hi Manuel_R, hope you are having a wonderful day!

Maybe the link above will help. When I did this lesson the files were in the old format.
I went to the new site, had a look, and was able to download some files using BitTorrent Classic https://www.bittorrent.com/ on my Mac (see the image in the post above). I am not aware of a way to download the files with anything from Google.

Cheers mrfabulous1 :smiley::smiley:

Hi, can anyone help me with this simple error? I am trying to run the lesson 3 notebook on Colab, and when I run the cell importing fastai2.vision.all I get the error "name 'ShowTitle' is not defined". Before running the imports cell, I did run the command `!curl -s https://course.fast.ai/setup/colab | bash`, and I also ran `pip install fastai2`. I also tried Reset Runtime, but that didn't help either. Can anyone provide any help or suggestions?

It seems you're working out of fastai2. If that is intentional, see the v2 intro thread for setting everything up. If not, you should be working out of fastai, not fastai2.

A BrokenProcessPool error in lesson3-imdb.ipynb is thrown by the line `.label_for_lm()`

I am running Windows 10 64-bit.
I'd appreciate hearing from anyone who has solved this issue. The BrokenProcessPool error occurs in many different places in the fastai v1 library. I've spent a lot of time trawling the forum, but have yet to find a post with a viable solution.

Thanks, I just used the latest notebook. I will change fastai2 to fastai and work on the notebook.

I've created an annotated and somewhat refactored version of the original lesson3-imdb.ipynb notebook (renamed lesson3-imdb_jcat.ipynb) and posted it here on GitHub.
I had a few problems running the notebook on my Windows 10 64-bit machine.

  • Almost all blocks of code using Fastai’s data API failed with BrokenProcessPool errors. I discovered that this is a stochastic error – sometimes it fails, sometimes it doesn’t! So I implemented a brute-force approach: just repeatedly try the code block until it runs successfully.
  • I got a CUDA out of memory error with batch size bs = 48, so I had to change it to bs = 24; this allowed me to get past the error, but then the very last training step (fine tuning) failed with a CUDA out of memory error. At that point I lost patience – the next thing would be to reduce batch size again – bs = 16 or bs = 8 to get the final movie classifier. However, since we saved the pre-fine-tuned version of the model, we still end up with a classifier (albeit a sub-optimal one) to play with. And (spoiler alert) in spite of this, it turns out to be pretty good!
  • PyTorch doesn’t seem to allow me to use the GPU on my Windows 10 machine. The notebook does complete overnight, but it would probably be much faster on a Linux machine or even on Google Colab.
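The brute-force "retry until it works" approach from the first bullet can be wrapped in a small helper rather than re-running cells by hand. This is just a generic sketch (the attempt count and delay are arbitrary, and any zero-argument callable stands in for the flaky data-API call):

```python
import time

def retry(fn, attempts: int = 5, delay: float = 1.0):
    """Call fn() repeatedly until it succeeds, re-raising the last
    error once the attempt budget is exhausted. Useful for code that
    fails stochastically, like the BrokenProcessPool-prone blocks."""
    last_err = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as err:  # BrokenProcessPool subclasses Exception
            last_err = err
            time.sleep(delay)  # brief pause before retrying
    raise last_err
```

Usage would look like `data_lm = retry(lambda: build_data_lm())`, with the original data-block pipeline wrapped inside the lambda or a small function.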

Update: I re-ran the notebook after reducing the batch size from 24 to 16, but this time I got

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

at the last step (in section 3.4).

Any suggestions for what to try next?

Hi!

I'm trying to solve a problem at work using a segmentation model, based on lesson 3. My data is a surface model made from aerial laser scanning, showing buildings and their surroundings. The segmentation masks mark different types of roofs.

Since my data is a raster representation of height values, it has essentially just one channel. I have, however, transformed it into grayscale images.

I'm getting somewhat good results, I think (camvid accuracy metric ~80%), based on my small training set of about 40 images.

But am I on the right path? Is there a better way to deal with this kind of data?

I could use the corresponding aerial photos and add the surface model data as a fourth channel. Can fastai handle that kind of data, and how?

Grateful for any help!
Peter


Raster representation of height values from a digital surface model (DSM), showing a gable-roofed house with an attached flat roof.
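On the fourth-channel question: outside of fastai, stacking the DSM onto the RGB photo is just a concatenation along the channel axis; whether a given fastai version's transforms and pretrained weights accept 4-channel input is a separate question I won't claim to answer here. A minimal numpy sketch of the stacking itself (the min-max scaling to 0-255 is an assumption about how you'd want to normalize heights):

```python
import numpy as np

def stack_dsm_on_rgb(rgb: np.ndarray, dsm: np.ndarray) -> np.ndarray:
    """Append a single-channel digital surface model (H, W) to an
    RGB aerial photo (H, W, 3), returning an (H, W, 4) array.
    DSM heights are min-max scaled to 0-255 to match the photo."""
    assert rgb.shape[:2] == dsm.shape, "photo and DSM must be aligned"
    lo, hi = dsm.min(), dsm.max()
    scaled = (dsm - lo) / (hi - lo + 1e-9) * 255.0  # avoid divide-by-zero
    return np.concatenate([rgb, scaled[..., None]], axis=-1)
```

The pixel grids must already be co-registered (same extent and resolution) for this to make sense, which GIS tooling would normally handle upstream.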

Hi Peter, I am using LIDAR images, which sounds similar. Height as grayscale works just fine. Just 40 training images sounds challenging, though; I used 13,000 and got 92 percent. Consider getting more samples.

Thank you!