Share your work here ✅

This week I wanted to really make sure I grasped what was going on with everything in chapter 4 of the book, so I put my study hat on. Writing helps me understand, so I split the things I learned from the chapter into several separate blog posts. I used the Fashion MNIST dataset to switch things up and chose to compare pullovers and dresses as I figured they were at least partly similar somehow in appearance:

  • “A dress is not a pullover: learning about PyTorch Tensors and pixel similarity using the Fashion MNIST dataset” (link) — where I replicate the pixel similarity approach taken at the start of the chapter
  • “Some foundations for machine learning with PyTorch” (link) — in which I summarise the seven steps needed for model training with this update-the-gradients approach.
  • “Stochastic Gradient Descent: a mini-example of the whole game” (link) — I replicate the mini-example of optimising a very simple function.
  • “Using the seven-step SGD process for Fashion MNIST” (link) — I cover the ways that we need to add some extra functionality to make things work for the many parameters of our Fashion MNIST image data.
  • “A neural network for Fashion MNIST data” (link) — I show how we can extend the previous linear function so that we emerge with a neural network by the end.
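The update-the-gradients loop those posts walk through can be sketched in a few lines. This is a minimal pure-Python illustration of the seven-step idea (initialise, predict, loss, gradient, step, repeat, stop) on a toy one-parameter function; the book does the same thing with PyTorch tensors and `loss.backward()` instead of a hand-written gradient:

```python
# Toy gradient descent: minimise (w - 3)^2, whose minimum is at w = 3.
# Pure Python so the mechanics are visible; PyTorch would compute the
# gradient automatically via loss.backward().

def f(w):
    return (w - 3.0) ** 2        # toy "loss"

def grad_f(w):
    return 2.0 * (w - 3.0)       # analytic gradient of the toy loss

w = 0.0                          # 1. initialise the parameter
lr = 0.1                         # learning rate
for step in range(100):          # 7. stop after a fixed number of steps
    loss = f(w)                  # 2-3. predict and compute the loss
    g = grad_f(w)                # 4. compute the gradient
    w -= lr * g                  # 5. step the parameter in the -gradient direction
                                 # 6. repeat

print(round(w, 3))               # converges towards 3.0
```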

Obviously this closely follows the material taught in the book, so no prizes for originality. At the very end, I made a diagram to try to summarise some of the things I’d learned from this chapter and how they all fit together across the layers of abstraction.

I’m really looking forward to the upcoming class. We had a little taster of the power of embeddings in our meetup group today thanks to an amazing show-and-tell from @pcuenq!



I’m finally sharing a Gradio app with my work. It is a model that classifies between the following 8 telecommunication tower parts:

  • Base plate
  • Grounding bar
  • Identification
  • Ladder
  • Light
  • Lightning rod
  • Platform
  • Transmission lines

Here are some example pictures:

You can try the app here:

I worked as a structural engineer for 14 years, designing new telecommunication structures and doing reinforcing analysis for existing ones. So I have lots of pictures of towers from all over Venezuela. To give you an idea, between 2015 and 2016 I did the analysis of the following sites.

To create this model, I used only 478 images for training and 119 for validation.

This experiment began in the 2019 edition of the fastai course. To make things easy, I chose parts that were easily distinguishable from one another. But I think there would be potential in segmenting parts, or in detecting rust or missing bolts. Nowadays inspections are done by drones (at least in the advanced economies), so maybe some models could help automate or improve inspection and analysis.

Nevertheless, at the time, I was very impressed with the accuracy achieved thanks to transfer learning. Using resnet34 and training for 6 epochs gave me an error rate of 0.091667.

But I also found some “obstacles”:

  • I couldn’t make use of my own GPU because it was very hard to install fastai in Windows. So I did the training in Colab, even though using Colab wasn’t officially supported or explained in the docs. Practical Deep Learning for Coders 2019
  • Then it was a pain to deploy the model. Although I did set up the front-end and back-end within a Docker container, the UI was so awful and the app was difficult to deploy in Heroku that I didn’t share it here in the forum (I regret that).

Now those things are more accessible. You can train your models on many platforms for free. And there is also WSL2, which allowed me to use my old GTX 1070 GPU! (Thanks to lesson 2 and the help from this forum.)

And there are also more accessible options like Gradio and Streamlit to share some great apps without worrying about JavaScript or Docker while starting out.

For this model, using resnet18 for 3 epochs resulted in an error rate of 0.008333.
So, an improvement from roughly 91% to 99% accuracy, with half the epochs.
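Since the course reports error rates while I’m quoting accuracies, here is the conversion spelled out (accuracy = 1 − error rate), using the two error rates above:

```python
# Convert the reported fastai error rates to accuracy percentages.
err_resnet34 = 0.091667   # resnet34, 6 epochs (first attempt, 2019)
err_resnet18 = 0.008333   # resnet18, 3 epochs (current model)

acc_old = (1 - err_resnet34) * 100   # ~90.8%
acc_new = (1 - err_resnet18) * 100   # ~99.2%
print(f"{acc_old:.1f}% -> {acc_new:.1f}%")
```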

There is still a great margin for improvement. I tried some pictures that were misclassified. But I could also use many more pictures to improve the model.

Well, sorry for the long story. I’m happy with the results and grateful to this supportive community.

PS. The tutorial Gradio + HuggingFace Spaces: A Tutorial by @ilovescience was really useful. Pay special attention to the git-lfs part.

Thanks a lot.


I’m a bit late reviewing the shared works, and maybe you already resolved this issue. Anyway, I see no notice or edit here, and no change to the code, so I’ll comment.

The “0.0000 probability” indicates that the predicted category is not at index zero of the probability array, i.e. not “probs[0]”.

# your code
is_planet,_,probs = learn.predict(PILImage.create('planet.jpg'))  #<---compare
print(f"This is a: {is_planet}.")
print(f"Probability it's a planet: {probs[0]:.4f}")

What you want is something more like this…

learn = load_learner('watersports.pkl')
print( learn.dls.vocab)
prediction,idx,probs = learn.predict(PILImage.create('planet.jpg')) #<---compare
print(f"This is a: {prediction}.")
print(f"Probability it's a planet: {probs[idx]:.4f}")
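To see why `probs[0]` only works by accident, here is a toy illustration with plain Python lists (no fastai; the vocab and probabilities are made up): `probs` is ordered by the learner’s vocab, so the predicted class’s probability lives at the predicted index, not necessarily at index 0.

```python
# Hypothetical stand-ins for learn.dls.vocab and the softmax output.
vocab = ["kayaking", "planet", "surfing"]
probs = [0.05, 0.90, 0.05]

idx = probs.index(max(probs))   # the index learn.predict would return
prediction = vocab[idx]

print(prediction, probs[idx])   # the right pairing: "planet" with 0.9
print(probs[0])                 # 0.05 -- the source of the "0.0000" bug
```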

A naive suggestion, as my first time through the course… Perhaps a data augmentation step might be to make the white background transparent and merge with a variety of different backgrounds, so the net learns to ignore the backgrounds. e.g…


You may already know, but some might not, so btw…
you can also exclude elements from your search
e.g… planes flying -pilot -passengers -people


Thank you! Great idea, I will try out the first method from stack overflow discussion.

Which Watersport?

I like alliteration and water, so this question appealed to me. Try it out here…

The build process.

I’m lazy, so I think to myself… Why make up a list of watersports when I can scrape one from here… So I used the following code to do that semi-automatically… (manually removing erroneous and duplicate entries from the list)

I ended up with 37 categories, which in hindsight is perhaps a bit overboard, but anyway…
To clean the categories I downloaded all the images locally, then uploaded the following dataset to Kaggle…
This was before I learned of the built-in cleaning tools in Lesson 2.

Training used RandomSplitter and specialised resnet18 to produce the inference model watersports.pkl, with the following code…

Took a while to get my system set up properly, which I documented here:

In summary: I created a new HuggingFace space for my app, cloned that repo to my local machine, installed LFS, downloaded the inference model, copied the contents to a local Jupyter notebook applocal.ipynb to test in, downloaded a few example images, then committed and pushed the lot to the Hugging Face space.

Now I’m surprised at how well it did, particularly distinguishing between similar categories like:

  • Snorkelling, Scuba diving, Cave diving, Free diving, Wreck diving, Spearfishing
  • Fin swimming, Mermaiding
  • Kayaking, Canoe polo, Outrigger boating, Dragon boating, Rowing, Paddle boarding
  • Water skiing, Barefoot skiing
  • Body boarding, Body surfing, Surfing, Kite boarding

Some things still to experiment with:

  • Using RandomResizedCrop - I wonder whether crops of common areas of water affect the training - I presume it learns these are irrelevant.
  • Trying a higher level ResNet
  • Review Confusion Matrix

I took the course back in 2019 and it changed my life :slight_smile: Now I’m just here to help and have fun.

A lot of my friends are thinking of buying a car, and I’m trying to convince them to buy a Tesla so I made a Tesla Model classifier.

Link to Kaggle notebook: Tesla model classifier | Kaggle



Thanks, @pcuenq @mike.moloch and Jeremy for all the suggestions. I will try them and hopefully will make the video with quality audio this time :grin:.

I made the datablock video with the same airpods but made it during lockdown. So it was less noisy I guess :grin:.


In preparation for the upcoming transformers lesson, last week I explored the use of multimodal (vision + text) transformers for a task I’m interested in: find photos in my personal collection using just free-form text descriptions.

I found that the results were great for my purposes, and more flexible than the text-based search built in my iPhone. I wrote this post about it. You can explore the examples I tested by dragging the slider of the first figure embedded in the post. It shows the photos that are most similar (according to the model) to a few text queries I used. I was surprised that the model works for things such as concepts, styles or locations, without using any metadata information at all (just the images).
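For anyone curious about the mechanics, the ranking step in this kind of search can be sketched as cosine similarity between one text embedding and many image embeddings. The vectors below are made-up stand-ins; in practice a CLIP-like multimodal model produces the real embeddings for both the query text and the photos.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_emb = [0.1, 0.9, 0.2]            # hypothetical embedding of a text query
photo_embs = {                          # hypothetical image embeddings
    "IMG_001.jpg": [0.0, 1.0, 0.1],
    "IMG_002.jpg": [0.9, 0.1, 0.0],
    "IMG_003.jpg": [0.2, 0.8, 0.3],
}

# Rank photos by similarity to the query, most similar first.
ranked = sorted(photo_embs,
                key=lambda k: cosine(query_emb, photo_embs[k]),
                reverse=True)
print(ranked)
```

No metadata is involved at any point, which matches the behaviour described above: the model places images and free-form text in a shared vector space, and search is just nearest-neighbour lookup in that space.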

It was a very fun project that I intend to keep working on. My teammates in the Delft study group seemed to like it too, and they suggested a few very interesting queries :slight_smile: Any questions or suggestions are welcome!


Btw, there is also an open-sourced neural search project that may be an interesting framework for this kind of app. I haven’t tried it myself, but I was reading through their docs briefly. Sounds promising.


Can I just repeat again here that I really liked this demo ? :heart_eyes_cat:


Thanks Suvash! :slight_smile:


That’s a great post! Do you have something I can retweet?


I just wrote this: I still feel a little bit shy about sharing stuff, but I’m working on it :slight_smile: Thanks for the support!


I know the feeling - the only way to get past it is to ignore it and do it anyway, until you start to get used to it…

Your work is too awesome to not share widely IMHO.


This is an incredibly well done write-up. Love the interactivity … really well done.

Is there a repo or notebook folks can look at to see how you trained the model?

Thanks for the share.


Wow, really nice. Enjoyed reading it :clap::clap:


Great post @pcuenq! Looking forward to seeing more on this, especially the word-embedding algebra to search images :grinning:


I have built cloud segmentation pipelines using fastai if you are interested.