Share your work here ✅

bencoman · May 16, 2022, 3:41am

Which Watersport?

I like alliteration and water, so this question appealed to me. Try it out here…

The build process.

I’m lazy, so I thought to myself… why make up a list of watersports when I can scrape one from https://en.wikipedia.org/wiki/List_of_water_sports? So I used the following code to do that semi-automatically… (manually removing erroneous and duplicate entries from the list)
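One way such a scrape could look, as a minimal sketch rather than the code actually used. It assumes each sport appears as the first link inside a list item on the page; the class name, function names and User-Agent string are made up for illustration. The result still needs the manual clean-up mentioned above, since navigation and reference links get picked up too.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

URL = "https://en.wikipedia.org/wiki/List_of_water_sports"


class ListItemLinkParser(HTMLParser):
    """Collect the text of the first link inside each <li> element."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0        # how many <li> elements we are currently inside
        self.in_link = False     # True while reading the text of a link we want
        self.took_link = False   # True once the current <li> has yielded a link
        self.names = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
            self.took_link = False
        elif tag == "a" and self.li_depth > 0 and not self.took_link:
            self.in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.took_link = True
            name = "".join(self._buf).strip()
            if name:
                self.names.append(name)
        elif tag == "li" and self.li_depth > 0:
            self.li_depth -= 1

    def handle_data(self, data):
        if self.in_link:
            self._buf.append(data)


def extract_names(html):
    """Return candidate names in page order, dropping case-insensitive duplicates."""
    parser = ListItemLinkParser()
    parser.feed(html)
    seen, unique = set(), []
    for name in parser.names:
        key = name.lower()
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


def fetch(url=URL):
    """Download the page; the User-Agent value is an arbitrary placeholder."""
    req = Request(url, headers={"User-Agent": "watersports-list-sketch/0.1"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling `extract_names(fetch())` gives a raw candidate list to prune by hand.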

I ended up with 37 categories, which in hindsight is perhaps a bit overboard, but anyway…
To clean the categories I downloaded all the images locally, then uploaded the following dataset to Kaggle… https://www.kaggle.com/datasets/bencoman/watersports
This was before I learned of the built-in cleaning tools in Lesson 2.

Training used RandomSplitter for the train/validation split and specialised resnet18 into the inference model watersports.pkl, with the following code…
https://www.kaggle.com/code/bencoman/which-watersport-2-train-mode
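For anyone unsure what the RandomSplitter step does, here is a rough pure-Python sketch of the idea: shuffle the item indices, then hold back a fraction for validation. This is not fastai's implementation and not the notebook's code. The function name is made up, and `valid_pct=0.2` simply mirrors fastai's documented default, not necessarily the value used here.

```python
import random


def random_split(n_items, valid_pct=0.2, seed=None):
    """Return (train_indices, valid_indices) for n_items items.

    Indices are shuffled, then the first valid_pct fraction is held out
    for validation and the remainder is used for training.
    """
    rng = random.Random(seed)   # a fixed seed makes the split reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    return idxs[cut:], idxs[:cut]
```

Fixing the seed matters when comparing runs, such as different architectures, because otherwise each run is validated on a different set of images.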

It took a while to get my system set up properly, which I documented here:

In summary, I created a new HuggingFace space for my app, cloned that repo to my local machine, installed LFS, and downloaded the inference model. I copied the contents of app.py to a local Jupyter notebook, applocal.ipynb, to test in. Then I downloaded a few example images, and committed and pushed the lot to the HuggingFace space.

Now I’m surprised at how well it did, particularly distinguishing between similar categories like:

Snorkeling, Scuba diving, Cave diving, Free diving, Wreck diving, Spearfishing
Fin swimming, Mermaiding
Kayaking, Canoe polo, Outrigger boating, Dragon boating, Rowing, Paddle boarding
Water skiing, Barefoot skiing
Body boarding, Body surfing, Surfing, Kite boarding

Some things still to experiment with:

Using RandomResizedCrop - I wonder whether snaps of common areas of water affect the training - I presume it learns these are irrelevant.
Trying a deeper ResNet
Reviewing the confusion matrix
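On the confusion matrix item: fastai offers `ClassificationInterpretation.from_learner(learn)` with `plot_confusion_matrix()` and `most_confused()`. With 37 categories the full matrix is hard to read, so a ranked list of the most-confused pairs may be more useful. The idea can be sketched in plain Python as below; the function name and its arguments are illustrative, and it expects two equal-length lists of labels.

```python
from collections import Counter


def most_confused(actual, predicted, min_count=1):
    """Rank (actual, predicted, count) triples for misclassified items.

    Only off-diagonal cells of the confusion matrix are counted, sorted by
    count (highest first) and then alphabetically for a stable order.
    """
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    ranked = sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, p, n) for (a, p), n in ranked if n >= min_count]
```

Run on the validation labels and predictions, this would show whether the similar groups listed above (for example the diving or paddling categories) really are the ones being mixed up.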