Share your work here ✅

Did you include this in your notebook? If not … please do :slight_smile:

It’s kind of amazing that this is considered an advanced technique that waits for folks at the end of the book … and yet you figured out how to implement it after the first session of the course. Top-down learning works!

6 Likes

Hi all,

I have created a classifier to detect if your house has been damaged by storms and have published a simple Gradio app on Hugging Face Spaces.

Thanks @suvash for the inspiration to explore Gradio and Huggingface Spaces.

Thanks @strickvl for hosting the delft-fastai study group, and good work using class activation maps to check how your model is working. I will definitely have to give this a go as well!

2 Likes

Yeah, I’m super impressed by how @strickvl just dived in and gave it a try!

6 Likes

I took an idea I had when tinkering with a smart home around “private” computer vision - can I use images with detail stripped out and still build a model that can drive certain smart home tasks (e.g. lights on/off when someone enters/exits a room)? With limited time, and using pretty much the standard set of hyperparameters that fastai suggests, I was able to get 85-90% accuracy on a simple multi-class classification task using two different types of proxy camera filters.

The background and next steps for anyone interested can be found in this notebook, but the lesson here is that even models that are clearly not optimized for this task can still get good results in a couple of hours.
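
For anyone curious what “the standard set of hyperparameters” boils down to in practice, here is a minimal sketch of the default fastai recipe - the folder name and class layout below are placeholders, not my actual dataset:

from fastai.vision.all import *

# placeholder path - assumes the filtered frames are stored in one folder per class
path = Path('filtered_frames')

dls = ImageDataLoaders.from_folder(
    path, valid_pct=0.2, seed=42,
    item_tfms=Resize(224))

# default pretrained resnet + one-cycle fine-tuning, no hyperparameter tuning at all
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)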

This fastai/DL stuff is like magic - but unlike magicians, the fastai team actually reveal their secrets :slight_smile:

BTW - I tried to host my notebook on Kaggle but I was encountering an error. Will repost there if I can resolve it.

4 Likes

Yeah, I kind of figured out the pipeline by looking at notebooks made by @dhoa and played around with Gradio audio inputs. The integration worked pretty smoothly.
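
In case it helps anyone else, the wiring is roughly this (just a sketch - the classify_audio function and audio_model.pkl are placeholders for whatever exported learner you have, not my actual app):

import gradio as gr
from fastai.learner import load_learner

# placeholder: an exported fastai learner that can take an audio file path
learn = load_learner('audio_model.pkl')

def classify_audio(filepath):
    # fastai's predict returns (decoded class, class index, probabilities)
    pred, _, probs = learn.predict(filepath)
    return {str(c): float(p) for c, p in zip(learn.dls.vocab, probs)}

demo = gr.Interface(fn=classify_audio,
                    inputs=gr.Audio(source="upload", type="filepath"),
                    outputs=gr.Label(num_top_classes=3))
demo.launch()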

One thing I noticed, though, was that two of the three Gradio demos showcasing the audio feature were having some issues.

Hi everyone, I have enjoyed following lesson 1 and learning from the Python code.

I made some quick modifications to the “Is it a bird?” kaggle notebook.

I was inspired by this clip by Stephen Colbert to answer the question:

Is Potato?

I also liked this one because some potato images from the image search look similar to rocks.

There were some issues with this example because there were images of “The Rock”.

Looking forward to meeting up with the other members of the Australia Study Group before today’s talk.

6 Likes

This is an intriguing concept. From a security perspective, I’d be paranoid enough to not let anything hit the camera sensor that I don’t want it to see, so I’d probably go for a “physical filter”. I’ve worked in offices where they used motion detectors above cubicles to determine presence (to turn lights off when people leave), but the most annoying part was that these sensors turn lights off if you’re working late, and their resolution wasn’t that great. So, if you’re working late just sitting at your desk, they can’t detect presence. I think a vision-based system like you propose could check itself against a base case and keep the lights on even if no motion is detected. I think not too much resolution is needed to determine if there is motion within the monitored space.

Something really cool would be if the system could “learn” the base case of an empty room and then detect the presence of objects within it, especially if those objects tend to move once in a while. Or maybe a combination of multiple cheap sensors that feed into the network? Like a very low-resolution camera, a microphone and an ultrasound motion detector all in one case?

The more I think about it, the more complicated this problem gets in my mind :sweat_smile:

Definitely an interesting and challenging problem for sure!

2 Likes

I also tried to re-use the “Is it a bird?” code with a different set of images to see if the classifier could distinguish between war memes and news photos. It performed very well, but it was interesting to see that a captioned news photo was classified as a meme! Looking forward to playing with this idea a little more, potentially adding an NLP element to it.

3 Likes

Might be even more interesting to see if you can find out why using the techniques described here: Share your work here ✅ - #77

1 Like

Art Mood

This is a variation of my last model, in which I trained an image classifier on pictures of the four seasons. In Art Mood I wanted to determine how well the model could predict the mood the paintings evoked in me. (I limited these moods to the four seasons.) The examples in the Art Mood app are a combination of representational and abstract art. Subjectively, I thought the model did very well. My wife is an artist and she too found the predictions largely matched her seasonal moods, and her perspective should be more highly regarded than my own…

This was a proof of concept. To make this more rigorous, I would need to create an a priori labeled set of paintings against which to compare how well the model did.

Let me know if the model aligns with your Art Mood!

9 Likes

I built a classifier to match an image with the best-fitting art movement from the following:

I gathered about 200 photos / movement, 150 for Expressionism from Google Arts & Culture, and 200 for the rest from DuckDuckGo.

Using a resnet34 and presizing the images got the error rate below 5%.
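
For anyone who hasn’t seen presizing yet, the setup looks roughly like this (a sketch following the fastbook approach - the folder name and exact sizes are illustrative, not necessarily what I used):

from fastai.vision.all import *

movements = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    item_tfms=Resize(460),                                # presize large on the CPU
    batch_tfms=aug_transforms(size=224, min_scale=0.75))  # crop/augment per batch on the GPU

dls = movements.dataloaders(Path('art_movements'))        # illustrative folder name
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(5)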

Testing my classifier on images outside of the genres provided seems to align with my intuitions pretty well - Keith Haring is categorized as Expressionism,

while for Mickey Mouse, Ukiyo-e is the favored category.

Next steps might be looking at training on the WikiArt dataset.

Also curious about scraping Twitter hashtags for generative art (Dall-e 2, MidJourney) and predicting likes / retweets.

Here’s my Hugging Face Demo

9 Likes

@davidrd Very cool project. I tested it with some of the paintings I used for my Art Mood project and your model performed very well! Thanks also for the lead on the WikiArt source. That looks to be quite interesting.

1 Like

I decided to try using an image classifier for a task that it’s not “supposed” to be used for: language detection! I.e., given an image of a chunk of text, can the computer tell whether it is in French or German? (I decided to use French and German rather than, say, French and English, because I wanted two languages that both use accent marks. Also, I wanted to use languages that are written in the same alphabet, as otherwise the model would be learning to recognise writing systems, not languages.)

You can see my Kaggle notebook here: https://www.kaggle.com/code/skalyan91/is-it-a-bird-creating-a-model-from-your-own-data. (I haven’t adjusted the text to reflect the fact that I’m no longer doing bird/forest classification—apologies!)

For the data collection, I did DuckDuckGo image searches for “text en français” (text in French) and “Text auf Deutsch” (text in German). However, since the resulting images were usually pretty large, I had to crop them to a standard size, rather than squishing them. (Unfortunately, the fastai library does not seem to provide a convenient function for batch-cropping a folder of images, as it does for resizing. This meant I had to do the cropping and saving using the Pillow library, which was slower as I didn’t have time to parallelise it.)
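
The Pillow loop looks roughly like this (a rough sketch - the folder names and the crop size are illustrative, not my exact values):

from pathlib import Path
from PIL import Image

src, dst = Path('french_raw'), Path('french_cropped')    # illustrative folder names
dst.mkdir(exist_ok=True)
size = 400                                               # illustrative crop size in pixels

for f in src.iterdir():
    try:
        im = Image.open(f)
    except OSError:
        continue                                         # skip files PIL can't read
    if im.width < size or im.height < size:
        continue                                         # skip images too small to crop
    # take a fixed-size crop from the centre of the page and save it
    left, top = (im.width - size) // 2, (im.height - size) // 2
    im.crop((left, top, left + size, top + size)).save(dst / f.name)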

Here are some sample images I used (French, then German):

image
image

I fine-tuned the model for 10 epochs, since I thought this might give it a better chance of learning the classification. However, after about the 6th epoch, both the error rate and the validation loss pretty much got stuck.

epoch train_loss valid_loss error_rate time
0 0.841501 1.059826 0.545455 00:09
1 0.805841 0.899394 0.428571 00:09
2 0.719036 0.953295 0.363636 00:09
3 0.619781 1.105118 0.376623 00:08
4 0.540305 1.153665 0.350649 00:09
5 0.477164 1.134642 0.337662 00:09
6 0.429748 1.095640 0.311688 00:09
7 0.397670 1.083638 0.311688 00:09
8 0.367227 1.104923 0.337662 00:09
9 0.340956 1.109644 0.324675 00:09

In the end, the model ended up with an error rate (on the validation set) of about one-third. I think this is surprisingly good for a model that’s trying to do something it’s not designed for!

8 Likes

Last year we published this work at an HCI conference about detecting hand postures, using the reflected webcam of a laptop to trigger commands and play games. The posture recogniser uses fastai.

7 Likes

A few days back, I built the Marvel character classifier app and deployed it on Jarvislabs.ai. While doing it, I figured out that I could make a few changes to the Jarvislabs.ai platform to make it super easy for anyone to do the same.

I made a blog post and video describing how this can be done. All you need to do is pass server_name and server_port to the launch function.

import gradio as gr

# find_character is the prediction function defined earlier in the app
demo = gr.Interface(fn=find_character,
                    inputs=gr.inputs.Image(shape=(256, 256)),
                    outputs=gr.outputs.Label(num_top_classes=3))

if __name__ == "__main__":
    # listen on all interfaces on port 6006 so the app is reachable from outside the instance
    demo.launch(server_name="0.0.0.0", server_port=6006)

I have described it in detail here in the blog and video.

I believe this can help:

  • When you need to deploy models that need the power of GPU.
  • When you want to share it only with your team/friends and not make it public.

I hope you like it :smiley:.

8 Likes

Also wanted to mention an earlier experiment I did, using the image classifier to identify pictures of Indian musical instruments! (It’s in Version 1 of the same notebook as in my previous post.)

In particular, I trained the model to differentiate between a veena and a sitar, both fretted wooden instruments with two resonating chambers:

image (veena)
image (sitar)

More sample images:

image

I trained for 3 epochs, and got a final error rate of about 16%. Not bad!

3 Likes

fastai has a transform called RandomResizedCrop which would be best for this - it’ll probably give you better results, and doesn’t require any preprocessing. See the book for details.
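
Something like this (just a sketch - the folder name and sizes are placeholders) would replace the manual Pillow cropping entirely:

from fastai.vision.all import *

texts = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    # take a different random crop of each page every epoch - no preprocessing needed
    item_tfms=RandomResizedCrop(224, min_scale=0.5))

dls = texts.dataloaders(Path('text_images'))   # placeholder folder of text screenshots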

This is great. Although IIUC it’ll cost at least $75/week to run it? Are there any ways a user can get the cost down?

Would it be a good idea to introduce new instance types that are quite a bit lighter-weight for running full-time for stuff like this? E.g. 16GB RAM, 2 cores, and a 1080ti?

1 Like

After attending the first class I built an image classification model to detect different types of rock (sandstone and coal), which I use a lot in a mining industry project. I am looking forward to today’s 2nd lesson. Thanks!

1 Like

Yes, the cheapest GPU option currently costs $75/week.

If we can leverage spot instances, it would cost less than $32 a week. There are two challenges with using spot instances, though:

  • Spot instances can be terminated at any time.
  • Whether it needs to run 24/7.

Currently, users can use the Jarvislabs API to handle the above challenges.

We want to make it even easier by letting users make this completely configurable. For example:

  • Choose spot instances for deployment.
  • Deploy on the first request, similar to AWS Lambda, and stop when there is no activity.

Regarding the 1080ti, we cannot use it in a data centre due to Nvidia’s rules. However, I am thinking about the possibility of using shared GPUs like the A6000, which comes with a huge 48GB of VRAM.

6 Likes