Lesson 2 official topic

Putting this question here from the chat: If you went looking for photos of grizzlies and black bears online (assuming there wasn’t a dataset already made and labelled), what is the best way to ensure these photos aren’t misclassified?

Jeremy then showed the ImageClassifierCleaner, and Nick said it pays to visually inspect when using these “open” image searches. Results can deteriorate drastically with either the inherent ambiguity of your topic or your search query. Sheik Mohamed Imran said we would have to manually get the losses for the data and sort them, or you can peek into the code used for the GUI, which has the same logic.
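For reference, a minimal sketch of that workflow, assuming `learn` is a trained vision Learner and `path` is your image folder; the cleaner surfaces the highest-loss images first:

```python
import shutil
from fastai.vision.all import *
from fastai.vision.widgets import ImageClassifierCleaner

# The cleaner shows images sorted by loss, so the likeliest
# mislabels/bad downloads come up first for review.
cleaner = ImageClassifierCleaner(learn)
cleaner  # display the widget in a notebook cell

# After reviewing, apply the decisions made in the widget:
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()                     # delete files marked for removal
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)  # move relabelled files to the right class folder
```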

5 Likes

I think Albumentations has something like this:

You might be able to run this, get your augmentation policy, and incorporate it in fastai like this:

But I haven’t tried this… could be something interesting to play around with :slightly_smiling_face:
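For the fastai side, one way to wrap an Albumentations pipeline as a fastai `Transform` (a rough sketch, equally untested; `path` and the specific transforms are placeholders):

```python
import numpy as np
import albumentations as A
from fastai.vision.all import *

class AlbumentationsTransform(Transform):
    "Run an albumentations pipeline on fastai PILImages."
    def __init__(self, aug): self.aug = aug
    def encodes(self, img: PILImage):
        aug_img = self.aug(image=np.array(img))['image']
        return PILImage.create(aug_img)

# e.g. a policy you found with the search above
aug = A.Compose([A.HorizontalFlip(p=0.5), A.RandomBrightnessContrast(p=0.2)])

dls = ImageDataLoaders.from_folder(
    path,  # placeholder: your image folder
    item_tfms=[Resize(256), AlbumentationsTransform(aug)],
)
```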

5 Likes

Yes, exactly, that’s another framework I was talking about :smile:
Somehow, it was difficult to integrate this (autoaug) into my pipeline last time I tried it, but maybe that was just me. Otherwise, the lib is great and a default way to go for augmentations in many projects.

1 Like

Same thing, different style.

1 Like

It’s going to depend on the type of training you’re trying to accomplish. In general, if you’re optimising for cost, you’ll want to choose the GPU that allows you to train most efficiently with respect to the memory that your problem requires.

K80s are used by default in Google’s Colab and are a good economy choice while getting started. You can pay up for Colab Pro or Pro+ if you want better memory / GPUs / TPUs.
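If you want to check which GPU (and how much memory) a session gives you, e.g. in Colab:

```python
import torch

# Report the attached GPU and its total memory, or note that none is attached.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB")
else:
    print("No GPU attached to this runtime")
```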

1 Like

4 posts were merged into an existing topic: Help: Python, git, bash, etc :white_check_mark:

You can also increase data augmentation. Mostly that helps; if it doesn’t, I would consider looking for more data.
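In fastai, one easy knob for that is the `mult` argument of `aug_transforms`; a sketch, with `path` standing in for your image folder:

```python
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    path,                                 # placeholder: your image folder
    valid_pct=0.2,
    item_tfms=Resize(460),
    batch_tfms=aug_transforms(mult=2.0),  # scale up the default rotate/zoom/warp ranges
)
```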

1 Like

Shravan Kumar:
Can this cleaning method be applied to only the few images where the loss is highest? Because with millions of images it’s tedious to clean them all.

There are other approaches that employ coreset sampling, which identifies a diverse subset of the data and uses it for training.

Libraries that I know of are Lightly and CORDS.
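And for restricting the manual cleaning itself to the worst offenders, a sketch with fastai’s interpretation tools, assuming `learn` is your trained Learner:

```python
from fastai.vision.all import *

interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses(k=200)  # only the 200 highest-loss predictions
worst_files = [learn.dls.valid_ds.items[int(i)] for i in idxs]
interp.plot_top_losses(9)                # eyeball the worst few
```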

(actually it adds features like tabs, and new editors that weren’t available before, like R and Markdown editors, if I remember correctly)

but basically JupyterLab is the new front end for the Jupyter server, and the notebook is the “format”.

1 Like

This aligns with the concepts of data drift and model drift. Seldon has a nice library (Alibi Detect) to identify this.
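A minimal sketch of drift detection with Alibi Detect (the arrays here are just stand-ins for your real features):

```python
import numpy as np
from alibi_detect.cd import KSDrift

x_ref = np.random.randn(500, 10)         # stand-in: features seen at training time
cd = KSDrift(x_ref, p_val=0.05)          # Kolmogorov-Smirnov test per feature

x_prod = np.random.randn(200, 10) + 0.5  # stand-in: shifted production features
preds = cd.predict(x_prod)
print(preds['data']['is_drift'])         # 1 if drift was detected
```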

What do you think about Streamlit? I think they have some kind of cloud-hosted dashboards as well. I use it quite often, but they didn’t have the cloud offering until recently.

2 posts were merged into an existing topic: Help: Basics of fastai, PyTorch, numpy, etc :white_check_mark:

Personally, I felt Gradio had a gentler learning curve than Streamlit, but Streamlit has a lot more functionality than the former.
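To illustrate the learning curve, a minimal Gradio sketch serving an exported fastai Learner (the `export.pkl` filename is an assumption):

```python
import gradio as gr
from fastai.vision.all import load_learner, PILImage

learn = load_learner('export.pkl')  # assumes you've exported your Learner

def classify(img):
    pred, idx, probs = learn.predict(PILImage.create(img))
    # Map each class name to its predicted probability
    return dict(zip(learn.dls.vocab, map(float, probs)))

gr.Interface(fn=classify, inputs=gr.Image(), outputs=gr.Label()).launch()
```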

2 Likes

Abhishek Sharma
Is it a pickle file? PyTorch uses .pth? How is it different?

It has the weights (the .pth part), the vocabulary (the classes used), and other things used by the learner.

I haven’t checked on this in years, but I’m wondering if anyone has deployed their models to a mobile device recently. It used to be a PITA having to convert your PyTorch file to ONNX and then to Core ML’s .mlmodel.

In fastai, is it possible to save model files in .pth format rather than using the pickle format?

1 Like

Yes, I’m also a frequent user of Streamlit. For quick POVs at work, Streamlit is good. Gradio is new to me. Is Gradio better than Streamlit?

learn.model will return a PyTorch model, so you can save it any way you want.
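A sketch, assuming `learn` is your Learner:

```python
import torch

# Save only the raw PyTorch weights as a .pth file (no pickled Learner)...
torch.save(learn.model.state_dict(), 'model_weights.pth')

# ...and load them back into a model with the same architecture.
learn.model.load_state_dict(torch.load('model_weights.pth'))
```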

3 Likes

Jeremy spoke about this in the last iteration of the course. He recommends hosting the model on a server and accessing it via an API from the mobile device. The idea is that the server will have more processing power, and this solution is feasible where internet connectivity is not an issue.
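As a sketch of that server side, assuming FastAPI and an exported Learner (`export.pkl` and the endpoint name are placeholders); the mobile app just POSTs an image and reads back JSON:

```python
from fastapi import FastAPI, File, UploadFile
from fastai.vision.all import load_learner, PILImage

app = FastAPI()
learn = load_learner('export.pkl')  # placeholder: your exported Learner

@app.post('/predict')
async def predict(file: UploadFile = File(...)):
    img = PILImage.create(await file.read())  # fastai can build an image from raw bytes
    pred, idx, probs = learn.predict(img)
    return {'label': str(pred), 'confidence': float(probs[idx])}
```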

Yes, you can save the model weights separately, e.g. with learn.save('name'), which writes a .pth file.