Putting this question here from the chat: If you went looking for photos of grizzlies and black bears online (assuming there wasn’t a dataset already made and labelled), what is the best way to ensure these photos aren’t misclassified?
Jeremy then showed the ImageClassifierCleaner, and Nick said it pays to visually inspect the results when using these “open” image searches: results can deteriorate drastically with the inherent ambiguity of either your topic or your search query. Sheik Mohamed Imran said we would have to manually get the losses for the data and sort by them. Or you can peek into the code used for the GUI, which has the same logic.
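The “sort by loss” idea behind the cleaner can be sketched in plain Python. This is a toy illustration (filenames and losses are made up); in fastai you would get per-item losses from the learner’s interpretation object rather than a hand-built list:

```python
# Toy sketch of sorting images by loss to surface likely mislabeled ones.
# The data here is hypothetical; fastai's ImageClassifierCleaner applies the
# same ranking to real per-item losses from your trained learner.

def top_loss_items(filenames, losses, k=3):
    """Return the k items with the highest loss, most suspicious first."""
    ranked = sorted(zip(losses, filenames), reverse=True)
    return [name for loss, name in ranked[:k]]

files  = ["bear_01.jpg", "bear_02.jpg", "bear_03.jpg", "bear_04.jpg"]
losses = [0.12, 2.87, 0.45, 1.63]

print(top_loss_items(files, losses, k=2))  # ['bear_02.jpg', 'bear_04.jpg']
```

High-loss items are the ones the model is most confused about, which is where label errors and junk search results tend to hide.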
Yes, exactly, that’s another framework I was talking about
Somehow, it was difficult to integrate this (autoaug) into my pipeline last time I tried it out, though maybe that was just me. Otherwise the lib is great and a sensible default for augmentations in many projects.
It’s going to depend on the type of training you’re trying to accomplish. In general, if you’re optimising for cost, you’ll want to choose the GPU that allows you to train most efficiently with respect to the memory that your problem requires.
K80s are used by default in Google’s Colab and are a good economy choice while getting started. You can pay up for Colab Pro or Pro+ if you want more memory, better GPUs, or TPUs.
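As a rough illustration of “the memory your problem requires”: a common back-of-envelope estimate for fp32 training with Adam is about four copies of the parameters (weights, gradients, and two optimizer moment buffers) at 4 bytes each, with activation memory on top. This is an approximation, not a precise rule:

```python
def train_memory_gb(n_params, bytes_per_param=4, copies=4):
    """Rough fp32 training footprint: weights + grads + 2 Adam moment buffers.
    Activation memory is extra and depends on batch size and architecture."""
    return n_params * bytes_per_param * copies / 1024**3

# A hypothetical 25M-parameter, ResNet-scale model:
print(round(train_memory_gb(25_000_000), 2))  # ~0.37 GB before activations
```

For small models almost any GPU fits; for large models or big batch sizes, the activation term usually dominates and drives you toward higher-memory cards.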
Shravan Kumar:
Can this cleaning method be applied to only the few images where the loss is highest? Because if we have millions of images it is tedious to clean them all.
There are other approaches that employ coreset sampling, which identify a small but diverse subset of the data and use it for training.
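One classic coreset strategy is greedy k-center (farthest-point) selection: repeatedly pick the point farthest from everything already selected. This toy sketch runs on 2-D points for clarity; in practice you would run it on learned embeddings of the images, not raw coordinates:

```python
import math

def greedy_k_center(points, k):
    """Greedy farthest-point (k-center) selection: repeatedly add the point
    farthest from the already-selected set, yielding a diverse subset."""
    selected = [0]  # start from an arbitrary point
    dists = [math.dist(p, points[0]) for p in points]  # distance to nearest selected
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: dists[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            dists[i] = min(dists[i], math.dist(p, points[nxt]))
    return selected

# Two tight clusters plus an outlier: the picks spread across all three regions.
pts = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 5)]
print(greedy_k_center(pts, 3))  # [0, 3, 4]
```

Because each pick maximises distance to the current subset, near-duplicate images are skipped automatically, which is exactly what you want when distilling millions of images down to a manageable training set.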
What do you think about Streamlit? I think they have some kind of cloud-hosted dashboards as well. I use it quite often, but they didn’t have a cloud offering until recently.
I haven’t checked on this in years but I’m wondering if anyone has deployed their models to a mobile device recently. It used to be a PITA having to convert your PyTorch model to ONNX and then to Core ML (.mlmodel).
Jeremy spoke about this in the last iteration of the lesson. He recommends hosting the model on a server and accessing it via an API from the mobile device. The idea is that the server has more processing power, and this solution is feasible wherever internet connectivity is not an issue.
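The server-plus-API pattern can be sketched with nothing but the Python standard library. The `predict` function below is a stand-in (it returns a hard-coded answer); a real server would load the trained model once at startup and run inference on the image the client points it at:

```python
# Minimal sketch of "host the model on a server, call it from the mobile app".
# predict() is a placeholder; swap in real model inference in practice.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(image_url):
    # Stand-in for real inference on the image fetched from image_url.
    return {"label": "grizzly", "confidence": 0.93}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(payload.get("image_url"))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Start the server on a free port in a background thread, then act as the
# "mobile client" making one POST request against it.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}",
    data=json.dumps({"image_url": "https://example.com/bear.jpg"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
print(result)  # {'label': 'grizzly', 'confidence': 0.93}
server.shutdown()
```

The mobile app then only needs an HTTP client, which keeps the heavy model off the device entirely, at the cost of requiring connectivity.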