I finally wrote up my blog post about creating the guitar classification model. Over the past few days I redid the exercise to incorporate the new data block API, progressive resizing and other goodies of fastai v1.
Please let me know if my description of the one-cycle policy, progressive resizing, etc. is off.
I think the results came out really nice and I’m still amazed how well progressive resizing works!
I tried to implement the collaborative filtering embedding approach from scratch in Keras. I got terrible results on the MovieLens 100k dataset.
This is the accuracy of the model:
These are the losses of the model:
This is the Keras model:
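from keras.layers import Input, Embedding, Dot, Add, Flatten, Activation, Lambda
from keras.models import Model
# num_users, num_items, min_score and max_score are assumed to be computed from the ratings data beforehand
num_factors = 5 # embedding dimensionality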
# input
users_input = Input(shape=(1,))
items_input = Input(shape=(1,))
# embedding
user_weight = Embedding(num_users, num_factors, input_length=1)(users_input)
item_weight = Embedding(num_items, num_factors, input_length=1)(items_input)
# bias
user_bias = Embedding(num_users, 1, input_length=1)(users_input)
item_bias = Embedding(num_items, 1, input_length=1)(items_input)
# the collaborative filtering logic
res1 = Dot(axes=-1)([user_weight, item_weight]) # multiply users weights by items weights
res2 = Add()([res1, user_bias]) # add user bias
res3 = Add()([res2, item_bias]) # add item bias
res4 = Flatten()(res3)
res5 = Activation('sigmoid')(res4) # apply sigmoid to get probabilities
# scale the probabilities to make them ratings
ratings_output = Lambda(lambda x: x * (max_score - min_score) + min_score)(res5)
model = Model(inputs=[users_input, items_input], outputs=[ratings_output])
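For reference, here is what the model above computes for a single (user, item) pair, written as a plain NumPy sketch. The array names and sizes are made up for illustration and the weights are random rather than trained; the only parts taken from the Keras code are the dot product, the two bias terms, the sigmoid and the rescaling. It also assumes the IDs are contiguous 0-based indices, since an embedding lookup is just row indexing.

```python
import numpy as np

def predict_rating(user_vec, item_vec, user_b, item_b, min_score, max_score):
    """Dot product plus biases, squashed by a sigmoid and rescaled to the rating range."""
    raw = np.dot(user_vec, item_vec) + user_b + item_b
    prob = 1.0 / (1.0 + np.exp(-raw))
    return prob * (max_score - min_score) + min_score

# Hypothetical sizes and random (untrained) weights, for illustration only
rng = np.random.default_rng(0)
num_users, num_items, num_factors = 4, 6, 5
user_weights = rng.normal(scale=0.1, size=(num_users, num_factors))
item_weights = rng.normal(scale=0.1, size=(num_items, num_factors))
user_biases = np.zeros(num_users)
item_biases = np.zeros(num_items)

# Embedding lookups are plain row indexing, so IDs must lie in
# [0, num_users) and [0, num_items) respectively
u, i = 2, 3
rating = predict_rating(
    user_weights[u], item_weights[i], user_biases[u], item_biases[i], 1.0, 5.0
)
print(round(float(rating), 3))
```

Because of the sigmoid, every prediction stays strictly inside the (min_score, max_score) interval, and an all-zero embedding gives the midpoint of the range.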
I need to figure out what I missed in order to improve the model. Everything is detailed in this blog post. Any suggestions for improvement?
Continuing our series of updates to our aircraft classifier project, I have added the Data Block API and progressively resized the dataset from 32x32 to 64x64 and finally to 128x128. We are now at 99.3% accuracy. Hooray.
Using the new model I have created this web app. Check it out at: deepair-v2.
I’ve written the following short Medium post describing the details of the process.
The accompanying notebook can be found at this gist.
I have been playing around with audio classification, using bachir’s strategy of transforming the audio signal into an image representing its spectrogram, and then performing transfer learning on those images using the guidelines from the first three lessons. I tried this with the dataset from last year's TensorFlow speech recognition challenge on Kaggle (https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/) and I got an interesting result. The dataset comprises short utterances containing commands such as up, down, stop, go, etc. In my first trial, I excluded the categories unknown and silence to facilitate training.
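For anyone wondering what the signal-to-image step can look like, here is a minimal NumPy sketch of a log-magnitude spectrogram scaled to an 8-bit greyscale array. It is not the exact pipeline used above: the frame length, hop size, Hann window and the synthetic test tone are all assumptions chosen for illustration.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Short-time Fourier transform magnitudes in dB, shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[k * hop : k * hop + frame_len] * window for k in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Small constant avoids log(0) on silent bins
    return 20.0 * np.log10(magnitude + 1e-10).T

def to_uint8_image(spec):
    """Min-max scale to the 0..255 range so the array can be saved as a greyscale image."""
    scaled = (spec - spec.min()) / (spec.max() - spec.min() + 1e-12)
    return (scaled * 255).astype(np.uint8)

# Hypothetical input: one second of a 1 kHz tone sampled at 16 kHz
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = to_uint8_image(log_spectrogram(tone))
print(image.shape)
```

The resulting array can then be written to disk with any image library and fed to the usual image classification workflow.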
The best result was better than the first-place score on the Kaggle private leaderboard from 10 months ago. However, I’d need to include the unknown and silence categories to make a fair comparison.
I also applied the same approach to emotion recognition from speech using the IEMOCAP database (https://sail.usc.edu/iemocap/). This database contains speech signals uttered by actors and labeled with categories such as sadness, happiness, anger and so on. I started with two classes that have a decent amount of data (one thousand samples each) and the first results are encouraging: I got about 93% accuracy differentiating between anger and sadness. I’m curious to see the performance on the entire dataset.
I am working on an object verification problem where new categories need to be added regularly. (It is similar to face verification: we first show the identity card and a model verifies whether the face matches.)
I found two approaches here: how-to-add-a-new-category-to-a-deep-learning-model. It says we can either retrain the model from the old weights with the new category added, or use content-based image retrieval. The latter means deciding the category from the last layer's feature vector (by calculating Hamming and Euclidean distances). The paper describing this technique is: Deep Learning of Binary Hash Codes for Fast Image Retrieval
I have tested the technique on the MNIST dataset, using digits 0 to 7 only; digits 8 and 9 will be added later. For the moment I use only the binarized code of the last layer. I tested with the digit 9 and the results are quite OK: every other digit has low similarity (<60%), except 7, which has 90% similarity with 9. I will continue testing with Euclidean distance rather than this binary Hamming distance.
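To make the similarity measure concrete, here is a small sketch of how a binarized code and a Hamming-based similarity percentage could be computed. The 0.5 threshold, the 8-value code length and the activation values are assumptions for illustration, not the numbers from my experiment.

```python
import numpy as np

def binarize(activations, threshold=0.5):
    """Turn sigmoid-like activations into a 0/1 code (threshold is an assumption)."""
    return (np.asarray(activations) >= threshold).astype(np.uint8)

def hamming_similarity(code_a, code_b):
    """Fraction of bit positions on which two equal-length codes agree."""
    code_a, code_b = np.asarray(code_a), np.asarray(code_b)
    return float(np.mean(code_a == code_b))

# Hypothetical last-layer activations for a query image and a reference image
query = binarize([0.9, 0.1, 0.8, 0.7, 0.2, 0.6, 0.3, 0.95])
reference = binarize([0.8, 0.2, 0.9, 0.4, 0.1, 0.7, 0.6, 0.85])
print(hamming_similarity(query, reference))
```

A similarity of 1.0 means the two codes are identical; the Hamming distance itself is the code length multiplied by one minus this value.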
That is the “data wrangling” part - prepping raw data into a format more useful for analysis or ML models. How you slice and dice depends on what you are trying to do, and the design of your model. It looks like you may want to construct tabular data and perhaps look at relationships of other dimensions based on day of the week and/or time.
If that’s the case, you might be interested in the concept of “embeddings for categorical variables”. Here is a useful blog post and workshop video from Rachel on the subject:
Regarding music spectrograms, can’t we do something like this to perform data augmentation?
We can split each thirty-second audio file into 10 chunks of 3 seconds each. Each window of the song is tagged with the same genre as the original thirty seconds.
If a song had rock as its genre, then all 10 windows that came out of the split will have rock as their genre. This trick gives 10x more data than the original dataset.
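A minimal sketch of the splitting step, assuming the clip is already loaded as a flat sequence of samples; the sample rate and the silent placeholder clip are hypothetical.

```python
def split_into_windows(samples, label, n_windows=10):
    """Cut one clip into equal, non-overlapping windows that all inherit the clip's label."""
    window_len = len(samples) // n_windows
    return [
        (samples[k * window_len : (k + 1) * window_len], label)
        for k in range(n_windows)
    ]

sample_rate = 22050  # hypothetical sample rate
clip = [0.0] * (30 * sample_rate)  # placeholder for a 30-second song
windows = split_into_windows(clip, "rock")
print(len(windows), len(windows[0][0]) / sample_rate)
```

One thing to watch with this trick: windows cut from the same song should probably all go into the same split (train or validation), otherwise the validation score may look better than it really is.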
A little late to the game but here is my experiment with creating and deploying a cuisine-type classifier (5 cuisines, with ~200 images each in train+valid).
Learnings:
Surprisingly, only resnet50 was able to bring the accuracy to >70%. I struggled to get resnet32 beyond 60% accuracy (tried changing lr, batchsize, image_size and data augmentations, and ran more epochs until I saw some small sign of over-fitting)
Increasing the image size from 224 to 299, using flip_vert=True in data augmentation and switching to resnet50 brought the accuracy to 70%
add other popular cuisines (french, italian, etc) and an ‘other’ category for non-food images and check the model performance
web app: add some default sample images that the user can select and test, ability to pass a URL, add GradCAM activation maps, add class prob as a graph
Pro-tip:
If you haven’t already, check out imgcat and imgls. They make working with image files on a remote machine from a TERMINAL so much easier. See https://github.com/olivere/iterm2-imagetools
What do you mean by object verification? Are you saying you have, say, 5 pre-trained categories A, B, C, D, E and, given a new object, you want to be able to classify it as A, B, C, D, E or none of them? The method you are citing is an instance-based method and is used to find similar objects by comparing distances between feature vectors extracted from the images. I am working on an image-similarity problem, so maybe we have the same use case.
Yeah, kind of similar. I want to check whether an object labeled with identity A really is A (a double check). For example, I have a package of apples and want to check that all the items inside are apples and that no oranges are mixed in.
But from time to time I also need to add new categories to my model (F, G, … rather than just A, B, C, D, E), and because new categories will be added regularly, I don’t want to retrain the model very often.
This method can be used to compare just the last feature vector to verify the identity (apparently, we should fine-tune the model so that it represents the characteristics of this dataset)
Fortunately, I think we are working on very similar problems. I’m thinking about how to create a reference feature vector from the training set (at first I will just take the mean) and calculate the Euclidean distance to it. I will post the results when I finish, but I would very much appreciate it if someone could shed some light on this problem
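Roughly what I have in mind for the reference-vector idea, as a sketch. The 3-dimensional feature vectors and the 0.5 threshold are invented for illustration; real features would come from the last layer of the network, and the threshold would have to be tuned on held-out data.

```python
import numpy as np

def reference_vector(features):
    """Mean of the feature vectors belonging to one identity, shape (dim,)."""
    return np.mean(np.asarray(features, dtype=float), axis=0)

def verify(feature, reference, threshold):
    """Accept the claimed identity if the feature lies within `threshold` of its reference."""
    distance = float(np.linalg.norm(np.asarray(feature, dtype=float) - reference))
    return distance <= threshold, distance

# Hypothetical feature vectors for the "apple" identity
apple_features = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [1.2, -0.2, 0.0]]
apple_ref = reference_vector(apple_features)

print(verify([0.9, 0.1, 0.0], apple_ref, threshold=0.5))  # close to the reference
print(verify([0.0, 1.0, 0.0], apple_ref, threshold=0.5))  # far from the reference
```

Adding a new category then only means computing one more reference vector, with no retraining, as long as the existing features separate it well enough.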
So really you don’t trust the label of A and want to verify it.
I would try an unsupervised learner such as a CNN autoencoder and use a bottleneck layer as the feature vector for each image during inference.
First you will need to train it on A to E categories of images to produce a set of filters that can represent your dataset and then you can run inference on each image to obtain the feature vector.
Store these feature vectors, along with their labels, in an approximate nearest neighbour search package like Annoy.
Then given a new image you can run inference on it to extract the feature vector then search the AKNN for the K nearest images and obtain their labels and distances. From here you can rank which label is most likely given K nearest images and distances.
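The lookup-and-rank step could be sketched like this. To keep the example self-contained it uses an exact brute-force search in NumPy as a stand-in for the approximate index, and the inverse-distance weighting is just one possible ranking rule that I am assuming here; the 2-d vectors are hypothetical.

```python
import numpy as np
from collections import defaultdict

def rank_labels(query, vectors, labels, k=3):
    """Rank labels by inverse-distance-weighted votes among the k nearest stored vectors."""
    diffs = np.asarray(vectors, dtype=float) - np.asarray(query, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(distances)[:k]
    scores = defaultdict(float)
    for idx in nearest:
        # Closer neighbours contribute more; the epsilon avoids division by zero
        scores[labels[idx]] += 1.0 / (distances[idx] + 1e-9)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical stored feature vectors with their labels
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
labels = ["A", "A", "A", "B", "B"]

ranked = rank_labels([0.9, 1.0], vectors, labels, k=3)
print(ranked[0][0])
```

With a large collection, the brute-force distance computation would be replaced by a query against the approximate index, and the ranking part stays the same.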
How to deal with new image categories?
For new categories you will want at least 3 images in the category. Run the images from the new category through the network using inference only to extract the feature vectors. Then add the feature vectors to the AKNN. Then for each image in the new category search for the K nearest images. If your algorithm ranks the 2 other images in the category as highest match you should be good to use the current network without fine-tuning. If there is a mismatch then you can fine-tune the autoencoder and recompute the feature vectors for the existing images.
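The check described above can be sketched as follows, again with brute-force search standing in for the approximate index and with made-up 2-d feature vectors. The function name is hypothetical.

```python
import numpy as np

def new_category_is_separable(existing_vectors, new_vectors):
    """True if, for every new vector, its nearest neighbours are the other new vectors."""
    new_vectors = np.asarray(new_vectors, dtype=float)
    all_vectors = np.vstack([np.asarray(existing_vectors, dtype=float), new_vectors])
    n_existing, n_new = len(existing_vectors), len(new_vectors)
    new_ids = set(range(n_existing, n_existing + n_new))
    for offset in range(n_new):
        query_id = n_existing + offset
        distances = np.linalg.norm(all_vectors - all_vectors[query_id], axis=1)
        distances[query_id] = np.inf  # ignore the query itself
        nearest = set(np.argsort(distances)[: n_new - 1].tolist())
        if nearest != new_ids - {query_id}:
            return False  # an existing image is closer than a same-category image
    return True

# Hypothetical feature vectors for the existing categories
existing = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]

well_separated = [[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
overlapping = [[0.05, 0.0], [5.0, 5.0], [5.0, 5.1]]

print(new_category_is_separable(existing, well_separated))
print(new_category_is_separable(existing, overlapping))
```

If the function returns False, that is the signal to fine-tune the autoencoder and recompute the stored feature vectors.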
There are probably more complicated things you could do to avoid having to fine-tune one network and recompute feature vectors, such as having a separate autoencoder per category.
Thank you so much for your suggestion @maral, it is very helpful for me. I was also thinking about using an autoencoder, but because I don’t have much experience with them I first tried an easy method that I know. I will read your post carefully and try it out.
P.S. By the way, do you have some example code or a post that I can read about CNN autoencoders?