That is weird… All the memory sizes are as expected, except the 224x224…
Perhaps you looked at it while your model was frozen… After unfreezing it will usually take more GPU RAM… You can test it by doubling the batch size at 224 and you will get a CUDA OOM error.
I am trying to do my first project: predict the NBA MVP winners. I made training and testing CSV files (based on the last 30 years).
I have a few problems:
1) Each year only one player wins the MVP. How do I add this condition to my program? (Of course, every year exactly one player must win the title; I label 1 for winning the title, 0 otherwise.) A rough sketch of what I mean is at the end of this post.
2) I split my train and test data in this pattern:
2018 data - test
2017 data - train
2016 data - train
2015 data - train
2014 data - test
and so on…
Is this the right way?
3) What is the right way to split between the train and validation sets?
4) What is the right way to define the continuous (cont) and categorical (cat) variables?
Your help would be great!
Here is 2016 training data (Westbrook won the MVP)
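To make question 1 concrete, this is roughly what I imagine the constraint could look like at prediction time (just a sketch with made-up players and numbers; I don't know if this is the right approach): predict an MVP probability per player, then keep only the top-scoring player within each season.

```python
import pandas as pd

# Hypothetical model output: one MVP probability per player per season
preds = pd.DataFrame({
    'year':     [2018, 2018, 2018, 2017, 2017],
    'player':   ['Harden', 'LeBron', 'Davis', 'Westbrook', 'Harden'],
    'mvp_prob': [0.62, 0.55, 0.10, 0.71, 0.40],
})

# Enforce "exactly one winner per year" by taking the argmax within each season
winners = preds.loc[preds.groupby('year')['mvp_prob'].idxmax()]
print(winners)
```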
This is an extremely exciting project… I read your blog post when @radek tweeted about it with a lot of enthusiasm…
I also read the amazing blog post that inspired you to do your project. Thanks for sharing…
This is something that made my heart race…
I think the most exciting part is that the dataset is human-comprehensible, so we can understand the math operations on the embeddings just by looking at the resulting image. People who work on datasets where we and the machines share the same insight, and understand it in the same way, are lucky. Such visualizations are not only useful for debugging; they are exciting because the machine has not only learned to recognize images, it can also tell what the differences between them are…
And I could not resist trying vec2whale operations (like whale - whale = ?; whale + whale = ?). That was after finishing the Kaggle whale competition with a silver medal (which I hadn’t thought was even a possibility for my first serious Kaggle comp)…
Here are a few images of those whale2vec operations… Note that whale-identification features are far more subtle than roads, buildings, etc. The model was trained to identify whales from small scratches and colors, so its embedding math operations should be understood in that context.
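For anyone curious what those operations look like mechanically, here is a minimal sketch (the tensor names and helper function are illustrative, not my actual pipeline): arithmetic on the embedding vectors, followed by a nearest-neighbour lookup to map the resulting vector back to a whale.

```python
import torch
import torch.nn.functional as F

def nearest_whale(vec, emb, ids, exclude=()):
    """Return the whale id whose embedding is closest (by cosine similarity) to `vec`.

    `emb` is an (n_whales, d) tensor of embeddings from the trained identification
    model and `ids` maps row index -> whale id; both are assumed here for illustration.
    """
    sims = F.cosine_similarity(vec.unsqueeze(0), emb)          # (n_whales,)
    for i in sims.argsort(descending=True).tolist():
        if ids[i] not in exclude:
            return ids[i]

# e.g. "whale_a - whale_b + whale_c = ?"
# result = nearest_whale(emb[a] - emb[b] + emb[c], emb, ids,
#                        exclude={ids[a], ids[b], ids[c]})
```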
Hey, I published my own review of the Kaiming paper, without the mathematical derivation but with all the important intuitive concepts. I hope it serves as a complement to @PierreO’s post. You can find it here. Feedback is welcome and appreciated!
Hi, I recently tried to build a joke generator using the NLP stuff taught in course 4.
I used the jokes in wocka.json and stupidstuff.json from this repo.
It didn’t really work that great, but here are some of the generated “jokes”:
Yo mamma so old, she is still in the shower!
There was a blonde who was working on a computer. She was a BLONDE PROGRAMMER.
What do you call a man with a dog in his mouth? - An Irishman.
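For reference, the pipeline was essentially the standard fastai v1 language-model recipe. A rough sketch (file names, column names and hyperparameters here are illustrative, not the exact ones I used):

```python
from fastai.text import *   # fastai v1
import pandas as pd

# One joke per row in a 'text' column, built from wocka.json and stupidstuff.json
df = pd.read_csv('jokes.csv')
valid_df = df.sample(frac=0.1, random_state=42)
train_df = df.drop(valid_df.index)

data_lm = TextLMDataBunch.from_df('.', train_df=train_df, valid_df=valid_df,
                                  text_cols='text')

# Fine-tune the pretrained AWD-LSTM language model on the jokes
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(3, 1e-3)

# Sample "jokes" from the fine-tuned model
print(learn.predict('Yo mamma', n_words=30, temperature=0.8))
```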
Hi, I just reviewed the Transformer-XL paper and architecture which is implemented in fastai. The improvement over Transformer is quite interesting and makes a lot of sense. You can find it here. Let me know what you think!
Hi @lesscomfortable, I’m not familiar with the Transformer architecture; I will definitely take a look, but I’m not doing text classification.
I’m trying to train a classifier that takes the poster of a movie and tries to predict the genre
Hi all - happy to share my results using FastAI, ResNet152 and a lot of differential learning rate cycles on a cellular histopathology dataset - up to 100% accuracy!
I saw two papers on this dataset and noticed that in both cases the CNNs they were using were pretty bland and, in addition, followed very standard practice (a simple fixed learning rate, etc.). One was from summer 2018, so while relatively recent, I could see that the techniques we are learning here are way ahead of the curve.
I thus took it up as a challenge to apply FastAI to it. Interestingly, I started with ResNet50 and while it got to 98% and was very stable there, it got stuck on two very similar classes and could not move beyond them. I ultimately had to restart with ResNet152.
That still took a lot of cycles with the learning rate finder and differential learning rates, and a very steady train / check learning rate / retrain process, but I did manage to tune it to repeated 100% results and thus outdo both of the papers in accuracy by a reasonable margin.
(91-95% was their best, and in one case they oddly only tested subsets of 4 classes to get that averaged 91%, not all 20 at once).
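For anyone wanting to reproduce the workflow, it was essentially the usual fastai v1 loop of lr_find plus differential learning rates, repeated many times. A sketch (paths, cycle counts and learning rates are illustrative; the real values came from re-running lr_find at each step):

```python
from fastai.vision import *   # fastai v1

data = ImageDataBunch.from_folder('histopathology_images', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224, bs=32
                                 ).normalize(imagenet_stats)

learn = cnn_learner(data, models.resnet152, metrics=accuracy)
learn.fit_one_cycle(4)                       # train the head first

# Repeat: find a new learning rate, retrain with differential LRs, check results
learn.unfreeze()
learn.lr_find()
learn.recorder.plot()
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))
```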
After completing lecture 1 of FastAI I decided to build a data set of Irish Gaelic football team players.
The goal of my computer vision classification model was to distinguish which players played for which team. Out of interest I built and trained my model using both Keras and FastAI.
**FastAI Classification Model:** (Best Result: 92% Validation Accuracy) Link to Colab File
**Keras Classification Model:** (Best Result: 91% Validation Accuracy) [Link to Colab File]
Some practical things which improved my model:
Data augmentation seems to work very well for small data sets (+8% Acc Improvement) - see the sketch after this list
Fine-tuning the learning rate seems to work well (this feels like a very important hyperparameter)
Tracking the train & validation error allowed me to diagnose underfitting issues. Adding more capacity to my custom CNN network in Keras gave me some big improvements (+12% Acc Improvement).
While lower epoch counts were easier for debugging & experiments, I found that increasing the number of epochs for my final models gave better results. I could see this trend in the training curves (+4% Acc Improvement)
I completed a visual error analysis of the misclassified images with the biggest loss. I removed images from the validation set which were clearly mislabelled. (+12% Acc Improvement)
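Here is a sketch of the augmentation setup on the Keras side (the parameter values and directory names are illustrative rather than the exact ones from my notebook; the FastAI model just used get_transforms()):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment only the training images: small rotations, shifts, zooms and flips
train_gen = ImageDataGenerator(rescale=1./255,
                               rotation_range=10,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.1,
                               horizontal_flip=True)
train_data = train_gen.flow_from_directory('gaelic_players/train',
                                           target_size=(224, 224),
                                           batch_size=16,
                                           class_mode='categorical')

# The validation generator only rescales - no augmentation at evaluation time
val_data = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'gaelic_players/valid', target_size=(224, 224),
    batch_size=16, class_mode='categorical')
```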
Some other takeaways from the project:
I was surprised by how similar my results were between Keras and FastAI. I used a custom CNN architecture in Keras, while I used a ResNet50 architecture with FastAI (so it wasn’t a like-for-like comparison).
Building custom image datasets isn’t as hard as I thought it would be and it’s far more rewarding than using out of the box datasets
Google Colab seems super useful for these quick, hacky projects
Did you finish running the notebook dl2/imdb.ipynb? Can you share the model outputs with me, such as lm_last_ft, lm1, lm1_enc, clas_0, clas_1, clas_2…
It took me many days to run all of the fit() commands. Please help me out if you have these model files, thanks.
I tried to create a multi-label classifier that takes a movie poster as its input and predicts the different genres of that movie.
I thought it was an interesting experiment regardless of the results and definitely learned a lot while doing it.
You can find my final version in this repo. Any tips or recommendations are welcome.
My final result is an F2 score of ~0.59, but I’m not sure how to evaluate that. I didn’t find other classifiers to compare my results against, so if anyone knows about other solutions I’d love to see them.
I found that ClassificationInterpretation was lacking some functionality for multi-label classifiers, so I created some functions manually, but if anyone knows a better way to interpret results in multi-label problems I’d appreciate it.
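For context, the setup follows the standard fastai v1 multi-label recipe, roughly as below (the CSV layout, column names and the 0.2 threshold are illustrative; the full version is in the repo):

```python
from fastai.vision import *   # fastai v1
from functools import partial

# posters.csv: one row per movie, 'poster' = image file name, 'genres' = 'Action|Comedy|...'
data = (ImageList.from_csv('posters', 'posters.csv', cols='poster')
        .split_by_rand_pct(0.2)
        .label_from_df(cols='genres', label_delim='|')
        .transform(get_transforms(), size=224)
        .databunch(bs=32)
        .normalize(imagenet_stats))

# fbeta with beta=2 is the F2 score reported above; the threshold is a free choice
f2 = partial(fbeta, beta=2, thresh=0.2)
learn = cnn_learner(data, models.resnet50, metrics=[accuracy_thresh, f2])
learn.fit_one_cycle(5)
```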
Great work @oguiza. I am reading the whole thread in the TS study group and am going to try out your examples on my dataset. One concern: the paper link looks to be broken:
I have trained a classifier, on top of a fine-tuned language model, which assigns content labels to basic restaurant descriptions. So you have a text like:
The three star coffee shop, The Eagle, gives families a mid-priced dining experience featuring a variety of wines and cheeses. Find The Eagle near Burger King.
This is the dataset used for the E2E Natural Language Challenge, which consists of 50k <text, content labels> pairs. The classifier achieves an F-score of 92%, thanks to the gradual unfreezing of the layers.
You can find out more in my kaggle kernel. Comments are welcome.
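The gradual unfreezing follows the ULMFiT recipe from the course. A sketch of that part (the file layout, learning rates and cycle lengths are illustrative; the exact values are in the kernel):

```python
from fastai.text import *   # fastai v1
import pandas as pd

# Hypothetical layout: one description per row plus its '|'-separated content labels
df = pd.read_csv('e2e_pairs.csv')          # columns: 'text', 'labels'
data_clas = TextClasDataBunch.from_df('.', train_df=df[:45000], valid_df=df[45000:],
                                      text_cols='text', label_cols='labels',
                                      label_delim='|')

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn.load_encoder('fine_tuned_enc')       # encoder saved from the fine-tuned LM

learn.fit_one_cycle(1, 2e-2)                          # classifier head only
learn.freeze_to(-2)                                   # unfreeze one more layer group
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))
learn.unfreeze()                                      # finally train the whole model
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
```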
Thanks @marcello_m! Could you please point me to the reply where this link appears? I don’t have any context and don’t know which paper is being linked.
That is super impressive. Exploration in latent space is an intuitive way of explaining the network’s output. The whale output under addition/subtraction gives you a fuzzy reassurance that the network has indeed learned something. Congrats on the gold medal.
At some point I want to get back to the experiment and pretrain the loc2vec network on multi-label classification. I have a strong hunch that it would perform far better and be easier to train, as it would have seen relevant content and learned from it.