Share your work here ✅

This is a pet project I am involved in. I’m not sure if this is the right place to post it, but it’s loosely inspired by my learning fastai, so I thought it might be interesting:

We felt it’s important to keep up to date with recent machine learning discussions across the net, so I helped write a site that collects this kind of content: hype.machlearning.net

It can run some interesting queries, e.g. changes to the SotA in the last month, sorted by “top” (meaning first places first):

https://hype.machlearning.net/?so=top&t=1m&SotA=

It also knows which arXiv papers were written by which group, so you can e.g. see all discussed papers written by Google in the last 3 months, ordered by date:

https://hype.machlearning.net/?so=new&t=3m&a=Google
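
The two links above suggest a simple query-string scheme. Here is a minimal Python sketch for building such URLs. The meanings of `so` (sort order), `t` (time window) and `a` (author group) are inferred from those two examples only, and `build_query` is a hypothetical helper, not something the site provides.

```python
from urllib.parse import urlencode

BASE_URL = "https://hype.machlearning.net/"

def build_query(sort, period, **filters):
    """Build a site URL from a sort order, a time window and optional filters."""
    params = {"so": sort, "t": period, **filters}
    return BASE_URL + "?" + urlencode(params)

# SotA changes in the last month, first places first
print(build_query("top", "1m", SotA=""))
# Papers attributed to Google in the last 3 months, newest first
print(build_query("new", "3m", a="Google"))
```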

It uses a sentiment model to decide which Twitter messages are related to machine learning. It also tries to find the most significant phrase in the conclusion of an arXiv paper (Could it change the SotA? What are the problems with this approach?) and displays it next to the paper’s title.

1 Like

I have worked on news categorization with the AG News dataset using the Fast.ai library and got an accuracy of 93%. You can check out the GitHub repo here.

2 Likes

Try ArcFace loss.
I have implemented it in the following kernel, which I created for identification of whale species using the tail. The dataset has a whopping 5004 classes. I currently stand at a score of ~93 on the test set.
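
For anyone who hasn't met ArcFace before, here is a rough single-sample sketch of the idea in plain Python. It is not the code from the kernel: it assumes the inputs are already cosine similarities between a normalized embedding and each normalized class weight, and it uses the scale `s=64` and margin `m=0.5` from the ArcFace paper as defaults. Real implementations also handle the case where the angle plus the margin exceeds pi, which is skipped here.

```python
import math

def arcface_logits(cosines, target, s=64.0, m=0.5):
    """Scale cosine similarities by s, adding the angular margin m to the target class only."""
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding errors
        if j == target:
            theta = math.acos(c)
            logits.append(s * math.cos(theta + m))
        else:
            logits.append(s * c)
    return logits

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[target]

cosines = [0.8, 0.1, -0.2]  # made-up similarities; class 0 is the true class
with_margin = cross_entropy(arcface_logits(cosines, 0, s=10.0), 0)
without_margin = cross_entropy(arcface_logits(cosines, 0, s=10.0, m=0.0), 0)
print(with_margin, without_margin)  # the margin makes the same prediction cost more
```

The margin penalizes the true class in angle space, so the model has to pull same-class embeddings closer together to reach the same loss.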
https://www.kaggle.com/jaideepvalani/arcface-humpback-customhead-fastai-score919/

3 Likes

From your diagram it looks like you cannot achieve super-convergence. Just run learn.lr_find() again, and let us know if the results are different. It seems to me that this plot gives different results every time I repeat:

learn.lr_find()
learn.recorder.plot()

You can always use learn.fit_one_cycle(1, max_lr=slice(3e-5,3e-4)), which should be a safe default.
I am not sure why the plot is always different. Maybe I am missing something.

1 Like

Hey, interesting project and my mother had the same problem! :smiley:

A tip for everybody who has a small dataset

@kodzaks and @mididou

If you have a small dataset, maybe you can try the powerful mixup technique that is already integrated in fastai. I had EEG data and experimented with very limited data, and my accuracy went from 75% to 83% just by adding .mixup() after creating the learner, like in this example:

from the fastai docs:

Mixup data augmentation

What is Mixup?

This module contains the implementation of a data augmentation technique called Mixup. It is extremely efficient at regularizing models in computer vision (we used it to get our time to train CIFAR10 to 94% on one GPU to 6 minutes).

As the name kind of suggests, the authors of the mixup article propose to train the model on a mix of the pictures of the training set. Let’s say we’re on CIFAR10 for instance, then instead of feeding the model the raw images, we take two (which could be in the same class or not) and do a linear combination of them: in terms of tensor it’s

new_image = t * image1 + (1-t) * image2

where t is a float between 0 and 1. Then the target we assign to that image is the same combination of the original targets:

new_target = t * target1 + (1-t) * target2

assuming your targets are one-hot encoded (which isn’t the case in pytorch usually). And that’s as simple as this.
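
To make the two formulas above concrete, here is a small plain-Python sketch on toy data. It is not fastai's implementation: the pixel values are made up, `one_hot` and `mixup_pair` are hypothetical helpers, and `alpha = 0.4` is an assumed value for the Beta distribution from which the mixup paper draws `t`.

```python
import random

def one_hot(label, num_classes):
    """Return a one-hot target vector as a list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def mixup_pair(x1, y1, x2, y2, t):
    """Blend two samples and their one-hot targets with the same weight t."""
    new_x = [t * a + (1 - t) * b for a, b in zip(x1, x2)]
    new_y = [t * a + (1 - t) * b for a, b in zip(y1, y2)]
    return new_x, new_y

# Two tiny flattened "images" with made-up pixel values; class 0 = dog, class 1 = cat
dog = [1.0, 0.8, 0.6, 0.4]
cat = [0.0, 0.2, 0.4, 0.6]
new_image, new_target = mixup_pair(dog, one_hot(0, 2), cat, one_hot(1, 2), t=0.7)
print(new_target)  # roughly 70% dog and 30% cat

# During training t is random; the paper samples it from Beta(alpha, alpha)
alpha = 0.4  # assumed value
t = random.betavariate(alpha, alpha)
```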

[Image: mixup]

Dog or cat? The right answer here is 70% dog and 30% cat!

As the picture above shows, it’s a bit hard for a human eye to comprehend the pictures obtained (although we do see the shapes of a dog and a cat) but somehow, it makes a lot of sense to the model which trains more efficiently. The final loss (training or validation) will be higher than when training without mixup even if the accuracy is far better, which means that a model trained like this will make predictions that are a bit less confident.

Example Training

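from fastai.vision import *  # fastai v1

# `data` is assumed to be an image DataBunch created beforehand
model = simple_cnn((3,16,16,2))
learner = Learner(data, model, metrics=[accuracy]).mixup()
learner.fit(8)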

================================
This powerful technique needs more visibility… I hope Jeremy will be kind enough to mention mixup augmentation in one of the awesome lectures that we are enjoying in the part2 v3 course…

================================

@kodzaks
How many images do you have in your dataset, and how many classes? What is your best accuracy so far? If mixup works for you, I would love to know…

I love any project related to sounds and waves… FFT, and understanding how speech and different musical sounds are created by mixing only pure sine waves with different frequencies, phases and amplitudes, has intrigued me since I was a kid… This was something that I was dying to know and nobody could help (pre-internet era)… Several years later, when I got into college, I could finally understand it, and I implemented FFT on my old MSX2 computer in BASIC and did some FIR filtering on the waves… That was truly a joy that I still remember vividly… Kids are lucky now that anything they want to know is only a few clicks away…

13 Likes

That’s a clever workaround… I had the same issue when interpretation could not run on fp16 models.

Thanks for sharing!

That is weird… All the memory sizes are as expected, except the 224x224…

Perhaps you looked at it when your model was frozen… After unfreezing, it will usually take more GPU RAM… You can test it by doubling the batch size at 224 size, and you will get a CUDA OOM error.

NBA MVP predictions

Hi everybody!

I am trying to do my first project: predict the NBA MVP winners. I made training and testing CSV files (based on the last 30 years).

I have a few problems:

  1. Each year only one player wins the MVP. How do I add this condition to my program? (Of course, every year exactly one player must win the title: 1 for winning the title, 0 otherwise.)

  2. I split my train and test data in this pattern:
    2018 data - test
    2017 data - train
    2016 data - train
    2015 data - train
    2014 data - test
    and so on…
    Is this the right way?

  3. What is the right way to split between the train and validation sets?

  4. What is the right way to define the cont and cat variables?
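
One possible way to approach questions 1 to 3, sketched in plain Python under some assumptions: the model outputs one score per player per season, whole seasons are held out for validation, and the "exactly one MVP per year" rule is applied after prediction by taking the top score in each season. The rows, player names and scores below are placeholders, and `split_by_season` and `pick_mvp_per_season` are hypothetical helpers.

```python
def split_by_season(rows, valid_seasons):
    """Keep whole seasons together so one season never leaks across the split."""
    train = [r for r in rows if r["season"] not in valid_seasons]
    valid = [r for r in rows if r["season"] in valid_seasons]
    return train, valid

def pick_mvp_per_season(rows, scores):
    """Enforce one winner per season by taking the highest-scoring player."""
    best = {}
    for row, score in zip(rows, scores):
        season = row["season"]
        if season not in best or score > best[season][1]:
            best[season] = (row["player"], score)
    return {season: player for season, (player, _) in best.items()}

rows = [
    {"season": 2017, "player": "player_a"},
    {"season": 2017, "player": "player_b"},
    {"season": 2018, "player": "player_c"},
    {"season": 2018, "player": "player_d"},
]
scores = [0.20, 0.65, 0.90, 0.85]  # placeholder model outputs, one per row

train, valid = split_by_season(rows, valid_seasons={2018})
print(pick_mvp_per_season(rows, scores))
```

Note that in the 2018 rows both scores are above 0.5, so a plain 0/1 threshold would name two winners; picking the maximum per season avoids that.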

Your help would be great!

Here is 2016 training data (Westbrook won the MVP)

Part of my code:

1 Like

This is an extremely exciting project… I read your blog post when @radek tweeted about it with a lot of enthusiasm…

I also read the amazing blog post that inspired you to do your project. Thanks for sharing…

This is something that made my heart palpitate…
I think the most exciting part is that the dataset is human comprehensible, so we can understand the math operations on the embeddings just by looking at the resulting image. People who work on datasets where both we and the machines share the same insight are lucky. Such visualizations are not only useful for debugging, they also show you that the machine not only learned to recognize images, but can understand what the differences between them are…

And I could not resist trying vec2whale operations (like whale - whale = ? ; whale + whale = ?). That was after finishing the Kaggle whale competition with a silver medal (which I didn’t think was even a possibility for my first serious Kaggle competition)…

Here are a few images of those whale2vec operations… Note that the whale identification features are way more subtle than roads, buildings, etc… The model has been trained to identify whales from small scratches and colors, so its embedding math operations should be understood in this context.

WHALE2VEC operations:

Minus:

(X) whale - (Y) whale = (Z) whale

Colored - colored = Black and white

The following minus operation is interesting:

Mostly whitish - middle whitish and blackish on edges = middle blackish and whitish on edges (just what you would expect from a minus operation)

Plus:

(X) whale + (Y) whale = (Z) whale

Middle white (with streaks) + edge white = ALL white (with streaks)
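
For readers who want to try similar embedding arithmetic, here is a minimal plain-Python sketch. The post does not say how the resulting (Z) whale was looked up, so the nearest-neighbour search by cosine similarity below is an assumption, and the 3-dimensional embeddings are toy values rather than real model outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(query, embeddings, exclude=()):
    """Name of the embedding most similar to query, skipping excluded names."""
    candidates = {k: v for k, v in embeddings.items() if k not in exclude}
    return max(candidates, key=lambda k: cosine(query, candidates[k]))

# Toy embeddings; a real model would produce much longer vectors
embeddings = {
    "whale_x": [0.9, 0.1, 0.8],
    "whale_y": [0.1, 0.1, 0.7],
    "whale_z": [0.7, 0.0, 0.1],
    "whale_w": [0.0, 0.9, 0.2],
}

# (X) whale - (Y) whale, then look up the closest remaining whale
diff = [a - b for a, b in zip(embeddings["whale_x"], embeddings["whale_y"])]
print(nearest(diff, embeddings, exclude=("whale_x", "whale_y")))
```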

2 Likes

Hey, I published my own review of the Kaiming paper, without the mathematical derivation but with all the important intuitive concepts. I hope it serves as a complement to @PierreO’s post. You can find it here. Feedback is welcome and appreciated!

5 Likes

Hi, I recently tried to build a joke generator using the NLP stuff taught in course 4.
I used the jokes in wocka.json and stupidstuff.json from this repo.

It didn’t really work that well, but here are some of the generated “jokes”:

Yo mamma so old, she is still in the shower!
There was a blonde who was working on a computer. She was a BLONDE PROGRAMMER.
What do you call a man with a dog in his mouth? - An Irishman.

I know reaaaally good ones!

5 Likes

Me too, it seems to always be different for no apparent reason. I can’t understand why. Could it be because I forgot to set a seed?

numpy.random.seed(42)
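
Seeding NumPy alone may not be enough, since Python's own `random` module and PyTorch keep separate random states. Here is a hedged sketch of a helper that seeds all three when they are installed. `seed_everything` is a hypothetical name, and this may reduce run-to-run variation rather than remove it completely, because things like a freshly initialized model head or some GPU operations can still differ.

```python
import random

def seed_everything(seed=42):
    """Seed Python, and NumPy / PyTorch if they are available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed, nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed, nothing to seed

seed_everything(42)
```

Calling it right before creating the learner and again before each learn.lr_find() run would be the thing to try first.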

Hey @prosti, thanks for your reply. It turns out I was making all kinds of mistakes in my code.

I ended up giving up on this dataset from Google Images and started to work on another one. I will post my results shortly.

Hi, I just reviewed the Transformer-XL paper and architecture which is implemented in fastai. The improvement over Transformer is quite interesting and makes a lot of sense. You can find it here. Let me know what you think!

2 Likes

Hi @lesscomfortable I’m not familiar with Transformer architecture, will definitely take a look but I’m not doing text classification.
I’m trying to train a classifier that takes the poster of a movie and tries to predict the genre

Hey! That’s a computer vision problem, my post is on NLP.

:heart_eyes: :pray: :pray: Thank you @hwasiti, great tip, I definitely need to delve more into the documentation.

1 Like

I managed to get 98% accuracy classifying speakers from a small dataset! More details:

2 Likes

Hi all - happy to share my results in using FastAI and Resnet152 and a lot of differential learning rate cycles on a cellular histo-pathology dataset - up to 100% accuracy!

I saw two papers on this dataset and noticed that in both cases the CNNs they were using were pretty bland, and they followed the very standard practice of a simple fixed learning rate, etc. One was from summer 2018, so while it is relatively recent, I could see that the techniques we are learning here are way ahead of the curve.
I thus took it up as a challenge to apply FastAI to it. Interestingly, I started with ResNet50, and while it got to 98% and was very stable there, it became stuck on two very similar classes and could not move beyond that. I ultimately had to restart with ResNet152.
That still took a lot of cycles with the learning rate finder and differential learning rates, and a very steady train / check learning rate / retrain process, but I did manage to tune it to repeated 100% results and thus outdo both of the papers in accuracy by a reasonable margin.
(91-95% was their best, and in one case they oddly only tested subsets of 4 classes to get that averaged 91%, not all 20 at once.)
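
As a side note on differential learning rates: when you pass something like slice(3e-5, 3e-4), the rates are, as far as I understand fastai v1, spread geometrically across the layer groups. The plain-Python sketch below only illustrates that spacing; `spread_learning_rates` is a hypothetical helper, and the range and the three groups are example values, not the settings used in this project.

```python
def spread_learning_rates(lo, hi, n_groups):
    """Geometrically spaced learning rates from lo (earliest layers) to hi (head)."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

# Example: three layer groups between 3e-5 and 3e-4
lrs = spread_learning_rates(3e-5, 3e-4, 3)
print(lrs)
```

The earliest layers get the smallest rate because their pretrained features usually need the least adjustment, while the new head gets the largest.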

8 Likes