Share your work here ✅

Thank you very much for the great articles!
I really enjoy them! :smiley:


Hey @r2d2, nice work!

I noticed your notebook only works on fastai versions <= 1.0.22. Is there any chance of updating it to the current version? What would need to change to make it work?

Posting this here so it gets more eyes from those who might find it useful. Forgive the spam; happy to clean it up.

Find below my implementation of WideResNet from scratch, modeled after Lesson 7. Trying to figure out what papers are saying and how to implement them is really teaching me a lot. Some takeaways:

  • Darknet really is fast – really
  • Though I did not test this myself, extrapolating from WRN’s performance in this notebook, I can see how it would outperform freshly trained ResNets with a similar number of parameters; based on this and other results from the paper, I will seriously consider widening a network before deepening it if I ever feel the need to go beyond 50 layers
  • As you can see, train_loss consistently hovers above valid_loss in later epochs, while valid_loss continues to drop; my suspicion is that the dropout layers are really helping the network generalize – bears further investigation



Check out my new articles on the implementation of Gated Recurrent Units (GRUs) using Electronic Health Records (EHR) for predicting future diagnosis codes.

Part 1: Generating artificial EHR data

Part 2: Pre-processing artificially generated EHR data

Part 3: Minimal implementation of Edward Choi et al.’s 2016 Doctor AI paper.


Thanks! I won’t have time in the near future to fix this, but I think you should be able to do this yourself as well :slight_smile: I did nothing more than pass a dataset through the trained model, save the activations of the last hidden layer, and then apply PCA to those. My approach was quite hacky so I’m sure you’ll find an even better way to do it!
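The hacky pipeline described above (save hidden-layer activations, then apply PCA) can be sketched roughly like this. This is my own illustration, not the original code: `pca_project` and the toy activation matrix are hypothetical, and here PCA is done directly via SVD rather than a library class.

```python
import numpy as np

def pca_project(activations, n_components=2):
    """Project activation vectors onto their top principal components.

    activations: (n_samples, n_features) array of hidden-layer outputs,
    e.g. collected while passing a dataset through a trained model.
    """
    # Center the activations, then use SVD to find the principal directions.
    centered = activations - activations.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Rows of vt are principal axes sorted by explained variance.
    return centered @ vt[:n_components].T

# Toy example: 100 fake "activation" vectors of dimension 16.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 16))
coords = pca_project(acts)
print(coords.shape)  # (100, 2)
```

The 2-D coordinates can then be scatter-plotted to visualize how the model groups the inputs.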

Hello everyone,

I want to share a project I’ve written, in which I develop a data augmentation technique.

The approach, which I call “CDA” (for Combinatorial Data Augmentation), is designed for classification problems in which the number of labelled images is very low, even for standard transfer learning: for example, 24 labelled images in total for a binary classification problem (it works with even fewer).

In the README and jupyter notebook:

I describe the technique in detail, and test it thoroughly on the Fashion MNIST dataset, but I’ll describe it briefly here as well.

Just a note: I have not read about this approach elsewhere, so to the best of my knowledge, this is a novel approach. Please let me know if you’ve seen anything similar somewhere else.

I focus on the binary case, but the same ideas can be applied for multi-class problems.

Suppose we want to classify images from these two categories (these are two classes in Fashion MNIST):

And suppose we have only 12 images from each class, so 24 labelled images in total. We could leave 8 images for validation and use the other 16 for transfer learning on resnet34, tuning the weights of the last layer. That’s going to be the benchmark, but it is unlikely to yield very accurate results on a Test set (disjoint from Train and Valid) due to the low number of examples.

What CDA does is produce a large set of collages from the 16 images used for training. These collages are simply 3x3 arrays of images drawn randomly from the 16 available for training (they could also be 2x2, 4x4, etc.).


Because there are 9 = 3x3 locations and 16 images available, the number of different collages is 16^9 > 10^{10}. We would obviously not consider all possible collages, only a sufficiently large subset. The label of a collage is the class that occurs most often among its 9 images. The combinatorial nature of this procedure for generating collages gives the technique its name.

Once the desired number of collages is generated, transfer learning from resnet34 is applied to them, and because the number of collages is now large, one can tune more than just the last layer of resnet34. This yields a neural network N_{alt}.

Finally, one applies transfer learning to N_{alt} using the 16 original training images, just as was done for the benchmark, adjusting the weights of the last layer.

The intuition is that because N_{alt} has learned to classify collages according to their majority class, and has done so over thousands (potentially millions) of images, it has already learned a lot of useful patterns related to the problem of interest. Therefore, further transfer learning starting from N_{alt}, based on the 16 original images, is likely to do a better job of classifying the two original classes.
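The collage-generation step can be sketched as follows. This is my own minimal reading of the description, not the author's code; the function and variable names are hypothetical, and real collages would be built from actual dataset images.

```python
import numpy as np

def make_collage(images, labels, grid=3, rng=None):
    """Build one grid x grid collage from randomly chosen training images.

    images: list of (H, W) arrays, all the same shape.
    labels: matching list of class labels (0 or 1 in the binary case).
    The collage label is the majority class among the chosen tiles.
    """
    rng = rng or np.random.default_rng()
    # Sample grid*grid tile indices with replacement from the training pool.
    idx = rng.integers(0, len(images), size=grid * grid)
    rows = [np.concatenate([images[i] for i in idx[r * grid:(r + 1) * grid]], axis=1)
            for r in range(grid)]
    collage = np.concatenate(rows, axis=0)
    # Majority vote over the tile labels decides the collage label.
    label = int(np.bincount([labels[i] for i in idx]).argmax())
    return collage, label

# Toy example: 16 fake 28x28 "images", 8 per class.
rng = np.random.default_rng(0)
imgs = [rng.random((28, 28)) for _ in range(16)]
labs = [0] * 8 + [1] * 8
c, y = make_collage(imgs, labs, grid=3, rng=rng)
print(c.shape, y)  # (84, 84) and a 0/1 label
```

Repeating this a few thousand times yields the augmented training set fed to resnet34.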


Easy pair

For example, working with an easy pair: coat vs boot:

these are some of the collages obtained:


And the following shows the error rates on Test set (1000 images from each class) for both the benchmark and CDA, for different values of m=|X_t|:


As m increases, both the benchmark and CDA decrease their error rates, but CDA’s is always significantly lower. Here is the ratio between their error rates (CDA/Benchmark):


which goes from 0.5 to below 0.1.

Hard pair:

Similarly, for a hard pair: pullover vs shirt:

we get collages such as:


Error rates are still significantly lower for CDA:


More results in the notebook!


It seems really interesting.

Did you also try using different grid sizes instead of the 3x3 collage grid you mention?

Thanks for sharing!


Hi! Thanks for the feedback @danielegaliffa!

In the notebook I used only 2x2, 3x3, and 4x4 arrays; I did not try larger ones. They don’t have to be square either; they could be rectangular, e.g., 2x3. I used square grids so that I only have one parameter instead of two. I tried to use grids as small as possible, so I went to 3x3 or 4x4 only when the number of labelled images was very small. As it grows larger, say to 64, then 2x2 is enough to create enough collages.

But that’s definitely something I should explore: the impact of using a larger q for a fixed number of images.

Hey people,

I wanted to share my work here.
I have written a React front-end application that uses canvas to collect a drawing and make a prediction on it. The API is written in Python using Flask; the API is deployed on Render and the front end on Netlify.

I would happily write up a blog or tutorial on it if anyone is interested.


This is very interesting! What happens if the background is not black - does joining different images confuse the model? What if there are classes that come out of the side of the image - then combining the images doesn’t make too much sense. How fast is it to combine the images into a collage and feed into a minibatch?

Obviously I don’t expect you to have all the answers but rather they’re just some questions to think about :slight_smile:

Hi Tom,
Thanks for the feedback and questions!

Generating about 50,000 collages takes a few minutes, but I am sure my function for doing this is not very efficient. Regarding the other questions, I really cannot tell, because I have only tested this thoroughly on the Fashion MNIST dataset. I would love to try it on another dataset to find where the CDA approach fails. There could certainly be issues with images having characteristics like the ones you mention.


This question does not exist.

Language model trained on a data dump of Stack Overflow that mimics coding questions.

If you find a good one you can share it with the permalink (click the “Fresh Question” button to get a new one).

Interesting things I’ve noticed so far:

  • It does a remarkably good job of context switching between programming languages based on the semantics of the question! If the question is about SQL it often includes SQL in < code > tags. If it’s about JavaScript it will include JavaScript! The syntax isn’t perfect due to the tokenizer mangling some things but it’s pretty close!
  • The grammar isn’t perfect but it’s pretty good.
  • It doesn’t seem to lose track of closing tags and quotes.

I wrote a full writeup here. And have posted the pretrained model for people to download and play with as well.

Edit: Stack Roboflow made it to the front page of hacker news!!!
Edit 2: now it’s #1 on /r/programming!


That seems wonderful, Brad. Did you use the whole dump? What was your data cleaning process?

I was trying something similar and right now I’m looking for resources to use image and tabular data together for training.

Awesome, did it converge at 68% accuracy?

I’m surprised that it has such high accuracy. Has it overfitted to the specific language of Stack Overflow?

Hi everyone,

It’s my first time posting in this forum, so I wanted to say many thanks to everyone who is making this course so great.
I wanted to share what I have been working on for week 2.

I wanted to see if I was able to create a classifier that could distinguish between paintings from Monet, Manet, and Renoir.

Since they are from the same art movement (impressionism), it can be quite challenging to see the different styles.
I can get up to a 0.80 accuracy, which is not very high, but I believe that with a better training set it could improve.

The notebook is available here:
I have also managed to put it into production on Render:

If anyone has input on how to improve the model, I am very open to suggestions!

I have just seen that @mrandy (Share your work here ✅) has done something similar, really cool!



No, unfortunately I ended up having to train on a subset (~1/16 of the available data) because the full amount wouldn’t fit in memory.

In retrospect I made a poor choice in choosing the first 1/16 of the data because now it doesn’t know about more recently invented programming topics. If the site gets popular I’ll be improving the model going forward though!

No manual data cleaning. I did have to convert it from XML to CSV, so I stripped the newlines and replaced them with a new special XXNL token. And I filtered out the “answers”, which were in the same data dump, to use at a later date.
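That newline step can be sketched like this, assuming XXNL as described; the helper names and the restore function are my own additions, not the original code.

```python
def flatten_newlines(text, token=" XXNL "):
    """Replace raw newlines with a special token so each record fits on one CSV line."""
    return text.replace("\r\n", "\n").replace("\n", token)

def restore_newlines(text, token=" XXNL "):
    """Invert the transformation, e.g. after generating text from the model."""
    return text.replace(token, "\n")

body = "def f(x):\n    return x + 1"
flat = flatten_newlines(body)
assert "\n" not in flat            # safe to write as a single CSV field
assert restore_newlines(flat) == body  # round-trips back to the original
```

Keeping the token surrounded by spaces means the tokenizer sees it as an ordinary word, so the model can learn where line breaks belong.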


Yes, me too. I’m not sure why the accuracy is so high but I suspect it’s because programming has lots of highly repeated things.

For example, if you’re in a code block and you see an open paren, there’s probably a close paren coming. And if you’re writing JS and see “var x” the next token is likely “=”.

This is an area I’d like to explore more. I’d love to get a dump of all public GitHub gists and see how well a language model could learn to generate code.

Cool! I just came up with the same idea, thinking that it may become a new way for a computer to learn anything by itself.

Hello :wave:t3:,

As I advance through the course, I’ve worked on my first NLP project. The objective was to measure the tonality/sentiment of a tweet. I started with the language model pretrained on WikiText-103 and fine-tuned it using a sample of tweets from the Sentiment140 dataset.
From there, I was able to create my tweet classifier, which remains basic, as it has difficulty detecting sarcasm.