Blog Posts, Projects and Articles

(Jeremy Howard) #21

Looking good @VishnuSubramanian . A few suggestions:

  • The final layer of ResNet is not fully connected; it is a global average pooling layer. So I suggest you change that comment. It’s fine to replace the global average pooling layer with a fully connected layer, as you have here. You may however find you get better results in transfer learning by fine-tuning all the fully connected layers of VGG. Perhaps you could try both and see which works better?
  • I think people will be most interested in seeing how well your approach works. Perhaps you could show how long it takes to train, and how accurate it is. And maybe show some examples of the images and predicted labels.
  • In the course we save a lot of training time by pre-computing the penultimate layer’s features, and then just training the fully connected layer(s). It might be worth showing how to do that too, and show the impact on time. Maybe that’s for a 2nd blog post! :slight_smile:
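
The pre-computing idea in the last bullet can be sketched without any deep learning framework. This is a minimal illustration, not the course's code: `frozen_backbone` is a hypothetical stand-in for the pretrained convolutional layers (a fixed random projection plus ReLU), and the data is synthetic. The point is only that the expensive forward pass runs once, and every epoch of head training reuses the cached features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frozen, pretrained backbone: fixed weights, never updated.
W_frozen = rng.normal(size=(64, 32)) / 8.0

def frozen_backbone(x):
    """Map raw inputs to penultimate-layer features (random projection + ReLU)."""
    return np.maximum(x @ W_frozen, 0.0)

# Synthetic data standing in for images and binary labels.
X = rng.normal(size=(500, 64))
y = (X[:, 0] > 0).astype(float)

# Step 1: run the expensive backbone once and cache the result.
features = frozen_backbone(X)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, b):
    """Mean binary cross-entropy of the head on the cached features."""
    p = np.clip(sigmoid(features @ w + b), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Step 2: train only the new head (logistic regression) on the cached features.
w = np.zeros(features.shape[1])
b = 0.0
lr = 0.1
initial_loss = log_loss(w, b)

for epoch in range(200):
    p = sigmoid(features @ w + b)
    w -= lr * (features.T @ (p - y)) / len(y)   # gradient w.r.t. head weights
    b -= lr * float(np.mean(p - y))             # gradient w.r.t. head bias

final_loss = log_loss(w, b)
print(f"head loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The backbone is called once, outside the training loop, so 200 epochs cost 200 cheap head updates rather than 200 full forward passes. With a real network the cached features would typically be saved to disk.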

(Jeremy Howard) #22

Based on the pic you showed, the model seems to be very significantly under-fitting. Step one for any model is to try to over-fit! :slight_smile: Try adding many more parameters so that you can over-fit, and then gradually regularize until you get good test set performance.

Once you’ve got a good test set fit, try comparing it to other approaches to see whether it’s the best approach. (e.g. you could try a CNN, MLP, random forest, and standard time series methods).

(sravya8) #23

In case you missed it on the other thread, here is a blog post we wrote about some of the awesome high-impact projects our students are working on. We will be talking about these projects on May 5th at the Data Institute.

(Vishnu Subramanian) #24

Thanks for your valuable suggestions.

I was looking into the PyTorch implementation of ResNet here.

They have included a Linear layer after average pooling at line 111. So I assumed that the last layer is a linear layer. Please let me know if my understanding is wrong.

Started working on the 2nd blog based on your suggestions. Will post it once completed.

(Jeremy Howard) #25

Ah yes, you’re quite right - sorry I forgot about that!

(Brendan Fortuner) #26

Interesting talk highlights factors to consider before starting an AI company.

So You Want To Found an AI Startup?

(jerry liu) #27

I found it particularly interesting that the author remarked he got better results training ResNet50 from scratch than using a pretrained ResNet50.

It’s been noted by other participants, in other competitions (e.g. fisheries), and in my own observations that pretrained ResNet50 and Inception often perform worse than VGG16 out of the box.

I wonder if the presence of the identity blocks allows ResNet to “skip” a lot of basic convolution filters, so that it contributes fewer useful features for transfer learning?

(Vishnu Subramanian) #28

Hi,

I wrote a follow-up blog post on transfer learning using PyTorch, where I show how to use transfer learning with pre-computed convolutional features, similar to what we learnt in Part 1 of this course. I also compared the performance with Keras on TensorFlow. Please find it here.

Let me know your feedback.

(Brendan Fortuner) #29

Great work! Interesting runtime comparison between PyTorch and Keras.

(Vishnu Subramanian) #30

@Jeremy The PyTorch vs. TensorFlow result was approximately 15 min vs. 30 min when I forced PyTorch to use 1 core. When I allowed PyTorch to use all 6 cores in the machine, it took 11 minutes to complete. Since I was not sure about the multiprocessing capabilities in Keras, I did not try it. Thanks for tweeting.
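
For anyone reproducing the comparison, one way to limit PyTorch's CPU parallelism is `torch.set_num_threads`. Whether the original benchmark used this call is an assumption; the post above does not say how the single-core run was forced. The matrix multiply is only a hypothetical stand-in workload.

```python
import time
import torch

def time_matmul(num_threads, size=512, repeats=5):
    """Time a CPU matrix multiply with a given intra-op thread count."""
    torch.set_num_threads(num_threads)
    a = torch.randn(size, size)
    b = torch.randn(size, size)
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    return time.perf_counter() - start

single = time_matmul(1)
print(f"1 thread: {single:.3f}s")
```

Calling `time_matmul` again with a larger thread count gives the multi-core figure to compare against.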

(Jeremy Howard) #31

Thanks for the clarification - sorry I misunderstood!

(Slav Ivanov) #32

I finally built that DL box. Here is a post describing the process:

(Brendan Fortuner) #33

This is great! Thanks for the detailed overview and benchmarks! Let me know how the CPU turns out. I have two GPU cards now and the CPU (i7) has become the biggest bottleneck when using data augmentation. The GPU is so fast it will crunch a big batch in 20ms while the data augmentation (rotates, flips, zoom) can take up to 500ms. Even with multiple workers the GPU catches up. I’m looking for ways around this, but it is still an open problem.
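
A back-of-the-envelope check of those numbers, as a sketch. It assumes each worker prepares one full batch independently and that the figures quoted above (20 ms GPU step, up to 500 ms augmentation) are per-batch costs; both are simplifications, and the worker count of 4 is just an example.

```python
import math

gpu_ms_per_batch = 20.0       # GPU time per batch, from the post
augment_ms_per_batch = 500.0  # worst-case CPU augmentation time per batch, from the post

def workers_needed(augment_ms, gpu_ms):
    """Workers required so batches arrive at least as fast as the GPU consumes them."""
    return math.ceil(augment_ms / gpu_ms)

def gpu_utilization(num_workers, augment_ms, gpu_ms):
    """Fraction of time the GPU is busy when the input pipeline is the limit."""
    supply_ms = augment_ms / num_workers   # average time between ready batches
    return min(1.0, gpu_ms / supply_ms)

needed = workers_needed(augment_ms_per_batch, gpu_ms_per_batch)
util_4 = gpu_utilization(4, augment_ms_per_batch, gpu_ms_per_batch)
print(f"workers needed to keep the GPU fed: {needed}")
print(f"GPU utilization with 4 workers: {util_4:.0%}")
```

Under these assumptions it would take 25 workers to keep one GPU busy in the worst case, and 4 workers keep it busy only about 16% of the time, which is consistent with the GPU catching up even with multiple workers.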

(David Gutman) #35

Great post. :slight_smile:

As for model training taking too long, think it’s just the machine learning equivalent of this:

(Jeremy Howard) #36

How many cores do you have?

(Brendan Fortuner) #37

Four. Intel Kaby Lake i7. I’m also using heavy data augmentation and running 2-5 experiments simultaneously.

(Slav Ivanov) #38

@brendan I’m also wondering about the CPU. A lot of people suggested that the bottleneck on my configuration would be the PCIe lanes on the CPU (with 2 GPUs). The graphics cards would have to run as two x8 links instead of two x16, i.e. sending up to roughly 8 GB/s one way to each GPU.
Have you experienced this bottleneck?
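
For a rough sense of scale, the per-direction bandwidth can be estimated from the lane count. This sketch assumes PCIe 3.0 (8 GT/s per lane with 128b/130b encoding, roughly 0.985 GB/s per lane) and ignores protocol overhead; the batch size and image shape are hypothetical.

```python
GT_PER_S = 8.0                 # PCIe 3.0 raw rate per lane, gigatransfers/s (assumed generation)
ENCODING = 128.0 / 130.0       # 128b/130b line encoding
GB_PER_S_PER_LANE = GT_PER_S * ENCODING / 8.0   # bits -> bytes, about 0.985 GB/s

def link_bandwidth_gb_s(lanes):
    """Theoretical one-way bandwidth of a link with the given lane count."""
    return lanes * GB_PER_S_PER_LANE

def transfer_ms(num_bytes, lanes):
    """Time to move num_bytes one way over the link, in milliseconds."""
    return num_bytes / (link_bandwidth_gb_s(lanes) * 1e9) * 1e3

# Hypothetical batch: 64 float32 images of shape 3x224x224.
batch_bytes = 64 * 3 * 224 * 224 * 4

for lanes in (8, 16):
    print(f"x{lanes}: {link_bandwidth_gb_s(lanes):.2f} GB/s, "
          f"batch transfer ~{transfer_ms(batch_bytes, lanes):.2f} ms")
```

Halving the lanes doubles the transfer time, but for a batch of this size both figures are a few milliseconds, so whether x8 matters depends on how much data moves per step.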

(Corbin Albert) #39

To anyone interested, I tried to implement a “Breadth-First” approach to Deep Neural Networks in my latest blog post in an attempt to give a general overview of what Deep Learning can accomplish, how the functions learn, etc.

It is quite high-level, so I’m sure no one here will learn much of anything from it, but I thought I’d post it just in case.

Also, partly inspired by A Mathematician’s Lament and Jeremy and Rachel’s “breadth-first” vs. “depth-first” take on learning, something I myself had been thinking about a lot at the time, I wrote a blog post about that as well, which can be found here:

(Vishnu Subramanian) #40

Check out my post on how to scale deep learning to hundreds of nodes, which is a summary of the recently published paper by FAIR. Please share your feedback.

(janardhanp22) #41

“Connecting the dots for a Deep Learning App”. Check out my latest blog post and try the App.