Blog Posts, Projects and Articles

jeremy · April 13, 2017, 8:12pm

Looking good @VishnuSubramanian . A few suggestions:

The final layer of resnet is not fully connected, but is a global average pooling. So I suggest you change that comment. It’s fine to replace the global average pooling layer with a fully connected layer, as you have here. You may however find you get better results in transfer learning by fine-tuning all the fully connected layers of VGG. Perhaps you could try both and see which works better?
I think people will be most interested in seeing how well your approach works. Perhaps you could show how long it takes to train, and how accurate it is. And maybe show some examples of the images and predicted labels.
In the course we save a lot of training time by pre-computing the penultimate layer’s features, and then just training the fully connected layer(s). It might be worth showing how to do that too, and show the impact on time. Maybe that’s for a 2nd blog post!
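A rough, framework-free sketch of the pre-computing idea above, not code from the course or the blog post. The "frozen backbone" here is a hypothetical stand-in (a fixed random projection with ReLU), and the names `frozen_features`, `train_head` and `backbone_calls` are made up for illustration. The point is only that the expensive frozen part runs once per sample, while the small head is trained for many epochs on the cached features.

```python
import math
import random

random.seed(0)

# Hypothetical stand-in for the frozen, pretrained layers: a fixed random
# projection followed by ReLU. These weights are never updated.
IN_DIM, FEAT_DIM = 8, 4
FROZEN_W = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]
backbone_calls = 0  # counts how often the "expensive" part runs


def frozen_features(x):
    """Return penultimate-layer features from the frozen stand-in backbone."""
    global backbone_calls
    backbone_calls += 1
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_W]


def train_head(features, labels, epochs=20, lr=0.1):
    """Train only a logistic-regression head on cached features with plain SGD."""
    w, b = [0.0] * FEAT_DIM, 0.0
    for _ in range(epochs):
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


# Toy data: 32 random inputs with a made-up binary label.
inputs = [[random.gauss(0, 1) for _ in range(IN_DIM)] for _ in range(32)]
labels = [1 if x[0] > 0 else 0 for x in inputs]

# Step 1: run the frozen backbone once per sample and cache the result.
cached = [frozen_features(x) for x in inputs]

# Step 2: train the head for many epochs without touching the backbone again.
w, b = train_head(cached, labels, epochs=20)
print("backbone calls:", backbone_calls)
```

Without the cache, the backbone would run once per sample per epoch (32 × 20 calls here instead of 32), which is where the training-time saving comes from. With a real pretrained network the same pattern applies, provided the inputs are not re-augmented every epoch.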

jeremy · April 13, 2017, 8:15pm

Based on the pic you showed, the model seems to be very significantly under-fitting. Step one for any model is to try to over-fit! Try adding many more parameters so that you can over-fit, and then gradually regularize until you get good test set performance.

Once you’ve got a good test set fit, try comparing it to other approaches to see whether it’s the best approach. (e.g. you could try a CNN, MLP, random forest, and standard time series methods).

sravya8 · April 14, 2017, 12:27am

In case you missed it on the other thread, here is a blog post we wrote about some of the awesome high-impact projects our students are working on: https://medium.com/@sravsatuluri/lightning-talks-impact-driven-ai-applications-fb5836d875f7 We will be talking about these projects on May 5th at the Data Institute.

VishnuSubramanian · April 14, 2017, 6:38am

Thanks for your valuable suggestions.

I was looking at the PyTorch implementation of ResNet here:

github.com

pytorch/vision/blob/master/torchvision/models/resnet.py

import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo


__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
           'resnet152']


model_urls = {
    'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""

They have included a Linear layer after average pooling at line 111. So I assumed that the last layer is a linear layer. Please let me know if my understanding is wrong.

Started working on the 2nd blog based on your suggestions. Will post it once completed.

jeremy · April 14, 2017, 5:58pm

Ah yes, you’re quite right - sorry I forgot about that!

brendan · April 15, 2017, 3:31pm

Interesting talk highlights factors to consider before starting an AI company.

So You Want To Found an AI Startup?

twairball · April 17, 2017, 1:47am

I found it particularly interesting that the author remarked he got better results training Resnet50 from scratch than using pretrained Resnet50.

It’s been noted by other participants, in other competitions (e.g. fisheries), and in my own observations that pretrained Resnet50 and Inception often perform worse than VGG16 out of the box.

I wonder if the presence of the identity blocks allows Resnet to “skip” a lot of basic convolution filters, so that it contributes fewer useful features for transfer learning?

VishnuSubramanian · April 19, 2017, 1:37pm

Hi,

I wrote a follow-up blog post on transfer learning using PyTorch, in which I show how to use transfer learning with pre-convoluted features, similar to what we learnt in Part 1 of this course. I also compared the performance results with Keras on TensorFlow. Please find it here.

Let me know your feedback.

brendan · April 19, 2017, 2:58pm

Great work! Interesting runtime comparison between pytorch and keras.

VishnuSubramanian · April 19, 2017, 10:08pm

@Jeremy The PyTorch vs. TensorFlow runtime was approximately 15 min vs. 30 min, with PyTorch forced to use 1 core. When I allowed PyTorch to use all 6 cores in the machine, it took 11 minutes to complete. Since I was not sure about the multiprocessing capabilities of Keras, I did not try it there. Thanks for tweeting.

jeremy · April 20, 2017, 1:10am

Thanks for the clarification - sorry I misunderstood!

slavivanov · May 29, 2017, 2:51pm

I finally built that DL box. Here is a post describing the process:

brendan · May 29, 2017, 3:22pm

This is great! Thanks for the detailed overview and benchmarks! Let me know how the CPU turns out. I have two GPU cards now and the CPU (i7) has become the biggest bottleneck when using data augmentation. The GPU is so fast it will crunch a big batch in 20ms while the data augmentation (rotates, flips, zoom) can take up to 500ms. Even with multiple workers, the GPU catches up and ends up waiting for data. I’m looking for ways around this, but it’s still an open problem.
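A back-of-the-envelope sketch of the bottleneck described above, using the two figures from the post (about 20 ms of GPU time and up to 500 ms of augmentation per batch). It assumes augmentation parallelizes perfectly across workers, which real data loaders only approximate; the function names and the worker count of four are illustrative.

```python
import math

gpu_ms_per_batch = 20.0       # GPU time per batch, from the post
augment_ms_per_batch = 500.0  # worst-case CPU augmentation time per batch


def workers_needed(augment_ms, gpu_ms):
    """Workers required so that a new batch is ready every gpu_ms milliseconds."""
    return math.ceil(augment_ms / gpu_ms)


def gpu_utilization(augment_ms, gpu_ms, workers):
    """Fraction of time the GPU is busy, assuming ideal parallel workers."""
    batch_interval_ms = max(gpu_ms, augment_ms / workers)
    return gpu_ms / batch_interval_ms


print("workers needed:", workers_needed(augment_ms_per_batch, gpu_ms_per_batch))
print("GPU utilization with 4 workers:",
      gpu_utilization(augment_ms_per_batch, gpu_ms_per_batch, 4))
```

Under these assumptions it would take around 25 workers to keep one GPU fully fed in the worst case, so a handful of workers leaves the GPU mostly idle.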

davecg · May 29, 2017, 11:37pm

Great post.

As for model training taking too long, I think it’s just the machine learning equivalent of this:

jeremy · May 30, 2017, 4:12am

How many cores do you have?

brendan · May 30, 2017, 5:44am

Four. Intel Kaby Lake i7. I’m also using heavy data augmentation and running 2-5 experiments simultaneously.

https://www.amazon.com/gp/aw/d/B01MXSI216/ref=mp_s_a_1_1?ie=UTF8&qid=1496122800&sr=8-1&pi=AC_SX236_SY340_FMwebp_QL65&keywords=kaby+lake+i7

slavivanov · May 30, 2017, 10:17am

@brendan I’m also wondering about the CPU. A lot of people suggested that the bottleneck on my configuration would be the PCIe lanes on the CPU (with 2 GPUs). The graphics cards would have to run in 2x8 instead of 2x16, i.e. sending up to 8 GB/s one way to each GPU.
Have you experienced this bottleneck?
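For a sense of scale, here is a small sketch of the x8 vs. x16 arithmetic. It assumes PCIe 3.0 (8 GT/s per lane with 128b/130b encoding), which the post does not state, and a hypothetical batch of 64 float32 RGB images at 224x224. It gives theoretical one-way figures before protocol overhead, not measurements.

```python
# Assumption: PCIe 3.0 (8 GT/s per lane, 128b/130b line encoding).
TRANSFERS_PER_S = 8e9
ENCODING_EFFICIENCY = 128 / 130


def link_bandwidth_bytes_per_s(lanes):
    """Theoretical one-way bandwidth of a PCIe 3.0 link with the given lane count."""
    return lanes * TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8  # bits -> bytes


def transfer_ms(num_bytes, lanes):
    """Time in milliseconds to move num_bytes one way over the link."""
    return 1000.0 * num_bytes / link_bandwidth_bytes_per_s(lanes)


# Hypothetical batch: 64 RGB images, 224x224 pixels, float32 (4 bytes each).
batch_bytes = 64 * 3 * 224 * 224 * 4

for lanes in (8, 16):
    gb_per_s = link_bandwidth_bytes_per_s(lanes) / 1e9
    ms = transfer_ms(batch_bytes, lanes)
    print(f"x{lanes}: {gb_per_s:.2f} GB/s, batch transfer {ms:.2f} ms")
```

Under these assumptions an x8 link moves such a batch in roughly 5 ms, so whether the halved bandwidth matters depends on how that compares with the compute time per batch.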

corbin · July 2, 2017, 3:21pm

To anyone interested, I tried to implement a “Breadth-First” approach to Deep Neural Networks in my latest blog post in an attempt to give a general overview of what Deep Learning can accomplish, how the functions learn, etc.

It is quite high-level, so I’m sure no one here will learn much of anything from it, but I thought I’d post it just in case.

https://developingideas.me/deepneuralnetworkoverview/

Also, partly inspired by A Mathematician’s Lament and by Jeremy and Rachel’s “breadth-first” vs. “depth-first” take on learning, something I had been thinking about a lot at the time, I wrote a blog post about that as well, which can be found here:

https://developingideas.me/depth-vs-breadth-learning/

VishnuSubramanian · August 9, 2017, 3:04am

Check out my post on how to scale deep learning to 100s of nodes, which is a summary of the recently published paper by FAIR. Please share your feedback.

janardhanp22 · August 22, 2017, 7:17am

“Connecting the dots for a Deep Learning App”. Check out my latest blog post and try the App.
Blog:

App:
https://movie-review-sentiment.herokuapp.com/