Lesson 2 In-Class Discussion

Gaurav85 · November 7, 2017, 5:06am

How do we set up SSH with Cygwin? I have a Windows 7 machine.

charlielee · November 7, 2017, 5:07am

Even · November 7, 2017, 5:08am

@jeremy For those of us with home machines built for deep learning, is there a list of the packages installed in the AMI?

I built a deep learning box after the last course and I want to set it up to match the AMI.

charlielee · November 7, 2017, 5:08am

Install the environment from the environment.yml file:

1. Git clone the fastai repo.
2. Run the following command (I found that you need a *nix system, as some packages are not supported on Windows): conda env create -f environment.yml
3. Activate the env and run pip install -r requirements.txt to install the required Python packages.
4. ???
5. Profit.

KevinB · November 7, 2017, 5:11am

Use the conda command he used, conda env update. There is also a way to create a new conda environment from scratch, but I can’t remember it off the top of my head; hopefully somebody else does. Basically, you point it at the environment.yml file and it installs everything into a new environment. It is really slick.

Note: conda env update must be run from the directory containing environment.yml (alternatively, pass the file explicitly with conda env update -f environment.yml).

satya · November 7, 2017, 5:13am

Any idea where to get the pretrained resnext50 model? It doesn’t seem to be in the model zoo.

Edit: Found this GitHub project (https://github.com/clcarwin/convert_torch_to_pytorch) that can convert the model available here (https://github.com/facebookresearch/ResNeXt) into one fastai can use.

suvash · November 7, 2017, 5:21am

Thanks again @jeremy and @yinterian for this lesson.

I have a couple of general ML questions in the context of deep learning.

1. I see that we set aside 20% of the data for validation in one of the examples. What about cross-validation, e.g. k-fold cross-validation? Is it perhaps too expensive to compute?
2. We are using accuracy to measure model performance. I’m assuming this is (True Positives + True Negatives)/Total. Just curious, is it uncommon or complicated to use other measures, such as precision, recall, the PR curve, or AUC (ROC curve)?
3. How do we deal with unbalanced classes? (I think Jeremy mentioned a paper on balancing datasets.)
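For question 2, the other measures are cheap to compute once you have predictions. Below is a minimal plain-Python sketch for the binary case (label 1 = positive); it is not fastai's API, and the function names are hypothetical. ROC AUC is computed here as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting half. In practice scikit-learn's `precision_score`, `recall_score` and `roc_auc_score` do the same job.

```python
from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted/labelled positive.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 1, 1, 0]
print(binary_metrics(labels, preds))  # accuracy 4/6, precision 2/3, recall 2/3
print(roc_auc(labels, [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]))  # 8 of 9 pairs ranked correctly
```

With unbalanced classes (question 3), precision, recall and AUC are usually more informative than accuracy, since always predicting the majority class can still score a high accuracy.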

Even · November 7, 2017, 5:33am

Thanks @charlielee this was very helpful.

ravivijay · November 7, 2017, 5:55am

Re point 3: Jeremy said the paper suggests simply copying examples from the underrepresented classes to increase their count. I want to experiment with combining data augmentation tricks with copying too.
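The copying approach described above is plain random oversampling. Here is a minimal sketch, assuming the dataset is a list of items with one label each; the function name and the `seed` parameter are hypothetical, not from fastai. Every original example is kept, and minority classes are topped up with random duplicates until each class matches the largest one.

```python
import random
from collections import Counter, defaultdict


def oversample(items, labels, seed=0):
    """Duplicate minority-class examples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(items, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())

    out_items, out_labels = [], []
    for y, xs in by_class.items():
        # Keep all originals, then add random copies to reach the target count.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_items.append(x)
            out_labels.append(y)
    return out_items, out_labels


new_items, new_labels = oversample(["a1", "a2", "a3", "a4", "b1"], ["a", "a", "a", "a", "b"])
print(Counter(new_labels))  # Counter({'a': 4, 'b': 4})
```

The output is grouped by class, so it should be shuffled (as a training data loader normally does). Combining this with augmentation would mean each duplicate gets a different random transform, so the copies are not pixel-identical.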

jeremy · November 7, 2017, 6:32am

Because the training loss includes dropout. We’ll learn about this soon.
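To make the dropout point concrete, here is a minimal plain-Python sketch of inverted dropout; it is an illustration under stated assumptions, not fastai's or PyTorch's code, and the function signature is hypothetical. Units are randomly zeroed (and survivors rescaled by 1/(1-p)) only when `training=True`, so the training loss is measured on a deliberately handicapped network while validation uses the full one. This is one reason the training loss can sit above the validation loss.

```python
import random


def dropout(activations, p, training, rng=random):
    """Inverted dropout (hypothetical helper).

    Training: zero each unit with probability p and scale survivors by 1/(1-p),
    keeping the expected activation unchanged.
    Inference: return the activations untouched.
    """
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, 0.5, training=True, rng=random.Random(42)))  # some zeros, survivors doubled
print(dropout(acts, 0.5, training=False))  # unchanged
```

PyTorch's `nn.Dropout` follows the same convention of scaling during training, which is why calling `model.eval()` before validation matters.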

jeremy · November 7, 2017, 6:33am

That paper is only relevant to folks with effectively infinite resources. IMHO it’s of little if any practical value.

jeremy · November 7, 2017, 6:35am

We’re only using the learning rate finder from that paper, not the cyclical learning rates themselves. The annealing method we use is from https://arxiv.org/abs/1608.03983
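The annealing schedule in the linked paper (SGDR) decays the learning rate along a cosine curve within each cycle, then restarts it at the maximum. The sketch below implements the paper's formula lr = lr_min + ½(lr_max − lr_min)(1 + cos(π·T_cur/T_i)); it is not fastai's implementation. The parameter names `t_0` (length of the first cycle, in iterations) and `t_mult` (growth factor for each later cycle) follow the paper's T_0 and T_mult notation and are otherwise assumptions.

```python
import math


def sgdr_lr(step, lr_max, lr_min=0.0, t_0=100, t_mult=1):
    """Cosine annealing with warm restarts, per the SGDR formula.

    step:   global iteration counter starting at 0.
    t_0:    length of the first cycle in iterations.
    t_mult: each successive cycle is t_mult times longer than the previous one.
    """
    t_i, t_cur = t_0, step
    # Skip over completed cycles to find the position inside the current one.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


for s in (0, 50, 99, 100, 200):
    print(s, round(sgdr_lr(s, 0.1, t_0=100, t_mult=2), 5))
```

The learning rate starts at `lr_max`, falls to about half at mid-cycle and to nearly `lr_min` at the end, then jumps back up at each restart.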

jeremy · November 7, 2017, 6:37am

Yes, exactly

jeremy · November 7, 2017, 6:43am

Yup that’s exactly what we’ll be doing

jeremy · November 7, 2017, 6:45am

You got it right

jeremy · November 7, 2017, 6:45am

Yes there’s dropout. We’ll learn about that soon.

jamesrequa · November 7, 2017, 6:49am

We don’t appear to be using any k-fold cross validation so far. Is this not needed because of TTA?

sermakarevich · November 7, 2017, 7:01am

Plus you get predictions on the training set through CV.
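A minimal sketch of k-fold cross-validation with out-of-fold predictions, which is what "train prediction through CV" refers to: each fold's model predicts only the examples it was not trained on, so every training example ends up with exactly one honest prediction (useful for stacking). This is plain Python with a trivial hypothetical mean-predicting model standing in for a real network. The folds are contiguous and unshuffled for simplicity, so you would normally shuffle first. Note that it costs k full training runs, which is why it is often skipped for deep nets.

```python
from typing import List


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def out_of_fold_predictions(xs, ys, k, fit, predict):
    """Train k models, each on k-1 folds, and predict the held-out fold."""
    oof = [None] * len(xs)
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in fold:
            oof[i] = predict(model, xs[i])
    return oof


# Hypothetical stand-in model: always predicts the mean training target.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_const(model, x):
    return model


print(out_of_fold_predictions([0, 1, 2, 3], [1.0, 2.0, 3.0, 4.0], 2, fit_mean, predict_const))
```

scikit-learn's `KFold` and `cross_val_predict` provide the same functionality for models that fit its estimator interface.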

jeremy · November 7, 2017, 7:17am

Yes, that’s a real issue in the unicorn community. In practice, you have to adjust your probabilities to ‘undo’ the over-sampling.
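The reply doesn't spell out how to do the adjustment. One common approach, swapped in here as an assumption rather than anything stated in the lesson, is a Bayes prior-shift correction: multiply each predicted class probability by (original class frequency / class frequency after oversampling), then renormalize. It assumes oversampling changed only the class proportions, which plain copying approximately does. The function and argument names are hypothetical.

```python
from typing import Dict


def undo_oversampling(probs: Dict[str, float],
                      original_prior: Dict[str, float],
                      sampled_prior: Dict[str, float]) -> Dict[str, float]:
    """Re-weight probabilities from a model trained on re-sampled data back
    towards the original class frequencies, then renormalize to sum to 1."""
    weighted = {c: p * original_prior[c] / sampled_prior[c] for c, p in probs.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Rare class is 10% of the real data but was oversampled to 50% for training.
original = {"rare": 0.1, "common": 0.9}
balanced = {"rare": 0.5, "common": 0.5}
print(undo_oversampling({"rare": 0.8, "common": 0.2}, original, balanced))
```

A model that has learned nothing beyond the balanced 50/50 prior gets mapped back to the original 10/90 split, and confident rare-class predictions are shrunk accordingly.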

jeremy · November 7, 2017, 7:19am

Yup, I did say that. Adding the validation set back in right at the end is fine; you can get a better final model this way. cc @yinterian