Lesson 1 In-Class Discussion ✅

Is there any example available of using a custom head on top of ResNet50 with ConvLearner? I couldn’t find one in the docs. I have roughly 10M images, and ResNet has reached saturation, so I want to increase the number of parameters. Also, as far as the docs say, there’s support for Darknet and Wide ResNet, as well as U-Net, which may not serve the purpose here.

I agree with the neural net here: there is no way this is not a beagle :slight_smile:
Does anyone know of any systematic ways to figure out which data are mislabeled?

Yes, try this:

interp = ClassificationInterpretation.from_learner(learn)
# list the most confused pairs as (actual, predicted, count), keeping counts >= 2
interp.most_confused(min_val=2)
# or plot the full confusion matrix:
interp.plot_confusion_matrix(figsize=(17,12), dpi=60)
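
Relatedly, plot_top_losses on the same interp object is handy here too; the highest-loss images are often the mislabeled ones (the 9 and figsize are just example values):

interp.plot_top_losses(9, figsize=(15,11))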

I’m also getting the same error. Have you found a solution yet?

Hi team,

After going through lesson 1, I tried to train a model with a different image dataset (http://ai.stanford.edu/~acoates/stl10/stl10_binary.tar.gz), but I am running into an issue with the untar_data function:

KeyError: 'content-length'

Also, with another dataset (http://www.cs.utoronto.ca/~kriz/cifar-10-python.tar.gz), I get the error below from untar_data:

OSError: Not a gzipped file (b'<!')

Any help would be appreciated.
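
For what it’s worth, untar_data appends “.tgz” to the URL it is given (at least in the fastai version used in this course), so passing a full …tar.gz URL makes it fetch a nonexistent file; the server’s HTML error page is likely what produces both errors above (b'<!' is the start of an HTML page). A workaround sketch using only the standard library (URL taken from the question):

import tarfile
import urllib.request

url = 'http://www.cs.utoronto.ca/~kriz/cifar-10-python.tar.gz'
fname, _ = urllib.request.urlretrieve(url, 'cifar-10-python.tar.gz')
with tarfile.open(fname, 'r:gz') as tar:
    tar.extractall('data')  # extracts under ./data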

In a conda environment:
conda install nb_conda_kernels
resolved this for me, @pandeyanil. I then saw all my kernels.

It seems that you’ve put too much data in your GPU RAM.

To check whether that’s the case, run the bash command nvidia-smi (or gpustat, which I like to use) before and after the operation causing the CUDA memory error.
You can call it directly from your Jupyter notebook with !nvidia-smi (with “!” you can run bash commands in a Jupyter notebook).

Also restart the notebook kernel (shortcut Esc+00), and if that does not help, restart your entire system.

If memory is indeed the issue, try decreasing the batch size (bs) and/or the image size.
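
For example (a minimal sketch, assuming an ImageDataBunch pipeline as in the lesson; the exact values are just a starting point):

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=128, bs=16)  # smaller size/bs -> less GPU RAM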

I hope this helps. :slight_smile:

@prajjwal I think Jeremy will cover that in the coming lessons. However, if you want to give it a shot, refer to this link in the documentation

Is there a way to quickly spot possible label errors in the validation set? (And similarly in the training data, though that should be less of a problem.) Mislabeled validation data are more likely to be misclassified by any deep network that predicts them, so maybe one could look at the intersection of the wrong predictions of different models, especially the intersection of confident predictions that are wrong. Does anyone have links to such literature, blog posts, or past fastai threads?
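
A sketch of that intersection idea (nothing fastai-specific; the function name and the 0.9 confidence threshold are my own assumptions):

import numpy as np

def suspect_labels(all_probs, targets, conf_thresh=0.9):
    """all_probs: list of (n_samples, n_classes) probability arrays, one per model."""
    suspects = np.ones(len(targets), dtype=bool)
    for probs in all_probs:
        preds = probs.argmax(axis=1)
        confident_wrong = (preds != targets) & (probs.max(axis=1) > conf_thresh)
        suspects &= confident_wrong  # keep only samples every model gets confidently wrong
    return np.where(suspects)[0]  # indices worth inspecting by hand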

This helps. It worked when I changed the image size. Thanks a lot.

Can you give a brief explanation of what it’s trying to do?

Have a look at this question which focuses on semantic segmentation.

So guys, Jeremy has mentioned issues with images being rectangular rather than square. What exactly is the issue if the images have an X × Y resolution rather than X × X?

Hi Daniele,

Usually the first parameter of DataBunch constructors (“path”) refers to the DataBunch root folder.

Example:
data = ImageDataBunch.from_lists(path=pathToRoot, fnames=files, labels=labels, valid_pct=0.2, test='test', ds_tfms=tfms, bs=32)

In all the sample notebooks, the path folder contains both the training data and the saved models (in the “models” subfolder).
For example:

  • learn.save('last') will write the file “last.pth” inside the “models” subfolder in path
  • learn.load('last') will read it

NB: the fast.ai samples usually have this “root folder” inside ~/.fastai/data
(e.g. ~/.fastai/data/mnist_sample)

Link to working sample:
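
Separately, a small sketch tying the above together (fastai v1; the dataset and model choices are just for illustration):

path = untar_data(URLs.MNIST_SAMPLE)  # lands in ~/.fastai/data/mnist_sample
data = ImageDataBunch.from_folder(path)
learn = ConvLearner(data, models.resnet18, metrics=accuracy)
learn.save('last')  # writes ~/.fastai/data/mnist_sample/models/last.pth
learn.load('last')  # reads the same file back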

I did; that’s why I asked for an example that’s proven to work better when added as a head.

Sure! In the first lesson the architecture (the model parameter) was loaded from a prebuilt one:

learn = ConvLearner(data, models.resnet50, metrics=error_rate)

But you can define your own using fast.ai (https://docs.fast.ai/layers.html#layers) in a way very similar to Keras:

import torch.nn as nn
from fastai.layers import Flatten

model = nn.Sequential(
    nn.Conv2d(3,  16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    Flatten(),  # (bs, 10, 1, 1) -> (bs, 10), so the output fits a classification loss
)

From what I understand, the fastai library wraps and simplifies PyTorch much as Keras wraps and simplifies TensorFlow.

That said, I think a pretrained ResNet34 is a good starting point for the bioimaging dataset you’ve proposed.
Of course, you have to unfreeze almost all layers (as a starting point I would probably keep the first 3 to 15 layers frozen) and train with A LOT of images, normalizing on the new dataset rather than relying on the ImageNet normalization.
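
A minimal sketch of that recipe (using the ConvLearner API from the lesson; the hyperparameters are placeholders):

data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=224, bs=32)
data.normalize()  # with no stats passed, fastai computes them from a batch of this dataset
learn = ConvLearner(data, models.resnet34, metrics=error_rate)
learn.unfreeze()  # unfreezes everything; keeping early layers frozen needs per-group handling
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))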

I’m playing with some data from Kaggle whose evaluation is based on the macro F1 score. Do we have an F1 score metric in fastai? I’ve tried f1_score from sklearn.metrics, but it doesn’t seem to work.

Sure!

http://docs.fast.ai/metrics.html#Fbeta
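
For example (a sketch based on that docs page: fastai’s fbeta with beta=1 gives F1 for multi-label targets, though it’s worth checking that its averaging matches Kaggle’s macro F1):

from functools import partial
from fastai.metrics import fbeta

f1 = partial(fbeta, beta=1)  # fbeta defaults to beta=2 and thresh=0.2
learn = ConvLearner(data, models.resnet50, metrics=[error_rate, f1])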

Thank you!

I downloaded images for my dataset from Google Images following @lesscomfortable 's wonderful Jupyter notebook. I am using AWS SageMaker as my platform. I got a 66% error rate, which is bad :frowning: . On looking at the images (show_batch), I found some irrelevant ones. How do I remove irrelevant images from my dataset?

I have 6 classes with 300 images in each class. Should I open each image and verify whether it is relevant? Is it possible to download the dataset from SageMaker to my laptop, so that I can quickly delete irrelevant images?
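
For what it’s worth, a sketch of one approach (verify_images from fastai.vision only deletes files that can’t be opened, so truly irrelevant images still need eyeballing; the paths here are assumptions):

from pathlib import Path
import shutil
from fastai.vision import verify_images

path = Path('data/my_dataset')  # assumed layout: one subfolder per class
for cls_folder in path.iterdir():
    if cls_folder.is_dir():
        verify_images(cls_folder, delete=True)  # removes corrupt/unopenable files
# pack everything up so it can be downloaded from the notebook UI for manual review
shutil.make_archive('my_dataset', 'gztar', root_dir=path)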
