Lesson 2 discussion

stella · January 26, 2017, 6:32pm

Thanks! Sorry, I didn’t notice it was an old question. I am just starting to go through the lessons.

anamariapopescug · January 26, 2017, 6:33pm

yes, it’s hard to see - but awesome of you to add the answer in-line for future readers

shgidi · January 29, 2017, 10:54am

How come the training data for this competition is around 600MB, but when saved as a numpy array with bcolz, the file size is around 6GB?
Is that normal?
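
One likely contributor, sketched under assumptions rather than measured: the competition files are compressed JPEGs, while the saved array stores every decoded pixel as a number. The helper below is hypothetical (`decoded_size_bytes` is not part of the course code) and assumes images resized to 224x224 with 3 channels stored as 4-byte floats. bcolz also compresses on disk, so the real file may come out smaller than this raw figure.

```python
def decoded_size_bytes(n_images, height=224, width=224, channels=3, bytes_per_value=4):
    """Raw bytes needed to hold n_images as a dense array of decoded pixels.

    All defaults are assumptions (224x224 RGB, float32), not values read from the course code.
    """
    return n_images * height * width * channels * bytes_per_value


per_image = decoded_size_bytes(1)
print("bytes per decoded image:", per_image)

# Hypothetical image count, only to show how the total scales.
n_images = 1000
print("GB for %d images: %.2f" % (n_images, decoded_size_bytes(n_images) / 1e9))
```

Plugging in the actual number of training images gives a rough upper bound to compare against the size on disk.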

pl3 · January 31, 2017, 2:54pm

Is anyone familiar with the CSS Jeremy is using for these notebooks?

Gelu74 · January 31, 2017, 6:05pm

@pl3 this might be of interest:

pl3 · January 31, 2017, 6:24pm

@Gelu74 thanks. I had stumbled across that, but was also looking for the color scheme, widening the boxes, etc. Most themes I’ve found are for old IPython notebooks and no longer work in Jupyter notebooks.

maxim.pechyonkin · February 1, 2017, 8:48am

@jeremy In the video lecture for Lesson 2, at 1h 16m 21s, when discussing derivatives and defining function upd(), it seems that you are taking derivatives for the loss function. At first it seemed very confusing to see dydb = 2 * (y_pred - y) but then I understood that it was actually not dydb but rather dLossdb as we are interested in how the loss value will change when we change our b parameter.
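
The reading above can be written out per data point. This is a sketch that assumes the notebook’s lin(a, b, x) computes a*x + b and that the loss is the squared error; neither is stated in this post.

```latex
\hat{y}_i = a x_i + b, \qquad \ell_i = (\hat{y}_i - y_i)^2

\frac{\partial \ell_i}{\partial b} = 2\,(\hat{y}_i - y_i), \qquad
\frac{\partial \ell_i}{\partial a} = 2\,x_i\,(\hat{y}_i - y_i) = x_i\,\frac{\partial \ell_i}{\partial b}
```

Under those assumptions, the variable named dydb holds the derivative of the loss with respect to b, and dyda is the same quantity multiplied by x.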

geniusgeek · February 3, 2017, 6:49pm

take a look at these https://github.com/dunovank/jupyter-themes

geniusgeek · February 3, 2017, 6:51pm

take a look at these https://github.com/ipython-contrib/IPython-notebook-extensions/wiki/Codefolding

pjmn · February 5, 2017, 8:34pm

I found the course “Computational Photography” on Udacity very relevant. See lessons 100-130 on topics such as:

Cross correlation
Smoothing
Convolution
Mean and median filtering
Gaussian filters

geniusgeek · February 7, 2017, 12:50pm

Thanks so much, I enjoyed the lectures. I learnt the following:

Image Smoothing/Normalization using kernel and neighbourhood computations
X Correlation
Mean and Median Filtering
Convolution Method and Properties
Diff between Convolution and correlation
Gaussian Filter
Linear Filtering
Image gradients
Detect Features in Images
edges

cmeff1 · February 13, 2017, 5:38pm

So I’m back at it again. I’ve been working through lesson 2. I understand the idea of using the training data and validation data. I understand how to set up the directory structure and use the linear model on it in lesson 2. My real question is: what’s the best way to use the test data after the validation and training data have been used with the model? I’ve looked in the dogs and cats redux notebook to try to understand how to call the test data, and the line I see is:
batches, preds = vgg.test(test_path, batch_size = batch_size*2)

However, I also see a model.predict or model.predict_generator for batches. Do I simply use test_batches = get_batches(…) on the test data and then run model.predict_generator on the test_batches to get the predictions on the unlabeled test data?

Thanks ahead of time.

Chris

Even · February 13, 2017, 6:28pm

Hey @cmeff1, if you take a look at vgg16.py you’ll see that vgg.test is actually calling model.predict_generator:

def test(self, path, batch_size=8):
    test_batches = self.get_batches(path, shuffle=False, batch_size=batch_size, class_mode=None)
    return test_batches, self.model.predict_generator(test_batches, test_batches.nb_sample)

It just includes another step that sets up the batches for prediction. So you could do it either way. vgg.test is just a helper function that makes it easy.

cmeff1 · February 13, 2017, 7:06pm

Thanks Even. I’ll try that tonight when I get home.

cmeff1 · February 14, 2017, 3:44am

Hi Even,

So I’m not sure it’s working exactly like I think it should.

My code:

%matplotlib inline
import utils; reload(utils)
from utils import *
%matplotlib inline
from __future__ import division, print_function
import os, json
from glob import glob
import numpy as np
import scipy
import h5py
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import confusion_matrix
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
import utils; reload(utils)
from utils import plots, get_batches, plot_confusion_matrix, get_data

from numpy.random import random, permutation
from scipy import misc, ndimage
from scipy.ndimage.interpolation import zoom

import keras
from keras import backend as K
path = "dogscats/sample/"
test_path = "dogscats/sample/test/"
#path = "dogscats/"
model_path = path + 'models/'
if not os.path.exists(model_path): os.mkdir(model_path)

batch_size=1

from vgg16 import Vgg16
vgg = Vgg16()
model = vgg.model

import bcolz
def save_array(fname, arr): c=bcolz.carray(arr, rootdir=fname, mode='w'); c.flush()
def load_array(fname): return bcolz.open(fname)[:]

def onehot(x): return np.array(OneHotEncoder().fit_transform(x.reshape(-1,1)).todense())
val_batches = get_batches(path+'valid', shuffle=False, batch_size=1)
batches = get_batches(path+'train', shuffle=False, batch_size=1)

val_data = get_data(val_batches)
trn_data = get_data(batches)

save_array(model_path + 'train_data.bc', trn_data)
save_array(model_path + 'valid_data.bc', val_data)

trn_data = load_array(model_path + 'train_data.bc')
val_data = load_array(model_path + 'valid_data.bc')

val_classes = val_batches.classes
trn_classes = batches.classes

val_labels = onehot(val_classes)
trn_labels = onehot(trn_classes)

trn_features = model.predict(trn_data, batch_size=batch_size)
val_features = model.predict(val_data, batch_size=batch_size)

save_array(model_path + 'train_lastlayer_features.bc', trn_features)
save_array(model_path + 'valid_lastlayer_features.bc', val_features)

trn_features = load_array(model_path + 'train_lastlayer_features.bc')
val_features = load_array(model_path + 'valid_lastlayer_features.bc')

lm = Sequential([ Dense(2, activation='softmax', input_shape=(1000,)) ])
lm.compile(optimizer=RMSprop(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

batch_size=4

lm.fit(trn_features, trn_labels, nb_epoch=1, batch_size=batch_size,
       validation_data=(val_features, val_labels))

test_batches, predz = vgg.test(test_path, batch_size=batch_size*2)
print(predz)

Output from code above:
All zeros? I’m not sure that makes sense. Not sure if I’m doing this right…
Found 104 images belonging to 1 classes.
[[ 2.3819e-06 8.2677e-07 1.5837e-07 ..., 2.0192e-06 3.4587e-05 2.7597e-04]
[ 8.6598e-08 7.3182e-07 3.6479e-07 ..., 1.6141e-08 1.6603e-05 3.5059e-03]
[ 3.9418e-06 8.7915e-06 3.6428e-05 ..., 1.1809e-06 7.2487e-05 6.3038e-03]
...,
[ 4.1993e-09 4.5927e-09 3.1021e-09 ..., 1.2422e-08 9.2751e-08 3.1926e-06]
[ 8.2603e-08 9.9292e-07 1.6900e-07 ..., 1.2691e-09 2.0147e-06 3.2606e-05]
[ 7.2147e-08 3.3645e-06 3.7729e-07 ..., 1.0056e-07 2.1949e-04 6.8866e-04]]

Does this output make any sense? Running it on just a sample of the data…
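
One possible reading of this output, offered as a sketch and not a diagnosis: the rows are small probabilities, not zeros. The vgg.test helper quoted earlier in the thread predicts with the VGG model itself, so each row would hold one probability per ImageNet class instead of the two outputs of lm. The stand-in below uses random numbers, not real VGG output, to show what a softmax spread over 1000 classes looks like when printed with the same print options.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for predz: 104 rows of softmax probabilities over 1000 classes.
# The logits are random; this is NOT real VGG output.
logits = rng.normal(scale=4.0, size=(104, 1000))
exp = np.exp(logits - logits.max(axis=1, keepdims=True))  # subtract row max for stability
fake_predz = exp / exp.sum(axis=1, keepdims=True)

np.set_printoptions(precision=4, linewidth=100)
print(fake_predz.shape)             # one column per class
print(fake_predz[:2])               # most entries are tiny
print(fake_predz.sum(axis=1)[:3])   # yet each row sums to 1
print(fake_predz.argmax(axis=1)[:3])  # index of the most likely class per row
```

If two-class scores are wanted, the linear model in the post takes 1000-wide inputs, so passing these predictions through it (for example with lm.predict) would be the step to try; that suggestion is untested here.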

vahuja4 · February 14, 2017, 4:44am

Here is the weight update method from sgd.ipynb that was introduced in Lesson 2:

def upd():
    global a_guess, b_guess
    y_pred = lin(a_guess, b_guess, x)
    dydb = 2 * (y_pred - y)
    dyda = x*dydb
    a_guess -= lr*dyda.mean()
    b_guess -= lr*dydb.mean()

I can see that dyda and dydb are going to be vectors containing as many partial derivatives as the number of points. Can someone please explain why we are taking the mean()? How do we interpret it geometrically?
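
One way to look at the mean(), sketched under the assumption that lin(a, b, x) is a*x + b and that the loss is the mean squared error over all points: averaging the per-point derivatives gives the derivative of that averaged loss. Geometrically, the pair of means is the slope of the single loss surface over (a, b), so the update steps downhill on the loss for the whole dataset and not for any one point. The check below compares the averaged derivatives with finite differences of the loss; the data and starting guesses are made up.

```python
import numpy as np


def lin(a, b, x):
    # Assumed form of the notebook's linear function.
    return a * x + b


def mse(a, b, x, y):
    # Mean squared error over all points (assumed loss).
    return ((lin(a, b, x) - y) ** 2).mean()


rng = np.random.default_rng(0)
x = rng.random(30)
y = lin(3.0, 8.0, x)            # synthetic targets from a known line
a_guess, b_guess = -1.0, 1.0

# Per-point derivatives, as in upd(), then averaged.
per_point_db = 2 * (lin(a_guess, b_guess, x) - y)
per_point_da = x * per_point_db
grad_a, grad_b = per_point_da.mean(), per_point_db.mean()

# Central finite differences of the averaged loss.
eps = 1e-6
num_a = (mse(a_guess + eps, b_guess, x, y) - mse(a_guess - eps, b_guess, x, y)) / (2 * eps)
num_b = (mse(a_guess, b_guess + eps, x, y) - mse(a_guess, b_guess - eps, x, y)) / (2 * eps)

print(np.isclose(grad_a, num_a), np.isclose(grad_b, num_b))
```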

vahuja4 · February 14, 2017, 5:32am

Here is the animate method from sgd.ipynb that was introduced in Lesson 2:

fig = plt.figure(dpi=100, figsize=(5, 4))
plt.scatter(x,y)
line, = plt.plot(x,lin(a_guess,b_guess,x))
plt.close()

def animate(i):
    line.set_ydata(lin(a_guess,b_guess,x))
    for i in range(100): upd()
    return line,

ani = animation.FuncAnimation(fig, animate, np.arange(0, 40), interval=100)
ani

In this snippet, every time the animate method is called, the weight update method is called 100 times. Is 100 chosen for a particular reason? Should we not check whether the gradient is all 0s (meaning we have reached a minimum), and stop then?
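
The stopping rule asked about here could look like the sketch below. It is not the notebook’s code: fit, tol and max_steps are hypothetical names, lin is assumed to be a*x + b, and the data is synthetic. In floating point the gradient rarely becomes exactly 0, so the loop stops once both derivatives fall below a small tolerance, with a step cap as a fallback.

```python
import numpy as np


def lin(a, b, x):
    # Assumed form of the notebook's linear function.
    return a * x + b


def fit(x, y, a=-1.0, b=1.0, lr=0.1, tol=1e-6, max_steps=100000):
    """Gradient descent that stops when both derivatives are below tol."""
    for step in range(max_steps):
        err = lin(a, b, x) - y
        grad_b = (2 * err).mean()
        grad_a = (2 * err * x).mean()
        if max(abs(grad_a), abs(grad_b)) < tol:
            return a, b, step      # close enough to a minimum
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, max_steps          # gave up before reaching tol


rng = np.random.default_rng(0)
x = rng.random(30)
y = lin(3.0, 8.0, x)               # synthetic targets from a known line
a_fit, b_fit, steps = fit(x, y)
print(a_fit, b_fit, steps)
```

A fixed count such as 100 updates per frame is a simpler choice for an animation, where the goal is visible progress between frames and not a convergence test.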

vahuja4 · February 14, 2017, 10:42am

I am a bit lost in the part where @jeremy discusses using a linear model with the imagenet probabilities as inputs. Text in the notebook:

“They ignore information available in the predictions; for instance, if the models predicts that there is a bone in the image, it’s more likely to be a dog than a cat.”

Here, “they” refers to the manual mapping of the 1000 class probabilities to 2 classes (dog and cat). I am not able to understand how the class probabilities encode this kind of information.

rashudo · February 14, 2017, 4:02pm

I’m busy rewriting the lesson 2 notebook in my own code. I’ve arrived at the part where we pop the last layer (1000 classes) and replace it with a 2-class dense, fully connected layer. When I run fit_model, the validation accuracy is stuck at .5, or 50%. I don’t understand what is wrong here. What could be causing that? All the steps before that ran fine and produced the same results as in the original notebook.

Could it be that class information is missing from the training set? Did anyone else run into this problem? Or are the training and validation classes ‘misaligned’?

radek · February 14, 2017, 4:58pm

earlier, linear model approach =>

keep the last layer that outputs probabilities across 1000 labels that sum to one for each example where we run the prediction
stick another layer on top of that that takes a linear combination of those predictions
vgg claims it likely sees a bone, some animal and maybe a ball in the image -> our last layer combines those predictions and maybe is inclined to indicate this is a picture of a dog

finetuning =>

get rid of the last layer, the output layer predicting across 1000 labels
stick a layer with two classes on top of the model
train the model to go directly from features it produces in the last but one layer to predictions across two classes, cats and dogs
the rest of the model is frozen - we are only training the top most layer (for the time being at least)
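
The difference between the two approaches can be sketched in terms of what the new two-class layer receives as input. The arrays below are random stand-ins, not real VGG activations, and the width of 4096 for the last-but-one layer is an assumption about the VGG16 architecture; dense_softmax is a hypothetical helper.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with the row max subtracted for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def dense_softmax(inputs, weights, bias):
    # A dense layer followed by softmax: the only trainable part in both approaches.
    return softmax(inputs @ weights + bias)


rng = np.random.default_rng(0)
n = 8  # hypothetical batch of 8 images

# Linear model approach: input is the 1000 class probabilities.
probs_1000 = softmax(rng.normal(size=(n, 1000)))
w_lin, b_lin = rng.normal(scale=0.01, size=(1000, 2)), np.zeros(2)
out_lin = dense_softmax(probs_1000, w_lin, b_lin)

# Finetuning: the 1000-way layer is removed, input is the layer before it.
feats_4096 = np.maximum(rng.normal(size=(n, 4096)), 0)  # ReLU-like stand-in features
w_ft, b_ft = rng.normal(scale=0.01, size=(4096, 2)), np.zeros(2)
out_ft = dense_softmax(feats_4096, w_ft, b_ft)

print(out_lin.shape, out_ft.shape)          # both give one row of 2 scores per image
print(w_lin.size + b_lin.size, w_ft.size + b_ft.size)  # trainable parameter counts
```

In both cases only the small top layer is trained; what changes is whether it sees the 1000 probabilities or the features that fed them.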