Machine Learning Lecture 8
Aside: knowing how to sell will always be part of everyone's job - be noticed!
Moving from Decision Trees to Deep Learning
There are some situations where a random forest will not perform as well. For the Kaggle grocery competition, it's difficult to get a good score with a random forest classifier. This is due to:
- high cardinality
- time series
- lots of data
What about recognizing digits?
1. Get the Data
Load libraries
%load_ext autoreload
%autoreload 2
from fastai.imports import *
from fastai.torch_imports import *
from fastai.io import *
Pull the saved data
URL='http://deeplearning.net/data/mnist/'
!wget http://deeplearning.net/data/mnist/
--2017-11-21 14:49:53-- http://deeplearning.net/data/mnist/
Resolving deeplearning.net... 132.204.26.28
Connecting to deeplearning.net|132.204.26.28|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 935 [text/html]
Saving to: ‘index.html.1’
index.html.1 100%[===================>] 935 --.-KB/s in 0s
2017-11-21 14:49:53 (89.2 MB/s) - ‘index.html.1’ saved [935/935]
path = './'
FILENAME='mnist.pkl.gz'
def load_mnist(filename):
    return pickle.load(gzip.open(filename, 'rb'), encoding='latin-1')
Pickle - general Python object saving / loading
Great for saving Python objects: dictionaries, lists, and more complex objects. It can save almost anything, but it is not always the optimal format.
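A minimal sketch of a pickle round-trip (the object and filename here are made up for illustration):
import pickle, gzip

obj = {'weights': [0.1, 0.2], 'labels': ['a', 'b']}

# save: pickle can serialize most Python objects
with gzip.open('obj.pkl.gz', 'wb') as f:
    pickle.dump(obj, f)

# load it back
with gzip.open('obj.pkl.gz', 'rb') as f:
    restored = pickle.load(f)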
Feather - DataFrame-specific saving / loading
Feather is designed specifically for saving DataFrames, with very fast writes and loads.
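A minimal sketch of the feather round-trip (hypothetical DataFrame and filename; assumes a feather backend such as pyarrow is installed):
import pandas as pd

df = pd.DataFrame({'store': [1, 2], 'sales': [10.5, 3.2]})

# write and read back a DataFrame quickly
df.to_feather('tmp-df.feather')
df2 = pd.read_feather('tmp-df.feather')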
Destructuring
If we know that a function returns a 2-item tuple, we can unpack it into matching variables outside the function, as shown below.
def myfunc():
    return (2, 3)

(a, b) = myfunc()
# a == 2, b == 3 - each is a plain value, no longer a tuple
get_data(URL+FILENAME, path+FILENAME)
((x, y), (x_valid, y_valid), _) = load_mnist(path+FILENAME)
2. Data Normalization + Processing
Image size: 28 x 28
We also notice that each image is stored as a flat vector of 784 values; this is just how the data is saved, so we will need to reshape it back into 28 x 28 images.
Vocab:
Vector = 1d array = rank 1 tensor
Matrix = 2d array = rank 2 tensor
Cube = 3d array = rank 3 tensor
If we take a look at the dimensions of the variables that we have loaded
type(x), x.shape, type(y), y.shape
(numpy.ndarray, (50000, 784), numpy.ndarray, (50000,))
Does normalization of the independent variables matter for random forests?
No. Random forests only care about the sort order of the values, not their scale. This is also why random forests are resilient to outliers.
Does normalization matter for deep learning ?
Yes, because we are training a parameterized model.
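A tiny illustration with made-up numbers (not from the lecture): normalizing changes the scale but not the sort order, so every possible tree split stays the same.
import numpy as np

a = np.array([3.0, 100.0, -2.0, 7.0])
b = (a - a.mean()) / a.std()          # a normalized copy

# the ordering of the values is identical, so a tree would choose the same splits
print(np.argsort(a), np.argsort(b))   # same index order for both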
Normalizing the entire dataset
mean = x.mean()
std = x.std()
x=(x-mean)/std
x.mean(), x.std()
(-3.1638146e-07, 0.99999934)
Normalize the Validation Set
Note that we are using the training set's mean and std to transform the validation set. This ensures every dataset goes through exactly the same transformation.
x_valid = (x_valid-mean)/std
x_valid.mean(), x_valid.std()
(-0.0058509219, 0.99243325)
Helper Methods for Plotting
Wrapping the matplotlib library for showing images to make our lives a bit easier
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
def show(img, title=None):
    plt.imshow(img, interpolation='none', cmap="gray")
    if title is not None: plt.title(title)

def plots(ims, figsize=(12,6), rows=2, titles=None):
    f = plt.figure(figsize=figsize)
    cols = len(ims)//rows
    for i in range(len(ims)):
        sp = f.add_subplot(rows, cols, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i], interpolation='none', cmap='gray')
Remember that all the data is stacked together as flat rows; we will reshape it into the individual images.
x_valid.shape
(10000, 784)
**10,000 x 784** --> **10,000 x 28 x 28**
(-1,28,28)
- In all neural networks, the first axis indexes the examples (a row, an image, a sentence), so we keep it and reshape the remaining values into the last two axes. The -1 tells NumPy to infer that first dimension.
x_imgs = np.reshape(x_valid, (-1,28,28)); x_imgs.shape
(10000, 28, 28)
Example of a Reshape operation
Look at an image sample - the first image
show(x_imgs[0], y_valid[0])
Look at the labels - the first label
y_valid.shape
(10000,)
y_valid[0]
3
Look at part of an image (look at a portion of image #1)
Note we are using numpy’s indexing to get to a portion of a matrix
x_imgs[0,10:15,10:15]
array([[-0.42452, -0.42452, -0.42452, -0.42452, 0.17294],
[-0.42452, -0.42452, -0.42452, 0.78312, 2.43567],
[-0.42452, -0.27197, 1.20261, 2.77889, 2.80432],
[-0.42452, 1.76194, 2.80432, 2.80432, 1.73651],
[-0.42452, 2.20685, 2.80432, 2.80432, 0.40176]], dtype=float32)
show(x_imgs[0,10:15,10:15])
Preview some of the different classes of images
plots(x_imgs[:8], titles=y_valid[:8])
Intro to Neural Networks
A neural network is a class of mathematical functions. By the universal approximation theorem, it can approximate essentially any function, assuming we make it big enough.
On Functions:
A function takes inputs and returns outputs. For instance, f(x) = 3x + 5 is an example of a function. If we input 2, the output is 3 × 2 + 5 = 11; if we input -1, the output is 3 × (-1) + 5 = 2.
Functions have parameters. The above function f is ax + b, with parameters a and b set to a=3 and b=5.
Machine learning is often about learning the best values for those parameters. For instance, suppose we have the data points on the chart below. What values should we choose for a and b?
Animation: fitting a function to the data.
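The animation shows a line being fitted to the points. As a minimal sketch with synthetic data (not the chart from the lecture), we could recover a and b by minimizing the squared error, for example with np.polyfit:
import numpy as np

# synthetic points generated from y = 3x + 5 plus a little noise (an assumption for illustration)
x_pts = np.linspace(0, 10, 50)
y_pts = 3 * x_pts + 5 + np.random.randn(50)

# fit a degree-1 polynomial (a line); polyfit returns [slope, intercept]
a_hat, b_hat = np.polyfit(x_pts, y_pts, deg=1)
print(a_hat, b_hat)   # roughly 3 and 5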
So what is a Neural-Net?
Neural Networks with Pytorch
Graphical processing units (GPUs) allow for matrix computations to be done with much greater speed, as long as you have a library such as PyTorch that takes advantage of them. Advances in GPU technology in the last 10-20 years have been a key part of why neural networks are proving so much more powerful now than they did a few decades ago.
You may own a computer with a GPU that can be used. For the many people that either don't have a GPU (or have a GPU which can't easily be accessed from Python), there are a few different options:
- Don't use a GPU: for the sake of this tutorial, you don't have to use a GPU, although some computations will be slower. The only change needed to the code is to remove .cuda() wherever it appears.
Import libraries
from fastai.metrics import *
from fastai.model import *
from fastai.dataset import *
from fastai.core import *
import torch.nn as nn
Torch.nn Docs can be found here:
http://pytorch.org/docs/master/nn.html
PyTorch Sequential model
net = nn.Sequential(
nn.Linear(28*28, 10),
nn.LogSoftmax()
)#.cuda() #<--- signals to run on the GPU
nn.Linear(28*28, 10)
- the input is a flattened 28 x 28 image (784 values) and the output has 10 values, one per digit class. This is a linear layer: y = ax + b in matrix form.
nn.LogSoftmax()
- this is the non-linear layer; its outputs are log-probabilities, which can be converted into probabilities and compared across classes.
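A quick sketch of what LogSoftmax does, using made-up scores (not from the lecture): exponentiating its output gives probabilities that sum to 1.
import torch
import torch.nn as nn

scores = torch.tensor([[2.0, 1.0, 0.1]])     # raw outputs for one example with 3 classes (made-up numbers)
log_probs = nn.LogSoftmax(dim=1)(scores)     # log-probabilities
print(torch.exp(log_probs))                  # probabilities, roughly [0.66, 0.24, 0.10]
print(torch.exp(log_probs).sum())            # tensor(1.)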
md = ImageClassifierData.from_arrays(path, (x,y), (x_valid, y_valid))
ImageClassifierData
- wrapper for all the data
path
- file path location
(x,y)
- training data
(x_valid, y_valid)
- validation data
x.shape
(50000, 784)
loss=nn.NLLLoss()
metrics=[accuracy]
opt=optim.Adam(net.parameters())
Loss function - this is our primary measure of how well we are doing: how do we score? (lower is better)
nn.NLLLoss()
class torch.nn.NLLLoss(weight=None, size_average=True, ignore_index=-100, reduce=True)
The negative log likelihood loss.
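A minimal sketch of NLLLoss with made-up values (not from the lecture): it takes log-probabilities and the target class indices, and returns the mean negative log-probability of the correct classes.
import torch
import torch.nn as nn

# made-up log-probabilities (as if they came from LogSoftmax) and target class indices
log_probs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                    [0.1, 0.8, 0.1]]))
targets = torch.tensor([0, 1])

nn.NLLLoss()(log_probs, targets)   # -(log 0.7 + log 0.8) / 2 ≈ 0.29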
fit(net, md, epochs=1, crit=loss, opt=opt, metrics=metrics)
[ 0. 0.31865 0.2866 0.91849]
Sample of a Binary Loss function
def binary_loss(y, p):
    return np.mean(-(y * np.log(p) + (1-y) * np.log(1-p)))
acts_sample = np.array([1, 0, 0, 1])
preds_sample = np.array([0.9, .1, .2, .8])
binary_loss(acts_sample, preds_sample)
0.164252033486018
We could also write this with an if statement instead of the arithmetic trick - see if you can scale it out to a categorical cross-entropy (one possible sketch follows below).
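One possible way to scale it out (a sketch, not the lecture's answer): with one-hot-encoded targets, the binary loss generalizes to a categorical cross-entropy that keeps only the log-probability of the correct class. Using the same four examples as above, written as one-hot targets and per-class probabilities, it gives the same 0.164.
def categorical_cross_entropy(y, p):
    # y: (n, classes) one-hot targets; p: (n, classes) predicted probabilities
    return np.mean(-np.sum(y * np.log(p), axis=1))

y_onehot = np.array([[0, 1], [1, 0], [1, 0], [0, 1]])
p_2col = np.array([[0.1, 0.9], [0.9, 0.1], [0.8, 0.2], [0.2, 0.8]])
categorical_cross_entropy(y_onehot, p_2col)   # 0.164..., same as binary_loss above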
Predictions: 10,000 x 10
For each of the images in the validation set, we are making 10 predictions (probability for each of the numbers)
preds = predict(net, md.val_dl)
preds.shape
(10000, 10)
Looking at a single row
argmax
- returns the index of the maximum value
preds[1]
array([-10.98194, -7.74565, -5.0269 , -4.07413, -4.27588, -3.71443, -9.82602, -9.61403, -0.06536,
-7.05641], dtype=float32)
preds.argmax(1)[:5]
array([3, 8, 6, 9, 6])
preds = preds.argmax(1)
np.sum(preds == y_valid)/len(preds)
0.91810000000000003
plots(x_imgs[:8], titles=preds[:8])
We see that the prediction for the 4 is incorrect and the one for the 3 is shaky. One reason the model is not more accurate is that we only have a single layer, so we are essentially fitting a logistic regression. A deep neural network would have many more layers (see the sketch after the code block below).
net = nn.Sequential(
    nn.Linear(28*28, 10),  # <--- input is multiplied
    nn.LogSoftmax()        # <--- then logistic regression is applied
)#.cuda()
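As a rough sketch of what a deeper model might look like (not the network trained in this lecture; the hidden size of 100 and the use of ReLU are assumptions), we would add a hidden layer with a non-linearity between the two linear layers:
net_deeper = nn.Sequential(
    nn.Linear(28*28, 100),   # hidden layer - 100 is an arbitrary choice
    nn.ReLU(),               # non-linearity between the linear layers
    nn.Linear(100, 10),      # output layer, one value per digit class
    nn.LogSoftmax()
)#.cuda()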
Let’s Define Logistic Regression
We will do this by defining a PyTorch class.
torch.randn(3,4) # <-- similar to numpy language
1.9552 -0.5213 0.3728 0.8425
-0.8652 -0.7114 0.4367 -0.0971
-1.0709 -0.0252 0.8083 -0.1447
[torch.FloatTensor of size 3x4]
torch.randn(3,4).cuda() # <-- similar to numpy language, but with GPU!
-0.3837 -0.9107 0.7963 -2.0376
0.8746 1.1661 0.9240 0.1158
-1.3293 1.9849 -0.8541 0.0062
[torch.FloatTensor of size 3x4]
def get_weights(*dims): return nn.Parameter(torch.randn(*dims)/dims[0])
Initializes random weights that are nicely behaved: torch.randn draws from a standard normal (mean 0, standard deviation 1), and dividing by the first dimension keeps the weights small so the initial outputs are well scaled.
A basic logistic regression PyTorch class (written from scratch rather than using built-in layers / nn architectures)
class LogReg(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1_w = get_weights(28*28, 10)  # Layer 1 weights
        self.l1_b = get_weights(10)         # Layer 1 bias

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = torch.matmul(x, self.l1_w) + self.l1_b  # Linear Layer
        x = torch.log(torch.exp(x)/(1 + torch.exp(x).sum(dim=0)))  # Non-linear (LogSoftmax) Layer
        return x
nn.Module
<-- we are extending a PyTorch class: we borrow all the methods from the standard module and add some additional methods to it.
super().__init__()
<-- as a result, we have to instantiate the standard module first
self.l1_w = get_weights(28*28, 10)
is essentially the ax
part of y=ax + b
self.l1_b = get_weights(10)
is essentially the b
part of y=ax + b
Pytorch special method: forward
forward
- a special hook in the PyTorch library; this is where we implement each layer of the computation. From the PyTorch docs: you can register a function on a Module or a Variable. The hook can be a forward hook or a backward hook. The forward hook will be executed when a forward call is executed. The backward hook will be executed in the backward phase. Let's look at an example.
x = x.view(x.size(0), -1)
- is the reshape (flatten)
x = torch.matmul(x, self.l1_w) + self.l1_b
- matrix multiplication
x = torch.log(torch.exp(x)/(1 + torch.exp(x).sum(dim=0)))
- the (log) softmax written from scratch
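For reference (not part of the lecture's code), a textbook log-softmax normalizes by the sum of the exponentials over the class dimension, so each row of probabilities sums to 1:
import torch

x = torch.randn(4, 10)   # pretend activations: 4 examples, 10 classes (made-up data)
log_sm = torch.log(torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True))
print(torch.exp(log_sm).sum(dim=1))   # each row sums to 1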
How the matrix multiplication works out (screenshot):
net2 = LogReg()#.cuda()
opt=optim.Adam(net2.parameters())
fit(net2, md, epochs=1, crit=loss, opt=opt, metrics=metrics)
[ 0. 2.43236 2.39509 0.90993]
np.sum(preds == y_valid)/len(preds)
0.91810000000000003
preds = predict(net2, md.val_dl).argmax(1)
plots(x_imgs[:8], titles=preds[:8])
To dos:
Play with the PyTorch tutorials (a small starter sketch follows the list):
- matrix multiplication
- play with tensors
- play with pytorch and functions
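A possible starting point for those to-dos (just a sketch of things to try, not assigned code):
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 2)

torch.matmul(a, b).shape   # matrix multiplication: (3, 4) @ (4, 2) -> (3, 2)
(a * 2 + 1).mean()         # elementwise tensor arithmetic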