Wiki: Lesson 1


Can someone please explain the “binary_loss” in the last couple of cells of lesson 1?

import numpy as np

def binary_loss(y, p):
    # binary cross-entropy, averaged over the examples
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))
  1. Why is it the “binary” loss? Does that just mean that it is normalized between 0 and 1?
  2. What is the “y” here? It is not the y from log_preds, y = learn.TTA(), because it crashes if I use that “y”.
  3. I see that when a list is made to call this function, the variable is called “acts”. What does that stand for? I realise that it is y, but I don’t understand the name.
  4. How would I get y from the confusion matrix? I got inf when I tried using probs, so what is probs then?

Also, one really dumb question, but I just want to verify I understand - what does “precompute” actually precompute when set to True?


(Pierre Guillou) #249

Check your understanding of the lesson 1

(original post in Portuguese at Deep Learning Brasília - Lição 1)

Hi guys,

I watched the lesson 1 (part 1) video again to get the whole picture, and I took notes of the vocabulary used by @jeremy.

Let’s play! OK? :wink:
Can you give a definition / a URL / an explanation for all of the following terms and expressions?

If yes, you are done with the first lesson!!! :sunglasses: :sunglasses: :sunglasses:

PS: Don’t want to test yourself, or do you want to check your answers? Go to the blog post “Deep Learning 2: Part 1 Lesson 1” by @hiromi: “great work!!!” :slight_smile:

  • course Fastai
  • forum Fastai
  • GPU
  • CUDA
  • Crestle / PaperSpace
  • jupyter notebook
  • Data Science
  • SHIFT + ENTER in a jupyter notebook
  • python 3
  • wget
  • exclamation mark in a cell (e.g. !ls)
  • bash command
  • python variable in curly brackets
  • training set
  • validation set
  • Fastai Machine Learning course: prerequisite or not?
  • image Classifier
  • label
  • keras
  • plt.imread
  • plt.imshow
  • python 3.6 format string
  • img.shape
  • 3 dimensional array (rank 3 tensor)
  • Red Green Blue (RGB) pixel values between 0 and 255
  • kaggle competition
  • pre-trained model
  • resnet34
  • ImageNet competition
  • Convolutional Neural Network (CNN)
  • accuracy
  • train a model
  • 3 lines of code
  • epoch
  • testing set
  • learning rate
  • loss function
  • cross entropy loss
  • validation and testing set accuracy
  • Fastai library
  • transfer learning
  • pytorch
  • tensorflow
  • network architecture
  • data augmentation
  • validation set dependent variable val_y
  • data.classes
  • classes
  • object data
  • object learn
  • the model
  • prediction on validation set
  • learn.predict()
  • log of the predictions : log_preds
  • get the predictions on validation set np.argmax(log_preds, axis=1)
  • get probabilities on dogs : np.exp(log_preds[:,1])
  • numpy
  • top-down, the whole game
  • code driven approach
  • world class neural network
  • satellite images
  • structured data
  • NLP classifier
  • recommendation system
  • text generator
  • create our own architecture from scratch
  • download a pre-trained model and precompute
  • alphago
  • image classifier for fraud detection
  • machine learning
  • Arthur Samuel, 1950s, ML father
  • IBM mainframe
  • play checkers
  • traditional Machine Learning
  • feature engineering
  • domain experts and specialists
  • algorithm (Deep Learning):
    ** infinitely flexible function
    ** all-purpose parameter fitting
    ** fast and scalable
  • neural network, a number of simple linear layers interspersed with a number of non linear layers
  • universal approximation theorem
  • Fit parameters, Gradient Descent (how good are they, find a minimum on the loss function curve, local minimum)
  • minimum time, GPU 10 times faster than a CPU
  • hidden layer
  • increasing the number of parameters per layer is a problem, but increasing the number of layers is the solution
  • DL = neural network with multiple hidden layers
  • Google starts using DL in 2012
  • Geoffrey Hinton, DL father
  • Andrej Karpathy
  • Inbox by Gmail
  • Skype Translator
  • Semantic Style Transfer
  • cancer detection
  • true/false positive/negative
  • CNN, Convolutional Neural Network
  • convolutional
  • find edges
  • multiplication of pixels values by a kernel (filter)
  • linear operation
  • linear layer
  • non linear layer
  • sigmoid
  • ReLU
  • element wise multiplication
  • Michael Nielsen
  • Stochastic Gradient Descent
  • derivative
  • small step
  • learning rate
  • combine convolution, non linearity, gradient descent
  • picture of what each layer learns
  • parameters of the kernels are learnt using gradient descent
  • learning rate not too high, but not too low as well
  • choosing a learning rate
  • learn.lr_find()
  • best improvement of the loss before it gets worse
  • learn.sched.plot_lr()
  • learn.sched.plot()
  • mini batches
  • training loss
  • validation loss
  • validation accuracy
  • overfitting: stop fitting your model
  • TAB to get a list of functions
  • SHIFT + TAB (once: parameters, twice: documentation, 3 times: pops up a window with source code)
  • 1 question mark: documentation
  • 2 question marks: pops up source code
  • H to get the keyboard shortcuts in jupyter notebooks
  • Stop your Crestle or PaperSpace machine !
  • use the fastai forum !

  1. Binary loss is the loss function for a binary (two-class) classification problem, i.e. binary cross-entropy. The name does not mean that the loss itself is normalized between 0 and 1.

  2. y here represents the labels for the examples that the loss is calculated over. For example, if picture 1 is a dog and picture 2 is a cat, then y = [1, 0] (assuming 1 represents dog and 0 represents cat). p represents the probability, output by the model, that each example is a dog (1). See the numeric sketch after this list.

  3. I’m guessing that acts stands for actuals, as in the actual labels.

  4. I’m not sure why you would want to get y from the confusion matrix. As I understand, the confusion matrix is a visualization of the model’s predictions so that you can see which categories your model performs well on and which ones it performs poorly on.
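
To make answers 2 and 3 concrete, here is a minimal numeric sketch of binary_loss; the labels and probabilities below are made up for illustration:

import numpy as np

def binary_loss(y, p):
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

acts = np.array([1, 0])       # actual labels: picture 1 is a dog (1), picture 2 is a cat (0)
probs = np.array([0.9, 0.2])  # model's predicted probability of "dog" for each picture

print(binary_loss(acts, probs))  # ~0.164, small because both predictions are confident and correct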

Setting precompute to True makes the learner use precomputed activations: each image is run once through the frozen pretrained layers (everything except the final layer), those activations are cached, and only the final layer is trained on top of them. With little data it is difficult to properly train the whole model, but training only the last layer is much easier.
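
Concretely, this is the precompute=True path from the lesson 1 notebook (fastai 0.7 API; PATH and sz are assumed to be set as in the notebook):

from fastai.conv_learner import *  # fastai 0.7 imports used in the course

PATH = "data/dogscats/"  # dataset location, as in the lesson 1 notebook
sz = 224                 # image size fed to the model

arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))

# With precompute=True, every image is run once through the frozen pretrained
# layers, the resulting activations are cached, and only the new final layer
# is trained on those cached activations.
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)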

(BixQu) #252

At minute 49:27 in the video, I see a function S(x) = 1/(1-exp(x)). Is that an activation function? It seems to look like a sigmoid, but that is 1/(1+exp(-x)).
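
For comparison, a quick numpy check of the two formulas (just illustrative; the standard sigmoid stays strictly between 0 and 1, which is what makes it usable as an activation, while the formula as transcribed does not):

import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])
sigmoid = 1 / (1 + np.exp(-x))  # standard logistic sigmoid, always in (0, 1)
other = 1 / (1 - np.exp(x))     # formula as transcribed: undefined at x = 0, can go negative

print(sigmoid)  # approx. [0.12, 0.27, 0.73, 0.88]
print(other)    # approx. [1.16, 1.58, -0.58, -0.16]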