Wiki: Lesson 1

Hi all,

I tried searching the forum and this thread for an answer, but there seem to be very few related posts, at least that I could find.

I am currently a complete beginner when it comes to Deep Learning, but I have a solid programming background. I would like to ask for a piece of advice here. I started looking at the first lecture, and after googling the costs of cloud services (AWS, Paperspace, etc.) I came to realise that they are not very cheap in the end, especially with cost increasing in proportion to use.

I have a freshly built (watercooled) rig that I would like to run my Deep Learning server (DLS) on. I run Windows 10, with:
Asus Maximus IX Hero mobo
Intel i7 7700K
GTX 1070
16GB ram
2 SSDs + 1 HDD
Corsair AX 860W PSU

I honestly feel like it would be a pity for me to use anything other than this. I could upgrade the RAM and run SLI if needed. So now to my questions:

  • Since I am primarily a Windows user, I would like to run a VirtualBox VM (VB, or something similar or better, if it exists) for my DLS. Are there any downsides, primarily in terms of speed, to running it through VB, or will it work just as well as AWS?
  • Should I consider upgrading the PC? RAM is of course easy, but for SLI I would have to drain the system and remake some of the water pipes.

If anyone has some good insight and/or links on the matter of setting up a DLS using a VB I would be very grateful!


Hi Hwang,
Same problem here with a Paperspace cloud P5000 instance. The kernel dies about 30 seconds after I run the method. Did you come up with a solution? Any help would be appreciated.


Hi Class!

I’m trying to submit an attempt for the dogs/cats redux. Problem is, my prediction only has 8,819 rows :confused:. I’m pretty sure I’ve parsed the data correctly. My redux data directory contains the three sub-folders I created (train, test, and valid) plus the two that were created when I trained my model (models and tmp). Train and valid each have two sub-folders (cats and dogs). The path variable has been set to the following:

PATH = "data/newdogscats/"

The output of os.listdir(PATH) is as follows:

['.ipynb_checkpoints', 'test', 'valid', 'models', 'train', 'tmp']

The image splits (number of image files) for three sub-folders are as follows:

Train: 8,090 (per class)
Valid: 4,410 (per class)
Test: 12,500

I’ve verified each of these by passing the os.listdir() output to the len() function. Below are the commands I am using to train my model:

data = ImageClassifierData.from_paths(PATH, test_name='test', tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(lr, 2)  # lr: learning rate; its value is not given in this post

and yet my learn.predict() method is only creating 8,819 predictions!? Does this have something to do with the discrepancy between the size of the test and train folders, or is there something else I might want to verify? Thanks in advance for your input.

Update: len(data.val_y) only output 8,819. So, I’m thinking it might be an issue reading files from the test folder.

Second Update: Solved my own problem :+1: It had nothing to do with the commands I pasted. When using learn.predict() you need the is_test = True argument.


Hi Reshama,

I am currently trying to configure AWS to start working on this course. I’m following the link of yours. I tried logging in with
ssh -i aws_fastai_gpu.pem ubuntu@ -L8888:localhost:8888 .

However, the output I got was:
Warning: Identity file aws_fastai_gpu.pem not accessible: No such file or directory.
ssh: connect to host port 22: Operation timed out

It turns out that there is no file called aws_fastai_gpu.pem in the .ssh directory. I am on a Mac, and I had to create a .ssh folder within /Users/arunvs/. The only contents of the directory after running ssh-keygen are id_rsa and known_hosts.

I am not sure how to resolve this issue about logging in. I am new to both AWS and Mac. Your help will be much appreciated.


Can someone please explain the “binary_loss” in the last couple of cells of lesson 1?

def binary_loss(y, p):
    return np.mean(-(y * np.log(p) + (1-y)*np.log(1-p)))
  1. Why is it the “binary” loss? Does that just mean that it is normalized between 0 and 1?
  2. What is the “y” here? It is not the y from log_preds,y = learn.TTA(), because it crashes if I use that “y”.
  3. I see that when a list is made to call this function, the variable is called “acts”. What does this stand for? I realise that it is y, but I don’t understand the name.
  4. How would I get y from the confusion matrix? I got inf when I tried using probs - so maybe what is probs then…

Also, one really dumb question, but I just want to verify I understand - what does “precompute” actually precompute when set to True?


Check your understanding of Lesson 1

Check your understanding of Lesson 2 >>>

(original post in Portuguese at Deep Learning Brasília - Lição 1)

Hi guys,

I rewatched the Lesson 1 (Part 1) video to get the whole picture, and I took notes on the vocabulary used by @jeremy.

Let’s play! OK? :wink:
Can you give a definition, a URL, or an explanation for all the following terms and expressions?

If yes, you are done with the first lesson!!! :sunglasses: :sunglasses: :sunglasses:

PS: If you do not want to test yourself, or you want to check your answers, go to @hiromi's blog post “Deep Learning 2: Part 1 Lesson 1”: great work!!! :slight_smile:

  • course Fastai
  • forum Fastai
  • GPU
  • CUDA
  • Crestle / PaperSpace
  • jupyter notebook
  • Data Science
  • SHIFT + ENTER in a jupyter notebook
  • python 3
  • wget
  • exclamation mark in a cell (ex : !ls)
  • bash command
  • python variable into brackets
  • training set
  • validation set
  • Fastai Machine Learning course: prerequisite or not?
  • image Classifier
  • label
  • keras
  • plt.imread
  • plt.imshow
  • python 3.6 format string
  • img.shape
  • 3 dimensional array (rank 3 tensor)
  • Red Green Blue (RGB) pixel values between 0 and 255
  • Kaggle competition
  • pre-trained model
  • resnet34
  • ImageNet competition
  • Convolutional Neural Network (CNN)
  • accuracy
  • train a model
  • 3 lines of code
  • epoch
  • testing set
  • learning rate
  • loss function
  • cross entropy loss
  • validation and testing set accuracy
  • Fastai library
  • transfer learning
  • pytorch
  • tensorflow
  • network architecture
  • data augmentation
  • validation set dependent variable val_y
  • data.classes
  • classes
  • object data
  • object learn
  • the model
  • prediction on validation set
  • learn.predict()
  • log of the predictions : log_preds
  • get the predictions on validation set np.argmax(log_preds, axis=1)
  • get probabilities on dogs : np.exp(log_preds[:,1])
  • numpy
  • top-down, the whole game
  • code driven approach
  • world class neural network
  • satellite images
  • structured data
  • NLP classifier
  • recommendation system
  • text generator
  • create our own architecture from scratch
  • download a pre-trained model and precompute
  • alphago
  • image classifier for fraud detection
  • machine learning
  • Arthur Samuel, 1950s, ML father
  • IBM mainframe
  • play checkers
  • traditional Machine Learning
  • feature engineering
  • domain experts and specialists
  • algorithm (Deep Learning) :
    ** infinitely flexible function
    ** all-purpose parameter fitting
    ** fast and scalable
  • neural network, number of simple linear layers interspersed with a number of non linear layers
  • universal approximation theorem
  • Fit parameters, Gradient Descent (how good are they, find a minimum on loss function curve, local minimum)
  • minimum time, GPU 10 times faster than a CPU
  • hidden layer
  • increase of number of parameters by layer is a problem but increase of number of layers is the solution
  • DL = neural network with multiple hidden layers
  • Google starts using DL in 2012
  • Geoffrey Hinton, DL father
  • Andrej Karpathy
  • Inbox by Gmail
  • Skype Translator
  • Semantic Style Transfer
  • cancer detection
  • true/false positive/negative
  • CNN, Convolutional Neural Network
  • convolutional
  • find edges
  • multiplication of pixels values by a kernel (filter)
  • linear operation
  • linear layer
  • non linear layer
  • sigmoid
  • Relu
  • element wise multiplication
  • Michael Nielsen
  • Stochastic Gradient Descent
  • derivative
  • small step
  • learning rate
  • combine convolution, non linearity, gradient descent
  • picture of what each layer learns
  • parameters of the kernels are learnt using gradient descent
  • learning rate not too high, but not too low as well
  • choosing a learning rate
  • learn.lr_find()
  • best improvement of the loss before it gets worse
  • learn.sched.plot_lr()
  • learn.sched.plot()
  • mini batches
  • training loss
  • validation loss
  • validation accuracy
  • overfitting : stop fitting your model
  • TAB to get a list of functions
  • SHIFT + TAB (once : parameters, twice : documentation, 3 times : pops up a window with source code)
  • 1 question mark : documentation
  • 2 question mark : pops up source code
  • H to get keyboard shortcuts in Jupyter notebooks
  • Stop your Crestle or PaperSpace machine !
  • use the fastai forum !
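
Two of the NumPy items in the list above, np.argmax(log_preds, axis=1) and np.exp(log_preds[:,1]), can be tried without a trained model. The sketch below uses a made-up log_preds array (an assumption; in the lesson it comes from learn.predict()), with column 0 for cats and column 1 for dogs.

```python
import numpy as np

# Hypothetical log-probabilities for 3 images; columns are [cat, dog].
# In the lesson this array would come from learn.predict().
log_preds = np.log(np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.4, 0.6],
]))

# Predicted class index per image: the column with the highest log-probability.
preds = np.argmax(log_preds, axis=1)

# Probability of "dog" (column 1): exp undoes the log.
probs_dog = np.exp(log_preds[:, 1])

print(preds)
print(probs_dog)
```

Here preds holds one class index per image, and probs_dog holds values between 0 and 1.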
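
  1. Binary loss is the loss function for a binary (two-class) classification problem; the formula here is binary cross-entropy. It does not mean that the loss itself is normalized from 0 to 1: the value can be arbitrarily large when a confident prediction is wrong.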

  2. y here represents the labels for the examples that the loss is calculated for. For example, if picture 1 is a dog and picture 2 is a cat, then y = [1, 0] (assuming 1 represents dog and 0 represents cat). p represents the probability that the example is a dog (1), output by the model.

  3. I’m guessing that acts stands for actuals, as in the actual labels.

  4. I’m not sure why you would want to get y from the confusion matrix. As I understand, the confusion matrix is a visualization of the model’s predictions so that you can see which categories your model performs well on and which ones it performs poorly on.
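
To make points 1, 2 and 4 concrete, here is a small sketch using the binary_loss function from the question. The label and probability arrays are made up for illustration, and binary_loss_clipped is a hypothetical helper of mine (not part of the lesson notebook) showing one common way to avoid inf when a probability is exactly 0 or 1.

```python
import numpy as np

def binary_loss(y, p):
    # Binary cross-entropy: y holds the true labels (0 or 1),
    # p holds the predicted probability of class 1.
    return np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p)))

def binary_loss_clipped(y, p, eps=1e-7):
    # Hypothetical variant: keep p away from exactly 0 and 1 so log() stays finite.
    p = np.clip(p, eps, 1 - eps)
    return binary_loss(y, p)

y = np.array([1, 0, 0, 1])          # made-up labels: dog, cat, cat, dog
p = np.array([0.9, 0.1, 0.2, 0.8])  # made-up predicted P(dog)
loss = binary_loss(y, p)

# One confident wrong prediction pushes the loss well above 1.
p_bad = np.array([1e-4, 0.1, 0.2, 0.8])
loss_bad = binary_loss(y, p_bad)

# p exactly 0 for a true label of 1 would give inf; clipping keeps it finite.
p_zero = np.array([0.0, 0.1, 0.2, 0.8])
loss_zero = binary_loss_clipped(y, p_zero)

print(loss, loss_bad, loss_zero)
```

If the inf in question 4 came from probabilities of exactly 0 or 1, clipping like this would hide the symptom, but it is still worth checking that the y passed in really contains 0/1 labels.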

Setting precompute to True makes the learner use precomputed activations: the outputs of the frozen pretrained layers are computed once for each image and cached, and only the newly added last layer is trained on top of them. This is faster, and with little data it would be difficult to train the whole model properly anyway, while training only the last layer is easier to do.


At minute 49:27 in the video, I see a function S(x) = 1/(1-exp(x)). Is that an activation function? It seems to look like a sigmoid, but that is 1/(1+exp(-x)).
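
A quick numerical check can help here. The sketch below compares the standard sigmoid 1/(1+exp(-x)) with the formula as transcribed in the question, 1/(1-exp(x)). Whether the video actually shows the latter is not something this check can confirm.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid: output always lies strictly between 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def transcribed(x):
    # Formula as written in the question; undefined at x = 0 (division by zero).
    return 1.0 / (1.0 - np.exp(x))

xs = np.array([-2.0, -0.5, 0.5, 2.0])  # x = 0 is skipped on purpose
sig = sigmoid(xs)
other = transcribed(xs)

print(sig)
print(other)
```

The standard sigmoid stays inside (0, 1), while the transcribed formula goes above 1 for negative x and below 0 for positive x, so the two are not the same function. A sign slip when reading the slide seems the likeliest explanation.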

Here are instructions for creating a keypair

You will want to replace aws_fastai_gpu.pem with your keypair name which will be

Thanks @reshama for your paperspace setup instructions. Extremely useful!


Did a git pull and conda env update this evening and now lesson1.ipynb (for cell 29) gives AttributeError: 'ConvLearner' object has no attribute 'data_path'.

Found this post 15606 which fixed the problem. Thanks Tal!

What is the relationship between epoch and batch size? How do I set the batch size correctly?

At 1:19 the teacher is talking about epoch and batch size: at each epoch we take a batch size of 64…

I noticed I was unable to plot the learning rate with learn.sched.plot() until I set the batch size to 6 for my 200 images (100 of each type), with a setting of 75% training 15% valid. When I inspected the current batch size, it was already set to 64 before I changed it for my dataset.

Thanks for clarifications!
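
On the relationship between the two: one epoch is one full pass over the training set, and the batch size is the number of images processed per weight update, so the number of updates per epoch is roughly the training-set size divided by the batch size. A minimal sketch of that arithmetic, assuming 150 training images (75% of the 200 mentioned above):

```python
import math

def batches_per_epoch(n_train, batch_size):
    # One epoch = one pass over all training items, split into mini-batches.
    # The last batch may be smaller, hence the ceiling.
    return math.ceil(n_train / batch_size)

n_train = 150  # assumption: 75% of 200 images

print(batches_per_epoch(n_train, 64))
print(batches_per_epoch(n_train, 6))
```

With a batch size of 64 there are only a handful of batches per epoch, so there are very few points to draw. That may be why the plot only appeared after the batch size was reduced, but this is a guess rather than something confirmed in the thread.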

Thanks a bunch @duanev, I had the same issue. Thanks for sharing the link to the solution.


Hi prairieguy,
I get an error “selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status: 1” when I run the script. Could I be doing something wrong?

Thanks for the terms and expressions!! It is much appreciated. Keep doing it.

Why do we always have to find the local minimum in Deep Learning algorithms? What purpose does it serve?

The issue with np.mean() and the call to accuracy_np(probs, y) was that it failed when passed a one-dimensional array:

AxisError: axis 1 is out of bounds for array of dimension 1

Update: for some reason pip wasn’t installing the latest version of fastai. I replaced it by pulling directly from GitHub and it all works now. So it was a false alarm.
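
For anyone hitting the same message, it can be reproduced with plain NumPy. The sketch below assumes the failing step was an argmax over axis 1 (the fastai source for accuracy_np has not been checked here), which only works on a 2-D array of per-class predictions.

```python
import numpy as np

probs_2d = np.array([[0.9, 0.1], [0.2, 0.8]])  # shape (2, 2): one row per image
probs_1d = np.array([0.1, 0.8])                # shape (2,): there is no axis 1

ok = np.argmax(probs_2d, axis=1)  # works: picks the best column per row

try:
    np.argmax(probs_1d, axis=1)
    error_message = None
except ValueError as err:  # NumPy's AxisError is a subclass of ValueError
    error_message = str(err)

print(ok)
print(error_message)
```

Printing probs.shape before the failing call is a quick way to tell whether the array has one column per class or has already been reduced to one value per image.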

Ideally we would want to find the global minimum of our loss function, which measures “how far away” our predictions are from the desired values. In practice, though, chasing the global minimum on the training set can lead to overfitting.
From this paper:

We empirically verify several hypotheses regarding learning with large-size networks:
• For large-size networks, most local minima are equivalent and yield similar performance on a test set.
• The probability of finding a “bad” (high value) local minimum is non-zero for small-size networks and decreases quickly with network size.
• Struggling to find the global minimum on the training set (as opposed to one of the many good local ones) is not useful in practice and may lead to overfitting.
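
To illustrate what a local minimum is, here is a toy sketch of plain gradient descent on a one-dimensional function with two minima. The function, learning rate and starting points are arbitrary choices for illustration and say nothing about a real network's loss surface.

```python
def f(x):
    # Toy "loss" with two minima: a deeper one near x = -1.30
    # and a shallower one near x = 1.13.
    return x**4 - 3 * x**2 + x

def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    # Repeatedly take a small step against the gradient.
    x = x0
    for _ in range(steps):
        x = x - lr * grad_f(x)
    return x

x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)

print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop where the gradient is close to zero, but they end at different points with different loss values depending on where they started. Gradient descent only finds a nearby minimum; the quoted paper argues that for large networks most such minima perform about equally well.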


Hi everyone,
I have written my first blog post on Dogs vs. Cats classification. Please have a look.

Suggestions for further improvement are welcome.


Hi, just some feedback. I was following an older version of this course a while ago, and I found that much, much easier to follow than this version.

The old one had a few utility methods and stuff (“” and “”!), but this new one comes with thousands of lines of “helpful” code in the fastai library, way too much to casually understand without a lot of work.

Now I feel like I’m not learning how to use keras or theano or tensorflow or pytorch, I’m just investing a lot of time into learning your made-for-this-course framework.

I’m willing to work hard, but if I put in the work to understand the fastai library, it’s not transferable or useful. I’d much rather slowly build up over time all the code for image loading, transforming, model creation, etc. Then at least that effort teaches me something that’s useful in the future.

As helpful as the fastai library is, it’s not likely to be used outside of this course. Rather than learn it, I’d like to learn how to do those things myself.