Lesson 2 discussion

My issue is resolved. An answer I received on another forum:

When using fully connected layers, you typically flatten multidimensional arrays into vectors, because by using an FC layer you’re acknowledging that spatial structure doesn’t matter. Keras is probably expecting a 2-D input [num_examples, example_size]. Since spatial structure probably does matter here, you might want to use a convolutional layer instead.
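To illustrate the difference, here’s a minimal sketch (my own example, not from the original answer; it assumes Keras 1 style layer names, channels-last 28x28 greyscale images, and made-up layer sizes):

    from keras.models import Sequential
    from keras.layers import Flatten, Dense, Convolution2D

    # Option 1: fully connected head - flatten each (28, 28, 1) image into a
    # 784-vector, which throws away the spatial structure
    fc_model = Sequential([
        Flatten(input_shape=(28, 28, 1)),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax'),
    ])

    # Option 2: convolutional layer - keeps the 2-D structure of the input
    conv_model = Sequential([
        Convolution2D(32, 3, 3, activation='relu', input_shape=(28, 28, 1)),
        Flatten(),
        Dense(10, activation='softmax'),
    ])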


I had the same error: ImportError: No module named tensorflow.examples.tutorials.mnist.

This worked for me (Ubuntu 16.04):

  • install TensorFlow for Anaconda as per Rachel’s instructions

  • switch to TensorFlow command line:
    ezchx@ezchx-DX4300:~/fastai$ source activate tensorflow

  • install Jupyter Notebook for TensorFlow
    (tensorflow) ezchx@ezchx-DX4300:~/fastai$ pip install jupyter

  • close all running versions of Jupyter Notebook, start a new command line, switch to TensorFlow command line, and open jupyter notebook from there:
    (tensorflow) ezchx@ezchx-DX4300:~/fastai$ jupyter notebook

Please note that if you open Jupyter notebook from a standard / non-TensorFlow command line, TensorFlow will not work.
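Once the notebook is open, a quick sanity check that it’s really using the TensorFlow environment (my own sketch; the tutorials module only ships with the older TensorFlow 1.x releases this course used):

    # Run in a notebook cell; both imports should succeed if the env is right
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    print(tf.__version__)
    mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)  # downloads MNIST on first run
    print(mnist.train.num_examples)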

I also had to pip install matplotlib and scipy into the TensorFlow environment to run convolution-intro-richie.ipynb.


Hey Everyone-

I’m reading through the suggested chapters of Michael Nielsen’s Neural Networks and Deep Learning book, and I’m having trouble understanding the Quadratic Cost function. At a high level I understand that we’re finding the difference between the expected output and the actual output for each given input, then squaring it to accentuate outliers. But I have a couple of questions:

  1. Is the Quadratic Cost function computed against the output of the entire neural network, or is it computed for each layer or node?

  2. Why do we divide by one half?

Thanks!

Hi @quidmonkey,

You don’t see your expected output until the last layer, so the cost function is computed on the output of the entire network. It might be helpful to work through a really simple example by hand first.

For the second question, I think you mean “Why do we divide by 2?” :slight_smile: This is just a convention: when you take the derivative of the squared term, the 2 cancels it out.
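Spelled out (standard notation, not quoted verbatim from the book): the quadratic cost over n training examples, and its derivative with respect to one output activation for a single example, are

    C(w,b) = \frac{1}{2n} \sum_x \lVert y(x) - a \rVert^2

    \frac{\partial}{\partial a_j} \left( \tfrac{1}{2} \lVert y(x) - a \rVert^2 \right) = a_j - y_j(x)

so the 2 from differentiating the square cancels the 1/2 and leaves a clean gradient.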

Hi guys (first post here).

So along with this lesson, to try something a bit different, I tried to build this off of Keras’s built-in VGG16 model.
I essentially took the built-in VGG16 model and finetuned it:

Here’s a gist of that, though it’s not the exact code I had

Doing it this way, accuracy was actually noticeably worse (between ~0.958 and ~0.965).
Does anybody know why this method yields worse results than the one we’ve built?
One thing I noticed, looking at the source, is that they don’t have any Dropout layers. Could that be the reason for such a difference? Or is there something else I didn’t notice / did incorrectly?

For what it’s worth, I also tried a version with Dropout built on top of the above (starting from the ‘flatten’ layer), and the results were just miserable. I suspect I had another issue when attempting that.
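For reference, here’s a minimal sketch of the kind of thing I tried (not the exact code from my gist; the head sizes and optimizer are guesses, and it uses the newer keras.applications / Keras 2 API names):

    from keras.applications.vgg16 import VGG16
    from keras.layers import Dense, Flatten
    from keras.models import Model

    # Load VGG16 with ImageNet weights, dropping the original 1000-way classifier
    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

    # Freeze the convolutional layers so only the new head gets trained
    for layer in base.layers:
        layer.trainable = False

    # New classifier head for 2 classes (cats vs dogs) - note: no Dropout here
    x = Flatten()(base.output)
    x = Dense(4096, activation='relu')(x)
    x = Dense(4096, activation='relu')(x)
    predictions = Dense(2, activation='softmax')(x)

    model = Model(inputs=base.input, outputs=predictions)
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])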


@bpicolo I am not sure about the reason, but there was a thread on this.

@Manoj

Oh man, thanks so much for the link! That actually talks about both of the things I hit when trying the alternative. Perfect! (Though the particular reasons for the differences still aren’t clear, it’s good to have some validation!)

@z0k Thanks! That makes sense.


Hello guys,
here’s my understanding of the main assignment of lesson 2:
1: use the VGG16 model to train on the training set, using the “7 lines of code”;
2: use the model.pop() method to delete the final layer;
3: use the model.add() method to add a new dense layer that outputs 2 classes (see the sketch below)
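A rough sketch of what steps 2–3 look like in Keras (assuming the course’s Sequential Vgg16 model is already loaded as model; the optimizer and learning rate are placeholders):

    from keras.layers import Dense
    from keras.optimizers import Adam

    model.pop()                                 # 2: drop the final 1000-way ImageNet layer
    for layer in model.layers:
        layer.trainable = False                 # keep the pretrained weights fixed
    model.add(Dense(2, activation='softmax'))   # 3: new 2-class output layer
    model.compile(optimizer=Adam(lr=0.001),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])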

but here’s my confusion:
in lesson2.ipynb, I didn’t see any process of training the VGG model like in lesson 1, such as calling model.fit(). Shouldn’t we train the model first and then modify the last layer?

thanks for answering.

@justinho, we already load the weights from VGG while creating the model itself, hence there’s no need to train it before removing the last layer. You can check out that code in vgg16.py.

Once the last layer is popped off, we add a new layer with 2 outputs, and then we can train the model again on our dataset.


Thanks @Manoj,
Do you mean these lines of code?

If so, in lesson1.ipynb, why do we have to use finetune() and fit() at the very beginning (in the 11th line), and then use them again (in the 20th and 21st lines)? Would we get a better result after doing the finetune twice? In my mind, the first fit() is used to train the model and the second is used to finetune; I don’t know if my idea is right or not, and it confuses me.

How do I make predictions??
I also trained a cat-and-dog recognition model myself, but after training I don’t know how to make predictions. Can anyone share how to do this?

@justinho, in the finetune() method we just pop off the last layer, add a new layer that matches our data, and compile the model. finetune() does not train the model. In fit() we train the model on the given data. We only need to call finetune() once; that is enough.

But we can call fit() a number of times, or pass a number-of-epochs parameter, to train the model over multiple passes; this is where the model tries to find the optimal weights. We keep on training the model (we may need to change the learning rate) until we get good accuracy on predictions.
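Something like this, based on the lesson 1 notebook pattern (a sketch; the method names come from the course’s Vgg16 wrapper, and the learning-rate line is an assumption about how you’d tweak it between calls):

    vgg.finetune(batches)                        # once: pop the last layer, add a 2-way output, compile

    vgg.fit(batches, val_batches, nb_epoch=1)    # first pass over the data
    vgg.model.optimizer.lr = 0.01                # optionally adjust the learning rate between fits
    vgg.fit(batches, val_batches, nb_epoch=3)    # keep training for a few more epochs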

I have started reading https://cs231n.github.io/optimization-1/

Can I skip notes 1 and 2, which deal with SVMs and other approaches? I want to focus on the current course and not go off on a very long tangent and get lost in a forest of things…

Check the dogs_cats_redux.ipynb

The folder structure should be **test/unknown/**to_be_predicted.jpg, i.e. all test images go in a single unknown/ subfolder.

Then call:

test_path = data_path + 'test/'
batches, preds = vgg.test(test_path, batch_size = batch_size*2)
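The returned preds then line up with batches.filenames, so you can pull out per-image predictions like this (a sketch; it assumes column 1 of preds is the ‘dog’ class, as in the redux notebook):

    filenames = batches.filenames                  # e.g. 'unknown/1234.jpg'
    is_dog = preds[:, 1]                           # predicted probability of 'dog'
    ids = [f.split('/')[-1].split('.')[0] for f in filenames]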

So I was watching the lesson video, great as usual, but I expected somebody to ask a question I had in mind about the last part of the video.

Jeremy ran a 3-epoch experiment, and the accuracy after the second epoch was lower than after the first one. The third epoch got the best result, though. Could anyone explain to me why the accuracy decreased after the second epoch?

Let me take a stab at this, in very basic terms (I’m only on lesson 2)…

Each epoch runs through the entire dataset that you’re “fitting” the model to. During this process you’re trying to get the best weights/parameters. This is the optimization part (e.g. stochastic gradient descent); as an example, in lesson 2 Jeremy showed the animated linear model, with the line (the model) getting closer and closer to the dotted line (the actuals).

The accuracy changes because, as the model is being fit through each epoch, adjustments are made in order to try to “improve” the model. Sometimes it gets better and sometimes it gets worse, but then another iteration happens: it learns, readjusts and tries to improve - something to do with partial derivatives :slight_smile:
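To make that concrete, here’s a tiny toy version of SGD fitting a line (my own illustration, not the course code) - notice the loss can wobble from epoch to epoch even though it improves overall:

    import numpy as np

    np.random.seed(0)
    x = np.random.rand(100)
    y = 3 * x + 8 + np.random.randn(100) * 0.1   # "actuals" with a little noise

    a, b, lr = 0.0, 0.0, 0.1
    for epoch in range(30):
        for xi, yi in zip(x, y):
            err = (a * xi + b) - yi               # prediction error for one example
            a -= lr * 2 * err * xi                # gradient step for the slope
            b -= lr * 2 * err                     # gradient step for the intercept
        loss = np.mean((a * x + b - y) ** 2)
        print('epoch %d: loss %.4f' % (epoch + 1, loss))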

If you went through 30 epochs instead of 3, you’d see similar wobbles from epoch to epoch, but the accuracy should improve overall. Check out the wiki below for a much better answer on SGD.

Hope that helps a little.

http://wiki.fast.ai/index.php/Lesson_2_Notes#Gradient_Descent


Thanks Jason, very useful. So adding more epochs does not necessarily mean getting better results at each iteration, but rather better overall accuracy after all the iterations.

I keep getting a memory error message on the following line:

trn_data = get_data(path+'train')

The same line corresponding to validation files works fine. This is the error I get:


MemoryError                               Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 trn_data = get_data(path+'train')

/home/username/courses/deeplearning1/nbs/utils.pyc in get_data(path, target_size)
    135 def get_data(path, target_size=(224,224)):
    136     batches = get_batches(path, shuffle=False, batch_size=1, class_mode=None, target_size=target_size)
--> 137     return np.concatenate([batches.next() for i in range(batches.nb_sample)])
    138
    139

MemoryError:

It looks like numpy is running out of memory when concatenating the batches. I’m working on a computer with 16 GB of RAM and a GTX 970 graphics card. Any ideas?

@Estiui, 16 GB of RAM won’t be sufficient when calling get_data. Even a 60 GB RAM p2 instance was not sufficient for me. You can add a large amount of swap and try.
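Alternatively, you can avoid holding the whole training set in memory at once and feed images from disk in batches instead. A rough sketch using the course’s get_batches helper and a Keras 1 style predict_generator call (treat the batch size and exact arguments as assumptions):

    # Stream images from disk instead of concatenating everything with get_data()
    batches = get_batches(path + 'train', shuffle=False, batch_size=64, class_mode=None)
    features = model.predict_generator(batches, batches.nb_sample)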