Lesson 1 discussion

Lesson 1: Memory Error

I am working on a T2 server on AWS and, while going through the steps in video 1, I got stuck running this line of code:

vgg = Vgg16()


The error it shows is a MemoryError.

Does it have something to do with me not using a P2 server? Please guide me through it. Also, my P2 server request is stuck in processing, so is there any other way to proceed?

Yes, you don’t have sufficient memory…
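For context: the VGG16 weight file alone is roughly 500 MB, so a small T2 instance with ~1 GB of RAM can be exhausted just by loading the model. A quick way to check what you have (a sketch; assumes psutil is installed):

import psutil

# show total vs. available RAM; Vgg16() needs well over what a t2.micro offers
mem = psutil.virtual_memory()
print('total: %.1f GB, available: %.1f GB' % (mem.total / 1e9, mem.available / 1e9))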

I started this course.
Steps done:
Step 1: Installed the latest version of Ubuntu on my desktop
Step 2: Downloaded the files from GitHub
Step 3: Ran the ubuntu setup shell file
Step 4: In Firefox, went to localhost:8888 to enter the Jupyter Notebook
Step 5: Went to Lesson 1, opened the notebook, and then all I did was restart the kernel and run the code.
Step 6: The error states that something was removed from Theano… Need help to resolve this and what needs to be done.

--------------------------------------
'You are tring to use the old GPU back-end. '
'It was removed from Theano. Use device=cuda* now. '
'See https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray) '
--------------------------------------
Now when I run the code it keeps failing to import the utils module. Here is the error I get:

ValueError                                Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 import utils; reload(utils)
      2 from utils import plots

/home/ajay/Downloads/courses-master/deeplearning1/nbs/utils.py in <module>()
     26 from IPython.lib.display import FileLink
     27
---> 28 import theano
     29 from theano import shared, tensor as T
     30 from theano.tensor.nnet import conv2d, nnet

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/__init__.pyc in <module>()
     86
     87
---> 88 from theano.configdefaults import config
     89 from theano.configparser import change_flags
     90

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/configdefaults.py in <module>()
    135             "letters, only lower case even if NVIDIA uses capital letters."),
    136     DeviceParam('cpu', allow_override=False),
--> 137     in_c_key=False)
    138
    139 AddConfigVar(

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/configparser.pyc in AddConfigVar(name, doc, configparam, root, in_c_key)
    285     # This allow to filter wrong value from the user.
    286     if not callable(configparam.default):
--> 287         configparam.get(root, type(root), delete_key=True)
    288     else:
    289         # We do not want to evaluate now the default value

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/configparser.pyc in get(self, cls, type_, delete_key)
    333         else:
    334             val_str = self.default
--> 335         self.set(cls, val_str)
    336         # print "RVAL", self.val
    337         return self.val

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/configparser.pyc in set(self, cls, val)
    344         # print "SETTING PARAM", self.fullname, (cls), val
    345         if self.filter:
--> 346             self.val = self.filter(val)
    347         else:
    348             self.val = val

/home/ajay/anaconda2/lib/python2.7/site-packages/theano/configdefaults.py in filter(val)
    114     elif val.startswith('gpu'):
    115         raise ValueError(
--> 116             'You are tring to use the old GPU back-end. '
    117             'It was removed from Theano. Use device=cuda* now. '
    118             'See https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray)')

ValueError: You are tring to use the old GPU back-end. It was removed from Theano. Use device=cuda* now. See https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray) for more information.
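The message itself contains the fix: the old device=gpu flag (usually set in ~/.theanorc or via THEANO_FLAGS) has to become device=cuda*. A minimal sketch, assuming the new gpuarray back-end (pygpu/libgpuarray) is installed:

import os

# must be set before theano is imported for the first time;
# this replaces the old 'device=gpu0' flag that triggers the ValueError above
os.environ['THEANO_FLAGS'] = 'device=cuda0,floatX=float32'
import theano   # should now initialize the new gpuarray back-end

Equivalently, edit the [global] section of ~/.theanorc so that it reads device = cuda0 instead of device = gpu0.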

Hi guys, there’s a typo in this wiki: http://wiki.fast.ai/index.php/Lesson_1

make a copy of the lesson 1 notebook and use the new copy to draw in the new Dogs Vs. Cats data (if you copy the notebook outside of the course folder, don’t forget the vgg26.py files etc)

I believe here, vgg26.py should really be vgg16.py. I don’t have the privilege to modify it, so I have to post it here.

Spent many hours over the past day or two struggling to get a submission into shape for Kaggle. I’m starting to feel like my Python skills aren’t sufficient for this course. The people on this forum seem to breeze through things that take me hours and hours to figure out. For now I will continue, but I was tempted many times to give up.

My final submission was generated thanks to a lot of helpful code from @Matthew. My final score on kaggle was 1.08402, which seems to be pretty bad in the rankings.

I’m going to go back through the code I wrote up now, using this as a model for how to try to improve. Feeling very drained / dispirited. Maybe if I take a bit of a break from things, then I’ll return and try the same workflow / codeflow on a different data set.

Still have very little idea of what’s going on in terms of how this all works / how finetuning reshapes the model etc. Anyway, just posting here for posterity’s sake. Will keep chugging along.


I wanted to post some resources I found useful while struggling through putting together the dogs vs cats .csv file. Perhaps they’ll be useful for someone else.

Lesson materials (obviously)

Help with CSV export (a sketch of this step follows after this list)

Figuring out the test() function

  • You need to put the test images in a /unknown subfolder. Thanks to the two forum posts that helped me figure this out.

Getting help / general tips

Code backend
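For reference, the CSV export itself boils down to a few lines (a sketch: preds and filenames are assumed to come from vgg.test() on the test set, where filenames look like 'unknown/1234.jpg', and the clipping is the usual log-loss trick):

import numpy as np

# build the id column from the filenames ('unknown/' is 8 characters long)
ids = np.array([int(f[8:f.find('.')]) for f in filenames])
labels = preds[:, 1].clip(0.05, 0.95)   # clip extreme probabilities for log loss
subm = np.stack([ids, labels], axis=1)
np.savetxt('submission.csv', subm, fmt='%d,%.5f',
           header='id,label', comments='')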

So I have my own DL server with an 8 GB GeForce GTX 1080 installed. While running the Lesson 1 training set (the whole thing), I got an out-of-memory exception. Is it common to run out of memory at 8 GB with a training set of that size? Should I be worried about using a bigger dataset in the future?

I had figured that initially I would be okay with just 1 GPU

I think you can work with a bigger dataset, but you will need to use a smaller batch size to avoid memory issues. I have the same Nvidia board, and even on a modest dataset, a batch size of 128 or 256 gives me the out-of-memory error, so I have to stay around a batch size of 32 or 64.
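Concretely, the only change is the batch_size argument when creating the batches (a sketch using the vgg object and path from the lesson notebook; the numbers are illustrative):

batch_size = 32   # 128/256 can overflow 8 GB of GPU memory on this model
batches = vgg.get_batches(path + 'train', batch_size=batch_size)
val_batches = vgg.get_batches(path + 'valid', batch_size=batch_size * 2)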

A smaller batch size will make training take longer, I assume?

So to be clear, finetuning pretty much takes the result from batches = vgg.get_batches(path+'train', batch_size=batch_size), where it says “Found 22500 images belonging to 2 classes.”, and then it filters my dataset into those 2 classes?

Also, what exactly does vgg.fit do? Why do we need the val_batches? Is that what produces the accuracy percentage?
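For what it’s worth, finetune doesn’t filter your data; get_batches just reads whatever class folders exist, and finetune adapts the network’s output layer to that number of classes. Roughly what vgg.finetune(batches) does inside the course’s vgg16.py (a paraphrased sketch, not the verbatim source):

# paraphrased from the course's vgg16.py finetune()/ft() (sketch, not verbatim)
model.pop()                          # drop the 1000-way ImageNet softmax layer
for layer in model.layers:
    layer.trainable = False          # freeze all the pretrained layers
model.add(Dense(batches.nb_class, activation='softmax'))   # new 2-way output
model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])

vgg.fit then trains that new layer on the training batches; val_batches is never trained on, it is only used to report validation loss and accuracy at the end of each epoch, which is where the accuracy percentage comes from.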

Hello,

Could someone help me? I tried to reproduce the results of Lesson 1. The only change is that I use TensorFlow with batch_size = 8. The accuracy I can get is only around 0.91. Is that normal?

Thanks.

Hi there,

I’m trying to do the assignment by creating and training the network myself, just to check my understanding. However, my network does not converge (it’s stuck at 0.5 accuracy forever). Would anyone be able to give any hints as to why?

I’ve tested it on MNIST and, aside from being overkill, it works fine. I’ve tried putting the pixel data in [0, 1] and in [-1, 1]; it doesn’t make any difference.

Any help would be greatly appreciated!

Model is below. I’m running it on my own machine, with tensorflow-gpu backend and tf dimension ordering.

import numpy as np
from keras.preprocessing import image
from keras.models import Sequential
from keras.layers import (ZeroPadding2D, Conv2D, MaxPooling2D,
                          Flatten, Dense, Dropout)

BATCH_SIZE = 32 # 8
IMG_SIZE = 56 # 224
N_CHANNELS = 3
N_OUTPUTS = 2
IMG_SHAPE = (IMG_SIZE, IMG_SIZE, N_CHANNELS)

train_datagen = image.ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(
    'data\\train',
    target_size=(IMG_SIZE, IMG_SIZE), class_mode='categorical',
    batch_size=BATCH_SIZE, shuffle=True)
valid_datagen = image.ImageDataGenerator()
valid_generator = valid_datagen.flow_from_directory(
    'data\\valid',
    target_size=(IMG_SIZE, IMG_SIZE), class_mode='categorical',
    batch_size=BATCH_SIZE, shuffle=True)

# VGG-style blocks: stacked 3x3 convs, each block closed by 2x2 max-pooling
model = Sequential()
model.add(ZeroPadding2D(1, input_shape=IMG_SHAPE))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D(1))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))

model.add(ZeroPadding2D(1))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D(1))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))

model.add(ZeroPadding2D(1))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D(1))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D(1))
model.add(Conv2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D(2, 2))

model.add(Flatten())

# VGG-style fully connected head with dropout
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(N_OUTPUTS, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# float() so the division doesn't truncate under Python 2
model.fit_generator(
    train_generator,
    steps_per_epoch=int(np.ceil(train_generator.samples / float(BATCH_SIZE))),
    epochs=40,
    validation_data=valid_generator,
    validation_steps=int(np.ceil(valid_generator.samples / float(BATCH_SIZE))))

Hi @why_no_https,
The model should work, but there might be a problem with the learning rate or with setting shuffle=True on the validation set.
Try using a smaller rate (0.001 or smaller) and setting shuffle=False on the validation set.
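Sketched against the model you posted, those two changes would look like this (Adam comes from keras.optimizers):

from keras.optimizers import Adam

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(lr=1e-4),    # explicit, smaller learning rate
              metrics=['accuracy'])

valid_generator = valid_datagen.flow_from_directory(
    'data\\valid',
    target_size=(IMG_SIZE, IMG_SIZE), class_mode='categorical',
    batch_size=BATCH_SIZE, shuffle=False)   # don't shuffle the validation set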

If that does not work, upload a copy of your notebook as a GitHub gist and I will try to figure out the problem.


I’m in lesson1.ipynb.
I’m at the cell that prints ‘Downloading data from http://files.fast.ai/models/vgg16.h5’,
and I was wondering how to modify vgg16.py so that it grabs vgg16.h5 (and future files like it) from my local disk, because I already have them there.

I figured this was a useful problem to know how to solve (eliminating redundancy and all), but after many hours of changing code, searching the web, and trying to put the file where I thought the downloaded file would end up (like in the root folder env of the notebook), no luck.

It seems to grab it through a script in Keras, but I can’t figure out what exactly is calling it to do this.
In vgg16.py I commented out line 46 and merged it with line 57,
then I tried to get line 140 to look only at the local directory within model.load_weights()
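For what it’s worth, the download happens inside keras.utils.data_utils.get_file(), which vgg16.py calls with cache_subdir='models'; get_file() checks ~/.keras/models before downloading anything. So pre-seeding that cache with your local copy should skip the download entirely (a sketch, assuming the default Keras cache location; the source path is hypothetical):

import os, shutil

src = '/path/to/local/vgg16.h5'              # wherever your copy already lives
cache = os.path.expanduser('~/.keras/models')
if not os.path.isdir(cache):
    os.makedirs(cache)
shutil.copy(src, cache)   # get_file() will now find it and skip the download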

Hi @irshaduetian thanks so much for responding!

I tried a smaller and a larger learning rate, as well as turning off shuffle, augmenting the data, removing dropout (to deliberately try to overfit and at least get some convergence), and a few other things.

My hypothesis is that either there’s just way too little data, or my weights are initialized in a really bad way. But I really don’t know!

Here is my Jupyter notebook: https://www.dropbox.com/sh/u3p21a8hhi7kwr8/AADXLRkjedxe-bsowbXZvvjXa?dl=0

Note that the data folders don’t contain any of the images, because that would be too much to download. Just unzip the cats-and-dogs archive and things should work. I use the TensorFlow backend and dimension ordering, but I can always modify it to use Theano.

I really appreciate your help, thank you again!

Your assumption is right: the VGG model is too big to learn good features from the small amount of data (cats and dogs) in a few epochs. Unfortunately, I don’t have an environment to run your notebook, otherwise I would have tested it myself.
My recommendation would be to run this model on the MNIST dataset and see if you can get good accuracy out of it in around 30 epochs.

You can consult mnist.ipynb in the fast.ai course repo.
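A minimal way to run that sanity check (a sketch: it assumes the model defined earlier, with input_shape changed to (28, 28, 1) and the final Dense changed to 10 classes; some conv layers may need removing for the smaller inputs):

import numpy as np
from keras.datasets import mnist
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[..., np.newaxis].astype('float32') / 255   # (60000, 28, 28, 1)
x_test = x_test[..., np.newaxis].astype('float32') / 255
model.fit(x_train, to_categorical(y_train, 10),
          validation_data=(x_test, to_categorical(y_test, 10)),
          batch_size=64, epochs=3)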

Let me know if you need further help

@irshaduetian I tried it on MNIST (I had to remove some Conv2D layers since the inputs are smaller) and was able to get high accuracy after less than one epoch! However, I’m worried there’s some problem with my network design or with how I’m loading data, since I get exactly the same results for every epoch. Each epoch ends with identical error and accuracy metrics; I would have thought they would at least vary randomly a bit!

I’m currently downloading all of ImageNet to try training it myself, although it seems like that might take a few months at the moment :frowning:

Hello Haroun,

I’m looking to go through the lessons without the P2 instance because I have a machine that matches your specs.
Would you be willing to help me set up the configuration for the Lesson 1 Jupyter notebook?
I’m getting really confused, but I really wanna learn machine learning!!
Anything would help. Thank you, and good luck in your future endeavors!

If you really wanna learn Machine Learning then you are in the wrong thread :stuck_out_tongue:
Here is the new course about Machine Learning: Another treat! Early access to Intro To Machine Learning videos
