Lesson 5 discussion


I believe I found a mistake/typo in the lesson 5 notebook, in the multi-size CNN.
When constructing the ‘graph’ model (multi CNN), its Input layer shape is written as

graph_in = Input ((vocab_size, 50)) where vocab_size is 5000 (when it needs to be 500).

This graph model is then used in the sequential model after the embedding layer, which has an output of (500, 50):

So to fix that I would change the graph model’s Input layer to be:
graph_in = Input ((seq_len, 50)), which is the output size of the embedding layer.
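A quick way to see why seq_len is the right first dimension: the embedding layer turns each review of 500 word indices into a (500, 50) matrix, so that is the shape the graph model must accept. A minimal numpy sketch (shapes only; the variable names are just illustrative):

```python
import numpy as np

vocab_size, seq_len, n_fac = 5000, 500, 50

# the embedding layer maps each of the seq_len word indices in a review
# to an n_fac-dimensional vector, so one review becomes a (500, 50) matrix
review = np.random.randint(0, vocab_size, seq_len)
emb_matrix = np.random.rand(vocab_size, n_fac)
emb_out = emb_matrix[review]

print(emb_out.shape)  # (500, 50) -- matches Input((seq_len, 50)), not (vocab_size, 50)
```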

(Bobby Lindsey) #43

I’m having some trouble instantiating the Vgg16BN() class and am getting the error:

Unable to open file (File signature not found)

From what I’ve read, the vgg16_bn.h5 file might be corrupt or not in the correct format, and I’m currently downloading that file from http://files.fast.ai/models/.

Does anyone know of a resolution for this? I’d really like to use batch normalization with the vgg16 model.

(arnaud schenk) #44

I’m really proud of this result on the IMDB sentiment analysis challenge:

The model I used to achieve this looks like this:

With model_8 just being the graph model that Ben Bowles showed in his blog post.

(John Lundberg) #45

I wondered this as well and found that @jeremy has answered this in another thread. Why do we divide the embedding by 3?

(Ravi Teja Gutta) #46

Thank you, @johnlu



I had the same error when trying to run this line from the lesson 3 notebook:

After some searching, I (re)discovered that vgg16_bn.h5 was stored in my ~/.keras/models directory. An ls -lh there revealed that it was only 63K in size, whereas the oft-used vgg16.h5 (i.e. sans bn) is 528M:

ubuntu@ip-10-0-0-9:~/.keras/models$ ls -lh
total 528M
-rw-rw-r-- 1 ubuntu ubuntu 35K Jun 18 18:26 imagenet_class_index.json
-rw-rw-r-- 1 ubuntu ubuntu 63K Jun 18 18:04 vgg16_bn.h5
-rw-rw-r-- 1 ubuntu ubuntu 528M Jun 18 18:26 vgg16.h5

I thus fetched the file again with wget using the link you provided, and that line from lesson 3 now runs without error for me.


(jyo) #48

Does anyone know why no batchnorm is used in any of the lesson 5 notebook’s models? Does it help to use batchnorm?

(Bobby Lindsey) #49

Awesome observation! That fixed my error as well. Appreciate it, jp_beaudry.


Hey everyone, I’m trying to re-implement Dogs and Cats using the functional API, but with little success…

I set up batches as earlier in the course, i.e.

batches = get_batches(train_path, batch_size=batch_size)
val_batches = get_batches(valid_path, batch_size=batch_size*2)

Then built a model as per the functional API, and successfully loaded up the weights from vgg16.h5. But now when I try to run:

model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=1, validation_data=val_batches, nb_val_samples=val_batches.nb_sample)

I get the error:

Exception: Error when checking model input: expected input_4 to have shape (None, 3, 244, 244) but got array with shape (64, 3, 224, 224)

(As if the generator isn’t recognising the batch dimension?)

Any help on this would be massively appreciated. (The capacity to identify cats from dogs has become rather central to my sense of self worth over the past month…)
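The batch dimension is fine: Keras reports the batch axis as None and ignores it when checking shapes. The real mismatch is 244 vs 224, which suggests a typo in the Input layer’s declared shape (it should be (3, 224, 224) to match what get_batches yields). A rough sketch of the check, just to illustrate (check_input_shape is a made-up helper, not Keras code):

```python
import numpy as np

def check_input_shape(expected, batch):
    """Loosely mimic Keras's input check: the leading batch axis
    (reported as None) is ignored; the remaining dims must match."""
    return tuple(batch.shape[1:]) == tuple(expected[1:])

# a model declared with Input((3, 244, 244)) expects (None, 3, 244, 244),
# while get_batches yields (64, 3, 224, 224) arrays
batch = np.zeros((64, 3, 224, 224))
print(check_input_shape((None, 3, 244, 244), batch))  # False: 244 != 224
print(check_input_shape((None, 3, 224, 224), batch))  # True once Input is fixed
```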

(Matthieu Di Mercurio) #51

@idano did you get an answer on this one? I would have expected this to throw an error, but it looks like it works.

(tittaya) #53

How do I deal with this error?

(nok) #54

I have the same issue. I am using Python 3.4; when I install Theano==0.9 I get an error like this:

Exception: Compilation failed (return status=1): g++.exe: error: Chan\theano\compiledir_Windows-10-10.0.15063-Intel64_Fa. g++.exe: error: Chan\theano\compiledir_Windows-10-10.0.15063-Intel64_Family_6_Model_78_Stepping_3_GenuineIntel-3.4.5-6. lazylinker_ext\mod.cpp: No such file or directory

(Charles Daly) #56

In the sentiment example, why is mapping all rare words to a single index (5000) better than just dropping those rare words altogether?
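One way to see the trade-off (my reading, not an official answer): clamping rare words to a single index keeps each sequence’s length and word positions intact, and gives the model a single “rare word” embedding to learn, whereas dropping them shifts every following word. A toy illustration with made-up indices:

```python
vocab_size = 5000
sent = [12, 7, 9341, 25]   # 9341 is a rare word outside the top 5000

clamped = [min(i, vocab_size - 1) for i in sent]  # positions preserved
dropped = [i for i in sent if i < vocab_size]     # sequence shortened and shifted

print(clamped)  # [12, 7, 4999, 25]
print(dropped)  # [12, 7, 25]
```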

(Marco) #57

Hi all,

Can someone please explain why, in the create_emb function, we divide the emb matrix by 3 at the end?




This is explained in this post:

(An R) #59

Lesson 5 is an amazing introduction to sentiment analysis. I tried to game the system by predicting the sentiment of a sarcastic review. Of course it failed (in fact it got a better score than a truly honest positive review). Has anyone tried to train against sarcasm? Is it even possible, or is that A.I. 2.0?

phrase = np.array([], dtype="int64")

phrase = np.append(phrase, [1])  # np.append returns a new array, so reassign; 1 is the start-of-sequence marker

textphrase = 'yeah sure, you should trust the reviews, by all means, this is an amazing movie, come and enjoy :/ NOOOOT'

for o in textphrase.split(' '):
    if o in ids:
        phrase = np.append(phrase, ids[o])

padded_phrase = sequence.pad_sequences([phrase], maxlen=seq_len, value=0)

conv1.predict(padded_phrase)

output ---> array([[ 0.942]], dtype=float32)

(Nicolas Philippe) #60

I completely agree. Great catch, IMO.

The Embedding layer is based on an output tensor of size

(None, 500, 50)

as you pointed out. This fits properly with the normalized sequence length (500) and the dimension of the embedding (50) attached to each word.

The size of the vocabulary is only relevant to the Embedding layer, to define the range of integers used to represent the inputs fed to that layer (i.e. the result of the word2idx).

The input size of the convolutional layers, however, should match the length of each sentence. The filters (3, 4, 5) are applied to the tensor of latent factors (500x50).

While the resulting matrices are tenfold larger in the class’s notebook than they are supposed to be, the number of resulting parameters does not change: it’s primarily dependent on the size of the filters.

Bottom line is that it still works, but it’s probably not as efficient (takes longer to train per epoch) and the results are probably slightly more “noise”-prone (therefore taking more epochs to reach the same accuracy).
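To make the parameter-count point concrete: a Convolution1D layer’s weights depend only on the filter length, the embedding dimension, and the number of filters, never on the sequence length, so 500 vs. 5000 doesn’t change them. A quick back-of-the-envelope check (64 filters is an assumption; the notebook may use a different number):

```python
def conv1d_params(filter_len, n_fac, n_filters):
    # weights (filter_len x n_fac per filter) plus one bias per filter;
    # note that seq_len never appears in the count
    return n_filters * (filter_len * n_fac + 1)

for fl in (3, 4, 5):
    print(fl, conv1d_params(fl, n_fac=50, n_filters=64))
# 3 -> 9664, 4 -> 12864, 5 -> 16064
```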

@Jeremy / @Rachel: if you agree with @idano’s assessment, what’s the best way to correct this? Pull request?



I believe you can solve this by changing the line
if word and ....

to this:

if word in words and re.match(r"^[a-zA-Z0-9\-]*$", word):
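For what it’s worth, here’s a toy illustration of what that condition filters (the vocab below is made up; the regex keeps only plain alphanumeric/hyphen tokens that are also in the embedding vocabulary):

```python
import re

words = {"movie", "great", "co-star", "plot"}  # stand-in for the GloVe vocab

def keep(word):
    # keep only vocabulary words made of ASCII letters, digits, or hyphens
    return word in words and bool(re.match(r"^[a-zA-Z0-9\-]*$", word))

print([w for w in ("movie", "co-star", "café", "plots") if keep(w)])
# -> ['movie', 'co-star']  ("café" fails the regex, "plots" isn't in the vocab)
```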

(Phong) #62

Does anyone get a different result from the lecture’s notebook when running the model yourself? I often get lower accuracy compared with the lecture’s result. For example, below I trained the “Single conv layer with max pooling” model as in the notebook, but the result is very low even though I followed the same steps.

(Phong) #63

I did the same and got a higher accuracy score. Thanks for your advice.