Dogs vs Cats - lessons learned - share your experiences

+1 would really love to see your code for doing this.

Does this allow validation data to be re-randomized when you run different experiments?

In the course I show an end to end process for MNIST that includes both ensembling and pseudo-labeling using “dark knowledge”.

Sorry @jeremy, I ran through a number of the lectures again looking for the section you’re talking about. The closest I could find was the start of lecture 6, where you talk about ensembling and pseudo labeling, and I checked the MNIST code, but it doesn’t contain what I’m referring to.

My understanding after watching Hinton’s talk on dark knowledge is that what he refers to as ‘dark knowledge’ is the vector that results from softening a softmax layer’s outputs with a temperature, so that the relationships between classes become much clearer. The vectors he shows at around 11:35 in the lecture are the idea I’m driving at. By training a new net on those soft predictions plus a subset of hard targets, he’s able to get some very interesting results.

I think there’s a chance we’re talking about different things unless I’m misunderstanding.
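For concreteness, the temperature trick is just dividing the logits by a temperature T before the softmax; with T > 1 the distribution softens and the relative similarities between the non-predicted classes (the “dark knowledge”) become visible. A minimal NumPy sketch with made-up logits:

import numpy as np

def softmax_with_temperature(logits, T=1.0):
    # divide the logits by T; T > 1 softens the distribution and exposes
    # the relative similarity of the non-predicted classes
    z = np.asarray(logits, dtype=np.float64) / T
    z = z - z.max()          # for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([10.0, 2.0, 0.5])            # made-up logits for three classes
print(softmax_with_temperature(logits, T=1))   # essentially one-hot
print(softmax_with_temperature(logits, T=5))   # softened: "dark knowledge" visible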

I spent about two weeks on this competition and learned a lot. My final score is 0.05051, placing 67th, close to the top 5%. The tools I used are dlib, Keras and MXNet.

What I learned from this competition is:

1 : Ensembling may make your results worse
2 : Remember to record the parameters you used; a spreadsheet (Excel-like editor) is a nice tool for this
3 : Feeding pseudo labels into the mini-batch in a naive way does not work (I should have finished lesson 4 before rushing it, even though I was running out of time); see the sketch after this list
4 : Leveraging a pretrained model is a much easier way to get good results
5 : How to use dlib, Keras and MXNet
6 : Read the posts on the forums; they may give you useful info
7 : The fast.ai course is awesome, I should have watched it earlier (just finished lesson 4)
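For point 3, the usual suggestion (which I did not get working in time) is to mix a fixed fraction of pseudo-labeled test images into every mini-batch rather than simply concatenating the datasets. A rough sketch of that batch mixing, assuming the labeled arrays and the model’s soft predictions on the test set are already in memory:

import numpy as np

def mixed_batches(x_lab, y_lab, x_pseudo, y_pseudo, batch_size=64, pseudo_frac=0.25):
    # yield batches where roughly pseudo_frac of the examples come from
    # pseudo-labeled data and the rest carry real labels
    n_pseudo = int(batch_size * pseudo_frac)
    n_lab = batch_size - n_pseudo
    while True:
        lab_idx = np.random.choice(len(x_lab), n_lab, replace=False)
        pse_idx = np.random.choice(len(x_pseudo), n_pseudo, replace=False)
        yield (np.concatenate([x_lab[lab_idx], x_pseudo[pse_idx]]),
               np.concatenate([y_lab[lab_idx], y_pseudo[pse_idx]]))

A generator like this can be handed straight to Keras’ fit_generator.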

-------------Approaches that worked--------------------

a : dlib

1 : Split the data into 5 folds with augmentation (5x). I did not figure out
which augmentation tricks work best; however, vertical flipping looks like a bad choice
2 : Extract features with dlib’s ResNet-34 on the training data and test data, and store them
3 : Predict the labels with different combinations of the k-fold models.
4 : Submit; the score is 0.06266
5 : Clip the values to 0.02, 0.98 (see the snippet after this list); this improves the score to 0.05688
6 : Validating with random crops might improve accuracy, but I had no time to try it
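The clipping in step 5 is just bounding the predicted probabilities away from 0 and 1, so that a single confident mistake does not blow up the Kaggle log loss. In NumPy it is a one-liner (the array here is illustrative):

import numpy as np

preds = np.array([0.999, 0.001, 0.87, 0.42])   # ensembled isdog probabilities (made up)
clipped = np.clip(preds, 0.02, 0.98)           # never more confident than 0.98 / 0.02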

b : mxnet

I re-entered this competition when there were only 5 or 6 days left, so I was in a hurry; the solutions I tried with
MXNet and Keras are less sophisticated than the dlib one.

1 : Fine-tune ResNet-34 through ResNet-200 on the dataset with augmentation, no k-fold cross validation;
I did not figure out the best way to augment the data.

2 : Ensemble all of the models’ results, including the dlib results; this improved my
score to 0.05051 (a sketch of the averaging is below)
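The ensembling itself was nothing fancy, just averaging each model’s predicted probabilities. Roughly (the file names are only examples of how the per-model predictions might be saved):

import numpy as np

# each .npy file would hold one model's isdog probabilities for the test set
files = ["dlib_resnet34.npy", "keras_resnet50.npy", "mxnet_resnet152.npy"]
all_preds = [np.load(f) for f in files]
ensembled = np.mean(all_preds, axis=0)       # simple unweighted average
ensembled = np.clip(ensembled, 0.02, 0.98)   # then clip as before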

-------------Approaches that did not work--------------------

1 : I trained different models with dlib and ensembled them, but this gave me worse results. The steps were:

a : Extract augmented features with ResNet-34 and store them
b : Train k-fold models on the extracted features with different "top models"
c : Ensemble the results
d : Clip the values to 0.02, 0.98
e : Get worse results :frowning:

--------------My views on the libraries (biased)------------------

1 : keras

pros : easiest to use, lots of nice examples out there
cons : hard to extend (I want to change the way data is fed into mini-batches); maybe it is
because I am not an expert in Python yet. Learning a new language is very easy, but becoming an expert in it is another story.

2 : mxnet

pros : more pretrained models
cons : Documentation and examples are not that good; some (many) examples are outdated. I have not yet figured out the correct way to find the number of layers or to freeze the learning rate of the base layers (I implemented both but am not sure they are correct).

3 : dlib

pros : can work as a zero-dependency library; easy to port to different platforms; a library designed to solve real-world problems and app development rather than prototyping or academic use. Nice documentation, examples, and high-quality source code (this is what modern C++ :slight_smile: looks like).

cons : Only one pretrained model (ResNet-34), a small community, and it lacks lots of features from the deep learning world. Since it is new, we can expect more features to be added in the future.

ps : I may be biased towards dlib because it is written in my favorite language, C++


Thank you for this idea @Even - works like a charm. I think it even works without follow_links = True in the generator (unless it is a default value as I didn’t have to set it).

Good point - I think of ‘dark knowledge’ as referring in general to the idea of training a neural net using the full set of predictions as the target, rather than just the predicted class. That’s what we do when we do pseudo-labeling in the lessons.

I’m not aware of the shifting the layer’s outputs via a temperature as being important - although I’m not sure I’ve seen a direct comparison.

Hey, I also created some code to automate the creation of test/sample folders:

import os
import random
import shutil

def organize_folder(folder):
    """Move files named like cat.123.jpg into per-class subfolders (cat/, dog/, ...)."""
    _, _, filenames = next(os.walk(folder))
    unique_classes = {filename.split(".")[0] for filename in filenames}
    for _class in unique_classes:
        path = os.path.join(folder, _class)
        if not os.path.exists(path):
            os.makedirs(path)
        for filename in filenames:
            if filename.startswith(_class):
                shutil.move(os.path.join(folder, filename), os.path.join(path, filename))        
    
def create_sample_folder(_from, to, percentage=0.1, move=True):
    """Move (or copy) a random `percentage` of each class subfolder from `_from` into `to`."""
    if not os.path.exists(to):
        os.makedirs(to)
    _, folders, _ = next(os.walk(_from))
    for folder in folders:
        if not os.path.exists(os.path.join(to, folder)):
            os.makedirs(os.path.join(to, folder))
        _, _, files = next(os.walk(os.path.join(_from, folder)))
        sample = random.sample(files, int(len(files) * percentage))
        for filename in sample:
            if move:
                shutil.move(os.path.join(_from, folder, filename), os.path.join(to, folder, filename))
            else:
                shutil.copyfile(os.path.join(_from, folder, filename), os.path.join(to, folder, filename))

I used organize_folder to create the two class folders for the dogs and cats competition; I haven’t found a use for it in other competitions yet.

create_sample_folder is what I used to create the sample/test/validation folders; it has served me pretty well so far.
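For anyone trying it, a typical call sequence for dogs vs cats would look something like this (the paths are just examples):

# split the flat Kaggle train folder into train/cat and train/dog
organize_folder("data/dogscats/train")

# move 10% of each class into a validation folder
create_sample_folder("data/dogscats/train", "data/dogscats/valid", percentage=0.1, move=True)

# copy a small sample for quick experiments, leaving the originals in place
create_sample_folder("data/dogscats/train", "data/dogscats/sample/train", percentage=0.01, move=False)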


Wow, thanks tham for the write-up. It’s a great result; thank you for sharing your workflow.

I’ve started experimenting with ResNet50 (Keras’ built-in model). Can you talk about the optimizer you used, and what kind of learning rate, decay, and momentum you tried?

Thanks,

Jerry

Sorry for my late reply; recently I have been spending my time on the fast.ai videos and lectures.

Yes, I did not have much time to tune the parameters; almost every Keras model uses the same settings.
Because I was running out of time, I trained on the whole training data set and did not split it into training and validation sets.

optimizer = adam
learning rate = 0.0001
momentum = default value

My top model looks like:

top_model = Dense(128, activation='relu')(top_model)
top_model = Dropout(0.5)(top_model)
top_model = Dense(256, activation='relu')(top_model)
top_model = Dense(classes, activation='softmax')(top_model)

However, Keras did not improve my results; MXNet did.

@tham thanks for replying. ResNet50 doesn’t have the 2 dense layers like VGG does; are you referring to VGG in this example?

Thanks,

Jerry

It is ResNet50; what I did is slap the ResNet and dense layers together.

from keras.applications.resnet50 import ResNet50
from keras.layers import Input
from keras.models import Model

base_model = ResNet50(include_top=False, weights='imagenet', input_tensor=Input(shape=(im_dim, im_dim, 3)))
top_model = create_top_model(base_model, top_model_index=2, classes=2)
# Slap the model and FC block together and compile
model = Model(input=base_model.input, output=top_model)
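create_top_model isn’t shown in the thread; a minimal sketch of what a helper along these lines might look like, assuming it just flattens the ResNet output and adds the dense stack from the earlier post (the role of the top_model_index argument isn’t shown, so it is ignored here):

from keras.layers import Flatten, Dense, Dropout

def create_top_model(base_model, top_model_index=2, classes=2):
    # hypothetical reconstruction of the helper, not the author's exact code
    x = Flatten()(base_model.output)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(256, activation='relu')(x)
    return Dense(classes, activation='softmax')(x)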

Looks like the final answer wasn’t that complicated. Just throw a bunch of pretrained networks at the problem + ensembling.


So much training and training time, though… I wonder, is this really applicable in real-life applications?

It just seems like a grand ensemble of all possibilities, which wouldn’t be useful or practical for real-world applications.

You can try to create a single neural network that consolidates the information from your ensemble into a single simpler model.

https://arxiv.org/abs/1503.02531
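That is the distillation idea from the linked paper: train a small “student” network on the ensemble’s temperature-softened predictions, usually mixed with the true labels. A rough NumPy sketch of the combined loss (the temperature and weighting are just illustrative values):

import numpy as np

def softened(logits, T):
    # softmax of logits / T; larger T gives a softer distribution
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_targets, T=3.0, alpha=0.1):
    # weighted sum of cross-entropy on the true labels and cross-entropy on the
    # teacher's softened predictions, scaled by T^2 as suggested in the paper
    eps = 1e-12
    hard_ce = -np.sum(hard_targets * np.log(softened(student_logits, 1.0) + eps), axis=-1)
    soft_ce = -np.sum(softened(teacher_logits, T) * np.log(softened(student_logits, T) + eps), axis=-1)
    return alpha * hard_ce + (1 - alpha) * (T ** 2) * soft_ce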


Hi, I am trying to run the ensemble notebook but I am running into a problem. When building the ensemble, on the first pass, setting the weights at the top of train_dense_layers fails:

def train_dense_layers(i, model):
    conv_model, fc_layers, last_conv_idx = get_conv_model(model)
    conv_shape = conv_model.output_shape[1:]
    fc_model = Sequential(get_fc_layers(0.5, conv_shape))
    for l1, l2 in zip(fc_model.layers, fc_layers):
        weights = l2.get_weights()
        l1.set_weights(weights)  # <------ Returns the following error

The error:

ValueError: You called ‘set_weights(weights)’ on layer "batchnormalization_xx" with a weight list of length 0, but the layer was expecting 4 weights. Provided weights: []…

Every time I retry the cell, the xx keeps increasing, and when I look at the model summary after the cell, the xx there is always xx - 1.

Not sure I explained that very well.

So far I have discovered that dropout is causing the problem in this weight setting. The layers’ weights match up until layer 4 is reached, where we try to set the batch normalization weights with dropout weights, which of course don’t exist. Now I have to discover how to solve this.
If the layers have to match, then the only way I can see to get them to match is to add dropout to the l1 layers or remove dropout from the l2 layers. I tried the latter by commenting lines out, which didn’t seem to work.

I figured it out:

get_fc_layers uses batch normalisation, so the calls to vgg should use vgg16BN, or else remove the batch norm from get_fc_layers.
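In other words, the two layer lists being zipped together have to line up type-for-type; otherwise you end up copying Dropout’s (empty) weight list into a BatchNormalization layer. A quick guard while debugging, copying weights only between layers of matching type (a rough sketch, not the notebook’s code):

def copy_matching_weights(dst_layers, src_layers):
    # copy weights only where the layer types line up, and report anything
    # skipped so mismatches (e.g. BatchNormalization vs Dropout) are visible
    for dst, src in zip(dst_layers, src_layers):
        if type(dst) is type(src):
            dst.set_weights(src.get_weights())
        else:
            print("skipping: %s <- %s" % (dst.name, src.name))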

Hope this helps if you have had a similar issue.


My experience was that the ensemble results don’t match the position on the leaderboard. It is overfitting.

I took the MNIST ensemble and merged it in to implement the dogscats-ensemble. The result: I moved up the leaderboard 150 places with respect to the original dogscats-ensemble (0.06668).

Changing the notebooks is quite challenging without an XML-type editor.

I want to change my latest version to include Jeremy’s ResNet50, but I am having problems fine-tuning the model, i.e. getting a dense 2-way output. I can remove (pop) the end layers or create the model with include_top=False, but if I try to add batch norm as in the ensemble’s three layers, I end up passing batch norm parameters where they are not expected. Not sure what I am doing wrong.

Joining the party a little late. The Dogs vs Cats competition is closed; however, I went ahead and submitted my file just to see where I was.

I was getting a validation accuracy of 0.9170 after 3 epochs, following the notebook step by step. However, my log loss was pretty terrible. Initially I clipped to 0.025/0.975 and got a log loss of 0.33; I then changed to 0.05/0.95 as in the notebook and it improved slightly.
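One way to choose the clipping value, assuming a held-out validation set with known labels, is simply to check the log loss for a few candidates (the arrays below are illustrative):

import numpy as np
from sklearn.metrics import log_loss

val_labels = np.array([1, 0, 1, 1, 0])                 # ground-truth isdog labels
val_preds = np.array([0.99, 0.03, 0.80, 0.55, 0.20])   # predicted isdog probabilities

for clip in (0.025, 0.05, 0.10):
    clipped = np.clip(val_preds, clip, 1 - clip)
    print(clip, log_loss(val_labels, clipped))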

Dogs and Cats predictions before clipping

isdog predictions after clipping 0.05/0.95

Any pointers would be hugely appreciated…

Thanks

There are only 36 images in isdog where the probability is between 0.2 and 0.8, and 14 images where isdog is between 0.4 and 0.6.
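Counts like these are easy to get straight from the prediction array; a small helper, assuming isdog is a NumPy array of probabilities:

import numpy as np

def count_between(preds, lo, hi):
    # number of predictions strictly between lo and hi
    preds = np.asarray(preds)
    return int(((preds > lo) & (preds < hi)).sum())

# e.g. count_between(isdog, 0.2, 0.8) and count_between(isdog, 0.4, 0.6)
# give the 36 and 14 mentioned above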

What are the ‘_from’ and ‘to’ variables?