Submitting Keras predictions to CSV


I just started part 1 and the Dogs vs. Cats competition. I watched Jeremy’s approach to submitting a CSV using the predict_generator function. However, for my own understanding, I wanted to implement this myself by looping over the batches and getting predictions one batch at a time.

I used the code below for this, but I have two issues:

  1. The loop never ends on its own; I have to return manually once the number of processed images reaches the number of files in the test folder.
  2. The test score on Kaggle is horrible, so I suspect that I am mismapping file IDs and predictions.

Here’s my code:


test_batches = vgg.get_batches(path+'test', batch_size=batch_size, shuffle=False)

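def make_prediction(batches):
    result = []
    filenames = batches.filenames
    # Iterate over the argument, not the global test_batches.
    for idx, batch in enumerate(batches):
        # Without this check the loop never ends (issue 1).
        if idx * batch_size >= len(filenames):
            break
        img, _ = batch
        probs, label, category = vgg.predict(img)
        # Keep this batch's predictions; otherwise result stays empty.
        result.extend(probs)
        print("Percentage done:", 100.0 * idx * batch_size / len(filenames))
        print(idx * batch_size)
    return filenames, result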

filenames, probs = make_prediction(test_batches)
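
To check my understanding of issue 1, here is a minimal sketch of the same loop without Keras. It assumes the batch iterator cycles over the data forever (as far as I can tell, Keras directory iterators do), so the loop has to be cut off after ceil(n / batch_size) batches. `endless_batches` and `fake_predict` are hypothetical stand-ins for `vgg.get_batches` and `vgg.predict`, not real course functions.

```python
import itertools
import math

def endless_batches(items, batch_size):
    """Hypothetical stand-in for a Keras iterator: yields batches forever."""
    while True:
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]

def fake_predict(batch):
    """Hypothetical stand-in for vgg.predict: one made-up score per item."""
    return [len(name) / 100.0 for name in batch]

def predict_one_pass(items, batch_size):
    # One full pass needs ceil(n / batch_size) batches; the last may be smaller.
    n_batches = math.ceil(len(items) / batch_size)
    results = []
    for batch in itertools.islice(endless_batches(items, batch_size), n_batches):
        results.extend(fake_predict(batch))
    return results

# Ten made-up file names, processed in batches of four (4 + 4 + 2).
filenames = ['unknown/%d.jpg' % i for i in range(1, 11)]
scores = predict_one_pass(filenames, batch_size=4)
```

With the pass bounded this way, `scores` should end up with exactly one entry per file name, in the same order as `filenames`, which is what the later ID mapping relies on.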


import re
import pandas as pd

def prepare_output(filenames, probs):
    df = pd.DataFrame()
    # Raw string so that \d is passed to the regex engine unchanged.
    filenames = [''.join(re.findall(r'\d+', s)) for s in filenames]
    df['id'] = filenames
    df['label'] = probs
    return df.sort_values(by='id')

submission = prepare_output(filenames, probs)
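
For issue 2, here is a sketch of how the mapping could be made more explicit. It rests on two assumptions I have not verified: that `vgg.predict` returns the probability of the most likely class (not of "dog") together with that class's index, and that Kaggle expects `label` to be the probability of dog. The helper names, the `dog_index=1` default, and the sample values are all made up for illustration, and it uses the standard `csv` module instead of pandas so it runs on its own.

```python
import csv
import io
import os
import re

def to_dog_probability(prob, class_idx, dog_index=1):
    """Turn 'probability of the predicted class' into 'probability of dog'."""
    return prob if class_idx == dog_index else 1.0 - prob

def file_id(path):
    """Take the digits from the file name only, ignoring digits in folder names."""
    return int(re.search(r'\d+', os.path.basename(path)).group())

def build_rows(filenames, probs, class_idxs):
    rows = [(file_id(f), to_dog_probability(p, c))
            for f, p, c in zip(filenames, probs, class_idxs)]
    return sorted(rows)  # numeric sort, so 2 comes before 10

def to_csv(rows):
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['id', 'label'])
    writer.writerows(rows)
    return out.getvalue()

# Made-up sample: paths as a directory iterator might report them.
sample_files = ['unknown/10.jpg', 'unknown/2.jpg']
sample_probs = [0.9, 0.8]   # confidence in the predicted class
sample_idxs = [1, 0]        # 1 = dog, 0 = cat (assumed ordering)
rows = build_rows(sample_files, sample_probs, sample_idxs)
```

Two things differ from my code above: the ID comes from the base file name as an integer, so a folder name containing digits cannot leak into it, and a confident "cat" prediction becomes a low dog probability instead of a high label.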