I just started part 1 and the Cats vs Dogs competition. I watched Jeremy’s approach to submitting a CSV using the
predict_generator function. However, for my own understanding I wanted to implement this myself by looping through the batches and getting predictions per batch.
I used the code below for this, but I have two issues:
- The loop keeps running unless I manually tell it to return once it has reached the number of files in the test folder.
- The test score on Kaggle is horrible, so I suspect that I am mismapping file IDs and predictions.
Here’s my code:

    test_batches = vgg.get_batches(path+'test', batch_size=batch_size, shuffle=False)

    def make_prediction(batches):
        result = []
        filenames = batches.filenames
        for idx, batch in enumerate(test_batches):
            if idx * batch_size >= len(filenames):
                return filenames, result
            img, _ = batch
            probs, label, category = vgg.predict(img)
            result.extend(probs)
            print("Percentage done:", idx * batch_size / len(filenames))
            print(idx * batch_size)
        return filenames, result

    filenames, probs = make_prediction(test_batches)

    import re

    def prepare_output(filenames, probs):
        df = pd.DataFrame()
        filenames = [''.join(re.findall(r'\d+', s)) for s in filenames]
        df['id'] = filenames
        df['label'] = probs
        return df.sort_values(by='id')

    submission = prepare_output(filenames, probs)
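
To make the two issues easier to discuss, here is a minimal, self-contained sketch of a bounded per-batch loop. It is not the course's code: `endless_batches` and `fake_predict` are hypothetical stand-ins I made up for illustration. The sketch assumes that the batch iterator cycles over the data forever (which is how Keras generators behave, and would explain the first issue), and that the first value returned by the predict call is the probability of the most likely class rather than the probability of "dog". If that second assumption also holds for `vgg.predict`, it might explain the bad score, since the competition expects the dog probability for every image.

```python
import itertools
import math
import re


def endless_batches(filenames, batch_size):
    """Hypothetical stand-in for a Keras-style iterator: never stops on its own."""
    while True:
        for start in range(0, len(filenames), batch_size):
            yield filenames[start:start + batch_size]


def fake_predict(batch):
    """Hypothetical model: returns (max_prob, class_index) for each image.

    For illustration only: odd IDs are 'dogs' (class 1), even IDs are 'cats'
    (class 0), and the model is always 90% sure of its top class.
    """
    results = []
    for name in batch:
        image_id = int(re.findall(r'\d+', name)[-1])
        results.append((0.9, image_id % 2))
    return results


def predict_all(filenames, batch_size):
    # One pass over the data needs ceil(n / batch_size) batches; islice stops
    # the otherwise endless iterator after exactly that many.
    n_batches = math.ceil(len(filenames) / batch_size)
    batches = itertools.islice(endless_batches(filenames, batch_size), n_batches)

    rows = []
    seen = 0
    for batch in batches:
        # With shuffle disabled, batch order matches filename order, so the
        # slice below lines up with the predictions for this batch.
        names = filenames[seen:seen + len(batch)]
        for name, (max_prob, class_idx) in zip(names, fake_predict(batch)):
            # Convert "probability of the top class" into "probability of dog".
            dog_prob = max_prob if class_idx == 1 else 1.0 - max_prob
            image_id = int(re.findall(r'\d+', name)[-1])
            rows.append((image_id, dog_prob))
        seen += len(batch)

    # Integer IDs sort numerically (1, 2, 10), unlike string IDs ('1', '10', '2').
    return sorted(rows)


example_files = ['unknown/3.jpg', 'unknown/10.jpg', 'unknown/2.jpg',
                 'unknown/7.jpg', 'unknown/1.jpg']
rows = predict_all(example_files, batch_size=2)
for image_id, dog_prob in rows:
    print(image_id, round(dog_prob, 2))
```

The loop ends by itself because the number of batches is fixed up front, so no manual check against the file count is needed inside the loop body.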