Hi,
I just started part 1 and the Cats vs Dogs competition. I watched Jeremy’s approach to submitting a CSV using the predict_generator
function. However, for my own understanding I wanted to implement this myself by looping through the batches and getting predictions per batch.
I used the code below for this but have two issues:
- The loop keeps running unless I manually make it return once it has gone through the number of files in the test folder
- The test score on Kaggle is horrible, so I suspect that I might be mismatching file IDs and predictions
Here’s my code:
Predictions
test_batches = vgg.get_batches(path+'test', batch_size=batch_size, shuffle=False)
def make_prediction(batches):
    result = []
    filenames = batches.filenames
    # Iterate over the argument rather than the global test_batches
    for idx, batch in enumerate(batches):
        # Manual stop: without this check the loop never ends
        if idx * batch_size >= len(filenames):
            return filenames, result
        img, _ = batch
        probs, label, category = vgg.predict(img)
        result.extend(probs)
        # 100.0 keeps this a float percentage, also under Python 2
        print("Percentage done:", 100.0 * idx * batch_size / len(filenames))
        print(idx * batch_size)
    return filenames, result
filenames, probs = make_prediction(test_batches)
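On the first issue, here is a minimal sketch of one way to bound the loop without the manual check. It assumes the batch iterator cycles forever, which is how Keras directory iterators are documented to behave. `endless_batches` is a hypothetical stand-in for what `vgg.get_batches` returns, and `predict_all` is an illustrative name, not part of the course code:

```python
import itertools
import math


def endless_batches(items, batch_size):
    """Hypothetical stand-in for a Keras-style iterator: yields batches forever."""
    while True:
        for start in range(0, len(items), batch_size):
            yield items[start:start + batch_size]


def predict_all(batches, n_samples, batch_size, predict_fn):
    """Consume exactly one pass over the data, then stop."""
    # Round up so the final, possibly smaller, batch is included
    n_batches = int(math.ceil(n_samples / float(batch_size)))
    results = []
    for batch in itertools.islice(batches, n_batches):
        results.extend(predict_fn(batch))
    return results


items = list(range(10))
# The lambda is a dummy "model" that just doubles each input
preds = predict_all(endless_batches(items, 4), len(items), 4,
                    lambda batch: [x * 2 for x in batch])
print(len(preds))  # 10: one prediction per item, no repeats
```

Computing the number of batches up front also guarantees that `result` ends up with exactly one entry per file, which is a precondition for the filenames and predictions lining up.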
Submission
import re
import pandas as pd

def prepare_output(filenames, probs):
    df = pd.DataFrame()
    # Keep only the digits of each file name as its id
    filenames = [''.join(re.findall(r'\d+', s)) for s in filenames]
    df['id'] = filenames
    df['label'] = probs
    return df.sort_values(by='id')
submission = prepare_output(filenames, probs)
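On the second issue, one possible explanation, offered as a sketch rather than a confirmed diagnosis: if `vgg.predict` returns the probability of the most likely class together with that class's index (an assumption about the course's wrapper that I have not verified), then `probs` is not the probability of "dog", which is what the `label` column is meant to hold. The helper names, the example file names, and the assumption that class index 1 means dog are all hypothetical:

```python
import re

import numpy as np
import pandas as pd


def to_dog_probability(top_probs, top_idxs, dog_index=1):
    """Turn (top-class probability, top-class index) into P(dog) for two classes."""
    top_probs = np.asarray(top_probs, dtype=float)
    top_idxs = np.asarray(top_idxs)
    # If the top class is dog, keep p; otherwise P(dog) = 1 - P(cat)
    return np.where(top_idxs == dog_index, top_probs, 1.0 - top_probs)


def build_submission(filenames, dog_probs):
    """Pair each numeric file id with its P(dog) and sort numerically."""
    ids = [int(''.join(re.findall(r'\d+', f))) for f in filenames]
    df = pd.DataFrame({'id': ids, 'label': dog_probs})
    return df.sort_values(by='id').reset_index(drop=True)


# Made-up values purely for illustration
example_files = ['unknown/10.jpg', 'unknown/2.jpg', 'unknown/1.jpg']
top_probs = [0.9, 0.8, 0.7]
top_idxs = [1, 0, 1]  # assumed mapping: 0 = cat, 1 = dog

submission_sketch = build_submission(
    example_files, to_dog_probability(top_probs, top_idxs))
print(submission_sketch)
```

Converting the ids to integers before sorting gives the order 1, 2, 10 instead of the string order 1, 10, 2. The row order should not change which prediction belongs to which id, so the conversion to P(dog) is the part more likely to matter for the score.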