Kaggle: Statefarm Distracted Driver: How to use ImageClassifierData

krisho007 · September 7, 2018, 10:31am

Hi,
I just completed Week 3 of Part 1. I successfully submitted to the Planet competition and was trying my hand at the https://www.kaggle.com/c/state-farm-distracted-driver-detection competition. I am stuck on writing the get_data function.
I wrote it like this:
def get_data(sz):
    tfms = tfms_from_model(resnet50, sz, max_zoom=1.05)
    return ImageClassifierData.from_paths(PATH, tfms=tfms, trn_name='train', val_name=None, test_name='test')

I get an error when I try get_data(64):
join() argument must be str or bytes, not 'NoneType'
It might be because of not having a val_name folder? Any help is appreciated.
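
The message itself can be reproduced without fastai. Below is a minimal sketch, assuming the library builds the validation folder path by joining PATH with val_name (an assumption about its internals, not something checked against the source). The PATH value is only illustrative.

```python
import os

PATH = "data/distractedDriver/imgs"  # illustrative dataset root, not confirmed by the post
val_name = None                      # what the failing get_data call passed

try:
    # Joining a path with None is not allowed and raises TypeError
    os.path.join(PATH, val_name)
    error = None
except TypeError as exc:
    error = exc

# The exact wording of the message depends on the Python version
print(type(error).__name__, error)
```

If that assumption holds, any string naming an existing folder would get past this particular error.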

sjdlloyd · September 7, 2018, 10:44am

You do need a val_name. Is there a reason that you’re going without a validation set?
If there is a reason, you’re best off creating a dummy directory with a folder for a category, and putting an image in that.
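
A rough sketch of that dummy-directory workaround, run against a throwaway temp folder so it is safe to try. The helper name make_dummy_valid, the class folder c0 and the file name img_1.jpg are illustrative assumptions, not part of fastai or confirmed details of the dataset.

```python
import shutil
import tempfile
from pathlib import Path


def make_dummy_valid(root, trn_name="train", val_name="valid"):
    """Copy one image from the first class folder in train into valid/<class>/."""
    train_dir = Path(root) / trn_name
    # Take the first class sub-folder and its first file, in sorted order
    first_class = sorted(p for p in train_dir.iterdir() if p.is_dir())[0]
    first_image = sorted(p for p in first_class.iterdir() if p.is_file())[0]
    target_dir = Path(root) / val_name / first_class.name
    target_dir.mkdir(parents=True, exist_ok=True)
    # Copy rather than move, so the training set is left untouched
    return Path(shutil.copy(first_image, target_dir / first_image.name))


# Demo on a temporary tree standing in for the real data folder
root = Path(tempfile.mkdtemp())
(root / "train" / "c0").mkdir(parents=True)
(root / "train" / "c0" / "img_1.jpg").write_bytes(b"fake image bytes")

dummy = make_dummy_valid(root)
print(dummy.relative_to(root))
```

With a single validation image, any validation metric reported during training is meaningless, so this only makes sense when the validation set is deliberately being skipped.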

krisho007 · September 7, 2018, 10:56am

Thanks Sam for the answer. I will do that. Another question: how do I fill this validation set with 20% of the training set in this case?

sjdlloyd · September 7, 2018, 11:41am

I’m on my phone, so the code below isn’t tested, and I can’t remember all the syntax, so you will have to do some debugging.

from pathlib import Path
import random

TRAIN = Path('data/train')
VAL = Path('data/valid')
for cat in TRAIN.iterdir():
    cat_name = cat.name
    img_list = list(cat.iterdir())
    random.shuffle(img_list)  # shuffle in place so the split is random
    cut = len(img_list) // 5  # hold out 20% of each category
    val_list = img_list[:cut]
    cat_val = VAL / cat_name
    cat_val.mkdir(parents=True, exist_ok=True)
    for img in val_list:
        img.rename(cat_val / img.name)  # check this is the right method to move!

krisho007 · September 7, 2018, 7:25pm

Thanks a lot Sam. I finally got this working with the code below. You gave me lots of inspiration on how to approach this.

from pathlib import Path, PurePath
import random

# Create path objects
train_path = Path('data/distractedDriver/imgs/train')
# create the 'valid' folder beforehand
validation_path = Path('data/distractedDriver/imgs/valid')

for cat in train_path.iterdir():
    cat_name = cat.name
    #create a folder with name cat_name
    validation_path.joinpath(cat_name).mkdir()
    cat_path_train = train_path.joinpath(cat_name)
    cat_path_val = validation_path.joinpath(cat_name)
    # Pick a random 20% (one fifth) of the files in the train category folder
    files = list(cat_path_train.iterdir())
    source_files = random.sample(files, len(files) // 5)
    for file in source_files:
        # Build the target from the validation folder, so no other 'train' in the path gets replaced
        new_file = cat_path_val / file.name
        file.rename(new_file)
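
The same split can be wrapped in a function and tried on a throwaway folder before touching the real data. This is a sketch rather than the code from the thread: the name split_validation, the frac and seed parameters, and the fake c0/c1 folders are all assumptions made for illustration.

```python
import random
import tempfile
from pathlib import Path


def split_validation(train_dir, valid_dir, frac=0.2, seed=0):
    """Move a random `frac` of each class folder from train_dir to valid_dir."""
    rng = random.Random(seed)  # private RNG so the split is repeatable
    moved = {}
    for class_dir in sorted(p for p in Path(train_dir).iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        n_valid = int(len(files) * frac)
        target_dir = Path(valid_dir) / class_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        for src in rng.sample(files, n_valid):
            src.rename(target_dir / src.name)  # move, do not copy
        moved[class_dir.name] = n_valid
    return moved


# Demo: two fake classes with 10 empty files each
root = Path(tempfile.mkdtemp())
for class_name in ("c0", "c1"):
    (root / "train" / class_name).mkdir(parents=True)
    for i in range(10):
        (root / "train" / class_name / f"img_{i}.jpg").touch()

moved = split_validation(root / "train", root / "valid")
print(moved)
```

Fixing the seed keeps the same images in the validation set across runs, which makes results easier to compare. Once the split is done, val_name='valid' can be passed to ImageClassifierData.from_paths in place of None.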

sjdlloyd · September 7, 2018, 7:39pm

file.rename()… that’s what I was trying to remember!