Huge performance improvement with network training!

@kzuiderveld I can’t wait to try this. Can someone explain how to pick the number of workers? I assume it is at least partially based on the number of CPUs available.

It depends on the amount of computing done by the GPU and CPU. If you have a slow GPU, the CPU may be able to keep up and only one worker thread might suffice. If you have a box with multiple high-end GPUs, you may need quite a few worker threads to prevent the GPUs from starving for data.

Think of it as yet another hyperparameter :slight_smile: Do a timing run without worker threads, one with two workers, one with three workers… and pick the point where performance stops improving.
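Something like this toy sketch, for example (my assumptions: Keras 2's fit_generator with its workers argument and the channels_last image format - swap in your own model and generator):

import time
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

# Toy data and model, just to make the timing loop self-contained.
x = np.random.rand(2048, 64, 64, 3).astype('float32')
y = np.random.randint(0, 2, size=(2048, 1))

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 3), data_format='channels_last'),
    Flatten(),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')

# Some CPU-side augmentation so the worker threads actually have work to do.
gen = ImageDataGenerator(rotation_range=10, horizontal_flip=True, data_format='channels_last')

for n_workers in (1, 2, 3, 4):
    start = time.time()
    model.fit_generator(gen.flow(x, y, batch_size=64),
                        steps_per_epoch=len(x) // 64,
                        epochs=1, workers=n_workers, verbose=0)
    print("workers=%d: %.1fs" % (n_workers, time.time() - start))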

I’m still using Keras 1.2, where the parameter is called nb_worker. I tried it, but it doesn’t seem to help. I also got a message about nb_worker being deprecated when pickle_safe=False, so I set that to True as well.

I tried it on my version of cats and dogs and got about the same times. Oddly, if I set nb_worker to some value other than 1, I get a warning that the number of training samples is higher than the number of samples per epoch. So perhaps I haven’t implemented it correctly. Does anybody else have an example using Keras 1.2?

My graphs look pretty much the same between nb_worker=1 and nb_worker=2,3, or 4. They are somewhat different from yours. The GPU is at about 100% with occasional drops down to 50% for a few seconds. The CPU seems to hover around 25% or so. Maybe it is because I have a regular ol’ 1080?

I do love the graphing technique - that could be really helpful in so many situations. Thanks for that.
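(In case anyone else wants to make a similar plot: this isn't Karel's actual script, just a rough sketch that polls nvidia-smi and psutil once a second while training runs in another process, then plots the two curves. It assumes nvidia-smi is on the PATH and that psutil and matplotlib are installed.)

import subprocess, time
import psutil
import matplotlib.pyplot as plt

gpu_util, cpu_util = [], []
for _ in range(120):  # sample for roughly two minutes
    out = subprocess.check_output(
        ['nvidia-smi', '--query-gpu=utilization.gpu', '--format=csv,noheader,nounits'])
    gpu_util.append(int(out.decode().split()[0]))  # first GPU only
    cpu_util.append(psutil.cpu_percent(interval=None))
    time.sleep(1)

plt.plot(gpu_util, label='GPU %')
plt.plot(cpu_util, label='CPU %')
plt.xlabel('seconds')
plt.ylabel('utilization')
plt.legend()
plt.show()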

I guess I also have a question. In your graphs, the GPU is already running at 100% and the CPU is the bottleneck at about 33%, right? But if 33% of the CPU can keep the GPU saturated at 100%, then where is the extra performance coming from?

@Rothrock42, can’t help you with Keras 1.2 - I googled on nb_worker and saw some mixed messages (for some, it works - for others, it doesn’t). The image generator needs to be thread-safe; if it’s not, folks reported problems with the # of training samples as you found. Perhaps the Keras 1.2 image generator is not threadsafe and the Keras 2.0 version is?
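If the generator really isn't thread-safe, a generic workaround (not specific to either Keras version) is to wrap it so that only one thread can pull a batch at a time - a minimal sketch:

import threading

class ThreadsafeIterator(object):
    """Wrap a generator/iterator so that next() calls are serialized with a lock."""
    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):  # Python 3
        with self.lock:
            return next(self.it)

    next = __next__  # Python 2 compatibility

# Usage (hypothetical name): model.fit_generator(ThreadsafeIterator(my_generator), ...)

Note that this serializes batch creation, so it mainly protects correctness; with several workers the remaining speedup comes from the prefetch queue overlapping batch preparation with the GPU work.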

With respect to my graphs - yes, I was surprised to see a significant speedup when the graph suggested the GPU was already running at 100%. PCI-E throughput with multiple threads is less bursty and overall higher, so something definitely improved. Perhaps nvidia-smi’s way of measuring GPU load is not entirely accurate? Too many unknowns here…

Karel, I noticed that Keras doesn’t have ‘workers’ as an argument for the fit() function, only for fit_generator(). Do you (or anyone) know if that’s because it only supports multithreading on the latter?

I have been using fit() because it is the only way I have figured out how to use StratifiedKFold() without a lot of hassle. This is what I am trying to do:

skf = StratifiedKFold(n_splits=num_folds, random_state=1, shuffle=True)

k = 1
for trn_index, val_index in skf.split(X=np.zeros(len(trn)), y=trn_classes):
    print("Kfold = ", k)
    batches = 0
    for augmented_trn, augmented_trn_labels in gen.flow(X=trn[trn_index], y=trn_labels[trn_index], batch_size=1697, shuffle=False):
        model.fit(augmented_trn, augmented_trn_labels, batch_size=batch_size, nb_epoch=num_epoch,
                  validation_data=(trn[val_index], trn_labels[val_index]))
        batches += 1
        if batches >= len(trn_index) / batch_size:
            # we need to break the loop by hand because
            # the generator loops indefinitely
            break
    k += 1

Am I missing something?

Thanks, Christina

No, you’re not missing anything - I don’t think that model.fit supports multiple threads. But: it might be worth trying to use fit_generator instead with multithreading enabled - and not do any data augmentation (don’t specify any parameters that would augment the image).
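Something along these lines, reusing the names from your snippet above (trn, trn_labels, trn_index, val_index, batch_size, num_epoch); Keras 2 argument names shown - in 1.2 they would be samples_per_epoch / nb_epoch / nb_worker:

from keras.preprocessing.image import ImageDataGenerator

# No augmentation arguments: the generator only shuffles and batches.
plain_gen = ImageDataGenerator()

model.fit_generator(
    plain_gen.flow(trn[trn_index], trn_labels[trn_index], batch_size=batch_size),
    steps_per_epoch=len(trn_index) // batch_size,
    epochs=num_epoch,
    validation_data=(trn[val_index], trn_labels[val_index]),
    workers=4)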


Hi Karel,
Thanks. You’re right - I will try fit_generator next. I will just suck it up and do the cross validation the old fashioned way for now, with a fixed split between training and validation directories. I have only been able to use the scikit-learn StratifiedKFold() with fit(), as fit_generator() doesn’t give you the ability to index specific image lists returned for the training and validation images.

What I would really like to see is K-fold added to fit_generator internally, but I think that would be a gigantic pain, too much time when you’re doing a contest. :disappointed:

At any rate, thanks for your valuable thoughts! :slight_smile:

The Keras author nowadays suggests using tensorflow queues for best performance. It’s not well documented, but worth playing with if performance is an issue for you.
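For the curious, the old TF 1.x queue-runner API looks roughly like this - a toy sketch with random in-memory data, not wired into Keras:

import numpy as np
import tensorflow as tf

# Toy in-memory dataset, just to show the queue mechanics.
data = np.random.rand(1000, 32, 32, 3).astype('float32')
labels = np.random.randint(0, 10, size=(1000,)).astype('int32')

# One (image, label) pair at a time, shuffled, fed into a multi-threaded batching queue.
image, label = tf.train.slice_input_producer([data, labels], shuffle=True)
image_batch, label_batch = tf.train.shuffle_batch(
    [image, label], batch_size=64, capacity=2000,
    min_after_dequeue=1000, num_threads=4)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    x, y = sess.run([image_batch, label_batch])  # one prefetched batch
    print(x.shape, y.shape)
    coord.request_stop()
    coord.join(threads)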


It looks like someone’s already thought about the problem whereby you need separate directories for your training and validation data for fit_generator / flow_from_directory.

It would be great to be able to dynamically specify train/test splits on the fly… just one step away from stratified k-fold.

Of course others have looked into the problem :slight_smile: However, a simple split of “all input” might not always work (think: State Farm (split by drivers) or the Fishing competition (split by boat)) - I suspect there’s a lot of room for DIY approaches.

Karel, yes agreed that stratified k-fold would not be appropriate for all situations… :slight_smile:

I created a little convenience function to split the main train directory into training and validation directories, randomly shuffling files into them based upon the number of folds (while keeping the main training directory intact). Yes, it is a brute-force way of doing it and isn’t stratified k-fold, but you can call this function to shuffle things up between sets of epochs.

import os, shutil
import numpy as np
import random
from glob import glob

# This function keeps the main train dir intact, and creates 2 new dirs, one each for
# a randomly selected train/val split

def train_val_split(train_path, split_train_path, split_val_path):

    # First see if train_split and val_split directories already exist - if so, delete them...
    if os.path.exists(split_train_path): shutil.rmtree(split_train_path)
    if os.path.exists(split_val_path): shutil.rmtree(split_val_path)

    # Create a new val directory
    os.mkdir(split_val_path)

    # Next copy everything in the combined training directory to the split training directory
    shutil.copytree(train_path, split_train_path)

    num_folds = 5  # One of the folds to be val, the rest for train...

    for subdir in glob(split_train_path + '*'):
        valsubdir = split_val_path + subdir.split('/')[3]
        os.mkdir(valsubdir)
        g = glob(subdir + '/*.jpg')

        shuf = np.random.permutation(g)

        for i in range(int(round(len(shuf) / num_folds))):
            print("Transferring ", shuf[i], " to ", split_val_path + shuf[i].split('/')[3] + '/' + shuf[i].split('/')[4])
            os.rename(shuf[i], split_val_path + shuf[i].split('/')[3] + '/' + shuf[i].split('/')[4])
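Usage would be something like the call below (the paths are hypothetical). Note that the subdir.split('/')[3] indexing assumes a fixed directory depth such as data/<competition>/train_split/<class>/, so adjust the indices (or switch to os.path functions) for your own layout.

# Hypothetical paths - adjust for your own directory layout.
train_val_split('data/redux/train/', 'data/redux/train_split/', 'data/redux/valid_split/')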

Hi @kzuiderveld, I’m trying this on the same nvidia 1080-ti GPU, but my performance is significantly slower even though GPU utilization is 100%.
Below is the screenshot after running your code

The time it takes is 2190s compared to your 111s.

Below are my specs

PYTHON_PACKAGES
anaconda for python 2.7
theano-0.9
keras-2.0.2

CUDA
cuda-8.0
cudnn-5.1.1

HARDWARE
gpu
nvidia geforce gtx 1080-ti
cpu
dell precision tower 7910
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 1
RAM:32GB

Below is my .theanorc

[global]
floatX=float32
device=gpu0
optimizer=fast_run

[lib]
cudnn=True

[cuda]
root = /usr/local/cuda-8.0/include

I’m also getting the below warning when I import theano

Using Theano backend.
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end(gpuarray)

Using gpu device 0: Graphics Device (CNMeM is disabled, cuDNN Mixed dnn version. The header is from one version, but we link with a different version (5110, 5105))

Is this some kind of CPU bottleneck issue?
Has anyone faced this issue ?

Thanks in advance

Hmmm - offhand, I can’t tell why your system is so much slower. There are some differences with my setup though:

  1. I’m using an Anaconda 3 setup. My notebook assumes Python 3, so perhaps there’s an incompatibility causing the problem.
  2. I’m using device=cuda, not gpu0. I also don’t have the cudnn statement. device=cuda will cause the switch to the gpuarray backend - which might be faster.

Let me know if this helps.
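For reference, the relevant part of a .theanorc for the new backend would look something like this (device=cuda selects the gpuarray backend; cuda0, cuda1, … pick a specific GPU):

[global]
floatX = float32
device = cuda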

Hi @rteja1113 ,

From what I see, two simple changes could improve performance here.

Preallocate VRAM

Just add the following lines to your .theanorc file:

# Using old theano cuda backend
[lib] 
cnmem = 0.8

# Or new theano gpuarray backend
[gpuarray]
preallocate = 0.8

It will preallocate 80% of your VRAM for Theano. This amount can be changed depending on your setup, but as a contiguous block will be allocated, you may be limited by the memory already used for display.

Properly use cuDNN

I’m not convinced that [lib] cudnn=True is enough. I added these lines to get it working:

[dnn]
enabled = auto
root = path_to_your_CUDA\v8.0_install
library_path = path_to_your_CUDA\v8.0_libs
include_path = path_to_your_CUDA\v8.0_install_include

And I did the same with cuda to be sure theano finds everything it needs.
You can also test your BLAS binding (using either cuBLAS or openBLAS).
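(check_blas.py ships with Theano itself; something like the snippet below should locate and run it - a sketch assuming the usual theano/misc layout.)

import os, subprocess, sys
import theano

# Run Theano's bundled BLAS benchmark from wherever the package is installed.
script = os.path.join(theano.__path__[0], 'misc', 'check_blas.py')
subprocess.call([sys.executable, script])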

I personally had a hard time getting theano working correctly (on windows) but found this article very helpful, as well as this one which contains performance comparisons with/without cuDNN and CNMeM.
There are others describing the procedure for linux if needed.

Hope it helps.

Hi @FabienP, @kzuiderveld, thank you for your replies

I was able to reduce training time to 200s, but that’s still far from the 111s in @kzuiderveld 's script. The performance improved after including the [dnn] paths in .theanorc. It was definitely not cnmem, as I trained in a similar time (199s) after disabling it.

I am still using the old theano backend (as I am getting some exceptions with the new gpuarray backend). Do you think that could be the reason for the difference between @kzuiderveld 's training time and mine, or is it something to do with BLAS/cuBLAS? The official theano page only says ‘Maybe a small run time speed up.’ about converting to the new backend.

When I ran check_blas.py, this was the output:

Some Theano flags:
blas.ldflags=
compiledir= /home/cvpr/.theano/compiledir_Linux-4.8–generic-x86_64-with-debian-stretch-sid-x86_64-2.7.13-64
floatX= float32
device= gpu
Some OS information:
sys.platform= linux2
sys.version= 2.7.13 |Anaconda 4.3.1 (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
sys.prefix= /home/cvpr/anaconda2
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None

Numpy config: (used when the Theano flag "blas.ldflags" is empty)
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
blis_info:
NOT AVAILABLE
openblas_lapack_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/lib']
define_macros = [('HAVE_CBLAS', None)]
language = c
lapack_mkl_info:
NOT AVAILABLE
blas_mkl_info:
NOT AVAILABLE
Numpy dot module: numpy.core.multiarray
Numpy location: /home/cvpr/anaconda2/lib/python2.7/site-packages/numpy/__init__.pyc
Numpy version: 1.12.1
nvcc version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

We executed 10 calls to gemm with a and b matrices of shapes (5000, 5000) and (5000, 5000).

Total execution time: 0.31s on GPU.

BTW, I forgot to mention that my NVIDIA driver version is 378.13 on Ubuntu 16.04.2.

Thanks in advance !!

Well, that’s better!

Your Nvidia drivers are still a few months old; you should try the latest, 381.22. In my experience drivers can change performance quite a lot, for better or for worse (although I haven’t seen that with neural networks yet, it is the case for GPU 3D rendering, for instance).

Maybe you could try to reinstall theano with the new backend following the theano Ubuntu instructions. I would recommend doing that in a new Python environment using conda env; this way you should not break your current install.

Maybe linking theano to either MKL or cuBLAS, or compiling your own OpenBLAS, could help a bit.

Note that performance can vary from one OS to another too, and depends on your system and on the flavour of your GPU (some models are overclocked), so don’t drive yourself mad trying to reach 111s.


Thanks a lot for the tips Fabien.

I am having the same problem. Previously I had a Quadro P2200 and I was able to train a model with the batch size set to 64 (it is a Dell tower 7910, dual CPU). Now I have two 1080 Ti GPUs, but it runs out of memory with the batch size set to 64. If I set it to 32 it runs, but each epoch takes hours. The driver is the latest version, 384, CUDA version 8, cuDNN version 7, and I am still using my old settings in .theanorc, which is:
[global]
floatX = float32
device = gpu

[cuda]
root=/usr/local/cuda/

What could be wrong?

Really useful analysis, thanks. I am just considering getting a used Z800 or Z600, so I have been reading your comments on this with interest, and I am reassured that the multiprocessing will make up for the slower processor.

However, your second conclusion re the odd behaviour is down to a bug: the batch_size is set to 100 at the top but 64 further down. This means the two runs being compared have different batch sizes but the same number of batches per epoch, which explains the difference in timings.