I developed a Python Package to Split Folders (train/val/test)


(Johannes Filter) #1

Hey people,

when I was applying image classification to my own images, I couldn’t find a Python package to easily split folders into training, validation and test sets. So I created one: https://github.com/jfilter/split-folders
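For readers who want to see the idea, here is a minimal standard-library sketch of such a split. It is not the package's actual implementation or API. The function name `split_class_folders`, its parameters, the default ratio and the seed are all assumptions made for illustration. It expects one sub-folder per class inside the input folder.

```python
import random
import shutil
from pathlib import Path


def split_class_folders(input_dir, output_dir, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Copy input_dir/<class>/<file> into output_dir/{train,val,test}/<class>/.

    Hypothetical helper for illustration; returns the per-class file counts.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    counts = {}
    for class_dir in sorted(p for p in input_dir.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)
        n_train = int(len(files) * ratio[0])
        n_val = int(len(files) * ratio[1])
        parts = {
            "train": files[:n_train],
            "val": files[n_train:n_train + n_val],
            "test": files[n_train + n_val:],  # whatever is left goes to test
        }
        for split, members in parts.items():
            target = output_dir / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in members:
                shutil.copy2(f, target / f.name)
        counts[class_dir.name] = {s: len(m) for s, m in parts.items()}
    return counts
```

Splitting each class separately keeps the class proportions roughly equal across the three sets, and copying (rather than moving) leaves the original folder untouched.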

Let me know what you think of it.

Best Wishes
Johannes


(Ilia) #2

Interesting! I was trying to build something similar here. My idea, though, was to build a small toolkit of helpers (visualizations and so on) that make working in notebooks easier.

The project is a bit stale right now, though I would like to continue development soon =) Delete some old stuff, make the API stable, remove the implicit dependencies on Keras and the Kaggle file structure, and so on.


(Michael) #3

Dear Johannes,

this looks nice, and I could use something like this right now.

In my code snippet collection I have this useful script, from DeepLearning-Lec7Notes, for copying the images of the CIFAR dataset into per-class folders:

import os
import shutil
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# create sub-folders for each class
OUTPATH = 'data/cifar10/'
for x in classes:
    os.makedirs(OUTPATH+'train/'+x, exist_ok=True)
    os.makedirs(OUTPATH+'val/'+x, exist_ok=True)

INPATH = 'data/cifar/'
filenames = os.listdir(INPATH+'train/')
counts = {x:0 for x in classes}
print(len(filenames))

# copy files from cifar folder to cifar10 folder with sub-directories
valsz = len(filenames) / 10 * 0.2 # 20%

for fl in filenames:
    for cl in classes:
        if cl in fl:
            counts[cl] += 1 # increase count +1
            if counts[cl] < valsz:
                shutil.copy(INPATH+'train/'+fl, OUTPATH+'val/'+cl+'/'+fl)
            else:
                shutil.copy(INPATH+'train/'+fl, OUTPATH+'train/'+cl+'/'+fl)
    # the files name this class 'automobile', so check it once per file, outside the class loop
    if 'automobile' in fl:
        counts['car'] += 1
        if counts['car'] < valsz:
            shutil.copy(INPATH+'train/'+fl, OUTPATH+'val/car/'+fl)
        else:
            shutil.copy(INPATH+'train/'+fl, OUTPATH+'train/car/'+fl)

# copy test set
filenames = os.listdir(INPATH+'test/')
os.makedirs(OUTPATH+'test/', exist_ok=True)

for fl in filenames:
    shutil.copy(INPATH+'test/'+fl, OUTPATH+'test/'+fl)

Maybe this would be interesting as a starting point for adding class-to-directory sorting functionality?
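As a rough sketch of what such a helper could look like: the function name `sort_into_class_dirs` and the `keywords` mapping (substring in the file name to class name) are hypothetical, not part of the package. It generalizes the 'automobile' to 'car' special case into a lookup table.

```python
import shutil
from pathlib import Path


def sort_into_class_dirs(input_dir, output_dir, keywords):
    """Copy each file into output_dir/<class>/ based on a substring of its name.

    keywords maps a substring to look for in the file name to a class name
    (a hypothetical convention). Files matching no keyword are not copied;
    their names are returned so they can be inspected.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    unmatched = []
    for f in sorted(p for p in input_dir.iterdir() if p.is_file()):
        # the first keyword found in the file name decides the class
        label = next((cls for key, cls in keywords.items() if key in f.name), None)
        if label is None:
            unmatched.append(f.name)
            continue
        target = output_dir / label
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target / f.name)
    return unmatched
```

For the script above, the mapping would contain entries such as `{'automobile': 'car', 'plane': 'plane', 'cat': 'cat'}`, and the result could then be fed into a train/val/test split.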

I may look into this myself, as I might need it for my current pet project.

Thanks for sharing & best regards
Michael