Splitting data into validation and training | Script included

Hi,

I just finished writing a script that shuffles my data and separates it into a train and a validation folder.
My problem is that when I get data from the internet through APIs or via scraping, I usually end up with one directory per class, each holding all the images for that class:

  • data
    • cats (contains all cat images)
    • dogs (contains all dog images)
    • elephants (contains all elephant images)

But for Keras you need to split this data into training and validation sets. As you can imagine, doing this by hand is way too much work, so I created a little script that automatically shuffles your data and then moves it into a train and a validation directory while keeping the class subdirectories the same, so it will look like this:

  • data
    • train (80%)
      • cats
      • dogs
      • elephants
    • validation (20%)
      • cats
      • dogs
      • elephants

The script is by no means perfect. I am not an expert programmer, and most of it was copy-pasted:

https://gist.github.com/GertjanBrouwer/95c815565d3d8788137929ef27054db9
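For anyone who wants the idea without opening the link, here is a minimal sketch of the same kind of split. It is not the code from the gist, just an illustration under a few assumptions: the function name `split_dataset`, the `train_fraction` and `seed` parameters, and the folder names `train` and `validation` are all made up for this example, and every subdirectory of the data folder is treated as one class.

```python
import random
import shutil
from pathlib import Path


def split_dataset(data_dir, train_fraction=0.8, seed=None):
    """Shuffle each class folder and move its files into train/ and validation/.

    Hypothetical helper for illustration; returns {class: (n_train, n_validation)}.
    """
    data_dir = Path(data_dir)
    rng = random.Random(seed)  # a fixed seed makes the shuffle reproducible
    split_names = ("train", "validation")

    # Collect the class folders up front, skipping any existing split folders.
    class_dirs = sorted(
        d for d in data_dir.iterdir() if d.is_dir() and d.name not in split_names
    )

    counts = {}
    for class_dir in class_dirs:
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)

        # The first part of the shuffled list goes to train, the rest to validation.
        n_train = int(len(files) * train_fraction)
        parts = {"train": files[:n_train], "validation": files[n_train:]}

        for split, split_files in parts.items():
            target = data_dir / split / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.move(str(f), str(target / f.name))

        counts[class_dir.name] = (n_train, len(files) - n_train)

        # Remove the original class folder only if nothing is left in it.
        if not any(class_dir.iterdir()):
            class_dir.rmdir()

    return counts
```

Calling `split_dataset("data", train_fraction=0.8, seed=42)` on the layout above would move the files in place, so it is probably a good idea to try it on a copy of the data first.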
