Nice script! I've started to write my own helpers too. I might steal some of yours.
As soon as I download the images, I create "backup_train" and "backup_test" directories where I store the unzipped images. I don't create any subdirectories yet, though. I do that in the notebooks, since each competition is different and I want the whole process to be completely repeatable every time. If I screw up somewhere and need to "reset", I just delete train or test and run:
cp -r backup_train train
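The same reset can also be done from inside a notebook. Here is a minimal sketch, assuming the backup directories sit next to train and test and follow the "backup_" naming above; `reset_dir` is a hypothetical helper name, not something from the original script.

```python
import os
import shutil

def reset_dir(name, base_dir="."):
    """Delete `name` under base_dir and restore it from `backup_<name>`.

    Hypothetical helper: assumes a sibling directory called backup_<name>.
    """
    target = os.path.join(base_dir, name)
    backup = os.path.join(base_dir, "backup_" + name)
    # Refuse to delete anything if there is no backup to restore from.
    if not os.path.isdir(backup):
        raise FileNotFoundError("no backup directory: " + backup)
    if os.path.isdir(target):
        shutil.rmtree(target)
    # Equivalent of `cp -r backup_train train` once train is gone.
    shutil.copytree(backup, target)
    return target
```

Calling `reset_dir("train")` or `reset_dir("test")` from the notebook's working directory should then give the same result as the delete-and-copy above.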
In my notebooks I try to stay in the same directory the whole time, with no cd. At the beginning, though, I create a reference variable, DATA_HOME_DIR:
import os, sys
current_dir = os.getcwd()
DATA_HOME_DIR = current_dir+'/data/statefarm'
After that I can reference that directory anywhere in my code: move, copy files, load images, etc.
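As one illustration of working from that variable without changing directories, here is a rough sketch that copies a handful of training images into a sample set. The flat train directory of .jpg files, the sample/train layout, and the `copy_sample` name are all assumptions for the example, so adjust them to whatever the competition data looks like.

```python
import os
import shutil
from glob import glob

def copy_sample(data_home_dir, n=10, pattern="*.jpg"):
    """Copy the first n matching files from train/ into sample/train/.

    Hypothetical helper: assumes images sit directly under train/.
    """
    src = os.path.join(data_home_dir, "train")
    dst = os.path.join(data_home_dir, "sample", "train")
    os.makedirs(dst, exist_ok=True)
    # Sort so the same files are picked on every run.
    files = sorted(glob(os.path.join(src, pattern)))[:n]
    for f in files:
        shutil.copy(f, dst)
    return len(files)
```

Everything is addressed through the absolute base path, so the notebook's working directory never has to change.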
I'll also sometimes add this:
#Set Paths - Sample or Prod
root = DATA_HOME_DIR+'/sample' # or just DATA_HOME_DIR (append nothing) for the full dataset
test_path = DATA_HOME_DIR+'/test/'
results_path = root + '/results/'
train_path = root + '/train/'
valid_path = root + '/valid/'
models_path = root + '/models/'
Which lets me switch between prod and sample datasets quickly: