I’ve been using an enhanced version of the `ls` functionality implemented in fastai. It has been very useful in my work, so I thought it might be worth sharing.

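```python
from pathlib import Path

def filter_files(files, include=(), exclude=()):
    # Keep only paths whose string form contains every `include` substring
    for incl in include:
        files = [f for f in files if incl in str(f)]
    # Drop paths whose string form contains any `exclude` substring
    for excl in exclude:
        files = [f for f in files if excl not in str(f)]
    return sorted(files)

def ls(x, recursive=False, include=(), exclude=()):
    if not recursive:
        out = list(x.iterdir())        # direct children only
    else:
        out = list(x.glob('**/*'))     # everything in all subdirectories
    return filter_files(out, include=include, exclude=exclude)

# Attach to Path so it can be called as path.ls(...)
Path.ls = ls
```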

It lets you list files in all subdirectories and also apply filters, for example:

```python
path = Path('data')

# List files, including subdirectories, with .tif in the name but excluding .tif.xml:
files = path.ls(recursive=True, include=['.tif'], exclude=['.xml'])
```
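The chained list comprehensions in `filter_files` amount to one per-path test: a path is kept only if it contains every `include` substring and none of the `exclude` substrings. A minimal sketch of that predicate (`matches` is a hypothetical name, not part of fastai), run on plain strings so it needs no files on disk:

```python
def matches(path, include=(), exclude=()):
    """Hypothetical helper: the per-path test that filter_files applies."""
    s = str(path)
    # Every include substring must appear (AND), and no exclude substring may appear
    return all(i in s for i in include) and not any(e in s for e in exclude)

paths = ["data/train/a.tif", "data/train/a.tif.xml", "data/valid/b.tif", "data/valid/b.png"]
print([p for p in paths if matches(p, include=['.tif'], exclude=['.xml'])])
# prints ['data/train/a.tif', 'data/valid/b.tif']
print([p for p in paths if matches(p, include=['.tif', 'train'])])
# prints ['data/train/a.tif', 'data/train/a.tif.xml']
```

So `include=['.tif', '.png']` matches only paths containing both substrings; to get a union of extensions you would need separate calls or an `any()`-style test.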

This is good. Printing the size would also be a nice enhancement. Hope you can raise a PR and get it merged back.

For the recursive case, you should look at our `get_files` function and `os.walk`, as it’s much faster than globbing `'**/*'` if you have a huge dataset (like ImageNet).
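A minimal sketch of the recursive branch rewritten around `os.walk`, assuming the same substring filtering as `filter_files`. `ls_walk` is a hypothetical name, not fastai’s `get_files`; no speed claim is measured here.

```python
import os
from pathlib import Path

def ls_walk(x, include=(), exclude=()):
    """Hypothetical helper: recursively list files under `x` using os.walk."""
    out = []
    for dirpath, dirnames, filenames in os.walk(x):
        # os.walk yields plain strings; build a Path for each file in this directory
        out.extend(Path(dirpath) / name for name in filenames)
    # Same substring filtering as filter_files: all includes, no excludes
    for incl in include:
        out = [f for f in out if incl in str(f)]
    for excl in exclude:
        out = [f for f in out if excl not in str(f)]
    return sorted(out)
```

One behavioural difference: `glob('**/*')` also returns the subdirectories themselves, while this version only collects files, which is usually what you want for a dataset listing.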


Thanks, I will take a look at it!