Questions about ImageDataBunch.from_name_re and ImageDataBunch.from_folder

Hi, I have questions about loading image data with ImageDataBunch.

My data path looks like ./label_names/xxx.jpg

and I set:
path = Path('./')

If I use ImageDataBunch.from_name_re, it requires the file names, which are supposed to come from
fname = get_image_files(path). However, this method does not recurse into subfolders, so fname comes back as []. Question: Is there a way to make get_image_files recurse into subfolders? (A simple loop would do the job if the tree is only one level deep.)

If I use ImageDataBunch.from_folder, it requires you to rearrange your folders into ./train/label_names/xxx.jpg and ./valid/label_names/xxx.jpg. Preparing a validation set by hand is common practice, but sometimes I would like it to be generated automatically. Is there a function in the library to do this?

Thanks a lot.


You can create your file list any way you like. I'd suggest this:

https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
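As a rough illustration of that suggestion, here is a minimal sketch that walks all subfolders with Path.rglob and keeps only image files. The helper names list_images_recursive and label_from_parent, and the set of suffixes, are made up for this example and are not part of fastai.

```python
from pathlib import Path

# Assumed set of image suffixes for this sketch; extend it as needed.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp', '.gif'}

def list_images_recursive(root):
    """Return a sorted list of image files found anywhere under root."""
    root = Path(root)
    return sorted(
        p for p in root.rglob('*')
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )

def label_from_parent(path):
    """Use the name of the containing folder as the label (hypothetical helper)."""
    return Path(path).parent.name
```

With a layout like ./label_names/xxx.jpg, list_images_recursive('./') should give the kind of file list that from_name_re expects, and label_from_parent shows where the label sits in each path.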

How can I use ImageDataBunch when my image dataset is a numpy array of shape 60000 x 784, where each of the 60000 images is flattened to 784 pixels?
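One possible first step, sketched below, is to turn the flat rows back into 2-D images. This assumes the images are square, single-channel and 28 x 28 (since 784 = 28 * 28), which the question does not actually state. The function name unflatten_images is hypothetical, and the sketch does not cover building the ImageDataBunch itself.

```python
import numpy as np

def unflatten_images(flat, height=28, width=28):
    """Reshape an (n, height*width) array into (n, height, width).

    The 28 x 28 default is an assumption based on 784 = 28 * 28.
    """
    flat = np.asarray(flat)
    if flat.ndim != 2 or flat.shape[1] != height * width:
        raise ValueError(
            f'expected shape (n, {height * width}), got {flat.shape}'
        )
    # Row-major reshape: the first `width` values form the first image row.
    return flat.reshape(-1, height, width)
```

For a 60000 x 784 array this would give an array of shape (60000, 28, 28), which could then be saved to image files or wrapped in a custom dataset.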

This is an example of glob usage:

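import numpy as np
from itertools import chain

# Create a list of all files one level below path_train (label_folder/file).
# path_train is assumed to be a pathlib.Path pointing at the training folder;
# chain.from_iterable stands in for the flat_list helper, which is not defined here.
all_files = list(chain.from_iterable(d.glob('*') for d in path_train.glob('*') if d.is_dir()))
np.random.shuffle(all_files)  # Shuffle in place to avoid any bias from ordering
print('Files count:', len(all_files))
print('sample: ', all_files[:10])
files = all_files  # Use the full list as the working file set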
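On the earlier question about generating a validation set automatically: once you have a file list like the one above, a random split is only a few lines. This is a minimal sketch in plain Python; split_train_valid and its parameters are names invented for this example. As far as I know, the fastai v1 ImageDataBunch factory methods also accept a valid_pct argument for the same purpose, but check that against your installed version.

```python
import random

def split_train_valid(files, valid_pct=0.2, seed=42):
    """Randomly split files into (train, valid) lists.

    valid_pct is the fraction held out for validation; seed makes the
    split reproducible. Both defaults are arbitrary choices for this sketch.
    """
    files = list(files)          # Copy so the caller's list is not reordered
    rng = random.Random(seed)    # Local RNG, leaves the global random state alone
    rng.shuffle(files)
    n_valid = int(len(files) * valid_pct)
    return files[n_valid:], files[:n_valid]
```

Usage would look like train_files, valid_files = split_train_valid(files, valid_pct=0.2). Fixing the seed keeps the validation set stable between runs, which makes results easier to compare.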