Questions about ImageDataBunch.from_name_re and ImageDataBunch.from_folder

Hi, I have some questions about loading image data with ImageDataBunch.

My data path looks like ./label_names/xxx.jpg

and I set:
path = Path('./')

If I use ImageDataBunch.from_name_re, it requires the list of file names, which is supposed to come from
fname = get_image_files(path). However, this call does not recurse into the subfolders, so fname comes back as []. Question: is there a way to make get_image_files go into subfolders recursively? (A simple loop would do the job if the tree is only one level deep.)
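
For reference, a recursive listing can be sketched with pathlib alone. The helper name list_images_recursive and the extension set below are assumptions made up for illustration, not part of fastai. I believe some fastai versions also accept a recurse argument on get_image_files, but that is worth checking against the docs of the installed version.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def list_images_recursive(root):
    """Return a sorted list of image files found under root, at any depth."""
    root = Path(root)
    # rglob("*") walks every subfolder; keep only files with a known extension.
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.suffix.lower() in IMAGE_EXTS)
```

With the layout above, list_images_recursive('./') should pick up every ./label_names/xxx.jpg, and the result can then be passed wherever a list of file names is expected.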

If I use ImageDataBunch.from_folder, it requires you to rearrange your folders into ./train/label_names/xxx.jpg and ./valid/label_names/xxx.jpg. Preparing a validation set by hand is common practice, but sometimes I would like it to be generated automatically. Is there a function in the library that does this?
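
To make the idea of an automatic split concrete, here is a minimal sketch in plain Python. The function name split_train_valid and its defaults are hypothetical, chosen only for this example. As far as I know, fastai v1's from_folder also takes a valid_pct argument (used together with train='.'), which would do the split inside the library, but please verify that for your version.

```python
import random

def split_train_valid(files, valid_pct=0.2, seed=42):
    """Shuffle a copy of files and split it into (train, valid) lists."""
    if not 0.0 < valid_pct < 1.0:
        raise ValueError("valid_pct must be strictly between 0 and 1")
    files = list(files)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(files)  # seeded shuffle keeps the split reproducible
    n_valid = int(len(files) * valid_pct)
    return files[n_valid:], files[:n_valid]
```

Fixing the seed means the same files land in the validation set on every run, which keeps results comparable between experiments.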

Thanks a lot.

You can create your file list any way you like. I’d suggest this:

https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob

How can I use ImageDataBunch when my image dataset is a NumPy array of shape 60000 x 784, i.e. 60000 images, each flattened to 784 pixels?

This is an example of glob usage:

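import numpy as np

def flat_list(nested):
    return [item for sub in nested for item in sub]  # flatten one level of nesting

# Create list of all files (path_train is assumed to be a pathlib.Path defined earlier)
all_files = flat_list([d.glob('*') for d in path_train.glob('*') if d.is_dir()])
np.random.shuffle(all_files)  # Ensure no bias from ordering
print('Files count: ' + str(len(all_files)))
print('sample: ', all_files[:10])
files = all_files  # Use the full list from here on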

Hey,
I have the same question as the OP. I went through the link above, but I haven’t fully understood how to apply it. As far as I know from the fast.ai docs, to use from_name_re one has to create some sort of function “fn_paths” and pass it as an argument to from_name_re.

Since my data path looks like …/label_name/__.jpg, how would I go about creating a function that extracts the label names from it? The glob example above doesn’t work for me and throws the error: “name ‘flat_list’ is not defined”. Even if I imported the right libraries, I wouldn’t know how to go on from there.

If possible, is there an easier-to-understand way to do this?
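
One possible way to pull the label out of a path like the one above is sketched here, assuming the label is simply the name of the folder that directly contains the image. The names LABEL_PAT, label_from_path and label_from_regex are hypothetical and not part of fastai. As far as I understand it, from_name_re expects a list of file paths plus a regex pattern whose first group captures the label, so a pattern along the lines of LABEL_PAT may be the piece that is needed there, though that should be checked against the docs.

```python
import re
from pathlib import Path

# Captures the folder name directly above the file, for "/" or "\" separators.
LABEL_PAT = re.compile(r"(?:^|[\\/])([^\\/]+)[\\/][^\\/]+$")

def label_from_path(p):
    """Return the parent folder name of p, used here as the class label."""
    return Path(p).parent.name

def label_from_regex(p):
    """Same idea, but using a regex on the path string."""
    m = LABEL_PAT.search(str(p))
    if m is None:
        raise ValueError(f"no parent folder found in {p!r}")
    return m.group(1)
```

For a file such as data/cats/001.jpg, both helpers give cats. The pathlib version is easier to read; the regex version is closer to what a pattern-based loader would apply to each file name.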