How to create categories from image titles when using ImageDataBunch.from_folder()?

Hi.

When using ImageDataBunch.from_name_re(), fastai creates the categories from the name of each image file. We simply have to pass a regex string like this:

file_parse = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'
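To see what label a pattern like this extracts, it can be tried on a path with Python's re module. This is only a minimal sketch: the file names below are made up for illustration and are not taken from the actual dataset.

```python
import re

# Pattern from the post: capture everything between the last "/" and the
# trailing "_<digits>.<extension>".
file_parse = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'

# Hypothetical path, used only for illustration.
example_path = '/content/Food-101/images/train/apple_pie_123.jpg'

match = re.search(file_parse, example_path)
label = match.group(1) if match else None
print(label)

# A file name without a "_<digits>" suffix does not match at all,
# so no label can be extracted from it.
no_match = re.search(file_parse, '/content/Food-101/images/train/123.jpg')
print(no_match)
```

Checking a handful of real file names this way shows quickly whether the pattern fits the naming scheme of the dataset.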

Is there a way to do this when using ImageDataBunch.from_folder()?

np.random.seed(42)
data = ImageDataBunch.from_folder('/content/Food-101/images',
                                  train='/content/Food-101/images/train',
                                  test='/content/Food-101/images/test',
                                  valid_pct=0.2, ds_tfms=get_transforms(), size=224)
data.normalize(imagenet_stats)

I ask because this is what fastai gives me when I run a few epochs:

epoch  train_loss  valid_loss  error_rate  time
0      0.000000    0.000000    0.000000    21:05
1      0.000000    0.000000    0.000000    20:43
2      0.000000    0.000000    0.000000    20:38
3      0.000000    0.000000    0.000000    19:38
4      0.000000    0.000000    0.000000    20:01

I believe this is because my file_parse variable isn’t used anywhere.

Hi Onur,

I don't think this is possible with ImageDataBunch directly, but you can use the data block API to build it:

data = (ImageList.from_folder(path)
        .split_by_folder()
        .label_from_re(pattern)
        .databunch())

This is the basic idea.

You can find more details here: https://docs.fast.ai/data_block.html

But where does training and test go into that?

We use the same data block as before, but add add_test_folder() to pick up the test data.

data = (ImageList.from_folder(path)
        .split_by_folder()
        .label_from_re(pattern)
        .add_test_folder()
        .databunch())

In fastai, the test set has no labels, because it is meant for data you want predictions for. If you want to see how well your model does on data you already have labels for, treat that data as the validation set.

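If it is in fact a validation set, ImageDataBunch.from_folder() expects the folder to be named 'valid'. If it has a different name, pass the folder names explicitly (split_by_folder() in the data block API takes the same train and valid arguments):
ImageDataBunch.from_folder(path, train=train_name, valid=valid_name)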
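Conceptually, the folder split decides train versus validation from the folder an image sits in, and the regex labelling takes the label from the first capture group of the pattern. The sketch below imitates those two steps in plain Python so they can be inspected without fastai. It is not fastai's implementation: the helper split_and_label and the example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath

def split_and_label(paths, pattern, train='train', valid='valid'):
    """Group relative paths by top-level folder and label each one by regex."""
    regex = re.compile(pattern)
    splits = {train: [], valid: []}
    for p in paths:
        folder = PurePosixPath(p).parts[0]
        if folder not in splits:
            continue  # ignore anything outside the two named folders
        match = regex.search(p)
        if match is None:
            raise ValueError(f'pattern did not match {p!r}')
        splits[folder].append((p, match.group(1)))
    return splits

# Hypothetical file layout, used only for illustration.
paths = [
    'train/apple_pie_1.jpg',
    'train/sushi_2.jpg',
    'valid/apple_pie_3.jpg',
    'test/sushi_4.jpg',
]
pattern = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'

# Default folder names: 'train' and 'valid'.
default_splits = split_and_label(paths, pattern)
print(default_splits['train'])
print(default_splits['valid'])

# Treat the labelled 'test' folder as the validation set instead.
test_as_valid = split_and_label(paths, pattern, valid='test')
print(test_as_valid['test'])
```

Passing valid='test' here mirrors the idea of pointing the validation split at a folder that is not literally named 'valid'.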

I don’t get it. The validation set is to be created from the training set. I have an additional, labelled test set as well.

The validation set does not always have to come out of the training data. Since you have a separate labelled set for checking how your model is doing, you don't need to hold out part of the training set as a validation set.