How to create categories from image titles when using ImageDataBunch.from_folder()?

oo92 · May 10, 2020, 10:35pm

Hi.

When using ImageDataBunch.from_name_re(), fastAI takes care of creating the categories by using the name of each image. We simply have to input a regex string like this:

file_parse = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'
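To make concrete what that pattern captures, here is a small sketch using only Python's standard `re` module. The helper name `label_from_path` and the example file paths are made up for illustration; this is not fastai's own code, just the same regex applied by hand.

```python
import re

# Same pattern as above: capture everything between the last '/' and the
# trailing '_<digits>.<extension>' part of the file name.
file_parse = r'/([^/]+)_\d+\.(png|jpg|jpeg)$'


def label_from_path(path):
    """Return the captured label, or None if the path does not match.

    Hypothetical helper for illustration only.
    """
    match = re.search(file_parse, path)
    return match.group(1) if match else None


# Hypothetical paths, assumed to follow a '<label>_<number>.<ext>' naming scheme.
print(label_from_path('/data/images/apple_pie_123.jpg'))  # apple_pie
print(label_from_path('/data/images/sushi_7.png'))        # sushi
print(label_from_path('/data/images/123.jpg'))            # None (no '<label>_' prefix)
```

If the file names do not follow the `<label>_<number>` scheme, the pattern matches nothing and no label can be derived from the name.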

Is there a way to do this when using ImageDataBunch.from_folder()?

np.random.seed(42)
data = ImageDataBunch.from_folder('/content/Food-101/images', train='/content/Food-101/images/train', test='/content/Food-101/images/test', valid_pct=0.2, ds_tfms=get_transforms(), size=224)
data.normalize(imagenet_stats)

Because when I run epochs, this is what fastAI gives me:

epoch	train_loss	valid_loss	error_rate	time
0	0.000000	0.000000	0.000000	21:05
1	0.000000	0.000000	0.000000	20:43
2	0.000000	0.000000	0.000000	20:38
3	0.000000	0.000000	0.000000	19:38
4	0.000000	0.000000	0.000000	20:01

I believe this is because my file_parse variable isn’t used anywhere.

shikamaru · May 12, 2020, 2:28pm

Hi Onur,

I think it is not possible to do this directly with ImageDataBunch, but you can use the data block API to create it.

data = (ImageList.from_folder(path)
        .split_by_folder()
        .label_from_re(pattern)
        .databunch())

This is the basic idea.

You can look here for more details: https://docs.fast.ai/data_block.html

oo92 · May 12, 2020, 5:00pm

But where does training and test go into that?

shikamaru · May 12, 2020, 6:34pm

We use the same data block as before, but add ‘add_test_folder’ to get the test data.

data = (ImageList.from_folder(path)
        .split_by_folder()
        .label_from_re(pattern)
        .add_test_folder()
        .databunch())

In the fastai framework, the test set has no labels, because it is the data we want predictions for. If you want to see how well your model does on data for which you already have labels, treat that data as the validation set.

If it is indeed a validation set, then .from_folder() expects the folder to be named ‘valid’. If it has a different name, you can do this:
.from_folder(train=name,valid=name)

oo92 · May 12, 2020, 11:14pm

I don’t get it. The validation set is to be created from the training set. I have an additional, labelled test set as well.

shikamaru · May 13, 2020, 8:59am

You do not always have to take the validation set out of the training data. Since you have a separate labelled set for checking how your model is doing, you don’t need to hold out part of the training set to make a validation set.
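For comparison, holding out a fraction of the training data (what `valid_pct=0.2` asks for in the original call) amounts to roughly the following. This is a plain-Python sketch under my own assumptions, not fastai's implementation; `split_by_pct` and the file names are hypothetical.

```python
import random


def split_by_pct(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and hold out `valid_pct` of them.

    Hypothetical helper: returns (train_items, valid_items).
    """
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = list(items)          # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]


# Made-up file names standing in for the training images.
files = [f"img_{i}.jpg" for i in range(10)]
train, valid = split_by_pct(files)
print(len(train), len(valid))  # 8 2
```

With a separate labelled folder used as the validation set, this step is skipped and every training image stays in the training set.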