Count images per vocabulary item

I created a DataBlock and the corresponding DataLoaders. I'm working with images and mapping them to a MultiCategoryBlock. I can inspect the vocabulary with dls.train.vocab. Is there a quick way to see how many images I have per category?

Some (pseudo) code:

```python
import fastbook
fastbook.setup_book()
from fastai.vision.all import *
from fastbook import *

path = Path('images')  # hypothetical folder holding the image files

d = {'file': ['file1.jpg', 'file1.jpg', 'file1.jpg'], 'tags': ['a b', 'a', 'b']}
df = pd.DataFrame(data=d)

def get_x(r): return path/r['file']           # full path to the image
def get_y(r): return r['tags'].split(' ')     # space-separated tags -> list of labels

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=get_x, get_y=get_y,
                   item_tfms=RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df, bs=1)
dls.train.vocab
# ['a', 'b']

# pseudo code (category_len does not exist, it shows the output I want):
dls.train.category_len
# {'a': 2, 'b': 2}
```
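As a cross-check of the expected output, the same per-tag counts can be computed straight from the DataFrame with pandas, without building DataLoaders or opening any images. This is a minimal sketch that assumes the tags are space-separated strings as in the example; note that it counts over the whole DataFrame, i.e. before the train/validation split.

```python
import pandas as pd

# The example DataFrame from the question
d = {'file': ['file1.jpg', 'file1.jpg', 'file1.jpg'], 'tags': ['a b', 'a', 'b']}
df = pd.DataFrame(data=d)

# Split each tag string into a list, give every tag its own row, then count
counts = df['tags'].str.split(' ').explode().value_counts().sort_index()
print(counts.to_dict())  # {'a': 2, 'b': 2}
```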

Hi danielsj!

Try the code below; I believe ‘train_label_counter’ will give you what your pseudo code describes:

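```python
from collections import Counter

classes = dls.vocab
# With a MultiCategoryBlock each target is a one-hot tensor, so map every set position back to its label
train_lbls = L(classes[i] for _, y in dls.train_ds for i in y.nonzero().flatten().tolist())
train_label_counter = Counter(train_lbls)
```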
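Because MultiCategoryBlock one-hot encodes the targets, another way to get the same numbers is to sum the stacked target vectors column by column. The sketch below uses NumPy arrays as hypothetical stand-ins for the targets, assuming each row is one image and the columns follow the vocab order.

```python
import numpy as np

# Hypothetical stand-ins: the vocab, and one multi-hot row per image ('a b', 'a', 'b')
vocab = ['a', 'b']
targets = np.array([[1., 1.],
                    [1., 0.],
                    [0., 1.]])

# Summing down the rows counts how many images carry each label
counts = dict(zip(vocab, targets.sum(axis=0).astype(int).tolist()))
print(counts)  # {'a': 2, 'b': 2}
```

With the DataLoaders above, the rows would be the targets from dls.train_ds (for example torch.stack([y for _, y in dls.train_ds]).sum(0)), which, like the Counter version, still opens every image.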

Regards,
Pau
