How to read custom labels to create DataLoaders

Hi, I am trying to build my own image classifier. I have already named my images based on certain criteria.
Example: grizly_001.jpg, black_002.jpg, etc.

def splitLabel(label):

bears = DataBlock(
blocks=(ImageBlock, CategoryBlock),
splitter=RandomSplitter(valid_pct=0.20, seed=42),
get_y= ## How do I custom read my labels??,

How do I tell the dataloader how to read my labels?

A couple of ways I know:

This assumes images and labels are in a directory structure like:
/bears/images/[images here]
/bears/labels/[labels here]

def get_y(x):
    return (str(x).replace('images','labels')) #<== or custom code to get your equivalent label for the image

bears = DataBlock(
... get_y=get_y, ...

The second way assumes the 'path' variable holds the '/bears' directory:

path = Path('/bears')

bears = DataBlock(
...  get_y=lambda o: path/'labels'/f'{o.stem}{o.suffix}', ...
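For file names like grizly_001.jpg, where the label is part of the name itself, a get_y could be a minimal sketch like the one below. The function name label_from_name and the assumption that the label is everything before the first underscore are mine, not from the original post.

```python
from pathlib import Path

def label_from_name(item):
    """Return the part of the file name before the first underscore."""
    # Assumption: files are named <label>_<number>.<ext>, e.g. grizly_001.jpg.
    # Path(...).name drops the directories, so only the file name is split.
    return Path(item).name.split('_')[0]

print(label_from_name(Path('/bears/images/grizly_001.jpg')))  # prints grizly
```

It would then be passed to the DataBlock the same way as above, with get_y=label_from_name.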

OK, thanks. So I can just write my own function and say get_y=My_Function(x)?

Do I have to give the input? In your example you define def get_y(x):, but in the DataBlock you say get_y=get_y. Should I say get_y=get_y(x) instead?


Nope, no x needed. Give it a try. You might want to look up pipelines.

Also if you put a print(x) in the “def get_y(x)” you will see the file that is sent in.
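To illustrate why no x is needed, here is a rough sketch in plain Python. apply_labeller is a hypothetical stand-in for what the DataBlock does with the function it receives (it is not fastai code): it keeps the function and calls it once per item.

```python
from pathlib import Path

def get_y(x):
    print(x)  # shows the item that was handed in
    # Assumption: the label is the file-name part before the first underscore
    return Path(x).name.split('_')[0]

def apply_labeller(labeller, items):
    # Hypothetical stand-in for the DataBlock: it calls the function itself,
    # once for every item, so the caller never supplies x.
    return [labeller(item) for item in items]

files = [Path('grizly_001.jpg'), Path('black_002.jpg')]
labels = apply_labeller(get_y, files)  # pass get_y, not get_y(x)
print(labels)  # prints ['grizly', 'black']
```

Writing get_y=get_y(x) would instead call the function right away (and fail, since x does not exist at that point) rather than hand the function over.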

In the end I decided to go with the RegexLabeller function. Windows 10 had trouble with the split function.

get_y=using_attr(RegexLabeller(r'(label_1|label_2|label_3)'), 'name'),

So the function will find one of the label options in the file name. If you want to do multi-label
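For anyone curious what the regex lookup amounts to, here is a plain-Python sketch using only the standard library. As far as I understand it, RegexLabeller searches for the pattern and returns the first captured group; regex_label and the label options below are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical label options; replace with your own.
LABEL_PATTERN = re.compile(r'(grizly|black|teddy)')

def regex_label(item):
    """Return the first label option found in the file name."""
    # Only the file name is searched, so directory separators do not matter.
    match = LABEL_PATTERN.search(Path(item).name)
    if match is None:
        raise ValueError(f'no known label in {item!r}')
    return match.group(1)

print(regex_label(Path('/bears/images/black_002.jpg')))  # prints black
```

Because the pattern is searched rather than split on a separator, it does not depend on how the path is written.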
