Defining get_y

I am training a multi-label image classification model and running inference from Python code (Django framework).
Below is the DataBlock I defined:

forms = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=lambda o: [Path(o).parent.name],
)

When I execute learn.export(), I get the error below:

PicklingError: Can't pickle <function <lambda> at 0x0000028B306A3948>: attribute lookup <lambda> on __main__ failed

For certain reasons, I don’t want to use a custom function for get_y.
Could someone please suggest another way to define the dependent variable?
The images of each class are stored in a separate folder, and y should be the name of that folder. Thanks in advance.

Well, you have to. You can’t pickle a lambda function, that just doesn’t work. So just wrap what you do in an actual function.

For example, I can do: def label_func(o): return [o]

If my lambda was: lambda x: [x]
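To make the point concrete, here is a minimal sketch using only the standard library (no fastai). The names parent_as_list and can_pickle are made up for illustration. pickle stores a function by its module and name rather than by its code, which is why a lambda fails and a named function defined at module level can work.

```python
import pickle
from pathlib import Path


def parent_as_list(o):
    """Named equivalent of `lambda o: [Path(o).parent.name]` (hypothetical name)."""
    return [Path(o).parent.name]


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A lambda's name is "<lambda>", which pickle cannot look up again, so this fails.
lambda_ok = can_pickle(lambda o: [Path(o).parent.name])

# A named, module-level function can be found again by name, so pickle
# is normally able to store a reference to it.
named_ok = can_pickle(parent_as_list)

# The named function produces the same label the lambda would.
label = parent_as_list("forms/invoice/scan_001.png")
print(lambda_ok, named_ok, label)
```

The folder names in the example path are invented; only the parent folder ("invoice") ends up in the label.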


Thanks for your quick response.

get_y should return only the parent folder name.
Instead of a lambda, I tried defining a custom function, as below:

def parent_labeler(o, **kwargs):
    return [Path(o).parent.name]

But during inference (Django, Python), the custom function is expected to be defined in manage.py.
I believe that is not a standard approach, and it will not work in production, where the app is deployed and hosted as a WSGI app and manage.py is never invoked.

So I am looking for a solution other than defining a custom function.
When using CategoryBlock instead of MultiCategoryBlock, the code below works well:

get_y=parent_label

But when I use it with MultiCategoryBlock, the vocab contains single characters instead of the parent folder names, which I don’t understand:

learn.dls.vocab.items

What you’re asking simply isn’t possible. You will need that function available when you export and load your model no matter what, just as you need the fastai library itself when you export and deploy. That is simply how the API works.

Your parent_label example works because it lives inside the fastai library, so it is already importable wherever fastai is installed.

If you really don’t want to do that, you can make a small git repo and install the bits and bobs you need from a private library.
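A rough sketch of why a shared, importable module helps, using plain pickle rather than fastai. The module name form_labelers is hypothetical, and the temporary directory only stands in for an installed package. Because pickle records the function as "module.name", any process that can import that module can load it again.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source of a tiny, hypothetical shared module holding the labelling function.
MODULE_SOURCE = '''
from pathlib import Path

def parent_labeler(o, **kwargs):
    return [Path(o).parent.name]
'''

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for "pip install" of a private package: put the module on sys.path.
    Path(tmp, "form_labelers.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        form_labelers = importlib.import_module("form_labelers")
        # The payload holds only a reference: module name + function name.
        payload = pickle.dumps(form_labelers.parent_labeler)
        restored = pickle.loads(payload)
    finally:
        sys.path.remove(tmp)

print(b"form_labelers" in payload, restored("data/ABC/img_01.png"))
```

In a real project the training script and the Django app would both import the function from the same installed module, so the reference stored at export time resolves at load time.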


Thanks, Zach.

I have no problem importing the custom function definition where I call load_learner, but it is expected to be in manage.py itself (Django), not just in the class where we load the .pkl file.
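A likely explanation, sketched with plain pickle: if the function was defined in the script or notebook that ran the export, pickle records it as `__main__.parent_labeler`, and at load time `__main__` is whatever script started the process (manage.py under the Django dev server). The payload below is hand-written to mimic such a reference, and labeler_from_app_module is a hypothetical stand-in. Attaching the function to `__main__` before loading is a workaround some people use; it is an assumption that it suits your deployment, and exporting with the function imported from a shared module is the cleaner route.

```python
import pickle
import sys
from pathlib import Path

# Hand-written pickle that refers to `__main__.parent_labeler` by name only,
# mimicking what an export made from a script or notebook would contain.
PAYLOAD = b"c__main__\nparent_labeler\n."


def labeler_from_app_module(o, **kwargs):
    """Stand-in for the function as defined in an importable app module."""
    return [Path(o).parent.name]


main_module = sys.modules["__main__"]

# Without the attribute on __main__, the lookup fails at load time.
try:
    pickle.loads(PAYLOAD)
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# Workaround: expose the function under the expected name before loading.
setattr(main_module, "parent_labeler", labeler_from_app_module)
restored = pickle.loads(PAYLOAD)

print(lookup_failed, restored("data/CDE/img_07.png"))
```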

On another note, any idea why vocab returns so many single characters when parent_label is used with MultiCategoryBlock? I am wondering how to use the parent_label function together with MultiCategoryBlock.

Why do we get single characters in vocab when parent_label is used for get_y with MultiCategoryBlock?

Because a string is an array of letters. MultiCategoryBlock expects an array as your label. That is why, for the single-label multi-category case (what you are trying to do), we wrap the label in an outer array so that the category is the only item in it.
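The effect can be reproduced without fastai. The sketch below is a simplified model of building a multi-label vocab by iterating over each label, not fastai's actual code; build_vocab is a made-up helper.

```python
def build_vocab(labels):
    """Collect the unique items found by iterating over every label."""
    vocab = set()
    for label in labels:
        # Iterating a str yields its characters; iterating a list yields its items.
        vocab |= set(label)
    return sorted(vocab)


# What parent_label returns: bare strings.
string_labels = ["ABC", "CDE"]
# What the wrapped version returns: one-item lists.
list_labels = [["ABC"], ["CDE"]]

chars_vocab = build_vocab(string_labels)
names_vocab = build_vocab(list_labels)
print(chars_vocab)  # ['A', 'B', 'C', 'D', 'E']
print(names_vocab)  # ['ABC', 'CDE']
```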


I was trying to learn how MultiCategoryBlock works when get_y is defined as parent_label.

I tried it with a test sample. Below are 2 folders (just for testing) containing images of 2 different categories: ‘ABC’ and ‘CDE’.
[image: the two test folders]

Below is my DataBlock:
[image: DataBlock definition]
and the vocab output is as below:
[image: vocab output]

After fine-tuning, a test prediction gives the result below:
[image: prediction result]

Each letter of the folder name has been treated as a separate class.

Without a custom function, I believe the only option is to give each folder a distinct single-character name and, after prediction, map each character back to the required category name.
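A minimal sketch of that mapping step, assuming folders named "A" and "C" stand for the categories ‘ABC’ and ‘CDE’. The dictionary and the function name are hypothetical and would need to match your own folder names.

```python
# Hypothetical mapping from single-character folder names to real category names.
CODE_TO_CATEGORY = {"A": "ABC", "C": "CDE"}


def decode_prediction(predicted_codes):
    """Translate predicted single-character labels into category names."""
    return [CODE_TO_CATEGORY[code] for code in predicted_codes]


print(decode_prediction(["A"]))       # ['ABC']
print(decode_prediction(["A", "C"]))  # ['ABC', 'CDE']
```

An unknown code raises KeyError here, which makes a mismatch between folder names and the mapping visible early.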

Thanks @muellerzr for the explanations.