Labeling training data with non-Latin characters

Hi! I am working with a Korean dataset of museum images. When I load the images with ImageDataBunch, I use a function that labels each image by its category, but the category names are in Korean.

data = ImageDataBunch.from_name_func(PATH/'data/images', fnames, label_func=func, ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))

These are some of the errors I am seeing:
UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
석 gray schist, 나무 나무에 채색, 도자기010
  if getattr(ds, 'warn', False): warn(ds.warn)
/Users/...../anaconda3/lib/python3.7/site-packages/matplotlib/backends/ RuntimeWarning: Glyph 53664 missing from current font.
  font.set_text(s, 0.0, flags=flags)
... ...
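
For reference, the number in the RuntimeWarning above looks like a Unicode code point. A quick standard-library check (just a sketch; the variable names are mine) suggests it is a Hangul syllable, so that warning seems to come from the plotting font lacking Korean glyphs rather than from the labels themselves:

```python
import unicodedata

# Code point reported in the matplotlib RuntimeWarning above.
missing_codepoint = 53664

char = chr(missing_codepoint)
name = unicodedata.name(char)

# Precomposed Hangul syllables occupy U+AC00..U+D7A3.
is_hangul_syllable = 0xAC00 <= missing_codepoint <= 0xD7A3

print(f"U+{missing_codepoint:04X} {char!r} {name} hangul={is_hangul_syllable}")
```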

I changed func to return a random English word as the label instead, and with that change these issues no longer appear.

Is there a way for me to specify that my labels use non-Latin characters?

Hi hellogoodbye, hope all is well.
Not sure if you have sorted this problem out yet.
Could you use some sort of codec, like the ones in the link, to convert the labels?
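
As a rough sketch of what I mean, assuming the goal is only to get ASCII-safe labels that can be turned back into Korean later: Python's built-in unicode_escape codec can do a reversible conversion. The helper names to_ascii_label and from_ascii_label are made up for this example and are not part of fastai; you would call the first one on the value your func returns.

```python
def to_ascii_label(label: str) -> str:
    """Escape non-ASCII characters, e.g. Hangul becomes \\uXXXX sequences."""
    return label.encode("unicode_escape").decode("ascii")


def from_ascii_label(escaped: str) -> str:
    """Reverse to_ascii_label and recover the original text."""
    return escaped.encode("ascii").decode("unicode_escape")


# Labels taken from the warning text in the question.
labels = ["석 gray schist", "나무 나무에 채색", "도자기010"]

escaped = [to_ascii_label(label) for label in labels]
restored = [from_ascii_label(item) for item in escaped]

for original, item in zip(labels, escaped):
    print(f"{original} -> {item}")
```

This would only sidestep the missing glyph warning when plotting. As far as I can tell, the "unknown labels" warning means some categories appear in the validation set but not in the training set, which can happen with labels in any script, so that part may need a look at how the data is split.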

Cheers mrfabulous1 :smiley::smiley: