Unbalanced class distribution -> Classes missing from validation set

Hi! I’m working on a tabular classification dataset where the classes are unequally distributed, which often leaves the validation set with zero samples of some classes. What would be the best way to approach this? Does Fast.ai have a method to ensure all classes are included in the validation set, or should I go for upsampling/downsampling?

I am wondering the same thing. I have an image classification dataset and not all classes are showing up in the validation/test data.

I have 5 classes (labeled: 1, 2, 3, 4, 5) and this is the result of:
data = ImageDataBunch.from_name_re(path_img, fnames, pat, size=224, bs=bs, ds_tfms=get_transforms()).normalize(imagenet_stats)

ImageDataBunch;

Train: LabelList (4400 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
5,3,3,3,3
Path: /home/jupyter/SCUT-FBP5500_v2.1/Images;

Valid: LabelList (1100 items)
x: ImageList
Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224),Image (3, 224, 224)
y: CategoryList
5,4,4,4,3
Path: /home/jupyter/SCUT-FBP5500_v2.1/Images;

Test: None

This is the number of instances of each class:
1 (169)
2 (789)
3 (2576)
4 (1166)
5 (800)
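
For reference, one common way to make sure every class shows up in the validation set is a stratified split: take the validation fraction from each class separately instead of from the whole dataset at once. Below is a minimal sketch in plain Python, using the class counts listed above. The function name `stratified_split` and its parameters are made up for illustration and are not part of fastai. scikit-learn's `train_test_split` has a `stratify` argument that does the same job if that library is available.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, valid_frac=0.2, seed=42):
    """Return (train_idx, valid_idx), sampling valid_frac from each class separately.

    Hypothetical helper for illustration only; not a fastai function.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, valid_idx = [], []
    for label, idxs in by_class.items():
        rng.shuffle(idxs)
        # Take at least one validation sample per class.
        n_valid = max(1, round(len(idxs) * valid_frac))
        # Leave at least one sample for training when the class has more than one.
        if len(idxs) > 1:
            n_valid = min(n_valid, len(idxs) - 1)
        valid_idx.extend(idxs[:n_valid])
        train_idx.extend(idxs[n_valid:])
    return sorted(train_idx), sorted(valid_idx)


# Rebuild a label list from the class counts quoted in this post.
counts = {1: 169, 2: 789, 3: 2576, 4: 1166, 5: 800}
labels = [cls for cls, n in counts.items() for _ in range(n)]

train_idx, valid_idx = stratified_split(labels, valid_frac=0.2)
print("validation class counts:", Counter(labels[i] for i in valid_idx))
```

The resulting index lists can then be handed to whatever splitting mechanism the data pipeline offers. A class with only a single sample would end up in validation only, so very rare classes may still need upsampling or merging.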

The hashing trick offers one solution to the problem of incomplete classes in categorical data: a hash function maps feature values deterministically (but possibly non-uniquely) onto a fixed set of categories. A related case, in which the data may contain previously unseen classes, is out-of-core learning: the data is too large to fit in memory, so you can only process one batch at a time, and a given batch might not contain examples of every class.
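
To make the idea concrete, here is a rough sketch of the hashing trick in plain Python. It hashes categorical feature values into a fixed number of buckets, so a value that never appeared before still gets a valid index. The function names, the bucket count, and the example row are assumptions for illustration, not taken from any particular library or competition solution. `hashlib.md5` is used instead of the built-in `hash()` because the latter is randomized between Python processes for strings.

```python
import hashlib


def hash_bucket(value, n_buckets=2**20):
    """Map any value deterministically to an integer in [0, n_buckets)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def hash_features(row, n_buckets=2**20):
    """Hash each column=value pair of a row (a dict) to a bucket index.

    Prefixing with the column name keeps identical values in different
    columns from always landing in the same bucket.
    """
    return [hash_bucket(f"{col}={val}", n_buckets) for col, val in row.items()]


# Hypothetical row with two categorical columns.
row = {"site_id": "85f751fd", "device_type": "1"}
indices = hash_features(row, n_buckets=1024)
print(indices)

# A value never seen during training still maps to a valid bucket.
print(hash_bucket("device_type=never_seen_before", n_buckets=1024))
```

Different values can collide in the same bucket, which is the "non-unique" part; a larger `n_buckets` reduces collisions at the cost of a wider feature space.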

“Don’t be tricked by the Hashing Trick” is a great blog post on the hashing trick.

Here’s an example of the implementation of the hashing trick for the Kaggle Avazu Click Through Rate Competition.

@jcatanza I’ll look into this, thanks.