AssertionError with HuggingFace dataset and vision_learner

e-eight · August 19, 2023, 9:28am

I am trying to load image classification datasets from the HuggingFace datasets library and train on them with fastai. For example, for Fashion MNIST, I have the following code:

from datasets import load_dataset
from fastai.data.core import DataLoaders
from fastai.vision.all import *
from torch.utils.data import DataLoader
import torch

ds_name = "fashion_mnist"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_ds = load_dataset(ds_name, split="train").with_format("torch", device=device)
valid_ds = load_dataset(ds_name, split="test").with_format("torch", device=device)

train_dl = DataLoader(train_ds, batch_size=256, shuffle=True, num_workers=1, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size=256, shuffle=False, num_workers=1, pin_memory=True)
dls = DataLoaders(train_dl, valid_dl)

learn = vision_learner(dls, "convnext_base", metrics=error_rate)

The last line raises an AssertionError saying that n_out is not defined and could not be ascertained from the data. I am not sure how to tell fastai which key in train_ds and valid_ds holds the output labels, since DataLoaders does not take a y_name keyword argument.
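
One possible direction, sketched under assumptions rather than tested against this exact setup: vision_learner accepts an n_out argument, so the number of classes can be passed explicitly instead of being inferred from dls. Separately, a plain PyTorch DataLoader over a HuggingFace dataset yields dict batches, while fastai's training loop generally expects (inputs, targets) tuples, so a custom collate_fn may be needed. The helper names make_xy_collate and count_classes below are hypothetical, the "image" and "label" keys are assumed to match the dataset's columns, and the default stack=list is there only so the sketch runs without torch (torch.stack would be the likely choice in practice).

```python
def make_xy_collate(stack=list, x_key="image", y_key="label"):
    """Build a collate function that turns dict samples into an (x, y) tuple.

    `stack` combines a list of per-sample values into one batch object.
    Assumption: with real tensors one would pass `torch.stack` here.
    """
    def collate(samples):
        # Each sample is assumed to be a dict such as {"image": ..., "label": ...}
        xs = stack([s[x_key] for s in samples])
        ys = stack([s[y_key] for s in samples])
        return xs, ys
    return collate


def count_classes(labels):
    """Count distinct label values, e.g. to pass as `n_out`."""
    return len(set(labels))


# Toy data standing in for two dataset rows (hypothetical values)
toy_samples = [
    {"image": [0, 1], "label": 3},
    {"image": [1, 1], "label": 7},
]
xb, yb = make_xy_collate()(toy_samples)
print(xb, yb)
print(count_classes([0, 1, 1, 9]))
```

In the real setup this might look like DataLoader(train_ds, batch_size=256, collate_fn=make_xy_collate(torch.stack)) together with vision_learner(dls, "convnext_base", n_out=10, metrics=error_rate), where 10 is the number of Fashion MNIST classes. Further changes, for example to the channel dimension and dtype of the images, may still be needed before training runs.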