Custom dataloader for text data

I am trying to deepen my understanding of the fastai API. I want to be able to implement some things in “pure” PyTorch and then let fastai handle all of the optimization tricks.

I am trying simple text classification with my own dataloader class.
First, I get a RecursionError when I try to show one batch:

my_dls.show_batch()
res = L(b).map(partial(batch_to_samples,max_n=max_n))
RecursionError: maximum recursion depth exceeded while calling a Python object

What do I need to do in my class so that I can train a text classification model on my own custom data pipeline?

from torch.utils.data import DataLoader, Dataset
import pandas as pd
from fastai.data.core import DataLoaders
from torch.nn import CrossEntropyLoss

from fastai.text.all import *


# Example of data
# Entire data here: https://github.com/koaning/tokenwiser/blob/main/data/oos-intent.jsonl
d = {"text": "how would you say fly in italian", "label": "translate"}

data = pd.read_json("text.jsonl", lines = True)

class text_dataset(Dataset):
    def __init__(self, text, label):
        self.text = text
        self.label = label
        self.n_classes = len(set(self.label))
        self.vocab = [i for i in set(self.label)]

    def __len__(self):
        return len(self.label)

    def __getitem__(self, idx):
        text_i = self.text[idx]
        label_i = self.label[idx]
        return {"text": text_i, "label": label_i}

dls = text_dataset(data["text"], data["label"])
dls.n_classes

len(dls)

data_loader = DataLoader(dls)

my_dls = DataLoaders.from_dsets(dls)


my_dls.show_batch()
# RecursionError: maximum recursion depth exceeded while calling a Python object


# loss_func expects a loss instance, so CrossEntropyLoss is instantiated here
learn = text_classifier_learner(my_dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy, loss_func=CrossEntropyLoss())

# Also does not work
# learn = Learner(my_dls, AWD_LSTM, metrics=accuracy, loss_func=CrossEntropyLoss())


learn.fit_one_cycle(1)

You can use raw PyTorch, but know that you will lose access to show_batch and a few other bits that are specific to the fastai data API. (This doesn’t impact training, only quality-of-life features such as data exploration.) This tutorial in the docs explains it well, and it is applicable to any application in the fastai framework, not just images.
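To make the raw PyTorch route a bit more concrete for text: a model generally cannot consume a dict of raw strings, so the dataset would usually return numeric (x, y) pairs, and batches of unequal length need padding. Below is a minimal sketch of that idea, not the tutorial's code. It is written in plain Python so it runs without any dependencies; the names `IntentDataset`, `build_vocab` and `pad_collate`, the whitespace tokenizer, and the second toy sentence are all assumptions made for illustration.

```python
# Hypothetical names throughout; whitespace splitting is a stand-in
# for a real tokenizer.
PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens


def build_vocab(texts):
    """Map each whitespace token to an integer id, reserving 0 and 1."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for text in texts:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab))
    return vocab


class IntentDataset:
    """Map-style dataset: defines __len__ and __getitem__ like torch's Dataset."""

    def __init__(self, texts, labels):
        self.texts = list(texts)
        self.classes = sorted(set(labels))  # sorted for a stable label order
        self.label_to_idx = {c: i for i, c in enumerate(self.classes)}
        self.labels = [self.label_to_idx[label] for label in labels]
        self.vocab = build_vocab(self.texts)

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        ids = [self.vocab.get(tok, UNK) for tok in self.texts[idx].split()]
        # Return an (x, y) tuple of numbers rather than a dict of strings.
        return ids, self.labels[idx]


def pad_collate(batch):
    """Pad every sequence in the batch to the length of the longest one."""
    max_len = max(len(ids) for ids, _ in batch)
    xs = [ids + [PAD] * (max_len - len(ids)) for ids, _ in batch]
    ys = [y for _, y in batch]
    return xs, ys  # for real training, wrap both in torch.tensor(...)


# Toy data: the first row is from the question, the second is made up.
texts = ["how would you say fly in italian", "what is my balance"]
labels = ["translate", "balance"]
ds = IntentDataset(texts, labels)
xs, ys = pad_collate([ds[0], ds[1]])
```

With PyTorch installed, a dataset like this could be handed to `torch.utils.data.DataLoader(ds, batch_size=..., collate_fn=pad_collate)` (with the collate function returning tensors), and separate training and validation loaders could then be wrapped as `DataLoaders(train_dl, valid_dl)` before being passed to a `Learner` together with your own `nn.Module`.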
