Use fastai to load/show batches from huggingface/nlp GLUE datasets

Hi all,
huggingface/nlp is a library that helps you download, preprocess, and cache datasets, and use metrics easily.

I created a novel integration to use fastai with it. The notebook GLUE_with_fastai.ipynb can

  • Load batches
  • Show them with show_batch

The code contains a lot of logic to load GLUE properly, but the core is HF_Dataset, HF_Datasets, and how I use them.
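To give a rough idea of what a dataset wrapper like this has to do, here is a minimal sketch in plain Python. It is not the notebook's HF_Dataset: ColumnDataset and iter_batches are hypothetical names I made up for illustration, the rows are invented, and only the column names (sentence1, sentence2, label) follow MRPC. The real code also has to deal with tokenization, padding and fastai's types.

```python
class ColumnDataset:
    """Hypothetical stand-in: wraps dict-style rows and returns tuples."""

    def __init__(self, rows, cols):
        self.rows = rows   # list of dicts, one dict per example
        self.cols = cols   # which columns to return, in order

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        # Turn one dict row into an (input..., label) tuple
        return tuple(self.rows[i][c] for c in self.cols)


def iter_batches(ds, bs):
    """Yield batches as tuples of per-column lists (no padding, no tensors)."""
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in range(start, min(start + bs, len(ds)))]
        yield tuple(list(col) for col in zip(*items))


# Made-up rows that only mimic the MRPC column layout
rows = [
    {"sentence1": "A cat sat.", "sentence2": "A cat was sitting.", "label": 1},
    {"sentence1": "It rained.", "sentence2": "The sun shone.", "label": 0},
    {"sentence1": "He left.", "sentence2": "He went away.", "label": 1},
]
ds = ColumnDataset(rows, cols=["sentence1", "sentence2", "label"])
first_batch = next(iter_batches(ds, bs=2))
```

Here first_batch is a tuple of three lists (both sentence columns and the labels), which is roughly the shape a show_batch-style function needs to display examples side by side.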

But it can’t even do dls.vocab, among many other things. I hope this novel thing helps people create a better integration between fastai and huggingface/nlp.

Also, this is actually the 3rd post of the series "Pretrain MLM and finetune on GLUE with fastai".

Follow me on Twitter (Richard Wang) to get updates on this series, and maybe updates on this novel integration.

(Spoiler alert: multi-task learning using fastai is on the way!!)


Yes, we can train a model with this novel integration.

I added training on MRPC, using the huggingface/nlp dataset, to the notebook.


Small update: Test set prediction

We can use this to generate files to submit to the GLUE benchmark.
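For anyone who wants to see what writing such a file could look like, here is a small sketch using only the standard library. It assumes the usual GLUE layout of a tab-separated file with an index column and a prediction column; the function names are mine, not from the notebook, and the expected file names and label formats differ per task, so please check the GLUE site before submitting.

```python
import csv
import io


def format_glue_submission(preds):
    """Return TSV text with an 'index' and a 'prediction' column.

    Assumption: preds is a sequence of predictions ordered like the test set.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["index", "prediction"])
    for i, pred in enumerate(preds):
        writer.writerow([i, pred])
    return buf.getvalue()


def write_glue_submission(preds, path):
    """Write the TSV text to path (for example 'MRPC.tsv')."""
    with open(path, "w", newline="") as f:
        f.write(format_glue_submission(preds))
```

With test set predictions as a list of class ids, calling write_glue_submission(preds, "MRPC.tsv") would give one file for that task.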