Update to blurr library (huggingface-fastai integration for developers)

wgpubs · May 7, 2020, 1:29am

Just occurred to me that I didn’t post a link to this library on this thread before … so here it is.

Last week I released a new library with the goal of supporting the training of all huggingface transformer models with fastai v2. It's called blurr and you can read all about it here.

The latest update includes support for:

Sequence classification (multiclassification, multi-label classification)
Question Answering
Token Classification (new).

There are actually quite a few changes and enhancements, in large part thanks to the very cool pieces of software available in fastai v2, one in particular following a solid recommendation courtesy of @sgugger.

Everything is under very active development. With a goal of doing a release every 1-2 weeks, I'm hoping the library will be fully production-worthy by the time fastai v2 reaches its v1 release.

Valle998 · May 8, 2020, 12:16pm

How can I get this update in an older version? Is there a process for that? I want to know more.

martijnd · May 22, 2020, 12:09pm

Hello wgpubs. Big compliments on your library. You make it really easy to use one of the HF models.

Sequence Classification with only 2 labels works for me. However, if I have more than 2 labels, learn.lr_find / learn.fit no longer work.

I've tried to specify config.num_labels = len(labels.unique()) (5 labels in my case), but this doesn't work. Do I have to specify the labels somewhere else?

  ---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-229-ad38bd08516f> in <module>
      1 #slow
----> 2 learn.lr_find(suggestions=True)

/home/shared/blurr/fastai2/fastai2/callback/schedule.py in lr_find(self, start_lr, end_lr, num_it, stop_div, show_plot, suggestions)
    226     n_epoch = num_it//len(self.dls.train) + 1
    227     cb=LRFinder(start_lr=start_lr, end_lr=end_lr, num_it=num_it, stop_div=stop_div)
--> 228     with self.no_logging(): self.fit(n_epoch, cbs=cb)
    229     if show_plot: self.recorder.plot_lr_find()
    230     if suggestions:

/home/shared/blurr/fastai2/fastcore/fastcore/utils.py in _f(*args, **kwargs)
    429         init_args.update(log)
    430         setattr(inst, 'init_args', init_args)
--> 431         return inst if to_return else f(*args, **kwargs)
    432     return _f
    433 

/home/shared/blurr/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    201                     try:
    202                         self.epoch=epoch;          self('begin_epoch')
--> 203                         self._do_epoch_train()
    204                         self._do_epoch_validate()
    205                     except CancelEpochException:   self('after_cancel_epoch')

/home/shared/blurr/fastai2/fastai2/learner.py in _do_epoch_train(self)
    173         try:
    174             self.dl = self.dls.train;                        self('begin_train')
--> 175             self.all_batches()
    176         except CancelTrainException:                         self('after_cancel_train')
    177         finally:                                             self('after_train')

/home/shared/blurr/fastai2/fastai2/learner.py in all_batches(self)
    151     def all_batches(self):
    152         self.n_iter = len(self.dl)
--> 153         for o in enumerate(self.dl): self.one_batch(*o)
    154 
    155     def one_batch(self, i, b):

/home/shared/blurr/fastai2/fastai2/learner.py in one_batch(self, i, b)
    159             self.pred = self.model(*self.xb);                self('after_pred')
    160             if len(self.yb) == 0: return
--> 161             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    162             if not self.training: return
    163             self.loss.backward();                            self('after_backward')

/home/shared/blurr/fastai2/fastai2/layers.py in __call__(self, inp, targ, **kwargs)
    292         if targ.dtype in [torch.int8, torch.int16, torch.int32]: targ = targ.long()
    293         if self.flatten: inp = inp.view(-1,inp.shape[-1]) if self.is_2d else inp.view(-1)
--> 294         return self.func.__call__(inp, targ.view(-1) if self.flatten else targ, **kwargs)
    295 
    296 # Cell

/home/shared/conda/fastai2/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/home/shared/conda/fastai2/lib/python3.6/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    914     def forward(self, input, target):
    915         return F.cross_entropy(input, target, weight=self.weight,
--> 916                                ignore_index=self.ignore_index, reduction=self.reduction)
    917 
    918 

/home/shared/conda/fastai2/lib/python3.6/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2019     if size_average is not None or reduce is not None:
   2020         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2021     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2022 
   2023 

/home/shared/conda/fastai2/lib/python3.6/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   1836                          .format(input.size(0), target.size(0)))
   1837     if dim == 2:
-> 1838         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   1839     elif dim == 4:
   1840         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

IndexError: Target 2 is out of bounds.

wgpubs · May 23, 2020, 8:44pm

I’ll take a look and try to put an example together for multi-label. Thanks for the heads up!

wgpubs · May 23, 2020, 8:45pm

BTW … just updated the library with support for conditional text generation (things like summarization, conversational agents, etc…). Check it out and lmk how it works for y'all.

wgpubs · May 26, 2020, 3:41am

Are you doing multi-label classification? If so, check out this example I just added:

https://ohmeow.github.io/blurr/modeling-core/#Example-usage---Multi-label-classification

If not, lmk that too

wgpubs · July 7, 2020, 1:20am

FYI: Just updated everything to work with huggingface transformers >= 3.0.2!

You can install from master (instructions here), or via pypi here. You will need to use transformers >=3.0.2, which includes some fixes to make tokenizers picklable.
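Picklability matters because, as I understand it, fastai's Learner.export saves the whole Learner with torch.save (which uses pickle), so everything the transforms hold, tokenizer included, has to pickle cleanly. A quick way to test an object is a pickle round trip. This is a minimal sketch; is_picklable is a hypothetical helper, not part of blurr or fastai.

```python
import pickle


def is_picklable(obj):
    """Hypothetical helper: True if `obj` survives a pickle round trip.

    An object that fails here (e.g. a tokenizer from an older transformers
    release) would also be expected to break exporting the Learner.
    """
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PickleError, AttributeError, TypeError):
        return False


print(is_picklable({"max_length": 512}))  # plain containers pickle fine
print(is_picklable(lambda s: s.split()))  # lambdas cannot be pickled
```

In a real notebook you would pass your tokenizer (or the whole DataLoaders) instead of the toy objects above.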

For current blurr users: yah, this update may break your existing code, as I had to make quite a few changes to work with the latest version of transformers and to make a few areas of the code more concise and better performing.

shimsan · July 8, 2020, 6:44am

Hi @wgpubs, this is fabulous work, and I love your blog too BTW. Thanks for your Datablock Nirvana articles. I am slowly absorbing them.

I have been trying to use your library with a dataset of my own that has 17 classes, and I am using RandomSplitter instead of ColSplit.

I am getting some strange errors when I try to run learn.summary() or learn.lr_find()

----> 1 learn.summary()

4 frames
/usr/local/lib/python3.6/dist-packages/fastai2/callback/hook.py in summary(self)
    186     "Print a summary of the model, optimizer and loss function."
    187     xb = self.dls.train.one_batch()[:self.dls.train.n_inp]
--> 188     res = self.model.summary(*xb)
    189     res += f"Optimizer used: {self.opt_func}\nLoss function: {self.loss_func}\n\n"
    190     if self.opt is not None:

/usr/local/lib/python3.6/dist-packages/fastai2/callback/hook.py in summary(self, *xb)
    163     sample_inputs,infos = layer_info(self, *xb)
    164     n,bs = 64,find_bs(xb)
--> 165     inp_sz = _print_shapes(apply(lambda x:x.shape, xb), bs)
    166     res = f"{self.__class__.__name__} (Input shape: {inp_sz})\n"
    167     res += "=" * n + "\n"

/usr/local/lib/python3.6/dist-packages/fastai2/callback/hook.py in _print_shapes(o, bs)
    155 def _print_shapes(o, bs):
    156     if isinstance(o, torch.Size): return ' x '.join([str(bs)] + [str(t) for t in o[1:]])
--> 157     else: return str([_print_shapes(x, bs) for x in o])
    158 
    159 # Cell

/usr/local/lib/python3.6/dist-packages/fastai2/callback/hook.py in <listcomp>(.0)
    155 def _print_shapes(o, bs):
    156     if isinstance(o, torch.Size): return ' x '.join([str(bs)] + [str(t) for t in o[1:]])
--> 157     else: return str([_print_shapes(x, bs) for x in o])
    158 
    159 # Cell

... last 2 frames repeated, from the frame below ...

/usr/local/lib/python3.6/dist-packages/fastai2/callback/hook.py in _print_shapes(o, bs)
    155 def _print_shapes(o, bs):
    156     if isinstance(o, torch.Size): return ' x '.join([str(bs)] + [str(t) for t in o[1:]])
--> 157     else: return str([_print_shapes(x, bs) for x in o])
    158 
    159 # Cell

RecursionError: maximum recursion depth exceeded

learn.lr_find():

    ---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/fastai2/learner.py in one_batch(self, i, b)
    162             if not self.training: return
--> 163             self.loss.backward();                            self('after_backward')
    164             self.opt.step();                                 self('after_step')

41 frames
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)` (createCublasHandle at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x46 (0x7fc28b57b536 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xf6ebd5 (0x7fc28c945bd5 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x94c (0x7fc28c9469bc in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #3: <unknown function> + 0xf642d1 (0x7fc28c93b2d1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #4: <unknown function> + 0x140e34d (0x7fc28cde534d in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #5: THCudaTensor_addmm + 0x5c (0x7fc28cdeeefc in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #6: <unknown function> + 0x105a6f8 (0x7fc28ca316f8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #7: <unknown function> + 0xf7dab8 (0x7fc28c954ab8 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cuda.so)
frame #8: <unknown function> + 0x10c2780 (0x7fc2c454f780 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x2c9b47e (0x7fc2c612847e in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x10c2780 (0x7fc2c454f780 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #11: at::Tensor c10::Dispatcher::callUnboxed<at::Tensor, at::Tensor const&, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&, at::Tensor const&) const + 0xb3 (0x7fc2d2890c33 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #12: <unknown function> + 0x28ac327 (0x7fc2c5d39327 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::generated::AddmmBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x134 (0x7fc2c5d73ff4 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x2d89705 (0x7fc2c6216705 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x16f3 (0x7fc2c6213a03 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x3d2 (0x7fc2c62147e2 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fc2c620ce59 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_cpu.so)
frame #18: torch::autograd::python::PythonEngine::thread_init(int) + 0x38 (0x7fc2d2b54488 in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #19: <unknown function> + 0xbd6df (0x7fc2f75c76df in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #20: <unknown function> + 0x76db (0x7fc2f86a96db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #21: clone + 0x3f (0x7fc2f89e288f in /lib/x86_64-linux-gnu/libc.so.6)


During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/torch/cuda/__init__.py in _lazy_new(cls, *args, **kwargs)
    431     # We may need to call lazy init again if we are a forked child
    432     # del _CudaBase.__new__
--> 433     return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
    434 
    435 

RuntimeError: CUDA error: device-side assert triggered

I have heard that the device-side assert issue comes up when the linear layers are not the right size, which is why I was trying to run learn.summary().

Reference: https://towardsdatascience.com/cuda-error-device-side-assert-triggered-c6ae1c8fa4c3

Do you have any pointers?

My code:

wgpubs · July 8, 2020, 5:49pm

Because the transform now returns the huggingface dictionary with all the tensors used to represent a sequence (input_ids, attention_mask, token_type_ids, etc…) rather than a single tensor (which fastai expects), a bunch of fastai functionality currently doesn’t work.

To resolve this I’ve implemented a number of custom helper methods in the latest release (e.g., learn.blurr_predict, learn.blurr_summary, etc…) that work with a dictionary. Check out the documentation for more info.

I know @ mentioning is generally frowned upon, yet I dare to @ mention our BDFL on this one. @jeremy, if you have some time, take a look at what I'm doing here, for example, to make things work with a dictionary. The changes I made were so minor that I think this could probably be handled in the framework, either by picking the first tensor if the data type is a dictionary … or by adding a property/method to the base Transform and ItemTransform classes that lets you specify which tensor to use for core fastai functions like predict and summary (by default it would just return whatever encodes returns).
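The "pick a tensor" idea could look roughly like this. It is a minimal sketch, not fastai or blurr code: primary_tensor is a hypothetical helper, and defaulting to the input_ids key is an assumption about which tensor is the most useful stand-in.

```python
import torch


def primary_tensor(x, key="input_ids"):
    """Hypothetical helper: choose the tensor fastai should treat as "the" input.

    For a dict (as returned by huggingface tokenizers) return x[key], falling
    back to the first value; anything else is returned unchanged.
    """
    if isinstance(x, dict):
        return x[key] if key in x else next(iter(x.values()))
    return x


batch = {"input_ids": torch.zeros(8, 128, dtype=torch.long),
         "attention_mask": torch.ones(8, 128, dtype=torch.long)}
print(primary_tensor(batch).shape)               # torch.Size([8, 128])
print(primary_tensor(torch.zeros(4, 16)).shape)  # plain tensors pass through
```

Returning the input unchanged for non-dict values keeps the default behaviour identical to today's, which is the property the proposal above asks for.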

learn.lr_find() should still work, so my guess is something is wrong with your DataLoaders. Figuring out cryptic CUDA errors is a rite of passage for every deep learning practitioner … but if you still can't get things figured out after a while, feel free to post a link to a notebook for me/us to check out.
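One common way to decode a CUDA device-side assert is to reproduce the failing step on the CPU (or set the environment variable CUDA_LAUNCH_BLOCKING=1 before starting Python), where PyTorch reports a readable error. The sketch below uses made-up logits and labels, not data from this thread, to show the mismatch behind the earlier IndexError: a head with 2 outputs receiving labels up to 4. check_targets is a hypothetical helper.

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: the model head only has 2 outputs...
logits = torch.randn(4, 2)
# ...but the labels use 5 classes (0-4), so labels 2 and 4 have no logit.
targets = torch.tensor([0, 1, 2, 4])

try:
    F.cross_entropy(logits, targets)
except (IndexError, RuntimeError) as e:
    # On CPU this is a readable "out of bounds" message, not a CUDA assert.
    print(type(e).__name__, e)


def check_targets(targets, num_classes):
    """Hypothetical pre-flight check to run before learn.lr_find() on the GPU."""
    bad = targets[(targets < 0) | (targets >= num_classes)]
    if bad.numel():
        raise ValueError(f"labels {bad.unique().tolist()} not in [0, {num_classes})")


check_targets(targets, num_classes=5)  # passes once the head has 5 outputs
```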

ncduy · July 9, 2020, 8:45am

Hi @wgpubs, is it possible to use the same approach as in your conditional_text_generation document for a machine translation task? I tried it in my notebook here, but the model overfitted very quickly when I fine-tuned it after unfreezing.

And the results look really strange to me:

	text	target	prediction
0	Quelles sont les pages de codes qui sont acceptées par CANAFE?	Which code pages are accepted by FINTRAC?	What What What What What What What What What What What What What What What What What What What What What What What What What What
1	Quel effet cela aurait-il sur la qualité et la quantité de la formation dans le secteur des arts au Canada?	What impact would this have on the quality and quantity of arts training in Canada?	What What What What What What What What What What What What What What What What What What What What What What What What What What
2	Où se déroule l’histoire et pourquoi ce lieu a-t-il joué autrefois, pour nos ancêtres Dane-zaa, un rôle si important?	Where does the story take place and why was this place so important to our Dane-zaa ancestors in the past?	Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Where Whe
3	En quoi consistent les normes de service?	What are service standards?	What What What What What What What What What What What What What What What What What What What What What What What What What What

wgpubs · July 9, 2020, 5:09pm

I haven't really worked on implementing MT tasks in blurr, but it's something on my radar. A few things:

You may need to start with a multi-lingual pretrained model (pretrained_model_name = "facebook/bart-large-cnn" may not offer a very friendly vocab for MT as it is trained solely on English) … or you may need to train your own multi-lingual tokenizer. The BART paper is worth looking into if you are using BART as it discusses their MT experiments/hyperparameters. You can see all the available BART pre-trained models/tokenizers here.
If you're overfitting, you may want to look at customizing the splitter function, using a smaller LR, using different schedulers, adding regularization such as weight decay, etc…
There are MT specific models available in huggingface as well that might be better suited for your needs.
learner.generate_text can also be played with to use beam search, top-k/nucleus sampling, etc…, which may help produce better results.
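To make the sampling options concrete, here is a minimal pure-PyTorch sketch of how top-k and nucleus (top-p) filtering reshape one decoding step's next-token logits. It illustrates the techniques only; it is not blurr's or huggingface's implementation, and in practice you would pass top_k/top_p to generate rather than write this yourself.

```python
import torch


def top_k_top_p_filter(logits, top_k=0, top_p=1.0):
    """Mask a 1-D tensor of next-token logits for top-k and/or nucleus sampling.

    Tokens outside the kept set are set to -inf, so softmax gives them zero
    probability.
    """
    logits = logits.clone()
    if top_k > 0:
        # Keep only the k highest-scoring tokens.
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        probs = torch.softmax(sorted_logits, dim=-1)
        # Probability mass of the tokens ranked strictly above each token.
        mass_above = probs.cumsum(dim=-1) - probs
        # Drop a token once the tokens above it already cover top_p.
        logits[sorted_idx[mass_above >= top_p]] = float("-inf")
    return logits


torch.manual_seed(0)
logits = torch.tensor([3.0, 2.5, 1.0, 0.5, -1.0])
filtered = top_k_top_p_filter(logits, top_k=3, top_p=0.9)
next_token = torch.multinomial(torch.softmax(filtered, dim=-1), num_samples=1)
print(filtered, next_token.item())
```

Sampling from the filtered distribution, instead of always taking the argmax, is one reason these options can break loops like the repeated "What What What" outputs above, though that is not guaranteed.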

Just some ideas. Keep me posted if you get anything decent working

ncduy · July 10, 2020, 2:32am

Thanks, I’ll try them out!

wgpubs · July 12, 2020, 3:12am

v.0.0.7 UPDATE:

Includes text generation and token classification metrics baked into a Callback used by all blurr learners.
If you’re interested in how to return multiple metrics at once, check out how HF_TokenClassCallback and HF_TextGenModelCallback make use of fastai’s ValueMetric class (article forthcoming).
Text generation has been updated to use huggingface's PreTrainedModel.generate everywhere, and users can pass whatever args they want into it via the text_gen_kwargs argument of HF_TextGenModelCallback. See here for an example of how to override and use the default summarization settings defined in BartConfig.
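As an illustration of the shape of text_gen_kwargs, here is a hypothetical dictionary. Every key is an argument of huggingface's PreTrainedModel.generate, but the values are illustrative rather than tuned, and the exact way blurr forwards them is as described above, not something this sketch verifies.

```python
# Hypothetical settings for HF_TextGenModelCallback(text_gen_kwargs=...).
# Each key is an argument of huggingface's PreTrainedModel.generate;
# the values are illustrative only.
text_gen_kwargs = {
    "max_length": 130,            # upper bound on the generated sequence length
    "min_length": 30,             # avoid overly short outputs
    "num_beams": 4,               # beam search instead of greedy decoding
    "no_repeat_ngram_size": 3,    # never repeat the same 3-gram
    "early_stopping": True,       # end beam search once enough beams finish
}

for name, value in sorted(text_gen_kwargs.items()):
    print(f"{name} = {value}")
```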

You can install from master (instructions here), or via pypi here. You will need to use transformers >=3.0.2, which includes some fixes to make tokenizers picklable.

Sorry if anything breaks

I would love to get more folks involved if there is interest, with everything from adding documentation, doing code reviews, and adding tests to implementing new features (especially from those familiar with huggingface and experienced with fastai v2).

shimsan · August 13, 2020, 1:32am

Closing the loop on this one.

learn.summary should be replaced by learn.blurr_summary, as mentioned by @wgpubs.

Secondly, as mentioned originally, I wanted to verify why my final layer had the wrong number of out_features (which causes the CUDA device-side assert crash), and that can also be done with:

print(hf_model)

or

learn.model

Be careful not to run learn.lr_find() before fixing this, as it will leave CUDA in a broken state and you will need to restart the runtime. This happens because of the size mismatch.
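To check the head size programmatically instead of reading through print(hf_model), a minimal sketch: find the last nn.Linear registered in the model and compare its out_features with the number of classes. The tiny model is a hypothetical stand-in; huggingface classification models usually register their classifier last, but module names and layout vary by architecture, so verify against your own model.

```python
import torch.nn as nn


def last_linear_out_features(model):
    """Return out_features of the last nn.Linear registered in `model`."""
    linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
    if not linears:
        raise ValueError("model has no nn.Linear layers")
    return linears[-1].out_features


# Hypothetical stand-in: an encoder layer followed by a 2-way classification head.
toy_model = nn.Sequential(nn.Linear(768, 768), nn.Tanh(), nn.Linear(768, 2))

n_classes = 17  # e.g. the 17-class dataset described earlier in this thread
head = last_linear_out_features(toy_model)
if head != n_classes:
    print(f"Head predicts {head} classes but the data has {n_classes}; "
          "fix num_labels before running lr_find.")
```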

OK, now how to actually fix this?

I found the answer tucked away in one of the pages on https://ohmeow.github.io/blurr/modeling-core/

Pass config_kwargs={'num_labels': 30} to BLURR_MODEL_HELPER.get_hf_objects, where num_labels is the number of your classes.
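Rather than hard-coding the number, it can be derived from the labels themselves. A minimal sketch with a made-up labels list: it builds the config_kwargs dict described above plus a label-to-id mapping so integer targets always fall in [0, num_labels). The get_hf_objects call itself is left out, since only its config_kwargs argument is described in this thread.

```python
# Hypothetical labels column; in practice this comes from your DataFrame.
labels = ["billing", "shipping", "returns", "billing", "warranty", "returns"]

# Sorting gives a stable class order, so label ids do not change between runs.
classes = sorted(set(labels))
label2id = {c: i for i, c in enumerate(classes)}

# Integer targets are guaranteed to fall in [0, num_labels).
targets = [label2id[label] for label in labels]

# num_labels must equal the number of distinct classes.
config_kwargs = {"num_labels": len(classes)}
print(config_kwargs, targets)
```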

Now, you can begin to train.

danteoz · October 11, 2020, 5:10pm

@wgpubs Loving the library so far; it has made it easy for me to get into using HuggingFace. I am currently working on a multiclass classification task for school and am running into issues. Binary classification works fine, but when I switch to HF_TASKS_AUTO.MultipleChoice, I get a CUDA error.

I know that for multi-label classification, config.num_labels = len(lbl_cols) must be added to the model config. Is there a num_classes parameter that must be passed for multiclass classification tasks?

Edit: HF_TASKS_AUTO.SequenceClassification also results in a CUDA error.

wgpubs · October 11, 2020, 6:21pm

I believe you need to pass config.num_labels for both multiclass and multi-label tasks (for the former, the number of distinct classes; for the latter, the number of labels you are predicting 0 or 1 for).
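The difference shows up in the targets and the loss. In both cases the head outputs num_labels logits per example; a multiclass target is one integer in [0, num_labels), while a multi-label target is a 0/1 vector of length num_labels. A minimal PyTorch sketch with made-up tensors; CrossEntropyLoss and BCEWithLogitsLoss are the usual pairings for these two setups, not necessarily what blurr selects internally.

```python
import torch
import torch.nn as nn

batch_size, num_labels = 4, 5
# Hypothetical model outputs: one logit per class/label, shape (batch, num_labels).
logits = torch.randn(batch_size, num_labels)

# Multiclass: exactly one class per example, stored as an integer index.
multiclass_targets = torch.tensor([0, 3, 4, 1])             # shape (batch,)
mc_loss = nn.CrossEntropyLoss()(logits, multiclass_targets)

# Multi-label: an independent 0/1 decision per label, stored as floats.
multilabel_targets = torch.tensor([[1., 0., 0., 1., 0.],
                                   [0., 0., 0., 0., 1.],
                                   [1., 1., 0., 0., 0.],
                                   [0., 0., 1., 0., 0.]])   # shape (batch, num_labels)
ml_loss = nn.BCEWithLogitsLoss()(logits, multilabel_targets)

print(mc_loss.item(), ml_loss.item())
```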

If you want to post a gist, I can take a look.

danteoz · October 11, 2020, 7:01pm

That fixed the issues. I was also incorrectly using MultipleChoice when I should have been using SequenceClassification. For anyone else who runs into this issue with multiclass classification, the correct combination is to use SequenceClassification and pass config.num_labels to BLURR_MODEL_HELPER.get_hf_objects(), as shown in the multi-label example in the blurr documentation. Thank you. This library is amazing.

wgpubs · October 11, 2020, 7:48pm

Yah, I haven't implemented the "multiple choice"-specific huggingface models (although I imagine they will be constructed in essentially the same way a multi-label task is set up with the "sequence classification" models).

danteoz · October 13, 2020, 7:24pm

I am getting an error when I attempt to call load_learner on my exported models.

I do not get these errors when loading a sample ULMFiT model. Have you encountered this before?

danteoz · October 14, 2020, 12:49am

After further testing it only seems to affect XLNet.