Fastai v2 chat

That’s good news! I agree they are doing great work keeping things up to date and in one place, so I think that’s a good decision on your side. We are also trying to make Hugging Face work for our problem. We have already found a blog post that uses part of Hugging Face with v2 for one application, so I am confident this line of work should be easier (our unusual dataloader requirements complicate things a bit, though). I was exploring the other approach in case it proved more or less trivial to port. It’s… not trivial, but it looks feasible. So far I have got to the optimizer.

Hey @Pablo I wrote that post, let me know if you have any questions or if I can help :slight_smile:, although by the looks of it here you probably know more than me at this stage!

Thank you @morgan, that’s kind of you! I’ll take you up on that offer :wink:


I have been studying the Fastai v2 code a bit more, and I noticed something weird in one method of the class _BaseOptimizer:

def set_freeze(n, rg, ignore_force_train=False):
    for p in self.param_groups[n]: p.requires_grad_(rg or (state.get('force_train', False) and not ignore_force_train))

Note that this does not seem to be a static method, because it is not declared as one and because it uses self. But it is missing self as the first argument. Is this a bug?
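Presumably the fix is just adding self as the first parameter; here is a rough sketch of what I would expect (I am assuming state refers to the per-parameter state dict held on self, which the snippet above does not show):

def set_freeze(self, n, rg, ignore_force_train=False):
    # Freeze or unfreeze the n-th parameter group; parameters whose state has
    # force_train=True keep requiring gradients unless ignore_force_train is set
    for p in self.param_groups[n]:
        state = self.state.get(p, {})   # per-parameter state dict (my assumption)
        p.requires_grad_(rg or (state.get('force_train', False) and not ignore_force_train))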


Oh that is a mistake indeed. Fixed this, thanks for flagging!


Glad that I could help!

Thanks for the post, @morgan, it’s been very helpful, and it’s clear that you put a lot of effort into this.

We managed to make it work for our case (one file per document, all in one folder, with multi-label data associated with these docs in a CSV file).
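In case it is useful for someone with a similar layout, this is roughly how we read the data (just a sketch: the folder name, column names and the ';' separator are placeholders, not our exact schema):

from pathlib import Path
import pandas as pd

docs_path = Path('docs')                # one text file per document (placeholder folder)
df = pd.read_csv('labels.csv')          # assumed columns: 'fname' and 'labels'
df['labels'] = df['labels'].str.split(';')                           # multi-label column as lists of labels
df['text'] = df['fname'].map(lambda f: (docs_path/f).read_text())    # attach each document's text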

Models like Bert work great, but they have a significant limitation: they only look at the first x tokens (512 tokens in Bert’s case). Our documents are longer, and other models that handle longer sequences, like XLNet, require around 70 GB of GPU memory even at batch size 4… so this is hard to address.

We are also working on Multifit, which I believe uses a much smaller model, so we can probably work with longer documents. But Multifit is not ported to v2 yet… we are going to try Multifit on a real-but-smaller dataset, which should be fine on v1, to see how promising this is (if it is groundbreaking compared to Bert we will have to fight to make it work on v2, or find some way to shorten our many docs).

If any of you have any other ideas for classifiers for very long and very many documents… I’d be glad to know!

Ah good to hear you’re making progress! I don’t have any experience with it, and you’ve probably already considered it, but is there the possibility to break your document into chunks and then do some ensembling of predictions on the chunks to get a single prediction for the document? Or extract the last layer embeddings from BERT for each of the chunks, combine them and send them through a linear classifier?
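I haven’t tried it myself, but roughly what I mean is something like this (only a sketch using the Hugging Face transformers API; the checkpoint name, chunk size and the plain mean over chunk probabilities are just assumptions, and it assumes a fairly recent transformers version):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = 'bert-base-uncased'   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.eval()

def predict_long_document(text, words_per_chunk=300):
    # Split the document into word chunks that each fit within the 512-token limit
    words = text.split()
    chunks = [' '.join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]
    probs = []
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors='pt', truncation=True, max_length=512)
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1))
    # Ensemble: average per-chunk probabilities into a single document-level prediction
    return torch.cat(probs).mean(dim=0)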

For new transformers I’d keep an eye on any ELECTRA PyTorch ports that get released over the next week or two, as Google Research just posted their code yesterday: https://github.com/google-research/electra, paper: https://openreview.net/pdf?id=r1xMH1BtvB. But you’ll have the same problem here, as it looks like they reduced the input length to 128 (although one of the models does use 512 too).


Yes, we have tried chunks at inference time (with Bert). Recall rose significantly, at the cost of precision. It looks like we need to do this at training time as well, at least as a fine-tuning step. It feels a bit “hacky”, so I was looking for something that can work with long texts by construction.

My still-superficial understanding of TransformerXL suggested it was the way to go, and I don’t yet understand why the memory requirements are so crazy.

I will post here if there are any interesting developments.


True, ensembling predictions might be a bit too hacky for the real world, but aggregating embedding layers might help on the precision side… In the recent Google QUEST Kaggle competition, a few of the gold medallists (1st and 2nd I think) also combined the last-layer embeddings from 2 BERT models (one trained on questions, one on answers): https://www.kaggle.com/c/google-quest-challenge/discussion. Some of them have shared their code in case you need a head start.
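To make the embedding idea concrete, here is a rough sketch (not what the QUEST winners did exactly, just an illustration; the checkpoint, the mean-pooling choice and the label count are placeholders): take the [CLS] embedding of each chunk from the last layer, pool them into one document vector, and train a small linear head on top.

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

model_name = 'bert-base-uncased'   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)
encoder.eval()

def document_embedding(chunks):
    # Encode each chunk and keep its [CLS] (first token) last-layer embedding
    cls_vecs = []
    for chunk in chunks:
        inputs = tokenizer(chunk, return_tensors='pt', truncation=True, max_length=512)
        with torch.no_grad():
            out = encoder(**inputs).last_hidden_state   # shape (1, seq_len, hidden)
        cls_vecs.append(out[:, 0])                      # shape (1, hidden)
    # Combine the chunk embeddings by mean-pooling into one document vector
    return torch.cat(cls_vecs).mean(dim=0)

# A small multi-label head trained on top of the pooled document embeddings
n_labels = 20   # placeholder
classifier = nn.Linear(encoder.config.hidden_size, n_labels)
logits = classifier(document_embedding(['first chunk of text', 'second chunk of text']))
probs = torch.sigmoid(logits)   # independent per-label probabilities for multi-label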


I had not considered combining embeddings instead of predictions. It seems like it would be a bit harder to code, but it’s an interesting alternative. Thanks for sharing!


I’m installing fastai2 from the fastai2 repo like this:

!pip install git+https://github.com/fastai/fastai2.git

This results in:

Successfully installed fastai2-0.0.12 fastcore-0.1.14

When I tried to import:

from fastai2.basics import *

I ended up having the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-c5acc50824c2> in <module>()
----> 1 from fastai2.basics import *

4 frames
/usr/local/lib/python3.6/dist-packages/fastai2/basics.py in <module>()
----> 1 from .data.all import *
      2 from .optimizer import *
      3 from .callback.core import *
      4 from .learner import *
      5 from .metrics import *

/usr/local/lib/python3.6/dist-packages/fastai2/data/all.py in <module>()
      1 from ..torch_basics import *
----> 2 from .core import *
      3 from .load import *
      4 from .external import *
      5 from .transforms import *

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in <module>()
    114 # Cell
    115 @docs
--> 116 class DataLoaders(GetAttr):
    117     "Basic wrapper around several `DataLoader`s."
    118     _default='train'

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in DataLoaders()
    127 
    128     def _set(i, self, v): self.loaders[i] = v
--> 129     train   ,valid    = add_props(lambda i,x: x[i], _set)
    130     train_ds,valid_ds = add_props(lambda i,x: x[i].dataset)
    131 

/usr/local/lib/python3.6/dist-packages/fastcore/utils.py in add_props(f, n)
    530 def add_props(f, n=2):
    531     "Create properties passing each of `range(n)` to f"
--> 532     return (property(partial(f,i)) for i in range(n))
    533 
    534 # Cell

TypeError: 'function' object cannot be interpreted as an integer

So I figured out that the latest fastcore version hadn’t been pushed to PyPI yet (which seems normal, as I guess it isn’t pushed after every single change). To eliminate this error, I installed the latest fastcore version (0.1.15) like this:

!pip install git+https://github.com/fastai/fastcore.git
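After reinstalling, a quick sanity check confirms the two packages are now in sync (as far as I know, both expose a __version__ attribute):

import fastai2, fastcore
print(fastai2.__version__, fastcore.__version__)   # should now show 0.0.12 and 0.1.15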

I was wondering if there is any mechanism to automatically sync the installation of the latest version (from repos) of both fastai2 and fastcore when we directly install fastai2 from master.

For this same reason, I always installed the editable version of fastai2 like this:

pip install -e .

instead of this:

pip install -e ".[dev]"

And then I install the editable version of nbdev.

@farid that’s a good point - if you use fastai2 from master, you need to do the same for fastcore. And you need to git pull both whenever you update.


Thank you Jeremy. I was wondering if, after we git pull both, we have to pip install them each time. By the way, this is what I’m doing now, but I was wondering if it’s the proper way to do it.

I thought that somehow they would auto-magically pip themselves up but I guess this is what we call laziness in the real world!

You don’t have to pip install -e . more than once.


That’s what I did, but then, several times, I realized that my local fastai2 (and fastcore) were lagging behind. For instance, this morning my fastai2 version was still at 0.0.11 and fastcore was at 0.1.13. So I used pip install -e . for both of them. I will see what happens when a new version is pushed.

So I guess that by using -e, pip install creates a watcher that observes whether any pull action takes place. Is that a fair assumption?

Yes. I guess you were lagging before because you did not have the editable install.

I have had the editable install for a long time now. I will monitor it and let you know if something comes up.

Thank you again

In the dev.fast.ai install section (http://dev.fast.ai/#Installing) it is not clear what “fastcore” is and how the fastai2 repository links to fastcore. Shouldn’t fastai2’s master branch already include the latest fastcore master?

These are two separate packages. When you are using a released version of fastai2, we can control the minimal version of fastcore you have through requirements, but for an editable install, we don’t have a means to do this automatically, so you need to make sure to pull them both.
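(Just to illustrate what I mean by requirements: a release pins the minimum with something like the line below in its setup; this is illustrative, not the exact fastai2 line.)

install_requires = ['fastcore>=0.1.14']   # a released fastai2 declares the minimum fastcore it needs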

Especially when we are working on changes that impact both packages at the same time like right now.
