Fastai v2 chat

I was looking at the code in fastai2/fastai2/text/models/core.py and noticed what seems to be unreachable code. Is this intentional (since development is ongoing) or is it some leftover code?

Note the return right below def _concat

# Cell
class SentenceEncoder(Module):
    "Create an encoder over `module` that can process a full sentence."
    def __init__(self, bptt, module, pad_idx=1, max_len=None): store_attr(self, 'bptt,module,pad_idx,max_len')

    def _concat(self, ts, bs):
        return torch.cat()
        bs,sl = ts[0].shape[0],sum([t.shape[1] for t in ts])
        res = ts[0].new_zeros(bs, sl, *ts[0].shape[2:])
        ts,xtra = (ts[:sz],ts[sz]) if len(ts) > sz else (ts,None)
        for i,j in enumerate(idxs):
            c = torch.cat([t[i] for t in ts[j:] if t.shape[0] > i] + [t[i] for t in ts[:j] if t.shape[0] > i] +
                          ([] if xtra is None or xtra.shape[0] <= i else [xtra[i]]))
            res[i,:c.shape[0]] = c
        return res

This function is not used anymore. I guess it's a leftover from when I was debugging it. Removed the whole thing.

I'm slowly trying to understand and get the TransformerXL code working. I have added a script at fastai2/fastai2/text/models/core.py and am trying to make it run. The model loads fine.

First I got a complaint related to the MixedPrecision callback, which I have deactivated until things get that far (error: 'LMLearner' object has no attribute 'master_pgs').

Then I got a complaint that None had no size, which happened because MultiHeadRelativeAttention was using the forward from MultiHeadAttention, so _apply_attention was missing parameters.

This is apparently solved by giving MultiHeadRelativeAttention its own forward, which now passes the kwargs through:

def forward(self, x:Tensor, mask:Tensor=None, **kwargs):
    return self.ln(x + self.drop_res(self.out(self._apply_attention(x, mask=mask, **kwargs))))

Edit: Another problem is that LinearDecoder has changed a bit (probably because the encoder output is different). So I have taken the old LinearDecoder, and the right one for each case is selected through the _model_meta dict.
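
To illustrate the idea, here is a tiny self-contained sketch of the kind of lookup I mean (the class names, keys, and signatures below are stand-ins, not the actual fastai2 ones):

# Hypothetical registry: each architecture's metadata names the decoder class it needs,
# so the builder can pick the right one without if/else branches.
class LinearDecoder:            # stand-in for the current v2 decoder
    def __init__(self, vocab_sz, emb_sz): self.vocab_sz, self.emb_sz = vocab_sz, emb_sz

class OldLinearDecoder:         # stand-in for the v1-style decoder copied over for TransformerXL
    def __init__(self, vocab_sz, emb_sz): self.vocab_sz, self.emb_sz = vocab_sz, emb_sz

_model_meta = {
    'AWD_LSTM':      {'lm_decoder': LinearDecoder},
    'TransformerXL': {'lm_decoder': OldLinearDecoder},
}

def make_decoder(arch, vocab_sz, emb_sz):
    "Build the decoder registered for `arch` in `_model_meta`."
    return _model_meta[arch]['lm_decoder'](vocab_sz, emb_sz)

decoder = make_decoder('TransformerXL', vocab_sz=30000, emb_sz=410)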

And that's where I am right now. I think I will have to deal with more errors, if such a naive port of the transformer.py script from V1 is even possible (at least I am learning a bit about this part of the code :man_shrugging:t4:).

Note that fastai 2 won't reimplement transformer models like in v1. We plan to use the Hugging Face implementations, since they have done a great job at creating a model garden.

That's good news! I agree they are doing great work keeping things up to date and in one place, so I think that's a good decision on your side. We are also trying to make Hugging Face work for our problem. We have already found a blog post using (some of) Hugging Face on v2 for a particular application, so I am confident this line of work should be easier (though our unusual dataloader requirements complicate things a bit). I was exploring the other approach in case it proved more or less trivial to port. It's... not trivial, but it looks feasible. So far I have made it to the optimizer.
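
For reference, loading one of their pretrained models for classification looks roughly like this (a minimal sketch assuming a recent version of the transformers library; the model name and label count are arbitrary):

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# pretrained encoder with a freshly initialised classification head
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=5)

batch = tokenizer(["an example document"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
logits = model(**batch)[0]   # (batch_size, num_labels); multi-label needs a BCE-style loss on top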

Hey @Pablo I wrote that post, let me know if you have any questions or if I can help :slight_smile:, although by the looks of it here you probably know more than me at this stage!

Thank you @morgan, that's kind of you! I'll take you up on that offer :wink:

I have been studying the fastai v2 code a bit more, and I noticed something weird in one method of the _BaseOptimizer class:

def set_freeze(n, rg, ignore_force_train=False):
    for p in self.param_groups[n]: p.requires_grad_(rg or (state.get('force_train', False) and not ignore_force_train))

Note that this does not seem to be a static method, since it is not marked as one and it uses self, but it is missing self as the first argument. Is this a bug?

Oh that is a mistake indeed. Fixed this, thanks for flagging!

Glad that I could help!

Thanks for the post, @morgan, it's been very helpful, and it's clear that you put a lot of effort into it.

We managed to make it work for our case (one file per document, all in one folder, with multi-label data associated with these docs in a CSV file).

Models like BERT work great, but they have a very relevant limitation, which is that they only look at the first N tokens (512 tokens in BERT's case). Our documents are longer, and other models like XLNet require around 70GB of GPU memory even for batch size 4... so this is hard to address.

We are also working on Multifit, which I believe uses a much smaller model, so we can probably work with longer documents. But Multifit is not ported to V2 yet... we are going to try it on a real-but-smaller dataset, which should be fine on V1, to see how promising it is (if it is groundbreaking compared to BERT, we will have to fight to make it work on V2 or somehow shorten our many docs).

If any of you have other ideas for classifiers for very long and very many documents... I'd be glad to hear them!

Ah, good to hear you're making progress! I don't have any experience with this, and you've probably already considered it, but would it be possible to break your documents into chunks and then do some ensembling of the predictions on the chunks to get a single prediction per document? Or extract the last-layer embeddings from BERT for each of the chunks, combine them, and send them through a linear classifier?
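
Something like this is roughly what I have in mind for the first option (only a rough sketch using the Hugging Face transformers library; the model name, label count and chunk size are arbitrary):

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def predict_long_doc(text, chunk_len=510):
    "Average the per-chunk probabilities to get a single prediction for the whole document."
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [ids[i:i+chunk_len] for i in range(0, len(ids), chunk_len)]  # 510 + [CLS]/[SEP] = 512
    probs = []
    with torch.no_grad():
        for c in chunks:
            inp = torch.tensor([[tokenizer.cls_token_id] + c + [tokenizer.sep_token_id]])
            probs.append(model(inp)[0].softmax(dim=-1))   # (1, num_labels)
    return torch.cat(probs).mean(dim=0)                   # ensemble over the chunks

Averaging probabilities is just one choice; averaging logits or taking the max might behave differently, so it is worth trying a couple of aggregations.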

For new transformers I'd keep an eye on any ELECTRA PyTorch ports that get released over the next week or two, as Google Research just posted their code yesterday: https://github.com/google-research/electra, paper: https://openreview.net/pdf?id=r1xMH1BtvB. But you'll have the same problem here, as it looks like they reduced the input length to 128 (although one of the models does use 512 too).

Yes, we have tried chunks at inference time (with BERT). Recall rose significantly, at the cost of precision. It looks like we need to do this at training time as well, at least as a fine-tuning step. It feels a bit "hacky", so I was looking for something that can work with long texts by construction.

My still-superficial understanding of TransformerXL suggested it was the way to go, and I don't yet get why the memory requirements are so crazy.

I will post here if there are any interesting developments.

True, ensembling predictions might be a bit too hacky for the real world, but aggregating embedding layers might help on the precision side... In the recent Google QUEST Kaggle competition, a few of the gold medallists (1st and 2nd, I think) also combined the last-layer embeddings from 2 BERT models (one trained on questions, one on answers): https://www.kaggle.com/c/google-quest-challenge/discussion. Some of them have shared their code in case you need a head start.

I had not considered combining embeddings instead of predictions. It seems like it would be a bit harder to code, but it's an interesting alternative. Thanks for sharing!
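
To make it concrete for myself, I picture something along these lines (purely a sketch; the [CLS] mean-pooling and the linear head are my assumptions, not something taken from an existing implementation):

import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(encoder.config.hidden_size, 10)   # 10 labels, purely illustrative

def doc_embedding(text, chunk_len=510):
    "Mean of the per-chunk [CLS] embeddings: one fixed-size vector per document."
    ids = tokenizer.encode(text, add_special_tokens=False)
    chunks = [ids[i:i+chunk_len] for i in range(0, len(ids), chunk_len)]
    cls_embs = []
    with torch.no_grad():
        for c in chunks:
            inp = torch.tensor([[tokenizer.cls_token_id] + c + [tokenizer.sep_token_id]])
            hidden = encoder(inp)[0]          # last hidden states: (1, seq_len, hidden_size)
            cls_embs.append(hidden[:, 0])     # the [CLS] vector of this chunk
    return torch.cat(cls_embs).mean(dim=0)    # (hidden_size,)

logits = classifier(doc_embedding("a very long document ..."))

The fiddly part would then be training the linear head (and deciding whether to fine-tune the encoder as well) on these pooled vectors.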

I'm installing fastai2 from the fastai2 repo like this:

!pip install git+https://github.com/fastai/fastai2.git

This results in:

Successfully installed fastai2-0.0.12 fastcore-0.1.14

When I tried to import:

from fastai2.basics import *

I ended up having the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-c5acc50824c2> in <module>()
----> 1 from fastai2.basics import *

4 frames
/usr/local/lib/python3.6/dist-packages/fastai2/basics.py in <module>()
----> 1 from .data.all import *
      2 from .optimizer import *
      3 from .callback.core import *
      4 from .learner import *
      5 from .metrics import *

/usr/local/lib/python3.6/dist-packages/fastai2/data/all.py in <module>()
      1 from ..torch_basics import *
----> 2 from .core import *
      3 from .load import *
      4 from .external import *
      5 from .transforms import *

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in <module>()
    114 # Cell
    115 @docs
--> 116 class DataLoaders(GetAttr):
    117     "Basic wrapper around several `DataLoader`s."
    118     _default='train'

/usr/local/lib/python3.6/dist-packages/fastai2/data/core.py in DataLoaders()
    127 
    128     def _set(i, self, v): self.loaders[i] = v
--> 129     train   ,valid    = add_props(lambda i,x: x[i], _set)
    130     train_ds,valid_ds = add_props(lambda i,x: x[i].dataset)
    131 

/usr/local/lib/python3.6/dist-packages/fastcore/utils.py in add_props(f, n)
    530 def add_props(f, n=2):
    531     "Create properties passing each of `range(n)` to f"
--> 532     return (property(partial(f,i)) for i in range(n))
    533 
    534 # Cell

TypeError: 'function' object cannot be interpreted as an integer

So I figured out that the latest fastcore version hadn't been pushed to PyPI yet (which seems normal, as I guess it isn't pushed after every single change). To eliminate this error, I installed the latest fastcore version (0.1.15) like this:

!pip install git+https://github.com/fastai/fastcore.git

I was wondering if there is any mechanism to automatically sync the installation of the latest versions (from the repos) of both fastai2 and fastcore when we install fastai2 directly from master.

For this same reason, I always installed the editable version of fastai2 like this:

pip install -e .

instead of this:

pip install -e ".[dev]"

And then I install the editable version of nbdev.

@farid that's a good point - if you use fastai2 from master, you need to do the same for fastcore. And you need to git pull both whenever you update.
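
Concretely, a setup like this is what I mean (the directory layout is just an example):

# clone both repos and install each one as an editable package (one-time)
git clone https://github.com/fastai/fastai2
git clone https://github.com/fastai/fastcore
pip install -e ./fastai2
pip install -e ./fastcore

# afterwards, updating is just a git pull in each repo
cd fastai2  && git pull && cd ..
cd fastcore && git pull && cd ..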

Thank you Jeremy. I was wondering if, after git pulling both, we have to pip install them each time. By the way, that is what I'm doing now, but I was wondering if it's the proper way to do it.

I thought that somehow they would auto-magically pip themselves up but I guess this is what we call laziness in the real world!

You don't have to pip install -e . more than once.

That's what I did, but then, several times, I realized that my local fastai2 (and fastcore) were lagging behind. For instance, this morning my fastai2 version was still at 0.0.11 and fastcore was at 0.1.13. So I ran pip install -e . for both of them again. I will see what happens when a new version is pushed.

So, I guess that by using -e, pip install creates a watcher that observes whether any pull takes place. Is that a fair assumption?