Fastai v2 chat

Interesting, do you also use transformers with Fastai v2?

Yes. Since my datasets are in Portuguese, I used the multilingual models from transformers (multilingual BERT, XLM-17, and XLM-RoBERTa), using fastai2 as the framework for the data, learner, and training loop.

I am getting better results with AWD-LSTM, a 15k-token SentencePiece vocabulary, and label smoothing cross-entropy loss for both the language model fine-tuning and the classifier.

The main suspects for the boost are the 15k-token SentencePiece vocabulary, inspired by the MultiFiT paper (I was using 32k before and getting worse results), the label smoothing for the domain fine-tuning and classification, and the new pad_input_chunk (fastai2) for the padding in classification.

I am not using the backward model yet, which I suspect would further improve the results.
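
Roughly, the fine-tuning setup looks like this (a simplified sketch rather than my exact code; df and vocab are placeholders, and the pretrained Portuguese weights would still have to be loaded separately):

from operator import attrgetter
from fastai2.text.all import *

# 15k-token SentencePiece vocab (MultiFiT-style) and label smoothing for the LM fine-tune
sp = SentencePieceTokenizer(vocab_sz=15000)
tfms = [attrgetter("text"), Tokenizer(tokenizer=sp), Numericalize(vocab)]
splits = RandomSplitter(valid_pct=0.1, seed=42)(df)
dsets = Datasets(df, [tfms], splits=splits, dl_type=LMDataLoader)
dls = dsets.dataloaders(bs=64, seq_len=72)

learn = language_model_learner(dls, AWD_LSTM, pretrained=False,
                               loss_func=LabelSmoothingCrossEntropy())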


Thanks! This is really useful input. In my case the datasets are in Spanish, so it makes sense that similar choices should work.


For training the language models on Wikipedia, I used almost exactly the same parameters @pierreguillou used in his notebooks (https://github.com/piegu/language-models/blob/master/lm3-portuguese.ipynb), but I got better results with LSTM than with QRNN. I kept the same number of tokens for the entire corpus, and the same wd values, dropouts, and number of epochs as his. The only different parameter is that I used the default LSTM config:

{'emb_sz': 400,
 'n_hid': 1152,
 'n_layers': 3,
 'pad_token': 1,
 'qrnn': False,
 'bidir': False,
 'output_p': 0.1,
 'hidden_p': 0.15,
 'input_p': 0.25,
 'embed_p': 0.02,
 'weight_p': 0.2,
 'tie_weights': True,
 'out_bias': True}
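
For reference, passing that config looks roughly like this in fastai v1 (the version I used for the Wikipedia LM, as clarified below); data_lm and the drop_mult value are placeholders, not the notebook's exact values:

from fastai.text import *

# default AWD-LSTM config (the dict above) handed to the v1 learner
config = awd_lstm_lm_config.copy()
learn = language_model_learner(data_lm, AWD_LSTM, config=config,
                               drop_mult=0.5, pretrained=False)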

For clarification, I trained the Portuguese language model using fastai v1, then fine-tuned it on the domain corpus and trained the classification model using fastai2. For the moms in fastai2, I always used moms=(0.8,0.7,0.8), and pad_input_chunk for the classification dataloader. For the rest, I used the same parameters as his TCU classification notebook.
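
In fastai2 terms, those two settings plug in roughly like this (sketch only; dsets, learn, the batch size and the learning rate are placeholders):

# classification DataLoaders padded with pad_input_chunk, then one-cycle with the moms above
dls_clas = dsets.dataloaders(bs=64, before_batch=pad_input_chunk)
learn.fit_one_cycle(1, 2e-2, moms=(0.8, 0.7, 0.8))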

Now I am trying to adapt my code to fine-tune (using fastai2) a backward version of an analogous Portuguese language model I trained using fastai v1.


Please keep us in the loop, I think it’s very helpful!


I have been wondering how to run fastai2 on video data, or on data made of a variable number of 2D image slices. That is, each x is a set of 2D slices composing a 3D volume, and the number of slices may vary between two distinct x's (e.g. one video may have more frames than another because it's a longer shot).

For the backward models, at the Datasets/DataLoader level, I created a very simple transform function based on previous library code, and it worked flawlessly, even with pad_input_chunk for the classification task. I got very good results with the backward LM and classifier. I hope there is no caveat with this:

from operator import attrgetter

def backwards(tokens):
    # flip the token sequence so the LM reads the text right-to-left
    return tokens.flip(0)

tfms = [attrgetter("text"), Tokenizer(tokenizer=sp), Numericalize(vocab), backwards]
splits = RandomSplitter(valid_pct=0.1, seed=42)(df)

dsrc = Datasets(df, [tfms], splits=splits, dl_type=LMDataLoader)

Nice and easy, I like it!

Edit: Not too sure it works well with pad_input_chunk directly; I think you might need to do it the other way (so pad_first=False) to make sure it works as intended.


I ran some small tests, and it seems to work OK, unless I did not understand the philosophy behind pad_input_chunk. Is it OK?

Normal / Forward (seq_len = 72):

pad_input_chunk([(tensor([1,2,3,4]),1), (tensor([5,6,7]), 2), (tensor([8,9]), 3), (tensor([10]), 4)], pad_idx=0, pad_first=True, seq_len=72)

Output:

[(tensor([1, 2, 3, 4]), 1),
 (tensor([5, 6, 7, 0]), 2),
 (tensor([8, 9, 0, 0]), 3),
 (tensor([10,  0,  0,  0]), 4)]

Backwards:

pad_input_chunk([(backwards(tensor([1,2,3,4])),1), (backwards(tensor([5,6,7])), 2), (backwards(tensor([8,9])), 3), (backwards(tensor([10])), 4)], pad_idx=0, pad_first=True, seq_len=72)

Output:

[(tensor([4, 3, 2, 1]), 1),
 (tensor([7, 6, 5, 0]), 2),
 (tensor([9, 8, 0, 0]), 3),
 (tensor([10,  0,  0,  0]), 4)]

Now with seq_len = 2:

Forward:

pad_input_chunk([(tensor([1,2,3,4]),1), (tensor([5,6,7]), 2), (tensor([8,9]), 3), (tensor([10]), 4)], pad_idx=0, pad_first=True, seq_len=2)

Output:

[(tensor([1, 2, 3, 4]), 1),
 (tensor([5, 6, 7, 0]), 2),
 (tensor([0, 0, 8, 9]), 3),
 (tensor([ 0,  0, 10,  0]), 4)]

Backward:

pad_input_chunk([(backwards(tensor([1,2,3,4])),1), (backwards(tensor([5,6,7])), 2), (backwards(tensor([8,9])), 3), (backwards(tensor([10])), 4)], pad_idx=0, pad_first=True, seq_len=2)

Output:

[(tensor([4, 3, 2, 1]), 1),
 (tensor([7, 6, 5, 0]), 2),
 (tensor([0, 0, 9, 8]), 3),
 (tensor([ 0,  0, 10,  0]), 4)]

Ah yes, backwards happens before the padding, so it's all good. You're perfectly right, it works exactly as intended.


Thanks!

Simple and beautiful! And props to Jeremy and Sylvain for making it so easy to put a new transform in the mix.


It seems that IntToFloatTensor does not multiply back by self.div during decoding: see this line

def decodes(self, o:TensorImage): return o.clamp(0., 1.) if self.div else o

I had some issues because I manipulate images afterwards, and it was fixed with this:

def decodes(self, o:TensorImage): return o.clamp(0., 1.).mul_(self.div).to(torch.uint8) if self.div else o

On another note, you allow initializing the transform of a TensorImage with div=None, but you still have to recast to uint8 in the decoding part, and the encoding part currently does not work with div=None.

Decoding is not a full reverse operation; we only decode what is necessary for showing purposes, which is why there is no casting back to int and re-multiplying by div (we can show float pixels between 0 and 1). If you need this for your purposes, it should be a separate step.

Also note that if the behavior doesn't suit you, you can always define your own custom type (MyTensorImage) and then patch new encodes and decodes methods onto that transform :wink:
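
For the record, that pattern can look roughly like this (a sketch only; MyTensorImage and the decodes body are illustrative, not library code):

from fastai2.vision.all import *

class MyTensorImage(TensorImage): pass

# register an extra decodes on IntToFloatTensor that only fires for MyTensorImage,
# undoing the division and casting back to uint8
@IntToFloatTensor
def decodes(self, o:MyTensorImage):
    return (o.clamp(0., 1.) * self.div).to(torch.uint8) if self.div else o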

Thanks, yes, I just built my own Transform. I was just wondering whether it was a mistake or not, but it's true that RGB also displays in float.

What was the reason you needed this, BTW? If it’s something common then we’re open to adding this decoder.

I'm playing with fastai2, building a colorizer (a rough sketch of the pipeline follows this list):

  • I use TfmdLists
    • Load images
    • Resize -> will move that to the GPU because the pipeline is slow and I think it's the reason (+ getting a good SSD)
    • Convert to the CIE Lab color space
    • ToTensor
    • Split between the L channel (black & white) to use as input and the a & b channels (color components) to use as output -> this lets me load the images only once for input/output and gives me more flexibility than saving pre-formatted pics (as long as I can make this pipeline fast enough)
  • Dataloader from TfmdLists with batch_tfms
    • IntToFloat -> goes from [0,255] to [0,1] one way but not the reverse
    • I may normalize later, though I'm not sure it's needed as there are no pre-trained networks to use (I expect that keeping ranges in [0, 1] should be sufficient)
  • The model is a U-Net (not pretrained) with 1 input channel and a Sigmoid at the end (2 output channels)
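
A rough sketch of that pipeline (assumed names like RGB2Lab, SplitLab and path, and a simplified flow, not my actual code):

from fastai2.vision.all import *
from PIL import ImageCms

# RGB <-> CIE Lab through Pillow's ImageCms profile-based conversion
_srgb, _lab = ImageCms.createProfile('sRGB'), ImageCms.createProfile('LAB')
_rgb2lab = ImageCms.buildTransformFromOpenProfiles(_srgb, _lab, 'RGB', 'LAB')

class RGB2Lab(Transform):
    "Convert an RGB PILImage to CIE Lab."
    def encodes(self, img:PILImage): return PILImage(ImageCms.applyTransform(img, _rgb2lab))

class SplitLab(Transform):
    "Split a Lab tensor into (L, ab): L is the black & white input, ab the color target."
    def encodes(self, t:TensorImage): return t[:1], t[1:]

tls = TfmdLists(get_image_files(path),
                [PILImage.create, Resize(256), RGB2Lab(), ToTensor(), SplitLab()])
dls = tls.dataloaders(bs=32, after_batch=[IntToFloatTensor()])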

I go from RGB to Lab using Pillow, and do the same for the reverse operation (from Lab to RGB), which requires going from TensorImage to PIL.Image. This conversion is not implemented yet in fastai2, but basically it is just PIL.Image.fromarray(np.uint8(self)), which works only when the tensor is already in the [0,255] range (and even better if already in uint8).
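
The reverse direction could be something like this (again just a sketch; lab_tensor_to_rgb is a made-up helper name, and the tensor is assumed to already be uint8 CHW in [0,255]):

import numpy as np
from PIL import Image, ImageCms

_lab2rgb = ImageCms.buildTransformFromOpenProfiles(
    ImageCms.createProfile('LAB'), ImageCms.createProfile('sRGB'), 'LAB', 'RGB')

def lab_tensor_to_rgb(t):
    # CHW uint8 Lab tensor -> HWC numpy array -> PIL 'LAB' image -> RGB PIL image
    arr = np.uint8(t.permute(1, 2, 0).cpu().numpy())
    return ImageCms.applyTransform(Image.fromarray(arr, mode='LAB'), _lab2rgb)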


Got it. Thanks for the explanation.

@sgugger do you recall if there’s a reason we don’t do the decodes (other than that it’s not normally needed)? Is there a downside?

As I said, the design point was that we only decode to show, not fully reverse all functions (note that some transforms are not reversible). We can change this if we want, but it's going to add a bit of code (+ what do we do about non-reversible functions?)

I was only wondering about this one function and this one param. I’m not suggesting we change the behavior more generally.