Fastai v2 chat

No, all affine/coord/lighting transforms are done on the GPU

1 Like

@sgugger question on this new naming change. Is there still a train_dl, valid_dl, and test_dl (like dbunch.train_dl)? Or is it index-based, e.g. dbunch[0] (to more easily allow unlimited sets)? I was noticing this in the commits and wanted to be sure this is the behavior I’m reading

Edit: it seems like there still is, but I also see behaviors like dbunch[x]. Could you enlighten me a little? :slight_smile:

There is no test_dl attribute, but there are train_dl and valid_dl attributes for the first two. You can also index directly (for your second validation dl, for instance).
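
For example (a rough sketch; dbunch is any DataLoaders object, and the dl at index 2 is hypothetical):

dbunch.train_dl    # first dataloader, by name
dbunch.valid_dl    # second dataloader, by name
dbunch[0]          # same dataloader as dbunch.train_dl, by index
dbunch[2]          # e.g. a second validation dl, reachable only by index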

1 Like

Thanks for the clarification :slight_smile:

The release with the renamed DataBunch/DataSource is now on PyPI.

2 Likes

Second breaking change, in DataBlock this time: we moved the transforms to the init (they are no longer passed to .dataloaders), which means you need to move those bits of code. All notebooks in the fastai2 repo, the examples, and the course notebooks are up to date with this change, if you need some examples.

This will allow us to have a better representation/summary for the DataBlock class, and some useful debug methods are in the oven (all of which need to know the transforms).

To easily change the item_tfms or batch_tfms of a given DataBlock, use DataBlock.new.
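
For example, the new pattern looks roughly like this (a minimal image-classification sketch; path, the blocks and the specific transforms are illustrative, and .new is assumed to accept the same transform kwargs):

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=RandomSplitter(),
                   get_y=parent_label,
                   item_tfms=Resize(224),        # previously passed to .dataloaders
                   batch_tfms=aug_transforms())  # now given in the init
dls = dblock.dataloaders(path)

dblock2 = dblock.new(item_tfms=RandomResizedCrop(128))  # swap transforms on an existing DataBlock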

2 Likes

Follow-up on the rename DataBunch -> DataLoaders, I’ve pushed the renaming for the subclasses. To update existing code, run the following:

find . -type f -exec perl -pi -e 's/\bTextDataBunch\b/TextDataLoaders/g' {} +
find . -type f -exec perl -pi -e 's/\bImageDataBunch\b/ImageDataLoaders/g' {} +
find . -type f -exec perl -pi -e 's/\bSegmentationDataBunch\b/SegmentationDataLoaders/g' {} +
find . -type f -exec perl -pi -e 's/\bTabularDataBunch\b/TabularDataLoaders/g' {} +
find . -type f -exec perl -pi -e 's/\bCollabDataBunch\b/CollabDataLoaders/g' {} +

@sgugger, are you also planning to change the DataBunch subclasses like ImageDataBunch?

Just did actually :wink:

1 Like

That was quick: I was looking at the course examples :slight_smile:

Following up on the naming: DataLoaders.train_dl and DataLoaders.valid_dl were becoming redundant and inconsistent with the simpler Datasets.train/Datasets.valid and TfmdLists.train/TfmdLists.valid.

So now it’s simply DataLoaders.train and DataLoaders.valid. Again, all course notebooks and examples have been updated for this change.
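
In code, the change is just (sketch):

xb, yb = dls.train.one_batch()   # was dls.train_dl.one_batch()
len(dls.valid)                   # was dls.valid_dl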

1 Like

For text in the new DataLoaders format, do we have a backwards option?

This has not been implemented yet.

Edit: Sorry I meant not fully implemented. You have the option at the dataloader level, but I think it’s not properly propagated. In any case it’s not tested, so there might be bugs.
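
If you want to experiment anyway, it should look roughly like this (untested per the above; backwards on LMDataLoader is the dataloader-level option mentioned, so treat the exact flag name as an assumption):

dl = LMDataLoader(dset, bs=64, seq_len=72, backwards=True)  # assumed flag name, not fully propagated yet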

1 Like

Thanks. I will try it.

Thank you very much.

@sgugger I updated to the latest version (of fastai2 and fastcore) to use Datasets and dataloaders, and started getting an error when using SentencePieceTokenizer (it was working with 0.0.6). It seems to be something with the defaults and the special tokens:

sp=SentencePieceTokenizer(sp_model='spm15kpt.model')

The error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-02509167f33d> in <module>
----> 1 sp=SentencePieceTokenizer(sp_model='spm15kpt.model')

/media/hdd3tb/data/fastai2/fastai2/text/core.py in __init__(self, lang, special_toks, sp_model, vocab_sz, max_vocab_sz, model_type, char_coverage, cache_dir)
    306         self.vocab_sz,self.max_vocab_sz,self.model_type = vocab_sz,max_vocab_sz,model_type
    307         self.char_coverage = ifnone(char_coverage, 0.99999 if lang in eu_langs else 0.9998)
--> 308         self.special_toks = ifnone(special_toks, defaults.text_spec_todsrc.tokenizer[1].lengthsk)
    309         if sp_model is None: self.tok = None
    310         else:

AttributeError: 'types.SimpleNamespace' object has no attribute 'text_spec_todsrc'

That was some bad text introduced by mistake. Fixed now.

1 Like

I’m trying to understand the logic behind all the data classes. This is what I understood so far:

  • Transform applies a transformation from an input to an output
  • Pipeline is used to apply a sequence of Transforms, sorted by their order attribute
  • DataLoaders is what we need to train a model; it will typically have at least a train and a validation data loader, and is created from Datasets, TfmdLists, or DataBlock
  • DataBlock helps in building DataLoaders by specifying the types of inputs/outputs we will have, and internally creates the relevant Pipeline for each input/output
  • Datasets are created by applying sets of Transforms to each input/output
  • TfmdLists is similar to Datasets and lets us create DataLoaders, but is more versatile since we define our own custom Pipeline

Is that correct?

2 Likes

Thanks!

Almost.

  • DataBlock is a general blueprint where you use blocks for inputs/targets, specify how to get, label and split your samples. See notebook 50 for many examples of use.
  • TfmdLists is a set of items with one Pipeline of transforms, possibly with a training and validation set if you specified splits
  • Datasets groups TfmdLists together, because you usually have one pipeline of transforms for your inputs and another for your targets. This is what helps us have the block API in DataBlock.

You would use TfmdLists instead of Datasets only if you have a specific pipeline of transforms that creates the final tuple itself, instead of grouping pipelines together.
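
A minimal, hypothetical sketch of the difference (the items and label_func are purely illustrative):

items = ['cat_1.jpg', 'dog_2.jpg', 'cat_3.jpg']
def label_func(o): return o.split('_')[0]

# Datasets: one Pipeline per tuple element; the grouping into (x, y) is done for you
dsets = Datasets(items, tfms=[[noop], [label_func, Categorize()]])

# TfmdLists: a single Pipeline whose transforms must build the final tuple themselves
tls = TfmdLists(items, tfms=[lambda o: (o, label_func(o))])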

8 Likes

Another small issue I am facing. Trying to use LabelSmoothingCrossEntropy and receiving the following message:

‘LabelSmoothingCrossEntropy’ object has no attribute ‘mean’