Here are the questions:
- Why do we say that fastai has a “layered” API? What does it mean?
Fastai’s “layered” API means that a high-level API lets you train neural networks for common applications in just a few lines of code, while lower-level APIs offer more flexibility and are better suited to custom tasks.
- Why does a `Transform` have a `decode` method? What does it do?
The `decode` method reverses (if possible) the application of the transform. It is often used to convert predictions and mini-batches into human-understandable representations.
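For instance, a minimal sketch (the transform and values here are illustrative, not from fastai):

```python
from fastcore.transform import Transform

class IntToStr(Transform):
    def encodes(self, x:int): return str(x)   # forward direction
    def decodes(self, x:str): return int(x)   # reverses encodes for display

t = IntToStr()
y = t(3)        # '3'
t.decode(y)     # 3
```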
- Why does a `Transform` have a `setup` method? What does it do?
Sometimes it is necessary to initialize some inner state, like the vocabulary for a tokenizer. The `setup` method handles this.
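For example, a minimal sketch of a transform whose inner state (a vocabulary) is built in `setups` (names here are illustrative):

```python
from fastcore.transform import Transform

class Vocabify(Transform):
    def setups(self, items):
        # inner state built from the data seen
        self.vocab = sorted(set(items))
        self.o2i = {v:i for i,v in enumerate(self.vocab)}
    def encodes(self, x): return self.o2i[x]
    def decodes(self, x): return self.vocab[x]

tfm = Vocabify()
tfm.setup(['a','c','b','a'])   # setup dispatches to setups
tfm('b')                       # 1
```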
- How does a `Transform` work when called on a tuple?
The `Transform` is always applied to each item of the tuple. If a type annotation is provided, the `Transform` is only applied to the items with the correct type.
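A short demonstration of the type dispatch (toy transform, not part of fastai):

```python
from fastcore.transform import Transform

class Negate(Transform):
    def encodes(self, x:int): return -x   # only applied to ints

neg = Negate()
neg((1, 2.0, 3))   # (-1, 2.0, -3): the float is left untouched
```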
- Which methods do you need to implement when writing your own `Transform`?
Just the `encodes` method; optionally, the `decodes` method to make it reversible, and `setups` to initialize any inner state.
- Write a `Normalize` transform that fully normalizes items (subtract the mean and divide by the standard deviation of the dataset), and that can decode that behavior. Try not to peek!
Here is a `Normalize` transform:

```python
from fastcore.transform import Transform  # also exported by fastai

class Normalize(Transform):
    def setups(self, items): self.mean,self.std = items.mean(),items.std()
    def encodes(self, x): return (x-self.mean)/self.std
    def decodes(self, x): return x*self.std+self.mean
```
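A quick check of the round trip, assuming a float tensor as input:

```python
import torch

items = torch.tensor([1., 2., 3., 4.])
norm = Normalize()
norm.setup(items)    # computes mean/std via setups
t = norm(items)      # standardized values
norm.decode(t)       # recovers the original values
```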
- Write a `Transform` that does the numericalization of tokenized texts (it should set its vocab automatically from the dataset seen and have a `decode` method). Look at the source code of fastai if you need help.
Here is a numericalization transform:
```python
from fastai.text.all import *                    # store_attr, make_vocab, TensorText, tensor, L
from collections import Counter, defaultdict

class Numericalize(Transform):
    def __init__(self, min_freq=3, max_vocab=60000):
        store_attr('min_freq,max_vocab')

    def setups(self, dsets):
        # build the vocab from the dataset seen
        count = Counter(p for o in dsets for p in o)
        self.special_toks = dsets.special_toks
        self.vocab = make_vocab(count, min_freq=self.min_freq,
                                max_vocab=self.max_vocab,
                                special_toks=self.special_toks)
        self.o2i = defaultdict(int, {v:k for k,v in enumerate(self.vocab)
                                     if v != 'xxfake'})

    def encodes(self, o): return TensorText(tensor([self.o2i[o_] for o_ in o]))
    def decodes(self, o): return L(self.vocab[o_] for o_ in o)
```
- What is a `Pipeline`?
The `Pipeline` class is meant for composing several transforms together. It is defined by passing a list of `Transform`s to `Pipeline(...)`. When you call the `Pipeline` on an object, it will automatically call the transforms inside, in order.
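For example, composing two toy transforms (illustrative only):

```python
from fastcore.transform import Transform, Pipeline

class AddOne(Transform):
    def encodes(self, x): return x + 1
    def decodes(self, x): return x - 1

class Double(Transform):
    def encodes(self, x): return x * 2
    def decodes(self, x): return x // 2

pipe = Pipeline([AddOne(), Double()])
pipe(3)          # 8: AddOne then Double
pipe.decode(8)   # 3: decoded in reverse order
```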
- What is a `TfmdLists`?
A `TfmdLists` object groups together the raw items with a `Pipeline`.
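Reusing the toy transforms above (a sketch; in practice a `TfmdLists` is usually built from file paths):

```python
from fastai.data.core import TfmdLists

items = [1, 2, 3, 4]
tls = TfmdLists(items, [AddOne(), Double()])  # raw items + a Pipeline
tls[0]              # 4: (1+1)*2
tls.decode(tls[0])  # 1
```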
- What is a `Datasets`? How is it different from a `TfmdLists`?
`Datasets` will apply two (or more) pipelines in parallel to the same raw object and build a tuple with the result. This is different from a `TfmdLists`, which applies a single pipeline; getting separate inputs and targets would require two separate `TfmdLists` objects.
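For instance, a sketch with one pipeline for the input and one for the target:

```python
from fastai.data.core import Datasets

dsets = Datasets(items, [[AddOne()], [Double()]])  # one pipeline per tuple element
dsets[0]   # (2, 2): the same raw item run through each pipeline
```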
- Why are `TfmdLists` and `Datasets` named with an “s”?
Because they can handle a training and a validation set with the `splits` argument.
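For example (a sketch using `RandomSplitter` on the toy items above):

```python
from fastai.data.transforms import RandomSplitter

splits = RandomSplitter(valid_pct=0.25)(items)
dsets = Datasets(items, [[AddOne()], [Double()]], splits=splits)
len(dsets.train), len(dsets.valid)   # (3, 1)
```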
- How can you build a `DataLoaders` from a `TfmdLists` or a `Datasets`?
You can call the `dataloaders` method.
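For example, with the toy `tls` from above (the batch size is arbitrary here):

```python
dls = tls.dataloaders(bs=2)   # the same call works on a Datasets
```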
- How do you pass `item_tfms` and `batch_tfms` when building a `DataLoaders` from a `TfmdLists` or a `Datasets`?
You can pass `after_item` and `after_batch`, respectively, to the `dataloaders` method.
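A hedged sketch for an image `Datasets` (here `dsets` is assumed to yield PIL images; `Resize`, `ToTensor`, and `IntToFloatTensor` are standard fastai transforms):

```python
from fastai.vision.all import *

dls = dsets.dataloaders(
    after_item=[Resize(224), ToTensor],   # item_tfms: per item, on CPU
    after_batch=[IntToFloatTensor],       # batch_tfms: per collated batch, possibly on GPU
    bs=64)
```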
- What do you need to do when you want to have your custom items work with methods like `show_batch` or `show_results`?
You need to create a custom type with a `show` method, since `TfmdLists`/`Datasets` will decode the items until it reaches a type with a `show` method.
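A minimal sketch (the type and transform names are made up for illustration):

```python
from fastcore.transform import Transform

class ShowableInt(int):
    def show(self, ctx=None, **kwargs):
        print(f"value: {self}")    # how this item displays itself

class Intify(Transform):
    def encodes(self, x): return int(x)
    def decodes(self, x): return ShowableInt(x)  # decoding stops at a type with show
```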
- Why can we easily apply fastai data augmentation transforms to the `SiamesePair` we built?
Because they dispatch over tuples or their subclasses.
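A small demonstration with a `fastuple` subclass standing in for `SiamesePair`, using the `Negate` toy transform from above:

```python
from fastcore.basics import fastuple

class Pair(fastuple): pass   # stand-in for SiamesePair

out = Negate()(Pair(1, 2))
type(out), out   # (Pair, (-1, -2)): applied elementwise, subclass type retained
```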