Tentative solutions:
- Why do we say that fastai has a “layered” API? What does it mean?
fastai is designed following a layered architecture with four layers: an applications layer, a high-level API, a mid-level API, and a low-level API. Each layer offers more customizability as you move down. The high-level API is most likely to be useful to beginners and to practitioners who are mainly interested in applying pre-existing deep learning methods. It offers concise APIs over four main application areas: vision, text, tabular and time-series analysis, and collaborative filtering. These APIs choose intelligent default values and behaviors based on all available information. The mid-level API provides the core deep learning and data-processing methods for each of these applications, and the low-level API provides a library of optimized primitives and functional and object-oriented foundations, which allows the mid-level API to be developed and customized.
- Why does a `Transform` have a `decode` method? What does it do?
`decode` is used by fastai’s `show_batch` and `show_results`, as well as some other inference methods, to convert predictions and mini-batches into a human-understandable representation.
- Why does a `Transform` have a `setup` method? What does it do?
In general, a `Transform` is an object that behaves like a function and has an optional `setup` method that will initialize some inner state (for example, the vocab that a `Numericalize` transform builds from the dataset) and an optional `decode` method that will reverse the function.
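As a rough illustration of the setup/encode/decode contract described above, here is a minimal plain-Python sketch. It deliberately does not use fastai: `ToyCategorize` is a hypothetical stand-in that mimics only the idea, not fastai's `Transform` class, which also adds type dispatch and tuple handling.

```python
class ToyCategorize:
    """Hypothetical stand-in for a reversible transform (not fastai's Transform)."""

    def setup(self, items):
        # Build inner state from the data: a sorted vocab and its reverse lookup.
        self.vocab = sorted(set(items))
        self.o2i = {label: i for i, label in enumerate(self.vocab)}

    def __call__(self, label):
        # Encode: label -> integer index.
        return self.o2i[label]

    def decode(self, idx):
        # Decode: integer index -> human-readable label.
        return self.vocab[idx]


tfm = ToyCategorize()
tfm.setup(["dog", "cat", "dog", "bird"])
encoded = tfm("dog")
decoded = tfm.decode(encoded)
print(tfm.vocab, encoded, decoded)  # ['bird', 'cat', 'dog'] 2 dog
```

The state built in `setup` is what makes `decode` possible: without the stored vocab, the index `2` could not be turned back into a label for display.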
- How does a `Transform` work when called on a tuple?
A special behavior of `Transform`s is that they always get applied over tuples. In general, our data is always a tuple `(input,target)` (sometimes with more than one input or more than one target). When applying a transform on an item like this, such as `Resize`, we don’t want to resize the tuple as a whole; instead, we want to resize the input (if applicable) and the target (if applicable) separately. It’s the same for batch transforms that do data augmentation: when the input is an image and the target is a segmentation mask, the transform needs to be applied (the same way) to the input and the target.
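The "apply to each element, if applicable" behavior can be sketched in plain Python. This is only an approximation of the idea under simplifying assumptions: `Image`, `apply_over_tuple`, and `crop_to_first_row` are hypothetical names, and fastai decides applicability through type annotations on the transform rather than an explicit `applies_to` argument.

```python
class Image(list):
    """Hypothetical image type: a list of pixel rows."""


def apply_over_tuple(fn, applies_to, item):
    # Recurse into tuples; transform matching elements and leave the rest untouched.
    if isinstance(item, tuple):
        return tuple(apply_over_tuple(fn, applies_to, o) for o in item)
    return fn(item) if isinstance(item, applies_to) else item


def crop_to_first_row(img):
    # Toy "resize": keep only the first row of pixels.
    return Image(img[:1])


sample = (Image([[1, 2], [3, 4]]), "cat")  # (input, target)
result = apply_over_tuple(crop_to_first_row, Image, sample)
print(result)  # ([[1, 2]], 'cat')
```

The image is cropped while the string label passes through unchanged, which is the behavior you want from something like `Resize` on an `(image, label)` pair.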
- Which methods do you need to implement when writing your own `Transform`?
If you want to write a custom transform to apply to your data, the easiest way is to write a function. If you need setup or decoding behavior, subclass `Transform` instead: implement the `encodes` method, and optionally the `setups` and `decodes` methods.
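- Write a `Normalize` transform that fully normalizes items (subtract the mean and divide by the standard deviation of the dataset), and that can decode that behavior. Try not to peek!

    from fastai.text.all import *  # provides Transform, as in the chapter's notebook

    class Normalize(Transform):  # note: shadows fastai's own Normalize in this namespace
        def setups(self, items):
            # Dataset statistics, computed once when setup() is called
            self.mean = sum(items)/len(items)
            # Population standard deviation of the items
            self.std = (sum((x-self.mean)**2 for x in items)/len(items))**0.5
        def encodes(self, x): return (x-self.mean)/self.std
        def decodes(self, x): return x*self.std+self.mean
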
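- Write a `Transform` that does the numericalization of tokenized texts (it should set its vocab automatically from the dataset seen and have a `decode` method). Look at the source code of fastai if you need help.

    from collections import Counter, defaultdict
    from fastai.text.all import *  # as in the chapter's notebook

    class Numericalize(Transform):
        def __init__(self, vocab=None, min_freq=3, max_vocab=60000, special_toks=None):
            # Store the options that setups reads (defaults follow fastai's own Numericalize)
            self.vocab,self.min_freq,self.max_vocab,self.special_toks = vocab,min_freq,max_vocab,special_toks
            self.o2i = None if vocab is None else defaultdict(int, {v:k for k,v in enumerate(vocab)})
        def setups(self, dsets):
            if dsets is None: return
            if self.vocab is None:
                # Count tokens, reusing a precomputed counter when the dataset provides one
                count = dsets.counter if getattr(dsets, 'counter', None) is not None else Counter(p for o in dsets for p in o)
                if self.special_toks is None and hasattr(dsets, 'special_toks'):
                    self.special_toks = dsets.special_toks
                self.vocab = make_vocab(count, min_freq=self.min_freq, max_vocab=self.max_vocab, special_toks=self.special_toks)
                # Unknown tokens map to index 0 through the defaultdict
                self.o2i = defaultdict(int, {v:k for k,v in enumerate(self.vocab) if v != 'xxfake'})
        def encodes(self, o): return TensorText(tensor([self.o2i[o_] for o_ in o]))
        def decodes(self, o): return L(self.vocab[o_] for o_ in o)
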
- What is a `Pipeline`?
To compose several transforms together, fastai provides the `Pipeline` class. We define a `Pipeline` by passing it a list of `Transform`s; it will then compose the transforms inside it. When you call `Pipeline` on an object, it will automatically call the transforms inside, in order.
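The composition described above can be sketched in plain Python. This is a toy under stated assumptions, not fastai's `Pipeline`: `ToyPipeline` is a hypothetical name, and the real class also takes care of setup, decoding, and the ordering of transforms.

```python
class ToyPipeline:
    """Hypothetical stand-in for composing transforms (not fastai's Pipeline)."""

    def __init__(self, funcs):
        self.funcs = list(funcs)

    def __call__(self, x):
        # Feed the output of each function into the next one, in order.
        for f in self.funcs:
            x = f(x)
        return x


def token_lengths(tokens):
    return [len(t) for t in tokens]


pipe = ToyPipeline([str.lower, str.split, token_lengths])
out = pipe("The Quick fox")
print(out)  # [3, 5, 3]
```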
- What is a `TfmdLists`?
Your data is usually a set of raw items (like filenames, or rows in a DataFrame) to which you want to apply a succession of transformations. We just saw that a succession of transformations is represented by a `Pipeline` in fastai. The class that groups together this `Pipeline` with your raw items is called `TfmdLists`.
- What is a `Datasets`? How is it different from a `TfmdLists`?
`Datasets` will apply two (or more) pipelines in parallel to the same raw object and build a tuple with the result. Like `TfmdLists`, it will automatically do the setup for us, and when we index into a `Datasets`, it will return us a tuple with the results of each pipeline. A `TfmdLists` differs in that it applies a single pipeline, so with `TfmdLists` alone we would end up with two separate objects for our inputs and targets.
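To make the difference concrete, here is a small plain-Python sketch of the two containers. `ToyTfmdLists` and `ToyDatasets` are hypothetical stand-ins that ignore setup, decoding, and splits; they only show that one pipeline yields a single object per index while several pipelines yield a tuple.

```python
def compose(funcs):
    # Return a function that applies funcs in order.
    def run(x):
        for f in funcs:
            x = f(x)
        return x
    return run


class ToyTfmdLists:
    """Raw items plus ONE pipeline; indexing returns a single transformed object."""

    def __init__(self, items, tfms):
        self.items, self.pipe = list(items), compose(tfms)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        return self.pipe(self.items[i])


class ToyDatasets:
    """Raw items plus SEVERAL pipelines; indexing returns one entry per pipeline."""

    def __init__(self, items, tfms_list):
        self.tls = [ToyTfmdLists(items, tfms) for tfms in tfms_list]

    def __len__(self):
        return len(self.tls[0])

    def __getitem__(self, i):
        return tuple(tl[i] for tl in self.tls)


files = ["cat_01.jpg", "dog_07.jpg"]
x_tfms = [str.upper]                         # toy "input" pipeline
y_tfms = [lambda name: name.split("_")[0]]   # toy "target" pipeline: label from filename

tls = ToyTfmdLists(files, x_tfms)
dsets = ToyDatasets(files, [x_tfms, y_tfms])
print(tls[0])    # CAT_01.JPG
print(dsets[1])  # ('DOG_07.JPG', 'dog')
```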
- Why are `TfmdLists` and `Datasets` named with an “s”?
`TfmdLists` and `Datasets` are named with an “s” because they can handle a training and a validation set with a `splits` argument.
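The idea behind `splits` can be sketched as one container that holds all items plus a list of indices per subset. The class below is a hypothetical plain-Python stand-in, not fastai code; it only mirrors the notion of a training and a validation view over the same items.

```python
class ToySplitList:
    """Hypothetical container: all items plus one index list per subset."""

    def __init__(self, items, splits):
        self.items = list(items)
        self.splits = [list(s) for s in splits]

    def subset(self, i):
        # Pick out the items whose indices belong to split number i.
        return [self.items[j] for j in self.splits[i]]

    @property
    def train(self):
        return self.subset(0)

    @property
    def valid(self):
        return self.subset(1)


data = ToySplitList(["a", "b", "c", "d", "e"], splits=([0, 1, 2, 4], [3]))
print(data.train)  # ['a', 'b', 'c', 'e']
print(data.valid)  # ['d']
```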
- How can you build a `DataLoaders` from a `TfmdLists` or a `Datasets`?
You can directly convert a `TfmdLists` or a `Datasets` to a `DataLoaders` object with the `dataloaders` method.
- How do you pass `item_tfms` and `batch_tfms` when building a `DataLoaders` from a `TfmdLists` or a `Datasets`?
When building a `DataLoaders` from a `TfmdLists` or a `Datasets`, `after_item` is the equivalent of `item_tfms` in `DataBlock`, whereas `after_batch` is the equivalent of `batch_tfms` in `DataBlock`.
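The distinction between the two hooks is about when they run: one on each item before batching, the other on the whole batch after collation. The following plain-Python sketch illustrates that ordering only. `toy_batches`, `pad_to_3`, and `upper_all` are hypothetical names, and the real `DataLoaders` does far more (shuffling, tensor collation, device placement).

```python
def toy_batches(items, bs, after_item, after_batch):
    # For each chunk: transform items one by one, collate into a list,
    # then transform the collated batch as a whole.
    for start in range(0, len(items), bs):
        batch = [after_item(o) for o in items[start:start + bs]]
        yield after_batch(batch)


def pad_to_3(s):
    # Item-level step: runs on a single item (makes items the same size).
    return s.ljust(3, "_")


def upper_all(batch):
    # Batch-level step: runs once on the whole collated batch.
    return [s.upper() for s in batch]


batches = list(toy_batches(["a", "bb", "ccc", "d"], bs=2,
                           after_item=pad_to_3, after_batch=upper_all))
print(batches)  # [['A__', 'BB_'], ['CCC', 'D__']]
```

Item-level steps typically make items uniform so they can be collated, while batch-level steps can process many items at once, which is why augmentation usually lives there.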
- What do you need to do when you want to have your custom items work with methods like `show_batch` or `show_results`?
`show_batch` and `show_results` rely on `decode` to convert predictions and mini-batches into a human-understandable representation, so the transforms that build your custom items should have a `decode` method. fastai decodes an item until it reaches a type that has a `show` method, so your custom item type should also implement `show`.
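- Why can we easily apply fastai data augmentation transforms to the `SiamesePair` we built?
The mid-level API for data collection gives us two objects (`TfmdLists` and `Datasets`) that can help us easily apply transforms to the `SiamesePair` we built. Since transforms are applied over tuples, a pair built as a tuple of two images (plus a label) has each image transform applied to both images separately.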