Fastai v2 transforms / pipeline / data blocks

arora_aman · September 18, 2019, 7:26pm

There’s something just so magical about V2.

After rewatching walk thru #6, here is the intuition I got:

Transforms → Pipeline → TfmdList → TfmdDS → DataSource → “Infinite possibilities”

As we saw earlier, a Transform can encode or decode an item. Let’s leave it at that.
What if you want multiple Transforms in a sequence? Well, enter Pipeline.
A Pipeline applies multiple transforms, in order, to one item.
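To make the encode/decode idea concrete, here is a minimal sketch in plain Python. It is not the fastai v2 code: `MiniTransform` and `MiniPipeline` are hypothetical names I made up, and the only assumption is that decoding undoes the transforms in reverse order.

```python
class MiniTransform:
    """Toy stand-in for a transform (hypothetical, not the fastai class)."""
    def __init__(self, enc, dec):
        self.enc, self.dec = enc, dec

    def encode(self, x):
        return self.enc(x)

    def decode(self, x):
        return self.dec(x)


class MiniPipeline:
    """Toy pipeline: applies several transforms in sequence to one item."""
    def __init__(self, tfms):
        self.tfms = list(tfms)

    def __call__(self, x):
        # Encode with each transform, first to last.
        for t in self.tfms:
            x = t.encode(x)
        return x

    def decode(self, x):
        # Undo the transforms, last to first.
        for t in reversed(self.tfms):
            x = t.decode(x)
        return x


add_one = MiniTransform(lambda x: x + 1, lambda x: x - 1)
double = MiniTransform(lambda x: x * 2, lambda x: x // 2)

pipe = MiniPipeline([add_one, double])
encoded = pipe(3)               # (3 + 1) * 2
decoded = pipe.decode(encoded)  # back to the original item
```

Decoding in reverse is what lets you go from a processed item back to something you can look at.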

But wait, how is that going to help? In data science we have batches, i.e., multiple items. Solution? As expected, TfmdList! It applies a series of transforms to each item in a list: `self.tfms(super()._get(i))`
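Here is roughly how I picture that, as a toy sketch. `MiniTfmdList` is a hypothetical name, the transforms are plain callables, and the file names are made up; the one idea it borrows from the snippet above is that the transforms run when you index an item.

```python
class MiniTfmdList:
    """Toy list that applies the same transforms to whichever item you index."""
    def __init__(self, items, tfms):
        self.items, self.tfms = list(items), list(tfms)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        x = self.items[i]      # fetch the raw item
        for t in self.tfms:    # then run it through every transform
            x = t(x)
        return x


# Made-up file names, purely for illustration.
names = MiniTfmdList(
    ["cat_01.jpg", "dog_07.jpg"],
    [str.upper, lambda s: s.split("_")[0]],
)
first = names[0]
```

Nothing is transformed up front; each item is processed only when it is requested.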

Okay, great! But we have a dependent and an independent variable, i.e., an X and a y. Now what? Should we repeat this process every time and create two separate TfmdLists? Nah, don’t be silly! This is covered in TfmdDS, which builds one TfmdList per pipeline, like so: `self.tls = [TfmdList(items, t, do_setup=do_setup, filt=filt, use_list=use_list) for t in L(tfms)]`

I am already in LOVE with V2!
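A toy version of the same idea, assuming nothing beyond "one list of transforms per output". `MiniTfmdDS` and `apply_all` are hypothetical names, and the file names are made up.

```python
def apply_all(x, tfms):
    """Run one item through a list of plain callables, in order."""
    for t in tfms:
        x = t(x)
    return x


class MiniTfmdDS:
    """Toy dataset: several pipelines over the SAME items, one per output."""
    def __init__(self, items, tfms):
        self.items = list(items)
        self.tfms = [list(t) for t in tfms]  # e.g. [x_tfms, y_tfms]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        # One element of the tuple per pipeline: (x, y) when there are two.
        return tuple(apply_all(self.items[i], t) for t in self.tfms)


files = ["cat_01.jpg", "dog_07.jpg"]
x_tfms = [str.upper]                    # stand-in for "open the image"
y_tfms = [lambda s: s.split("_")[0]]    # stand-in for "label from file name"

ds = MiniTfmdDS(files, [x_tfms, y_tfms])
sample = ds[1]
```

Indexing once gives you both X and y, built from the same underlying item.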

So this takes care of two pipelines applied to the same set of items (or Ls): one produces the independent variable and the other the dependent variable. We are ready to train now, aren’t we?!

Yes, we are! BUT, we need a train set and a validation set to do beautiful work! Well, lo and behold: enter DataSource!

Pass in a list of filters (idxs), and these filters get passed all the way back down until they reach the transforms. Each transform has the intelligence to apply itself only to the filter it was set up for; for any other filter it does nothing and returns the item unchanged:
`if filt!=self.filt and self.filt is not None: return x`
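To see that check in action, here is a small sketch built around the same condition. `FilteredTransform` and `run` are hypothetical names, and I am assuming filter 0 means train and filter 1 means validation.

```python
class FilteredTransform:
    """Toy transform that only fires for the subset it was set up for."""
    def __init__(self, func, filt=None):
        self.func, self.filt = func, filt

    def __call__(self, x, filt=None):
        # Same check as in the post: skip when this transform is tied to a
        # different filter. filt=None on the transform means "always apply".
        if filt != self.filt and self.filt is not None:
            return x
        return self.func(x)


augment = FilteredTransform(lambda x: x + 100, filt=0)  # train-only (assumed 0 = train)
normalize = FilteredTransform(lambda x: x / 10)         # applies to every subset


def run(x, filt):
    """Pass the subset's filter down to every transform."""
    for t in (augment, normalize):
        x = t(x, filt=filt)
    return x


train_out = run(5, filt=0)  # augment and normalize both apply
valid_out = run(5, filt=1)  # augment is skipped, normalize still applies
```

So the same pipeline behaves differently on train and validation items without you having to write two pipelines.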

Transforms ← Pipeline ← TfmdList ← TfmdDS ← DataSource ← “Infinite possibilities with filters”

Beautiful