Note: I updated the Tips and Tricks list to reflect the newly added tips (Inference and Production entries).
I would like to start a Wiki topic where anyone interested in sharing their knowledge can post the recipes (tips and tricks) they discovered while learning fastai v2. I suggest we separate them into categories to ease both navigation and discovery of all tips. The categories I propose are only a suggestion; the tips could also be split by module: vision, tabular, text, etc. Any input that helps improve this wiki is very welcome.
To kick-start this exercise, I’m posting some things I learned during my journey. I hope others will soon share theirs, so that the whole fastai community can benefit from them. Hopefully, this wiki will ease the fastai v2 learning curve for all of us.
Since this topic is a Wiki, anyone can add to or correct the information gathered here.
Thank you for sharing!
General
- Expand condensed code snippets for new v2 users
Sometimes we encounter a condensed code snippet like the one below. For a new v2 user, this may be intimidating.
planet = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=lambda x:planet_source/"train"/f'{x[0]}.jpg',
                   splitter=RandomSplitter(),
                   get_y=lambda x:x[1].split(' '),
                   batch_tfms=aug_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.))
It may be helpful to expand the code (mentally or in writing) as follows in order to grasp the key components of this new API:
def get_x(x): return planet_source/"train"/f'{x[0]}.jpg'
def get_y(x): return x[1].split(' ')
batch_tfms = aug_transforms(flip_vert=True, max_lighting=0.1, max_zoom=1.05, max_warp=0.)
planet = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x=get_x,
                   splitter=RandomSplitter(),
                   get_y=get_y,
                   batch_tfms=batch_tfms)
Live example: 50_datablock_examples.ipynb. Search for Multi-label - Planet
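To see what the two named functions return, they can be called by hand on a single row. The sketch below is plain Python with no fastai import; the value of planet_source and the sample row are made up for illustration and are not taken from the notebook.

```python
from pathlib import Path

# Hypothetical dataset location, for illustration only.
planet_source = Path("planet_sample")

def get_x(x): return planet_source/"train"/f'{x[0]}.jpg'
def get_y(x): return x[1].split(' ')

# A made-up row: (image name, space-separated tags).
row = ("train_31", "clear primary")

image_path = get_x(row)  # path built from the first field of the row
labels = get_y(row)      # list of tags built from the second field

print(image_path)
print(labels)
```

Named functions like these are also easier to try out in isolation than the inline lambdas.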
Datasets
Please add your tips here
DataBlock
- Use getters = [ItemGetter(0), ItemGetter(1)] when your items are a list of tuples (x, y).
Example:
getters = [ItemGetter(0), ItemGetter(1)]
tsdb = DataBlock(blocks=(TSBlock, CategoryBlock),
                 get_items=get_ts_items,
                 getters=getters,
                 splitter=RandomSplitter(seed=seed),
                 batch_tfms=batch_tfms)
NB: get_items=get_ts_items is the key information. get_ts_items returns a list of tuples (our (x, y) tuples), in this case a list of (2D numpy array, label). Hence the use of [ItemGetter(0), ItemGetter(1)]:
- ItemGetter(0) returns the 2D numpy array: the x
- ItemGetter(1) returns the label (a string): the y
Thread: post
Live example: index.ipynb. Search for 2nd method : using DataBlock and DataBlock.get_items()
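What the two getters do can be reproduced without fastai. The sketch below uses Python's operator.itemgetter as a stand-in for ItemGetter and nested lists in place of the 2D numpy arrays. The sample items are made up, and this illustrates the idea rather than fastai's implementation.

```python
from operator import itemgetter

# Made-up items shaped like the output of get_ts_items: (x, y) tuples.
# Nested lists stand in for the 2D numpy arrays.
items = [
    ([[0.1, 0.2], [0.3, 0.4]], "walking"),
    ([[0.5, 0.6], [0.7, 0.8]], "running"),
]

# Stand-ins for ItemGetter(0) and ItemGetter(1).
getters = [itemgetter(0), itemgetter(1)]

# Apply each getter to every item: the first yields the inputs,
# the second yields the labels.
xs = [getters[0](item) for item in items]
ys = [getters[1](item) for item in items]

print(xs[0])
print(ys)
```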
- Why, in some cases, we don’t need a get_x
Example:
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=RegexLabeller(pat = r'/([^/]+)_\d+.jpg$'),
                 item_tfms=Resize(128),
                 batch_tfms=aug_transforms())
NB: get_items=get_image_files is again the key information. Also, remember that both get_x and get_y are functions applied to each item of the list returned by get_items (in this case, get_image_files). Both get_x and get_y are initialized to noop. get_items (i.e. get_image_files) already returns a list of image filenames, which corresponds to our x, and therefore we don’t need to add get_x to our DataBlock declaration. If we really insist on having a get_x, we can add get_x=noop, which means “Please, don’t do anything!”.
How about get_y=RegexLabeller(pat = r'/([^/]+)_\d+.jpg$')? Again, get_items returns a list of image filenames. get_y is applied to each of these filenames and returns the corresponding pet name. The latter is our label and corresponds to the y variable.
Thread: post
Live example: 50_datablock_examples.ipynb. Search for the Pets section
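The roles of noop and of the label regex can be checked on a few filenames in plain Python. This sketch does not use fastai: noop and label_from_name are hypothetical local stand-ins for fastai's noop and RegexLabeller, and the filenames are made up.

```python
import re

def noop(x):
    # Stand-in for get_x=noop: return the item unchanged.
    return x

# Same pattern as in the DataBlock above.
pat = re.compile(r'/([^/]+)_\d+.jpg$')

def label_from_name(fname):
    # Stand-in for the labelling step: search the filename and return
    # the captured group (the text before the trailing _<number>.jpg).
    match = pat.search(str(fname))
    assert match, f'Pattern not found in "{fname}"'
    return match.group(1)

# Made-up filenames, shaped like what get_image_files might return.
fnames = ["images/great_pyrenees_173.jpg", "images/Abyssinian_1.jpg"]

xs = [noop(f) for f in fnames]             # x: the filenames themselves
ys = [label_from_name(f) for f in fnames]  # y: the labels

print(ys)
```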
- How can I make sure my DataBlock object has been properly built?
Let’s use the pets object created above as an example. Once the pets object is created, call its summary() method:
pets.summary((untar_data(URLs.PETS)/"images"))
The summary() method provides very useful information, such as:
- How the samples are built
- Input and output types, and real samples extracted from the underlying dataset
- The different pipelines (of transforms) at different stages (after_item, before_batch, after_batch)
- The building of a mini-batch of 4 samples
- A displayed batch, if you set show_batch=True. You can even pass kwargs (figsize, for example) to the show_batch() method
Live example: 50_datablock_examples.ipynb. Search for the Pets section
DataLoaders
Extracting the number of classes from DataLoaders
dls= pets.dataloaders(…)
c_out = dls.c
Live example : To be added
Learner
Useful information that you can extract from Learner
train and valid datasets
train = learn.dls.train
valid = learn.dls.valid
# get items
train.items
valid.items
# get a batch
valid.one_batch()
# iterate
next(iter(learn.dls.valid))
first(learn.dls.valid)
Thread : post
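The two ways of iterating shown above fetch the same thing: the first batch. Here is a plain-Python sketch, assuming a list of made-up batches in place of learn.dls.valid and a hypothetical local first helper that mimics only the basic behaviour used here (the first element of an iterable).

```python
def first(iterable):
    # Hypothetical stand-in: return the first element, or None if empty.
    return next(iter(iterable), None)

# Made-up "batches" of (inputs, targets) standing in for a validation DataLoader.
batches = [([1, 2], [0, 1]), ([3, 4], [1, 0])]

xb, yb = next(iter(batches))  # explicit iterator protocol
xb2, yb2 = first(batches)     # same batch via the helper

print(xb, yb)
```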
Inference (Predictions)
Please check out the post below.
Production
Lambda function and Serialization
Please check out the post below.