Note: 2nd level headings are for modules, 3rd level headings are for functions/classes.
data.core
get_files
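get_files returns an L list of all the non-hidden files in path, optionally filtered by extensions and optionally searched recursively (recurse). If include directories are given (the folders argument in the example), only files under those directories are returned.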
Example:
source = untar_data(URLs.MNIST_TINY)
all_files = get_files(source)
train_files = get_files(source, folders='train')
valid_img_files = get_files(source, folders='valid', extensions='.png')
labels = get_files(source, recurse=False)
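To make the filtering rules concrete without downloading a dataset, here is a minimal standard-library sketch of the behaviour described above. The name get_files_sketch and the throwaway directory tree are made up for illustration; this is not fastai's implementation, and it returns a plain sorted list rather than an L.

```python
import tempfile
from pathlib import Path

def get_files_sketch(path, extensions=None, recurse=True, folders=None):
    """Stdlib stand-in for the behaviour described above (not fastai's code)."""
    path = Path(path)
    if isinstance(extensions, str): extensions = [extensions]
    if isinstance(folders, str): folders = [folders]
    exts = {e.lower() for e in extensions} if extensions else None
    candidates = path.rglob("*") if recurse else path.glob("*")
    found = []
    for p in candidates:
        if not p.is_file():
            continue
        rel = p.relative_to(path)
        # Skip hidden files and anything inside a hidden directory.
        if any(part.startswith(".") for part in rel.parts):
            continue
        # Optional extension filter (case-insensitive).
        if exts is not None and p.suffix.lower() not in exts:
            continue
        # Keep only files that sit under one of the requested top-level folders.
        if folders is not None and (len(rel.parts) < 2 or rel.parts[0] not in folders):
            continue
        found.append(p)
    return sorted(found)

# Build a tiny throwaway tree that loosely resembles MNIST_TINY (assumed layout).
root = Path(tempfile.mkdtemp())
for name in ["train/3/a.png", "train/7/b.png", "valid/3/c.png",
             "valid/notes.txt", "labels.csv", ".hidden"]:
    f = root / name
    f.parent.mkdir(parents=True, exist_ok=True)
    f.touch()

all_files = get_files_sketch(root)                    # everything except .hidden
train_files = get_files_sketch(root, folders="train")
valid_img_files = get_files_sketch(root, folders="valid", extensions=".png")
top_level = get_files_sketch(root, recurse=False)     # only labels.csv
```

The four calls mirror the four calls in the example, so the effect of each argument can be checked against a tree whose contents are known.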
FileGetter
FileGetter creates and returns a partial get_files function that searches the path suffix suf and passes along any other args.
Example:
source = untar_data(URLs.MNIST_TINY)
get_train = FileGetter(suf='train')
get_valid = FileGetter(suf='valid')
train_files = get_train(source)
valid_imgs = get_valid(source, extensions='.png')
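The "partial function" idea can be sketched as a closure. Everything here is a stand-in: list_files is a hypothetical simplified lister, file_getter_sketch is a made-up name, and the temporary tree exists only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

def list_files(path, extensions=None):
    """Hypothetical stand-in for get_files: recursive, non-hidden files only."""
    files = [p for p in Path(path).rglob("*")
             if p.is_file() and not p.name.startswith(".")]
    if extensions is not None:
        files = [p for p in files if p.suffix in extensions]
    return sorted(files)

def file_getter_sketch(suf="", **fixed_kwargs):
    """Return a function that lists files under path/suf, forwarding extra arguments."""
    def _inner(path, **call_kwargs):
        # Arguments given at call time override those fixed at creation time.
        return list_files(Path(path) / suf, **{**fixed_kwargs, **call_kwargs})
    return _inner

# Throwaway directory tree (assumed layout, for illustration only).
root = Path(tempfile.mkdtemp())
for name in ["train/a.png", "train/b.txt", "valid/c.png", "valid/d.txt"]:
    f = root / name
    f.parent.mkdir(parents=True, exist_ok=True)
    f.touch()

get_train = file_getter_sketch(suf="train")
get_valid = file_getter_sketch(suf="valid")
train_files = get_train(root)                      # a.png and b.txt
valid_imgs = get_valid(root, extensions=[".png"])  # c.png only
```

The getter is configured once with suf and can then be reused on any root path, which is why this shape is convenient when the same folder layout appears in several datasets.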
get_image_files
get_image_files returns an L list of all image files found recursively in path. If include directories are given, only files under those directories are returned.
Example:
source = untar_data(URLs.MNIST_TINY)
train_imgs = get_image_files(source, folders='train')
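One plausible way to decide what counts as an "image file" is to collect every extension that Python's mimetypes module maps to an image/* type. The sketch below shows that idea with the standard library only. Treat it as an assumption about the approach, not as a copy of the library's code; is_image_file is a hypothetical helper.

```python
import mimetypes
from pathlib import Path

# Every extension the standard library maps to an image/* MIME type.
image_extensions = {ext for ext, mime in mimetypes.types_map.items()
                    if mime.startswith("image/")}

def is_image_file(name):
    """True if the file name ends in a known image extension (case-insensitive)."""
    return Path(name).suffix.lower() in image_extensions

# Made-up file names, for illustration only.
names = ["train/3/7463.png", "train/3/notes.txt", "valid/7/9.JPG", "labels.csv"]
image_names = [n for n in names if is_image_file(n)]
```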
RandomSplitter
RandomSplitter splits a dataset into training and validation sets. It returns a function that takes a list of objects (for example, filenames) and produces two sets of shuffled indexes, one for train and one for valid.
For example, given a list of 1000 items and valid_pct=0.2, the returned function produces 800 shuffled indexes for the train set and 200 for the valid set.
Example:
source = untar_data(URLs.PETS)/"images"
items = get_image_files(source)[:1000]
split_idx = RandomSplitter(valid_pct=0.2)(items)
len(split_idx),len(split_idx[0]),len(split_idx[1])
Output: (2, 800, 200)
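The same shuffle-then-cut idea can be sketched in plain Python. The name random_splitter_sketch and the seed argument are illustrative assumptions; the real splitter works on the library's own index types, while this version returns two ordinary lists.

```python
import random

def random_splitter_sketch(valid_pct=0.2, seed=None):
    """Return a function that splits item indexes into (train, valid) lists."""
    def _inner(items):
        rng = random.Random(seed)
        idxs = list(range(len(items)))
        rng.shuffle(idxs)
        cut = int(valid_pct * len(items))
        # The first `cut` shuffled indexes become valid, the rest become train.
        return idxs[cut:], idxs[:cut]
    return _inner

# Made-up file names standing in for the dataset items.
items = [f"img_{i}.jpg" for i in range(1000)]
train_idx, valid_idx = random_splitter_sketch(valid_pct=0.2, seed=42)(items)
print(len(train_idx), len(valid_idx))  # 800 200
```

Note that the splitter returns indexes, not the items themselves, so the same split can be applied to any parallel list (filenames, labels, and so on).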
Categorize
Categorize converts label strings to vocab ids and back.
Example:
tcat = Categorize(vocab=['cat','dog'])
lbl = tcat('cat'); lbl
Output: 0
# For reversing/decoding
tcat.decode(0)
Output: 'cat'
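The label-to-id mapping can be pictured as a sorted vocabulary plus a reverse lookup dictionary. CategorizeSketch below is a hypothetical stand-in written only to show that idea; it returns plain ints and strings rather than the library's own types.

```python
class CategorizeSketch:
    """Hypothetical stand-in: maps labels to vocab ids and back."""
    def __init__(self, vocab):
        self.vocab = sorted(set(vocab))                               # id -> label
        self.o2i = {label: i for i, label in enumerate(self.vocab)}   # label -> id

    def __call__(self, label):
        # Encoding: look up the position of the label in the vocab.
        return self.o2i[label]

    def decode(self, idx):
        # Decoding: index back into the vocab.
        return self.vocab[idx]

tcat = CategorizeSketch(vocab=["cat", "dog"])
cat_id = tcat("cat")   # 'cat' is first in the sorted vocab
dog_id = tcat("dog")
```

Because ids are just positions in the vocab, the first label in a sorted two-item vocab gets id 0 and the second gets id 1.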
data.transform
Transform
Transform uses the metaclass _TfmMeta. The class has two methods, encodes and decodes. When you call an instance like a function (with (), not indexing with [ ]), encodes is invoked automatically through the _call() method defined in the class. decodes is not triggered this way and has to be requested explicitly; it is normally reached through the decode() method defined in the class.
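In a pipeline, where a list of transformations is applied in sequence,
pipe = Pipeline([f2,f3,f1])
pipe.decode() calls decode() on each of the transformations, in the reverse of the order in which they were applied: here f1.decode(), then f3.decode(), then f2.decode() (assuming the transforms keep the listed order). Each decode() calls the corresponding decodes() internally, e.g. f1.decode() calls f1.decodes().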
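A stripped-down sketch of the encodes/decodes pattern, with a pipeline that decodes in reverse order, may help. TransformSketch, PipelineSketch, AddOne and Double are all hypothetical classes written for this note; they leave out the metaclass, type dispatch and ordering logic of the real classes.

```python
class TransformSketch:
    """Hypothetical stand-in for the encodes/decodes pattern."""
    def encodes(self, x): return x
    def decodes(self, x): return x
    def __call__(self, x): return self.encodes(x)   # calling the object encodes
    def decode(self, x): return self.decodes(x)     # decode() wraps decodes()

class AddOne(TransformSketch):
    def encodes(self, x): return x + 1
    def decodes(self, x): return x - 1

class Double(TransformSketch):
    def encodes(self, x): return x * 2
    def decodes(self, x): return x // 2

class PipelineSketch:
    def __init__(self, tfms): self.tfms = list(tfms)
    def __call__(self, x):
        # Encode: apply each transform in the listed order.
        for t in self.tfms: x = t(x)
        return x
    def decode(self, x):
        # Decode: undo the transforms, last one first.
        for t in reversed(self.tfms): x = t.decode(x)
        return x

pipe = PipelineSketch([AddOne(), Double()])
encoded = pipe(3)               # (3 + 1) * 2
decoded = pipe.decode(encoded)  # back to the original value
```

Decoding last-to-first is what makes the round trip work: Double has to be undone before AddOne, since it was applied after it.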
TupleTransform
From the GitHub issue "Tuple transform is in docs, but not in code" (Issue #266, fastai/fastai2), per Sylvain:
"It used to exist, but it’s removed now since all transforms have this behavior (applying over tuple) unless they are ItemTransform."
This is a subclass of Transform. It sets as_item_force = False, which lets the Transform return the result of encodes as a tuple. This allows encodes to be applied selectively to only those items within the tuple that match a criterion.
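"Applying over a tuple, selectively" can be sketched as follows. UpperStrings is a hypothetical class, and the isinstance check is a simplified stand-in for whatever matching rule the real Transform applies to decide which items encodes should touch.

```python
class UpperStrings:
    """Hypothetical transform: its encodes only makes sense for str items."""
    def encodes(self, x: str):
        return x.upper()

    def __call__(self, item):
        if isinstance(item, tuple):
            # Apply encodes element by element, keeping the tuple type.
            return tuple(self._apply(o) for o in item)
        return self._apply(item)

    def _apply(self, o):
        # Selective application: non-matching items pass through untouched.
        return self.encodes(o) if isinstance(o, str) else o

tfm = UpperStrings()
result = tfm(("cat", 3, "dog"))   # only the two strings are transformed
```

The integer in the middle of the tuple is returned unchanged, and the result is still a tuple, which is the behaviour the paragraph above describes.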