Correct for this particular Kaggle competition.
Partially true. I will explain why below.
By default, the fastai library was designed with ease of use in mind, so most of its APIs assume that your dataset has ‘train’, ‘test’ and ‘valid’ directories. Whether that assumption applies depends on your dataset's format and structure and on which fastai function you call, since different functions accept different parameters and serve different purposes.
For this competition, use the ImageClassifierData.from_csv() function to create the training data for the model. Notebook/code example:
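# Star import as used in the fast.ai course notebooks (assumed; not shown in the original cell)
from fastai.conv_learner import *

# Root folder of the competition data
PATH = 'data/kaggle/plant_seedlings/'
!ls {PATH}
labels.csv models/ sample_submission.csv test/ tmp/ train/

# Count the labelled images (minus the header row) and pick validation indices
labels_csv = f'{PATH}labels.csv'
n = len(list(open(labels_csv))) - 1
val_idxs = get_cv_idxs(n)

# Architecture, image size and batch size
arch = resnet50
sz = 299
bs = 32

# Build the data object from the CSV labels, then train on precomputed activations
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_top_down, max_zoom=1.1)
data = ImageClassifierData.from_csv(path=PATH, folder='train', csv_fname=labels_csv, test_name='test', val_idxs=val_idxs, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 5)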
The ImageClassifierData class reads in images and their labels given as numpy arrays.
The ImageClassifierData.from_csv() function reads in images and their labels given as a CSV file. Use this method when the training image labels are given in a CSV file rather than as sub-directories named after the labels.
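If your training images are sorted into sub-directories named after their labels and you still want to go the CSV route, you need to produce the labels file yourself. Here is a minimal sketch using only the Python standard library. The helper name write_labels_csv, the column names file and species, and the choice to store paths relative to the training folder are my own assumptions for illustration, not something fastai prescribes, so adjust them to whatever your loader expects.

```python
import csv
from pathlib import Path

def write_labels_csv(train_dir, csv_path):
    """Write one (file, species) row per image found under train_dir/<label>/.

    Hypothetical helper: the column names and relative-path format are assumptions.
    Returns the number of image rows written (the header is not counted).
    """
    rows = []
    for label_dir in sorted(Path(train_dir).iterdir()):
        if not label_dir.is_dir():
            continue  # ignore stray files next to the label folders
        for img in sorted(label_dir.iterdir()):
            if img.is_file():
                # Path relative to train_dir, e.g. "Maize/img001.png"
                rows.append((f"{label_dir.name}/{img.name}", label_dir.name))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "species"])  # header row
        writer.writerows(rows)
    return len(rows)
```

You would call it once, for example write_labels_csv('data/kaggle/plant_seedlings/train', 'data/kaggle/plant_seedlings/labels.csv'), and then check a few rows by hand before training.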
ImageClassifierData.from_csv() parameters:
- path: a root path of the data (used for storing trained models, precomputed values, etc.)
- folder: a name of the folder in which training images are contained.
- csv_fname: a name of the CSV file which contains target labels.
- tfms: transformations (for data augmentations), e.g. output of tfms_from_model
- val_idxs: index of images to be used for validation, e.g. output of get_cv_idxs. If None, default arguments to get_cv_idxs are used.
- test_name: a name of the folder which contains test images.
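To make the val_idxs parameter more concrete, here is a rough sketch of what "count the rows, then hold out a random subset of row indices" looks like in plain Python. This is a stand-in written with the standard library, not fastai's get_cv_idxs implementation; the names count_rows and pick_val_idxs, the 20% split and the seed are assumptions chosen for illustration.

```python
import random

def count_rows(csv_path):
    """Number of data rows in a CSV file that has a one-line header."""
    with open(csv_path) as f:
        return sum(1 for _ in f) - 1

def pick_val_idxs(n, val_pct=0.2, seed=42):
    """Return a reproducible random sample of row indices to hold out.

    Hypothetical helper: val_pct and seed are illustrative defaults.
    """
    n_val = int(val_pct * n)
    # A seeded Random instance gives the same split on every run
    return sorted(random.Random(seed).sample(range(n), n_val))
```

With n rows, pick_val_idxs(n) gives int(0.2 * n) distinct indices in the range 0 to n - 1. Every image whose row index is in that list goes to the validation set and the rest are used for training.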
As you can see, you can be explicit about how you intend a fastai function to work by passing in the path and the folder names of your train and test sets. That gives you a certain degree of control.
I hope that clarifies things.
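Since you are naming the folders yourself, it can save some confusion to check that they actually exist before building the data object. A small sketch, assuming the same argument meanings as in the call above (folder and test_name relative to path, csv_fname already a full path as in my example). The helper missing_inputs is hypothetical and not part of fastai.

```python
import os

def missing_inputs(path, folder, csv_fname, test_name=None):
    """Return the expected folders/files that do not exist on disk.

    Hypothetical helper: folder and test_name are resolved relative to path,
    while csv_fname is used as given.
    """
    expected = [os.path.join(path, folder), csv_fname]
    if test_name is not None:
        expected.append(os.path.join(path, test_name))
    return [p for p in expected if not os.path.exists(p)]
```

An empty list means everything the call refers to is in place. Anything else tells you which path to fix first.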