How to make predictions on a list of "new" images

reyemarr · October 31, 2018, 3:08pm

Hi,

In lesson 1 (breeds), we are given a way to make predictions on individual images towards the end of the Jupyter notebook, as follows:

Method 1.
trn_tfms, val_tfms = tfms_from_model(arch, sz)
ds = FilesIndexArrayDataset([fn], np.array([0]), val_tfms, PATH)
dl = DataLoader(ds)
preds = learn.predict_dl(dl)
np.argmax(preds)

Method 2.
trn_tfms, val_tfms = tfms_from_model(arch, sz)
im = val_tfms(open_image(PATH + fn)) # open_image() returns numpy.ndarray
preds = learn.predict_array(im[None])
np.argmax(preds)

I am wondering if there is a way to generalize the approach to make predictions on a list of images and not just a single one.

I know I could accomplish the task by making a function and looping through all the images I want to predict on, but it seems to me that the fastai library is smarter than that and probably already has a method to do what I want.

Any advice?

Thanks,

MR

reyemarr · November 1, 2018, 8:22am

Anyone ?? @jeremy @rachel Thanks guys

AjitB · November 1, 2018, 12:07pm

In the example below, I have two folders, train and test, containing the images. test=test_folder specifies the folder in which to look for the test images.

…
path = ''
train_folder = 'train'
test_folder = 'test'

data = ImageDataBunch.from_df(path=path, df=df_train, folder=train_folder, test=test_folder, ds_tfms=get_transforms(), size=224, bs=10)
data.normalize(imagenet_stats)
…

After you are done training/validating the images, you do this:

log_preds, y = learn.TTA()
accuracy(log_preds, y)

log_preds_test = learn.get_preds(is_test = True)
log_preds_test

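outputs = log_preds_test[0]  # raw model outputs, one row per test image
pred_idx = np.argmax(outputs, axis=1)  # index of the top class for each image
preds_classes = [data.classes[i] for i in pred_idx]
probs = np.exp(outputs)  # only meaningful if the outputs are log-probabilities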

submission = pd.DataFrame({'imgname': os.listdir('test'), 'label': preds_classes})
submission.to_csv('test_classification_results.csv', index=False)

Please note that the above example is for multi-class classification (one label per image, picked with argmax). The generated CSV file will have two columns, ‘imgname’ and ‘label’.
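
One thing to watch in the snippet above: os.listdir returns file names in arbitrary order, so it is only safe if that order happens to match the order of the test dataset. Below is a minimal sketch of pairing file names with predictions explicitly. The file names, classes, and scores are made-up stand-ins; in practice the names should come from the test dataset itself, in the same order as the predictions.

```python
import csv
import io

def build_submission(filenames, scores, classes):
    """Pair each file name with the class whose score is highest."""
    if len(filenames) != len(scores):
        raise ValueError("need exactly one row of scores per file name")
    rows = []
    for name, row in zip(filenames, scores):
        best = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        rows.append({"imgname": name, "label": classes[best]})
    return rows

def to_csv_text(rows):
    """Serialize the rows with the same two columns as the post above."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["imgname", "label"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical stand-ins for data.classes, the test file names, and model outputs.
classes = ["beagle", "pug", "terrier"]
filenames = ["a.jpg", "b.jpg"]
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]

rows = build_submission(filenames, scores, classes)
csv_text = to_csv_text(rows)
```

The length check is there so that a mismatch between the file list and the predictions fails loudly instead of silently mislabeling images.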

reyemarr · November 1, 2018, 2:05pm

Thanks @AjitB. However, I think that in your suggested approach, the “test” images in the learn object are the images used during training. In my case, I want to predict on a new set of images which were not available during training, so they were not passed on to the learn object.

I did figure out a way to do what I want, which essentially is to create a new data object as follows:

def get_data2(sz, bs): # sz: image size, bs: batch size
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
    data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='myTest',
                                   val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
    return data
data2 = get_data2(sz, bs)
trn_tfms, val_tfms = tfms_from_model(arch, sz)
ds = data2.test_ds
dl = DataLoader(ds)
preds = learn.predict_dl(dl)
np.argmax(preds,axis=1)

The key change in this code is that the test_name argument inside the get_data2 function now points to a new directory with my new images.

One “problem” with this approach is that I am not applying TTA to the new predictions, which is not ideal. I am still trying to figure out how to do that. Any thoughts?

Does anyone know how to update the learn object without training? In essence, I want to replace the test images in it with my new set of images, so that I can call learn.TTA to make predictions.

Cheers,

reyemarr · November 1, 2018, 2:38pm

I figured it out…
It is just a matter of calling learn.set_data(get_data2(sz, bs)), with the get_data2 function pointing to the directory that holds the new test data, and then following the process detailed in the notebook, that is:

log_preds, y = learn.TTA(is_test=True)
probs = np.mean(np.exp(log_preds),0)
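
To make the averaging step concrete, here is a small NumPy sketch with made-up numbers. It assumes, as in the lesson notebook, that learn.TTA returns log-probabilities stacked along a leading augmentation axis, i.e. shape (n_augmentations, n_images, n_classes). The array below is a hypothetical stand-in, not real model output.

```python
import numpy as np

# Hypothetical probabilities: 2 augmented passes, 3 images, 2 classes.
probs_per_aug = np.array([
    [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]],
    [[0.7, 0.3], [0.2, 0.8], [0.3, 0.7]],
])
log_preds = np.log(probs_per_aug)  # stand-in for the first value returned by learn.TTA

# Undo the log, then average over the augmentation axis (axis 0).
probs = np.mean(np.exp(log_preds), 0)  # shape: (n_images, n_classes)

# One predicted class index per image.
pred_idx = np.argmax(probs, axis=1)
```

Note that the third image is a tie in the first pass and only gets a clear winner after averaging, which is the point of TTA. Averaging the log-probabilities directly would give a geometric rather than an arithmetic mean of the probabilities.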

Hope this helps someone else in the future …

MR

AjitB · November 2, 2018, 9:39am

Hi @reyemarr,
I have verified that the images in the test folder are not used for training or validation.
It can be confirmed by looking at the values of the following:

data.train_ds
data.valid_ds
data.test_ds

len(data.train_ds) + len(data.valid_ds) = Number of images in the train folder
len(data.test_ds) = Number of images in the test folder
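
The length check above shows that the counts add up; a stricter check is that no file appears in more than one split. Here is a minimal sketch using plain lists of file names. These lists are hypothetical stand-ins for whatever file listing your fastai version exposes on each dataset.

```python
def check_splits(train_files, valid_files, test_files):
    """Return the file names shared between any two of the three splits."""
    train, valid, test = set(train_files), set(valid_files), set(test_files)
    return {
        "train_valid": sorted(train & valid),
        "train_test": sorted(train & test),
        "valid_test": sorted(valid & test),
    }

# Hypothetical file names standing in for the three datasets.
train_files = ["train/1.jpg", "train/2.jpg", "train/3.jpg"]
valid_files = ["train/4.jpg"]
test_files = ["test/1.jpg", "test/2.jpg"]

overlaps = check_splits(train_files, valid_files, test_files)
leak_free = not any(overlaps.values())  # True when every overlap list is empty
```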

Definition of the ‘class ImageDataBunch’ can be found here:

github.com/fastai/fastai/blob/master/fastai/vision/data.py#L264


mnist_stats = ([0.15]*3, [0.15]*3)


def channel_view(x:Tensor)->Tensor:
    "Make channel the first axis of `x` and flatten remaining axes"
    return x.transpose(0,1).contiguous().view(x.shape[1],-1)


def _get_fns(ds, path):
    "List of all file names relative to `path`."
    return [str(fn.relative_to(path)) for fn in ds.x]


class ImageDataBunch(DataBunch):
    @classmethod
    def create(cls, train_ds, valid_ds, test_ds=None, path:PathOrStr='.', bs:int=64, ds_tfms:Optional[TfmList]=None,
               num_workers:int=defaults.cpus, tfms:Optional[Collection[Callable]]=None, device:torch.device=None,
               collate_fn:Callable=data_collate, size:int=None, **kwargs)->'ImageDataBunch':
        "Factory method. `bs` batch size, `ds_tfms` for `Dataset`, `tfms` for `DataLoader`."
        datasets = [train_ds,valid_ds]
        if test_ds is not None: datasets.append(test_ds)
        if ds_tfms: datasets = transform_datasets(*datasets, tfms=ds_tfms, size=size, **kwargs)
        dls = [DataLoader(*o, num_workers=num_workers) for o in
               zip(datasets, (bs,bs*2,bs*2), (True,False,False))]

The definition of the fit method can be found here:

github.com/fastai/fastai/blob/master/fastai/basic_train.py#L69




def train_epoch(model:nn.Module, dl:DataLoader, opt:optim.Optimizer, loss_func:LossFunction)->None:
    "Simple training of `model` for 1 epoch of `dl` using optim `opt` and loss function `loss_func`."
    model.train()
    for xb,yb in dl:
        loss = loss_func(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()


def fit(epochs:int, model:nn.Module, loss_func:LossFunction, opt:optim.Optimizer,
        data:DataBunch, callbacks:Optional[CallbackList]=None, metrics:OptMetrics=None)->None:
    "Fit the `model` on `data` and learn using `loss` and `opt`."
    cb_handler = CallbackHandler(callbacks, metrics)
    pbar = master_bar(range(epochs))
    cb_handler.on_train_begin(epochs, pbar=pbar, metrics=metrics)

    exception=False
    try:
        for epoch in pbar:
            model.train()