Making predictions in v1

Is there any guarantee about the order of these results with regard to the order of the input CSV? I ask because, as far as I can tell, the y value is always 0, since there is no place to tell the model what the truth is for the test data. So I’m merging the predictions with my CSV downstream.

I’m getting decent (~95%) accuracy on my validation, but 50% on my test data. It’s possible that I’m overfitting, but I suspect I’m doing data munging incorrectly.

In v0.7 you could get the actual order of test samples from data.test_ds.fnames. In v1 I don’t see it, but I’m pretty sure [filepath.stem for filepath in list(path_of_your_test_folder.iterdir())], with path_of_your_test_folder being a pathlib Path, should give you the actual order of your test files.
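
A rough sketch of that idea (test_path stands in for your own test folder; I added an explicit sorted() since iterdir() itself doesn’t guarantee any particular order):

from pathlib import Path

test_path = Path('data/test')  # placeholder for your own test folder

# fastai appears to read test files in the directory's sort order,
# so sorting explicitly should reproduce the prediction order
test_ids = [fp.stem for fp in sorted(test_path.iterdir())]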

This is really helpful. It is quite surprising to me that the order is based on the sort order of the directory rather than the order in the CSV!

Just looping back to confirm that mapping based on the sort order in the directory (as @tetelias suggested) rather than the CSV (as I had been doing previously) solved my problem. My test set predictions were 95% accurate, very similar to my validation set.
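
For anyone else hitting this, my downstream merge looks roughly like this (test_ids, preds, and sample_df are placeholders from my own pipeline):

import pandas as pd

# test_ids: file stems in directory sort order (see earlier post)
# preds: prediction tensor from learn.get_preds(is_test=True)
pred_df = pd.DataFrame({'id': test_ids, 'pred': preds.argmax(1).numpy()})

# merge on filename instead of relying on CSV row order
submission = sample_df.merge(pred_df, on='id')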

Thanks for the feedback. We’ll be looking at how to make matching on the test set easier in future developments.

This is now: test_ds.x.

In v1.0.5 installed from pip, data has a test_dl but not a test_ds.

Same, using the master branch.

I loaded my data like this:

data = ImageDataBunch.from_csv(
    DATA_PATH,
    folder='processed_train',
    test=DATA_PATH/'processed_test',
    csv_labels='train.csv',
    sep=' ',
    suffix='.png',
    bs=32
)

I get a train_dl, train_ds, valid_dl, valid_ds, and a test_dl but no test_ds.

Ahh, digging a little further, it appears the test filepaths are available here: data.test_dl.dl.dataset.x

If I call get_preds on the test set like this:
test_preds, test_y = learn.get_preds(is_test=True)

Will the order of the images in learn.data.test_dl.dl.dataset.x match the order of the predictions in test_preds?

Thanks!

Should be. BTW some of those classes have __getattr__ defined so you can probably get rid of dl or dataset in your call.

Yes, data.test_dl.dataset.x works in place of data.test_dl.dl.dataset.x.
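
Putting it together, a sketch of pairing files with predictions (assuming, per the discussion above, that the dataset order matches the prediction order):

test_preds, _ = learn.get_preds(is_test=True)
fnames = data.test_dl.dataset.x  # filepaths in dataset order
results = list(zip(fnames, test_preds))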

Thanks Jeremy, indeed I can get rid of 'dl' in my call.

Unfortunately, get_preds() is giving me results I don’t think I understand. For example, when I call get_preds on the validation set I expect it to return the predictions and known targets. However, some of the targets it returns don’t match any of the labels from my csv file.

Steps

  1. Load data

data = ImageDataBunch.from_csv(DATA_PATH, folder='processed_train', test=DATA_PATH/'processed_test', csv_labels='train.csv', sep=' ', suffix='.png')

  2. Create learner and train

import torch.nn.functional as F
import torchvision.models as tvm  # assuming tvm aliases torchvision.models

loss_fn = F.binary_cross_entropy_with_logits
learn = ConvLearner(data, tvm.resnet18, loss_fn=loss_fn, metrics=fbeta)
learn.fit_one_cycle(1, 0.01)

  3. Get predictions and targets from validation set

preds, targets = learn.get_preds()

  4. Inspect a target (28 classes from my train.csv)

This is an unusual combination, so I wanted to have a look at this particular image. The above target should correspond to a label of '1 2 3 4' in my train.csv… however, this label doesn’t exist!

My train.csv looks like this (28 possible labels):
[screenshot of train.csv]

Have I made some silly mistake? Is my expectation incorrect? Or is there an issue with what get_preds is returning or perhaps how the labels were read in from train.csv?

Note that there is also a nice holdout capability that can reduce the amount of duplicated code you need to get validation and test predictions. Instead of data.test_dl and data.valid_dl you can use data.holdout(is_test=True) and data.holdout(is_test=False), respectively.
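
In other words (a minimal sketch of the holdout API described above):

valid_dl = data.holdout(is_test=False)  # same DataLoader as data.valid_dl
test_dl = data.holdout(is_test=True)    # same DataLoader as data.test_dl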

I am unable to make predictions with my test data: I always get CUDA out of memory, independent of batch size.
The progress bar reaches the end, so 100% of my test data passes through the model, but when the output is assembled, it errors.
I don’t have problems with the valid set.

out = learn.get_preds(is_test=True)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-132-588810aa18f3> in <module>
----> 1 out = learn.get_preds(is_test=True)

~/fastai/fastai/basic_train.py in get_preds(self, is_test)
    175     def get_preds(self, is_test:bool=False) -> List[Tensor]:
    176         "Return predictions and targets on the valid or test set, depending on `is_test`."
--> 177         return get_preds(self.model, self.data.holdout(is_test), cb_handler=CallbackHandler(self.callbacks))
    178 
    179 @dataclass

~/fastai/fastai/basic_train.py in get_preds(model, dl, pbar, cb_handler)
     36 def get_preds(model:Model, dl:DataLoader, pbar:Optional[PBar]=None, cb_handler:Optional[CallbackHandler]=None) -> List[Tensor]:
     37     "Predict the output of the elements in the dataloader."
---> 38     return [torch.cat(o).cpu() for o in zip(*validate(model, dl, pbar=pbar, cb_handler=cb_handler, average=False))]
     39 
     40 def validate(model:Model, dl:DataLoader, loss_fn:OptLossFunc=None,

~/fastai/fastai/basic_train.py in validate(model, dl, loss_fn, metrics, cb_handler, pbar, average)
     47         for xb,yb in progress_bar(dl, parent=pbar, leave=(pbar is not None)):
     48             if cb_handler: xb, yb = cb_handler.on_batch_begin(xb, yb, train=False)
---> 49             val_metrics.append(loss_batch(model, xb, yb, loss_fn, cb_handler=cb_handler, metrics=metrics))
     50             if not is_listy(yb): yb = [yb]
     51             nums.append(yb[0].shape[0])

~/fastai/fastai/basic_train.py in loss_batch(model, xb, yb, loss_fn, opt, cb_handler, metrics)
     17     if not is_listy(xb): xb = [xb]
     18     if not is_listy(yb): yb = [yb]
---> 19     out = model(*xb)
     20     out = cb_handler.on_loss_begin(out)
     21 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
     90     def forward(self, input):
     91         for module in self._modules.values():
---> 92             input = module(input)
     93         return input
     94 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    475             result = self._slow_forward(*input, **kwargs)
    476         else:
--> 477             result = self.forward(*input, **kwargs)
    478         for hook in self._forward_hooks.values():
    479             hook_result = hook(self, input, result)

~/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/conv.py in forward(self, input, output_size)
    724         return F.conv_transpose2d(
    725             input, self.weight, self.bias, self.stride, self.padding,
--> 726             output_padding, self.groups, self.dilation)
    727 
    728 

RuntimeError: CUDA error: out of memory

It is probably how I created my test ds: the SegmentationDataset expects x and y values, so I just called the constructor with (x, x), as I don’t have y values for test data.
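
Roughly like this (test_x is my own list of test image paths):

# SegmentationDataset wants both x and y, so the test images are
# passed in both positions as a workaround for the missing labels
test_ds = SegmentationDataset(test_x, test_x)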

Then used:

def get_tfm_datasets(path, val_idxs, size):
    # unpack the three datasets (assuming get_datasets returns train, valid, test)
    train_ds, valid_ds, test_ds = get_datasets(path, val_idxs)
    tfms = get_transforms(do_flip=True, max_rotate=4, max_lighting=0.2, max_warp=0.15)
    return transform_datasets(train_ds, valid_ds, test_ds=test_ds, tfms=tfms,
                              tfm_y=True, size=size, padding_mode='border')
  
train_tds, _, _ = get_tfm_datasets(PATH128, range(400), 128)

to get transformed datasets.

Any idea?

My GPU RAM is not being freed:

=== Software === 
python version  : 3.6.6
fastai version  : 1.0.6.dev0
torch version   : 1.0.0.dev20181015
nvidia driver   : 396.54
torch cuda ver  : 9.2.148
torch cuda is   : available
torch cudnn ver : 7104
torch cudnn is  : enabled

=== Hardware === 
nvidia gpus     : 1
torch available : 1
  - gpu0        : 8119MB | Quadro P4000

=== Environment === 
platform        : Linux-4.4.0-130-generic-x86_64-with-debian-stretch-sid
distro          : Ubuntu 16.04 Xenial Xerus
conda env       : fastai
python          : /home/paperspace/anaconda3/envs/fastai/bin/python
sys.path        : 
/home/paperspace/anaconda3/envs/fastai/lib/python36.zip
/home/paperspace/anaconda3/envs/fastai/lib/python3.6
/home/paperspace/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/paperspace/fastai
/home/paperspace/anaconda3/envs/fastai/lib/python3.6/site-packages/IPython/extensions
/home/paperspace/.ipython

Wed Oct 17 08:21:36 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:00:05.0 Off |                  N/A |
| 46%   34C    P8     5W / 105W |   8105MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2019      C   ...rspace/anaconda3/envs/fastai/bin/python  8095MiB |
+-----------------------------------------------------------------------------+

get_preds isn’t likely to be a great option for segmentation - you don’t want all your images in memory at once! Try doing your work a batch at a time.
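
Something along these lines, as a rough sketch (each batch’s output is moved to the CPU immediately so GPU memory is released between batches):

import torch

learn.model.eval()
preds = []
with torch.no_grad():  # no gradient buffers during inference
    for xb, yb in learn.data.test_dl:
        out = learn.model(xb)
        preds.append(out.cpu())  # get the batch off the GPU right away
preds = torch.cat(preds)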

I think you should look at your train_ds.classes. It’s possible that the 0 label corresponds to another number in your dataframe.
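
For example (a sketch, assuming a multi-label target vector like yours):

# Target indices refer to positions in train_ds.classes (the sorted
# class names), not to the raw numbers in the CSV, so map them back:
classes = data.train_ds.classes
preds, targets = learn.get_preds()
label = ' '.join(str(classes[i]) for i, v in enumerate(targets[0]) if v == 1)
print(label)  # compare this against a row in train.csv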

With an imaging dataset and a binary classifier using the default cross-entropy loss, I’m getting good classification (98% correct on a balanced set) on data that is visually distinguishable to a human (that is, I believe it’s plausible to be this good).

The one odd thing to me is that in the test set, the log_prob is always one of two values:

learn = ConvLearner(data, 
                    tvm.resnet50, 
                    metrics=[accuracy, dice], 
                    callback_fns=ShowGraph)

learn.fit_one_cycle(1)

# Update all of the layers
learn.unfreeze()
learn.fit_one_cycle(24, slice(1e-4, 2e-2))#, pct_start=0.05)

# Predict on the test set
test_output = learn.get_preds(is_test=True)
log_probs, y = test_output

mpl.pyplot.hist(log_probs)  # histogram of the predicted values

[histogram showing only two distinct values]

It seems like I must be doing something wrong - I would expect a distribution of values rather than just two values. Any pointers?

I also ran into complications with predictions. I am on fastai 1.0.28. Working through the kaggle standard example of dogs-vs-cats with structure:

$ ls datasets/dogs-vs-cats/

README.md sampleSubmission.csv test1 test1.zip train train.zip

I create my data bunch as:

data = ImageDataBunch.from_name_re(path=path, 
                               fnames=fnames, 
                               pat=r"/(dog|cat)\.\d+\.jpg$", 
                               ds_tfms=get_transforms(),
                               size=224, 
                               bs=BATCH_SIZE,
                               test='test1',
                               suffix='.jpg')

and the test set seems to be empty:

>>> data.test_ds

LabelList
y: CategoryList (1 items)
[]...
Path: datasets/dogs-vs-cats/train
x: ImageItemList (1 items)
[]...
Path: datasets/dogs-vs-cats/train

Am I invoking the constructor incorrectly?

Also, to confirm: the test set does not have to conform to the same pat regexp, right? Then do we need suffix, and if so, is it used for the test set only (since train/validation should be governed by pat)?