A Brief Guide to Test Sets in v2 (you can do labelled now too!)

My guess would be that padding is being done at the batch level? @sgugger any ideas? (I haven’t played with text data yet so I’m unsure!)

I’d need more details and a reproducer to investigate this.

1 Like

I’m working on a reproducible example (the dataset I’m working with is under NDA). In the meantime, for anyone running into the same problem, passing bs=1 to the dataloader gets accurate results.

I’m not getting quite the same behavior, @leviritchie, but I am seeing something odd. Examine the results using 2, 3, and 4 rows (this is taken from a model trained on IMDB_SAMPLE):

dl = learn.dls.test_dl(df.iloc[0:2])
learn.get_preds(dl=dl)

dl = learn.dls.test_dl(df.iloc[0:3])
learn.get_preds(dl=dl)

dl = learn.dls.test_dl(df.iloc[0:4])
learn.get_preds(dl=dl)

Here’s the results:

(tensor([[0.3101, 0.6899],
         [0.7826, 0.2174]]), None)

(tensor([[0.3101, 0.6899],
         [0.9947, 0.0053],
         [0.7826, 0.2174]]), None)

(tensor([[0.3101, 0.6899],
         [0.9947, 0.0053],
         [0.1390, 0.8610],
         [0.7826, 0.2174]]), None)

This behavior doesn’t seem right to me: the second prediction in the 3-row run should match the second prediction in the 2-row run, yet the new row’s prediction appears in the middle instead of at the end (let me know if you need more info; I’ll put the full setup below):

path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')

imdb_lm = DataBlock(blocks=(TextBlock.from_df('text', is_lm=True),),
                    get_x=attrgetter('text'),
                    splitter=RandomSplitter())

dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)
learn = language_model_learner(dls, arch=AWD_LSTM)
learn.fine_tune(5)
learn.save_encoder('enc')
imdb_clas = DataBlock(blocks=(TextBlock.from_df('text'), CategoryBlock),
                      get_x=attrgetter('text'),
                      get_y=attrgetter('label'),
                      splitter=RandomSplitter())

dls = imdb_clas.dataloaders(df, bs=64, seq_len=72)
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy).load_encoder('enc')
learn.fine_tune(5)

Don’t forget that samples are sorted by length; that’s probably what is happening here. I’ll see how we can reverse the ordering automatically in get_preds.
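To illustrate the effect, here is a plain-Python sketch (not fastai internals) of undoing a length-sorted batch order. The sort_idx below is an assumed permutation that matches the 3-row output above:

```python
# Hypothetical illustration: the dataloader sorted three texts by length, so
# batch position i holds the sample whose original row index is sort_idx[i].
sort_idx = [0, 2, 1]                 # assumed ordering for the 3-row example

preds_sorted = [[0.3101, 0.6899],    # original row 0
                [0.9947, 0.0053],    # original row 2
                [0.7826, 0.2174]]    # original row 1

# Invert the permutation to restore the original row order
preds_original = [None] * len(preds_sorted)
for batch_pos, orig_row in enumerate(sort_idx):
    preds_original[orig_row] = preds_sorted[batch_pos]

print(preds_original)
# → [[0.3101, 0.6899], [0.7826, 0.2174], [0.9947, 0.0053]]
```

With the permutation inverted, the first two predictions line up with the 2-row run, and the new row’s prediction comes last, as you’d expect.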

1 Like

I’m aware this is a bit old, but it needed updating: to pass in labels, you need to pass with_labels=True to your call to test_dl (the first post has been updated accordingly).

Confirmed that this fixes the AttributeError I was seeing! Thanks @muellerzr!

Hi @muellerzr, I don’t see how this will work correctly. When creating to_tst with a proc like Normalize, you would want to use the same mapping to z-scores as was fit on to (i.e. the training set), but won’t this method calculate a new mean and std for df_test, resulting in a different mapping? Similarly, how do you know the mapping to categories is the same? Would it not be better to follow the method below to address this:

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)
to_tst = to.new(df_test)
to_tst.process()
1 Like
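To make the concern about Normalize concrete, here is a toy plain-Python sketch (not fastai code; the values are made up) of why statistics must be fit on the training data and reused on the test set:

```python
# Hypothetical single continuous column, e.g. "age"
train_ages = [20.0, 30.0, 40.0]      # assumed training values
test_ages  = [60.0, 70.0]            # assumed test values

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

# Correct: normalize test data with statistics fit on the training set
m, s = mean(train_ages), std(train_ages)
z_correct = [(x - m) / s for x in test_ages]

# Wrong: refitting on the test set produces a different mapping to z-scores
m2, s2 = mean(test_ages), std(test_ages)
z_wrong = [(x - m2) / s2 for x in test_ages]

print(z_wrong)  # → [-1.0, 1.0], which hides that these ages are far above the training mean
```

The refit version maps both test ages to ±1, while the training-fit statistics correctly place them several standard deviations above the training mean.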

No... well, both are right answers, I suppose. All test_dl methods do exactly this; the tabular one, for instance, performs the exact process you describe (you can verify this by examining test_dl in tabular.learner, IIRC). All preprocessing fitted on the training data and applied to the validation set is also applied to any new test set you create. Does this help, @deepmindfulness?

So, in short, test_dl does exactly what you did. They’re both right answers; one just saves you lines of code :slight_smile:

Thanks @muellerzr. So, in terms of the code below: the data produced via test_dl uses the procs fitted on df_main, but when calling to_test.items[cat + cont] you’d see transforms calculated with reference to df_test only, not df_main?

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)
to_test = TabularPandas(df_test, procs, cat_names, cont_names, y_names="salary")
dls = to.dataloaders()
test_dl = dls.test_dl(df_test, with_labels=True)

It shouldn’t be, because here is the exact source for tabular’s test_dl:

@delegates(TabDataLoader.__init__)
def test_dl(self, test_items, rm_type_tfms=None, **kwargs):
    to = self.train_ds.new(test_items)
    to.process()
    return self.valid.new(to, **kwargs)

You can see that we run the train’s procs via to.process() based on the train_ds

If it is a multi-category problem, the vocab will be in learner.dls.multi_categorize.vocab

1 Like

Thanks @muellerzr. Your code snippet makes the processing within the test dataloader clear. However, for the test tabular object and the test dataloader to be consistent, wouldn’t you need:

to = TabularPandas(df_main, procs, cat_names, cont_names, y_names="salary", splits=splits)
to_tst = to.new(df_test)
to_tst.process()
dls = to.dataloaders()
test_dl = dls.test_dl(df_test, with_labels=True)

When working with tree-based models, it may be preferable to work with the tabular object directly, so I think it is important to emphasise that

to_tst = to.new(df_test)
to_tst.process()

is not the same as

to_test = TabularPandas(df_test, procs, cat_names, cont_names, y_names="salary")

In regard to having the training set’s transforms applied: no, the latter does not do that, which I believe you need. However, the train_ds is a TabularPandas object, IIRC.

I am using your cross-validation notebook on WWF and I guess I don’t understand this line:

dsrc = Datasets(train_imgs+tst_imgs, tfms=[[PILImage.create], [parent_label, Categorize]],
                splits=split_list)

I’m trying to understand what goes here for my dataset:

[parent_label, Categorize]

I mean, I think I get it, but what I don’t understand is where I can find this information in the documentation. My y isn’t a label; it’s actually a regression target.
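For what it’s worth, here is a rough plain-Python sketch (not fastai internals; the file paths are hypothetical) of what the y pipeline [parent_label, Categorize] does conceptually:

```python
from pathlib import Path

# Hypothetical image paths, labelled by their parent folder name
paths = [Path("train/cat/1.png"), Path("train/dog/2.png"), Path("train/cat/3.png")]

labels = [p.parent.name for p in paths]        # what parent_label extracts
vocab = sorted(set(labels))                    # Categorize builds a vocab...
label2idx = {v: i for i, v in enumerate(vocab)}
ys = [label2idx[l] for l in labels]            # ...and maps each label to an int

print(vocab)  # → ['cat', 'dog']
print(ys)     # → [0, 1, 0]
```

For a regression target, the second half would not apply: instead of a label-to-index mapping, your y transform would return a float directly.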

After I tested it, I got the results below.

# Predictions
preds = learn1.get_preds(dl=test_dls)
# Calculate the accuracy
acc = accuracy(inp=preds[0], targ=preds[1]).item()
# Trace the results
print(f"Test accuracy: %{round(acc * 100, 2)}")

>> Test accuracy: %99.85
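For reference, a minimal plain-Python sketch of what that accuracy call computes, assuming preds[0] holds (n, n_classes) probabilities and preds[1] holds integer targets (the values below are made up):

```python
# Hypothetical predictions and labels for three samples, two classes
probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
targets = [1, 0, 0]

def argmax(row):
    # index of the largest probability in one row
    return max(range(len(row)), key=row.__getitem__)

# accuracy = fraction of samples whose argmax matches the target
correct = sum(argmax(p) == t for p, t in zip(probs, targets))
acc = correct / len(targets)
print(f"Test accuracy: %{round(acc * 100, 2)}")  # → Test accuracy: %66.67
```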

and for the validate method:

round((1 - learn1.validate(dl=test_dls)[0]) * 100, 2)

>> 99.4

Is this accurate, or am I doing something wrong?

Experiment info
data = URLs.MNIST
vision_learner(arch = resnet18)
fine_tune for 20 epochs, early stopping at epoch 16; best epoch (11): 0.013064 (train_loss), 0.029337 (valid_loss), 0.992857 (accuracy), 01:28 (time)

What type of object is “to” in your example?

What is valid_ds and tst_imgs in this regard?