Test dataset is missing labels

data source : https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite)

from fastai.tabular import *  # fastai v1; also brings in pd, Path, DatasetType

path = Path('DataSet')
# The Statlog files are space-separated with no header row; column 36 is the class label
df = pd.read_csv('DataSet/sat.trn', sep=' ', header=None)
df1 = pd.read_csv('DataSet/sat.tst', sep=' ', header=None)
test = TabularList.from_df(df1, path=path)
data = (TabularList.from_df(df, path=path, cont_names=list(range(36)), procs=Normalize)
        .split_by_rand_pct(0.2)
        .label_from_df(cols=36)
        .add_test(test)
        .databunch(bs=64))
data.show_batch(10, ds_type=DatasetType.Test)

The output looks like this:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 target
0.7826 0.8044 0.1566 -0.1941 0.5116 0.8179 0.1663 -0.1901 0.5327 0.8377 0.4199 0.0233 0.4959 0.6765 0.5094 0.1147 0.5191 0.8639 1.1195 0.2738 0.8330 1.0529 1.1365 0.2803 0.7237 1.0267 0.5757 0.2185 0.7423 1.0399 0.5837 0.2215 0.7670 1.0525 0.8357 0.2250 1
0.4898 0.8044 0.1566 -0.1941 0.5116 0.8179 0.4047 0.0190 0.5327 0.8377 0.4199 0.2324 0.4959 0.8512 1.1064 0.2721 0.8150 1.0384 1.1195 0.2738 0.8330 1.2714 1.1365 0.2803 0.7237 1.0267 0.5757 0.2185 0.7423 1.0399 0.8228 0.2215 0.7670 0.8771 0.2941 0.0155 1

I don’t know why fastai doesn’t pick up the labels for my test dataset, or how to add them manually.

The test set is always unlabeled. If you want a labeled test set, follow my notebook here: https://github.com/muellerzr/fastai-Experiments-and-tips/blob/master/Test%20Set%20Generation/Labeled_Test_Set.ipynb

That seems like a strange solution. In effect, you create two databunches. That may use more resources, and I don’t know whether the procs (Normalize) will work correctly. The procs should depend on the training set only, but now there are two versions of them.

I do use two databunches, but I reuse the original procs, so they’re applied the same way to both.
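To illustrate the point about the procs, here is a minimal, library-agnostic sketch in plain Python. It is not fastai code and not taken from the linked notebook: the helpers `fit_normalizer` and `apply_normalizer` and the tiny `train`/`test` lists are hypothetical, made up only for illustration. The idea it shows is that the normalization statistics are computed once from the training rows and then reused unchanged on the test rows.

```python
from statistics import mean, pstdev

def fit_normalizer(train_rows):
    """Compute per-column mean and std from the training rows only."""
    columns = list(zip(*train_rows))
    means = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds

def apply_normalizer(rows, means, stds):
    """Normalize any rows (train or test) with already-fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical toy data: two features per row
train = [[92.0, 115.0], [84.0, 102.0], [76.0, 89.0]]
test = [[80.0, 102.0], [100.0, 128.0]]

means, stds = fit_normalizer(train)                 # fitted on train only
train_norm = apply_normalizer(train, means, stds)
test_norm = apply_normalizer(test, means, stds)     # same statistics reused
```

Because the test rows never contribute to `means` or `stds`, building a second databunch does not create a second version of the normalization, as long as the statistics fitted on the training set are the ones passed along.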