Test dataset is missing labels

erow · August 28, 2019, 3:28am

Data source: https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite)

from fastai.tabular import *  # fastai v1 (the star import also provides pd and Path)

path = Path('DataSet')
df = pd.read_csv('DataSet/sat.trn', sep=' ', header=None)   # training data, label in column 36
df1 = pd.read_csv('DataSet/sat.tst', sep=' ', header=None)  # test data, same layout
test = TabularList.from_df(df1, path=path)
data = (TabularList.from_df(df, path=path, cont_names=list(range(36)), procs=[Normalize])
        .split_by_rand_pct(0.2)
        .label_from_df(cols=36)
        .add_test(test)
        .databunch(bs=64))
data.show_batch(10, ds_type=DatasetType.Test)

The output looks like this:

0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	target
0.7826	0.8044	0.1566	-0.1941	0.5116	0.8179	0.1663	-0.1901	0.5327	0.8377	0.4199	0.0233	0.4959	0.6765	0.5094	0.1147	0.5191	0.8639	1.1195	0.2738	0.8330	1.0529	1.1365	0.2803	0.7237	1.0267	0.5757	0.2185	0.7423	1.0399	0.5837	0.2215	0.7670	1.0525	0.8357	0.2250	1
0.4898	0.8044	0.1566	-0.1941	0.5116	0.8179	0.4047	0.0190	0.5327	0.8377	0.4199	0.2324	0.4959	0.8512	1.1064	0.2721	0.8150	1.0384	1.1195	0.2738	0.8330	1.2714	1.1365	0.2803	0.7237	1.0267	0.5757	0.2185	0.7423	1.0399	0.8228	0.2215	0.7670	0.8771	0.2941	0.0155	1

I don’t know why fastai doesn’t add the labels for my test dataset correctly, or how to add them manually.

muellerzr · August 28, 2019, 3:29am

The test set is always unlabeled. If you want a labeled test set, follow my notebook here: https://github.com/muellerzr/fastai-Experiments-and-tips/blob/master/Test%20Set%20Generation/Labeled_Test_Set.ipynb

erow · August 28, 2019, 3:46am

That seems like a strange solution. In effect, you create two databunches. That may take more resources, and I don’t know whether the procs (Normalize) will work well. The procs should depend on the training set only, but now you create two versions.

muellerzr · August 28, 2019, 3:46am

I use two, but I reuse the original procs, so they’re applied the same way.
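
To illustrate the point about procs, here is a minimal pandas-only sketch of the idea: the normalization statistics are computed from the training frame once and then reused on the labeled test frame. This is not the code from the linked notebook and not fastai's internals. The helper names `fit_normalizer` and `apply_normalizer` are hypothetical, and the data is a synthetic stand-in for sat.trn / sat.tst with only 4 feature columns.

```python
import numpy as np
import pandas as pd

def fit_normalizer(train_df, cont_names):
    """Compute per-column mean and std from the training frame only."""
    means = train_df[cont_names].mean()
    stds = train_df[cont_names].std()
    return means, stds

def apply_normalizer(df, cont_names, means, stds, eps=1e-7):
    """Return a copy of df whose cont_names columns are scaled with the given statistics."""
    out = df.copy()
    out[cont_names] = (df[cont_names] - means) / (stds + eps)
    return out

# Synthetic stand-in data (assumption): 4 float feature columns, label in column 4.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.integers(0, 255, size=(100, 4)).astype(float))
train[4] = rng.integers(1, 8, size=100)
test = pd.DataFrame(rng.integers(0, 255, size=(20, 4)).astype(float))
test[4] = rng.integers(1, 8, size=20)

cont_names = [0, 1, 2, 3]

# Fit on the training set once, then reuse the same statistics for both frames.
means, stds = fit_normalizer(train, cont_names)
train_n = apply_normalizer(train, cont_names, means, stds)
test_n = apply_normalizer(test, cont_names, means, stds)
```

The test frame keeps its label column untouched, and its features are scaled with the training statistics rather than its own, which is the property erow was asking about.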