Hello,
I’m new to deep learning and am writing some code to better understand how it all works. I have a tab-separated file of about 190,000 rows that looks like this:
Name | AA | SS | A | C | D | E |
---|---|---|---|---|---|---|
Name1 | G | C | | | | |
Name2 | N | E | | | | |
Name3 | K | C | | | | |
Name4 | H | H | | | | |
Name5 | D | H | 0.6127 | | | |
Name6 | D | H | 0.451613 | | | |
Name7 | E | E | 0.393627 | | | |
Name8 | E | C | 0.617496 | | | |
Name9 | H | E | | | | |
Name10 | R | C | | | | |
Name11 | Q | C | | | | |
Name12 | S | H | | | | |
Name13 | H | C | | | | |
Name14 | N | H | | | | |
Name15 | Q | E | | | | |
Name16 | K | H | | | | |
Name17 | Q | E | | | | |
Name18 | T | C | | | | |
Name19 | T | C | | | | |
Name20 | D | H | 0.392787 | | | |
As you can see, I have a lot of missing data. I would like to make a model to predict the values of columns A, C, D and E based on columns AA and SS.
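To quantify the gaps, here is how I checked what fraction of each target column is missing. This is a minimal sketch: the small hand-built DataFrame below just stands in for my real file, with the same columns as the table above.

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for my real 'Results.tsv': a few rows
# with the same column layout as the table above.
df = pd.DataFrame({
    'Name': ['Name1', 'Name5', 'Name9'],
    'AA':   ['G', 'D', 'H'],
    'SS':   ['C', 'H', 'E'],
    'A':    [np.nan, 0.6127, np.nan],
    'C':    [np.nan, np.nan, np.nan],
    'D':    [np.nan, np.nan, np.nan],
    'E':    [np.nan, np.nan, np.nan],
})

# Fraction of missing values per target column.
print(df[['A', 'C', 'D', 'E']].isna().mean())
```

On my real file the fractions are high for all four target columns.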
Here is my snippet of code:

```python
from fastai.tabular.all import *

df = pd.read_table('Results.tsv')
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['AA', 'SS'],
                   y_names=['A', 'C', 'D', 'E'],
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=[accuracy_multi],
                        loss_func=BCEWithLogitsLossFlat())
learn.fit_one_cycle(1)
```
Here is what I get:

```
epoch  train_loss  valid_loss  accuracy_multi  time
0      nan         nan         0.000000        00:01
```
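My suspicion is the missing target values. As a quick sanity check, this plain-NumPy sketch (using squared error as a stand-in for the real loss) shows that a single NaN target is enough to turn any mean-reduced loss into NaN:

```python
import numpy as np

preds   = np.array([0.5, 0.7, 0.2])
targets = np.array([1.0, np.nan, 0.0])  # one missing target value

# Any loss averaged over elements goes NaN as soon as one target is NaN.
loss = np.mean((preds - targets) ** 2)
print(loss)  # nan
```

If that is the cause, I am not sure how the missing targets should be handled here.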
Do you have any idea what the problem is?

Thanks!