Hello,
I’m new to deep learning and am writing some code to better understand how it all works. I have a tab-separated file of about 190,000 rows that looks like this:
Name | AA | SS | A | C | D | E |
---|---|---|---|---|---|---|
Name1 | G | C | | | | |
Name2 | N | E | | | | |
Name3 | K | C | | | | |
Name4 | H | H | | | | |
Name5 | D | H | 0.6127 | | | |
Name6 | D | H | 0.451613 | | | |
Name7 | E | E | 0.393627 | | | |
Name8 | E | C | 0.617496 | | | |
Name9 | H | E | | | | |
Name10 | R | C | | | | |
Name11 | Q | C | | | | |
Name12 | S | H | | | | |
Name13 | H | C | | | | |
Name14 | N | H | | | | |
Name15 | Q | E | | | | |
Name16 | K | H | | | | |
Name17 | Q | E | | | | |
Name18 | T | C | | | | |
Name19 | T | C | | | | |
Name20 | D | H | 0.392787 | | | |
As you can see, I have a lot of missing data. I would like to make a model to predict the values of columns A, C, D and E based on columns AA and SS.
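To quantify the gaps, here is how I checked what fraction of each target column is missing. This is a minimal sketch: the small hand-built DataFrame below just stands in for my real file, with the same columns as the table above.

```python
import pandas as pd
import numpy as np

# Hypothetical stand-in for my real 'Results.tsv': a few rows
# with the same column layout as the table above.
df = pd.DataFrame({
    'Name': ['Name1', 'Name5', 'Name9'],
    'AA':   ['G', 'D', 'H'],
    'SS':   ['C', 'H', 'E'],
    'A':    [np.nan, 0.6127, np.nan],
    'C':    [np.nan, np.nan, np.nan],
    'D':    [np.nan, np.nan, np.nan],
    'E':    [np.nan, np.nan, np.nan],
})

# Fraction of missing values per target column.
print(df[['A', 'C', 'D', 'E']].isna().mean())
```

On my real file the fractions are high for all four target columns.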
Here is my snippet of code:

```python
from fastai.tabular.all import *

df = pd.read_table('Results.tsv')
splits = RandomSplitter(valid_pct=0.2)(range_of(df))
to = TabularPandas(df, procs=[Categorify, FillMissing, Normalize],
                   cat_names=['AA', 'SS'],
                   y_names=['A', 'C', 'D', 'E'],
                   splits=splits)
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=[accuracy_multi],
                        loss_func=BCEWithLogitsLossFlat())
learn.fit_one_cycle(1)
```
Here is what I get:

```
epoch  train_loss  valid_loss  accuracy_multi  time
0      nan         nan         0.000000        00:01
```
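My suspicion is the missing target values. As a quick sanity check, this plain-NumPy sketch (using squared error as a stand-in for the real loss) shows that a single NaN target is enough to turn any mean-reduced loss into NaN:

```python
import numpy as np

preds   = np.array([0.5, 0.7, 0.2])
targets = np.array([1.0, np.nan, 0.0])  # one missing target value

# Any loss averaged over elements goes NaN as soon as one target is NaN.
loss = np.mean((preds - targets) ** 2)
print(loss)  # nan
```

If that is the cause, I am not sure how the missing targets should be handled here.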
Do you have any idea what the problem is?

Thanks!