Calculating accuracy on a test set

JohnyCZS · March 28, 2024, 12:03pm

Hello, I am aware that the question I’m asking has been answered here a few times, but I haven’t been able to make sense of the answers, and the links provided in those answers don’t work anymore.

I’m currently trying to calculate accuracy on a test set of tabular data. It is a single-label classification problem with 4 possible classes. I have achieved 100% accuracy on my validation data during training, but I’m very sceptical of this, so I want to check the accuracy on a held-out test set.

I will try to provide code snippets and would very much welcome any feedback on them.

df = pd.read_csv("TEST.csv")

Here I load a DataFrame from my CSV file.

train=df.sample(frac=0.8)
test_df=df.drop(train.index)

After that I split it into my training and test set.
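
One thing worth ruling out with a table where every feature is categorical is that rows in the test set also occur verbatim in the training set, which would make a perfect test score less informative. A minimal sketch in plain Python, assuming each row can be represented as a tuple of values; the names `overlap_fraction`, `train_rows` and `test_rows` and the toy data are illustrative and not part of fastai or pandas:

```python
def overlap_fraction(train_rows, test_rows):
    """Return the fraction of test rows that also occur verbatim in the training rows."""
    if not test_rows:
        return 0.0
    seen = set(train_rows)  # rows must be hashable, e.g. tuples
    duplicated = sum(1 for row in test_rows if row in seen)
    return duplicated / len(test_rows)


# Toy data (made up for illustration): the last element of each tuple is the target.
train_rows = [("a", "x", 0), ("b", "y", 1), ("c", "z", 2)]
test_rows = [("a", "x", 0), ("d", "w", 3)]

print(overlap_fraction(train_rows, test_rows))
```

With real data, the tuples could be built from the rows of `train` and `test_df`; a value close to 1 would suggest the test set mostly repeats rows the model has already seen.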

splits = RandomSplitter(valid_pct=0.2)(range_of(train))

to = TabularPandas(df = train, 
                  procs = [Categorify, FillMissing],
                  cat_names = [x for x in df.columns.values if x != "target"],
                  y_names = "target",
                  y_block = CategoryBlock(),
                  splits = splits)

Here I create the validation splits and pass them to TabularPandas. I then create my model, find a learning rate, and train:

learn.fit(3, 5e-3)

As I said, accuracy goes up to 100% while both train and valid loss decrease in each epoch.

I then create a new DataLoader from my test set:

dl = learn.dls.test_dl(test_df)

learn.validate(dl=dl)
preds, y = learn.get_preds(dl=dl)
acc = accuracy(preds, y)

Here acc returns: TensorBase(1.)
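
For reference, `accuracy` takes the argmax over the class dimension of the predictions and returns the mean of the matches with the targets, so a value of 1. means every test row was classified correctly. A minimal plain-Python sketch of that computation, using made-up probabilities for 4 classes; the function name `manual_accuracy` and the data are illustrative, and lists stand in for tensors:

```python
def manual_accuracy(probs, targets):
    """Fraction of rows whose highest-probability class equals the target index."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    if not targets:
        raise ValueError("need at least one row")
    correct = 0
    for row, target in zip(probs, targets):
        # argmax: index of the largest probability in this row
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


# Made-up predictions for 4 rows and 4 classes; the last row is misclassified.
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.4, 0.3, 0.2, 0.1],
]
targets = [0, 1, 2, 3]

print(manual_accuracy(probs, targets))
```

Running the same comparison on `preds.argmax(dim=1)` and `y` from `get_preds` is one way to double-check the reported figure by hand.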

I have also run

acc2 = learn.validate(dl = dl)

where dl is the same as above. This returns [0.004373230040073395, 1.0], i.e. the loss followed by the accuracy metric.

Any help would be greatly appreciated.

vbakshi · April 23, 2024, 4:47am

Did you find an explanation that resolved this question for you? If not, can you share the CSV dataset? I can try to run the training in a Colab/Kaggle notebook and see what result I get.