Negative loss with letters MNIST

Hello,

I have a notebook in which I am trying to build an Arabic MNIST dataset directly from numpy arrays!

Here is the notebook

I am not sure why I am getting a negative loss on both the validation and the test sets.

Steps:

  1. Imported libraries.
  2. Read the data and created numpy arrays.
  3. Created a new Dataset class named ArabicMNIST.
  4. Reshaped the images (I could feed them without reshaping, but I want to create three channels by copying the single channel 3 times).
  5. In the ArabicMNIST Dataset I am normalizing per image (which is wrong, but I also tried normalizing the entire dataset first).
  6. In the ArabicMNIST Dataset class I created 3 channels and transposed the images to [C, H, W] order.
  7. Created a DataBunch with my new ArabicMNIST dataset.
  8. Created a learn object and trained for 3 epochs.
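Steps 3 to 6 can be sketched roughly as follows. This is a hypothetical stand-in, not the notebook's ArabicMNIST class: it assumes each image arrives as one flat row of raw 0..255 pixels, and it scales by a fixed 255 instead of normalizing per image. It only implements `__len__` and `__getitem__`, which is the protocol a map-style PyTorch Dataset needs.

```python
import numpy as np

class ArabicMNISTSketch:
    """Hypothetical map-style dataset: flat grayscale rows -> [C, H, W] float arrays."""

    def __init__(self, images, labels, side):
        # images: (N, side*side) raw pixel values (assumed 0..255); labels: (N,) ints
        assert images.shape[1] == side * side
        self.images = images
        self.labels = labels
        self.side = side

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        # Reshape the flat row into an H x W image and scale it to [0, 1]
        img = self.images[idx].reshape(self.side, self.side).astype(np.float32) / 255.0
        # Copy the single channel 3 times: (H, W) -> (H, W, 3)
        img3 = np.tile(img[..., None], 3)
        # Transpose to channels-first [C, H, W]
        chw = img3.transpose(2, 0, 1)
        return chw, int(self.labels[idx])
```

Note that nothing in a dataset like this sets a loss function; that turns out to matter further down the thread.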

Why am I getting a negative loss most of the time?

Thanks

Edit: Moved to normal thread. (Better idea?)


I don’t understand what these lines of code are meant to do:

arr = (image - image.mean()) / image.std() # normalizes between -1 and +1
arr = (arr + 1) / 2 * 255 # moves it between 0 and 255
arr /= 255.0   ### WITH RANGE [0 ... 1]
#         arr = np.clip(arr, 0, 255).astype(np.uint8),
arr3D = np.tile(arr[..., None], 3)

Maybe, instead of normalising, try dividing by the max pixel value in the dataset.
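A minimal sketch of that suggestion, assuming the pixels are non-negative and the whole dataset fits in one array: one scale factor for every image, so relative brightness between images is preserved and all values land in [0, 1].

```python
import numpy as np

def scale_by_dataset_max(images):
    """Divide every image by the largest pixel value found in the whole dataset."""
    images = np.asarray(images, dtype=np.float32)
    max_val = images.max()
    if max_val == 0:
        return images  # all-black dataset: nothing to scale
    return images / max_val
```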

It could be that something is making your probabilities come out wrong. Also, a learning rate of 1e-9 is way too small to fix this in the later epochs, so try a higher one.
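To see why 1e-9 is too small, here is a toy illustration (a one-parameter quadratic standing in for the network, not the actual model): each gradient-descent step moves the weight by `lr * grad`, so with lr=1e-9 the weight barely leaves its starting point even after many steps.

```python
def gradient_descent(lr, steps, w0=0.0, target=3.0):
    """Plain gradient descent on f(w) = (w - target)**2; returns the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)  # derivative of (w - target)**2
        w -= lr * grad
    return w

# lr=1e-9 leaves w almost at 0, while lr=1e-2 ends up very close to the minimum at w=3
print(gradient_descent(1e-9, 1000))
print(gradient_descent(1e-2, 1000))
```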


@bluesky314
The code above normalizes the images (btw, this is not how you want to normalize images), then uses np.tile to create 3 channels by copying the 1 channel 3 times! :slight_smile:

Could it have something to do with this line and its comment:

arr = (image - image.mean()) / image.std() # normalizes between -1 and +1

arr is normalized and its value has a bell-curve shape, so it doesn’t really stay between -1 and +1. You could have a value smaller than -1.
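A quick check of that point, using a made-up image (a short bright stroke on a black 32x32 background, an assumption for illustration only). The three quoted lines collapse algebraically to `(z + 1) / 2`, where `z` is the per-image z-score, and since `z` is unbounded the result is not confined to [0, 1].

```python
import numpy as np

def quoted_normalization(image):
    """The three arithmetic lines quoted above, applied to one image."""
    arr = (image - image.mean()) / image.std()  # z-score: mean 0, std 1, not bounded to [-1, 1]
    arr = (arr + 1) / 2 * 255
    arr /= 255.0
    return arr  # algebraically equal to (z + 1) / 2

# Hypothetical digit-like image: a few bright strokes on a black background
image = np.zeros((32, 32))
image[10:22, 15:17] = 255.0
out = quoted_normalization(image)
print(out.min(), out.max())  # the max is well above 1
```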

@wyquek @bluesky314 I have removed normalization from the data!
I have tried Fashion MNIST with the same custom Dataset and I am getting the same issue. So now we know there is a problem somewhere in the custom Dataset class. But what could it be? Maybe @lesscomfortable can help?

Have you tried applying the same transformation and normalization to the target?

I did not apply any transformation! Could that be the issue?

Can you provide the pixel intensity values of the predictions and the ground truth?

By the way, in this notebook, which I created to understand torch.nll_loss, I am also getting negative outputs from the loss function. I attached a screenshot to this forum question, which went unnoticed. :slight_smile:

The input in this screenshot is obtained by running the Arabic MNIST notebook with 1 batch size, debugging and inspecting the output of the model with %%debug.

Just guessing, but for your latest example, would you like to try F.nll_loss(F.log_softmax(input), target), as in the example from the PyTorch docs:

import torch
import torch.nn.functional as F

# input is of size N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])
output = F.nll_loss(F.log_softmax(input, dim=1), target)  # log-probabilities over the class dim
output.backward()
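Why the log_softmax matters can be shown without torch. The NumPy functions below are re-implementations written for illustration (not PyTorch code); `nll_loss` mirrors what F.nll_loss computes with the default mean reduction and no class weights: it negates the entry at each target index and averages. Fed raw scores, it happily returns a negative number; fed log-probabilities (which are always <= 0), it cannot.

```python
import numpy as np

def log_softmax(x):
    """Row-wise log-softmax, computed stably by subtracting the row max."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(inp, target):
    """Mean of -inp[i, target[i]]: the default-reduction behaviour of F.nll_loss."""
    return -inp[np.arange(len(target)), target].mean()

scores = np.array([[0.5, 4.0, -1.0],   # raw model outputs (logits), N x C = 2 x 3
                   [3.0, 0.2, 0.1]])
target = np.array([1, 0])

print(nll_loss(scores, target))               # -(4.0 + 3.0) / 2 = -3.5, a "negative loss"
print(nll_loss(log_softmax(scores), target))  # log-probabilities: always >= 0
```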

Not quite sure, but it seems like you could be missing a softmax somewhere.

Right, that is exactly what I was thinking, but why would I be missing the softmax? I called DataBunch.create and the model is still created by create_cnn, and DataBunch also creates the dataloaders. Shouldn’t DataBunch or create_cnn apply the softmax? This is why I was ruling out the softmax possibility. (Or am I missing something?)

@wyquek btw, in my notebook, after applying log_softmax to the input I got a positive result.

I also noticed that the official PyTorch docs say we need to apply log_softmax with F.nll_loss, and I couldn’t find any occurrence of log_softmax in the fastai code. Also, the hidden function _loss_func2activ, which picks the proper activation function for a given loss function, is used only at prediction time (in get_preds) and not when calculating the loss.
Either I am missing something big or there is a bug (which I highly doubt, because I seem to be the only one getting a negative loss).

PyTorch docs example:

Try this and see if the negative losses go away:

learn = create_cnn(data, models.resnet18, metrics=error_rate)
learn.loss_func = torch.nn.functional.cross_entropy  # <<== add this
learn.fit(3, 1e-9)

I’ll try this. But isn’t it the same as writing F.nll_loss(F.log_softmax(input), output)?

I suppose you could do that too, but I can’t help but notice that @uwaisiqbal, a fastai student, hit the ball out of the park 10 days ago on Kaggle using fastai v1 code fairly similar to the lesson notebook.

I think I have got it. The fastai library sets up loss_func properly while creating the labels. While creating labels, at one point you hit the function [label_from_list](https://github.com/fastai/fastai/blob/master/fastai/data_block.py#L178). This function calls [get_label_cls](https://github.com/fastai/fastai/blob/master/fastai/data_block.py#L168), which picks the list type (MultiCategoryList, FloatList, CategoryList). For multi-class classification we end up with a CategoryList, and in this class we have self.loss_func = F.cross_entropy (https://github.com/fastai/fastai/blob/master/fastai/data_block.py#L245).

Now, when we created the custom dataset with the DataBunch.create method, this labelling function was never called, so the fastai library fell back to its default loss function, F.nll_loss.
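A simplified sketch of that fallback, assuming the DataBunch looks for a `loss_func` attribute on the training data and uses F.nll_loss when there is none, as described above. The classes and stub functions are hypothetical stand-ins, not fastai source code.

```python
def nll_loss_stub(inp, target):
    """Hypothetical stand-in for F.nll_loss, the default loss."""
    raise NotImplementedError

def cross_entropy_stub(inp, target):
    """Hypothetical stand-in for F.cross_entropy, set by CategoryList."""
    raise NotImplementedError

class LabelledViaDataBlock:
    # Like data built through the data block API: the label list sets loss_func
    loss_func = staticmethod(cross_entropy_stub)

class CustomDataset:
    # Like a hand-written Dataset passed to DataBunch.create: no loss_func attribute
    pass

def pick_loss_func(train_ds, default=nll_loss_stub):
    # Use the dataset's loss_func if it has one, otherwise fall back to the default
    return getattr(train_ds, "loss_func", default)
```

With this shape, a custom dataset silently gets the default, which is exactly the situation that produced the negative losses.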

@wyquek and I already discussed how F.nll_loss(F.log_softmax(inp), out) == F.cross_entropy(inp, out). That is, F.cross_entropy applies F.log_softmax automatically, but with F.nll_loss we need to apply it manually. When you don’t apply F.log_softmax, the result can be negative: F.nll_loss just takes the value at the target index of each row, negates it and averages over the batch, so a positive raw score at the target gives a negative loss. So it is a good idea to look at your loss function and ask whether you are actually getting the loss you want! :slight_smile:
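The equivalence can be checked numerically. These are NumPy re-implementations for illustration, not the PyTorch functions: `cross_entropy` is computed independently, as the negative log of the softmax probability of the target class, and it matches `nll_loss(log_softmax(...))` on random logits.

```python
import numpy as np

def log_softmax(x):
    """Row-wise log-softmax, computed stably by subtracting the row max."""
    shifted = x - x.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def nll_loss(inp, target):
    """Mean of -inp[i, target[i]]."""
    return -inp[np.arange(len(target)), target].mean()

def cross_entropy(logits, target):
    """Mean of -log(softmax(logits)[i, target[i]]), computed directly from probabilities."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(target)), target]).mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 5))
target = np.array([1, 0, 4, 2])
print(np.isclose(nll_loss(log_softmax(logits), target), cross_entropy(logits, target)))
```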

(Now it would be really great if someone from fastai could confirm my findings! :slight_smile:)

Two solutions for this:

  1. Apply F.log_softmax to the inputs before feeding them to F.nll_loss.
  2. The easier fix: set the loss function manually with learn.loss_func = F.cross_entropy

I have updated the notebooks in the title!

Special thanks @wyquek @bluesky314 and @keyurparalkar


Excellent analysis. To add to this answer, I would first print out the predictions, then plot histograms of both the predictions and the ground truth, and then compute the loss with different loss functions for the samples that end up in the negative part of the loss histogram. Still, your approach sounds simpler and more apt.
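A rough sketch of that debugging approach, with made-up random numbers standing in for real model outputs (an assumption; in practice you would use the predictions from the learner). It computes one loss value per sample, like F.nll_loss with reduction='none', prints a text histogram, and lists the samples with a negative loss for closer inspection.

```python
import numpy as np

def per_sample_nll(inp, target):
    # One value per sample, like F.nll_loss(..., reduction='none')
    return -inp[np.arange(len(target)), target]

rng = np.random.default_rng(42)
raw_outputs = rng.normal(scale=3.0, size=(100, 10))  # hypothetical un-softmaxed predictions
targets = rng.integers(0, 10, size=100)

losses = per_sample_nll(raw_outputs, targets)
counts, edges = np.histogram(losses, bins=10)
for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:6.2f} .. {hi:6.2f} {'#' * c}")

negative = np.flatnonzero(losses < 0)  # indices of samples to inspect one by one
print(f"{len(negative)} of {len(losses)} samples have a negative loss")
```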

Hey guys,

Just saw this thread today. I put together a solution with the Arabic MNIST dataset here.
The GitHub repo is a fork of the original dataset repo, where the author has also saved the images as files.

Rather than using the raw data from the CSV, I used the image files and created a directory structure matching what the fastai library needs. My best guess is that something goes wrong when you convert the CSV values to pixel values for the fastai library to use.
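If you only have the CSV, one way to get a folder-per-class layout (such as train/<label>/ and valid/<label>/) is sketched below. It assumes each image is one flat row of 0..255 pixels with a separate label array; the function name and the choice of binary PGM files (used only because they can be written without extra libraries) are illustrative assumptions, not what the linked repo does.

```python
import os
import numpy as np

def export_as_image_folders(images, labels, root, split, side):
    """Write each flat grayscale row as a binary PGM file under root/split/<label>/."""
    paths = []
    for i, (row, label) in enumerate(zip(images, labels)):
        folder = os.path.join(root, split, str(label))
        os.makedirs(folder, exist_ok=True)
        # Assumes pixel values already fit in 0..255
        pixels = np.asarray(row, dtype=np.uint8).reshape(side, side)
        path = os.path.join(folder, f"{i}.pgm")
        with open(path, "wb") as f:
            f.write(f"P5\n{side} {side}\n255\n".encode("ascii"))  # PGM header
            f.write(pixels.tobytes())                           # raw 8-bit pixels
        paths.append(path)
    return paths
```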

I’ve been playing around with your code, and it’s surprisingly hard to beat that 96.8%.

P.S. It seems like ImageFileList no longer exists; not sure if you want to update your notebook from

data = (ImageFileList.from_folder(path)
       .label_from_func(get_label)
       .split_by_folder()
       .datasets()
       .transform(get_transforms(do_flip=False, max_lighting=0.5, p_lighting=0.9), size=224)
       .databunch()
       .normalize(imagenet_stats))

to

data = (ImageItemList.from_folder(path)
        .split_by_folder()
        .label_from_func(get_label)        
        .transform(get_transforms(do_flip=False, max_lighting=0.5, p_lighting=0.9), size=224)
        .databunch()
        .normalize(imagenet_stats))

Good spot! I’ll update it when I get some time. Haha, I spent a couple of hours on it and tried a bunch of things, but that 96.8% is pretty stubborn. If you look at the most confused items, they are really poorly handwritten examples that look like scribbles lol. I think it’s down to the data.
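fastai’s ClassificationInterpretation offers a most_confused method for this kind of check; the sketch below is an independent NumPy/stdlib re-implementation of the idea, with toy labels made up for illustration. It counts (actual, predicted) pairs where the two differ and returns them most frequent first.

```python
from collections import Counter
import numpy as np

def most_confused(actual, predicted, min_count=1):
    """Return (actual, predicted, count) for misclassified pairs, most frequent first."""
    pairs = Counter((int(a), int(p)) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in pairs.most_common() if n >= min_count]

actual = np.array([0, 0, 1, 1, 1, 2, 2, 2])
predicted = np.array([0, 1, 1, 2, 2, 2, 1, 2])
print(most_confused(actual, predicted))
```

Looking at the images behind the top pairs is what reveals whether the errors are model mistakes or, as here, scribbles in the data.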
