Structured Learner

You need out_sz=2 as there are 2 output probabilities (negative, positive).
In addition, why do you want binary cross entropy rather than just passing is_multi=False and letting softmax decide whether it’s positive or negative?

What Rony said. If you update your fastai repository, it now asserts out_sz>1 when is_reg=False, to prevent this kind of situation. https://github.com/fastai/fastai/pull/654

Hi @ronlut,

I’m doing one-label binary classification with is_reg=False, is_multi=False, out_sz=2, softmax and nll_loss, and it works, but my question was: why can’t we use out_sz=1, sigmoid and Binary Cross Entropy (BCE) with Fastai?

Why do I want to do that? Two reasons:

  1. For logistic regression (classification), the usual approach is 1 output with a Sigmoid and Binary Cross Entropy as the loss function.
  2. How do the 2 possibilities ([2 outputs + Softmax + nll_loss] vs [1 output + Sigmoid + BCE]) compare in performance? Which one is best?

Thanks @stas. Your Fastai repository update confirms the [2 outputs + Softmax + nll_loss] solution for creating a classifier for structured data with fastai.

However, I’m still wondering why we set aside the [1 output + Sigmoid + Binary Cross Entropy] possibility. The 2 solutions are not identical. Which one performs better?

Set is_multi=True, out_sz=1, and pass y.astype(np.float32) in the from_data_frame call.
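A minimal sketch of that configuration (assuming fastai v0.7, where ColumnarModelData lives in fastai.column_data; PATH, val_idxs, df, y, cat_vars, emb_szs and n_cont are hypothetical placeholders, not names from this thread):

    import numpy as np
    from fastai.column_data import ColumnarModelData  # fastai v0.7 layout

    # BCE route: 1 output, sigmoid, float targets
    md = ColumnarModelData.from_data_frame(PATH, val_idxs, df,
                                           y.astype(np.float32),  # BCE wants float targets
                                           cat_flds=cat_vars, bs=128,
                                           is_reg=False, is_multi=True)
    learn = md.get_learner(emb_szs, n_cont, emb_drop=0.04, out_sz=1,
                           szs=[500, 250], drops=[0.01, 0.1])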

I tried both approaches, and [1 output + Sigmoid + Binary Cross Entropy] worked better for me, but only marginally.

If you look at the definition of NLL loss here:

It says the negative log likelihood loss is useful to train a classification problem with C classes.

If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either (minibatch,C) or (minibatch,C,d1,d2,…,dK) with K≥2 for the K-dimensional case.

Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer in the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

I think this explains why they perform identically; however, my experimentation with structured data is very limited and I would love to hear from others.
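For what it’s worth, here is a tiny plain-pytorch check of that statement (illustration only, nothing fastai-specific):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    logits = torch.randn(4, 2)            # raw scores: 4 samples, 2 classes
    targets = torch.tensor([0, 1, 1, 0])

    nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
    ce = nn.CrossEntropyLoss()(logits, targets)
    print(torch.allclose(nll, ce))        # True: CrossEntropyLoss = LogSoftmax + NLLLoss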

And you will need is_reg=True, since otherwise it’s classification, and then you need at least out_sz=2.

However, I’m still wondering why we set aside the [1 output + Sigmoid + Binary Cross Entropy] possibility. The 2 solutions are not identical. Which one performs better?

I hope someone with more experience can answer that; I haven’t yet tried this combo myself.

Ideally, we should have a demo notebook for each of these different combinations. At the moment it’d be nice to have binary and multiclass classification examples for structured data. I think the Titanic Kaggle competition would be a perfect simple example of binary classification from structured data. What would be a good structured dataset to use as a demo for multiclass classification? I’d be happy to work on that while I’m learning.

My explanation is for classification scenarios only.

For classification:

Use out_sz=2, is_reg=False, is_multi=False, and it knows to use NLL.
Use out_sz=1, is_reg=False, is_multi=True, and set your target to a FloatTensor; this will work with BCE.

Both ways I trained a binary classifier.
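For reference, here is the matching sketch of the first (NLL) configuration, with the same hypothetical placeholder names as the BCE sketch earlier in the thread:

    # NLL route: 2 outputs, log_softmax, integer class targets
    md = ColumnarModelData.from_data_frame(PATH, val_idxs, df,
                                           y.astype('int64'),  # NLL wants class indices
                                           cat_flds=cat_vars, bs=128,
                                           is_reg=False, is_multi=False)
    learn = md.get_learner(emb_szs, n_cont, emb_drop=0.04, out_sz=2,
                           szs=[500, 250], drops=[0.01, 0.1])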

1 Like

I believe that mathematically your architecture + loss function is equivalent to the [2 outputs + Softmax + nll_loss] architecture + loss function. Could it be that your results are better due to randomness?
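A quick way to see the equivalence: with two logits $z_0, z_1$, the probability softmax assigns to class 1 is

$$\mathrm{softmax}(z)_1 = \frac{e^{z_1}}{e^{z_0} + e^{z_1}} = \frac{1}{1 + e^{-(z_1 - z_0)}} = \sigma(z_1 - z_0),$$

so NLL on the two-logit softmax is exactly BCE on the single logit $d = z_1 - z_0$; the two setups just parameterize that logit differently.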

I think so. I ran multiple iterations with the same architecture for the 2 cases, and they pretty much converged to the same val_loss.

Shouldn’t classification have at least 2 outputs? If that’s true, then this change of mine is wrong: https://github.com/fastai/fastai/pull/654 and needs to be reverted. My intent was to check at the model level whether the inputs and model have the correct configuration, to avoid all kinds of failures from pytorch. The check was:

if is_reg==False: assert out_sz >= 2, "arg is_reg==False (classification) requires out_sz>=2"

Do you think it’ll still be a good validator if it’s adjusted to:

if is_reg==False and is_multi==False:
    assert out_sz >= 2, "arg is_reg==False/is_multi==False (classification) requires out_sz>=2"

Or should we allow any out_sz, in case there are other situations where it’s needed?

Also, do you have a working notebook that I could play with to check both combinations? Thank you.

I read your conversation with sgugger here, and I think I’ll still stick to what I mentioned earlier, as I tested it myself.

For out_sz=1, the loss would be BCE; the pytorch documentation here says:

Shape:

    Input: (N,∗) where * means, any number of additional dimensions
    Target: (N,∗), same shape as the input
    Output: scalar. If reduce is False, then (N, *), same shape as input.

For out_sz=2, the loss would be NLL; the pytorch documentation here says:

Shape:

    Input: (N,C) where C = number of classes, or (N,C,d1,d2,...,dK) with K≥2 in the case of K-dimensional loss.

    Target: (N) where each value is 0≤targets[i]≤C−1, or (N,d1,d2,...,dK) with K≥2 in the case of K-dimensional loss.

    Output: scalar. If reduce is False, then the same size as the target: (N), or (N,d1,d2,...,dK) with K≥2 in the case of K-dimensional loss.

As you can see, the input for the relevant losses can be (N,*) or (N,C); in the case of binary classification, we can feed the input either way and use the corresponding last layer (LogSoftmax or not).
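A minimal shape check of the two routes in plain pytorch (illustration only):

    import torch
    import torch.nn.functional as F

    targets = torch.tensor([0., 1., 1., 0.])

    # 1 output + Sigmoid + BCE: input and target share the same shape (N,)
    probs = torch.sigmoid(torch.randn(4))
    loss_bce = F.binary_cross_entropy(probs, targets)

    # 2 outputs + LogSoftmax + NLL: input is (N, C), target is (N,) class indices
    log_probs = F.log_softmax(torch.randn(4, 2), dim=1)
    loss_nll = F.nll_loss(log_probs, targets.long())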

Well, I’m unsure. I did find it hard to figure out what works when, but I think out_sz can still sit without a check, because the error I catch from pytorch clearly points to the same thing. At times pytorch fails badly, so I debug on CPU with pdb. I think I’ll share a notebook with all the possibilities of regression and classification and leave it for everyone to decide what is the best way out.

I didn’t have a working notebook, but it took a few minutes to create this one; hope it helps.
The data used for the example is also available at the same location. Please share your thoughts.

I found a similar result in my recent experiment. Can you please take a look and check whether the results are only better due to randomness?

Thank you for sharing the code, @PranY

Before looking at the correctness/accuracy let’s first make the code work.

I needed to move the cat_sz and emb_szs creation before proc_df, otherwise the code fails, since proc_df reduces the categorical columns to numbers. But that’s a minor thing - I just couldn’t use your code out of the box. Perhaps you could update your notebook, in case others decide to experiment with it.
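Concretely, the reordering looks roughly like this (a sketch; df, cat_vars and the 'target' column name are placeholders, not exact code from the notebook):

    # Read category sizes while the columns are still pandas categoricals...
    cat_sz = [(c, len(df[c].cat.categories) + 1) for c in cat_vars]
    emb_szs = [(c, min(50, (c + 1) // 2)) for _, c in cat_sz]
    # ...and only then let proc_df numericalize them.
    df_proc, y, nas = proc_df(df, 'target')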

So now let’s add metrics:

fit(..., metrics=[accuracy])

It works for binary classification with out_sz=2, but fails with out_sz=1.

TypeError: eq received an invalid combination of arguments - got (torch.cuda.FloatTensor), but expected one of:
 * (int value)
      didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (torch.cuda.LongTensor other)
      didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)

That’s actually why I added that assert out_sz>1 - I was getting this error on the Titanic dataset, and for a long time I failed to see that I had out_sz=1 :frowning: so I thought it’d save someone some hair.

That said, if someone has a way to resolve this, then we can look at accuracy next.

The problem is that it wants y to be a long integer and you gave it float32. But if you switch y to an integer type, then the reverse problem appears - I don’t remember the exact error now, but it was again a type mismatch.
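One possible workaround (an untested sketch, not fastai’s stock accuracy): a metric that thresholds the sigmoid output and compares floats, so neither side needs to be a LongTensor:

    def accuracy_binary(preds, targs, thresh=0.5):
        # preds: sigmoid probabilities of shape (N, 1); targs: float 0/1 labels
        preds = (preds.view(-1) > thresh).float()
        return (preds == targs.view(-1)).float().mean()

You could then pass it as fit(..., metrics=[accuracy_binary]).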

Sure, I’ll do that, Thanks for pointing it out.

Ah, I understand your point. I’ll look into it and get back to you as soon as I can. If I find a way out, I’ll simply update the same notebook and report back.

Hi,

I moved cat_sz and emb_szs before proc_df (thanks for suggesting this); now it should work out of the box. Although I’ve tried to ignore warnings, it seems I still get some during the call to .fit - I will look into it later. The notebook should work fine end-to-end.

I have discussed the metrics part in detail; I hope I was able to convey the message. Please let me know if that helps. Feel free to drop in any questions you may have. :slight_smile:

The notebook is available in the same location here

Hmm, no matter what I do, my NN with the structured learner always seems to get stuck at the same metric while the loss keeps decreasing. Then the prediction always generates the same output for all data points. This happens with additional data (other embeddings) and with different numbers of hidden layers.

I do not get any error message and I’m a little clueless because I don’t have an error to debug. :wink:

Has somebody encountered this strange behavior before?

Try using .cpu() on your MixedInputModel, add a pdb.set_trace() in the accuracy definition, and follow it step by step. I think that should help. I’ll be glad to help further if you can share the notebook you are using.
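Something along these lines (a hedged sketch mirroring the stock accuracy metric, with the trace added):

    import pdb

    def accuracy_debug(preds, targs):
        pdb.set_trace()                # inspect preds.shape, preds.dtype, targs here
        preds = preds.max(dim=1)[1]    # argmax over the class columns
        return (preds == targs).float().mean()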

Thank you for updating the notebook and putting detailed steps to explain what you were presenting, @PranY

When I run it, I get multiple warnings on every batch of the fit/lr_find run:

[...]python3.6/site-packages/torch/nn/functional.py:1189: UserWarning: Using a target size (torch.Size([512])) that is different to the input size (torch.Size([512, 1])) is deprecated. Please ensure they have the same size.
  "Please ensure they have the same size.".format(target.size(), input.size()))