Shape error in keras when the output variable is categorical

prateek2686 · December 1, 2017, 8:16pm

I am using the following, fairly simple code to predict an output variable that can take one of 3 categories:

n_factors = 20
np.random.seed(42)  # call the function; assigning to np.random.seed would overwrite it instead of seeding

def embedding_input(name, n_in, n_out, reg):
    inp = Input(shape=(1,), dtype='int64', name=name)
    return inp, Embedding(n_in, n_out, input_length=1, W_regularizer=l2(reg))(inp)

user_in, u = embedding_input('user_in', n_users, n_factors, 1e-4)
artifact_in, a = embedding_input('artifact_in', n_artifacts, n_factors, 1e-4)

mt = Input(shape=(31,))
mr = Input(shape=(1,))
sub = Input(shape=(24,))

def onehot(featurename):
    onehot_encoder = OneHotEncoder(sparse=False)
    onehot_encoded = onehot_encoder.fit_transform(Modality_Durations[featurename].reshape(-1, 1))
    trn_onehot_encoded = onehot_encoded[msk]
    val_onehot_encoded = onehot_encoded[~msk]
    return trn_onehot_encoded, val_onehot_encoded

# One hot encode the categorical variables
trn_onehot_encoded_mt, val_onehot_encoded_mt = onehot('modality_type')
trn_onehot_encoded_mr, val_onehot_encoded_mr = onehot('roleid')
trn_onehot_encoded_sub, val_onehot_encoded_sub = onehot('subject')
trn_onehot_encoded_quartile, val_onehot_encoded_quartile = onehot('quartile')

# Model
x = merge([u, a], mode='concat')
x = Flatten()(x)
x = merge([x, mt], mode='concat')
x = merge([x, mr], mode='concat')
x = merge([x, sub], mode='concat')
x = Dense(10, activation='relu')(x)
BatchNormalization()  # NOTE: instantiated but never applied to x, so this line has no effect
x = Dense(3, activation='softmax')(x)
nn = Model([user_in, artifact_in, mt, mr, sub], x)
nn.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

nn.optimizer.lr = 0.001
nn.fit([trn.member_id, trn.artifact_id, trn_onehot_encoded_mt, trn_onehot_encoded_mr, trn_onehot_encoded_sub], trn_onehot_encoded_quartile, 
       batch_size=256, 
       epochs=2, 
       validation_data=([val.member_id, val.artifact_id, val_onehot_encoded_mt, val_onehot_encoded_mr, val_onehot_encoded_sub], val_onehot_encoded_quartile)
      )

Here’s the summary of the model:

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
user_in (InputLayer)             (None, 1)             0                                            
____________________________________________________________________________________________________
artifact_in (InputLayer)         (None, 1)             0                                            
____________________________________________________________________________________________________
embedding_9 (Embedding)          (None, 1, 20)         5902380     user_in[0][0]                    
____________________________________________________________________________________________________
embedding_10 (Embedding)         (None, 1, 20)         594200      artifact_in[0][0]                
____________________________________________________________________________________________________
merge_25 (Merge)                 (None, 1, 40)         0           embedding_9[0][0]                
                                                                   embedding_10[0][0]               
____________________________________________________________________________________________________
flatten_7 (Flatten)              (None, 40)            0           merge_25[0][0]                   
____________________________________________________________________________________________________
input_13 (InputLayer)            (None, 31)            0                                            
____________________________________________________________________________________________________
merge_26 (Merge)                 (None, 71)            0           flatten_7[0][0]                  
                                                                   input_13[0][0]                   
____________________________________________________________________________________________________
input_14 (InputLayer)            (None, 1)             0                                            
____________________________________________________________________________________________________
merge_27 (Merge)                 (None, 72)            0           merge_26[0][0]                   
                                                                   input_14[0][0]                   
____________________________________________________________________________________________________
input_15 (InputLayer)            (None, 24)            0                                            
____________________________________________________________________________________________________
merge_28 (Merge)                 (None, 96)            0           merge_27[0][0]                   
                                                                   input_15[0][0]                   
____________________________________________________________________________________________________
dense_13 (Dense)                 (None, 10)            970         merge_28[0][0]                   
____________________________________________________________________________________________________
dense_14 (Dense)                 (None, 3)             33          dense_13[0][0]                   
====================================================================================================
Total params: 6,497,583
Trainable params: 6,497,583
Non-trainable params: 0
_____________________________

But on the fit statement, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-7de0782d7d5d> in <module>()
      5        batch_size=256,
      6        epochs=2,
----> 7        validation_data=([val.member_id, val.artifact_id, val_onehot_encoded_mt, val_onehot_encoded_mr, val_onehot_encoded_sub], val_onehot_encoded_quartile)
      8       )
      9 # nn.fit([trn.member_id, trn.artifact_id, trn_onehot_encoded_mt, trn_onehot_encoded_mr, trn_onehot_encoded_sub], trn.duration_new,

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1520             class_weight=class_weight,
   1521             check_batch_axis=False,
-> 1522             batch_size=batch_size)
   1523         # Prepare validation data.
   1524         do_validation = False

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_batch_axis, batch_size)
   1380                                     output_shapes,
   1381                                     check_batch_axis=False,
-> 1382                                     exception_prefix='target')
   1383         sample_weights = _standardize_sample_weights(sample_weight,
   1384                                                      self._feed_output_names)

/home/prateek_dl/anaconda3/lib/python3.5/site-packages/keras/engine/training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    142                             ' to have shape ' + str(shapes[i]) +
    143                             ' but got array with shape ' +
--> 144                             str(array.shape))
    145     return arrays
    146 

ValueError: Error when checking target: expected dense_14 to have shape (None, 1) but got array with shape (1956554, 3)

How do I resolve this error? Why is the final layer expecting (None,1) when according to the summary() it has to output (None,3)?

@jeremy - Am I doing something obviously stupid?
Any help would be greatly appreciated.

machinethink · December 2, 2017, 1:29pm

Try using categorical_crossentropy, without the sparse. Alternatively, don’t one-hot encode your training and validation labels.
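For reference, a minimal NumPy sketch of the two target layouts involved. The array names and label values here are made up for illustration. The point is only that sparse_categorical_crossentropy expects one integer class index per sample (hence the "(None, 1)" in the error), while categorical_crossentropy expects one-hot rows as wide as the softmax output:

```python
import numpy as np

n_classes = 3

# Hypothetical integer labels, one class index per sample (0, 1 or 2).
labels_int = np.array([0, 2, 1, 2])

# One-hot version of the same labels: one row per sample, one column per class.
labels_onehot = np.eye(n_classes)[labels_int]

# Layout for loss='sparse_categorical_crossentropy': shape (n_samples, 1)
sparse_targets = labels_int.reshape(-1, 1)

# Layout for loss='categorical_crossentropy': shape (n_samples, n_classes)
dense_targets = labels_onehot

# Going back from one-hot rows to integer class indices.
recovered = dense_targets.argmax(axis=1)

print(sparse_targets.shape)  # (4, 1)
print(dense_targets.shape)   # (4, 3)
```

Either layout describes the same classes; what matters is that the loss passed to compile() matches the layout passed to fit().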

prateek2686 · December 4, 2017, 8:03pm

Thanks @machinethink. Using categorical_crossentropy worked.

If I don’t one-hot encode my categorical variables, wouldn’t the model treat them as simple numbers and not categories? I want the model to consider them as categories.

machinethink · December 4, 2017, 8:26pm

That’s what sparse_categorical_crossentropy means: you don’t need to one-hot encode your target labels. You pass the integer class indices, and Keras treats them behind the scenes as if you had one-hot encoded them.
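To make the equivalence concrete, here is a small sketch that computes both losses by hand in NumPy. These two functions are simplified reimplementations written for this example, not the Keras source, and the probabilities and labels are invented:

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    # Per-sample loss: -sum_k y_k * log(p_k); only the true-class term is non-zero.
    return -np.sum(y_onehot * np.log(probs), axis=1)

def sparse_categorical_crossentropy(y_int, probs):
    # Per-sample loss: -log(p[true class]), indexed directly by the integer label.
    return -np.log(probs[np.arange(len(y_int)), y_int])

# Hypothetical softmax outputs for 3 samples and 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])
y_int = np.array([0, 2, 1])      # integer class indices
y_onehot = np.eye(3)[y_int]      # the same labels, one-hot encoded

dense_loss = categorical_crossentropy(y_onehot, probs)
sparse_loss = sparse_categorical_crossentropy(y_int, probs)
print(np.allclose(dense_loss, sparse_loss))  # True
```

Given the same predictions and the same underlying classes, the two losses agree, so the integer labels are still treated as categories rather than as magnitudes.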

prateek2686 · December 4, 2017, 9:39pm

I see, that makes sense. However, for some reason, when I use sparse_categorical_crossentropy without one-hot encoding the categorical variables myself, I get a much worse result than when I use categorical_crossentropy with one-hot encoding.

Thanks for your help!
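One thing that may be worth checking in a situation like this, offered only as a guess since the thread does not show the raw quartile values: sparse_categorical_crossentropy expects integer class indices running from 0 to n_classes - 1. If the raw column holds other values (for example 1, 2, 3), it would need remapping first. A sketch with made-up data:

```python
import numpy as np

# Hypothetical raw labels that do not start at 0.
raw_quartile = np.array([1, 3, 2, 3, 1])

# np.unique returns the sorted distinct values and, for each sample,
# the index of its value in that sorted list (0 .. n_classes-1).
classes, class_idx = np.unique(raw_quartile, return_inverse=True)

# Sanity checks before using class_idx as the target of the sparse loss.
assert class_idx.min() == 0
assert class_idx.max() == len(classes) - 1

# One-hot targets built from the same mapping encode the same classes.
onehot = np.eye(len(classes))[class_idx]
assert (onehot.argmax(axis=1) == class_idx).all()

print(classes)    # [1 2 3]
print(class_idx)  # [0 2 1 2 0]
```

If the integer targets pass these checks and the results still differ a lot between the two losses, the cause is likely elsewhere (for instance, different train/validation splits between runs).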