Kaggle NLP Competition - Toxic Comment Classification Challenge

Hi Bohdan,
I'm interested in forming a team.
My Kaggle username is bruno16.
Regards,
Bruno

I'm struggling to figure out how to use the language model I trained to make predictions on multiple labels. There are two problems I've spent a couple of days on.

  1. Creating dataset splits that feed multiple labels into torchtext. I created a custom dataset that takes in dataframes and creates a different field for each label (similar to this post: Creating a ModelData object without torchtext splits?). Is this on the right track, or should I be feeding a list of six numbers directly into the label field for each example? I'd post my own code, but I'm not sure that's allowed since this is a Kaggle competition. (See the first sketch below this list.)

  2. Modifying the model decoder to output 6 predictions instead of one. Per this thread (Question on labeling text for sentiment analysis), I modified PoolingLinearClassifier to output the sigmoid of 6 output units. Is this on the right track? I'm still not sure how the model will know what loss to use, or which of the fields from the splits will be treated as labels. (See the second sketch below this list.)
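
For what it's worth, here is a minimal, untested sketch of the per-field approach from point 1, assuming torchtext's legacy data API and the competition's column names (comment_text plus the six label columns); train_df is a hypothetical DataFrame:

import torchtext.data as data

LABEL_COLS = ['toxic', 'severe_toxic', 'obscene',
              'threat', 'insult', 'identity_hate']

class MultiLabelDataset(data.Dataset):
    """Builds torchtext Examples from a DataFrame, one field per label column."""
    def __init__(self, df, text_field, label_field, **kwargs):
        # one (name, Field) pair for the text, plus one per label column
        fields = [('comment_text', text_field)] + \
                 [(col, label_field) for col in LABEL_COLS]
        examples = [
            data.Example.fromlist(
                [row['comment_text']] + [row[col] for col in LABEL_COLS],
                fields)
            for _, row in df.iterrows()
        ]
        super().__init__(examples, fields, **kwargs)

TEXT = data.Field(lower=True)
LABEL = data.Field(sequential=False, use_vocab=False)  # numeric 0/1 labels
train_ds = MultiLabelDataset(train_df, TEXT, LABEL)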
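
And a sketch for point 2: with six independent binary labels, the standard setup is one output unit per label trained with binary cross-entropy. The head below is an illustration, not fastai's actual PoolingLinearClassifier; in_features stands for whatever the pooled encoder output size is. A common variant of "sigmoid of 6 units" is to emit raw logits and let BCEWithLogitsLoss apply the sigmoid, which is numerically more stable:

import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    """Linear decoder emitting one logit per label."""
    def __init__(self, in_features, n_labels=6):
        super().__init__()
        self.fc = nn.Linear(in_features, n_labels)

    def forward(self, pooled):   # pooled: (batch, in_features)
        return self.fc(pooled)   # raw logits, shape (batch, n_labels)

criterion = nn.BCEWithLogitsLoss()             # sigmoid fused into the loss
logits = MultiLabelHead(400)(torch.randn(8, 400))
targets = torch.empty(8, 6).random_(2)         # random 0/1 targets for the demo
loss = criterion(logits, targets)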

Anyway, any help on this would be much appreciated! Is this much simpler than I'm making it? I feel like I'm missing something here!

1 Like

What do you call the .py file where you keep your DL stuff?

You are on the right track!! Keep going 🙂

1 Like

A very simple example of word-polarity analysis based on logistic regression coefficients:

https://www.kaggle.com/sermakarevich/words-polarity-based-on-lr-weights
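
The idea in that kernel is to fit a bag-of-words logistic regression and read word polarity straight off the learned coefficients. A rough sketch of the approach (not the kernel's exact code), assuming the competition's comment_text and toxic columns:

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train = pd.read_csv('train.csv')
vec = TfidfVectorizer(max_features=20000)
X = vec.fit_transform(train['comment_text'])

clf = LogisticRegression()
clf.fit(X, train['toxic'])

# large positive weights -> toxic words, large negative -> benign words
words = np.array(vec.get_feature_names())  # get_feature_names_out() in newer sklearn
order = np.argsort(clf.coef_[0])
print('most benign:', words[order[:10]])
print('most toxic: ', words[order[-10:]])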

5 Likes

Here are some attempts by classmates to load up the dataset with multiple labels (towards the bottom of the thread), if you find it helpful.

1 Like

Thank you so much for the help. I’m going to check out that discussion!

I am training a bidirectional LSTM with pretrained GloVe embeddings on a Crestle GPU. It is taking 1 hour per epoch to train. Is that normal?
When I trained a CNN with the same pretrained GloVe embeddings, it took only 1 minute per epoch.

With CuDNNLSTM, 1 epoch takes 2-3 minutes on a GTX 1080 Ti with 300-dimensional embeddings.

How much time did it take to train the CNN? Was it significantly less?

Maybe 4-6 minutes for the same model: 1 bidirectional LSTM layer with a single FC layer of size 128.

Is anything wrong with this code?

from keras.models import Sequential
from keras.layers import (Embedding, Bidirectional, LSTM, RepeatVector,
                          GlobalMaxPool1D, Dense, Dropout)
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, ModelCheckpoint

# MAIN: create the LSTM model
# vocab_size and embedding_matrix are defined earlier in the notebook
model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_matrix],
                    input_length=500, trainable=False))
model.add(Bidirectional(LSTM(50, dropout=0.2, recurrent_dropout=0.2)))
model.add(RepeatVector(500))  # repeat LSTM1's final output 500 times
model.add(Bidirectional(LSTM(50, return_sequences=True)))
model.add(GlobalMaxPool1D())
model.add(Dense(50, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(6, activation='sigmoid'))  # one sigmoid per toxicity label

# compile the model
Adam_opt = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)
model.compile(optimizer=Adam_opt, loss='binary_crossentropy', metrics=['acc'])

early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min')
save_best = ModelCheckpoint('/home/nbuser/toxic.hdf', save_best_only=True,
                            monitor='val_loss', mode='min')

history = model.fit(X_train, y_train, validation_data=(X_eval, y_eval),
                    epochs=1, verbose=1, callbacks=[early_stopping, save_best])

1 Like

Looks okay.
I tried with 2 bidirectional LSTM layers, but the model did not improve as much as I expected.

I used CuDNNLSTM instead of LSTM; that helped bring down the training time.
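
For reference, the swap is nearly a drop-in replacement in the model above, with the caveat that CuDNNLSTM requires a GPU and does not accept the dropout/recurrent_dropout arguments, so any dropout has to move into a separate layer:

from keras.layers import Bidirectional, CuDNNLSTM, Dropout

# instead of Bidirectional(LSTM(50, dropout=0.2, recurrent_dropout=0.2)):
model.add(Bidirectional(CuDNNLSTM(50)))
model.add(Dropout(0.2))  # CuDNNLSTM has no built-in dropout arguments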

1 Like

Well, it is just deeper and wider. I use input_length 100-250, you use 500. I use 1 LSTM, you use 2 LSTMs. I do not use RepeatVector, which, I assume, repeats the output of LSTM1 500 times, so LSTM2 has to process a 500-step sequence again. Plus LSTM instead of CuDNNLSTM. This might be the difference.

1 Like

Yes, dropping to 1 LSTM and using CuDNNLSTM made it better.
Is 1 LSTM enough to learn a good representation?

Hard to say what kinds of architectures people use in this competition. Some declared they can achieve 0.987 public and 0.99+ CV with a single model. My best GRU model gets 0.9811 and 0.987 CV. This very simple GRU gets 0.983 on the public leaderboard, but I have no idea about its CV score.
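
For anyone curious, a "very simple GRU" along these lines could look like the hedged sketch below (this is not the exact shared kernel): embedding, one bidirectional CuDNNGRU, max pooling, and a 6-unit sigmoid head. vocab_size and embedding_matrix are assumed to exist as in the earlier code, and maxlen is set to 100:

from keras.models import Sequential
from keras.layers import (Embedding, Bidirectional, CuDNNGRU,
                          GlobalMaxPool1D, Dense, Dropout)

gru = Sequential()
gru.add(Embedding(vocab_size, 300, weights=[embedding_matrix],
                  input_length=100, trainable=False))
gru.add(Bidirectional(CuDNNGRU(64, return_sequences=True)))
gru.add(GlobalMaxPool1D())
gru.add(Dropout(0.1))
gru.add(Dense(6, activation='sigmoid'))   # one sigmoid per toxicity label
gru.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])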

2 Likes

The number of words in my sequences (maxlen) is 500. What maxlen did you keep?

I tried 100-500. The GRU I shared in the previous reply uses 100. On the forum I saw a recommendation to start with 100.
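
In Keras terms, maxlen is just the pad/truncate length passed to pad_sequences; sequences here stands for the lists of word indices from a fitted Tokenizer:

from keras.preprocessing.sequence import pad_sequences

# pads (or truncates) every comment to exactly 100 tokens
X_train = pad_sequences(sequences, maxlen=100)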

Ok. Thanks!! 😀