Kaggle Comp: NLP Classification

Looks like the last fastai update changed the definition

@jeremy, if we already have the dataset loaded from pandas, is the PATH argument really needed for LanguageModelData.from_dataframes?

Yes, the learner needs to know where to store models, etc.
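For reference, a minimal sketch of how that path is typically set up. The `from_dataframes` call in the comment assumes the fastai 0.7-era signature, so treat the argument names and order as an assumption:

```python
import tempfile
from pathlib import Path

# The learner saves model checkpoints, vocab, etc. under PATH,
# so a writable directory is needed even when the data itself
# comes from DataFrames rather than from files on disk.
PATH = Path(tempfile.mkdtemp()) / "spooky"
PATH.mkdir(parents=True, exist_ok=True)

# Hypothetical call, assuming the fastai 0.7-era API:
# md = LanguageModelData.from_dataframes(PATH, TEXT, "text",
#                                        train_df, val_df, test_df,
#                                        bs=64, bptt=70, min_freq=10)
```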

If it is not too late, can I also join the group?

Hi,

I am looking for torchtext examples on pre-processing text data and loading pre-trained word embedding matrices. Can someone point me in the right direction?

Thanks!!
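While waiting for torchtext pointers: the core of loading a pre-trained embedding matrix is library-independent. A minimal sketch, assuming the plain-text GloVe file layout (a word followed by its vector components on each line); the function name and fallback strategy here are my own:

```python
import io
import random

def load_embedding_matrix(fileobj, vocab, dim):
    """Build a len(vocab) x dim matrix from a GloVe-style text file.

    Words missing from the file get a small random vector, which is
    a common fallback before fine-tuning.
    """
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    matrix = []
    for word in vocab:
        vec = vectors.get(word)
        if vec is None:
            vec = [random.uniform(-0.05, 0.05) for _ in range(dim)]
        matrix.append(vec)
    return matrix

# Tiny fake "GloVe" file with 3-dimensional vectors:
glove = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
matrix = load_embedding_matrix(glove, vocab=["the", "cat", "zzz"], dim=3)
print(len(matrix), len(matrix[0]))  # 3 3
```

The resulting matrix can then be copied into an embedding layer (e.g. a frozen `nn.Embedding` in PyTorch or an `Embedding` layer in Keras).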

Does the dataset class expect data to be in a dogscats/Keras-style directory structure, e.g. train/all/pos or train/all/neg?

IMDB
[screenshot]

Spooky’s content:
[screenshot]

Should we convert our data into an IMDB-style structure, or can the dataset class be customized to handle CSV data?
My dataset is similar.
[screenshot]

@jeremy, @yinterian: could you please share your thoughts?
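If converting turns out to be the easier route, flattening a Spooky-style CSV into an IMDB-style train/&lt;label&gt;/ layout takes only a few lines. A sketch, assuming columns named id, text, and author (adjust to your actual headers):

```python
import csv
import io
import tempfile
from pathlib import Path

def csv_to_imdb_layout(csv_file, out_dir):
    """Write one text file per row under out_dir/train/<label>/<id>.txt."""
    out = Path(out_dir)
    for row in csv.DictReader(csv_file):
        label_dir = out / "train" / row["author"]
        label_dir.mkdir(parents=True, exist_ok=True)
        (label_dir / f"{row['id']}.txt").write_text(row["text"])

# Tiny made-up sample in the assumed Spooky format:
sample = io.StringIO(
    "id,text,author\n"
    "id1,It was a dark night.,EAP\n"
    "id2,The moor was silent.,HPL\n"
)
out_dir = tempfile.mkdtemp()
csv_to_imdb_layout(sample, out_dir)
print(sorted(p.name for p in Path(out_dir).glob("train/*/*.txt")))
# ['id1.txt', 'id2.txt']
```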


I want to try this competition as well, but I’m worried about the training time for the language model. In the IMDB notebook Jeremy ran around 60 epochs for the language model, which would take around 20 hours to train on an Amazon p2 instance.
Do you train your own language models for this competition on the dataset Kaggle provides, or is it possible to use other pretrained models?


@rob Did you manage to fix the overfitting issue?

It’s fast on a p2 (a few minutes to an hour, say), since there is less data.


No, I just trained until the training loss became better than the validation loss and stopped there.
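That stopping rule (halt once training loss drops below validation loss) is easy to automate. A toy sketch with made-up loss curves; the function name is my own:

```python
def train_until_overfit(train_losses, val_losses):
    """Return the epoch at which training loss first dips below
    validation loss, i.e. the point where further training mostly
    memorises the training data."""
    for epoch, (tr, va) in enumerate(zip(train_losses, val_losses)):
        if tr < va:
            return epoch
    return len(train_losses)  # never overfit within the run

# Made-up curves: validation loss plateaus while training loss keeps falling.
train = [4.5, 4.0, 3.6, 3.3, 3.1]
val   = [4.4, 4.1, 3.8, 3.7, 3.7]
print(train_until_overfit(train, val))  # 1
```

In a real training loop the same comparison would run after each epoch, saving a checkpoint before breaking out.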

How far did you manage to reduce the loss?

Down to 3.2 or so. Note this wasn’t for Spooky Author; it was for another dataset.

Interesting.

Is Spooky worth trying with DL? Do we have enough data?


Are you using the splits method for the problem you are working on?

The Spooky Author dataset gave me this with the seed: "I will love you forever, my love ..."

" ..the most wonderful and most most appalling , and when , upon the whole of the day , i had been obliged to make a thorough account of the scene . <eos> i had been obliged to make a thorough account of the thing that the corpse had been made ..."

Not bad … automated script writing, maybe … who knows?
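The generation loop behind samples like this is simple once a language model exists: feed in a seed, then repeatedly sample the next token. A toy sketch using a bigram count model in place of the trained RNN (the corpus and seed below are made up):

```python
import random
from collections import defaultdict

def build_bigrams(tokens):
    """Map each token to the list of tokens that followed it."""
    followers = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        followers[a].append(b)
    return followers

def generate(followers, seed_word, n_words, rng):
    """Extend the seed by sampling each next word from the model."""
    out = [seed_word]
    for _ in range(n_words):
        choices = followers.get(out[-1])
        if not choices:
            break  # dead end: the last word was never followed by anything
        out.append(rng.choice(choices))
    return " ".join(out)

corpus = "i will love you forever and i will remember you forever".split()
model = build_bigrams(corpus)
print(generate(model, "i", 8, random.Random(0)))
```

A trained RNN replaces the bigram lookup with a forward pass that yields a probability distribution over the vocabulary, but the outer loop is the same.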

Shakespeare … this is a gold mine http://www.gutenberg.org/cache/epub/100/pg100.txt

This is pure drama, with the seed: "I will remember you till the last day I live, my love..."

, and your love will be too much occupied in the power of the world . <eos> i had been obliged to take a definite place , and i would not be at the trouble of making it more than a coincidence . <eos> i had been obliged to make ...

This thing writes itself.

Trained on Shakespeare’s entire works:

Seed: ‘All that glitter is not gold’
All that glitters is not gold
tears to darkle, put in him discourse [enter as man first witch. in love and wear begin to queen in cheek up and side, yet, balfumus! reignier. but shall i suward in this whereto some is, and the sweet importune in dance. constable. great house satisfy, quoth my trroap in no by this need; but if i might approve it will nevel thrice beatrice. better. a disprison! either out and thus on the earth to the undererates did. countess. how shall we go to my lordshire. with blind country- large your lamoving king? her. faith, and god barnardic. and speak; and of spit hence of it have done i did break his bloody- pardon fast unto the forceful of dole charles dismissed waure. get you thut, and import’d it before he resires out o, do do such a more or de preading. hence, who not remove proceeding on a reign, what cap mischances your time, who are a follying power honesty, and lord aufidius. she’s


Hi,

I have trained RNNs using pre-trained GloVe embeddings on that data. Accuracy is ~83%, and the loss is lower than with NB. The data is on the smaller side, but I think it’s not a lost cause. Someone ranked in the top 10 said that they are not using LSTMs.

PS - Were you able to fix the torchtext issue? I am unable to figure out how to feed the data and use pre-trained embeddings in PyTorch, and had to move to Keras for this.

Not yet. I need more time to experiment with code to create the data loader in the right manner.

But, I could predict using @KevinB’s suggestion over here.


The language model captures Julius Caesar well:

Stars, hide your fires; Let not light see my black and deep desires. my notfolky drignified, sir? gloucester. fombark, sir, thy words out. alone of their name is in his breatulate land; and i have she vex’d, all octavia in them- prince of wales. i warrant my bearrestry, as you shall plutter a volumny of me; he troubles to tell her. [cassius.] brutus. be so most fool. i would see thee too much? simpcox. perambly and she, for a duke. i cannot speak such as more than you do struck, for last as good life. brought by thee, do your greatness for the gentleman. hither kitchens had i slain on the way; under mystery with an high tongues, and anything there the deer to kiss your life against your sons to proclaim for news and blank aboot in-the bastard! how say, i will, archbishop of duke. debite you, who can well die them. staff. glou. i do nothing [feet and lords.] second messenger. false trable himself. prince. you think the

Glad I am not from the 16th century:
The fault, dear Brutus, is not in our stars, but in ourselves. well, sir. let the likelip like; but a matter. corn. i have strange, sir. marry, but the age of great life to prick the voice horse he had empir’d his bed. and i say grief! the fees is gone. cloten. yes, now the other poor lome. elbow. ay, my lord, face you, sir, i do not, unednan subject. enter king edward ratcliff. argument of his son, being, and i am a lady. and i not agone die. claud. feason, don lancaster, you may convey my fore2. rosaline. my lords. orlando. i know you cassio contempt: and the stroke of the cause flesh and brief his friendness? beat. i cry our office have but great heaven. her good angel? do you le here in his father! we shall read goodly time the duke of the deer was sweet that monking off, a brother of the duke of norfolk, and two senators en posted and bardolph, prince of guarded friend and dovico, buckingham, and isabella rom. fie, will truvy then, and make him become me whom that, hath not been from thy n

Using Shakespeare’s works as the language model corpus. Good fun.


What were the column names for your dataframes?