Lesson 4 In-Class Discussion


(Pete Condon) #204

Sure you can, “this movie is shit” and “this movie is the shit” mean very different things :wink:


(Rafael Valdivia) #205

I did remove stop words with IMDB before, and it does help.


(Pranjal Yadav) #206

Bag of words is a method that throws away word order and represents the text as counts of how often each word occurs, whereas the vocab is the unique set of all words we want to keep for modeling purposes.
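A minimal sketch of the distinction, using a made-up two-review corpus (not actual IMDB data): the vocab is the sorted set of unique words, and each bag-of-words vector just counts occurrences per document.

```python
from collections import Counter

# Toy corpus (hypothetical examples, not from the IMDB data)
docs = ["this movie is great", "this movie is not great at all"]

# Vocab: the unique set of words we keep for modeling
vocab = sorted({w for d in docs for w in d.split()})

# Bag of words: per-document word counts, order discarded
bows = [Counter(d.split()) for d in docs]
vectors = [[c[w] for w in vocab] for c in bows]

print(vocab)    # all unique words, alphabetical
print(vectors)  # one count vector per document
```

Note how "this movie is great" and any reordering of it would produce the identical vector, which is exactly the information bag of words discards.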


(Charles C. Lee) #207

hahah touche :slight_smile:


(Even Oldridge) #209

It sounds like that break doesn’t happen with review boundaries in mind? Does that matter? And in particular if all the reviews tend to be short does it matter if multiple reviews get appended together?


(Ezequiel) #210

Does anyone know which recent papers Jeremy is talking about? The ones that are a breakthrough in NLP?
Maybe https://einstein.ai/research/non-autoregressive-neural-machine-translation?


(yinterian) #211

Jeremy came up with this idea he is discussing. He then found some papers that talk about it.


(Wayne Nixalo) #212

< lecture thoughts – BPTT I think >

I wonder how you could apply this to vision – say a self-driving car or a plane; context is very important… (maybe esp. if you could learn a weighting since some states of a car or plane have a large effect on what’s possible later on) … is there a way to encode images / state the way Jeremy just showed with words?

– around 1:50:00 in the lecture, when Jeremy showed the array of text after someone asked a question.

Hmm… how to do backprop-through-time for images… maybe a multi-input with perceived state --> fed into a ‘decision-maker’ NN…


(Chris Palmer) #213

Are the sentences just concatenated together? And what about something like movie genre - would that have an impact on the type of language / reviewer etc. Is it just the biggest number that wins?


#214

Batching and BPTT are explained well at this link: https://github.com/pytorch/examples/blob/master/word_language_model/main.py
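The scheme in that example can be sketched in plain Python (lists instead of tensors, and a made-up token-ID stream): the whole corpus is laid out in columns, each column being one continuous stream of text, and each training batch is a `bptt`-length slice of rows with the targets shifted one step ahead.

```python
# A minimal sketch of the batching scheme from the linked PyTorch example,
# using plain Python lists instead of tensors. Token IDs are hypothetical.
def batchify(ids, bsz):
    # Trim the remainder so the stream divides evenly, then lay it out
    # in `bsz` columns; each column reads as one continuous text.
    nbatch = len(ids) // bsz
    ids = ids[: nbatch * bsz]
    return [[ids[col * nbatch + row] for col in range(bsz)]
            for row in range(nbatch)]

def get_batch(data, i, bptt):
    # A batch is `bptt` consecutive rows; targets are the same rows
    # shifted one step forward in time.
    seq_len = min(bptt, len(data) - 1 - i)
    x = data[i : i + seq_len]
    y = data[i + 1 : i + 1 + seq_len]
    return x, y

stream = list(range(20))           # pretend token-ID stream
data = batchify(stream, bsz=4)     # 5 rows x 4 columns
x, y = get_batch(data, 0, bptt=2)  # inputs and next-word targets
```

So with `bsz=4`, row 0 is `[0, 5, 10, 15]`: reading down column 0 gives tokens 0..4 in order, which is why the model can carry hidden state from one batch to the next.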


(john v) #215

What’s the size of the embedding for this language model?

— edit —

answer = 200
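Concretely, an embedding size of 200 means each word ID maps to a 200-dimensional vector. A tiny sketch with plain lists (the vocab size here is hypothetical, and real models learn these values rather than leaving them random):

```python
import random

# "Embedding size = 200": one 200-dim vector per word in the vocab.
# Vocab size is hypothetical; values are random stand-ins for learned weights.
vocab_size, emb_size = 10_000, 200
embedding = [[random.gauss(0.0, 0.01) for _ in range(emb_size)]
             for _ in range(vocab_size)]

word_id = 42                  # hypothetical token ID
vector = embedding[word_id]   # that word's 200-dim representation
```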


(Pranjal Yadav) #216

Yes, the sentences are concatenated; each column of the batch represents one continuous stream of text.
Movie genre may impact the sentiment, given the reviewer's bias (in the everyday sense, not the ML sense) toward certain genres.
It's not simply the biggest number winning. You can think of it as a complex combination of sentiment indices associated with each word. Jeremy will talk about it, I guess.


(Karthik Ramesh) #218

@yinterian Why aren't we using pre-trained embeddings like word2vec or GloVe?


(Wojciech Czarnowski) #219

How can we use CNNs for NLP tasks?


(Chris Palmer) #220

What kind of hardware do you need to do this in a timely fashion?


(Phani Srikanth) #221

Some details for Machine Translation in here: https://code.facebook.com/posts/1978007565818999/a-novel-approach-to-neural-machine-translation/


(Chris Palmer) #222

Would you need to use this approach to solving other NLP problems like topic modelling?


(Pranjal Yadav) #223

Because word2vec or GloVe embeddings aren't IMDB-specific and are pretty large in embedding dimension. By training our own IMDB embeddings we create a small and meaningful embedding for our language model.

You can still use word2vec or GloVe on our current dataset :slight_smile:
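If you do want to try that, one common approach is to seed your embedding matrix from the pre-trained vectors where the words overlap. A hedged sketch (the vectors and vocab here are made up; real GloVe files store one `word v1 v2 ...` line per word):

```python
# Hypothetical pre-trained vectors standing in for a parsed GloVe file.
pretrained = {
    "movie": [0.1, 0.2, 0.3],
    "great": [0.4, 0.5, 0.6],
}
vocab = ["movie", "great", "imdbword"]  # hypothetical IMDB vocab
emb_size = 3

# Words missing from the pre-trained set fall back to zeros here;
# a small random init is another common choice.
matrix = [pretrained.get(w, [0.0] * emb_size) for w in vocab]
```

You'd then use this matrix as the initial weights of the embedding layer and fine-tune it on the IMDB text.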


(yinterian) #224

Topic modeling is often not framed as a classification problem. For an NLP classification problem, this approach should work.


(Chris Palmer) #225

Sure, but I am trying to get my head around the utility of language modelling with fast.ai - can it feed into topic modelling?