RNN Design Guidelines?

pietz · September 26, 2017, 12:40pm

After spending most of my time with images and CNNs, I’m starting to get into NLP with RNNs. Although Jeremy’s lectures on the topic are fantastic, my understanding of how to tweak an architecture is still limited.

Are you aware of any rules of thumb to live by when certain problems emerge with your current architecture? When would I increase the number of channels as opposed to building a deeper model, and vice versa? Are there any rules for where you would usually place dropout, and how much do you commonly use in RNNs? When would I go for character-level instead of word-level predictions? These types of things.

Thanks

mmusket · September 26, 2017, 1:05pm

I’d try using 1D convolutions first instead of RNNs. They train much, much faster and are harder to overfit.

pietz · September 26, 2017, 1:08pm

any papers or tutorials you can point to regarding CNN-based sequence-to-sequence models? i’d love to stick to the CNN side of things, but i got the impression that many fields in NLP benefit heavily from RNNs.

mmusket · September 27, 2017, 1:00am

Here is a working example in Keras, based on one of the main papers on the subject.

pietz · September 28, 2017, 3:55pm

thanks for your time.

how would i create a CNN model that outputs a fixed length of characters?

In the course I only see them used for sentiment analysis and for predicting the next char based on the previous sequence, meaning both can be seen as a single 1-out-of-N classification.

what if i wanted to create a model that maps 16 input chars to 16 output chars in order to learn simple ciphers like Caesar or Vigenère? this would be a 1:1 mapping between chars where the order of the sequence plays a role as well. i cannot wrap my head around building a simple CNN architecture that does just that.
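
One possible way to frame the 16-to-16 task, offered only as a sketch: treat it as 16 parallel 1-out-of-N classifications, one per position, and keep the sequence length unchanged by zero-padding the convolution ("same" padding). The pure-Python code below is not from the course or from any linked example. The helper names (`conv1d_same`, `caesar_kernel`, `decode`), the 26-letter lowercase alphabet, and the hand-set weights are all assumptions made for illustration. A real model would learn the kernel by training instead of having it written by hand.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumed 26-symbol alphabet for this sketch
SEQ_LEN = 16


def caesar(text, shift):
    """Reference Caesar cipher: shift every letter, wrapping around."""
    n = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % n] for c in text)


def one_hot(text):
    """Encode a string as rows of one-hot vectors, shape (len(text), 26)."""
    n = len(ALPHABET)
    return [[1.0 if i == ALPHABET.index(c) else 0.0 for i in range(n)]
            for c in text]


def conv1d_same(x, kernel):
    """1D convolution over time with zero 'same' padding.

    x: (T, C_in), kernel: (K, C_in, C_out) with K odd. Returns (T, C_out),
    so the output has exactly one score vector per input position.
    """
    k, c_in, c_out = len(kernel), len(x[0]), len(kernel[0][0])
    pad = k // 2
    zero = [0.0] * c_in
    padded = [zero] * pad + x + [zero] * pad
    out = []
    for t in range(len(x)):
        row = [0.0] * c_out
        for dk in range(k):
            for ci in range(c_in):
                v = padded[t + dk][ci]
                if v:
                    for co in range(c_out):
                        row[co] += v * kernel[dk][ci][co]
        out.append(row)
    return out


def caesar_kernel(shift, width=3):
    """Hand-set (not learned) weights: only the centre tap is non-zero."""
    n = len(ALPHABET)
    kernel = [[[0.0] * n for _ in range(n)] for _ in range(width)]
    for ci in range(n):
        kernel[width // 2][ci][(ci + shift) % n] = 1.0
    return kernel


def decode(scores):
    """Pick the highest-scoring letter independently at each position."""
    return "".join(ALPHABET[max(range(len(r)), key=r.__getitem__)]
                   for r in scores)


random.seed(0)
plain = "".join(random.choice(ALPHABET) for _ in range(SEQ_LEN))
scores = conv1d_same(one_hot(plain), caesar_kernel(3))
predicted = decode(scores)
```

Here `scores` has one 26-way score vector for each of the 16 positions, and `predicted` matches `caesar(plain, 3)`. In a framework such as Keras, the rough equivalent would be a `Conv1D` layer with `padding='same'` followed by a per-position softmax. A Vigenère cipher depends on each character's position relative to the key, which a plain convolution does not see directly, so it would likely need an extra position input or a different architecture.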