Hi everyone, I'm somewhat new to deep learning.
I’ve been trying to implement a custom LSTM/GRU network in PyTorch for text sequence generation, following some examples (Karpathy’s blog post, colah’s blog post). I first tried batch size 1 with some success, but now that I’m implementing batching I’m not sure I’m slicing the output tensor correctly. PyTorch’s built-in RNN modules batch by default: the forward pass expects 3D tensors. My custom GRU class accepts input as a 3D tensor [batches, characters, vectors], where characters is the input context and vectors are the encoded characters. Since the output contains many batches, how can I know I’m slicing it correctly when I sample it for text? I’ve tried several sampling techniques: NumPy random.choice(), PyTorch multinomial() with temperature, and argmax().
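Not having seen your code yet, here's a minimal sketch of how I'd slice a batch-first output and sample from it, assuming your GRU returns logits shaped [batch, characters, vocab_size] (the shapes and the temperature value are illustrative, not taken from your code). The key point is that for next-character sampling you index the last timestep along dim 1, which leaves one probability distribution per batch element:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: a batch-first GRU producing per-character logits
# [batch, characters, vocab_size]; random tensor stands in for model output.
batch, context_len, vocab_size = 4, 10, 65
logits = torch.randn(batch, context_len, vocab_size)

# Take the LAST timestep for every sequence in the batch: [batch, vocab_size]
last_logits = logits[:, -1, :]

# Temperature scaling before softmax: <1 sharpens, >1 flattens the distribution
temperature = 0.8
probs = F.softmax(last_logits / temperature, dim=-1)

# One sampled character index per batch element: [batch, 1]
next_chars = torch.multinomial(probs, num_samples=1)
```

A quick sanity check for the slicing is to assert `probs.sum(dim=-1)` is all ones and that `next_chars` has shape `[batch, 1]`; if either fails, the batch and time dimensions are probably swapped somewhere.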
Code here: https://pastebin.com/ZmxSEm9c