Hi everyone, I'm somewhat new to deep learning.
I’ve been trying to implement a custom LSTM/GRU network in PyTorch for text sequence generation, following some examples (Karpathy’s blog post, colah’s blog post). I first tried batch size 1 with some success, but now that I’m implementing batching I’m not sure I’m slicing the output tensor correctly. PyTorch’s built-in RNN modules batch by default: the forward pass expects 3D tensors. My custom GRU class accepts input as a 3D tensor [batches, characters, vectors], where characters is the input context and vectors are the encoded characters. Since the output contains many batches, how can I know I’m slicing it correctly when I sample it for text? I’ve tried several sampling techniques: NumPy random.choice(), PyTorch multinomial() with temperature, and argmax().
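Not having seen your code yet, here's a minimal sketch of how I'd slice a batch-first output and sample from it, assuming your GRU returns logits shaped [batch, characters, vocab_size] (the shapes and the temperature value are illustrative, not taken from your code). The key point is that for next-character sampling you index the last timestep along dim 1, which leaves one probability distribution per batch element:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: a batch-first GRU producing per-character logits
# [batch, characters, vocab_size]; random tensor stands in for model output.
batch, context_len, vocab_size = 4, 10, 65
logits = torch.randn(batch, context_len, vocab_size)

# Take the LAST timestep for every sequence in the batch: [batch, vocab_size]
last_logits = logits[:, -1, :]

# Temperature scaling before softmax: <1 sharpens, >1 flattens the distribution
temperature = 0.8
probs = F.softmax(last_logits / temperature, dim=-1)

# One sampled character index per batch element: [batch, 1]
next_chars = torch.multinomial(probs, num_samples=1)
```

A quick sanity check for the slicing is to assert `probs.sum(dim=-1)` is all ones and that `next_chars` has shape `[batch, 1]`; if either fails, the batch and time dimensions are probably swapped somewhere.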
Code here: https://pastebin.com/ZmxSEm9c