Hello everyone, a noob here. I'd be grateful if you could help me understand why this happens and how to solve it.
Basically, I followed Udacity's PyTorch IPython notebooks and worked through the Character RNN example. I wrote everything myself and it all works fine. However, today I noticed that there are different RNN/LSTM types, which are as follows:
- many to many
- many to one
- one to many
- one to one
and apparently, taking text as input and producing text as output is the many-to-many (sequence-to-sequence) type!
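To make the distinction concrete, here is a minimal sketch of how many-to-many and many-to-one differ with a plain `nn.LSTM`. The sizes are made up for illustration and this is not the notebook's model: the recurrent layer always returns one output per time step, and the "type" is mostly a question of which of those outputs you use.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, hidden_size, seq_len = 83, 16, 100

lstm = nn.LSTM(input_size=vocab_size, hidden_size=hidden_size, batch_first=True)

# A batch holding one sequence of 100 vectors of one-hot width.
x = torch.zeros(1, seq_len, vocab_size)

output, (h, c) = lstm(x)

many_to_many = output            # one output per input step
many_to_one = output[:, -1, :]   # only the output of the final step

print(many_to_many.shape)  # torch.Size([1, 100, 16])
print(many_to_one.shape)   # torch.Size([1, 16])
```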
I noticed that during training we feed in a sequence of 100 characters and get back an output sequence of 100 characters. So far so good! But when it comes to generating text ourselves, the author used a single input (one character at a time), and with that she generated a lot of text that looked good. By "looking good" I mean that words and punctuation were mostly correct; there were actual words and phrases, not something random.
However, I tried to see whether feeding multiple characters at once would generate the same output, or at least something similar. To my surprise, the output was garbage!
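As a sanity check on the expectation above, the sketch below compares feeding a whole sequence to an `nn.LSTM` in one call against feeding it one step at a time while carrying the hidden state forward. The layer and sizes are hypothetical stand-ins, not the notebook's model. If the two agree, the recurrent layer itself does not care how the input is chunked.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
vocab_size, hidden_size, seq_len = 10, 8, 5
lstm = nn.LSTM(vocab_size, hidden_size, batch_first=True)
lstm.eval()

x = torch.randn(1, seq_len, vocab_size)

with torch.no_grad():
    # All time steps in a single call.
    full_out, (full_h, full_c) = lstm(x)

    # One time step per call, passing the state from the previous call.
    state = None
    step_outs = []
    for t in range(seq_len):
        out_t, state = lstm(x[:, t:t + 1, :], state)
        step_outs.append(out_t)
    step_out = torch.cat(step_outs, dim=1)

same_outputs = torch.allclose(full_out, step_out, atol=1e-5)
same_hidden = torch.allclose(full_h, state[0], atol=1e-5)
print(same_outputs, same_hidden)
```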
Here is what the original sampling functions look like:
import numpy as np
import torch

# one_hot_encode is the helper defined in the notebook

def predict(model, input_char, hidden_states, char2int, int2char, length, topk, device):
    # 1. convert the char into an int and then one-hot encode it
    input_int = np.array([char2int[input_char]]).reshape(1, -1)
    input_one_hot = one_hot_encode(input_int, length)
    input_tensor = torch.from_numpy(input_one_hot).to(device)
    output, hidden_states = model(input_tensor, hidden_states)
    # detach the hidden state so no gradient history is kept
    hidden_states = tuple(h.data for h in hidden_states)
    # turn the scores into probabilities over the characters
    output = torch.nn.functional.softmax(output, dim=1).data
    if topk is None:
        # no top-k requested: keep every character as a candidate
        probs, top_characters = output.topk(length)
    else:
        probs, top_characters = output.topk(topk)
    top_characters = top_characters.cpu().numpy().squeeze()
    probs = probs.cpu().numpy().squeeze()
    # sample one of the candidates, weighted by renormalised probability
    char = np.random.choice(top_characters, p=probs/probs.sum())
    return int2char[char], hidden_states
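def sample(model, size, string_prime, topk, device):
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    chars = [c for c in string_prime]
    w = (next(model.parameters())).data
    # zero-initialised (h, c) for a batch of one sequence
    h = (w.new_zeros(model.num_layers, 1, model.hidden_size).to(device),
         w.new_zeros(model.num_layers, 1, model.hidden_size).to(device))
    # run the prime text through the model one character at a time
    # (83 is the one-hot length passed to one_hot_encode)
    for c in string_prime:
        char, h = predict(model, c, h, model.char2int, model.int2char, 83, topk, device)
    # keep only the prediction that follows the last prime character
    chars.append(char)
    # keep generating, always feeding the most recent character back in
    for c in range(size):
        char, h = predict(model, chars[-1], h, model.char2int, model.int2char, 83, topk, device)
        chars.append(char)
    return ''.join(chars)

sample(model, 1000, 'The time', 5, device)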
And this is my version, which produces garbage output on the very same model for which the functions above produce very good results:
def predict2(model, input_string, hidden_states, char2int, int2char, length, topk, device):
    # 1. convert the string into chars, then ints, and then one-hot encode it
    all_chars = [char for char in input_string]
    input_int = np.array([char2int[ch] for ch in all_chars]).reshape(1, -1)
    input_one_hot = one_hot_encode(input_int, length)
    input_tensor = torch.from_numpy(input_one_hot).to(device)
    output, hidden_states = model(input_tensor, hidden_states)
    # remove hidden state history or something else that I don't fully understand!!
    hidden_states = tuple(h.data for h in hidden_states)
    # our output is a set of scores and we need a probability distribution, so
    # we use softmax; we also use the .data attribute since we are after
    # the values and don't need grads!
    output = torch.nn.functional.softmax(output, dim=1).data
    # now take the top-k most probable characters for every input position
    if topk is None:
        # no top-k requested: keep every character as a candidate
        probs, top_characters = output.topk(length)
    else:
        probs, top_characters = output.topk(topk)
    top_characters = top_characters.cpu().numpy().squeeze()
    probs = probs.cpu().numpy().squeeze()
    # sample one character per input position
    chars = []
    for i in range(probs.shape[0]):
        char = np.random.choice(top_characters[i], p=probs[i]/probs[i].sum())
        chars.append(char)
    return [int2char[char] for char in chars], hidden_states
def sample2(model, size, string_prime, topk, device):
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # our output shall start with this prime text,
    # we first get the output for the last character,
    # then continue to generate more using the last character! each time
    w = (next(model.parameters())).data
    h = (w.new_zeros(model.num_layers, 1, model.hidden_size).to(device),
         w.new_zeros(model.num_layers, 1, model.hidden_size).to(device))
    chars = []
    chars.append(string_prime)
    for i in range(size):
        string_prime, h = predict2(model, string_prime, h, model.char2int, model.int2char, 83, 5, device)
        chars.append(''.join(string_prime))
    return ''.join(chars)

print('sample 2: ')
sample2(model, 20, 'The time', 5, device)
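For comparison, here is a minimal sketch of a multi-character variant that samples from the last output row only. It rests on an assumption about the notebook: that the training targets are the inputs shifted by one character, so row i of the output is the prediction for the character after input position i, and only the last row predicts a character not already in the input. `TinyCharRNN` and `predict_last` are hypothetical names, and the model is an untrained stand-in, so the generated text is meaningless; the point is the data flow.

```python
import numpy as np
import torch
import torch.nn as nn


class TinyCharRNN(nn.Module):
    """Hypothetical, untrained stand-in for the notebook's model."""

    def __init__(self, vocab_size, hidden_size=32, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(vocab_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)
        # Flatten to (batch * seq_len, vocab_size): one row of scores per input position.
        return self.fc(out.reshape(-1, out.size(-1))), hidden


def predict_last(model, text, hidden, char2int, int2char, topk=5):
    """Feed `text` in one call, then sample one character from the LAST row only."""
    idx = torch.tensor([[char2int[ch] for ch in text]])
    x = nn.functional.one_hot(idx, num_classes=len(char2int)).float()
    with torch.no_grad():
        scores, hidden = model(x, hidden)
    # With a batch of one sequence, the last row belongs to the last time step.
    probs = torch.softmax(scores[-1], dim=0)
    top_p, top_i = probs.topk(topk)
    top_p = top_p.numpy().astype(np.float64)
    choice = np.random.choice(top_i.numpy(), p=top_p / top_p.sum())
    return int2char[int(choice)], hidden


# Toy vocabulary built from a short string, for illustration only.
chars = sorted(set("The time flies"))
char2int = {ch: i for i, ch in enumerate(chars)}
int2char = {i: ch for ch, i in char2int.items()}

torch.manual_seed(0)
model = TinyCharRNN(len(chars)).eval()

# Prime with the whole string in one call, then continue one character at a
# time, carrying the hidden state so earlier text is not fed in again.
generated = "The time"
next_char, hidden = predict_last(model, generated, None, char2int, int2char)
for _ in range(10):
    generated += next_char
    next_char, hidden = predict_last(model, next_char, hidden, char2int, int2char)
print(generated)
```

By contrast, `predict2` above samples a character for every input position and then feeds all of those samples back in as the next input, which may be worth checking as the source of the difference.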
Example outputs:
Sample 1:
print('sample 1: ')
sample(model, 100, 'The time', 5, device)
outputs:
sample 1:
'The time, and no\none in a child her moist sly cress swing on his eyes as though some of Varenka had\ncranced f'
Sample 2:
print('sample 2: ')
sample2(model, 20, 'The time', 5, device)
outputs:
sample 2:
'The timerirohme ,air adbidtsfsmatei\nuianednsetldt e,.uy.ebx _o. ues u\nHn\n pon\neigmesscithad ,eeisn-p s,n, lrdi s toessepco s\n lshrb\nimaceaimnpiisttyeynl ii d eywnmmiy'
My question is: isn't it supposed to work either way? Why does it fail with a multi-character input sequence and only work with single-character input?
Clearly the loss decreases and the network learns something, so why does generation only work with sequences of length 1? Do I need to do something else in PyTorch to get this to work?
Thank you very much in advance.