Just for fun I ran this, and this is what I got on my first attempt (without beam search, directly sampling from the softmax of the RNN):
The President has spoken! He seems to agree with my statement above
For those comfortable in Keras:
if __name__ == '__main__':
    import numpy as np
    import pandas as pd
    import re

    print("Enter the path to the data folder")
    path = str(input())
    file = open(path + 'warpeace_input.txt', 'r')
    text = file.read()
    file.close()

    # Keep letters only; everything else becomes a space
    text = re.sub('[^a-zA-Z]', ' ', text)
    text_input = list(text)
    print("size of data : {}".format(len(text_input)))
    vocab = set(text_input)
    print("size of vocabulary : {}".format(len(vocab)))

    # Create dictionaries mapping characters to integer ids and back
    char_to_int = dict((k, v) for v, k in enumerate(vocab))
    int_to_char = dict((k, v) for k, v in enumerate(vocab))

    # The fun part
    # Without any loss of generality let us assume seq_len = 50
    seq_len = 50
    vocab_size = len(vocab)
    char_corpus = len(text_input)
    # Leave one trailing character so the shifted target never runs past the end
    n_sample = (char_corpus - 1) // seq_len

    # Create one-hot input/target matrices with shape [sample, seq_len, features]
    X = np.zeros(shape=(n_sample, seq_len, vocab_size))
    Y = np.zeros(shape=(n_sample, seq_len, vocab_size))
    for i in range(n_sample):
        if (i + 1) % 1000 == 0:
            print("{} sequences generated".format(i + 1))
        # One-hot encode the input sequence
        x_seq = text_input[i * seq_len: (i + 1) * seq_len]
        x_seq_in = [char_to_int[k] for k in x_seq]
        x_input = np.zeros((seq_len, vocab_size))
        for j in range(seq_len):
            x_input[j][x_seq_in[j]] = 1
        X[i] = x_input
        # The target is the same sequence shifted forward by one character
        y_seq = text_input[i * seq_len + 1: (i + 1) * seq_len + 1]
        y_seq_in = [char_to_int[k] for k in y_seq]
        y_input = np.zeros((seq_len, vocab_size))
        for j in range(seq_len):
            y_input[j][y_seq_in[j]] = 1
        Y[i] = y_input

    print("the shape of input matrix : ", X.shape)
    print("the shape of target matrix : ", Y.shape)

    import keras
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, LSTM, TimeDistributed

    # Build the keras model
    # Set the hyperparameters for the model
    epochs = 200
    batch_size = 64
    hidden_lstm = 64
    model = Sequential()
    model.add(LSTM(hidden_lstm, input_shape=(None, vocab_size), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(hidden_lstm, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(hidden_lstm // 2, return_sequences=True))
    model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.summary()
    model.fit(X, Y, batch_size=batch_size, epochs=epochs, verbose=1)
    model.save(path + 'language.h5')

    # Make predictions
    def generate_text(model, length):
        # Start from a random character and greedily predict the next one
        ix = [np.random.randint(vocab_size)]
        y_char = [int_to_char[ix[-1]]]
        X = np.zeros((1, length, vocab_size))
        for i in range(length):
            X[0, i, :][ix[-1]] = 1
            print(int_to_char[ix[-1]], end="")
            ix = np.argmax(model.predict(X[:, :i + 1, :])[0], 1)
            y_char.append(int_to_char[ix[-1]])
        return ''.join(y_char)

    print('enter the length')
    length = int(input())
    x = generate_text(model, length)
    print(x)
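If instead of taking the argmax at every step you want to sample directly from the softmax (as mentioned at the top of the thread), a small variant of generate_text along these lines should work. This is only a sketch reusing vocab_size and int_to_char from the script above; the function name sample_text and the temperature parameter are my own additions:

    import numpy as np

    def sample_text(model, length, temperature=1.0):
        # Variant of generate_text: instead of taking the argmax at each step,
        # sample the next character from the predicted softmax distribution
        ix = [np.random.randint(vocab_size)]
        y_char = [int_to_char[ix[-1]]]
        X = np.zeros((1, length, vocab_size))
        for i in range(length):
            X[0, i, :][ix[-1]] = 1
            probs = model.predict(X[:, :i + 1, :])[0][-1]   # distribution over the next char
            probs = np.log(probs + 1e-8) / temperature       # temperature < 1 sharpens, > 1 flattens
            probs = np.exp(probs) / np.sum(np.exp(probs))
            ix = [np.random.choice(vocab_size, p=probs)]
            y_char.append(int_to_char[ix[-1]])
        return ''.join(y_char)

Sampling like this tends to give more varied (if noisier) text than always picking the most probable character.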
I saw that link earlier and was struck by the comments talking about how beam search with higher values resulted in repeating patterns.
The papers that mention beam search in the context of sequence prediction nets generally use a beam width of 2, or another low value.
It’s curious to me that searching more widely results in this kind of looping, while narrowing the beam makes the output more dynamic. It looks from your example like you’ve set the beam width to 3, so I’m surprised it’s so repetitive, but I suspect you’re right and that the softmax is forcing the results down a consistent path.
Kudos on the implementation though! It’s something I hope to find time to do at the word level.
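For reference, here is roughly what a per-step beam search over the network's softmax could look like. This is only a sketch under my own assumptions, not the implementation discussed above; beam_search and the step_probs_fn callback are made-up names for illustration:

    import numpy as np

    def beam_search(step_probs_fn, start_ix, steps, beam_width=3):
        # step_probs_fn(seq) should return the softmax distribution over the next
        # character given the list of character indices generated so far
        beams = [([start_ix], 0.0)]  # each beam is (sequence, cumulative log-probability)
        for _ in range(steps):
            candidates = []
            for seq, score in beams:
                probs = step_probs_fn(seq)
                # Expand the beam with the beam_width most likely next characters
                for ix in np.argsort(probs)[-beam_width:]:
                    candidates.append((seq + [int(ix)], score + np.log(probs[ix] + 1e-8)))
            # Keep only the best beam_width candidates overall
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]

With beam_width=1 this reduces to greedy decoding; widening the beam favours high-probability and hence often repetitive continuations, which may be why wider beams loop.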
I need to dig in a little more on the cosine annealing front. During my master’s we optimized routing on FPGAs with simulated annealing, so it’s a methodology I understand well. How does it differ performance-wise from SGD with restarts, and is there a reason to use it instead of that method?
Anyway, thanks for sharing!
I still get confused by the nomenclature when it comes to SGDR and cyclical learning rates, but as far as I understand, the cosine annealing callback in the fastai library implements SGD with restarts.
As for differences in performance - I haven’t tested, but training feels substantially different. You just throw in a learning rate that seems decent and don’t have to bother with manually tweaking it, which I think is great. Plus you get to save your models at cycle ends, which can make for nice ensembling.
It would be really great to have a comparison of how it stacks up against training a model with Adam, for example, but I am not sure anyone got far with that (I think people claimed the results were not that great after all, and nothing was published).
EDIT: Here is the paper on SGDR.
EDIT2: I used Adam with the cosine annealing callback. At some point I will want to compare it with Adam without the callback. Just the fact that fastai allows for such a combination so effortlessly is really amazing.
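For anyone curious what the schedule from the SGDR paper looks like in code, here is a rough sketch of cosine annealing with warm restarts. It is not the fastai callback itself; sgdr_lr and its parameters are just illustrative:

    import math

    def sgdr_lr(step, lr_max, lr_min=0.0, cycle_len=100, cycle_mult=2):
        # Cosine annealing with warm restarts: within a cycle the learning rate
        # decays from lr_max to lr_min along a cosine curve, then jumps back to
        # lr_max and the next cycle is cycle_mult times longer
        t, length = step, cycle_len
        while t >= length:
            t -= length
            length *= cycle_mult
        return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))

    # e.g. the learning rate over 700 steps, starting at lr_max = 1e-2
    schedule = [sgdr_lr(s, lr_max=1e-2) for s in range(700)]

The restarts are what make cycle-end snapshots natural points for saving models to ensemble later.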
Guys this is my first blog. Please review and comment.
Monday 4/8! Can’t believe I am already halfway through!
A funny thing is happening. I still sweat about the things that I publish not being good, and I think I have better things coming, but the friction around sharing things is slowly, very slowly, diminishing. This is nice.
I did not start to feel like that after the 1st blog post, nor after the 3rd, but after a couple more it seems to be getting a bit better.
Today I bring you Talk like the President, part 2. Nothing new here for our fast.ai sisters and brothers: exactly what was covered in lecture 6 and the first part of lecture 7, with added beam search and a slight twist.
BTW there is way more training data that gets downloaded than what I use, and I didn’t spend much time optimizing the network architecture, so there is definitely quite a bit more performance that can be squeezed out of this.
EDIT: OK, not sure anymore if with time you care less about what you publish being crap. Maybe you just learn to ignore this pesky feeling and move on.
I have to mention this here as well.
Witnessing the change from @radek being nervous about whether he’ll make the deadlines (Mondays) to whether he’ll be able to put out top-notch content is really motivating.
I hope I get to that point sometime - I’m still terrified before every class and somewhat convinced after each class that it should have been far better…
This is an old blog post that I rewrote quite extensively. I was tempted to delete it but then decided to go with a major edit.
I’ve got good results on the German Traffic Sign Recognition Dataset using the general fast.ai approach to image classification (though I had to spend a few days tuning different parts). Wrote a post about that:
It’s a draft, I’m going to publish it at the beginning of the next week. Please take a look, I appreciate any feedback.
I probably wrote this article as much to share my findings as to figure things out for myself. I intended it to be the last post of the series (currently on week 6 out of 8), but I started to notice that as time passed I cared about this less and less. Of course I still find what I wrote important and I do my best to stick to it. Also, using Twitter has not ceased to be a challenge. But having figured it out to an extent I am comfortable with, I moved on to other things. It seems that right now what I have on my mind are resnets, densenets and a bit of rmsprop.
So, I decided to switch things around and publish the article now. The two remaining posts will be technical, although with a slight twist, one that I couldn’t help. One features dragons and the other bubbles. I am actually asking the reader to imagine they are a bubble.
Please take a look, I appreciate any feedback.
Thanks for the feedback. I’ve published the article.
Blog post Monday 7/8!
What led me to write it was a question I asked a couple of months ago, and one that I see asked on the forums quite frequently.
The tone of the article is not very serious and I have significant concerns about that. I am not sure I could have shared the information I wanted to convey in a more serious fashion. Anyhow - maybe someone will find it useful.
And this is my last blog post in the 8 week series
Of course, there are many doubts about both the content and the presentation. I think I should have asked people to take a look at the post before I published it. But all these coulda shoulda wouldas are probably also a direct result of the insecurity about posting things that, despite having done this for 8 weeks, hasn’t completely gone away.
But oh well. If you cannot conquer it, maybe at least you can ignore it
Are you a Writer?
Truly your writing skills are awesome…
Nice post
Thank you, I am not a writer. And for most of my professional life my writing sucked, to the point where I had it in my year-end objectives for a couple of years running to figure out how to be more concise. I never did before leaving that job.
I do put a lot of work into writing these articles though. Really glad you enjoy them
I have to confess - I never took blogging seriously. My first article was a really bad attempt. But @radek has been a constant source of learning for, I think, all of us, and I think I can say I’ve improved a lot - thanks to him.
I think this is the best part about fast ai too. We get to meet great DL folks.
I’ve learnt so much in the past 6 months just by scrolling through the forum threads daily, more than what my school has taught me during my 3 years of undergrad studies.
ooo…Does anyone remember the moment from Part1v2 where Jeremy had posted a screenshot of the LB?
And the LB got turned upside down every 2 hours?
Hi, I just wrote a blog to explain the intuition of Decision Trees/Random Forest.
P.S. Reposted here as I did not see this thread yesterday.
I’ve posted a few blog posts here on setting up AWS and Jupyter notebooks. Feedback welcome. I’m tutoring a local group in Brisbane and found it easier to blog on Medium than to demo the same thing multiple times.
Suggestions welcome on the best way to assign a static IP (an Elastic IP on AWS), route a domain to it, and create an SSL certificate from a recognised provider.
Hi all, I have begun blogging and have written a couple of posts. I would like some suggestions on what I can improve (especially since the first one is based on the first deep learning class).
Augmentation for Image Classification:
I’m planning to put together a series of best practices from all the resources that I come across, so this is the first one:
@jeremy your feedback will be most valuable.
Thanks a lot!