Thread for Blogs (Just created one for ResNet)


#182

Just for fun, I ran this, and this is what I got on my first attempt :slight_smile: (without beam search, directly sampling from the softmax of the RNN):

The President has spoken! :slight_smile: He seems to agree with my statement above :wink:
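
Roughly, the sampling step looks something like this (a minimal sketch, not my actual code; `probs` stands for the model’s softmax output over the vocabulary at the last timestep, and the temperature parameter is just an optional extra):

import numpy as np

# Minimal sketch: sample the next character index from the softmax output.
# `probs` is assumed to be a 1-D array of length vocab_size summing to 1.
def sample_from_softmax(probs, temperature=1.0):
    # Temperature < 1 sharpens the distribution, > 1 flattens it (optional)
    logits = np.log(probs + 1e-8) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    return np.random.choice(len(probs), p=probs)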


(Aditya) #183

For those comfortable in Keras

if __name__ == '__main__':
    import re

    import numpy as np

    # Load the raw text and keep only letters (everything else becomes a space)
    print("Enter the path to the directory containing warpeace_input.txt (ending with '/'):")
    path = str(input())
    with open(path + 'warpeace_input.txt', 'r') as file:
        text = file.read()
    text = re.sub('[^a-zA-Z]', ' ', text)
    text_input = list(text)
    print("size of data: {}".format(len(text_input)))
    vocab = set(text_input)
    print("size of vocabulary: {}".format(len(vocab)))

    # Dictionaries mapping characters to integer indices and back
    char_to_int = dict((k, v) for v, k in enumerate(vocab))
    int_to_char = dict((k, v) for k, v in enumerate(vocab))

    # The fun part
    # Without any loss of generality let us assume seq_len = 50
    seq_len = 50
    vocab_size = len(vocab)
    char_corpus = len(text_input)
    # Leave room for the one-character shift in the targets
    n_sample = (char_corpus - 1) // seq_len

    # Build one-hot encoded input and target tensors of shape
    # [samples, seq_len, vocab_size]; the target is the input shifted by one character
    X = np.zeros(shape=(n_sample, seq_len, vocab_size))
    Y = np.zeros(shape=(n_sample, seq_len, vocab_size))
    for i in range(n_sample):
        if (i + 1) % 1000 == 0:
            print("{} sequences generated".format(i + 1))
        x_seq = text_input[i * seq_len: (i + 1) * seq_len]
        x_seq_in = [char_to_int[k] for k in x_seq]
        x_input = np.zeros((seq_len, vocab_size))
        for j in range(seq_len):
            x_input[j][x_seq_in[j]] = 1
        X[i] = x_input

        y_seq = text_input[i * seq_len + 1: (i + 1) * seq_len + 1]
        y_seq_in = [char_to_int[k] for k in y_seq]
        y_input = np.zeros((seq_len, vocab_size))
        for j in range(seq_len):
            y_input[j][y_seq_in[j]] = 1
        Y[i] = y_input
    print("the shape of the input matrix:", X.shape)
    print("the shape of the target matrix:", Y.shape)

    from keras.models import Sequential
    from keras.layers import Dense, Dropout, LSTM, TimeDistributed

    # Build the Keras model
    # Set the hyperparameters for the model
    epoch = 200
    batch_size = 64
    hidden_lstm = 64

    # Three stacked LSTMs with a softmax over the vocabulary at every timestep
    model = Sequential()
    model.add(LSTM(hidden_lstm, input_shape=(None, vocab_size), return_sequences=True))
    model.add(Dropout(0.2))
    model.add(LSTM(hidden_lstm, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(LSTM(hidden_lstm // 2, return_sequences=True))
    model.add(TimeDistributed(Dense(vocab_size, activation='softmax')))
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    model.summary()
    model.fit(X, Y, batch_size=batch_size, epochs=epoch, verbose=1)
    model.save(path + 'language.h5')

    # Make predictions: start from a random character and repeatedly feed the
    # generated sequence back in, taking the argmax at each step
    def generate_text(model, length):
        ix = [np.random.randint(vocab_size)]
        y_char = [int_to_char[ix[-1]]]
        X = np.zeros((1, length, vocab_size))
        for i in range(length):
            X[0, i, :][ix[-1]] = 1
            print(int_to_char[ix[-1]], end="")
            ix = np.argmax(model.predict(X[:, :i + 1, :])[0], 1)
            y_char.append(int_to_char[ix[-1]])
        return ''.join(y_char)

    print('Enter the length of text to generate')
    length = int(input())
    x = generate_text(model, length)
    print(x)

(Even Oldridge) #184

I saw that link earlier and was struck by the comments talking about how beam search with larger beam widths resulted in repeating patterns.

The papers that mention beam search in the context of sequence prediction nets generally use a beam width of 2, or another low value.

It’s curious to me that searching more widely results in this kind of looping, and that narrowing the beam makes the output more dynamic. It looks from your example like you’ve set the beam width to 3, so I’m surprised it’s so repetitive, but I suspect you’re right that the softmax is forcing the results down a consistent path.
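
For anyone following along, here is a rough sketch of the kind of beam search I have in mind (`predict_next` is a made-up stand-in for whatever function returns log-probabilities over the vocabulary given a prefix):

import numpy as np

def beam_search(predict_next, start_ix, beam_width=2, length=50):
    # Each beam entry is (cumulative log-probability, list of character indices)
    beams = [(0.0, [start_ix])]
    for _ in range(length):
        candidates = []
        for logp, prefix in beams:
            # `predict_next(prefix)` is assumed to return a 1-D array of
            # log-probabilities over the vocabulary for the next character
            log_probs = predict_next(prefix)
            for ix in np.argsort(log_probs)[-beam_width:]:
                candidates.append((logp + log_probs[ix], prefix + [int(ix)]))
        # Prune back down to the `beam_width` highest-scoring sequences
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]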

Kudos on the implementation though! It’s something I hope to find time to do at the word level.

I need to dig in a little more on the cosine annealing front. During my master’s we optimized routing on FPGAs with simulated annealing, so it’s a methodology I understand well. How does it differ performance-wise from SGD with restarts, and is there a reason to use it instead of that method?

Anyway, thanks for sharing!


#185

I still get confused by the nomenclature when it comes to SGDR and cyclical learning rates, but as far as I understand, the cosine annealing callback in the fastai library is what implements SGD with restarts.
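
For reference, the schedule from the paper anneals the learning rate with a cosine curve within each cycle and then jumps back up to the maximum at each restart. Roughly (just the formula, not the actual fastai callback):

import math

def cosine_annealing_lr(lr_max, lr_min, t_cur, t_cycle):
    # Learning rate at step t_cur within a cycle of t_cycle steps: starts at
    # lr_max when t_cur == 0 (the "restart") and decays to lr_min by the end
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_cycle))

# Example: the rates over one 10-step cycle
lrs = [cosine_annealing_lr(1e-2, 1e-5, t, 10) for t in range(10)]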

As for differences in performance - I haven’t tested, but training feels substantially different. You just throw in a learning rate that seems decent and don’t have to bother with manually tweaking it, which I think is great. Plus you get to save your models at the end of each cycle, which makes for nice ensembling.
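
The ensembling part is just averaging the predictions of the snapshots saved at the end of each cycle, something along these lines (the file names are made up, and I am using Keras here only because that is what already appears in this thread):

import numpy as np
from keras.models import load_model

# Hypothetical snapshots saved at the end of each cycle
snapshot_paths = ['cycle_1.h5', 'cycle_2.h5', 'cycle_3.h5']

def ensemble_predict(x, paths=snapshot_paths):
    # Average the softmax outputs of the models saved at each cycle end
    preds = [load_model(p).predict(x) for p in paths]
    return np.mean(preds, axis=0)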

It would be really great to have a comparison of how it stacks up against training a model with Adam, for example, but I am not sure if anyone got far with that (I think people claimed the results were not that great after all, and nothing was published).

EDIT: Here is the paper on SGDR.
EDIT2: I used Adam with the cosine annealing callback. At some point I will want to compare it with Adam without the CB. Just the fact that fastai allows for such a combination so effortlessly is really amazing.


(Divyansh Jha) #186

Guys, this is my first blog post. Please review and comment. :slight_smile:


#187

Monday 4/8! :slight_smile: Can’t believe I am already halfway through!

A funny thing is happening. I still sweat about the things I publish not being good, and I think I have better things coming :wink: but the friction around sharing things is slowly, very slowly, diminishing :slight_smile: This is nice.

I didn’t start to feel like this after the 1st blog post, nor after the 3rd, but after a couple more it seems to be getting a bit better :slight_smile:

Today I bring you Talk like the President, part 2. Nothing new here for our fast.ai sisters and brothers :slight_smile: It is exactly what was covered in lecture 6 and the first part of lecture 7, with added beam search and a slight twist :slight_smile:

BTW, there is way more training data that gets downloaded than what I use, and I didn’t spend much time optimizing the network architecture, so there is definitely quite a bit more performance that can be squeezed out of this.

EDIT: Ok, I’m not sure anymore that with time you care less about what you publish being crap :slight_smile: Maybe you just learn to ignore this pesky feeling and move on :slight_smile:


(Sanyam Bhutani) #188

I have to mention this here as well.

Witnessing the change from @radek being nervous about whether he’ll make the deadlines (Mondays) to whether he’ll be able to put out top-notch content is really motivating :smiley:


(Jeremy Howard) #189

I hope I get to that point sometime - I’m still terrified before every class and somewhat convinced after each class that it should have been far better…


#190

This is an old blog post that I rewrote quite extensively. I was tempted to delete it but then decided to go with a major edit.