Thread for Blogs (Just created one for ResNet)

radek · February 8, 2018, 1:59pm

Just for fun, I ran this, and this is what I got on my first attempt (without beam search, directly sampling from the softmax of the RNN):

The President has spoken! He seems to agree with my statement above.

ecdrid · February 8, 2018, 5:01pm

For those comfortable in Keras

if __name__ == '__main__':
	import numpy as np
	import pandas as pd
	import re
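	print("Enter the path to the directory containing warpeace_input.txt (with trailing separator)")
	path = input()
	# The context manager closes the file even if reading fails
	with open(path + 'warpeace_input.txt', 'r') as file:
	    text = file.read()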
	text = re.sub('[^a-zA-Z]', ' ', text)
	text_input = list(text)
	print("size of data : {}".format(len(text_input)))
	# Sort the vocabulary so the char <-> index mapping is the same on every run
	vocab = sorted(set(text_input))
	print("size of vocabulary : {}".format(len(vocab)))
	# Create lookup dictionaries in both directions
	char_to_int = {c: i for i, c in enumerate(vocab)}
	int_to_char = {i: c for i, c in enumerate(vocab)}
	# The fun part
	# without any loss of generality let us assume seq_len = 50
	seq_len = 50
	vocab_size = len(vocab)
	char_corpus = len(text_input)
	# Use char_corpus - 1 so the last target sequence (shifted by one) is never short
	n_sample = (char_corpus - 1)//seq_len
	# Create a training matrix X with dimension [sample, seq_len, features]
	X = np.zeros(shape = (n_sample,seq_len, vocab_size))
	Y = np.zeros(shape = (n_sample,seq_len, vocab_size))
	for i in range(n_sample):
	    if (i+1)%1000 == 0:
	        print("{} sequence generated".format(i+1))
	    x_seq = text_input[i*seq_len: (i+1)*seq_len]
	    x_seq_in = [char_to_int[k] for k in x_seq]
	    x_input = np.zeros((seq_len, vocab_size))
	    for j in range(seq_len):
	        x_input[j][x_seq_in[j]] =1
	    X[i] = x_input
	    
	    y_seq = text_input[i*seq_len + 1: (i+1)*seq_len + 1]
	    y_seq_in = [char_to_int[k] for k in y_seq]
	    y_input = np.zeros((seq_len, vocab_size))
	    for j in range(seq_len):
	        y_input[j][y_seq_in[j]] = 1
	    Y[i] = y_input
	print(" the shape of input matrix : ", X.shape)
	print(" the shape of target matrix : ", Y.shape)
	import keras
	from keras.models import Sequential
	from keras.layers import Dense,Dropout, LSTM, TimeDistributed
	# Generate the keras model:
	# Set the hyperparameters for the model
	epoch = 200
	batch_size = 64
	hidden_lstm = 64

	model = Sequential()
	model.add(LSTM(hidden_lstm, input_shape = (None, vocab_size), return_sequences = True))
	model.add(Dropout(0.2))
	model.add(LSTM(hidden_lstm,return_sequences = True))
	model.add(Dropout(0.3))
	model.add(LSTM(hidden_lstm//2,return_sequences = True))
	model.add(TimeDistributed(Dense(vocab_size, activation = 'softmax')))
	model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')
	print(model.summary())
	model.fit(X,Y, batch_size=batch_size, epochs=epoch, verbose = 1)
	model.save(path+'language.h5')
	# Make predictions
	def generate_text(model, length):
	    ix = [np.random.randint(vocab_size)]
	    y_char = [int_to_char[ix[-1]]]
	    X = np.zeros((1, length, vocab_size))
	    for i in range(length):
	        X[0, i, :][ix[-1]] = 1
	        print(int_to_char[ix[-1]], end="")
	        ix = np.argmax(model.predict(X[:, :i+1, :])[0], 1)
	        y_char.append(int_to_char[ix[-1]])
	    return ('').join(y_char)
	
	print ('enter the length')
	length = int(input())
	x = generate_text(model, length)
	print(x)
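
A side note on the `np.argmax` step in `generate_text` above: it always picks the single most likely character, which is greedy decoding. The alternative mentioned earlier in the thread, sampling directly from the softmax, could be sketched roughly as below. `sample_index` and its `temperature` parameter are hypothetical names for illustration, not part of Keras.

```python
import numpy as np

def sample_index(probs, temperature=1.0, rng=None):
    """Draw one index from a probability vector instead of taking the argmax."""
    if rng is None:
        rng = np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    # Rescale in log space: temperature < 1 sharpens the distribution, > 1 flattens it
    logits = np.log(probs + 1e-12) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))
```

Inside the generation loop, something like `ix = [sample_index(model.predict(X[:, :i+1, :])[0][-1])]` would then replace the `np.argmax` line, assuming the prediction for the last time step is the one you want to sample from.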

Even · February 8, 2018, 5:24pm

I saw that link earlier and was struck by the comments talking about how beam search with higher values resulted in repeating patterns.

The papers that mention beam search in the context of sequence prediction nets generally use a beam width of 2 or another low value.

It’s a curious thing to me that searching more widely would result in this kind of looping and that by setting the beam to be much more narrow it becomes more dynamic. It looks from your example like you’ve set the beam width to 3 so I’m surprised it’s so repetitive, but I suspect you’re right and that the softmax is forcing the results down a consistent path.
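
For anyone who wants to play with the beam width themselves, here is a minimal sketch of beam search over a toy model. It is not the implementation from the blog post: `next_probs` is a hypothetical callback that returns next-token probabilities for a sequence, and the 3-token transition matrix `T` is made up for illustration.

```python
import numpy as np

def beam_search(next_probs, start, length, beam_width=2):
    """Keep the `beam_width` highest-scoring sequences at every step.

    `next_probs(seq)` is a hypothetical callback returning a probability
    vector over the next token given the sequence so far.
    """
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            log_p = np.log(np.asarray(next_probs(seq), dtype=np.float64) + 1e-12)
            for tok, lp in enumerate(log_p):
                candidates.append((seq + [tok], score + lp))
        # Sort by score and keep only the best `beam_width` candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Toy 3-token "model": the next-token distribution depends only on the last token
T = np.array([[0.1, 0.6, 0.3],
              [0.5, 0.1, 0.4],
              [0.3, 0.3, 0.4]])
best_seq, best_score = beam_search(lambda seq: T[seq[-1]], start=0, length=3)[0]
```

With `beam_width=1` this reduces to greedy decoding, so varying the width on a toy model like this is a cheap way to see how the chosen sequences change.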

Kudos on the implementation though! It’s something I hope to find time to do at the word level.

I need to dig in a little more on the cosine annealing front. In my masters we optimized routing on FPGAs with simulated annealing, and it’s a methodology that I understand well. How does it differ performance-wise from SGD with restarts, and is there a reason to use it instead of that method?

Anyway, thanks for sharing!

radek · February 8, 2018, 5:46pm

I still get confused by the nomenclature when it comes to SGDR and cyclical learning rates, but as far as I understand, the cosine annealing callback in the fastai library implements SGD with restarts.

As for differences in performance - I haven’t tested, but training feels substantially different. You just throw a learning rate that seems decent and don’t have to bother with manually tweaking it, which I think is great. Plus you get to save your models on cycle ends which can make for nice ensembling.
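
To make the schedule concrete, here is a small sketch of the learning rate under cosine annealing with warm restarts, following the formula from the SGDR paper. It is not the fastai callback itself; `sgdr_lr`, `cycle_len` and `cycle_mult` are names assumed here for illustration.

```python
import math

def sgdr_lr(iteration, cycle_len, lr_max, lr_min=0.0, cycle_mult=1):
    """Learning rate at `iteration` under cosine annealing with warm restarts."""
    t, length = iteration, cycle_len
    # Walk forward through the cycles until we find the one containing `iteration`
    while t >= length:
        t -= length
        length *= cycle_mult
    # Cosine decay from lr_max (start of cycle) towards lr_min (end of cycle)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))

# Learning rates over two cycles of 10 iterations each
schedule = [sgdr_lr(i, cycle_len=10, lr_max=0.1) for i in range(20)]
```

The rate jumps back to `lr_max` at the start of every cycle, so the last iteration of each cycle is where the rate is lowest, which is the natural point to save a model snapshot.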

It would be really great to have a comparison on how it stacks up to training a model with Adam for example but I am not sure if anyone got far with that (I think people claimed the results were not that great after all and nothing was published by anyone).

EDIT: Here is the paper on SGDR.
EDIT2: I used Adam with the cosine annealing callback. At some point will want to compare it with Adam without the CB. Just the fact that fastai allows for such a combination so effortlessly is really amazing.

divyansh · February 8, 2018, 6:22pm

Guys this is my first blog. Please review and comment.

radek · February 12, 2018, 9:13am

Monday 4/8! Can’t believe I am already half way through!

A funny thing is happening. I still sweat about the things that I publish not being good, and I think I have better things coming, but the friction about sharing things is slowly, very slowly diminishing. This is nice.

I did not start to feel like that after the 1st blog post, nor after the 3rd, but after a couple more it seems to be a bit better.

Today I bring you Talk like the President, part 2. Nothing new here for our fast.ai sisters and brothers. Exactly what was covered in lecture 6 and the first part of lecture 7, with added beam search and a slight twist.

BTW there is way more training data that gets downloaded than what I use, and I didn’t spend much time optimizing the network architecture, so there is definitely quite a bit more performance that can be squeezed out of this.

EDIT: Ok, not sure anymore if with time you care less about what you publish being crap. Maybe you just learn to ignore this pesky feeling and move on.

init_27 · February 13, 2018, 12:12am

I have to mention this here as well.

Witnessing the change from @radek being nervous about whether he’ll make the deadlines (Mondays) to whether he’ll be able to put out top-notch content is really motivating.

jeremy · February 13, 2018, 1:53am

I hope I get to that point sometime - I’m still terrified before every class and somewhat convinced after each class that it should have been far better…

radek · February 19, 2018, 7:30am

This is an old blog post that I rewrote quite extensively. I was tempted to delete it but then decided to go with a major edit.

surmenok · February 24, 2018, 10:26pm

I’ve got good results on the German Traffic Sign Recognition Dataset using the general fast.ai approach to image classification (though I had to spend a few days tuning different parts). Wrote a post about that:

It’s a draft, I’m going to publish it at the beginning of the next week. Please take a look, I appreciate any feedback.

radek · February 26, 2018, 9:29am

I probably wrote this article as much to share my findings as to figure things out for myself. I intended it to be the last post of the series (currently on week 6 out of 8), but I started to notice that as time was passing by I started to care about this less and less. Of course I still find what I wrote important and I do my best to stick to it. Also, using Twitter has not ceased to be a challenge. But having figured it out to an extent I am comfortable with, I moved on to other things. It seems that right now what I have on my mind are resnets, densenets and a bit of rmsprop.

So, decided to switch things around and publish the article now. The two remaining posts will be technical although with a slight twist, but one that I couldn’t help. One features dragons and the other bubbles. I am actually asking the reader to imagine they are a bubble.

surmenok · February 26, 2018, 3:55pm

Thanks for the feedback. I’ve published the article.

radek · March 5, 2018, 9:33am

Blog post Monday 7/8!

What led me to writing it was the question I asked a couple of months ago and one that I see asked on the forums quite frequently.

The tone of the article is not very serious and I have significant concerns about it. I am not sure I could have shared the information I wanted to convey in a more serious fashion. Anyhow, maybe someone will find it useful.

radek · March 12, 2018, 9:11am

And this is my last blog post in the 8 week series.

Of course, I have many doubts about both the content and the presentation. I think I should have asked people to take a look at the post before I published it. But all these coulda shoulda wouldas are probably also a direct result of the insecurity about posting things that, despite having done this for 8 weeks, hasn’t completely gone away.

But oh well. If you cannot conquer it, maybe at least you can ignore it.

ecdrid · March 12, 2018, 11:13am

Are you a Writer?

Truly your writing skills are awesome…

Nice post

radek · March 12, 2018, 12:03pm

Thank you! I am not a writer. And for most of my professional life my writing sucked. To the point where I had it in my year-end objectives for a couple of years running to figure out how to be more concise. I never did before leaving that job.

I do put a lot of work into writing these articles though. Really glad you enjoy them.

init_27 · March 13, 2018, 1:16am

I have to confess: I never took blogging seriously. My first article was a really bad attempt. But @radek has been a constant source of learning for, I think, all of us, and I think I can say I’ve improved a lot thanks to him.

I think this is the best part about fast ai too. We get to meet great DL folks.

I’ve learnt so much in the past 6 months just by scrolling through the forum threads daily, more than what my school has taught me during my 3 years of undergrad studies.

Ooo… Does anyone remember the moment from Part1v2 where Jeremy had posted a screenshot of the LB?
And the LB got turned upside down every 2 hours?

nok · March 16, 2018, 2:38am

Hi, I just wrote a blog post to explain the intuition behind Decision Trees/Random Forest.

P.S. Reposted here as I did not see this thread yesterday.

MTAU · March 26, 2018, 9:11am

I’ve posted a few blogs here on setting up AWS and Jupyter notebook. Feedback welcome. I’m tutoring a local group in Brisbane and found it easier to blog on Medium than to demo the same thing multiple times.

Suggestions welcome on the best option to assign a static IP (Elastic IP on AWS), route a domain to the IP, and create an SSL certificate from a recognised SSL provider.

neerjadoshi · April 7, 2018, 5:29am

Hi all, I have begun blogging and have written a couple of blogs. I would like some suggestions on the things I can improve (especially since the first one is based off the first deep learning class).

Augmentation for Image Classification:

I’m planning to put together a series of best practices from all the resources that I come across, so this is the first one:

@jeremy your feedback will be most valuable.
Thanks a lot!