Lesson 10 Discussion


(Jeremy Howard (Admin)) #1

We still didn’t get to GANs this week! So pushing this material to next week…

Papers:


Lesson 10 wiki
(Matthew Kleinsmith) #2

@mariya and I found some bugs in the DCGAN notebook:


Replace:

dl,gl = train(MLP_D, MLP_G, 4000)

With:

dl,gl = train(MLP_D, MLP_G, MLP_m, 4000)

Replace:

plot_gen()

With:

plot_gen(MLP_G)


(Jeremy Howard (Admin)) #3

Thanks!


(Mariya) #4

@Matthew and I jumped ahead and looked at the dcgan and wgan notebooks at study group on Tuesday. When training the wgan notebook, both of us trained for hours only to have the notebook freeze upon completion and become unusable. Matt took 4.5 hours to run 200 epochs on his own box, but my Amazon P2 instance took 4 hours to do only 50.

Seems like you’ll want to train for a few epochs at a time and then save / checkpoint often to avoid our #deepsuffering fate :frowning:

Also - when training dcgan, I found it quite hard to assess whether training was actually improving results. From a purely qualitative perspective, it seems like the results got better, then got worse. Ian Goodfellow mentions in his paper that there’s no universally accepted quantitative way to measure “good”. Is this still true, or are there better methods now?


(Jeremy Howard (Admin)) #5

One of the big steps forward with WGAN is that the authors suggest the loss function appears to be at least somewhat meaningful with that approach. They don’t have a solid mathematical proof, but it aligns with my experience.

I didn’t see the crashes you did. Be sure to set verbose=2, or maybe use tqdm instead if you want to see progress. Otherwise Jupyter can crash your browser because of the overly fast progress-bar updates.


(Xinxin) #6

wgan training took a little over 7 minutes for 2 epochs on the p2.xlarge instance
CPU times: user 5min 46s, sys: 1min 35s, total: 7min 21s
Wall time: 7min 18s

I wonder why it is so slow? What are the ways to speed it up? In style transfer we learned about developing a Van Gogh irises style transfer; if we were to focus only on a Van Gogh irises wgan, could that in theory make things faster?


(Jeremy Howard (Admin)) #7

That seems very fast to me. Perceptual losses training took an hour or so. Why would you say it’s slow?


(Gidi Shperber) #8

If I want to run the imagenet_process notebook, can I do it with less than the full ImageNet dataset? How can I get such a subset?
Thanks


(Xinxin) #9

If it takes 7 minutes for every 2 epochs and we assume linear growth, then it takes 700 minutes for 200 epochs (as in the code) - that’s 10+ hours. Is that what we should expect? I did check my GPU using nvidia-smi to make sure it’s being used. But I did add limit_mem() after loading the libraries, because my notebook crashed without limiting the memory usage.


(Jeremy Howard (Admin)) #10

Sure. You could just download the validation set from imagenet (or academictorrents).


(David Gutman) #11

@mariya

Try using…

from keras.callbacks import ModelCheckpoint

You can set it to save every few epochs or only your best models (though with WGAN I guess you would want to save every few epochs, since val_loss is meaningless).

You use it by creating a list of callbacks and passing it to the callbacks kwarg of model.fit or model.fit_generator.
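A minimal sketch of what that looks like, assuming the Keras 2 API (the model, filenames, and period here are just placeholders):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint

# Tiny stand-in model; the point is the callbacks kwarg, not the model.
model = Sequential([Dense(1, input_dim=4)])
model.compile(optimizer='sgd', loss='mse')

# Save weights every 2 epochs regardless of loss -- handy for (W)GAN-style
# training, where val_loss isn't a reliable "best model" signal.
ckpt = ModelCheckpoint('weights.{epoch:02d}.hdf5',
                       save_weights_only=True, period=2)

X, y = np.random.rand(32, 4), np.random.rand(32, 1)
model.fit(X, y, epochs=4, verbose=0, callbacks=[ckpt])
```

For keeping only the best model instead, pass monitor='val_loss' and save_best_only=True.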


(James Puderer) #12

I want to make sure I understand something correctly, and maybe clarify this for others as well…

It looks like any bcolz arrays used with BcolzArrayIterator will need to be randomized ahead of time as well.

If, for example, you construct your bcolz arrays by linearly iterating over a directory tree of categorized cat and dog images (e.g. using os.walk or Keras’ flow_from_directory with shuffle=False), then most of the chunks in your bcolz array will contain images from the same category (e.g. chunks of cats, followed by chunks of dogs).

If you used the BcolzArrayIterator on bcolz arrays constructed this way, you would get very poor training performance, since it would train your model with batches containing just cats, followed by batches containing just dogs (even with shuffle=True, since that only shuffles images within each chunk).


(Jeremy Howard (Admin)) #13

That’s correct. That’s the key reason we randomized the file name list in the last lesson, so the resized image array would be in a random order. The file names contain the label, so that didn’t cause any problems in this case - otherwise you’d need to be careful to permute your labels in the same way.
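A minimal sketch of that idea (the array names are hypothetical stand-ins): draw one random permutation and apply it to both images and labels before saving to bcolz, so they stay aligned.

```python
import numpy as np

# Hypothetical in-memory arrays; in practice these would be your resized
# images and their labels, still in directory (i.e. per-category) order.
images = np.arange(10).reshape(10, 1)   # stand-in for image data
labels = np.array([0] * 5 + [1] * 5)    # all cats, then all dogs

# One permutation applied to BOTH arrays keeps image/label pairs aligned.
perm = np.random.permutation(len(images))
images, labels = images[perm], labels[perm]

# Saving these with bcolz.carray(...) now stores pre-shuffled chunks,
# so BcolzArrayIterator's within-chunk shuffling is sufficient.
```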


(RENJITH MADHAVAN) #14

I was working on the neural-sr.ipynb notebook, created the compressed array using chunk size 32, and am getting an out-of-memory error. I am trying to recreate the array using chunk size = 16.

K.set_value(m_sr.optimizer.lr, 1e-4)
train(32, 18000)


(Jeremy Howard (Admin)) #15

Yeah sorry I forgot to mention in the class - I had the same problem. 16 works fine.


(rajendra koppula) #16

I am working through the neural-sr.ipynb. I don’t understand this code.
def mean_sqr_b(diff):
    dims = list(range(1, K.ndim(diff)))
    return K.expand_dims(K.sqrt(K.mean(diff**2, dims)), 0)

One difficulty I have with Keras is that I can’t really call a function with some test inputs and check its outputs.


(Jeremy Howard (Admin)) #17

It’s just a (root) mean squared error - it averages the squared differences over all the dimensions except the first, then takes the square root.

Here’s a tip: you can call a keras function manually to test it out by wrapping the call in K.eval(...). And you’ll need to wrap any arrays you pass it in K.variable(...). E.g.:

K.eval(K.sum(K.variable(np.array([1.,2]))))

(rajendra koppula) #18

Thank you for the cool tip!
K.eval(mean_sqr_b(K.variable(np.array([[1.,2],[3,4]])))) gives array([[ 1.58113885, 3.53553391]], dtype=float32) as expected.

I guess we don’t want to average over the first dimension because it’s the ‘batch’ dimension?
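For instance, the same reduction in plain NumPy keeps exactly one value per sample:

```python
import numpy as np

# A batch of 2 samples, each with 3 values.
diff = np.array([[1., 2, 3],
                 [4, 5, 6]])

# Average the squares over every axis except axis 0 (the batch axis),
# then take the square root -- one loss value per sample remains.
per_sample = np.sqrt((diff**2).mean(axis=tuple(range(1, diff.ndim))))
print(per_sample.shape)  # (2,)
```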


(kelvin) #19

In imagenet_process.ipynb there is a minor bug. It only affects the least significant digit, so its impact is small.

This function:

def parse_w2v(l):
    i=l.index(' ')
    return l[:i], np.fromstring(l[i+1:-2], 'float32', sep=' ')

should be:

def parse_w2v(l):
    i=l.index(' ')
    return l[:i], np.fromstring(l[i+1:-1], 'float32', sep=' ')

The trailing ‘\n’ is actually one character.
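A made-up word2vec line shows the off-by-one concretely (np.fromstring with sep is what the notebook uses, though newer NumPy deprecates it):

```python
import numpy as np

line = 'cat 0.1 0.2 0.3\n'   # hypothetical line, for illustration only
i = line.index(' ')

# [i+1:-2] chops the final digit along with the newline, so the last
# number is parsed from '0.' instead of '0.3'.
wrong = np.fromstring(line[i + 1:-2], 'float32', sep=' ')

# [i+1:-1] strips only the trailing '\n', keeping the full last number.
right = np.fromstring(line[i + 1:-1], 'float32', sep=' ')
```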

Alternatively one could just strip newlines during readlines():

lines = [l[:-1] for l in open(w2v_path+'.txt').readlines()]

(kelvin) #20

imagenet_process.ipynb classids.txt generation:

import nltk
# nltk.download()
# > d
# > wordnet
# > q
wordnet_nouns = list(nltk.corpus.wordnet.all_synsets(pos='n'))
with open(os.path.join(w2v_basepath, 'classids.txt'), 'w') as f:
    f.writelines(['n{:08d} {}\n'.format(n.offset(), n.name().split('.')[0]) for n in wordnet_nouns])