Lesson 8 wiki

jeremy · February 28, 2017, 8:09pm

This post is editable by all of you! Please edit it to add any useful information for this week’s class, including links brought up during class, other helpful readings, useful code/shell snippets, etc. Also, please help organize this wiki post by putting things in sections, adding/editing prose, etc.

NB: Discourse does not allow multiple edits at the same time - if two people save, most recent wins. So copy your edits before saving, just in case!

Lesson 8 Wiki

Class links

Papers

A Neural Algorithm of Artistic Style (arXiv:1508.06576v2)

Original artistic style paper
26 Aug 2015
Handy summary of the paper; NB many other paper summaries also available at that repo

What problem are they solving?

Generate an image that has the content of one image and the style of another. “Style” here means colors and textures.

What is the general idea that they are using to solve it?

Use a pre-trained CNN to extract the content of one image and the style of the other. Combine the content with the style.

A step-by-step explanation:

Choose a CNN pre-trained for image classification (e.g. VGG trained on ImageNet).
Generate a random input image.
Choose a measure and measure the difference between the input image’s CNN activations and the content image’s CNN activations (e.g. MSE).
Choose another measure and measure the difference between the input image’s CNN activations and the style image’s CNN activations (e.g. MSE of the Gram matrices).
Choose a way to combine the differences and combine them (e.g. a weighted sum).
Minimize the combination by changing the pixels of the input image (which will then change the CNN activations, the differences, and finally the combination).
Repeat using the modified input image until it looks good or doesn’t change.

The CNN activations can come from multiple convolutional layers.

What kind of results are they getting?

What previous work are they building on?

Deep convolutional neural networks for image classification
Non-photorealistic rendering, particularly texture transfer.

Cool examples of neural style

Links mentioned during class

Jeremy’s liked tweets - over a thousand deep learning papers and articles recommended by Jeremy, and a great place to find interesting DL researchers to follow
http://www.arxiv-sanity.com/ - great way to find papers similar to the ones you’re interested in, and to get recommendations. Be sure to log in and save the papers that we’re working on
http://www.mendeley.com/ - Jeremy’s recommended app for reading, organizing, and annotating papers
https://www.reddit.com/r/MachineLearning/ - good source of deep learning news
Tensorflow Dev Summit Videos
Import AI newsletter - deep learning news
WildML news - another deep learning newsletter

Code snippets

To get all files for the lesson (h/t @ibarinov):
wget -r -nH -nd -np -R index.html* http://files.fast.ai/part2/lesson8/
(files out-of-date)

Steps needed for Style Transfer using VGG:

Content extraction

Read the cont_image
Resize cont_image
Preprocess: RGB -> BGR and normalize
Create VGG_avg
Generate P(l) = activations for the cont_image at layer l
Generate F(l) = activations for white noise image at layer l
content_loss = MSE(P(l), F(l))
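
The last step above can be sketched with NumPy. This is only a minimal illustration, assuming the activations have already been extracted as arrays of the same shape. The random arrays stand in for real VGG activations, and `content_loss` is a hypothetical helper name, not a function from the lesson notebook.

```python
import numpy as np

def content_loss(P, F):
    """Mean squared error between two activation arrays of the same shape."""
    P = np.asarray(P, dtype=np.float64)
    F = np.asarray(F, dtype=np.float64)
    if P.shape != F.shape:
        raise ValueError("activation shapes must match")
    return float(np.mean((P - F) ** 2))

# Stand-ins for layer activations (height, width, channels); real values
# would come from the CNN, not from a random generator.
rng = np.random.default_rng(0)
P = rng.normal(size=(4, 4, 8))   # content image activations
F = rng.normal(size=(4, 4, 8))   # generated (noise) image activations

loss_same = content_loss(P, P)   # identical activations give zero loss
loss_diff = content_loss(P, F)   # different activations give a positive loss
```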

Style extraction

Read the style_image
Resize style_image
Preprocess: RGB -> BGR and normalize
Create VGG_avg
Generate the Gram matrix for the style image, A(L) = F * Ft, for each layer L, where F holds the vectorized feature maps (so each entry is the inner product of two feature maps). (There is some weight to the loss for each layer?)
Generate the Gram matrix for the white noise image, G(L), in the same way
style_loss = MSE(A(L), G(L))
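
The Gram matrix and style loss steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the notebook's code: the function names are hypothetical, the random arrays stand in for real VGG activations, and dividing by the number of elements is one assumed normalization (conventions vary).

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (height, width, channels) activation array.

    Each channel is flattened into a vector; entry (i, j) is the inner
    product of channels i and j, divided by the number of elements
    (an assumed normalization).
    """
    features = np.asarray(features, dtype=np.float64)
    h, w, c = features.shape
    F = features.reshape(h * w, c).T      # shape (channels, h*w)
    return F @ F.T / features.size        # shape (channels, channels)

def style_loss(style_features, generated_features):
    """MSE between the Gram matrices of two activation arrays."""
    A = gram_matrix(style_features)
    G = gram_matrix(generated_features)
    return float(np.mean((A - G) ** 2))

rng = np.random.default_rng(0)
style_act = rng.normal(size=(4, 4, 3))   # stand-in for style activations
noise_act = rng.normal(size=(4, 4, 3))   # stand-in for noise-image activations

A = gram_matrix(style_act)
```

Because the inner product sums over all spatial positions, the Gram matrix does not change when the positions are shuffled, which fits the idea that “style” captures colors and textures rather than layout.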

Style transfer

loss(c,s,x) = a * content_loss(c, x) + b * style_loss(s, x), where c = content image, s = style image, x = generated image
Use scipy’s implementation of L-BFGS to find the values of “x” that minimize the loss (fmin_l_bfgs_b(loss, x0=x, args=(c, s))). In our case, “x” happens to be image pixels, and thus we end up searching for the image that is close to both the content image (c) and the style image (s).
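
As a rough sketch of the optimization step, the toy example below minimizes a weighted sum of two losses with scipy’s `fmin_l_bfgs_b`. The losses here are plain squared distances on a 4-element vector, a stand-in for the real content and style losses on CNN activations, and the weights `a` and `b` are arbitrary illustrative values. The function returns both the loss and its gradient, which is one of the calling conventions `fmin_l_bfgs_b` accepts.

```python
import numpy as np
from scipy.optimize import fmin_l_bfgs_b

def loss_and_grad(x, c, s, a, b):
    """Weighted sum of two squared-error terms, plus its gradient w.r.t. x.

    Toy stand-in: the real content and style losses compare CNN
    activations and Gram matrices, not raw vectors.
    """
    content = np.sum((x - c) ** 2)
    style = np.sum((x - s) ** 2)
    loss = a * content + b * style
    grad = 2 * a * (x - c) + 2 * b * (x - s)
    return loss, grad

c = np.array([0.0, 0.0, 0.0, 0.0])   # "content" target
s = np.array([1.0, 1.0, 1.0, 1.0])   # "style" target
x0 = np.random.default_rng(0).uniform(size=4)   # random starting "image"

# Since the function returns (loss, gradient), fprime is left unset.
x_opt, final_loss, info = fmin_l_bfgs_b(loss_and_grad, x0, args=(c, s, 1.0, 3.0))
```

With these weights the minimizer sits three quarters of the way from `c` to `s`, which shows how the ratio of `a` to `b` trades off the two targets.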

An Introduction to Tensorflow

The library is called TensorFlow, and its central unit of data is the tensor. A tensor is a set of values shaped into an array with any number of dimensions, from zero upward. The “rank” of a tensor is the number of dimensions of that array.

Example:

3                  # A Rank 0 tensor
[1, 2, 3]          # A rank 1 tensor
[[1, 2, 3]]        # A rank 2 tensor
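
The same three examples can be checked with NumPy, whose `ndim` attribute corresponds to rank in this sense. NumPy is used here (an assumption for convenience) so that the snippet runs without TensorFlow installed. Note that this “rank” is the number of dimensions, not the linear-algebra rank of a matrix.

```python
import numpy as np

scalar = np.array(3)              # rank 0: a single number
vector = np.array([1, 2, 3])      # rank 1: a list of numbers
matrix = np.array([[1, 2, 3]])    # rank 2: one row, three columns

# Number of dimensions (rank) and shape of each array.
ranks = [t.ndim for t in (scalar, vector, matrix)]
shapes = [t.shape for t in (scalar, vector, matrix)]
```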

TensorFlow Dev Summit 2017 notes

Tensorflow Dev Summit Videos

Keynote

History:
- DistBelief in 2012
  - Scalable and worked well in production
  - But not flexible
    - Designed for CPUs, GPUs were bolted on
    - Worked for simple models but sequence models and reinforcement learning problems were hard to express
- TensorFlow
  - Supports many platforms
    - CPU, GPU, Android, iOS, Raspberry Pi, ASICs (e.g. TPU)
  - Supports cloud platforms and virtual machines
  - Supports many languages
  - Visualization and monitoring tools
    - TensorBoard
- Progress to date
  - Nov '15 (v0.5): Launched
  - Dec '15 (v0.6): Faster on GPUs; Python 3.3+
  - Feb '16 (v0.7): TF Serving
  - Apr '16 (v0.8): Distributed TensorFlow
  - Jun '16 (v0.9): iOS; Mac GPU
  - Aug '16 (v0.10): TF Slim
  - Oct '16 (v0.11): HDFS; CUDA 8; CuDNN 5
  - Nov '16 (v0.12): Windows 7, 10, and Server 2016; TensorBoard Embedding Visualizer
  - Feb '17 (v1.0): XLA (accelerated linear algebra); TF debugger; pip package; better API; stable API
Applications:
- Word lens (augmented reality; translating text in images to other languages)
- Spam detection
- Smart reply (“Are you free tomorrow?” -> “Yeah, what’s up?”)
- Cucumber sorter
- Dermatologist-level classification of skin cancer
- Ophthalmologist-level detection of diabetic retinopathy

XLA

Just-in-time (JIT) compiler for TF
- Better on GPUs, worse on CPUs (for now)
Ahead-of-time (AOT) compiler for TF
- Smaller binaries on mobile (600KB instead of 2.6MB)
Makes models faster
Makes models smaller
Experimental

TensorBoard

Visualizing the computation graph
- tf.summary.FileWriter
  - A Python class that writes data for TensorBoard
  - Code example

            writer = tf.summary.FileWriter("/tmp/mnist_demo/1")
            writer.add_graph(sess.graph)

        $ tensorboard --logdir /tmp/mnist_demo/1

Summaries
- tf.summary.scalar
- tf.summary.image
- tf.summary.audio
- tf.summary.histogram
- Code example

        merged_summary = tf.summary.merge_all()
        writer = tf.summary.FileWriter("/tmp/mnist_demo/3")
        writer.add_graph(sess.graph)
        for i in range(2001):
            batch = mnist.train.next_batch(100)
            feed_dict = {x: batch[0], y: batch[1]}
            if i % 5 == 0:
                s = sess.run(merged_summary, feed_dict=feed_dict)
                writer.add_summary(s, i)
            sess.run(train_step, feed_dict=feed_dict)

    $ tensorboard --logdir /tmp/mnist_demo/3

Hyperparameter search
- Parent directory
- Child directory for each hyperparameter
Embedding visualizer
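
The parent/child directory layout noted under “Hyperparameter search” could be generated along these lines. This is a sketch under assumptions: the helper name, the hyperparameter names, and the `name=value` labelling scheme are all hypothetical, chosen only to show one child directory per combination under a shared parent.

```python
import itertools
import os

def run_directories(parent, grid):
    """Return one log directory path per hyperparameter combination."""
    names = sorted(grid)
    dirs = []
    for values in itertools.product(*(grid[name] for name in names)):
        label = ",".join(f"{name}={value}" for name, value in zip(names, values))
        dirs.append(os.path.join(parent, label))
    return dirs

# Hypothetical search grid: two learning rates and two layer counts.
grid = {"lr": [1e-3, 1e-4], "fc_layers": [1, 2]}
dirs = run_directories("/tmp/mnist_demo", grid)
```

Each run would then write its summaries to its own directory, and TensorBoard would be pointed at the parent with `--logdir` so that the runs can be compared side by side.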

High-Level API

Layers: Layer-oriented API (like Keras)
- tf.layers
- An API that maps to architecture descriptions
- Compatible with Keras
Estimator
- model_fn
- API:
  - train_op: fit()
  - eval_op: evaluate()
  - predictions: predict()
  - export_savedmodel()
    - For TensorFlow Serving
Canned Estimators

Integrating Keras & TensorFlow

Keras: An API for building deep learning models
Integrating with TensorFlow means:
- New features with the Keras API:
  - Distributed training
  - Cloud ML
  - Hyperparameter tuning
  - TF Serving
Take-aways slide:
- For TF users: an accessible high-level API with good defaults
- For Keras users: powerful TF features for your Keras models
- tf.contrib.keras by TF 1.1 (mid-March 2017)
- tf.keras by TF 1.2
- A big step in making TensorFlow and deep learning accessible to as many people as possible.

TensorFlow at DeepMind

DeepMind moved to TensorFlow
Data center cooling
AlphaGo
WaveNet
- Natural-sounding speech
- Music generation (nonvocal)
Learning to learn by gradient descent by gradient descent
- Trained a neural network to train a neural network

Skin Cancer Image Classification

Stanford
AI lab + medical school
Skin cancer
- Most common cancer in the US
- 1 in 5 Americans will develop skin cancer
- Estimate for 2017: 87k new cases of melanoma and 9.7k deaths from it.
- Survival rate for melanoma is 98% if detected early
- Estimate for 2020: 6.1 billion smartphones in circulation
“If your program can differentiate between hundreds of dog breeds, I believe it could make a great contribution to dermatology.” - Dr. Rob Novoa, Jan 27, 2015, to the Stanford AI lab
Dataset:
- 129k images, 2k diseases
- Base classes: 2k
- Superclasses: Benign, ambiguous, dangerous, deadly
Training:
- “We find that training on finer classes results in better performance.”
- If they want less-fine classification, they sum the probabilities of the constituent classes.
- Transfer learning with Inception-V3 (V3 worked better than V1)
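
Summing the probabilities of the constituent classes, as described above, might look like the sketch below. The class names, the mapping, and the probabilities are made up for illustration; they are not taken from the talk or the dataset.

```python
def coarse_probabilities(fine_probs, parent_of):
    """Sum fine-class probabilities into their parent (coarser) classes."""
    coarse = {}
    for fine_class, p in fine_probs.items():
        parent = parent_of[fine_class]
        coarse[parent] = coarse.get(parent, 0.0) + p
    return coarse

# Hypothetical class names and probabilities, for illustration only.
parent_of = {
    "benign_a": "benign",
    "benign_b": "benign",
    "malignant_a": "malignant",
}
fine_probs = {"benign_a": 0.5, "benign_b": 0.25, "malignant_a": 0.25}
coarse = coarse_probabilities(fine_probs, parent_of)
```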
Evaluation:
- Sensitivity: True positives / positives
- Specificity: True negatives / negatives
- Confusion matrices
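
The two evaluation ratios above can be computed from binary labels as in this small sketch. The labels and predictions are invented for illustration, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)

# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
```

Here three of the four positives are found and four of the six negatives are correctly rejected.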

Mobile and Embedded TensorFlow

A low-level talk
Offering unique user experiences
- Real-time translations (e.g. Word Lens)
- Predicting next words on keyboards
- Scanning old photos
- Detecting diseases real-time
Working closely with hardware builders
- ARM, CEVA, Movidius, IBM, Intel, Qualcomm
TF support for:
- Android, iOS, Raspberry Pi
Tutorials for getting started with TF + Android/iOS/RaspberryPi
TF + Android examples
- TF Classify
- TF Detect
- TF Stylize
Managing binary size

Distributed TensorFlow

Data parallelism
Model parallelism
Very large models (wide and deep)
Outrageously large models
Core concepts of distributed TF:
- Replicating your model
- Device placement for Variables
- Sessions and Servers
- Fault tolerance

TensorFlow Ecosystem: Integrating TensorFlow with Your Infrastructure

Data preparation, training, and serving.
Data prep
- Import from various sources
- Preprocess the data
- Export in a file format that TF supports (e.g. TFRecords)
Data prep tools
- Apache Spark, Hadoop MapReduce, Beam
Training
- Local vs. Distributed
Distributed training tools (cluster managers)
- Kubernetes, Apache Hadoop, Mesos, Slurm
Distributed storage tools
- Hadoop HDFS, Google Cloud Storage, AWS
Container engines:
- Docker, Rkt
Distributed training concepts:
- Parameter servers
- Workers
TensorBoard is compatible with distributed training
Be careful with your choice of file formats (for data and for models)
Serving models has nuances; TF Serving handles them:
- Loading a new version of a model
- Batching inputs efficiently
- Isolation between multiple models (by default they contend for hardware resources)
When to use in-process TF over TF Serving?:
- Mobile
- Batch inference
- Very strict latency requirements (TF Serving involves round-trip RPCs)
- Run one fewer service

Serving Models in Production with TensorFlow Serving

History of software:
- 2005: No source control
- 2010: Source control & continuous build, but not for ML
- 2017: Great tools for ML but still have a way to go
“Just because we have a best practice, doesn’t mean that everyone uses it yet.”
Goal: Develop best practices for ML, and make them the default configuration for ML tools
“Serving”: How you use your ML model after you’ve trained it
Continually training models and continually deploying them
TF Serving: A flexible, high-performance serving system for machine learned models, designed for production environments

ML Toolkit

Tools
- Linear / logistic regression
- Clustering
  - KMeans clustering
  - Gaussian mixture model (GMM)
- WALS matrix factorization
- Support vector machine (SVM)
- Stochastic dual coordinate ascent (SDCA)
- Random forest
- DNN, RNN, LSTM, Wide & Deep, …
Properties (of the implementations of the tools):
- Usable
  - scikit-learn-inspired Estimator APIs
- Extensible
  - Combining the tools in novel ways?
  - Chaining a DNN after KMeans?
- Scalable
  - Distributed implementations of the tools
- Fast

Sequence Models and the RNN API

Case study: Google Translate
- Sequence to sequence
- Encoder and decoder
  - Encode from source language to vector
  - Decode from vector to target language
Topics
- Reading sequence data
  - Goal: Pad, but minimize the amount of padding needed
    - Approach 1: Static padding
      - Need to know the maximum length ahead of time
      - Wastes time
      - Wastes space
    - Approach 2: Dynamic padding
      - Max sequence in the batch sets the sequence length
      - This is still pretty close to static padding in terms of waste
    - Approach 3: Bucketing
      - When reading a sequence, place it into the shortest bucket it can fit into
    - Approach 4: Truncated BPTT (backpropagation through time) via State Saver
      - ?
- The RNN API
  - A library of RNN architectures
  - A flexible API for implementing your own RNN architectures
- Fully dynamic calculation
  - Fast and memory efficient custom loops
  - Goal: Handle sequences of unknown length
- Fused RNN cells (flexibility vs. efficiency)
  - Optimization for special cases
  - XLA fused time steps (use on GPUs, not CPUs; no speedup on embedded devices, but makes the models smaller)
  - Recommendation: Try XLA and benchmark your intended use case
- Dynamic decoding
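
Approach 3 (bucketing) from the “Reading sequence data” notes can be illustrated in plain Python. This is a sketch of the idea only, not the TensorFlow API from the talk: the bucket boundaries, helper names, and sample sequences are all assumptions.

```python
import bisect

def assign_bucket(length, boundaries):
    """Return the smallest bucket length that fits a sequence of `length`.

    `boundaries` must be sorted in ascending order.
    """
    i = bisect.bisect_left(boundaries, length)
    if i == len(boundaries):
        raise ValueError(f"sequence of length {length} exceeds the largest bucket")
    return boundaries[i]

def bucket_and_pad(sequences, boundaries, pad=0):
    """Group sequences by bucket and pad each one to its bucket's length."""
    buckets = {b: [] for b in boundaries}
    for seq in sequences:
        b = assign_bucket(len(seq), boundaries)
        buckets[b].append(list(seq) + [pad] * (b - len(seq)))
    return buckets

sequences = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [1] * 12, [7]]
boundaries = [5, 10, 20]
buckets = bucket_and_pad(sequences, boundaries)

total_tokens = sum(len(s) for s in sequences)
# Padding added when every sequence is padded to the longest bucket (static).
static_padding = len(sequences) * boundaries[-1] - total_tokens
# Padding added when each sequence is padded only to its own bucket.
bucket_padding = sum(b * len(seqs) for b, seqs in buckets.items()) - total_tokens
```

On this toy input, bucketing adds far fewer padding values than padding everything to the maximum length, which is the waste the notes attribute to static padding.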

Wide & Deep Learning: Memorization + Generalization with TensorFlow

Wide: Memorization: “Seagulls can fly.” and “Pigeons can fly.”
Deep: Generalization: “Animals with wings can fly.”
Wide & Deep: Generalization + memorizing exceptions: “Animals with wings can fly, but penguins cannot fly.”
Wide & Deep tutorials available; The Wide & Deep Project

Magenta: Music and Art Generation

Can ML generate compelling media?
- Compelling media
  - Music
  - Images and video
  - Text
    - Jokes
    - Stories
Interface for creative coders and artists
- pip install magenta
The importance of critical feedback
- Musicians and listeners
Music and art co-evolve with technology
- Turning technologies into artistic instruments
  - Using them in novel, unintended ways
  - “Breaking” them
  - Merging them
Examples:
- Image inpainting
- Music generation (similar to image inpainting)
- Artistic style transfer
“If you’re not using TensorBoard, you want to be using TensorBoard.”
magenta.tensorflow.org

Case Study: TensorFlow in Medicine - Retinal Imaging

Diabetic retinopathy: fastest growing cause of blindness
415M people with diabetes, each at risk of going blind
Regular screening is key to preventing blindness (e.g. once a year)
- The doctor takes a picture of the back of the eye with a special camera, and then looks at the picture
Shortage of eye doctors to do this task
Doctors are inconsistent with each other
Dataset: 30k images, 880k diagnoses (labels)
- Hired an army of doctors to label the images (54 doctors)
F-score of model: 0.95; Median F-score of doctors: 0.91
How did TF help?
- Quick prototyping
  - Starter architectures
  - Pre-trained models
- Experiment at scale
  - GPU support
  - Fast training
- The above helped them focus on their other problems:
  - Finding the right problem
  - Getting the data & labels
  - Validating & deploying
Next steps:
- Validation of the model by hospitals
- Custom hardware to take the images and run the models
  - Low-cost and easy to use hardware

sravya8 · February 28, 2017, 8:36pm

I recommend not using -nd, since with it every file you download in future will be dumped into the same folder. The rest of the flags are useful.

jeremy · February 28, 2017, 8:39pm

Here’s the description of that flag, for those (like me) that weren’t aware of it:

‘-nd’
‘--no-directories’
Do not create a hierarchy of directories when retrieving recursively. With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the filenames will get extensions ‘.n’).

Whether you use it I guess depends on whether you create your own folder structure in advance or not.

lin.crampton · March 2, 2017, 5:06am

made a text file transcript of lesson 8 (otherwise known as part2 lesson1). file is lesson8.txt and it resides at https://drive.google.com/open?id=0BxXRvbqKucuNVUFBU3NIT1dtb3M

let me know if you see things i should change

lin

ljubomir · March 2, 2017, 12:37pm

I’d like to share a tensorflow tip

You may have noticed that tensorflow outputs messages like these when it starts:

import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally

I find these unhelpful because they make it harder to see the actually useful output.

To control the level of messages produced by tensorflow use the environment variable
TF_CPP_MIN_LOG_LEVEL. If you set it to 1, tensorflow will produce only W (Warning)
and E (Error) messages, not I (Information):

export TF_CPP_MIN_LOG_LEVEL=1

If you set it to 2, tensorflow will only output E messages.
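
The same variable can also be set from inside Python, for example at the top of a notebook. This is a minimal sketch, assuming the variable is set before TensorFlow is first imported, which is the safe ordering. The import itself is left commented out so that the snippet runs without TensorFlow installed.

```python
import os

# Set this before TensorFlow is imported for the first time.
# "1" hides I messages; "2" hides I and W messages.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

# import tensorflow as tf   # import only after setting the variable
```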

For the record, I found this here (the second answer, by user “craymichael”):

There are several other solutions proposed on stackoverflow, but they don’t seem
to work in tensorflow 1.0.0.

Hope this is useful

Ljubomir

Maxime · March 2, 2017, 4:01pm

@lin.crampton Thanks a lot for the transcript Lin, this is great

jeremy · March 3, 2017, 8:47pm

Thanks folks for all the great edits to the wiki! Lots of good info there now.

jeremy · March 4, 2017, 4:51pm

@Matthew those tensor summit notes are great! Any interest in writing them in prose and turning it into a blog post? That would be of interest to a lot of people, I think.

Matthew · March 4, 2017, 6:14pm

No, but anyone is welcome to take the notes and expand and modify them into whatever they’d like.

When I reflect on my skims through the videos, here’s what comes out of my head (not much):
Format:

Inner thought
- Video title

TensorFlow is newer than I thought (late 2015). It’s growing fast. Wow!

Keynote

When deploying, check out XLA and TF Serving.

XLA

Distributed TensorFlow

TensorFlow Ecosystem: Integrating TensorFlow with Your Infrastructure

Serving Models in Production with TensorFlow Serving

When training, use TensorBoard to improve one’s intuition for models and to look for bugs.

TensorBoard

Keras is heavily influencing TensorFlow’s API. Sweet.

High-level API

Integrating Keras & TensorFlow

A bunch of applications. Hopefully one day I’ll think of one or discover one. One approach to attempting an application seems to be “get the data, and then throw every trick in the book at it.”

TensorFlow at DeepMind

Skin Cancer Image Classification

Magenta: Music and Art Generation

Case Study: TensorFlow in Medicine - Retinal Imaging

TF on mobile. Deep learning on mobile. Oh yeah, mobile is a thing. I should keep that in mind when looking for applications. When working with mobile, google “mobile and embedded tensorflow”

Mobile and Embedded TensorFlow

TF is trying to replace scikit-learn? Hopefully this means more for users than just a change in import statements and syntax (e.g. faster code; better APIs)

ML Toolkit

RNNs. Not focused on them yet. I’ll return to these nuances when I am. I’m glad to know these nuances exist, even if I don’t fully understand them yet.

Sequence Models and the RNN API

I thought memorization was a bad thing. Memorizing exceptions to generalizations seems like a good thing. I never connected memorization to network width. Cool.

Wide & Deep Learning: Memorization + Generalization with TensorFlow

lin.crampton · March 7, 2017, 2:44am

created a text transcript earlier in the week, https://drive.google.com/open?id=0BxXRvbqKucuNVUFBU3NIT1dtb3M.

if there’s something that needs to be changed, let me know.

jeremy · March 7, 2017, 7:42pm

Thanks @lin.crampton - I’ve added those captions directly to the video now.

sravya8 · March 7, 2017, 9:40pm

You are just awesome @lin.crampton! Thank you!

topbots · April 12, 2017, 6:47am

I built my own deep learning server this week and migrated my old homework code off of AWS. My neural-style.ipynb which ran fine on AWS instance is now producing this error. Ideas on how to fix?

jeremy · April 12, 2017, 4:22pm

Looks like you have keras 2. Install keras 1 instead - there’s a thread on the main forum about this.

RogerS49 · May 3, 2017, 2:31pm

Hi looking for the style images used in the lesson8 neural_style notebook
Thanks

jeremy · May 3, 2017, 4:54pm

Just find some you like through Google Images!

RogerS49 · May 3, 2017, 10:52pm

Could someone please explain what the ‘f’ stands for in the following code snippet? When I see code like this, my instinct is to remove the ‘f’, because the syntax highlighting treats the rest of the line as a string and removing the ‘f’ fixes the highlighting. This is confusing.

imsave(f'{path}/results/res_at_iteration_{i}.png', deproc(x.copy(), shp)[0])

Thanks

jeremy · May 4, 2017, 12:39am

Python 3.6 string formatting. Upgrade jupyter notebook to fix the syntax highlighting.
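
A small example of what the ‘f’ prefix does (Python 3.6 or later). The values of `path` and `i` are hypothetical. Without the prefix the braces are kept literally, which is why removing the ‘f’ changes the result even if it makes older syntax highlighters happier.

```python
path = "/data/style"   # hypothetical values, for illustration only
i = 7

# With the f prefix, {path} and {i} are replaced by the variables' values.
with_f = f"{path}/results/res_at_iteration_{i}.png"

# Without the prefix, the braces stay in the string as ordinary characters.
without_f = "{path}/results/res_at_iteration_{i}.png"

# Equivalent result using str.format, which also works before Python 3.6.
with_format = "{}/results/res_at_iteration_{}.png".format(path, i)
```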

darthdeus · May 25, 2017, 11:07am

Just a small note: this will only work in bash, and only if there isn’t already a file called index.htmlSOMETHING. If there is, the shell expands the pattern before passing it to wget, resulting in -R index.htmlSOMETHING.

The correct command uses index.html\* or "index.html*" to prevent shell expansion. The one mentioned only works because bash passes a pattern through unchanged when nothing matches it. I’m saying this mostly because people tend to get used to this and are then surprised when the * actually does get expanded. The correct command should be

wget -r -nH -nd -np -R index.html\* http://files.fast.ai/part2/lesson8/

or

wget -r -nH -nd -np -R "index.html*" http://files.fast.ai/part2/lesson8/