Reinforcement learning vs. evolutionary strategies

mariya · March 26, 2017, 6:10pm

Posting this here in case it interests anyone. OpenAI wrote a blog post comparing reinforcement learning (RL) with evolutionary strategies (ES), which turn out to be competitive with RL on a number of tasks.

The key difference:

Reinforcement learning injects noise into an agent’s actions, then uses backpropagation to work out how to update the parameters.
Evolutionary strategies ignore the agent / environment structure and inject noise directly into the parameters, eliminating the need for backprop.
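
To make the difference concrete, here is a rough sketch on a made-up one-step problem (not taken from the OpenAI post). The linear policy, the `reward` function, and all the constants are assumptions chosen only for illustration. Both estimators see nothing but sampled rewards; the RL-style one perturbs the action and then needs the chain rule to get back to the parameters, while the ES-style one perturbs the parameters and never differentiates the policy.

```python
import numpy as np

# Hypothetical one-step task: a linear policy outputs action = w @ x and is
# rewarded for landing close to `target`. All names and values are illustrative.
x = np.array([1.0, 2.0, -1.0])   # fixed observation
w = np.zeros(3)                  # policy parameters
target = 1.0
sigma = 0.1                      # noise scale used by both estimators
n = 2000                         # number of samples per estimate

def reward(action):
    return -(action - target) ** 2

def rl_style_gradient(rng):
    # Noise is injected into the ACTION.
    eps = rng.standard_normal(n)
    rewards = reward(w @ x + sigma * eps)
    adv = rewards - rewards.mean()           # baseline to reduce variance
    dmean = (adv * eps).mean() / sigma       # gradient w.r.t. the mean action
    return dmean * x                         # chain rule through mean = w @ x

def es_style_gradient(rng):
    # Noise is injected into the PARAMETERS; the policy is never differentiated.
    noise = rng.standard_normal((n, w.size))
    rewards = reward((w + sigma * noise) @ x)
    adv = rewards - rewards.mean()
    return (adv[:, None] * noise).mean(axis=0) / sigma

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
rl_grad = rl_style_gradient(rng)
es_grad = es_style_gradient(rng)
true_grad = -2.0 * (w @ x - target) * x      # analytic gradient, for comparison

print("RL-style cosine with true gradient:", cosine(rl_grad, true_grad))
print("ES-style cosine with true gradient:", cosine(es_grad, true_grad))
```

In this toy case the "backprop" is a single multiplication by `x`; in a deep network that step is the full backward pass, which is exactly what ES skips.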

ES has a few benefits over RL: no need for backprop, less memory needed since there is no long history of actions to keep track of, works with non-differentiable networks, no exploding gradients in RNNs, highly parallelizable, etc.
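
Two of those benefits (non-differentiable objectives and easy parallelism) can be illustrated with a small sketch. Everything here is an assumption made up for the example: the rounded objective `black_box_reward`, the `TARGET` vector, the hyperparameters, and the use of a thread pool to stand in for parallel workers. The rounding makes the reward piecewise constant, so its gradient is zero almost everywhere, yet ES can still make progress because it only needs reward values.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

TARGET = np.array([0.5, -0.3, 1.0])   # hypothetical optimum

def black_box_reward(w):
    # Rounding makes this piecewise constant, so backprop would see a
    # zero gradient almost everywhere.
    return -np.abs(np.round(w, 1) - TARGET).sum()

def es_step(w, rng, pool, npop=50, sigma=0.5, alpha=0.05):
    noise = rng.standard_normal((npop, w.size))
    candidates = [w + sigma * e for e in noise]
    # Each candidate is scored independently; only a scalar comes back,
    # which is why the evaluations are easy to spread across workers.
    rewards = np.array(list(pool.map(black_box_reward, candidates)))
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return w + alpha / (npop * sigma) * noise.T @ adv

rng = np.random.default_rng(0)
w = np.zeros(3)
initial_reward = black_box_reward(w)
with ThreadPoolExecutor(max_workers=4) as pool:
    for _ in range(200):
        w = es_step(w, rng, pool)
final_reward = black_box_reward(w)

print("reward before:", initial_reward, "after:", final_reward)
```

Note that `sigma` is deliberately larger than the rounding step of 0.1; with a much smaller `sigma` most perturbations would land on the same plateau and carry no signal.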

ES has one big downside: adding noise to parameters doesn’t always result in new outcomes, while adding noise to actions often does. It can be tricky to perturb the parameters in a way that yields a good gradient signal.
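
A tiny, contrived example of that failure mode, assuming a hypothetical policy that outputs the sign of `w @ x`. The weights start far from the decision boundary, so small parameter noise never flips the action and every perturbed policy earns the same reward. Noise added to the action itself always changes the outcome.

```python
import numpy as np

# Hypothetical threshold policy; all names and values are illustrative.
x = np.array([1.0, 1.0])
w = np.array([5.0, 5.0])   # w @ x = 10, far from the threshold at 0
sigma = 0.1
n = 100

def act(params):
    return 1.0 if params @ x > 0 else -1.0

def reward(action):
    return -(action + 1.0) ** 2   # the task actually prefers action = -1

rng = np.random.default_rng(0)

# Noise on parameters: the action never flips, so all rewards are identical.
param_rewards = np.array(
    [reward(act(w + sigma * rng.standard_normal(2))) for _ in range(n)]
)

# Noise on the action: every sample produces a different reward.
action_rewards = np.array(
    [reward(act(w) + sigma * rng.standard_normal()) for _ in range(n)]
)

param_spread = param_rewards.max() - param_rewards.min()
action_spread = action_rewards.max() - action_rewards.min()
print("reward spread with parameter noise:", param_spread)
print("reward spread with action noise:   ", action_spread)
```

With zero spread in the rewards, an ES gradient estimate built from `param_rewards` is exactly zero, so the search has nothing to follow until `sigma` is large enough to cross the threshold.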

Here is a notebook that lets you explore a simple 2D evolutionary strategy:

github.com/karpathy/randomfun/blob/master/es.ipynb

Here is code to explore how to optimize a quadratic function with ES:

https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d

nes.py

"""
A bare-bones example of optimizing a black-box function (f) using
Natural Evolution Strategies (NES), where the parameter distribution is a
Gaussian of fixed standard deviation.
"""

import numpy as np
np.random.seed(0)

# the function we want to optimize

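
The excerpt above stops before the objective is defined, so for anyone who just wants to see the idea run, here is a minimal sketch in the same spirit. It is not the rest of the gist: the objective `f`, the `solution` vector, the function name `nes_maximize`, and all hyperparameters are my own assumptions. It follows the setup described in the docstring (a Gaussian over parameters with a fixed standard deviation) and treats `f` as a black box.

```python
import numpy as np

# Hypothetical quadratic objective with its maximum at `solution`.
solution = np.array([0.5, 0.1, -0.3])

def f(w):
    return -np.sum((w - solution) ** 2)

def nes_maximize(f, w0, npop=50, sigma=0.1, alpha=0.01, iters=200, seed=0):
    """Maximize a black-box f by perturbing parameters with Gaussian noise."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        noise = rng.standard_normal((npop, w.size))        # one row per candidate
        rewards = np.array([f(w + sigma * e) for e in noise])
        # Standardize rewards so the step size does not depend on reward scale.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Reward-weighted average of the noise estimates the ascent direction.
        w = w + alpha / (npop * sigma) * noise.T @ adv
    return w

w_final = nes_maximize(f, np.zeros(3))
print("found:", w_final, "reward:", f(w_final))
```

Because the rewards are standardized, the updates keep a roughly constant noise level near the optimum, so the result hovers around `solution` rather than converging exactly; shrinking `alpha` or `sigma` over time is one way to tighten it.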