Reinforcement learning vs. evolutionary strategies

Just posting this here in case it interests anyone. OpenAI wrote a blog post comparing reinforcement learning (RL) with evolutionary strategies (ES), which turn out to be competitive with RL on a number of tasks.

The key difference:

  1. Reinforcement learning injects noise into an agent’s actions, then uses backpropagation to compute the parameter updates.
  2. Evolutionary strategies treat the agent and environment as a black box and inject noise directly into the parameters, eliminating the need for backprop.

ES has a few benefits over RL: no need for backprop, lower memory use since there is no need to keep track of a long history of actions, it works with non-differentiable networks, there are no exploding gradients in RNNs, and it is highly parallelizable.

ES has one big downside: adding noise to parameters doesn’t always result in new outcomes, while adding noise to actions often does. It can be tricky to perturb parameters in a way that yields a good gradient signal.
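
To make that downside concrete, here is a toy sketch (not from the OpenAI post). The one-step environment, the threshold policy, and names like `act`, `env_reward`, and `flip_prob` are all made up for illustration. With the parameter far from the decision boundary, small parameter noise never changes the action, so every sample earns the same reward and there is nothing to learn from. Action noise still produces both outcomes.

```python
import random

def act(theta, x=1.0):
    # Deterministic policy (hypothetical): choose action 1 only if theta * x is positive.
    return 1 if theta * x > 0 else 0

def env_reward(action):
    # Hypothetical one-step environment: action 1 pays 1.0, action 0 pays nothing.
    return 1.0 if action == 1 else 0.0

def rewards_with_param_noise(theta, sigma, n, rng):
    # ES-style exploration: perturb the parameter, then act deterministically.
    return [env_reward(act(theta + sigma * rng.gauss(0.0, 1.0))) for _ in range(n)]

def rewards_with_action_noise(theta, flip_prob, n, rng):
    # RL-style exploration: act with the unperturbed parameter, then randomly flip the action.
    rewards = []
    for _ in range(n):
        action = act(theta)
        if rng.random() < flip_prob:
            action = 1 - action
        rewards.append(env_reward(action))
    return rewards

rng = random.Random(0)
theta = -5.0  # far from the decision boundary at 0
param_rewards = rewards_with_param_noise(theta, sigma=0.1, n=100, rng=rng)
action_rewards = rewards_with_action_noise(theta, flip_prob=0.2, n=100, rng=rng)

print("distinct rewards with parameter noise:", sorted(set(param_rewards)))
print("distinct rewards with action noise:   ", sorted(set(action_rewards)))
```

Raising `sigma` until some perturbed parameters cross the boundary would bring the signal back, which is roughly why choosing the perturbation scale can be tricky.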

Here is a notebook that lets you explore a simple 2D evolutionary strategy.

Here is code to explore how to optimize a quadratic function with ES.
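
For anyone who wants to try the idea without opening anything, below is a minimal standalone sketch of ES on a quadratic. It is my own illustration, not necessarily the same as the code referred to above. The function names, the target vector, and the hyperparameters (`pop_size`, `sigma`, `alpha`) are arbitrary choices. Each iteration samples Gaussian noise around the current parameters, scores each perturbed copy, and moves the parameters along the reward-weighted average of the noise. There is no backprop anywhere.

```python
import random

def reward(w, target):
    # Negative squared distance to the target; the maximum (0) is at w == target.
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))

def es_optimize(target, n_iters=300, pop_size=50, sigma=0.1, alpha=0.05, seed=0):
    rng = random.Random(seed)
    dim = len(target)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random initial guess
    for _ in range(n_iters):
        # One noise vector per population member, each scored at w + sigma * eps.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
        rewards = [
            reward([wi + sigma * e for wi, e in zip(w, eps)], target)
            for eps in noise
        ]
        # Subtracting the mean reward as a baseline reduces the variance of the estimate.
        baseline = sum(rewards) / pop_size
        for j in range(dim):
            # Reward-weighted noise approximates the gradient of the expected reward.
            grad_j = sum(
                (r - baseline) * eps[j] for r, eps in zip(rewards, noise)
            ) / (pop_size * sigma)
            w[j] += alpha * grad_j
    return w

target = [0.5, 0.1, -0.3]
w = es_optimize(target)
print("found: ", [round(wi, 3) for wi in w])
print("target:", target)
```

The result should land close to `target` but not exactly on it, since the estimate stays noisy even at the optimum. A larger `pop_size` or a smaller `sigma` reduces that jitter at the cost of more evaluations or slower exploration.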