Reinforcement learning for combinatorial optimization (Help with PhD research)

Valentas · May 4, 2018, 3:10pm

Good day,

I am doing my PhD in operations research in logistics. I want to build an urban logistics simulation that shows how autonomous vehicles adapt to disruptions and how resilience emerges. The theoretical approach is based on complex adaptive systems theory.

As I understand it, the routing should be done using “reinforcement learning for combinatorial optimization”; however, I have no experience applying RL to route scheduling. Could anyone recommend courses or literature on this topic?

If anyone has developed such a model, could you share some insights into the data architecture? As I understand it, this would be similar to supervised learning, except that the sequence of categories must also be taken into account.
Should the implementation be based on graphs?
Or should I have features categorizing a trip with evaluation of the cost function?
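To make the two options in this question concrete, here is a minimal sketch of both representations side by side. The node names, travel times, and the `TripFeatures` fields are hypothetical and chosen only for illustration; they do not come from any of the papers mentioned in this thread.

```python
from dataclasses import dataclass

# Option 1: the road network as a weighted graph (adjacency dict).
# Node names and travel times (minutes) are made up for illustration.
road_graph = {
    "depot": {"A": 4.0, "B": 2.0},
    "A": {"depot": 4.0, "B": 1.0, "customer": 5.0},
    "B": {"depot": 2.0, "A": 1.0, "customer": 8.0},
    "customer": {"A": 5.0, "B": 8.0},
}


def route_cost(graph, route):
    """Cost function: sum of edge travel times along a list of nodes."""
    return sum(graph[u][v] for u, v in zip(route, route[1:]))


# Option 2: a flat feature record describing one trip plus its evaluated cost.
@dataclass
class TripFeatures:
    n_stops: int               # intermediate nodes between start and end
    total_time: float          # value of the cost function for this trip
    uses_congested_edge: bool  # whether the trip crosses a jammed edge


def featurize(graph, route, congested_edges):
    """Turn a route on the graph into a fixed-size feature record."""
    edges = list(zip(route, route[1:]))
    return TripFeatures(
        n_stops=len(route) - 2,
        total_time=route_cost(graph, route),
        uses_congested_edge=any(e in congested_edges for e in edges),
    )


route = ["depot", "B", "A", "customer"]
features = featurize(road_graph, route, congested_edges={("B", "A")})
print(features)
```

The two options are not mutually exclusive: in this sketch the feature record is derived from the graph, so the graph stays the underlying data structure and the features are a summary of one route on it.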

In my case, I have an e-commerce setting with product delivery to the end consumer. During the day I generate traffic jams, which block the routes. The algorithm should learn from the environment and automatically select better routes with respect to the goal function.
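As a rough illustration of "learning from the environment", here is a minimal sketch using tabular Q-learning, a much simpler technique than the deep RL methods discussed elsewhere in this thread. The four-edge network, the jam probability, and the delay are all assumptions invented for the example; the reward is the negative travel time, standing in for the goal function.

```python
import random

# Hypothetical toy network: base travel times in minutes on directed edges.
BASE_TIME = {
    ("depot", "A"): 2.0, ("depot", "B"): 4.0,
    ("A", "customer"): 2.0, ("B", "customer"): 3.0,
}
JAM_EDGE = ("A", "customer")  # assumed edge where traffic jams occur
JAM_PROB = 0.8                # assumed probability of a jam per traversal
JAM_DELAY = 20.0              # assumed extra minutes when jammed


def neighbors(node):
    """Nodes reachable from `node` by one directed edge."""
    return [v for (u, v) in BASE_TIME if u == node]


def travel_time(u, v, rng):
    """Simulated environment: base time plus a random jam delay."""
    t = BASE_TIME[(u, v)]
    if (u, v) == JAM_EDGE and rng.random() < JAM_PROB:
        t += JAM_DELAY
    return t


def train(episodes=2000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {edge: 0.0 for edge in BASE_TIME}
    for _ in range(episodes):
        node = "depot"
        while node != "customer":
            options = neighbors(node)
            if rng.random() < epsilon:
                nxt = rng.choice(options)  # explore
            else:
                nxt = max(options, key=lambda v: q[(node, v)])  # exploit
            reward = -travel_time(node, nxt, rng)
            # Value of the best follow-up edge; 0 at the goal node.
            future = max((q[(nxt, v)] for v in neighbors(nxt)), default=0.0)
            q[(node, nxt)] += alpha * (reward + gamma * future - q[(node, nxt)])
            node = nxt
    return q


def greedy_route(q, start="depot", goal="customer"):
    """Follow the highest-valued edge from start until the goal is reached."""
    route = [start]
    while route[-1] != goal:
        node = route[-1]
        route.append(max(neighbors(node), key=lambda v: q[(node, v)]))
    return route


q_values = train()
print(greedy_route(q_values))
```

Without jams the route via A is shorter (4 minutes against 7), but with the assumed jam settings its expected time is about 20 minutes, so the learned values should come to favour the route via B. A real urban network would need a far larger state space, which is where the deep RL papers come in.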

msp · May 4, 2018, 4:32pm

Sounds like a great project!

Have you read the classic RL book by Sutton & Barto?

Valentas · May 4, 2018, 5:50pm

No, I have not, but I will. I thought that this kind of approach was quite novel, so I was looking for more recent work, but I expect that starting from the “basics” would be a good idea.

I have found a recently published paper that suits this approach well. However, I lack a technical understanding of the input data (architecture) and the logic of the whole process, so I am hoping to find a course that covers the integration of combinatorial optimization and deep learning.

Deep Reinforcement Learning for Solving the Vehicle Routing Problem, https://arxiv.org/abs/1802.04240

msp · May 4, 2018, 5:57pm

I think the book is worth reading. Especially if you are doing a PhD in this topic, you will be expected to know the classics. For more recent work that combines RL with deep learning, I recommend reading the papers from DeepMind; they are the most advanced in RL at the moment. Also check out OpenAI Gym, which is a kind of benchmarking environment for RL algorithms.

Valentas · May 4, 2018, 6:06pm

My PhD is a bit tricky: I am studying social science, but my PhD topic is interdisciplinary, so I am learning all the computational methods by myself through online material.

Thank you for the tips!

msp · May 4, 2018, 7:27pm

Sounds interesting! But I would recommend making sure that you are doing your PhD in a suitable department for your topic. In Operations Research, the expectation is usually a research-level contribution in terms of mathematics, computation, etc.

Valentas · May 4, 2018, 8:01pm

In my department, computational methods are not popular; interviews and surveys are more common. The most important thing is the theoretical contribution, i.e. explaining a phenomenon. So the algorithm itself does not matter in my case; I am simply using it to generate data and to explain a phenomenon, “supply chain resilience”. It’s called “generative social science”, and there is quite a good website about it: https://www.complexityexplorer.org/

I am interested in data science, so I got approval to use such a non-traditional approach in our department. That’s why I am asking for courses, books and guidelines wherever I can.

But thanks for the concern! I will finish my PhD, but I want to finish it at a high level and with a bang!

Valentas · May 6, 2018, 6:55am

I have found several publications and a great GitHub example. However, I lack a more basic understanding of the process, so if anyone could clarify it, thank you!

Deep Reinforcement Learning for Solving the Vehicle Routing Problem
https://arxiv.org/abs/1802.04240

Neural Combinatorial Optimization with Reinforcement Learning
https://arxiv.org/abs/1611.09940

Example:

github.com/higgsfield/np-hard-deep-reinforcement-learning/blob/master/Neural Combinatorial Optimization.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "<p>This tutorial presents <a href=\"\">Neural Combinatorial Optimization with Reinforcement Learning</a>. Focusing on the traveling salesman problem (TSP) and train a recurrent neural network that, given a set of city coordinates, predicts a distribution over different city permutations. Using\n",
    "negative tour length as the reward signal, the model optimize the parameters of the recurrent\n",
    "neural network using a policy gradient method. </p><p>Despite the computational expense, without much engineering and\n",
    "heuristic designing, Neural Combinatorial Optimization achieves close to optimal\n",
    "results on 2D Euclidean graphs with up to 100 nodes.</p><p>\n",
    "Previous attempts used supervised learning. Learning from examples in such a\n",
    "way is undesirable for NP-hard problems because (1) the performance of the model is tied to the\n",
    "quality of the supervised labels, (2) getting high-quality labeled data is expensive and may be infeasible\n",
    "for new problem statements, (3) one cares more about finding a competitive solution more than\n",
    "replicating the results of another algorithm. By contrast, Reinforcement Learning (RL) provides an appropriate paradigm for training\n",
    "neural networks for combinatorial optimization, especially because these problems have relatively\n",
    "simple reward mechanisms that could be even used at test time. </p>"

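The notebook text above describes training with a policy gradient method and negative tour length as the reward. Below is a minimal sketch of that idea (REINFORCE with a moving-average baseline) on a single fixed TSP instance. Note the swap: instead of the recurrent network used in the paper and the notebook, it uses a plain table of logits `theta[i][j]` for moving from city `i` to city `j`, which is an assumption made to keep the example short and dependency-free. The city coordinates, learning rate, and iteration count are likewise illustrative.

```python
import math
import random


def tour_length(cities, tour):
    """Length of the closed tour visiting cities in the given order."""
    return sum(math.dist(cities[a], cities[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def sample_tour(theta, rng, greedy=False):
    """Build a tour from city 0; also return the per-step softmax data."""
    n = len(theta)
    tour, steps = [0], []
    while len(tour) < n:
        cur = tour[-1]
        cand = [j for j in range(n) if j not in tour]  # mask visited cities
        m = max(theta[cur][j] for j in cand)
        weights = [math.exp(theta[cur][j] - m) for j in cand]
        total = sum(weights)
        probs = [w / total for w in weights]
        if greedy:
            nxt = cand[probs.index(max(probs))]
        else:
            nxt = rng.choices(cand, weights=probs)[0]
        steps.append((cur, cand, probs, nxt))
        tour.append(nxt)
    return tour, steps


def reinforce(cities, iterations=3000, lr=0.1, seed=0):
    """REINFORCE: raise the log-probability of tours that beat the baseline."""
    rng = random.Random(seed)
    n = len(cities)
    theta = [[0.0] * n for _ in range(n)]
    baseline = None
    for _ in range(iterations):
        tour, steps = sample_tour(theta, rng)
        reward = -tour_length(cities, tour)  # negative tour length
        advantage = 0.0 if baseline is None else reward - baseline
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        for cur, cand, probs, nxt in steps:
            for j, p in zip(cand, probs):
                # Gradient of log softmax w.r.t. the logit theta[cur][j].
                grad = (1.0 if j == nxt else 0.0) - p
                theta[cur][j] += lr * advantage * grad
    return theta


# Six cities on the boundary of a 2 x 1 grid (illustrative instance).
cities = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
theta = reinforce(cities)
best_tour, _ = sample_tour(theta, random.Random(1), greedy=True)
print(best_tour, round(tour_length(cities, best_tour), 3))
```

This table only learns one instance, whereas the point of the neural approach is a policy that takes the city coordinates as input and generalizes across instances. The reward signal and the update rule follow the same principle, though.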

nav13n · May 6, 2018, 11:56am

Hi @Valentas, not sure it’s directly related, but have a look at https://selfdrivingcars.mit.edu/deeptraffic/. There are also accompanying lectures, available once you create an account, that would help you learn the central ideas of reinforcement learning. If you go through the DeepTraffic paper, one of the promising observations was that improving the state of one agent improved the behaviour of the overall system. Hope it helps a bit.

Valentas · May 6, 2018, 1:31pm

Thank you, I will check the courses and publications at the provided link. I always like MIT courses; it’s just not always clear where to look. So thanks for the tips!