Reinforcement learning for combinatorial optimization (Help with PhD research)

Good day,

I am doing my PhD in operations research in logistics. I want to create an urban logistics simulation in which I would show how autonomous vehicles can adapt to disruptions and how resilience emerges. The theoretical approach is based on complex adaptive systems theory.

As I understand it, the routing should be done using “reinforcement learning for combinatorial optimization”; however, I have no experience applying RL to route scheduling. Could anyone recommend courses or literature on this topic?

If anyone has developed such a model, could you share some insights into the data architecture? As I understand it, this would be similar to supervised learning, except that the sequence (order) of the categories must also be taken into account.
Should the implementation be based on graphs?
Or should I use features that characterize a trip, together with an evaluation of the cost function?
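To make the graph option concrete, here is a minimal Python sketch, assuming a capacitated delivery setting. The graph's nodes carry static features (coordinates and demand, with node 0 as the depot), while the sequence is handled by the partial route plus a mask of the nodes that can still be visited next. Splitting the input into static node features and a changing vehicle state is one common way RL-for-routing models organize their data, but the class and method names here (`RoutingState`, `feasible_mask`, `observation`) are hypothetical, and Euclidean distance stands in for real travel times.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    x: float
    y: float
    demand: int  # 0 for the depot (node 0)

@dataclass
class RoutingState:
    nodes: list            # graph vertices: depot first, then customers
    max_capacity: int      # vehicle capacity
    route: list = field(default_factory=lambda: [0])  # partial route, starts at the depot

    def __post_init__(self):
        self.capacity = self.max_capacity

    @property
    def current(self):
        return self.route[-1]

    def travel_time(self, i, j):
        # Assumption: Euclidean distance stands in for road travel time
        a, b = self.nodes[i], self.nodes[j]
        return math.hypot(a.x - b.x, a.y - b.y)

    def feasible_mask(self):
        # Which nodes may be chosen next: unvisited customers whose demand fits,
        # or the depot (to reload) if the vehicle is not already there
        visited = set(self.route)
        return [
            self.current != 0 if j == 0
            else (j not in visited and n.demand <= self.capacity)
            for j, n in enumerate(self.nodes)
        ]

    def observation(self):
        # Per-node feature rows [x, y, demand, visited] plus vehicle context;
        # the visit order itself lives in self.route
        visited = set(self.route)
        rows = [[n.x, n.y, n.demand, float(j in visited)] for j, n in enumerate(self.nodes)]
        return rows, self.current, self.capacity

    def step(self, j):
        # Move to node j and return the travel cost of that move
        assert self.feasible_mask()[j], "infeasible action"
        cost = self.travel_time(self.current, j)
        self.route.append(j)
        self.capacity = self.max_capacity if j == 0 else self.capacity - self.nodes[j].demand
        return cost


nodes = [Node(0, 0, 0), Node(3, 4, 2), Node(6, 8, 2), Node(0, 5, 3)]
state = RoutingState(nodes, max_capacity=4)
print(state.feasible_mask())
```

With this layout, the answer to "graphs or trip features?" is partly "both": the graph supplies per-node features, and the route plus mask carry the sequential part that plain supervised features would miss.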

In my case, the setting is the e-commerce industry, with products delivered to end consumers. During the day I generate traffic jams, which block the routes. The algorithm should learn from the environment and automatically select better routes with respect to the goal function.
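As a minimal illustration of "learning from the environment", here is a tabular Q-learning sketch on a hypothetical four-node road network. This is deliberately simpler than the deep RL methods discussed in this thread. The reward is the negative travel time; the agent first learns the fast route via `B`, then a traffic jam makes the road from `B` to the customer slow, and continued learning from the same Q-table should shift the greedy route to the road via `C`. The network, travel times and hyperparameters are made up for illustration.

```python
import random

GOAL = "customer"

def make_roads():
    # Hypothetical toy network: node -> {neighbour: travel time in minutes}
    return {
        "depot": {"B": 1.0, "C": 2.0},
        "B": {"depot": 1.0, "customer": 1.0},
        "C": {"depot": 2.0, "customer": 2.0},
        "customer": {"B": 1.0, "C": 2.0},
    }

def best_action(q, roads, node):
    # Greedy choice: neighbour with the highest Q-value (unseen pairs default to 0)
    return max(roads[node], key=lambda a: q.get((node, a), 0.0))

def q_learning(roads, q, episodes=500, alpha=0.5, epsilon=0.2, max_steps=20, seed=0):
    """Tabular Q-learning; reward = -travel time, so maximizing return minimizes time."""
    rng = random.Random(seed)
    for _ in range(episodes):
        node = "depot"
        for _ in range(max_steps):
            if rng.random() < epsilon:
                nxt = rng.choice(list(roads[node]))   # explore
            else:
                nxt = best_action(q, roads, node)     # exploit
            reward = -roads[node][nxt]
            future = 0.0 if nxt == GOAL else max(q.get((nxt, a), 0.0) for a in roads[nxt])
            old = q.get((node, nxt), 0.0)
            q[(node, nxt)] = old + alpha * (reward + future - old)  # undiscounted (gamma = 1)
            node = nxt
            if node == GOAL:
                break
    return q

def greedy_route(roads, q, start="depot", max_steps=20):
    route = [start]
    while route[-1] != GOAL and len(route) <= max_steps:
        route.append(best_action(q, roads, route[-1]))
    return route


roads = make_roads()
q = q_learning(roads, {})
before = greedy_route(roads, q)

roads["B"]["customer"] = 10.0          # disruption: traffic jam on the B road
q = q_learning(roads, q, seed=1)       # keep learning from the same Q-table
after = greedy_route(roads, q)
print(before, after)
```

A Q-table will not scale to a city-sized network with many stops, which is where neural policies come in, but the reward signal and the "disruption, then re-learn" loop stay the same.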

Sounds like a great project!

Have you read the classic RL book by Sutton & Barto?

No, I have not, but I will 🙂 I thought this kind of approach was quite novel, so I was looking for more recent work, but I expect that starting from the “basics” would be a good idea.

I have found a recently published paper that is quite suitable for this approach. However, I lack a technical understanding of the input data (architecture) and the logic of the whole process, so I am hoping to find a course that covers the integration of combinatorial optimization and deep learning.

Deep Reinforcement Learning for Solving the Vehicle Routing Problem

I think the book is worth reading, especially if you are doing a PhD on this topic: you will be expected to know this classic. For more recent work combining RL with deep learning, I recommend reading all the papers from DeepMind. They are the most advanced in RL at the moment. Also check out OpenAI Gym, which provides a collection of benchmark environments for RL algorithms.
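On the environment side, a custom simulator can expose the same `reset()`/`step()` shape that Gym environments use, so your own agents (or RL libraries) can plug into it. The sketch below mimics the interface of Gymnasium, the maintained successor of OpenAI Gym, where `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`, without importing it. The class name `JamRoutingEnv`, the toy road network, the jam probability and the jam multiplier are all hypothetical choices for illustration.

```python
import random

class JamRoutingEnv:
    """Toy routing environment with a Gymnasium-style reset()/step() shape (no dependency).

    Hypothetical disruption model: at each reset, every road is jammed independently
    with probability jam_prob, which multiplies its travel time by jam_factor.
    """

    def __init__(self, roads, start, goal, jam_prob=0.2, jam_factor=5.0, max_steps=20):
        self.base_roads = roads
        self.start, self.goal = start, goal
        self.jam_prob, self.jam_factor, self.max_steps = jam_prob, jam_factor, max_steps
        self.rng = random.Random()

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        # Sample today's traffic: jammed roads get a multiplied travel time
        self.roads = {
            u: {v: t * (self.jam_factor if self.rng.random() < self.jam_prob else 1.0)
                for v, t in nbrs.items()}
            for u, nbrs in self.base_roads.items()
        }
        self.node, self.steps = self.start, 0
        return self._obs(), {}

    def _obs(self):
        # Observation: current node plus the current travel times of its outgoing roads
        return self.node, dict(self.roads[self.node])

    def step(self, action):
        reward = -self.roads[self.node][action]   # negative travel time
        self.node, self.steps = action, self.steps + 1
        terminated = self.node == self.goal
        truncated = self.steps >= self.max_steps and not terminated
        return self._obs(), reward, terminated, truncated, {}


roads = {
    "depot": {"B": 1.0, "C": 2.0},
    "B": {"depot": 1.0, "customer": 1.0},
    "C": {"depot": 2.0, "customer": 2.0},
    "customer": {"B": 1.0, "C": 2.0},
}
env = JamRoutingEnv(roads, "depot", "customer")
policy_rng = random.Random(0)
obs, info = env.reset(seed=42)
total_time, done = 0.0, False
while not done:
    node, options = obs
    action = policy_rng.choice(sorted(options))   # placeholder: random policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_time -= reward
    done = terminated or truncated
print(f"episode finished after {env.steps} moves, {total_time:.1f} minutes")
```

The random policy is only a placeholder; any learning agent that consumes observations and returns a neighbour can replace it without changing the environment.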

My PhD is a bit tricky 🙂 I am studying social science, but my PhD topic is interdisciplinary, so I am learning all the computational parts by myself through online material.

Thank you for the tips!

Sounds interesting! But I would recommend making sure that you are doing your PhD in a department suited to your topic. In Operations Research, the expectation is usually a research-level contribution in terms of mathematics, computation, etc.

In my department, computational work is not common; interviews and surveys are more popular. The most important thing is the theoretical contribution: explaining phenomena. So in my case the algorithm itself does not matter. I am simply using it to generate data and to explain a phenomenon, “supply chain resilience”. It’s called “generative social science”, and there is quite a good website about it:

I am interested in data science, so I got approval to use this non-traditional approach in our department. That’s why I am asking for courses, books and guidelines wherever I can 🙂

But thanks for the concern! I will finish my PhD, but I want to finish it at a high level and with a bang!


I have found several publications and a great GitHub example. However, I lack a more basic understanding of the process, so if anyone could clarify it, thank you!

Deep Reinforcement Learning for Solving the Vehicle Routing Problem

Neural Combinatorial Optimization with Reinforcement Learning
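Both of these papers build a route one stop at a time, mask stops that were already visited, and train the neural policy with a REINFORCE-style policy gradient against a baseline. The sketch below keeps only that training loop so the logic is visible: a tabular softmax over "next stop given current stop" replaces the pointer/attention network, the baseline is a simple moving average rather than a learned critic, and the five stops are made up. It is an illustration of the idea, not either paper's implementation.

```python
import math
import random

# Hypothetical instance: five delivery stops (x, y); stop 0 is the depot where tours start
POINTS = [(0, 0), (3, 2), (2, 0), (-1, 2), (1, 3)]
N = len(POINTS)

def tour_length(tour):
    # Closed tour: the vehicle returns to the depot at the end
    return sum(math.dist(POINTS[a], POINTS[b]) for a, b in zip(tour, tour[1:] + tour[:1]))

def masked_softmax(logits, allowed):
    m = max(logits[j] for j in allowed)   # subtract max for numerical stability
    exps = {j: math.exp(logits[j] - m) for j in allowed}
    z = sum(exps.values())
    return {j: e / z for j, e in exps.items()}

def build_tour(theta, rng, greedy=False):
    """Construct a tour stop by stop; also return (current, probs, choice) for each step."""
    tour, steps = [0], []
    while len(tour) < N:
        allowed = [j for j in range(N) if j not in tour]   # mask already-visited stops
        probs = masked_softmax(theta[tour[-1]], allowed)
        if greedy:
            nxt = max(probs, key=probs.get)
        else:
            nxt = rng.choices(list(probs), weights=list(probs.values()))[0]
        steps.append((tour[-1], probs, nxt))
        tour.append(nxt)
    return tour, steps

def train(episodes=3000, lr=0.05, beta=0.9, seed=0):
    rng = random.Random(seed)
    theta = [[0.0] * N for _ in range(N)]   # tabular logits: theta[current][next]
    baseline = None
    for _ in range(episodes):
        tour, steps = build_tour(theta, rng)
        length = tour_length(tour)
        if baseline is None:
            baseline = length
        advantage = baseline - length        # > 0 when this tour beat the baseline
        baseline = beta * baseline + (1 - beta) * length   # moving-average baseline
        for node, probs, nxt in steps:
            for j, p in probs.items():
                # REINFORCE: d log pi(nxt | node) / d theta[node][j] = 1[j == nxt] - p_j
                theta[node][j] += lr * advantage * (float(j == nxt) - p)
    return theta


theta = train()
best_tour, _ = build_tour(theta, random.Random(0), greedy=True)
print(best_tour, round(tour_length(best_tour), 3))
```

In the papers, the same update is applied to the weights of a neural network that reads the graph's node features, which is what lets one trained policy generalize to new instances instead of memorizing a single one.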


Hi @Valentas, not sure it’s directly related, but have a look at this. There are accompanying lectures as well, once you create an account, that would help you learn the central ideas of reinforcement learning. If you go through the DeepTraffic paper, one of the promising observations was that improving the state of one agent improved the behaviour of the overall system. Hope it helps a bit.

Thank you, I will check out the courses and publications at the provided link. I always like MIT courses; you just don’t always know where to look for them. So thanks for the tips!