Reinforcement Learning Study Group

(Lankinen) #1

This group is for people who are interested in RL. Fastai doesn't directly support it, but we can learn and teach ways to use it with and without Fastai.

(Danielh Carranza) #2

Deep Reinforcement Learning Resources

Here is a list of some resources that I found useful:


Deep Learning for Research with TensorFlow

Advanced Deep Learning & Deep Reinforcement Learning by DeepMind

Implementation of Reinforcement Learning Algorithms

Reinforcement Learning course, UCL

Deep Reinforcement Learning Nanodegree program

CS224 Deep RL Course

DRL by DeepMind



(Lankinen) #3

Let's update this so it's easier to find all the resources here instead of needing to scroll through the whole topic.

(Lankinen) #4

I think we could start by telling others how much previous knowledge we have on this topic. I'll start: I began studying machine learning around the start of 2018 and have spent some time learning RL. I would describe myself as someone who knows what the different kinds of approaches are but doesn't know how to use them. I hope that in the next 3-4 months I learn to code these from scratch.

I was thinking that if there are many people in the same position right now, maybe we could take a course and study it together, talking through the challenges we face along the way. Is this a good idea, and is anyone interested?

(Ilia) #5

I believe that DeepMind and OpenAI are definitely good reference points for everyone who wants to get into RL. Also, David Silver's courses are interesting from a theoretical point of view, and Sutton & Barto recently updated their RL book as well.

(Nahid Alam) #6

For people like me who have ZERO knowledge of RL, it would be good to have some kind of virtual study session.

(Lankinen) #7

Great to have someone interested in this :wink:. That might be a good idea and something we could do.

(Alexandre Gravier) #8

Like @devforfu wrote, the updated RL book by Sutton & Barto is a very solid reference. It is also available free of charge; I think it would be good to add it to the list.

(Less ) #9

I have done a lot of work in RL and PyTorch; the best book I can recommend is this one by Maxim Lapan:

I also did the Udacity RL course but didn't have the time/motivation to complete all the projects… it's a good course, though I didn't care for the project portion. I also got quite demotivated when I found there are very few job openings and real-life applications for RL, so I've switched my focus to computer vision for now.

(Lankinen) #10

Could you @ingbiodanielh add this to the resources list you have created?

(Ilia) #11

Yeah, I would say that RL is rarer among job-posting requirements than general-purpose CV algorithms and tools. However, I guess it should be quite popular in areas that involve robotics or self-driving cars, right?

(Less ) #12

From what I saw, RL is being looked at by Microsoft (I had an interview there) and other very large companies that can afford to do pure R&D, i.e. research that may not pay off for years. But I'm not aware of anyone using it for any actual practical purpose at the moment.
DeepMind/AlphaZero are arguably using it the most, but for games (Go, chess), which is where RL shines right now (stationary environment); it has a ways to go before it can solve real-world use cases, hence minimal hiring.
It's too bad, because RL is, in my opinion, the 'true AI': raw intelligence that learns the way people learn. The next most promising use case I've seen is robots in warehouses… again, RL really needs a very fixed environment at the moment to be useful.
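To make the "fixed environment" point concrete, here is a minimal tabular Q-learning sketch on a toy deterministic chain world. Everything here (the environment, states, rewards, and hyperparameters) is made up purely for illustration; it is the kind of stationary setting where basic RL converges reliably:

```python
import random

# Toy deterministic "chain" environment: states 0..4, actions 0 (left) / 1 (right).
# Reaching state 4 gives reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning with epsilon-greedy exploration.
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

random.seed(0)
for episode in range(500):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# Because the environment never changes, the learned greedy policy
# should be "go right" in every non-terminal state.
greedy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy)  # expected: [1, 1, 1, 1]
```

The same loop breaks down if the reward function shifts under the agent's feet, which is exactly the stationarity issue raised above.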

(Lankinen) #13

But what do you think about RL for website optimization: products, content, and other things? I'm not sure whether RL is better at making predictions than a normal recommendation system.

(Less ) #14

Oh, for advertising there is definite use of multi-armed bandits (the Yahoo home page used to use them, I believe, among others), but I consider the multi-armed bandit a smaller subset of true RL (i.e. RL models the problem as an MDP; a multi-armed bandit is more like an adaptive algorithm).
In theory, RL can outperform other recommendation systems if the environment is stationary… but the problem is that people's preferences are not static. You might be interested in product A and click on its ads, but once you purchase it, your interest in that type of product ad is now zero, and the system doesn't initially know that anything has changed. For Netflix or similar, where people's tastes are more stable, RL could do well, I think.
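The purchase scenario above can be sketched with a toy epsilon-greedy bandit. The two "ads", their click probabilities, and the moment the user's interest collapses are all made-up numbers; the point is that a constant step size lets the value estimates decay old evidence and track the change:

```python
import random

random.seed(1)

# Two ads with made-up click probabilities; after step 1000 the user
# "buys product A" and their interest in its ads collapses.
def click_prob(arm, t):
    if arm == 0:                       # ad for product A
        return 0.8 if t < 1000 else 0.05
    return 0.4                         # ad for product B

# Epsilon-greedy bandit with a constant step size (not a sample average),
# so recent rewards dominate and the agent can adapt to the shift.
epsilon, step_size = 0.1, 0.05
estimates = [0.0, 0.0]

chosen_late = []
for t in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(2)      # explore
    else:
        arm = estimates.index(max(estimates))  # exploit
    reward = 1.0 if random.random() < click_prob(arm, t) else 0.0
    estimates[arm] += step_size * (reward - estimates[arm])
    if t >= 1500:
        chosen_late.append(arm)

# Well after the shift, the bandit should mostly show ad B.
print(sum(chosen_late) / len(chosen_late))
```

With a plain sample-average estimate the old 0.8 click rate would keep dragging ad A's estimate up, which is the "system doesn't know it's changed" problem in miniature.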
This article has some discussion on multi-armed bandits vs RL:

(Ilia) #15

I would say that DeepMind's experiments in building universal game controllers and virtual environments look promising. And if we're talking about Deep Learning in general, we also have a kind of "static" environment: a network trained to recognize objects doesn't work well if we show it a dataset with completely new properties.

But I agree with your point. Dynamic and non-deterministic environments are quite a challenging topic. Sure enough, these game experiments are probably just the very beginning of the research process, but Probability Theory was also once used only to predict dice outcomes and count chances in card games :smile: