Reinforcement Learning Study Group

This group is for people who are interested in RL. Fastai does not directly support it, but we can learn and teach ways to use it with and without Fastai.

We are starting to read Sutton’s RL book, so anyone interested, please join our Slack channel (link below) to take part in conversations and meetups.


Deep Reinforcement Learning Resources

Here is a list of some resources that I found useful:


Deep learning for Research with Tensorflow


Advanced Deep learning & Deep Reinforcement Learning by DeepMind



Implementation of Reinforcement Learning Algorithms

Reinforcement Learning course UCL

Deep Reinforcement Learning Nanodegree program

CS224 Deep RL Course

DRL by DeepMind





Let’s keep this updated so it is easier to find all the resources here instead of needing to scroll through the whole topic.

I think we could start by telling each other how much prior knowledge we have of this topic. I’ll start: I began studying machine learning around the start of 2018 and have spent some time learning RL. I would describe myself as someone who knows what kinds of approaches exist but doesn’t know how to use them. I hope that in the next 3-4 months I will learn to code these from scratch.

I was thinking that if many people are in the same position right now, maybe we could pick a course and study it at the same time, talking about the challenges we face. Is this a good idea, and is anyone interested in it?

I believe that DeepMind and OpenAI are definitely good reference points for everyone who wants to get into RL. Also, David Silver’s courses are interesting from a theoretical point of view, and Sutton & Barto recently updated their RL book as well.

For people like me who have ZERO knowledge of RL, it would be good to have some type of virtual study session.


Great to have someone interested in this :wink: . That might be a good idea and something we could do.

As @devforfu wrote, the updated RL book by Sutton & Barto is a very solid reference. It is also available free of charge online, and I think it would be good to add it to the list.


I have done a lot of work in RL and PyTorch - the best book I can recommend is this one by Maxim Lapan:

I also did the Udacity RL course but did not have the time/motivation to complete all the projects. It’s a good course, though I didn’t care for the project portion. I also got quite demotivated when I found there are very few job openings and actual real-life applications for RL, so I have switched my focus to computer vision for now.


@ingbiodanielh, could you add this to the resources list you created?

Yeah, I would say that RL is a rarer thing to see among job posting requirements than general-purpose CV algorithms and tools. However, I guess it should be quite popular in areas that involve robotics or self-driving cars, right?

From what I saw, RL is being looked at by Microsoft (I had an interview there) and other very large companies that can afford to do pure R&D - i.e. research that may not pay off for years. But I’m not aware of anyone using it for any actual practical purpose at the moment.
DeepMind/AlphaZero are arguably using it the most, but for games (Go, chess), which is where RL shines right now (stationary environment). It has a ways to go before it can solve real-world use cases, and thus there is minimal hiring.
It’s too bad, because RL is, in my opinion, the ‘true AI’ - raw intelligence that learns just like people learn. The next most promising use case I’ve seen is robots in warehouses…again, RL really needs a very fixed environment at the moment to be useful.

But what do you think about RL for optimizing products, content, and other things on websites? I’m not sure whether RL is better at making predictions than a normal recommendation system.

Oh, for advertising there is definite use of multi-armed bandits (the Yahoo home page used to use them, I believe, among others), but I consider the multi-armed bandit a smaller subset of true RL (i.e. RL models the problem as an MDP, while a multi-armed bandit is more like an adaptive algorithm).
In theory, RL can outperform other recommendation systems if the environment is stationary…but the problem is that people’s preferences are not static. You might be interested in product A and click on ads for it, but once you purchase it, your interest in ads for that type of product drops to zero, and the system doesn’t know at first that this has changed. For Netflix or similar, where people’s tastes are more stable, RL could do well, I think.
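To make the bandit side of that comparison concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit. It is only an illustration, not how any real ad system works: the function name `run_bandit`, the Bernoulli (click / no click) rewards, and the click rates are all assumptions made up for the example. Note that there is no state and no transition model, which is what separates it from a full MDP.

```python
import random


def run_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Hypothetical click / no-click reward drawn from the arm's true rate.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update; no states or transitions involved.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


# Three hypothetical ads with made-up click rates.
counts, estimates = run_bandit([0.1, 0.5, 0.9])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_arm, counts)
```

The sample-average update weights old and new rewards equally, so it adapts slowly when preferences shift. A constant step size in place of `1 / counts[arm]` is a common way to track a non-stationary reward, at the cost of noisier estimates.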
This article has some discussion of multi-armed bandits vs RL:

I would say that DeepMind’s experiments in building universal game controllers and virtual environments look promising. And if we’re talking about deep learning in general, we also have a kind of “static” environment there: a network trained to recognize objects doesn’t work well if we show it a dataset with completely new properties, I guess.

But I agree with your point. Dynamic and non-deterministic environments are quite a challenging topic. Sure enough, these game experiments are probably the very beginning of the research process, but then, probability theory was once also used just to predict dice outcomes and count chances in card games :smile:


I was thinking the other day: can we think of the simplest RL as a supervised learning model with a different loss function? For example, say we have a supervised learning model that predicts which football team is going to win. The loss function would be how far our prediction is from the real result. Then, if we want to bet real money on these teams, we should somehow calculate the bet size using the accuracy of the model and how much money we have. This could be solved with RL. So what if we modify the original model so that it outputs the bet size? If it is negative, the absolute value of that number is placed on team B, and if it is positive, the bet is placed on team A. Then the loss function could be based on how much money the model loses or wins. If it places 20 units on team B and team A wins, the payoff would be -20, but if B wins the payoff might be +30 (depending on the odds).

Can reinforcement learning be taught this way? I have never seen anyone do this, but in my mind it should work. I haven’t had time to test this yet, but if someone is interested in seeing the results, I can help with that.

Someone might find this useful:

It is written using TensorFlow, but I think it is good practice to sometimes use something other than PyTorch.
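The payoff in the betting idea above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the post: the helper names `bet_profit` and `loss` are hypothetical, and decimal odds of 2.5 are chosen just so that a winning 20-unit bet returns 30.

```python
def bet_profit(bet, winner, odds_a=2.5, odds_b=2.5):
    """Profit of a signed bet: bet > 0 backs team A, bet < 0 backs team B.

    Decimal odds are an assumption: a winning stake s returns s * (odds - 1).
    """
    stake = abs(bet)
    if stake == 0:
        return 0.0
    backed = "A" if bet > 0 else "B"
    odds = odds_a if backed == "A" else odds_b
    if backed == winner:
        return stake * (odds - 1.0)
    return -float(stake)  # a losing bet forfeits the whole stake


def loss(bet, winner, odds_a=2.5, odds_b=2.5):
    """Loss to minimise is the negative profit."""
    return -bet_profit(bet, winner, odds_a, odds_b)


print(bet_profit(-20, "A"))  # 20 units on B, A wins -> -20.0
print(bet_profit(-20, "B"))  # 20 units on B, B wins -> 30.0
```

Since each bet is a single decision with an immediate payoff, this setup is arguably closer to a bandit-style problem than to a full sequential RL problem. Tracking the remaining bankroll across bets would be one way to make it sequential.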

I created a study group for us in Slack.

I think we could start learning something soon. If you are interested, join there so we can discuss when to have meetups and other things.

IF YOU HAVE ANY INTEREST IN LEARNING RL, NOW IS A GREAT TIME! We are going to start reading Sutton’s book, and anyone interested can join our Slack group to take part in conversations and meetups. We are starting slowly, so anyone can easily join over the next couple of weeks. It is great for anyone, no matter how much background knowledge you have in this area. The book is easy to read and free, so there are no excuses not to start learning RL right now.

Is anyone here interested in participating in the Microsoft AI Challenge 2019?
I have attached a link to the challenge below…