This group is for people who are interested in RL. fastai doesn't directly support it, but we can learn and teach ways to use it with and without fastai.
We are starting to read Sutton's RL book, so anyone interested, please join our Slack channel (link below) to take part in the conversations and meetups.
I think we could start by telling others how much previous knowledge we have on this topic. I can start by saying that I began studying machine learning around the start of 2018 and have spent some time learning RL. I would describe myself as someone who knows what the different kinds of approaches are, but doesn't know how to use them. I hope that in the next 3-4 months I'll learn to code these from scratch.
I was thinking that if there are many people in the same position right now, maybe we could take a course and study it together, talking through the challenges we face along the way. Is this a good idea, and is anyone interested?
I believe that DeepMind and OpenAI are definitely good reference points for everyone who wants to get into RL. David Silver's courses are also interesting from a theoretical point of view, and Sutton & Barto recently updated their RL book as well.
I also did the Udacity RL course but didn't have the time/motivation to complete all the projects… it's a good course, though I didn't care for the project portion. I also got quite demotivated when I found there are very few job openings and real-life applications for RL, so I've switched my focus to computer vision for now.
Yeah, I would say that RL shows up in job-posting requirements much more rarely than general-purpose CV algorithms and tools. However, I'd guess it's fairly popular in areas involving robotics or self-driving cars, right?
From what I saw, RL is being looked at by Microsoft (I had an interview there) and other very large companies that can afford pure R&D, i.e. research that may not pay off for years. But I'm not aware of anyone using it for any practical application at the moment.
DeepMind/AlphaZero is arguably using it the most, but for games (Go, chess), which is where RL shines right now (stationary environments). It still has a ways to go before it can solve real-world use cases, hence the minimal hiring.
It's too bad, because RL is, in my opinion, the 'true AI': raw intelligence that learns the way people learn. The next most promising use case I've seen is robots in warehouses… but again, RL really needs a very fixed environment at the moment to be useful.
But what do you think about using RL on websites to optimize products, content, and other things? I'm not sure whether RL makes better predictions than a normal recommendation system.
Oh, for advertising there is definite use of multi-armed bandits (the Yahoo home page used to use them, I believe, among others), but I consider the multi-armed bandit a smaller subset of true RL (i.e. RL models the problem as an MDP, while a multi-armed bandit is more like an adaptive algorithm).
In theory, RL can outperform other recommendation systems if the environment is stationary… but the problem is that people's preferences are not static. You might be interested in product A and click on ads for it, but once you purchase it, your interest in ads for that type of product drops to zero, and the system doesn't initially know this has changed. For Netflix or similar services, where people's tastes are more stable, I think RL could do well.
This article has some discussion of multi-armed bandits vs RL: https://boliu68.github.io/2017/Reinforcement-Learning-versus-Bandit/
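To make the bandit-vs-RL distinction concrete, here's a minimal epsilon-greedy bandit sketch in Python. The ad arms and click-through rates are made up for illustration, not taken from any real system:

```python
import random

# Minimal epsilon-greedy multi-armed bandit sketch.
# The "arms" could be ad variants; true_ctr values are invented for illustration.
true_ctr = [0.02, 0.05, 0.03]   # hidden click-through rate per ad
counts = [0] * len(true_ctr)    # times each arm was shown
values = [0.0] * len(true_ctr)  # running mean reward per arm
epsilon = 0.1                   # exploration rate

for step in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(len(true_ctr))                     # explore
    else:
        arm = max(range(len(true_ctr)), key=lambda a: values[a])  # exploit
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0      # simulated click
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]           # incremental mean

print(values)  # estimates should approach true_ctr
```

Notice there's no state here: each impression is independent, and the bandit just learns which single action pays best on average. That's exactly the sense in which it's "more like an adaptive algo" than a full MDP, where today's action changes tomorrow's environment.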
I would say that DeepMind's experiments in building universal game controllers and virtual environments look promising. And if we're talking about deep learning in general, we also have a kind of "static" environment there: a network trained to recognize objects doesn't work well if we show it a dataset with completely new properties, I guess.
But I agree with your point. Dynamic and non-deterministic environments are quite a challenging topic. Sure enough, these game experiments are probably just the very beginning of the research process, but probability theory also started out being used to predict dice outcomes and compute odds in card games.
I was thinking the other day: can we treat the simplest RL as a supervised learning model with a different loss function? For example, say we have a supervised model that predicts which football team is going to win; the loss function would be how far our prediction is from the real result. If we then want to bet real money on these teams, we should somehow calculate the bet size from the model's accuracy and how much money we have. This could be solved with RL. So what if we modify the original model to output a bet size instead? If it's negative, the absolute value is staked on team B, and if it's positive, the bet goes on team A. The loss function could then be how much money the model loses or wins: if it stakes 20 units on team B and team A wins, the loss is 20, but if B wins, the "loss" might be -30 (a gain, depending on the odds).
Can reinforcement learning be trained this way? I have never seen anyone do this, but in my mind it should work. I haven't had time to test it yet, but if someone is interested in seeing the results, I can help with that.
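For anyone curious, here's a rough sketch of how that could look in PyTorch. The network, features, odds, and outcomes are all hypothetical placeholders I made up, and I haven't tested whether this actually trains to anything useful:

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the idea above: the network outputs a signed bet size
# (positive = bet on team A, negative = bet on team B) and the "loss" is the
# money lost on the bet. All names, odds, and data are invented for illustration.

class BetSizer(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Linear(32, 1), nn.Tanh(),  # bet size in [-1, 1] units of bankroll
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def money_loss(bet, a_won, odds_a, odds_b):
    """Money lost on the bet (negative = profit).

    bet: signed stake; a_won: 1.0 if team A won, else 0.0;
    odds_a / odds_b: hypothetical decimal odds for each team.
    """
    stake = bet.abs()
    # profit if our side wins, otherwise we lose the stake
    win_profit = torch.where(bet >= 0, stake * (odds_a - 1), stake * (odds_b - 1))
    our_side_won = torch.where(bet >= 0, a_won, 1 - a_won)
    profit = our_side_won * win_profit - (1 - our_side_won) * stake
    return -profit.mean()  # minimizing lost money = maximizing profit

# usage sketch with fake data
model = BetSizer(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 10)                     # fake match features
a_won = torch.randint(0, 2, (64,)).float()  # fake outcomes
odds_a = torch.full((64,), 1.9)             # made-up bookmaker odds
odds_b = torch.full((64,), 2.1)
loss = money_loss(model(x), a_won, odds_a, odds_b)
loss.backward()
opt.step()
```

Strictly speaking, this is closer to a one-step contextual bandit with a custom loss than full RL, since there's no state carried between bets, which loops back nicely to the bandit discussion above.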
IF YOU HAVE ANY INTEREST IN LEARNING RL, NOW IS A GREAT TIME! We are going to start reading Sutton's book, and anyone interested can join our Slack group to take part in the conversations and meetups. We start slowly, so anyone can join easily over the next couple of weeks. It's great for everyone, no matter how much background knowledge you have in this area. The book is easy to read and free, so there's no excuse not to start learning RL right now.