Reinforcement Learning Study Group

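Oh, for advertising there is definite use of multi-arm bandits (the Yahoo home page used to use them, I believe, among others), but I consider the multi-arm bandit a smaller subset of true RL. That is, RL models the problem as an MDP (Markov decision process), where actions change the state the agent ends up in, whereas a multi-arm bandit is more like an adaptive algorithm with no state transitions: every pull faces the same situation.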
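To make the "adaptive algorithm, not an MDP" point concrete, here is a minimal epsilon-greedy bandit sketch for picking among ads. It assumes three hypothetical ads with made-up click-through rates (`true_ctrs`) and a hypothetical helper `epsilon_greedy_bandit`; it is an illustration, not what Yahoo or anyone else actually ran. Notice that the agent keeps only one value estimate per arm and never tracks a state.

```python
import random

def epsilon_greedy_bandit(true_ctrs, steps=20_000, epsilon=0.1, seed=0):
    """Epsilon-greedy multi-arm bandit over ads with hypothetical click-through rates.

    There is no state here: the agent keeps one running value estimate per arm,
    and showing an ad never changes the situation faced on the next step.
    """
    rng = random.Random(seed)
    n_arms = len(true_ctrs)
    counts = [0] * n_arms      # how many times each ad was shown
    values = [0.0] * n_arms    # sample-average click rate observed per ad
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: random ad
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < true_ctrs[arm] else 0.0  # simulated click
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]     # incremental mean update
    return values, counts

# Hypothetical click-through rates; ad 1 is the best one.
values, counts = epsilon_greedy_bandit([0.02, 0.10, 0.04])
print("impressions per ad:", counts)
print("estimated CTRs:", [round(v, 3) for v in values])
```

A full RL formulation would instead condition the choice on a state (e.g. what the user has already seen or bought) and account for how today's action changes tomorrow's state.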
In theory, RL can outperform other recommendation systems if the environment is stationary, but the problem is that people's preferences are not static. You might be interested in product A and click on ads for it, but once you purchase it, your interest in ads for that type of product drops to zero, and the system doesn't initially know that it has changed. For Netflix or similar services, where people's tastes are more stable, I think RL could do well.
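The "system doesn't know it's changed" problem can be illustrated with a tiny sketch comparing two ways of estimating one user's click rate. It assumes a hypothetical user who clicks every 4th ad for product A over 400 impressions and then, after buying, never clicks again for 200 impressions; the helper `track_preference` and `alpha=0.1` are illustrative choices. The sample average (step size 1/n) implicitly assumes a stationary preference, while a constant step size is a standard way to weight recent feedback more heavily so the estimate can follow drift.

```python
def track_preference(rewards, alpha=0.1):
    """Compare two estimates of a user's click rate on one (hypothetical) ad category.

    sample_avg weights every past impression equally, so it barely reacts when
    interest drops; the constant step-size estimate weights recent impressions
    exponentially more, so it forgets the old preference much faster.
    """
    sample_avg, recency = 0.0, 0.0
    for n, r in enumerate(rewards, start=1):
        sample_avg += (r - sample_avg) / n  # step size 1/n: assumes a stationary preference
        recency += alpha * (r - recency)    # constant step size: tracks a drifting preference
    return sample_avg, recency

# Hypothetical user: clicks every 4th product-A ad for 400 impressions,
# then buys the product and never clicks again for 200 impressions.
rewards = [1.0 if i % 4 == 0 else 0.0 for i in range(400)] + [0.0] * 200
sample_avg, recency = track_preference(rewards)
print(f"sample average: {sample_avg:.3f}, recency-weighted: {recency:.6f}")
```

Even the recency-weighted estimate lags: with `alpha=0.1`, ten straight non-clicks only shrink it to about 0.9^10 ≈ 0.35 of its previous value, so the system keeps showing the ads for a while before it "notices" the purchase.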
This article has some discussion of multi-arm bandits vs. RL:
https://boliu68.github.io/2017/Reinforcement-Learning-versus-Bandit/
