Reinforcement Learning does not work yet

In the last lesson’s AMA, Jeremy was asked whether he had changed his mind about reinforcement learning and whether he will teach the topic next year. He said this has been a recurring request for the last 3 years and that he hasn’t changed his opinion on RL.

I was a fan of RL too until early 2018. I had planned to study it after DL and had collected books and courses for that. But fortunately I read a tweet by Andrej Karpathy and the article below, which changed my mind about RL within half an hour.

Sometimes I think that Twitter and forum posts are more important than books, since they collect a large number of experiences that I could never get from the few books I can read in my limited reading time. Books have their own advantages too: they go into more depth and the quality is usually better. But I would never be able to read the experiences of more than a few hundred book authors in my life.

I wonder, if I had not read that tweet, how many months or years of my life I would have wasted.

I would be glad if anybody interested in reinforcement learning commented on this article and on whether it changed their mind. If not, please share why.


Here are Andrej Karpathy’s insights on AlphaGo’s win, why it is a very special problem, and why most real-life problems are not suitable for such a solution:

Thanks for sharing this, fascinating read also for people like me who don’t have an opinion on or experience with reinforcement learning!

Thanks for linking to these articles Haider.

I also read Karpathy’s ‘pixels to pong’ article that you mentioned; it served as a good intro to RL.

I share your sentiments about the value of forum and blog posts vs. books. Well said!

Jeremy has an amazing track record and the fastai work is outstanding. I respect Jeremy’s decision to focus on areas he finds more promising for his goal of disseminating AI to everyone.

As for RL-ish areas, my personal view aligns with the “Looking to the Future” section of the Irpan post. It IS generally more difficult to get RL-ish stuff working in the real world. At the same time, there is amazing work happening in model-based “RL-ish” stuff, and I know of teams having success with bandit-type optimization (see the section above for examples).

Thanks for sharing this. Has there been any change in your thoughts on RL?