Reinforcement Learning Study Group

Lankinen · February 3, 2019, 6:21am

That would be a great competition, but I don't have enough time or talent. If you are going to participate, it would be interesting to hear your solutions.

rnv93 · February 3, 2019, 5:38pm

I too started my journey in DRL not too long ago. I felt that participating in a contest would provide the motivation to pick up the necessary skills faster.
If anyone is planning to start a new project, I would be interested in joining.

Lankinen · February 5, 2019, 6:06am

We are going to put some example code on GitHub, and this is our group, which anyone can join.

If you are interested, it would be great if you could check the code for mistakes. The repository is currently empty, but we will add something soon.

rnv93 · February 6, 2019, 4:42am

Cool, will do.

rnv93 · February 6, 2019, 4:49am

This is another nice resource that I found online; the algorithms are implemented in PyTorch.

How do I add it to the list of resources at the start of this post?

Lankinen · February 6, 2019, 4:54am

@ingbiodanielh can you do it?

Lankinen · February 24, 2019, 10:51am

Hello everyone! We started this group a month ago, and now we have multiple active members. I'm writing this message because there is still time to join. We had our first meeting this Monday, where we discussed the first chapter of Sutton and Barto's book. It wasn't that important, because the first chapter is more or less an introduction to the book.

We will have another meetup the week after next, and the topic will be chapter 2 (the multi-armed bandit problem). We will have at least three amazing presentations about the topic. Now is the best time to join our Slack, download the free RL book, and start reading it. You don't have to say anything in the meetings, so if you just want to listen to the presentations, that is fine with us. If you want to dig deeper, you can reserve a topic and give a presentation about it.

Reinforcement learning is a very important topic to understand no matter what kind of machine learning stuff you are doing. It is good to have a basic understanding because some new techniques might use RL, for example in how the loss function is calculated. If a loss function relies on a simple RL technique, it can be hard to understand what the model is doing without understanding how that loss function works. I would call this fastai part 3, although you don't need anything from part 2 to understand it.

LessW2020 · May 25, 2019, 12:51pm

This paper on MCP (multiplicative control) looks like a big advance for DRL and robotics.

harikrishnanrajeev · September 28, 2019, 5:46am

Thank you for starting this group.
“Reinforcement learning is a very important topic to understand no matter what kind of machine learning stuff you are doing.”
Why is reinforcement learning important?

Lankinen · September 28, 2019, 6:51am

A lot of those game-playing AIs are built using reinforcement learning. There are also some business use cases, but it is unclear how big an impact it will have in the future. Personally, I recommend learning it because it is already achieving impressive results and might achieve even better ones in the future.

Smith06 · September 29, 2019, 8:51pm

I am not able to join the Slack group. Can you open it again?

gargomeister · October 1, 2019, 3:23pm

Hey @Lankinen! I would love to join this study group channel. Is it possible to get a valid link? Thanks!

Lankinen · October 1, 2019, 3:36pm

The group is not active anymore, but you can still read the messages, and if you are interested, maybe someone could restart it by hosting meetups.

https://join.slack.com/t/rl-studygroup/shared_invite/enQtNzY2OTczOTMxMTcwLTVhODRkZDMxMzhhNGQ3MTg3NTM4MmRlYmRjYjVmNDhjZjQ1ZmFmZTVlNDdlNjhjYWI2ZGVkYWU2ODMzYjY3Njc

Ninesouls · December 18, 2019, 12:34pm

Hi there! Great to have a forum like this.

A question regarding DQN: I've just started looking into it, and one thing I'm not sure I understand correctly is experience replay.

In experience replay, the state, the action, the next state and the reward are stored at each timestep. Later on they are sampled and used for training. This is supposed to decorrelate the data, which is generally strongly correlated in consecutive game timesteps. While this is true, it seems to me that it also breaks the correlations that are necessary for effective learning, because the true reward for an action usually lags the action by multiple timesteps.
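A minimal sketch of the replay memory described above, in plain Python. The names `Transition`, `ReplayBuffer`, `push` and `sample` are illustrative assumptions rather than the API of any particular DQN codebase; a real agent would store tensors and draw larger minibatches.

```python
import random
from collections import deque, namedtuple

# One stored experience; "done" marks the end of an episode.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size FIFO memory: once full, the oldest transitions are dropped."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up runs of consecutive timesteps.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# Usage: with capacity 3, only the last three of five pushed transitions survive.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 0.0, t + 1, False)
batch = buf.sample(2)
print(len(buf), [tr.state for tr in batch])
```

Sampling uniformly is what decorrelates the data; it is also exactly why a "fire" transition and its later "hit" transition usually end up in different minibatches.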

Suppose, for example, that we have a game where firing a missile costs 10 points, but if it hits, you gain 100 points. In any sample you draw of firing the missile, the network will see an immediate reward of -10. The samples in which the missile hits (several timesteps later) and you get the +100 reward might have a completely different action associated with them. Moreover, the sample in which the missile hits would probably not be selected in the same minibatch as the firing sample, so the two would not be used together in training. So how is the network supposed to learn to associate the firing with the reward?
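To make the missile example concrete, here is a toy sketch under heavy simplifying assumptions: a lookup table instead of a network, a hypothetical flight time of three steps, missiles that always hit, and minibatches of size 1. Every update sees a single randomly drawn transition, so firing and hitting are never trained together, yet the one-step Q-learning target `r + gamma * max(Q[s2])` lets the +100 flow backwards one state at a time.

```python
import random

random.seed(0)

FLIGHT = 3             # hypothetical: timesteps the missile spends in the air
TERMINAL = FLIGHT + 1  # absorbing state after the episode ends
WAIT, FIRE = 0, 1

def step(state, action):
    """Toy 'missile' environment: returns (next_state, reward, done)."""
    if state == 0:
        if action == FIRE:
            return 1, -10.0, False      # firing costs 10 points immediately
        return TERMINAL, 0.0, True      # waiting ends the episode with nothing
    if state < FLIGHT:
        return state + 1, 0.0, False    # missile still flying, no reward yet
    return TERMINAL, 100.0, True        # missile hits: +100, several steps after firing

# Fill a replay memory with transitions from random play.
memory = []
for _ in range(200):
    s, done = 0, False
    while not done:
        a = random.choice([WAIT, FIRE])
        s2, r, done = step(s, a)
        memory.append((s, a, r, s2, done))
        s = s2

# One-step Q-learning; each update uses one uniformly sampled transition,
# so the "fire" and "hit" transitions are never in the same minibatch.
gamma, alpha = 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(FLIGHT + 1)]
for _ in range(20000):
    s, a, r, s2, done = random.choice(memory)
    target = r if done else r + gamma * max(Q[s2])
    Q[s][a] += alpha * (target - Q[s][a])

print(round(Q[0][FIRE], 1), round(Q[0][WAIT], 1))
```

With gamma = 0.9 the exact value of firing is -10 + 0.9^3 * 100 = 62.9, and `Q[0][FIRE]` should end up close to that while waiting stays at 0. Whether this carries over efficiently to a neural network facing many misses is a separate question.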

One might suggest that in the state just before the hit, the network sees the missile close to the target and so learns to associate this with a reward; then it gradually learns to associate greater and greater distances between missile and target with rewards, and so on. This might be possible, but first, there might be very little signal drowning in a huge amount of noise, since many missiles miss, and if there are many timesteps between firing and hitting, you might be looking at an incredibly complex model. Second, even if it's feasible, it seems to me like a very roundabout way to learn something, one that would require a tremendous amount of training to distill the signal from the noise. Why not use as the reward the difference between the current score and the score after the maximum missile flight time? Sure, that would also add noise around the signal, but at least it would envelop both the firing and the hitting (or missing!) in the same reward, which to me seems a far easier way of getting the learning done. Alternatively, why not use a recurrent network such as an LSTM, which can store multiple states and learn from their combined results? What am I missing?
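The score-difference idea is close to what Sutton & Barto call n-step returns (chapter 7 of the second edition), and multi-step targets are used in some DQN variants such as Rainbow. Below is a minimal sketch of computing them for one finished episode; the function name and the `bootstrap_values` input (an estimate of max Q for each state, zero for the terminal state) are assumptions for illustration.

```python
def n_step_targets(rewards, bootstrap_values, n, gamma):
    """n-step return targets for every timestep of one finished episode.

    rewards[t]          -- reward received after the action at step t
    bootstrap_values[t] -- hypothetical estimate of max_a Q(s_t, a);
                           needs len(rewards) + 1 entries, the last one for the terminal state
    """
    T = len(rewards)
    targets = []
    for t in range(T):
        end = min(t + n, T)
        # Discounted sum of up to n real rewards...
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        # ...plus a bootstrapped estimate for whatever comes after them.
        g += gamma ** (end - t) * bootstrap_values[end]
        targets.append(g)
    return targets

episode_rewards = [-10, 0, 0, 100]            # fire, fly, fly, hit (toy numbers from the question)
values = [0.0] * (len(episode_rewards) + 1)   # no bootstrapping in this example
print(n_step_targets(episode_rewards, values, n=1, gamma=1.0))  # [-10.0, 0.0, 0.0, 100.0]
print(n_step_targets(episode_rewards, values, n=4, gamma=1.0))  # [90.0, 100.0, 100.0, 100.0]
```

With n = 1 the firing step only sees -10; with n spanning the whole flight it is credited with the net 90 points, the same as the score difference proposed above. The trade-off is that a larger n mixes in more of whatever happened afterwards, which is the extra noise anticipated in the question.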

Thanks!

AjayStark · August 21, 2020, 2:58pm

Hi,
Is anyone currently working on this topic, or learning RL from a MOOC?

TomHale · February 11, 2021, 1:15pm

I’m doing the Coursera RL Specialization. It also uses the Sutton & Barto book mentioned above.

@Lankinen the Slack links have expired. Could you please send a fresh one?

I’m looking for an online RL community. Does anyone have good IRC or Slack recommendations in general?

Lankinen · February 11, 2021, 1:33pm

It's not active anymore. I personally started reading the book again a few months ago, and I’m nearly finished. I recommend fastai’s Discord and this forum, as IMO those are the best AI communities. I don’t know of any RL-specific ones, at least not active ones.

cory.mosiman12 · April 6, 2021, 8:50pm

Hi @TomHale or anybody else. I'm also looking for folks to bounce ideas off as I work through implementing the algorithms from the Sutton & Barto book. Is there a specific channel on the Discord server for this, or what is the best way to discuss?

TomHale · April 7, 2021, 8:13am

I’m just about to look for the fast.ai Discord myself. Perhaps I'll see you there.