Fastai reinforcement learning

Jumonji · May 31, 2019, 11:33pm

I know Jeremy isn’t interested in reinforcement learning because “it’s not practical and doesn’t apply to real-world problems.” Well… I beg to differ! For example: AI for wireless spectrum allocation and for autonomous vehicles of all kinds. In fact, any problem that doesn’t have a predefined best outcome is ripe for reinforcement learning. I don’t know what kind of transfer learning we can apply (yet), but I suspect that if we dig in a bit we’ll find something. Perhaps.

I see there is at least one group of fastai users/alumni who are studying reinforcement learning together from Sutton and Barto’s new online graduate-level “introductory” textbook, but I’m wondering whether anyone is creating, or is interested in creating, a fastrl extension to fastai in the excellent best-practices style that Jeremy and Rachel have pioneered. I found one fastrl project on GitHub, but it doesn’t seem to be related to fastai at all.

If there is sufficient interest (or maybe even if it’s just me), I’m up for forking fastai as fastrl on GitHub, defining the project goals (for discussion), and keeping it synced as fastai is improved.

Thoughts?

pem · June 1, 2019, 5:58am

I’d be very happy to see a fast.ai extension catering to reinforcement learning. As much as I’d like to help code it, I’m not sure I know enough of the theory to do it (I’m a beginner in machine learning, but I’m also really interested in RL). If this project starts up, I’m happy to try it out or maybe help with a few small features.

jcardonnet · June 2, 2019, 8:22pm

Do you have any links/references to the book and/or the study group you mention? I couldn’t find anything online about a new book by Hinton.

Regards,
Julian

matejthetree · June 2, 2019, 9:52pm

I would be willing to help on this

Jumonji · June 2, 2019, 11:52pm

Sorry - I meant to say Sutton & Barto, not Hinton. Wrong legendary research team.

Reinforcement Learning: An Introduction - Sutton and Barto

You can download the entire PDF for free. Don’t let the title fool you: it’s only an “introduction” in the sense that it’s a launching point for PhD research.

Jumonji · June 3, 2019, 1:54am

Ok, I’ll see if I can get the fork up this week and post a link here. I’m thinking we should use the OpenAI Gym for the test-bed problems to focus on. I also looked at the DeepMind and OpenAI work (like the Go and Dota 2 solutions), but it is way too complex to start with. The Gym is at least manageable, like the cart-pole balancing problem. Plus, the PyTorch tutorials actually show a version of this that uses a vision system to see the pole and learn to balance it, which might let us leverage the fastai vision subsystem. Thoughts?
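For anyone who hasn’t looked inside the cart-pole task yet, here is a minimal, dependency-free sketch of its dynamics. It mirrors the classic Gym interface of this era (`reset()` returns an observation, `step()` returns `(obs, reward, done, info)`). The physics constants and termination limits follow Gym’s classic `CartPoleEnv` as I recall them, so treat them as an assumption and check the Gym source before relying on them. `CartPoleSketch` is a hypothetical name, not part of Gym or fastai.

```python
import math
import random


class CartPoleSketch:
    """Hypothetical minimal cart-pole simulator with a Gym-style interface."""

    gravity = 9.8
    mass_cart = 1.0
    mass_pole = 0.1
    half_pole_length = 0.5
    force_mag = 10.0
    tau = 0.02  # seconds per step (explicit Euler integration)
    theta_limit = 12 * 2 * math.pi / 360  # about 0.21 rad (12 degrees)
    x_limit = 2.4  # cart leaves the track beyond this position

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.state = None

    def reset(self):
        # Start near upright with small random perturbations.
        self.state = [self.rng.uniform(-0.05, 0.05) for _ in range(4)]
        return list(self.state)

    def step(self, action):
        x, x_dot, theta, theta_dot = self.state
        force = self.force_mag if action == 1 else -self.force_mag  # 1 = push right
        total_mass = self.mass_cart + self.mass_pole
        pole_mass_length = self.mass_pole * self.half_pole_length
        cos_t, sin_t = math.cos(theta), math.sin(theta)

        temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (self.gravity * sin_t - cos_t * temp) / (
            self.half_pole_length
            * (4.0 / 3.0 - self.mass_pole * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

        # Euler integration of position/velocity and angle/angular velocity.
        x += self.tau * x_dot
        x_dot += self.tau * x_acc
        theta += self.tau * theta_dot
        theta_dot += self.tau * theta_acc
        self.state = [x, x_dot, theta, theta_dot]

        done = abs(x) > self.x_limit or abs(theta) > self.theta_limit
        return list(self.state), 1.0, done, {}  # +1 reward per surviving step


if __name__ == "__main__":
    env = CartPoleSketch(seed=0)
    env.reset()
    total = 0.0
    for _ in range(500):  # cap the episode length
        _, reward, done, _ = env.step(random.choice([0, 1]))
        total += reward
        if done:
            break
    print("random policy balanced the pole for", total, "steps")
```

With Gym installed, `gym.make("CartPole-v0")` gives the real environment with the same four-element observation; the sketch just makes the state and reward structure easy to inspect.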

One of the first things I/we have to do is really understand the fastai library so we can see what can be used, reused, and what styles we can emulate in creating the RL extensions. Fun stuff!

Jumonji · June 3, 2019, 5:34pm

If you could just help write test code, that would be great!

Jumonji · June 3, 2019, 6:01pm

Initial fork and project created. We’ll use the standard GitHub workflow, so I look forward to getting pull requests for your contributions. Contributors who submit high-value, quality code will be invited to formally join the project team.

Next steps are to define the next steps… I’ll be using a kanban-style workflow to manage development tasks. If you send me a pull request for code, be sure to specify which task it’s for (or whether it’s something totally new).


Cheers!

pem · June 4, 2019, 4:53am

Yeah, the OpenAI Gym should be a good start. Are you planning to include Gym as part of fastrl, or as something that has to be imported separately?

We definitely can use the vision subsystem, particularly the cnn_learner and vision.image.Image.

Some good first steps I think are:

extend DataBunch to allow adding new training data as the agent explores the environment
extend DataBunch to allow removing old training data. This should help implement the concept of “memory” in the future
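Both steps above describe what RL libraries usually call an experience replay buffer: a bounded store of transitions that accepts new data as the agent explores and drops the oldest data once it is full. Here is a minimal sketch in plain Python. `ReplayMemory` and `Transition` are hypothetical names, not part of fastai’s `DataBunch` API, and wiring it into a `DataBunch` is left open.

```python
import random
from collections import deque, namedtuple

# One step of agent experience.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayMemory:
    """Hypothetical bounded experience store: newest data in, oldest data out."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) silently discards the oldest item when full,
        # which gives the "remove old training data" behaviour for free.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, *args):
        """Append a transition observed while exploring the environment."""
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        """Draw a random minibatch (without replacement) for one training step."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random rather than in order breaks up the correlation between consecutive steps, which is the usual motivation for replay in DQN-style training.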

Just so we have a base to work with, I can try to create a notebook that plays cart-pole with Gym and existing fast.ai code. Maybe others can try the other games in Gym as well.

Jumonji · June 4, 2019, 1:15pm

Let’s not copy any more external code like Gym into the repo; instead, we can include instructions on where to find it and then have an import statement in our code. Make sense? That way we won’t have a copy that can get out of sync. We just have to list which version of Gym we tested with and/or update our code if there are breaking changes in Gym. (I’ve already had to sync up with fastai changes once since I forked it 2 days ago!)
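A minimal sketch of the “list which version of Gym we tested with” idea, using only the standard library’s `importlib.metadata` (Python 3.8+). The function name is hypothetical.

```python
import warnings
from importlib import metadata


def check_tested_version(package, tested_version):
    """Warn if the installed version of `package` differs from the tested one.

    Returns (installed_version_or_None, matches_tested_version).
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        warnings.warn(f"{package} is not installed; see the setup instructions.")
        return None, False
    if installed != tested_version:
        warnings.warn(
            f"{package} {installed} is installed, but fastrl was tested with "
            f"{tested_version}; breaking changes are possible."
        )
    return installed, installed == tested_version
```

Usage would be something like `check_tested_version("gym", "X.Y.Z")`, where `"X.Y.Z"` is a placeholder for whatever version the project actually records, not a version anyone in this thread reported.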

I like your code ideas! Which solution are you thinking of for the cart-pole problem? Perhaps we could create a general framework for a cart-pole solver and have hooks for implementing different solutions. I’d like to do the most popular ones, like Q-learning and SARSA.
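Q-learning and SARSA differ in a single term, which makes them a natural fit for the “general framework with hooks” idea. Here is a tabular sketch, assuming discrete states and actions (cart-pole’s continuous state would first need discretizing or a function approximator). Q-learning bootstraps from the best next action (off-policy), while SARSA bootstraps from the next action the agent actually takes (on-policy). The function names and the `target_fn` hook are hypothetical.

```python
from collections import defaultdict


def q_learning_target(Q, reward, next_state, next_action, gamma, actions):
    # Off-policy: bootstrap from the greedy (max-valued) next action.
    return reward + gamma * max(Q[(next_state, a)] for a in actions)


def sarsa_target(Q, reward, next_state, next_action, gamma, actions):
    # On-policy: bootstrap from the action the behaviour policy actually chose.
    return reward + gamma * Q[(next_state, next_action)]


def td_update(Q, state, action, reward, next_state, next_action, done,
              target_fn, alpha=0.1, gamma=0.99, actions=(0, 1)):
    """One temporal-difference step: Q(s,a) += alpha * (target - Q(s,a)).

    `target_fn` is the pluggable hook that selects the algorithm.
    Terminal transitions use the bare reward as the target.
    """
    if done:
        target = reward
    else:
        target = target_fn(Q, reward, next_state, next_action, gamma, actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
    return Q[(state, action)]


# Example: with Q(s1, 0) = 1 and Q(s1, 1) = 5, alpha = gamma = 1 and next action 0,
# Q-learning moves Q(s0, 0) to 5 (the max), while SARSA moves it to 1.
Q = defaultdict(float)
Q[("s1", 0)], Q[("s1", 1)] = 1.0, 5.0
td_update(Q, "s0", 0, 0.0, "s1", 0, False, q_learning_target, alpha=1.0, gamma=1.0)
```

Keeping the update loop fixed and swapping only `target_fn` is one way to get the “different solutions behind hooks” structure without duplicating the training code.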

We could build a DataBunch that feeds cart-pole data, for example, and that could be generalized to any data that is generated instead of coming from a file. We could have one version that provides the standard state vector that’s used in Gym, and another that provides it as image data, as in the PyTorch tutorial.
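A minimal sketch of “generated instead of read from a file”: a generator that keeps stepping an environment and yields transitions, with a pluggable `view` function so the same stream can emit either the raw state vector or a crude image. Everything here is hypothetical. `ToyCartEnv` is a stand-in with a Gym-style `reset()`/`step()` interface and made-up dynamics (not real cart-pole physics), and the 8×16 rasterization only shows the shape of the idea; the PyTorch tutorial renders real screen frames instead.

```python
import math
import random


class ToyCartEnv:
    """Hypothetical stand-in env: state is [cart_x, pole_angle], made-up dynamics."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def reset(self):
        self.state = [0.0, self.rng.uniform(-0.05, 0.05)]
        return list(self.state)

    def step(self, action):
        x, theta = self.state
        push = 0.1 if action == 1 else -0.1
        # The lean grows on its own; pushing toward the lean reduces it.
        theta = theta * 1.1 - push * 0.5
        x += push
        self.state = [x, theta]
        done = abs(x) > 1.0 or abs(theta) > 0.5
        return list(self.state), 1.0, done, {}


def vector_view(state):
    # The plain state vector, as a Gym environment returns it.
    return list(state)


def image_view(state, height=8, width=16):
    # Rasterize the cart as a bottom-row pixel and the pole as a line above it.
    x, theta = state
    grid = [[0] * width for _ in range(height)]
    col = min(width - 1, max(0, int((x + 1.0) / 2.0 * (width - 1))))
    grid[height - 1][col] = 1
    for r in range(1, height):
        c = col + int(round(r * math.sin(theta)))
        if 0 <= c < width:
            grid[height - 1 - r][c] = 1
    return grid


def transition_stream(env, policy, n_steps, view=vector_view):
    """Yield (obs, action, reward, next_obs, done), resetting after each episode."""
    state = env.reset()
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward, done, _ = env.step(action)
        yield view(state), action, reward, view(next_state), done
        state = env.reset() if done else next_state
```

Because the stream never touches disk, a training loop (or a DataBunch wrapper around it) can pull as many fresh transitions as it needs, and switching between state-vector and image inputs is just a different `view` argument.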

Jumonji · June 4, 2019, 2:05pm

I created a Slack workspace for us to brainstorm and critique fastrl ideas in detail, as more of a developer space. Let’s move the discussions there; please join only if you have decided to jump in and contribute:
https://join.slack.com/t/fastrl/shared_invite/enQtNjQzMzQxMjc2NzcwLThlOTcyMmFlMjY5YmM3YzcxY2U1ODZjMzA2ZWI2OWU5Yjg1ODI1NjE2N2YyNTVkZTg1NWJmNDcxYWM0NjhjZmU

matejthetree · June 4, 2019, 6:35pm

Ok. I am on my second run through the fast.ai v3 course, so I am getting a good hang of it, but I am far from a big shot. I have worked as a developer for the last 8 years, though, so I am sure I will be able to help.

Jay2020 · June 4, 2019, 8:24pm

Hi all, I’ve completed fast.ai v3 part 1, and I’m working through part 2 and also working on the OpenAI Spinning Up exercises. I would like to help with this effort too.

scitator · June 7, 2019, 7:45am

Hi,

If you are interested in reinforcement learning, you can try Catalyst, our high-level utilities for both DL and RL: https://github.com/catalyst-team/catalyst/tree/master/examples/atari
For now, it’s the most effective and full-featured framework for RL and distributed training based on PyTorch. It’s also reproducible, and it has been tested in NeurIPS RL competitions. So give it a try.

scitator · June 7, 2019, 7:48am

Last month I was also working on an open-source RL list. Maybe you will find it interesting: https://docs.google.com/spreadsheets/d/1EeFPd-XIQ3mq_9snTlAZSsFY7Hbnmd7P5bbT8LPuMn0/edit?usp=drivesdk

Jumonji · June 7, 2019, 8:46pm

Thanks! I will absolutely give it a try.

Jumonji · June 7, 2019, 8:47pm

Thanks! I’ll look at that for sure.

gmovr · June 12, 2019, 7:00pm

Thanks for the links, interested in this too. Would love to contribute in the future.

MJPansa · June 13, 2019, 6:00pm

I was just thinking of starting a project like this. Unfortunately, I have quite limited time at the moment: I have a lot of exams coming up, and I’m currently working through the Udacity Deep Reinforcement Learning Nanodegree. I will definitely follow this project and see when I have time to contribute.

Gabriel_Syme · June 14, 2019, 1:01am

This is a great initiative! I also think RL is the future of practical applications; I feel it can have even more cross-domain interactions than deep learning itself.

One issue I have with RL out there is that the examples, open-source code, etc., are not easily transferable to real-world settings. For example, most research happens in computer games, since they can easily generate ‘real world experience’. But that doesn’t help much in other domains. In my domain, Architecture, Engineering, and Construction, for example, it’s really not usable. There, things like Unity, which can create 3D environments of the world, and specialized software that generates environmental data for that world are required.

I guess what I’m trying to say is that I think the RL code should at first include just that: code implementations of the most important RL algorithms that allow people in different fields to run their own experiments, provided they can generate their own data and set up a relevant environment.
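One way to express this suggestion in code: algorithms depend only on a small environment interface, and each domain supplies its own implementation. Here is a sketch using `typing.Protocol` (Python 3.8+). The `Environment` protocol, the `run_episode` helper, and the toy `ThermostatEnv` (standing in for, say, a building-simulation environment) are hypothetical illustrations, not an existing fastai or Gym API.

```python
import random
from typing import Any, Callable, Dict, Protocol, Tuple


class Environment(Protocol):
    """The only contract an algorithm relies on (mirrors the Gym-style API)."""

    def reset(self) -> Any: ...

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict]: ...


def run_episode(env: Environment, policy: Callable[[Any], Any], max_steps: int = 1000) -> float:
    """Domain-agnostic rollout: works for any object that satisfies Environment."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        state, reward, done, _ = env.step(policy(state))
        total += reward
        if done:
            break
    return total


class ThermostatEnv:
    """Hypothetical domain env: keep a room near 21 degrees by heating or not."""

    def __init__(self, seed=None, horizon=24):
        self.rng = random.Random(seed)
        self.horizon = horizon

    def reset(self):
        self.t, self.temp = 0, self.rng.uniform(15.0, 25.0)
        return self.temp

    def step(self, heat_on):
        # Heating adds 1 degree per step; otherwise the room loses 0.5 degrees.
        self.temp += 1.0 if heat_on else -0.5
        self.t += 1
        reward = -abs(self.temp - 21.0)  # penalize distance from the setpoint
        return self.temp, reward, self.t >= self.horizon, {}
```

For example, `run_episode(ThermostatEnv(seed=0), lambda temp: temp < 21.0)` scores a simple rule-based policy; swapping in a learned policy or a different domain’s environment requires no change to `run_episode`.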

Gym helps a lot to test, play, experiment with, and visualize practical applications, but I have also noticed that people tend to stop there. The step from cart-pole to the real world is the hardest part. A clean code base that provides independent implementations of algorithms might help with that.

P.S. I have a very long categorized list of RL papers, if that’s helpful; I’m just hesitant to share it because it might be too long.

Kind regards,
Theodore.