Plain PyTorch implementation of fast.ai notebooks

Looks great!

I was thinking of doing the same, but probably after this course ends, since it’s already a handful to follow the course, work full time, and try things out with my own datasets.
If you’ll start this project sooner, I’d like to follow and maybe join closer to the end of the live course.

That sounds interesting from an educational point of view.
I’ve started something similar in this repo:

It is just a bunch of PyTorch scripts and a couple of classes. I wanted to implement something simple, like a fastai_core, which would include only the training loop and callbacks, so we could use an arbitrary iterable, dataset, augmentation library, etc.

However, it is quite a time-consuming task :smile: The fastai library includes a lot of very advanced stuff, especially when it comes to RNNs and YOLO-like detectors, which are not simple to replicate. It is not too difficult to implement a basic training loop and callbacks (there are plenty of examples in the PyTorch docs), but getting the same loss/accuracy values as the fastai code is probably not that simple.
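To make the "basic training loop and callbacks" idea concrete, here is a minimal sketch in plain PyTorch. The class and function names are my own, not fastai's, and this is only the skeleton of the abstraction:

```python
import torch

class Callback:
    """Base class: subclass and override only the hooks you need."""
    def on_epoch_begin(self, epoch): pass
    def on_batch_end(self, loss): pass
    def on_epoch_end(self, epoch): pass

def fit(model, loss_fn, opt, train_dl, epochs, callbacks=()):
    """Minimal training loop: any iterable of (xb, yb) batches works."""
    for epoch in range(epochs):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        model.train()
        for xb, yb in train_dl:
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()
            for cb in callbacks:
                cb.on_batch_end(loss.item())
        for cb in callbacks:
            cb.on_epoch_end(epoch)
```

The real fastai callbacks are far richer (they can modify the loss, skip steps, change hyperparameters mid-training, etc.), but this shows the basic shape of the loop that everything else hangs off.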

I think that a “bi-directed” approach to learning is a good thing. When you are doing real data science (competitions, data analysis, building apps and solutions), use something proven and tested, extend the library, etc. But if you feel ready to dive deeper, start implementing things from scratch, or at least with fewer levels of abstraction.

Anyway, I’d be glad to help by contributing to fastai itself, or to any other interesting initiative within the PyTorch ecosystem :wink:

7 Likes

@pnvijay is part of our Asia study group. I think Vijay will be the best person to check with for the porting ML course project. A few of us are also studying the ML course right now. We have @PegasusWithoutWinds and @Taka who I think will be interested in this as well.

IMO, a smarter way to spend time is to pick a small piece (like the basic training loop) from the full fastai library, give your best attempt at implementing it from scratch using Python and PyTorch, and write proper documentation along the way. I think this should be more manageable. It is even useful for beginners: it will help them internalize their learning.


Update 1:

To give an example of the kind of ‘mini-docs’ that I find useful:

6 Likes

I think this is a nice idea and something I’m keen to do. I definitely agree on starting simple though, for example just understanding what is imported and what depends on what. Then, as you build up from a simple resnet34, you can compare the impact of each part on performance as you add it.

Very interesting topic! Count me in. I tried to reverse-engineer the tabular module in pure PyTorch; I think that module is, at the moment, less challenging than the others (vision, text). Now I am trying to dig deeper into vision and the optimizers.
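As a sketch of what a pared-down tabular model can look like in plain PyTorch (the names and sizes here are illustrative, not fastai's actual TabularModel): one embedding per categorical column, concatenated with the continuous columns and fed through a small MLP.

```python
import torch
from torch import nn

class TabularNet(nn.Module):
    """Minimal tabular model: embeddings for categorical columns,
    concatenated with continuous columns, then a small MLP."""
    def __init__(self, emb_sizes, n_cont, n_out, hidden=64):
        super().__init__()
        # emb_sizes: list of (cardinality, embedding_dim) per categorical column
        self.embeds = nn.ModuleList(nn.Embedding(card, dim) for card, dim in emb_sizes)
        emb_dim = sum(dim for _, dim in emb_sizes)
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + n_cont, hidden), nn.ReLU(),
            nn.Linear(hidden, n_out),
        )

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_categorical) int indices; x_cont: (batch, n_cont) floats
        x = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        return self.mlp(torch.cat([x, x_cont], dim=1))
```

The real fastai version adds batchnorm, dropout, and embedding-size heuristics on top of this skeleton.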

@srmsoumya The fastai library is quite modular, and you can use any piece of it with your custom models. Jeremy will definitely explain some of the internals and encourage us to understand the structure, choose a problem (classification, recommendation, language modeling, sentiment analysis, etc.), and solve it with the fastai library.

I like this idea. It’s already been done by Jeremy: basically, the whole documentation at docs.fast.ai is generated from notebooks (Link).
Every function and method is there. Have a look.

Glad you found it useful. When I initially wrote it, I really didn’t know where I was going, but I definitely learned a lot about where the loss function is determined in the new fastai library. I will usually start a new post whenever I find something I don’t have a good grasp on, and if I manage to solve it, or to ask a well-thought-out question, I will post it.

I usually learn more when I am explaining what I’ve learned. I have also searched for errors on the forums and ended up on posts that I had started and forgotten about, so they can save your future self some headache time too.

5 Likes

I implemented SGDR and Snapshot Ensembling in PyTorch while working on one of my personal projects.
You can find the code here: https://github.com/jayeshsaita/Speech-Commands-Recognition
The code is documented and easy to understand.
Also, if anyone is interested in the project and wants to know more, I have written a blog post; you can find it here: https://towardsdatascience.com/ok-google-how-to-do-speech-recognition-f77b5d7cbe0b
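For anyone who wants the scheduler without the full project: recent PyTorch versions ship SGDR as torch.optim.lr_scheduler.CosineAnnealingWarmRestarts, so a minimal sketch looks like this (toy model; the hyperparameters are illustrative):

```python
import torch

# Toy model/optimizer; the interesting part is the scheduler.
model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# SGDR: cosine-annealed learning rate that restarts every T_0 epochs,
# with each cycle twice as long as the previous one (T_mult=2).
sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(opt, T_0=10, T_mult=2)

for epoch in range(30):
    # ... run one training epoch over the data loader here ...
    sched.step()  # anneal (and, at cycle boundaries, restart) the LR
```

Snapshot Ensembling then just saves a copy of the model weights at each restart boundary and averages the predictions of the saved snapshots at inference time.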

3 Likes

I’m doing the same thing myself on different parts of the library. Like @noskill said, the official docs already have all the notebooks created, and there are over 30 notebooks there. I think one way to be helpful to the docs at this point is to go through the notebooks and see if there are any errors (I’m sure there will be).

I think at this point re-implementing in PyTorch is mostly for our own understanding, and it’s an absolutely necessary step if we want to get a solid grasp of these tools. One can always dig deeper than PyTorch if re-writing in PyTorch is not enough :wink: If we find something that’s missing from the library, we’ll do pull requests to add/update.

What do you think?

1 Like

Great discussion! Here are some ways that you can learn a lot about the library, whilst also contributing to the community:

  • Pick a class, function, or method and write tests for it. For instance, here are the tests for fastai.core. Adding tests for anything without good test coverage is a great way to really understand that part of the library deeply, and to have in-depth conversations with the dev team about the reasoning behind decisions in the code.
  • Document something that is currently undocumented. You can find these by looking for the “new methods” section in any doc notebook. Here’s a search that lists them.
  • Add an example of use to the docs for something that doesn’t currently have one. We’d like everything in the docs to soon include an actual piece of working code demonstrating it. Currently we’ve largely provided working examples only for things higher up the abstraction ladder.
37 Likes

Agree, a good test coverage and thorough documentation would be really helpful. I guess the library will get a lot of contributors during the course.

1 Like

Yeah! Totally agree on this.

I would love to write tests for the library. However, I have no experience with it :smiley: I will search, but I’d appreciate it if someone could point me to some useful resources.

Then, when I’m finished, do I open a PR on GitHub? I have no experience with PRs either, so I’m sorry in advance if I bother you too much, @jeremy.

Thank you

That’s great! Here’s a starting point for you:

https://docs-dev.fast.ai/test.html

Easiest is to read the source for some existing tests, and play around with them to see how they work. And read the pytest docs or a tutorial.
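For a flavour of what a pytest test looks like, here is a tiny self-contained sketch. The helper being tested is a toy written here for illustration, not an actual fastai function; pytest simply collects every `test_*` function in `test_*.py` files and runs the bare `assert`s inside:

```python
# test_core.py -- run with `pytest test_core.py`.

def listify(x):
    """Toy helper: always return a list, whatever we are given."""
    if x is None:
        return []
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x]

def test_listify():
    # Plain asserts are all pytest needs; it reports which one failed and why.
    assert listify(None) == []
    assert listify(1) == [1]
    assert listify((1, 2)) == [1, 2]
```

The fastai tests follow the same pattern, just against the real library functions.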

That’s right. It’s wonderfully easy if you install hub:

https://hub.github.com/

And it’s certainly no bother to help folks wanting to help me! :smiley:

14 Likes

I’ve moved this to the ‘advanced’ category.

I have no experience doing any of this but I’m willing to roll my sleeves up and learn while helping!!

1 Like

That’s an awesome attitude! Just yell if you need any help. :slight_smile:

@shoof @jayeshsaita @cedric @devforfu I really like the idea of moving some of the key parts of the fastai library into plain PyTorch and, even more interestingly, extending some of the functionality. We could create both fastai-independent and fastai-dependent versions. If we enhance something sufficiently, we should open a pull request with the dependent version. I’m definitely up for helping with a few key parts of this. If we get at least two people to volunteer, we can split up responsibilities and check in on progress every now and then.

1 Like

That’s an interesting idea. However, I propose that we first make sure we really understand how the library works: helping with tests, digging into PyTorch, building custom torch.Dataset classes, applications, meaningful PRs, etc. Then, I believe, we’ll have enough expertise to make our own forks.

Of course, this is only my point of view, based on some experience with building training loops and data loaders; others are probably already quite fluent with fastai and PyTorch. Anyway, I’d be glad to participate in fastai development, its forks, or inspired-by libraries.
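On the custom torch.Dataset point: a minimal one needs only `__len__` and `__getitem__`, for example (a sketch wrapping made-up in-memory tensors):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ArrayDataset(Dataset):
    """Minimal custom Dataset wrapping in-memory tensors.
    __len__ and __getitem__ are the only two required methods."""
    def __init__(self, xs, ys):
        assert len(xs) == len(ys)
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return self.xs[i], self.ys[i]

# Random stand-in data; a real Dataset would load files, rows, etc.
ds = ArrayDataset(torch.randn(100, 3), torch.randint(0, 2, (100,)))
dl = DataLoader(ds, batch_size=32, shuffle=True)
```

Once the Dataset speaks this two-method protocol, DataLoader handles batching, shuffling, and multiprocessing for free, which is exactly the seam where a plain-PyTorch training loop can stay library-agnostic.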

3 Likes