Fastai2 vs pytorch-lightning ... pros and cons? integration of the two?

I’d agree. I’ve done plenty of research with fastai (literally the last year and a half), and I also converted over a few professors at my university to use fastai instead of Keras for their research with their students


Yah … it seems like a shitty oversimplification that isn’t even factual. Kinda like a White House Press briefing.


You can’t get impartial comparison from the person who built one of those libraries, which is why I’m personally refraining from commenting.
I’d be curious to have the feedback of @lgvaz since I think I saw on Twitter he used PyTorch Lightning for a project on object detection.


Concerning TPU support and distributed training (across multiple machines and multiple GPUs) … where should I look in the fastai2 docs?

I don’t think the TPU support is there but I could be wrong.

TPU isn’t currently supported. For distributed training:


Ow booy, that is a tough spot you put me in @sgugger :rofl: :rofl:

Let me get used to lightning a bit more and I’ll share my thoughts here :smile:


I am a core developer of the unofficial fastai2 audio extension and have also recently experimented with pytorch-lightning in one project, so I’ll try to do an unbiased comparison of the two libraries.

First of all, note that they have different purposes. Lightning is a lightweight library focused on the training loop, and tries to make the engineering aspects (like logging or distributed/TPU training) trivial while giving full flexibility in the way you write your research code. On the other hand, fastai is a powerful framework that tries to integrate best practices into all aspects of deep learning training, from data loading to architectures and also the training loop.

Pros about fastai2:

  • Modern best practices are constantly implemented

  • Datablock API is wonderful to load data

  • Huge variety of applications ready to be used

Cons about fastai2:

  • The library is strongly opinionated on how things should behave, down to the level of changing how python works. This introduces huge friction when you want to do something new or different.

  • It’s hard to integrate with other libraries in the pytorch ecosystem. More than once I’ve seen people reimplement code in fastai2 because it’s easier.

  • Error messages in fastai2 have really degraded from fastai1. Often they are a couple of pages long, and it’s hard to tell where the problem actually is.

Pros about pytorch-lightning:

  • No friction at all to use with other libraries. Just import them and do your thing, no need to use wrappers or rewrite code.

  • Automatic experiment management is great: you just run the code a bunch of times with different hyperparameters and fire up tensorboard to easily compare results.

  • Larger community of core and active contributors

  • Plain pytorch, code is simple to understand and reason about

Cons about pytorch-lightning:

  • Mixed precision training is currently tied to NVIDIA apex (waiting on official torch.amp to stabilize)

  • Could have better integration with hydra


Thanks for the detailed comment!

So in your PL project … how did you assemble/create your Datasets/Dataloaders? How did you export everything needed for inference, and how did you actually code things up for inference?

fastai2 takes care of most of those bits for ya, whereas PL seems almost purely focused on one thing, the training/eval loop, leaving the rest up to you.

To add to this, I remember someone mentioning his company was trying to decide whether to go the pytorch or fastai v1 route. They ended up going with pytorch for easier maintainability (converting to fastai v2 down the road was an unknown). As said, each has pros and cons, and the user should figure out his/her needs.

That’s exactly it.

Check out asr/ for the Dataset definition; usage is at asr/ The Dataloader is the vanilla pytorch one, but I use a custom collate_fn and batch_sampler with it. I apply what would be the item_tfms while loading the items in the Dataset, and the batch_tfms are applied in the training_step inside the ASRModule.

My project is not at this stage yet, but a LightningModule works the same way as a pytorch nn.Module. Last time I checked, you load the checkpoint and use it the same as this.


I would say the following in terms of personal, unbiased feedback:

Too much indirection in fastai2. A lot of people love the datablocks API; I’m personally not the biggest fan, because my use cases most often revolve around tabular data, so I prefer dealing with the raw tensors. I can definitely see it being more useful for images though.

It was mentioned previously, but as a result of all the indirection, errors are not great, and just browsing through the code, even for fairly experienced developers, is not smooth.

The way I tend to use them is fastai2 where I need a good out-of-the-box solution to a known problem (fine-tuning for image classification, let’s say) vs lightning for anything I’d like to experiment with, research, and dive deeper into; I tend to fight the library too much in the fastai2 case.


@sgugger, @jeremy and fastai community. I’m a bit nervous to share my thoughts; these kinds of declarations can always generate resentment. While reading this please keep in mind that this is a very personal report of my experience with fastai and how I believe it can be improved.

I started writing this as a comparison between fastai and lightning, but it quickly developed into more of a piece of feedback than a comparison. In the text you’re going to find a lot of “I feel”, and that’s because all I wrote is just my opinion and I can be completely wrong, and that’s okay. I’m not trying to say what is better here.

A cold comparison

I feel like the libraries achieve different goals. Fastai feels to me like a final product, where I can go and quickly train models for all standard areas via a unified API. I know fastai also provides lower-level abstractions for researchers, but because fastai provides the entire system for actually getting a task done with deep learning (how to feed your data to the model (data block API), apply transforms to your data, visualize the data, fine-tune your model, use the model for inference…) it ends up being very opinionated, as @scart97 pointed out.

Lightning is a framework that aims to solve only the training part, granting the freedom (and hard work) to the developer to figure out the rest, which is why lightning feels to me like a tool for developing other libraries.

Lightning feels like high level pytorch, fastai is a whole new deal.

My personal experience.

I feel it’s important to say what context I’m coming from, so my comparisons become more understandable. All my thoughts regard fastai v2 exclusively since I don’t have much experience with v1.

First of all, I’m much more experienced in fastai than lightning. I have currently contributed 25 commits to fastai compared to a single PR (almost approved) in lightning. As I said before, I still need more experience with lightning for a more complete comparison, but I think I’m at the point where I can at least say something useful.

I felt forced to move away from fastai when I started diving into object detection.

I love to build libraries and for me this is the best and most enjoyable way of learning something, so quite naturally I started building an extension/library for object detection in fastai. It’s fair to say that at this point I was already well versed with the source code and most of the fastai abstractions and internals (at least regarding vision), so I was very confident of quickly building up a library.

Modern object detection models are quite different from any other application in the sense that they can handle differently sized images as inputs, so you cannot use the standard batch collation where you just resize all your images to the same size and stack them into a big tensor. Instead you just put all your images in a list and that’s it.

This modification is straightforward: you just need to create a new collate function and pass it to your data loader. Quite rapidly I got that working in fastai.
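As a sketch of that modification (the function name and target fields here are illustrative, not the author’s actual code): the collate function stops stacking and simply returns lists.

```python
import torch

def detection_collate(batch):
    """Keep differently sized images in plain lists instead of stacking
    them into one big tensor, which would require equal shapes."""
    images, targets = zip(*batch)
    return list(images), list(targets)

# two images with different spatial sizes can now share a batch
batch = [
    (torch.rand(3, 300, 400), {"boxes": torch.zeros(0, 4)}),
    (torch.rand(3, 200, 250), {"boxes": torch.zeros(0, 4)}),
]
images, targets = detection_collate(batch)
```

With a real dataset this would be passed as `DataLoader(ds, batch_size=2, collate_fn=detection_collate)`.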

The torchvision models are also quite unique in the sense that you need to pass your targets (a list of dicts) as an input to your model (as well as the images). It was very easy to achieve this functionality with the callback system.

Also, in fastai we have batch_tfms; these transforms assume your data is collated into a big tensor, so the first step was to modify this functionality to instead apply the transform to each individual item at a time. Okay, not hard.

So far it’s all sunshine and roses, my problem really began with everything else that was not training.

Currently, in my eyes, the biggest downfall fastai has, which I believe is the main cause of the other users’ complaints that “errors are cryptic” and “not smooth”, is the fact that fastai tries to be really adaptive to user input. Functions like tuplify, detuplify, is_listy, is_iter work really well in the beginning, but always end up working in unexpected ways in the long run. I used to love those functions as well, I used to use them in all my little projects, but they always, ALWAYS came back to haunt me.

Remember when I said the only thing I needed to modify in the DataLoader was the collate function? And that the collate function should return a list? Can you see the chaos brewing? My batch started to interact in unexpected ways with these “innocent” functions. And it was hard, very hard, to figure out what was happening, and where it was happening… After a lot of pain I figured out a solution, a hack, something non-sensical that I was forced to create to make peace with the library (and this process is what @feribg described as fighting with the library); I had to create something very strange just to satisfy the library requirements. For those who are curious, I created a class (called Bucket) that returned True to is_iter but False to is_listy. Black magic.
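To make the hack concrete, here is a simplified reconstruction of the idea (the `is_listy`/`is_iter` stand-ins below are deliberately naive versions written for illustration; fastcore’s real implementations are more involved, and this is not the author’s actual `Bucket` code):

```python
# Naive stand-ins for fastcore's helpers, for illustration only.
def is_listy(x):
    return isinstance(x, (list, tuple))

def is_iter(x):
    return hasattr(x, "__iter__")

class Bucket:
    """Holds a list of items but deliberately isn't a list or tuple,
    so listy-detecting helpers leave the batch alone, while iteration
    (e.g. unpacking the batch) still works."""
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)

b = Bucket([1, 2, 3])
assert is_iter(b) and not is_listy(b)  # iterable, but not "listy"
```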

Fastai also introduces semantic types for tensors, and pytorch is not really well equipped to deal with these, so a lot of complexity needed to be introduced. When the DataLoader collates a batch it tries to preserve its types; this works well for collated tensors, but not so well for lists of tensors. I again had to hack around it.

Unlike the is_listy kind of functions, though, the semantic types might be irreplaceable; the benefits they add are huge.

So this was the general flow: hack, error, hack, error, hack… A long fight between me and the library that ultimately resulted in me moving away.

What did I lose when I moved to lightning?

The datablock API was not very useful for my case; object detection datasets come in very different formats, so I ended up having to write my custom “parsers” anyway. Very quickly I got my data ready to be fed to the model in lightning.

The transforms that fastai uses are awesome, especially the batch transforms (I think currently no other library works like that? I can be wrong, I did not look around that much). But as I said, batch transforms only work if you have a big collated tensor, so no benefit for me there. I ended up using albumentations (I actually made the library in a way that you could use any transform library you like, so technically you could use the fastai transforms as well).

Learner: this is the main point of loss. Functionality like differential learning rates and easy freeze/unfreeze (fine-tuning) training were some of the things I had to re-implement, but it was not too much work.
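For anyone curious what re-implementing those bits looks like in plain pytorch, here is a minimal sketch (the two-part model and the learning rates are made up; fastai’s actual `freeze`/fine-tuning machinery does more than this):

```python
import torch
from torch import nn

# stand-in for a transfer-learning setup: pretrained "body" + fresh "head"
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
body, head = model[0], model[2]

def set_requires_grad(module, flag):
    # freeze/unfreeze by toggling gradient tracking on the parameters
    for p in module.parameters():
        p.requires_grad_(flag)

set_requires_grad(body, False)   # "freeze": train the head only at first
set_requires_grad(body, True)    # "unfreeze" for full fine-tuning

# discriminative learning rates via optimizer param groups
opt = torch.optim.Adam([
    {"params": body.parameters(), "lr": 1e-5},  # gentle updates for pretrained layers
    {"params": head.parameters(), "lr": 1e-3},  # larger updates for the new head
])
```

Param groups are the standard pytorch mechanism here; fastai’s `Learner` layers schedules and layer-group splitting on top of the same idea.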

Metrics. Currently lightning has no default support for metrics, so I had to implement my own logic. This is changing though, lightning is already working on this.

Everything else, coding style, repo maintenance, and stuff

A library is much more than its interface; every decision that is made behind the scenes is as important as the final product, and I would like to talk about that.

The fastai coding style

This is the hardest point for me to write, because I’m unsure. This haunts me everyday when I’m building my own libraries, I navigated both extremes and I cannot decide what is best.

Fastai and lightning differ a lot in coding style. Lightning adopts the more standard “easy to read, explicit, pep8” style, while fastai has one of its own.

I’ll not talk about the coding style details, but rather the implications they have (for the details take a look here for fastai and here for lightning).

Fastai’s coding style is denser; it makes heavy use of abbreviations and of very specialised functions that simplify common tasks by a lot (take a look at L as an example). I personally feel the fastai coding style puts developer productivity at its highest: it tries to minimize the vertical space used by your code, so you can understand what is happening with a single glance over the code, without the need to scroll around.

But all of this comes with a price: a steep learning curve and a lot of baggage. More than once I’ve heard people say “I cannot understand fastai source code”, because to understand it, you need to be used to it. Once you understand it though, it’s incredible and very simple; the time you put into it really pays off.

But we have to keep in mind that this is not a single-person project; this is open source, and open source benefits only start flourishing once people get involved and start contributing. I personally think we should make the contribution barrier as low as possible, and this is what we see with the lightning style. I feel there is a balance to achieve between “main developer productivity vs code complexity”.


I got very excited when I first heard about nbdev and quickly left my IDE and switched all my workflow to it. In the beginning everything is cool and new, so it’s easy to ignore the obvious flaw: Jupyter has a crap interface for developing libraries. It works great for exploring your data and training your models, but it’s not an environment you want to spend all day developing your code in, for obvious reasons.

First of all, it sits in the browser, so shortcuts are limited and the editor flow cannot ever match the quality of a proper IDE. It induces bad code practices, like writing big extensive files that are hard to move around in (going in the opposite direction of the fastai coding style); I tried the ToC extension, but that is still bad compared to having small files that you can quickly fuzzy search. It’s much harder than necessary to jump around files, and there is no way of quickly jumping between function/class definitions (yeah, yeah, you can do ?? but that is crap compared to actually going to the file where the function is defined). The editor does not help you even with very simple stuff like “hey, this variable name was not declared”.

The problem is not nbdev itself but the interface around it; maybe nbdev is ahead of its time, and we first need a solid notebook interface before we all start building libraries in notebooks. Maybe Jupyter lab is becoming that, but it still runs in the browser, so idk.

I was trapped using nbdev thinking it made me more productive, until I was forced to go back to my IDE (to submit a PR to lightning) and I felt what freedom was like again. And at the end of the day, everything I could do with a notebook I can do with an IDE; the only thing you need is a REPL interface (the python console in pycharm). This way you can send code from the file you’re working on to the REPL to be executed (think of it as an automatic copy and paste into an IPython shell), keeping the interactive way of programming (which is the key to notebook productivity) and the good interface.


I’ll drop the bomb: it’s hard to contribute to fastai, and I tried really hard. One of the problems is that there is no board with what needs to be fixed, what needs to be implemented, what needs to be discussed. Most of the contributions I’ve done were fixes to problems I encountered while developing my own libraries.

Fastai is a much bigger project than lightning; as I previously said, it tries to cover the entire pipeline for solving a problem with deep learning. I would love to see a future where researchers start to use fastai as their main tool of development. Just imagine the speedup we would see if all researchers used a unified tool with all the best practices already available! Because of fastai’s nature it would be extremely easy to mix and match parts of different papers; testing new ideas would be easier than we’ve ever seen before. I come from the Deep RL space, and some very simple practices are still not used in the field, simply because it’s not worth implementing them.

But for this to happen we need researchers to migrate to fastai, and they need to trust fastai. All those complaints about “fighting with the library”, “cryptic error messages”, and in some cases “confusing documentation” need to go away, and for that to happen we need more contributors ASAP. We need both more contributors from the community, and more core contributors from the fastai team.

Fastai is already losing a lot of momentum; it’s not the best library for NLP anymore, we still don’t have object detection (and derivatives), and we have people unofficially working on audio and time series but I don’t think there is a plan for officially integrating those into the library.

Go to the lightning repo, open the issues tab, search for the label “help wanted” and you find 186 open issues. Too overwhelming? Filter for “help wanted” and “good first issue” and you get twelve. Want to understand what new features are being worked on for the next releases? Go to the milestones page to find exactly that.

Take a look at their PRs: they need 3 core contributor approvals before merging, and they have a lot of discussion on how the interface should look and behave, what should be added, and what shouldn’t. To be clear, I don’t speak for fastai v1 because I was not active back then, but I don’t see this happening in v2.

Now, there is a very serious and important question we have to answer. Is it possible to have a “one does all” library for deep learning given the current scenario? Or is DL simply moving too fast for that? What does fastai want to be? A library with decent baselines, or a library with SOTA baselines?

If the answer is we want SOTA baselines, we need SOTA to be developed with fastai and not replicated by fastai. That’s the only way of maintaining the pace with this ever more rapidly changing technology.

I cannot even imagine the amount of pressure and stress @sgugger and @jeremy are facing right now: there is the book, the course, the libraries, the forums, kids and the dreadful pandemic. Please know that you two are my inspiration. You must be among the select number of people who have access to a time machine, for I look at all that you have built and I find it astonishing that it was built mainly by two people (I say mainly because there are always other people I’m failing to cite (and I’m sorry for that) and the community that plays an essential role in this, but the pillar is always the two of you).


I have to agree with @lgvaz, 100% nailed it.

In my experience FastAI is great when the problems you are working on are already solved in the library, like image classification or using unet for segmentation, etc… If it’s a topic that has been covered extensively in the lessons, FastAI does great at it, but as soon as you need to start modifying the internals it’s just a big mess, plus the fact that the API is constantly being refactored doesn’t help. When you need to change the behavior of the API or add new functionality, it’s better to just go back to using pure pytorch and/or pytorch lightning; that’s why they mentioned it’s better for research.

Again, that’s just my opinion. I don’t necessarily think it’s one or the other; I personally use all 3: pure Pytorch, pytorch lightning, and FastAI. They’re all pytorch in the end, so there’s room for both to coexist. My main issue with FastAI is, and will always be, that it is a new API. So for a library that is built on top of Pytorch, it sucks that it doesn’t play nicely with pytorch.

For example, if I could just take the Data block api and use it with my super advanced training loop that I created, or someone else already created, it would make a huge difference. Or if I’m using a custom dataloader that has a new sampler and requires unique input/output combinations, but I could just use that dataloader with fastai.

That’s my ideal FastAI library: one that simplifies some things but always plays nicely with Pytorch and lets me go back to it at any point. For now I’ll just use it for those problems it excels at, like image classification.



After spending the last week with lightning, I think this is the thing that concerns me the most right now, especially given Sylvain’s recent departure to huggingface … and that is: Who is maintaining fastai?

The last commit was almost a month ago, Jeremy doesn’t seem as involved any more with the maintenance of the library (that could be temporary, I dunno), and Sylvain, who seemed to be the sole maintainer and primary developer, is gone. It honestly makes me a bit nervous as a software developer wanting to use fastai in a number of projects. I wonder, will it be around in 6 months? In a year? Will the API in the next version be as radical and breaking as it was from v1 to v2? Is it really just an educational framework subject to constant change as alleged by the pytorch-lightning folks?

I love fastai, I love the framework (hell, I’m deeply invested in it), but this question really needs to be answered before I personally can have confidence in the long-term durability and stability of this library. I would suggest what is needed is something similar to what the lightning folks are doing in terms of having a team of core contributors/owners who manage the vision and development of the library.

Just my 2 cents. I’m still playing with PL; parts of it I like, but there is A LOT that isn’t there that I’d have to re-implement that I otherwise wouldn’t have to with fastai. My concern is more about the future, vision, and stability of the latter.


@jeremy @sgugger, what are your thoughts on this point? Would you take into consideration something like having a core set of devs who can merge after ‘x’ approvals? (Who it would be would be up to your discretion of course.) I like the idea, and I think it takes a bit of stress off of you and Sylvain as well :slight_smile:


For this, it’s mostly Jeremy’s decision.


I had heard of lightning but not tried it. I am intrigued to try it now.
Where a problem looks a bit curly and not fit for fastai I generally go to Catalyst which is very flexible, or pytorch itself.

As far as I know, fastai2 is still in the development stage and not yet officially released to the public. With no active maintainers to the library, when will fastai2 launch as the new “fastai” library, and who will maintain it then? I already see posts related to the fastai library getting very few or no replies in recent days. It makes me feel like community involvement in the forums has decreased dramatically. I love the fastai2 library but sometimes I get stuck and hit a dead-end; 2-3 months ago this wasn’t the case: if I got stuck I would ask in the forums, people responded, and somehow I would figure it out with their help, but now I just feel lost.
I tried using Pytorch lightning and it doesn’t suit me. I’m now concentrating on plain Pytorch accompanied by fastai functions, but I find myself implementing code that already exists, and I’m sure I will waste a lot of time by messing up something that I could have easily avoided by using fastai2.

As @wgpubs asks

I wonder the same.


For those concerned, Jeremy voiced what the future will be here, and is taking questions :slight_smile: Update on fastai2 progress and next steps


Since I have used FastAI 0.x and 1.x for my research in the past and switched to lightning before v2, I’ll also share my experience with the libraries. Note that I have only briefly looked at FastAI 2, but I feel like the points I make still apply. Also, I only have experience with the library parts regarding images (which I abused for volumes a bit), no NLP or tabular.

  • I always enjoyed the data loading and augmentation solutions in FastAI, since they made all of my data preparations very easy when I was working with GANs for images. When I switched my research to new datasets, which included 3D volume data, I started adopting FastAI for this data type. While I enjoyed most of the code I encountered in the library, I must say it often cost me a lot of time just to make my custom stuff nicely compatible with FastAI.
  • Modifying the training at a low level was mostly possible through callbacks in FastAI, but with fancy GAN training dynamics experiments I often found it way easier to just get full control over the training loop, which often led me to not use Learner anymore (or just hack it right in there).
  • As far as I can tell both the data loading and callback system have further improved in version 2, but ultimately I have not invested in learning V2 yet. At the moment it just seems that learning V2 takes quite some time, especially getting to know the library internals that I’m sure I’ll have to study again to be able to implement my research.
  • PyTorch Lightning so far seems to give me personally more freedom for my research stuff, since all it does is structure my code and take care of fp16 and distributed training. To put this into fair perspective, lightning does not help me at all with my data loading, but since I mostly work with data that is not a first-class citizen in FastAI either, not having to learn abstractions was a big deal. Also, I just have full control over my training loop.
  • There are definitely a few things I’m missing in lightning, like the LR finder, a lot of the utility functions, and easy control over learning rates for different layers.

In conclusion, I believe FastAI (from what I’ve seen, V2 even more) is an excellent library and it will probably be my first choice when I find myself working with 2D CNNs again, in classification or segmentation domains, especially when I can make use of transfer learning (since the tools for this are superb). However, as soon as I go with fancy data or have to modify very low-level stuff, I find the amount of library learning often too much. That learning aspect does not get better considering there has been a major library rewrite basically every year since I started using it. Having to learn the same library (often including internals) for the third time just did not seem reasonable to me when v2 launched, which is why I switched. I hope to see the library converging though, and when I find myself back in the domains where FastAI has invested the most, I’ll definitely give the new version a try.

If someone who knows neither library asks me which one to learn, I would send anyone working in the domains covered by FastAI to learn FastAI, especially if they are also looking for a course to learn about DL. If the person asking is a researcher who works with uncommon data and networks where you can’t easily benefit from transfer learning, I’d probably suggest lightning, because it’s less learning effort and still provides fp16, ddp and good logging.