Fastai2 vs pytorch-lightning ... pros and cons? integration of the two?

Check out the notebooks the library was built on in here: https://github.com/fastai/fastai2/tree/master/nbs

For summaries, check learn.summary() and dblock.summary(). The rest are in the callbacks (gradient accumulation, checkpointing, early stopping); the logger is in the other callback notebooks too. And if you prefer HTML, look at https://dev.fast.ai (the docs), where each notebook has been generated into a documentation page.
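For orientation, a minimal sketch of how those pieces fit together; the `dls`/`path` objects and the exact callback arguments are my assumptions, so check the callback notebooks for the real signatures:

```python
from fastai2.vision.all import *

# `path`/`dls` are assumed: e.g. dls = dblock.dataloaders(path) from a DataBlock
learn = cnn_learner(
    dls, resnet34, metrics=accuracy,
    cbs=[
        GradientAccumulation(n_acc=8),                           # gradient accumulation
        SaveModelCallback(monitor='valid_loss'),                 # checkpoint best model
        EarlyStoppingCallback(monitor='valid_loss', patience=3), # early stopping
        CSVLogger(),                                             # logger
    ],
)

learn.summary()       # layer-by-layer model summary
dblock.summary(path)  # shows how the DataBlock processes one sample, step by step
learn.fit_one_cycle(5)
```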

3 Likes

It’s been interesting to see the push Lightning has made recently. I think it’s partially related to people’s reluctance to spend time learning a whole new API vs dabbling with a more lightweight one.

I haven’t used it, but from my (shallow) understanding, TPU support is the main thing it has that fastai v2 lacks (for now); for the rest, I believe fastai has it more or less covered.

Another very good framework is NeMo from NVIDIA, which includes very good free models for ASR and TTS.

1 Like

Recently I have used fastai v2 and pytorch-lightning in some small projects. I find that fastai2 is a useful tool that you need to take time to learn, while pytorch-lightning is more like a way to organize your PyTorch code. If you want to start a model from scratch, I prefer fastai2 because it offers great tools for dealing with datasets. If you want to modify someone else’s project for an experiment, I prefer pytorch-lightning because I get a better understanding of the project after moving it to pytorch-lightning.

1 Like

The only thing that makes intermediate users prefer pytorch-lightning over fastai2 is that pytorch-lightning is just plain Python + PyTorch. fastai2 comes with its own API design, and even after looking at the source code, fastai2 still looks mysterious. fastbook has one chapter explaining the mid-level API; it would have been great if there were more chapters like it, or similar resources explaining the API in detail.

5 Likes

This is from one of their comments on this intro video:

  • Lightning is built for professional researchers using PyTorch.
  • fastAI is built for students for the fastAI course (with PyTorch).

Seems to trivialize fastai as a library fine for learning if you are taking the fastai course … but that’s about it. If you want to do real research, I guess you use PyTorch Lightning or write everything out by hand??

Sounds a bit snarky imo.

5 Likes

I’d agree. I’ve done plenty of research with fastai (literally the last year and a half), and I also converted a few professors at my university over to using fastai instead of Keras for their research with their students.

4 Likes

Yah … it seems like a shitty oversimplification that isn’t even factual. Kinda like a White House Press briefing.

1 Like

You can’t get an impartial comparison from the person who built one of those libraries, which is why I’m personally refraining from commenting.
I’d be curious to have the feedback of @lgvaz since I think I saw on Twitter that he used PyTorch Lightning for a project on object detection.

6 Likes

Concerning TPU support and distributed training (across multiple machines and multiple GPUs) … where should I look in the fastai2 docs?

I don’t think the TPU support is there but I could be wrong.

TPU support isn’t there currently. Distributed: https://dev.fast.ai/distributed

2 Likes

Ow booy, that is a tough spot you put me in @sgugger :rofl: :rofl:

Let me get used to lightning a bit more and I’ll share my thoughts here :smile:

5 Likes

I am a core developer of the unofficial fastai2 audio extension and have also recently experimented with pytorch-lightning in one project, so I’ll try to do an unbiased comparison of the two libraries.

First of all, note that they have different purposes. Lightning is a lightweight library focused on the training loop, and it tries to make the engineering aspects (like logging or distributed/TPU training) trivial while giving you full flexibility in the way you write your research code. On the other hand, fastai is a powerful framework that tries to integrate best practices into all aspects of deep learning training, from data loading to architectures and also the training loop.

Pros about fastai2:

  • Modern best practices are constantly implemented

  • The DataBlock API is wonderful for loading data

  • Huge variety of applications ready to be used

Cons about fastai2:

  • The library is strongly opinionated about how things should behave, down to the level of changing how Python works. This introduces huge friction when you want to do something new or different.

  • It’s hard to integrate with other libraries in the PyTorch ecosystem. More than once I’ve seen people reimplement things inside fastai2 because that is easier than integrating an existing library.

  • Error messages in fastai2 have really degraded from fastai v1. Often they are a couple of pages long, and it’s hard to tell where the problem actually is.

Pros about pytorch-lightning:

  • No friction at all to use with other libraries. Just import them and do your thing, no need to use wrappers or rewrite code.

  • Automatic experiment management is great: you just run the code a bunch of times with different hyperparameters and fire up TensorBoard to easily compare the results (see the sketch after these lists)

  • Larger community of core and active contributors

  • Plain pytorch, code is simple to understand and reason about

Cons about pytorch-lightning:

  • Mixed precision training is currently tied to NVIDIA apex (waiting on official torch.amp to stabilize)

  • Could have better integration with hydra
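Regarding the experiment-management point above, here is a minimal sketch of what that workflow typically looks like. The LitClassifier model and the train_dl loader are placeholders I made up, and self.log comes from more recent Lightning versions, so treat this as illustrative rather than canonical:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

class LitClassifier(pl.LightningModule):
    """Tiny made-up model, just to show the experiment-management workflow."""
    def __init__(self, lr):
        super().__init__()
        self.save_hyperparameters()                  # lr is stored and logged
        self.net = torch.nn.Linear(28 * 28, 10)

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)                 # shows up in TensorBoard
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# train_dl is assumed to be any plain PyTorch DataLoader yielding (image, label) batches
for lr in (1e-3, 3e-4):
    logger = TensorBoardLogger("lightning_logs", name=f"lr_{lr}")
    trainer = pl.Trainer(max_epochs=5, logger=logger)
    trainer.fit(LitClassifier(lr), train_dl)

# then: `tensorboard --logdir lightning_logs` to compare the runs side by side
```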

16 Likes

Thanks for the detailed comment!

So in your PL project … how did you assemble/create your Datasets/Dataloaders? How did you export everything needed for inference, and how did you actually code things up for inference?

fastai2 takes care of most of those bits for ya, whereas PL seems almost purely focused on one thing = the training/eval loop; leaving the rest up to you.

To add to this, I remember someone mentioning that his company was trying to decide between going the PyTorch or the fastai v1 route. They ended up going with PyTorch for easier maintainability (converting to fastai v2 down the road was an unknown). As said, each has pros and cons, and users should figure out their own needs.

That’s exactly it.

Check out asr/data.py for the Dataset definition; usage is in asr/asr_module.py. The DataLoader is the vanilla PyTorch one, but I use a custom collate_fn and batch_sampler with it. I apply what would be the item_tfms while loading the items in the Dataset, and the batch_tfms are applied in the training_step inside the ASRModule.
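A rough sketch of that pattern (this is not the actual asr/data.py code; the AudioDataset name, the `items` list and the padding collate are placeholders to illustrate the idea):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

class AudioDataset(Dataset):
    """Illustrative stand-in for the real asr/data.py Dataset."""
    def __init__(self, items, item_tfms=None):
        self.items = items                      # e.g. list of (waveform, transcript) pairs
        self.item_tfms = item_tfms or []

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        x, y = self.items[idx]
        for tfm in self.item_tfms:              # what fastai would call item_tfms
            x = tfm(x)
        return x, y

def pad_collate(batch):
    """Custom collate_fn: pad variable-length audio to the longest item in the batch."""
    xs, ys = zip(*batch)
    max_len = max(x.shape[-1] for x in xs)
    xs = torch.stack([F.pad(x, (0, max_len - x.shape[-1])) for x in xs])
    return xs, list(ys)

# the batch-level transforms (fastai's batch_tfms) then run inside training_step
dl = DataLoader(AudioDataset(items), batch_size=8, collate_fn=pad_collate)
```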

My project is not at this stage yet, but a LightningModule works the same way as a PyTorch nn.Module. Last time I checked, you load the checkpoint and then use it the same way.
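In other words, something like the following; load_from_checkpoint is the standard Lightning classmethod, while ASRModule, the checkpoint path and `batch` are placeholders:

```python
import torch

# ASRModule is the project's LightningModule; the checkpoint path is a placeholder
model = ASRModule.load_from_checkpoint("checkpoints/best.ckpt")
model.eval()                      # a LightningModule is a regular nn.Module
with torch.no_grad():
    prediction = model(batch)     # `batch` prepared the same way as during training
```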

2 Likes

I would say the following in terms of personal, unbiased feedback:

There is too much indirection in fastai2. A lot of people love the DataBlock API; I’m personally not the biggest fan, because my use cases most often revolve around tabular data, so I prefer dealing with the raw tensors. I can definitely see it being more useful for images though.

It was mentioned previously, but as a result of all the indirection, errors are not great, and just browsing through the code, even for fairly decent developers, is not smooth.

The way I tend to use them: fastai2 where I need a good out-of-the-box solution to a known problem (fine-tuning for image classification, let’s say) vs lightning for anything I’d like to experiment with, research, and dive deeper into; I tend to fight the library too much in the fastai2 case.

7 Likes

@sgugger, @jeremy and the fastai community: I’m a bit nervous to share my thoughts, as these kinds of declarations can always generate resentment. While reading this, please keep in mind that this is a very personal report of my experience with fastai and how I believe it can be improved.

I started writing this as a comparison between fastai and lightning, but it quickly developed into more of a feedback piece than a comparison. In the text you’re going to find a lot of “I feel”, and that’s because all I wrote is just my opinion and I can be completely wrong, and that’s okay. I’m not trying to say what is better here.

A cold comparison

I feel like the libraries achieve different goals. Fastai feels to me like a final product, where I can go and quickly train models for all the standard areas via a unified API. I know fastai also provides lower-level abstractions for researchers, but because fastai provides the entire system for actually getting a task done with deep learning (how to feed your data to the model (data block API), apply transforms to your data, visualize the data, fine-tune your model, use the model for inference…), it ends up being very opinionated, as @scart97 pointed out.

Lightning is a framework that aims to solve only the training part, granting the freedom (and hard work) of figuring out the rest to the developer; this is why lightning feels to me like a tool for developing other libraries.

Lightning feels like high-level PyTorch; fastai is a whole new deal.

My personal experience.

I feel it’s important to say what context I’m coming from, so my comparisons become more understandable. All my thoughts regard fastai v2 exclusively since I don’t have much experience with v1.

First of all, I’m much more experienced with fastai than with lightning. I have currently contributed 25 commits to fastai compared to a single PR (almost approved) in lightning. As I said before, I still need more experience with lightning for a more complete comparison, but I think I’m at the point where I can at least say something useful.

I felt forced to move away from fastai when I started diving into object detection.

I love to build libraries and for me this is the best and most enjoyable way of learning something, so quite naturally I started building an extension/library for object detection in fastai. It’s fair to say that at this point I was already well versed with the source code and most of the fastai abstractions and internals (at least regarding vision), so I was very confident of quickly building up a library.

Modern object detection models are quite different from any other application in the sense that they can handle differently sized images as inputs, so you cannot use the standard batch collation where you just resize all your images to the same size and stack them into a big tensor. Instead you just put all your images in a list and that’s it.

It’s straightforward to do this modification: you just need to create a new collate function and pass that to your data loader. Quite rapidly I got that working in fastai.
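The kind of collate function meant here is tiny; this is only a generic sketch (the function name and the shape of the target are assumptions), not the code from my library:

```python
def detection_collate(batch):
    """Keep variable-sized images (and their targets) in lists instead of stacking."""
    images, targets = zip(*batch)          # batch: list of (image_tensor, target) pairs
    return list(images), list(targets)

# passed straight to a vanilla DataLoader:
# dl = DataLoader(dataset, batch_size=4, collate_fn=detection_collate)
```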

The torchvision models are also quite unique in the sense that you need to pass your targets (a list of dicts) as an input to the model (as well as the images). It was very easy to achieve this functionality with the callback system.
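For reference, this is roughly what the torchvision detection API expects (the tensors here are random dummies just to show the shapes):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(pretrained=False, num_classes=2)

# images may have different sizes, so they stay in a list instead of a stacked tensor
images = [torch.rand(3, 300, 400), torch.rand(3, 512, 512)]
# one target dict per image, passed *into* the model alongside the images
targets = [
    {"boxes": torch.tensor([[10., 20., 100., 150.]]), "labels": torch.tensor([1])},
    {"boxes": torch.tensor([[30., 30., 200., 220.]]), "labels": torch.tensor([1])},
]

model.train()
loss_dict = model(images, targets)   # in training mode the model returns its losses
loss = sum(loss_dict.values())
```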

Also, in fastai we have batch_tfms; these transforms assume your data is collated into a big tensor, so the first step was to modify this functionality to instead apply the transform to each individual item at a time. Okay, not hard.

So far it’s all sunshine and roses; my problems really began with everything else that was not training.

Currently, in my eyes, the biggest downfall fastai has, which I believe is the main cause of the other users’ complaints (“errors are cryptic”, “not smooth”), is the fact that fastai tries to be really adaptive to user input. Functions like tuplify, detuplify, is_listy and is_iter work really well in the beginning, but they always end up behaving in unexpected ways down the road. I used to love those functions as well; I used to use them in all my little projects, but they always, ALWAYS came back to haunt me.

Remember when I said the only thing I needed to modify in the DataLoader was the collate function? And that the collate function should return a list? Can you see the chaos brewing? My batch started to interact in unexpected ways with these “innocent” functions. And it was hard, very hard, to figure out what was happening and where it was happening… After a lot of pain I figured out a solution, a hack, something nonsensical that I was forced to create to make peace with the library (this process is what @feribg described as fighting with the library); I had to create something very strange just to satisfy the library’s requirements. For those who are curious, I created a class (called Bucket) that returned True to is_iter but False to is_listy. Black magic.
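A rough reconstruction of the kind of class meant here, just to show the shape of the hack; the exact checks that fastcore’s is_iter/is_listy perform may differ, so this is only illustrative, not my actual code:

```python
class Bucket:
    """Wraps a list of variable-sized tensors so they travel through fastai as one object.

    The idea: it is iterable (so is_iter-style checks pass) but it is not a
    list/tuple subclass (so is_listy-style checks fail and fastai doesn't try
    to collate or split it). The real fastcore checks may differ in detail.
    """
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]
```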

Fastai also introduces semantic types for tensors, and PyTorch is not really well equipped to deal with these, so a lot of complexity needed to be introduced. When the DataLoader collates a batch, it tries to preserve its types; this works well for collated tensors, but not so well for lists of tensors. I again had to hack around it.

Unlike the is_listy kind of functions, though, the semantic types might be irreplaceable; the benefits they add are huge.

So this was the general flow: hack, error, hack, error, hack… A long fight between me and the library that ultimately resulted in me moving away.

What did I lose when I moved to lightning?

The DataBlock API was not very useful for my case; object detection datasets come in very different formats, so I ended up having to write my own custom “parsers” anyway. Very quickly I got my data ready to be fed to the model in lightning.

The transforms that fastai uses are awesome, especially the batch transforms (I think currently no other library works like that? I can be wrong, I did not look around that much). But as I said, batch transforms only work if you have a big collated tensor, so there was no benefit for me there. I ended up using albumentations (I actually built the library in a way that you can use any transform library you like, so technically you could use the fastai transforms as well).

Learner: this is the main point of loss. Features like differential learning rates and easy freeze/unfreeze (fine-tuning) training were some of the things I had to re-implement, but it was not too much work.
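For anyone wondering what that re-implementation looks like in plain PyTorch, a minimal sketch using optimizer parameter groups; the `backbone`/`head` split on `model` is an assumption about the model structure, not fastai’s actual implementation:

```python
import torch

# `model` is assumed to have a pretrained `backbone` and a freshly initialised `head`
backbone_params = list(model.backbone.parameters())
head_params = list(model.head.parameters())

# differential learning rates: a lower LR for the pretrained backbone
optimizer = torch.optim.Adam([
    {"params": backbone_params, "lr": 1e-5},
    {"params": head_params,     "lr": 1e-3},
])

def freeze_backbone(freeze=True):
    """Rough stand-in for fastai's freeze()/unfreeze()."""
    for p in backbone_params:
        p.requires_grad = not freeze
```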

Metrics. Currently lightning has no default support for metrics, so I had to implement my own logic. This is changing though; lightning is already working on it.

Everything else, coding style, repo maintenance, and stuff

A library is much more than its interface; every decision made behind the scenes is as important as the final product, and I would like to talk about that.

The fastai coding style

This is the hardest point for me to write about, because I’m unsure. It haunts me every day when I’m building my own libraries; I have navigated both extremes and I cannot decide what is best.

Fastai and lightning differ a lot in coding style. Lightning adopts the more standard “easy to read, explicit, PEP 8” style while fastai has one of its own.

I’ll not talk about the details of the coding styles, but rather about the implications they have (for the details, take a look here for fastai and here for lightning).

The fastai coding style is denser; it makes heavy use of abbreviations and of very specialised functions that simplify common tasks by a lot (take a look at L as an example; a small taste of it is below). I personally feel the fastai coding style pushes developer productivity to its highest. This coding style tries to minimize the vertical space used by your code, so you can understand what is happening with a single glance over the code, without the need to scroll around.
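To give a flavour of what I mean by specialised helpers, here is L, fastcore’s list replacement; the exact repr and the behaviours shown are from memory of the library, so double-check against the fastcore docs:

```python
from fastcore.all import L

t = L(range(10))
t                               # (#10) [0,1,2,3,4,5,6,7,8,9]  -- compact repr
t.map(lambda o: o * 2)          # apply a function to every element
t[2, 5, 7]                      # index with several indices at once
t.filter(lambda o: o % 2 == 0)  # keep only the elements matching a predicate
```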

But all of this comes at a price: a steep learning curve and a lot of baggage. More than once I have heard people say “I cannot understand the fastai source code”, because to understand it, you need to be used to it. Once you understand it though, it’s incredible and very simple; the time you put into it really pays off.

But we have to keep in mind that this is not a single-person project; this is open source, and the benefits of open source only start flourishing once people get involved and start contributing. I personally think we should make the contribution barrier as low as possible, and this is what we see with the lightning style. I feel there is a balance to achieve between main-developer productivity and code complexity.

nbdev

I got very excited when I first heard about nbdev and quickly left my IDE and switched my whole workflow to it. In the beginning everything is cool and new, so it’s easy to ignore the obvious flaw: Jupyter has a crap interface for developing libraries. It works great for exploring your data and training your models, but it’s not an environment you want to spend all day in developing your code, for obvious reasons.

First of all, it sits in the browser, so shortcuts are limited and the editing flow can never match the quality of a proper IDE. It induces bad code practices, like writing big, extensive files that are hard to move around in (going in the opposite direction of the fastai coding style); I tried the ToC extension, but that is still bad compared to having small files that you can quickly fuzzy-search. It’s much harder than necessary to jump around files, and there is no way of quickly jumping between function/class definitions (yeah yeah, you can do ??, but that is crap compared to actually going to the file where the function is defined). The editor does not help you even with very simple stuff like “hey, this variable name was not declared”.

The problem is not nbdev itself but the interface around it; maybe nbdev is ahead of its time, and we first need a solid notebook interface before we all start building libraries in notebooks. Maybe JupyterLab is becoming that, but it still works in the browser, so idk.

I was trapped using nbdev, thinking it made me more productive, until I was forced to get back to my IDE (to submit a PR to lightning) and I felt what freedom was like again. At the end of the day, everything I could do with a notebook I can do with an IDE; the only thing you need is a REPL interface (the Python console in PyCharm). This way you can send code from the file you’re working on to the REPL to be executed (think of it as an automatic copy and paste into an IPython shell), so you keep the interactive way of programming (which is the key to notebook productivity) and the good interface.

Contributing

I’ll drop the bomb: it’s hard to contribute to fastai, and I tried really hard. One of the problems is that there is no board with what needs to be fixed, what needs to be implemented, and what needs to be discussed. Most of the contributions I have made were fixes to problems I encountered while developing my own libraries.

Fastai is a much bigger project than lightning; as I previously said, it tries to cover the entire pipeline for solving a problem with deep learning. I would love to see a future where researchers start to use fastai as their main tool of development. Just imagine the speedup we would see if all researchers used a unified tool with all the best practices already available! Because of fastai’s nature it would be extremely easy to mix and match parts of different papers; testing new ideas would be easier than ever before. I come from the deep RL space, and some very simple practices are still not used in the field, simply because it’s not worth it to implement them.

But for this to happen we need researchers to migrate to fastai, and they need to trust fastai. All those complaints about “fighting with the library”, “cryptic error messages” and, in some cases, “confusing documentation” need to go away, and for that to happen we need more contributors ASAP. We need both more contributors from the community and more core contributors from the fastai team.

Fastai is already losing a lot of momentum: it’s not the best library for NLP anymore, we still don’t have object detection (and its derivatives), and we have people unofficially working on audio and time series, but I don’t think there is a plan for officially integrating those into the library.

Go to the lightning repo, open the issues tab, search for the label “help wanted”, and you find 186 open issues. Too overwhelming? Filter for “help wanted” and “good first issue” and you get twelve. Want to understand what new features are being worked on for the next releases? Go to the milestones page to find exactly that.

Take a look at their PRs: they need 3 core-contributor approvals before merging, and they have a lot of discussion about how the interface should look and behave, what should be added, and what shouldn’t. To be clear, I don’t speak for fastai v1 because I was not active back then, but I don’t see this happening in v2.

Now, there is a very serious and important question we have to answer. Is it possible to have a “one does all” library for deep learning given the current scenario? Or is DL simply moving too fast for that? What does fastai want to be? A library with decent baselines, or a library with SOTA baselines?

If the answer is that we want SOTA baselines, we need SOTA to be developed with fastai and not replicated by fastai. That’s the only way of keeping pace with this ever more rapidly changing technology.

I cannot even imagine the amount of pressure and stress @sgugger and @jeremy are facing right now: there is the book, the course, the libraries, the forums, kids, and the dreadful pandemic. Please know that you two are my inspiration. You must be among the select number of people who have access to a time machine, for I look at all that you have built and find it astonishing that it was built mainly by two people (I say mainly because there are always other people I’m failing to cite (and I’m sorry for that) and the community plays an essential role in this, but the pillar is always the two of you).

65 Likes

I have to agree with @lgvaz. 100% nailed it.

In my experience fastai is great when the problems you are working on are already solved in the library, like image classification or using a U-Net for segmentation, etc. If it’s a topic that has been covered extensively in the lessons, fastai does great at it, but as soon as you need to start modifying the internals it’s just a big mess, and the fact that the API is constantly being refactored doesn’t help. When you need to change the behavior of the API or add new functionality, it’s better to just go back to using pure PyTorch and/or pytorch-lightning; that’s why they mentioned it’s better for research.

Again, that’s just my opinion. I don’t necessarily think it is one or the other; I personally use all three: pure PyTorch, pytorch-lightning, and fastai. They are all PyTorch in the end, so there is room for them to coexist. My main issue with fastai is, and will always be, that it is a new API. So for a library that is built on top of PyTorch, it sucks that it doesn’t play nicely with PyTorch.

For example, if I could take the DataBlock API and just use it with the super-advanced training loop that I created (or that someone else already created), it would make a huge difference (a rough sketch of that idea is below). Or, if I’m using a custom dataloader that has a new sampler and requires unique input/output combinations, being able to just use that dataloader with fastai.
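In fact, something close to this already works, since fastai DataLoaders are plain iterables yielding (tensor-subclass) batches. A minimal sketch, assuming `path` points at an ImageNet-style folder of images; the resnet18 model and the hand-written loop are my own placeholders:

```python
from fastai2.vision.all import *
import torch
from torchvision.models import resnet18

# build the data with the DataBlock API (`path` is assumed to be an imagenet-style folder)
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=parent_label,
    splitter=RandomSplitter(),
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path, bs=32)

# ...then drive a hand-written training loop with the resulting loaders
model = resnet18(num_classes=dls.c).to(dls.device)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
for xb, yb in dls.train:                 # fastai DataLoaders are plain iterables
    loss = torch.nn.functional.cross_entropy(model(xb), yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
```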

That’s my ideal fastai library: one that simplifies some things but always plays nicely with PyTorch and lets me go back to it at any point. For now I’ll just use it for those problems it excels at, like image classification.

Peace.

5 Likes

After spending the last week with lightning, I think this is the thing that concerns me the most right now, especially given Sylvain’s recent departure to Hugging Face … and that is: who is maintaining fastai?

The last commit was almost a month ago, Jeremy doesn’t seem as involved any more with the maintenance of the library (that could be temporary, I dunno), and Sylvain, who seemed to be the sole maintainer and primary developer, is gone. It honestly makes me a bit nervous as a software developer wanting to use fastai in a number of projects. I wonder, will it be around in 6 months? In a year? Will the API in the next version be as radical and breaking as it was from v1 to v2? Is it really just an educational framework subject to constant change as alleged by the pytorch-lightning folks?

I love fastai, I love the framework (hell, I’m deeply invested in it), but this question really needs to be answered before I personally can have confidence in the long-term durability and stability of this library. I would suggest what is needed is something similar to what the lightning folks are doing in terms of having a team of core contributors/owners who manage the vision and development of the library.

Just my 2 cents. I’m still playing with PL; parts of it I like, but there is A LOT that isn’t there that I’d have to re-implement, which I otherwise wouldn’t have to with fastai. My concern is more about the future, vision, and stability of the latter.

18 Likes