Fastai2 vs pytorch-lightning ... pros and cons? integration of the two?

I’d agree. I’ve done plenty of research with fastai (literally the last year and a half), and I also converted a few professors at my university over to fastai from Keras for their research with their students.

4 Likes

Yah … it seems like a shitty oversimplification that isn’t even factual. Kinda like a White House Press briefing.

1 Like

You can’t get an impartial comparison from the person who built one of those libraries, which is why I’m personally refraining from commenting.
I’d be curious to have the feedback of @lgvaz since I think I saw on Twitter he used PyTorch Lightning for a project on object detection.

6 Likes

Concerning TPU support and distributed training (across multiple machines and multiple GPUs) … where should I look in the fastai2 docs?

I don’t think the TPU support is there but I could be wrong.

TPU isn’t supported currently. Distributed: https://dev.fast.ai/distributed

2 Likes

Ow booy, that is a tough spot you put me in @sgugger :rofl: :rofl:

Let me get used to lightning a bit more and I’ll share my thoughts here :smile:

5 Likes

I am a core developer of the unofficial fastai2 audio extension and have also recently experimented with pytorch-lightning in one project, so I’ll try to do an unbiased comparison of the two libraries.

First of all, note that they have different purposes. Lightning is a lightweight library focused on the training loop, and tries to make the engineering aspects (like logging or distributed/TPU training) trivial while giving full flexibility in the way you write your research code. On the other hand, fastai is a powerful framework that tries to integrate best practices in all aspects of deep learning training, from data loading to architectures and also the training loop.

Pros about fastai2:

  • Modern best practices are constantly implemented

  • Datablock API is wonderful to load data

  • Huge variety of applications ready to be used

Cons about fastai2:

  • The library is strongly opinionated on how things should behave, down to the level of changing how python works. This introduces a lot of friction when you want to do something new or different.

  • It’s hard to integrate with other libraries in the pytorch ecosystem. More than once I’ve seen people reimplement code in fastai2 because it’s easier.

  • Error messages in fastai2 have really degraded from fastai1. Often they are a couple of pages long, and it is hard to tell where the problem actually is.

Pros about pytorch-lightning:

  • No friction at all to use with other libraries. Just import them and do your thing, no need to use wrappers or rewrite code.

  • Automatic experiment management is great: you just run the code a bunch of times with different hyperparameters and fire up tensorboard to easily compare results.

  • Larger community of core and active contributors

  • Plain pytorch, code is simple to understand and reason about

Cons about pytorch-lightning:

  • Mixed precision training is currently tied to NVIDIA apex (waiting on official torch.amp to stabilize)

  • Could have better integration with hydra

15 Likes

Thanks for the detailed comment!

So in your PL project … how did you assemble/create your Datasets/Dataloaders? How did you export everything needed for inference, and how did you actually code things up for inference?

fastai2 takes care of most of those bits for ya, whereas PL seems almost purely focused on one thing = the training/eval loop; leaving the rest up to you.

To add to this, I remember someone mentioning his company was trying to decide whether to go the pytorch or the fastai v1 route. They ended up going with pytorch for easier maintainability (converting to fastai v2 down the road was an unknown). As said, each has pros and cons, and the user should figure out his/her needs.

That’s exactly it.

Check out asr/data.py for the Dataset definition, and usage is at asr/asr_module.py. Dataloader is the pytorch vanilla one, but I use a custom collate_fn and batch_sampler with it. I apply what would be the item_tfms while loading the items at the Dataset, and the batch_tfms are applied in the training_step inside the ASRModule.
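As a rough illustration of the split described above (this is not the code in asr/data.py or asr/asr_module.py, which is not reproduced here), the sketch below applies a per-item transform when an item is loaded and a per-batch transform inside the step function. All names (`ToyDataset`, `normalize_item`, `pad_batch`, `training_step`) are hypothetical, and plain Python lists stand in for tensors so the snippet runs without PyTorch.

```python
from typing import Callable, List, Sequence


class ToyDataset:
    """Map-style dataset: anything with __len__ and __getitem__.

    The item transform (the analogue of fastai's item_tfms) runs once per
    item, at load time.
    """

    def __init__(self, items: Sequence[List[float]],
                 item_tfm: Callable[[List[float]], List[float]]):
        self.items = items
        self.item_tfm = item_tfm

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> List[float]:
        return self.item_tfm(self.items[idx])


def normalize_item(x: List[float]) -> List[float]:
    """Hypothetical item transform: scale a signal to peak amplitude 1."""
    peak = max(abs(v) for v in x) or 1.0
    return [v / peak for v in x]


def pad_batch(batch: List[List[float]]) -> List[List[float]]:
    """Hypothetical batch transform: zero-pad every item to the longest one."""
    longest = max(len(x) for x in batch)
    return [x + [0.0] * (longest - len(x)) for x in batch]


def training_step(batch: List[List[float]]) -> float:
    """Stand-in for a training step: batch transforms are applied here."""
    batch = pad_batch(batch)
    # A real step would run the model and return a loss; this returns the batch mean.
    flat = [v for x in batch for v in x]
    return sum(flat) / len(flat)


ds = ToyDataset([[2.0, 4.0], [1.0, -1.0, 0.5]], item_tfm=normalize_item)
batch = [ds[i] for i in range(len(ds))]  # what a batch sampler plus collate_fn would hand over
loss = training_step(batch)
```

Keeping the per-item work in the dataset and the per-batch work in the step is a design choice, not a requirement; the batch transform could equally live in the collate function.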

My project is not at this stage yet, but the LightningModule works the same way as a pytorch nn.Module. Last time I checked it, you load the checkpoint and use it the same as this.

2 Likes

I would say the following in terms of personal, unbiased feedback:

There is too much indirection in fastai2. A lot of people love the datablocks API, but I’m personally not the biggest fan because my use cases most often revolve around tabular data, so I prefer dealing with the raw tensors. I can definitely see it being more useful for images though.

It was mentioned previously, but as a result of all the indirection, errors are not great, and just browsing through the code is not smooth, even for fairly decent developers.

I tend to use fastai2 where I need a good out-of-the-box solution to a known problem (fine-tuning for image classification, let’s say) and lightning for anything I’d like to experiment with, research, or dive deeper into. In the fastai2 case I tend to fight the library too much.

7 Likes

@sgugger, @jeremy and fastai community. I’m a bit nervous to share my thoughts, since these kinds of declarations can always generate resentment. While reading this please keep in mind that this is a very personal report of my experience with fastai and how I believe it can be improved.

I started writing this as a comparison between fastai and lightning, but it quickly developed into more of a feedback post than a comparison. In the text you’re going to find a lot of “I feel”, and that’s because all I wrote is just my opinion and I can be completely wrong, and that’s okay. I’m not trying to say what is better here.

A cold comparison

I feel like the libraries achieve different goals. Fastai feels to me like a final product, where I can go and quickly train models for all standard areas via a unified API. I know fastai also provides lower level abstractions for researchers, but because fastai provides the entire system for actually getting a task done with deep learning (getting your data to the model with the data block API, applying transforms to your data, visualizing the data, fine-tuning your model, using the model for inference…) it ends up being very opinionated, as @scart97 pointed out.

Lightning is a framework that aims to solve only the training part, granting the freedom (and hard work) to the developer to figure out the rest. This is why lightning feels to me like a tool for developing other libraries.

Lightning feels like high level pytorch, fastai is a whole new deal.

My personal experience.

I feel it’s important to say what context I’m coming from, so my comparisons become more understandable. All my thoughts regard fastai v2 exclusively since I don’t have much experience with v1.

First of all, I’m much more experienced in fastai than lightning. I have so far contributed 25 commits to fastai compared to a single PR (almost approved) in lightning. As I said before, I still need more experience with lightning for a more complete comparison, but I think I’m at the point where I can at least say something useful.

I felt forced to move away from fastai when I started diving into object detection.

I love to build libraries and for me this is the best and most enjoyable way of learning something, so quite naturally I started building an extension/library for object detection in fastai. It’s fair to say that at this point I was already well versed with the source code and most of the fastai abstractions and internals (at least regarding vision), so I was very confident of quickly building up a library.

Modern object detection models are quite different from any other application in the sense that they can handle differently sized images as inputs, so you cannot use the standard batch collation where you just resize all your images to the same size and stack them into a big tensor. Instead you just put all your images in a list and that’s it.

This modification is straightforward: you just need to create a new collate function and pass it to your data loader. I quite rapidly got that working in fastai.
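For illustration, a minimal sketch of such a collate function, assuming each dataset item is an `(image, target)` pair. The name `collate_as_lists` is made up for this example and this is not the code from the project being described; nested lists stand in for image tensors so it runs without PyTorch. With the plain PyTorch loader it would be passed as `DataLoader(dataset, collate_fn=collate_as_lists)`.

```python
from typing import Any, Dict, List, Tuple

Image = List[List[float]]   # stand-in for an image tensor of any height/width
Target = Dict[str, Any]     # e.g. {"boxes": [...], "labels": [...]}


def collate_as_lists(batch: List[Tuple[Image, Target]]) -> Tuple[List[Image], List[Target]]:
    """Keep images and targets as plain lists instead of stacking them."""
    images, targets = zip(*batch)
    return list(images), list(targets)


small = [[0.0, 0.1], [0.2, 0.3]]          # 2x2 "image"
large = [[0.0] * 3 for _ in range(4)]     # 4x3 "image"
batch = [
    (small, {"boxes": [[0, 0, 1, 1]], "labels": [1]}),
    (large, {"boxes": [[1, 1, 2, 3]], "labels": [2]}),
]
images, targets = collate_as_lists(batch)
```

Because nothing is stacked, the two images keep their own sizes, which is exactly what the default collation cannot do.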

The torchvision models are also quite unique in the sense that you need to pass your targets (a list of dicts) as an input to your model (as well as the images). It was very easy to achieve this functionality with the callback system.

Also, in fastai we have batch_tfms. These transforms assume your data is collated into a big tensor, so the first step was to modify this functionality to instead apply the transform to each individual item, one at a time. Okay, not hard.

So far it’s all sunshine and roses, my problem really began with everything else that was not training.

Currently, in my eyes, the biggest downfall fastai has, which I believe is the main cause of the other users’ complaints that “errors are cryptic” and “not smooth”, is the fact that fastai tries to be really adaptive to user input. Functions like tuplify, detuplify, is_listy, and is_iter work really well in the beginning, but always end up working in unexpected ways in the long run. I used to love those functions as well, I used to use them in all my little projects, but they always, ALWAYS came back to haunt me.

Remember when I said the only thing I needed to modify in the DataLoader was the collate function? And that the collate function should return a list? Can you see the chaos brewing? My batch started to interact in unexpected ways with these “innocent” functions. And it was hard, very hard to figure out what was happening, and where it was happening… After a lot of pain I figured out a solution, a hack, something nonsensical that I was forced to create to make peace with the library (and this process is what @feribg described as fighting with the library). I had to create something very strange just to satisfy the library requirements. For those who are curious, I created a class (called Bucket) that returned True to is_iter but False to is_listy, black magic.
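To make the trick concrete, here is a sketch under stated assumptions. `is_listy_sketch` and `is_iter_sketch` are simplified stand-ins written for this example, not fastai's real `is_listy` and `is_iter` (whose exact checks may differ), and this `Bucket` is a guess at the idea rather than the class from the original project. The point is only that an object can be iterable without being a `list` or `tuple`, so type-based helpers stop unpacking it.

```python
from collections.abc import Iterable


def is_listy_sketch(x) -> bool:
    """Simplified stand-in: true only for real lists and tuples."""
    return isinstance(x, (list, tuple))


def is_iter_sketch(x) -> bool:
    """Simplified stand-in: true for anything that can be iterated."""
    return isinstance(x, Iterable)


class Bucket:
    """Holds a batch of items; iterable and indexable, but not a list subclass."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


batch = Bucket(["img_a", "img_b", "img_c"])
print(is_iter_sketch(batch), is_listy_sketch(batch))
```

A plain `list` passes both checks, which is why a collate function that returns one gets unpacked by code that special-cases list-like inputs.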

Fastai also introduces semantic types for tensors, and pytorch is not really well equipped to deal with these, so a lot of complexity needed to be introduced. When the DataLoader collates a batch it tries to preserve its types. This works well for collated tensors, but not so well for lists of tensors. I again had to hack around.

Unlike the is_listy kind of functions, the semantic types might be irreplaceable though; the benefits they add are huge.

So this was the general flow, hack, error, hack, error, hack… A long fight between me and the library that ultimately resulted in me moving away.

What did I lose when I moved to lightning?

The datablock API was not very useful for my case: object detection datasets come in very different formats, so I ended up having to write my custom “parsers” anyway. Very quickly I got my data ready to be fed to the model in lightning.

The transforms that fastai uses are awesome, especially the batch transforms (I think currently no other library works like that? I could be wrong, I did not look around that much). But as I said, batch transforms only work if you have a big collated tensor, so no benefit for me there. I ended up using albumentations (I actually made the library in a way that you could use any transform library you like, so technically you could use the fastai transforms as well).

Learner: this is the main point of loss. Features like differential learning rates and easy freeze/unfreeze (fine-tuning) training were some of the stuff I had to re-implement, but it was not too much work.
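As an illustration of what re-implementing differential learning rates can look like, the sketch below spreads learning rates geometrically across layer groups and builds the list-of-dicts format that PyTorch optimizers accept as parameter groups. The function names, the layer names, and the geometric spacing are assumptions made for this example, not the code from the project described here; strings stand in for parameter tensors so it runs without PyTorch.

```python
from typing import Dict, List, Sequence


def spaced_lrs(lr_min: float, lr_max: float, n_groups: int) -> List[float]:
    """Geometrically spaced learning rates, smallest for the earliest group."""
    if n_groups == 1:
        return [lr_max]
    ratio = (lr_max / lr_min) ** (1.0 / (n_groups - 1))
    return [lr_min * ratio ** i for i in range(n_groups)]


def make_param_groups(groups: Sequence[Sequence[str]],
                      lr_min: float, lr_max: float) -> List[Dict]:
    """Build [{"params": ..., "lr": ...}, ...] for an optimizer constructor."""
    lrs = spaced_lrs(lr_min, lr_max, len(groups))
    return [{"params": list(g), "lr": lr} for g, lr in zip(groups, lrs)]


# Hypothetical layer groups: early backbone, late backbone, detection head.
layer_groups = [["backbone.0.weight"], ["backbone.1.weight"], ["head.weight", "head.bias"]]
param_groups = make_param_groups(layer_groups, lr_min=1e-5, lr_max=1e-3)
for group in param_groups:
    print(group["params"], f"{group['lr']:.0e}")
```

With real modules, each `params` entry would hold the group's parameters, and freezing a group amounts to setting `requires_grad = False` on them.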

Metrics. Currently lightning has no default support for metrics, so I had to implement my own logic. This is changing though, lightning is already working on this.
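The post does not say which metrics were implemented, so the following is only a sketch of the kind of bookkeeping involved: a hypothetical `RunningAverage` helper that combines per-batch values into an epoch-level value, weighting by batch size so a smaller final batch does not skew the result.

```python
class RunningAverage:
    """Accumulates a per-batch metric, weighted by batch size."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, batch_value: float, batch_size: int) -> None:
        self.total += batch_value * batch_size
        self.count += batch_size

    def compute(self) -> float:
        return self.total / self.count if self.count else 0.0

    def reset(self) -> None:
        self.total, self.count = 0.0, 0


metric = RunningAverage()
metric.update(batch_value=0.5, batch_size=4)  # e.g. mean score of a batch of 4
metric.update(batch_value=1.0, batch_size=2)  # last, smaller batch
epoch_value = metric.compute()
```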

Everything else, coding style, repo maintenance, and stuff

A library is much more than its interface. Every decision that is made behind the scenes is as important as the final product, and I would like to talk about that.

The fastai coding style

This is the hardest point for me to write, because I’m unsure. This haunts me every day when I’m building my own libraries. I have navigated both extremes and I cannot decide what is best.

Fastai and lightning differ a lot in coding style. Lightning adopts the more standard “easy to read, explicit, pep8” style while fastai has one of its own.

I’ll not talk about the coding styles details, but rather the implications they have (For the details take a look here for fastai and here for lightning).

Fastai coding style is denser. It makes heavy use of abbreviations and of very specialised functions that simplify common tasks by a lot (take a look at L as an example). I personally feel the fastai coding style puts developer productivity at its highest. This coding style tries to minimize the vertical space utilised by your code, so you can understand what is happening with a single glance over the code, without the need to scroll around.

But all of this comes with a price: a steep learning curve and a lot of baggage. More than once I have heard people saying “I cannot understand fastai source code”, because to understand it, you need to be used to it. Once you understand it though, it’s incredible and very simple, the time you put into it really pays off.

But we have to keep in mind that this is not a single person project. This is open source, and open source benefits only start flourishing once people get involved and start contributing. I personally think we should make the contribution barrier as low as possible, and this is what we see with the lightning style. I feel there is a balance to achieve between main developer productivity and code complexity.

nbdev

I got very excited when I first heard about nbdev and quickly left my IDE and switched all my workflow to it. In the beginning everything is cool and new, so it’s easy to ignore the obvious flaw: Jupyter has a crap interface for developing libraries. It works great for exploring your data and training your models, but it’s not an environment you want to spend all day developing your code in, for obvious reasons.

First of all it sits in the browser, so shortcuts are limited and the editor flow cannot ever match the quality of a proper IDE. It induces bad code practices, like writing big extensive files that are hard to move around (going the opposite way of the fastai coding style). I tried the ToC extension, but that is still bad compared to having small files that you can quickly fuzzy search. It’s much harder than necessary to jump around files, and there is no way of quickly jumping between function/class definitions (yeah yeah, you can do ?? but that is crap compared to actually going to the file where the function is defined). The editor does not help you even with very simple stuff like “hey, this variable name was not declared”.

The problem is not nbdev itself but the interface around it. Maybe nbdev is ahead of its time, and we first need a solid notebook interface before we all start building libraries in notebooks. Maybe Jupyter lab is becoming that, but it still works in the browser, so idk.

I was trapped using nbdev thinking it made me more productive, until I was forced to get back to my IDE (to submit a PR to lightning) and I felt what freedom was like again. And at the end of the day, everything I could do with a notebook I can do with an IDE. The only thing you need is a REPL interface (the python console in pycharm), so you can send code from the file you’re working on to the REPL to be executed (think of it as an automatic copy and paste into an IPython shell). This way you keep the interactive way of programming (which is the key to notebook productivity) and the good interface.

Contributing

I’ll drop the bomb: it’s hard to contribute to fastai, and I tried really hard. One of the problems is that there is no board with what needs to be fixed, what needs to be implemented, what needs to be discussed. Most of the contributions I have made were fixes to problems I encountered while developing my own libraries.

Fastai is a much bigger project than lightning; as I previously said, it tries to cover the entire pipeline for solving a problem with deep learning. I would love to see a future where researchers start to use fastai as their main tool of development. Just imagine the speedup we would see if all researchers used a unified tool with all the best practices already available! Because of the fastai nature it would be extremely easy to mix and match parts of different papers, and testing new ideas would be easier than we have ever seen. I come from a Deep RL space, and some very simple practices are still not used in the field, simply because it’s not worth the effort to implement them.

But for this to happen we need researchers to migrate to fastai, and they need to trust fastai. All those complaints about “fighting with the library”, “cryptic error messages”, and in some cases “confusing documentation” need to go away, and for that to happen we need more contributors ASAP. We need both more contributors from the community, and more core contributors from the fastai team.

Fastai is already losing a lot of momentum: it’s not the best library for NLP anymore, we still don’t have object detection (and derivatives), and we have people unofficially working on audio and time series but I don’t think there is a plan for officially integrating those into the library.

Go to the lightning repo, open the issues tab, search for the label “help wanted” and you find 186 open issues. Too overwhelming? Filter for “help wanted” and “good first issue” and you get twelve. Want to understand what new features are being worked on for the next releases? Go to the milestones page to find exactly that.

Take a look at their PRs: they need 3 core contributor approvals before merging, and they have a lot of discussion on how the interface should look and behave, what should be added, what shouldn’t. To be clear, I don’t speak for fastai v1 because I was not active back then, but I don’t see this happening in v2.

Now, there is a very serious and important question we have to answer. Is it possible to have a “one does all” library for deep learning given the current scenario? Or is DL simply moving too fast for that? What does fastai want to be? A library with decent baselines, or a library with SOTA baselines?

If the answer is we want SOTA baselines, we need SOTA to be developed with fastai and not replicated by fastai. That’s the only way of maintaining the pace with this ever more rapidly changing technology.

I cannot even imagine the amount of pressure and stress @sgugger and @jeremy are facing right now: there is the book, the course, the libraries, the forums, kids and the dreadful pandemic. Please know that you two are my inspiration. You guys must be among the select number of people that have access to a time machine, for I look at all that you have built and I find it astonishing that it was built by mainly two people (I say mainly because there are always other people I’m failing to cite (and I’m sorry for that) and the community that plays an essential role in this, but the pillar is always the two of you).

64 Likes

I have to agree with @lgvaz, he 100% nailed it.

In my experience FastAI is great when the problems you are working on are already solved in the library, like image classification or using unet for segmentation, etc… If it’s a topic that has been covered extensively in the lessons, FastAI does great at it, but as soon as you need to start modifying the internals it’s just a big mess, and the fact that the API is constantly being refactored doesn’t help. When you need to change the behavior of the API or add new functionality, it is better to just go back to using pure pytorch and/or pytorch lightning. That’s why they mentioned it is better for research.

Again that’s just my opinion. I don’t necessarily think it is one or the other; I personally use all 3: pure Pytorch, pytorch lightning and FastAI. They are all pytorch in the end, so there is room for them to coexist. My main issue with FastAI is and will always be that it is a new API. So for a library that is built on top of Pytorch, it sucks that it doesn’t play nicely with pytorch.

For example, if I could just use the Data block api with my super advanced training loop that I created or someone else already created, it would make a huge difference. Or if I’m using a custom dataloader that has a new sampler and requires unique input/output combinations, I could just use that dataloader with fastai.

That’s my ideal FastAI library: one that simplifies some things but always plays nicely with Pytorch and lets me go back to it at any point. For now I’ll just use it for those problems it excels at like image classification.

Peace.

5 Likes

After spending the last week with lightning, I think this is the thing that concerns me the most right now, especially given Sylvain’s recent departure to huggingface … and that is: Who is maintaining fastai?

The last commit was almost a month ago, Jeremy doesn’t seem as involved any more with the maintenance of the library (that could be temporary, I dunno), and Sylvain, who seemed to be the sole maintainer and primary developer, is gone. It honestly makes me a bit nervous as a software developer wanting to use fastai in a number of projects. I wonder, will it be around in 6 months? In a year? Will the API in the next version be as radical and breaking as it was from v1 to v2? Is it really just an educational framework subject to constant change as alleged by the pytorch-lightning folks?

I love fastai, I love the framework (hell, I’m deeply invested in it), but this question really needs to be answered before I personally can have confidence in the long-term durability and stability of this library. I would suggest what is needed is something similar to what the lightning folks are doing in terms of having a team of core contributors/owners who manage the vision and development of the library.

Just my 2 cents. I’m still playing with PL, parts of it I like, but there is A LOT that isn’t there that I’d have to re-implement that I otherwise wouldn’t have to with fastai. My concern is more about the future, vision, and stability of the latter.

17 Likes

@jeremy @sgugger what are your thoughts on this point? Would you consider something like having a core set of devs, with merges allowed after ‘x’ approvals? (Who they would be is up to your discretion, of course.) I like the idea, and I think it would take a bit of stress off you and Sylvain as well :slight_smile:

6 Likes

It’s mostly Jeremy’s decision, for this.

1 Like

I had heard of lightning but not tried it. I am intrigued to try it now.
Where a problem looks a bit curly and not fit for fastai I generally go to Catalyst which is very flexible, or pytorch itself.

As far as I know, fastai2 is still in the development stage and not yet officially released to the public. With no active maintainers for the library, when will fastai2 launch as the new “fastai” library, and who will maintain it then? I already see that posts related to the fastai library get very few or no replies these days, which makes me feel like community involvement in the forums has decreased dramatically. I love the fastai2 library, but sometimes I get stuck and hit a dead end. 2-3 months ago this wasn’t the case: if I got stuck I would ask in the forums, people responded, and somehow I would figure it out with their help. But now I just feel lost.
I tried using Pytorch lightning and it doesn’t suit me. I’m now concentrating on plain Pytorch accompanied by fastai functions, but I find myself implementing code that already exists, and I’m sure I will waste most of my time messing up something that I could have easily avoided by using fastai2.

As @wgpubs asks

I wonder the same.

5 Likes

For those concerned, Jeremy has described what the future will be in the thread “Update on fastai2 progress and next steps”, and is taking questions there :slight_smile:

10 Likes

Since I have used FastAI 0.x and 1.x for my research in the past and have switched to lightning before v2, I’ll also share my experience with the libraries. Note that I have only briefly looked at FastAI 2, but I feel like the points I have do still apply. Also I only have experience with the library parts regarding images (which I abused for volumes a bit), no NLP or tabular.

  • I always enjoyed the data loading and augmentation solutions in FastAI, since they made all of my data preparations very easy when I was working with GANs for images. When I switched my research to new datasets, which included 3D volume data, I started adopting FastAI for this data type. While I enjoyed most of the code I encountered in the library, I must say it often cost me a lot of time just to make my custom stuff nicely compatible with FastAI.
  • Modifying the training at a low level was mostly possible through callbacks in FastAI, but with fancy GAN training dynamics experiments I often found it way easier to just get full control over the training loop, which often led me to not use Learner anymore (or just hack it right in there).
  • As far as I can tell both the data loading and callback system have further improved in version 2, but ultimately I did not invest in learning V2 yet. At the moment it just seems that learning V2 takes quite some time, especially getting to know the library internals that I’m sure I’ll have to study again to be able to implement my research.
  • PyTorch Lightning so far seems to give me personally more freedom for my research stuff, since all it does is structure my code and take care of the fp16 and distributed training. To put this into fair perspective, lightning does not help me at all with my data loading, but since I mostly work with data that is not a first-class citizen in FastAI either, not having to learn abstractions was a big deal. Also I just have full control over my training loop.
  • There are definitely a few things I’m missing in lightning, like LR finder, just a lot of the utility functions, easy control over learning rates for different layers.

In conclusion I believe FastAI (from what I’ve seen V2 even more) is an excellent library and it will probably be my first choice when I find myself working with 2D CNNs again, in classification or segmentation domains, especially when I can make use of transfer learning (since the tools for this are superb). However as soon as I go with fancy data or have to modify very low level stuff, I find the amount of library learning often too much. That learning aspect does not get better considering there has been a major library rewrite basically every year since I was using it. Having to learn the same library (often including internals) for the third time just did not seem reasonable to me when v2 launched, which is why I switched. I hope to see the library converging tho, and when I find myself back in domains where FastAI has invested the most, I’ll definitely give the new version a try.

If someone who knows neither library asks me which one to learn, I would send anyone working in the domains covered by FastAI to learn FastAI, especially so if they are also looking for a course to learn about DL. If the person asking is a researcher who works with uncommon data and networks where you can’t easily benefit from transfer learning, I’d probably suggest lightning because it’s less learning effort and still provides fp16, ddp and good logging.

13 Likes