About the fastai dev category


(Jeremy Howard) #1

This category is for discussion of development of fastai v1, a rewrite of the fastai library, including:

  • Support consistent API for classification and regression, across all of: vision, NLP, tabular data, time series, and collaborative filtering
  • Clear and complete documentation for both new and experienced users
  • Well structured code
  • Notebooks showing how and why the library is built as it is.

Still to come is:

  • Consistent API for localization and generation, across the 4 application areas
  • Good test coverage (both unit and integration tests)

If you’re interested in helping, feel free! Be sure to read the fastai contributor notes. And if you’re in SF, you can join the SF study group in person (details here).

Note for new contributors

It can be tempting to jump into a new project by questioning stylistic decisions that have been made, such as naming, formatting, and so forth. Especially so for Python programmers coming to this project, which is unusual in following a number of conventions that are common in other programming communities, but not in Python. However, please don’t do this, for (amongst others) the following reasons:

  • Contributing to Parkinson’s law of triviality has negative consequences for a project. Let’s focus on deep learning!
  • It’s exhausting to repeat the same discussion over and over again, especially when it’s been well documented already
  • You’re likely to get a warmer welcome from the community if you start out by contributing something that’s been requested on the forum, since you’ll be solving someone’s current problem
  • If you start out by just telling us your point of view, rather than studying the background behind the decisions that have been made, you’re unlikely to be contributing anything new or useful
  • I’ve been writing code for nearly 40 years now, across dozens of languages, and other folks involved have quite a bit of experience too - the approaches used are based on significant experience and research. Whilst there’s always room for improvement, it’s much more likely you’ll be making a positive contribution if you spend a few weeks studying and working within the current framework before suggesting wholesale changes.

(WG) #2

Suggestion: Name the environment something like fastai-v1 so as to not affect folks using the current framework.

I’m glad to submit a PR so lmk.

-Wayde


(William Horton) #3

I’d like to help out! And I’m comfortable with the prerequisites you mentioned. I took a look at the commit history and I see that you and Sylvain are working on transforms/augmentation. Do you need more help in that area, or is there something else that you’re looking for someone to take on?


(Francisco Ingham) #4

Same here, except I’m not too comfortable with OO and functional programming in Python (I know the basics but haven’t spent many hours coding in that style). However, I see this as a great opportunity to learn. Please let me know where I can help!


(Mark Worrall) #5

Hello @jeremy,

Sounds great…any idea on a ballpark completion date?

Context: it will take me perhaps ~200 hours (~10 weeks) to get up to speed, given I haven’t done course 2 yet and am not familiar with the fastai internals. But after that, if you aren’t done, I’ll have a lot more time.

Thanks,

Mark


(Jeremy Howard) #6

Plan is to try to release by mid October. @wdhorton @lesscomfortable start reading through the notebooks from 00, 00b, etc onwards, and see if you can get them working and you understand what’s going on. Then in a few days we should be ready to get some help! :slight_smile:


(Nick) #7

This is interesting.

Can I suggest thinking carefully and defining the goals, in particular the relationship with PyTorch?

I’ve gone reasonably deep into the library (have done one PR, and another one currently being reviewed for the ULMFiT stuff), and I’ve been confused about why some things are in PyTorch and some are in Fastai. I’m sure I’m not the only one.

From the FastAI course point of view I found it pretty challenging. I understand the FastAI library pretty well, but not PyTorch, and when I want to do something that is outside the library I don’t know how to do it.

I didn’t find the same issue with the 2017 course with Keras. I could jump straight into Keras and do what I needed to do.


(Paul ) #8

Just some feedback regarding the Fastai Abbreviation Guide (abbr.md).

Is using a trailing underscore to denote internal properties and methods not potentially a source of confusion? There is a fairly common Python convention that a leading underscore is used for this purpose, and in addition PyTorch uses a trailing underscore to indicate an ‘in-place’ operation. Since fastai is built on top of PyTorch, there could be an argument for not ‘overloading’ the convention with a different purpose in fastai.

Clearly not the end of the world but I thought I would raise it at this early stage.
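
To make the two conventions concrete, here is a minimal pure-Python sketch. The `Vec` class and its methods are hypothetical and are not part of fastai or PyTorch; they only mimic the naming pattern, with a leading underscore for internal state and a trailing underscore for an operation that modifies the object in place.

```python
class Vec:
    """Hypothetical toy container, used only to illustrate the naming conventions."""

    def __init__(self, values):
        # Leading underscore: internal attribute, by common Python convention.
        self._values = list(values)

    def add(self, k):
        # No trailing underscore: returns a new object and leaves self unchanged.
        return Vec(v + k for v in self._values)

    def add_(self, k):
        # Trailing underscore: PyTorch-style marker for an in-place operation.
        for i in range(len(self._values)):
            self._values[i] += k
        return self

    def tolist(self):
        # Public accessor that returns a copy of the internal list.
        return list(self._values)


a = Vec([1, 2, 3])
b = a.add(10)   # a is untouched, b is a new Vec
a.add_(1)       # a is modified in place
```

If a trailing underscore also meant "internal", a reader could not tell from the name alone whether `add_` is a private helper or a mutating method, which is the ambiguity raised above.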


(Jeremy Howard) #9

Yes, we’ll be using a leading underscore now.


(Jeremy Howard) #10

Yup we’re working on that explicitly. Check out the fastai_v1/001a notebook, where we show exactly what’s in torch.nn and why. Then the next notebooks will gradually introduce the additions made by fastai and show why.


(Nick) #11

This is perfect.


(Karanbir) #12

I’d love to help out. I am familiar with OO concepts and writing clean, testable code.
I’m not too familiar with the fastai library, however.

Can we start by outlining a basic roadmap of features to deliver?
We can divide the work that way and hopefully get more done. We can use the Projects feature of GitHub to write issues, assign them, track them, and so on. I have used it in my personal projects and it’s great for collaborating with teams: your personal “Jira”. This kanban board would be great for newcomers to join in.

Also, we can use the GitHub wiki pages to update content and documentation.

This is an example of how we could go about it, maybe?

Can we have a list of issues on the repo, so we can start working on something and start contributing?
Sorry for the long post. Building well tested production grade stuff really excites me :smiley:


(Michael) #13

This looks amazing for learning the fastai library with PyTorch! Thank you! :smiley:

I was going through it and made some annotations for myself and shared it on GitHub. Maybe this is also interesting for others.

Best regards
Michael


(Jeremy Howard) #14

I’m afraid not - we’re doing research as we go, and the development path depends heavily on what that research turns up. So this won’t look at all like the usual enterprise software development project you’re used to. We’ll post here when there are specific tasks we need help with.


(Jeremy Howard) #15

That’s a great idea. You might find it even more helpful if you used markdown cells instead of comments for much of your notes, so you can add formatting, links, etc.


(Jason Antic) #16

I’m very excited about this initiative and I happen to have a lot of time on my hands. I’m currently part-time at my day job as a software engineer, which I did just to do a deep dive into deep learning (and I totally do not regret it!). I’m on part 2 lesson 11 right now, so I’m pretty far along and I’ve dug quite a bit into the existing code base.

What I’m probably most useful for: I’ve been in the business for a decade as a software engineer on big (enterprisey) code bases, and I’m big on making code readable, well tested, correct, and easy to reason about. So if you have some research code that you want to harden into something solid, I can help. I’ll try to keep tabs on this forum page, but ping me with a direct message if I don’t respond in a timely fashion.


#18

Hi,
since fastai is doing some plotting, I just wanted to mention a library I really love right now:
holoviews.org, which is the main part of the pyviz.org project (SciPy tutorial).
I think that would simplify some things (one could change the style of a visualization after creating an object) and allow cool stuff (live-updating curves during training, for example). Just some points about it:

  • built on top of matplotlib and bokeh (plotly is experimental)
  • unifying the plotting language, it’s relatively easy to change backends
  • easier interactivity
  • rather describe your data than your plot

So I just wanted to mention it and maybe get some opinions.
Thanks


(William Horton) #19

@sgugger what are the specs of the machine you used for the Cifar10-comparison-pipelines notebook? I’m getting slower times on my home rig (30 minutes for the “Standard DawnBench result with one GPU” run vs. the 22min47sec it says in the notebook), and I’m trying to figure out what might be the reason.


#20

All times reported are on a p3 instance.


(William Horton) #21

Makes me feel better, I’m not lucky enough to have a Tesla V100 at home