Fastai v2 chat

KevinB · August 30, 2019, 5:12am

I am just reading through and running the notebooks at the moment. I would do that and pay attention to the forums. Understanding this style of development is going to be a huge part of contributing. As you go through them, if you hit any issues, try to find what is going wrong and document it for future users. If anything seems unclear, add comments and documentation where it makes sense.

arora_aman · August 30, 2019, 5:15am

Thanks Kevin for your reply, this really helps.

sgugger · August 30, 2019, 9:04am

Looks like a bug. I’ll look into it.

sgugger · August 30, 2019, 9:05am

Also note that the dataloader stuff requires PyTorch 1.2.0 or later.

pnvijay · August 30, 2019, 10:54am

I had to pip install fastai before the fastprogress module could be imported. The installed fastai version is 1.0.57. Is this the right version of fastai for these notebooks to work?

nareshr8 · August 30, 2019, 11:03am

I don’t think this should require fastai; we are the ones building it. fastprogress is a separate library that provides the progress bar.

pnvijay · August 30, 2019, 11:16am

I uninstalled fastai and tried again, and everything worked. So you are right that this does not require fastai. That said, I think the packages installed along with fastai did help with importing all the packages required by 00_test.ipynb.

nareshr8 · August 30, 2019, 11:54am

I found an issue in coll_repr in fastai_dev/dev/01_core.ipynb: the function doesn’t use its max argument. I added support for it and added a test case.
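For illustration, a repr helper that actually honors a maximum-items argument could look like the sketch below. This is not the notebook’s code: the parameter name `max_n` and the `(#len) [items...]` output format are assumptions chosen for the example (avoiding `max` keeps the builtin unshadowed).

```python
import itertools

def coll_repr(c, max_n=10):
    "Return a short repr of collection `c`, showing at most `max_n` items."
    # Only the first `max_n` items are converted to strings.
    items = ','.join(map(repr, itertools.islice(c, max_n)))
    # Append '...' when the collection has more items than were shown.
    suffix = '...' if len(c) > max_n else ''
    return f'(#{len(c)}) [{items}{suffix}]'

print(coll_repr(range(1000), 5))  # (#1000) [0,1,2,3,4...]
```

Using `itertools.islice` means only `max_n` items are ever turned into strings, so a very long collection stays cheap to display.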

I created a branch to push to fastai, but access was denied. As per the guidelines

Am I doing something wrong?

florobax · August 30, 2019, 12:19pm

You need to work on your own fork, then make a PR to fastai’s repo. You can’t push directly to a repo you’re not a contributor to (which is reassuring).

jeremy · August 30, 2019, 12:58pm

Don’t include conda-forge in channels @KevinB - it can cause all sorts of issues. There may be some things that (for now) we need to install through pip, if they don’t have conda installers outside of conda-forge (if we still use those libs when v2 comes out, we’ll copy them to our channel to avoid the issue).

jeremy · August 30, 2019, 1:01pm

By talking here. And by going thru the notebooks and looking for things that you feel could be improved.

nareshr8 · August 30, 2019, 1:39pm

I am thinking of adding params like file_type and file_extns to the ls() method. I have done the same in my own notebook to pick only image files. Do you think we can add these optional params to ls()?
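A minimal sketch of what such filtering could look like, written as a standalone function over `pathlib` rather than as fastai’s `ls()`. The parameter names and semantics here are assumptions: `file_type` is matched against the MIME major type guessed by the standard `mimetypes` module (e.g. `'image'`), and `file_exts` is a case-insensitive list of suffixes.

```python
import mimetypes
from pathlib import Path

def ls(path, file_type=None, file_exts=None):
    """Hypothetical helper: list entries of `path`, optionally filtered.

    file_type: MIME major type such as 'image' (guessed from the file name).
    file_exts: iterable of extensions such as ['.jpg', '.png'].
    """
    res = list(Path(path).iterdir())
    if file_type is not None:
        # guess_type returns (type, encoding); type is None for unknown extensions.
        res = [p for p in res
               if (mimetypes.guess_type(p.name)[0] or '').startswith(file_type + '/')]
    if file_exts is not None:
        exts = {e.lower() for e in file_exts}
        res = [p for p in res if p.suffix.lower() in exts]
    return sorted(res)
```

Because both filters are optional and default to `None`, calling `ls(path)` with no extra arguments behaves like a plain sorted directory listing.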

KevinB · August 30, 2019, 2:10pm

Ok, I updated it. The PR is out there, and it seems to work for me: python 3.7+, pytorch 1.2+, no conda-forge.

I’m guessing there are probably more that we’ll want to add, but this has worked on all the notebooks I’ve run so far.
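To catch the requirements mentioned above (Python 3.7+, PyTorch 1.2+) before running the notebooks, a small check like the following could help. It is a sketch, not part of the repo: the helper names are hypothetical, and it takes the PyTorch version as a string so it can be tested without importing torch.

```python
import re
import sys

def parse_version(v):
    "Turn a version string like '1.2.0+cu92' into a tuple of ints, e.g. (1, 2, 0)."
    # Keep only the leading dotted numeric part; drop local/pre-release tags.
    m = re.match(r'\d+(\.\d+)*', v)
    if m is None:
        raise ValueError(f'unrecognised version string: {v!r}')
    return tuple(int(x) for x in m.group(0).split('.'))

def meets_min(version, minimum):
    "True if `version` is at least `minimum`, comparing numeric components."
    a, b = parse_version(version), parse_version(minimum)
    n = max(len(a), len(b))
    # Pad with zeros so that '1.2' compares equal to '1.2.0'.
    a, b = a + (0,) * (n - len(a)), b + (0,) * (n - len(b))
    return a >= b

def check_env(torch_version):
    "Return a list of problems with the environment (an empty list means OK)."
    problems = []
    if sys.version_info < (3, 7):
        problems.append(f'Python 3.7+ required, found {sys.version.split()[0]}')
    if not meets_min(torch_version, '1.2.0'):
        problems.append(f'PyTorch 1.2+ required, found {torch_version}')
    return problems
```

In a notebook this would be called as `import torch; print(check_env(torch.__version__))`. Comparing integer tuples rather than raw strings matters: as strings, `'1.10'` sorts before `'1.2'`.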

kdorichev · August 30, 2019, 2:37pm

Added scipy

maxim.pechyonkin · August 30, 2019, 4:20pm

Can you provide more details about what is going wrong in your case?

Also, some follow-up questions:

Were you able to create a pull request?
Were you able to configure Git to strip metadata from your notebooks, as explained here?

nareshr8 · August 30, 2019, 4:27pm

I didn’t run the notebook stripping. However, I have forked the repo and made a couple of pull requests. Going through the JSON line by line and carefully picking out my changes was a little difficult. I hope stripping will solve that problem. I will try it and let you know.

maxim.pechyonkin · August 30, 2019, 4:28pm

From my personal experience, commits won’t be accepted if the notebooks are not stripped. I suggest you go through the nb_stripout setup tutorial.
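To see why stripping makes notebook diffs readable, here is a rough sketch of what a stripper does to the `.ipynb` JSON. It is not the repo’s nb_stripout tool, and which fields to keep is an assumption (here: only `kernelspec` at notebook level); use the project’s own setup for real contributions.

```python
import json

def strip_notebook(nb):
    "Return a copy of notebook dict `nb` with outputs and volatile metadata removed."
    nb = json.loads(json.dumps(nb))  # deep copy via a JSON round-trip
    # Assumption: keeping only the kernelspec is enough to reopen the notebook.
    nb['metadata'] = {k: v for k, v in nb.get('metadata', {}).items() if k == 'kernelspec'}
    for cell in nb.get('cells', []):
        cell['metadata'] = {}
        if cell.get('cell_type') == 'code':
            # Outputs and execution counts change on every run and pollute diffs.
            cell['outputs'] = []
            cell['execution_count'] = None
    return nb

def strip_file(path):
    "Strip the .ipynb file at `path` in place."
    with open(path, encoding='utf-8') as f:
        nb = json.load(f)
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(strip_notebook(nb), f, indent=1, ensure_ascii=False)
        f.write('\n')
```

After stripping, a diff only shows changes to cell sources, which is what makes picking out your own edits straightforward.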

nareshr8 · August 30, 2019, 4:29pm

Okay. I’ll check and redo it.

jeremy · August 30, 2019, 8:16pm

Thanks for the suggestion and the PR. Since this additional functionality benefits from what L provides, I’ve moved it to later in the NB and reimplemented it using L.

ilovescience · August 31, 2019, 1:40am

As fastai v2 is in development, I want to comment on the ML community’s view of fastai, based on my experience using it in the “wild”. fastai v2 already seems quite well developed, but I wanted to point out some things that may help inform development decisions.

Based on my experience, there are two types of people in the Kaggle community: people who love fastai (like me!) and people who hate it. I was surprised by the hostility and disdain some people have toward fastai. Check this discussion post, for example; it is not the only one.

I think people have a couple of major issues with fastai that prevent them from using it:

Fastai is not flexible enough for use in Kaggle competitions. This seems to be the biggest problem people have with fastai. People are put off by all of the undocumented defaults, and by how everything needs to be packed into fastai classes to benefit from fastai.
Fastai is poorly documented. Honestly, I kind of agree with this. And I understand why: fastai is maintained by just three people, and the rest is done by the community, with individuals working on this during their free time.
Fastai has a poor codebase that changes rapidly. It’s definitely difficult for people to have to keep migrating from one version of the library to another, especially when the library is not backward-compatible. In fact, within one year, two completely different versions of the library will have been released (v1 and v2). This can also put off a lot of folks from using the library.

What could solve this?

I think the key here is documentation. There definitely needs to be a focus on improving it. But the documentation shouldn’t only cover how to use the functions or classes; it should also describe currently undocumented constants and settings. For example, I don’t think many people realize TTA only uses crop_pad, flip_lr, dihedral, and zoom transforms, or that cnn_learner adds a custom head with its own defaults. These things need to be explained more thoroughly. Providing this information makes it easier for others to change them.
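As a small, generic way to surface at least some hidden defaults, Python’s standard `inspect` module can list every parameter that has a default value. The sketch below uses a hypothetical `build_head` stand-in, not a real fastai function. Note the limitation: this only reveals defaults in a signature, not constants hard-coded inside a function body (such as a fixed list of TTA transforms), which is exactly why those need written documentation.

```python
import inspect

def list_defaults(fn):
    "Map each parameter of `fn` that has a default value to that default."
    sig = inspect.signature(fn)
    return {name: p.default for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}

def build_head(nf, nc, lin_ftrs=None, ps=0.5, bn_final=False):
    "Hypothetical stand-in for a function with built-in defaults."
    return None

print(list_defaults(build_head))  # {'lin_ftrs': None, 'ps': 0.5, 'bn_final': False}
```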

In addition, the documentation should better demonstrate how PyTorch can be used with fastai. For example, some people prefer not to use the data_block API and would rather use their own PyTorch Dataset. Many don’t realize it’s easy to package PyTorch Datasets into a DataBunch, use them with a fastai Learner, and take advantage of fastai’s amazing training routines. Documentation on this would also help.
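As a sketch of how little a “plain PyTorch” dataset needs: for a map-style dataset, PyTorch’s DataLoader only relies on `__len__` and `__getitem__`. Subclassing `torch.utils.data.Dataset` is the usual convention; it is omitted here so the example runs without torch installed. `PairsDataset` is a hypothetical name.

```python
class PairsDataset:
    """Minimal map-style dataset returning (input, target) pairs.

    Hypothetical example; normally this would subclass torch.utils.data.Dataset.
    """
    def __init__(self, xs, ys):
        if len(xs) != len(ys):
            raise ValueError('inputs and targets must have the same length')
        self.xs, self.ys = xs, ys

    def __len__(self):
        # Number of samples; samplers use this to generate indices.
        return len(self.xs)

    def __getitem__(self, i):
        # One (input, target) pair per index.
        return self.xs[i], self.ys[i]
```

In fastai v1, train and validation datasets like this can be wrapped with `DataBunch.create(train_ds, valid_ds, bs=64)` and passed to a `Learner` together with a model; check the v1 docs for the full set of arguments.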

Finally, fastai should preferably be backward-compatible. I know there are many aspects of the library that need to be changed significantly to improve the library, but it is definitely desirable to keep the API as similar as possible.

Anyway, this is just my opinion after reading others’ opinions about fastai in the Kaggle community. I think it would be great for more and more people to adopt fastai as their primary deep learning library.

Nevertheless, it is amazing how much success the fastai library has achieved and I am glad to be using and contributing to this library.