How we're developing fastai_v1

Oh I like that idea! We could add a comment or some cell metadata to cells to include in the .py file. :slight_smile:

OK let’s do it. How about we make the first line of a cell #EXPORT if we want to include it in the py file?

Do you want to try writing a little script to do that? The ipynb files are just json, so when I’ve written similar scripts in the past I’ve just used the python json module directly (or, if required for performance, ujson). Or I think there’s some official jupyter libraries you can use to access them, but I haven’t tried that myself.
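For what it’s worth, a minimal sketch of such a script using just the stdlib json module (the function and file names here are made up, and it assumes the single-line #EXPORT convention discussed above, plus the usual nbformat-4 layout where cell sources are lists of lines):

```python
import json

def notebook_to_script(ipynb_path, py_path):
    """Collect code cells whose first line is #EXPORT into a .py file."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    exported = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        src = cell["source"]
        # source is usually a list of lines, but tolerate a plain string too
        lines = src if isinstance(src, list) else src.splitlines(keepends=True)
        if lines and lines[0].strip().upper() == "#EXPORT":
            # drop the marker line, keep the rest of the cell
            exported.append("".join(lines[1:]).rstrip())
    with open(py_path, "w") as f:
        f.write("\n\n".join(exported) + "\n")
```

A real version would also want a header comment in the generated file saying it’s auto-generated, but this is the gist.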

Will fastai_v1 require Pytorch 0.4.0 or 0.3.0? For some reason, Pytorch 0.4.0 no longer supports my GPU but it also has newer features (and I can still run it on Paperspace) so I’m kinda torn either way.

It’ll require pytorch 1.0.

Just ran the first two notebooks. The ‘Refactor using…’ sections are a great addition and focus on one concept at a time. In 002b, the TODO mentions metrics; are these performance-based, or things like the confusion matrix, MSE, etc.?

You mean there will be just one #EXPORT, and everything under it will be exported? Otherwise that’s going to be a lot of #EXPORT cells.

Hmm, perhaps it’d be easier to simply have a new notebook for just that? So instead of one long notebook there will be several parts, a la the ml1/*rf series. And the last one will always be 00XX-99final.ipynb or something similar. And to convert it to .py we already have tools for that: ipython nbconvert --to=python.
Since it’ll need to have all the imports/dataset loading/etc., it can’t build upon the earlier parts of the notebook anyway.

So with 001:

  • kill 001b.ipynb (obviously preserving what’s already different)
  • split 001a_nn_basics.ipynb
  • so now we will have:
    001a_nn_basics.ipynb
    001b.ipynb
    001c-final.ipynb

Also, currently nb_001b.py only has functions in it; it doesn’t have any “running” code.
in the first post you wrote:

The net result of all this is that we build up over a few notebooks a complete solution to some problem (such as “create a training loop”, or “do data augmentation”). Once we’ve got a good solution to a whole problem domain (like computer vision), we can combine the various modules built along the way into a well-designed set of one or more modules, which we can then add tests and docs to.

So why does nb_001b.py at the moment have only functions and none of what you wrote above?

We also need to think about docs and tests as you describe above. Do you already have an idea of how this would be implemented? This information is needed to complete this design issue: can we supply tests/docs using notebooks, or somehow have them work side by side? Perhaps we can use 001 as a guinea pig and build tests/docs right away? (And then not worry about it for the rest, if you feel it would delay development at this stage.)

edit: I think I have a better idea. The last notebook will be .py, and the .ipynb will be auto-generated from it. I see there have been all kinds of attempts at py2ipynb out there, but they require pretty noisy input (indicating what kind of cell the following code should go into). I was thinking of simply splitting the code on double newlines to create distinct cells. Or perhaps we could have a simple ###\n tag that will stand for “split here”… thoughts?
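To illustrate the split-on-### idea, here’s a rough sketch (the function names are hypothetical, and it emits a minimal nbformat-4 skeleton, which is an assumption about what Jupyter will accept):

```python
import json

def py_to_cells(source):
    """Split a .py source string into cell sources on lines that are just ###."""
    cells, current = [], []
    for line in source.splitlines():
        if line.strip() == "###":
            if current:
                cells.append("\n".join(current).strip("\n"))
                current = []
        else:
            current.append(line)
    if current:
        cells.append("\n".join(current).strip("\n"))
    return cells

def cells_to_notebook(cell_sources):
    """Wrap cell sources in minimal nbformat-4 JSON."""
    return {
        "nbformat": 4, "nbformat_minor": 2, "metadata": {},
        "cells": [
            {"cell_type": "code", "metadata": {}, "outputs": [],
             "execution_count": None, "source": src}
            for src in cell_sources
        ],
    }
```

Splitting on blank double newlines instead would just mean changing the marker test; the ### version is less fragile, since blank lines inside a function would otherwise split it in two.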

Also, we need to have consistent separators in the notebook names, either - or _; currently we have both:

001a_nn_basics.ipynb
Cifar10-comparison-pipelines.ipynb
dogscats-test-aug.ipynb

I personally like the - (minus) separator for filenames, keeping _ for variables in the code.

and consistent case? s/Cifar/cifar/?

And I suppose you decided to have notebooks starting with 00X for future lessons, and non-00X ones (Cifar10-) to cover supplementary topics which won’t be directly worked on in the new editions of the fastai MOOC, correct?

Finally, can we have better hints in the notebook names about what they cover? e.g. img-reg (image regression), img-class (image classification), structured-classification, etc.

In v0, at the beginning, it’s very confusing when notebook names only have the name of the kaggle competition or of the objects they work with (rossman, cats-dogs, dog-breed). These are all fine and easy to refer to, but perhaps some abbreviation could be added for the type of problem they solve (reg/class) and the type of input (img/nlp/structured).

With respect to exporting notebooks: we can write custom templates and preprocessors for nbconvert. So, for example, if we need to include a single cell, it should start with ‘#EXPORT’, and if we need to include multiple cells, they must be wrapped in an ‘#EXPORT START’ … ‘#EXPORT END’ block. In case one of the cells inside that block must be excluded, it should start with ‘#EXPORT EXCLUDE’ or something. That should probably work, but I need to test it on one of the notebooks to see how it looks.
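To make the marker semantics concrete before wiring them into an nbconvert preprocessor, here’s a rough sketch of the selection logic over plain cell sources. It assumes each marker is the first line of its cell, which is just one possible reading of the scheme above:

```python
def select_exported(cell_sources):
    """Pick cells marked #EXPORT, or inside an #EXPORT START / #EXPORT END
    block, skipping any cell marked #EXPORT EXCLUDE."""
    selected, in_block = [], False
    for src in cell_sources:
        lines = src.splitlines()
        first = lines[0].strip() if lines else ""
        if first == "#EXPORT START":
            in_block = True               # marker cell itself is not exported
        elif first == "#EXPORT END":
            in_block = False
        elif first == "#EXPORT EXCLUDE":
            continue                      # skipped even inside a block
        elif first == "#EXPORT":
            selected.append("\n".join(lines[1:]))  # drop the marker line
        elif in_block:
            selected.append(src)
    return selected
```

In an actual nbconvert Preprocessor this logic would live in the preprocess() method, filtering nb.cells, but the filtering itself is the same.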

There’s a recent alternative to Jupyter notebooks that works very well for me for rapid exploration: Pycharm’s scientific mode using code cells. Essentially you can use ‘#%%’ to divide a Python script into cells that can be executed just like Jupyter notebook cells, with the output sent to the Python console and any plots displayed in the Plot pane. You get the full power of an IDE (or just your favorite text editor) to support refactoring and such. Once the Python script is more or less stable, you can export it to a Jupyter notebook and share it with others.
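For anyone who hasn’t tried it, a script divided with #%% markers is still a plain, runnable .py file; the markers are just comments that the IDE treats as cell boundaries. Something like:

```python
#%% imports
import math

#%% define a helper
def area(r):
    """Area of a circle of radius r."""
    return math.pi * r ** 2

#%% run it; in scientific mode the output goes to the Python console
print(area(2.0))
```

Each #%% block can be re-run independently, much like a notebook cell, but the file diffs cleanly in git since it’s ordinary Python.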

If you don’t love editing text in a browser, you should look at Pycharm’s scientific mode.

The downside is that Pycharm costs $89/year.

Below is a screenshot

Here’s a thought: how about using tags instead, where the tag is the filename of the .py file the code should go into? Additional tags can specify other directives, such as which section of the file the code should be included in: for example, an import section at the top, all top-level defs in a def section next, and a class section last, with items ordered alphabetically within each section. This way the tags can be hidden and shown as desired; for example, students could show the tags to learn which file the code lives in, but they’d be hidden during lectures, etc. Also, by using the filename you can export the code to whichever file you like, not necessarily one tied to that particular notebook. The tags are then easily retrieved from the metadata section of the JSON. Some screenshots below. The prefixes indicate the directive; for example, the f: prefix is “file” and the s: prefix is “section”. Just a thought.
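As a rough sketch of how those tags could be read back out of the notebook JSON (the f:/s: prefixes follow the scheme above; the function name and the “body” default section are made up):

```python
import json
from collections import defaultdict

def route_cells_by_tag(ipynb_path):
    """Group code-cell sources by (file, section) taken from metadata tags.
    A cell tagged ["f:core.py", "s:import"] goes to core.py's import section."""
    with open(ipynb_path) as f:
        nb = json.load(f)
    routed = defaultdict(list)
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        tags = cell.get("metadata", {}).get("tags", [])
        file = next((t[2:] for t in tags if t.startswith("f:")), None)
        section = next((t[2:] for t in tags if t.startswith("s:")), "body")
        if file is not None:
            routed[(file, section)].append("".join(cell["source"]))
    return routed
```

A writer step would then emit each target file with its sections in a fixed order (imports, defs, classes), sorting within each section.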

I was going through the 002_images.ipynb notebook and came across the line torch.ByteStorage.from_buffer in the pil2tensor function. A quick Google search did not help me understand what it does. Can someone help me understand why we are using it and what it does?

Spyder, which comes free with Anaconda, has the same functionality, doesn’t it?

I wasn’t aware of this but it looks very useful - thanks

I have coded a quick proof of concept for converting notebooks, with 001b as an example: https://github.com/bny6613/nbconvert_test. Please take a look and see if it’s worth continuing in this direction.

Yes, Spyder is also very feature-rich for ML workflows. The challenge is that it is not as well known among traditional software engineers, who are more familiar with the Jetbrains or Visual Studio/VS Code products. Most of them who want to get into ML/AI end up using a combination of VS Code + Jupyter or Pycharm CE + Jupyter.

I hope that VS Code would eventually catch up with Pycharm’s capabilities around scientific mode and remote debugger. Competition is good!

Yeah, it’s a good idea; I wondered about that too. But I don’t really like how their UX makes every cell bigger, and I don’t really need anything more than a single per-cell flag saying what to export.

@Sylvain is working on this little script now, so we should have something to use pretty soon.

Hi @jeremy. Wrong @? I don’t understand enough to be able to contribute yet :neutral_face:

Sorry @Sylvain I meant @sgugger! :slight_smile:

(Although I’m not sure I understand enough to contribute yet either…)

That’s helpful to see how to do it with nbconvert - thanks for doing this. :slight_smile:

We just used plain JSON to do this for now - it seems to be working nicely. See notebook2script.py in dev_nb.

@stas I haven’t actually run this on our modules and checked that everything still works. Do you want to try that? (As you mentioned, they’re slightly out of sync, so this may require some minor tweaks to the notebooks, as well as adding the #export tags, of course.)

@stephenjohnson, I guess we will need to wait on integrating your improvements. Too many people have been changing too many things, and notebooks are not collaboration-friendly (rather counter-collaborative). If this somehow falls between the cracks, please remind me. Thank you.

Yes, will do.
