How we're developing fastai_v1

Spyder, which comes free with Anaconda, has the same functionality, doesn’t it?

I wasn’t aware of this, but it looks very useful - thanks!

I have coded a quick proof of concept for converting notebooks, with 001b as an example: https://github.com/bny6613/nbconvert_test. Please take a look and see if it’s worth continuing in this direction.

Yes, Spyder is also very feature-rich for ML workflows. The challenge is that it is not as well known among traditional software engineers, who are more familiar with JetBrains or Visual Studio/VS Code products. Most of them who want to get into ML/AI end up using a combination of VS Code + Jupyter or PyCharm CE + Jupyter.

I hope VS Code will eventually catch up with PyCharm’s capabilities around scientific mode and remote debugging. Competition is good!

Yeah, it’s a good idea - I wondered about that too. But I don’t really like how their UX makes every cell bigger. And I don’t really need anything more than a single per-cell flag saying what to export.

@Sylvain is working on this little script now, so we should have something to use pretty soon.

Hi @jeremy. Wrong @? I don’t understand enough to be able to contribute yet :neutral_face:

Sorry @Sylvain I meant @sgugger! :slight_smile:

(Although I’m not sure I understand enough to contribute yet either…)

It’s helpful to see how to do it with nbconvert - thanks for doing this. :slight_smile:

We just used plain JSON to do this for now - it seems to be working nicely. See notebook2script.py in dev_nb.
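
A notebook is just JSON, so the core of the script is only a few lines. Here is a minimal sketch of the approach - the #export first-line flag and the cell['source'][1:] handling match the script shown in a diff further down this thread, while the function signature and file handling here are illustrative:

import json, re

def is_export(cell):
    # only code cells whose first line is the "#export" flag are kept
    if cell['cell_type'] != 'code': return False
    src = cell['source']
    return bool(src) and re.match(r'^\s*#\s*export\s*$', src[0]) is not None

def notebook2script(fname, out_fname):
    nb = json.load(open(fname, encoding='utf-8'))
    cells = [c for c in nb['cells'] if is_export(c)]
    # drop the "#export" marker line itself; keep the rest of each cell
    module = '\n\n'.join(''.join(c['source'][1:]) for c in cells)
    with open(out_fname, 'w', encoding='utf-8') as f: f.write(module)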

@stas I haven’t actually run this on our modules and checked everything still works - do you want to try that? (As you mentioned, they’re slightly out of sync, so this may require some minor tweaks to the notebooks - as well as adding the #export tags, of course.)

@stephenjohnson, I guess we will need to hold off on integrating your improvements. Too many people have been changing too many things, and notebooks are not collaboration-friendly (rather counter-collaborative). If this somehow falls between the cracks, please remind me. Thank you.

Yes, will do.

I looked.

  1. Well, first, this is going to be very fugly. You will need to have #export in many, many cells. And then you will add a new cell and forget to add #export. There must be a better way.

At the very least it should support ranges of cells:

#export start
cell34

cell35
#export end

so if the last bit of a notebook is all meant for .py, we would just need one range (a sketch of what that could look like follows after this list). But that’s not the real problem.

  2. The real problem is that you are going to forget some settings/code/imports that happened earlier in the notebook, since the final part will just work in the scope of the big notebook. Once you export/convert to .py, things will mysteriously break, or worse - it’d work, but give inferior results, because some bits didn’t get imported.
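
To picture the range idea from point 1: here is a hedged sketch of how the markers could be handled, assuming the converter walks the notebook JSON as in the script above. The marker strings follow the proposal; the function name and the inclusive-marker handling are hypothetical:

def export_range_cells(cells):
    # yield code cells between "#export start" and "#export end" markers,
    # stripping the marker lines themselves (hypothetical extension)
    exporting = False
    for c in cells:
        if c['cell_type'] != 'code' or not c['source']: continue
        src = c['source']
        if src[0].strip() == '#export start': exporting, src = True, src[1:]
        if not exporting: continue
        if src and src[-1].strip() == '#export end':
            exporting, src = False, src[:-1]
        yield {**c, 'source': src}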

I feel that simply splitting the last part of the notebook into a whole separate notebook is the most sane and clean way to go (especially since you have been splitting them already anyway). That way you don’t need any weird markup and can just convert the notebook independently - and it’s self-contained.

And with regards to #2 - if it’s a separate notebook not relying on anything in the “build-up” notebooks, any errors and omissions would be instantly obvious.

I wrote about this in more detail here.

The only minor detriment of this approach would be during teaching - you’d need to switch to the 00XX-final notebook at some point, rather than having it all in one tab; now it’ll be several tabs. Though, again, since you already started splitting them (and you also did so in ml1 2018 ed), it should be just an extension of that.

And of course the generated .py will have a big:

#################################################
### THIS FILE WAS AUTOGENERATED! DO NOT EDIT! ###
#################################################

Actually, why do we even want to maintain this .py export? We could just provide a doc snippet telling people how to extract .py from any final notebook, and then we don’t need to maintain it. Since, again, we will forget to re-run the export on every update of the .ipynb, the committed .py and .ipynb versions will not stay synchronized, causing confusion and a waste of time for all involved.

Also, since we don’t call print() in the notebook, but rely on its built-in display, the .py will have problems unless you start using print() and also stop relying on IPython features which might not be available in .py. It sounds like it’s much better to just let end users figure it out, and have some docs explaining how to adjust things (e.g. re-adding print()).
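
To illustrate with a toy example (not from the notebooks): the trailing expression of a cell is displayed by IPython automatically, but the same line in an exported .py runs silently.

import numpy as np

x = np.arange(10)

# in a notebook cell, the trailing expression is displayed automatically:
x.mean()

# in an exported .py, the same line produces no output; it has to become:
print(x.mean())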

I vote for not exporting .py at all, and just keeping a nice, fully self-contained 00XX-final.ipynb.

It’s possible to write a pre-commit hook to automatically export from new/modified notebooks to .py files and also automatically add them to the commit.
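
For reference, a minimal sketch of such a hook, saved as .git/hooks/pre-commit and made executable - the nbconvert invocation and the output naming are assumptions, not necessarily what this repo would use:

#!/usr/bin/env python
# sketch of a pre-commit hook: re-export staged notebooks and stage the .py output
import subprocess

staged = subprocess.run(['git', 'diff', '--cached', '--name-only'],
                        capture_output=True, text=True, check=True).stdout.splitlines()
for nb in staged:
    if nb.endswith('.ipynb'):
        subprocess.run(['jupyter', 'nbconvert', '--to', 'python', nb], check=True)
        subprocess.run(['git', 'add', nb[:-len('.ipynb')] + '.py'], check=True)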

It’s doable, but it most likely shouldn’t be done due to a variety of issues with autogenerated .py files, only some of which I have discussed earlier.

If it’s an automatic conversion, any user should be able to do it themselves if they need to:

jupyter nbconvert --to python *.ipynb

Otherwise, it’s going to be just a huge number of files in the repository that contribute nothing. IMHO, of course.

Edit: I meant that in the case where we decide to make a standalone 00XX-final.ipynb. Of course, if the decision is to extract some bits from the main notebook, then what you suggested, @vha14, is definitely the way to go.

I think the goal is just to keep the functions that will be useful for future notebooks during development. It’s quick and easy, and the point in Jeremy’s mind isn’t to hurt readability but to show everyone reading them later what was worth keeping.
I’m not sure the .py files are meant to be used standalone; they’re just our way to document the process of creating a new library (or trying to). The final result will have proper modules built from the functions in those files, but it’s not going to be auto-generated from the notebooks.

It’s even more than that - these notebooks are designed to be the basis of part 2. So as I go thru them in class, I’ll be explaining why some cells are worth saving for later.

I’ve just gone thru 001b nb and added #export to the relevant cells, and am now testing the auto-generated result in 002 nb.

Also, @lesscomfortable @313V I’ve removed the dataloader transforms from that notebook now, since it turns out we probably don’t want to use that approach. Also, I added a crazy idea that @313V had, which is to integrate tqdm directly into the dataloader. Not sure how I feel about it yet, but it’s kinda fun… :slight_smile:

Now that both of you have clarified that, I see that I understood it differently. What you’ve explained doesn’t quite match, in my mind, what Jeremy wrote in the very first post of this thread, but I can see now how this is what he meant to convey.

The confusing part was Jeremy saying:

“Then when that all looks pretty good, I download the notebook as a python file, remove all the code that isn’t necessary for the final approach, and save that as a module.”

Whereas really you are just saving some functions, not the complete final approach, and you’re planning to use those functions for the new fastai codebase. I.e. this is not meant for the end user as is; instead it’s a transitional phase before all of these get morphed into the v1 fastai codebase, and once they’re in the codebase, all those .py files will probably get nuked.

Did I understand it correctly?

You should probably apply this patch so the two files match (because we have more than one notebook starting with the same number, which breaks the naming as it was coded originally):

--- a/dev_nb/notebook2script.py
+++ b/dev_nb/notebook2script.py
@@ -1,4 +1,4 @@
-import json, fire, re
+import json, fire, re, os.path

 def is_export(cell):
     if cell['cell_type'] != 'code': return False
@@ -13,7 +13,7 @@ def notebook2script(fname):
     code_cells = [c for c in cells if is_export(c)]
     module = ''
     for cell in code_cells: module += ''.join(cell['source'][1:]) + '\n\n'
-    number = fname.split('_')[0]
-    with open(f'nb_{number}.py','w') as f: f.write(module[:-2])
+    fname = os.path.splitext(fname)[0]+'.py'
+    with open(f'nb_{fname}', 'w') as f: f.write(module[:-2])

 if __name__ == '__main__': fire.Fire(notebook2script)

Remember that nb_001b.py doesn’t match its originating notebook, so it probably needs to be ported back first :wink: but of course I trust you’ll see the diff.

Nearly right. The morphing bit is this:

Once we’ve got a good solution to a whole problem domain (like computer vision), we can combine the various modules built along the way into a well-designed set of one or more modules, which we can then add tests and docs to.

But we don’t get rid of the modules built along the way. They’re used in part 2 of the course - each one supports the next notebook (that’s why we import the previous notebook’s module at the start of each notebook).
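
Concretely, each notebook’s first cell pulls in the previous notebook’s generated module; with the nb_001b.py naming used in this thread, notebook 002 would start with something like:

from nb_001b import *   # everything #export-ed from notebook 001b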

So the idea is that students will get to experience something close to the development process of the whole library, and in so doing, understand the problems that are solved along the way (which must by definition cover all the details of every deep learning application supported by the library).
