Exporting and timing

I followed the symlink approach you suggested in this PR (which I opened for easier reference). The libjupyterInstalledPackages.so library needs to be loaded before modules can be imported, so I had to copy it into the user-specified modules location and split the build process slightly, so that the dylib is loaded when Swift starts instead of when the packages are built.
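
Loading a shared library ahead of time, so that its symbols are available to modules imported afterwards, is usually done with dlopen. The block below is a minimal sketch of that step under this assumption, not the code from the PR; loadSharedLibrary and LibraryLoadError are hypothetical names.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

/// Hypothetical error carrying the message reported by dlerror().
struct LibraryLoadError: Error {
    let message: String
}

/// Loads a shared library (for example, a build of libjupyterInstalledPackages.so)
/// before any module that depends on it is imported.
/// RTLD_NOW resolves all symbols immediately; RTLD_GLOBAL makes them visible
/// to code that is loaded afterwards.
func loadSharedLibrary(at path: String) throws -> UnsafeMutableRawPointer {
    guard let handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL) else {
        throw LibraryLoadError(message: String(cString: dlerror()))
    }
    return handle
}
```

Calling such a helper once when the Swift process starts, rather than as part of building the packages, gives the ordering described above.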

An alternative would be to share the intermediate build files as well (by creating a symlink from the /tmp/xyzxyz directory itself). I'm not sure if that's desirable, even if it means build intermediates could be reused.
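
A minimal sketch of that alternative, assuming the per-session build directory can simply be replaced by a symbolic link to a persistent shared directory. linkBuildDirectory and the directory names are hypothetical; the FileManager calls are standard Foundation.

```swift
import Foundation

/// Replaces `buildDirectory` (for example, a per-session directory under /tmp) with a
/// symbolic link to `sharedDirectory`, so build intermediates written in one session
/// are found again in the next one. Hypothetical helper for illustration.
func linkBuildDirectory(_ buildDirectory: String, to sharedDirectory: String) throws {
    let fm = FileManager.default
    // The link target must exist before anything is built into it.
    try fm.createDirectory(atPath: sharedDirectory,
                           withIntermediateDirectories: true, attributes: nil)
    // Drop any previous private directory or stale link; ignore "not found".
    try? fm.removeItem(atPath: buildDirectory)
    try fm.createSymbolicLink(atPath: buildDirectory, withDestinationPath: sharedDirectory)
}
```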

@marcrasi Please let me know whether this is useful. I plan to work on the Swift-based notebook export idea next :slight_smile:

Thanks!! I left some comments on the PR. My biggest comment is that I think it's very desirable and important to share the intermediate build files. I explain why in the PR comment.

Regarding notebook exporting, I just wrote an all-Swift package as @marcrasi suggested. It is currently available at https://github.com/latenitesoft/NotebookExport

After you %install it in your notebook, you can use it like this:

import Path
import NotebookExport
let exporter = NotebookExport(Path.cwd/"00_load_data.ipynb")
print(exporter.toPackage())
  • toPackage() creates an independent package called ExportedNotebook_00_load_data with the exported cells and the notebook %install dependencies, which are automatically parsed from the notebook. You can also use:

  • toScript() if you want to add the exported cells as a source file inside another package called ExportedNotebooks (in this case the notebook dependencies are merged with that package's own); or

  • export() if you want to do both.

I chose the prefix ExportedNotebook to prevent conflicts with existing FastaiNotebook packages already created. It can be overridden, of course.
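
For readers curious about the mechanics, the core of such an exporter can be sketched with plain Foundation: decode the .ipynb JSON, keep the code cells that start with an //export marker, and name the package after the notebook file. This is a hypothetical sketch, not NotebookExport's actual implementation; it assumes cell sources are stored as arrays of lines (as Jupyter normally writes them), and the names NotebookFile, exportedSources and packageName are illustrative.

```swift
import Foundation

/// The small subset of the .ipynb JSON format this sketch needs.
struct NotebookFile: Decodable {
    struct Cell: Decodable {
        let cellType: String
        let source: [String]   // assumed to be an array of lines
        enum CodingKeys: String, CodingKey {
            case cellType = "cell_type"
            case source
        }
    }
    let cells: [Cell]
}

/// Returns the joined source of every code cell whose text starts with "//export".
func exportedSources(fromNotebookJSON data: Data) throws -> [String] {
    let notebook = try JSONDecoder().decode(NotebookFile.self, from: data)
    return notebook.cells
        .filter { $0.cellType == "code" }
        .map { $0.source.joined() }
        .filter { $0.hasPrefix("//export") }
}

/// "00_load_data.ipynb" -> "ExportedNotebook_00_load_data"; the prefix can be overridden.
func packageName(forNotebook path: String, prefix: String = "ExportedNotebook") -> String {
    let stem = URL(fileURLWithPath: path).deletingPathExtension().lastPathComponent
    return "\(prefix)_\(stem)"
}
```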

If you find this useful, I can submit a PR to swift-jupyter with the package implementation files so it's included with the main distribution. We can also keep it in a separate repo if that's preferable.

@vova, @jeremy: tagging you because you previously commented on this task. Happy to hear your comments (and @marcrasi's, of course!).

That sounds great! I don't think we'll have time to use it for this week's lesson, but it would be great to target moving things to it for next week's lesson.

Can you explain more about how this works for notebook modules that require other notebook modules? E.g. 02 requires 01, which requires 00. Currently we have a very hacky create_packages.py thing which copies appropriate subsets of Swift files to every package; it's not a great approach! How are you handling that? Are you instead creating proper dependencies from each notebook module? Does the user still import it by just importing the one package (e.g. the notebook 02 package)?

I suggest you go ahead and do the PR now so itā€™s easier for us all to start using it.

The toScript() thing is cool :slight_smile:
From a general usage perspective (unrelated to the fastai notebooks), it would be nice to have a way to specify the full package and source file names, not just a prefix.

I was wondering why you do it this way, because it's possible to put one file per package and have them depend on each other like a Russian doll (installing the top one will install all dependencies).

Sure! We don't want to break lessons with modules that aren't fully tested. I was counting on changes being made.

Dependencies for each package are saved inside the exported package definition, so each package refers recursively to the previous one. No flattening is performed. This means that you only need to %install one package, but you then need to import every namespace you want to use. For example, if notebook 00_test defines the function test_00 and notebook 01_test defines test_01, this is what usage would look like from a third notebook:

If that is not convenient, I can try to flatten the dependencies.

Note that the dependencies are indeed flattened for the global ExportedNotebooks package (which is similar to the FastaiNotebooks package that one of the Python scripts creates), so importing ExportedNotebooks works with just a single import line. However, separate packages make it clearer that each notebook refers to the previous notebook's code.
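
The difference between the two layouts comes down to the transitive closure of the dependency chain. A minimal sketch, assuming a hypothetical dictionary that maps each exported package to its direct dependencies (the package names are illustrative): walking it depth-first lists every module a notebook would have to import when nothing is flattened, which is also the set of code a flattened package has to contain.

```swift
/// Returns `root` and everything it depends on, directly or indirectly,
/// with dependencies listed before the packages that use them.
func transitiveDependencies(of root: String, in graph: [String: [String]]) -> [String] {
    var visited = Set<String>()
    var ordered: [String] = []
    func visit(_ name: String) {
        // insert(_:) reports whether the name was new; this skips repeats and guards against cycles.
        guard visited.insert(name).inserted else { return }
        for dependency in graph[name, default: []] {
            visit(dependency)
        }
        ordered.append(name)
    }
    visit(root)
    return ordered
}

// Hypothetical chain: 02 depends on 01, which depends on 00.
let chain = [
    "ExportedNotebook_02_test": ["ExportedNotebook_01_test"],
    "ExportedNotebook_01_test": ["ExportedNotebook_00_test"],
]
print(transitiveDependencies(of: "ExportedNotebook_02_test", in: chain))
// ["ExportedNotebook_00_test", "ExportedNotebook_01_test", "ExportedNotebook_02_test"]
```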

Anyone can now test it by installing from the external repo:

%install '.package(url: "https://github.com/latenitesoft/NotebookExport", .branch("master"))' NotebookExport

Sounds good. I'd even like to infer the source filename from the notebook and use it as a default value, but I'm not sure that can be done without some Jupyter magic.

Exactly!

There's one trick to avoid that, though it's not official.
In the 01_test notebook (aka the ExportedNotebook_test_01 package) from your example, you can add the following cell:

//export
@_exported import ExportedNotebook_test_00

This should make the ExportedNotebook_test_00 namespace available when you only import ExportedNotebook_test_01. And it seems to work with nested dependencies :slight_smile:

Yeah, that's significantly less usable than what we have now. We do need to be able to just install the previous notebook's export, like in the Python versions (and the current Swift system).

Sounds nice. How do we make it official? Or are there better options?

Awesome, thanks for sharing! With that hint I found this interesting thread about the topic. I'll do some experimenting.

As I understand it, there's still discussion about how (or whether) to allow that kind of thing in the language, and the @_exported attribute is a temporary solution.
Here's the link to the discussion topic Pedro found:

Currently @_exported works, but it's undocumented, so it's not guaranteed to work in future Swift versions.

Can't think of any, other than extracting the code into files in one package (which can be done with Pedro's tool using the toScript() method) and importing that same package in all notebooks. But it seems you want to keep each new progression of the library available separately.

This is what Iā€™m planning to try:

  • Automatically prepend @_exported to import clauses in exportable cells. From skimming through that thread, it looks like that could work.
  • If it doesn't, I'll try to prepare packages in a different way by navigating through dependencies. The problem with that sort of logic is that it may be more fragile, but I'll give it a go.
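
The first option amounts to a text rewrite of the exported cells. A minimal sketch with a hypothetical function name: it only touches lines that begin with a plain import (after optional indentation), and it relies on the underscored @_exported attribute, which, as noted in this thread, is not officially supported.

```swift
/// Prefixes every plain `import X` line with `@_exported`, so that importing the
/// exported package also re-exports X to the caller. Hypothetical helper.
/// Lines that are already marked do not start with "import ", so they are left alone.
func reexportingImports(in source: String) -> String {
    return source
        .split(separator: "\n", omittingEmptySubsequences: false)
        .map { line -> String in
            let code = line.drop(while: { $0 == " " || $0 == "\t" })
            guard code.hasPrefix("import ") else { return String(line) }
            return "@_exported " + String(code)
        }
        .joined(separator: "\n")
}
```

Because already-marked lines are skipped, running the rewrite twice on the same cell is harmless.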

I found @_exported a bit cumbersome because it still requires user cooperation (import lines must be flagged as exportable), it doesn't seem to work with operators, and it is not officially supported.

I ended up copying source files from the previously exported packages that the current one depends on. This seems to work in my preliminary tests; I'll test a copy of all the lessons tomorrow.

Note that this solution does not really address visibility: the copied sources in fact shadow the declarations in the previous packages. We probably don't even need to declare the dependency in the manifest file, as the sources are included inside the package. I'll test that too.

The main motivation for creating this system was that dependencies on external libraries such as Path or Just are computed automatically instead of being hardcoded, and dependent sources are now derived from explicit %install directives rather than from filenames.
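
A minimal sketch of reading dependencies from %install lines instead of hardcoding them. It assumes the single-line form used throughout this thread (a single-quoted package specification followed by product names); InstallDirective and parseInstall are hypothetical names, not NotebookExport's API.

```swift
import Foundation

/// One parsed `%install` line.
struct InstallDirective: Equatable {
    let packageSpec: String   // e.g. .package(url: "...", .branch("master"))
    let products: [String]    // e.g. ["NotebookExport"]
}

/// Parses `%install '<package spec>' Product1 Product2 ...`; returns nil for other lines.
func parseInstall(_ line: String) -> InstallDirective? {
    let trimmed = line.trimmingCharacters(in: .whitespaces)
    // Requiring the trailing space excludes related magics such as %install-location.
    guard trimmed.hasPrefix("%install ") else { return nil }
    // Expect exactly three pieces: the magic, the quoted spec, the product list.
    let parts = trimmed.split(separator: "'", omittingEmptySubsequences: false)
    guard parts.count == 3 else { return nil }
    let products = parts[2].split(separator: " ").map(String.init)
    guard !products.isEmpty else { return nil }
    return InstallDirective(packageSpec: String(parts[1]), products: products)
}
```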

I do have one question about the current system, though. What is the purpose of the FastaiNotebooks package that notebooks are initially exported to? In subsequent notebooks we always refer to the previous package (i.e., FastaiNotebook_01_matmul and the like), never to the "global" one. Does it exist solely to support the Python script that creates the other packages, or is it meant for something else? If it serves no special purpose, I'm happy to delete all the code that deals with it in my export library; I don't need it to create the other packages.

Sounds right. If you "unwrap" a dependency by copying its files into the new package, you don't need it as a dependency anymore. But you probably need to tell your script which dependencies you want to unwrap. For example, notebook 01:

%install ... Path
%install ... Just
%install ... NotebookExport
// ...code...
NotebookExport(Path.cwd/"01.ipynb").toPackage()

This creates the package ExportedNotebook_01 with the dependencies from the %install directives (Path, Just, NotebookExport).
And in notebook 02:

%install ... ExportedNotebook_01
%install ... SomethingElse
// ...code...
NotebookExport(Path.cwd/"02.ipynb").toPackage(unwrappingDependencies: ["ExportedNotebook_01"])

This could create the package ExportedNotebook_02, which has the code and dependencies of ExportedNotebook_01 (Path, Just, NotebookExport) plus the code and dependencies of 02.ipynb (SomethingElse).
Then in notebook 03 you can install and unwrap the ExportedNotebook_02 package, and so on.
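
The proposed toPackage(unwrappingDependencies:) behavior can be modelled in plain Swift. This sketch uses a hypothetical in-memory ExportedPackage struct in place of a real Package.swift, and a hypothetical makePackage function; it only shows how sources and dependencies would be merged, using the package names from the example above.

```swift
/// Simplified, in-memory stand-in for an exported package's manifest.
struct ExportedPackage {
    var name: String
    var sources: [String]        // source file names
    var dependencies: [String]   // package/product names it depends on
}

/// Builds a package for the current notebook. Each package in `unwrapped` contributes
/// its sources and its dependencies, and is itself dropped from the dependency list.
func makePackage(name: String, sources: [String], dependencies: [String],
                 unwrapping unwrapped: [ExportedPackage]) -> ExportedPackage {
    let unwrappedNames = Set(unwrapped.map { $0.name })
    var allSources: [String] = []
    var allDependencies: [String] = []
    for previous in unwrapped {
        allSources += previous.sources
        allDependencies += previous.dependencies
    }
    allSources += sources
    allDependencies += dependencies.filter { !unwrappedNames.contains($0) }
    // Remove duplicates, keeping the first occurrence of each name.
    var seen = Set<String>()
    allDependencies = allDependencies.filter { seen.insert($0).inserted }
    return ExportedPackage(name: name, sources: allSources, dependencies: allDependencies)
}
```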

It turns out that last night I pushed only to my work repo on my home server and forgot to push to the public remote. Sorry about that.

The way it works now, you don't need to tell it which dependencies to unwrap. We select packages that:

  • Are present in the %install cell.
  • Are local (installed with path: instead of url:).
  • Share the same parent directory with the package we are creating.
  • Have the same prefix.
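
The four selection rules can be written as a single filter. A minimal sketch, assuming a hypothetical InstalledPackage value for each package in the %install cell (so the first rule holds by construction) and comparing parent directories after standardizing the paths; packagesToUnwrap is likewise a hypothetical name.

```swift
import Foundation

/// A package listed in the notebook's %install cell (hypothetical representation).
struct InstalledPackage {
    let name: String
    let localPath: String?   // set when installed with `path:`, nil for `url:`
}

/// Keeps packages that are local, live in the same parent directory as the package
/// being created, and share its name prefix.
func packagesToUnwrap(_ installed: [InstalledPackage],
                      newPackagePath: String,
                      prefix: String) -> [InstalledPackage] {
    func parentDirectory(_ path: String) -> String {
        return URL(fileURLWithPath: path).standardizedFileURL
            .deletingLastPathComponent().path
    }
    let targetParent = parentDirectory(newPackagePath)
    return installed.filter { candidate in
        guard let path = candidate.localPath else { return false }  // url: packages are skipped
        return parentDirectory(path) == targetParent
            && candidate.name.hasPrefix(prefix)
    }
}
```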

So for the fastai use case, there is no need to specify the source package to unwrap. By the way, I do like the term "unwrap" :slight_smile:

It's solely there for the python script. It's basically a legacy of how we originally built this. We certainly don't need it.

Cool, good to know, thanks. Then I'll delete the unnecessary code and test again. Deleting code feels good!

I've tested the latest version of the NotebookExport package and it seems to be working fine with the lessons. I'm now using hard links for unwrapped dependencies instead of copying them, so they act like regular files but get updated if you re-export a previous notebook.
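
A minimal sketch of the hard-link step using Foundation's FileManager; hardLinkSource and the directory layout in it are hypothetical. One assumption worth stating: the "updated on re-export" behavior holds when the exporter rewrites files in place. If a file is instead replaced (written to a temporary file and renamed, as atomic writes do), the link keeps the old contents. Hard links also only work within a single file system.

```swift
import Foundation

/// Exposes `sourceFile` inside `directory` through a hard link and returns the new path.
/// Both paths then name the same file on disk, so rewriting the original in place is
/// immediately visible from the new package.
func hardLinkSource(_ sourceFile: String, into directory: String) throws -> String {
    let fm = FileManager.default
    try fm.createDirectory(atPath: directory, withIntermediateDirectories: true, attributes: nil)
    let fileName = URL(fileURLWithPath: sourceFile).lastPathComponent
    let destination = URL(fileURLWithPath: directory).appendingPathComponent(fileName).path
    // Replace a link left over from a previous export, if any.
    try? fm.removeItem(atPath: destination)
    try fm.linkItem(atPath: sourceFile, toPath: destination)
    return destination
}
```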

I've pushed the changes to the fastai branch, so use the following %install directive to test:

%install '.package(url: "https://github.com/latenitesoft/NoteBookExport", .branch("fastai"))' NotebookExport

Note that exported packages are prefixed with ExportedNotebook_ to prevent conflicts with the current version. To test the lessons with the new system, please update the paths accordingly, or pass a usingPrefix: argument to the export function.