Managing data files with nbdev

I’m working on a project using nbdev where part of the library is a folder of data files. I’m having an issue where a test that involves loading one of the data files succeeds locally but fails on GitHub CI.

This is the current directory structure:

mylib
    mylib
        data_folder/
        nb_exported.py
    nbs
        data_folder/ (symlink)
        00_nb.ipynb

Locally, I can run code/tests that read files from data_folder, and nbdev_test runs successfully. When I push to GitHub, CI fails with a file-not-found error. What do I need to set up to make this work with GitHub CI?
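For illustration, the kind of code that fails looks roughly like this (example.csv is a placeholder name):

    # inside mylib/nb_exported.py, exported from nbs/00_nb.ipynb
    def load_example():
        # relative path, resolved against whatever the current working directory is
        with open('data_folder/example.csv') as f:
            return f.read()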

@KarlH you would also need to push the data folder to the GitHub repo.

The data folder was pushed to GitHub. I’m not sure why, but the relative pathing got changed somehow during CI. I fixed the issue by extracting the __path__ variable of the module and reconstructing the file path from that.

Interestingly, normal paths worked for the extracted tests, so open('data_folder.file') worked fine in tests. But for loading files in the extracted modules, I had to change things to open(mylib.__path__[0] + '/data_folder.file').
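A minimal sketch of that workaround, using a hypothetical example.csv inside data_folder:

    from pathlib import Path
    import mylib

    # __path__ is a list of directories making up the package; its first entry
    # points at the installed mylib/ directory, so paths built from it don't
    # depend on whatever working directory CI happens to use.
    DATA_DIR = Path(mylib.__path__[0]) / 'data_folder'

    with open(DATA_DIR / 'example.csv') as f:
        contents = f.read()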

For Python, a folder inside mylib is a subpackage. You should probably move the data folder to the root of the repository and use ../data_folder instead of the symbolic link.

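Concretely, the layout would become something like this:

mylib
    data_folder/
    mylib
        nb_exported.py
    nbs
        00_nb.ipynb

and a notebook cell in nbs/00_nb.ipynb could read a file with a plain relative path, no symlink needed (example.csv is hypothetical):

    # the notebook runs with nbs/ as the working directory, so ../ is the repo root
    with open('../data_folder/example.csv') as f:
        contents = f.read()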

One should really use importlib.resources.path for this. It’s the only way to find data files reliably across all the ways a package can be distributed (source/wheel).

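A minimal sketch of that approach, assuming data_folder ships inside the mylib package with an __init__.py so it is importable as mylib.data_folder, and using a hypothetical example.csv:

    from importlib import resources

    # Resolves the resource relative to the installed package, which works
    # whether mylib was installed from a source checkout or from a wheel.
    with resources.path('mylib.data_folder', 'example.csv') as p:
        with open(p) as f:
            contents = f.read()

On Python 3.9+ the importlib.resources.files() API does the same job and is the non-deprecated replacement for resources.path.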