Developer chat

Figured. Since the text cleaner is mostly just meant for at-a-glance evaluation, a hacky undo-for-display method seems to work well enough, but I agree with miko that it would be nice if the databunch object could link back to the original texts using some kind of primary-key-like system. Maybe I will look into that next.

Hi guys,

I wanted to ask about this:

Is there any etiquette I should be following when submitting or making changes in a PR?

For example, here Jeremy had requested a few changes. I'm worried that I may have made too many comments. Should I just stick to a thumbs up?
I made the changes in the browser, with each change as a little commit, which might have annoyed the dev(s) with a bunch of notifications.

Am I doing it incorrectly? Is there anything I should be doing differently?

Thanks,
Sanyam.

Good question @init_27, I had similar thoughts. An 'etiquette' guide could help define, among other things, whether smaller or larger PRs are preferred, whether to reply to review comments on GitHub or rather avoid the notifications, and whether discussions about the approach belong on GitHub or in the forum.

If you use a sentencepiece Tokenizer(), you can decode and restore the full text; it is fully reversible. I have an example of doing that if it is helpful. There are a couple of steps to set up your own custom tokenizer, but it is not hard to get it to fit inside the current wrappers.
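
(Not the actual example I mentioned, but as a quick illustration of the reversibility: assuming you already have a trained SentencePiece model file, here called spm.model, something like this should round-trip, as long as the input text is already normalized.)

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load('spm.model')                # hypothetical path to your trained model

text = "This sentence should come back unchanged."
ids = sp.EncodeAsIds(text)          # tokenize to subword ids
assert sp.DecodeIds(ids) == text    # decoding restores the original text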


Interesting. I'd like the TextCleaner widget to work with fastai's defaults, and I've found that a regex, rule-based decoding the user can optionally apply restores enough readability to the text for this use case.
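
Just to illustrate the kind of rule I mean, here is a toy sketch (not the widget's actual rules) that undoes a couple of fastai's default special tokens for display:

import re

def prettify(t):
    # undo fastai's xxmaj marker by capitalizing the following word
    t = re.sub(r'xxmaj (\w)', lambda m: m.group(1).upper(), t)
    # drop the beginning-of-stream marker entirely
    t = re.sub(r'xxbos ?', '', t)
    return t

print(prettify("xxbos xxmaj the model xxmaj works"))  # -> "The model Works"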

I would be interested in seeing your example though.

We don't have an exact specification and are just trusting that the contributors have common sense and will do their best knowing what they know.

Not having dozens of commits/notifications would have been nicer for sure. If you know you have to make a lot of changes, the best approach to minimize noise is to close the PR, do all the fixes, and then submit a new PR.

On the other hand, it might be more difficult to track the suggested and the corresponding changes, so doing the changes in the existing PR has its own benefits.

And surely, if you don't need to make a specific comment but are just saying yes, a thumbs up is certainly more efficient, since it'll generate fewer messages/notifications.

Either way, I trust you to make the best decision on when to combine several small fixes into a single commit and when to keep them separate, and on when to create a new PR vs. continuing with the existing one.

And last but not least, please don't be stressed out about it - we are grateful for everybody's contributions, and little by little you will feel comfortable doing it in the best possible way you know.


Would you recommend a way other than dual boot to install fastai/Linux on a Windows machine (and still be able to access the GPU)? ty

If you just need a few windows apps that you can’t get on linux, run linux as the host and use virtualbox/vmware/other virtualization software to run a windows client for just those apps. Of course, running 2 OSes concurrently will use more RAM, but it should be easy to suspend the windows virtual client when you don’t need it, so that it won’t interfere.

That way you won’t need dual boot.


New custom dependencies install feature

If you don't want to install all the fastai dependencies because you only need, say, the vision or text ones, the custom dependency groups are now automated, so you can do:

pip selective dependency installation:

pip install --no-deps fastai
pip install $(python setup.py -q deps --dep-groups=core,vision)

same for conda:

conda install --no-deps -c fastai fastai
conda install -c pytorch -c fastai $(python setup.py -q deps --dep-conda --dep-groups=core,vision)

Adjust the --dep-groups argument to match your needs; the list of available groups can be obtained from:

python setup.py -q deps

You should get something like:

Available dependency groups: core, text, qrnn, vision

This assumes you’re inside the fastai git repo.

What happens behind the scenes is:

python setup.py -q deps --dep-groups=core,vision

returns:

Pillow beautifulsoup4 bottleneck dataclasses;python_version<'3.7' fastprogress>=0.1.18 matplotlib numexpr numpy>=1.12 nvidia-ml-py3 packaging pandas pyyaml requests scipy torch>=1.0.0 torchvision typing

There is another option that produces the same output, but quoted, suitable for manual copy-n-paste:

# pip:
python setup.py -q deps --dep-groups=core,vision --dep-quote
# conda:
python setup.py -q deps --dep-groups=core,vision --dep-quote --dep-conda

So the output for pip will look like:

"Pillow" "beautifulsoup4" "bottleneck" "dataclasses;python_version<'3.7'" "fastprogress>=0.1.18" "matplotlib" "numexpr" "numpy>=1.12" "nvidia-ml-py3" "packaging" "pandas" "pyyaml" "requests" "scipy" "torch>=1.0.0" "torchvision" "typing"

I couldn't figure out how to make the quoted output work with backticks/$(cmd); the words don't get split then. If you can figure out how to feed the quoted output directly into pip install, please let me know.
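
One possible workaround, if you want to consume the quoted output programmatically, is to let Python's shlex do the word splitting (an untested sketch; in the shell, piping the output through xargs, which understands quotes, might also work):

import shlex, subprocess, sys

# capture the quoted dependency list printed by the custom setup.py command
out = subprocess.run(
    [sys.executable, "setup.py", "-q", "deps", "--dep-groups=core,vision", "--dep-quote"],
    capture_output=True, text=True, check=True,
).stdout

# shlex.split honors the quoting, so each dependency becomes one argument to pip
subprocess.run([sys.executable, "-m", "pip", "install", *shlex.split(out)], check=True)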

The full docs are here: https://docs.fast.ai/install.html#custom-dependencies

This is totally new (a custom distutils command that I just invented), so feedback is welcome (a more intuitive API, etc.).


Hi. I ran lesson1-pets.ipynb again with learn.export() and load_learner().
It worked, but I do not understand why I got a learner on CUDA and not on CPU (screenshot below).
My fastai version: 1.0.42 on Windows 10

When you use load_learner it automatically puts the model on defaults.device. So it was saved on the CPU, but when you load it, it’s put on the GPU if you have one.

Thanks for the answer, Sylvain. Would it be possible to add an argument to load_learner() to put the model on the CPU (since I want to use my learner object to get predictions, I do not need the GPU after loading)?

Not really, no; it uses the usual Learner init, which doesn't take that argument. That's why there is a defaults object you can change.

Ok, but it means it is necessary to write two lines of code instead of one to load a learner for getting predictions on the CPU (when the notebook is running on CUDA).

defaults.device = torch.device('cpu')
learn = load_learner(path)

Another point: in the docs and code for load_learner(), it is written:

Load a Learner object saved with export_state in path/fn with empty data, optionally add test and load on cpu.

That is true, but it is a bit confusing given that defaults.device = torch.device('cpu') is needed to actually get the model on the CPU in practice (when the notebook is running on CUDA).

If you have a GPU, you gain nothing by putting everything on the CPU, so I don't get why it's a problem. We're only talking about doing inference on the CPU because you'd do it on a machine/server that doesn't have access to a GPU (in which case the default device will be the CPU).


Thanks. It is clear.

Hi. I have been installing the dev version from github using pip install git+https://github.com/fastai/fastai.git.

For the latest version, it fails because the cu/cpp files in fastai/text/models/ were not installed.

I am guessing this has something to do with the MANIFEST.in file (it does not explicitly include *.cpp), but I have no idea how to fix it (so no PR).

TIA,
Etay

That is a very vague report, @emaror. What fails?

It appears those were experimental and used only by those who installed with pip install -e .

Now added fastai/text/models/forget_mult*{cpp,cu} to MANIFEST.in as you suggested, so everybody will have access to those.

If you still need help, please remember we don’t have an 8-ball (yet), so you need to be explicit in your help requests.

Right, I get an exception when importing QRNNLayer:

from fastai.text.models.qrnn import QRNNLayer

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/fastai/text/models/qrnn.py", line 11, in <module>
    forget_mult_cuda = load(name='forget_mult_cuda', sources=[fastai_path/f for f in files])
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 645, in load
    is_python_module)
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 793, in _jit_compile
    with_cuda=with_cuda
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/_cpp_extension_versioner.py", line 44, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/_cpp_extension_versioner.py", line 16, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/fastai/text/models/forget_mult_cuda.cpp'

Copying the cpp (and cu) files to the installation directory fixed the issue for me.
Hopefully the change to MANIFEST.in fixes it for everyone.

Thanks!

that’s much better, thank you! yes, please try again after installing from git.