Developer chat

(benedikt herudek) #702

Fellow coders, I’m sure someone has given this some thought before (I think Jeremy also remarked on this in answer to a question), so I thought it worthwhile to pick some smart brains.

Could lr_find spit out a smart default recommendation for which lr to choose? While 3e-3 seems a good default choice, one could also inspect the graph and look for a reasonably long, steep downward slope.

Not suggesting complete automation here, but rather that learn.recorder.plot(), which you typically call after lr_find, could print advice like ‘here are 3 good tips for choosing the lr’ instead of one reading it just from the graph.
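For illustration, here is a rough sketch (not fastai’s actual API, just what such an advisor could compute), assuming we already have the learning rates and losses recorded by lr_find: it returns the lr at the point of steepest loss decrease along the log-lr axis.

```python
import numpy as np

def suggest_lr(lrs, losses):
    """Suggest a learning rate from lr_find-style results: the point of
    steepest loss decrease (most negative gradient w.r.t. log(lr))."""
    lrs = np.asarray(lrs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    grads = np.gradient(losses, np.log(lrs))  # slope of loss vs. log(lr)
    return lrs[np.argmin(grads)]              # lr where loss drops fastest
```

One could print this alongside the plot as a starting suggestion, with the usual caveat that you should still eyeball the graph.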

(Stas Bekman) #703


(benedikt herudek) #704

yes, thx.

(Sanyam Bhutani) #707

I’ve submitted my first PR (not to fastai but to any repo) Here.

@wdhorton and I have been porting the lesson nb(s) to Kaggle kernels; this is a PR to add the section to the website, as well as the respective links in the lesson notebooks.

Here is the Kaggle kernel discussion thread

(WG) #708

Any chance there is a URLs.WT103_1_bwd in the works?

(Piotr Czapla) #709

You can train one easily on ulmfit-multilingual, or it will be trained once we finish the bidir models.
Why do you need an ensemble?

(WG) #710

“Need” is still to be determined … but ensembling my fine-tuned forward- and backward-trained LMs using the old fastai significantly improved my downstream text classifier.

So yah, I would like to try that again and see what improvement, if any, it makes with the latest codebase.

(Jonathan Miller) #711

Hello all,

I am currently taking the course and would like to contribute, despite being a noob at both software dev and deep learning. I am working on a TextCleaner widget, similar to the ImageCleaner widget. However, as far as I know, fastai data bunches only store the tokenized text, which is not very human-readable.

I would like to check whether there is any easy way to undo the tokenization before I write a hacky method to do it myself.
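For what it’s worth, a display-only undo can get surprisingly far. Below is a hacky sketch assuming fastai v1’s special tokens (xxbos, xxmaj, xxup, etc.); the regex rules are illustrative, not a true inverse of spacy’s tokenization:

```python
import re

def detok_for_display(tokens):
    """Rough display-only 'detokenizer' for fastai-v1-style token lists.
    Not a real inverse of the tokenizer -- just enough for human reading."""
    out, mode = [], None
    for t in tokens:
        if t in ('xxbos', 'xxfld', 'xxpad'):
            continue                      # drop structural markers
        if t == 'xxmaj':
            mode = 'cap'; continue        # next word: capitalize
        if t == 'xxup':
            mode = 'up'; continue         # next word: all caps
        if mode == 'cap':
            t = t.capitalize()
        elif mode == 'up':
            t = t.upper()
        mode = None
        out.append(t)
    text = ' '.join(out)
    text = re.sub(r"\s+([.,!?;:%)\]])", r"\1", text)              # no space before closing punct
    text = re.sub(r"([(\[])\s+", r"\1", text)                     # no space after opening punct
    text = re.sub(r"\s+(n't|'s|'re|'ve|'m|'ll|'d)", r"\1", text)  # re-attach contractions
    return text

# e.g. detok_for_display(['xxbos', 'xxmaj', 'the', 'movie', 'was', 'xxup', 'great', '!'])
# gives "The movie was GREAT!"
```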



Sadly, there is no way to reverse the tokenization from spacy.


I am having the same problem and have been wondering about possible solutions. One thing I can think of (which would also make some of the nice features implemented for images more easily portable to text) is to use some kind of primary-key-like feature: for images it looks like we use the path; for text it could be either the path or some column, if the text is stored in a df. Another option is to use the original text as the index, and have the show methods of the databunch actually display those instead of the tokenized version. Sadly, I am afraid this is out of my coding comfort zone :frowning:

(Jonathan Miller) #714

Figured. Since the text cleaner is mostly just meant for at-a-glance evaluation, a hack undo-for-display method seems to work well enough, but I agree with miko that it would be nice if there were some way for the databunch object to be able to link back to the original texts using some kind of primary key-like system. Maybe I will look into that next.
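A minimal sketch of that primary-key idea (the names here are made up for illustration; nothing like this exists in fastai): keep the raw text keyed by a stable id next to the processed item, and have the show path look up the original instead of trying to de-tokenize.

```python
class TextStore:
    """Toy illustration: raw texts and processed items share a key,
    so display code can always recover the original."""

    def __init__(self):
        self.raw = {}        # key -> original text
        self.processed = {}  # key -> token list (stand-in for a databunch item)

    def add(self, key, text, tokens):
        self.raw[key] = text
        self.processed[key] = tokens

    def show(self, key):
        # Display path: look up the original rather than undoing tokenization.
        return self.raw[key]
```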

(Sanyam Bhutani) #715

Hi guys,

I wanted to ask about this:

Is there any etiquette I should be following when submitting/making changes in a PR?

For example, here, Jeremy had requested a few changes. I’m nervous that I may have made too many comments; should I just stick to a thumbs up?
I’ve made the changes in the browser, making each change a little commit, which might have annoyed the dev(s) with a few notifications.

Am I doing it incorrectly? Any things I should be doing otherwise?


Improving/Expanding Functional Tests
(benedikt herudek) #716

good question @init_27, I had similar thoughts. An ‘etiquette’ could help define, among other things, whether smaller or larger PRs are preferred, whether to comment on comments in GitHub or rather avoid notifications, and whether discussions on approach belong in GitHub or on the forum.

(Bobak Farzin) #717

If you use a sentencepiece Tokenizer(), you can decode and restore the full text; it is fully reversible. I have an example of doing that if it’s helpful. There are a couple of steps to set up your own custom tokenizer, but it is not hard to get it to fit inside the current wrappers.

(Jonathan Miller) #719

Interesting. I’d like the TextCleaner widget to work with fastai’s defaults, though, and I’ve found that a regex, rule-based decoding the user can optionally apply restores enough readability to the text for the use case.

I would be interested in seeing your example though.

(Stas Bekman) #720

We don’t have an exact specification and are just trusting that contributors have common sense and will do their best with what they know.

Not having dozens of commits/notifications would have been nicer, for sure. If you know you have to make a lot of changes, the best approach to minimize noise is to close the PR, do all the fixes, and then submit a new PR.

On the other hand, it might be more difficult to track the suggested and the corresponding changes, so doing the changes in the existing PR has its own benefits.

And surely, if you don’t need to make a specific comment but are just saying yes, a thumbs up is certainly more efficient, since it generates fewer messages/notifications.

Either way, I trust you to make the best decision about when to combine several small fixes into a single commit and when to keep them separate, and about creating a new PR vs. continuing with the existing one.

And last but not least, please don’t be stressed out about it; we are grateful for everybody’s contributions, and little by little you will feel comfortable doing it in the best possible way you know.

(Florian Mutel) #721

Would you recommend a way other than dual boot to install fastai/Linux on a Windows machine (and still be able to access the GPU)? ty

(Stas Bekman) #722

If you just need a few Windows apps that you can’t get on Linux, run Linux as the host and use VirtualBox/VMware/other virtualization software to run a Windows client for just those apps. Of course, running two OSes concurrently uses more RAM, but it should be easy to suspend the Windows virtual client when you don’t need it, so that it won’t interfere.

That way you won’t need dual boot.

(Stas Bekman) #723

New custom dependencies install feature

If you don’t want to install all the fastai dependencies, because you only want, say, the vision or text dependencies, the custom dependency groups are now automated, so you can do:

pip selective dependency installation:

pip install --no-deps fastai
pip install $(python setup.py -q deps --dep-groups=core,vision)

same for conda:

conda install --no-deps -c fastai fastai
conda install -c pytorch -c fastai $(python setup.py -q deps --dep-conda --dep-groups=core,vision)

adjust the --dep-groups argument to match your needs, which you can get from:

python setup.py -q deps

You should get something like:

Available dependency groups: core, text, qrnn, vision

This assumes you’re inside the fastai git repo.

What happens behind the scenes is:

python setup.py -q deps --dep-groups=core,vision


Pillow beautifulsoup4 bottleneck dataclasses;python_version<'3.7' fastprogress>=0.1.18 matplotlib numexpr numpy>=1.12 nvidia-ml-py3 packaging pandas pyyaml requests scipy torch>=1.0.0 torchvision typing

There is another option to get the same but quoted output suitable for manual copy-n-paste:

# pip:
python setup.py -q deps --dep-groups=core,vision --dep-quote
# conda:
python setup.py -q deps --dep-groups=core,vision --dep-quote --dep-conda

So the output for pip will look like:

"Pillow" "beautifulsoup4" "bottleneck" "dataclasses;python_version<'3.7'" "fastprogress>=0.1.18" "matplotlib" "numexpr" "numpy>=1.12" "nvidia-ml-py3" "packaging" "pandas" "pyyaml" "requests" "scipy" "torch>=1.0.0" "torchvision" "typing"

I couldn’t figure out how to make the quoted output work with backticks/$(cmd); the shell won’t split the words then. If you can figure out how to feed the quoted output directly into pip install, please let me know.

The full docs are here:

This is totally new (a custom distutil command that I just invented), so feedback is welcome (more intuitive API, etc.)
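Conceptually, the command boils down to mapping group names to requirement lists and printing them space-joined. A toy sketch of that idea (the group contents below are abbreviated stand-ins, not fastai’s real lists, and this is not the actual implementation):

```python
# Hypothetical core of a "deps" command: pick requirement groups and print
# them space-separated for $(...) substitution, optionally quoted.
DEP_GROUPS = {  # made-up minimal example, not fastai's real dependency lists
    'core':   ['numpy>=1.12', 'pandas', "dataclasses;python_version<'3.7'"],
    'vision': ['Pillow', 'torchvision'],
    'text':   ['spacy'],
}

def deps_output(groups, quote=False):
    # union of the requested groups, deduplicated and sorted for stable output
    reqs = sorted({r for g in groups for r in DEP_GROUPS[g]})
    if quote:
        return ' '.join('"%s"' % r for r in reqs)  # copy-n-paste friendly
    return ' '.join(reqs)                          # $(...)-substitution friendly
```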

(Pierre Guillou) #724

Hi. I ran lesson1-pets.ipynb again with learn.export() and load_learner().
It worked, but I do not understand why I got a learner on cuda and not on cpu (screenshot below).
My fastai version: 1.0.42 on Windows 10.