Developer chat

(Stas Bekman) #562

For those of you who wanted to understand the Pillow-SIMD and libjpeg-turbo story with all of its complexities head to, where it’s unraveled for you including installation details with the complex nuances of it.

There is now an automated script that does the performance analysis for you.

It’s discussed in this new thread:

(Stas Bekman) #563

Also note a new dependency: nvidia-ml-py3, so make sure you either install it (pip/conda) or run 'pip install -e .[dev]aftergit pull`.

And with nvidia-ml-py3 you can start tapping into nvidia status from fastai code w/o needing the awkward parsing of nvidia-smi's output.

nvidia-ml-py3 was already on pypi and we added a conda version in the fastai channel.

And it’s super fast!

(Finn Frotscher) #564

Hi guys,
i am new to the course an am having issues with gpu management or capacity (which idk).
I outlined the issue here:
Is it a problem of management? i sure hope so

(Bobak Farzin) #565

I have been working on a simple Language Model generator in v1

All works quite smoothly and then I hit the learner.predict() and get different text generation each time I call it. Digging into the code, I see how torch.multinomial() is being called to sample from the output probabilities. I also (think) I understand the temperature and min_p parameters which allow you to control how much variation you want in your selection. (for example, if you set temp = 0 , you get a uniform selection of words and a random sample on the output.)

What I don’t see is a configuration that will take the single-best word on each pass through the loop. This could lead to “loops” where you flip-flop predictions but I think that would be good to know about your network. This would also generate the same words in the same order given the same state.

Am I missing something important here about how the text.learner.predict() works? Or a set of parameters that would meet this criteria?

(benedikt herudek) #567

Where would be a good place to discuss test coverage, e.g. as a subtopic in: Dev Projects Index ?

Made a simple test coverage for basic_data.show_batch in Motivation initially was I could try to reproduce this bug here: . But I could already make a PR of the test scripts, even though, there are no asserts: its simply coverage for this method as a regression test that it doesn’t throw exceptions.

After running make coverage, I can see other areas, where test scripts could be useful. Am aware of the general guidelines (, but maybe good to have one place to coordinate, set prios and have guidelines when asserts matter (or not)? Might help preventing non - accepted PRs.


(Stas Bekman) #568

Coverage for the sake of coverage is mostly meaningless unless it verifies the workings, but tests for the functionality that hasn’t been tested yet are very welcome.

In general you can just submit such test contributions directly via github, but if you’d like to make a study and perhaps compile a list of functionality that’s untested yet, then yes, making a dedicated to testing thread under and linking to it from Dev Projects would be a very useful contribution, @Benudek. There is one for Documentation improvements so that would be one for Testing Improvements or something like that.

(benedikt herudek) #569

drafted a task wrt testing here: Test Asserts & Coverage and linked it here: Dev Projects Index

Feel free to adjust task & process

(Stas Bekman) split this topic #570

4 posts were merged into an existing topic: Improving/Expanding Tests

(Hiromi Suenaga) #571

I was going through lesson5-sgd-mnist.ipynb and noticed that pathlib’s Path does not get imported:

I did not think much of it and did

from import *

And got past the error. However, a little later in the same notebook, I am getting this error:

data = DataBunch.create(train_ds, valid_ds, bs=bs)
AttributeError                            Traceback (most recent call last)
<ipython-input-12-16801c71829f> in <module>
----> 2 data = DataBunch.create(train_ds, valid_ds, bs=bs)

~/git/fastai/fastai/ in create(cls, train_ds, valid_ds, test_ds, path, bs, num_workers, tfms, device, collate_fn, no_check)
    112                collate_fn:Callable=data_collate, no_check:bool=False)->'DataBunch':
    113         "Create a `DataBunch` from `train_ds`, `valid_ds` and maybe `test_ds` with a batch size of `bs`."
--> 114         datasets = cls._init_ds(train_ds, valid_ds, test_ds)
    115         val_bs = bs
    116         dls = [DataLoader(d, b, shuffle=s, drop_last=(s and b>1), num_workers=num_workers) for d,b,s in

~/git/fastai/fastai/ in _init_ds(train_ds, valid_ds, test_ds)
    102     @staticmethod
    103     def _init_ds(train_ds:Dataset, valid_ds:Dataset, test_ds:Optional[Dataset]=None):
--> 104         fix_ds =, train_ds.y) # train_ds, but without training tfms
    105         datasets = [train_ds,valid_ds,fix_ds]
    106         if test_ds is not None: datasets.append(test_ds)

AttributeError: 'TensorDataset' object has no attribute 'new'

I am on commit ce85b4718865f25d0243042c0e4d051929ca3b52 and not seeing anything obvious.

Does this ring a bell to anybody?

(Stas Bekman) #572

we now have a pillow conda package built against a custom build of libjpeg-turbo (and libtiff w/ libjpeg-turbo) in the fastai test channel

I made 3.6 and 3.7 linux builds:

To install:

conda uninstall -y pillow libjpeg-turbo
conda install -c fastai/label/test pillow 

This will make your jpeg decompressing much faster (See Performance Improvement Through Faster Software Components).

Note that this is pillow-5.4.0.dev0 build - i.e. don’t use in production w/o testing.

I also uploaded pillow-simd-5.3.0.post0 w/ libjpeg-turbo and built w/ avx2 - only py36.

conda uninstall -y pillow libjpeg-turbo
conda install -c fastai/label/test pillow-simd

note that it’d get overwritten by pillow through update/install of any other package depending on pillow.

(Jeremy Howard (Admin)) #573

Many apologies @hiromi - I made a change in master recently that removes the need for from fastai import * any time you use an application (e.g. from import *). However, if you aren’t using an application, you need from fastai.basics import *. I’ve fixed all the notebooks now (I hope!)

The 2nd bug you came across is because I added a new fix_dl attribute that provides the training dataset, but without shuffling or augmentation. It only works with fastai ItemLists however, not with generic Datasets. So I’ve fixed that now (in master) to skip creating it for generic datasets.

(Kaspar Lund) #574

when i make git pull it tries to switch to “” the new developer branch ?

is this the new dev branch ?

(Hiromi Suenaga) #575

Thank you so much for the detailed explanation!

I was hoping that I can spot what has changed and fix it somehow, but still a bit slow to get my bearings. Maybe next time :slight_smile:


Breaking change: In all the text application, batch is now the first dimension (and sequence length the second dimension). It won’t impact you if you’re using the fastai models/databunch but if you were customizing things, they may need to be tweaked.

@piotr.czapla Tagging you as a warning, even if I know you’re not using master.

(Stas Bekman) #577

Thanks for the heads up, @Kaspar. I may have done something wrong while trying to add a change to a PR, I will be extra careful next time. I deleted that branch as it has been merged, so you probably shouldn’t have a problem now.

(benedikt herudek) #578

dear fellow developers: consider grabbing some test scripts

… if I may suggest :wink:

(Deena Blumenkrantz) #579

Is there a tread to report bugs in, so we can discuss them before submitting a new issue on github?

Or should I create a whole new thread for any detected bugs?

(Stas Bekman) #580

If you think it’s a bug in fastai you can submit it directly as a github issue, but you can also post here about it if you’re not sure.

(Kaspar Lund) #581

just a heads up. I have created this issue [] with ref to a notebook on how to fix the memory overhead in LanguageModelLoader. The proposed version uses less than 5% of the current version. Testing and validation of accuracy on the english corpus is still need.

Would love some feedback

(Mikhail Grankin) #582

If you call twice it will wrap learn.model twice. Not an issue on runtime, but problem if you try save and load model. May I propose unwrap model in the on_train_end method?