Performance Improvement Through Faster Software Components

This thread is dedicated to tips and tricks for improving the performance of your ML/DL code through faster software components, without changing your code or upgrading your hardware.

First, note a new document where we would like to compile all kinds of performance improvements, so please feel free to contribute.

As of this writing, the bulk of it is Faster Image Processing. But hopefully this will change over time as you contribute performance improvements in other domains.

Some of the discussed performance-improvement checks can be automated. Here is the very first version of the automated checker. It will become available in fastai-1.0.38; until then, use git pull and then run:

python -m fastai.utils.check_perf

Currently on my setup I get:

Running performance checks.

*** libjpeg-turbo status
❓ libjpeg-turbo's status can't be derived - need Pillow(-SIMD)? >= 5.4.0 to tell, current version 5.3.0.post0
5.4.0 is not yet available, other than the dev version on github, which can be installed via pip from git+ See

*** Pillow-SIMD status
✔ Running Pillow-SIMD 5.3.0.post0

*** CUDA status
✔ Running the latest CUDA 10.0.130 with NVIDIA driver 410.79

Refer to to make sense out of these checks and suggestions.

See also: GPU Optimizations Central

p.s. It’s a wiki post, so anybody can edit it.


I want to report my problem; it may be useful for others, and maybe something to investigate.

Installing fastai in a conda virtualenv gives me problems when trying to install Pillow-SIMD; without trying to install Pillow-SIMD everything works.

Trying again to recreate the fastai env and install pillow-simd:

  • If I install pillow-simd without uninstalling pillow using the command

from fastai.vision import *

I got an error due to the fact that pillow-simd is a 5.3.0.post0 release, vs pillow at 5.4.x.

  • Uninstalling pillow and installing pillow-simd again with the same command, I got this error:
    AttributeError                            Traceback (most recent call last)
    <ipython-input-3-c0e76450f370> in <module>
    ----> 1 from fastai.vision import *

    ~/anaconda3/envs/fastai-py37/lib/python3.7/site-packages/fastai/vision/__init__.py in <module>
          2 from ..basics import *
          3 from .learner import *
    ----> 4 from .data import *
          5 from .image import *
          6 from .transform import *

    ~/anaconda3/envs/fastai-py37/lib/python3.7/site-packages/fastai/vision/data.py in <module>
        205 def verify_image(file:Path, idx:int, delete:bool, max_size:Union[int,Tuple[int,int]]=None, dest:Path=None, n_channels:int=3,
    --> 206                  interp=PIL.Image.BILINEAR, ext:str=None, img_format:str=None, resume:bool=False, **kwargs):
        207     "Check if the image in `file` exists, maybe resize it and copy it in `dest`."
        208     try:

    AttributeError: module 'PIL' has no attribute 'Image'

Reinstalling pillow 5.3.0 got the same error.

Something is going on, no idea what, but it's a pity I can't use the SIMD acceleration.
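An `AttributeError: module 'PIL' has no attribute 'Image'` usually means the PIL package on disk is broken or partially removed. A hypothetical diagnostic (not part of fastai) to see which PIL, if any, the current environment would actually import, without triggering the broken import itself:

```python
# Hypothetical diagnostic: locate the PIL package the interpreter would
# import, without actually importing it (so a broken install can't crash us).
import importlib.util

def pil_install_info():
    spec = importlib.util.find_spec("PIL")
    if spec is None:
        return "PIL is not importable in this environment"
    return "PIL package found at: %s" % spec.origin

print(pil_install_info())
```

If the reported path points at a half-removed package directory, reinstalling pillow or pillow-simd into that environment should fix the import.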

Didn’t know if it has something to do with my hw using a Ryzen CPU and a dual GPU setup (AMD and Nvidia).


Thank you for sharing this info, @davide445.

Didn’t know if it has something to do with my hw using a Ryzen CPU and a dual GPU setup (AMD and Nvidia).

Are you using the custom packages I made here?

It’s very possible that there is some kind of platform mismatch, since I built those on an Intel CPU. Can anybody else who is not on an Intel CPU confirm whether the conda packages work for you or not?

I asked about this here but got no reply.

Have you tried building from source?

I got an error due to the fact that pillow-simd is a 5.3.0.post0 release, vs pillow at 5.4.x.

It always helps to paste the exact error message, the exact command that generated it, and any other related commands, plus the output of show_install.


Are you using the custom packages I made here?

No, I just followed steps 1-4. What I might have done wrong is mixing conda and pip installation.
Pardon me, as a beginner: can you please confirm these are the correct steps? Your installation instructions are not clear to me.

  1. conda uninstall -y --force pillow pil jpeg libtiff
  2. conda install -c conda-forge libjpeg-turbo
  3. conda install -c fastai/label/test pillow-simd
  4. conda install -y jpeg libtiff

Yes, you’re using the custom experimental packages. They are invisible to general public, unless they know to include -c fastai/label/test.

So try building from source:

conda uninstall pillow pillow-simd jpeg libtiff --force
pip   uninstall pillow pillow-simd jpeg libtiff 
conda install -c conda-forge libjpeg-turbo
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

There is no problem mixing conda and pip in this case. But you can build from source only with pip, unless you want to build your own conda package. You will find notes on how to do that here: (i.e. that’s how those experimental packages were built).

But first try the approach I listed above and see if that works. Most likely those conda packages will only work on a similar CPU.


Just to report: my previous sequence does have problems, so it can't be the correct one.

conda install -c fastai/label/test pillow-simd

After new installations and an upgrade, it generates this:

The following packages will be DOWNGRADED:

certifi:           2018.11.29-py37_1000                 conda-forge       --> 2018.11.29-py36_0                
mkl:               2019.1-144                                             --> 2018.0.3-1                       
mkl_fft:           1.0.10-py37ha843d7b_0                                  --> 1.0.6-py36h7dd41cf_0             
mkl_random:        1.0.2-py37hd81dba3_0                                   --> 1.0.1-py36h4414c95_1             
numexpr:           2.6.9-py37h9e4a6bb_0                                   --> 2.6.8-py36hd89afb7_0             
python:            3.7.1-hd21baee_1000                  conda-forge       --> 3.6.8-h0371630_0                 
pytorch:           1.0.0-py3.7_cuda9.0.176_cudnn7.4.1_1 pytorch           --> 0.4.1-py36ha74772b_0             
scipy:             1.2.0-py37h7c811a0_0                                   --> 1.1.0-py36hfa4b5c9_1             
spacy:             2.0.18-py37hf484d3e_1000             fastai            --> 2.0.16-py36h962f231_0            
thinc:             6.12.1-py37h637b7d7_1000             fastai            --> 6.12.1-py36h4989274_0      

So pytorch will be downgraded from 1.0.0 to 0.4.1, which I suppose can't be correct.

Allowing this, and also step 4, then checking the environment results in:

(fastai-py37) dz@DSPC:~$ python -m fastai.utils.check_perf
Traceback (most recent call last):
  File "/home/dz/anaconda3/envs/fastai-py37/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/dz/anaconda3/envs/fastai-py37/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dz/anaconda3/envs/fastai-py37/lib/python3.6/site-packages/fastai/utils/check_perf.py", line 1, in <module>
    from ..script import *
  File "/home/dz/anaconda3/envs/fastai-py37/lib/python3.6/site-packages/fastai/script.py", line 2, in <module>
    from dataclasses import dataclass
ModuleNotFoundError: No module named 'dataclasses'  

So this is not the right route.

Where do you see it suggesting to downgrade to pytorch-0.4.1? The output you pasted shows 1.0.0.

ModuleNotFoundError: No module named 'dataclasses'  

How did you install fastai? dataclasses is a dependency that it installs for py36.

Ah! It looks like conda env was created for py37, so fastai didn’t install dataclasses (built in in py37), but then you’re running py36. Somewhere your conda env got downgraded to py36. That’s the problem. Either move back to py37, or reinstall fastai which will install dataclasses.
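The py36/py37 split described above can be sketched in a few lines: on 3.7+ `dataclasses` is in the stdlib, while on 3.6 it has to come from the pip backport package, so an env silently downgraded to py36 loses it.

```python
# Sketch of the failure mode: `dataclasses` is stdlib on py3.7+,
# but only available via the pip backport package on py3.6.
import sys

try:
    from dataclasses import dataclass
    have_dataclasses = True
except ImportError:
    have_dataclasses = False

print(sys.version_info[:2], have_dataclasses)
```

On an intact py37 env this prints `(3, 7) True` (or later); a py36 env without the backport prints `(3, 6) False`, which is exactly the `ModuleNotFoundError` seen above.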


Your other method didn’t generate errors, but

(fastai-py37) dz@DSPC:~$ python -m fastai.utils.check_perf
Running performance checks.

*** libjpeg-turbo status
❓ libjpeg-turbo's status can't be derived - need Pillow(-SIMD)? >= 5.4.0 to tell, current version 5.3.0.post0

*** Pillow-SIMD status
✔ Running Pillow-SIMD 5.3.0.post0

*** CUDA status
✘ You are running pytorch built against cuda 9.0.176, your NVIDIA driver 410.79 supports cuda10. See to install pytorch built against the faster CUDA version.

Refer to to make sense out of these checks and suggestions.

So maybe something is not correct considering “libjpeg-turbo’s status can’t be derived”?

About the previous path: py37 was downgraded to py36 by the pillow-simd installation.

The libjpeg-turbo status feature was added in pillow-5.4.0, and pillow-simd-5.4.0, which should port that feature, hasn't been released; that's why I can't tell you programmatically whether it's enabled or not. Does it make sense?

You may need to tell the pillow-simd developer to update his fork to sync with the pillow-5.4.0
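For reference, once a Pillow(-SIMD) >= 5.4.0 is installed, the check itself is a one-liner via `PIL.features.check_feature`. A hedged sketch that degrades gracefully when Pillow is missing or too old to report the feature:

```python
# Sketch: report libjpeg-turbo status via PIL.features (Pillow >= 5.4.0).
# Returns True/False when Pillow can tell, or None when it can't
# (Pillow missing, or older than 5.4.0 where the feature name is unknown).
def libjpeg_turbo_status():
    try:
        from PIL import features
        return features.check_feature("libjpeg_turbo")
    except Exception:
        return None

print(libjpeg_turbo_status())
```

The `None` case corresponds exactly to the ❓ line in the checker output above.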

Yes, I was only able to build the conda package against py36. conda build of pillow-simd w/ py37 had compilation problems. There was a talk about fixing the toolchain for py37, I’m not sure whether it happened.

So until it’s all sorted out, were you able to build pillow-simd from source?


Clear enough. Tried to do what you suggested before,

Ah! It looks like conda env was created for py37, so fastai didn’t install dataclasses (built in in py37), but then you’re running py36. Somewhere your conda env got downgraded to py36. That’s the problem. Either move back to py37, or reinstall fastai which will install dataclasses.

to update again from py36 to py37, but got this error

(fastai-py37_psimd) dz@DSPC:~$ python -m fastai.utils.check_perf
Traceback (most recent call last):
  File "/home/dz/anaconda3/envs/fastai-py37_psimd/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/dz/anaconda3/envs/fastai-py37_psimd/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/dz/anaconda3/envs/fastai-py37_psimd/lib/python3.7/site-packages/fastai/utils/check_perf.py", line 5, in <module>
  File "/home/dz/anaconda3/envs/fastai-py37_psimd/lib/python3.7/site-packages/fastai/", line 40, in call_parse
  File "/home/dz/anaconda3/envs/fastai-py37_psimd/lib/python3.7/site-packages/fastai/utils/collect_env.py", line 153, in check_perf
    from PIL import features, Image
ModuleNotFoundError: No module named 'PIL'

So the py37 path is definitely not working. Btw, updating python back to 3.7.1 I got this message:

pytorch: 0.4.1-py36ha74772b_0 --> 0.4.1-py37ha74772b_0

so I can say pytorch was effectively downgraded to 0.4.1 by trying to execute

conda install -c fastai/label/test pillow-simd

So the right installation path is for me:

  • conda install -c pytorch -c fastai fastai
  • conda install -c pytorch pytorch cuda100
  • conda uninstall pillow pillow-simd jpeg libtiff --force
  • pip uninstall pillow pillow-simd jpeg libtiff
  • conda install -c conda-forge libjpeg-turbo
  • CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

In your installation instructions it is stated

NB: fastai v1 currently supports Linux only, and requires PyTorch v1 and Python 3.6 or later.

Maybe you need to clarify that only py36 is supported?

You can’t just go back and forth between py36 and py37 without creating a big mess. If you chose to run py37, stick to it. If something is asking you to downgrade to py36, don’t install such packages. This is not so smart on the part of conda, which otherwise tries to be so strict and proper.

So the path py37 is definitelly not working. Btw updating python back to 3.7.1 I got this message
pytorch: 0.4.1-py36ha74772b_0 --> 0.4.1-py37ha74772b_0

If you look closely, you already had pytorch 0.4.1 installed, so it just updated the python version of the same package. You need to watch which packages cause such a downgrade and not hit Enter without looking.

conda can be amazing in some situations and very problematic in others.

I suggest you remove this conda env and start from scratch, sticking to py37 and pytorch-1.0.0. That would be the best course of action based on my experience. Don’t use the automatic yes (-y); instead manually approve any new packages you install, after reviewing any suggested upgrade/downgrade notices.

So the right installation path is for me:

  • conda install -c pytorch -c fastai fastai
  • conda install -c pytorch pytorch cuda100
  • conda uninstall pillow pillow-simd jpeg libtiff --force
  • pip uninstall pillow pillow-simd jpeg libtiff
  • conda install -c conda-forge libjpeg-turbo
  • CC="cc -mavx2" pip install -U --force-reinstall pillow-simd

So this one worked, right?

In your installation instructions it is stated

NB: fastai v1 currently supports Linux only, and requires PyTorch v1 and Python 3.6 or later.

Maybe you need to clarify that only py36 is supported?

pillow-simd has nothing to do with fastai. This is just courtesy information/support for fastai users who may want to speed things up. fastai uses pillow, which works with any py36+. Let me know if this clarifies things, @davide445.

So fastai just requires PyTorch v1 and Python 3.6 or later.

And after this discussion I still don't know whether the pillow-simd conda packages built on an Intel platform will work on AMD or not.


Thanks for your explanations; I'm really a beginner here.
Using source-code compilation, pillow-simd appears to be correctly installed.
I now just need to figure out why the new conda virtualenv doesn't show up in Jupyter notebook so I can test it.


I just compiled the latest Pillow code for SIMD; now fastai reports the correct status for both libjpeg-turbo and Pillow-SIMD, as follows:

python -c "import fastai.utils.collect_env; fastai.utils.collect_env.check_perf()"
Running performance checks.

*** libjpeg-turbo status
✔ libjpeg-turbo is on

*** Pillow-SIMD status
✔ Running Pillow-SIMD 6.0.0.post0

*** CUDA status
❓ Running cpu-only torch version, CUDA check is not relevant

But I’m not sure whether the libjpeg-turbo status feature is correctly included in my compiled Pillow-SIMD.

Here is how I built pillow-simd:

  1. clone both Pillow and Pillow-SIMD
  2. mv Pillow-SIMD/src Pillow-SIMD/src-5.3.0.post0
  3. cp -r Pillow/src Pillow-SIMD/
  4. cp Pillow-SIMD/src-5.3.0.post0/libImaging/Filter*.c Pillow-SIMD/src/libImaging/
  5. cp Pillow-SIMD/src-5.3.0.post0/libImaging/Resample*.c Pillow-SIMD/src/libImaging/
  6. change 6.0.0.dev0 to 6.0.0.post0 in Pillow-SIMD/src/PIL/_version.py
  7. cd Pillow-SIMD
  8. CFLAGS="${CFLAGS} -msse4.2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile .

Note, my CPU is very old (Xeon E5-2670v2), which only has SSE4.2 and no AVX2 instructions, so I use -msse4.2 in my compile command. If your CPU supports AVX2, you can use -mavx2 instead.
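The flag choice above can be automated. A hypothetical helper (not from any library) that picks the compile flag from the CPU's flags line; on Linux you would feed it the `flags` line from /proc/cpuinfo:

```python
# Hypothetical helper: choose the pillow-simd compile flag from a CPU
# flags string (on Linux, take it from the "flags" line of /proc/cpuinfo).
def simd_cflag(cpu_flags):
    flags = set(cpu_flags.split())
    if "avx2" in flags:
        return "-mavx2"
    if "sse4_2" in flags:
        return "-msse4.2"
    return ""  # no SIMD flag worth setting; plain Pillow may be a better fit

print(simd_cflag("fpu sse3 sse4_1 sse4_2 avx avx2"))  # -mavx2
print(simd_cflag("fpu sse3 sse4_1 sse4_2"))           # -msse4.2
```

The returned flag is what you would splice into the `CC="cc -mavx2"` / `CFLAGS="... -msse4.2"` build commands shown in this thread.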

Given the rather complicated nature of the current instructions for installing Pillow-SIMD, I am having a look at improving this. I'm not an expert on this by any means, but I have some reasonable experience with lower-level binary stuff from work in InfoSec. Though I have decidedly less experience in Python packaging, this seems like a nice opportunity to learn.
I'm currently still exploring how things work and what the issues are. As a first step I'm putting together a script that gathers various information on the state of installed libraries, which hopefully some people with various configurations can run to get a better idea of what's happening. One of the obvious difficulties is that I don't have a non-AVX machine to test against, and it's hard to identify all the various library combinations that might be encountered.

@stas: you seem to have the most experience here and have done quite a lot of work on this. I gather, based on the current fairly complicated instructions and the various attempts at solutions around (e.g. in the fastai repo), that this is still an area that could do with improvement. Or am I missing a nice simple solution?
From what I can see the key issues are:

  1. conda packages will overwrite pip packages, since conda only uses its own metadata to determine what's installed, whereas pip will see the files dropped by conda and regard the package as installed.
  2. While pip allows easy source-based installs, conda uses binary packages; so although conda has very nice build-specification tooling, it requires a properly configured build environment, which is trickier in conda than in pip.
  3. The presence of multiple conda packages with different names (pillow/pillow-simd, libjpeg/libjpeg-turbo, libtiff/libtiff-libjpeg-turbo) can create issues with things being overwritten.
  4. Incompatibilities between different components when builds are mixed? This is the bit I'm least sure about, but I gather you had issues with mixing various libjpeg/libtiff/pillow-simd builds.

Is this about right? Any updates/corrections?

Got some ideas on solutions, but will hold back until I’ve investigated a bit more.


Your notes sound right, @TomB.

The first critical issue with Pillow-SIMD is that I can't get an answer to a very simple question: can a binary package, built to the capabilities of the CPU of the machine it was built on, be used by a system with different CPU capabilities? I don't know anything about SIMD, the maintainer of that package won't answer my question, and trying to search for the answer via google led me nowhere. So until we can answer this question there is no point in even talking about making an official binary package.

Most likely the answer is: that package can only be used on CPUs supporting compatible SIMD instructions. But conda has no way of supporting that level of granularity of "system requirement", at least not that I know of; see build variants.

So you could work on improving this, but the final package may or may not work for the end user, and she would just need to build her own Pillow-SIMD via pip.

And, yes, otherwise, you have all those problems of different build components and overwriting on package update.

No, an attempt to execute, say, an AVX2 instruction on a system that doesn't support it will cause a processor exception (not a Python one; I'd probably expect Python to just crash, but maybe not). As for your question on the Pillow-SIMD GitHub about adding dynamic runtime checking of features: while not likely a big performance issue, that would require potentially significant alterations to the code, and it doesn't seem like the maintainer was interested in that.
But looking at the codebase it seems the only difference between Pillow and Pillow-SIMD is in the C extension. The interface the extensions present to python is identical (there was one test that was updated in Pillow-SIMD but not upstream, but it wasn't clear it was related). So one option is to ship a patched Pillow that includes multiple optimised versions of the library and dynamically loads an appropriate one based on CPU properties. That shouldn't be too hard, but it would require maintaining patches against PIL. The loading appears confined to one place, though it may also be possible by just playing around with the imports.
To answer your other unanswered question: just having an SSE4 version and an AVX2 version (also with SSE4; they aren't mutually exclusive and can optimise different parts) should get most of the benefits. There is a whole host of other optimisations the compiler will use if enabled, and enabling various of them might speed up other stuff, but those two are probably the big ones. Though it's probably better to use the -march option instead of -mavx2 to enable other optimisations on newer processors as well, given any processor with AVX2 will have those other optimisations. I'll probably look at running some performance tests against various options to see how much the other things matter.
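The dynamic-loading idea above can be sketched in a few lines. The module names here are entirely hypothetical; the point is only the dispatch logic, which would sit in the patched Pillow loader:

```python
# Sketch of the dynamic-loading idea (all module names are hypothetical):
# choose which compiled extension variant to import based on CPU features.
def pick_variant(cpu_flags):
    flags = set(cpu_flags.split())
    if "avx2" in flags:
        return "_imaging_avx2"   # hypothetical AVX2 build of the C extension
    if "sse4_2" in flags:
        return "_imaging_sse4"   # hypothetical SSE4 build
    return "_imaging"            # plain fallback build

print(pick_variant("sse3 sse4_2 avx avx2"))  # _imaging_avx2
print(pick_variant("sse3"))                  # _imaging
```

In a real patch the returned name would be fed to `importlib.import_module`, which is why all variants have to present an identical interface.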

(Not entirely clear on the details here, haven’t tried much of this, doing so now, anyone who knows conda build please chime in and correct me, or just reassure me)
Actually I think that conda can support this with build variants and also with features (which I think effectively use build variants under the hood). Build variants provide the basic infrastructure to have multiple packages with the same name and version but which differ in build string. Build string can be specified or is automatically derived from outputs so as to ensure uniqueness (I think, in some way). Though build variants don’t by themselves provide a means for allowing the user to easily select them there is some information on how you might do this. In particular there is an example of very much what is desired here (the broadwell and sandybridge are possible march settings specifying various optimisations such as -mavx2). Another option would be features which also seem to basically provide a means of letting users select among build variants, here by having a dummy feature metapackage (a package with no files effectively), so you could conda install use-pillow-avx pillow-simd to get avx while conda install pillow-simd would just give you the SSE (or you might be better to call these packages pillow rather than pillow-simd, though this may cause issues, see below).

So the potential options seem to be:

  1. Provide a package with multiple optimised library versions loaded dynamically based on CPU features. This could be either a pip or a conda package. You'd probably be best building in conda either way (it looks like it should be possible to package its output for PyPI), as its build system hopefully allows you to efficiently reuse the existing build unmodified while adding this stuff on top, and gives you easy compilation across python versions (and potentially even across compilers/OSs, though Windows building of pillow-simd looks really tricky, apart from with mingw which isn't ideal, and I don't have a Mac, so I'm focussing only on Linux-64). Disadvantages are:
    • Need to patch PIL, though hopefully not in ways that are hard to merge into new versions
    • Conda not respecting pip packages, this seems like the big issue
    • A single namespace on PyPI, so you can't have the same name as existing packages, though you could just install from a GitHub repo to get around this.
    • A larger distribution size which doesn’t seem like a big issue, the extensions total about 600K
  2. Use conda build variants with features to select things so you’d conda install -c some_channel use-pillow-avx pillow-simd
  3. Use conda build variants with the multiple package tricks from that example so you’d conda install -c some_channel pillow-avx or conda install -c some_channel pillow-sse4 or conda install pillow for the non-simd.

The choice of 2 or 3 depends a fair bit on whether (and how cleanly) those methods actually work (and actually getting them to work cleanly). I’ll leave discussion of that for the moment, I’m playing with that now.
One issue there is that some methods may require providing new versions of the dependency packages (libtiff/libjpeg-turbo), and potentially using the same names as existing packages (i.e. libjpeg-turbo->libjpeg and/or pillow-simd->pillow). The desirability of that depends in part on continued maintenance, distribution, and support of them. Hopefully maintenance can be minimised, ideally just updating a version tag and rebuilding, but it's still an issue. Would the presence of various stuff in the fastai channel indicate in-principle support for this? I'd be somewhat hesitant to provide such packages myself, if only because having packages from personal channels seems less desirable and may (and probably should) set off some red flags. Though at least doing it on conda-forge should give some better assurance, as it doesn't mean installing a binary with unknown provenance; you just have to trust their CI/CD.

Phew, I’ll leave it there, I have other thoughts but could do with more testing. Sorry for the long post, and a bit of brain-dumping, but as I gather you found it isn’t a particularly simple thing.

And thanks for your efforts Stas, your posts in various places helped a lot in figuring things out.



I’ve created a recipe that builds pillow-simd against libjpeg-turbo with custom -march (and -mtune). It doesn’t yet do anything about making multiple versions (it just needs a conda_build_config.yaml and probably a single jinja parameter to set both march and mtune), and then needs stuff to allow user selection. But it seems to build and install OK on my machine. I also haven’t yet added other image formats, notably libtiff, which, due to its dependency on jpeg, could cause issues. Otherwise it should just be a matter of adding back the requirements I removed from the existing recipe to speed up testing. It seems to work for jpeg: the self-tests, which include some minimal jpeg tests, all pass.

The main changes I made are:

  • Isolating the build from the system so no system libraries are used, by doing build_ext --disable-platform-guessing (and the install --old-and-unmanageable removes a conda warning when an egg is made). I also added the host requirement on pkg-config, as otherwise it uses the system one and finds system libs.
  • Added a run_constrained requirement. This (little-documented and newish) conda build feature means that it will now not allow the plain jpeg package to be installed, so it can't clobber libjpeg-turbo. As noted, this perhaps should be pushed upstream to the libjpeg-turbo build on conda-forge (or a version of it local to the channel eventually used, to avoid having to -c conda-forge for it).

For the tests you’ll need to mkdir -p Tests/images && wget{gif,jpg,ppm} --directory-prefix=Tests/images in the recipe directory as they aren’t in the archive the recipe downloads (and I didn’t bother to upload them to gist).
Then conda build -c defaults -c conda-forge . (keeping order or else it pulls in everything from conda-forge) and conda install -c {BUILD_DIR} -c defaults -c conda-forge pillow-simd (filling in the BUILD_DIR conda gives you, or if installing in the same env conda may find it with --use-local).
I haven’t checked that AVX is actually being enabled in pillow-simd yet, but the recipe is properly passing the flags through to gcc (and removing existing ones, as otherwise order matters). Nor have I really checked that those are good values for -march and -mtune.

I’ve also put together a script which checks which libjpeg is being used by PIL. Unlike the code in fastai, this actually checks that the libjpeg-turbo library is being loaded, whereas the fastai check using PIL.features only verifies that PIL was compiled with libjpeg-turbo; if libjpeg is clobbered, that feature will still report true. You need the pyelftools package it will prompt about to do this check, though it could be pretty easily rewritten to just run some regexes on the output of readelf (which is installed by Linux build tools) to avoid this dependency, if you did want to update the fastai check.
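The readelf-regex variant mentioned above could look roughly like this. The sample output below is a hand-written stand-in for real `readelf -d some_extension.so` output; the parsing is the part that matters:

```python
# Hedged sketch: extract DT_NEEDED entries from `readelf -d` output to see
# which libjpeg a compiled extension actually links against (so a clobbered
# libjpeg would show up here even when PIL.features still reports turbo).
import re

def needed_libs(readelf_output):
    # readelf -d prints lines like:
    #  0x0000000000000001 (NEEDED)  Shared library: [libjpeg.so.8]
    return re.findall(r"Shared library: \[([^\]]+)\]", readelf_output)

sample = """\
 0x0000000000000001 (NEEDED)  Shared library: [libjpeg.so.8]
 0x0000000000000001 (NEEDED)  Shared library: [libz.so.1]"""
print(needed_libs(sample))  # ['libjpeg.so.8', 'libz.so.1']
```

In a real check you would run `readelf -d` on PIL's `_imaging` extension via subprocess and then resolve the reported soname to see whether it is provided by libjpeg-turbo.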

It’s linux64-only, as from reports pillow-simd doesn’t build on Windows and I have no Mac to test on. But the anaconda and conda-forge recipes for pillow all work on all platforms with minimal work required, so I might check out Windows sometime; the errors I saw didn’t seem related to the SIMD changes.



Wow, great work and brain dump, and answering my questions I posted at pillow-simd Issues, @TomB! Thank you for working on this.

I guess the main follow up questions are:

  1. So how many variants will be needed (say, with just Linux to start with)? E.g. your recipe includes ‘skylake’, which is just one of the many Intel archs; what about AMD?

  2. a. who is going to build all those variants? If you include a CPU architecture, won't they require a different CPU to be used to build each of those variants?

    b. who is going to maintain all those?

    Ideally we would get the owner of Pillow-SIMD to handle that; if it can be fully automated, perhaps he would do it? On the other hand, he hasn't updated the package since Oct 2018. Pillow 5.4.0 was released Jan 1 and 6.0.0 was released a few weeks ago, while Pillow-SIMD is still at 5.3.0.

  3. how will a user know which variant they need to install?

The important part would be for a user to be able to build the package for the required architecture so that they could deploy it on the instance they use. Surely for desktop users it'd be best to build it from source, since that will do the right thing. Though all those libjpeg-turbo dependency instructions will have to be carefully scripted.

Perhaps we provide an intermediary solution (a conda package recipe) that makes it easy for a user to build what they need, and don't actually release any binaries? Then they can just deploy it on their instances by installing from the local build.

p.s. the fastai library will soon be doing most transforms on GPU using pure pytorch functions, so really soon we will only need “normal” Pillow with libjpeg-turbo for faster decompression; i.e. I'm not sure whether Pillow-SIMD will be of much use to fastai users. I guess resize might remain on PIL, where Pillow-SIMD is faster, until someone codes its equivalent in pytorch.


Well, skylake is provided for mtune (though the same options apply to march). Where march specifies which instructions are allowed, mtune just affects which of those instructions are chosen in particular cases, as different instructions may be optimal on different microarchitectures. So builds with a different mtune will all run on any architecture allowed by the march. The choice there was moderately arbitrary, but I picked what might be expected to be a more common architecture; it could probably be even more recent, given likely faster adoption among fastai users. Skylake also made some optimisations that might make GCC more aggressively use AVX instructions, so it will hopefully speed up newer chips (while potentially somewhat slowing pre-Skylake ones). Anaconda by comparison uses -march=nocona (a 2004-era Pentium 4 Xeon architecture supporting up to SSE3) and -mtune=haswell (I believe). I would likely use fixed mtune/march combinations rather than all of them (though conda build does support easily specifying a subset of combinations to build).
In terms of march, likely just SSE4 and AVX2 would be the main things; those are the hand-optimised options pillow-simd handles. AVX-512 (starting with later Skylakes) would be the other main option, though I'm not sure how much GCC would auto-optimise for it.

They can all be built automatically by conda build; you don't need the target architecture to build for it. Most cloud platforms should have AVX2 support, so AVX-512 would likely be the only thing that couldn't be tested on CI/CD systems. And a basic import test should work (unless initialisation code for the native libraries got optimised to use AVX-512, which I wouldn't have thought especially likely, but I could easily be wrong). The differences will all be due to GCC code generation, which should generally be pretty well tested, so the lack of continuous microarchitecture-specific testing hopefully shouldn't be a big deal.

Yeah, perhaps it can be upstreamed if clean enough. Though as the conda channel seems tied to a company, there may not be much motivation to add complexity beyond what they need. But worth seeing.
There shouldn’t be too much maintenance required. It should mostly be just a matter of bumping the version in the build recipe and updating dependency versions (following the upstream pillow recipes) when needed (they allow for minor updates without a rebuild). It should be fairly easily hosted on conda-forge so as not to require any CI/CD infrastructure. Or, if clean enough, you might even try to integrate it into the Pillow conda package which Continuum maintains (defaulting to the current pillow build if the user doesn't select it somehow), though they might not like the patching of compiler arguments. That doesn't seem like the official way; I think they want you to build a new compiler package for that, though I'm reasonably sure these flags shouldn't affect binary compatibility in this case. Merging it into Pillow seems arguably the better method for conda, which has machinery to handle different binary variants of packages. This seems better than the current situation with two differently named conda packages providing different versions of the same files (though Continuum aren't doing this, just conda-forge with libjpeg(-turbo)).
Or, given the conda stuff in fastai, I gather there is already CI/CD infrastructure (or at least procedures) for updating conda builds, so this could just be dropped in. That would also eliminate the channel-selection issue you currently have; you'd just need the fastai channel before conda-forge (assuming you don't rename the package to Pillow).
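As a rough sketch of what those binary variants might look like in conda-build terms (the file layout is standard conda-build, but the `simd_level` key and the flag mapping are my assumptions, not from any existing recipe):

```yaml
# conda_build_config.yaml -- hypothetical variant matrix: each simd_level
# value produces a separate build of the same package version
simd_level:
  - sse4
  - avx2

# In meta.yaml the variant can then tag the build string so the two
# binaries are distinguishable, e.g.:
#
#   build:
#     string: "{{ simd_level }}_py{{ py }}_{{ PKG_BUILDNUM }}"
#
# and the build script would map simd_level to the -m/-march flags
# it patches into the compile step.
```

conda-build would then produce one package per (simd_level, python) combination, which matches the 2 x 2 variant count discussed below.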
As you noted, Pillow-SIMD itself is also getting a little behind, so if maintenance is being dropped there, then there are broader issues.

You could have a script that detects this. There's a Python library that will tell you what's supported (though the only conda package is in a win64-only personal repo; it's cross-platform on PyPI). Or, if you only supported linux64 (at least officially), you could read /proc/cpuinfo from either Python or a shell script (cf. cat /proc/cpuinfo | egrep -o '(sse3|sse4|avx2|avx512)' | sort | uniq). While conda won't let you dynamically select a variant (for sensible reasons), it does allow a pre-link script that runs before install and can cause the install to fail. So installing the wrong variant could fail outright, and the script could even warn when you're not using the most optimised one.
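For the linux64 route, a minimal sketch of such a check in plain Python (flag names as they appear in /proc/cpuinfo; avx512f stands in here for the AVX-512 foundation subset):

```python
# Sketch of a pre-link-style check: parse /proc/cpuinfo (Linux only) for the
# SIMD flags that matter when picking a Pillow-SIMD build variant.
import re

def simd_flags(cpuinfo_path="/proc/cpuinfo"):
    wanted = {"sse3", "ssse3", "sse4_1", "sse4_2", "avx2", "avx512f"}
    try:
        with open(cpuinfo_path) as f:
            text = f.read()
    except OSError:  # not Linux, or /proc unavailable
        return set()
    # crude but sufficient: tokenize the whole file and intersect,
    # since the "flags" line is whitespace-separated
    return wanted & set(re.findall(r"\S+", text))

print(sorted(simd_flags()))
```

A pre-link script could then fail the install when, say, an avx2 variant is being installed but "avx2" is absent from the returned set.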

Perhaps. Though that would require them to install and use conda-build. While this is ideally just a matter of conda install conda-build && conda build . && conda install -c local ..., there's a lot going on that could break (plus system compiler issues on Windows/Mac; support for them is less important, though given fastai basically just works there, Win64 would be nice). Also, while pip provides decent source-build support, conda isn't aimed at this, so I don't think conda will auto-update locally built packages; users would need to handle that manually. (Of course, on pip an auto-update of Pillow-SIMD will remove customisations.)
Arguably the support time eaten by builds that go wrong would exceed the maintenance effort of packaging prebuilt binaries. Though this depends a bit on how smooth that is; it will hopefully be a lot smoother than installs on randomly configured systems, especially given the trouble people can get themselves into with conda (and, to a lesser extent I think, pip) once they start --forceing; see the experience above for example. Ideally it should just be a matter of appropriate CI/CD scripts, which could be lifted from conda-forge or similar, to get automated builds running smoothly (though I've done barely any CI/CD work and none on conda, so I may be wrong; conda-forge looks very smooth).

Ah, right, yeah. I was looking at some source and wondering where that stuff was happening; I saw code that seemed related to transforms on GPU but couldn't see how it was being used. I guess that's because it's in transition. Recompiling Pillow for a better processor arch might still give you a reasonable advantage on general PIL image loading, so it could still be useful in some cases. Though those cases would probably be good candidates for pushing image handling into a pre-processing step and pulling pre-processed image tensors in during training (you'd still do augmentation on the fly, but could eliminate decoding during training).
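That pre-processing idea can be sketched roughly like this; random numpy arrays stand in for decoded images, and the .npy cache format is just an illustrative choice (a real pipeline would decode with PIL, resize, then cache):

```python
# Sketch: decode/resize images once, cache them as arrays, and load the
# cached tensors during training, so JPEG decoding happens only once.
import os
import tempfile
import numpy as np

def preprocess(images, cache_dir):
    """Cache each decoded image as a normalised float32 .npy file."""
    paths = []
    for i, img in enumerate(images):
        # a real pipeline would resize/crop here before caching
        path = os.path.join(cache_dir, f"{i}.npy")
        np.save(path, img.astype(np.float32) / 255.0)
        paths.append(path)
    return paths

def load_batch(paths):
    """Training-time loading: no decoding, just reading cached arrays."""
    return np.stack([np.load(p) for p in paths])

with tempfile.TemporaryDirectory() as d:
    # stand-ins for decoded 64x64 RGB images
    fake_images = [np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
                   for _ in range(4)]
    batch = load_batch(preprocess(fake_images, d))
    print(batch.shape, batch.dtype)  # (4, 64, 64, 3) float32
```

On-the-fly augmentation would then operate on these cached arrays rather than on freshly decoded JPEGs.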


Thank you for taking the time to provide in-depth answers, @TomB!

Do yourself a favor and don't waste time on this one: trying to get that company to do something as simple as fixing an issue in their broken/limping infrastructure is a complete waste of time. Asking them to support a package with relatively minor usage will not happen any time soon.

That's the main reason the python community is making a big effort to move to a solid conda-forge: anaconda is too slow to update and too incomplete to get cutting-edge things done.

Pillow integration

Since the maintainer of Pillow-SIMD declared on the project's page that it can't/won't be integrated into Pillow, and that he is not interested in supporting conda packages, we can't ask upstream to integrate it.

Perhaps you'd be inspired to create the 2 linux variants and put them on the conda-forge channel? And perhaps over time get a few trusted people to co-maintain them, so the responsibility doesn't rest on one person?

Well, those would be 2 x 2 variants, since we need python 3.6 and 3.7 builds too.

The fastai channel releases were just experiments - that’s why they are in the “test” tag sub-channel. They shouldn’t really be part of fastai.

Recompiling Pillow with better processor arch might still give you a reasonable advantage on general PIL image loading and so still be useful in some cases perhaps.

I think this one isn't going away any time soon, so the minimal "upgrade" of Pillow to use libjpeg-turbo is very important for image-loading speed. But perhaps this could be taken care of by the Pillow team: they could maintain a pillow+libjpeg-turbo build in their pip/conda channels.
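On that last point, Pillow itself can report whether it was built against libjpeg-turbo: as the checker output at the top of the thread notes, the feature flag only exists from Pillow 5.4.0 onward, so a sketch of the check needs to handle older versions too:

```python
# Sketch: detect whether the installed Pillow was built with libjpeg-turbo.
# PIL.features.check_feature("libjpeg_turbo") exists from Pillow 5.4.0;
# older versions raise ValueError for the unknown feature name.
def jpeg_turbo_status():
    try:
        from PIL import features
    except ImportError:
        return "Pillow not installed"
    try:
        if features.check_feature("libjpeg_turbo"):
            return "libjpeg-turbo"
        return "plain libjpeg"
    except ValueError:
        return "Pillow too old to tell (< 5.4.0)"

print(jpeg_turbo_status())
```

This is essentially what the fastai.utils.check_perf output shown in the first post is reporting.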
