I know everyone is working on v1 and that it is the hot topic.
But at the same time, Machine Learning for Coders was officially released and promoted, so more new people are now trying to get working fastai environments set up, and lots of them are having trouble.
Also, the stated goal of fast.ai is to “make neural nets uncool again”, but right now you pretty much have to be an expert to get the library set up, which is counterproductive to this goal.
The fastai package can only be installed via pip, not conda, but the environment has to be set up via conda. So both are always needed, and mixing package managers is not really elegant.
And v0.7 will be the workhorse for anyone going through the 3 MOOCs until the v1-based course is released next year. So let's make it painless to install!
So, my goal would be to:
- remove installation obstacles/packages that are troublesome
- actually make it pip-installable (it isn’t really unless you have a predefined working environment set up, preferably via conda)
- make it more Windows friendly
- make it work without a conda env (especially in corporate environments outside of data science related areas, people may have Python installed, but don’t have Anaconda and can’t get it without painful IT requests)
I wanted to do a quick but systematic test of what the problem was on Windows after seeing so many problem threads, so I misused my wife's PC for that. That led to basically a full day of digging into ever deeper holes, even on Linux. I have uncovered a number of issues.
So I would like to address this step by step. I have started on that journey, but I would like to get some feedback on whether anybody actually cares about this and whether my pull requests would have a chance, or whether I would be wasting my time. Here are my first 2 proposals, with more to follow:
Step 1: Remove bcolz dependency
Reasoning:
- Lots of errors and issues for Windows and Linux users, pip and conda users alike. Errors regarding bcolz are plentiful in the forums (and in fast.ai related blogs on Medium). -> high impact
- It is absolutely non-essential for DL students because it is currently only used for optionally storing precomputed activations, a feature that did not even make it into v1 (see Jeremy’s post “Planning to get rid of `precompute` in fastai v1. Comments welcome”)
- it is completely irrelevant for ML students.
Proposal:
- Remove from setup dependencies, remove from global import list and import only in necessary places, change precompute setting to false if not installed
I am not saying it is not a great module, or that it should be thrown out completely. The idea is to make the fastai library resilient against it not being there and to remove the necessity to install it. (For most people it cannot be installed using pip because it needs compiling, which is a hassle for noobs on Linux and an even bigger hurdle for Windows people.)
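A minimal sketch of what "resilient against it not being there" could look like. This is not the actual patch: `optional_import` and `resolve_precompute` are hypothetical helper names made up for illustration, and where fastai would call them is left open.

```python
import importlib
import warnings


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Tolerant import instead of a hard entry in the global import list.
bcolz = optional_import("bcolz")


def resolve_precompute(requested, module=bcolz):
    """Hypothetical helper: fall back to precompute=False without bcolz."""
    if requested and module is None:
        warnings.warn("bcolz is not installed; falling back to precompute=False")
        return False
    return requested
```

With this pattern, code that asks for `precompute=True` keeps working on a machine without bcolz and only emits a warning, while machines that do have bcolz behave as before.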
Step 2: Remove setup.py dependencies that require compilation/build
Reasoning:
- Removing packages that are not needed immediately, or that can be replaced with compiled alternatives, makes it especially more Windows friendly (but also more noob friendly in general; even on Linux, compiling is not straightforward and requires several system packages to be apt-installed)
- The dependencies removed this way could be moved to a separate requirements.txt file, so a complete environment setup can be achieved later, but missing packages do not get in the way of taking first steps.
Proposal:
- spacy: remove it from the setup.py dependencies list. It is only necessary for NLP related topics, which come up in DL1 from lesson 4 onwards and in ML from lesson 8. It will need to be installed at some point, but for people just starting out it is a major hassle (no wheels, not pip installable without compiling).
- pytorch: remove the <0.4 requirement (i.e. replace it with <0.4.2). The library has been made to work with 0.4.x, and on Windows no binaries are available for 0.3.x via pip, so this alone throws every Windows user off.
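A rough sketch of how the dependency split could be expressed, under loud assumptions: the core list below is illustrative only and is NOT fastai's real dependency list, the `extras_require` grouping is an addition that the proposal itself does not mention, and the helper names are hypothetical. Only `torch<0.4.2`, `spacy`, and `bcolz` come from the proposals above.

```python
# Illustrative placeholders, not the real fastai dependency list.
CORE_REQUIRES = [
    "numpy",
    "pandas",
    "torch<0.4.2",  # relaxed from <0.4, as proposed above
]

# Packages that need compiling, moved out of the mandatory list.
OPTIONAL_REQUIRES = {
    "nlp": ["spacy"],
    "precompute": ["bcolz"],
}


def setup_kwargs():
    """Keyword arguments that could be passed to setuptools.setup()."""
    return {
        "install_requires": list(CORE_REQUIRES),
        "extras_require": {k: list(v) for k, v in OPTIONAL_REQUIRES.items()},
    }


def render_requirements():
    """Text for a separate requirements.txt with the optional packages."""
    pkgs = sorted({p for group in OPTIONAL_REQUIRES.values() for p in group})
    return "\n".join(pkgs) + "\n"
```

In a real setup.py the dictionary would be passed along as `setup(name=..., **setup_kwargs())`, so a plain `pip install` only pulls in the core list, and the optional packages can be added later from the generated requirements.txt once they are actually needed.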
Yes, I am aware that some of these issues get resolved if “you just use anaconda, they have precompiled binaries”, and that no one likes Windows, but if we want widespread usage these issues need to be solved, and I think they can be.
Of course, even if the pull requests get accepted, they will do no good until someone with the necessary rights creates and uploads a new dist to PyPI?!
Here is my first pull request, but feedback and discussion here are very welcome!
Thanks!