No problem. Thanks for reading the responses and asking good questions so I think through the issues.
Update:
After a fair bit of learning conda-build (and a bit of fighting it), I’ve now got a seemingly working build running on Azure Pipelines. I just need to do a few final checks and some cleanup, then I’ll push a test version to Anaconda Cloud.
I’ve also had reasonable initial success with a local Windows build. It builds and passes the selftest, but it currently isn’t applying the compiler flags (which could uncover errors) and still needs to be added to Azure. I’m hopeful on that front though.
I’ll have a go at a Mac build as well at some point and see if that goes as smoothly. That’s a little harder given the lack of a local test system and the inability to log on to the Azure hosted agents I’m using. It was a fair bit of trial and error just to get Linux working, and there I could at least inspect my local system (I’m not entirely happy with some of the changes made to run on Azure, but I had a lot of trouble finding various build outputs; I’ll return to that at some point too). But maybe it’ll just work on Mac — the Anaconda recipes seem to.
And responses:
I’ve currently just used Azure Pipelines rather than conda-forge (so I’ll publish to Anaconda Cloud). I’m not sure it would work on conda-forge currently, as I’m using some fairly new features of conda-build which I’m not sure are supported there — though that impression may be based on older information. I’m also not entirely sure everything would fit into their templates. Plus I wanted to get some understanding of the underlying CI stuff by doing it myself first, at least.
It should still have many of the advantages of conda-forge: automatic building of PRs, and being easily forkable with free CI included (just sign up/in to Azure and point it at your fork).
Ah right, I hadn’t noticed that. Though there are other dependencies that are in the main fastai tag. Would the eventual goal be to remove all non-fastai code from the fastai channel? Doesn’t that complicate installation and maintenance by requiring various channels, given that not everything is in the default channel, nor likely ever will be? You could just rely on conda-forge for everything, but then you’re at the whim of various upstream maintainers, and changes there could potentially break fastai without any local changes. Though from what I’ve seen so far, conda-forge packages seem pretty good in spite of their very decentralised nature.
One option I thought might work is a fastai-depends channel: it limits the number of channels and allows filtering of upstream changes while clearly delineating them from fastai itself. But that’s obviously a side issue here. Fitting into fastai is good but shouldn’t be the only consideration.
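For illustration, that idea could look something like this in a user’s `.condarc` (the `fastai-depends` channel name is hypothetical, and the ordering just shows one possible priority):

```yaml
channels:
  - fastai           # fastai's own packages only
  - fastai-depends   # hypothetical: curated/pinned builds of upstream dependencies
  - conda-forge      # fallback for everything else
  - defaults
```

With strict channel priority, packages would resolve from `fastai` first, then the curated dependency channel, so upstream conda-forge changes couldn’t silently replace a pinned build.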
I guess from my perspective (as at least temporary maintainer of the packages), the main advantage of having a fastai-affiliated channel is that this wonderful forum would likely make support a fair bit easier, by solving some of the problems people have that aren’t necessarily due to the package itself. If it’s pointed to by fastai, though, then just a thread here to direct support to should provide all that. And of course, if you did want to include the packages in the (or a) fastai channel, I’d happily help with that.
On various upstreaming options:
Yeah, it looks like Anaconda don’t have that many resources and aren’t really looking to maintain a whole lot of packages. That’s reasonable, but it does present some issues for those packaging on top of it. I certainly wouldn’t expect to get new packages in there, but they may accept a PR against their existing pillow package if it didn’t add significant maintenance effort and provided clear advantages. Much the same seems true of the pillow-simd maintainer. There is perhaps more of an opportunity there: they currently don’t do a pillow-simd build for Windows, so if I could get that working, it might be considered enough of an advantage to accept a PR even if it did introduce some complexity.
But, as you say, otherwise it could easily be published as its own thing.
Yeah, I can’t see you’d move away from that. Images need to be loaded, and even if it were possible, I’m not sure you’d see much advantage from moving that to the GPU without massive rewriting or advances in GPU code generation.
Though, as I was getting at, if loading were slowing things down (and assuming your dataset is too large to just keep the loaded images in memory), presumably you could have a pre-processing step that loads the images, performs any non-augmentation transforms, and saves pickled tensors. Your training loop could then just load the tensors and do augmentation. That’s not really a generally applicable approach though, so for fastai, loading is still going to be a general concern.
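A minimal sketch of that caching idea. `load_image` and `base_transform` here are placeholder callables (in practice they’d be something like PIL decoding and torch transforms; this sketch is stdlib-only, pickling whatever the transform returns):

```python
import pickle
from pathlib import Path

def preprocess(load_image, base_transform, image_paths, cache_dir):
    """Hypothetical one-off pass: decode each image, apply the
    non-augmentation transforms, and pickle the result so the
    training loop can skip decoding entirely."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    for p in image_paths:
        tensor = base_transform(load_image(p))
        out = cache_dir / (Path(p).stem + ".pkl")
        with out.open("wb") as f:
            pickle.dump(tensor, f)

def load_cached(path):
    """The training loop just unpickles and applies augmentation."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

The training loop would then iterate over the `.pkl` files, call `load_cached`, and apply only the augmentation transforms — trading disk space for decode time.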
Tom.