Will elastic distortions also be done directly on the torch tensors (if so, how will the grid flow be generated in a device-agnostic way)? Will piecewise affine transforms of an image also be considered? Thanks for all the goodies so far and the goodies currently baking …
We’re running tests to see if our implementation is faster or not on a wide range of tasks. Torchvision is slightly slower than opencv, in the few tests I did.
We’ve not implemented that yet, but we’ll be looking at it. All the functions we use work on the CPU or the GPU (mainly affine_grid and grid_sampler) so even if it ends up being on one device in fastai_v1 (for now we’re mainly looking at the CPU), it’ll be easy to adapt it to another.
It’s a little early to say anything definitive about speed, since Soumith has been kind enough to prioritise optimizing the things we need for performance in fastai. For instance, they just added a PR that optimizes grid_sample by 10x, and there’s more to come.
Yes that’s the plan. The grid flow generation will be done in a similar way to the current affine matrix generation (that is, it’ll be put on the same device as the image, when it’s used).
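To make the "same device as the image" idea concrete, here is a minimal sketch assuming a plain rotation, built on PyTorch’s `affine_grid` and `grid_sample`. The helper name `rotate_batch` is hypothetical and this is not the fastai_v1 implementation, just one way the pattern could look.

```python
import math

import torch
import torch.nn.functional as F


def rotate_batch(images, degrees):
    """Rotate a batch of images shaped (N, C, H, W) on whatever device it lives on.

    Hypothetical helper for illustration only, not fastai_v1 code.
    """
    theta = math.radians(degrees)
    cos, sin = math.cos(theta), math.sin(theta)
    # The 2x3 affine matrix is created directly on the image's device and dtype,
    # so the same code runs unchanged on the CPU or the GPU.
    matrix = torch.tensor(
        [[cos, -sin, 0.0],
         [sin, cos, 0.0]],
        device=images.device,
        dtype=images.dtype,
    )
    # One copy of the matrix per image in the batch.
    matrix = matrix.unsqueeze(0).repeat(images.size(0), 1, 1)
    # The sampling grid inherits the device of the matrix.
    grid = F.affine_grid(matrix, images.size(), align_corners=False)
    return F.grid_sample(images, grid, padding_mode="zeros", align_corners=False)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
batch = torch.rand(2, 3, 32, 32, device=device)
rotated = rotate_batch(batch, 30)
```

A grid flow for elastic distortions could presumably follow the same pattern: build the displacement field with `device=images.device` and add it to the grid before sampling.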
Thanks for the reply Sylvain and Jeremy.
Thank you so much for sharing this post. It led me to stumble upon the fast.ai dev category as well. Great summary, too. What do you guys think about libraries such as:
What other data augmentation techniques are you looking into implementing? Is there anywhere we can see a list of what’s next? I’m also a little curious about the testing process: do you augment -> train -> measure accuracy, loss, etc.? If someone wants to test too, would that be the process?
For now we are creating the general pipeline with classic data augmentation techniques (probably everything Augmentor offers will be there in the end). Not all the transforms will necessarily be there in the first release, but we’re making it very easy for anyone to add a new transform.
As for the testing, we check that it reaches the same accuracy when training on CIFAR-10, ImageNet, dogs and cats, etc. as when we use data augmentation from torchvision/PIL/opencv.
Hey, @sgugger thanks for the great explanation.
You state that:
I don’t quite understand this. Are you talking about the pixels that don’t fall exactly in the grid? Don’t we just crop these out in Step 2.5?
There must be something I am not getting right. Thanks!
Padding comes into play when, within the area of your image selected by cropping, a pixel would have to be sampled from outside the bounds of the input picture (so at a coordinate < -1 or > 1 with pytorch conventions). Since there’s nothing inside the image to take there, we have to decide on a value for those pixels, and the different ways of doing that are the ones I explained.
To see what padding does, do a 30 degrees rotation of a square picture ;-).
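To put a rough number on that rotation example, here is a small pure-Python sketch. It assumes the [-1, 1] coordinate convention mentioned above and a rotation about the image centre. The function name `out_of_bounds_fraction` is hypothetical. It only counts the output pixels whose source coordinate falls outside the input picture, which are the pixels padding has to fill.

```python
import math


def out_of_bounds_fraction(size, degrees):
    """Fraction of output pixels whose source coordinate lies outside [-1, 1].

    Hypothetical helper: assumes a square size x size image rotated about its centre.
    """
    theta = math.radians(degrees)
    cos, sin = math.cos(theta), math.sin(theta)
    # Pixel centres in normalized coordinates, from just above -1 to just below 1.
    coords = [(2 * i + 1) / size - 1 for i in range(size)]
    outside = 0
    for y in coords:
        for x in coords:
            # Where this output pixel would be read from in the input picture.
            src_x = cos * x - sin * y
            src_y = sin * x + cos * y
            if abs(src_x) > 1 or abs(src_y) > 1:
                outside += 1
    return outside / (size * size)


no_rotation = out_of_bounds_fraction(64, 0)
rotated_30 = out_of_bounds_fraction(64, 30)
print(no_rotation, rotated_30)
```

Without rotation nothing falls outside, while at 30 degrees the corners do. In PyTorch, the `padding_mode` argument of `grid_sample` (`zeros`, `border` or `reflection`) is what decides the value used for those pixels.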
Do you think we should include this in the dev_nb as prose? I know the idea was to use it in documentation but did you think of including it in the dev notebooks? (sorry I don’t know if dev_notebooks are the only source of documentation).
It’ll be included in the documentation (which is going to be notebooks, but different from the dev notebooks). As for the dev notebooks, Jeremy is going to use them as support for the second part of the course, so I’m guessing that he’ll explain what’s in this post during one of the lessons.
Maybe a summary can be included in the notebook, but I don’t think the whole thing should go in.
Yeah that’s what I thought. Thanks!