Interesting article on NVIDIA DALI: Data Augmentation Library
There’s a little starter for using DALI in the course repo BTW. It is just enough to give you a sense of how to get started writing your own data-blocks-style API using DALI. I’ll probably come back to it and flesh it out in the coming weeks.
In sgd_step we say p.data.add_(-lr, p.grad.data). Why do we use two arguments instead of multiplying?
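For context, the two-argument form is the older "scaled add" spelling: add_(value, other) computes a fused p = p + value * other in one in-place operation, without materializing a temporary -lr * p.grad tensor (recent PyTorch prefers the explicit p.data.add_(p.grad.data, alpha=-lr)). A pure-Python sketch of the semantics, not the actual PyTorch kernel:

```python
# Sketch of what p.data.add_(-lr, p.grad.data) computes:
# a fused "scaled add", p <- p + (-lr) * grad, done in place.
def sgd_step(p, grad, lr):
    for i in range(len(p)):
        p[i] += -lr * grad[i]  # in-place, like the trailing underscore in add_
    return p

p = [1.0, 2.0]
grad = [0.5, 0.5]
sgd_step(p, grad, lr=0.1)  # p is now [0.95, 1.95]
```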
I’d probably not use type, and would use to(device=..., dtype=...) instead.
Best regards
Thomas
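A small sketch of the advice above: .to() handles device and dtype in one call (and is a no-op when the tensor already matches), whereas chaining .type(...) and .cuda() does two separate conversions. CPU-only here so it runs anywhere:

```python
import torch

t = torch.zeros(3)
# one call moves device and casts dtype together
t64 = t.to(device='cpu', dtype=torch.float64)
```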
So
%timeit -n 10 grid = F.affine_grid(theta.cuda(), x.size())
would become the (more verbose, unfortunately)
theta_cuda = theta.cuda()
def time_fn():
    grid = F.affine_grid(theta_cuda, x.size())
    torch.cuda.synchronize()
time_fn() # mini warm-up and synchronize
%timeit -n 10 time_fn()
The warm-up is generally worth doing anyway, and it gives us a torch.cuda.synchronize(), so everything queued before our function is done by the time we call it.
Then the synchronize inside time_fn() makes sure we don’t read off the time before the kernel is actually done.
I guess one could make a %cuda_timeit
magic to get back to the nice, short way of calling it.
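A sketch of what such a helper could look like (hypothetical name; the sync callable is a stand-in for torch.cuda.synchronize, so the pattern can also be exercised on CPU):

```python
import time

def cuda_timeit(fn, n=10, sync=lambda: None):
    """Average runtime of fn over n runs; pass sync=torch.cuda.synchronize on GPU."""
    fn()       # warm-up run
    sync()     # make sure prior kernels have finished before we start the clock
    start = time.perf_counter()
    for _ in range(n):
        fn()
    sync()     # don't read the clock before the last kernel is actually done
    return (time.perf_counter() - start) / n

avg = cuda_timeit(lambda: sum(range(1000)))
```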
Would it simplify the code in Learner’s init and remove the need for cb_funcs if our callbacks were just passed to Learner through the cbs kwarg as a list of class instances, e.g. cbs = [Recorder(), AvgStatsCallback(accuracy)] instead of cbfs = [Recorder, partial(AvgStatsCallback, accuracy)]?
Not if the callback is a LearnerCallback subclass that requires learn at instantiation; e.g. see the examples here: https://docs.fast.ai/metrics.html#Creating-your-own-metric
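A minimal sketch (simplified names, not the actual course code) of why both forms exist: plain callbacks can be instantiated up front, but a callback that needs learn in its __init__ has to be passed as a factory and constructed inside Learner, once self exists:

```python
class Callback: pass

class Recorder(Callback):
    def __init__(self): self.lrs = []          # needs nothing at init

class LearnerCallback(Callback):
    def __init__(self, learn): self.learn = learn  # needs the Learner itself

class Learner:
    def __init__(self, cbs=None, cb_funcs=None):
        self.cbs = list(cbs or [])
        for cbf in (cb_funcs or []):
            self.cbs.append(cbf(self))          # can only be built once self exists

learn = Learner(cbs=[Recorder()], cb_funcs=[LearnerCallback])
```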
OK, thank you Jeremy.
I didn’t know, because it isn’t mentioned in the official Callback documentation. I will try it.
They start by defining beta and zeta values (in the model definition).
Then, by applying the annealing algorithm to these values (parabolic and exponential annealing), they reduce the loss during training.
It cannot yet be applied in the model definition in a single step, the way dropout can, but I am happy to find this implementation. Congratulations to the authors!
I discovered the delta rule last year during my PhD in AI at BIU.
Thanks for sharing @Kaspar!
Does it make sense to normalize images after data augmentation, in case that step introduces overly large distortions, especially to colors?
Usually (e.g. for ImageNet) the normalization is fixed based on statistics of the entire training dataset, not per-image statistics. (And that makes sense; consider Jeremy’s fog vs. sunny thought experiment from the other day.) As such, the normalization doesn’t depend on the augmentation, so it doesn’t matter as much.
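To make the distinction concrete, here is a sketch using the widely quoted ImageNet channel statistics: they are dataset-wide constants, computed once over the training set and applied identically to every image, augmented or not:

```python
import numpy as np

# channel means/stds computed once over the whole training set
# (these are the commonly used ImageNet values)
mean = np.array([0.485, 0.456, 0.406])
std  = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """img: float array of shape (H, W, 3) with values in [0, 1]."""
    return (img - mean) / std  # same constants for every image

img = np.full((2, 2, 3), 0.5)
out = normalize(img)
```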
Thank you Stas!
All about magic methods in Python:
http://minhhh.github.io/posts/a-guide-to-pythons-magic-methods
Check the docs for add and tell us what you find.
None of the things we’ve built in this course are in the docs. We’re building everything from scratch, remember!
No, because the point of such augmentation would be lost if you then normalized it out again!
I’m getting an assertion error for test_eq(setify('aa'), {'aa'}). It looks like 'aa' is getting listified as ['a', 'a'], which then turns into the set {'a'}. My listify is coming from nb04; any suggestions?
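For reference, the usual fix is to special-case strings in listify so they are kept whole rather than iterated character by character. A sketch in the spirit of the course’s listify (not necessarily the exact nb04 code):

```python
from collections.abc import Iterable

def listify(o):
    if o is None: return []
    if isinstance(o, list): return o
    if isinstance(o, str): return [o]           # keep strings whole
    if isinstance(o, Iterable): return list(o)  # other iterables expand
    return [o]

def setify(o):
    return o if isinstance(o, set) else set(listify(o))

setify('aa')  # -> {'aa'}, not {'a'}
```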
OK, I got it Jeremy. Thank you!
I don’t understand why you shouldn’t normalize after augmentation. Augmentation extends the training set with “new” images, exactly as if we had gathered such images in the field as part of our data set. In the latter case, we would normalize the whole data set, after its collection. Why wouldn’t we treat the data set containing the augmented images the same way?
You might need to git pull the course repo.