This is a place to talk about more advanced or tangential topics related to the Lesson 2 lecture. This will not be monitored during class, but we will read it afterwards.
Feel free to discuss anything you like, as long as it’s at least somewhat related to what’s happening in class.
In previous courses, one of the steps was to take the complete dataset and resize everything to something like 256, keeping a separate dataset at maybe 512 for later. For planet and CamVid, this no longer seems to be the case.
Is this because of the new transforms API? Don't we still have the overhead of opening large images? I also notice the transform library uses PIL rather than OpenCV. Should we be using Pillow-SIMD? What are the best practices now?
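For what it's worth, here's a minimal sketch of how I'd build the planet DataBunch with on-the-fly resizing in fastai v1 (exact names like `ImageList.from_csv` and `split_by_rand_pct` depend on your fastai version, and the paths/filenames are placeholders for your setup):

```python
from fastai.vision import *

# Placeholder path for the planet dataset; adjust for your setup.
path = Path('data/planet')

# Transforms are applied lazily, so each image is resized on the fly
# as it is loaded -- no separate pre-resized copy of the dataset.
tfms = get_transforms(flip_vert=True, max_warp=0.)

data = (ImageList.from_csv(path, 'train_v2.csv', folder='train-jpg', suffix='.jpg')
        .split_by_rand_pct(0.2)
        .label_from_df(label_delim=' ')
        .transform(tfms, size=256)   # the resize happens here, per item
        .databunch(bs=64)
        .normalize(imagenet_stats))

# For progressive resizing later, rebuild the DataBunch with size=512
# and keep training the same learner.
```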
For the ResNet architectures, how do we know whether the loss function for each is convex or non-convex? If they are non-convex, does fastai automatically run multiple different starting points?
When I try to use fp16 training in the Lesson 3 notebook (planet dataset), the kernel dies. I'm not sure why it happens, or whether it's something related to my setup or drivers. I guess similar problems are discussed in this thread. In my case, I get a KeyboardInterrupt exception, but I guess it could be something else on other machines:
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch-nightly_1541148374828/work/aten/src/THC/generic/THCTensorCopy.cpp:20
It was already mentioned in this thread as well:
It seems that mixed-precision training is a bit broken, or perhaps it's something with the drivers? Do we need to rebuild PyTorch from source to solve these issues? I am using PyTorch with CUDA 9.2 and the 410 driver.
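In case it helps anyone reproduce or debug, this is roughly how I'm enabling mixed precision (assuming a `data` DataBunch already exists; `create_cnn` may be called `cnn_learner` depending on your fastai version):

```python
from fastai.vision import *

# Assuming `data` is the planet DataBunch from the earlier sketch.
learn = create_cnn(data, models.resnet50, metrics=[accuracy_thresh])

# to_fp16() converts the learner to mixed-precision training.
learn = learn.to_fp16()
learn.fit_one_cycle(5, 1e-2)

# If you hit CUDA errors like the one quoted above, switching back to
# full precision is a quick way to confirm fp16 is the culprit:
# learn = learn.to_fp32()
```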
For our simple example, the loss function is (y_act - y_pred)^2 = (y - Ax)^2. If you plot this function, you'll see that it has one global minimum (not a series of minima), so once you see that your gradient is close to zero, you know you've found the one and only minimum. For a convex function, any local minimum is a global minimum (and the Hessian is positive semi-definite). However, there are non-convex functions with multiple minima; think of a sine wave or various polynomial functions. In those cases, you can get "stuck" in a local minimum that may be much worse than the true global minimum, meaning there are much better parameters out there at the global minimum.
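A quick numeric sketch of the difference (plain NumPy, with toy functions I made up for illustration): gradient descent on the convex quadratic ends up at the same minimum from any start, while on a bumpy non-convex function the answer depends on where you start:

```python
import numpy as np

def grad_descent(grad, x0, lr=0.1, steps=200):
    """Plain gradient descent from a given starting point."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Convex case: f(x) = (y - a*x)^2 has a single global minimum at x = y/a.
a, y = 2.0, 6.0
grad_convex = lambda x: -2 * a * (y - a * x)
print(grad_descent(grad_convex, x0=-10.0))  # ~3.0, regardless of x0

# Non-convex case: f(x) = sin(3x) + 0.1*x^2 has several local minima.
grad_bumpy = lambda x: 3 * np.cos(3 * x) + 0.2 * x
print(grad_descent(grad_bumpy, x0=4.0, lr=0.05))   # lands in one local minimum
print(grad_descent(grad_bumpy, x0=-4.0, lr=0.05))  # a different, possibly worse one
```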
Optimizing neural networks is generally a non-convex problem. There's a paper somewhere that basically showed you essentially never find the global minimum, but there are many local minima that are very close to it, so they're "good enough".
Other than medical imaging, what are some practical use cases for image segmentation? Jeremy mentioned self-driving cars, but I can't imagine the effort that goes into pixel-wise labelling of millions of images from self-driving-car cameras. Is there a way to fast-track the labelling process?
What about the loss function for multi-label classification? Does the same loss function (cross-entropy) work for multi-label classification as well?
In Keras, I use binary cross-entropy + sigmoid for multi-label; it's not clear how fastai takes care of this.
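As far as I can tell, fastai v1 does the same thing under the hood: when the labels are multi-category it defaults to a sigmoid + binary cross-entropy loss, which in PyTorch terms is just `nn.BCEWithLogitsLoss`. A minimal PyTorch sketch (shapes made up for illustration):

```python
import torch
import torch.nn as nn

# Multi-label setup: each image can have several tags at once, so the
# target is a multi-hot vector rather than a single class index.
logits  = torch.randn(4, 17)                    # batch of 4, 17 possible tags
targets = torch.randint(0, 2, (4, 17)).float()  # multi-hot labels

# Sigmoid per label + binary cross-entropy, fused for numerical stability.
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(logits, targets)
print(loss.item())
```

So in fastai you shouldn't need to set it manually; labelling with `label_delim=' '` should pick the multi-label loss for you (worth double-checking `learn.loss_func`).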
Mainly in LSTMs; I think it's because of the vanishing gradient problem. I don't have any reference to back up my argument; it's just a practice I've observed.