Are there any techniques (or practices) that are widely used in industry for lifelong learning?
One common piece of advice is to use a low learning rate when fine-tuning the network on a new set of incremental data, but this deteriorates the previously learnt weights (quite badly) over time. @jeremy mentioned in Lecture 3 (2019), at 43:32, that one can train with a higher learning rate on the new (incremental) dataset, but this can lead to overfitting on the new data, so the network forgets what it learnt previously. The effect is even worse if the incremental dataset is quite small. At least, this is my experience.
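To make the forgetting effect concrete, here is a minimal toy sketch in plain Python. It is not a deep network and not anything from the lecture: the linear model, the two made-up datasets (`old_data`, `new_data`), and the step counts are all assumptions chosen only for illustration. The model is first fitted to the old data, then fine-tuned on a tiny new dataset with a 10x lower learning rate, and the loss on the old data is measured before and after.

```python
# Toy illustration only: a two-parameter linear model y = w * x + b,
# trained with plain full-batch gradient descent (no libraries needed).

def mse(w, b, data):
    """Mean squared error of the line on a list of (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(w, b, data, lr, steps):
    """Run `steps` gradient-descent updates on `data` and return (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Hypothetical "old" dataset: 11 points on the line y = 2x.
old_data = [(i / 10, 2 * i / 10) for i in range(11)]
# Hypothetical small incremental dataset following y = 1 - x, so its
# targets deliberately conflict with the old relationship.
new_data = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]

# Stage 1: train on the old data until it is fitted well.
w, b = fit(0.0, 0.0, old_data, lr=0.1, steps=2000)
loss_before = mse(w, b, old_data)

# Stage 2: fine-tune on the new data only, with a 10x lower learning rate.
w, b = fit(w, b, new_data, lr=0.01, steps=2000)
loss_after = mse(w, b, old_data)

print(f"old-data loss before fine-tuning: {loss_before:.4f}")
print(f"old-data loss after fine-tuning:  {loss_after:.4f}")
```

In this toy setup the low learning rate only slows the drift; given enough updates the parameters still move to fit the new data and the old-data loss rises substantially. Real networks have far more capacity, so how strongly this carries over is something to measure on your own data.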
There is a lot of research on this topic using various concepts such as distillation, elastic weight consolidation, learning without forgetting, incremental classifier and representation learning (iCaRL), deep adaptation, progressive distillation and retrospection, and hedge adaptation, just to name a few that I am aware of. But these results are reported on research datasets, and I am not sure whether the methods actually perform as advertised on real-world datasets.
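As one example of what these methods look like in code, here is a hedged sketch of the quadratic penalty used by elastic weight consolidation (EWC): the new-task loss gets an extra term `(lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2`, where `F_i` is a per-parameter importance estimate (the diagonal of the Fisher information on the old task). The function names and the numbers below are made up for illustration; in practice `fisher` would be estimated from the old data and `lam` tuned.

```python
def ewc_penalty(params, old_params, fisher, lam):
    """EWC regulariser: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    return 0.5 * lam * sum(
        f * (p - p_old) ** 2
        for p, p_old, f in zip(params, old_params, fisher)
    )

def ewc_penalty_grad(params, old_params, fisher, lam):
    """Gradient of the penalty w.r.t. each parameter: lam * F_i * (theta_i - theta_old_i)."""
    return [
        lam * f * (p - p_old)
        for p, p_old, f in zip(params, old_params, fisher)
    ]

# Illustrative values only (assumptions, not taken from any real model).
old_params = [2.0, 0.0]   # parameters after training on the old task
fisher = [0.35, 1.0]      # made-up importance weights for each parameter
params = [1.0, 0.5]       # current parameters during fine-tuning
lam = 10.0                # hypothetical regularisation strength

penalty = ewc_penalty(params, old_params, fisher, lam)
grad = ewc_penalty_grad(params, old_params, fisher, lam)
print(penalty, grad)
```

During fine-tuning, `penalty` would be added to the loss on the new data (and `grad` to its gradient), so parameters that mattered for the old task are pulled back towards their old values while unimportant ones stay free to move.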
Some companies (https://www.neurala.com/press-releases/neurala-announces-breakthrough-update-to-award-winning-deep-neural-network-technology) claim to have a solution for catastrophic forgetting in deep learning (with slightly reduced performance on the whole dataset).