[Paper] The unreasonable effectiveness of the forget gate

Reading this, it strikes me that in many AI advances, the first version of an idea turns out to be more complicated than the variants that eventually work well. The authors give a historical perspective on RNNs, LSTMs, and several variants with different kinds of gates. https://arxiv.org/pdf/1804.04849.pdf