PyTorch LSTM not present inside fastai

I noticed that the PyTorch LSTM class is not used inside fastai at the moment.
It may be on purpose.

After watching Lesson 7, I got an idea of how, by building on top of this code, you can basically create a state-of-the-art LSTM.

Thank you guys for keeping everything plain and simple.


I believe fastai wraps the PyTorch LSTM (nn.LSTM) unit in its awd-lstm implementation:

See line 94.
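For context, here is a minimal sketch of what wrapping nn.LSTM in a module can look like. The class name and structure here are purely illustrative, not the actual fastai awd-lstm code:

```python
import torch
import torch.nn as nn

class WrappedLSTM(nn.Module):
    """Illustrative wrapper around the stock nn.LSTM unit, roughly in the
    spirit of building an architecture on top of the PyTorch primitive
    (this is a hypothetical sketch, not fastai's implementation)."""
    def __init__(self, input_size, hidden_size, num_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True)

    def forward(self, x, hidden=None):
        # x: (batch, seq_len, input_size)
        output, hidden = self.lstm(x, hidden)
        return output, hidden

wrapped = WrappedLSTM(input_size=10, hidden_size=20)
x = torch.randn(4, 5, 10)
out, (h, c) = wrapped(x)
print(out.shape)  # torch.Size([4, 5, 20])
```

An implementation like awd-lstm then adds regularization (e.g. weight dropout) around a core unit like this rather than rewriting the recurrence itself.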

@Esteban, look at this: @sgugger reverted some code 2 days ago. This is hot :slight_smile:

It looks like the bidirectional part is still missing the other direction, though.
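For what it's worth, the stock PyTorch unit already supports bidirectionality via a flag; a minimal sketch of how that behaves:

```python
import torch
import torch.nn as nn

# nn.LSTM supports bidirectional=True out of the box; the output feature
# dimension doubles because the forward and backward passes are concatenated.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True,
               bidirectional=True)
x = torch.randn(4, 5, 10)   # (batch, seq_len, features)
out, (h, c) = lstm(x)
print(out.shape)  # torch.Size([4, 5, 40]), i.e. 2 * hidden_size
print(h.shape)    # torch.Size([2, 4, 20]), one state per direction
```

Wiring this into a full architecture (embeddings, dropout, decoder) is the part that takes extra work, which may be why it lagged behind.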

Still, I like building an LSTM from scratch instead of using the original PyTorch LSTM, because you have full control over what you build, and the platform forces you to be creative.

Yes this code base is very active. The fastai developers are adding features on a daily basis. I love the work they are doing.

I haven’t tried the bidirectional feature yet, but the unidirectional awd-lstm works amazingly well.

Developing an LSTM from scratch is a great way to learn the details.
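As an exercise in that direction, here is a from-scratch LSTM cell built only from Linear layers. This is an illustrative sketch of the standard LSTM equations, not code from fastai or the lessons:

```python
import torch
import torch.nn as nn

class LSTMCellScratch(nn.Module):
    """A from-scratch LSTM cell (hypothetical sketch): the four gates
    (input, forget, cell, output) are computed with two linear maps."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.ih(x) + self.hh(h)
        i, f, g, o = gates.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g          # update the cell state
        h = o * torch.tanh(c)      # compute the new hidden state
        return h, (h, c)

cell = LSTMCellScratch(10, 20)
h = c = torch.zeros(4, 20)
out, (h, c) = cell(torch.randn(4, 10), (h, c))
print(out.shape)  # torch.Size([4, 20])
```

Writing the gates out by hand like this makes it clear exactly what nn.LSTM is doing under the hood.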

@Esteban I am positive awd-lstm is probably the best LSTM we can get, since it is part of fastai.

I am just guessing that the PyTorch LSTM has fewer features compared to the Keras LSTM. I mean, if you check the documentation pages, it is obvious.

I very much liked the idea of creating an LSTM, but in the way we started here.

Note that Model1, Model2, and Model3 are RNNs without the PyTorch RNN class inside. Model4 and Model5 use the PyTorch RNN and GRU.

My question would be: is it smart to create LSTM modules without using the PyTorch RNNs (RNN, GRU, or LSTM), but instead, similar to the Model1, Model2, and Model3 approach, using just the Embedding layer, the Linear layer, batch norm, and possibly a few more simple modules if needed, without the RNN modules PyTorch provides?
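To make the question concrete, a Model1/Model2-style RNN built only from Embedding and Linear layers, with no nn.RNN/GRU/LSTM inside, could look roughly like this (a hypothetical sketch following the lesson's approach, not the actual lesson code):

```python
import torch
import torch.nn as nn

class ScratchRNN(nn.Module):
    """RNN built only from Embedding and Linear layers: the recurrence
    is an explicit Python loop over time steps (illustrative sketch)."""
    def __init__(self, vocab_size, emb_size, hidden_size):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)
        self.i_h = nn.Linear(emb_size, hidden_size)     # input to hidden
        self.h_h = nn.Linear(hidden_size, hidden_size)  # hidden to hidden
        self.h_o = nn.Linear(hidden_size, vocab_size)   # hidden to output

    def forward(self, tokens):
        # tokens: (batch, seq_len) of token ids
        bs, sl = tokens.shape
        h = torch.zeros(bs, self.h_h.out_features)
        for t in range(sl):
            h = torch.tanh(self.i_h(self.emb(tokens[:, t])) + self.h_h(h))
        return self.h_o(h)  # predict the next token from the final state

model = ScratchRNN(vocab_size=100, emb_size=16, hidden_size=32)
logits = model(torch.randint(0, 100, (4, 5)))
print(logits.shape)  # torch.Size([4, 100])
```

Turning this into an LSTM would mean replacing the single tanh update with the gated cell update, still using only Linear layers.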

This is just my opinion, but things like Model1, Model2, and Model3 are great exercises for learning. In practice, you will want to leverage (or extend) library implementations like the PyTorch RNNs (GRU, LSTM, etc.). The exception might be if you are trying to invent something novel, or if the current library implementations lack the features you need. There are so many opportunities to introduce bugs if you always implement from scratch, and it can be hard to compare your work to others'.

Again, this is just my opinion, and I don't want to discourage you from implementing things from scratch if you so desire. Most deep learning practitioners will leverage as many library features as they can, but a researcher may play around with implementing features from scratch to advance the state of the art.