Going through the NLP part of the new part 1 course and book. Wondering if it would be useful to have transformer models be the default, or at least easier to use out of the box with fastai. From the little I have seen, it seems like they would be faster to fine-tune. I think I read somewhere that they are not as good at classification as RNNs or LSTMs, but I really don't know.
Any drawbacks to using transformers instead of RNNs or LSTMs?
I did try running that just now and got an out of memory error. It happens at learn.lr_find, or if I skip that, at learn.fit_one_cycle. This is on Google Colab.
RuntimeError: CUDA out of memory. Tried to allocate 148.00 MiB (GPU 0; 15.75 GiB total capacity; 14.31 GiB already allocated; 28.88 MiB free; 14.39 GiB reserved in total by PyTorch)
I guess high memory usage would be one problem with these large language models? Maybe DistilBERT or something similar would work better.
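Besides switching to a smaller model like DistilBERT, a common workaround for CUDA OOM is to shrink the batch size and accumulate gradients over several micro-batches, so the effective batch size stays the same while only one micro-batch lives on the GPU at a time (fastai ships this as the GradientAccumulation callback). Here is a minimal plain-Python sketch of the idea; the toy grad function is hypothetical, just to make the arithmetic concrete:

```python
# Sketch: accumulating gradients over micro-batches reproduces the
# full-batch gradient, while only one micro-batch needs to be in
# memory at a time. The per-example gradient below is a made-up toy
# (squared-error style), not any real model's gradient.

def grad(example, w):
    # Hypothetical per-example gradient for illustration only.
    return 2 * (w - example)

def full_batch_grad(batch, w):
    # What one big batch would compute (may not fit in GPU memory).
    return sum(grad(x, w) for x in batch) / len(batch)

def accumulated_grad(batch, w, micro_bs):
    # Same result, computed micro-batch by micro-batch: accumulate,
    # and only take the optimizer step after the whole batch is seen.
    total = 0.0
    for i in range(0, len(batch), micro_bs):
        micro = batch[i:i + micro_bs]
        total += sum(grad(x, w) for x in micro)
    return total / len(batch)

batch = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
w = 1.25
print(abs(full_batch_grad(batch, w) - accumulated_grad(batch, w, 2)) < 1e-9)
```

In fastai that would look roughly like `Learner(..., cbs=GradientAccumulation(n_acc))` with a smaller bs in the DataLoaders, though the exact numbers you can fit will depend on the model and the Colab GPU you get.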