I am trying to build an English language model based on lesson 4 (lesson4-imdb). The only difference from the class version is the data: ~100k random news articles, each a .txt file of at least 1 KB. I split the corpus into three directories: /train (~60k), /test (~20k), and /valid (~20k). The problem is that when I run the same code as in the lesson, I run out of memory about 30% of the way into my first run. I'm on a single 1080 Ti (11 GB).
My question: should I change something in the code (for example the batch size), or should I cut the dataset in half, train the model on the first half, and then continue training it on the second half?
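For reference, here is roughly the part of the code I mean (reconstructed from memory of the lesson4-imdb notebook, fastai 0.7 API, with my own paths substituted in, so treat the details as approximate):

```python
from fastai.nlp import *      # fastai 0.7, as imported in the lesson notebook
from torchtext import data

PATH = 'data/news/'           # my corpus root, containing train/ valid/ test/

bs   = 64                     # batch size (width) -- the knob I'd cut first
bptt = 70                     # backprop-through-time length (height)

TEXT  = data.Field(lower=True, tokenize='spacy')
FILES = dict(train='train', validation='valid', test='test')
md = LanguageModelData.from_text_files(PATH, TEXT, **FILES,
                                       bs=bs, bptt=bptt, min_freq=10)
```

My (possibly wrong) mental model is that each minibatch is a bs × bptt matrix of tokens, so activation memory scales with the product and halving either one saves about the same amount; I just don't know which one is cheaper in terms of learning.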
P.S. If I drop the batch size to something like 32, 16, or 8, should I also reduce the backprop-through-time (bptt) length to save memory, or would that hurt learning? Is there a sweet spot or best-practice ratio between batch size (width) and bptt (height)?