Optimising class LanguageModelLoader()

(Sudarshan) #42

I will have to check it. I have not tried your example. My numbers are using my dataset. I will report back in the next few days.

(Sudarshan) #43

I tried it with the latest pull. It is faster but not as fast as it was before. For perspective, the fastest in release-1.0.38 for 1 epoch on my dataset took a little over 2 hours. The slowest before the latest pull for 1 epoch took over 12 hours. For this latest pull, the current estimate says a little over 10 hours.

So there is an improvement, but it is not as fast as it used to be.

(Kaspar Lund) #44

Yes, but the memory peak has gone down, so you should be able to increase the batch size, and that helps.
With bptt=70, I get the following execution times for an epoch:

bs= 32 => time = 39:20 min
bs= 64 => time = 32:30 min
bs=128 => time = 29:10 min


(Sylvain Gugger) #45

Can you manage to git bisect to the exact commit where this starts happening? That would be very helpful, since I don’t have an example reproducing this.

(Sudarshan) #46
5dd3868173212e100e5113b14e8d46a6305b4e23 is the first bad commit
commit 5dd3868173212e100e5113b14e8d46a6305b4e23
Author: Sylvain <sylvain.gugger@gmail.com>
Date:   Wed Dec 26 14:28:32 2018 +0000

    LanguageModelLoader don't make copies of the datasets

:040000 040000 bed3f7c98746d6ce3f3a549f163238bf2d8f2157 cca5e94b603e9a49654063bdac8c47dbe3de6ecc M      fastai
:040000 040000 a746759baa590f0c7da055bf88f3f6c460ae9537 97cb3c36f3a97b77f08b2ee7f41df5cf9f144c2e M      tests

I used git bisect, starting with release-1.0.38 (good) and release-1.0.39 (bad). The output above shows the first bad commit and its diff summary.

(Sudarshan) #47

I’m currently using the commit before the bad commit to train my language model. If I train it from this commit, will everything work without problems in the latest master?


(Sylvain Gugger) #48

Normally yes, as we didn’t make any changes to how models are loaded.
There is the batch-first commit, which may come after that one, but it won’t change your model’s weights.

(Sudarshan) #49

Do you have any idea what is causing the slow down? Do you need any more information from me?


(Sylvain Gugger) #50

Well, it looks like it’s the new language model loader, but I need some time to investigate. How large is the dataset you’re working with?

(Sudarshan) #51

For creating the LM, the dataset in use has 5,635,619 records, for a total of 1.2GB of text data.


(Sylvain Gugger) #52

OK, trying with something that has 5,000,000 texts: the current implementation loops over one epoch in 47 hours (ouch) instead of 21 seconds before. @Kaspar’s PR gets us to 6 minutes (so not a matter of seconds, but we save memory, and it shouldn’t even make a difference since the GPU actually has to do stuff). Did you try it?

(Abu Fadl) #53

Is @Kaspar’s PR already integrated in 1.0.41?


(Sylvain Gugger) #54

It will be in 1.0.42 (currently it’s on master).

(Sudarshan) #55

That is a HUGE jump in runtime. Any idea what is causing that?

Can you please elaborate on this? I didn’t understand.

I just started a run with the latest pull from master and good news, the 1 epoch estimate is showing a little over 2 hours (similar to what we had before the runtime hike). Does this mean this problem is solved?


(Sylvain Gugger) #56

The previous implementation was very slow on a huge dataset because, for every line of a batch, it recomputed the index at which to look for the text. The new one by @Kaspar caches those indices very smartly, so we save that compute time. I didn’t catch it because this compute time grows with the size of the dataset, and I hadn’t run any tests on a very big dataset.
Now the problem should be solved, yes. The new implementation is slightly slower than the old one (which used a lot of RAM), but you won’t notice, since the bottleneck is the GPU again.
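To make the fix concrete, here is an illustrative sketch (with assumed names, not the actual fastai code) of the lookup being described: mapping an absolute token index in the concatenated stream to a (text, offset) pair. The slow path re-scanned the per-text lengths for every batch row; precomputing the cumulative lengths once turns each lookup into a binary search.

```python
from bisect import bisect_right
from itertools import accumulate

lengths = [5, 3, 8, 2]              # token counts of four hypothetical texts
cum = list(accumulate(lengths))     # [5, 8, 16, 18], computed once per epoch

def locate(idx):
    """Return (text_number, offset_within_text) for absolute token index idx."""
    t = bisect_right(cum, idx)      # O(log n) binary search instead of an O(n) rescan
    start = cum[t - 1] if t else 0
    return t, idx - start

print(locate(0))    # first token of the first text
print(locate(7))    # third token of the second text
print(locate(16))   # first token of the last text
```

With millions of texts, doing the O(n) rescan for every row of every batch is exactly the kind of cost that grows with dataset size, which is why the regression only showed up on very large datasets.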

(Piotr Czapla) #57

@sgugger, it seems that we lost the random bptt feature in the LanguageModelPreLoader. Is this intentional, with the randomized bptt done somewhere else, or was it dropped by accident? I’m checking the code of the loader because the recent switch to LanguageModelPreLoader broke our joint bidirectional training implementation, and I’m trying to update the code to work with the change.

I’ve noticed that we now shuffle the indexes on every epoch, so the randomized bptt is not needed.


(Sylvain Gugger) #58

Exactly. I’ve double-checked on wikitext-2: the shuffle of the articles is enough, and it actually gives slightly better results than the randomized bptt. Just be sure to pass your dataset with one full article per item (otherwise you’ll shuffle weird things).
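A toy illustration of this point (hypothetical names, not the fastai internals): with a fixed bptt, reshuffling whole articles at the start of each epoch already makes the fixed-size windows fall on different token boundaries, so the extra per-batch bptt jitter is no longer needed.

```python
import random

articles = [["a1", "a2", "a3"], ["b1", "b2"], ["c1", "c2", "c3", "c4"]]

def epoch_stream(articles, seed):
    """Concatenate articles in a freshly shuffled order (one shuffle per epoch)."""
    order = list(range(len(articles)))
    random.Random(seed).shuffle(order)
    return [tok for i in order for tok in articles[i]]

def windows(stream, bptt):
    """Cut the stream into fixed-size bptt windows."""
    return [stream[i:i + bptt] for i in range(0, len(stream), bptt)]

# Same tokens every epoch, but the fixed-size windows cut them differently
# because the article order changed.
print(windows(epoch_stream(articles, seed=0), bptt=3))
print(windows(epoch_stream(articles, seed=1), bptt=3))
```

This also shows why one full article per item matters: the shuffle happens at the item level, so shuffling partial articles would splice unrelated fragments together.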

(Kaspar Lund) #59

Hi @piotr.czapla, in what way did it break?
The LanguageModelPreLoader can load both forwards and backwards; see also the test in test_text_languagemodelpreloader.py.
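A minimal sketch of the forwards/backwards idea (behaviour inferred from the discussion, not the actual LanguageModelPreLoader code): the backward variant simply feeds the concatenated token stream in reverse, so the model learns to predict each token from the ones that follow it.

```python
def lm_stream(tokens, backwards=False):
    """Return the token stream in reading order, or reversed for a backward LM."""
    return list(reversed(tokens)) if backwards else list(tokens)

tokens = ["the", "cat", "sat", "down"]
print(lm_stream(tokens))
print(lm_stream(tokens, backwards=True))
```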

(Piotr Czapla) #60

Your code is fine, it was just me relying on the existence of LanguageModelLoader. :slight_smile: