Learn.lr_find taking way too long to run


When I try to plot the learning rate, it behaves oddly and takes far too long to plot the graph. The image shows it at 7 iterations, but it runs up to the default of 100, which takes around 2 hours just to produce the plot. I am running this in Jupyter Lab on GCP.

I then tried passing a smaller number of iterations and got the following graph and message:

Also, before this I got the following error message while plotting the graph:

Error: libsixel is needed. See https://github.com/saitoha/libsixel

This error had never appeared before either, but I fixed it by installing ‘Libsixel’ with pip.

This didn’t use to happen, so I’m not sure what triggered the change or why it is happening all of a sudden. Any suggestions on how to fix it, or on what I am doing wrong, would be really helpful.

You should check your GPU usage while it runs. If it sits at 0% most of the time, then the time is going into fetching/preprocessing your dataset before it is fed to the network.
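One way to test the "data loading is the bottleneck" theory directly is to time how long each batch takes to arrive, without running the model at all. The sketch below is only an illustration: `time_batches` and `fake_loader` are made-up names, and the fake loader stands in for a real one (with fastai you would presumably pass something like `data.train_dl`, which is an assumption about your setup).

```python
import time


def time_batches(loader, n_batches=10):
    """Return the fetch time in seconds for each of the first n_batches."""
    times = []
    it = iter(loader)
    for _ in range(n_batches):
        start = time.perf_counter()
        try:
            next(it)  # only fetch the batch; no model work happens here
        except StopIteration:
            break  # loader had fewer batches than requested
        times.append(time.perf_counter() - start)
    return times


def fake_loader(n=5, delay=0.01):
    """Hypothetical stand-in for a real data loader: sleeps, then yields."""
    for i in range(n):
        time.sleep(delay)
        yield i


timings = time_batches(fake_loader(), n_batches=3)
print("seconds per batch:", [round(t, 4) for t in timings])
```

If each batch takes many seconds to arrive, the slowdown is in loading or preprocessing rather than in the network itself.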

Hi Julian, I checked the GPU usage and it’s not 0%.


Is there any other possible reason for this that I could look into and fix?

Sorry, I wasn’t clear enough. GPU usage is the value under Volatile GPU-Util, and at the moment you ran that command nothing was being processed by the GPU. It sounds more like a problem with your dataset. How long does show_batch take?
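A single nvidia-smi snapshot can be misleading, so it may help to sample the utilization several times while lr_find is running. This is a rough sketch, assuming `nvidia-smi` is on the PATH; `parse_gpu_util` and `sample_gpu_util` are hypothetical helper names, not part of any library.

```python
import subprocess


def parse_gpu_util(text):
    """Parse one utilization percentage per line (one line per GPU).

    Expects the output of:
      nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
    Lines that are not plain integers (e.g. "[N/A]") are skipped.
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def sample_gpu_util():
    """Run nvidia-smi once and return the utilization of each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_util(result.stdout)


# Example with canned output, so this runs without a GPU:
print(parse_gpu_util("0\n97\n"))
```

Calling `sample_gpu_util()` every second or so from a second notebook or terminal while training runs would show whether the GPU is idle most of the time.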

data.show_batch took around 10 seconds to show the sample images.

I am receiving an “OSError: [Errno 9] Bad file descriptor” message, along with a warning that points to the multiprocessing module:
File "/opt/anaconda3/lib/python3.7/multiprocessing/connection.py", line 177, in close

Could this be a possible reason? Please also let me know how to fix it.

Hard to know without seeing your code. That error usually shows up when Python tries to close a file that was already closed. It could be an external process closing the files, your code picking the same file twice when sampling at random, or something else.
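To make the "closing an already closed file" case concrete, here is a tiny reproduction that has nothing to do with fastai. It opens a pipe, closes the read end, and then closes it a second time, which is where the OS reports errno 9. `close_twice` is just an illustrative name.

```python
import errno
import os


def close_twice():
    """Close the same file descriptor twice and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)          # first close succeeds
    try:
        os.close(read_fd)      # descriptor is no longer valid
    except OSError as exc:
        return exc.errno
    return None


code = close_twice()
print("got EBADF:", code == errno.EBADF)
```

In your case the double close happens inside multiprocessing, so the data loader's worker processes are the first place to look. PyTorch's DataLoader loads data in the main process when num_workers=0, so running once with that setting (if your data setup exposes it) could help tell whether the workers are involved.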