Theano compile time much longer than TensorFlow

I’m not sure if this is normal, but the bigger the model I create, the longer it takes for model.fit_generator to actually start training. For a small network with just a few dense layers this is on the order of seconds (say up to 10), but if I try something VGG-like with a lot of convolutional layers, it literally takes minutes.
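For reference, a minimal sketch of the kind of setup I mean (layer sizes and the random-data generator are just placeholders, and I’m assuming Keras 2 with channels_last image ordering):

```python
import time
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A small VGG-like stack; enough conv layers to make the delay noticeable.
model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', padding='same',
                 input_shape=(64, 64, 3)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy')

def gen():
    # Dummy batches, just to get fit_generator going.
    while True:
        yield (np.random.rand(8, 64, 64, 3).astype('float32'),
               np.random.rand(8, 10).astype('float32'))

# The backend compiles its train function lazily inside fit_generator,
# so the gap before the first batch starts is what I'm measuring.
t0 = time.time()
model.fit_generator(gen(), steps_per_epoch=1, epochs=1)
print('first call took %.1f s' % (time.time() - t0))
```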

During this time GPU-Z shows the GPU load at 0%, and Task Manager shows the Python process at around 10% CPU. I assume that corresponds to one fully loaded core, since 100% on one of 12 logical cores (i7-5820K) averages out to roughly 8% overall.

Anyway, I profiled the run with Python’s cProfile, and it turns out the majority of the time is spent in theano.compile.function_module, which I assume is responsible for compiling the model down to CUDA code. There are some issues on the Theano GitHub about different timings for the cuda and gpu backends, but both have been really slow on my machine.
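Roughly how I measured it, continuing from the sketch above (the output file name is arbitrary):

```python
import cProfile
import pstats

# Profile the call that stalls; sorting by cumulative time put
# theano.compile.function_module at the top for me.
cProfile.run('model.fit_generator(gen(), steps_per_epoch=1, epochs=1)',
             'fit_profile.stats')
pstats.Stats('fit_profile.stats').sort_stats('cumulative').print_stats(20)
```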

However, switching to the TensorFlow backend brings the compile time down drastically (backend/device selection sketched after the list):

  • Theano backend, gpu device: killed it after 8 minutes
  • Theano backend, cuda device: 4 minutes 3 seconds
  • TensorFlow backend: 11 seconds
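This is how I switched between the configurations above; the "backend" field in keras.json and a .theanorc file work as well, and the device strings are just the ones I used:

```python
import os

# The backend has to be chosen before keras is imported.
os.environ['KERAS_BACKEND'] = 'theano'   # or 'tensorflow'

# For the Theano rows: 'device=gpu' selects the old CUDA backend,
# 'device=cuda' the new gpuarray one.
os.environ['THEANO_FLAGS'] = 'device=cuda,floatX=float32'

import keras
print(keras.backend.backend())
```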

Note that I do have CUDA and cuDNN configured with Theano, and they are used during training; the numbers above are just the time between calling fit/fit_generator and training actually starting.
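To double-check that cuDNN really is picked up (this assumes the new gpuarray backend; the old one has a similar theano.sandbox.cuda.dnn.dnn_available()):

```python
import theano
print(theano.config.device)   # e.g. 'cuda0' when the GPU is in use

# Reports whether Theano found a usable cuDNN; gpuarray backend only.
from theano.gpuarray import dnn
print(dnn.dnn_present())
```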

Anyone else observed anything similar?