Fine-tuning with differential learning rates (cycle_mult=2) takes a long time

Hi,
I am using an AWS p2.xlarge instance.
In lesson 2, section 7.2, when I run the statement
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
it takes a long time, sometimes more than an hour.
What is the normal running time for this statement in lesson 2 on a GPU?
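
One thing that may account for part of the running time: with cycle_len=1 and cycle_mult=2, each cycle is twice as long as the previous one, so 3 cycles should run 1 + 2 + 4 = 7 epochs rather than 3. A minimal sketch of that arithmetic (total_epochs is a hypothetical helper written for illustration, not a fastai function):

```python
def total_epochs(n_cycles, cycle_len=1, cycle_mult=1):
    """Sum epochs over all cycles; each cycle is cycle_mult times longer than the previous one.

    Hypothetical helper for illustration only, not part of the fastai library.
    """
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycles))


# Same arguments as learn.fit(lr, 3, cycle_len=1, cycle_mult=2)
print(total_epochs(3, cycle_len=1, cycle_mult=2))  # 1 + 2 + 4 = 7
```

If that is right, the whole call should take roughly seven times as long as a single epoch on the same data.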