I hit some issues with the code because there is a bug in the notebook callback handler (it calls begin_epoch twice), but with some hardcoding it’s all working.
Here’s the code. I greatly welcome any improvements:
    def __init__(self, layer, dropout_final=.5, batch_size=538, num_epochs=1):
        self.layer = layer
        self.dp_final = dropout_final
        self.num_epochs = num_epochs  # store it: begin_fit reads self.num_epochs
        self.batch_size = 0  # len(self.data.train_dl) causes recursive err(?)
        self.total_iterations = 0
        self.warmup_sets = 0

    def begin_fit(self, **kwargs):
        # number of batches per epoch (despite the name, not samples per batch)
        self.batch_size = len(self.data.train_dl)
        print(self.batch_size, " batch size****")
        self.n_epochs = max(1, self.num_epochs)  # min(1,kwargs['n_epochs']) #avoid 0
        self.total_iterations = self.batch_size * self.n_epochs
        # main calculations for when to apply dropout %
        self.warmup_sets = int(self.total_iterations * .1)
        self.full_dropout_sets = self.warmup_sets * 2
        self.middle_sets = self.total_iterations - (self.warmup_sets + self.full_dropout_sets)
        print("breakout of sets: warmup ", self.warmup_sets, " middle ", self.middle_sets, " final ", self.full_dropout_sets)
        self.start_full_dropout = self.warmup_sets + self.middle_sets

    def begin_epoch(self, **kwargs):
        print("begin epoch - dp sched")
        print(self.current_epoch, " current epoch - dp sched")

    def begin_batch(self, **kwargs):  # per-batch hook; the name begin_batch is assumed
        #print("iteration = ",self.iter)
        if self.iter < self.warmup_sets:
            self.layer.p = 0  # no dropout during warmup
        elif self.iter > self.start_full_dropout:
            self.layer.p = self.dp_final
        else:
            i = self.iter - self.warmup_sets
            print(i, " i val")
            pct = round(i / self.middle_sets, 2)
            dp_pct = 1 - round(1 * (1 / self.middle_sets) ** pct, 2)
            print(dp_pct, " drop pct")
            new_drop = round(dp_pct * self.dp_final, 2)
            print("iter ", self.iter, " dp_pct ", new_drop)
            self.layer.p = new_drop
            #new_dp = self.curve(1, self.total, self.iter)
            #self.layer.p = new_dp
        #print("total iter ",self.total_iterations)
The paper shows that you need a smooth curve on the dropout adjustments - if you make big jumps, it causes the CNN to ‘forget’ and reset to some degree.
The paper has an algorithm to produce their curve, but I couldn’t make it work, and their GitHub just has a schedule they made in MATLAB or Excel, not actual code.
Anyway, I created a similar curve by doing:
0-10% - 0% dropout for the first 10% of the total iterations to be run
10-80% - an exponential curve that steadily increases dropout from 0% to the full rate, updated at each batch (this is the middle 70% of iterations)
80-100% - full dropout rate
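The three phases can be sketched as a standalone function, separate from any callback. The boundaries follow the arithmetic in the code above (warmup = 10% of iterations, full-rate tail = 2x warmup), which puts the start of the full-rate phase at 80% of iterations. The function name, the guard for very short runs, and dropping the intermediate rounding are my own simplifications, not part of the original callback.

```python
def scheduled_dropout(it, total_iterations, dp_final=0.5):
    """Dropout probability for 0-based batch index `it` out of `total_iterations`.

    Hypothetical helper: same phase arithmetic as the callback above, but only
    the final value is rounded.
    """
    warmup = int(total_iterations * 0.1)           # first ~10%: no dropout
    full = warmup * 2                              # last ~20%: full dropout
    middle = total_iterations - (warmup + full)    # remaining ~70%: ramp
    start_full = warmup + middle

    if it < warmup:
        return 0.0
    # Added guard (assumption): with 0 or 1 ramp steps there is no curve to follow.
    if it > start_full or middle <= 1:
        return dp_final
    pct = (it - warmup) / middle                   # 0.0 at ramp start, 1.0 at ramp end
    ramp = 1 - (1 / middle) ** pct                 # rises from 0 to 1 - 1/middle
    return round(ramp * dp_final, 2)
```

With 1,000 total iterations this gives 0.0 through iteration 100, 0.48 at iteration 450, and 0.5 from iteration 800 on, so most of the increase happens early in the ramp.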
I’ll try to run it tomorrow on Imagenette with XResNet50 to see how it compares to the leaderboard results.
I’m waiting for FastAI v2 to come out, and then I can hopefully finalize it. Note that you currently have to find the dropout layer manually and pass it in…I’d like to automate that lookup.
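One possible way to automate the lookup, as a rough sketch: walk the model's submodules and keep the ones whose class name contains "Dropout". This assumes the model is a PyTorch `nn.Module` (so it has `named_modules()`); matching on the class name is a shortcut I chose so the sketch needs no torch import, and `find_dropout_layers` is a hypothetical helper name.

```python
def find_dropout_layers(model):
    """Return (name, module) pairs for every dropout layer found in `model`.

    Assumes `model` exposes PyTorch's `named_modules()` iterator. Matching on
    the class name is an assumption of this sketch; an isinstance check against
    torch.nn.Dropout would be stricter.
    """
    return [(name, m) for name, m in model.named_modules()
            if "Dropout" in type(m).__name__]
```

Taking `find_dropout_layers(model)[-1][1]` would then give the last dropout layer in registration order, which is the one I currently pass in by hand.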
Also, in the paper they had 3 dropout layers - 1 right after the inputs (90% retention, or up to 10% dropout), 1 in the middle conv layer (75% retention, or up to 25% dropout), and 1 in the final layer before flatten (up to 50% dropout).
So, technically we need 3 layers to mimic the paper.
Right now I’m just running with one before the final output basically.
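To extend this to three layers, one option is to drive all of them from a single schedule value and scale each layer's own final rate by it. The sketch below is only an illustration: the layer names are hypothetical, and the 25% middle rate assumes the paper's "75%" is a retention rate.

```python
def per_layer_dropout(frac, final_rates):
    """Scale each layer's final dropout rate by a shared schedule fraction.

    `frac` is how far along the schedule we are (0.0 = no dropout yet,
    1.0 = full rate); values outside that range are clamped.
    """
    frac = min(max(frac, 0.0), 1.0)
    return {name: round(frac * p, 3) for name, p in final_rates.items()}


# Final dropout rates per layer; names are hypothetical, and 0.25 assumes
# that "75%" in the paper means retention.
PAPER_FINAL_RATES = {"input": 0.10, "middle": 0.25, "final": 0.50}
```

Each batch, the callback would compute `frac` once and assign the returned values to the `p` attribute of the matching layers.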
As noted, I’d appreciate any input on the very rough code above!