New Optimizer - DeepMemory

Hi all,
I’ve developed a new optimizer, DeepMemory, that is testing out quite well so far.

DeepMemory came out of blending DiffGrad and AdaMod. It extends AdaMod's memory concept from merely throttling the step size to actively blending it, and matches the memory length to the epoch size.
The core concept is to give the optimizer a long-term memory of the previous step sizes across the entire epoch.

Initial testing puts it on par with Ranger. Ranger and DeepMemory both came out on top in my recent comparison of about 8 different optimizers. To be clear, this is not conclusive! It's just what I could do in an afternoon of testing with $20 of GPU time.

DeepMemory is designed to offset the weakness of many adaptive optimizers by creating a ‘long term’ memory of the gradients over the course of an epoch. This long term memory is averaged against the current adaptive step size generated from the current mini-batch in order to help guide the step size more optimally.
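
To make the blending idea concrete, here is a minimal sketch in plain Python. It is not the actual DeepMemory code: the `StepMemory` class, the scalar step sizes, and the equal 50/50 weighting are all assumptions made for illustration. It only shows how a rolling memory of `len_memory` step sizes could be averaged against the step proposed for the current mini-batch.

```python
from collections import deque


class StepMemory:
    """Hypothetical helper: rolling memory of the last `len_memory` step sizes."""

    def __init__(self, len_memory):
        # deque with maxlen drops the oldest entry once the memory is full
        self.buffer = deque(maxlen=len_memory)

    def blend(self, current_step):
        # Store the step proposed for this mini-batch, then average it
        # with the mean of everything currently held in memory.
        # The equal weighting is an assumption, not taken from DeepMemory.
        self.buffer.append(current_step)
        long_term = sum(self.buffer) / len(self.buffer)
        return 0.5 * (current_step + long_term)


memory = StepMemory(len_memory=4)
# Three steady steps followed by a sudden spike
blended = [memory.blend(s) for s in [0.10, 0.10, 0.10, 0.50]]
```

In this toy run the spike is pulled back toward the long-term average instead of being applied in full, which is the guiding effect described above.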

DeepMemory also keeps a short-term gradient buffer, as developed in DiffGrad, and locks down the step size when minimal gradient change is detected.
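
For readers who have not seen DiffGrad, the short-term part can be sketched as below. This follows the friction coefficient from the DiffGrad paper (a sigmoid of the absolute change between consecutive gradients), applied to plain scalars. How DeepMemory wires it in is not shown in this thread, so the function names and the scalar setting are assumptions.

```python
import math


def diffgrad_friction(prev_grad, grad):
    """DiffGrad-style coefficient: sigmoid of the absolute gradient change.

    Ranges from 0.5 (no change between steps) towards 1.0 (large change).
    """
    return 1.0 / (1.0 + math.exp(-abs(prev_grad - grad)))


def damped_step(step, prev_grad, grad):
    # Scale the proposed step by the friction coefficient (hypothetical helper)
    return step * diffgrad_friction(prev_grad, grad)


flat = diffgrad_friction(0.3, 0.3)     # gradient did not change
steep = diffgrad_friction(5.0, -5.0)   # gradient changed a lot
```

When the gradient barely changes between steps the coefficient sits near 0.5 and the step is roughly halved; when it changes sharply the step passes through almost untouched.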

I’ll write up a more detailed article on Medium, but for now, if anyone is interested in testing, here’s the link along with a notebook already set up for running:

DeepMemory works best when it knows the batches per epoch in order to match the memory size to your epochs:

# should be equal to or close to the number of batches per epoch, in order to build an average step size for the dataset
memory_size = len(data.x) // bs
memory_size
optar = partial(DeepMemory, len_memory=memory_size)

*Please note that you need to copy the file over to your working directory and then import it:
from deepmemory import DeepMemory
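
On the memory_size calculation above: floor division gives the batch count when the last partial batch is dropped, while rounding up gives it when that batch is kept. A small sketch, where `batches_per_epoch` and its `drop_last` flag are hypothetical names and not part of DeepMemory or fastai:

```python
import math


def batches_per_epoch(n_items, bs, drop_last=True):
    """Hypothetical helper: number of mini-batches in one pass over the data."""
    if drop_last:
        return n_items // bs            # partial final batch is discarded
    return math.ceil(n_items / bs)      # partial final batch is kept


dropped = batches_per_epoch(1000, 64)
kept = batches_per_epoch(1000, 64, drop_last=False)
```

Since the memory only needs to be equal to or close to the batch count, either value should be fine; the difference is at most one batch.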

Any feedback or questions are welcome. I’ll try and write up a more proper article soon.


Good job as always!
I’m wondering if the whole DeepMemory or parts of it could be used within Look Ahead?
(Something like memorizing deep ranger? ;-p)

Also, if you have any datasets or notebooks that you would like volunteers to help test, let me know.

I don’t have any credits lying around, but I have a 2080 Ti build that I can run notebooks on in my spare time.


Great job!
I tried it on Imagewoof.
At size 128 it gives the same result as Ranger, at both 20 and 80 epochs.
With Ranger I can use a higher lr, though.
I will check at size 256 next.


I'm trying to run DeepMemory and am getting the following error:
AttributeError Traceback (most recent call last)
----> 1 learn.fit_flat_cos(5, 1e-2)

~/anaconda3/envs/fastten/lib/python3.7/site-packages/fastai2/callback/ in fit_flat_cos(self, n_epoch, lr, div_final, pct_start, wd, cbs, reset_opt)
    130     "Fit `self.model` for `n_epoch` at flat `lr` before a cosine annealing."
    131     if self.opt is None: self.create_opt()
--> 132     self.opt.set_hyper('lr', self.lr if lr is None else lr)
    133     lr = np.array([h['lr'] for h in self.opt.hypers])
    134     scheds = {'lr': combined_cos(pct_start, lr, lr, lr/div_final)}

AttributeError: 'DeepMemory' object has no attribute 'set_hyper'

I'm using fastai2 version 0.0.16. Has anyone else come across this error? I would appreciate any help with it. Cheers