Time series/ sequential data study group

One question from a complete noob in this field. Is it normal to use a row-wise representation of the time series when stored as data frames? I see that it is the format used in both the repositories of @oguiza and @tcapelle, but as far as I know, data frames are optimized to work column-wise.

Best!

Good question!
I’ve seen multiple variations, so you usually need to manipulate the df to get the expected input format. To use the functionality I’ve shared, you need samples in rows, a column indicating the feature (in the case of multivariate ts; None for univariate), and time steps in columns. If we come across other use cases, I may need to update the code.
As to the optimization, it’s difficult to know which layout would be better: sometimes you have more samples than time steps, and sometimes vice versa.
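
For illustration, here’s a minimal pandas sketch of reshaping a long-format df into that layout (the column names are hypothetical, not the ones required by the code):

```python
import pandas as pd

# Hypothetical long-format data: one row per (sample, feature, time step)
long_df = pd.DataFrame({
    "sample":  [0, 0, 0, 0, 1, 1, 1, 1],
    "feature": ["f1", "f1", "f2", "f2", "f1", "f1", "f2", "f2"],
    "time":    [0, 1, 0, 1, 0, 1, 0, 1],
    "value":   [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
})

# Pivot so each row is one (sample, feature) pair and time steps become columns
wide_df = (long_df
           .pivot_table(index=["sample", "feature"], columns="time", values="value")
           .reset_index()
           .sort_values("sample"))   # keep rows of the same sample together
print(wide_df)
```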

Thank you for your answer! Regarding your implementation:

  • Must the “feature” and “target” columns also be numerical, or can they contain the actual names of the features/targets?
  • How are the different features of a single subject related to each other in the final data frame? Is it implicit in the order of the rows?

More great questions Victor! :grinning:

  • I think you can use anything you want as features or target. What is important is to indicate whether the target should be handled as a category or a float. If you try it and it doesn’t work as expected, please let me know.

  • Yes, it is implicit in the order of the rows. I will add this to the notebook to make it clear. You need to sort the data frame rows by sample; otherwise data from different samples will be mixed. Thanks for raising this!

So I had a look at the mixup data augmentation technique. I believe it is a special case of a weighted data augmentation technique that we proposed previously but hadn’t had much success with. Maybe you guys can make it work.
Basically, the method computes the weighted average of a set of time series and considers this weighted average as a new time series (to augment the training set).
The average is computed in the DTW space instead of the Euclidean one.
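
For illustration, here is a rough sketch of that idea using tslearn’s dtw_barycenter_averaging (just one possible way to compute a weighted average in DTW space; not the exact code from the papers):

```python
import numpy as np
from tslearn.barycenters import dtw_barycenter_averaging

# Hypothetical set of univariate series from the same class: shape (n_ts, seq_len)
X = np.random.randn(5, 100)

# Random weights summing to 1; the synthetic sample is their weighted average
w = np.random.dirichlet(np.ones(len(X)))

# New (augmented) series, averaged in DTW space rather than Euclidean space
new_ts = dtw_barycenter_averaging(X, weights=w)
```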

Here are the relevant papers, this is the original method and this one shows its use with ResNet.

What do you think, is it similar to mixup?

1 Like

I think that @oguiza can explain it better, but the current implementation is pretty straightforward.
At the batch level, it mixes the last input with the current one:

new_input = last_input * lambd + current_input * (1 - lambd)
new_target = last_target * lambd + current_target * (1 - lambd)

where lambd follows a Beta distribution. @sgugger explains it in detail here.
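
For illustration, a generic PyTorch sketch of this kind of batch-level mixing (here mixing a batch with a shuffled copy of itself rather than with the previous batch, and assuming float/one-hot targets; not the fastai callback itself):

```python
import torch

def mixup_batch(x, y, alpha=0.4):
    """Mix a batch of time series (bs, n_features, seq_len) with a shuffled copy of itself."""
    lambd = torch.distributions.Beta(alpha, alpha).sample()
    lambd = torch.max(lambd, 1 - lambd)      # keep the original sample dominant (50-100%)
    perm = torch.randperm(x.size(0))
    new_input = lambd * x + (1 - lambd) * x[perm]
    new_target = lambd * y + (1 - lambd) * y[perm]
    return new_input, new_target
```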

1 Like

Thanks for looking at this.

I’ve read both articles (I think I’ve read all your papers, and many more… :grinning:), created some code and experimented with it, but didn’t get good results.

I think there are some similarities, but also differences.

Similarities:

  • Both are data augmentation techniques
  • New samples are created by combining original samples in the dataset

Differences:

  • Mixup combines the current ts with another randomly selected ts.
  • This other ts can be of any class.
  • The % in which they are mixed is randomly selected, between 0-50%.
  • The newly created sample will then have a % of the original ts (between 50-100%) and a % of the 2nd one (between 0-50%).
  • The loss is calculated as the weighted average of the losses of the newly created ts with each of the two labels.

What I’ve done is just adapt the original mixup, cutout and cutmix algos to time series, and in my experience they all work very well, as they do in image classification BTW.
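
For illustration, a minimal sketch of a 1D cutout adaptation (hypothetical function, not the exact code from the notebook):

```python
import torch

def cutout_1d(x, max_len=20):
    """Zero out a random contiguous window along the time axis of each sample.

    x: tensor of shape (bs, n_features, seq_len).
    """
    bs, _, seq_len = x.shape
    for i in range(bs):
        win = torch.randint(1, max_len + 1, (1,)).item()
        start = torch.randint(0, seq_len - win + 1, (1,)).item()
        x[i, :, start:start + win] = 0
    return x
```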

I’m working on a notebook to explain how you can apply these data augmentation techniques to 1D data. It’s very simple, and they almost always improve performance.

Please, let me know if you need any more details.

1 Like

Okay I will be waiting for your notebook then :wink: Thanks

Hi again!

Another noob question. How do these algorithms generally deal with missing values? Are there missing values in the UCR archive?

I have been playing this morning with a 1D implementation of Res2Net from here. But the more I play with UCR, the more I dislike this benchmark. It has so few training samples for some tasks that I am not sure we can create a model that performs well. Here is how I see it:

  • The more you train, the more your training loss decreases, but at some point you start getting worse results on the test set. For instance, for SmallKitchenAppliances, which has 375 samples, this happens after only 40 epochs:
    [Train_Val loss plot]
  • OliveOil is a completely different story, with only 30 miserable samples:
    [OliveOil loss plot]

I am using mixup all the time now to augment our little data. The results of the res2net50 for our benchmark tasks are (100 epochs, lr=1e-3, FlatCos anneal):
| Dataset                      |   res2net |
|:-----------------------------|----------:|
| Wine                         |  0.979167 |
| BeetleFly                    |  0.85     |
| InlineSkate                  |  0.429091 |
| MiddlePhalanxTW              |  0.61039  |
| OliveOil                     |  0.9      |
| SmallKitchenAppliances       |  0.792    |
| WordSynonyms                 |  0.587774 |
| MiddlePhalanxOutlineAgeGroup |  0.62987  |
| MoteStrain                   |  0.829872 |
| Phoneme                      |  0.241561 |
| Herring                      |  0.671875 |
| ScreenType                   |  0.613333 |
| ChlorineConcentration        |  0.853125 |
1 Like

Indeed, it is a very hard problem; it is not straightforward to obtain labeled data for time series classification. That’s why deep learning has only just started being explored for this task.

We need an ImageNet; this UCR dataset is bad and small.

1 Like

I would use the words “hard” and “challenging” instead of “bad” :wink:
The people who gathered and prepared the archive put in a huge effort, which led to the advancement of time series classification algorithms.

1 Like

I agree that it’s very frustrating sometimes! :grinning:
But I have to say that it is no different from other real life datasets.
The one I use only has around 1000 samples, and I can tell you it’s equally frustrating!!
The good thing is that when you deal with very challenging datasets, you end up trying so many things that you learn a lot.
And there are really small datasets (like OliveOil) where you can get very high accuracy (96%+ with some models I’ve used) with only 30 samples, leveraging image encoding and transfer learning.
I’m still convinced we can beat HIVE-COTE and TS-CHIEF using this dataset!

1 Like

I am implementing exactly your metrics @oguiza to be able to compare accurately.

1 Like

I’ve also been testing Res2Net 1D, and similar large vision-like models in the past.

One of the key learnings I got is that those models tend to overfit a lot and don’t work so well.

After some thought I think I understand what the issue might be.
If you think about it, a 224x224x3 image has around 150k values. ResNet50 has around 25M parameters.

We are dealing with TS that tend to be small in size. It’s pretty usual to have TS data with fewer than 1k datapoints in total (nb feats x seq_len), so about 1% of a normal image.

This is the nb parameters for some of the 1D models we usually use with TS:

  • FCN 276k
  • InceptionTime 389k
  • ResNet 484k

But the versions I built of Res2Net are pretty big in comparison:

  • Res2Net34 5.8M
  • XRes2Net34 8.1M
  • Res2Net50 19.9M

So these may work well when you have lots of data, which I haven’t been able to try because I don’t have any.
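
For reference, a generic way to get the number of trainable parameters of any PyTorch model (a small sketch; the model name below is hypothetical):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a PyTorch model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# count_parameters(res2net50_1d)  # hypothetical model instance
```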

What do you think?

Ok.
I’m a bit confused when you say you are applying mixup. I thought you said yesterday that probably fastai’s mixup didn’t work. What are you using then?

I am using fastai Mixup :relaxed: It works, but I’m not yet sure if it performs well enough.
Here are the results for Res2Net50:

| Dataset                      |   epochs |        loss |   val_loss |   accuracy |   accuracy_ts |   max_accuracy |   time (s) |
|:-----------------------------|---------:|------------:|-----------:|-----------:|--------------:|---------------:|-----------:|
| Wine                         |      100 | 0.0472185   |   0.836495 |   0.75     |      0.75     |       0.895833 |         52 |
| BeetleFly                    |      100 | 0.000209084 |   1.11402  |   0.8      |      0.85     |       0.9      |         61 |
| InlineSkate                  |      100 | 0.0516765   |   2.72746  |   0.42     |      0.42     |       0.423636 |         98 |
| MiddlePhalanxTW              |      100 | 0.000941262 |   2.88568  |   0.532468 |      0.551948 |       0.571429 |         97 |
| OliveOil                     |      100 | 0.292994    |   0.38972  |   0.9      |      0.9      |       0.9      |         68 |
| SmallKitchenAppliances       |      100 | 0.00445449  |   1.60318  |   0.741333 |      0.741333 |       0.778667 |        105 |
| WordSynonyms                 |      100 | 0.00383701  |   3.29962  |   0.534483 |      0.532915 |       0.543887 |         84 |
| MiddlePhalanxOutlineAgeGroup |      100 | 0.00181109  |   3.13823  |   0.493506 |      0.493506 |       0.623377 |         92 |
| MoteStrain                   |      100 | 0.00239903  |   1.1964   |   0.744409 |      0.744409 |       0.794728 |        694 |
| Phoneme                      |      100 | 0.00118895  |   5.0102   |   0.224156 |      0.224156 |       0.244198 |        145 |
| Herring                      |      100 | 0.00332776  |   2.34432  |   0.546875 |      0.546875 |       0.65625  |         66 |
| ScreenType                   |      100 | 0.000509472 |   2.89669  |   0.482667 |      0.482667 |       0.581333 |         96 |
| ChlorineConcentration        |      100 | 0.00138494  |   0.806829 |   0.851823 |      0.851823 |       0.854427 |        196 |

How can I compute the number of params?

1 Like

No, there are no missing values. I guess you could delete some randomly if you wanted to try it.

I’ve never dealt with missing values, to be honest with you. I don’t know if the models would even work with them.
Usually you would replace those missing values with a constant, an average, a median, etc.
Sorry I can’t help more.

I’d think you’d want an external preprocessing step on your data frame to handle this. There are a few different methods, but I usually impute with the average, and here I’d compute it over the particular series instance (row). That’s how I’d go about missing values in this case :slight_smile:
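
For illustration, a small pandas sketch of that kind of row-wise (per-series) mean imputation, assuming one time series per row with time steps as columns:

```python
import numpy as np
import pandas as pd

# Hypothetical df: one time series per row, time steps as columns
df = pd.DataFrame(np.random.randn(3, 10))
df.iloc[0, 4] = np.nan                      # simulate a missing value

# Replace each missing value with the mean of its own series (row)
df_filled = df.apply(lambda row: row.fillna(row.mean()), axis=1)
```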

1 Like