Real time learning machine - decremental learning, attribute addition/removal

So I was reading a whitepaper from a company (not a research paper, marketing material). There were some things they claim to be doing, and I was reasoning about how those could be added to a data pipeline.

Incremental learning can happen if, say, they are using neural nets and just doing transfer learning. Is there anything else that marketing people could call incremental learning?
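
One other thing that plausibly gets marketed as "incremental learning": any counting-based model, where a new example just bumps counters and never requires revisiting old data. A minimal sketch (a tiny multinomial Naive Bayes, pure Python, all names hypothetical):

```python
import math
from collections import defaultdict

class IncrementalNB:
    """A counting-based classifier is naturally incremental:
    each new example only increments counters."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def learn_one(self, features, label):
        # O(len(features)) update, no retraining over past data
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1

    def predict(self, features):
        total = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label, c in self.class_counts.items():
            lp = math.log(c / total)
            denom = sum(self.feature_counts[label].values()) + 1
            for f in features:
                # add-one smoothing so unseen features don't zero out
                lp += math.log((self.feature_counts[label][f] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

nb = IncrementalNB()
nb.learn_one(["cheap", "click"], "spam")
nb.learn_one(["meeting", "agenda"], "ham")
```

The same counter-based structure also supports *decrementing* (subtract the counts of a removed example), which may be relevant to the "decremental" claim below.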

The decremental learning part is a bit confusing to me. I can see that we could do cross-validation or something similar to train multiple models and choose the best one. But saying that they "remove the observations identified as adversely affecting model performance" sounds like taking out individual data rows. Either that, or they are just taking out sparse/duplicate data. Or does that sound like marketing rather than technical talk?
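
One plausible (crude) reading of "removing observations that adversely affect performance": score each training row, drop the worst offenders, and retrain. The sketch below uses per-row loss under the current model as the score; real systems might use influence functions or leave-one-out estimates instead. The model here is deliberately trivial (fitting a mean), purely to illustrate the loop:

```python
def fit_mean(values):
    """Stand-in for 'train a model': here the model is just the mean."""
    return sum(values) / len(values)

def decremental_pass(values, drop_frac=0.2):
    model = fit_mean(values)                      # initial fit
    losses = [(v - model) ** 2 for v in values]   # per-row loss as the score
    cutoff = sorted(losses)[int(len(values) * (1 - drop_frac))]
    kept = [v for v, loss in zip(values, losses) if loss < cutoff]
    return fit_mean(kept)                         # retrain without flagged rows

data = [1.0, 1.1, 0.9, 1.0, 50.0]  # one corrupted row
cleaned_model = decremental_pass(data)
```

This still retrains from scratch after the drop; a true "decremental" update (unlearning a row without a full retrain) only works cheaply for models with additive sufficient statistics, e.g. counts or running sums.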

Attribute addition/deletion can be done on the fly if they were using something like the hashing trick to map all of the attributes into a fixed-length vector and then using that vector of numbers as the features for training the models. Does anything else seem plausible?
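
For concreteness, here is a minimal sketch of the hashing trick: arbitrary attribute names (including brand-new ones) are hashed into a fixed-size vector, so attributes can appear or disappear without changing the model's input dimension. Function names and the signed-hashing detail are illustrative, not from the whitepaper:

```python
import hashlib

def hash_features(attrs, dim=16):
    """Hash {attribute_name: value} pairs into a fixed-length vector."""
    vec = [0.0] * dim
    for name, value in attrs.items():
        h = int(hashlib.md5(name.encode()).hexdigest(), 16)
        idx = h % dim
        # signed hashing: collisions tend to cancel rather than accumulate
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0
        vec[idx] += sign * value
    return vec

# A never-before-seen attribute just hashes into an existing slot;
# the feature pipeline needs no schema change or retraining.
v = hash_features({"country=US": 1.0, "hour=23": 1.0, "brand_new_attr": 1.0})
```

scikit-learn's `FeatureHasher` and Vowpal Wabbit do essentially this, which is one reason VW is popular for exactly this kind of streaming ad-tech workload.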

The more I think about how they emphasize updating things in real time (<10 ms), the more confused I am. Models take time to train, on the order of seconds. Unless they do something like one epoch with a batch size of 1, I cannot think of a way to make it work. Can it work?

I have no idea what they are doing, but if you have a LOT of data to learn from (in this case it might be a continuous stream of data about ad impressions), then you can do "online" learning, which is indeed one training iteration on a single training example. This works because you constantly have new data coming in, but each data point is used only once.
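
In its simplest form, online learning is one SGD step per incoming example on a linear model, a few multiply-adds per update. A self-contained sketch (squared loss, hypothetical helper names):

```python
def online_step(w, b, x, y, lr=0.01):
    """One SGD step on a single (x, y) example; the model updates immediately."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b  # forward pass
    err = pred - y
    w = [wi - lr * err * xi for wi, xi in zip(w, x)]  # gradient update
    b = b - lr * err
    return w, b

w, b = [0.0, 0.0], 0.0
# simulated event stream: each example is consumed once, then discarded
stream = [([1.0, 2.0], 1.0), ([2.0, 1.0], 0.0)] * 500
for x, y in stream:
    w, b = online_step(w, b, x, y)
```

Since each update touches only one example, throughput scales with the event stream, which is the standard fit for billions of events per day.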

Yes, think billions of events in the stream per day. Do you have a reference for the "online" learning you are referring to? Does it happen on the order of milliseconds? Because the real-time bidding use case operates on the order of milliseconds.

How long it takes obviously depends on the model, but it's just one forward pass and one backward pass. Usually that is indeed on the order of milliseconds. (On a GPU, it might make sense to train on mini-batches.)
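
A rough back-of-envelope check: time a single forward + backward (one SGD step) on a 1000-dimensional linear model in pure Python. A real system would use a compiled framework and be far faster still; the point is only that a single-example update easily fits a millisecond-scale budget:

```python
import time

def sgd_step(w, x, y, lr=0.01):
    pred = sum(wi * xi for wi, xi in zip(w, x))          # forward pass
    err = pred - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]  # backward + update

w = [0.0] * 1000
x = [1.0] * 1000
start = time.perf_counter()
for _ in range(100):
    w = sgd_step(w, x, 1.0)
elapsed_ms = (time.perf_counter() - start) / 100 * 1000
print(f"~{elapsed_ms:.3f} ms per update")
```

Note this measures the model update only; in a real-time-bidding pipeline the <10 ms budget also has to cover feature extraction and network hops, which is usually where the latency actually goes.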

Here’s a reference: