Imagine a situation where you have multiple learning instances each operating independently on partly overlapping inputs. How do you coalesce their learnings?
Let me explain with a fun metaphor:
- we want to build a Klingon language model from scratch
- there are lots of books in the many public libraries across the Federation, but we cannot just physically gather them all on a common planet. That would be crazy!
- so we decide to install a computer at each public library, which will slowly process the text of the local books. Naturally, some books are present at multiple libraries, and some are unique.
- periodically, we can upload those models to our cloud server (is the Federation using AWS or GCP?)
How would you then merge those partial models into the ultimate Klingon language model?
Simply map the vocabularies and do an element-wise average of the weights? Normalize them first?
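One concrete version of "map the vocabularies and average" would be a FedAvg-style weighted average over whatever parameters the partial models share, weighting each library by how much text it actually processed. Everything below (the toy per-token vectors, the token counts, the `merge_models` helper) is made up purely to illustrate that idea, not a real implementation:

```python
# Hypothetical sketch: align vocabularies, then take a weighted
# element-wise average of per-token parameter vectors, where each
# library's weight is the number of tokens it processed (FedAvg-style).

def merge_models(partials):
    """partials: list of (params, n_tokens) pairs, where params maps
    token -> list of floats. For each token, returns the average of its
    vectors across the models that know it, weighted by n_tokens."""
    sums = {}    # token -> accumulated weighted vector
    totals = {}  # token -> total weight seen for that token
    for params, n_tokens in partials:
        for token, vec in params.items():
            if token not in sums:
                sums[token] = [0.0] * len(vec)
                totals[token] = 0
            for i, x in enumerate(vec):
                sums[token][i] += x * n_tokens
            totals[token] += n_tokens
    return {t: [x / totals[t] for x in vec] for t, vec in sums.items()}

# Two libraries with overlapping vocab; library A saw twice as much text,
# so its estimate of "Qapla'" counts double in the merge.
lib_a = ({"Qapla'": [1.0, 0.0], "tlhIngan": [0.5, 0.5]}, 200)
lib_b = ({"Qapla'": [0.0, 1.0]}, 100)
merged = merge_models([lib_a, lib_b])
# "Qapla'" blends to roughly [0.667, 0.333]; "tlhIngan" is only in A,
# so it passes through unchanged.
```

Note the catch this sketch glosses over: averaging only makes sense if the models' parameters live in comparable coordinate systems, which is exactly why the "normalize first?" (or align first) part of the question matters.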
ps: for the sake of discussion, let's pretend that the architecture is not important.