"Coalescing" multiple transfer learning instances (with a Star Trek metaphor)

Imagine a situation where you have multiple learning instances, each operating independently on partly overlapping inputs. How do you coalesce what they have learned?

Let me explain with a fun metaphor:

  • we want to build a Klingon language model from scratch
  • there are lots of books in the many public libraries across the Federation, but we cannot just physically gather them all on a common planet. That would be crazy! :slight_smile:
  • so we decide to install a computer at each public library, which will slowly process the text of the local books. Naturally, some books are present at multiple libraries, and some are unique.
  • periodically, we can upload those models to our cloud server (is the Federation using AWS or GCP? :smiley: ); one possible merge round is sketched just below
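
To make that upload-and-merge step concrete, here is a minimal sketch in the spirit of federated averaging. Everything in it is illustrative: the `merge_round` name, the weighting of each library by the number of books it has processed so far, and the assumption that all libraries expose identical parameter names and shapes are just choices I made for the sake of discussion.

```python
# Hypothetical sketch of one upload-and-merge round (federated-averaging style).
# Assumes every library model exposes the same parameter names and shapes.
import numpy as np

def merge_round(library_updates):
    """library_updates: list of (num_books_processed, params) pairs,
    where params maps parameter name -> np.ndarray, with identical
    shapes across libraries."""
    total_books = sum(n for n, _ in library_updates)
    merged = {}
    for name in library_updates[0][1]:
        # Weight each library by how much text it has processed so far.
        merged[name] = sum(
            (n / total_books) * params[name] for n, params in library_updates
        )
    return merged
```

On the cloud side, the merged parameters could then be pushed back to the libraries before the next round, which is what would distinguish this from a one-shot merge.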

How would you then merge those partial models into the ultimate Klingon language model?

Simply map the vocabularies onto each other and take an element-wise average of the activations? Normalize them first?
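
For that "map the vocabularies and average" idea, here is a hedged sketch, assuming each library uploads a (vocab, embedding matrix) pair. The union vocabulary, the per-token counting, and the optional L2 normalization are illustrative choices on my part, not the only way to do it.

```python
# Illustrative only: merges per-library token embeddings by taking the
# vocabulary union and averaging element-wise, with optional normalization.
import numpy as np

def merge_embeddings(partial_models, normalize=True):
    """partial_models: list of (vocab, matrix) pairs, where vocab is a list
    of tokens and matrix has one embedding row per token."""
    # Union vocabulary across all libraries.
    merged_vocab = sorted({tok for vocab, _ in partial_models for tok in vocab})
    index = {tok: i for i, tok in enumerate(merged_vocab)}

    dim = partial_models[0][1].shape[1]
    sums = np.zeros((len(merged_vocab), dim))
    counts = np.zeros(len(merged_vocab))

    for vocab, matrix in partial_models:
        rows = matrix
        if normalize:
            # L2-normalize rows so libraries trained on different amounts
            # of text contribute on a comparable scale.
            rows = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12)
        for r, tok in enumerate(vocab):
            sums[index[tok]] += rows[r]
            counts[index[tok]] += 1

    # Element-wise average over however many libraries actually saw each token.
    return merged_vocab, sums / counts[:, None]
```

Whether averaging in parameter space actually produces a good joint model is exactly the open question here; note that for tokens seen by only one library, the "average" is just that library's vector.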

ps: for the sake of discussion, let’s pretend that the architecture is not important.