Great idea, and thanks for organizing. Count me in. Can you share the sentencepiece implementation? I have access to relatively powerful infrastructure, so I can help with the experiments. Without a time constraint, we can probably gather a relatively large German Twitter corpus for the LM.