I’ve started the language2motion project with the goal of creating a multi-modal implementation of the Transformer architecture in Swift. It’s a learning exercise for me and an attempt to answer the question of whether Swift for TensorFlow is ready for non-trivial work.
The use case is based on the paper "Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks" by Matthias Plappert et al. The authors also created a nice dataset of a few thousand motions, “The KIT Motion-Language Dataset”.
Feel free to check it out and contribute.