Language2motion project in Swift

Hi all,

I’ve started the language2motion project with the goal of creating a multi-modal implementation of the Transformer architecture in Swift. It’s a learning exercise for me and an attempt to answer the question of whether Swift for TensorFlow is ready for non-trivial work.

The use case is based on the paper "Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks" by Matthias Plappert. He created a nice dataset of a few thousand motions, “The KIT Motion-Language Dataset”.

Feel free to check it out and contribute.


As you encounter issues, please do reach out on the mailing list or here. Excited to see your progress!


Hi Wojtek

Just followed your project on GitHub. Nice approach to image motion visualization and description :slight_smile:
I also downloaded the label descriptions and forked your GitHub repository.
The labels were created by multiple people, so, as you are probably aware, there are multiple descriptions of the same activity.

['a', 'person', 'is', 'walking', 'forwards']
['a', 'person', 'walks', '4', 'steps', 'forward']
['a', 'human', 'walking']

For me, those labels have very similar meanings.
The word statistics also show that 1775 different words were used in the label vocabulary.

If we look at the word counts:

[('a', 7235),
('person', 4262),
('the', 2259),
('walks', 2248),
('and', 1524),
('forward', 1390),
('to', 1338),
('is', 1257),
('human', 1140),
('right', 1098),
('left', 1036),
('steps', 991),
('with', 876),
('walking', 868),

Only a handful of these words are useful for describing motion.
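For anyone who wants to reproduce this kind of word statistic, here is a minimal sketch using only the Python standard library. It runs on the three example labels quoted earlier, not on the full dataset, and the names `labels`, `count_words`, `word_counts`, `vocabulary_size` and `top_words` are my own for illustration; the numbers in the post were not necessarily computed this way.

```python
from collections import Counter

# The three example labels quoted in this post, already tokenized.
labels = [
    ["a", "person", "is", "walking", "forwards"],
    ["a", "person", "walks", "4", "steps", "forward"],
    ["a", "human", "walking"],
]


def count_words(tokenized_labels):
    """Count how often each token occurs across all labels."""
    counts = Counter()
    for tokens in tokenized_labels:
        counts.update(tokens)
    return counts


word_counts = count_words(labels)

# Number of distinct tokens, i.e. the vocabulary size of this tiny sample.
vocabulary_size = len(word_counts)

# (token, count) pairs, highest count first.
top_words = word_counts.most_common(3)
print(vocabulary_size, top_words)
```

On the full label set, `word_counts.most_common()` would give a list of `(word, count)` pairs in the same shape as the one shown in the post.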

NLP is completely new to me. I hope I can help in Python, and only in small bits in Swift (also a whole new area for me) :slight_smile:




Just finished working on the labels: I simplified and unified them.

Notebook with the work in progress.
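To give a rough idea of what simplifying and unifying labels can look like, here is a small sketch in plain Python. It is not the code from the notebook: the `CANONICAL` mapping, the `STOP_WORDS` set and the `simplify` function are hypothetical and hand-written for the three example labels only.

```python
# Hypothetical, hand-written mappings for illustration only.
# Maps word variants to one canonical form.
CANONICAL = {
    "walks": "walk",
    "walking": "walk",
    "forwards": "forward",
    "human": "person",
}

# Frequent words that carry little information about the motion itself.
STOP_WORDS = {"a", "the", "is", "and", "to", "with"}


def simplify(tokens):
    """Drop stop words and digits, and map the remaining tokens to canonical forms."""
    simplified = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS or token.isdigit():
            continue
        simplified.append(CANONICAL.get(token, token))
    return simplified


examples = [
    ["a", "person", "is", "walking", "forwards"],
    ["a", "person", "walks", "4", "steps", "forward"],
    ["a", "human", "walking"],
]
unified = [simplify(tokens) for tokens in examples]
print(unified)
```

With these assumed mappings, all three example labels reduce to variations of "person walk", which makes it easier to group descriptions of the same activity.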

Cheers :slight_smile:

Hi Michał,

Nice NLTK work :slight_smile: I suppose you could select a few combinations of tokens, output a set of labels, and plug them into one of the models we have: BERT-language2label, ResNet-img2label or ResNet-motion2label. I’m curious how your labels will perform.


I’ve uploaded 2 new processed datasets for your X10 1-channel ResNet performance tests:


Superb :slight_smile: I will post updates on the move to X10 and on performance :slight_smile: