Share your work here ✅ (Part 2 2022)

I did a quick test with my text2img notebook (link).

And got this result with slightly tweaked prompts.

prompt: ‘Close-up photography of the face of a 21 years old girl, by Alyssa Monks, by Joseph Lorusso, by Lilia Alvarado, beautiful lighting, sharp focus, 8k, high res, pores, sweaty, Masterpiece, Nikon Z9, Award - winning photograph’

negative prompt: ‘lowres, signs, memes, labels, text, food, text, error, mutant, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, made by children, caricature, ugly, boring, sketch, lacklustre, repetitive, cropped, (long neck), facebook, youtube, body horror, out of frame, mutilated, tiled, frame, border, porcelain skin, doll like, doll’

Just FYI, it looks like the original prompt from the subreddit uses some tricks, like prompt weighting, that aren't available in the default diffusers pipeline. You can either use the AUTOMATIC1111 GUI or one of the custom diffusers pipelines from the diffusers GitHub repository (link).
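For example, one of those community pipelines, lpw_stable_diffusion, accepts Automatic1111-style prompt weights. A minimal sketch (the model id and the (word:1.3) weights here are just illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# load SD with the "lpw_stable_diffusion" community pipeline, which accepts
# Automatic1111-style weights such as (sharp focus:1.3) inside the prompt
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="close-up photography of the face of a girl, (sharp focus:1.3), 8k",
    negative_prompt="lowres, blurry, watermark",
    num_inference_steps=50,
).images[0]
```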

1 Like

Regarding dreambooth training, I noticed in my own experiments that you need to be really careful about overfitting.

Any tips to avoid overfitting?

I would first see where training for fewer steps gets you. I have not made a personal dreambooth training notebook for my notes repository yet and still need to settle on generally good training parameters. I can let you know when I do though.

Are you using the default parameters from the diffusers training script?

I used the instructions here and they seemed to work well for me. I haven't tested them thoroughly, though.
tldr:

OPTION 1: They're not looking like you at all! (Train longer, or get better training images.)
OPTION 2: They're looking like you, but they all look like your training images. (Train for fewer steps, get better training images, or fix it with prompting.)
OPTION 3: They're looking like you, but not when you try different styles. (Train longer, or get better training images.)
1 Like

I am still baffled by these results. I have posted an example where very similar phrases in Portuguese yield very different results. There are no Portuguese words in the tokenizer vocabulary, yet the model understands some words and not others.

In the linked example, I used the word “montando”, which means “riding”, and the model didn't understand it. But it does understand the less precise “andando a cavalo”, which also means riding a horse but would be literally translated as “walking by horse”. The model also understands that “astronauta” is “astronaut” and “cavalo” is “horse”.
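One quick way to investigate this is to look at how the CLIP tokenizer used by Stable Diffusion v1 (the openai/clip-vit-large-patch14 tokenizer) splits these words; a small sketch:

```python
from transformers import CLIPTokenizer

# inspect how SD v1's text encoder tokenizes the Portuguese words in question
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

for word in ["montando", "andando", "cavalo", "astronauta", "riding", "horse"]:
    print(word, "->", tokenizer.tokenize(word))
# words that aren't in the vocabulary as a whole get split into several
# sub-word pieces, which may be part of why some words are "understood"
# and others aren't
```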

1 Like

Cool, thanks! I actually went through this guide; I'm going to read through the docs on the main DreamBooth repo to see if I can optimize it.

1 Like

I tried playing with interpolating seeds by interpolating the initial latent noises that seeds generate.

Linearly interpolating them doesn't produce good results. Averaging two sets of normally distributed random numbers doesn't give you a similarly distributed random sample: the average of two independent unit-variance samples has standard deviation 1/√2, not 1, and the more samples you average, the closer you get to the mean. Interpolating between the two torch.randn-generated matrices therefore makes for a latent that's closer to the mean (0), and the diffusion model doesn't understand that kind of noise, giving you uninteresting results. Here's what it looks like.

You can see that at 4.5s, in the middle, there’s hardly any image at all.
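A quick numerical check of the effect (a small sketch, not from the notebook):

```python
import torch

# averaging two independent unit-normal latents shrinks the spread:
# Var(0.5*a + 0.5*b) = 0.25 + 0.25 = 0.5, so the std drops to ~0.71
a = torch.randn(4, 64, 64)
b = torch.randn(4, 64, 64)
mid = 0.5 * a + 0.5 * b
print(a.std().item(), mid.std().item())  # ~1.00 vs ~0.71
```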

So in lieu of interpolating the noise, I used a third noise to act as a mask. Instead of interpolating from one seed’s noise to the second’s, I thresholded the mask noise from zero to one. (I ended up doing a kind of soft-threshold, below.)

The mask looks like this, then, as we go from t_seed1 to t_seed2:

(There is a bit of interpolation in the “soft” part of the threshold.)

Here’s what it looks like using the above mask transition from one seed’s initial noise to the next.
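For concreteness, this is one way such a soft-thresholded transition could be written (my guess at the approach; the names and the exact shape of the soft step are assumptions, not the notebook's actual code):

```python
import torch

def masked_transition(noise1, noise2, mask_noise, t, softness=0.2):
    # as t goes 0 -> 1, an increasing fraction of latent positions switches
    # from seed1's noise to seed2's; positions near the moving threshold
    # get a partial blend (the "soft" part of the threshold)
    thresh = torch.quantile(mask_noise.flatten(), 1.0 - t)
    mask = ((mask_noise - thresh) / softness).clamp(0.0, 1.0)
    return (1.0 - mask) * noise1 + mask * noise2
```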

I wish there were some way to geometrically interpolate high-dimensional vectors. Can one slerp between 4096-dimensional vectors?
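(For reference, slerp itself works on vectors of any dimension. A rough sketch of the standard formula applied to flattened latents, which also keeps the result at roughly the same scale as the endpoints:)

```python
import torch

def slerp(t, v0, v1):
    # spherical interpolation between two latents treated as flat vectors;
    # the result stays at roughly the same norm as the endpoints, so the
    # model still sees "noise-like" input at intermediate t
    v0f, v1f = v0.flatten(), v1.flatten()
    dot = torch.dot(v0f / v0f.norm(), v1f / v1f.norm()).clamp(-1.0, 1.0)
    theta = torch.acos(dot)
    return (torch.sin((1.0 - t) * theta) * v0 + torch.sin(t * theta) * v1) / torch.sin(theta)
```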

edit: another improvement: blurring the mask quite significantly (gaussian blur with sigma (3,5))

Another idea: use coherent noise instead of randn, but still generate noise with the characteristics of randn. Then you might be able to evolve the noise smoothly…?

12 Likes

Added Part 4 - Aayush Agrawal - Stable diffusion using :hugs: Hugging Face - Variations of Stable Diffusion (aayushmnit.com)

3 Likes

This is interesting! Do you have a notebook you’d be willing to share to show how you explored this? A friend was interested in understanding how to think about the seed in terms of image output and I think he’d enjoy this.

Sure, Michael. I cleaned things up a bit and then as usual got distracted adding more and more ideas…so LMK if you find anything that’s terribly broken.

There's a Colab form, so it should be pretty straightforward to experiment with even if folks aren't keen on getting deep into the code.

(Also somehow it’s 2022 and this is my first public Github repo! :tada:)

6 Likes

Thank you!! I’m going to share this with my friend. He’ll be psyched. I’ll report back with any learnings!

1 Like

Hi everyone - please take a look at this interesting behaviour of repeatedly applying diffusion:

Stationary and Stable points in the diffusion process

Idea:
If we take an image and apply diffusion to it, we can use the new image as a starting point for another diffusion application.
If we repeat this many times, we end up with a series of related images.
What do these look like?

Method:

  1. Take any image as a starting point, encode it into the VAE latent space to get lat_0
  2. Add noise to lat_0, apply diffusion with a prompt to get latent lat_1
  3. Repeat step 2) with lat_1 to get lat_2
  4. Continue applying diffusion in a loop to get a sequence of latents [lat_0, …, lat_n]
  5. Convert these latents back to images (a rough code sketch of this loop follows)
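Here is roughly how that loop could look with the standard diffusers img2img pipeline (an assumed setup for illustration; the actual notebook linked at the end of the post may differ):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# rough sketch of the repeated-diffusion loop
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

image = Image.open("start_image.png").convert("RGB").resize((512, 512))  # hypothetical starting image
frames = [image]

for i in range(60):
    # case 1 below: re-seed with the same value every loop so the added noise is identical;
    # case 2 below: seed differently per loop, e.g. manual_seed(i)
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(
        prompt="an astronaut on a horse, photo",
        image=image,
        strength=1.0,  # fully re-noise each time, roughly "diffusion start step: 0"
        generator=generator,
    ).images[0]
    frames.append(image)
```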

Case 1:

Apply the same noise every time in step 2) - i.e. use a fixed random seed in all diffusion processes.
In this case, the images converge very quickly to a single image!

[starting image: “mount fuji in spring”, diffusion prompt: “an astronaut on a horse, photo”, diffusion start step: 0, 60 diffusion loops]

This means that we have found a “fixed point” in the diffusion process: adding noise to this image and diffusing it outputs the same image again.
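In other words (my notation): writing one noise-and-denoise iteration as a map f on latents, the loop appears to converge to an approximate fixed point lat* of f:

```latex
f(\mathrm{lat}) = \mathrm{denoise}\big(\mathrm{lat} + \epsilon_{\mathrm{seed}},\ \mathrm{prompt}\big),
\qquad
f(\mathrm{lat}^{*}) \approx \mathrm{lat}^{*}
```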

Here is what the latents look like after PCA reduction. The points seem to converge in latent space as well:

Question: does anyone have an intuition about why the images converge? I have yet to find a starting point/prompt that doesn't converge.

Case 2:

Apply different noise in step 2) - i.e. change the random seed for every loop of the diffusion step.

Because the noise is fixed in case 1), perhaps it is not so surprising that a fixed point can be found.
Actually, if you apply a different noise to the fixed-point image found in case 1) and then diffuse, the output is a completely different image. So the fixed point is specific to the noise used.

But are there fixed points if a different noise is applied on every diffusion run? It looks like there might be (try playing these videos at the same time):

[starting image: “london bus”, diffusion prompt: “cat”, diffusion start step: 10, 200 diffusion loops]

[starting image: “mount fuji in spring”, diffusion prompt: “cat”, diffusion start step: 10, 200 diffusion loops]

These two diffusion loops start from different images (London bus vs Mount Fuji) and use different seeds throughout the diffusion process.
However, they both end up stabilising at very similar images!


[Step 200 images for two different diffusion loops]

They also both reach this stable point via a long period of black and white photos of a cat:


[Step 50 images for two different diffusion loops]

This pattern seems to be repeated for many different starting points/seeds for the diffusion process: initial image → black and white cat → flat block colour cat


[ Middle right cluster: initial images, Bottom left cluster: photo of black and white cat, Top left cluster: flat block colour cat. Transitions between them in latent space]

Here are some stable points for other prompts:

Dog:

Car:

Astronaut on a horse:

Future ideas:

There are lots of different variables to experiment with here. In particular, it would be interesting to vary the start step of the diffusion process and to try more diverse prompts.
I wonder if these fixed points could also give some insight into the structure of the diffusion process somehow.
See the notebook for more details.

Thanks @johnrobinsn for your tree diffusion notebook which was a help in this.

16 Likes

Hi,

I trained a textual inversion concept in the style of classic smurf cartoons.

Some nice results:

  1. Prompt: “san francisco in the style of < smurfy>”

  2. Prompt: “new york city in the style of < smurfy>”

  3. Prompt: “Paris in the style of < smurfy>”

It's interesting that when I used a specific place or person in the prompt, I tended to get results that looked like photos and lost the smurfy cartoon style.

For instance, “Paris opera house in the style of < smurfy>” resulted in the following unsmurfy images:

Similar photo-like results, with some blue elements, came from “Sydney opera house in the style of < smurfy>”.

The model is here if anyone would like to play with it.
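If anyone wants to try it from diffusers, here's a rough sketch (the concept repo id below is a placeholder for the linked model, and load_textual_inversion needs a recent diffusers version):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# load the learned <smurfy> embedding into the tokenizer/text encoder
# ("sd-concepts-library/smurfy" is a hypothetical repo id; use the real one from the link above)
pipe.load_textual_inversion("sd-concepts-library/smurfy")

image = pipe("san francisco in the style of <smurfy>").images[0]
image.save("smurfy_sf.png")
```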

11 Likes

Hi @charlie - these look like nice interpolation results.

I wonder if this could also help for interpolation:

In the videos, you can see that there is quite good frame-to-frame consistency (though over time there is a lot of drift).

Some problems that need to be solved to get it working nicely for interpolation:

  1. The first few frames in the videos “jump around” a lot until they settle. Maybe your noise interpolation could help here?
  2. The process is not guided at the moment, so you end up at a stable point which looks quite different from the original image.

One potential solution for 2) is to run the diffusion loop on the interpolation start image and the final image separately. They will (hopefully) converge to a similar-looking stable point (e.g. the black cat). If you then play the frames of the first diffusion loop followed by the frames of the second in reverse, you get a path in latent space between the two images (via the stable point).

I trained textual inversion on ~50 images of my dog. I trained on a 3070 with 8 GB of VRAM, so it took a bit of effort to get it running without memory errors:

10 Likes

I dusted off and revisited my DiffEdit implementation. After another pass I realized a few things that I think I got wrong originally; fixing them greatly improved my results, so I'm sharing…

  1. From reading the paper, I think I got hung up on the phrase “taking the difference of the noise estimates”, which I originally read as simple subtraction (with variants like subtracting and then taking the absolute value, etc.). While that does work to some degree, I found that treating each 4-channel “per pixel” latent value as a vector and taking the euclidean distance between the two noise estimates gave much better results. It is also a much better way to aggregate the channel information and retains much more of it; earlier I was trying things like taking the mean or the max across the channels. (A rough sketch follows this list.)

  2. In my earlier attempts I had also pulled the ripcord too early and was trying to do a lot of my noise-difference math in “image” (512x512) space. This is a big mistake, because there every pixel only represents color data, whereas if you do the mask differencing in latent space, each “4-channel pixel” represents not only color data but also spatial data (an 8x8 patch).
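The latent-space distance from point 1 could look roughly like this (tensor and function names are mine for illustration, not the code from the blog post):

```python
import torch

def diffedit_mask(noise_ref, noise_query, threshold=0.5):
    # noise_ref, noise_query: (4, 64, 64) noise estimates for the two prompts
    # treat each latent "pixel" as a 4-dim vector and take the euclidean
    # distance between the two noise estimates at every position
    diff = torch.linalg.norm(noise_ref - noise_query, dim=0)       # (64, 64)
    diff = (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)  # normalise to [0, 1]
    return (diff > threshold).float()                              # binary mask, still in latent space
```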

DiffEdit overall is still a little finicky, so there's still a lot of room for improvement. But I wrote up a blog article and tweeted it out here, if you'd like to give me a like :slight_smile:

17 Likes

Interesting, will check out your implementation for this. I really like the example oranges image.

1 Like

You need to adjust the standard deviations of the interpolated latents so that the result has the correct distribution (i.e. sigma should be scheduler.init_noise_sigma). I managed to get something working in this notebook, in case you want to check it out.
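Roughly, the adjustment looks like this (a sketch with names I've made up; see the notebook for the actual implementation):

```python
import torch

def interpolate_latents(lat_a, lat_b, t, init_noise_sigma):
    # linearly interpolate, then rescale so the result has the standard
    # deviation the scheduler expects (scheduler.init_noise_sigma)
    mixed = (1.0 - t) * lat_a + t * lat_b
    return mixed * (init_noise_sigma / mixed.std())
```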

Here are some results:

Thanks for sharing the idea! I had also encountered the problem when playing after lesson 1, and only thought of the solution when reading your post here :slight_smile: (collaborative thinking)

edit: I was thinking… it would be cool to have some other curve to interpolate between the two seeds, one that enforces some property, e.g. making the “jumps” between frames less abrupt. I'm not even sure what this would mean… but there must be more interesting curves through latent space than a straight line. How to find them, though… any ideas?

10 Likes

You need to adjust the standard deviations of the interpolated latents so that the result has the correct distribution (i.e sigma should be scheduler.init_noise_sigma )

Hah, now that you say it, it makes perfect sense. Great insight! If the distribution isn't what you need… make it into what you need!

2 Likes