Vocabulary Complexity?

I have started experimenting with a new Kaggle project and am exploring the idea of ‘vocabulary complexity’. To begin with, I want to look at features such as word length, sentence length, and syllable distribution. I found the Pattern library, and it looks very promising for this.
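For concreteness, here is a minimal sketch of those three features using only the Python standard library rather than Pattern. The function names (`count_syllables`, `complexity_features`) are placeholders made up for illustration, and the syllable counter is a crude vowel-group heuristic that will miscount some words (for example, it gives "wandered" three syllables), so treat the numbers as rough estimates.

```python
import re
from collections import Counter
from statistics import mean


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Assume a trailing 'e' is silent ("made"), except in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def complexity_features(text):
    """Return simple vocabulary-complexity features for a piece of text."""
    # Naive sentence split on ., ! and ?; abbreviations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words or not sentences:
        raise ValueError("text contains no words")
    syllables = [count_syllables(w) for w in words]
    return {
        "avg_word_length": mean(len(w) for w in words),
        "avg_sentence_length": len(words) / len(sentences),  # words per sentence
        "syllable_distribution": dict(Counter(syllables)),   # syllables -> word count
    }


if __name__ == "__main__":
    print(complexity_features("The cat sat. The elephant wandered away."))
```

A dedicated tokenizer or a pronunciation dictionary would likely give better syllable counts than this heuristic, but the dictionary returned here is easy to turn into per-document feature columns.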

Does anyone have NLP experience with this type of analysis? Here is my current Kaggle notebook for reference.


