I’m Samar and I’m a Research Assistant at the Center for Language Engineering in Lahore, Pakistan, where I work on using machine learning and natural language processing to improve support for low-resource languages like Urdu, Pakistan’s lingua franca.
I had the opportunity to be a part of Deep Learning Part I as an International Fellow back in October. Participating in the course remotely was an incredibly valuable experience and taught me so much about the practical aspects of model design, implementation, and evaluation. Jeremy, your lectures were always peppered with useful tips and tricks on getting deep learning to work, and discussions on the Slack channel and course forum were a welcome escape from struggling with deep learning in isolation.
You might be pleased to hear that your course has already helped me make advances in my work on low-resource languages.
One of the biggest challenges I encountered during my research was the lack of labeled data available for training machine learning algorithms, a problem that commonly hinders work on under-resourced languages. This got me thinking about ways in which unsupervised learning techniques could be employed to extract meaningful representations for later use in supervised learning problems.
Taking a cue from Lesson 5, I acquired and cleaned an Urdu corpus of over 35 million tokens, segmented it into sentences, and trained a continuous bag-of-words (CBOW) model on it to learn vector representations of words. The resulting embeddings captured not only useful semantic relationships between words but also lexical variations frequently found in Urdu. This marks the first time such word representations have been trained for Urdu, and, while they are a valuable resource in themselves, it is exciting to think of the ways they can be used to advance the state of natural language processing for Urdu, in applications ranging from text classification to sentiment analysis to machine translation.
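For readers curious what CBOW actually does, here is a toy sketch of the idea: predict a word from the average of its context-word vectors. Everything here (the two-sentence "corpus", the dimensions, the learning rate) is made up for illustration; in practice one would train on the real corpus with a library such as gensim rather than this from-scratch loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a cleaned, sentence-segmented corpus.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}
V, D, window = len(vocab), 16, 2

W_in = rng.normal(scale=0.1, size=(V, D))   # context (input) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output projection

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(200):
    for sent in sentences:
        for t, target in enumerate(sent):
            # Indices of words within `window` positions of the centre word.
            ctx = [idx[sent[j]]
                   for j in range(max(0, t - window),
                                  min(len(sent), t + window + 1))
                   if j != t]
            if not ctx:
                continue
            h = W_in[ctx].mean(axis=0)       # average the context vectors
            p = softmax(h @ W_out)           # predict the centre word
            grad = p.copy()
            grad[idx[target]] -= 1.0         # cross-entropy gradient
            g_h = W_out @ grad               # gradient flowing back to h
            W_out -= lr * np.outer(h, grad)
            for c in ctx:
                W_in[c] -= lr * g_h / len(ctx)

# After training, each row of W_in is that word's embedding.
emb = W_in
```

This toy version uses a full softmax over the vocabulary, which only scales to tiny vocabularies; real CBOW training on millions of tokens relies on tricks like negative sampling or the hierarchical softmax.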
The applications of recurrent neural networks demonstrated towards the end of the course have inspired me to test their effectiveness at building character-level language models with long-term dependencies for Urdu. I look forward to seeing how well they can capture Urdu's rich morphology.
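To make the idea concrete, here is a minimal character-level language model: a vanilla RNN trained by backpropagation through time to predict the next character. This is purely illustrative; the text, sizes, and plain-SGD training are stand-ins, and a real experiment on Urdu would use an LSTM or GRU in a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
text = "salam dunya " * 20                  # toy stand-in for an Urdu corpus
chars = sorted(set(text))
c2i = {c: i for i, c in enumerate(chars)}
ids = [c2i[c] for c in text]
V, H, seq_len, lr = len(chars), 32, 10, 0.05

Wxh = rng.normal(scale=0.1, size=(H, V))    # input-to-hidden weights
Whh = rng.normal(scale=0.1, size=(H, H))    # hidden-to-hidden weights
Why = rng.normal(scale=0.1, size=(V, H))    # hidden-to-output weights
bh, by = np.zeros(H), np.zeros(V)

def train_seq(inputs, targets, hprev):
    """Forward pass + backprop through time over one short sequence."""
    xs, hs, ps = {}, {-1: hprev}, {}
    loss = 0.0
    for t, i in enumerate(inputs):
        x = np.zeros(V); x[i] = 1.0
        xs[t] = x
        hs[t] = np.tanh(Wxh @ x + Whh @ hs[t - 1] + bh)
        y = Why @ hs[t] + by
        p = np.exp(y - y.max()); p /= p.sum()        # softmax over next char
        ps[t] = p
        loss -= np.log(p[targets[t]])
    grads = [np.zeros_like(w) for w in (Wxh, Whh, Why, bh, by)]
    dWxh, dWhh, dWhy, dbh, dby = grads
    dhnext = np.zeros(H)
    for t in reversed(range(len(inputs))):
        dy = ps[t].copy(); dy[targets[t]] -= 1.0     # cross-entropy gradient
        dWhy += np.outer(dy, hs[t]); dby += dy
        draw = (1.0 - hs[t] ** 2) * (Why.T @ dy + dhnext)  # back through tanh
        dbh += draw
        dWxh += np.outer(draw, xs[t])
        dWhh += np.outer(draw, hs[t - 1])
        dhnext = Whh.T @ draw
    for g in grads:
        np.clip(g, -5, 5, out=g)                     # crude gradient clipping
    return loss, grads, hs[len(inputs) - 1]

losses, h = [], np.zeros(H)
for it in range(300):
    p0 = (it * seq_len) % (len(ids) - seq_len - 1)
    loss, grads, h = train_seq(ids[p0:p0 + seq_len],
                               ids[p0 + 1:p0 + seq_len + 1], h)
    for w, g in zip((Wxh, Whh, Why, bh, by), grads):
        w -= lr * g                                  # plain SGD update
    losses.append(loss)
```

The per-sequence loss falls as the model memorizes the repeating toy text; capturing genuinely long-range dependencies, such as agreement across a morphologically rich word, is exactly where gated units like LSTMs earn their keep over this vanilla cell.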
In the long run, I hope to use deep learning techniques to bridge gaps in human communication by helping computers better process and understand regional languages and use machine translation to help unlock a world of information for people who don’t speak English (or other popular languages).
I’m passionate about learning new things and sharing that knowledge with others. I have volunteered as a TA multiple times during my undergrad and I love explaining things by deconstructing complex equations into intuitive concepts. I hope to one day become a professor and get to do this full-time.
Something not many people know about me? That I was homeschooled right up to grade 10! That’s when I fell in love with learning and reading.
I can’t wait for the course to begin!