Roane and I are starting research on a project to address the issues raised in the Stanford study “Racial disparities in automated speech recognition”.
The study examined voice recordings from Black and white speakers and found that speech from Black speakers was misrecognized far more often by five major commercial speech-to-text services. This can affect lives in many ways, including the automated job screening that is now common. To address these issues, the authors recommend improving the acoustic models rather than the language models.
If we can successfully retrieve these voice recordings, can fastai help with encoding them for training? In one lecture Jeremy showed that laying out an audio clip as an image (a spectrogram) worked well; is this the right path?
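To make the “audio as image” idea concrete, here is a minimal sketch of turning a clip into a spectrogram array that a vision model could consume. This uses scipy on a synthetic sine wave, not the fastai audio pipeline or the study’s data; the sample rate and window parameters are just placeholder choices:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic 1-second clip at 16 kHz, standing in for a real voice recording
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.sin(2 * np.pi * 440 * t)  # a 440 Hz tone

# Short-time spectrogram: rows are frequency bins, columns are time frames
freqs, times, spec = spectrogram(clip, fs=sr, nperseg=512, noverlap=384)

# Log scale so quiet detail is visible, then normalize to 0-255 like an image
img = 10 * np.log10(spec + 1e-10)
img = ((img - img.min()) / (img.max() - img.min()) * 255).astype(np.uint8)

print(img.shape)  # (frequency bins, time frames): a 2-D "picture" of the sound
```

From here the arrays could be saved as PNGs and fed to a standard fastai image learner, which is roughly what the lecture demonstrated.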
If anyone has experience encoding audio we would love to chat!
Links to the study:
The study points to several properties that may contribute to the errors: “pronunciation and prosody—including rhythm, pitch, syllable accenting, vowel duration, and lenition—between white and black speakers”. So we wonder whether focusing our efforts on generating data that captures these properties is the best approach. We have also emailed the authors of the study to ask whether the “VOC” dataset is available outside of Stanford.
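As a starting point for generating prosody-varied data, one very naive augmentation is resampling a clip to change its speed (and, as a side effect, its pitch). This is only an illustrative sketch with numpy; a real pipeline would use proper time-stretch and pitch-shift tools that handle the two independently:

```python
import numpy as np

def naive_speed_change(clip, rate):
    """Resample by linear interpolation.
    rate > 1 speeds the clip up (shorter, higher-pitched);
    rate < 1 slows it down (longer, lower-pitched)."""
    n_out = int(len(clip) / rate)
    old_idx = np.arange(len(clip))
    new_idx = np.linspace(0, len(clip) - 1, n_out)
    return np.interp(new_idx, old_idx, clip)

# Synthetic 1-second clip at 16 kHz as a stand-in for a voice recording
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
clip = np.sin(2 * np.pi * 220 * t)

slower = naive_speed_change(clip, 0.8)   # 20000 samples, lower pitch
faster = naive_speed_change(clip, 1.25)  # 12800 samples, higher pitch
```

Coupling speed and pitch like this is crude; augmentations targeting the specific features the study names (vowel duration, syllable accenting, lenition) would need more careful signal processing, and we would love pointers from anyone who has done this.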