First deep learning project - Speech sentiment analysis

Hi,

As my first deep learning project, I am thinking of building a tool that extracts the sentiment and key phrases from phone calls. I picked this one because it is also related to something I might be working on at work.

Scope of the project

Input: Phone call recordings (audio files) between the business and its customers.

Outputs:

  1. Overall sentiment (positive / negative / neutral, with a score) of the customer in that phone call.

  2. The key phrases that drive the overall sentiment. A conversation could contain a mix of them, e.g. “I have been using product A for a while and I like it”, “Recently tried product B and it seems too pricey”, “Even though I signed up for promotions / discounts, I don’t get any in my email”.

High level approach I am thinking of taking:

  1. Use a speech to text converter.

  2. Use an NLP library (NLTK, Hugging Face Transformers etc.) to extract the overall sentiment of the conversation.

  3. Leverage transfer learning by using BERT or Seq2Seq to help extract the key phrases. I am not sure if 2 & 3 can be done using the same library / model. Ideally I would get an overall sentiment with a score, the key phrases that contribute to that score, and how much each phrase contributes to it.
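
To make the question in step 3 more concrete, here is a minimal sketch of one way the overall score and the per-phrase contributions could fit together: score each sentence separately, average the scores for the call, and rank sentences by how strongly they pull the score. It assumes the transcript has already been produced by step 1. The names `TOY_LEXICON`, `toy_score` and `analyze_call` are hypothetical, and the tiny word list is only a placeholder for a real sentiment model, which could be passed in as `score_fn` instead.

```python
import re
from typing import Callable, Dict, List, Tuple

# Placeholder word list (an assumption for illustration only).
# A real project would replace this with a trained sentiment model.
TOY_LEXICON: Dict[str, float] = {
    "like": 1.0,
    "great": 1.0,
    "pricey": -1.0,
    "bad": -1.0,
}


def toy_score(sentence: str) -> float:
    """Return a score in [-1, 1]: the mean of the lexicon hits, 0.0 if none."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def analyze_call(transcript: str,
                 score_fn: Callable[[str], float] = toy_score) -> dict:
    """Score each sentence, average for the call, rank sentences by impact."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", transcript)
                 if s.strip()]
    scored: List[Tuple[str, float]] = [(s, score_fn(s)) for s in sentences]
    overall = sum(v for _, v in scored) / len(scored) if scored else 0.0
    if overall > 0.1:
        label = "positive"
    elif overall < -0.1:
        label = "negative"
    else:
        label = "neutral"
    # Sentences with the largest absolute score come first.
    drivers = sorted(scored, key=lambda pair: abs(pair[1]), reverse=True)
    return {"label": label, "score": overall, "drivers": drivers}


if __name__ == "__main__":
    result = analyze_call(
        "I like product A. Product B seems too pricey. The weather is fine."
    )
    print(result["label"], result["score"])
    for sentence, value in result["drivers"]:
        print(f"{value:+.1f}  {sentence}")
```

Averaging sentence scores is only one possible design. It keeps each sentence's contribution to the overall score easy to read off, but it treats every sentence as equally important, which may not hold for real calls.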

Questions:

  1. Is the high level approach mentioned above a good starting point? Are there any other things that you would recommend?

  2. Are there any good precedents (papers / project documentation / Kaggle competitions) that you would recommend I go through to build my knowledge before starting this project?

  3. Any speech datasets that I can use?

Thanks!

Hello,
Thanks for this post.

Thanks for sharing this!