Hi,
As my first deep learning project, I am thinking of building a sentiment and key phrase extractor for phone calls. I picked this one because it is also related to something I might be working on at work.
Scope of the project
Input: Phone call recordings (audio files) between the business and its customers.
Outputs:
- Key sentiment (positive / negative / neutral, with a score) of the customer for that phone call.
- Key phrases that drive the overall sentiment. A conversation could contain a mix of them, e.g. “I have been using product A for a while and I like it”, “Recently tried product B and it seems too pricey”, “Even though I signed up for promotions / discounts, I don’t get any in my email”.
High-level approach I am thinking of taking:
1. Use a speech-to-text converter to transcribe the recordings.
2. Use an NLP library (NLTK, Hugging Face Transformers, etc.) to extract the overall sentiment of the conversation.
3. Leverage transfer learning by using BERT or a Seq2Seq model to help extract the key phrases. I am not sure whether steps 2 and 3 can be done with the same library / model, so that I get an overall sentiment with a score, the key phrases that contribute to that score, and how much each one contributes to the overall score.
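One way the sentiment score and the key phrase contributions could fit together is sketched below. It is only an illustration under stated assumptions, not a recommended design: the transcript is assumed to already exist as text (speech-to-text is out of scope here), the overall score is assumed to be the plain mean of per-sentence scores, and `TOY_LEXICON`, `toy_sentence_score`, `PhraseSentiment` and `analyze_call` are hypothetical names invented for this example. The toy word list is a stand-in for a real sentence-level model, which could be plugged in through the `scorer` argument.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy word list, purely illustrative (assumption): a stand-in for a trained
# sentence-level sentiment model.
TOY_LEXICON = {
    "like": 1.0, "love": 1.0, "great": 1.0,
    "pricey": -1.0, "expensive": -1.0, "bad": -1.0,
}


@dataclass
class PhraseSentiment:
    phrase: str          # one sentence from the transcript
    score: float         # sentence-level score in [-1, 1]
    contribution: float  # score / number of sentences


def toy_sentence_score(sentence: str) -> float:
    """Average the lexicon values of the words found; 0.0 if none match."""
    words = re.findall(r"[a-z']+", sentence.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


def split_sentences(transcript: str) -> List[str]:
    """Naive split on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", transcript)
    return [p.strip() for p in parts if p.strip()]


def analyze_call(
    transcript: str,
    scorer: Callable[[str], float] = toy_sentence_score,
    threshold: float = 0.05,
) -> Tuple[str, float, List[PhraseSentiment]]:
    """Return (label, overall score, key phrases sorted by |contribution|)."""
    sentences = split_sentences(transcript)
    if not sentences:
        return "neutral", 0.0, []

    phrases = []
    for sentence in sentences:
        score = scorer(sentence)
        phrases.append(PhraseSentiment(sentence, score, score / len(sentences)))

    # The overall score is the mean sentence score, so the contributions
    # add up to it by construction.
    overall = sum(p.contribution for p in phrases)
    if overall > threshold:
        label = "positive"
    elif overall < -threshold:
        label = "negative"
    else:
        label = "neutral"

    # Keep only sentences that moved the score, largest effect first.
    key_phrases = sorted(
        (p for p in phrases if p.contribution != 0.0),
        key=lambda p: abs(p.contribution),
        reverse=True,
    )
    return label, overall, key_phrases


if __name__ == "__main__":
    label, overall, key_phrases = analyze_call(
        "I have been using product A for a while and I like it. "
        "Recently tried product B and it seems too pricey."
    )
    print(label, overall)
    for p in key_phrases:
        print(p.contribution, p.phrase)
```

Because the overall score here is just an average, each sentence's contribution is trivially readable. With a single end-to-end model that would no longer hold, and some attribution method would be needed to recover how much each phrase contributes.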
Questions:
- Is the high-level approach mentioned above a good starting point? Are there any other things that you would recommend?
- Are there any good precedents (papers / project documentation / Kaggle competitions) that you would recommend I go through to build my knowledge before starting out on this project?
- Are there any speech datasets that I can use?
Thanks!