I am looking into summarising meeting recordings and I am not sure of the best way to approach the problem.
We split the problem into three phases:
- Speaker diarisation, possibly preceded by noise reduction
- Speech-to-text, possibly followed by some correction
- Text summarisation
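To make the pipeline concrete, here is a rough sketch of how the diarisation and STT outputs could be stitched together before summarisation. All segment times, speaker labels, and texts below are invented; real tools (e.g. pyannote for diarisation, Whisper for STT) emit similar `(start, end)` structures.

```python
# Hypothetical merge of diarisation turns with STT segments: each
# transcribed segment is assigned the speaker whose turn contains
# the segment's midpoint.

def assign_speakers(diarization, stt_segments):
    """Attach a speaker label to each transcribed segment.

    diarization: list of (start, end, speaker) tuples.
    stt_segments: list of (start, end, text) tuples.
    """
    transcript = []
    for start, end, text in stt_segments:
        mid = (start + end) / 2
        speaker = next(
            (spk for s, e, spk in diarization if s <= mid < e),
            "UNKNOWN",
        )
        transcript.append((speaker, text))
    return transcript

# Made-up example data for two speakers.
diarization = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.0, "SPEAKER_01")]
stt = [(0.3, 3.9, "Shall we start?"), (4.5, 8.7, "Yes, let's review the agenda.")]

for spk, text in assign_speakers(diarization, stt):
    print(f"{spk}: {text}")
```

The midpoint heuristic is crude (segments straddling a speaker change get one label), but it keeps the two phases decoupled.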
I am a bit concerned that by the time we get to the summarisation step, the model will no longer be able to resolve the coreferences in the dialogue.
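One common mitigation for the coreference worry is to keep speaker labels on every turn and to chunk long transcripts with an overlap, so the summariser always sees some preceding context for pronouns like "he" or "that proposal". A minimal sketch (the turn contents are made up, and `max_turns`/`overlap` are arbitrary choices, not recommendations):

```python
# Split speaker-labelled turns into windows that share `overlap`
# trailing turns with the previous window, so each chunk carries
# some context from the one before it.

def chunk_turns(turns, max_turns=6, overlap=2):
    step = max_turns - overlap
    chunks = []
    for i in range(0, len(turns), step):
        chunks.append(turns[i:i + max_turns])
        if i + max_turns >= len(turns):
            break
    return chunks

# Invented transcript of ten alternating turns.
turns = [f"SPEAKER_{i % 2}: utterance {i}" for i in range(10)]
chunks = chunk_turns(turns)
```

Each chunk can then be summarised separately and the partial summaries merged in a final pass.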
Also, I am not sure what architecture I should use. I know that the output should be a sequence of words, and as input I imagine converting the audio into spectrograms.
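For the spectrogram input, here is a minimal magnitude-spectrogram sketch in plain NumPy (no librosa or torchaudio); the 440 Hz test tone and the frame parameters are purely illustrative:

```python
import numpy as np

def spectrogram(signal, n_fft=512, hop=128):
    """Frame the signal, apply a Hann window, and take the magnitude
    of the real FFT of each frame. Returns (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000
t = np.arange(sr) / sr                  # one second of audio at 16 kHz
signal = np.sin(2 * np.pi * 440.0 * t)  # 440 Hz test tone
spec = spectrogram(signal)
print(spec.shape)                       # (frames, frequency bins)
```

Most end-to-end STT models go one step further and use log-mel spectrograms rather than raw magnitudes, but the framing/windowing/FFT structure is the same.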
Any ideas or experience you could share on this topic?