Hi guys, I’m working on my final year project, which involves aspect-based sentiment analysis on earnings call transcripts. The aspects in this case are the countries or sectors mentioned by the CEO/CFO.
The tricky part is the lack of ground truth labels, which I’m trying to overcome by using a pre-trained BERT to generate labels for the text. But because BERT’s classification has no aspect element, it assigns one sentiment to the whole text, so the labels will be wrong for some, if not most, of the aspects.
The simple but tedious solution would be to hand-label the corpus, but with 11 million documents and just one guy, I’m not looking forward to it.
My current plan is to train the model in phases: first on texts that mention only a single aspect, using a lexicon-based approach to pick out ‘high confidence’ examples, then gradually extending the training to harder, multi-aspect texts. I’ve just started on this and am wondering whether it will work.
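To make the first phase concrete, here is a minimal sketch of the filtering step. Everything in it is an assumption for illustration: the aspect and sentiment word lists are made up (a real run would probably use a finance-specific lexicon), and the function names and the `min_margin` threshold are hypothetical. It keeps only sentences that mention exactly one aspect and whose lexicon score is clearly one-sided.

```python
import re

# Hypothetical aspect lexicon: aspect name -> trigger keywords (illustrative only).
ASPECTS = {
    "China": {"china", "chinese"},
    "Europe": {"europe", "european"},
    "Energy": {"energy", "oil", "gas"},
}

# Hypothetical sentiment word lists; stand-ins for a real finance lexicon.
POSITIVE = {"growth", "strong", "improved", "record", "gain"}
NEGATIVE = {"decline", "weak", "loss", "headwinds", "slowdown"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_aspects(tokens):
    """Return the names of all aspects whose keywords appear in the tokens."""
    token_set = set(tokens)
    return [name for name, keywords in ASPECTS.items() if token_set & keywords]


def lexicon_label(tokens, min_margin=2):
    """Return a label only when one polarity beats the other by min_margin."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos - neg >= min_margin:
        return "positive"
    if neg - pos >= min_margin:
        return "negative"
    return None  # not confident enough, leave unlabeled


def build_seed_set(sentences):
    """Keep single-aspect sentences that receive a confident lexicon label."""
    seed = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        aspects = find_aspects(tokens)
        if len(aspects) != 1:
            continue  # skip sentences with no aspect or several aspects
        label = lexicon_label(tokens)
        if label is not None:
            seed.append((sentence, aspects[0], label))
    return seed


sentences = [
    "We saw strong growth in China this quarter.",
    "Europe was weak, but China delivered record growth.",
    "Energy faced headwinds and a decline in margins.",
    "Our Energy business was stable.",
]
seed = build_seed_set(sentences)
for sentence, aspect, label in seed:
    print(aspect, label, "|", sentence)
```

The resulting (text, aspect, label) triples would be the training set for the first phase; the multi-aspect and low-margin sentences are held back for later phases.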
Has anyone run into this problem before and solved it in a better way?
Thanks in advance, Neo