Training with inaccurate labels (Aspect-based Sentiment Analysis)

Hi guys, I’m working on my final-year project, which involves aspect-based sentiment analysis of earnings call transcripts. The aspects in this case are the countries or sectors mentioned by the CEO/CFO.

The tricky part is the lack of ground-truth labels, which I’m trying to overcome by using a pre-trained BERT model to generate labels for the text. But because BERT’s classification has no notion of aspect (it gives one sentiment for the whole text, not one per country or sector), some, if not most, of the labels will be wrong.

The simple but tedious solution would be to hand-label the corpus, but with 11 million documents and just one person, I’m not looking forward to it 🙂

My current plan is to train the model in phases: first on texts that mention only a single aspect, using a lexicon-based approach to pick out ‘high-confidence’ examples, and then gradually moving on to harder data. I’ve just started on this and am wondering whether it will work.
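The first phase of that plan could be sketched roughly as follows. This is only an illustration under assumptions: the aspect vocabulary, the positive/negative word lists, the `min_margin` threshold and the function names are all hypothetical, and a real lexicon (and multi-word aspects such as "United States") would need more care. The filter keeps a text only if it mentions exactly one aspect and the lexicon vote is decisive.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical aspect vocabulary and sentiment lexicon, for illustration only.
# Single-word aspects only; multi-word names would need phrase matching.
ASPECTS = {"china", "germany", "brazil", "energy", "retail"}
POSITIVE = {"growth", "strong", "improved", "record", "exceeded"}
NEGATIVE = {"decline", "weak", "headwinds", "loss", "slowdown"}


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def high_confidence_label(text: str, min_margin: int = 2) -> Optional[Tuple[str, str]]:
    """Return (aspect, sentiment) if the text mentions exactly one aspect
    and the lexicon vote is decisive; otherwise return None."""
    tokens = tokenize(text)
    aspects = {t for t in tokens if t in ASPECTS}
    if len(aspects) != 1:
        return None  # skip multi-aspect or aspect-free text in the first phase
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if abs(pos - neg) < min_margin:
        return None  # lexicon vote too close to call
    return aspects.pop(), ("positive" if pos > neg else "negative")


docs = [
    "Demand in China showed strong growth and record margins.",
    "Germany faced headwinds and a slowdown, with weak orders.",
    "China was strong but Germany was weak.",  # two aspects: skipped
]
seed_set = [(d, high_confidence_label(d)) for d in docs if high_confidence_label(d)]
```

Later phases could relax `min_margin`, or admit texts with more than one aspect once a model trained on the seed set is available to label them.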

Has anyone run into this problem before and solved it in a better way?

Thanks in advance, Neo

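There is something called gold loss correction that appears to work well when you have severe label noise. It uses a small set of trusted, correctly labelled examples to estimate how the noisy labels are corrupted, then corrects the training loss for the rest of the data.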
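A minimal NumPy sketch of the idea, assuming you have (a) a model already trained on the noisy BERT labels and (b) a small hand-labelled trusted subset, which would still have to be created. The function names are hypothetical and this only shows the two core computations, not a full training loop; in practice the loss would be written in your framework so it can be backpropagated. First, the corruption matrix `C[i, j]`, an estimate of p(noisy label = j | true label = i), is the average noisy-model softmax over trusted examples whose true label is `i`. Then a new model is trained with ordinary cross-entropy on trusted rows, and on untrusted rows its clean-label distribution is pushed through `C` before being scored against the noisy label.

```python
import numpy as np


def estimate_corruption_matrix(noisy_probs_trusted, true_labels_trusted, num_classes):
    """C[i, j] ~ p(noisy label = j | true label = i).

    noisy_probs_trusted: (n, k) softmax outputs of a model trained on the
    noisy labels, evaluated on the trusted examples.
    true_labels_trusted: (n,) clean labels for those examples.
    """
    C = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        mask = true_labels_trusted == i
        if mask.any():
            C[i] = noisy_probs_trusted[mask].mean(axis=0)
        else:
            C[i, i] = 1.0  # no trusted examples of class i: assume no corruption
    return C


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def glc_loss(logits, labels, is_trusted, C, eps=1e-12):
    """Mean loss over a batch.

    Trusted rows: ordinary cross-entropy on the clean label.
    Untrusted rows: cross-entropy on the noisy label after mapping the model's
    clean-label distribution through C, i.e. p(noisy=j|x) = sum_i p(y=i|x) C[i, j].
    """
    p_clean = softmax(logits)
    p_noisy = p_clean @ C
    probs = np.where(is_trusted[:, None], p_clean, p_noisy)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + eps).mean())
```

With `C` equal to the identity matrix (no estimated corruption), the untrusted loss reduces to plain cross-entropy, which is a handy sanity check.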

Fastai also has LabelSmoothingCrossEntropy out of the box for less severe cases.
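For reference, label smoothing replaces the one-hot target with a softened one: the true class gets `1 - eps + eps/k` and every other class gets `eps/k`, for `k` classes. The NumPy sketch below is a standalone illustration of that standard formulation, not fastai's code; as far as I understand, fastai's `LabelSmoothingCrossEntropy` (whose `eps` defaults to 0.1) computes an equivalent quantity, and PyTorch's `CrossEntropyLoss` exposes the same idea through its `label_smoothing` argument.

```python
import numpy as np


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def label_smoothing_ce(logits, labels, eps=0.1):
    """Cross-entropy against smoothed targets:
    q = (1 - eps) * one_hot(label) + eps / k for every class."""
    n, k = logits.shape
    log_p = log_softmax(logits)
    targets = np.full((n, k), eps / k)
    targets[np.arange(n), labels] += 1.0 - eps
    return float(-(targets * log_p).sum(axis=1).mean())
```

With `eps=0.0` this is ordinary cross-entropy; with `eps > 0` a very confident prediction is penalised slightly, which is why it helps when some labels are wrong.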


Thanks! Looks very interesting, I’ll definitely try it out 🙂

For anyone who stumbles on this: I eventually solved it by creating a synthetic multi-aspect dataset from the more accurately labelled single-country examples.

Coupling that with label smoothing gave me pretty good results 🙂
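The post does not say exactly how the synthetic multi-aspect data was built, so the following is only one plausible sketch under that assumption, with a hypothetical function name: group the single-aspect examples by aspect, then repeatedly pick two or more distinct aspects, draw one snippet for each, and join them, keeping each snippet's label as the per-aspect target.

```python
import random


def make_synthetic_multi_aspect(single_aspect, n_samples, max_aspects=3, seed=0):
    """single_aspect: list of (text, aspect, sentiment) from the cleaner
    single-aspect data. Returns a list of (text, {aspect: sentiment})."""
    rng = random.Random(seed)
    by_aspect = {}
    for text, aspect, sentiment in single_aspect:
        by_aspect.setdefault(aspect, []).append((text, sentiment))
    aspects = sorted(by_aspect)
    if len(aspects) < 2:
        raise ValueError("need at least two distinct aspects")
    out = []
    for _ in range(n_samples):
        k = rng.randint(2, min(max_aspects, len(aspects)))
        # Distinct aspects, in random order, so per-aspect labels never clash.
        chosen = rng.sample(aspects, k)
        parts, labels = [], {}
        for aspect in chosen:
            text, sentiment = rng.choice(by_aspect[aspect])
            parts.append(text)
            labels[aspect] = sentiment
        out.append((" ".join(parts), labels))
    return out


data = [
    ("China demand was strong.", "china", "positive"),
    ("Germany orders were weak.", "germany", "negative"),
    ("Brazil margins improved.", "brazil", "positive"),
]
synthetic = make_synthetic_multi_aspect(data, n_samples=5, seed=1)
```

Because the aspects are drawn in random order, the model cannot rely on an aspect's position in the text; whether simply concatenating snippets looks enough like real transcripts is something to check on held-out data.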