Audio dataset vs. Twitter dataset

I am working on a major project in which my task is sequence classification using BERT.
Currently, I have Twitter data for fine-tuning BERT.

If I obtain audio data related to my problem, is there any chance that it would give me better accuracy?
If audio data is likely to result in worse accuracy, I will not spend time looking for it.