How many samples per label are needed for effective training?

I’m cleaning the data for a multi-class text classification problem with thousands of labels, and I’m wondering what to do with labels that have only one training sample. I suspect I should remove those samples from the dataset. Is there a guideline for the minimum number of samples per label needed to train effectively?
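
To make the removal step concrete, here is a minimal sketch of what dropping under-represented labels could look like. The function name `drop_rare_labels`, the toy data, and the `min_samples=2` threshold are all hypothetical and only illustrate the mechanics; the right threshold is exactly what the question is asking about.

```python
from collections import Counter


def drop_rare_labels(texts, labels, min_samples=2):
    """Keep only examples whose label occurs at least `min_samples` times.

    `min_samples` is an illustrative placeholder, not a recommended value.
    """
    # Count how many training examples each label has.
    counts = Counter(labels)

    # Keep (text, label) pairs whose label meets the threshold.
    kept = [(t, y) for t, y in zip(texts, labels) if counts[y] >= min_samples]
    kept_texts = [t for t, _ in kept]
    kept_labels = [y for _, y in kept]
    return kept_texts, kept_labels, counts


# Toy data: "auth" and "logistics" each have a single sample.
texts = ["refund request", "refund please", "login broken", "shipping delay"]
labels = ["billing", "billing", "auth", "logistics"]

kept_texts, kept_labels, counts = drop_rare_labels(texts, labels, min_samples=2)
```

Returning the per-label counts alongside the filtered data makes it easy to inspect how many labels would be lost at a given threshold before committing to it.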