Plotting Label Count Distribution of Data - Assessing Balance
This is my first post in the forum.
As I was running through lesson one and attempting to load my custom dataset, I was wondering how balanced my data is. By balanced I mean how many instances of each class/label I have in the train/validation set.
I searched through the codebase but wasn't able to find a way to plot/print this information, so I coded up a few Jupyter cells, which I added to my lesson one notebook. I thought it might be beneficial to others when building their dataset, especially if the dataset is scraped automatically.
The following code shows how to plot the distribution of the classes/labels in the train set.
    # Extract the label distribution of the training set
    import collections
    import matplotlib.pyplot as plt

    items = data.label_list.train.y.items      # class index of each training example
    classes = data.label_list.train.y.classes  # class names, ordered by index
    counter = collections.Counter(items)
    # Look counts up by class index so each bar lines up with its label
    # (Counter.values() is ordered by first appearance, not by class index)
    occurrence_count = [counter[i] for i in range(len(classes))]

    # Plot
    index = list(range(len(classes)))
    plt.bar(index, occurrence_count)
    plt.xticks(index, classes, rotation=45)
    plt.ylabel('Label Count')
    plt.xlabel('Label')
    plt.title('Count of Labels')
    plt.show()
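If you would rather print the numbers than plot them, here is a rough sketch of the same counting step as a small helper. The function names `label_counts` and `imbalance_ratio` are my own (not part of fastai), and the toy lists stand in for `data.label_list.train.y.items` and `data.label_list.train.y.classes`. I am assuming the items are integer class indices, as the plotting code above also assumes.

```python
import collections


def label_counts(items, classes):
    """Return {class_name: count}, keeping classes that have zero examples.

    Assumes `items` holds integer class indices into `classes`.
    """
    counter = collections.Counter(int(i) for i in items)
    return {name: counter[idx] for idx, name in enumerate(classes)}


def imbalance_ratio(counts):
    """Largest class count divided by the smallest non-zero class count.

    Assumes at least one class has at least one example.
    """
    nonzero = [c for c in counts.values() if c > 0]
    return max(nonzero) / min(nonzero)


# Toy data standing in for the real items/classes (hypothetical values)
toy_classes = ['cat', 'dog', 'bird']
toy_items = [1, 1, 0, 1, 0, 1]

toy_counts = label_counts(toy_items, toy_classes)
for name, count in toy_counts.items():
    print(f'{name}: {count}')
print('imbalance ratio:', imbalance_ratio(toy_counts))
```

The same helper should work for the validation set if you pass it the validation labels instead (I believe that is `data.label_list.valid.y`, but check this against your version). A class showing a count of zero is worth a look, since it means that split has no examples of it at all.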
Hope you find this useful!