I realized I had an issue with my Data Block API objects when I saw that splitting the data differently changed the total number of images. Specifically, when I split using `split_by_valid_idx` and gave it an array of indexes, the training and validation sets together appeared to contain more images than I had in total. Eventually, I realized this was because my method of getting indexes, `np.random.randint`, samples with replacement and so gave me duplicate indexes.
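For anyone who hits the same thing, here is a rough sketch of the difference. The sizes `n_items` and `n_valid` are made-up numbers for illustration, and the train/valid split at the end is just plain NumPy, not what fastai does internally.

```python
import numpy as np

rng = np.random.RandomState(0)
n_items = 100   # assumed dataset size, for illustration only
n_valid = 20    # assumed validation set size, for illustration only

# randint draws every index independently (with replacement),
# so the same index can come back more than once.
dup_idx = rng.randint(0, n_items, size=n_valid)

# choice with replace=False returns distinct indexes.
valid_idx = rng.choice(n_items, size=n_valid, replace=False)

# With distinct validation indexes, the two sets add up to the full dataset.
train_idx = np.setdiff1d(np.arange(n_items), valid_idx)
print(len(np.unique(dup_idx)), len(np.unique(valid_idx)), len(train_idx) + len(valid_idx))
```

`np.random.permutation(n_items)[:n_valid]` should work just as well if you prefer that.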
Of course, this bug was not the fault of fastai or the data block API, but a simple check for duplicates in `split_by_valid_idx` might be helpful and would prevent a bug that I'm sure is fairly common and hard to spot.
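Something like the following is what I have in mind. To be clear, `check_unique_idx` is a hypothetical helper I wrote for this post, not an existing fastai function, and whether it should raise or just warn is up for discussion.

```python
import numpy as np

def check_unique_idx(idx):
    """Hypothetical helper (not part of fastai): raise if idx contains repeats."""
    idx = np.asarray(idx)
    uniq, counts = np.unique(idx, return_counts=True)
    dups = uniq[counts > 1]
    if len(dups) > 0:
        # Show a few offending values to make the problem easy to track down.
        raise ValueError(
            f"index array contains {len(dups)} duplicated value(s), "
            f"e.g. {dups[:5].tolist()}"
        )
    return idx
```

In the meantime, calling this on the index array before passing it to the split would catch the problem early.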