I realized I had an issue with my Data Block API objects when I saw that splitting the data differently changed the total number of images that appeared. Specifically, when I split using
split_by_valid_idx and gave it an array of indexes, the training and validation sets together appeared to contain more images than my dataset actually had. Eventually, I realized this was because my method of getting indexes,
np.random.randint, samples with replacement and so gave me duplicate indexes.
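For anyone hitting the same thing, here is a minimal sketch of the difference, using plain NumPy only. The dataset size and validation size are made-up numbers for illustration, not values from my actual project.

```python
import numpy as np

n_items = 1000  # hypothetical total number of images
n_valid = 200   # hypothetical validation set size

# np.random.randint draws each index independently (with replacement),
# so the same index can show up more than once.
dup_idx = np.random.randint(0, n_items, size=n_valid)

# np.random.choice with replace=False samples without replacement,
# so every index is distinct.
valid_idx = np.random.choice(n_items, size=n_valid, replace=False)

n_unique_dup = len(np.unique(dup_idx))      # at most n_valid, usually fewer
n_unique_valid = len(np.unique(valid_idx))  # always equal to n_valid
```

Comparing len(np.unique(idx)) with len(idx) is a quick way to see whether an index array contains repeats before passing it to the splitter.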
Of course, this bug was not the fault of fastai or the data block API, but a simple line in
split_by_valid_idx to check for duplicates might be helpful in the future and would avoid a bug that I’m sure is somewhat common and hard to spot.
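The kind of check I have in mind could look roughly like the sketch below. This is not fastai code: assert_unique_idx is a hypothetical helper name, and where exactly such a check would sit inside split_by_valid_idx is an assumption on my part.

```python
import numpy as np

def assert_unique_idx(valid_idx):
    """Hypothetical helper (not part of fastai): reject repeated indexes."""
    idx = np.asarray(valid_idx)
    # Count how often each index value occurs.
    uniq, counts = np.unique(idx, return_counts=True)
    dups = uniq[counts > 1]
    if len(dups) > 0:
        raise ValueError(
            f"valid_idx contains {len(dups)} duplicated index value(s), "
            f"e.g. {dups[:5].tolist()}"
        )
    return idx
```

Until something like this exists in the library, calling a check of this sort on the index array before splitting would at least surface the problem early, for example assert_unique_idx(valid_idx) right before the split_by_valid_idx call.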