"Unknown labels" in my dataset using from_name_re (Lesson 1)

I’ve downloaded some data from the Google Landmarks dataset. Following along with Lesson 1, I created a data bunch for the images using from_name_re, but I got this warning:

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py:537: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
97308, 61555, 162488, 171215, 83111...
  if getattr(ds, 'warn', False): warn(ds.warn)

The interesting thing is that every time I run it, the set of “unknown labels” changes. I also tried logging len(data.classes) (where data is the resulting image bunch), and I found that its length changes every time I reload the data, though it varies by at most about 20. I dug into the fastai source, and it’s still not clear to me why this would be happening: it seems like this is just a list of places where the processor/importer failed, right? But I’d expect those failures to be deterministic; it’s not like I’m changing the files or their names in between each load!

Here are some example filenames. The first segment is the ID, and the segment after the underscore is the label:

array([PosixPath('/storage/landmarks_training/b41909f546b2ea13_90647.jpg'),
       PosixPath('/storage/landmarks_training/2e17429a187c4f6b_151454.jpg'),
       PosixPath('/storage/landmarks_training/ac0d7a4f5761d18d_29781.jpg'),
       PosixPath('/storage/landmarks_training/9579d7a6681fcf15_59497.jpg'), ...,
       PosixPath('/storage/landmarks_training/4f5d62e32d3dea03_166120.jpg'),
       PosixPath('/storage/landmarks_training/67ec63c491e0d821_15513.jpg'),
       PosixPath('/storage/landmarks_training/8b029a3b836ebc99_132385.jpg'),
       PosixPath('/storage/landmarks_training/cbd8932ce04d577e_66445.jpg')], dtype=object)
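
As a sanity check on the filename format, here is a minimal sketch of pulling the label out of these names with plain Python, independent of fastai. The helper name extract_label is hypothetical, and the sketch assumes the pattern is applied with re.search to the full path string; I haven't confirmed that this is exactly what from_name_re does internally.

```python
import re
from pathlib import PurePosixPath

# Same idea as the pattern passed to from_name_re, with the dot before "jpg" escaped.
LABEL_RE = re.compile(r'/.*_(\d+)\.jpg$')

def extract_label(path):
    """Hypothetical helper: return the captured label, or None if the path does not match."""
    match = LABEL_RE.search(str(path))
    return match.group(1) if match else None

# The first four filenames from the listing above.
sample_paths = [
    PurePosixPath('/storage/landmarks_training/b41909f546b2ea13_90647.jpg'),
    PurePosixPath('/storage/landmarks_training/2e17429a187c4f6b_151454.jpg'),
    PurePosixPath('/storage/landmarks_training/ac0d7a4f5761d18d_29781.jpg'),
    PurePosixPath('/storage/landmarks_training/9579d7a6681fcf15_59497.jpg'),
]

labels = [extract_label(p) for p in sample_paths]
print(labels)  # ['90647', '151454', '29781', '59497']
```

Run on its own, this gives the same labels every time, so the extraction step by itself looks deterministic for these names.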

And here’s the code that creates the data bunch:

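import re
from fastai.vision import *  # as in Lesson 1; provides Path, get_image_files, ImageDataBunch

img_path = Path('/storage/landmarks_training')
fnames = get_image_files(img_path)
# The label is the run of digits between the last underscore and ".jpg"
data = ImageDataBunch.from_name_re(
    img_path,
    fnames,
    re.compile(r'/.*_(\d+)\.jpg$'),
)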

Any ideas?