"Unknown labels" in my dataset using from_name_re (Lesson 1)

I’ve downloaded some data from the Google Landmarks dataset. Following along with Lesson 1, I created a data bunch for the images using from_name_re, but I got a warning:

/opt/conda/envs/fastai/lib/python3.6/site-packages/fastai/data_block.py:537: UserWarning: You are labelling your items with CategoryList.
Your valid set contained the following unknown labels, the corresponding items have been discarded.
97308, 61555, 162488, 171215, 83111...
  if getattr(ds, 'warn', False): warn(ds.warn)

The interesting thing is that every time I run it, the set of “unknown labels” changes. I also tried logging len(data.classes) (where data is the resulting image data bunch), and its length changes every time I reload the data, though only by about 20 at most. I dug into the fastai source, and it’s still not clear to me why this would be happening: it looks like this is just a list of labels the processor couldn’t handle, right? But I’d expect those failures to be deterministic; it’s not like I’m changing the files or their names between loads!
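
One hypothesis I can test without fastai: if the train/validation split is drawn at random on each load, then labels that have only one or two images could land entirely in the validation set on some runs and not on others. The sketch below is just a toy simulation of that idea, not fastai's actual code; the function unknown_labels, the valid_pct value of 0.2, and the made-up label counts are all my own assumptions for illustration.

```python
import random

def unknown_labels(labels, valid_pct=0.2, seed=None):
    """Randomly split items and return (train label set, labels seen only in valid)."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    cut = int(valid_pct * len(labels))
    valid = {labels[i] for i in indices[:cut]}
    train = {labels[i] for i in indices[cut:]}
    return train, valid - train

# Toy data (hypothetical): 50 common labels with 20 images each,
# plus 200 rare labels with a single image each.
labels = [f"common_{i}" for i in range(50) for _ in range(20)]
labels += [f"rare_{i}" for i in range(200)]

for seed in range(3):
    train, unknown = unknown_labels(labels, seed=seed)
    # Both the number of training classes and the unknown set depend on the seed.
    print(seed, len(train), sorted(unknown)[:5])
```

If this is what is going on, fixing the random seed before building the data bunch should make both the "unknown labels" and len(data.classes) stable between runs, though I haven't confirmed that.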

Here are some example filenames. The first segment is the ID, and the segment after the underscore is the label:

       PosixPath('/storage/landmarks_training/9579d7a6681fcf15_59497.jpg'), ...,
       PosixPath('/storage/landmarks_training/cbd8932ce04d577e_66445.jpg')], dtype=object)
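
To make the naming scheme concrete, here is a small standalone sketch of how the label can be pulled out of such a filename with a regular expression. The pattern and the helper label_from_path are written for this illustration and are assumptions based only on the two filenames shown above (digits between the last underscore and ".jpg"); they are not necessarily the pattern used in my actual call.

```python
import re
from pathlib import PurePosixPath

# Assumed pattern: the label is the run of digits between "_" and ".jpg".
LABEL_PATTERN = re.compile(r"_(\d+)\.jpg$")

def label_from_path(path):
    """Return the label segment of a filename like '<id>_<label>.jpg'."""
    match = LABEL_PATTERN.search(str(path))
    if match is None:
        raise ValueError(f"no label found in {path}")
    return match.group(1)

examples = [
    PurePosixPath('/storage/landmarks_training/9579d7a6681fcf15_59497.jpg'),
    PurePosixPath('/storage/landmarks_training/cbd8932ce04d577e_66445.jpg'),
]
extracted = [label_from_path(p) for p in examples]
print(extracted)  # ['59497', '66445']
```

Since the same regex applied to the same filenames always yields the same labels, the parsing step alone should not be able to produce different results between runs.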

And here’s the code that creates the data bunch:

img_path = Path('/storage/landmarks_training')
fnames = get_image_files(img_path)
data = ImageDataBunch.from_name_re(

Any ideas?