Is batches.filenames supposed to be sorted?

I’m at the beginning of the course, about to finish the lesson 1 homework. I successfully submitted Dogs vs Cats and got into the top 50%, but I noticed that my batches.filenames are sorted:
[‘unknown\1.jpg’, ‘unknown\10.jpg’, ‘unknown\100.jpg’, ‘unknown\1000.jpg’, ‘unknown\10000.jpg’]
and not in random order, as in Jeremy’s notebook:
[‘unknown/9292.jpg’, ‘unknown/12026.jpg’, ‘unknown/9688.jpg’, ‘unknown/4392.jpg’, ‘unknown/779.jpg’].

I didn’t give it much thought at first because I got good results, but now that I’m trying State Farm I get really bad results, with validation probabilities skewed towards c8 and c9. The only explanation I can think of is that, because those classes come last in training, the network learns to favor them.

Is the sorting really a problem? What can I do about it?

P.S. I know that there is a course notebook for State Farm, but I’m trying to do this myself first.


You should turn on shuffling for your data generators for training, especially if there are groupings in the data when it is sorted.
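To illustrate why grouping matters, here is a minimal pure-Python sketch, not the course's get_batches or any Keras code. The file names, the batches helper, and the fixed seed are all made up for the example. It assumes a file list that is sorted by class, and shows that without shuffling every batch contains a single class, so the last batches of an epoch are all from the last class.

```python
import random

# Hypothetical file list, sorted by class: 3 classes with 4 images each.
filenames = [f"c{c}/img_{i}.jpg" for c in range(3) for i in range(4)]

def batches(items, batch_size, shuffle, seed=0):
    """Yield the batches of one epoch, optionally in shuffled order."""
    order = list(range(len(items)))
    if shuffle:
        # Permute the indices only; the items list itself is left untouched.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]

def classes_in(batch):
    """Return the set of class folders that appear in a batch."""
    return {name.split("/")[0] for name in batch}

sorted_batches = list(batches(filenames, 4, shuffle=False))
shuffled_batches = list(batches(filenames, 4, shuffle=True))

print([sorted(classes_in(b)) for b in sorted_batches])
print([sorted(classes_in(b)) for b in shuffled_batches])
```

Note that in this sketch the file list stays sorted even when shuffling is on, because only the index order is permuted. Whether your generator behaves the same way is an assumption worth checking: looking at the class labels of a few batches it actually yields tells you more than looking at batches.filenames.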

How do I do that? It seems that get_batches has shuffle=True by default.
Is there any other shuffle option?