Lesson 3: Really high validation accuracy for State Farm?

I’m working through the assignment for lecture three. I first tried my own solution and then the statefarm-sample notebook, and in both cases I get much higher validation accuracy for the linear model than the example does. Since the only thing I do differently from statefarm-sample is how I create the validation set, I assume I’m making a mistake there.

For example, this is the output provided in the sample notebook:

Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.3763 - acc: 0.5816 - val_loss: 2.5994 - val_acc: 0.2884
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 1.0961 - acc: 0.7136 - val_loss: 1.9945 - val_acc: 0.3902
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.9395 - acc: 0.7730 - val_loss: 1.9828 - val_acc: 0.3822
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.7894 - acc: 0.8323 - val_loss: 1.8041 - val_acc: 0.3962

and this is what I get:

Epoch 1/4
1500/1500 [==============================] - 49s - loss: 1.3822 - acc: 0.5973 - val_loss: 1.7339 - val_acc: 0.4060
Epoch 2/4
1500/1500 [==============================] - 40s - loss: 1.1753 - acc: 0.6687 - val_loss: 1.3550 - val_acc: 0.5250
Epoch 3/4
1500/1500 [==============================] - 39s - loss: 0.9793 - acc: 0.7573 - val_loss: 1.1352 - val_acc: 0.6350
Epoch 4/4
1500/1500 [==============================] - 39s - loss: 0.8338 - acc: 0.8040 - val_loss: 1.0000 - val_acc: 0.6920

I use the following code to create the validation set. Can anyone spot a glaring error? Thanks!

import os
import shutil

import numpy as np

def move_pct(srcdir, dstdir, pct, copy=False):
    """
    For a given source directory and destination directory,
    move or copy a percentage of the files from the source
    to the destination
    """

    # List the files in the source directory
    files_in_dir = os.listdir(srcdir)

    # Count how many files are in the source directory
    num_files = len(files_in_dir)

    print("{nm} files in source directory {src}".format(nm=str(num_files),
                                                       src=srcdir))
    # Get the number of files we're going to move into the destination directory
    num_to_move = np.floor(num_files * (1.0*pct)/100)
    print("moving {nm} from {src} to {dst}".format(src=srcdir,
                                                   nm=str(num_to_move),
                                                   dst=dstdir))

    # Sample without replacement so no file is picked twice
    # (the default, replace=True, can return duplicate indices)
    selected_files = [files_in_dir[x] for x in np.random.choice(num_files,
                                                                int(num_to_move),
                                                                replace=False).tolist()]

    # create dstdir if not exists
    if not os.path.isdir(dstdir):
        os.mkdir(dstdir)

    try:
        if copy:
            for elem in set(selected_files):
                shutil.copy(os.path.join(srcdir, elem),
                            os.path.join(dstdir, elem))
        else:
            for elem in set(selected_files):
                shutil.move(os.path.join(srcdir, elem),
                            os.path.join(dstdir, elem))
    except Exception as e:
        print(e)
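One detail worth checking in a split like this: `np.random.choice(n, k)` draws with replacement unless `replace=False` is passed, so the same index can come up more than once, and wrapping the result in `set()` then moves fewer files than the printed count. A small illustration follows; the helper name `pick_files` and the fixed seed are assumptions made for the example, not part of the notebook.

```python
import numpy as np

def pick_files(files, pct, seed=0):
    """Hypothetical helper: choose floor(len(files) * pct / 100) distinct files."""
    k = int(np.floor(len(files) * pct / 100))
    rng = np.random.default_rng(seed)  # seeded so the split is reproducible
    # replace=False guarantees k distinct indices, so no file is chosen twice
    idx = rng.choice(len(files), size=k, replace=False)
    return [files[i] for i in idx]

files = ["img_%d.jpg" % i for i in range(2489)]
chosen = pick_files(files, 5)
print(len(chosen), len(set(chosen)))  # 124 124

# For comparison: drawing with replacement (the default) usually repeats some
# indices, so the number of unique picks is usually a little below 124
with_dupes = np.random.default_rng(0).choice(len(files), size=124)
print(len(set(with_dupes.tolist())))
```

With 124 draws out of 2489 files, a repeat is more likely than not, so the shortfall is small but real.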

Here is the wrapper function that loops over each label:

def sfd(srcdir, validdir):
    """
    Loop over the available labels, moving a percentage of each into the validation directory
    """

    # enumerate labels
    labels = os.listdir(srcdir)

    # create validdir if not exists
    if not os.path.isdir(validdir):
        os.mkdir(validdir)

    for elem in labels:
        move_pct(os.path.join(srcdir, elem),
                 os.path.join(validdir, elem), 5, copy=False)

    print("complete")
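To confirm that a split like this left no file in both places, a quick check can count the files per label on each side and list any file name that appears in both train and valid. This is a sketch; `check_split` is a hypothetical helper, and it only detects identical file names, not visually similar images.

```python
import os

def check_split(traindir, validdir):
    """Return per-label (train, valid) counts and any file names found in both."""
    counts = {}
    overlap = {}
    for label in sorted(os.listdir(validdir)):
        train_files = set(os.listdir(os.path.join(traindir, label)))
        valid_files = set(os.listdir(os.path.join(validdir, label)))
        counts[label] = (len(train_files), len(valid_files))
        # Any name in both sets means the file was copied rather than moved
        shared = train_files & valid_files
        if shared:
            overlap[label] = sorted(shared)
    return counts, overlap
```

Calling `check_split('data/state/train', 'data/state/valid')` after the files have been moved should return an empty `overlap` dict.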

And finally the invocation:

sfd('data/state/train', 'data/state/valid')

After I unzip imgs.zip into a new directory, running this produces the following output:

2489 files in source directory data/state/train/c0
moving 124.0 from data/state/train/c0 to data/state/valid/c0
2267 files in source directory data/state/train/c1
moving 113.0 from data/state/train/c1 to data/state/valid/c1
2317 files in source directory data/state/train/c2
moving 115.0 from data/state/train/c2 to data/state/valid/c2
2346 files in source directory data/state/train/c3
moving 117.0 from data/state/train/c3 to data/state/valid/c3
2326 files in source directory data/state/train/c4
moving 116.0 from data/state/train/c4 to data/state/valid/c4
2312 files in source directory data/state/train/c5
moving 115.0 from data/state/train/c5 to data/state/valid/c5
2325 files in source directory data/state/train/c6
moving 116.0 from data/state/train/c6 to data/state/valid/c6
2002 files in source directory data/state/train/c7
moving 100.0 from data/state/train/c7 to data/state/valid/c7
1911 files in source directory data/state/train/c8
moving 95.0 from data/state/train/c8 to data/state/valid/c8
2129 files in source directory data/state/train/c9
moving 106.0 from data/state/train/c9 to data/state/valid/c9
complete
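As a quick check on the log above: each "moving" count is floor(n × 5 / 100) for that class's n files, so the validation set should hold 1117 of the 22424 training images, just under 5%. The class sizes below are copied from the output; the rest is plain arithmetic.

```python
import math

# Files per class in data/state/train, copied from the output above
class_sizes = {
    "c0": 2489, "c1": 2267, "c2": 2317, "c3": 2346, "c4": 2326,
    "c5": 2312, "c6": 2325, "c7": 2002, "c8": 1911, "c9": 2129,
}
pct = 5

# Same rule as move_pct: floor(n * pct / 100) files per class
to_move = {c: math.floor(n * pct / 100) for c, n in class_sizes.items()}

total = sum(class_sizes.values())
moved = sum(to_move.values())
print(to_move["c0"], moved, total)  # 124 1117 22424
```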