I’m working through the assignment for lecture three. After trying my own solution and then the statefarm-sample notebook, I’m still getting much higher validation accuracy for the linear model than the example does. Since the only step in statefarm-sample that I have to do myself is creating the validation set, I’m assuming I’m making a mistake there.
For example, this is the output provided in the sample notebook:
Epoch 1/4
1568/1568 [==============================] - 7s - loss: 1.3763 - acc: 0.5816 - val_loss: 2.5994 - val_acc: 0.2884
Epoch 2/4
1568/1568 [==============================] - 5s - loss: 1.0961 - acc: 0.7136 - val_loss: 1.9945 - val_acc: 0.3902
Epoch 3/4
1568/1568 [==============================] - 5s - loss: 0.9395 - acc: 0.7730 - val_loss: 1.9828 - val_acc: 0.3822
Epoch 4/4
1568/1568 [==============================] - 5s - loss: 0.7894 - acc: 0.8323 - val_loss: 1.8041 - val_acc: 0.3962
and this is what I get:
Epoch 1/4
1500/1500 [==============================] - 49s - loss: 1.3822 - acc: 0.5973 - val_loss: 1.7339 - val_acc: 0.4060
Epoch 2/4
1500/1500 [==============================] - 40s - loss: 1.1753 - acc: 0.6687 - val_loss: 1.3550 - val_acc: 0.5250
Epoch 3/4
1500/1500 [==============================] - 39s - loss: 0.9793 - acc: 0.7573 - val_loss: 1.1352 - val_acc: 0.6350
Epoch 4/4
1500/1500 [==============================] - 39s - loss: 0.8338 - acc: 0.8040 - val_loss: 1.0000 - val_acc: 0.6920
I use the following code to do it. Can anyone spot a glaring error? Thanks!
import os
import shutil

import numpy as np


def move_pct(srcdir, dstdir, pct, copy=False):
    """
    For a given source directory and destination directory,
    move or copy a percentage of the files from the source
    to the destination
    """
    # List the files in the source directory
    files_in_dir = os.listdir(srcdir)
    # Count how many files are in the source directory
    num_files = len(files_in_dir)
    print("{nm} files in source directory {src}".format(nm=str(num_files),
                                                         src=srcdir))
    # Get the number of files we're going to move into the destination directory
    num_to_move = np.floor(num_files * (1.0*pct)/100)
    print("moving {nm} from {src} to {dst}".format(src=srcdir,
                                                   nm=str(num_to_move),
                                                   dst=dstdir))
    # replace=False so that no file is picked twice; np.random.choice
    # samples with replacement by default, which would shrink the set below
    selected_files = [files_in_dir[x] for x in np.random.choice(num_files,
                                                                int(num_to_move),
                                                                replace=False).tolist()]
    # create dstdir if not exists
    if not os.path.isdir(dstdir):
        os.mkdir(dstdir)
    try:
        if copy:
            for elem in set(selected_files):
                shutil.copy(os.path.join(srcdir, elem),
                            os.path.join(dstdir, elem))
        else:
            for elem in set(selected_files):
                shutil.move(os.path.join(srcdir, elem),
                            os.path.join(dstdir, elem))
    except Exception as e:
        print(e)
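A side note on the sampling step in move_pct: np.random.choice draws with replacement unless replace=False is passed, so the same index can come up more than once and the set of selected files can end up smaller than the number requested. Here is a minimal sketch of the difference, using made-up file names (not the real dataset) and a fixed seed that is only there to make the demo repeatable:

```python
import numpy as np

# Hypothetical file names standing in for one class directory.
files = ["img_{}.jpg".format(i) for i in range(100)]
num_to_pick = 50

rng = np.random.RandomState(0)  # fixed seed, only for a repeatable demo

# Default behaviour: sampling WITH replacement, so duplicates are possible.
with_repl = [files[i] for i in rng.choice(len(files), num_to_pick)]

# replace=False: every index is drawn at most once.
without_repl = [files[i]
                for i in rng.choice(len(files), num_to_pick, replace=False)]

print(len(set(with_repl)), "unique files when sampling with replacement")
print(len(set(without_repl)), "unique files when sampling without replacement")
```

Sampling without replacement always gives exactly the requested number of distinct files, while sampling with replacement can give fewer.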
The meta-function that loops over each label:
def sfd(srcdir, validdir):
    """
    Loop over the available labels, moving a percentage of each into the validation directory
    """
    # enumerate labels
    labels = os.listdir(srcdir)
    # create validdir if not exists
    if not os.path.isdir(validdir):
        os.mkdir(validdir)
    for elem in labels:
        # move 5% of each label's files out of train and into valid
        move_pct(os.path.join(srcdir, elem),
                 os.path.join(validdir, elem), 5, copy=False)
    print("complete")
And finally the invocation:
sfd('data/state/train', 'data/state/valid')
This all creates the following output after I unzip imgs.zip into a new directory:
2489 files in source directory data/state/train/c0
moving 124.0 from data/state/train/c0 to data/state/valid/c0
2267 files in source directory data/state/train/c1
moving 113.0 from data/state/train/c1 to data/state/valid/c1
2317 files in source directory data/state/train/c2
moving 115.0 from data/state/train/c2 to data/state/valid/c2
2346 files in source directory data/state/train/c3
moving 117.0 from data/state/train/c3 to data/state/valid/c3
2326 files in source directory data/state/train/c4
moving 116.0 from data/state/train/c4 to data/state/valid/c4
2312 files in source directory data/state/train/c5
moving 115.0 from data/state/train/c5 to data/state/valid/c5
2325 files in source directory data/state/train/c6
moving 116.0 from data/state/train/c6 to data/state/valid/c6
2002 files in source directory data/state/train/c7
moving 100.0 from data/state/train/c7 to data/state/valid/c7
1911 files in source directory data/state/train/c8
moving 95.0 from data/state/train/c8 to data/state/valid/c8
2129 files in source directory data/state/train/c9
moving 106.0 from data/state/train/c9 to data/state/valid/c9
complete
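To double-check that the split did what the log claims, the files per label can be counted in both directories afterwards. This is only a sketch: count_per_class is a hypothetical helper name, and the demo builds a throwaway directory tree instead of touching data/state.

```python
import os
import tempfile


def count_per_class(root):
    """Return {label: number of entries} for each label subdirectory of root."""
    counts = {}
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if os.path.isdir(label_dir):
            counts[label] = len(os.listdir(label_dir))
    return counts


# Demo on a temporary tree with two made-up labels.
demo_root = tempfile.mkdtemp()
for label, n in [("c0", 3), ("c1", 2)]:
    os.makedirs(os.path.join(demo_root, label))
    for i in range(n):
        path = os.path.join(demo_root, label, "img_{}.jpg".format(i))
        open(path, "w").close()  # create an empty placeholder file

demo_counts = count_per_class(demo_root)
print(demo_counts)
```

On the real data this would be count_per_class('data/state/train') and count_per_class('data/state/valid'), and the validation counts can then be compared against the "moving N" lines in the log.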