Hi,
I might be a little late to the party on this, but I was having the same issue as a couple of the previous posters: I’m using my own PC (Intel i5-7600K, 16 GB RAM, GTX 1070), and found that the get_data()
function caused my RAM and swap to fill up rapidly. The get_data()
function’s overall output is just an array of features. So, instead of loading the entire image library into memory and predicting on the whole batch at once, why not load individual images, run the prediction for each one, and build the final prediction array piece by piece?
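For context, a rough back-of-the-envelope calculation (assuming 224×224 RGB images held as float32, which is what the VGG preprocessing produces) shows why loading everything at once overwhelms 16 GB of RAM:

```python
# memory needed just to hold every training image in RAM at once
n_images = 23000                     # dogs-vs-cats training set size
bytes_per_image = 224 * 224 * 3 * 4  # 224x224 RGB pixels, 4 bytes per float32
total_gb = n_images * bytes_per_image / 2**30
print(round(total_gb, 1))            # roughly 12.9 GB, before any model activations
```

That is nearly the whole 16 GB on its own, so spilling into swap is no surprise.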
To that end, I wrote a function which attempts to do this and then, matching the original Lesson 2 notebook, saves the final feature array to disk.
Please note that although I’m by no means new to programming (25+ years), this (fantastic) course is the first time I’ve ever programmed in Python, so it’s quite possible that the following code is not as efficient as it could be, since I’m not yet familiar with Python’s intricacies. It’s also possible (I hope not) that the final array is not “ordered” correctly (if that matters?).
Anyway, here’s the function and how to use it. Please feel free to take/improve as necessary. It also uses the Keras image module, so ensure that it’s imported too:
from keras.preprocessing import image

def get_features(dirname):
    trn_features = []
    # sorted() makes the file order deterministic (alphabetical), which
    # should match the ordering Keras uses when reading directories
    for dir in sorted(os.listdir(dirname)):
        subdir = os.path.join(dirname, dir)
        for file in sorted(os.listdir(subdir)):
            imgfile = os.path.join(subdir, file)
            # load and preprocess a single image
            test_image = image.load_img(imgfile, target_size=(224, 224))
            test_image = image.img_to_array(test_image)
            test_image = np.expand_dims(test_image, axis=0)
            # predict on a batch of one and keep just the feature vector
            result = model.predict(test_image, batch_size=1)
            trn_features.append(result[0])
    return np.array(trn_features)
To call it, simply pass the directory path; it works as a drop-in replacement for get_data():
trn_features = get_features(path+'train')
val_features = get_features(path+'valid')
It takes a while to process all 23,000 images, but it should (eventually) return an array with the same shape as trn_features
in the original Lesson 2 notebook:
trn_features.shape
(23000, 1000)
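Incidentally, the reason the shape comes out right is that each result[0] is a 1,000-element vector, and np.array() stacks the appended vectors into rows. A quick toy check with dummy predictions:

```python
import numpy as np

feats = []
for _ in range(5):
    result = np.zeros((1, 1000), dtype='float32')  # stand-in for model.predict() output
    feats.append(result[0])                        # drop the batch dimension

arr = np.array(feats)
print(arr.shape)  # (5, 1000)
```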
Then the array can be saved/loaded as required:
save_array(model_path+'train_lastlayer_features.bc', trn_features)
save_array(model_path+'valid_lastlayer_features.bc', val_features)
trn_features = load_array(model_path+'train_lastlayer_features.bc')
val_features = load_array(model_path+'valid_lastlayer_features.bc')
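As an aside, save_array()/load_array() are the bcolz-based helpers from the course’s utils.py. If bcolz ever gives you trouble, plain NumPy .npy files are a simple fallback — a sketch, with a dummy array and a temp directory standing in for the notebook’s model_path:

```python
import os
import tempfile
import numpy as np

# dummy feature array standing in for trn_features
trn_features = np.random.rand(100, 1000).astype('float32')

model_path = tempfile.mkdtemp()  # stand-in for the notebook's model_path
fname = os.path.join(model_path, 'train_lastlayer_features.npy')

np.save(fname, trn_features)  # save to disk
loaded = np.load(fname)       # load it back
```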
I hope that helps somebody!
Cheers,
Codegnosis.