Hey! I'm training an image classifier on my own data in Google Colab (fastai version 2.3.0), and I have two main problems:
- I have too much data to upload to Google Drive all at once (I connect Colab to Google Drive), so I had to split it randomly into 11 folders.
- Each epoch takes around 8 hours to run, so even with Colab Pro I can only do one epoch before I get disconnected. I need a way to save after each epoch and load again to run the next epoch or the next folder.
I think it's possible to resume an epoch even if the runtime is disconnected (Resume training with fit_one_cycle), but what I really need is a reliable way to train, save, load, and train all over again so I can iterate over my multiple folders.
Here is my code:
os.chdir('/content/drive/MyDrive/Data1')
path = '/content/drive/MyDrive/Data1/train'
xray = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files(str(path + '/train')),
                 splitter=partial(GrandparentSplitter(),),
                 get_y=parent_label)
path = '/content/drive/MyDrive/Data1'
categ = os.listdir(os.path.join(path, "train"))
print(categ)
path_anno = str(path + '/' + 'annotations')
pre_path_img = []
def path_helper():
    for category in categ:
        pre_path_img.append(str(path + '/' + category))
    path_img = ''
    for pre in pre_path_img:
        path_img = path_img + ', ' + str(pre)
    return path_img
path_img = path_helper()
dls = ImageDataLoaders.from_folder(path, train='train', test='test', valid='val', bs=8)
dls.train_ds.items[:3]
dls.valid.show_batch(max_n=8, nrows=2)
learn = cnn_learner(dls, resnet50, metrics=error_rate)
learn.fine_tune(1)
learn.export(fname='export1.pkl')
# THIS IS WHERE PROBLEMS BEGIN
learn.load("/content/drive/MyDrive/Data1/models/export1.pkl")
learn.fine_tune(1)
learn.export(fname='export2.pkl')
The first time I ran learn.fine_tune(1) it worked fine, and the model was exported as export1.pkl. However, when I try to load export1.pkl it gives this error:
FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/Data1/models/export1.pkl.pth'
From what I can tell, it expects the model to be saved as a .pth file.
Seeing this, I tried the same thing with an older project in which I trained everything in one go. That trained model was saved as "/content/drive/MyDrive/model1.pth", and I tried:
learn = cnn_learner(dls, resnet50, pretrained=True, metrics=error_rate)#.to_fp16()
learn.load("/content/drive/MyDrive/model1")
and got the following error:
RuntimeError: Error(s) in loading state_dict for Sequential:
Missing key(s) in state_dict: "0.0.weight", "0.1.weight", "0.1.bias", "0.1.running_mean", "0.1.running_var", "0.4.0.conv1.weight", "0.4.0.bn1.weight", "0.4.0.bn1.bias", "0.4.0.bn1.running_mean", "0.4.0.bn1.running_var", "0.4.0.conv2.weight", "0.4.0.bn2.weight", "0.4.0.bn2.bias", "0.4.0.bn2.running_mean", "0.4.0.bn2.running_var", "0.4.0.conv3.weight", "0.4.0.bn3.weight", "0.4.0.bn3.bias", "0.4.0.bn3.running_mean", "0.4.0.bn3.running_var", "0.4.0.downsample.0.weight", "0.4.0.downsample.1.weight", "0.4.0.downsample.1.bias", "0.4.0.downsample.1.running_mean", "0.4.0.downsample.1.running_var", "0.4.1.conv1.weight", "0.4.1.bn1.weight", "0.4.1.bn1.bias", "0.4.1.bn1.running_mean", "0.4.1.bn1.running_var", "0.4.1.conv2.weight", "0.4.1.bn2.weight", "0.4.1.bn2.bias", "0.4.1.bn2.running_mean", "0.4.1.bn2.running_var", "0.4.1.conv3.weight", "0.4.1.bn3.weight", "0.4.1.bn3.bias", "0.4.1.bn3.running_mean", "0.4.1.bn3.running_var", "0.4.2.conv1.weight", "0.4.2.bn1.weight", "0.4.2.bn1.bias", "0.4.2.bn1.running_mean", "0.4.2.bn1.running_var", "0.4.2.conv2.weight", "0.4.2.bn2.weight", "0.4.2.bn2.bias", "0.4.2.bn2.running_mean", "0.4.2.bn2.running_var", "0.4.2.conv3.weight", "0.4.2.bn3.weight", "0.4.2.bn3.bias", "0.4.2.bn3.running_mean", "0.4.2.bn3.running_var", "0.5.0.conv1.weight", "0.5.0.bn1.weight", "0.5.0.bn1.bias", "0.5.0.bn1.running_mean", "0.5.0.bn1.running_var", "0.5.0.conv2.weight", "0.5.0.bn2.weight", "0.5.0.bn2.bias", "0.5.0.bn2.running_mean", "0.5.0.bn2.running_var", "0.5.0.conv3…
Unexpected key(s) in state_dict: "i_h.weight", "rnn.weight_ih_l0", "rnn.weight_hh_l0", "rnn.bias_ih_l0", "rnn.bias_hh_l0", "rnn.weight_ih_l1", "rnn.weight_hh_l1", "rnn.bias_ih_l1", "rnn.bias_hh_l1", "h_o.weight", "h_o.bias".
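To check my reading of this error, I reproduced it in plain PyTorch with toy modules named after the keys in the traceback. The unexpected keys (i_h, rnn, h_o) look like an RNN/language model rather than a CNN, which makes me suspect model1.pth came from a different architecture than the cnn_learner I'm loading it into (the shapes below are arbitrary; only the key names matter):

```python
# Minimal reproduction of the state_dict mismatch: a checkpoint with
# RNN-style keys (i_h, rnn, h_o) loaded into a CNN-style Sequential.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in model with the key names from my traceback (toy sizes)."""
    def __init__(self):
        super().__init__()
        self.i_h = nn.Embedding(10, 8)
        self.rnn = nn.RNN(8, 8, num_layers=2)
        self.h_o = nn.Linear(8, 10)

lm_state = TinyLM().state_dict()                 # keys: i_h.*, rnn.*, h_o.*

cnn = nn.Sequential(nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4))

err = None
try:
    cnn.load_state_dict(lm_state)                # wrong architecture -> RuntimeError
except RuntimeError as e:
    err = str(e)

print('Missing key(s)' in err, 'Unexpected key(s)' in err)  # → True True
```

So if I understand correctly, learn.load only restores weights into a model with the exact same architecture as the one that produced the checkpoint — is that right?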
I know that I probably just don't understand what's happening behind the scenes. Could anyone give me a hint?