Hello all,
My goal is to train an image-culling model. I want to run inference on a batch of photos and have the model predict which photos would survive the cull based on previous examples.
The data consists of 153,498 photos extracted from a private database, of which 55,829 survived the cull and made it into reports. The photos come from roughly 300 projects; each project has about 12 reports, and each report has anywhere from 12 to 300 photos to cull.
Initially, with 69,557 photos (22,931 survivors), I achieved 70% accuracy with the resnet34 architecture, and then 72% with 'convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384'. Changing the transformations made little difference. (I would like to keep as much of each photo as possible, using zero padding rather than cropping.)
After doubling the data, the convnext-based model only reached around 65% accuracy, down from 72%.
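For context on these numbers: 55,829 of 153,498 photos survived (about 36.4%), so a model that always predicts "culled" already scores about 63.6% accuracy, which means 65% is barely above the majority-class baseline. Quick sanity check (arithmetic only, using the counts above):

```python
total = 153_498      # all photos
survived = 55_829    # photos that made it into reports

survival_rate = survived / total
majority_baseline = 1 - survival_rate  # accuracy of always predicting "culled"

print(f"survival rate: {survival_rate:.3f}")                # 0.364
print(f"majority-class baseline: {majority_baseline:.3f}")  # 0.636
```

So accuracy on its own is hard to interpret here; per-class metrics (or balanced accuracy) would say more.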
I used the default learning rate finder. I did not save the graph, but it selected a point on a positive slope, which I found odd.
I am currently retraining the resnet34 architecture, and I am already starting off with a worse first-epoch accuracy of 59%, as opposed to 65% previously.
I hypothesize that the problem is a lack of context for any given photo: since I do not know how to train on grouped data (report by report, instead of all photos of all reports at once), the model just peels off one photo at a time and guesses whether it survived, with no knowledge of the other photos in its report, until the 150,000th photo is processed. I am preparing a forum post asking about sequence-based models, or how to assign batches based on project and report numbers.
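Related to that hypothesis: `RandomSplitter` also mixes photos from the same report into both train and validation, so the validation score may be optimistic. One fix I'm considering is splitting by project, so that whole projects land on one side; the resulting index lists can then be fed to fastai's `IndexSplitter`. A self-contained sketch with placeholder project names (not my real data):

```python
import random

# Placeholder rows: (project name, filename); the real rows would come
# from the dataframe's 'Project Name' column
rows = [
    ("ProjectA", "img001.jpg"), ("ProjectA", "img002.jpg"),
    ("ProjectB", "img003.jpg"), ("ProjectB", "img004.jpg"),
    ("ProjectC", "img005.jpg"), ("ProjectC", "img006.jpg"),
]

def group_split(rows, valid_frac=0.2, seed=42):
    """Return (train_idx, valid_idx) so that every project's photos
    fall entirely on one side of the split."""
    projects = sorted({project for project, _ in rows})
    random.Random(seed).shuffle(projects)
    n_valid = max(1, int(len(projects) * valid_frac))
    valid_projects = set(projects[:n_valid])
    train_idx = [i for i, (p, _) in enumerate(rows) if p not in valid_projects]
    valid_idx = [i for i, (p, _) in enumerate(rows) if p in valid_projects]
    return train_idx, valid_idx

train_idx, valid_idx = group_split(rows)
train_projects = {rows[i][0] for i in train_idx}
valid_projects = {rows[i][0] for i in valid_idx}
print(train_projects & valid_projects)  # set() -- no project leaks across the split
```

In the DataBlock this would mean swapping `RandomSplitter(...)` for `IndexSplitter(valid_idx)`, with the indices computed from the 'Project Name' column of the sorted dataframe.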
Here is my latest code:
from fastai.vision.all import *
import os
import pandas as pd

df = pd.read_csv('~/data/filename.csv')
print(df.head())
# Work on a copy to avoid mutating the original dataframe
to_be_culled_df = df.copy()
# Extract filename from 'Image Path'
to_be_culled_df['filename'] = to_be_culled_df['Image Path'].apply(lambda x: os.path.split(x)[-1])
# Now sort by 'Project Name', 'Report Number', and 'filename'
to_be_culled_df = to_be_culled_df.sort_values(by=['Project Name', 'Report Number', 'filename'])
# Reset the index of the dataframe
to_be_culled_df = to_be_culled_df.reset_index(drop=True)
def get_x(r): return os.path.expanduser(r['Image Path'])
def get_y(r): return r['Survived']
dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_x=get_x,
    get_y=get_y,
    item_tfms=[Resize(460, method='pad', pad_mode='zeros')],
    batch_tfms=aug_transforms(do_flip=True, flip_vert=True, max_rotate=0.0, max_zoom=1.0),
)
# Create the DataLoaders
dls = dblock.dataloaders(to_be_culled_df, bs=16)
# Display a batch to check if everything is alright
dls.show_batch(max_n=4, nrows=1)
print(torch.cuda.is_available())
print(torch.version.cuda)
print(torch.__version__)
# Define the learner (cnn_learner is deprecated in favor of vision_learner)
learn = vision_learner(dls, resnet34, metrics=accuracy)
# Find a learning rate; lr_find returns a SuggestedLRs namedtuple,
# so take the suggested value itself rather than passing the tuple around
lr_min = learn.lr_find(show_plot=True).valley
print(f"The suggested learning rate is: {lr_min}")
# timm is required for the convnext architectures, so install it before use
get_ipython().system('pip install timm')
import timm
print(timm.list_models('convnext*'))

# Alternative learners I tried (each replaces the previous `learn`):
learn = vision_learner(dls, 'convnext_base', metrics=error_rate).to_fp16()
learn = vision_learner(dls, 'convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384', metrics=error_rate).to_fp16()
# Custom timm model: let timm build the classifier head directly. (With
# num_classes=0 timm strips the head, and assigning model.fc adds a layer
# that convnext's forward() never calls -- its classifier lives at head.fc --
# so the replacement layer was never actually used.)
def create_custom_model(pretrained=True, n_out=dls.c, **kwargs):
    return timm.create_model('convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384',
                             pretrained=pretrained, num_classes=n_out, **kwargs)

# Wrap the complete model in a plain Learner, since it already has its head
learn = Learner(dls, create_custom_model(), metrics=error_rate).to_fp16()
# Or manually set a learning rate based on the plot:
# lr_manual = 1e-2

# Train the model
learn.fit_one_cycle(4, lr_min)
# learn.fit_one_cycle(4, lr_manual)
# Save the model
learn.save('model_name')
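Finally, since the actual cull happens report by report, I am thinking of treating inference as ranking within each report: score every photo, then keep the top fraction per report, rather than thresholding each photo independently. A self-contained sketch of that selection step (photo ids and scores are made up; 0.36 is roughly the overall survival rate in my data):

```python
def keep_top_fraction(scores, keep_frac=0.36):
    """Given (photo_id, survival_probability) pairs for ONE report,
    return the ids of the top keep_frac photos by score."""
    n_keep = max(1, round(len(scores) * keep_frac))
    ranked = sorted(scores, key=lambda s: s[1], reverse=True)
    return [photo_id for photo_id, _ in ranked[:n_keep]]

# Made-up example report: 5 photos with model scores
report = [("a.jpg", 0.9), ("b.jpg", 0.2), ("c.jpg", 0.7),
          ("d.jpg", 0.4), ("e.jpg", 0.1)]
print(keep_top_fraction(report))  # ['a.jpg', 'c.jpg']
```

The per-photo probabilities themselves would come from the trained learner (e.g. via `learn.get_preds` on a test DataLoader), grouped by the 'Report Number' column.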