I doubled my data (~150,000 photos) and the accuracy got slightly worse

Hello all,

My goal is to train an image-culling model. I want to run inference on a batch of photos and have the model predict which photos would survive the cull based on previous examples.

The data consists of 153,498 photos that I extracted from a private database, of which 55,829 survived to make it into reports. The photos come from ~300 projects; each project has ~12 reports, and each report has anywhere from 12 to 300 photos to cull.

Initially, with 69,557 total photos of which 22,931 survived, I achieved 70% accuracy with the resnet34 architecture. I then achieved 72% using ‘convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384’. Changing the transformations made little difference. (I would like to keep as much of each photo as possible, using zero padding.)

After doubling the data, the convnext-based model achieved only around 65% accuracy, down from 72%.
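
As a reference point for these numbers, the class balance implied by the counts above sets a majority-class baseline: a model that always predicts "culled" would already score about 67% on the first dataset and about 64% on the doubled one. A minimal sketch of that arithmetic, using only the photo counts stated in this post (the function name is made up for illustration):

```python
def majority_baseline(total, survived):
    """Accuracy obtained by always predicting the more common class."""
    culled = total - survived
    return max(survived, culled) / total

# Counts taken from the post: (total photos, photos that survived the cull)
first = majority_baseline(69_557, 22_931)
doubled = majority_baseline(153_498, 55_829)

print(f"first dataset:   {first:.1%}")
print(f"doubled dataset: {doubled:.1%}")
```

Read against these baselines, 72% is about 5 points better than always guessing "culled" on the first dataset, while 65% is only about 1.4 points better on the doubled one.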

I used the default learning rate finder. I did not save the plot, but it selected a point on a positive slope, which I found odd.

I am currently running the resnet34 architecture again, and it is already starting off worse: 59% accuracy for the first epoch, as opposed to 65% previously.

I hypothesize that the problem is a lack of context for any given photo. I do not know how to train a time-series-based model (report by report, instead of all photos from all reports at once), so the model just peels off the top photo and guesses whether it survived, one photo at a time, until the 150,000th photo is processed. I am preparing a forum post asking about time-series-based models, or about how to assign batches based on project and report numbers.

Here is my latest code:

from fastai.vision.all import *
import pandas as pd
import os

df = pd.read_csv('~/data/filename.csv')
print(df.head())

to_be_culled_df = df

# Extract filename from 'Image Path'
to_be_culled_df['filename'] = to_be_culled_df['Image Path'].apply(lambda x: os.path.split(x)[-1])

# Now sort by 'Project Name', 'Report Number', and 'filename'
to_be_culled_df = to_be_culled_df.sort_values(by=['Project Name', 'Report Number', 'filename'])

# Reset the index of the dataframe
to_be_culled_df = to_be_culled_df.reset_index(drop=True)


def get_x(r): return os.path.expanduser(r['Image Path'])
def get_y(r): return r['Survived']

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   splitter=RandomSplitter(valid_pct=0.2, seed=42), 
                   get_x=get_x, 
                   get_y=get_y,
                   item_tfms=[Resize(460, method='pad', pad_mode='zeros')],
                   batch_tfms=aug_transforms(do_flip=True, flip_vert=True, max_rotate=0.0, max_zoom=1.0))

# Create a DataLoader
dls = dblock.dataloaders(to_be_culled_df, bs=16)

# Display a batch to check if everything is alright
dls.show_batch(max_n=4, nrows=1)

print(torch.cuda.is_available())
print(torch.version.cuda)
print(torch.__version__)

# Define a learner (cnn_learner is the older, deprecated name for vision_learner)
learn = vision_learner(dls, resnet34, metrics=accuracy)


# Find the optimal learning rate and plot the learning rate finder
lr_min = learn.lr_find(show_plot=True)
print(f"The suggested learning rate is: {lr_min}")

learn=vision_learner(dls,'convnext_base', metrics=error_rate).to_fp16()

get_ipython().system('pip install timm')
import timm

print(timm.list_models('convnext*'))

learn=vision_learner(dls,'convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384', metrics=error_rate).to_fp16()


def create_custom_model(pretrained=True, n_out=dls.c, **kwargs):
    # Let timm size the classifier head via num_classes. Assigning a new `model.fc`
    # has no effect here: ConvNeXt's forward pass does not use a top-level `fc` attribute.
    model = timm.create_model('convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384', pretrained=pretrained, num_classes=n_out, **kwargs)
    return model

learn = vision_learner(dls, create_custom_model, metrics=error_rate).to_fp16()

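# Run the learning rate finder on the learner that will actually be trained;
# the earlier suggestion was computed for the resnet34 learner.
lr_min = learn.lr_find(show_plot=True)

# Or manually set a learning rate based on the plot
# lr_manual = 1e-2

# Train the model. In recent fastai versions lr_find returns a SuggestedLRs
# named tuple, and `.valley` is its default suggestion.
learn.fit_one_cycle(4, lr_min.valley)
# learn.fit_one_cycle(4, lr_manual)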


# Save the model
learn.save('model_name')
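
On the question of assigning batches by project and report number: one possible starting point is to group the rows by report before anything is fed to a model. This is only a sketch under assumptions. The rows here are hypothetical dicts that reuse the column names from the CSV, `group_by_report` is a made-up helper, and how a model would consume each group is left open.

```python
from collections import defaultdict

def group_by_report(rows):
    """Map (project, report) -> list of image paths, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Project Name"], row["Report Number"])
        groups[key].append(row["Image Path"])
    return dict(groups)

# Hypothetical rows, using the column names from the CSV above.
rows = [
    {"Project Name": "P1", "Report Number": 1, "Image Path": "p1_r1_a.jpg"},
    {"Project Name": "P1", "Report Number": 1, "Image Path": "p1_r1_b.jpg"},
    {"Project Name": "P1", "Report Number": 2, "Image Path": "p1_r2_a.jpg"},
    {"Project Name": "P2", "Report Number": 1, "Image Path": "p2_r1_a.jpg"},
]

report_groups = group_by_report(rows)
for (project, report), paths in report_groups.items():
    print(project, report, len(paths))
```

With the DataFrame above, `to_be_culled_df.to_dict("records")` should produce rows in this shape.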

I think your project is interesting. I read through your issue and code. I cannot give you exact reasons for the performance right now, but I have some hypotheses and suggestions.

  1. I think you need to change how you create the validation set. Jeremy has focused on this a lot. Random splitting is not a good option in my view. You should create the validation set at the project level: for example, the images from 50 projects for training and the images from 15 projects for validation. The reason is that when you use this model in the real world, it will not have seen any of those images before. With random splitting, photos from the same project and report land in both the training and validation sets, which causes data leakage. I think your performance would drop below 72% with this change, but it would be a more ‘true’ reflection of the performance.

  2. Are these images large? When you downsize them (say to 460), many important details are lost, and those details could be exactly what professional photographers base their decisions on. Maybe you can use an image tiling approach to preserve the information in the high-resolution originals.

  3. Image culling is very subjective in nature. If you have details about the person or people doing the culling and concatenate this information with the images, it can act as a strong prior. The network may learn to recognize individual styles and preferences. (This will not produce a generalizable model that can be used in the wild, but it could very well be used within a single firm whose data is available to you.)
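
Suggestion 1 could be sketched roughly as below. This is a minimal illustration, not part of fastai: `split_by_project` is a made-up helper, and the project names in the example are placeholders. It shuffles the distinct projects and holds out a fraction of them, so no project contributes photos to both sets.

```python
import random

def split_by_project(project_names, valid_pct=0.2, seed=42):
    """Return (train_idx, valid_idx) such that no project appears in both."""
    projects = sorted(set(project_names))
    rng = random.Random(seed)
    rng.shuffle(projects)
    # Hold out at least one whole project for validation.
    n_valid = max(1, int(len(projects) * valid_pct))
    valid_projects = set(projects[:n_valid])
    train_idx = [i for i, p in enumerate(project_names) if p not in valid_projects]
    valid_idx = [i for i, p in enumerate(project_names) if p in valid_projects]
    return train_idx, valid_idx

# Placeholder project labels, one per photo row.
names = ["A", "A", "B", "C", "C", "D", "E"]
train_idx, valid_idx = split_by_project(names)
print(len(train_idx), len(valid_idx))
```

With the posted code, the project list could come from `to_be_culled_df['Project Name'].tolist()`, and the resulting `valid_idx` could be passed to fastai's `IndexSplitter` in place of `RandomSplitter`. That should work because the DataFrame index was reset, but it is worth checking against your fastai version.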

These are my theories, based on my understanding and on past experience with a subjective dataset where I was trying to predict emotion from a person's face. Simply using the image didn’t get me good results in that case. But when I extracted additional features from the faces, such as facial action units, and concatenated them with the image features, I saw a boost in performance. Good luck with your project.