Pupil detection


I have annotated approximately 200 images in order to train a model to detect pupils. Each image is annotated with two points, one per pupil. Since the dataset is small, I wanted to augment it the same way it is augmented for object detection. To transform the dependent variables correctly, I converted the points to bounding boxes: for every pupil at (x, y), I created a bounding box with y_min = y - image_height/20, x_min = x - image_width/20, y_max = y + image_height/20, x_max = x + image_width/20.
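For reference, the point-to-box conversion can be sketched like this (the function and variable names are my own, not from fast.ai, and I assume the points are stored as (y, x) pixel coordinates):

```python
import torch

def points2bb(points, image_height, image_width):
    """Turn pupil centre points into fixed-size bounding boxes.

    points: tensor of shape (n, 2), one (y, x) pair per pupil.
    Returns boxes of shape (n, 4) as (y_min, x_min, y_max, x_max).
    """
    half_h = image_height / 20
    half_w = image_width / 20
    y, x = points[:, 0], points[:, 1]
    return torch.stack((y - half_h, x - half_w, y + half_h, x + half_w), dim=1)
```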

After visually inspecting the dataset, the bounding boxes looked OK:

In order to train the model, I reshape the output, which represents the 2 bounding boxes, back into 2 points:

import torch

def bb2centroids(source_tensor):
    # source_tensor: (batch_size, 4 * bb_count), each box stored as (y1, x1, y2, x2)
    batch_size, coords_count = source_tensor.size()
    bb_count = coords_count // 4
    boxes = source_tensor.view(batch_size, bb_count, 4)
    # the centroid of each box is the midpoint between its opposite corners
    y = (boxes[:, :, 0] + boxes[:, :, 2]) / 2
    x = (boxes[:, :, 1] + boxes[:, :, 3]) / 2
    # interleave back into (batch_size, 2 * bb_count) as (y0, x0, y1, x1, ...)
    return torch.stack((y, x), dim=2).view(batch_size, -1)

After this, I tried to minimize the MSE between the predicted and target centroids, but the model's accuracy was really bad.
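To make the MSE setup concrete, here is a small self-contained sketch; the box layout (y1, x1, y2, x2) per pupil and all names here are my assumptions, not fast.ai code:

```python
import torch
import torch.nn as nn

def boxes_to_centroids(boxes):
    # boxes: (batch_size, 4 * n), each box stored as (y1, x1, y2, x2)
    b, n4 = boxes.size()
    v = boxes.view(b, n4 // 4, 4)
    y = (v[:, :, 0] + v[:, :, 2]) / 2
    x = (v[:, :, 1] + v[:, :, 3]) / 2
    return torch.stack((y, x), dim=2).view(b, -1)

criterion = nn.MSELoss()
pred = torch.tensor([[8.0, 4.0, 12.0, 6.0]])   # one predicted box -> centroid (10, 5)
targ = torch.tensor([[9.0, 4.0, 13.0, 6.0]])   # one target box   -> centroid (11, 5)
loss = criterion(boxes_to_centroids(pred), boxes_to_centroids(targ))  # 0.5
```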

I also tried using the mean of the Euclidean distances between predicted and target centroids as a loss function, but there was no improvement:

def get_centroid_distances(input, target):
    # convert both sets of bounding boxes back to centroids
    centroids_input = bb2centroids(input)
    centroids_target = bb2centroids(target)
    batch_size, coords_count = centroids_input.size()
    points_count = coords_count // 2
    # view as (batch_size, points_count, 2) and take the per-point Euclidean distance
    diff = (centroids_input - centroids_target).view(batch_size, points_count, 2)
    distances = torch.sqrt((diff ** 2).sum(dim=2))
    return torch.mean(distances)

I will continue working on this model and trying to improve it, but any practical experience you have that might be relevant to this challenge would help. I suspect one reason for the poor performance is the size of the dataset - I will annotate more images along the way. Another thing I am wondering about is whether transforming the points to bounding boxes and then back to points improves or hurts the model's accuracy. The only reason I did this is that it made it possible to take advantage of the existing fast.ai functions for augmenting dependent variables. I will also try skipping data augmentation and setting the dependent variables manually to the normalized coordinates coord_y/image_height and coord_x/image_width.
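The normalization I have in mind would look roughly like this (a sketch; the function name is mine, and I again assume points are stored as (y, x) pixel coordinates):

```python
import torch

def normalize_points(points, image_height, image_width):
    # points: (n, 2) as (y, x) in pixels -> scaled into [0, 1]
    scale = torch.tensor([image_height, image_width], dtype=points.dtype)
    return points / scale
```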

Anyway, any comment or suggestion would be appreciated.

Best regards,

