Angular loss for distance metric learning


I was looking at the paper from Baidu Research on improving distance metric learning with angular loss (

I have experience with a simple triplet network, but I am having a lot of trouble moving from plain triplet loss to triplet + angular loss.

I implemented a triplet loss function in TensorFlow, which works well:

def triplet_loss(self, anchor, out_a, out_b, squared):
    """FaceNet: A Unified Embedding for Face Recognition and Clustering"""
    with tf.name_scope("triplet-loss"):
        distance_pos = self.euclidean_distance(anchor, out_a, squared=squared)
        distance_neg = self.euclidean_distance(anchor, out_b, squared=squared)
        triplet_loss = tf.maximum(0.0, self.margin + distance_pos - distance_neg)
        total_loss = tf.reduce_mean(triplet_loss)
        return distance_pos, distance_neg, triplet_loss, total_loss

def euclidean_distance(self, a, b, squared=False):
    eps = 1e-12
    with tf.name_scope("euclidean-distance"):
        if squared:
            return tf.reduce_sum(tf.square(tf.subtract(a,b)), 1)
        return tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(a, b)), 1)+eps)
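As a quick cross-check of the two functions above, the same hinge can be computed for a single triplet in plain Python. This is only a sketch: `squared_distance`, `triplet_loss_reference` and the example vectors are hypothetical names and values chosen for illustration, and it assumes squared Euclidean distances (the `squared=True` case).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss_reference(anchor, positive, negative, margin):
    """Hinge on margin + d(anchor, positive) - d(anchor, negative)."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Negative is far from the anchor: the margin is satisfied, so the loss is zero.
easy = triplet_loss_reference([0.0, 0.0], [1.0, 0.0], [3.0, 0.0], margin=1.0)
# Negative is closer than the positive: the hinge is active.
hard = triplet_loss_reference([0.0, 0.0], [2.0, 0.0], [1.0, 0.0], margin=1.0)
print(easy, hard)
```

Feeding the same vectors through the TensorFlow version should give matching per-triplet values if the graph is wired as intended.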

However, when I tried to change it to angular loss, the optimization became much slower than with simple triplet loss. Here is the code for angular loss in TensorFlow:

def angular_loss(self, anchor, out_a, out_b, in_degree, squared, alpha=45):
    """Deep Metric Learning with Angular Loss"""
    with tf.name_scope("angular-loss"):
        if not in_degree:
            alpha = self.deg2rad(alpha)
        out_c = tf.divide(tf.add(anchor, out_a), 2.0)
        distance_pos = self.euclidean_distance(anchor, out_a, squared=squared)
        distance_bc = self.euclidean_distance(out_b, out_c, squared=squared)
        tan_dist = 4.0 * (tf.tan(alpha) ** 2) * distance_bc
        loss = tf.maximum(distance_pos - tan_dist, 0.0)
        total_loss = tf.reduce_mean(loss)
        return distance_pos, distance_bc, loss, total_loss
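For comparison, here is a plain-Python sketch of the per-triplet quantity the function above is meant to compute: the hinge of ||a - p||^2 - 4 tan^2(alpha) ||n - c||^2, with c the midpoint of anchor and positive. It assumes squared Euclidean distances and an `alpha_deg` given in degrees that is converted to radians before the tangent is taken. The names `squared_distance` and `angular_loss_reference` are hypothetical, and this is a sanity check for one triplet, not the paper's reference implementation.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def angular_loss_reference(anchor, positive, negative, alpha_deg=45.0):
    """Hinge on ||a - p||^2 - 4 * tan(alpha)^2 * ||n - c||^2, c = (a + p) / 2."""
    alpha = math.radians(alpha_deg)  # math.tan expects radians
    center = [(x + y) / 2.0 for x, y in zip(anchor, positive)]
    d_pos = squared_distance(anchor, positive)
    d_nc = squared_distance(negative, center)
    return max(0.0, d_pos - 4.0 * math.tan(alpha) ** 2 * d_nc)


# Negative sits close to the anchor-positive midpoint: the hinge is active.
close = angular_loss_reference([0.0, 0.0], [2.0, 0.0], [1.0, 0.5])
# Negative is far from the midpoint: the loss is clamped to zero.
far = angular_loss_reference([0.0, 0.0], [2.0, 0.0], [1.0, 5.0])
print(close, far)
```

Comparing these values with the TensorFlow output for the same triplet is one way to check the degree/radian handling and the `squared` flag.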

For the embeddings I used the last fully connected layer (4096-dim) of a pretrained VGG network. I tried different optimizers and learning rates, but in the end the model is much worse than with simple triplet loss. Does anyone have experience with angular/triplet networks? Is my implementation correct according to the paper?

Thank you