Hello,

I was looking at the Baidu Research paper on improving distance metric learning with angular loss (http://research.baidu.com/Public/uploads/5acc20706a719.pdf).

I have experience with a simple triplet network, but I am having a lot of trouble moving from plain triplet loss to a simple triplet network with angular loss.

I implemented the triplet loss function in TensorFlow, and it works well:

```
def triplet_loss(self, anchor, out_a, out_b, squared):
    """
    FaceNet: A Unified Embedding for Face Recognition and Clustering
    <https://arxiv.org/abs/1503.03832>
    """
    with tf.name_scope("triplet-loss"):
        distance_pos = self.euclidean_distance(anchor, out_a, squared=squared)
        distance_neg = self.euclidean_distance(anchor, out_b, squared=squared)
        triplet_loss = tf.maximum(0.0, self.margin + distance_pos - distance_neg)
        total_loss = tf.reduce_mean(triplet_loss)
        return distance_pos, distance_neg, triplet_loss, total_loss

def euclidean_distance(self, a, b, squared=False):
    eps = 1e-12
    with tf.name_scope("euclidean-distance"):
        if squared:
            return tf.reduce_sum(tf.square(tf.subtract(a, b)), 1)
        return tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(a, b)), 1) + eps)
```

However, when I switch to angular loss, the optimization is much slower than with plain triplet loss. Here is my angular loss code in TensorFlow:

```
def angular_loss(self, anchor, out_a, out_b, in_degree, squared, alpha=45):
    """
    Deep Metric Learning with Angular Loss
    <https://arxiv.org/pdf/1708.01682.pdf>
    """
    with tf.name_scope("angular-loss"):
        # tf.tan expects radians; alpha is converted only when in_degree is False.
        if not in_degree:
            alpha = self.deg2rad(alpha)
        # Midpoint between the anchor and the positive sample.
        out_c = (anchor + out_a) / 2.0
        distance_pos = self.euclidean_distance(anchor, out_a, squared=squared)
        distance_bc = self.euclidean_distance(out_b, out_c, squared=squared)
        # Negative-to-midpoint distance scaled by 4 * tan^2(alpha).
        tan_dist = 4.0 * (tf.tan(alpha) ** 2) * distance_bc
        loss = tf.maximum(distance_pos - tan_dist, 0.0)
        total_loss = tf.reduce_mean(loss)
        return distance_pos, distance_bc, loss, total_loss
```
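
To make the formula easier to check independently of TensorFlow, here is a minimal plain-Python sketch of what the function above is meant to compute for a single triplet. It assumes squared distances (`squared=True`) and an angle given in degrees; the name `angular_loss_single` and the example points are made up for illustration and are not part of my model code.

```python
import math


def angular_loss_single(anchor, positive, negative, alpha_deg=45.0):
    """Hinge-style angular loss for one triplet of equal-length vectors.

    Illustrative helper (not part of the model code): uses squared
    Euclidean distances and takes the angle in degrees.
    """
    # math.tan expects radians, so convert the angle first.
    tan_sq = math.tan(math.radians(alpha_deg)) ** 2
    # Midpoint of anchor and positive (out_c in the TensorFlow code).
    center = [(a + p) / 2.0 for a, p in zip(anchor, positive)]
    # Squared distances anchor-positive and negative-center.
    dist_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    dist_neg_center = sum((n - c) ** 2 for n, c in zip(negative, center))
    return max(dist_pos - 4.0 * tan_sq * dist_neg_center, 0.0)


# Negative close to the anchor-positive midpoint: positive loss.
close_loss = angular_loss_single([0.0, 0.0], [2.0, 0.0], [1.0, 0.5])
# Negative far from the midpoint: the hinge clips the loss to zero.
far_loss = angular_loss_single([0.0, 0.0], [2.0, 0.0], [1.0, 3.0])
print(close_loss, far_loss)
```

Here `close_loss` comes out at about 3.0 and `far_loss` at 0.0. Feeding the same three vectors through the TensorFlow function (with `squared=True` and the angle in radians) should give matching values. Note that `tan(45)` evaluated in radians is about 1.62 rather than 1, so a missed degree-to-radian conversion changes the scale of the loss considerably.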

For the embeddings I used the last fully connected layer (4096 dimensions) of a pretrained VGG network. I tried different optimizers and learning rates, but in the end the model is much worse than with plain triplet loss. Does anyone here have experience with angular/triplet networks? Is my implementation correct according to the paper?

Thank you