How do I interpret predictions from a multi-label encoding in machine learning?

Say I use a (convolutional) neural network for a multi-label classification problem. For example, the task is to classify the stage of skin cancer from 0 (healthy) to 4 (severe). The five stages are encoded as follows.

0 => [1, 0, 0, 0, 0]
1 => [1, 1, 0, 0, 0]
2 => [1, 1, 1, 0, 0]
3 => [1, 1, 1, 1, 0]
4 => [1, 1, 1, 1, 1]
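
To make the encoding explicit, here is a minimal sketch of how I build these target vectors. The function name `encode_stage` and the parameter `num_stages` are just illustrative names, not part of any library.

```python
def encode_stage(stage, num_stages=5):
    """Return the cumulative target vector for a stage in 0..num_stages-1.

    Position i is 1 when i <= stage, so stage 2 becomes [1, 1, 1, 0, 0].
    """
    if not 0 <= stage < num_stages:
        raise ValueError(f"stage must be in 0..{num_stages - 1}, got {stage}")
    return [1 if i <= stage else 0 for i in range(num_stages)]


# Print the full table of stage -> target vector.
for stage in range(5):
    print(stage, "=>", encode_stage(stage))
```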

In the last layer of my neural network, I have 5 output neurons that use a sigmoid activation function.

After I’ve trained my neural network, I feed it a test image and it gives me the following prediction.

[0.6, 0.2, 0.7, 0.1, 0.1]

If I treat any output greater than 0.5 as active, I get the following binary vector.

[1, 0, 1, 0, 0]

My question is, should this correspond to a prediction of 2 or 1?
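
For concreteness, here is a small sketch of the thresholding step and the two readings I have in mind. I am assuming that "2" means taking the index of the highest active output and "1" means counting the active outputs and subtracting one; the function names are hypothetical, and neither rule is meant as the correct answer.

```python
def threshold(outputs, cutoff=0.5):
    """Turn sigmoid outputs into 0/1 flags; an output is active if it exceeds cutoff."""
    return [1 if p > cutoff else 0 for p in outputs]


def highest_active_index(bits):
    """Reading A: the stage is the index of the right-most active output."""
    active = [i for i, b in enumerate(bits) if b == 1]
    return max(active) if active else None  # None when nothing is active


def active_count_minus_one(bits):
    """Reading B: the stage is the number of active outputs minus one.

    Returns -1 when no output is active, which is not a valid stage.
    """
    return sum(bits) - 1


prediction = [0.6, 0.2, 0.7, 0.1, 0.1]
bits = threshold(prediction)

print(bits)                          # [1, 0, 1, 0, 0]
print(highest_active_index(bits))    # 2
print(active_count_minus_one(bits))  # 1
```

The two readings agree on every valid target vector and only disagree on outputs like this one, where the active positions are not contiguous from the left.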