Non-deterministic behaviour when a Docker container is restarted

Hi,

I have been trying to figure out the root cause of the following issue:

I created a container from a Docker image. Whenever I restart the container, the classification loss changes slightly. The weird part is that the LM fine-tuning loss stays the same and is unaffected by the Docker restart.
Otherwise, without a restart, both the LM and classification losses are reproducible across runs.

I’m wondering what happens on a Docker restart that causes this slight change in the classification loss?

The architecture being used is ULMFiT.
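
To rule out something in the container's runtime environment drifting between restarts, a quick sanity check like the one below might help (just a sketch; print whatever else is relevant to your setup):

import os
import torch

# Compare this output across container restarts (hypothetical sanity check)
print(torch.__version__)                     # PyTorch version
print(torch.version.cuda)                    # CUDA toolkit PyTorch was built against
print(torch.backends.cudnn.version())        # cuDNN version
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))     # GPU model
print(os.environ.get("PYTHONHASHSEED"))      # if unset, Python hash randomisation differs per process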

I’ve verified that the same DataBunch is being created every time, and I’m setting the required random seeds for Python, NumPy and CUDA as given below:

import random
import numpy as np
import torch

seed_value = 42  # example seed; the actual value doesn't matter, only that it's fixed
use_cuda = torch.cuda.is_available()

np.random.seed(seed_value)  # NumPy
torch.manual_seed(seed_value)  # PyTorch (CPU)
random.seed(seed_value)  # Python
if use_cuda:
    torch.cuda.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)  # all GPUs
    torch.backends.cudnn.deterministic = True  # force deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False
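
Two things I haven't nailed down yet are Python's hash seed and the DataLoader worker processes; a rough sketch of what I mean below (the seed_worker helper is just illustrative, and I'm not sure how cleanly it plugs into fastai's DataBunch):

import os
import random
import numpy as np
import torch

# Hash randomisation is only honoured if this is set before the interpreter
# starts, so in a container it probably belongs in the Dockerfile
# (e.g. ENV PYTHONHASHSEED=0) rather than in the training script itself.
os.environ["PYTHONHASHSEED"] = str(seed_value)

def seed_worker(worker_id):
    # re-seed NumPy and Python's random inside every DataLoader worker
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(seed_value)
# ...then pass worker_init_fn=seed_worker and generator=g when the
# underlying torch DataLoader is constructed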

Attaching an image of the losses obtained at different times:

I’m aware that some CUDA operations can be a source of non-determinism, as documented here, but I’m not sure whether that’s the issue in this case.
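
To rule that out, I guess I could force deterministic algorithms so that any op without a deterministic implementation fails loudly, something along these lines (torch.use_deterministic_algorithms needs a fairly recent PyTorch; older versions had torch.set_deterministic instead):

import os
import torch

# Needed by cuBLAS for deterministic matmuls on CUDA 10.2+; like PYTHONHASHSEED,
# it's cleanest to set this via ENV in the Dockerfile before the process starts.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

# Raise an error on any operation that has no deterministic implementation
torch.use_deterministic_algorithms(True)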