Unet_learner is not reproducible / seed is not applied

Hey all,
for a few days now I have been trying to seed my unet_learner.
I create a unet_learner several times in a CI pipeline in order to perform anomaly detection with it. By varying the parameters I want to determine the best model; I am primarily interested in finding out which parameters have a good or bad effect on the result.
I use the following dataset for this.

Because the dataset contains masks of the errors, my result is an IoU.

Long story short: when I try to seed the training with set_seed(42, True) (from fastai.torch_core), I still get different values from run to run. Not only the final IoU differs, but also the metrics logged during training. The data used is always the same and always in the same order.
I have set the number of workers (num_workers) of the dataloader to zero, and shuffle_train is also False.
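To pin down where two runs start to differ, the per-epoch metrics of both runs can be compared step by step. The following is only a minimal sketch: `first_divergence` is a hypothetical helper (not part of fastai), and the numbers in the usage lines are made up for illustration.

```python
import math
from typing import Optional, Sequence


def first_divergence(run_a: Sequence[float], run_b: Sequence[float],
                     rel_tol: float = 0.0, abs_tol: float = 0.0) -> Optional[int]:
    """Return the index of the first step at which two metric series differ.

    Returns None if both series are identical (within the given tolerances).
    With the default tolerances of 0.0 the comparison is exact.
    """
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return i
    # One run logged more steps than the other: they diverge where the shorter one ends.
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


# Made-up validation losses of two runs, for illustration only.
identical = first_divergence([0.50, 0.40, 0.30], [0.50, 0.40, 0.30])
differs_at = first_divergence([0.50, 0.40, 0.30], [0.50, 0.41, 0.30])
```

A difference at index 0 would suggest that the initial weights or the first batches already differ, while a later index would point more towards non-deterministic operations during training.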

I also tried running the training inside
with no_random(): ....
Unfortunately this did not help either.
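For comparison, here is a minimal sketch of a stricter seeding helper. As far as I know, set_seed(42, True) already seeds Python, NumPy and PyTorch and sets the two cuDNN flags; the sketch additionally requests deterministic algorithms from PyTorch. `seed_everything` is a hypothetical name, and the sketch assumes PyTorch 1.11 or newer (for `warn_only`). It is meant as a diagnostic aid, not as a guaranteed fix.

```python
import os
import random


def seed_everything(seed: int, deterministic: bool = True) -> None:
    """Seed every available RNG and optionally request deterministic kernels.

    Hypothetical helper, not part of fastai. NumPy and PyTorch are only
    seeded if they can be imported.
    """
    # Only affects Python processes started after this point (e.g. subprocesses).
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    try:
        import torch
    except ImportError:
        return

    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

    if deterministic:
        # cuBLAS only behaves deterministically if this is set before the first CUDA call.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        # warn_only=True emits a warning for operations without a
        # deterministic implementation instead of raising an error.
        torch.use_deterministic_algorithms(True, warn_only=True)


seed_everything(42)
first_draw = random.random()
seed_everything(42)
second_draw = random.random()
```

Calling the helper once at the very start of the process, before the learner and the dataloaders are created, keeps model initialisation inside the seeded region. Any warnings raised by `torch.use_deterministic_algorithms` would name operations that can still vary between runs.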

I use a function to create the unet_learner, including the DataBlock / DataLoaders.
The training itself is done in a separate function.

This is my code. I use two configuration objects (dataloaders_config: dict, learner_config: dict) to configure the learner and the dataloaders.

Create Learner

def create_learner(dataloaders_config: dict, learner_config: dict, img_size: int,
                   path_obfuscated: Path, path_domain: Path, seed: int) -> Learner:
    """Creates a `unet_learner` with dataloaders included."""


    datablock = DataBlock(blocks=(ImageBlock, ImageBlock),
                              valid_pct=dataloaders_config["valid_pct"], seed=seed),
                          batch_tfms=[*aug_transforms(max_zoom=2.), Normalize.from_stats(*imagenet_stats)])
    dls = datablock.dataloaders(
        path_obfuscated, bs=dataloaders_config["bs"], path=path_domain,
        item_tfms=Resize(img_size), num_workers=0, shuffle_train=False)
    dls.c = dataloaders_config["channels"]

    if learner_config["loss_func"] == "FeatureLoss":
        loss_function = create_feature_loss(learner_config["loss_config"])
    else:
        loss_function = None

    cbs = [MixedPrecision, EarlyStoppingCallback(monitor=learner_config["early_stopping"]["monitor"],

    return unet_learner(dls=dls,
                        if learner_config["loss_func"] == "FeatureLoss" else None,

As a loss function I use the feature loss function presented in the fast.ai course.

class FeatureLoss(Module):
    def __init__(self, m_feat, layer_ids, layer_wgts):
        self.m_feat = m_feat
        self.loss_features = [self.m_feat[i] for i in layer_ids]
        self.hooks = hook_outputs(self.loss_features, detach=False)
        self.wgts = layer_wgts
        self.metric_names = ['pixel',] + [f'feat_{i}' for i in range(len(layer_ids))
              ] + [f'gram_{i}' for i in range(len(layer_ids))]

    def make_features(self, x, clone=False):
        self.m_feat(x)  # forward pass through the feature network so the hooks store fresh activations
        return [(o.clone() if clone else o) for o in self.hooks.stored]
    def forward(self, input, target, reduction='mean'):
        out_feat = self.make_features(target, clone=True)
        in_feat = self.make_features(input)
        self.feat_losses = [base_loss(input,target,reduction=reduction)]
        self.feat_losses += [base_loss(f_in, f_out,reduction=reduction)*w
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out),reduction=reduction)*w**2 * 5e3
                             for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]
        if reduction=='none': 
            self.feat_losses = [f.mean(dim=[1,2,3]) for f in self.feat_losses[:4]] + [f.mean(dim=[1,2]) for f in self.feat_losses[4:]]
        for n,l in zip(self.metric_names, self.feat_losses): setattr(self, n, l)
        return sum(self.feat_losses)
    def __del__(self): self.hooks.remove()

Fit model

def fit(learner: Learner, result_path: Path, epochs_freeze: int = 10, epochs_unfreeze: int = 15, base_lr: float = 1e-3,
        lowest_lr: float = 1e-5, pct_start: float = 0.9, wd: float = 1e-3, enable_logging: bool = False):
    """Performs the actual fitting. First trains frozen (`epochs_freeze`), then unfrozen (`epochs_unfreeze`). Sends the model and the training metrics (loss, etc.) to the MLflow tracking server if `enable_logging` is set to True."""
    def fit_cycles_and_export_model():  
        learner.fit_one_cycle(n_epoch=epochs_freeze, lr_max=base_lr, pct_start=pct_start, wd=wd)
        learner.fit_one_cycle(n_epoch=epochs_unfreeze, lr_max=base_lr, pct_start=pct_start, wd=wd, cbs=[MLFlowLogCallback()] if enable_logging else None)
        learner.path = result_path

    if enable_logging:
        with learner.no_bar(), learner.no_logging():

In a different module, I first create the learner and then pass the learner into fit().

I use nvidia-docker to run the pipeline. Maybe this is another possible cause of my problem.


I noticed that set_seed() uses numpy.random.seed(). According to numpy’s documentation, this method is a legacy function. I’m not sure if this could be a problem.
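To check whether the legacy status matters for reproducibility, a minimal sketch like the following can be used. It assumes NumPy 1.17 or newer (for `default_rng`) and compares the legacy global seeding with the newer Generator API. "Legacy" here means that the function is no longer recommended for new code, not that it stopped being deterministic.

```python
import numpy as np

# Legacy global seeding, which is what set_seed() relies on.
np.random.seed(42)
legacy_first = np.random.rand(3)
np.random.seed(42)
legacy_second = np.random.rand(3)

# The API recommended for new code: an explicit Generator object.
gen_first = np.random.default_rng(42).random(3)
gen_second = np.random.default_rng(42).random(3)

# Both approaches repeat their own sequence when given the same seed.
legacy_repeats = bool(np.array_equal(legacy_first, legacy_second))
generator_repeats = bool(np.array_equal(gen_first, gen_second))
print(legacy_repeats, generator_repeats)
```

If both values print as True, the legacy seeding itself is unlikely to be the source of the differences. The two APIs draw from different random streams, so their numbers are not expected to match each other.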

Unfortunately I can’t find the error and was hoping to get help here.

The following answers did not help me:

If you need more info, please leave a comment below.