WideResNet (wrn_22) CUDA Error in Lesson1-pets

I am trying to implement WideResNet for a project and keep getting the following CUDA error:

RuntimeError: CUDA error: device-side assert triggered

As sgugger explains, this is a generic bad-index error.
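To make "bad index" concrete, here is a minimal sketch of the usual culprit, assuming PyTorch is installed (the tensors are made up): a target label that is greater than or equal to the number of model outputs. On the CPU, `F.cross_entropy` reports the offending target directly. The same call on CUDA tensors surfaces only as the generic device-side assert, and after that the CUDA context stays unusable until the process (e.g. the notebook kernel) is restarted.

```python
import torch
import torch.nn.functional as F

num_classes = 10                        # e.g. a head hardcoded for CIFAR-10
logits = torch.randn(4, num_classes)    # fake model output for a batch of 4
targets = torch.tensor([0, 3, 12, 1])   # 12 is not a valid index for 10 classes

try:
    # On the CPU this fails with a readable message naming the bad target.
    F.cross_entropy(logits, targets)
    error = None
except (IndexError, RuntimeError) as e:  # exception type varies by PyTorch version
    error = str(e)

print(error)
```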

Any ideas on how to troubleshoot and solve this? I have tried:

  1. Resetting everything (Did you turn it off and on?)
  2. Googling it (most results are about masking off,
  3. Updating everything again
  4. Running it in lesson1-pets
  5. Reducing the batch size to 10

I took a screenshot in lesson1-pets. While ResNet still works, WRN_22 triggers the device-side assert.

The Wide ResNet 22 (`wrn_22`) is intended for CIFAR-10, so its output layer is hardcoded to 10 classes. Pets has 37 classes, so every label from 10 upward is an out-of-range index. You should use the functions in that module to create a model suited to your data :wink:
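A quick way to see the mismatch is to compare the labels against the head size before training. This is a minimal, hypothetical helper (`out_of_range_labels` is not part of fastai); with real data you would pass the dataset's label indices and the model's output count. In fastai v1, `data.c` holds the number of classes, which is the value the head should be built with.

```python
def out_of_range_labels(labels, num_classes):
    """Return the sorted labels that a `num_classes`-way head cannot represent.

    Valid class indices are 0 .. num_classes - 1; anything else is the kind of
    bad index that triggers the CUDA device-side assert during the loss.
    """
    return sorted({y for y in labels if not 0 <= y < num_classes})


pets_labels = range(37)  # Oxford-IIIT Pet: 37 classes, indices 0..36
bad = out_of_range_labels(pets_labels, num_classes=10)
print(len(bad), bad[:3])  # 27 [10, 11, 12]
```

With `num_classes=37` the helper returns an empty list, which is the condition the custom model below is meant to satisfy.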

Gotcha, this seems to work.

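```python
from fastai.vision.models.wrn import WideResNet  # fastai v1

def wrn_Custom(num_groups=3, N=3, num_classes=10, k=6, drop_p=0.):
    "Default Wide ResNet has 22 layers; set `num_classes` to match your data (37 for Pets)."
    return WideResNet(num_groups, N, num_classes, k, drop_p)
```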

Accuracy is a bit lower for now; I will post an update once it gets back up there.

Many epochs later: it just occurred to me that I am not getting results as good as ResNet's after moving to WideResNet because there isn't a WideResNet pretrained on ImageNet to transfer from.

Any recommendations on where I can go to get these weights?

WideResNet: in the middle of 20 hours of training

ResNet-50 at the end of Lesson1-pets

I see a Wide ResNet-50 among Cadene's pretrained models, but that's all I could find.

Awesome, that is helpful for more than just WideResNet! Apologies that this thread went a little off-topic, and thank you for helping me troubleshoot it all!

After noticing that Cadene's ResNeXt is more popular, I found this recent NIPS paper suggesting ResNeXt would be the better choice anyway, so I will explore that route.

Also, I really like the following notebook explaining pre-trained Cadene models.

Edit: Yeah, it got a little better.


The CUDA "device-side assert triggered" error can occur for a number of reasons; is there a simple way to debug it?
In my case it takes quite a long time to reach the actual error. Is there any rule of thumb?
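One common rule of thumb, sketched here assuming a plain PyTorch/fastai notebook: set `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized. CUDA kernels normally run asynchronously, so the assert is often reported at a later, unrelated line; with launches made synchronous, the traceback points at the operation that actually failed. Training runs slower this way, so only use it while debugging. If the message is still the generic assert, running one batch through the model and loss on the CPU usually produces a readable error (for example an out-of-bounds target).

```python
import os

# Make CUDA kernel launches synchronous so the traceback points at the failing op.
# This only takes effect if set before CUDA is initialized, so do it before
# importing torch (e.g. in the first cell, right after restarting the kernel).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported after the environment variable on purpose
```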