Kernel crash when fitting model in Swift

(Thom M) #1

Not quite sure where to start debugging this one…

I’m replicating the budding Audio library in Swift, and have managed to load audio into Tensors and put them into a DataBunch, but the kernel dies (reliably) when I try to train the model.

I’m using the model arch from 08_data_block, and when I call learner.fit(1), the progress bar pops up, and the kernel crashes.

To be clear, the model trains properly in both 08_data_block and 11_imagenette on my setup.

In the notebook I get this error:

Fatal error: No algorithm worked!: file /swift-base/swift/stdlib/public/TensorFlow/CompilerRuntime.swift, line 2108
Current stack trace:
0    libswiftCore.so                    0x00007fa079e12e00 _swift_stdlib_reportFatalErrorInFile + 115
1    libswiftCore.so                    0x00007fa079d5b06c <unavailable> + 3035244
2    libswiftCore.so                    0x00007fa079d5b15e <unavailable> + 3035486
3    libswiftCore.so                    0x00007fa079ba2a12 <unavailable> + 1231378
4    libswiftCore.so                    0x00007fa079d27d42 <unavailable> + 2825538
5    libswiftCore.so                    0x00007fa079ba1ef9 <unavailable> + 1228537
6    libswiftTensorFlow.so              0x00007fa077219022 <unavailable> + 598050
7    libswiftTensorFlow.so              0x00007fa077217770 checkOk(_:file:line:) + 508
8    libswiftTensorFlow.so              0x00007fa07723be70 _TFCCheckOk(_:) + 81
9    libswiftTensorFlow.so              0x00007fa07723be60 _swift_tfc_CheckOk + 9

In the jupyter console I get this error:

$ python: /swift-base/swift/include/swift/SIL/AbstractionPattern.h:299: void swift::Lowering::AbstractionPattern::initSwiftType(swift::CanGenericSignature, swift::CanType, swift::Lowering::AbstractionPattern::Kind): Assertion `signature || !origType->hasTypeParameter()' failed.

Any ideas of where to start? Notebook is in the fastai_audio branch here.

(brett koonce) #2

I got this as well when trying to use resnet34 on lesson 11 earlier, for what it’s worth.

(Thom M) #3

Bumping this one. The notebook is merged into the fastai_docs master branch (I didn’t really mean to include it in the PR, and would be happy to remove it until it works!). The crash is still happening with the 0.3.1 release, with the same error messages.

For what it’s worth, this is the function in SIL/AbstractionPattern.h that’s crashing:

  void initSwiftType(CanGenericSignature signature, CanType origType,
                     Kind kind = Kind::Type) {
    assert(signature || !origType->hasTypeParameter());
    TheKind = unsigned(kind);
    OrigType = origType;
    GenericSig = CanGenericSignature();
    if (OrigType->hasTypeParameter())
      GenericSig = signature;
  }

…but I don’t know which Swift code is calling this function. TBH I’m most interested in how I could debug this. I’m going to try setting this up in Xcode with a non-CUDA toolchain and see if that gets me anywhere, but I’m not sure whether that’s the best path.

Would appreciate any help with either educated guesses about what’s going wrong (my gut says something about correct tensor types) or tips towards an effective debugging mechanism (e.g. are there log files or something?).

Cheers again :)
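One cheap way to test the tensor-shape hunch before calling learner.fit(1) is to compare the shape of a single batch against what the first layer expects. The sketch below is plain Swift and does not touch the TensorFlow module; `shapeProblem` is a hypothetical helper, and the expected pattern `[nil, 128, 128, 1]` is only an illustrative assumption, not the shape used in the notebook. The `actual` array would come from the dimensions reported by the batch tensor's `shape` property.

```swift
/// Hypothetical helper: compares a concrete shape against an expected pattern.
/// A `nil` entry in `expected` means "any size" (for example the batch axis).
/// Returns a description of the first problem found, or nil if the shape fits.
func shapeProblem(expected: [Int?], actual: [Int]) -> String? {
    guard expected.count == actual.count else {
        return "rank mismatch: expected \(expected.count) dims, got \(actual.count) (\(actual))"
    }
    for (axis, (want, got)) in zip(expected, actual).enumerated() {
        if let want = want, want != got {
            return "axis \(axis): expected \(want), got \(got)"
        }
        if got <= 0 {
            return "axis \(axis): non-positive size \(got)"
        }
    }
    return nil
}

// Illustrative values only: a batch of spectrograms that is missing its channel axis.
print(shapeProblem(expected: [nil, 128, 128, 1], actual: [64, 128, 128]) ?? "shape looks fine")
```

If the check reports a mismatch, the crash is more likely a data problem than a compiler or memory problem, which narrows down where to look next.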

(brett koonce) #4

Sorry, wasn’t clear the other day on what I meant. When I switched from xresnet18 to the xresnet34 variant I got this error as well, which suggests to me that we’re running out of memory and then it’s just dying as a result of that and throwing a weird error.

I’d suggest you use the CPU-only version with Jupyter and see if your code runs on that, to make sure the above is not an issue. I tried to do it here but my install is acting up.

ps. your code is using shellCommand, which isn’t imported by default in your notebook right now. You might do a PR where you add it/make sure 00_load_data.swift gets imported somewhere.
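If a separate CPU-only install is not handy, a possible alternative for the CPU check suggested above is to hide the GPU from the existing install. The sketch below assumes that the CUDA build honours the standard `CUDA_VISIBLE_DEVICES` environment variable and that it is read when the GPU runtime first initialises, so it would have to run before any tensor is created (for example in the first notebook cell after a kernel restart). It has not been verified in this thread against the Swift Jupyter kernel.

```swift
import Foundation

// Assumption: CUDA reads CUDA_VISIBLE_DEVICES once, at first initialisation.
// "-1" matches no device, so the process should fall back to the CPU.
setenv("CUDA_VISIBLE_DEVICES", "-1", 1)  // final argument 1 = overwrite any existing value

// Read the value back to confirm it is set for this process.
let visible = getenv("CUDA_VISIBLE_DEVICES").map { String(cString: $0) } ?? "<unset>"
print("CUDA_VISIBLE_DEVICES =", visible)
```

If training then runs (slowly) to completion, that would point towards GPU memory as the culprit; if it still crashes, the GPU is probably not the issue.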

(Thom M) #5

Cheers Brett. FWIW it still crashes if I change it to an xresnet18.

I’ll give the CPU-only version a crack.

Re “ps. your code is using shellCommand”:

Yep, I just haven’t updated it in a few weeks. It now uses the .shell extension.

(Marc Rasi (S4TF Team)) #6

The error information that you get in Jupyter when you hit a compiler bug is not great. Compiling this as a SwiftPM executable might get you some more useful information.

@pcuenq recently added a feature to https://github.com/latenitesoft/NotebookExport that lets you export notebook cells to a SwiftPM executable quite easily. You could try using that to create a SwiftPM executable, and then run it with swift run <name>.
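For reference, a hand-written SwiftPM executable needs very little. The manifest below is a minimal sketch and is not what NotebookExport generates; the package name `AudioRepro` and the directory layout are hypothetical. It assumes the Swift for TensorFlow toolchain is the active one, so that `import TensorFlow` resolves without declaring a package dependency.

```swift
// swift-tools-version:4.2
import PackageDescription

// Expected layout (hypothetical names):
//   AudioRepro/Package.swift
//   AudioRepro/Sources/AudioRepro/main.swift   <- notebook cells pasted here, in order
let package = Package(
    name: "AudioRepro",
    targets: [
        // A target that contains main.swift is built as an executable.
        .target(name: "AudioRepro", dependencies: [])
    ]
)
```

From the package directory, swift build compiles it and swift run AudioRepro runs it. Any compiler assertion should then be printed in the terminal next to the file being compiled, which may be easier to act on than the kernel's output.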
