Possible Memory Leaks Swift For TensorFlow in Python Interop (Python.swift)

stephenjohnson · March 25, 2019, 4:03am

First, if anyone is interested in using Python from within Swift without the TensorFlow part, you can easily integrate the Python interop part of the TensorFlow code by getting it from here https://github.com/pvieito/PythonKit I’ve been using it for many months now and they seem to keep it pretty well in sync with the TensorFlow repo.

Back when Jeremy, Sylvain and many others started working on FastAI 1.0, I foolishly decided I’d try to learn the inner workings of FastAI by attempting to code it in Swift as they coded it in Python and so I started using the aforementioned PythonKit.

It’s been a great learning experience but I’ve had issues with memory not being released, so finally this weekend I decided to solely work on finding and fixing these memory leaks if at all possible and after much searching I believe I’ve found a few causes and so wanted to share them with you as it seems the FastAI community is actively embracing Swift with much enthusiasm and you can’t get much deep learning done if you run out of memory. I haven’t fixed all my memory leaks and if I find any more in this library I’ll post here. Also, if anyone can either corroborate my findings or tell me why I’m wrong that would be great.

Below are the changes I believe you need to make in Python.swift

1. In the 2 functions dynamicallyCall, add defer { Py_DecRef(result) } right before the return statement like below. I believe this is necessary because PyObject_Call and PyObject_CallObject according to the Python documentation return a reference that has already been incremented but the PythonObject constructor (the owning constructor) being used in this code is also incrementing the reference.

   guard let result = PyObject_CallObject(selfObject, argTuple) else {
      // If a Python exception was thrown, throw a corresponding Swift error.
      try throwPythonErrorIfPresent()
      throw PythonError.invalidCall(base)
   }
   defer { Py_DecRef(result) } // <-- Add this line
   return PythonObject(owning: result)
}

2. Rewrite the function performBinaryOp to the following to use defer to decrement the lhs, rhs and the result. I believe these changes are necessary because original code was passing ownedPyObjects to the operator and ownedPyObject increments the pointer reference but there was no code to decrement the references after the operation finishes. Also, I believe (but couldn’t find in documentation), that the operator’s return value already has an incremented reference and the PythonObject(owning) constructor increments it again thus it has a reference count of 2 instead of 1 and therefore needs to be decremented before being returned to the caller.

private func performBinaryOp(_ op: PythonBinaryOp, lhs: PythonObject, rhs: PythonObject) -> PythonObject {
  // let result = op(lhs.ownedPyObject, rhs.ownedPyObject) <-- Replace this line with below
  let lhs = lhs.ownedPyObject
  let rhs = rhs.ownedPyObject
  defer { 
    Py_DecRef(lhs)
    Py_DecRef(rhs)
  }
  let result = op(lhs, rhs)
  // If binary operation fails (e.g. due to `TypeError`), throw an exception.
  try! throwPythonErrorIfPresent()
  defer { Py_DecRef(result) } // <-- Add this line
  return PythonObject(owning: result!)
}

3. One other tip is to make sure to call the close() function on any PIL images when you are done with them. I’m not sure yet if this is a side effect of references not being released correctly and once these are all fixed than that won’t be necessary anymore or not. Still investigating.

Happy Swifting!

Kaspar · March 25, 2019, 7:10am

I also had massive memoryleaks problems with the Python integration. I assumed that it was an xcode issue, because the team is mainly working on ubuntu . I would love to see the python integration work seamlessly on macos/xcode too. It makes it is much easier to make a raw port using the python integration and then replacing it with pure swift piece-by-piece

So big thanks for the PythonKit

jeremy · March 25, 2019, 2:26pm

I know that @saeta and team have been working on getting rid of memory leaks recently, so I’m sure this analysis will be extremely helpful!

dan-zheng · March 25, 2019, 5:46pm

cc @parkers, who recently worked on Python refcounting in Swift

parkers · March 25, 2019, 6:56pm

I believe these should be fixed with: https://github.com/apple/swift/pull/23316. I believe my fix is congruent with your notes on memory leaks. (Except maybe your performBinaryOp version I didn’t effectively add those initial defers). Please note that PythonKit is a version of this library that does not live in the swift tree on the tensorflow branch. I believe they are working to port my fixes into PythonKit. I also included some preliminary test-infrastructure to check for leaks. I expect that PythonKit will probably adopt these changes as well.

stephenjohnson · March 27, 2019, 4:00am

Thanks Parker. Ran the changes and performBinaryOp still needs fixing in some manner as outlined above.

clattner · March 27, 2019, 4:25am

Thank you for driving this @stephenjohnson, this is core infra - +1 for it being absolutely solid.

parkers · March 28, 2019, 8:29pm

Thanks Stephen, I added: https://github.com/apple/swift/pull/23619 for the binary operator call. Please let me know if I missed anything else.

stephenjohnson · March 31, 2019, 7:00pm

Thanks Parker and Chris for resolving this so quickly. I’ve been testing and everything looks great so far!

I did run into one more memory leak issue but not with the Python interop. I’ll mention it here so anyone using Swift for AI stuff can benefit.

In my attempt at a FastAI-ish library in Swift, I having been playing around with doing my batch loading/prep in one Swift process and adding it to a Redis list (queue) and then having a different Swift process pull batches from the queue. I’ve been doing this to get around the issue where I’ve been unable to get multiple threads to access Python from within the same Swift process (it locks up on me) so that batches can be prepared simultaneously while training. Anyway, my latest memory issue was when using the Kitura-redis library although the issue isn’t with the library itself.

The simple Swift program below illustrates the memory leak I was encountering.

import Foundation

let string = String(repeating: "I am repeating myself.", count: 1000)
for _ in 1...250000 {
  let _ = string.data(using: String.Encoding.utf8)
}

The data function below is from NSString not a native Swift library and it’s what Kitura uses to convert String data to binary to send to Redis. However, the Data object is an autoreleased object and as such needs to be enclosed in an autorelease pool like below.

let string = String(repeating: "I am repeating myself.", count: 1000)
for _ in 1...250000 {
  autoreleasepool {
    let _ = string.data(using: String.Encoding.utf8)
  }
}

Hope this helps.

jeremy · April 1, 2019, 12:52pm

@stephenjohnson how do you go about identifying these memory issues? Are there some tools which make it easy?

stephenjohnson · April 1, 2019, 8:02pm

TLDR; Unfortunately, I didn’t use any tools that made it easy.

There may be some like Instruments (part of Xcode) that may have helped, but I’m not sure it would have helped with the Python interop ones since those dealt with python reference counting. There’s probably one’s out there that would have made things simpler but I’m not aware of them.

For these I used a very manual and simple process. I started with a small test program and added to it as I looked for leaks. For the Python ones I added print statements to the various reference increments and decrements in Python.swift and then matched them up. When I verified that I was seeing ones that didn’t have equal decrements to the increments, I dug deeper into the Python.swift file looking for places where I could find code that didn’t look right, then tweaked it to see if the memory issue vanished. For the autorelease issue that was again adding into my test program pieces of my larger codebase that were showing memory issues until the test program also leaked memory. So, I then followed the code into the Redis codebase and then adding that code to my test program. Then finally dug into the NSString code which led me to the fact that the data object returned needed autoreleasing which was something I was not aware of would need to happen within Swift. I think I assumed (very incorrectly) that somehow the magic of the Swift to Objective-C bridgin somehow changed the autorelease to a Swift reference counted object.

The good news is that with the above changes I’ve now been able to train and my memory usage stays constant.