Library for analyzing scored data sets

Given two models that score the same data set but produce different scores, we’d like to compare the two sets of scores.
For example:

  • What % of the top 100 examples overlap?
  • What % of the bottom examples overlap?
  • Which errors made by model A did model B correct?
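A minimal sketch of the first two questions, assuming each model's output is a dict mapping example ID to score and that both dicts cover the same IDs. The names `k_ranked_ids` and `overlap_pct` are made up for illustration and do not come from an existing library.

```python
def k_ranked_ids(scores, k, top=True):
    """Return the set of k IDs with the highest (top=True) or lowest scores.

    Assumption: `scores` is a dict of {example_id: score}.
    Ties are broken by dict insertion order, because sorted() is stable.
    """
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of examples")
    ranked = sorted(scores, key=scores.get, reverse=top)
    return set(ranked[:k])


def overlap_pct(scores_a, scores_b, k, top=True):
    """Percentage of the k top (or bottom) examples shared by both models."""
    ids_a = k_ranked_ids(scores_a, k, top=top)
    ids_b = k_ranked_ids(scores_b, k, top=top)
    return 100.0 * len(ids_a & ids_b) / k
```

For the "top 100" question this would be called as `overlap_pct(scores_a, scores_b, k=100)`, and with `top=False` for the bottom of the ranking.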

The input data could simply be a list of IDs, scores, and labels.
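As a rough illustration of that input format, the sketch below finds the errors that model B corrected relative to model A. It rests on assumptions not stated above: labels are binary (0 or 1), a score at or above a threshold of 0.5 counts as a positive prediction, and the names `Scored`, `error_ids`, and `corrected_by_b` are hypothetical.

```python
from typing import NamedTuple


class Scored(NamedTuple):
    """One row of input: an example ID, a model score, and the true label."""
    id: str
    score: float
    label: int  # assumption: binary label, 0 or 1


def error_ids(records, threshold=0.5):
    """IDs whose thresholded prediction disagrees with the label."""
    return {r.id for r in records if (r.score >= threshold) != bool(r.label)}


def corrected_by_b(records_a, records_b, threshold=0.5):
    """IDs that model A got wrong and model B got right.

    Only IDs scored by both models are considered, so an example
    missing from B is never reported as corrected.
    """
    ids_b = {r.id for r in records_b}
    errors_a = error_ids(records_a, threshold)
    errors_b = error_ids(records_b, threshold)
    return (errors_a & ids_b) - errors_b
```

A different error definition (for example, rank-based or multi-class) would only change `error_ids`; the set difference stays the same.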

This would not be hard to build, but I would rather contribute to an existing library. Is anyone aware of such a thing?