Shortest failing test sequence finder (coupled tests detector)

stas · February 25, 2019, 5:23pm

Part 1a. Find an efficient algorithm to solve the problem.

The first part is not to worry about the test suite. Let’s start with a pure list reduction CS algorithm question.

The question is very simple: given a list of length N, find a minimal sub-list according to a certain criteria. In this case the criteria is that nth item behaves differently (fails) after a specific sub-list was executed, but which behaves normally otherwise.

I wrote a simple demo program to demonstrate the issue:

x = 0
y = 0
def black_box(i):
    global x, y
    if i == 5: x +=1 
    if i == 100: 
        if x: y +=1 # order is important! 
    if i == 230: assert not (x and y), f"{i} failed"

# need to implement the following using black_box as crit_func
def find_min_fail_sequence(items, crit_func):
     min_sequence = []
     # your code here
     return min_sequence

N=240
t = list(range(N))
print(f"Failing sequence: { find_min_fail_sequence(t, black_box) }")
# expecting: 5, 100, 230 for the above black_box implementation

# examples of success/failure
black_box(t[230]) # no problem on its own
for i in t[100],t[5],t[230]: black_box(i) # no problem (wrong order)
for i in t[:100],t[101:]:    black_box(i) # no problem (100 was skipped)
for i in t:                  black_box(i) # fails since 5, then 100 were run

Output:

Failing sequence: []
[...]
---> 25 for i in t:                  black_box(i) # fails since 5, then 100 were run
----> 8     if i == 230: assert not (x and y), f"{i} failed"
[...]
AssertionError: 230 failed

Problem: find the secret i’s that lead to failure black_box(230) in an efficient way - i.e. ideally in much less than N iterations.

you don’t know how many i’s lead to a problem, but the algorithm should be optimized for a sequence of 2-3 items (i.e. only 2-3 i’s are coupled, which is the 99% of real life situations), but it should be possible to search for slightly larger coupled groups)
the order of i’s is important, so no changing the order of i’s
you can’t see the code of black_box.
you can’t see x and y (or know their relevance). These globals demonstrate a stateful state that affects the tests.
the goal should be to have the least amount of invocations of black_box, outputs of which can’t be cached. This is because the real application is the running tests, so we want to find the failing sequence in the least amount of time.

Currently worst case scenario brute force approach would take about 4 hours to complete. And unlike the black_box example, the test failure in the real problem is inconsistent, and will need to be re-run on average 5 times - that’s 20 hours of runtime and a lot of wasted electricity. And it’ll grow as we have more tests.