A framework for applying deep learning to software testing

dekimir · November 27, 2017, 4:12pm

Hi, everyone,

I must say I enjoyed the online course and these forums very much. Most everything I’ve learned about modern AI and its applications, I’ve learned here. I never expected to get drawn in this deep when I first watched the lecture videos.

But the reason I wanted to study deep learning was to try to improve the software development and testing process. It always seemed to me that this area would lend itself well to AI involvement, but of course that remains to be proven empirically. So I decided to create a framework to facilitate research into it. It took a lot of hard work, but it is finally good enough to allow experimentation with real-life software. It can parse C++ code, generate randomized tests for it, and feed the test results directly to deep-learning models. Using it, I managed to teach a neural network to discern between failing and passing unit tests. Please take a look at a paper I wrote describing it:

github.com

dekimir/RamFuzz/blob/master/sci/ramfuzz.md

# RamFuzz: A Framework for C++ Test Generation via Deep Learning
Dejan Mircevski

November 2017

## Abstract

We have developed a framework for generating unit tests of arbitrary C++ source and processing their results by deep-learning neural networks. We show that a network originally developed for text processing can learn to accurately predict which randomized unit tests will fail and which will succeed, based on the random values generated. This is a promising first step in teaching artificial intelligence what a software program does and how to test it.

## Introduction

Randomized testing is a family of automated procedures that exercise software with randomly generated inputs.  It has proven very effective at finding bugs that manual testing could not.  From fuzzers to unit-test generators, the practice of feeding random inputs to a piece of software to expose its flaws has become invaluable in raising the quality and security of widely used programs the public depends on.  In particular, randomized unit testing (where tests exercise individual methods instead of the whole program) is helpful in isolating where a bug occurs, making it easier to diagnose and fix.  Moreover, unit testing works to fortify the quality of small software components, increasing reusability of those components and with it the programmer's overall efficiency.

Most research in randomized unit-test generation is in languages other than C++, despite C++ accounting for a substantial fraction of all software, particularly if weighted by usage.  Essential programs like browsers, compilers, build systems, and cloud infrastructure are written in C++.  There is every reason to believe that randomized unit testing is just as valuable for C++ as it demonstrably is for other languages, yet there is far less support for it.  To help bring about more C++ tools for randomized unit-test generation, we have developed the [RamFuzz framework](https://github.com/dekimir/RamFuzz).  RamFuzz can create randomized input for unit tests, including random instances of user-defined C++ classes.  The input generation is logged, leaving a precise record of the test run.  The log can be used to replay the test run for debugging purposes.

Additionally, a set of logs can be fed to deep-learning algorithms via a convenient Python interface.  This allows application of artificial intelligence (AI) to the process, leading to some very interesting results.  It also provides a framework for further research into leveraging AI to automatically test C++ software.

In this paper, we give an overview of the RamFuzz code generator, runtime logger, and post-processing interface to Python tools for deep learning.  We then show how the framework allows a neural network to learn to discern failing-test logs from the successful ones.  Although this is not the same as generating useful unit tests, it is a promising step on that path and an illustration of the framework's overall capabilities.  More research is needed to attain automatic test generation, as we describe in the final section.  We argue that RamFuzz is a potent vehicle for conducting that future research.

## Generating and Logging Random Parameter Values

This file has been truncated. show original

Everyone I spoke to seemed impressed by this and excited for what it can eventually accomplish. But I’d really appreciate the feedback from you guys; you’re the experts that I trust. Please let me know what you think!

piotr.czapla · February 7, 2018, 2:21pm

Interesting topic, I’m thinking about automating regression testing in UI using deeplearning so we have similar interest. I’m not an expert but i will be glad to have a look at your approach.

dekimir · February 7, 2018, 5:13pm

So I spent some time analyzing how exactly the model manages to predict test success/failure from the run logs, and I’ve come to some interesting conclusions. To wit, the random values generated are mostly ignored by the model; it derives its conclusions based predominantly on the code locations logged. It simply notices where the program goes and derives its prediction from that.

For instance, if the last log entry is the location deciding how many more methods to invoke, the model predicts success – well, sure, if that’s the last log entry, that most likely means that the program decided to invoke zero more methods and end right there. But the model’s weights are such that that zero in the log carries almost no impact, while the fact that the location was the last one in the log carries a lot. This holds true for all other values, too. Even if an extreme value is obviously the cause of an assertion failure, that value is transformed into a minuscule number in the calculations by normalization and inner-layer weights, while the sequence of program locations is amplified into a correct prediction of success/failure.

While this is a fascinating illustration of the power of deep-learning networks, it’s not very promising for autonomous test generation. I’m trying a different approach now; I’ll write about it in the next post.