I’ve been playing around with the idea of trying to fit an LSTM to do TDD in Python, and I figured I’d ask for your thoughts before investing a lot of time and sweat.
Why Python? Because it occurred to me that GitHub’s stash of Python functions with doctests could be parsed into a (vast?) supervised dataset of semi-testable code. For those unfamiliar with doctest, it is a simple way to add a few tests to the docstring at the beginning of a Python function or module. The examples in the docstring are parsed by the framework, run in the Python interpreter, and checked against the expected output.
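For anyone who hasn’t seen one, here is a minimal doctest sketch. The function `add` is just a made-up example, not something from a real repo:

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add("py", "thon")
    'python'
    """
    return a + b


if __name__ == "__main__":
    # Finds every ">>>" example in this module's docstrings, runs it,
    # and compares the actual output with the text on the following line.
    print(doctest.testmod())
```

The definition line plus the docstring is roughly what the model would see as input, and the `return a + b` part is what it would have to produce.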
The project could go somewhat like this:
- Scrape / fetch all-the-python from GitHub.
- Train and validate a character-level language model with a proven LSTM architecture on all-the-python to learn syntax.
- Mechanically clean up a subset of the Python (e.g. only short Python 3 utility functions with docstrings whose tests still pass after the function is removed from its context).
- Train a custom head to complete the function body given its definition line and some docstring.
- Run a linter / the Python interpreter in the loss function to penalize lines with syntax errors and wrong outputs (based on the remaining tests in the docstring).
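The clean-up and scoring steps could be sketched with the standard library alone. This is only a rough idea under some assumptions: `functions_with_doctests` and `score_candidate` are names I made up, only top-level functions are considered, and there is no sandbox or timeout. Note that a pass/fail score like this is not differentiable, so it would have to act as a reward or filter rather than a plain gradient-based loss.

```python
import ast
import doctest


def functions_with_doctests(source):
    """Yield (name, docstring) for top-level functions that contain doctests."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            if doc and ">>>" in doc:
                yield node.name, doc


def score_candidate(candidate_source, func_name, docstring):
    """Return the fraction of doctest examples the candidate passes.

    Candidates that fail to compile or crash on import score 0.0.
    """
    try:
        code = compile(candidate_source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0
    namespace = {}
    try:
        # WARNING: this executes untrusted code in-process. A real pipeline
        # would need a sandbox and a timeout (assumption: neither is here).
        exec(code, namespace)
    except Exception:
        return 0.0
    test = doctest.DocTestParser().get_doctest(
        docstring, namespace, func_name, "<candidate>", 0
    )
    if not test.examples:
        return 0.0
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure reports; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return (results.attempted - results.failed) / results.attempted


if __name__ == "__main__":
    reference = (
        "def double(x):\n"
        '    """\n'
        "    >>> double(2)\n"
        "    4\n"
        "    >>> double(0)\n"
        "    0\n"
        '    """\n'
        "    return x * 2\n"
    )
    for name, doc in functions_with_doctests(reference):
        generated = reference.replace("x * 2", "x + 2")  # a plausible wrong guess
        print(name, score_candidate(generated, name, doc))
```

Holding back some of the docstring examples from the model input and scoring only on those would correspond to the "remaining tests" idea in the last step.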
Some questions and issues to spark discussion:
- Am I crazy, a decade ahead of what’s possible, or could this work to a small degree?
- The huge computational load of running an interpreter in the loss function in the final stage of training.
- The function definition plus one test may not be enough to go on for writing a function that generalizes.
- Overestimation of how many people have tested their code?
- Data leakage, e.g. from forked repos
- Approaches to scraping / fetching
- Choice of architecture
- Loss function design
- Code license issues
- Wanna join forces?
- First fast.ai model to feed on its own code… -> singularity
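On the data leakage point above, one cheap first pass might be to fingerprint each function by its parsed AST, so that copies differing only in comments or whitespace collapse to the same key before the train/validation split. This is only a sketch of the idea: `function_fingerprint` and `dedupe` are hypothetical helpers, and it would not catch renamed variables or lightly edited forks.

```python
import ast
import hashlib


def function_fingerprint(source):
    """Hash the AST of a snippet, ignoring comments, whitespace, and line numbers."""
    tree = ast.parse(source)
    # ast.dump omits line/column attributes by default, so formatting
    # differences do not change the dumped string.
    return hashlib.sha256(ast.dump(tree).encode("utf-8")).hexdigest()


def dedupe(sources):
    """Keep the first occurrence of each structurally identical snippet."""
    seen = set()
    unique = []
    for source in sources:
        try:
            key = function_fingerprint(source)
        except SyntaxError:
            continue  # unparsable snippets are dropped in this sketch
        if key not in seen:
            seen.add(key)
            unique.append(source)
    return unique


if __name__ == "__main__":
    original = "def f(x):\n    return x + 1\n"
    forked = "def f(x):  # copied from upstream\n\n    return x+1\n"
    other = "def f(x):\n    return x + 2\n"
    print(len(dedupe([original, forked, other])))
```

Splitting by fingerprint (or by repository) rather than by file would at least keep exact copies from landing on both sides of the split.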
Also let me know if you know any previous work along similar lines. I’ve tried to google some, but weeding anything relevant out of the sea of “python for LSTM” results is painful. I’ve found some inspiring work like this old classic, but nothing about code that would even strive to “do the right thing”.