Taint Mode For Python

Juanjo Conti has done some fantastic work for OWASP implementing a taint mode for Python as a library. Taint mode is a language feature that highlights injection flaws by tracking the “taintedness” of variables as untrusted user input flows through the code. In taint mode, developers identify untrusted inputs, sanitization functions, and sensitive sinks. Variables holding user input are initially marked as tainted, and taintedness propagates to any new variable derived from an already tainted one. Tainted variables can be sanitized, after which they are marked as untainted. Taint mode prevents tainted data from reaching the identified sensitive sinks, such as an interpreter or a browser.

Sounds simple enough, but the real power here is in the propagation of taint information. As just one example, if you concatenate a tainted variable with an untainted variable, the result will be tainted. In this way the taintedness can be traced through the program from the source to the sink.
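To make the concatenation example concrete, here is a minimal sketch of how propagation could work with a string subclass. The names `TaintedStr` and `is_tainted` are made up for illustration and are not taken from Juanjo's library; the sketch only handles the `+` operator.

```python
class TaintedStr(str):
    """Hypothetical str subclass: any instance counts as tainted."""

    def __add__(self, other):
        # tainted + anything -> tainted
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        # untainted + tainted -> tainted
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(other, self))


def is_tainted(value):
    """Report whether a value carries the taint marker."""
    return isinstance(value, TaintedStr)


user_input = TaintedStr("1 OR 1=1")                # pretend this came from a request
query = "SELECT * FROM t WHERE id = " + user_input

print(is_tainted(query))      # the concatenation result is still tainted
print(is_tainted("a" + "b"))  # two plain strings stay untainted
```

Note that the other inherited `str` methods (`upper()`, slicing, `format()`, and so on) return plain `str` objects, so a real implementation would have to wrap those as well or the taint would silently be lost.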

Previous taint mode implementations required modifying the interpreter. Juanjo has done it in a library by implementing subclasses of the primitive datatypes that allow for propagation of taint information. This comes at the cost of needing to modify the application code, but the decorators and shortcuts Juanjo provides make it easy.
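As a rough idea of what the decorator approach might look like in application code, the sketch below marks a source, a sanitizer, and a sink. All names here (`Tainted`, `TaintError`, `untrusted_source`, `sanitizer`, `sensitive_sink`) are hypothetical and chosen for illustration; they are not the API of Juanjo's library, and propagation through string operations is left out to keep it short.

```python
import functools
import html


class Tainted(str):
    """Hypothetical marker subclass: any instance is treated as untrusted."""


class TaintError(Exception):
    """Raised when tainted data reaches a sensitive sink."""


def untrusted_source(func):
    """Mark string return values of func as tainted."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return Tainted(result) if isinstance(result, str) else result
    return wrapper


def sanitizer(func):
    """Return the result of func as a plain, untainted str."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return str(func(*args, **kwargs))
    return wrapper


def sensitive_sink(func):
    """Refuse to call func if any argument is tainted."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for value in list(args) + list(kwargs.values()):
            if isinstance(value, Tainted):
                raise TaintError(f"tainted value passed to {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@untrusted_source
def read_comment():
    return "<script>alert(1)</script>"   # stands in for real user input


@sanitizer
def escape_html(text):
    return html.escape(text)


@sensitive_sink
def render(fragment):
    return "<p>" + fragment + "</p>"


comment = read_comment()
try:
    render(comment)                      # blocked: the comment is still tainted
except TaintError as exc:
    print("blocked:", exc)

print(render(escape_html(comment)))      # allowed once sanitized
```

The application code still has to be touched, since each source, sanitizer, and sink needs a decorator, but the changes stay at the function boundaries rather than spreading through the body of the program.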

Juanjo presented at OWASP App Sec 2010 in Stockholm. You can read or watch his presentation online. The source code for the library is available here.

More information about taint mode can be found on the PythonSecurity.org wiki.