Big Bird: killing sqli with graph homomorphisms
Sun 26 June 2022 — download

Sometimes in 2017, while blotus, bui and I were brainstorming what mitigations we wanted in Snuffleupagus, we really wanted to kill SQL Injections, but it's a hard problem.

During an evening train ride back from a mission for a customer, we were idly drinking beers and tossing ideas around on how to solve this intractable issue. Then I remembered that in 2015, I saw TQ's "Square Peg, Round Hole: Automatically stopping injection attacks with shape matching" at Berlinsides as well as Database Firewall from Scratch by raz0r and dnkolegov, and thought that they would be a good starting point. Except that we would do it at the PHP level instead of at the database one, granting us more context about what is happening. So I drafted a design document, in French, under the codename Big Bird and this blogpost is a rough translation of it.

Main idea

The crux here is to assume that only a fraction of a website's SQL queries is malicious, and that it's possible to detect those that aren't by comparing the AST of the original request against the AST with user inputs replaced with constants. Replacements must be done in a (parallelizable) combinatorial way, for every input source: GET, POST, COOKIES, …

Doing so opens the door to second order exploits, allowing an attacker to send substrings of requests to generate false-positives, like sending SELECT A, B FROM to be passed to SELECT A, B FROM MYTABLE WHERE A.ID = "?";, resulting in CONSTANT MYTABLE WHERE A.ID = "CONSTANT". This can be used to arbitrary block some requests, but on the bright side, it's not impossible to generate false-negatives, up to homomorphisms of course, but if your application is vulnerable to data-only SQL attacks, you've got logic issues that can't be mitigated in a generic way anyway.

A table of legitimate AST can be cached in a hashtable, with something like H(file_name | line_number | simplified_stacktrace_to_remove_loops_and_dups | …) used as keys to tie legitimate SQL requests AST to where they're actually used/called from. This can be used to immediately identify legitimate requests, instead of doing the (costly) "replace user data and compare" dance, if someone sent a request with the same AST before.

Example

Original query: SELECT * FROM T WHERE ID="$1";

If $1 is set to 1, the query will be SELECT * FROM T WHERE ID="1", parsed as S * F T W C="STR" without constantification, and SELECT * FROM T WHERE ID="CST" aka S * F T W C="STR" with: they have the same AST, it's a legitimate query, we should store it into the cache.

If $1 is set to 1" OR 1 = 1, the query will be SELECT * FROM T WHERE ID="1" OR 1 = 1", parsed as S * F T W C="STR" O 1 = 1" without constantification, and SELECT * FROM T WHERE ID="CST" aka S * F T W C="CST": the two AST are different, and S * F T W C="STR" O 1 = 1" has never been seen before (it's not in the cache), meaning that this is an injection (or a false-positive).

If $1 is set to WHERE, the query will be SELECT * FROM T WHERE ID="WHERE", parsed as S * F T W C="STR" without constantification, and SELECT * FROM T CST ID="CST" aka CST * F T W C="CST", but while the AST are different, S * F T W C="STR" is present in the cache, meaning that this isn't an injection.

Limitations

An attacker could DoS its own process by sending a lot of variables, but isn't able to poison the known-legitimate-AST cache. The major limitation is that if the user's input is transformed before reaching the database, all bets are off, but this should™ be quite rare. Another pitfall is that an attacker is able to generate false-positives for never-seen-before legitimate queries, allowing to block the execution of arbitrary queries. This can be mitigated by using a crawler, or using a non-blocking mode for some time, to populate the cache.

Current and future status

I have a rough proof-of-concept of the implementation, but never published it, since it was quite brittle, horribly hackish and with a terrifying SQL parser contraption. I don't plan on implementing it further in Snuffleupagus, but I'm always happy to review and merge pull-requests if one feels inclined to send some.

Moreover, if you think of improvements or bypasses, please do tell me!

This isn't a fresh idea anyway, similar work has been presented in 2005, and has roots in Robert J. Hansen and Meredith L. Patterson's 2005 paper, Guns and Butter: Towards Formal Axioms of Input Validation