Sometimes in 2017, while blotus, bui and I were brainstorming what mitigations we wanted in Snuffleupagus, we really wanted to kill SQL Injections, but it's a hard problem.
During an evening train ride back from a mission for a customer, we were idly drinking beers and tossing ideas around on how to solve this intractable issue. Then I remembered that in 2015, I saw TQ's "Square Peg, Round Hole: Automatically stopping injection attacks with shape matching" at Berlinsides as well as Database Firewall from Scratch by raz0r and dnkolegov, and thought that they would be a good starting point. Except that we would do it at the PHP level instead of at the database one, granting us more context about what is happening. So I drafted a design document, in French, under the codename Big Bird and this blogpost is a rough translation of it.
Main idea
The crux here is to assume that only a fraction of a website's SQL
queries is malicious, and that it's possible to detect those that aren't by
comparing the AST of the
original request against the AST with user inputs replaced with constants. Replacements
must be done in a (parallelizable) combinatorial way, for every input source:
GET, POST, COOKIES, …
Doing so opens the door to second order exploits, allowing an attacker to send
substrings of requests to generate false-positives, like sending SELECT A, B
FROM to be passed to SELECT A, B FROM MYTABLE WHERE A.ID = "?";, resulting
in CONSTANT MYTABLE WHERE A.ID = "CONSTANT". This can be used to arbitrary
block some requests, but on the bright side, it's not impossible to generate
false-negatives, up to
homomorphisms of course, but if
your application is vulnerable to data-only SQL attacks, you've got logic
issues that can't be mitigated in a generic way anyway.
A table of legitimate AST can be cached in a hashtable, with something like
H(file_name | line_number | simplified_stacktrace_to_remove_loops_and_dups |
…) used as keys to tie legitimate SQL requests AST to where they're actually
used/called from. This can be used to immediately identify legitimate requests,
instead of doing the (costly) "replace user data and compare" dance, if someone
sent a request with the same AST before.
Example
Original query: SELECT * FROM T WHERE ID="$1";
If $1 is set to 1, the query will be SELECT * FROM T WHERE ID="1",
parsed as S * F T W C="STR" without constantification, and
SELECT * FROM T WHERE ID="CST" aka S * F T W C="STR" with: they have the
same AST, it's a legitimate query, we should store it into the cache.
If $1 is set to 1" OR 1 = 1, the query will be SELECT * FROM T WHERE ID="1" OR 1 = 1",
parsed as S * F T W C="STR" O 1 = 1" without constantification, and
SELECT * FROM T WHERE ID="CST" aka S * F T W C="CST": the two AST are
different, and S * F T W C="STR" O 1 = 1" has never been seen before (it's
not in the cache), meaning that this is an injection (or a false-positive).
If $1 is set to WHERE, the query will be SELECT * FROM T WHERE ID="WHERE",
parsed as S * F T W C="STR" without constantification, and
SELECT * FROM T CST ID="CST" aka CST * F T W C="CST", but while the
AST are different, S * F T W C="STR" is present in the cache, meaning that
this isn't an injection.
Limitations
An attacker could DoS its own process by sending a lot of variables, but isn't able to poison the known-legitimate-AST cache. The major limitation is that if the user's input is transformed before reaching the database, all bets are off, but this should™ be quite rare. Another pitfall is that an attacker is able to generate false-positives for never-seen-before legitimate queries, allowing to block the execution of arbitrary queries. This can be mitigated by using a crawler, or using a non-blocking mode for some time, to populate the cache.
Current and future status
I have a rough proof-of-concept of the implementation, but never published it, since it was quite brittle, horribly hackish and with a terrifying SQL parser contraption. I don't plan on implementing it further in Snuffleupagus, but I'm always happy to review and merge pull-requests if one feels inclined to send some.
Moreover, if you think of improvements or bypasses, please do tell me!
This isn't a fresh idea anyway, similar work has been presented in 2005, and has roots in Robert J. Hansen and Meredith L. Patterson's 2005 paper, Guns and Butter: Towards Formal Axioms of Input Validation