Lecture 18 - Automatic Creation of SQL Injection and Cross-Site Scripting Attacks
Meta-Q: What do we think so far about the differences (or not) between work in automated testing/security in SE vs Security vs PL?
Background
PHP + CGI execution model
User makes a request to a web server -> web server invokes a program to service that request
So the challenge for doing dynamic analysis: each request is a separate program execution, so we need to correlate the data from multiple user requests
XSS + SQLi vulnerabilities
OWASP top 10
SQL injection example: SELECT * from users where userName='$THE_USER'
Malicious user sends username: 1' OR '1'='1
Evaluates to: SELECT * FROM users where userName='1' OR '1'='1'
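The injection above can be reproduced in a few lines. This is a minimal sketch in Python with an in-memory SQLite database rather than the PHP + MySQL setup in the notes; the table contents and the find_user_unsafe helper are made up for illustration.

```python
import sqlite3

# Throwaway in-memory database with two hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userName TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def find_user_unsafe(user_name):
    # Vulnerable on purpose: the input is pasted straight into the SQL text.
    query = "SELECT * FROM users WHERE userName='" + user_name + "'"
    return query, conn.execute(query).fetchall()


benign_query, benign_rows = find_user_unsafe("alice")
attack_query, attack_rows = find_user_unsafe("1' OR '1'='1")

print(attack_query)
print(len(benign_rows), "row(s) for the benign input")
print(len(attack_rows), "row(s) for the attack input")
```

The benign lookup matches one user, while the attack input turns the WHERE clause into a condition that is true for every row.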
How could we prevent this from being exploited in this way?
- Input validation - do not allow certain characters as input, or certain patterns. Need to do this kind of validation on both the client side and the server side
- Automatically escape every quote in every input that we receive from users!
1' OR '1'='1
->1\' OR \'1\'=\'1
SELECT * FROM users where userName='1\' OR \'1\'=\'1'
- “Magic Quotes”
- Problem with this: Defensive programmers end up double-slashing things because they escape it, too
- Avoid writing code this way - use “PreparedStatements”:
SELECT * from users where userName=?
and then something like set_parameter(1, $THE_USER)
- The actual implementation of how the prepared statement works depends on your database implementation, and also the implementation of your database API
- Equifax hack - vulnerability in input validation for Apache Struts
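As a concrete illustration of the prepared-statement bullet above, here is a minimal sketch using Python's sqlite3 module, where the ? placeholder plays the role of the parameter slot. The table and the find_user helper are assumptions for the example, and other database APIs spell placeholders differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userName TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])


def find_user(user_name):
    # The value is bound to the ? placeholder as data; it is never
    # spliced into the SQL text, so quotes in it cannot change the query.
    cursor = conn.execute(
        "SELECT * FROM users WHERE userName=?", (user_name,)
    )
    return cursor.fetchall()


attack_rows = find_user("1' OR '1'='1")
alice_rows = find_user("alice")

print(attack_rows)  # no user has that literal name
print(alice_rows)
```

With the placeholder, the attack string is just an odd user name that matches nothing.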
What is the problem being solved here?
- Generating inputs?
- Try to solve for more paths through the inputs
- This “throw everything at the wall and see what sticks” approach might be effective for small programs, but it is unclear whether it works for big ones
- Detecting vulnerabilities?
- (Especially non-crashing ones)…
Why would you not solve this problem with static analysis?
- False positives - particularly with the database boundary to get the “2nd order XSS” vulnerabilities
High level approach
Taint tracking
int x = Tainted(5);
int y = 10;
int z = x + y;
if(isTainted(z)){
}
$THE_USER = $_GET['user']; //taint source
$THE_USER = addslashes($THE_USER); //Sanitizer?
$result = mysql_query("SELECT * from users where userName='$THE_USER'"); //taint sink
- Generate some inputs with the goal of reaching vulnerable sinks
- Taint that input, use dynamic taint tracking to see if that input flows to a vulnerable sink
- If yes, try to inject an attack through that input
- If the attack was injected, see if it “succeeded” as an exploit
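The source / sanitizer / sink flow above can be sketched with a string subclass that carries a taint bit. This is only an illustration of dynamic taint tracking in Python, not how the tool in the paper is implemented; Tainted, source, sanitize, sink and reports are hypothetical names, and taint only survives + concatenation here (f-strings and str.format would silently drop it).

```python
class Tainted(str):
    """A string that remembers it came from an untrusted source."""

    def __add__(self, other):
        # tainted + anything stays tainted
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # anything + tainted stays tainted
        return Tainted(str.__add__(other, self))


def is_tainted(value):
    return isinstance(value, Tainted)


def source(raw):
    # Stand-in for a taint source such as $_GET['user'].
    return Tainted(raw)


def sanitize(value):
    # Hypothetical sanitizer: escapes quotes and returns a plain str,
    # which clears the taint (whether the escaping is adequate or not).
    return str(value).replace("'", "\\'")


reports = []


def sink(query):
    # Stand-in for a sensitive sink such as mysql_query().
    if is_tainted(query):
        reports.append(query)


user = source("1' OR '1'='1")
sink("SELECT * FROM users WHERE userName='" + user + "'")            # reported
sink("SELECT * FROM users WHERE userName='" + sanitize(user) + "'")  # not reported

print(len(reports), "tainted query reached the sink")
```

Note that the second call goes unreported purely because a sanitizer ran, not because the query was proven safe.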
What do they do about sanitizers?
- A sanitizer function will clear the taint set
- Unclear what their sanitizers are
- Different sanitizers might work better for different sensitive sinks
- Might miss some true positives that were induced by using the wrong sanitizer
- Would have been an interesting evaluation to compare: how many of the reports get filtered out by sanitizers, which may or may not represent actual vulnerabilities?
- Interesting though that false positive cost is primarily machine time: you can always validate a report as a “vulnerability” or not based on whether or not you can induce one
- Better models of sanitizers can help mitigate this problem, but they come with a maintenance burden as the language evolves, and might still be prone to developer errors, particularly when there are custom sanitizers
- XSS sanitization is tricky and depends on where the value flows into the browser:
<script>
alert($THE_EVIL_INPUT);
eval($THE_EVIL_INPUT);
// This page was generated by $THE_EVIL_INPUT
</script>
<a href="$THE_EVIL_INPUT">Click here!</a>
<$THE_EVIL_INPUT></$THE_EVIL_INPUT>
Example: <a href="$THE_EVIL_INPUT">Click here!</a>
Depending on sanitizer…
- No sanitizer at all
$THE_EVIL_INPUT= '"><script language="JavaScript">alert("hahaha");</script>'
- Sanitizer: prevent me from adding double quotes
$THE_EVIL_INPUT="javascript:alert('haha');"
<img src="$THE_EVIL_INPUT" /> <- could provide a URL to something malicious
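The href example can be made concrete with a small sketch. It uses Python's html.escape as a stand-in for an HTML-escaping sanitizer; render_link and is_safe_link are hypothetical helpers, and the scheme allowlist is an assumption for illustration, not a complete defense.

```python
import html
from urllib.parse import urlparse

payload_breakout = '"><script language="JavaScript">alert("hahaha");</script>'
payload_scheme = "javascript:alert('haha');"


def render_link(url):
    # HTML-escape the value before placing it in the href attribute.
    return '<a href="' + html.escape(url, quote=True) + '">Click here!</a>'


def is_safe_link(url):
    # Context-specific check: only allow ordinary web URL schemes.
    return urlparse(url).scheme in ("http", "https")


escaped_breakout = render_link(payload_breakout)
escaped_scheme = render_link(payload_scheme)

print(escaped_breakout)  # the quote and angle brackets are neutralized
print(escaped_scheme)    # still a javascript: URL
print(is_safe_link(payload_scheme))
```

Escaping stops the payload that breaks out of the attribute, but the javascript: payload needs no quotes or angle brackets to begin with, so only a check that understands the URL context rejects it.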
How does Ardilla detect that an attack is successful? (4.3.2)
- SQL injection
- Compare the database statements (and any queries in them) issued for a benign input and for the attack input - parse them, see if you get a different parse tree
- This is pretty good - state-of-the-art at the time was maybe to do regex parsing on SQL
- Might miss vulnerabilities in something like phpMyAdmin, which lets users enter arbitrary MySQL queries and execute them, but duh
- XSS attack checking
- Signal attack if the output contains “additional script-inducing constructs”
- Might be a weak oracle, as evidenced by the variety of injection routes + sanitizers
- Compare the “strict” and “lenient” evaluations - might be an overly simplified model of XSS
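For the SQL injection check, the idea of comparing query structure can be sketched as follows. This is a deliberate simplification: it compares a flat token "shape" instead of a real parse tree, and the tokenizer, build_query and looks_injected are all hypothetical helpers written for this example.

```python
import re

# Very small SQL tokenizer: string literals, numbers, words, and
# single-character operators. Backslash escapes are honored in strings.
TOKEN = re.compile(
    r"""
      (?P<string>'(?:[^'\\]|\\.)*')
    | (?P<number>\d+)
    | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
    | (?P<op>\S)
    """,
    re.VERBOSE,
)


def shape(query):
    # Replace literals with their kind so only the structure remains.
    result = []
    for match in TOKEN.finditer(query):
        if match.lastgroup in ("string", "number"):
            result.append(match.lastgroup)
        else:
            result.append(match.group().upper())
    return result


def build_query(user):
    return "SELECT * FROM users WHERE userName='" + user + "'"


def looks_injected(candidate, benign="benign"):
    # Signal an attack if the candidate input changes the query structure.
    return shape(build_query(candidate)) != shape(build_query(benign))


print(looks_injected("alice"))
print(looks_injected("1' OR '1'='1"))
print(looks_injected("1\\' OR \\'1\\'=\\'1"))  # quotes already escaped
```

An input that stays inside the string literal leaves the shape unchanged, while the attack input adds an OR clause and is flagged.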
Concrete + symbolic database
Need to do taint tracking through the database - not just in PHP, since data is persisted in DB
It’s engineering - lots of work to do it, but it’s something that you need for this system to actually work
To make this general: also have to worry about triggers, complex queries like INSERT INTO X SELECT Y FROM Z, etc.
Alternative approaches?
- Treat everything from the DB as tainted?
- Don’t trust anything that comes out of the DB. Any false positive by your tool should be solved by defensive programming - add sanitizers all over.
- Have some heuristic?
- Try to create the most diverse set of inputs across different requests as possible
- Before storing into the DB, record on the side the value being stored, e.g. “ABCDEFG”
- Any time you read anything from the DB, look for “ABCDEFG”, and if you see that exact value, apply taint mark T
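The value-matching heuristic above can be sketched as a side table keyed by the stored value. This is only an illustration of the alternative discussed here, not the concrete + symbolic database from the paper; TaintLedger, store_comment and load_comments are hypothetical names, and the in-memory SQLite table stands in for the application's database.

```python
import sqlite3


class TaintLedger:
    """Side record of which stored values were tainted, keyed by exact value."""

    def __init__(self):
        self._marks_by_value = {}

    def before_store(self, value, marks):
        if marks:
            self._marks_by_value.setdefault(value, set()).update(marks)

    def after_read(self, value):
        # Exact match only: a value the DB truncated or concatenated is missed,
        # and an untainted value that happens to be equal is falsely marked.
        return set(self._marks_by_value.get(value, ()))


ledger = TaintLedger()
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (body TEXT)")


def store_comment(body, marks):
    ledger.before_store(body, marks)
    conn.execute("INSERT INTO comments VALUES (?)", (body,))


def load_comments():
    rows = conn.execute("SELECT body FROM comments").fetchall()
    return [(body, ledger.after_read(body)) for (body,) in rows]


store_comment("ABCDEFG", {"T"})   # came from a tainted request parameter
store_comment("welcome", set())   # written by the application itself
loaded = load_comments()
print(loaded)
```

Generating diverse inputs across requests matters here because two requests that store the same value would be indistinguishable in the ledger.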
Idealist view: defensive programming! Don’t even try to solve this problem. Use a static checker that will find candidates for this bug, force remediation.
Pragmatic view: We already have a lot of code, and it was written in PHP, so we don’t really believe that many good choices were made in its design or implementation :)
Where have these things gone in the past 12 years?
- SQL injection
- We don’t use SQL anymore, or
- We use prepared statements
- XSS
- Some browsers have implemented some filters
- Frontend frameworks have helped a lot also
- LGTM + CodeQL
- Have not solved input injection broadly, but we are more cognizant of the risks (better education) and have developed linguistic/API approaches that make it easier to do it right
- Constant trade-off in expressiveness vs security