
Lecture 18 - Automatic Creation of SQL Injection and Cross-Site Scripting Attacks

Meta-Q: What do we think so far about the differences (or not) between work in automated testing/security in SE vs Security vs PL?

Background

PHP + CGI execution model

User makes a request to a web server -> web server invokes a program to service that request.

So the challenge for dynamic analysis: each request is a separate program execution, so we need to correlate the data from multiple user requests.

XSS + SQLi vulnerabilities

OWASP top 10

SQL injection example:
  SELECT * from users where userName='$THE_USER'
Malicious user sends username:
  1' OR '1'='1
Evaluates to:
  SELECT * FROM users where userName='1' OR '1'='1'

How could we prevent this from being exploited in this way?

  • Input validation - do not allow certain characters as input, or certain patterns. Need to do this kind of validation on both the client side and the server side
  • Automatically escape every quote in every input that we receive from users!
    • 1' OR '1'='1 -> 1\' OR \'1\'=\'1
    • SELECT * FROM users where userName='1\' OR \'1\'=\'1'
    • “Magic Quotes”
    • Problem with this: Defensive programmers end up double-slashing things because they escape it, too
  • Avoid writing code this way - use “PreparedStatements”: SELECT * from users where userName=? and then something like set_parameter(1, $THE_USER)
    • The actual implementation of how the prepared statement works depends on your database implementation, and also the implementation of your database API
  • Equifax hack - vulnerability in input validation for Apache Struts
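
The difference between the concatenated query and the prepared statement can be sketched as follows. This is a minimal illustration in Python with the standard `sqlite3` module rather than PHP, so that it is self-contained; the in-memory `users` table, its two rows, and the function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical in-memory table standing in for the `users` table in the notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userName TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "1' OR '1'='1"


def lookup_concatenated(user_name):
    # Vulnerable: the input is pasted into the SQL text, so its quotes
    # can change the structure of the query.
    query = "SELECT * FROM users WHERE userName='" + user_name + "'"
    return conn.execute(query).fetchall()


def lookup_prepared(user_name):
    # Parameterized: the input is bound to the ? placeholder as a value
    # and is never parsed as SQL.
    return conn.execute(
        "SELECT * FROM users WHERE userName=?", (user_name,)
    ).fetchall()


leaked = lookup_concatenated(malicious)  # the OR '1'='1' clause matches every row
safe = lookup_prepared(malicious)        # no user has this literal name
```

With the concatenated query the malicious input returns every row; with the placeholder it is treated as an (unmatched) user name.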

What is the problem being solved here?

  • Generating inputs?
    • Try to solve for more paths through the inputs
    • This “throw everything at the wall and see what sticks” approach might be effective for small programs, but unsure about big ones
  • Detecting vulnerabilities?
    • (Especially non-crashing ones)…

Why would you not solve this problem with static analysis?

  • False positives - particularly with the database boundary to get the “2nd order XSS” vulnerabilities

High level approach

Taint tracking

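	int x = Tainted(5);   // x carries a taint mark (taint source)
	int y = 10;           // y is untainted
	int z = x + y;        // taint propagates: z is derived from x
	if(isTainted(z)){
		// reached: z depends on a tainted value
	}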

$THE_USER = $_GET['user']; //taint source

$THE_USER = addslashes($THE_USER); //Sanitizer?

$result = mysql_query("SELECT * from users where userName='$THE_USER'"); //taint sink

The approach, in four steps:

  1. Generate some inputs with the goal of reaching vulnerable sinks
  2. Taint that input, use dynamic taint tracking to see if that input flows to a vulnerable sink
  3. If yes, try to inject an attack string via that input
  4. If an attack was injected, check whether it “succeeded” as an exploit
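
The source / sanitizer / sink flow above can be sketched with an explicit taint bit. This is a toy model in Python, not how the tool instruments PHP: the `Val` wrapper, `source`, and `add_slashes` are hypothetical names, and `add_slashes` is only a rough analogue of PHP's `addslashes`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Val:
    """A string paired with a taint bit (hypothetical helper for this sketch)."""
    text: str
    tainted: bool = False

    def __add__(self, other):
        # Propagation rule: the result is tainted if either operand is.
        return Val(self.text + other.text, self.tainted or other.tainted)


def source(text):
    # Taint source: anything that comes from the user starts out tainted.
    return Val(text, tainted=True)


def add_slashes(value):
    # Sanitizer: escape backslashes and quotes, then clear the taint mark.
    escaped = (
        value.text.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')
    )
    return Val(escaped, tainted=False)


prefix = Val("SELECT * from users where userName='")
suffix = Val("'")
raw = source("1' OR '1'='1")

unsanitized = prefix + raw + suffix             # tainted data reaches the sink
sanitized = prefix + add_slashes(raw) + suffix  # taint cleared by the sanitizer
```

A sink check is then just a test of `query.tainted` before the query is executed. Note that clearing the taint says nothing about whether the sanitizer was the right one for that sink.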

What do they do about sanitizers?

  • A sanitizer function will clear the taint set
  • Unclear what their sanitizers are
  • Different sanitizers might work better for different sensitive sinks
  • Might miss some true positives that were induced by using the wrong sanitizer
    • Would have been an interesting evaluation to compare: how many of the reports get filtered out by sanitizers, which may or may not represent actual vulnerabilities?
    • Interesting though that false positive cost is primarily machine time: you can always validate a report as a “vulnerability” or not based on whether or not you can induce one
    • Better models of sanitizers can help mitigate this problem, but they come with a maintenance burden as the language evolves, and might still be prone to developer errors, particularly when there are custom sanitizers
    • XSS sanitization is tricky and depends on where the value flows into the browser:
      <script>
      alert($THE_EVIL_INPUT);
      eval($THE_EVIL_INPUT);
      // This page was generated by $THE_EVIL_INPUT
      </script>
      <a href="$THE_EVIL_INPUT">Click here!</a>
      <$THE_EVIL_INPUT></$THE_EVIL_INPUT>
      

Example: <a href="$THE_EVIL_INPUT">Click here!</a>

Depending on sanitizer…

  • No sanitizer at all
    • $THE_EVIL_INPUT= '"><script language="JavaScript">alert("hahaha");</script>'
  • Sanitizer: prevent me from adding double quotes
    • $THE_EVIL_INPUT="javascript:alert('haha');"

<img src="$THE_EVIL_INPUT" /> <- could provide a URL to something malicious
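
To make the context dependence concrete, here is a small sketch in Python using the standard `html.escape` as the sanitizer. The `render_link` helper is hypothetical. The point is only that escaping HTML metacharacters stops the quote-breakout payload but leaves a `javascript:` URL intact.

```python
import html


def render_link(user_value):
    # Hypothetical template: escape HTML metacharacters, then place the
    # value inside an href attribute.
    return '<a href="' + html.escape(user_value, quote=True) + '">Click here!</a>'


breakout = '"><script language="JavaScript">alert("hahaha");</script>'
js_url = "javascript:alert('haha');"

blocked = render_link(breakout)    # quotes and angle brackets become entities
still_live = render_link(js_url)   # the href still starts with javascript:
```

A sanitizer that is adequate for HTML body text is therefore not adequate for a URL attribute, where the scheme itself has to be checked.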

How does Ardilla detect that an attack is successful? (4.3.2)

  • SQL injection
    • Compare the database statements (and any queries in them) - parse them, see if you get different parse tree
    • This is pretty good - state-of-the-art at the time was maybe to do regex parsing on SQL
      • Might miss vulnerabilities in something like phpMyAdmin, which lets users enter arbitrary MySQL queries and execute them, but that is the application's intended behavior
  • XSS attack checking
    • Signal attack if the output contains “additional script-inducing constructs”
    • Might be a weak oracle, as evidenced by the variety of injection routes + sanitizers
      • Compare the “strict” and “lenient” evaluations - might be an overly simplified model of XSS
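
Both oracles can be approximated in a few lines. The sketch below is in Python and is deliberately crude. It assumes the comparison is between an innocuous run and an attack run. It uses a token "skeleton" as a stand-in for a real SQL parse tree, and it counts `<script>` tags, `on*` attributes, and `javascript:` URLs as a guess at what "script-inducing constructs" covers. All names are hypothetical.

```python
import re
from html.parser import HTMLParser

SQL_TOKEN = re.compile(r"""
    '(?:[^'\\]|\\.|'')*'      # quoted string literal
  | \d+(?:\.\d+)?             # numeric literal
  | [A-Za-z_][A-Za-z_0-9]*    # keyword or identifier
  | [^\sA-Za-z_0-9]           # any other single character
""", re.VERBOSE)


def sql_skeleton(query):
    # Replace literals with "?" so only the query structure remains.
    skeleton = []
    for tok in SQL_TOKEN.findall(query):
        is_string = len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'"
        if is_string or tok[0].isdigit():
            skeleton.append("?")
        else:
            skeleton.append(tok.upper())
    return skeleton


def sql_injection_succeeded(benign_query, attack_query):
    # Attack "succeeds" if it changed the structure, not just a literal.
    return sql_skeleton(benign_query) != sql_skeleton(attack_query)


class ScriptConstructCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.count += 1
        for name, value in attrs:
            if name.startswith("on"):  # crude match for event handlers
                self.count += 1
            elif name in ("href", "src") and (value or "").strip().lower().startswith("javascript:"):
                self.count += 1


def count_script_constructs(page):
    parser = ScriptConstructCounter()
    parser.feed(page)
    parser.close()
    return parser.count


def xss_succeeded(benign_page, attack_page):
    # Attack "succeeds" if the output gained script-inducing constructs.
    return count_script_constructs(attack_page) > count_script_constructs(benign_page)
```

For example, `sql_injection_succeeded` flags the `1' OR '1'='1` query from earlier but not its backslash-escaped form, which stays a single string literal.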

Concrete + symbolic database

Need to do taint tracking through the database - not just in PHP, since data is persisted in DB

It’s engineering - lots of work to do it, but it’s something that you need to have for this system to actually work

To make this general, you would also have to worry about triggers, complex queries like INSERT INTO X SELECT Y FROM Z, etc.

Alternative approaches?

  • Treat everything from the DB as tainted?
    • Don’t trust anything that comes out of the DB. Any false positive by your tool should be solved by defensive programming - add sanitizers all over.
  • Have some heuristic?
    • Try to create the most diverse set of inputs across different requests as possible
    • Before storing into the DB, record on the side that the value “ABCDEFG” was tainted
    • Any time you read anything from the DB, look for “ABCDEFG”, and if you see that exact value, apply taint mark T
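
The value-matching heuristic can be sketched as a side ledger next to the database. This is a Python illustration with an in-memory `sqlite3` table; the `TaintLedger` class and the `comments` table are hypothetical, and this is the alternative discussed in the list above, not the concrete + symbolic database itself.

```python
import sqlite3


class TaintLedger:
    """Remember tainted values written to the DB; re-taint exact matches on read."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE comments (body TEXT)")
        self.tainted_values = set()

    def store(self, body, tainted):
        # Record on the side that this exact value was tainted when stored.
        if tainted:
            self.tainted_values.add(body)
        self.db.execute("INSERT INTO comments (body) VALUES (?)", (body,))

    def load_all(self):
        # On read, apply the taint mark to any value seen in the ledger.
        rows = self.db.execute(
            "SELECT body FROM comments ORDER BY rowid"
        ).fetchall()
        return [(body, body in self.tainted_values) for (body,) in rows]


ledger = TaintLedger()
ledger.store("ABCDEFG", tainted=True)   # came from user input
ledger.store("hello", tainted=False)    # came from the application itself
loaded = ledger.load_all()
```

The heuristic breaks down when an untainted value happens to equal a tainted one, or when the database transforms the value before it is read back. That is why diverse, distinctive inputs help.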

Idealist view: defensive programming! Don’t even try to solve this problem. Use a static checker that will find candidates for this bug, force remediation.

Pragmatic view: We already have a lot of code, and it was written in PHP, so we don’t really believe that many good choices were made in its design or implementation :)

Where have these things gone in the past 12 years?

  • SQL injection
    • We don’t use SQL anymore, or
    • We use prepared statements
  • XSS
    • Some browsers have implemented some filters
    • Frontend frameworks have helped a lot also
  • LGTM + CodeQL
  • Have not solved input injection broadly - cognizant of the risks (better education) and developed linguistic/API approaches to make it easier to do it right
    • Constant trade-off in expressiveness vs security

© 2021 Jonathan Bell. Released under the CC BY-SA license