Lecture 19 - HotFuzz: Discovering Algorithmic Denial-of-Service Vulnerabilities Through Guided Micro-Fuzzing
Background
Algorithmic complexity vulnerabilities: DoS on the cheap. Example:
- My server can process 10k requests/sec before crashing
- Hence, the attacker's cost to take it down is 10k requests/sec
- What if I could create a request that takes 1,000x longer to process than any other request?
- Then you can do a DoS with far fewer requests/sec (here, ~10 req/sec)
Why is it hard to defend against these kinds of attacks?
- How do you differentiate between legitimate requests and those that are targeted attacks?
- Traffic in a classic volumetric DoS looks like:
- Steady state: 5 req/sec
- All of a sudden: 15k req/sec
- But with algorithmic complexity, traffic looks like:
- Steady state: 5 req/sec
- Attack: 5 req/sec
- What are the characteristics of inputs that are exploiting algorithmic complexity vulnerabilities?
- (These might also be valid inputs that find performance bugs)
- Timing?
- But: How do you compare the requests to identify if you are being potentially attacked at any given moment?
- “Classic” example of algorithmic DoS: execution time (or memory use) that is super-linear in the size of the input. The “billion laughs” XML bomb below expands a sub-1 KB document into 10^9 “lol” strings (roughly 3 GB):
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
DARPA STAC
- Space/Time Analysis for Cybersecurity program
- Other fuzzers that find AC vulnerabilities:
- Number of executed bytecode instructions -> proxy for runtime
- Dynamic approaches for approximating complexity - empirically fitting a big-O curve (O(n), O(n²), …) to observed runtimes
- “Ground truth dataset”
What is the big problem that HotFuzz is solving?
- Oracle problem: Existing fuzzers use an oracle of “program must crash,” or a sanitizer that deliberately crashes the program when it detects an error
- “Hard to get broad code coverage”
- Micro-fuzzing vs API fuzzing?
- Problem with existing API fuzzing work: “Developers need to write harnesses”
- Q: Why do those existing techniques require harnesses?
- You might randomly violate preconditions, which results in generating some call that could never actually happen in practice
- Q: Does micro-fuzzing solve that problem? No - it can produce false positives
- “Determining whether an adversary can transform these test cases into working AC exploits on victim programs is outside the scope of this work”
What is the solution?
Q: What does HotFuzz use as an oracle to determine that there is an AC vulnerability?
- Execution time (set a timeout)
- Q: Ideas for an alternative oracle?
- Throw a range of inputs at the system to map input size vs. runtime - then look for small inputs whose runtime matches or exceeds that of much larger inputs
- What about sequences of inputs?
- Classic example: hash collisions in a table - you exploit this vulnerability by crafting a sequence of inputs that all collide in the hash table; it's not obvious as a vulnerability if you look at just one input (see the sketch after this list)
- Can you draw a baseline for performance under normal operation, and then find inputs that are outliers?
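A minimal sketch of that hash-collision attack (my illustration, not from the paper): in Java, "Aa" and "BB" have the same String.hashCode() (2112), so concatenating n such pairs yields 2^n distinct keys that all collide. No single key looks suspicious; only the sequence is pathological.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashCollisionDemo {
    // Build 2^pairs distinct strings that all share one hashCode().
    static List<String> collidingKeys(int pairs) {
        List<String> keys = new ArrayList<>();
        keys.add("");
        for (int i = 0; i < pairs; i++) {
            List<String> next = new ArrayList<>();
            for (String k : keys) {
                next.add(k + "Aa"); // "Aa".hashCode() == "BB".hashCode() == 2112
                next.add(k + "BB");
            }
            keys = next;
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = collidingKeys(15); // 32,768 colliding keys
        Map<String, Integer> map = new HashMap<>();
        long start = System.nanoTime();
        for (String k : keys) map.put(k, 1);
        // A pre-Java-8 HashMap chains all of these into one bucket's linked
        // list, making each insert O(n) and the loop O(n^2); Java 8+
        // treeifies the overloaded bucket, capping it at O(log n).
        System.out.printf("inserted %d keys in %.1f ms%n",
                keys.size(), (System.nanoTime() - start) / 1e6);
    }
}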
- Where is the line drawn between a performance bug and an algorithmic complexity vulnerability?
- What is a performance bug but not a complexity vulnerability?
- Maybe it’s primarily tied to: “Can an attacker deliberately trigger it”
- Or: “An AC vulnerability is a performance bug that is found by an attacker, and not by you”
- Why not use a static approach, or combine a static approach with a dynamic one?
Micro-fuzzing
Fuzzing Java methods… Java methods take inputs that are typed: String, List, int, etc. Q: How do they generate these typed inputs?
- IVI - Identity Value Initialization
- Fixed set of default “identity” values per type (0, empty string, null, …)
- SRI - Small Recursive Instantiation
- Random values selected from some set of possible values (sketch of both strategies below)
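A minimal sketch of how IVI and SRI could look (the strategy names are the paper's; this code shape is my assumption, not HotFuzz's implementation):

import java.lang.reflect.Constructor;
import java.util.Random;

public class SeedGen {
    static final Random rnd = new Random();

    // IVI: one fixed "identity" default per type.
    static Object ivi(Class<?> t) {
        if (t == int.class || t == Integer.class) return 0;
        if (t == boolean.class || t == Boolean.class) return false;
        if (t == String.class) return "";
        return null; // identity value for other reference types
    }

    // SRI: random primitives; reference types are built recursively by
    // instantiating some public constructor, with a depth bound to keep
    // the generated values small.
    static Object sri(Class<?> t, int depth) {
        if (t == int.class || t == Integer.class) return rnd.nextInt(100);
        if (t == boolean.class || t == Boolean.class) return rnd.nextBoolean();
        if (t == String.class) return Long.toHexString(rnd.nextLong());
        if (depth <= 0) return null;
        for (Constructor<?> c : t.getConstructors()) {
            Class<?>[] params = c.getParameterTypes();
            Object[] args = new Object[params.length];
            for (int i = 0; i < params.length; i++)
                args[i] = sri(params[i], depth - 1);
            try {
                return c.newInstance(args);
            } catch (ReflectiveOperationException | RuntimeException ignored) {
                // constructor rejected the random arguments - try the next one
            }
        }
        return null;
    }
}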
- Do we care about “Validity” in the sense that Zest did?
- No - try to directly fuzz some code while bypassing any complex checks or type invariants
- Evolve the input corpus using mutation + crossover
- Consider AFL/Zest - they perform these changes on a raw bytestream input. HotFuzz instead manipulates typed values like java.util.List directly
- How do they mutate a list?
- Mutate a random attribute - only strings, primitives, and arrays of primitives get mutated (sketch below)
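A sketch of that attribute-level mutation via reflection (assumed shape, not the paper's code):

import java.lang.reflect.Field;
import java.util.Random;

public class Mutator {
    static final Random rnd = new Random();

    // Pick one field at random; rewrite it only if it is a primitive,
    // a String, or an array of primitives - other fields are left alone.
    static void mutate(Object obj) throws IllegalAccessException {
        Field[] fields = obj.getClass().getDeclaredFields();
        if (fields.length == 0) return;
        Field f = fields[rnd.nextInt(fields.length)];
        f.setAccessible(true);
        Class<?> t = f.getType();
        if (t == int.class) {
            f.setInt(obj, rnd.nextInt());
        } else if (t == String.class) {
            f.set(obj, Long.toHexString(rnd.nextLong()));
        } else if (t == int[].class) {
            int[] arr = (int[]) f.get(obj);
            if (arr != null && arr.length > 0)
                arr[rnd.nextInt(arr.length)] = rnd.nextInt(); // perturb one element
        }
        // (other primitive and primitive-array types would be handled alike)
    }
}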
- How do they do crossover on a list?
- Roughly speaking: take two objects, pick some field at random as the crossover point, exchange field values between two objects at that point
- Example of crossover (generally):
- Input 1: XXXXXXX
- Input 2: YYYYYYY
- Mutate Input 1:
- XXXXXXXZ
- Crossover Inputs 1, 2:
- XXXYYYYY
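A reflection-based sketch of that crossover (again my assumption, not the paper's code): pick a random field index as the crossover point and swap every field from that point onward between the two objects.

import java.lang.reflect.Field;
import java.util.Random;

public class Crossover {
    // Exchange all fields at or after a random crossover point between
    // two instances of the same class.
    static void crossover(Object a, Object b, Random rnd)
            throws IllegalAccessException {
        Field[] fields = a.getClass().getDeclaredFields();
        if (fields.length == 0) return;
        int point = rnd.nextInt(fields.length);
        for (int i = point; i < fields.length; i++) {
            Field f = fields[i];
            f.setAccessible(true);
            Object tmp = f.get(a);
            f.set(a, f.get(b));
            f.set(b, tmp);
        }
    }
}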
- “Witness synthesis”
- These AC-inducing inputs are found in an unrealistic environment - not a normal JVM. So synthesize a Java program (the “witness”) that triggers the same input on an unmodified JVM
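A hypothetical shape for a synthesized witness (the ReDoS target below is my illustration, not one of the paper's findings): a standalone program that rebuilds the AC-inducing input and times the target call on a stock JVM.

public class Witness {
    public static void main(String[] args) {
        // Reconstructed AC-inducing input: a failing match against a
        // backtracking-heavy regex takes time exponential in the number
        // of 'a's - expect a noticeable pause here.
        String input = "aaaaaaaaaaaaaaaaaaaaaaaaa"; // 25 a's, no trailing 'b'
        long start = System.nanoTime();
        boolean matched = input.matches("(a+)+b");
        System.out.printf("matched=%b in %.1f ms%n",
                matched, (System.nanoTime() - start) / 1e6);
    }
}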
Implementation
- Patch on OpenJDK
- 1,007 lines of C++ code, 5,497 lines of Java code, 288 lines of Python code - is this a lot or a little? :)
- Complexity of the program, general correctness?
Q: What are the features that they need to get out of the JVM, and why change it?
- Timing measurements for just a single method at a time - presumably ms or ns granularity (see the CPU-time sketch after this list)
- “Didn’t want to alter the program bytecode”
- Focus on performance, but still run evaluation in interpreted mode?
- Interesting discussion of JIT compilation + interpretation
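For contrast, a stock JVM already exposes per-thread CPU time via ThreadMXBean - shown below as my comparison point; HotFuzz instead patches OpenJDK itself so that measurement requires no bytecode or harness changes.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeProbe {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long before = bean.getCurrentThreadCpuTime(); // ns of CPU time, this thread only
        StringBuilder sb = new StringBuilder();       // stand-in for the method under test
        for (int i = 0; i < 1_000_000; i++) sb.append('x');
        long elapsed = bean.getCurrentThreadCpuTime() - before;
        System.out.println("CPU time: " + elapsed + " ns");
    }
}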
- Why not generate harnesses, which collect timing info?
- In particular, “JMH” (the Java Microbenchmark Harness) is the microbenchmarking toolkit developed by the OpenJDK developers themselves - one could build a fuzzer that uses these harnesses to precisely collect timing information, and…
- Also offers a solution for controlling GC
- If you fuzz functions that are normally JIT’ed, and you do it in interpreted mode, will it change the vulnerabilities that you find?
- Yes! They use the “witness” to validate on a normal (JIT-enabled) JVM, but that is still subject to “did this get hot enough to get JIT’ed?”
- Fuzz things besides C! Interesting problems here, not fully worked out:
- How to fuzz typed methods (ideally creating “valid” inputs)? [Input validity]
- How to detect a bug/vulnerability? [Oracle problem]