Discussion: Fuzzing - “Automated Testing of Graphics Shader Compilers” (Donaldson et al., 2017)
Poll:
To what extent do you think that this work could impact SE? Not very: 0 Unsure/so-so: 5 Very: 15
- Coming up with those semantics-preserving changes for new languages/domains can be tricky (as is finding ones that are defect-revealing)
- Having multiple implementations of shader compilers gives “more bang for buck” - find bugs in multiple implementations concurrently
- An impressive effort
To what extent do you think that the article is convincing that this is the best approach for testing graphics shader compilers? Not very: 0 Unsure/so-so: 8 Convinced: 11
- Are there potential inconsistencies/biases based on the shaders that they used as inputs - would a qualitative analysis of the shaders help provide more confidence?
- “High-value” bugs are a good target
Context and background
- Why are graphics shader compilers difficult to test?
- No “correct” answer
- Why not do “differential” testing - compare “fuzzily” the output of two compilers?
- Requires multiple GPUs w/ benchmarks - need a precise definition of “equivalent”
- Still need to determine which one has the bug
- Vendor-specific extensions prevent this from being used fully
- What is the relationship between this work and testing approximate computing?
- What is the rigorous definition of “close”? For shader compiler fuzzing, we might need to have a human evaluator (“the fog didn’t render but that was OK”)
Metamorphic testing
Example: `sin(x)` - a function we want to test, but we don’t know what the outputs should be
Metamorphic relation: Specify some property that describes a transformation of the input, and a corresponding transformation on the output.
sin(-x)=-sin(x)
sin(2pi+x) = sin(x)
integral from 0 to 2pi of sin(x) = 0
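The first two relations can serve directly as test oracles; a small sketch in Python (the tolerance value is an illustrative assumption):

```python
import math

def check_sin_relations(x, tol=1e-9):
    """Check metamorphic relations for sin without an oracle for sin(x) itself."""
    assert abs(math.sin(-x) + math.sin(x)) < tol               # sin(-x) == -sin(x)
    assert abs(math.sin(2 * math.pi + x) - math.sin(x)) < tol  # sin(2*pi + x) == sin(x)
    return True

# No "expected output" table needed: the relation itself is the oracle.
for x in (0.0, 0.5, 1.7, -2.2, 123.456):
    check_sin_relations(x)
```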
What does this buy us in terms of being able to test things, e.g. the sin function?
- Provides a relationship from function inputs to outputs
- Create “follow-on” inputs
- What is metamorphic relation in this context of GLFuzz?
- Image generated by shader should be equivalent to the one generated by a shader with a semantics-preserving transformation applied
- What other domains might we be able to specify metamorphic relationships for, and maybe use to find bugs?
- Machine learning - There are a lot of matrix multiplications, do relations over the multiplications.
- SQL queries - semantics-preserving transformations, or even some that are not preserving but that we understand (e.g. `LIMIT 10` should restrict the number of rows)
- Testing for data races - adding no-ops into code might expose a data race (the no-op insertion is the semantics-preserving transformation)
- Numerical analysis - know the different transformations, what is acceptable in terms of approximation
- (Generally: computer algebra systems)
- Image/video processing - semantics preserving transformations, but also identity transformations (crop it, should still be similar, etc)
- Compiler testing - same as this paper. Property: semantics-preserving change to program should yield same compiled program/output
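As one concrete illustration of such a relation outside shaders: a SQL query should return the same rows after a semantics-preserving rewrite. A sketch against SQLite (the table and queries are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, v INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(i, i % 3) for i in range(10)])

base = "SELECT id FROM t WHERE v = 1 ORDER BY id"
# Semantics-preserving rewrites: a tautological predicate, a redundant subquery.
variants = [
    "SELECT id FROM t WHERE v = 1 AND 1 = 1 ORDER BY id",
    "SELECT id FROM (SELECT * FROM t) WHERE v = 1 ORDER BY id",
]
expected = con.execute(base).fetchall()
for q in variants:
    # Metamorphic relation: every variant must return exactly the same rows.
    assert con.execute(q).fetchall() == expected
```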
What is motivation?
- Why do we care about shader compilers?
- It produces nice graphics, looks good in a PowerPoint
- It’s also a nice research problem, because there is no easy/obvious way to do it
- All of the poor gamers need games to work right; safety.
- Safety can be both from security perspective (exploit-freedom) and general correctness (crash-freedom)
- Tesla almost definitely is using OpenGL :)
What is the solution?
Apply semantics-preserving transformation to some inputs, run them, see if output is the same:
`compileAndRun(P, I) = compileAndRun(f(P), I)`, where `f(P)` produces a semantically equivalent program
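A minimal sketch of this loop (all function and parameter names here are hypothetical stand-ins, not the paper’s actual tooling):

```python
import random

def metamorphic_fuzz(shaders, transforms, compile_and_run, images_similar, trials=100):
    """Sketch of the metamorphic fuzzing loop: apply a semantics-preserving
    transform to a shader, render both versions, and flag pairs whose
    images differ. `compile_and_run` and `images_similar` are stubs for
    the real renderer and image comparator."""
    suspicious = []
    for _ in range(trials):
        p = random.choice(shaders)
        f = random.choice(transforms)
        variant = f(p)                       # semantics-preserving rewrite of the source
        img_orig = compile_and_run(p)
        img_var = compile_and_run(variant)
        if not images_similar(img_orig, img_var):
            suspicious.append((p, variant))  # candidate compiler bug (or false positive)
    return suspicious
```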
Given an input set of shaders…
- How do they generate the variants?
- Add dead code of various kinds: `x = 1; if(x == 0){}`
- Don’t just put empty blocks in there - copy/paste code from other shaders, or insert a `break`, a `continue`, or just a `return`
- “Live code injection”
- Copies some code that computes some local variables from one shader into another
- Why copy code from another shader instead of generating from scratch?
- There is some risk that what you generate could have some unexpected side-effect
- It might be hard
- Maybe it would be more likely to create something that would be recognized as a “real” bug
- Mutate code with mathematical identities, e.g. `x = 1 * x`, `x = 0 + x`
- Pack variables into a vector: `x, y, z` -> `q = vec3(x, y, z)`
- Q: Need to ensure that the control flow of the program is preserved so that it doesn’t unexpectedly crash (otherwise the change is non-semantics-preserving)
- Is there a guarantee that the changes are semantics-preserving? (No)
- What is the risk if we make a non-semantics-preserving change?
- False positives (report a bug when there isn’t)
- Every transformation needs to be reversible for minification
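To illustrate reversibility, an identity mutation like `x = 1 * x` can be applied and undone textually. This is a toy sketch (a real tool would rewrite the AST, not use regexes, and would have to avoid left-hand-side occurrences of the variable):

```python
import re

def mutate_identity(src, var):
    """Illustrative semantics-preserving mutation: replace reads of `var`
    with `(1.0 * var)`. Only safe where `var` appears as a read."""
    return re.sub(r'\b%s\b' % re.escape(var), '(1.0 * %s)' % var, src)

def unmutate_identity(src, var):
    """Reverse the mutation, recovering the original source for reduction."""
    return src.replace('(1.0 * %s)' % var, var)
```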
- Detecting “deviant” variations
- Some heuristic for determining “similarity”
- Build a histogram of the RGB distributions of each image
- Compare distributions to determine if they are statistically different (chi-squared distance)
- Reduce/minify the deviant variations
- Reverse a random set of transformations; if the image is still not equivalent to the original, repeat until the process converges to a local maximum (a minimal set of transformations that still triggers the deviation)
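The histogram comparison above can be sketched as follows (the bin count and threshold are illustrative assumptions; the paper’s exact chi-squared formulation may differ):

```python
def channel_histogram(pixels, channel, bins=16):
    """Histogram one channel of a list of (r, g, b) pixels with 0-255 values."""
    hist = [0] * bins
    for px in pixels:
        hist[px[channel] * bins // 256] += 1
    return hist

def chi_squared_distance(h1, h2):
    """Symmetric chi-squared distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def images_differ(img1, img2, threshold=1.0):
    """Flag a deviant variant if any RGB channel's distributions differ
    by more than the (assumed) threshold."""
    return any(chi_squared_distance(channel_histogram(img1, c),
                                    channel_histogram(img2, c)) > threshold
               for c in range(3))
```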
Evaluation
- What scale of bugs per-dollar should we be accepting of?
- If we are Nvidia/arm…. :/
- If we are a platform vendor, maybe we care a lot
- If we are an application developer in, say, the medical domain, we might want/need to have some higher degree of certainty of whether our software generates correct visualizations on different hardware
- 17 devices, 3 evaluation sets
- 1,000 shaders from GLSLsandbox.com; took the 100 largest shaders that could be compiled
- How do they choose to apply which transformation?
- Randomly - “swarm testing”
- Identify false positives
- Three fellow researchers, not involved in the project, indicate via a key-press whether each pair is visually distinguishable (how long?)
- Table 3 - the true positives and false positives by technique
- Mutation is very productive, but it also is going to be most likely to introduce floating point errors that result in false positives
- Could false positives be reduced? Does it matter?
Overall discussion
- How to focus on detecting security vulnerabilities?
- Identify shaders that are executing memory-risky parts of code?
- Have no interest in checking the output images
- But, having a frame-by-frame analysis is still important/valuable for finding those security issues
- Faster/easier way to get the same results?
- How to make this a generic approach/architecture for testing compilers, with “semantics preserving changes?”
- Need some engine for rewriting the code
- Need some checker to determine whether the output of the programs is the same
- Maybe easy, maybe hard?
- Need to be able to clearly specify the “semantics-preserving” part
- Is it valuable to find bugs in older software that may have been fixed already?
- Yes: it shows that the approach works. Maybe one could argue that this approach is “cheaper”?
- And it means that developers really care about the bugs!