
Lecture 20 - Feedback-directed Random Test Generation

Return to a favorite: Evaluating fuzzers by coverage vs bugs found - any new insights?

  • Still say both

Approach

Q: What kinds of errors does Randoop find?

  • “Contract violations”
    • Reminiscent of Daikon + “How good are the specs”
  • These differed slightly between the Java and .NET versions; the default checks try to find NPEs or assertion errors

“Sequences of method calls”

  • At least one method call, with its arguments
  1. Pick some random public method to call
  2. Find some random sequences of inputs to pass to that method
    1. For each type:
      1. If primitive (int/byte/short/long/char etc) - they pick the value from a small set of possible values
      2. If reference type, then: (with equal random chance)
        1. Use some value V that was already created for this sequence
        2. Pick some value that is produced by an existing sequence that does not result in an error, append that sequence to this one, and then use the returned value
        3. Use null - but ONLY if [2] produced no results
      Q: When we are generating these parameters, what is our goal? (A runnable sketch of this generation loop follows this list.)
        • How small is a “small set” of possible values?
        • Can we create meaningful object inputs to pass to the methods under test?
          • Compare HotFuzz approach to Randoop approach
            • Randoop - Generate method calls that produce instances of a type
            • HotFuzz - Recursively, randomly instantiate values
            • Randoop might create more “valid” inputs. Only calls public methods, so it is (theoretically) more likely to be designed to be called with some semi-arbitrary parameters and be more “robust” than a private method, which is literally designed to encapsulate private state
              • But: The sequence of calls to create some interesting object might be extremely long. There are classes of data structures for which it might be harder to create the object by a sequence of calls, while for others, it might be harder by the random field assignment. But, there are also probably ways to combat this
          • Are there other approaches that do not require developer input to create interesting input types?
            • Automatically infer models of valid inputs, then try to create inputs to that model
            • Where does symbolic execution fit in this?
              • Model inference?
            • “Capture interesting objects” from larger executions, and then potentially mutate those later (“OCAT”? Tao Xie has something here)
            • Create type-specific mutators, where the test generation tool creates some perhaps not super interesting seed instances, but then is able to mutate those less interesting objects into more interesting versions by using the custom mutator
            • Other search-based techniques (we will see EvoSuite later this week…)
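
To make this concrete, here is a minimal, hypothetical sketch of the feedback-directed generation loop in Java. It is illustrative only - not Randoop’s implementation - and the class under test, seed sets, and filter choices are all assumptions. It drives the public methods of ArrayList, keeping a pool of values produced by non-error calls so that later calls can reuse them:

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Modifier;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    // Hypothetical sketch of Randoop-style feedback-directed generation.
    public class FeedbackDirectedSketch {
      static final Random RNG = new Random(42);
      // Primitives come from a small, fixed seed set.
      static final int[] INT_SEEDS = {-1, 0, 1, 10, 100};
      static final String[] STRING_SEEDS = {"", "hi", "hello"};
      static final Object INFEASIBLE = new Object();
      // The pool of values produced by sequences that did not raise an error.
      static final List<Object> pool = new ArrayList<>();

      public static void main(String[] args) {
        pool.add(new ArrayList<String>()); // seed receiver for the class under test
        Method[] methods = ArrayList.class.getMethods();
        for (int i = 0; i < 300; i++) {
          // 1. Pick some random public (instance) method to call.
          Method m = methods[RNG.nextInt(methods.length)];
          if (Modifier.isStatic(m.getModifiers())) continue;
          Object receiver = pool.get(RNG.nextInt(pool.size()));
          if (!m.getDeclaringClass().isInstance(receiver)) continue;
          // 2. Pick an input for each parameter.
          Object[] argv = new Object[m.getParameterCount()];
          boolean feasible = true;
          for (int p = 0; p < argv.length && feasible; p++) {
            argv[p] = pickValue(m.getParameterTypes()[p]);
            feasible = (argv[p] != INFEASIBLE);
          }
          if (!feasible) continue;
          try {
            Object result = m.invoke(receiver, argv);
            // 3. Feedback: keep only non-null results that are not equals()
            //    to a value already in the pool (the "Equal" filter).
            if (result != null && !pool.contains(result)) pool.add(result);
          } catch (InvocationTargetException e) {
            // The call threw: a contract checker would inspect e.getCause()
            // (e.g., for an unexpected NPE); the sequence is not extended.
          } catch (Throwable t) {
            // Reflection plumbing failed (access, bad argument): just skip.
          }
        }
        System.out.println("Pool grew to " + pool.size() + " values");
      }

      // Pick a value of type t: primitives from a small fixed seed set,
      // reference types from the pool, null only as a last resort.
      static Object pickValue(Class<?> t) {
        if (t == int.class || t == Integer.class)
          return INT_SEEDS[RNG.nextInt(INT_SEEDS.length)];
        if (t == String.class) return STRING_SEEDS[RNG.nextInt(STRING_SEEDS.length)];
        if (t.isPrimitive()) return INFEASIBLE; // other primitives: keep sketch small
        List<Object> candidates = new ArrayList<>();
        for (Object v : pool) if (t.isInstance(v)) candidates.add(v);
        if (!candidates.isEmpty()) return candidates.get(RNG.nextInt(candidates.size()));
        return null; // used only when no pooled value fits the type
      }
    }

A real tool would additionally record each successfully extended sequence as a candidate regression test and run contract checks over every value it creates.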

Q: What do you think about this approach to generating inputs to methods?

  • Using a small set of primitives might be very constraining
    • But only if you care about bugs that are triggered by primitives
      • Q: What are the contracts that they check? (See the sketch after this list.)
        • Object equality transitivity
        • No exception
        • Maybe for these contracts, it doesn’t really matter
  • Weighting of selecting null?
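
A sketch of what such contract checks might look like, based on the kinds of checks named above (equality transitivity, no exception). This is hypothetical code; Randoop’s actual default contract list differs in its details:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical contract checks in the spirit of the lecture's list;
    // not Randoop's actual implementation.
    public class ContractSketch {
      // Single-object contracts: equals/hashCode/toString must not throw,
      // and equals must be reflexive.
      static List<String> checkUnary(Object o) {
        List<String> violations = new ArrayList<>();
        try {
          if (!o.equals(o)) violations.add("equals is not reflexive");
          o.hashCode(); // must not throw
          o.toString(); // must not throw
        } catch (Throwable t) {
          violations.add("equals/hashCode/toString threw " + t);
        }
        return violations;
      }

      // Transitivity needs three values drawn from the pool:
      // a.equals(b) && b.equals(c) should imply a.equals(c).
      static boolean transitive(Object a, Object b, Object c) {
        return !(a.equals(b) && b.equals(c)) || a.equals(c);
      }
    }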

“Extensible” sequences

What is an extensible sequence?

  • Can you use the result of this sequence as the input to another method call?

“Filters” determine whether a sequence is extensible or not (a sketch of such filters follows this list):

  • Equal - Don’t add the same object more than once, by object equality (calling foo.equals(bar))
    • But the system does not assume that equals is correctly implemented :)
    • This compares the objects returned by each method invocation to see if they are the same, as defined by the equals method.
      • Other approach: Check the objects recursively, field-by-field
    • What about side-effects? Two calls can return equal values but differ in their effects on their arguments:
      • int read(byte[] ar, int offset, int len)
      • read(ar, 0, 100) -> 100
      • read(ar, 100, 100) -> 100
      • The returned values are equal, so an equals-based filter would treat the second call as a duplicate, even though it fills a different region of ar
      • Does it matter? “Shrug”
        • How many of your methods have side effects?
        • Do you care about finding bugs in them?
        • Could certainly spend more CPU time + human effort on modeling side effects
  • Null - don’t add null to pool
  • Exceptions - don’t add things that throw exceptions
    • But isn’t there lots of interesting behavior when you have something that throws an exception? Example of a test that WILL NOT be generated by Randoop:
      Foo obj = new Foo();
      try {
        obj.riskyMethod(); // throws an exception that the test tolerates...
      } catch (Throwable t) {
        // ...swallowed deliberately
      }
      obj.someOtherMethod(); // ...and the object keeps being exercised afterwards
      
  • Would have been lovely to see an evaluation that showed how effective each filter was at reducing unnecessary executions, and whether you miss any bugs as a result
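
One way to picture the three filters above is as a pluggable predicate over a sequence’s result. The interface below is purely hypothetical (it is NOT Randoop’s real API), but it shows how users could supply their own definition of “not interesting”:

    import java.util.List;

    // Hypothetical plug-in point for extensibility filters; each filter
    // decides whether a sequence's result may be added to the pool.
    public interface ExtensibilityFilter {
      boolean extensible(Object result, Throwable thrown, List<Object> pool);

      // Null filter: don't add null to the pool.
      ExtensibilityFilter NULL_FILTER = (r, t, pool) -> r != null;

      // Exception filter: don't extend sequences that threw.
      ExtensibilityFilter NO_EXCEPTION = (r, t, pool) -> t == null;

      // Equal filter: don't add a result that equals() a pooled value.
      ExtensibilityFilter NOT_DUPLICATE = (r, t, pool) -> !pool.contains(r);
    }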

Repetition

  • With 10% chance, when we decide to add some method call foo(x) to a sequence, don’t just add it once - add it a number of times chosen at random from (0, 100] (see the sketch after this list)
  • Are there other kinds of patterns to calls that might be desirable?
    • E.g. “always call foo before calling bar”
    • If you call add multiple times and get something interesting, then do it again…
      • “What is interesting” - change in coverage? Side effects?
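
The repetition heuristic itself is tiny; a sketch, using the 10% chance and (0, 100] range from above:

    import java.util.Random;

    // Sketch of the repetition heuristic described above.
    public class RepetitionSketch {
      static final Random RNG = new Random();

      // When appending a call to a sequence, occasionally repeat it many times.
      static int timesToAppend() {
        if (RNG.nextDouble() < 0.10) {
          return 1 + RNG.nextInt(100); // uniform over 1..100, i.e. (0, 100]
        }
        return 1; // common case: append the call once
      }
    }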

Feedback

  • What kind of feedback does Randoop use?
    • Contract violations
    • Filters allow users to set arbitrary definitions of “is not interesting”


© 2021 Jonathan Bell. Released under the CC BY-SA license