04 Feb 2020
CrossHair is an ambitous project, and a lot of it seems pretty magical. It’s less magical than you might think, and Python does a lot of the heavy lifting for us.
This is part one in hopefully a series of posts about how CrossHair works.
I recommend the following reading as well, depending on your familiarity with other work in this space.
First, I can’t recommend the fuzzing book enough. CrossHair’s approach largely corresponds to the chapter on Concolic Fuzzing. Some parts of the Symbolic Fuzzing chapter are relevant as well. Unlike concolic execution, CrossHair’s values start each path execution as purely symbolic values, and do not get concrete values until they are needed.
CrossHair uses the Z3 SMT solver to perform its deductions. This Python-specific introduction is a good place to get a feel for what it does. Please note that I’ll say “Z3” below, but many statements about Z3 apply to all SMT solvers and/or the SMT-LIB language.
CrossHair just repeatedly calls the function you want to analyze and passes in special objects that behave like things your function expects.
CrossHair doesn’t do any kind of AST or bytecode analysis. It just calls your function.
These “special objects” behave differently on each execution. That’s how CrossHair explores different paths through your function.
These special CrossHair objects hold one or more Z3 expressions which are used in the Z3 solver. Here are some examples:
When your function takes a parameter with this python type, | we supply an object of this CrossHair type, | which holds an expression with this Z3 sort: |
---|---|---|
int |
SmtInt |
IntSort() |
bool |
SmtBool |
BoolSort() |
str |
SmtStr |
StringSort() |
dict |
SmtDict |
ArraySort(K, V) and IntSort() for the length |
We can initialize a CrossHair object by giving it a name:
>>> crosshair_x = SmtInt('x')
We can access the .var
attribute of any CrossHair object to get
the Z3 variable(s) that it holds:
>>> crosshair_x.var x >>> type(crosshair_x.var) <class 'Z3.Z3.ArithRef'>
This takes the Z3 variable we just defined and adds one to it:
>>> expr = crosshair_x.var + Z3.IntVal(1) >>> expr x + 1 >>> type(expr) <class 'Z3.Z3.ExprRef'>
We can create CrossHair objects not only for fresh variables, but
also for Z3 expressions.
So, if we wanted to wrap x + 1
back into a CrossHair object,
we’d write:
>>> SmtInt(crosshair_x.var + Z3.IntVal(1))
The SmtInt
class defines the __add__
method so that you don’t
have to spell that out, though. You can just say crosshair_x + 1
, and
SmtInt
does the necessary unwrapping and re-wrapping:
>>> type(crosshair_x + 1) <class 'crosshair.libimpl.builtinslib.SmtInt'>
SmtInt
also defines the comparison methods so that they return symbolic
booleans:
>>> type(crosshair_x >= 0) <class 'crosshair.libimpl.builtinslib.SmtBool'>
The symbolic boolean holds an equivalent Z3 expression:
>>> (crosshair_x >= 0).var 0 <= x
So far, everything is symbolic. But eventually, the Python interpreter needs a real value; consider:
>>> if crosshair_x > 0: >>> print('bigger than zero')
Should this execute the print or not? When python executes the if
statement, it calls __bool__
on the SmtBool
object. This method
does something very special. It consults Z3:
CrossHair will remember what decisions it has made so that it can make different decisions on future executions. Ultimately, we’re looking for some target thing to happen: an exception to be raised, or a postcondition to return False. When that happens, we ask Z3 for a model and report it as a counterexample.
That’s the core of how CrossHair works.
Well, if there is an accomplishment about CrossHair, it’s that it tries hard to get the details right. And there are a lot of details.
Here are some of the topics I’m considering talking about next. Let me know which ones interest you the most!
object
.