We’re often asked why we would do binary analysis on software that we already have the source code for, and Rob Graham over at Errata’s blog had a great post on that very topic a few days ago. As Graham says, the key difference between coders and hackers (or security researchers playing the part) is the concrete versus the abstract. Analyzing the binary itself gives us a much more complete understanding of what the program is actually doing, without all the assumptions getting in the way.
In looking at the binary, an auditor has to, on some level, forget what they know about what the program is supposed to do and focus on the specifics of the section they are analyzing. Each memory read and write has to be examined for what it is, not for what it is supposed to be, which in all honesty can be quite tedious, but it’s the only way to find a lot of vulnerabilities. This isn’t to say that if everyone wrote their code in assembly we’d have fewer security problems, but keeping an eye during development on the underlying actions that happen at that basic level certainly would help.
But there is also a place for source code analysis. When looking for certain types of problems, such as logic and implementation correctness, that kind of analysis is very fruitful, and issues can be found much more easily than by slogging through assembly. When auditing a section of code that uses complex mathematics, an auditor could work his way up from the additions and subtractions in the binary to understand the function and spot the problem, but it’s a lot more likely that he would notice a typo or an incorrect variable in the source, and spend far less time finding it. Doing this level of analysis also gives us insight into how vulnerabilities may have been created in the first place, allowing for recommendations of changes in coding practices and “big picture” security improvements to prevent similar issues from recurring. Both binary and source analysis have their place in an audit, and combined they give a real understanding of a program’s security from top to bottom.
Daniel,
During my PhD research I came across a paper which I think could be useful to the community. It is written by researchers at the University of Wisconsin and GrammaTech, Inc., and is titled “WYSINWYX: What You See Is Not What You eXecute”.
What you and Robert Graham of Errata Security write about this topic is quite rational. Even assuming that the high-level source code programmers write expresses exactly what they intend, there can still be mismatches between that code and what is actually executed by the CPU.
The paper is located at the following link:
http://www.cs.wisc.edu/wpis/papers/wysinwyx05.pdf
There is also research conducted at the Dept. of Computer Science, University of Illinois at Urbana-Champaign, that proposed executing binaries on a simulated processor whose ISA (instruction set architecture) was extended to trace the propagation of “tainted” inputs through system memory (registers, stack, heap). This allowed identification of tainted pointers, raising alarms and logging taintedness as it moved through memory.
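The core idea behind that research can be illustrated with a minimal sketch: keep a shadow "taint" flag alongside each storage location, mark values derived from untrusted input, propagate the flag through operations, and raise an alarm when a tainted value is used as a pointer. This is a toy register machine for illustration only, not the Illinois simulator; all names here are hypothetical.

```python
# Toy sketch of dynamic taint propagation. Not the actual Illinois
# simulator -- just an illustrative register machine with shadow taint bits.

class TaintMachine:
    def __init__(self):
        self.regs = {}   # register name -> value
        self.taint = {}  # register name -> True if value derives from input

    def load_input(self, reg, value):
        # Values arriving from untrusted input are tainted at the source.
        self.regs[reg] = value
        self.taint[reg] = True

    def load_const(self, reg, value):
        # Program constants start out untainted.
        self.regs[reg] = value
        self.taint[reg] = False

    def add(self, dst, src1, src2):
        # Taint propagates: the result is tainted if either operand is.
        self.regs[dst] = self.regs[src1] + self.regs[src2]
        self.taint[dst] = self.taint[src1] or self.taint[src2]

    def check_pointer(self, reg):
        # Dereferencing through a tainted register triggers an alarm/log.
        if self.taint[reg]:
            print(f"ALARM: tainted value in {reg} used as a pointer")

m = TaintMachine()
m.load_input("eax", 0x1000)   # attacker-controlled offset
m.load_const("ebx", 0x4000)   # base address baked into the program
m.add("ecx", "eax", "ebx")    # ecx inherits eax's taint
m.check_pointer("ecx")        # alarm fires: pointer derived from input
```

A real implementation does this at the instruction level in the simulated CPU, with shadow state covering registers, stack, and heap, but the propagation rule is the same: taint flows from operands to results.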