Best Way to Fuzz?
There was an interesting discussion, along with some useful information, on what is the “best way from an ROI measure” to fuzz test at the CERT-sponsored Vulnerability Disclosure Workshop in DC this week. It led to some tweets back and forth between Digital Bond alumnus Matt Franz and me. First, some background:
Fuzz testing is used by vendors, I hope, to look for common coding errors that can lead to vulnerabilities, such as buffer overflows. Consultants, researchers and hackers of all hat colors use fuzzing to look for exploitable vulnerabilities. Steve Lipner of Microsoft, co-author of The Security Development Lifecycle [SDL], said in his 2008 S4 keynote that fuzz testing and threat modeling proved to be the most effective ways to reduce exploitable vulnerabilities. Asset owners should be asking their vendors in RFPs and user group meetings to explain their SDL and ensure fuzz testing is part of it.
We have two security vendors that are trying to sell products to the control system market: Wurldtech with their Achilles platform and Mu Dynamics with their Mu Test Suite. [FD: Wurldtech is a past Digital Bond client and advertiser] One of the features of these products is that they both send a large number of malformed packets at an interface, typically crashing protocol stacks that were never put through negative testing.
Although this greatly simplifies the issue, the two vendors have taken different approaches to creating those negative packets. Wurldtech touts the use of a structured grammar to create the malformed packets, and Mu takes more of an expert-systems approach, where their security engineers determine which malformed packets would be the most effective to send. There is certainly some overlap between the two approaches, but the question has always been which is more effective at identifying protocol stack errors.
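To make the grammar idea concrete, here is a minimal sketch. It assumes a made-up three-field message (type, length, payload) and says nothing about how Achilles or the Mu Test Suite are actually built. Each field has a list of values a well-behaved sender would never emit, and the generator breaks exactly one field per packet. All names and values are hypothetical.

```python
import struct

# Hypothetical toy message: 1-byte type, 2-byte big-endian length, payload.
# This format is an assumption for illustration, not a real control protocol.
VALID = {"msg_type": 0x03, "length": 4, "payload": b"\x00\x01\x00\x02"}

# Per-field "grammar" of values a well-behaved sender would never emit.
INVALID_VALUES = {
    "msg_type": [0x00, 0x7F, 0xFF],
    "length": [0, 1, 0xFFFF],          # length that disagrees with the payload
    "payload": [b"", b"\xff" * 1024],  # empty and oversized payloads
}

def encode(fields):
    """Serialize the fields without checking that they are consistent."""
    header = struct.pack(">BH", fields["msg_type"], fields["length"])
    return header + fields["payload"]

def generate_malformed(valid=VALID, invalid=INVALID_VALUES):
    """Yield (field, packet) pairs, breaking exactly one field per packet."""
    for field, bad_values in invalid.items():
        for bad in bad_values:
            fields = dict(valid)
            fields[field] = bad
            yield field, encode(fields)

if __name__ == "__main__":
    for field, packet in generate_malformed():
        print(field, len(packet), packet[:8].hex())
```

The cost in this approach is writing and maintaining the field rules for every protocol, which is the effort the ROI discussion is about.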
So with that as background, Matt’s tweet “At CERT vuln discovery workshop. Interesting MSFT says grammar based fuzzing has lower ROI than dumb fuzzing” caught my attention. Matt was kind enough to expand on his summary in an email:
The consensus of the talks was that you can’t rely on a single tool or technique but the ROI was higher for dumb “mutation based” fuzzers and white box approaches like SAGE than the time and effort to develop grammar based approaches, model the target, etc.
The direct comment was that they still used “smart fuzzers” for highly critical code (Office, IE) but that it wasn’t practical for other platforms like Exchange because of the way it would hold up development and release cycles. Even/Especially in MSFT and CSCO, devtest resources are precious and finite. Relatively low-skilled devtesters were able to achieve good enough results.
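For comparison, a dumb mutation-based fuzzer needs no model of the protocol at all. The sketch below is a minimal illustration, not any tool mentioned at the workshop: it takes one captured sample (the bytes passed in are whatever you recorded from real traffic) and flips a few random bits in each copy. The function names and defaults are assumptions.

```python
import random

def mutate(sample: bytes, flips: int = 3, seed=None) -> bytes:
    """Return a copy of sample with a few randomly chosen bits flipped."""
    if not sample:
        raise ValueError("sample must not be empty")
    rng = random.Random(seed)
    data = bytearray(sample)
    for _ in range(flips):
        pos = rng.randrange(len(data))
        data[pos] ^= 1 << rng.randrange(8)  # toggle one bit in one byte
    return bytes(data)

def fuzz_cases(sample: bytes, count: int, seed: int = 0):
    """Yield `count` mutated variants of one captured sample."""
    for i in range(count):
        yield mutate(sample, seed=seed + i)
```

Seeding each case makes a crash reproducible: rerunning with the same sample and seed regenerates the same packet.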
So if you are a vendor, or even an asset owner, starting from scratch you will have a low ROI on developing a grammar based fuzzer. But what if the grammar based solution already exists, such as in the case of Achilles, and you can buy it? This makes the ROI decision more interesting because you could compare the Achilles and the Mu Test Suite head to head and take into account any cost differences. So actually the question still remains if you are looking to purchase a control system fuzzer.
In a future blog post we will have Daniel cover Microsoft’s fuzzing efforts in the form of their SAGE tool, which does “white box fuzzing” using symbolic execution and negated path constraints. SAGE is still an internal Microsoft tool, but the approach is public.
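Until then, the general idea of white box fuzzing can be shown with a toy. This is not SAGE, which traces real binaries and hands the recorded constraints to a constraint solver. The sketch assumes a hypothetical parser that records each branch it evaluates, and a stand-in "solver" that only handles byte-equality constraints. Each run negates the branch that stopped execution and builds a new input that goes one step deeper.

```python
def target(data: bytes, trace: list) -> None:
    """Toy parser that records each branch as (index, expected_byte, taken)."""
    for i, expected in enumerate(b"BAD!"):
        taken = i < len(data) and data[i] == expected
        trace.append((i, expected, taken))
        if not taken:
            return
    raise RuntimeError("bug reached")

def solve_negated(data: bytes, trace: list) -> bytes:
    """Build an input that flips the last untaken branch in the trace.

    Stand-in for a constraint solver: every constraint here is a simple
    byte equality, so "solving" means writing the expected byte.
    """
    index, expected, _ = trace[-1]
    patched = bytearray(data.ljust(index + 1, b"\x00"))
    patched[index] = expected
    return bytes(patched)

def explore(seed: bytes, max_runs: int = 10):
    """Run, negate the failing branch, and repeat until the bug is hit."""
    data = seed
    for run in range(1, max_runs + 1):
        trace = []
        try:
            target(data, trace)
        except RuntimeError:
            return data, run
        data = solve_negated(data, trace)
    return None, max_runs
```

Starting from the seed b"good", explore reaches the bug on its fifth execution, while blind random guessing has a 1 in 2^32 chance per attempt of producing the same four bytes.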
Author: Dale Peterson
Posted: February 3rd, 2010 under Development Tools, Security Tools, The Rack.
Comments: 7
Comments
Comment from kowsik
Time: February 3, 2010, 9:53 pm
Dale,
Dunno if you know about Mu Studio: http://www.mudynamics.com/products/test-modules/Studio-Zx.html
What we’ve done is to eliminate the artificial dichotomy between generational (from specs) and mutational (from a sample) fuzzing into something that leverages both. This means you can have the Mu automatically generate contextually relevant Fuzz test cases for a specific interaction to your device without having either us or you do any work!
Comment from Ralph Langner
Time: February 4, 2010, 7:21 am
Sometimes it occurs to me that fuzzing tends to be overrated in respect to control systems. Here is why.
1. Fuzzing uses malformed packets, i.e., it is protocol specific. If your fuzzer fuzzes Modbus, it is of little value for testing devices that use Ethernet/IP, Profinet, …, instead. Ideally, fuzzing would have to target ALL protocols that a specific test client implements, including proprietary stuff. Something I haven’t seen in a real-world product.
2. For application layer protocols, the value of fuzzing should be seen in relation to the authentication features that the fuzzed protocol provides. If there is solid authentication at session initiation, one might question the relevance of fuzzing the interaction after session establishment.
3. Many devices crash when receiving certain WELL-FORMED packets that are 100% specification conformant. Example: UDP packets with zero user data, IPv6 packets, auto-sensing packets, broadcast packets, … I urge everybody to test those first before messing with fuzzers.
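The first example in point 3 takes only a few lines to try. This is a minimal sketch using the standard socket module. The host and port are placeholders for a lab device, and the demo at the bottom sends to a loopback listener only to show that an empty datagram is perfectly valid.

```python
import socket

def send_empty_udp(host: str, port: int) -> int:
    """Send one valid UDP datagram with zero bytes of user data."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(b"", (host, port))

if __name__ == "__main__":
    # Loopback demo; replace with the address of a lab device under test.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        send_empty_udp(*receiver.getsockname())
        payload, _ = receiver.recvfrom(1024)
        print(len(payload))
```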
Comment from Stefan Ditting
Time: February 4, 2010, 11:43 am
As a vendor of safety-related controllers, we know that functional safety can only be achieved when cyber security is supported. So we are developing security features for our products, and additionally we have implemented stack testing in our development. As Mr. Langner states (1.), it is essential to address the special needs of a particular industry. We tested some scanners and chose Wurldtech. We can depend on their deep knowledge, further development and their close relation to our customers in the process industry. Wurldtech knows the needs of this industry. They are expanding their extensive testing and we (their customers) are participating.
Comment from Matt Franz
Time: February 4, 2010, 12:15 pm
I actually think the ROI issue was more germane to the internal development of “smart fuzzers” (whatever that means, but things that would take significant development cycles to get right) vs. end users buying commercial products to test.
However, I think the real test of effectiveness should be based on real data (number of bugs found, type, severity, etc.) and it would be cool to see a bakeoff against a pristine implementation between Achilles, Mu, and Codenomicon, and anything else.
Microsoft did this comparing Peach, internal smart fuzzers, and their SAGE platform and they showed the data and where there was overlap, etc.
- mdf
Pingback from Digital Bond » Best Way to Fuzz Part 2
Time: February 5, 2010, 8:55 am
[...] Read Best Way to Fuzz Part 1 and comments [...]
Comment from Ari Takanen (Codenomicon)
Time: February 7, 2010, 6:19 pm
Very interesting discussion! It is important to notice that it only takes one person to build a fuzzer, and then you can have hundreds or thousands of people using it. ROI should not focus on building fuzzers but using them. The only meaningful ROI metrics are “number of found issues” and “cost per bug found”. A tool that only finds one issue, but costs one tenth of a tool that finds hundreds of issues, does not sound like the best choice to me. Fuzzing is all about coverage.
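As a worked example of the “cost per bug found” metric, here is a minimal sketch. The prices and bug counts are invented to match the one-tenth-the-cost scenario above. They are not data from any real product.

```python
def cost_per_bug(total_cost: float, bugs_found: int) -> float:
    """Total cost of a tool divided by the number of issues it found."""
    if bugs_found <= 0:
        return float("inf")  # a tool that finds nothing has no finite ratio
    return total_cost / bugs_found

if __name__ == "__main__":
    # Hypothetical figures: the cheap tool costs one tenth of the other.
    cheap = cost_per_bug(total_cost=10_000, bugs_found=1)
    thorough = cost_per_bug(total_cost=100_000, bugs_found=200)
    print(cheap, thorough)  # 10000.0 500.0
```

Under these assumed numbers the cheaper tool is twenty times more expensive per issue found.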
Actually, fuzz-test coverage and metrics have always been my favorite topic in fuzzing, ever since our PROTOS research in 1999-2001. We have a pile of theses written by our people and other security researchers in Oulu on that topic. Feel free to contact me if interested. We also dedicated a chapter to that in our Fuzzing book (co-authored by me, Jared DeMott and Charlie Miller). Another chapter is all about a vendor-neutral bakeoff of fuzzers (probably still the first and only one, as all others are paid or sponsored in one way or another).
Also, just early last week we published a short white-paper on the topic of fuzzing coverage and metrics, available here:
http://www.codenomicon.com/products/coverage.shtml
Comment from Jörg Lübbert (SoftSCheck)
Time: February 13, 2010, 8:34 pm
Interesting discussion indeed!
I’m leading the team of a research project in Germany. We have collected over 130 fuzzers that we are evaluating at the moment, and things look really promising. From those 130, we will have all the interesting ones evaluated and scored by October 2010.
To be able to do so, we created a taxonomy that enables precise naming of fuzzers according to their primary attributes and we’ve got a parameter-based scoring system with weighting that enables companies to find the fuzzers that are most suited for their purpose.
I’d also like to give my two cents about fuzzer quality. There’s new research on measuring a fuzzer’s quality, on which I have read a draft paper. Plain code coverage and number of bugs are OK, but as the paper suggests, it can be done even better.
On a side note: Those dynamic generation based fuzzers like SAGE are really promising in theory for their expected high ROI. Fortunately, there’s at least some research and development outside the closed doors of Microsoft, too.
If anyone reading this is interested in further discussion or has a yet unpublished fuzzer that they want to have evaluated, feel free to drop me a mail.