Why Crain / Sistrunk Vulns Are A Big Deal

ICS vulnerabilities are easy to find and often not even necessary because the ICS applications and protocols are insecure by design. So why are the vulnerabilities that Adam Crain and Chris Sistrunk found in DNP3 protocol stacks such a big deal? Three reasons why I think this is the ICS security story of the year:

1. An attacker at a single, unmanned substation can crash an entire transmission SCADA system with a single packet (ok, maybe a few packets since there are likely redundant masters).

The protocol stacks of the PLC/controllers have been fuzzed, found lacking and are beginning to be patched. This is not news, and taking down a single substation is not news.

What the researchers have done is tested the DNP3 protocol stack in the master in the control center. They wait for the master to send a request packet, which occurs regularly, to the PLC/controller and then send back specially crafted response packets. Result: the master crashes. Impact: the control center loses the ability to monitor and control the SCADA network.

Note – they actually don’t even have to wait for a request packet since DNP3 supports unsolicited response.

When the master crashes it can no longer monitor or control any of the substations. There is no way to stop this with a firewall or other perimeter security device today. You have to let DNP3 responses through. In theory, a future Tofino or Check Point application layer firewall that supported DNP3 could stop these malformed DNP3 response packets.
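To make the firewall idea concrete, here is a minimal sketch of the first gate such a device would need: rejecting frames whose DNP3 link-layer framing is inconsistent. The field layout (start bytes 0x05 0x64, a LEN byte that counts control, addresses and user data, and a CRC after the header and after every 16 bytes of data) is my reading of the link layer and should be checked against the standard. The function names are made up for this example. It does not look inside the application layer, which is where a real DNP3-aware filter would have to do most of its work.

```python
from typing import Optional

START = b"\x05\x64"   # DNP3 link-layer start bytes
HEADER_SIZE = 10      # start(2) + LEN(1) + CTRL(1) + DEST(2) + SRC(2) + CRC(2)


def crc16_dnp(data: bytes) -> int:
    """CRC-16/DNP: polynomial 0x3D65 (bit-reflected 0xA6BC), final XOR 0xFFFF."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA6BC if crc & 1 else crc >> 1
    return crc ^ 0xFFFF


def crc_ok(block: bytes) -> bool:
    """The last two bytes of a block carry its CRC, low byte first."""
    return crc16_dnp(block[:-2]) == int.from_bytes(block[-2:], "little")


def reject_reason(frame: bytes) -> Optional[str]:
    """Return None if the framing is consistent, else a short reason to drop it."""
    if len(frame) < HEADER_SIZE:
        return "shorter than the link header"
    if frame[:2] != START:
        return "bad start bytes"
    length = frame[2]
    if length < 5:
        return "LEN below the minimum of 5"
    if not crc_ok(frame[:HEADER_SIZE]):
        return "header CRC mismatch"

    user_bytes = length - 5                 # LEN counts CTRL + DEST + SRC + user data
    blocks = (user_bytes + 15) // 16        # each block of up to 16 bytes has its own CRC
    if len(frame) != HEADER_SIZE + user_bytes + 2 * blocks:
        return "frame size disagrees with LEN"

    pos, remaining = HEADER_SIZE, user_bytes
    while remaining > 0:
        n = min(16, remaining)
        if not crc_ok(frame[pos:pos + n + 2]):
            return "data block CRC mismatch"
        pos += n + 2
        remaining -= n
    return None
```

A filter would call reject_reason on every frame coming from an outstation and forward only those that return None. A frame can pass this check and still carry a malformed application-layer message, so treat it as a necessary step, not a sufficient one.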

The researchers have found this to be true on almost all the systems they have tested, which represent the big boys in the electric sector. By physically breaking into 10 or 20 remote substations around the country, an attacker could bring down SCADA systems that monitor and control a large portion of the US power transmission and distribution systems.

2. This attack works with serial communications that are specifically out of scope of NERC CIP Cyber Security Regulations

The original version of DNP3 ran over serial links such as RS-232 and RS-485. It is still widely deployed as a serial protocol in large systems, particularly in the less important (and typically less physically secure) substations or outstations. There is another version that encapsulates DNP3 in IP. Almost all attacks on ICS have focused on IP protocols.

The researchers have proven the same attacks that work on the IP version of DNP3 work on DNP3 serial.

Here is the kicker – the current NERC CIP cybersecurity regulations specifically exclude serial communications and the equipment that uses serial communication from meeting any security requirements. The researchers have proven the folly of this when serial communications can be used to take down the entire ICS monitoring and control capability.

I’m not arguing against IP networks getting most of the security attention, but serial comms coming into a control center are shown here to have a big impact.

3. This was reported over six months ago and little or nothing has been done to address the problem in the electric grid

Adam proposed this research for S4x14 back in July. It was an easy yes, and in fact we have him teaching a class on Friday on response fuzzing and serial fuzzing. At that time he had been in touch with the DNP3 Technical Committee, DHS/ICS-CERT and vendors for months. I have been waiting for all of these organizations, and NERC, to start pushing hard to get these master stations patched asap.

Instead, we have been hearing crickets. That’s not quite fair. ICS-CERT and vendors have been putting out very subtle, low key alerts and bulletins that there were vulnerabilities when patches became available. For example:

  • ICS-CERT’s eterraControl Advisory: “Impact - Successful exploitation of this vulnerability could allow an attacker to affect the availability of the Alstom e-terracontrol software. Impact to individual organizations depends on many factors that are unique to each organization. ICS‑CERT recommends that organizations evaluate the impact of this vulnerability based on their operational environment, architecture, and product implementation.” This is a far cry from saying that if an attacker gains physical access to one of your substations he can stop your ability to monitor and control your transmission and distribution system.
  • DNP3 Technical Committee Announcement: Correctly states it is an implementation, not protocol issue but doesn’t raise any alarms. They also revert to the old excuses “SCADA protocols were designed for use on trusted networks. On untrusted networks, these protocols must be deployed within a system that uses adequate security measures.” and “No single security feature can defend against all types of attacks. Experts suggest using a defense-in-depth security methodology.” They never explain the impact of the vulns or encourage members to patch the vulns.
  • I don’t have access to the vendor bulletins; send them to me if you have examples of vendors emphasizing the impact and need to patch.

Why aren’t DHS, NERC, and the DNP3 technical committee telling vendors they need to fix this now and utility owners they need to get this patched asap? As much as I harp on insecure by design problems, this is a vuln that is actually much more serious. It is not that hard to gain physical access to a substation, especially one that is less important and still connected via serial comms.

It actually is a much easier fix than a PLC vuln because there are only a small number of masters, typically running on Windows or Unix Servers, that need to be patched. These systems are deployed with redundancy, and you could even stand up an additional server to help with the transition and possible rollback.

S4x14 Hype

Yes, we are very excited that Adam will be revealing the technical detail in public for the first time at S4x14. He thought January was the right time to give vendors and owner/operators time to address the problem, and the S4 audience was the best to understand the technical details. I’m most interested in how they constructed the DNP3 response fuzzing packets that caused the crashes. I know it was not random data. The details on the categories of failures and vendor responses should also be very interesting.
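Adam has not published how the response packets were built, so the following is not his method. It is only a toy illustration of why "not random data" matters when a vendor fuzzes its own parser. Purely random bytes tend to die at the first framing check. A structure-aware mutation keeps the framing intact and pushes a single field, here the length, to boundary values. The frame format below (marker 0xA5, count byte, payload, one-byte checksum) is invented for the example and is not DNP3, and both parsers are hypothetical.

```python
import random
from typing import Callable, List

MARKER = 0xA5
BOUNDARY_COUNTS = [0, 1, 0x7F, 0x80, 0xFF]   # values that often expose length bugs


def build_frame(payload: bytes) -> bytes:
    """Well-formed toy frame: marker, count, payload, additive checksum."""
    return bytes([MARKER, len(payload)]) + payload + bytes([sum(payload) & 0xFF])


def naive_parse(frame: bytes) -> bytes:
    """Trusts the count field, the kind of mistake a fuzzer should find."""
    if frame[0] != MARKER:
        raise ValueError("bad marker")
    count = frame[1]
    payload = frame[2:2 + count]
    checksum = frame[2 + count]              # IndexError when count overstates the frame
    if checksum != sum(payload) & 0xFF:
        raise ValueError("bad checksum")
    return payload


def hardened_parse(frame: bytes) -> bytes:
    """Checks the count against the bytes actually received before using it."""
    if len(frame) < 3 or frame[0] != MARKER:
        raise ValueError("bad framing")
    count = frame[1]
    if len(frame) != count + 3:
        raise ValueError("count disagrees with frame size")
    payload = frame[2:2 + count]
    if frame[-1] != sum(payload) & 0xFF:
        raise ValueError("bad checksum")
    return payload


def fuzz(parser: Callable[[bytes], bytes], iterations: int, seed: int) -> List[bytes]:
    """Return every input that made the parser fail with something other than ValueError."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        payload = bytes(rng.randrange(256) for _ in range(rng.randrange(9)))
        frame = bytearray(build_frame(payload))
        frame[1] = rng.choice(BOUNDARY_COUNTS)   # mutate only the length field
        try:
            parser(bytes(frame))
        except ValueError:
            continue                             # clean rejection is the desired outcome
        except Exception:
            crashes.append(bytes(frame))         # unexpected failure: a finding
    return crashes
```

Running fuzz(naive_parse, 200, seed=1) should turn up inputs that raise IndexError, while the same run against hardened_parse should come back empty. In a C or C++ driver the equivalent of that IndexError is a read past the end of a buffer, which is the sort of thing that takes a process down.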

Image by Chris Hunkeler

8 comments to Why Crain / Sistrunk Vulns Are A Big Deal

  • I made comments at
    http://news.infracritical.com/pipermail/scadasec/2013-October/011019.html
    which answer some of your concerns.

    Contrary to your assertion, the DNP Users Group does encourage the membership to upgrade their investments in DNP3 frequently in various forums. However, we also know that they are mired in costs, logistics and obligations which you do not acknowledge.

    This is every bit as much a social problem as it is a technical one. You have been doing great work in the past addressing the technical side of this issue, but the social side, the regulatory side, the standards, and the designs remain.

    If I could snap my fingers and change that reality, I would. But things like that do not change until disaster happens. One of the lessons we learn from biblical theology is that prophets are usually ignored until after the events they warned about have happened.

  • Dale Peterson

    Jake, a fair summary of the DNP3 Users Group document is there are no problems with the protocol and apply basic ICS security measures. Hardly a call to action for members to work with vendors to see if their implementation is vulnerable and apply a patch if it is.

    This is not just another in a series of vulns. The potential impact is huge, and all it takes is physical access to an unmanned substation (or other field site when DNP3 is used in pipelines, water, etc.)

    Patching the master stations is much simpler than dealing with the insecure by design issues of PLCs identified in Project Basecamp and elsewhere. Still needs to be planned, tested, rollback capable, but if you can’t do this then your ICS is very fragile.

    I don’t want to be part of a community that lives by “things like that do not change until disaster happens”.

    Dale

  • We live in a management by disaster world, whether you acknowledge it or not. This is how all governance works. I have watched it happen professionally for nearly 30 years. Changing that aspect of this problem would be a monumental cultural achievement.

    There is little we can do to the DNP3 protocol that would make these software errors less of a problem. The protocol is not the problem. The problem is that our vendors are human and they didn’t test to the state of the art. Why? Because their customers didn’t think to ask for it. And why is that? Because their customers didn’t even know that this was a problem.

    They couldn’t put this level of detail in to their product because if they did, someone else would come along and build an indistinguishably worse product for less, and they would lose market.

    That’s the reality of this business. We get pretty much what we paid for.

    This is a structural problem in the software business. Eventually these problems do get fixed, just as software used to have horrifying memory leaks until people developed inexpensive tools to find and track them. Crain and Sistrunk have broken new ground by fuzzing this protocol. Others will be fuzzed soon too. These problems will be found and dealt with by vendors once customers learn to expect it in the costs and performance. And then a new technique will come along in another ten years, and we’ll repeat this cycle once again.

    This is but one of a long series of discoveries percolating in to industry. You are ranting because it isn’t as instantaneous as you would like. Yet even if it took half of the time it currently takes to percolate these changes through the industry, the opportunity to exploit the weaknesses would remain.

    Should we get better? Well, we’d all like to. So how much are we willing to spend?

    Using a more familiar example: Do you trade in your car every year so that you can get the latest safety upgrades? No? Cars still have a lifetime of approximately ten years, just as they did in decades past.

    Why is that? Because most people can not afford to change cars just to get the latest and greatest features every year. Likewise, we move at the speed of new process equipment. Much of it is designed for multi-decade lifespans. The driving force is the cost and the methods of financing. There is only one way these features will be regularly incorporated in to these systems: Regulation.

    If you want to fight this battle, you need to talk to the people who regulate the business. And you need to codify these things in to law. This is exactly what Joe Weiss does on a regular basis. Regardless of what you may think of his position on the issues, this tactic is the only one that makes sense.

    Stop ranting and talk to your local PUC. Talk to your state legislators. Talk to the Federal Regulators.

    Those of us who are employed by actual utilities have ethical obligations not to lobby on behalf of spending programs that might enlarge our status within the organization. In other words, I, as a utility employee, can not lobby for such programs. My normal retirement date is coming up in three years, so perhaps after that things can change.

    And when you are granted all the authority to do the things you want to see happen, just remember who you’ll need to deal with to make things happen. I always tell people to be nice in this business. You never know who your boss will be tomorrow.

  • RonF

    “Using a more familiar example: Do you trade in your car every year so that you can get the latest safety upgrades? No? Cars still have a lifetime of approximately ten years, just as they did in decades past.”

    This works because of the proper regulation you speak of…

    http://www.nhtsa.gov/Vehicle+Safety/Recalls+&+Defects

    I don’t buy a new car for new safety features but when a defect is found I can expect a free fix. Sometimes this is a ‘patch’. Sometimes this is a whole new part. Your analogy is sound, but don’t think of the car as a PLC or other subsystem, but the whole system containing many subsystems.

    “The problem is that our vendors are human and they didn’t test to the state of the art. Why? Because their customers didn’t think to ask for it. And why is that? Because their customers didn’t even know that this was a problem.”

    I believe this is the main problem. It’s not about the vendor putting out a perfect product out the gate. That is impossible. But when a researcher provides free testing, provides data on the defect and possibly how to fix it, and the vendor takes little or no action, then what? The customer being ignorant of a potential problem is not a viable excuse for ignoring a known/notified defect in your product.

    Those of us who are employed by actual utilities should demand this support, because our lives are at stake.

  • Having been in the SCADA supply business for many years and having had to design and develop drivers, both at the master and remote ends of the wire, for a lot of protocols (including DNP3) I see this issue as being a case of poorly written (or worse yet – integrated) drivers. There is no such thing as a magic message frame. They may violate the specifications of the protocol and be filled with invalid command codes and data, but there is nothing you can send down an asynchronous serial com channel that in and of itself will crash a driver. What is usually happening is that there are error conditions that were never thought about or improperly handled in the code and when the messages cause a branch to that part of the driver (or have no handler to deal with the condition) the driver usually crashes. If the SCADA system is robust in its design then the driver ought to be running as a separate task and the system should note its death and restart it again. (And yes you can send the same message sequence and kill it again.) Too many SCADA vendors just pay some other group to supply a driver and then they do a poor job of integrating it or the driver itself has these logic errors and so now all vendors using the same software are vulnerable to the same message sequences. There is nothing particularly wrong with DNP3 as a protocol (it is way ahead of many of the legacy protocols still used by electric utilities). The problem lies in the poor coding and testing of the drivers that are supposed to “speak” and understand the protocol. SCADA vendors need to use techniques like fuzzing to stress their drivers and find all the logic holes before the bad guys (or researchers) do.

  • Jeff Brandt

    The car analogy is simplistic. Your choice about your car impacts (directly) mostly you, and while it may have an indirect impact on others (if you drive off the road), you are the primary person impacted.
    ICS security is different. The persons making the short-sighted decisions are rarely impacted, directly and personally,

  • Mike Fitzpatrick

    I would like to chime in on Shaw’s response. Although I don’t have broad experience with many systems, I do have an understanding of the system that I support and it works just as Shaw has defined. Yes, I believe an ill-formatted command could cause the driver for that line to abort. However, that is one line that communicates to a single device. That will not cause the SCADA system to fail and it will not result in loss of all control because of it. Unfortunately, the article resorts to fear tactics to try and raise a red flag for an issue that may or may not be a problem for certain systems. I agree that there may be systems that would crash, as Shaw states, due to poorly integrated drivers. However, not all systems are built the same or tested the same.

    The article states that an attacker could even use unsolicited reporting to compromise the system. Again this depends on the system and how it is designed. My system does not enable this and therefore does not listen or respond to such commands coming from the remote device.

    Bottom line is, yes, there are potential problems with systems, but to categorically state that it is a protocol issue and that every serial implementation is going to be susceptible is just overblown hype.

  • @Tim @Mike.

    You’re absolutely right.

    We observed a ton of integration issues due to what we can only explain as poor/weak API design from source code vendors.

    It’s hard to get people to use APIs correctly even if they’re excellently designed. If you pass them trash and they choke on it, the blame is on your libraries not the integrator.

    Good points. Fuzz tested library does NOT equal a secure integration. We’re going to discuss this with specifics at S4. Rotem Bar is also presenting exclusively on integration vulnerabilities too.
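Several comments above turn on whether a bad frame from one line takes down only that line's driver or the whole master. As a rough sketch of the containment Shaw and Fitzpatrick describe, the polling pass below treats any failure while handling one channel as that channel's problem and takes the channel offline after repeated failures instead of retrying forever. Channel, poll_all, read_frame and parse are hypothetical names for this example, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Channel:
    name: str
    errors: int = 0        # consecutive failures on this channel
    online: bool = True


def poll_all(
    channels: List[Channel],
    read_frame: Callable[[Channel], bytes],
    parse: Callable[[bytes], object],
    max_errors: int = 3,
) -> Dict[str, object]:
    """One polling pass; a failure on one channel never reaches the others."""
    results: Dict[str, object] = {}
    for channel in channels:
        if not channel.online:
            continue
        try:
            results[channel.name] = parse(read_frame(channel))
            channel.errors = 0
        except Exception:
            # Deliberately broad: any driver bug is contained to this channel.
            channel.errors += 1
            if channel.errors >= max_errors:
                channel.online = False   # stop polling a line that keeps failing
    return results
```

Python can catch the failure in-process. A driver written in C that reads past a buffer cannot, which is why running each driver as a separate, supervised task matters in real systems. Either way, an operator still loses visibility of the affected line until someone investigates.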
