
Generating network traffic for Quickdraw Security events.

My temporary job here at Digital Bond is to support Digital Bond’s control system technology lab and specifically the Quickdraw project.  That primarily means identifying and generating significant ‘representative’ network traffic, specifically control system traffic that may have security significance.  We are using real control system hardware devices to produce the ‘representative’ network traffic.  In this blog entry I am going to outline how this work relates to some of the other Digital Bond projects and describe at a high level what has been done toward identifying security-significant control system traffic.  But the real reason I am writing this is to solicit feedback about the events we have identified (or missed) so far.  I am also going to describe the software tools that I am currently using to stimulate the control system hardware to produce relevant network traffic, and to solicit suggestions on other tools or methods that I might be able to employ.  I am especially interested in tools that can not only interact with actual control system devices to produce control system traffic, but can also facilitate the production of security-related control system device traffic, e.g. authentication, account creation, audit policy logging, etc.

Portaledge – Quickdraw – Quickdraw Security Events
By now most of you have at least heard mention of the Portaledge, Quickdraw and Security Events projects at Digital Bond.  These projects have a large degree of synergy, although they are not entirely interdependent.  At a high level, the aim of Portaledge is to aggregate security events from a variety of data sources.  One of those sources might be passive log generation from the Quickdraw project.  The Quickdraw project is all about tracking network communication to generate logs for user transactions with “security implications”.  The Security Events [1] part of the Quickdraw project aims to recognize control system specific “security events, extract[...] parameters and create the security event, and send[...] the security event to a log server, historian, SEM or other log aggregation server.”

Quickdraw security events.

The benefit of identifying the Quickdraw security events is twofold.  On the one hand, Quickdraw provides a mechanism to compensate for the lack of explicitly security-related logging support on today’s control system equipment.  On the other hand, and arguably even more important, by identifying potential security events we are exploring just what kind of process control events might signify or contribute to the detection of a security condition and should therefore, in the future, be included in the security audit capabilities of control system devices.

A while back I brainstormed a set of potential security events.  Based on feedback from Dale, there is a first cut of these events on a Scadapedia page.

It turns out that some of these events can be facilitated with the basic network technologies available.  I can toggle power on devices, change DHCP MAC address entries and such.  I can push ladder logic, firmware, etc.  But process control traffic, and especially security-specific process control traffic, is a bit more challenging.  As I have attempted to generate ‘representative’ network traffic I have become more concerned that I don’t have the tooling to address the specifically security-related aspects of the protocols, e.g. authentication, account creation, audit policy logging, etc.  I am also a little concerned about a point raised in a comment on a blog entry on an unrelated subject by Digital Bond’s Kevin Lackey.  That comment, from Ralph Langner, stated:

 “The big risk with controllers, however, is process manipulation. This can hardly be done via firmware upload, but it can easily be done by using the controller’s command protocol, be it Modbus, Ethernet/IP or whatever proprietary stuff.”

I am not convinced that the events we’ve come up with so far in any way relate to ‘process manipulation’ as described by Ralph Langner.  Maybe by the time the process is being manipulated the game is already over, but I’m definitely interested in other people’s thoughts on the matter.

Another key problem I have encountered is that although Digital Bond has been acquiring several paradigm examples of networked control system components, the Digital Bond lab does not represent an actual industrial process.  Take even a really simple event, say event #45, Reboot or Restart.  Some devices announce themselves when they come online, but without any sort of heartbeat or normal traffic in place, a device might conceivably come online and go offline without so much as a peep.

Now I would like to solve this by essentially building a big old ‘hard’ honeynet architecture, with for example Wonderware being used to monitor the systems and IO simulation pumping normal events into the various controllers, all generating ‘representative’ control system traffic.  In this way there would be a background to compare against so if a new device were rebooted/restarted that would be detectable in the traffic not only by the temporary absence of traffic but by failure to connect type events, etc.  Note that wouldn’t always be true because some industrial protocols use connection-less UDP based packet broadcasting and multi-casting which means that rebooting a data recipient still may not create an observable.  Anyway, a full blown ‘hard’ honeynet architecture isn’t an option right now, so let’s look at the control protocol client software technology I’ve been working with so far. 
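To make the "temporary absence of traffic" idea concrete, here is a minimal Python sketch of how a passive logger might flag silences for one device once there is regular background traffic to compare against.  The function name, the timestamps and the 5 second threshold are all made up for illustration, and as noted above this would not catch a rebooted recipient of UDP broadcast or multicast data.

```python
def find_silences(timestamps, max_gap):
    """Return (last_seen, next_seen) pairs where a device was quiet for
    longer than max_gap seconds between two observed packets."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# Hypothetical capture times (seconds) for packets from one controller.
seen = [0.0, 1.0, 2.0, 3.1, 45.0, 46.0]
gaps = find_silences(seen, max_gap=5.0)
print(gaps)
```

A gap on its own only says the device went quiet; it would still need to be correlated with failure-to-connect events before calling it a reboot.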

For the Ethernet/CIP technologies I can work with RSLogix ladder logic upload/download.  I can view the status of ControlLogix memory by querying it with Wonderware via the Wonderware Ethernet/IP Data Acquisition Server.  That strategy works for the Modbus traffic too.  In addition to Wonderware, a very simple tool I can use to both read data from and write data to a Modbus controller is the Win-Tech ModScan32.  The demo only runs for a couple of minutes, but I can use it to demonstrate basic connectivity.  I use RSLogix to change MNet1.WriteData[0] on the controller and then the change appears in the 40001 holding register.  I use ModScan32 to change the 40601 holding register and I can see with RSLogix that the value has changed in MNet1.ReadData[0].
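As a rough illustration of what a single holding register write looks like on the wire, here is a minimal Python sketch that builds a Modbus/TCP Write Single Register request (function code 0x06) by hand.  It assumes the common convention that holding register 40001 corresponds to protocol address 0 (some devices and tools offset this differently), and the function names, transaction id, unit id and value are made up for the example.  This is not how ModScan32 itself is implemented.

```python
import struct


def holding_register_offset(reference):
    """Convert a 4xxxx holding register reference (e.g. 40601) to a
    zero-based protocol address, assuming 40001 maps to address 0."""
    if not 40001 <= reference <= 49999:
        raise ValueError("expected a 4xxxx holding register reference")
    return reference - 40001


def write_single_register_frame(transaction_id, unit_id, reference, value):
    """Build a Modbus/TCP 'Write Single Register' request as bytes."""
    # PDU: function code 0x06, register address, new 16-bit value.
    pdu = struct.pack(">BHH", 0x06, holding_register_offset(reference), value)
    # MBAP header: transaction id, protocol id (0 = Modbus),
    # length of everything after the length field (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Hypothetical request: set holding register 40601 to 1234 on unit 1.
frame = write_single_register_frame(1, 1, 40601, 1234)
print(frame.hex())
```

The function code byte is the part a passive sensor would care about, since it distinguishes writes from ordinary reads.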

Unfortunately, Wonderware doesn’t ship a DNP3 Data Acquisition Server.  For DNP3 I have found that I can use the Triangle Microworks Communication Protocol Test Harness to verify connectivity with the DNP3 devices.  By copying the DNPSNET.Status.Scn_cnt to the DNPSNET.Data.DNP_A1 data address in the ladder logic, I can then use the Test Harness to read the first analog value, which reflects the ladder logic induced changes to the DNPSNET.Status.Scn_cnt.  Another advantage of the Triangle Microworks Communication Protocol Test Harness is that it is the only tool I’ve found so far that I believe is going to help me with the explicitly security-related features of DNP3.

It looks like there is quite a bit more I can do with the Triangle Microworks Communication Protocol Test Harness, especially when accessing devices like SEL-351s.  I am thinking that the ASE2000 from ASE Systems would also be very helpful.

So my questions to the Digital Bond blog readers are:

 1.  What do you think of the initial security event list?  What is obviously missing?
 
 2.  What do you think of the client tooling approach we are taking?  Will these protocol clients be able to generate the requisite security events?  Would it make more sense to go with a full package from a single vendor to get features like authentication?  Are there available demo applications that would be useful for instrumenting control system traffic? 
 
Advice, criticism, comments etc. welcome!

Martin

Comments

Comment from Ralph Langner
Time: January 1, 2009, 10:30 am

First of all, happy New Year to everyone.

“Maybe by the time the process is being manipulated the game is already over, but I’m definitely interested in other people’s thoughts on the matter.” — There is still very much value in collecting this as a security related event, Martin.

First of all, you get some hard facts for your forensics. WHEN, HOW, and BY WHOM was the process (i.e. variables, timers, counters etc.) manipulated? Your logs will tell. Besides, if we assume that most manipulations will not result in a sudden “game over” situation, but in some performance or quality degradation, maintenance engineers will be able to make good use of your event logs to determine what exactly went wrong. Without such guidance, an attacker could drive a maintenance crew nuts by simply doing little manipulations here and there over a long period of time.

Second, if it is a known fact among staff that all manipulations (or at least all manipulations of critical variables) are logged, this also acts as a preventive countermeasure. The insider attacker is no longer untraceable, and will therefore perhaps find this type of attack too risky.

Third, sooner or later the user (=asset owner) may think, well, if we are able to intercept and log process manipulations, we might as well install better filtering & authentication procedures to prevent unauthorized manipulations. Asset owners with no control of and no idea of what’s going on in their PCN sometimes think secure access isn’t within reach anyway. Once you show them that every single bit can be traced and controlled, some start to rethink strategy.

Just some lessons we learned from our “Total Control” project, which was somewhat similar to Quickdraw, but not nearly as ambitious.

Comment from Michael Toecker
Time: January 2, 2009, 7:18 pm

Martin,

I think that the majority of events that would fall into Ralph’s malicious manipulation category will likely go unnoticed by your current setup, simply because those types of manipulation are based on the specific system, and couldn’t be abstracted out to every system. Maybe you should have a few generic events that would allow owners to watch for changes to specific points that are outside what they would consider normal.

Watching for forced points would be good. Forcing points is typically not *supposed* to be done in process environments except as a weapon of last resort. It’s also one of the easiest ways to mask manipulation, or cause other haywire to ensue. Some forcing is done on the HMI (maybe out of scope), but some is done on the controller itself.

I also didn’t see anything in there about alarms and events. I know some controllers have their own alarm reporting outside of normal point polling within the controller (especially SCADA controllers). These alarms are usually configured at commissioning time, and changed only if the physical equipment being controlled is switched out (or the settings were WRONG to begin with). If someone were to disable those alarms, I’d want to know about it, especially since sometimes an HMI/RTS isn’t configured with its own alarms if the system is designed to accept events coming from the controller. I don’t know if you are planning to use any controllers that create their own alarms and events though. Since the lack of alarming contributed to the Northeast Blackout, it would be something that should be watched.

Ummm. Maybe a change in Data Type for points coming from the controller? I don’t think that would work because it would require Quickdraw to keep track of all the data types of all the points. But, switching from a signed to an unsigned integer could be a real fun way to mislead an operator using a legacy system that doesn’t check data types. File that under mischief+mayhem category.
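To illustrate the signed versus unsigned point, here is a small Python sketch showing how the same 16-bit register contents read very differently depending on the data type the receiving side assumes.  The register value is made up for the example.

```python
import struct

# One 16-bit register as it would appear on the wire (big-endian).
raw = struct.pack(">H", 0xFFF6)

as_unsigned = struct.unpack(">H", raw)[0]  # interpreted as unsigned
as_signed = struct.unpack(">h", raw)[0]    # interpreted as signed

print(as_unsigned, as_signed)
```

The bytes on the network are identical in both cases, which is why a passive sensor would need to already know each point's expected type to notice the switch.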

Events for bad quality points (i.e. points that are transmitted late) would be good. A malicious person will likely be changing points quickly to compensate for operator actions, and this would cause the controller’s reporting to slow down. Lots of these would indicate problems outside of security though too, like controller failure or a malfunctioning network connection, and are likely already in the system.

Tall order you gave, lots to think about…

Mike

Comment from Martin Solum
Time: January 6, 2009, 6:49 pm

Thanks to both Ralph Langner & Michael Toecker for your comments. Both are really appreciated.

Both comments made me think about the problem of application specifics. When I first started looking at this I thought PLC faults would be a natural thing to try to passively log. But as I started digging into the Rockwell Controller a bit it seemed like we would have to add fault handlers to generate fault information to be passively logged, and in that case we might as well just ‘actively’ log it. And the issue of what faults to log where seems to be a pretty daunting task.

The forced points concept is a really interesting idea, really a critical idea, but it also runs into application specifics. If the packets from the HMI cause a changed set point, that might very well be business as usual. On the other hand, if packets from the HMI (or really, anywhere outside of the PLC’s inputs) change a process variable, that’s a red flag. But from a passive logging point of view, how do I know which change is normal and which is a process variable force? I think it can be done, but I am not sure it can be done without essentially reading the application specific data dictionary and building a map of what can and can’t be changed under normal circumstances.

Alarms are a big application issue too. The ability to sense that alarms or alarm heartbeats are being disabled would be very valuable. But alarms being “usually configured at commissioning time” (as Michael stated) means there would have to be a way to read in the local configuration, and that’s going to vary like crazy across protocols, technologies and especially local applications.

Data type variance would indicate something very anomalous. But I am thinking a lot of processing might result in very little useful information. It would be an expensive way to catch an adversary’s goof, but wouldn’t catch huge but correctly typed process manipulations.

All this makes me think it’s critical to try to find out what critical information can be passively logged without having application specific knowledge, even though sooner or later application profiling is probably going to be necessary.

You both emphasize that Quickdraw really is a very ambitious project. I guess that’s why I think it’s so cool. ;-)

-Martin

Comment from Ralph Langner
Time: January 7, 2009, 9:29 am

Analyzing process manipulations at this level is somewhat like separating spam from legitimate mail. It’s not about black and white, it’s about how light your grey zone gets. Some strategies that can be used:

1. Log ALL manipulations of all process variables. Usually only little information is written to controllers, e.g. only once per lot, so storage requirements are minimal. The log mostly acts as a reference database for forensics, similar to an audit trail.

2. Provide a configurable whitelist of legitimate clients (e.g. HMIs, OPC servers) and log only manipulations that originated from unlisted sources.

3. Log only access to critical process variables. In some facilities, there is only a handful of variables that would lead to catastrophic consequences. In general, the variables (or registers, memory areas etc.) to be monitored should be configurable.

4. Log any write access to memory areas that are never used for normal operation. This is product specific. For example, Siemens PLCs come with the outright silly feature to manipulate markers, timers, and counters via the network (all of these are used for internal program execution of the ladder logic). This would/should never be used by a legitimate application and will usually de-sync outputs.

5. Ideally you would have some scripting capability to let the user make application specific adjustments.
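Strategies 2 through 4 above could be combined into one configurable policy, sketched here in Python.  This is only an illustration under stated assumptions: the class names, IP addresses and register numbers are hypothetical, and a real sensor would first have to decode write operations from the protocol traffic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteEvent:
    """A decoded write to a controller (all fields hypothetical)."""
    source_ip: str
    register: int
    value: int


@dataclass
class LogPolicy:
    whitelist: set = field(default_factory=set)      # strategy 2
    critical: set = field(default_factory=set)       # strategy 3
    never_written: set = field(default_factory=set)  # strategy 4

    def reasons(self, event):
        """Return the reasons this write should be logged (empty = skip)."""
        found = []
        if event.source_ip not in self.whitelist:
            found.append("unlisted source")
        if event.register in self.critical:
            found.append("critical variable")
        if event.register in self.never_written:
            found.append("area unused in normal operation")
        return found


policy = LogPolicy(
    whitelist={"192.0.2.10"},      # e.g. the HMI
    critical={600},                # e.g. a safety-relevant set point
    never_written={9000, 9001},    # e.g. internal timers and counters
)

routine = policy.reasons(WriteEvent("192.0.2.10", 100, 5))
suspect = policy.reasons(WriteEvent("192.0.2.99", 9000, 1))
print(routine, suspect)
```

Returning the list of reasons rather than a yes/no answer keeps the grey zone visible in the log, and strategy 1 (log everything) is simply the case where the result is ignored.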

As for alarms, it should be quite easy to intercept any code that tampers with the alarm handling of a PLC at runtime. I don’t think it makes sense to bother with configuration issues at commission time as most security related issues in this department can be handled by using PLC version control systems.
