
Virtualization in the SCADA World: Part 1

A few years back, the traditional IT world was debating the merits of virtualization. There were concerns about performance, security, vendor support, and a host of other issues. Fast-forward to today, however, and you’ll find virtual machines in use in nearly every data center. The number one reason virtual machines have revolutionized server-side computing, I believe, is cost savings. I can deploy a server in a fraction of the time it used to take and, from a power consumption standpoint, operate it much more cheaply. And then there are the business continuity benefits: I can quickly fail over or recover to a virtual machine across the city or across the globe.

So what are the implications of this in the SCADA world? I think it’s just a matter of time before we see more widespread acceptance of VMware and other virtualization platforms in production control systems. The benefit here may be less about cost savings, though, and more about increased functionality. The ability to snapshot and clone machines for backup and testing, for example, is very attractive.

We’re going to examine this subject over a series of blog posts. Hopefully we’ll cover all the major topics: security, reliability, performance, serial communication issues, vendor support, and adoption rate, to name a few.

I look forward to your comments and opinions.

Comments

Comment from Ralph Langner
Time: January 31, 2008, 6:40 am

Our experiences with virtualization of SCADA systems are good. The few problems that we ran into were minor, for example one app that required a parallel-port dongle to run. Every asset owner who is expanding his IT environment should have a look at this.

Comment from Jake Brodsky
Time: January 31, 2008, 8:04 am

We’re looking at it too, but not for the purpose of saving servers. We think this is a great way to carefully phase in OS and application patches, while still using the common application resources.

Should something ugly happen, we’re only one context switch away from our previous configuration.

Comment from Ron Southworth
Time: January 31, 2008, 8:26 am

Hi Gents

I have heard of some assessments in test environments here in which a concern raised from testing a number of SCADA packages was the “increased attack surface” that virtualisation creates: “many machines, same potential problem.”

I suppose the same can be said of other OS environments. I can certainly see some advantages in using the environment, such as the potential to reduce kernel 0 attacks. I am still nervous about SCADA code running on virtual machines, only because some of this stuff out there still has legacy routines written in all sorts of long-forgotten languages that dinosaurs like me used to write in, especially that wonderful language COBOL. The code is just not designed to run in an environment where the question arrives after the answer!

Dongles? Well, frankly they are a pain. I would really love to see a support model where dongles were not required; I see so many problems caused by dongles, or by code failing in weird modes when accessing these devices.

My thoughts still come back to this: why is there such interest in making a control systems environment any more complex than is absolutely necessary? Is computer hardware, the OS, etc. really that unreliable?

Comment from Ron Southworth
Time: January 31, 2008, 8:28 am

Jake, the sort of testing you speak of is a good reason for virtualisation. I still love the KISS principle, though.

Comment from Christofer Hoff
Time: January 31, 2008, 9:41 am

I’m interested in how risk is being assessed against these virtualized platforms (for better or for worse, in your estimation). Further, what standards and/or guidelines would you use to assess the entire virtualized system as a whole?

I deal with folks in the DoD, public sector and Fortune X00 daily who have entire teams working on crafting strategies around virtualization, so I have a good idea of what most folks are doing, but since control system security continues to be described as being so different, I’d like to understand the approach better.

While the benefits of virtualization are many, they come with an attendant increase in complexity, additional management interfaces, new processes and procedures, a lack of mature security toolsets, and lost visibility for traditional detection and prevention tools.

I won’t speak to the patch cycles for the underlying virtualization platforms themselves, as I did that on my blog, but I’m not sure that many of these points are considered as well as they should be before deployment.

/Hoff

Comment from Michael Toecker
Time: January 31, 2008, 12:27 pm

Coming from the electricity sector, I’ve routinely heard only one main issue with using virtualization in electric SCADA environments: How is a NERC-CIP auditor going to treat a virtualized set of systems?

Would the host machine be classified as a Critical Cyber Asset (CCA)? Could a virtualized platform be a non-CCA that contains OTHER CCAs, and still be compliant? Where would the Electronic Security Perimeter be when a portion of the network is virtualized inside the host system? How do we test patches and security updates to the host in a manner that reflects the production environment if our critical systems are running inside it?

If there is no answer to that set of questions, we will find many owners who will not make moves in the VM direction, regardless of the benefits, until they know that they can be compliant with NERC-CIP.

Mike

Comment from Landon Lewis
Time: January 31, 2008, 1:31 pm

I’m thinking the problem is going to be support, all the way. The different SCADA software vendors barely support their applications after you apply OS patches as it is. The next problem or limitation, as previously mentioned, is the serial, USB, and parallel licensing methods used by the different vendors to protect/control the usage of their software. If some of these licensing issues can be resolved, then, supported or not, it makes a decent approach for QA testing those patches and possibly even a tertiary failover environment.

Comment from Marty Edwards
Time: January 31, 2008, 2:42 pm

I agree with Landon on the support issue. The systems that I have virtualized under VMware ran fairly well, and the ease of restoring a system to ‘known good’ with snapshot images is great. The problem is that most control system vendors don’t even want to talk to you on the tech support line if you mention that their application is running in a VM. This will change as virtualization matures in the IT world… remember, us ‘controls people’ usually lag behind by at least five years in adoption of COTS technology, so we are poised to see virtualization adoption in the next year or so.

I too had some issues with dongles, but there are enough IP-based hardware solutions for adding USB ports, serial ports, or even parallel ports that you can set up one of these devices, put your dongle on it, and then load the driver on the VM box. Works great!!

Comment from Jake Brodsky
Time: January 31, 2008, 2:46 pm

In response to Michael Toecker’s queries about NERC CIP classification, I think this concern is exactly backwards. Build it with all due considerations and care from the start and let the NERC CIP classifications fall where they may.

Clearly, you shouldn’t virtualize things just because you can. Any virtualization effort ought to take diversity, redundancy, and availability issues into account.

And in response to Ron Southworth’s concerns, I agree. We need to keep the VM as simple and uncluttered as we can. Yes, there is more attack surface, so one should understand as much as possible about virtualization before going down this road. That’s the risk. The reward is the ability to quickly flip from old to new configurations and then back again if needed.

Comment from Ralph Langner
Time: January 31, 2008, 4:06 pm

Hey Marty, do you tell your vendor EVERYTHING?

Ron and Jake, in many instances virtualization can actually reduce the attack surface. The VM box is usually top-notch hardware with redundant power supplies, RAID, monitoring software and other cool stuff. If you use it to replace a couple of rusty PCs that no admin regularly cares about, this can be a tremendous increase in stability. Think of it as replacing four worn-out piston engines in an airplane with one slick turbine.

Comment from Clint Bodungen
Time: January 31, 2008, 4:32 pm

All good points and valid discussion, and I agree that as virtualization matures, everything will begin to fall into place in terms of vendor support and regulatory classification/guidance. Jake was onto a great point concerning a test environment: virtualization is a great way to create a test/lab environment for trying out changes before applying them to the production environment. Like so many others, though, I’m still hesitant to use it as a production platform. In addition to some of the other comments, my ultimate concern is single point of failure.

Whether you’re talking about an increased attack surface or some other outage, if the host goes down, so does every virtual machine on it. Yes, there are backup clusters and all, but now, for example, I only have to take out 2 hosts to disrupt 8 systems (4 live virtual machines and 4 backup virtual machines) instead of all 8 machines, as with a traditional cluster.

And as far as dongles are concerned, it’s like the Statue of David… sooner or later you are going to lose it. :)
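Clint’s host-count arithmetic can be checked with a small brute-force sketch. It assumes, hypothetically, four services, each with one live and one backup instance, and that a fault or attacker disables whole hosts at a time; the host names (`p1` to `p4`, `b1` to `b4`, `esx-a`, `esx-b`) and the `placement` mapping are illustrative, not taken from any real deployment. The function looks for the smallest set of failed hosts that leaves every service with neither its live nor its backup instance running.

```python
from itertools import combinations

# Hypothetical placement: service -> (host of live instance, host of backup instance).
Placement = dict[str, tuple[str, str]]


def services_down(placement: Placement, failed_hosts: set[str]) -> set[str]:
    """Services whose live AND backup instances both sit on failed hosts."""
    return {
        service
        for service, (live_host, backup_host) in placement.items()
        if live_host in failed_hosts and backup_host in failed_hosts
    }


def min_hosts_to_disrupt_all(placement: Placement) -> int:
    """Smallest number of host failures that takes every service offline (brute force)."""
    hosts = sorted({host for pair in placement.values() for host in pair})
    for k in range(1, len(hosts) + 1):
        for failed in combinations(hosts, k):
            if services_down(placement, set(failed)) == set(placement):
                return k
    return len(hosts)  # only reached for an empty placement (0 hosts)


# Traditional cluster: every live and backup instance on its own physical box.
traditional = {f"scada{i}": (f"p{i}", f"b{i}") for i in range(1, 5)}
# Consolidated: all live VMs on one hypervisor host, all backup VMs on another.
virtualized = {f"scada{i}": ("esx-a", "esx-b") for i in range(1, 5)}

print(min_hosts_to_disrupt_all(traditional))  # 8
print(min_hosts_to_disrupt_all(virtualized))  # 2
```

Spreading the live and backup VMs across more hypervisor hosts raises that number again, at the cost of buying and maintaining more hosts.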

Comment from Marty Edwards
Time: January 31, 2008, 4:35 pm

Ralph, no, I usually didn’t tell them EVERYTHING. However, there comes a time when you have to fess up or push back on their policies; otherwise there would be no change for the better.

Totally agree with your comment on rusty PCs vs. ‘top notch hardware’.

I think the biggest ‘win’ for virtualization in the control system environment is the added value in recovery and restore ability. Come on, the ability to spin up a new VM with a known good configuration in minutes on another VM server, versus having to re-image a ‘rusty’ server box, is pretty hard to overlook.

Comment from Ron Southworth
Time: January 31, 2008, 6:16 pm

Hi Marty,

I think you hit on a good point with vendor support related issues. I can understand their side of the coin as well. Also, as you rightly point out, if we as a community don’t ask vendors to start looking at different ways to protect and support their applications and to make changes, how can we ever expect to see improvement?

There is a particular brand of dongle, very commonly used, that has some very poor behaviour affecting availability, usually under heavy CPU load (exactly when you need the system to work reliably, to boot). Primarily, I have seen various packages losing the license or end-user seats on the license intermittently, just long enough for the license failure to flash on the screen and for the critical button press to be ignored. What I have observed to date does not seem to be dependent on parallel port or USB operation. Virtualisation can present problems with the intermittent “loss” of the dongle.

This caused a bit of a problem for us the other day in how it manifested as a control system failure. The failure was more related to the HMI software, but the trigger was the loss of the operator licence. It caused the mode of a pump to change, not a good thing for a piece of pipe! The thing is that these devices can be so easily defeated.

For the vast majority of people that are honest is it really worth the amount of effort and frustration the devices create?

“Systemic resilience” is something that I can see needs to be addressed, and this needs to take many forms. Virtualisation is one means to an end. I think we just need to verify the operation of packages so that availability is not going to be an issue, and see if we can persuade our vendors to support the virtualisation environment!

Comment from Ralph Langner
Time: January 31, 2008, 6:21 pm

Clint, I would like to stress your argument on single point of global failure. First, as you mentioned implicitly, we usually see redundancy for the VM server. For the average SCADA server… probably? Probably not. Now here’s the point. The asset owner decides to give virtualization a go. However, he’s a little bit shaky because of this all-eggs-in-one-basket thing and so on. So he is not only throwing in some redundancy, but also decent backup policies, monitoring, and on top of that an admin who pampers the box 24/7. In the end, he has a much more stable IT environment than before, even like in the old mainframe days!

Well, here’s the catch with your argument on single point of global failure. In many real-world production environments, every single one of your 8 SCADA servers may well be a single point of global failure, as we are NOT only talking about IT systems but also about a logistics chain. Kill one server, and the other seven may become useless, even if they continue to work fine. In situations like this, it’s easy to see that virtualization can actually take seven problems off your plate. Remember Lindbergh: he opted against a twin-engine airplane to cross the Atlantic because that way he had one less problem to worry about. As pilots like to say: one engine, one trouble; two engines, double trouble.
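Ralph’s logistics-chain point is the classic series-versus-parallel availability argument. The sketch below assumes independent failures and uses made-up availability figures (0.99 per box) purely for illustration; it also ignores the hypervisor, shared storage, and network as additional failure points, which a real assessment would have to include. If the chain needs all eight servers up, its availability is the product of theirs; a redundant pair of VM hosts is down only when both hosts are down.

```python
from math import prod


def series_availability(availabilities: list[float]) -> float:
    """Chain works only if every component works: A = a1 * a2 * ... * an."""
    return prod(availabilities)


def parallel_availability(availabilities: list[float]) -> float:
    """Redundant group works if at least one component works: A = 1 - prod(1 - ai)."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical figures, assuming independent failures.
eight_standalone_servers = series_availability([0.99] * 8)
redundant_vm_host_pair = parallel_availability([0.99, 0.99])

print(f"{eight_standalone_servers:.4f}")  # 0.9227
print(f"{redundant_vm_host_pair:.4f}")    # 0.9999
```

The same two functions capture the Lindbergh quip: if losing either of two engines is fatal, the pair behaves as a series system (0.99 × 0.99 = 0.9801, worse than one engine), whereas if either engine alone can keep the plane aloft, it behaves as a parallel one. Which model applies depends on the rest of the design.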

Comment from Jake Brodsky
Time: January 31, 2008, 9:10 pm

Folks, I’d like to decouple two issues here. First, I understand that a case can be made for better hardware if one virtualizes servers. However, the cost of the hardware is a relatively minor issue. The software, the installation, the patches and updates, and the regular backups cost more. I wouldn’t put a virtualized server on a common white box computer. But I don’t normally use white box computers anyway…

As for Lindbergh’s remark: two engines can be better, or they can be worse. One must consider the rest of the design and its use before making that declaration. The same needs to be said for virtualization.

Comment from Matthew Franz
Time: January 31, 2008, 10:06 pm

Wow, Jason gets the award for the first non-disclosure blog post that generated so much interest so fast. I’m sure Hoff has seen the viewpoint I see in my (non-SCADA) neck of the woods, where, in “Get Lean IT,” the burden of proof is to show why a given app *can’t* be run on ESX. Of course, upgrades of the virtualization platform (and the impact on the underlying network) can be sort of interesting when you think about the number of hosts being brought online after an upgrade.

In 2008, more posts on virtualization, fewer on vulnerability disclosure!

Comment from Ralph Langner
Time: February 1, 2008, 6:29 am

Absolutely, Jake: the overall design must be considered, and eight VM servers could perhaps be better than one. But there is theory, and then there is practice. Many of my clients have to fight for every dollar when they want/need to invest in new SCADA servers. They get all sorts of dumb questions. However, if they go for the big one (i.e. ESX), that changes. For many of them, the move to virtualization is the first time in company history that they approach industrial IT in a professional way, and if for nothing else, that makes it absolutely worthwhile.

Comment from Christofer Hoff
Time: February 1, 2008, 10:58 am

@Ralph, this is somewhat ranty and in no way directed at you personally other than to gain clarification on a couple of points you made :)

Undeniably, virtualization is about time and money. Making the most of one with less of the other. The benefits *always* are leveraged first in a compelling economic discussion such as that which you illustrate.

But that’s why we get into messes, as the downsides (which are often hard to quantify because of a lack of empirical data/metrics) get put in the “hold and we’ll deal with them after we save all this money” queue.

There are absolute and quantifiable operational and technical risks associated with virtualization. Weighing them against the upside is something that often gets swept under the rug of momentum.

However, we shouldn’t abandon rational and pragmatic approaches to dealing with this stuff, either.

Virtualization is absolutely going to emerge in your industry; it’s disruptive innovation and there’s very little to stop it, but we should also be careful about how we describe security in this context.

As such, I’m slightly confused by one of your statements.

How does increasing availability (by virtue of your statement regarding “stability”) have anything to do with “decreasing the attack surface?”

Somehow you’re alluding to the fact that reliability == security? It’s certainly a contributor. Is what you meant that the stability comes from someone paying more attention to and patching servers?

I can see potentially improving one’s security posture by the classical C, I, and A definition (assuming the appropriate investments are made to do so), but I think it’s unreasonable to generalize, from simply implementing virtualization with no other changes, that:

virtualization == more secure.

Further, are you saying that an administrator who doesn’t pay attention to 4 “old rusty servers” in terms of administration, patching, etc. is going to pay attention to 1-2 shiny new ones because of more blinky lights?

…doesn’t sound like a technology problem at all ;)

/Hoff

Comment from Ralph Langner
Time: February 1, 2008, 12:17 pm

Hoff, no problem with the rantiness. To make things clear: I’m one of those guys who don’t see intentional attacks as the major threat to a SCADA system. However, elaborating on this would lead us far away from the virtualization issue.

You are absolutely right that especially in my last post I am pointing more at an organizational issue than at a technological one. My experience is that with more centralization, the admin is likely to pay more attention to the equipment, and also gets better funding, because everyone realizes the importance of the machine. In a decentralized environment, some servers may well be overlooked even though they may be single points of global failure.

As for the technological advantages, I could add some more to those that others already mentioned, but I don’t want to appear like an ESX salesperson. Let’s just say you can nail me on the point that usually virtualization comes with organizational changes on the plant floor that increase security.
