High availability and honeypots: The truth about industrial security
There’s an old saw that says if we built a house the way we build software, we could pull out one nail anywhere and the house would come down. From my interviews, I gather that data security must let some people see the data because of national interests. I suppose this makes sense, but not for SCADA systems.
For example, if a single BB from a low-powered BB gun hit me, I would just get angry. But if somebody shot me with a shotgun, I would die. Relating that to SCADA systems, we should be able to absorb small anomalies in the production process. IT security, by contrast, wants high bandwidth and confidential transmission. To IT, occasional interruptions don’t matter.
I use my computer regularly for distance learning and teaching. This requires my system to be online no matter what. To some extent, I don’t care about bandwidth or confidentiality. When I call up my systems expert, he insists on increasing bandwidth, which sometimes reduces availability. It is a quandary.
Downloading private information, even from Amazon, can tolerate outages of minutes or even hours without catastrophe. CIA stands for confidentiality, integrity and availability, in that order of priority. We, in industry, prefer to reverse those priorities. We don’t care if you know how much beer we make, but we do care if the process stops. The IT view of security is prioritized backwards from the industrial view of security.
If we are manufacturing a truck every three minutes and go down for three hours, that’s 60 trucks, and that lost time can’t be recovered. An automation process is a hard, real-time system. Keeping the system running is more important than the speed of processing or downloading movies. Web searching indicated that a single downtime incident can cost up to $50,000 and that such incidents can occur more than 10 times a year. That’s a half million dollars a year of downtime.
I am reminded of an automobile I purchased recently. They offered me a warranty. I don’t want a warranty. I want to purchase a car that does not need one.
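The downtime figures quoted above are simple multiplication, and a few lines of Python make them easy to rerun with your own plant's numbers. This is only a sketch: the function names are made up for illustration, and the inputs are the column's round figures, not measured data.

```python
def trucks_lost(downtime_hours: float, minutes_per_truck: float) -> float:
    """Units of production lost while the line is down."""
    return downtime_hours * 60 / minutes_per_truck


def annual_downtime_cost(cost_per_incident: float, incidents_per_year: int) -> float:
    """Yearly cost if every incident costs the same amount (a simplification)."""
    return cost_per_incident * incidents_per_year


if __name__ == "__main__":
    # One truck every three minutes, line down for three hours.
    print(trucks_lost(downtime_hours=3, minutes_per_truck=3))
    # Up to $50,000 per incident, 10 incidents a year.
    print(annual_downtime_cost(cost_per_incident=50_000, incidents_per_year=10))
```

With the column's numbers this comes to 60 trucks and $500,000 a year. Real incidents vary in length and cost, so treat the result as an upper-end estimate.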
I’ve been associated with several systems that are real-time, including the programmable controller (PLC) and the fast food cash register. Most of us can’t remember the last time Big Macs were held up because of a computer crash or the last time a PLC had a blueline of error.
The PLC got this reliability in part because of its OS. My long-suffering secretary asked me, “How come the PLC never fails?” I lied to her. I asked her, “When our system goes down, what do you do?” “I unplug the router, count to 10 and it runs again.” I continued with, “And with most PLCs, we ‘reset’ the system 100 times a minute.” “Oh. Why doesn’t everybody do that?” That is a more difficult question to answer.
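One way to read that “reset” is as the PLC scan cycle: on every pass the controller reads its inputs afresh, solves the logic from scratch and writes the outputs, so a one-scan glitch does not linger. The sketch below is my interpretation, not any vendor's firmware. The names `run_scans` and `motor_logic` are hypothetical, and real PLCs also keep retained memory that this toy model ignores.

```python
from typing import Callable, Dict, Iterable, List

Inputs = Dict[str, bool]
Outputs = Dict[str, bool]


def run_scans(input_snapshots: Iterable[Inputs],
              logic: Callable[[Inputs], Outputs]) -> List[Outputs]:
    """Run one scan per snapshot: read inputs, solve logic, write outputs."""
    history: List[Outputs] = []
    for snapshot in input_snapshots:
        image = dict(snapshot)    # input scan: copy into a fresh input image
        outputs = logic(image)    # logic solve: recomputed from scratch each scan
        history.append(outputs)   # output scan: stand-in for writing to hardware
    return history


def motor_logic(inputs: Inputs) -> Outputs:
    # Hypothetical rung: run the motor when start is pressed and the guard is closed.
    return {"motor": inputs["start"] and inputs["guard_closed"]}


if __name__ == "__main__":
    scans = [
        {"start": True, "guard_closed": True},
        {"start": True, "guard_closed": False},  # a glitch lasting one scan
        {"start": True, "guard_closed": True},
    ]
    for out in run_scans(scans, motor_logic):
        print(out)
```

The motor output drops for the one bad scan and comes straight back on the next, because nothing from the bad scan is carried forward.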
The secret to fast food reliability and availability is the communication system that we installed in the first White Castle systems. I knew that the communication wires going to the back room would have to run close to the deep fryers. The fryers were heated by SCR circuitry that was electrically noisy. We knew from Shannon (one of my heroes) that the ratio of signal power to noise power was the key. We hypothesized a knife switch connected to a car headlight and a car battery. We knew that if we put in a system like that, no noise would ever interfere. The first specification on the communication system was ±12 V and one amp. It was a communication system on steroids.
When I worked on early airplane systems, we used three computers situated far apart in the body of the craft. Each time a computer detected a failure, we changed to the next computer. This round robin continued even if one computer had been shot. Efforts to use redundant computers for backup were made then and still exist today. Rather than go into detail, go check the web yourself.
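The round-robin idea can be sketched in a few lines. This is an illustration under stated assumptions, not the avionics design: each “computer” is modelled as a plain function that raises `RuntimeError` when it has failed, and `run_with_failover`, `healthy` and `dead` are hypothetical names.

```python
from typing import Callable, List, Tuple


def run_with_failover(computers: List[Callable[[int], int]],
                      active: int, task: int) -> Tuple[int, int]:
    """Try the active computer; on failure, rotate round-robin to the next.

    Returns (result, index of the computer that answered).
    """
    n = len(computers)
    for attempt in range(n):
        idx = (active + attempt) % n
        try:
            return computers[idx](task), idx
        except RuntimeError:
            continue  # this computer failed, move on to the next one
    raise RuntimeError("all computers failed")


def healthy(task: int) -> int:
    return task * 2  # stand-in for the real computation


def dead(task: int) -> int:
    raise RuntimeError("no response")  # stand-in for a computer that was shot


if __name__ == "__main__":
    computers = [dead, healthy, healthy]
    print(run_with_failover(computers, active=0, task=21))
```

With the first computer out, the call is answered by the second. The system only gives up when all three have failed.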
We generate electric power using both atoms and molecules. We can’t recover easily from a power failure. I live in the woods in New Hampshire, north of Boston, and get a micro-failure every day and a major failure once a month. All of my computers are now laptops with built-in batteries. We can ride over most failures and I can still have the availability of my computer.
We forget this real-time requirement on real systems. Classic software allows interrupts and “go-to” avalanche software to exist. A PLC has an unusual OS, the subject of many of my columns. The cost of a PLC failing is never covered by the purchasing agent. As my daddy once told me, “The purchasing agents know the cost of everything and the value of nothing.”
Most production systems run from five to 20 years, and we really don’t want to service them. My copier is a case in point. The copier is an amazingly complex system. Colour printing, two-sided copying and automatic stapling are some of its functions. Most servicing of the copier is done by my office staff. Here, staff is not plural. We can’t do that with automation systems. The buyer should be able to get the system up quickly. I remember the days when we repaired TV sets or bought computer kits at the store. Now we buy complete systems, not the components.
We can even have fun with a system that’s AIC versus CIA. As part of the system, we would put in a honeypot. A honeypot is a decoy that distracts intruders and possibly helps detect them. We would put real data on how much beer we made behind a low-grade security wall. The hacker would be satisfied, and we might be able to identify him or her. The spy-novel version of a honeypot uses seduction to find out enemy secrets.
The costs of failure in an automation system are high, and avoiding them more than repays the effort to make solid systems. What we need is the brick outhouse, with a lock on the door. You have to think about systemic and not component costs. Your car is built well, but almost every critical system is backed up mechanically: the brakes, steering and lighting. Only one headlight goes out at a time; we use two pistons for braking; and the steering system is an assist, not a standalone device. Take those to heart and design a system, not a collection of components. Good luck, and remember these are not answers, just places where we should look for answers.
This column originally appeared in the October 2013 issue of Manufacturing AUTOMATION.