Mustard, Steve - Industrial Cybersecurity Case Studies and Best Practices - ISA - International Society of Automation (2022)
Contributor to
Water Environment Federation, Design of Water Resource Recovery
Facilities, Manual of Practice No. 8, Sixth Edition
Industrial Cybersecurity Case
Studies and Best Practices
No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written
permission of the publisher.
ISA
P.O. Box 12277
Research Triangle Park, NC 27709
Acknowledgments
About the Author
Chapter 1 Introduction
About this Book
Terminology
Intended Audience
Chapter 2 What Makes Industrial Cybersecurity Different?
Introduction
What Are the Differences between OT and IT?
Relative Priorities
The Golden Triangle
The Significance of Technology
The Significance of Culture
Consequences
Mitigations
Foundations of Industrial Cybersecurity Management
Frameworks, Regulations, Standards, and Guides
The Difference between Frameworks, Regulations,
Standards, and Guides
National Institute of Standards and Technology
Cybersecurity Framework
ISA/IEC 62443
NIST Special Publication 800 Series
Others
Summary
Chapter 3 Creating Effective Policy
Introduction
Establish the Governance Infrastructure
Assign Senior Management Representation
Allocate Resources and Assign Clear Ownership
Establish Good Oversight
Reporting Cybersecurity Management System
Effectiveness
Tracking and Managing Cybersecurity Risk
Monitoring Changes
Communicate to the Organization
Regular Reports on Lagging and Leading Indicators
Prompt Reporting of Cybersecurity Incidents
Reporting Cybersecurity Observations or Near Misses
Reporting Cybersecurity Incidents to Employees
Monitoring Compliance and Benchmarking
Monitoring Against Policy
Monitoring Against Industry Standards
Summary
Chapter 4 Measure to Manage Risk
Introduction
A Brief Overview of Risk Management
The Importance of Risk Management
Defining Safety Risk
Defining Cybersecurity Risk
Industrial Cybersecurity Risk
As Low as Reasonably Practicable
Security Process Hazard Analysis
Quantifying Risks with Statistics
Monte Carlo Simulation
Bayes’s Theorem
Cybersecurity Safeguards
Using ISA/IEC 62443 Standards to Define Safeguards
Responsibility for Defense-in-Depth Measures
Simplified Assessment and Definition of Safeguards
The Future for Industrial Cybersecurity Risk
Management
Summary
Chapter 5 Standardized Design and Vendor Certification
Introduction
Benefits of Standardizing Designs
Essential Elements of a Standardized Design
Secure Network Design
System Hardening
Hardening Wi-Fi Networks
Physical Access Control
Electronic Access Control
Secure Remote Access
Network Monitoring
Cybersecurity Incident Response Plan
Backup and Recovery Procedures
Manual Procedures
System Availability
Specifying System Availability
Designing for System Availability
Other Considerations
Internet Protocol Addressing
Encryption
ISASecure
Summary
Chapter 6 Pitfalls of Project Delivery
Introduction
Secure Senior Project Leadership Support
Embed Cybersecurity Throughout the Project
Feasibility
Engineering
Construction
Commissioning
Start-Up
Handover and Closeout
Embed Cybersecurity Requirements in All Contracts
Raise Awareness Within the Project Team
Implement a Rigorous Oversight Process
Verification of Requirements
Risk and Issue Management
Performance Management
Summary
Chapter 7 What We Can Learn from the Safety Culture
Introduction
The Importance of Awareness
Underestimating Risk
Human Error
Supporting the Right Behaviors
The Safety Culture
The First Line of Defense
Training and Competency
Continuous Evaluation
Summary
Chapter 8 Safeguarding Operational Support
Introduction
Making Cybersecurity a Key Factor
Barrier Model Analysis and Visualization
People Management
Background Checks
Separation of Duties
Joiners, Movers, and Leavers
Manual Procedures
Inventory Management
Creating an Inventory for New Facilities
Creating an Inventory for Existing Facilities
Maintaining and Auditing the Inventory
Incident Response
Suppliers, Vendors, and Subcontractors
Insurance
Summary
Chapter 9 People, Poetry, and Next Steps
Bibliography
Appendix A: Resources
Index
Acknowledgments
To Neil Tubman and Mark Davison, the two most brilliant minds I have
ever come across, for your continued support throughout the years.
To Mike Morrissey, for showing me how to establish great partnerships
between staff and volunteers in professional associations and how to
achieve positive results in the process.
To Lauren Goodwin, for your support, and the Clase Azul Reposado.
To Ken Nguyen, for the opportunity to work on one of the most exciting
and fulfilling projects of my career, and to all my friends on the digital team
for making it fun along the way.
To Steve Huffman, Steve Pflantz, Leo Staples, and Mike Marlowe, for
your friendship, mentorship, support, and chicken pot pie.
To Blair Traynor, Nicky Jones, John Flynn, and Paul Holland for your
invaluable comments on drafts of this book.
To the ISA staff, for working together with the member community to
create a better world through automation.
Finally, to David Boyle, my friend and colleague for too many years to
count. A true friend accepts who you are, but also helps you become who
you should be. Thank you for helping me be better.
About the Author
Steve Mustard is a recognized authority on industrial
cybersecurity, having developed and delivered cybersecurity management
systems, procedures, training, and guidance to various global critical
infrastructure organizations.
Mustard is a licensed Professional Engineer in Texas and Kansas, a UK
registered Chartered Engineer, a European registered Eur Ing, an ISA
Certified Automation Professional (CAP), a certified Global Industrial
Cybersecurity Professional (GICSP), and a Certified Mission Critical
Professional. He also is a Fellow in the Institution of Engineering and
Technology (IET), a Senior Member of ISA, a member of the Safety and
Security Committee of the Water Environment Federation (WEF), a board
member of the Mission Critical Global Alliance (MCGA), and a member of
the American Water Works Association (AWWA).
Mustard writes and presents on a wide array of technical topics and is
the author of Mission Critical Operations Primer, published by ISA. He has
also contributed to other technical books, including the Water Environment
Federation’s Design of Water Resource Recovery Facilities, Manual of
Practice No. 8, Sixth Edition.
1
Introduction
From the early 2000s, when Vitek Boden used a stolen laptop and a radio to
wreak havoc at a sewage treatment plant in Queensland, Australia; through
2010, when the Stuxnet malware disrupted production at an Iranian nuclear
enrichment facility; to 2018, when attackers gained access to safety systems
and shut down a Middle East refinery, the threats of malware and
cyberattacks have increased in lockstep with advances in industrial
automation.
Some industry sectors have progressed further than others. Some sectors,
such as oil and gas, have invested heavily in industrial cybersecurity. Other
sectors, such as water and wastewater, remain behind the curve on
addressing their cybersecurity. In all cases, asset owners/operators have
largely developed their own solutions and systems in isolation. This results
in similar approaches with varying degrees of success.
During the final stage of writing this book, four OT-related cybersecurity
incidents occurred. The first, in February 2021, occurred when an
unauthorized remote access user tampered with the levels of a toxic
chemical in a water treatment plant.7 The second incident, reported a month
later, occurred in 2019 and was initiated by a disgruntled former employee
who attempted to remotely tamper with a different water treatment plant.8
In the third incident, a fuel pipeline was shut down for a week after the
company’s billing system was incapacitated by ransomware.9 The fourth
incident, also involving ransomware, occurred two weeks later, impacting
the operations of a global meat producer. The company paid an $11 million
ransom and was able to restore operations in less than one week.10 Some
experts distinguished the two ransomware incidents as IT, not OT,
cyberattacks. Although this is technically correct, the result was
indistinguishable from an OT attack. The pipeline control system was
disabled, and operations were shut down. This led to panic buying and gas
shortages across the southeastern United States. There was further potential
for interruption of critical services, such as airports, that depend on this fuel
supply. The meat producer shutdown could have led to similar issues had it
not been resolved as quickly as it was. In short, we are, on the whole,
woefully unprepared to adequately manage cybersecurity incidents, be they
IT or OT.
Poor project delivery can negate some or all of the benefits of secure
designs. This can take the form of poor execution or oversight. It might
entail the introduction of new vulnerabilities that are not properly identified
or addressed. It can even be seen in poor practices during the development
or commissioning of a system. This book will provide some guidance on
effective oversight methods.
Finally, this book will consider the role operational support plays in
industrial cybersecurity. That includes day-to-day activities such as
operating system patching and system backups, as well as preparation for
and response to cybersecurity incidents.
This book is intended to help identify these gaps and offer solutions to
address them. I focus on the highest risk areas that can be tackled
immediately with the least additional cost and effort.
I sincerely hope that we can make more progress going forward than we
have in the past 20 years. And I hope this book can contribute to that
progress.
Terminology
Cybersecurity, like many technical subjects, comes with its own lexicon
and, with that, many confusing and interchangeable terms.
There are ongoing debates about the most appropriate and inclusive terms
for the subject matter of this book. I have used the term industrial
cybersecurity, but I acknowledge that some sectors or specialisms consider
themselves excluded. For example, many building automation system
providers and users do not typically consider themselves as “industrial.”
The series title brings up a final terminology question: Should the term used
be “cybersecurity” or simply “security”? Some believe that the cyber
distinction leads to incorrect assumptions such as this: ownership of the risk
lies with an organization’s chief information security officer (CISO). Others
have accepted that the term cybersecurity has been adopted sufficiently and
that changing it would lead to further confusion.
Intended Audience
This book is intended for anyone involved in industrial automation and
control systems cybersecurity, including operators, technicians, engineers,
and managers within asset-owner and asset-operator organizations; product
vendors; system integrators; and consultants.
____________
1 George Dalakov, “The First Computer Virus of Bob Thomas (Complete History),” accessed
July 25, 2021, https://history-computer.com/inventions/the-first-computer-virus-of-bob-
thomas-complete-history/.
2 RSA is an acronym made up of the first letters of the last names of the three company co-
founders: Ron Rivest, Adi Shamir, and Leonard Adleman.
3 RSAC Contributor, “The Future of Companies and Cybersecurity Spending,” accessed June
21, 2021, https://www.rsaconference.com/library/Blog/the-future-of-companies-and-
cybersecurity-spending.
4 Gartner, “Gartner Forecasts Worldwide Security and Risk Management Spending to Exceed
$150 Billion in 2021,” May 17, 2021, accessed June 21, 2021,
https://www.gartner.com/en/newsroom/press-releases/2021-05-17-gartner-forecasts-
worldwide-security-and-risk-managem.
5 Finances Online, “119 Impressive Cybersecurity Statistics: 2020/2021 Data & Market
Analysis,” accessed June 21, 2021, https://financesonline.com/cybersecurity-statistics/.
6 RiskBased Security, “2020 Year End Report: Data Breach QuickView,” accessed June 21,
2021, https://pages.riskbasedsecurity.com/en/en/2020-yearend-data-breach-quickview-report.
7 Jack Evans, “Someone Tried to Poison Oldsmar’s Water Supply during Hack, Sheriff Says,”
Tampa Bay Times, February 9, 2021, accessed June 21, 2021,
https://www.tampabay.com/news/pinellas/2021/02/08/someone-tried-to-poison-oldsmars-
water-supply-during-hack-sheriff-says/.
8 Chris Young, “A 22-Year-Old Logged in and Compromised Kansas’s Water System
Remotely,” Interesting Engineering website, April 6, 2021, accessed June 21, 2021,
https://interestingengineering.com/a-22-year-old-logged-in-and-compromised-kansas-water-
system-remotely.
9 Ellen Nakashima, Yeganeh Torbati, and Will Englund, “Ransomware Attack Leads to
Shutdown of Major US Pipeline System,” Washington Post, May 8, 2021, accessed June 21,
2021, https://www.washingtonpost.com/business/2021/05/08/cyber-attack-colonial-pipeline/.
10 Jacob Bunge, “JBS Paid $11 Million to Resolve Ransomware Attack,” Wall Street Journal,
June 9, 2021, accessed June 21, 2021, https://www.wsj.com/articles/jbs-paid-11-million-to-
resolve-ransomware-attack-11623280781.
2
What Makes Industrial
Cybersecurity Different?
Introduction
Information technology (IT) cybersecurity is concerned with information
security, personal information, and financial transactions. Operational
technology (OT) cybersecurity is concerned with operational system
availability and safety.
One school of thought says there should be no distinction between IT and
OT cybersecurity. Its mantra: Technology is technology. The use of IT
products in industrial control systems has increased dramatically in the past
40 years. Systems now run on servers and workstations running Windows
operating systems and databases. These systems have many IT-oriented
application layer protocols in use. However, the use of that technology and
the consequences when it fails are distinct.
This chapter highlights key differences between OT and IT environments. It
looks at how cybersecurity practices must adapt to cope with these
differences.
People:
• IT – Primary focus is the service provision. The underlying technology is the majority of the service. Control and management of data. Many skilled professionals.
• OT – Primary focus is safety, then production. The underlying technology is a means to an end. Control and management of physical processes. Limited pool of skilled professionals.
To date, the focus on IT/OT differences has been on the technology element.
Many books and presentations have discussed similar lists of differences.
These differences continue to be important:
• The frequency of technology refresh is unlikely to change in OT
environments. The technology is there to support a high-availability
production system. The adage “if it ain’t broke, don’t fix it” is
common in OT environments. Taking systems out of service to
perform updates is not only costly, but it also introduces new risks:
New technology has less of a track record and includes additional
features that may create unexpected consequences. Consider
Boeing’s 737 MAX 8 aircraft, which entered service in 2017. After
two fatal crashes, the aircraft was grounded in 2019. One factor in
the crashes was the introduction of a new automated flight control
system called Maneuvering Characteristics Augmentation System
(MCAS), which was not explained in any manuals or in crew
training.11
• Although newer OT systems include some components that will
integrate with IT, network, or cybersecurity tools, they may never
include full support across all components. Some components, such
as safety controllers, must minimize their functionality to maximize
performance and reliability. In addition, segregation using protocol
firewalls will limit the ability to reach devices using IT tools. More
importantly, because of the long life cycle already mentioned, many
facilities will continue to run on legacy equipment that cannot
support such integration.
• The server and workstation elements of OT systems have converged
with their IT equivalents in terms of environmental standards and
reliability. This is possible because these items operate in climate-controlled
environments. However, many elements of OT systems will continue to
operate in harsh environments and will always need specialist hardware.
To date, there has been less focus on the people and process elements—
specifically, their impact on IT/OT differences. These elements can be
summed up in four distinct points:
1. The significance of technology
2. The significance of culture
3. Consequences
4. Mitigations
The Significance of Technology
Figure 2-2 shows an example of typical elements of IT and OT projects.
Figure 2-2. Proportions of IT and OT projects.
As already noted, the C-I-A triad greatly simplifies the relative concerns for
IT and OT systems. A more realistic list of potential consequences is as
follows:
• Privacy violation – Exfiltration of personally identifiable
information (PII), such as government identification numbers and
bank account numbers
• Operational impact – Loss of production capacity, inability to
process customer orders, and other effects
• Reputational damage – Typically related to another consequence,
such as privacy violation, environmental harm, injury, or loss of life
• Regulatory impact – Typically related to another consequence, such
as operational impact, environmental harm, injury, or loss of life
• Injury or loss of life – Harm to workers in the operational
environment or to members of the public, for instance, from fire or
explosion
• Environmental harm – Release of pollutants or other harmful
materials into a body of water or the atmosphere
The risk assessment of the facility or process is based on the assumption that
all layers of protection are in place and will operate on demand. This means
organizations must take seriously the threat of a cybersecurity incident on
these systems.
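The "operate on demand" assumption can be made concrete with a simple layer-of-protection calculation: the frequency of a hazardous outcome is the initiating-event frequency multiplied by the probability of failure on demand (PFD) of each independent protection layer. The following is a minimal sketch, not a method from this book; the demand rate and PFD values are hypothetical, chosen only to illustrate the arithmetic.

```python
def hazard_frequency(initiating_per_year, pfds):
    """Mitigated hazardous-event frequency, assuming the protection
    layers are independent and each fails on demand with its PFD."""
    freq = initiating_per_year
    for pfd in pfds:
        freq *= pfd
    return freq

# Hypothetical values: one initiating event per year, a basic automation
# layer (PFD 0.1), an operator intervention layer (PFD 0.1), and a safety
# instrumented function (PFD 0.01).
intact = hazard_frequency(1.0, [0.1, 0.1, 0.01])      # ~1e-4 per year

# If a cyberattack disables the safety layer, its PFD is effectively 1.0
# and the residual hazard frequency rises by two orders of magnitude.
compromised = hazard_frequency(1.0, [0.1, 0.1, 1.0])  # ~1e-2 per year
```

This is why a compromise of a safety layer changes the facility's risk picture even when nothing is immediately damaged: the risk assessment's credit for that layer no longer applies.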
Consider the example of a gas turbine control system. Gas turbines are used
extensively in industry for critical processes, such as power generation, gas
compression, and water injection. A gas turbine is shown in Figure 2-4. A
typical gas turbine may cost $6 million (£4.3 million), weigh 20,000 lb
(9000 kg), and operate at up to 10,000 psi (69,000 kPa).
Figure 2-4. Gas turbine used for power generation, gas compression, and water injection.
A control system is required to safely operate the turbine and shut it down in
the event of a serious situation. Figure 2-5 shows a simplified block diagram
of this system.
Figure 2-5. Simplified gas turbine control system showing potential cybersecurity risks.
A programmable logic controller (PLC) is the basis for the control functions
that provide the basic automation layer; a connected human-machine
interface (HMI) enables operators to observe the system status and make set-
point changes (the plant personnel intervention layer).
The safety functions that form the safety system layer may include a safety
controller that focuses on turbine protection, and a fire and gas controller
that interfaces with the safety controller to shut down the turbine if required.
These functions operate independently of the control functions and react
immediately, and automatically, to contain or mitigate a hazard.
To provide warranty and support for the end user, the gas turbine vendor
collects process data from the system. This data typically travels over a
secure connection between the vendor’s operations center and the facility.
This connection enables the vendor to analyze turbine performance and
determine maintenance actions.
The gas turbine control system is vulnerable to the same incidents discussed
earlier:
• The basic automation layer can be compromised, enabling the
system to be operated remotely. This could result in the turbine being
shut down, causing a power failure or loss of production at the
facility.
• The plant personnel intervention layer can be compromised so that
the turbine migrates outside its normal operating envelope (by
manipulating the basic automation layer). Meanwhile, it appears to
the users at the facility and the vendor to be operating normally. This
can cause damage and an unplanned outage, resulting in loss of
power or production. The plant will face unplanned repair costs and
the challenges of operating without the damaged turbine.
• The safety system layer can be compromised so that the automatic
shutdown is disabled. This can result in catastrophic damage to the
turbine (e.g., after running without lube oil present), power failure,
loss of production, fire, or explosion.
As awareness of OT systems grows, the threat to organizations that operate
them grows. Nation-states now see potential for major disruption to their
enemies through attacks on OT systems. After the Stuxnet attack on Iran’s
Natanz uranium enrichment facility was made public in 2010, it led to a
series of cyberattacks and counterattacks involving the United States, Iran,
Saudi Arabia, and Israel. Iranians were indicted in 2016 for attempting to
gain access to a US dam system in upstate New York.19 The Shamoon-
related attacks on Saudi Aramco and its vendors, such as Saipem, have been
attributed to Iran.20,21 Israel attacked an Iranian port in May 2020, causing
“massive backups on waterways and roads leading to the facility.”22
Mitigations
The response to the cybersecurity threat in IT and OT systems must be
distinct.
Consider the example of the gas turbine control system. External access is
limited to a secure connection with the vendor’s operations center. There is
no particular concern regarding sensitive data exfiltration; however, note the
following:
• The PLC, HMI, safety controller, and fire and gas controller may be
accessed by anyone in the facility. This flaw enables an unauthorized
individual to reprogram these systems or deploy malware to the
HMI. Although the specialist skills and knowledge to work on these
systems can be hard to find, there are several examples of hackers
with no industrial control systems experience identifying and
exploiting vulnerabilities in those systems. In one example, two
hackers with no prior product experience identified three previously
unknown vulnerabilities in a major automation vendor’s product and
presented them at RootedCON 2014.23,24
• OT facilities have a variety of physical controls, such as lockable
cabinets and rooms, and strict procedures for accessing and working
in these areas. Nevertheless, personnel may bypass some of these
controls, for example, by leaving cabinets unlocked.
• The turbine control system network may be isolated from the wider
network (aside from the vendor connection). This means monitoring and
updating of Windows equipment, which would otherwise be automated,
may need to be done manually.
• The secure connection provides some protection, but the
effectiveness of this control depends on the awareness, training,
policies, procedures, and physical security behaviors of the vendor’s
personnel.
Based on these considerations, the focus on mitigations for the gas turbine
control system is distinct from that for mitigations for an IT system in
several ways:
• Physical and electronic security – Limiting physical and electronic
access to the control system components to authorized individuals
only. This is accomplished by such actions as locking doors and
cabinets and protecting usernames and passwords.
• Strict enforcement of procedures – For instance, limiting, or
banning, the use of removable media and maintaining security
updates, antivirus software, and signatures (or using application
control) on Windows equipment.
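The application control option mentioned above can be illustrated with a hash allowlist: only executables whose digest appears on a pre-approved list are permitted to run. This is a minimal sketch of the idea, not any particular vendor's product, and the allowlist contents in practice would come from a controlled build process.

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 digest of a file, read in chunks so that
    large executables do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_approved(path, allowlist):
    """True only if the file's digest is on the pre-approved allowlist.
    Anything not explicitly approved is denied (default-deny posture)."""
    return sha256_of(path) in allowlist
```

A default-deny posture like this suits OT environments, where the set of legitimate software changes rarely, better than the signature-based blocklisting typical of IT antivirus.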
As noted earlier, the focus is more on the people and the processes than the
technology. The misuse or insecure use of technology in the OT environment
can create significant vulnerabilities. A safety culture with well-trained
people following strict processes and procedures is essential in the OT
environment.
The book Mission Critical Operations Primer26 provides more detail on the
primary function of regulatory and standards bodies. Although the book
focuses on US regulatory and standards bodies, similar organizations in
other countries perform the same function.
National Institute of Standards and Technology Cybersecurity Framework
US Presidential Executive Order 13636 (“Improving Critical Infrastructure
Cybersecurity”) instructed the National Institute of Standards and
Technology (NIST) to develop a voluntary cybersecurity framework (CSF)
that would provide a “prioritized, flexible, repeatable, performance-based,
and cost-effective approach for assisting organizations responsible for
critical infrastructure services to manage cybersecurity risk.”
The CSF is structured into five core functions, each of which includes
categories and subcategories. This format enables those unfamiliar with the
requirements of cybersecurity management to navigate the subject and drill
into detail as needed.
The CSF overview is illustrated in Figure 2-7. It shows the five core
functions, Identify, Protect, Detect, Respond, and Recover, with their
respective categories (e.g., Asset Management, Identity Management, and
Access Control).
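The function/category structure lends itself to a simple gap-assessment checklist. The sketch below uses deliberately abbreviated category lists (the full CSF defines more categories per function), and the gap-listing approach is this example's own, not something prescribed by NIST.

```python
# Abbreviated NIST CSF core: five functions, each with example categories.
CSF_CORE = {
    "Identify": ["Asset Management", "Risk Assessment"],
    "Protect":  ["Identity Management and Access Control", "Data Security"],
    "Detect":   ["Anomalies and Events", "Security Continuous Monitoring"],
    "Respond":  ["Response Planning", "Communications"],
    "Recover":  ["Recovery Planning", "Improvements"],
}

def gap_report(implemented):
    """List (function, category) pairs not yet covered by the organization."""
    return [(fn, cat)
            for fn, cats in CSF_CORE.items()
            for cat in cats
            if cat not in implemented]

# Example: an organization that has only tackled asset management so far.
gaps = gap_report({"Asset Management"})
```

Walking the hierarchy this way mirrors how the CSF is meant to be used: start at the function level, then drill into categories and subcategories as needed.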
As noted previously, as a framework, the CSF does not provide any detailed
guidance. Instead, the document refers to standards and guides. This format
helps readers who are unfamiliar with the standards and guides to navigate
the documents.28
For industrial cybersecurity, the CSF refers to the ISA/IEC 62443 Series of
Standards and the NIST 800 series guides for its specific guidance. The
ISA/IEC 62443 series focuses specifically on industrial cybersecurity;
within the NIST 800 series, Special Publication 800-82 addresses industrial
control systems.
ISA/IEC 62443
The ISA/IEC 62443 Series of Standards addresses the security of industrial
automation and control systems (IACSs) throughout their life cycle. These
standards and technical reports were initially developed for the industrial
process sector but have since been applied to the building automation,
medical device, and transportation sectors. Figure 2-8 provides an overview
of the family of standards.
There are four tiers in the series of standards. The first two focus on people
and processes. The last two focus on technology (systems and components).
At the time of writing, some documents are still in development. Key
documents in the family include the following:
• Part 2-1 – Establishing an IACS security program. This helps
organizations plan and implement a cybersecurity management
system focused on industrial cybersecurity.
• Part 3-2 – Security risk assessment, system partitioning, and security
levels. This describes the requirements for addressing the
cybersecurity risks in an IACS, including the use of zones and
conduits as well as security levels. These are key aspects of industrial
cybersecurity design.
• Part 3-3 – System security requirements and security levels. This
document describes the requirements for an IACS system based on a
specified security level. It helps organizations quantify their
requirements in universally understood terms.
• Part 4-1 – Product security development life-cycle requirements.
This describes the requirements for a product developer’s security
development life cycle.
• Part 4-2 – Technical security requirements for IACS components.
This addresses the requirements for IACS components based on the
required security level. Components include devices and
applications.
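The zone-and-conduit partitioning described in Part 3-2 can be represented as a small data model for design reviews. The following is a minimal sketch: the zone names, assets, and target security levels (SL-T) are hypothetical, loosely based on the gas turbine example earlier in this chapter, and the flagging rule is a simplification for illustration rather than a requirement of the standard.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    """A grouping of assets sharing common security requirements
    (the zone concept from ISA/IEC 62443-3-2)."""
    name: str
    target_sl: int                      # target security level (SL-T), 1-4
    assets: list = field(default_factory=list)

@dataclass
class Conduit:
    """A communication path between two zones; it also carries a target SL."""
    src: Zone
    dst: Zone
    target_sl: int

def weakest_links(conduits):
    """Flag conduits whose target SL is below either connected zone's SL-T,
    i.e., paths that could undermine a higher-security zone."""
    return [c for c in conduits
            if c.target_sl < max(c.src.target_sl, c.dst.target_sl)]

# Hypothetical partitioning of the gas turbine control system.
control = Zone("Basic automation", target_sl=2, assets=["PLC", "HMI"])
safety = Zone("Safety system", target_sl=3,
              assets=["Safety controller", "Fire and gas controller"])
vendor = Zone("Vendor remote access", target_sl=2)
links = [Conduit(control, safety, target_sl=2),
         Conduit(vendor, control, target_sl=2)]
flagged = weakest_links(links)
```

Here the control-to-safety conduit is flagged because its target SL is below the safety zone's SL-T, which is exactly the kind of design mismatch a Part 3-2 risk assessment is meant to surface.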
Regulation – Title 10 CFR – Energy: Nuclear Regulatory Commission (NRC) regulation for the US nuclear industry.

Regulation – Critical Infrastructure Protection (CIP): North American Electric Reliability Corporation (NERC) regulation for the North American electricity generation and distribution industries, comprising:
• CIP-002-5.1a – Cyber Security – BES Cyber System Categorization
• CIP-003-6 – Cyber Security – Security Management Controls
• CIP-004-6 – Cyber Security – Personnel & Training
• CIP-005-5 – Cyber Security – Electronic Security Perimeter(s)
• CIP-006-6 – Cyber Security – Physical Security of BES Cyber Systems
• CIP-007-6 – Cyber Security – System Security Management
• CIP-008-5 – Cyber Security – Incident Reporting and Response Planning
• CIP-009-6 – Cyber Security – Recovery Plans for BES Cyber Systems
• CIP-010-2 – Cyber Security – Configuration Change Management and Vulnerability Assessments
• CIP-011-2 – Cyber Security – Information Protection
• CIP-014-2 – Physical Security

Regulation – Title 21 CFR Part 11 – Electronic Records; Electronic Signatures – Scope and Application: US Food and Drug Administration (FDA) regulation on businesses producing food, tobacco products, medications, biopharmaceuticals, blood transfusions, medical devices, electromagnetic radiation emitting devices, cosmetics, and animal feed and veterinary products.

Regulation – 6 CFR Part 27 – Chemical Facility Anti-Terrorism Standards (CFATS): US Department of Homeland Security (DHS) regulation for chemical facilities in the United States.

Standard – IEC 61511:2016 – Functional Safety – Safety Instrumented Systems for the Process Industry Sector: International standard that defines practices in the engineering of systems that ensure the safety of an industrial process through the use of instrumentation. It includes an explicit requirement to conduct a security risk assessment (IEC 61511, Part 1, Clause 8.2.4).

Standard – ISO/IEC 27001:2013 – Information Technology – Security Techniques – Information Security Management Systems – Requirements: International standard for information security. Although specific to IT systems, there are some overlaps that may need to be considered when developing an industrial cybersecurity management system.

Guide – Center for Internet Security (CIS) Critical Security Controls: Simple guide to the top 20 security controls that should be implemented in IT and OT systems.

Framework – COBIT 5 – Control Objectives for Information and Related Technology: Developed by the Information Systems Audit and Control Association (ISACA) to define a set of generic processes for the management of IT.
Summary
The aim of this chapter was to differentiate OT and IT cybersecurity and
show why these differences are important to an effective cybersecurity
management system.
For some time, the difference between OT and IT was explained using the
C-I-A triad. C-I-A shows the priority for IT cybersecurity is confidentiality
(C), whereas the priority for OT cybersecurity is availability (A). This
explanation is too simplistic and requires elaboration to provide a more
complete picture.
Within the cybersecurity profession, there is growing appreciation for the
consequences of an OT cybersecurity incident. Such incidents may impact
the environment, safety, or production. Cybersecurity controls applied to IT
need some adaptation before they can be applied to OT—for instance,
software patching or network monitoring. However, factors such as the
differences in OT and IT projects, and the differences in culture between OT
and IT operations, do not receive enough attention. These differences can
have significant impacts on OT cybersecurity. There is no single answer to
managing OT cybersecurity. The process begins by understanding the
differences between IT and OT and then adapting technology, people, and
processes in line with those differences.
____________
11 Jon Hemmerdinger, “Boeing Asked FAA in 2017 to Strip MCAS from Max Training Report,”
FlightGlobal website, October 18, 2019, accessed June 21, 2021,
https://www.flightglobal.com/airframers/boeing-asked-faa-in-2017-to-strip-mcas-from-max-
training-report/134896.article.
12 S. Lucchini, “I Thought I Had the Right Roadmap for Implementing a Safety System!,” (white
paper presented at the Texas A&M Engineering Experiment Station, 20th Annual International
Symposium, Mary Kay O’Connor Process Safety Center, Texas A&M University, 2017).
13 World Nuclear Association, “Chernobyl Accident 1986,” accessed June 21, 2021,
https://www.world-nuclear.org/information-library/safety-and-security/safety-of-
plants/chernobyl-accident.aspx.
14 The other occurred at the Fukushima Daiichi Nuclear Power Plant in Ōkuma, Japan, and was
caused by an earthquake and subsequent tsunami.
15 The vendor, Fazio Mechanical Services, provided heating, ventilation, and air conditioning
(HVAC) services to Target. It was the subject of a phishing attack that resulted in the exfiltration
of credentials for Target’s billing system. The attackers used this system to gain access to the
rest of Target’s network. Because HVAC was indirectly involved, many mistakenly believe that
this attack was the result of ingress via a less-secure HVAC network. Brian Krebs, “Target
Hackers Broke in Via HVAC Company,” KrebsOnSecurity blog, February 5, 2014, accessed
June 21, 2021, https://krebsonsecurity.com/2014/02/target-hackers-broke-in-via-hvac-company/.
16 Kevin McCoy, “Target to Pay $18.5M for 2013 Data Breach that Affected 41 Million
Consumers,” USA Today, updated May 23, 2017, accessed June 21, 2021,
https://www.usatoday.com/story/money/2017/05/23/target-pay-185m-2013-data-breach-
affected-consumers/102063932/.
17 “2017 Equifax Data Breach,” Wikipedia, accessed June 21, 2021,
https://en.wikipedia.org/wiki/2017_Equifax_data_breach.
18 “Order Granting Final Approval of Settlement, Certifying Settlement Class, and Awarding
Attorney’s Fees, Expenses, and Service Awards,” Equifax Data Breach Settlement, accessed
June 21, 2021,
https://www.equifaxbreachsettlement.com/admin/services/connectedapps.cms.extensions/1.0.0.0
/927686a8-4491-4976-bc7b-
83cccaa34de0_1033_EFX_Final_Approval_Order_(1.13.2020).pdf.
19 Mark Thompson, “Iranian Cyber Attack on New York Dam Shows Future of War,” Time, March
24, 2016, accessed June 21, 2021, https://time.com/4270728/iran-cyber-attack-dam-fbi/.
20 “FireEye Responds to Wave of Destructive Cyber Attacks in Gulf Region,” FireEye blog,
December 1, 2016, accessed June 21, 2021, https://www.fireeye.com/blog/threat-
research/2016/11/fireeye_respondsto.html.
21 Thomas Brewster, “Warnings as Destructive ‘Shamoon’ Cyber Attacks Hit Middle East Energy
Industry,” Forbes, December 13, 2018, accessed June 21, 2021,
https://www.forbes.com/sites/thomasbrewster/2018/12/13/warnings-as-destructive-shamoon-
cyber-attacks-hit-middle-east-energy-industry/#53fe71893e0f.
22 Joby Warrick and Ellen Nakashima, “Officials: Israel Linked to a Disruptive Cyberattack on
Iranian Port Facility,” Washington Post, May 18, 2020, accessed June 21, 2021,
https://www.washingtonpost.com/national-security/officials-israel-linked-to-a-disruptive-
cyberattack-on-iranian-port-facility/2020/05/18/9d1da866-9942-11ea-89fd-
28fb313d1886_story.html.
23 Brian Prince, “Researchers Detail Critical Vulnerabilities in SCADA Product,” Security Week,
March 13, 2014, accessed June 21, 2021, https://www.securityweek.com/researchers-detail-
critical-vulnerabilities-scada-product.
24 Juan Vazquez and Julián Vilas, “A patadas con mi SCADA! [Rooted CON 2014],” YouTube,
accessed June 21, 2021, https://www.youtube.com/watch?
v=oEwxm8EwtYA&list=PLUOjNfYgonUsrFhtONP7a18451psKNv4I&index=23.
25 Dr. Saul McLeod, “Maslow’s Hierarchy of Needs,” updated December 29, 2020, accessed June
21, 2021, https://www.simplypsychology.org/maslow.html.
26 Steve Mustard, Mission Critical Operations Primer (Research Triangle Park, NC: ISA
[International Society of Automation], 2018).
27 “S.I. No. 360/2018 – European Union (Measures for a High Common Level of Security of
Network and Information Systems) Regulations 2018,” electronic Irish Statute Book, accessed
June 21, 2021, http://www.irishstatutebook.ie/eli/2018/si/360/made/en.
28 NIST, “Components of the Cybersecurity Framework,” presentation, July 2018,
https://www.nist.gov/cyberframework/online-learning/components-framework.
29 SP 800-82 Rev. 2, Guide to Industrial Control Systems (ICS) Security (Gaithersburg, MD: NIST
[National Institute of Standards and Technology], 2015), accessed June 21, 2021,
https://csrc.nist.gov/publications/detail/sp/800-82/rev-2/final.
3
Creating Effective Policy
Introduction
As mentioned in Chapter 2, “What Makes Industrial Cybersecurity
Different?,” governance is the foundation of a cybersecurity management
system. Without effective governance, policies and procedures may be
overlooked or go unenforced. Training may be ineffective if it lacks the
weight of the organization’s leadership, and investment in technical controls
may be poorly managed, leading to disappointing results. Despite these clear
shortcomings, many organizations implement elements of a cybersecurity
management system without good governance in place.
The committee charter defines all these elements. It should be no more than
two pages in length; if it is any longer, it will contain too many details to
manage and enforce. The charter should be issued by senior leadership and
clearly and concisely state:
In simple terms, IT is responsible for the systems and networks in the office
environment (known colloquially as “the carpeted area”). OT is responsible
for the systems and networks in the production or manufacturing
environment (“the noncarpeted area”). The physical division between the
two is referred to as the demilitarized zone (DMZ). The DMZ represents an
interface between the two environments, designed to secure communications
between them.
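The broker role of the DMZ can be sketched as a deny-by-default flow policy. The zone names and services below are hypothetical, chosen only to show the key property: there is no direct IT-to-OT (or OT-to-IT) path at all.

```python
# Hypothetical zone-to-zone flow policy sketching the DMZ's broker role:
# deny by default, with no direct IT-to-OT (or OT-to-IT) entry anywhere.
ALLOWED_FLOWS = {
    ("IT", "DMZ"): {"https"},           # e.g., business users read a DMZ historian replica
    ("OT", "DMZ"): {"historian-sync"},  # e.g., OT pushes process data up to the replica
    ("DMZ", "OT"): {"patch-staging"},   # e.g., vetted updates staged in the DMZ
}

def is_allowed(src_zone, dst_zone, service):
    """Deny by default: only explicitly listed (zone pair, service) flows pass."""
    return service in ALLOWED_FLOWS.get((src_zone, dst_zone), set())

print(is_allowed("IT", "DMZ", "https"))  # True
print(is_allowed("IT", "OT", "rdp"))     # False: traffic must transit the DMZ
```

Any flow not listed, including anything crossing directly between IT and OT, is rejected, which is exactly the interface behavior the DMZ is designed to enforce.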
In practical terms, the division of responsibilities is often complicated.
Organizations may have their IT department responsible for operating
system updates and antivirus management of all servers and workstations.
That would include those in the OT environment. Some organizations may
have the OT team take responsibility for these activities but have IT provide
oversight and report on nonconformances. Regardless, it is essential that
these responsibilities are documented and clearly understood by all parties.
The organization must have the flexibility to adjust and optimize the division
of responsibilities. This might mean changing team structures to support a
multidisciplinary approach. This adaptation may not fit within the
conventional organization structure, but it may be the best option from a
cybersecurity management perspective. One of the biggest challenges in
creating a multidisciplinary team is convincing each group that they need the
other. The IT specialists do not have the detailed system understanding that
OT specialists have. IT specialists do provide critical elements of a well-
defined cybersecurity management system, but this must be executed
correctly.
The often-used responsibility assignment matrix (RACI) illustrates a
responsibility hierarchy. The RACI acronym stands for responsible,
accountable, consulted, and informed. The common pitfalls of a RACI
chart/matrix are as follows:
Too often, responsibility assignment matrices end up filed away and never
used. The matrix may look good on paper but does not properly address the
organization’s policies and procedures. In most organizations, industrial
cybersecurity duties are added to existing roles. These additional duties can
easily be neglected in favor of well-established activities. It is also common
to underestimate the effort required to perform industrial cybersecurity
duties. At the very least, it is unreasonable to place additional duties on
employees without ensuring they have the capacity to undertake them.
Doing so without oversight creates additional cybersecurity vulnerabilities as
tasks are skipped. In the European Union, this failure to provide sufficient
resources is a legal issue. The Network and Information Systems (NIS)
directive places legal obligations on all providers of critical infrastructure
(operators of essential services—OES) to ensure they are prepared to deal
with the increasing volume of cyber threats.33
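A RACI matrix is more likely to stay in use when it can be checked mechanically. The sketch below, with hypothetical duties and roles, encodes a small matrix as data and verifies the two rules most often broken in practice: every duty should have exactly one accountable party and at least one responsible party.

```python
# Hypothetical duties and roles illustrating a RACI matrix as data:
# R = responsible, A = accountable, C = consulted, I = informed.
raci = {
    "Patch OT workstations": {"OT Engineer": "R", "OT Manager": "A",
                              "IT Security": "C", "Plant Manager": "I"},
    "Review firewall rules": {"IT Security": "R", "OT Manager": "A",
                              "OT Engineer": "C", "Plant Manager": "I"},
}

def validate_raci(matrix):
    """Flag duties violating the usual RACI rules: exactly one A, at least one R."""
    problems = []
    for duty, assignments in matrix.items():
        roles = list(assignments.values())
        if roles.count("A") != 1:
            problems.append(f"{duty}: needs exactly one accountable role")
        if "R" not in roles:
            problems.append(f"{duty}: needs at least one responsible role")
    return problems

print(validate_raci(raci))  # an empty list means the matrix is well formed
```

Re-running such a check whenever roles change is one lightweight way to keep the matrix from being "filed away and never used."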
Figure 3-5. The accident triangle with lagging and leading indicators.
The accident triangle clearly shows the relationship between the earlier,
more minor incidents and the later, more serious ones. This relationship can
also be expressed in terms of leading and lagging indicators. In the accident
triangle in Figure 3-5, unsafe acts or conditions and near misses are leading
indicators of health and safety issues. As these numbers increase, so does the
likelihood of more serious incidents. These more serious incidents are the
lagging indicators. To minimize serious health and safety incidents,
organizations monitor leading indicators such as unsafe acts or near misses.
Monitoring near misses through safety observations (and encouraging
employees to report them), coupled with regular audits to check that
employees are following their training and procedures, creates leading
indicators that can be adjusted. For example, employees can be prompted to
complete assigned safety training. Observations and audits will show if this
training must be adjusted. The goal is to improve safety as reflected in
lagging indicators.
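The relationship between the triangle's tiers can be tracked with very little tooling. A minimal sketch, using hypothetical monthly counts, computes the ratio of leading-indicator reports to actual incidents:

```python
# Hypothetical monthly counts for the tiers of the accident/security triangle.
monthly = [
    {"month": "Jan", "observations": 40, "near_misses": 8, "incidents": 1},
    {"month": "Feb", "observations": 55, "near_misses": 6, "incidents": 0},
    {"month": "Mar", "observations": 62, "near_misses": 4, "incidents": 0},
]

def leading_to_lagging(rows):
    """Ratio of leading-indicator reports (observations + near misses) to incidents."""
    leading = sum(r["observations"] + r["near_misses"] for r in rows)
    lagging = sum(r["incidents"] for r in rows)
    return leading / lagging if lagging else float("inf")

print(leading_to_lagging(monthly))  # 175.0 leading reports per incident
```

A healthy trend in such data shows observations rising (people are reporting) while near misses and incidents fall; falling observation counts alongside steady incidents is the early warning the triangle is meant to surface.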
Figure 3-6. A simple security triangle with lagging and leading indicators.
1 – Impact of regulatory change and scrutiny on operational resilience, products, and services: 6.38 / 6.24 (3)
10 – Adoption of digital technologies may require new skills or significant efforts to upskill/reskill existing employees (new in 2020): 5.71 / N/A (new)
*Scores are based on a 10-point scale, with “10” representing that the risk issue will have an extensive impact on the organization.
Monitoring Changes
A key role for the governance board is to monitor the changing
circumstances that impact the organization’s risk and, potentially, the
cybersecurity management system.
• Review and approval of operational support decisions relating to
industrial cybersecurity – Often decisions made at the operational
level have a major impact on an organization’s cybersecurity
preparedness (e.g., the decision whether to purchase spare parts, such
as workstations, PLC cards, and network devices).
• Review of changes to risk assessment – The cybersecurity risk
assessment is subject to constant change. This is in response to
external threats, new vulnerabilities, and organizational changes.
• Review of changes to the cybersecurity management system – In
addition to organizational changes resulting from operational support
decisions or risk assessments, it may be necessary to make changes
to the cybersecurity management system. For example, audits may
highlight gaps that must be closed, incident investigations may
identify improvements, or benchmarking may identify new best
practices.
These considerations will be reviewed in more detail later in this book. For
now, it is enough to highlight that decision-making at the governance board
level leads to greater consistency across the organization and helps manage
everyone’s expectations. These three change monitoring items should be
included in a standing agenda for the governance board.
A warning regarding record keeping: In the age of email, instant
messaging, and collaborative tools, it has become normal to bypass good
documentation practices and make decisions more informally. Good
documentation practices are essential to effective industrial cybersecurity
management. Documentation need not be onerous, but for any decisions, a
short document should be presented to the governance board. The board
should signify their approval on the final version. This document is then
retained centrally for ease of access. This approach allows decisions to be re-
reviewed as necessary and ensures proper procedures are not bypassed.
Communicate to the Organization
The work to establish good governance can be undermined if it is not
communicated throughout the organization. The governance board should
make available to the organization:
• The governance charter
• The supporting policies and procedures
• The responsibility assignment matrix
• Regular reports on lagging and leading indicators
• Prompt reports on cybersecurity incidents
• Compliance and benchmarking results
All information should be shared within the organization and with any
relevant external parties, such as the Department of Homeland Security in
the United States and the National Cyber Security Centre (NCSC) in the
United Kingdom. Sharing the information, good or bad, helps to make all
employees feel they are part of the results. Sharing with relevant external
parties helps build a clearer picture of the situation that all critical
infrastructure operators face. Reports should be issued regularly (in line with
health and safety reporting) to reinforce that the organization takes its
cybersecurity responsibilities seriously.
Concern: Sharing only positive reports or messages, or downplaying bad news to senior management.
Consequences: (1) Funding is cut because it appears to not be needed. (2) There is significant surprise and disappointment when an incident occurs.
Counterargument: Share information openly and honestly.

Concern: Overselling the value of technical controls to the organization.
Consequence: (1) There is a perception that everything is under control; therefore, personnel do not need to be vigilant.
Counterargument: Ensure everyone is aware that people and process are always the weakest links, regardless of technical controls.

Concern: Declaring the details of cybersecurity incidents to be classified, and thus limiting the sharing of these incidents within the organization.
Consequences: (1) Lessons are not learned by those who are most likely to be involved in similar incidents. (2) Lack of incident reporting creates a sense that cybersecurity is not a major problem.
Counterargument: Censor and redact details as necessary, but ensure that all incidents, including near misses, are promptly reported. Reporting improves awareness and encourages improvement, and prompt reporting inspires a sense of urgency in addressing issues.

Concern: IT departments acting independently of OT departments when engaging on cybersecurity.
Consequences: (1) Incomplete view of cybersecurity risk in the organization or with regulators and other authorities. (2) Lack of investment as a result of an incomplete view of risk.
Counterargument: The OT department defines and manages risk, and the IT department supports the OT department with services to help manage that risk.
Collecting the data to track these indicators can involve significant cost and
effort. In some cases, an organization may lack access to the data needed.
There is also an infrastructure requirement to achieve this data collection.
Once the data is collectible, it must be analyzed and presented in a suitable
format. Cybersecurity vendors offer solutions, including reporting
dashboards. Organizations can develop their own in-house reporting
solutions if they have such a capability. The key with cybersecurity metrics
and reporting, like any other type of reporting, is to:
• Focus on the measures that are meaningful to the organization. Just
because it is possible to measure does not necessarily mean the
measurement is useful.
• Focus on the relationship between leading and lagging indicators.
The security triangle is an essential visual.
Like the safety triangle, the security triangle, as shown in Figure 3-7, clearly
shows the relationship between the leading and lagging indicators. The focus
must be on managing the leading indicators. A more detailed security
dashboard will be required to generate the data in the security triangle, but
the security triangle provides a snapshot of cybersecurity performance.
Figure 3-7. Security triangle showing real data.
Assuming the organization is aware that an incident has occurred, the most
common reasons the author has heard for withholding reporting are listed in
Table 3-3 with counterarguments.
Table 3-3. Common concerns over cybersecurity reporting.
The arguments in Table 3-3 could apply to health and safety incidents.
However, the advent of “safety culture” has demonstrated that the benefits of
open reporting are greater than any potential downside.
Failing to report cybersecurity incidents leads to the misconception that
cybersecurity is not a major problem. If nobody in the organization hears
about real incidents, they may let their guard down, dismissing the warnings
and guidance.
Level 1 – No Security Awareness Program: No attempt is made to train and educate the organization. People do not know or understand organizational policies and procedures, do not realize they are a target, and are highly vulnerable to most human-based attacks.

Level 2 – Compliance Focused: Awareness program is designed primarily to meet specific compliance or audit requirements. Training is limited to an annual or ad hoc basis, such as an on-site presentation once a year or quarterly newsletters. There is no attempt to change behavior. Employees are unsure of organizational policies, their role in protecting their organization’s information assets, and how to prevent, identify, or report a security incident.

Level 3 – Promoting Awareness and Change: Awareness program identifies the training topics that have the greatest impact on supporting the organization’s mission and focuses on those key topics. Program goes beyond annual training and includes continual reinforcement throughout the year. Content is communicated in an engaging and positive manner that encourages behavior change at work, at home, and while traveling. As a result, employees are aware of policies/processes and actively prevent, recognize, and report incidents.

Level 4 – Long-Term Sustainment: Long-term sustainment builds on an existing program that is promoting awareness and change, adding the processes and resources for a long-term life cycle, including at a minimum an annual review, and updating both training content and communication methods. As a result, the program becomes an established part of the organization’s culture and is always current and engaging.

Level 5 – Metrics: Defined as a security awareness program that has metrics in place to track progress and measure impact. As a result, the program is continuously improving and able to demonstrate a return on investment.
Summary
The most common failure in industrial cybersecurity governance arises from
a failure to properly execute one or more of the following tasks:
• Establish the governance infrastructure
• Assign senior management representation
• Allocate resources and assign clear ownership
• Establish good oversight
• Communicate to the organization
Without supporting infrastructure or senior management engagement, an
industrial cybersecurity program will eventually dissipate. The organization
will slip back into business as usual, reacting again only after periodic audits
flag noncompliance.
With clear ownership and good oversight, it is possible to maintain a focus
on cybersecurity, along with other business-critical areas such as safety. In
fact, the cybersecurity community can learn a lot from their safety
management colleagues. Safety culture is embedded in organizations. As a
result, it is at the forefront of everyone’s mind. Another aspect of good
safety management is communication. The more organization leaders and
staff hear about cybersecurity, especially near misses and other performance
metrics, the more likely they are to internalize the issue and take it seriously.
____________
30 ISO/IEC 27001:2013, Information Technology – Security techniques — Information security
management systems — Requirements (Geneva 20 – Switzerland: IEC [International
Electrotechnical Commission] and ISO [International Organization for Standardization]).
31 Luis A. Aguilar, “Boards of Directors, Corporate Governance, and Cyber-Risks, Sharpening the
Focus,” speech June 10, 2014, at the “Cyber Risks and the Boardroom” Conference in New
York, NY, https://www.sec.gov/news/speech/2014-spch061014laa.
32 Such as the 2019 survey by North Carolina State University’s enterprise risk management
(ERM) initiative and global consulting firm Protiviti shown in Table 3-1.
33 Information Commissioner’s Office, “The Guide to NIS,” accessed June 21, 2021,
https://ico.org.uk/for-organisations/the-guide-to-nis.
34 This term, abbreviated as ALARP, has its origins in health and safety legislation (originally
from the UK Health and Safety at Work Act of 1974) and means that a risk must be mitigated
unless the cost of doing so would reasonably be seen as excessive. For example, the risk of
using universal serial bus (USB) ports on a personal computer can be mitigated easily and
cheaply by the use of firmware disabling or USB locks, but with residual risk (that the ports
could be reenabled or unlocked). Building a custom workstation with the USB ports physically
removed would result in no residual risk but would be disproportionately expensive. In most
applications, the use of firmware disabling USB ports or USB locks would be seen as reducing
the risk to as low as reasonably practicable.
35 Amy Krigman, “Cyber Autopsy Series: Ukrainian Power Grid Attack Makes History,”
GlobalSign Blog, October 22, 2020, accessed June 21, 2021,
https://www.globalsign.com/en/blog/cyber-autopsy-series-ukranian-power-grid-attack-makes-
history.
36 North Carolina State University and Protiviti, “Illuminating the Top Global Risks in 2020,”
accessed June 21, 2021, https://www.protiviti.com/US-en/2020-top-risks.
37 Occupational Health and Safety Hub, Quick Safety Observation Card – Free Template,
https://ohshub.com/quick-safety-observation-card-free-template/.
38 A zero-day (also known as 0-day) vulnerability is one that is unknown to or unaddressed by the
product vendor or the user. Until the vulnerability is mitigated, it can be exploited to adversely
affect operation.
39 Lance Spitzner, “Security Awareness Maturity Model,” January 1, 2019, accessed June 21,
2021, https://www.sans.org/security-awareness-training/blog/security-awareness-maturity-
model/.
4
Measure to Manage Risk
Introduction
There are many books on the subject of risk management. This chapter is not
intended to cover the basic principles described in those books. Instead, the aim
is to explain how industrial cybersecurity risks are different, and how those
risks can be quantified and managed. This chapter will look beyond how
industrial cybersecurity risks are currently managed to propose more effective
approaches.
One common theme in cybersecurity is a reluctance to use probability and
statistics when estimating the likelihood of an incident. This chapter will
provide guidance on how probability and statistics can be applied to deliver
more useful results.
The recommended reading list provides several resources that describe the
basics of risk management in more detail. These resources offer more insight
into how probability and statistics can be used in cybersecurity.
The Cullen Report41 made more than 100 recommendations to improve safety.
One of the most significant recommendations was that responsibility for
identification of major accident risks should be transferred from legislator and
safety inspectorate to the operating company. Since then, safety has been
established as the number one priority for companies operating in high-hazard
environments. The management of safety risk is now a key activity embedded
into every process.
Defining Safety Risk
In safety, the definition of risk is: “a measure of human injury, environmental
damage, or economic loss in terms of both the incident likelihood and the
magnitude of the loss or injury.”42
In a high-hazard environment, typical consequences of an incident include the
following:
These incidents illustrate the fact that too little effort is made to address intolerable risks whose mitigation, ironically, involves minimal cost. While it is unclear if the current investment is appropriate, it is clearly not correctly targeted.
There have been attempts to align industrial cybersecurity risk assessment with
these safety methods, with cyber PHA being the most common. CHAZOP may
also incorporate cybersecurity risks. However, Marszal and McGlone note that
cyber PHA and CHAZOP are more akin to failure modes and effects analysis
(FMEA) than HAZOP. Whereas a HAZOP focuses on the hazards in the
process, cyber PHA and CHAZOP focus on control system and network
equipment failure. This is not ideal because of the following:
• The frequency of cyberattack is not random like other equipment
failures modeled in the FMEA. Although it is possible to apply a
frequency for the purposes of analysis, there is no statistical basis for it,
unlike the random hardware and human failures in non-cybersecurity
assessments. The output of such an analysis is therefore misleading.
This could be addressed by rigorous collection of data, but that takes
time, and this issue cannot wait.
• With the focus on control system and network equipment failure, the
identification of safeguards is limited to the control system and the
network, whereas the overall process analysis will identify other
safeguards (such as mechanical protection).
Figures 4-5 and 4-6 are bowtie diagrams related to safety and cybersecurity
risk, respectively. Bowtie diagrams are used in safety management to help
visualize the relationship between causes, hazards, and consequences.53
Figure 4-5 is a bowtie diagram showing a single initiating cause (flow valve
fails open) resulting in a hazard (overfill/overpressure of free water knockout
vessel) that causes an event (loss of primary containment) that can lead to a
consequence (fire/explosion). On the left side of the diagram are preventive
actions, those that are in place to stop the event (e.g., opening of a pressure
valve); on the right side of the diagram are mitigating actions, those that are in
place to reduce the impact of the event (e.g., emergency shutdown).
Figure 4-6 shows a typical cybersecurity bowtie. In this case, the initiating
cause (malware deployed on system) leads to a generic hazard (cybersecurity
threat), event (cybersecurity incident), and ultimately a consequence (loss of
control), which is system, rather than process, focused. As a result, the
preventive and mitigating actions are focused on the system rather than the
process.
Figure 4-7 shows a simplified overview of the SPR process. The beauty of the
process is that the SPR is either part of an overall PHA study or uses the output
of a PHA study. This elevates cybersecurity in the overall process, where it can
be properly addressed. The company’s safety organization must understand that
cybersecurity risks can contribute to process hazards, and they should not be
treated as unrelated issues to be managed by others in the company. This is
easier said than done: Plant-based OT personnel may not have the time or
resources to address these issues. IT personnel may not have the domain
knowledge to properly appreciate the issues.
Figure 4-7. Overview of the SPR process.
Table 4-1 shows a simplified list of causes, consequences, and safeguards from
a gas turbine PHA. In this case, the loss of turbine wash-water feed-pump
suction and loss of lube oil cooling would be hackable because the safeguards
relate to computer-based elements, and the overpressure of a high-pressure
(HP) separator would not, because its safeguard is mechanical.
Table 4-1. Simplified causes, consequences, and safeguards from a gas turbine
PHA.
Cause Consequence Safeguard
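This screening rule lends itself to a simple machine-checkable form: a consequence is "hackable" only when every safeguard protecting against it is a computer-based element. The sketch below uses hypothetical rows in the spirit of Table 4-1, not the book's actual PHA data.

```python
# Hypothetical PHA rows in the spirit of Table 4-1. Each safeguard is tagged
# True when it is a computer-based element, False when it is mechanical.
pha_rows = [
    {"cause": "Loss of wash-water feed-pump suction",
     "safeguards": [("low-flow trip", True)]},
    {"cause": "Loss of lube oil cooling",
     "safeguards": [("high-temperature alarm", True)]},
    {"cause": "Overpressure of HP separator",
     "safeguards": [("mechanical relief valve", False)]},
]

def hackable(row):
    # A single non-computer-based safeguard means the hazard cannot be
    # realized by cyber means alone.
    return all(computer_based for _, computer_based in row["safeguards"])

for row in pha_rows:
    print(f'{row["cause"]}: {"hackable" if hackable(row) else "not hackable"}')
```

The value of running this over a whole PHA worksheet is that it directs cybersecurity attention only at the consequences that truly lack a non-hackable safeguard.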
Consider the matrix shown in Figure 4-2. The severity rating is classified in
different consequence types (health and safety, environmental, financial,
reputation and public disruption, and regulatory) and categorized with terms
from Low up to Severe. The likelihood is categorized in terms of the frequency
of occurrence, from Improbable (less than 10 years) to Frequent (once a
month). The common criticisms of this method from the cybersecurity
community are as follows:
There is also a genuine concern about such matrices as they can produce
confusing or contradictory results if poorly defined. For instance, in the risk
matrix in Figure 4-2, a financial consequence of less than $10,000 is
considered low, whereas a loss of more than $1,000,000 is seen as severe.
However, if the low consequence event occurred frequently (once a month) it
could cost the organization more than the severe-rated consequence.
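The contradiction is easy to demonstrate with back-of-the-envelope arithmetic; the figures below simply annualize the two matrix cells contrasted above.

```python
# Annualizing the two matrix cells contrasted in the text:
# a "Low" loss occurring frequently versus a "Severe" loss occurring rarely.
def annual_expected_loss(loss_per_event, events_per_year):
    return loss_per_event * events_per_year

low_but_frequent = annual_expected_loss(10_000, 12)        # < $10,000, once a month
severe_but_rare = annual_expected_loss(1_000_000, 1 / 10)  # > $1,000,000, once per 10 years

print(low_but_frequent, severe_but_rare)  # 120000 100000.0
```

The "Low" cell costs the organization $120,000 a year, more than the "Severe" cell's $100,000, which is why the matrix categories alone can mislead.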
Douglas Hubbard and Richard Seiersen offer methods to provide qualified
estimates in their book How to Measure Anything in Cybersecurity Risk.
Contrary to the concerns about estimating risk, they point out: “There are
problems in statistics that can only be solved by using a probabilistically
expressed prior state of uncertainty.”55
• Major projects use Monte Carlo simulation to analyze cost and schedule
and to produce risk-based confidence estimates, such as P10, P50, or
P90, where P stands for percentile. Many organizations, such as the UK
Ministry of Defense, require P10, P50, and P90 confidence forecasts to
be provided.56
• The US Securities and Exchange Commission (SEC) defines oil and
gas reserves in terms of P10, P50, and P90 ranges.57
Using the same ranges to quantify cybersecurity risk should provide a familiar basis for probability.58
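As a sketch of how P10/P50/P90 figures are produced, the following Monte Carlo simulation draws many simulated years and reads the confidence levels off the sorted results. The incident-frequency and loss parameters are hypothetical, chosen purely for illustration.

```python
import random

random.seed(1)  # reproducible sketch

# Monte Carlo sketch with hypothetical parameters: 0-3 incidents per year,
# each with a lognormally distributed loss (median around $60,000).
def simulate_annual_loss():
    incidents = random.randint(0, 3)
    return sum(random.lognormvariate(11, 1.2) for _ in range(incidents))

losses = sorted(simulate_annual_loss() for _ in range(10_000))

def percentile(sample, p):
    """Value below which roughly p percent of the sorted sample falls."""
    return sample[int(len(sample) * p / 100)]

p10, p50, p90 = (percentile(losses, p) for p in (10, 50, 90))
print(f"P10 ${p10:,.0f}, P50 ${p50:,.0f}, P90 ${p90:,.0f}")
```

The same three percentiles an estimator quotes for project cost or oil reserves then fall directly out of the simulated loss distribution.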
Hubbard and Seiersen’s method is well suited to estimating the likely financial
impact of a cybersecurity incident. The question is whether it can work with
other consequences, such as death or injury, harm to the environment,
equipment damage, loss of production, regulatory violations, and brand
damage. Because these consequences all have a financial impact, one option is
to estimate that impact using the loss exceedance method. The results of this
method can also be used to calibrate the results from the security PHA method.
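The loss exceedance idea can likewise be sketched: generate a large sample of simulated annual losses and report, for each threshold of interest, the fraction of years whose loss exceeds it. All parameters here are hypothetical.

```python
import random

random.seed(7)

# Loss exceedance sketch (hypothetical loss model): the fraction of simulated
# years whose total loss exceeds each threshold of interest.
losses = [sum(random.lognormvariate(11, 1.2) for _ in range(random.randint(0, 3)))
          for _ in range(10_000)]

def exceedance(sample, threshold):
    return sum(1 for loss in sample if loss > threshold) / len(sample)

for threshold in (50_000, 250_000, 1_000_000):
    print(f"P(annual loss > ${threshold:,}) = {exceedance(losses, threshold):.1%}")
```

Plotting exceedance probability against threshold gives the loss exceedance curve, which can be compared directly against the organization's risk tolerance.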
Bayes’s Theorem60
Another statistical concept recommended by Hubbard and Seiersen is Bayes’s
theorem. Bayes’s theorem is well suited to dealing with situations where data is
limited. The challenge with statistics is that the accuracy of any estimate is
based on the size of the sample. Conversely, it is impractical (or impossible) to
collect a sample size sufficiently large to be accurate. Frequentist statistics,
which involves the collection of sample data and estimating mean and standard
deviation, works well when the data is normally distributed but is less reliable
otherwise. Figure 4-9 shows a simplified example. The expectation based on
frequentist statistics is the normal bell curve. Using the mean and standard
deviation of this curve would indicate a remote probability of an event
occurring. This could provide false reassurance.
Paul Gruhn, a renowned functional safety expert and past president of the
International Society of Automation (ISA), has adopted Bayes’s theorem in his
work for similar reasons: “Frequentist statistics cannot be used to
justify very rare events,” for instance, the probability that a plant will have a
catastrophic process safety accident in the next year.61
Bayes’s theorem states:

P(A|B) = P(B|A) × P(A) / P(B)

where P(A) and P(B) are the probabilities that two events, A and B, occur
independently. P(A|B) means the probability that event A (the event we are
interested in, which is hard to estimate) occurs given that event B has occurred
(an event that can be observed). P(B|A) is the probability that event B occurs
given that event A has occurred. What makes Bayes’s theorem powerful is that
P(A) can start with any prior estimate; with new evidence (P(B)), the estimate
will improve and can be used in the next iteration of the calculation when
further evidence is available. To see how Bayes’s theorem can help,
consider the following example from Hubbard and Seiersen:63
consider the following example from Hubbard and Seiersen:63
A prior estimate can be based on data (e.g., how many people followed
another procedure), or an educated guess (e.g., a specialist may use their
experience to estimate the number). To be extremely conservative, the
estimate could assume all probabilities are equal. This is shown in Figure
4-10. All possible probabilities are shown from 0 to 1 (or 0% to 100% in
percentage terms). The uniform distribution shows all probabilities are
equally valid.
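A minimal Bayesian update over a grid of candidate probabilities, starting from a uniform prior like that of Figure 4-10, can be sketched as follows. The observed evidence (1 incident in 20 periods) is an illustrative assumption:

```python
from math import comb

# Grid of candidate values for P(A), with a uniform (maximally conservative) prior.
candidates = [i / 100 for i in range(1, 100)]
prior = {p: 1 / len(candidates) for p in candidates}

def update(prior, successes, trials):
    """Apply Bayes's theorem: posterior = likelihood * prior / evidence."""
    likelihood = {p: comb(trials, successes) * p**successes
                     * (1 - p)**(trials - successes) for p in prior}
    evidence = sum(likelihood[p] * prior[p] for p in prior)  # P(B)
    return {p: likelihood[p] * prior[p] / evidence for p in prior}

posterior = update(prior, successes=1, trials=20)
best_estimate = max(posterior, key=posterior.get)  # most probable value of P(A)
```

Each time new evidence arrives, the posterior can be fed back in as the next prior, which is exactly the iterative refinement described above.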
Each FR is first defined in terms of what is needed for each SL. For example,
for FR 2, Use Control the SLs are:
Base Requirement: On all interfaces, the control system shall provide the capability to enforce
authorizations assigned to all human users for controlling use of the control system to
support segregation of duties and least privilege.

Requirement Enhancements (RE):
RE (1): Authorization enforcement for all users on all interfaces – The control system
shall provide the capability to enforce authorizations assigned to all users (humans,
software processes, and devices) for controlling use of the control system to support
segregation of duties and least privilege.
RE (2): Permission mapping to roles – The control system shall provide the capability
for an authorized user or role to define and modify the mapping of permissions to
roles for all human users.
RE (3): Supervisor override – The control system shall support supervisor manual
override of the current human user authorizations for a configurable time or event
sequence.
RE (4): Dual approval – The control system shall support dual approval where an
action can result in serious impact on the industrial process.
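The relationship between SRs, REs, and SLs can be sketched as a simple lookup. The SL assignments below are assumptions for illustration only; ANSI/ISA-62443-3-3 is the normative source for the actual mapping:

```python
# Minimum SL at which each part of SR 2.1 (FR 2, Use Control) applies.
# These level assignments are illustrative assumptions, not the standard's.
SR_2_1 = {
    "Base requirement": 1,
    "RE (1)": 2,
    "RE (2)": 3,
    "RE (3)": 3,
    "RE (4)": 4,
}

def required_parts(target_sl: int) -> list:
    """All parts of SR 2.1 needed to claim the target SL."""
    return [name for name, sl in SR_2_1.items() if sl <= target_sl]
```

For example, a system claiming SL 2 would need the base requirement plus RE (1), while SL 4 would require every enhancement.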
Once the system under consideration (SUC) is identified (i.e., what is included
and excluded from the scope), an initial cybersecurity risk assessment is
performed. The SUC is then divided into separate zones (e.g., by vendor or by
functional area). Next, the connections between these zones (the conduits) are
identified. Cybersecurity safeguards are then identified and documented. This
is accomplished using the guidance from ANSI/ISA-62443-3-3 described
earlier.
Table 4-3 shows the typical sharing of responsibilities for the defense-in-depth
measures in the bowtie diagram in Figure 4-8.
[Table 4-3 rows include the cybersecurity incident response plan, mechanical
fail-safes, and manual procedures, with responsibility marked against the
applicable principal roles.]
Although the asset owner has responsibility for all defense-in-depth measures,
the asset owner depends on other principal roles to perform the tasks. For
example:
• The asset owner can limit physical and electronic access to the system,
but the maintenance service provider must do the same. In 2000, a
disgruntled former contractor working for an integration service
provider was able to gain unauthorized access to a wastewater control
system and release raw sewage into the environment more than 40
times over several months. The integration service provider did not do
enough to prevent the contractor’s access. It should have, as a
minimum, removed the user from the system and changed all shared
account passwords when the contractor left the project.
• Removable media access, antivirus protection, operating system
updates, and backup and recovery are key cybersecurity defense-in-
depth measures, but they require all principal roles to contribute. The
product supplier must support these features. This means, for example,
not relying on an obsolete operating system that cannot be updated. The
integration service provider must design in these requirements from the
outset and test them before handover. The maintenance service provider
will be required to operate these measures. This could involve taking
and testing backups, as well as applying antivirus and operating system
updates. The asset owner must support the measures with rules such as
forbidding removable media access.
• The asset owner is entirely responsible for the mechanical fail-safes.
These provide one of the last lines of protection in the event of an
incident. The product supplier should not be responsible for the design
or maintenance of these fail-safes. The integration service provider may
design in these fail-safes (depending on its role in the project) but
cannot be responsible for their maintenance and upkeep. The
maintenance service provider may have some responsibility depending
on its arrangement with the asset owner.
Simplified Assessment and Definition of Safeguards
At the time of writing, the asset-owner approach to industrial cybersecurity risk
assessment is typically less rigorous than the SPR methodology and ISA/IEC
62443 SR compliance would provide. A typical asset owner manages industrial
cybersecurity by identifying key safeguards that must be in place in any
system. These are usually as follows:
The lifesaving rules set out simple and clear dos and don’ts. The rules have
been put in place to ensure a consistent safety posture for all workers.
The cybersecurity safeguards achieve a similar result for the organization’s
cyber resources without the need to perform in-depth analysis of every system
and process. If an organization can comply with all cybersecurity safeguards on
all systems, the likelihood of a cybersecurity incident will be greatly reduced.
The following are disadvantages of this one-size-fits-all approach:
• Systems, and the processes they control and monitor, have different
levels of risk and consequence. This approach does not prioritize based
on these factors.
• Systems contain different components, and as a result, it may not be
possible to apply all safeguards equally across all systems.
Figure 4-13 shows that eliminating the hazard is the most effective method,
while personal protective equipment (PPE) is the least effective. If the
minimum cybersecurity safeguards described earlier are mapped to the
hierarchy of controls, the effectiveness of the safeguards can be seen more
easily. This is shown in Table 4-4. For effectiveness, a score of 1 to 5 is used,
with 1 being the least effective and 5 being the most effective.
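The mapping of safeguards onto the hierarchy of controls can be expressed as a simple scoring table. The safeguard-to-level pairings below are illustrative assumptions, not a reproduction of Table 4-4:

```python
# Effectiveness score per hierarchy-of-controls level, 5 = most effective.
HIERARCHY_SCORES = {
    "Elimination": 5,
    "Substitution": 4,
    "Engineering controls": 3,
    "Administrative controls": 2,
    "PPE": 1,
}

# Hypothetical assignment of safeguards to hierarchy levels.
safeguards = {
    "Remove unneeded network connections": "Elimination",
    "Firewall between zones": "Engineering controls",
    "Removable media policy": "Administrative controls",
}

scores = {name: HIERARCHY_SCORES[level] for name, level in safeguards.items()}
```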
Table 4-5 summarizes the key stages in the process and highlights what can be
done to reduce the risk at each stage to reduce overall risk for the asset owner.
Table 4-5. Methods to address the industrial cybersecurity risk chain.
Stage: Products
Issues: No common method to identify cybersecurity assurance
Methods to address: Product supplier must have its products certified to ANSI/ISA-62443-4-2

Stage: Systems
Issues: No common method to identify cybersecurity assurance; full assessment of each system required for each project
Methods to address: Product supplier must have its systems certified to ANSI/ISA-62443-3-3
Summary
This chapter provided details on why industrial cybersecurity risk is different
from its IT counterpart. Although an increasing number of organizations
understand these differences, many still use the same techniques to estimate
industrial cybersecurity risk.
____________
41 Lord William Cullen, The Public Inquiry into the Piper Alpha Disaster (London: Her Majesty’s
Stationery Office, 1990), http://www.hse.gov.uk/offshore/piper-alpha-disaster-public-inquiry.htm.
42 Definition from the American Institute of Chemical Engineers (AIChE) and Center for Chemical
Process Safety (CCPS).
43 A denial-of-service (DoS) attack occurs when legitimate users are unable to access information
systems, devices, or other network resources due to the actions of a malicious cyber threat actor. US
Cybersecurity and Infrastructure Security Agency (CISA), “Security Tip (ST04-015):
Understanding Denial-of-Service Attacks,” revised November 20, 2019, accessed June 21, 2021,
https://us-cert.cisa.gov/ncas/tips/ST04-015.
44 Ralph Langner, To Kill a Centrifuge: A Technical Analysis of What Stuxnet’s Creators Tried to
Achieve, accessed June 21, 2021 (Arlington, VA: The Langner Group, November 2013),
https://www.langner.com/wp-content/uploads/2017/03/to-kill-a-centrifuge.pdf.
45 Nicholas Falliere, Liam O Murchu, and Eric Chen, W32.Stuxnet Dossier Version 1.3 (November
2010), accessed June 21, 2021,
https://www.wired.com/images_blogs/threatlevel/2010/11/w32_stuxnet_dossier.pdf.
46 Amy Krigman, “Cyber Autopsy Series: Ukrainian Power Grid Attack Makes History,” GlobalSign
Blog, October 22, 2020, accessed June 21, 2021, https://www.globalsign.com/en/blog/cyber-
autopsy-series-ukranian-power-grid-attack-makes-history.
47 Dragos, “TRISIS Malware: Analysis of Safety System Targeted Malware,” version 1.20171213,
accessed June 21, 2021, https://www.dragos.com/wp-content/uploads/TRISIS-01.pdf.
48 “ALARP at a glance,” Health and Safety Executive, accessed November 6, 2021,
https://www.hse.gov.uk/managing/theory/alarpglance.htm.
49 IEC 61508-1:2010, Functional Safety of Electrical/Electronic/Programmable Electronic Safety-
Related Systems – Part 1: General Requirements (IEC [International Electrotechnical
Commission]).
50 “Cybersecurity spending trends for 2022: Investing in the future,” CSO, accessed February 14,
2022, https://www.csoonline.com/article/3645091/cybersecurity-spending-trends-for-2022-
investing-in-the-future.html.
51 Edward Marszal and Jim McGlone, Security PHA Review for Consequence-Based Cybersecurity
(Research Triangle Park, NC: ISA [International Society of Automation], 2019).
52 Marszal and McGlone, Security PHA Review for Consequence-Based Cybersecurity, 14.
53 It is not the aim of this book to describe the bowtie diagram in detail. The “Further Reading”
section provides references for more details on this subject.
54 Marszal and McGlone, Security PHA Review for Consequence-Based Cybersecurity, 9.
55 Douglas W. Hubbard and Richard Seiersen, How to Measure Anything in Cybersecurity Risk
(Hoboken, NJ: John Wiley & Sons, 2016), 38.
56 Martin Hopkinson, “Monte Carlo Schedule Risk Analysis—A Process for Developing Rational and
Realistic Risk Models” (white paper, Risk Management Capability, 2011), accessed June 21, 2021,
http://www.rmcapability.com/resources/Schedule+Risk+Analysis+v1.pdf.
57 “Summary Report of Audits Performed by Netherland, Sewell & Associates,” accessed June 21,
2021, https://www.sec.gov/Archives/edgar/data/101778/000119312510042898/dex992.htm.
58 Hubbard and Seiersen, How to Measure Anything in Cybersecurity Risk, 52.
59 European Union Agency for Cybersecurity (ENISA), “ENISA’s Position on the NIS Directive,”
January 2016, accessed June 21, 2021, https://www.enisa.europa.eu/publications/enisa-position-
papers-and-opinions/enisas-position-on-the-nis-directive.
60 This is often written as Bayes’ Theorem. This book uses the Britannica version of the name.
61 Paul Gruhn, “Bayesian Analysis Improves Functional Safety,” InTech, March 31, 2020, accessed
June 21, 2021, https://www.isa.org/intech-home/2020/march-april/features/bayesian-analysis-
improves-functional-safety.
62 Hubbard and Seiersen, How to Measure Anything in Cybersecurity Risk, 161–165.
63 Hubbard and Seiersen, 171–174.
64 ANSI/ISA-62443-3-3 (99.03.03)-2013, Security for Industrial Automation and Control Systems –
Part 3-3: System Security Requirements and Security Levels (Research Triangle Park, NC: ISA
[International Society of Automation]).
65 ANSI/ISA-62443-3-2, Security for Industrial Automation and Control Systems – Part 3-2: Security
Risk Assessment for System Design (Research Triangle Park, NC: ISA [International Society of
Automation]).
66 These principal roles are defined in ANSI/ISA-62443-1-1.
67 Intrinsic safety is a design technique applied to electrical equipment for hazardous locations that is
based on limiting energy, electrical and thermal, to a level below that required to ignite a specific
hazardous atmospheric mixture.
68 The Mitre Corporation, “ATT&CK for Industrial Control Systems,” accessed June 21, 2021,
https://collaborate.mitre.org/attackics/index.php/Main_Page.
69 Lockheed Martin Corporation, “The Cyber Kill Chain,” accessed June 21, 2021,
https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-kill-chain.html.
5
Standardized Design and Vendor Certification
Introduction
In response to cybersecurity threats to their products, major automation
vendors have begun developing their own secure architectures. In some
cases, they incorporate customized security tools. Several vendors have gone
a step further and obtained third-party certification of these solutions.
Despite this, asset owners collectively spend millions of dollars designing
and reviewing solutions from vendors, and these efforts are routinely
duplicated, even within the same asset-owner organization. An asset
owner with multiple projects in different regions of the world, using the
same vendor solution, may treat each deployment project as if it were novel
and unknown.
Clearly, more standardization would improve the owner’s cybersecurity
posture while reducing the cost of deployment. This chapter will consider
the benefits of standardized designs, identify the elements of a standardized
design, and recommend ways to capture these details and minimize
implementation costs.
[Table 5-1 rows include system hardening (FR 3, FR 7), network monitoring
(FR 6, FR 7), and manual procedures (FR 6), marked against the responsible
principal roles.]
Table 5-1 shows why it is not enough to implement secure systems and
components. This is consistent with the hazardous-area equipment analogy
already discussed. Even when a certified-hazardous-area product is
procured, it must still be installed in compliance with standards and the
vendor’s instructions to ensure it is safe. In this case, the asset owner must
put controls in place around the systems and components it procures to
ensure the overall facility is secure.
Figure 5-2 shows a simplified block diagram of a typical IACS environment
for a facility. This example includes an integrated control and safety system
(ICSS). This could also be a separate, distributed control system (DCS) and
safety instrumented system (SIS), or a wide-area supervisory control and
data acquisition (SCADA) system. The facility also includes a system to
monitor and control the power to the plant and power to the systems
themselves. These systems, along with a turbine control system, an
associated vibration monitoring system, and specific control systems, are
provided as part of the packaged plant system (e.g., wastewater treatment).
These systems are typically procured from different vendors but must work
together to achieve the overall objectives for the asset owner.
Some system vendors provide more secure solutions than others. Some offer
security-specific features. A few vendors provide their own antivirus and
patching solutions or their own backup solutions. Some include their own
network monitoring features.
Even with vendor support, the asset owner must secure the entire facility, not
just individual systems. Maintaining multiple vendor systems for antivirus
and patching could be cost prohibitive and difficult to resource. Vendors may
have different standards for screening operating system patches or network
monitoring, which may lead to inconsistencies. For this reason, it is essential
that the scope is clearly defined. Security implementation must be facility-
wide, not on a per-system basis. There are many challenges to achieving this
clear scope. These challenges are discussed in more detail in Chapter 6,
“Pitfalls of Project Delivery.”
Figure 5-3 shows the example facility in more detail. This diagram identifies
the key components of each system76 and the connectivity required for
operational purposes. The individual system architectures are based on
actual vendor solutions.
Figure 5-3. Illustrative facility architecture with no environment security controls.
The systems themselves may individually meet the asset owner’s security
requirements, but additional controls are required to operate in this
interconnected manner.
The core of the facility is the ICSS. The other systems communicate key
process data with the ICSS. This provides a facility-wide overview of
operations from one human-machine interface (HMI). Each system provides
its own HMI. This allows for a more detailed view of system operation. For
instance, operators may require a summary of power status on the ICSS
HMI. Electrical engineers may need to view more detailed power
management information from that system’s HMI.
The ancillary systems connect to the ICSS via Ethernet networks. Typically,
the ICSS will poll the ancillary systems using an industrial protocol, such as
Modbus or EtherNet/IP. EtherNet/IP is an open standard based on the Common
Industrial Protocol (CIP); the “IP” here stands for “Industrial Protocol,” not
to be confused with Transmission Control Protocol/Internet Protocol (TCP/IP).
The ancillary systems will return the data requested by the ICSS.
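This polling exchange can be made concrete with a short sketch of the request frame an ICSS sends. The example builds a Modbus/TCP “Read Holding Registers” request (function code 3), which follows the published Modbus/TCP framing: an MBAP header (transaction ID, protocol ID 0, length, unit ID) followed by the PDU; the specific register addresses are illustrative:

```python
import struct

def read_holding_registers(transaction_id: int, unit: int,
                           start: int, count: int) -> bytes:
    """Build a Modbus/TCP Read Holding Registers request frame."""
    pdu = struct.pack(">BHH", 3, start, count)             # function code 3 + addr + count
    header = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit)
    return header + pdu

# Poll unit 1 for ten registers starting at address 0 (illustrative addresses).
frame = read_holding_registers(transaction_id=1, unit=1, start=0, count=10)
```

The ancillary system replies with a matching response frame containing the requested register values.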
The content and operation of the packaged plant control systems vary
considerably depending on the package and the vendor. In some cases, the
control system may comprise a programmable logic controller (PLC) and an
HMI. In other cases, it may be a personal computer (PC) connected to
specialist sensors. A third option may be a PLC with no HMI. The
connectivity to the ICSS will also vary. Modern packaged control systems
integrate into the same Ethernet networks as the ancillary systems described
earlier. However, some systems still connect using serial networks (RS-232
or RS-485). In some cases, this may involve hardwired connections, such as
analog or digital inputs or outputs. These connections represent critical
signals in the process.
In some facilities, it is necessary to deploy a device that belongs to a third
party. A typical example is a facility operated on behalf of a government-
owned oil and gas company. The government will want to receive
production totals and quality information from the facility. This information
is obtained from either a metering system or the ICSS, often using PLCs or
remote terminal units (RTUs) connected over Ethernet or serial networks.
Purdue Hierarchy
The Purdue hierarchy was developed by a team led by Theodore (“Ted”)
Williams (formerly of Monsanto Chemical Co.) at Purdue University’s
consortium for computer integrated manufacturing and published in 1992.77
The Purdue reference model is part of a larger concept: the Purdue
Enterprise Reference Architecture (PERA). This concept “provides a way to
break down enterprises into understandable components, to allow staff at all
levels to see the ‘20,000-ft view’ as well as to describe the details that they
see around them every day.”78 PERA expert and evangelist Gary Rathwell
was a member of the original development team. He maintains that PERA
was ahead of its time and never achieved the level of adoption that it
deserved. Rathwell has successfully implemented many major automation
projects by following the PERA methodology, but few outside his projects
appreciate what can be achieved. Most automation professionals know only
of the Purdue hierarchy, either from the ISA-95 standard, incorporated into
the IEC 62264 standard, Enterprise-Control System Integration,79 or the
ISA-99 standard, incorporated in ISA-62443-1-1, Security for Industrial
Automation and Control Systems,80 and even then, there is limited
understanding of the principles behind it.
Figure 5-4 shows the original Purdue hierarchy, in this case for a continuous
process such as petrochemicals. Another version (with different
descriptions) covered a manufacturing complex.
Figure 5-4. The original Purdue hierarchy.81
Industrial protocols such as Modbus adhere to the OSI model. This enables
Modbus-based applications to communicate with serial devices over RS-232
or RS-485, as well as devices on Ethernet or Wi-Fi.
IIoT and the Purdue Reference Model
Some consider the Purdue hierarchy obsolete. Key drivers behind this
opinion are the growing adoption of the IIoT and the move to locate central
processing servers to the cloud.
Definitions of IIoT are many and varied. In general, the concept differs from
conventional automation systems. Sensors and actuators are connected
directly to systems, typically cloud-based84, which “allows for a higher
degree of automation by using cloud computing to refine and optimize the
process controls.”85
Figure 5-7 shows an example of a cloud-oriented industrial architecture.
This alternative to the Purdue hierarchy is intended to demonstrate why it is
no longer applicable.
A key application for many asset owners and their vendors is condition-
based monitoring. Condition-based monitoring requires data at a relatively
low rate and assumes the condition does not change rapidly. The primary use
of condition-based monitoring is to monitor operational data over time. The
goal is to predict failures, rather than detect issues, in near real time. Figure
5-9 shows that both of these solutions achieve the condition-based
monitoring application requirement.
Figure 5-9. Conventional and IIoT-based approach to vibration monitoring.
Some of the sensors shown in Figure 5-7 do not fall into this category: The
parts tracking functionality (radio-frequency identification and barcode
readers) is unlikely to require responses in the millisecond, second, or
minute timescales. For these, an IIoT solution will likely be acceptable.
The worker monitoring requirements are unclear. If there is physiological
monitoring (body temperature, heart rate, blood pressure) or location
tracking, then it may be important to generate an alarm in the control room.
These conditions may be tracked elsewhere, for example, by medical staff in
a separate location. If so, only a periodic summary would be provided to the
control room.
In traditional implementations, the set point is set locally via the operator
console or the supervisor control console. The advent of improved
communications and devices allows a “higher degree of automation.” This
enables some business logic to determine the optimal set points needed to
achieve a particular objective. However, the use of IIoT cannot change
where the closed-loop control is executed, at least not in a resilient solution.
The Purdue levels help explain the order of priority for operation:
Compare the facility architecture shown in Figure 5-3 with the equivalent
Purdue hierarchy shown in Figure 5-12.
The architecture now includes a DMZ with network time, central domain,
backup, endpoint protection, and remote access services to be shared by all
the systems. Any duplicated equipment, such as NTP servers, can be
removed. All traffic is directed through the DMZ, avoiding direct access to
any of the systems.
The management of this DMZ is critical to the successful operation of the
facility. In some organizations, management of DMZ equipment may by
default be the responsibility of the information technology (IT) function,
with independent oversight by the operational technology (OT) function.
Some organizations may create an industrial DMZ managed by the OT
function with independent oversight by the IT function. As with all decisions
discussed in this book, the organization must take a risk-based approach to
assessing the options, ensure there are sufficient qualified resources
available to administer the procedures, and apply rigorous oversight to
ensure the procedures are followed and the risks are managed.
When determining conduits, all communications paths into and out of zones
must be considered. These considerations include the following:
• The primary communications path for transferring data to and from
the zone
○ Remote access connections
○ Cellular or other backup connections
○ Dial-up connections used by vendors
• Definitions required for each conduit identified
○ The zones it connects (to and from)
○ The communications medium it uses (e.g., Ethernet, cellular)
○ The protocols it transports (e.g., Modbus, TCP port 502)
○ Any security features required by its connected zones (e.g.,
encryption, multifactor authentication).
Once identified, a conduit list should be produced to capture the details. An
example is shown in Table 5-2. At this stage, it is sufficient to list the names
of the traffic/protocols. This list will identify and document the specific ports
needed to create the firewall rules.
Table 5-2. Conduit list for example facility architecture.
Conduit 1:
  Vendor zone → Corporate zone: Vendor condition monitoring protocol
  Corporate zone → Vendor zone: None

Conduit 2:
  Corporate zone → Demilitarized zone: None
  Demilitarized zone → Corporate zone: Domain control; Endpoint protection (AV/patching)*

Conduit 3:
  Turbine control system zone, Packaged Plant #1 control system zone,
  Packaged Plant #2 control system zone, ICSS zone, Power management
  system zone, and Vibration monitoring system zone → Demilitarized zone:
  Domain control; Endpoint protection (AV/patching); NTP88; Remote
  Desktop Protocol; Backup protocol

Conduit 4:
  Packaged Plant #1 control system zone → ICSS zone: Hardwired digital signal

Conduit 5:
  Third-party zone → ICSS zone: Modbus server89
  ICSS zone → Third-party zone: Modbus client

Conduit 6:
  ICSS zone → Turbine control system zone: Modbus server
  Turbine control system zone → ICSS zone: Modbus client

Conduit 7:
  ICSS zone → Packaged Plant #2 control system zone: Modbus server
  Packaged Plant #2 control system zone → ICSS zone: Modbus client

Conduit 8:
  ICSS zone → Packaged Plant #3 control system zone: Modbus server
  Packaged Plant #3 control system zone → ICSS zone: Modbus client

Conduit 9:
  ICSS zone → Power management system zone: EtherNet/IP server
  Power management system zone → ICSS zone: EtherNet/IP client

Conduit 10:
  ICSS zone → Vibration monitoring system zone: Modbus server
  Vibration monitoring system zone → ICSS zone: Modbus client

*AV = antivirus
With the zones and conduits identified, the next step is to segregate the
zones and manage the communications across the conduits. There are
several options, the most common being
• firewall,
• virtual local area network (VLAN), and
• virtual private network (VPN).
Firewall
A firewall controls access to and from a network for the purpose of
protecting it and the associated devices. A firewall connects to two or more
networks, creating separate network zones. A firewall operates at layer 2 or
layer 3 of the OSI model—the data link layer or the network layer—and filters
traffic by comparing network packets against a ruleset. A rule contains the
following details:
• Source address
• Source port
• Destination address
• Destination port
• Protocol: TCP, User Datagram Protocol (UDP), or both
It can also include a time element (e.g., limiting remote access as needed
rather than always). Most firewalls have multiple network interfaces. The
firewall rule will define which interface the rule is configured on.
The firewall ruleset should be defined based on the conduit list produced
earlier. For example, using the conduit list in Table 5-2, the ruleset for the
turbine control system zone would be as shown in Table 5-3.90
Table 5-3. Firewall ruleset for a turbine control system zone.
From Zone | Source | To Zone | Destination | Service | Comment
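Translating a conduit list into a firewall ruleset can be sketched mechanically. The zone names follow the example facility; the service-to-port mapping is an assumption except for Modbus/TCP, whose well-known port is 502 (NTP and Remote Desktop Protocol use their standard ports 123/udp and 3389/tcp):

```python
# Assumed mapping of conduit services to protocols and destination ports.
PORTS = {
    "Modbus": ("tcp", 502),
    "NTP": ("udp", 123),
    "Remote Desktop Protocol": ("tcp", 3389),
}

# Two illustrative entries drawn from the style of the conduit list.
conduits = [
    {"id": 6, "from": "ICSS zone", "to": "Turbine control system zone",
     "service": "Modbus"},
    {"id": 3, "from": "Turbine control system zone", "to": "Demilitarized zone",
     "service": "NTP"},
]

def to_rule(conduit):
    """Translate one conduit entry into an allow rule."""
    proto, port = PORTS[conduit["service"]]
    return {
        "from_zone": conduit["from"], "to_zone": conduit["to"],
        "protocol": proto, "dst_port": port, "action": "allow",
        "comment": f"Conduit {conduit['id']}: {conduit['service']}",
    }

ruleset = [to_rule(c) for c in conduits]
# Anything not explicitly allowed is denied.
ruleset.append({"from_zone": "any", "to_zone": "any", "protocol": "any",
                "dst_port": "any", "action": "deny", "comment": "Implicit deny"})
```

Keeping the ruleset derivable from the conduit list makes it straightforward to audit the firewall configuration against the documented conduits.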
The turbine control system uses only two of these commands and a limited
address range for each command. An example is shown in Table 5-6.95
Table 5-6. Modbus commands and address ranges for a turbine control
system interface.
Function Code | Command | Address Range
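The kind of function-code and address filtering an industrial firewall applies to Modbus traffic can be sketched as an allow-list check. The specific codes and ranges below are assumptions; function codes 3 and 16 are the standard “Read Holding Registers” and “Write Multiple Registers” commands:

```python
# Assumed allow-list: permitted Modbus function codes and their address ranges.
ALLOWED = {
    3: range(0, 100),      # Read Holding Registers, addresses 0-99 (assumed)
    16: range(100, 120),   # Write Multiple Registers, addresses 100-119 (assumed)
}

def permit(function_code: int, address: int) -> bool:
    """Allow a request only if both the code and the address are allow-listed."""
    allowed_range = ALLOWED.get(function_code)
    return allowed_range is not None and address in allowed_range
```

A request using any other function code, or a permitted code against an unexpected address, is dropped rather than forwarded.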
Industrial firewalls are network-based devices and may be used for more
than one conduit. For instance, the firewall could be placed on the uplink of
a switch that connects multiple systems. However, in this example,
additional commands and/or address ranges would need to be configured.
From a segregation and management perspective, it would be better to
configure one industrial firewall per conduit. In this case, the firewall would
be placed in line between each system and the switch. The cost per unit is
marginal compared with ongoing maintenance and management. Also, this
approach improves security.
As with firewalls, VLANs enable strict control over network traffic, in this
case, limiting which devices can communicate with each other. Note,
however, that as with firewalls, care must be taken to validate configurations
as errors may not be obvious. Just because systems are operating normally
does not mean they are secure from unauthorized operation or intrusion.
Virtual Private Network
VPNs and VLANs have many similar characteristics. Both are used to
establish logical networks on top of a physical network. VPNs tend to be
used to establish secure communications between geographically diverse
networks. This creates a single, logical network over external physical
networks such as the Internet. These secure communications are restricted to
an authorized group of users. Common uses for VPNs in automation system
networks include the following:
• Providing users with a secure method of remotely accessing a system
to monitor and/or control equipment
• Providing vendors with a secure method of accessing condition
monitoring or other maintenance-related information on their
equipment
As shown in Figure 5-19, in a VPN the computers at each end of the tunnel
encrypt the data entering the tunnel. They then decrypt the data at the other
end using encryption keys.
Once data is encrypted, it is impossible to read without access to the
encryption keys. IP Security (IPsec) secures the storage and transmission of
these encryption keys and enables secure VPNs to operate. IPsec is a set of
protocols developed by the Internet Engineering Task Force (IETF) to
support the secure exchange of data across the Internet. IPsec has been
deployed widely to implement VPNs.
For IPsec to work, the sending and receiving devices must share a public
key. This is accomplished through a protocol known as Internet Security
Association and Key Management Protocol/Oakley (ISAKMP/Oakley),
which allows the receiver to obtain a public key and authenticate the sender
using digital certificates. Digital certificates have additional security
benefits. As well as authenticating a user, they provide
• data integrity assurance, by verifying that data has not been altered in
transit; and
• nonrepudiation, by proving that data was sent by a particular user,
based on their certificate credentials.
There are some scenarios where asset owners approve vendors to provide
technical support via VPN. This technical support may require changes to
system set points or logic. In fact, the COVID-19 pandemic forced many
organizations to adapt to the challenges of restricted travel and site work. In
May 2020, Siemens successfully completed the start-up and adjustment of
one of its gas turbines in Russia.96 Although this case involved changes made
by personnel on-site with guidance by remote experts through
videoconference, there is a trend in many organizations toward performing
more work remotely. This should be considered with great caution. One
effective control is to limit the availability of such remote access to only
when necessary, and under strict on-site supervision. Such access should be
disabled by default.
System Hardening
Hardening a system means configuring equipment to reduce the likelihood
that a vulnerable program or service can be exploited. Automation systems have a
narrower function than IT systems. Automation systems are thus better
suited to the rigorous hardening needed to prevent unauthorized access or
operation.
The following hardening practices are essential for cybersecurity
management of automation systems. Certain practices may not be applicable
to all devices.
• Protecting endpoints – This includes antivirus protection and
operating system patching, both of which may be performed
semiautomatically or manually. In automation systems, antivirus and
operating system patching should never be performed automatically.
Only specific patches and antivirus definitions approved by the
vendor should be deployed.97 Machines should only be rebooted
manually. This will avoid unnecessary outage of the automation
system. Endpoint protection may also involve application control.
Application control locks down the equipment’s configuration so
changes cannot be made. That means programs (malicious or
otherwise) cannot be installed or executed. This “lockdown”
approach is better suited to automation system equipment which
rarely changes except for upgrades to the automation application.
• Using USB and network port security – USB (universal serial bus)
storage devices, such as flash drives and portable hard drives, are
major sources of malicious programs. A machine should have
antivirus or application control installed to prevent malicious
programs from executing, but it is recommended that unused USB
ports be disabled or locked. USB ports can be disabled in the
machine firmware (the BIOS—basic input/output system) or the
operating system. They can also be physically locked, using products
such as that shown in Figure 5-20. The presence of a lock reminds
users to think before plugging their device into a workstation to
charge it or download personal files. A lock and key system enables
an authorized user to unlock a port if needed. For additional
protection, it is recommended that autorun features be disabled in
Windows operating systems. This prevents removable drives from
automatically executing programs when they are connected, intentionally
or unintentionally, even on USB ports that must remain enabled for
mouse and keyboard devices. The same
approach is recommended for network ports. Ports on switches and
routers should be disabled or configured to dummy VLANs that are
not used. If someone connects a device to such a port, it cannot
communicate. Network port locks, similar to the USB lock shown in
Figure 5-20, should be installed to help change behavior and prevent
users from connecting without formal approval and change control.
Assuming the physical security team has put in place the elements
mentioned earlier, additional considerations for physical security of
automation systems equipment include the following:
• Define who should have access to each facility. This might include
who has keys, or copies of keys; who is programmed into an
electronic card access system; or who has the codes for keypad locks.
• Create a process for taking action when someone leaves. It is
common to share codes or keys with staff and vendors. A standard
process should ensure that locks or codes are changed, or card access
systems are updated, when someone leaves employment.
• Enforce physical security on-site. This includes ensuring that
equipment rooms (e.g., Figure 5-22) and cabinets (e.g., Figure 5-23)
are locked when not in use. Visitors must always be escorted. Where the
risk assessment performed by the physical security team deems these
controls necessary, they should always be in place.
Procedural Controls
• All remote access activities involving changes to automation
systems, or associated devices (e.g., PLC, RTU), should be only
conducted under an approved permit to work. The permit should
identify the planned activities, the associated risks, and any
additional controls required.
• No remote access activity should be permitted if a remote-connection
failure could leave the facility unsafe or out of service.
• Remote access for particular tasks may require a specific type of
connection. For instance, a cellular connection may be less reliable
than a broadband connection.
• Formal, defined support schedules should be available to all
involved. These document who should be connecting at any
particular time.
Network Monitoring
Network monitoring is a broad term that includes
• monitoring networks for problems, such as device failures, heavy
traffic, and slow response times; and
• intrusion detection and prevention.
Network monitoring, IDS, and IPS tools may have a place in automation
systems, but before they are deployed it is essential that:
• The operation of all automation systems is clearly understood and
documented. This includes defining all protocols, commands, and
registers necessary for operation.
• Other controls are properly implemented. This includes the correct
configuration and testing of all standard firewalls. It also includes
industrial firewall features, such as only allowing specific commands
and registers, and logging all other events.
• All equipment is properly hardened. This includes activities noted
earlier in this chapter, in particular, disabling or removing
unnecessary services or programs that might generate unwanted
traffic.
• Procedures are in place to regularly review log files and investigate
suspicious activity.
With these elements in place, network monitoring, IDS, and IPS tools may
become useful aids to ongoing security monitoring.
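The log-review prerequisite can be illustrated with a minimal sketch that flags firewall log entries whose Modbus function code is outside a documented allow-list. The log format, field names, and allow-list below are hypothetical assumptions for illustration only; real firewalls use their own log formats, and the permitted function codes must come from the system documentation described above.

```python
# Hypothetical sketch: flag log entries whose Modbus function code is
# not on the allow-list defined for this zone. Format and codes are
# illustrative assumptions, not any vendor's actual output.

ALLOWED_FUNCTION_CODES = {3, 4, 16}  # e.g., read/write holding registers

def review_log(lines):
    """Return log entries referencing a function code not on the allow-list."""
    suspicious = []
    for line in lines:
        # Assumed entry format: "<timestamp> <src> <dst> modbus fc=<code>"
        if "modbus fc=" not in line:
            continue
        code = int(line.rsplit("fc=", 1)[1].split()[0])
        if code not in ALLOWED_FUNCTION_CODES:
            suspicious.append(line)
    return suspicious

log = [
    "2022-01-10T02:15:00 10.1.1.5 10.1.2.7 modbus fc=3",
    "2022-01-10T02:15:04 10.1.1.9 10.1.2.7 modbus fc=8",
]
print(review_log(log))  # only the fc=8 (diagnostics) entry is flagged
```

In practice this kind of filtering is better performed by the industrial firewall itself; the sketch simply shows why the allowed commands must be documented before any automated review is possible.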
Cybersecurity Incident Response Plan
When a cybersecurity incident occurs, an incident response (IR) plan must
be initiated. Incident response plans must cover all the failure scenarios
considered in the network design. The incident response plan will define
• recovery objectives;
• roles, responsibilities, and levels of authority;
• communications procedures and contact information;
• locations of emergency equipment and supplies; and
• locations of spares and tools.
The incident response plan must identify the recovery objectives for each
essential function in the automation system. There are two key recovery
objectives to identify:
1. The recovery time objective (RTO) – Defining how long the
function can be out of service
2. The recovery point objective (RPO) – Defining how much data
can be lost in the event of a failure
These objectives will dictate what must be in place, in terms of:
• SLA(s) with vendor(s)
• System design
• Spare parts, on and off-site
• What is backed up and how often
• How long backup and restore take, and the backup location(s)
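The link between the RPO and backup frequency can be sketched in a few lines. The figures are illustrative, not recommendations: in the worst case, a failure just before the next scheduled backup loses one full backup interval of data.

```python
# Illustrative sketch: the RPO bounds how much data can be lost, so the
# interval between backups must not exceed it. All figures hypothetical.

def max_data_loss_hours(backup_interval_hours):
    """Worst case: a failure just before the next backup loses one interval."""
    return backup_interval_hours

def meets_rpo(backup_interval_hours, rpo_hours):
    return max_data_loss_hours(backup_interval_hours) <= rpo_hours

print(meets_rpo(24, 24))   # True: daily backups satisfy a 24-hour RPO
print(meets_rpo(168, 24))  # False: weekly backups cannot meet a 24-hour RPO
```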
In extreme cases, automation system unavailability may become a disaster-
level situation, for instance, the loss of the primary control room in a flood
or fire. This scenario requires specific recovery actions. A disaster recovery
(DR) plan should be produced, defining these disaster scenarios and the
required actions.
The typical backup requirement for each equipment type is as follows:
• Server (DCS, SCADA, historian, etc.) – Disk image on change; application/database based on RPO (e.g., daily/weekly)
• Workstation – Disk image on change
• Automation device (RTU, PLC, etc.) – Program/configuration file(s) on change
• Network device (switch, router, firewall, etc.) – Configuration file(s) on change
Each equipment type should have its own procedure that describes the
specific steps taken to perform the backup. Note that some automated
backup may be available, either at a system level (entire system) or device
level (e.g., via a PLC programming environment).
Storage and Retention
Consideration must be given to backup file storage and retention. At a
minimum, files must be kept off-site to protect against a localized disaster
that could destroy the equipment and backups. Depending on recovery
objectives, it may also be necessary to hold copies of backups locally to
allow for rapid response. In this case, backups should be kept in a fireproof
safe to protect them from damage.
Backup files can be large. Transferring them over a network can be time-
consuming and interfere with other network operations. It may be necessary
to transfer files during quiet periods.
Backup file retention is important, but there may be limitations on available
storage space. For some systems, it is common to maintain full and
incremental backups. This approach minimizes the need for multiple large
backup files. In such a case, there may be 1 or 2 full backups (taken
monthly) and 7 to 14 incremental backups (taken daily). Other scenarios will
emerge, depending on specific circumstances.
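The full-plus-incremental approach can be sketched as follows. The schedule (full backups on days 1 and 31, daily incrementals) and the function itself are hypothetical illustrations: restoring to a given day needs the most recent full backup plus every incremental taken after it.

```python
# Illustrative sketch of a full-plus-incremental scheme. Restoring to a
# target day requires the latest full backup at or before that day,
# then each incremental taken between the full and the target.

def restore_chain(full_days, incremental_days, target_day):
    """Return (full backup day, incremental days) needed to restore."""
    fulls = [d for d in full_days if d <= target_day]
    if not fulls:
        raise ValueError("no full backup available before target day")
    base = max(fulls)
    chain = [d for d in incremental_days if base < d <= target_day]
    return base, chain

base, chain = restore_chain(full_days=[1, 31],
                            incremental_days=list(range(2, 40)),
                            target_day=35)
print(base, chain)  # 31 [32, 33, 34, 35]
```

The design choice is the usual trade-off: incrementals save storage and transfer time, but a longer incremental chain makes restoration slower and more fragile.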
Restoration
Restore procedures describe the specific steps needed to restore the system
from a backup. It is essential that these procedures be tested regularly using
real backup files. Testing of this sort
• provides additional verification that the backup is completing as
required, and
• verifies the restoration process.
Ideally, a setup will enable these tests to be performed without interrupting
the operation of the live system. This test setup may require only one of each
type of device to verify the backup/restoration process. For larger systems,
the vendor may be asked to verify backup file integrity using its own systems.
Other Verification Requirements
There must be some procedure to verify that malware is not present in a
backup. If malware compromised a system 12 months ago, it must be assumed
that all archived backups going back to that time are also compromised. This may
nullify all backups and require a system rebuild. Procedures should therefore
include an anti-malware scan before a backup is taken, and a similar scan
when restoration testing is being performed.
When backups are taken by vendors or service providers, an asset owner will
need a different set of verification procedures to check that the vendor or
service provider is taking the backups, testing them, and checking them for
malware.
Manual Procedures
As noted in Chapter 2, “What Makes Industrial Cybersecurity Different?,”
policies and procedures are a critical element of good cybersecurity
management. Fortunately, personnel at facilities operating automation
systems are accustomed to following procedures. These environments are
hazardous and following procedures can mean the difference between life
and death.
As with safety procedures, cybersecurity procedures introduce inefficiencies
into work processes. For example, logging in remotely using multifactor
authentication and then accessing a machine via an intermediate remote
access server creates several additional steps. However, as with safety, these
additional steps reduce the likelihood of an incident. In Chapter 4, it was
noted that administrative controls, such as procedures, are among the least
effective. People inevitably find ways around the procedures. Nevertheless,
procedures are an essential part of the defense-in-depth approach to
cybersecurity management.
Key manual procedures recommended for end-user facilities are as follows:
• Require all site visitors to take a cybersecurity induction that covers
the key cybersecurity rules.
• Require all personnel to complete formal training, including ongoing
security awareness, and update this training annually to keep up with
evolving threats, vulnerabilities, and mitigations.
• Require that all changes involve backups of equipment before and
after the change.
• Require that all changes follow a formal change-control procedure
that includes updating and approving all documentation.
• Require that all files are transferred using a secure, approved method.
System Availability
The terms availability, reliability, maintainability, and redundancy are often
used incorrectly or interchangeably.
Availability is the probability that the system is operating properly when it is
required. Availability is measured as a percentage over a defined period, for
example, per day, per month, or per year. Availability over a year is the most
frequently used measure for a system. An availability of 90% translates to
36.5 days of downtime per year. An availability of 99.9999% (commonly referred to
as six nines) translates to approximately 31.5 seconds of downtime in the same period.
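These downtime figures follow from a short calculation, shown here as a minimal sketch assuming a 365-day year:

```python
# Annual downtime implied by an availability percentage (365-day year).

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds

def downtime_per_year(availability_pct):
    """Return annual downtime in seconds for a given availability (%)."""
    return (1 - availability_pct / 100) * SECONDS_PER_YEAR

print(downtime_per_year(90) / 86400)  # about 36.5 days for 90%
print(downtime_per_year(99.9999))     # about 31.5 seconds for six nines
```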
Availability is the combination of reliability and maintainability.
Reliability is a measure of the probability of a component or system to
perform a required function in a specific environment for a specific period
without failure.
Maintainability measures the ease with which a product can be maintained
and is an essential element for successful operations.
Redundancy achieves high availability by replicating hardware so that if one
device fails, another can take over. There are several types of redundant
design that can be used for elements of a system:
• Cold standby – Although not a redundant system in the true sense,
the immediate availability of spare components provides a basic level
of response.
• Warm standby – In this scenario, duplicate components are running
alongside the live equipment and can be swapped in more quickly
than in the cold standby scenario. However, there is still some loss of
service during the swap.
• Hot standby – This scenario minimizes the downtime experienced
during component failure. The duplicate/standby component
communicates with its live counterpart. If it detects failure, the
standby component takes over. In some designs, an overall system
controller monitors all components to detect failures.
If the probabilities of the individual events (e.g., primary supply failure) are
known, it is possible to calculate the overall probability of the scenario (e.g.,
loss of view). This probability can then be used to define availability figures
for each scenario.
The fault tree method can be used to model modifications to the system
design (e.g., the addition of a backup communications option) to determine
the effect on availability.
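The basic arithmetic behind such models can be sketched briefly. This is a minimal illustration assuming independent failures and hypothetical availability figures: components in series all must work, while a redundant (hot standby) pair is unavailable only when both components fail.

```python
# Illustrative availability arithmetic, assuming independent failures.

def series_availability(*components):
    """All components must work (no redundancy): multiply availabilities."""
    result = 1.0
    for a in components:
        result *= a
    return result

def parallel_availability(*components):
    """Redundant set works if any one component works: 1 - product of
    individual unavailabilities."""
    unavail = 1.0
    for a in components:
        unavail *= (1 - a)
    return 1 - unavail

single = 0.99
print(parallel_availability(single, single))  # about 0.9999 for a redundant pair
print(series_availability(0.999, 0.999))      # about 0.998 for two devices in series
```

This shows why redundancy is so effective: duplicating a 99% available component yields roughly 99.99% for the pair, under the independence assumption.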
Designing for System Availability
Power
Most automation system sites use an uninterruptible power supply (UPS) to
ensure continuous power to equipment. The UPS monitors incoming power,
detects problems, and automatically switches over to battery backup. The
battery is charged continuously while the primary power supply is available.
Larger sites with bigger demands require stand-alone generators (e.g., diesel)
to provide backup power.
Other Considerations
Internet Protocol Addressing
An IP address uniquely identifies a device on an IP network. There are two
standards for IP addressing, IPv4 and IPv6. In IPv4, the address is made up
of 32 binary digits, or bits, which are divided into a network portion and a
host portion. The 32 bits are broken into four octets (1 octet = 8 bits). The
value in each octet ranges from 0 to 255 decimal, or 00000000 to 11111111
binary. Each octet is converted to decimal and separated by a period (dot),
for example, 172.16.254.1. This is shown in Figure 5-29.
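The octet conversion can be sketched in a few lines (a minimal illustration, not part of any product or standard library for networking):

```python
# Split a dotted-decimal IPv4 address into four octets and show each
# octet as its 8 binary digits.

def ipv4_to_binary(address):
    octets = [int(part) for part in address.split(".")]
    assert len(octets) == 4 and all(0 <= o <= 255 for o in octets)
    return ".".join(format(o, "08b") for o in octets)

print(ipv4_to_binary("172.16.254.1"))
# 10101100.00010000.11111110.00000001
```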
Asymmetric encryption uses two keys, one for encryption and one for
decryption. This is shown in Figure 5-33. The encryption key is known as
the public key. It is freely available to everyone to encrypt messages. This is
why asymmetric encryption is also known as public-key encryption.
Asymmetric key systems provide a high level of security, but their complexity
makes them slower and more computationally demanding than symmetric-key
encryption. Hybrid encryption systems combine the two, offering the security
of public-key encryption with the speed of symmetric-key encryption.
In a hybrid system, the recipient's public key is used to securely share the
symmetric system's secret key. The actual message is then encrypted using
that symmetric key and sent to the recipient.
A related cryptographic technique, known as hashing, is commonly used to
protect sensitive data. Hashing is a one-way transformation: a hash of the
data is created, but the original data cannot be recovered from it. An
example of hashing is shown in Figure 5-34. Hashing is often used to
securely store passwords. In this scenario, to
verify a password, a hash is created on the fly and compared against the
stored hash. This avoids the need to store and transmit the unencrypted
version of the password, which could be accessed by unauthorized users.
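The password-verification flow described above can be sketched with Python's standard library. This is a minimal illustration, not a complete authentication system: a salted, deliberately slow hash (PBKDF2) is stored, and at login the hash is recomputed and compared, so the plaintext password is never stored or transmitted.

```python
# Minimal sketch of salted password hashing and verification using
# only the Python standard library.

import hashlib
import secrets

def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest) for storage; a fresh random salt per password."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, stored_digest, iterations=200_000):
    """Recompute the hash on the fly and compare against the stored hash."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison resists timing attacks.
    return secrets.compare_digest(candidate, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The salt ensures that two users with the same password produce different hashes, and the iteration count slows brute-force attempts against a stolen hash database.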
ISASecure
The ISA Security Compliance Institute (ISCI) is a nonprofit organization
that has developed several product certification programs for IACSs and the
components of these systems.101 These programs are based on certification
around the ISA/IEC 62443 Series, Security for Industrial Automation and
Control Systems.
The current ISASecure certification programs are listed below.
• Security Development Lifecycle Assurance (SDLA), which certifies
that the security development life cycle of a vendor meets the
requirements in ANSI/ISA-62443-4-1, Product Security
Development Life-Cycle Requirements.
• System Security Assurance (SSA), which certifies that IACS
products have the capability to meet the requirements in ANSI/ISA-
62443-3-3, System Security Requirements and Security Levels, and
have been developed in accordance with an SDLA program
compliant with ANSI/ISA-62443-4-1-2018, Security for Industrial
Automation and Control Systems – Part 4-1: Secure Product
Development Lifecycle Requirements (formerly Part 4-1: Product
Security Development Life-Cycle Requirements).
• Component Security Assurance (CSA), which certifies that IACS
component products have the capability to meet the requirements in
ANSI/ISA-62443-4-2-2018, Security for Industrial Automation and
Control Systems – Part 4-2: Technical Security Requirements for
IACS components, and have been developed in accordance with an
SDLA program compliant with ANSI/ISA-62443-4-1-2018, Security
for Industrial Automation and Control Systems – Part 4-1: Secure
Product Development Lifecycle Requirements (formerly Part 4-1:
Product Security Development Life-Cycle Requirements). Certified
component products can be embedded devices, such as controllers;
host devices, such as PC workstations; network devices, such as
firewalls; and software applications.
Certification is conducted by ISO 17065 accredited certification bodies
(CBs). A certificate is issued that shows details of the product, including
relevant release numbers, the version of the standard referenced, and the date
of certification. An example certificate is shown in Figure 5-35.
Figure 5-35. Example ISASecure certificate.
Source: International Society of Automation, ISASecure website. https://www.isasecure.org/en-
US/End-Users/IEC-62443-4-2-Certified-Components.
Although many vendors have SDLA, SSA, and CSA certification, at the
time of this writing, it is still not common for asset owners to demand
certified vendors, systems, or components. As noted at the beginning of this
chapter, there are substantial benefits to building facilities around certified
vendors and products, just as there are with hazardous-area certified
equipment.
• Products that are secure by design, developed by vendors who are
certified to follow standards for their processes and procedures, form
a much better foundation for a secure facility.
• The onus is on vendors, rather than asset owners, to obtain and
maintain third-party certification for their products. The asset owner
can save time and money by avoiding the need to perform audits of
vendors and their products.
• To be compliant, the vendor must provide clear instruction to the
asset owner on secure deployment of the product. This will save
additional time and money for the asset owner, who will not need to
develop security requirements. It also provides a greater degree of
consistency that is easier to maintain.
The main driver for vendors to obtain certification is market pressure. Few
vendors will take the initiative to invest in certification without a business
case for a return on that investment. Unfortunately, many asset owners still
do not fully understand automation systems security. Many security
questionnaires in requests for proposal include questions oriented entirely
around information security, such as the following:102
• Are you certified and/or audited to any information security or
quality standards such as ISO/IEC 27001, ISO 9001, SAS 70, or PCI
DSS?
• Will any <asset owner> information be stored, processed, or accessed
from outside of <country>?
• What security controls are in place to keep <asset owner> systems
and data separate from other client data?
• Will access to <asset owner> information held on your systems be
able to be gained via a remote connection?
These questions are important, but without asking for ISA/IEC 62443
certification or any details of automation systems-related controls, there is
no requirement for vendors to learn about or pursue them.
When asset owners finally demand certified automation system vendors and
products, the business case will be clear and vendors will comply. This
compliance will greatly improve the inherent security of automation systems
products.
Summary
Despite the general awareness of cybersecurity risks, many asset owners and
vendors are still not providing or maintaining secure automation systems.
Although some automation vendors have begun developing their own secure
architectures, and some have obtained third-party certification, there is still
much to be done.
Asset owners that are security-aware have developed their own internal
automation systems security standards. They have been designing and
reviewing solutions from vendors. The lack of consistency of approach, even
within asset-owner organizations, introduces additional cost while failing to
achieve the most secure outcome.
Standardization is essential if asset owners are to improve their cybersecurity
posture and reduce the cost of deployment. Asset owners should focus on
ensuring the essential elements of standardized designs are in place before
looking at other, more advanced controls.
A secure network design is a foundation of good cybersecurity posture.
Many implementations fail at this stage due to a lack of understanding of
secure design principles. Among these is the use of the Purdue hierarchy. It
can define a functional design that can be converted into a physical one, with
all the necessary security zones and conduits in place.
The demand for IIoT and cloud solutions is driving poor network design.
These solutions may work, but without the resilience needed by most asset-
owning businesses. Unfortunately, many discover this lack of resilience
during an incident.
Even with a secure network design, it is critical that equipment be properly
hardened. Physical and electronic access control must be put in place, with
robust manual procedures, before advanced solutions like network
monitoring or intrusion detection are deployed.
Even with the most secure solution feasible, an asset owner will still
experience cyber incidents. Being prepared for these, with proven, tested
incident response and disaster recovery plans, supported by backup and
recovery processes, will make the difference between a minor and major
outage or incident.
Certification of vendors and products is a key means to raise the standard of
automation systems security. Hazardous-area certification became a business
requirement because of safety concerns. The link between cybersecurity and
safety should be clear to all asset owners, as described in Chapter 4. Once
cybersecurity certification for automation systems becomes the norm, asset
owners will be able to implement more secure facilities at a fraction of the
current cost.
____________
70 The National Electrical Code (NEC) defines hazardous-area classifications in the United States
(NEC Article 500). An NEC hazardous-area classification consists of several parts: the class,
group, and division. Worldwide, outside the United States, IEC standard IEC 60079 defines
hazardous-area classifications using class and zone (this classification method is known as
ATEX, an abbreviation of the French atmosphères explosibles).
71 ANSI/ISA-62443-4-2 defines the requirements for component products; these can be embedded
devices, host devices, network devices, and software applications.
72 ANSI/ISA-62443-3-3 defines the requirements for an IACS system based on security level.
73 ISASecure System Security Assurance (SSA) certifies that products have the capability to meet
the requirements in ANSI/ISA-62443-3-3 and have been developed in accordance with a
Security Development Lifecycle Assurance (SDLA) program. ISASecure Component Security
Assurance (CSA) certifies that component products have the capability to meet the requirements
in ANSI/ISA-62443-4-2 and have been developed in accordance with an SDLA program.
74 ANSI/ISA-62443-3-3 (99.01.01)-2013, Security for Industrial Automation and Control Systems
– Part 3-3: System Security Requirements and Security Levels (Research Triangle Park, NC:
ISA [International Society of Automation]).
75 Listed in ANSI/ISA-62443-3-3.
76 The diagram is for illustrative purposes. The number of components in each system will vary
depending on facility requirements.
77 Theodore J. Williams, The Purdue Enterprise Reference Architecture: A Technical Guide for
CIM Planning and Implementation (Research Triangle Park, NC: Instrument Society of
America, 1992).
78 PERA Enterprise Integration (website), Gary Rathwell, accessed June 21, 2021,
http://www.pera.net/.
79 IEC 62264-1:2013, Enterprise-Control System Integration (Geneva 20 – Switzerland: IEC
[International Electrotechnical Commission]).
80 ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems – Part 1-1:
Terminology, Concepts, and Models (Research Triangle Park, NC: ISA [International Society of
Automation]).
81 Williams, The Purdue Enterprise Reference Architecture, 146.
82 ANSI/ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems.
83 ISA-62443-1-1-2007, Security for Industrial Automation and Control Systems, 60.
84 “Is the Purdue Model Dead?”
85 ”Industry 4.0,” University of West Florida (website), accessed June 21, 2021,
https://uwf.edu/centers/haas-center/industrial-innovation/industry-40/.
86 Williams, The Purdue Enterprise Reference Architecture, 144.
87 The WannaCry incident involved exploiting a vulnerability in Microsoft Windows and resulted
in over 230,000 computers in 150 countries being infected with ransomware. Timothy B. Lee,
“The WannaCry Ransomware Attack Was Temporarily Halted. But It’s Not Over Yet,” Vox,
May 15, 2017, accessed June 21, 2021, https://www.vox.com/new-
money/2017/5/15/15641196/wannacry-ransomware-windows-xp.
88 As noted earlier, to maintain resilience, a local NTP service is provided, so there is no NTP
traffic required from the DMZ to the Corporate Zone.
89 This book uses the newer convention of server and client. This convention was adopted by the
Modbus Organization on July 9, 2020. See https://www.modbus.org/docs/Client-ServerPR-07-
2020-final.docx.pdf for further details.
90 The specific ports are shown for example only and are not intended to reflect particular products
or solutions or any changes in products or solutions after this book is published.
91 Good firewall configuration procedures require the association of unique names with IP
addresses to improve the readability of a ruleset.
92 FortiGate 7060E chassis. Fortinet, “FortiGate® 7000E Series FG-7060E, FG-7040E, and FG-
7030E Datasheet,” accessed June 28, 2021,
https://www.fortinet.com/content/dam/fortinet/assets/data-
sheets/FortiGate_7000_Series_Bundle.pdf.
93 Tofino Argon 100 security appliance. Tofino, “Argon Security Appliance Data Sheet,” DS-TSA-
ARGON, Version 5.0, accessed June 28, 2021,
https://www.tofinosecurity.com/sites/default/files/DS-TSA-ARGON.pdf.
94 There are additional function codes specified in the Modbus protocol. Some vendors have their
own function codes for product-specific features. The commands specified here are commonly
used by most systems.
95 This is an example only and does not reflect any particular vendor solution.
96 Fortum, “Siemens Carried Out First Remote Start-Up and Adjustment Work in Russia at
Nyagan GRES,” accessed June 21, 2021, https://www.fortum.com/media/2020/06/siemens-
carried-out-first-remote-start-and-adjustment-work-russia-nyagan-gres.
97 This approach may lead to inconsistencies, with different vendors approving different patches or
signatures, or not adequately testing against all patches before approving. Application control
may provide a more consistent approach if implemented systematically.
98 Geofencing is the use of location-based services to locate users, and that information is used to
make decisions, in this case to provide remote access. Margaret Rouse, “What Is Geo-Fencing
(geofencing)?” WhatIs.com, accessed June 21, 2021.
99 A process control narrative, or PCN, is a functional statement describing how automation
system components should be configured and programmed to control and monitor a particular
process, process area, or facility.
100 A host file is a clear text file stored on a server or workstation that contains a list of hostnames
and associated IP addresses. This is a simplified, but decentralized, version of DNS where
network devices share and update a similar list in real time.
101 See https://www.isasecure.org/en-US/About-Us for more details.
102 These questions are similar to a sample from a real questionnaire from an asset owner.
Identifying details have been removed where applicable.
6
Pitfalls of Project Delivery
Introduction
Most cybersecurity literature and training reference the challenge of
applying cybersecurity controls to legacy equipment. Typical of these
references are “There is a large installed base of SCADA [supervisory
control and data acquisition] systems, ranging from current levels of
technology back to technologies from the 1980s (and possibly older),”103 and
“In the long run, however, there will need to be basic changes in the design
and construction of SCADA systems (including the remote terminal units—
RTUs) if they are to be made intrinsically secure.”104
Conceptual engineering – Cybersecurity risk comparison for high-level logical design options
Feasibility
As previously noted in Chapter 2, EPC projects tend to run for many years.
For the systems and networks being implemented, cybersecurity is more
than a design or assurance issue.105 During the feasibility stage, the project
team should consider the risks related to each subsequent phase of the
project. This includes the time to deploy controls, procedural or technical, to
manage these risks. Establishing a solid foundation of good governance, as
part of a comprehensive cybersecurity management system, will reduce
delays to a project. Putting this system in place early may prevent a
cybersecurity incident or last-minute technical issues.
Consider the December 2018 Shamoon 3 cyberattack that targeted service
providers in the Middle East. EPC contractor Saipem suffered significant
disruption at locations in the Middle East, India, Aberdeen, and Italy.
According to a Reuters report, up to 400 servers and 100 workstations were
crippled by the attack.106 The impact was not limited to Saipem. Its
customers all over the world suffered disruption to their projects as they
were forced to take action to prevent being drawn into the attack. Such
actions included disabling user accounts and removing access to systems.
These preventive measures forced projects to find workarounds until the
incident was satisfactorily addressed. These workarounds can introduce new
security vulnerabilities that require additional attention to avoid increasing
exposure to attack. Saipem customers invested significant time and effort
investigating the incident to determine their exposure. Sensitive information
may have been exfiltrated, and accounts may have been compromised. This
weeks-long investigation distracted employees and drew down resources
needed for the actual EPC project.
Had Saipem and its customers recognized this risk during the early stages of
the project, they could have developed defenses such as awareness training,
monitoring, and joint incident response plans. Although these mitigations
may not have prevented the attack, they would have reduced the impact to
the project.
Engineering
The engineering phase of the project focuses heavily on the design of
construction elements, for example, the fabrication of a vessel or oil and gas
platform, or the construction of a treatment plant.
During this phase, automation system vendors refine the details of their
solution in several ways:
• Defining the list of data, including the instrument type, location, and
data type
• Defining the control strategy
• Conducting or contributing to hazard and operability studies
(HAZOP), including a control system HAZOP (CHAZOP) that
focuses specifically on failures of the control system
• Identifying interfaces with other systems
• Designing the physical network arrangement
• Designing the cabinet arrangement and cabling details
Decisions made in the engineering phase can have significant impacts later
in the project or during operations. For example, there is a misconception
that isolating automation system equipment from other networks addresses
cybersecurity threats to that system. In reality, this is not the case. Isolated
systems are exposed to many cybersecurity threats, including the use of
uncontrolled removable media. Furthermore, the isolation of automation
systems creates operational challenges that reduce cybersecurity posture. For
example, operating system patches and anti-malware updates must be
transferred manually using removable media, rather than through secure
network-based mechanisms. As discussed, manual processes are vulnerable
to failure.
Construction
There are two major issues relating to cybersecurity during the construction
phase of a project:
1. Management of change
2. Incident response preparedness
Management of Change
Despite the best efforts of everyone involved in the engineering phase, errors
and omissions will occur that must be corrected. A typical example is the
need to run additional cables to accommodate system connections.
Often, some requirements are omitted during the project phase. For instance,
equipment required for vendor remote access to its system may not be
included in the project scope because it is considered part of a separate
maintenance contract. As a result, changes may be needed to accommodate
this equipment later, as well as additional cabling to provide connectivity.
For the automation vendors, this phase of the project can last well over 12
months. During this time, numerous individuals from the asset owner, EPC
contractor, and the automation system vendor will come into contact with
the automation system equipment.
Basic cyber hygiene tasks, such as anti-malware protection, backup, and
electronic access management, should be performed during the construction
phase, but this is not always the case. This negligence can be attributed to
poor cybersecurity awareness, limited oversight, and minimal contractual
obligations. Many automation system vendors assume that because the
equipment is not operational, these tasks are not necessary. These important
steps are often seen as time-consuming and overly cautious for equipment
that is still under development. However, the risk to the project timescale
and associated cost of neglecting basic cyber hygiene is significant. For
example:
• A failure to maintain regular backups could result in a loss of several
days, or even weeks, of progress in the event of a cybersecurity
incident.
• A failure to maintain rigorous electronic access control, especially
with respect to joiners, movers, and leavers, can lead to a
compromise of systems.
• As with operational automation systems, there is a misunderstanding
that because these systems are not usually directly connected to the
Internet, they are not at risk from external threats. These systems are
often indirectly connected to the Internet,107 for example, through a
connection to the office network allowing developers to work from
their desks. When the automation system vendor is operating from a
temporary facility, there is an even greater chance of indirect
connectivity through poorly managed temporary firewalls.
• Even with no connection to the Internet, there is a major risk that
systems could be compromised from within. This is especially true
with poor management of removable media. This risk is increased
when secure file transfer facilities are not provided. Developers will
need a secure means to transfer files to and from servers and
workstations on the automation system.
Incident Response Preparedness
Incident response preparedness requires a plan developed before construction
begins. The plan must identify incident handling procedures and categorize
these procedures for four stages:
1. Before the incident occurs – What activities must take place to be
prepared for an incident? One example is regular backup of
systems.
2. While the incident is underway – What activities must be
performed in response to each type of incident? An example would
be verifying the details of malware on detection.
3. Immediately after the incident – What activities should take
place when the incident is defined as over? These might include
communications to various stakeholders, damage assessment, and
restoration of any services that were suspended during the
incident.
4. During recovery – Which activities must take place once the
incident is over? Examples include replacing damaged equipment
and conducting a lessons-learned exercise or root-cause analysis.
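The four-stage categorization above could be captured in a simple data structure that a project team uses to organize its procedures. This is a minimal sketch; the stage names follow the list above, while the activity entries are illustrative placeholders, not prescribed content.

```python
# Hypothetical sketch: organizing incident handling procedures by the
# four stages described above. Activity entries are illustrative only.
INCIDENT_PLAN = {
    "before": ["Perform regular system backups"],
    "during": ["Verify details of detected malware"],
    "immediately_after": [
        "Notify stakeholders",
        "Assess damage",
        "Restore suspended services",
    ],
    "recovery": [
        "Replace damaged equipment",
        "Run lessons-learned / root-cause analysis",
    ],
}

def procedures_for(stage: str) -> list[str]:
    """Return the planned activities for a given incident stage."""
    if stage not in INCIDENT_PLAN:
        raise KeyError(f"Unknown incident stage: {stage}")
    return INCIDENT_PLAN[stage]
```

A structure like this makes it easy to verify during plan reviews that every stage has at least one defined activity.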
Start-Up
The highest profile milestone in any project is start-up. Start-up is the
culmination of the project and highly symbolic. In some cases, completion
of the project may be strategically significant to the organization. Any delay
may have a negative impact on share price. As a result, there is a great deal
of focus on the start-up date. Start-up is the last chance to eliminate any
cumulative delays created during the project. There will be significant
pressure from management to make up this lost time and not miss the start-
up date. As we saw with commissioning, time pressure can cause important
steps to be skipped during the start-up phase.
Incident response continues to be critical during the start-up phase. High-
profile projects may attract unwanted attention. For instance, environmental
activists may seek to disrupt the operation of new oil and gas platforms
using cyber methods. Nation-states may seek to attack major new pipeline
projects to disrupt trade.
Even without such attention, a new facility is vulnerable during the early
stages of operation. With a new facility and new systems, operators and
technicians will not be familiar with normal behavior and will be slower to
identify abnormal situations. Training and incident response exercises
throughout a project prepare personnel for start-up and beyond. The project’s
incident response plan will be updated to reflect changes in circumstance
and include new threats.
In some sectors where cybersecurity posture is higher (e.g., oil and gas),
asset owners provide a set of cybersecurity requirements to contractors and
then conduct assessments on deliverables to confirm that these requirements
are being met.
In sectors where cybersecurity posture is lower (e.g., water and wastewater),
there may be no requirements issued or requirements may focus on IT
security.
The use of contractual requirements for cybersecurity varies from country to
country, depending on the regulatory environment.
Contracts should include cybersecurity requirements not only for systems,
but also for project execution. As noted throughout this chapter, a typical
project involves a wide range of cybersecurity risks. These risks can only be
managed if the contractors and vendors are aware and prepared.
One important consideration for EPC projects is that the EPC will issue its
own contracts to subcontractors and vendors. Asset owners should therefore
ensure that a contract with the EPC stipulates what cybersecurity conditions
must be passed on to subcontractors and others working for the EPC.
Key considerations for cybersecurity in contracts are as follows:
• Explicit milestones, deliverables, and payments related to
cybersecurity. Examples include successful completion of design
review(s), successful red-team assessment, and closure of
cybersecurity punch-list items or actions. Payment terms for these
milestones and deliverables must be significant enough that the EPC
or vendor is motivated to complete them.
• Quality assurance of handover documentation. As noted earlier, data
handed over in projects is often of poor quality. At the end of a
project, the EPC or vendor may not be willing to invest the additional
time required to clean up the data.
• Project-related cybersecurity activities. This would include a
cybersecurity incident response plan that addresses how the EPC or
vendor will deal with a cybersecurity incident on the project. The
contract should explicitly state that the EPC or vendor is responsible
for maintaining the cybersecurity posture of all equipment during the
project life cycle. This includes the patch status of the operating
system for servers and workstations, and the awareness of the EPC or
vendor’s employees and contractors.
Some asset owners now specify certified secure products in their contracts.
The popularity of this approach will continue to grow, as a standard provides
an independent means of assessing the security of products.
The ISA/IEC 62443 standards define compliance requirements for devices,
systems, and even the development life cycle of automation system vendors.
This is discussed in Chapter 5. This standards-based approach will
significantly improve the cybersecurity posture of systems and projects.
Consider how broad adoption of contracts requiring ISO 9000 compliance has
improved quality and safety for hazardous equipment. This standards-based
requirement is especially important in sectors or countries that lack well-
defined standards or regulations.
Verification of Requirements
Contracts must include key requirements, but including them does not
ensure the requirements will be met.
Governance, risk management, and compliance (GRC) defines how an
organization approaches these practices. GRC can highlight areas of
concern. However, the information produced from a GRC process is only as
good as the data provided.
Like many aspects of cybersecurity, there is often too much emphasis on
tools and not enough focus on the people and processes required for
effective oversight. For example, in many large organizations assessment
reports must be completed for each automation system. A GRC tool
generates a questionnaire to check compliance against cybersecurity
standards. It is based on questions such as:
• Is backup software in place?
• Is it possible to remotely access the safety system?
In most cases the response is either yes or no. Some questionnaires include
multiple-choice questions; for example, for the question, “How often is user
access reviewed?” the answers could be review not performed, reviewed
every 18 months, reviewed every 6 to 12 months, or reviewed every 3 to 6
months.
The value of the questionnaire depends on the quality of the assessor, the
knowledge of the responder, and the availability of the information needed
to answer the questions. These questionnaires are often performed late in the
project life cycle. That means there is more information available, but it may
be too late in the project to address nonconformances.
A better approach is to verify that the product vendor has completed the
questionnaire before any contract is let. The vendor can also be asked to
provide responses such as compliant, optional at extra cost, and not
compliant.
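Tallying vendor responses against the three categories named above can surface nonconformances before contract award. This is a minimal sketch under stated assumptions: the question texts echo the examples in this section, and the answers shown are made-up data.

```python
# Hypothetical sketch of tallying pre-contract vendor questionnaire
# responses. Questions mirror the examples above; answers are made up.
RESPONSES = {
    "Is backup software in place?": "compliant",
    "Is it possible to remotely access the safety system?": "not compliant",
    "How often is user access reviewed?": "optional at extra cost",
}

def summarize(responses: dict[str, str]) -> dict[str, int]:
    """Count responses by category so nonconformances surface early."""
    summary = {"compliant": 0, "optional at extra cost": 0, "not compliant": 0}
    for answer in responses.values():
        summary[answer] += 1
    return summary
```

Any nonzero "not compliant" count would then be reviewed before the contract is let, rather than late in the project life cycle.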
It would still be necessary to validate that what is delivered meets the
original requirements. That validation is easier to accomplish based on
information provided by the vendor. The National Cyber Security Centre has
produced compliance guidelines for Operators of Essential Services (OES),
and Appendix B of these guidelines provides a helpful checklist that can be
used if no specific questionnaire exists.110
Performance Management
Effectively tracking the performance of a contractor or vendor is critical to
the success of a project. EPC projects typically use S-curves as a visual
representation of planned and actual progress. Figure 6-5 shows a simple
example of planned working hours per month (bars) and cumulative hours
(line). The S-curve gets its name from the fact that the cumulative hours line
is S-shaped. Depending on what is being tracked, the bars and lines might
represent other metrics, such as deliverables (number of HMI screens
completed, number of cabinets assembled, etc.).
Figure 6-5. Example S-curve.
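The cumulative line in an S-curve is simply a running total of the per-period figures. The sketch below shows the computation; the monthly hours are made-up example data, not values from Figure 6-5.

```python
# Illustrative sketch: deriving the cumulative (S-curve) line from
# planned monthly working hours. Monthly figures are made-up data.
planned_hours = [100, 250, 500, 800, 800, 500, 250, 100]

cumulative = []
total = 0
for hours in planned_hours:
    total += hours          # running total across the project
    cumulative.append(total)

print(cumulative[-1])  # total planned hours: 3300
```

Because each period adds a nonnegative amount, the cumulative line always rises monotonically; the characteristic S shape comes from the slow start, rapid middle, and tapering finish of the per-period bars.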
The reality is often quite different. Figure 6-6 shows an example of plan
versus forecast for a project activity. For various reasons, projects take
longer to start than planned. The usual response is to show a forecast where
more work is done later (or backloaded) to achieve the planned target date.
Figure 6-6. Planned and actual performance S-curves.
Summary
Today, projects that deliver new automation systems or enhancements to
existing systems routinely introduce new cybersecurity vulnerabilities in
organizations. In addition, the projects themselves contain vulnerabilities
that can impact the organization. The lack of understanding of cybersecurity
risks is a major factor, as is the failure to correctly manage cybersecurity.
This chapter has identified the key factors for successfully managing
cybersecurity:
• Secure senior project leadership support
• Embed cybersecurity throughout the project
• Embed cybersecurity requirements in all contracts
• Raise awareness within the project team
• Implement rigorous oversight processes
There are many things that organizations can leverage to improve results,
including the following:
• Use certified secure products from certified vendors.
• Define security controls at the design stage to avoid costly, less
effective implementation later.
• Regularly review cybersecurity implementation progress in the
project.
• Link milestones and payments to cybersecurity requirements.
• Ensure the project resourcing and time plan includes a regular
cybersecurity update of equipment during execution (e.g., anti-
malware, software upgrades/patches, backups).
• Include a plan for changing over all user accounts and test code, as
well as removing vendor accounts, default accounts, and test
software or configurations.
• Define a cybersecurity incident response plan for the project that
includes all stakeholders, so a process is in place when an incident
occurs during project execution.
• Provide regular cybersecurity awareness training for everyone on the
project, including users, vendors, and integrators.
• Plan for an independent red-team assessment of the final as-built
environment, incorporating realistic scenarios to provide additional
assurance that security is in place as expected.
Cybersecurity is a critical element of operations and must be treated as such
during a project.
____________
103 William T. Shaw, Cybersecurity for SCADA Systems (Tulsa, OK: PennWell Corporation, 2006),
389.
104 Shaw, Cybersecurity for SCADA Systems, 390.
105 Also called front-end engineering design, or FEED.
106 Stephen Jewkes and Jim Finkle, “Saipem Says Shamoon Variant Crippled Hundreds of
Computers,” Reuters, December 12, 2018, accessed June 21, 2021,
https://www.reuters.com/article/us-cyber-shamoon/saipem-says-shamoon-variant-crippled-
hundreds-of-computers-idUSKBN1OB2FA.
107 According to a 2019 Dragos report, 66% of incident response cases involved adversaries
directly accessing the Industrial Control System (ICS) network from the Internet. Dragos, “2019
Year in Review,” accessed June 21, 2021, https://www.dragos.com/wp-
content/uploads/Lessons_Learned_from_the_Front_Lines_of_ICS_Cybersecurity.pdf.
108 POSC Caesar Association, “An Introduction to ISO 15926,” November 2011, accessed June 21,
2021, https://www.posccaesar.org/wiki/ISO15926Primer.
109 Capital Facilities Information Handover Specification, International Association of Oil & Gas
Producers (IOGP), “More About CFIHOS,” accessed June 21, 2021, https://www.jip36-
cfihos.org/more-about-cfihos/.
110 National Cyber Security Centre (NCSC), “NIS Compliance Guidelines for Operators of
Essential Service (OES),” accessed June 21, 2021,
https://www.ncsc.gov.ie/pdfs/NIS_Compliance_Security_Guidelines_for_OES.pdf.
7
What We Can Learn from
the Safety Culture
Introduction
Cybersecurity awareness training is a common tool employed by many
organizations. What constitutes awareness training and who receives it can
vary considerably. Any cybersecurity awareness training is better than none,
but training designed for those in information technology (IT) environments
is not sufficient for those in operational technology (OT) environments. This
distinction is lost in many organizations where training, like other aspects of
cybersecurity, is managed by the IT function. Such generic training neglects
the operational and cultural differences in OT facilities.
This chapter will identify the operational and cultural differences between an
IT and an OT environment. Taking these differences into account, it will
explore the essential elements of cybersecurity awareness training and
monitoring required for an OT environment.
The Importance of Awareness
Visit any OT facility today and you will likely find several obvious
cybersecurity policy violations or bad practices. Typical examples include
the following:
• Poor physical security, such as unlocked equipment rooms or keys
permanently left in equipment cabinet doors.
• Uncontrolled removable media used to transfer data.
• Poor electronic security, such as leaving user credentials visible. See
Figure 7-1 for a real example of this bad practice.
Figure 7-1. Control room console with user credentials visible on a permanent label.
This is consistent with other reports. Interviewed by the Wall Street Journal
for a report on the Duke Energy fine, security consultant Tom Alrich said,
“The state of compliance is pretty rotten.” Because he knew that Duke spent
a lot of money on its critical infrastructure protections, Alrich added, “I
really doubt they are much more insecure than anyone else.”112
The 127 violations relate to controls such as physical security, change
control, access management, configuration management, documentation,
information protection, and incident response, all of which are highly
dependent on individual training, awareness, and behavior. Some of the
violations actually related to gaps or failures in Duke Energy’s cybersecurity
awareness training program.
In July 2020, Duke Energy announced a $56 billion capital investment plan.
This plan featured several forward-looking statements,113 including this one
referencing cybersecurity threats: “These factors include, but are not limited
to: …[t]he impact on facilities and business from a terrorist attack,
cybersecurity threats, data security breaches, operational accidents,
information technology failures or other catastrophic events, such as fires,
explosions, pandemic health events or other similar occurrences.”114
Even though Duke Energy’s plan includes cybersecurity alongside well-
known types of threats, it may not help the company’s NERC CIP
compliance. This compliance depends on the performance of its employees,
their awareness of cybersecurity threats, and execution of the necessary
mitigations.
Underestimating Risk
Skepticism is a significant factor in poor cybersecurity preparedness. By
now, most individuals are familiar with one or more high-profile
cybersecurity incidents; some may even have been impacted by one. Despite
this growing awareness, it seems many people in OT environments feel
cybersecurity is not their problem. Their views fall into one of two camps:
1. The likelihood of a cyber incident is low because either the
organization is not a target, or it has not happened in the past.
2. The consequence of a cyber incident is low because many layers
of protection are in place.
Human Error
In an article titled “The Sorry State of Cybersecurity Imagery,” Eli
Sugarman and Heath Wickline note that images online are “all white men in
hoodies hovering menacingly over keyboards, green Matrix-style 1s and 0s,
glowing locks and server racks, or some random combination of those
elements—sometimes the hoodie-clad men even wear burglar masks. Each
of these images fails to convey anything about either the importance or
complexity of the topic.”116 It is therefore no surprise that most people’s
perception of a cybersecurity incident is limited.
Figure 7-3 shows a categorization of potential threat sources. Alongside the
obvious sources—terrorist, hacker, organized crime, and disgruntled former
employees—are accidental acts by well-meaning employees and contractors,
as well as disasters, natural and man-made.
Figure 7-3. Taxonomy of threat sources.
The potential for human error, and the limitations in mitigating this risk,
highlight the importance of people in cybersecurity. No amount of
technology or procedures can completely mitigate the potential for a human-
initiated cybersecurity incident. This is borne out by an analysis of the
initiating cause of high-profile cybersecurity incidents. Almost without
exception, a human was involved, clicking on a link in a phishing email,
failing to deploy patches, failing to follow removable media procedures, and
so on.
Unfortunately, many organizations still try to address the cybersecurity
challenge by deploying more and more technology and creating more and
more rules. They do not recognize or address the significance of humans.
Strict rules may provide the appearance that cybersecurity is under control.
A bowtie diagram with many additional barriers will help support this
argument. However, this approach can actually lead to complacency on the
part of the individuals. Workers assume that adequate controls are in place,
or that cybersecurity is someone else’s job. Even individuals aware of the
importance of cybersecurity may consider the threat mitigated by physical
security, technical controls, and procedures. This attitude may result in a
more casual approach to other controls, such as limiting electronic access
and the use of removable media. A common example of this in OT
environments is the deployment of universal serial bus (USB) locks on
equipment such as HMIs or servers. Personnel will claim these controls are
not necessary because the equipment is in locked cabinets inside secure
rooms. In fact, it is not unusual to discover these rooms are not adequately
secured, and the cabinets are left with keys in the locks. Even the keys
themselves are commonly available, and the same key can usually open most
cabinets from the same manufacturer.
Jessica Barker notes that “the burden for security often falls to the end user.”
She goes on to say that it is “not fair to ask people to add security” in the
same way that we “do not ask people to make sure the soft furnishings they
buy are fire resistant or the car they rent has been safety-tested.”129
Organizations identify security controls, define policies and procedures, and
provide awareness training. They must also provide realistic, workable
solutions that allow personnel to comply while maintaining the necessary
level of productivity. It is unfair, and unrealistic, to expect personnel to bear
the burden for security. They must do their part, but with organizational
support. This support should include the following:
Toolbox talks should also cover cybersecurity issues, for example, the risk of
performing a software update on a machine and “controls” such as
performing a backup before making any changes.
The more cybersecurity is embedded into the safety culture, the more likely
it is to be adopted as an integral part of operations rather than an
afterthought.
The First Line of Defense
The phrase “users are the weakest link” underestimates the challenges users
face in dealing with cybersecurity. Organizations must recognize that badly
trained users operating without procedures, tools, or management
support are likely to initiate most, if not all, cybersecurity incidents.
Automation system product vendors (as well as system integrators and other
service providers) must understand cybersecurity so that they can design
products securely. Their organization must have secure development
procedures in place to validate that products are secure. These development
procedures will also improve the rigor in development and testing, providing
a higher quality, more reliable solution.
The organization can now create a trend line for PICO, which, in this
example, shows a steady increase toward three credible phishing incidents
per month. This number should be more meaningful to individuals. It is
similar in structure to well-known safety HiPo (high-potential) events they
might be familiar with.
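A trend line of this kind is easy to compute from the monthly counts. The sketch below is illustrative only: the monthly PICO counts are made-up data, and the trend measure (average month-over-month change) is one simple choice among several.

```python
# Hypothetical sketch: tracking a monthly count of credible phishing
# incidents (PICO) and computing a simple trend. Counts are made up.
monthly_pico = [1, 1, 2, 2, 2, 3]  # incidents per month over six months

def average_monthly_change(counts: list[int]) -> float:
    """Average month-over-month change; positive means a rising trend."""
    deltas = [b - a for a, b in zip(counts, counts[1:])]
    return sum(deltas) / len(deltas)

print(average_monthly_change(monthly_pico))  # 0.4 → steady increase
```

A positive value here corresponds to the "steady increase toward three credible phishing incidents per month" described above, and can be reported alongside safety HiPo statistics.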
Summary
Cybersecurity is constantly in the news, so it may seem reasonable to believe
that people have a good awareness of the cybersecurity risks their
organizations face. However, evidence indicates otherwise as incidents
continue to occur. This trend is primarily driven by people failing to enforce
good cybersecurity management practices. Even in regulated industries,
organizations still fail to meet cybersecurity management requirements. This
is largely due to personnel not following procedures, and a lack of oversight
and enforcement by management.
____________
111 See https://www.nerc.com/pa/comp/CE/Pages/Actions_2019/Enforcement-Actions-2019.aspx
for details.
112 Rebecca Smith, “Duke Energy Broke Rules Designed to Keep Electric Grid Safe,” Wall Street
Journal, updated February 1, 2019, accessed June 21, 2021, https://www.wsj.com/articles/duke-
energy-broke-rules-designed-to-keep-electric-grid-safe-11549056238.
113 A “forward-looking statement” is a recognized term in US business law that is used to indicate,
for example, plans for future operations or expectations of future events.
114 Duke Energy News Center, “Duke Energy Reaffirms Capital Investments in Renewables and
Grid Projects to Deliver Cleaner Energy, Economic Growth,” July 5, 2020, accessed June 21,
2021, https://news.duke-energy.com/releases/releases-20200705-6806042.
115 This is for illustrative purposes only. Specific likelihood, consequence, and risk values will vary,
but the deviation is likely to be as dramatic when more rigorous methods are used.
116 Eli Sugarman and Heath Wickline, “The Sorry State of Cybersecurity Imagery,” July 25, 2019,
accessed May 12, 2022, https://hewlett.org/the-sorry-state-of-cybersecurity-imagery/.
117 Repository of Industrial Security Incidents, “2013 Report on Cyber Security Incidents and
Trends Affecting Industrial Control Systems, Revision 1.0,” June 15, 2013, available by request
from RISI, https://www.risidata.com/
118 It is unclear whether this increase is due to better measurement or more human error, or both.
119 IBM Security, IBM X-Force Threat Intelligence Index 2020, 8, accessed June 21, 2021,
https://www.scribd.com/document/451825308/ibm-x-force-threat-intelligence-index-2020-pdf.
120 IBM Security, IBM X-Force Threat Intelligence Index 2020.
121 Jessica Barker, Confident Cyber Security (London: Kogan Page Limited, 2020), 91.
122 Barker, Confident Cyber Security, 92–93.
123 Whaling is a method of targeting high-profile employees of organizations, not just chief
financial officers, and derives its name from the big catch during phishing.
124 Dante Alighieri Disparte, “Whaling Wars: A $12 Billion Financial Dragnet Targeting CFOs,”
Forbes, December 6, 2018, accessed May 12, 2022,
https://www.forbes.com/sites/dantedisparte/2018/12/06/whaling-wars-a-12-billion-financial-
dragnet-targeting-cfos/?sh=7d0da85a7e52.
125 Darren Pauli, “Barbie-Brained Mattel Exec Phell for Phishing, Sent $3m to China,” The
Register, April 6, 2016, accessed May 12, 2022,
https://www.theregister.com/2016/04/06/chinese_bank_holiday_foils_nearperfect_3_million_m
attel_fleecing.
126 Richard H. Thaler and Cass R. Sunstein, Nudge: Improving Decisions About Health, Wealth,
and Happiness (New Haven, CT: Yale University Press, 2008).
127 Daniel Kahneman, Thinking, Fast and Slow (New York: Farrar, Straus and Giroux, 2011).
128 A. Ertan and G. Crossland, Everyday Cyber Security in Organizations (Royal Holloway
University of London, 2018), 23.
129 Barker, Confident Cyber Security, 69.
130 James Reason, “Achieving a Safe Culture: Theory and Practice,” Work & Stress: An
International Journal of Work, Health and Organisations 12, no. 3 (1998): 302, accessed June
21, 2021, https://www.tandfonline.com/doi/abs/10.1080/02678379808256868.
131 E. S. Geller, “10 Leadership Qualities for a Total Safety Culture,” Professional Safety, May
2020, accessed June 21, 2021,
http://campus.murraystate.edu/academic/faculty/dfender/OSH650/readings/Geller—
10%20Leadership%20Qualities%20for%20a%20Total%20Safety%20Culture.pdf.
132 ”It’s Up to Me” and the extract from “I Chose to Look the Other Way” are reprinted with the
permission of the author, Don Merrell. Contact Don Merrell at [email protected] to
inquire about the use of his poems or to comment on their impact.
133 Career Onestop Competency Model Clearing House, “Automation Competency Model,”
accessed June 21, 2021, https://www.careeronestop.org/competencymodel/competency-
models/automation.aspx.
134 My thanks to Collin Kleypas for the original idea.
135 I am indebted to Don Merrell for providing this modified verse from his poem “It’s Up to Me.”
It is reprinted with the permission of the author, Don Merrell. Contact Don Merrell at
[email protected] to inquire about the use of his poems or to comment on their impact.
8
Safeguarding Operational
Support
Introduction
One of the distinguishing features of operational technology (OT) is the
operational life of the equipment. Information technology (IT) is refreshed
every 18 months to 3 years to keep pace with the demands of users and their
applications. Conversely, OT equipment is designed for a specific, limited
set of functions. Once deployed, there is little desire to change it. Recall the
adage, “If it ain’t broke, don’t fix it,” from Chapter 2, “What Makes
Industrial Cybersecurity Different?” In fact, the high-availability
environments where OT exists create a unique operational support culture,
one that does not lend itself to good cybersecurity management.
Shortcomings include the following:
Schneier then notes, "Things are changing; slowly, but they're changing. The risks
are increasing, and as a result spending is increasing.”136
Things certainly are changing. In May 2020, Blackbaud, a cloud software
provider, was the victim of a ransomware attack and data breach. Blackbaud
managed data for a wide variety of organizations. The company’s own
publicity claims its customers include more than 25,000 organizations in
more than 60 countries. It serves arts and cultural organizations,
corporations, faith communities, foundations, healthcare organizations,
higher education institutions, individual change agents, K–12 schools, and
nonprofit organizations.137 The data breach affected, among others, at least 6
million individuals whose healthcare information was exfiltrated.138
Blackbaud ultimately paid the ransom in return for access to its customers’
data and a promise that the exfiltrated information was destroyed. This was
just the beginning of the consequences for Blackbaud. As of November
2020, it was the defendant in 23 consumer class-action lawsuits.139
In his blog, Schneier also discusses why vendors “spend so little effort
securing their own products.”
We in computer security think the vendors are all a bunch of idiots, but they’re behaving
completely rationally from their own point of view. The costs of adding good security to software
products are essentially the same ones incurred in increasing network security—large expenses,
reduced functionality, delayed product releases, annoyed users—while the costs of ignoring
security are minor: occasional bad press, and maybe some users switching to competitors’
products. The financial losses to industry worldwide due to vulnerabilities in the Microsoft
Windows operating system are not borne by Microsoft, so Microsoft doesn’t have the financial
incentive to fix them. If the CEO of a major software company told his board of directors that he
would be cutting the company’s earnings per share by a third because he was going to really—no
more pretending—take security seriously, the board would fire him. If I were on the board, I
would fire him. Any smart software vendor will talk big about security, but do as little as
possible, because that’s what makes the most economic sense.140
In this example, the organization used this tool for communication at four
levels within the company:
• Daily/weekly at the facility – Informing the overall facility manager
about the condition of the installation so they can assess suitability to
continue operations.
• Biweekly with the facility operations teams – Informing the
facility operations manager of the status of the installation and the
progress of ongoing remedial scopes, and providing the opportunity
to prioritize and escalate issues with teams.
• Monthly within the business unit – Providing a view of current
asset integrity and status of all operated assets’ safety-critical
elements and barriers to major accident hazards. This provides the
management team with a clear view of the progress being made
toward remediation of barrier impairments.
• Monthly at the overall organization leadership team level –
Providing the leadership team with a management overview of asset
condition.
For each level, the key question is: Are we still safe to operate?
Integrating cybersecurity into such a reporting tool helps to make
cybersecurity a key factor. First, consider Figure 8-3, a barrier representation
of the typical cybersecurity controls discussed in Chapter 4 and Chapter 5,
“Standardized Design and Vendor Certification.”
On its own, this barrier representation can be helpful, especially if the status
of the barriers can be determined by reading data from operational systems.
The real power of this approach comes if the barrier representation shown in
Figure 8-2 is updated to include a cybersecurity barrier. Now the barrier
representation reviewed at all four levels in the aforementioned organization
clearly shows the status of cybersecurity at the facility. The question Are we
still safe to operate? must now include the status of cybersecurity.
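The weakest-link logic of a barrier can be sketched in a few lines. This is an illustrative sketch only; the control names and status values are hypothetical, not taken from the reporting tool described above.

```python
from enum import Enum

class Status(Enum):
    HEALTHY = "healthy"
    IMPAIRED = "impaired"
    FAILED = "failed"

def barrier_status(control_statuses: dict[str, Status]) -> Status:
    """A barrier is only as strong as its weakest control: report FAILED
    if any control has failed, IMPAIRED if any is degraded, and HEALTHY
    only when every control is healthy."""
    values = control_statuses.values()
    if Status.FAILED in values:
        return Status.FAILED
    if Status.IMPAIRED in values:
        return Status.IMPAIRED
    return Status.HEALTHY
```

If each control's status can be read from operational systems, a roll-up like this lets the cybersecurity barrier be reviewed at each reporting level alongside the other barriers.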
People Management
As noted throughout this book, technology is not the only facet of the
cybersecurity challenge. People and processes are as important as
technology, and in some cases more important.
Background Checks
Employers need a means of verifying the integrity and honesty of their
employees. In general, people with a history of honesty are more likely to be
honest in the future. Conversely, applicants who lie to obtain a job are more
likely to be dishonest once they have the job. Interviews alone may not be
sufficient to weed out dishonest applicants.
• audit trails to track who took what action, and when, and
• periodic supervisory reviews of audit trails and other records to
verify all tasks are being performed as expected.
The information recorded in audit trails and the frequency of reviews should
match the level of risk involved. It may be too late to take corrective action
if the review frequency is too low. In some cases, the individuals may have
left the organization, or the consequences are felt before the cause is known.
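As a rough illustration, the review check described above can be expressed as a routine that flags the records accumulated since the last supervisory review once that review is overdue. The record fields and the interval values are hypothetical placeholders, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class AuditRecord:
    timestamp: datetime  # when the action was taken
    user: str            # who took the action
    action: str          # what was done (e.g., "set-point change")

def overdue_records(records: list[AuditRecord],
                    last_review: datetime,
                    review_interval: timedelta,
                    now: datetime) -> list[AuditRecord]:
    """If the periodic supervisory review is overdue, return every
    record logged since the last review; otherwise return nothing."""
    if now - last_review < review_interval:
        return []  # review not yet due
    return [r for r in records if r.timestamp > last_review]
```

A higher-risk system would warrant a shorter review_interval, matching the point that review frequency should follow the level of risk involved.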
Even with the four-eyes principle in place, periodic reviews are essential. In
2005, the US Food and Drug Administration (FDA):
…carried out an inspection of Able Laboratories, a New Jersey–based generic pharmaceutical
manufacturer between May 2 and July 1, 2005. As a result of finding discrepancies between
paper and electronic records in the analytical laboratory and due to the firm’s failure to
investigate out-of-specification (OOS) results, the company ceased manufacturing operations,
recalled 3,184 batches of product (its entire product line) and withdrew seven Abbreviated New
Drug Applications (ANDAs). The resulting problems and a failure to resolve the issue with the
FDA resulted in a USD 100 million bankruptcy filing in October 2005 and a fire sale of the
company’s assets.147
The most significant process failure, in all three cases, was the failure to
remove access rights and change account details in response to a leaver, in
particular, a disgruntled one.
This continues to be a problem, even in large, blue-chip organizations that
have the resources to manage risk accordingly. Contractors working at
multinational companies continue to have access to business systems for
several days, sometimes weeks, after leaving. Rarely, if ever, are user
accounts for automation systems changed when someone leaves an
organization. Many systems use standard accounts that have been in place
for years, or even decades. Former employees of system vendors may still
recall the default administrator account for their former company’s products.
Manual Procedures
There have been references to manual procedures throughout this book.
These range from support processes, such as reviewing and approving access
to systems, to system support activities, such as performing backups or
operating system patching.
The effort to become agile leads organizations to rethink their processes and
procedures. Although many larger organizations could benefit from less
bureaucracy, care must be taken to protect critical processes needed to
manage business risk, including cybersecurity. If agility is treated as an
excuse to remove or short-circuit any and all processes, the result can be
chaos.
As a relatively new function, cybersecurity is particularly vulnerable to these
issues. In many organizations, individuals who are already fully allocated are
given the additional task of cybersecurity manager, cybersecurity single
point of accountability, or some other function almost as an afterthought. In
automation environments, this person might be the lead instrumentation and
control technician or engineer in a facility. Although it might make sense for
this person to be given responsibility and oversight, he or she will almost
certainly need additional resources to ensure the necessary tasks are
performed. These tasks include the following:
• Operating system and software updates
• Anti-malware updates
• Backups
• User account updates
• Log file analysis
Inventory Management
A key element of successful OT cybersecurity management during
operational support is inventory management. When a product vulnerability
is announced, the first question to answer is: Does this affect my
organization, and if so, where, and how much?
It is impossible to answer this question without an accurate and up-to-date
equipment inventory. An equipment inventory is sometimes called an asset
register or configuration management database. It can be as simple as an
Excel spreadsheet or can be a purpose-made relational database and
application. IT and OT security vendors offer inventory management
systems.
IT solutions can work well for IT systems and devices. This equipment tends
to be based on a small number of standard operating systems, which are
normally connected to a network. Most of these cooperate well with asset
management systems, providing information about their configuration and
patch status, for instance.
The same is not true for OT systems and devices. There are several
challenges to using an automated tool to create a reliable OT device
inventory:
• The range of device types is much larger and includes many
firmware and software solutions that are not designed to interact with
asset management solutions.
• Many devices that are networked may only respond to the most basic
industrial protocol commands. Rarely do these commands support
the return of configuration information, which is required for an
effective inventory.
• There is no guarantee that devices are accessible on a common
communications network. Many installations will contain serially
connected (RS-232, RS-485, RS-422) devices that only respond to
the aforementioned basic industrial protocol commands.
• In more modern OT networks, there may be industrial firewalls or
data diodes that isolate devices from the wider network. This design
limits communications to very few industrial protocol commands.
Some asset owners avoid these issues by focusing their inventories only on
network-connected devices, or other specific categories such as Windows
devices. This strategy is fundamentally flawed. The compromise or failure of
any interconnected device could cause operational issues. Every device that
is required for the successful operation of the system should be included in
the inventory.
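To illustrate the kind of query an inventory must support, the sketch below matches a hypothetical vulnerability advisory against a minimal asset register. The field names (tag, firmware, location) are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    tag: str       # equipment tag within the facility
    vendor: str
    product: str
    firmware: str  # installed firmware or software version
    location: str  # facility or process area

def affected_assets(inventory: list[Asset], vendor: str, product: str,
                    vulnerable_versions: set[str]) -> list[Asset]:
    """Answer "does this advisory affect us, and where?" by matching
    vendor, product, and installed version against the register."""
    return [a for a in inventory
            if a.vendor == vendor
            and a.product == product
            and a.firmware in vulnerable_versions]
```

Whether the register is a spreadsheet or a relational database, the answer is only as good as the completeness and currency of the underlying data, which is why excluding categories of devices undermines the whole exercise.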
Creating an Inventory for New Facilities
For new facilities, creating an inventory should be very straightforward. In
all contracts, vendors should be required to provide a specific set of
inventory data, such as the following:
Figure 8-5 shows how the proportions typically change as the size of the
facility increases. Note that as the facility size increases, the number of
embedded devices grows, but the number of Windows devices stays
relatively fixed. This is because the control room (where most of the
Windows devices are located) does not grow in direct proportion to the
facility size. The number of embedded devices (PLCs and RTUs) must
increase to manage additional process areas.
Figure 8-5. Change in device proportions for varying OT facility sizes.
Incident Response
Incident response planning is not just about preparing for the inevitable
incident. Considering plausible scenarios facilitates a review of business risk
and the identification of additional mitigations to reduce this risk.
During the press conference, Sheriff Gualtieri laid out the sequence of
events:
• The operator was aware that his supervisor and other users routinely
used remote access to view the HMI screen and so did not report the
incident.
• At approximately 1:30 p.m. on the same day, the operator noticed a
second remote access to the HMI. This time, the remote user
navigated through various screens and eventually modified the set
point for sodium hydroxide (lye) to a level that would be toxic to
humans.
The remote user logged off, and the operator immediately reset the sodium
hydroxide level to normal. The operator then disabled remote access and
reported the incident to the city and to local and state law enforcement. It is
unclear if the operator was following an incident response plan or was just
experienced enough to make the right decisions. City representatives stated
at the press conference that additional controls were in place to prevent
exposure of toxic water to consumers, but they did not describe them in
detail.
At the time of this writing, the investigation is still underway, and the culprit,
and his or her intentions, remain unclear. The most likely explanations
include an authorized user who made the change in error, a disgruntled
former employee or contractor, and a random hacker who discovered the
system was accessible from the Internet. Other options that should not be discounted are
organized crime syndicates or nation-states. The water treatment plant
affected was 15 miles from the Raymond James Stadium in Tampa, Florida,
which hosted the Super Bowl just two days after the incident occurred.
It was fortunate that the city of Oldsmar operator was sufficiently observant
and aware to take immediate action. This prevented catastrophic
consequences. It remains to be seen how well-prepared similar organizations
would be.
There are more than 145,000 active public water systems in the United
States (including territories). Of these, 97% are considered small systems
under the Safe Drinking Water Act, meaning they serve 10,000 or fewer
people. Public water systems of the size of the one in the city of Oldsmar
(15,000 population) have limited resources to manage threats to their
operations.
Although it resulted in a near miss, the Oldsmar incident highlights gaps in
process and people elements. Closing these gaps could make future events
less likely and the potential consequences less severe:
• The operator observed a remote user several hours before the
attempted set-point adjustment. This did not arouse suspicion
because the supervisors used remote access to monitor the plant.
Remote access of this type must be strictly limited to specific users,
from specific locations, at specific times. The Oldsmar operator
should have known who was accessing the system. If this was not an
authorized user, it should have prompted the operator to activate the
incident response plan. This response would start with disconnecting
remote access to the system. It would then initiate various forensics
and tests to determine if anything had been altered (e.g., code), in
either the systems or processes (e.g., set points, alarms).
• Until the incident was reported, the engineering company that
developed the supervisory control and data acquisition (SCADA)
system for the city of Oldsmar maintained a page on the portfolio
section of its website. This page displayed a screen from the SCADA
HMI, providing details of plant processes (e.g., number of reverse
osmosis skids, number of pumps on each skid). It was easy to see the
button that would enable navigation to the sodium hydroxide page.
Such a screenshot is extremely valuable in terms of planning a
potential attack. The page is now deleted, although it can be found in
Internet archives through search tools.
• The deleted page also had a summary of the project, which included
the following description of an automatic control feature: “(Noting
that the engineering company…) worked with the city to create an
easy-to-use, single-button interface. This button resides on the
SCADA screen in the control room and is also accessible through
city iPads connected to the SCADA system. Operators can easily
press the button to initiate automatic control regardless of their
location, which is helpful in emergency situations and during routine
site tours.” This raises the question about what functionality should
be accessible remotely. A common initial reaction to the Oldsmar
incident was “There should be no remote access at all, ever.” This is
unrealistic and impractical. Even if remote access were not available,
users would inevitably find their own less secure solutions. This is a
cultural issue because the same users would not try to circumvent a
safety system or safety procedures. In addition to ensuring remote
access is securely designed and limited by user, location, time, and
duration, remote access should offer limited functionality. The ability
to monitor or view may be all that is required for most users.
Although it may be desirable to have an automatic control switch, is
it really necessary? In which circumstances would it be used? Are
these rare enough that the risk outweighs the benefit?
• The incident raises questions about the functionality of the SCADA
system itself. In the aforementioned press conference, the city
reported that the unauthorized remote user attempted to change the
set point of sodium hydroxide from 100 parts per million (ppm) to
11,100 ppm. This level is far outside any normal
expected setting, which prompts the question: “Why would the
SCADA system accept such a setting?” In fact, it is unclear if it did
accept the new level. At the press conference, a city official said the
operator reset sodium hydroxide to its normal level, which implies it
was changed. Recall the standard layers of protection model that has
been presented throughout this book. The basic process control
layer’s function is to maintain the process within its normal, safe,
operating envelope. If this safe operating envelope is not properly
defined, the basic process control layer will not perform correctly,
which means the risk transfers to other layers, in this case the plant
personnel/process alarm layer. Limiting the sodium hydroxide range,
restricting who could change it, and from where they could change it
would have been significant mitigation factors in this case. This is
why OT cybersecurity risk quantification is so different from IT risk
quantification. As noted in Chapter 4, the assessment process must
consider the hazards in the process and treat cybersecurity as an
initiating cause.
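The safe-operating-envelope argument in the last bullet can be sketched as a validation step a basic process control layer might apply before accepting a set-point change. This is a simplified, hypothetical illustration; the limits and user names are invented, not the Oldsmar configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SetPointLimit:
    low: float                        # bottom of safe operating envelope
    high: float                       # top of safe operating envelope
    authorized_users: frozenset[str]  # who may change this set point
    local_only: bool                  # True: reject changes from remote sessions

def validate_setpoint(limit: SetPointLimit, value: float,
                      user: str, is_remote: bool) -> tuple[bool, str]:
    """Reject a requested set point that is outside the envelope, comes
    from an unauthorized user, or arrives over a remote session when
    only local changes are permitted."""
    if not (limit.low <= value <= limit.high):
        return False, "outside safe operating envelope"
    if user not in limit.authorized_users:
        return False, "user not authorized"
    if limit.local_only and is_remote:
        return False, "remote changes not permitted"
    return True, "accepted"
```

With a limit such as SetPointLimit(50.0, 150.0, ...) on the lye dosing, a request for 11,100 ppm would be rejected at the basic process control layer rather than transferring the risk to the plant personnel/process alarm layer.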
Note that many of the incident detection tools and methods promoted by IT
vendors (and even some supposed OT vendors) would have done little to
help the city of Oldsmar. Intrusion detection and prevention systems only
work if the unauthorized access can be identified as abnormal. As already
mentioned, even the operator could not determine if the user was authorized.
It is unlikely that any tool would have been able to discern this fact.
Likewise, regarding the change in the sodium hydroxide set point, if the
system enables the user to change the value, then there is no way a detection
system could identify this as an anomaly.
The city of Oldsmar incident provides clear evidence of why cybersecurity
incident response planning is required, and why this planning must take
account of OT factors.
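The earlier point that remote access should be limited by user, location, time, and functionality can be sketched as a default-deny allow-list check. The rule fields and network ranges below are hypothetical examples, not a recommendation for any specific product.

```python
from dataclasses import dataclass
from datetime import time
from ipaddress import ip_address, ip_network

@dataclass
class RemoteAccessRule:
    user: str
    allowed_network: str  # e.g., a corporate VPN address range
    start: time           # beginning of permitted window
    end: time             # end of permitted window
    view_only: bool       # True: monitoring allowed, control actions denied

def session_permitted(rules: list[RemoteAccessRule], user: str,
                      source_ip: str, at: time, wants_control: bool) -> bool:
    """Permit a remote session only if some rule matches the user,
    source network, and time window, and allows the requested level
    of functionality (view vs. control)."""
    for r in rules:
        if (r.user == user
                and ip_address(source_ip) in ip_network(r.allowed_network)
                and r.start <= at <= r.end
                and not (wants_control and r.view_only)):
            return True
    return False
```

Anything not explicitly matched is refused, and for most users a view_only rule may be all that is ever required.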
In many cases, the personnel from these organizations are in place for so
long that they become indistinguishable from asset-owner personnel. Few asset
owners properly manage the cybersecurity risks arising from these
arrangements:
• Third-party computers may not have the necessary security controls
(e.g., anti-malware protection, application control, user access), yet
they may be connected to business-critical systems or networks.
• Vendors may not have sufficient controls in place to manage user
credentials for their clients’ systems. Examples include having
standard administrator accounts for all client systems, sharing
account details, and not securely protecting these account details.
• Vendors may not have procedures in place to manage system
backups. They must also protect these backups to ensure continuity
of operations for their clients. In the case of cloud environment
providers, this oversight could be catastrophic for the asset owner, as
illustrated by the Blackbaud example earlier in this chapter.
• Suppliers, vendors, and subcontractors may not have adequate
security management systems in place in their organization. Their
vulnerability to cybersecurity incidents exposes the asset owner.
• Suppliers, vendors, and subcontractors may not provide adequate
security awareness training to their personnel. These personnel may
be working in the asset owner’s business-critical environment where
this awareness is essential.
The UK Centre for Protection of National Infrastructure (CPNI) and
National Cybersecurity Centre (NCSC) published a set of 12 supply chain
principles.152 These are divided into four stages:
1. Understand the risks:
a. Understand what must be protected and why.
b. Know who your suppliers are and build an understanding of
what their security looks like.
c. Understand the security risk posed by your supply chain.
2. Establish control:
a. Communicate your view of security needs to your suppliers.
b. Set and communicate minimum security requirements for
your suppliers.
c. Build security considerations into your contracting processes
and require that your suppliers do the same.
d. Meet your own security responsibilities as a supplier and
consumer.
e. Raise awareness of security within your supply chain.
f. Provide support for security incidents.
3. Check your arrangements:
a. Build assurance activities into your approach to managing
your supply chain.
4. Implement continuous improvement:
a. Encourage the continuous improvement of security within
your supply chain.
b. Build trust with suppliers.
A key step to establishing control is contract management. Contracts should
be tailored to specific arrangements. Contract clauses should reflect this.
However, to mitigate cybersecurity risks, the following key aspects must be
included in contracts with third parties:
Insurance
There is an established cyber insurance market focused on IT cybersecurity
risks, and insurers and brokers are now developing policies to cover threats
to OT infrastructure. As explained in Chapter 2, OT or industrial
cybersecurity is different, and insurers and brokers are still learning what
risks an asset owner is exposed to from an OT cybersecurity incident.
Chapter 4 discussed methods to measure and manage this risk.
The two high-profile ransomware incidents in early 2021 that are referenced
in the introduction to this book were resolved when insurers negotiated
payments. The asset owners had sufficient insurance coverage to enable
payment of large sums ($4.4 million and $11 million).
Tom Finan of Willis Towers Watson, a global insurance broking company,
points out that “having a cyber insurance policy does not make a company
safer. Instead, an enhanced cybersecurity posture results from going through
the cyber insurance application and underwriting process.”153
Insurance alone is not sufficient for asset owners to manage their IT or OT
cybersecurity risk. Asset owners must properly understand and manage their
risk if they are to continue to have insurance coverage as part of their overall
risk management. As Finan and McIntyre note: “To provide coverage,
brokers and underwriters need information about an applicant’s cyber risk
posture. Brokers seek that information to tell a client’s ‘story’ to the market
—specifically, how a client is addressing cyber risk, the lessons it’s learned,
and how it’s applying those lessons. Stories that show steady risk
management improvement over time help brokers make an effective case for
coverage. For their part, underwriters take on all the risk. In other words,
they’re the companies that pay out when a bad cyber day happens.
Unsurprisingly, they want as much certainty as possible about an applicant’s
cyber position before they issue a policy.”154
Summary
Although OT environments have a different operational support culture from
IT environments, many factors can give OT cybersecurity the management
attention it requires.
• The safety culture that is ingrained in all OT environments can
incorporate cybersecurity, treating it as another initiating cause of
high-impact incidents that can occur.
• The use of management monitoring tools, such as the barrier
representation, can ensure that cybersecurity is considered at the
same level as other protective layers.
As noted throughout this book, technology is not the only element of the
cybersecurity challenge. People and process are critical weak points. Much
of what happens in operational environments revolves around people.
Cybersecurity relies on training and awareness, and the adherence to strict
processes and procedures. Gaps in training and awareness or in processes
and procedures create vulnerabilities that can be as severe as any technical
issue.
Incident response is one of the most important plans to have in place. With
the growth in high-profile cybersecurity incidents and the knowledge of the
costs of dealing with them, it is harder for organizations to ignore the need
for good preparation. There is still work to be done to educate asset owners
that good incident response planning does not begin and end in their own
organization. The use of suppliers, vendors, and subcontractors means that
cybersecurity risks, and their remediation, rely on the cooperation of all
parties.
One key control that asset owners can use is contract management. A set of
model clauses that represent good cybersecurity management should be
included in all third-party contracts. These should be nonnegotiable. Any
third party that is not already following these practices should not be in
business today.
Although insurance can be a useful tool for an asset owner, it cannot replace
effective identification and proactive management of risk.
As with all other aspects of cybersecurity management, there is still much to
do in operational support, but the elements are in place to improve the
cybersecurity posture of all organizations.
____________
136 Bruce Schneier, “Secrets and Lies: Introduction to the Second Edition,” Schneier on Security
blog, accessed June 21, 2021, https://www.schneier.com/books/secrets-and-lies-intro2.
137 Blackbaud, “Cloud Software Built for the World’s Most Inspiring Teams,” accessed June 21,
2021, https://www.blackbaud.com/.
138 Marianne Kolbasuk McGee, “Blackbaud Ransomware Breach Victims, Lawsuits Pile Up,”
BankInfo Security, Information Security Media Group, September 24, 2020, accessed June 21,
2021, https://www.bankinfosecurity.com/blackbaud-ransomware-breach-victims-lawsuits-pile-
up-a-15053.
139 Maria Henriquez, “Blackbaud Sued After Ransomware Attack,” Security magazine, November
6, 2020, accessed June 21, 2021, https://www.securitymagazine.com/articles/93857-blackbaud-
sued-after-ransomware-attack.
140 Schneier, “Secrets and Lies: Introduction to the Second Edition.”
141 David E. Sanger, Nicole Perlroth, and Eric Schmitt, “Scope of Russian Hack Becomes Clear:
Multiple U.S. Agencies Were Hit,” New York Times, December 14, 2020, accessed June 21,
2021, https://www.nytimes.com/2020/12/14/us/politics/russia-hack-nsa-homeland-security-
pentagon.html.
142 Tom Kemp, “What Tesla’s Spygate Teaches Us About Insider Threats,” Forbes, July 19, 2018,
accessed June 21, 2021, https://www.forbes.com/sites/forbestechcouncil/2018/07/19/what-
teslas-spygate-teaches-us-about-insider-threats/?sh=4a09507c5afe.
143 Ben Popken, “Facebook Fires Engineer Who Allegedly Used Access to Stalk Women,” NBC
News, May 1, 2018, accessed June 21, 2021, https://www.nbcnews.com/tech/social-
media/facebook-investigating-claim-engineer-used-access-stalk-women-n870526.
144 Iain Thomson, “US Engineer in the Clink for Wrecking Ex-Bosses’ Smart Meter Radio Masts
with Pink Floyd lyrics,” The Register, June 26, 2017, accessed June 21, 2021,
https://www.theregister.com/2017/06/26/engineer_imprisoned_for_hacking_exemployer/.
145 Tony Smith, “Hacker Jailed for Revenge Sewage Attacks,” The Register, October 31, 2001,
accessed June 21, 2021,
https://www.theregister.com/2001/10/31/hacker_jailed_for_revenge_sewage/.
146 H. Boyes, Draft Code of Practice for Cyber Security in the Built Environment, Institution of
Engineering and Technology, Version 2.0, January 31, 2021.
147 R. D. McDowall, “Quality Assurance Implications for Computerized Systems Following the
Able Laboratories FDA Inspection,” Quality Assurance Journal 10 (2006): 15–20.
148 Willem Z.’s full name was not given in any of the online records of this incident.
149 “Sewer Hack Committed Via Admin and Test Accounts” (“Rioolhack gepleegd via admin- en
testaccounts”), AG Connect, September 14, 2018, accessed June 21, 2021,
https://www.agconnect.nl/artikel/rioolhack-gepleegd-admin-en-testaccounts.
150 Alexander Martin, “Garmin Obtains Decryption Key After Ransomware Attack,” Sky News,
July 28, 2020, accessed June 21, 2021, https://news.sky.com/story/garmin-obtains-decryption-
key-after-ransomware-attack-12036761.
151 “Treatment Plant Intrusion Press Conference,” YouTube, accessed June 21, 2021,
https://www.youtube.com/watch?v=MkXDSOgLQ6M.
152 National Cyber Security Centre, “Supply Chain Security Guidance,” accessed June 21, 2021,
https://www.ncsc.gov.uk/collection/supply-chain-security.
153 Tom Finan and Annie McIntyre, “Cyber Risk and Critical Infrastructure,” Willis Towers
Watson, March 8, 2021, accessed June 21, 2021, https://www.willistowerswatson.com/en-
US/Insights/2021/03/cyber-risk-and-critical-infrastructure.
154 Finan and McIntyre, “Cyber Risk and Critical Infrastructure.”
9
People, Poetry, and Next Steps
People and process failures are most apparent in project delivery and
operations. Poor execution or oversight can negate some or all the benefits
of secure designs. This may occur through the introduction of new
vulnerabilities that are not properly identified or addressed. It can surface in
the form of poor practices during the development or commissioning of a
system.
Raising and maintaining awareness among personnel is key to ensuring that
cybersecurity is always at the forefront of everyone’s mind. Individual
awareness reduces the likelihood of causing an incident and increases the
chances of avoiding one.
Cybersecurity is still a long way from being managed like safety. To reach
this goal would require the following:
• Secure-by-design systems and products made by companies that
recognize the importance of security, even if their customers don’t.
• Developers and system integrators trained in the importance of
security in their day-to-day work. They must recognize that one
mistake could lead to calamity.
• Asset owners who refuse to buy products that are not secure by
design and refuse to work with companies that do not demonstrate
their commitment to security.
• Facility personnel trained in the importance of security who treat it
like safety, as an integral part of their job. This includes stop-work
authority if any activity appears insecure.
• The recognition that while technology is an important tool to
manage security, it is useless without people and processes.
I will close this book with Don Merrell’s revised ending to his famous
safety poem “It’s Up to Me.”155 It is a reminder to us all that we can, and
must, do our part to ensure good cybersecurity management, whether we
are product designers, system integrators, engineers, technicians, operators,
or managers.
We can all have a workplace that’s cyber-incident free
If we each one Commit, to Making It Be,
If we all do our part, and each of us see,
If It’s Going to Happen, - Then It’s Up to Me.
____________
155 Once again, my thanks to Don Merrell for providing this modified verse from his poem “It’s
Up to Me.” It is reprinted with his permission. Contact Don Merrell at
[email protected] to inquire about the use of his poems or to comment on their impact.
Bibliography
Finan, T., and A. McIntyre. “Cyber Risk and Critical Infrastructure.” Willis
Towers Watson website. March 8, 2021. Accessed June 21, 2021.
https://www.willistowerswatson.com/en-US/Insights/2021/03/cyber-
risk-and-critical-infrastructure.
Lockheed Martin Corporation. “The Cyber Kill Chain.” Accessed June 21,
2021. https://www.lockheedmartin.com/en-us/capabilities/cyber/cyber-
kill-chain.html.
Martin, Alexander. “Garmin Obtains Decryption Key After Ransomware
Attack.” Sky News, July 28, 2020. Accessed June 21, 2021.
https://news.sky.com/story/garmin-obtains-decryption-key-after-
ransomware-attack-12036761.
Marszal, E., and J. McGlone. Security PHA Review for Consequence-Based
Cybersecurity. Research Triangle Park, NC: ISA (International Society
of Automation), 2019.
McCoy, Kevin. “Target to Pay $18.5M for 2013 Data Breach that Affected
41 Million Consumers.” USA Today. Updated May 23, 2017. Accessed
June 21, 2021.
https://www.usatoday.com/story/money/2017/05/23/target-pay-185m-
2013-data-breach-affected-consumers/102063932/.
McDowall, R. D. “Quality Assurance Implications for Computerized
Systems Following the Able Laboratories FDA Inspection.” Quality
Assurance Journal 10 (2006): 15–20.
McGee, Marianne Kolbasuk. “Blackbaud Ransomware Breach Victims,
Lawsuits Pile Up.” BankInfo Security, Information Security Media
Group, September 24, 2020. Accessed June 21, 2021.
https://www.bankinfosecurity.com/blackbaud-ransomware-breach-
victims-lawsuits-pile-up-a-15053.
McLeod, Saul. “Maslow’s Hierarchy of Needs.” Updated December 29,
2020. Accessed June 21, 2021.
https://www.simplypsychology.org/maslow.html.
Mustard, Steve. Mission Critical Operations Primer. Research Triangle
Park, NC: ISA (International Society of Automation), 2018.
The Mitre Corporation. “ATT&CK for Industrial Control Systems.”
Accessed June 21, 2021.
https://collaborate.mitre.org/attackics/index.php/Main_Page.
Nakashima, E., Y. Torbati, and W. Englund. “Ransomware Attack Leads to
Shutdown of Major US Pipeline System.” Washington Post, May 8,
2021. Accessed June 21, 2021.
https://www.washingtonpost.com/business/2021/05/08/cyber-attack-colonial-pipeline/.
National Cyber Security Centre (NCSC). “Supply Chain Security
Guidance.” Accessed June 21, 2021.
https://www.ncsc.gov.uk/collection/supply-chain-security.
National Cyber Security Centre (NCSC). “NIS Compliance Guidelines for
Operators of Essential Services (OES).” Accessed June 21, 2021.
https://www.ncsc.gov.ie/pdfs/NIS_Compliance_Security_Guidelines_for_OES.pdf.
NIST. “Components of the Cybersecurity Framework.” Presentation, July
2018. https://www.nist.gov/cyberframework/online-learning/components-framework.
North Carolina State University and Protiviti. “Illuminating the Top Global
Risks in 2020.” Accessed June 21, 2021.
https://www.protiviti.com/US-en/2020-top-risks.
Occupational Health and Safety Hub. “Quick Safety Observation Card –
Free Template.” https://ohshub.com/quick-safety-observation-card-free-template/.
“Order Granting Final Approval of Settlement, Certifying Settlement Class,
and Awarding Attorney’s Fees, Expenses, and Service Awards.” Equifax
Data Breach Settlement. Accessed June 21, 2021.
https://www.equifaxbreachsettlement.com/admin/services/connectedapps.cms.extensions/1.0.0.0/927686a8-4491-4976-bc7b-83cccaa34de0_1033_EFX_Final_Approval_Order_(1.13.2020).pdf.
Pauli, Darren. “Barbie-Brained Mattel Exec Phell for Phishing, Sent $3m to
China.” The Register, April 6, 2016. Accessed May 12, 2022.
https://www.theregister.com/2016/04/06/chinese_bank_holiday_foils_nearperfect_3_million_mattel_fleecing.
Popken, Ben. “Facebook Fires Engineer Who Allegedly Used Access to
Stalk Women.” NBC News, May 1, 2018. Accessed June 21, 2021.
https://www.nbcnews.com/tech/social-media/facebook-investigating-claim-engineer-used-access-stalk-women-n870526.
POSC Caesar Association. “An Introduction to ISO 15926.” November
2011. Accessed June 21, 2021.
https://www.posccaesar.org/wiki/ISO15926Primer.
Prince, Brian. “Researchers Detail Critical Vulnerabilities in SCADA
Product.” Security Week, March 13, 2014. Accessed June 21, 2021.
https://www.securityweek.com/researchers-detail-critical-vulnerabilities-scada-product.
Rathwell, Gary. PERA Enterprise Integration (website). Accessed June 21,
2021. http://www.pera.net/.
RSAC Contributor. “The Future of Companies and Cybersecurity
Spending.” Accessed June 21, 2021.
https://www.rsaconference.com/library/Blog/the-future-of-companies-and-cybersecurity-spending.
“S.I. No. 360/2018 – European Union (Measures for a High Common Level
of Security of Network and Information Systems) Regulations 2018.”
Electronic Irish Statute Book. Accessed June 21, 2021.
http://www.irishstatutebook.ie/eli/2018/si/360/made/en.
SP 800-82 Rev. 2. Guide to Industrial Control Systems (ICS) Security.
Gaithersburg, MD: NIST (National Institute of Standards and
Technology), 2015. Accessed June 21, 2021.
https://csrc.nist.gov/publications/detail/sp/800-82/rev-2/final.
Thaler, Richard H., and C. R. Sunstein. Nudge: Improving Decisions About
Health, Wealth, and Happiness. New Haven, CT: Yale University Press,
2008.
Thompson, Mark. “Iranian Cyber Attack on New York Dam Shows Future
of War.” Time, March 24, 2016. Accessed June 21, 2021.
https://time.com/4270728/iran-cyber-attack-dam-fbi/.
“Treatment Plant Intrusion Press Conference.” YouTube. Accessed June 21,
2021. https://www.youtube.com/watch?v=MkXDSOgLQ6M.
Further Reading
Barker, Jessica. Confident Cyber Security: How to Get Started in Cyber
Security and Futureproof Your Career. London: Kogan Page Limited,
2020, ISBN 978-1789663426.
Hubbard, Douglas W., and Richard Seiersen. How to Measure Anything in
Cybersecurity Risk. Hoboken, NJ: John Wiley & Sons, Inc., 2016.
Useful Resources
Infracritical (http://infracritical.com/). Infracritical is an organization
founded by Bob Radvanovsky, Jake Brodsky, Tammy Olk, and Michael
Smith, internationally recognized experts in the field of industrial
cybersecurity. Infracritical provides two resources:
B
background checks, people management, 213
backup and recovery procedures, 89, 140–141
automation system, 140
restoration, 141
storage and retention, 140–141
verification requirements, 141
Barker, Jessica, 189, 190, 191
barrier model analysis, visualization and, 208–211
basic automation, 16
gas turbine, 17
protection, 15
Bayes’s theorem, 83
adjusted beta distribution after evidence applied, 73
estimating, 71
formula for, 71
frequentist statistics for estimating, 70, 71
quantifying risks, 70–73
benchmarking, cybersecurity incidents, 48–52
Berners-Lee, Tim, 1
big catch, phishing, 190n123
Bird’s triangle, 38
Blackbaud, software provider, 207, 208
Boden, Vitek, 2, 215–216
Boeing’s 737 MAX 8 aircraft, 9
bowtie diagram, IT and OT network, 187, 188
Braithwaite, Al, 224
British Standard (BS) 7799, Information Technology, 29
Brodsky, Jake, 246
bus, 146
C
Capital Facilities Information Handover Specification (CFIHOS), 173–174
Center for Internet Security (CIS), Critical Security Controls, 27
Centre for the Protection of National Infrastructure (CPNI), 5, 228
CFIHOS. See Capital Facilities Information Handover Specification (CFIHOS)
CFO fraud, 190
Challenge Handshake Authentication Protocol (CHAP), 130
change management, cybersecurity, 166–167
CHAZOP (control hazard and operability study), 63, 164
Chemical Facility Anti-Terrorism Standards (CFATS), 26
Chernivtsioblenergo, Ukrainian regional electricity, 37
Chernobyl Nuclear Power Plant, 12
chief information security officer (CISO), 6
C-I-A (confidentiality-integrity-availability) triad, OT versus IT, 8, 13
ciphertext, encryption, 152
CISO. See chief information security officer (CISO)
CLAM terminology, 34, 35
class-action lawsuits, 207
cleartext, encryption, 152
closed-circuit television (CCTV), 118, 119
cold standby, redundancy, 143
commissioning
challenges during, 170
red-team assessment, 170–171
Common Industrial Protocol (CIP), 91
communication networks, system availability, 146–147
communications technology, selection for remote, 132–134
competency
continuous evaluation, 199–201
organizations, 195–197
supply chain, 197, 199
Component Security Assurance (CSA), 155
conduits
connection between zones, 107
definition, 104
list for facility architecture, 110
secure network design, 103–111
confidence interval, Monte Carlo simulation, 68
Confident Cyber Security (Barker), 189
configuration error, 187
configuration management database, 218
construction, cybersecurity, 166
consulted, RACI term, 34
continuous evaluation, cybersecurity, 199–201
contracts
agreements, 213
aspects of, with third parties, 229–230
embedding cybersecurity requirements in all, 174–175
verification of requirements, 176–177
control system hazard and operability studies (CHAZOP), 164
COVID-19 pandemic, 121, 169
CPNI. See Centre for the Protection of National Infrastructure (CPNI)
Creeper computer worm, 1
Critical Infrastructure Protection (CIP), regulations, 26
Cullen Report, 56
culture, significance of, 12–13
cyber hygiene, 168–169
cyber insurance market, 230–231
Cyber Kill Chain, Lockheed Martin, 82
cyber physical systems, term, 6
cybersecurity, 2, 5. See also people management
allocating resources, 31–36
assigning ownership, 31–36
assigning representation, 31
awareness training, 181, 203
barrier model analysis and visualization, 208–211
commissioning, 170–171
committee charter, 31, 32
construction, 166
continuous evaluation, 199–201
elements of, 5
embedding, throughout project, 162–164
embedding requirements in all contracts, 174–175
engineering, 164–165
factors for successfully managing, 160, 179
feasibility of, 162–164
first line of defense, 194–195, 202–203
handover and closeout, 172–174
human error, 185–190
importance of awareness, 181–183
improving results for, 180
incident response preparedness, 167–170
industrial, 233–234
information technology (IT) and operational technology (OT), 8–19, 233–234
insurance, 230–231
management, 161
management of change, 166–167
performance management, 178–179
project stages and, considerations, 163
raising awareness within project team, 175–176
risk, 2–3, 199
risk management, 55, 177
safety culture, 191–194
suppliers, vendors, and subcontractors, 227–228
supporting the right behaviors, 191
terminology, 6
training and competency, 195–199
underestimating risk, 183–185
“users are the weakest link”, 194–195
Cybersecurity and Infrastructure Security Agency Chemical Facility Anti-Terrorism Standards (CISA
CFATS), 51
cybersecurity awareness maturity model, 52
cybersecurity bowtie diagram, 64, 75, 87
cybersecurity committee charter, 31, 32
cybersecurity compliance, 49
assessment, 51
calculating, 50
cybersecurity controls, barrier representation of, 210, 211
cybersecurity designs. See also standardized designs
benefits of standardizing, 85–87
cybersecurity framework (CSF), National Institute of Standards and Technology (NIST), 22, 23
cybersecurity incident(s)
common concerns over, 48
confidence interval, 68
loss exceedance curve, 68, 69
monitoring against industry standards, 50–52
monitoring against policy, 48–50
monitoring compliance, 48–52
observations or near misses, 46–47
people management and, 215–217
prompt reporting of, 45–47
reporting to employees, 47–48
return on control, 70
safety observation card, 46
cybersecurity incident response plan, 89, 139–140
exercising the plans, 139–140
near misses, 140
recovery point objective (RPO), 139
recovery time objective (RTO), 139
cybersecurity information, common mistakes in sharing, 44
cybersecurity management system, 36–37
frameworks, regulations, standards, and guides, 21–22
frameworks, standards, guides, and regulations, 25–27
ISA/IEC 62443, 24–25
monitoring changes, 37, 42
National Institute of Standards and Technology (NIST), 22–24
NIST Special Publication 800 series, 25
reporting effectiveness, 36, 37–40
shortcomings of, 205–206
tracking and managing risk, 37, 40–41
cybersecurity manager, 218
cybersecurity risk
defining, 58–59
top 10 risks for 2020, 40
tracking and managing, 40–41
cybersecurity safeguards
effectiveness of minimum, 80
future of industrial cybersecurity risk management, 81–83
hierarchy of controls, 79–80
ISA/IEC 62443 standards, 73–75, 76
responsibility for defense-in-depth measures, 75, 77–78
simplified assessment and definition of safeguards, 78–80
cybersecurity single point of accountability, 218
Cybersecurity Ventures, 3
Cygenta, Barker of, 189
D
data confidentiality (DC), foundational requirement, 88
defense-in-depth measures, responsibility for, 75, 77–78
demilitarized zone (DMZ), 33
access control, 106
antivirus and patching, 106
backup and recovery, 106–107
block diagram of, 105
firewall and separation of duties, 118
firewall ruleset, 112
management of, 214
remote access, 106
secure designs, 105–107
denial-of-service (DoS) attack, 58, 58n43
Department of Energy, regulations, 22
design. See standardized designs
DevOps, software development and operations, 12
Digital Bond, 247
distributed control system (DCS), 25, 89
documentation, manual procedures, 217–218
domains, Windows Active Directory, 129
Draft Code of Practice for Cyber Security in the Built Environment, Institution of Engineering and
Technology (IET), 212–213
Dragos, 247
Duke Energy, 182, 183
Dynamic Host Configuration Protocol (DHCP), 124
E
Edwards versus the National Coal Board, 61
Electronic Records, regulation, 26
Electronic Signatures, regulation, 26
embedded devices, 148–149
architecture, 148
data logging, 148
power supply, 148
emerging technology, competency, 196, 198
encryption
asymmetric, 152–153
automation systems, 154
hashing, 153–154
private-key, 152
public-key, 152–153
symmetric, 152
engineering, 31
cybersecurity, 164–165
project delivery, 164–165
engineering, procurement, and construction (EPC), 161, 167, 172, 174
Environmental Protection Agency (EPA), regulations, 22
Equifax, 14
equipment access control, 89, 128–131
authentication protocols, 130
key, 128–130
multifactor authentication, 131
role-based access control (RBAC), 129
EU Network and Information Systems (NIS), 70
European Union, 22, 35
Extensible Authentication Protocol (EAP), 124
F
factory acceptance testing (FAT), 171
fail-fast approach, IT, 12
fail-fast concept, 191–192
failure modes and effects analysis (FMEA), 63
Fazio Mechanical Services, 13n15
Federal Trade Commission, 14
Finan, Tom, 230–231
firewall(s), 111–118
DMZ, and separation of duties, 118
facility architecture, 117
operational facility, 117
ruleset for turbine control system zone, 112
standard and industrial, 114–116, 118
first line of defense, cybersecurity, 194–195
Flanagan, Adam, 215–216
forests, Windows Active Directory, 129
Fortune 500 companies, 207
foundational requirements (FRs), ANSI/ISA-62443-3-3, 87–88, 89
four-eyes principle
separation of duties, 214
US Food and Drug Administration, 214–215
framework, cybersecurity management, 22, 27
Functional Safety, standard, 26
function code, 115
G
Garmin, 223
gas turbines, 16
control system, 17
cybersecurity risks of control system, 59–60
mitigations, 18–19
Modbus protocol, 115, 116
process hazard analysis (PHA), 66
safety system, 16–17
Siemens, 121
Geller, E. Scott, 192–193
geofencing, 135, 135n98
golden triangle
information technology (IT) and operational technology (OT), 8–10
people, process, and technology, 8
Google, 1
governance
communicating to the organization, 42–43
establishing infrastructure, 30–31
industrial cybersecurity, 30, 52–53
industrial cybersecurity management, 20
monitoring changes, 37, 42
governance, risk management, and compliance (GRC), 176
Gruhn, Paul, 70
Gualtieri, Bob, 224
guides, cybersecurity management, 22, 27
Guide to Industrial Control Systems (ICS) Security (NIST), 25
H
hashing, 153–154
hazard and operability study (HAZOP), 63, 164
Health and Safety at Work Act (1974), 61
Heinrich’s triangle, 38
hierarchy
cybersecurity controls, 188, 189
industrial cybersecurity management, 20–21
Maslow’s, of human needs, 20
HiPo (High Potential Accident), 38
host-based intrusion detection systems (HIDSs), 136
hot standby, redundancy, 143
How to Measure Anything in Cybersecurity Risk (Hubbard and Seiersen), 68
Hubbard, Douglas, 68
human error, 188
cybersecurity, 185–190
human-machine interface (HMI), 91, 162, 184
control system, 215
Hypertext Transfer Protocol (HTTP), 96
I
IACSs. See industrial automation and control systems (IACSs)
IBM X-Force Threat Intelligence Index, 187
identification and authentication control (IAC), foundational requirement, 87
IEC 62443 standard, 92
ILOVEYOU worm, 1
incident response, 232. See also cybersecurity incident(s)
planning, 223–227
preparedness, 167–170
industrial automation and control systems (IACSs), 6, 25. See also standardized designs
division of responsibilities between IT and OT, 33
foundational requirements (FRs), 87–88
simplified block diagram of environment, 89
industrial control systems (ICSs), 25
industrial cybersecurity, 2, 4
foundations of management, 20–21
hierarchy of management, 20–21
methods to address risk chain, 82
risk chain, 81, 82
risk management, 81–83
term, 6
industrial cybersecurity risk, 59–61
industrial firewalls, 114–118
Industrial Internet of Things (IIoT), 92
hierarchy and control, 100–103
hierarchy and speed of response, 98–100
Purdue reference model and, 96–103
vibration monitoring, 100
information security risk assessment, 59
Information Systems Audit and Control Association (ISACA), 27
information technology (IT), 2
bowtie diagram, 187, 188
competency, 196, 198
consequences of, 13–18
cybersecurity, 2, 4, 7
cybersecurity awareness training, 181
division of responsibilities, 33
gas turbine control system, 19
ISO 27001, 27, 29
mitigations, 18–19
operational technology (OT) versus, 8–19, 233–234
OT projects and, 161
people, process and technology, 9
refresh of, 205
significance of culture, 12–13
significance of technology, 10–11
unauthorized access to network, 187
informed, RACI term, 34
Infracritical, 246
Institute of Electrical and Electronic Engineers (IEEE), 124
Institution of Engineering and Technology (IET), Draft Code of Practice for Cyber Security in the
Built Environment, 212–213
insurance, cybersecurity, 230–231
integrated control and safety system (ICSS), 88–90
core of facility, 90, 91
Ethernet networks, 91
intentional threats, cybersecurity, 211–212
International Association of Oil & Gas Producers (IOGP), 79, 173
International Electrotechnical Commission (IEC), 75
International Nuclear Event Scale, 12
International Organization for Standardization (ISO), 94
International Society of Automation (ISA), 70, 196, 246
ISASecure certification programs, 154–155, 156
International Telegraph and Telephone Consultative Committee (CCITT), 94
Internet, 169
Internet Control Message Protocol (ICMP), 113
Internet Engineering Task Force (IETF), 120
Internet protocol (IP), 173
addressing, 149–152
basic structure of address, 149
classes of address, 150
classless inter-domain routing (CIDR) method, 150
host file, 151, 151n100
IPv4 schemes, 151
IPv6 address format, 151
non-routable addresses, 150
Internet Security Association and Key Management Protocol/Oakley (ISAKMP/Oakley), 120
intrinsic safety, 81n67
intrusion detection system (IDS)
alert failure, 187
networking monitoring, 136–138
intrusion prevention system (IPS), networking monitoring, 136, 138
inventory management, 218–222
creating, for existing facilities, 220–222
creating an inventory for new facilities, 219–220
maintaining and auditing, 222
operational technology (OT) systems and devices, 218–219
ISA-95, 1
ISA-99 standard, 92
ISA Global Cybersecurity Alliance (ISAGCA), 84, 246
ISA/IEC 62443, 24–25, 51, 78, 83–84, 247
compliance requirements, 175
standards to define safeguards, 73–75, 76
ISA/IEC 62443 series, 66
Security for Industrial Automation and Control Systems, 154
ISASecure, 154–157, 247
ISA Security Compliance Institute (ISCI), 154, 246–247
ISO 15926, 173
ISO 17065, 155
ISO/IEC 27001, information technology security, 29
issue management, project leadership, 177
IT. See information technology (IT)
“It’s Up to Me” (Merrell), 194–195, 203, 234
J
J.R. Simplot Co., 194
K
Kahneman, Daniel, 190
Kerberos, 130
Kyivoblenergo, Ukrainian regional electricity, 37
L
lagging and leading indicators
accident triangle, 38
security triangle, 39, 43–45
Langner, Ralph, 247
leadership qualities, total safety culture (TSC), 193
Lee, Rob, 247
Lockheed Martin, Cyber Kill Chain, 82
loss exceedance curve, Monte Carlo simulation, 68, 69
M
McGlone, Jim, 62
maintainability, term, 143
malware, 2
management of change, cybersecurity, 166–167
Maneuvering Characteristics Augmentation System (MCAS), 9
manual procedures, 89, 141–142
end-user facilities, 142
people management, 217–218
Marszal, Edward, 62
Maslow, Abraham, 20
Mattel, 190
media access control (MAC), 124
Merrell, Don, 194–195, 203, 234
mesh, 146
Mission Critical Operations Primer (Mustard), 22
mitigation, 184
IT versus OT, 18–19
Mitre, ATT&CK for Industrial Control Systems, 81–82
Modbus protocol, turbine control system, 115, 116
monitoring
cybersecurity incidents, 48–52
network, 89, 136–138
Monte Carlo simulation, 184
inherent risk, 68
loss exceedance tolerance, 68, 69
percentile, 69
quantifying risks, 67–70
residual risk, 68
return on control, 70
Monte Carlo simulations, 83
multifactor authentication, equipment access control, 131
N
National Cyber Security Centre (NCSC), 43, 177, 228
National Infrastructure Security Coordination Centre, United Kingdom, 5
National Institute of Standards and Technology (NIST), 51
core functions, 23
cybersecurity framework (CSF), 22–24, 51
Special Publication 800 series, 25
Network and Information Systems (NIS), 35
network-based intrusion detection systems (NIDSs), 136
network-connected devices, 219
network monitoring, 89, 136–138
intrusion detection system (IDS), 136–138
intrusion prevention system (IPS), 136, 138
vibration monitoring rack, 138
network security, Schneier on, 206–207
Nobel Prize, Thaler and Kahneman, 190
North American Electric Reliability Corporation (NERC), 26, 182, 183
Critical Infrastructure Protection (CIP) regulations, 182, 183
North American Electric Reliability Corporation Critical Infrastructure Protection (NERC CIP), 51
North Carolina State University, enterprise risk management (ERM) initiative, 40
Nuclear Regulatory Commission (NRC), 26
O
OAuth, 130
Occupational Safety and Health Administration (OSHA), 63
Oldsmar incident, water treatment plant, 224–227
Olk, Tammy, 246
Open Systems Interconnection (OSI) model, 94–96
operational technology (OT). See also people management
basic automation layer, 15
bowtie diagram, 187, 188
competency, 196, 198
consequences of, 13–18
cybersecurity, 2, 4, 7
cybersecurity awareness training, 181
division of responsibilities, 33
gas turbine control system, 18–19
incident response, 223–227
information technology (IT) versus, 8–19, 233–234
insurance, 230–231
layers of protection, 14–16
life of equipment, 205
mitigations, 18–19
network security, 206–208
operational support culture, 205–206, 231–232
people, process, and technology, 9
plant personnel intervention layer, 15
safety system layer, 15, 16
securing project leadership, 160–162
significance of culture, 12–13
significance of technology, 10–11
suppliers, vendors, and subcontractors, 227–230
term, 6
unauthorized access to network, 187
operations, 31
Operators of Essential Services (OES), 177
organization, training and competence in, 195–197
OT. See operational technology (OT)
overall equipment effectiveness (OEE), 143
oversight
establishing good, 36–41
implementing rigorous, 176
ownership, leadership qualities, 193
P
Password Authentication Protocol (PAP), 130
people, golden triangle, 8
people management, 211–218, 234
accidental acts, 211, 212
background checks, 213
cybersecurity risk reduction, 212–213
four-eyes principle, 214
intentional threats, 211–212
IT versus OT, 9
joiners, movers, and leavers, 215–217
manual procedures, 217–218
separation of duties, 213–215
PERA Enterprise Integration, 247
performance management
cybersecurity activities, 178–179
S-curves, 178, 179
Peterson, Dale, 247
phishing, 190n123, 200–201
phishing incident credible occurrence (PICO) score, 201
physical access control, 89
equipment room within secured facility, 126
equipment room with locked cabinets, 127
inadequately secured automation system equipment, 128
remote well facility, 125
Piper Alpha disaster (1988), 56
plaintext, encryption, 152
plant personnel intervention, 16
gas turbine, 17
protection, 15
policies and procedures, industrial cybersecurity management, 20
power, system availability, 145–146
prevention, 184
private-key encryption, 152
process
golden triangle, 8
IT versus OT, 9
leadership qualities, 193
process control narrative (PCN), 146n99
process hazard analysis (PHA), 184
process safety bowtie diagram, 64
process safety management (PSM), 63
programmable logic controller (PLC), 16, 91, 184, 219
project delivery
commissioning, 170–171
construction, 166
embedding cybersecurity requirements in contracts, 174–175
embedding cybersecurity throughout, 162–174
engineering, 164–165
feasibility, 162–164
handover and closeout, 172–174
oversight process, 176
performance management, 178–179
project stages and cybersecurity considerations, 163
raising awareness within the team, 175–176
risk and issue management, 177
secure senior project leadership support, 160–162
start-up, 171–172
verification of requirements, 176–177
Protiviti, 40
Prykarpattyaoblenergo, Ukrainian regional electricity, 37
public-key encryption, 152–153
Purdue Enterprise Reference Architecture (PERA), 92, 97, 247
Purdue hierarchy, 92–94, 95
control, 100–103
example facility architecture, 104
Industrial Internet of Things (IIoT) and, 96–103
levels of ISA-62443-1-1, 93–94, 95
original, 93
speed of response, 98–100
Purdue model, cybersecurity committee charter, 31, 32
Purdue University, 92
R
RACI. See responsibility assignment matrix (RACI)
radar chart, 49
RADIUS networking protocol, 130
Radvanovsky, Bob, 246
ransomware, 3–4
Rathwell, Gary, 92, 247
Raymond James Stadium, 225
Reaper (Tomlinson), 1
Reason, James, 192, 208
“reasonable person” test, standards, 22
red-team assessment
commissioning, 170–171
physical security controls, 172
redundancy, term, 143
redundant array of inexpensive disks (RAID), 147
registers, 115
regulations, cybersecurity management, 22, 26
reliability, term, 143
remote access, managing, 165
Remote Authentication Dial in User Service (RADIUS), 124
remote communications technology
availability, 133
location, 132–133
policy recommendations, 134–135
security, 133–134
selecting, 132–134
Repository of Industrial Security Incidents (RISI), 186–187
resource availability (RA), foundational requirement, 88
resources
reading, 245–246
useful, 246–247
responsibilities, integrating accountabilities and, 36
responsibility assignment matrix (RACI), 34, 35
responsible, RACI term, 34
restricted data flow (RDF), foundational requirement, 88
ring, 146
risk
assessing future of industrial cybersecurity, 81–83
realistic estimates of likelihood and consequence changes, 185
underestimating, 183–185, 202
RiskBased Security, 3
risk chain, cybersecurity, 199
risk management, 55
as low as reasonably practicable (ALARP), 61–62
defining cybersecurity risk, 58–59
defining safety risk, 57–58
importance of, 55–57
industrial cybersecurity risk, 59–61
overview of, 55–62
project leadership, 177
risk matrix, 57, 58
risk quantification with statistics, 67–72
Bayes’s theorem, 70–73
Monte Carlo simulation, 67–70
role-based access control (RBAC), 129
RootedCON 2014, 19
Royal Holloway University of London, 191
RSA Conference, 2, 2n2
S
Safe Drinking Water Act, 225
safety. See cybersecurity incidents
safety culture, 19
cybersecurity, 191–194
site safety briefings, 193–194
term, 12
toolbox talk, 193, 194
safety instrumented system (SIS), 89
safety observation card, 46
safety risk, defining, 57–58
safety system, gas turbine, 16–17, 17
Saipem, 18, 161, 163–164, 208
SANS, cybersecurity maturity model level definitions, 52
Saudi Aramco, Shamoon-related attacks on, 18
SCADA. See supervisory control and data acquisition (SCADA) system
SCADASec mailing list, 246
Schneier, Bruce, 206, 207
Secrets and Lies (Schneier), 206
secure network design, 89, 91–121
cloud-oriented industrial architecture, 97
conduits, 107, 110
demilitarized zone (DMZ), 105–107
hierarchy and control, 100–103
hierarchy and speed of response, 98–100
IIoT and Purdue reference model, 96–103
Open Systems Interconnection (OSI) model, 94–96
options for plant influent control system, 102
Purdue hierarchy, 92–94, 95, 101, 104
zones, 104
zones and conduits, 103–111
zones and resilience, 107–109, 111
secure remote access, 89, 131–135
approval process and oversight, 135
availability, 133
location, 132–133
policies and procedures, 134–135
procedural controls, 135
redundancy, 133
remote access risks, 132
security, 133–134
selecting remote communications technology, 132–134
technical controls, 135
user management, 134
Security Assertion Markup Language (SAML), 130
Security Development Lifecycle Assurance (SDLA), 154, 155
Security PHA Review for Consequence-Based Cybersecurity (Marszal and McGlone), 62–63
security process hazard analysis (PHA), 62–67
bowtie diagrams, 64
causes, consequences, and safeguards of gas turbine, 66
cybersecurity risk assessment, 63–64
hackable, 65
PHA methodology, 63
security levels of SPR, 66–67
security PHA review (SPR), 62–63
SPR process overview, 65
security requirements (SR), authorization enforcement, 74
security triangle, lagging and leading indicators, 39, 43–45, 200
Seidel, Eric, 224
Seiersen, Richard, 68
senior management, representation of cybersecurity, 31
separation of duties, people management, 213–215
Serious Accidents, 38
Server Message Block (SMB) service, 123
servers, 147–148
service level agreement (SLA), 143
service set identifier (SSID), 124
Siemens, gas turbines, 121
Simple Network Management Protocol (SNMP), 113, 147
Smith, Michael, 246
SolarWinds software, 207
Spitzner, Lance, 52
standard firewalls, 114–118
standardization, 85
standardized designs
backup and recovery procedures, 89, 140–141
benefits of, 85–87
cybersecurity incident response plan, 89, 139–140
electronic access control, 89, 128–131
essential elements of, 87–142
foundational requirements (FRs), 87–88, 89
manual procedures, 89, 141–142
network monitoring, 89, 136–138
physical access control, 89, 125–127
secure network design, 89, 91–121
secure remote access, 89, 131–135
system hardening, 89, 121–125
standards, cybersecurity management, 22, 26–27
star, 146
start-up, project delivery, 171–172
statistics, quantifying risks with, 67–72
Stuxnet malware, 2, 15, 18, 60, 184, 247
subcontractors, 227–230
Sugarman, Eli, 185
supervisory control and data acquisition (SCADA) system, 15, 25, 89, 159
data acquisition, 103
Oldsmar incident, 226
operator display, 102
suppliers, 227–230
supply chain
principles, 228–229
training and competence in, 197, 199
support contracts, 149
support culture, operational, 205–206, 231–232
Swiss cheese model, 208
accident causation, 192
representation of, 209
simplistic example of, 209
system availability
communications networks, 146–147
designing for, 145–149
embedded devices, 148–149
power, 145–146
servers and workstations, 147–148
simplified fault tree, 144
specifying, 144–145
terms, 142–143
system hardening, 89, 121–125
hardening Wi-Fi networks, 124–125
practices for cybersecurity management, 121–124
USB lock, 122
system integrity (SI), foundational requirement, 88
Systems and Cyber Impact Database Markup (SCIDMARK), 246
System Security Assurance (SSA), 154–155
T
Target
Fazio Mechanical Services and, 13n15
hackers, 13–14
technical, industrial cybersecurity management, 20
technology
golden triangle, 8
IT versus OT, 9
significance of, 10–11
terminology, cybersecurity, 6
Thaler, Richard, 190
Thomas, Bob, 1
Tomlinson, Ray, 1
threat sources, taxonomy of, 186
timely response to events (TRE), foundational requirement, 88
total safety culture (TSC), 192
training
completion rates, 200
continuous evaluation, 199–201
cybersecurity, 203
industrial cybersecurity management, 20
organizations, 195–197
supply chain, 197, 199
Transmission Control Protocol/Internet Protocol (TCP/IP), 91
tree, 146
trees, Windows Active Directory, 129
TRISIS malware, 184, 246, 247
turnkey, 161
U
UK Centre for the Protection of National Infrastructure (CPNI), 5, 228
UK Health and Safety Commission, safety culture, 12
Ukrainian regional electricity distribution, 37, 60, 184
uninterruptible power supply (UPS), 145
United Kingdom, National Infrastructure Security Coordination Centre, 5
universal serial bus (USB) locks, 189
universal serial bus (USB) network port security, 122
universal serial bus (USB) ports, 37n34, 81
University of Manchester, 192, 208
US Department of Homeland Security (DHS), 26, 43
US Department of Labor, Automation Competency Model (ACM), 196–197
use control (UC), foundational requirement, 88
user access, managing, 165
“users are the weakest link”, 194–195
US Food and Drug Administration (FDA), 26, 214–215
US National Fire Protection Association, 79–80
US Presidential Executive Order 13636, 22, 23
US Securities and Exchange Commission (SEC), 69
V
vendors, 227–230
Virginia Tech, Center for Applied Behavior Systems, 192
virtualization, 148
virtual local area networks (VLANs), 118–119
virtual private networks (VPNs), 119–121
setup for remote access, 120
visual illustration
barrier model analysis, 208–211
barrier representation of cybersecurity controls, 210
Swiss cheese model, 209
W
Wall Street Journal (newspaper), 182
WannaCry outbreak, 108, 123
warm standby, redundancy, 143
water treatment plant, Oldsmar incident, 224–227
water/wastewater construction project, 160
weak antivirus (AV)/patching regime, 187
weak password management, 187
whaling, 190, 190n123
Wickline, Heath, 185
Wi-Fi networks, hardening, 124–125
Wi-Fi Protected Access (WPA), 124
Williams, Theodore (Ted), 92, 97
Willis Towers Watson, insurance company, 230–231
Windows, operating system market, 1
Windows Active Directory, 129–130
Wired Equivalent Privacy (WEP), 124
workstations, 147–148
World Wide Web, term, 1
Z
Z., Willem, 216
zones, 104
definition, 104
demilitarized zone (DMZ), 105–107
hierarchy for facility architecture, 108
resilience and, 107–109, 111
secure network design, 103–111
security requirements, 104