2005 IEEE International Symposium on Cluster Computing and the Grid

Clusters and Security: Distributed Security for Distributed Systems

Makan Pourzandi, David Gordon

Open Systems Laboratory, Ericsson Research
8400 Decarie Blvd, Town of Mont-Royal, QC, Canada
{makan.pourzandi, david.gordon}@ericsson.com
William Yurcik, Gregory A. Koenig
National Center for Supercomputing Applications (NCSA)
University of Illinois, Urbana-Champaign, USA
{byurcik, koenig}@ncsa.uiuc.edu

Abstract

Large-scale commodity clusters are used in an increasing number of domains: academic, research, and industrial environments. At the same time, these clusters are exposed to an increasing number of attacks coming from public networks. Therefore, mechanisms for efficiently and flexibly managing security have now become an essential requirement for clusters. However, despite the growing importance of cluster security, this field has been only minimally addressed by contemporary cluster administration techniques. This paper presents a high-level view of existing security challenges related to clusters and proposes a structured approach for handling security in clustered servers. The goal of this paper is to identify various necessarily-distributed security services and their related characteristics as a means of enhancing cluster security.

1. Introduction

Large-scale commodity clusters are used in an increasing number of domains: academic, research, and industrial environments. These clusters share and coordinate the use of resources (CPU, storage, ...) for a wide range of users. Furthermore, the functionality provided by these clusters varies from carrier-class applications with tight requirements on availability and real-time response time to High-Performance Computing clusters where availability and real-time response time are secondary issues. At the same time, these clusters are exposed to an increasing number of attacks coming from public networks. Therefore, mechanisms for efficiently and flexibly managing security have now become an essential, but at the same time challenging, requirement for clusters.

The main difficulty in cluster security results from the fact that even though many security mechanisms exist for single nodes in a cluster, the issues related to securing a cluster as a whole are not the same as those related to securing the independent nodes that make up the cluster. Even though the behavior of individual nodes may be simple and could be approached with traditional security techniques, we believe that effective security management in the context of cluster systems requires tools that evaluate the state of the cluster as an indivisible entity. Simply put, securing a 100-node cluster is different from securing 100 standalone nodes. To illustrate the above, consider the example of a traditional security monitoring tool that examines the flow of communication into and out of individual cluster nodes. This tool is limited to evaluating security based only on streams of data that it considers independently of any cluster-specific context. On the other hand, a cluster-aware security monitoring tool could evaluate whether a given node should even be communicating at all, based on information from sources such as the cluster's job management system. That is, if no job is currently scheduled for execution on a given node, that node should most likely not be sending or receiving data on the network.

The idea that cluster security must be considered as a whole is further underscored by realizing that while the behavior of individual cluster components may be simple, the combined interactions of multiple components may result in complex, unintended, and non-intuitive behaviors that are difficult or impossible to predict. That is, even if certain hardware or software components that make up a cluster are certified as assured, these components must co-exist in a cluster environment that most likely consists of non-assured components. Furthermore, even if a cluster were built entirely from certified components, it is unlikely that the entire cluster, considered as a single entity, would have

0-7803-9074-1/05/$20.00 ©2005 IEEE 96

Authorized licensed use limited to: HIGHER EDUCATION COMMISSION. Downloaded on September 25,2024 at 10:12:11 UTC from IEEE Xplore. Restrictions apply.
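The Introduction's example of a cluster-aware monitoring tool (a node with no scheduled job should most likely not be sending or receiving data) can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a tool described in this paper: the `FlowRecord` type, the `suspicious_flows` function, the node names, and the addresses are all hypothetical, and a real deployment would obtain the set of scheduled nodes from the site's job management system and the flows from its network sensors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """One observed network flow (hypothetical record format)."""
    node: str     # cluster node that sent or received the traffic
    peer: str     # remote endpoint of the flow
    n_bytes: int  # number of bytes transferred


def suspicious_flows(flows, scheduled_nodes):
    """Return the flows that involve a node with no job scheduled on it.

    A per-node tool would judge each flow in isolation; here the job
    schedule supplies the cluster-specific context for the decision.
    """
    return [f for f in flows if f.node not in scheduled_nodes]


# Hypothetical observations: node02 is talking although nothing runs on it.
flows = [
    FlowRecord("node01", "10.0.0.7", 4096),
    FlowRecord("node02", "203.0.113.9", 900_000),
]
scheduled = {"node01"}

for f in suspicious_flows(flows, scheduled):
    print(f"{f.node}: unexpected traffic with {f.peer} ({f.n_bytes} bytes)")
```

The check is deliberately simple: the point is that the decision uses information (the job schedule) that no single node's traffic stream contains.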
been evaluated in any kind of certification process. Simple combinatorics make it infeasible to use formal methods to identify and protect against all known vulnerabilities from component interactions.

Cluster security is an emergent property because it arises from the independent security aspects of the individual cluster nodes and is at the same time irreducible with regard to the overall cluster system [25]. Within this paper, we leverage this perspective on cluster security in order to identify various necessarily-distributed security services and their related characteristics. Our goal is to develop techniques that can be used to enhance the security of clusters that exist in domains ranging from a carrier-class telecommunications environment to a High-Performance Computing (HPC) environment.

The remainder of this paper is organized as follows. We first define a threat model specific to the cluster environment in Section 2. In Section 3 we explain why cluster security is a challenging task. In Section 4 we state the requirements for different cluster security services, including authentication, access control, and monitoring. Sections 5 and 6 introduce the authors' research projects, DSI and NVisionCC, respectively. Section 7 addresses deployment issues with the practical use of these cluster security solutions. We end with a summary and conclusions in Section 8.

2. Threat model

In order to prioritize our efforts in protecting clusters we present a threat model that guides our efforts. The goal of a threat model is not to cover every possible attack scenario, an exercise that is impossible given new threats and new vulnerabilities that continuously appear. Rather, the goal of a threat model is to understand a security posture given that attacks are numerous, that no protection system is perfectly secure, and that protection resources are finite.

A threat model can be reduced to risk management. What are the likely attacks and, knowing these likely attacks, what tradeoffs are you willing to make to protect against these likely attacks? Attackers have a given set of capabilities; cluster vulnerabilities exist. When the capabilities generally available to attackers match the exploitable cluster vulnerabilities, there is a higher probability of security breaches in the cluster.

The first threat to consider is from insider attack, which most empirical studies report as being the most likely mode of attack [3]. Authorized users with privileged access may attempt to access unauthorized resources, perform denial-of-service attacks on shared resources, or delete or modify shared data sets. Another type of insider threat is when a legitimate user's authentication credentials (password or keys) are stolen, allowing an attacker to masquerade as a legitimate user. Masquerade attacks are particularly dangerous since they can lead to further damage beyond the initial compromised account and there is little indication of a problem to cluster security system administrators.

External attacks that probe and then exploit cluster vulnerabilities are a new reality after the Spring 2004 attacks on HPC infrastructures worldwide [8, 12]. Attackers seek to steal cluster services, eavesdrop on cluster messages, and disrupt cluster operations. There has been at least one reported case where a cluster's computational power was used to stage a brute-force effort to decrypt stolen password files [15]. Cluster operations can be disrupted using external denial-of-service attacks on cluster nodes which are Internet accessible or by attacks against communications between remote users and clusters.

The largest cluster security threat is actually the combination of multiple threats in what has been referred to as cascading threats or dependent risk [25]. The security of resources in a cluster environment is dependent on the integrity of all nodes. If one node is compromised, either by internal or external means, there is a dramatically increased risk to the rest of the cluster nodes since they often share identical configurations and common protection mechanisms. There is also a risk to peer resources in other security domains (often other clusters) since cluster users tend to coordinate access across different resources.

We highlight three special points about the cluster security threat model:

* Changing Nature of Clusters - Clusters have moved from closed/proprietary environments (particularly in commercial settings) to open/standard systems that are often exposed to public networks. This change has resulted in exposing clusters to a variety of point-and-click attack tools that are easily available on the Internet. Furthermore, many clusters run code from third-party partners or software providers. For reasons of time and money, it is almost impossible to perform a security audit on all of this code. Thus, many clusters run untrusted third-party software. This is a major change from traditional clusters running a controlled base of known source code.

* Shift from Random Reliability Failures to Intentional Attacks - Traditional high-availability clusters relied on redundancy to address issues related to random failures of hardware and software. However, this approach is not applicable to intentional attacks where faults are targeted and dependent (as opposed to random and independent).

* Security as a Service versus Security as an Obligation - With the increasing use of clusters in several fields, there has been a gradual change in the way security is

handled. In many traditional, security-sensitive fields (e.g. banking, government) the client is bound to the offering¹. In many new fields, security is a service or an add-on to enhance other service offerings. This means that security should be provided in a way that does not invalidate or conflict with other requirements. For example, in the case of handling on-line transactions, security should support real-time response time requirements. If the transactions are too slow, the client can choose not to use the service and the supplier loses the business opportunities. This puts extra pressure on the security requirement and changes the way the security should be implemented.

3. Challenges for Cluster Security

Given the threat model we have described, implementing security for a cluster is difficult in multiple dimensions [25]. A cluster encompasses a collection of distributed resources: multiple layers including applications, middleware, operating systems, and network interconnects must all be coherently protected. While locking down a cluster by disabling services is desirable from a security perspective, cluster resources are meant to be used, so there is the resource management challenge of allowing users to consume resources in an authorized way.

Clusters represent a heterogeneous management environment composed of different hardware and software node configurations, presenting the challenge of integrating different security solutions (vendor or open source) with a goal of comprehensive security solutions across the entire cluster. Further, there are large scale management requirements. As the size of clusters continues to increase, installing, monitoring, and maintaining clusters becomes a challenge since any misconfiguration or inconsistency potentially becomes an exploitable vulnerability. We are beyond the point where typically-sized clusters can be managed manually without automation support. The current state-of-the-art has automated cluster tools available for performance management; the challenge is developing automated cluster tools for security management.

4. Distributed Security Services

In the previous sections of this paper we presented the logic behind the need for a common infrastructure to implement security in a coherent way throughout an entire cluster. This section concentrates on cluster platform-level security. By platform security we mean the security mechanisms that are deployed at a user level across the entire cluster (with possible support at the operating system level).

To simplify, the distributed security functionality needed for a secure service invocation/communication between two objects in different nodes in the cluster can be summarized as follows:

1. Authenticating the source and target objects. This is fundamental in order to be able to securely define the credentials/privileges for each object in the system.

2. Deciding whether the source object can perform this action on the target object. This should be done according to the security policy already defined.

3. Auditing the action. For many clusters, this is optional based on the system functionality needed. Even though auditing is an often-needed function, its use is based on a trade-off between performance and security needs.

4. Protecting the data flow (requests and responses, the data exchanged, ...) from being modified or eavesdropped during the transit between nodes.

The above functionality is often implemented in many clusters even though it is not clearly defined.

Our approach is to qualify each functionality through a service and to provide the needed functionality by that service. Therefore, we define the following services, respectively: distributed authentication, distributed access control, distributed monitoring/auditing, and secure communications between different nodes. This service-oriented approach provides more flexibility (as services evolve, a service can be replaced or enhanced with new capabilities), scalability (since several instances of the same service could run on the same node), and fault tolerance (as high availability techniques can be used to provide service availability [22]). This approach also allows us to take a systematic approach to deploying security functions.

Services are inherently distributed across a cluster, interacting to maintain the distributed secure state of a cluster (Figure 1). In turn, each local service instance on different nodes should be built on top of existing security mechanisms at the operating system level or at the level of other node security mechanisms (Figure 2).

Microsoft recently initiated an effort to provide distributed security services for Windows 2000 [19]. Microsoft's approach is heavily based on PKI, Kerberos, and Active Directory. Active Directory is a repository for account information that enhances and scales the use of account information in different domains and stores the security policy for different domains. Microsoft added the support for some of these services at the operating system

¹ For example, when was the last time you threatened your tax organization to provide you with a decent, fast, and secure interface to submit your tax forms or you would change your supplier?

98
Authorized licensed use limited to: HIGHER EDUCATION COMMISSION. Downloaded on September 25,2024 at 10:12:11 UTC from IEEE Xplore. Restrictions apply.
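The four functions listed in Section 4 for a secure invocation between two objects (authenticate, decide, audit, protect the data flow) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not the design of any system described in this paper: the object names, the shared keys in `KEYS`, the allow-list in `POLICY`, and the `invoke` function are all hypothetical. The sketch also narrows step 4 to tamper detection with an HMAC tag; protection against eavesdropping would additionally require encryption (for example SSL/TLS), which is not shown.

```python
import hashlib
import hmac

# Hypothetical cluster state: one shared key per object and an allow-list policy.
KEYS = {"billing": b"k-billing", "ledger": b"k-ledger"}
POLICY = {("billing", "ledger", "read")}
AUDIT_LOG = []


def sign(key: bytes, payload: bytes) -> str:
    """Tag a payload so the receiver can detect modification in transit."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def invoke(source: str, target: str, action: str, payload: bytes, tag: str) -> str:
    # Step 1: authenticate. Both objects must be known to the cluster.
    key = KEYS.get(source)
    if key is None or target not in KEYS:
        raise PermissionError("unknown source or target")
    # Step 4 (integrity only): a tag that does not match means the payload was
    # modified or was not produced by the claimed source.
    if not hmac.compare_digest(sign(key, payload), tag):
        raise PermissionError("authentication or integrity check failed")
    # Step 2: decide according to the security policy already defined.
    allowed = (source, target, action) in POLICY
    # Step 3: audit the decision, whether it was granted or denied.
    AUDIT_LOG.append((source, target, action, allowed))
    if not allowed:
        raise PermissionError("denied by policy")
    return f"{target} performed {action}"


request = b"acct=42"
print(invoke("billing", "ledger", "read", request, sign(KEYS["billing"], request)))
```

Auditing is placed before the denial is raised so that refused requests are recorded too; the text notes that auditing is optional in many clusters, so a real service might make that step configurable.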
level. Unfortunately, their approach is heavily tied to specific technologies and lacks the flexibility necessary to be adapted to clusters with an operating system different from Windows.

Figure 1. Distributed Security Services at cluster level.

Figure 2. Distributed Security Services at node level (layers shown: Application Security, Middleware Security, Operating System Security).

There are also some efforts towards leveraging distributed security services from Grid computing [20]. These services cope with nodes being dynamically added to the Grid and support a wide scope of interoperability, since a Grid generally runs across heterogeneous environments. However, clusters often depend on unique administrative ownership and nodes are not scattered as dynamically as they are in Grids. This fundamentally changes the scope of these services. In the case of clusters, a narrower scope enhances the possibility of implementing specific mechanisms.

4.1 Distributed Authentication

The distributed authentication service has the goal of providing a homogeneous framework for the entire cluster in order to supply authentication information for objects in the cluster. The local authentication of objects at the node level is well known; the greater challenge is to propagate the authentication information in a transparent way across the cluster.

Kerberos mechanisms may implement such an infrastructure [7]. Several efforts toward its use have been deployed [19], resulting in many improvements leading to a reliable and proven protocol. However, deployments of Kerberos in real world environments show that it does not scale well when applied to large clusters. Furthermore, the protocol presents some single points of failure, which is unacceptable for many high-availability applications.

The use of digital certificates issued and verified by a Certificate Authority as part of a Public-Key Infrastructure is considered likely to become the standard way to perform authentication on the Internet. The widespread acceptance of PKI addresses practical issues such as deployment and trust management. The deployment of PKI inside a cluster is much easier since it avoids the major problems with traditional PKI deployment: scalability and trust management issues. There are a variety of methods for using PKI in this case. Mainly, certificate servers can be used to create and securely propagate certificates while maintaining the certificate revocation lists.

The target unit for authentication can be Unix users, nodes, or even applications/processes inside the cluster. Most of this choice depends on the type of cluster environment. The more straightforward approach is to use Unix users as the basic granularity. In this case, every user in the system holds a key pair which is typically distributed with certificates issued by a trusted CA.

In many applications, possibly only a few users exist on dedicated clusters for running a pre-defined set of software. However, the user-based security system does not support authentication and authorization checks for interactions between two processes belonging to the same user. This situation leads to an all-or-nothing approach, as all users within a group or all processes owned by the same user have the same rights. This is quite inconvenient when one wishes to compartmentalize these rather large distributed applications by restricting the access to some resources or some processes or users within the same group. Therefore, user-level granularity used as the basic entity for access control in these distributed applications may not be sufficient. In this scenario there is a need for a security mechanism with finer granularity which uses the individual processes as the basic entity being secured.

Although user authorization remains a fundamental preoccupation, applying security to applications has always depended directly on the user's permissions. For a given application, distributed authorization should determine its scope of execution throughout the cluster. Therefore, we work toward implementing better authentication of programs (i.e. binaries) instead of concentrating our authentication efforts around Unix users. As for Unix users, the Unix userid should be managed in a consistent way throughout an entire cluster.

In the end, a combination of process- and user-level security is the first step in implementing better security for clusters. Current systems offer all the tools and infrastructure for user authentication. As for authentication with process granularity, the DSI project addresses this issue by offering distributed security policies based on process classifications.

4.2 Distributed Access Control

Distributed access control presents a unified and consistent implementation of access control in the cluster, providing the platform with mechanisms to control access to the system resources uniformly throughout the cluster.

The standard Unix access control is based on Discretionary Access Control (DAC) for users. DAC means that users are in charge of defining permissions for different objects which belong to them. This approach necessitates correctly setting permissions for different services for each user throughout the entire cluster: file permissions, network service configuration, etc. should be set correctly for each user. More importantly, DAC provides no protection against flaws in the system software or malicious software installed on the system. There is also the legacy problem of services running as privileged users with coarse-grain privilege control. Recently, many alternatives have been developed: capabilities, access control lists (ACLs), sandboxing (BSD jails, chroot), and so on.

In contrast to DAC, Mandatory Access Control (MAC) is based on security policies and attributes defined by an administrator for different objects in the system. MAC may alleviate the risks related to DAC by providing for the confinement of the programs defined by the administrator in a security policy. As further detailed in [17], a process-level granularity approach based on MAC can dramatically improve the overall security of a cluster.

The mechanisms used to enforce process-level access control in a cluster need improvement. To this end, SELinux introduced the concept of network security identification tags (NSID) [13]. The NSID is inserted in network communications between applications. This allows for a mechanism to validate the permissions of the given application within its current context of execution. This again raises the question of how such measures should be implemented. The DSI project showed the feasibility of this approach by implementing a Linux operating system kernel-level module that performs real-time security verification based on the LSM hooks [5].

In terms of usability, our experience with SELinux and DSI has shown that process-level security involves an error-prone configuration task. Such a complicated task will eventually lead to administrative mistakes or simple misconfigurations. Therefore, we believe that with process granularity, a simplified scheme based on a higher level of abstraction is key to general acceptance. In the DSI project, we developed a high level of abstraction regarding access control, separating network, administrative, and computational processes into different security zones. However, this approach should be further extended to all aspects of distributed security. A few ongoing projects provide tools to simplify SELinux policy administration [21].

In summary, distributed access control should still be based on user-level access control if only for legacy issues. However, when possible or when the cluster handles sensitive data, a process-level granularity based on MAC in addition to DAC is a better solution. To use MAC over a cluster efficiently, we should then extend MAC to the entire cluster. Some research projects show the feasibility of this approach without major performance impact [5].

4.3 Distributed Monitoring

One of the major challenges in protecting a cluster is monitoring a set of distributed resources. Monitoring an entire cluster environment involves examining the state of several cluster resources including authentication mechanisms (Section 4.1), access control mechanisms (Section 4.2), activity on individual cluster nodes, software configuration across the cluster, network traffic (both internal and external to the cluster), and user behavior. Due to the inherently decentralized architecture of clusters, monitoring these various entities typically involves creating some type of unified view of the status of all cluster resources.

To create a unified view of the status of cluster resources, messages must be sent between nodes. Often, these messages are sent to a designated management node that is heavily protected and accessed only by the system administrator. This management node collates independent status reports from cluster resources and synthesizes a cohesive view of the entire cluster system. Two cluster-specific monitoring projects that employ such an architecture include Clumon [1] and Ganglia [11]. At a more basic level, however, each cluster node may simply be configured to send all Unix Syslog messages to the management node, which collates messages from all cluster resources into a single log file.

From another perspective, a cluster may be treated as a

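The basic monitoring arrangement described in Section 4.3, where a management node collates messages from all nodes into a single log, can be sketched as a merge of per-node streams. This is a minimal sketch under assumptions, not how Clumon, Ganglia, or any syslog daemon is implemented: the `collate` function, the node names, the messages, and the integer timestamps are hypothetical, and each node's entries are assumed to arrive already ordered by time.

```python
import heapq


def collate(node_logs):
    """Merge per-node logs into one time-ordered list.

    node_logs maps a node name to a list of (timestamp, message) tuples
    that is already sorted by timestamp. Each merged entry is tagged with
    the node it came from, giving a single cluster-wide view.
    """
    tagged = (
        [(ts, node, msg) for ts, msg in entries]
        for node, entries in node_logs.items()
    )
    # heapq.merge combines already-sorted inputs without re-sorting them.
    return list(heapq.merge(*tagged))


# Hypothetical messages received by the management node.
node_logs = {
    "node01": [(100, "sshd: accepted login"), (160, "job 17 started")],
    "node02": [(120, "sshd: failed login"), (130, "sshd: failed login")],
}

for ts, node, msg in collate(node_logs):
    print(ts, node, msg)
```

Once the entries sit in one ordered stream, events on different nodes can be correlated, which is the kind of cluster-wide view that per-node log files cannot give.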
black box by observing only the network traffic entering and leaving the cluster, but not the actual activity taking place within the cluster itself. For example, one may examine the status of smart hubs, routers, or network Intrusion Detection System sensors to analyze activity of the cluster. The idea here is that most malicious activity involving a cluster must at some point pass between the (external) attacker and the cluster, and this malicious activity can be observed at this point.

Finally, clusters typically contain a number of resources that can be leveraged to determine an overall view of cluster activity. One of the most obvious places to obtain information about the state of a cluster is from the cluster's batch job scheduler. This technique is particularly powerful when coupled with other monitoring techniques because information about cluster batch jobs can be correlated with other information to obtain a richer view of activity.

Each of these monitoring approaches presents information from across the cluster. While these monitoring approaches provide integrity checks and enhanced operation, their distributed nature makes them vulnerable to attacks since they involve some type of message passing. Encryption, properly implemented, can solve many of the problems but not all. Monitoring protocols should be formally verified using standard implementations. Monitors must be careful not to do more harm than good by being vulnerable to attack and subversion.

4.4 Distributed Secure Communications

Protecting the integrity of cluster communications is important. In this context, we include both communications related to the computational goals of the cluster (e.g., the communications related to parallel and distributed applications running on a cluster) as well as communications related to managing and monitoring a cluster. Further, because protecting communications often includes a high performance cost (e.g., the cost of encryption), it is sometimes useful to differentiate between intra-cluster communication and inter-cluster communication. In the case of intra-cluster communication where messages stay completely within the boundary of the cluster's System Area Network, an optimization may be to assume that this network is less prone to attack and to simply allow these communication operations to remain unsecured. This is in contrast to inter-cluster communication where messages likely travel over a public network that is well outside the control of cluster administrators, suggesting that securing such communication is important. In either case, having the granularity to determine which communication operations are secured and unsecured is useful. Various well-known and usable solutions including IPSec and SSL/TLS exist and can address these issues effectively.

There are several research projects showing the feasibility of implementing some of the above services in a cluster. In the following sections, we detail two existing research projects: DSI in Section 5 and NVisionCC in Section 6.

5. Distributed Security Infrastructure (DSI)

The DSI project [5] targets the distributed access control service. DSI began as a research project to support different security mechanisms to address the needs of telecommunications applications running on carrier-class Linux clusters. For the time being, DSI provides distributed mechanisms for access control, security management, and authentication.

The Distributed Security Infrastructure contains one security server (SS) and a security manager (SM) on each of the remaining cluster nodes. The SS is responsible for distributed security management of the cluster. It propagates the security policy and communicates via alarms and messages with the SMs on the nodes. Communication is done over the Secure Communication Channel (SCC). The SCC communications are encrypted using SSL/TLS over CORBA.

The versatility of DSI is in the fine-grained control that can be enforced on the node by the SMs. Various structures in the kernel such as sockets and processes can be assigned a security context identifier (ScID). ScIDs are global over the cluster and persistent. ScIDs are meant to group together processes that have the same security context. So, contrary to PIDs, ScIDs do not uniquely identify processes but security contexts. Similarly, each node is assigned a security node identifier (SnID). Hence, the distributed security policy (DSP) consists of a list of rules to be applied to (SnID, ScID) pairs.

For security mechanisms to be effective, users should not be able to bypass them. Hence, the best place to enforce security is at kernel level. Therefore, when necessary, all security decisions are implemented at kernel level, in the DSI Security Module (DSM). DSM is a set of kernel functions enforcing distributed security policy, and is implemented using LSM [23] as a Linux kernel module. As future work, in order to use the mainstream Linux tools, we consider using SELinux instead of our internally developed DSM Linux kernel module.

As presented in Section 4.1, there is a need for compartmentalization in large distributed applications. In order to compartmentalize large applications, DSI uses ScIDs to implement different virtual security zones. These security zones are defined with a process level granularity across the entire cluster. They are based on the process type and the node on which they are executing. A process instance can belong to different security zones. For example, the instances of the same process type can be defined in different security zones depending on the cluster node on which they are running. ScIDs do not identify different instances of a process type, but rather define the security zone they belong to. The security rules are defined in a central security policy file: the Distributed Security Policy (DSP). They define the possible interactions between different security zones in the entire cluster. The DSP file can be used by the administrator to define a homogeneous view of the cluster. This is particularly convenient for carrier-class clusters, which are not running a wide range of applications; this makes it possible to predefine interactions between different zones. This flexible mechanism can be used to confine untrusted software or, in an extreme case, run it inside a sandbox. DSP changes are automatically propagated to all nodes of the cluster. The security managers are in charge of communicating these new rules to the local DSM, providing a dynamic evolution of the security behavior of the cluster.

A more detailed presentation of DSI can be found in [16, 5].

6. Scalable Cluster Security Monitoring

ters in High-Performance Computing environments. NVisionCC implements a new security monitoring paradigm based on the observation that although the average number of nodes in clusters is increasing, these nodes can be divided into distinct classes that exhibit relatively homogeneous behavior for each node in a given class. For example, login nodes, compute nodes, storage nodes, and management nodes are all common classes of nodes found in most clusters. All cluster nodes within each class typically have similar operational characteristics such as the list of expected processes, installed software, network traffic patterns and port activity, and user behavior [25]. This allows a profile of a given node's steady-state behavior to be created, a feature that is feasible in the unique HPC cluster environment (as opposed to an enterprise network environment) due to the observation that a dedicated cluster environment represents a constrained set of circumscribed activities and states that can be enumerated. Thus, the problem of monitoring the security of a cluster of hundreds or thousands of nodes is reduced to the much more feasible problem of
scanning each cluster node for deviations from its expected
The trend in cluster computing, particularly in High- behavior profile. For example, instead of monitoring hun-
Performance Computing settings where large computa- dreds or thousands of compute nodes in a cluster indepen-
tional problems require huge amounts of processing power, dently, a single profile can configured for all compute nodes
is for larger clusters with increasing node counts. Further, which defines the list of expected processes, installed soft-
decreasing per-node costs have accelerated this trend for ware, network traffic patterns and port activity, and user be-
larger clusters. As the average size of clusters grows, how- havior. In this scalable way, hundreds or thousands of com-
ever, security monitoring techniques that may have worked pute nodes can be compared to this one profile and analyzed
well for smaller clusters are often no longer effective. There for unexpected activity.
are three primary reasons why traditional techniques do not While the scalable processing of potential security events
scale: is a focus, ultimately it is the scalable communication of
security information to the human operator that may rep-
1. Security management tools are predominantly resent the most difficult security challenge. Leveraging
command-line interfaces designed for monitoring a the successful Clumon cluster management GUI developed
small number of entities. at the National Center for Supercomputing Applications,
NVisionCC has designed a visualization framework that
2. Human cognitive abilities to perceive, understand, de- presents information to a human operator with the following
cide, and react do not scale in the same way as cluster characteristics:
size, speed, and functionality.
1. All nodes within an entire cluster are shown on a single
3. Existing security management tools are designed for screen. Nodes are shown as being adjacent in space,
enterprise environments. Currently only one secu- not stacked in time nor scrolled at the bottom.
rity management tool exists that has been specifically
designed for the unique cluster environment (NVi- 2. An overview of an entire cluster is given along with
sionCC) [24]. the ability to "drill down" to areas of interest, revealing
raw data details at the individual node level.
As clusters grow in size, the combinatorics resulting
from monitoring an increasing number of statistics about 3. Smallest effective differences in shape and color indi-
the cluster nodes, the aggregate processes running across cate information.
all nodes, the installed software packages, file system and 4. Different icons show different levels of process secu-
network I/O, etc., become intractable without a new security status: critical, bad, suspicious, and normal[24].
rity monitoring paradigm specifically designed for clusters.
NVisionCC is a security monitoring tool specifically de- While using the NVisionCC profiling approach provides
signed to address the challenges presented by large clus- dramatically increased security monitoring scalability, there

is a limit to this approach (security monitor processing power versus cluster node size). NVisionCC relies on polling an agent, Performance Co-Pilot (PCP) [14], running on each cluster node. As the number of cluster nodes increases beyond a certain point, or as the aggregate node activity increases beyond a certain level, this polling approach no longer scales. The exact scalability breaking point for the polling approach is an open question currently being investigated.

An alternative to polling is an interrupt-based approach that sends only significant change events to the central monitoring process for analysis. The advantage of this model is that security events are analyzed in a more timely fashion as they occur rather than at discrete polling intervals. The significant drawback of this approach is that current operating systems are not directly instrumented for such monitoring, and implementing it would entail deploying loadable kernel modules onto each cluster node. These modules must be extensively tested before being accepted into a production environment, and must also typically be upgraded with each kernel revision. Further, these modules may lead to unacceptable decreases in the processor cycles delivered to application software running on cluster nodes.

7. Deployment Issues

Deploying clusters out of the HPC environment and into the mainstream has been slower than it should be due to installation and management challenges; clusters are not trivial to set up and manage. Cluster installation packages like OSCAR [4] and ROCKS [18] are making cluster installation easier. Of further help is the fact that a large percentage of system software is common within a particular cluster node class (login node, compute node, storage node, management node, etc.). However, while clusters may start out as homogeneous within a node class, this homogeneity can quickly erode as hardware and software are added and replaced. After installation, cluster management consists of performance tuning by benchmarking and configuring CPU capabilities, memory subsystems, I/O subsystems, and compiler options. The time and learning investment in properly tuning a cluster can be substantial. Unified policies and centralized management can help by lowering barriers to cluster deployment: the so-called "rule of one", meaning one system administrator with one plan, one set of user policies, and one help desk for handling questions and problems, especially those related to security management.

A cluster implementation is typically not performed by the cluster security software developers, especially in the carrier-class commercial case, but coordination and training between developers and implementers is important since even a carefully designed cluster security system is worthless if not properly deployed. Some issues that need to be coordinated include the expected load range on the cluster in order to tune security monitor performance and minimize performance hits, the expected number of users for tuning the cluster authentication system, and the expected storage behavior (interactive or write-once) and storage size to tune the cluster security access policy. One way to handle this is to define attributes in a configuration file where values can be set. To contrast the approaches we present in this paper, these attribute values will be very different in carrier-class clusters versus general-purpose HPC clusters.

8. Summary

In this paper we have presented complementary security approaches for clusters across a range from carrier-class clusters to High-Performance Computing clusters. At the carrier-class end of the cluster environment spectrum, clusters must be locked down to a maximum extent with an emphasis on production reliability. The corresponding security approach we present for this carrier-class cluster environment is a unified security model including distributed authentication and distributed access control. At the HPC end of the cluster environment spectrum, clusters must be flexible to handle a dynamic user constituency executing a wide range of applications. The corresponding security approach we present for this HPC cluster environment focuses on multi-dimensional detection designed to enable real-time cluster security management that adapts quickly and automatically to changing situations. NVisionCC is the first security intrusion detection system specifically designed for the unique HPC cluster environment.

Issues specific to cluster security have traditionally not been studied extensively, and it is our hope that this paper sparks discussion. There is much more work to be done in areas such as scalable cluster monitoring, intuitive human interfaces to security tools, interconnect security, masquerade detection, and flexible protection that evolves with incremental cluster growth.

References

[1] Clumon - The Cluster Monitoring System. http://clumon.ncsa.uiuc.edu/.

[2] CORBA Security Service Specification, Object Management Group, version 1.8, March 2002.

[3] CSI/FBI Computer Crime and Security Survey, Computer Security Institute, 2004.

[4] B. des Ligneris, S. Scott, T. Naughton, and N. Gorsuch, Open Source Cluster Application Resources (OSCAR): Design, Implementation, and Interest for

the [Computer] Scientific Community, First OSCAR Symposium, 2003.

[5] Distributed Security Infrastructure Open Source Project, http://disec.sourceforge.net.

[6] B. Hartman, D. Flinn, and K. Beznosov, Enterprise Security with EJB and CORBA, Wiley, 2001.

[7] J. Kohl and C. Neuman, The Kerberos Network Authentication Service (V5), IETF RFC 1510, September 1993. http://cryptnet.net/mirrors/rfcs/rfc1510.txt

[8] B. Krebs, Hackers Strike Advanced Computing Networks, Washington Post, April 2004.

[9] B. LaMacchia, S. Lange, M. Lyons, R. Martin, and K. Price, .NET Framework Security, Pearson Education, 2002.

[10] U. Lang, Access Policies for Middleware, University of Cambridge Technical Report UCAM-CL-TR-564, May 2003.

[11] M.L. Massie, B.N. Chun, and D.E. Culler, The Ganglia Distributed Monitoring System: Design, Implementation, and Experience, Parallel Computing, Vol. 30, Issue 7, 2004.

[12] Multiple Unix Compromises on Campus, Stanford ITSS Security Alert, April 2004. http://securecomputing.stanford.edu/alerts/multiple-unix-6apr2004.html

[13] Network Packet Labeling, http://www.nsa.gov/selinux/papers/module/x2794.html.

[14] Performance Co-Pilot. http://oss.sgi.com/projects/pcp/

[15] T. Perrine and D. Kowatch, Teracrack: Password Cracking Using TeraFLOP and Petabyte Resources, San Diego Supercomputer Center Security Group Technical Report, 2003. http://security.sdsc.edu/publications/teracrack.pdf.

[16] M. Pourzandi, I. Haddad, C. Levert, M. Zakrzewski, and M. Dagenais, A Distributed Security Infrastructure for Carrier Class Linux, Fourth Annual Ottawa Linux Symposium, 2002.

[17] M. Pourzandi, A new Distributed Security Model for Linux Clusters, Usenix, 2004.

[18] ROCKS Cluster Distribution, http://www.rocksclusters.org/Rocks/

[19] Secure Networking Using Windows 2000: Distributed Security Services, Microsoft White Paper, 1999.

[20] T. Seki, OGSA Introductory Session by IBM, Framework for Commercial Grids, IBM Japan, 2002. www.gridforumkorea.org/workshop/2002/2002_winter/01-Tutoriall-TakanoriSeki(IBM).pdf

[21] SE Linux Policy Tools, http://www.tresys.com/selinux/selinux_policy_tools.html.

[22] Service Availability Forum, http://www.saforum.org/home

[23] C. Wright, C. Cowan, S. Smalley, J. Morris, and G. Kroah-Hartman, Linux Security Modules: General Security Support for the Linux Kernel, Usenix Security Symposium, 2002. http://lsm.immunix.org.

[24] W. Yurcik, X. Meng, and N. Kiyanclar, NVisionCC: A Visualization Framework for High Performance Cluster Security, ACM CCS Workshop on Visualization and Data Mining for Computer Security (VizSEC/DMSEC), 2004.

[25] W. Yurcik, G. A. Koenig, X. Meng, and J. Greenseid, Cluster Security as a Unique Problem with Emergent Properties: Issues and Techniques, 5th LCI Intl. Conference on Linux Clusters, 2004.
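
The class-profile comparison described in Section 6 can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not NVisionCC's implementation: the names `ClassProfile`, `NodeObservation`, and `find_deviations`, the observation format, and the sample processes and ports are all hypothetical. Each node is checked against the single profile of its node class, and only nodes that deviate from that profile are reported.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClassProfile:
    """Expected steady-state behavior shared by every node in one class (hypothetical)."""
    node_class: str
    expected_processes: frozenset
    expected_ports: frozenset


@dataclass
class NodeObservation:
    """What a monitoring agent reported for one node (hypothetical format)."""
    name: str
    node_class: str
    processes: set = field(default_factory=set)
    open_ports: set = field(default_factory=set)


def find_deviations(profiles, observations):
    """Compare each node against the single profile of its class.

    Returns {node name: list of findings} for deviating nodes only.
    """
    report = {}
    for obs in observations:
        profile = profiles.get(obs.node_class)
        if profile is None:
            report[obs.name] = [f"unknown node class: {obs.node_class}"]
            continue
        # Anything observed that the class profile does not list is a deviation.
        findings = [f"unexpected process: {p}"
                    for p in sorted(obs.processes - profile.expected_processes)]
        findings += [f"unexpected port: {port}"
                     for port in sorted(obs.open_ports - profile.expected_ports)]
        if findings:
            report[obs.name] = findings
    return report


# Hypothetical sample data: one profile covers all compute nodes.
profiles = {
    "compute": ClassProfile("compute", frozenset({"sshd", "pbs_mom"}), frozenset({22})),
}
observations = [
    NodeObservation("c001", "compute", {"sshd", "pbs_mom"}, {22}),
    NodeObservation("c002", "compute", {"sshd", "pbs_mom", "ircd"}, {22, 6667}),
]
report = find_deviations(profiles, observations)
print(report)
```

Because every node in a class shares one profile, the number of expectations to maintain grows with the number of node classes rather than with the number of nodes.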
