Abstract—As military, academic, and commercial computing systems evolve from autonomous entities that deliver computing products into network-centric enterprise systems that deliver computing as a service, opportunities emerge to consolidate computing resources, software, and information through cloud computing. Along with these opportunities come challenges, particularly to service providers and operations centers that struggle to monitor and manage quality of service (QoS) for these services in order to meet customer service commitments. Traditional approaches fall short in addressing these challenges because they examine QoS from a limited perspective rather than from a system-of-systems (SoS) perspective applicable to a net-centric enterprise system in which any user from any location can share computing resources at any time. This paper presents a SoS approach to enable QoS monitoring, management, and response for enterprise systems that deliver computing as a service through a cloud computing environment. A concrete example is provided for application of this new SoS approach to a real-world scenario (viz., distributed denial of service). Simulated results confirm the efficacy of the approach.

Index Terms—Cloud computing, distributed denial of service (DDoS), enterprise systems, information assurance, net centric, quality of service (QoS), security, service-oriented architecture (SOA), systems of systems (SoS).

I. INTRODUCTION

AS ECONOMIC pressure intensifies for network and enterprise operations centers, those responsible for these centers seek a method to lower costs in the presence of data overload. This dramatic increase in the quantity of data is a product of the evolution of complex net-centric enterprise systems over which multiple disparate users in dispersed locations share gigabytes, terabytes, or even petabytes of data at high speeds over production networks. Cloud computing provides one possible method to meet this challenge by reducing cost through shared computing resources while distributing data to multiple end users efficiently [1], [2]. However, complex systems that use cloud computing, such as shown in Fig. 1, are prone to failure and security compromise in five main areas: computing performance (e.g., latency and time delay experienced by a system when processing a request), cloud reliability (e.g., network connectivity), economic goals (e.g., interoperability between cloud providers), compliance (e.g., digital forensics to discern what happened, learn how to prevent incidents, and collect information for future actions), and information security (e.g., to protect the confidentiality and integrity of data and ensure data availability) [1], [2]. Present approaches to mitigate these issues are not sufficient to ensure quality of service (QoS) for end users [3].

Recent survey articles [4]–[7] summarize the challenges for cloud computing: to provide mechanisms that mitigate the issues listed above, to monitor the various layers of the cloud computing environment, and to provide end-to-end solutions for these problems. The architectural frameworks considered in these excellent surveys are one dimensional and principally deal with the infrastructure, platform, and software as service layers of cloud computing. Federated clouds with cross-cloud connectivity provide an opportunity mentioned in [6] to support service-oriented computing, and QoS is discussed in [8] but is limited in application to the task of virtual machine provisioning in data centers rather than to end-user satisfiability. None treats clouds from a system-of-systems point of view as is done here. The architectural model used here to support a service-oriented architecture (SOA) is multidimensional, extending to layers of business and governance as services, while reducing complexity by combining the traditional platform and applications layers into one software layer that provides overall Software as a Service (SaaS) to end users. Furthermore, specific locations and the use of software agents for monitoring and observation in this system of systems (SoS) are presented.

Motivation: QoS specification and monitoring for cloud services is a complex and challenging issue, one where there are few universal benchmarks or standards. While the usual quality metrics such as uptime and reliability may still be considered applicable in the context of cloud systems, it is less clear what the QoS parameters unique to such systems are and how they should be applied in specific contexts. The lack of metrics common to the cloud offerings of different providers is also a barrier to standardization of those offerings. Thus, although "federated" clouds using services from different providers are thought to

Manuscript received November 16, 2012; revised September 19, 2013; accepted December 10, 2013.
P. C. Hershey is with Raytheon Intelligence, Information and Services, Dulles, VA 20166 USA (e-mail: [email protected]).
S. Rao is with the International Institute of Information Technology-Bangalore, Bangalore 560100, India (e-mail: [email protected]).
C. B. Silio, Jr., is with the University of Maryland, College Park, MD 20742 USA (e-mail: [email protected]).
A. Narayan is with the School of Computing, National University of Singapore, Singapore 117417 (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JSYST.2013.2295961
1932-8184 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
HERSHEY et al.: SoS FOR QoS OBSERVATION AND RESPONSE IN CLOUD COMPUTING ENVIRONMENTS 3
through the DoD Information Technology Security Certification and Accreditation Process. Metrics within the C&A metrics category identify security risks and deficiencies, provide information to help ensure that steps are taken to correct these deficiencies and vulnerabilities, and provide information to help ensure the safeguarding of applications, networks, systems, data, and information.

Step 3b: Measuring performance metrics. When considering heterogeneous resource types, having a baseline for measuring all performance and security metrics is nontrivial. In our work, we measure low-level independent performance metrics such as delay at each level of EMMRA CC. In the prototype of an online transaction processing application, we measure the response time at each of the layers. Specifically, we measure the following: 1) the time taken for a database query to return its results (on the cloud instance hosting the database); 2) the time taken by the application logic to execute (on the cloud instance hosting the application server); and 3) the time taken for data to be transmitted over the network between the application server and the database server. Throughput is measured as the number of transactions completed when viewed from the application perspective and hence is measured on the instance hosting the application server. A similar approach to measuring performance metrics for SLA management has been used in earlier work [22].

Step 4: Identify suitable locations within the cloud computing environment for metric detection. This step identifies suitable locations to observe and collect the metrics in Table II. Fig. 4 shows these locations for an expanded set of metrics for security (green boxes), delay (yellow boxes), and throughput (blue boxes) for a representative cloud computing environment realized through a SOA-based net-centric enterprise system. This system includes four user communities: 1) end users, shown with the client workstation; 2) help desk, shown with the trouble management system database; 3) operations, shown with the configuration management database; and 4) engineering, shown with the project control and development tracking database. With respect to the system components, the end-user client machines that run applications over the base/camp/post/station network appear at the left side of the figure. The network edge appears as a Customer Edge (CE) router attached to a High Assurance IP Encryptor (HAIPE) device and a Provider Edge (PE) router, along with a cache. The core network includes information transported via Multiprotocol Label Switching (MPLS), Dense Wavelength-Division Multiplexing, and Synchronous Optical Network services. The network terminates at a Defense Enterprise Computing Center (DECC), with the connection from a second PE router to a HAIPE device and then to a DECC Edge (DE) router and a DECC LAN. To the right of the DE are the computing services to be shared through a cloud computing environment, including those for Dynamic Host Configuration Protocol (DHCP), Domain Name System (DNS), Discovery, App Server, Portal, Message Queue, Web Server, Service Node, Security, and Database. As an example, we provide the rationale used to select the locations for IA/security metrics.

Authentication: Cyber security can be monitored at the Apps, Portal, and Security/SSO servers. For example, EMMRA CC agents can monitor Security Assertion Markup Language authentication assertions at the Security/SSO server. These
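The three response-time components listed in Step 3b (database query, application logic, and network transfer) can be sketched as a small timing harness. The following Python sketch is illustrative only, not the authors' EMMRA CC agent implementation: the function names and stub workloads are our assumptions, and attributing the residual time to the network is a simplification (in the prototype, each component is measured on the cloud instance that hosts it).

```python
import time

def measure_transaction(run_query, run_logic):
    """Time the layers of one online transaction (illustrative sketch).

    run_query -- callable that executes the database query
    run_logic -- callable that runs the application logic on the query results
    Returns per-layer delays in seconds; the residual after subtracting the
    database and application times is attributed here to network transfer.
    """
    t_start = time.perf_counter()
    rows = run_query()                      # 1) database query time
    t_db = time.perf_counter() - t_start

    t_logic = time.perf_counter()
    run_logic(rows)                         # 2) application-logic time
    t_app = time.perf_counter() - t_logic

    t_total = time.perf_counter() - t_start
    t_net = t_total - t_db - t_app          # 3) residual: network transfer time
    return {"db": t_db, "app": t_app, "net": t_net, "total": t_total}

# Stub workloads stand in for the real query and application logic:
metrics = measure_transaction(lambda: [1, 2, 3], lambda rows: sum(rows))
```

Throughput would then be derived on the application-server instance by counting completed calls to such a harness per unit time, matching the application-perspective definition above.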
T_I = n × Transaction Throughput
T_S = m × T_I
T_B = q × T_S.    (2)
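Reading T_I, T_S, and T_B in (2) as throughputs aggregated at successive EMMRA CC levels, the chained relation can be checked with a worked instance; the multiplier values n, m, and q below are illustrative assumptions, not values from the paper.

```python
# Worked instance of the throughput relation (2); n, m, q are illustrative.
transaction_throughput = 120   # application-level transactions per minute
n, m, q = 2, 3, 1              # assumed scaling factors between levels

T_I = n * transaction_throughput   # per (2): T_I = n x Transaction Throughput
T_S = m * T_I                      # T_S = m x T_I
T_B = q * T_S                      # T_B = q x T_S

print(T_I, T_S, T_B)  # 240 720 720
```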
Fig. 7. Operational view of EMMRA CC agents and CA node deployment to counter DDoS attacks.
TABLE IV
Delay Recorded in Ten Sample Transactions
Fig. 8. Variation in delay recorded over time.
Fig. 10. Throughput: Number of transactions per minute.
we monitor delay at various levels of EMMRA CC. We notice that the delay perceived at the business or governance level in our work is the sum total of the delay experienced at every level of EMMRA. This follows the relation established in (1). Table IV indicates that the SoS perspective of the delay metric is additive in nature, as discussed in Section III.

Monitoring and characterizing delay helps determine the necessary configuration for a particular type of application stack to be deployed on the cloud. The delay modeling in Section III is an attempt to achieve this, and the experimental evidence provided here supports the model defined earlier.

Although characterization of delay in a deployment is advantageous, it has been noted that delay is not constant in the system. The following section describes the variation in delay and its implications.

delay in the setup. For example, a regular pattern in spikes clearly indicates a set of operations that cause repeated delay in the system. Such operations can be isolated, and remedial actions can be suggested to the users to avert a QoS breach. Another relevant variation in delay is the difference between the observed delay and the delay computed using the model in (1) (without the correction factors p_i). This variation accounts for the component-induced performance degradation through which delay is induced in the system. In Fig. 9, we notice this difference between the observed and the computed delay values. To account for this component-induced performance degradation, we need to include correction factors in the delay model; hence, it is apt to have a p_i at each level in EMMRA CC. It may be necessary to perform a calibration run to determine accurate values of p_i in a particular setup.
TABLE V
Throughput of Ten Sample Transactions
table represents the number of database transactions needed to complete phase 2 of our application. The total throughput shown in Table V follows the additive rule presented in Section III.

VI. CONCLUSION

The new approach presented in this paper enables cloud computing service providers and operations centers to meet committed customer QoS levels using a trusted QoS metric collection and analysis implementation scheme that extends traditional monitoring, management, and response for IaaS and SaaS to a complete SOA stack that includes business logic (BaaS) and governance (GaaS). This paper includes real-world scenarios that describe the application of this approach to voice and data systems for performance metrics and to DDoS for security metrics. Next steps include simulating these scenarios to quantify the effectiveness of this approach with respect to operations center response time to restore QoS in the presence of anomalous enterprise events.

REFERENCES

[1] P. Mell and T. Grance, "The NIST definition of cloud computing," Natl. Inst. Standards Technol. (NIST), U.S. Dept. of Commerce, Gaithersburg, MD, USA, NIST Special Publication 800-145, Sep. 2011. [Online]. Available: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
[2] L. Badger, T. Grance, R. Patt-Corner, and J. Voas, "Cloud computing synopsis and recommendations," Natl. Inst. Standards Technol. (NIST), U.S. Dept. of Commerce, Gaithersburg, MD, USA, NIST Special Publication 800-146, May 2012. [Online]. Available: http://csrc.nist.gov/publications/nistpubs/800-146/sp800-146.pdf
[3] M. F. Mithani, M. Salsburg, and S. Rao, "A decision support system for moving workloads to public clouds," in Proc. Annu. Int. Conf. CCV, Singapore, May 2010, pp. 3–9.
[4] J. Spring, "Monitoring cloud computing by layer, Part 1," IEEE Security Privacy, vol. 9, no. 2, pp. 66–68, Mar./Apr. 2011.
[5] J. Spring, "Monitoring cloud computing by layer, Part 2," IEEE Security Privacy, vol. 9, no. 3, pp. 52–55, May/Jun. 2011.
[6] Y. Wei and M. Blake, "Service-oriented computing and cloud computing: Challenges and opportunities," IEEE Internet Comput., vol. 14, no. 6, pp. 72–75, Nov./Dec. 2010.
[7] Q. Zhang, L. Cheng, and R. Boutaba, "Cloud computing: State-of-the-art and research challenges," J. Internet Serv. Appl., vol. 1, no. 1, pp. 7–18, May 2010.
[8] R. Calheiros, R. Ranjan, and R. Buyya, "Virtual machine provisioning based on analytical performance and QoS in cloud computing environments," in Proc. ICPP, 2011, pp. 295–304.
[9] B. Rochwerger, D. Breitgand, A. Epstein, D. Hadas, I. Loy, K. Nagin, J. Tordsson, C. Ragusa, M. Villari, S. Clayman, E. Levy, A. Maraschini, P. Massonet, H. Muoz, and G. Tofetti, "Reservoir—When one cloud is not enough," Computer, vol. 44, no. 3, pp. 44–51, Mar. 2011.
[10] G.-H. Tzeng and J.-J. Huang, Multiple Attribute Decision Making: Methods and Applications. Boca Raton, FL, USA: CRC Press, Jun. 2011.
[11] L. Zeng, B. Benatallah, A. H. H. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, "QoS-aware middleware for web services composition," IEEE Trans. Softw. Eng., vol. 30, no. 5, pp. 311–327, May 2004.
[12] T. L. Saaty, Multicriteria Decision Making: The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. Pittsburgh, PA, USA: RWS Publications, 1990.
[13] A. S. Prasad and S. Rao, "A mechanism design approach to resource procurement in cloud computing," IEEE Trans. Comput., vol. 63, no. 1, pp. 17–30, Jan. 2014.
[14] M. F. Mithani and S. Rao, "Improving resource allocation in multi-tier cloud systems," in Proc. 6th Annu. IEEE Int. SysCon, Vancouver, BC, Canada, Mar. 2012, pp. 356–361.
[15] G. A. Lewis, E. Morris, P. Place, S. Simanta, and D. B. Smith, "Requirements engineering for systems of systems," in Proc. 3rd Annu. IEEE Int. SysCon, Mar. 2009, pp. 247–252.
[16] S. M. White, "Modeling a system of systems to analyze requirements," in Proc. 3rd Annu. IEEE Int. SysCon, Mar. 2009, pp. 83–89.
[17] Defense Acquisition Guidebook (DAG), Jan. 2012. [Online]. Available: https://dag.dau.mil/Pages/Default.aspx
[18] Systems Engineering Guide for Systems of Systems, Version 1.0, ser. OUSD (A&T) SSE, Aug. 2008.
[19] P. Hershey and D. Runyon, "SOA monitoring for enterprise computing systems," in Proc. 11th Int. IEEE EDOC Conf., Oct. 2007, pp. 443–450.
[20] P. J. Denning and J. P. Buzen, "The operational analysis of queueing network models," ACM Comput. Surv., vol. 10, no. 3, pp. 225–261, Sep. 1978.
[21] DoD Instruction 5200.40: DoD Information Technology Security Certification and Accreditation Process, Dec. 1997. [Online]. Available: http://csrc.nist.gov/groups/SMA/fasp/documents/c&a/DLABSP/i520040p.pdf
[22] V. Emeakaroha, I. Brandic, M. Maurer, and S. Dustdar, "Low level metrics to high level SLAs—LoM2HiS framework: Bridging the gap between monitored metrics and SLA parameters in cloud environments," in Proc. 2010 Int. Conf. HPCS, 2010, pp. 48–54.
[23] DoD Instruction 5200.80: Security of DoD Installations and Resources, 2005. [Online]. Available: http://www.dtic.mil/whs/directives/corres/pdf/520008p.pdf
[24] IETF, "A one-way delay metric for IPPM," Fremont, CA, USA, RFC 2679. [Online]. Available: http://www.ietf.org/rfc/rfc2679.txt
[25] J.-Y. Girard, "Linear logic," Theoretical Comput. Sci., vol. 50, no. 1, pp. 1–101, 1987.
[26] P. Hershey and C. Silio, "Procedure for detection of and response to distributed denial of service cyber attacks on complex enterprise systems," in Proc. 6th Annu. IEEE Int. SysCon, Mar. 2012, pp. 85–90.
[27] Network Infrastructure Technology Overview, Version 8, Release 5, Apr. 27, 2012, (ArchiveEntry=U_Network_V8R5_Overview.pdf). [Online]. Available: http://goo.gl/Ja8Ajk
[28] Enclave Security Technical Implementation Guide, Jan. 3, 2011, (ArchiveEntry=U_Enclave_V4R3_STIG.pdf). [Online]. Available: http://goo.gl/Xhy4L2
[29] A. Iyengar, M. Squillante, and L. Zhang, "Analysis and characterization of large-scale web server access patterns and performance," World Wide Web, vol. 2, no. 1/2, pp. 85–100, Jun. 1999.