Big Data Analytics For Security

The document discusses how big data analytics is being applied to security problems. It enables analyzing large, heterogeneous security data at unprecedented scales. Technologies like Hadoop allow efficient storage and analysis of terabytes of security logs and events. This facilitates applications like advanced persistent threat detection by long-term correlation of diverse internal and external data sources. However, privacy and data provenance are challenges to address when using big data for security applications.

Uploaded by

Dinesh Anbumani

SYSTEMS SECURITY

Editors: Patrick McDaniel, [email protected] | Sean W. Smith, [email protected]

Big Data Analytics for Security


Alvaro A. Cárdenas | University of Texas at Dallas
Pratyusa K. Manadhata | HP Labs
Sreeranga P. Rajan | Fujitsu Laboratories of America

November/December 2013

Enterprises routinely collect terabytes of security-relevant data (for instance, network events, software application events, and people's action events) for regulatory compliance and post hoc forensic analysis. Large enterprises generate an estimated 10 to 100 billion events per day, depending on size. These numbers will only grow as enterprises enable event logging in more sources, hire more employees, deploy more devices, and run more software. Unfortunately, this volume and variety of data quickly become overwhelming. Existing analytical techniques don't work well at large scales and typically produce so many false positives that their efficacy is undermined. The problem becomes worse as enterprises move to cloud architectures and collect much more data.
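To make the false-positive problem concrete, the sketch below works through the arithmetic at the event volumes quoted above. The false-positive rate (100 per million events, that is, 0.01 percent) and the analyst triage capacity (200 alerts per day) are purely illustrative assumptions rather than figures from the report, and `false_alarms_per_day` and `analysts_needed` are hypothetical helper names.

```python
# Back-of-the-envelope arithmetic for alert volume at enterprise scale.
# Event volumes come from the text (10 to 100 billion events per day).
# The false-positive rate and triage capacity are illustrative assumptions,
# and the sketch assumes nearly all events are benign.

def false_alarms_per_day(events_per_day: int, fp_per_million: int) -> int:
    """Expected false alarms per day, given false positives per million events."""
    return events_per_day * fp_per_million // 1_000_000


def analysts_needed(alerts_per_day: int, alerts_per_analyst: int) -> int:
    """Analysts required to triage every alert (ceiling division)."""
    return -(-alerts_per_day // alerts_per_analyst)


FP_PER_MILLION = 100       # assumed: 0.01 percent of events raise a false alert
ALERTS_PER_ANALYST = 200   # assumed: alerts one analyst can triage per day

for events in (10_000_000_000, 100_000_000_000):
    alarms = false_alarms_per_day(events, FP_PER_MILLION)
    staff = analysts_needed(alarms, ALERTS_PER_ANALYST)
    print(f"{events:,} events/day -> {alarms:,} false alarms/day, {staff:,} analysts")
```

Under these assumptions, a detector that errs only once in 10,000 events still produces a million false alarms per day at the low end of the range, which is why a seemingly small error rate can undermine a technique at this scale.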
Big data analytics, the large-scale analysis and processing of information, is in active use in several fields and, in recent years, has attracted the interest of the security community for its promised ability to analyze and correlate security-related data efficiently and at unprecedented scale. Differentiating between traditional data analysis and big data analytics for security is, however, not straightforward. After all, the information security community has been leveraging the analysis of network traffic, system logs, and other information sources to identify threats and detect malicious activities for more than a decade, and it's not clear how these conventional approaches differ from big data.
To address this and other questions, the Cloud Security Alliance (CSA) created the Big Data Working Group in 2012. The group consists of volunteers from industry and academia working together to identify principles, guidelines, and challenges in this field. Its latest report, "Big Data Analytics for Security Intelligence" (https://cloudsecurityalliance.org/download/big-data-analytics-for-security-intelligence), focuses on big data's role in security. The report details how the security analytics landscape is changing with the introduction and widespread use of new tools to leverage large quantities of structured and unstructured data. It also outlines some of the fundamental differences from traditional analytics and highlights possible research directions. We summarize some of the report's key points.

Copublished by the IEEE Computer and Reliability Societies

Advances in Big Data Analytics

Data-driven information security dates back to bank fraud detection and anomaly-based intrusion detection systems (IDSs). Although analyzing logs, network flows, and system events for forensics and intrusion detection has been a problem in the information security community for decades, conventional technologies aren't always adequate to support long-term, large-scale analytics for several reasons. First, retaining large quantities of data wasn't economically feasible before. As a result, in traditional infrastructures, most event logs and other recorded computer activities were deleted after a fixed retention period (for instance, 60 days). Second, performing analytics and complex queries on large, unstructured datasets with incomplete and noisy features was inefficient. For example, several popular security information and event management (SIEM) tools weren't designed to analyze and manage unstructured data and were rigidly bound to predefined schemas. However, new big data applications are starting to become part of security management software because they can help clean, prepare, and query data in heterogeneous, incomplete, and noisy formats efficiently. Finally, the management of large data warehouses has traditionally been expensive, and their deployment usually requires strong business cases. The Hadoop framework and other big data tools are now commoditizing the deployment of large-scale, reliable clusters and therefore are enabling new opportunities to process and analyze data.

1540-7993/13/$31.00 © 2013 IEEE

Fraud detection is one of the most visible uses for big data analytics: credit card and phone companies have conducted large-scale fraud detection for decades; however, the custom-built infrastructure necessary to mine big data for fraud detection wasn't economical enough to have wide-scale adoption. One of the main impacts from big data technologies is that they're facilitating a wide variety of industries to build affordable infrastructures for security monitoring.
In particular, new big data technologies, such as the Hadoop ecosystem (including Pig, Hive, Mahout, and RHadoop), stream mining, complex-event processing, and NoSQL databases, are enabling the analysis of large-scale, heterogeneous datasets at unprecedented scales and speeds. These technologies are transforming security analytics by facilitating the storage, maintenance, and analysis of security information. For instance, the WINE platform [1] and BotCloud [2] allow the use of MapReduce to efficiently process data for security analysis.
We can identify some of these trends by looking at how reactive security tools have changed in the past decade. When the market for IDS sensors grew, network monitoring sensors and logging tools were deployed in enterprise networks; however, managing the alerts from these diverse data sources became a challenging task. As a result, security vendors started the development of SIEMs, which aimed to aggregate and correlate alarms and other network statistics and present all this information through a dashboard to security analysts. Now big data tools are improving the information available to security analysts by correlating, consolidating, and contextualizing even more diverse data sources for longer periods of time.
We can see specific benefits from big data tools from a recent case study presented by Zions Bancorporation. Its study found that the data quantities it had to deal with and the number of events it had to analyze were too much for traditional SIEM systems (it took between 20 minutes and an hour to search among a month's load of data). In its new Hadoop system running queries with Hive, it gets the same results in approximately one minute [3]. The security data warehouse driving this implementation lets users mine meaningful security information from not only firewalls and security devices but also website traffic, business processes, and other day-to-day transactions. This incorporation of unstructured data and multiple disparate datasets into a single analysis framework is one of big data's promising features.
Big data tools are also particularly suited to become fundamental for advanced persistent threat (APT) detection and forensics [4, 5]. APTs operate in a low-and-slow mode (that is, with a low profile and long-term execution); as such, they can occur over an extended period of time while the victim remains oblivious to the intrusion. To detect these attacks, we need to collect and correlate large quantities of diverse data (including internal data sources and external shared intelligence data) and perform long-term historical correlation to incorporate a posteriori information of an attack in the network's history.

www.computer.org/security

Challenges

Although the application of big data analytics to security problems has significant promise, we must address several challenges to realize its true potential.
Privacy is particularly relevant as new calls for sharing data among industry sectors and with law enforcement go against the privacy principle of avoiding data reuse, that is, using data only for the purposes for which it was collected. Until recently, privacy relied largely on technological limitations on the ability to extract, analyze, and correlate potentially sensitive datasets. However, advances in big data analytics have given us tools to extract and correlate this data, making privacy violations easier. Therefore, we must develop big data applications with an understanding of privacy principles and recommendations.
Although privacy regulation exists in some sectors (for instance, in the US, the Federal Communications Commission works with telecommunications companies, the Health Insurance Portability and Accountability Act addresses healthcare data, Public Utility Commissions in several states restrict the use of smart grid data, and the Federal Trade Commission is developing guidelines for Web activity), all this activity has been broad in system coverage and open to interpretation in most cases. Even with privacy regulations in place, we need to understand that large-scale collection and storage of data make these data stores attractive to many parties, including industry (who will use our information for marketing and advertising), government (who will argue that this data is necessary for national security or law enforcement), and criminals (who would like to steal our identities). Therefore, our role as big data application architects and designers is to be proactive in creating safeguards to prevent abuse of these big data stores.
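As one possible illustration of such a safeguard, the sketch below enforces the data-reuse principle mentioned above by releasing a record only for a purpose declared when it was collected. This is a toy model under stated assumptions, not a mechanism described in the report: the `Record` type, the `read` function, the `PurposeError` exception, and the purpose labels are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


class PurposeError(PermissionError):
    """Raised when data is requested for a purpose it was not collected for."""


@dataclass(frozen=True)
class Record:
    subject: str               # whom the data is about
    payload: str               # the stored data itself
    collected_for: frozenset   # purposes declared at collection time


def read(record: Record, purpose: str) -> str:
    """Release the payload only for a purpose declared at collection time."""
    if purpose not in record.collected_for:
        raise PurposeError(f"{purpose!r} is not an allowed purpose for this record")
    return record.payload


# Hypothetical log entry collected for security purposes only.
log_entry = Record(
    subject="employee-17",
    payload="login from 10.0.0.5 at 02:14",
    collected_for=frozenset({"intrusion-detection", "forensics"}),
)

print(read(log_entry, "forensics"))   # allowed: a declared purpose
try:
    read(log_entry, "marketing")      # refused: this would be data reuse
except PurposeError as err:
    print("refused:", err)
```

A real deployment would also need to authenticate the caller and audit each access; the sketch only shows where a purpose check could sit in the read path.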
Another challenge is the data provenance problem. Because big data lets us expand the data sources we use for processing, it's hard to be certain that each data source meets the trustworthiness that our analysis algorithms require to produce accurate results. Therefore, we need to reconsider the authenticity and integrity of data used in our tools. We can explore ideas from adversarial machine learning and robust statistics to identify and mitigate the effects of maliciously inserted data.
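A minimal sketch of one idea from robust statistics follows. The article does not name a specific method; the median and median absolute deviation (MAD) are used here as a standard swapped-in technique because a few injected values barely move them, unlike the mean. The data is synthetic, `robust_outliers` is a hypothetical helper, and the 0.6745 scaling constant and 3.5 threshold are the commonly used modified z-score defaults, taken as assumptions.

```python
import statistics


def robust_outliers(values, threshold=3.5):
    """Return values whose modified z-score (median/MAD based) exceeds threshold."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to measure against; flag nothing in this sketch
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the clean data is roughly normal.
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Synthetic events-per-hour counts from one source; the last two values
# stand in for maliciously inserted data.
events_per_hour = [98, 101, 100, 99, 102, 100, 97, 103, 5000, 4800]

print("mean:    ", statistics.mean(events_per_hour))
print("median:  ", statistics.median(events_per_hour))
print("outliers:", robust_outliers(events_per_hour))
```

In this example the two injected values pull the mean to 1060 while the median stays at 100.5, so a baseline built on the median remains usable and the injected points are flagged. An attacker who controls a large fraction of a source can still defeat such estimators, which is where the adversarial machine learning work mentioned above comes in.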

IEEE Security & Privacy

This particular CSA report focuses on the use of big data analytics for security, but the other side of the coin is the use of security to protect big data. As big data tools continue to be deployed in enterprise systems, we need to improve systems security by not only leveraging conventional security mechanisms (for example, integrating Transport Layer Security within Hadoop) but also introducing new tools, such as Apache's Accumulo, to deal with the unique security problems in big data management.
Finally, another area that the report didn't cover but that needs further development is human-computer interaction and, in particular, how visual analytics can help security analysts interpret query results. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Compared to technical mechanisms developed for efficient computation and storage, human-computer interaction in big data has received less attention but is nonetheless one of the fundamental tools to achieve the promise of big data analytics, because its goal is to convey information to a human via the most effective representation.

Big data is changing the landscape of security technologies for network monitoring, SIEM, and forensics. However, in the eternal arms race of attack and defense, big data is not a panacea, and security researchers must keep exploring novel ways to contain sophisticated attackers. Big data can also create a world where maintaining control over the revelation of our personal information is constantly challenged. Therefore, we need to increase our efforts to educate a new generation of computer scientists and engineers on the value of privacy and work with them to develop the tools for designing big data systems that follow commonly agreed privacy guidelines.

References
1. T. Dumitras and D. Shou, "Toward a Standard Benchmark for Computer Security Research: The Worldwide Intelligence Network Environment (WINE)," Proc. EuroSys BADGERS Workshop, ACM, 2011, pp. 89–96.
2. J. François et al., "BotCloud: Detecting Botnets Using MapReduce," Proc. Workshop Information Forensics and Security, IEEE, 2011, pp. 1–6.
3. E. Chickowski, "A Case Study in Security Big Data Analysis," Dark Reading, 9 Mar. 2012.
4. P. Giura and W. Wang, "Using Large Scale Distributed Computing to Unveil Advanced Persistent Threats," Science J., vol. 1, no. 3, 2012, pp. 93–105.
5. T.-F. Yen et al., "Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks," to be published in Proc. Ann. Computer Security Applications Conference (ACSAC 13), ACM, Dec. 2013.

Alvaro A. Cárdenas is an assistant professor at the University of Texas at Dallas. Contact him at alvaro.[email protected].

Pratyusa K. Manadhata is a researcher at HP Labs. Contact him at [email protected].

Sreeranga P. Rajan is the director of software systems at Fujitsu Laboratories of America. Contact him at [email protected].

Selected CS articles and columns are also available for free at http://ComputingNow.computer.org.

Got an idea for a future article? Email editors Patrick McDaniel ([email protected]) and Sean W. Smith (sws@cs.dartmouth.edu).