SNP 14
An integral part of enterprise computer security incident response teams, a security operations center
(SOC) monitors security incidents in real time. Security incident and event management systems play
a critical role in SOCs—collecting, normalizing, storing, and correlating events to identify malicious
activities—but face operational challenges.
1540-7993/14/$31.00 © 2014 IEEE Copublished by the IEEE Computer and Reliability Societies September/October 2014 35
www.computer.org
[Diagram labels: Authentication device, Firewall, Network device, Application server, Hosts; Normalized events stream; Security management platform; Forensic analysis database; Security alerts; Terminal alerts.]
Figure 1. A typical security incident and event management (SIEM) system architecture. The SIEM system accepts inputs from various security
devices and sensors. Connectors receive events, parse them, and convert them into a common format.
such tools became more widely used and more tools appeared, two problems arose: first, there were too many user interfaces to manage, and second, there were no tools to correlate events across different security tools. Correlating events across tools became a necessity because individual tools that operate with little or no awareness of the IT architecture trigger too many false-positive alerts, even after careful tuning. On the other hand, when multiple sensors trigger alerts in response to an action, the action is more likely to be malicious.

SIEM systems were designed to meet these challenges: they collect events from diverse sources, each of which might represent events using a vendor-specific schema; normalize these disparate schemata into a common representation; and store these normalized events. Their rule engine triggers alerts from the stored events; the rules allow correlation of events from different sensors. SIEM systems also include auxiliary contextual information, such as up-to-date information on enterprise assets that can be used to write better, context-aware rules and prioritize alerts. SIEM systems’ main strength is their ability to cross-correlate logs from diverse sources using common attributes to define meaningful attack patterns and scenarios, which when they occur, can alert security analysts (SAs). Thus, SIEM systems are like radar, detecting objects in a timely manner. Their long-term event retention capability is useful for post hoc forensic analysis as well as investigating and detecting slow and stealthy attacks, including advanced persistent threats (APTs).

Structuring SOCs around SIEM Systems

Figure 1 illustrates an SIEM system’s basic architectural components. The SIEM system accepts inputs from various security devices and sensors, including perimeter defense systems (network firewalls and intrusion prevention systems), host sensors (IDSs and AVSs), applications (Web application firewalls and authentication systems), and network sensors. Each device and sensor is configured to output security events with unusual or anomalous behavior that might indicate malicious intent. These events are represented in vendor-, device-, and version-specific schema. So, the SIEM system’s first task is to normalize the different representations into a common format to ease further processing and to simplify rule creation and maintenance. As the figure shows, SIEM system connectors (customized for each version, device type, and vendor) receive the events. The connectors parse input events and convert them into a common format, and do so in a scalable manner to keep up with the event source.

Once normalized, events are forwarded to the security management platform and the archival forensic analysis database. The platform maintains and analyzes
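
The connector step described in this section (parse a vendor-specific event, emit a common format) can be sketched in a few lines. This is only an illustration under stated assumptions: the two raw log formats, the `NormalizedEvent` fields, and the `CONNECTORS` registry are hypothetical and do not correspond to any particular SIEM product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass(frozen=True)
class NormalizedEvent:
    """Hypothetical common event format; real SIEM schemas carry many more fields."""
    timestamp: datetime
    source_type: str
    src_ip: str
    dst_ip: str
    action: str


def parse_firewall(raw: str) -> NormalizedEvent:
    # Assumed firewall format: "<epoch seconds> <ACTION> <src_ip> <dst_ip>"
    epoch, action, src, dst = raw.split()
    return NormalizedEvent(
        timestamp=datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        source_type="firewall",
        src_ip=src,
        dst_ip=dst,
        action=action.lower(),
    )


def parse_auth(raw: str) -> NormalizedEvent:
    # Assumed authentication format:
    # "ts=<ISO 8601 with offset>;client=<ip>;server=<ip>;result=<text>"
    fields = dict(part.split("=", 1) for part in raw.split(";"))
    return NormalizedEvent(
        timestamp=datetime.fromisoformat(fields["ts"]),
        source_type="auth",
        src_ip=fields["client"],
        dst_ip=fields["server"],
        action=fields["result"].lower(),
    )


# One connector per source type; a real deployment would also key on
# vendor and version, as the text notes.
CONNECTORS: Dict[str, Callable[[str], NormalizedEvent]] = {
    "firewall": parse_firewall,
    "auth": parse_auth,
}


def normalize(source_type: str, raw: str) -> NormalizedEvent:
    """Dispatch a raw event to the connector registered for its source."""
    return CONNECTORS[source_type](raw)
```

Once both sources share `src_ip`, `dst_ip`, and `timestamp`, a rule can refer to those attributes without knowing which device produced the event, which is the point of normalizing before correlation.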
yearly maintenance cost. If a SOC staffs 20 SAs, the yearly operating cost might be upwards of $5 million.

Enterprise-scale SIEM systems need significant investment in both hardware and manpower, and SIEM systems and SOCs must continue to deliver to justify the investment.

Operational Challenges

SOCs confront various operational challenges when using SIEM systems, driven primarily by the scale and complexity of the enterprise being monitored and the rate at which events arrive from security devices and sensors.

Rule Creation and Management

Having all the network and host logs at the SAs’ fingertips is attractive because the more information a SOC has, the better its situational awareness. However, this comes at the cost of transforming an SIEM system’s data management to big data management, which turns storage, search, sharing, transfer, analysis, and visualization into challenges. One aspect of this problem is the system’s inability to efficiently execute complex queries, severely limiting SAs’ ability to write complex correlation rules. A more problematic aspect is the number of false alarms that the SIEM system rules tend to trigger. Because benign events outnumber malicious ones, even a low false-positive rate will produce many false alarms [2], which the SOC might not have the capacity to deal with. Therefore, SIEM system rules must have extremely low false-positive rates to be usable in practice.

Most of the time, SIEM system analysts need to write very specific rules to capture an attack, but this means the system might miss other forms of that attack. Thus, there’s always a tradeoff between false-positive and false-negative rates. To prevent false negatives (that is, detection misses from overly specific attack rules), engineers resort to generic rules, so that an activity with even a remote possibility of indicating an attack will trigger an alert. Then, analysts are responsible for monitoring the SIEM system to distinguish the true alarms from the enormous number of false ones. Many SOC teams have limited resources to process overwhelming volumes of events. Thus, the SOC enters a vicious cycle of accumulating more and more alerts that SAs must process each hour.

In talking with many SOC teams, we found it’s acceptable to triage an event in 10 minutes; some teams would like to reduce this to one minute or less! Such severe time restrictions force SAs to sample alerts from the events list. Although the number of alerts on SA screens is proportional to the size of the logs flowing into the SIEM system, some professionals claim that by writing the right SIEM system rules and applying the right management techniques, they can dampen the relation between the volume of logs and alerts. This remains speculative; with the explosive growth in data rates, it’s difficult to see how SIEM system processing rates can keep up under cost constraints.

Lack of Contextual Information

Another challenge SOCs face is isolation from enterprise network operations. SOC personnel aren’t involved in the details of configuring, testing, and maintaining enterprise assets. Routine activities such as patching, backup, and testing might trigger alerts in SIEM systems designed to detect security breaches, and tracking down the cause of such alerts creates unnecessary overhead. In an interview, a senior SA pointed out the importance of automatically collecting detailed host configurations, servers, devices, and user information. In principle, this information can be correlated with SIEM system alerts to significantly reduce false-alarm rates. However, collecting and maintaining this information, especially in large networks, is challenging. Instead, such information is often communicated in an informal, even ad hoc manner, either verbally or via email. In one incident, an SA had to contact the network operations team about potential malicious activity in the internal network, which turned out to be a spurt in traffic from a patching server. The SOC saw probing alerts on its screens; these were manually tracked to the patching server and eventually declared false positives.

Information communicated informally usually falls through the cracks when SOC analysts change shifts, thus exacerbating the problem. Instead of storing crucial contextual information in SIEM systems, all too often, SOCs rely on SAs to maintain this information. Unfortunately, this information is lost when SAs leave and replacements are hired.

In general, although isolating a SOC from the enterprise systems’ routine maintenance activities is a reasonable objective, communicating the right kind and amount of information between enterprise operations and the SOC in an automated way is essential to reduce the SOC’s load and to achieve more effective security monitoring.
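
As a rough illustration of what automated context sharing could look like, the sketch below checks each alert against maintenance windows published by enterprise operations, in the spirit of the patching-server incident described above. The `MaintenanceWindow` and `Alert` records, the rule name, and the matching logic are all hypothetical; a real deployment would draw this context from an asset inventory or change-management system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical record published by enterprise operations."""
    host_ip: str
    activity: str   # for example "patching", "backup", or "testing"
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Alert:
    rule: str
    src_ip: str
    time: datetime


def explain(alert: Alert, windows: Iterable[MaintenanceWindow]) -> Optional[str]:
    """Return the scheduled activity that covers this alert, if any."""
    for w in windows:
        if w.host_ip == alert.src_ip and w.start <= alert.time <= w.end:
            return w.activity
    return None


def triage(
    alerts: Iterable[Alert], windows: Iterable[MaintenanceWindow]
) -> Tuple[List[Alert], List[Tuple[Alert, str]]]:
    """Split alerts into those needing review and those with a likely benign cause."""
    windows = list(windows)
    needs_review: List[Alert] = []
    explained: List[Tuple[Alert, str]] = []
    for alert in alerts:
        activity = explain(alert, windows)
        if activity is None:
            needs_review.append(alert)
        else:
            # Tagged rather than dropped, so an analyst can still audit it.
            explained.append((alert, activity))
    return needs_review, explained
```

The sketch tags explained alerts instead of discarding them, on the assumption that a compromised maintenance host should not become a blind spot; whether to lower their priority or suppress them outright is a policy decision for the SOC.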
attacks and other security-relevant events. For example, they might correlate HTTP proxy and antivirus product logs to detect malware downloaded to endpoint devices. However, performing correlations and identifying patterns at the scale of large enterprises remain challenges. Moreover, SIEM systems must perform more sophisticated analysis to derive true value from the collected events.

Scalable analysis algorithms that handle 1 trillion or more events per day face significant challenges. First, identifying attacks from event streams is more art than science: no definition of attacks exists, so SOC analysts use heuristics derived from past experience to identify attacks and other relevant events. Analysis algorithms should automatically identify patterns of interest from large event streams, but automating a heuristics-driven process is difficult.

Second, even if an algorithm can identify attacks today, it might not work tomorrow as adversaries adapt, enterprise networks change, and employees’ behaviors change. Hence, the algorithm must learn and evolve continuously.

Third, the problem of false positives becomes more acute as SIEM systems collect more data. Because benign events outnumber malicious events by orders of magnitude, an extremely low false-positive rate might still produce too many false positives to be usable in practice. Hence, analysis algorithms might not be able to make a scalability–accuracy tradeoff.

Fourth, even if an analysis algorithm produces no false positives, it might produce more true positives than SOC analysts can handle. Hence, SIEM systems will have to prioritize the true positives for SOC analyst consumption.

Fifth, more events might lead to statistically significant but ultimately meaningless correlations [3]. When dealing with high-dimensional data, many unrelated variables can have high correlations, which will manifest as false positives. Hence, analysis approaches should be able to filter out spurious correlations.

Finally, the hardest challenge is inferring human intent from machine logs: analysis algorithms will have to infer attacker and user intent from event streams to identify true attacks, as both malicious attacker actions and benign user actions might generate the same event patterns.

Visualization

We believe that SIEM systems will never reach the maturity level needed to replace human analysts in SOCs. At best, they’ll be tools in analysts’ and network administrators’ decision-making processes. Hence, SIEM systems face the challenge of summarizing analysis results and presenting them so that humans can make more effective and efficient decisions, for instance, identifying a new attack or deciding which security alerts to respond to. SIEM systems must develop visualization techniques that aid humans in gathering information from large quantities of data, provide context information in a timely manner, and work at different organizational levels, such as system administrator and higher-level management.

Toward Addressing the Challenges

Multiple SIEM system vendors have offered different approaches to improve SIEM systems’ capabilities to collect, store, and correlate events in large enterprise networks; however, progress is necessary to address scalability issues. For example, because complex correlation is time consuming, analysts typically avoid creating feature-rich correlation rules that incorporate many information sources to capture sophisticated and stealthy attacks.

To the best of our knowledge, no research directly addresses SIEM system challenges. However, we believe that advances in many fields of computer science will significantly impact SIEM systems. For example, advances in storage systems, especially nonvolatile memory, will help with storing more event data at lower cost. Similarly, advances in parallel and distributed computing, especially in big data analysis, will provide the platform for scalable analysis. For example, a distributed correlation engine might handle more complex rules than traditional SIEM systems. There’s also recent work on using big data analysis to identify actionable security information from very large event datasets. Ting-Fang Yen and her colleagues analyze HTTP proxy logs to identify suspicious host activities: they extract features from logs, then use clustering to find outlying suspicious activities [4].

There’s a long line of research on alert correlation as a way to increase the features available to make decisions, building on the assumption of the impracticality of achieving meaningful results on the basis of a single event such as a network packet [2, 5, 6]. However, alert correlation solutions tend to have false correlations from the large amount of low-quality events that SIEM systems handle. This has led to research on alert prioritization, that is, identifying higher-quality alerts that analysts should focus on. Researchers introduced multiple alert prioritization approaches, some using probability theory and Dempster-Shafer theory [7, 8].

Big data visualization is a very active research area [9]. Data visualization specifically for security has also been explored [10]. Advances in these two areas will help address the big data visualization problem that SIEM systems face.
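
The base-rate concern raised earlier (benign events outnumber malicious ones by orders of magnitude, so even a very low false-positive rate swamps analysts) can be made concrete with a little arithmetic. The sketch below is illustrative only: the 1 trillion events per day and the 10-minute triage time come from the article, while the malicious fraction, false-positive rate, and true-positive rate are assumed values chosen for the example.

```python
from typing import Dict


def alert_load(
    events_per_day: float,
    malicious_fraction: float,
    false_positive_rate: float,
    true_positive_rate: float,
    triage_minutes: float,
) -> Dict[str, float]:
    """Estimate daily alert volume, alert precision, and analyst time required."""
    malicious = events_per_day * malicious_fraction
    benign = events_per_day - malicious
    true_alerts = malicious * true_positive_rate
    false_alerts = benign * false_positive_rate
    total = true_alerts + false_alerts
    return {
        "true_alerts": true_alerts,
        "false_alerts": false_alerts,
        # Fraction of alerts that correspond to real malicious events.
        "precision": true_alerts / total if total else 0.0,
        "analyst_hours_per_day": total * triage_minutes / 60.0,
    }


if __name__ == "__main__":
    load = alert_load(
        events_per_day=1e12,        # from the article
        malicious_fraction=1e-9,    # assumption: about 1,000 malicious events per day
        false_positive_rate=1e-6,   # assumption: one false alarm per million benign events
        true_positive_rate=0.99,    # assumption
        triage_minutes=10,          # from the article
    )
    for name, value in load.items():
        print(f"{name}: {value:,.4f}")
```

Under these assumed rates, roughly a million false alerts accompany fewer than a thousand true ones each day, so fewer than one alert in a thousand is real, and triaging every alert at 10 minutes apiece would take far more analyst hours than any SOC has.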
-big-errors-people.
4. T.-F. Yen et al., “Beehive: Large-Scale Log Analysis for Detecting Suspicious Activity in Enterprise Networks,” Proc. 29th Ann. Computer Security Applications Conf. (ACSAC 13), 2013, pp. 199–208.
5. F. Cuppens and A. Miege, “Alert Correlation in a Cooperative Intrusion Detection Framework,” Proc. IEEE Symp. Security and Privacy, 2002, pp. 202–215.
6. B. Morin et al., “A Logic-Based Model to Support Alert Correlation in Intrusion Detection,” Information Fusion, 2009, pp. 285–299.
7. L. Zomlot et al., “Prioritizing Intrusion Analysis Using Dempster-Shafer Theory,” Proc. 4th ACM Workshop Artificial Intelligence and Security (AISec 11), 2011, pp. 59–70.
8. Y. Zhai et al., “Reasoning about Complementary Intrusion Evidence,” Proc. 20th Ann. Computer Security Applications Conf. (ACSAC 04), 2004, pp. 39–48.
9. S. Liu et al., “A Survey on Information Visualization: Recent Advances and Challenges,” The Visual Computer, Springer, 2014, pp. 1–21.
10. R. Marty, Applied Security Visualization, Addison-Wesley, 2009.