Florian Skopik
Markus Wurzenberger
Max Landauer
Smart Log Data Analytics
Techniques for Advanced Security Analysis
Max Landauer
Center for Digital Safety & Security
Austrian Institute of Technology
Vienna, Austria
This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Prudent event monitoring and logging are the only means that allow system
operators and security teams to truly understand how complex systems are utilized.
Log data are essential for detecting intrusion attempts in real time and for
forensically working through previous incidents to build a vital understanding of
what has happened in the past.
Today, almost every organization already logs data to some extent. Although
establishing a secure and robust logging infrastructure, along with the governing
management policies and processes, takes considerable effort, basic raw logging
is comparatively simple in contrast to log analysis. The latter is an art of its own,
which few organizations know how to master. Log data are extremely diverse,
and processing them is unfortunately quite complex. There is no standard that
dictates the granularity, structure, and level of detail that log events provide. There
is no agreement on what logs comprise and how they are formatted.
Given these facts, it is astonishing how little literature on logging in computer
networks exists. And although there are at least some great books out there,
they are not enough. On the one hand, some existing literature has not aged
well (certain topics are simply outdated after several years as technologies evolve
and newer concepts such as bring your own device, cloud computing, and IoT hit
the market); on the other hand, some relevant topics are simply not sufficiently
covered yet, especially when it comes to complex (and sometimes dry) log data
analytics.
We take the 2013 book ‘Logging and Log Management’ by Dr. Chuvakin et al.
as a starting point. This is a great book that covers all the essential basics from
a technical and management point of view, such as what log data actually are,
how to collect log data, and how to perform simple analysis; it also explains
filtering, normalization, and correlation as well as the reporting of findings. It further
elaborates on available tools and helps practitioners adopt state-of-the-art
logging technologies quickly. However, while it provides a profound and important
basis for everyone who is in charge of setting up a logging infrastructure, that book
does not go far enough for certain audiences. Its authors essentially stop
where our book starts. We assume the reader of our book knows the basics and
has already gained experience with logging technologies. We further assume that the
reader has given serious thought to what to log, how to log, and why to log,
and that common challenges regarding the collection of log data have been solved,
including time synchronization, access control for log agents, log buffering/rotation,
and consistency assurance. For all these topics, technical (and vendor-specific)
documentation exists.
We pick up the reader at the point where they ask what to do with the
collected logs beyond simple outlier detection and static rule-based evaluations.
Here, we enter new territory and provide insights into the latest research results and
promising approaches. We provide an outlook on what kind of log analysis is
actually possible with the appropriate algorithms, and we provide the accompanying
open-source software solution AMiner¹ for trying out the cutting-edge research methods
from this book on your own data!
This book discusses important extensions to the state of the art. Its content is
meant for academics, researchers, and graduate students, as well as for any
forward-thinking practitioner who wants to:
• Learn how to parse and normalize log data in a scalable way, i.e., without
inefficient linear lists of regular expressions
• Learn how to efficiently cluster log events in real time, i.e., create clusters
incrementally while log events arrive
• Learn how to characterize systems and create behavior profiles with the use of
cluster maps
• Learn how to automatically create correlation rules from log data
• Learn how to track system behavior trends over time
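To make the idea of creating clusters incrementally while log events arrive more concrete, here is a minimal sketch, assuming whitespace tokenization and a simple position-wise token similarity. The class `IncrementalClusterer`, the `threshold` of 0.6, and the similarity measure are hypothetical choices for illustration only; this is not the algorithm presented later in this book or implemented in AMiner.

```python
from dataclasses import dataclass, field


def token_similarity(a: list[str], b: list[str]) -> float:
    """Share of positions holding identical tokens; 0.0 if lengths differ."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


@dataclass
class IncrementalClusterer:
    """Assigns each log line to a cluster exactly once, at arrival time."""

    threshold: float = 0.6  # hypothetical similarity cut-off
    representatives: list[list[str]] = field(default_factory=list)
    counts: list[int] = field(default_factory=list)

    def add(self, line: str) -> int:
        tokens = line.split()
        for idx, rep in enumerate(self.representatives):
            if token_similarity(tokens, rep) >= self.threshold:
                self.counts[idx] += 1
                return idx
        # No sufficiently similar cluster exists yet: open a new one,
        # using this line as its representative.
        self.representatives.append(tokens)
        self.counts.append(1)
        return len(self.representatives) - 1


clusterer = IncrementalClusterer()
# Prints 0, 0 and 1 on separate lines.
for log_line in [
    "Accepted password for alice from 10.0.0.1 port 22",
    "Accepted password for bob from 10.0.0.7 port 22",
    "Disk /dev/sda1 is 91% full",
]:
    print(clusterer.add(log_line))
```

The sketch only demonstrates the single-pass property (no second pass over stored data); it still compares each line against every cluster representative, so it says nothing about the efficiency that a production approach needs.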
Over the last decade, numerous people have supported this project. We would like to
specifically thank Roman Fiedler as one of the founders of the AMiner project,
Wolfgang Hotwagner for the invaluable infrastructure and implementation support,
Georg Höld for his contributions to the advanced detectors, and Ernst Leierzopf for
software quality improvements.
March 2021
1 https://github.com/ait-aecid.
Acknowledgments
This work has been financially supported by the Austrian Research Promotion
Agency FFG and the European Union’s FP7 and H2020 programs in the course of
several research projects from 2011 to 2021.
Contents
1 Introduction
1.1 State of the Art in Security Monitoring and Anomaly Detection
1.2 Current Trends and ...
1.3 ... Future Challenges
1.4 Log Data Analysis: Today and Tomorrow
1.5 Smart Log Data Analytics: Structure of the Book
1.6 Try It Out: Hands-on Examples Throughout the Book
2 Survey on Log Clustering Approaches
2.1 Introduction
2.2 Survey Background
2.3 Survey Method
2.4 Survey Results
2.5 Conclusion
3 Incremental Log Data Clustering for Processing Large Amounts of Data Online
3.1 Introduction
3.2 Concept for Incremental Clustering
3.3 Outlook and Further Development
3.4 Try It Out
4 Generating Character-Based Templates for Log Data
4.1 Introduction
4.2 Concept for Generating Character-Based Templates
4.3 Cluster Template Generator Algorithms
4.4 Outlook and Further Development
4.5 Try It Out
5 Time Series Analysis for Temporal Anomaly Detection
5.1 Introduction
5.2 Concept for Dynamic Clustering and AD
5.3 Cluster Evolution
References
About the Authors
Acronyms
AD Anomaly detection
AECID Automatic event correlation for incident detection
ARIMA Autoregressive integrated moving-average
CE Cluster evolution
CPS Cyber-physical systems
CTI Cyber threat intelligence
DNS Domain name system
EDR Endpoint detection and response
HIDS Host-based intrusion detection system
IDS Intrusion detection system
IOC Indicator of compromise
JSON JavaScript Object Notation
NIDS Network-based intrusion detection system
PCA Principal component analysis
SIEM Security information and event management
TSA Time-series analysis
VTD Variable type detector
Chapter 1
Introduction
“Prevention is ideal, but detection is a must” [20]. Active monitoring and intrusion
detection systems (IDS) are the backbone of every effective cyber security framework.
Whenever carefully planned, implemented, and executed preventive security
measures fail, IDS are a vital part of the last line of defence. IDS are an essential
measure for detecting the first steps of an attempted intrusion in a timely manner,
which is a prerequisite for avoiding further harm. It is commonly agreed that active
monitoring of networks and systems and the application of IDS are a vital part of the
state of the art. Usually, findings of IDS, as well as major events from monitoring, are
forwarded to security information and event management (SIEM) solutions [77],
where they are managed and analyzed. SIEM solutions provide a detailed view of the
status of an infrastructure under observation.
However, a SIEM solution is only as good as the underlying monitoring and
analytics pipeline. IDS are an indispensable part of this pipeline, which spans from
gathering data from systems (including operating system logs, process call trees,
memory dumps, etc.), to feeding them into analysis engines, to reporting findings
to SIEMs. Obviously, the verbosity and expressiveness of data are key criteria for the
selection of data sources. This is an art of its own and mainly depends on
determining which of today's common attack vectors (referring to the MITRE
ATT&CK matrix [105]) are reflected best in which sources (e.g., DNS logs, netflows,
syscalls, etc.). There are literally hundreds of tools and agents to harness the different
sources, and countless guidelines on configuring these tools to control the
verbosity and quality of the resulting log data.
In terms of detection mechanisms, the most commonly used approaches today are still
signature-based NIDS. Similarly, signature-based HIDS are capable of using
host-based sources, such as audit trails from operating systems, to perform intrusion
detection. The secret of their success lies in their simple applicability and their
virtually zero false positive rate. Either a malicious pattern is present, or it is not. As
simple as that.
Unfortunately, this easy applicability comes at a price. The slightest modification
to the malware or to the (configuration of the) attack tool changes the traces
an attack leaves on a system and in the numerous log files, which
renders signature-based approaches almost null and void [15]. For instance, [18]
demonstrated that well-known malware can evade IDS by inserting a single
NOP instruction at the right place in its code.
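The all-or-nothing behavior of exact signature matching, and its fragility, can be illustrated with a toy sketch. The byte string `SIGNATURE` and the sample payload are hypothetical, and inserting a single 0x90 byte (the x86 NOP opcode) merely mimics the kind of modification described above; this is not a reproduction of the experiment in [18].

```python
# Hypothetical byte signature and sample payload, for illustration only.
SIGNATURE = bytes.fromhex("31c050682f2f7368")
original = bytes.fromhex("9090") + SIGNATURE + bytes.fromhex("cd80")

# Same payload with a single 0x90 byte (the x86 NOP opcode) inserted
# in the middle of the signed byte sequence.
mutated = original[:5] + b"\x90" + original[5:]


def matches_signature(payload: bytes, signature: bytes) -> bool:
    """Exact matching: the pattern is either present or it is not."""
    return signature in payload


print(matches_signature(original, SIGNATURE))  # True
print(matches_signature(mutated, SIGNATURE))   # False
```

One extra byte is enough to break the match, even though the remaining bytes of the signed sequence are all still present.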
In order to mitigate attacks with polymorphic and customized tools, IDS vendors
combine signature-based approaches with heuristics to enable a kind of fuzzy
detection, i.e., to detect patterns that match to a certain degree but allow some
inherent noise. However, this in turn increases the false positive rate, which limits
the usefulness of such approaches. The job for solution vendors and integrators is to find
the sweet spot where fuzzy signature-based matching still works without producing
too many false detections. While there are some promising solutions available today,
attacks are expected to become even more customized in the future, which is why
focusing on the detection of known bad actions seems to be a dead end for defenders
in the long run.
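A toy version of fuzzy signature matching shows the trade-off described above. It assumes Python's standard `difflib.SequenceMatcher` ratio as the similarity measure (real IDS heuristics differ), and the byte strings, the `fuzzy_match` helper, and both thresholds are hypothetical. A strict threshold still catches the NOP-modified sample without flagging an unrelated payload, while a loose threshold also flags a payload that merely shares a short prefix with the signature.

```python
from difflib import SequenceMatcher

# Hypothetical byte signature, known-bad sample, and benign sample.
SIGNATURE = bytes.fromhex("31c050682f2f7368")
original = bytes.fromhex("9090") + SIGNATURE + bytes.fromhex("cd80")
mutated = original[:5] + b"\x90" + original[5:]  # one inserted NOP byte
benign = bytes.fromhex("31c05068") + b"ABCDEFGH"  # shares only a 4-byte prefix


def fuzzy_match(payload: bytes, signature: bytes, threshold: float) -> bool:
    """Flag the payload if some window is similar enough to the signature."""
    width = len(signature)
    best = 0.0
    # Windows are one byte wider than the signature, so a single inserted
    # byte inside the signed region does not push matching bytes out of view.
    for start in range(max(1, len(payload) - width + 1)):
        window = payload[start:start + width + 1]
        best = max(best, SequenceMatcher(None, window, signature).ratio())
    return best >= threshold


print(fuzzy_match(mutated, SIGNATURE, 0.9))   # True: modified sample still caught
print(fuzzy_match(benign, SIGNATURE, 0.9))    # False
print(fuzzy_match(benign, SIGNATURE, 0.45))   # True: a false positive
```

Lowering the threshold widens the net for variants but also for benign traffic, which is exactly the sweet spot that vendors and integrators have to tune.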
As a consequence, a major transition away from signature-based blacklisting
approaches to behavior-based whitelisting approaches is taking place. The fundamental
idea is that if we cannot determine what malicious activities look like on a system,
we can instead determine what legitimate activities look like (these get whitelisted)
and define everything else as potentially problematic. This is how anomaly detection
(AD) methods work.
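The whitelisting idea can be sketched in a few lines: learn the set of event types observed during a period assumed to contain only legitimate activity, then report any line whose type was never seen. This is a deliberately naive illustration; the digit-masking `template` function, the `WhitelistDetector` class, and the sample log lines are hypothetical and far simpler than the parsing and clustering techniques covered in this book.

```python
import re

# Hypothetical normalization: replace every run of digits with a placeholder
# so that lines differing only in numbers map to the same event type.
_DIGITS = re.compile(r"\d+")


def template(line: str) -> str:
    return _DIGITS.sub("<NUM>", line.strip())


class WhitelistDetector:
    """Whitelist normal event types; everything else is potentially anomalous."""

    def __init__(self) -> None:
        self.allowed: set[str] = set()

    def train(self, lines: list[str]) -> None:
        # Assumes the training lines reflect legitimate behavior only.
        for line in lines:
            self.allowed.add(template(line))

    def is_anomalous(self, line: str) -> bool:
        return template(line) not in self.allowed


detector = WhitelistDetector()
detector.train([
    "session opened for user 1000",
    "session closed for user 1000",
])
print(detector.is_anomalous("session opened for user 1001"))  # False
print(detector.is_anomalous("user 0 added to group sudo"))    # True
```

Note that the detector needs no knowledge of what an attack looks like; the price is that any legitimate but previously unseen event is reported as well.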