62a292c20b7371fb41311930 - WP Machine Learning

The document discusses Darktrace's approach to cybersecurity which combines unsupervised and supervised machine learning. Darktrace uses unsupervised machine learning to continuously learn what normal behavior looks like in a network without relying on prior classifications of threats. It then uses supervised machine learning to detect known threats by learning from examples. The combination allows Darktrace to detect both known and unknown threats.

Uploaded by

harichigurupati
Copyright
© © All Rights Reserved
Darktrace AI: Combining Unsupervised and Supervised Machine Learning

Technical White Paper

Executive Summary

Darktrace is a research and development-led company that first applied unsupervised
machine learning to the challenge of detecting threatening digital activity within
corporate networks, with the aim of combatting novel cyber attacks that bypass
rule-based security tools. This white paper examines the multiple layers of machine
learning that make up Darktrace’s Cyber AI, and how they are architected together to
create an autonomous system that self-updates, responding to, but not requiring, human
input. This system is today used by over 4,000 organizations globally.

A New Age of Cyber-Threat

A Changing Threat Landscape

The last decade has seen an unmistakable escalation in cyber conflict, as criminals,
nation states and lone opportunists take advantage of digitization and connectivity to
compromise networks and gain advantage – whether financial, reputational, or strategic.
Our adversaries in cyber space are constantly innovating too, launching new attack
tools and technologies to get through traditional cyber security defenses, such as
firewalls and signature-based gateways, and into the file shares or accounts that are
most valuable to them.

Attackers have also exploited to their advantage the digital complexity of the average
organization of today, which shares data across multiple locations, devices and
technology services – from cloud services and SaaS tools, to untrusted home networks
and non-official IoT devices. Meanwhile, malicious insiders remain a constant threat.

The new age of cyber security is defined by a constantly evolving threat landscape:
no sooner have you identified the adversary, than it has shape-shifted into something
unrecognizable.

The emergence of these new threats created a technical challenge: how can we use
machine learning to detect what we don’t know to anticipate? In other words, without
relying on data sets on previous attacks, how can we build a system that learns what
a threat is, by comparing its behavior to everything else going on in that environment?

This white paper explains Darktrace’s approach to machine learning and shines a light
on the unique interplay between unsupervised machine learning, supervised machine
learning, and deep learning behind the world’s leading cyber AI technology.

Traditional Approaches to Cyber Security

According to the traditional security paradigm, firewalls, endpoint security methods,
and other tools such as SIEMs and sandboxes are deployed to enforce specific
policies and provide protection against known threats. While these tools have a part
to play in an organization’s overall defense posture, they are ill-equipped to tackle the
new age of rapidly evolving cyber-threats. Many have become defunct as enterprise
infrastructures diversify.

Fundamental Limitations

Perimeter controls are dependent on signatures, rules and heuristics – if they miss
an attack at the point of entry, they have failed and cannot take further action.

Endpoint security also depends on signatures, and only detects attacks that have
been previously identified, making it ineffective against unseen threats.

Log tools and SIEM databases require manual effort to ensure data is consistently
collected across the entire organization and matched against the security team’s
predictions of threats. This relies on the security team imagining everything that might
possibly go wrong, without overwhelming analysts with alarms.

So-called ‘behavioral analytics’ rely on the rules-based paradigm of configuring how
certain job titles or devices ‘should’ behave and then looking for deviations in that
behavior. This approach fails to scale to the complexity and size of modern businesses.

Ultimately, legacy systems have been outpaced by modern business complexity and
attacker innovation, suffering from these fundamental constraints:
• They need to know about all previous attacks.
• They need to perfectly understand your business and business-specific rules.
• They need a flawless way of sharing high quality information about new attacks.
• They need to guess what all future attacks and software vulnerabilities look like.
• They need to be able to turn all the above insights into rules or signatures that work.

Most significantly, legacy tools require victims before they can provide solutions.
The age of unpredictable, fast-moving attacks has rendered this approach woefully
deficient.

Applying AI Across Diverse Digital Environments

Darktrace’s Cyber AI technology is powered by multiple machine learning approaches,
which operate in combination to power the world’s first AI platform for cyber defense,
working in any digital environment. This allows Darktrace to protect the entire digital
estate of the 4,000 organizations that it protects, including corporate networks, cloud
computing services and SaaS, IoT, Industrial Control Systems, and email systems.

Plugged into the heart of the organization’s infrastructure and services, the AI
ingests and analyzes the data and its interactions within the environment, and forms
an understanding of the normal behavior of that environment, right down to the
granular details of specific users and devices, using unsupervised machine learning.
A self-learning approach, the system continually revises its understanding about
‘what is normal’ based on evolving evidence.

This evolving understanding of normal means that the AI can identify, with a high
degree of precision, events or behaviors that are anomalous, and unlikely to be
benign. The ability to identify highly subtle activity that represents the first footprints
of an attacker, without any prior knowledge or intelligence, lies at the heart of the
AI’s efficacy in keeping pace with today’s threat actors. The AI detects what humans
cannot, amid the immense noise of legitimate, day-to-day digital interactions.

An Examination of Supervised Machine Learning

Supervised learning works by using previously-classified data, from which the machine
learns the classification system. For scenarios where behaviors are well understood
and classifications are easy to determine, the output of such systems can be highly
accurate.

For example, state-of-the-art image classification systems are outperforming humans
in some cases. Indeed, what makes supervised machine learning so powerful is its
ability to learn to deal with the errors and noise of the real world, through a statistical
approach.

Thus, supervised machine learning systems are best equipped to give you an explicit
answer based on prior knowledge. For example, we can feed a system with lots of
examples of known ransomware and it will learn the common indicators of that
malware and be able to detect similar attacks in the future. However, overfitting is a
common problem in supervised machine learning, where model parameters are too
finely tuned to the training data.

Instead of learning the essence of a category, the machine learns a particular example
– for example, a machine may learn to recognize a German Shepherd, but fail to
understand ‘dogs’ as a category, when distinguishing between ‘dogs’ and ‘cats,’ despite
recognizing the features that make that German Shepherd pertain to the group.
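
The overfitting problem described above can be illustrated with a toy sketch. It assumes two invented features (weight and shoulder height) and compares a model that memorizes its training examples with a nearest-centroid model that learns a summary of each class. The data, function names and feature choice are hypothetical and are not taken from Darktrace’s technology.

```python
# Toy illustration of overfitting: a model that memorizes its training
# examples versus one that learns a per-class summary (nearest centroid).
# All feature values below are made up for illustration.
from math import dist

# (weight_kg, shoulder_height_cm) -> label
training = [
    ((34.0, 62.0), "dog"),  # German Shepherd
    ((36.0, 64.0), "dog"),  # German Shepherd
    ((4.0, 24.0), "cat"),
    ((5.0, 25.0), "cat"),
]


def memorizer(sample):
    """Overfitted model: only recognizes samples seen verbatim in training."""
    lookup = dict(training)
    return lookup.get(sample, "unknown")


def centroids(examples):
    """Average the feature vectors of each class."""
    sums = {}
    for features, label in examples:
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((total[0] + features[0], total[1] + features[1]), count + 1)
    return {label: (t[0] / n, t[1] / n) for label, (t, n) in sums.items()}


def nearest_centroid(sample, class_centroids):
    """Generalizing model: pick the class whose centroid is closest."""
    return min(class_centroids, key=lambda label: dist(sample, class_centroids[label]))


class_centroids = centroids(training)
labrador = (30.0, 57.0)  # a dog breed absent from the training set

print(memorizer(labrador))                          # unknown
print(nearest_centroid(labrador, class_centroids))  # dog
```

The memorizer is perfect on its training data yet has no answer for a new breed, while the centroid model trades exactness for a usable notion of the category.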

Supervised Machine Learning and Cyber Security

In the information security context, supervised machine learning is trained on a
database of previously seen behaviors, where each behavior is known to be either
malicious or benign and is labeled as such.

New activities are then analyzed to see whether they more closely match those in
the malicious class, or those in the benign class. Any that are evaluated as being
sufficiently likely to be malicious are then flagged as threats.

Systems that rely entirely on supervised machine learning have fundamental constraints:
• Malicious behaviors that deviate sufficiently from those seen before will fail to be
  classified as such, hence will pass undetected.
• A large amount of human input is needed to label the training data.
• Any mislabeled data or human bias introduced can seriously compromise the
  ability of the system to correctly classify new activities.

An Examination of Unsupervised Machine Learning

Unsupervised machine learning is critical because, unlike supervised approaches,
it does not require labeled training data. Instead it is able to identify key patterns
and trends in the data, without the need for human input. Unsupervised learning can
therefore take computer processing beyond what programmers already know or can
imagine, and discover previously unknown relationships.

Darktrace uses unique unsupervised machine learning algorithms to analyze enterprise
data at scale, and make billions of probability-based calculations based on the
evidence that it sees.

Instead of relying on knowledge of past threats, it independently classifies data
and detects compelling patterns. From this, it forms an understanding of ‘normal’
behaviors across the infrastructure, pertaining to devices, users, or cloud containers
and sensors, and detects deviations from this evolving ‘pattern of life’ that may point
to a developing threat.

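
The idea of detecting deviations from a learned ‘pattern of life’ without labels can be sketched in a few lines. This is a deliberately simple stand-in (a z-score against one device’s own history), not Darktrace’s algorithm; the function name, the feature and the numbers are assumptions made for illustration.

```python
from statistics import mean, stdev


def deviation_score(history, observation):
    """Number of standard deviations the observation sits from the learned baseline."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return 0.0 if observation == baseline else float("inf")
    return abs(observation - baseline) / spread


# Hypothetical daily upload volumes (MB) for one device; no labels involved.
uploads_mb = [100, 110, 90, 105, 95]

print(deviation_score(uploads_mb, 102))  # small: consistent with this device's history
print(deviation_score(uploads_mb, 900))  # large: a candidate anomaly

# The baseline keeps evolving: each new observation is folded into the history.
uploads_mb.append(102)
```

No example of an attack is needed: the only reference point is the device’s own past behavior, which is updated as new evidence arrives.
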
Machine learning has presented a significant opportunity to the cyber security
industry. New machine learning methods can vastly improve the accuracy of threat
detection thanks to the greater amount of computational analysis they can handle.
They are also heralding a new era of autonomous response, as machine learning
systems are sufficiently intelligent to understand how and when to fight back against
in-progress threats.

Darktrace Machine Learning: Combining Unsupervised with Supervised

Darktrace technology is powered, at its core, by unsupervised machine learning
algorithms that uncover rare and previously-unseen threats that other approaches
fail to detect. Over the years, our R&D team in Cambridge, England, has continually
developed and deepened the capability of its set of proprietary technologies, benefiting
from the knowledge and experience from the thousands of deployments of Cyber
AI across the world.

Today, the Darktrace AI architecture is made up of deep learning techniques that
supplement the unsupervised algorithms with expert knowledge from the field.

Reverend Thomas Bayes

The mathematics at the forefront of Darktrace’s machine learning approach are
anchored in the seminal work of British mathematician Thomas Bayes (1702–1761).
His theory of conditional probability provides a mathematical bridge
between objective, developed methods and the subjective world that we populate.
An advanced approach to Bayesian theory, developed by mathematicians from
the University of Cambridge, provides a filter to ascertain the true meaning of
vague and profuse data.

Darktrace’s use of Bayesian probability as part of its unsupervised machine
learning approach uniquely enables Darktrace’s technology to:
• Discover previously unknown relationships.
• Independently classify data.
• Detect compelling patterns that define what might be considered normal behavior.
• Work without prior assumptions when needed.
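
For reference, the conditional-probability rule that bears Bayes’ name can be written as follows. This is the standard textbook statement, not a description of Darktrace’s proprietary models. H stands for a hypothesis such as ‘this device is compromised’ and E for a piece of observed evidence; the numbers in the worked line are invented for illustration.

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}

% Illustrative values only: P(H) = 0.01, P(E \mid H) = 0.9, P(E \mid \neg H) = 0.05
P(H \mid E) = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.05 \times 0.99}
            = \frac{0.009}{0.0585} \approx 0.154
```

Even strong evidence moves a 1% prior only to about 15%, which is why a probabilistic system weighs many observations together rather than alerting on any single one.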

Core Principles of Darktrace’s Machine Learning:
• Learns ‘on the job’ – it does not depend upon knowledge of previous attacks.
• Thrives on complexity and diversity of modern businesses.
• Constantly revises assumptions about behavior, using probabilistic mathematics.
• Always up to date, and not reliant on human input.

The impact of Darktrace’s unsupervised machine learning on cyber security is
transformative. Its Cyber AI technology has quickly proven itself capable of seeing
hitherto undiscovered cyber events, from a variety of threat sources, which would
otherwise have gone unnoticed.

These include:
• Insider threat – malicious or accidental
• Zero-day attacks – previously unseen, novel exploits
• Latent vulnerabilities
• Machine-speed attacks – ransomware and other automated attacks that propagate
  and/or mutate very quickly
• Cloud and SaaS-based attacks
• Silent and stealthy attacks
• Advanced spear-phishing

Ranking Threat

Darktrace’s AI accounts for ambiguities by distinguishing between the subtly differing
levels of evidence that characterize network data. Instead of generating the simple
binary outputs ‘malicious’ or ‘benign’, Darktrace’s mathematical algorithms produce
outputs marked with differing degrees of potential threat. This enables users of the
system to rank alerts in a rigorous manner, and prioritize those which most urgently
require action.

Meanwhile, it avoids the problem of numerous false positives associated with
a rule-based approach.

At its core, Darktrace mathematically characterizes what constitutes ‘normal’ behavior,
based on the analysis of a large number of different measures of a device’s network
behavior, including:
• Server access
• Data volumes
• Timings of events
• Credential use
• Connection type, volume, and directionality
• Directionality of uploads/downloads
• File type
• Admin activity
• Resource and information requests
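
The difference between a binary verdict and a graded threat output is easy to see in code. The sketch below assumes each alert already carries a score between 0 and 1; the `Alert` type, the field names and the sample alerts are hypothetical, and how such a score would be computed is outside the scope of the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    device: str
    description: str
    threat_score: float  # 0.0 (almost certainly benign) to 1.0 (almost certainly a threat)


def rank_alerts(alerts, top_n=None):
    """Return alerts ordered from most to least urgent, optionally truncated."""
    ordered = sorted(alerts, key=lambda a: a.threat_score, reverse=True)
    return ordered if top_n is None else ordered[:top_n]


alerts = [
    Alert("laptop-17", "unusual credential use", 0.62),
    Alert("server-03", "large outbound transfer at 03:00", 0.91),
    Alert("printer-02", "new DNS lookups", 0.18),
]

for alert in rank_alerts(alerts, top_n=2):
    print(f"{alert.threat_score:.2f}  {alert.device}: {alert.description}")
```

With graded scores an analyst can work from the top of the list down, whereas a binary output would present all three alerts as equally urgent.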

Clustering Methods

In order to model what should be considered as normal for a device or cloud container,
its behavior is analyzed in the context of other similar entities on the network. Darktrace
uses unsupervised machine learning to algorithmically identify significant groupings,
a task which is impossible to do manually.

To create a holistic image of the relationships within the network, Darktrace employs
a number of different clustering methods, including matrix-based clustering,
density-based clustering, and hierarchical clustering techniques. The resulting clusters
are then used to inform the modeling of the normative behaviors.

Modeling Dynamic Environments

A major challenge in modeling the behaviors of a dynamically evolving infrastructure
is the huge number of potential predictor variables. For the observation of packet
traffic and host activity within an enterprise LAN or WAN, where both input and output
can contain many interrelated features (protocols, source and destination machines,
log changes, and rule triggers), learning a sparse and consistent structured predictive
function is crucial.

In this context, Darktrace employs a cutting-edge, large-scale computational approach
to understand sparse structure in models of network connectivity based on applying
L1-regularization techniques (the lasso method). This allows Darktrace’s AI to discover
true associations between different elements of a network; the estimation can be cast
as an efficiently solvable convex optimization problem and yields parsimonious models.
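
To show why L1 regularization yields sparse, parsimonious models, here is a minimal lasso solver using cyclic coordinate descent with soft-thresholding. It is a generic textbook formulation in plain Python, not Darktrace’s implementation; the two-feature data set and the penalty value are invented so that one feature matters and the other barely does.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=100):
    """Minimize (1/2n) * sum((y - X w)^2) + penalty * sum(|w|) by cyclic coordinate descent."""
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0    # correlation of feature j with the residual that ignores feature j
            scale = 0.0  # squared norm of feature j
            for x, y in zip(rows, targets):
                prediction_without_j = sum(
                    weights[k] * x[k] for k in range(d) if k != j
                )
                rho += x[j] * (y - prediction_without_j)
                scale += x[j] * x[j]
            if scale == 0.0:
                weights[j] = 0.0  # an all-zero feature carries no information
                continue
            weights[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return weights


# Hypothetical standardized features; the target is 2.0 * first + 0.1 * second.
rows = [(1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (-1.0, -1.0)]
targets = [2.1, -1.9, 1.9, -2.1]

weights = lasso_coordinate_descent(rows, targets, penalty=0.5)
print(weights)  # approximately [1.5, 0.0]: the weak second feature is dropped entirely
```

The weak association is set to exactly zero rather than merely made small, which is the property that makes the resulting model of connectivity sparse. The objective is convex, so coordinate descent converges to a global minimum.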

Within one week of installing Darktrace, the Enterprise Immune System notified us
to threats and vulnerabilities we had been totally unaware of.
- Gabe Cortina, Chief Technology Officer,
Bunim/Murray Productions
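
The peer-grouping idea from the Clustering Methods section can be sketched with a small single-linkage clustering, which is one simple form of hierarchical clustering. The device names, the two behavioral features and the distance threshold are hypothetical, and a real system would normalize features before measuring distance.

```python
from math import dist


def peer_groups(profiles, max_distance):
    """Group devices whose behavior profiles are linked by chains of close neighbors.

    Equivalent to single-linkage hierarchical clustering cut at max_distance.
    """
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist(profiles[a], profiles[b]) <= max_distance:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical per-device features: (connections per hour, GB transferred per day)
profiles = {
    "laptop-a": (40.0, 1.0),
    "laptop-b": (45.0, 1.2),
    "laptop-c": (42.0, 0.9),
    "server-x": (900.0, 50.0),
    "server-y": (950.0, 55.0),
}

print(peer_groups(profiles, max_distance=60.0))
# [['laptop-a', 'laptop-b', 'laptop-c'], ['server-x', 'server-y']]
```

Once devices are grouped this way, a device can be compared against its peers as well as against its own history.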

Recursive Bayesian Estimation

To combine these multiple analyses of digital activity, Darktrace leverages the power
of Recursive Bayesian Estimation (RBE). Using RBE, Darktrace’s mathematical models
are able to constantly adapt to new information as it becomes available to the system.

Continually recalculating threat levels in the light of new data, the Darktrace Immune
System can discern significant patterns in data flows indicative of attacks, where
conventional signature-based methods see only chaos.

As we shifted to the new mode of operation with people being remote, Darktrace
very quickly gave us the ability to have the same functionality that we had when
everybody was working on campus.
- Irving Bruckstein, CIO, Salve Regina University

Enhancing Detection

Deep learning is a subset of machine learning that uses the cascading interactions of
layered mathematical processes – known as neural nets – to give intelligent systems
a higher degree of insight. Multi-layered neural nets can improve the detection and
remediation of certain threats, for example, in the identification of DNS anomalies,
which are less effectively tracked by other machine learning methods. Darktrace’s
deep learning system assigns a score to all DNS data from a device or digital entity,
with the purpose of identifying suspicious activity even faster.

Darktrace clusters devices into peer groups, based on its own understanding of how
those devices behave, and uses supervised learning to uncover sequences of breaches,
unusual patterns, or to detect aberrant activity at a higher, more holistic level. For
example, the well-known WannaCry ransomware was detected by Darktrace as it
breached a number of different ‘pattern of life’ models. Combining this approach to
detection with supervised machine learning, Darktrace can replicate the process of a
human interpreting various sets of breaches for a device, network or data environment
over time and so present correlated alerts instead of a multitude.

Supervised machine learning is also used by Darktrace to understand more about the
environment, without a human having to label it. By observing millions of different
smartphones, for example, Darktrace gets faster and faster at identifying a new
device as a ‘smartphone’, and even what type of smartphone it is.

Using a combination of supervised and unsupervised techniques to complement its
core unsupervised machine learning algorithms, Darktrace builds up unique, contextual
knowledge about activity and integrates the insights of our global deployments to
improve threat detection.
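
The Recursive Bayesian Estimation described earlier in this section can be sketched as a loop in which each posterior becomes the prior for the next observation. This is the generic two-hypothesis form (threat versus benign), not Darktrace’s model; the prior and the likelihood pairs are invented for illustration.

```python
def bayes_update(prior, likelihood_if_threat, likelihood_if_benign):
    """One Bayesian update: posterior probability of 'threat' after one observation."""
    evidence = likelihood_if_threat * prior + likelihood_if_benign * (1.0 - prior)
    return likelihood_if_threat * prior / evidence


def recursive_estimate(prior, observations):
    """Fold observations in one at a time; each posterior becomes the next prior."""
    belief = prior
    history = [belief]
    for likelihood_if_threat, likelihood_if_benign in observations:
        belief = bayes_update(belief, likelihood_if_threat, likelihood_if_benign)
        history.append(belief)
    return history


# Hypothetical evidence stream for one device, as
# (P(observation | threat), P(observation | benign)) pairs.
observations = [
    (0.6, 0.2),   # login at an unusual hour
    (0.7, 0.1),   # connection to a rare external host
    (0.3, 0.6),   # ordinary internal traffic: evidence for benign
    (0.8, 0.05),  # large outbound transfer
]

history = recursive_estimate(0.01, observations)
for step, belief in enumerate(history):
    print(step, round(belief, 4))
```

The estimate rises with each suspicious observation and falls when benign-looking evidence arrives, so the threat level is continually recalculated rather than decided once.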

Autonomous Response

Because Darktrace’s artificial intelligence is capable of understanding the ‘pattern of
life’ across the entire digital infrastructure at a granular level, and of detecting specific
deviations from normal, benign activity, it can also come to autonomous decisions about
how to appropriately and proportionately respond to an in-progress attack.

AI response can be triggered by a significant deviation from the derived ‘normal’
for the device and its peer group, by the detection of specific malicious indicators
or unwanted activities, or by a combination of small but meaningful indicators and
subtle deviations from expected behavior. This autonomous response technology,
known as Antigena, is supported by this unsupervised ‘pattern of life’ detection, as
well as a range of cutting-edge supervised and unsupervised classifiers that measure
associations between users, activity patterns and user intents.

Antigena can generate proportionate and targeted surgical responses to deliver these
precise actions without disrupting daily business operations. Actions are targeted to
the source of the threat, and escalated only when necessary. For instance, Antigena
may strip active parts from an email attachment, sever an unusual FTP connection,
or block access to Office 365 from an anomalous IP range.

Thanks to the combination of core unsupervised AI and machine learning, this solution
can also learn from itself, as well as learn passively from the data that it observes.
For example, when Antigena generates an autonomous response action, a feedback
reinforcement loop is triggered.

Available across cloud, email, IoT, and on-premise networks, Darktrace Antigena is a
crucial part of the data-agnostic Cyber AI Platform that works across an organization’s
entire digital business. Used in this way, Cyber AI technology does not replace the
human’s function, but rather serves to enhance it. Darktrace Antigena acts faster
than a human, buying the security team precious time to catch up.

We no longer live in an era where cyber-attacks are limited to the desktop or server.
Darktrace’s machine learning fights the battle before it has begun.
- Michael Sherwood, CIO, City of Las Vegas
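
The three trigger conditions listed in this section (a significant deviation, a specific malicious indicator, or several weak signals combined with a subtle deviation) can be written as a small decision rule. Everything concrete here is an assumption made for illustration: the function name, the 0-to-1 deviation scale and all three thresholds are hypothetical and do not describe how Antigena is actually configured.

```python
def response_trigger(deviation_score, known_bad_indicators, weak_signals,
                     significant=0.9, subtle=0.5, min_weak_signals=3):
    """Return the reason an autonomous response would be triggered, or None.

    deviation_score: 0.0 (normal for the device and its peers) to 1.0 (highly unusual).
    known_bad_indicators: specific malicious indicators or unwanted activities seen.
    weak_signals: small indicators that are not conclusive on their own.
    """
    if deviation_score >= significant:
        return "significant deviation from normal"
    if known_bad_indicators:
        return "specific malicious indicator"
    if len(weak_signals) >= min_weak_signals and deviation_score >= subtle:
        return "combination of weak indicators and subtle deviation"
    return None


print(response_trigger(0.95, [], []))
print(response_trigger(0.20, ["connection to blocked domain"], []))
print(response_trigger(0.60, [], ["odd hour", "new port", "rare destination"]))
print(response_trigger(0.60, [], ["odd hour"]))
```

The last call returns `None`: a single weak signal with a moderate deviation is not enough on its own, which reflects the emphasis on proportionate action.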

Automating Threat Investigation Processes

Finally, Darktrace also uses various machine learning techniques to automate repetitive
and time-consuming tasks carried out during investigation workflows. By analyzing
how expert cyber analysts interact with the AI’s output, for example how they triage
threat alerts and how they use third-party sources, Darktrace is able to replicate those
expert behaviors and automate certain analyst functions. This allows for increasingly
efficient and simplified investigations for analysts of all maturity levels. It also gives
security teams the crucial time they need to focus on higher-value strategic work,
such as managing risk and focusing on broader improvements to the business.

Darktrace leverages supervised learning in a capability known as the Cyber AI Analyst,
which mimics the way a human carries out the threat investigation process. In the
initial stages of the investigation process, the technology will
make broad hypotheses about what is happening, and then will query and analyze
this information as a human would – using custom algorithms and other machine
learning techniques.

Once the investigation has been launched, its results can be classified using
supervised machine learning to determine incidents of interest, at a speed and scale
only possible with AI.

The Cyber AI Analyst can often detect details that a human might miss, or might not
have time to identify, and can decide whether or not an initial hypothesis holds in a
matter of minutes. More crucially, the technology can classify and store the results
of these investigations, allowing for only a small number of high priority incidents to
be presented at any one time.

It communicates its findings and recommendations immediately in the user interface,
and additionally as detailed PDF reports, which present only a few high-priority incidents
at any one time in natural language. These are then enriched with context and security
insights that can be reviewed and understood by executives and end-users alike.

Crucially, the Cyber AI Analyst is able to adapt to new and unprecedented situations
on the fly, enabling users to spend less time trawling through alerts, and more time
prioritizing the strategic work that matters.

It’s mind-bending that Darktrace has been able to do this.
Having the Cyber AI Analyst stitching together multiple
security alerts at once helps us get to the
high-value work quickly.
- Phillip Miller, CISO, Brooks Brothers

Conclusion
Our generation is witnessing the machine learning revolution. We are seeing shifts
in working practices brought about by the replacement of muscle with machine, the
automation of repetitive tasks, and now the replacement of low value, thoughtful
tasks with machines capable of handling big data and making vast calculations.
As networks have grown in scope and complexity, the opportunities for attackers to
exploit the gaps have increased. Walls are no longer enough to protect the corporate
networks spilling into home environments, and rules-based tools cannot keep up
with all possible attack vectors, and cannot respond fast enough if a machine-speed
attack hits. A constantly evolving cyber-attack landscape requires a step up in our
detection capability, using machine learning to understand the environment, filter the
noise and take action where threats are identified.

Darktrace is one of the few in the threat analysis space doing it right.
- Alissa Knight, Senior Analyst, Aite group

Darktrace’s technology has become a vital tool for security teams attempting to
understand the scale of their network, observe levels of activity, and detect areas of
potential weakness. Machine learning technology is the fundamental ally in the defense
of systems from the hackers and insider threats of today, and in formulating response
to unknown methods of cyber-attack. It is a momentous change in cyber security.
