
AD2: Anomaly Detection on Active Directory Log Data for Insider Threat Monitoring

Chih-Hung Hsieh, Chia-Min Lai, Ching-Hao Mao, Tien-Cheu Kao, and Kuo-Chen Lee*
Institute for Information Industry, Taipei, Taiwan
Email: [email protected], {senalai, chmao, tckao}@iii.org.tw, [email protected]
*: Corresponding Author

Abstract—That what you see is not necessarily believable is not a rare situation in cyber security monitoring. Due to various camouflage tricks, such as packing or the use of virtual private networks (VPNs), detecting an "advanced persistent threat" (APT) with only a signature-based malware detection system becomes more and more intractable. On the other hand, by carefully modeling the sequential behaviors of a user's daily routine, the probability that an account generates certain operations can be estimated and used in anomaly detection. To the best of our knowledge, this project is the first to propose a behavioral analytic framework dedicated to analyzing Active Directory domain service logs to monitor potential insider threats. Experiments on a real dataset not only show that the proposed idea opens a feasible new direction for cyber security monitoring, but also give a guideline on how to deploy this framework in various environments.

Keywords—Active Directory Log Analysis, Anomaly Detection, Behavioral Modeling, Machine Learning, Advanced Persistent Threat.

I. INTRODUCTION

The so-called "advanced persistent threat" (APT) is a set of stealthy and continuous computer hacking processes. Typically, APT processes maintain a high degree of covertness over a long period of time, use sophisticated malware techniques to exploit vulnerabilities in systems, and, once the attackers gain control of victim systems, continuously monitor or extract confidential data from specific targets. According to cyber security technical reports from various organizations and companies, it takes on average at least 346 days for more than 81% of victims to become aware that they have been hacked [1]. Moreover, due to various camouflage tricks, such as packing, obfuscated shellcode, or virtual private networks (VPNs), signature-based malware detection techniques, such as intrusion detection systems (IDS), are becoming less useful, especially for identifying APTs at the post-compromise stage. Facing these challenges, considering an account's sequential behaviors may give cyber security engineers additional context-based evidence and a better chance to detect threats from insiders [2].

In this report, instead of expert rules with only a few pre-defined signatures, we propose a threat detection method based on account behavior sequences recorded as Active Directory (AD) logs. The AD domain controller of an organization records all related information whenever any intra-net account tries to allocate or acquire various resources and services. To the best of our knowledge, no previous work is dedicated to threat identification using sequential AD data modeling. The proposed framework 1) takes the AD logs as time-series input data providing chronological evidence, 2) emphasizes sequential context mining from the collected AD logs, and 3) for each account, builds the probabilistic Markov model that best depicts the likelihoods of different behaviors occurring.

To assess real-world feasibility, a real dataset of AD logs was collected from a real organization in Taiwan with a large number of employees. After proper pre-processing and behavior learning, the performance of the proposed framework is measured with cross-validation in order to evaluate its effectiveness and robustness objectively. A fair N-fold cross-validation experiment shows that the learnt behavioral Markov model gives successful results in terms of outstanding recall rates as well as fair precision. The merits and contributions of this paper are fourfold: 1) the insider threat detection problem is formulated for the first time as a sequential behavior modeling problem over time-series AD log data; 2) the learnt model is good at monitoring post-compromise anomalous behavior, achieving 66.6% recall and 99.0% precision; 3) useful domain knowledge provided by a well-known cyber security company, TrendMicro, is encoded as an annotation profile and indeed helps build an accurate model; and 4) a practical guideline for parameter selection by future users is proposed, based on a series of discussions of how the framework parameters affect the detection results.

In the remainder of this report, Section II briefly introduces the target Active Directory domain service. Section III describes the kernel methods of the proposed framework, including the Markov model, the encoded representation of domain knowledge, and the designed anomaly detection mechanism. The performance of the proposed framework is evaluated in Section IV. Finally, Section V concludes this project.

II. BACKGROUNDS & MATERIALS

A. Active Directory Domain Service

The Active Directory domain service is a directory service that Microsoft developed for Windows domain networks [3][4]. An AD domain controller authenticates and authorizes all users and computers in a Windows domain network, assigning and enforcing security policies for all computers and installing or updating software. For example, when a user logs into a computer that is part of a Windows domain, Active Directory checks the submitted password and determines whether the

user is a system administrator or a normal user [5]. The AD domain controller of an organization records all related information whenever any intra-net account tries to allocate or acquire various resources and services. To the best of our knowledge, no previous work is dedicated to threat identification using sequential AD data modeling.

Figure 1 illustrates how a "Kerberos" process, one variant of the AD domain, works and interacts with intra-net accounts. The process goes through the following steps; details about this "Kerberos" example can be found in [6]:

Stage 1: The Authentication Service Exchange
  Step 1: Kerberos authentication service request (KRB_AS_REQ)
  Step 2: Kerberos authentication service response (KRB_AS_REP)
Stage 2: The Ticket-Granting Service Exchange
  Step 3: Kerberos ticket-granting service request (KRB_TGS_REQ)
  Step 4: Kerberos ticket-granting service response (KRB_TGS_REP)
Stage 3: The Client/Server Exchange
  Step 5: Kerberos application server request (KRB_AP_REQ)
  Step 6: Kerberos application server response, optional (KRB_AP_REP)

Fig. 1. An illustration of how the AD domain controller works and interacts with intra-net accounts.

B. The Real Dataset

In this experiment, a certain government organization of 95 employees in Taiwan was selected as the deployment environment of the proposed method. The Active Directory domain service, running on Windows Server 2008 R2, is mounted on this organization's domain network and keeps monitoring all service requests and resource allocations raised by intra-net accounts. In this setting, a total of 12,310,519 logs with a size of 22.5 gigabytes (GB) was collected during two months, from 2014/11/26 to 2015/01/02. Among the 22.5 GB of data, only the logs belonging to accounts of real employees were kept for analysis; they form the original dataset D used in the following experiments. In the end, D consists of 2,887,504 logs, recorded in a 5.2 GB file, from 95 employees. It should be mentioned that the event codes "4624" and "4634" occur far more often than all others and could lead to a biased Markov model; for the dataset D in the following experiments, these two event codes are therefore safely ignored when forming model states.

III. METHOD

The ultimate goals of this project are as follows: 1) to collect the Active Directory log data generated whenever any account tries to access the Active Directory domain service; 2) to build a behavioral model capable of describing the personal tendency of each user; and 3) to estimate the likelihood that a given account's model generates certain subsequent event codes. Based on these requirements, the proposed framework is made up of three major modules. The first is responsible for pre-processing the raw data, composed of all access logs caused by all intra-net accounts, and for forming the input dataset for the following analytic stages. In the second module, the Markov model, a famous machine learning algorithm well known as a tool for modeling sequential state changes, is adopted as the kernel approach to summarizing a user's behaviors. The last module implements an anomaly detection process that determines the likelihood that a certain account's model produced the given input sequences of event codes. Figure 2 shows the whole framework comprising these three modules. The remaining parts of this section briefly introduce the adopted Markov model, the usage of prior knowledge, and the details of the three major modules.

A. Markov Model

In probability theory, a Markov model is a stochastic approach to modeling randomly changing systems where it is assumed that future states depend only on the present state and not on the sequence of events that preceded it (that is, it assumes the Markov property) [7]. Generally, this assumption is taken because it enables subsequent reasoning and computation regarding the model that would otherwise be intractable. The simplest Markov model is the Markov chain. It models the state of a system with a random variable that changes through time. In this context, the tendency of every transition from one state to another is described by a probability, and the Markov property implies that this probability distribution depends only on the previous state. Because Markov models are good at describing sequential state changes, there are many applications, such as speech recognition [8], hand-written text recognition [9], gesture recognition [10], and cyber security intrusion detection [2], based on the Markov model or its popular variant, the hidden Markov model. Figure 3 is an illustration of a Markov model with 3 states [11].

B. Generic Markov-Model-State Annotation Profile

Because the proposed method tries to build a behavioral model for each account and to detect anomalies once accounts
do not act like themselves compared to their historical daily routines. For this purpose, instead of considering only the Active Directory event code, making use of as much additional information as possible may significantly improve the model's representational power. In this subsection, we define a Markov-model-state annotation (MMA) profile to encode proper prior knowledge, describing which event codes should be co-considered with certain specific attributes to form a complete Markov state. For example, according to the Active Directory domain service specification provided by Microsoft, event code "4771" indicates that an account failed pre-authentication for some reason. Therefore, event code "4771" provides much more useful information about the details of the failure when it is co-considered with the attribute "result code". Figure 4 is an illustration of a Markov-model-state annotation profile. It includes three annotations for three different event codes. For example, event code 4623 should be co-considered with the two fields "xxx" and "yyy" to form one state of the Markov model. Note that any event code not listed in the profile is given the default setting, which uses only the event code itself as the Markov state.

Fig. 2. The proposed framework for insider threat monitoring.

Fig. 3. An example of a 3-state Markov model.

Annotation 1:
Event Code = 4623 with
Field_Name1 = xxx,
Field_Name2 = yyy.

Annotation 2:
Event Code = 4624 with
Field_Name1 = xxx,
Field_Name2 = yyy,
Field_Name3 = zzz.

Annotation 3:
Event Code = 4723 with
Field_Name1 = aaa,
Field_Name2 = bbb,
Field_Name3 = ccc.

Fig. 4. An example of a Markov-model-state annotation profile.

C. Module I: Generate a Dataset of Event Code Chains

Because the collected raw data mixes AD log data from multiple different accounts, and the proposed method tries to build a separate behavior model for each account, one function of Module I is to partition the original AD log data into per-account dataset files, one for each account. Besides, because the adopted Markov approach models an account's multiple instances of different operations as one probabilistic model, the different operation instances also need to be separated: a long consecutive log sequence of event codes is divided into multiple shorter event code segments. The hypothesis used here to separate those event code segments is that, considering idle time in naive but realistic circumstances, the time intervals between two independent operation segments are usually longer than those within one operation; that is, during idle time, the time intervals between two subsequent AD logs are usually longer than those in working periods. In Module I, a real-valued parameter θ is adopted as the cutting threshold: θ is the longest allowed time interval and is used to divide an event sequence into two segments. If the time interval between two subsequent AD logs is longer than θ, two different event segments are generated by cutting out the idle time.
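The idle-time cutting rule of Module I can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the (timestamp, event code) record layout and function name are assumptions for the example:

```python
from datetime import datetime, timedelta

def segment_event_codes(logs, theta=timedelta(minutes=6)):
    """Split one account's chronologically sorted (timestamp, event_code)
    records into segments wherever the idle gap exceeds theta."""
    segments, current = [], []
    prev_ts = None
    for ts, code in logs:
        if prev_ts is not None and ts - prev_ts > theta:
            segments.append(current)  # idle gap found: close the segment
            current = []
        current.append(code)
        prev_ts = ts
    if current:
        segments.append(current)
    return segments

logs = [
    (datetime(2014, 11, 26, 9, 0, 0), "4768"),
    (datetime(2014, 11, 26, 9, 1, 30), "4769"),
    (datetime(2014, 11, 26, 9, 30, 0), "4771"),  # 28.5-minute gap > theta
    (datetime(2014, 11, 26, 9, 31, 0), "4769"),
]
print(segment_event_codes(logs))  # [['4768', '4769'], ['4771', '4769']]
```

With θ = 6 minutes, only the 28.5-minute gap triggers a cut, so the four logs become two operation segments.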
D. Module II: Build the Markov Model from an Event Chain Dataset

This section defines the components of the adopted Markov model and how to build it from a dataset consisting of an employee's event chains.

Given the dataset D_i containing the event chains of the i-th employee, i = 1, ..., E, the resulting Markov model should include the following components:

1) A finite state set S = {s_1, s_2, ..., s_ns} that contains all possible states of the Markov model, defined by the MMA mentioned in the previous section and derived from D_i. Note that ns is the total number of derived states in the Markov model.

2) A 1 × ns initial probability (IP) vector, IP = [ip_1, ..., ip_ns], where ip_i represents the probability that the i-th state is the initial state of an event code segment, with \sum_{i=1}^{ns} ip_i = 1. For all i = 1, ..., ns,

ip_i = (# event chains starting with s_i in D_i) / (# event chains in D_i)

3) An ns × ns transition probability (TP) matrix, TP = [tp_{i,j}], i, j = 1, ..., ns, where tp_{i,j} represents the transition probability from the i-th state to the j-th state, with the constraint \sum_{j=1}^{ns} tp_{i,j} = 1 for i = 1, ..., ns. For all i, j = 1, ..., ns,

tp_{i,j} = (# transitions from s_i to s_j in D_i) / (# transitions starting from s_i in D_i)

Given an observed event code segment c = [o_1, o_2, ..., o_T] of length T, where o_t denotes the t-th observed Markov state in c, the model probability (MP) that the Markov model (S, IP, TP) generates c can be calculated as:

MP(c) = ip_{o_1} \prod_{t=1}^{T-1} tp_{o_t, o_{t+1}}

E. Module III: Probability Estimation Given the Markov Model

The anomaly detection mechanism for insider threat monitoring is implemented in Module III. After the training dataset is generated by Module I, the output of Module II is a personal behavioral model accompanied by a reference probability, P_ref. This reference probability is calculated by estimating the likelihood that the Markov model generates the very training dataset used to build it. P_ref indicates how well the resulting model fits its training dataset and how likely this employee is to produce the corresponding logs when he does his daily routine jobs. Based on P_ref and a user-defined threshold parameter δ, the following condition, built on equation (1), is designed to detect anomalies:

Cond_i: (NMP_i(Tr_i) − NMP_i(Dataset_unseen)) / NMP_i(Tr_i) ≥ δ

NMP_i(Dataset) = ( \prod_{c \in Dataset} MP_i(c)^{1/l(c)} )^{1/s(Dataset)}    (1)

where s(Dataset) is the size of Dataset (the number of event segments it contains) and l(c) is the length of segment c.

Tr_i denotes the training dataset used to build the model of employee i, i = 1, ..., E, where E is the number of employees. Assume that Dataset is a set of event segments given as input to employee i's learnt Markov model and that c is any event segment belonging to Dataset. MP_i(·) and NMP_i(·) return, respectively, the original and normalized model probabilities that employee i's learnt Markov model generates the whole Dataset. The reason for using a normalized probability is that the Markov model has two characteristics: 1) the longer an event segment is, the smaller the resulting probability; and 2) the more segments a dataset consists of, the smaller the resulting probability of that dataset. To cope with this and to provide a fair comparison between datasets or event segments of different sizes, we normalize the resulting probability MP_i(·) of a dataset into NMP_i(·), according to both the length of each event segment and the size of the dataset, as in equation (1). Note that NMP_i(Tr_i) is exactly the reference probability P_ref of the i-th employee mentioned at the beginning of this subsection.

The idea of using equation (1) in the anomaly detection mechanism is quite intuitive. Once the condition Cond_i is true, the probability that the i-th employee's Markov model generates the unseen dataset, NMP_i(Dataset_unseen), is lower than the i-th employee's reference probability (P_ref, i.e., NMP_i(Tr_i)) by at least the given ratio threshold δ. In this circumstance, it is quite unlikely that Dataset_unseen came from the i-th employee, so when Cond_i is true an anomaly alert should be triggered.

IV. PARAMETER SELECTION & PERFORMANCE EVALUATION

A. Experiment Settings

Considering both the effectiveness and robustness of the proposed method, performance is measured fairly with N-fold cross validation. Assume that D is the dataset containing the event chains of all employees; then D_1 ∪ D_2 ∪ ... ∪ D_E = D, where D_i is the dataset of event chains of the i-th employee, i = 1, ..., E, and E is 95 in the current experiment. In N-fold cross validation, each D_i is first partitioned into N parts. In each fold, one of the N parts is used for validation, named InnerVa_ij, while the other N − 1 parts are combined to form the so-called InnerTr_ij for model building. Note that InnerTr_ij ∪ InnerVa_ij = D_i for j = 1, ..., N.
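Under the definitions above, Modules II and III can be sketched end to end. This is an illustrative reconstruction, not the authors' code: plain event codes stand in for full MMA-annotated states, states or transitions never seen in training simply get probability zero, and the normalization follows the geometric-mean reading of equation (1):

```python
from collections import Counter, defaultdict

def train_markov(segments):
    """Estimate the IP vector and TP matrix from one account's
    event-code segments (each segment is a list of Markov states)."""
    starts = Counter(seg[0] for seg in segments)
    total = sum(starts.values())
    ip = {s: n / total for s, n in starts.items()}
    trans = defaultdict(Counter)
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            trans[a][b] += 1
    tp = {a: {b: n / sum(row.values()) for b, n in row.items()}
          for a, row in trans.items()}
    return ip, tp

def mp(model, seg):
    """MP(c) = ip_{o1} * prod_t tp_{o_t, o_{t+1}}."""
    ip, tp = model
    p = ip.get(seg[0], 0.0)
    for a, b in zip(seg, seg[1:]):
        p *= tp.get(a, {}).get(b, 0.0)
    return p

def nmp(model, dataset):
    """Equation (1): the l(c)-th root per segment, then the
    s(Dataset)-th root over the whole dataset."""
    prod = 1.0
    for seg in dataset:
        prod *= mp(model, seg) ** (1.0 / len(seg))
    return prod ** (1.0 / len(dataset))

def is_anomaly(model, train_set, unseen, delta=0.05):
    """Cond_i: (NMP(Tr_i) - NMP(unseen)) / NMP(Tr_i) >= delta."""
    ref = nmp(model, train_set)  # reference probability P_ref
    return (ref - nmp(model, unseen)) / ref >= delta

train = [["4768", "4769", "4769"], ["4768", "4769"]]
model = train_markov(train)
print(is_anomaly(model, train, [["4771", "4769"]]))  # prints True
```

Here the unseen dataset starts from a state the account never produced, so its normalized probability collapses to zero and the condition fires for any positive δ.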
According to our N-fold cross validation setting, in each fold j and for each employee i, the corresponding InnerVa_ij is used as the Dataset_unseen in equation (1) to evaluate the model trained on InnerTr_ij. The model trained on InnerTr_ij then tries to differentiate whether the given input Dataset_unseen belongs to the corresponding account or not. In this experiment, i = 1, ..., E and j = 1, ..., N, with E and N set to 95 and 5, which results in a total of 5 × 95 × 95 = 45,125 test cases.

In the proposed framework, three kinds of parameters need to be determined. The following briefly reviews them, together with the ranges of values tuned in the parameter selection:

1) θ: the longest allowed time interval between two subsequent event codes in one event code segment, used to divide an event sequence into two segments. In this experiment, θ is set to 3, 6, and 9 minutes.

2) δ: the anomaly detection threshold in equation (1). In this experiment, δ is set to 1%, 5%, and 10%.

3) MMA: a Markov-model-state annotation profile that specifies which event codes should be co-considered with certain attributes as states of the Markov model. Any event code not listed in the MMA uses the default setting, i.e., the event code itself is the Markov state. In this experiment, the candidate MMAs fall into two categories. The first uses no domain knowledge: a) every event code uses none of its attributes (None-Used MMA, i.e., the default setting); b) every event code uses all of its attributes (All-Used MMA); and c) every event code uses randomly selected attributes (Random MMA). The second category is based on the domain knowledge given by our cooperating domain experts at the TrendMicro company (TrendMicro MMA). Figure 5 shows this knowledge-based TrendMicro MMA, which annotates event codes 4768, 4769, and 4771 with additional attributes.

Annotation 1:
Event Code = 4768 with
Field_Name1 = "return code",
Field_Name2 = "failure code",
Field_Name3 = "service name".

Annotation 2:
Event Code = 4769 with
Field_Name1 = "return code",
Field_Name3 = "service name".

Annotation 3:
Event Code = 4771 with
Field_Name1 = "return code",
Field_Name2 = "failure code",
Field_Name3 = "service name".

Fig. 5. The Markov-model-state annotation profile from the TrendMicro company.

B. Results & Discussions

As mentioned above, there are in total 45,125 test cases checking whether a given input Dataset_unseen belongs to the Markov model being evaluated. The predicted result can be positive or negative. A positive case means the predicted label is "Dataset_unseen does not belong to this account"; this is the anomaly case that Management Information System (MIS) engineers are interested in. A negative case means the input data belongs to this account. The measurements used to evaluate the proposed method and the corresponding parameter settings are recall, precision, and accuracy, as defined in equations (2)-(4). Tables I, II, and III show the performance under the different settings of the maximum interval time θ = 3, 6, and 9 minutes. Based on the results of 5-fold cross validation, it can be observed that:

1) The behavioral modeling seems to take advantage of the additional annotations when event codes are co-considered with annotated attributes as Markov states. This effect shows in that the TrendMicro MMA and the All-Used MMA obviously outperform the other two, the None-Used MMA and the Random MMA, in terms of recall and accuracy. Although the TrendMicro and All-Used MMAs perform almost exactly the same, the former is still more feasible in a realistic deployment environment: the size of the Markov model state space is about O(m^n), where n is the number of attributes and m is the number of different values of each attribute, so an MMA that co-considers all possible attributes may make the number of Markov states grow exponentially. The TrendMicro MMA, which incorporates significant prior knowledge, therefore provides the best and most rational results.

2) In this experiment, the longer the interval time θ, the better the performance of the built Markov model. The idea of the proposed framework is to model a user's complete operations as multiple instances of possible behaviors, and a time interval of 3 minutes is more likely to be too short to contain a complete operation than a maximum allowed idle time of 9 minutes. However, this setting should be customized for the scenario of each deployment environment, based on appropriate parameter selection experiments.

3) Unsurprisingly, smaller values of δ make the whole anomaly detection mechanism more sensitive, since Cond_i in equation (1) is triggered more easily (i.e., recall increases). On the other hand, although larger values of δ raise the threshold for triggering an anomaly alert, they increase the certainty once an alert is triggered (i.e., precision increases). When setting δ, the inevitable trade-off between recall and precision should be considered. Generally speaking, because cyber security attacks cause a huge amount of damage in most cases, ensuring an acceptable recall rate is our first rule of thumb.

4) By combining the prior domain knowledge from TrendMicro, AD2 can not only produce the highest performance in terms of recall and accuracy but also significantly reduce the number of possible states in the Markov model state set S compared to the All-Used MMA. However,
AD2 with the TrendMicro MMA can still only produce about 66% recall or accuracy. This shows that anomaly detection based only on AD logs may be limited. For this reason, a future direction of this study is that combining AD logs with various other logs or contexts may improve anomaly detection performance.

The evaluation measurements are defined as:

Recall = TP / (TP + FN)    (2)

Precision = TP / (TP + FP)    (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.

TABLE I. Performance evaluation with θ = 3 minutes.

MMA       | TrendMicro             | All Used
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 64.81% 61.99% 57.98%   | 64.82% 61.99% 57.98%
Precision | 99.09% 99.19% 99.29%   | 99.09% 99.19% 99.29%
Accuracy  | 64.60% 61.89% 58.01%   | 64.60% 61.89% 58.02%

MMA       | None Used              | Random
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 54.00% 51.77% 48.90%   | 56.86% 51.70% 45.14%
Precision | 99.01% 99.10% 99.21%   | 99.02% 99.13% 99.21%
Accuracy  | 53.95% 51.82% 49.06%   | 56.76% 51.77% 45.36%

TABLE II. Performance evaluation with θ = 6 minutes.

MMA       | TrendMicro             | All Used
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 64.88% 62.15% 58.30%   | 64.88% 62.15% 58.30%
Precision | 99.10% 99.15% 99.28%   | 99.10% 99.15% 99.28%
Accuracy  | 64.67% 62.02% 58.32%   | 64.67% 62.02% 58.32%

MMA       | None Used              | Random
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 54.27% 52.17% 49.44%   | 56.20% 50.15% 42.11%
Precision | 98.99% 99.04% 99.18%   | 99.13% 99.18% 99.26%
Accuracy  | 54.20% 52.18% 49.56%   | 56.17% 50.26% 42.40%

TABLE III. Performance evaluation with θ = 9 minutes.

MMA       | TrendMicro             | All Used
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 66.60% 64.38% 61.37%   | 66.60% 64.38% 61.37%
Precision | 99.07% 99.16% 99.25%   | 99.07% 99.16% 99.25%
Accuracy  | 66.34% 64.21% 61.32%   | 66.34% 64.21% 61.32%

MMA       | None Used              | Random
δ         | 1%     5%     10%      | 1%     5%     10%
Recall    | 54.55% 52.92% 50.59%   | 53.22% 48.11% 41.64%
Precision | 98.98% 99.07% 99.15%   | 99.10% 99.12% 99.20%
Accuracy  | 54.47% 52.92% 50.68%   | 53.24% 48.24% 41.93%

V. CONCLUSION

Not only does an APT attack maintain a high degree of covertness over a long period of time, it also usually causes a great deal of human effort or financial damage, so efficient and effective insider threat monitoring has become a hot issue during the recent decade. In this project, instead of expert rules with only a few pre-defined signatures, the idea of most-likely state change estimation is leveraged as a behavioral modeling technique. The logs of the Active Directory domain service from every intra-net account were collected, and an anomaly detection framework based on the famous Markov model algorithm was proposed to analyze the AD logs and to build a personal model for each account. Furthermore, a novel Markov-model-state annotation (MMA) profile was incorporated during the model training and testing phases. Experiments on a dataset from a real environment of 95 employees show that the proposed Markov-model-based approach combined with TrendMicro prior knowledge gives the best performance, about 66.6% recall and 99.0% precision, compared to models without domain knowledge. The major advantage of the TrendMicro MMA over the All-Used MMA is that it consists of only a few Markov-model-state annotations, so it significantly reduces the number of possible Markov states that must be considered. However, even with the prior domain knowledge, AD2 can only produce about 66% recall or accuracy. This suggests that anomaly detection based on analyzing AD logs may be limited by the information AD logs can convey, and it inspires our team to combine AD logs with various other logs or contexts to help detect anomalies. A brief guideline on how to set up the parameters of this framework is also provided, based on the experimental results. Future work includes: 1) continuing to improve the recall rate without sacrificing the accompanying precision; and 2) performing a clustering analysis of the intra-net accounts to see whether different people behave as groups.

REFERENCES

[1] "Trend Micro white paper on advanced persistent threat (APT)," Trend Micro Inc., Tech. Rep., 2013.
[2] H.-K. Pao, C.-H. Mao, H.-M. Lee, C.-D. Chen, and C. Faloutsos, "An intrinsic graphical signature based on alert correlation analysis for intrusion detection," in Technologies and Applications of Artificial Intelligence (TAAI), 2010 International Conference on. IEEE, 2010, pp. 102-109.
[3] "Directory System Agent," Microsoft, MSDN Library, Tech. Rep., 2014, [Online; accessed: 6-May-2014]. [Online]. Available: https://msdn.microsoft.com/en-us/library/ms675902(v=vs.85).aspx
[4] M. E. Russinovich and D. A. Solomon, Microsoft Windows Internals: Microsoft Windows Server (TM) 2003, Windows XP, and Windows 2000 (Pro-Developer). Microsoft Press, 2004.
[5] "Active Directory collection: Active Directory on a Windows Server 2003 network," Microsoft, TechNet Library, Tech. Rep., 2015, [Online; accessed: 6-May-2015]. [Online]. Available: https://technet.microsoft.com/en-us/library/cc780036(WS.10).aspx
[6] "How the Kerberos version 5 authentication protocol works," Microsoft, TechNet Library, Tech. Rep., 2015, [Online; accessed: 6-May-2015]. [Online]. Available: https://technet.microsoft.com/en-us/library/cc772815(v=ws.10).aspx
[7] J. R. Norris, Markov Chains. Cambridge University Press, 1998, no. 2.
[8] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[9] M.-Y. Chen, A. Kundu, and J. Zhou, "Off-line handwritten word recognition using a hidden Markov model type stochastic network," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 5, pp. 481-496, 1994.
[10] A. D. Wilson and A. F. Bobick, "Parametric hidden Markov models for gesture recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 21, no. 9, pp. 884-900, 1999.
[11] "Markov model and hidden Markov model," 2015, [Online; accessed: 1-May-2015]. [Online]. Available: http://www.csie.ntnu.edu.tw/~u91029/HiddenMarkovModel.html