Design and Implementation of Continuous Authentication Mechanism Based
Research Article
Design and Implementation of Continuous Authentication
Mechanism Based on Multimodal Fusion Mechanism
Received 16 October 2020; Revised 8 July 2021; Accepted 3 September 2021; Published 15 October 2021
Copyright © 2021 Jianfeng Guan et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Most of the current authentication mechanisms adopt “one-time authentication,” which authenticates users only at initial access. Once users have been authenticated, they can access network services without further verification. In this case, after an illegal user completes authentication through identity forgery or a malicious user completes authentication by hijacking a legitimate user, his or her behaviour becomes uncontrollable and may result in unknown risks to the network. These kinds of insider attacks have been increasingly threatening many organizations and have boosted the emergence of zero trust architecture. In this paper, we propose a Multimodal Fusion-based Continuous Authentication (MFCA) scheme, which collects multidimensional behaviour characteristics of users during the online process, verifies their identities continuously, and locks out the users once abnormal behaviours are detected to protect data privacy and prevent the risk of potential attack. More specifically, MFCA integrates the behaviours of keystroke, mouse movement, and application usage and presents a multimodal fusion mechanism and trust model to effectively characterize user behaviours. To evaluate the performance of the MFCA, we designed and implemented the MFCA system, and the experimental results show that the MFCA can detect illegal users quickly and with high accuracy.
1. Introduction

With the vigorous development of 5G, IoT (Internet of Things), and AI (Artificial Intelligence), the Internet has penetrated into various traditional industries, which brings in greater data privacy disclosure and more serious information security risks due to the endogenous security issues of the Internet. As the first line of network defense, authentication mechanism becomes a crucial way to ensure information security [1]. The current authentication schemes can be classified into four kinds: (1) authentications based on passwords or PINs (Personal Identification Numbers), (2) portable smart card or token-based authentications, (3) biometric-based authentications, such as face, fingerprint, and iris recognition, which are also called hard biometric-based authentications, (4) behaviour-based authentications such as gait and keystroke, which are called soft biometrics [2, 3]. More specifically, the hard and soft biometric-based authentications overcome the problems that password authentication is long and hard to remember and the problems related to smart card, which is easy to be stolen. On the contrary, biometric-based authentications do not require the authentication entity to be carried along at all times, which is inconvenient and also easy to be lost. Biometric authentications are based on human physiological behaviour characteristics, and have the advantages of natural nonreplication, which greatly improves user experience and reduces the risk of privacy disclosure [4]. However, physiological feature recognition generally relies on specific feature recognition devices such as face recognition device and fingerprint collector, which depend on expensive equipment and even poses the risk of forgery when they lack effective supervision. Besides, due to the limitations of user devices, computation and storage of
2037, 2021, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2021/6669429, Wiley Online Library on [28/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
2 Security and Communication Networks
authentication procedure are generally offloaded to edge or remote cloud, which may increase the attack surface and security risks.

On the other hand, the first three kinds of authentications belong to one-time authentication from the perspective of the identification mode, which only verifies the users’ identity when the devices are unlocked for the first time. Once the users have passed the authentication, which is just like getting the device’s pass card, they can use the system resources continuously without receiving the verification again [5]. For example, when a legitimate user temporarily leaves for tea or has a short conversation with others, the device that is not immediately locked may be at risk of being used to steal information by an adversary. These kinds of attacks can be classified as internal attacks, which are difficult to defend against. The recent report from Cybersecurity Insiders [6] shows that 68% of organizations feel moderately to extremely vulnerable to insider attacks. To prevent insider attacks, the concept of “Zero Trust” has been proposed, which follows the principle of “Never Trust, Always Verify” [7]. Authentication, especially continuous authentication, plays an important role in zero trust architecture.

This paper designs and implements a continuous authentication system that continuously monitors the user’s operations after the device is unlocked. Once the system finds that user identity is abnormal, the device will be automatically locked to prevent the risk of “one-time authentication” and guarantee the user information security. The main contributions of this paper are as follows:

(1) We propose a multimodal fusion mechanism for multidimensional behaviour characteristics. Considering that the single mode recognition is not enough to effectively depict the user’s behaviours, we design a multimodal fusion mechanism based on multidimensional features, and construct a trust model for continuous identity authentication, to improve the authentication accuracy and recognition rate.

(2) We select three users’ behaviours to realize the multimodal identification, which include keystroke behaviours, mouse movement, and application usage characteristics. The keystroke and mouse movement are time-sensitive and convenient, and the application usage based on logs is more stable and efficient. More importantly, none of them relies on additional hardware devices, and therefore they have the advantages of being low cost and user friendly.

(3) We design a Multimodal Fusion-based Continuous Authentication (MFCA) system based on multimodal fusion mechanism. The MFCA system mainly consists of three parts: First, the multidimensional behaviour models (keystroke model, mouse model, and application model) are obtained by training on multidimensional behaviours data; second, the multidimensional model classification results are fused; third, the multidimensional behaviour models are evaluated based on the trust model algorithm, and the real identity of users is evaluated based on the trust evaluation.

The structure of this paper is organized as follows. In Section 2, the related work is discussed. Section 3 describes Multimodal Fusion mechanism and recognition models used in the MFCA. The design and procedure of the MFCA system are discussed in Section 4. The performance of the proposed MFCA system is analysed comprehensively in Section 5. Finally, Section 6 concludes the work of this paper.

2. Related Work

The related researches in terms of authentications based on keystroke, mouse movement, or swipe and application usage are shown in Figure 1.

Keyboard and mouse, as the most commonly used input devices, have their own advantages to depict users’ behaviour characteristics. Keyboard dominates text input while mouse is more commonly used in GUI. The identification based on single input device will affect the immediacy and accuracy of user identity verification. Moreover, as the inherent hardware equipment, the keyboard and mouse are transparent to users during identity verification, which can avoid targeted destruction or forgery by identity counterfeiters in advance. When users interact with the operating system using the keyboard and mouse, they will trigger the iterative update procedure of the application state. Different users’ preferences for application reflect users’ using habits, which are irreplaceable and can be used as an important way for identity verification.

Keystroke dynamics refers to the physiological neural control mechanism of humans, which reflects the unique characteristics by analysing users’ habits, patterns, or rhythms through the keystroke such as different time intervals between keystrokes and keystroke strength. As early as the 1980s, research studies [8–10] had proved the utility of keystrokes in terms of identity verification.

Mouse dynamics refers to the track and the click of the mouse during user interaction with the system, and the most commonly used features include mouse keystroke speed, habits, frequency, and direction of the mouse moving distance. Everitt and Mcowan [11] found and proved the feasibility of knowing a user’s identity by analysing the user’s mouse operation habit and behaviour characteristics. With the development of computer GUI, the mouse has superseded the keyboard and become the dominant I/O device [12].

User application usage refers to the characteristic of terminal device in terms of resources scheduling, consuming, and even interacting with other equipment, which can be obtained through the system interface or process information since they are independent of special hardware. More recently, the behaviour-based continuous authentication technologies have been widely active in many fields. For example, user identity recognition can be based on smartphone applications [13], the information from sensors such as gyroscope and magnetometer [14], the users’ arm movement records on smart watch [15], and gait recognition
Figure 1: The related research of keystroke, mouse/swipe, and App usage. The different shapes represent the different authentication methods.
based on wristband [16]. While in traditional PC, the most commonly used biometric authentication is still based on keystroke and mouse [17].

2.1. Keystroke-Based Authentication Schemes. As early as 1975, Spillane [8] discussed the feasibility of keystroke in user identification, and this suggestion was also verified by Forsen and Gaines in 1977 [9] and 1980 [10], respectively. Forsen et al. [9] realized access control by analysing the keystroke characteristics of users when they input names that are similar to human signatures, while Gaines et al. [10] recorded the keystroke interval when the typist input the specified text, and analysed the time probability of consecutive typing characters, and verified the uniqueness of individual keystroke characteristics. This is the starting point of keystroke recognition, and belongs to static authentication technology based on fixed text. Fixed text refers to the predefined text or phrases to register a user, and it requires the user to type exactly the same text to perform identity verification with the objective of reducing uncertainty by controlling variables and observing the performance of a single keystroke feature in identification. This kind of experiment is generally applicable to scientific researches [18, 19], which has great limitations and is currently only applicable to the identity verification of fixed user names and passwords. Therefore, it is also called static authentication.

On the contrary, in continuous authentication scenarios, users are free to type texts that are not limited by the predefined contents. Free-text-based keystroke recognition is more difficult than fixed one in terms of data preprocessing, feature selection, and keystroke authentication [20]. Dowland et al. [21] proposed a statistical method for continuous certification in 2001, whose accuracy was less than 60%. However, after nearly ten years of development, Shimpshon et al. [22] proposed a clustering method based on graph in 2010 which added a similar continuous keystroke to form a fixed length of the session, and the experimental results in 21 real users and 165 counterfeiters showed that it has a False Accept Rate (FAR) of 3.47% just by using 250 keystrokes. More specifically, Rybnik et al. [23] explored keystrokes in different lengths of nonfixed text in 2013, providing a reliable basis for the authentication of free text. After that, Song et al. [24] constructed a Gaussian model for the user’s recent input characters sequence based on the Gaussian probability density function in 2016, which shortened the authentication cycle and reached FAR of 5.3% under 30 characters. Huang et al. [25] updated the keystroke samples using the sliding window method and achieved FAR of 1% and False Reject Rate (FRR) of 11.5% in a 1-minute sliding window in 2017. Furthermore, they evaluated the ability of the proposed algorithm to resist short quick insider attacks and detected insider attacks that lasted 2.5 minutes or longer with a probability of 98.4%. More recently, Ayotte et al. [26] proposed an instance-based graph comparison algorithm, which achieved an EER (Equal Error Rate) of 7.9%, 5.7%, 3.4%, and 2.7%, respectively, under the samples of 50, 100, 200, and 500 keystrokes, realizing faster and more accurate free-text keystroke identification.

2.2. Mouse-Movements-Based Authentication Schemes. Mouse dynamics has also received attention in terms of authentication. In 2003, Everitt and Mcowan [11] proved the feasibility of mouse behaviour characteristics in user identity authentication for the first time, which set a solid foundation for the subsequent extensive researches in the academic community. In 2004, Pusara and Brodley [27]
used supervised learning to model the mouse movement behaviour of 11 users and obtained FAR of 0.43% and FRR of 1.75%. However, due to the small number of user samples and single mouse features, the authors also pointed out that the analysis of that study was not enough to achieve independent user identity authentication. In 2009, Aksari and Artuner [28] used mouse movement characteristics for active identity authentication, and obtained FAR of 5.9% and FRR of 5.9% by analysing the mouse trajectory when the user clicked 10 squares in a row. In 2013, Sayed et al. [29] introduced mouse gestures into the user registration system and performed training through the neural network to realize identity verification when the user logins, and finally reached FAR of 5.26% and FRR of 4.59% within 26.9 s in a dataset of 39 users. The above researches are also called static mouse authentication, which mainly explores the diversity of mouse features and the wide application field of mouse recognition through specifying user behaviour or limiting mouse operation scope and trajectory.

On this basis, mouse movement-based continuous authentication schemes have attracted more and more attention. In 2012, Chao Shen et al. [30] evaluated 5550 mouse operation samples of 37 users, and obtained mouse features based on distance measurement and feature space transformation technology, which reached FRR of 8.74% and FAR of 7.69% within 11.8 s. Besides, they established the first public mouse-behaviour dataset, and their research results revealed the potential of mouse dynamics in user authentication. In the same year [31], the pattern-based growth method was used to mine frequent mouse behaviour fragments, and obtained more stable mouse features and reached FAR of 0.37% and FRR of 1.12%. In 2014, Medvet et al. [32] used mouse dynamics to provide continuous session authentication and nonintrusive authentication for web users, and achieved the accuracy of 97% for 24 users. Their work extended the potential scope of mouse dynamics as a continuous authentication tool to web applications hosted in the cloud rather than just in local devices. In 2018, Li et al. [33] used the random forest and sequential sampling analysis to analyse the angle-based mouse movement and wrist movement, and reached FAR of 1.46% on the dataset of 26 users, and the verification time could be determined within 9–12 mouse clicks. Their approach is more effective in timely authentication compared with methods based on the mouse geometry and locomotion features. In 2019, Yildirim and Anarim [34] verified on Balabit dataset [35] that mouse movement curves alone and session-based mouse identity could be used, achieving Area Under Curve (AUC) of 93% and EER of 13%.

2.3. Application-Usage-Based Authentication Schemes. Different from keyboard and mouse-based authentication schemes, application-usage-based authentication is not a biometric technology but a behavioural analysis based on user activity records with the objectives to mine user activity records, extract user multi-attribute behaviour characteristics, and then build user behaviour model to represent user identity and complete continuous authentication. Lots of current research efforts have proved the uniqueness of user behaviour, and thus derived a lot of user behaviour analytical methods based on big data.

In 2017, Liu [36] extracted URL characteristics and identified user’s consumption level by taking users’ surfing time as the sample. In 2018, Mahbub et al. [13] built a Markov model based on the complete application data of users, carried out continuous authentication by evaluating the changes of hidden Markov model (HMMs), and finally realized the capture of abnormal users within 2.5 minutes on the experimental dataset. They solved the active authentication problem by using application usage formulaically and systematically. Furthermore, they suggested that unknown application and unforeseen events had more important impacts on the authentication performance than the most common ones. In 2018, Meng et al. [37] conducted user authentication based on the touch gestures of Android mobile phone browser, achieving an average EER of 2.4% among 48 participants, and their system can reduce the touch behavioural deviation than others. In 2019, Wei [38] analysed the DNS logs of campus network by categorizing the domain names to obtain users’ online behaviour habits and access preferences, and summarized the characteristics of students’ online behaviours. In terms of user analysis based on application records, it can be divided into single application based, top n applications based, and all applications based behaviour identification by considering potential unique user behaviour pattern when using applications.

Besides, Alzubaidi et al. [39] presented an active authentication based on the smartphone usage data under different machine learning models, and achieved a lower EER of 8.2% for authenticating users within short periods of time with a small number of features on the MIT dataset [40]. Their scheme was effective in reducing the classification error rate compared with other authentication methods. For mobile devices, they are easy to deploy authentication based on the App using record, phone usage record, and even web browser history. However, due to the different operating systems, it is difficult to achieve the intersystem authentication.

2.4. Multimodal-Fusion-Based Authentication Schemes. With the development of various authentication technologies, some researchers try to combine different authentication technologies to increase the accuracy and timeliness. Although no single authentication technology is perfect, it is very difficult to fool multiple authentication methods at the same time. Therefore, multimodal fusion authentication can overcome the problems of partial feature loss. The recent work from Modak and Jha [41] summarized the multibiometric fusion strategy and its different applications in terms of multi-modal, multi-algorithm, multi-sample, multi-sensor, and multi-instance, and proved the performance upgrade of combining two or more individual biometric traits.
As early as in 2012, Traore et al. [42] proposed an online authentication system under web environment, which combined dynamic mouse and keystroke features in a multimodal framework to conduct real-time monitoring of 24 user operations, and the final system EER was 8.21%. However, their results had a low Average Number of Genuine Actions (ANGA) value which made the system not practical for real users. In 2014, Bailey et al. [43] proposed a user authentication system based on multimodal behavioural biometrics by fusing user data from keyboard, mouse, and GUI interactions, and adopted ensemble classification method to get FAR of 2.1% and FRR of 2.24% over the dataset with 31 users, which supports the idea that multimodal fusion gives better results. In 2015, Fridman et al. [44] presented a multimodal fusion for continuous authentication by collecting the behavioural biometrics of keystroke dynamics, mouse movement, and a high-level modality of stylometry, and developed a sensor for each modality and organized these sensors as parallel binary decision fusion architecture. Their experimental results based on database of 67 users who work individually for a week show that FRR and FAR are less than 1% within 30 s. In 2016, Mondal and Bours [45] proposed a continuous identity authentication for PC users by combining keystroke and mouse dynamics, and the recognition rate reached 62.6% and 58.9% in closed and open environments, respectively. The average operation times were 471 and 333, respectively. Besides, they first introduced the issue of Continuous Identification (CI) and discussed the concept of Continuous Authentication and Identification that provided the combination of security and forensics. In the same year, Beserra et al. [46] applied the dynamic identity recognition application by combining keyboard and mouse for the first time in online games, and carried out real-time identification of player operations to realize anti-cheating function. In 2018, Sergio et al. [47] established a user emotion model based on the interaction data of the keyboard and mouse in the learning scenario, so as to predict the affective state of the learner. In 2019, Quintal et al. [48] analysed the mobile user continuous authentications in IoT, and classified these authentication factors into event capture types such as password, fingerprint, applications start and end, network connection and disconnection, continuous sequence of events, such as gestures, and derived behavioural features, such as application choice, and demonstrated that all factors are correlated with the actual user identity. Currently, lots of multimodal continuous authentications are proposed in smartphone, IoT [49–51].

The key points of multimodal fusion continuous authentication are the association, unified representation, and coordination of multimodal information, and the main issues are: (1) multimodal characteristic expression, that is, how to design single-modal characteristics under the framework of multimodal architecture; (2) how to unify the model of multimodal characteristics. In our preliminary work, we have studied continuous authentication based on users’ keystroke and mouse behaviour [19], and developed a prototype system. Among them, a static authentication algorithm based on convolutional neural network is proposed for user keystroke behaviour. The average accuracy on CMU dataset is 96.8%, the average FAR is 0.04%, and the average FRR is 6.5%. At the same time, a continuous authentication algorithm based on the weighted reward and punishment mechanism was proposed. When the effective double key pairs of each user are 100, the EER is 8.5% and the AUC is 93.94%.

For this purpose, this paper designs a multimodal fusion continuous authentication mechanism based on users’ multidimensional behaviour characteristics in terms of keystroke, mouse, and application usage to effectively prevent the illegal user identity phishing, protect data privacy, improve authentication efficiency, and ensure the safety of user information.

3. Multimodal-Fusion-Based Continuous Authentication

The MFCA system consists of multimodal fusion mechanism, trust model, and multidimensional behaviour recognition models. In this section, we will introduce the multimodal fusion mechanism and three recognition models that are used in the MFCA.

3.1. Multimodal Fusion Mechanism. The multidimensional behaviours of network users mainly include keystrokes, mouse, screen swipe, and application usage. This paper designs the multimodal fusion mechanism to collect user behaviour data, combines these multidimensional features effectively, fuses the multiple classifiers to avoid the limitation of the single classification and improve the classification accuracy and generalization capability, and finally realizes the continuous authentication.

3.1.1. Multi Classifier Fusion Mechanism. Considering the diversity, complexity, and fusibility of the features, this paper adopts the Multi-Classifier Fusion (MCF) mechanism to improve the accuracy and generalization capability of the final classification results by integrating the output classification results of base classifiers. At the same time, MCF can simplify classifier design, balance classification time and performance, and improve time and space efficiency. The typical structure of the MCF includes cascade combination, parallel combination, and mixed combination. Parallel combination does not have the error accumulation problem of cascade combination. Furthermore, the system can achieve the best performance of real-time classification by designing an appropriate decision process. So, this paper adopts parallel combination to perform parallel processing on the user’s multidimensional behaviours including keystroke dynamics, mouse movement, and application usage data.

The results of the MCF algorithm depend on the output type of the base classifier. When the base classifier output is an interval value or probability value, we can adopt the mean value method (simple average or weighted average), maximum-minimum value method, product method, etc. When the output is a predefined class label, we adopt the voting method such as weighted voting, supermajority voting, or relative majority voting.
Trust score
erating user depends on the deviation between the user’s
behaviour characteristics and the expected characteristics of Initial score
the model over a period of time. *e system predefines trust
illegitimate user
score and trust threshold at the outset, and then increases or
decreases the trust score along with their operations. When Trust threshold
the user’s behaviour characteristics conform to the model,
the trust score will increase (no more than maximum score). Time
Otherwise, the trust score will decrease. Once the score falls
below the predefined trust threshold, an exception alarm will Figure 2: Schematic diagram of trust score fluctuation curve.
be triggered.
Figure 2 shows the schematic diagram of the fluctuating Table 1: *e parameters of trust model.
trust score. Over a period of continuous operations, the
legitimate user behavioural characteristic is the most Parameters Meaning Value
trusted attribute even though it may not be stable most of th {1, 0}
*e classification result of i
the time. *e corresponding trust score will be slightly up Fi Legal � 1,
feature
and down in a certain period of time, but it is always higher illegal � 0
than the trust threshold. *e legitimate users will almost *e trust score of ith
Ti [Tmin, Tmax]
authentication
imperceptibly perceive the authentication system in order
Tmin *e minimum trust score Tmin
to ensure transparency. On the contrary, the abnormal Tmax *e maximum trust score Tmax
operations of the illegal user will inevitably lead to the Talert *e alert threshold [Tmin, Tmax]
continuous decline of the trust score, which will eventually *e punishment weight of
make the trust score lower than the trust threshold and Wi [0–1.00]
feature i
R    The maximum reward score        [0, Tmax]
P    The maximum punishment score    [0, Tmax]

trigger the abnormal alarm. Therefore, without relaxing the timely detection of illegal users, the design of the trust model increases the tolerance of legitimate users' misoperations to improve the accuracy and user-friendliness of the authentication system.

Table 1 shows the related parameters used in the trust model. Each user has an initial trust score of T0. The model verifies the current user's identity status in real time according to the user's behaviour characteristic Fi. When the user's identity is judged to be legitimate, the trust model gives rewards to increase the trust score up to the highest threshold Tmax. Otherwise, it reduces the trust score down to the minimum threshold Tmin. When the score is lower than the trust threshold Talert, the system alarm is triggered to lock the device. Each increase or decrease of the trust score is limited by the maximum reward score R and the maximum punishment score P, and its size depends on the reward and punishment weight Wi of the current characteristic.

According to the above definitions, we can deduce equations (1) and (2), from which the trust score Ti is obtained after the initial authentication.

$$\Delta T(F_i) = \begin{cases} F_i \cdot W_i \cdot R, & F_i = 1, \\ (F_i - 1) \cdot W_i \cdot P, & F_i = 0, \end{cases} \tag{1}$$

$$T_i = \min\bigl(\max\bigl(T_{i-1} + \Delta T(F_i),\; T_{\min}\bigr),\; T_{\max}\bigr). \tag{2}$$

Different behavioural operations generate different characteristics, but user behaviours follow certain patterns. Therefore, a characteristic with high frequency is considered more stable and identifiable. In contrast, characteristics with low frequency generally have lower credibility in the trust model. Take mouse click events as an example; they are divided into left click, left double click, right click, and right double click. When the occurrence probability of left-click events is much greater than that of right-click events, the left-click behaviour is more stable, and its reward and punishment weight is also higher. For example, suppose left click occurs 67 times, left double click occurs 20 times, right click occurs 10 times, and right double click occurs 3 times. When the user is judged as a legitimate user on the left-click feature, the trust score is rewarded with 67R/100. Otherwise, when the user is judged as an illegal user, the trust score is punished with 67P/100. Therefore, the trust score value is mainly affected by two factors in the weight design: the weight ratio of the characteristic model in model fusion and the frequency ratio of the feature in the feature set.
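The update rule in equations (1) and (2), together with the frequency-based weight from the click example, can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `TrustModel`, the helper `frequency_weights`, and every numeric default (T0, Tmin, Tmax, Talert, R, P) are assumptions chosen for the example and are not values reported in this paper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TrustModel:
    """Trust score update following equations (1) and (2). All defaults are assumed."""
    t_min: float = 0.0      # Tmin (assumed value)
    t_max: float = 100.0    # Tmax (assumed value)
    t_alert: float = 40.0   # Talert (assumed value)
    reward: float = 10.0    # R, maximum reward score (assumed value)
    punish: float = 20.0    # P, maximum punishment score (assumed value)
    score: float = 60.0     # T0, initial trust score (assumed value)

    def delta(self, decision: int, weight: float) -> float:
        """Equation (1): signed change for a classifier decision Fi in {0, 1}."""
        if decision == 1:
            return decision * weight * self.reward
        return (decision - 1) * weight * self.punish

    def update(self, decision: int, weight: float) -> float:
        """Equation (2): apply the change and clamp the score to [Tmin, Tmax]."""
        raw = self.score + self.delta(decision, weight)
        self.score = min(max(raw, self.t_min), self.t_max)
        return self.score

    @property
    def alert(self) -> bool:
        """True when the score has dropped below the alarm threshold Talert."""
        return self.score < self.t_alert


def frequency_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Weight of each feature as its share of all observed events."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Click counts taken from the example in the text.
clicks = {"left": 67, "left_double": 20, "right": 10, "right_double": 3}
weights = frequency_weights(clicks)

model = TrustModel()
model.update(1, weights["left"])  # legitimate left click: + 0.67 * R
model.update(0, weights["left"])  # rejected left click:   - 0.67 * P
```

Because the clamp is applied on every update, a long run of legitimate behaviour cannot push the score above Tmax, so a later run of rejected events still reaches Talert after a bounded number of steps.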
The basic component of the trust model is the user's single behaviour characteristic, and the reward and punishment range of the trust score depends on the reward and punishment weight of the given characteristic. A specific weight is introduced because the system involves three types of classification models.

3.2. Keystroke Recognition Model

3.2.1. Keystroke Dataset Capture Module. In this section, we give the keystroke capture procedure of Windows as an example. The user interacts with the computer through the keyboard to finish the input, so keystroke data must be captured as global events. Therefore, we adopt the keyboard
2037, 2021, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2021/6669429, Wiley Online Library on [28/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Security and Communication Networks 7
hook to collect the keystroke data and encapsulate the hook in a Dynamic Link Library (DLL) to ensure the automatic loading and real-time collection of keystroke data. The implementation of the keyboard hook is divided into three parts: installing the keyboard hook, monitoring and processing keyboard messages, and uninstalling the keyboard hook. Figure 3 shows the procedure of keystroke data capture.

Start
1. SetWindowsHookEx(): install the keyboard hook WH_KEYBOARD_LL.
2. Call KeyboardProc(): capture and process keyboard information.
   - Record the state of the Shift key and Caps Lock key for transcoding.
   - Record the click event and timestamp to distinguish WM_KEYUP and WM_KEYDOWN.
3. CallNextHookEx(): call the next hook in the chain.
4. UnhookWindowsHookEx(): uninstall the keyboard hook.
End
Figure 3: The procedure of keystroke capture.

First, the keystroke capture module adds the keyboard hook to the hook chain and binds the keystroke event to the keyboard hook via the SetWindowsHookEx() function, which takes four parameters. The first parameter, idHook, represents the installed hook type, which has two kinds. This module selects the global keyboard hook WH_KEYBOARD_LL, which exposes keyboard information such as the virtual-key code vkCode and the keystroke state (WM_KEYUP or WM_KEYDOWN). The second parameter, lpfn, points to the hook procedure that further processes the hooked message, also called the callback function. In this module, we name this function KeyboardProc. The third parameter, hMod, is the current instance handle, also known as the DLL module handle. The fourth parameter, dwThreadId, is the identifier of the thread associated with the hook procedure.
Second, the KeyboardProc function is used to monitor keyboard messages, and Table 2 shows the related field information to be collected. When a user presses a key, the keyboard hook captures the event and begins to record the keystroke value, keystroke timestamp, keystroke event type, and so on.

Table 2: Keystroke data information.
Field name   Type   Description
keyCode      Int    Keystroke key code
keyValue     Char   Keystroke key values
keyEvent     Int    Event type
keyStamp     Long   Keystroke timestamp
isShiftOn    Bool   Shift key on/off
isCapsOn     Bool   CapsLock key on/off

As for the conversion of key values and codes, the commonly used ASCII codes distinguish the lowercase letters "a-z" from the uppercase letters "A-Z", with 65–90 representing uppercase letters and 97–122 representing lowercase letters. The virtual-key code vkCode, on the other hand, identifies the keyboard key without distinguishing case, and records only the A-Z key value 65–90. Therefore, when collecting records, the system needs to further determine whether the Shift key is being pressed through GetAsyncKeyState(), and obtain the state of the CapsLock key through GetKeyState(). When either of them is active, the letter key is recorded as uppercase, and otherwise as lowercase.

After that, the specific type of keystroke event is obtained through wParam. The system intercepts WM_KEYDOWN (key pressed) and WM_KEYUP (key released), and records the keystroke timestamp through the GetLocalTime function, which is accurate to milliseconds. Finally, the CallNextHookEx() function is used to pass the message to the next hook in the chain, and the keyboard hook is uninstalled once the data collection is completed.

3.2.2. Keystroke Data Preprocessing and Feature Selection. The original keystroke record obtained through data acquisition is a combination of key code, key value, event type, and timestamp, such as 87, W, WM_KEYDOWN, and 59108278, respectively. Because of the different event types, one keystroke on a given key produces two records, one for press and one for release. Therefore, the two records are first merged into a single record of key code, key value, press timestamp, and release timestamp, and records with missing values are then deleted. Finally, the raw data are transformed into feature data.

In a free-text environment, a user's keystrokes are affected by language, profession, and even emotional stress. Therefore, the behaviour habit is random and diverse. On the one hand, the keyboard layout is complicated, and typical QWERTY keyboards come with 87, 104, or 109 keys. On the other hand, the use of function keys is sporadic, and their characteristics need long-term observation, so they are not timely enough for continuous authentication. Our system therefore extracts the characteristics of the characters the user inputs. The 26 character keys form different character sequences that are shaped by language grammar and common words, and the typical character combinations are widely shared across users. When different users hit the same characters, they show different time characteristics and keystroke frequencies. Besides, the length of the character sequence determines the order and the magnitude of the combined sequence and
the space-time complexity of feature processing. Therefore, after weighing these factors, we select double key combinations, that is, character sequences of length 2, as the feature samples of the keystroke recognition model. To select the most common double key pairs, we conducted a statistical experiment on the frequency of double key combinations and recorded the frequency of the top 20 double key combinations among hundreds of thousands of valid character keystrokes [19], as shown in Table 3. Finally, the top 7 double keys are selected as the double key feature samples.

Table 3: The statistical table of double keys.
No (#)   Double keys   Frequency      No (#)   Double keys   Frequency
1        AN            9619           11       WO            32
2        NG            7580           12       AO            3220
3        IN            7338           13       NA            3015
4        SH            6605           14       EI            2900
5        EN            6049           15       HE            2653
6        IA            5932           16       HS            2622
7        CH            4926           17       XI            2554
8        ZH            4145           18       ON            2466
9        AI            3869           19       HI            2337
10       JI            3798           20       IE            1906

For the seven double key characteristics ("AN," "NG," "IN," "SH," "EN," "IA," and "CH"), three types of time characteristics are extracted uniformly: Hold time, Down-Down time, and DownUp time. As shown in Figure 4, taking the double keys "WO" as an example, its characteristics are described as follows:

[Figure 4: timing diagram for the double keys W and O, showing Hold[W], Hold[O], DD[W][O], and UD[W][O].]

(1) Hold [W]: the duration of key "W" from press to release; likewise Hold [O];
(2) DD [W] [O]: the interval between pressing "W" (down) and pressing "O" (down).
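The preprocessing and timing features described above (merging press and release records, dropping incomplete ones, then measuring Hold and Down-Down times for the selected double keys) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout `RawEvent`, the helper names, and the treatment of auto-repeated key-down messages are illustrative choices, and the `ud` value (release of the first key to press of the second) is an assumed reading of the UD label in Figure 4.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

# The seven double keys selected from Table 3.
TARGET_PAIRS: Set[str] = {"AN", "NG", "IN", "SH", "EN", "IA", "CH"}


class RawEvent(NamedTuple):
    key: str    # key value, e.g. "W"
    event: str  # "WM_KEYDOWN" or "WM_KEYUP"
    stamp: int  # timestamp in milliseconds


class KeyPress(NamedTuple):
    key: str
    down: int  # press timestamp
    up: int    # release timestamp


def merge_events(events: List[RawEvent]) -> List[KeyPress]:
    """Pair each key-down with the next key-up of the same key.

    Records that never find a partner (missing values) are dropped.
    """
    pending: Dict[str, int] = {}
    presses: List[KeyPress] = []
    for ev in events:
        if ev.event == "WM_KEYDOWN":
            # Keep the first down only; later downs are assumed to be auto-repeat.
            pending.setdefault(ev.key, ev.stamp)
        elif ev.event == "WM_KEYUP" and ev.key in pending:
            presses.append(KeyPress(ev.key, pending.pop(ev.key), ev.stamp))
    return sorted(presses, key=lambda p: p.down)


def digraph_features(
    presses: List[KeyPress], targets: Set[str] = TARGET_PAIRS
) -> List[Tuple[str, Dict[str, int]]]:
    """Timing features for every consecutive key pair found in `targets`."""
    features = []
    for first, second in zip(presses, presses[1:]):
        pair = first.key + second.key
        if pair in targets:
            features.append((pair, {
                "hold_1": first.up - first.down,    # Hold of the first key
                "hold_2": second.up - second.down,  # Hold of the second key
                "dd": second.down - first.down,     # Down-Down interval
                "ud": second.down - first.up,       # assumed: Up-Down interval
            }))
    return features


events = [
    RawEvent("A", "WM_KEYDOWN", 0), RawEvent("A", "WM_KEYUP", 80),
    RawEvent("N", "WM_KEYDOWN", 120), RawEvent("N", "WM_KEYUP", 190),
    RawEvent("G", "WM_KEYDOWN", 260),  # no release recorded, so it is dropped
]
sample = digraph_features(merge_events(events))
```

Note that `ud` becomes negative when the second key is pressed before the first is released, which is common for fast typists and may itself be a useful signal.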
3.2.3. Train and Test of the Keystroke Model. For the double key characteristics in the keystroke process, the system adopts the decision tree algorithm for model training, as shown in Algorithm 1. First, Shannon entropy and information gain are selected as the criteria for feature selection of the decision tree. Second, the 7 double key features are evaluated one by one to obtain the current information gain, so as to constantly update the maximum information gain and the best feature. After that, the current subtree is created according to the best feature, and the current best feature is removed before each recursive call until the entire tree is created. Finally, after the entire decision tree is built, the decision tree generated by training is returned. In addition, we adopt Pessimistic Error Pruning (PEP), with the penalty factor set to 0.5, to prevent overfitting.

The decision tree is adopted because, once the training is completed, classifying existing features is very fast in the testing stage. During the user's continuous typing in the authentication stage, keystroke data within a short period will contain the 7 predefined double key characteristics with high probability. During this time, user identity determination is completed quickly, and the authentication result is calculated through the decision tree model immediately.

3.3. Mouse Movement Recognition Model

3.3.1. Mouse Movement Data Capture. Similar to keystroke data collection, mouse movement data capture also applies hook technology, using the global mouse hook WH_MOUSE_LL among the system hooks. The overall capture process includes the establishment of the mouse hook, the interception and processing of mouse messages, and the uninstallation of the mouse hook. The establishment and uninstallation of the mouse hook are similar to those of the keystroke hook. As for the monitoring and processing of mouse messages, the module defines its own mouse data, shown in Table 4, which differ from the keystroke data. When the user manipulates the mouse and triggers a mouse event, the mouse hook captures the message, triggers the callback function MouseProc, and starts to record the mouse event type, mouse cursor coordinates (x, y), and event occurrence timestamp.

3.3.2. Mouse Movement Data Preprocessing. The original captured mouse data format is mouse event type, X coordinate, Y coordinate, and timestamp, which is relatively simple. However, mouse events are inherently complex and can be divided into four main types: mouse idle, mouse move, mouse drag, and mouse click. Among them, mouse clicks can be further divided into left click, right click, left double click, and right double click. Besides, the click events can be further divided into press (down) and release (up). Therefore, it is important to preprocess the mouse data, transform the scattered data records into effective mouse events, and further divide them into mouse features that can be used for identity authentication.

As shown in Figure 5, the mouse data preprocessing procedure is as follows.

Step 1: mouse click events are divided into left mouse click, left mouse double click, right mouse click, and right mouse double click. The mouse hook further
Input:
    The keystroke dataset matrix X;
    The keystroke feature vector F := F1, F2, ..., Fn;
Output:
    The decision tree model, Tree;
initialize: preprocess and split, data = process(X, F)
initialize: bestInfoGain = 0.0, bestFeature = -1
calculate Shannon entropy: baseEntropy = calculateEntropy(data)
for curFeature = 1 to n do
    calculate newEntropy and curInfoGain = baseEntropy - newEntropy
    if curInfoGain > bestInfoGain then
        bestInfoGain = curInfoGain
        bestFeature = curFeature
    end if
end loop
for each value in data[bestFeature] do
    Tree[bestFeature][value] = createTree(rows of X with bestFeature = value, F - bestFeature)
end loop
return Tree
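The feature-selection step of the algorithm above (Shannon entropy, information gain, and choice of the best feature) can be illustrated with a short Python sketch. It assumes that the timing features have already been discretized into categorical bins such as "short" and "long"; that binning, the function names, and the toy data are assumptions made for the example, and pruning is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple


def shannon_entropy(labels: Sequence[Hashable]) -> float:
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows: Sequence[Sequence[Hashable]],
                     labels: Sequence[Hashable], feature: int) -> float:
    """Entropy reduction obtained by splitting the samples on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - remainder


def best_feature(rows: Sequence[Sequence[Hashable]],
                 labels: Sequence[Hashable]) -> Tuple[int, float]:
    """Index and gain of the feature with the largest information gain."""
    gains: List[float] = [information_gain(rows, labels, f) for f in range(len(rows[0]))]
    best = max(range(len(gains)), key=gains.__getitem__)
    return best, gains[best]


# Toy data: (binned Hold time, binned Down-Down time); label 1 = legal, 0 = illegal.
rows = [("short", "fast"), ("short", "slow"), ("long", "fast"), ("long", "slow")]
labels = [1, 1, 0, 0]
index, gain = best_feature(rows, labels)
```

In this toy set the first feature separates the two classes perfectly, so it is selected as the root split, and the tree builder would then recurse on each of its values with that feature removed.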
Step 5: further simplify the mouse record according to the selected time slice T and session length X. When the number of valid mouse events in time slice T is less than X, this session is discarded without further feature extraction.

3.3.3. Selection of Mouse Features. In the mouse recognition model, the system extracts the mouse features according to the user's mouse behaviour for verification. The mouse features are complex and diverse, and the user identity can be effectively measured by using the features of time, position, frequency, and mouse trajectory. In order to ensure the timeliness and accuracy of the model in continuous authentication, the system chooses mouse movement, whose characteristics are more obvious within a short period.

When the mouse cursor moves from the point P1 (x1, y1) to P2 (x2, y2), it shows the following five characteristics during the movement:

(1) The proportion of mouse movement events in 8 different movement directions.

(2) The moving distance (Euclidean distance) of the mouse in 8 directions, including average moving distance and extreme moving distance. The calculation of the average moving distance is shown in Equation (3). The extreme moving distance is the maximum moving distance in a single time slice T.

$$d = \frac{1}{M}\sum_{0}^{M}\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}. \tag{3}$$

(3) The moving speed of the mouse in 8 directions, including average moving speed and extreme moving speed. The calculation of the average moving speed is shown in Equation (4). The extreme moving speed is the maximum moving speed in a single time slice T.

$$v = \frac{\Delta d}{\Delta t} = \frac{\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}}{t_2 - t_1}. \tag{4}$$

(4) The moving acceleration of the mouse in 8 directions, as shown in Equation (5).

$$a = \frac{\Delta v}{\Delta t} = \frac{v_2 - v_1}{t_2 - t_1}. \tag{5}$$

(5) The proportion of mouse movement events in all mouse operation events.

3.3.4. Training and Testing of the Mouse Model. Within a fixed time slice, the distribution of mouse behaviour events is not regular, so the mouse movement characteristics extracted by this system have a small sample size and do not conform to the normal distribution. Therefore, it is difficult to obtain a good recognition effect with a classification model based on statistics or neural networks. Many studies have shown that SVM performs well on small samples and in nonlinear, high-dimensional settings. After comprehensively considering the number of mouse features and samples, this paper chooses the linear Support Vector Machine (SVM) [52] as the classifier of the mouse recognition model, and uses the open source libsvm 3.0 [53] to build the classification model. Furthermore, the gamma parameter of the Gaussian kernel affects the width of the Gaussian function: the larger the gamma, the easier it is for the SVM to overfit. So our system sets gamma to 0.5.

Our system first uses Principal Component Analysis (PCA) to reduce the dimensionality of the collected multidimensional mouse motion features while retaining the correlation of each feature, which avoids the curse of dimensionality. After feature selection is completed by PCA, the original 45-dimension mouse features are reduced to 16-dimension features.

The mouse recognition model based on libsvm is divided into a training stage and a testing stage. The algorithm description is shown in Algorithm 2. Since feature dimension reduction is required at both stages, the training and testing of the mouse model are summarized in the same function description and distinguished by the option O.

X is the characteristic matrix of the mouse, which is used as the training set or the test set according to the value of O. L is the mouse label vector, whose size depends on the number of samples n. This system is a binary classification model [54], so the label takes values in {0, 1}, where the legal user is 1 and the illegal user is 0. The parameter op denotes the custom libsvm training parameters, including the kernel function and other values, which in this system are mainly selected by exhaustive search.

In the training stage, the system performs dimensionality reduction on the features of the training set, conducts training according to the libsvm options and the feature data set, and finally exports the mouse recognition model MM. In the testing stage, the system predicts according to the existing model MM and the test set data, exports the user identification result, which is legal or illegal, and calculates the classification accuracy ACC.

After that, according to the mouse data and the mouse recognition model, the session length X and time slice T were tested, in which X was 50, 100, and 200 valid mouse events. The experimental data set was user mouse operations collected within 48 hours, including about 15,000 effective mouse operation events.

Figure 6 shows the ROC for session length X = 100. FAR is negatively correlated with FRR, and the value at which FAR = FRR is the equal error rate, denoted ERR in this paper. When X is 50 or 200, the overall ROC trend is the same, but the error rate ERR and the average time slice T differ greatly. As shown in Table 5, the ERR decreases as the session length X increases. This is because the more valid mouse events a session contains, the more stable the mouse features displayed by the user. However, at the same time, the longer the session length is, the greater the corresponding average session time T will be, which will lead to a longer user behaviour detection time and therefore greatly affect the system's timely
Input:
    The mouse dataset matrix X;
    The option of train or test, O;
    The mouse label vector L := L1, L2, ..., Ln;
    The libsvm options, op;
    The number of principal components, c;
Output:
    The mouse model, MM;
    The predicted answer, Ans;
    The predicted accuracy, Acc;
initialize: preprocess, pca = PCA(c)
initialize: pca.fit(X)
reduce dimensions: X = pca.transform(X)
if O == 0 then
    MM = libsvmTrain(L, X, op)
    return MM
else if O == 1 and MM exists then
    [Ans, Acc] = libsvmPredict(L, X, MM, op)
    return [Ans, Acc]
else
    log illegal option
end if
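The feature matrix X consumed by the algorithm above is built from the movement features of Section 3.3.3. As a rough illustration of Equations (3) to (5), the sketch below computes per-direction proportion, distance, and speed, plus the accelerations between consecutive segments, from a list of cursor samples. Several details are assumptions made only for this example: the eight directions are taken as 45-degree sectors of atan2(dy, dx), timestamps are in seconds and non-decreasing, and acceleration is measured between the end times of consecutive movement segments.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (x, y, t): cursor position in pixels, time in seconds


def direction(dx: float, dy: float) -> int:
    """Assumed binning: eight 45-degree sectors of atan2(dy, dx), numbered 0..7."""
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return int(angle // (math.pi / 4)) % 8


def movement_features(samples: List[Sample]) -> Tuple[Dict[int, Dict[str, float]], List[float]]:
    """Per-direction distance and speed statistics, plus segment accelerations."""
    dists: Dict[int, List[float]] = defaultdict(list)
    speeds: Dict[int, List[float]] = defaultdict(list)
    timeline: List[Tuple[float, float]] = []  # (speed, end time) of each segment
    for (x1, y1, t1), (x2, y2, t2) in zip(samples, samples[1:]):
        dt = t2 - t1
        d = math.hypot(x2 - x1, y2 - y1)  # Euclidean distance, as in Equation (3)
        if dt <= 0 or d == 0:
            continue  # skip duplicate timestamps and idle samples
        sector = direction(x2 - x1, y2 - y1)
        dists[sector].append(d)
        speeds[sector].append(d / dt)     # Equation (4)
        timeline.append((d / dt, t2))
    total = sum(len(v) for v in dists.values())
    per_direction = {
        s: {
            "proportion": len(dists[s]) / total,
            "avg_dist": sum(dists[s]) / len(dists[s]),
            "max_dist": max(dists[s]),
            "avg_speed": sum(speeds[s]) / len(speeds[s]),
            "max_speed": max(speeds[s]),
        }
        for s in dists
    }
    # Equation (5): change in speed over the time between segment ends.
    accels = [(v2 - v1) / (t2 - t1) for (v1, t1), (v2, t2) in zip(timeline, timeline[1:])]
    return per_direction, accels


samples = [(0, 0, 0.0), (3, 4, 1.0), (3, 4, 1.0), (6, 8, 1.5)]
per_direction, accels = movement_features(samples)
```

A session-level feature vector could then be formed by concatenating these statistics over all eight sectors before dimensionality reduction; how the 45 dimensions of the paper are composed is not specified here, so that step is left out.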
sion, our system has made the balance among the above factors and selected the time slice length as 5 min and the session length as 100 effective mouse events.

[Figure 6: ROC curve graph for session length X = 100. Horizontal axis: False Acceptance Rate (FAR) (%).]

Table 5: Mouse session length and classification error rate.
Session length   Session duration (min)   ERR (%)
50               2.7                      17.93
100              5.3                      9.27
200              12.9                     7.11

3.4. Application Usage Recognition Model. Different from the behavioural biometric recognition based on keystroke and mouse, application feature recognition is based on the statistical analysis of the user's application records and mines the user's behaviour features. When the system is deployed, it analyses the user application records in the current time window in the form of a sliding window, extracts the features, verifies them against the standard model library, and completes the authentication. Compared to the behavioural feature models, the data acquisition cycle of the application recognition model is longer, but the number of a user's core applications is relatively fixed. So, the application features are more stable, which remedies the shortcoming of keystroke and mouse recognition in multimodal recognition, namely strong real-time performance but insufficient stability.
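The sliding-window analysis of application records can be sketched as follows. This is an illustrative sketch rather than the system's implementation: the record layout `ProcessRecord`, the function name `window_features`, and the three per-application statistics (launch count, running time inside the window, and share of the total running time) are assumptions loosely based on the statistics this model is described as using.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional, Set


class ProcessRecord(NamedTuple):
    name: str     # process name
    start: float  # creation time in seconds
    end: float    # destruction time in seconds


def window_features(
    records: Iterable[ProcessRecord],
    window_start: float,
    window_end: float,
    monitored: Optional[Set[str]] = None,
) -> Dict[str, Dict[str, float]]:
    """Per-application statistics for one time window.

    When `monitored` is None, every application is included by default.
    """
    launches: Dict[str, int] = defaultdict(int)
    runtime: Dict[str, float] = defaultdict(float)
    for rec in records:
        if monitored is not None and rec.name not in monitored:
            continue  # drop nontarget applications
        overlap = min(rec.end, window_end) - max(rec.start, window_start)
        if overlap <= 0:
            continue  # the process did not run inside this window
        runtime[rec.name] += overlap
        if window_start <= rec.start < window_end:
            launches[rec.name] += 1
    total = sum(runtime.values())
    return {
        name: {
            "launches": launches[name],
            "runtime": runtime[name],
            "share": runtime[name] / total,
        }
        for name in runtime
    }


records = [
    ProcessRecord("editor", 0, 100),
    ProcessRecord("browser", 50, 250),
    ProcessRecord("editor", 150, 200),
    ProcessRecord("game", 0, 300),
]
stats = window_features(records, 100, 200, monitored={"editor", "browser"})
```

Sliding the window forward by a fixed step and recomputing these statistics yields one feature vector per window, which is the kind of input a statistics-based classifier can consume.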
3.4.1. Application Data Collection. Application data collection mainly captures process information, and our system uses the Windows PSAPI library for this work. When the user starts the system, the current process data are initialized by dynamically loading the DLL.

First, EnumProcesses() enumerates the running processes, counts the total number of processes, and obtains detailed data of each process (time, process ID, process name, process path), as shown in Table 6.

Second, when recording application process data, our system adds process state, construct times, destroy times, and total running time fields according to the current process information, and initializes their values as 1, 1, 0, and t + ∆t. After that, our system cyclically monitors the process status, records process creation and destruction events, and updates the counts and the total running time of each process. The specific process collection information is shown in Table 7.

3.4.2. Application Data Preprocessing and Feature Selection. The application-based identity recognition model is a statistics-based classification model, so it does not involve multidimensional features and does not require complex dimensionality reduction and feature selection. After users log into the system, they are allowed to make basic system settings and manually select the list of applications to be
monitored. Therefore, our system only carries out statistical processing for the monitored applications defined by users. First, the nontarget application information in the collected dataset is eliminated, and then the event update frequency, running time, and total proportion of each target application are calculated. When the user does not define any monitored application, all application processes are handled by default.

3.4.3. Application Data Training and Test. For the training and prediction of the application recognition model, our system uses the Naive Bayes algorithm from the scikit-learn naive_bayes module. The idea is to conduct model training according to the existing user characteristics and classification results. After the training is completed, the probabilities of each feature belonging to the different categories are calculated in the testing stage to give the final classification result. Therefore, it is also known as a classification algorithm based on statistics, and the related processing procedure is as follows.

(1) Assume that X = {x1, x2, ..., xn} is a user to be classified, and each user contains n application features xi.
(2) The result of user identity classification is Y = {0, 1}, in which 0 means illegal user and 1 means legal user.
(3) Calculate the probability P(Yi | x) that x belongs to the classification result Yi, and take P(Y | x) = Max{P(Y1 | x), P(Y2 | x)}.

Algorithm 3 shows the training procedure. First, the data of the training set are normalized and transformed. Second, the Gaussian Bayesian algorithm in naive Bayes [55] is selected for model training. After the training is completed, the fitting process is carried out, and the recognition model PM is finally output. Algorithm 4 describes the testing procedure when applying the recognition model. In the testing stage, our system normalizes the test data, computes the classification result Ans according to the existing training model PM and the test data set X, and calculates the output confusion matrix Acc according to the predefined indicators.

4. MFCA System Design and Procedure

In this section, we introduce the design methods and procedure of the MFCA system. The MFCA system introduces multimodal fusion to analyse the collected multidimensional user behaviour characteristics, performs model training according to the characteristics, and generates the trust model to achieve continuous authentication.

The MFCA system mainly consists of three parts: first, the keystroke model, mouse model, and application model obtained from training based on keystroke data, mouse data, and application records, respectively; second, the multimodal fusion technique used to merge the classification results of the three models; third, the trust model algorithm used for continuous identity authentication.

Figure 7 shows the overall design of the MFCA system. It can be divided into a training stage and a testing stage. The training stage mainly completes the training and fusion of the multidimensional behaviour models and finally generates the trust model. In the testing stage, the authentication mechanism verifies the real-time behaviour characteristics of users through the trust model and exports the current trust score. When the trust score is lower than the predefined trust threshold, the current user is judged to be an illegal user, and the MFCA system locks the device, generates an alarm, and prevents the user from using the device. When the trust score is higher than the trust threshold, the MFCA system determines that the current user's identity is legitimate, and the user can continue without interference or any further processing.

The following parts describe the MFCA from three aspects: model training and testing, the multimodal fusion mechanism, and the trust algorithm design.

4.1. Model Training and Prediction. The multidimensional behaviours of network users mainly consist of keystrokes, mouse movement, and application usage. Our system designs the multimodal fusion mechanism to collect user behaviour data and combine them effectively, adopts multiple classifier fusion to avoid the limitation of a single classifier and to improve the accuracy and generalization capability of the classification results, and finally realizes continuous authentication.

Each multidimensional behaviour model (keystroke model, mouse model, and application model) goes through three stages: model establishment, training, and prediction, as shown in Figure 8. When a user registers for the first time, the system treats this user as legitimate by default and collects the data to establish the initial model. After that, the model is constantly updated and evolves as the multidimensional behaviour data increase. Therefore, the authentication system continuously collects users' multidimensional characteristic data, updates the model to fit the current user behaviour characteristics while verifying the identity, and improves the identity recognition accuracy.

In the training stage, once any of the models is updated, the authentication system triggers an iterative update of the trust model so that the model learns the characteristics of the current user and stays up to date. In the test phase, once a behaviour such as keystroke has enough data for feature extraction, the MFCA system passes these characteristics through the trust model to generate the corresponding result. The trust model converts the multiple results into the latest trust score based on their predefined proportions, compares it with the trust threshold to determine the legitimacy of the user identity, and decides whether or not to trigger an alarm.

4.2. Multimodal Fusion Mechanism. Multimodal fusion (known as multi-classifier fusion) is designed to effectively combine multidimensional features for decisions, avoids the
[Figure: train stage and test stage of the MFCA system. After login, the user's keystroke events, mouse events, and app usage pass through keystroke, mouse movement, and application usage feature extraction into the trust model, which outputs a trust score for the identity decision (legitimate or illegal), with a loop update.]

limitations of single classification, and improves the accuracy and generalization of classification results by fusing multiple models. When performing continuous authentication, the complementarities among multidimensional behaviours need to be considered. In our work, three types of data, namely, keystroke, mouse movement, and application record, are collected as they have natural complementarity when users interact with the
computer. The three kinds of models have different abilities to recognize users. Therefore, the MFCA can cover the usage habits of different users based on multiple classifier fusion to improve the accuracy.

4.3. Trust Model Algorithm. We take the behaviours of keystroke, mouse, and application as examples to describe the trust score algorithm, as shown in Algorithm 5. Here the multi-classifier fusion (MCF) adopts parallel combination, the outputs of the three base classifiers are all predefined binary values (illegal = 0 or legal = 1), and the exported results are labels rather than probability values. The weighted voting method is selected to complete the multi-classifier fusion.

Take the double key pair "an" in the keystroke identification model as an example. Its weight among the double key features is W^f = Count(an)/Count(keyFeature). When a user types "an" and is identified as a legitimate user, the model exports the classification result F^C = 1. At this point, the system rewards the user with W^f ∗ W^K ∗ R and updates the trust score as Trust_i = Trust_{i−1} + W^f ∗ W^K ∗ R. It should be noted that the new trust score will be no more than the maximum threshold Tmax. On the contrary, when the user is identified as illegal, the system punishes the user with W^f ∗ W^K ∗ P and updates the trust score as Trust_i = Trust_{i−1} − W^f ∗ W^K ∗ P. However, the new trust score should be no less than the minimum threshold Tmin. After obtaining the trust score Trust_i, the system determines whether it is lower than the alarm threshold Talert. If so, the system sets the warning sign Alert = 1 and triggers the alarm.

5. Performance Analysis of MFCA System

In the above, we have introduced the MFCA system and its sub-modules in detail. In this section, we describe the experiment procedure and analyse the performance of the MFCA system in detail.

5.1. Experiment Dataset. The whole experiment scenario is a free environment without static authentication, and the system does not require the user to type a specified statement to unlock the device or to sign a gesture with the mouse. From data collection to authentication, the user maintains normal operations without additional restrictions, so that he/she can almost ignore the existence of our system except for the alarm. To facilitate the experiment, 22 participants were recruited to operate their computers in their daily life, which ensures continuity and integrity and reduces the impact of uncertain factors. The data were collected over three weeks after the installation of our system.

In addition, the system is applied to a general scenario rather than a strict laboratory environment. The system design considers both function and universality, with the objective of balancing the application conditions and the application effect, and ensures the high reliability of characteristic selection and model training. As shown in Table 8, the computers and hardware are slightly different, but they all run on the basic Windows environment. For keyboard and mouse equipment, the users use a general QWERTY keyboard and a two-button mouse. The equipment manufacturers are different, but their impact on the key feature collection of the system can be ignored.

5.2. Evaluation Metrics. After the system was deployed, 22 participants were tested to verify the performance of the MFCA system. Most previous research work adopts FAR and FRR to evaluate performance. However, in continuous authentication it matters less whether an imposter or illegal user is detected than when the illegal user is detected. In fact, FAR and FRR are more suitable for one-time authentication scenarios. They can only indicate whether an illegal user is detected but cannot indicate when an illegal user can be
Input:
    The process training dataset matrix, X;
    The label vector of the process data, L := L1, L2, ..., Ln;
Output:
    The process model, PM;
(1) initialize: do preprocessing, scaler = MinMaxScaler()
(2) initialize: X = scaler.fit_transform(X)
(3) PM = GaussianNB()
(4) PM.fit(X, L)
(5) return PM
Input:
    The label vector of the process data, L;
    The process test dataset matrix, X;
    The process model, PM;
Output:
    The predicted answer, Ans;
    The predicted accuracy information matrix, Acc;
(1) initialize: do preprocessing, scaler = MinMaxScaler()
(2) initialize: X = scaler.fit_transform(X)
(3) predicted = PM.predict(X)
(4) Ans = metrics.classification_report(L, predicted)
(5) Acc = metrics.confusion_matrix(L, predicted)
(6) return [Ans, Acc]
detected, which is more important in a continuous authentication scenario. For example, even if the recognition rate of a model is high, a long detection time means that the intrusion may have been completed before the illegal user is detected, which is unacceptable. Unlike the previous performance evaluation metrics, this paper adopts the Average Number of Imposter Actions (ANIA) and the Average Number of Genuine Actions (ANGA) to evaluate the application effect of the system, where ANIA refers to the average number of behavioural characteristics required for illegal users to be identified as exceptions, and ANGA refers to the average number of behavioural characteristics entered by legitimate users before they are identified as exceptions. Therefore, ANIA should be as low as possible, so that illegal users are identified quickly and can perform fewer illegal operations. ANGA should be as high as possible, so that legitimate users can work with as little interruption as possible.

5.3. Experimental Results. In the experiment, the 22 participants are divided into two groups: one group comprises legal users using their own equipment normally, and the other group comprises illegal users operating others' equipment. The experimental environment imposes no other restrictive requirements. We take the first 70% of each user's input data as training data and the rest as test data. The following operations are performed on all users' input data: first, our system uses the training data of legitimate users for model training; second, the test data are used to calculate the Number of Genuine Actions (NGA) of the model; finally, the data of illegal users are used to attack, and the Number of Imposter Actions (NIA) of the model is calculated. The initial trust score of all users is 90. When the trust score falls below the threshold of 75, the pop-up alarm is triggered, and the system records the verification times of each feature to obtain NIA and NGA, from which ANIA and ANGA are calculated. The experimental data of the two groups are shown in Tables 9 and 10.

As shown in Tables 9 and 10, ANIA = 430 and ANGA = 7341, which means that an illegal user is identified after 430 input features on average, whereas a legal user enters 7341 features on average. Note that an effective feature here is not a single user behaviour. Take a mouse operation as an example: an effective mouse movement contains multidimensional features such as moving distance, moving speed, and moving direction, so the authentication accelerates as the user produces features more frequently. The capture period for illegal users is short, so the user exception can be raised quickly. The normal usage period of a legitimate user is long; therefore, daily work will rarely be interrupted. In addition, to speed up the abnormal authentication
Input:
    The feature type, Ft;
    The feature classification result, FC;
    The weight of this feature in its recognition model, W_f;
    The trust score after the last calculation, Trust_{i-1};
    The initial trust score T, and the score thresholds T_max and T_min;
    The score that triggers the system alert, T_alert;
    The reward and punishment scores for each feature, R, P;
    The weights of the three authentication models, W := W_k, W_m, W_p;
Output:
    The trust score after this calculation, Trust_i;
    The system alert flag, A;
(1) initialize: A = 0; if this is the first calculation, Trust_{i-1} = T;
(2) if Ft == 0 then
(3)     if FC == 0 then
(4)         Trust_i = max(Trust_{i-1} - W_f * W_k * P, T_min);
(5)     else {FC == 1}
(6)         Trust_i = min(Trust_{i-1} + W_f * W_k * R, T_max);
(7)     end if
(8) else if Ft == 1 then
(9)     replace W_k in the above formulas with W_m;
(10) else {Ft == 2}
(11)     replace W_k in the above formulas with W_p;
(12) end if
(13) if Trust_i < T_alert then
         A = 1;
(14) end if
(15) return [Trust_i, A]
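As a minimal illustration of the trust score algorithm, the update step can be sketched in Python as follows. The initial score (90) and the alert threshold (75) are the values reported in Section 5.3. The bounds T_max and T_min, the reward R, the punishment P, and the three model weights are hypothetical placeholders chosen only for this example, and the names TrustConfig and update_trust are introduced here for illustration, not taken from the MFCA implementation.

```python
from dataclasses import dataclass

# Feature-type codes as used in the trust score algorithm:
# 0 selects W_k (keystroke), 1 selects W_m (mouse), 2 selects W_p.
KEYSTROKE, MOUSE, APPLICATION = 0, 1, 2


@dataclass(frozen=True)
class TrustConfig:
    initial: float = 90.0   # T, reported in Section 5.3
    alert: float = 75.0     # T_alert, reported in Section 5.3
    t_max: float = 100.0    # T_max: assumed value
    t_min: float = 0.0      # T_min: assumed value
    reward: float = 1.0     # R: assumed value
    penalty: float = 5.0    # P: assumed value
    model_weights: tuple = (0.4, 0.4, 0.2)  # (W_k, W_m, W_p): assumed values


def update_trust(prev_trust, feature_type, is_legal, feature_weight, cfg):
    """Apply one trust update and return (new_trust, alert_flag)."""
    w_model = cfg.model_weights[feature_type]
    if is_legal:
        # Reward, capped at the maximum threshold.
        trust = min(prev_trust + feature_weight * w_model * cfg.reward, cfg.t_max)
    else:
        # Punishment, floored at the minimum threshold.
        trust = max(prev_trust - feature_weight * w_model * cfg.penalty, cfg.t_min)
    # The alert flag is raised when the score drops below T_alert.
    return trust, int(trust < cfg.alert)


cfg = TrustConfig()
trust, alert = update_trust(cfg.initial, KEYSTROKE, False, 0.5, cfg)
print(trust, alert)
```

With these assumed values, a rejected keystroke feature of weight 0.5 lowers the score from 90 by 0.5 × 0.4 × 5 = 1, and no alert is raised because the score stays above 75.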
speed or prevent the user from being disturbed, our system can increase or reduce the trust threshold of the trust model.

scenarios against the models and introduce data anonymisation and differential privacy mechanisms to protect user data privacy.
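To make the ANIA and ANGA metrics of Section 5.2 concrete, the following Python sketch counts how many actions a session consumes before the trust score first drops below the alert threshold, and then averages those counts over sessions. It is a simplified illustration under stated assumptions: every action carries the same fixed reward and punishment (the per-feature and per-model weights are omitted), the reward, punishment, and score bounds are hypothetical, and sessions that never trigger an alert are simply excluded from the average. Only the initial score of 90 and the threshold of 75 come from Section 5.3. Applied to imposter sessions the average corresponds to ANIA, and applied to genuine sessions it corresponds to ANGA.

```python
from statistics import mean


def actions_until_alert(decisions, reward=1.0, penalty=5.0,
                        initial=90.0, alert=75.0, t_min=0.0, t_max=100.0):
    """Return the number of actions consumed before the first alert.

    `decisions` is a sequence of classifier outputs (1 = legal, 0 = illegal).
    Returns None if the trust score never drops below `alert`.
    """
    trust = initial
    for count, is_legal in enumerate(decisions, start=1):
        if is_legal:
            trust = min(trust + reward, t_max)
        else:
            trust = max(trust - penalty, t_min)
        if trust < alert:
            return count
    return None


def average_actions(sessions, **kwargs):
    """Average the per-session action counts, skipping sessions without an alert."""
    counts = [actions_until_alert(s, **kwargs) for s in sessions]
    detected = [c for c in counts if c is not None]
    return mean(detected) if detected else None


# Two hypothetical imposter sessions: one rejected on every action,
# one with a single action wrongly accepted as legal.
imposter_sessions = [[0] * 10, [0, 1, 0, 0, 0, 0]]
print(average_actions(imposter_sessions))  # 4.5
```

In the first session the score falls 90, 85, 80, 75, 70, so the alert fires on the fourth action. In the second session the accepted action delays the alert to the fifth action, giving an average of 4.5.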
[6] “Insider threat report [EB/OL],” 2020, https://www. [23] M. Rybnik, M. Tabedzki, M. Adamski, and K. Saeed, “An
cybersecurity-insiders.com/wp-content/uploads/2019/11/2020- exploration of keystroke dynamics authentication using non-
Insider-*reat-Report-Gurucul.pdf,%202021-05. fixed text of various length,” in Proceedings of the Interna-
[7] S. Teerakanok, T. Uehara, and A. Inomata, “Migrating to zero tional Conference on Biometrics and Kansei Engineering,
trust architecture: reviews and challenges,” Security and pp. 245–250, Tokyo, Japan, July 2013.
Communication Networks, vol. 2021, Article ID 9947347, [24] X. Song, P. Zhao, M. Wang, and C. Yan, “A continuous
10 pages, 2021. identity verification method based on free-text keystroke
[8] R. Spillane, “Keyboard apparatus for personal identification,” dynamics,” in Proceedings of the IEEE International Confer-
IBM Technical Disclosure Bulletin, vol. 173346 pages, 1975. ence on Systems, Man, and Cybernetics (SMC), pp. 206–210,
[9] G. E. Forsen, M. R. Nelson, and R. J. J. Staron, “Personal Budapest, Hungary, October 2016.
attributes authentication techniques,” Pattern Analysis and [25] J. Huang, D. Hou, and S. Schuckers, “A practical evaluation of
Recognition Corp, Technology Report, NTIS No. 197805, free-text keystroke dynamics,” in Proceedings of the IEEE
1977. International Conference on Identity, Security and Behavior
[10] R. Gaines, W. Lisowski, and S. Press, “Authentication by Analysis (ISBA), pp. 1–8, New Delhi, India, February 2017.
keystroke timing: some preliminary results,” Rand Corpo- [26] B. Ayotte, M. K. Banavar, D. Hou, and S. Schuckers, “Fast and
ration: Rand Report R-2560-NSF, *e Rand Corporation, accurate continuous user authentication by fusion of in-
Santa Monica, CA, USA, 1980. stance-based, free-text keystroke dynamics,” in Proceedings of
[11] R. A. J. Everitt and P. W. Mcowan, “Java-based Internet the International Conference of the Biometrics Special Interest
biometric authentication system,” IEEE Transactions on Group, pp. 1–6, Darmstadt, Germany, September 2019.
Pattern Analysis and Machine Intelligence, vol. 25, no. 9, [27] M. Pusara and C. E. Brodley, “User Re-authentication via
pp. 1166–1172, 2003. mouse movements,” in Proceedings of the 2004 ACM work-
[12] N. Zheng, A. Paloski, and H. Wang, “An efficient user ver- shop on Visualization and data mining for computer security,
ification system using angle-based mouse movement bio- pp. 1–8, Association for Computing Machinery, New York,
metrics,” ACM Transactions on Information and System NY, USA, October 2004.
Security, vol. 18, no. 3, pp. 1–27, 2016. [28] Y. Aksari and H. Artuner, “Active authentication by mouse
[13] U. Mahbub, J. Komulainen, D. Ferreira, and R. Chellappa,
movements,” in Proceedings of the 24th International Sym-
“Continuous authentication of smartphones based on ap-
posium on Computer and Information Sciences, pp. 571–574,
plication usage,” IEEE Transactions on Biometrics, Behavior,
Suzelyurt, Cyprus, September 2009.
and Identity Science, vol. 1, no. 3, pp. 165–180, 2019.
[29] B. Sayed, I. Traore, I. Woungang, and M. S. Obaidat, “Bio-
[14] M. Ehatisham-Ul-Haq, M. Awais Azam, U. Naeem, Y. Amin,
metric authentication using mouse gesture dynamics,” IEEE
and J. Loo, “Continuous authentication of smartphone users
Systems Journal, vol. 7, no. 2, pp. 262–274, 2013.
based on activity pattern recognition using passive mobile
[30] C. Chao Shen, Z. Zhongmin Cai, X. Xiaohong Guan, Y. Du,
sensing,” Journal of Network and Computer Applications,
and R. A. Maxion, “User authentication through mouse
vol. 109, pp. 24–35, 2018.
dynamics,” IEEE Transactions on Information Forensics and
[15] R. Kumar, V. Phoha, and R. Raina, “Authenticating users
Security, vol. 8, no. 1, pp. 16–30, 2013.
through their arm movement patterns,” 2016, http://arxiv.
[31] C. Shen, Z. Cai, and X. Guan, “Continuous authentication for
org/abs/1603.02211.
[16] Y. Li, “Research on gesture recognition model and its ap- mouse dynamics: a pattern-growth approach,” in Proceedings
plication based on wear sensing perception,” pp. 1–45, of the IEEE/IFIP International Conference on Dependable
Lanzhou University, Lanzhou, China, 2019, Master’s *esis. Systems and Networks (DSN 2012), pp. 1–12, Boston, MA,
[17] J. Handa, S. Singh, and S. Saraswat, “A comparative study of USA, June 2012.
mouse and keystroke based authentication,” in Proceedings of [32] E. Medvet, A. Bartoli, F. Boem, and F. Tarlao, “Continuous
the 2019 9th International Conference on Cloud Computing, and non-intrusive reauthentication of web sessions based on
Data Science & Engineering (Confluence), pp. 670–674, Noida, mouse dynamics,” in Proceedings of the 9th International
India, January 2019. Conference on Availability, Reliability and Security, pp. 166–
[18] P. H. Pisani, A. C. Lorena, and P. L. F. De Carvalho, “Adaptive 171, Fribourg, Switzerland, September 2014.
biometric systems using ensembles,” IEEE Intelligent Systems, [33] B. Li, W. Wang, Y. Gao, V. Phota, and Z. Jin, “Hand in
vol. 33, no. 2, pp. 19–28, 2018. motion: enhanced authentication through wrist and mouse
[19] M. Liu, “Research on authentication technology based on user movement,” in Proceedings of the IEEE 9th International
keystroke behavior,” pp. 1–80, Beijing University of Posts and Conference on Biometrics ;eory, Applications and Systems
Telecommunications, Beijing, China, 2019, Master’s *esis. (BTAS), pp. 1–9, Rendondo Beach, CA, USA, October 2018.
[20] D. Gunetti and C. Picardi, “Keystroke analysis of free text,” [34] M. Yildirim and E. Anarim, “Session-based user authenti-
ACM Transactions on Information and System Security, vol. 8, cation via mouse dynamics,” in Proceedings of the 27th Signal
no. 3, pp. 312–347, 2005. Processing and Communications Applications Conference
[21] P. Dowland, H. Singh, and S. Furnell, “A preliminary in- (SIU), pp. 1–4, Sivas, Turkey, April 2019.
vestigation of user authentication using continuous keystroke [35] Á Fülöp, L. Kovács, T. Kurics, and E. Windhager-Pokol,
analysis,” in Proceedings of the IFIP 8th Annual Working “Balabit mouse dynamics challenge data set,” Available at:
Conference on Information Security Management & Small https://github.com/balabit/Mouse-Dynamics-Challenge,
Systems Security, Las Vegas, NV, USA, September 2001. 2016.
[22] T. Shimshon, R. Moskovitch, L. Rokach, and Y. Elovici, [36] S. Liu, “Research on user behaviour analysis model based on
“Continuous verification using keystroke dynamics,” in log data,” pp. 1–63, Yunnan University, Kunming, China,
Proceedings of the 2010 International Conference on Com- 2017, Master’s *esis of.
putational Intelligence and Security, pp. 411–415, Naning, [37] W. Meng, Y. Wang, D. S. Wong, S. Wen, and Y. Xiang,
China, December 2010. “TouchWB: touch behavioral user authentication based on
2037, 2021, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2021/6669429, Wiley Online Library on [28/09/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Security and Communication Networks 19
web browsing on smartphones,” Journal of Network and [54] S. Almalki, P. Chatterjee, and K. Roy, “Continuous authen-
Computer Applications, vol. 117, pp. 1–9, 2018. tication using mouse clickstream data analysis,” in Proceed-
[38] J. Wei, “Analysis and research of user access behavior based ings of the International Conference on Security, Privacy and
on DNS log,” pp. 1–75, Beijing Jiaotong University, Beijing, Anonymity in Computation, Communication and Storage,
China, 2019, Master’s *esis of. pp. 76–85, Springer, Atlanta, GA, USA, July 2019.
[39] A. Alzubaidi, S. Roy, and J. Kalita, “A data reduction scheme [55] I. Rish, “An empirical study of the naive Bayes classifier,”
for active authentication of legitimate smartphone owner IJCAI 2001 workshop on empirical methods in artificial in-
using informative apps ranking,” Digital Communications and telligence, vol. 3, no. 22, pp. 41–46, 2001.
Networks, vol. 5, no. 4, pp. 205–213, 2019.
[40] N. Eagle, A. Pentland, and D. Lazer, “Inferring friendship
network structure by using mobile phone data,” Proceedings of
the National Academy of Sciences, vol. 106, no. 36,
pp. 15274–15278, 2009.
[41] S. K. S. Modak and V. K. Jha, “Multibiometric fusion strategy
and its applications: a review,” Information Fusion, vol. 49,
pp. 174–204, 2019.
[42] I. Traore, I. Woungang, M. S. Obaidat, N. Youssef, and L. Iris,
“Combining mouse and keystroke dynamics biometrics for
risk-based authentication in web environments,” in Pro-
ceedings of the Fourth International Conference on Digital
Home, pp. 138–145, IEEE Computer Society, Guangzhou,
China, November 2012.
[43] K. O. Bailey, J. S. Okolica, and G. L. Peterson, “User iden-
tification and authentication using multi-modal behavioral
biometrics,” Computers & Security, vol. 43, pp. 77–89, 2014.
[44] L. Fridman, A. Stolerman, S. Acharya et al., “Multi-modal
decision fusion for continuous authentication,” Computers &
Electrical Engineering, vol. 41, pp. 142–156, 2015.
[45] S. Mondal and P. Bours, “Combining keystroke and mouse
dynamics for continuous user authentication and identifi-
cation,” in Proceedings of the 2016 IEEE International Con-
ference on Identity, Security and Behavior Analysis, pp. 1–8,
ISBA, Sendai, Japan, February 2016.
[46] I. D. S. Beserra, L. Camara, and M. D. Costa-Abreu, “Using
keystroke and mouse dynamics for user identification in the
online collaborative game league of legends,” in Proceedings of
the 7th International Conference on Imaging for Crime De-
tection and Prevention (ICDP 2016), November 2018.
[47] S. M. Sergio, R. S. Baker, O. C. Santos, and J. González-
Boticario, “A machine learning approach to leverage indi-
vidual keyboard and mouse interaction behavior from mul-
tiple users in real-world learning scenarios,” IEEE Access,
vol. 6, pp. 39154–39179, 2018.
[48] K. Quintal, B. Kantarci, M. Erol-Kantarci, A. Malton, and
A. Walenstein, “Contextual, behavioral, and biometric sig-
natures for continuous authentication,” IEEE Internet Com-
puting, vol. 23, no. 5, pp. 18–28, 2019.
[49] R. Wang and D. Tao, “Implicit authentication mechanism
based on context awareness for smartphone,” Journal of
Beijing University of Posts and Telecommunications, vol. 42,
no. 6, pp. 118–125, 2019.
[50] Y. Yang, J. Sun, and L. Guo, “PersonaIA: a lightweight implicit
authentication system based on customized user behavior
selection,” IEEE Transactions on Dependable and Secure
Computing, vol. 16, no. 1, pp. 113–126, 2019.
[51] S. Vhaduri and C. Poellabauer, “Multi-modal biometric-based
implicit authentication of wearable device users,” IEEE
Transactions on Information Forensics and Security, vol. 14,
no. 12, pp. 3116–3125, 2019.
[52] C. Cortes and V. Vapnik, “Support-vector networks,” Ma-
chine Learning, vol. 20, no. 3, pp. 273–297, 1995.
[53] C.-C. Chang and C.-J. Lin, “Libsvm,” ACM Transactions on
Intelligent Systems and Technology, vol. 2, no. 3, pp. 1–27, 2011.