IllusionPIN: Shoulder-Surfing Resistant Authentication Using Hybrid Images
Abstract—We address the problem of shoulder-surfing attacks on authentication schemes by proposing IllusionPIN (IPIN), a PIN-based authentication method that operates on touchscreen devices. IPIN uses the technique of hybrid images to blend two keypads with different digit orderings in such a way that the user who is close to the device sees one keypad and uses it to enter her PIN, while an attacker who is looking at the device from a larger distance sees only the other keypad. The user's keypad is shuffled in every authentication attempt, since the attacker may memorize the spatial arrangement of the pressed digits. To reason about the security of IllusionPIN, we developed an algorithm which is based on human visual perception and estimates the minimum distance from which an observer is unable to interpret the keypad of the user. We tested our estimations with 84 simulated shoulder-surfing attacks from 21 different people. None of the attacks was successful against our estimations. In addition, we estimated the minimum distance from which a camera is unable to capture the visual information from the keypad of the user. Based on our analysis, it seems practically almost impossible for a surveillance camera to capture the PIN of a smartphone user when IPIN is in use.

Index Terms—PIN Authentication, Shoulder-Surfing, Video Attack, Hybrid Image, Human Visual Perception

All authors are with the Department of Computer Science and Engineering, New York University Tandon School of Engineering, Brooklyn, NY, 11201 USA; e-mail: [email protected], [email protected], [email protected], and [email protected]

I. INTRODUCTION

User authentication is performed in various ways [1]. We focus on PIN authentication because of its simplicity and maturity. A Personal Identification Number (PIN) is a sequence of digits that confirms the identity of a person when it is successfully presented. PINs are simpler than alphanumeric passwords as they consist solely of numerical characters (0-9) and have a short length, usually either 4 or 6 digits. This makes PINs easy to remember and easy to reproduce, and as a consequence, PIN authentication is characterized by infrequent errors [2]. So, simplicity translates to usability. The maturity of PIN authentication is a result of its continuous usage for years in a wide range of everyday applications, like mobile phones and banking systems.

From the perspective of security, PIN authentication is susceptible to brute-force or even guessing attacks [3]. To balance this weakness, the number of allowed authentication attempts is usually constrained to a small number such as 5. However, a simple attack that is still very hard to mitigate is shoulder-surfing [4].

Shoulder-surfing refers to eavesdropping on personal information, like an alphanumeric password or a PIN, through observation. A typical example is an adversary who is standing behind a person in the line for an ATM and is looking, or "surfing", over the person's shoulder to obtain her PIN. In this scenario, the attacker is observing a person while being in her vicinity. However, the attacker may also observe someone remotely by using recorded material that was collected intentionally or even unintentionally. For example, unintentional recording of shoulder-surfing material could result from a surveillance camera that captured a person while she was entering her authentication credentials to unlock her phone in a store or at the workplace.

Authentication schemes which are not resilient to observation are vulnerable to shoulder-surfing. Any kind of visual information may be observed, including the blink of a button when it is pressed, or even the oily residue that the fingers leave on a touchscreen [5]. Shoulder-surfing is a serious threat for PIN authentication in particular, because it is relatively easy for an observer to follow the PIN authentication process. PINs are short and require just a small numeric keypad instead of the usual alphanumeric keyboard. In addition, PIN authentication is often performed in crowded places, e.g., when someone is unlocking her mobile phone on the street or in the subway. Shoulder-surfing is facilitated in such scenarios since it is easier for an attacker to stand close to the user while escaping her attention.

The motivation behind this work relies on the hypothesis that PIN authentication will truly meet the needs of its users when its shoulder-surfing resistance increases without a significant overhead in its usability. We contributed towards this claim in the following ways.
• We designed IllusionPIN (IPIN) for touchscreen devices. The virtual keypad of IPIN is composed of two keypads with different digit orderings, blended in a single hybrid image [6]. The user who is close to the screen is able to see and use one keypad, but a potential attacker who is looking at the screen from a larger distance is able to see only the other keypad. We analyze the design of IllusionPIN in detail in Section V.
• We developed an algorithm to estimate whether or not the user's keypad is visible to an observer at a given viewing position. We explain the estimation algorithm in Section VI.
• We tested the estimated visibility of IllusionPIN through a user study of simulated shoulder-surfing attacks on smartphone devices. In total, we performed 84 attacks with 21 different people, and none of the attacks was successful against our estimations. We provide the details of this user study in Section VIII.
• We estimated the minimum distance from which a camera is unable to capture the visual information from the user's keypad. The exact procedure is explained in Section VII. The results show that it is practically almost impossible for a surveillance camera to capture the PIN of a smartphone user when IllusionPIN is in use.

II. RELATED WORK

We organize shoulder-surfing resistant authentication schemes according to 6 design principles. The Obscurity principle states that the visual information of interest has to be obscured. For example, ShieldPIN [7] requires the user to physically cover the keypad by cupping one hand while using the other hand to enter her PIN. It is obvious that such an approach demands physical effort and simultaneous usage of both hands, which may be unwanted. An alternative solution that does not require any extra effort from the user is to make the content of the screen visible within a limited range of viewing angles. This can be achieved either with additional hardware, e.g., privacy filters, or with special hardware, e.g., automultiscopic displays [8], [9]. In both cases, deployability may be an issue. However, there are a number of software solutions which create a similar effect [10], [11]. Specifically, depending on the viewing angle, different visual elements appear on the screen and obscure the real content. These approaches exploit technical limitations of certain screen technologies, and as a result, they cannot be generalized or expected to be applicable in the future as screen technology advances. In addition, a shoulder-surfer is not necessarily observing from a large angle, as he may be just standing behind the user.

The Visual Complexity principle states that it has to be difficult to receive the visual information of interest [12]. For example, DAS [13] is a simple graphical password that allows the user to create a free-form drawing on a touchscreen and to use it as her password. Decoy Strokes [14] is a shoulder-surfing resistant variation of DAS that draws strokes alongside the user's password to confuse a malicious observer. The problem with such schemes is that the user is exposed to the same distracting information and may end up confused as well, leading to slower authentication and more frequent input errors. Also, if the attacker is able to observe the authentication process multiple times or to record it, he may be able to steal the credentials of the user.

The Cognitive Complexity principle states that it has to be difficult to process the acquired visual information [15]. For example, in one of the cognitive trapdoor games [16], the user is required to enter her PIN in the following way. The digits on the provided keypad are separated into two sets based on their color; half of them are black, and half of them are white. The user selects the set that the current digit of her PIN belongs to, and then the digits are reassigned to the two color sets. This procedure is repeated until the scheme is able to uniquely determine the correct digit by intersecting the selected sets. Then, the user proceeds to the next digit of her PIN. For an observer, it is extremely difficult, if not impossible, to successfully perform sequential intersections of sets to extract the correct PIN. However, such schemes are usually complex for the users too, with all the inevitable consequences for usability. In addition, recorded material or even repeated observation may reveal the authentication credentials, since all the useful information is observable.

The Alteration principle states that the required input has to change in every authentication attempt [17], [18]. For example, Deja Vu [19] presents to the user a number n of images, and asks her to specify which of them belong to a predefined set of images, called the user's portfolio. In each authentication attempt, different images from the portfolio are assigned to the set of the n images. As a consequence, an observer cannot learn the portfolio of the user in a limited number of observations. However, multiple observations may reveal the whole portfolio. In general, with such schemes it is difficult for the user to get familiar with a standard input. For example, with Deja Vu the user has to identify different pictures in every authentication attempt. This requires additional cognitive effort and may affect the authentication time and the error rate.

The One-to-Many principle states that the same input has to correspond to more than one authentication credential [20], [21], [22], [23], [16], [24]. For example, SlotPIN [7] allows the user to enter a PIN by aligning four vertical reels of randomly ordered digits. The first reel is static and denotes the position of the first PIN digit. The other three reels get aligned by the user according to the PIN. In the end, ten PINs are formed, and a shoulder-surfer is unable to know which is the correct one. In addition, it is difficult to memorize all of them. However, the attacker could replicate the same input without the need to know the correct credentials. That is why schemes designed under this principle randomize the input interface periodically. In the case of SlotPIN, the digits on the reels are randomized in every authentication attempt. The problem with such schemes is that they exhibit high complexity in order to break the one-to-one correspondence between inputs and authentication credentials. This results in high cognitive effort on the user side and renders such schemes unacceptable for frequent usage, e.g., for unlocking mobile phones. Also, multiple observations may reveal the correct credentials.

The Non-Visual Information principle states that at least part of the information of interest has to be transmitted through channels that are not observable. This way, an observer is always missing a piece of information, and shoulder-surfing is mitigated even in cases where multiple observations or recordings are possible. However, the performance of such schemes in usability and deployability varies. For example, schemes which use audio and haptic information [25], [26] suffer from high authentication time, and their requirements for additional hardware heavily affect their deployability. However, the emergence of touchscreen devices which are pressure-sensitive may favor schemes which use this kind of haptic information [27], [28], [29], [30], [31], [32], if they are combined with satisfying usability. Gaze-based authentication [33] corresponds to high authentication time, and authentication based on brainwaves [34] or bone conduction [35] requires special equipment. Fingerprint authentication [36] is a scheme that is gaining popularity nowadays by offering excellent usability. A usual problem with
frequency. In figure 1, we can see that G subtends visual angle (θ_x, θ_y). So, if f = (n_x/d_x, n_y/d_y) is the actual spatial frequency of G, its perceived spatial frequency will be f_p = (n_x/θ_x, n_y/θ_y), where n_x, n_y are the numbers of cycles in the horizontal and vertical directions, and (d_x, d_y) is the size of the image measured in units of length. All we need to do is to calculate (θ_x, θ_y). To this end, we consider the general setting depicted in figure 1. The coordinate system for specifying spatial positions is placed at the center of the image, where we assume that the observer is focused. The position

Fig. 2. The Contrast Sensitivity Function model as proposed by Mannos et al. [38].
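For the special case of direct viewing used in the demonstrations of this section (where |f_p| = d/θ · |f|), the conversion from actual to perceived spatial frequency can be sketched as follows. The snippet assumes the standard visual-angle formula θ = 2·arctan(d/(2·r_0)) for a stimulus of size d viewed head-on from distance r_0, since equations 4-7 are not reproduced in this excerpt; the function names are illustrative only.

```python
import math

def visual_angle_deg(size_inches, distance_inches):
    # Visual angle subtended by a stimulus of the given size, viewed head-on.
    return math.degrees(2.0 * math.atan(size_inches / (2.0 * distance_inches)))

def perceived_frequency_cpd(cycles_per_image, size_inches, distance_inches):
    # Actual frequency is given in cycles per image; perceived frequency is in
    # cycles per degree of visual angle, which is what the CSF operates on.
    return cycles_per_image / visual_angle_deg(size_inches, distance_inches)

# Doubling the viewing distance roughly doubles the perceived frequency,
# which is why the perceived spectrum "stretches" to higher magnitudes.
for r0 in (12, 24, 48):
    print(r0, round(perceived_frequency_cpd(60, 3.0, r0), 1), "c/d")
```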
Fig. 4. (a) An example of how the perceived spectrums of two square images I_u^h and I_s^l look when they are directly viewed from a small distance. I_u^h is the result of high-pass filtering and I_s^l of low-pass filtering. (b) The perceived spectrums of the same images when they are viewed from a larger distance.

Fig. 5. (a) A hybrid keypad. (b) The simulated perception of the same hybrid keypad when it is directly viewed from a distance which is 2 times larger.

we are using the Discrete Fourier Transform (DFT), and the circles on the curve of figure 3 (a) correspond to the existing discrete magnitude values of actual spatial frequencies.

To understand how an image is perceived from a specific viewing position N = (r_0, θ_0, φ_0), we use equations 6 and 7 to express the actual spectrum in perceived spatial frequencies. We call such a diagram the perceived spectrum. To aid our demonstrations, we consider the special case where the observer is looking directly at the image (θ_0 = π/2, φ_0 = 0) and additionally d_x = d_y = d holds. In such a case, from equations 6 and 7 we get θ_x = θ_y = θ. As a consequence, it holds that |f_p| = d/θ · |f| and the perceived spectrum has exactly the same form as the actual spectrum. In figure 3 (b), we depict the perceived spectrums of the image with the actual spectrum of figure 3 (a) when it is viewed from 3 different distances r_0. The blue curve corresponds to the smallest viewing distance. As the viewing distance increases, the visual angle gets smaller and the factor d/θ gets bigger. As a consequence, the perceived spectrums are stretched and take the form of the cyan and the red curve. Since the contrast of a grating can be expressed through its amplitude, we can use the CSF to filter the perceived spectrums in order to understand how they are perceived. As we can see, the band-pass nature of the CSF favors the cyan curve. In particular, the CSF assigns small gain values to gratings with a big contribution in both the blue and the red curve. As a consequence, part of the visual information that the image is carrying is either perceived with less clarity or not perceived at all. For example, in the case of the red curve, for which the viewing distance has its largest value, many gratings with high perceived spatial frequency magnitudes are completely cut off. That is why, when an image is viewed from a large distance, it is perceived as being blurred.

V. ILLUSIONPIN (IPIN)

A. Method

IllusionPIN is a PIN-based authentication scheme for touchscreen devices which offers shoulder-surfing resistance. The design of IllusionPIN is based on the simple observation that the user is always viewing the screen of her device from a smaller distance than a shoulder-surfer. Based on this, the core idea of IllusionPIN is to make the keypad on the touchscreen be interpreted with a different digit ordering when the viewing distance is adequately large. This way, when the shoulder-surfer is standing far enough away, he is viewing the keypad as being different from the one that the user is utilizing for her authentication, and consequently he is unable to extract the user's PIN. Also, the keypad is shuffled in every authentication attempt (or every digit entry) to avoid disclosing the spatial distribution of the pressed digits. We create the keypad of IllusionPIN with the method of hybrid images [6], [39] and we call it a hybrid keypad.

A hybrid keypad I is created by appropriately blending two normal keypads, denoted by I_u and I_s. Our goal is for I to be interpreted as I_u when it is viewed from close up, and to be interpreted as I_s when it is viewed from far away. That is why we call I_u the "user's keypad" and I_s the "shoulder-surfer's keypad". To create I, we process I_u with a high-pass filter and I_s with a low-pass filter. The filtering results in two new images, I_u^h and I_s^l, and we simply set I = I_u^h + I_s^l. So, a hybrid keypad is composed of a high spatial frequency component I_u^h and a low spatial frequency component I_s^l. To understand how the interpretation of I changes, we consider that we directly view an example hybrid keypad from different distances. If the viewing distance is adequately small, the visual angle is such that the perceived spectrums of I_u^h and I_s^l occupy low perceived spatial frequency magnitudes, as depicted in figure 4 (a). As we can see, the gain distribution of the CSF favors the perceived spectrum of I_u^h, and as a result I is interpreted as being I_u. In figure 4 (b) we depict the same perceived spectrums for a larger viewing distance. As we can see, the perceived spectrums are stretched to higher perceived spatial frequency magnitudes and the CSF favors I_s^l. As a consequence, I_s dominates the perception of I. From intermediate viewing distances, both I_u^h and I_s^l are visible to a considerable extent and it is not certain how I is interpreted. In figure 5, we provide an example hybrid image. In figure 5 (b) we downscale the image of figure 5 (a) to simulate how it is perceived when it is directly viewed from a 2-times larger distance. From reading distance, we expect the digit orderings of the hybrid keypad in figure 5 (a) and (b) to be perceived as being different.

B. Parameters

Given the images I_u and I_s, the parameters that specify the hybrid keypad I are the parameters of the low-pass and the high-pass filters which are used to create I_u^h and I_s^l. For the low-pass filtering, we use a 2D Gaussian filter G_l, and for the high-pass filtering we use a filter G_h = 1 − G_hl, where G_hl is also a 2D Gaussian filter. In the spatial domain, both G_l and G_hl have zero mean values and diagonal covariance matrices with equal standard deviations. This way, they affect an image by an equal number of pixels in each dimension. So, if σ_x^ls and σ_y^ls are the standard deviations of G_l in the horizontal and vertical dimensions, it will hold that σ_x^ls = σ_y^ls. If σ_x^lf and σ_y^lf are the corresponding standard deviations in the
frequency domain, it will hold that N_x^f/(2πσ_x^lf) = N_y^f/(2πσ_y^lf) ⇒ σ_y^lf = (N_y^f/N_x^f)·σ_x^lf, where (N_x^f, N_y^f) is the size of the filter in samples. This shows that we can completely define G_l by specifying either σ_y^ls or σ_y^lf. For convenience, we prefer to work in the frequency domain, and consequently, we consider σ_y^lf to be the single parameter of G_l. Similarly, we define G_h through the standard deviations of G_hl, for which σ_y^hf = (N_y^f/N_x^f)·σ_x^hf holds in the frequency domain. Consequently, we consider σ_y^hf to be the single parameter of G_h.

We have to make two additional remarks. The first is that we want a filtered image to maintain its original size in pixels, (N_x, N_y), without the creation of additional frequency components in the stop-band of the filters. To this end, we perform the filtering in the frequency domain by multiplying the (N_x, N_y)-point DFT of the image with the corresponding filter. Because of this, the ratio N_y^f/N_x^f is equal to the ratio of the image dimensions N_y/N_x, which is usually 16/9. The second remark is that we measure σ_y^lf and σ_y^hf in number of cycles per image (c/im). This way, the visual effect of the filtering is invariant to the size of the image.
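To make the construction concrete, the following sketch builds a hybrid keypad from two grayscale keypad images using frequency-domain Gaussian filters whose vertical standard deviations are given in cycles per image, as described above. It is a minimal illustration, not the authors' implementation; the function names and the default σ values (35 c/im for the low-pass and 200 c/im for the high-pass, one of the settings that appears later in the paper) are only examples, and images are assumed to be float arrays normalized to [0, 1].

```python
import numpy as np

def gaussian_lowpass_freq(shape, sigma_y_cpi):
    # Frequency-domain 2D Gaussian with sigma given in cycles per image (c/im)
    # along the vertical axis; the horizontal sigma follows sigma_x = (Nx/Ny)*sigma_y,
    # which keeps the filter isotropic in the spatial (pixel) domain.
    Ny, Nx = shape
    sigma_x_cpi = sigma_y_cpi * Nx / Ny
    fy = np.fft.fftfreq(Ny) * Ny  # vertical frequencies in cycles per image
    fx = np.fft.fftfreq(Nx) * Nx  # horizontal frequencies in cycles per image
    FX, FY = np.meshgrid(fx, fy)
    return np.exp(-(FX ** 2) / (2 * sigma_x_cpi ** 2) - (FY ** 2) / (2 * sigma_y_cpi ** 2))

def apply_freq_filter(img, gain):
    # Multiply the (Nx, Ny)-point DFT of the image with the filter gain.
    return np.real(np.fft.ifft2(np.fft.fft2(img) * gain))

def hybrid_keypad(I_u, I_s, sigma_lf=35.0, sigma_hf=200.0):
    # I = high-pass(I_u) + low-pass(I_s), i.e. I = I_u^h + I_s^l.
    G_l = gaussian_lowpass_freq(I_s.shape, sigma_lf)
    G_h = 1.0 - gaussian_lowpass_freq(I_u.shape, sigma_hf)
    return apply_freq_filter(I_u, G_h) + apply_freq_filter(I_s, G_l)
```

On an actual device the resulting image would be rescaled to the displayable intensity range before being shown on the lock screen.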
C. Tuning

The values of σ_y^lf and σ_y^hf create a trade-off between security and usability. When the value of σ_y^lf is increased, I_s^l becomes more visible since its perceived spectrum extends to higher spatial frequencies. This means that the user gets more distracted during her authentication and may even need to bring the touchscreen closer to her eyes to clearly see the user's keypad. So, usability is negatively affected. However, a shoulder-surfer needs to reduce his viewing distance too in order to see the user's keypad with the same clarity, and consequently security is increased. When σ_y^lf is decreased, the opposite effects are caused, and usability is increased while security is decreased. The same trade-off is observed when the value of σ_y^hf changes, since the clarity of I_u^h is affected. In particular, when σ_y^hf is decreased, usability is increased and security is decreased, while the opposite behavior is observed when σ_y^hf is increased.

To resolve the trade-off between security and usability, we first set the value of σ_y^lf in such a way that every possible level of security and usability is still achievable depending on the value of σ_y^hf. Then, based on our security requirements, we set the minimum σ_y^hf that satisfies them. This way, usability is maximized under the constraint that our security needs are met. To set the value of σ_y^lf, we consider that if σ_y^lf gets very small, I_s^l will be so blurred that the original digits from I_s will not be recognizable, no matter the viewing distance. On the other hand, we have to consider that a user usually holds her device at a particular distance. If σ_y^lf gets very big, the user's keypad will not be able to dominate the perception from that specific viewing distance. So, we set σ_y^lf close to the minimum value that allows the digits on I_s^l to be interpreted. We experimented with Nexus 6 and iPhone 6 smartphones, which have representative keypads at their lock screens while they differ in size, resolution, and visual content. We concluded that a suitable value for σ_y^lf is 35 c/im. However, if the digits in another keypad are considerably different, e.g., much thicker, we may need to adjust the value of σ_y^lf to make them equally recognizable.

To specify the value of σ_y^hf we have to consider the given security requirements, which correspond to a safety distance value. In Section VI we explain how we estimate the minimum value of σ_y^hf that respects a particular safety distance.

D. Discussion

For the design of IllusionPIN we follow the principle of obscurity, since the shoulder-surfer's keypad I_s obscures the user's keypad I_u. We could use any image in the place of I_s, but we decided to always use the image of a keypad because this way I_u^h and I_s^l are visually aligned. This means that the digits in I_u^h and I_s^l overlap, and the less dominant keypad is perceived as noise, providing clearer interpretations of the hybrid keypad.

We also apply the principle of alteration by shuffling the user's keypad in every authentication attempt, or after every digit entry. Otherwise, it would be enough for a shoulder-surfer to memorize just the spatial arrangement of the pressed digits. However, the shoulder-surfer's keypad always remains the same, because this way we expect the user to gradually become better at ignoring it, resulting in faster authentication with fewer errors. Out of all the possible digit orderings that I_s may have, we choose the regular digit ordering, as in figure 5. The reason is that this is the ordering that we expect an attacker to be the most familiar with, and as a consequence, to have the tendency to recognize.

In our threat model, we have considered 4 shoulder-surfing scenarios with safety distance values equal to 25, 35, 45 and 60 inches. For each of these values we estimate the minimum σ_y^hf that keeps the user protected. This way we create 4 hybrid keypads for which it holds that when security is increased, usability is decreased. These keypads are offered as predefined options to the user, who can pick the one that best fits her needs.

VI. VISIBILITY ALGORITHM

Algorithm 1 Visibility Algorithm
Require: hybrid keypad I, shoulder-surfer's keypad I_s, DAF filter, viewing position N, visibility index threshold value v_th
1: I_sp ← calculate 2D perceived spectrum of I given N
2: I_s,sp ← calculate 2D perceived spectrum of I_s given N
3: I_sp^DAF ← apply DAF filter to I_sp
4: I_s,sp^DAF ← apply DAF filter to I_s,sp
5: I^DAF ← transform I_sp^DAF to spatial domain
6: I_s^DAF ← transform I_s,sp^DAF to spatial domain
7: Buttons(I^DAF) ← segment the buttons of I^DAF
8: Buttons(I_s^DAF) ← segment the buttons of I_s^DAF
9: v ← mean(MSSIM(Buttons(I^DAF), Buttons(I_s^DAF)))
10: if v ≥ v_th then
11:   return No
12: else
13:   return Yes

The visibility algorithm receives as inputs a hybrid keypad I and a viewing position N in the 3D space. It returns a binary
A. Algorithm

1) Distance-As-Filtering: In the first step of the visibility algorithm, we simulate the way I is perceived from the viewing position N by using the distance-as-filtering hypothesis proposed by Loftus et al. [40]. The distance-as-filtering hypothesis states that we can simulate the way an image is perceived from a particular viewing distance by filtering the image with an appropriate low-pass filter. The intuition behind this method is based on the effect that the CSF has on the visual perception of an image. However, as explained by Loftus et al. [40], the perception of an image from a particular distance and the perception of the corresponding filtered image are not identical, but they are equivalent with respect to performance on some task. Loftus et al. experimentally verified the distance-as-filtering hypothesis for face recognition tasks by designing a low-pass filter of a particular form. Our task is the recognition of digits on hybrid keypads, which differs from face recognition. However, both tasks require the perception of almost equally fine visual details, and consequently, we expect the low-pass filter designed by Loftus et al. to be applicable to our task too. We should also note that in the experiments conducted by Loftus et al., every observer was looking directly at an image and his viewing position was completely defined by his viewing distance. We use the same filter to simulate the perception of an observer who is at an arbitrary viewing position, by making the simplifying assumption that the perception of an image depends on the visual angle that it subtends, no matter where the observer stands.

The low-pass filter proposed by Loftus et al. [40] has constant gain equal to 1 until the perceived spatial frequency magnitude |f_0|, and then drops until it reaches the value 0 at the perceived spatial frequency magnitude |f_1|. We call it the DAF filter, and it is mathematically defined in the following way:

  DAF(f_p) = 1,                               if |f_p| < |f_0|
             1 − log(|f_p|/|f_0|) / log(r),   if |f_0| ≤ |f_p| ≤ |f_1|
             0,                               if |f_p| > |f_1|        (8)

where r is a positive constant for which r > 1 holds, and |f_0| = |f_1|/r. So, the parameters of the filter are the values of r and |f_1|. Loftus et al. conducted 4 different face recognition experiments to specify the values of the parameters, and concluded that r = 3, while |f_1| may be equal to 25, 31 or 42 c/d, depending on the task at hand. In Section VI-B2, we explain how we specified |f_1| for the purposes of our work.
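A direct implementation of the gain function in equation (8) is straightforward; the sketch below is illustrative only, using r = 3 as reported by Loftus et al. and an example |f_1| of 42 c/d.

```python
import numpy as np

def daf_gain(f_p, f1, r=3.0):
    # Gain of the DAF low-pass filter of equation (8): 1 below |f0| = |f1|/r,
    # 0 above |f1|, and a logarithmic roll-off in between.
    f0 = f1 / r
    f_p = np.array(f_p, dtype=float)
    gain = np.ones_like(f_p)
    mid = (f_p >= f0) & (f_p <= f1)
    gain[mid] = 1.0 - np.log(f_p[mid] / f0) / np.log(r)
    gain[f_p > f1] = 0.0
    return gain

# Example: perceived spatial frequency magnitudes in c/d, with |f1| = 42 c/d.
print(daf_gain([5, 14, 28, 42, 60], f1=42.0))  # -> [1.  1.  ~0.369  0.  0.]
```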
2) Visibility Index: In the second step of our algorithm, we compute the visibility index, which quantifies how visible the user's keypad of I is from the viewing position N. We remind the reader that I = I_s^l + I_u^h. To compute the visibility index, we apply the DAF filter both to I and to I_s^l, and we create the images I^DAF and I_s^{l,DAF}, respectively. This way we simulate how I and I_s^l are perceived when they are viewed from position N. Then, we separate the buttons of I^DAF and I_s^{l,DAF} into equal rectangular regions, and we compute the similarity of the corresponding buttons with the mean structural similarity index (MSSIM) [41]. The visibility index is the mean value of the 10 MSSIM index values from the pairs of corresponding buttons.

Fig. 6. We consider an example hybrid keypad I = I_s^l + I_u^h. In the first row, the third button of I_s^{l,DAF} is depicted when I_s^l is directly viewed from different distances. In the second row, the corresponding button of I^DAF is depicted when I is directly viewed from the same distances as I_s^l. In the third row, the value of the visibility index for each viewing distance is provided.

The visibility index is the cornerstone of our algorithm, and we would like to clarify its behavior and the intuition behind it. Given a reference image I_1 and a distorted version of I_1 denoted by I_2, the MSSIM index measures the similarity between I_1 and I_2. The maximum value of the MSSIM index is 1 and is obtained when I_1 and I_2 are identical, meaning that I_2 is not distorted at all. In our case, I_s^{l,DAF} is the reference image and I^DAF is considered a distorted version of I_s^{l,DAF} because of the presence of the user's keypad. The maximum value of the visibility index is 1 and is obtained when the user's keypad is completely out of perception. In figure 6, we demonstrate the behavior of the visibility index for an example hybrid keypad I. In the first row, we depict the third button of I_s^{l,DAF}, when I_s^l is directly viewed from different distances. In the second row, we depict the third button of I^DAF, when I is directly viewed from the same distances as I_s^l. In the third row, we provide the value of the visibility index for each viewing distance. From left to right, the viewing distance is increasing. As we can see, as the viewing distance increases, the digit 9, which belongs to the user's keypad, fades away and the visibility index increases. When the visibility index becomes large enough, the digit from the user's keypad is no longer visible. We would like to make clear that we apply the MSSIM index between separated buttons and not between I_s^{l,DAF} and I^DAF as a whole, because large, almost identical regions exist in I_s^{l,DAF} and I^DAF, and the MSSIM index would have a very large value irrespective of the form of the buttons.

The MSSIM index follows the premise that the main function of the human eye is to extract structural information from the viewing field. This connection to human perception is the main reason that we decided to use the MSSIM index. An additional advantage is that the MSSIM index is very easily computed.
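The per-button comparison can be sketched as follows. This is not the authors' code: skimage's structural_similarity is assumed as a stand-in for the MSSIM index of [41], images are assumed to be grayscale arrays normalized to [0, 1], and the button rectangles are hypothetical inputs.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def visibility_index(I_daf, I_s_l_daf, button_boxes):
    # I_daf: DAF-filtered hybrid keypad; I_s_l_daf: DAF-filtered low-pass
    # shoulder-surfer keypad (the reference). button_boxes: 10 rectangles
    # (top, bottom, left, right) in pixels, one per digit button.
    scores = []
    for top, bottom, left, right in button_boxes:
        ref = I_s_l_daf[top:bottom, left:right]
        dist = I_daf[top:bottom, left:right]
        scores.append(ssim(ref, dist, data_range=1.0))
    return float(np.mean(scores))

def user_keypad_visible(v, v_th=0.93):
    # Decision rule of Algorithm 1: v >= v_th means the user's keypad is
    # predicted to be unreadable from the given viewing position.
    return v < v_th
```

The threshold value 0.93 used above is the one the paper derives in Section VI-B2 for observers with the strongest vision.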
3) Threshold Value of the Visibility Index: Let us assume that we are given a hybrid keypad I and an observer who first views I from position N_1 and then from position N_2. If the corresponding visibility index values are v_1 and v_2 and v_2 > v_1 holds, we expect the user's keypad to be less visible from position N_2 than from N_1. If v_1 ≈ v_2, we expect the user's keypad to be almost equally visible in both cases. This is a direct consequence of the way we have defined the visibility index. Now let us assume that two different hybrid keypads I_1 and I_2 are viewed by the same observer from positions N_1 and N_2, respectively. If the corresponding visibility index values are v_1 and v_2 and v_2 > v_1 holds, we expect the user's keypad of I_1 to be more clearly visible than that of I_2. Similarly, if v_1 ≈ v_2, we expect the user's keypads of I_1 and I_2 to be almost equally visible. This is the main assumption that we make about the behavior of the visibility index, and we expect it to hold in its reverse form too. This means that if the user's keypad of a hybrid keypad I_1 is more clearly visible than the user's keypad of a different hybrid keypad I_2 when they are viewed from positions N_1 and N_2, respectively, then for the corresponding visibility index values v_1 and v_2 we expect v_2 > v_1 to hold. If the user's keypads of I_1 and I_2 are almost equally visible, then we expect v_1 ≈ v_2. It is important to mention that we expect these assumptions to hold only for the same observer. The reason is that the visual capabilities of different observers vary. For example, if a person with strong vision is directly viewing a hybrid keypad from a particular distance and is able to recognize the user's keypad with difficulty, then a person with weaker vision will have to go closer to the image to interpret it in the same way. As a result, the hybrid keypad will be interpreted in the same way by the two observers, but the value of the visibility index will be different.

Based on the aforementioned remarks, we set as a threshold v_th the value of the visibility index at which a particular observer is able to marginally recognize the digits of a user's keypad. Then, the visibility algorithm calculates the visibility index v for the inputs I and N, and compares it with v_th. If v ≥ v_th, we predict that the user's keypad cannot be interpreted by the observer. If v < v_th, we predict that the observer is able to interpret the digits of the user's keypad. Since the threshold value will vary for different observers, we universally use the v_th value that corresponds to people with the strongest vision, because we do not want to mistakenly predict that the user's keypad is not visible when it is.

B. Parameter Tuning

The parameters of the visibility algorithm are the spatial frequency |f_1| and the threshold v_th. To specify their values, we conducted a user study.

1) Data Collection: Participants. We recruited 11 participants from our institution, who were between 21 and 34 years of age. Our aim was to have participants with strong vision, and that is why they were all of young age. Out of the 11 participants, 6 reported that they had either myopia or astigmatism, but they were wearing their glasses during the process. The rest of the participants reported that they did not have any problem.

Materials. We used 2 different phones, a Nexus 6 and an iPhone 6. These two phones have displays with the same dimension ratio (16/9), but different size and different resolution. The Nexus 6 has a display size of 5.96 inches and a resolution of 1440 × 2560 pixels, while the iPhone 6 has a display size of 4.7 inches and a resolution of 750 × 1334 pixels. For each phone, we created 7 different categories of hybrid keypads. In 6 of them, we set σ_y^lf = 35 c/im, since this is the value that we have decided to use in our hybrid keypads. The value of σ_y^hf was equal to 120, 160, 200, 240, 280 and 320 c/im. In the seventh category, we tried a different value for σ_y^lf, which we set equal to 60 c/im, while σ_y^hf = 200 c/im. For each category, we created 30 hybrid keypads whose user's keypads had different digit orderings.

Procedure. Each subject participated in at least 3 sessions. Each session was split into 3 trials. The goal of each trial was to specify a viewing position N from which the user's keypad of a hybrid keypad was marginally recognized. The hybrid keypad was displayed on a smartphone device. To specify the viewing position N of the participant, we used spherical coordinates, meaning that N = (r_0, θ_0, φ_0). The coordinate system was placed at the center of the image. In each trial, we kept θ_0 and φ_0 constant and we varied r_0.

The exact procedure was the following. During each trial, a smartphone device was placed on a tripod that matched the height of the participant. This way, when the participant was looking at the phone, θ_0 was π/2 rad. To change the value of θ_0, we tilted the phone on the tripod. Also, the participant was able to move relative to the tripod in order to change the value of φ_0. Having specified θ_0 and φ_0, the participant started to approach the phone from a large distance r_0. The initial value of r_0 was large enough to keep the user's keypad out of perception. As the participant was approaching the phone, the user's keypad was starting to become visible. We recorded the maximum r_0 from which the participant was able to read the digits on the user's keypad. In particular, the participant had 15 seconds to read all the 10 digits, and only one mistake was allowed. The time limit is connected to the fact that a shoulder-surfer has limited time to observe the user while she is entering her authentication credentials. Also, as the participant was approaching the phone, each time he or she failed to read the digits, we switched to a different hybrid keypad from the same category in order to be sure that the participant was not getting familiar with a specific digit ordering. This is the reason that we created multiple hybrid keypads from each category for each phone.

For each session, the value of θ_0 was constant. Specifically, θ_0 was equal to π/2, π/4 or π/3 rad. In each trial of a session, we had a different value for φ_0. During the first trial, we had φ_0 = 0 rad, during the second φ_0 = π/6 rad, and during the third φ_0 = π/3 rad. Since θ_0 ∈ (0, π) and φ_0 ∈ (−π/2, +π/2), the values of θ_0 and φ_0 that we considered belonged to 1/4 of the available space. The reason is that the visual angle is symmetric with respect to the xy, zx and zy planes, as can be easily seen in equations 4 and 5. In addition, in each session we used hybrid keypads from a single category, and we used only one smartphone device.

Last but not least, we have to comment on the illumination
TABLE II
THE COEFFICIENT OF VARIATION FOR THE VISIBILITY INDEX VALUES OF EACH PARTICIPANT IN OUR DATASET.

          p1    p2    p3    p4    p5    p6    p7    p8    p9    p10   p11
  cv (%)  1.85  1.51  1.48  1.92  2.01  5.41  2.66  2.06  2.54  2.51  3.86

Fig. 8. The box and whisker plot with median notch after applying the Kruskal-Wallis test to our whole dataset with the participants as the factor of variation.

Fig. 9. The visibility region of a hybrid keypad I created with σ_y^lf = 35 c/im and σ_y^hf = 320 c/im for the iPhone 6 smartphone.

that the hybrid keypads belonged to. To verify this, we used the DAF filter we defined in the previous section to compute the visibility index for each of the 270 entries in our dataset. Then, we computed the coefficient of variation (cv) for the visibility index values of each participant. We provide the results in Table II. As we can see, the cv is lower than 10% for every participant, and consequently, we concluded that the values of the visibility index are homogeneous for each participant.

We wanted to further test the assumption that the values of the visibility index for each participant are independent of the smartphone device that was used. Also, we wanted to test the assumption that the values of the visibility index vary significantly between different participants. We tested both of these assumptions by applying a two-factor ANOVA with a randomized complete block design. We used data from the participants who were exposed to both phones during the data collection process. These were 5 out of the 11 participants. The visibility index values from different participants were assigned to separate blocks, and consequently, the participants were the blocking factor. The factor of interest within each block was the type of the phone, Nexus 6 or iPhone 6. Since the normality and homoscedasticity conditions were satisfied, we were able to compute the p-values of the two factors. The p-value for the type of the phone was 12.07% > 5%, and we concluded that the smartphone device is not a statistically significant factor of variation for the value of the visibility index. In contrast, the p-value for the participants was less than 1‰, and consequently, the values of the visibility index between the participants have statistically significant differences.

We further tested whether the participants are a factor of variation by considering the whole dataset with the 11 participants. Our intention was to apply a one-way ANOVA, but the normality and homoscedasticity conditions were not satisfied, and as a result, we applied the non-parametric Kruskal-Wallis test. The p-value was less than 1‰, and we confirmed that there are statistically significant differences between the visibility index values of different participants. In figure 8, we provide the corresponding box and whisker plot with median notch of the visibility indexes of all participants. As we can see, the participants formed three main groups based on their visibility index values. Participants p1, p2, p3, p4 and p5 formed the first group, p6, p8, p10 and p11 formed the second group, and p7 and p9 formed the third group. We confirmed this grouping of the participants by performing pairwise comparisons with Mann-Whitney tests. Each group had participants with visibility index values in a different range, and this is because the participants demonstrated different visual capabilities. The first group had the participants with the highest visibility index values, and consequently, these were the people who demonstrated the strongest vision. The mean value of the visibility index in this group was 0.92996 with standard error 0.00115. The desired threshold value of the visibility index was set equal to 0.93.
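The per-participant homogeneity check and the Kruskal-Wallis test can be reproduced along the following lines; the dictionary of visibility index values is a hypothetical stand-in for the 270-entry dataset, which is not included here.

```python
import numpy as np
from scipy.stats import kruskal

def analyze_visibility_indexes(v_by_participant):
    # v_by_participant: dict mapping a participant id (e.g. "p1") to the array
    # of visibility index values computed for that participant.
    # Coefficient of variation per participant, as summarized in Table II.
    cv = {p: 100.0 * np.std(v, ddof=1) / np.mean(v)
          for p, v in v_by_participant.items()}
    # Kruskal-Wallis test with the participants as the factor of variation.
    h_stat, p_value = kruskal(*v_by_participant.values())
    return cv, h_stat, p_value
```

Pairwise Mann-Whitney comparisons (scipy.stats.mannwhitneyu) would then confirm the grouping of the participants reported above.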
VII. SAFETY DISTANCE

A. Shoulder-Surfing With a Naked Eye

We assume that we are given a hybrid keypad I and an observer at a position N = (r_0, θ_0, φ_0). If d_s is a safety distance for I, then ∀θ_0, φ_0, if r_0 ≥ d_s, the user's keypad cannot be interpreted by the observer. Of course, we are interested in the minimum possible value of d_s, because it corresponds to the maximum protection that I can offer against shoulder-surfing. In addition, as we have seen in previous sections, even in the case that we are given a desired d_s and we are asked to design a hybrid keypad I that satisfies it, we aim to maximize the usability of I by making d_s the minimum safety distance of I. So, we are only interested in the minimum possible value of the safety distance.

To estimate the minimum d_s for a hybrid keypad I, we first examine from which viewing positions in the 3D space the user's keypad of I is visible. We call the resulting region of the 3D space the "visibility region". To estimate the visibility region of a hybrid keypad, we applied the visibility algorithm for viewing positions from a dense grid in the 3D space. For a hybrid keypad I created with σ_y^lf = 35 c/im and
σ_y^hf = 320 c/im for the iPhone 6 smartphone, the visibility region is depicted in Figure 9. The depicted zx plane can be used as a reference to understand the geometry of the visibility region. The user's keypad is visible when I is viewed from positions which are either on the depicted surface or enclosed by it. The coordinate system is the same as the one used in Figure 1, with the image placed at position (0, 0, 0). In addition, for all viewing positions N = (x_0, y_0, z_0), we assume that z_0 > 5 inches. The visibility region is symmetric with respect to the zx and yz planes. This is something that we expected because of the symmetry in equations 4 and 5. In general, the form of the visibility region is similar to that of an ellipsoid. If we consider spherical coordinates, the safety distance will be slightly bigger than the biggest viewing distance r_0 of a position N = (r_0, θ_0, φ_0) that belongs to the visibility region. Since the form of the visibility region is ellipsoidal, for the position N with the maximum r_0 it will hold that θ_0 = π/2 and φ_0 = 0. This is a result that we expected, since for a given r_0, from equations 6 and 7, we can easily derive that the visual angle is maximized when θ_0 = π/2 and φ_0 = 0.

In our threat model, we have assumed that for the viewing position N = (r_0, θ_0, φ_0) of the shoulder-surfer, |φ_0| ≥ π/6 rad holds. In the visibility region, as |φ_0| increases, the maximum r_0 decreases. As a consequence, we define the safety distance to be the minimum distance d_s for which it holds that, from the viewing position N = (d_s, π/2, π/6), an observer with strong vision is unable to recognize the digits on the user's keypad. Based on this definition, if we are given a hybrid keypad I, we can apply the visibility algorithm for viewing positions N = (r_0, π/2, π/6) with varying r_0, and set the safety distance equal to the minimum r_0 for which the corresponding visibility index v is greater than v_th. We are also interested in the case where we are given a safety distance d_s as a security requirement and we have to set the value of σ_y^hf. To do this, we vary the value of σ_y^hf that we use to create I, and we apply the visibility algorithm for the position N = (d_s, π/2, π/6). We create I with the minimum σ_y^hf for which the visibility index v is greater than v_th. This way we were able to create the 4 hybrid keypads that we provide as predefined options to the users of IllusionPIN.
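Both estimation directions reduce to a one-dimensional sweep along the worst-case viewing direction of the threat model. A minimal sketch, assuming a hypothetical callable visibility_index_at that wraps the visibility algorithm of Section VI and a coarse search grid:

```python
import math

def estimate_safety_distance(hybrid_keypad, visibility_index_at,
                             v_th=0.93, r_values_inches=range(5, 121)):
    # Sweep r0 along N = (r0, pi/2, pi/6) and return the smallest r0 at which
    # the visibility index reaches the threshold, i.e. the estimated safety
    # distance of the given hybrid keypad.
    for r0 in r_values_inches:
        v = visibility_index_at(hybrid_keypad, (r0, math.pi / 2, math.pi / 6))
        if v >= v_th:
            return r0
    return None  # keypad still readable at the largest distance tried
```

The converse direction (given a required d_s, finding the minimum σ_y^hf) is an analogous sweep over σ_y^hf at the fixed position (d_s, π/2, π/6).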
B. Shoulder-Surfing Through a Surveillance Camera

We assume that we are given a hybrid keypad I, a smartphone device on which I is displayed, and a surveillance camera at position N = (r_0, θ_0, φ_0). To estimate the safety distance, we calculate the minimum r_0 for which it holds that ∀θ_0, φ_0, the camera is unable to capture the user's keypad I_u^h. We assume that the camera is unable to capture I_u^h when a cycle from the grating of I_u^h with the biggest cycle size occupies at most one pixel when it is projected on the image plane of the camera.

We start by estimating the smallest spatial frequency components which are present in I_u^h, since they will correspond to the cycles with the biggest size. We remind the reader that I_u^h is the result of applying the high-pass filter G_h to I_u. We assume that when G_h assigns a gain value less than 0.5 to a spatial frequency, the corresponding grating is cut off. The isocontour of G_h that corresponds to the value 0.5 is an ellipse that we call the cut-off ellipse. We consider that all the spatial frequencies enclosed by the cut-off ellipse are cut off. Since G_h = 1 − G_hl, where G_hl(f_x, f_y) = exp(−f_x²/(2(σ_x^hf)²) − f_y²/(2(σ_y^hf)²)), for the axes of the cut-off ellipse it will hold:

  a = sqrt(2 (σ_x^hf)² log(1/0.5))   (10)
  b = sqrt(2 (σ_y^hf)² log(1/0.5))   (11)

To describe the region of the spatial frequencies which are cut off by the filter in a simpler way, we consider the rectangle with the biggest area that is inscribed in the cut-off ellipse. We assume that the spatial frequencies which are cut off by the filter are those enclosed by this rectangle instead of those enclosed by the cut-off ellipse. We call this rectangle the cut-off rectangle. For the biggest horizontal and vertical spatial frequency components in the cut-off rectangle it will hold:

  f_x^s = a · √2/2   (12)
  f_y^s = b · √2/2   (13)

Based on these remarks, we conclude that for a grating from I_u^h with spatial frequency f = (f_x, f_y), it will hold that f_x ≥ f_x^s and f_y ≥ f_y^s. For an example filter G_h with (σ_x^hf, σ_y^hf) = (120.9375 c/im, 215 c/im), we calculate f_x^s = 100.69 c/im and f_y^s = 178.99 c/im.

After measuring the smallest frequency components present in I_u^h, we should calculate the length of the corresponding cycles. To this end, we need to know the resolution x_r × y_r of the screen that is used to display I and the number of pixels per inch (ppi). Based on that, we can calculate the biggest cycle size in each dimension:

  l_x = (x_r/ppi) / f_x^s   (14)
  l_y = (y_r/ppi) / f_y^s   (15)

where x_r/ppi and y_r/ppi express the length of the display in the horizontal and vertical dimensions respectively, measured in inches. If we assume that I is displayed on a Nexus 6 device, the resolution is 1440 × 2560 pixels and ppi = 493 p/in. As a consequence, we find that l_x = 0.029 inches and l_y = 0.029 inches. As we can see, both lengths are equal. This is something that we expected, because we have designed G_h to be isotropic in the spatial domain.

To estimate the safety distance d_s, we assume that we have a square stimulus c of size (l_x, l_y), which is viewed by a camera in the same way that an image is viewed by a human observer in Figure 1. This means that the camera is modeled as an ideal pinhole camera. In this setting, the camera is at position N = (r_0, θ_0, φ_0). As we increase r_0, c subtends an increasingly smaller angle. The camera will not be able to capture c for the first time when c subtends a visual angle that corresponds exactly to one pixel. As a result, the safety distance will be the biggest distance for which it holds that c is projected to exactly one pixel. From equations 6 and 7, we can easily see that a specific visual angle corresponds to the biggest r_0 when θ_0 = π/2 and φ_0 = 0. As a result, in our setting, we assume that the camera is at position N = (d_s, π/2, 0), while c subtends
a visual angle that corresponds to exactly one pixel on the image plane. Since c is a square stimulus, we simply use perspective projection in the horizontal dimension to find that for d_s it holds:

  d_s = f · l_x / s_p   (16)

where f is the focal length of the camera, and s_p is the pixel size. If we assume that the surveillance camera has the specifications of a Nexus 6 camera, then for the pixel size it holds that s_p = 0.001127 mm, and for the focal length, f = 3.8 mm. As a result, we get d_s = 97.81 inches.
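The full chain from the high-pass filter parameters to the camera safety distance can be checked numerically; the sketch below reproduces the values quoted above and is only a worked example, with the Nexus 6 camera specifications assumed in the text.

```python
import math

# Cut-off ellipse and cut-off rectangle, equations (10)-(13).
sigma_x_hf, sigma_y_hf = 120.9375, 215.0                # c/im
a = math.sqrt(2 * sigma_x_hf ** 2 * math.log(1 / 0.5))
b = math.sqrt(2 * sigma_y_hf ** 2 * math.log(1 / 0.5))
fxs, fys = a * math.sqrt(2) / 2, b * math.sqrt(2) / 2   # -> 100.69, 178.99 c/im

# Biggest cycle size on a Nexus 6 display, equations (14)-(15).
xr, yr, ppi = 1440, 2560, 493
lx, ly = (xr / ppi) / fxs, (yr / ppi) / fys             # -> 0.029 inches each

# Camera safety distance, equation (16): focal length and pixel size in mm,
# cycle length in inches; the millimetres cancel, leaving inches.
f_mm, sp_mm = 3.8, 0.001127
ds_inches = f_mm * lx / sp_mm                           # -> ~97.8 inches
print(round(fxs, 2), round(fys, 2), round(lx, 3), round(ds_inches, 2))
```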
In our threat model, we assumed that for this attack scenario the surveillance camera is recording from a distance of at least 100 inches. According to the previous calculations, for a hybrid keypad created with (σ_x^hf, σ_y^hf) = (120.9375 c/im, 215 c/im) and displayed on a Nexus 6 smartphone, our security requirement is satisfied. It is important to note that this specific keypad is the second most usable hybrid keypad out of the 4 predefined options that we offer to Nexus 6 users. In addition, we have to consider that all the assumptions we made during our calculations were in favor of the attacker. For example, we required that the camera be unable to capture all the gratings of I_u^h rather than just a subset of them, we disregarded the effect of I_s^l, and we overestimated the specifications of the surveillance camera. All of this shows that, under real-life conditions, it is extremely improbable to capture the user's keypad of IllusionPIN with a surveillance camera.
VIII. EVALUATION

We would like to estimate the probability of underestimating the safety distance with the visibility algorithm. In other words, we would like to know how probable it is for a shoulder-surfer to steal the credentials of an IllusionPIN user, even if the viewing distance of the attacker is equal to or bigger than the estimated safety distance. To this end, we performed simulated shoulder-surfing attacks against IllusionPIN.

Participants. We recruited 21 participants who were undergraduate and graduate students from our institution. All participants were less than 40 years old and had either normal or corrected vision. Note that all experiments in this work were approved by the IRB of our institution.

Materials. We built an application for the Android operating system to aid the execution of the simulated attacks. The application was run on a Nexus 6 phone and allowed the users to create five keypad categories, which we denote by c_i, for i = 1, 2, 3, 4, 5. The first 4 categories were hybrid keypads that corresponded to the 4 safety distance values d_s^i, i = 1, 2, 3, 4, that we considered in our threat model. All hybrid keypads were created with σ_y^lf = 35 c/im. Category c_1 was created with σ_y^hf = 145 c/im and corresponded to d_s^1 = 60 inches, c_2 had σ_y^hf = 215 c/im and d_s^2 = 45 inches, c_3 had σ_y^hf = 305 c/im and d_s^3 = 35 inches, and c_4 had σ_y^hf = 440 c/im and d_s^4 = 25 inches. Category c_5 was a normal keypad, as used in regular PIN authentication.

Procedure. Participants worked in pairs. Each pair performed 10 simulations in total. In the first 5 simulations one participant played the role of the attacker and the other the role of the user. Then, the participants were asked to switch roles and repeat the same 5 simulations. The first 5 simulations s_i, i = 1, 2, 3, 4, 5, were performed in the following way. The shoulder-surfer was placed at position N = (d_s^i, π/2, π/6), while the phone was at position N = (0, π/2, 0). The user freely created a 4-digit PIN with a keypad from category c_i and was asked to successfully authenticate 3 times. The attacker was able to observe the user during all authentication attempts and was then asked to replicate the PIN. During s_5, we set d_s^5 = d_s^1, which was the biggest distance that we considered.

Before the simulations, we asked the participants to practice with all the keypad categories by authenticating 10 times with each category. During the simulations, we tried to create favorable conditions for the attackers. In particular, the simulations were performed in a quiet indoor place with adequate illumination, while the brightness of the phone screen was set to its maximum level. In addition, the users were asked to keep the screen of the phone in the view of the attackers during the authentication. It is also important to clarify that we purposely selected participants of young age with normal or corrected vision, because these participants could impersonate skillful shoulder-surfers.

Collected data. We simulated 84 shoulder-surfing attacks against IllusionPIN, and none of them was successful. The best performance was demonstrated by 2 participants, each of whom managed to correctly replicate 3 out of 4 PIN digits in one attack. All 21 attacks against the regular PIN authentication were successful.

Data analysis. The most important finding is that none of the attackers was able to break IllusionPIN. We should also note that the category of a hybrid keypad was not a factor that affected the success rate of the attackers. Based on that, we consider that we have 21 independent samples and we calculate the Clopper-Pearson interval for the success rate of shoulder-surfing attacks. With probability 95%, we expect the interval [0, 0.1329] to contain the success rate. If we increase the sample size, the range of the interval will be reduced. However, based on the unfavorable conditions for our scheme, the visual capabilities of the participants, and the performance of the attackers, we expect that a very large number of attacks would have to be recorded before a successful one is observed. As a result, we are already confident that the success rate is much closer to 0 than to 0.1329.
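The bound of 0.1329 is consistent with the exact (Clopper-Pearson) upper confidence limit for zero successes in 21 trials when the limit is computed one-sided at the 95% level, in which case it has the closed form 1 − α^(1/n). The short Python sketch below reproduces the value under that assumption; the helper name is illustrative and not part of our implementation.

# Exact binomial (Clopper-Pearson) upper limit for the success rate,
# assuming 0 successes in n = 21 attacks and a one-sided 95% bound.
# For zero successes the bound reduces to 1 - alpha**(1/n).

def clopper_pearson_upper_zero_successes(n: int, alpha: float = 0.05) -> float:
    """Exact upper confidence limit for a binomial proportion when x = 0."""
    return 1.0 - alpha ** (1.0 / n)

if __name__ == "__main__":
    upper = clopper_pearson_upper_zero_successes(n=21, alpha=0.05)
    print(f"95% upper bound on the success rate: {upper:.4f}")  # ~0.1329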
IX. LIMITATIONS AND CONCLUDING REMARKS

The main goal of our work was to design a PIN-based authentication scheme that is resistant against shoulder-surfing attacks. To this end, we created IllusionPIN. We quantified the level of resistance against shoulder-surfing by introducing the notion of safety distance, which we estimated with a visibility algorithm. In the context of the visibility algorithm, we had to model at a basic level how the human visual system works. In this process, we made a number of simplifying assumptions that limit the accuracy of our calculations. The most obvious example is the pinhole camera model that we used to describe the image formation process in the eye. This is a widely used model, but it disregards important parts of the human eye, like the lens. In Section VI-A1, we made an additional simplification by assuming that the perception of an image depends only on the visual angle that it subtends, no matter where the observer stands. A problem with this assumption is that, depending on the viewing position, we may perceive the dimensions of an image as having a different ratio. For example, as we can see in equations 6 and 7, when φ_0 increases, only θ_x decreases, and the image is perceived as being squeezed in the horizontal dimension. As a consequence, the image of a circle could be perceived as the image of an ellipse when it is viewed from a large angle. It is interesting to note that even though we did not explicitly model such phenomena, the factor A(φ_0, θ_0) that we estimated in Section VI-B2 may have accounted for them implicitly to some extent. It is very important that, despite all these simplifying assumptions, our results led us to conclusions that agreed with our expectations. For example, the visibility region depicted in Figure 9 has the expected ellipsoidal form, while the visibility index demonstrated the behavior we described in Section VI-A3. So, stricter assumptions, followed by more detailed models, could improve the accuracy of our current results, but we expect the general conclusions to remain the same.

Fig. 10. A button from a different kind of hybrid keypad. The shoulder-surfer's keypad is composed of uniformly white buttons.

The visibility algorithm forms the core of our work, and we would like to examine whether it can be used to assess the visibility of images other than hybrid keypads. The visibility algorithm uses the MSSIM index, which quantifies the distortion between two images. If we are given a random image I_r and a viewing position N, we could apply the same rationale by considering I_r^DAF as the distorted version of I_r. However, we do not expect the visibility index threshold value that we specified in Section VI-B3 to be applicable to random images. The reason is that the level of distortion does not uniquely correspond to how visible particular visual details are in the distorted image. To verify this, we extended the data collection process that we presented in Section VI-B1 to a different kind of hybrid keypad that we call a white keypad. An example button from such a keypad is depicted in Figure 10. As we can see, the shoulder-surfer's keypad of a white keypad is composed of all white buttons. In the small dataset that we collected, the visibility index values were consistent for a particular user, but they were considerably lower than the corresponding values that the same user reported in the original dataset. This means that even if a person perceives the digits on a hybrid keypad to be as visible as the digits on a white keypad, the distortion in the white keypad is bigger and the visibility index has a lower value. This is to be expected, because when the reference buttons are all white, a digit that is even slightly visible constitutes a big distortion. Based on that, we conclude that the visibility index threshold value is not universal. We would also like to remind the reader that the values of the DAF filter parameters depend on the task at hand. As a result, we may have to repeat the estimation process for a considerably different task. We conclude that the visibility algorithm could be used to assess the visibility of general images, but its parameters have to be appropriately tuned for the particular task at hand.
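As a loose illustration of this rationale, and not of the actual DAF filter, the Python sketch below substitutes a simple Gaussian low-pass filter for the distance-dependent DAF filtering of an arbitrary grayscale image I_r and reports the MSSIM between I_r and its filtered version. The function name visibility_index, the Gaussian stand-in, and the random test image are assumptions made only for this sketch; the threshold against which such a value would be compared is exactly the task-dependent quantity discussed above.

# Rough illustration of the visibility-index rationale for a generic image.
# The true DAF filter depends on the viewing position N; a Gaussian low-pass
# (sigma growing with viewing distance) is used here only as a stand-in.

import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import structural_similarity

def visibility_index(image: np.ndarray, sigma: float) -> float:
    """MSSIM between an image and a low-pass 'perceived' version of it."""
    perceived = gaussian_filter(image, sigma=sigma)
    return structural_similarity(image, perceived,
                                 data_range=image.max() - image.min())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    i_r = rng.random((256, 256))      # placeholder for an arbitrary image I_r
    for sigma in (0.5, 2.0, 8.0):     # larger sigma ~ larger viewing distance
        print(f"sigma={sigma:4.1f}  MSSIM={visibility_index(i_r, sigma):.3f}")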
ACKNOWLEDGMENT

The authors would like to thank all the participants of the conducted user studies for their cooperation and contribution. They would also like to thank Assoc. Prof. George Papadopoulos (Agricultural Univ. of Athens, Department of Crop Science, Laboratory of Biometry) for his contribution to the design and implementation of the statistical analysis presented in Sections VI-B3 and VIII.
Athanasios Papadopoulos is a Ph.D. student at the Department of Computer Science and Engineering at NYU Tandon School of Engineering. He completed his undergraduate studies at the Department of Electrical and Computer Engineering at the National Technical University of Athens. His research interests lie in Image Processing, Computer Vision and Machine Learning.

Toan Nguyen is a Ph.D. Candidate at the Department of Computer Science and Engineering at NYU School of Engineering. Before joining NYU, he was a Lecturer at the School of Information and Communication Technology (SoICT), Hanoi University of Science and Technology (HUST). His research interests include cybersecurity, usable security, computer networking, machine learning, human-computer interaction, and biometrics. He earned a Bachelor of Engineering in Information Technology and a Master of Science in Computer and Communication Engineering from HUST in 2009 and 2011, respectively. He is a Fellow of the Vietnam Education Foundation Fellowship Program.

Emre Durmus is a Ph.D. student at the Department of Computer Science and Engineering at NYU Tandon School of Engineering. He received a Bachelor of Engineering degree in Industrial Engineering from Koc University and a Master of Arts degree in Computer Science from Brooklyn College. His research interests include digital forensics, biometrics, authentication, and machine learning.

Nasir Memon is a Professor in the Department of Computer Science and Engineering at NYU Tandon School of Engineering. His research interests include digital forensics, biometrics, data compression, network security and usable security. Dr. Memon received a Master of Science in Computer Science and a PhD in Computer Science from the University of Nebraska. He has published more than 250 articles in journals and conference proceedings and holds a dozen patents in image compression and security. He has won several awards including the Jacobs Excellence in Education award and several best paper awards. He has been on the editorial boards of several journals and was the Editor-in-Chief of the IEEE Transactions on Information Forensics and Security. He is an IEEE Fellow and an SPIE Fellow.