
2018 10th International Conference on Intelligent Human-Machine Systems and Cybernetics

Research on Abnormal Behavior Detection of Online Examination Based on Image Information
Senbo Hu1, Xiao Jia2, Yingliang Fu3
Dalian Maritime University
College of Information Science and Technology
Dalian, Liaoning
Email: [email protected], [email protected], [email protected]

Abstract—Over the past few years, online exams have become popular because of their flexibility, usability, and user-friendliness. For online examinations, monitoring the abnormal behavior of examinees during the examination is one of the major challenges. Traditional monitoring schemes mainly focus on verifying the identity of test-takers and lack effective identification of their abnormal behaviors. To monitor abnormal behavior in online examinations, this paper proposes to obtain information about the examinee's head posture and mouth state through the webcam and to discriminate abnormal behavior during the online examination. The system has been tested in an online test scenario, making it easy to monitor the test. Experiments show that the proposed method performs better than existing systems.

Keywords-online examination, behavior monitoring, head pose estimation, mouth detection

I. INTRODUCTION

With the development of computer technology and the Internet, online testing has become a trend. At present, online exams have been successfully applied in areas such as TOEFL, IELTS, online learning, and corporate recruitment. At the same time, due to the uncertainty in time and space during online examinations, how to effectively monitor the behavior of examinees has become an increasingly important research topic.

At present, online examination monitoring mainly focuses on the identification of test-takers, for example through fingerprint authentication, face recognition, and voice recognition. However, these methods ignore abnormal behavior during the examination itself. Normal examinees face the display screen to answer the questions; abnormal examination behavior usually manifests as abnormal changes in head posture and continuous opening and closing of the mouth.

For such abnormal behaviors, traditional methods usually use cameras to collect images in real time, which are then reviewed manually by a supervisor. These methods cannot monitor all examinees on a large scale with limited manpower, resulting in unfairness in the proctoring process. In summary, automatic monitoring of abnormal examinee behavior based on image information has become a new solution.

II. RELATED WORK

Eye gaze estimation is an ideal method for detecting cheating in online exams, but it usually requires specific hardware, such as infrared high-resolution cameras or infrared light sources, and in most cases it takes a long time to calibrate. Therefore, detecting abnormal behavior based on head pose estimation is more feasible. In recent years, some detection methods based on head pose have emerged, as shown in Table I.

TABLE I. Comparison of online examination monitoring methods

                       ProctorU   Nowcoder   S. Prathish [3]   Ours
  Manual assistance    Yes        Yes        Yes               No
  Other devices        Yes        No         Yes               No
  Uses the camera      Yes        Yes        Yes               Yes
  Uses the network     Yes        Yes        Yes               Yes
  Speech detection     No         No         Yes               Yes

Nowcoder [1] is one of the widely used commercial online examination supervision tools. Its main detection mechanism is to capture camera images at random moments during the examination, which are then passed to back-office staff to judge whether there is abnormal behavior; it cannot complete the entire monitoring process automatically. ProctorU [2] is another commercial online proctoring tool currently in use. It performs real-time monitoring, but it requires a large number of back-office invigilators, which consumes too much manpower. S. Prathish et al. [3] combined a model-based head pose estimation method with audio-based detection to detect abnormal examination behavior. However, the accuracy of its head pose estimation is not high enough, and using a microphone to collect sound can infringe on examinees' privacy. To sum up, in order to monitor abnormal behavior in real time while protecting examinees' privacy, we propose an abnormal behavior detection method for examinees based on image information.

III. SOLUTION

A. System Framework

This paper builds a system that uses a webcam to monitor examinees' head images and then inputs both head posture information and mouth state information into a rule-based reasoning system. The system can detect the examinee's behavior and determine whether it is abnormal.
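As a concrete illustration of the rule-based reasoning described above, here is a minimal sketch in Python. This is not the authors' implementation: the 15° yaw threshold and the 20-frame window come from the rules stated in Sec. III-E, while the `Frame` structure, the mouth flip count of 10, and all helper names are our own assumptions.

```python
from dataclasses import dataclass

YAW_THRESHOLD_DEG = 15.0   # yaw beyond this is suspicious (Sec. III-E)
WINDOW_FRAMES = 20         # sustained rotation length that triggers an alarm
MOUTH_FLIP_LIMIT = 10      # assumed count of open/close flips marking "speaking"

@dataclass
class Frame:
    yaw_deg: float     # head yaw angle estimated by the pose network
    mouth_open: bool   # mouth state from the open/closed test

def is_abnormal(frames):
    """Apply the two decision rules to a sequence of per-frame observations."""
    # Rule 1: yaw angle exceeds the threshold for WINDOW_FRAMES successive frames.
    run = 0
    for f in frames:
        run = run + 1 if abs(f.yaw_deg) > YAW_THRESHOLD_DEG else 0
        if run >= WINDOW_FRAMES:
            return True
    # Rule 2: the mouth opens and closes frequently (speaking behavior).
    flips = sum(1 for a, b in zip(frames, frames[1:]) if a.mouth_open != b.mouth_open)
    return flips >= MOUTH_FLIP_LIMIT

# Example: 25 frames of a sustained 30-degree head turn trigger rule 1.
turned = [Frame(30.0, False) for _ in range(25)]
print(is_abnormal(turned))  # True
```

A real deployment would feed this function a sliding window of per-frame estimates rather than a whole recording at once.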

978-1-5386-5836-9/18/$31.00 ©2018 IEEE    DOI 10.1109/IHMSC.2018.10127
Figure 1. Flow chart of the abnormal behavior monitoring system

The system framework is shown in Fig. 1. It consists of a video input module, a monitoring module, and an abnormal behavior record module. After the camera captures the images, face detection and mouth detection are performed. Each detected face is input to the head pose estimation system based on a convolutional neural network to obtain the head rotation angle. The detected mouth is judged as open or closed, and the recorded information of both is then used for the abnormality judgement.

B. Face Detection

In this paper, face detection is performed with the AdaBoost+Haar algorithm. AdaBoost is an iterative algorithm: by changing the distribution probability of each sample in the training set, different training sets Si are obtained. Each Si is trained to obtain a weak classifier, and the weak classifiers are combined with different weights into a strong classifier, yielding the final face detection classifier.

C. Head Pose Estimation

The head pose refers to the angle of the face with respect to the camera. In general, the origin of the coordinate system is placed at the tip of the nose. The x-axis is defined as the horizontal direction, the y-axis as the vertical direction, and the z-axis is perpendicular to the x and y axes. The different rotations of the frontal face around the x, y, and z axes in this coordinate system are shown in Fig. 2.

Figure 2. Human head rotation gestures in the coordinate system

Existing head pose estimation methods can be divided into two categories: model-based methods [4] and performance-based methods [5]. Model-based head pose estimation is a two-step process: the first step detects a certain number of feature points in the face, and the second step recovers the head pose according to a preset 3D head model. Therefore, if no feature points are detected, pose estimation cannot be completed. At the same time, the accuracy of pose estimation depends on the quality of the 3D head model. The performance-based approach does not require precise facial feature points; it directly treats the head pose as a hidden parameter of the face image, assuming there is a correspondence between face images and head poses that can be learned through training. Compared with the model-based method, the performance-based method involves neither the positioning of facial feature points nor a 3D head model, which can greatly improve the accuracy and robustness of pose estimation.

This paper adopts a performance-based method for head pose estimation and constructs a convolutional neural network to learn the relationship between the head image and the corresponding pose. The network structure is shown in Fig. 3.

Figure 3. Convolutional neural network diagram for head pose estimation

The convolutional neural network [6] shown in Fig. 3 consists of three convolutional layers, three pooling layers, and three fully connected layers; the convolution kernel size of each layer is given in Fig. 3. In order to improve the accuracy of the head pose estimation, this paper uses the AFLW dataset [7] to train the network. The dataset consists of 21,997 images with various appearances, lighting, and environmental conditions. It also provides poses obtained by the POSIT algorithm, which requires 21 manually annotated feature points. In the training phase, this paper compares different optimizers with the network structure fixed. The comparison results are shown in Table II.

TABLE II. Test results of different optimizers (MAE)

  Optimizer       Roll    Pitch    Yaw
  Adadelta [8]    6.26°   9.98°    21.87°
  Adagrad [9]     5.27°   8.24°    17.83°
  Adam [10]       4.81°   7.78°    13.5°
  RMSProp         4.4°    7.15°    11.04°
  SGD             7.0°    9.89°    22.98°
  SGD momentum    6.17°   10.3°    21.31°

TABLE III. Comparison of head pose estimation accuracy (MAE)

                    Yaw       Pitch     Roll      MAE
  Ours              3.53°     4.76°     3.34°     3.88°
  FAN [11]          6.358°    12.277°   8.714°    9.116°
  Dlib [12]         23.153°   13.633°   10.545°   15.777°
  Truth landmark    5.924°    11.756°   8.271°    8.651°

With the RMSProp optimizer selected, the accuracy of roll, pitch, and yaw is improved to 96.66%, 95.24%, and 96.47% respectively, which is much higher than head pose estimation based on feature points. The results are shown in Table III.
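The mean absolute error figures reported in Tables II and III can be reproduced with a simple per-angle computation. The sketch below is our own helper, not code from the paper; it averages the absolute error of each of the three rotation angles over a batch of predictions.

```python
def mae_per_angle(pred, truth):
    """Mean absolute error for each pose angle (roll, pitch, yaw), in degrees.

    pred, truth: lists of (roll, pitch, yaw) tuples, one per test image.
    Returns a (roll_mae, pitch_mae, yaw_mae) tuple.
    """
    n = len(pred)
    return tuple(
        sum(abs(p[i] - t[i]) for p, t in zip(pred, truth)) / n
        for i in range(3)
    )

# Example: two test images with made-up predicted and ground-truth poses.
pred  = [(5.0, 10.0, 20.0), (1.0, -2.0, 4.0)]
truth = [(4.0,  8.0, 15.0), (0.0,  0.0, 0.0)]
print(mae_per_angle(pred, truth))  # (1.0, 2.0, 4.5)
```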
D. Mouth State Detection

During the examination, face detection is performed on the input image. In the detected face area, dlib is used to locate the facial feature points, and the feature points related to the mouth are selected, as shown in Fig. 4.

Figure 4. Representation of the mouth feature points

The mouth region is located according to the formulas

  M  = min(p51.y, p53.y)
  X0 = p49.x - (p61.x - p49.x)
  Y0 = M - (p62.y - p51.y)
  W  = p55.x - p49.x + 2(p61.x - p49.x)
  H  = p58.y - M + 2(p62.y - p51.y)

where pN.x and pN.y are the coordinates of the N-th facial landmark, (X0, Y0) is the upper-left corner of the mouth region, and W and H are its width and height.

Once the mouth area is located, its image is converted to grayscale. Based on the grayscale distribution characteristics of the mouth area, this paper sets the binarization threshold to 0.4 and obtains the binary image of the mouth. Connected-component labeling is applied to the binary image, the sizes of the regions are compared, and the region with the largest area is selected as the mouth connected region; its area is measured as the number of white pixels it contains. The Sobel edge detection algorithm is used to extract the edges of the connected component, and the perimeter of its minimum bounding ellipse is calculated. By calculating the circularity e of the mouth's binary region, and through statistics on the experimental data, a threshold is obtained to judge whether the mouth is open or closed. The higher the circularity e, the greater the degree of mouth opening.

E. Inference System

The above head pose information and mouth state information are input into the decision system at the same time. In the decision system, if the examinee's yaw angle is greater than 15° for 20 successive frames, or if the yaw angle changes frequently and widely, it is determined that there is abnormal behavior. Likewise, if the examinee's mouth opens and closes frequently, it is determined that there is speaking behavior, which is abnormal. If any behavior that meets the judgment of abnormal behavior occurs within any time range, it is regarded as possible cheating.

IV. EXPERIMENTAL RESULTS

To the best of our knowledge, there is currently no public dataset that can be used for abnormal examination behavior analysis. Therefore, for our experiments, a dataset containing 30 independent videos was created. The average duration of each video is 2 minutes, and each video simulates 6-10 cheating behaviors at random times.

A. Head Pose Estimation

During the examination, changes in the yaw angle of the examinee's head pose can clearly reflect abnormal behavior. The range of the yaw angle is [-90°, 90°]. When the examinee is in a stable frontal state, the yaw angle is almost 0°; this is the normal answering state. After a large number of experiments, and following S. Prathish's work [3], this paper determines that a yaw rotation amplitude greater than 15° indicates abnormal behavior. However, due to video jitter and human physiological factors, some momentary angle changes must be ignored; they cannot be judged as abnormal behavior.

Figure 5. Examinee's head pose estimation: (a) 19.519°, (b) -42.337°, (c) -0.129°

When the examinee faces the computer directly, the yaw angle is displayed as -0.129°, which is almost 0°. When the examinee's head rotates left or right, the yaw angle changes positively or negatively. The measurement results are shown in Fig. 5. Therefore, the head pose estimate clearly reflects the examinee's behavior. Through a large number of experiments, the examinee's yaw angle change threshold is set to 15°.

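The circularity measure used to classify the mouth as open or closed (defined in Sec. IV-B as e = 4πA/P², with the open/closed threshold at 0.5) can be sketched as follows; the helper names are our own.

```python
import math

def circularity(area, perimeter):
    """Circularity e = 4*pi*A / P^2 of the mouth's binary region, e in [0, 1]."""
    return 4.0 * math.pi * area / perimeter ** 2

def mouth_is_open(area, perimeter, threshold=0.5):
    """Classify the mouth state: e >= threshold means open (Sec. IV-B)."""
    return circularity(area, perimeter) >= threshold

# A perfect disc has e = 1; an elongated closed-mouth region has e near 0.
r = 10.0
disc_e = circularity(math.pi * r * r, 2 * math.pi * r)
print(round(disc_e, 6))  # 1.0
```

In practice, the area A would be the white-pixel count of the mouth's connected component and P the perimeter of its minimum bounding ellipse, as described in Sec. III-D.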
Therefore, when the examinee's yaw angle is greater than 15° for 20 successive frames, it is determined to be abnormal behavior.

Figure 6. Examinee yaw angle changes over the video sequence

It is clearly observed in Fig. 6 that from the 199th to the 397th frame, the 463rd to the 660th frame, the 991st to the 1090th frame, and the 1200th to the 1280th frame, the examinee's head pose rotates for a long time (more than 20 frames) and exceeds the 15° threshold. It is therefore determined that the examinee behaves abnormally during these periods.

B. Mouth State Detection

Figure 7. The mouth area, its binary image, and the external ellipse: (a) closed state, (b) binary image, (c) external ellipse; (d) open state, (e) binary image, (f) external ellipse

The circularity e of the open and closed mouth is calculated according to the formula e = 4πA/P², where e ∈ [0, 1] and A and P denote the area and perimeter respectively. The experimental results show that when e < 0.5 the mouth is in the closed state, and when e ≥ 0.5 the mouth is in the open state (such as when speaking). The examinee's mouth is closed and open, respectively, in Fig. 7(a) and Fig. 7(d). From frequent changes between the open and closed states of the mouth within a certain number of frames, we can judge the examinee's speaking behavior and thus detect abnormal behavior.

C. Comprehensive Experiments

For the same videos containing abnormal behavior, we conducted abnormal behavior monitoring experiments with professional invigilators, with S. Prathish's method [3], and with our method. Some records are as follows:

TABLE IV. Monitoring comparison of the monitoring programs

  Time       Invigilators   S. Prathish [3]   Our method
  0-20*
  21-40      N              N                 Y
  41-60*
  61-80      N              N                 N
  81-100     N              N                 Y
  101-120    N              N                 Y
  121-140*                                    Y

The * in Table IV indicates that there is no abnormal behavior during the period; N indicates that no abnormal behavior is detected, and Y indicates that abnormal behavior is successfully detected.

Through comprehensive experiments, the system achieves a false alarm rate of only 5% and a recognition rate of 90% for abnormal behavior monitoring. In summary, the method proposed in this paper can accurately monitor the occurrence of abnormal behavior during the online examination process.

V. CONCLUSION

This paper designs a solution for abnormal behavior monitoring of online examinations based on image information. Through head pose estimation based on a convolutional neural network, threshold-based mouth state judgment, and a set of decision rules, the monitoring of abnormal behaviors such as turning the head and speaking during the online examination is completed. In future work, we will continue to study the monitoring of subtle abnormal behavior during the examination, such as deviating from the screen, in order to achieve a higher recognition rate and a lower false alarm rate.

REFERENCES
[1] Nowcoder, https://www.nowcoder.com
[2] ProctorU: Real People, Real Proctoring, http://www.proctoru.com
[3] S. Prathish and K. Bijlani, "An intelligent system for online exam monitoring," Proc. Information Science (ICIS), International Conference on, IEEE, 2016, pp. 138-143.
[4] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," Proc. Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE, 2012, pp. 2879-2886.
[5] X. Geng and Y. Xia, "Head pose estimation based on multivariate label distribution," Proc. Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, IEEE, 2014, pp. 1837-1842.
[6] Y. LeCun, et al., "Handwritten digit recognition with a back-propagation network," Proc. Advances in Neural Information Processing Systems, 1990, pp. 396-404.
[7] M. Koestinger, et al., "Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization," Proc. Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, IEEE, 2011, pp. 2144-2151.
[8] M. D. Zeiler, "ADADELTA: An adaptive learning rate method," Computer Science, 2012.
[9] J. Duchi, et al., "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, pp. 2121-2159, 2011.
[10] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[11] A. Bulat and G. Tzimiropoulos, "How far are we from solving the 2D & 3D face alignment problem? (and a dataset of 230,000 3D facial landmarks)," Proc. International Conference on Computer Vision, 2017.
[12] V. Kazemi and J. Sullivan, "One millisecond face alignment with an ensemble of regression trees," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1867-1874.
