Student Attendance System Using Face Recognition
Mechatronics and Management Program
Diploma Thesis
Academic year 2018-2019

Arlind Sylaj
December 2019
Prishtinë

This paper has been compiled and submitted to meet the partial requirements for the Bachelor Degree.
ABSTRACT
Since the traditional student attendance process is known to be time consuming, inaccurate and hard to follow, this thesis tries to provide a method for solving this problem.

Facial recognition systems have become widespread over the past decade and a main point of interest in many industries. Law enforcement agencies are using face recognition to keep communities safer. Retailers are preventing crime and violence. Airports are improving travelers' convenience and security. And mobile phone companies are using face recognition to provide consumers with new layers of biometric security.

In this thesis, another use of face recognition is proposed. Using Computer Vision algorithms, a face recognition system for managing student attendance is developed. By taking pictures with a laptop camera, a data set of 150 pictures is created, 50 for each of three subjects, and the system is trained on it. One technique is proposed for face detection, based on Haar features, and another for face recognition, namely Local Binary Patterns Histograms. Results from a validation data set are also provided.
ACKNOWLEDGEMENT
The completion of this paper was achieved thanks to the help of many individuals, whom I would like to thank with all my sincerity. My deepest gratitude goes to my professor Bertan Karahoda. His willingness to dedicate his time, his suggestions and his advice are profoundly appreciated. Without his guidance, critiques and recommendations, this paper would have been unattainable. I extend a great appreciation to the staff of the University for Business and Technology, for always providing me with assistance, and for their help and support through my years of studies. They have never failed to show generosity and companionship whenever it was needed. Their contribution will never be forgotten. Above all, I want to thank my family for making all this possible. Thank you for your never-ending love, support and motivation. Every success is dedicated to you. Also, a special thanks to my dearest friend, Valona. Thank you for being with me this entire time.
December, 2019
Prishtinë
CONTENTS
1 INTRODUCTION
2 LITERATURE REVIEW
3 PROBLEM STATEMENT
4 METHODOLOGY
5 RESULTS
6 DISCUSSIONS AND CONCLUSIONS
7 REFERENCES
1 INTRODUCTION
University departments are required to monitor students' attendance during their studies. It is a crucial part of the teaching process, both for the students and for the instructors. Traditionally, this has been done by filling in an attendance sheet at the beginning of the lecture. However, it is time consuming for the instructor to monitor every student during the entire semester, and students can simply write someone else's name without that student actually being there. To avoid such problems and to make attendance easier for students and instructors, engineers have been trying to implement technological approaches. For instance, most departments of the University of Prishtina have started to use RFID cards for this purpose. Although this is much more practical than attendance sheets, it does not come without disadvantages. Students can give the card to their colleagues without the professor noticing. Another disadvantage is that when entering the classroom, students have to form lines in front of the device, which again is time consuming. A face, on the other hand, cannot be handed over or easily duplicated. To solve this problem, this thesis proposes the use of Computer Vision techniques for face detection and face recognition.
2 LITERATURE REVIEW
Today, almost every area of technical endeavor is impacted in some way by digital image processing. An image may be defined as a two-dimensional function, f(x, y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image. The field of digital image processing refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location and value. These elements are referred to as picture elements, image elements, pels, and pixels. Pixel is the term most widely used to denote the elements of a digital image. (Gonzalez & Woods, 2002)
The size of the 2-D pixel grid together with the data size stored for each individual image
pixel determines the spatial resolution and colour quantization of the image. The
representational power (or size) of an image is defined by its resolution. The resolution of an
image source (e.g. a camera) can be specified in terms of three quantities:
- Spatial resolution
The column (C) by row (R) dimensions of the image define the number of pixels used to
cover the visual space captured by the image. This relates to the sampling of the image signal
and is sometimes referred to as the pixel or digital resolution of the image. It is commonly
quoted as C x R (e.g. 640 x 480, 800 x 600, 1024 x 768, etc.)
- Temporal resolution
For a continuous capture system such as video, this is the number of images captured in a
given time period. It is commonly quoted in frames per second (fps), where each individual
image is referred to as a video frame (e.g. commonly broadcast TV operates at 25 fps; 25–30
fps is suitable for most visual surveillance; higher frame-rate cameras are available for
specialist science/engineering capture).
- Bit resolution
This defines the number of possible intensity/colour values that a pixel may have and relates
to the quantization of the image information. For instance, a binary image has just two colours
(black or white), a grey-scale image commonly has 256 different grey levels ranging from
black to white whilst for a colour image it depends on the colour range in use. The bit
resolution is commonly quoted as the number of binary bits required for storage at a given quantization level, e.g. binary is 1 bit, grey-scale is 8 bits and colour (most commonly) is 24 bits. The range of values a pixel may take is often referred to as the dynamic range of an
image (Solomon & Breckon, 2011).
The choice of image format used can be largely determined not just by the image contents, but also by the actual image data type that is required for storage. In addition to the bit resolution
of a given image discussed earlier, a number of distinct image types also exist:
- Binary images are 2-D arrays that assign one numerical value from the set {0;1} to
each pixel in the image. These are sometimes referred to as logical images: black
corresponds to zero (an ‘off’ or ‘background’ pixel) and white corresponds to one (an
‘on’ or ‘foreground’ pixel). As no other values are permissible, these images can be
represented as a simple bit-stream, but in practice they are represented as 8-bit integer
images in the common image formats. A fax (or facsimile) image is an example of a
binary image.
- Intensity or grey-scale images are 2-D arrays that assign one numerical value to each
pixel which is representative of the intensity at this point. As discussed previously,
the pixel value range is bounded by the bit resolution of the image and such images
are stored as N-bit integer images with a given format.
- RGB or true-colour images are 3-D arrays that assign three numerical values to each
pixel, each value corresponding to the red, green and blue (RGB) image channel
component respectively. Conceptually, we may consider them as three distinct, 2-D
planes so that they are of dimension C by R by 3, where R is the number of image
rows and C the number of image columns.
- Floating-point images store a floating-point number which, within a given range defined by the floating-point precision of the image bit resolution, represents the intensity. They may (commonly) represent a measurement value other than simple intensity or colour as part of a scientific or medical image.
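These image types map directly onto array shapes and data types; a minimal NumPy sketch (the dimensions are arbitrary illustrations, not values from this thesis):

```python
import numpy as np

# Binary image: a 2-D array whose values are restricted to {0, 1}
binary = np.zeros((480, 640), dtype=np.uint8)
binary[100:200, 100:200] = 1            # an "on" (foreground) block

# Grey-scale image: one 8-bit intensity per pixel, 0 (black) to 255 (white)
gray = np.full((480, 640), 128, dtype=np.uint8)

# RGB image: R rows x C columns x 3 channels, one value per colour component
rgb = np.zeros((480, 640, 3), dtype=np.uint8)

# Floating-point image: e.g. a scientific measurement per pixel
depth = np.zeros((480, 640), dtype=np.float32)
```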
We can convert from an RGB colour space to a grey-scale image using a simple transform. Grey-scale conversion is the initial step in many image analysis algorithms, as it essentially simplifies (i.e. reduces) the amount of information in the image. Although a grey-scale image contains less information than a colour image, the majority of important, feature-related information is maintained, such as edges, regions, blobs, junctions and so on. Feature detection and processing algorithms then typically operate on the converted grey-scale version of the image (Solomon & Breckon, 2011).
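As a sketch of this transform (the weighting coefficients below are the standard ITU-R BT.601 luminance weights, which OpenCV's COLOR_BGR2GRAY conversion also uses; the file path is illustrative):

```python
import cv2
import numpy as np

img = cv2.imread("example.jpg")               # BGR colour image (assumed path)

# Built-in conversion
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Equivalent explicit transform: a weighted sum of the colour channels
b, g, r = img[..., 0], img[..., 1], img[..., 2]
gray_manual = (0.299 * r + 0.587 * g + 0.114 * b).astype(np.uint8)
```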
2.2 Machine Learning

For a task such as distinguishing spam e-mail from legitimate e-mail, we do not have a ready-made algorithm, but we can easily compile thousands of messages, some of which we know to be spam and some of which are not, and what we want is to "learn" what constitutes spam from this sample. In other words, we would like the computer (the machine) to extract automatically the algorithm for this task.
Machine learning is not just a database or programming problem; it is also a requirement for
artificial intelligence. A system that is in a changing environment should have the ability to
learn; otherwise, we would hardly call it intelligent. If the system can learn and adapt to such
changes, the system designer need not foresee and provide solutions for all possible
situations.
Artificial intelligence takes inspiration from the brain. There are cognitive scientists and
neuroscientists whose aim is to understand the functioning of the brain, and toward this aim,
they build models of neural networks and make simulation studies. But artificial intelligence
is a part of computer science and our aim is to build useful systems, as in any domain of
engineering. So, though the brain inspires us, ultimately, we do not care much about the
biological plausibility of the algorithms we develop. (Alpaydin, 2016)
2.2.1 Problems
The range of learning problems is clearly large, but researchers have identified an ever-growing number of templates which can be used to address a large set of situations. These templates are discussed below:
- Binary Classification is probably the most frequently studied problem in machine
learning and it has led to a large number of important algorithmic and theoretic
developments over the past century. In its simplest form it reduces to the question:
given a pattern x drawn from a domain X, estimate which value an associated binary
random variable y ∈ {±1} will assume. For instance, given pictures of apples and
oranges, we might want to state whether the object in question is an apple or an
orange. Equally well, we might want to predict whether a home owner might default
on his loan, given income data, his credit history, or whether a given e-mail is spam
or ham. The ability to solve this basic problem already allows us to address a large
variety of practical settings. (Smola & Vishwanathan, 2008)
- Multiclass Classification is the logical extension of binary classification. The main
difference is that now y ∈ {1,...,n} may assume a range of different values. For
instance, we might want to classify a document according to the language it was
written in (English, French, German, Spanish, Hindi, Japanese, Chinese, . . .). The
main difference to before is that the cost of error may heavily depend on the type of
error we make. For instance, in the problem of assessing the risk of cancer, it makes
a significant difference whether we misclassify an early stage of cancer as healthy (in
which case the patient is likely to die) or as an advanced stage of cancer (in which
case the patient is likely to be inconvenienced from overly aggressive treatment).
(Smola & Vishwanathan, 2008)
- Structured Estimation goes beyond simple multiclass estimation by assuming that the label y has some additional structure which can be used in the estimation process.
For instance, y might be a path in an ontology, when attempting to classify webpages,
y might be a permutation, when attempting to match objects, to perform collaborative
filtering, or to rank documents in a retrieval setting. Equally well, y might be an
annotation of a text, when performing named entity recognition. Each of those
problems has its own properties in terms of the set of y which we might consider
admissible, or how to search this space. (Smola & Vishwanathan, 2008)
- Regression is another prototypical application. Here the goal is to estimate a real
valued variable y ∈ R given a pattern x. For instance, we might want to estimate the
value of a stock the next day, the yield of a semiconductor fab given the current
process, the iron content of ore given mass spectroscopy measurements, or the heart
rate of an athlete, given accelerometer data. One of the key issues in which regression problems differ from each other is the choice of a loss. For instance, when estimating stock values our loss for a put option will be decidedly one-sided. On the other hand, a hobby athlete might only care that our estimate of the heart rate matches the actual one on average. (Smola & Vishwanathan, 2008)
- Novelty Detection is a rather ill-defined problem. It describes the issue of determining "unusual" observations given a set of past measurements. Clearly, the choice of what is to be considered unusual is very subjective. A commonly accepted notion is that unusual events occur rarely. Hence a possible goal is to design a system which assigns to each observation a novelty rating. (Smola & Vishwanathan, 2008)
Consequently, the field of machine learning has branched into several subfields dealing with different types of learning tasks. The main paradigms along which learning can be classified are the following (a brief sketch contrasting supervised and unsupervised learning follows the list):
- Supervised learning: the learning element is given the correct (or approximately correct) value of the function for particular inputs, and changes its representation of the function to try to match the information provided by the feedback. More formally, we say an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. The task of pure inductive inference (or induction) is this: given a collection of examples of f, return a function h that approximates f. The function h is called a hypothesis. (Russell & Norvig, 1995)
- Unsupervised learning refers to the problem of trying to find hidden structure in
unlabeled data. Since the examples given to the learner are unlabeled, there is no error
or reward signal to evaluate a potential solution. This distinguishes unsupervised
learning from supervised learning and reinforcement learning. Unsupervised learning
is closely related to the problem of density estimation in statistics. However,
unsupervised learning also encompasses many other techniques that seek to
summarize and explain key features of the data. Many methods employed in
unsupervised learning are based on data mining methods used to preprocess data.
Approaches to unsupervised learning include:
• clustering (e.g., k-means, mixture models, k-nearest neighbors, hierarchical
clustering),
• blind signal separation using feature extraction techniques for dimensionality
reduction (e.g., Principal component analysis, Independent component analysis, Non-
negative matrix factorization, Singular value decomposition). (Hinton & Sejnowski,
1999)
- Reinforcement learning subsumes biological and technical concepts for solving an
abstract class of problems that can be described as follows: An agent (e.g., an animal,
a robot, or just a computer program) living in an environment is supposed to find an
optimal behavioral strategy while perceiving only limited feedback from the
environment. The agent receives (not necessarily complete) information on the
current state of the environment, can take actions, which may change the state of the
environment, and receives reward or punishment signals, which reflect how
appropriate the agent’s behavior has been in the past. This reward signal may be
sparse, delayed, and noisy. The goal of RL is to find a policy that maximizes the long-
term reward. Compared to supervised learning, where training data provide
information about the correct behavior in particular situations, the RL problem is
more general and thus more difficult, since learning has to be based on considerably
less knowledge. (Sutton & Barto, 2015)
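As a minimal sketch contrasting the first two paradigms (this uses scikit-learn and toy data; it is an illustration only and not part of the thesis's system):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy 2-D data: two loose groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])

# Supervised: labels y are given, and a classifier learns the mapping x -> y
y = np.array([0] * 50 + [1] * 50)
clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.2, 0.1], [4.1, 3.8]]))   # predicted class labels

# Unsupervised: no labels; k-means discovers the group structure on its own
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:5], km.labels_[-5:])         # discovered cluster assignments
```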
2.3 Computer Vision

Computer Vision has a dual goal. From the biological science point of view, computer vision aims to come up with computational models of the human visual system. From the engineering point of view, computer vision aims to build autonomous systems which could perform some of the tasks which the human visual system can perform (and even surpass it in many cases). Many vision tasks are related to the extraction of 3D and temporal information from time-varying 2D data such as obtained by one or more television cameras, and more generally the understanding of such dynamic scenes. (Huang, 1996)
Of all the visual tasks we might ask a computer to perform, analyzing a scene and recognizing all of the constituent objects remains the most challenging. While computers excel at
accurately reconstructing the 3D shape of a scene from images taken from different views,
they cannot name all the objects and animals present in a picture, even at the level of a two-
year-old child. There is not even any consensus among researchers on when this level of
performance might be achieved. (Szeliski, 2010)
Cameras are everywhere and the number of images uploaded on internet is growing
exponentially. We have images on Instagram, videos on YouTube, feeds of security cameras,
medical and scientific images. Computer vision is essential because we need to sort through
these images and enable computers to understand their content. Computer vision can be used in a number of areas such as 3D urban modeling, scene recognition, face detection and recognition, optical character recognition, mobile visual search, self-driving cars, automatic checkout, vision-based interaction, augmented reality and virtual reality (Krishna, 2017).
Because the purpose of this thesis is to develop a student attendance system, only face detection and face recognition will be explained in more detail.
2.3.1 Face detection

The appearance of faces in images varies widely, which makes detection challenging. Among the factors identified by Yang, Kriegman and Ahuja (2002) are the pose of the face relative to the camera and the following:
- Presence or absence of structural components. Facial features such as beards,
mustaches, and glasses may or may not be present and there is a great deal of
variability among these components including shape, color, and size.
- Facial expression. The appearance of faces is directly affected by a person’s facial
expression.
- Occlusion. Faces may be partially occluded by other objects. In an image with a group
of people, some faces may partially occlude other faces.
- Image orientation. Face images directly vary for different rotations about the
camera’s optical axis.
- Imaging conditions. When the image is formed, factors such as lighting (spectra,
source distribution and intensity) and camera characteristics (sensor response, lenses)
affect the appearance of a face. (Yang, Kriegman, & Ahuja, 2002)
There are four techniques to detect faces from a single intensity or color image:
1. Knowledge-based methods. These rule-based methods encode human knowledge of
what constitutes a typical face. Usually, the rules capture the relationships between
facial features. These methods are designed mainly for face localization.
2. Feature invariant approaches. These algorithms aim to find structural features that
exist even when the pose, viewpoint, or lighting conditions vary, and then use these to locate faces. These methods are designed mainly for face localization.
3. Template matching methods. Several standard patterns of a face are stored to
describe the face as a whole or the facial features separately. The correlations between
an input image and the stored patterns are computed for detection. These methods
have been used for both face localization and detection.
4. Appearance-based methods. In contrast to template matching, the models (or
templates) are learned from a set of training images which should capture the
representative variability of facial appearance. These learned models are then used
for detection. These methods are designed mainly for face detection. (Yang,
Kriegman, & Ahuja, 2002)
2.3.1.1 Haar cascade
There have been many attempts to meet real-time constraints for object detection. Viola and Jones came up with a method of rectangular Haar-like features and the AdaBoost learning algorithm, combined with a cascade of strong classifiers. The proposed object detection application can be deployed on different platforms: on a high-performance platform as well as on a mobile platform. It can also be used in surveillance systems with distributed cameras and a back-end server in which the detection takes place, or in mobile devices equipped with a camera and processor. A very short detection response time is essential for such systems.
There are three main contributions of this face detection framework:
1. Haar-like feature

Haar feature-based cascade classifiers classify images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The simple features used are reminiscent of Haar basis functions, which have been used by Papageorgiou et al. (1998). More specifically, three kinds of features are used. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Fig 1 (a)). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle (see Fig 1 (b)). Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles (see Fig 1 (c)).
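As a sketch of how such a rectangle feature can be evaluated in constant time using an integral image (cv2.integral computes the summed-area table; the window coordinates are made up for illustration):

```python
import cv2

def rect_sum(ii, x, y, w, h):
    # Sum of pixels inside rectangle (x, y, w, h), from the integral image.
    # cv2.integral returns an (H+1) x (W+1) table, so these indices line up.
    return int(ii[y + h, x + w]) - int(ii[y, x + w]) \
         - int(ii[y + h, x]) + int(ii[y, x])

img = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # assumed input image
ii = cv2.integral(img)

# Two-rectangle feature: difference of the sums over two equally sized,
# vertically adjacent regions (coordinates are arbitrary for the example)
x, y, w, h = 40, 30, 24, 12
feature = rect_sum(ii, x, y, w, h) - rect_sum(ii, x, y + h, w, h)
print(feature)
```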
2. AdaBoost learning

AdaBoost is used both to select a small set of features and to train the classifier. The weak learner is so called because we do not expect even the best classification function to classify the training data well (i.e. for a given problem the best perceptron may only classify the training data correctly 51% of the time). In order for the weak learner to be boosted, it is called upon to solve a sequence of learning problems. After the first round of learning, the examples are re-weighted in order to emphasize those which were incorrectly classified by the previous weak classifier. The final strong classifier takes the form of a perceptron, a weighted combination of weak classifiers followed by a threshold.
The key advantage of AdaBoost as a feature selection mechanism, over competitors such as the wrapper method, is the speed of learning. Using AdaBoost, a 200-feature classifier can be learned in O(MNK) time, or about 10^11 operations. One key advantage is that in each round the entire dependence on previously selected features is efficiently and compactly encoded using the example weights. These weights can then be used to evaluate a given weak classifier in constant time.
The weak classifier selection algorithm proceeds as follows. For each feature, the
examples are sorted based on feature value. The AdaBoost optimal threshold for that
feature can then be computed in a single pass over this sorted list. For each element
in the sorted list, four sums are maintained and evaluated: the total sum of positive example weights T+, the total sum of negative example weights T−, the sum of positive weights below the current example S+, and the sum of negative weights below the current example S−. The error for a threshold which splits the range between the current and previous example in the sorted list is:

e = min(S+ + (T− − S−), S− + (T+ − S+)),

that is, the minimum of the error of labeling all examples below the current example negative and all examples above positive, versus the error of the converse. These sums are easily updated as the search proceeds (a short sketch of this single pass follows this subsection).
For the task of face detection, the initial rectangle features selected by AdaBoost are
meaningful and easily interpreted. The first feature selected seems to focus on the
property that the region of the eyes is often darker than the region of the nose and
cheeks (see Fig. 2). This feature is relatively large in comparison with the detection
sub-window, and should be somewhat insensitive to size and location of the face. The
second feature selected relies on the property that the eyes are darker than the bridge
of the nose.
In summary, the 200-feature classifier provides initial evidence that a boosted classifier constructed from rectangle features is an effective technique for face
detection. In terms of detection, these results are compelling but not sufficient for
many real-world tasks. In terms of computation, this classifier is very fast, requiring
0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward
technique for improving detection performance, adding features to the classifier,
directly increases computation time. (Viola & Jones, 2004)
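A minimal sketch of this single-pass threshold search (my own illustration of the idea rather than the authors' code; inputs are NumPy arrays of feature values, ±1 labels and AdaBoost example weights):

```python
import numpy as np

def best_threshold(values, labels, weights):
    """Optimal threshold for one feature in a single pass over sorted values."""
    order = np.argsort(values)
    v, y, w = values[order], labels[order], weights[order]

    T_pos = w[y == 1].sum()    # total positive example weight
    T_neg = w[y == -1].sum()   # total negative example weight
    S_pos = S_neg = 0.0        # positive/negative weight below current example

    best_e, best_t = np.inf, v[0]
    for i in range(len(v)):
        # error of labeling everything below as negative, versus the converse
        e = min(S_pos + (T_neg - S_neg), S_neg + (T_pos - S_pos))
        if e < best_e:
            best_e, best_t = e, v[i]
        if y[i] == 1:
            S_pos += w[i]
        else:
            S_neg += w[i]
    return best_t, best_e
```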
3. Constructing a cascade
The cascade classifier consists of a list of stages, where each stage consists of a list of weak learners. The system detects objects by moving a window over the image. Each stage of the classifier labels the specific region defined by the current location of the window as either positive or negative: positive meaning that an object was found, negative meaning that the specified object was not found in the image. If the labelling yields a negative result, then the classification of this specific region is complete and the window is moved to the next location. If the labelling gives a positive result, then the region moves on to the next stage of classification. The classifier yields a final verdict of positive when all the stages, including the last one, report that the object is found in the image.

A true positive means that the object in question is indeed in the image and the classifier labels it as such, a positive result. A false positive means that the labelling process falsely determines that the object is located in the image although it is not. A false negative occurs when the classifier is unable to detect the actual object in the image, and a true negative means that a non-object was correctly classified as not being the object in question. In order to work well, each stage of the cascade must have a low false negative rate, because if the actual object is classified as a non-object, then the classification of that branch stops, with no way to correct the mistake. However, each stage can have a relatively high false positive rate, because even if the n-th stage classifies the non-object as actually being the object, this mistake can be fixed in the (n+1)-th and subsequent stages of the classifier. (Soo, 2014)
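OpenCV ships this cascade evaluation ready-made; a minimal usage sketch (the cascade file is one of the pretrained models distributed with OpenCV, and the image path is illustrative):

```python
import cv2

# Load a pretrained frontal-face cascade shipped with OpenCV
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("classroom.jpg")             # assumed input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # detection runs on grayscale

# Each sub-window must pass every stage of the cascade to be reported here;
# detections are returned as (x, y, w, h) rectangles
faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```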
2.3.2 Face recognition

2.3.2.1 Eigenface
Eigenface is based on PCA, which is used to extract features from a set of images. It is important that the images are taken under the same lighting conditions and that the eyes are aligned in each image. The images used in this method must also contain the same number of pixels and be in grayscale.
In mathematical terms, we have to find the principal components of the distribution of faces,
or the eigenvectors of the covariance matrix of the set of face images, treating an image as
a point (or vector) in a very high dimensional space. The eigenvectors are ordered, each one
accounting for a different amount of the variation among the face images. These eigenvectors
can be thought of as a set of features that together characterize the variation between face
images. Each image location contributes more or less to each eigenvector, so that we can
display the eigenvector as a sort of ghostly face which we call an eigenface.
Figure 4. Eigenfaces from AT&T Laboratories Cambridge
Each eigenface deviates from uniform gray where some facial feature differs among the set
of training faces; they are a sort of map of the variations between faces. Each individual face
can be represented exactly in terms of a linear combination of the eigenfaces. Each face can also be approximated using only the "best" eigenfaces, those that have the largest eigenvalues and which therefore account for the most variance within the set of face images. The best M eigenfaces span an M-dimensional subspace, the "face space", of all possible images.
This approach to face recognition involves the following initialization operations:
1. Acquire an initial set of face images (the training set).
2. Calculate the eigenfaces from the training set, keeping only the M images that
correspond to the highest eigenvalues. These M images define the face space. As new
faces are experienced, the eigenfaces can be updated or recalculated.
3. Calculate the corresponding distribution in M-dimensional weight space for each
known individual, by projecting their face images onto the “face space.”
These operations can also be performed from time to time whenever there is free excess
computational capacity. Having initialized the system, the following steps are then used to
recognize new face images:
1. Calculate a set of weights based on the input image and the M eigenfaces by
projecting the input image onto each of the eigenfaces.
2. Determine if the image is a face at all (whether known or unknown) by checking to
see if the image is sufficiently close to “face space.”
3. If it is a face, classify the weight pattern as either a known person or as unknown.
4. (Optional) Update the eigenfaces and/or weight patterns.
5. (Optional) If the same unknown face is seen several times, calculate its
characteristic weight pattern and incorporate into the known faces. (Turk &
Pentland, 1991)
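A compact NumPy sketch of these steps (toy sizes and random stand-in data; this is an illustration of Turk and Pentland's procedure, not their code, and the distance threshold is an assumed value):

```python
import numpy as np

# Training set: N face images flattened to vectors (toy stand-in data)
N, H, W = 12, 64, 64
faces = np.random.rand(N, H * W)

mean_face = faces.mean(axis=0)
A = faces - mean_face                      # centered training data

# Principal components via SVD; the rows of Vt are the eigenfaces
U, s, Vt = np.linalg.svd(A, full_matrices=False)
M = 8
eigenfaces = Vt[:M]                        # the M best eigenfaces ("face space")

# Each known face is summarized by its weight vector in face space
train_weights = A @ eigenfaces.T           # shape: N x M

def recognize(img_vec, threshold=10.0):    # threshold is an assumed value
    # Project the new image onto face space and compare weight patterns
    w = (img_vec - mean_face) @ eigenfaces.T
    dists = np.linalg.norm(train_weights - w, axis=1)
    k = int(np.argmin(dists))
    return (k, dists[k]) if dists[k] < threshold else (None, dists[k])
```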
If the same face is analysed under different lighting conditions, the values will mix when the distribution is calculated and the face cannot be effectively classified. This means that different lighting conditions pose a problem in matching the features, as they can change dramatically.
2.3.2.2 Fisherface

Furthermore, it takes less space and is the fastest algorithm in this project. Because of this, PCA is more suitable for representing a set of data, while LDA is suitable for classification. (Dinalankara, 2017)
2.3.2.3 Local binary patterns histograms

The LBP operator (Figure 6) is one of the best performing texture descriptors, and it has been widely used in various applications. It has proven to be highly discriminative, and its key advantages, namely its invariance to monotonic gray level changes and its computational efficiency, make it suitable for demanding image analysis tasks.
The LBP operator was originally designed for texture description. The operator assigns a
label to every pixel of an image by thresholding the 3x3-neighborhood of each pixel with the
center pixel value and considering the result as a binary number. Then the histogram of the
labels can be used as a texture descriptor.
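A sketch of this basic 3x3 operator in NumPy (my illustration; border pixels are skipped for simplicity and the bit order is one common convention):

```python
import numpy as np

# Eight neighbors of the center pixel, each contributing one bit of the label
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_image(gray):
    """Threshold each pixel's 3x3 neighborhood by the center value and read
    the resulting bits as an 8-bit LBP label."""
    H, W = gray.shape
    out = np.zeros((H, W), dtype=np.uint8)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            center = gray[i, j]
            code = 0
            for bit, (di, dj) in enumerate(OFFSETS):
                if gray[i + di, j + dj] >= center:
                    code |= 1 << bit
            out[i, j] = code
    return out

# The histogram of the 256 possible labels is the texture descriptor
# hist = np.bincount(lbp_image(gray).ravel(), minlength=256)
```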
On the LBP approach for texture classification, the occurrences of the LBP codes in an image
are collected into a histogram. The classification is then performed by computing simple
histogram similarities. However, considering a similar approach for facial image
representation results in a loss of spatial information and therefore one should codify the
texture information while retaining also their locations. One way to achieve this goal is to
use the LBP texture descriptors to build several local descriptions of the face and combine
them into a global description. Such local descriptions have been gaining interest lately, which
is understandable given the limitations of the holistic representations. These local feature-
based methods are more robust against variations in pose or illumination than holistic
methods.
The basic methodology for LBP based face description proposed by Ahonen et al. (2006) is
as follows: The facial image is divided into local regions and LBP texture descriptors are
extracted from each region independently. The descriptors are then concatenated to form a
global description of the face, as shown in Fig. 7.
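Building on the lbp_image sketch above, the regional description can be illustrated as follows (an 8x8 grid, matching the common value mentioned in the parameter list later in this thesis):

```python
import numpy as np

def lbph_descriptor(gray, grid_x=8, grid_y=8):
    """Divide the LBP label image into grid_x x grid_y regions, histogram each
    region independently, and concatenate into one global face descriptor."""
    lbp = lbp_image(gray)                  # from the previous sketch
    H, W = lbp.shape
    hists = []
    for gy in range(grid_y):
        for gx in range(grid_x):
            cell = lbp[gy * H // grid_y:(gy + 1) * H // grid_y,
                       gx * W // grid_x:(gx + 1) * W // grid_x]
            hists.append(np.bincount(cell.ravel(), minlength=256))
    return np.concatenate(hists)           # 8 x 8 x 256 values in total
```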
3 PROBLEM STATEMENT
Technology has been the solution to many obstacles that mankind has faced for a long time. New discoveries in different fields, such as artificial intelligence, machine learning and computer vision, have made researchers inquisitive about how these systems can transform the way we live. The solutions range from the very mundane, like the detection of spam emails, to more complex ones, like pattern recognition of any kind. One of the problems that can be solved with computer vision techniques is student attendance. As stated earlier in this thesis, in every university, students' attendance is monitored every semester of their studies. In Kosovo, in almost every educational institution, this process is carried out manually, through an attendance sheet which is signed by the students. The professor and the corresponding department track the sheets of every lecture, and by the end of the semester they check them and use them as an aid in the evaluation procedure. This process, like any other manual process, requires a lot of time and care, for there can be times when a sheet is lost or when the number of students is enormous. Some of the faculties in Kosovo have implemented various types of automatic systems. These systems have mainly used RFID technology, whose implementation costs are disproportionate to the results they provide. But one thing that has not yet been applied in Kosovo is biometric identification. This approach has low implementation costs, since one does not have to make an ID card for every student; it would be easy for the faculties' IT staff to use; and it would be the answer to all the problems of monitoring student attendance. All of this requires only a PC as small as a Raspberry Pi and a camera.
4 METHODOLOGY
In order to accomplish this thesis, a data set with images of people's faces had to be created. This data set can be created directly from the application, which is developed in Python using OpenCV and its libraries. 50 images each of my face and of the faces of two different football players are captured, and then the algorithm is trained. Before proceeding to train the algorithm, the images are scaled and converted to grayscale.

For the face detection part, OpenCV provides pretrained Haar cascade models, trained by Intel Corporation, to detect faces and eyes in an image.

Local Binary Patterns Histograms is the algorithm used for face recognition, because of its results during testing and its illumination invariance.

Google Cloud Platform with the Google Sheets API is used to create the attendance sheet and add students when they are identified.
5 RESULTS
To make the system easier to use and to allow the user to interact with the program easily, a GUI has been created. This GUI contains two elements: a text field and five buttons.

- Register

The image data set of this program can be created directly by writing the student's name in the text field and clicking the "Register" button, which opens the computer's default camera. The video stream is converted to grayscale using OpenCV's cvtColor function with the COLOR_BGR2GRAY flag. The average pixel value is calculated, and a threshold is used to ensure that the images have enough brightness.

A text file with the name "Names.txt" is created, which will store the students' names together with their indexes.
Figure 9. Text file to save student names
If a face is detected using the Haar cascade and the average pixel value is above the threshold, the image is saved in the folder "dataSet". 50 images of every student will be taken, one every 300 milliseconds. The final data set will contain the student images, with indexes matching those in the text file "Names.txt".
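A condensed sketch of this registration step (the cascade file, folder layout, filename pattern and brightness threshold value are illustrative assumptions consistent with the description above):

```python
import cv2

name = "Arlind"                       # the name typed into the GUI text field
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)             # the computer's default camera
BRIGHTNESS_THRESHOLD = 60             # assumed minimum average pixel value
count = 0

while count < 50:                     # 50 images per student
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if gray.mean() < BRIGHTNESS_THRESHOLD:
        continue                      # skip frames that are too dark
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        count += 1
        cv2.imwrite(f"dataSet/{name}.{count}.jpg", gray[y:y + h, x:x + w])
    cv2.waitKey(300)                  # roughly one capture every 300 ms
cap.release()
```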
- Train

Once all students have been added and photographed, the training process can start by simply clicking the "Train" button.
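What the "Train" button amounts to can be sketched as follows (requires the opencv-contrib package for the cv2.face module; the file layout follows the registration sketch above):

```python
import os
import cv2
import numpy as np

# Map each student name to a numeric label (assumed "name.count.jpg" layout)
files = os.listdir("dataSet")
names = sorted({f.split(".")[0] for f in files})
name_to_id = {n: i for i, n in enumerate(names)}

faces, labels = [], []
for fname in files:
    img = cv2.imread(os.path.join("dataSet", fname), cv2.IMREAD_GRAYSCALE)
    faces.append(img)
    labels.append(name_to_id[fname.split(".")[0]])

recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, np.array(labels))
recognizer.write("trainer.yml")       # persist the trained model to disk
```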
- Predicting

The main process of this system is predicting the faces of students and signing them into the attendance sheet. This process starts by clicking the "START" button, which opens the default camera and starts predicting students as soon as they are detected.

Firstly, the face and eye cascades are imported from the folder "Haar". Faces are then detected by passing three arguments: the camera frames converted to grayscale, a scale factor of 1.3, which is a relatively small resizing step, and a minimum-neighbors parameter set to 5, the number of neighbors each candidate rectangle should have in order to be retained. This method returns the positions of the detected faces as Rect(x, y, w, h). For the prediction process to start, eyes have to be detected and must lie inside the rectangle returned by the face detection method; the face is considered detected only if the eyes are inside it.
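A sketch of this detection step with the eye-inside-face check (the cascade filenames are the standard pretrained ones; the "Haar" folder is the thesis's layout, assumed here):

```python
import cv2

face_cascade = cv2.CascadeClassifier("Haar/haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier("Haar/haarcascade_eye.xml")

def detect_faces_with_eyes(gray):
    """Return only the face rectangles that also contain a detected eye."""
    confirmed = []
    for (x, y, w, h) in face_cascade.detectMultiScale(
            gray, scaleFactor=1.3, minNeighbors=5):
        roi = gray[y:y + h, x:x + w]       # search for eyes inside the face
        if len(eye_cascade.detectMultiScale(roi)) > 0:
            confirmed.append((x, y, w, h))
    return confirmed
```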
The recognizer object created from "LBPHFaceRecognizer_create" takes five parameters:
1. Radius - the radius used for building the Circular Local Binary Pattern. This value is set to 2. The greater the radius, the smoother the image, but the more spatial information is captured.
2. Neighbors - The number of sample points to build a Circular Local Binary Pattern
from. An appropriate value is to use 8 sample points but the more sample points you
include, the higher the computational cost.
3. Grid_x - The number of cells in the horizontal direction, 8 is a common value used
in publications. The more cells, the finer the grid, the higher the dimensionality of the
resulting feature vector.
4. Grid_y - The number of cells in the vertical direction, 8 is a common value used in
publications. The more cells, the finer the grid, the higher the dimensionality of the
resulting feature vector.
5. Threshold - The threshold applied in the prediction. If the distance to the nearest
neighbor is larger than the threshold, this method returns -1.
Finally, the predict method returns the student's name and a confidence value. The method "ID2Name" checks that the confidence is not above a threshold and that the name is one of the student names in our text file.
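A sketch of the prediction call and the name lookup (the recognizer parameter values match those described above; the confidence cut-off and the Names.txt layout of one "index name" pair per line are assumptions):

```python
import cv2

recognizer = cv2.face.LBPHFaceRecognizer_create(
    radius=2, neighbors=8, grid_x=8, grid_y=8, threshold=100.0)
recognizer.read("trainer.yml")             # model produced by the Train step

with open("Names.txt") as f:               # assumed "index name" pairs
    id2name = dict(line.split(maxsplit=1) for line in f if line.strip())

def predict_student(face_roi_gray, max_confidence=70.0):
    label, confidence = recognizer.predict(face_roi_gray)
    # Confidence is a distance: lower means a closer match
    if label == -1 or confidence > max_confidence:
        return None
    return id2name.get(str(label), "").strip() or None
```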
Figure 13. Detecting a student's face
- Attendance sheet

The attendance sheet is created beforehand in Google Sheets and is shared with a specific email address provided by the Google Sheets API on Google Cloud Platform. In this way we are able to access the sheet. The Google Service Account credentials are imported from a JSON file for authentication.
When a face is recognized and "ID2Name" is called to check the student's name, the column corresponding to the current date is filled in.
In order to calculate the accuracy and measure the performance of this system, another function is created which takes 150 images from the folder "testdataSet", different from the images on which the system was trained, and for each image requests the predicted ID and confidence.

For each image, the result is saved in a text file and the accuracy is calculated. The highest accuracy achieved is 78.40%, precision is 100% and recall is 64%.
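For reference, these three figures follow the standard definitions computed from the confusion-matrix counts; a minimal sketch (the counts are inputs, not the thesis's raw numbers):

```python
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 100% means no false alarms
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of knowns recognized
    return accuracy, precision, recall
```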
Figure 22 shows the values of the test set and the confidence the system predicted. Because the last 50 images are images of unknown students, the confidence value (a distance, so lower is better) starts to increase.

Figure 23 shows the predicted ID values for the given test set. The first 75 images are images of known students, so the predicted IDs are 100% correct.
6 DISCUSSIONS AND CONCLUSIONS
A face recognition system can solve the problem of student attendance in educational institutions. This can be done by using Computer Vision techniques and algorithms, first to detect a face and then to predict whose face it is. This system is built using a Haar cascade classifier to detect faces, with a pretrained cascade from Intel, the LBPH algorithm from OpenCV to recognize faces, and the Google Sheets API for updating the students' attendance sheet. After training the algorithm with 50 images per student, the recognition process can start through the GUI of the software.

Training the system with values of 1 for radius, 1 for neighbors and 15 for threshold, the maximum accuracy we can get is 78.40%.

This accuracy value, together with the precision value of 100%, shows that this system, combined with some other new features, can be used to manage student attendance.

To increase the accuracy further, the use of an adaptive threshold can be proposed. The system could also be trained with 100 pictures per student, but this would increase the computational cost drastically.

This system can be used as a pilot project including the use of two other algorithms, Eigenfaces and Fisherfaces, with the final result being a combined result of the three algorithms. This would make the system more reliable.

Furthermore, this system can be improved by using databases for saving student names and student images, for updating training files, and for saving training files.
7 REFERENCES
Alpaydin, E. (2016). Machine Learning: The New AI. London: The MIT Press.
Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), 711-720.
Clark, A. F., Ehsan, S., Rehman, N. u., & McDonald-Maier, K. D. (2015). Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems. Sensors, 15(7), 16804-16830. doi:10.3390/s150716804
Dinalankara, L. (2017, August). ResearchGate. Retrieved from
https://www.researchgate.net/profile/Lahiru_Dinalankara/publication/318900718_F
ace_Detection_Face_Recognition_Using_Open_Computer_Vision_Classifies/links/
5984248aa6fdccb3bfcb42ba/Face-Detection-Face-Recognition-Using-Open-
Computer-Vision-Classifies.pdf
Gonzalez, R. C., & Woods, R. E. (2002). Digital Image Processing 2nd Edition. Upper
Saddle River: Prentice Hall.
Hinton, G., & Sejnowski, T. J. (1999). Unsupervised Learning: Foundations of Neural
Computation. MIT Press.
Huang, T. S. (1996). Computer Vision: Evolution And Promise. 19th CERN School of
Computing, 21-25.
Krishna, R. (2017). Computer Vision: Foundations and Applications. Stanford: Stanford University.
Muller, A. C., & Guido, S. (2016). Introduction to Machine Learning with Python. Sebastopol: O'Reilly Media, Inc.
Pietikäinen, M. (2010). Scholarpedia. Retrieved from
http://www.scholarpedia.org/article/Local_Binary_Patterns
Russell, S. J., & Norvig, P. (1995). Artificial Intelligence A Modern Approach. New Jersey:
Prentice-Hall, Inc.
Smola, A., & Vishwanathan, S. (2008). Introduction to Machine Learning. Cambridge: Cambridge University Press.
Solomon, C., & Breckon, T. (2011). Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab. Chichester: Wiley-Blackwell.
Soo, S. (2014). Object Detection Using Haar-cascade Classifier (pp. 1-12). Institute of Computer Science, University of Tartu.
Sutton, R. S., & Barto, A. G. (2015). Reinforcement Learning: An Introduction. London: The
MIT Press.
Szeliski, R. (2010). Computer Vision: Algorithms and Applications. Springer.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of cognitive
neuroscience, 3(1), 71-86.
Viola, P., & Jones, M. J. (2004). Robust Real-Time Face Detection. International Journal of Computer Vision, 57(2), 137-154.
Yang, M.-H., Kriegman, D. J., & Ahuja, N. (2002). Detecting Faces in Images: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1), 34-58.