
DEEP CONVOLUTIONAL NEURAL NETWORKS-

BASED AGE AND GENDER CLASSIFICATION WITH


FACIAL IMAGES

A PROJECT REPORT PHASE I

Submitted By
KAVIYAPRIYA.G
(910619405001)
in partial fulfillment for the award of the degree

of

MASTER OF ENGINEERING

in

COMPUTER SCIENCE

K.L.N. College of Engineering,


An Autonomous Institution,
Affiliated to Anna University, Chennai
December 2020
K.L.N. COLLEGE OF ENGINEERING,
AN AUTONOMOUS INSTITUTION,
AFFILIATED TO ANNA UNIVERSITY,
CHENNAI 600 025.

BONAFIDE CERTIFICATE

Certified that this Report titled “DEEP CONVOLUTIONAL NEURAL

NETWORKS-BASED AGE AND GENDER CLASSIFICATION WITH

FACIAL IMAGES” is the bonafide work of G. KAVIYA PRIYA (910619405001)

who carried out the work under my supervision. Certified further that to the best of my

knowledge the work reported herein does not form part of any other thesis or

dissertation on the basis of which a degree or award was conferred on an earlier

occasion on this or any other candidate.

SIGNATURE                                    SIGNATURE

Dr. P. R. VIJAYALAKSHMI,                     Dr. S. MIRUNA JOE AMALI,
Professor and Head,                          Professor,
Computer Science and Engineering,            Computer Science and Engineering,
K.L.N. College of Engineering,               K.L.N. College of Engineering,
Pottapalayam,                                Pottapalayam,
Sivagangai – 630 611.                        Sivagangai – 630 611.

Submitted for the project viva-voce held on

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

Firstly, I thank almighty God for his presence throughout the project, and my parents, who helped me in every way possible towards the completion of this project work. I extend my gratitude to our college President, Er. G. KAVIYA PRIYA, B.Tech. (IT), K.L.N. College of Engineering, for making me march towards the glory of success.

I extend my gratefulness to our college Secretary and Correspondent, Dr. K. N. K. GANESH, B.E., Ph.D. (Hons), K.L.N. College of Engineering, for the motivation throughout the academic works.

I express my sincere thanks to our beloved management and Principal, Dr. A. V. RAM PRASAD, M.E., Ph.D., for all the facilities offered.

I express my thanks to Dr. P. R. VIJAYALAKSHMI, M.E., Ph.D., our Head of the Department of Computer Science and Engineering, for her motivation and tireless encouragement.

I express my deep sense of gratitude and sincere thanks to Dr. S. MIRUNA JOE AMALI, M.E., Ph.D., Associate Professor, Department of Computer Science and Engineering, for her valuable guidance and continuous encouragement in the completion of my final project work.
ABSTRACT

We build an age and gender classification system that classifies age and gender from facial images with the help of a deep learning framework. Automatic age and gender classification has become relevant to an increasing number of applications, particularly since the rise of social platforms and social media. Nevertheless, the performance of existing methods on real-world images is still significantly lacking, especially when compared to the tremendous leaps in performance recently reported for the related task of face recognition. Keypoint features and descriptors contain the visual description of an image patch and are used to compare the similarity between image features. In this work we show that by learning representations through the use of deep convolutional neural networks (CNN, VGG-16), a significant increase in performance can be obtained on these tasks. To this end, we propose a simple convolutional network architecture that can be used even when the amount of learning data is limited. We evaluate our method on a recent dataset for age and gender estimation and show it to dramatically outperform current state-of-the-art methods.
DECLARATION

I hereby declare that the work entitled “DEEP CONVOLUTIONAL NEURAL NETWORKS-BASED AGE AND GENDER CLASSIFICATION WITH FACIAL IMAGES”, submitted in partial fulfilment of the requirements for the award of the degree of M.E. Computer Science, Anna University, Chennai, is a record of my own work carried out during the academic year 2020 under the supervision and guidance of Dr. S. MIRUNA JOE AMALI, M.E., Ph.D., Department of Computer Science and Engineering, K.L.N. College of Engineering. The extent and sources of information derived from the existing literature have been indicated throughout the dissertation in the appropriate places. The matter embodied in this work is original and has not been submitted for the award of any other degree or diploma, either in this or any other university.

KAVIYAPRIYA.G
(910619405001)

I certify that the declaration made above by the candidate is true.

Dr. S. MIRUNA JOE AMALI, M.E., Ph.D.
ASSOCIATE PROFESSOR
COMPUTER SCIENCE AND ENGINEERING
K.L.N. COLLEGE OF ENGINEERING,
AN AUTONOMOUS INSTITUTION,
AFFILIATED TO ANNA UNIVERSITY, CHENNAI 600 025.
TABLE OF CONTENTS

CHAPTER NO    TITLE                                          PAGE NO

              ACKNOWLEDGEMENT
              ABSTRACT
              LIST OF FIGURES
              LIST OF TABLES
              LIST OF ABBREVIATIONS

1             INTRODUCTION                                   01
              1.1 Introduction to Age and Gender             01
              1.2 Domain Explanation                         04
                  1.2.1 Image Processing                     04
                  1.2.2 Images and Pictures                  07
                  1.2.3 Images and Digital Images            08
                  1.2.4 Image Processing Fundamentals        09
                        1.2.4.1 Pixels                       09
                        1.2.4.2 Pixel Connectivity           10
                        1.2.4.3 Pixel Values                 11
2             LITERATURE SURVEY                              15
3             PROBLEM DEFINITION                             25
              3.1 Existing System                            25
              3.2 Proposed System                            26
4             IMPLEMENTATION AND METHODOLOGY                 27
              4.1 System Specification                       27
              4.2 Software Description                       27
              4.3 Modules Description                        36
              4.4 Algorithm Description                      40
5             SYSTEM DESIGN                                  42
              5.1 System Architecture                        42
              5.2 Flow Chart                                 43
6             EXPERIMENTAL RESULTS AND ANALYSIS              44
              6.1 Source Code                                44
              6.2 Screenshots                                60
7             CONCLUSION                                     64
              7.1 Future Enhancement                         64
8             REFERENCES                                     65
LIST OF FIGURES

FIGURE NO    TITLE                                           PAGE NO

1.1          A matrix of pixels                              4
1.1(a)       Each pixel has a value from 0 to 255            4
1.1(b)       A true-colour image                             5
1.2          4-connectivity image                            11
1.3          The RGB model                                   14
1.4          Difference between RGB components               14
4.1          Anaconda distribution                           32
4.2          Data science libraries                          33
4.3          Conda packages                                  34
4.4          Conda Navigator                                 35
5.1          System architecture diagram                     42
5.2          Flow chart diagram                              43
6.1          Image selected from file                        60
6.2          Select the image                                60
6.3          Layers being trained                            61
6.4          Waiting for accuracy value                      63
6.5          Face images converted into layers               63
LIST OF ABBREVIATIONS

LBP     Local Binary Pattern

CNN     Convolutional Neural Network

RGB     Red, Green and Blue

KRR     Kernel Ridge Regression

SVR     Support Vector Regression

ASGD    Asynchronous Stochastic Gradient Descent

FC      Fully Connected Layer
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION TO AGE AND GENDER

Age and gender classification is extremely practical. It is widely used in gathering demographic information, controlling entrance, searching video and images, and improving recognition speed and accuracy. Besides, age and gender classification shows important potential in the business field. The log records acquired from such a system can ultimately be fed into statistical analysis about consumers in order to deliver real-time personalized recommendations.

The study of gender classification, which began in the 1980s, is a two-stage process. At the first stage, researchers studied how to distinguish between the sexes psychologically, mainly using artificial neural network (ANN) methods. Golomb trained a two-layer fully connected network, achieving an average error rate of about 8.1%. Cottrell trained a back-propagation neural network on a principal component analysis of the samples. Valentin and Edelman adopted eigenfaces and linear neurons respectively. Later, automatic gender classification attracted more and more attention with the rising demand for intelligent visual monitoring. Gutta combined a radial basis function neural network with a C4.5 decision tree into a hybrid classifier, obtaining a high average recognition rate of 96%. Subsequent methods such as the support vector machine and AdaBoost delivered quite good results for gender classification.

As for age classification, Young and Niels first put forward age estimation in 1994, roughly dividing people into three groups: children, young adults and the aged. Hayashi studied wrinkle texture and skin analysis based on the Hough transform. Zhou used boosting for regression, and Geng proposed the Aging Pattern Subspace method. Recently, Guo presented an approach to extract features related to ageing via an embedded low-dimensional aging manifold obtained from subspace learning.

The convolutional neural network is a type of artificial neural network. Three characteristics of convolutional neural networks are observed: weight sharing, local connectivity and pooled sampling. This kind of structure reduces the number of weights, decreases the complexity and has better robustness to zoom, rotation and translation. A deep convolutional neural network is composed of several convolutional layers and pooling layers. A convolutional neural network supplies an end-to-end model that learns to extract image features and classify them using the stochastic gradient descent algorithm. Features of each layer are obtained from local regions of the previous layer through shared weights. Owing to this characteristic, the convolutional neural network is well suited to learning and expressing image features.
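To make the weight-sharing, local-connectivity and pooling ideas above concrete, the following is a minimal illustrative sketch of such a network in Keras (Python, matching the tool chain described in Chapter 4). The input size, layer widths and two-class output are assumptions for illustration only, not the network used in this project.

# Illustrative only: a minimal Keras CNN showing shared convolution weights,
# local connectivity and pooling, trained with stochastic gradient descent.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", padding="same",
                  input_shape=(64, 64, 3)),   # shared 3x3 kernels over a small RGB crop (assumed size)
    layers.MaxPooling2D((2, 2)),               # pooled (sub-sampled) feature maps
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),     # e.g. two gender classes (assumed)
])

model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()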

Early convolutional neural networks had simple architectures. The classical LeNet-5 model is generally applied to recognize handwritten digits and classify images. With the immense progress in structure optimization, convolutional neural networks have extended their field of application. Deep Belief Networks, Convolutional Deep Belief Networks and Fully Convolutional Networks arose in response to the needs of the time. In recent years, convolutional neural networks have developed further with the successful application of transfer learning.

GoogLeNet can be used as the training network. GoogLeNet, which contains 22 layers, attained the top accuracy rate in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). GoogLeNet is notable for optimizing the utilization of computing resources, keeping the computational budget constant through a carefully crafted design while increasing the depth and width of the network. In order to avoid the overfitting and enormous computation that come from expanding the network architecture, GoogLeNet rationally moves from fully connected to sparsely connected architectures, even inside the convolutions. However, when it comes to numerical computation on non-uniform sparse data structures, computing infrastructure becomes inefficient. The Inception architecture is therefore used to cluster sparse structures into denser submatrices. This not only maintains the sparsity of the network structure, but also takes full advantage of the high computing performance of dense matrices. GoogLeNet adds auxiliary softmax layers in the middle of the network to mitigate the vanishing gradient problem. These softmax layers are used to obtain extra training loss. The gradient calculated with respect to this loss is added to the gradient of the whole network during back-propagation, which can be regarded as merging sub-networks of various depths together. Thanks to convolution kernel sharing, the computed gradients accumulate, so the gradient does not fade too much.

The traditional gradient descent algorithm processes the images in the dataset serially when training on a GPU. This is not ideal, especially for enormous datasets that are time-consuming to train. Therefore, we adopt a stochastic gradient descent algorithm with multiple GPUs. Each GPU contains a replica of the same network and computes gradients on its own during the training process. Parameters and gradients pass only between a GPU and the parameter server. This kind of training method is termed Asynchronous Stochastic Gradient Descent (ASGD). Each GPU trains its own replica with independent gradients, and each gradient is sent to the parameter server for an asynchronous update.
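As a toy illustration of the ASGD idea described above, the following Python sketch uses worker threads in place of GPUs: each worker computes gradients on its own mini-batches of a simple linear regression problem and pushes updates to a shared parameter vector (the "parameter server") without waiting for the other workers. This is a didactic sketch only, not the multi-GPU training code of this project.

# Toy ASGD: threads stand in for GPUs, a shared numpy array stands in for the
# parameter server. Each worker pulls the current parameters, computes a
# mini-batch gradient and pushes its update asynchronously.
import threading
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(5)              # shared parameters ("parameter server")
lock = threading.Lock()      # serializes each individual read/update
lr, batch, steps = 0.01, 32, 300

def worker(seed):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = local_rng.integers(0, len(X), size=batch)
        with lock:
            w_local = w.copy()                              # pull current parameters
        grad = 2 * X[idx].T @ (X[idx] @ w_local - y[idx]) / batch
        with lock:
            w[:] = w - lr * grad                            # asynchronous push of the update

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("recovered weights:", np.round(w, 2))   # should approach [1, 2, 3, 4, 5]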

1.2 DOMAIN EXPLANATION

1.2.1 IMAGE PROCESSING

An image is an array, or a matrix, of square pixels (picture elements)


arranged in columns and rows.

Figure 1.1: An image — an array or a matrix of pixels arranged in columns and rows.

In an 8-bit greyscale image each picture element has an assigned intensity that ranges from 0 to 255. A greyscale image is what people normally call a black-and-white image, but the name emphasizes that such an image will also include many shades of grey.

Figure 1.1(a): Each pixel has a value from 0 (black) to 255 (white).

The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.

Figure 1.1(b): A true-colour image assembled from three greyscale images coloured red, green and blue. Such an image may contain up to 16 million different colours.

A normal greyscale image has 8-bit colour depth = 256 greyscales. A “true colour” image has 24-bit colour depth = 3 x 8 bits = 256 x 256 x 256 colours ≈ 16 million colours.

Some greyscale images have more greyscales, for instance 16 bit = 65536 greyscales. In principle, three 16-bit greyscale images can be combined to form an image with 281,474,976,710,656 colour combinations.

There are two general groups of ‘images’: vector graphics (or line art) and bitmaps (pixel-based ‘images’). Some of the most common file formats are:

GIF — an 8-bit (256 colour), non-destructively compressed bitmap format. Mostly used for the web. Has several sub-standards, one of which is the animated GIF.

JPEG — a very efficient (i.e. much information per byte) destructively compressed 24-bit (16 million colours) bitmap format. Widely used, especially for the web and Internet (bandwidth-limited).

TIFF — the standard 24-bit publication bitmap format. Compresses non-destructively with, for instance, Lempel-Ziv-Welch (LZW) compression.

PS — PostScript, a standard vector format. Has numerous sub-standards and can be difficult to transport across platforms and operating systems.

PSD — a dedicated Photoshop format that keeps all the information in an image, including all the layers.

Pictures are the most common and convenient means of conveying or transmitting information. A picture is worth a thousand words. Pictures concisely convey information about positions, sizes and interrelationships between objects. They portray spatial information that we recognize as objects. Human beings are good at deriving information from such images because of our innate visual and mental abilities. About 75% of the information received by humans is in pictorial form. An image is digitized to convert it to a form which can be stored in a computer's memory or on some form of storage media such as a hard disk or CD-ROM. This digitization procedure can be done by a scanner, or by a video camera connected to a frame grabber board in a computer. Once the image has been digitized, it can be operated upon by various image processing operations.

Image processing operations can be roughly divided into three major categories: Image Compression, Image Enhancement and Restoration, and Measurement Extraction. Image Compression involves reducing the amount of memory needed to store a digital image. Image defects which could be caused by the digitization process or by faults in the imaging set-up (for example, bad lighting) can be corrected using Image Enhancement techniques. Once the image is in good condition, the Measurement Extraction operations can be used to obtain useful information from the image. Some examples of Image Enhancement and Measurement Extraction are given below. The examples shown all operate on 256-level grey-scale images. This means that each pixel in the image is stored as a number between 0 and 255, where 0 represents a black pixel, 255 represents a white pixel and values in between represent shades of grey. These operations can be extended to operate on colour images. The examples below represent only a few of the many techniques available for operating on images. Details about the inner workings of the operations have not been given, but some references to books containing this information are given at the end for the interested reader.

1.2.2 Images and Pictures

As mentioned in the preface, human beings are predominantly visual creatures: we rely heavily on vision to make sense of the world around us. We not only look at things to identify and classify them, but we scan for differences and obtain an overall rough feeling for a scene with a quick glance. Humans have evolved very precise visual skills: we can identify a face in an instant, differentiate colours and process a large amount of visual information very quickly.

However, the world is in constant motion: stare at something for long enough and it will change in some way. Even a large solid structure, like a building or a mountain, will change its appearance depending on the time of day (day or night), the amount of sunlight (clear or cloudy), or the various shadows falling upon it. Here we are concerned with single images: snapshots, if you like, of a visual scene. Although image processing can deal with changing scenes, we do not discuss it in any detail in this text. For our purposes, an image is a single picture which represents something. It may be a picture of a person, of people or animals, or of an outdoor scene, or a microphotograph of an electronic component, or the result of medical imaging. Even if the picture is not immediately recognizable, it will not be just a random blur.

Image processing involves changing the nature of an image in order to either

1. improve its pictorial information for human interpretation, or

2. render it more suitable for autonomous machine perception.

We are concerned with digital image processing, which involves using a computer to change the nature of a digital image. It is necessary to realize that these two conditions represent two separate but equally important aspects of image processing: a procedure which satisfies condition 1, i.e. a procedure which makes an image look better, may be the very worst procedure for satisfying condition 2. Humans like their images to be sharp, clear and detailed; machines prefer their images to be simple and uncluttered.

1.2.3 Images and Digital Images

Suppose we take an image, a photo, say. For the moment, let us make things easy and suppose the photo is black and white (that is, lots of shades of grey), so there is no colour. We may consider this image as being a two-dimensional function, where the function values give the brightness of the image at any given point. We may assume that in such an image the brightness values can be any real numbers in the range 0.0 (black) to 1.0 (white).

A digital image differs from a photo in that the values are all discrete. Usually they take on only integer values, with brightness ranging from 0 (black) to 255 (white). A digital image can be considered as a large array of discrete dots, each of which has a brightness associated with it. These dots are called picture elements, or more simply pixels. The pixels surrounding a given pixel constitute its neighborhood. A neighborhood can be characterized by its shape, in the same way as a matrix: for example, we may speak of a 3x3 neighborhood. Except in very special circumstances, neighborhoods have odd numbers of rows and columns; this ensures that the current pixel is in the centre of the neighborhood.

1.2.4 Image Processing Fundamentals

1.2.4.1 Pixel

In order for any digital computer processing to be carried out on an image, it


must first be stored within the computer in a suitable form that can be manipulated by
a computer program. The most practical way of doing this is to divide the image up
into a collection of discrete (and usually small) cells, which are known as pixels. Most
commonly, the image is divided up into a rectangular grid of pixels, so that each pixel
is itself a small rectangle. Once this has been done, each pixel is given a pixel value
that represents the color of that pixel. It is assumed that the whole pixel is the same
color, and so any color variation that did exist within the area of the pixel before the
image was discretized is lost. However, if the area of each pixel is very small, then the
discrete nature of the image is often not visible to the human eye.

Other pixel shapes and formations can be used, most notably the hexagonal grid, in
which each pixel is a small hexagon. This has some advantages in image processing,
including the fact that pixel connectivity is less ambiguously defined than with a
square grid, but hexagonal grids are not widely used. Part of the reason is that many
image capture systems (e.g. most CCD cameras and scanners) intrinsically discretize
the captured image into a rectangular grid in the first instance.

1.2.4.2 Pixel Connectivity

The notion of pixel connectivity describes a relation between two or more pixels. For two pixels to be connected they have to fulfill certain conditions on the pixel brightness and spatial adjacency.

First, in order for two pixels to be considered connected, their pixel values must both be from the same set of values V. For a grayscale image, V might be any range of graylevels, e.g. V = {22, 23, ..., 40}; for a binary image we simply have V = {1}.

To formulate the adjacency criterion for connectivity, we first introduce the notion of a neighborhood. For a pixel p with the coordinates (x, y), the set of pixels given by

N4(p) = {(x+1, y), (x-1, y), (x, y+1), (x, y-1)}

is called its 4-neighbors. Its 8-neighbors are defined as

N8(p) = N4(p) ∪ {(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)}

From this we infer the definitions of 4- and 8-connectivity: two pixels p and q, both having values from a set V, are 4-connected if q is in the set N4(p) and 8-connected if q is in N8(p).

General connectivity can be based on either 4- or 8-connectivity; for the following discussion we use 4-connectivity.

A pixel p is connected to a pixel q if p is 4-connected to q or if p is 4-connected to a third pixel which itself is connected to q. In other words, two pixels q and p are connected if there is a path from p to q on which each pixel is 4-connected to the next one.

A set of pixels in an image which are all connected to each other is called a connected component. Finding all connected components in an image and marking each of them with a distinctive label is called connected component labeling.

An example of a binary image with two connected components based on 4-connectivity can be seen in Figure 1.2. If the connectivity were based on 8-neighbors, the two connected components would merge into one.

Figure 1.2: Two connected components based on 4-connectivity.
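A small Python sketch of 4-connected component labeling, matching the definitions above, is given below. It uses a simple flood fill; library routines such as scipy.ndimage.label perform the same operation more efficiently.

# 4-connected component labeling of a binary image via flood fill.
import numpy as np

def label_components(binary):
    labels = np.zeros_like(binary, dtype=int)
    current = 0
    rows, cols = binary.shape
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] == 1 and labels[r, c] == 0:
                current += 1                 # start a new component
                stack = [(r, c)]
                while stack:
                    i, j = stack.pop()
                    if 0 <= i < rows and 0 <= j < cols \
                            and binary[i, j] == 1 and labels[i, j] == 0:
                        labels[i, j] = current
                        # 4-neighbors only; adding diagonals would give 8-connectivity
                        stack += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return labels, current

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 1]])
labels, n = label_components(img)
print(n, "components")    # the isolated bottom-left pixel forms its own component
print(labels)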

1.2.4.3 Pixel Values

Each of the pixels that represents an image stored inside a computer has a pixel value which describes how bright that pixel is, and/or what colour it should be. In the simplest case of binary images, the pixel value is a 1-bit number indicating either foreground or background. For a grayscale image, the pixel value is a single number that represents the brightness of the pixel. The most common pixel format is the byte image, where this number is stored as an 8-bit integer giving a range of possible values from 0 to 255. Typically zero is taken to be black, and 255 is taken to be white. Values in between make up the different shades of gray.

To represent colour images, separate red, green and blue components must be specified for each pixel (assuming an RGB colour space), and so the pixel 'value' is actually a vector of three numbers. Often the three different components are stored as three separate 'grayscale' images known as colour planes (one for each of red, green and blue), which have to be recombined when displaying or processing. Multispectral images can contain even more than three components for each pixel, and by extension these are stored in the same kind of way, as a vector pixel value, or as separate colour planes.

The actual grayscale or colour component intensities for each pixel may not actually be stored explicitly. Often, all that is stored for each pixel is an index into a colour map in which the actual intensity or colours can be looked up.

Although simple 8-bit integers or vectors of 8-bit integers are the most common sorts of pixel values used, some image formats support different types of value, for instance 32-bit signed integers or floating point values. Such values are extremely useful in image processing as they allow processing to be carried out on the image where the resulting pixel values are not necessarily 8-bit integers. If this approach is used then it is usually necessary to set up a colour map which relates particular ranges of pixel values to particular displayed colours.
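As a brief illustration of the pixel values described above, the following Python sketch loads an image with Pillow, reads an 8-bit greyscale value and splits an RGB image into its three colour planes. The file name is a placeholder.

# Inspecting pixel values and colour planes with Pillow and NumPy.
import numpy as np
from PIL import Image

rgb = np.array(Image.open("face.jpg").convert("RGB"))    # shape (rows, cols, 3)
gray = np.array(Image.open("face.jpg").convert("L"))     # 8-bit grayscale, values 0..255

print("grayscale value at (10, 20):", gray[10, 20])      # one number per pixel
print("RGB vector at (10, 20):", rgb[10, 20])            # three numbers per pixel
r_plane, g_plane, b_plane = rgb[..., 0], rgb[..., 1], rgb[..., 2]
print("red colour plane shape:", r_plane.shape)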


Color scale

The two main color spaces are RGB and CMYK.

RGB

The RGB color model is an additive color model in which red, green, and blue
light are added together in various ways to reproduce a broad array of colors. RGB
uses additive color mixing and is the basic color model used in television or any
other medium that projects color with light. It is the basic color model used in
computers and for web graphics, but it cannot be used for print production.

The secondary colors of RGB – cyan, magenta, and yellow – are formed by mixing
two of the primary colors (red, green or blue) and excluding the third color. Red
and green combine to make yellow, green and blue to make cyan, and blue and red
form magenta. The combination of red, green, and blue in full intensity makes
white (Figure 1.3).

Figure 1.3: The additive model of RGB. Red, green, and blue are the primary
stimuli for human color perception and are the primary additive colours.

Figure 1.4: To see how different RGB components combine together, here is a selected repertoire of colours and their respective relative intensities for each of the red, green, and blue components.

CHAPTER 2

LITERATURE SURVEY

1. Title: Age and Gender Classification using Convolutional Neural Networks (2015)

Author: Gil Levi and Tal Hassner

Age and gender play fundamental roles in social interactions. Languages reserve different salutations and grammar rules for men or women, and very often different vocabularies are used when addressing elders compared to young people. Despite the basic roles these attributes play in day-to-day lives, the ability to automatically estimate them accurately and reliably from face images is still far from meeting the needs of commercial applications. This is particularly perplexing when considering recent claims to super-human capabilities in the related task of face recognition.

Past approaches to estimating or classifying these attributes from face images have relied on differences in facial feature dimensions or “tailored” face descriptors. Most have employed classification schemes designed particularly for age or gender estimation tasks. Few of these past methods were designed to handle the many challenges of unconstrained imaging conditions. Moreover, the machine learning methods employed by these systems did not fully exploit the massive numbers of image examples and data available through the Internet in order to improve classification capabilities.

The authors attempt to close the gap between automatic face recognition capabilities and those of age and gender estimation methods. To this end, they follow the successful example laid down by recent face recognition systems: face recognition techniques described in the last few years have shown that tremendous progress can be made by the use of deep convolutional neural networks (CNN). They demonstrate similar gains with a simple network architecture, designed by considering the rather limited availability of accurate age and gender labels in existing face data sets.

They test the network on the newly released Adience benchmark for age and gender classification of unfiltered face images and show that, despite the very challenging nature of the images in the Adience set and the simplicity of the network design, the method outperforms the existing state of the art by substantial margins. Although these results provide a remarkable baseline for deep-learning-based approaches, they leave room for improvement by more elaborate system designs, suggesting that the problem of accurately estimating age and gender in unconstrained settings, as reflected by the Adience images, remains unsolved. In order to provide a foothold for the development of more effective future methods, the trained models and classification system are made publicly available.

All of these methods have proven effective on small and/or constrained benchmarks for age estimation. To their knowledge, the best performing methods were demonstrated on the Group Photos benchmark, where state-of-the-art performance was obtained by employing LBP descriptor variations and a dropout-SVM classifier. The proposed method outperforms the results they report on the more challenging Adience benchmark, designed for the same task.

Gathering a large, labeled image training set for age and gender estimation from social image repositories requires either access to personal information on the subjects appearing in the images (their birth date and gender), which is often private, or tedious and time-consuming manual labeling.

2. Title: Imagenet Classification with Deep Convolutional Neural Networks
(2012)

Author: A. Krizhevsky, I. Sutskever, and G. E. Hinton

The authors trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, they achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, they used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers they employed a recently developed regularization method called “dropout” that proved to be very effective. They also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

Current approaches to object recognition make essential use of machine learning methods. To improve their performance, one can collect larger datasets, learn more powerful models, and use better techniques for preventing overfitting. Until recently, datasets of labeled images were relatively small — on the order of tens of thousands of images (e.g., NORB, Caltech-101/256, and CIFAR-10/100). Simple recognition tasks can be solved quite well with datasets of this size, especially if they are augmented with label-preserving transformations. For example, the current best error rate on the MNIST digit-recognition task (<0.3%) approaches human performance. But objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al.), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include LabelMe, which consists of hundreds of thousands of fully-segmented images, and ImageNet, which consists of over 15 million labeled high-resolution images in over 22,000 categories.

To learn about thousands of objects from millions of images, we need a model with a large learning capacity. However, the immense complexity of the object recognition task means that this problem cannot be specified even by a dataset as large as ImageNet, so the model should also have lots of prior knowledge to compensate for all the data we do not have. Convolutional neural networks (CNNs) constitute one such class of models. Their capacity can be controlled by varying their depth and breadth, and they also make strong and mostly correct assumptions about the nature of images (namely, stationarity of statistics and locality of pixel dependencies). Thus, compared to standard feed-forward neural networks with similarly-sized layers, CNNs have far fewer connections and parameters and so they are easier to train, while their theoretically best performance is likely to be only slightly worse.

Despite the attractive qualities of CNNs, and despite the relative efficiency
of their local architecture, they have still been prohibitively expensive to apply in
large scale to high-resolution images. Luckily, current GPUs, paired with a highly-
optimized implementation of 2D convolution, are powerful enough to facilitate the
training of interestingly-large CNNs, and recent datasets such as ImageNet contain
enough labeled examples to train such models without severe overfitting.

3. Title: Method for performing image based regression using boosting (2010)

Author: Zhou, S. K., Georgescu, B., Zhou, X., & Comaniciu, D.

The authors present a general algorithm for image-based regression (IBR) that is applicable to many vision problems. The proposed regressor, which targets a multiple-output setting, is learned using a boosting method. They formulate the multiple-output regression problem in such a way that overfitting is decreased and an analytic solution is admitted. Because the image is represented via a set of highly redundant Haar-like features that can be evaluated very quickly, and relevant features are selected through boosting to absorb the knowledge of the training data, no storage of the training data is required during testing and the regression function is evaluated almost instantly. They propose an efficient training algorithm that breaks the computational bottleneck in the greedy feature selection process. They validate the efficiency of the proposed regressor on three challenging tasks, age estimation, tumor detection, and endocardial wall localization, and achieve the best performance with dramatic speed, e.g., more than 1000 times faster than conventional data-driven techniques such as the support vector regressor in the endocardial wall localization experiment.

The problem of IBR is defined as follows: given an image x, we are interested in inferring an entity y(x) that is associated with the image x. The meaning of y(x) varies a lot across applications. For example, it could be a feature characterizing the image (e.g., the human age in problem A), a parameter related to the image (e.g., the position and anisotropic spread of the tumor in problem B), or another meaningful quantity (e.g., the location of the endocardial wall in problem C).

IBR is an emerging challenge in the vision literature. In the article of Wang et al., support vector regression was employed to infer a shape deformation parameter. In a recent work, Agarwal and Triggs used relevance vector regression to estimate a 3D human pose from silhouettes. However, in the above two works, the inputs to the regressors are not the images themselves but pre-processed entities, e.g., landmark locations and shape context descriptors.

Numerous algorithms have been proposed in the machine learning literature to attack the regression problem in general, and data-driven approaches have gained prevalence. Popular data-driven regression approaches include non-parametric kernel regression (NPR), linear methods and their nonlinear kernel variants such as kernel ridge regression (KRR) and support vector regression (SVR), which the paper briefly reviews. However, it is often difficult or inefficient to directly apply them to vision applications due to several challenges.

The authors formulate the multiple-output regression problem in such a way that an analytic solution is allowed at each round of boosting. No decoupling of the output dimensions is performed. They also decrease overfitting using an image-based regularization term that has an interpretation as prior knowledge and likewise allows an analytic solution. They invoke the boosting framework to perform feature selection so that only relevant local features are preserved, in order to cope with variations in appearance. The use of decision stumps as weak learners also makes the method robust to appearance change, and the Haar-like simple features can be rapidly computed. The training data is not stored: its knowledge is absorbed into the weighting coefficients and the selected feature set. Hence, the regression function can be evaluated almost instantly during testing. They also propose an efficient implementation of boosting training, which is usually a time-consuming process if a truly greedy feature selection procedure is used. In this implementation, features are selected incrementally over the dimensions of the output variable.

4. Title: Performance and Scalability of GPU-Based Convolutional Neural
Networks (2010)

Author: D. Strigl, K. Kofler, and S. Podlipnig

The authors present the implementation of a framework for accelerating training and classification of arbitrary Convolutional Neural Networks (CNNs) on the GPU. CNNs are a derivative of standard Multilayer Perceptron (MLP) neural networks optimized for two-dimensional pattern recognition problems such as Optical Character Recognition (OCR) or face detection. They describe the basic parts of a CNN and demonstrate the performance and scalability improvement that can be achieved by shifting the computation-intensive tasks of a CNN to the GPU. Depending on the network topology, training and classification on the GPU performs 2 to 24 times faster than on the CPU. Furthermore, the GPU version scales much better than the CPU implementation with respect to the network size.

The biggest drawback of CNNs, besides a complex implementation, is the long training time. Since CNN training is very compute- and data-intensive, training with large data sets may take several days or weeks. The huge number of floating point operations and the relatively low data transfer in every training step make this task well suited for GPGPU (General Purpose GPU) computation on current Graphics Processing Units (GPUs). The main advantage of GPUs over CPUs is the high computational throughput at relatively low cost, achieved through their massively parallel architecture.

In contrast to other classifiers like Support Vector Machines (SVMs), for which several parallel implementations for CPUs and GPUs exist, similar efforts for CNNs were missing. Therefore, the authors implemented a high-performance library in CUDA to perform fast training and classification of CNNs on the GPU. Their goal was to demonstrate the performance.

The traditional approach for two-dimensional pattern recognition is based on


a feature extractor, the output of which is fed into a neural network. This feature
extractor is usually static, independent of the neural network and not part of the
training procedure. It is not an easy task to find a “good” feature extractor because
it is not part of the training procedure and therefore it can neither adapt to the
network topology nor to the parameters generated by the training procedure of the
neural network.

The convolutional layers are the core of any CNN. A convolutional layer
consists of several two-dimensional planes of neurons, the so-called feature maps.
Each neuron of a feature map is connected to a small subset of neurons inside the
feature maps of the previous layer, the so-called receptive fields. The receptive
fields of neighboring neurons overlap and the weights of these receptive fields are
shared through all the neurons of the same feature map. The feature maps of a
convolutional layer and its preceding layer are either fully or partially connected
(either in a predefined way or in a randomized manner).

The relatively low amount of data to transfer to the GPU for every pattern
and the big matrices that have to be handled inside the network seem to be
appropriate for GPGPU processing. Furthermore, experiments showed that the
GPU implementation scales much better than the CPU implementations with
increasing network size.

5. Title: Multi-column Deep Neural Networks for Image Classification (2012)

Author: D. Cireşan, U. Meier, and J. Schmidhuber

Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs pre-processed in different ways, and their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, this method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. It also improves the state of the art on a plethora of common image classification benchmarks.

Recent publications suggest that unsupervised pre-training of deep, hierarchical neural networks improves supervised pattern classification. Here the authors train such nets by simple online back-propagation, setting new, greatly improved records on the MNIST, Latin letters, Chinese characters, traffic signs, NORB (jittered, cluttered) and CIFAR10 benchmarks.

The work focuses on deep convolutional neural networks (DNN), which have been introduced, improved, refined and simplified over time. Lately, DNNs have proved their mettle on datasets ranging from handwritten digits (MNIST) and handwritten characters to 3D toys (NORB) and faces. DNNs fully unfold their potential when they are wide (many maps per layer) and deep (many layers). But training them requires weeks, months, even years on CPUs. High data transfer latency prevents multi-threading and multi-CPU code from saving the situation.

Carefully designed GPU code for image classification can be up to two orders of magnitude faster than its CPU counterpart. Hence, to train huge DNNs in hours or days, the authors implement them on GPUs, building upon earlier work. The training algorithm is fully online, i.e. weight updates occur after each error back-propagation step. They show that properly trained wide and deep DNNs can outperform all previous methods, and demonstrate that unsupervised initialization/pre-training is not necessary (although they do not deny that it might help sometimes, especially for datasets with few samples per class). They also show that combining several DNN columns into a Multi-column DNN (MCDNN) further decreases the error rate by 30-40%.

They evaluate the architecture on various commonly used object recognition benchmarks and improve the state of the art on all of them. The description of the DNN architecture used for the various experiments is given in the following way: 2x48x48-100C5-MP2-100C5-MP2-100C4-MP2-300N-100N-6N represents a net with 2 input images of size 48x48, a convolutional layer with 100 maps and 5x5 filters, a max-pooling layer over non-overlapping regions of size 2x2, a convolutional layer with 100 maps and 4x4 filters, a max-pooling layer over non-overlapping regions of size 2x2, a fully connected layer with 300 hidden units, a fully connected layer with 100 hidden units and a fully connected output layer with 6 neurons (one per class). A scaled hyperbolic tangent activation function is used for the convolutional and fully connected layers, a linear activation function for the max-pooling layers and a softmax activation function for the output layer. All DNNs are trained using online gradient descent with an annealed learning rate.

CHAPTER 3

PROBLEM DEFINITION

3.1 EXISTING SYSTEM

The existing approach reviews related methods for age and gender classification and provides a cursory overview of deep convolutional networks. In age classification, the problem of automatically extracting age-related attributes from facial images has received increasing attention in recent years and many methods have been put forth. Note that despite the focus here on age group classification rather than precise age estimation (i.e., age regression), the survey below includes methods designed for either task. Early methods for age estimation are based on calculating ratios between different measurements of facial features. Once facial features (e.g. eyes, nose, mouth, chin) are localized and their sizes and distances measured, ratios between them are calculated and used for classifying the face into different age categories according to hand-crafted rules. For gender classification, detailed surveys of methods, both earlier and more recent, are available; here we quickly survey relevant methods. One of the early methods for gender classification used a neural network trained on a small set of near-frontal face images. One of the first applications of convolutional neural networks (CNN) is perhaps the LeNet-5 network described for optical character recognition. Compared to modern deep CNNs, that network was relatively modest due to the limited computational resources of the time and the algorithmic challenges of training bigger networks. Deep CNNs have additionally been successfully applied to applications including human pose estimation, face parsing, facial keypoint detection, speech recognition and action classification.
Disadvantages:

▪ The computational complexity of feature extraction is high.

▪ Lower accuracy.

3.2 PROPOSED SYSTEM

In the proposed system, a single architecture is used throughout the experiments for both age and gender classification. Prediction running times can conceivably be substantially improved by running the network on image batches. We test the accuracy of the CNN design using the recently released dataset designed for age and gender classification. We emphasize that the same network architecture is used for all test folds of the benchmark and, in fact, for both gender and age classification tasks. This is done in order to ensure the validity of the results across folds, but also to demonstrate the generality of the network design proposed here; the same architecture performs well across different, related problems. For age classification, we measure and compare both the accuracy when the algorithm gives the exact age-group classification and the accuracy when the algorithm is off by one adjacent age group (i.e., the subject belongs to the group immediately older or immediately younger than the predicted group). This follows others who have done so in the past, and reflects the uncertainty inherent to the task: facial features often change very little between the oldest faces in one age class and the youngest faces of the subsequent class. A related study used the same gender classification pipeline applied to more effectively aligned faces; faces in their tests were synthetically modified to appear facing forward.

Advantages:

▪ The background will be extracted separately.

▪ The classification/recognition accuracy is improved.
3.3 MODULES DESCRIPTION

Input Image:

The first stage of any vision system is the image acquisition stage. Image acquisition is the digitization and storage of an image. After the image has been obtained, various methods of processing can be applied to it to perform the many different vision tasks required today. First, the input image is captured from a source file using the uigetfile and imread functions. However, if the image has not been acquired satisfactorily then the intended tasks may not be achievable, even with the aid of some form of image enhancement.

Face Region Detection:

Face detection is a challenging task, and several approaches have been proposed for it. Some approaches are only good for one face per image, while others can detect multiple faces in an image at a greater price in terms of training. We present an approach that can be used for single or multiple face detection in simple or cluttered scenes. Faces of different sizes located in any part of an image can be detected using the Viola-Jones approach. The Viola-Jones algorithm is a widely used mechanism for object detection. The main property of this algorithm is that training is slow, but detection is fast. The algorithm uses Haar basis feature filters, so it does not use multiplications. The efficiency of the Viola-Jones algorithm can be significantly increased by first generating the integral image.

Detection happens inside a detection window. A minimum and maximum window size is chosen, and for each size a sliding step size is chosen. Then the detection window is moved across the image as follows (a minimal code sketch is given after these steps):

1. Set the minimum window size, and the sliding step corresponding to that size.

2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set of N face recognition filters is applied. If one filter gives a positive answer, a face is detected in the current window.

3. If the size of the window is the maximum size, stop the procedure. Otherwise increase the size of the window and the corresponding sliding step to the next chosen size and go to step 2.
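The sketch referred to above: a minimal Viola-Jones detection step using OpenCV's pretrained frontal-face Haar cascade. The window scaling and sliding are handled internally by detectMultiScale; the file name and parameter values are illustrative assumptions, not those used in this project.

# Viola-Jones face detection with OpenCV's bundled Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")                    # placeholder input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                 minSize=(30, 30))
for (x, y, w, h) in faces:                       # one rectangle per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("faces_detected.jpg", img)
print("faces found:", len(faces))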
Key-point Features:

Image features are small patches that are useful for computing similarities between images. An image feature is usually composed of a feature keypoint and a feature descriptor. The keypoint usually contains the patch's 2D position and other information if available, such as the scale and orientation of the image feature. The descriptor contains the visual description of the patch and is used to compare the similarity between image features.
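As an illustration of keypoints and descriptors, the following Python sketch extracts ORB features from two face images and matches their descriptors. The report does not name a specific detector, so ORB is an assumption here, and the file names are placeholders.

# Keypoint + descriptor extraction and matching with ORB (OpenCV).
import cv2

orb = cv2.ORB_create()
img1 = cv2.imread("face_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("face_b.jpg", cv2.IMREAD_GRAYSCALE)

kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints: position, scale, orientation
kp2, des2 = orb.detectAndCompute(img2, None)   # descriptors: visual description of each patch

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print("matches found (smaller distance = more similar):", len(matches))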

VGG16 Classification:

VGG-16 is a convolutional neural network that is 16 layers deep. You can load a pretrained version of the network trained on more than a million images from the ImageNet database. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. You can use classify to classify new images using the VGG-16 network: follow the steps of Classify Image Using GoogLeNet and replace GoogLeNet with VGG-16. To retrain the network on a new classification task, follow the steps of Train Deep Learning Network to Classify New Images and load VGG-16 instead of GoogLeNet.

net = vgg16 returns a VGG-16 network trained on the ImageNet data set. This function requires the Deep Learning Toolbox™ Model for VGG-16 Network support package. If this support package is not installed, then the function provides a download link.

net = vgg16('Weights','imagenet') returns a VGG-16 network trained on the ImageNet data set. This syntax is equivalent to net = vgg16.

layers = vgg16('Weights','none') returns the untrained VGG-16 network architecture. The untrained model does not require the support package.
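The vgg16 calls above follow the MATLAB Deep Learning Toolbox interface. Since the tool chain specified in Section 4.1 is Python (Anaconda/Spyder), a rough Keras equivalent of loading a pretrained VGG-16 and retraining it for a new classification task might look as follows; the number of output classes, the added layers and the commented training call are illustrative assumptions, not this project's actual code.

# Transfer learning with a pretrained VGG-16 in Keras (sketch).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # keep the pretrained convolutional features

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(8, activation="softmax"),  # e.g. 8 age groups (assumed class count)
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(train_images, train_labels, epochs=10)  # with a prepared dataset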

Performance Analysis:

Accuracy: It is the closeness of the measurements to a specific value.

Precision: It is a description of random errors, a measure of statistical variability.

Recall: It is the fraction of the total amount of relevant instances that were actually retrieved.

Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as classification functions (a short computational sketch follows these definitions):

o Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).

o Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
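The following short Python sketch computes the measures defined above from a binary confusion matrix; the example labels and predictions are made up purely for illustration.

# Accuracy, precision, recall/sensitivity and specificity from binary labels.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # 1 = positive class (e.g. "male")
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))
tn = np.sum((y_true == 0) & (y_pred == 0))
fp = np.sum((y_true == 0) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))

accuracy    = (tp + tn) / len(y_true)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # recall = sensitivity = true positive rate
specificity = tn / (tn + fp)          # true negative rate

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")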

CHAPTER 4

IMPLEMENTATION AND METHODOLOGY

4.1. SYSTEM SPECIFICATION

Software Requirements

• O/S : Windows 8.1

• Language : Python

• IDE : Anaconda (Spyder)

Hardware Requirements

• System : Pentium IV, 2.4 GHz

• Hard Disk : 1000 GB

• Monitor : 15-inch VGA colour

• Mouse : Logitech

• Keyboard : 110-key enhanced

4.2.SOFTWARE DESCRIPTION

Python is a general-purpose interpreted, interactive, object-oriented, high-level programming language. It was created by Guido van Rossum during 1985-1990. Like Perl, Python source code is also available under the GNU General Public License (GPL). This section gives enough understanding of the Python programming language.

Python is a popular programming language. It was created in 1991 by Guido


van Rossum.

It is used for:

• web development (server-side),


• software development,
• mathematics,
• System scripting.

Python can be used on a server to create web applications. Python can be used
alongside software to create workflows. Python can connect to database
systems. It can also read and modify files. Python can be used to handle big data
and perform complex mathematics. Python can be used for rapid prototyping, or
for production-ready software development.

Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.). Python has a simple syntax similar to the English language, and its syntax allows developers to write programs with fewer lines than some other programming languages.

Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick. Python can be treated in a procedural way, an object-oriented way or a functional way.

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.
• Python is Interactive − you can actually sit at a Python prompt and interact with the interpreter directly to write your programs.
• Python is Object-Oriented − Python supports an object-oriented style or technique of programming that encapsulates code within objects.
• Python is a Beginner's Language − Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.

History of Python

Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C+
+, Algol-68, SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted. Like Perl, Python source code is now available under the GNU General Public License (GPL).

Python is now maintained by a core development team at the institute, although


Guido van Rossum still holds a vital role in directing its progress.

Python Features

Python's features include −

• Easy-to-learn − Python has few keywords, simple structure, and a


clearly defined syntax. This allows the student to pick up the language
quickly.
• Easy-to-read − Python code is more clearly defined and visible to the
eyes.
• Easy-to-maintain − Python's source code is fairly easy-to-maintain.
• A broad standard library − Python's bulk of the library is very portable
and cross-platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode − Python has support for an interactive mode which
allows interactive testing and debugging of snippets of code.
• Portable − Python can run on a wide variety of hardware platforms and
has the same interface on all platforms.
• Extendable − you can add low-level modules to the Python interpreter.
These modules enable programmers to add to or customize their tools to
be more efficient.
• Databases − Python provides interfaces to all major commercial
databases.

• GUI Programming − Python supports GUI applications that can be created and ported to many system calls, libraries and windowing systems, such as Windows MFC, Macintosh, and the X Window system of Unix.
• Scalable − Python provides a better structure and support for large programs than shell scripting.

Apart from the above-mentioned features, Python has a big list of good features; a few of them are listed below −

• It supports functional and structured programming methods as well as
OOP.
• It can be used as a scripting language or can be compiled to byte-code for
building large applications.
• It provides very high-level dynamic data types and supports dynamic type
checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
Java.

Anaconda is the most popular Python data science platform.

Anaconda Distribution

With over 6 million users, the open source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows, and Mac OS X. It's the industry standard for developing, testing, and training on a single machine.

Anaconda Enterprise

Anaconda Enterprise is an AI/ML enablement platform that empowers organizations to develop, govern, and automate AI/ML and data science from laptop through training to production. It lets organizations scale from individual data scientists to collaborative teams of thousands, and to go from a single server to thousands of nodes for model training and deployment.

FIGURE 4.1. Anaconda distribution
Anaconda Data Science Libraries
Over 1,400 Anaconda-curated and community data science packages:
• Develop data science projects using your favourite analysis tools
• Analyse data with scalability and performance with Dask, NumPy, pandas, and Numba
• Visualize data with Matplotlib, Bokeh, Datashader, and HoloViews
• Create machine learning and deep learning models with scikit-learn, TensorFlow, H2O, and Theano

FIGURE 4.2. Data science libraries

Conda, the Data Science Package & Environment Manager

• Automatically manages all packages, including cross-language dependencies
• Works across all platforms: Linux, macOS, Windows
• Create virtual environments
• Download conda packages from Anaconda, Anaconda Enterprise, Conda-Forge, and Anaconda Cloud

FIGURE 4.3. Conda packages

Anaconda Navigator, the Desktop Portal to Data Science

• Install and launch applications and editors including Jupyter, RStudio, Visual Studio Code, and Spyder
• Manage local environments and data science projects from a graphical interface
• Connect to Anaconda Cloud or Anaconda Enterprise
• Access the latest learning and community resources

FIGURE 4.4. Anaconda Navigator

Spyder

Spyder is an open source cross-platform integrated development environment (IDE) for scientific programming in the Python language. Initially created and developed by Pierre Raybaut in 2009, since 2012 Spyder has been maintained and continuously improved by a team of scientific Python developers and the community. The free, open-source Spyder IDE is strongly recommended for scientific and engineering programming, due to its integrated editor, interpreter console, and debugging tools. Spyder is included in Anaconda and other distributions.
Spyder is a powerful scientific environment written in Python, for Python, and
designed by and for scientists, engineers and data analysts. It offers a unique
combination of the advanced editing, analysis, debugging, and profiling
functionality of a comprehensive development tool with the data exploration,
interactive execution, deep inspection, and beautiful visualization capabilities of a
scientific package.
Beyond its many built-in features, its abilities can be extended even further via its
plugin system and API. Furthermore, Spyder can also be used as a PyQt5 extension
library, allowing developers to build upon its functionality and embed its
components, such as the interactive console, in their own PyQt software.

Editor
Work efficiently in a multi-language editor with a function/class browser, code analysis tools, automatic code completion, horizontal/vertical splitting, and go-to-definition.

IPython Console
Harness the power of as many IPython consoles as you like within the flexibility of a full GUI interface; run code by line, cell, or file; and render plots right inline.

Variable Explorer
Interact with and modify variables on the fly: plot a histogram or time series, edit a DataFrame or NumPy array, sort a collection, dig into nested objects, and more!

Profiler
Find and eliminate bottlenecks to unchain your code's performance.

Debugger
Trace each step of your code's execution interactively.

Help
Instantly view any object's docs, and render your own.

4.4.ALGORITHM DESCRIPTION
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual computer vision competition. Each year, teams compete on two tasks. The first is to detect objects within an image coming from 200 classes, which is called object localization. The second is to classify images, each labelled with one of 1000 categories, which is called image classification. VGG16 is a convolutional neural network (CNN) architecture which was used to win the ILSVRC (ImageNet) competition. It is considered to be one of the excellent vision model architectures to date. The most unique thing about VGG16 is that, instead of relying on a large number of hyper-parameters, it uses a uniform stack of small convolution filters throughout the network. ImageNet is one of the largest datasets available: it has 14 million hand-annotated images describing what is in each picture. A pretrained version of the network, trained on more than a million images from the ImageNet database, can be loaded. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224×224. The input to the first convolutional layer is a fixed-size 224×224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters are used with a very small receptive field: 3×3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations, it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2. This arrangement of convolution and max-pool layers is followed consistently throughout the whole architecture. At the end there are two fully connected (FC) layers followed by a softmax output layer; in other words, three FC layers follow the stack of convolutional layers (which has a different depth in different architectures), and the configuration of the fully connected layers is the same in all networks. The 16 in VGG16 refers to the fact that it has 16 layers with weights. This is a pretty large network, with about 138 million (approx.) parameters. All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalisation (LRN); such normalisation does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
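
As an illustration of how such a pretrained VGG16 backbone can be adapted to the age and gender tasks, a minimal sketch using the Keras API is given below (assuming TensorFlow/Keras is available; the head size and the eight age groups are illustrative assumptions, not the exact configuration used in this work):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

# Load the VGG16 convolutional base pretrained on ImageNet, without its 1000-class head
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained filters; only the new head is trained

# New classification head: 2 gender classes and, for example, 8 age groups (illustrative)
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
gender_output = Dense(2, activation='softmax', name='gender')(x)
age_output = Dense(8, activation='softmax', name='age')(x)

model = Model(inputs=base.input, outputs=[gender_output, age_output])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

Freezing the convolutional base keeps the rich ImageNet feature representations described above, while only the small fully connected head is learned from the face images.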

CHAPTER 5

SYSTEM DESIGN

5.1.SYSTEM ARCHITECTURE

Face Image → Face Region Detection → Classification Using VGG16 → Age and Gender Prediction → Performance Analysis

FIGURE 5.1: SYSTEM ARCHITECTURE

5.2.FLOW DIAGRAM

Face Image → Face Region Detection → Label Creation for Age and Gender → VGG16 (Convolutional → Maxpool → Fully Connected → Softmax) → Prediction Result

FIGURE 5.2: FLOW DIAGRAM
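
A minimal sketch of this flow is shown below, assuming OpenCV's bundled Haar cascade for the face region detection step and the two-output Keras model of the earlier sketch saved under a hypothetical file name (the file names here are placeholders, not artefacts produced by this work):

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Face region detection using OpenCV's bundled frontal-face Haar cascade
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
model = load_model('age_gender_vgg16.h5')      # hypothetical trained model file

image = cv2.imread('test_face.jpg')            # hypothetical input face image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Crop the detected face region and resize it to the VGG16 input size
    face = cv2.resize(image[y:y + h, x:x + w], (224, 224))
    face = np.expand_dims(face.astype('float32') / 255.0, axis=0)
    gender_probs, age_probs = model.predict(face)
    print('Gender class:', gender_probs.argmax(), 'Age class:', age_probs.argmax())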

CHAPTER 6

EXPERIMENTAL RESULT AND ANALYSIS

6.1.SOURCE CODE:
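The listing below implements scale-space keypoint detection and descriptor generation (a SIFT-style feature extractor built on NumPy and OpenCV), which is used to describe facial image patches and compare the similarity between image features before classification.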
from numpy import all, any, array, arctan2, cos, sin, exp, dot, log, logical_and, roll, sqrt, stack, trace, unravel_index, pi, deg2rad, rad2deg, where, zeros, floor, full, nan, isnan, round, float32
from numpy.linalg import det, lstsq, norm
from cv2 import resize, GaussianBlur, subtract, KeyPoint, INTER_LINEAR, INTER_NEAREST
from functools import cmp_to_key
import logging

logger = logging.getLogger(__name__)
float_tolerance = 1e-7

def computeKeypointsAndDescriptors(image, sigma=1.6, num_intervals=3, assumed_blur=0.5, image_border_width=5):
    image = image.astype('float32')
    base_image = generateBaseImage(image, sigma, assumed_blur)
    num_octaves = computeNumberOfOctaves(base_image.shape)
    gaussian_kernels = generateGaussianKernels(sigma, num_intervals)
    gaussian_images = generateGaussianImages(base_image, num_octaves, gaussian_kernels)
    dog_images = generateDoGImages(gaussian_images)
    keypoints = findScaleSpaceExtrema(gaussian_images, dog_images, num_intervals, sigma, image_border_width)
    keypoints = removeDuplicateKeypoints(keypoints)
    keypoints = convertKeypointsToInputImageSize(keypoints)
    descriptors = generateDescriptors(keypoints, gaussian_images)
    return keypoints, descriptors
def generateBaseImage(image, sigma, assumed_blur):
    logger.debug('Generating base image...')
    image = resize(image, (0, 0), fx=2, fy=2, interpolation=INTER_LINEAR)
    sigma_diff = sqrt(max((sigma ** 2) - ((2 * assumed_blur) ** 2), 0.01))
    return GaussianBlur(image, (0, 0), sigmaX=sigma_diff, sigmaY=sigma_diff)  # the image blur is now sigma instead of assumed_blur

def computeNumberOfOctaves(image_shape):
    return int(round(log(min(image_shape)) / log(2) - 1))

def generateGaussianKernels(sigma, num_intervals):
    logger.debug('Generating scales...')
    num_images_per_octave = num_intervals + 3
    k = 2 ** (1. / num_intervals)
    gaussian_kernels = zeros(num_images_per_octave)  # scale of gaussian blur necessary to go from one blur scale to the next within an octave
    gaussian_kernels[0] = sigma
    for image_index in range(1, num_images_per_octave):
        sigma_previous = (k ** (image_index - 1)) * sigma
        sigma_total = k * sigma_previous
        gaussian_kernels[image_index] = sqrt(sigma_total ** 2 - sigma_previous ** 2)
    return gaussian_kernels

def generateGaussianImages(image, num_octaves, gaussian_kernels):
    logger.debug('Generating Gaussian images...')
    gaussian_images = []
    for octave_index in range(num_octaves):
        gaussian_images_in_octave = []
        gaussian_images_in_octave.append(image)  # first image in octave already has the correct blur
        for gaussian_kernel in gaussian_kernels[1:]:
            image = GaussianBlur(image, (0, 0), sigmaX=gaussian_kernel, sigmaY=gaussian_kernel)
            gaussian_images_in_octave.append(image)
        gaussian_images.append(gaussian_images_in_octave)
        octave_base = gaussian_images_in_octave[-3]
        image = resize(octave_base, (int(octave_base.shape[1] / 2), int(octave_base.shape[0] / 2)), interpolation=INTER_NEAREST)
    return array(gaussian_images, dtype=object)  # dtype=object because images differ in size across octaves

def generateDoGImages(gaussian_images):
    logger.debug('Generating Difference-of-Gaussian images...')
    dog_images = []
    for gaussian_images_in_octave in gaussian_images:
        dog_images_in_octave = []
        for first_image, second_image in zip(gaussian_images_in_octave, gaussian_images_in_octave[1:]):
            dog_images_in_octave.append(subtract(second_image, first_image))  # ordinary subtraction will not work because the images are unsigned integers
        dog_images.append(dog_images_in_octave)
    return array(dog_images, dtype=object)  # dtype=object because images differ in size across octaves

###############################
# Scale-space extrema related #
###############################

def findScaleSpaceExtrema(gaussian_images, dog_images, num_intervals, sigma, image_border_width, contrast_threshold=0.04):
    logger.debug('Finding scale-space extrema...')
    threshold = floor(0.5 * contrast_threshold / num_intervals * 255)  # from OpenCV implementation
    keypoints = []
    for octave_index, dog_images_in_octave in enumerate(dog_images):
        for image_index, (first_image, second_image, third_image) in enumerate(zip(dog_images_in_octave, dog_images_in_octave[1:], dog_images_in_octave[2:])):
            # (i, j) is the center of the 3x3 array
            for i in range(image_border_width, first_image.shape[0] - image_border_width):
                for j in range(image_border_width, first_image.shape[1] - image_border_width):
                    if isPixelAnExtremum(first_image[i-1:i+2, j-1:j+2], second_image[i-1:i+2, j-1:j+2], third_image[i-1:i+2, j-1:j+2], threshold):
                        localization_result = localizeExtremumViaQuadraticFit(i, j, image_index + 1, octave_index, num_intervals, dog_images_in_octave, sigma, contrast_threshold, image_border_width)
                        if localization_result is not None:
                            keypoint, localized_image_index = localization_result
                            keypoints_with_orientations = computeKeypointsWithOrientations(keypoint, octave_index, gaussian_images[octave_index][localized_image_index])
                            for keypoint_with_orientation in keypoints_with_orientations:
                                keypoints.append(keypoint_with_orientation)
    return keypoints

def isPixelAnExtremum(first_subimage, second_subimage, third_subimage, threshold):
    center_pixel_value = second_subimage[1, 1]
    if abs(center_pixel_value) > threshold:
        if center_pixel_value > 0:
            return all(center_pixel_value >= first_subimage) and \
                   all(center_pixel_value >= third_subimage) and \
                   all(center_pixel_value >= second_subimage[0, :]) and \
                   all(center_pixel_value >= second_subimage[2, :]) and \
                   center_pixel_value >= second_subimage[1, 0] and \
                   center_pixel_value >= second_subimage[1, 2]
        elif center_pixel_value < 0:
            return all(center_pixel_value <= first_subimage) and \
                   all(center_pixel_value <= third_subimage) and \
                   all(center_pixel_value <= second_subimage[0, :]) and \
                   all(center_pixel_value <= second_subimage[2, :]) and \
                   center_pixel_value <= second_subimage[1, 0] and \
                   center_pixel_value <= second_subimage[1, 2]
    return False

def localizeExtremumViaQuadraticFit(i, j, image_index, octave_index, num_intervals, dog_images_in_octave, sigma, contrast_threshold, image_border_width, eigenvalue_ratio=10, num_attempts_until_convergence=5):
    logger.debug('Localizing scale-space extrema...')
    extremum_is_outside_image = False
    image_shape = dog_images_in_octave[0].shape
    for attempt_index in range(num_attempts_until_convergence):
        # need to convert from uint8 to float32 to compute derivatives and need to rescale pixel values to [0, 1] to apply Lowe's thresholds
        first_image, second_image, third_image = dog_images_in_octave[image_index-1:image_index+2]
        pixel_cube = stack([first_image[i-1:i+2, j-1:j+2],
                            second_image[i-1:i+2, j-1:j+2],
                            third_image[i-1:i+2, j-1:j+2]]).astype('float32') / 255.
        gradient = computeGradientAtCenterPixel(pixel_cube)
        hessian = computeHessianAtCenterPixel(pixel_cube)
        extremum_update = -lstsq(hessian, gradient, rcond=None)[0]
        if abs(extremum_update[0]) < 0.5 and abs(extremum_update[1]) < 0.5 and abs(extremum_update[2]) < 0.5:
            break
        j += int(round(extremum_update[0]))
        i += int(round(extremum_update[1]))
        image_index += int(round(extremum_update[2]))
        # make sure the new pixel_cube will lie entirely within the image
        if i < image_border_width or i >= image_shape[0] - image_border_width or j < image_border_width or j >= image_shape[1] - image_border_width or image_index < 1 or image_index > num_intervals:
            extremum_is_outside_image = True
            break
    if extremum_is_outside_image:
        logger.debug('Updated extremum moved outside of image before reaching convergence. Skipping...')
        return None
    if attempt_index >= num_attempts_until_convergence - 1:
        logger.debug('Exceeded maximum number of attempts without reaching convergence for this extremum. Skipping...')
        return None
    functionValueAtUpdatedExtremum = pixel_cube[1, 1, 1] + 0.5 * dot(gradient, extremum_update)
    if abs(functionValueAtUpdatedExtremum) * num_intervals >= contrast_threshold:
        xy_hessian = hessian[:2, :2]
        xy_hessian_trace = trace(xy_hessian)
        xy_hessian_det = det(xy_hessian)
        if xy_hessian_det > 0 and eigenvalue_ratio * (xy_hessian_trace ** 2) < ((eigenvalue_ratio + 1) ** 2) * xy_hessian_det:
            # Contrast check passed -- construct and return OpenCV KeyPoint object
            keypoint = KeyPoint()
            keypoint.pt = ((j + extremum_update[0]) * (2 ** octave_index), (i + extremum_update[1]) * (2 ** octave_index))
            keypoint.octave = octave_index + image_index * (2 ** 8) + int(round((extremum_update[2] + 0.5) * 255)) * (2 ** 16)
            keypoint.size = sigma * (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** (octave_index + 1))  # octave_index + 1 because the input image was doubled
            keypoint.response = abs(functionValueAtUpdatedExtremum)
            return keypoint, image_index
    return None

def computeGradientAtCenterPixel(pixel_array):
    dx = 0.5 * (pixel_array[1, 1, 2] - pixel_array[1, 1, 0])
    dy = 0.5 * (pixel_array[1, 2, 1] - pixel_array[1, 0, 1])
    ds = 0.5 * (pixel_array[2, 1, 1] - pixel_array[0, 1, 1])
    return array([dx, dy, ds])

def computeHessianAtCenterPixel(pixel_array):
    center_pixel_value = pixel_array[1, 1, 1]
    dxx = pixel_array[1, 1, 2] - 2 * center_pixel_value + pixel_array[1, 1, 0]
    dyy = pixel_array[1, 2, 1] - 2 * center_pixel_value + pixel_array[1, 0, 1]
    dss = pixel_array[2, 1, 1] - 2 * center_pixel_value + pixel_array[0, 1, 1]
    dxy = 0.25 * (pixel_array[1, 2, 2] - pixel_array[1, 2, 0] - pixel_array[1, 0, 2] + pixel_array[1, 0, 0])
    dxs = 0.25 * (pixel_array[2, 1, 2] - pixel_array[2, 1, 0] - pixel_array[0, 1, 2] + pixel_array[0, 1, 0])
    dys = 0.25 * (pixel_array[2, 2, 1] - pixel_array[2, 0, 1] - pixel_array[0, 2, 1] + pixel_array[0, 0, 1])
    return array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])

def computeKeypointsWithOrientations(keypoint, octave_index, gaussian_image, radius_factor=3, num_bins=36, peak_ratio=0.8, scale_factor=1.5):
    logger.debug('Computing keypoint orientations...')
    keypoints_with_orientations = []
    image_shape = gaussian_image.shape

    scale = scale_factor * keypoint.size / float32(2 ** (octave_index + 1))  # compare with keypoint.size computation in localizeExtremumViaQuadraticFit()
    radius = int(round(radius_factor * scale))
    weight_factor = -0.5 / (scale ** 2)
    raw_histogram = zeros(num_bins)
    smooth_histogram = zeros(num_bins)

    for i in range(-radius, radius + 1):
        region_y = int(round(keypoint.pt[1] / float32(2 ** octave_index))) + i
        if region_y > 0 and region_y < image_shape[0] - 1:
            for j in range(-radius, radius + 1):
                region_x = int(round(keypoint.pt[0] / float32(2 ** octave_index))) + j
                if region_x > 0 and region_x < image_shape[1] - 1:
                    dx = gaussian_image[region_y, region_x + 1] - gaussian_image[region_y, region_x - 1]
                    dy = gaussian_image[region_y - 1, region_x] - gaussian_image[region_y + 1, region_x]
                    gradient_magnitude = sqrt(dx * dx + dy * dy)
                    gradient_orientation = rad2deg(arctan2(dy, dx))
                    weight = exp(weight_factor * (i ** 2 + j ** 2))  # constant in front of exponential can be dropped because we will find peaks later
                    histogram_index = int(round(gradient_orientation * num_bins / 360.))
                    raw_histogram[histogram_index % num_bins] += weight * gradient_magnitude

    for n in range(num_bins):
        smooth_histogram[n] = (6 * raw_histogram[n] + 4 * (raw_histogram[n - 1] + raw_histogram[(n + 1) % num_bins]) + raw_histogram[n - 2] + raw_histogram[(n + 2) % num_bins]) / 16.
    orientation_max = max(smooth_histogram)
    orientation_peaks = where(logical_and(smooth_histogram > roll(smooth_histogram, 1), smooth_histogram > roll(smooth_histogram, -1)))[0]
    for peak_index in orientation_peaks:
        peak_value = smooth_histogram[peak_index]
        if peak_value >= peak_ratio * orientation_max:
            # Quadratic peak interpolation
            # The interpolation update is given by equation (6.30) in https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html
            left_value = smooth_histogram[(peak_index - 1) % num_bins]
            right_value = smooth_histogram[(peak_index + 1) % num_bins]
            interpolated_peak_index = (peak_index + 0.5 * (left_value - right_value) / (left_value - 2 * peak_value + right_value)) % num_bins
            orientation = 360. - interpolated_peak_index * 360. / num_bins
            if abs(orientation - 360.) < float_tolerance:
                orientation = 0
            new_keypoint = KeyPoint(*keypoint.pt, keypoint.size, orientation, keypoint.response, keypoint.octave)
            keypoints_with_orientations.append(new_keypoint)
    return keypoints_with_orientations

def compareKeypoints(keypoint1, keypoint2):
    if keypoint1.pt[0] != keypoint2.pt[0]:
        return keypoint1.pt[0] - keypoint2.pt[0]
    if keypoint1.pt[1] != keypoint2.pt[1]:
        return keypoint1.pt[1] - keypoint2.pt[1]
    if keypoint1.size != keypoint2.size:
        return keypoint2.size - keypoint1.size
    if keypoint1.angle != keypoint2.angle:
        return keypoint1.angle - keypoint2.angle
    if keypoint1.response != keypoint2.response:
        return keypoint2.response - keypoint1.response
    if keypoint1.octave != keypoint2.octave:
        return keypoint2.octave - keypoint1.octave
    return keypoint2.class_id - keypoint1.class_id

def removeDuplicateKeypoints(keypoints):
    if len(keypoints) < 2:
        return keypoints
    keypoints.sort(key=cmp_to_key(compareKeypoints))
    unique_keypoints = [keypoints[0]]
    for next_keypoint in keypoints[1:]:
        last_unique_keypoint = unique_keypoints[-1]
        if last_unique_keypoint.pt[0] != next_keypoint.pt[0] or \
           last_unique_keypoint.pt[1] != next_keypoint.pt[1] or \
           last_unique_keypoint.size != next_keypoint.size or \
           last_unique_keypoint.angle != next_keypoint.angle:
            unique_keypoints.append(next_keypoint)
    return unique_keypoints

def convertKeypointsToInputImageSize(keypoints):
    converted_keypoints = []
    for keypoint in keypoints:
        keypoint.pt = tuple(0.5 * array(keypoint.pt))
        keypoint.size *= 0.5
        keypoint.octave = (keypoint.octave & ~255) | ((keypoint.octave - 1) & 255)
        converted_keypoints.append(keypoint)
    return converted_keypoints

def unpackOctave(keypoint):
    octave = keypoint.octave & 255
    layer = (keypoint.octave >> 8) & 255
    if octave >= 128:
        octave = octave | -128
    scale = 1 / float32(1 << octave) if octave >= 0 else float32(1 << -octave)
    return octave, layer, scale

def generateDescriptors(keypoints, gaussian_images, window_width=4, num_bins=8, scale_multiplier=3, descriptor_max_value=0.2):
    logger.debug('Generating descriptors...')
    descriptors = []

    for keypoint in keypoints:
        octave, layer, scale = unpackOctave(keypoint)
        gaussian_image = gaussian_images[octave + 1, layer]
        num_rows, num_cols = gaussian_image.shape
        point = round(scale * array(keypoint.pt)).astype('int')
        bins_per_degree = num_bins / 360.
        angle = 360. - keypoint.angle
        cos_angle = cos(deg2rad(angle))
        sin_angle = sin(deg2rad(angle))
        weight_multiplier = -0.5 / ((0.5 * window_width) ** 2)
        row_bin_list = []
        col_bin_list = []
        magnitude_list = []
        orientation_bin_list = []
        histogram_tensor = zeros((window_width + 2, window_width + 2, num_bins))  # first two dimensions are increased by 2 to account for border effects

        # Descriptor window size (described by half_width) follows OpenCV convention
        hist_width = scale_multiplier * 0.5 * scale * keypoint.size
        half_width = int(round(hist_width * sqrt(2) * (window_width + 1) * 0.5))  # sqrt(2) corresponds to diagonal length of a pixel
        half_width = int(min(half_width, sqrt(num_rows ** 2 + num_cols ** 2)))  # ensure half_width lies within image

        for row in range(-half_width, half_width + 1):
            for col in range(-half_width, half_width + 1):
                row_rot = col * sin_angle + row * cos_angle
                col_rot = col * cos_angle - row * sin_angle
                row_bin = (row_rot / hist_width) + 0.5 * window_width - 0.5
                col_bin = (col_rot / hist_width) + 0.5 * window_width - 0.5
                if row_bin > -1 and row_bin < window_width and col_bin > -1 and col_bin < window_width:
                    window_row = int(round(point[1] + row))
                    window_col = int(round(point[0] + col))
                    if window_row > 0 and window_row < num_rows - 1 and window_col > 0 and window_col < num_cols - 1:
                        dx = gaussian_image[window_row, window_col + 1] - gaussian_image[window_row, window_col - 1]
                        dy = gaussian_image[window_row - 1, window_col] - gaussian_image[window_row + 1, window_col]
                        gradient_magnitude = sqrt(dx * dx + dy * dy)
                        gradient_orientation = rad2deg(arctan2(dy, dx)) % 360
                        weight = exp(weight_multiplier * ((row_rot / hist_width) ** 2 + (col_rot / hist_width) ** 2))
                        row_bin_list.append(row_bin)
                        col_bin_list.append(col_bin)
                        magnitude_list.append(weight * gradient_magnitude)
                        orientation_bin_list.append((gradient_orientation - angle) * bins_per_degree)

        for row_bin, col_bin, magnitude, orientation_bin in zip(row_bin_list, col_bin_list, magnitude_list, orientation_bin_list):
            # Smoothing via trilinear interpolation
            # Notation follows https://en.wikipedia.org/wiki/Trilinear_interpolation
            # Note that we are really doing the inverse of trilinear interpolation here (take the center value of the cube and distribute it among its eight neighbors)
            row_bin_floor, col_bin_floor, orientation_bin_floor = floor([row_bin, col_bin, orientation_bin]).astype(int)
            row_fraction, col_fraction, orientation_fraction = row_bin - row_bin_floor, col_bin - col_bin_floor, orientation_bin - orientation_bin_floor
            if orientation_bin_floor < 0:
                orientation_bin_floor += num_bins
            if orientation_bin_floor >= num_bins:
                orientation_bin_floor -= num_bins

            c1 = magnitude * row_fraction
            c0 = magnitude * (1 - row_fraction)
            c11 = c1 * col_fraction
            c10 = c1 * (1 - col_fraction)
            c01 = c0 * col_fraction
            c00 = c0 * (1 - col_fraction)
            c111 = c11 * orientation_fraction
            c110 = c11 * (1 - orientation_fraction)
            c101 = c10 * orientation_fraction
            c100 = c10 * (1 - orientation_fraction)
            c011 = c01 * orientation_fraction
            c010 = c01 * (1 - orientation_fraction)
            c001 = c00 * orientation_fraction
            c000 = c00 * (1 - orientation_fraction)

            histogram_tensor[row_bin_floor + 1, col_bin_floor + 1, orientation_bin_floor] += c000
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 1, (orientation_bin_floor + 1) % num_bins] += c001
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 2, orientation_bin_floor] += c010
            histogram_tensor[row_bin_floor + 1, col_bin_floor + 2, (orientation_bin_floor + 1) % num_bins] += c011
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 1, orientation_bin_floor] += c100
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 1, (orientation_bin_floor + 1) % num_bins] += c101
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, orientation_bin_floor] += c110
            histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, (orientation_bin_floor + 1) % num_bins] += c111

        descriptor_vector = histogram_tensor[1:-1, 1:-1, :].flatten()  # Remove histogram borders
        # Threshold and normalize descriptor_vector
        threshold = norm(descriptor_vector) * descriptor_max_value
        descriptor_vector[descriptor_vector > threshold] = threshold
        descriptor_vector /= max(norm(descriptor_vector), float_tolerance)
        # Multiply by 512, round, and saturate between 0 and 255 to convert from float32 to unsigned char (OpenCV convention)
        descriptor_vector = round(512 * descriptor_vector)
        descriptor_vector[descriptor_vector < 0] = 0
        descriptor_vector[descriptor_vector > 255] = 255
        descriptors.append(descriptor_vector)
    return array(descriptors, dtype='float32')

6.2.SCREENSHOTS

STEP 1: Select the file in the package.

FIGURE 6.1. Image is selected in file

STEP 2: Select the face image to be processed.

FIGURE 6.2. Selected the face image

STEP 3: The layers are trained step by step in the VGG network.

FIGURE 6.3. VGG layer is trained

STEP 4: The model is trained on the VGG layers and the accuracy result is computed.

FIGURE 6.4. Waiting for accuracy result

STEP 5: The predicted result prints an accuracy value of 89.20.

FIGURE 6.5. The face images are converted into layers and trained to obtain the accuracy value

CHAPTER 7
CONCLUSION

We conclude that, first, CNN can be used to provide improved age and gender classification results, even considering the much smaller size of contemporary unconstrained image sets labeled for age and gender. Second, the simplicity of the model implies that more elaborate systems using more training data may well be capable of substantially improving results beyond those reported here.

7.1.FUTURE ENHANCEMENT
For future work, a deeper CNN architecture and a more robust image processing algorithm can be considered for exact age estimation. Also, apparent age estimation of the human face will be an interesting research direction to investigate in the future.

REFERENCES

[1] G. Levi and T. Hassner, "Age and Gender Classification Using Convolutional Neural Networks," IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 34-42, 2015.

[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.

[3] S. K. Zhou, B. Georgescu, X. Zhou, and D. Comaniciu, "Method for Performing Image Based Regression Using Boosting," US Patent US7804999, 2010.

[4] D. Strigl, K. Kofler, and S. Podlipnig, "Performance and Scalability of GPU-Based Convolutional Neural Networks," Euromicro Conference on Parallel, Distributed, and Network-Based Processing, pp. 317-324, 2010.

[5] D. Cireşan, U. Meier, and J. Schmidhuber, "Multi-column Deep Neural Networks for Image Classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2012.

[6] http://www.openu.ac.il/home/hassner/Adience/data.html

[7] http://www.jdl.ac.cn/peal/index.html

