Phase 1
Submitted By
KAVIYAPRIYA.G
(910619405001)
in partial fulfillment for the award of the degree
of
MASTER OF ENGINEERING
in
COMPUTER SCIENCE
BONAFIDE CERTIFICATE
who carried out the work under my supervision. Certified further that to the best of my
knowledge the work reported herein does not form part of any other thesis or
SIGNATURE SIGNATURE
This work builds an age and gender classification system that classifies age and gender with the help of a deep learning framework. Automatic age and gender classification has become relevant to an increasing number of applications, particularly since the rise of social platforms and social media. Nevertheless, the performance of existing methods on real-world images is still significantly lacking, especially when compared to the tremendous leaps in performance recently reported for the related task of face recognition. Keypoint features are extracted from each face image, and the descriptor contains the visual description of the patch and is used to compare the similarity between image features. In this work we show that by learning representations through the use of deep convolutional neural networks (CNN-VGG16), a significant increase in performance can be obtained on these tasks. To this end, we propose a simple convolutional network architecture that can be used even when the amount of learning data is limited. We evaluate our method on a recent dataset for age and gender estimation and show it to dramatically outperform current state-of-the-art methods.
DECLARATION
KAVIYAPRIYA.G
(910619405001)
Dr. S. MIRUNA JOE AMALI, M.E., Ph.D.
ASSOCIATE PROFESSOR
COMPUTER SCIENCE AND ENGINEERING
K.L.N COLLEGE OF ENGINEERING,
AN AUTONOMOUS INSTITUTION,
AFFILIATED TO ANNA UNIVERSITY, CHENNAI 600 025.
TABLE OF CONTENTS

CHAPTER NO.  TITLE  PAGE NO.
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES 7
LIST OF FIGURES 10
LIST OF ABBREVIATIONS 12
1 INTRODUCTION 1
1.1 INTRODUCTION OF AGE AND GENDER 1
1.2.5 Pixel 9
2 LITERATURE SURVEY 15
3 PROBLEM DEFINITION 25
3.1 Existing system 25
4.1 System specification 27
4.2 Software description 27
4.3 Modules description 36
4.4 Algorithm description 40
5 SYSTEM DESIGN 42
5.1 System architecture 42
5.2 Flow chart 43
6 EXPERIMENTAL RESULTS AND ANALYSIS 44
6.1 Source code 44
6.2 Screenshots 60
7 CONCLUSION 64
7.1 Future enhancement 64
8 REFERENCES 65

LIST OF ABBREVIATIONS
ASGD  Asynchronous Stochastic Gradient Descent
CHAPTER 1
INTRODUCTION
As for age classification, Young and Niels first put forward age estimation in 1994, roughly dividing people into three groups: kids, youngsters and the aged. Hayashi studied wrinkle texture and skin analysis based on the Hough transform. Zhou used boosting as regression, and Geng proposed the method of Aging Pattern Subspace. Recently, Guo presented an approach to extract features related to senility via an embedded low-dimensional aging manifold obtained from subspace learning.
GoogLeNet has been used as the training network. GoogLeNet, which contains 22 layers, attained the top accuracy rate in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). GoogLeNet is notable for optimizing the utilization of computing resources, keeping the computational budget constant through a carefully crafted design that increases the depth and width of the network. In order to avoid the overfitting and enormous computation that come from expanding the network architecture, GoogLeNet rationally moves from fully connected to sparsely connected architectures, even inside the convolutions. However, when it comes to numerical computation on non-uniform sparse data structures, computing infrastructures become inefficient. The Inception architecture is therefore used to cluster sparse structures into denser submatrices. This not only maintains the sparsity of the network structure, but also takes full advantage of the high computing performance of dense matrices. GoogLeNet adds auxiliary softmax layers in the middle of the network to prevent the vanishing gradient problem. These softmax layers are used to obtain extra training loss; the gradients calculated from these losses are added to the gradient of the whole network during back-propagation, so this method can be regarded as merging sub-networks of various depths. Thanks to convolution kernel sharing, the computed gradients accumulate, and as a result the gradient does not fade too much.
1.2 DOMAIN EXPLANATION
1.2.1.IMAGE PROCESSING
In an 8-bit greyscale image, each picture element has an assigned intensity that ranges from 0 to 255. A greyscale image is what people normally call a black and white image, but the name emphasizes that such an image will also include many shades of grey.
Figure 1.2 (A): Each pixel has a value from 0 (black) to 255 (white).
The possible range of the pixel values depends on the colour depth of the image, here 8 bit = 256 tones or greyscales.
A normal grayscale image has 8-bit colour depth = 256 grayscales. A "true colour" image has 24-bit colour depth = 8 x 8 x 8 bits = 256 x 256 x 256 colours = ~16 million colours.
Some grayscale images have more grayscales, for instance 16 bit = 65,536 grayscales. In principle, three 16-bit grayscale images can be combined to form an image with 281,474,976,710,656 colour shades.
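For illustration, a short NumPy sketch (not part of the project code) reproduces the counts mentioned above for 8-bit, 24-bit and 48-bit images:

import numpy as np

grey = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # 8-bit greyscale image
print(2 ** 8)        # 256 possible tones per pixel
colour = np.dstack([grey, grey, grey])   # three 8-bit planes form a 24-bit "true colour" image
print(256 ** 3)      # 16,777,216 representable colours
print(2 ** 48)       # 281,474,976,710,656 combinations from three 16-bit planes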
There are two general groups of ‘images’: vector graphics (or line art) and bitmaps
(pixel-based or ‘images’). Some of the most common file formats are:
JPEG— a very efficient (i.e. much information per byte) destructively
compressed 24 bit (16 million colours) bitmap format. Widely used,
especially for web and Internet (bandwidth-limited).
PSD – a dedicated Photoshop format that keeps all the information in an image
including all the layers.
An image can first be improved using Image Enhancement techniques. Once the image is in good condition, the Measurement Extraction operations can be used to obtain useful information from the image. Some examples of Image Enhancement and Measurement Extraction are given below. The examples shown all operate on 256 grey-scale images. This means that each pixel in the image is stored as a number between 0 and 255, where 0 represents a black pixel, 255 represents a white pixel, and values in between represent shades of grey. These operations can be extended to operate on colour images. The examples below represent only a few of the many techniques available for operating on images. Details about the inner workings of the operations have not been given, but some references to books containing this information are given at the end for the interested reader.
An image may be, for example, a photograph, a microphotograph of an electronic component, or the result of medical imaging. Even if the picture is not immediately recognizable, it will not be just a random blur.
Suppose we take an image, a photo, say. For the moment, let us make things easy and suppose the photo is black and white (that is, lots of shades of grey), so no colour. We may consider this image as being a two-dimensional function, where the function values give the brightness of the image at any given point. We may assume that in such an image the brightness values can be any real numbers in the range 0.0 (black) to 1.0 (white).
A digital image differs from a photo in that the values are all discrete. Usually they take on only integer values, with the brightness values ranging from 0 (black) to 255 (white). A digital image can be considered as a large array of discrete dots, each of which has a brightness associated with it. These dots are called picture elements, or more simply pixels. The pixels surrounding a given pixel constitute its neighborhood. A neighborhood can be characterized by its shape in the same way as a matrix: for example, a 3x3 neighborhood. Except in very special circumstances, neighborhoods have odd numbers of rows and columns; this ensures that the current pixel is in the centre of the neighborhood.
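As a small illustrative sketch (the helper name and array below are not part of the project code), a square neighbourhood with odd dimensions can be sliced out of a NumPy array so that the current pixel stays in the centre:

import numpy as np

def neighbourhood(image, row, col, size=3):
    # size must be odd so that pixel (row, col) sits in the centre of the block
    half = size // 2
    return image[row - half:row + half + 1, col - half:col + half + 1]

image = np.arange(49, dtype=np.uint8).reshape(7, 7)
print(neighbourhood(image, 3, 3))           # 3x3 neighbourhood of pixel (3, 3)
print(neighbourhood(image, 3, 3, size=5))   # 5x5 neighbourhood of the same pixel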
1.2.4.1Pixel:
Other pixel shapes and formations can be used, most notably the hexagonal grid, in
which each pixel is a small hexagon. This has some advantages in image processing,
including the fact that pixel connectivity is less ambiguously defined than with a
square grid, but hexagonal grids are not widely used. Part of the reason is that many
image capture systems (e.g. most CCD cameras and scanners) intrinsically discretize
the captured image into a rectangular grid in the first instance.
1.2.4.2.Pixel Connectivity
The notion of pixel connectivity describes a relation between two or more pixels. For two pixels to be connected they have to fulfill certain conditions on the pixel brightness and spatial adjacency.
First, in order for two pixels to be considered connected, their pixel values must both be from the same set of values V. For a grayscale image, V might be any range of graylevels, e.g. V = {22, 23, ..., 40}; for a binary image we simply have V = {1}.
Two pixels p and q, both having values from a set V, are 4-connected if q belongs to the 4-neighborhood of p, and 8-connected if q belongs to the 8-neighborhood of p.
A set of pixels in an image which are all connected to each other is called a
connected component. Finding all connected components in an image and marking
each of them with a distinctive label is called connected component labeling.
An example of a binary image with two connected components which are based on
4-connectivity can be seen in Figure 1. If the connectivity were based on 8-
neighbors, the two connected components would merge into one.
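The effect of the chosen connectivity can be checked with a small sketch using SciPy's ndimage.label (the binary array below is illustrative, not the image of Figure 1):

import numpy as np
from scipy import ndimage

binary = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 1],
                   [0, 0, 0, 0]], dtype=np.uint8)

four_conn = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]])   # 4-connectivity structuring element
labels4, n4 = ndimage.label(binary, structure=four_conn)
labels8, n8 = ndimage.label(binary, structure=np.ones((3, 3)))   # 8-connectivity
print(n4, n8)   # -> 2 1: the two 4-connected components merge under 8-connectivity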
1.2.4.3. Pixel Values
Each of the pixels that represent an image stored inside a computer has a pixel value which describes how bright that pixel is, and/or what colour it should be. In the simplest case of binary images, the pixel value is a 1-bit number indicating either foreground or background. For a grayscale image, the pixel value is a single number that represents the brightness of the pixel. The most common pixel format is the byte image, where this number is stored as an 8-bit integer giving a range of possible values from 0 to 255. Typically zero is taken to be black, and 255 is taken to be white. Values in between make up the different shades of gray.
To represent color images, separate red, green and blue components must be
specified for each pixel (assuming an RGB color space), and so the pixel `value' is
actually a vector of three numbers. Often the three different components are stored
as three separate `grayscale' images known as color planes (one for each of red,
green and blue), which have to be recombined when displaying or processing.
Multispectral Images can contain even more than three components for each pixel,
and by extension these are stored in the same kind of way, as a vector pixel value,
or as separate color planes.
The actual grayscale or color component intensities for each pixel may not actually
be stored explicitly. Often, all that is stored for each pixel is an index into a colour
map in which the actual intensity or colors can be looked up.
Although simple 8-bit integers or vectors of 8-bit integers are the most common
sorts of pixel values used, some image formats support different types of value, for
instance 32-bit signed integers or floating point values. Such values are extremely
useful in image processing as they allow processing to be carried out on the image
where the resulting pixel values are not necessarily 8-bit integers. If this approach
is used then it is usually necessary to set up a color map which relates particular
ranges of pixel values to particular displayed colors.
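A brief NumPy sketch (illustrative only) shows the RGB pixel 'value' as a vector of three numbers, the separate colour planes, and the conversion to floating point values whose results need not stay in the 0-255 range:

import numpy as np

rgb = np.random.randint(0, 256, size=(2, 2, 3), dtype=np.uint8)   # byte-image pixels
red_plane, green_plane, blue_plane = rgb[..., 0], rgb[..., 1], rgb[..., 2]   # colour planes
print(rgb[0, 0])                              # one pixel value: a vector of three 8-bit numbers
as_float = rgb.astype(np.float32) / 255.0     # floating point pixel values
print(as_float.dtype, as_float.min(), as_float.max())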
Figure: Pixels, with a neighborhood.
Color scale
RGB
The RGB color model is an additive color model in which red, green, and blue
light are added together in various ways to reproduce a broad array of colors. RGB
uses additive color mixing and is the basic color model used in television or any
other medium that projects color with light. It is the basic color model used in
computers and for web graphics, but it cannot be used for print production.
The secondary colors of RGB – cyan, magenta, and yellow – are formed by mixing
two of the primary colors (red, green or blue) and excluding the third color. Red
and green combine to make yellow, green and blue to make cyan, and blue and red
form magenta. The combination of red, green, and blue in full intensity makes
white (Figure 1.3).
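These combinations can be verified with a few lines of NumPy (an illustrative sketch, not project code):

import numpy as np

red, green, blue = np.eye(3, dtype=int) * 255   # [255,0,0], [0,255,0], [0,0,255]
print(red + green)          # [255 255   0]  yellow
print(green + blue)         # [  0 255 255]  cyan
print(blue + red)           # [255   0 255]  magenta
print(red + green + blue)   # [255 255 255]  white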
Figure 1.3: The additive model of RGB. Red, green, and blue are the primary
stimuli for human color perception and are the primary additive colours.
Figure 1.4: To see how different RGB components combine, here is a selected repertoire of colours and their respective relative intensities for each of the red, green, and blue components.
CHAPTER 2
LITERATURE SURVEY
1. Title: Age and Gender Classification Using Convolutional Neural Networks (2015)
Author: Levi, G., & Hassner, T.
This work attempts to close the gap between automatic face recognition capabilities and those of age and gender estimation methods. To this end, the authors follow the successful example laid down by recent face recognition systems: face recognition techniques described in the last few years have shown that tremendous progress can be made by the use of deep convolutional neural networks (CNN). They demonstrate similar gains with a simple network architecture, designed by considering the rather limited availability of accurate age and gender labels in existing face data sets.
They test the network on the newly released Adience benchmark for age and gender classification of unfiltered face images and show that, despite the very challenging nature of the images in the Adience set and the simplicity of the network design, the method outperforms the existing state of the art by substantial margins. Although these results provide a remarkable baseline for deep-learning-based approaches, they leave room for improvements by more elaborate system designs, suggesting that the problem of accurately estimating age and gender in unconstrained settings, as reflected by the Adience images, remains unsolved. In order to provide a foothold for the development of more effective future methods, they make the trained models and classification system publicly available.
2. Title: ImageNet Classification with Deep Convolutional Neural Networks (2012)
The authors trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, they achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, they used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers, they employed a recently developed regularization method called "dropout" that proved to be very effective. They also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Objects in realistic settings exhibit considerable variability, so to learn to recognize them it is necessary to use much larger training sets. And indeed, the shortcomings of small image datasets have been widely recognized (e.g., Pinto et al.), but it has only recently become possible to collect labeled datasets with millions of images. The new larger datasets include LabelMe, which consists of hundreds of thousands of fully-segmented images, and ImageNet, which consists of over 15 million labeled high-resolution images in over 22,000 categories.
Despite the attractive qualities of CNNs, and despite the relative efficiency
of their local architecture, they have still been prohibitively expensive to apply in
large scale to high-resolution images. Luckily, current GPUs, paired with a highly-
optimized implementation of 2D convolution, are powerful enough to facilitate the
training of interestingly-large CNNs, and recent datasets such as ImageNet contain
enough labeled examples to train such models without severe overfitting.
3. Title: Method for Performing Image Based Regression Using Boosting (2010)
Author: Zhou, S. K., Georgescu, B., Zhou, X., & Comaniciu, D.
The inputs to the regressors are not images themselves, but rather pre-processed entities, e.g., landmark locations and shape context descriptors.
4. Title: Performance and Scalability of GPU-Based Convolutional Neural Networks (2010)
The authors perform fast training and classification of CNNs on the GPU; their goal was to demonstrate the performance and scalability of this approach.
The convolutional layers are the core of any CNN. A convolutional layer
consists of several two-dimensional planes of neurons, the so-called feature maps.
Each neuron of a feature map is connected to a small subset of neurons inside the
feature maps of the previous layer, the so-called receptive fields. The receptive
fields of neighboring neurons overlap and the weights of these receptive fields are
shared through all the neurons of the same feature map. The feature maps of a
convolutional layer and its preceding layer are either fully or partially connected
(either in a predefined way or in a randomized manner).
The relatively low amount of data to transfer to the GPU for every pattern
and the big matrices that have to be handled inside the network seem to be
appropriate for GPGPU processing. Furthermore, experiments showed that the
GPU implementation scales much better than the CPU implementations with
increasing network size.
5. Title: Multi-Column Deep Neural Networks for Image Classification (2012)
Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs pre-processed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, the method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two, and it improves the state of the art on a plethora of common image classification benchmarks.
Carefully designed GPU code for image classification can be up to two orders of magnitude faster than its CPU counterpart. Hence, to train huge DNNs in hours or days, the authors implement them on GPUs, building upon earlier work. The training algorithm is fully online, i.e. weight updates occur after each error back-propagation step. They show that properly trained wide and deep DNNs can outperform all previous methods, and demonstrate that unsupervised initialization/pretraining is not necessary (although they do not deny that it might help sometimes, especially for datasets with few samples per class). They also show that combining several DNN columns into a Multi-column DNN (MCDNN) further decreases the error rate by 30-40%.
CHAPTER 3
PROBLEM DEFINITION
3.2. PROPOSED SYSTEM
Advantages:
Input Image:
The first stage of any vision system is the image acquisition stage. Image acquisition is the digitization and storage of an image. After the image has been obtained, various methods of processing can be applied to the image to perform the many different vision tasks required today. First, the input image is captured from a source file using the uigetfile and imread functions. However, if the image has not been acquired satisfactorily, then the intended tasks may not be achievable, even with the aid of some form of image enhancement.
Detection happens inside a detection window. A minimum and maximum window size is chosen, and for each size a sliding step size is chosen. Then the detection window is moved across the image as follows (a short sketch of this loop is given after the list):
1. Set the minimum window size, and the sliding step corresponding to that size.
2. For the chosen window size, slide the window vertically and horizontally with the same step. At each step, a set of N face recognition filters is applied. If one filter gives a positive answer, a face is detected in the current window.
3. If the size of the window is the maximum size, stop the procedure. Otherwise increase the size of the window and the corresponding sliding step to the next chosen size, and go to step 2.
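A minimal Python sketch of this multi-scale sliding-window loop is given below; the function and parameter names are illustrative, and the classifiers are assumed to be callables that map an image patch to True (face) or False (non-face):

def sliding_window_detect(image, classifiers, min_size=24, max_size=96,
                          size_scale=1.25, step_fraction=0.1):
    detections = []
    height, width = image.shape[:2]
    window = min_size
    while window <= max_size:
        step = max(1, int(window * step_fraction))   # sliding step tied to the window size
        for top in range(0, height - window + 1, step):
            for left in range(0, width - window + 1, step):
                patch = image[top:top + window, left:left + window]
                # a positive answer from any filter means a face in the current window
                if any(classifier(patch) for classifier in classifiers):
                    detections.append((left, top, window))
        window = int(window * size_scale)   # move to the next window size and repeat
    return detections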
Key-point Features:
Image features are small patches that are useful to compute similarities between images. An image feature is usually composed of a feature keypoint and a feature descriptor. The keypoint usually contains the patch 2D position and, if available, other attributes such as the scale and orientation of the image feature. The descriptor contains the visual description of the patch and is used to compare the similarity between image features.
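For comparison, OpenCV ships a ready-made SIFT detector that returns exactly these two pieces of information (keypoints and descriptors). The snippet below is only an illustration: it assumes an OpenCV build (4.4 or later) that includes SIFT, and 'face.jpg' is a placeholder image path. Chapter 6 lists an equivalent keypoint and descriptor pipeline implemented from scratch.

import cv2

image = cv2.imread('face.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder file name
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

kp = keypoints[0]
print(kp.pt, kp.size, kp.angle)   # keypoint: 2D position, scale and orientation
print(descriptors.shape)          # one 128-dimensional descriptor per keypoint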
VGG16-Classification:
VGG-16 is a convolutional neural network that is 16 layers deep. You can load a pretrained version of the network trained on more than a million images from the ImageNet database. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. You can use classify to classify new images using the VGG-16 network: follow the steps of Classify Image Using GoogLeNet and replace GoogLeNet with VGG-16. To retrain the network on a new classification task, follow the steps of Train Deep Learning Network to Classify New Images and load VGG-16 instead of GoogLeNet.
net = vgg16 returns a VGG-16 network trained on the ImageNet data set. This function requires the Deep Learning Toolbox™ Model for VGG-16 Network support package. If this support package is not installed, then the function provides a download link.
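The description above follows the MATLAB Deep Learning Toolbox interface. Since the implementation language of this project is Python, a roughly equivalent sketch using Keras is shown below; this is an assumption about tooling rather than the toolbox call itself, and 'face.jpg' is a placeholder image path.

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

net = VGG16(weights='imagenet')                 # pretrained on the ImageNet data set

img = image.load_img('face.jpg', target_size=(224, 224))   # VGG-16 input size is 224-by-224
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = net.predict(x)
print(decode_predictions(preds, top=3)[0])      # top 3 of the 1000 ImageNet categories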
Performance Analysis:
Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function:
• Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of positives that are correctly identified as such (e.g., the percentage of sick people who are correctly identified as having the condition).
• Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition).
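Both measures follow directly from the counts of true/false positives and negatives; a small sketch with made-up counts is:

def sensitivity_and_specificity(tp, fn, tn, fp):
    sensitivity = tp / (tp + fn)    # true positive rate (recall)
    specificity = tn / (tn + fp)    # true negative rate
    return sensitivity, specificity

# e.g. 45 sick people correctly flagged, 5 missed; 90 healthy cleared, 10 false alarms
print(sensitivity_and_specificity(45, 5, 90, 10))   # (0.9, 0.9)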
CHAPTER 4
4.1. SYSTEM SPECIFICATION
Software Requirements
Language: Python.
Hardware Requirements
Mouse: Logitech.
4.2.SOFTWARE DESCRIPTION
Python was created by Guido van Rossum during 1985-1990. Like Perl, Python source code is also available under the GNU General Public License (GPL). This section gives enough understanding of the Python programming language.
It is used for:
Python can be used on a server to create web applications. Python can be used alongside software to create workflows. Python can connect to database systems. It can also read and modify files. Python can be used to handle big data and perform complex mathematics. Python can be used for rapid prototyping, or for production-ready software development. Python has a syntax that allows developers to write programs with fewer lines than some other programming languages.
Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.
History of Python
Python was developed by Guido van Rossum in the late eighties and early
nineties at the National Research Institute for Mathematics and Computer
Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Like Perl, Python source code is now available under the GNU General Public License (GPL).
Python Features
Python's features include −
GUI Programming − Python supports GUI applications that can be created and
ported to many system calls, libraries and windows systems, such as Windows
MFC, Macintosh, and the X Window system of Unix.
Scalable − Python provides a better structure and support for large programs
than shell scripting.
Apart from the above-mentioned features, Python has a big list of good features; a few are listed below −
• It supports functional and structured programming methods as well as
OOP.
• It can be used as a scripting language or can be compiled to byte-code for
building large applications.
• It provides very high-level dynamic data types and supports dynamic type
checking.
• It supports automatic garbage collection.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and
Java.
Anaconda Distribution
With over 6 million users, the open source Anaconda Distribution is the fastest and easiest way to do Python and R data science and machine learning on Linux, Windows, and Mac OS X. It is the industry standard for developing, testing, and training on a single machine.
FIGURE 4.1 Anaconda distribution
Anaconda Data Science Libraries
Over 1,400 Anaconda-curated and community data science packages:
• Develop data science projects using your favourite tools
• Analyse data with scalability and performance with Dask, NumPy, pandas, and Numba
• Visualize data with Matplotlib, Bokeh, Datashader, and HoloViews
• Create machine learning and deep learning models with Scikit-learn, TensorFlow, H2O, and Theano
FIGURE 4.2.Data science libraries
• Download conda packages from Anaconda, Anaconda Enterprise, Conda
FIGURE 4.4.Conda navigator
Spyder
Editor
Work efficiently in a multi-language editor with a function/class browser, code analysis tools, automatic code completion, horizontal/vertical splitting, and go-to-definition.
IPython Console
Harness the power of as many IPython consoles as you like within the flexibility of a full GUI interface; run code by line, cell, or file; and render plots right inline.
Variable Explorer
Interact with and modify variables on the fly: plot a histogram or time series, edit a DataFrame or NumPy array, sort a collection, dig into nested objects, and more!
Profiler
Find and eliminate bottlenecks to unchain your code's performance.
Debugger
Trace each step of your code's execution interactively.
Help
Instantly view any object's docs, and render your own.
4.4. ALGORITHM DESCRIPTION
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual computer vision competition. Each year, teams compete on two tasks. The first is to detect objects within an image coming from 200 classes, which is called object localization. The second is to classify images, each labelled with one of 1000 categories, which is called image classification. VGG16 is a convolutional neural network (CNN) architecture which was used to win the ILSVRC (ImageNet) competition, and it is considered to be one of the excellent vision model architectures to date. The most unique thing about VGG16 is that, instead of having a large number of hyper-parameters, it focuses on convolution layers with small 3x3 filters and max-pooling layers with 2x2 filters, used consistently throughout the architecture.
ImageNet is one of the largest datasets available. It has 14 million hand-annotated images describing what is in each picture. You can load a pretrained version of the network trained on more than a million images from the ImageNet database. The pretrained network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224. The input to the first convolutional layer is a fixed-size 224x224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters are used with a very small receptive field: 3x3 (which is the smallest size to capture the notion of left/right, up/down, and center). In one of the configurations, it also utilizes 1x1 convolution filters, which can be seen as a linear transformation of the input channels. Spatial pooling is carried out by five max-pooling layers, which follow some of the convolutional layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2x2 pixel window, with stride 2.
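To make the layer pattern concrete, the sketch below builds the first two VGG-style blocks in Keras (3x3 convolutions followed by 2x2 max-pooling with stride 2). It is an illustration of the architecture described above, not the full 16-layer network used in the project.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),                        # fixed-size 224x224 RGB input
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),    # 2x2 pooling, stride 2
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # ... three more convolutional blocks, then the fully connected layers and the softmax
])
model.summary()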
CHAPTER 5
SYSTEM DESIGN
5.1. SYSTEM ARCHITECTURE
FIGURE 5.1. System architecture (the pipeline concludes with a performance analysis stage).
5.2. FLOW DIAGRAM
FIGURE 5.2. Flow diagram: face image → face region detection → VGG16 (convolutional, max-pool and fully connected layers with softmax) → prediction result.
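The flow in Figure 5.2 can be sketched in Python as below. The Haar-cascade detector stands in for the face region detection stage, and 'age_gender_vgg16.h5' is a placeholder name for a trained VGG16 age/gender classifier; both are assumptions for illustration rather than the exact project code.

import cv2
import numpy as np
from tensorflow.keras.models import load_model

detector = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
model = load_model('age_gender_vgg16.h5')        # placeholder for the trained classifier

frame = cv2.imread('input.jpg')                  # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
    face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))    # VGG16 input size
    probabilities = model.predict(np.expand_dims(face / 255.0, axis=0))[0]
    print('softmax output:', probabilities)                   # prediction result per class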
CHAPTER 6
EXPERIMENTAL RESULTS AND ANALYSIS
6.1. SOURCE CODE:
from numpy import all, any, array, arctan2, cos, sin, exp, dot, log, logical_and, roll, sqrt, stack, trace, unravel_index, pi, deg2rad, rad2deg, where, zeros, floor, full, nan, isnan, round, float32
from numpy.linalg import det, lstsq, norm
from cv2 import resize, GaussianBlur, subtract, KeyPoint, INTER_LINEAR, INTER_NEAREST
from functools import cmp_to_key
import logging

logger = logging.getLogger(__name__)
float_tolerance = 1e-7

def computeKeypointsAndDescriptors(image, sigma=1.6, num_intervals=3, assumed_blur=0.5, image_border_width=5):
    # Compute SIFT keypoints and descriptors for an input image
    image = image.astype('float32')
    base_image = generateBaseImage(image, sigma, assumed_blur)
    num_octaves = computeNumberOfOctaves(base_image.shape)
    gaussian_kernels = generateGaussianKernels(sigma, num_intervals)
    gaussian_images = generateGaussianImages(base_image, num_octaves, gaussian_kernels)
    dog_images = generateDoGImages(gaussian_images)
    keypoints = findScaleSpaceExtrema(gaussian_images, dog_images, num_intervals, sigma, image_border_width)
    keypoints = removeDuplicateKeypoints(keypoints)
    keypoints = convertKeypointsToInputImageSize(keypoints)
    descriptors = generateDescriptors(keypoints, gaussian_images)
    return keypoints, descriptors
def generateBaseImage(image, sigma, assumed_blur):
    # Double the image in size and blur it to produce the base image of the pyramid
    logger.debug('Generating base image...')
    image = resize(image, (0, 0), fx=2, fy=2, interpolation=INTER_LINEAR)
    sigma_diff = sqrt(max((sigma ** 2) - ((2 * assumed_blur) ** 2), 0.01))
    return GaussianBlur(image, (0, 0), sigmaX=sigma_diff, sigmaY=sigma_diff)  # the image blur is now sigma instead of assumed_blur

def computeNumberOfOctaves(image_shape):
    # Number of octaves in the image pyramid as a function of the base image shape
    return int(round(log(min(image_shape)) / log(2) - 1))
def generateGaussianKernels(sigma, num_intervals):
    logger.debug('Generating scales...')
    num_images_per_octave = num_intervals + 3
    k = 2 ** (1. / num_intervals)
    gaussian_kernels = zeros(num_images_per_octave)  # scale of gaussian blur necessary to go from one blur scale to the next within an octave
    gaussian_kernels[0] = sigma
    for image_index in range(1, num_images_per_octave):
        sigma_previous = (k ** (image_index - 1)) * sigma
        sigma_total = k * sigma_previous
        gaussian_kernels[image_index] = sqrt(sigma_total ** 2 - sigma_previous ** 2)
    return gaussian_kernels
def generateGaussianImages(image, num_octaves, gaussian_kernels):
    logger.debug('Generating Gaussian images...')
    gaussian_images = []
    for octave_index in range(num_octaves):
        gaussian_images_in_octave = []
        gaussian_images_in_octave.append(image)  # first image in octave already has the correct blur
        for gaussian_kernel in gaussian_kernels[1:]:
            image = GaussianBlur(image, (0, 0), sigmaX=gaussian_kernel, sigmaY=gaussian_kernel)
            gaussian_images_in_octave.append(image)
        gaussian_images.append(gaussian_images_in_octave)
        octave_base = gaussian_images_in_octave[-3]
        image = resize(octave_base, (int(octave_base.shape[1] / 2), int(octave_base.shape[0] / 2)), interpolation=INTER_NEAREST)
    return array(gaussian_images)
def generateDoGImages(gaussian_images):
    logger.debug('Generating Difference-of-Gaussian images...')
    dog_images = []
    for gaussian_images_in_octave in gaussian_images:
        dog_images_in_octave = []
        for first_image, second_image in zip(gaussian_images_in_octave, gaussian_images_in_octave[1:]):
            dog_images_in_octave.append(subtract(second_image, first_image))  # ordinary subtraction will not work because the images are unsigned integers
        dog_images.append(dog_images_in_octave)
    return array(dog_images)
###############################
# Scale-space extrema related #
###############################
def isPixelAnExtremum(first_subimage, second_subimage, third_subimage, threshold):
    # Check whether the centre element of the 3x3x3 neighbourhood is an extremum (>= or <= all neighbours) above the contrast threshold
    center_pixel_value = second_subimage[1, 1]
    if abs(center_pixel_value) > threshold:
        if center_pixel_value > 0:
            return all(center_pixel_value >= first_subimage) and \
                   all(center_pixel_value >= third_subimage) and \
                   all(center_pixel_value >= second_subimage[0, :]) and \
                   all(center_pixel_value >= second_subimage[2, :]) and \
                   center_pixel_value >= second_subimage[1, 0] and \
                   center_pixel_value >= second_subimage[1, 2]
        elif center_pixel_value < 0:
            return all(center_pixel_value <= first_subimage) and \
                   all(center_pixel_value <= third_subimage) and \
                   all(center_pixel_value <= second_subimage[0, :]) and \
                   all(center_pixel_value <= second_subimage[2, :]) and \
                   center_pixel_value <= second_subimage[1, 0] and \
                   center_pixel_value <= second_subimage[1, 2]
    return False
def localizeExtremumViaQuadraticFit(i, j, image_index, octave_index, num_intervals, dog_images_in_octave, sigma, contrast_threshold, image_border_width, eigenvalue_ratio=10, num_attempts_until_convergence=5):
    # Iteratively refine the position of a scale-space extremum via a quadratic fit around its neighbours
    extremum_is_outside_image = False
    image_shape = dog_images_in_octave[0].shape
    for attempt_index in range(num_attempts_until_convergence):
        # need to convert from uint8 to float32 to compute derivatives and need to rescale pixel values to [0, 1] to apply Lowe's thresholds
        first_image, second_image, third_image = dog_images_in_octave[image_index-1:image_index+2]
        pixel_cube = stack([first_image[i-1:i+2, j-1:j+2],
                            second_image[i-1:i+2, j-1:j+2],
                            third_image[i-1:i+2, j-1:j+2]]).astype('float32') / 255.
        gradient = computeGradientAtCenterPixel(pixel_cube)
        hessian = computeHessianAtCenterPixel(pixel_cube)
        extremum_update = -lstsq(hessian, gradient, rcond=None)[0]
        if abs(extremum_update[0]) < 0.5 and abs(extremum_update[1]) < 0.5 and abs(extremum_update[2]) < 0.5:
            break
        j += int(round(extremum_update[0]))
        i += int(round(extremum_update[1]))
        image_index += int(round(extremum_update[2]))
        # make sure the new pixel_cube will lie entirely within the image
        if i < image_border_width or i >= image_shape[0] - image_border_width or j < image_border_width or j >= image_shape[1] - image_border_width or image_index < 1 or image_index > num_intervals:
            extremum_is_outside_image = True
            break
    if extremum_is_outside_image:
        logger.debug('Updated extremum moved outside of image before reaching convergence. Skipping...')
        return None
    if attempt_index >= num_attempts_until_convergence - 1:
        logger.debug('Exceeded maximum number of attempts without reaching convergence for this extremum. Skipping...')
        return None
    functionValueAtUpdatedExtremum = pixel_cube[1, 1, 1] + 0.5 * dot(gradient, extremum_update)
    if abs(functionValueAtUpdatedExtremum) * num_intervals >= contrast_threshold:
        xy_hessian = hessian[:2, :2]
        xy_hessian_trace = trace(xy_hessian)
        xy_hessian_det = det(xy_hessian)
        if xy_hessian_det > 0 and eigenvalue_ratio * (xy_hessian_trace ** 2) < ((eigenvalue_ratio + 1) ** 2) * xy_hessian_det:
            # Contrast check passed -- construct and return OpenCV KeyPoint object
            keypoint = KeyPoint()
            keypoint.pt = ((j + extremum_update[0]) * (2 ** octave_index), (i + extremum_update[1]) * (2 ** octave_index))
            keypoint.octave = octave_index + image_index * (2 ** 8) + int(round((extremum_update[2] + 0.5) * 255)) * (2 ** 16)
            keypoint.size = sigma * (2 ** ((image_index + extremum_update[2]) / float32(num_intervals))) * (2 ** (octave_index + 1))  # octave_index + 1 because the input image was doubled
            keypoint.response = abs(functionValueAtUpdatedExtremum)
            return keypoint, image_index
    return None
def computeGradientAtCenterPixel(pixel_array):
    # Central-difference approximation of the gradient at the centre pixel of a 3x3x3 array
    dx = 0.5 * (pixel_array[1, 1, 2] - pixel_array[1, 1, 0])
    dy = 0.5 * (pixel_array[1, 2, 1] - pixel_array[1, 0, 1])
    ds = 0.5 * (pixel_array[2, 1, 1] - pixel_array[0, 1, 1])
    return array([dx, dy, ds])

def computeHessianAtCenterPixel(pixel_array):
    # Central-difference approximation of the Hessian at the centre pixel of a 3x3x3 array
    center_pixel_value = pixel_array[1, 1, 1]
    dxx = pixel_array[1, 1, 2] - 2 * center_pixel_value + pixel_array[1, 1, 0]
    dyy = pixel_array[1, 2, 1] - 2 * center_pixel_value + pixel_array[1, 0, 1]
    dss = pixel_array[2, 1, 1] - 2 * center_pixel_value + pixel_array[0, 1, 1]
    dxy = 0.25 * (pixel_array[1, 2, 2] - pixel_array[1, 2, 0] - pixel_array[1, 0, 2] + pixel_array[1, 0, 0])
    dxs = 0.25 * (pixel_array[2, 1, 2] - pixel_array[2, 1, 0] - pixel_array[0, 1, 2] + pixel_array[0, 1, 0])
    dys = 0.25 * (pixel_array[2, 2, 1] - pixel_array[2, 0, 1] - pixel_array[0, 2, 1] + pixel_array[0, 0, 1])
    return array([[dxx, dxy, dxs],
                  [dxy, dyy, dys],
                  [dxs, dys, dss]])
radius = int(round(radius_factor * scale))
weight_factor = -0.5 / (scale ** 2)
raw_histogram = zeros(num_bins)
smooth_histogram = zeros(num_bins)
for n in range(num_bins):
    smooth_histogram[n] = (6 * raw_histogram[n] + 4 * (raw_histogram[n - 1] + raw_histogram[(n + 1) % num_bins]) + raw_histogram[n - 2] + raw_histogram[(n + 2) % num_bins]) / 16.
orientation_max = max(smooth_histogram)
orientation_peaks = where(logical_and(smooth_histogram > roll(smooth_histogram, 1), smooth_histogram > roll(smooth_histogram, -1)))[0]
for peak_index in orientation_peaks:
    peak_value = smooth_histogram[peak_index]
    if peak_value >= peak_ratio * orientation_max:
        # Quadratic peak interpolation
        # The interpolation update is given by equation (6.30) in https://ccrma.stanford.edu/~jos/sasp/Quadratic_Interpolation_Spectral_Peaks.html
        left_value = smooth_histogram[(peak_index - 1) % num_bins]
        right_value = smooth_histogram[(peak_index + 1) % num_bins]
        interpolated_peak_index = (peak_index + 0.5 * (left_value - right_value) / (left_value - 2 * peak_value + right_value)) % num_bins
        orientation = 360. - interpolated_peak_index * 360. / num_bins
        if abs(orientation - 360.) < float_tolerance:
            orientation = 0
def compareKeypoints(keypoint1, keypoint2):
    # Comparator used when sorting keypoints (returns a signed difference)
    if keypoint1.pt[0] != keypoint2.pt[0]:
        return keypoint1.pt[0] - keypoint2.pt[0]
    if keypoint1.pt[1] != keypoint2.pt[1]:
        return keypoint1.pt[1] - keypoint2.pt[1]
    if keypoint1.size != keypoint2.size:
        return keypoint2.size - keypoint1.size
    if keypoint1.angle != keypoint2.angle:
        return keypoint1.angle - keypoint2.angle
    if keypoint1.response != keypoint2.response:
        return keypoint2.response - keypoint1.response
    if keypoint1.octave != keypoint2.octave:
        return keypoint2.octave - keypoint1.octave
    return keypoint2.class_id - keypoint1.class_id
def removeDuplicateKeypoints(keypoints):
    if len(keypoints) < 2:
        return keypoints
    keypoints.sort(key=cmp_to_key(compareKeypoints))
    unique_keypoints = [keypoints[0]]
    for next_keypoint in keypoints[1:]:
        last_unique_keypoint = unique_keypoints[-1]
        if last_unique_keypoint.pt[0] != next_keypoint.pt[0] or \
           last_unique_keypoint.pt[1] != next_keypoint.pt[1] or \
           last_unique_keypoint.size != next_keypoint.size or \
           last_unique_keypoint.angle != next_keypoint.angle:
            unique_keypoints.append(next_keypoint)
    return unique_keypoints
def convertKeypointsToInputImageSize(keypoints):
    converted_keypoints = []
    for keypoint in keypoints:
        keypoint.pt = tuple(0.5 * array(keypoint.pt))
        keypoint.size *= 0.5
        keypoint.octave = (keypoint.octave & ~255) | ((keypoint.octave - 1) & 255)
        converted_keypoints.append(keypoint)
    return converted_keypoints
def unpackOctave(keypoint):
    octave = keypoint.octave & 255
    layer = (keypoint.octave >> 8) & 255
    if octave >= 128:
        octave = octave | -128
    scale = 1 / float32(1 << octave) if octave >= 0 else float32(1 << -octave)
    return octave, layer, scale
def generateDescriptors(keypoints, gaussian_images, window_width=4, num_bins=8, scale_multiplier=3, descriptor_max_value=0.2):
    # Generate a descriptor for each keypoint from gradient histograms in its rotated local window
    for keypoint in keypoints:
        octave, layer, scale = unpackOctave(keypoint)
        gaussian_image = gaussian_images[octave + 1, layer]
        num_rows, num_cols = gaussian_image.shape
        point = round(scale * array(keypoint.pt)).astype('int')
        bins_per_degree = num_bins / 360.
        angle = 360. - keypoint.angle
        cos_angle = cos(deg2rad(angle))
        sin_angle = sin(deg2rad(angle))
        weight_multiplier = -0.5 / ((0.5 * window_width) ** 2)
        row_bin_list = []
        col_bin_list = []
        magnitude_list = []
        orientation_bin_list = []
        histogram_tensor = zeros((window_width + 2, window_width + 2, num_bins))  # first two dimensions are increased by 2 to account for border effects

        # Descriptor window size (described by half_width) follows OpenCV convention
        hist_width = scale_multiplier * 0.5 * scale * keypoint.size
        half_width = int(round(hist_width * sqrt(2) * (window_width + 1) * 0.5))  # sqrt(2) corresponds to diagonal length of a pixel
        half_width = int(min(half_width, sqrt(num_rows ** 2 + num_cols ** 2)))    # ensure half_width lies within image

        for row in range(-half_width, half_width + 1):
            for col in range(-half_width, half_width + 1):
                row_rot = col * sin_angle + row * cos_angle
                col_rot = col * cos_angle - row * sin_angle
                row_bin = (row_rot / hist_width) + 0.5 * window_width - 0.5
                col_bin = (col_rot / hist_width) + 0.5 * window_width - 0.5
                if row_bin > -1 and row_bin < window_width and col_bin > -1 and col_bin < window_width:
                    window_row = int(round(point[1] + row))
                    window_col = int(round(point[0] + col))
                    if window_row > 0 and window_row < num_rows - 1 and window_col > 0 and window_col < num_cols - 1:
                        dx = gaussian_image[window_row, window_col + 1] - gaussian_image[window_row, window_col - 1]
                        dy = gaussian_image[window_row - 1, window_col] - gaussian_image[window_row + 1, window_col]
                        gradient_magnitude = sqrt(dx * dx + dy * dy)
                        gradient_orientation = rad2deg(arctan2(dy, dx)) % 360
                        weight = exp(weight_multiplier * ((row_rot / hist_width) ** 2 + (col_rot / hist_width) ** 2))
                        row_bin_list.append(row_bin)
                        col_bin_list.append(col_bin)
                        magnitude_list.append(weight * gradient_magnitude)
                        orientation_bin_list.append((gradient_orientation - angle) * bins_per_degree)
c1 = magnitude * row_fraction
c0 = magnitude * (1 - row_fraction)
c11 = c1 * col_fraction
c10 = c1 * (1 - col_fraction)
c01 = c0 * col_fraction
c00 = c0 * (1 - col_fraction)
c111 = c11 * orientation_fraction
c110 = c11 * (1 - orientation_fraction)
c101 = c10 * orientation_fraction
c100 = c10 * (1 - orientation_fraction)
c011 = c01 * orientation_fraction
c010 = c01 * (1 - orientation_fraction)
c001 = c00 * orientation_fraction
c000 = c00 * (1 - orientation_fraction)

histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, orientation_bin_floor] += c110
histogram_tensor[row_bin_floor + 2, col_bin_floor + 2, (orientation_bin_floor + 1) % num_bins] += c111
6.2. SCREENSHOTS
STEP 1: Select the file in the package.
STEP 2:
FIGURE 6.3. The VGG layer is trained.
STEP 4: The result is trained in the VGG layer for the accuracy result.
FIGURE 6.5. The face images are converted into layers and trained for the accuracy value.
CHAPTER 7
CONCLUSION
We conclude, first, that CNNs can be used to provide improved age and gender classification results, even considering the much smaller size of contemporary unconstrained image sets labeled for age and gender. Second, the simplicity of the model implies that more elaborate systems using more training data may well be capable of substantially improving results beyond those reported here.
7.1. FUTURE ENHANCEMENT
For future work, we consider a deeper CNN architecture and a more robust image processing algorithm for exact age estimation. Also, the apparent age estimation of human faces will be an interesting research direction to investigate in the future.
REFERENCES
[1] Levi, G., & Hassner, T. (2015). Age and gender classification using convolutional neural networks. IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 34-42). IEEE.
[3] Zhou, S. K., Georgescu, B., Zhou, X., & Comaniciu, D. (2010). Method for performing image based regression using boosting. US Patent US7804999.
[6] http://www.openu.ac.il/home/hassner/Adience/