A Project Report on
“AI based identity recognition using finger vein”
CERTIFICATE
This is to certify that the project entitled “AI based identity recognition using finger vein” has been successfully carried out by BHAGYESH (3RB20AI010) in fulfillment of the completion of the Internship at Pantech, Hyderabad, in Artificial Intelligence, during the academic year 2023-2024. It is certified that all corrections/suggestions for internal assessment have been incorporated in the report deposited in the department library. The Internship report has been approved as it satisfies the academic requirements in respect of the Internship work prescribed for the said Degree.
ACKNOWLEDGEMENTS
We would like to express our gratitude to the following people, whose constant support helped immensely in the successful completion of our Internship.
We are grateful to all those who have helped us in preparing our project. Our sincere thanks to our project guide, Prof. Abhishek Sali, for their guidance in the preparation of the project. Their constant and sincere guidance helped us maintain the tempo and thus complete the Internship.
Finally, we thank all our friends for their help in making our project a grand success.
Cordially,
BHAGYESH 3RB20AI010
Artificial Intelligence
ABSTRACT
INDEX
Chapter
1. Introduction
7. Computer Vision
8. Algorithm
9. Process IP, OP & Scopes
10. References
Chapter 1
INTRODUCTION
The main use of finger-vein identification is to avoid security problems. The design of effective biometric identification systems, which measure unique physical or behavioural attributes of people for their secure recognition, is nowadays a challenging and important task for both the scientific and the industrial communities. Commonly used physical biometric traits include face, hand geometry, fingerprint, and iris, among others, whereas signature, voice, keystroke pattern, and gait are instances of behavioural modalities.
Chapter 2
1.1 IMAGE:
An image is a two-dimensional picture which has a similar appearance to some subject, usually a physical object or a person.
The word image is also used in the broader sense of any two-dimensional
figure such as a map, a graph, a pie chart, or an abstract painting. In this wider sense,
images can also be rendered manually, such as by drawing, painting, carving,
rendered automatically by printing or computer graphics technology, or developed
by a combination of methods, especially in a pseudo-photograph.
An image is a rectangular grid of pixels. It has a definite height and a definite width counted in pixels. Each pixel is square and has a fixed size on a given display; however, different computer monitors may use different-sized pixels. The pixels that constitute an image are ordered as a grid (columns and rows); each pixel consists of numbers representing magnitudes of brightness and color.
Fig: Bits transferred for the Red, Green and Blue planes (24-bit = 8-bit red + 8-bit green + 8-bit blue)
Each pixel of an image increases in size when its color depth increases: an 8-bit pixel (1 byte) stores 256 colors, while a 24-bit pixel (3 bytes) stores 16 million colors, the latter known as true color. Image compression uses algorithms to decrease the size of a file. High-resolution cameras produce large image files, ranging from hundreds of kilobytes to megabytes, depending on the camera's resolution and the image-storage format. High-resolution digital cameras record 12-megapixel (1 MP = 1,000,000 pixels) images, or more, in true color. For example, consider an image recorded by a 12 MP camera: since each pixel uses 3 bytes to record true color, the uncompressed image would occupy 36,000,000 bytes of memory, a great amount of digital storage for one image, given that cameras must record and store many images to be practical. Faced with such large file sizes, both within the camera and on a storage disc, image file formats were developed to store these large images.
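As a quick check of the arithmetic above, here is a short sketch; the 12 MP figure and 3 bytes per pixel are the values from the text:

```python
# Uncompressed size of a 12-megapixel true-colour image.
megapixels = 12
bytes_per_pixel = 3          # 8-bit red + 8-bit green + 8-bit blue

pixels = megapixels * 1_000_000
size_bytes = pixels * bytes_per_pixel

print(f"{size_bytes:,} bytes")       # 36,000,000 bytes
print(f"{size_bytes / 1e6:.0f} MB")  # about 36 MB for one image
```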
IMAGE FILE FORMATS:
Image file formats are standardized means of organizing and storing images. This entry is about digital image formats used to store photographic and other images. Image files are composed of either pixel or vector (geometric) data, the latter rasterized to pixels when displayed (the few exceptions being vector graphic displays). Including proprietary types, there are hundreds of image file types. The PNG, JPEG, and GIF formats are most often used to display images on the Internet.
In addition to straight image formats, metafile formats are portable formats which can include both raster and vector information. The metafile format is an intermediate format; most Windows applications open metafiles and then save them in their own native format.
IMAGE PROCESSING:
Digital image processing, the manipulation of images by computer, is a relatively recent development in terms of man's ancient fascination with visual stimuli. In its short history, it has been applied to practically every type of image with varying degrees of success. The inherent subjective appeal of pictorial displays attracts perhaps a disproportionate amount of attention from scientists and also from the layman. Digital image processing, like other glamour fields, suffers from myths, misconceptions, misunderstandings and misinformation. It is a vast umbrella under which fall diverse aspects of optics, electronics, mathematics, photography, graphics and computer technology. It is a truly multidisciplinary endeavor plagued with imprecise jargon.
Several factors combine to indicate a lively future for digital image processing. A major factor is the declining cost of computer equipment. Several new technological trends promise to further promote digital image processing. These include parallel processing made practical by low-cost microprocessors, the use of charge-coupled devices (CCDs) for digitizing and for storage during processing and display, and large, low-cost image storage arrays.
IMAGE ACQUISITION:
Image acquisition is the process of acquiring a digital image. To do so requires an image sensor and the capability to digitize the signal produced by the sensor. The sensor could be a monochrome or color TV camera that produces an entire image of the problem domain every 1/30 second. The image sensor could also be a line-scan camera that produces a single image line at a time; in this case, the object's motion past the line scanner builds up the two-dimensional image.
IMAGE ENHANCEMENT:
Image enhancement is among the simplest and most appealing areas of digital image processing. Basically, the idea behind enhancement techniques is to bring out detail that is obscured, or simply to highlight certain features of interest in an image. A familiar example of enhancement is increasing the contrast of an image because “it looks better.” It is important to keep in mind that enhancement is a very subjective area of image processing.
Fig: Image enhancement process for grayscale and colour images using histogram bins
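As one hedged illustration of such histogram-based enhancement, the sketch below equalizes the histogram of a grayscale image with OpenCV; the filename is a placeholder, and equalization is just one common enhancement choice:

```python
import cv2

# Placeholder path; replace with a real grayscale image.
gray = cv2.imread("finger.png", cv2.IMREAD_GRAYSCALE)

# Histogram equalization spreads the intensity values out,
# bringing out detail obscured by poor contrast.
enhanced = cv2.equalizeHist(gray)

cv2.imwrite("finger_enhanced.png", enhanced)
```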
IMAGE RESTORATION:
Image restoration is an area that also deals with improving the appearance of
an image. However, unlike enhancement, which is subjective, image restoration is
objective, in the sense that restoration techniques tend to be based on mathematical
or probabilistic models of image degradation.
SEGMENTATION:
Chapter 3
A lossy compression mode has been preferred because, in an application like a terrain explorer, texture data (e.g., aerial orthophotos) is usually mip-map filtered and therefore lossily mapped onto the terrain surface. In addition, decoding lossy-compressed images is usually faster than decoding lossless-compressed images.
In the next test series we evaluate the lossy compression efficiency of PGF. One of the best competitors in this area is surely JPEG 2000. Since JPEG 2000 has two different filters, we used the one with the better trade-off between compression efficiency and runtime; on our machine, the 5/3 filter set has the better trade-off. In both cases, JPEG 2000 has remarkably good compression efficiency at very high compression ratios, but also very poor encoding and decoding speed. The other competitor is JPEG, one of the most popular image file formats.
It is very fast and has reasonably good compression efficiency over a wide range of compression ratios. The drawbacks of JPEG are the missing lossless compression and the often missing progressive decoding.
Fig. 4 depicts the average rate-distortion behavior for the images in the Kodak test set when fixed (i.e., non-progressive) lossy compression is used. The PSNR of PGF is on average 3% smaller than the PSNR of JPEG 2000, but 3% better than that of JPEG. Fig. 3 shows the averages of the compression ratios (ratio), encoding (enc),
Fig. 3 shows the averages of the compression ratios (ratio), encoding (enc),
and decoding (dec) times over all eight images. JPEG 2000 shows the best compression efficiency in this test set, followed by PGF, JPEG-LS, PNG, and WinZip. On average, PGF is eight percent worse than JPEG 2000. The fact that JPEG 2000 has a better lossless compression ratio than PGF is not surprising, because JPEG 2000 is more quality-driven than PGF.
Chapter 4
CLASSIFICATION OF IMAGES:
There are 3 types of images used in Digital Image Processing. They are
1. Binary Image
2. Gray Scale Image
3. Colour Image
BINARY IMAGE:
A binary image is a digital image that has only two possible values for each pixel. Typically, the two colors used for a binary image are black and white, though any two colors can be used. The color used for the object(s) in the image is the foreground color, while the rest of the image is the background color.
Binary images are also called bi-level or two-level, meaning that each pixel is stored as a single bit (0 or 1). The names black-and-white, monochrome, and monochromatic are often used for this concept, but may also designate any image that has only one sample per pixel, such as a grayscale image.
Binary images often arise in digital image processing as masks or as the result of certain operations such as segmentation, thresholding, and dithering. Some input/output devices, such as laser printers, fax machines, and bi-level computer displays, can only handle bi-level images.
GRAY SCALE IMAGE:
Grayscale images are often the result of measuring the intensity of light at each pixel in a single band of the electromagnetic spectrum (e.g. infrared, visible light, ultraviolet, etc.), and in such cases they are monochromatic proper when only a given frequency is captured. They can also be synthesized from a full-color image; see the section on converting to grayscale.
COLOUR IMAGE:
A (digital) color image is a digital image that includes color information for each pixel. Each pixel has a particular value which determines its apparent color. This value is quantified by three numbers giving the decomposition of the color into the three primary colors Red, Green and Blue. Any color visible to the human eye can be represented this way. The decomposition of a color into each primary color is quantified by a number between 0 and 255. For example, white will be coded as R = 255, G = 255, B = 255; black will be known as (R,G,B) = (0,0,0); and, say, bright pink will be (255,0,255). Colors are thus coded on three bytes representing their decomposition into the three primary colors.
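The sketch below illustrates this three-byte coding with NumPy, using the example colours from the text; the grayscale weights shown are one common convention, not part of the original report:

```python
import numpy as np

# A 2x2 true-colour image: one byte per channel, per pixel.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 255, 255)   # white:  R = 255, G = 255, B = 255
img[0, 1] = (0, 0, 0)         # black:  (R, G, B) = (0, 0, 0)
img[1, 0] = (255, 0, 255)     # bright pink / magenta
print(img[1, 0])              # -> [255   0 255]

# Converting to grayscale with the usual luminance weights.
weights = np.array([0.299, 0.587, 0.114])
gray = (img @ weights).astype(np.uint8)
print(gray)                   # one intensity value per pixel
```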
Chapter 5
The output of a max-pooling layer is given by the maximum activation over non-overlapping rectangular regions. Max-pooling provides location invariance and down-samples the image along each direction over a larger neighbourhood. The filter sizes of the convolutional and max-pooling layers are selected in such a way that a fully connected layer can combine the outputs into a one-dimensional vector. The last layer is always a fully connected layer containing one output unit per class. Here the rectified linear unit (ReLU) was used as the activation function, and the network output was interpreted as the likelihood of a particular input image belonging to each class. The Adam optimization algorithm can be used instead of the classical stochastic gradient descent procedure to update the network weights iteratively based on the training data.
After the network has been structured for the classification application with all its parameters, it is ready for training. With each iteration, the network converges by reducing the error rate, and the loop terminates when a minimum error rate is reached. A learning rate is maintained for each network weight (parameter) and separately adapted as learning unfolds. Each network weight is adjusted in every iteration from its initial value, based on the results, until it converges; the weights decide the convergence. The weight values for the images are recorded in the neural network after the database is loaded. Here the learning rate is 0.0001. The learned weights are then used to classify further datasets. Training of the normal and abnormal image classes on the CNN can thus be done.
The pre-trained weights obtained from the training phase are also used in the testing phase. The input image is passed through all layers of the neural network and its parameters are obtained [4]. These values are cross-checked against the pre-trained weights to identify the class in the dataset that gives the maximum match. The system then reports the label to which the input is most closely matched.
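A minimal sketch of such a CNN classifier in tensorflow.keras is given below. The input size, the number of filters, and the two output classes are illustrative assumptions; the ReLU activations, the fully connected output layer, the Adam optimizer, and the 0.0001 learning rate follow the description above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + max-pooling: the maximum activation over
    # non-overlapping regions gives location invariance.
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(64, 64, 1)),   # assumed input size
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Combine the outputs into a one-dimensional vector.
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    # Fully connected output layer: one unit per class, read as
    # the likelihood of the input belonging to each class.
    layers.Dense(2, activation="softmax"),
])

# Adam instead of classical SGD, learning rate 0.0001 as stated.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(x_train, y_train, epochs=10)   # training data assumed
```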
SOFTWARE REQUIREMENTS:
• PYTHON IDLE
• ANACONDA
Chapter 6
EDGE DETECTION:
Under these conditions, the optimal filter is the solution of a maximization problem. Using the variational method, one can derive a set of optimal filters for different values of the design parameters. It has further been found that these filters can be closely approximated by first-order derivatives of Gaussian kernels with different standard deviations.
iv.Two-dimensional case
For the two-dimensional case, the Canny edge detector uses two filters, the first derivative of a Gaussian in the horizontal and in the vertical direction respectively, to estimate the gradient of the image function corrupted by additive white noise, and detects edges from this estimate. Note that for different edge cross-sections the Canny operator takes different forms of expression, to be derived depending on the circumstances. For the step-edge case, the Canny operator's form of expression is similar to that of the LoG.
v.Non-maxima suppression
To fix the specific location of an edge, the local maxima of the image convolution along the gradient direction must be found; equivalently, the second derivative of the Gaussian-smoothed image along the gradient direction is zero at the edge. Note that many implementations perform this non-maximum suppression by searching within an 8-neighbourhood.
vi.Hysteresis thresholding
Due to noise, the response to a single edge is error-prone, and continuous edges often appear broken. This problem can be solved by hysteresis thresholding. If the response of a pixel to the edge operator exceeds the high threshold, that pixel is marked as an edge. A pixel whose response exceeds only the low threshold is marked as an edge if it is 4-adjacent or 8-adjacent to an already marked edge pixel; this process is iterated. The remaining isolated pixels whose responses exceed only the low threshold are considered noise and are not marked as edges. Both threshold values are determined from the signal-to-noise ratio.
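OpenCV's Canny implementation exposes exactly these two hysteresis thresholds. A minimal sketch (the filename and the threshold values 50/150 are assumptions):

```python
import cv2

# Placeholder path; replace with a real image.
img = cv2.imread("finger.png", cv2.IMREAD_GRAYSCALE)

# threshold2 is the high threshold (sure edges); threshold1 is the
# low one, kept only when connected to already-marked edge pixels.
edges = cv2.Canny(img, threshold1=50, threshold2=150)

cv2.imwrite("finger_edges.png", edges)
```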
vii.Synthesis of multi-scale feature
Canny uses a coarse-to-fine process to synthesize the edge-detection results obtained at different scales. First, the edges detected by the smallest-scale operator are marked. Then these small-scale edges are convolved with a Gaussian along the direction perpendicular to the edge, synthesizing the edge-detection result that a larger scale would produce; the synthesized values are compared with the values actually produced by the edge-detection operator at that larger scale. If the response of the large-scale edge-detection operator far exceeds the synthesized result, those edges are added to the set detected at the small scale. Iterating this process, each synthesis step adds the edges not detected by the earlier steps, so the edge-detection results of different scales are combined into a better overall result.
Applications:
• Security applications
• Industrial applications
• Social media
• Mobile applications
• Civil applications
BLOCK DIAGRAM
Input → Pre-process → Binary detection → Feature extraction → Edge detection → Algorithm (trained data) → Matched / Not matched
Explanation:
The input is taken from the user, and pre-processing steps such as resizing, colour conversion, and filtering are applied. The input is then converted into binary form for machine understanding; all of these processes run in the background as soon as the input is captured. Next comes the region of interest: as the name itself shows, the regions we require for vein detection are located and extracted from the pre-processed image. The feature-extraction step then extracts the features the project requires for vein detection. The next and most important step is edge detection, because it is at this phase, with the help of this technique, that we can detect the vein from its edges. Finally, the result is verified against the pre-trained dataset using the CNN algorithm, which reports whether the person's vein is matched or not matched.
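A compact sketch of this pipeline, under stated assumptions: OpenCV for pre-processing and edge detection, a fixed ROI window, and a hypothetical pre-trained CNN (such as the Chapter 5 sketch) for the matched / not-matched decision:

```python
import cv2

def preprocess(path):
    """Resize, colour-convert and filter the input image."""
    img = cv2.imread(path)                        # input from user
    img = cv2.resize(img, (128, 128))             # resize (assumed size)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # colour conversion
    return cv2.medianBlur(gray, 3)                # filtering

def extract_features(gray):
    """Binary detection, ROI extraction and edge detection."""
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    roi = binary[16:112, 16:112]                  # assumed ROI window
    return cv2.Canny(roi, 50, 150)                # edges reveal the vein

edges = extract_features(preprocess("input_finger.png"))  # placeholder path
x = edges.astype("float32")[None, ..., None] / 255.0

# Hypothetical pre-trained model; loading details depend on the setup.
# model = tf.keras.models.load_model("vein_cnn.h5")
# probs = model.predict(x)
# print("matched" if probs.argmax() == 1 else "not matched")
```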
Modules used:
● Digital image processing
● Pre-process
● Edge detection
Software used:
Python IDLE (any version)
OpenCV
LANGUAGE USED:
Python is an easy-to-learn yet powerful and versatile scripting language, which makes it attractive for application development.
Python's syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas.
Python is not restricted to any one special area such as web programming. That is why it is known as multipurpose: it can be used for web, enterprise, 3D CAD, etc.
We don't need data-type declarations for variables because Python is dynamically typed, so we can write a = 10 to store an integer value in a variable.
Python makes development and debugging fast because there is no compilation step in Python development, and the edit-test-debug cycle is very fast.
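A tiny illustration of that dynamic typing, with no declared types and a name freely rebound:

```python
a = 10           # an integer, no type declaration needed
print(type(a))   # <class 'int'>

a = "ten"        # the same name rebound to a string
print(type(a))   # <class 'str'>
```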
Chapter 7
COMPUTER VISION
Computer Vision is the broad parent name for any computations involving visual
content – that means images, videos, icons, and anything else with pixels involved.
But within this parent idea, there are a few specific tasks that are core building
blocks:
Any other application that involves understanding pixels through software can safely
be labeled as computer vision.
One of the major open questions in both Neuroscience and Machine Learning is:
how exactly do our brains work, and how can we approximate that with our own
algorithms? The reality is that there are very few working and comprehensive
theories of brain computation; so despite the fact that Neural Nets are supposed to
“mimic the way the brain works,” nobody is quite sure if that’s actually true. Jeff
Hawkins has an entire book on this topic called On Intelligence.
The same paradox holds true for computer vision – since we’re not decided on how
the brain and eyes process images, it’s difficult to say how well the algorithms used
in production approximate our own internal mental processes. For example, studies
have shown that some functions that we thought happen in the brain of frogs actually
take place in the eyes. We’re a far cry from amphibians, but similar uncertainty exists
in human cognition.
Machines interpret images very simply: as a series of pixels, each with their own set
of color values. Consider the simplified image below, and how grayscale values are
converted into a simple array of numbers:
Think of an image as a giant grid of different squares, or pixels (this image is a very
simplified version of what looks like either Abraham Lincoln or a Dementor). Each
pixel in an image can be represented by a number, usually from 0 – 255. The series
of numbers on the right is what software sees when you input an image. For our
image, there are 12 columns and 16 rows, which means there are 192 input values
for this image.
When we start to add in color, things get more complicated. Computers usually read
color as a series of 3 values – red, green, and blue (RGB) – on that same 0 – 255
scale. Now, each pixel actually has 3 values for the computer to store in addition to
its position. If we were to colorize President Lincoln (or Harry Potter’s worst fear),
that would lead to 12 x 16 x 3 values, or 576 numbers.
For some perspective on how computationally expensive this is, consider a high-resolution photograph of a tree: millions of pixels, each with three colour values.
That’s a lot of memory to require for one image, and a lot of pixels for an algorithm
to iterate over. But to train a model with meaningful accuracy – especially when
you’re talking about Deep Learning – you’d usually need tens of thousands of
images, and the more the merrier. Even if you were to use Transfer Learning to use
the insights of an already trained model, you’d still need a few thousand images to
train yours on.
With the sheer amount of computing power and storage required just to train deep
learning models for computer vision, it’s not hard to understand why advances in
those two fields have driven Machine Learning forward to such a degree.
Computer vision is one of the areas in Machine Learning where core concepts are
already being integrated into major products that we use every day. Google is using
maps to leverage their image data and identify street names, businesses, and office
buildings. Facebook is using computer vision to identify people in photos, and do a
number of things with that information.
But it’s not just tech companies that are leverage Machine Learning for image
applications. Ford, the American car manufacturer that has been around literally
since the early 1900’s, is investing heavily in autonomous vehicles (AVs). Much of
the underlying technology in AVs relies on analyzing the multiple video feeds
coming into the car and using computer vision to analyze and pick a path of action.
Another major area where computer vision can help is the medical field. Much of diagnosis is image processing, like reading x-rays, MRI scans, and other types of diagnostics. Google has been working with medical research teams to explore how deep learning can help medical workflows, and has made significant progress in terms of accuracy, as described on its research page.
But aside from the groundbreaking stuff, it’s getting much easier to integrate
computer vision into your own applications. A number of high-quality third party
providers like Clarifai offer a simple API for tagging and understanding images,
while Kairos provides functionality around facial recognition. We’ll dive into the
open-source packages available for use below.
A typical workflow for your product might involve passing images from a security
camera into Emotion Recognition and raising a flag if any aggressive emotions are
exhibited, or using Nudity Detection to block inappropriate profile pictures on your
web application.
MEDIAN FILTER
The median filter is a nonlinear digital filtering technique, often used to remove noise
from an image or signal. Such noise reduction is a typical pre-processing step to
improve the results of later processing (for example, edge detection on an image).
Median filtering is very widely used in digital image processing because, under certain conditions, it preserves edges while removing noise (but see the discussion below); it also has applications in signal processing.
The main idea of the median filter is to run through the signal entry by entry,
replacing each entry with the median of neighboring entries. The pattern of neighbors
is called the "window", which slides, entry by entry, over the entire signal. For 1D
signals, the most obvious window is just the first few preceding and following
entries, whereas for 2D (or higher-dimensional) signals such as images, more
complex window patterns are possible (such as "box" or "cross" patterns). Note that
if the window has an odd number of entries, then the median is simple to define: it
is just the middle value after all the entries in the window are sorted numerically. For
an even number of entries, there is more than one possible median, see median for
more details.
Typically, by far the majority of the computational effort and time is spent on
calculating the median of each window. Because the filter must process every entry
in the signal, for large signals such as images, the efficiency of this median
calculation is a critical factor in determining how fast the algorithm can run. The
naïve implementation described above sorts every entry in the window to find the
median; however, since only the middle value in a list of numbers is required,
selection algorithms can be much more efficient. Furthermore, some types of signals
(very often the case for images) use whole number representations: in these cases,
histogram medians can be far more efficient because it is simple to update the
histogram from window to window, and finding the median of a histogram is not
particularly onerous.
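A minimal median-filter sketch with OpenCV; the 5x5 window (odd-sized, so the median is well defined) and the filename are assumptions:

```python
import cv2

# Placeholder path; replace with a real (noisy) image.
noisy = cv2.imread("finger.png", cv2.IMREAD_GRAYSCALE)

# Each pixel is replaced by the median of its 5x5 neighbourhood,
# removing impulse noise while preserving edges.
denoised = cv2.medianBlur(noisy, 5)

cv2.imwrite("finger_denoised.png", denoised)
```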
SIFT uses the multi-resolution pyramid technique: the original image is copied into a Gaussian or Laplacian pyramid to obtain images of the same size but with reduced bandwidth. This achieves a special blurring effect on the original image, called scale space, and ensures that the points of interest are scale invariant.
Now that's some real robust image matching going on. The big rectangles mark
matched images. The smaller squares are for individual features in those regions.
Note how the big rectangles are skewed. They follow the orientation and perspective
of the object in the scene.
Chapter 8
SIFT is quite an involved algorithm. It has a lot going on and can become confusing, so the entire algorithm is split up into multiple parts. Here's an outline of what happens in SIFT.
1. Constructing a scale space: This is the initial preparation. You create internal representations of the original image to ensure scale invariance. This is done by generating a "scale space".
2. LoG approximation: The Laplacian of Gaussian is great for finding interesting points (or key points) in an image, but it's computationally expensive. So we cheat and approximate it using the representation created earlier.
3. Finding keypoints: With the super-fast approximation, we now try to find key points. These are the maxima and minima in the Difference of Gaussian images calculated in step 2.
4. Getting rid of bad keypoints: Edges and low-contrast regions make bad keypoints. Eliminating these makes the algorithm efficient and robust. A technique similar to the Harris corner detector is used here.
5. Assigning an orientation to the keypoints: An orientation is calculated for each key point. Any further calculations are done relative to this orientation. This effectively cancels out the effect of orientation, making the features rotation invariant.
6. Generating SIFT features: Finally, with scale and rotation invariance in place, one more representation is generated. This helps uniquely identify features. Let's say you have 50,000 features; with this representation, you can easily identify the feature you're looking for (say, a particular eye, or a sign board). That is an overview of the entire algorithm.
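The whole pipeline above is available in OpenCV. A minimal sketch (the filename is a placeholder; cv2.SIFT_create needs a reasonably recent OpenCV build):

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()

# Keypoints carry position, scale and the assigned orientation;
# descriptors are the final 128-dimensional SIFT features.
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), descriptors.shape)  # e.g. N keypoints, (N, 128)

# Matching against another image's descriptors would typically use
# cv2.BFMatcher with a ratio test.
```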
Problem statement:
The main problem is authenticating the identity of a person from the biometric they enter. This project aims to reduce these kinds of problems.
Objective:
The main objective of the project is to detect whether a person is a thief or not, based on the biometrics of the finger. For the best results, the application uses region-of-interest extraction and a CNN; using these techniques, detection becomes easy.
Existing methods:
• Support vector machine
• Conversion
• Histogram
Advantages:
• More security
• More accuracy
Chapter 9
PROCESS IP, OP & SCOPES
Chapter 10
REFERENCES
• ChatGPT (2.0)
• Google