Identify Web Cam Images Using Neural Networks

This document discusses classifying webcam images in real time using deep learning and convolutional neural networks. It describes using a pretrained AlexNet model to identify objects from webcam images. AlexNet has been trained on over 1 million images and can classify images into 1000 categories. The document outlines problems in image classification, such as lighting variation, and explains key aspects of convolutional neural networks: convolutional layers, ReLU activation, pooling, and fully connected layers. It also details the architecture of AlexNet, which combines convolutional, pooling, and fully connected layers to classify images.


CLASSIFY WEBCAM IMAGES USING DEEP LEARNING
ABSTRACT

• Deep learning has emerged as a new era in machine learning and is being applied to a number of signal and image processing applications. The main purpose of the work presented in this paper is to apply a deep learning algorithm, namely convolutional neural networks (CNNs), to classifying webcam images in real time. The pretrained deep convolutional neural network that we use here is AlexNet, which has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). AlexNet has learned rich feature representations for a wide range of images. Images will be captured from our system webcam, and our pretrained deep convolutional neural network, AlexNet, will identify objects in our surroundings.
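
A minimal sketch of the pipeline described above, assuming Python with OpenCV for the webcam capture and torchvision's pretrained AlexNet (the original work may have used a different toolchain; the webcam index 0 is also an assumption):

import cv2
import torch
from torchvision import models

weights = models.AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights).eval()
preprocess = weights.transforms()        # resize, center-crop, normalize for AlexNet
labels = weights.meta["categories"]      # the 1000 ImageNet category names

cap = cv2.VideoCapture(0)                # system webcam (index 0 assumed)
ok, frame = cap.read()                   # one BGR frame from the camera
if ok:
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    batch = preprocess(torch.from_numpy(rgb).permute(2, 0, 1)).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    print("Predicted:", labels[int(probs.argmax())])
cap.release()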
PROBLEMS IN CLASSIFYING IMAGES

 Large amount of intra-class variability
 Different lighting conditions
 Misalignment
 Non-rigid deformation
 Occlusion
 Corruption
WHAT IS DEEP LEARNING?

• Deep learning (also known as deep structured learning or hierarchical learning) is part of a broader family of machine learning methods based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised, or unsupervised.
• Deep learning architectures such as deep neural networks, deep belief networks and
recurrent neural networks have been applied to fields including computer vision, speech
recognition, natural language processing, audio recognition, social network filtering,
machine translation, bioinformatics, drug design, medical image analysis, material inspection
and board game programs, where they have produced results comparable to and in some
cases superior to human experts.
WHY DEEP LEARNING?

• Learning features from the data of interest is considered a possible way of remedying the limitations of hand-crafted features.
• Deep networks discover multiple levels of representation, with the hope that higher-level features can represent more abstract semantics of the data. Such abstract representations learned from a deep network are expected to provide greater robustness to intra-class variability.
• One key ingredient in the success of deep learning in image classification is the use of convolutional architectures. A convolutional deep neural network (ConvNet) architecture consists of multiple trainable stages stacked on top of each other, followed by a supervised classifier, as the sketch below illustrates.
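
A minimal sketch of this idea, assuming PyTorch; the layer sizes, the 32×32 input, and the 10 classes are illustrative assumptions, not values from this work:

import torch.nn as nn

convnet = nn.Sequential(
    # stage 1: convolution + non-linearity + pooling (trainable)
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    # stage 2: a second trainable stage stacked on top of the first
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    # supervised classifier on top of the stacked stages
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),   # assumes 32x32 RGB inputs and 10 classes
)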
CNN

• A CNN is a class of feed-forward artificial neural networks, most commonly applied to analyzing visual imagery. Convolutional neural networks are inspired by biological processes: in a CNN, the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
• CNNs use relatively little pre-processing compared to other image classification algorithms, which means the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage.
CNN-ALEXNET

• It was designed by Alex Krizhevsky and published with Ilya Sutskever and Geoffrey Hinton. AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge in 2012.
• The network achieved a top-5 error of 15.3%, more than 10.8 percentage points lower than that of the runner-up.
• AlexNet reports probabilities for each image it captures from the camera: it shows the five categories with the highest probabilities, and a chart is prepared from them (a sketch of this step follows below). AlexNet has been trained over more than 50,000 iterations and gives more accurate results than previously trained models.
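
A sketch of how the top-five probabilities could be obtained, again assuming torchvision's pretrained AlexNet; the random tensor stands in for a preprocessed webcam frame:

import torch
from torchvision import models

weights = models.AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights).eval()
labels = weights.meta["categories"]

frame = torch.rand(1, 3, 227, 227)                 # stand-in for a preprocessed webcam frame
with torch.no_grad():
    probs = torch.softmax(model(frame), dim=1)[0]  # probabilities over the 1000 categories
top5 = torch.topk(probs, k=5)                      # the five highest probabilities
for p, idx in zip(top5.values, top5.indices):
    print(f"{labels[int(idx)]}: {float(p):.3f}")   # data for the probability chart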
ARCHITECTURE OF CNN

• A CNN consists of a number of convolutional and subsampling layers, optionally followed by fully connected layers. The input to a convolutional layer is an m x m x r image, where m is the height and width of the image and r is the number of channels; e.g. an RGB image has r = 3.
• The convolutional layer has k filters (or kernels) of size n x n x q, where n is smaller than the dimension of the image and q can either be the same as the number of channels r or smaller, and may vary for each kernel. The size of the filters gives rise to the locally connected structure; each filter is convolved with the image to produce k feature maps of size (m − n + 1) x (m − n + 1). Each map is then subsampled, typically with mean or max pooling over p x p contiguous regions, where p typically ranges from 2 for small images up to no more than 5 for larger inputs.
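
A small worked example of the size arithmetic above (the concrete numbers are illustrative assumptions, not values from the slides):

m, r = 32, 3           # image height/width and number of channels (RGB)
k, n = 8, 5            # number of filters and filter size
p = 2                  # pooling window size

feature_map = m - n + 1        # each of the k feature maps is (m - n + 1) x (m - n + 1) = 28 x 28
pooled = feature_map // p      # after non-overlapping p x p pooling: 14 x 14
print(k, feature_map, pooled)  # 8 feature maps, 28x28 before pooling, 14x14 after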
A SIMPLE CONV-NET
OPERATIONS IN CONV-NET

• Convolution
• Non-Linearity (ReLU)
• Pooling or Sub Sampling
• Classification (Fully Connected Layer)
CONVOLUTION

• ConvNets derive their name from the “convolution” operator. The primary purpose of convolution in a ConvNet is to extract features from the input image. Convolution preserves the spatial relationship between pixels by learning image features using small squares of input data. In CNN terminology, a small matrix such as a 3×3 matrix is called a ‘filter’, ‘kernel’, or ‘feature detector’, and the matrix formed by sliding the filter over the image and computing the dot product is called the ‘Convolved Feature’, ‘Activation Map’, or ‘Feature Map’. It is important to note that filters act as feature detectors on the original input image. In practice, a CNN learns the values of these filters on its own during the training process (although we still need to specify parameters such as the number of filters, the filter size, and the architecture of the network before training). The more filters we have, the more image features get extracted and the better our network becomes at recognizing patterns in unseen images.
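
A minimal NumPy sketch of this sliding-dot-product operation; the 6×6 image and the hand-written 3×3 filter are illustrative, since in a real CNN the filter values are learned:

import numpy as np

def slide_filter(image, kernel):
    """Slide the filter over the image and take the dot product at each location."""
    m, n = image.shape[0], kernel.shape[0]
    fmap = np.zeros((m - n + 1, m - n + 1))          # feature map of size (m - n + 1)
    for i in range(fmap.shape[0]):
        for j in range(fmap.shape[1]):
            fmap[i, j] = np.sum(image[i:i + n, j:j + n] * kernel)
    return fmap

image = np.arange(36, dtype=float).reshape(6, 6)     # toy 6x6 single-channel image
vertical_edge = np.array([[1., 0., -1.],             # a hand-written 3x3 feature detector;
                          [1., 0., -1.],             # in a CNN these values are learned
                          [1., 0., -1.]])
print(slide_filter(image, vertical_edge))            # 4x4 feature map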
NON-LINEARITY (RELU)

• An additional operation called ReLU is used after every convolution operation. ReLU stands for Rectified Linear Unit and is a non-linear operation.
• ReLU is an element-wise operation (applied per pixel) that replaces all negative pixel values in the feature map with zero. The purpose of ReLU is to introduce non-linearity into our ConvNet, since most of the real-world data we would want our ConvNet to learn is non-linear (convolution is a linear operation – element-wise matrix multiplication and addition – so we account for non-linearity by introducing a non-linear function like ReLU).
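
A quick NumPy illustration of this element-wise operation on a toy feature map:

import numpy as np

feature_map = np.array([[-3.0,  1.0],
                        [ 2.0, -0.5]])
print(np.maximum(feature_map, 0))   # negatives become zero: [[0. 1.] [2. 0.]]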
POOLING STEP

• Spatial Pooling (also called subsampling or down-sampling) reduces the dimensionality of each feature map but retains the most important information. Spatial Pooling can be of different types: Max, Average, Sum, etc. In the case of Max Pooling, we define a spatial neighborhood (for example, a 2×2 window) and take the largest element from the rectified feature map within that window. Instead of taking the largest element we could also take the average (Average Pooling) or the sum of all elements in that window.
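
A short NumPy sketch of 2×2 max pooling on a toy rectified feature map (the values are illustrative):

import numpy as np

def max_pool_2x2(fmap):
    """Take the largest element in each non-overlapping 2x2 window."""
    h, w = fmap.shape
    return fmap[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rectified = np.array([[1., 0., 2., 3.],
                      [4., 6., 6., 8.],
                      [3., 1., 1., 0.],
                      [1., 2., 2., 4.]])
print(max_pool_2x2(rectified))   # [[6. 8.] [3. 4.]]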
FULLY CONNECTED LAYER

• The Fully Connected layer is a traditional Multi-Layer Perceptron that uses a softmax activation function in the output layer (other classifiers such as SVM can also be used, but we stick to softmax here). The term “Fully Connected” implies that every neuron in the previous layer is connected to every neuron in the next layer. The output from the convolutional and pooling layers represents high-level features of the input image. The purpose of the Fully Connected layer is to use these features to classify the input image into various classes based on the training dataset.
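
A NumPy sketch of such a softmax classifier on flattened features; the 128-dimensional feature vector, the 10 classes, and the random weights are stand-in assumptions (in a trained network the weights are learned):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract max for numerical stability
    return e / e.sum()

features = np.random.rand(128)             # flattened high-level features from conv/pool layers
W = np.random.randn(10, 128) * 0.01        # one row of weights per class
b = np.zeros(10)                           # biases
probs = softmax(W @ features + b)          # class probabilities
print(probs.argmax(), round(float(probs.sum()), 3))   # predicted class; probabilities sum to 1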
ALEXNET ARCHITECTURE
DESCRIBING THE NETWORK

• The net contains eight layers with weights; the first five are convolutional and the remaining three are fully connected. The output of the last fully connected layer is fed to a 1000-way softmax which produces a distribution over the 1000 class labels. Response-normalization layers follow the first and second convolutional layers. Max-pooling layers follow both of the response-normalization layers as well as the last (fifth) convolutional layer. The ReLU non-linearity is applied to the output of every convolutional and fully connected layer.

• The input to the net is a 227 × 227 × 3 image. The filters for each convolutional layer are:
• 96 kernels of size 11 × 11 × 3 with step size (stride) 4
• 256 kernels of size 5 × 5 × 48* with step size 1
• 384 kernels of size 3 × 3 × 256 with step size 1
• 384 kernels of size 3 × 3 × 192* with step size 1
• 256 kernels of size 3 × 3 × 192* with step size 1
• (* The reduced channel depths reflect the split of the feature maps across two GPUs in the original implementation.)
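
A PyTorch sketch of a single-GPU equivalent of the layers listed above; the two-GPU split indicated by the asterisks is ignored, and the padding values are assumptions chosen to match the standard AlexNet implementation so that the spatial sizes come out as 55, 27, 13, 13, 13, and finally 6:

import torch.nn as nn

alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),            # conv1: 227 -> 55
    nn.LocalResponseNorm(5), nn.MaxPool2d(kernel_size=3, stride=2),   # 55 -> 27
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),          # conv2
    nn.LocalResponseNorm(5), nn.MaxPool2d(kernel_size=3, stride=2),   # 27 -> 13
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),         # conv3
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),         # conv4
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),         # conv5
    nn.MaxPool2d(kernel_size=3, stride=2),                            # 13 -> 6
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),                          # fc6
    nn.Linear(4096, 4096), nn.ReLU(),                                 # fc7
    nn.Linear(4096, 1000),                                            # fc8, fed to 1000-way softmax
)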
THANK YOU
