CLASSIFY WEBCAM IMAGES USING DEEP LEARNING
ABSTRACT
• Deep learning has emerged as a powerful branch of machine learning and is being
applied to a growing number of signal and image processing applications. The main
purpose of the work presented here is to apply a deep learning algorithm, namely the
convolutional neural network (CNN), to classify webcam images in real time. The
pretrained deep convolutional neural network used here is AlexNet, which has been
trained on over a million images and can classify images into 1000 object categories
(such as keyboard, coffee mug, pencil, and many animals). AlexNet has learned rich
feature representations for a wide range of images. Images are captured from the
system webcam, and the pretrained network identifies objects in our surroundings.
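• The slides describe this pipeline at a high level only. The sketch below is one possible implementation in Python, assuming OpenCV for webcam capture, torchvision's pretrained AlexNet, and the default camera index 0; none of these choices are stated in the original work.

# Minimal sketch: classify webcam frames with a pretrained AlexNet.
# Assumes the torchvision and opencv-python packages are installed.
import cv2
import torch
from torchvision import models, transforms

# Load AlexNet pretrained on ImageNet (1000 object categories).
weights = models.AlexNet_Weights.IMAGENET1K_V1
model = models.alexnet(weights=weights)
model.eval()
labels = weights.meta["categories"]

# Standard ImageNet preprocessing: resize, crop, scale to [0, 1], normalize.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256),
    transforms.CenterCrop(227),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

cam = cv2.VideoCapture(0)            # default system webcam (assumed index)
while True:
    ok, frame = cam.read()           # frame is BGR, shape (H, W, 3)
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    batch = preprocess(rgb).unsqueeze(0)           # shape (1, 3, 227, 227)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)[0]
    top_prob, top_idx = probs.max(dim=0)
    text = f"{labels[top_idx.item()]} ({top_prob.item():.2f})"
    cv2.putText(frame, text, (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("AlexNet webcam classifier", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cam.release()
cv2.destroyAllWindows()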
PROBLEMS IN CLASSIFYING IMAGES
• A CNN is a class of feed-forward artificial neural networks most commonly applied to
analyzing visual imagery. Convolutional neural networks are inspired by biological
processes: the connectivity pattern between neurons resembles the organization of the
animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted
region of the visual field known as the receptive field, and the receptive fields of
different neurons partially overlap so that together they cover the entire visual field.
• CNNs use relatively little pre-processing compared to other image classification
algorithms, which means the network learns the filters that in traditional algorithms
were hand-engineered. This independence from prior knowledge and human effort in
feature design is a major advantage.
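• The point about learned filters can be seen directly by inspecting a pretrained model. The sketch below assumes torchvision, whose AlexNet variant uses 64 first-layer kernels rather than the 96 of the original paper.

# Sketch: convolution filters are learned weights, not hand-engineered.
import torch
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
first_conv = model.features[0]      # Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
print(first_conv)
print(first_conv.weight.shape)      # torch.Size([64, 3, 11, 11]): 64 learned 11x11x3 kernels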
CNN-ALEXNET
• It was designed by Alex Krizhevsky and published with Ilya Sutskever and
Geoffrey Hinton. AlexNet competed in the ImageNet Large Scale Visual
Recognition Challenge in 2012.
• The network achieved a top-5 error of 15.3%, more than 10.8 percentage points
lower than that of the runner-up.
• For each image captured from the camera, AlexNet reports the five categories with
the highest probabilities, and a chart of these probabilities is prepared. Having been
trained on a very large dataset (over a million images, as noted above), AlexNet
gives more accurate results than earlier trained models.
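• A hedged sketch of how the top-five probabilities and the chart could be produced, reusing the model, labels, and preprocessed batch from the webcam sketch above and assuming matplotlib for plotting:

# Sketch: top-5 class probabilities for one frame, shown as a bar chart.
import matplotlib.pyplot as plt
import torch

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)[0]
top5_prob, top5_idx = probs.topk(5)

names = [labels[i] for i in top5_idx.tolist()]
plt.barh(names, top5_prob.tolist())
plt.xlabel("probability")
plt.title("AlexNet top-5 predictions")
plt.tight_layout()
plt.show()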
ARCHITECTURE OF CNN
• Convolution
• Non-Linearity (ReLU)
• Pooling or Sub Sampling
• Classification (Fully Connected Layer); a minimal sketch combining these four stages is shown below.
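• The sketch below strings the four stages into a tiny CNN in Python. The layer sizes are illustrative only and are not AlexNet's; a 224 × 224 × 3 input is assumed.

# Sketch: the four building blocks above assembled into a tiny CNN.
import torch.nn as nn

tiny_cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolution
    nn.ReLU(),                                    # non-linearity
    nn.MaxPool2d(kernel_size=2, stride=2),        # pooling / sub-sampling
    nn.Flatten(),
    nn.Linear(16 * 112 * 112, 10),                # fully connected classifier (224x224 input assumed)
)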
CONVOLUTION
• ConvNets derive their name from the “convolution” operator. The primary purpose of
convolution in a ConvNet is to extract features from the input image. Convolution
preserves the spatial relationship between pixels by learning image features using small
squares of input data. In CNN terminology, the small matrix that slides over the image
(for example, a 3×3 matrix) is called a ‘filter’, ‘kernel’ or ‘feature detector’, and the matrix
formed by sliding the filter over the image and computing the dot product at each
position is called the ‘Convolved Feature’, ‘Activation Map’ or ‘Feature Map’. It is
important to note that filters act as feature detectors on the original input image. In
practice, a CNN learns the values of these filters on its own during the training process
(although we still need to specify parameters such as the number of filters, the filter size
and the architecture of the network before training). The more filters we have, the more
image features get extracted and the better our network becomes at recognizing
patterns in unseen images.
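• The sliding dot product described above can be written out directly. The NumPy sketch below convolves an illustrative 3 × 3 filter over a toy 5 × 5 image to produce a 3 × 3 feature map; in a CNN the filter values would be learned rather than fixed as here.

# Sketch: sliding a 3x3 filter over a small grayscale image (valid padding).
import numpy as np

image = np.random.rand(5, 5)              # toy 5x5 single-channel image
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], float)    # illustrative vertical-edge detector

fh, fw = kernel.shape
out_h, out_w = image.shape[0] - fh + 1, image.shape[1] - fw + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        # dot product between the filter and the 3x3 patch under it
        feature_map[i, j] = np.sum(image[i:i + fh, j:j + fw] * kernel)

print(feature_map)                        # the 3x3 'convolved feature' / feature map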
NON-LINEARITY (RELU)
• The network contains eight layers with weights; the first five are convolutional and the remaining three are
fully connected. The output of the last fully-connected layer is fed to a 1000-way softmax, which produces a
distribution over the 1000 class labels. Response-normalization layers follow the first and second convolutional
layers, and max-pooling layers follow both response-normalization layers as well as the last (fifth)
convolutional layer. The ReLU non-linearity, f(x) = max(0, x), is applied to the output of every convolutional
and fully-connected layer.
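• The ReLU named in the heading simply replaces every negative activation with zero; a one-line illustration (using PyTorch, an assumption consistent with the other sketches) follows.

# Sketch: ReLU sets negative activations to zero, element-wise.
import torch

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
print(torch.relu(x))    # tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])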
• The input to the network is a 227 × 227 × 3 image. The filters for each convolutional layer are listed below; a worked calculation of the resulting feature-map sizes follows the list.
• 96 kernels of size 11 × 11 × 3 with stride 4
• 256 kernels of size 5 × 5 × 48* with stride 1
• 384 kernels of size 3 × 3 × 256 with stride 1
• 384 kernels of size 3 × 3 × 192* with stride 1
• 256 kernels of size 3 × 3 × 192* with stride 1
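• The spatial size of each feature map follows from out = floor((W - F + 2P) / S) + 1. The sketch below applies this formula to the layers listed above; the padding values (0, 2, 1, 1, 1) and the 3 × 3, stride-2 max-pooling windows are the commonly cited AlexNet settings and are assumptions here, since the slide lists only kernel sizes and strides.

# Sketch: spatial output size of a conv/pool layer, out = (W - F + 2P) // S + 1.
# Padding values and pooling windows below are assumed standard AlexNet settings.
def out_size(w, f, s, p=0):
    return (w - f + 2 * p) // s + 1

size = out_size(227, 11, 4, 0)   # conv1: (227 - 11) / 4 + 1 = 55
print(size)
size = out_size(size, 3, 2)      # 3x3 max pooling, stride 2 -> 27
size = out_size(size, 5, 1, 2)   # conv2 -> 27
size = out_size(size, 3, 2)      # pooling -> 13
size = out_size(size, 3, 1, 1)   # conv3 -> 13
size = out_size(size, 3, 1, 1)   # conv4 -> 13
size = out_size(size, 3, 1, 1)   # conv5 -> 13
size = out_size(size, 3, 2)      # final pooling -> 6
print(size)                      # 6, so the first fully-connected layer sees 256 * 6 * 6 values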
THANK YOU