
MACHINE LEARNING CO327

PROJECT SYNOPSIS

Convolutional Neural Networks (CNN)

Team Members:

1. Alok | 2K22/CO/45

2. Alok Singh | 2K22/CO/46

3. Aditya Kumar Basu | 2K22/CO/28

4. Ankit | 2K22/CO/62

5. Adarsh Gupta | 2K22/CO/21

Professor Name: Dr. Aruna Bhatt


Convolutional Neural Networks (CNN)
for Image Classification
1. Introduction

Machine learning has been transforming industries and advancing the capabilities of
intelligent systems. A particular area of interest in the realm of deep learning is image
recognition and classification, where machines are taught to understand, interpret, and
classify images in ways that mimic human vision. Convolutional Neural Networks (CNNs)
have been a game changer in solving complex problems in this domain.

CNNs, a type of deep learning architecture specifically designed for image-related tasks,
have achieved state-of-the-art results in tasks such as image classification, object detection,
and even video processing. They have been widely used in applications like autonomous
driving, medical image analysis, and facial recognition systems, making them highly
valuable in both academia and industry.

While CNNs outperform traditional machine learning models on tasks involving unstructured data like images, they come with significant challenges. For this project, we will implement CNNs for image classification, study their working principles, and focus on optimization techniques to enhance performance and reduce computational costs. By investigating key factors influencing CNNs' accuracy and efficiency, we aim to provide insights and solutions to tackle these challenges.

2. Problem Statement

With the increasing amount of digital content, there is a growing need for effective ways of processing and classifying image data. Traditional approaches, in which a machine learning model is trained on hand-crafted features, have been shown to have problems scaling to image data that is high dimensional and complex. This is where deep learning models, particularly Convolutional Neural Networks (CNNs), come in.

The core issue we focus on in this project is collecting suitable data, processing it, and building a robust CNN-based model that can classify images despite challenges of data complexity, class imbalance, and overfitting. The model should not overfit the training samples, should be time efficient in real-world usage, and should generalize to new data it has not seen.
Within this framework, the primary questions we aim to answer are:

1. How do we design an efficient CNN architecture that achieves high classification accuracy on common image benchmarks?

2. How do we address issues such as overfitting, computational cost, and model size when training on large datasets in practice?

3. What design steps and practices can be taken when constructing a CNN to improve its efficiency during both training and inference?

3. CNN Background and Core Learning Concepts

At the heart of Convolutional Neural Networks (CNNs) are principles for processing information in the form of multiple arrays, most commonly images. Conventional fully connected neural networks cannot exploit the 2D spatial relationships between pixels in an image, which makes them inefficient for this kind of data. This is where CNNs shine, through a structure that includes:

- Convolutional Layers: A set of learnable filters (kernels) slides over the input image, carrying out convolutions that detect different patterns (edges, textures, etc.) in the image.

- Pooling Layers: These reduce the spatial dimensions of the feature maps while retaining the most important information, lowering the computational cost of the layers that follow.

- Fully Connected Layers: These are attached after several convolutional and pooling layers; the feature maps are flattened into a one-dimensional vector and passed through dense layers that produce the final classification.
Convolutional neural networks have a hierarchical structure: the first layers capture low-level features such as edges, the middle layers capture mid-level features such as shapes, and the final layers capture high-level abstract representations. A minimal sketch of such an architecture appears below.
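
To make this layered structure concrete, the following is a minimal sketch in PyTorch (the framework is our assumption here, and the layer sizes are placeholders chosen for 32x32 RGB inputs such as CIFAR-10), stacking two convolution-pooling stages in front of a fully connected classifier.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN sketch: two conv/pool stages followed by a fully connected classifier."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),   # low-level features (edges, textures)
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),   # mid-level features (shapes, parts)
            nn.ReLU(),
            nn.MaxPool2d(2),                               # 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                  # flatten feature maps to a 1-D vector
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                   # one score per class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Example: a batch of four 32x32 RGB images yields a 4x10 tensor of class scores.
logits = SimpleCNN()(torch.randn(4, 3, 32, 32))
```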

4. Issues confronting CNNs

Even though CNNs are very effective, there are still a number of issues that complicate their deployment and optimization:

1. Computational Load: CNNs are computation-intensive networks, and even for image classification tasks, training on large datasets is very resource intensive. Deep CNNs with many layers and millions (or even billions) of parameters need high-performance hardware such as GPUs or TPUs for training. It is also important to optimize CNN architectures to improve inference time for real-time applications.

2. Overfitting: Overfitting occurs when a model performs very well on its training dataset yet makes poor predictions on unseen data. Regularization, dropout, and data augmentation, among other techniques, are often employed to combat overfitting (see the configuration sketch after this list).

3. Hyperparameter Tuning: When applying CNNs, numerous hyperparameters must be tuned for good performance, including the learning rate, batch size, number of epochs, kernel size, and filter depth. Searching over many combinations of hyperparameters is costly in both time and computation.

4. Data Availability: Large labeled datasets are usually required for effective training of
CNNs. Nonetheless, for some tasks (e.g. medical imaging), it is difficult to collect such data.
Because of this, methods such as transfer learning, in which models that have already been
trained are adjusted to fit a smaller dataset, are used.

5. Class Imbalance: When some classes have far fewer examples than others in the dataset, the model's performance on those classes will be poor. Oversampling, undersampling, or class-weighted loss functions are approaches that can help address this problem.

6. Interpretability: CNNs perform very well, yet they are frequently considered "black box" models because their decisions are not easily explained. In high-stakes applications such as healthcare, interpretability is crucial. Techniques such as Grad-CAM (Gradient-weighted Class Activation Mapping) allow visualization of the parts of the input image that the CNN relies on most when making a decision.
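
To make items 2 and 5 concrete, the sketch below shows how data augmentation, dropout, and a class-weighted loss might be configured in PyTorch. The framework choice, the layer sizes, and the class counts are illustrative assumptions, not project data.

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Data augmentation (item 2): random transformations effectively enlarge the training set.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
])

# Dropout (item 2): randomly zeroing activations discourages co-adaptation of features.
classifier_head = nn.Sequential(
    nn.Linear(4096, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(128, 10),
)

# Class weights (item 5): rarer classes receive larger weights in the loss.
# The counts below are placeholders, not real dataset statistics.
class_counts = torch.tensor([5000., 5000., 500., 5000., 5000., 5000., 250., 5000., 5000., 5000.])
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=class_weights)
```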

5. Objectives

In response to these problems and to unlock the full potential of CNNs, our project
objectives are:

- Design an Efficient CNN Architecture: Develop a CNN that achieves high image classification accuracy at low computational cost. We will examine several design decisions, including the number of layers, filter sizes, and activation functions.

- Improve Generalization: Use regularization methods such as dropout, batch normalization, and data augmentation to avoid overfitting. In addition, we will try transfer learning to enhance model performance when datasets are small.

- Optimize CNN Performance: Consider further optimization techniques such as parameter pruning, quantization, and model compression to enhance the efficiency of the CNN for real-time applications.

- Comparison with Other Models: Compare the performance of the CNN with traditional machine learning models (Support Vector Machines, k-NN, etc.) to show its advantages in processing high-dimensional image data.

6. Approach
We will apply our structured framework to solve the problem in the following steps:

1. Choosing a dataset: We will use a standard dataset such as CIFAR-10 or MNIST, or a dataset from a specific domain (for example, medical images). We will carry out pre-processing such as image normalization, resizing, and data augmentation.

2. Design of CNN Architecture: A CNN architecture with convolutional, pooling, and fully connected layers will be implemented from scratch. Performance indicators for the CNN will be assessed, including accuracy, precision, recall, and F1 score.

3. Model training and optimization: The CNN will be trained using Stochastic Gradient Descent (SGD) or the Adam optimizer, and grid or random search will be used for hyperparameter tuning. To reduce overfitting, dropout, batch normalization, and early stopping will be employed (a minimal training sketch appears after this list).

4. Performance Evaluation: The model will be tested, and its ability to generalize to unseen data will be evaluated. The efficiency of the model will also be examined in terms of training time, inference time, and memory requirements.

5. Comparative Analysis: Other algorithms such as SVM, Random Forests, or k-NN will be used as baselines against which the performance of the CNN is evaluated, demonstrating the benefits and drawbacks of deep learning for image classification.
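
As a concrete illustration of steps 3 and 4, the sketch below outlines Adam-based training with early stopping in PyTorch (our assumed framework); train_loader and val_loader are placeholder data loaders, and metrics such as precision, recall, and F1 would be computed on top of the validation pass.

```python
import copy
import torch
import torch.nn as nn

def train_with_early_stopping(model, train_loader, val_loader, epochs=50, patience=5, lr=1e-3):
    """Sketch of step 3: Adam optimization with early stopping on validation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_loss, best_state, stale = float("inf"), None, 0

    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Validation pass: average loss on held-out data.
        model.eval()
        val_loss, batches = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                val_loss += criterion(model(images), labels).item()
                batches += 1
        val_loss /= max(batches, 1)

        if val_loss < best_loss:        # improvement: remember the best weights so far
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:                           # no improvement: advance the early-stopping counter
            stale += 1
            if stale >= patience:
                break

    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```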
ABSTRACT

1. Introduction to CNN

· CNN is a specialized deep learning model designed for processing structured grid-like data,
such as images.

· It automates the process of feature extraction, unlike traditional machine learning techniques where manual feature engineering is required.

· CNNs are highly effective in image-related tasks, including detection, classification, segmentation, and more.

2. Objective:

· The main goal is to use CNN for detecting objects within images by identifying patterns
such as edges, textures, and shapes.

· Applications include object recognition, real-time image detection, medical imaging, autonomous vehicles, and security systems.

3. CNN Architecture:

➢ CNN consists of several key layers:

• Convolutional Layers: Apply filters to extract visual features.

• Pooling Layers: Downsample the data to reduce dimensionality and computation, preserving important features.

• Fully Connected Layers: Interpret the extracted features to make predictions.

• Activation Functions: Introduce non-linearity (commonly using ReLU).


4. Training and Dataset:

· CNNs are typically trained on large labeled image datasets such as COCO, PASCAL VOC,
MNIST, or CIFAR-10.

· The network learns through backpropagation, adjusting weights based on the error
between predictions and true labels using optimization algorithms like Adam or SGD.

5. Techniques for Image Detection:

Several advanced CNN-based methods for image detection are widely used:

YOLO (You Only Look Once): Predicts bounding boxes and class probabilities directly
from full images in one pass.

Faster R-CNN: Region-based detection, generating candidate regions before classification.

SSD (Single Shot Detector): Detects objects in a single evaluation, offering speed and
accuracy.
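
As a usage illustration for one of these detectors, the sketch below loads a COCO pre-trained Faster R-CNN from torchvision and runs it on a dummy image. This assumes torchvision is available; the exact weights argument varies between library versions.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Load a COCO pre-trained Faster R-CNN (older torchvision releases use pretrained=True
# instead of weights="DEFAULT").
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# The detector takes a list of 3xHxW tensors scaled to [0, 1] and returns, per image,
# a dictionary of predicted boxes, class labels, and confidence scores.
image = torch.rand(3, 480, 640)          # stand-in for a real image
with torch.no_grad():
    predictions = model([image])
print(predictions[0]["boxes"].shape, predictions[0]["labels"][:5], predictions[0]["scores"][:5])
```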

6. Challenges and Improvement:

· Key challenges include dealing with occlusion, variations in object size, and real-time
detection for videos.

· Solutions involve data augmentation, transfer learning using pre-trained models (e.g.,
VGG16, ResNet), and improving network depth and complexity.
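
For instance, transfer learning with a pre-trained backbone could be set up as in the sketch below, which uses a ResNet-18 from torchvision as a stand-in for the VGG16/ResNet models mentioned above; the number of target classes is a placeholder.

```python
import torch.nn as nn
from torchvision import models

# Start from an ImageNet pre-trained ResNet-18 (older torchvision: pretrained=True).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the convolutional backbone so that, initially, only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with one sized for the target task.
num_classes = 5                                   # placeholder for the new dataset's classes
model.fc = nn.Linear(model.fc.in_features, num_classes)
```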

7. Applications:

CNN-based image detection is applied in various domains:

• Autonomous Vehicles: Detecting pedestrians, traffic signs, and other objects in real time.

• Medical Imaging: Identifying tumors and anomalies in X-rays and MRI scans.

• Security Systems: Real-time surveillance and threat detection.

• Retail: Automated inventory management using image recognition.


8. Conclusions:

· CNNs, particularly in combination with advanced detection techniques like YOLO and
Faster R-CNN, provide state-of-the-art performance for image detection tasks.

· Continuous improvements in CNN architecture, training techniques, and hardware acceleration are driving progress in image detection, making it faster, more accurate, and applicable to a wide range of fields.

This structured overview highlights the key aspects of CNN-based image detection,
including architecture, methodology, challenges, and real-world applications.

SURVEY OF RECENT WORKS IN CNN


CNN in Image Classification

1. EfficientNet

EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth, width, and resolution using a compound coefficient. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method scales network width, depth, and resolution with a fixed set of scaling coefficients. For example, to use 2^N times more computational resources, we can increase the network depth by α^N, the width by β^N, and the image size by γ^N, where α, β, γ are constant coefficients determined by a small grid search on the original small model. EfficientNet uses a compound coefficient φ to uniformly scale network width, depth, and resolution in a principled way.

The compound scaling method is justified by the intuition that if the input image is bigger, then the network needs more layers to increase the receptive field and more channels to capture finer-grained patterns in the larger image.

The base EfficientNet-B0 network is built on the inverted bottleneck residual blocks of MobileNetV2, together with squeeze-and-excitation blocks.

EfficientNets also transfer well, reaching state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and three other transfer learning datasets, with an order of magnitude fewer parameters.
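
The compound scaling rule itself is simple to state: for a compound coefficient φ, depth, width, and resolution are scaled by α^φ, β^φ, and γ^φ, with α·β²·γ² ≈ 2 so that total FLOPs grow roughly by 2^φ. The sketch below uses the constants reported in the EfficientNet paper as reference values.

```python
# Grid-searched constants reported for EfficientNet-B0 (reference values from the paper).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi: int, base_depth: int, base_width: int, base_resolution: int):
    """Return (depth, width, resolution) scaled from a baseline by the compound coefficient."""
    return (
        round(base_depth * ALPHA ** phi),       # more layers: larger receptive field
        round(base_width * BETA ** phi),        # more channels: finer-grained patterns
        round(base_resolution * GAMMA ** phi),  # larger input images
    )

# Example: scaling a toy baseline by phi = 3.
print(compound_scale(3, base_depth=16, base_width=32, base_resolution=224))
```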

2. ViT

The Vision Transformer (ViT) is a model for image classification that employs a Transformer-like architecture over patches of the image. An image is split into fixed-size patches, each of which is linearly embedded; position embeddings are added, and the resulting sequence of vectors is fed to a standard Transformer encoder. To perform classification, the standard approach of adding an extra learnable "classification token" to the sequence is used. A patch-embedding sketch follows.
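
A minimal patch-embedding sketch is shown below (PyTorch assumed; the 224x224 input, 16x16 patches, and 768-dimensional embeddings mirror a common ViT-Base configuration but are placeholders here). It produces the token sequence that would be fed to a standard Transformer encoder.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Split an image into patches, embed them, and prepend a learnable class token."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, dim=768):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution both cuts the image into patches and linearly embeds them.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))

    def forward(self, x):
        patches = self.proj(x).flatten(2).transpose(1, 2)        # (batch, num_patches, dim)
        cls = self.cls_token.expand(x.shape[0], -1, -1)           # one class token per image
        return torch.cat([cls, patches], dim=1) + self.pos_embed  # tokens for the encoder

tokens = PatchEmbedding()(torch.randn(2, 3, 224, 224))             # shape (2, 197, 768)
```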
CNN in Object Detection

⚫ YOLOs

YOLOv5 is a computer vision model that belongs to the You Only Look Once (YOLO) family of models. It is primarily used for object detection, meaning it can identify and locate objects within images or videos. YOLOv5 improves on its predecessors, offering higher accuracy and speed. It is widely used in applications such as self-driving cars, surveillance systems, and facial recognition systems.
⚫ DETR

The DETR model was proposed in End-to-End Object Detection with Transformers by
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov and
Sergey Zagoruyko. DETR consists of a convolutional backbone followed by an encoder-
decoder Transformer that can be trained end-to-end for object detection. It greatly simplifies models like Faster R-CNN and Mask R-CNN, which rely on components such as region proposals, a non-maximum suppression procedure, and anchor generation.
Moreover, DETR can also be naturally extended to perform panoptic segmentation, by
simply adding a mask head on top of the decoder outputs.

CNN in Medical Image Analysis

⚫ UNet++

UNet++ is an architecture for semantic segmentation based on the U-Net. Through the use of densely connected nested decoder sub-networks, it enhances extracted feature processing and was reported by its authors to outperform the U-Net on Electron Microscopy (EM), Cell, Nuclei, Brain Tumor, Liver, and Lung Nodule medical image segmentation tasks.
CNN in NLP

⚫ CHAR-CNN

Char-CNN (Character-level Convolutional Neural Network) processes text at the character level rather than the word level. Instead of using pre-trained word embeddings, Char-CNN learns features directly from sequences of characters, making it useful for tasks like text classification, sentiment analysis, or named entity recognition, particularly for noisy or non-standard text (e.g., social media or domain-specific jargon).

It was popularized by researchers such as Xiang Zhang et al., who showed that this approach can achieve competitive results by capturing subtle patterns in raw text, including misspellings or new words.

⚫ LSTM-CNN

CNN-LSTM hybrid models combine Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to leverage the strengths of both architectures.

- CNNs excel at feature extraction from spatial data, such as images or frames of a sequence, by capturing local patterns.

- LSTMs, a type of recurrent neural network (RNN), are designed to handle sequential data, maintaining information across time steps.

In a hybrid CNN-LSTM, CNN layers extract features, which are then passed to LSTM layers to model temporal dependencies, as in the sketch below. These models are widely used in video analysis, text prediction, speech recognition, and time-series forecasting.
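
A hedged sketch of such a hybrid for sequences of frames is shown below (PyTorch assumed; all layer sizes and the 8-frame input are illustrative placeholders): a small CNN encodes each frame, and an LSTM models dependencies across frames.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """CNN per frame for spatial features, LSTM across frames for temporal dependencies."""

    def __init__(self, num_classes=5, feature_dim=64, hidden_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, feature_dim),
        )
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                    # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1))    # per-frame features: (batch*time, feature_dim)
        out, _ = self.lstm(feats.view(b, t, -1))  # temporal modelling over the frame sequence
        return self.head(out[:, -1])              # classify from the last time step

logits = CNNLSTM()(torch.randn(2, 8, 3, 64, 64))  # shape (2, 5)
```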
CNN in Generative Models

⚫ StyleGAN3

StyleGAN3 is NVIDIA's third iteration of the StyleGAN series for high-quality image synthesis. It addresses aliasing artifacts from previous versions, ensuring smoother, spatially consistent outputs, which makes it suitable for videos and animations. The model offers two variants, StyleGAN3-T (translation equivariant) and StyleGAN3-R (rotation equivariant), to maintain consistency under the corresponding transformations. Compared to StyleGAN2, it improves geometric coherence and reduces distortions. StyleGAN3 is commonly used in AI art, digital avatars, and game development. It requires high computational power and is open-sourced for further research. The model is particularly useful for tasks requiring seamless transitions across frames.
⚫ CYCLEGAN

CycleGAN is a deep learning model for unpaired image-to-image translation, meaning it can convert images between two domains (e.g., photos to paintings) without needing matching pairs. It uses two GANs to perform bidirectional translations (A ↔ B) and employs a cycle consistency loss to ensure that translating an image to the target domain and back preserves its original structure (see the sketch below). CycleGAN is widely used for style transfer, object transformations (such as horses to zebras), and enhancing medical or satellite images. While powerful, it can struggle with intricate details in complex transformations. Its flexibility makes it valuable for creative tasks where paired datasets are unavailable.
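
The cycle consistency idea reduces to a reconstruction penalty, as in the sketch below. The two "generators" here are single placeholder convolution layers standing in for real CycleGAN generators, purely to show how the loss term is formed.

```python
import torch
import torch.nn as nn

# Placeholder generators for the two translation directions (A -> B and B -> A).
G_ab = nn.Conv2d(3, 3, kernel_size=3, padding=1)
G_ba = nn.Conv2d(3, 3, kernel_size=3, padding=1)
l1 = nn.L1Loss()

real_a = torch.rand(1, 3, 64, 64)
real_b = torch.rand(1, 3, 64, 64)

# Translating to the other domain and back should recover the original image.
cycle_loss = l1(G_ba(G_ab(real_a)), real_a) + l1(G_ab(G_ba(real_b)), real_b)

# In the full objective this term is added to the two adversarial losses, weighted by a
# hyperparameter (around 10 in the original CycleGAN paper).
```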
RECENT WORKS DONE IN CNN IN 2024

Human Brain Map:

Researchers from Harvard University and Google collaborated to create the most detailed
map of a human brain sample ever created, using CNNs to analyze the data.

According to the report, this achievement took ten years of work. It offers new opportunities for understanding the brain's complex network of neurons and could pave the way for breakthroughs in treating neurological and psychiatric conditions.

The project began when Dr Jeff Lichtman, a professor of molecular and cellular biology at
Harvard, received a tiny brain sample from a patient with severe epilepsy. Although the
tissue was smaller than a grain of rice, it contained 57,000 cells, 230 millimetres of blood
vessels, and over 150 million synapses.
