Batch 4 - Revolutionizing Blood Cell Analysis
Batch 4 - Revolutionizing Blood Cell Analysis
BY
Batch – 04
Assistant Professor
Department of Information Technology
[Accredited by National Assessment and Accreditation Council [NAAC] with A Grade – valid from 2022-2027]
2020-2024
GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING FOR WOMEN
CERTIFICATE
This is to certify that the project report entitled “Revolutionizing Blood Cell Analysis Using Machine
Learning” is a bonafide work of the following IV B.Tech. students in the Department of Information
Technology of Jawaharlal Nehru Technological University, Kakinada during the academic year 2023-
2024, in fulfillment of the requirement for the award of the Degree of Bachelor of Technology of this
University.
R. Gowthami G. Keerthana
21JG5A1205 20JG1A1209
External Examiner
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be incomplete without
the mention of people who made it possible and whose constant guidance and encouragement crown all
the efforts with success.
We feel elated to extend our sincere gratitude to Mrs. R. Prasanna Kumari, Assistant Professor for
encouragement all the way during analysis of the project. Her annotations, insinuations and criticisms
are the key behind the successful completion of the thesis and for providing us all the required facilities.
We express our deep sense of gratitude and thanks to Dr. M. Bhanu Sridhar, Professor and Head of
the Department of Information Technology for his guidance and for expressing valuable and grateful
opinions in the project for its development and for providing lab sessions and extra hours to complete
the project.
We would like to take this opportunity to express our profound sense of gratitude to Vice Principal,
Prof. G. Sudheer for allowing us to utilize the college resources thereby facilitating the successful
completion of our thesis.
We would like to take the opportunity to express our profound sense of gratitude to the revered Principal,
Prof. R. K. Goswami for allowing us to utilize the college resources thereby facilitating the successful
completion of our thesis.
We are also thankful to other teaching faculty and non-teaching staff of the Department of Information
Technology for giving valuable suggestions for our project.
VISION & MISSION
INSTITUTE VISION
• To emerge as an acclaimed centre of learning that provides value based technical education
for the holistic development of students.
INSTITUTE MISSION
DEPARTMENT VISION
• Produce competitive engineers instilled with ethical and social responsibilities to deal with the
technological challenges in the field of Information Technology.
DEPARTMENT MISSION
• Facilitate a value based educational environment that provides updated technical knowledge
• Provide opportunities for developing creative, innovative and leadership skills
• Imbue technological and managerial capabilities for a successful career and lifelong learning
TABLE OF CONTENTS
LIST OF FIGURES ii
1. INTRODUCTION 1-2
1.1 MOTIVATION 1
2.1 INTRODUCTION 3
2.2 ALGORITHMS 3
2.5 DATASET 4
2.7 CONCLUSION 5
3.1 INTRODUCTION 6
3.6 CONCLUSION 11
4. DESIGN 12-20
4.1 INTRODUCTION 12
4.3 CONCLUSION 20
5. IMPLEMENTATION 21-59
5.1 ALGORITHMS 21
6.1 INTRODUCTION 60
6.1.1 SCOPE 61
6.1.3 COMPATIBILITY 62
6.4 CONCLUSION 67
7.CONCLUSION 68
7.1 CONCLUSION 68
8. REFERENCES 69
ABSTRACT
In this Research Blood cells help us to fight infections by attacking bacteria, viruses, and germs that
invade the body. White blood cells originate in the bone marrow but circulate throughout the
bloodstream, while red blood cells helps transport oxygen to our body. Accurate counting of those
may require laboratory testing procedure that is not usual to everyone. Generating codes that will help
counting of blood cells that produce accurate response via images gives a relief on this problem. The
suggested method includes preprocessing blood cell pictures, feature extraction, and training machine
learning models for cell detection and counting. A variety of advanced classification techniques are
used they are Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN) are used to
evaluate the shape of cells and differentiate between various cell types.
i
LIST OF FIGURES
ii
LIST OF OUTPUT SCREENS
1.PERFOMANCE EVALUATION 37
2.SIGNUP PAGE 53
3.LOGIN PAGE 54
4.HOME PAGE 54
5.ABOUT PAGE 55
6.LOGOUT PAGE 55
7.EOSINOPHIL OUTPUT 56
8.MONOCYTE OUTPUT 57
9.NEUTROPHIL OUTPUT 58
10.LMYPHOCYTE OUTPUT 59
iii
CHAPTER 1
INTRODUCTION
1.1 MOTIVATION
Machine learning models, such as Convolutional Neural Networks (CNNs), offer a tempting path for
medical research and healthcare innovation in the detection and counting of blood cells. Healthcare
practitioners can greatly minimize the time and effort needed for manual analysis by automating the
blood cell detection and counting process. This will streamline diagnostic procedures and enhance
patient care outcomes. The ultimate goal of this research is to revolutionize haematological analysis and
advance the area of medical diagnostics by utilizing CNNs to provide a reliable and scalable blood cell
identification and counting system. This research project seeks to support larger initiatives aimed at
improving patient outcomes and healthcare delivery through thorough documentation and analysis.
In the field of medical diagnosis and research, the detection and counting of blood cells plays a central
role in the assessment of an individual's health status and the diagnosis of various diseases. The goal of
this project is to develop a robust system that can accurately identify and quantify different types of
blood cells, including white blood cells and red blood cells using microscopic images of blood samples.
The system uses advanced machine learning techniques to analyse images and differentiate different cell
types based on their morphological characteristics such as shape, size and colour.
The project aims to identify and count blood cells using machine learning techniques that significantly
reduce human effort and increase diagnostic accuracy.
While machine learning techniques hold great promise for blood cell detection and counting, it is
important to take into account a few restrictions. First off, the system's accuracy is highly dependent on
the quality of the input photos; discrepancies in image quality, staining methods, and sample preparation
procedures can all cause errors and inconsistencies in the output. Accurate segmentation and
classification may also be hampered by the variety of blood cell morphologies and the existence of
overlapping cells in dense regions. Furthermore, when given data from populations with uncommon or
aberrant blood cell morphologies that are not sufficiently represented in the training dataset, the machine
1
learning models' performance may deteriorate. Moreover, in environments with limited resources, the
processing power needed to train and implement complex machine learning models may be prohibitive.
Chapter 1: Introduction describes the motivation of this project and objective of the developed project.
Chapter 2: Literature survey describes the primary terms involved in the development of this project
and overview of the papers that were referred before starting the project.
Chapter 3: The Analysis chapter deals with detailed analysis of the project. Software Requirement
Specification further contains user requirements analysis, software requirement analysis and hardware
requirement analysis.
Chapter 4: Design includes UML diagrams along with explanation of the system design and the
organization.
Chapter 5: Contains step by step implementation of the project and screenshots of the outputs.
Chapter 6: Gives the testing and validation details with design of test cases and scenarios along with
the screenshots of the validations.
Chapter 7: In this section we conclude the project by citing all the important aspects of the project along
with the different ways in which the project can be developed further are given in the future enhancement
section of the conclusion.
2
CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION
In medical diagnostics and research, blood cell identification and counting are essential procedures that
are necessary for determining a person's overall health and making a variety of disease diagnoses. It
takes a lot of time, work, and human error to analyse blood cells manually using traditional methods
therefore, automated solutions have to be developed for accurate and efficient analysis. The development
of automated blood cell detection and counting systems has been made possible in recent years by
developments in digital imaging technology, as well as the quick development of machine learning and
computer vision techniques.
3
DISADVANTAGES OF THE EXISTING SYSTEM:
Overfitting: When trained on small blood cell datasets, YOLO v5 models are susceptible to overfitting
without proper regularization and modification, which results in poor generalization performance on
new data.
Training data requirements: To train YOLO v5, a significant amount of annotated data is usually
needed. This can be difficult to get for blood cell pictures, particularly for uncommon or abnormal cell
types.
Resolution limitations: YOLO v5 may struggle with detecting small or densely clustered blood cells
due to limitations in resolution or spatial context, potentially leading to undercounting or
misclassification of cells.
2.5 DATASET
The dataset we used contains 12,500 images of blood cells (JPEG). There are approximately 3,000
images for each of 4 different cell types grouped into 4 different folders (according to cell type).
The cell types are Eosinophil, Lymphocyte, Monocyte, and Neutrophil.
4
2.6 FEATURE DESCRIPTION
FEATURE DESCRIPTION
Monocytes Make up about 3% of WBCs and are about two to three times larger than
red blood cells.
Shape: large,round or kidney-shaped cells
Size: 12 to 20 micrometers (µm)
Colour: bluish-gray hue
Eosinophils Make up about 2% of WBCs and identify and destroy parasites and cancer
cells.
Shape: "lobulated" shape or oval shape
Size: 12 to 17 micrometers (µm)
Colour: pale pink or orange
Neutrophils Make up about 62% of WBCs and can destroy foreign particles like bacteria
and viruses.
Shape: polymorphonuclear" or "PMN"
Size: 10 to 12 micrometers (µm)
Colour: pale or light pinkish
Lymphocytes Make up about 32% of WBCs and consist of T cells, natural killer cells, and
B cells.
Shape: Round or slightly oval in shape
Size: 6 to 14 micrometers (µm)
Colour: colourless or have a pale-blue tint
2.7 CONCLUSION
Thus our initiative on blood cell identification, categorization, and counting marks a big step forward in
medical diagnostics and research. Through the use of advanced image processing methods and machine
learning algorithms, we have created a reliable system that can distinguish and measure various blood
cell types from microscopic images. We have proven that the system is capable of achieving high
accuracy and reliability in the blood cell identification and counting task through thorough
experimentation and validation. This initiative highlights how technology is transforming healthcare and
how crucial interdisciplinary collaboration is to encouraging innovation that leads to better medical
results.
5
CHAPTER 3
REQUIREMENT ANALYSIS
3.1 INTRODUCTION
To be used efficiently, all computers need certain Software and Hardware resources to be present on a
computer. These prerequisites are known as (computer) system requirements and are often used as
instructions as opposed to an absolute rule. With the increasing demand for higher processing power and
resources in new versions of software. Industry experts believe that instead of advancements in
technology, this trend is more responsible for the changes that are being made to current computer
systems. The following are the three main tasks that need to be completed during the analysis phase:
• Specifying Software Requirements
• Creating a Content Diagram
• Performing an analysis by drawing a flowchart
Requirement Analysis will identify and consider the risks related to how the technology will be
integrated into the standard operating procedures. Requirements Analysis will also collect the
functional and system requirements of the business process, the user requirements, and the operational
requirements. Hence this section deals with the software and hardware requirements with respect to our
project.
INTRODUCTION
The risks associated with integrating technology into standard processes will be identified and taken into
consideration by requirement analysis. In addition to obtaining user and operational requirements,
requirements analysis will also gather functional and system requirements for the business process.
Hence, the software and hardware specifications for our project are discussed in this section.
User requirements are simply the needs of users that the software system has to meet. The system's and
the users' features and a description of them are provided in the user requirements, which also include
product specifications. It also provides a brief description of the project's technical and statistical
requirements.
6
3.2.2 FUNCTIONAL REQUIREMENTS
The purpose of a system or one of its components is specified by a functional requirement. An input,
behaviour, and output set are characteristics of a function. The definition of what a system is expected
to do can be achieved through specialized functionality, computations, technical specifications, data
processing and manipulation, and other specific functionality. Use cases contain behavioural
requirements that list all scenarios in which the system applies the functional requirements. Non-
functional requirements, also called quality requirements, provide as a support for functional
requirements by placing limitations on the design or implementation (e.g., performance, security, or
reliability requirements).
Software requirements deals with defining the minimum system requirements and software resources
that must be installed for an application to run as efficiently as possible. Usually not included in the
software installation package, these prerequisites must be installed separately before the product can be
installed.
Software: VS code
Language: Python
The hardware, or physical computer resources, are the most common list of requirements specified by
any operating system or software program. Hardware compatibility lists (HCLs) are frequently included
with hardware requirements lists, particularly when it comes to operating systems.
The software being produced by a particular Enthought Python, Canopy, or VS Code user greatly
influences the minimal hardware requirements. Applications that must Constantly keep large arrays or
data in memory will need more RAM, while applications that must rapidly complete a lot of calculations
or operations will need a faster processor:
7
3.3 CONTENT DIAGRAM OF THE PROJECT
8
3.5 LIST OF MODULES
• Data Pre-Processing
• Principal Component Analysis
• Model Training for Artificial Neural Network
• Model Training for Convolutional Neural network
• Performance Evaluation
• User Interface
1.Data Acquisition:
We have used the dataset named as “Blood Cell Images”. The dataset is made up of blood cell images
divided into 4 classes they are Eosinophils, Monocyte, Lymphocyte, Neutrophil containing 12500
images, having approximately 3000 images from each class. The images are represented in JPEG format.
2.Data Preprocessing:
Data Preprocessing is the process of converting raw data into format suitable for machine learning
algorithms. It is done to improve the quality and enhance the performance of data. In the context of
preprocessing, resize refers to converting all the images into same size. By normalization, the categorical
data is transformed into numerical format and grayscale turns the image into gray color. This process
prepares the input data for training.
TensorFlow is a free and open-source machine learning framework developed by Google Brain Team.
It is used to build and train deep learning models. TensorFlow offers flexibility, scalability for
performing tasks like image classification, natural language processing and reinforcement learning. To
import TensorFlow library, simply write a command “import tensor flow as tf”.
B. Input Layer
Input Layer is the first Layer of CNN model. This layer serves as entry point for the dataset. In case of
images they turn out into pixel values and enters. The input layer comprises of images dimensions, such
as height, width, and number of color channels. Fixed range of 120x120 RGB pixels are taken as inputs.
9
C. Convolutional Layer
The Conv2D layer is a fundamental building block in Convolutional Neural Networks (CNNs) for
processing two-dimensional inputs, such as images. It performs convolutional operations on the input
data, applying a set of filters to extract features. This layer performs a convolution operation on the input
data, using a set of learnable filters to extract features. As the filters slide over the input image, they
compute the dot product with overlapping region of the input at each position, generating a feature map.
D. Pooling layer
Pooling layer is responsible for decreasing the dimensions of feature maps. It extracts the important
features from numerous features maps where it performs dimensionality reduction. Moving forward, the
operations will be held on summed-up features rather than precisely positioned features from
convolutional Layer. We considered max pooling to extract maximum value from feature maps. So it
reduces the dimension of feature maps.
Fully connected layer connects every neuron to other neuron present in the layer allowing the network
to learn complex patterns. Fully connected layers works in conjunction with convolutional layer for
image classification tasks, where they combine the features learned by preceding layers.
10
F. Activation Function
Activation Functions are the mathematical operations applied over the neurons in the neural networks
including CNN’s, introducing non-linearity to the model for extracting complex patterns from the
images. ReLU is commonly used but also many other depending upon the problem.
ReLU, short for Rectified Linear Unit, is one of the most commonly used activation functions in neural
networks, including Convolutional Neural Networks (CNNs). It's defined asf(x)=max(0,x), which means
it outputs the input value if it's positive and zero otherwise.
Sigmoid is an activation function commonly used in neural networks, particularly in binary classification
tasks. It squashes the output of a neuron to the range [0, 1], making it suitable for tasks where the output
needs to be interpreted as a probability. Mathematically, the sigmoid function is defined as f(x)=1/(1+e-
x),where x represents the input to function.
3.6 CONCLUSION
This chapter starts with an introduction to analysis in general. Next, software requirements and hardware
requirements are covered, including which software is needed to implement and operate the system and
what hardware is needed in order to set up and manage the project.
11
CHAPTER 4
DESIGN
4.1 INTRODUCTION
Software Design is a process of planning the new or modified system. The design step produces the
understanding and procedural details necessary for implementing the system recommended in the
feasibility study. Analysis specifies what a new or modified system does. Design specifies how to
accomplish the same. Design is essentially a bridge between requirement specification and the final
solution satisfying the requirement. It is a blueprint or a solution for the system. The design step
produces a data design, an architectural design and procedural design. The design process for a software
system has two levels. At first level the focus is on depending in which modules are needed for the
system, the specification of these modules and how the modules should be interconnected. This is what
is called system designing of top-level design. In the second level, the internal design of the modules,
or how the specification should be interconnected.
It is the first level of the design which produces the system design, which defines the components
needed for the system, and how the components interact with each other. It focusses on depending in
which the modules are needed for the system; the specification of these module should be
interconnected.
Logic Design
The logic design of this application shows the major features and how they are related to one another.
The detailed specifications are drawn on the bases of user requirements. The outputs, inputs and
relationship between the variables are designed in this phase.
Input Design
The input design is the bridge between users and information system. It specifies the way the data enters
the system for processing. It can ensure the reliability of the system and produces reports from accurate
data, or it may result in the output of error information. While designing the following points must be
taken into consideration: Input Formats are designed as per user requirements. • Interaction with the
user is maintained in simple dialogues. Appropriate fields are designed with validations there by
allowing valid inputs.
12
Output Design
Each and every presented in the system is result-oriented. The most important feature of this application
for users is the output. Efficient output design improves the usability and acceptability of the system
and helps in decision making.
• Visualizing
• Specifying
• Constructing
• Documenting
Goals of UML
1. Provide users with a ready-to-use, expressive visual modelling language so they can develop and
exchange meaningful models.
13
7. The system can be a software or non-software. So it must be clear that UML is not just a
development method.
The basic building blocks in UML are things and relationships. These are combined in different ways
following different rules to create different types of diagrams. In UML there are Eight types of
diagrams, below is a list and brief description of them. The more indepth descriptions in the document,
will focus on the first five diagrams in the list, which can be seen as the most general, sometimes also
referred to as the UML core diagrams.
The UML consists of a number of graphical elements that combine to form diagrams. Because it’s a
language, the UML has rules for combining these elements. The purpose of the diagrams to present
multiple views of the system, and this set of multiple views is called a Model. A UML Model of a
system is something like a scale model of a building. UML model describes what a system is supposed
to do. It doesn’t tell how to implement the system. These are the artefacts of a software-intensive
system.
The abbreviation for UML is Unified Modelling Language and is being brought of a designed to make
sure that the existing ER Diagrams which do not serve the purpose will be replaced by this UML
Diagrams where in these language as its own set of Diagrams.
Some of the Diagrams that help for the Diagrammatic Approach for the Object Oriented Software
Engineering are:
• Class Diagrams
• Use Case Diagrams
• Sequence Diagrams
• State chart Diagrams
• Activity Diagrams
Using the above mentioned diagram’s we can show the entire system regarding the working of the
system or the flow of control and sequence of flow the state of the system and the activities involved
in the system.
14
4.2.1 CLASS DIAGRAM
A Class is a category or group of things that has similar attributes and common behaviour. A Rectangle
is the icon that represents the class it is divided into three areas. The upper most area contains the
name, the middle area contains the attributes and the lowest areas show the operations. Class diagrams
provides the representation that developers work from. The classes in a Class diagram represent both
the main objects, interactions in the application and the classes to be programmed. In the diagram,
classes are represented with boxes which contain three parts:
1. The top part contains the name of the class. It is printed in bold and cantered, and the first letter is
capitalized.
2. The middle part contains the attributes of the class. They are left-aligned and the first letter is
lowercase.
3. The bottom part contains the methods the class can execute. They are also left-aligned and the first
letter is lower case. In the design of a system, a number of classes are identified and grouped
together in a class diagram which helps to determine the static relations between those described
objects. With detailed modeling, the classes of the conceptual design are often split into a number
of subclasses.
Visibility:
To specify the visibility of a class member (i.e., any attribute or method), the sanitation’s must be
placed before the member's name:
+ Public
/ Derived
# Protected
- Private
~ Package
15
4.2.2 USE-CASE DIAGRAM
A Use-Case is a description of systems behaviour from a understand point. For system developer this
is a valuable tool: it’s a tried-and-true technique for gathering system requirements from a user’s point
of view. That is important if the goal is to build a system that real people can use. A little stick figure
is used to identify an actor the ellipse represents use-case. Here, we have two actors namely developer
and user. The developer is involved in all the steps whereas the user is only interested in visualizing
the result, testing the model and displaying the result.
Use cases: A use case describes a sequence of actions that provide something of measurable value to
an actor and is drawn as a horizontal ellipse.
Actors: An actor is a person, Organization, or external system that plays a role in one or more
interactions with your system. Actors are drawn as stick figures.
Associations: Associations between actors and use case are indicated in use case diagrams by solid
lines. An association exists whenever an actor is involved with an interaction described by a use case.
Associations are modeled as line connecting use 28 cases and actors to one another, with an optional
arrowhead on one end of the line. The arrowhead is often used to indicating the direction of the initial
invocation of the relationship or to indicate the primary actor within the use case.
1. Include
In one form of interaction, a given use case may include another. Include is a directed relationship
between two use cases, implying that the behaviour of the included use case is inserted into the
behaviour of the including use case.
2. Extend
In another form of interaction, a given use case (the extension) may extend another. This relationship
indicates that the behaviour of the extension use case may be inserted in the extended use case under
some conditions. The notation is a dashed arrow from the extension to the extended use case, with the
label ‖<>‖.
16
3. Generalization
In the third form of relationship among use cases, generalization/specialization relationship exists. A
given use case may have common behaviours, constraints and assumptions to the general use case.
The notation is a solid line ending in a hollow triangle drawn from the specialized to the more general
use case.
In a functioning system objects interacts with one another and these interactions occur over time. The
UML Sequence Diagrams shows the time based dynamics of the interaction. The sequence diagrams
consist of objects represented in the use always named rectangles (If the name underlined), messages
represented as solid line arrows and time represented as a vertical progression. 30 Sequence diagrams
describe interactions among classes in terms of an exchange of messages overtime. It is an interaction
diagram that emphasizes the time ordering of messages. A narrow rectangle on an objects lifetime
represents activation- an exchange of one of that objects operation.
Lifeline: A lifeline represents an individual participant in the sequence diagram. It is at the very top of
the diagram.
Actor: An actor is a character who interacts with the subject and plays a part. It isn't covered by the
system's capabilities. It depicts a role in which human users interact with external devices or subjects.
17
Message: Message denotes the interactions between the messages. They are sequential ordering the
timeline.
Call message: By defining a specific communication between the interaction's lifelines, it shows that
the target lifeline has invoked an operation.
Return Message: It specifies a specific communication between the interaction lifelines, which reflect
the flow of data from the receiver of the associated caller message.
Activity diagram is another important diagram in UML to describe dynamic aspects of the system.
Activity diagram is basically a now chart to represent the flow from one activity to another activity. The
activity Can be described as an operation of the system. So the control now is drawn from one operation
to another, This flow can be sequential, branched or concurrent. Activity diagrams deal With all type Of
flow control by using different elements like fork, join etc.
PURPOSE:
The basic purposes of activity diagrams are similar to the other four diagrams. Other four diagrams are
used to show the message flow from one object to another but the activity diagram is used to show the
message flow from one activity to another. Activity is a particular operation of the system. It captures
the dynamic behavior of the system. Activity diagrams are not only used for visualizing the dynamic
nature of a system but they are also used to construct the executable system by using forward and reverse
engineering techniques. The only missing thing in the activity diagram is the message part.
18
Fig 4.2.4 Activity Diagram
State chart diagram is one of the five UML diagrams used to model the dynamic nature of a system.
They define different states of an object during its lifetime. And these states are changed by events. So
State chart diagrams are useful to model reactive systems, Reactive systems can be defined as a system
that responds to external or internal events. State chart diagram describes the flow of control from one
state to another state. States are defined as a condition in which an object exists and it changes when
some event is triggered. So the most important purpose of a State chart diagram is to model the time of
an object from creation to termination.
The first state is an idle state from where the process starts. The next states are arrived for events like
send request, confirm request, and dispatch order. These events are responsible for state changes of order
objects.
During the life cycle of an object (here order object) it goes through the following states and there may
be some abnormalities also. This abnormal exit may occur due to some problem in the system.
FUNCTIONING:
The user or admin login to the module and is authenticated using some validation checks if the user is
authenticated he/she will be allowed to search for an image else he/she have to make an attempt to Iogin
again.If the query image inserted satisfies all the properties like extension of image(pg.png the relevant
search is found and the images are retrieved else not found.
19
Fig 4.2.5 State Chart Diagram
4.3 CONCLUSION
UML diagrams are not only made for developers but also for business users, common people, and
anybody interested to understand the system. The system can be a software or non-software system.
Thus it must be clear that UML is not a development method rather it accompanies processes to make
it a successful system. In conclusion, the goal of UML can be defined as a simple modeling mechanism
to model all possible practical systems in today's complex environment.
20
CHAPTER 5
IMPLEMENTATION
5.1 ALGORITHMS
Artificial Neural Networks contain artificial neurons which are called units. The Artificial Neural
Network of a system is made up of these units grouped in a sequence of layers. Artificial neural networks
typically consist of hidden layers, output layers, and input layers. The input layer is where external data
is entered into the neural network for analysis or training. After that, the data goes through one or more
hidden layers, which convert the input into useful data for the output layer. Lastly, the Artificial Neural
Networks' reaction to the provided input data is presented as an output by the output layer.
• Artificial neural networks have the ability to provide the data to be processed in parallel, which
means they can handle more than one task at the same time.
• We are able to train ANN’s that these networks learn from past events and make decisions.
• ANN stored the information on the entire network rather than the database.
• It stored the information on the entire network rather than the database.
• ANN has distributed memory that helps to generate the desired output.
21
CONVOLUTIONAL NEURAL NETWORK
Convolutional Neural Network (CNN) is a type of artificial neural network specifically designed for
processing and analyzing structured grid data, such as images. CNNs are composed of multiple layers,
including convolutional layers, pooling layers, activation functions, fully connected layers, and often a
softmax layer for classification tasks. In CNNs, learnable filters slide over input data, extracting features
like edges and textures. Pooling layers reduce spatial dimensions, controlling computational load and
overfitting. Activation functions introduce non-linearity, enabling complex pattern learning. Fully
connected layers combine high-level features for classification.
• Convolutional Layers: These layers apply convolutional operations to input images, using
filters (also known as kernels) to detect features such as edges, textures, and more complex
patterns. Convolutional operations help preserve the spatial relationships between pixels.
• Pooling Layers: Pooling layers down sample the spatial dimensions of the input, reducing
the computational complexity and the number of parameters in the network. Max pooling is a
common pooling operation, selecting the maximum value from a group of neighbouring pixels.
• Activation Functions: Non-linear activation functions, such as Rectified Linear Unit
(ReLU), introduce non-linearity to the model, allowing it to learn more complex relationships
in the data.
• Fully Connected Layers: These layers are responsible for making predictions based on the
high-level features learned by the previous layers. They connect every neuron in one layer to
every neuron in the next layer.
22
Fig 5.1.2 – CNN Architecture
Data pre-processing is an important step in preparing a dataset for analysis, as it can help improve the
quality of the data and enhance the accuracy of the analysis. In the case of a kidney disease dataset, the
following are some common pre-processing steps that may be undertaken:
1.Data cleaning: This involves identifying and handling missing, incomplete, or inconsistent data
points. For instance, missing values can be imputed using techniques such as mean or median imputation,
or dropped altogether.
2.Outlier detection: This involves identifying and handling outliers, which are data points that deviate
significantly from the rest of the data. Outliers can be handled using techniques such as removing them
or replacing them with more appropriate values.
3.Feature selection: This involves selecting the most relevant features or variables that are likely to
have a significant impact on the analysis. This can be done using techniques such as correlation analysis
or statistical tests.
4.Data transformation: This involves transforming the data to a more suitable format for analysis. For
example, categorical data can be transformed into numerical data using one-hot encoding or label
encoding.
CODE:
import numpy as np
from skimage import io
image = np.random.randint(0, 10, (3, 5, 5))
N = 2 # Number of Filters
23
kernel = np.random.randint(0, 2, (N, 3, 3, 3))
def convolve2D(a, b, p=0, s=1):
# a --> Input
# b --> Kernel
# p --> Padding
# s --> Stride
# sizes of Input & Kernel
a_size = np.shape(a)
b_size = np.shape(b)
# initialization of output
c_size = (int(1 + (a_size[0] - b_size[0] + 2 * p) / s), int(1 + (a_size[1] - b_size[1] + 2 * p) / s))
c = np.zeros(c_size)
# Padding
if p != 0:
padded_image_size = (a_size[0] + p * 2, a_size[1] + p * 2)
padded_image = np.zeros(padded_image_size)
padded_image[p: -1 * p, p: -1 * p] = a
else:
padded_image = a
# Iterate through image
for y in range(a_size[1]):
# Exit Convolution
if y > a_size[1] - b_size[1]:
break
# Only Convolve if y has gone down by the specified Strides
if y % s == 0:
for x in range(a_size[0]):
# Go to next row once kernel is out of bounds
if x > a_size[0] - b_size[0]:
break
try:
# Only Convolve if x has moved by the specified Strides
if x % s == 0:
c[x, y] = (b * padded_image[x: x + b_size[0], y: y + b_size[1]]).sum()
except:
24
break
return padded_image
def convolution_layer(a, b):
# n --> no of filters
a_size = np.shape(a) # Shape of image
b_size = np.shape(b) # Shape of filter
c_size = (b_size[0], a_size[2] - b_size[3] + 1, a_size[1] - b_size[2] + 1) # Shape of Fetaure map
c = np.zeros(c_size, dtype=int)
for k in range(b_size[0]):
for i in range(c_size[2]):
for j in range(c_size[1]):
c[k, i, j] = convolve2D(a[0, i:i + len(b), j:j + len(b)], b[k][0])[0][0] + \
convolve2D(a[1, i:i + len(b), j:j + len(b)], b[k][1])[0][0] + \
convolve2D(a[2, i:i + len(b), j:j + len(b)], b[k][2])[0][0]
return c
#Feature Map
feature_map = convolution_layer(image,kernel)
print(f" Feature Map \n {feature_map}")
print(f" Size of Feature Map is {np.shape(feature_map)}")
# (2,3,3) --> ( 3 x 3 x 2)
OUTPUT:
#Importing Libraries:
import numpy as np
import pandas as pd
from scipy.spatial import distance as dist
import matplotlib.pyplot as plt
import os
25
import cv2
import seaborn as sns
from tqdm import tqdm
from sklearn.utils import shuffle
from sklearn import decomposition
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
import tensorflow as tf
import keras
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.models import Sequential, Model
from keras.initializers import he_normal
from keras.layers import Lambda, SeparableConv2D, BatchNormalization, Dropout, MaxPooling2D,
Input, Dense, Conv2D, Activation, Flatten
from keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
#Loading Data:
class_names = ['EOSINOPHIL', 'LYMPHOCYTE', 'MONOCYTE', 'NEUTROPHIL']
nb_classes = len(class_names)
image_size = (120,120)
def load_data():
datasets = [r'C:\Users\pramithasri\Downloads\project\dataset2-master\dataset2-
master\images\TRAIN',r'C:\Users\pramithasri\Downloads\project\dataset2-master\dataset2-
master\images\TEST']
images = []
labels = []
# iterate through training and test sets
count =0
for dataset in datasets:
# iterate through folders in each dataset
for folder in os.listdir(dataset):
if folder in ['EOSINOPHIL']: label = 0
elif folder in ['LYMPHOCYTE']: label = 1
26
elif folder in ['MONOCYTE']: label = 2
elif folder in ['NEUTROPHIL']: label = 3
# iterate through each image in folder
for file in tqdm(os.listdir(os.path.join(dataset, folder))):
# get pathname of each image
img_path = os.path.join(os.path.join(dataset, folder), file)
# Open
image = cv2.imread(img_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# resize the image
image = cv2.resize(image, image_size)
# Append the image and its corresponding label to the output
images.append(image)
labels.append(label)
images = np.array(images, dtype = 'float32')
labels = np.array(labels, dtype = 'int32')
return images, labels
images, labels = load_data()
OUTPUT:
#Classifying Data:
images, labels = shuffle(images, labels, random_state=10)
train_images, test_images, train_labels, test_labels = train_test_split(images, labels, test_size = 0.2)
test_images, val_images, test_labels, val_labels = train_test_split(test_images, test_labels, test_size =
0.5)
n_train = train_labels.shape[0]
n_val = val_labels.shape[0]
n_test = test_labels.shape[0]
print("Number of training examples: {}".format(n_train))
print("Number of validation examples: {}".format(n_val))
27
print("Number of testing examples: {}".format(n_test))
print("Training images are of shape: {}".format(train_images.shape))
print("Training labels are of shape: {}".format(train_labels.shape))
print("Validation images are of shape: {}".format(val_images.shape))
print("Validation labels are of shape: {}".format(val_labels.shape))
print("Test images are of shape: {}".format(test_images.shape))
print("Test labels are of shape: {}".format(test_labels.shape))
OUTPUT:
In this module we split the data into training and test dataset in 75:25 ratio. After that we applying two
algorithms on the dataset i.e; ANN and CNN algorithms. After applying algorithms we will evaluate the
performance of each algorithm:
CODE:
#ANN
from tensorflow.keras.models import Sequential
OUTPUT:
29
#Accuracy of ANN Model:
results = model_ann.evaluate(test_images, test_labels)
print("Loss of the model is - test ", results[0])
print("Accuracy of the model is - test", results[1]*100, "%")
results = model_ann.evaluate(val_images, val_labels)
print("Loss of the model is - val ", results[0])
print("Accuracy of the model is - val", results[1]*100, "%")
results = model_ann.evaluate(train_images, train_labels)
print("Loss of the model is - train ", results[0])
print("Accuracy of the model is - train", results[1]*100, "%")
OUTPUT:
#Data Classification:
print(classification_report(
test_labels,
predictions,
target_names = ['EOSINOPHIL (Class 0)', 'LYMPHOCYTE (Class 1)', 'MONOCYTE (Class 2)',
'NEUTROPHIL (Class 3)']))
30
OUTPUT:
#Confusion Matrix:
cm = confusion_matrix(test_labels, predictions)
cm = pd.DataFrame(cm, index = ['0', '1', '2', '3'], columns = ['0', '1', '2', '3'])
cm
OUTPUT:
31
OUTPUT:
#CNN:
model1 = Sequential()
# First Conv Layer
model1.add(Conv2D(16 , (3,3) , activation = 'relu' , input_shape = (120,120,3)))
model1.add(MaxPooling2D(pool_size = (2,2)))
# Second Conv Layer
model1.add(Conv2D(32, (3,3), activation = 'relu'))
model1.add(MaxPooling2D(pool_size = (2,2)))
model1.add(Dropout(0.25))
# Third Conv Layer
model1.add(Conv2D(64, (3,3), activation = 'relu'))
model1.add(MaxPooling2D(pool_size = (2,2)))
model1.add(Dropout(0.25))
# FC layer
model1.add(Flatten())
model1.add(Dense(units = 128 , activation = 'relu'))
32
model1.add(Dropout(0.25))
# Output layer
model1.add(Dense(units = 4 , activation = 'softmax'))
# Compile
model1.compile(optimizer = "adam" , loss = 'sparse_categorical_crossentropy' , metrics = ['accuracy'])
model1.summary()
# Train
history1 = model1.fit(
train_images,
train_labels,
batch_size = 32,
epochs = 10,
validation_data=(val_images, val_labels))
#OUTPUT:
33
#Accuracy of CNN Model:
34
#Data Classification:
print(classification_report(
test_labels,
predictions,
target_names = ['EOSINOPHIL (Class 0)', 'LYMPHOCYTE (Class 1)', 'MONOCYTE (Class 2)',
'NEUTROPHIL (Class 3)']))
OUTPUT:
#Confusion Matrix:
cm = confusion_matrix(test_labels, predictions)
cm = pd.DataFrame(cm, index = ['0', '1', '2', '3'], columns = ['0', '1', '2', '3'])
cm
OUTPUT:
35
linewidth = 1,
annot = True,
fmt = '',
xticklabels = class_names,
yticklabels = class_names)
plot_confusion_matrix(cm)
OUTPUT:
OUTPUT:
37
5.5 USER INTERFACE
In this module a user interface is developed to input the data for blood cell claasification. The user
interface is developed using FLASK, python, html.
HTML:
The Hypertext Markup Language or HTML is the standard markup language for documents designed to
be displayed in a web browser. It is often assisted by technologies such as Cascading Style Sheets and
scripting languages such as JavaScript.
HTML uses tags to define the elements within a web page, such as headings, paragraphs, images, links,
and forms. These tags are enclosed in angled brackets (< >) and are paired with an opening tag and a
closing tag. For example, to create a heading, you would use the <h1> opening tag and the </h1> closing
tag.
HTML also allows for the use of attributes, which provide additional information about an element.
Attributes are placed within the opening tag and are comprised of a name and a value. For example, the
<img> element for an image can have attributes such as src (the location of the image file) and alt
(alternative text that describes the image for accessibility).
CODE:
Index.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Upload Image for Classification</title>
<style>
body {
background-image:
url('https://femina.wwmindia.com/content/2021/may/01redshutterstock10277359781621494933.jpg');
background-size: cover;
display: flex;
justify-content: center;
38
align-items: center;
height: 100vh;
margin: 0;
}
.container {
width: 300px;
padding: 20px;
box-shadow: 0 4px 8px rgba(0,0,0,0.1);
background: rgba(255, 255, 255, 0.8);
border-radius: 8px;
}
input, button {
display: block;
width: 100%;
margin-top: 10px;
}
nav {
position: fixed;
right: 0;
top: 0;
padding: 10px;
background: rgba(255, 255, 255, 0); /* More transparent background */
}
.nav-button {
background-color: #007bff;
border: none;
color: white;
padding: 10px 20px;
text-align: center;
text-decoration: none;
display: inline-block;
font-size: 16px;
margin: 2px;
cursor: pointer;
border-radius: 5px;
39
transition: background-color 0.3s;
}
.nav-button:hover {
background-color: #0056b3;
}
</style>
</head>
<body>
<nav>
<a href="login" class="nav-button">Home</a>
<a href="about" class="nav-button">About</a>
</nav>
<div class="container">
<form action="/predict" method="post" enctype="multipart/form-data">
<h2>Upload Image</h2>
<input type="file" name="file" required>
<button type="submit">Predict</button>
</form>
</div>
</body>
</html>
#About.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>About Blood Cells</title>
<style>
body {
background-image: url('static/img2.webp');
font-family: Arial, sans-serif;
display: flex;
justify-content: center;
40
align-items: center;
height: 100vh;
margin: 0;
background-color: #f4f4f4;
}
.container {
width: 50%;
padding: 20px;
background-color: rgba(255, 255, 255, 0.667);
box-shadow: 0 4px 8px rgba(0,0,0,0.1);
border-radius: 8px;
}
h1 {
text-align: center;
}
p{
text-align: justify;
font-size: large;
font-weight: 400;
}
</style>
</head>
<body>
<div class="container">
<h1>About Blood Cells</h1>
<p>LYMPHOCYTES are small, typically measuring about 6 to 14 micrometers (µm) in diameter.
They are round or slightly oval in shape and lymphocytes are colorless or have a pale-blue tint.</p>
41
<p>Neutrophils are about 10 to 12 micrometers (µm) in diameter, Short Lifespan polymorphonuclear"
or "PMN" due to the multiple nuclei. A pale or light pinkish in colour.</p>
</div>
</body>
</html>
#Login.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Sign Up / Log In</title>
</head>
<body>
<style>
body {
background-image: url('/static/image.png');
background-size: cover;
font-family: Arial, sans-serif;
background-color: #f0f0f0;
margin: 0;
padding: 0;
}
.container {
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
}
.form-container {
background-color: #eee4e4c4;
border-radius: 8px;
box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
padding: 20px;
42
max-width: 400px;
width: 100%;
}
.form-toggle {
display: flex;
justify-content: center;
margin-bottom: 20px;
}
.form-toggle button {
background-color: transparent;
border: none;
cursor: pointer;
padding: 10px;
width: 100px;
font-size: 16px;
outline: none;
}
.form-toggle button.active {
border-bottom: 2px solid #333;
}
.form {
display: flex;
flex-direction: column;
}
.form input {
margin-bottom: 10px;
padding: 10px;
border: 1px solid #ccc;
border-radius: 4px;
outline: none;
}
.form button {
background-color: #333;
color: #fff;
border: none;
43
padding: 10px;
border-radius: 4px;
cursor: pointer;
width:400px;
}
.form button:hover {
background-color: #555;
}
.hidden {
display: none;
}
</style>
<div class="container">
<div class="form-container">
<div class="form-toggle">
<button id="logInBtn" class="active">Log In</button>
<button id="signUpBtn">Sign Up</button>
</div>
<form id="logInForm" class="form" action="/login" method="POST">
<input type="text" name="email" id="loginUsername" placeholder="Username/Email">
<input type="password" name="password" id="loginPassword" placeholder="Password">
<button type="submit">Log In</button>
</form>
<form id="signUpForm" class="form hidden" action="/register" method="POST">
<input type="text" id="username" name="name" placeholder="Username">
<input type="email" id="email" name="email" placeholder="Email">
<input type="date" id="date" name="date" placeholder="date">
<input type="password" id="password" name="password" placeholder="Password">
<input type="password" id="confirmPassword" placeholder="Confirm Password">
<button type="submit">Sign Up</button>
</form>
</div>
</div>
<script>
document.getElementById('logInBtn').addEventListener('click', function() {
44
document.getElementById('logInForm').classList.remove('hidden');
document.getElementById('signUpForm').classList.add('hidden');
this.classList.add('active');
document.getElementById('signUpBtn').classList.remove('active');
});
document.getElementById('signUpBtn').addEventListener('click', function() {
document.getElementById('signUpForm').classList.remove('hidden');
document.getElementById('logInForm').classList.add('hidden');
this.classList.add('active');
document.getElementById('logInBtn').classList.remove('active');
});
</script>
</body>
</html>
#logout.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>About Blood Cells</title>
<style>
body {
background-image: url(img2.webp);
font-family: Arial, sans-serif;
display: flex;
justify-content: center;
align-items: center;
height: 100vh;
margin: 0;
background-color: #f4f4f4;
}
.container {
width: 600px;
45
padding: 20px;
background-color: rgba(255, 255, 255, 0.863);
box-shadow: 0 4px 8px rgba(0,0,0,0.1);
border-radius: 8px;
}
h1 {
text-align: center;
}
p{
text-align: justify;
font-size: large;
font-weight: 400;
}
</style>
</head>
<body>
<div class="container">
<h1>you are sucessfully logged out. </h1>
</div>
</body>
</html>
#result.html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Classification Result</title>
<style>
body {
background-image: url('/static/img3.webp');
background-size: cover;
display: flex;
46
justify-content: center;
align-items: start;
height: 100vh;
margin: 0;
padding-top: 50px;
}
div.container {
width: 400px;
padding: 20px;
margin-top: 200px;
text-align: center;
box-shadow: 0 4px 8px rgba(0,0,0,0.1);
background: rgba(255, 255, 255, 0.8)
border-radius: 8px;
}
a{
display: block;
margin-top: 20px;
text-decoration: none;
color: #260eff;
font-weight: bold;
}
nav {
position: fixed;
right: 0;
top: 0;
padding: 10px;
background: rgba(255, 255, 255, 0);
}
.nav-button {
background-color: #007bff;
border: none;
color: white;
padding: 10px 20px;
47
text-align: center;
text-decoration: none;
display: inline-block;
font-size: 16px;
margin: 2px;
cursor: pointer;
border-radius: 5px;
transition: background-color 0.3s;
}
.nav-button:hover {
background-color: #0056b3;
}
</style>
</head>
<body>
<nav>
<a href="about" class="nav-button">About</a>
<a href="logout" class="nav-button">Logout</a>
</nav>
<div class="container">
<h1>Classification Result</h1>
48
5.5.2 BACK END DESIGN CODE
PYTHON:
Python is a high-level, interpreted programming language that is widely used for general-purpose
programming, data science, web development, artificial intelligence, and many other applications. It was
created by Guido van Rossum in the late 1980s and released in 1991. Python is known for its simple
syntax, which makes it easy to learn and use, even for beginners.
It uses whitespace indentation to delimit code blocks, rather than using curly braces like many other
programming languages. Python has a vast standard library that includes modules for a wide range of
tasks, such as handling files, working with databases, networking, and more. It also has a large and active
community of developers who contribute to a rich ecosystem of third-party packages that extend the
language's capabilities.
Python supports multiple programming paradigms, including procedural, object-oriented, and functional
programming. It is dynamically typed, meaning that variable types are inferred at runtime, rather than
being explicitly declared. Python is widely used in many industries, including finance, healthcare,
education, and Technology.
FLASK:
Flask is a micro web framework written in Python. It is classified as a microframework because it does
not require particular tools or libraries. It has no database abstraction layer, form validation, or any other
components where pre-existing third-party libraries provide common functions.
#app.py
49
user="root",
password="",
database="blood"
)
db_cursor=db_connection.cursor()
def preprocess_image(image_path):
img_array = image.img_to_array(img)
@app.route('/')
def home():
return render_template('login.html')
@app.route('/login', methods=['GET','POST'])
def login():
50
if request.method=="POST":
email=request.form['email']
password=request.form['password']
db_cursor.execute('SELECT * FROM user WHERE email=%s AND
password=%s',(email,password))
user=db_cursor.fetchone()
if user:
return render_template("index.html")
return render_template("login.html")
@app.route("/register", methods=['GET','POST'])
def register():
if request.method=='POST':
name=request.form['name']
email=request.form['email']
date=request.form['date']
password=request.form['password']
db_connection.commit()
return render_template("login.html")
return render_template("login.html")
@app.route('/about')
def about():
return render_template("about.html")
@app.route('/logout')
def logout():
return render_template("login.html")
@app.route('/index')
def index():
return render_template("index.html")
51
@app.route('/predict', methods=['POST'])
def predict():
uploaded_file = request.files['file']
uploaded_file.save(temp_file_path)
try:
img_array = preprocess_image(temp_file_path)
predicted_class_index = predict_image_class(img_array)
predicted_class_label = target_names[predicted_class_index]
background_image_url = background_images[predicted_class_index]
return
render_template('results.html',
prediction=predicted_class_label, background_image_url=background_image_url)
finally:
if _name_ == '_main_':
app.run(debug=True)
52
5.5.3 OUTPUT SCREENSHOTS:
Signup page
53
Login page
Home page
54
About page
Logout page
55
EOSINOPHIL
56
MONOCYTE
57
NEUTROPHIL
58
LYMPHOCYTE
59
CHAPTER 6
TESTING AND VALIDATION
6.1. INTRODUCTION
We typically need to consider multiple models before picking the final that will eventually be used to
perform predictions on real-world data. In this process, we usually train models over a subset of the
original dataset that is the dataset that the models use to learn from data. Therefore, we need to somehow
evaluate which of the candidate models performs better based on the specific metrics that have been
determined at the beginning based on the nature of the project and problem we are aiming to solve.
Finally, once the final model has been selected, we also need to evaluate whether it can generalise well
to new, unseen data. In order to be able to train the models, perform model selection and finally evaluate
the final model in order to check whether it can generalise well, we typically split the original dataset
into training, testing and validation sets. In the sections below we are going to discuss the purpose that
each of them serves in the context of supervised learning.
Training Set:
The training set is typically the largest of the sets created from the original dataset and is used to fit the model. In other words, the data points included in the training set are used to learn the parameters of the model of interest. In the context of supervised learning, every training example should include both the predictor variables (i.e., the features) and the corresponding output variable (i.e., the label). During the training phase, the correct labels can be used to derive the training accuracy, which can then be compared against the test accuracy (see below) to evaluate whether the model has been overfitted.
Validation Set:
The validation set is useful when it comes to hyper-parameter tuning and model selection. The validation examples included in this set are used to find the optimal values for the hyper-parameters of the model under consideration. When working with machine learning models, you typically need to test multiple models with different hyper-parameter values in order to find the values that give the best possible performance. Therefore, to pick the "best" model objectively, you need to evaluate each of them. For instance, in deep learning we use the validation set to choose the optimal network layer size, the number of hidden units and the regularisation term.
Testing Set:
Once you have tuned the model by performing hyper-parameter optimisation, you end up with the final model. The testing set is used to evaluate the performance of this model and to ensure that it can generalise well to new, unseen data points. At this point, you should also compare the testing accuracy against the training accuracy to ensure that the model was not overfitted; this is the case when the two accuracies are "close enough". When the training accuracy significantly outperforms the testing accuracy, there is a good chance that overfitting has occurred.
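To make the three-way split concrete, the short sketch below divides a placeholder feature matrix and its labels into training, validation and testing subsets using scikit-learn's train_test_split; the 70/15/15 ratio, array sizes and variable names are assumptions chosen only for illustration and are not the exact split used in this project.

# split_sketch.py - minimal train/validation/test split (illustration only)
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 128)              # placeholder feature vectors
y = np.random.randint(0, 4, size=1000)     # placeholder labels for the 4 cell classes

# First carve out 15% of the data as the final, untouched test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# Then split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 700 / 150 / 150

Stratifying on the labels keeps the proportion of each cell class roughly the same in all three subsets.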
6.1.1 SCOPE
A primary purpose for testing is to detect software failures so that defects may be uncovered and
corrected. This is a non-trivial pursuit. Testing cannot establish that a product functions properly under
all conditions but can only establish that it does not function properly under specific conditions. The
scope of software testing often includes the examination of code as well as the execution of that code in various environments and conditions, and examining aspects of the code: does it do what it is supposed to do, and does it do what it needs to do? In the current culture of software development, a testing organization may
be separate from the development team. There are various roles for testing team members. Information
derived from software testing may be used to correct the process by which software is developed.
Not all software defects are caused by coding errors. One common source of expensive defects is caused
by requirements gaps, e.g., unrecognized requirements that result in errors of omission by the program
designer. A common source of requirements gaps is non-functional requirements such as testability,
scalability, maintainability, usability, performance, and security.
Software faults occur through the following processes. A programmer makes an error (mistake), which
results in a defect (fault, bug) in the software source code. If this defect is executed, in certain situations
the system will produce wrong results, causing a failure. Not all defects will necessarily result in failures.
For example, defects in dead code will never result in failures. A defect can turn into a failure when the
environment is changed. Examples of these changes in environment include the software being run on a
new hardware platform, alterations in source data or interacting with different software. A single defect
may result in a wide range of failure symptoms.
6.1.3 COMPATIBILITY
A frequent cause of software failure is incompatibility with another application, a new operating system or, increasingly, a new web browser version. In the case of a lack of backward compatibility, this can occur because the programmers have only considered coding their programs for, or testing the software upon, "the latest version of" this-or-that operating system. The unintended consequence is that their latest work might not be fully compatible with earlier mixtures of software and hardware, or it might not be fully compatible with another important operating system. In any case, these differences, whatever they might be, may result in software failures, as witnessed by a significant population of computer users. Guarding against this could be considered a "prevention oriented strategy" that fits well with the latest testing phase suggested by Dave Gelperin and William C. Hetzel.
6.1.6 TYPES OF TESTING
Data Testing:
• Data Quality Testing: This involves checking your dataset for errors and inconsistencies in labels.
You can use statistical analysis or manual inspection to identify outliers or mislabeled data points.
• Data Distribution Testing: Analyze the distribution of your data (e.g., the number of blood cell images of each type) to identify potential biases or imbalances; a short counting sketch follows this list.
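As a concrete illustration of the data distribution check mentioned above, the sketch below counts how many images of each cell type are present in the training folder; the dataset path and the one-sub-folder-per-class layout are assumptions made only for this example.

# distribution_check.py - count images per class to spot imbalances
import os
from collections import Counter

DATASET_DIR = "dataset/TRAIN"   # hypothetical path: one sub-folder per cell type

counts = Counter()
for class_name in os.listdir(DATASET_DIR):
    class_dir = os.path.join(DATASET_DIR, class_name)
    if os.path.isdir(class_dir):
        counts[class_name] = len(os.listdir(class_dir))

total = sum(counts.values())
for class_name, n in counts.items():
    print(f"{class_name}: {n} images ({100 * n / total:.1f}%)")

If one class turns out to be strongly under-represented, resampling or class weighting can be considered before training.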
Model Testing:
• K-Fold Cross-Validation: This is a common technique to assess model performance. The data is split into k folds, and the model is trained on k-1 folds and tested on the remaining fold. This process is repeated k times, providing a robust estimate of the model's generalizability on unseen data (see the sketch after this list).
• Test-Train Split: Similar to cross-validation, you can split your data into a training set (used to build the model) and a separate test set (used to evaluate the model's performance on unseen data). This provides a more realistic assessment of how the model will perform in real-world scenarios.
• Sensitivity and Specificity: These metrics evaluate how well the model detects true positives (e.g., cells correctly assigned to their actual class) and avoids false positives (cells of other types incorrectly flagged as belonging to that class).
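The following sketch illustrates stratified k-fold cross-validation together with a sensitivity (recall) measurement on placeholder data; a simple scikit-learn classifier is used as a stand-in for the CNN, and the array sizes and k = 5 are assumptions for illustration only.

# kfold_sketch.py - stratified k-fold evaluation with per-class sensitivity
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

X = np.random.rand(400, 128)               # placeholder features
y = np.random.randint(0, 4, size=400)      # placeholder labels for 4 cell classes

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_recalls = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    # Macro-averaged recall is the average per-class sensitivity.
    fold_recalls.append(recall_score(y[test_idx], y_pred, average='macro'))

print("Mean sensitivity over folds:", np.mean(fold_recalls))

Averaging the metric over all folds gives a more stable estimate than a single train-test split.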
6.2 DESIGN OF TEST CASES AND SCENARIOS
• A good test case is one that has a high probability of finding an as-yet undiscovered error.
Test Levels: The test strategy describes the test levels to be performed. There are primarily three
levels of testing:
• Unit Testing
• Integration Testing
• System Testing
In most software development organizations, the developers are responsible for unit testing. Individual
testers or test teams are responsible for integration and system testing.
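As an example of the unit-testing level, the sketch below uses pytest to check the preprocess_image helper from app.py in isolation. It assumes the helper has been factored out into a small module (here called image_utils.py, a hypothetical name) so that importing it does not require the database or the trained model; the 224x224 input size is also an assumption.

# test_image_utils.py - unit-test sketch (pytest)
import numpy as np
from PIL import Image
from image_utils import preprocess_image   # hypothetical module holding the helper

def test_preprocess_image_shape_and_range(tmp_path):
    # Create a small dummy RGB image on disk to stand in for a blood smear.
    dummy = Image.fromarray(np.uint8(np.random.rand(64, 64, 3) * 255))
    img_path = tmp_path / "dummy.png"
    dummy.save(img_path)

    img_array = preprocess_image(str(img_path))

    # The model expects a single 224x224 RGB image with normalised pixel values.
    assert img_array.shape == (1, 224, 224, 3)
    assert 0.0 <= img_array.min() and img_array.max() <= 1.0

Running pytest on this file exercises the helper on its own, which is the essence of unit testing.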
6.3.3 White Box Testing
White Box Testing is a software testing technique in which the internal structure, design and coding of the software are tested to verify the flow of inputs and outputs and to improve design, usability and security. In white box testing the code is visible to the testers, so it is also called clear box testing, open box testing, transparent box testing, code-based testing or glass box testing. In this technique, test cases are generated from the logic of each module by drawing flow graphs of that module, and the logical decisions are tested in all cases.
6.3.5 System Testing
Testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions and
flows, emphasizing pre-driven process links and integration points.
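To illustrate system testing for this project, the sketch below drives the /predict route end to end through Flask's built-in test client. It assumes app.py is importable and that its trained model and database are available; the dummy JPEG created in memory is only a stand-in for a real blood smear image.

# test_system.py - system-level test sketch for the /predict route
import io
import numpy as np
from PIL import Image
from app import app   # the Flask application defined in app.py (assumed importable)

def test_predict_route_returns_results_page():
    client = app.test_client()
    # Build a small dummy RGB image in memory to stand in for a blood smear.
    buf = io.BytesIO()
    Image.fromarray(np.uint8(np.random.rand(64, 64, 3) * 255)).save(buf, format="JPEG")
    buf.seek(0)
    response = client.post(
        "/predict",
        data={"file": (buf, "sample.jpg")},
        content_type="multipart/form-data",
    )
    # The route should respond successfully and render the results page.
    assert response.status_code == 200

A passing run indicates that the upload handling, preprocessing, prediction and result rendering integrate correctly as one system.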
6.4 CONCLUSION
The testing and validation work is a crucial part of the project's development process. The procedures carried out ensure that the project meets the initial requirements and objectives and functions as intended. The testing process involved testing each module independently and then integrating the modules to test the functionality of the entire system; this ensured that each module performs its intended function and that the modules integrate smoothly.
The validation process involved evaluating the project's performance against the initial requirements and objectives, which enabled the development team to identify any areas that required improvement and to optimize the project accordingly. Overall, this chapter provides assurance that the project functions as intended, meets the initial requirements and is ready for deployment, and it serves as a roadmap for future development, helping developers identify areas for improvement so that the project continues to meet the changing needs of its users.
CHAPTER 7
CONCLUSION
7.1 CONCLUSION
Machine learning can be highly beneficial in the medical domain. In our project we have shown significant progress in the medical analysis of images by detecting, classifying and counting blood cells using machine learning approaches. We have accomplished accurate identification and categorization of different blood cell types, such as white blood cells and red blood cells, by applying innovative machine learning algorithms. Our method not only makes it possible to quickly and accurately identify abnormalities in blood samples, but it also makes cell counting more efficient, lessening the workload for medical personnel and increasing the effectiveness of diagnostics. By utilizing machine learning, we have opened the door for better hematological diagnoses, which will ultimately result in better patient outcomes and care.
In future work, we can expand our model to classify not only the type of white blood cell but also other types of blood cells, such as red blood cells and platelets, and we can develop a system that analyzes blood cell images in real time, allowing immediate feedback in medical settings. We can integrate our classification model with medical devices such as blood cell counters to automate the process of analyzing blood samples. We can also create a mobile application that allows users to capture images of blood samples with their smartphone cameras and receive instant analysis results, and we can implement a cloud-based solution where users upload blood cell images for analysis, making it accessible from anywhere and scalable for large-scale analysis.
CHAPTER 8
REFERENCES
[1] Yecai Guo and Mengyao Zhang, "Blood Cell Detection Method Based on Improved YOLOv5", National Natural Science Foundation of China (Volume 11, 2023, issued 6 July 2023).
LINK: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10174640
[2] Mohammad Mahmudul Alam and Mohammad Tariqul Islam, "A Machine Learning Approach of Automatic Identification and Counting of Blood Cells", The Institution of Engineering and Technology (issued 20 October 2018).
LINK: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/htl.2018.5098
[3] Shin-Jye Lee, Pei-Yun Chen and Jeng-Wei Lin, "Complete Blood Cell Detection and Counting Based on Deep Neural Networks", Institute of Management of Technology, National Yang Ming Chiao Tung University (Volume 12(16), issued 14 August 2022).
LINK: https://www.mdpi.com/2076-3417/12/16/8140
[4] LINK: https://chooser.crossref.org/?doi=10.18502%2Fijph.v48i4.1004
[5] S. Seth and K. Palodhi, "An efficient algorithm for segregation of white and red blood cells based on modified Hough transform", 2017 IEEE Calcutta Conference (CALCON), pp. 465-468, 2017.
LINK: https://ieeexplore.ieee.org/document/8280777
[6] R. Roy and S. Sasi, "Classification of WBC Using Deep Learning for Diagnosing Diseases", 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1634-1638, 2018.
LINK: https://ieeexplore.ieee.org/document/8473028