Pneumonia NEW1
INTRODUCTION
Pneumonia is a common and potentially serious respiratory infection that affects the lungs. It
can be caused by a variety of pathogens, including bacteria, viruses, fungi, and, less commonly,
parasites (Crosta, 2023). This condition can affect people of all ages, but it is more prevalent
in young children, the elderly, and individuals with weakened immune systems or pre-existing
health conditions (Jaul & Barron, 2017). Understanding pneumonia, its causes, symptoms,
diagnosis, treatment, and prevention is crucial for maintaining respiratory health.
Pneumonia is often caused by microorganisms that enter the lungs through inhalation or
aspiration of infected respiratory droplets (Normandin, 2023).
The most common pathogens responsible for bacterial pneumonia are Streptococcus
pneumoniae, Haemophilus influenzae, and Staphylococcus aureus (Dessie et al., 2021).
Influenza viruses, respiratory syncytial virus (RSV), and adenovirus, among others, typically
cause viral pneumonia (Al-Romaihi et al., 2020). Fungal pneumonia is often seen in individuals
with compromised immune systems, and organisms such as Pneumocystis jirovecii can cause
it (Tasaka, 2015). The symptoms of pneumonia can vary depending on the cause, age of the
individual, and overall health status. Common signs and symptoms include cough (often
producing phlegm or pus), fever, chills, difficulty breathing or shortness of breath, chest pain,
fatigue, and sometimes a bluish tint to the lips or nails due to inadequate oxygenation.
improve, to prevent the development of antibiotic-resistant bacteria. Pneumonia can lead to
various complications, particularly in vulnerable populations. These complications may
include pleural effusion (accumulation of fluid around the lungs), lung abscesses (pus-filled
cavities in the lungs), sepsis (a severe response to infection), and respiratory failure (Kuhajda
et al., 2015).
Preventing pneumonia involves several strategies, especially for those at higher risk.
Vaccination is a critical preventive measure. Vaccines against Streptococcus pneumoniae,
Haemophilus influenzae, influenza viruses, and others can significantly reduce the risk of
pneumonia and its complications (Kim et al., 2017). Good hygiene practices, such as frequent
hand washing and covering the mouth and nose when coughing or sneezing, can also help
prevent the spread of pathogens.
Additionally, lifestyle choices play a role in preventing pneumonia. Avoiding smoking and
limiting exposure to secondhand smoke can help keep the lungs healthy. Maintaining overall
good health through regular exercise, a balanced diet, and adequate rest supports a robust
immune system that can better fend off infections. Pneumonia is a severe respiratory infection
that affects millions of people worldwide, causing significant morbidity and mortality.
Timely and accurate diagnosis is crucial for successful treatment and patient outcomes.
Machine Learning (ML) has emerged as a promising tool in medical diagnostics, including
pneumonia detection. Pneumonia is a lung infection that can be caused by various pathogens,
including bacteria, viruses, and fungi. It inflames the air sacs in the lungs, leading to
symptoms like cough, fever, difficulty breathing, and chest pain. Pneumonia can be severe,
particularly in vulnerable populations such as the elderly, children, and individuals with
compromised immune systems. Early detection of pneumonia is crucial for timely intervention
and effective treatment.
Delayed diagnosis can lead to complications, including acute respiratory distress syndrome
(ARDS) and sepsis, which can be life-threatening. Therefore, accurate and swift detection is
vital in reducing the disease burden. Machine learning algorithms can be applied to medical
imaging data, such as chest X-ray and computed tomography (CT) scan datasets, to assist in the
detection of pneumonia (Zhang, 2021).
results in detecting pneumonia from medical imaging data (Sharma & Guleria, 2023).
However, there are still challenges to overcome, such as data bias and model interpretability.
Collaborative efforts, ethical considerations, and rigorous validation are critical to ensuring the
safe and effective integration of AI-based systems in clinical practice. As the field of machine
learning continues to evolve, we can anticipate further progress in pneumonia detection and
improved patient outcomes.
Pneumonia is a serious lung infection that can be life-threatening, especially in young children
and the elderly. Early diagnosis and treatment are essential for improving patient outcomes.
However, pneumonia can be difficult to diagnose, especially in its early stages. Machine
learning has the potential to revolutionize the early diagnosis of pneumonia. By analyzing chest
X-ray images, machine learning models could be trained to identify the telltale signs of
pneumonia, even when they are not visible to the naked eye. This could lead to earlier diagnosis
and treatment, which could save lives.
The aim of this study is to design and implement a pneumonia detection system using
machine learning. The specific objectives are to:
(i) develop a respiratory disease model using Support Vector Machines (SVM) and
Convolutional Neural Networks (CNN).
(ii) develop a pneumonia detection system using a Convolutional Neural Network and a Support
Vector Machine.
(iii) evaluate the developed system using recall, the confusion matrix, and the F1-score.
1.4 Research Methodology
The research aims to develop a pneumonia detection system using machine learning algorithms
and a dataset of medical imaging data, including chest X-rays and CT scans. The dataset will
be obtained from Kaggle and will be carefully selected to include diverse patient demographics.
Convolutional Neural Networks will be the primary algorithms for pneumonia detection due
to their success in medical image analysis. Other classical machine learning algorithms may be
used for comparison. Preprocessing steps, like image resizing and normalization, will be
applied to optimize model performance. The dataset will be divided into training, validation,
and testing sets, and evaluation metrics like accuracy, precision, recall, F1-score, and AUC-ROC
will be used. Cross-validation techniques will be employed to ensure robust results. The
final model will be deployed as a practical diagnostic tool, possibly as a web-based application
or integrated into hospital information systems, with user-friendly interfaces. Python, Jupyter
Notebook, HTML, CSS, JavaScript, and MySQL will be used for model development and
deployment. Extensive testing and validation will be performed to ensure reliability and safety.
The research aims to create an accurate, interpretable, and ethical pneumonia detection system
to improve patient outcomes and enhance pneumonia diagnosis efficiency in clinical practice.
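The preprocessing and data-splitting steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the study's pipeline: images are assumed to be already loaded as 2-D 8-bit grayscale arrays, nearest-neighbour resizing stands in for a library resizer (such as Pillow or OpenCV), and the 224x224 target size, the 70/15/15 split, and the helper names (`resize_nearest`, `normalize`, `train_val_test_split`, `k_fold_indices`) are hypothetical choices made for this example.

```python
import numpy as np


def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2-D grayscale image to size = (height, width)."""
    h, w = image.shape
    new_h, new_w = size
    # Map every output row/column back to the nearest source row/column.
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols[None, :]]


def normalize(image):
    """Scale 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return image.astype(np.float32) / 255.0


def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into disjoint train/validation/test sets."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    return idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index pairs for k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


if __name__ == "__main__":
    # Random arrays stand in for grayscale X-rays of different sizes (hypothetical data).
    rng = np.random.default_rng(42)
    images = [rng.integers(0, 256, size=(300 + 10 * i, 280)).astype(np.uint8) for i in range(10)]
    batch = np.stack([normalize(resize_nearest(img, (224, 224))) for img in images])
    train_idx, val_idx, test_idx = train_val_test_split(len(batch))
    print(batch.shape, len(train_idx), len(val_idx), len(test_idx))
```

In practice the split is usually stratified by label so that the normal/pneumonia ratio is similar in every subset; this sketch shuffles without stratification for brevity.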
The study is significant due to its potential to revolutionize pneumonia diagnosis and
management. By applying advanced machine learning algorithms to chest X-rays and CT
scans, the study aims to improve diagnostic accuracy, save time and costs, and make diagnosis
more accessible in various healthcare settings. Real-time diagnosis and personalized treatment
approaches can lead to better patient outcomes. Ethical considerations and fairness-aware
techniques are intended to promote unbiased and equitable diagnosis. The study contributes to the advancement
of AI in healthcare, fostering collaboration between researchers and healthcare professionals.
Overall, the study's impact extends to reducing the global burden of pneumonia and improving
patient care.
The scope of the study is to develop and evaluate a machine learning-based system for
pneumonia detection. It involves collecting diverse medical imaging data, exploring algorithms
like CNN, and addressing data bias and interpretability. The study aims to train and evaluate
the models using evaluation metrics, deploy the system for clinical use, and address ethical
concerns. Limitations include data access and the focus on pneumonia detection only.
The study's goal is to improve pneumonia diagnosis, support healthcare professionals, and
contribute to medical diagnostics.
Pneumonia: A common and potentially serious respiratory infection affecting the lungs caused
by various pathogens, including bacteria, viruses, and fungi.
Chest X-ray: An imaging technique that uses X-rays to visualize the internal structures of the
chest, including the lungs.
Computed Tomography (CT) scans: A medical imaging technique that uses X-rays to create
detailed cross-sectional images of the body.
Dataset: A collection of data used for training and evaluating machine learning models.
Convolutional Neural Networks (CNN): A type of deep learning model specifically designed
for image processing tasks, capable of automatically learning features from images.
Logistic Regression: A classical machine learning algorithm used for binary
classification.
Data Preprocessing: The process of preparing and cleaning the data to improve the model's
performance.
Evaluation Metrics: Quantitative measures used to assess the performance of the machine
learning model, such as accuracy, precision, recall, F1-score, and AUC-ROC.
Jupyter Notebook: An open-source web application that allows interactive computing and
data analysis using code and visualizations.
MySQL Database: A widely used relational database management system for storing
structured data.
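The evaluation metrics defined above all derive from the counts in a binary confusion matrix. The plain-Python sketch below computes them, assuming label 1 means pneumonia and 0 means normal; the function names are hypothetical, and libraries such as scikit-learn provide equivalent, well-tested functions. AUC-ROC is computed here through its rank interpretation, which needs predicted scores rather than hard labels.

```python
def confusion_matrix(y_true, y_pred):
    """Count (tn, fp, fn, tp) for binary labels, assuming 1 = pneumonia and 0 = normal."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score derived from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: share of pneumonia cases caught
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    print(confusion_matrix(y_true, y_pred))        # (3, 1, 1, 3)
    print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
```

For pneumonia screening, recall is usually watched closely, because a false negative (a missed case) is typically costlier than a false positive.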
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
In the world of respiratory infections, one ailment stands out as a serious and potentially
life-threatening adversary: pneumonia. This lung infection strikes people of all ages but is
particularly perilous for the young, the elderly, and those with weakened immune systems. As
we delve into the depths of pneumonia, we will explore its causes, symptoms, risk factors,
diagnosis, treatment, and prevention. At the root of pneumonia lie various infectious agents,
each with its own potential to wreak havoc on the respiratory system (Pragman et al., 2016).
Bacteria, viruses, fungi, and occasionally parasites can all be culprits. Streptococcus
pneumoniae takes the lead among bacterial perpetrators, but Haemophilus influenzae,
Staphylococcus aureus, and Mycoplasma pneumoniae can also lead to the condition (Chen et
al., 2023). On the viral front, influenza viruses, respiratory syncytial virus (RSV), and
adenoviruses pose significant threats (Zhang et al., 2020).
Fungal pneumonia, on the other hand, typically affects those with compromised immune
systems, such as individuals battling HIV/AIDS or undergoing chemotherapy (Kaur et al.,
2017). The onset of pneumonia is often characterized by a cascade of distressing symptoms
(Von Ranke et al., 2012). Coughs, sometimes accompanied by mucus, become relentless. Fever
and chills grip the body, causing discomfort and fatigue (Zambon, 2020). Shortness of breath
and rapid breathing add to the struggle, while chest pain intensifies with each breath or cough.
Sweating and clammy skin become constant companions, and in severe cases, a bluish tint may
appear on lips and nails, reflecting decreased oxygen levels (Kahn, 2023).
Certain individuals face higher odds of falling victim to pneumonia. Age becomes a defining
factor, as the very young and the elderly bear greater vulnerability. Additionally, individuals
with weakened immune systems, such as those with HIV, cancer, or under immunosuppressive
therapy, find themselves at a higher risk. Chronic lung conditions like asthma, COPD (chronic
obstructive pulmonary disease), or bronchiectasis also heighten the susceptibility (Athanazio,
2012). Smoking, both active and passive, further compromises the body's defense mechanisms.
Hospital-acquired pneumonia can be more severe and affect patients during their hospital stay,
adding another layer of risk. Aspiration pneumonia, caused by inhaling food, liquids, or foreign
objects into the lungs, is another perilous scenario (Košutova & Mikolka, 2021). Living in
crowded or polluted areas may also increase the risk of contracting pneumonia. When a
healthcare professional suspects pneumonia, a series of diagnostic steps comes into play. A
thorough physical examination and a detailed medical history assessment lay the groundwork.
Listening to the lungs with a stethoscope may reveal abnormal breath sounds, hinting at
pneumonia. To visualize any abnormalities, chest X-rays or CT scans come to the fore. Blood
tests and sputum cultures further contribute to identifying the infectious agent responsible for
pneumonia. Treatment for pneumonia hinges on the cause and severity of the infection.
Bacterial pneumonia typically responds to antibiotics, while viral pneumonia is managed mainly
with supportive care to relieve symptoms, sometimes combined with antiviral medications.
In cases of fungal pneumonia, antifungal medications take the reins (Vanreppelen et al., 2023).
In severe instances or when complications arise, hospitalization may be inevitable. Intravenous
antibiotics, oxygen therapy, and close monitoring provide the necessary support in such critical
situations. However, prevention is always the better course of action. Vaccination serves as a
shield against bacterial pathogens like Streptococcus pneumoniae and Haemophilus influenzae
(Alghamdi, 2021).
Additionally, influenza and other viruses that may lead to pneumonia have vaccines available.
Practicing good hand hygiene with regular hand-washing minimizes the risk of infectious
agents spreading. Avoiding smoking, be it active or passive, helps maintain healthy lungs and
a robust immune system. Leading a healthy lifestyle, including a balanced diet, regular
exercise, and managing chronic conditions, further boosts the body's defense mechanisms.
Lastly, minimizing contact with individuals afflicted with respiratory infections lowers the
chances of contracting pneumonia. Machine learning is a powerful tool that can be used to
improve the diagnosis, treatment, and prevention of pneumonia (Effah et al., 2022). In the
realm of diagnosis, machine learning algorithms can analyze medical images to detect
characteristic patterns associated with pneumonia. This can help radiologists make faster and
more accurate diagnoses.
Machine learning can also be used to predict the severity and prognosis of pneumonia. By
sifting through vast amounts of electronic health records and clinical data, predictive models
can identify patterns and markers that contribute to better patient care and resource allocation.
Machine learning can also be used to track pneumonia outbreaks. By harnessing data from
various sources, including social media, healthcare systems, and environmental factors,
machine learning models can spot trends indicative of potential outbreaks. This information
can be used by public health authorities to implement proactive measures to curtail the spread
of the disease. In the pursuit of treatment optimization, machine learning's data-driven
approach can unveil insights into the effectiveness of different treatment strategies for various
types of pneumonia and patient profiles. By analyzing patient outcomes on a large scale, the
models offer valuable guidance to healthcare professionals, facilitating informed clinical
decision-making. However, there are also challenges associated with applying machine
learning to healthcare. One challenge is safeguarding patient privacy. Another challenge is
ensuring that machine learning models are interpretable, so that healthcare professionals can
understand how they work. Despite these challenges, machine learning has the potential to
revolutionize the way pneumonia is diagnosed, treated, and prevented. By harnessing the power
of machine learning and addressing the associated challenges, we can improve outcomes for
pneumonia patients worldwide.
Pneumonia can be categorized into several types based on different factors such as the causative
agent, where it was acquired, or the affected population. Below are some common types of
pneumonia:
4. Aspiration Pneumonia: Aspiration pneumonia arises when foreign substances, such
as food, liquids, or vomit, are inhaled into the lungs, leading to infection. This type of
pneumonia often affects individuals with impaired swallowing reflexes or altered
consciousness, such as those who have had a stroke or consume excessive alcohol.
5. Atypical Pneumonia or Walking Pneumonia: Atypical pneumonia is caused by
certain bacteria like Mycoplasma pneumoniae and Chlamydophila pneumoniae. It is
often milder and may not present with the classic symptoms seen in typical bacterial
pneumonia. This form of pneumonia is sometimes referred to as "walking pneumonia"
because people can still function despite being infected.
6. Hospital-Acquired, Early-Onset Pneumonia and Late-Onset Pneumonia: In some
cases, hospital-acquired pneumonia is classified based on when it occurs after
admission. Early-onset pneumonia typically occurs within the first four days of
hospitalization, while late-onset pneumonia develops after the fourth day.
7. Viral Pneumonia: Viral pneumonia is caused by various viruses, including influenza,
respiratory syncytial virus (RSV), adenovirus, and others. It can affect both children
and adults and tends to be more common during flu seasons.
8. Bacterial Pneumonia: Bacterial pneumonia is caused by different bacteria, with
Streptococcus pneumoniae being the most common culprit. Other bacteria, such as
Haemophilus influenzae, Staphylococcus aureus, and Legionella pneumophila, can also
lead to bacterial pneumonia.
9. Fungal Pneumonia: Fungal pneumonia is caused by various fungi, and it is more
common in individuals with weakened immune systems, such as those with HIV/AIDS
or undergoing chemotherapy. Fungal pneumonia is less common but can be severe in
these vulnerable populations.
Pneumonia is an infection that affects the lungs, and its symptoms can vary depending on the
cause, the individual's age, and overall health. Below is a list of common symptoms of
pneumonia:
1. Cough: A persistent cough is one of the hallmark symptoms of pneumonia. The cough
may be dry or productive, producing phlegm or mucus.
2. Fever and Chills: Pneumonia often leads to a fever, with the body temperature rising
above the normal range. Chills may also accompany the fever as the body attempts to
regulate its temperature.
3. Shortness of Breath or Rapid Breathing: As the infection affects the lungs' ability to
function properly, individuals with pneumonia may experience difficulty breathing or
an increased respiratory rate, especially during physical activity.
4. Chest Pain: Pneumonia can cause chest pain, which may worsen with deep breathing,
coughing, or sneezing. The pain is typically sharp and localized to the affected area.
5. Fatigue and Weakness: The body's immune response to pneumonia can be draining,
leading to feelings of fatigue and weakness.
6. Sweating and Clammy Skin: Individuals with pneumonia may experience excessive
sweating and clammy skin due to the body's effort to fight the infection.
7. Bluish Tint to Lips and Nails: In severe cases of pneumonia, a bluish tint, known as
cyanosis, may appear on the lips and nails. This discoloration indicates a decrease in
oxygen levels in the blood.
8. Confusion (in elderly individuals): Elderly individuals with pneumonia may exhibit
confusion or changes in mental alertness, which can be a significant symptom in
addition to respiratory symptoms.
It is important to note that the severity of symptoms can vary, and some individuals may have
mild symptoms, while others experience more severe manifestations. Additionally, certain
individuals, such as young children, the elderly, and those with weakened immune systems,
may present with atypical or less specific symptoms, making diagnosis challenging in some cases.
1. Age: Both the very young and the elderly are at a higher risk of contracting pneumonia.
Children under the age of 5, especially those younger than 2 years old, have less
developed immune systems, making them more vulnerable. Similarly, the immune
system weakens with age, making adults aged 65 and older more prone to infections,
including pneumonia.
2. Weakened Immune System: Individuals with weakened immune systems are more
susceptible to infections, including pneumonia. People taking immunosuppressive
medications or undergoing treatments like chemotherapy are also at increased risk.
3. Chronic Lung Conditions: Chronic lung diseases such as asthma, chronic obstructive
pulmonary disease (COPD), bronchiectasis, and interstitial lung diseases can damage
the respiratory system, making it easier for infections to take hold.
4. Smoking: Smoking weakens the lungs and impairs the body's natural defense
mechanisms, making smokers more susceptible to respiratory infections, including
pneumonia.
5. Hospitalization: Pneumonia acquired during a hospital stay, known as hospital-
acquired pneumonia, is a significant concern, especially for patients on ventilators or
those with prolonged hospital stays.
6. Living Conditions: Crowded or densely populated living conditions, such as in nursing
homes or homeless shelters, can facilitate the transmission of respiratory infections like
pneumonia.
7. Environmental Factors: Exposure to environmental pollutants and irritants, such as
air pollution or certain occupational exposures, can weaken the respiratory system and
increase the risk of pneumonia.
8. Seasonal Factors: Certain pathogens that cause pneumonia, such as influenza viruses,
are more prevalent during specific seasons. Influenza-associated pneumonia is more
common during the flu season, which typically occurs in colder months.
Having one or more of these risk factors does not guarantee the development of pneumonia,
but it does increase the likelihood. Moreover, many cases of pneumonia can be prevented or
managed through vaccination, maintaining a healthy lifestyle, practicing good hand hygiene,
and seeking timely medical attention when symptoms arise. Identifying and addressing these
risk factors can play a crucial role in reducing the burden of pneumonia on public health.
1. Medical History and Physical Examination: The first step in diagnosing pneumonia
is a thorough medical history assessment and physical examination. The healthcare
professional will inquire about the patient's symptoms, such as cough, fever, shortness
of breath, chest pain, and fatigue, as well as any underlying health conditions that could weaken the
immune system. During the physical examination, the healthcare provider will listen to
the patient's lungs with a stethoscope to check for abnormal breath sounds, such as
crackles or wheezing, which could indicate pneumonia.
2. Chest X-ray: A chest X-ray is one of the most common imaging tests used to diagnose
pneumonia. It can reveal areas of inflammation and consolidation in the lungs, which
are indicative of infection.
3. Blood Tests: Blood tests can provide valuable information to support the diagnosis of
pneumonia. A complete blood count (CBC) can show an elevation in white blood cell
count, indicating an immune response to infection. Additionally, elevated C-reactive protein
(CRP) levels and an elevated erythrocyte sedimentation rate (ESR) may suggest an ongoing
inflammatory process.
4. Sputum Culture: If the patient is producing sputum (mucus coughed up from the
lungs), a sample may be collected and sent for a sputum culture. This test can help
identify the specific pathogen causing the pneumonia, whether it's a bacterium, virus,
or fungus. Determining the causative agent is crucial for guiding appropriate treatment,
especially in severe or complicated cases.
5. Arterial Blood Gas (ABG) Analysis: In severe cases of pneumonia, an arterial blood
gas analysis may be performed to assess the patient's oxygen and carbon dioxide levels.
This test helps determine the adequacy of respiratory function and guides decisions
regarding oxygen therapy and mechanical ventilation, if necessary.
2.1.5 Treatment Options for Pneumonia
The treatment options for pneumonia depend on the cause of the infection, the severity of the
illness, and the patient's overall health condition. Here are the main treatment options for
pneumonia:
These drugs can help reduce the severity and duration of the illness. However, antiviral
medications are most effective when started early in the course of the infection, so timely
diagnosis is crucial.
likely for individuals with underlying health conditions, the elderly, and those with
compromised immune systems.
2. Respiratory Support: Some severe cases of pneumonia can lead to acute respiratory
distress syndrome (ARDS), which may require advanced respiratory support, such as
mechanical ventilation, to help with breathing.
Preventive measures against pneumonia are essential to reduce the incidence of this potentially
serious respiratory infection. These measures focus on strengthening the immune system,
avoiding exposure to infectious agents, and minimizing risk factors. Here are some key
preventive measures against pneumonia:
6. Avoiding Aspiration: Aspiration pneumonia can occur when inhaling food, liquids, or
foreign objects into the lungs. To prevent this, individuals at risk should eat slowly,
take small bites, and avoid lying down immediately after eating.
Machine learning is a powerful tool that allows computers to refine algorithms as they process
more data. Self-driving cars are a good example: by feeding computers terabytes and petabytes
of data, machine learning enables them to learn and create their own algorithms, building on
pre-existing human-written programming, to achieve the desired results.
As Nvidia explains, the fundamental principle of machine learning involves using algorithms
to analyze data, learn from it, and then make predictions about real-world scenarios (Copeland,
2021).
Unlike traditional programming, where software routines are manually coded to execute
specific tasks, machine learning trains machines to learn how to perform tasks by ingesting
vast amounts of data and using sophisticated algorithms. McKinsey & Company defines
machine learning as an algorithm-based approach to learning from data without relying on pre-
defined rules (Pyle et al., 2019).
Pneumonia detection often employs machine learning, a branch of artificial
intelligence (AI). With machine learning, systems can automatically learn from their
experiences and improve their performance over time without the need for explicit
programming. Machine learning is focused on developing computer programs that can access
data and utilize it to learn on their own. The learning process begins with observations or data,
such as examples, first-hand experience, or instruction, and aims to find patterns in the data to
enhance future decisions. The primary objective is to enable computers to learn independently,
without human intervention, and adjust their behaviour accordingly.
Figure 2.1 How machine learning is a subset of AI (Towards Data Science, Seema Singh,
2018)
Machine learning is a subset of artificial intelligence that involves the development of computer
algorithms that can automatically learn and improve from data without explicit programming.
1. Image and speech recognition: Machine learning is widely used in image and speech
recognition applications, such as facial recognition, voice recognition, and image
classification (Sarker, 2021). These applications are used in security, healthcare,
entertainment, and many other fields.
2. Predictive modelling: Predictive modelling is used to predict future events based on
historical data. Machine learning algorithms are used in predictive modelling to identify
patterns in data and make predictions (Lawton et al., 2022). This application is used in
finance, marketing, healthcare, and many other industries.
3. Natural language processing: Natural language processing involves the development
of algorithms that can process and understand human language (Lutkevich & Burns,
2023). Machine learning is used in natural language processing to improve speech
recognition, machine translation, and sentiment analysis, among others.
4. Fraud detection: Machine learning is used in fraud detection applications to identify
fraudulent transactions, credit card fraud, and insurance fraud (Ali et al., 2022). These
applications are used in finance, insurance, and other industries to prevent financial
losses.
5. Recommendation systems: Recommendation systems use machine learning
algorithms to recommend products, services, and content to users based on their past
behavior and preferences. These systems are used in e-commerce, social media, and
entertainment applications (Dwivedi, n.d.).
6. Autonomous vehicles: Machine learning is used in autonomous vehicle applications
to enable self-driving cars and other vehicles. These applications use sensors and
cameras to collect data and machine learning algorithms to interpret the data and make
decisions.
7. Personalized medicine: Machine learning is used in personalized medicine
applications to develop personalized treatment plans based on a patient's medical
history, genetics, and lifestyle. These applications are used in healthcare to improve
patient outcomes and reduce healthcare costs.
Machine learning has numerous applications across various industries, and its use is rapidly
increasing due to its ability to analyze large amounts of data, identify patterns, and make
predictions. The above-listed applications of machine learning are just a few examples of how
it is being used to solve real-world problems and improve our lives.
Machine learning is a subset of artificial intelligence that utilizes statistical algorithms to enable
computers to learn from data and improve their performance without being explicitly
programmed. There are three main approaches to machine learning, each with different types
and examples:
2.4.1 Supervised Learning
Supervised learning is a type of machine learning approach that involves training an algorithm
on labelled data, where the input and output data are already known (Petersson, 2021). The
algorithm learns to identify patterns in the data and uses these patterns to make predictions on
new, unseen data. Supervised learning is suitable for problems that involve classification or
regression tasks.
(a) Classification: In classification, the output is a categorical variable. For example, email
classification as spam or not spam, or image classification as a dog or a cat. Examples of
supervised classification algorithms are logistic regression and decision trees.
(b) Regression: In regression, the output is a continuous variable. For example, predicting
house prices based on their features, or predicting a person's salary based on their age and
education. Examples of supervised regression algorithms are linear regression and polynomial
regression.
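The supervised-learning workflow described above (fit on labelled examples, then predict on new data) can be illustrated with logistic regression, one of the classification algorithms named. This is a from-scratch NumPy sketch using batch gradient descent on synthetic two-dimensional data; the learning rate, epoch count, data, and function names are assumptions for illustration, not settings or code from this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent on the average binary cross-entropy loss."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)          # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / n   # gradient of the loss w.r.t. the weights
        b -= lr * np.mean(p - y)        # gradient of the loss w.r.t. the bias
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a sample 1 when its predicted probability reaches the threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)


if __name__ == "__main__":
    # Synthetic labelled data: two Gaussian blobs (a stand-in, not X-ray features).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)), rng.normal(2.0, 0.5, size=(50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    w, b = fit_logistic_regression(X, y)
    print("training accuracy:", (predict(X, w, b) == y).mean())
```

A real evaluation would measure accuracy on held-out data rather than on the training set.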
(a) Clustering: In clustering, the goal is to group similar data points together based on their
characteristics. For example, clustering customers based on their purchasing habits or
clustering news articles based on their content. Examples of unsupervised clustering algorithms
are k-means, hierarchical clustering, and DBSCAN.
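To make the clustering idea concrete, here is a minimal NumPy sketch of k-means (Lloyd's algorithm), the first clustering algorithm listed. Random initialisation from the data points, the iteration cap, and the function name are assumptions for this example; library implementations such as scikit-learn's `KMeans` add smarter initialisation (k-means++) and multiple restarts.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it in place if it has none).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Two obvious groups of unlabelled points (hypothetical data).
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    labels, centroids = kmeans(X, k=2)
    print(labels, centroids)
```

No labels are used anywhere: the grouping emerges only from distances between points, which is what distinguishes this from the supervised examples.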
18
2.4.3 Reinforcement Learning
Markov Decision Process (MDP): In MDP, the agent interacts with the environment by taking
actions, and the environment responds with a reward signal. The agent's goal is to learn a policy
that maximizes the expected cumulative reward. Examples of reinforcement learning
algorithms using MDP are Q-learning, SARSA, and Deep Q-Networks (DQN).
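The MDP loop described above (act, receive a reward, update toward the expected cumulative reward) can be sketched with tabular Q-learning. The five-state corridor environment, its reward of 1 for reaching the goal, and the hyperparameters (learning rate `alpha`, discount `gamma`, exploration rate `epsilon`) are all hypothetical choices for illustration. DQN follows the same update idea but replaces the table with a neural network.

```python
import random

N_STATES = 5      # hypothetical corridor: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: return (next_state, reward, done); reward 1 only on reaching the goal."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore with probability epsilon (and whenever both actions look equally good).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value (no bootstrap at the goal).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    greedy_policy = [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]
    print(greedy_policy)  # greedy action in each non-terminal state
```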
Policy Gradient: In policy gradient methods, the agent directly learns a parameterized policy, that is, a probability distribution over actions, by adjusting its parameters in the direction that increases the expected cumulative reward. This approach is suitable for problems with continuous action spaces, such as robot control or game playing. Examples of reinforcement learning algorithms using policy gradient are REINFORCE, Actor-Critic, and Proximal Policy Optimization (PPO).
In summary, machine learning has three main approaches, each with different types and
examples. Supervised learning is suitable for classification and regression tasks, while
unsupervised learning is suitable for clustering and association tasks. Reinforcement learning
is suitable for problems that require decision making in an uncertain environment.
In the context of pneumonia detection using machine learning, four commonly used algorithms
are Support Vector Machine (SVM), Logistic Regression, Random Forest, and Convolutional
Neural Networks (CNN). Each algorithm offers unique strengths and capabilities in the realm
of medical image analysis and classification tasks.
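Random Forest is an ensemble learning method that combines multiple decision trees to improve performance and generalization. During training, it builds many decision trees, each on a random bootstrap sample of the data and considering only a random subset of the features when splitting. For classification, the final decision is the majority vote of the trees; for regression, their predictions are averaged.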
Figure 2.3 Example of a Random Forest model (Johnson, 2020)
Advantages:
Challenges:
1. Complexity: A Random Forest combines many trees, which makes it harder to interpret than a single decision tree.
2. Computationally intensive: Building and training multiple decision trees can be
computationally expensive, especially with large datasets.
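The bagging-and-voting idea behind Random Forest can be sketched in NumPy as below. To stay short, this simplified version assumes depth-1 trees (decision stumps) rather than full decision trees, so it illustrates the ensemble mechanism (bootstrap samples, random feature subsets, majority vote), not a complete Random Forest. All function names, parameters, and toy data are hypothetical.

```python
import numpy as np

def fit_stump(X, y, feature_ids):
    """Best single-threshold split (a depth-1 decision tree) over the given features."""
    best = None
    for f in feature_ids:
        values = np.unique(X[:, f])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2:
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            # Each side predicts its majority class; count the misclassified samples.
            left_label = np.bincount(left).argmax()
            right_label = np.bincount(right).argmax()
            errors = np.sum(left != left_label) + np.sum(right != right_label)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    if best is None:
        # No split possible (all values identical): always predict the majority class.
        label = np.bincount(y).argmax()
        return (feature_ids[0], np.inf, label, label)
    return best[1:]

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = max(1, int(np.sqrt(n_features)))
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)          # sample with replacement
        feats = rng.choice(n_features, size=n_sub, replace=False)  # random feature subset
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Majority vote of all stumps for each row of X."""
    votes = np.array([np.where(X[:, f] <= t, left, right) for f, t, left, right in forest])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Hypothetical two-class data with two informative features.
X = np.array([[1, 1], [2, 2], [3, 1], [8, 9], [9, 8], [7, 9]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
forest = fit_forest(X, y)
print(predict_forest(forest, np.array([[2.0, 1.5], [8.0, 8.5]])))
```

The loop over trees also shows where the computational cost noted above comes from: training time grows with the number of trees.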
A Convolutional Neural Network (CNN) is a type of deep learning architecture that is primarily used for
image recognition and computer vision tasks. It is designed to automatically and adaptively
learn spatial hierarchies of features from input images, allowing it to identify patterns, objects,
and structures within the images. The key components of a CNN are convolutional layers,
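pooling layers, and fully connected layers. A brief overview of each follows.
Convolutional Layer: The convolutional layer is the core building block of a CNN. It consists of a set of learnable filters (also called kernels) that slide over the input image. Each filter performs a convolution operation, which involves element-wise multiplication of the filter with a local region of the input image, followed by summation. The result is a feature map that highlights certain patterns or features found in the input image. The convolution operation in 2D can be written as follows:
$$F(i,j) = (I * K)(i,j) = \sum_{m}\sum_{n} I(i+m,\, j+n)\, K(m,n)$$
Where:
- \( F(i,j) \) is the value of the output feature map at position \( (i,j) \).
- \( I \) is the input image and \( K \) is the filter/kernel.
- \( m \) and \( n \) index the rows and columns of the kernel.
- \( (i+m, j+n) \) represents the position within the input image where the filter/kernel overlaps.
Strictly speaking, this form (which does not flip the kernel) is cross-correlation. CNN libraries compute it under the name convolution; a true convolution would use \( I(i-m, j-n) \) instead.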
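The formula above can be checked with a direct NumPy implementation. This is a minimal sketch assuming stride 1 and no padding ("valid" output size); the function name `conv2d_valid` and the edge-detecting kernel values are hypothetical illustrations.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Compute F(i, j) = sum_m sum_n I(i+m, j+n) K(m, n), stride 1, no padding.

    Like CNN layers, this does not flip the kernel (cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the overlapping patch, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 4x4 image that is dark on the left and bright on the right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
# A simple vertical-edge filter (hypothetical example values).
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d_valid(image, kernel))
```

Each row of the 3x3 output is [0, 2, 0]: the feature map responds only where the kernel straddles the dark-to-bright boundary, which is the "highlighting" of a pattern described above.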
Pooling Layer: The pooling layer reduces the spatial dimensions of the feature maps produced by the convolutional layers. This lowers the computational cost and makes the network more robust to small variations in the input. The most common type of pooling is max-pooling, which keeps only the maximum value from each local region of the feature map, retaining the strongest response in that region.
Fully Connected Layer: After several convolutional and pooling layers, the final feature maps are flattened into a 1D vector and passed through one or more fully connected layers. These layers are similar to those in a traditional neural network, connecting every neuron in the previous layer to every neuron in the current layer. They help in learning complex non-linear relationships between the extracted features and the output classes. The computation in a fully connected layer is a matrix multiplication followed by the addition of a bias and an activation function:
$$\mathbf{y} = f(W\mathbf{x} + \mathbf{b})$$
where \( \mathbf{x} \) is the flattened input vector, \( W \) is the weight matrix, \( \mathbf{b} \) is the bias vector, and \( f \) is a non-linear activation function such as ReLU.
Figure 2.4 How CNN algorithm works (Kalita, 2022)
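The pooling and fully connected steps can be sketched in NumPy as follows. This assumes non-overlapping 2x2 max-pooling and a ReLU activation; the weights, biases, and feature-map values are hypothetical placeholders, since in a real CNN they are learned during training.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling (stride equal to the window size)."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    trimmed = feature_map[:h, :w]
    # Reshape into (h/size, size, w/size, size) blocks and take each block's maximum.
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def fully_connected(x, W, b):
    """y = ReLU(W x + b) applied to a flattened input vector."""
    return np.maximum(0.0, W @ x + b)

fmap = np.array([[1, 3, 2, 0],
                 [4, 2, 1, 5],
                 [0, 1, 7, 2],
                 [3, 2, 4, 6]], dtype=float)
pooled = max_pool2d(fmap)   # 4x4 feature map -> 2x2
flat = pooled.flatten()     # 1-D vector of length 4
W = np.full((2, 4), 0.1)    # hypothetical weights for 2 output neurons
b = np.array([0.0, -2.0])   # hypothetical biases
print(pooled)
print(fully_connected(flat, W, b))
```

Pooling keeps the largest value of each 2x2 block ([[4, 5], [3, 7]]), quartering the number of values, and the fully connected layer then maps the flattened vector to the output neurons.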
The general architecture of machine learning involves several key components and stages that
collectively enable the learning and predictive capabilities of a model. The process of building
and training a machine learning model typically follows these fundamental steps:
Data Collection:
The first step in the machine learning process is to gather relevant data for the problem at hand.
Data can come from various sources, such as databases, APIs, sensors, or online repositories.
The quality and size of the dataset significantly impact the performance and generalization
ability of the model.
2.6.1 Data Preprocessing
Raw data often requires preprocessing to make it suitable for training a machine learning
model. This stage involves data cleaning, which includes handling missing values, removing
outliers, and normalizing or scaling features to ensure consistency and comparability.
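The three cleaning steps named above can be sketched for a single numeric feature as follows. The choice of median imputation, the 1.5 × IQR outlier rule, and z-score standardization are common conventions assumed here for illustration; the function name and sample values are hypothetical.

```python
import numpy as np

def preprocess(column):
    """Clean one numeric feature column (NaN marks a missing value)."""
    x = np.asarray(column, dtype=float)
    # 1. Handle missing values: replace NaN with the median of the observed values
    #    (the median is less affected by outliers than the mean).
    x = np.where(np.isnan(x), np.nanmedian(x), x)
    # 2. Remove outliers using the 1.5 * IQR rule.
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    keep = (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)
    x = x[keep]
    # 3. Scale to zero mean and unit variance (standardization).
    #    Note: a constant column would make x.std() zero and needs special handling.
    return (x - x.mean()) / x.std()

raw = [10, 12, np.nan, 11, 13, 100]   # one missing value and one extreme value
print(preprocess(raw))
```

In a dataset with several features, the outlier mask would normally be used to drop whole rows so that the features stay aligned.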
Feature Engineering:
Model Selection:
Model Training:
In this phase, the selected model is fed with the preprocessed data to learn from it. During
training, the model optimizes its internal parameters based on the input data and a defined
objective (e.g., minimizing error or maximizing accuracy). The learning process typically
involves an optimization algorithm that adjusts the model's parameters to minimize the
difference between the predicted outputs and the actual targets in the training data.
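A minimal example of this optimization loop is gradient descent on the mean squared error of a linear model, sketched below in NumPy. The model, learning rate `lr`, number of `epochs`, and toy data are assumptions for illustration; the CNN and SVM in this study use their own objectives and optimizers.

```python
import numpy as np

def train_linear_model(X, y, lr=0.1, epochs=500):
    """Fit y ≈ X w + b by gradient descent on the mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y  # predicted minus actual targets
        # Gradients of MSE = mean(error**2) with respect to w and b.
        grad_w = 2.0 / n_samples * (X.T @ error)
        grad_b = 2.0 / n_samples * error.sum()
        # Step against the gradient to reduce the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0  # targets generated by y = 2x + 1
w, b = train_linear_model(X, y)
print(w, b)
```

The learned parameters approach w = 2 and b = 1, the values that generated the targets, because each step reduces the difference between predictions and targets.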
Model Evaluation:
Once the model is trained, it needs to be evaluated using a separate set of data, called the
validation or test set. This evaluation ensures that the model can generalize well to new, unseen
data. Various metrics, such as accuracy, precision, recall, and F1 score, are used to assess the
model's performance.
Hyperparameter Tuning:
Most machine learning algorithms have hyperparameters that govern their behavior but are not
learned during training. Hyperparameter tuning involves selecting the best combination of
hyperparameters to optimize the model's performance. Techniques like grid search, random
search, or Bayesian optimization are commonly used for this purpose.
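Grid search can be sketched in a few lines: try every combination of candidate hyperparameter values, train on the training set, and keep the combination with the best validation score. In this hypothetical example the model is closed-form ridge regression on polynomial features, and the two hyperparameters are the polynomial `degree` and the regularization strength `lam`; the model, grid, and data are assumptions, not the study's setup.

```python
import itertools
import numpy as np

def poly_features(x, degree):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Closed-form ridge regression (for simplicity the intercept is penalized too)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def grid_search(x_train, y_train, x_val, y_val, param_grid):
    """Try every combination in param_grid; keep the one with the lowest validation MSE."""
    keys = list(param_grid)
    best_params, best_mse = None, float("inf")
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        w = fit_ridge(poly_features(x_train, params["degree"]), y_train, params["lam"])
        pred = poly_features(x_val, params["degree"]) @ w
        mse = float(np.mean((pred - y_val) ** 2))
        if mse < best_mse:
            best_params, best_mse = params, mse
    return best_params, best_mse

x_train = np.linspace(-1, 1, 20)
x_val = np.linspace(-0.9, 0.9, 7)
y_train, y_val = x_train ** 2, x_val ** 2  # toy target: y = x^2
grid = {"degree": [1, 2, 3], "lam": [0.0, 0.1, 1.0]}
best_params, best_mse = grid_search(x_train, y_train, x_val, y_val, grid)
print(best_params, best_mse)
```

Random search would replace the exhaustive `itertools.product` loop with a fixed number of randomly sampled combinations, which scales better when there are many hyperparameters.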
Model Deployment:
Machine learning models require continuous monitoring to ensure they perform as expected in
the production environment. Monitoring involves tracking model performance, detecting drift
(changes in data distribution), and retraining the model periodically with new data to maintain
its accuracy over time.
The iterative nature of machine learning involves going back to previous stages, such as data
collection, preprocessing, and feature engineering, to improve the model's performance
continually. This cyclical process is often referred to as the "machine learning pipeline" and
forms the foundation for solving a wide range of problems across various domains.
Pneumonia is a serious health condition that can be difficult to diagnose. In recent years, there has been growing interest in using machine learning algorithms to improve the detection of pneumonia. These algorithms can analyze patient symptom data, such as persistent cough, chest pain, and fever, to identify individuals who are at higher risk of having pneumonia. Researchers have studied pneumonia detection using a variety of datasets, which vary in quality and features. By carefully selecting and extracting relevant information from these datasets, authors have been able to draw meaningful conclusions from their research.
The sensors measure the local circumference changes of the chest and abdominal walls
simultaneously, and the data is wirelessly transmitted to a laptop. Three different random forest
classifiers were used to process the data, and the results showed that the individual and
weighted-adaptive classifiers were able to classify postures with an accuracy of up to 98.9%
and 98.8%, respectively. The study demonstrates that the accurate monitoring of respiratory
behaviors can be used to track the progression of respiratory disorders and diseases, and can
provide timely and objective approaches for control.
Vatanparvar et al. (2020) developed "CoughMatch Subject Verification Using Cough for Personal Passive Health Monitoring." In this paper, the authors presented a method that uses a limited number of cough samples to create a personal cough model for the primary subject. This model is then employed by an automatic cough detection system to verify whether the identified coughs match the personal pattern and belong to the primary subject. Zhang et al. (2020) developed "Detecting asthma exacerbations using daily home monitoring and machine learning." In this study, the authors aimed to develop a machine learning algorithm that could accurately detect severe asthma exacerbations using easily available daily monitoring data.
Khasha et al. (2019) proposed an ensemble learning method for asthma control level detection,
titled "An ensemble learning method for asthma control level detection with leveraging medical
knowledge-based classifier and supervised learning." The study highlights the significance of
asthma, a disease affecting approximately 300 million individuals worldwide and leading to an
estimated 250,000 deaths. Without proper treatment, asthma can become a serious public health
concern.
Pramono et al. (2019) developed "Automatic Cough Detection in Acoustic Signal using
Spectral Features," a study that presents an algorithm for automatically detecting cough events
from acoustic signals. The algorithm utilizes only three spectral features in conjunction with a
logistic regression model to classify sound segments into cough and non-cough events. These
spectral features are derived through simple calculations from two specific frequency bands
within the sound spectrum, which were selected based on their distinctive characteristics.
Infante et al. (2017) developed "Classification of Voluntary Coughs Applied to the Screening
of Respiratory Disease." In this study, the authors investigated the potential of analyzing
voluntary cough sounds for screening pulmonary diseases. They recorded voluntary coughs
using a custom mobile phone stethoscope from a total of 54 patients, including 7 with COPD,
15 with asthma, 11 with allergic rhinitis, 17 with both asthma and allergic rhinitis, and 4
with both COPD and allergic rhinitis. Additionally, data were collected from 33 healthy
subjects for comparison.
Van Vliet et al. (2017) proposed "Can exhaled volatile organic compounds predict asthma
exacerbations in children?" The objectives of the study were twofold: (i) to identify a set of
exhaled volatile organic compounds (VOCs) that could serve as predictors for asthma
exacerbations in children, and (ii) to determine the chemical identity of these predictive
biomarkers. The researchers conducted a one-year prospective observational study involving
96 asthmatic children. At two-month intervals during clinical visits, various parameters were
assessed, including asthma control, fractional exhaled nitric oxide levels, lung function
measurements (FEV1, FEV1/VC), and VOCs in exhaled breath using gas chromatography
time-of-flight mass spectrometry. Random Forest classification modeling was employed to
select the most predictive VOCs, and receiver operating characteristic (ROC) curves were
plotted.
Table 2.1: Comparison of different techniques on pneumonia
S/N | Author(s) | Technique(s) | Limitation | Accuracy
3 | Tsang et al. (2020) | DT, LR, and SVM | Small sample size | 72.5%
4 | Khasha et al. (2019) | Ensemble learning, LR, SVM, random forest, KNN, and DT | Small dataset | 92.7%
Over time, a variety of methods have been developed to diagnose pneumonia. These methods differ in their accuracy and usefulness. The different techniques used to diagnose pneumonia are compared in Table 2.1.
Previous research has used a variety of machine learning algorithms to create respiratory disease prediction models. However, these models have limitations such as limited datasets, overfitting, and unrealistic sizes. The models of Chen et al. (2019), for example, suffer from limited datasets, overfitting introduced during pre-processing, and predefined sizes that are not relevant in the real world. Together, these restrictions make model performance inefficient or inaccurate. Zhang et al. (2020) tackled these issues by integrating a real-time dataset and using a large dataset. They also adopted minimal pre-processing and feature extraction techniques to mitigate overfitting and underfitting during model development.
CHAPTER THREE
METHODOLOGY
This section explores the proposed system and the creation of a model intended to tackle the problem identified in the preceding section. The goal is to predict instances of pneumonia in potential patients using a machine learning framework. The section presents the sequence of stages that will be undertaken, beginning with data gathering, which covers the parameters and target variables of the pneumonia dataset. Next, a pre-processing procedure will handle any missing information, followed by feature extraction to streamline the data and eliminate insignificant and redundant attributes, with the aim of improving prediction accuracy. Within this study, a machine learning model will be formulated to predict pneumonia occurrences in potential patients. The model will use Support Vector Machines (SVM) and Convolutional Neural Networks (CNN) as classifiers for training, together with a combination of SVM and Random Forest. The model's effectiveness will then be assessed using a range of performance metrics, including accuracy, recall, precision, and the confusion matrix.
Machine learning and artificial intelligence (AI) are transforming pneumonia detection by learning pneumonia-related patterns from large chest X-ray datasets. This can improve accuracy, including in the detection of mild cases of pneumonia. Machine learning also automates X-ray analysis, which reduces the scope for human error. Together, these advances could make pneumonia detection faster, more accurate, and less costly, ultimately benefiting patient outcomes. The proposed detection system, depicted in Figure 1.7, is designed to tackle the challenge of pneumonia detection. The emphasis lies in creating a comprehensive and consistent system through the effective integration of machine learning methods. The primary goal of this research is to improve the reliability of pneumonia detection in patients.
Figure 3.1 Pneumonia Detection Block Diagram
This project employed the dataset of Chest X-Ray Images (Pneumonia), sourced from Kaggle
in CSV format. The data is categorized into three directories (train, test, val), further subdivided
into subfolders corresponding to each image type (Pneumonia/Normal). The dataset
encompasses a total of 5,000 X-Ray images in JPEG format, spanning across two categories
(Pneumonia/Normal). These X-ray images, captured in the anterior-posterior view, were
carefully chosen from historical collections of pediatric patients aged one to five years at the
Guangzhou Women and Children’s Medical Center in Guangzhou.
Image resizing changes the spatial dimensions (width and height in pixels) of an image. It is needed because a CNN expects every input image to have the same fixed size, while raw X-ray images can differ in resolution. Resizing can shrink large images, which also reduces the computation needed for training, or enlarge small ones so that all images match the model's input size.
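A minimal sketch of resizing with nearest-neighbour sampling in NumPy is shown below: each output pixel copies the source pixel it maps back to. The function name and the 2x2 target size are illustrative assumptions; the target size used in this study is not stated here, and image libraries usually also offer smoother methods such as bilinear interpolation.

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Resize a 2-D grayscale image with nearest-neighbour sampling."""
    h, w = image.shape
    # For every output row/column, find the source row/column it maps back to.
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return image[rows[:, None], cols[None, :]]

img = np.arange(16).reshape(4, 4)
small = resize_nearest(img, 2, 2)  # downsample 4x4 -> 2x2
large = resize_nearest(img, 8, 8)  # upsample 4x4 -> 8x8
print(small)
```

Downsampling keeps every second row and column here ([[0, 2], [8, 10]]), while upsampling repeats each source pixel.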
Data augmentation adds variety to the training dataset to help the machine learning model generalize better. It creates new versions of existing images by applying small changes, such as flipping them horizontally, rotating them slightly, or changing their brightness. For example, a model trained only on dog pictures taken in one position or lighting condition may fail to recognize dogs in other conditions; augmenting the pictures exposes the model to those variations. By giving the model more examples to learn from, data augmentation can improve its performance.
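The three augmentations mentioned above can be sketched in NumPy for a grayscale image scaled to [0, 1]. The maximum rotation angle and brightness shift are hypothetical parameters, the rotation uses simple nearest-neighbour sampling (uncovered corners become 0), and the random array stands in for a real X-ray.

```python
import numpy as np

def rotate_nearest(image, angle_deg):
    """Rotate a 2-D image about its centre (nearest-neighbour; uncovered pixels set to 0)."""
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    # Map each output pixel back to its source position (inverse rotation).
    src_x = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    src_y = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    src_x = np.rint(src_x).astype(int)
    src_y = np.rint(src_y).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out

def augment(image, rng, max_angle=10.0, max_brightness=0.1):
    """Random horizontal flip, small rotation, and brightness shift of an image in [0, 1]."""
    out = np.fliplr(image) if rng.random() < 0.5 else image.copy()
    out = rotate_nearest(out, rng.uniform(-max_angle, max_angle))
    # Shift brightness and clip so pixel values stay in [0, 1].
    return np.clip(out + rng.uniform(-max_brightness, max_brightness), 0.0, 1.0)

rng = np.random.default_rng(0)
xray = rng.random((64, 64))  # stand-in for a normalised grayscale X-ray
augmented = [augment(xray, rng) for _ in range(5)]
```

Each call produces a slightly different copy of the same image, so one original can contribute several training examples.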
Data normalization puts all data features on a similar scale, which prevents features with large numeric values from dominating model training. For example, consider data on people's ages and salaries: ages fall in the tens (for example, the 20s and 30s), while salaries are in the thousands or tens of thousands. Without normalization, the model might treat salary as more important simply because its values are larger. Normalization fixes this by rescaling all features to a similar range, which helps the model make better predictions.
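The age and salary example can be worked through with min-max normalization, one common scheme that rescales each feature to [0, 1]. The function name and the three data rows are hypothetical.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid dividing by zero for constant columns
    return (X - col_min) / col_range

# Columns: age (years), salary (hypothetical currency units).
people = [[25, 30_000],
          [35, 90_000],
          [30, 60_000]]
print(min_max_normalize(people))
```

After normalization both columns become [0, 1, 0.5], so neither feature dominates merely because of its units. For 8-bit images, a common simplification of the same idea is to divide pixel values by 255.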
Feature extraction transforms raw data into a more compact and meaningful representation that captures the information relevant to the machine learning task. In image data, features can be extracted using techniques such as edge detection, color histograms, or deep learning-based feature extraction with pretrained convolutional neural networks (CNN).
Feature Selection
Feature selection involves choosing a subset of the most relevant features from the extracted
set to improve model performance, reduce overfitting, and enhance interpretability. Feature
selection reduces model complexity, improves training speed, and enhances model
generalization by focusing on the most informative features.
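One simple family of feature selection methods is filter-based selection, which scores each feature independently and keeps the top k. The sketch below assumes the absolute Pearson correlation with the label as the score; this is one illustrative choice, not necessarily the method used in this study, and the function name and data are hypothetical.

```python
import numpy as np

def select_top_k_by_correlation(X, y, k):
    """Return indices of the k features most correlated (in absolute value) with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features get a score of 0
    scores = np.abs(Xc.T @ yc) / denom
    # Sort scores from highest to lowest and keep the first k feature indices.
    return np.argsort(scores)[::-1][:k]

# Feature 0 tracks the label, feature 1 is constant, feature 2 is weakly related.
X = [[0.1, 5, 3], [0.0, 5, 1], [0.9, 5, 2],
     [1.0, 5, 1], [0.2, 5, 3], [0.8, 5, 2]]
y = [0, 0, 1, 1, 0, 1]
print(select_top_k_by_correlation(X, y, k=2))
```

The constant feature carries no information and is dropped. Other approaches, such as recursive feature elimination or L1-regularized models, instead judge features by their effect on a trained model.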
This study aims to design a system that helps predict pneumonia. To this end, two machine learning algorithms are used: Convolutional Neural Networks (CNN) and Support Vector Machine (SVM). A total of five thousand images were gathered during the research, and four hundred and twenty images were used for training. The training images were divided into three categories: Pneumonia normal, Pneumonia bacteria, and Pneumonia virus. As these images are uploaded into the system, it extracts features from them for training, after which the system has learned to predict the class of a new input image of pneumonia.
A support vector machine (SVM) is a supervised machine learning algorithm used to solve complex classification, regression, and outlier detection problems by finding boundaries between data points based on predefined classes, labels, or outputs. SVMs are widely adopted across disciplines such as healthcare, natural language processing, signal processing, and speech and image recognition. Mathematically, an SVM can use kernel methods to transform data features by employing kernel functions. Kernel functions implicitly map complex datasets to higher dimensions in a way that makes data points easier to separate, which simplifies the decision boundaries for non-linear problems. The mathematical formulation of SVM focuses on finding the optimal hyperplane that maximizes the margin between classes in a high-dimensional space. Here's a breakdown of the key components:
Data Representation:
Each data point x_i is represented as a vector in an n-dimensional space, where n is the
number of features.
The class label of each data point is denoted by y_i, where y_i can be +1 or -1 for binary
classification problems (other conventions exist for multi-class problems).
Hyperplane Equation:
$$\mathbf{w}^\top \mathbf{x} + b = 0$$
Here, w is a weight vector with the same dimension as the data points (n-dimensional),
and b is the bias term that determines the position of the hyperplane relative to the
origin.
Margin:
The margin is defined as the distance between the hyperplane and the closest data points
from each class, known as support vectors.
We want to maximize this margin to create a clear separation between classes.
To find the optimal hyperplane, we typically minimize a cost function that penalizes
instances where data points fall on the wrong side of the margin or too close to the
hyperplane.
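A common cost function used in SVM is the hinge loss, which for a data point \( (\mathbf{x}_i, y_i) \) is:
$$L_i = \max\left(0,\; 1 - y_i(\mathbf{w}^\top \mathbf{x}_i + b)\right)$$
The loss is zero for points on the correct side of the margin and grows linearly as a point moves into the margin or onto the wrong side of the hyperplane.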
When data is not linearly separable in the original feature space, SVMs employ the kernel trick.
This trick involves implicitly mapping the data points to a higher-dimensional space
where they become linearly separable.
A kernel function operates on the original data points and computes their inner product
in the higher-dimensional space without explicitly performing the mapping.
Common kernel functions include linear, polynomial, and Gaussian (RBF).
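The kernel trick can be verified numerically. For 2-D points, the degree-2 polynomial kernel k(x, z) = (x · z)^2 equals the inner product of the explicit feature map φ(x) = (x1², √2·x1·x2, x2²), yet it is computed entirely in the original 2-D space. The sketch below also defines the Gaussian (RBF) kernel; the function names, the RBF `gamma` value, and the sample points are illustrative assumptions.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D point: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel k(x, z) = (x . z)^2, computed in the original space."""
    return float(np.dot(x, z)) ** 2

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Both values equal 1 (up to rounding): the kernel gives the 3-D inner product
# without ever constructing phi(x) and phi(z).
print(poly2_kernel(x, z), np.dot(phi(x), phi(z)))
print(rbf_kernel(x, z))
```

The RBF kernel corresponds to an infinite-dimensional feature map, so for it the implicit computation is the only practical option.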
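Once the optimal hyperplane is determined, a new data point x is classified by evaluating the sign of the decision function:
$$f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$$
The point is assigned to class +1 if \( f(\mathbf{x}) > 0 \) and to class -1 if \( f(\mathbf{x}) < 0 \).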
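The pieces above (hinge loss, margin, and the sign of the decision function) can be combined into a minimal linear SVM. This sketch minimizes the soft-margin objective lam/2·||w||² + mean hinge loss by full-batch subgradient descent; the regularization `lam`, learning rate `lr`, number of `epochs`, and toy data are assumptions for illustration, and library implementations typically use more specialized solvers.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Soft-margin linear SVM trained by full-batch subgradient descent.

    Minimises  lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w . x_i + b)))
    with labels y_i in {-1, +1}.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        violated = margins < 1  # points inside the margin or misclassified
        # Subgradient of the hinge loss is -y_i x_i for violated points, 0 otherwise.
        grad_w = lam * w - (y[violated, None] * X[violated]).sum(axis=0) / n_samples
        grad_b = -y[violated].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify by the sign of f(x) = w . x + b (points exactly on the hyperplane get +1)."""
    return np.where(X @ w + b >= 0, 1, -1)

X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
              [-2.0, -2.0], [-3.0, -1.0], [-2.5, -3.0]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(predict(X, w, b))
```

Only points with margin below 1 contribute to the hinge-loss gradient, which is why the final hyperplane is determined by the points closest to it (the support vectors).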
This mathematical framework provides a foundation for understanding how SVMs create decision boundaries and perform classification tasks. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional feature space that separates the data points of different classes, with the margin between the closest points of the different classes made as large as possible. The dimension of the hyperplane depends on the number of features. If there are two input features, the hyperplane is just a line. If there are three input features, the hyperplane becomes a 2-D plane. It becomes difficult to visualize when the number of features exceeds three.
Convolutional Neural Networks (CNN) are a class of deep neural networks most commonly applied to analyzing visual imagery. They have revolutionized the field of computer vision, enabling impressive performance in tasks such as image classification, object detection, and image segmentation.
Image Classification: CNN excel in classifying images into categories. For example,
identifying whether an image contains a cat or a dog.
Object Detection: They are used to detect and localize objects within images. Popular frameworks like Faster R-CNN and YOLO (You Only Look Once) use CNN for object detection.
Image Segmentation: CNN can segment an image into different regions, assigning a label to
each pixel. This is valuable in medical imaging, autonomous driving, and more.
Video Analysis: CNN can be extended to analyze video data by processing each frame using
the same principles as image analysis.
Training a CNN involves forward propagation (where input data passes through the network, layer by layer) and backpropagation (where errors are calculated and propagated backward through the network so that the weights can be updated).
Due to the deep and complex nature of CNN, training often requires substantial computational
resources and large datasets.
3.2.8 Model Testing
Model testing in machine learning refers to the process of evaluating the performance and
effectiveness of a trained machine learning model on unseen data. The goal of model testing is
to assess how well the model can generalize its learned patterns from the training data to new,
previously unseen data. This is a crucial step in the machine learning pipeline to ensure that
the model performs well in real-world scenarios.
3.2.9 Evaluation
The model's performance is measured in terms of recall, accuracy, precision, and F1 score.
Accuracy: quantifies the percentage of instances that are classified correctly among all
instances. It is computed by dividing the number of accurate predictions by the total number of
predictions. Essentially, it represents the proportion of correct predictions made by our model,
reflecting its overall correctness.
Precision: evaluates the accuracy of identifying true positive instances among the predicted
positives. It quantifies the proportion of true positives out of all positive predictions. This
assessment reflects the model's effectiveness in predicting a particular category and is
employed to gauge its capability in correctly classifying positive values.
Recall: revolves around accurately identifying positive instances among the actual positives.
Mathematically, it represents the true positives divided by the total count of actual positive
instances. This metric provides insights into how effectively the model detects a particular
category and assesses its capacity to predict true positive values.
F1-Score: serves as a balanced metric, taking into account both precision and recall
simultaneously. When there is a need to consider both precision and recall, the F1 Score comes
in handy, as it embodies the harmonic mean of these two metrics.
The application of the confusion matrix technique aids in obtaining the essential parameters
for evaluating model performance. Primarily employed for assessing classification models, this
two-dimensional table arranges the model's predicted labels in columns and the true class labels
in rows. The confusion matrix enables the derivation of crucial metrics such as True Positive
(TP), True Negative (TN), False Positive (FP), and False Negative (FN). Table 3.1 visually
illustrates the extraction of these values from the table, which subsequently serve as the
foundation for calculating the model's performance metrics.
A confusion matrix is a table that visualizes the performance of a classification model. It's a
2x2 matrix (for binary classification) with the actual class labels on one axis and the predicted
class labels on the other axis. The confusion matrix is a fundamental tool in evaluating the
performance of a classification model in machine learning. It provides a summary of the
predictions made by a model on a dataset, showing how well the model's predictions align with
the actual labels.
True positives (TP) occur when the model accurately predicts the positive class, correctly
recognizing an observation as part of the positive class.
False positives (FP) happen when the model predicts the positive class incorrectly, wrongly
identifying an observation as belonging to the positive class when it actually does not.
True negatives (TN) are instances where the model correctly predicts the negative class,
accurately identifying an observation as not belonging to the positive class.
False negatives (FN) are situations where the model predicts the negative class incorrectly,
mistakenly classifying an observation as not belonging to the positive class when it actually
does.
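The confusion matrix counts and the four metrics defined above can be computed together, as in this minimal NumPy sketch. It assumes a binary problem with 1 = pneumonia (positive) and 0 = normal, and lays out the matrix with actual classes in rows and predicted classes in columns, as described above. The function names and the ten labels are hypothetical.

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Rows = actual class, columns = predicted class (negative first, then positive).
    matrix = np.array([[tn, fp],
                       [fn, tp]])
    return matrix, accuracy, precision, recall, f1

# 1 = pneumonia, 0 = normal (hypothetical labels for illustration).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
matrix, accuracy, precision, recall, f1 = classification_metrics(y_true, y_pred)
print(matrix)
print(accuracy, precision, recall, f1)
```

Here TP = 4, FP = 1, TN = 4, and FN = 1, so accuracy, precision, recall, and F1 all come out to 0.8.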
The user interface design for this research project involves crafting an intuitive and user-
friendly interface aimed at enabling interaction between users, such as medical professionals
and researchers, and the machine learning (ML) model used for diagnosing pneumonia from
medical images. The main page will feature a clearly worded title that communicates the
application's purpose. Additionally, it might incorporate a concise explanation of the system's
functioning and its advantages. A designated area will be created where users can submit
medical images like X-rays or CT scans for analysis. This interface will possess the capability
to support file uploads and provide confirmation for successful uploads. This segment of the
UI will also keep users informed about the progress of their uploaded images, potentially
utilizing a progress bar or animated loading icon to signify the ongoing image analysis process.
Once the analysis concludes, the outcome of the prediction will be presented. This could
manifest as a straightforward message such as "Pneumonia Detected" or "No Pneumonia
Detected," accompanied by the prediction's confidence level.
Input design pertains to the process of creating an interface that allows users to interact with the system by providing input. In the user interface design described above, input is provided through the area where users upload medical images for analysis.
Output design refers to the presentation of results or information from the system to the users
in a clear and comprehensible manner. The following explains how the output design will be
effective:
Prediction Results: Once the system analyzes the medical images, the output design involves
presenting the prediction results. This could be a simple message indicating whether
pneumonia is detected or not.
Clear Buttons and Labels: The output design includes ensuring that buttons and labels used
in the UI are clear and descriptive. This enhances user understanding and navigation within the
application.
Responsive UI: The output design extends to making sure that the UI is responsive, adapting
effectively to various screen sizes and devices, including desktops, tablets, and smartphones.
This ensures that users can access and view the output information regardless of the device
they are using.
Software is a collection of instructions and programs that direct the operation of a machine, written by programmers and software teams. The following software components are used in this study, each with a distinct purpose:
Hardware refers to the physical components of a computer system that can be seen and touched. These include the input devices, display screen, data storage unit, central processing unit, and memory module. The hardware used in this study, along with the purpose of each component, is listed below:
1. Central Processing Unit (CPU): An 8th-generation Intel Core i5 processor running at 2 GHz or faster.
2. Random Access Memory (RAM): At least 4 GB of RAM.
3. Data Storage Drive: At least 500 GB for permanent storage of code, datasets, and trained models.
4. Typing Device: A 101-key US standard keyboard.
5. Pointing Device: A 3D mouse.
CHAPTER FOUR
4.1 Introduction
The study deployed a machine learning model using two algorithms, namely Convolutional Neural Network and Support Vector Machine, through a standalone application with which a user obtains predicted results for input images. The images used to train the system include the categories Pneumonia Normal, Pneumonia bacteria, and Pneumonia virus. As these images are uploaded into the system, it extracts features from them for training, after which the system has learned to predict the class of a new input image of pneumonia.
The machine learning model is deployed through a web-based application in which the user can get predicted results by entering values for the required parameters, such as age, gender, anxiety, peer pressure, and so on. When these parameters are entered into the web application, it sends them to the backend, where the machine learning model is stored. When the backend receives the data, the model predicts the respiratory disease outcome and sends the response back to the frontend, where the result is displayed. The accuracy of this system depends on the number of records in the dataset used to build it. The application is the medium through which data passes to the machine learning model; the model compares the input with the patterns it learned from the dataset used in its construction and then delivers the appropriate outcome back to the user. The web application designed in this study was tested on localhost and can be deployed online as a fully working application, i.e., one that can be accessed at any time and from anywhere without restrictions. Users of this system can test it whenever they need to, which in turn improves the model's ability to learn new things about the different data values presented to it.
This system runs on a web environment and is used with the following procedures on the
localhost:
Step 1: Start the XAMPP control panel, then start the Apache server and the MySQL database.
Step 2: On your preferred browser type “localhost/lung”. Once the web application loads up, it
displays a user interface (frontend) that initially shows information about images that include
categories of Pneumonia Normal, Pneumonia bacteria, Pneumonia virus. Login and registration pages are provided so that users can access the system, or register if they are not already on the
system. When the user accesses the system they are provided with the full functionalities of the
system.
We used the Chest X-Ray Images (Pneumonia) dataset, sourced from Kaggle. The data is organized
into three directories (train, test, val), each subdivided into subfolders for each image
category (Pneumonia/Normal). The dataset contains a total of 5,000 X-ray images in JPEG format,
used in this study in three categories (Normal, Bacterial pneumonia and Viral pneumonia). These
anterior-posterior chest X-ray images were selected from retrospective cohorts of pediatric
patients aged one to five years at the Guangzhou Women and Children's Medical Center,
Guangzhou.
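The folder layout described above can be indexed with a short script before any training. The following is a minimal Python sketch, assuming the Kaggle archive has been extracted locally into train/test/val folders that each contain NORMAL and PNEUMONIA subfolders, and assuming that pneumonia file names contain "bacteria" or "virus" (which is how the three categories would be separated). The helpers `label_from_path`, `index_dataset` and `class_counts` are hypothetical, not part of the original system.

```python
from collections import Counter
from pathlib import Path

IMAGE_SUFFIXES = {".jpeg", ".jpg"}

def label_from_path(path: Path) -> str:
    """Map an image path to normal / bacteria / virus (assumed naming convention)."""
    if path.parent.name.upper() == "NORMAL":
        return "normal"
    name = path.name.lower()
    # Assumption: pneumonia images are named like "person1_bacteria_1.jpeg".
    if "bacteria" in name:
        return "bacteria"
    if "virus" in name:
        return "virus"
    return "unknown"

def index_dataset(root: str) -> dict:
    """Return {split: [(path, label), ...]} for the train, test and val folders."""
    index = {}
    for split in ("train", "test", "val"):
        split_dir = Path(root) / split
        files = []
        if split_dir.is_dir():
            files = sorted(p for p in split_dir.glob("*/*")
                           if p.suffix.lower() in IMAGE_SUFFIXES)
        index[split] = [(p, label_from_path(p)) for p in files]
    return index

def class_counts(index: dict) -> dict:
    """Count images per class in each split, e.g. to check class balance."""
    return {split: Counter(label for _, label in items)
            for split, items in index.items()}
```

For example, `class_counts(index_dataset("chest_xray"))` would report how many images of each class each split holds; the folder name `chest_xray` is a placeholder.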
4.3 Data preprocessing
Data preprocessing is a crucial step in building a machine learning model for pneumonia
detection using CNN and SVM. Here's a breakdown of some key techniques:
i. Data Acquisition:
Obtain chest X-ray datasets containing images of both healthy and pneumonia-infected
lungs.
vi. Preprocessing for SVM
An SVM operates on fixed-length feature vectors, so the preprocessed images must be
converted into feature vectors. This involves extracting relevant features from the
images that the SVM can use for classification, for example raw pixel intensities or
features extracted by a pre-trained CNN model.
vii. Train-Test Split:
Divide your preprocessed data into training and testing sets. The training set is used
to train the model, and the testing set is used to evaluate its performance on unseen
data. A common split is 80% for training and 20% for testing.
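Step vii can be sketched in a few lines of Python. This minimal sketch assumes the labels are available as a list (for example derived from the folder names) and shuffles each class separately, so that the 80/20 proportion holds within every class (a stratified split). The function name and the fixed seed are illustrative choices, not part of the original system.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

def stratified_split(labels: Sequence[Hashable], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) with class proportions preserved."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in sorted(by_class, key=str):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Stratifying matters here because the classes are unlikely to be equally sized; a purely random split could leave the test set with very few images of the smallest class.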
By following these data preprocessing steps, you can prepare your X-ray images for effective
training of your CNN and SVM models for pneumonia detection. Remember to choose
techniques best suited for your specific dataset and model architecture.
Feature evaluation plays a critical role in both CNN and SVM-based approaches for pneumonia
detection. Here's how it's approached in each case:
Visualization Techniques: While CNNs excel at automatic feature learning, interpreting these
features can be challenging. Techniques like visualization of filters and activation maps can
provide some insight into what the CNN is focusing on within the X-ray images.
Hand-crafted Features: Unlike CNNs, SVMs typically require pre-defined features as input.
These features need to be carefully chosen to capture the discriminative information between
healthy and pneumonia-infected lungs.
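As an illustration of hand-crafted features, the Python sketch below computes simple intensity statistics for a grayscale image stored as a list of pixel rows: the mean, the standard deviation and a coarse normalized histogram. These particular features are assumptions chosen for illustration, not the features used in this project; in practice they would be computed for every image and stacked into the feature matrix passed to the SVM.

```python
import math
from typing import List, Sequence

def intensity_features(image: Sequence[Sequence[int]], bins: int = 4,
                       max_value: int = 255) -> List[float]:
    """Return [mean, std, histogram fractions...] for a grayscale image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    histogram = [0] * bins
    for p in pixels:
        # Map the intensity range [0, max_value] onto bin indices 0..bins-1.
        index = min(p * bins // (max_value + 1), bins - 1)
        histogram[index] += 1
    return [mean, std] + [count / n for count in histogram]
```

The output length is `2 + bins`, so every image yields a vector of the same size regardless of its resolution.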
Feature Selection Techniques: When using hand-crafted features, it's essential to evaluate
their effectiveness. Techniques like correlation analysis, chi-square tests, and feature
importance scores can help identify the most relevant features that contribute the most to
classification accuracy.
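For example, a chi-square test scores each feature by how strongly its values depend on the class label. The Python sketch below assumes the features have already been binarized (0/1, e.g. "region brighter than a threshold") and computes the Pearson chi-square statistic from the feature-versus-class contingency table; a higher score means a more relevant feature. The helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

def chi_square_score(feature: Sequence[int], labels: Sequence[Hashable]) -> float:
    """Pearson chi-square statistic of a binary feature against the class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_total in feature_totals.items():
        for label, l_total in label_totals.items():
            expected = f_total * l_total / n  # count expected under independence
            score += (observed[(f_value, label)] - expected) ** 2 / expected
    return score

def rank_features(columns: Sequence[Sequence[int]],
                  labels: Sequence[Hashable]) -> List[Tuple[int, float]]:
    """Return (feature_index, score) pairs from most to least relevant."""
    scores = [(i, chi_square_score(col, labels)) for i, col in enumerate(columns)]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

A feature that perfectly tracks the label gets the maximum score, while one that is independent of the label scores zero, so the top-ranked features can be kept and the rest discarded.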
CNNs offer an advantage in feature learning, as they can automatically discover relevant
patterns from the data without the need for manual feature engineering. This can be particularly
beneficial when dealing with complex medical images like chest X-rays.
An SVM might be suitable if you have domain knowledge about pneumonia and can identify
specific features that differentiate healthy from infected lungs. However, hand-crafted feature
engineering can be time-consuming and requires expertise.
Transfer Learning: Use pre-trained CNN models like VGG16 or ResNet for feature
extraction. These models have already learned powerful features from large image datasets and
can be fine-tuned for pneumonia detection. This leverages the feature learning of CNNs
while potentially reducing training time.
By effectively evaluating features in your CNN or SVM-based approach, you can ensure your
model is focusing on the most relevant information for accurate pneumonia detection.
Although CNNs excel at automatic feature learning and SVMs can work with hand-crafted features,
feature selection can still be beneficial in both scenarios for pneumonia detection using chest
X-ray images. Here's how it can be applied:
Feature Selection for CNN:
ii. Filter Methods (for Pre-trained Models): If a pre-trained CNN is used for feature extraction,
filter methods can be applied after feature extraction to select the subset of learned features
most relevant for pneumonia detection. Techniques like chi-square tests or information gain
can be used for this purpose.
Feature Selection for SVM:
Wrapper Methods: These methods involve evaluating different feature subsets using the
SVM classifier itself as a scoring function. The goal is to find the subset that leads to the best
classification performance on a validation set. Techniques like recursive feature elimination
(RFE) or genetic algorithms can be employed.
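Recursive feature elimination can be sketched as follows: train a linear SVM, drop the feature whose weight has the smallest magnitude, and repeat until the desired number of features remains. To keep the example self-contained, the linear SVM here is trained with a Pegasos-style subgradient method on the hinge loss and has no bias term (features are assumed to be roughly centred); this solver is a stand-in, not the SVM implementation used in the project, and all names are hypothetical. Labels must be +1 or -1.

```python
import random
from typing import List, Sequence

def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style training of a linear SVM without bias; y values must be +1 or -1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization shrinks the weights; a margin violation adds a hinge-loss step.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def recursive_feature_elimination(X: Sequence[Sequence[float]], y: Sequence[int],
                                  n_keep: int) -> List[int]:
    """Return the indices of the n_keep features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining
```

Because the SVM is retrained after every removal, the ranking adapts as features are dropped, which is what distinguishes RFE from ranking all features once by their initial weights.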
Embedded Methods: These methods integrate feature selection within the SVM training
process. L1-regularized SVMs inherently perform feature selection by driving some feature
weights to zero, effectively removing those features from the model.
Reduced Training Time: Training with fewer features can be computationally faster,
especially for large datasets.
Model Interpretability: In SVMs, feature selection can help identify the most discriminative
features contributing to the classification, providing some insights into the model's decision-
making process.
For CNN: Regularization techniques are often a good first approach. Filter methods can be
considered for fine-tuning after using pre-trained models.
For SVM: Feature selection is more crucial. Techniques like PCA for dimensionality reduction
followed by wrapper or embedded methods are common approaches.
Feature selection can be applied to the features extracted by a pre-trained CNN before they are
fed into the SVM. This combines the feature learning of CNNs with SVM classification on a
potentially more interpretable feature set.
By incorporating feature selection techniques, you can optimize your CNN or SVM model for
pneumonia detection, potentially leading to improved performance, faster training times, and
a better understanding of the factors contributing to accurate classification.
Developing a machine learning model for pneumonia detection that combines CNN and SVM
involves the following steps:
i. Data Acquisition:
Obtain a chest X-ray dataset containing labeled images (normal, bacteria, virus). The
dataset is balanced, and the model is then trained on it.
Preprocess the images using techniques like normalization, resizing, and potentially
data augmentation to increase dataset size and diversity.
Convert images to grayscale format for CNN.
iii. CNN Model Development:
Choose a CNN architecture suitable for image classification tasks (e.g., VGG16,
ResNet).
Consider pre-training the CNN on a large image dataset (like ImageNet) for feature
extraction and fine-tuning on your pneumonia dataset. Define the CNN architecture
with convolutional layers for feature extraction, pooling layers for dimensionality
reduction, and fully connected layers for classification.
Train the CNN model on the preprocessed training set, specifying an optimizer (e.g.,
Adam) and a loss function (e.g., binary cross-entropy for two classes, or categorical
cross-entropy for the three classes normal, bacteria and virus) for optimizing the model's
weights and biases.
After training the CNN, extract features from the final layers before the classification
layer. These features represent the learned patterns from the X-ray images.
Evaluate both the CNN and SVM models (if applicable) on a separate testing set unseen
during training.
Use metrics like accuracy, precision, recall, and F1-score to assess the model's
performance in classifying pneumonia cases.
Tuning hyperparameters of the CNN and SVM.
Trying different CNN architectures or feature extraction techniques.
Employing techniques like dropout layers or data augmentation to address overfitting.
viii. Deployment:
If satisfied with the model's performance, consider deploying it for real-world use in a
healthcare setting. This might involve integrating the model into a web application or
medical imaging system.
Class Imbalance: If your dataset has a class imbalance (one class has many more images
than another), techniques like oversampling the minority class or undersampling the
majority class can be applied to address this issue.
Transfer Learning: Leverage pre-trained CNN models for feature extraction to reduce
training time and potentially improve performance.
x. Explainability:
While CNNs are powerful, interpretability can be challenging. Consider techniques like
visualization or feature importance analysis in SVM to gain insights into the model's
decision-making process.
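Two of the steps listed above, preprocessing (grayscale conversion, resizing and normalization) and handling class imbalance by random oversampling, can be sketched in plain Python. This is a minimal illustration assuming images are held as nested lists of pixel values; the BT.601 luminance weights, nearest-neighbour resizing, min-max normalization and all helper names are assumptions for illustration rather than the project's exact pipeline (the project's Matlab code resizes images to 250 x 250).

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple

def to_grayscale(rgb: Sequence[Sequence[Tuple[int, int, int]]]) -> List[List[float]]:
    """Convert RGB pixels to luminance using ITU-R BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def resize_nearest(image: Sequence[Sequence[float]], new_h: int, new_w: int) -> List[List[float]]:
    """Nearest-neighbour resize of a 2-D image to new_h x new_w pixels."""
    old_h, old_w = len(image), len(image[0])
    return [[image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def min_max_normalize(image: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale pixel values linearly to [0, 1]; a constant image maps to all zeros."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]

def random_oversample(samples: Sequence, labels: Sequence, seed: int = 0) -> Tuple[list, list]:
    """Duplicate random samples of smaller classes until all classes match the largest one."""
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    rng = random.Random(seed)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

Oversampling should be applied to the training split only, so that duplicated images do not leak into the test set and inflate the reported accuracy.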
The main aim of the system is to provide an application that can help with the early detection
of pneumonia. There are a few simple steps that need to be taken before the result can be
displayed, as described below.
Step 1: Start Matlab IDE (Integrated Development Environment)
Step 2: Type PNEM_INTERFACE_1 in the Matlab Command Window and wait a few minutes for the first
interface to appear, then begin the process.
Step 5: Result
Figure 4.3 shows the Matlab IDE (Integrated Development Environment), a comprehensive
software tool designed to facilitate the development, testing, and deployment of algorithms and
applications using the Matlab programming language. The Matlab IDE typically provides a
user-friendly interface with various features tailored to support the entire workflow of scientific
and engineering computing tasks. Within the IDE, users can write and execute Matlab code,
visualize data, debug programs, and create graphical user interfaces (GUIs) for interactive
applications. The IDE often includes tools for managing files and projects, accessing
documentation and help resources, and integrating with other software and hardware
components. Overall, the Matlab IDE serves as a central hub for Matlab users to efficiently
develop and explore solutions for a wide range of technical challenges, from data analysis and
signal processing to image processing and machine learning.
Figure 4.4 shows the first interface of the application, which is used for uploading images.
This interface is the first step in the workflow: users select chest X-ray images from their
local storage or from external sources and upload them into the application, providing the
input data needed for analysis and diagnosis. Once uploaded, the images are passed to the
subsequent interfaces for processing, analysis and visualization of the results.
Figure 4.5: Second Interface (Selecting the Images for Uploading)
Figure 4.5 shows the second interface of the application, which handles the selection of
images before uploading. After the upload process has been started from the first interface,
users select the relevant chest X-ray images to be analyzed. The selected images are then
processed in the subsequent steps, where the machine learning algorithms are applied to
produce a diagnosis or other insights.
Figure 4.6 shows the third interface of the application, which predicts pneumonia from the
uploaded chest X-ray images. This interface follows the image selection process, in which users
choose the images for analysis. Figure 4.6 is therefore the key interface through which users
interpret and act upon the system's predictions about the presence or absence of pneumonia in
the uploaded chest X-ray images.
Figure 4.7 shows the fourth interface of the application, which displays the results generated
by the Convolutional Neural Network (CNN) model for pneumonia detection in chest X-ray images.
Figure 4.7 presents the results of the CNN model's analysis, providing users with information
to support clinical decision-making and patient care.
Figure 4.8 shows a chest X-ray image dataset used for training and testing machine learning
models, particularly for tasks related to respiratory disease detection such as pneumonia
classification.
Performance Analysis
Several performance measures were used to evaluate the models. The performance metrics used in
this project are the confusion matrix, accuracy, precision, recall and F1 score. A confusion
matrix, also called an error matrix, is a visualization tool used to evaluate the performance
of a classification model.
It provides a clear breakdown of how the model performed on a set of test data, showing how
many predictions were correct and where the model made mistakes.
FP (False Positive)   25   43   22
FN (False Negative)    0   36   20
The confusion matrix is obtained from the machine learning code after the CNN model has been
trained. It describes the performance of the classification model on the training data.
To evaluate the precision, recall and specificity of the CNN model, we draw conclusions from
the confusion matrix in Table
Precision: Precision is the percentage of cases predicted as positive that are actually
positive. It can be derived from the confusion matrix above using the following formula:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: Sensitivity, another name for recall, is the percentage of actual positive cases that
are correctly identified. It can be derived from the confusion matrix above using the following
formula:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Specificity: Specificity is the percentage of actual negative cases that are correctly
identified. It can be derived from the confusion matrix above using the following formula:
$$\text{Specificity} = \frac{TN}{TN + FP}$$
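Precision, recall (sensitivity) and specificity, together with accuracy and F1 score, can be computed directly from confusion-matrix counts. The Python sketch below uses hypothetical counts, since the FP and FN rows reported above do not include the TP and TN values. For a three-class problem (normal, bacteria, virus), `one_vs_rest_counts` derives TP, FP, FN and TN for one class, assuming the rows of the matrix hold the true classes and the columns the predicted classes.

```python
from typing import Dict, Sequence, Tuple

def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, specificity, F1 and accuracy for one positive class."""
    def safe_div(num: float, den: float) -> float:
        return num / den if den else 0.0
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "f1": f1, "accuracy": accuracy}

def one_vs_rest_counts(matrix: Sequence[Sequence[int]], k: int) -> Tuple[int, int, int, int]:
    """TP, FP, FN, TN for class k (rows = true class, columns = predicted class)."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[i][k] for i in range(len(matrix))) - tp
    fn = sum(matrix[k]) - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn
```

For example, `metrics_from_counts(*one_vs_rest_counts(matrix, k))` gives the per-class metrics; averaging them over the classes gives an overall (macro-averaged) figure.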
The table below summarizes the findings from the four machine learning models that were
used in this study by listing the Precision, Recall, Specificity and Accuracy for each model.
Evaluation of performance
The table below shows the performance of the machine learning models used, as percentages for
accuracy, precision, recall and F1 score.
The convolution operation is essentially a sliding dot product between a filter (kernel) and the
input image. It allows the network to learn spatial features within the image data.
[Chart: F1 score, SVM + CNN]
a) Equation:
$$S(x, y) = \sum_{i}\sum_{j} W(i, j)\, I(x + i, y + j) + b$$
S(x, y): This represents the output feature map at a specific location (x, y) in the output volume.
W(i, j): This represents the element of the filter (kernel) at position (i, j). The filter is
typically much smaller than the image.
I(x + i, y + j): This represents the input image pixel at position (x + i, y + j); multiplying it
by W(i, j) gives the element-wise product between the filter and the image patch centered at (x, y).
ΣΣ: This denotes summation over all positions (i, j) within the filter.
b: This represents the bias term added to the output at each location in the feature map.
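The convolution S(x, y) = ΣΣ W(i, j) I(x + i, y + j) + b can be checked with a small worked example. The Python sketch below implements a "valid" convolution (no padding, stride 1) exactly as the summation is written, without flipping the kernel, which is what CNN libraries compute under the name convolution. One assumption differs from the text: i and j start at 0, so (x, y) marks the top-left corner of the image patch rather than its centre; the two conventions differ only by an offset.

```python
from typing import List, Sequence

def conv2d(image: Sequence[Sequence[float]], kernel: Sequence[Sequence[float]],
           bias: float = 0.0) -> List[List[float]]:
    """S(x, y) = sum_i sum_j W(i, j) * I(x + i, y + j) + b, no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    output = []
    for x in range(out_h):
        row = []
        for y in range(out_w):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[x + i][y + j]
            row.append(total)
        output.append(row)
    return output
```

With the 3 x 3 image [[1, 2, 3], [4, 5, 6], [7, 8, 9]] and the 2 x 2 kernel [[1, 0], [0, -1]], every output value is -4 (for example 1 - 5 at the top-left), giving a 2 x 2 feature map; the output shrinks by the kernel size minus one in each direction.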
The core mathematical equation for a Support Vector Machine (SVM) in classification
problems involves the decision function that separates the data points belonging to different
classes. Here's a breakdown:
a) Decision Function:
The decision function determines on which side of the hyperplane (decision boundary) a new
data point falls and consequently its predicted class.
b) Equation:
f(x) = w^T * x + b
f(x): This represents the output of the decision function for a new data point x.
w: This is the weight vector of the SVM, with the same dimensionality (n) as the feature vectors
of the data points.
x: This represents the feature vector of the new data point to be classified.
b: This is the bias term that influences the position of the hyperplane in the feature space.
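A worked example of the decision function: the Python sketch below evaluates f(x) = w^T x + b and assigns the class from its sign. The weight values and the class names are hypothetical; in practice w and b are learned during SVM training. A point lying exactly on the hyperplane (f(x) = 0) is assigned to the positive class here, which is an arbitrary convention.

```python
from typing import Sequence

def decision_function(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """f(x) = w^T x + b: signed score of x relative to the separating hyperplane."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimensionality n")
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w: Sequence[float], x: Sequence[float], b: float,
            positive: str = "PNEUMONIA", negative: str = "NORMAL") -> str:
    """Assign the positive class when f(x) >= 0, otherwise the negative class."""
    return positive if decision_function(w, x, b) >= 0 else negative
```

For instance, with w = [2, -1] and b = -0.5, the point x = [1, 1] gives f(x) = 2 - 1 - 0.5 = 0.5 and is classified as positive, while x = [0, 1] gives -1.5 and is classified as negative.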
COMBINATION OF BOTH ALGORITHM
The core equation for an SVM involves the decision function that separates the data points:
f(x) = w^T * x + b
T: Transpose operation.
A core operation in CNNs is the convolution, which allows them to learn features directly from
the input data (images):
$$S(x, y) = \sum_{i}\sum_{j} W(i, j)\, I(x + i, y + j) + b$$
I(x + i, y + j): Input image pixel, multiplied element-wise by the filter weight W(i, j).
Combination Approach:
i. Use features from the final CNN layers and feed them directly into an SVM for
classification.
ii. Alternatively, extract features from intermediate CNN layers and use them as input to the SVM.
iii. Train the SVM on the extracted CNN features to classify new X-ray images as healthy or
pneumonia.
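The combination approach can be sketched end to end in plain Python: each filter is convolved with the image, passed through a ReLU and reduced to a single number by global max pooling, and the resulting feature vector is classified with the SVM decision function f(x) = w^T x + b. The filters, weights and bias are hand-set, hypothetical values for illustration; in the approach described above the filters would come from a trained CNN and w and b from an SVM trained on the extracted features.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]

def conv2d_valid(image: Image, kernel: Image) -> List[List[float]]:
    """S(x, y) = sum_i sum_j W(i, j) * I(x + i, y + j), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[i][j] * image[x + i][y + j] for i in range(kh) for j in range(kw))
             for y in range(len(image[0]) - kw + 1)]
            for x in range(len(image) - kh + 1)]

def cnn_features(image: Image, kernels: Sequence[Image]) -> List[float]:
    """One feature per filter: convolution, ReLU, then global max pooling."""
    features = []
    for kernel in kernels:
        feature_map = conv2d_valid(image, kernel)
        features.append(max(max(0.0, v) for row in feature_map for v in row))
    return features

def cnn_svm_predict(image: Image, kernels: Sequence[Image],
                    w: Sequence[float], b: float) -> Tuple[str, List[float]]:
    """Classify the CNN feature vector with the linear SVM decision function."""
    features = cnn_features(image, kernels)
    score = sum(wi * fi for wi, fi in zip(w, features)) + b
    return ("PNEUMONIA" if score >= 0 else "NORMAL"), features
```

The design choice here is that the CNN part only produces a fixed-length vector (one value per filter), so the SVM never sees raw pixels; swapping in a trained CNN changes the features but not the SVM step.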
CHAPTER FIVE
5.1 Conclusion
This project investigated the potential of machine learning (ML) for the detection of respiratory
diseases, with a focus on pneumonia using chest X-ray images. Convolutional Neural Networks
(CNN) and Support Vector Machines (SVM) were explored as promising techniques for
automated feature extraction and classification. The findings of this project demonstrate the
potential of machine learning to achieve high accuracy in pneumonia detection. The ability of
CNNs to automatically learn relevant features from X-ray images offers a significant advantage
over traditional methods, and integrating an SVM adds its robust classification capabilities;
such systems may even surpass human radiologists in certain scenarios.
However, the project also acknowledges the challenges associated with implementing machine
learning in healthcare settings. Data quality and bias, interpretability of complex models, and
regulatory hurdles require careful consideration and ongoing research efforts.
This project provides valuable insights into the potential of machine learning for
respiratory disease detection. If the existing challenges are addressed through the proposed
future work, machine learning has the potential to become a powerful tool for improving early
diagnosis, treatment planning, and ultimately, patient outcomes in the field of respiratory
medicine.
The project is designed to help predict pneumonia. The data was gathered from Kaggle, an online
data source, and went through six phases: data collection, preprocessing, feature evaluation,
feature selection, data modelling and implementation. Other attributes were also combined,
which helped produce better results from the models.
This project has established a strong foundation for using machine learning, particularly
CNNs and SVMs, in respiratory disease detection. By leveraging extensive data analysis, the
project demonstrates the potential for improved accuracy and efficiency in pneumonia
diagnosis. The system was designed as a standalone application through which a user obtains
predicted results for input images. The images fall into the categories Pneumonia Normal,
Pneumonia bacteria and Pneumonia virus. As images are uploaded into the system, it uses them
for training by extracting their features, after which the system is able to predict the class
of a new input image.
In future studies, we intend to use a larger dataset to obtain higher accuracy, and to design a
real-time system in which a user obtains their pneumonia status from other biometric attributes
such as the iris, retina or other facial features.
5.2 Recommendation
To ensure effective and efficient use of the results of this work, it is recommended that a
real-time system be developed that acquires images from users based on other biometric
attributes such as the iris, retina or other facial features, so that users can predict their
pneumonia status from the comfort of their homes without having to visit a medical
practitioner.
This research faces limitations in terms of age bias, environmental bias, and feeding bias. The
dataset used for training may lack representation across different age groups, leading to
potential inaccuracies in pediatric or adult cases. Additionally, environmental biases can arise
due to data originating from specific regions, affecting the model's generalizability to diverse
environments. Lastly, the nutritional status of patients, or feeding bias, can impact the model's
effectiveness, necessitating a representative dataset for improved performance across diverse
populations.
REFERENCES
Al-Romaihi, H., Smatti, M. K., Khatib, H. a. A., Coyle, P., Ganesan, N., Nadeem, S., Farag,
E., Thani, A. a. A., Khal, A. A., Ansari, K. A., Maslamani, M. A., & Yassine, H.
M. (2020). Molecular epidemiology of influenza, RSV, and other respiratory
infections among children in Qatar: A six years report (2012–2017). International
Journal of Infectious Diseases, 95, 133–141.
https://doi.org/10.1016/j.ijid.2020.04.008
Ali, A., Razak, S. A., Othman, S. N., Eisa, T. a. E., Al-Dhaqm, A., Nasser, M., Elhassan, T.,
Elshafie, H., & Saif, A. (2022). Financial Fraud Detection Based on Machine
Learning: A Systematic Literature Review. Applied Sciences, 12(19),
9637. https://doi.org/10.3390/app12199637
Athanazio, R. A. (2012). Airway disease: similarities and differences between asthma, COPD
and bronchiectasis. Clinics, 67(11), 1335–1343.
https://doi.org/10.6061/clinics/2012(11)19
Azam, M. A., Shahzadi, A., Khalid, A., Anwar, S. M., & Naeem, U. (2018). Smartphone Based
Human Breath Analysis from Respiratory Sounds.
https://doi.org/10.1109/embc.2018.8512452
Brooks, L. R. K., & Mias, G. I. (2018). Streptococcus pneumoniae’s Virulence and Host
Immunity: Aging, Diagnostics, and Prevention. Frontiers in Immunology, 9.
https://doi.org/10.3389/fimmu.2018.01366
Chen, A., Zhang, J., Zhao, L., Rhoades, R. D., Kim, D., Wu, N., Liang, J., & Chae, J. (2021).
Machine-learning enabled wireless wearable sensors to study individuality of
respiratory behaviors. Biosensors and Bioelectronics, 173, 112799.
https://doi.org/10.1016/j.bios.2020.112799
Chen, D., Cao, L., & Li, W. (2023). Etiological and clinical characteristics of severe pneumonia
in pediatric intensive care unit (PICU). BMC Pediatrics, 23(1).
https://doi.org/10.1186/s12887-023-04175-y
Chen, H., Yuan, X., Li, J., Pei, Z. Y., & Zheng, X. (2019). Automatic Multi-Level In-Exhale
Segmentation and Enhanced Generalized S-Transform for wheezing detection.
Computer Methods and Programs in Biomedicine, 178, 163–173.
https://doi.org/10.1016/j.cmpb.2019.06.024
Copeland, M. (2021, July 17). The Difference Between AI, Machine Learning, and Deep
Learning? NVIDIA Blog. https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/
Crosta, P. (2023, April 19). What you should know about pneumonia.
https://www.medicalnewstoday.com/articles/151632
Dessie, T., Jemal, M., Temesgen, M. M., & Tiruneh, M. (2021). Multiresistant Bacterial
Pathogens Causing Bacterial Pneumonia and Analyses of Potential Risk Factors
from Northeast Ethiopia. International Journal of Microbiology, 2021, 1–9.
https://doi.org/10.1155/2021/6680343
Effah, C. Y., Miao, R., Drokow, E. K., Agboyibor, C., Qiao, R., Wu, Y., Miao, L., & Wang,
Y. (2022). Machine learning-assisted prediction of pneumonia based on non-
invasive measures. Frontiers in Public Health, 10.
https://doi.org/10.3389/fpubh.2022.938801
Grief, S. N., & Loza, J. K. (2018). Guidelines for the evaluation and treatment of pneumonia.
Primary Care, 45(3), 485–503. https://doi.org/10.1016/j.pop.2018.04.001
Infante, C., Chamberlain, D. E., Kodgule, R., & Fletcher, R. (2017). Classification of voluntary
coughs applied to the screening of respiratory disease.
https://doi.org/10.1109/embc.2017.8037098
Ippolito, P. P. (2021, December 10). SVM: Feature Selection and Kernels - towards Data
science. Medium. https://towardsdatascience.com/svm-feature-selection-and-kernels-840781cc1a6c
Khasha, R., Sepehri, M. M., & Mahdaviani, S. A. (2019). An ensemble learning method for
asthma control level detection with leveraging medical knowledge-based classifier
and supervised learning. Journal of Medical Systems, 43(6).
https://doi.org/10.1007/s10916-019-1259-8
Kaur, R., Mehra, B., Dhakad, M. S., Goyal, R., Bhalla, P., & Dewan, R. (2017). Fungal
opportunistic pneumonias in HIV/AIDS patients: an Indian Tertiary care
experience. Journal of Clinical and Diagnostic Research.
https://doi.org/10.7860/jcdr/2017/24219.9277
Kim, B., Kang, M., Lim, J., Lee, J. Y., Kang, D., Kim, E. K., Kim, J., Park, H., Min, K. U.,
Cho, J., & Jeon, K. (2022). Comprehensive risk assessment for hospital-acquired
pneumonia: sociodemographic, clinical, and hospital environmental factors
associated with the incidence of hospital-acquired pneumonia. BMC Pulmonary
Medicine, 22(1). https://doi.org/10.1186/s12890-021-01816-9
Košutova, P., & Mikolka, P. (2021). Aspiration syndromes and associated lung injury:
incidence, pathophysiology and management. Physiological Research, S567–
S583. https://doi.org/10.33549/physiolres.934767
Kim, G. T., Seon, S. H., & Rhee, D. (2017). Pneumonia and Streptococcus pneumoniae
vaccine. Archives of Pharmacal Research, 40(8), 885–893.
https://doi.org/10.1007/s12272-017-0933-y
Kuhajda, I., Zarogoulidis, K., Tsirgogianni, K., Tsavlis, D., Kioumis, I., Kosmidis, C.,
Tsakiridis, K., Mpakas, A., Zarogoulidis, P., Zissimopoulos, A., Baloukas, D., &
Kuhajda, D. (2015). Lung abscess-etiology, diagnostic and treatment options.
PubMed, 3(13), 183. https://doi.org/10.3978/j.issn.2305-5839.2015.07.08
Lawton, G., Burns, E., & Rosencrance, L. (2022, January 20). logistic regression. Business
Analytics. https://www.techtarget.com/searchbusinessanalytics/definition/logistic-regression
Lawton, G., Carew, J. M., & Burns, E. (2022, January 21). predictive modeling. Enterprise AI.
https://www.techtarget.com/searchenterpriseai/definition/predictivemodeling
Lutkevich, B., & Burns, E. (2023, January 20). natural language processing (NLP). Enterprise
AI. https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP
Normandin, B. (2023, February 8). Everything you need to know about pneumonia. Healthline.
https://www.healthline.com/health/pneumonia
Pragman, A. A., Berger, J. T., & Williams, B. (2016). Understanding persistent bacterial lung
infections. Clinical Pulmonary Medicine, 23(2), 57–66.
https://doi.org/10.1097/cpm.0000000000000108
Pyle, D., & José, C. S. (2019, February 13). An executive’s guide to machine learning.
McKinsey & Company. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/an-executives-guide-to-machine-learning
Sharma, A. (2023, March 13). Random Forest vs Decision Tree | Which Is Right for You?
Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/05/decision-tree-vs-random-forest-algorithm/
Scherer, P., & Chen, D. L. (2016). Imaging pulmonary inflammation. The Journal of Nuclear
Medicine, 57(11), 1764–1770. https://doi.org/10.2967/jnumed.115.157438
Sharma, S., & Guleria, K. (2023). A Deep Learning based model for the Detection of
Pneumonia from Chest X-Ray Images using VGG-16 and Neural Networks.
Procedia Computer Science, 218, 357–366.
https://doi.org/10.1016/j.procs.2023.01.018
Tripathi, A., Singh, A. D., Singh, K. N., Choudhary, P., & Vashist, P. C. (2021). Machine
learning architecture and framework. Elsevier EBooks, 1–22.
https://doi.org/10.1016/b978-0-12-821229-5.00005-7
Tsang, K., Pinnock, H., Wilson, A., & Shah, S. a. A. (2020). Application of Machine Learning
to Support Self-Management of Asthma with mHealth.
https://doi.org/10.1109/embc44109.2020.9175679
Vanreppelen, G., Wuyts, J., Van Dijck, P., & Vandecruys, P. (2023). Sources of antifungal
drugs. Journal of Fungi, 9(2), 171. https://doi.org/10.3390/jof9020171
Van Vliet, D., Smolinska, A., Jöbsis, Q., Rosias, P. P., Muris, J. W. M., Dallinga, J. W.,
Dompeling, E., & Van Schooten, F. (2017). Can exhaled volatile organic
compounds predict asthma exacerbations in children? Journal of Breath Research,
11(1), 016016. https://doi.org/10.1088/1752-7163/aa5a8b
Vatanparvar, K., Nemati, E., Nathan, V., Rahman, M., & Kuang, J. (2020). CoughMatch –
Subject verification using Cough for personal passive health monitoring.
https://doi.org/10.1109/embc44109.2020.9176835
Von Ranke, F. M., Zanetti, G., Hochhegger, B., & Marchiori, E. (2012). Infectious diseases
causing diffuse alveolar hemorrhage in immunocompetent Patients: A State-of-the-
Art Review. Lung, 191(1), 9–18. https://doi.org/10.1007/s00408-012-9431-7
Zhang, O., Minku, L. L., & Gonem, S. (2020). Detecting asthma exacerbations using daily
home monitoring and machine learning. Journal of Asthma, 58(11), 1518–1527.
https://doi.org/10.1080/02770903.2020.1802746
APPENDIX A
APPENDIX B
Confusion Matrix
APPENDIX C
APPENDIX D
APPENDIX E
% singleton*.
% existing singleton*. Starting from the left, property value pairs are
%
% *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
gui_Singleton = 1;
'gui_LayoutFcn', [] , ...
'gui_Callback', []);
gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
[varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
gui_mainfcn(gui_State, varargin{:});
end
handles.output = hObject;
guidata(hObject, handles);
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
% hObject handle to figure
varargout{1} = handles.output;
pause(2)
waitbar(.33,f,'Training Progressing...30%')
pause(2)
pause(2)
waitbar(1,f,'Finishing...Training 100%')
close(f);
PNEM_INTERFACE_2;
close(PNEM_INTERFACE_1);
###########################################################################
% singleton*.
% existing singleton*. Starting from the left, property value pairs are
% stop. All inputs are passed to PNEM_INTERFACE_2_OpeningFcn via varargin.
% *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
gui_Singleton = 1;
'gui_LayoutFcn', [] , ...
'gui_Callback', []);
gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
else
gui_mainfcn(gui_State, varargin{:});
end
handles.output = hObject;
guidata(hObject, handles);
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
% eventdata reserved - to be defined in a future version of MATLAB
varargout{1} = handles.output;
global netw;
global loc;
netw =load('pnem_workspace.mat');
loc =strcat(path,filen);
img = imread(loc);
img2 = imresize(img,[250,250]);
axes(handles.axes1);   % make axes1 the current axes so that imshow draws into it
imshow(img2);
net1 = netw.net;
results = classify(net1,img2);
disp(results);
if results == 'NORMAL'
end
pause(5);
PNEM_INTERFACE_3;
close(PNEM_INTERFACE_2);
% eventdata reserved - to be defined in a future version of MATLAB
% handles empty - handles not created until after all CreateFcns called
set(hObject,'BackgroundColor','white');
end
###########################################################################
% singleton*.
% PNEM_INTERFACE_3('CALLBACK',hObject,eventData,handles,...) calls the local
% existing singleton*. Starting from the left, property value pairs are
% *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
gui_Singleton = 1;
'gui_OutputFcn', @PNEM_INTERFACE_3_OutputFcn, ...
'gui_LayoutFcn', [] , ...
'gui_Callback', []);
gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
else
gui_mainfcn(gui_State, varargin{:});
end
handles.output = hObject;
guidata(hObject, handles);
% uiwait(handles.figure1);
% --- Outputs from this function are returned to the command line.
varargout{1} = handles.output;
% str2double(get(hObject,'String')) returns contents of fp as a double
% handles empty - handles not created until after all CreateFcns called
set(hObject,'BackgroundColor','white');
end
function fn_CreateFcn(hObject, eventdata, handles)
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% handles empty - handles not created until after all CreateFcns called
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% --- Executes on button press in pushbutton1.
t2 =load('pnem_workspace.mat');
pre = num2str(t2.overall_precision);
rec = num2str(t2.overall_recall);
f1_s = num2str(t2.f1_score);
acc = num2str(t2.accuracy);
spec = num2str(t2.specificity);
sens = num2str(t2.sensitivity);
tp = num2str(t2.TP);
tn = num2str(t2.TN);
fp = num2str(t2.FP);
fn = num2str(t2.FN);
set(handles.precision,'String',pre);
set(handles.recall,'String',rec);
set(handles.f1score,'String',f1_s);
set(handles.accuracy,'String',acc);
set(handles.specificity,'String',spec);
set(handles.sensitivity,'String',sens);
set(handles.tp,'String',tp);
set(handles.fp,'String',fp);
set(handles.tn,'String',tn);
set(handles.fn,'String',fn);
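The pushbutton1 callback above only displays values that were computed earlier and saved in pnem_workspace.mat. As a rough sketch of how such values usually relate to the stored counts TP, TN, FP and FN, the snippet below applies the standard binary-classification formulas, assuming pneumonia is the positive class. The counts are hypothetical placeholders, and the saved overall_precision and overall_recall may instead be averages over both classes, which this sketch does not compute.

```matlab
% Hypothetical confusion counts (placeholders, not results from this project)
TP = 90; TN = 80; FP = 10; FN = 20;

% Precision: fraction of images predicted as pneumonia that really are pneumonia
precision   = TP / (TP + FP);
% Recall (same as sensitivity): fraction of pneumonia images that are detected
recall      = TP / (TP + FN);
sensitivity = recall;
% Specificity: fraction of normal images correctly classified as normal
specificity = TN / (TN + FP);
% Accuracy: fraction of all images classified correctly
accuracy    = (TP + TN) / (TP + TN + FP + FN);
% F1 score: harmonic mean of precision and recall
f1_score    = 2 * precision * recall / (precision + recall);
```

Each value can then be converted with num2str and shown with set(handles.<tag>,'String',...), as the callback does.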
% handles empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
if ispc && isequal(get(hObject,'BackgroundColor'), ...
        get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
t =load('pnem_workspace.mat');
figure;
cht1 = confusionchart(t.cm);
%disp(loc);
t =load('pnem_workspace.mat');
figure;
plotconfusion(t.YTest,t.YPred);
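The confusion-matrix callbacks above plot t.cm and the label vectors t.YTest and t.YPred saved in the workspace file. The base-MATLAB sketch below shows one way to derive the four counts and a 2x2 matrix from two label vectors. It assumes the class names NORMAL and PNEUMONIA with PNEUMONIA as the positive class, the label vectors are hypothetical, and it is an assumption that t.cm uses the same layout (rows = true class, columns = predicted class, as confusionmat produces).

```matlab
% Hypothetical true and predicted labels; in the GUI these would come from
% t.YTest and t.YPred (categorical arrays returned by classify also work,
% because cellstr converts them to a cell array of character vectors)
YTest = {'PNEUMONIA'; 'NORMAL'; 'PNEUMONIA'; 'NORMAL'; 'PNEUMONIA'};
YPred = {'PNEUMONIA'; 'PNEUMONIA'; 'NORMAL'; 'NORMAL'; 'PNEUMONIA'};
positive = 'PNEUMONIA';   % assumed positive class

isTruePos = strcmp(cellstr(YTest), positive);   % ground truth is pneumonia
isPredPos = strcmp(cellstr(YPred), positive);   % network predicted pneumonia

TP = sum(isTruePos & isPredPos);    % pneumonia detected as pneumonia
TN = sum(~isTruePos & ~isPredPos);  % normal recognized as normal
FP = sum(~isTruePos & isPredPos);   % normal flagged as pneumonia
FN = sum(isTruePos & ~isPredPos);   % pneumonia missed

% Rows = true class, columns = predicted class, order NORMAL then PNEUMONIA
cm = [TN FP; FN TP];
disp(cm);
```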
t =load('pnem_workspace.mat');
figure;
grid on;
t =load('pnem_workspace.mat');
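% metric names in display order; passing them as the category list keeps
% this order on the x-axis (categorical sorts the categories otherwise)
metricNames = {'accuracy', 'f1_score', 'Precision', 'Recall', 'sensitivity', 'specificity'};
metricCats = categorical(metricNames, metricNames);
% metric values assumed to be the ones stored in pnem_workspace.mat
metricVals = [t.accuracy, t.f1_score, t.overall_precision, ...
              t.overall_recall, t.sensitivity, t.specificity];
figure;
p = plot(metricCats, metricVals);
p.LineWidth = 4;
p.Marker = 'o';
p.MarkerSize = 5;
grid on;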