CHAPTER ONE
INTRODUCTION
1.1 Background of the Study
Pneumonia is a common and potentially serious respiratory infection that affects the lungs. It
can be caused by a variety of pathogens, including bacteria, viruses, fungi, and, less commonly,
parasites (Crosta, 2023). This condition can affect people of all ages, but it is more prevalent
in young children, the elderly, and individuals with weakened immune systems or pre-existing
health conditions (Jaul & Barron, 2017). Understanding pneumonia, its causes, symptoms,
diagnosis, treatment, and prevention is crucial for maintaining respiratory health.
Pneumonia is often caused by microorganisms that enter the lungs through inhalation or
aspiration of infected respiratory droplets (Normandin, 2023).
The most common pathogens responsible for bacterial pneumonia are Streptococcus
pneumoniae, Haemophilus influenzae, and Staphylococcus aureus (Dessie et al., 2021).
Influenza viruses, respiratory syncytial virus (RSV), and adenovirus, among others, typically
cause viral pneumonia (Al-Romaihi et al., 2020). Fungal pneumonia is often seen in individuals
with compromised immune systems, and organisms such as Pneumocystis jirovecii can cause
it (Tasaka, 2015). The symptoms of pneumonia can vary depending on the cause, age of the
individual, and overall health status. Common signs and symptoms include cough (often
producing phlegm or pus), fever, chills, difficulty breathing or shortness of breath, chest pain,
fatigue, and sometimes a bluish tint to the lips or nails due to inadequate oxygenation.
To diagnose pneumonia, a healthcare professional will typically start by performing a physical
examination, listening to the lungs with a stethoscope to detect abnormal breath sounds
(crackles, wheezing), and checking for fever. Chest X-rays and other imaging tests may be
ordered to confirm the presence of inflammation in the lungs (Scherer & Chen, 2016). In some
cases, a sputum culture or blood test may be done to identify the specific pathogen causing the
infection (Crna, 2018). Treatment for pneumonia depends on the cause and severity of the
infection. Bacterial pneumonia is usually treated with antibiotics, while antiviral medications
are used for viral pneumonia (Grief & Loza, 2018). Fungal pneumonia may require antifungal
drugs. In severe cases, hospitalization might be necessary for intravenous antibiotics or oxygen
therapy. It is essential to complete the entire prescribed course of antibiotics, even if symptoms
improve, to prevent the development of antibiotic-resistant bacteria. Pneumonia can lead to
various complications, particularly in vulnerable populations. These complications may
include pleural effusion (accumulation of fluid around the lungs), lung abscesses (pus-filled
cavities in the lungs), sepsis (a severe response to infection), and respiratory failure (Kuhajda
et al., 2015).
Preventing pneumonia involves several strategies, especially for those at higher risk.
Vaccination is a critical preventive measure. Vaccines against Streptococcus pneumoniae,
Haemophilus influenzae, influenza viruses, and others can significantly reduce the risk of
pneumonia and its complications (Kim et al., 2017). Good hygiene practices, such as frequent
hand washing and covering the mouth and nose when coughing or sneezing, can also help
prevent the spread of pathogens.
Additionally, lifestyle choices play a role in preventing pneumonia. Avoiding smoking and
limiting exposure to secondhand smoke can help keep the lungs healthy. Maintaining overall
good health through regular exercise, a balanced diet, and adequate rest supports a robust
immune system that can better fend off infections. Pneumonia is a severe respiratory infection
that affects millions of people worldwide, causing significant morbidity and mortality.
Timely and accurate diagnosis is crucial for successful treatment and patient outcomes.
Machine Learning (ML) has emerged as a promising tool in medical diagnostics, including
pneumonia detection. Pneumonia is a lung infection that can be caused by various pathogens,
including bacteria, viruses, and fungi. It inflames the air sacs in the lungs, producing
symptoms like cough, fever, difficulty breathing, and chest pain. Pneumonia can be severe,
particularly in vulnerable populations such as the elderly, children, and individuals with
compromised immune systems. Early detection of pneumonia is crucial for timely intervention
and effective treatment.
Delayed diagnosis can lead to complications, including acute respiratory distress syndrome
(ARDS) and sepsis, which can be life-threatening. Therefore, accurate and swift detection is
vital in reducing the disease burden. Machine learning algorithms can be applied to medical
imaging data, such as chest X-rays and computed tomography (CT) scans from pneumonia
datasets, to assist in the detection of pneumonia (Zhang, 2021).
Pneumonia detection using machine learning represents a significant advancement in medical
diagnostics. ML algorithms, particularly deep learning models, have demonstrated promising
results in detecting pneumonia from medical imaging data (Sharma & Guleria, 2023).
However, there are still challenges to overcome, such as data bias and model interpretability.
Collaborative efforts, ethical considerations, and rigorous validation are critical to ensuring the
safe and effective integration of AI-based systems in clinical practice. As the field of machine
learning continues to evolve, we can anticipate further progress in pneumonia detection and
improved patient outcomes.
In conclusion, pneumonia is a widespread and potentially severe respiratory infection caused
by various pathogens. Early diagnosis, appropriate treatment, and preventive measures,
including vaccination and good hygiene practices, are crucial in reducing its impact. The
integration of machine learning in pneumonia detection shows promise for improving diagnosis
and patient outcomes, but further research and careful implementation are necessary to
maximize its potential benefits in healthcare.
1.2 Statement of the Problem
Pneumonia is a serious lung infection that can be life-threatening, especially in young children
and the elderly. Early diagnosis and treatment are essential for improving patient outcomes.
However, pneumonia can be difficult to diagnose, especially in its early stages. Machine
learning has the potential to revolutionize the early diagnosis of pneumonia. By analyzing chest
X-ray images, machine learning models could be trained to identify the telltale signs of
pneumonia, even when they are not visible to the naked eye. This could lead to earlier diagnosis
and treatment, which could save lives.
1.3 Aim and Objectives of the Study
The aim of this study is the design and implementation of a pneumonia detection system using
machine learning. The specific objectives are to:
(i) develop a respiratory disease model using Support Vector Machines (SVM) and
Convolutional Neural Networks (CNN).
(ii) develop a pneumonia detection system using a Convolutional Neural Network and a Support
Vector Machine.
(iii) evaluate the developed system using recall, a confusion matrix, and the F1 score.
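As an illustration of objective (iii), the following minimal Python sketch shows how a confusion matrix, recall, and the F1 score could be computed for a binary pneumonia/normal classifier. It is only a sketch under stated assumptions: the function names (confusion_counts, precision_recall_f1) and the example labels are hypothetical and are not taken from the developed system.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification task."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Derive precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only: 1 = pneumonia, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

print(confusion_counts(y_true, y_pred))  # (4, 1, 1, 4)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Recall is emphasized in the objectives because a false negative (a missed pneumonia case) is usually the more costly error in a screening setting; the F1 score balances recall against precision.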
1.4 Research Methodology
The research aims to develop a pneumonia detection system using machine learning algorithms
and a dataset of medical imaging data, including chest X-rays and CT scans. The dataset will
be obtained from Kaggle and will be carefully selected to include diverse patient demographics.
Convolutional Neural Networks will be the primary algorithms for pneumonia detection due
to their success in medical image analysis. Other classical machine learning algorithms may be
used for comparison. Preprocessing steps, like image resizing and normalization, will be
applied to optimize model performance. The dataset will be divided into training, validation,
and testing sets, and evaluation metrics such as accuracy, precision, recall, F1-score, and
AUC-ROC will be used. Cross-validation techniques will be employed to ensure robust results. The
final model will be deployed as a practical diagnostic tool, possibly as a web-based application
or integrated into hospital information systems, with user-friendly interfaces. Python, Jupyter
Notebook, HTML, CSS, JavaScript, and MySQL will be used for model development and
deployment. Extensive testing and validation will be performed to ensure reliability and safety.
The research aims to create an accurate, interpretable, and ethical pneumonia detection system
to improve patient outcomes and enhance pneumonia diagnosis efficiency in clinical practice.
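The preprocessing and data-splitting steps described above can be sketched in plain Python as follows. This is a simplified illustration, not the study's actual pipeline: it assumes 8-bit grayscale images stored as nested lists, a hypothetical 70/15/15 split, and a fixed random seed, and it omits image resizing, which would normally be handled by an imaging library.

```python
import random


def normalize_pixels(image):
    """Scale 8-bit grayscale pixel values (0-255) into the range [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def stratified_split(labels, train_pct=70, val_pct=15, seed=42):
    """Return (train, val, test) index lists that preserve class proportions.

    The percentages and the seed are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * train_pct // 100
        n_val = len(indices) * val_pct // 100
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]  # remainder goes to the test set
    return train_idx, val_idx, test_idx


# Hypothetical dataset: 60 pneumonia images (label 1) and 40 normal images (label 0).
labels = [1] * 60 + [0] * 40
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

Splitting within each class keeps the pneumonia-to-normal ratio similar across the three sets, which matters when the classes are imbalanced, as is common in chest X-ray datasets.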
1.5 Significance of the Study
The study is significant due to its potential to revolutionize pneumonia diagnosis and
management. By applying advanced machine learning algorithms to chest X-rays and CT
scans, the study aims to improve diagnostic accuracy, save time and costs, and make diagnosis
more accessible in various healthcare settings. Real-time diagnosis and personalized treatment
approaches can lead to better patient outcomes. Ethical considerations and fairness-aware
techniques help promote unbiased and equitable diagnosis. The study contributes to the advancement
of AI in healthcare, fostering collaboration between researchers and healthcare professionals.
Overall, the study's impact extends to reducing the global burden of pneumonia and improving
patient care.
1.6 Scope of the Study
The scope of the study is to develop and evaluate a machine learning-based system for
pneumonia detection. It involves collecting diverse medical imaging data, exploring algorithms
like CNN, and addressing data bias and interpretability. The study aims to train and evaluate
the models using evaluation metrics, deploy the system for clinical use, and address ethical
considerations. Limitations include data access and the focus on pneumonia detection only.
The study's goal is to improve pneumonia diagnosis, support healthcare professionals, and
contribute to medical diagnostics.
1.7 Definition of Terms
Pneumonia: A common and potentially serious respiratory infection affecting the lungs caused
by various pathogens, including bacteria, viruses, and fungi.
Machine Learning: An application of artificial intelligence where computer systems are
trained to learn and improve from experience without being explicitly programmed.
Chest X-ray: An imaging technique that uses X-rays to visualize the internal structures of the
chest, including the lungs.
Computed Tomography (CT) scans: A medical imaging technique that uses X-rays to create
detailed cross-sectional images of the body.
Dataset: A collection of data used for training and evaluating machine learning models.
Convolutional Neural Networks (CNN): A type of deep learning model specifically designed
for image processing tasks, capable of automatically learning features from images.
Logistic Regression: A classical machine learning algorithm used for binary
classification.
Data Preprocessing: The process of preparing and cleaning the data to improve the model's
performance.
Evaluation Metrics: Quantitative measures used to assess the performance of the machine
learning model, such as accuracy, precision, recall, F1-score, and AUC-ROC.
Jupyter Notebook: An open-source web application that allows interactive computing and
data analysis using code and visualizations.
MySQL Database: A widely used relational database management system for storing
structured data.
Web-based Application: An application that is accessible through web browsers, is typically
hosted on servers, and can be used remotely.
CHAPTER TWO
LITERATURE REVIEW
2.1 Introduction
In the world of respiratory infections, one ailment stands out as a serious and potentially
life-threatening adversary: pneumonia. This lung infection strikes people of all ages but is
particularly perilous for the young, the elderly, and those with weakened immune systems. As
we delve into the depths of pneumonia, we will explore its causes, symptoms, risk factors,
diagnosis, treatment, and prevention. At the root of pneumonia lie various infectious agents,
each with its own potential to wreak havoc on the respiratory system (Pragman et al., 2016).
Bacteria, viruses, fungi, and occasionally parasites can all be culprits. Streptococcus
pneumoniae takes the lead among bacterial perpetrators, but Haemophilus influenzae,
Staphylococcus aureus, and Mycoplasma pneumoniae can also lead to the condition (Chen et
al., 2023). On the viral front, influenza viruses, respiratory syncytial virus (RSV), and
adenoviruses pose significant threats (Zhang et al., 2020).
Fungal pneumonia, on the other hand, typically affects those with compromised immune
systems, such as individuals battling HIV/AIDS or undergoing chemotherapy (Kaur et al.,
2017). The onset of pneumonia is often characterized by a cascade of distressing symptoms
(Von Ranke et al., 2012). Coughs, sometimes accompanied by mucus, become relentless. Fever
and chills grip the body, causing discomfort and fatigue (Zambon, 2020). Shortness of breath
and rapid breathing add to the struggle, while chest pain intensifies with each breath or cough.
Sweating and clammy skin become constant companions, and in severe cases, a bluish tint may
appear on lips and nails, reflecting decreased oxygen levels (Kahn, 2023).
Certain individuals face higher odds of falling victim to pneumonia. Age becomes a defining
factor, as the very young and the elderly bear greater vulnerability. Additionally, individuals
with weakened immune systems, such as those with HIV, cancer, or under immunosuppressive
therapy, find themselves at a higher risk. Chronic lung conditions like asthma, COPD (chronic
obstructive pulmonary disease), or bronchiectasis also heighten the susceptibility (Athanazio,
2012). Smoking, both active and passive, further compromises the body's defense mechanisms.
Hospital-acquired pneumonia can be more severe and affect patients during their hospital stay,
adding another layer of risk. Aspiration pneumonia, caused by inhaling food, liquids, or foreign
objects into the lungs, is another perilous scenario (Košutova & Mikolka, 2021). Living in
crowded or polluted areas may also increase the risk of contracting pneumonia. When a
healthcare professional suspects pneumonia, a series of diagnostic steps comes into play. A
thorough physical examination and a detailed medical history assessment lay the groundwork.
Listening to the lungs with a stethoscope may reveal abnormal breath sounds, hinting at
pneumonia. To visualize any abnormalities, chest X-rays or CT scans come to the fore. Blood
tests and sputum cultures further contribute to identifying the infectious agent responsible for
pneumonia. Treatment for pneumonia hinges on the cause and severity of the infection.
Bacterial pneumonia typically bows down to antibiotics, while viral pneumonia is managed with
supportive care to relieve symptoms and, in some cases, antiviral medications.
In cases of fungal pneumonia, antifungal medications take the reins (Vanreppelen et al., 2023).
In severe instances or when complications arise, hospitalization may be inevitable. Intravenous
antibiotics, oxygen therapy, and close monitoring provide the necessary support in such critical
situations. However, prevention is always the better course of action. Vaccination serves as a
shield against bacterial pathogens like Streptococcus pneumoniae and Haemophilus influenzae
(Alghamdi, 2021).
Additionally, influenza and other viruses that may lead to pneumonia have vaccines available.
Practicing good hand hygiene with regular hand-washing minimizes the risk of infectious
agents spreading. Avoiding smoking, be it active or passive, helps maintain healthy lungs and
a robust immune system. Leading a healthy lifestyle, including a balanced diet, regular
exercise, and managing chronic conditions, further boosts the body's defense mechanisms.
Lastly, minimizing contact with individuals afflicted with respiratory infections lowers the
chances of contracting pneumonia. Machine learning is a powerful tool that can be used to
improve the diagnosis, treatment, and prevention of pneumonia (Effah et al., 2022). In the
realm of diagnosis, machine learning algorithms can analyze medical images to detect
characteristic patterns associated with pneumonia. This can help radiologists make faster and
more accurate diagnoses.
Machine learning can also be used to predict the severity and prognosis of pneumonia. By
sifting through vast amounts of electronic health records and clinical data, predictive models
can identify patterns and markers that contribute to better patient care and resource allocation.
Machine learning can also be used to track pneumonia outbreaks. By harnessing data from
various sources, including social media, healthcare systems, and environmental factors,
machine learning models can spot trends indicative of potential outbreaks. This information
can be used by public health authorities to implement proactive measures to curtail the spread
of the disease. In the pursuit of treatment optimization, machine learning's data-driven
approach can unveil insights into the effectiveness of different treatment strategies for various
types of pneumonia and patient profiles. By analyzing patient outcomes on a large scale, the
models offer valuable guidance to healthcare professionals, facilitating informed clinical
decision-making. However, there are also challenges associated with applying machine
learning to healthcare. One challenge is safeguarding patient privacy. Another challenge is
ensuring that machine learning models are interpretable, so that healthcare professionals can
understand how they work. Despite these challenges, machine learning has the potential to
revolutionize the way pneumonia is diagnosed, treated, and prevented. By harnessing the power
of machine learning and addressing the associated challenges, we can improve outcomes for
pneumonia patients worldwide.
2.1.1 Causes and Types of Pneumonia
Pneumonia can be categorized into several types based on different factors such as the causative
agent, where it was acquired, or the affected population. Below are some common types of
pneumonia:
1. Community-Acquired Pneumonia (CAP): As the name suggests, community-acquired
pneumonia refers to cases of pneumonia contracted outside of healthcare
settings or hospitals (Kim et al., 2022). It is the most common type of pneumonia and
can be caused by various bacteria, viruses, or fungi. Streptococcus pneumoniae is a
frequent bacterial cause of CAP, while viruses like influenza and respiratory syncytial
virus (RSV) can also lead to this type of pneumonia (Brooks & Mias, 2018).
2. Hospital-Acquired Pneumonia (HAP) or Nosocomial Pneumonia: Hospital-acquired
pneumonia is a type of pneumonia that develops 48 hours or more after hospital
admission. It is typically more severe and may be caused by antibiotic-resistant
bacteria, making treatment more challenging. Patients in intensive care units (ICUs) or
on ventilators are particularly susceptible to HAP.
3. Ventilator-Associated Pneumonia (VAP): VAP is a subset of hospital-acquired
pneumonia that specifically occurs in patients who are on mechanical ventilation. The
presence of a breathing tube can facilitate the entry of bacteria into the lungs, increasing
the risk of infection.
4. Aspiration Pneumonia: Aspiration pneumonia arises when foreign substances, such
as food, liquids, or vomit, are inhaled into the lungs, leading to infection. This type of
pneumonia often affects individuals with impaired swallowing reflexes or altered
consciousness, such as those who have had a stroke or consume excessive alcohol.
5. Atypical Pneumonia or Walking Pneumonia: Atypical pneumonia is caused by
certain bacteria like Mycoplasma pneumoniae and Chlamydophila pneumoniae. It is
often milder and may not present with the classic symptoms seen in typical bacterial
pneumonia. This form of pneumonia is sometimes referred to as "walking pneumonia"
because people can still function despite being infected.
6. Hospital-Acquired, Early-Onset Pneumonia and Late-Onset Pneumonia: In some
cases, hospital-acquired pneumonia is classified based on when it occurs after
admission. Early-onset pneumonia typically occurs within the first four days of
hospitalization, while late-onset pneumonia develops on day five or later.
7. Viral Pneumonia: Viral pneumonia is caused by various viruses, including influenza,
respiratory syncytial virus (RSV), adenovirus, and others. It can affect both children
and adults and tends to be more common during flu seasons.
8. Bacterial Pneumonia: Bacterial pneumonia is caused by different bacteria, with
Streptococcus pneumoniae being the most common culprit. Other bacteria, such as
Haemophilus influenzae, Staphylococcus aureus, and Legionella pneumophila, can also
lead to bacterial pneumonia.
9. Fungal Pneumonia: Fungal pneumonia is caused by various fungi, and it is more
common in individuals with weakened immune systems, such as those with HIV/AIDS
or undergoing chemotherapy. Fungal pneumonia is less common but can be severe in
these vulnerable populations.
2.1.2 Symptoms of Pneumonia
Pneumonia is an infection that affects the lungs, and its symptoms can vary depending on the
cause, the individual's age, and overall health. Below is a list of common symptoms of
pneumonia:
1. Cough: A persistent cough is one of the hallmark symptoms of pneumonia. The cough
may be dry or productive, producing phlegm or mucus.
2. Fever and Chills: Pneumonia often leads to a fever, with the body temperature rising
above the normal range. Chills may also accompany the fever as the body attempts to
regulate its temperature.
3. Shortness of Breath or Rapid Breathing: As the infection affects the lungs' ability to
function properly, individuals with pneumonia may experience difficulty breathing or
an increased respiratory rate, especially during physical activity.
4. Chest Pain: Pneumonia can cause chest pain, which may worsen with deep breathing,
coughing, or sneezing. The pain is typically sharp and localized to the affected area.
5. Fatigue and Weakness: The body's immune response to pneumonia can be draining,
leading to feelings of fatigue and weakness.
6. Sweating and Clammy Skin: Individuals with pneumonia may experience excessive
sweating and clammy skin due to the body's effort to fight the infection.
7. Bluish Tint to Lips and Nails: In severe cases of pneumonia, a bluish tint, known as
cyanosis, may appear on the lips and nails. This discoloration indicates a decrease in
oxygen levels in the blood.
8. Confusion (in elderly individuals): Elderly individuals with pneumonia may exhibit
confusion or changes in mental alertness, which can be a significant symptom in
addition to respiratory symptoms.
It is important to note that the severity of symptoms can vary, and some individuals may have
mild symptoms, while others experience more severe manifestations. Additionally, certain
individuals, such as young children, the elderly, and those with weakened immune systems,
may present atypical or less specific symptoms, making diagnosis challenging in some cases.
2.1.3 Risk Factors associated with Pneumonia
Several risk factors increase an individual's susceptibility to developing pneumonia.
Understanding these risk factors is essential for identifying vulnerable populations and
implementing preventive measures. Here are some common risk factors associated with
pneumonia:
1. Age: Both the very young and the elderly are at a higher risk of contracting pneumonia.
Children under the age of 5, especially those younger than 2 years old, have less
developed immune systems, making them more vulnerable. Similarly, the immune
system weakens with age, making adults aged 65 and older more prone to infections,
including pneumonia.
2. Weakened Immune System: Individuals with weakened immune systems are more
susceptible to infections, including pneumonia. People taking immunosuppressive
medications or undergoing treatments like chemotherapy are also at increased risk.
3. Chronic Lung Conditions: Chronic lung diseases such as asthma, chronic obstructive
pulmonary disease (COPD), bronchiectasis, and interstitial lung diseases can damage
the respiratory system, making it easier for infections to take hold.
4. Smoking: Smoking weakens the lungs and impairs the body's natural defense
mechanisms, making smokers more susceptible to respiratory infections, including
pneumonia.
5. Hospitalization: Pneumonia acquired during a hospital stay, known as hospital-
acquired pneumonia, is a significant concern, especially for patients on ventilators or
those with prolonged hospital stays.
6. Living Conditions: Crowded or densely populated living conditions, such as in nursing
homes or homeless shelters, can facilitate the transmission of respiratory infections like
pneumonia.
7. Environmental Factors: Exposure to environmental pollutants and irritants, such as
air pollution or certain occupational exposures, can weaken the respiratory system and
increase the risk of pneumonia.
8. Seasonal Factors: Certain pathogens that cause pneumonia, such as influenza viruses,
are more prevalent during specific seasons. Influenza-associated pneumonia is more
common during the flu season, which typically occurs in colder months.
Having one or more of these risk factors does not guarantee the development of pneumonia,
but it does increase the likelihood. Moreover, many cases of pneumonia can be prevented or
managed through vaccination, maintaining a healthy lifestyle, practicing good hand hygiene,
and seeking timely medical attention when symptoms arise. Identifying and addressing these
risk factors can play a crucial role in reducing the burden of pneumonia on public health.
2.1.4 Diagnosis of Pneumonia
The diagnosis of pneumonia involves a combination of clinical assessment, medical history
evaluation, and diagnostic tests to confirm the presence of the infection and identify its cause.
Here are the key steps involved in the diagnosis of pneumonia:
1. Medical History and Physical Examination: The first step in diagnosing pneumonia
is a thorough medical history assessment and physical examination. The healthcare
professional will inquire about the patient's symptoms, such as cough, fever, shortness
of breath, chest pain, and fatigue, as well as any underlying health conditions that could weaken the
immune system. During the physical examination, the healthcare provider will listen to
the patient's lungs with a stethoscope to check for abnormal breath sounds, such as
crackles or wheezing, which could indicate pneumonia.
2. Chest X-ray: A chest X-ray is one of the most common imaging tests used to diagnose
pneumonia. It can reveal areas of inflammation and consolidation in the lungs, which
are indicative of infection.
3. Blood Tests: Blood tests can provide valuable information to support the diagnosis of
pneumonia. A complete blood count (CBC) can show an elevation in white blood cell
count, indicating an immune response to infection. Additionally, elevated levels of
C-reactive protein (CRP) and a raised erythrocyte sedimentation rate (ESR) in the
blood may suggest an ongoing inflammatory process.
4. Sputum Culture: If the patient is producing sputum (mucus coughed up from the
lungs), a sample may be collected and sent for a sputum culture. This test can help
identify the specific pathogen causing the pneumonia, whether it's a bacterium, virus,
or fungus. Determining the causative agent is crucial for guiding appropriate treatment,
especially in severe or complicated cases.
5. Arterial Blood Gas (ABG) Analysis: In severe cases of pneumonia, an arterial blood
gas analysis may be performed to assess the patient's oxygen and carbon dioxide levels.
This test helps determine the adequacy of respiratory function and guides decisions
regarding oxygen therapy and mechanical ventilation, if necessary.
The combination of these diagnostic approaches helps healthcare professionals accurately
diagnose pneumonia, identify the causative agent, assess the severity of the infection, and
determine the most appropriate treatment plan. Prompt and accurate diagnosis is crucial for
initiating timely treatment, which can significantly impact patient outcomes and reduce the risk
of complications.
2.1.5 Treatment Options for Pneumonia
The treatment options for pneumonia depend on the cause of the infection, the severity of the
illness, and the patient's overall health condition. Here are the main treatment options for
pneumonia:
1. Antibiotics for Bacterial Pneumonia: If the pneumonia is caused by bacteria,
antibiotics are the primary treatment. The choice of antibiotic depends on the specific
bacteria responsible for the infection and its susceptibility to different drugs.
Commonly prescribed antibiotics for bacterial pneumonia include penicillin,
amoxicillin, azithromycin, clarithromycin, or levofloxacin. It is essential to complete
the full course of antibiotics as prescribed by the healthcare provider to ensure complete
eradication of the bacteria.
2. Antiviral Medications for Viral Pneumonia: For pneumonia caused by viruses like
influenza or respiratory syncytial virus (RSV), antiviral medications may be used.
These drugs can help reduce the severity and duration of the illness. However, antiviral
medications are most effective when started early in the course of the infection, so timely
diagnosis is crucial.
3. Antifungal Medications for Fungal Pneumonia: Fungal pneumonia is less common
but can occur in individuals with weakened immune systems. Antifungal medications,
such as fluconazole or amphotericin B, are prescribed to treat fungal pneumonia.
4. Supportive Care: Supportive care is essential for all pneumonia patients, regardless of
the cause. This includes measures to alleviate symptoms and ensure the body has
adequate resources to fight the infection. Supportive care may include:
a. Pain relievers: To reduce fever and relieve chest pain.
b. Cough suppressants or expectorants: To manage cough symptoms.
c. Oxygen therapy: For patients with low oxygen levels to maintain adequate
oxygenation.
d. IV fluids: For patients who are dehydrated or unable to take fluids orally.
e. Rest: Sufficient rest to aid the body's recovery process.
5. Hospitalization: In severe cases of pneumonia, hospitalization may be necessary.
Hospitalized patients may receive intravenous (IV) antibiotics or antiviral medications,
along with more intensive monitoring and supportive care. Hospitalization is more
likely for individuals with underlying health conditions, the elderly, and those with
compromised immune systems.
6. Respiratory Support: Some severe cases of pneumonia can lead to acute respiratory
distress syndrome (ARDS), which may require advanced respiratory support, such as
mechanical ventilation, to help with breathing.
2.1.6 Preventive Measures against Pneumonia
Preventive measures against pneumonia are essential to reduce the incidence of this potentially
serious respiratory infection. These measures focus on strengthening the immune system,
avoiding exposure to infectious agents, and minimizing risk factors. Here are some key
preventive measures against pneumonia:
1. Vaccination: Vaccination is one of the most effective ways to prevent pneumonia.
There are vaccines available for bacterial pathogens, such as Streptococcus pneumoniae
and Haemophilus influenzae, which are common causes of bacterial pneumonia.
Additionally, getting vaccinated against influenza (flu) is crucial, as influenza viruses
can lead to viral pneumonia or increase susceptibility to bacterial pneumonia.
2. Hand Hygiene: Regular hand washing is a simple yet powerful preventive measure
against pneumonia. Washing hands with soap and water for at least 20 seconds helps
reduce the transmission of respiratory infections, including pneumonia.
3. Maintaining a Healthy Lifestyle: A well-balanced diet, regular exercise, and adequate
sleep all contribute to a strong immune system, which plays a vital role in fighting off
infections. Eating a variety of fruits, vegetables, whole grains, and lean proteins
provides essential nutrients that support overall health.
4. Limiting Exposure to Respiratory Infections: Minimizing contact with people who
have respiratory infections can prevent the spread of infectious agents that may cause
pneumonia. This is particularly important during flu season or when there are outbreaks
of respiratory illnesses.
5. Immunization for High-Risk Individuals: High-risk individuals, such as young
children, the elderly, and those with weakened immune systems, may benefit from
additional vaccinations, such as the pneumococcal conjugate vaccine (PCV13) and the
pneumococcal polysaccharide vaccine (PPSV23). These vaccines protect against
specific strains of Streptococcus pneumoniae and reduce the risk of severe pneumonia
in vulnerable populations.
6. Avoiding Aspiration: Aspiration pneumonia can occur when inhaling food, liquids, or
foreign objects into the lungs. To prevent this, individuals at risk should eat slowly,
take small bites, and avoid lying down immediately after eating.
In summary, preventive measures against pneumonia revolve around vaccination, hand

hygiene, avoiding smoking and secondhand smoke, maintaining a healthy lifestyle, managing
chronic conditions, limiting exposure to respiratory infections, and ensuring immunization for
high-risk individuals. By incorporating these measures into daily life and public health
practices, we can significantly reduce the burden of pneumonia and protect those who are most
vulnerable to severe respiratory infections.
2.2 Machine learning
Machine learning is a powerful tool that allows computers to refine algorithms as they process
more data. Take self-driving cars as an example: by feeding computers terabytes or
petabytes of data, machine learning enables them to learn and create their own
algorithms, based on pre-existing human-driven programming, to achieve the desired results.
As Nvidia explains, the fundamental principle of machine learning involves using algorithms
to analyze data, learn from it, and then make predictions about real-world scenarios (Copeland,
2021).
Unlike traditional programming, where software routines are manually coded to execute
specific tasks, machine learning trains machines to learn how to perform tasks by ingesting
vast amounts of data and using sophisticated algorithms. McKinsey & Company defines
machine learning as an algorithm-based approach to learning from data without relying on pre-
defined rules (Pyle et al., 2019).
Pneumonia detection often employs machine learning, a branch of artificial
intelligence (AI). With machine learning, systems can automatically learn from their
experiences and improve their performance over time without the need for explicit
programming. Machine learning is focused on developing computer programs that can access
data and utilize it to learn on their own. The learning process begins with observations or data,
such as examples, first-hand experience, or instruction, and aims to find patterns in the data to
enhance future decisions. The primary objective is to enable computers to learn independently,
without human intervention, and adjust their behaviour accordingly.
Figure 2.1 How machine learning is a subset of AI (Singh, 2018, Towards Data Science)
2.3 Machine Learning Applications
Machine learning is a subset of artificial intelligence that involves the development of computer
algorithms that can automatically learn and improve from data without explicit programming.
Machine learning has numerous applications across various industries, including:
1. Image and speech recognition: Machine learning is widely used in image and speech
recognition applications, such as facial recognition, voice recognition, and image
classification (Sarker, 2021). These applications are used in security, healthcare,
entertainment, and many other fields.
2. Predictive modelling: Predictive modelling is used to predict future events based on
historical data. Machine learning algorithms are used in predictive modelling to identify
patterns in data and make predictions (Lawton et al., 2022). This application is used in
finance, marketing, healthcare, and many other industries.
3. Natural language processing: Natural language processing involves the development
of algorithms that can process and understand human language (Lutkevich & Burns,
2023). Machine learning is used in natural language processing to improve speech
recognition, machine translation, and sentiment analysis, among others.
4. Fraud detection: Machine learning is used in fraud detection applications to identify
fraudulent transactions, credit card fraud, and insurance fraud (Ali et al., 2022). These
applications are used in finance, insurance, and other industries to prevent financial
losses.
5. Recommendation systems: Recommendation systems use machine learning
algorithms to recommend products, services, and content to users based on their past
behavior and preferences. These systems are used in e-commerce, social media, and
entertainment applications (Dwivedi, n.d.).
6. Autonomous vehicles: Machine learning is used in autonomous vehicle applications
to enable self-driving cars and other vehicles. These applications use sensors and
cameras to collect data and machine learning algorithms to interpret the data and make
decisions.
7. Personalized medicine: Machine learning is used in personalized medicine
applications to develop personalized treatment plans based on a patient's medical
history, genetics, and lifestyle. These applications are used in healthcare to improve
patient outcomes and reduce healthcare costs.
Machine learning has numerous applications across various industries, and its use is rapidly
increasing due to its ability to analyze large amounts of data, identify patterns, and make
predictions. The above-listed applications of machine learning are just a few examples of how
it is being used to solve real-world problems and improve our lives.
2.4 Machine Learning Approaches
Machine learning is a subset of artificial intelligence that utilizes statistical algorithms to enable
computers to learn from data and improve their performance without being explicitly
programmed. There are three main approaches to machine learning, each with different types
and examples:
Figure 2.2 Machine Learning Approaches (Westbrook, 2016)
2.4.1 Supervised Learning
Supervised learning is a type of machine learning approach that involves training an algorithm
on labelled data, where the input and output data are already known (Petersson, 2021). The
algorithm learns to identify patterns in the data and uses these patterns to make predictions on
new, unseen data. Supervised learning is suitable for problems that involve classification or
regression tasks.
(a) Classification: In classification, the output is a categorical variable. For example, email
classification as spam or not spam, or image classification as a dog or a cat. Examples of
supervised classification algorithms are logistic regression and decision trees.
(b) Regression: In regression, the output is a continuous variable. For example, predicting
house prices based on their features, or predicting a person's salary based on their age and
education. Examples of supervised regression algorithms are linear regression and polynomial
regression.
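The regression case in (b) can be illustrated with a minimal sketch. It fits a straight line by ordinary least squares using NumPy; the data values and variable names are hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np

# Hypothetical labelled data: one input feature x and a continuous target y.
# The targets are generated from y = 2x + 1 so the fitted line is known in advance.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix with a column of ones so that an intercept is learned as well.
A = np.column_stack([x, np.ones_like(x)])

# Ordinary least squares: find the slope and intercept minimizing ||A @ coef - y||^2.
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction for a new, unseen input.
prediction = slope * 5.0 + intercept
```

Because the labels are known during training, the fit can be checked directly against them, which is the defining property of supervised learning.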
2.4.2 Unsupervised Learning
Unsupervised learning is a type of machine learning approach that involves training an

algorithm on unlabeled data, where the input data is not categorized or labelled (Pratt, 2020).
The algorithm learns to identify patterns in the data and use these patterns to cluster or group
similar data points together.
(a) Clustering: In clustering, the goal is to group similar data points together based on their
characteristics. For example, clustering customers based on their purchasing habits or
clustering news articles based on their content. Examples of unsupervised clustering algorithms
are k-means, hierarchical clustering, and DBSCAN.
(b) Association: In association, the goal is to find relationships or associations between

different variables in the data. For example, finding products that are often bought together or
finding frequent item sets in a grocery store dataset. Examples of unsupervised association
algorithms are Apriori and FP-Growth.
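The clustering case in (a) can be sketched with a small k-means implementation. This is a simplified illustration only: the initialization (the first k points are used as starting centroids) and the toy data are assumptions made here, whereas practical implementations use more careful initialization and a convergence check.

```python
import numpy as np

def k_means(X, k, n_iter=10):
    """Minimal k-means: assign points to the nearest centroid, then update centroids."""
    centroids = X[:k].copy()  # assumption: naive initialization from the first k points
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # leave a centroid unchanged if its cluster is empty
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids

# Hypothetical unlabelled data forming two well-separated groups.
X = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
              [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]])
labels, centroids = k_means(X, k=2)
```

No labels are supplied at any point; the grouping emerges from the distances between the data points alone.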
2.4.3 Reinforcement Learning
Reinforcement learning is a type of machine learning approach that involves training an

algorithm in an environment by providing feedback in the form of rewards or penalties (Carew,
2023). The algorithm learns to take actions that maximize the reward and avoid actions that
result in penalties.
Markov Decision Process (MDP): In MDP, the agent interacts with the environment by taking
actions, and the environment responds with a reward signal. The agent's goal is to learn a policy
that maximizes the expected cumulative reward. Examples of reinforcement learning
algorithms using MDP are Q-learning, SARSA, and Deep Q-Networks (DQN).
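As an illustration of how an agent learns from reward signals, the following is a minimal sketch of the tabular Q-learning update. The table size, learning rate (alpha), discount factor (gamma), and the single transition used here are hypothetical values chosen for the example.

```python
import numpy as np

def q_learning_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the table Q in place and return it."""
    # Target: immediate reward plus the discounted value of the best next action.
    target = reward + gamma * np.max(Q[next_state])
    # Move the current estimate a fraction alpha towards the target.
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

# Hypothetical problem with 2 states and 2 actions.
Q = np.zeros((2, 2))
Q[1] = [1.0, 2.0]  # assumed existing value estimates for state 1

# One observed transition: in state 0, action 1 gave reward 1.0 and led to state 1.
q_learning_update(Q, state=0, action=1, reward=1.0, next_state=1)
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, so the entry for state 0 and action 1 moves halfway from 0 towards 2.8.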
Policy Gradient: In policy gradient, the agent learns a probability distribution over actions
that maximizes the expected cumulative reward. This approach is suitable for problems with
continuous action spaces, such as robot control or game playing. Examples of reinforcement
learning algorithms using policy gradient are REINFORCE, Actor-Critic, and Proximal Policy
Optimization (PPO).
In summary, machine learning has three main approaches, each with different types and
examples. Supervised learning is suitable for classification and regression tasks, while
unsupervised learning is suitable for clustering and association tasks. Reinforcement learning
is suitable for problems that require decision making in an uncertain environment.
2.5 Theoretical Frameworks
In the context of pneumonia detection using machine learning, four commonly used algorithms
are Support Vector Machine (SVM), Logistic Regression, Random Forest, and Convolutional
Neural Networks (CNN). Each algorithm offers unique strengths and capabilities in the realm
of medical image analysis and classification tasks.
2.5.1 Random Forest
Random Forest is an ensemble learning method that combines multiple decision trees to
improve performance and generalization. It works by building many decision trees during
training and aggregating their predictions, by majority vote for classification or by
averaging for regression, to make the final decision.
Figure 2.3 Example of a Random Forest model (Johnson, 2020)
Advantages:
1. Increased accuracy: Random Forest typically offers higher accuracy compared to

individual decision trees by reducing overfitting and capturing more complex patterns
in the data.
2. Robustness: The averaging of multiple trees makes Random Forest more robust and
less susceptible to outliers and noise in the data.
Challenges:
1. Complexity: Random Forest models are more complex than individual decision trees,
which makes them harder to interpret.
2. Computationally intensive: Building and training multiple decision trees can be
computationally expensive, especially with large datasets.
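The ensemble idea described in this section can be sketched as follows. This is a deliberately simplified illustration, not a full Random Forest: each "tree" is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, and the final class is decided by majority vote. All function names and the toy data are assumptions made for this example.

```python
import numpy as np

def fit_stump(X, y, feature):
    """Find the threshold on one feature that best separates labels 0 and 1."""
    best = (0.0, 1, -1.0)  # (threshold, polarity, accuracy)
    for t in np.unique(X[:, feature]):
        for polarity in (1, -1):
            pred = (polarity * (X[:, feature] - t) >= 0).astype(int)
            acc = np.mean(pred == y)
            if acc > best[2]:
                best = (t, polarity, acc)
    return feature, best[0], best[1]

def predict_stump(stump, X):
    feature, t, polarity = stump
    return (polarity * (X[:, feature] - t) >= 0).astype(int)

def fit_forest(X, y, n_trees=15, seed=0):
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))  # bootstrap sample (with replacement)
        feature = rng.integers(0, X.shape[1])        # random feature for this stump
        forest.append(fit_stump(X[rows], y[rows], feature))
    return forest

def predict_forest(forest, X):
    votes = np.stack([predict_stump(s, X) for s in forest])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) >= 0.5).astype(int)           # majority vote

# Hypothetical data: class 0 lies near the origin, class 1 is shifted by 5 in both features.
base = np.column_stack([np.linspace(0.0, 1.0, 10), np.linspace(1.0, 0.0, 10)])
X = np.vstack([base, base + 5.0])
y = np.array([0] * 10 + [1] * 10)

forest = fit_forest(X, y)
new_points = np.array([[0.0, 0.0], [10.0, 10.0]])
preds = predict_forest(forest, new_points)
```

Each stump is weak on its own, but because every stump sees a different resample of the data, their combined vote is more stable than any single one.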
2.5.2 Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of deep learning architecture that is primarily used for
image recognition and computer vision tasks. It is designed to automatically and adaptively
learn spatial hierarchies of features from input images, allowing it to identify patterns, objects,
and structures within the images. The key components of a CNN are convolutional layers,
pooling layers, and fully connected layers. A brief overview of each follows.
Convolutional Layer: The convolutional layer is the core building block of a CNN. It consists
of a set of learnable filters (also called kernels) that slide over the input image. Each filter
performs a convolution operation, which involves element-wise multiplication of the filter with
a local region of the input image, followed by summation. The result is a feature map that
highlights certain patterns or features found in the input image. The convolution operation in
2D can be represented as follows:
\[ F(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) \, K(m, n) \]
Where:
- \( F \) is the resulting feature map.
- \( I \) is the input image.
- \( K \) is the convolution kernel/filter.
- \( (i, j) \) represents the position of the output pixel.
- \( (m, n) \) represents the position within the filter/kernel.
- \( (i+m, j+n) \) represents the position within the input image where the filter/kernel overlaps.
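The convolution formula can be made concrete with a short sketch that evaluates the double sum directly for every output position. It assumes a single-channel image, no padding, and a stride of one; the function name and the example image and kernel are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Compute F(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n) without padding."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product of the kernel with the local image region, then sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 4x4 "image" with values 0..15 and a 2x2 diagonal-difference kernel.
image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])
feature_map = conv2d_valid(image, kernel)
```

With this kernel each output value equals I(i, j) minus I(i+1, j+1), which is -5 everywhere for this image, and the 4x4 input shrinks to a 3x3 feature map because no padding is used.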
Pooling Layer: The pooling layer is used to reduce the spatial dimensions of the feature maps
obtained from the convolutional layers. It helps in reducing the computational complexity and
making the network more robust to small variations in the input. The most common type of
pooling is max-pooling, which takes the maximum value from a local region of the feature map
and retains only the most significant information.
Fully Connected Layer: After several convolutional and pooling layers, the final feature maps
are flattened into a 1D vector and passed through one or more fully connected layers. These
layers are similar to those in a traditional neural network, connecting all neurons from the
previous layer to all neurons in the current layer. They help in learning complex non-linear
relationships between the extracted features and the output classes. The computation performed
by a fully connected layer is standard and involves a matrix multiplication of the input vector
with a learned weight matrix.
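In its usual form, a fully connected layer computes \( y = f(Wx + b) \), where \( x \) is the flattened input vector, \( W \) the weight matrix, \( b \) the bias vector, and \( f \) a non-linear activation. These symbols are notation assumed here for illustration. The sketch below shows max-pooling followed by flattening and one fully connected layer with a ReLU activation; the weights are hypothetical fixed values rather than learned ones.

```python
import numpy as np

def max_pool2d(fmap, size=2):
    """Non-overlapping max-pooling over size x size regions."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]  # drop rows/columns that do not fill a region
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def dense(x, W, b):
    """Fully connected layer y = f(Wx + b) with f = ReLU."""
    return np.maximum(0.0, W @ x + b)

# Hypothetical 4x4 feature map.
fmap = np.array([[1.0, 2.0, 5.0, 6.0],
                 [3.0, 4.0, 7.0, 8.0],
                 [9.0, 10.0, 13.0, 14.0],
                 [11.0, 12.0, 15.0, 16.0]])

pooled = max_pool2d(fmap)   # 2x2 map holding the maximum of each 2x2 block
x = pooled.flatten()        # 1D vector passed to the fully connected layer

# Hypothetical weights and biases for a layer with 2 output neurons.
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, -1.0]])
b = np.array([0.0, 1.0])
y = dense(x, W, b)
```

The second output neuron receives a negative pre-activation (-15) and is set to zero by the ReLU, which is the non-linearity that lets stacked layers model non-linear relationships.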
Figure 2.4 How CNN algorithm works (Kalita, 2022)
2.6 General Architecture of Machine Learning
The general architecture of machine learning involves several key components and stages that
collectively enable the learning and predictive capabilities of a model. The process of building
and training a machine learning model typically follows these fundamental steps:
Figure 2.5 General architecture of machine learning (Tripathi et al., 2021)
Data Collection:
The first step in the machine learning process is to gather relevant data for the problem at hand.
Data can come from various sources, such as databases, APIs, sensors, or online repositories.
The quality and size of the dataset significantly impact the performance and generalization
ability of the model.
2.6.1 Data Preprocessing
Raw data often requires preprocessing to make it suitable for training a machine learning
model. This stage involves data cleaning, which includes handling missing values, removing
outliers, and normalizing or scaling features to ensure consistency and comparability.
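Two of the cleaning steps mentioned above, handling missing values and scaling features, can be sketched as follows. Mean imputation and min-max scaling are used here as one common choice among several; the function names and the small data matrix are hypothetical.

```python
import numpy as np

def impute_mean(X):
    """Replace missing values (NaN) with the mean of their column."""
    X = X.astype(float)  # astype returns a copy, so the caller's array is not modified
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def min_max_scale(X):
    """Rescale every column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    return (X - lo) / span

# Hypothetical raw data with one missing entry.
raw = np.array([[1.0, 10.0],
                [np.nan, 20.0],
                [3.0, 30.0]])
clean = impute_mean(raw)
scaled = min_max_scale(clean)
```

After these two steps both columns contain no gaps and share the same range, so neither dominates simply because of its units.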
Feature Engineering:
Feature engineering is the process of selecting, extracting, or transforming specific features

from the data that are most relevant to the problem. Creating informative and meaningful
features is essential for the model to capture patterns and relationships effectively.
Model Selection:
Choosing an appropriate machine learning algorithm or model architecture is critical. The

selection depends on the nature of the problem (classification, regression, clustering, etc.), the
size of the dataset, the complexity of the relationships, and other factors. Common algorithms
include decision trees, support vector machines, neural networks, and more.
Model Training:
In this phase, the selected model is fed with the preprocessed data to learn from it. During
training, the model optimizes its internal parameters based on the input data and a defined
objective (e.g., minimizing error or maximizing accuracy). The learning process typically
involves an optimization algorithm that adjusts the model's parameters to minimize the
difference between the predicted outputs and the actual targets in the training data.
Model Evaluation:
Once the model is trained, it needs to be evaluated using a separate set of data, called the
validation or test set. This evaluation ensures that the model can generalize well to new, unseen
data. Various metrics, such as accuracy, precision, recall, and F1 score, are used to assess the
model's performance.
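The metrics named above can all be derived from the four counts of a binary confusion matrix. The following sketch computes them in plain Python; the label lists are hypothetical, with 1 standing for the positive class.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In this example there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so all four metrics equal 0.75. In a screening setting, recall is often watched most closely, since a false negative corresponds to a missed case.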
Hyperparameter Tuning:
Most machine learning algorithms have hyperparameters that govern their behavior but are not
learned during training. Hyperparameter tuning involves selecting the best combination of
hyperparameters to optimize the model's performance. Techniques like grid search, random
search, or Bayesian optimization are commonly used for this purpose.
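Grid search, the simplest of these techniques, can be sketched in a few lines: every combination of candidate values is scored and the best one is kept. The scoring function below is a toy stand-in for "train the model with these hyperparameters and return its validation score"; its form and the hyperparameter names are assumptions made purely for illustration.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_validation_score(learning_rate, max_depth):
    # Hypothetical score surface that peaks at learning_rate = 0.1 and max_depth = 3.
    # In practice this would train a model and evaluate it on the validation set.
    return -((learning_rate - 0.1) ** 2) - (max_depth - 3) ** 2

param_grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [1, 3, 5]}
best_params, best_score = grid_search(param_grid, toy_validation_score)
```

The cost grows with the product of the number of candidate values per hyperparameter (here 3 x 3 = 9 evaluations), which is why random search or Bayesian optimization is preferred for larger search spaces.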
Model Deployment:
After obtaining a well-performing model, it can be deployed in real-world applications to make

predictions on new, unseen data. Model deployment may involve integrating the model into a
larger software system or exposing it as an API for use in web applications or services.
Monitoring and Maintenance:
Machine learning models require continuous monitoring to ensure they perform as expected in
the production environment. Monitoring involves tracking model performance, detecting drift
(changes in data distribution), and retraining the model periodically with new data to maintain
its accuracy over time.
The iterative nature of machine learning involves going back to previous stages, such as data
collection, preprocessing, and feature engineering, to improve the model's performance
continually. This cyclical process is often referred to as the "machine learning pipeline" and
forms the foundation for solving a wide range of problems across various domains.
2.7 Related works
Pneumonia is a serious health condition that can be difficult to diagnose. In recent years, there
has been growing interest in the use of machine learning algorithms to improve the detection
of pneumonia. These algorithms can analyze patient symptom data, such as persistent cough,
chest pain, and fever, to identify individuals who are at higher risk of having pneumonia.
Researchers have studied pneumonia detection using a variety of datasets, which vary in quality
and features. By carefully selecting and extracting relevant information from these datasets,
authors have been able to draw meaningful conclusions from their research.
Several significant studies have contributed to the advancement of machine learning-based
pneumonia detection, offering hope for improved diagnosis and outcomes for patients. Chen et
al. (2021), in "Machine-learning enabled wireless wearable sensors to study individuality of
respiratory behaviors," developed a system that can accurately extract and classify the
features of the respiratory behaviors of subjects in various postures. The system consists of
two wireless wearable sensors that are attached to the chest and abdomen.
The sensors measure the local circumference changes of the chest and abdominal walls
simultaneously, and the data is wirelessly transmitted to a laptop. Three different random forest
classifiers were used to process the data, and the results showed that the individual and
weighted-adaptive classifiers were able to classify postures with an accuracy of up to 98.9%
and 98.8%, respectively. The study demonstrates that the accurate monitoring of respiratory
behaviors can be used to track the progression of respiratory disorders and diseases, and can
provide timely and objective approaches for control.
Vatanparvar et al. (2020) presented "CoughMatch Subject Verification Using Cough for
Personal Passive Health Monitoring." In this paper, the authors presented a method that utilizes
a limited number of cough samples to create a personal cough model for the primary subject.
This model is then employed by an automatic cough detection system to verify whether the
identified coughs match the personal pattern and belong to the primary subject. Zhang et al.
(2020) presented "Detecting asthma exacerbations using daily home monitoring and machine
learning." In this study, the authors aimed to develop a machine learning algorithm that could
accurately detect severe asthma exacerbations using easily available daily monitoring data.
Tsang et al. (2020) presented "Application of Machine Learning to Support Self-Management
of Asthma with mHealth." In this research, the authors utilized the Asthma Mobile Health Study
(AMHS), a publicly available mHealth dataset, to employ machine learning techniques for
developing early warning algorithms aimed at enhancing asthma self-management. The AMHS
encompassed longitudinal data from 5,875 patients, consisting of 13,614 weekly surveys and
75,795 daily surveys.
Khasha et al. (2019) proposed an ensemble learning method for asthma control level detection,
titled "An ensemble learning method for asthma control level detection with leveraging medical
knowledge-based classifier and supervised learning." The study highlights the significance of
asthma, a disease affecting approximately 300 million individuals worldwide and leading to an
estimated 250,000 deaths. Without proper treatment, asthma can become a serious public health
concern.
Pramono et al. (2019) developed "Automatic Cough Detection in Acoustic Signal using
Spectral Features," a study that presents an algorithm for automatically detecting cough events
from acoustic signals. The algorithm utilizes only three spectral features in conjunction with a
logistic regression model to classify sound segments into cough and non-cough events. These
spectral features are derived through simple calculations from two specific frequency bands
within the sound spectrum, which were selected based on their distinctive characteristics.
H. Chen et al. (2019) proposed "Automatic Multi-Level In-Exhale Segmentation and
Enhanced Generalized S-Transform for wheezing detection," a study that focuses on the
automatic detection of wheezing in respiratory sounds. As summarized in Table 2.1, the
classifiers used were SVM, extreme learning machine (ELM), and KNN. Azam et al. (2018)
presented "Smartphone Based Human Breath Analysis from Respiratory Sounds," a study aimed
at detecting irregular patterns in respiratory cycles caused by respiratory diseases. The
research work presented a scheme that involved analyzing breath segments captured using a
smartphone under natural settings.
Infante et al. (2017) developed "Classification of Voluntary Coughs Applied to the Screening
of Respiratory Disease." In this study, the authors investigated the potential of analyzing
voluntary cough sounds for screening pulmonary diseases. They recorded voluntary coughs
using a custom mobile phone stethoscope from a total of 54 patients, including 7 with COPD,
15 with asthma, 11 with allergic rhinitis, 17 with both asthma and allergic rhinitis, and 4
with both COPD and allergic rhinitis. Additionally, data were collected from 33 healthy
subjects for comparison.
Van Vliet et al. (2017) proposed "Can exhaled volatile organic compounds predict asthma
exacerbations in children?" The objectives of the study were twofold: (1) to identify a set of
exhaled volatile organic compounds (VOCs) that could serve as predictors for asthma
exacerbations in children, and (2) to determine the chemical identity of these predictive
biomarkers. The researchers conducted a one-year prospective observational study involving
96 asthmatic children. At two-month intervals during clinical visits, various parameters were
assessed, including asthma control, fractional exhaled nitric oxide levels, lung function
measurements (FEV1, FEV1/VC), and VOCs in exhaled breath using gas chromatography
time-of-flight mass spectrometry. Random Forest classification modeling was employed to
select the most predictive VOCs, and receiver operating characteristic (ROC) curves were
plotted.
Table 2.1: Comparison of different techniques on pneumonia
| S/N | Author(s) | Strategy | Limitation | Performance |
|-----|-----------|----------|------------|-------------|
| 1 | Chen et al. (2021) | Random forest | Small dataset | 98% |
| 2 | Vatanparvar et al. (2020) | Gaussian mixture model, neural networks | Small dataset | 93.34% |
| 3 | Tsang et al. (2020) | DT, LR, and SVM | Small sample size | 72.5% |
| 4 | Khasha et al. (2019) | Ensemble learning, LR, SVM, random forest, KNN, and DT | Small dataset | 92.7% |
| 5 | Pramono et al. (2019) | Logistic regression | Not available | 88.70% |
| 6 | Chen et al. (2019) | SVM, extreme learning machine (ELM), KNN | Small dataset | 99.52% |
| 7 | Zhang et al. (2020) | Recursive feature elimination, PCA, Random | Self report | 90% |
2.8 Comparison of related works
Over time, a variety of methods have been developed to diagnose pneumonia. These methods
differ in terms of their accuracy and usefulness. The different techniques used to diagnose
pneumonia are compared in Table 2.1.
2.8.1 Research Gap
Previous research has used a variety of machine learning algorithms to create respiratory
disease prediction models. However, these models have some limitations, such as limited
datasets, overfitting, and unrealistic sizes. For example, the models of Chen et al. (2019)
suffer from limited datasets, overfitting introduced during pre-processing, and reliance on
predefined sizes that are not representative of real-world conditions. As a result of these
restrictions, the performance of such models is inefficient or inaccurate. Zhang et al. (2020)
tackled the issue by integrating a real-time dataset and leveraging a large dataset size. They
also adopted minimal pre-processing and feature extraction techniques to mitigate overfitting
and underfitting during the model's development.
CHAPTER THREE
METHODOLOGY
3.1 General Overview
This chapter describes the proposed system and the creation of a model intended to tackle the
issue identified in the preceding chapter. The goal is to predict instances of pneumonia in
potential patients using a machine learning framework. The chapter presents the sequence of
stages that will be undertaken, beginning with data collection, which covers the parameters
and target variables of the pneumonia dataset. Following this, a pre-processing procedure will
be executed to handle any missing information, followed by feature extraction aimed at
streamlining the data and eliminating insignificant and redundant attributes in order to
enhance the accuracy of predictions. Within this study, a machine learning model will be
formulated with the objective of predicting pneumonia in potential patients. The model will
make use of Support Vector Machines (SVM) and Convolutional Neural Networks (CNN) as
classifiers for training, with a combination of SVM and Random Forest. The efficiency of the
model will subsequently be assessed using a range of performance metrics, including accuracy,
recall, precision, and the confusion matrix.
3.2 Description of the Proposed System
Machine learning and artificial intelligence (AI) are transforming pneumonia diagnosis by
leveraging extensive chest X-ray datasets to identify pneumonia-related patterns. This leads to
enhanced accuracy in detecting even mild cases of pneumonia. Furthermore, machine learning
automates X-ray analysis, reducing the occurrence of human errors. These advancements have
the potential to make pneumonia detection faster, more accurate, and more cost-effective,
ultimately benefiting patient outcomes. The proposed detection system, depicted in Figure 3.1,
is designed to tackle the challenge of pneumonia detection. The emphasis lies in creating a
comprehensive and consistent system through the effective integration of machine learning
methods. The primary goal of this research is to improve the reliability of pneumonia detection
in patients.
Figure 3.1 Pneumonia Detection Block Diagram
3.2.1 Data Collection
This project employed the Chest X-Ray Images (Pneumonia) dataset, sourced from Kaggle. The
data is organized into three directories (train, test, val), each subdivided into subfolders
corresponding to the image categories (Pneumonia/Normal). The dataset encompasses a total of
5,000 X-ray images in JPEG format across the two categories. These X-ray images, captured in
the anterior-posterior view, were carefully chosen from historical collections of pediatric
patients aged one to five years at the Guangzhou Women and Children's Medical Center in
Guangzhou.
3.2.2 Image Resizing
Image resizing is the process of changing the pixel dimensions of an image, making it either
smaller or larger. Because classifiers such as CNNs expect inputs of a fixed size, images are
rescaled to a common resolution before they are used for training.
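A minimal sketch of resizing is given below, using nearest-neighbour sampling so that it needs nothing beyond NumPy. This is an illustrative assumption rather than the method used in this project; in practice an imaging library with smoother interpolation (for example bilinear) would normally be used. The function name and the toy image are hypothetical.

```python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Resize a 2D (or HxWxC) array by picking the nearest source pixel."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h  # source row for every target row
    cols = np.arange(new_w) * w // new_w  # source column for every target column
    return image[rows][:, cols]

# Hypothetical 4x4 grayscale image with pixel values 0..15.
image = np.arange(16).reshape(4, 4)

small = resize_nearest(image, 2, 2)   # downscale to 2x2
large = resize_nearest(small, 4, 4)   # upscale back to 4x4 (pixels are repeated)
```

Downscaling discards pixels and upscaling only repeats them, so the choice of target resolution is a trade-off between computational cost and the amount of detail retained.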
(a) Data Augmentation
Data augmentation adds variety to a dataset to help a machine learning model learn better. For
example, given a collection of dog pictures, the model should learn to recognize all kinds of
dogs, not just those in one position or lighting. Data augmentation helps by making small
changes to the pictures, such as flipping them horizontally, rotating them slightly, or
changing the brightness. This creates new versions of the pictures that the model can learn
from, making it better at recognizing dogs in all sorts of situations. Data augmentation
therefore gives the model more examples to learn from, which can improve its performance.
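Two of the transformations mentioned above, horizontal flipping and brightness change, can be sketched with NumPy alone. Small rotations are omitted because they require interpolation, which is normally delegated to an imaging library. The function names, the probability of flipping, and the brightness range are assumptions made for this illustration; pixel values are assumed to lie in [0, 1].

```python
import numpy as np

def flip_horizontal(image):
    """Mirror the image left to right."""
    return image[:, ::-1]

def adjust_brightness(image, factor):
    """Scale pixel intensities and keep them inside the valid range [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

def random_augment(image, rng):
    """Return a randomly altered copy of the image."""
    out = image
    if rng.random() < 0.5:                  # flip half of the time (assumed probability)
        out = flip_horizontal(out)
    factor = rng.uniform(0.8, 1.2)          # assumed brightness range of +/- 20%
    return adjust_brightness(out, factor)

# Hypothetical 2x2 grayscale image.
image = np.array([[0.1, 0.5],
                  [0.9, 0.3]])
flipped = flip_horizontal(image)
brighter = adjust_brightness(image, 1.2)
augmented = random_augment(image, np.random.default_rng(0))
```

Each call to random_augment yields a slightly different version of the same picture while its label stays unchanged, which is what allows the training set to be enlarged without collecting new data.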
(b) Data Normalization
Data normalization ensures that all data features are on a similar scale, preventing biases in
model training. As an example, consider the ages and salaries of two friends: their ages are
in the 20s and 30s, while their salaries are in the thousands and tens of thousands. Without
normalization, the model might treat salary as more important simply because its numbers are
bigger. Normalization fixes this by putting all features on a similar scale, helping the model
make better predictions.
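The age and salary example can be worked through numerically. The sketch below applies z-score standardization (subtract the column mean, divide by the column standard deviation) to two hypothetical records, and also shows the common practice of dividing 8-bit pixel values by 255 for image data. All values are made up for illustration.

```python
import numpy as np

# Hypothetical records: [age in years, salary].
data = np.array([[25.0, 3000.0],
                 [35.0, 50000.0]])

# Z-score standardization per column.
mean = data.mean(axis=0)
std = data.std(axis=0)
standardized = (data - mean) / std

# For images, 8-bit pixel intensities (0..255) are commonly rescaled to [0, 1].
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled_pixels = pixels.astype(np.float32) / 255.0
```

After standardization both columns take the values -1 and 1, so age and salary contribute on an equal footing even though their raw magnitudes differ by orders of magnitude.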
3.2.6 Feature Extraction
Feature extraction involves transforming raw data into a more compact and meaningful
representation that captures relevant information for the machine learning task. In image data,
features can be extracted using techniques like edge detection, color histograms, or deep
learning-based feature extraction using pretrained convolutional neural networks (CNN).
Feature Selection
Feature selection involves choosing a subset of the most relevant features from the extracted
set to improve model performance, reduce overfitting, and enhance interpretability. Feature
selection reduces model complexity, improves training speed, and enhances model
generalization by focusing on the most informative features.
3.2.7 Training Classification
This study aims to design a system that helps in the prediction of pneumonia. For this
purpose, two machine learning algorithms are used: Convolutional Neural Networks (CNN) and
Support Vector Machine (SVM). A total of five thousand images were gathered through the
research, and four hundred and twenty images were used for training, divided into three
categories: Pneumonia normal, Pneumonia bacteria, and Pneumonia virus. As these images are
uploaded into the system, it extracts features from them for training, after which the system
is able to predict the category of a new input image.
(a) Support Vector Machine
A support vector machine (SVM) is a supervised machine learning algorithm used to solve
complex classification, regression, and outlier detection problems by performing optimal data
transformations that determine boundaries between data points based on predefined classes,
labels, or outputs. SVMs are widely adopted across disciplines such as healthcare, natural
language processing, signal processing, and speech and image recognition. In the mathematical
context, an SVM refers to a set of machine learning algorithms that use kernel methods to
transform data features by employing kernel functions. Kernel functions map complex datasets
to higher dimensions in a manner that makes data point separation easier, which simplifies the
data boundaries for non-linear problems. The mathematical formulation of SVM focuses on
finding the optimal hyperplane that maximizes the margin between classes in a high-dimensional
space. The key components are as follows:
Data Representation:
• Each data point x_i is represented as a vector in an n-dimensional space, where n is the
number of features.
• The class label of each data point is denoted by y_i, where y_i can be +1 or -1 for binary
classification problems (other conventions exist for multi-class problems).
Hyperplane Equation:
• The hyperplane can be represented by the equation:
w^T * x + b = 0
• Here, x is any point on the hyperplane, w is a weight vector with the same dimension as the
data points (n-dimensional), and b is the bias term that determines the position of the
hyperplane relative to the origin.
Margin:
• The margin is defined as the distance between the hyperplane and the closest data points
from each class, known as support vectors.
• We want to maximize this margin to create a clear separation between classes.
Cost Function and Optimization:
• To find the optimal hyperplane, we typically minimize a cost function that penalizes
instances where data points fall on the wrong side of the margin or too close to the
hyperplane.
• A common cost function used in SVM is the hinge loss:
L(w, b) = max(0, 1 - y_i (w^T * x_i + b))
• This function penalizes violations of the desired margin.
• The optimization problem involves minimizing the cost function with respect to the
weight vector (w) and bias term (b).
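The hinge loss above can be illustrated with a short Python sketch. The toy data points, the weight vector, and the function names below are made up for illustration and are not taken from the system built in this study.

```python
def decision_value(w, b, x):
    """Return w^T x + b for weight list w, bias b, and feature list x."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y (w^T x + b)) for one sample with label y in {+1, -1}."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def mean_hinge_loss(w, b, samples, labels):
    """Average hinge loss over a labelled data set."""
    losses = [hinge_loss(w, b, x, y) for x, y in zip(samples, labels)]
    return sum(losses) / len(losses)


# Toy data (assumed): two points per class in a 2-D feature space.
samples = [[2.0, 2.0], [0.5, 0.0], [-2.0, -1.0], [0.0, -0.5]]
labels = [1, 1, -1, -1]
w, b = [1.0, 1.0], 0.0  # an arbitrary candidate hyperplane

per_sample = [hinge_loss(w, b, x, y) for x, y in zip(samples, labels)]
average = mean_hinge_loss(w, b, samples, labels)
print(per_sample, average)
```

Points that lie on the correct side and outside the margin contribute zero loss, while correctly classified points inside the margin are still penalized.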
Kernel Trick (for non-linear data):
• When data is not linearly separable in the original feature space, SVMs employ the
kernel trick.
• This trick involves implicitly mapping the data points to a higher-dimensional space
where they become linearly separable.
• A kernel function operates on the original data points and computes their inner product
in the higher-dimensional space without explicitly performing the mapping.
• Common kernel functions include linear, polynomial, and Gaussian (RBF).
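The kernel trick can be checked numerically. The sketch below assumes a degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is small enough to write out; the function names and the gamma value are illustrative.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel: k(x, z) = (x . z)^2."""
    return sum(a * c for a, c in zip(x, z)) ** 2


def poly2_feature_map(x):
    """Explicit map for 2-D input: phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = [1.0, 2.0], [3.0, -1.0]

# Inner product computed implicitly, in the original 2-D space.
implicit = poly2_kernel(x, z)

# The same inner product computed explicitly in the 3-D mapped space.
explicit = sum(a * c for a, c in zip(poly2_feature_map(x), poly2_feature_map(z)))

print(implicit, explicit, rbf_kernel(x, z))
```

Both routes give the same value up to floating-point rounding, but the kernel never constructs the higher-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is practical.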
Solving the Optimization Problem:
• Specialized optimization algorithms like Sequential Minimal Optimization (SMO) are
used to solve the SVM optimization problem efficiently.
Classification of New Data:
• Once the optimal hyperplane is determined, a new data point x is classified by evaluating
the sign of the decision function:
f(x) = w^T * x + b
• If f(x) > 0, the point is classified as class +1.
• If f(x) < 0, the point is classified as class -1.
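As a minimal sketch of this decision rule, the following Python snippet classifies two points with an assumed weight vector and bias. The values of w and b are illustrative, and assigning a point that lies exactly on the hyperplane (f(x) = 0) to class -1 is a convention chosen for this sketch only.

```python
def decision_function(w, b, x):
    """Evaluate f(x) = w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 if f(x) > 0, otherwise -1 (boundary points go to -1 here)."""
    return 1 if decision_function(w, b, x) > 0 else -1


# Hypothetical hyperplane parameters, not learned from the study's data.
w, b = [1.0, -1.0], -0.5

first = classify(w, b, [2.0, 0.0])   # f(x) = 1.5
second = classify(w, b, [0.0, 1.0])  # f(x) = -1.5
print(first, second)
```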
This mathematical framework provides a foundation for understanding how SVMs create decision
boundaries and perform classification tasks. The main objective of the SVM algorithm is to
find the optimal hyperplane in an N-dimensional space that separates the data points of
different classes in the feature space. The hyperplane is chosen so that the margin between
the closest points of different classes is as large as possible. The dimension of the
hyperplane depends on the number of features. If the number of input features is two, the
hyperplane is just a line. If the number of input features is three, the hyperplane becomes a
2-D plane. It becomes difficult to visualize when the number of features exceeds three.
(b) Convolutional Neural Network
Convolutional Neural Networks (CNNs) are a class of deep neural networks, most commonly
applied to analyzing visual imagery. They have revolutionized the field of computer vision,
enabling impressive performance in tasks such as image classification, object detection, and
image segmentation.
CNNs have found widespread applications in various domains:
Image Classification: CNNs excel in classifying images into categories. For example,
identifying whether an image contains a cat or a dog.
Object Detection: They are used to detect and localize objects within images. Popular
frameworks like Faster R-CNN and YOLO (You Only Look Once) use CNNs for object
detection.
Image Segmentation: CNNs can segment an image into different regions, assigning a label to
each pixel. This is valuable in medical imaging, autonomous driving, and more.
Video Analysis: CNNs can be extended to analyze video data by processing each frame using
the same principles as image analysis.
Training a CNN involves forward propagation (where input data passes through the network,
layer by layer) and backpropagation (where errors are calculated and weights are updated).
Due to the deep and complex nature of CNNs, training often requires substantial computational
resources and large datasets.
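The core operation of the forward pass, sliding a small filter over an image, can be sketched in plain Python. The 4x4 image and the 2x2 edge filter below are toy values chosen for illustration; a real CNN learns its filter weights during training and applies many filters per layer.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding.

    Like most CNN libraries, this computes a cross-correlation
    (the kernel is not flipped).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Apply the ReLU activation element-wise."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Toy 4x4 image: dark left half (0) and bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]

# Toy filter that responds to dark-to-bright vertical edges.
kernel = [[-1.0, 1.0], [-1.0, 1.0]]

feature_map = relu(conv2d_valid(image, kernel))
print(feature_map)
```

The resulting feature map is non-zero only in the middle column, where the filter straddles the edge between the dark and bright halves.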
3.2.8 Model Testing
Model testing in machine learning refers to the process of evaluating the performance and
effectiveness of a trained machine learning model on unseen data. The goal of model testing is
to assess how well the model can generalize its learned patterns from the training data to new,
previously unseen data. This is a crucial step in the machine learning pipeline to ensure that
the model performs well in real-world scenarios.
3.2.9 Evaluation
The model's performance is measured in terms of accuracy, precision, recall, and F1-score.
Accuracy: quantifies the percentage of instances that are classified correctly among all
instances. It is computed by dividing the number of correct predictions by the total number of
predictions, and so reflects the overall correctness of the model.
Precision: quantifies the proportion of true positives out of all positive predictions. It
reflects how reliable the model is when it predicts a particular category.
Recall: quantifies the proportion of actual positive instances that the model identifies
correctly. Mathematically, it is the number of true positives divided by the total count of
actual positive instances. This metric shows how effectively the model detects a particular
category.
Recall = True positives / (True positives + False negatives)
F1-Score: serves as a balanced metric that takes both precision and recall into account. It
is the harmonic mean of these two metrics and is useful when both need to be considered
together.
F1-Score = (2 × Precision × Recall) / (Precision + Recall)
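For reference, the four metrics can be written in terms of the confusion-matrix counts TP, TN, FP, and FN defined in the remainder of this section. The last equality for the F1-score follows by substituting the expressions for precision and recall.

```latex
\begin{align*}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall}    &= \frac{TP}{TP + FN} \\
\text{F1-Score}  &= \frac{2 \times \text{Precision} \times \text{Recall}}
                         {\text{Precision} + \text{Recall}}
                  = \frac{2\,TP}{2\,TP + FP + FN}
\end{align*}
```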
The application of the confusion matrix technique aids in obtaining the essential parameters
for evaluating model performance. Primarily employed for assessing classification models, this
two-dimensional table arranges the model's predicted labels in columns and the true class labels
in rows. The confusion matrix enables the derivation of crucial metrics such as True Positive
(TP), True Negative (TN), False Positive (FP), and False Negative (FN). Table 3.1 visually
illustrates the extraction of these values from the table, which subsequently serve as the
foundation for calculating the model's performance metrics.
Table 3.1 Sample of confusion matrix
A confusion matrix is a table that visualizes the performance of a classification model. It's a
2x2 matrix (for binary classification) with the actual class labels on one axis and the predicted
class labels on the other axis. The confusion matrix is a fundamental tool in evaluating the
performance of a classification model in machine learning. It provides a summary of the
predictions made by a model on a dataset, showing how well the model's predictions align with
the actual labels.
True positives (TP) occur when the model accurately predicts the positive class, correctly
recognizing an observation as part of the positive class.
False positives (FP) happen when the model predicts the positive class incorrectly, wrongly
identifying an observation as belonging to the positive class when it actually does not.
True negatives (TN) are instances where the model correctly predicts the negative class,
accurately identifying an observation as not belonging to the positive class.
False negatives (FN) are situations where the model predicts the negative class incorrectly,
mistakenly classifying an observation as not belonging to the positive class when it actually
does.
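The four outcomes can be counted directly from paired lists of actual and predicted labels. The sketch below uses a handful of made-up labels and treats "pneumonia" as the positive class; both the labels and the function name are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive="pneumonia"):
    """Count TP, FP, TN, and FN for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


# Made-up labels for six images.
y_true = ["pneumonia", "pneumonia", "normal", "normal", "pneumonia", "normal"]
y_pred = ["pneumonia", "normal", "normal", "pneumonia", "pneumonia", "normal"]

counts = confusion_counts(y_true, y_pred)
print(counts)
```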
3.3 User Interface Design
The user interface design for this research project involves crafting an intuitive and user-
friendly interface aimed at enabling interaction between users, such as medical professionals
and researchers, and the machine learning (ML) model used for diagnosing pneumonia from
medical images. The main page will feature a clearly worded title that communicates the
application's purpose. Additionally, it might incorporate a concise explanation of the system's
functioning and its advantages. A designated area will be created where users can submit
medical images like X-rays or CT scans for analysis. This interface will possess the capability
to support file uploads and provide confirmation for successful uploads. This segment of the
UI will also keep users informed about the progress of their uploaded images, potentially
utilizing a progress bar or animated loading icon to signify the ongoing image analysis process.
Once the analysis concludes, the outcome of the prediction will be presented. This could
manifest as a straightforward message such as "Pneumonia Detected" or "No Pneumonia
Detected," accompanied by the prediction's confidence level.
3.3.1 Input Design
Input design pertains to the process of creating an interface that allows users to interact with
the system by providing input. In the user interface described above, the input is the medical
image that the user uploads for analysis.
3.3.2 Output Design
Output design refers to the presentation of results or information from the system to the users
in a clear and comprehensible manner. The following explains how the output design will be
effective:
Prediction Results: Once the system analyzes the medical images, the output design involves
presenting the prediction results. This could be a simple message indicating whether
pneumonia is detected or not.
Clear Buttons and Labels: The output design includes ensuring that buttons and labels used
in the UI are clear and descriptive. This enhances user understanding and navigation within the
application.
Responsive UI: The output design extends to making sure that the UI is responsive, adapting
effectively to various screen sizes and devices, including desktops, tablets, and smartphones.
This ensures that users can access and view the output information regardless of the device
they are using.
3.4 Software Component Requirements
Software refers to the collection of instructions and programs that direct the operation of a
machine. The following software components were used in this study, each with its own purpose:
1. Operating system: Windows 7 or a more recent version, to ensure optimal use of
computational resources.
2. Integrated development environment (IDE): Visual Studio Code was used to write the
system's source code.
3. Web browsers: Google Chrome, Mozilla Firefox, Safari, and Microsoft Edge were
designated for use with the system.
4. Python: A versatile programming language used for a wide range of coding tasks.
5. Google Colaboratory (Google Colab): A cloud-based Jupyter notebook environment
provided by Google that enables Python code to be written and executed directly
within a web browser.
6. PHP: A server-side scripting language (PHP: Hypertext Preprocessor) designed primarily
for web development. It is typically used to build dynamic web pages and applications.
7. GitHub: An online platform that serves as a centralized hub for version control
and collaboration among developers.
3.5 Hardware Component Requirements
Hardware refers to the physical components of the computer system that can be seen and
touched. These include the input device, display screen, data storage unit, central
processing unit, memory module, and tactile interface. The hardware used in this study,
along with the purpose of each component, is listed below:
1. Central Processing Unit (CPU): A minimum of 2GHz Core i5 8th generation processor.
2. Random Access Memory (RAM): No less than 4GB of RAM.
3. Data Storage Drive: A minimum of 500GB for permanent retention of code, datasets,
and trained models.
4. Typing Device: A 101-key US standard keyboard.
5. Pointing Device: A 3D mouse.
CHAPTER FOUR
RESULTS AND DISCUSSION
4.1 Introduction
The study deployed a machine learning model built with two algorithms, namely Convolutional
Neural Network and Support Vector Machine, through the design of a standalone application
by which a user obtains the predicted result for an input image. The images used to train the
system fall into three categories: Pneumonia Normal, Pneumonia bacteria, and Pneumonia virus.
As these images are uploaded into the system, it trains on them by extracting features from
the images, after which the system will have learned to predict the class of a new input
image.
The machine learning model is deployed by designing a web-based application through which
the user can get the predicted results by inputting the values for the required parameters such
as the result for age, gender, anxiety, peer pressure and so on. As these parameters are inputted
into the web application, it sends it to the backend where the machine learning model is stored.
When the data is received by the backend, the model predicts the respiratory disease outcome
and sends the response back to the frontend so the result can be displayed. The degree of
accuracy for this system is based on the number of records contained in the dataset used to
carry out the operation. The application is a medium through which data passes to the machine
learning model. The model compares the input with the patterns it learned from the dataset
used in its construction and then delivers the appropriate outcome back to the user. The web
application
designed in this study was tested with localhost and can be deployed online as a fully working
application i.e., it can be accessed at any time and from anywhere without restrictions. Users
of this system can test whenever they feel the need to, which in turn improves the model’s
ability to learn new things about the different data values presented to it.
This system runs on a web environment and is used with the following procedures on the
localhost:
Step 1: Start the XAMPP control panel, then start the Apache server and the MySQL database.
Step 2: In your preferred browser, type “localhost/lung”. Once the web application loads, it
displays a user interface (frontend) that initially shows information about the image
categories: Pneumonia Normal, Pneumonia bacteria, and Pneumonia virus. Login and registration
pages are provided so that users can access the system, or register if they do not yet have an
account. Once logged in, users have access to the full functionality of the system.
4.2 Data Collection
We collected the Chest X-Ray Images (Pneumonia) data set, sourced from Kaggle. The data is
organized into three directories (train, test, val), each further subdivided into subfolders
corresponding to each image type (Pneumonia/Normal). The dataset encompasses a total of 5,000
X-ray images in JPEG format, spanning three categories (normal, bacterial pneumonia, and
viral pneumonia). These X-ray images, captured in the anterior-posterior view, were carefully
chosen from historical collections of pediatric patients aged one to five years at the
Guangzhou Women and Children’s Medical Center in Guangzhou.
Figure 4.1: Data Collection Process
4.3 Data preprocessing
Data preprocessing is a crucial step in building a machine learning model for pneumonia
detection using CNN and SVM. Here's a breakdown of some key techniques:
i. Data Acquisition:
Obtain chest X-ray datasets containing images of both healthy and pneumonia-infected
lungs.
Ensure the data is high quality and appropriately labeled.
ii. Data Cleaning:

Missing Values: Identify and address missing data points. Common strategies
include removing images with missing values, imputation techniques (filling in
missing values), or data augmentation (creating new data from existing ones).
Normalization: Standardize the intensity values of pixels in the X-ray images. This
ensures all images are on a similar scale and helps the model converge faster during
training. Common methods include min-max scaling or z-score normalization.
iii. Data Augmentation:
Artificially increase the size and diversity of your dataset to improve model
generalizability. Techniques include:
• Random cropping: Extract smaller sections of the X-ray image to capture
different regions of interest.
• Random flipping: Flip images horizontally or vertically to introduce
variations.
• Rotation: Rotate images slightly to account for variations in X-ray
acquisition.
iv. Resizing:
Resize all images to a uniform size appropriate for your CNN architecture. This
ensures consistency in data format for the model.
v. Preprocessing for CNN:
Convert the X-ray images from RGB (if applicable) to grayscale format, as CNNs
typically work well with single-channel images for medical image analysis.
vi. Preprocessing for SVM:
Depending on the chosen SVM implementation, you might need to convert the
preprocessed images into feature vectors. This involves extracting relevant features
from the images that can be used by the SVM for classification. Techniques like
extracting pixel intensities or using pre-trained CNN models for feature extraction
can be employed.
vii. Train-Test Split:
Divide your preprocessed data into training and testing sets. The training set is used
to train the model, and the testing set is used to evaluate its performance on unseen
data. A common split is 80% for training and 20% for testing.
By following these data preprocessing steps, you can prepare your X-ray images for effective
training of your CNN and SVM models for pneumonia detection. Remember to choose
techniques best suited for your specific dataset and model architecture.
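Several of the preprocessing steps above can be sketched in plain Python on a tiny image stored as nested lists. This is a simplified illustration under stated assumptions: the pixel values, helper names, and fixed random seed are made up, and a real pipeline would operate on full-size X-ray arrays with an image library rather than on lists.

```python
import random


def to_grayscale(rgb_image):
    """Average the three channels of each (r, g, b) pixel."""
    return [[sum(pixel) / 3.0 for pixel in row] for row in rgb_image]


def min_max_normalize(image):
    """Rescale pixel intensities to the range [0, 1]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right (a simple augmentation)."""
    return [list(reversed(row)) for row in image]


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle the items and split them into training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Toy 2x3 grayscale image with 8-bit intensities.
image = [[0, 128, 255], [64, 64, 64]]
normalized = min_max_normalize(image)
flipped = flip_horizontal(normalized)

# Ten hypothetical image identifiers split 80% / 20%.
train_ids, test_ids = train_test_split(range(10))
print(normalized, flipped, len(train_ids), len(test_ids))
```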
4.4 Feature Evaluation
Feature evaluation plays a critical role in both CNN and SVM-based approaches for pneumonia
detection. Here's how it's approached in each case:
i. Feature Evaluation in CNN:
Automatic Feature Learning: A significant advantage of CNNs is their ability to automatically
learn relevant features from the input images during the training process. Convolutional layers
act as feature extractors, identifying patterns and edges in the X-ray images that are crucial for
differentiating healthy from pneumonia-infected lungs.
Visualization Techniques: While CNNs excel at automatic feature learning, interpreting these
features can be challenging. Techniques like visualization of filters and activation maps can
provide some insights into what the CNN is focusing on within the X-ray images.
ii. Feature Evaluation in SVM:
Hand-crafted Features: Unlike CNNs, SVMs typically require pre-defined features as input.
These features need to be carefully chosen to capture the discriminative information between
healthy and pneumonia-infected lungs.
Feature Selection Techniques: When using hand-crafted features, it's essential to evaluate
their effectiveness. Techniques like correlation analysis, chi-square tests, and feature
importance scores can help identify the most relevant features that contribute the most to
classification accuracy.
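Correlation analysis, one of the techniques named above, can be sketched as a simple filter that ranks features by the absolute Pearson correlation between each feature column and the class label. The feature values and labels below are made up for illustration, and the 0/1 label encoding is an assumption of this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_features(feature_rows, labels):
    """Return feature indices ordered from most to least correlated with the label."""
    n_features = len(feature_rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in feature_rows]
        scores.append((abs(pearson(column, labels)), j))
    return [j for _, j in sorted(scores, reverse=True)]


# Four hypothetical images, three hand-crafted features each.
features = [
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.9],
    [8.0, 5.0, 0.4],
    [9.0, 5.0, 0.8],
]
labels = [0, 0, 1, 1]  # 0 = normal, 1 = pneumonia (assumed encoding)

ranking = rank_features(features, labels)
print(ranking)
```

In this toy example the first feature tracks the label closely, the third only weakly, and the second is constant and therefore carries no information.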
iii. Comparison and Considerations:
CNN offer an advantage in feature learning, as they can automatically discover relevant
patterns from the data without the need for manual feature engineering. This can be particularly
beneficial when dealing with complex medical images like chest X-rays.
SVM might be suitable if you have domain knowledge about pneumonia and can identify
specific features that differentiate healthy from infected lungs. However, hand-crafted feature
engineering can be time-consuming and requires expertise.
Here are some additional points to consider:
Transfer Learning: Utilize pre-trained CNN models like VGG16 or ResNet for feature
extraction. These models have already learned powerful features from large image datasets and
can be fine-tuned for pneumonia detection. This leverages the power of CNN for feature
learning while potentially reducing training time.
Feature Importance in Combined Approaches: If you're using a combination of CNN and
SVM (e.g., using a CNN for feature extraction and SVM for classification), techniques like
feature importance analysis can still be used on the extracted features to understand what
contributes most to the SVM's classification decisions.
By effectively evaluating features in your CNN or SVM-based approach, you can ensure your
model is focusing on the most relevant information for accurate pneumonia detection.
4.5 Feature Selection
While CNN excel at automatic feature learning, and SVM can work with hand-crafted features,
feature selection can still be beneficial in both scenarios for pneumonia detection using chest
X-ray images. Here's how it can be applied:
Feature Selection for CNN:
i. Regularization Techniques: Techniques like L1/L2 regularization or dropout layers in CNN
architectures can implicitly perform feature selection during training. L1 and L2
regularization penalize large weights in the network (L1 in particular can drive weights to
zero), encouraging the model to focus on a smaller subset of important features, while dropout
randomly deactivates units so that the model does not rely too heavily on any single feature.
ii. Filter Methods (for Pre-trained Models): If using a pre-trained CNN for feature extraction,
filter methods can be applied after feature extraction to select a subset of the learned features
most relevant for pneumonia detection. Techniques like chi-square tests or information gain
can be used for this purpose.
Feature Selection for SVM:
Dimensionality Reduction: When dealing with high-dimensional feature vectors extracted
from images, dimensionality reduction techniques like Principal Component Analysis (PCA)
can be used to reduce the number of features while retaining most of the information relevant
for classification.
Wrapper Methods: These methods involve evaluating different feature subsets using the
SVM classifier itself as a scoring function. The goal is to find the subset that leads to the best
classification performance on a validation set. Techniques like recursive feature elimination
(RFE) or genetic algorithms can be employed.
Embedded Methods: These methods integrate feature selection within the SVM training
process. L1-regularized SVM inherently perform feature selection by driving some feature
weights to zero, effectively removing those features from the model.
Advantages of Feature Selection:
Improved Performance: Selecting a relevant subset of features can lead to better
classification accuracy by reducing noise and overfitting.
Reduced Training Time: Training with fewer features can be computationally faster,
especially for large datasets.
Model Interpretability: In SVMs, feature selection can help identify the most discriminative
features contributing to the classification, providing some insights into the model's
decision-making process.
Choosing the Right Approach:
For CNN: Regularization techniques are often a good first approach. Filter methods can be
considered for fine-tuning after using pre-trained models.
For SVM: Feature selection is more crucial. Techniques like PCA for dimensionality reduction
followed by wrapper or embedded methods are common approaches.
Combined CNN-SVM Approaches:
Feature selection can be applied after feature extraction from a pre-trained CNN before feeding
them into the SVM. This leverages the power of CNN for feature learning and SVM for
classification with a potentially more interpretable feature set.
By incorporating feature selection techniques, you can optimize your CNN or SVM model for
pneumonia detection, potentially leading to improved performance, faster training times, and
a better understanding of the factors contributing to accurate classification.
4.6 Model Development
Developing a model for pneumonia detection using a combination of CNN and SVM involves the
following steps:
i. Data Acquisition:
• Obtain a chest X-ray dataset containing labeled images (normal, bacteria, virus). Ensure the
dataset is balanced before training.
ii. Data Preprocessing:
• Preprocess the images using techniques like normalization, resizing, and potentially
data augmentation to increase dataset size and diversity.
• Convert images to grayscale format for the CNN.
iii. CNN Model Development:
• Choose a CNN architecture suitable for image classification tasks (e.g., VGG16,
ResNet).
• Consider pre-training the CNN on a large image dataset (like ImageNet) for feature
extraction and fine-tuning on your pneumonia dataset. Define the CNN architecture
with convolutional layers for feature extraction, pooling layers for dimensionality
reduction, and fully connected layers for classification.
• Train the CNN model on the preprocessed training set, specifying an optimizer (e.g.,
Adam) and loss function (e.g., binary cross-entropy for two classes, or categorical
cross-entropy for the three image categories) for optimizing the model's weights
and biases.
iv. Feature Extraction:
• After training the CNN, extract features from the final layers before the classification
layer. These features represent the learned patterns from the X-ray images.
v. SVM Model Development:
• If using an SVM for classification:
• Choose an SVM implementation (e.g., scikit-learn library).
• Train the SVM model on the extracted features (from step iv) or directly on the
preprocessed images (if not using feature extraction) from the training set.
• Define the SVM kernel function (e.g., RBF) and other hyperparameters (regularization
parameter).
• Train the SVM model using an optimizer to minimize the classification error.
vi. Model Evaluation:
• Evaluate both the CNN and SVM models (if applicable) on a separate testing set unseen
during training.
• Use metrics like accuracy, precision, recall, and F1-score to assess the model's
performance in classifying pneumonia cases.
vii. Model Refinement:
Based on the evaluation results, the model was refined by:
• Tuning hyperparameters of the CNN and SVM.
• Trying different CNN architectures or feature extraction techniques.
• Employing techniques like dropout layers or data augmentation to address overfitting.
viii. Deployment:
• If satisfied with the model's performance, consider deploying it for real-world use in a
healthcare setting. This might involve integrating the model into a web application or
medical imaging system.
ix. Additional Considerations:
• Class Imbalance: If your dataset has a class imbalance (more healthy cases than
pneumonia cases), techniques like oversampling the minority class or undersampling the
majority class can be applied to address this issue.
• Transfer Learning: Leverage pre-trained CNN models for feature extraction to reduce
training time and potentially improve performance.
x. Explainability:
While CNNs are powerful, interpretability can be challenging. Consider techniques like
visualization or using feature importance analysis in SVM to gain insights into the model's
decision-making process.
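The SVM training step (step v) can be sketched in plain Python. This is a simplified illustration and not the implementation used in this study: it trains a linear SVM by sub-gradient descent on the regularized hinge loss rather than with SMO or scikit-learn, and the 2-D feature vectors are made-up stand-ins for features extracted by a CNN. The label encoding (+1 for pneumonia, -1 for normal) and all hyperparameter values are assumptions.

```python
def decision(w, b, x):
    """Evaluate w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=20):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss, one sample at a time."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * decision(w, b, x) < 1:
                # Margin violated: follow the hinge-loss and regularization gradients.
                w = [wj - lr * (lam * wj - y * xj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Margin satisfied: only the regularization term shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the sign of the decision function."""
    return 1 if decision(w, b, x) > 0 else -1


# Hypothetical 2-D feature vectors: +1 = pneumonia, -1 = normal.
features = [[2.0, 2.5], [3.0, 3.0], [2.5, 3.5],
            [-2.0, -2.5], [-3.0, -2.0], [-2.5, -3.0]]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(features, labels)
predictions = [predict(w, b, x) for x in features]
print(w, b, predictions)
```

In practice, a library implementation with a non-linear kernel such as RBF would replace this loop, and the hyperparameters would be tuned on a validation set.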
Figure 4.2 Research flow chart
The main aim of the system is to provide an application that can help with the early
detection of pneumonia. A few simple steps need to be taken before the result can be
displayed:
Step 1: Start the Matlab IDE (Integrated Development Environment).
Step 2: Type PNEM_INTERFACE_1 in the IDE and wait a few minutes for the first interface
to come up, then begin the process.
Step 3: Select an image for uploading.
Step 4: Run the prediction of pneumonia.
Step 5: View the result.
4.7 User output
Figure 4.3: Matlab IDE (Integrated Development Environment)
Figure 4.3 shows the Matlab IDE (Integrated Development Environment), a comprehensive
software tool designed to facilitate the development, testing, and deployment of algorithms and
applications using the Matlab programming language. The Matlab IDE typically provides a
user-friendly interface with various features tailored to support the entire workflow of scientific
and engineering computing tasks. Within the IDE, users can write and execute Matlab code,
visualize data, debug programs, and create graphical user interfaces (GUIs) for interactive
applications. The IDE often includes tools for managing files and projects, accessing
documentation and help resources, and integrating with other software and hardware
components. Overall, the Matlab IDE serves as a central hub for Matlab users to efficiently
develop and explore solutions for a wide range of technical challenges, from data analysis and
signal processing to image processing and machine learning.
Figure 4.4: First Interface (Uploading Images)
Figure 4.4 represents the first interface of the application, which is designed for uploading
images. This interface serves as the initial step in a larger workflow, where users can select
and upload images from their local storage or from external sources into the application. It
enables users to provide the necessary input data (chest X-ray images) for analysis and
diagnosis. Once the images are uploaded, they proceed to subsequent interfaces or modules for
further processing, analysis, and visualization of results.
Figure 4.5: Second Interface (Selecting the Images for Uploading)
Figure 4.5 represents the second interface of the application, which focuses on selecting the
images before uploading. This interface follows the initial interface, where users initiated
the upload process by selecting files or dragging and dropping images. Here, users select the
relevant chest X-ray images from those provided in the previous interface. The selected images
are then processed further in subsequent steps, such as applying machine learning algorithms
for diagnosis or generating insights.
Figure 4.6: Third Interface (Prediction of Pneumonia)
Figure 4.6 shows the third interface in a software application or system designed for predicting
pneumonia based on uploaded chest X-ray images. This interface follows the image selection
process, where users have chosen the images for analysis. Figure 4.6 serves as a crucial
interface for users to interpret and act upon the predictions generated by the system regarding
the presence or absence of pneumonia in the uploaded chest X-ray images.
Figure 4.7: Fourth Interface (Results of the CNN Network Model)
Figure 4.7 shows the fourth interface in a software application or system specifically focused
on displaying the results generated by a Convolutional Neural Network (CNN) model for
pneumonia detection in chest X-ray images. Figure 4.7 serves as a comprehensive interface for
presenting the results of the CNN model's analysis of chest X-ray images, providing users with
valuable information to support clinical decision-making and patient care.
Figure 4.8: Chest X-ray Image Data Set
Figure 4.8 shows a chest X-ray image dataset used for training and testing machine learning
models, particularly for tasks related to respiratory disease detection such as pneumonia
classification.
Performance Analysis
Several performance measures were used to evaluate the models. The performance metrics used
in this project are the confusion matrix, accuracy, precision, recall, and F1 score. A
confusion matrix, also sometimes called an error matrix, is a visualization tool used to
evaluate the performance of a classification model.
It provides a clear breakdown of how the model performed on a set of test data, showing how
many predictions were correct and where the model made mistakes.
Possible outcome        SVM    CNN    CNN + SVM
TP (True Positive)      855    213    891
FP (False Positive)      25     43     22
TN (True Negative)       98    208    172
FN (False Negative)       0     36     20
Table 4.1: Different possible outcomes using the confusion matrix
Performance metrics for Convolutional Neural Network (CNN)
Figure 4.8 Performance metrics for CNN
The confusion matrix is obtained from the machine learning code after training the CNN
model. It is used to describe the performance of the classification model on the test
data.
Figure 4.9: Performance CNN model
To evaluate the precision, recall and specificity of the CNN model, we draw conclusions from
the confusion matrix counts for the CNN in Table 4.1 (TP = 213, FP = 43, TN = 208, FN = 36).
Precision: Precision is the percentage of cases predicted as positive that are truly positive.
This can be derived from the confusion matrix using the following formula:
Precision = TP/(TP+FP) = 213/(213+43) = 0.83
Recall: Recall, also called sensitivity, is the percentage of true positive cases that are
accurately identified. This can be derived from the confusion matrix using the following
formula:
Recall = TP/(TP+FN) = 213/(213+36) = 0.86
Specificity: Specificity is the percentage of truly negative cases that are accurately identified.
This can be derived from the confusion matrix using the following formula:
Specificity = TN/(TN+FP) = 208/(208+43) = 0.83
F1-Score = (2 × Precision × Recall)/(Precision + Recall) = (2 × 0.83 × 0.86)/(0.83 + 0.86) = 0.84
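The formulas above can be checked numerically. The following MATLAB sketch is illustrative only: it assumes the CNN counts exactly as they are labelled in Table 4.1 (TP = 213, FP = 43, TN = 208, FN = 36), and the helper name classificationMetrics is hypothetical rather than part of the project code.

```matlab
% Illustrative sketch: metrics computed from confusion-matrix counts.
% Assumption: counts are the CNN column of Table 4.1, taken as labelled there.
TP = 213; FP = 43; TN = 208; FN = 36;

m = classificationMetrics(TP, FP, TN, FN);
fprintf('Precision   = %.2f\n', m.precision);
fprintf('Recall      = %.2f\n', m.recall);
fprintf('Specificity = %.2f\n', m.specificity);
fprintf('Accuracy    = %.2f\n', m.accuracy);
fprintf('F1-score    = %.2f\n', m.f1);

function m = classificationMetrics(TP, FP, TN, FN)
% classificationMetrics  Hypothetical helper returning standard binary metrics.
    m.precision   = TP / (TP + FP);                      % correct among predicted positives
    m.recall      = TP / (TP + FN);                      % correct among actual positives
    m.specificity = TN / (TN + FP);                      % correct among actual negatives
    m.accuracy    = (TP + TN) / (TP + FP + TN + FN);     % correct among all cases
    m.f1          = 2 * m.precision * m.recall / (m.precision + m.recall);
end
```

With these counts the sketch gives a precision of about 0.83, a recall of about 0.86, a specificity of about 0.83, an accuracy of about 0.84 and an F1-score of about 0.84.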
Evaluation of performance
The table below summarizes the performance of the machine learning models used in this
study by listing the accuracy, precision, recall and F1 score of each model.
S/N   Machine Learning Model                 Accuracy   Precision   Recall   F-1 score
1.    Convolutional Neural Network (CNN)     94%        99%         92%      92%
2.    Support Vector Machine (SVM)           96%        94.5%       100%     90%
Figure 4.10 Representation of performance metrics in a bar chart (accuracy, precision, recall and F-1 score for the CNN, Support Vector Machine and SVM + CNN models)
Convolutional Neural Network Operation:
The convolution operation is essentially a sliding dot product between a filter (kernel) and the
input image. It allows the network to learn spatial features within the image data.
a) Equation:
S(x, y) = ΣΣ W(i, j) * I(x + i, y + j) + b
S(x, y): This represents the value of the output feature map at location (x, y).
W(i, j): This represents the element of the filter (kernel) at position (i, j). The filter size is
typically much smaller than the image size.
I(x + i, y + j): This represents the input image pixel at offset (i, j) from location (x, y), so
that each filter element is multiplied by the corresponding pixel of the image patch.
ΣΣ: This denotes summation over all positions (i, j) within the filter.
b: This represents the bias term added to the output at each location in the feature map.
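The equation can be illustrated on a small array. The sketch below is an example under stated assumptions, not the project's training code: the 4-by-4 input, the 2-by-2 filter and the bias value are made up for illustration, the indices are shifted to MATLAB's 1-based indexing, and a stride of 1 with no padding is assumed.

```matlab
% Illustrative sketch of S(x, y) = sum_i sum_j W(i, j) * I(x + i, y + j) + b
I = magic(4);        % made-up 4-by-4 input "image"
W = [1  0;
     0 -1];          % made-up 2-by-2 filter (kernel)
b = 1;               % made-up bias term

[kh, kw] = size(W);
outH = size(I, 1) - kh + 1;   % output height without padding
outW = size(I, 2) - kw + 1;   % output width without padding
S = zeros(outH, outW);

for x = 1:outH
    for y = 1:outW
        patch = I(x:x+kh-1, y:y+kw-1);        % image patch under the filter
        S(x, y) = sum(sum(W .* patch)) + b;   % sliding dot product plus bias
    end
end

% Cross-check against the built-in conv2 (filter rotated by 180 degrees).
S_check = conv2(I, rot90(W, 2), 'valid') + b;
disp(isequal(S, S_check));
```

Because the index is x + i rather than x - i, the operation as written is a cross-correlation, which is what deep learning libraries commonly implement under the name convolution. This is why the check uses the filter rotated by 180 degrees.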
Support Vector Machine Operation
The core mathematical equation for a Support Vector Machine (SVM) in classification
problems involves the decision function that separates the data points belonging to different
classes. Here's a breakdown:
a) Decision Function:
The decision function determines on which side of the hyperplane (decision boundary) a new
data point falls and consequently its predicted class.
b) Equation:
f(x) = w^T * x + b
f(x): This represents the output of the decision function for a new data point x.
w: This is the weight vector of the SVM, with the same dimensionality (n) as the feature vectors
of the data points.
x: This represents the feature vector of the new data point to be classified.
T: This denotes the transpose operation.
b: This is the bias term that influences the position of the hyperplane in the feature space.
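As a hedged illustration of the decision function, the sketch below evaluates f(x) for two points. The weight vector, bias and feature vectors are made-up values, not parameters learned in this project; in practice w and b come from training. The predicted class is taken from the sign of f(x), which is the usual convention for a linear SVM.

```matlab
% Illustrative sketch of the linear SVM decision function f(x) = w' * x + b
w = [2; -1];         % made-up weight vector (learned during training in practice)
b = -1;              % made-up bias term

X = [3 1;            % each row is one feature vector x
     0 2];

f = X * w + b;       % decision values for all rows at once
labels = sign(f);    % +1 on one side of the hyperplane, -1 on the other

disp([f labels]);
```

For these values the first point gives f(x) = 4 and is assigned to the positive class, while the second gives f(x) = -3 and is assigned to the negative class.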
COMBINATION OF BOTH ALGORITHMS
SVM Equation (Classification):
The core equation for an SVM involves the decision function that separates the data points:
f(x) = w^T * x + b
f(x): Decision function output for a new data point (x).
w: Weight vector of the SVM (learned during training).
x: Feature vector of the new data point.
T: Transpose operation.
b: Bias term influencing the hyperplane position.
CNN Equation (Convolution Operation):
A core operation in a CNN is the convolution, which allows the network to learn features
directly from the input data (images).
S(x, y) = ΣΣ W(i, j) * I(x + i, y + j) + b
S(x, y): Output feature map at a specific location.
W(i, j): Elements of the filter (kernel) used for convolution.
I(x + i, y + j): Input image pixel at offset (i, j) from location (x, y), multiplied by W(i, j).
ΣΣ: Summation over all elements within the filter size.
b: Bias term added to the output.
Combination Approach:
i. Train a CNN on a large dataset of chest X-ray images.
ii. Choose the approach:
• Use features from the final CNN layers and feed them directly into an SVM for
classification.
• Extract features from intermediate CNN layers and use them as input to the SVM.
iii. Train the SVM on the extracted features or CNN-generated features to classify new X-ray
images as healthy or pneumonia.
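The steps above can be sketched in MATLAB as follows. This is a minimal sketch under assumptions, not the implementation used in this project: the function name trainCnnSvmSketch and its arguments are hypothetical, the network is assumed to be already trained (step i), the images in both datastores are assumed to match the network's input size, and the Deep Learning Toolbox and the Statistics and Machine Learning Toolbox are assumed to be installed.

```matlab
function [svmModel, accuracy] = trainCnnSvmSketch(net, imdsTrain, imdsTest, featureLayer)
% trainCnnSvmSketch  Hypothetical helper: CNN features fed into an SVM.
%   net          - an already trained CNN (step i)
%   imdsTrain    - imageDatastore of labelled training X-ray images
%   imdsTest     - imageDatastore of labelled test X-ray images
%   featureLayer - name of the CNN layer to read features from (step ii)

    % Step ii: extract one feature row per image from the chosen layer.
    featuresTrain = activations(net, imdsTrain, featureLayer, 'OutputAs', 'rows');
    featuresTest  = activations(net, imdsTest,  featureLayer, 'OutputAs', 'rows');

    % Step iii: train a multiclass SVM (error-correcting output codes
    % with SVM binary learners) on the extracted features.
    svmModel = fitcecoc(featuresTrain, imdsTrain.Labels);

    % Classify the held-out images and report the fraction classified correctly.
    predicted = predict(svmModel, featuresTest);
    accuracy  = mean(predicted == imdsTest.Labels);
end
```

Passing the name of a late layer corresponds to the first option in step ii, and passing the name of an intermediate layer corresponds to the second.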
CHAPTER FIVE
CONCLUSION AND RECOMMENDATION
5.1 Conclusion
This project investigated the potential of machine learning (ML) for the detection of respiratory
diseases, with a focus on pneumonia using chest X-ray images. Convolutional Neural Networks
(CNN) and Support Vector Machines (SVM) were explored as promising techniques for
automated feature extraction and classification. The findings of this project demonstrate the
potential of machine learning to achieve high accuracy in pneumonia detection. The ability of
CNN to automatically learn relevant features from X-ray images offers a significant advantage
over traditional methods. Furthermore, the integration of SVM leverages its robust
classification capabilities, potentially surpassing human radiologists in certain scenarios.
However, the project also acknowledges the challenges associated with implementing machine
learning in healthcare settings. Data quality and bias, interpretability of complex models, and
regulatory hurdles require careful consideration and ongoing research efforts.
This project provides valuable insights into the potential of machine learning for
respiratory disease detection. Once the existing challenges are addressed through the proposed
future work, machine learning has the potential to become a powerful tool
for improving early diagnosis, treatment planning and, ultimately, patient outcomes in the field
of respiratory medicine.
The project is designed to help in the prediction of pneumonia. The data were gathered from
Kaggle, an online data source, and went through six important phases: data collection,
preprocessing, feature evaluation, feature selection, data modelling and implementation. Other
attributes were also put together, which helped to obtain better results from the models.
This project has established a strong foundation for utilizing machine learning, particularly
CNN and SVM, in respiratory disease detection. By leveraging extensive data analysis, the
project demonstrates the potential for improved accuracy and efficiency in pneumonia
diagnosis. The project was designed as a standalone application through which a user obtains
predicted results for input images. The images used to train the system fall into three
categories: Normal, Pneumonia (bacteria) and Pneumonia (virus). As these images are
uploaded into the system, it uses them for training by extracting features from the images, after
which the system will have learned and be able to predict the class of a new input image.
For future studies, we intend to use a larger dataset to obtain higher accuracy, as well
as to design a real-time system in which a user obtains the pneumonia status from other
biometric attributes such as the iris, retina or other facial features.
5.2 Recommendation
To ensure effective and efficient usage of the results from this work, it is recommended that
a real-time system be developed that acquires real-time images from users, utilizing
other biometric attributes such as the iris, retina or other facial features, so that users can
predict their pneumonia status from the convenience of their homes without the stress of
visiting a medical practitioner.
5.3 Limitation of the Study
This research faces limitations in terms of age bias, environmental bias, and feeding bias. The
dataset used for training may lack representation across different age groups, leading to
potential inaccuracies in pediatric or adult cases. Additionally, environmental biases can arise
due to data originating from specific regions, affecting the model's generalizability to diverse
environments. Lastly, the nutritional status of patients, or feeding bias, can impact the model's
effectiveness, necessitating a representative dataset for improved performance across diverse
populations.
59
REFERENCES
Al-Romaihi, H., Smatti, M. K., Khatib, H. a. A., Coyle, P., Ganesan, N., Nadeem, S., Farag,
E., Thani, A. a. A., Khal, A. A., Ansari, K. A., Maslamani, M. A., & Yassine, H.
M. (2020). Molecular epidemiology of influenza, RSV, and other respiratory
infections among children in Qatar: A six years report (2012–2017). International
Journal of Infectious Diseases, 95, 133–141.
https://doi.org/10.1016/j.ijid.2020.04.008
Alghamdi, S. (2021). The role of vaccines in combating antimicrobial resistance (AMR)

bacteria. Saudi Journal of Biological Sciences, 28(12), 7505–7510.
https://doi.org/10.1016/j.sjbs.2021.08.054
Ali, A., Razak, S. A., Othman, S. N., Eisa, T. a. E., Al-Dhaqm, A., Nasser, M., Elhassan, T.,
Elshafie, H., & Saif, A. (2022). Financial Fraud Detection Based on Machine
Learning: A Systematic Literature Review. Applied Sciences, 12(19),
9637https://doi.org/10.3390/app12199637
Athanazio, R. A. (2012). Airway disease: similarities and differences between asthma, COPD
and bronchiectasis. Clinics, 67(11), 1335–1343.
https://doi.org/10.6061/clinics/2012(11)19
Azam, M. A., Shahzadi, A., Khalid, A., Anwar, S. M., & Naeem, U. (2018). Smartphone Based
Human Breath Analysis from Respiratory Sounds.
https://doi.org/10.1109/embc.2018.8512452
Brooks, L. R. K., & Mias, G. I. (2018). Streptococcus pneumoniae’s Virulence and Host
Immunity: Aging, Diagnostics, and Prevention. Frontiers in Immunology, 9.
https://doi.org/10.3389/fimmu.2018.01366
Carew, J. M. (2023, February 10). reinforcement learning. Enterprise AI.

https://www.techtarget.com/searchenterpriseai/definition/reinforcement-learning
Chen, A., Zhang, J., Zhao, L., Rhoades, R. D., Kim, D., Wu, N., Liang, J., & Chae, J. (2021).
Machine-learning enabled wireless wearable sensors to study individuality of
respiratory behaviors. Biosensors and Bioelectronics, 173, 112799.
https://doi.org/10.1016/j.bios.2020.112799
60
Chen, D., Cao, L., & Li, W. (2023). Etiological and clinical characteristics of severe pneumonia
in pediatric intensive care unit (PICU). BMC Pediatrics, 23(1).
https://doi.org/10.1186/s12887-023-04175-y
Chen, H., Yuan, X., Li, J., Pei, Z. Y., & Zheng, X. (2019). Automatic Multsti-Level In-Exhale
Segmentation and Enhanced Generalized S-Transform for wheezing detection.
Computer Methods and Programs in Biomedicine, 178, 163–173.
https://doi.org/10.1016/j.cmpb.2019.06.024
Copeland, M. (2021, July 17). The Difference Between AI, Machine Learning, and Deep
Learning? NVIDIA Blog. https://blogs.nvidia.com/blog/2016/07/29/whats-
difference- artificial-intelligence-machine-learning-deep-learning-ai/
Crna, R. N. M. (2018, March 12). Routine sputum culture. Healthline.

https://www.healthline.com/health/routine-sputum-culture
Crosta, P. (2023, April 19). What you should know about pneumonia.
https://www.medicalnewstoday.com/articles/151632
Dessie, T., Jemal, M., Temesgen, M. M., & Tiruneh, M. (2021). Multiresistant Bacterial
Pathogens Causing Bacterial Pneumonia and Analyses of Potential Risk Factors
from Northeast Ethiopia. International Journal of Microbiology, 2021, 1–9.
https://doi.org/10.1155/2021/6680343
Dwivedi, R. (n.d.). What Are Recommendation Systems in Machine Learning? | Analytics

Steps. https://www.analyticssteps.com/blogs/what-are-recommendation-systems-
machine- learning
Effah, C. Y., Miao, R., Drokow, E. K., Agboyibor, C., Qiao, R., Wu, Y., Miao, L., & Wang,
Y. (2022). Machine learning-assisted prediction of pneumonia based on non-
invasive measures. Frontiers in Public Health, 10.
https://doi.org/10.3389/fpubh.2022.938801
Grief, S. N., & Loza, J. K. (2018). Guidelines for the evaluation and treatment of pneumonia.
Primary Care, 45(3), 485–503. https://doi.org/10.1016/j.pop.2018.04.001
61
Infante, C., Chamberlain, D. E., Kodgule, R., & Fletcher, R. (2017). Classification of voluntary
coughs applied to the screening of respiratory disease.
Ippolito, P. P. (2021, December 10). SVM: Feature Selection and Kernels - towards Data
science. Medium. https://towardsdatascience.com/svm-feature-selection-and-
kernels- 840781cc1a6c
Kahn, A. (2023, June 26). What’s causing my clammy skin? Healthline.

https://www.healthline.com/health/skin-clammy
Khasha, R., Sepehri, M. M., & Mahdaviani, S. A. (2019). An ensemble learning method for
asthma control level detection with leveraging medical knowledge-based classifier
and supervised learning. Journal of Medical Systems, 43(6).
https://doi.org/10.1007/s10916- 019-1259-8
Kalita, D. (2022). Basics of CNN in Deep Learning. Analytics Vidhya.

https://www.analyticsvidhya.com/blog/2022/03/basics-of-cnn-in-deep-learning/
Kaur, R., Mehra, B., Dhakad, M. S., Goyal, R., Bhalla, P., & Dewan, R. (2017). Fungal
opportunistic pneumonias in HIV/AIDS patients: an Indian Tertiary care
experience. Journal of Clinical and Diagnostic Research.
https://doi.org/10.7860/jcdr/2017/24219.9277
Kim, B., Kang, M., Lim, J., Lee, J. Y., Kang, D., Kim, E. K., Kim, J., Park, H., Min, K. U.,
Cho, J., & Jeon, K. (2022). Comprehensive risk assessment for hospital-acquired
pneumonia: sociodemographic, clinical, and hospital environmental factors
associated with the incidence of hospital-acquired pneumonia. BMC Pulmonary
Medicine, 22(1). https://doi.org/10.1186/s12890-021-01816-9
Košutova, P., & Mikolka, P. (2021). Aspiration syndromes and associated lung injury:
incidence, pathophysiology and management. Physiological Research, S567–
S583. https://doi.org/10.33549/physiolres.934767
Kim, G. T., Seon, S. H., & Rhee, D. (2017). Pneumonia and Streptococcus pneumoniae
vaccine. Archives of Pharmacal Research, 40(8), 885–893.
https://doi.org/10.1007/s12272-017- 0933-y
62
Kuhajda, I., Zarogoulidis, K., Tsirgogianni, K., Tsavlis, D., Kioumis, I., Kosmidis, C.,
Tsakiridis, K., Mpakas, A., Zarogoulidis, P., Zissimopoulos, A., Baloukas, D., &
Kuhajda, D. (2015). Lung abscess-etiology, diagnostic and treatment options.
PubMed, 3(13), 183. https://doi.org/10.3978/j.issn.2305-5839.2015.07.08
Lawton, G., Burns, E., & Rosencrance, L. (2022, January 20). logistic regression. Business
Analytics. https://www.techtarget.com/searchbusinessanalytics/definition/logistic-
regression
Lawton, G., Carew, J. M., & Burns, E. (2022, January 21). predictive modeling. Enterprise AI.
https://www.techtarget.com/searchenterpriseai/definition/predictivemodeling#:~:t
ext=
Predictive%20modeling%20is%20a%20mathematical,forecast%20activity%2C%
20beha vior%20and%20trends.
Lutkevich, B., & Burns, E. (2023, January 20). natural language processing (NLP). Enterprise
AI.https://www.techtarget.com/searchenterpriseai/definition/natural-language-
processing-NLP
Normandin, B. (2023, February 8). Everything you need to know about pneumonia. Healthline.
https://www.healthline.com/health/pneumonia
Petersson, D. (2021, March 26). supervised learning. Enterprise AI.

https://www.techtarget.com/searchenterpriseai/definition/supervised-learning
Pragman, A. A., Berger, J. T., & Williams, B. (2016). Understanding persistent bacterial lung
infections. Clinical Pulmonary Medicine, 23(2), 57–66.
https://doi.org/10.1097/cpm.0000000000000108
Pramono, R. X. A., Patience, G. S., & Rodriguez-Villegas, E. (2019). Automatic Cough

Detection in Acoustic Signal using Spectral Features.
Pratt, M. K. (2020, July 8). unsupervised learning. Enterprise AI.

https://www.techtarget.com/searchenterpriseai/definition/unsupervised-learning
63
Pyle, D., & José, C. S. (2019, February 13). An executive’s guide to machine learning.
McKinsey & Company. https://www.mckinsey.com/industries/technology-media-
and- telecommunications/our-insights/an-executives-guide-to-machine-learning
Sarker, I. H. (2021). Machine Learning: Algorithms, Real-World Applications and Research

Directions. SN Computer Science, 2(3). https://doi.org/10.1007/s42979-021-
00592-x
Sharma, A. (2023, March 13). Random Forest vs Decision Tree | Which Is Right for You?
Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/05/decision-tree-
vs- random-forest-algorithm/
Scherer, P., & Chen, D. L. (2016). Imaging pulmonary inflammation. The Journal of Nuclear
Medicine, 57(11), 1764–1770. https://doi.org/10.2967/jnumed.115.157438
Sharma, S., & Guleria, K. (2023). A Deep Learning based model for the Detection of
Pneumonia from Chest X-Ray Images using VGG-16 and Neural Networks.
Procedia Computer Science, 218, 357–366.
https://doi.org/10.1016/j.procs.2023.01.018
Tasaka, S. (2015). PneumocystisPneumonia in Human Immunodeficiency Virus–infected

Adults and Adolescents: Current Concepts and Future Directions. Clinical
Medicine Insights, 9s1, CCRPM.S23324. https://doi.org/10.4137/ccrpm.s23324
Tripathi, A., Singh, A. D., Singh, K. N., Choudhary, P., & Vashist, P. C. (2021). Machine
learning architecture and framework. Elsevier EBooks, 1–22.
https://doi.org/10.1016/b978-0-12-821229-5.00005-7
Tsang, K., Pinnock, H., Wilson, A., & Shah, S. a. A. (2020). Application of Machine Learning
to Support Self-Management of Asthma with mHealth.
https://doi.org/10.1109/embc44109.2020.9175679
Vanreppelen, G., Wuyts, J., Van Dijck, P., & Vandecruys, P. (2023). Sources of antifungal
drugs. Journal of Fungi, 9(2), 171. https://doi.org/10.3390/jof9020171
Van Vliet, D., Smolinska, A., Jöbsis, Q., Rosias, P. P., Muris, J. W. M., Dallinga, J. W.,
Dompeling, E., & Van Schooten, F. (2017). Can exhaled volatile organic
64
compounds predict asthma exacerbations in children? Journal of Breath Research,
11(1), 016016. https://doi.org/10.1088/1752-7163/aa5a8b
Vatanparvar, K., Nemati, E., Nathan, V., Rahman, M., & Kuang, J. (2020). CoughMatch –
Subject verification using Cough for personal passive health monitoring.
https://doi.org/10.1109/embc44109.2020.9176835
Von Ranke, F. M., Zanetti, G., Hochhegger, B., & Marchiori, E. (2012). Infectious diseases
causing diffuse alveolar hemorrhage in immunocompetent Patients: A State-of-the-
Art Review. Lung, 191(1), 9–18. https://doi.org/10.1007/s00408-012-9431-7
Zhang, F. (2021). Application of machine learning in CT images and X-rays of COVID-19

pneumonia. Medicine, 100(36), e26855.
https://doi.org/10.1097/md.0000000000026855 Zambon, V. (2020, August 6).
What to know about a cough with mucus.
https://www.medicalnewstoday.com/articles/cough-with-mucus
Zhang, O., Minku, L. L., & Gonem, S. (2020). Detecting asthma exacerbations using daily
home monitoring and machine learning. Journal of Asthma, 58(11), 1518–1527.
https://doi.org/10.1080/02770903.2020.1802746
65
APPENDIX A
Confusion Chart for Pneumonia Detection
APPENDIX B
Confusion Matrix
APPENDIX C
Bar Chart of Comparison of Each CNN Metric
APPENDIX D
Line Graph Comparison
APPENDIX E
Program Source Code
CODE FOR FIRST MODULE
function varargout = PNEM_INTERFACE_1(varargin)
% PNEM_INTERFACE_1 MATLAB code for PNEM_INTERFACE_1.fig
%   PNEM_INTERFACE_1, by itself, creates a new PNEM_INTERFACE_1 or raises the
%   existing singleton*.
%
%   H = PNEM_INTERFACE_1 returns the handle to a new PNEM_INTERFACE_1 or the
%   handle to the existing singleton*.
%
%   PNEM_INTERFACE_1('CALLBACK',hObject,eventData,handles,...) calls the local
%   function named CALLBACK in PNEM_INTERFACE_1.M with the given input
%   arguments.
%
%   PNEM_INTERFACE_1('Property','Value',...) creates a new PNEM_INTERFACE_1 or
%   raises the existing singleton*. Starting from the left, property value pairs are
%   applied to the GUI before PNEM_INTERFACE_1_OpeningFcn gets called. An
%   unrecognized property name or invalid value makes property application
%   stop. All inputs are passed to PNEM_INTERFACE_1_OpeningFcn via varargin.
%
%   *See GUI Options on GUIDE's Tools menu. Choose "GUI allows only one
%   instance to run (singleton)".
%
% See also: GUIDE, GUIDATA, GUIHANDLES

% Edit the above text to modify the response to help PNEM_INTERFACE_1

% Last Modified by GUIDE v2.5 12-Feb-2024 10:18:50
% Begin initialization code - DO NOT EDIT
gui_Singleton = 1;
gui_State = struct('gui_Name',       mfilename, ...
                   'gui_Singleton',  gui_Singleton, ...
                   'gui_OpeningFcn', @PNEM_INTERFACE_1_OpeningFcn, ...
                   'gui_OutputFcn',  @PNEM_INTERFACE_1_OutputFcn, ...
                   'gui_LayoutFcn',  [] , ...
                   'gui_Callback',   []);
if nargin && ischar(varargin{1})
    gui_State.gui_Callback = str2func(varargin{1});
end
if nargout
    [varargout{1:nargout}] = gui_mainfcn(gui_State, varargin{:});
else
    gui_mainfcn(gui_State, varargin{:});
end
% End initialization code - DO NOT EDIT
% --- Executes just before PNEM_INTERFACE_1 is made visible.
function PNEM_INTERFACE_1_OpeningFcn(hObject, eventdata, handles, varargin)
% This function has no output args, see OutputFcn.
% hObject handle to figure
% eventdata reserved - to be defined in a future version of MATLAB
% handles structure with handles and user data (see GUIDATA)
% varargin command line arguments to PNEM_INTERFACE_1 (see VARARGIN)

% Choose default command line output for PNEM_INTERFACE_1
handles.output = hObject;
% Update handles structure
guidata(hObject, handles);
% UIWAIT makes PNEM_INTERFACE_1 wait for user response (see UIRESUME)
% uiwait(handles.figure1);

% --- Outputs from this function are returned to the command line.
function varargout = PNEM_INTERFACE_1_OutputFcn(hObject, eventdata, handles)
% varargout cell array for returning output args (see VARARGOUT);
% Get default command line output from handles structure
varargout{1} = handles.output;
% --- Executes on button press in pushbutton1.
function pushbutton1_Callback(hObject, eventdata, handles)
% hObject handle to pushbutton1 (see GCBO)
% Ask the user for the folder that holds the training images.
pat = uigetdir('MATLAB Root','SELECT FOLDER FOR TRAINING');
% Show a progress bar. The pauses below only step the bar forward;
% no training is performed inside this callback.
f = waitbar(0,'Please wait... Starting Training');
pause(2)
waitbar(.33,f,'Training Progressing...30%')
pause(2)
waitbar(.67,f,'Almost Done.. 60%')
pause(2)
waitbar(1,f,'Finishing...Training 100%')
close(f);
% Open the second interface and close the first one.
PNEM_INTERFACE_2;
close(PNEM_INTERFACE_1);
################################################################################
CODE FOR SECOND MODULE
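function varargout = PNEM_INTERFACE_2(varargin)
% The GUIDE-generated help text and initialization code are only partially
% preserved in this listing. They appear to follow the same pattern as in the
% first module, with PNEM_INTERFACE_2 in place of PNEM_INTERFACE_1.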
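% --- Image selection and prediction (the enclosing callback header is not
% preserved in this listing).
global netw;
global loc;
% Load the saved workspace that contains the trained network.
netw = load('pnem_workspace.mat');
% Let the user pick a chest X-ray image.
[filen, folder] = uigetfile({'*.png';'*.jpg';'*.jpeg'}, 'Select IMAGE');
if isequal(filen, 0)
    return;   % the user cancelled the dialog
end
loc = fullfile(folder, filen);
img = imread(loc);
% Resize the image to 250-by-250 and show it in the GUI axes.
img2 = imresize(img, [250, 250]);
imshow(img2, 'Parent', handles.axes1);
% Classify the image with the trained network.
net1 = netw.net;
results = classify(net1, img2);
disp(results);
% Report the predicted class. The percentages are fixed text, not model scores.
if results == 'NORMAL'
    set(handles.edit1,'String','NORMAL 100% BACTERIA 0% VIRUS 0%');
elseif results == 'PNEUMONIA_BACTERIA'
    set(handles.edit1,'String',' BACTERIA 100% VIRUS 0% NORMAL 0%');
elseif results == 'PNEUMONIA_VIRUS'
    set(handles.edit1,'String','VIRUS 100% BACTERIA 0% NORMAL 0%');
end
pause(5);
% Open the third interface and close the second one.
PNEM_INTERFACE_3;
close(PNEM_INTERFACE_2);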
function edit1_Callback(hObject, eventdata, handles)
% hObject handle to edit1 (see GCBO)
% Hints: get(hObject,'String') returns contents of edit1 as text
%        str2double(get(hObject,'String')) returns contents of edit1 as a double

% --- Executes during object creation, after setting all properties.
function edit1_CreateFcn(hObject, eventdata, handles)
% hObject handle to edit1 (see GCBO)
% handles empty - handles not created until after all CreateFcns called
% Hint: edit controls usually have a white background on Windows.
%       See ISPC and COMPUTER.
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
################################################################################
CODE FOR THE THIRD MODULE
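function varargout = PNEM_INTERFACE_3(varargin)
% The GUIDE-generated help text and initialization code are only partially
% preserved in this listing. They appear to follow the same pattern as in the
% first module, with PNEM_INTERFACE_3 in place of PNEM_INTERFACE_1.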
function fp_Callback(hObject, eventdata, handles)
% hObject handle to fp (see GCBO)
% Hints: get(hObject,'String') returns contents of fp as text
%        str2double(get(hObject,'String')) returns contents of fp as a double

function fp_CreateFcn(hObject, eventdata, handles)
% hObject handle to fp (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function fn_Callback(hObject, eventdata, handles)
% hObject handle to fn (see GCBO)
% Hints: get(hObject,'String') returns contents of fn as text
%        str2double(get(hObject,'String')) returns contents of fn as a double

function fn_CreateFcn(hObject, eventdata, handles)
% hObject handle to fn (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function tp_Callback(hObject, eventdata, handles)
% hObject handle to tp (see GCBO)
% Hints: get(hObject,'String') returns contents of tp as text
%        str2double(get(hObject,'String')) returns contents of tp as a double

function tp_CreateFcn(hObject, eventdata, handles)
% hObject handle to tp (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function f1score_Callback(hObject, eventdata, handles)
% hObject handle to f1score (see GCBO)
% Hints: get(hObject,'String') returns contents of f1score as text
%        str2double(get(hObject,'String')) returns contents of f1score as a double

function f1score_CreateFcn(hObject, eventdata, handles)
% hObject handle to f1score (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function tn_Callback(hObject, eventdata, handles)
% hObject handle to tn (see GCBO)
% Hints: get(hObject,'String') returns contents of tn as text
%        str2double(get(hObject,'String')) returns contents of tn as a double

function tn_CreateFcn(hObject, eventdata, handles)
% hObject handle to tn (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
function sensitivity_Callback(hObject, eventdata, handles)
% hObject handle to sensitivity (see GCBO)
% Hints: get(hObject,'String') returns contents of sensitivity as text
%        str2double(get(hObject,'String')) returns contents of sensitivity as a double

function sensitivity_CreateFcn(hObject, eventdata, handles)
% hObject handle to sensitivity (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function accuracy_Callback(hObject, eventdata, handles)
% hObject handle to accuracy (see GCBO)
% Hints: get(hObject,'String') returns contents of accuracy as text
%        str2double(get(hObject,'String')) returns contents of accuracy as a double

function accuracy_CreateFcn(hObject, eventdata, handles)
% hObject handle to accuracy (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function recall_Callback(hObject, eventdata, handles)
% hObject handle to recall (see GCBO)
% Hints: get(hObject,'String') returns contents of recall as text
%        str2double(get(hObject,'String')) returns contents of recall as a double

function recall_CreateFcn(hObject, eventdata, handles)
% hObject handle to recall (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function specificity_Callback(hObject, eventdata, handles)
% hObject handle to specificity (see GCBO)
% Hints: get(hObject,'String') returns contents of specificity as text
%        str2double(get(hObject,'String')) returns contents of specificity as a double

function specificity_CreateFcn(hObject, eventdata, handles)
% hObject handle to specificity (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end

function precision_Callback(hObject, eventdata, handles)
% hObject handle to precision (see GCBO)
% Hints: get(hObject,'String') returns contents of precision as text
%        str2double(get(hObject,'String')) returns contents of precision as a double

function precision_CreateFcn(hObject, eventdata, handles)
% hObject handle to precision (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% --- Load the saved evaluation results and display them in the GUI (the
% enclosing function header is not preserved in this listing).
t2 = load('pnem_workspace.mat');
% Convert each stored metric and count to text.
pre = num2str(t2.overall_precision);
rec = num2str(t2.overall_recall);
f1_s = num2str(t2.f1_score);
acc = num2str(t2.accuracy);
spec = num2str(t2.specificity);
sens = num2str(t2.sensitivity);
tp = num2str(t2.TP);
tn = num2str(t2.TN);
fp = num2str(t2.FP);
fn = num2str(t2.FN);
% Write the values into the corresponding fields of the interface.
set(handles.precision,'String',pre);
set(handles.recall,'String',rec);
set(handles.f1score,'String',f1_s);
set(handles.accuracy,'String',acc);
set(handles.specificity,'String',spec);
set(handles.sensitivity,'String',sens);
set(handles.tp,'String',tp);
set(handles.fp,'String',fp);
set(handles.tn,'String',tn);
set(handles.fn,'String',fn);
set(handles.tinfo,'String',"Training:=80%; Testing:= 20%");
function tinfo_Callback(hObject, eventdata, handles)
% hObject handle to tinfo (see GCBO)
% Hints: get(hObject,'String') returns contents of tinfo as text
%        str2double(get(hObject,'String')) returns contents of tinfo as a double

function tinfo_CreateFcn(hObject, eventdata, handles)
% hObject handle to tinfo (see GCBO)
if ispc && isequal(get(hObject,'BackgroundColor'), get(0,'defaultUicontrolBackgroundColor'))
    set(hObject,'BackgroundColor','white');
end
% --- Plot the evaluation results (the enclosing callback headers are not
% preserved in this listing).
t = load('pnem_workspace.mat');
% Confusion chart of the stored confusion matrix.
figure;
cht1 = confusionchart(t.cm);
cht1.Title = 'CONFUSION CHART FOR PNEUMONIA DETECTION';
%disp(loc);
% Confusion matrix plot from the test labels and the predictions.
figure;
plotconfusion(t.YTest, t.YPred);
% Metric names and values shared by the two comparison plots.
% (Renamed from "cat" so that the built-in function cat is not shadowed.)
metricNames = categorical({'accuracy', 'f1_score', 'Precision', 'Recall', 'sensitivity', 'specificity'});
metricValues = [t.accuracy t.f1_score t.overall_precision t.overall_recall t.sensitivity t.specificity];
% Bar chart comparison.
figure;
bar(metricNames, metricValues);
title('BAR CHART COMPARISON OF EACH CNN METRIC');
grid on;
% Line graph comparison.
figure;
p = line(metricNames, metricValues);
title('LINE GRAPH COMPARISON');
p.LineWidth = 4;
p.Marker = 'o';
p.MarkerFaceColor = [1 0.5 0];
p.MarkerSize = 5;
grid on;

Choosing an appropriate machine learning algorithm or model architecture is critical. The

After obtaining a well-performing model, it can be deployed in real-world applications to make

Monitoring and Maintenance:

2.7 Related works

Several significant studies have contributed to the advancement of machine learning-based

Tsang et al. (2020) developed "Application of Machine Learning to Support Self-Management

H. Chen et al. (2019) proposed "Automatic Multi-Level In-Exhale Segmentation and

S/N Author(s) Strategy Limitation Performance

1 Chen et al. (2021) Random forest Small dataset 98%

2 Vatanparvar et al. (2020) Gaussian mixture model, Small dataset 93.34%

5 Pramono et al. (2019) Logistic regression Not available 88.70%

6 Chen et al. (2019) SVM, extreme learning Small dataset 99.52%

7 Zhang et al. (2020) Recursive feature elimination Self report 90%

2.8 Comparison of related works