Ravi Internship Report


VISVESVARAYA TECHNOLOGICAL UNIVERSITY

“Jnana Sangama”, Belagavi - 590018

INTERNSHIP REPORT
“MACHINE LEARNING MODEL FOR PREDICTING
CHRONIC DISEASE”

Submitted in partial fulfillment of the requirements for the award of


BACHELOR OF ENGINEERING
In
ARTIFICIAL INTELLIGENCE & MACHINE LEARNING ENGINEERING
By
RAVI KUMAR U (1RR20AI022)

Conducted at
COMPANY NAME: VARCONS TECHNOLOGIES PVT LTD.

DEPARTMENT OF AI&ML ENGINEERING

RAJARAJESWARI COLLEGE OF
ENGINEERING
MYSORE ROAD, BANGALORE-560074
(An ISO 9001:2008 Certified Institute)
(2023-24)

internship report 2023-24


RAJARAJESWARI COLLEGE OF ENGINEERING
Department of Artificial Intelligence and Machine Learning
Accredited by NBA, Bengaluru.

CERTIFICATE

This is to certify that the internship titled “Machine Learning Model for predicting the
risks of chronic diseases” was carried out by Mr. Ravi Kumar U, a bonafide student of
Rajarajeswari College of Engineering, in partial fulfillment of the requirements for the award of
the degree of Bachelor of Engineering in Artificial Intelligence and Machine Learning under Visvesvaraya
Technological University, Belagavi, during the year 2023-2024. It is certified that all
corrections/suggestions indicated have been incorporated in the report.

The project report has been approved as it satisfies the academic requirements in respect
of the internship prescribed for the course Internship / Professional Practice (18AII85).

Signature of Guide Signature of HOD Signature of Principal

External Viva:

Name of the Examiner Signature with Date

1)

2)



DECLARATION

I, Ravi Kumar U, a final year student of Artificial Intelligence and Machine
Learning, Rajarajeswari College of Engineering - 560 074, declare that the
internship has been successfully completed at Varcons Technologies Pvt Ltd.
This report is submitted in partial fulfillment of the requirements for the award of the
Bachelor of Engineering degree in Artificial Intelligence and Machine Learning during the
academic year 2023-2024.

Date: 21/09/2023
Place: Bengaluru

USN: 1RR20AI022
NAME: Ravi Kumar U



OFFER LETTER

Date: 11th August, 2023

Name: Ravi Kumar U


USN: 1RR20AI022

Dear Student,

We would like to congratulate you on being selected for the Machine Learning With Python
(Research Based) Internship position with Varcons Technologies, effective start date 11th August,
2023. All of us are excited about this opportunity provided to you!
This internship is viewed as being an educational opportunity for you, rather than a part-time job. As such,
your internship will include training/orientation and focus primarily on learning and developing new skills
and gaining a deeper understanding of concepts of Machine Learning With Python (Research
Based) through hands-on application of the knowledge you learn while you train with the senior
developers. You will be bound to follow the rules and regulations of the company during your internship
duration.

Again, congratulations, and we look forward to working with you!

Sincerely,

Spoorthi H C
Director
Varcons Technologies
213, 2nd Floor,
18 M G Road, Ulsoor,
Bangalore-560001



ACKNOWLEDGEMENT

This internship is the result of the accumulated guidance, direction, and support of several
important people. We take this opportunity to express our gratitude to all who have helped
us to complete the internship.

We express our sincere thanks to our Principal, Dr. R Balakrishna, for providing us with adequate
facilities to undertake this internship.

We would like to thank our Head of Department (AIML), Dr. Rajesh K S, for providing us the
opportunity to carry out this internship and for his valuable guidance and support.

We would like to thank our Professor’s Software Services for guiding us during the period of
internship.

We express our deep and profound gratitude to our guide, Guide name, Assistant/Associate
Prof, for her keen interest and encouragement at every step in completing the Internship.

We would like to thank all the faculty members of our department for the support extended
during the course of Internship.

We would like to thank the non-teaching staff of our department for helping us during the
internship.

Last but not least, we would like to thank our parents and friends, without whose constant
help the completion of this internship would not have been possible.

NAME: Ravi Kumar U
USN: 1RR20AI022



ABSTRACT

The healthcare field is one of the most essential areas of research in the
modern period, thanks to rapid changes in technology and data. It is difficult
to keep track of a large number of patient records.

The use of big data analytics allows us to manage this content. Various
ailments are treated in a variety of ways throughout the world.
Machine learning is one of the approaches to disease prediction and
diagnosis. This study shows how symptoms can be used as input
parameters to a machine learning model to produce disease predictions.

On the dataset, the machine learning technique Random Forest is used to
forecast the disease. The predicted disease and the symptoms are stored in
the database. The system is implemented in the Python programming language,
and a graphical user interface (GUI) has been created to display the
findings. Disease prediction is performed by classifying the given
dataset.



Table of Contents

Sl no   Description

1       Company Profile
2       About the Company
3       Task Performed
4       Introduction
5       System Analysis
6       Requirement Analysis
7       Design Analysis
8       Implementation
9       Snapshots
10      Conclusion
11      References



CHAPTER 1
COMPANY PROFILE



1. COMPANY PROFILE
A Brief History of the Company
The company was incorporated with the goal “To provide high quality and optimal
technological solutions to business requirements of our clients”. Every business
is different and has a unique business model, and so are its technological
requirements. They understand this, and hence the solutions provided for these
requirements are different as well. They focus on clients' requirements and
provide them with tailor-made technological solutions. They also understand that
the reach of a client's product to its target market, or the automation of existing
processes into efficient and simple processes, are the key features that clients
desire from the technological solutions they are looking for, and these are the
features they focus on while designing solutions for their clients.

The company is a technology organization providing solutions for all web design
and development, MySQL, Python programming, HTML, CSS, ASP.NET,
and LINQ. Meeting ever-increasing automation requirements, Sarvamoola
Software Services specializes in ERP, connectivity, SEO services, conference
management, effective web promotion, and tailor-made software products,
designing solutions that best suit clients' requirements.

They understand that the best output can be achieved only by
understanding clients' demands better. At the company, they work with
clients and help them define their exact solution requirements. Sometimes
clients even find that they have completely redefined their solution or new
application requirements during the brainstorming session, and here the company positions
itself as an IT solutions consulting group comprising high-caliber
consultants.

They believe that technology, when used properly, can help any business
scale and achieve new heights of success. It helps improve efficiency,
profitability, and reliability; to put it in one sentence, “Technology helps you to
delight your customers”, and that is what they want to achieve.



CHAPTER 2
ABOUT THE COMPANY



2. ABOUT THE COMPANY

We are a technology organization providing solutions for all web design and development,
research and publication of papers to ensure the quality of the most used ML models, MySQL,
Python programming, HTML, CSS, ASP.NET, and LINQ. Meeting ever-increasing
automation requirements, Compsoft Technologies specializes in ERP, connectivity, SEO
services, conference management, effective web promotion, and tailor-made software
products, designing solutions that best suit clients' requirements. The organization
has the right mix of professionals as stakeholders to help us serve our clients to the best of
our capability and on par with industry standards. We have young, enthusiastic, passionate,
and creative professionals who develop technological innovations in the fields of mobile
technologies, web applications, and business and enterprise solutions. The motto of our
organization is to “Collaborate with our clients to provide them with the best technological
solution, hence creating a good present and a better future for our clients, which will have a
cascading positive effect on their business as well”. Providing a complete suite of
technical solutions is not just our tagline; it is our vision for our clients and for us, and we
strive hard to achieve it.

Services provided by Varcons Technologies:


• Core Java and Advanced Java

• Research and Development/Improvement of ML Models

• Web services and development

• Dot Net Framework

• Python

• Selenium Testing

• Conference / Event Management Service

• Academic Project Guidance

• On The Job Training

• Software Training



CHAPTER 3

TASK PERFORMED



WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

WEEK       DATE      DAY        SUMMARY OF THE TOPIC/MODULE COMPLETED

1st WEEK   14/8/23   Monday     Preparations: downloading software for Python
           15/8/23   Tuesday    Installation of Jupyter, introduction to the domain
           16/8/23   Wednesday  Data types, basic operations of Python
           17/8/23   Thursday   Operators, lists, sequences
           18/8/23   Friday     Assignments for the above topics

2nd WEEK   21/8/23   Monday     Range, tuples, lists
           22/8/23   Tuesday    List methods, data types
           23/8/23   Wednesday  Conditional statements
           24/8/23   Thursday   Looping statements
           25/8/23   Friday     Exercises on previous topics
           26/8/23   Saturday   Assignments on exercises for the covered topics

3rd WEEK   28/8/23   Monday     Arrays, array methods, pass
           29/8/23   Tuesday    Array methods, sets
           30/8/23   Wednesday  Seminar on arrays and files, file operations
           31/8/23   Thursday   Functions, passing parameters
           1/9/23    Friday     Installation of Anaconda and importing some files for the frontend
           2/9/23    Saturday   Completion of installation assignments

4th WEEK   4/9/23    Monday     Frontend, I/O files
           5/9/23    Tuesday    File operations, frontend exercise
           6/9/23    Wednesday  Modes of files, assignments for the above
           7/9/23    Thursday   Various operations and methods of file operations
           8/9/23    Friday     Continuation
           9/9/23    Saturday   Assignments for the above

5th WEEK   11/9/23   Monday     Classes and objects, instances, inheritance
           12/9/23   Tuesday    Introduction to machine learning
           13/9/23   Wednesday  Project allocation
           14/9/23   Thursday   Project explanation
           15/9/23   Friday     Working of the project
           16/9/23   Saturday   Working on the project



CHAPTER 4

INTRODUCTION



4. INTRODUCTION

Introduction to ML

Machine Learning (ML) is a subfield of artificial intelligence (AI) that focuses on
developing algorithms and models that enable computers to learn and make
predictions or decisions based on data. ML systems are designed to improve their
performance on a specific task through experience, rather than relying on explicit
programming.

Here are some key concepts in machine learning:

1. Data: ML relies heavily on data. It uses historical or labeled data to train
models. This data can include text, numbers, images, and more.

2. Training: During the training phase, ML models are exposed to a dataset,
and they learn patterns and relationships within that data. This enables them
to make predictions or classifications.

3. Features: Features are the variables or attributes in your dataset that the ML
model uses to make predictions. Effective feature selection and engineering
are crucial for model performance.

4. Algorithms: ML algorithms are the mathematical techniques and rules that
models use to make predictions or decisions. Common ML algorithms
include decision trees, neural networks, and support vector machines.

5. Supervised Learning: In this type of ML, the model is trained on a labeled
dataset, where the input data is paired with the correct output. It learns to
make predictions by mapping inputs to outputs.

6. Unsupervised Learning: In contrast, unsupervised learning deals with
unlabeled data. It involves finding patterns, groupings, or structures in data
without predefined target labels.

7. Evaluation: ML models need to be assessed for their performance. Common
metrics include accuracy, precision, recall, and F1 score, depending on the
specific problem.



8. Overfitting and Underfitting: These are common issues in ML. Overfitting
occurs when a model learns the training data too well but struggles with new,
unseen data. Underfitting, on the other hand, happens when a model is too
simplistic to capture the underlying patterns.

9. Hyperparameters: These are settings that are not learned from data but are
set before training. Tuning hyperparameters is essential to optimize model
performance.

10. Deployment: After training, ML models are deployed to make predictions
on new, real-world data. This could be in applications like recommendation
systems, image recognition, or autonomous vehicles.

Machine Learning has a wide range of applications across industries, from
healthcare and finance to autonomous vehicles and natural language processing. It is
a rapidly evolving field with ongoing research and development, making it an
exciting and valuable part of modern technology.
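The short scikit-learn sketch below illustrates the training, evaluation, and overfitting/underfitting concepts listed above. It is not part of the project's code: the synthetic data from `make_classification`, the tree depths, and the random seed are illustrative assumptions. A depth-1 tree (a "stump") is expected to underfit, while an unrestricted tree fits the noisy training set perfectly but scores lower on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def compare_tree_depths(seed: int = 0) -> dict:
    """Return train/test accuracy for a depth-1 tree and an unrestricted tree."""
    # Synthetic labeled data; flip_y adds label noise that an unrestricted tree will memorize.
    X, y = make_classification(
        n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )

    results = {}
    for name, depth in [("stump", 1), ("unrestricted", None)]:
        model = DecisionTreeClassifier(max_depth=depth, random_state=seed)
        model.fit(X_train, y_train)  # training phase: learn patterns from labeled data
        results[name] = {
            "train_acc": model.score(X_train, y_train),  # accuracy on data the model has seen
            "test_acc": model.score(X_test, y_test),     # accuracy on new, unseen data
        }
    return results


if __name__ == "__main__":
    for name, scores in compare_tree_depths().items():
        print(f"{name}: train={scores['train_acc']:.2f} test={scores['test_acc']:.2f}")
```

Comparing training and test accuracy in this way is a quick check for overfitting; a hyperparameter such as `max_depth` would then be tuned on validation data rather than on the test set.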

Problem Statement

Machine Learning algorithms for predicting the risks of chronic diseases.


The rising prevalence of chronic diseases poses a significant healthcare challenge,
necessitating the development of an accurate and accessible predictive tool. This
project addresses the need for a machine learning-based chronic disease prediction
system that leverages patient data, medical records, and lifestyle information to
enable early detection, personalized risk assessment, and informed decision-
making. The system aims to empower healthcare professionals and individuals by
providing actionable insights for disease prevention and management. It also
emphasizes data privacy and regulatory compliance to ensure secure handling of
sensitive health information. The primary objective is to enhance patient outcomes,
reduce healthcare costs, and promote evidence-based healthcare practices in a user-
friendly and ethical manner.



CHAPTER 5

SYSTEM ANALYSIS



5. SYSTEM ANALYSIS

1. Existing System

Here is a general outline of the process followed by existing code-based chronic
disease prediction systems:

Steps Involved:

• Define the Problem: Clearly define the problem you want to solve.
Specify the type of chronic disease you want to predict and the objectives
of the model.
• Collect and Prepare Data: Gather relevant data sources. This data may
include medical records, patient histories, lab results, lifestyle factors,
genetic information, and more.
Preprocess the data by cleaning it, handling missing values, and performing
feature engineering to extract relevant features.
Split the dataset into training, validation, and test sets to evaluate the
model's performance.
• Model Training: Train the selected ML model using the training dataset.
Fine-tune hyperparameters to optimize model performance, possibly
using techniques like cross-validation.
• Model Evaluation: Evaluate the model using the validation dataset and
appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score,
ROC-AUC). Adjust the model and hyperparameters based on validation
results.
• Model Testing: Assess the final model's performance using the test
dataset to estimate its real-world predictive capabilities.
• Interpretability and Explainability: Ensure that the model's predictions
are interpretable and explainable, especially in healthcare applications
where transparency is crucial. Use techniques such as feature importance
analysis and SHAP (SHapley Additive exPlanations) values.
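As a minimal sketch of the split, tuning, and evaluation steps above, assuming scikit-learn and synthetic data in place of real patient records, the code below holds out a test set, carves a validation set from the remainder, and tunes a Random Forest with 5-fold cross-validation. The model choice, parameter grid, and split sizes are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split


def tune_and_test(seed: int = 42) -> dict:
    # Placeholder data standing in for cleaned, feature-engineered patient records.
    X, y = make_classification(n_samples=600, n_features=12, random_state=seed)

    # Hold out 20% as the final test set, then split the rest 75/25 into train/validation.
    X_temp, X_test, y_temp, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=seed
    )

    # Hyperparameter tuning with 5-fold cross-validation on the training split only.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
        cv=5,
        scoring="f1",
    )
    grid.fit(X_train, y_train)

    # The validation score guides adjustments; the test score is reported once at the end.
    best = grid.best_estimator_
    return {
        "sizes": (len(X_train), len(X_val), len(X_test)),
        "best_params": grid.best_params_,
        "val_f1": f1_score(y_val, best.predict(X_val)),
        "test_f1": f1_score(y_test, best.predict(X_test)),
    }


if __name__ == "__main__":
    print(tune_and_test())
```

Keeping the test set untouched until the very end is what lets its score serve as the estimate of real-world performance described in the Model Testing step.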



2. Proposed System

Improving an ML model for chronic disease prediction involves an iterative
process of refining the model's performance, robustness, and interpretability. Here
are steps you can take to enhance and optimize your model:

• Collect More Data: Consider gathering additional data, especially if your
dataset is limited. More data can help improve model generalization.
• Data Quality and Preprocessing: Revisit data preprocessing steps to handle
outliers, missing values, and noise more effectively. Explore advanced
techniques like data imputation, anomaly detection, and data augmentation.
• Model Selection: Explore different ML algorithms or model architectures.
Try ensemble methods or deep learning models if appropriate. Opt for more
complex models if simple ones are underfitting. Conversely, use simpler
models if complex ones are overfitting.
• Cross-Validation: Implement cross-validation with multiple folds to obtain
a more robust estimate of model performance.
• Feature Importance Analysis: Use feature importance techniques to
identify the most influential features and prioritize them in your model.
• Ensemble Learning: Explore ensemble methods like Random Forests,
Gradient Boosting, or stacking to combine the predictions of multiple
models.
• Selection of UI Framework: Use Streamlit as the web interface
framework to create a seamless user experience.
• Design the UI Layout: Plan the layout of your UI, including the placement
of input fields, buttons, and result displays. Consider the user experience
(UX) and aim for an intuitive design.
• Display Predictions: Design an area to display the model predictions in a
clear and understandable format. Use charts, tables, or text to present the
results.
results.

3. Objective of the System

The objective of the chronic disease prediction system is to leverage machine
learning and healthcare data to enable early detection and personalized risk
assessment of chronic diseases. It empowers both healthcare professionals and
individuals to make informed decisions about disease prevention and
management. By providing actionable insights, the system aims to improve
patient outcomes, reduce healthcare costs, and promote evidence-based
healthcare practices. It does so while ensuring ethical data handling, patient
privacy, and compliance with healthcare regulations, making it a valuable tool
in the effort to combat and manage chronic diseases in a user-friendly and
accessible manner.



CHAPTER 6

REQUIREMENT ANALYSIS



6. REQUIREMENT ANALYSIS

Hardware Requirement Specification

CATEGORY           MINIMUM                 RECOMMENDED

PROCESSOR          Intel Core i5           Intel Core i7 and above
HARD DISK          256 GB                  500 GB and above
RAM                8 GB                    16 GB and above
PROCESSOR SPEED    2.4 GHz
WIRED NETWORKING   Ethernet LAN port       USB Ethernet adapter/dongle

Software Requirement Specification

CATEGORY           MINIMUM                 RECOMMENDED

OPERATING SYSTEM   Windows 10 (stable)     Windows 11
ENVIRONMENT        Visual Studio Code      Visual Studio Code (latest)
LANGUAGE           Python 3.10.x           Python (latest version)
BACKEND            scikit-learn and        Backend integration module
                   Streamlit



CHAPTER 7
DESIGN ANALYSIS



7. DESIGN ANALYSIS

Designing and analyzing a project for chronic disease prediction involves several
key components, including data collection, preprocessing, model development, and
evaluation. Below is a high-level overview of the design and analysis process:

Data Collection and Preprocessing:


• Gather relevant healthcare data from sources such as electronic health
records (EHRs), patient histories, medical tests, lifestyle surveys, and genetic
data.
• Ensure data quality by addressing issues like missing values, outliers, and
data integrity.
• Clean and preprocess the data, including handling missing values,
normalizing numerical features, and encoding categorical variables.
• Perform feature selection and engineering to extract meaningful features for
disease prediction.
• Split the dataset into training, validation, and test sets.
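A minimal preprocessing pipeline for the steps above might look like the sketch below, assuming scikit-learn and pandas. The column names (`age`, `bmi`, `glucose`, `gender`, `smoker`) and the tiny example frame are hypothetical; numerical features are median-imputed and standardized, and categorical features are filled with their most frequent value and one-hot encoded.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names, for illustration only.
NUMERIC = ["age", "bmi", "glucose"]
CATEGORICAL = ["gender", "smoker"]

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill missing measurements
        ("scale", StandardScaler()),                    # zero mean, unit variance
    ]), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill missing categories
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # one 0/1 column per category
    ]), CATEGORICAL),
])


def example_patients() -> pd.DataFrame:
    """A tiny made-up frame with a few missing values."""
    return pd.DataFrame({
        "age": [45, 62, np.nan, 38],
        "bmi": [27.1, 31.4, 24.9, np.nan],
        "glucose": [110, 160, 95, 130],
        "gender": ["F", "M", "M", np.nan],
        "smoker": ["no", "yes", "no", "no"],
    })


if __name__ == "__main__":
    features = preprocess.fit_transform(example_patients())
    print(features.shape)  # (4, 7): 3 scaled numeric + 2 gender + 2 smoker columns
```

Wrapping these steps in a `ColumnTransformer` means the transformations learned on the training split can be applied unchanged to the validation and test splits with `preprocess.transform(...)`.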

Model Selection:
• Choose appropriate machine learning algorithms or models for chronic
disease prediction. Common choices include logistic regression, decision
trees, random forests, gradient boosting, and deep learning models.
• Consider the nature of the disease prediction task (e.g., binary classification
for disease presence or regression for risk assessment).

Model Development and Evaluation:


• Train the selected model(s) on the training dataset while monitoring
performance on the validation set.
• Tune hyperparameters to optimize model performance using techniques like
grid search or Bayesian optimization.
• Implement appropriate regularization methods to prevent overfitting.
• Evaluate the model's performance using metrics such as accuracy, precision,
recall, F1-score, ROC-AUC, and mean squared error (for regression tasks).
• Generate confusion matrices and ROC curves for classification tasks.
• Examine feature importance and interpretability to understand the model's
decision-making process.
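The evaluation steps above can be illustrated with a regularized logistic regression, one of the common choices listed earlier. This is a sketch on synthetic data, assuming scikit-learn; the regularization strength `C=0.5` and the 80/20 split are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split


def evaluate_classifier(seed: int = 1) -> dict:
    X, y = make_classification(n_samples=500, n_features=10, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    # C is the inverse of the L2 regularization strength: smaller C means a stronger penalty.
    model = LogisticRegression(C=0.5, max_iter=1000)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    fpr, tpr, _ = roc_curve(y_test, y_score)  # points that trace the ROC curve
    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "roc_auc": roc_auc_score(y_test, y_score),
        "roc_points": len(fpr),
    }


if __name__ == "__main__":
    print(evaluate_classifier())
```

The `fpr` and `tpr` arrays can be passed to any plotting library to draw the ROC curve.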



Testing and Validation:
• Thoroughly test the entire system, including data preprocessing, model
execution, and user interface functionality.
• Validate the system's accuracy, reliability, and responsiveness through user
testing and validation with healthcare professionals.

User Interface Design and Deployment:


• Design a user-friendly interface for healthcare professionals and individuals
to interact with the prediction model.
• Include input fields for relevant patient data, buttons for model execution,
and output areas to display predictions and insights.
• Ensure the UI adheres to usability and accessibility standards.
• Deploy the model and UI to a web server or cloud platform, making it
accessible to users.
• Implement secure data transmission and storage protocols to protect sensitive
patient information.

Ethical Considerations:
• Ensure that the project adheres to ethical guidelines, patient privacy
regulations (e.g., HIPAA), and data protection laws.
• Address potential bias and fairness concerns in model predictions.

Documentation and Reporting:


• Document the project thoroughly, including data sources, preprocessing
steps, model details, and evaluation results.
• Provide user documentation and guidelines for using the system effectively
and responsibly.



CHAPTER 8

IMPLEMENTATION



8. IMPLEMENTATION
Implementation is the stage where the theoretical design is turned into a working system. It is
the most crucial stage in achieving a successful new system and in giving users confidence that
the new system will work efficiently and effectively.

The system can be implemented only after thorough testing has been done and it is found to work
according to the specification. Apart from careful planning, implementation involves investigation
of the current system and its constraints on implementation, design of methods to achieve the
changeover, and evaluation of changeover methods.

Two major tasks in preparing for implementation are education and training of the users and
testing of the system. The more complex the system being implemented, the more involved
the system analysis and design effort required just for implementation will be.

The implementation phase comprises several activities. The required hardware and
software are acquired. The system may require some software to be developed;
for this, programs are written and tested. The user then changes over to the new, fully tested
system, and the old system is discontinued.

1. Preliminaries

1.1. Chronic Disease

According to the US National Center for Health Statistics, chronic diseases are diseases that last for a
long period of time, that is, more than three months. These diseases can neither be cured by
medicines nor prevented by vaccines. The major causes of chronic diseases are tobacco use,
unhealthy eating habits, and lack of physical activity. Chronic diseases are also commonly associated
with ageing. Chronic diseases include cardiovascular disease, cancer, arthritis, diabetes, obesity,
epilepsy and seizures, and oral health problems [35].

Cardiovascular disease includes heart disease and stroke, which are leading causes of death. This disease
is caused by tobacco use, a poor diet, and lack of physical activity.
When patients change these habits, they may be able to reduce its impact and better control and prevent
cardiovascular disease.

Arthritis, another chronic disease, causes inflammation in the joints, along with pain and stiffness
that increase with age. Cost-effective methods for reducing the effects of arthritis are available
but are not widely used. The effects of arthritis can be reduced by
regular moderate exercise.



Diabetes is a serious and costly disease. Its impact can be reduced
by self-care and early detection of the disease [53]. Around 7 million people aged 65 or
above are affected by this disease, particularly type 2 diabetes.

Since 1980, obesity has become more common among adults of all age groups. People who are overweight or
obese are at greater risk of high blood pressure (BP), heart disease, diabetes, and
arthritis. Obesity can also cause some types of cancer.

Oral health problems are a crucial issue that deserves special attention in the health of older people.
This is a serious issue, since it affects a person's normal day-to-day activities, such as speaking,
chewing, swallowing, and maintaining a nutritious diet.

1.2. Convolutional Neural Network (CNN)

A ConvNet, or CNN, is a deep learning algorithm that takes an input, assigns learnable weights
and biases to its various aspects, and uses them to distinguish one input from another [55], as shown
in Algorithm 1. The major reason for using a CNN is that it requires relatively little
preprocessing of the data compared with other algorithms, since a CNN can learn to
optimize its filters automatically [56]. For a convolution with stride 1 and no padding, the size of the
output layer can be calculated using the following expression:

$$\text{size of output layer} = \text{input size} - (\text{filter size} - 1) \tag{1}$$
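Equation (1) can be checked with a small NumPy sketch of a one-dimensional convolution with stride 1 and no padding. The function name `conv1d_valid` is hypothetical; like most CNN libraries, it computes a cross-correlation (the filter is not flipped), which is also what `np.correlate` computes in `valid` mode.

```python
import numpy as np


def conv1d_valid(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the filter over the input with stride 1 and no padding."""
    n, k = len(signal), len(kernel)
    out_size = n - (k - 1)  # equation (1)
    # Each output value is the dot product of the filter with one window of the input.
    return np.array([np.dot(signal[i:i + k], kernel) for i in range(out_size)])


if __name__ == "__main__":
    x = np.arange(10, dtype=float)  # input size 10
    w = np.array([1.0, 0.0, -1.0])  # filter size 3
    y = conv1d_valid(x, w)
    print(len(y))  # 8, i.e. 10 - (3 - 1)
```

For a two-dimensional image the same rule applies separately to the height and the width.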

1.3. K-Nearest Neighbor (KNN)

KNN is a supervised machine learning algorithm that measures the similarity between
new data and the existing labeled data, and assigns the new data to the category that is most
common among its most similar (nearest) neighbors. KNN can be used for classification as well as
regression tasks, but it is most commonly used for classification. The calculation of Euclidean
distance is expressed mathematically as follows:

$$x^2 = (c - a)^2 + (d - b)^2$$

where x is the Euclidean distance between the points (a, b) and (c, d).
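A from-scratch sketch of KNN classification using this Euclidean distance is shown below. The function names, the choice of k = 3, and the toy two-feature points and labels are illustrative assumptions, not data from this project.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Straight-line distance; for (a, b) and (c, d) this is sqrt((c - a)**2 + (d - b)**2)."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query with the majority class among its k nearest training points."""
    distances = sorted(
        (euclidean(point, query), label) for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


if __name__ == "__main__":
    points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
    labels = ["healthy", "healthy", "healthy", "at risk", "at risk", "at risk"]
    print(knn_predict(points, labels, (1.5, 1.5)))  # healthy
    print(euclidean((0, 0), (3, 4)))                 # 5.0
```

Because KNN stores the training data and does all its work at prediction time, the cost of a prediction grows with the size of the data set; libraries usually speed up the neighbor search with tree-based indexes.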



2. Proposed Methodology

This section describes the data set creation, model preparation, and
disease prediction in detail. The first step is data collection. The proposed system
collects structured and unstructured data obtained from various sources. After
collection, the data are preprocessed and split into training and test data sets.
Machine learning algorithms such as CNN and KNN are then trained on the training data set;
the CNN is trained for a number of epochs to improve the accuracy of the prediction results. After
multiple epochs, once the desired target is achieved, the developed model is ready for
testing.

At this step, the model is tested with the test data set to verify its performance on
brand-new data that were not used for training. If the model attains the desired accuracy on the
test data, the proposed model is ready for deployment, as shown in Figure 1.

2.1. Data Collection & Preprocessing

The real-life data include structured data, such as basic patient information covering
demographics, living habitat, and lab test results, and unstructured data, such as the
symptoms of the disease experienced by the patient and their consultation with the doctor. The
data set excludes the patient's personal details, such as name, ID, and location, so as to
preserve their privacy.

Most of the structured data contain missing values, so the collected data are preprocessed to
handle them: the missing entries are filled in, removed, or modified to
enhance the quality of the data set. The preprocessing step also eliminates commas,
punctuation, and extra white space. Once preprocessing is complete, the data are
then subjected to feature extraction followed by disease prediction.
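The missing-value handling and text cleaning described above can be sketched with pandas as follows. The column names and example records are hypothetical, and the specific choices (median filling for numeric columns, dropping rows with no symptom text, lowercasing) are assumptions rather than the project's exact rules.

```python
import re

import pandas as pd


def normalize_text(text: str) -> str:
    """Lowercase, replace commas and other punctuation with spaces, collapse white space."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def clean_records(df: pd.DataFrame, text_col: str) -> pd.DataFrame:
    """Fill numeric gaps with column medians, normalize the text column, drop rows without text."""
    df = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    df[text_col] = df[text_col].map(normalize_text, na_action="ignore")  # skip missing notes
    return df.dropna(subset=[text_col]).reset_index(drop=True)


if __name__ == "__main__":
    records = pd.DataFrame({
        "age": [54, None, 61],
        "glucose": [140, 118, None],
        "symptoms": ["Fatigue,  thirst!", None, "blurred vision , frequent urination"],
    })
    print(clean_records(records, "symptoms"))
```

For example, the note "Fatigue,  thirst!" becomes "fatigue thirst", and the missing glucose value in the last row is filled with the median of the observed values.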



2.2. Model Description

As discussed above, the data set consists of both structured and unstructured data. The
structured data comprise patient demographics and data related to the causes of the
disease, such as age, gender, height, and weight, the patient's living habitat, laboratory test
results, and the disease the patient is affected by, in tabular format. The unstructured data
comprise the patient's disease symptoms and information about their consultations with
doctors, in text format. The unstructured data add information that helps the prediction task
achieve more accurate results. The data set is split into 80% for training and 20% for
testing.
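The 80/20 split can be expressed with scikit-learn's `train_test_split`. Stratifying on the labels, so that both parts keep the same class proportions, is an assumption added here; the report does not state whether the split was stratified.

```python
from sklearn.model_selection import train_test_split


def split_80_20(features, labels, seed=7):
    """Stratified 80/20 train/test split that keeps class proportions in both parts."""
    return train_test_split(features, labels, test_size=0.2, stratify=labels, random_state=seed)


if __name__ == "__main__":
    X = [[i] for i in range(50)]
    y = [0] * 25 + [1] * 25
    X_train, X_test, y_train, y_test = split_80_20(X, y)
    print(len(X_train), len(X_test))  # 40 10
    print(sum(y_test))                # 5 positive cases in the test set
```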

2.3. Disease Prediction Using CNN

The proposed system uses the CNN algorithm to predict chronic disease. First,
the data set is converted into vector form using word embedding, with zero values
used to pad the data to a fixed length. The result is then passed to the convolution layer.

The pooling layer takes its input from the convolution layer and applies the max pooling
operation. The output of max pooling is passed to the fully connected layer, and finally
the output layer provides the classification results. Figure 2 shows the block diagram of the
convolutional neural network.
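To make the layer sequence above concrete, here is a NumPy sketch of a single forward pass through a small text CNN: embedding lookup with zero padding, a 1-D convolution, ReLU, max pooling over time, a fully connected layer, and softmax. The layer sizes, the random (untrained) weights, the ReLU activation, and the use of global max pooling are assumptions for illustration; training over several epochs is not shown.

```python
import numpy as np


def pad(ids, length):
    """Right-pad a token-id sequence with zeros (the padding id) to a fixed length."""
    return np.array(list(ids[:length]) + [0] * max(0, length - len(ids)))


def init_params(vocab_size=20, embed_dim=8, num_filters=4, width=3, num_classes=3, seed=0):
    rng = np.random.default_rng(seed)
    embedding = rng.normal(size=(vocab_size, embed_dim))
    embedding[0] = 0.0  # the padding token maps to an all-zero vector
    return {
        "embedding": embedding,
        "conv_w": rng.normal(size=(num_filters, width, embed_dim)),
        "conv_b": np.zeros(num_filters),
        "dense_w": rng.normal(size=(num_filters, num_classes)),
        "dense_b": np.zeros(num_classes),
    }


def text_cnn_forward(token_ids, params):
    """Embedding -> 1-D convolution -> ReLU -> max pooling -> dense layer -> softmax."""
    emb = params["embedding"][token_ids]  # (seq_len, embed_dim)
    filters, bias = params["conv_w"], params["conv_b"]
    width = filters.shape[1]
    steps = emb.shape[0] - (width - 1)  # output length, equation (1)

    # Each filter is dotted with every window of `width` consecutive embeddings.
    conv = np.stack([
        np.tensordot(filters, emb[t:t + width], axes=([1, 2], [0, 1])) + bias
        for t in range(steps)
    ])  # (steps, num_filters)
    conv = np.maximum(conv, 0.0)  # ReLU
    pooled = conv.max(axis=0)     # max pooling over time -> (num_filters,)

    logits = pooled @ params["dense_w"] + params["dense_b"]  # fully connected layer
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()  # class probabilities


if __name__ == "__main__":
    probs = text_cnn_forward(pad([3, 7, 1, 12], 10), init_params())
    print(probs)  # three probabilities summing to 1 (weights are random, so values are arbitrary)
```

In a real system these weights would be learned from the training data, typically with a deep learning framework that handles backpropagation.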



3. Performance Evaluation

Four performance evaluation metrics are used to evaluate the proposed disease prediction model.
They are based on the confusion matrix, which consists of the true positives (TP), the correct
predictions of a patient with chronic disease; the true negatives (TN), the correct predictions of
persons without the disease; the false positives (FP), the incorrect predictions of a healthy
person as a diseased person; and the false negatives (FN), the incorrect predictions of a
diseased person as healthy. The four performance evaluation parameters are described as follows.

3.1. Accuracy

The classification accuracy is the ratio of correctly predicted values to the total number of
predictions, expressed as a percentage, and is depicted mathematically as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$

3.2. Precision

The precision, or positive predictive value (PPV), is the ratio of true positive predictions
to all positive predictions, both true and false, and is depicted
mathematically as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$
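Both formulas, together with the confusion-matrix counts defined above, can be computed directly in Python. The labels below are made-up examples (1 = chronic disease, 0 = healthy), and returning 0.0 when there are no positive predictions is an assumed convention, since precision is undefined in that case.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, FN for a binary task where `positive` marks a chronic-disease case."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def accuracy_percent(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn) * 100


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0  # undefined when nothing is predicted positive


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(tp, tn, fp, fn)                    # 3 3 1 1
    print(accuracy_percent(tp, tn, fp, fn))  # 75.0
    print(precision(tp, fp))                 # 0.75
```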



CHAPTER 9
SNAPSHOTS



9. SNAPSHOTS

CODE SNIPPET:

Code to read the dataset and train the model
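Since the code snapshot itself is not reproduced here, the following is a hedged sketch of what reading a symptom dataset and training the Random Forest mentioned in the abstract could look like, assuming pandas and scikit-learn. The symptom names, disease labels, and six toy rows are hypothetical stand-ins for the real training data, which would normally be loaded from a file (for example with `pd.read_csv`).

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

SYMPTOMS = ["fatigue", "excessive_thirst", "frequent_urination", "chest_pain", "joint_pain"]

# Hypothetical toy data: one row per patient, 1 = symptom reported.
TOY_ROWS = [
    ([1, 1, 1, 0, 0], "Diabetes"),
    ([0, 1, 1, 0, 0], "Diabetes"),
    ([1, 0, 0, 1, 0], "Heart disease"),
    ([0, 0, 0, 1, 0], "Heart disease"),
    ([1, 0, 0, 0, 1], "Arthritis"),
    ([0, 0, 0, 0, 1], "Arthritis"),
]


def train_symptom_model(rows=TOY_ROWS, seed=0):
    """Fit a Random Forest on 0/1 symptom columns with the disease name as the label."""
    X = pd.DataFrame([flags for flags, _ in rows], columns=SYMPTOMS)
    y = [disease for _, disease in rows]
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    return model


def predict_disease(model, selected_symptoms):
    """Turn a list of symptom names into a one-row 0/1 frame and return the predicted disease."""
    row = pd.DataFrame([[int(s in selected_symptoms) for s in SYMPTOMS]], columns=SYMPTOMS)
    return model.predict(row)[0]


if __name__ == "__main__":
    model = train_symptom_model()
    print(predict_disease(model, ["excessive_thirst", "frequent_urination"]))
```

With only six rows this model is useful purely as a demonstration; a real evaluation would use the held-out test split described in Chapter 8.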

Integration of the model with front end web implementation
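A minimal sketch of a Streamlit front end is shown below. It assumes a model trained on the same hypothetical symptom columns was saved beforehand with `pickle` to a file named `model.pkl`; that file name, the symptom list, and the helper names are assumptions. The Streamlit import is optional so that the helper functions can be reused and tested without Streamlit installed.

```python
import pickle

try:
    import streamlit as st
except ImportError:  # lets the helpers below be imported without Streamlit installed
    st = None

SYMPTOMS = ["fatigue", "excessive_thirst", "frequent_urination", "chest_pain", "joint_pain"]
MODEL_PATH = "model.pkl"  # hypothetical file written earlier with pickle.dump(model, f)


def to_feature_row(selected):
    """Encode the selected symptom names as the 0/1 vector the model was trained on."""
    return [int(name in selected) for name in SYMPTOMS]


def result_message(disease, selected):
    return f"Predicted condition: {disease} (based on {len(selected)} selected symptom(s))"


def load_model(path=MODEL_PATH):
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        return pickle.load(f)


def main():
    st.title("Chronic Disease Prediction")
    selected = st.multiselect("Select your symptoms", SYMPTOMS)
    if st.button("Predict"):
        if not selected:
            st.warning("Please select at least one symptom.")
        else:
            disease = load_model().predict([to_feature_row(selected)])[0]
            st.success(result_message(disease, selected))


if __name__ == "__main__" and st is not None:
    main()
```

Launched with `streamlit run app.py` (where `app.py` is a hypothetical file name), the page shows a multiselect box of symptoms and a Predict button; Streamlit reruns the script on every interaction, so the model is loaded only when the button is pressed.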



Dataset and Accuracy of the Model:

Dataset that the model is trained with.

100% accuracy of the Prediction Model.



User Interface and Web Implementation:

User Interface



Checking for symptoms and selecting them



Prediction of the model based on the selected symptoms.



CHAPTER 10
CONCLUSION



10. CONCLUSION
We set out to create a system that can predict disease on the basis of the symptoms
given to it. Such a system can reduce the rush at hospital OPDs and reduce
the workload on medical staff. We were successful in creating such a system and
used four different algorithms to do so.

On average, we achieved an accuracy of ~100%. Such a system can be largely
reliable for this task. While creating this system, we also added a way to store the data
entered by the user in a database, which can be used in future to help create
better versions of such a system.
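The database step mentioned above could be sketched with Python's built-in `sqlite3` module. The table layout, column names, and helper functions are hypothetical; the report does not specify which database was used.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the predictions table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               created_at TEXT NOT NULL,
               symptoms TEXT NOT NULL,
               predicted_disease TEXT NOT NULL
           )"""
    )
    return conn


def save_prediction(conn, symptoms, disease):
    # Parameterized query: user input is never spliced into the SQL string.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO predictions (created_at, symptoms, predicted_disease) VALUES (?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), ",".join(sorted(symptoms)), disease),
        )


def disease_counts(conn):
    """Number of stored predictions per disease, e.g. for a summary chart."""
    rows = conn.execute(
        "SELECT predicted_disease, COUNT(*) FROM predictions "
        "GROUP BY predicted_disease ORDER BY predicted_disease"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    conn = open_db()
    save_prediction(conn, ["joint_pain", "fatigue"], "Arthritis")
    save_prediction(conn, ["chest_pain"], "Heart disease")
    print(disease_counts(conn))  # {'Arthritis': 1, 'Heart disease': 1}
```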

Our system also has an easy-to-use interface. It also provides various visual
representations of the collected data and the results achieved.



11. REFERENCES

1. Google, www.google.com
2. GeeksforGeeks
3. scikit-learn documentation
4. Streamlit documentation
5. YouTube

