
Hepatitis Disease Prediction Using Machine Learning Techniques

Keerthi Pavanam – 19R21A0588, Naresh Thummala – 19R21A05B7, Mohammad Jahangeer Baba – 20R25A0509

MLR INSTITUTE OF TECHNOLOGY

Batch Number: 22PB7

Under the guidance of K. Swetha (Assistant Professor)

1. ABSTRACT

This research aims to identify the most effective tool for diagnosing and detecting hepatitis and for estimating the life expectancy of hepatitis patients. Medical diagnosis is a crucial and challenging task that calls for precise identification, and it is critical to diagnose and treat the disease as early as possible. The liver is a vital organ in the human body. Hepatitis, which causes inflammation of the liver, is one of the most serious diseases that impair liver function. Inflammation is the swelling that occurs when tissues in the body are damaged or infected. Hepatitis causes nearly one and a half million deaths worldwide every year. Diagnosis of hepatitis is primarily based on routine blood tests, but it is still very difficult for doctors because many factors must be considered during the diagnostic procedure. An automatic and accurate diagnostic system is therefore needed to detect hepatitis and to support physicians' decision-making. Machine Learning algorithms such as Support Vector Machines (SVM), K-Nearest Neighbours (KNN) and Artificial Neural Networks (ANN) will be considered as classifiers and predictive tools for the diagnosis of hepatitis.

2. INTRODUCTION

2.1 Domain Specific:
Hepatitis is considered a major chronic liver disease worldwide. The liver is considered the heaviest internal organ and one of the largest organs of the human body. It is a vital organ that performs various functions, including bile secretion, protein formation and the removal of toxins from the body.
Therefore, inflammation of the liver (caused by hepatitis) impairs liver function and, as a result, a person's health deteriorates. Symptoms of hepatitis vary from person to person, and some people have no symptoms at all. Known symptoms include yellowing of the eyes and skin, abdominal pain, loss of appetite and fatigue. Depending on its duration, hepatitis can be acute or chronic: if it lasts less than 6 months it is acute, and if it lasts more than half a year it is chronic.

2.2 Problem Statement:
Hepatitis is reported to kill more than 1 million people annually. Diagnosing hepatitis by conventional methods is difficult and requires expensive medical tests. Diagnosis of such diseases by intelligent systems, on the other hand, would reduce costs and allow patients to be examined in less time. The development of intelligent diagnostic systems for predicting this kind of disease is therefore of great importance.

3. LITERATURE SURVEY

An extensive literature review was performed by investigating existing systems for hepatitis prediction. Numerous studies, journals and publications were also consulted before this survey was formulated.

3.1 Existing System:
Summaries of the reviewed research papers are recorded below, in the same numerical order

that is used to identify them in the references at the end.

a) A. H. Roslina et al. [1] reported that hepatitis patients need uninterrupted treatment to reduce mortality. Support Vector Machines (SVM), a machine learning technique, can be used to perform classification and prediction. A wrapper method is included to remove noisy features before classification. Combining the feature selection and classification processes may yield better predictions.

b) Mehrbakhsh Nilashi et al. [2] proposed an accurate method for identifying hepatitis using an ensemble learning approach, applying different techniques to different tasks. In this research, a non-incremental ANFIS (Adaptive Neuro-Fuzzy Inference System) was implemented to learn the classification models. Because this ANFIS-based method does not support incremental learning, all of the training data must be recomputed whenever the prediction models are constructed. Accordingly, to reduce the computation time of hepatitis diagnosis, the authors suggest extending the method so that the training models are updated incrementally when new information becomes available.

c) The hepatitis diagnosis discussed by Kemal Polat et al. [3] was performed using a hybrid machine learning algorithm on hepatitis data collected from the UCI repository. The whole process has three stages. In the first stage, the C4.5 decision tree algorithm is used as a feature selection subroutine to reduce the number of features in the hepatitis dataset. The data set is then normalized to the range [0, 1] and weighted with a fuzzy weighting pre-processor. Finally, an Artificial Immune Recognition System (AIRS) classifier is used to classify the weighted input values resulting from the fuzzy weighted pre-processing.

d) Paper [4] presented a hybrid medical decision support system for identifying hepatitis based on Extreme Learning Machines (ELM) and Rough Sets (RS), called RS-ELM. This hybrid model consists of two phases and was tested on datasets from the UCI machine learning repository. In the first phase, the RS approach is used to remove redundant features, along with missing values, from the dataset. In the second phase, ELM classifies the data using the remaining features. Ultimately, this improves the classification accuracy.

e) In their study, Sasmita Nayak et al. [5] examine the role of classification in predicting hepatitis. In the training phase, a classification model is built by analysing training data that includes class labels; the testing data set also includes class labels. The algorithms used were GRNN, RBF and RBEF, and the RBF and RBEF algorithms achieved an accuracy of 96.77%.

f) Xiaolu Tian et al. [6] carried out research to predict the Hepatitis B surface antigen. Their model used logistic regression, decision tree, random forest and XGBoost, and XGBoost achieved a high accuracy rate. The reported results were: logistic regression 1.00, decision tree 0.97, random forest 0.99 and XGBoost 0.98.

g) Surabhi Mali et al. [7] developed an application to predict hepatitis mortality. The authors used machine learning classification

algorithms such as SVM, KNN and ANN.

h) P. Mohan Ganesh et al. [8] proposed a model to predict hepatitis. The authors used classifiers such as SVM, KNN and ANN. Of the three algorithms, ANN was the most efficient, with an accuracy of 70% in real-time prediction.

i) Shantanu Mishra et al. [9] used many classification algorithms, such as Decision Trees, KNN, SVM, Extra Trees, LightGBM and AdaBoost. Of these algorithms, LightGBM was the most efficient, with an accuracy of 94.11%.

3.2 Disadvantages/Limitations in the Existing Systems:
• A few methods were non-incremental: the entire model has to be recomputed whenever a new dataset is added.
• Lower accuracy rates: we found that only the SVM, KNN and ANN algorithms gave good accuracy, whereas the rest of the models were not efficient in real time.
• Smaller datasets: most of the proposed works used only the dataset from the UCI repository. This dataset is quite small, as it contains only 416 instances of chronic liver disease patients.
• Algorithms could not deal with data imbalance: logistic and linear regression models classified the instances into the negative class because of the imbalance in the data.
• A few models gave accurate results only for smaller datasets

and failed when the volume of the datasets was increased.

3.3 Proposed Model:
• In the proposed model, we will use SVM, KNN and ANN for classification, as these algorithms have proved to be efficient and accurate in real time and can cope with increased volumes of data.
• The accuracy of the classifiers plays a vital role in predicting the disease. We shall develop a model with improved accuracy, try to avoid overfitting, and address the problems of data redundancy, class imbalance, missing values, etc.
• Besides predicting the disease, we shall add a life expectancy factor based on the results, and will also display preventive measures and information about the disease.
• Increased volume of data: we have considered datasets from two sources, the UCI repository and ILPD. The total number of instances after combining both datasets is 738.
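The proposed model above aims to handle class imbalance and missing values. The following is a minimal sketch of one common way to do this in scikit-learn. It is not the project's code. make_classification generates a synthetic, imbalanced stand-in for the combined 738-row dataset, and the roughly 5% missing values are injected artificially. Mean imputation and SVC(class_weight="balanced") are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for the combined dataset (assumption):
# 738 rows, roughly 80% of them in the majority class.
X, y = make_classification(n_samples=738, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Simulate missing values by blanking out about 5% of the entries.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Stratified 80/20 split keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Impute missing values -> scale features -> SVM.
# class_weight="balanced" up-weights errors on the minority class.
model = make_pipeline(SimpleImputer(strategy="mean"),
                      StandardScaler(),
                      SVC(class_weight="balanced"))
model.fit(X_train, y_train)

score = balanced_accuracy_score(y_test, model.predict(X_test))
print(f"Balanced accuracy: {score:.2f}")
```

The sketch reports balanced accuracy because plain accuracy can look high on imbalanced data even when a model assigns every instance to the majority class. This is the failure mode described for the regression models in 3.2.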
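Paper [1] in the literature survey combines a wrapper feature-selection step with SVM classification. The following sketch shows the general wrapper idea using scikit-learn's SequentialFeatureSelector. Forward selection scores each candidate feature subset by cross-validated SVM accuracy. This is not the exact procedure of [1]. The synthetic 155 × 19 data and the target of 5 features are assumptions chosen only to resemble the size of the UCI hepatitis table.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 155 rows, 19 features (assumption).
X, y = make_classification(n_samples=155, n_features=19, n_informative=5,
                           random_state=1)

svm = make_pipeline(StandardScaler(), SVC())

# Wrapper method: greedily add the feature that most improves the
# cross-validated SVM accuracy, stopping once 5 features are chosen.
selector = SequentialFeatureSelector(svm, n_features_to_select=5,
                                     direction="forward", cv=5)
selector.fit(X, y)
X_selected = selector.transform(X)

print("Selected feature indices:", selector.get_support(indices=True))
print("CV accuracy, all features:", cross_val_score(svm, X, y, cv=5).mean().round(3))
print("CV accuracy, selected:    ", cross_val_score(svm, X_selected, y, cv=5).mean().round(3))
```

The features are selected on the same data that is used for the final comparison, so the second score is optimistic. Nested cross-validation would give an unbiased estimate.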

4. PROPOSED MODEL

4.1 System Architecture:


The system architecture gives an overview of the working of the system.

DATASET:
1) Collection of patients' data: The patients' data is obtained from two sources:
a) UCI Machine Learning Repository: a CSV file that consists of 155 instances of patients with chronic liver disease, aged 7 to 78 years. The dataset consists of 20 attributes, with two classes: LIVE or DIE.
b) ILPD (Indian Liver Patient Database): information about patients from the north-east of Andhra Pradesh. It consists of 416 liver patient records and 167 non-liver patient records. Of these 583 records, 441 are of male patients and 142 are of female patients.

2) Training dataset: 80% of the dataset is used as the training dataset.

3) Testing dataset: 20% of the dataset is used as the testing dataset.

4) Data pre-processing: Data pre-processing is an important step because collected data is usually incomplete, redundant, noisy and inconsistent. It will be carried out in several steps: data cleaning by handling missing values, data transformation by standardization or normalization, and data reduction.

5) EDA and feature extraction:
EDA: checks whether the dataset suffers from class imbalance.
Feature extraction: the process of transforming raw data into numerical features that can be processed while preserving the information in the original data set. It yields better results than applying machine learning directly to the raw data. Because the two datasets have different numbers of attributes, feature extraction must be done to form new attributes without much variance.

Machine Learning Algorithms / Classification Techniques:

1) SVM:
Pseudo code:
• Import the necessary libraries
• Load data from the CSV file
• Check the distribution of classes
• Remove unwanted columns
• Divide the data into train/test datasets
• Model (SVM with scikit-learn)
• Evaluate

2) KNN:
• Import the necessary libraries
• Read the data
• Split the dataset into values and labels
• Divide the data into train/test datasets using train_test_split, available in scikit-learn, so that the model is evaluated on unseen data and overfitting can be detected
• Scale the features so that they are evaluated uniformly
• Import the KNN classifier and specify the number of neighbours
• Make predictions on the test data
• Evaluate

3) ANN:
• Load the necessary libraries
• Import the data
• Define the classes (LIVE or DIE)
• Split the data into train and test datasets using the train_test_split function
• Define the ANN classification model
• Apply the ANN
• Test on the test dataset

4.2 UML Diagrams:

a) Use-Case Diagram:

b) Class Diagram:

c) Collaboration UML Diagram:

d) Sequence UML Diagram:

e) Package UML Diagram:

f) Use-case UML Diagram:
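The pseudo code in 4.1 uses the same outline for all three classifiers: load the data, split it 80/20 with train_test_split, scale the features, fit and evaluate. The following is a compact sketch of that outline in scikit-learn. It is an illustration under stated assumptions, not the project's code. make_classification produces a synthetic 738-row table in place of the merged UCI/ILPD CSV, because the file name and columns are not given here. The hyperparameters (k = 5 neighbours, one hidden layer of 16 units for the ANN) are hypothetical choices. train_test_split rounds the test share up, so 20% of 738 rows gives 148 test rows and 590 training rows.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the merged dataset: 738 rows with a binary label
# (e.g. LIVE = 0 / DIE = 1). Replace with the real CSV in practice.
X, y = make_classification(n_samples=738, n_features=10, random_state=42)

# 80% training / 20% testing split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 590 148

# Each classifier is paired with StandardScaler, so the scaling
# statistics are learned from the training data only.
classifiers = {
    "SVM": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42),
}

results = {}
for name, clf in classifiers.items():
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {results[name] * 100:.1f}% accuracy")
```

Printing the accuracy as a percentage matches the functional requirement to report the percentage accuracy of the prediction.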

5. SYSTEM REQUIREMENTS

5.1 Hardware and Software Requirements:

Hardware requirements:
• Processor: any processor above 500 MHz
• RAM: 4 GB
• Hard disk: 10 GB
• Input device: standard keyboard and mouse
• Output device: VGA and high-resolution monitor

Software requirements:
• Operating system: Windows 10
• Anaconda with Python 3
• Spyder
• Jupyter Notebook
• Scikit-learn library

5.2 Functional and Non-functional Requirements:

Functional Requirements:
• Understand all the features as well as the data provided in the dataset.
• Map the data in the dataset to the given input data, and find patterns, if any, in both the dataset and the input data.
• Check whether the input data of a patient results in a diagnosis of hepatitis or not.
• If hepatitis is diagnosed, provide information about the diagnosis.
• Provide the percentage accuracy of the proposed prediction.

Non-Functional Requirements:
• Accessibility: whether the product or software is publicly available and how easily it can be accessed.
• Maintainability: the ease with which a software, tool or system can be changed to fix bugs or meet new requirements.
• Scalability: the system should perform well even in situations such as low bandwidth and huge data sets.
• Portability: the ability to easily reuse existing code when moving from one location or environment to another.

6. MODULE DESIGN

7. REFERENCES

1. H. Roslina and A. Noraziah, "Prediction of hepatitis prognosis using Support Vector Machines and Wrapper Method," 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, Yantai, 2010, pp. 2209-2211. doi: 10.1109/FSKD.2010.5569542
2. Mehrbakhsh Nilashi, Hossein Ahmadi, Leila Shahmoradi, Othman Ibrahim, Elnaz Akbari, "A predictive method for hepatitis disease diagnosis using ensembles of neuro-fuzzy technique," Journal of Infection and Public Health, Volume 12, Issue 1, 2019, Pages 13-20.
3. Kemal Polat, Salih Güneş, "A hybrid approach to medical decision support systems: Combining feature selection, fuzzy weighted pre-processing and AIRS," Computer Methods and Programs in Biomedicine, Volume 88, Issue 2, 2007, Pages 164-174.
4. Yılmaz Kaya, Murat Uyar, "A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease," Applied Soft Computing, Volume 13, Issue 8, 2013, Pages 3429-3438.
5. https://www.researchgate.net/publication/342987994_Analysis_of_Infectious_Hepatitis_Disease_with_High_Accuracy_Using_Machine_Learning_Techniques
6. https://www.hindawi.com/journals/cmmm/2019/6915850/
7. https://www.studocu.com/in/document/savitribai-phule-pune-university/information-technology/project-oral-report/29289882
8. https://www.jetir.org/papers/JETIR2005111.pdf
9. https://www.psychosocial.com/wp-content/uploads/2021/06/PR320062.pdf