Razi AML Assignment2

Uploaded by

ahmed.razi98989

Razi_AML_Assignment2

August 23, 2021

1 ASSIGNMENT 2 - AML

2 Step 1: Download the liver patient data from the following source:

https://www.kaggle.com/uciml/indian-liver-patient-records

3 Step 2: Use the following seven features from this dataset

Age, Total_Bilirubin, Direct_Bilirubin, Alkaline_Phosphotase, Alamine_Aminotransferase, Total_Protiens, Albumin

4 Step 3: Your task is to predict whether a patient suffers from a liver disease using the above features.

Split your data into train and test sets. First use a random forest algorithm for this task.
Then use an AdaBoost classifier for the same task. Compare the accuracy of the two
algorithms.
[35]: #Import the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

[36]: #Read the data from csv

df = pd.read_csv('indian_liver_patient.csv')
df.head()

[36]: Age Gender Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \

0 65 Female 0.7 0.1 187
1 62 Male 10.9 5.5 699
2 62 Male 7.3 4.1 490
3 58 Male 1.0 0.4 182
4 72 Male 3.9 2.0 195

Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens \

0 16 18 6.8
1 64 100 7.5
2 60 68 7.0
3 14 20 6.8
4 27 59 7.3

Albumin Albumin_and_Globulin_Ratio Dataset

0 3.3 0.90 1
1 3.2 0.74 1
2 3.3 0.89 1
3 3.4 1.00 1
4 2.4 0.40 1

[37]: # Drop the feature columns that Step 2 does not ask for

df.drop(columns=['Gender', 'Aspartate_Aminotransferase', 'Albumin_and_Globulin_Ratio'],
        inplace=True)

df.head()

[37]: Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \

0 65 0.7 0.1 187
1 62 10.9 5.5 699
2 62 7.3 4.1 490
3 58 1.0 0.4 182
4 72 3.9 2.0 195

Alamine_Aminotransferase Total_Protiens Albumin Dataset

0 16 6.8 3.3 1
1 64 7.5 3.2 1
2 60 7.0 3.3 1
3 14 6.8 3.4 1
4 27 7.3 2.4 1
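
Dropping the three unwanted columns works, but it relies on knowing every column in the file. A minimal sketch of the alternative, selecting the seven required features plus the label by name, is shown below. The two-row frame is a stand-in copied from the first rows of the `df.head()` output above, not the full dataset, and `FEATURES`, `TARGET`, and `select_features` are hypothetical names introduced for illustration.

```python
import pandas as pd

# The seven features named in Step 2, spelled as in the CSV header
FEATURES = ['Age', 'Total_Bilirubin', 'Direct_Bilirubin', 'Alkaline_Phosphotase',
            'Alamine_Aminotransferase', 'Total_Protiens', 'Albumin']
TARGET = 'Dataset'  # label column before it is renamed to 'Result'

def select_features(frame):
    """Return a copy holding only the seven assignment features and the label."""
    return frame[FEATURES + [TARGET]].copy()

# Stand-in for the CSV: the first two rows shown by df.head() above
raw = pd.DataFrame({
    'Age': [65, 62], 'Gender': ['Female', 'Male'],
    'Total_Bilirubin': [0.7, 10.9], 'Direct_Bilirubin': [0.1, 5.5],
    'Alkaline_Phosphotase': [187, 699], 'Alamine_Aminotransferase': [16, 64],
    'Aspartate_Aminotransferase': [18, 100], 'Total_Protiens': [6.8, 7.5],
    'Albumin': [3.3, 3.2], 'Albumin_and_Globulin_Ratio': [0.90, 0.74],
    'Dataset': [1, 1],
})

subset = select_features(raw)
print(subset.columns.tolist())
```

Selecting by name fails loudly with a `KeyError` if a required column is missing, which a `drop` call on the unwanted columns would not detect.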

[38]: print(df)

value = ['Age', 'Total_Bilirubin', 'Direct_Bilirubin', 'Alkaline_Phosphotase',
         'Alamine_Aminotransferase', 'Aspartate_Aminotransferase',
         'Albumin_and_Globulin_Ratio', 'Total_Protiens', 'Albumin']

Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \

0 65 0.7 0.1 187
1 62 10.9 5.5 699
2 62 7.3 4.1 490
3 58 1.0 0.4 182
4 72 3.9 2.0 195
.. … … … …
578 60 0.5 0.1 500
579 40 0.6 0.1 98
580 52 0.8 0.2 245
581 31 1.3 0.5 184
582 38 1.0 0.3 216

Alamine_Aminotransferase Total_Protiens Albumin Dataset

0 16 6.8 3.3 1
1 64 7.5 3.2 1
2 60 7.0 3.3 1
3 14 6.8 3.4 1
4 27 7.3 2.4 1
.. … … … …
578 20 5.9 1.6 2
579 35 6.0 3.2 1
580 48 6.4 3.2 1
581 29 6.8 3.4 1
582 21 7.3 4.4 2

[583 rows x 8 columns]

[39]: # Looking for missing values in the dataset

df.isna().sum()

[39]: Age 0
Total_Bilirubin 0
Direct_Bilirubin 0
Alkaline_Phosphotase 0

Alamine_Aminotransferase 0
Total_Protiens 0
Albumin 0
Dataset 0
dtype: int64

5 Analyze data frame shape, data types etc.

[40]: def analyze_dataframe(dataframe):
    print("\n Shape of df :: \n", dataframe.shape)
    print("\n Data types of df columns :: \n", dataframe.dtypes)
    print("\n Description of df :: \n", dataframe.describe())

analyze_dataframe(df)

Shape of df ::
(583, 8)

Data types of df columns ::

Age int64
Total_Bilirubin float64
Direct_Bilirubin float64
Alkaline_Phosphotase int64
Alamine_Aminotransferase int64
Total_Protiens float64
Albumin float64
Dataset int64
dtype: object

Description of df ::
Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \
count 583.000000 583.000000 583.000000 583.000000
mean 44.746141 3.298799 1.486106 290.576329
std 16.189833 6.209522 2.808498 242.937989
min 4.000000 0.400000 0.100000 63.000000
25% 33.000000 0.800000 0.200000 175.500000
50% 45.000000 1.000000 0.300000 208.000000
75% 58.000000 2.600000 1.300000 298.000000
max 90.000000 75.000000 19.700000 2110.000000

Alamine_Aminotransferase Total_Protiens Albumin Dataset

count 583.000000 583.000000 583.000000 583.000000
mean 80.713551 6.483190 3.141852 1.286449

std 182.620356 1.085451 0.795519 0.452490
min 10.000000 2.700000 0.900000 1.000000
25% 23.000000 5.800000 2.600000 1.000000
50% 35.000000 6.600000 3.100000 1.000000
75% 60.500000 7.200000 3.800000 2.000000
max 2000.000000 9.600000 5.500000 2.000000

[41]: # Rename the dataset column to result

df.rename(columns={'Dataset':'Result'}, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 583 entries, 0 to 582
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 583 non-null int64
1 Total_Bilirubin 583 non-null float64
2 Direct_Bilirubin 583 non-null float64
3 Alkaline_Phosphotase 583 non-null int64
4 Alamine_Aminotransferase 583 non-null int64
5 Total_Protiens 583 non-null float64
6 Albumin 583 non-null float64
7 Result 583 non-null int64
dtypes: float64(4), int64(4)
memory usage: 36.6 KB

[42]: # Having a look at the dataset after renaming the target column

df.head()

[42]: Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \

0 65 0.7 0.1 187
1 62 10.9 5.5 699
2 62 7.3 4.1 490
3 58 1.0 0.4 182
4 72 3.9 2.0 195

Alamine_Aminotransferase Total_Protiens Albumin Result

0 16 6.8 3.3 1
1 64 7.5 3.2 1
2 60 7.0 3.3 1
3 14 6.8 3.4 1
4 27 7.3 2.4 1

6 Data Preprocessing and Visualization

[43]: # Drop rows with missing values (the isna() check above found none, so this leaves df unchanged)

df = df.dropna()

[44]: # Having a look at the correlation matrix

fig, ax = plt.subplots(figsize=(9,7))
sns.heatmap(df.corr(), annot=True, fmt='.1g', cmap="Greens", cbar=False);

[45]: sns.pairplot(df, hue='Result')

[45]: <seaborn.axisgrid.PairGrid at 0x7f33f0d994f0>

[46]: def count_healthy_unhealthy_livers_and_plot(dataframe):
    # In this dataset, Result == 1 marks a liver patient and Result == 2 a non-patient
    print('Count of Unhealthy Livers : {} '.format(dataframe.Result.value_counts()[1]))
    print('Count of Healthy Livers : {} '.format(dataframe.Result.value_counts()[2]))

    # Visualize the number of patients diagnosed with liver disease
    sns.countplot(data=dataframe, x='Result')

count_healthy_unhealthy_livers_and_plot(df)

Count of Unhealthy Livers : 416

Count of Healthy Livers : 167

[47]: def pie_plot_draw():
    plt.pie(x=df["Result"].value_counts(),
            colors=["red", "green"],
            labels=["UnHealthy Liver", "Healthy Liver"],
            shadow=True)
    plt.show()

plt.style.use("seaborn")
fig, ax = plt.subplots(figsize=(7,7))
pie_plot_draw()

[48]: def histogram_plot_draw(col_name):
    sns.histplot(x=df[col_name], kde=True, color="blue")

histogram_plot_draw("Age")

[49]: # Feature matrix X: every column except the target
X = df.drop("Result", axis=1)
X.head()

[49]: Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \

0 65 0.7 0.1 187
1 62 10.9 5.5 699
2 62 7.3 4.1 490
3 58 1.0 0.4 182
4 72 3.9 2.0 195

Alamine_Aminotransferase Total_Protiens Albumin

0 16 6.8 3.3
1 64 7.5 3.2
2 60 7.0 3.3
3 14 6.8 3.4
4 27 7.3 2.4

[50]: # Target vector y

y = df["Result"]
y.head()
X

[50]: Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase \
0 65 0.7 0.1 187
1 62 10.9 5.5 699
2 62 7.3 4.1 490
3 58 1.0 0.4 182
4 72 3.9 2.0 195
.. … … … …
578 60 0.5 0.1 500
579 40 0.6 0.1 98
580 52 0.8 0.2 245
581 31 1.3 0.5 184
582 38 1.0 0.3 216

Alamine_Aminotransferase Total_Protiens Albumin

0 16 6.8 3.3
1 64 7.5 3.2
2 60 7.0 3.3
3 14 6.8 3.4
4 27 7.3 2.4
.. … … …
578 20 5.9 1.6
579 35 6.0 3.2
580 48 6.4 3.2
581 29 6.8 3.4
582 21 7.3 4.4

[583 rows x 7 columns]

7 Feature Selection

[51]: from sklearn.ensemble import ExtraTreesClassifier

from sklearn.feature_selection import SelectFromModel
clf = ExtraTreesClassifier(n_estimators=50)
clf = clf.fit(X, y)
print("Showing feature importance values")
print(clf.feature_importances_)

Showing feature importance values

[0.15332017 0.14923714 0.11684129 0.1573351 0.16197282 0.12395798
0.13733548]

[52]: # Keep the features whose importance is at least the mean importance
# (the default SelectFromModel threshold for a tree ensemble)
model = SelectFromModel(clf, prefit=True)

cols = X.columns.to_list()  # list of column names

tf = model.get_support()  # boolean mask marking the selected features

selectedcols = []
for i in range(len(cols)):
    if tf[i]:
        selectedcols.append(cols[i])
print("showing selected columns")
print(selectedcols)
# Reduce X to the selected columns
X_new = model.transform(X)
X_new.shape

showing selected columns

['Age', 'Total_Bilirubin', 'Alkaline_Phosphotase', 'Alamine_Aminotransferase']

[52]: (583, 4)
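
The selection above is fitted on all 583 rows before the train/test split, so the rows that later form the test set influence which features are kept. A minimal sketch of one way to avoid that is to place the selector and the classifier in a scikit-learn `Pipeline` that is fitted on the training rows only. The data here is synthetic (`make_classification`), standing in for the liver dataset, and the `random_state` values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in: 583 rows, 7 features, roughly 71% / 29% class balance
X_demo, y_demo = make_classification(n_samples=583, n_features=7, n_informative=4,
                                     n_redundant=1, weights=[0.71, 0.29],
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42)

pipe = Pipeline([
    # The selector is fitted inside the pipeline, so it only sees training rows
    ('select', SelectFromModel(ExtraTreesClassifier(n_estimators=50, random_state=0))),
    ('model', RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipe.fit(X_tr, y_tr)

kept = pipe.named_steps['select'].get_support()
test_accuracy = pipe.score(X_te, y_te)
print("features kept:", int(kept.sum()), "of", kept.size)
print("test accuracy:", test_accuracy)
```

Fixing `random_state` on the `ExtraTreesClassifier` also makes the importances, and therefore the selected columns, repeatable from run to run.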

8 Splitting the data into training and test datasets

Here, we are trying to predict whether the patient has an Unhealthy Liver or not using the given
data.
[53]: from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_new, y, test_size=0.2,
                                                    random_state=42)

[54]: # Scaling the data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

[55]: len(X_train), len(X_test)

[55]: (466, 117)
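
The split above is not stratified, so with 416 rows of class 1 and 167 of class 2 the class mix of the 117 test rows depends on the shuffle. The classification reports later in this notebook show 87 of the 117 test rows (about 74%) in class 1, against about 71% overall. A minimal sketch of a stratified split follows, assuming placeholder features and labels built only from the reported class counts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels with the class counts reported above: 416 of class 1, 167 of class 2
y_demo = np.array([1] * 416 + [2] * 167)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)  # placeholder feature column

# stratify=y_demo keeps the class proportions the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

overall_share = (y_demo == 1).mean()
test_share = (y_te == 1).mean()
print("share of class 1 overall:", overall_share)
print("share of class 1 in test:", test_share)
```

With stratification the test share of class 1 stays close to the overall share, which makes accuracy figures from different splits easier to compare.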

9 Random Forest Classifier

[56]: from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(n_estimators = 100)
rfc.fit(X_train,y_train)

[56]: RandomForestClassifier()

[57]: RandomForestClassifierScore = rfc.score(X_test, y_test)
print("Accuracy obtained by Random Forest Classifier model:", RandomForestClassifierScore*100)

Accuracy obtained by Random Forest Classifier model: 74.35897435897436
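
The forest above is created without a `random_state`, so rerunning the cell can give a slightly different accuracy. A minimal sketch of how a fixed seed makes the result repeatable is shown below. It uses synthetic data rather than the liver dataset, and `fit_and_score` and the seed value 0 are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the liver dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=4, n_informative=3,
                                     n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42)

def fit_and_score(seed):
    """Train a 100-tree forest with a fixed seed; return test probabilities and accuracy."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_tr, y_tr)
    return forest.predict_proba(X_te), forest.score(X_te, y_te)

proba_a, acc_a = fit_and_score(0)
proba_b, acc_b = fit_and_score(0)

# Same seed and same data: the two forests agree exactly
print("identical predictions:", np.array_equal(proba_a, proba_b))
print("accuracy:", acc_a, acc_b)
```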

[58]: # Having a look at the confusion matrix

from sklearn.metrics import classification_report,confusion_matrix
y_pred_rfc = rfc.predict(X_test)
cf_matrix = confusion_matrix(y_test, y_pred_rfc)
sns.heatmap(cf_matrix, annot=True, cmap="Spectral")
plt.title("Confusion Matrix for Random Forest Classifier", fontsize=14,
          fontname="DejaVu Sans", y=1.03);

[59]: # Classification report of Random Forest Classifier

print(metrics.classification_report(y_test, y_pred_rfc))

precision recall f1-score support

1 0.82 0.84 0.83 87
2 0.50 0.47 0.48 30

accuracy 0.74 117

macro avg 0.66 0.65 0.66 117
weighted avg 0.74 0.74 0.74 117

10 Ada Boost Classifier

[60]: bdt = AdaBoostClassifier()

bdt.fit(X_train, y_train)

[60]: AdaBoostClassifier()

[61]: AdaBoostClassifierScore = bdt.score(X_test, y_test)

print("Accuracy obtained by Ada Boost Classifier model:", AdaBoostClassifierScore*100)

Accuracy obtained by Ada Boost Classifier model: 77.77777777777779

[62]: # Confusion matrix

y_pred_bdt = bdt.predict(X_test)
cf_matrix = confusion_matrix(y_test, y_pred_bdt)
sns.heatmap(cf_matrix, annot=True, cmap="Spectral")
plt.title("Confusion Matrix for Ada Boost Classifier", fontsize=14,
          fontname="DejaVu Sans", y=1.03);

[63]: # Classification Report of Ada Boost Classifier

print(metrics.classification_report(y_test, y_pred_bdt))

precision recall f1-score support

1 0.85 0.85 0.85 87

2 0.57 0.57 0.57 30

accuracy 0.78 117

macro avg 0.71 0.71 0.71 117
weighted avg 0.78 0.78 0.78 117

11 K-Nearest Neighbors Classifier

[64]: from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

[64]: KNeighborsClassifier(n_neighbors=3)

[65]: KNeighborsClassifierScore = knn.score(X_test, y_test)

print("Accuracy obtained by KNN Classifier model:", KNeighborsClassifierScore*100)

Accuracy obtained by KNN Classifier model: 70.94017094017094

[66]: y_pred_knn = knn.predict(X_test)

cf_matrix = confusion_matrix(y_test, y_pred_knn)
sns.heatmap(cf_matrix, annot=True, cmap="Spectral")
plt.title("Confusion Matrix for KNN Classifier", fontsize=14,
          fontname="DejaVu Sans", y=1.03);

[67]: # Classification Report of KNN Classifier

print(metrics.classification_report(y_test, y_pred_knn))

precision recall f1-score support

1 0.80 0.80 0.80 87

2 0.43 0.43 0.43 30

accuracy 0.71 117

macro avg 0.62 0.62 0.62 117
weighted avg 0.71 0.71 0.71 117
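
The accuracy figures can be cross-checked against the three classification reports, because overall accuracy is the support-weighted average of the per-class recall. The sketch below redoes that arithmetic with the recall and support values printed in the reports. The reports round recall to two decimals, so the sketch rounds each class's count of correct predictions to the nearest integer, which is an assumption; `accuracy_from_report` is a hypothetical helper name.

```python
# Per-class (recall, support) pairs copied from the three classification reports
reports = {
    "Random Forest": [(0.84, 87), (0.47, 30)],
    "AdaBoost":      [(0.85, 87), (0.57, 30)],
    "KNN":           [(0.80, 87), (0.43, 30)],
}

def accuracy_from_report(rows):
    """Recover accuracy as (correct predictions) / (test rows) from recall and support."""
    correct = sum(round(recall * support) for recall, support in rows)
    total = sum(support for _, support in rows)
    return correct / total

accuracies = {name: accuracy_from_report(rows) for name, rows in reports.items()}
for name, acc in accuracies.items():
    print(f"{name}: {acc:.4f}")

# Accuracy of always predicting the majority class (class 1) on this test split
baseline = 87 / 117
print(f"Majority-class baseline: {baseline:.4f}")
```

The recovered values agree with the printed accuracies (74.36%, 77.78%, and 70.94%). They also show that, on this split, the random forest's accuracy equals the majority-class baseline of 87/117, AdaBoost is above it, and KNN is below it, so accuracy alone says little about how well class 2 is detected.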

12 Plot and Compare Results of Classifiers

[68]: def plot_and_compare_results():
    plt.style.use("seaborn")
    classifiers = ["Ada Boost Classifier",
                   "KNN Classifier",
                   "Random Forest Classifier"]

    scores = [AdaBoostClassifierScore,
              KNeighborsClassifierScore,
              RandomForestClassifierScore]

    fig, ax = plt.subplots(figsize=(8,6))
    sns.barplot(x=classifiers, y=scores);
    plt.ylabel("Model Accuracy")
    plt.xticks(rotation=60)
    plt.title("Model Comparison - Model Accuracy", fontsize=15,
              fontname="DejaVu Sans", y=1.04);

plot_and_compare_results()
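
The comparison above rests on a single 117-row test split, where the gap between AdaBoost (77.78%) and the random forest (74.36%) amounts to four test rows. A minimal sketch of a more stable comparison using stratified 5-fold cross-validation is given below. It runs on synthetic data from `make_classification`, not the liver dataset, so its numbers say nothing about the real result, and the fold count and seeds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in: 583 rows, 4 features, roughly 71% / 29% class balance
X_demo, y_demo = make_classification(n_samples=583, n_features=4, n_informative=3,
                                     n_redundant=0, weights=[0.71, 0.29],
                                     random_state=0)

# Every model is scored on the same five stratified folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

cv_scores = {name: cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
             for name, model in models.items()}
for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting the mean and standard deviation across folds shows whether a difference between two models is larger than the fold-to-fold variation.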
