Aih Lab1

The document outlines an experiment on regression analysis using healthcare datasets, focusing on linear and logistic regression techniques. It details the objectives, outcomes, system requirements, theoretical background, types of regression, and algorithms used for analysis. The conclusion emphasizes the effectiveness of these models in predicting medical outcomes and enhancing healthcare delivery.

Uploaded by

Komal Tarachandani


Sardar Patel Institute of Technology, Mumbai

Department of Electronics and Telecommunication Engineering

B.E. Sem-VII- PE-IV (2024-2025)
IT 24 - AI in Healthcare

Experiment 1: Regression in Healthcare Dataset

Name: Sanika Tiwarekar Date: 12/08/2024

Objective:

● Write a program for regression analysis on a healthcare dataset.

● Demonstrate the working principle of regression techniques on a medical dataset by building a model that classifies or predicts the outcome for a new sample.

Outcomes:
● Explore a medical dataset suitable for a linear/logistic regression problem

● Explore the patterns in the dataset and apply a suitable algorithm

System Requirements:
Linux OS with Python and its libraries or R, or Windows with MATLAB
Theory:
What is regression with a mathematical approach?

Regression is a statistical method, used in finance, investing, and other disciplines, that attempts
to determine the strength and character of the relationship between a dependent variable and one or
more independent variables. Linear regression is the most common form of this technique. With a
single independent variable it is called simple linear regression, and it is usually fitted by
ordinary least squares (OLS), which chooses the line that minimizes the sum of squared differences
between the observed and predicted values. Graphically, linear regression is a straight line of
best fit whose slope shows how much the dependent variable changes for a unit change in the
independent variable. The y-intercept is the value of the dependent variable when the independent
variable is zero. Nonlinear regression models also exist, but they are more complex.

Consider a model where y depends linearly on x. The hypothesis then takes the form of a straight
line, y = mx + c, or equivalently h(x) = θ₀ + θ₁x, where θ₀ (the intercept c) and θ₁ (the slope m)
are called the regression coefficients.
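
As a minimal sketch of how θ₀ and θ₁ can be estimated by least squares, the snippet below uses the standard closed-form formulas for a single predictor. The data is synthetic and the true values (intercept 3, slope 2) are assumptions made only for illustration; they are not taken from the experiment's datasets.

```python
import numpy as np

# Hypothetical toy data: y is roughly 3 + 2x plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(scale=0.5, size=x.size)

# Closed-form least-squares estimates for one predictor:
#   theta1 = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   theta0 = mean_y - theta1 * mean_x
x_mean, y_mean = x.mean(), y.mean()
theta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
theta0 = y_mean - theta1 * x_mean

print(f"intercept (theta0) = {theta0:.3f}, slope (theta1) = {theta1:.3f}")
```

The estimates should land close to the assumed intercept and slope, with the gap shrinking as the noise decreases or the sample grows.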
The two basic types of regression are simple linear regression and multiple linear regression,
although there are nonlinear regression methods for more complicated data and analysis. Simple
linear regression uses one independent variable to explain or predict the outcome of the
dependent variable Y, while multiple linear regression uses two or more independent variables to
predict the outcome. Analysts can use stepwise regression to examine each independent variable
contained in the linear regression model.
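
Multiple linear regression extends the same idea to several predictors. The following is a sketch under stated assumptions: the two predictors, their coefficients (2.0 and -0.7), and the intercept (1.5) are made up, and `np.linalg.lstsq` is used here as one convenient way to obtain the least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))  # two hypothetical independent variables
y = 1.5 + 2.0 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the first coefficient acts as the intercept.
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ theta ~ y.
theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("theta0, theta1, theta2 =", np.round(theta, 3))
```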

What are the types of regression and their significance?

There are several types of regression techniques, each suited for different types of data and
different types of relationships. The main types of regression techniques are:
1. Linear Regression - Linear regression is a linear approach for modeling the relationship
between a scalar response (the criterion) and one or more predictors or explanatory
variables. Linear regression focuses on the conditional probability distribution of the
response given the values of the predictors. When the number of predictors is large
relative to the number of samples, linear regression is in danger of overfitting.
2. Polynomial Regression - This is used to model a non-linear relationship between the
dependent variable and the independent variables. Here the input variables also include
polynomial (higher-degree) terms of existing features, such as x² or x³.
3. Stepwise Regression - This fits a regression model by choosing the predictive variables
through an automatic procedure. At each step, a variable is added to or removed from the
set of explanatory variables. The approaches for stepwise regression are forward
selection, backward elimination, and bidirectional elimination.
4. Decision Tree Regression - A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch represents an outcome of the test,
and each leaf node (terminal node) holds a prediction: a class label in classification, or a
numeric value in regression. It is a non-parametric method for predicting a continuous
outcome.
5. Random Forest Regression - The basic idea behind this is to combine multiple decision
trees in determining the final output rather than relying on individual decision trees.
Random Forest has multiple decision trees as base learning models. We randomly
perform row sampling and feature sampling from the dataset, forming a sample dataset for
every model. The row sampling, done with replacement, is called bootstrapping. For
regression, the final output is the average of the trees' predictions.
6. Support Vector Regression - SVR can use both linear and non-linear kernels. A linear
kernel is a simple dot product between two input vectors, while a non-linear kernel is a
more complex function that can capture more intricate patterns in the data. The choice of
kernel depends on the data’s characteristics and the task’s complexity.
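
To make the differences between these techniques concrete, the sketch below fits several of them to the same synthetic non-linear data and compares their R² scores on a held-out split. The data (a noisy sine curve), the hyperparameters (`degree=3`, `max_depth=4`, `n_estimators=100`), and the model names in the dictionary are all assumptions chosen for illustration. Stepwise regression is left out because it is a variable-selection procedure rather than a separate model family.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Hypothetical non-linear data: y = sin(x) plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "polynomial (degree 3)": make_pipeline(
        PolynomialFeatures(degree=3), LinearRegression()
    ),
    "decision tree": DecisionTreeRegressor(max_depth=4, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR (RBF kernel)": SVR(kernel="rbf"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out split
    print(f"{name:>22}: R^2 = {scores[name]:.3f}")
```

On data like this, a straight line cannot follow the curve, so the non-linear models would be expected to score higher. On data that really is linear, the ranking can reverse.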

Dataset:
1. Linear Regression - Medical Insurance Dataset
(https://www.kaggle.com/datasets/mirichoi0218/insurance)
2. Logistic Regression - Heart Disease Dataset
(https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction)

Algorithm:
Step 1: Load a dataset with multiple independent variables and one dependent
variable (Y).
Step 2: Split the data into training and testing sets using the train_test_split function.
Step 3: Create the regression model and fit it to the training data.
Step 4: Make predictions on the test set.
Step 5: Evaluate the model using metrics like Mean Absolute Error, Mean Squared Error,
and Root Mean Squared Error.
Step 6: Finally, print the coefficients and intercept of the regression equation.
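
The three error metrics named in Step 5 can be written out directly from their definitions. The sketch below uses four made-up true values and predictions (not results from this experiment) so that the arithmetic can be checked by hand.

```python
import numpy as np

# Hypothetical true values and predictions, chosen only for illustration.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred           # [0.5, 0.0, -1.5, -1.0]
mae = np.mean(np.abs(errors))      # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
mse = np.mean(errors ** 2)         # (0.25 + 0 + 2.25 + 1) / 4 = 0.875
rmse = np.sqrt(mse)                # square root of the MSE, about 0.935

print(f"MAE = {mae:.3f}, MSE = {mse:.3f}, RMSE = {rmse:.3f}")
```

RMSE is in the same units as the target, which makes it easier to interpret than MSE. Because the errors are squared before averaging, both penalize large errors more heavily than MAE does.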

Code:
1. Linear Regression
import numpy as np
import opendatasets as od
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split as holdout
from sklearn.preprocessing import LabelEncoder

# Download the insurance dataset from Kaggle (asks for Kaggle credentials)
od.download('https://www.kaggle.com/datasets/mirichoi0218/insurance')

df = pd.read_csv('/content/insurance/insurance.csv')
df.head()
df.describe()

# Mark the text columns as categorical
df[['sex', 'smoker', 'region']] = df[['sex', 'smoker', 'region']].astype('category')
df.dtypes

# Encode each categorical column as integer codes
label = LabelEncoder()
for col in ['sex', 'smoker', 'region']:
    df[col] = label.fit_transform(df[col])
df.dtypes

# Separate the features from the target and hold out 20% for testing
x = df.drop(['charges'], axis=1)
y = df['charges']
x_train, x_test, y_train, y_test = holdout(x, y, test_size=0.2, random_state=0)

# Fit the model on the training data
Lin_reg = LinearRegression()
Lin_reg.fit(x_train, y_train)

# Predict on the test set and evaluate
y_pred = Lin_reg.predict(x_test)
mse = metrics.mean_squared_error(y_test, y_pred)
print("MAE:", metrics.mean_absolute_error(y_test, y_pred))
print("MSE:", mse)
print("RMSE:", np.sqrt(mse))
print("R^2:", Lin_reg.score(x_test, y_test))

# Regression equation
print("Coefficients:", Lin_reg.coef_)
print("Intercept:", Lin_reg.intercept_)
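
Integer label codes give a column such as region an implicit ordering that a linear model treats as numeric. One common alternative, shown here only as a sketch and not as the approach used in this experiment, is one-hot encoding with `pd.get_dummies`. The rows below are hypothetical and only mimic the column names used above.

```python
import pandas as pd

# Hypothetical rows that mimic the columns used above; all values are made up.
df_demo = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
    "charges": [1000.0, 2000.0, 3000.0, 4000.0],
})

# drop_first=True drops one level per column so the dummy columns
# are not perfectly collinear with the intercept.
encoded = pd.get_dummies(
    df_demo, columns=["sex", "smoker", "region"], drop_first=True
)
print(encoded.columns.tolist())
```

Each remaining dummy column then gets its own coefficient, measured relative to the dropped baseline level.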

2. Logistic Regression
from google.colab import drive
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Mount Google Drive to read the dataset stored there
drive.mount('/content/drive')
df = pd.read_csv('/content/drive/MyDrive/AIH C4/heart.csv')

# Separate features and target variable
X = df.drop(columns='HeartDisease')
y = df['HeartDisease']

# Identify categorical and numerical columns
categorical_cols = ['Sex', 'ChestPainType', 'RestingECG', 'ExerciseAngina', 'ST_Slope']
numerical_cols = ['Age', 'RestingBP', 'Cholesterol', 'FastingBS', 'MaxHR', 'Oldpeak']

# Preprocessing for numerical and categorical features
# (handle_unknown='ignore' avoids an error if the test set has an unseen category)
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_cols),
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_cols),
    ])

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Create a pipeline with preprocessing and logistic regression model
model_pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(max_iter=10000)),
])

# Train the logistic regression model
model_pipeline.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model_pipeline.predict(X_test)
y_pred_proba = model_pipeline.predict_proba(X_test)[:, 1]

# Evaluate the model
classification_rep = classification_report(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_pred_proba)
conf_matrix = confusion_matrix(y_test, y_pred)

# Print the results
print("Classification Report:\n", classification_rep)
print("ROC-AUC Score:", roc_auc)
print("Confusion Matrix:\n", conf_matrix)
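
For context on what `predict` and `predict_proba` return, the sketch below reproduces a binary logistic regression prediction by hand: a linear score is passed through the sigmoid function to give a probability, which is then thresholded at 0.5. The two-feature synthetic data is an assumption made for illustration and is unrelated to the heart disease dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary data with two features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = w . x + b, squashed into (0, 1) by the sigmoid.
z = X_demo @ clf.coef_[0] + clf.intercept_[0]
proba_manual = 1.0 / (1.0 + np.exp(-z))

# A probability above 0.5 is labelled as the positive class.
labels_manual = (proba_manual > 0.5).astype(int)

proba_sklearn = clf.predict_proba(X_demo)[:, 1]
print("max probability difference:", np.abs(proba_manual - proba_sklearn).max())
print("labels agree:", bool((labels_manual == clf.predict(X_demo)).all()))
```

The ROC-AUC score uses the probabilities rather than the thresholded labels, which is why the probability column is passed to `roc_auc_score` while the classification report and confusion matrix use the predicted labels.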

Output:
1. Linear Regression
2. Logistic Regression
Conclusion:
● The exploration of the healthcare dataset, including visualizing the relationships between
features, allowed us to identify key patterns that influence medical conditions.
● Linear regression was applied to predict a continuous outcome (medical insurance charges),
and logistic regression was used for a classification task (presence of heart disease).
Logistic regression performed well for binary classification of disease presence, while
linear regression provided reasonably accurate predictions for the continuous variable.
● Such models can be integrated into healthcare systems for early prediction, personalized
treatment plans, and enhanced patient monitoring, significantly improving the overall
quality of healthcare delivery.

