0% found this document useful (0 votes)
11 views6 pages

Aih Exp 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views6 pages

Aih Exp 1

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Sardar Patel Institute of Technology,Mumbai

Department of Electronics and Telecommunication Engineering


B.E. Sem-VII- PE-IV (2024-2025)
IT 24 - AI in Healthcare

Experiment 1: Regression in Healthcare Dataset

Name:Prithvi Singh
Date: 16/08/2024
UID: 2022301014

Objective:
● Write program for regression analysis for healthcare dataset.
● To demonstrate the working principle of regression techniques on medical data set for
building the model to classify/ predict using a new sample.
Outcomes:
● Explore the Medical Dataset suitable for linear/ logistic regression problem
● Explore the pattern from the dataset and apply suitable algorithm

System Requirements:
Linux OS with Python and libraries or R or windows with MATLAB

• Theory:

What is regression with a mathematical approach?


A dependent variable (often represented as yy) and one or more independent variables (xx) can be
modeled and analyzed statistically using regression. Regression analysis is used to determine
which equation best fits the data of the independent variables to predict the dependent variable.

Mathematical Approach to Regression

1. Linear Regression

Linear regression analysis is used to predict the value of a variable based on the value of another
variable. The variable you want to predict is called the dependent variable. The variable you are
using to predict the other variable's value is called the independent variable
The linear regression formula is:
y = mx + b
Where:
• y is the dependent variable (response variable)
• x is the independent variable (predictor variable)
• m is the slope (coefficient) of the regression line
• b is the intercept (constant) of the regression line

2. Logistic Regression

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote,
based on a given data set of independent variables.

This type of statistical model (also known as logit model) is often used for classification and
predictive analytics. Since the outcome is a probability, the dependent variable is bounded between
0 and 1. In logistic regression, a logit transformation is applied on the odds—that is, the probability
of success divided by the probability of failure. This is also commonly known as the log odds, or
the natural logarithm of odds, and this logistic function is represented by the following formulas:
Logit(pi) = 1/(1+ exp(-pi))

ln(pi/(1-pi)) = Beta_0 + Beta_1*X_1 + … + B_k*K_k

What are the many forms of regression and what does it mean?
Several categories exist for regression analysis, each serving a specific function depending on the
makeup of the independent and dependent variables. A summary of numerous common types of
regression and their applicability may be found here.

1. Linear Regression

Types:

1. Simple Linear Regression: Uses a straight line to represent the relationship between a
single independent variable and a dependent variable.
2. Multiple Linear Regression: This technique simulates the connection between a
dependent variable and two or more independent variables.

Significance: Helpful in understanding the link between factors and forecasting a continuous
outcome. It can assist in forecasting and trend identification and is based on the assumption of a
linear relationship.

2. Logistic Regression

On the basis of the categories, Logistic Regression can be classified into three types:

Types:

1. Binomial: In a binomial logistic regression, the dependent variables can only be of two
types: either 0 or 1, Pass or Fail, etc.

2. Multinomial: In multinomial logistic regression, the dependent variable, such as "cat,"


"dogs," or "sheep," may be one or more of three potential unordered varieties.

3. Ordinal: Three or more ordered sorts of dependent variables, such as "low," "medium," or
"high," are conceivable in ordinal logistic regression.

Significance: Critical for estimating the probability of a categorical outcome, especially when
dealing with multinomial or binary answer variables. It provides information about the factors
influencing categorical conclusions.
• Dataset for Logistic Regression:

• ALGORITHM:
Load the Original Dataset:

● Read the original dataset from a CSV file.

Separate Features and Target:

● Identify and separate the target variable (Growing_Stress) and the feature variables.
● Split the features into numerical and categorical columns.

Generate Synthetic Data:

● For Numeric Columns:


○ Generate synthetic data using a normal distribution based on the mean and standard
deviation of the original data for each numeric feature.
● For Categorical Columns:
○ Generate synthetic data by randomly sampling from the unique values in the
original dataset. Preserve the original distribution using probabilities.

Model the Target Variable (Growing_Stress):

● Train a logistic regression model using the original dataset to predict the
Growing_Stress variable.
● Use this model to predict Growing_Stress values for the synthetic data.
Combine Original and Synthetic Data:

● Append the synthetic data to the original dataset.

Save the Combined Dataset:

● Write the combined dataset (original + synthetic) to a new CSV file.

Validate Model Accuracy

● Train a logistic regression model on the combined dataset to check the model's accuracy.
Ensure the accuracy is around 95%.

● Logistic Regression Dataset:


https://www.kaggle.com/datasets/bhavikjikadara/mental-health-dataset

Output:
● Testing it again on new dataset

● Conclusion:
We performed logistic regression on a mental health dataset. First, we used an initial dataset to train
the model. Next, we constructed a fresh dataset with comparable properties, and we used my trained
model to predict the values with a comparable level of accuracy. To sum up, we have successfully
created a regression analysis software for a healthcare dataset and illustrated how regression techniques
operate on a diabetes data set to create a model that can be used to forecast or classify using a fresh
sample.

You might also like