Aih Exp 1
Name: Prithvi Singh
Date: 16/08/2024
UID: 2022301014
Objective:
● Write a program for regression analysis on a healthcare dataset.
● Demonstrate the working principle of regression techniques on a medical data set by
building a model that can classify or predict the outcome for a new sample.
Outcomes:
● Explore a medical dataset suitable for a linear/logistic regression problem
● Explore the patterns in the dataset and apply a suitable algorithm
System Requirements:
Linux OS with Python and libraries, or R, or Windows with MATLAB
• Theory:
1. Linear Regression
Linear regression analysis is used to predict the value of a variable based on the value of another
variable. The variable you want to predict is called the dependent variable. The variable you are
using to predict the other variable's value is called the independent variable.
The linear regression formula is:
y = mx + b
Where:
• y is the dependent variable (response variable)
• x is the independent variable (predictor variable)
• m is the slope (coefficient) of the regression line
• b is the intercept (constant) of the regression line
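As a minimal sketch of how m and b can be estimated, the least-squares formulas (slope = covariance of x and y divided by the variance of x) can be written in plain Python. The function name fit_line and the sample values are made up for illustration and are not taken from the experiment's dataset.

```python
def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

# Made-up illustrative data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(xs, ys)

# Predict the dependent variable for a new sample x = 5
prediction = m * 5.0 + b
```

Once m and b are known, predicting a new sample is a single evaluation of y = mx + b.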
2. Logistic Regression
Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote,
based on a given data set of independent variables.
This type of statistical model (also known as a logit model) is often used for classification and
predictive analytics. Since the outcome is a probability, the dependent variable is bounded between
0 and 1. In logistic regression, a logit transformation is applied to the odds, that is, the probability
of success divided by the probability of failure. This is also commonly known as the log odds, or
the natural logarithm of the odds. The logit and its inverse, the logistic function, are given by:
logit(p) = ln(p / (1 - p)) = mx + b
p = 1 / (1 + exp(-(mx + b)))
Where:
• p is the probability of the event occurring
• x is the independent variable (predictor variable)
• m and b are the coefficient and intercept, as in linear regression
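The relationship between the log odds and the probability can be checked numerically. The sketch below assumes the standard definitions: logit(p) = ln(p / (1 - p)) for the log odds, and 1 / (1 + exp(-z)) for the logistic (sigmoid) function that maps a log-odds value z back to a probability. The function names are chosen here for illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds: natural logarithm of p / (1 - p), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Log odds of 0 correspond to even odds, i.e. a probability of one half
p_even = sigmoid(0.0)

# The two functions are inverses: converting to a probability and back
# recovers the original log-odds value (up to floating-point error)
z_back = logit(sigmoid(1.5))
```

Because the sigmoid output always lies strictly between 0 and 1, it can be read as a probability and thresholded (commonly at 0.5) to obtain a class label.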
What are the types of regression, and what is their significance?
Regression analysis falls into several categories, each suited to a particular kind of
independent and dependent variable. The common types and their applications are
summarized below.
1. Linear Regression
Types:
1. Simple Linear Regression: Uses a straight line to represent the relationship between a
single independent variable and a dependent variable.
2. Multiple Linear Regression: Models the relationship between a dependent variable and
two or more independent variables.
Significance: Helpful for understanding the relationship between variables and predicting a
continuous outcome. It supports forecasting and trend identification, and it assumes a linear
relationship between the variables.
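To illustrate the multiple-predictor case, here is a minimal sketch that fits y = b + m1*x1 + m2*x2 by solving the normal equations (X^T X) w = X^T y with a small Gaussian elimination. The helper names solve and fit_multiple and the data are hypothetical; a numerical library would normally do this step.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple(X, y):
    """Return least-squares coefficients [b, m1, m2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of ones for the intercept
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Made-up data generated from y = 1 + 2*x1 + 3*x2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
coef = fit_multiple(X, y)  # [intercept, coefficient of x1, coefficient of x2]
```

Simple linear regression is the special case with a single predictor column.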
2. Logistic Regression
Based on the categories of the dependent variable, logistic regression can be classified into three types:
Types:
1. Binomial: The dependent variable can take only two values, such as 0 or 1, Pass or
Fail, etc.
2. Multinomial: The dependent variable has three or more unordered categories.
3. Ordinal: The dependent variable has three or more ordered categories, such as "low,"
"medium," or "high."
Significance: Essential for estimating the probability of a categorical outcome, especially when
the response variable is binary or multinomial. It also shows which factors influence the
categorical outcome.
• Dataset for Logistic Regression:
• ALGORITHM:
Load the Original Dataset:
● Identify and separate the target variable (Growing_Stress) and the feature variables.
● Split the features into numerical and categorical columns.
● Train a logistic regression model using the original dataset to predict the
Growing_Stress variable.
● Use this model to predict Growing_Stress values for the synthetic data.
Combine Original and Synthetic Data:
● Train a logistic regression model on the combined dataset and check the model's accuracy.
Ensure the accuracy is around 95%.
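The steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the code used in the experiment: only the Growing_Stress target comes from the algorithm, while the columns hours_indoors and occupation, all data values, and the helper names are hypothetical, and a hand-written gradient descent stands in for a library model.

```python
import math

TARGET = "Growing_Stress"

def split_columns(rows):
    """Split feature names into numerical and categorical columns."""
    features = [c for c in rows[0] if c != TARGET]
    numeric = [c for c in features if isinstance(rows[0][c], (int, float))]
    categorical = [c for c in features if c not in numeric]
    return numeric, categorical

def make_encoder(rows):
    """Return a function that turns a row into a numeric feature vector."""
    numeric, categorical = split_columns(rows)
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    scale = {c: max(abs(r[c]) for r in rows) or 1.0 for c in numeric}

    def encode(row):
        vec = [1.0]                                  # bias term
        vec += [row[c] / scale[c] for c in numeric]  # scaled numeric features
        for c in categorical:                        # one-hot encoding
            vec += [1.0 if row[c] == lv else 0.0 for lv in levels[c]]
        return vec
    return encode

def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi    # predicted probability minus label
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict(w, X):
    """Predict class 1 when the log odds are non-negative (p >= 0.5)."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) >= 0 else 0 for xi in X]

def fit(rows):
    """Build an encoder for the rows and train a model on them."""
    encode = make_encoder(rows)
    w = train([encode(r) for r in rows], [r[TARGET] for r in rows])
    return encode, w

# Made-up original data (hypothetical columns): stress is labelled 1 for long hours indoors
hours = [1, 3, 5, 7, 17, 19, 21, 23]
jobs = ["student", "corporate"]
original = [{"hours_indoors": h, "occupation": jobs[i % 2], TARGET: int(h > 12)}
            for i, h in enumerate(hours)]

# Train on the original dataset
encode, w = fit(original)

# Predict Growing_Stress for made-up synthetic rows using the trained model
synthetic = [{"hours_indoors": h, "occupation": jobs[i % 2]}
             for i, h in enumerate([2, 6, 18, 22])]
synthetic_labels = predict(w, [encode(r) for r in synthetic])
for row, label in zip(synthetic, synthetic_labels):
    row[TARGET] = label

# Retrain on the combined dataset and measure accuracy on it
combined = original + synthetic
encode2, w2 = fit(combined)
preds = predict(w2, [encode2(r) for r in combined])
accuracy = sum(p == r[TARGET] for p, r in zip(preds, combined)) / len(combined)
```

Since the synthetic labels are produced by the first model, accuracy on the combined set largely measures how consistent the second model is with the first; a held-out set with independently known labels gives a stricter check.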
Output:
● Testing the model again on a new dataset
• Conclusion:
We performed logistic regression on a mental health dataset. First, we trained the model on the
original dataset. Next, we constructed a new dataset with comparable properties and used our trained
model to predict its values with a comparable level of accuracy. To sum up, we successfully
wrote a regression analysis program for a healthcare dataset and illustrated how regression techniques
operate on a medical data set to build a model that can predict or classify a new
sample.