Machine Learning (Chapter 1)
Linear Regression:
Linear regression is a statistical method used to model the relationship
between a dependent variable (target variable) and one or more
independent variables (predictor variables). It assumes a linear
relationship between the variables.
Simple Linear Regression: Involves one independent variable.
Multiple Linear Regression: Involves multiple independent variables.
The linear regression model can be represented as:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn + ε
Where:
* y is the dependent variable
* b0 is the intercept
* b1, b2, ..., bn are the coefficients for the independent variables x1,
x2, ..., xn
* ε is the error term
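For instance, with hypothetical coefficients (all numbers made up), the
equation can be evaluated directly. A minimal sketch in Python:

    import numpy as np

    # Hypothetical coefficients: intercept b0 and slopes b1, b2
    b0 = 2.0
    b = np.array([1.5, -0.5])   # b1, b2
    x = np.array([3.0, 4.0])    # one observation: x1, x2

    # y = b0 + b1*x1 + b2*x2 (the error term ε is unobserved)
    y_hat = b0 + np.dot(b, x)
    print(y_hat)                # 2.0 + 1.5*3.0 + (-0.5)*4.0 = 4.5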
Examples of Linear Regression
Simple Linear Regression:
➢ Predicting house prices based on square footage:
i. Dependent variable: House price
ii. Independent variable: Square footage
➢ Predicting student grades based on study hours:
i. Dependent variable: Grades
ii. Independent variable: Study hours
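A minimal sketch of the first example above, fitting house price against
square footage with scikit-learn (the data points are made up for
illustration):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up data: square footage (x) -> house price (y)
    X = np.array([[800], [1200], [1500], [2000], [2400]])
    y = np.array([150_000, 210_000, 255_000, 330_000, 390_000])

    model = LinearRegression().fit(X, y)
    print(model.intercept_, model.coef_)   # b0 and b1
    print(model.predict([[1800]]))         # predicted price for 1800 sq ft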
Multiple Linear Regression:
➢ Predicting car prices based on mileage, age, and horsepower:
i. Dependent variable: Car price
ii. Independent variables: Mileage, age, horsepower
➢ Predicting sales based on advertising expenditure, price, and
competition:
i. Dependent variable: Sales
ii. Independent variables: Advertising expenditure, price, competition
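The car-price example above fits the same way; the only change is that the
feature matrix has one column per independent variable (again, a sketch on
made-up data):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Made-up data: [mileage, age in years, horsepower] -> car price
    X = np.array([
        [30_000, 2, 150],
        [60_000, 5, 120],
        [15_000, 1, 200],
        [90_000, 8, 100],
        [45_000, 4, 170],
    ])
    y = np.array([22_000, 14_000, 28_000, 8_000, 18_000])

    model = LinearRegression().fit(X, y)
    print(model.intercept_)   # b0
    print(model.coef_)        # b1 (mileage), b2 (age), b3 (horsepower)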
Applications of Linear Regression
Linear regression is widely used in various fields, including:
• Finance: Predicting stock prices, portfolio returns
• Economics: Forecasting GDP, inflation
• Marketing: Predicting sales, customer churn
• Healthcare: Modeling disease progression, predicting patient
outcomes
• Social Sciences: Analyzing relationships between social factors
Key Considerations
➢ Assumptions: Linear regression assumes linearity, independence of
the errors, normality of the residuals, and homoscedasticity (constant
error variance). Violating these assumptions can reduce the model's
accuracy.
➢ Outliers: Outliers can significantly impact the regression line. It's
essential to identify and handle outliers appropriately.
➢ Multicollinearity: When independent variables are highly
correlated, it can affect the model's stability and interpretation; see
the sketch below for one way to check.
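One common way to check for multicollinearity is the variance inflation
factor (VIF). A sketch with statsmodels, on made-up data in which mileage
and age are deliberately correlated:

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # Made-up predictors; mileage and age move together
    df = pd.DataFrame({
        "mileage":    [30_000, 60_000, 15_000, 90_000, 45_000, 75_000],
        "age":        [2, 5, 1, 8, 4, 6],
        "horsepower": [150, 120, 200, 100, 170, 110],
    })

    X = sm.add_constant(df)   # VIF is computed with an intercept present
    for i, col in enumerate(X.columns):
        if col != "const":
            print(col, variance_inflation_factor(X.values, i))
    # Rule of thumb: VIF above roughly 5-10 signals problematic
    # multicollinearity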
Multiple Regression
Multiple regression is an extension of simple linear regression
that involves more than one independent variable to predict a
continuous dependent variable.
Example:
• Predicting house prices based on multiple factors like square
footage, number of bedrooms, location, age, etc.
• Model:
• y = b0 + b1*x1 + b2*x2 + ... + bn*xn + ε
• y: Dependent variable (house price)
• x1, x2, ..., xn: Independent variables (square footage, number of
bedrooms, etc.)
• b0: Intercept
• b1, b2, ..., bn: Coefficients
• ε: Error term
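A sketch of this model with statsmodels, which reports the fitted intercept
and coefficients along with standard diagnostics (the data are made up):

    import numpy as np
    import statsmodels.api as sm

    # Made-up data: [square footage, bedrooms, age in years] -> price
    X = np.array([
        [1200, 3, 10],
        [1500, 3, 5],
        [2000, 4, 2],
        [900, 2, 30],
        [1700, 4, 8],
        [2400, 5, 1],
    ])
    y = np.array([210_000, 260_000, 340_000, 140_000, 290_000, 410_000])

    X = sm.add_constant(X)    # adds the column of ones for the intercept b0
    result = sm.OLS(y, X).fit()
    print(result.params)      # b0, b1, b2, b3
    print(result.summary())   # coefficients, R-squared, p-values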
Logistic Regression
Logistic regression is used for classification problems where the
dependent variable is categorical (usually binary, like yes/no or 0/1). It
estimates the probability of an event occurring.
Example:
* Predicting whether a customer will churn (leave a company) based
on factors like tenure, contract type, usage, etc.
Model:
* Uses a logistic function to map linear combinations of predictors to
probabilities.
* Output is a probability between 0 and 1.
* A threshold (often 0.5) is used to classify instances.
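A minimal churn sketch with scikit-learn on made-up tenure and usage
figures; predict_proba returns the estimated probability, and the 0.5
threshold turns it into a class label:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Made-up data: [tenure in months, monthly usage] -> churned (1) or not (0)
    X = np.array([[2, 5], [40, 30], [6, 2], [60, 45], [12, 8], [3, 1]])
    y = np.array([1, 0, 1, 0, 0, 1])

    clf = LogisticRegression().fit(X, y)
    proba = clf.predict_proba([[10, 4]])[0, 1]   # probability of churn
    print(proba)
    print(int(proba >= 0.5))                     # classify at a 0.5 threshold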
Logistic Function
The logistic function (or sigmoid function) is used in logistic
regression to map any real number to a value between 0 and 1.
Formula:
* f(x) = 1 / (1 + e^(-x))
Graph: an S-shaped curve that rises from near 0 to near 1, crossing 0.5
at x = 0.
Example:
* In logistic regression, the output of the linear combination of
predictors is passed through the logistic function to obtain the
probability of the positive class.
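The function itself is a one-liner; a small sketch:

    import numpy as np

    def sigmoid(x):
        # f(x) = 1 / (1 + e^(-x)) maps any real number into (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    print(sigmoid(-5))   # close to 0
    print(sigmoid(0))    # exactly 0.5
    print(sigmoid(5))    # close to 1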
Key Differences
* Multiple regression predicts a continuous value, while logistic
regression predicts a probability.
* Multiple regression uses a linear equation, while logistic regression
uses a logistic function.