2.2.1 Transcript

This module discusses linear regression as a supervised learning algorithm used to build regression models with dependent and independent variables. It highlights the importance of hypotheses in generating business rules and explains the distinction between mathematical and statistical relationships in regression. The module emphasizes that while regression can predict the value of a dependent variable based on independent variables, it does not establish causation.

Predictive Analytics

Prof. Dinesh Kumar U


Module 2

Introduction to Regression

In this module, I am going to discuss linear regression, one of the supervised learning
algorithms; that is, to build a regression model we need the values of both the dependent
variable and the independent variables. I would like to start with a famous quote by Ronald
Coase. He said, "If you torture the data long enough, it will confess." And regression is one of
the techniques that is frequently used to make data confess. Organizations use different
business rules, and business rules are actually generated from hypotheses that the organization
may believe. Let us look at a few interesting hypotheses which various people have claimed. The
first one is that good-looking couples are more likely to have a girl child. Personally, I like
this hypothesis because I have a daughter, so at least I'm statistically good looking.

The next hypothesis says that vegetarians miss fewer flights. Women use camera phones more
than men. Left-handed men earn more money, smokers are better salespeople, and those
who whistle at the workplace are efficient. Organizations use these hypotheses to add value.
For example, let us consider the hypothesis that women use camera phones more than men. If
there is a company which makes cell phones, it can target women through advertisements and
claim that its phones are great for taking photos. Similarly, consider the hypothesis that
smokers are better salespeople. If a company is hiring salespeople, then in the interview it
can ask the candidate whether they smoke. So, hypotheses basically lead to business rules
that the company can use. Was regression used to prove all these hypotheses? Interesting
question. We don't know how they came up with these hypotheses. Regression is one of the
techniques, and a powerful one, but they may have used simple hypothesis testing
techniques such as the Z-test, t-test, or F-test, and there are so many other tests.

So, we don't really know how they actually arrived at these hypotheses. Let us try to understand
the technique of regression. Regression is a tool for finding the existence of an association
relationship between a dependent variable, which we call Y, and one or more independent
variables in the study. The relationship can be either linear or nonlinear. Regression is a
statistical relationship. Linear regression means that the relationship is linear with respect to
the regression parameters. We have to understand the difference between a mathematical
relationship and a statistical relationship. Let us look at a mathematical relationship. Let us
say we have $Y = \beta_0 + \beta_1 X$.
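As a brief aside on what "linear with respect to the regression parameters" means, the two functional forms below may help. They are illustrative examples only and are not taken from the lecture; error terms are left out to keep the focus on the form of the relationship.

```latex
% Still a linear regression form: linear in \beta_0 and \beta_1,
% even though X enters as X^2.
Y = \beta_0 + \beta_1 X^2

% Not linear in the parameters: \beta_1 appears inside an exponent.
Y = \beta_0 + e^{\beta_1 X}
```

In the first form, transforming the explanatory variable (here squaring it) leaves the parameters entering linearly, so it can still be treated as linear regression; in the second, no transformation of X alone makes the expression linear in $\beta_1$.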

© All Rights Reserved. This document has been authored by Prof. Dinesh Kumar U and is permitted for use only within the course "Predictive
Analytics" delivered in the online course format by IIM Bangalore. No part of this document, including any logo, data, illustrations, pictures,
scripts, may be reproduced, or stored in a retrieval system or transmitted in any form or by any means – electronic, mechanical, photocopying,
recording or otherwise – without the prior permission of the author.

In a mathematical relationship, if we know the value of X we can predict the value of Y
exactly, whereas in a statistical relationship we will have the relationship as
$Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon$ is an error term. So here, even with
the knowledge of X, we will not be able to predict the value of Y exactly; there will be some
error in the prediction. Let us try to understand the nomenclature used in regression. The
variable that measures the outcome of a study is called the dependent variable or response
variable. It is also called the outcome variable. In the Die Another Day case, the total cost of
treatment is the response variable or outcome variable. An independent variable or explanatory
variable explains the changes in the response variable. Independent variables are also called
features in machine learning lingo. If we want to understand how the total cost of treatment
changes, we may have to look at variables like the patient's height, weight, past medical
history, and so on.
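The difference between a mathematical and a statistical relationship can be sketched with simulated data. The snippet below is only an illustration: the parameter values (2.0 and 0.5), the noise level, and the helper name `fit_simple_ols` are assumptions made for the example, not values or code from the lecture. It uses the standard least-squares formulas for a single explanatory variable.

```python
import random

def fit_simple_ols(xs, ys):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

random.seed(0)
BETA0, BETA1 = 2.0, 0.5          # hypothetical "true" parameters
xs = [float(x) for x in range(1, 51)]

# Mathematical relationship: Y is determined exactly by X.
y_exact = [BETA0 + BETA1 * x for x in xs]

# Statistical relationship: the same line plus a random error term.
y_noisy = [BETA0 + BETA1 * x + random.gauss(0, 1.0) for x in xs]

b0_exact, b1_exact = fit_simple_ols(xs, y_exact)
b0_noisy, b1_noisy = fit_simple_ols(xs, y_noisy)

# Largest prediction error of each fitted line on its own data.
max_err_exact = max(abs(y - (b0_exact + b1_exact * x)) for x, y in zip(xs, y_exact))
max_err_noisy = max(abs(y - (b0_noisy + b1_noisy * x)) for x, y in zip(xs, y_noisy))

print("exact:", b0_exact, b1_exact, max_err_exact)
print("noisy:", b0_noisy, b1_noisy, max_err_noisy)
```

In the exact case the fitted line reproduces every Y value up to floating-point rounding, while in the noisy case the estimated slope is close to the assumed value but individual predictions are off by the error term.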

So, with that information, we believe that we may be able to tell the value of the outcome
variable, which in this case is the total treatment cost. In regression, we often set the values
of the explanatory variables to see how they affect the response variable. It is important to
understand that a regression model establishes the existence of an association between two
variables, but not causation. This is very, very important for students to understand. How can I
find a causal relationship? It is an interesting question.

Now, there are techniques such as counterfactual models, the Rubin causal model, and graphical
models that can be used for establishing a causal relationship, but we are not going to discuss
these techniques in this course. In this table, I have given the different names used to
represent the dependent variable and the independent variable. It is also important to understand
that the name "dependent variable" does not mean that it depends on the values of the independent
variables; these are just names that we use in a regression model. And also, as I said before,
regression is not designed to capture causality. The purpose of regression is to predict the
value of the dependent variable given the values of the independent variables.
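The point that regression captures association rather than causation can be illustrated with a small simulation. Everything in the sketch below is an assumption made for the example: a hidden factor Z drives both X and Y, X has no effect on Y at all, and the helper names `fit_line` and `residuals` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(xs, ys):
    """Part of ys left over after a straight-line fit on xs."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

random.seed(1)
n = 2000
# Simulated data: Z influences both X and Y; X does NOT influence Y.
zs = [random.gauss(0, 1) for _ in range(n)]
xs = [z + random.gauss(0, 0.5) for z in zs]
ys = [2.0 * z + random.gauss(0, 0.5) for z in zs]

# Regressing Y on X alone shows a strong association.
_, slope_y_on_x = fit_line(xs, ys)

# Removing the part of X and Y explained by Z leaves almost no association.
_, slope_given_z = fit_line(residuals(zs, xs), residuals(zs, ys))

print("slope of Y on X:", slope_y_on_x)
print("slope of Y on X after accounting for Z:", slope_given_z)
```

The first slope is clearly nonzero, so X is useful for predicting Y, yet by construction changing X would not change Y. In real data the hidden factor is usually not observed, which is why a regression coefficient on its own should not be read as a causal effect.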
