BDA Unit 4

R Language - A Language for Data Analytics and Visualisation
Regression
Regression is used to predict an unknown value from known values.
Regression analysis is a statistical, predictive
modelling technique used to study the
relationship between a dependent variable and
one or more independent variables.
X:  0   2   4   6
Y:  0   4   ?   12

Find the value of Y when X = 4.

Dependent variable - ?
Independent variable - ?
To study the relationship between two or more variables using regression:

e.g. 1: Relationship between advertising expenditure and sales - as advertising expenditure increases, sales also increase.

e.g. 2: Relationship between the number of hours of practice and the number of errors - as the hours of practice increase, the number of errors decreases.
Objective - to develop a model that shows how the variables are related and to use it for prediction, e.g. predict sales for a given level of advertising expenditure.
Dependent variable (y): the variable we are trying to predict.
Independent variable (x): the variable we use to predict the dependent variable.

Dependent variable (y) -> sales, number of errors
Independent variable (x) -> advertising expenditure, hours of practice
Input dataset - housing price data set of New York City.
This data set contains information such as size, locality, number of bedrooms in the house, etc.
Task - to predict the price of the house.

Independent (or) predictor variables
Values that do not depend on any other variable, e.g. the number of bedrooms, the size of the house and so on.
These predictor variables are used to predict the response variable.

Dependent (or) response variable
Values that depend on the values of the independent variables.
The price of the house is the dependent variable.
Types of Regression Analysis

Linear Regression
Logistic Regression
Polynomial Regression
Linear Regression
It is one of the most basic and widely used
machine learning algorithms.
It is a predictive modeling technique used to
predict a continuous dependent variable, given
one or more independent variables.
Simple linear regression
One independent and one dependent variable.
Multiple linear regression
More than one independent variable and one
dependent variable.
In linear regression, the relationship between the dependent and independent variables is always linear; thus, when we plot their relationship, we observe more of a straight line than a curve.
Equation used to represent a linear regression model:
y = b0 + b1x
where b0 is the y-intercept and b1 is the slope.
Multiple linear regression
An extension of linear regression to relationships among more than two variables.
In a simple linear relation we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.
Logistic Regression

Logistic regression is a machine learning algorithm used to solve classification problems.
It is a predictive analysis technique used to predict a dependent variable, given a set of independent variables, such that the dependent variable is categorical.
Polynomial Regression

Polynomial regression is a method used to handle non-linear data.
Non-linearly separable data is data for which you cannot draw a straight line to study the relationship between the dependent and independent variables.
It is called 'polynomial' regression because the power of some independent variables is more than 1.
Simple Linear Regression

One independent variable and one output variable.
It is named linear because the relationship is approximated using a straight line, y = b0 + b1x, where:
b0 is the y-intercept
b1 is the slope - tells whether the line is increasing or decreasing, and how steep it is


Hours Studied (x)   Grade on Exam (y)
2                   69
9                   98
5                   82
5                   77
3                   71
7                   84
1                   55
8                   94
6                   84
2                   64
Regression equation: ŷ = b0 + b1x

To calculate b1 and b0:
b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
b0 = ȳ - b1·x̄

x    y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)²
2    69   -2.8    -8.8        24.64           7.84
9    98    4.2    20.2        84.84          17.64
5    82    0.2     4.2         0.84           0.04
5    77    0.2    -0.8        -0.16           0.04
3    71   -1.8    -6.8        12.24           3.24
7    84    2.2     6.2        13.64           4.84
1    55   -3.8   -22.8        86.64          14.44
8    94    3.2    16.2        51.84          10.24
6    84    1.2     6.2         7.44           1.44
2    64   -2.8   -13.8        38.64           7.84
48   778                     320.60          67.60

x̄ = 48/10 = 4.8, ȳ = 778/10 = 77.8
b1 = 320.6 / 67.6 ≈ 4.74
b0 = 77.8 - 4.74 × 4.8 = 55.048

To predict the grade when the number of hours studied = 3:
ŷ = 55.048 + 4.74 × 3 ≈ 69.268
Steps to Establish a Regression

1. Carry out the experiment of gathering a sample of observed values of the number of hours studied and the corresponding grade.
2. Create a relationship model using the lm() function in R.
3. Find the coefficients from the model created and build the mathematical equation using them.
4. Get a summary of the relationship model to know the average error in prediction (the residuals).
5. To predict the grade for new persons, use the predict() function in R.
Input data: the hours studied / exam grade table shown above.
lm() function
This function creates the relationship model
between the predictor and the response
variable.
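A minimal sketch of this step in R, using the hours/grade data from the table above (the variable names hours and grade are illustrative):

```r
# Observed values from the table above
hours <- c(2, 9, 5, 5, 3, 7, 1, 8, 6, 2)
grade <- c(69, 98, 82, 77, 71, 84, 55, 94, 84, 64)

# lm() builds the relationship model grade = b0 + b1 * hours
model <- lm(grade ~ hours)

coef(model)     # b0 ≈ 55.04, b1 ≈ 4.74, matching the hand calculation
summary(model)  # residuals, p-values and R-squared
```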
Is this enough to actually use this model?
How do we ensure that the model
generated is statistically significant?
Solution: p-values.
We can consider a linear model to be statistically significant only when both of these p-values (the p-value for the overall model and the p-values of the individual coefficients, as reported by summary()) are less than the pre-determined statistical significance level of 0.05.
Whenever there is a p-value, there is always a
Null and Alternate Hypothesis associated.
Null Hypothesis (H0)
proposes that there is no difference between certain
characteristics of a population or data-generating
process.
Alternate Hypothesis (H1)
Proposes that there is a difference.
Get the Summary of the Relationship Model
R-squared
Statistical measure which shows how close the data are to the
fitted regression line.
Known as the coefficient of determination, or the coefficient of
multiple determination for multiple regression.
R-squared = Explained variation / Total variation
R-squared is always between 0 and 100%:
0% indicates that the model explains none of the variability of
the response data around its mean.
100% indicates that the model explains all the variability of the
response data around its mean.
predict() function
To predict the value for new records
To predict the grade when number of
hours studied = 3
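Continuing the sketch above, the value for a new record can be obtained as:

```r
# Predict the grade when the number of hours studied = 3
predict(model, data.frame(hours = 3))  # ≈ 69.26, close to the hand-computed 69.268
```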
How good is the prediction? How well does the regression line fit the data?
Solution: the coefficient of determination, R² = SSR / SST, where:

SST = total sum of squares = Σ(y - ȳ)²
SSE = sum of squared errors = Σ(y - ŷ)²
SSR = sum of squares due to regression = Σ(ŷ - ȳ)²

SST = SSR + SSE
x    y    ŷ        y - ŷ    (y - ŷ)²   y - ȳ   (y - ȳ)²
2    69   64.528    4.472   19.9988    -8.8      77.44
9    98   97.708    0.292    0.0852    20.2     408.04
5    82   78.748    3.252   10.5755     4.2      17.64
5    77   78.748   -1.748    3.0555    -0.8       0.64
3    71   69.268    1.732    2.9998    -6.8      46.24
7    84   88.228   -4.228   17.8759     6.2      38.44
1    55   59.788   -4.788   22.9249   -22.8     519.84
8    94   92.968    1.032    1.0650    16.2     262.44
6    84   83.488    0.512    0.2621     6.2      38.44
2    64   64.528   -0.528    0.2788   -13.8     190.44
                   SSE =    79.1215   SST =    1599.60
SST = SSR + SSE
SSR = SST - SSE = 1599.6 - 79.1215 = 1520.4785

R² = SSR / SST = 1520.4785 / 1599.6 ≈ 0.95, so the line fits the data very well.
Multiple Regression

Multiple regression is an extension of linear regression to relationships among more than two variables.
In a simple linear relation we have one predictor and one response variable, but in multiple regression we have more than one predictor variable and one response variable.
Steps to apply multiple linear regression in R:
1. Collect the data
2. Capture the data in R
3. Apply the multiple linear regression in R
4. Make a prediction
Step 1: Collect the data.
The goal is to predict the Stock_Index_Price (the dependent variable) of a fictitious economy based on two independent/input variables:
Interest_Rate
Unemployment_Rate

Y = b0 + b1·X1 + b2·X2
where:
b0 - Y-intercept
X1 - Interest_Rate
X2 - Unemployment_Rate
Year Month Interest_Rate Unemployment_Rate Stock_Index_Price
2017 12 2.75 5.3 1464
2017 11 2.5 5.3 1394
2017 10 2.5 5.3 1357
2017 9 2.5 5.3 1293
2017 8 2.5 5.4 1256
2017 7 2.5 5.6 1254
2017 6 2.5 5.5 1234
2017 5 2.25 5.5 1195
2017 4 2.25 5.5 1159
2017 3 2.25 5.6 1167
2017 2 2 5.7 1130
2017 1 2 5.9 1075
2016 12 2 6 1047
2016 11 1.75 5.9 965
2016 10 1.75 5.8 943
2016 9 1.75 6.1 958
2016 8 1.75 6.2 971
2016 7 1.75 6.1 949
2016 6 1.75 6.1 884
2016 5 1.75 6.1 866
2016 4 1.75 5.9 876
2016 3 1.75 6.2 822
2016 2 1.75 6.2 704
2016 1 1.75 6.1 719
Step 2: Capture the data in R
Step 3: Apply multiple linear regression in R.
Multiple linear regression equation: Stock_Index_Price = b0 + b1·Interest_Rate + b2·Unemployment_Rate
Step 4: Make a prediction.
Predict the stock index price for the
following data:
Interest Rate = 1.5 (i.e., X1= 1.5)
Unemployment Rate = 5.8 (i.e., X2= 5.8)
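A hedged sketch of Steps 2 to 4, assuming the data are keyed in with the column names used in the table above:

```r
# Step 2: capture the data in R (values from the table above)
stock <- data.frame(
  Interest_Rate = c(2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25,
                    2, 2, 2, rep(1.75, 11)),
  Unemployment_Rate = c(5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6,
                        5.7, 5.9, 6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1,
                        5.9, 6.2, 6.2, 6.1),
  Stock_Index_Price = c(1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195,
                        1159, 1167, 1130, 1075, 1047, 965, 943, 958, 971,
                        949, 884, 866, 876, 822, 704, 719)
)

# Step 3: fit Y = b0 + b1*X1 + b2*X2
model <- lm(Stock_Index_Price ~ Interest_Rate + Unemployment_Rate,
            data = stock)
summary(model)  # coefficients b0, b1, b2 and their p-values

# Step 4: predict the stock index price for X1 = 1.5, X2 = 5.8
predict(model, data.frame(Interest_Rate = 1.5, Unemployment_Rate = 5.8))
```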
Example - II

By using sample.split() we are actually creating a vector with two values, TRUE and FALSE.
By setting the SplitRatio to 0.7, you are splitting the original Revenue dataset of 1000 rows into 70% training and 30% testing data.
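A hedged sketch of the split, assuming a data frame revenue with 1000 rows and a Revenue column (sample.split() comes from the caTools package):

```r
library(caTools)

set.seed(123)  # reproducible split
# TRUE/FALSE vector: TRUE marks the 70% training portion
split <- sample.split(revenue$Revenue, SplitRatio = 0.7)

train <- subset(revenue, split == TRUE)   # ~700 rows
test  <- subset(revenue, split == FALSE)  # ~300 rows
```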
Logistic Regression
Use case - College Admission using Logistic Regression
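The slides do not include the code; as a hedged sketch, admission data with columns admit (0/1), gre and gpa (hypothetical names and file) could be modeled as:

```r
# Hypothetical admission data: admit (0/1), gre score, gpa
admission <- read.csv("college_admission.csv")  # assumed file

# family = binomial makes glm() fit a logistic regression
model <- glm(admit ~ gre + gpa, data = admission, family = binomial)
summary(model)

# Predicted probability of admission for a new applicant
predict(model, data.frame(gre = 700, gpa = 3.7), type = "response")
```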
Polynomial Regression

Polynomial regression is a special case of linear regression where the relationship between X and Y is modeled using a polynomial rather than a line.
It can be used when the relationship between X and Y is nonlinear, although this is still considered a special case of multiple linear regression.
But what if your linear regression model cannot model the relationship between the target variable and the predictor variable? In other words, what if they don't have a linear relationship?
Linear regression:
y = θ0 + θ1x

Polynomial regression:
y = θ0 + θ1x + θ2x² + ... + θnxⁿ
where:
θ0 is the bias,
θ1, θ2, …, θn are the weights in the equation of the polynomial regression, and
n is the degree of the polynomial.
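A minimal sketch of a degree-2 polynomial fit in R (the x and y vectors are illustrative):

```r
set.seed(1)
# Illustrative data with a curved (quadratic) relationship
x <- 1:10
y <- 3 + 2 * x + 0.5 * x^2 + rnorm(10, sd = 1)

# Fit y = theta0 + theta1*x + theta2*x^2
model <- lm(y ~ poly(x, 2, raw = TRUE))
coef(model)  # estimates of theta0, theta1, theta2
```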
Co-variance

Covariance is a measure of how much two random variables vary together.
Values lie between -infinity and +infinity.
A positive value shows that both variables vary in the same direction; a negative value shows that they vary in opposite directions.
It is a measure of correlation:

cov(x, y) = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1)
Correlation

Correlation is a statistical measure that indicates how strongly two variables are related.
Values lie between -1 and +1.
Correlation(x, y) = cov(x, y) / sqrt(var(x) × var(y))
= 112.33 / sqrt(331.28 × 48.78)
= 112.33 / sqrt(16159.8384)
= 112.33 / 127.121
= 0.88

A value of 0.88 shows that the strength of the correlation between temperature and the number of customers is very strong.
Pearson’s correlation is a parametric measure of the linear association between two numeric variables.
Spearman’s rank correlation is a non-parametric measure of the monotonic association (increase [or decrease] in the same direction, but not always at the same rate) between two numeric variables.
Kendall’s rank correlation is another non-parametric measure of association, based on concordance or discordance (comparing two pairs of data points to see if they "match") of x-y pairs.
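In R, cov() and cor() compute these directly; cor() takes a method argument for all three measures (the x and y vectors are illustrative stand-ins for the temperature/customers data):

```r
x <- c(25, 27, 30, 33, 36)       # e.g. temperature
y <- c(110, 115, 125, 130, 142)  # e.g. number of customers

cov(x, y)                       # covariance (sample version, divides by n - 1)
cor(x, y, method = "pearson")   # linear association
cor(x, y, method = "spearman")  # monotonic, rank-based association
cor(x, y, method = "kendall")   # concordance/discordance of x-y pairs
```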
Hypothesis Testing

It is a statistical method used to make statistical decisions from experimental data.
Hypothesis testing is basically an assumption that we make about a population parameter.

Null hypothesis (H0):
A null hypothesis proposes that no statistical significance exists in a set of given observations.
Alternative hypothesis (Ha or H1):
An alternative hypothesis proposes that statistical significance does exist in a set of given observations.
t-test

Consider a telecom company that has two service centers in the city.
The company wants to find whether the average time required to service a customer is the same in both stores.
The company measures the average time taken by 50 random customers in each store.
Store A takes 22 minutes while Store B averages 25 minutes.
Can we say that Store A is more efficient than Store B in terms of customer service?
This is where the t-test comes into play.
It helps us understand whether the difference between two sample means is actually real or simply due to chance.
Types of t-tests
One sample t-test
Independent two-sample t-test
Paired sample t-test
One sample t-test
In a one-sample t-test, we compare the
average (or mean) of one group against the
set average (or mean).
This set average can be any theoretical value
(or it can be the population mean).
Consider the following example – A research
scholar wants to determine if the average eating
time for a (standard size) burger differs from a
set value.
Let’s say this value is 10 minutes.
How do you think the research scholar can go
about determining this?
He/she can broadly follow the below
steps:
Select a group of people
Record the individual eating time of a
standard size burger
Calculate the average eating time for the
group
Finally, compare that average value with the
set value of 10.
t = (m - µ) / (s / √n)

where,
t = t-statistic
m = mean of the group
µ = theoretical value or population mean
s = standard deviation of the group
n = group size or sample size

Once we have calculated the t-statistic value, the next task is to compare it with the critical value of the t-test. We can find this in a t-table against the degrees of freedom (n - 1) and the level of significance.
Degrees of freedom
the number of values that are free to vary in a data
set
Implementing the One-Sample t-test in R

A mobile manufacturing company has taken a sample of mobiles of the same model from the previous month’s data.
They want to check whether the average screen size of the sample differs from the desired length of 10 cm.
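A hedged sketch; the real sample is not reproduced in the slides, so a simulated vector of 1000 screen sizes stands in:

```r
set.seed(100)
# Stand-in for the sample: 1000 screen sizes (cm)
screen_size <- rnorm(1000, mean = 10, sd = 0.5)

# One-sample t-test against the desired length of 10 cm
t.test(screen_size, mu = 10)
```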
t-statistic -> -0.39548.
The degrees of freedom here are 999 and the confidence level is 95%.
The t-critical value is 1.962.
Since the absolute value of the t-statistic is less than the t-critical value, we fail to reject the null hypothesis and can conclude that the average screen size of the sample does not differ from 10 cm.
We can also verify this from the p-value, which is greater than 0.05. Therefore, we fail to reject the null hypothesis at a 95% confidence level.
Independent Two-Sample t-test

The two-sample t-test is used to compare the means of two different samples.
Let’s say we want to compare the average height of the male employees to the average height of the female employees.
The two groups do not have to be the same size, but the pooled form of the test assumes they share a common variance.
This is where a two-sample t-test is used.
t = (mA - mB) / √(S²/nA + S²/nB)

where,
mA and mB are the means of the two samples,
nA and nB are the sample sizes, and
S² is an estimator of the common variance of the two samples:

S² = [Σ(x - mA)² + Σ(x - mB)²] / (nA + nB - 2)
For this section, we will work with data
about two samples of the various models
of a mobile phone.
We want to check whether the mean
screen size of sample 1 differs from the
mean screen size of sample 2.
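A hedged sketch with simulated stand-ins for the two samples:

```r
set.seed(100)
sample1 <- rnorm(1000, mean = 10, sd = 0.5)  # screen sizes, sample 1 (cm)
sample2 <- rnorm(1000, mean = 10, sd = 0.5)  # screen sizes, sample 2 (cm)

# Independent two-sample t-test with a pooled (common) variance
t.test(sample1, sample2, var.equal = TRUE)
```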
We can confirm that the absolute value of the t-statistic is again less than the t-critical value, so we fail to reject the null hypothesis.
Hence, we can conclude that there is no difference between the mean screen sizes of the two samples.
We can verify this again using the p-value. It comes out to be greater than 0.05; therefore we fail to reject the null hypothesis at a 95% confidence level.
There is no difference between the means of the two samples.
Paired t-test

Here, we measure one group at two different times.
We compare separate means for a group at two different times or under two different conditions.
A certain manager realized that the productivity level of his employees was trending significantly downwards. This manager decided to conduct a training program for all his employees, with the aim of increasing their productivity levels.
How will the manager measure whether the productivity levels increased? Just compare the productivity level of the employees before versus after the training program.
Here, we are comparing the same sample (the employees) at two different times (before and after the training).
t = m / (s / √n)

where,
t = t-statistic
m = mean of the differences (after - before)
s = standard deviation of the differences
n = group size or sample size
Degrees of freedom = n - 1
As an example, 20 mice received a treatment X during 3 months. We want to know whether treatment X has an impact on the weight of the mice.
To answer this question, the weight of the 20 mice was measured before and after the treatment. This gives us 20 values before treatment and 20 values after treatment, from measuring the weight of the same mice twice.
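A hedged sketch with simulated stand-ins for the before/after weights:

```r
set.seed(100)
before <- rnorm(20, mean = 20, sd = 2)          # weight before treatment (g)
after  <- before + rnorm(20, mean = 2, sd = 1)  # treatment adds weight on average

# Paired t-test: H0 is that the mean difference is zero
t.test(before, after, paired = TRUE)
```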
The p-value is less than 0.05.
We can reject the null hypothesis at a 95% confidence level and conclude that there is a significant difference between the mean weight before and after the treatment.
K-means Clustering in R

Can you distinguish between the 3 species of the iris flower using machine learning?

k-means Clustering
Features:
An initial set of clusters is randomly chosen.
Iteratively, items are moved among the sets of clusters until the desired set is reached.
A high degree of similarity among the elements in a cluster is obtained.
Given a cluster Ki = {ti1, ti2, …, tim}, the cluster mean is mi = (1/m)(ti1 + … + tim).
Strength
Time complexity O(tkn), where t is the number of iterations, k the number of clusters, and n the number of items.
Often terminates at a local optimum.
Weakness
Does not work on categorical data.
Only convex-shaped clusters are found.
Need to specify k, the number of clusters, in advance.
Unable to handle noisy data and outliers.

K-modes
A variation of K-means that handles categorical data by using modes instead of means.
k-means Clustering - Example

{2, 4, 10, 12, 3, 20, 30, 11, 25}, k = 2
Randomly assign means: m1 = 2, m2 = 4

m1     m2     K1                    K2
2      4      {2,3}                 {4,10,12,20,30,11,25}
2.5    16     {2,3,4}               {10,12,20,30,11,25}
3      18     {2,3,4,10}            {12,20,30,11,25}
4.75   19.6   {2,3,4,10,11,12}      {20,30,25}
7      25     {2,3,4,10,11,12}      {20,30,25}

Stop, as the clusters with these means are the same as in the previous iteration.
Our answer: K1 = {2,3,4,10,11,12}, K2 = {20,30,25}
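A sketch of the same example with R’s kmeans(), plus the iris question from the start of this section (cluster labels are arbitrary):

```r
# The worked example: k = 2, starting means 2 and 4
x <- c(2, 4, 10, 12, 3, 20, 30, 11, 25)
km <- kmeans(x, centers = c(2, 4))
km$cluster  # cluster assignment of each point
km$centers  # final means, expected to be 7 and 25 as in the trace above

# k-means on the iris measurements with k = 3 (one cluster per species)
km_iris <- kmeans(iris[, 1:4], centers = 3)
table(km_iris$cluster, iris$Species)
```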
Association Rules

Association rules are used to show the relationships between data items.
Association rules detect common usage of data items.
E.g. the purchasing of one product when another product is purchased represents an association rule.
Association Rules
Association rules have their most direct application in retail businesses.
They are used to assist in marketing, advertising, floor placement and inventory control.
E.g., from the transaction history several association rules can be derived:
100% of the time that PeanutButter is purchased, so is Bread.
33% of the time that PeanutButter is purchased, Jelly is also purchased.
Association Rules - Example

The database in which an association rule is to be found can be viewed as a set of tuples, where each tuple contains a set of items.
Here, each tuple represents the list of items purchased at one time.
Support:
The Support of an item (or set of items) is
the percentage of transactions in which that
item (or items) occurs.
Association Rule - Definition

Given a set of items I = {I1, I2, …, Im} and a database of transactions D = {t1, t2, …, tn} where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, an association rule is an implication of the form X ➔ Y, where X, Y ⊂ I are sets of items called itemsets and X ∩ Y = ∅.
The support (s) for an association rule X ➔ Y is the percentage of transactions in the database that contain X ∪ Y.
NOTE: the support of X ➔ Y is the same as the support of Y ➔ X.
Association Rule - Introduction

Confidence (or) Strength
The confidence (or strength, α) for an association rule X ➔ Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X.
Selecting Association Rules

The selection of association rules is based on support and confidence.
Confidence measures the strength of the rule, whereas support measures how often it occurs in the database.
Typically, a large confidence value and a smaller support are used.
Rules that satisfy both a minimum support and a minimum confidence are called strong rules.
Association Rule - Example
Apriori Algorithm
The most well-known association rule algorithm; it is used in most commercial products.
It uses the large itemset property:
Any subset of a large itemset must be large.
To perform Association Rule Mining in R,
we use the arules and
the arulesViz packages in R.
If you don’t have these packages installed
in your system, please use the following
commands to install them.
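A minimal sketch with the packages named above, using the Groceries transaction data that ships with arules (the support/confidence thresholds are illustrative):

```r
# install.packages(c("arules", "arulesViz"))  # run once if not installed
library(arules)
library(arulesViz)

data("Groceries")  # built-in transaction data

# Mine rules meeting minimum support and confidence (strong rules)
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.5))

inspect(sort(rules, by = "confidence")[1:5])  # top 5 rules by confidence
plot(rules)                                   # support/confidence scatter plot
```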
Decision tree based algorithms

Given:
D = {t1, …, tn} where ti = <ti1, …, tih>
Database schema contains {A1, A2, …, Ah}
Classes C = {C1, …, Cm}
A decision (or classification) tree is a tree associated with D such that:
Each internal node is labeled with an attribute Ai.
Each arc is labeled with a predicate which can be applied to the attribute at its parent.
Each leaf node is labeled with a class Cj.

Solving the classification problem using decision trees is a two-step process:
1. Decision tree induction: construct a DT using training data.
2. For each ti ∈ D, apply the DT to determine its class.
DT Splits Area

[Figure: the data space is split first on Gender (M / F), then on Height.]
Comparing DTs

[Figure: comparing decision trees - balanced vs. deep.]
Random Forests

Random forest is a supervised learning algorithm which is used for both classification and regression, but mainly for classification problems.
A forest is made up of trees, and more trees mean a more robust forest.
Similarly, the random forest algorithm creates decision trees on data samples, gets the prediction from each of them, and finally selects the best solution by means of voting.
It is an ensemble method which is better than a single decision tree because it reduces over-fitting by averaging the results.
Random Forest

Random forest is an ensemble machine learning algorithm.
It operates by building multiple decision trees.
It works for both:
Classification
Regression
In banking, it is used to predict fraudulent customers.
It is used in analysing the symptoms of patients and detecting disease.
In e-commerce, recommendations are based on customer activity.
Stock market trends can be analysed to predict profit or loss.
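A hedged sketch using the randomForest package on the built-in iris data:

```r
library(randomForest)  # install.packages("randomForest") if needed

set.seed(100)
# 500 trees are grown on bootstrap samples; the forest votes on the class
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

print(rf)       # out-of-bag error estimate and confusion matrix
importance(rf)  # which predictors matter most
```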
ANOVA

Analysis of Variance (ANOVA) is a parametric statistical technique used to compare datasets.
This technique was invented by R.A. Fisher, and is thus often referred to as Fisher’s ANOVA as well.
It is similar in application to techniques such as the t-test, in that it is used to compare means and the relative variance between them.
However, ANOVA is best applied where more than 2 populations or samples are to be compared.
ANOVA

ANOVA is a statistical test for estimating how a quantitative dependent variable changes according to the levels of one or more categorical independent variables.
ANOVA tests whether there is a difference in the means of the groups at each level of the independent variable.
The null hypothesis (H0) of the ANOVA is that there is no difference in means, and the alternate hypothesis (Ha) is that the means are different from one another.
One-way analysis:
When we are comparing three or more groups based on one factor variable, it is said to be a one-way analysis of variance (ANOVA).
For example, if we want to compare whether or not the mean output of three workers is the same, based on the working hours of the three workers.

Two-way analysis:
When there are two factor variables, it is said to be a two-way analysis of variance (ANOVA).
For example, based on working condition and working hours, we can compare whether or not the mean output of three workers is the same.

K-way analysis:
When there are k factor variables, it is said to be a k-way analysis of variance (ANOVA).
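A hedged sketch of one-way and two-way ANOVA with aov(), assuming a data frame output_data with a numeric output column and factor columns hours and condition:

```r
# One-way ANOVA: does mean output differ across working-hours groups?
one_way <- aov(output ~ hours, data = output_data)
summary(one_way)  # F-statistic and p-value for the hours factor

# Two-way ANOVA: working hours and working condition as two factors
two_way <- aov(output ~ hours + condition, data = output_data)
summary(two_way)
```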
Dealing with Missing Values

A common task in data analysis is dealing with missing values.
In R, missing values are often represented by NA, or by some sentinel value that stands for missing (e.g. 99).
Test for missing values

To identify missing values, use is.na(), which returns a logical vector with TRUE in the element locations that contain missing values represented by NA.
is.na() works on vectors, lists, matrices, and data frames.
For data frames, a convenient shortcut to compute the total missing values in each column is colSums(), as shown below.
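A minimal sketch of both checks (the vector and data frame are illustrative):

```r
# is.na() flags each missing element
x <- c(1, 2, NA, 4, NA)
is.na(x)            # FALSE FALSE TRUE FALSE TRUE

# For data frames, colSums(is.na(...)) totals the NAs per column
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
colSums(is.na(df))  # a: 1, b: 2
```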
Recode missing values
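A short sketch of recoding a sentinel value to NA (the value 99 and the data frame df are illustrative):

```r
# Data frame where 99 was used to represent missing values
df <- data.frame(a = c(1, 99, 3), b = c(4, 5, 99))

# Recode the sentinel value to NA
df[df == 99] <- NA
df
```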
Exclude missing values

We can exclude missing values in a couple of different ways.
First, if we want to exclude missing values from mathematical operations, use the na.rm = TRUE argument; if you do not exclude these values, most functions will return NA.
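A minimal sketch (the vector and data frame are illustrative):

```r
x <- c(1, 2, NA, 4)

mean(x)                # returns NA because of the missing value
mean(x, na.rm = TRUE)  # 2.3333: NA excluded from the calculation

# na.omit() drops incomplete rows from a data frame
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
na.omit(df)            # keeps only the complete first row
```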
Non-linear Least Squares

When modeling real-world data for regression analysis, we observe that it is rarely the case that the equation of the model is a linear equation giving a linear graph.
Most of the time, the plot of the model gives a curve rather than a line.
The goal of both linear and non-linear regression is to adjust the values of the model’s parameters to find the line or curve that comes closest to your data.
In least squares regression, we establish a regression model in which the sum of the squares of the vertical distances of the points from the regression curve is minimized.
We generally start with a defined model and assume some values for the coefficients.
We then apply the nls() function of R to get more accurate values along with the confidence intervals, as in the sketch below.
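A minimal sketch, with illustrative data generated from an assumed model y = a·x² + b:

```r
set.seed(100)
# Illustrative curved data around y = 2.5 * x^2 + 3
x <- 1:10
y <- 2.5 * x^2 + 3 + rnorm(10, sd = 2)

# Start from assumed coefficient values; nls() refines them
fit <- nls(y ~ a * x^2 + b, start = list(a = 1, b = 1))

summary(fit)  # more accurate values of a and b
confint(fit)  # confidence intervals for the coefficients
```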
