Business Analytics - II
Analytics
Analytics is a data-driven approach to making decisions
about a business problem.
WHAT IS BA
Data analysis includes data description, data
inference, and the search for relationships in data.
Univariate analysis
Bivariate analysis
Multivariate analysis
• The term univariate analysis refers to the analysis of one variable. You
can remember this because the prefix “uni” means “one.”
1. Summary Statistics
• Measures of central tendency: these numbers describe where the centre
of a dataset is located. Examples include the mean and the median.
• Mean (the average value): 3.8
• Median (the middle value): 4
2. Frequency Distributions
• A frequency distribution shows how often each value occurs. For
household size, this allows us to quickly see that the most frequent
household size is 4.
3. Charts
• Boxplot
• Histogram
• Pie Chart
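The three univariate tools above can be sketched in a few lines of Python. The household sizes below are hypothetical: they are not the lecture's dataset, only values chosen so that the mean and median match the figures quoted above.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical household sizes (illustrative values, not the original data).
household_sizes = [1, 2, 3, 3, 4, 4, 4, 5, 6, 6]

# 1. Summary statistics: measures of central tendency.
average = mean(household_sizes)
middle = median(household_sizes)

# 2. Frequency distribution: how often each household size occurs.
frequency = Counter(household_sizes)
most_common_size, occurrences = frequency.most_common(1)[0]

print(f"Mean: {average}")      # Mean: 3.8
print(f"Median: {middle}")     # Median: 4.0
for size in sorted(frequency):
    # 3. A crude text "chart": one star per household of that size.
    print(f"{size}: {'*' * frequency[size]}")
print(f"Most frequent household size: {most_common_size}")
```

The starred rows are only a stand-in for a histogram; a plotting library would normally be used for the boxplot, histogram, and pie chart.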
BIVARIATE ANALYSIS
• Bivariate analysis examines two variables together, for example:
• (1) Hours spent studying and
• (2) Exam score received by 20 different students.
• 1. Scatterplots
• A scatterplot offers a visual way to perform bivariate analysis. It
allows us to visualize the relationship between two variables by
placing the value of one variable on the x-axis and the value of
the other variable on the y-axis.
• Interpreting Scatterplots
• Strong, positive relationship
• Weak, positive relationship
• No relation
• Strong, negative relationship
• Weak, negative relationship
CORRELATION
Positive Correlation – When the variables change in the same direction (both increase or both
decrease together), we call them positively correlated. E.g. the price of a good and the quantity
supplied, hot weather and cold drink consumption, etc.
Negative Correlation – When the variables change in opposite directions (one increases while
the other decreases), we call them negatively correlated. E.g. alcohol consumption and life
expectancy, smartphone usage and remaining battery life, etc.
Zero Correlation – When there is no relationship between the variables (correlation = 0), we
call them uncorrelated. E.g. HR recruitment and temperature, paper production and beverage sales, etc.
STANDARD RANGE OF CORRELATION COEFFICIENT
• r = -1 Perfect negative correlation
• -0.99 to -0.76 Strong negative correlation
• -0.75 to -0.26 Intermediate (moderate) negative correlation
• -0.25 to 0 Weak negative correlation
• r = 0 Zero correlation
• Positive values mirror these bands, up to r = +1 for a perfect positive correlation.
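As a rough illustration of how r is obtained and then read against these bands, here is a minimal sketch. The function names `pearson_r` and `describe_strength` and the study-hours data are hypothetical, and the sketch assumes that positive values use the same cut-offs as the negative bands listed above.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe_strength(r):
    """Label r using the bands above (assumed symmetric for positive r)."""
    size = abs(r)
    if isclose(size, 0.0, abs_tol=1e-12):
        return "zero correlation"
    direction = "positive" if r > 0 else "negative"
    if isclose(size, 1.0):
        strength = "perfect"
    elif size > 0.75:      # 0.76 to 0.99
        strength = "strong"
    elif size > 0.25:      # 0.26 to 0.75
        strength = "moderate"
    else:                  # up to 0.25
        strength = "weak"
    return f"{strength} {direction} correlation"

# Hypothetical data: hours studied and exam score for six students.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 58, 65, 63, 74, 80]

r = pearson_r(hours_studied, exam_scores)
print(f"r = {r:.2f}: {describe_strength(r)}")
```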
• Steps:-
• Limitations:
• Pearson assumes all features are independent.
• Pearson identifies only linear correlations.
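The linear-only limitation can be seen with a small constructed example. In this sketch (hypothetical data and helper name), y is completely determined by x through y = x², yet the Pearson coefficient comes out as zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# y depends perfectly on x, but through a curve rather than a straight line.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

A value of r near zero therefore rules out a linear relationship only; a scatterplot is still worth checking.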
Correlation Analysis: Scatter Diagram
• A scatter plot is a simple graph in which the values of two continuous variables are plotted
against each other.
• It is used to examine the relationship between two variables and to check the degree of
association between them.
• One variable is called the independent variable and the other is called the
dependent variable.
• The degree of association between two variables is known as correlation.
• A scatter diagram is one way of exploring the relationship between two
quantitative variables.
• However, this method only indicates whether a relationship exists (and its direction);
it does not measure the extent to which the variables are related.
REGRESSION - LINEAR REGRESSION
• Regression is a statistical method that attempts to determine the strength of the relationship between a
dependent variable and a series of independent variables.
Linear regression is a quite simple statistical regression method used for predictive
analysis; it shows the relationship between continuous variables. Linear regression shows
the linear relationship between the independent variable (X-axis) and the dependent variable
(Y-axis).
If there is a single input variable (x), the model is called simple linear regression.
If there is more than one input variable, it is called multiple linear
regression.
Linear regression always uses a linear equation,
so the hypothesis function for simple linear regression is:
y = mx + c
where
y = dependent variable.
x = independent (explanatory) variable.
m = slope.
c = intercept.
INTERCEPT AND SLOPE
• The intercept (often labelled the constant) is the expected mean value of
Y when all X=0.
• Start with a regression equation with one predictor, X.
• If X sometimes equals 0, the intercept is simply the expected mean
value of Y at that value.
• If X never equals 0, then the intercept has no meaningful interpretation on its own.
• The slope indicates the steepness of a line.
• m is the slope of a regression line, which is the rate of change
for y as x changes.
• The slope is positive 5. When x increases by 1, y increases by 5. The
y-intercept is 2.
• The slope is negative 0.4. When x increases by 1, y decreases by 0.4.
The y-intercept is 7.2.
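The two lines described above can be checked numerically. This is a small sketch (the helper name `line` is hypothetical) that reads the intercept off as the value at x = 0 and the slope as the change in y when x increases by 1.

```python
def line(m, c):
    """Return the function y = m * x + c."""
    return lambda x: m * x + c

rising = line(5, 2)         # slope 5, y-intercept 2
falling = line(-0.4, 7.2)   # slope -0.4, y-intercept 7.2

for name, f in [("rising", rising), ("falling", falling)]:
    intercept = f(0)            # value of y when x = 0
    slope = f(1) - f(0)         # change in y for a one-unit increase in x
    print(f"{name}: intercept = {intercept:.1f}, slope = {slope:.1f}")
```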
Represent with a single variable
Example: Estimate the salary of an employee based on years of experience, using y = mx + c.
• When working with linear regression, our main goal is to find the best fit line,
meaning the error between predicted and actual values should be minimized.
The best fit line will have the least error.
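To make "least error" concrete, the sketch below scores two candidate lines on a small salary dataset and keeps the one with the smaller sum of squared errors. The data, the candidate lines, and the function name `sum_squared_errors` are all hypothetical; squared error is one common choice of error measure, assumed here.

```python
def sum_squared_errors(m, c, xs, ys):
    """Sum of squared differences between actual y and predicted m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: years of experience and salary (in thousands).
years_experience = [1, 2, 3, 4, 5]
salary_thousands = [30, 35, 41, 44, 50]

# Two candidate lines, each given as (slope m, intercept c).
candidates = {"line A": (5, 25), "line B": (4, 30)}

errors = {
    name: sum_squared_errors(m, c, years_experience, salary_thousands)
    for name, (m, c) in candidates.items()
}
best = min(errors, key=errors.get)

print(errors)  # {'line A': 2, 'line B': 30}
print(best)    # line A
```

Line A is only the better of these two guesses; the best fit line is the one that minimizes this error over all possible slopes and intercepts.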
LEAST SQUARES ESTIMATION
• When fitting a straight line through a scatterplot, choose the line that makes the
sum of the squared vertical distances from the points to the line as small as possible.
• A fitted value is the predicted value of the dependent variable.
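For a single predictor, the least squares line has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the means. The sketch below applies it to hypothetical salary data; the function name `fit_least_squares` is an assumption for illustration.

```python
def fit_least_squares(xs, ys):
    """Return (m, c) for the least squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: years of experience and salary (in thousands).
years_experience = [1, 2, 3, 4, 5]
salary_thousands = [30, 35, 41, 44, 50]

m, c = fit_least_squares(years_experience, salary_thousands)

# Fitted values are the predictions; residuals are the vertical distances.
fitted = [m * x + c for x in years_experience]
residuals = [y - f for y, f in zip(salary_thousands, fitted)]

print(f"y = {m:.2f}x + {c:.2f}")  # y = 4.90x + 25.30
print([round(res, 2) for res in residuals])
```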
1. SS = Sum of Squares.
2. Regression MS = Regression SS / Regression degrees of freedom.
3. Residual MS = mean squared error (Residual SS / Residual degrees of freedom).
4. F: Overall F test for the null hypothesis that all slope coefficients are zero.
5. Significance F: The p-value associated with the overall F test.
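The quantities in this list can be computed by hand for a simple linear regression. The following sketch uses hypothetical salary data and assumes one predictor, so the regression degrees of freedom are 1 and the residual degrees of freedom are n - 2.

```python
# Hypothetical data: years of experience and salary (in thousands).
xs = [1, 2, 3, 4, 5]
ys = [30, 35, 41, 44, 50]
n = len(xs)

# Least squares fit of y = m*x + c.
mean_x = sum(xs) / n
mean_y = sum(ys) / n
m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
c = mean_y - m * mean_x
fitted = [m * x + c for x in xs]

# 1. Sums of squares.
ss_regression = sum((f - mean_y) ** 2 for f in fitted)
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ss_total = sum((y - mean_y) ** 2 for y in ys)

# 2-3. Mean squares = sum of squares / degrees of freedom.
df_regression = 1        # one predictor (assumption of this sketch)
df_residual = n - 2      # n observations minus two estimated parameters
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual   # mean squared error

# 4. Overall F statistic.
f_statistic = ms_regression / ms_residual

print(f"Regression SS = {ss_regression:.2f}, Residual SS = {ss_residual:.2f}")
print(f"Regression MS = {ms_regression:.2f}, Residual MS = {ms_residual:.4f}")
print(f"F = {f_statistic:.1f}")
```

Significance F is the upper-tail probability of this F value under an F distribution with (df_regression, df_residual) degrees of freedom. Computing it needs a statistics library; with SciPy installed, `scipy.stats.f.sf(f_statistic, df_regression, df_residual)` is one way to obtain it.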
REGRESSION ANALYSIS: INTERPRET REGRESSION COEFFICIENTS