Analysis of Data-Statistic: Unit IV
Unit IV
ANALYSIS OF DATA
The important statistical measures used to analyze research or survey data are:
1. Measures of central tendency (mean, median and mode)
2. Measures of dispersion (standard deviation, range, mean deviation)
3. Measures of asymmetry (skewness)
4. Measures of relationship (correlation and regression)
5. Association in the case of attributes
6. Time series analysis
Types of Analysis
The presentation on statistical analysis covers univariate, bivariate and multivariate analysis.
Univariate Analysis:
Mean
Median
Mode
Standard deviation
Measures of Central Tendency
A common measure of central tendency is
the average, or mean, of the responses.
The median is the value of the middle
case when all responses are rank-ordered.
The mode is the most common response.
When data are highly skewed, meaning that most values bunch toward one end of the distribution with a long tail toward the other, the median or mode may represent the typical or central response better than the mean.
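The effect of skew on the three measures can be seen in a short sketch using Python's standard `statistics` module. The income figures are invented purely for illustration; one very large value is enough to drag the mean away from where most responses lie.

```python
import statistics

# Hypothetical monthly incomes (in thousands); the value 200 skews the data to the right.
incomes = [20, 22, 22, 25, 30, 200]

mean = statistics.mean(incomes)      # pulled upward by the single large value
median = statistics.median(incomes)  # even N: average of the two middle values, 22 and 25
mode = statistics.mode(incomes)      # the most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```

Here the median (23.5) and mode (22) sit close to most of the responses, while the mean (about 53.17) does not.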
1. Arithmetic Mean
The arithmetic mean is a mathematical average and the most popular measure of central tendency. Frequently referred to simply as the mean, it is obtained by dividing the sum of the values of all observations in a series (ΣX) by the number of items (N) constituting the series.
Thus, the mean of a set of numbers X1, X2, X3, ..., XN, denoted by X̄, is defined as
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum X}{N}$$
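The definition translates directly into code: sum the observations (ΣX) and divide by the number of items (N). The function name `arithmetic_mean` and the marks below are hypothetical, chosen only to illustrate the formula.

```python
def arithmetic_mean(values):
    """Return the arithmetic mean: the sum of all observations divided by N."""
    if not values:
        raise ValueError("the mean of an empty series is undefined")
    total = sum(values)  # the summation of X
    n = len(values)      # N, the number of items in the series
    return total / n

marks = [45, 52, 60, 38, 55]   # hypothetical marks of five students
print(arithmetic_mean(marks))  # 250 / 5
```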
2. Median
The median is the central value of the distribution: the value that divides the distribution into two equal parts, each containing an equal number of items. Thus it is the central value of the variable when the values are arranged in order of magnitude.
Connor defines it as follows: "The median is that value of the variable which divides the group into two equal parts, one part comprising all values greater, and the other, all values less than the median."
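For ungrouped data, "arranging the values in order of magnitude and taking the middle one" can be sketched as below. The function is a hypothetical helper: with an odd N it returns the ((N + 1) / 2)th item, and with an even N it averages the two middle items, which is the usual convention.

```python
def median(values):
    """Median of ungrouped data: the middle item once values are in order of magnitude."""
    if not values:
        raise ValueError("the median of an empty series is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th item, which is index mid when counting from 0
        return ordered[mid]
    # Even N: average of the (N / 2)th and (N / 2 + 1)th items
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 3, 9, 1, 5]))  # ordered: 1, 3, 5, 7, 9
print(median([7, 3, 9, 1]))     # ordered: 1, 3, 7, 9
```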
3. Mode
The mode is the most frequent value or score in the distribution.
It is defined as that value of the item in a series which occurs most frequently.
It is denoted by the capital letter Z.
On a frequency distribution curve, the mode lies at the highest point (the peak) of the curve.
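For grouped data, the mode is estimated from the modal class (the class with the highest frequency) as
$$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$
where L_1 is the lower limit of the modal class, f_1 is its frequency, f_0 and f_2 are the frequencies of the classes immediately before and after it, and i is the class width.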
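A minimal sketch of the grouped-data mode formula, assuming equal class widths. The function name `grouped_mode` and the frequency table are hypothetical. When the modal class is the first or last class, the missing neighbour is treated as having frequency 0; that is an assumption of this sketch, not something the notes specify.

```python
def grouped_mode(lower_bounds, frequencies, width):
    """Estimate the mode Z of a grouped frequency distribution with equal class widths.

    Z = L1 + (f1 - f0) / (2*f1 - f0 - f2) * i
    """
    # Modal class: the class with the highest frequency (the first one if tied)
    k = max(range(len(frequencies)), key=lambda j: frequencies[j])
    f1 = frequencies[k]
    # Assumption: a missing neighbouring class (modal class at either end) counts as 0
    f0 = frequencies[k - 1] if k > 0 else 0
    f2 = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class and both neighbours have equal frequency")
    return lower_bounds[k] + (f1 - f0) / denominator * width

# Hypothetical distribution with classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_bounds = [0, 10, 20, 30, 40]
frequencies = [5, 8, 12, 7, 3]
print(round(grouped_mode(lower_bounds, frequencies, 10), 2))
```

The modal class is 20-30, so L_1 = 20, f_1 = 12, f_0 = 8, f_2 = 7 and i = 10, giving Z = 20 + (4 / 9) × 10 ≈ 24.44.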
Conclusion: Central Tendency
A measure of central tendency tells us where the middle of a set of data lies.
Dispersion
Dispersion refers to the way the values
are distributed around some central
value, typically the mean.
The range is the distance separating the lowest and highest values (e.g., if the ages listed previously run from 18 to 85, the range is 85 - 18 = 67 years).
The standard deviation is an index of
the amount of variability in a set of data.
Dispersion (Continued)
The standard deviation is easiest to interpret with reference to the normal (bell-shaped) curve.
If a set of numbers is normally distributed, each standard deviation marks a fixed distance from the mean: about 68% of values lie within ±1 standard deviation of the mean, about 95% within ±2, and about 99.7% within ±3.
Each standard deviation (+1, +2, etc.) is the same distance from the next on the bell-shaped curve, but each successive band contains a declining percentage of responses because of the shape of the curve.
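The declining share per band follows from the normal curve itself. For a normal distribution, the fraction of values within k standard deviations of the mean equals erf(k / √2), which Python's standard `math.erf` computes. The helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

previous = 0.0
for k in (1, 2, 3):
    within = share_within(k)
    # The band between k-1 and k standard deviations (both sides together) holds a shrinking share
    print(f"within ±{k} SD: {within:.4f}   band {k - 1}-{k} SD: {within - previous:.4f}")
    previous = within
```

The bands are equally wide, yet each one holds fewer responses than the one before it.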
Dispersion (Continued)
If the responses are normally distributed and the range of responses is low, meaning that most responses fall close to the mean, then the standard deviation will be small.
For example, the standard deviation of professional golfers' scores on a golf course will be low, whereas the standard deviation of amateur golfers' scores on the same course will be high.
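The golf example can be made concrete with a short sketch that computes the range and the standard deviation for two sets of scores. The scores are invented for illustration. `statistics.stdev` gives the sample standard deviation (n - 1 in the denominator); `statistics.pstdev` would give the population version.

```python
import statistics

# Hypothetical 18-hole scores, invented for illustration
professional = [70, 71, 69, 72, 70, 71]
amateur = [85, 98, 79, 104, 91, 88]

for label, scores in (("professional", professional), ("amateur", amateur)):
    score_range = max(scores) - min(scores)  # distance between the lowest and highest values
    sd = statistics.stdev(scores)            # sample standard deviation
    print(f"{label}: range={score_range}, sd={sd:.2f}")
```

The professional scores cluster tightly around their mean, so both the range and the standard deviation are small; the amateur scores are spread widely, so both are large.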
Bivariate Analysis
1. Introduction
Bivariate analysis refers to an
examination of the relationship
between two variables.
We might ask these questions about the
relationship between two variables:
Do they seem to vary in relation to one
another? That is, as one variable
increases in size does the other
variable increase or decrease in size?
What is the strength of the relationship
between the variables?
Correlation
Correlation is used to find the relationship between two quantitative variables; on its own it does not allow us to infer causal relationships. Correlation is a statistical technique used to determine the degree to which two variables are related.
The most widely used measure is Pearson's correlation coefficient (r), also called the product-moment correlation coefficient.
It measures the nature (direction) and strength of the linear relationship between two quantitative variables.
Correlation (Continued)
The sign of r denotes the nature of the association, while the absolute value of r denotes its strength.
If the sign is positive, the relationship is direct (an increase in one variable is associated with an increase in the other, and a decrease in one variable is associated with a decrease in the other).
If the sign is negative, the relationship is inverse or indirect (an increase in one variable is associated with a decrease in the other).
The value of r lies between -1 and +1: r = +1 indicates a perfect positive linear relationship, r = -1 a perfect negative linear relationship, and r = 0 no linear relationship.
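Pearson's r can be computed from deviations about the means: r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. The sketch below implements that formula directly; the function name `pearson_r` and the advertising and sales figures are hypothetical. (Python 3.10 and later also ship `statistics.correlation`.)

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and the two sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)

advertising = [1, 2, 3, 4, 5]  # hypothetical advertising spend
sales = [2, 4, 5, 4, 5]        # hypothetical sales
print(round(pearson_r(advertising, sales), 3))
```

The positive sign indicates a direct relationship, and the value (about 0.775) indicates a fairly strong but imperfect one.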
Regression Analysis
Regression analysis is one of the most frequently used
tools in market research. In its simplest form, regression
analysis allows market researchers to analyze relationships
between one independent and one dependent variable.
In marketing applications, the dependent variable is usually the outcome we care about (e.g., sales), while the independent variables are the instruments we use to achieve those outcomes (e.g., pricing or advertising).
Regression analysis can provide insights that few other
techniques can.
The key benefits of using regression analysis are that
it can:
Indicate if independent variables have a significant
relationship with a dependent variable.
Indicate the relative strength of different independent variables' effects on a dependent variable.
Regression
Regression is a technique concerned with predicting some variables from knowledge of others: it is the process of predicting a variable Y using a variable X. It uses a variable (x) to predict an outcome variable (y) and tells you how values of y change as a function of changes in values of x.
y = ax + b
where y is the dependent variable, x is the independent variable, a is the slope (regression coefficient), and b is the intercept or constant.
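The notes do not say how a and b are estimated; the most common choice is ordinary least squares, where a = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - a·x̄. The sketch below assumes that method. The function name `fit_line` and the price and sales data are hypothetical. (Python 3.10 and later also provide `statistics.linear_regression`, which returns the slope and intercept.)

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b, where a is the slope and b the intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a line")
    a = sxy / sxx          # slope: change in y per unit change in x
    b = mean_y - a * mean_x  # intercept: the fitted line passes through (mean_x, mean_y)
    return a, b

price = [10, 12, 14, 16, 18]   # hypothetical prices (independent variable x)
units = [100, 96, 90, 85, 79]  # hypothetical units sold (dependent variable y)
a, b = fit_line(price, units)
print(f"units = {a:.2f} * price + {b:.2f}")
print(a * 15 + b)  # predicted units sold at a price of 15
```

The negative slope says that, in this made-up data, each unit increase in price is associated with about 2.65 fewer units sold.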
Chi-Square
Use of the chi-square test of independence
The chi-square test is indicated in any business situation where you are essentially checking whether one variable, X, is related to, or independent of, another variable, Y. For example:
Suppose you want to determine whether certain types of products sell better in certain geographic locations than others.
To verify the influence of gender on purchase decisions: are men the primary decision makers when it comes to purchasing big-ticket items? Is gender a factor in the color preference for a car? Here variable X would be gender and variable Y would be color.
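A minimal sketch of the test statistic for a contingency table: each expected count is E = (row total × column total) / grand total, the statistic is χ² = Σ (O - E)² / E, and the degrees of freedom are (rows - 1)(columns - 1). The function name and the gender-by-color counts are hypothetical. In practice a library routine such as SciPy's `scipy.stats.chi2_contingency` would also report a p-value.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if X and Y were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed_count - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = gender (X), columns = car color (Y): red, blue, black
observed = [
    [20, 30, 50],  # men
    [30, 30, 40],  # women
]
stat, dof = chi_square_independence(observed)
print(f"chi-square = {stat:.3f}, df = {dof}")
```

With 2 degrees of freedom the 5% critical value is 5.991. The statistic here (about 3.111) is below it, so these made-up counts would not let us reject the hypothesis that gender and color preference are independent.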