Analysis of Data-Statistic: Unit IV



ANALYSIS OF DATA
The important statistical measures used to
analyze research or survey data are:
1. Measures of central tendency (mean, median and
mode)
2. Measures of dispersion (standard deviation, range,
mean deviation)
3. Measures of asymmetry (skewness)
4. Measures of relationship (correlation and
regression)
5. Association in case of attributes.
6. Time series Analysis

Quantitative Data Analysis


Descriptive statistics summarize and describe
the values of the variables in a data set,
such as their center and spread.
Inferential statistics attempt to generalize
the results obtained from a sample to a
larger population of interest.

Statistics
Types of Analysis
The presentation on inferential statistics
will cover univariate, bivariate and
multivariate analysis.
Univariate Analysis:
Mean
Median
Mode
Standard deviation

Types of Analysis (Continued)


Bivariate Analysis
Tests of statistical significance.
Chi-square.
Correlation
Regression
Multivariate Analysis:
Multiple Regression.
Path analysis.
Time-series analysis.
Factor analysis.
Analysis of variance (ANOVA).

Introduction: Central Tendency


Measures of central tendency are statistical measures
that describe the position of a distribution.
They are also called statistics of location, and are the
complement of statistics of dispersion, which provide
information concerning the variance or spread
of observations.
In the univariate context, the mean, median and mode
are the most commonly used measures of central
tendency.
They are computable values that describe the
behavior of the center of a distribution.

Measures of Central
Tendency

The value or figure that represents the whole
series is neither the lowest value in the series nor the
highest; it lies somewhere between these two
extremes.
1. The average represents all the measurements
made on a group, and gives a concise description
of the group as a whole.
2. When two or more groups are measured, the
central tendency provides the basis of comparison
between them.

Central Tendency
A common measure of central tendency is
the average, or mean, of the responses.
The median is the value of the middle
case when all responses are rank-ordered.
The mode is the most common response.
When data are highly skewed, meaning
heavily weighted toward one end of the
distribution, the median or mode might
better represent the most common or
centered response.

Central Tendency (Continued)


Consider this distribution of respondent
ages:
18, 19, 19, 19, 20, 20, 21, 22, 85
The mean equals 27. But this number
does not adequately represent the
common respondent because the one
person who is 85 skews the distribution
toward the high end.
The median equals 20.
This measure of central tendency gives a
more accurate portrayal of the middle of
the distribution.
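
As a quick check of the figures above, the three measures can be computed directly. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Respondent ages from the example above
ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

mean_age = statistics.mean(ages)      # sum of the ages divided by their count
median_age = statistics.median(ages)  # middle value once the ages are sorted
mode_age = statistics.mode(ages)      # most frequently occurring age

print("mean:", mean_age)
print("median:", median_age)
print("mode:", mode_age)
```

The single respondent aged 85 pulls the mean up to 27, while the median (20) and the mode (19) stay close to the bulk of the ages.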

1. Arithmetic Mean
The arithmetic mean is a mathematical
average and the most popular
measure of central tendency. It is frequently
referred to simply as the mean. It is obtained by dividing the sum of
the values of all observations in a series (\sum X) by the
number of items (N) constituting the series.
Thus, the mean of a set of numbers X_1, X_2, X_3,
..., X_N is denoted by \bar{X} and is defined as

\bar{X} = \frac{\sum X}{N}

2. Median
The median is the central value of the distribution, or
the value which divides the distribution into two equal
parts, each part containing an equal number of items.
Thus it is the central value of the variable when
the values are arranged in order of magnitude.
Connor has defined it as follows: "The median is that value
of the variable which divides the group into two
equal parts, one part comprising all values
greater, and the other, all values less than the median."
Calculation of Median: Discrete Series


i. Arrange the data in ascending or descending
order.
ii. Calculate the cumulative frequencies.
iii. Apply the formula.
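
The three steps above can be sketched in code. The slide does not state the formula; this sketch assumes the usual textbook rule that the median of a discrete series is the size of the ((N + 1)/2)th item, where N is the total frequency. The function name and the data are hypothetical.

```python
from itertools import accumulate
from math import ceil, floor


def discrete_median(values, freqs):
    """Median of a discrete frequency distribution (assumed (N + 1)/2 rule)."""
    # Step i: arrange the data in ascending order
    pairs = sorted(zip(values, freqs))
    ordered_values = [v for v, _ in pairs]
    # Step ii: calculate the cumulative frequencies
    cumulative = list(accumulate(f for _, f in pairs))
    total = cumulative[-1]
    # Step iii: locate the ((N + 1) / 2)th item
    position = (total + 1) / 2

    def item_at(k):
        # First value whose cumulative frequency reaches position k
        for value, cum in zip(ordered_values, cumulative):
            if cum >= k:
                return value

    # When N is even the position is fractional, so average the two middle items
    return (item_at(floor(position)) + item_at(ceil(position))) / 2


# Hypothetical data: a value and how many times it was observed
values = [10, 20, 30, 40, 50]
freqs = [2, 5, 8, 4, 2]
median_value = discrete_median(values, freqs)
print(median_value)
```

Here N = 21, so the median is the 11th item. The cumulative frequencies are 2, 7, 15, 19, 21, and the 11th item falls in the group with value 30.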

3. Mode
The mode is the most frequent value or score
in the distribution.
It is defined as the value of the item that
occurs most often in a series.
It is denoted by the capital letter Z.
It corresponds to the highest point of the
frequency distribution curve.

Croxton and Cowden defined it as follows: "The mode
of a distribution is the value at the point around
which the items tend to be most heavily concentrated.
It may be regarded as the most typical of a series
of values."
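The exact value of the mode for grouped data can be obtained by the
following formula.

Z = L_1 + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times i

where L_1 is the lower limit of the modal class, f_1 is the frequency
of the modal class, f_0 and f_2 are the frequencies of the classes
before and after it, and i is the class interval.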

Conclusion- Central
Tendency
A measure of central tendency is a measure
that tells us where the middle of a bunch of
data lies.

Mean is the most common measure of
central tendency. It is simply the sum of the
numbers divided by the number of numbers in
a set of data. This is also known as the average.

Median is the number present in the
middle when the numbers in a set of data
are arranged in ascending or descending
order. If the number of numbers in a data
set is even, then the median is the mean
of the two middle numbers.

Mode is the value that occurs most
frequently in a set of data.

Dispersion
Dispersion refers to the way the values
are distributed around some central
value, typically the mean.
The range is the distance separating the
lowest and highest values (e.g., the
ages listed previously run from 18 to 85,
so the range equals 85 - 18 = 67).
The standard deviation is an index of
the amount of variability in a set of data.
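
Both measures can be computed for the respondent ages listed earlier. This is a minimal sketch; it assumes the sample standard deviation (dividing by n - 1), which the slides do not specify.

```python
import statistics

# Respondent ages from the central tendency example
ages = [18, 19, 19, 19, 20, 20, 21, 22, 85]

# Range: distance between the highest and lowest values
age_range = max(ages) - min(ages)

# Sample standard deviation (n - 1 in the denominator, an assumption here)
sample_sd = statistics.stdev(ages)

# The same measure with the one extreme respondent left out, for comparison
ages_without_85 = [a for a in ages if a != 85]
sd_without_85 = statistics.stdev(ages_without_85)

print("range:", age_range)
print("standard deviation:", round(sample_sd, 2))
print("standard deviation without the 85-year-old:", round(sd_without_85, 2))
```

The single extreme value drives the standard deviation from a little over 1 year to almost 22 years, which shows how sensitive this measure is to outliers.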

Dispersion (Continued)
The standard deviation represents
dispersion with respect to the normal
(bell-shaped) curve.
Assuming a set of numbers is normally
distributed, then each standard deviation
equals a certain distance from the mean.
Each standard deviation (+1, +2, etc.) is
the same distance from each other on the
bell-shaped curve, but represents a
declining percentage of responses
because of the shape of the curve.
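
The declining percentages can be illustrated with the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary names are illustrative only.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Share of responses within k standard deviations of the mean (both sides)
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)
}

# Share of responses in each successive one-deviation band above the mean
bands = {
    k: standard_normal.cdf(k) - standard_normal.cdf(k - 1) for k in (1, 2, 3)
}

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {within[k]:.1%}   band {k - 1} to {k} SD: {bands[k]:.1%}")
```

Roughly 68% of responses fall within one standard deviation of the mean, 95% within two, and 99.7% within three. Each successive band is equally wide but holds a smaller share of the responses.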

Dispersion (Continued)
If the responses are normally distributed
and the range of responses is low, meaning
that most responses fall close to
the mean, then the standard deviation
will be small.
The standard deviation of professional
golfers' scores on a golf course will be
low.
The standard deviation of amateur
golfers' scores on a golf course will be
high.

Bivariate Analysis
1. Introduction
Bivariate analysis refers to an
examination of the relationship
between two variables.
We might ask these questions about the
relationship between two variables:
Do they seem to vary in relation to one
another? That is, as one variable
increases in size does the other
variable increase or decrease in size?
What is the strength of the relationship
between the variables?

Quantitative Data Analysis


2. Measures of Association (Continued)
Covariance is the extent to which two
variables change with respect to one
another.
As one variable increases, the other
variable either increases (positive
covariance) or decreases (negative
covariance).
Correlation is a standardized measure
of covariance.
Correlation ranges from -1 to +1, with
figures closer to -1 or +1 indicating a
stronger relationship.

Quantitative Data Analysis


2. Measures of Association (Continued)
Technically, covariance is the extent to
which two variables co-vary about their
means.
If a person's years of formal education
is above the mean of education for all
persons and his/her income is above
the mean of income for all persons,
then this data point would indicate
positive covariance between education
and income.
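
The idea of co-varying about the means can be sketched as follows. The education and income figures are hypothetical, invented only for illustration, and the sketch assumes the sample covariance (dividing by n - 1).

```python
# Hypothetical data: years of education and income (in thousands) for five people
education = [10, 12, 14, 16, 18]
income = [30, 35, 45, 50, 65]

n = len(education)
mean_education = sum(education) / n
mean_income = sum(income) / n

# Each person contributes the product of their two deviations from the means.
# A person above (or below) both means contributes a positive product.
products = [
    (e - mean_education) * (i - mean_income)
    for e, i in zip(education, income)
]

# Sample covariance: average product of deviations (n - 1 in the denominator)
covariance = sum(products) / (n - 1)

print("products of deviations:", products)
print("covariance:", covariance)
```

Every product of deviations here is zero or positive, so the covariance is positive: people with above-average education tend to have above-average income in this made-up sample.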

Correlation
Correlation is a statistical technique used to
determine the degree to which two quantitative
variables are related. It describes the
relationship between the variables but does not,
by itself, allow us to infer a causal
relationship.
The coefficient (r) is also called Pearson's correlation or
the product moment correlation coefficient.
It measures the nature and strength of the association between
two variables of the quantitative type.

Correlation- contd
The sign of r denotes the nature of association while
the value of r denotes the strength of association.
If the sign is +ve , this means the relation is direct
(an increase in one variable is associated with an
increase in the other variable and a decrease in one
variable is associated with a decrease in the other
variable).
While if the sign is -ve , this means an inverse or
indirect relationship (which means an increase in one
variable is associated with a decrease in the other).
The value of r ranges from -1 to +1.
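
The sign and size of r can be illustrated with a small sketch. The function `pearson_r` and the data are hypothetical; the calculation is the standard Pearson formula, the sum of products of deviations divided by the square root of the product of the two sums of squared deviations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data
x = [1, 2, 3, 4, 5]
y_direct = [2, 4, 5, 4, 5]     # tends to rise as x rises
y_inverse = [10, 8, 6, 4, 2]   # falls steadily as x rises

r_direct = pearson_r(x, y_direct)
r_inverse = pearson_r(x, y_inverse)

print("direct relationship:  r =", round(r_direct, 3))
print("inverse relationship: r =", round(r_inverse, 3))
```

The first pair gives a positive r below 1 (a direct but imperfect relationship), while the second gives r = -1, a perfect inverse relationship.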

Regression Analysis
Regression analysis is one of the most frequently used
tools in market research. In its simplest form, regression
analysis allows market researchers to analyze relationships
between one independent and one dependent variable.
In marketing applications, the dependent variable is
usually the outcome we care about (e.g., sales), while the
independent variables are the instruments we have to
achieve those outcomes with (e.g., pricing or advertising).
Regression analysis can provide insights that few other
techniques can.
The key benefits of using regression analysis are that
it can:
Indicate if independent variables have a significant
relationship with a dependent variable.
Indicate the relative strength of different independent
variables' effects on a dependent variable.

Marketing Applications of Regression


It indicates whether independent variables have a significant effect. Knowing
about the effects of independent variables on dependent variables can help
market researchers in many different ways. For example, it can help direct
spending if we know promotional activities significantly increase sales.
It helps in knowing about the relative strength of effects. It is useful for the
marketers because it may help to answer questions such as whether sales
depend more on price or on promotions.
Regression analysis also allows the marketer to compare the effects of
variables measured on different scales such as the effect of price changes
(e.g., measured in $) and the number of promotional activities.
Regression analysis can also help to make predictions. For example, if we
have estimated a regression model using data on sales, prices, and
promotional activities, the results from this regression analysis could
provide a precise answer to what would happen to sales if prices were to
increase by 5% and promotional activities were to increase by 10%. Such
precise answers can help (marketing) managers make sound decisions.
Furthermore, by providing various scenarios, such as calculating the sales
effects of price increases of 5%, 10%, and 15%, managers can evaluate
marketing plans and create marketing strategies.

Regression
Regression is a technique concerned with predicting some
variables by knowing others. It is the process of
predicting variable Y using variable X: it uses a
variable (x) to predict some outcome variable
(y), and tells you how values in y change as a
function of changes in values of x.
y = ax + b
where y is the dependent variable, x is the
independent variable, a is the slope (regression
coefficient) and b is the intercept or
constant.
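
The slope a and intercept b can be estimated by ordinary least squares. This is a minimal sketch with hypothetical data (x might be the number of promotional activities and y the resulting sales); the slides do not name an estimation method, so least squares is an assumption.

```python
# Hypothetical observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope: co-variation of x and y divided by the variation of x
sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sum_xx = sum((xi - mean_x) ** 2 for xi in x)
a = sum_xy / sum_xx

# Intercept: the fitted line passes through the point (mean_x, mean_y)
b = mean_y - a * mean_x


def predict(new_x):
    """Predicted y for a given x using the fitted line y = ax + b."""
    return a * new_x + b


print(f"y = {a:.2f}x + {b:.2f}")
print("prediction for x = 6:", round(predict(6), 2))
```

For these figures the fitted line is y = 0.6x + 2.2, so each additional unit of x is associated with an increase of 0.6 in y.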

Chi- Square Test


Chi-square is a statistical test commonly used to
compare observed data with the data we would expect
under a specific hypothesis. It can be used when:
The experimental data or sample observations
are independent of each other.
The data collected are drawn at random from the
universe or population.
The data are expressed in original units (counts), not in ratios or
percentage form.
There are at least 50 observations in the sample.
The number of observations within each cell is
not less than 5.

Chi Square
Use of Chi-square test of independence
The test applies to any business situation where you are essentially
checking whether one variable, X, is related to, or independent
of, another variable, Y. The use of the chi-square test is
indicated in any of the following business scenarios.
Suppose you want to determine if certain types of
products sell better in certain geographic locations
than others.
To verify the influence of gender on purchase
decisions. Are men the primary decision makers
when it comes to purchasing big-ticket items? Is
gender a factor in the color preference of a car? Here
variable X would be gender and variable Y would be
color.
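
The gender and color scenario can be sketched as a test of independence on a 2 x 2 table. The counts below are hypothetical, chosen to satisfy the conditions listed earlier (100 observations, no cell below 5), and 3.841 is the standard chi-square critical value for 1 degree of freedom at the 5% level.

```python
# Hypothetical counts: rows are gender, columns are preferred car color
#            color A  color B
observed = [
    [20, 30],   # men
    [30, 20],   # women
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total under independence
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.841  # 5% level, 1 degree of freedom

print("chi-square:", chi_square)
print("degrees of freedom:", degrees_of_freedom)
print("reject independence:", chi_square > critical_value)
```

With these counts the statistic is 4.0, which exceeds 3.841, so at the 5% level we would conclude that color preference is related to gender in this made-up sample.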

Use of Chi-Square Test: Goodness of Fit
A number of marketing problems involve
decision situations in which it is important for a
marketing manager to know whether the
pattern of frequencies that are observed fit well
with the expected ones. The appropriate test is
the χ² test of goodness of fit.
e.g.
In consumer marketing, a common problem that
any marketing manager faces is the selection of
appropriate colors for package design.
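
The package color problem can be sketched as a goodness-of-fit test. The counts are hypothetical, and the sketch assumes the expected pattern is that all four colors are equally preferred; 7.815 is the standard chi-square critical value for 3 degrees of freedom at the 5% level.

```python
# Hypothetical survey: number of consumers preferring each package color
observed = {"red": 30, "blue": 20, "green": 25, "yellow": 25}

total = sum(observed.values())

# Assumed expected pattern: every color is equally preferred
expected_per_color = total / len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum(
    (count - expected_per_color) ** 2 / expected_per_color
    for count in observed.values()
)

degrees_of_freedom = len(observed) - 1
critical_value = 7.815  # 5% level, 3 degrees of freedom

print("chi-square:", chi_square)
print("degrees of freedom:", degrees_of_freedom)
print("observed pattern differs from expected:", chi_square > critical_value)
```

Here the statistic is 2.0, well below 7.815, so the observed preferences are consistent with all four colors being equally popular.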
