Fds Unit 2 Final
DESCRIPTIVE ANALYTICS
PREPARED BY
AI&DS)
VERIFIED BY
Frequency Distributions
Outliers
Interpreting Distributions
Graphs
Averages
Describing Variability
Interquartile Range
Normal Distributions
Z Scores
Correlation
Scatter Plots
Regression
Regression Line
Interpretation of r²
UNIT II
DESCRIPTIVE ANALYTICS
PART A (2 marks)
Negative Correlation
No Correlation
Capital Employed (Rs. in crore)   1  2  3  4  5  7  8   9   11  12
Profit (Rs. in lakhs)             3  5  4  7  9  8  10  11  12  14
(a) Draw a scatter diagram.
(b) Do you think that there is any correlation between profits and capital
employed? Is it positive or negative? Is it high or low?
3. From the following information, find the correlation coefficient between
advertisement expenses and sales volume using Karl Pearson's coefficient of
correlation method.
Firm                               1   2   3   4   5   6   7   8   9   10
Advertisement Exp. (Rs. in lakhs)  11  13  14  16  16  15  15  14  13  13
Sales Volume (Rs. in lakhs)        50  50  55  60  65  65  65  60  60  50
4. Find the correlation coefficient between age and playing habits of the
following students using Karl Pearson's coefficient of correlation method.
Age                 15   16   17   18   19   20
Number of students  250  200  150  120  100  80
Regular players     200  150  90   48   30   12
I. Correlation analysis helps in deriving precisely the degree and the direction of
such a relationship.
II. The effect of correlation is to reduce the range of uncertainty of our
prediction. A prediction based on correlation analysis will be more reliable and
nearer to reality.
III. Correlation analysis contributes to the understanding of economic behaviour, aids
in locating the critically important variables on which others depend, may reveal to
the economist the connections by which disturbances spread, and may suggest to him
the paths through which stabilizing forces may become effective.
IV. Economic theory and business studies show relationships between variables such as
price and quantity demanded, advertising expenditure and sales promotion
measures, etc.
V. The coefficient of correlation is a relative measure of change.
When three or more variables are studied, it is a case of multiple correlation. For
example, a study that covers the relationship between student marks, attendance of
students, effectiveness of the teacher, use of teaching aids, etc. is a case of
multiple correlation.
5. State in each case whether there is
(a) Positive Correlation
(b) Negative Correlation
(c) No Correlation
S.No.  Particulars                                      Solution
1      Price of commodity and its demand                Negative
2      Yield of crop and amount of rainfall             Positive
3      No. of fruits eaten and hunger of a person       Negative
4      No. of units produced and fixed cost per unit    Negative
5      No. of girls in the class and marks of boys      No Correlation
6      Ages of husbands and wives                       Positive
7      Temperature and sale of woollen garments         Negative
8      Number of cows and milk produced                 Positive
9      Weight of person and intelligence                No Correlation
10     Advertisement expenditure and sales volume       Positive
Optional
All classes should have both an upper boundary and a lower boundary.
when many different tables must be listed, as in the Statistical Abstract
of the United States. An open-ended class appears in the table "Two Age
Distributions".
The lower boundary of each class interval should be a multiple of the
class interval.
Aim for a total of approximately 10 classes.
23. Define outliers.
An outlier is an extremely high or extremely low data point relative to the
nearest data point and the rest of the neighboring co-existing values in a
data graph or dataset.
24. What is frequency distribution for Ungrouped Data and grouped Data?
Frequency Distribution for Ungrouped Data
A frequency distribution produced whenever observations are sorted
into classes of single values.
Frequency Distribution for Grouped Data
A frequency distribution produced whenever observations are sorted
into classes of more than one value.
25. Define Unit of Measurement.
The smallest possible difference between scores.
30. Define the terms Population, Population Mean (μ) and Population Size (N).
Population
A complete set of scores.
Population Mean (μ)
The balance point for a population, found by dividing the sum for
all scores in the population by the number of scores in the
population.
Where,
X=sample mean, μ = Population Mean, σ = Population Standard Deviation.
I. Positive, Negative and Zero Correlation:
Whether correlation is positive (direct) or negative (inverse) depends upon the
direction of change of the variables.
Positive Correlation:
If both the variables vary in the same direction, correlation is said to be positive. That is,
if one variable increases, the other on average also increases, and if one variable
decreases, the other on average also decreases.
Height (cm): X   158  160  163  166  168  171  174  176
Weight (kg): Y   60   62   64   65   67   69   71   72
Negative Correlation:
Negative correlation is a relationship between two variables such that as the value
of one variable increases, the other decreases. Correlation is expressed on a range from
+1 to -1, known as the correlation coefficient. Values below zero express negative
correlation.
Price of Product (Rs. per unit): X   6   5    4    3    2    1
Demand (in units): Y                 75  120  175  250  215  400
Zero Correlation:
Zero correlation suggests that the correlation statistic does not indicate a
relationship between the two variables. This does not mean that there is no
relationship at all; it simply means that there is no linear relationship.
A zero correlation is often indicated as r = 0.
Strictly speaking it is not a type of correlation, but it is still called zero or no
correlation. When we do not find any relationship between the variables, it is said to
be zero correlation: a change in the value of one variable does not influence or change
the value of the other variable. For example, the correlation between the weight of a
person and intelligence.
Raw material: X       10  20  30  40  50  60
Finished product: Y   2   4   6   8   10  12
Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to
the amount of change in the other variable, then correlation is said to be non-linear. If such
variables are plotted on a graph, the points would fall on a curve and not on a straight line. For
example, if we double the amount of advertisement expenditure, then sales volume would
not necessarily be doubled.
Advertisement Expenses: X   10  20  30  40  50  60
Sales Volume: Y             2   4   6   8   10  12
2. Given the following pairs of values:
Capital Employed (Rs. in crore)   1  2  3  4  5  7  8   9   11  12
Profit (Rs. in lakhs)             3  5  4  7  9  8  10  11  12  14
Draw a scatter diagram.
Do you think that there is any correlation between profits and capital employed? Is it
positive or negative? Is it high or low?
Solution:
From the scatter diagram we can say that the variables are positively
correlated. In the diagram the points trend upward, rising from the lower left-hand
corner to the upper right-hand corner, hence it is a positive correlation. The plotted points lie in
a narrow band, which indicates a high degree of positive correlation.
[Figure: scatter diagram; horizontal axis: Capital Employed (Rs. in crore), scaled 0 to 14; vertical axis scaled 0 to 16]
Firm                               1   2   3   4   5   6   7   8   9   10
Advertisement Exp. (Rs. in lakhs)  11  13  14  16  16  15  15  14  13  13
Sales Volume (Rs. in lakhs)        50  50  55  60  65  65  65  60  60  50
Solution:
Let us assume that advertisement expenses are variable X and sales volume is
variable Y.
Calculation of Karl Pearson's coefficient of correlation
Firm   X    Y    x = X - X̄   x²   y = Y - Ȳ   y²    xy
1      11   50   -3          9    -8          64    24
2      13   50   -1          1    -8          64    8
3      14   55   0           0    -3          9     0
4      16   60   2           4    2           4     4
5      16   65   2           4    7           49    14
6      15   65   1           1    7           49    7
7      15   65   1           1    7           49    7
8      14   60   0           0    2           4     0
9      13   60   -1          1    2           4     -2
10     13   50   -1          1    -8          64    8
Total  140  580              22               360   70
       ΣX   ΣY               Σx²              Σy²   Σxy
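As a cross-check on the table, the deviation method can be sketched in Python. This is a minimal sketch rather than part of the original solution: the function name pearson_r and the variable names are illustrative assumptions, and the formula assumed is r = Σxy / √(Σx² · Σy²), with x and y taken as deviations from their means.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]          # x = X - mean of X
    dy = [y - mean_y for y in ys]          # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Data for the ten firms, copied from the question
advertisement = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]
sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]

r = pearson_r(advertisement, sales)
print(round(r, 4))
```

With the totals Σxy = 70, Σx² = 22 and Σy² = 360, this works out to r of roughly 0.79, which would indicate a fairly high positive correlation between advertisement expenses and sales volume.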
4. Find the correlation coefficient between age and playing habits of the following
students using Karl Pearson's coefficient of correlation method.
Age                 15   16   17   18   19   20
Number of students  250  200  150  120  100  80
Regular players     200  150  90   48   30   12
Solution:
To find the correlation between age and playing habits of the students, we need to compute the
percentages of students who have the playing habit.
Now, let us assume that the ages of the students are variable X and the percentages of playing
habits are variable Y.
Calculation of Karl Pearson's coefficient of correlation
Age (X)  No. of students  Regular players  Percentage of playing habits (Y)  X - X̄   (X - X̄)²  Y - Ȳ  (Y - Ȳ)²  (X - X̄)(Y - Ȳ)
15       250              200              80                                -2.5    6.25      30     900       -75
16       200              150              75                                -1.5    2.25      25     625       -37.5
17       150              90               60                                -0.5    0.25      10     100       -5
18       120              48               40                                0.5     0.25      -10    100       -5
19       100              30               30                                1.5     2.25      -20    400       -30
20       80               12               15                                2.5     6.25      -35    1225      -87.5
Total    ΣX = 105                          ΣY = 300                                  Σx² = 17.5       Σy² = 3350  Σxy = -240
Interpretation: From the above calculation it is clear that there is a high degree of
negative correlation, r = -0.9912, between the two variables age and
playing habits, i.e. playing habits among students decrease as their age increases.
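The two stages of this solution (first converting regular players into percentages, then applying the deviation method) can be sketched as follows. This is an illustrative sketch, not the original working; the variable names are assumptions.

```python
from math import sqrt

# Data copied from the question
ages = [15, 16, 17, 18, 19, 20]
students = [250, 200, 150, 120, 100, 80]
players = [200, 150, 90, 48, 30, 12]

# Stage 1: Y is the percentage of students at each age who play regularly
pct = [100 * p / s for p, s in zip(players, students)]

# Stage 2: deviation method, r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(pct) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, pct))
sum_x2 = sum((x - mean_x) ** 2 for x in ages)
sum_y2 = sum((y - mean_y) ** 2 for y in pct)

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(pct)
print(round(r, 4))
```

Working with percentages rather than raw player counts matters here because the number of students differs at each age, so raw counts would mix the effect of age with the effect of group size.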
5. Find Karl Pearson's coefficient of correlation between capital employed and profit
obtained from the following data.
Capital Employed (Rs. in crore)   10  20  30  40  50  60  70  80  90  100
Profit (Rs. in crore)             2   4   8   5   10  15  14  20  22  50
Solution:
Let us assume that capital employed is variable X and profit is variable Y.
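The worked table for this question is not reproduced here, so the following is only a hedged sketch of one way to carry out the calculation. It assumes the raw-score form of Pearson's formula, r = (nΣXY - ΣXΣY) / √[(nΣX² - (ΣX)²)(nΣY² - (ΣY)²)], which is algebraically equivalent to the deviation method used in the earlier questions; the variable names are illustrative.

```python
from math import sqrt

# Data copied from the question
capital = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # X
profit = [2, 4, 8, 5, 10, 15, 14, 20, 22, 50]         # Y

n = len(capital)
sum_x = sum(capital)
sum_y = sum(profit)
sum_xy = sum(x * y for x, y in zip(capital, profit))
sum_x2 = sum(x * x for x in capital)
sum_y2 = sum(y * y for y in profit)

# Raw-score (computational) form of Pearson's r
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 4))
```

Under these assumptions the formula yields r of roughly 0.85, which would be read as a high degree of positive correlation between capital employed and profit.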
6. Explain in Detail About Types of Data?
1. Qualitative Data
2. Ranked Data
3. Quantitative Data
The precise form of a statistical analysis often depends on whether data
are qualitative, ranked, or quantitative.
Qualitative Data
Ranked Data
Quantitative Data
For example, the weights reported by 53 male students in Table 2.1 are
quantitative data, since any single observation, such as 160 lbs,
represents an amount of weight.
If the weights in Table 2.1 had been replaced with ranks, beginning with a
rank of 1 for the lightest weight of 133 lbs and ending with a rank of 53 for
the heaviest weight of 245 lbs, these numbers would have been ranked data,
since any single observation represents not an amount, but only relative
standing within the group of 53 students.
Finally, the Y and N replies of students in Table 1.2 are qualitative data,
since any single observation is a letter that represents a class of replies.
FREQUENCY DISTRIBUTION (UNGROUPED DATA)
WEIGHT   f
245      1
244      0
243      0
242      0
*
*
*
161      0
160      4
159      1
158      2
157      3
*
*
*
136      0
135      2
134      0
133      1
Total    53
First, arrange a column of consecutive numbers, beginning with the lightest
weight (133) at the bottom and ending with the heaviest weight (245) at the top.
Then place a short vertical stroke or tally next to a number each time its value
appears in the original set of data; once this process has been completed,
substitute for each tally count a number indicating the frequency (f) of
occurrence of each weight
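The tallying procedure described above can be sketched in a few lines of Python. The sample below is hypothetical (it is not the 53 weights of Table 2.1), and the variable names are illustrative; collections.Counter plays the role of the tally marks.

```python
from collections import Counter

# Hypothetical sample of weights in lbs (NOT the data from Table 2.1)
weights = [160, 157, 160, 158, 157, 160, 159, 157, 160, 158]

# Tally: count how many times each value occurs
counts = Counter(weights)

# List every consecutive value from the heaviest (top) to the lightest
# (bottom), including values that never occur (f = 0)
table = [(w, counts[w]) for w in range(max(weights), min(weights) - 1, -1)]

print("WEIGHT   f")
for w, f in table:
    print(f"{w:<8} {f}")
print(f"{'Total':<8} {sum(f for _, f in table)}")
```

Iterating over the full range of values, rather than only over the values that were observed, keeps the zero-frequency rows that an ungrouped frequency distribution is expected to show.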
Example
Students in a theater arts appreciation class rated the classic film The
Wizard of Oz on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as
follows:
Since the number of possible values is relatively small (only 10), it is
appropriate to construct a frequency distribution for ungrouped data. Do this.
The frequency distribution is only partially displayed because there are more
than 100 possible values between the largest and smallest observations.
Grouped Data
Frequency Distribution for Grouped Data: a frequency distribution produced
whenever observations are sorted into classes of more than one value.
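Sorting observations into classes of more than one value can be sketched as below. The weights are hypothetical, the class width of 10 is an assumption chosen so that each lower boundary is a multiple of the class interval, and the variable names are illustrative.

```python
from collections import Counter

# Hypothetical weights in lbs (illustrative only; not the source data)
weights = [133, 145, 152, 157, 160, 160, 165, 172, 188, 245]
class_width = 10

# Map each weight to the lower boundary of its class; boundaries are
# multiples of the class width (130, 140, ...)
counts = Counter((w // class_width) * class_width for w in weights)

# Build (lower, upper, f) rows from the highest class down to the lowest,
# keeping empty classes so the intervals stay contiguous
classes = [
    (low, low + class_width - 1, counts[low])
    for low in range(max(counts), min(counts) - 1, -class_width)
]

for low, high, f in classes:
    print(f"{low}-{high}  {f}")
print("Total   ", sum(f for _, _, f in classes))
```

For this sample the intervals run from 240-249 down to 130-139, which gives 12 classes, close to the suggested total of about 10.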
8. Explain in detail Outliers.
Identify any outliers in each of the following sets of data collected from nine
college students.
Outliers are a summer income of $25,700; an age of 61; and a family size
of 18. No outliers for GPA.
Data Histogram
Equal units along the horizontal axis (the X axis, or abscissa) reflect the
various class intervals of the frequency distribution.
Equal units along the vertical axis (the Y axis, or ordinate) reflect increases
in frequency. (The units along the vertical axis do not have to be the same
width as those along the horizontal axis.)
The intersection of the two axes defines the origin at which both numerical
scales equal 0.
Histogram
Numerical scales always increase from left to right along the horizontal axis and
from bottom to top along the vertical axis. It is considered good practice to use
wiggly lines to highlight breaks in scale, such as those along the horizontal axis in
figure.
Frequency Polygon
B. Place dots at the midpoints of each bar top or, in the absence of bar tops,
at midpoints for classes on the horizontal axis, and connect them with
straight lines. [To find the midpoint of any class, such as 160–169, simply
add the two tabled boundaries (160 + 169 = 329) and divide this sum by 2
(329/2 = 164.5).]
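The midpoint rule in the bracketed note can be sketched as a small helper. The function name class_midpoint and the list of classes are illustrative assumptions; only the class 160-169 and its midpoint 164.5 come from the text.

```python
def class_midpoint(lower, upper):
    """Midpoint of a class: add the two tabled boundaries and divide by 2."""
    return (lower + upper) / 2

# Hypothetical class intervals; 160-169 is the example used in the text
classes = [(130, 139), (140, 149), (150, 159), (160, 169)]
midpoints = [class_midpoint(lo, hi) for lo, hi in classes]
print(midpoints)
```

These midpoints are the horizontal positions at which the dots of the frequency polygon would be placed.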
10. Explain in detail about z Scores for Non-normal Distributions
In this case, the standard normal table cannot be consulted, since the
shape of the distribution of z scores is the same as that for the original
non-normal distribution.
The latter score suggests that she did relatively well on the math test, being
almost two standard deviation units above the mean.
The use of z scores can help you identify a person’s relative strengths and
weaknesses on several different tests.
A glance at the z scores suggests that although she did relatively well on the
math test, her performance on the English test was only slightly above
average, as indicated by a z score of 0.50, and her performance on the
psychology test was slightly below average, as indicated by a z score of –0.67.
Standard Score
Whenever any unit-free scores are expressed relative to a known mean and a
known standard deviation, they are referred to as standard scores.
Although z scores qualify as standard scores because they are unit-free and
expressed relative to a known mean of 0 and a known standard deviation of 1,
other scores also qualify as standard scores.
Transformed standard scores are types of unit-free standard scores that lack
negative signs and decimal points.
For example, a test score located one standard deviation below the mean
might be reported not as a z score of –1.00 but as a T score of 40 in a
distribution of T scores with a mean of 50 and a standard deviation of 10.
The important point to realize is that although reported as a score of 40, this T
score accurately reflects the relative location of the original z score of –1.00: A
T score of 40 is located at a distance of one standard deviation (of size 10)
below the mean (of size 50).
shows the values of some of the more common types of transformed standard
scores relative to the various portions of the area under the normal curve.
Converting to Transformed Standard Scores
Use the following formula to convert any original standard score, z, into a
transformed standard score, z′, having a distribution with any desired mean
and standard deviation:
$$z' = \text{desired mean} + (z)(\text{desired standard deviation})$$
where z′ (called z prime) is the transformed standard score and z is the original
standard score.
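The conversion can be sketched in Python, assuming the rule z′ = desired mean + z × desired standard deviation, which is consistent with the T score example given earlier (z = -1.00 becomes T = 40 when the mean is 50 and the standard deviation is 10). The function name transformed_score is an illustrative assumption.

```python
def transformed_score(z, desired_mean, desired_sd):
    """Convert a z score into a standard score with the desired mean and SD."""
    return desired_mean + z * desired_sd

# T scores use a mean of 50 and a standard deviation of 10
t_low = transformed_score(-1.00, 50, 10)   # one SD below the mean
t_mid = transformed_score(0.00, 50, 10)    # exactly at the mean
t_high = transformed_score(0.50, 50, 10)   # half an SD above the mean
print(t_low, t_mid, t_high)
```

Because the transformation is linear, it changes the scale of the scores but preserves each score's relative location in the distribution.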
Standard Normal Curve is the one normal curve for which a table is actually
available: the tabled normal curve for z scores, with a mean of 0 and a
standard deviation of 1.
However, to verify (rather than prove) that the mean of a standard normal
distribution equals 0, replace X in the z score formula with μ, the mean of any
(nonstandard) normal distribution, and then solve for z:
$$z = \frac{X - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = \frac{0}{\sigma} = 0$$
Likewise, to verify that the standard deviation of the standard normal distribution equals
1, replace X in the z score formula with μ + 1σ, the value corresponding to one
standard deviation above the mean for any (nonstandard) normal distribution,
and then solve for z:
$$z = \frac{X - \mu}{\sigma} = \frac{(\mu + 1\sigma) - \mu}{\sigma} = \frac{1\sigma}{\sigma} = 1$$
Although there is an infinite number of different normal curves, each with its
own mean and standard deviation, there is only one standard normal curve,
with a mean of 0 and a standard deviation of 1.
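The same verification can be tried numerically. The sketch below assumes the usual z score formula z = (X - μ) / σ and uses a hypothetical normal distribution with μ = 100 and σ = 15; any other positive σ would serve equally well.

```python
def z_score(x, mu, sigma):
    """Standard score: distance of x from the mean in standard deviation units."""
    return (x - mu) / sigma

# Hypothetical (nonstandard) normal distribution
mu, sigma = 100, 15

z_at_mean = z_score(mu, mu, sigma)               # X replaced by mu
z_one_sd_above = z_score(mu + sigma, mu, sigma)  # X replaced by mu + 1*sigma
print(z_at_mean, z_one_sd_above)
```

Whatever values are chosen for μ and σ, the mean maps to z = 0 and the point one standard deviation above the mean maps to z = 1, which is why a single standard normal table can serve every normal curve.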
The sum of squared deviation scores, or more simply the sum of squares,
symbolized by SS, merits special attention because it is a major component in
calculations for the variance, as well as many other statistical measures.
Sum of Squares (SS)
The sum of squares equals the sum of all squared deviation scores. You
can reconstruct this formula by remembering the following three steps:
1. Subtract the population mean, μ, from each original score, X, to obtain
a deviation score, X − μ.
2. Square each deviation score, (X − μ)², to eliminate negative signs.
3. Sum all squared deviation scores, Σ(X − μ)².
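The three steps above can be sketched as follows. The scores are a hypothetical population chosen for easy arithmetic, and the variable names are illustrative. As a cross-check, the sketch also evaluates the equivalent computation form SS = ΣX² - (ΣX)²/N.

```python
# Hypothetical population of scores (illustrative only)
scores = [13, 10, 11, 7, 9, 11, 9]
N = len(scores)
mu = sum(scores) / N                      # population mean

# Step 1: deviation scores, X - mu
deviations = [x - mu for x in scores]
# Step 2: square each deviation score to eliminate negative signs
squared = [d ** 2 for d in deviations]
# Step 3: sum all squared deviation scores
ss_definition = sum(squared)

# Cross-check with the computation form: sum of X^2 minus (sum of X)^2 / N
ss_computation = sum(x * x for x in scores) - sum(scores) ** 2 / N

print(ss_definition, ss_computation)
```

Both routes give the same sum of squares; the computation form avoids calculating deviation scores and is usually more convenient when the mean is not a whole number.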
Sum of squares for a population (computation formula):
$$SS = \sum X^2 - \frac{(\sum X)^2}{N}$$
where ΣX², the sum of the squared X scores, is obtained by first squaring
each X score and then summing all squared X scores; (ΣX)², the square of the
sum of all X scores, is obtained by first adding all X scores and then squaring
the sum of all X scores; and N is the population size.
Sum of squares for a sample:
$$SS = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$
where X̄, the sample mean, replaces μ, the population mean, and n, the
sample size, replaces N, the population size.
Standard Deviation for Population σ
A mean is defined as the sum of all scores divided by the number of scores.
Because the variance is the mean of all squared deviation scores, it can be defined
as the sum of all squared deviation scores divided by the number of scores:
variance = sum of all squared deviation scores / number of scores
or, in symbols: