Fds Unit 2 Final

Copyright © All Rights Reserved

AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

II YEAR / IV SEMESTER (B.Tech- ARTIFICIAL INTELLIGENCE AND DATA SCIENCE)


UNIT - II

DESCRIPTIVE ANALYTICS

PREPARED BY

S. SANTHI PRIYA, M.E., (AP/AI&DS)

VERIFIED BY

HOD PRINCIPAL CEO/CORRESPONDENT

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SENGUNTHAR COLLEGE OF ENGINEERING, TIRUCHENGODE-637 205.


UNIT II
DESCRIPTIVE ANALYTICS

 Frequency Distributions

 Outliers

 Interpreting Distributions

 Graphs

 Averages

 Describing Variability

 Interquartile Range

 Variability For Qualitative And Ranked Data

 Normal Distributions

 Z Scores

 Correlation

 Scatter Plots

 Regression

 Regression Line

 Least Squares Regression Line

 Standard Error Of Estimate

 Interpretation Of R2

 Multiple Regression Equations

 Regression Toward The Mean


LIST OF IMPORTANT QUESTIONS

UNIT II
DESCRIPTIVE ANALYTICS

PART A (2 marks)

1. What do you mean by Correlation?


2. What do you mean by correlation coefficient?
3. Write down the uses of correlation.
4. What is Multiple Correlation?
5. State in each case whether there is
 Positive Correlation

 Negative Correlation

 No Correlation

6. List out the Properties of Coefficient of Correlation.

7. What are the Uses of Regression Analysis?

8. Distinguish between Correlation and Regression.

9. What is Regression Coefficient?

10. What are the types of data in statistical analysis?


PART B (16 marks)

1. Explain in detail about the types of Correlation.


2. Given the following pairs of values:

Capital Employed (Rs. in Crore):  1  2  3  4  5  7  8  9 11 12
Profit (Rs. in Lakhs):            3  5  4  7  9  8 10 11 12 14
(a) Draw a scatter diagram.
(b) Do you think that there is any correlation between profits and capital employed? Is it positive or negative? Is it high or low?
3. From the following information, find the correlation coefficient between advertisement expenses and sales volume using Karl Pearson’s coefficient of correlation method.
Firm:                               1  2  3  4  5  6  7  8  9 10
Advertisement Exp. (Rs. in Lakhs): 11 13 14 16 16 15 15 14 13 13
Sales Volume (Rs. in Lakhs):       50 50 55 60 65 65 65 60 60 50

4. Find the correlation coefficient between age and playing habits of the following students using Karl Pearson’s coefficient of correlation method.
Age:                  15  16  17  18  19  20
Number of Students:  250 200 150 120 100  80
Regular Players:     200 150  90  48  30  12

5. Find Karl Pearson’s coefficient of correlation between capital employed and profit obtained from the following data.
Capital Employed (Rs. in Crore): 10 20 30 40 50 60 70 80 90 100
Profit (Rs. in Crore):            2  4  8  5 10 15 14 20 22  50
UNIT II
DESCRIPTIVE ANALYTICS
PART A

1. What do you mean by Correlation?

Correlation is a statistical technique to ascertain the association or relationship between two or more variables. Correlation analysis is a statistical technique to study the degree and direction of the relationship between two or more variables.
2. What do you mean by correlation coefficient?

A correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there is often a tendency to think that the change in one causes the change in the other; however, correlation alone does not establish causation.
3. Write down the uses of correlation.

I. Correlation analysis helps in deriving precisely the degree and the direction of the relationship between variables.
II. Correlation reduces the range of uncertainty in our predictions. A prediction based on correlation analysis will be more reliable and nearer to reality.
III. Correlation analysis contributes to the understanding of economic behaviour, aids in locating the critically important variables on which others depend, may reveal to the economist the connections by which disturbances spread, and may suggest the paths through which stabilizing forces may become effective.
IV. Economic theory and business studies show relationships between variables such as price and quantity demanded, advertising expenditure and sales promotion measures, etc.
V. The coefficient of correlation is a relative measure of change.

Types of Correlation: Correlation is described or classified in several different ways. Three of the most important are: I. Positive and Negative; II. Simple, Partial and Multiple; III. Linear and Non-linear.
4. What is Multiple Correlation?

When three or more variables are studied, it is a case of multiple correlation. For example, a study of the relationship between students’ marks, attendance of students, effectiveness of the teacher, use of teaching aids, etc., is a case of multiple correlation.
5. State in each case whether there is
(a) Positive Correlation
(b) Negative Correlation
(c) No Correlation
S.No  Particulars                                       Solution
1     Price of commodity and its demand                 Negative
2     Yield of crop and amount of rainfall              Positive
3     Number of fruits eaten and hunger of a person     Negative
4     Number of units produced and fixed cost per unit  Negative
5     Number of girls in the class and marks of boys    No Correlation
6     Ages of husbands and wives                        Positive
7     Temperature and sale of woollen garments          Negative
8     Number of cows and milk produced                  Positive
9     Weight of a person and intelligence               No Correlation
10    Advertisement expenditure and sales volume        Positive

6. List out the Properties of Coefficient of Correlation.

 The coefficient of correlation always lies between −1 and +1; symbolically, −1 ≤ r ≤ +1.
 The coefficient of correlation is independent of change of origin and scale.
 The coefficient of correlation is a pure number and is independent of the units of measurement. For example, if X represents height in inches and Y represents weight in kg, the correlation coefficient is neither in inches nor in kg but only a pure number.
 The coefficient of correlation is the geometric mean of the two regression coefficients; symbolically, $r^2 = b_{xy} \cdot b_{yx}$, i.e. $r = \pm\sqrt{b_{xy}\, b_{yx}}$, taking the common sign of the two regression coefficients.
 If X and Y are independent variables, then the coefficient of correlation is zero. (The converse need not hold: r = 0 rules out only a linear relationship.)

The study of the relationship between associated variables, in which one variable depends on another, independent variable, is called regression. It was developed by Sir Francis Galton in 1877 to measure the relationship between the heights of parents and their children.
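To make the "independent of change of origin and scale" property concrete, here is a minimal Python sketch. The helper `pearson_r` and the coding constants a, h, b, k are illustrative choices, not taken from these notes. It uses the data of Problem 2 in Part B, codes them as u = (X − a)/h and v = (Y − b)/k with positive h and k, and computes r both ways; the two values agree. (If h or k were negative, only the sign of r would flip.)

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r computed from deviations about the two means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Problem 2 (Part B): capital employed X (Rs. crore), profit Y (Rs. lakhs)
X = [1, 2, 3, 4, 5, 7, 8, 9, 11, 12]
Y = [3, 5, 4, 7, 9, 8, 10, 11, 12, 14]

# Change of origin (subtract a, b) and scale (divide by h, k > 0).
# a = 5, h = 2, b = 8, k = 3 are arbitrary illustrative constants.
U = [(x - 5) / 2 for x in X]
V = [(y - 8) / 3 for y in Y]

r_original = pearson_r(X, Y)
r_coded = pearson_r(U, V)
print(round(r_original, 4), round(r_coded, 4))
```

Coding with convenient constants like this is why the property matters in hand calculations: the arithmetic gets smaller while r stays the same.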

7. What are the Uses of Regression Analysis?

 It provides estimates of values of the dependent variable from values of the independent variable.
 It is used to obtain a measure of the error involved in using the regression line as a basis for estimation.
 With the help of regression analysis, we can obtain a measure of the degree of association or correlation that exists between the two variables.
8. Distinguish between Correlation and Regression.

9. What is Regression Coefficient?

The quantity “b” in the regression equation is called the regression coefficient or slope coefficient. Since there are two regression equations, we have two regression coefficients.
1. Regression coefficient of X on Y, symbolically written as “bxy”
2. Regression coefficient of Y on X, symbolically written as “byx”
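A small Python sketch of the two regression coefficients and of the property r² = bxy · byx, using the data of Problem 3 in Part B. It assumes the usual deviation formulas byx = Σxy/Σx² and bxy = Σxy/Σy² (x = X − X̄, y = Y − Ȳ), which these notes do not spell out.

```python
import math

# Problem 3 (Part B): advertisement expenses X, sales volume Y (Rs. lakhs)
X = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]
Y = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]

n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # Σxy
sum_x2 = sum((x - mean_x) ** 2 for x in X)                        # Σx²
sum_y2 = sum((y - mean_y) ** 2 for y in Y)                        # Σy²

b_yx = sum_xy / sum_x2   # regression coefficient of Y on X
b_xy = sum_xy / sum_y2   # regression coefficient of X on Y
r = sum_xy / math.sqrt(sum_x2 * sum_y2)

print(f"byx = {b_yx:.4f}, bxy = {b_xy:.4f}")              # byx = 3.1818, bxy = 0.1944
print(f"byx*bxy = {b_yx * b_xy:.4f}, r^2 = {r * r:.4f}")  # both 0.6187
```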
10. What are the types of data in statistical analysis?
 Qualitative data
 Ranked data
 Quantitative data
11. Define Qualitative data
A set of observations where any single observation is a word, letter, or
numerical code that represents a class or category.
12. Define Ranked data
A set of observations where any single observation is a number that indicates relative standing.
13. What are Levels of Measurement?
 Levels of measurement specify the extent to which a number (or word or
letter) actually represents some attribute and, therefore, has implications
for the appropriateness of various arithmetic operations and statistical
procedures.
 There are three levels of measurement
 nominal
 ordinal
 interval/ratio
14. What is nominal measurement?
 Nominal measurement is classification, that is, sorting observations into different classes or categories. Words, letters, or numerical codes reflect only differences in kind, not differences in amount.
 Examples of nominal measurement include classifying mood disorders as
manic, bipolar, or depressive.
15. What is Ordinal measurement?
 Ordinal measurement reflects order: the relative standing of ranked data reflects differences in degree based only on order.
 For example, it’s inappropriate to conclude that the arithmetic mean of ranks 1 and 3 equals rank 2, since this assumes that the actual distance between ranks 1 and 2 equals the distance between ranks 2 and 3.

16. What is Interval/Ratio measurement?


Interval/ratio measurement involves equal intervals and a true zero. Amounts or counts of quantitative data reflect differences in degree based on equal intervals and a true zero.
17. What is a Variable in statistical analysis?
A variable is a characteristic or property that can take on different values.
18. Define Constant in statistical analysis

A Constant is a characteristic or property that can take on only one value.

19. What is meant by Discrete and Continuous Variables?


 A discrete variable consists of isolated numbers separated by gaps.
 Examples include most counts, such as the number of children in a
family.
 A continuous variable consists of numbers whose values, at least in
theory, have no restrictions.
Examples include amounts, such as weights of male statistics students.
20. What is meant by Independent and Dependent Variables?
 Independent Variable
 An independent variable is a treatment manipulated by the investigator.
 Dependent Variable
 A dependent Variable is a variable that is believed to have been
influenced by the independent variable.
21. What is frequency distribution and give usage?
 A frequency distribution is a collection of observations produced by sorting
observations into classes and showing their frequency (f) of occurrence in
each class.
 A frequency distribution helps us to detect any pattern in the data
(assuming a pattern exists) by superimposing some order on the inevitable
variability among observations.

22. Give guidelines for frequency distributions.

Essential
 Each observation should be included in one, and only one, class.
 List all classes, even those with zero frequencies.
 All classes should have equal intervals.

Optional
 All classes should have both an upper boundary and a lower boundary. Open-ended classes (lacking one boundary) are acceptable mainly when many different tables must be listed, as in the Statistical Abstract of the United States. An open-ended class appears in the table “Two Age Distributions”.
 The lower boundary of each class interval should be a multiple of the class interval.
 Aim for a total of approximately 10 classes.
23. Define outliers.
 An outlier is an extremely high or extremely low data point relative to the
nearest data point and the rest of the neighboring co-existing values in a
data graph or dataset.

24. What is frequency distribution for Ungrouped Data and grouped Data?
 Frequency Distribution for Ungrouped Data
 A frequency distribution produced whenever observations are sorted into classes of single values.
 Frequency Distribution for Grouped Data
 A frequency distribution produced whenever observations are sorted into classes of more than one value.
25. Define Unit of Measurement.
The smallest possible difference between scores

26. Define Real Limits of Class Intervals


Located at the midpoint of the gap between adjacent tabled boundaries.

27. What is Relative Frequency Distribution?


A frequency distribution showing the frequency of each class as a fraction of
the total frequency for the entire distribution.
28. What is Cumulative Frequency Distribution?
 A frequency distribution showing the total number of observations in each
class and all lower-ranked classes.
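Relative and cumulative frequencies follow directly from the two definitions above. A minimal Python sketch, using a short hypothetical set of scores (not data from these notes):

```python
from collections import Counter

# Hypothetical scores, used only to illustrate the definitions.
scores = [3, 5, 4, 5, 2, 3, 5, 4, 4, 5]

freq = Counter(scores)
total = sum(freq.values())

table = []       # rows of (value, f, relative f, cumulative f)
cumulative = 0
for value in sorted(freq):          # lowest class first
    f = freq[value]
    cumulative += f                 # this class plus all lower-ranked classes
    table.append((value, f, f / total, cumulative))

print("X   f   rel. f   cum. f")
for value, f, rel, cum in reversed(table):   # conventional layout: highest on top
    print(f"{value}   {f}   {rel:.2f}     {cum}")
```

The relative frequencies always add up to 1, and the cumulative frequency of the highest class equals the total number of observations.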

29. What is the mean?


The mean is found by adding all scores and then dividing by the number
of scores. Mean = sum of all scores / number of scores

30. Define the terms Population, Population Mean (μ) and Population Size (N).
 Population
 A complete set of scores.
 Population Mean (μ)
 The balance point for a population, found by dividing the sum for all scores in the population by the number of scores in the population.
 Population Size (N)
 The total number of scores in the population
31. Define the terms Sample, Sample Mean (x̄) and Sample Size (n).
 Sample
 A subset of scores
 Sample Mean (x̄)
 The balance point for a sample, found by dividing the sum for
the values of all scores in the sample by the number of scores in
the sample.
 Sample Size (n)
 The total number of scores in the sample
32. Define Measures of Variability
Descriptions of the amount by which scores are dispersed or scattered in a
distribution
33. What is range, Variance and Standard Deviation?
 Range
 The difference between the largest and smallest scores
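 Variance
 The mean of all squared deviation scores
 Standard Deviation
 A rough measure of the average amount by which scores deviate on either side of their mean; the square root of the variance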

34. Define z Score


 A unit-free, standardized score that indicates how many standard deviations a score is above or below the mean of its distribution:

$$ z = \frac{X - \mu}{\sigma} $$

where X = raw score, μ = population mean, σ = population standard deviation.
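The z score formula translates directly into code. A minimal sketch; the function name `z_score` and the population values (μ = 60, σ = 8) are hypothetical:

```python
def z_score(x, mu, sigma):
    """Standard deviations by which raw score x lies above (+) or below (-) mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


# Hypothetical population: mean 60, standard deviation 8
print(z_score(72, 60, 8))   # 1.5  -> 1.5 standard deviations above the mean
print(z_score(52, 60, 8))   # -1.0 -> 1 standard deviation below the mean
```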

35. Define Sum of Squares (SS)


 The sum of squared deviation scores

36. Define Population Standard Deviation (σ).


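A rough measure of the average amount by which scores in the population deviate on either side of their population mean:

$$ \sigma = \sqrt{\frac{SS}{N}} = \sqrt{\frac{\sum (X - \mu)^2}{N}} $$

where SS = sum of squares and N = number of scores in the population.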

37. Define Sample Standard Deviation (s).


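A rough measure of the average amount by which scores in the sample deviate on either side of their sample mean:

$$ s = \sqrt{\frac{SS}{n - 1}} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} $$

where SS = sum of squares and n = number of scores in the sample.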
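The two standard deviation formulas differ only in the divisor (N versus n − 1). A short Python sketch with hypothetical scores, checked against the standard library's `statistics.pstdev` and `statistics.stdev`, which use those same two divisors:

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares, SS

sigma = math.sqrt(ss / n)        # population SD: divide SS by N
s = math.sqrt(ss / (n - 1))      # sample SD: divide SS by n - 1

# The standard library gives the same two values.
assert math.isclose(sigma, statistics.pstdev(scores))
assert math.isclose(s, statistics.stdev(scores))
print(f"mean = {mean}, SS = {ss}, sigma = {sigma:.4f}, s = {s:.4f}")
```

Dividing by n − 1 makes s slightly larger than σ for the same set of scores, which compensates for the tendency of a sample to underestimate the population variability.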
PART B (16 MARKS)
1. Explain in detail about the types of Correlation.
Correlation is described or classified in several different ways. Three of the most important are:
I. Positive and Negative

II. Simple,Partial and Multiple

III. Linear and non-linear

I. Positive, Negative and Zero Correlation:
Whether correlation is positive (direct) or negative (inverse) depends upon the direction of change of the variables.
Positive Correlation:
If both the variables vary in the same direction, correlation is said to be positive. It means that if one variable is increasing, the other, on average, is also increasing, or if one variable is decreasing, the other, on average, is also decreasing.
Height (cm): X   158 160 163 166 168 171 174 176
Weight (kg): Y    60  62  64  65  67  69  71  72
Negative Correlation:
Negative correlation is a relationship between two variables such that as the value of one variable increases, the other decreases. Correlation is expressed on a range from +1 to −1, known as the correlation coefficient. Values below zero express negative correlation.

Price of Product (Rs. per Unit): X   6   5   4   3   2   1
Demand (in Units): Y                75 120 175 250 215 400
Zero Correlation:
 Zero correlation means that the correlation statistic does not indicate a relationship between the two variables. This does not mean that there is no relationship at all; it simply means that there is no linear relationship.
 A zero correlation is often indicated using the notation r = 0.
 Strictly speaking, it is not a type of correlation, but it is still called zero or no correlation. When we don’t find any relationship between the variables, it is said to be zero correlation. It means a change in the value of one variable doesn’t influence or change the value of the other variable. For example, the correlation between the weight of a person and intelligence is zero.

II. Simple, Partial and Multiple Correlation:


The distinction between simple, partial and multiple correlation is based upon the number of variables studied.
Simple Correlation:
When only two variables are studied, it is a case of simple correlation. For example, when one studies the relationship between the marks secured by a student and the attendance of the student in class, it is a problem of simple correlation.
Partial Correlation:
Partial correlation is the measure of association between two variables while controlling or adjusting for the effect of one or more additional variables.
Multiple Correlation:
When three or more variables are studied, it is a case of multiple correlation. For example, if the study in the above example also covers the effectiveness of the teacher and the use of teaching aids, it is a case of multiple correlation.
III. Linear and Non-linear Correlation:
Depending upon the constancy of the ratio of change between the variables, the correlation may be linear or non-linear.
Linear Correlation:
If the amount of change in one variable bears a constant ratio to the amount of change in the other variable, then correlation is said to be linear. If such variables are plotted on graph paper, all the plotted points would fall on a straight line. For example, if every 10 units of raw material yield 2 units of finished product, then doubling the raw material to 20 units doubles the output to 4 units, and so on.

Raw Material: X       10 20 30 40 50 60

Finished Product: Y    2  4  6  8 10 12

Non-linear Correlation: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, then correlation is said to be non-linear. If such variables are plotted on a graph, the points would fall on a curve and not on a straight line. For example, if we double the amount of advertisement expenditure, sales volume would not necessarily be doubled.

Advertisement Expenses: X   10 20 30 40 50 60

Sales Volume: Y              2  4  6  8 10 12
2. Given the following pairs of values:

Capital Employed (Rs. in Crore):  1  2  3  4  5  7  8  9 11 12

Profit (Rs. in Lakhs):            3  5  4  7  9  8 10 11 12 14
 Draw a scatter diagram.
 Do you think that there is any correlation between profits and capital employed? Is it positive or negative? Is it high or low?

Solution:
From the scatter diagram, we can say that the variables are positively correlated. In the diagram, the points trend upward from the lower left-hand corner to the upper right-hand corner; hence it is a positive correlation. The plotted points lie in a narrow band, which indicates a high degree of positive correlation.
[Figure: Scatter diagram of Profit (Rs. in Lakhs) against Capital Employed (Rs. in Crore)]

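Karl Pearson’s Coefficient of Correlation:

Karl Pearson’s method of calculating the coefficient of correlation is based on the covariance of the two variables in a series. This method is widely used in practice, and the coefficient of correlation is denoted by the symbol “r”. If the two variables under study are X and Y, the following formula suggested by Karl Pearson can be used for measuring the degree of relationship of correlation:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}, \qquad x = X - \bar{X}, \quad y = Y - \bar{Y} $$

or, equivalently, in terms of the original values:

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$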
3. From the following information, find the correlation coefficient between advertisement expenses and sales volume using Karl Pearson’s coefficient of correlation method.

Firm:                               1  2  3  4  5  6  7  8  9 10
Advertisement Exp. (Rs. in Lakhs): 11 13 14 16 16 15 15 14 13 13
Sales Volume (Rs. in Lakhs):       50 50 55 60 65 65 65 60 60 50

Solution:
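Let us assume that advertisement expenses are variable X and sales volume is variable Y.
Calculation of Karl Pearson’s coefficient of correlation (X̄ = 140/10 = 14, Ȳ = 580/10 = 58)

Firm    X     Y    x = X-X̄   x²    y = Y-Ȳ   y²     xy
1       11    50   -3         9     -8        64     24
2       13    50   -1         1     -8        64      8
3       14    55    0         0     -3         9      0
4       16    60    2         4      2         4      4
5       16    65    2         4      7        49     14
6       15    65    1         1      7        49      7
7       15    65    1         1      7        49      7
8       14    60    0         0      2         4      0
9       13    60   -1         1      2         4     -2
10      13    50   -1         1     -8        64      8
Total   140   580             22              360    70
        (∑X)  (∑Y)            (∑x²)           (∑y²)  (∑xy)

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{70}{\sqrt{22 \times 360}} = \frac{70}{\sqrt{7920}} \approx 0.787 $$

Interpretation: advertisement expenses and sales volume are positively correlated (r ≈ 0.79).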
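The deviation-method table above can be reproduced in a few lines of Python as a check on the arithmetic (a sketch, not part of the original solution):

```python
import math

# Problem 3 data (Rs. in lakhs)
X = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]  # advertisement expenses
Y = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]  # sales volume

mean_x = sum(X) / len(X)   # 14.0
mean_y = sum(Y) / len(Y)   # 58.0

# One row per firm: x, x², y, y², xy (deviations from the means)
rows = []
for xv, yv in zip(X, Y):
    x, y = xv - mean_x, yv - mean_y
    rows.append((x, x * x, y, y * y, x * y))

sum_x2 = sum(row[1] for row in rows)
sum_y2 = sum(row[3] for row in rows)
sum_xy = sum(row[4] for row in rows)

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(sum_x2, sum_y2, sum_xy)   # 22.0 360.0 70.0
print(round(r, 4))              # 0.7866
```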
4. Find the correlation coefficient between age and playing habits of the following students using Karl Pearson’s coefficient of correlation method.
Age:                  15  16  17  18  19  20
Number of Students:  250 200 150 120 100  80
Regular Players:     200 150  90  48  30  12
Solution:
 To find the correlation between age and playing habits of the students, we need to compute the percentage of students who have the playing habit.

 Percentage of playing habits = (No. of Regular Players / Total No. of Students) × 100

 Now, let us assume that the ages of the students are variable X and the percentages of playing habits are variable Y.

Calculation of Karl Pearson’s coefficient of correlation (X̄ = 105/6 = 17.5, Ȳ = 300/6 = 50)

Age    No. of     Regular   Playing        X-X̄    (X-X̄)²   Y-Ȳ    (Y-Ȳ)²   (X-X̄)(Y-Ȳ)
(X)    Students   Players   Habits % (Y)
15     250        200       80             -2.5    6.25      30     900      -75
16     200        150       75             -1.5    2.25      25     625      -37.5
17     150         90       60             -0.5    0.25      10     100      -5
18     120         48       40              0.5    0.25     -10     100      -5
19     100         30       30              1.5    2.25     -20     400      -30
20      80         12       15              2.5    6.25     -35    1225      -87.5
Total  105                  300                    17.5             3350     -240
       (∑X)                 (∑Y)                   (∑x²)            (∑y²)    (∑xy)

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \frac{-240}{\sqrt{17.5 \times 3350}} = \frac{-240}{\sqrt{58625}} \approx -0.9912 $$

Interpretation: From the above calculation, it is very clear that there is a high degree of negative correlation, i.e. r = −0.9912, between the two variables of age and playing habits, i.e. playing habits among students decrease as their age increases.
5. Find Karl Pearson’s coefficient of correlation between capital employed and profit
obtained from the following data.
Capital Employed (Rs. in Crore): 10 20 30 40 50 60 70 80 90 100
Profit (Rs. in Crore):            2  4  8  5 10 15 14 20 22  50

Solution:
Let us assume that capital employed is variable X and profit is variable Y.

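Calculation of Karl Pearson’s coefficient of correlation

X      Y      X²      Y²     XY
10     2      100     4      20
20     4      400     16     80
30     8      900     64     240
40     5      1600    25     200
50     10     2500    100    500
60     15     3600    225    900
70     14     4900    196    980
80     20     6400    400    1600
90     22     8100    484    1980
100    50     10000   2500   5000
550    150    38500   4014   11500
(∑X)   (∑Y)   (∑X²)   (∑Y²)  (∑XY)

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$

$$ r = \frac{(10 \times 11500) - (550 \times 150)}{\sqrt{\left[(10 \times 38500) - 550^2\right]\left[(10 \times 4014) - 150^2\right]}} = \frac{1{,}15{,}000 - 82{,}500}{\sqrt{(3{,}85{,}000 - 3{,}02{,}500)(40{,}140 - 22{,}500)}} $$

$$ r = \frac{32{,}500}{\sqrt{82{,}500 \times 17{,}640}} = \frac{32{,}500}{\sqrt{1455300000}} \approx \frac{32{,}500}{38{,}148.4} \approx 0.85 $$

Interpretation: capital employed and profit are positively correlated (r ≈ 0.85).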
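The raw-score (computational) formula used in Problem 5 needs only the five column totals. A Python sketch that recomputes those totals and r:

```python
import math

X = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # capital employed (Rs. in crore)
Y = [2, 4, 8, 5, 10, 15, 14, 20, 22, 50]       # profit (Rs. in crore)

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)
sum_y2 = sum(y * y for y in Y)
sum_xy = sum(x * y for x, y in zip(X, Y))

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)  # 550 150 38500 4014 11500
print(numerator, round(r, 4))                # 32500 0.8519
```

Because every intermediate total is an integer, this form avoids the fractional deviations that the deviation method produces when the means are not whole numbers.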

6. Explain in Detail About Types of Data?

 Any statistical analysis is performed on data.

 Data is a collection of actual observations or scores in a survey or an experiment.

 There are three types of data:

1. Qualitative Data
2. Ranked Data
3. Quantitative Data
 The precise form of a statistical analysis often depends on whether data are qualitative, ranked, or quantitative.

Qualitative Data

 Qualitative data is a set of observations where any single observation is a word, letter, or numerical code that represents a class or category.

 Qualitative data consist of words (Yes or No), letters (Y or N), or numerical codes (0 or 1) that represent a class or category.

Ranked Data

 Ranked data is a set of observations where any single observation is a number that indicates relative standing.

 Ranked data consist of numbers (1st, 2nd, . . . 40th place) that represent relative standing within a group.

Quantitative Data

 Quantitative data is a set of observations where any single observation is a number that represents an amount or a count.

 Quantitative data consist of numbers (weights of 238, 170, . . . 185 lbs) that represent an amount or a count.

 To determine the type of data, focus on a single observation in any collection of observations.

 For example, the weights reported by 53 male students in Table 2.1 are
quantitative data, since any single observation, such as 160 lbs,
represents an amount of weight.

 If the weights in Table 2.1 had been replaced with ranks, beginning with a rank of 1 for the lightest weight of 133 lbs and ending with a rank of 53 for the heaviest weight of 245 lbs, these numbers would have been ranked data, since any single observation represents not an amount, but only relative standing within the group of 53 students.

 Finally, the Y and N replies of students in Table 1.2 are qualitative data,
since any single observation is a letter that represents a class of replies.

7. Explain in detail about Describing Data with Tables using frequency distribution with an example.

 A frequency distribution is a collection of observations produced by sorting observations into classes and showing their frequency (f) of occurrence in each class.

 A frequency distribution helps us to detect any pattern in the data (assuming a pattern exists) by superimposing some order on the inevitable variability among observations.

FREQUENCY DISTRIBUTION (UNGROUPED DATA)
WEIGHT   f
245      1
244      0
243      0
242      0
*        *
*        *
*        *
Total    53
 First, arrange a column of consecutive numbers, beginning with the lightest
weight (133) at the bottom and ending with the heaviest weight (245) at the top.

 Then place a short vertical stroke or tally next to a number each time its value
appears in the original set of data; once this process has been completed,
substitute for each tally count a number indicating the frequency (f) of
occurrence of each weight
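The tallying procedure described above (list every value from the highest to the lowest, then count occurrences) can be sketched in Python as follows. The helper name `ungrouped_frequency` and the ratings are hypothetical, not the class data from the example:

```python
from collections import Counter


def ungrouped_frequency(observations):
    """Return (value, f) pairs for every integer value from max down to min.

    Values that never occur are kept with f = 0, as the guidelines require.
    """
    counts = Counter(observations)          # the "tally" for each value
    high, low = max(observations), min(observations)
    return [(value, counts.get(value, 0)) for value in range(high, low - 1, -1)]


# Hypothetical ratings on a 1-10 scale
ratings = [7, 9, 8, 7, 10, 6, 8, 9, 7, 5, 8, 8]

table = ungrouped_frequency(ratings)
for value, f in table:
    print(f"{value:>3} {f:>3}")
print("Total", sum(f for _, f in table))
```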

 Example

Students in a theater arts appreciation class rated the classic film The Wizard of Oz on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows:

Since the number of possible values is relatively small (only 10), it’s appropriate to construct a frequency distribution for ungrouped data. Do this.

 Frequency Distribution for Ungrouped Data: a frequency distribution produced whenever observations are sorted into classes of single values.

 When observations are sorted into classes of single values, as in Table 2.3, the result is referred to as a frequency distribution for ungrouped data.

 The frequency distribution is only partially displayed because there are more than 100 possible values between the largest and smallest observations.

 Frequency distributions for ungrouped data are much more informative when the number of possible values is less than about 20.

Grouped Data
 Frequency Distribution for Grouped Data: a frequency distribution produced whenever observations are sorted into classes of more than one value.
 When observations are sorted into classes of more than one value, the result is referred to as a frequency distribution for grouped data.
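Grouping can be sketched the same way. The helper below is a hypothetical illustration that follows the guidelines from Part A (equal class intervals, lower boundaries that are multiples of the interval, every class listed even when f = 0) and also reports each class midpoint, computed as (lower + upper)/2. The weights are made up for illustration:

```python
from collections import Counter


def grouped_frequency(observations, width=10):
    """Hypothetical helper: sort integer observations into equal-width classes.

    Lower boundaries are multiples of `width`, classes are listed from the
    highest to the lowest, and empty classes are kept with f = 0.
    """
    counts = Counter((x // width) * width for x in observations)
    lowest = (min(observations) // width) * width
    highest = (max(observations) // width) * width
    table = []
    for lower in range(highest, lowest - 1, -width):
        upper = lower + width - 1          # upper tabled boundary, e.g. 169
        midpoint = (lower + upper) / 2     # e.g. (160 + 169) / 2 = 164.5
        table.append((f"{lower}-{upper}", midpoint, counts.get(lower, 0)))
    return table


# Hypothetical weights in lbs (not the 53-student data set)
weights = [133, 152, 157, 160, 161, 164, 169, 171, 178, 185, 197, 245]

for label, mid, f in grouped_frequency(weights):
    print(f"{label:>8}  midpoint {mid:6.1f}  f = {f}")
```

The class width of 10 is an assumption chosen to match the 160–169 style of class used in these notes; a different width simply changes the `width` argument.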

8. Explain in detail Outliers.

 Be prepared to deal occasionally with the appearance of one or more very extreme scores, or outliers. A GPA of 0.06, an IQ of 170, summer wages of $62,000: each requires special attention because of its potential impact on a summary of the data.

 Identify any outliers in each of the following sets of data collected from nine
college students.

 Outliers are a summer income of $25,700; an age of 61; and a family size
of 18. No outliers for GPA.
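These notes do not give a numerical rule for deciding how extreme a score must be. One widely used convention, not stated in the text, is Tukey's fences: flag values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR is the interquartile range. A minimal sketch with hypothetical ages, using `statistics.quantiles` from the standard library:

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical ages of nine college students
ages = [19, 20, 20, 21, 21, 22, 22, 23, 61]
print(iqr_outliers(ages))  # [61]
```

The factor k = 1.5 is a convention rather than a law; judgment about the context (as with the GPA, IQ and wage examples above) still decides how an outlier is treated.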

9. Explain in detail Graphs for Quantitative Data

Histogram

 A bar-type graph for quantitative data.

 The common boundaries between adjacent bars emphasize the continuity of the data, as with continuous variables.

 A casual glance at this histogram confirms previous conclusions: a dense concentration of weights among the 150s, 160s, and 170s, with a spread in the direction of the heavier weights.

 The following points pinpoint some of the more important features of histograms.

 Equal units along the horizontal axis (the X axis, or abscissa) reflect the various class intervals of the frequency distribution.

 Equal units along the vertical axis (the Y axis, or ordinate) reflect increases
in frequency. (The units along the vertical axis do not have to be the same
width as those along the horizontal axis.)

 The intersection of the two axes defines the origin at which both numerical
scales equal 0.

Histogram

Numerical scales always increase from left to right along the horizontal axis and from bottom to top along the vertical axis. It is considered good practice to use wiggly lines to highlight breaks in scale, such as those along the horizontal axis in the figure.

Frequency Polygon

 An important variation on a histogram is the frequency polygon, or line graph.

 Frequency polygons may be constructed directly from frequency distributions. However, we will follow the step-by-step transformation of a histogram into a frequency polygon, as described in panels A, B, C, and D.

 A frequency polygon is a line graph for quantitative data that also emphasizes the continuity of continuous variables.

A. This panel shows the histogram for the weight distribution.

B. Place dots at the midpoints of each bar top or, in the absence of bar tops,
at midpoints for classes on the horizontal axis, and connect them with
straight lines. [To find the midpoint of any class, such as 160–169, simply
add the two tabled boundaries (160 + 169 = 329) and divide this sum by 2
(329/2 = 164.5).]

10. Explain in detail about z Scores for Non-normal Distributions

 z scores are not limited to normal distributions. Non-normal distributions also can be transformed into sets of unit-free, standardized z scores.

 In this case, the standard normal table cannot be consulted, since the shape of the distribution of z scores is the same as that for the original non-normal distribution.

 For instance, if the original distribution is positively skewed, the distribution of z scores also will be positively skewed.

 Regardless of the shape of the distribution, the shift to z scores always produces a distribution of standard scores with a mean of 0 and a standard deviation of 1.
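A quick numerical check of the last two points, as a sketch with a hypothetical positively skewed data set: converting to z scores gives a mean of 0 and a standard deviation of 1, while the long right tail remains (the highest z lies much farther above 0 than the lowest lies below it). The helper name `to_z_scores` is illustrative.

```python
import statistics


def to_z_scores(data):
    """Convert raw scores to z scores using the mean and population SD."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    return [(x - mu) / sigma for x in data]


# Hypothetical, positively skewed scores (long tail toward high values)
raw = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
z = to_z_scores(raw)

print(f"mean of z = {statistics.fmean(z):.6f}")
print(f"SD of z   = {statistics.pstdev(z):.6f}")
# The skew survives: compare how far the extremes lie from 0.
print(f"lowest z = {min(z):.2f}, highest z = {max(z):.2f}")
```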

Interpreting Test Scores

 z scores provide efficient descriptions of relative performance on one or more tests.

 Without additional information, it is meaningless to know that Sharon earned a raw score of 159 on a math test, but it is very informative to know that she earned a z score of 1.80.

 The latter score suggests that she did relatively well on the math test, being almost two standard deviation units above the mean.

 More precise interpretations of this score could be made, of course, if it is known that the test scores approximate a normal curve.

 The use of z scores can help you identify a person’s relative strengths and weaknesses on several different tests.

 For instance, consider Sharon’s scores on college achievement tests in three different subjects. The evaluation of her test performance is greatly facilitated by converting her raw scores into the z scores listed in the final column.

 A glance at the z scores suggests that although she did relatively well on the math test, her performance on the English test was only slightly above average, as indicated by a z score of 0.50, and her performance on the psychology test was slightly below average, as indicated by a z score of –0.67.

Standard Score

 Whenever any unit-free scores are expressed relative to a known mean and a
known standard deviation, they are referred to as standard scores.

 Although z scores qualify as standard scores because they are unit-free and
expressed relative to a known mean of 0 and a known standard deviation of 1,
other scores also qualify as standard scores.

Transformed Standard Scores

 Transformed standard scores are unit-free standard scores that lack negative signs and decimal points.

 For example, a test score located one standard deviation below the mean
might be reported not as a z score of –1.00 but as a T score of 40 in a
distribution of T scores with a mean of 50 and a standard deviation of 10.

 The important point to realize is that although reported as a score of 40, this T
score accurately reflects the relative location of the original z score of –1.00: A
T score of 40 is located at a distance of one standard deviation (of size 10)
below the mean (of size 50).

 The accompanying figure shows the values of some of the more common types of transformed standard scores relative to the various portions of the area under the normal curve.

Converting to Transformed Standard Scores

 Use the following formula to convert any original standard score, z, into a transformed standard score, z′, having a distribution with any desired mean and standard deviation:

$$ z' = \text{desired mean} + (z)(\text{desired standard deviation}) $$

 where z′ (called z prime) is the transformed standard score and z is the original standard score.

 For example, with a desired mean of 500 and a desired standard deviation of 100, a z score of −1.50 becomes z′ = 500 + (−1.50)(100) = 350. The transformed score of 350 is located at a distance of 1.5 standard deviation units (each of size 100) below the mean (of size 500).

 The change from a z score of −1.50 to a z′ score of 350 eliminates negative signs and decimal points without distorting the relative location of the original score, expressed as a distance from the mean in standard deviation units.
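The conversion formula is a one-liner. A sketch that reproduces the two worked values from this section (the T score of 40 and z′ = 350); the function name `transformed_score` is hypothetical:

```python
def transformed_score(z, desired_mean, desired_sd):
    """z' = desired mean + (z)(desired standard deviation)."""
    return desired_mean + z * desired_sd


print(transformed_score(-1.00, 50, 10))    # 40.0  (T score)
print(transformed_score(-1.50, 500, 100))  # 350.0
```

Because the conversion is linear, it preserves each score's distance from the mean in standard deviation units, which is exactly why the relative location is not distorted.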

11. Explain in detail about Standard Normal Curve

 If the original distribution approximates a normal curve, then the shift to standard or z scores will always produce a new distribution that approximates the standard normal curve.

 The standard normal curve is the one normal curve for which a table is actually available.

 The standard normal curve is the tabled normal curve for z scores, with a mean of 0 and a standard deviation of 1.
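Values like those in a printed standard normal table can also be generated rather than looked up. A minimal sketch, assuming Python 3.9+ and the standard-library `statistics.NormalDist`; the three-column layout (z, area between the mean and z, area beyond z) is a common table format assumed here, and `table_row` is a hypothetical helper.

```python
from statistics import NormalDist

# The standard normal curve: mean 0 and standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def table_row(z: float) -> tuple[float, float, float]:
    """Return (z, area between the mean and z, area beyond z) for z >= 0."""
    below = STANDARD_NORMAL.cdf(z)   # total area to the left of z
    return z, below - 0.5, 1.0 - below


for z in (0.00, 0.50, 1.00, 1.50, 2.00):
    z_val, between, beyond = table_row(z)
    print(f"{z_val:4.2f}  {between:.4f}  {beyond:.4f}")
```

Because the curve is symmetric about 0, the areas for a negative z equal those for the corresponding positive z, which is why such tables usually list only non-negative values.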

 To verify (rather than prove) that the mean of a standard normal distribution equals 0, replace X in the z score formula with μ, the mean of any (nonstandard) normal distribution, and then solve for z:

$$ z = \frac{X - \mu}{\sigma} = \frac{\mu - \mu}{\sigma} = \frac{0}{\sigma} = 0 $$

 To verify that the standard deviation of the standard normal distribution equals 1, replace X in the z score formula with μ + 1σ, the value corresponding to one standard deviation above the mean for any (nonstandard) normal distribution, and then solve for z:

$$ z = \frac{X - \mu}{\sigma} = \frac{(\mu + 1\sigma) - \mu}{\sigma} = \frac{1\sigma}{\sigma} = 1 $$
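Both substitutions can be checked numerically. This sketch plugs μ and μ + 1σ into z = (X − μ)/σ for a few arbitrary, hypothetical (μ, σ) pairs, and then standardizes a small hypothetical set of scores to confirm that the resulting z scores have a mean of 0 and a (population) standard deviation of 1. The helper `z_score` is an illustrative name.

```python
from statistics import fmean, pstdev


def z_score(x: float, mu: float, sigma: float) -> float:
    """z = (X - mu) / sigma"""
    return (x - mu) / sigma


# Hypothetical (mu, sigma) pairs for nonstandard normal distributions.
for mu, sigma in [(100.0, 15.0), (500.0, 100.0), (50.0, 10.0)]:
    print(z_score(mu, mu, sigma))          # X = mu maps to z = 0.0
    print(z_score(mu + sigma, mu, sigma))  # X = mu + 1 sigma maps to z = 1.0

# Standardizing a set of scores gives z scores with mean 0 and sd 1.
scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
mu, sigma = fmean(scores), pstdev(scores)
z_scores = [z_score(x, mu, sigma) for x in scores]
print(fmean(z_scores), pstdev(z_scores))
```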

 Although there is an infinite number of different normal curves, each with its
own mean and standard deviation, there is only one standard normal curve,
with a mean of 0 and a standard deviation of 1.

12. Explain in detail about Standard Deviation

Sum of Squares (SS)

 The sum of squared deviation scores, or more simply the sum of squares, symbolized by SS, merits special attention because it’s a major component in calculations for the variance, as well as many other statistical measures.

 Sum of Squares (SS): the sum of squared deviation scores.


Sum of Squares Formulas for Population
 The definition formula provides the most accessible version of the population sum of squares:

$$ SS = \sum (X - \mu)^2 $$

 where SS represents the sum of squares, Σ directs us to sum over the expression to its right, and (X − μ)² denotes each of the squared deviation scores.

 The formula reads, “The sum of squares equals the sum of all squared deviation scores.” You can reconstruct this formula by remembering the following three steps:
1. Subtract the population mean, μ, from each original score, X, to obtain a deviation score, X − μ.
2. Square each deviation score, (X − μ)², to eliminate negative signs.
3. Sum all squared deviation scores, Σ(X − μ)².
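The three steps map directly onto code. A minimal sketch of the definition formula SS = Σ(X − μ)² for a population; the function name and the four scores are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def ss_definition(scores: list[float]) -> float:
    """Population sum of squares by the definition formula SS = Σ(X − μ)²."""
    mu = sum(scores) / len(scores)            # population mean, μ
    deviations = [x - mu for x in scores]     # step 1: X − μ
    squared = [d ** 2 for d in deviations]    # step 2: (X − μ)²
    return sum(squared)                       # step 3: Σ(X − μ)²


scores = [1, 3, 4, 4]            # hypothetical population, N = 4, μ = 3
print(ss_definition(scores))     # 6.0
```

Here the deviation scores are −2, 0, 1 and 1, and their squares 4, 0, 1 and 1 sum to SS = 6. The unsquared deviations sum to 0, which is why step 2 squares them before summing.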

 The computation formula for the population sum of squares is:

$$ SS = \sum X^2 - \frac{(\sum X)^2}{N} $$

 where ΣX², the sum of the squared X scores, is obtained by first squaring each X score and then summing all squared X scores; (ΣX)², the square of the sum of all X scores, is obtained by first adding all X scores and then squaring the sum of all X scores; and N is the population size.
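The computation formula SS = ΣX² − (ΣX)²/N avoids computing each deviation score. A minimal sketch using hypothetical scores; `ss_computation` is an illustrative name.

```python
def ss_computation(scores: list[float]) -> float:
    """Population sum of squares by the computation formula SS = ΣX² − (ΣX)²/N."""
    n = len(scores)                              # N, the population size
    sum_of_squares = sum(x * x for x in scores)  # ΣX²: square each X, then sum
    square_of_sum = sum(scores) ** 2             # (ΣX)²: sum all X, then square
    return sum_of_squares - square_of_sum / n


scores = [1, 3, 4, 4]            # hypothetical population, N = 4
print(ss_computation(scores))    # 42 - 144/4 = 6.0
```

In exact arithmetic this gives the same SS as the definition formula. With floating-point data whose values are very large relative to their spread, the subtraction can lose precision, so in code the definition formula is usually the safer choice.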

[Table: Calculation of Population Standard Deviation (σ) (Computation Formula)]

Sum of Squares Formulas for Sample
 Sample notation can be substituted for population notation in the above two formulas without causing any essential changes:

$$ SS = \sum (X - \bar{X})^2 $$

$$ SS = \sum X^2 - \frac{(\sum X)^2}{n} $$

 where X̄, the sample mean, replaces μ, the population mean, and n, the sample size, replaces N, the population size.

Standard Deviation for Population σ
 A mean is defined as the sum of all scores divided by the number of scores.
 Since the variance is the mean of all squared deviation scores, it can be defined as the sum of all squared deviation scores divided by the number of scores:
variance = sum of all squared deviation scores / number of scores
or, in symbols:

$$ \sigma^2 = \frac{SS}{N} $$

 where the squared lowercase Greek letter, σ² (pronounced “sigma squared”), represents the population variance, SS is the sum of squared deviations for the population, and N is the population size.
 Population Standard Deviation (σ) is a rough measure of the average amount by which scores in the population deviate on either side of their population mean.
 To rid us of the bizarre squared units of measurement, take the square root of the variance to obtain the standard deviation, that is,

$$ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{N}} $$

 where σ represents the population standard deviation, √ instructs us to take the square root of the covered expression, and SS and N are defined above.
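Putting the pieces together for a population: compute SS, divide by N for the variance, and take the square root for the standard deviation. A minimal sketch; `population_variance_sd` and the scores are hypothetical, and the standard-library `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics


def population_variance_sd(scores: list[float]) -> tuple[float, float]:
    """Return (σ², σ) using σ² = SS / N and σ = √(SS / N)."""
    n = len(scores)
    mu = sum(scores) / n
    ss = sum((x - mu) ** 2 for x in scores)   # SS = Σ(X − μ)²
    variance = ss / n                         # σ² = SS / N
    return variance, math.sqrt(variance)      # σ = √σ²


scores = [1, 3, 4, 4]                         # hypothetical population, SS = 6
variance, sd = population_variance_sd(scores)
print(variance)                                      # 1.5
print(math.isclose(sd, statistics.pstdev(scores)))   # True
```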
Standard Deviation for Sample (s)
 Although the sum of squares term remains essentially the same for both
populations and samples, there is a small but important change in the formulas
for the variance and standard deviation for samples.
 This change appears in the denominator of each formula, where N, the population size, is replaced not by n, the sample size, but by n − 1, as shown:

$$ s^2 = \frac{SS}{n - 1} $$

$$ s = \sqrt{\frac{SS}{n - 1}} $$

 Sample Standard Deviation (s) is a rough measure of the average amount by which scores in the sample deviate on either side of their sample mean.
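For a sample, the only change is the n − 1 denominator. A minimal sketch with a hypothetical function name and scores; Python's `statistics.variance` and `statistics.stdev` also divide by n − 1 and are used here only as a cross-check.

```python
import math
import statistics


def sample_variance_sd(scores: list[float]) -> tuple[float, float]:
    """Return (s², s) using s² = SS / (n − 1) and s = √(SS / (n − 1))."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores for n - 1 > 0")
    mean = sum(scores) / n                     # sample mean, X̄
    ss = sum((x - mean) ** 2 for x in scores)  # SS = Σ(X − X̄)²
    variance = ss / (n - 1)                    # s² = SS / (n − 1)
    return variance, math.sqrt(variance)       # s = √s²


scores = [1, 3, 4, 4]            # hypothetical sample, n = 4, SS = 6
s2, s = sample_variance_sd(scores)
print(s2)                                              # 2.0
print(math.isclose(s2, statistics.variance(scores)))  # True
print(math.isclose(s, statistics.stdev(scores)))      # True
```

Dividing the same SS = 6 by N = 4 would give 1.5; the smaller n − 1 denominator makes the sample variance slightly larger.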
