Basics of Statistics Unit-I SCLS
UNIT-II
BY
DR MASHROOR AHMAD KHAN
ASSISTANT PROFESSOR
DEPARTMENT OF TOXICOLOGY
SCLS
Types of data
Primary data
first time, afresh
Secondary data
already collected
Variable
An item of data
Value varies from one observation
to another
Examples:
gender
test scores
weight
TYPES OF VARIABLES
QUALITATIVE
DISCRETE QUANTITATIVE
CONTINUOUS QUANTITATIVE
Qualitative Data
Describes the quality
Non-numerical format
Examples
gender
marital status
geographical region
job title….
Quantitative Data
Frequencies
Measurements
QUALITATIVE
Nominal
Example: Sex (M, F)
Marital Status (single, married, widowed or divorced)
Blood Group (A, B, O or AB)
Color of Eyes (blue, green, brown, black)
For instance, if we record marital status as 1, 2, 3, or 4 as stated above,
the codes are only labels, so we cannot write
4 > 2 or 3 < 4, and we cannot write 3 – 1 = 4 – 2, 1 + 3 = 4, or 4 ÷ 2 = 2
ORDINAL: When the only thing we can do with the
values is set up inequalities (rank them), we refer to the
data as ordinal data
Example:
Response to treatment
(poor, fair, good)
Severity of disease
(mild, moderate, severe)
Income status (low, middle, high)
QUANTITATIVE (DISCRETE)
Example: Number of children
QUANTITATIVE (CONTINUOUS)
Example: Hb (haemoglobin)
CONTINUOUS DATA
DISCRETE DATA
Interval scale:
Data are placed in meaningful, ordered intervals. The units of measurement are
arbitrary.
$$i = \frac{L - S}{1 + 3.322 \log_{10} n}$$
Where,
i = class interval
L = Largest observation
S = Smallest observation
n = total number of observations
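As a rough illustration, the class interval can be computed in a few lines of Python. This is only a sketch: it assumes the intended formula is Sturges' rule, i = (L - S) / (1 + 3.322 log10 n), and the function name `class_interval` and the sample values are hypothetical.

```python
import math

def class_interval(observations):
    """Class width, ASSUMING Sturges' rule: (L - S) / (1 + 3.322 * log10(n))."""
    largest = max(observations)    # L
    smallest = min(observations)   # S
    n = len(observations)          # total number of observations
    return (largest - smallest) / (1 + 3.322 * math.log10(n))

# Hypothetical data: 10 observations ranging from 12 to 48.
sample = [12, 15, 19, 22, 27, 30, 34, 39, 41, 48]
width = class_interval(sample)
print(round(width, 2))
```

In practice the result is usually rounded to a convenient width before the classes are drawn up.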
“Last week were you working full-time, part-time, going to school, keeping house, or
what?”
1. Working full-time
2. Working part-time
3. Temporarily not working
4. Unemployed, laid off
5. Retired
6. School
7. Keeping house
8. Other
Bar chart
Pie chart
Relationship between two nominal
variables
A major North American city has four competing newspapers:
the Globe and Mail (G&M), Post, Star, and Sun.
$$\bar{X} = \frac{f_1 X_1 + f_2 X_2 + \dots + f_n X_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum f_i X_i}{\sum f_i}$$
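A minimal sketch of the frequency-weighted mean in Python may help here. The function name `grouped_mean` and the data are hypothetical, chosen only to show the arithmetic.

```python
def grouped_mean(values, frequencies):
    """Arithmetic mean of a frequency distribution: sum(f_i * X_i) / sum(f_i)."""
    weighted_total = sum(f * x for f, x in zip(frequencies, values))
    return weighted_total / sum(frequencies)

# Hypothetical data: the value 1 occurs twice, 2 three times, and so on.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]
mean = grouped_mean(values, freqs)
print(mean)  # (2 + 6 + 12 + 4) / 10
```

When every frequency equals 1, this reduces to the ordinary mean of the raw observations.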
Example 1
Average birthweight = (3265+3260+...
+ 2834)/20 = 3166.9 g
Example 2: Calculate the arithmetic mean from the
following data
Merits and demerits of arithmetic mean
Merits
(i) It is rigidly defined
(ii) It is easy to understand and easy to calculate.
(iii) It is based upon all the observations.
(iv) It is amenable to algebraic treatment.
Demerits
(i) It is strongly affected by extreme values.
(ii) It often does not correspond to any value
in the set of observations.
Median
It is the preferred measure of location for asymmetric distributions.
Median is the value of the variable that divides the ordered set of values
into two equal halves.
50 percent of the values lie to the left of the median and 50 percent lie to the
right of the median.
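The following sketch, using Python's standard `statistics` module, shows the median of an ordered set and how little it moves when one extreme value is present. The numbers are hypothetical.

```python
import statistics

# Hypothetical observations; the last value is an extreme one.
odd_count = [3, 5, 7, 9, 100]
even_count = [3, 5, 7, 9]

median_odd = statistics.median(odd_count)    # middle value of the ordered set
median_even = statistics.median(even_count)  # average of the two middle values
mean_odd = statistics.mean(odd_count)        # pulled upward by the value 100

print(median_odd, median_even, mean_odd)
```

Here the median stays at 7 while the mean is 24.8, which is why the median is favoured for asymmetric data.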
Demerits of mode
i) Mode is ill defined. It is not always possible to find a clearly defined mode. Example:
bimodal distributions.
ii) It is not based upon all the observations.
iii) It is not capable of further mathematical treatment.
iv) As compared with the mean, the mode is affected to a greater extent by sampling fluctuations.
Consider the series (i) 7, 8, 10, 11, (ii) 3, 6,
9, 12, 15, (iii) 1, 5, 9, 13, 17
Measures of Dispersion
An average can represent a series only as well as a single
figure can; it cannot reveal the entire story of any
phenomenon under study. In particular, it fails to give any idea
about the scatter of the values of a variable in the
series around the average. In order to measure
this scatter, statistical devices called measures of dispersion
are calculated.
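The three series quoted just before this heading make the point concrete: as printed, they share the same mean but differ in scatter. The sketch below checks this with the standard `statistics` module; the range and the population standard deviation are used here simply as two convenient measures of dispersion.

```python
import statistics

# The three series as they appear in the text.
series = {
    "i": [7, 8, 10, 11],
    "ii": [3, 6, 9, 12, 15],
    "iii": [1, 5, 9, 13, 17],
}

means = {name: statistics.mean(xs) for name, xs in series.items()}
ranges = {name: max(xs) - min(xs) for name, xs in series.items()}
spreads = {name: statistics.pstdev(xs) for name, xs in series.items()}

for name in series:
    print(name, means[name], ranges[name], round(spreads[name], 3))
```

All three means equal 9, yet the ranges are 4, 12 and 16, so the average alone hides how widely the values are scattered.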
$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n}, \qquad \sigma = \sqrt{\sigma^2}$$
For grouped data
$$\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}, \qquad \sigma = \sqrt{\sigma^2}$$
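A short sketch of both calculations follows, assuming the divisor is n (the total number of observations) rather than n - 1. The function names and the data are hypothetical.

```python
import math

def variance(xs):
    """Mean squared deviation from the mean, with divisor n."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

def grouped_variance(values, freqs):
    """Same quantity for a frequency distribution: sum(f*(x - mean)^2) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, values)) / n
    return sum(f * (x - mean) ** 2 for f, x in zip(freqs, values)) / n

# Hypothetical frequency distribution and the raw data it summarises.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]
raw = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]

var_grouped = grouped_variance(values, freqs)
var_raw = variance(raw)
sd = math.sqrt(var_grouped)  # standard deviation is the square root of the variance
print(round(var_grouped, 4), round(var_raw, 4), round(sd, 4))
```

The grouped and ungrouped versions agree because the frequency table is just a compact way of writing the same observations.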
{ 8, 5, 4, 12, 15, 5, 7 }
mean 45.52632
SD 15.60713
MD 12.2253
Coefficient of Variation (CV)
                     Sample 1      Sample 2
Age                  25 years      11 years
Mean weight          145 pounds    80 pounds
Standard deviation   10 pounds     10 pounds
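The comparison can be worked through with the usual definition CV = (standard deviation / mean) × 100. The sketch below applies it to the two samples in the table; the function name is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation relative to the mean."""
    return sd / mean * 100

cv_sample1 = coefficient_of_variation(10, 145)  # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)   # 11-year-olds
print(round(cv_sample1, 1), round(cv_sample2, 1))  # 6.9 12.5
```

Although both samples have the same standard deviation, the relative variation is larger in Sample 2 because its mean is smaller.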
Measures of Skewness and Kurtosis
Skewness:
Skewness means lack of symmetry.
If the right tail is longer, we get a positively skewed distribution for
which mean > median > mode.
If the left tail is longer, we get a negatively skewed distribution for
which mean < median < mode.
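A small numerical check of these orderings is sketched below with the standard `statistics` module. The data are hypothetical, and the ordering is a rule of thumb that holds for these values rather than a guarantee for every data set.

```python
import statistics

# Hypothetical sample with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same sample, giving a long left tail.
left_skewed = [15 - x for x in right_skewed]

def summary(xs):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(xs), statistics.median(xs), statistics.mode(xs)

mean_r, median_r, mode_r = summary(right_skewed)
mean_l, median_l, mode_l = summary(left_skewed)
print(mean_r, median_r, mode_r)  # 4.5 3.0 2
print(mean_l, median_l, mode_l)  # 10.5 12.0 13
```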
Measures of Skewness:
Kurtosis
Kurtosis is another measure of the shape of a frequency curve.
2. The mean, the median, and the mode are all equal.
3. The total area under the curve above the x-axis is one square
unit. This characteristic follows from the fact that the normal
distribution is a probability distribution. Because of the
symmetry already mentioned, 50 percent of the area is to the
right of a perpendicular erected at the mean, and 50 percent
is to the left.
4. If we erect perpendiculars a distance of 1 standard
deviation from the mean in both directions, the area
enclosed by these perpendiculars, the x-axis, and the curve
will be approximately 68 percent of the total area. If we
extend these lateral boundaries a distance of 2 standard
deviations on either side of the mean, approximately 95
percent of the area will be enclosed, and extending them a
distance of 3 standard deviations will cause approximately
99.7 percent of the total area to be enclosed.
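The areas quoted in point 4 can be checked numerically. This sketch uses the standard identity that, for a normal distribution, the area within k standard deviations of the mean equals erf(k / √2); the function name `area_within` is hypothetical.

```python
import math

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area * 100, 2))  # percent of the total area
```

The computed values are close to 68, 95 and 99.7 percent, matching the approximate figures given above.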