
Notes of B.Stats

Uploaded by

neet dedhia


Statistics: Introduction

The word statistics has been derived from the Latin word “Status” or the Italian word “Statista”,
both of which mean “political state” or government. The word statistics was originally
applied only to such facts and figures as were required by the state for official purposes, i.e. data
related to population and property.
The application of statistics was very limited, but rulers and kings needed information about lands,
agriculture, commerce and the population of their states to assess their military potential, their wealth,
taxation and other aspects of government.
Statistics: Definition
In the singular sense, “By statistics we mean a science that deals with the collection, classification,
presentation and analysis of numerical facts or data, and with drawing valid inferences from them”.

Statistics: Limitations

• Applies to averages (groups), not to individuals
• Used mostly for quantitative data
• Gives a ‘good’ answer, not the ‘ideal’ answer
• Can be misused
• Is only one of many ways of studying and solving a problem
• May lead to different answers when different methods are applied
Since statistics deals with uncertainty and randomness, statistical techniques have their limitations.
https://www.youtube.com/watch?v=wV0Ks7aS7YI
Relevance of Data
• Data provide objective evidence
• Data help make better decisions
• Data help solve problems
• Data help evaluate performance
• Data help improve processes
• Data help understand consumers
Data (Variable)
A variable is a characteristic of the unit being observed that may assume more than one of a set of
values, to which a numerical measure or a category from a classification can be assigned.
Examples: income, age, weight, occupation, industry, disease.
Numerical data: A characteristic of objects in the population that can be expressed
numerically is known as numerical data.
Example: number of Facebook friends.
Categorical data (attribute): A characteristic that cannot be expressed quantitatively but can
be described qualitatively is known as categorical data or an attribute.
Example: favorite social media application.
Types of Data
Ex. (1) A survey by an electric company contains questions on the following. Describe the data
implicit in these 11 items as numerical or categorical data.
a. Age of household head.
b. Sex of household head.
c. Number of people in household.
d. Use of electric heating (yes or no).
e. Number of large appliances used daily.
f. Thermostat setting in winter.
g. Average number of hours heating is on.
h. Average number of heating days.
i. Household income.
j. Average monthly electrical bill.
k. Ranking of this electric company as compared with two previous electricity suppliers.
Types of data

Discrete data: A discrete variable takes only distinct, separate values, usually whole numbers (analogous to counting).
Example: number of defective items, number of students absent in a statistics class.
Continuous data: A continuous variable can take any value in a range of real numbers (analogous
to measurement). Example: height of first-year students, time spent studying at home.

Ex. (2) For each of the following, indicate whether a discrete or a continuous random variable provides the
better definition.
(a) number of defective items in a sample of 20 items from a large shipment
(b) yearly income for a family
(c) change in price of a share of IBM common stock in a month
(d) number of errors detected in a corporation's accounts
(e) number of claims on a medical insurance policy in a particular year
(f) amount of oil imported into India in a month
(g) number of questions answered correctly in a 50-question objective examination
(i) number of nonproductive hours in an 8-hour workday.

Scale of Measurement: There are four generally used scales of measurement, listed here from weakest
to strongest.
• Nominal scale: In the nominal scale of measurement, numbers are used simply as labels
of groups or classes. Example: gender of respondent, defectiveness of items.

• Ordinal scale: In the ordinal scale of measurement, data elements may be ordered
according to their relative size or quality. Example: Ranking of brands.

• Interval scale: In the interval scale of measurement, the value of zero is assigned
arbitrarily, and therefore we cannot take ratios of two measurements. But we can take ratios
of intervals. Example: time of day, temperature in °C or °F.

• Ratio scale: If two measurements are on a ratio scale, then we can take ratios of those
measurements. The zero on this scale is an absolute zero. Example: money, weight.
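A small numerical check of the interval/ratio distinction, using temperature as the example. The Celsius-to-Fahrenheit conversion F = 9C/5 + 32 moves the zero point, so the ratio of two temperatures changes with the unit while the ratio of two intervals does not. The function name `celsius_to_fahrenheit` and the three sample temperatures are illustrative choices, not part of the notes.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Three illustrative temperatures on the Celsius (interval) scale
a, b, c = 10.0, 20.0, 40.0
fa, fb, fc = (celsius_to_fahrenheit(t) for t in (a, b, c))   # 50.0, 68.0, 104.0

# Ratio of two measurements: depends on where zero is placed, so not meaningful
ratio_celsius = b / a          # 2.0
ratio_fahrenheit = fb / fa     # 68 / 50 = 1.36

# Ratio of two intervals: unchanged by the conversion, so meaningful
interval_ratio_celsius = (c - b) / (b - a)          # 20 / 10 = 2.0
interval_ratio_fahrenheit = (fc - fb) / (fb - fa)   # 36 / 18 = 2.0

print(ratio_celsius, ratio_fahrenheit)
print(interval_ratio_celsius, interval_ratio_fahrenheit)
```

On the Kelvin scale, whose zero is absolute, temperature behaves as a ratio-scale variable and the ratio of two temperatures is meaningful.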

Ex. (1) A survey by an electric company contains questions on the following. Describe the scale of
measurement for the variable implicit in each of the 11 items.

1. Age of household head.

2. Sex of household head.
3. Number of people in household.
4. Use of electric heating (yes or no).
5. Number of large appliances used daily.
6. Thermostat setting in winter.
7. Average number of hours heating is on.
8. Average number of heating days.
9. Household income.
10. Average monthly electrical bill.
11. Ranking of this electric company as compared with two previous electricity suppliers.
Types of data: Primary and Secondary

Primary data

Data that is collected by a researcher from first-hand sources, using methods like surveys,
interviews, or experiments. It is collected with the research project in mind, directly from primary
sources.

Secondary data
Data collected by someone else for some other purpose (but being utilized by the investigator for
his own purpose). The term is used in contrast to primary data. Secondary data is data
gathered from studies, surveys, or experiments that have been run by other people or for other
research.
Advantages

Primary data
• The investigator collects data specific to the problem under study.
• There is no doubt about the quality of the data collected (for the investigator).
• If required, it may be possible to obtain additional data during the study period.
Secondary data

• The data is already available, so there is no hassle of data collection

• It is less expensive
• The investigator is not personally responsible for the quality of data.
Disadvantages

Primary data
• The investigator has to contend with all the hassles of data collection
• The investigator must ensure that the data collected is of a high standard
• The cost of obtaining the data is often the major expense in a study
Secondary data
• The investigator cannot decide what is collected (if specific data about something is
required, for instance).
• One can only hope that the data is of good quality
• Obtaining additional data (or even clarification) about something is usually not possible

Sources of data

Primary sources
Provide raw information and first-hand evidence. Examples include interview transcripts, statistical
data, and works of art. A primary source gives you direct access to the subject of your research.
The following are methods of obtaining primary data:
• Observation
• Experiment
• Interviews
• Survey
Secondary sources
Provide second-hand information and commentary from other researchers. Examples include
journal articles, reviews, and academic books. A secondary source describes, interprets, or
synthesizes primary sources.
• Research journal
• Government publication
• Trade and business magazines
• Any other publication.
Primary sources are more credible as evidence, but good research uses both primary and
secondary sources.
Census and Sampling
• Population: The group of objects (subjects) to which the researcher intends to generalize the results of the study.
• Sample: A subset (part) of the population that the researcher investigates.

• Census: The method of statistical enumeration where all members of the population are
studied. A population refers to the set of all observations under concern.
• Sampling: The technique of selecting individual members or a subset of the population to
make inferences from them or estimate characteristics of the whole population.
Advantages of census:
• Accuracy (free from sampling error)

Advantages of sampling:
• Time
• Money
• Accuracy (In terms of quality of data)
• Scope
Average: Introduction

Main Value: One of the objectives of the analysis of data is to obtain a single value that can
describe the characteristics of the entire mass of data and can be considered
representative of the entire data. A value satisfying this criterion is the central value or an
“average”.
Central Tendency: The average is the representative or typical value of the data. It usually lies
somewhere near the center of the group, which is why averages are termed measures of
central tendency or central values.
Comparison: A large volume of data cannot easily be understood or remembered, so a single
value summarizing the prominent features of the data, such as the average, can be used. If two or more
sets of data are to be compared, it is not possible to compare each and every item. So we
require one figure that represents the entire data in a condensed form, i.e. an average. Thus averages
facilitate comparisons.

Definition

Arithmetic Mean: The most widely used measure of location or central tendency is the Arithmetic
Mean. It is defined as sum of the observations divided by the number of observations.
Median: When all the observations of a variable are arranged in ascending (or descending)
order, the middle observation is known as the median.
Mode: It is the most frequently occurring observation in the data, i.e. the most common or most
fashionable value, if it exists.
Average for ungrouped data
Example (1) A random sample of 22 business economists was asked to predict the percentage
growth in the consumer price index number over the next year. The forecasts were:
3.6 3.1 3.9 3.7 3.5 3.7 3.4
3.0 3.6 3.4 3.1 2.9 3.0 4.0 2.8
3.8 4.2 2.5 3.1 3.9 2.9 2.6
Find the sample mean.

Example (2): The following data represent the number of days it took 7 individuals to quit smoking
after completing a course designed for this purpose. What is the sample median?
1 100 5 2 8 3 7
Example (3): A sample of 12 senior executives found the following results for percentage of total
compensation derived from bonus payments. Find the sample median.
15.8 7.3 28.4 18.2 15.0 24.7
13.1 10.2 29.3 34.7 16.9 25.3
Example (4) The following are the sizes of the last 8 dresses sold at a women's boutique. What is
the sample mode?
8 10 6 4 10 12 14 10
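As a check on Examples (1) to (4), Python's standard-library `statistics` module computes these three averages directly. This is a minimal sketch: the variable names are illustrative, and the module simply applies the definitions above (sum divided by count, middle of the ordered values, most frequent value).

```python
import statistics

# Example (1): forecasts of % growth in the consumer price index (n = 22)
cpi_forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 3.7, 3.4,
                 3.0, 3.6, 3.4, 3.1, 2.9, 3.0, 4.0, 2.8,
                 3.8, 4.2, 2.5, 3.1, 3.9, 2.9, 2.6]
# Example (2): days taken to quit smoking (n = 7, odd)
days_to_quit = [1, 100, 5, 2, 8, 3, 7]
# Example (3): % of total compensation from bonus payments (n = 12, even)
bonus_pct = [15.8, 7.3, 28.4, 18.2, 15.0, 24.7,
             13.1, 10.2, 29.3, 34.7, 16.9, 25.3]
# Example (4): sizes of the last 8 dresses sold
dress_sizes = [8, 10, 6, 4, 10, 12, 14, 10]

sample_mean = statistics.mean(cpi_forecasts)    # sum / count = 73.7 / 22
median_days = statistics.median(days_to_quit)   # 4th of the 7 ordered values
median_bonus = statistics.median(bonus_pct)     # average of the 6th and 7th ordered values
sample_mode = statistics.mode(dress_sizes)      # most frequent value

print(round(sample_mean, 2), median_days, round(median_bonus, 2), sample_mode)   # 3.35 5 17.55 10
```

Note how the single extreme value 100 in Example (2) does not move the median (5), whereas it would pull the mean up to 18.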
Average for grouped data (discrete case)
Example (5) Mr. XYZ is Quality control manager of ABC electrical limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches. Find the mean, median and mode.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4

Mean: Here the variable X assumes distinct values $x_1, x_2, x_3, \ldots, x_k$ with the
corresponding frequencies $f_1, f_2, f_3, \ldots, f_k$.
Then the arithmetic mean is

$$\bar{X} = \frac{\text{Sum of the values}}{\text{No. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum f x}{n}$$

where $n = \sum f = f_1 + f_2 + f_3 + \cdots + f_k$.
Wait and think: Mean
• It is based on all observations; hence it is a better representative of the data.
• It can be calculated only for data on an interval or ratio scale.
• It is affected by extreme values.

Median: First calculate the less-than type cumulative frequencies. The median is then the
value of the variable whose cumulative frequency, reading from the top, first equals or exceeds
$\frac{n+1}{2}$, where $n$ is the total number of observations.
Wait and think: Median
• It is not affected by extreme values.
• It is not based on all observations.
• It can be calculated only for data on an ordinal, interval or ratio scale.

Mode: Here mode can be obtained as the value of the variable with the maximum frequency.
Wait and think: Mode
• It is not affected by extreme values.
• It is not based on all observations.
• It can be calculated for data on any scale, including the nominal scale.
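The three rules for a discrete frequency distribution can be sketched in a few lines and applied to Example (5). The helper names `grouped_mean`, `grouped_median` and `grouped_mode` are hypothetical; the median follows the notes' rule of taking the first value whose less-than cumulative frequency reaches (n + 1)/2.

```python
def grouped_mean(values, freqs):
    """Arithmetic mean of a discrete frequency distribution: sum(f*x) / sum(f)."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n


def grouped_median(values, freqs):
    """First value whose less-than cumulative frequency reaches (n + 1) / 2."""
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f
        if cumulative >= target:
            return x
    raise ValueError("empty frequency distribution")


def grouped_mode(values, freqs):
    """Value with the maximum frequency (the first one, if there is a tie)."""
    return values[freqs.index(max(freqs))]


# Example (5): number of defects (x) and number of switches (f)
defects = [0, 1, 2, 3, 4]
switches = [2, 8, 10, 6, 4]

print(round(grouped_mean(defects, switches), 2))   # 2.07  (62 / 30)
print(grouped_median(defects, switches))           # 2     (cumulative 2, 10, 20 >= 15.5)
print(grouped_mode(defects, switches))             # 2     (largest frequency, 10)
```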

Averages for grouped data (continuous case)

Example (6): The “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons. Calculate the mean, median and mode.
Class interval (computer usage in hours) Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3

Mean: Here the variable X assumes the values $x_1, x_2, x_3, \ldots, x_k$, the representative values
(mid-values or class marks) of the class intervals, with the corresponding frequencies $f_1, f_2, f_3, \ldots, f_k$.

Then the arithmetic mean is

$$\bar{X} = \frac{\text{Sum of the values}}{\text{No. of values}} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum f x}{n}$$

where $n = \sum f = f_1 + f_2 + f_3 + \cdots + f_k$.
Median: First calculate the less-than type cumulative frequencies and then identify the median
class (the class interval that contains the median value) as the class interval whose
cumulative frequency, reading from the top, first equals or exceeds $\frac{n+1}{2}$, where $n$ is the total
number of observations. The median is then calculated using the formula

$$\text{Median} = l_1 + \frac{l_2 - l_1}{f}\left(\frac{n+1}{2} - c.f.\right)$$

where
$l_1$ - lower limit of the median class
$l_2$ - upper limit of the median class
$f$ - frequency of the median class
$c.f.$ - cumulative frequency of the class preceding the median class
Mode: First identify the modal class (the class interval that contains the mode) as the
class interval with the maximum frequency. The mode is then given by

$$\text{Mode} = l_1 + \frac{(l_2 - l_1)(f_1 - f_0)}{2f_1 - f_0 - f_2}$$

where
$l_1$ - lower limit of the modal class
$l_2$ - upper limit of the modal class
$f_0$ - frequency of the class preceding the modal class
$f_1$ - frequency of the modal class
$f_2$ - frequency of the class following the modal class
Example (7): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the mean, median and mode.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2
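The three continuous-case formulas can be sketched as follows and applied to Example (7). The function names are hypothetical. One assumption is labelled in the code: when the modal class is the first (or last) class, the missing neighbouring frequency $f_0$ (or $f_2$) is taken as 0. With that convention the results match the answers quoted for the same data in Example (12) (median 11, mode 6.14).

```python
def continuous_mean(classes, freqs):
    """sum(f*x) / n, where x is the class mark (mid-value) of each interval."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)


def continuous_median(classes, freqs):
    """Median = l1 + (l2 - l1) / f * ((n + 1) / 2 - c.f.)."""
    target = (sum(freqs) + 1) / 2
    cf_before = 0                      # cumulative frequency of the pre-median class
    for (l1, l2), f in zip(classes, freqs):
        if cf_before + f >= target:    # this is the median class
            return l1 + (l2 - l1) / f * (target - cf_before)
        cf_before += f
    raise ValueError("empty frequency distribution")


def continuous_mode(classes, freqs):
    """Mode = l1 + (l2 - l1)(f1 - f0) / (2 f1 - f0 - f2)."""
    k = freqs.index(max(freqs))        # index of the modal class
    l1, l2 = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # assumption: 0 if no class before
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # assumption: 0 if no class after
    return l1 + (l2 - l1) * (f1 - f0) / (2 * f1 - f0 - f2)


# Example (7): minutes late and number of aircraft (n = 55)
minutes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
aircraft = [27, 10, 7, 5, 4, 2]

print(round(continuous_mean(minutes, aircraft), 2))    # 16.82  (925 / 55)
print(round(continuous_median(minutes, aircraft), 2))  # 11.0   (10 + 10/10 * (28 - 27))
print(round(continuous_mode(minutes, aircraft), 2))    # 6.14   (270 / 44)
```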
Combined Arithmetic Mean

Example (8) The average wage for 50 male workers is Rs.17630/- & the average wage for 40
female workers is Rs.14540/- in a factory. Find the combined average for all the workers in the
factory.

Combined Mean: If the A.M.s of two groups are $\bar{X}_1$ and $\bar{X}_2$, with $n_1$ and $n_2$ observations in
the groups respectively, then the combined mean of the two groups taken together is given by

$$\bar{X}_{com} = \frac{\text{Sum of the values}}{\text{No. of values}} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$
Example (9) The mean weight of 50 students in a class was 53 kg. Two new students, with weights 54
kg and 51 kg, were admitted to this class. Find the average weight of all 52 students.
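A minimal sketch of the combined-mean formula applied to Examples (8) and (9). The helper `combined_mean` is a hypothetical name; it accepts any number of (n, mean) pairs, which is the same formula extended beyond two groups. For Example (9), the two new students are treated as a second group of size 2 with mean (54 + 51)/2 = 52.5 kg.

```python
def combined_mean(groups):
    """Combined A.M. of groups given as (n_i, mean_i) pairs: sum(n_i * mean_i) / sum(n_i)."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Example (8): 50 male workers averaging Rs. 17,630 and 40 female workers averaging Rs. 14,540
factory_avg = combined_mean([(50, 17630), (40, 14540)])   # 1,463,100 / 90

# Example (9): 50 students averaging 53 kg plus 2 new students averaging 52.5 kg
class_avg = combined_mean([(50, 53), (2, (54 + 51) / 2)])  # 2,755 / 52

print(round(factory_avg, 2), round(class_avg, 2))   # 16256.67 52.98
```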

Weighted Arithmetic Mean

Example (10) Miss Pooja has Rs.2,00,000 at her disposal, which she has invested in three
different investment opportunities with the following information. Calculate the average rate of return on
the total investment.
Investment option    Money invested (Rs.)    Rate of return (%)
Equities    1,00,000    17
Corporate Bonds    60,000    9
Government Bond    40,000    6

While calculating the A.M., we have assumed that all values are equally important, which may not
be true in many practical situations.
In cases where some values are more important than others, we calculate the weighted A.M.

Here the variable X assumes distinct values $x_1, x_2, x_3, \ldots, x_k$ together with
their relative importance (weights) $w_1, w_2, w_3, \ldots, w_k$.
Then the weighted arithmetic mean is

$$\bar{X}_w = \frac{\text{Weighted sum of the values}}{\text{Total of the weights}} = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_k x_k}{w_1 + w_2 + w_3 + \cdots + w_k} = \frac{\sum w x}{\sum w}$$
Example (11) Calculate the weighted A.M. of the marks obtained by Mrs. Pragati in the examination, using the credits as weights.
Subjects Credits Marks obtained

Business Communication 2 70
Business Statistics 3 98
Financial accounting 3 86
Business Economics 3 69
Foundation Course 2 78
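A short sketch of the weighted A.M. for Examples (10) and (11). The function name `weighted_mean` is hypothetical. In Example (10) the weights are the amounts invested; in Example (11) they are the subject credits.

```python
def weighted_mean(values, weights):
    """Weighted A.M.: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example (10): rates of return (%) weighted by the money invested (Rs.)
rates = [17, 9, 6]
invested = [100_000, 60_000, 40_000]
avg_return = weighted_mean(rates, invested)        # 2,480,000 / 200,000

# Example (11): marks weighted by subject credits
marks = [70, 98, 86, 69, 78]
subject_credits = [2, 3, 3, 3, 2]
weighted_marks = weighted_mean(marks, subject_credits)   # 1,055 / 13

print(avg_return, round(weighted_marks, 2))   # 12.4 81.15
```

For comparison, the unweighted mean of the three rates in Example (10) would be (17 + 9 + 6)/3 ≈ 10.67%, which understates the return because the largest amount earns the highest rate.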

Presentation of data: Frequency distribution

As the number of observations obtained gets larger, the data needs to be further condensed into
summary tables in order to properly present, analyse, and interpret the findings.
The most common method is preparing a frequency distribution, which is a summary table in
which the data is arranged into conveniently established numerically ordered class groupings or
categories.
Important terms:
• Frequency
• Class / class interval
• Inclusive type of class interval
• Exclusive type of class interval
• Class marks
Some important points to remember while preparing frequency distributions:
• Each observation in the dataset must fall into one and only one class/class interval.
• The number of class intervals should be at least 5 but no more than 15.
• Class intervals of equal width are usually used (although not always).
• The midpoint of each interval should be close to the average of the observations included in the
class interval.
A useful rule of thumb for establishing the width of the intervals is as follows:
Width of class interval = (largest value-smallest value)/number of class intervals desired
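The rule of thumb can be sketched as follows, applied to the job-satisfaction scores of Example (3) below (largest 97, smallest 41, 6 classes desired). The function name `class_width` is hypothetical, and rounding the raw width up to a convenient whole number is an assumed, common practice rather than something the notes prescribe.

```python
import math

def class_width(largest, smallest, num_classes):
    """Rule-of-thumb width: (largest value - smallest value) / number of classes desired."""
    return (largest - smallest) / num_classes

raw_width = class_width(97, 41, 6)   # 56 / 6, about 9.33
width = math.ceil(raw_width)         # assumption: round up to a convenient width
print(round(raw_width, 2), width)    # 9.33 10
```

A width of 10 gives the classes 40-49, 50-59, ..., 90-99 used in that example.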

Frequency distribution: Qualitative data

Example (1): The following data represent the operating systems of the smartphones used by a class of
students. Prepare the frequency distribution of the data. (A = Android, W = Windows Phone, I =
iPhone, AM = Amazon’s Fire Phone)
AM, A, I, I, I, W, I, AM, W, A,
I, I, W, A, I, A, AM, W, W, I,
I, I, W, A, A, A, W, I, AM, AM,
A, A, I, A, I, A, A, W, I, I
Solution: Frequency distribution of OS of smartphone
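For categorical data, preparing the frequency distribution amounts to tallying each category. A minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Example (1): smartphone operating systems for 40 students
# A = Android, W = Windows Phone, I = iPhone, AM = Amazon's Fire Phone
responses = """AM, A, I, I, I, W, I, AM, W, A,
I, I, W, A, I, A, AM, W, W, I,
I, I, W, A, A, A, W, I, AM, AM,
A, A, I, A, I, A, A, W, I, I"""

# Split on commas, strip spaces/newlines, and ignore empty pieces
codes = [c.strip() for c in responses.split(",") if c.strip()]
freq = Counter(codes)

for os_code in ["A", "I", "W", "AM"]:
    print(os_code, freq[os_code])
print("Total", sum(freq.values()))
```

This prints A 12, I 15, W 8, AM 5 and a total of 40.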

Frequency distribution: Quantitative data

Example (2): Mr. XYZ is Quality control manager of ABC electrical limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches.
2 1 3 2 1 3 3 2 4 1
2 1 0 1 0 2 3 2 1 3
1 2 1 4 4 2 4 3 2 2
Solution: Frequency distribution of No. of Defect
Class(No of defects) Frequency
0
1
2
3
4

Example (3): In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater satisfaction. Summarise the data
using a frequency distribution.
87 59 80 61 50 60 70 89 84 76

76 41 81 88 47 65 74 84 76 78

67 50 70 46 81 92 53 83 78 67

58 90 73 85 87 77 43 70 64 74

92 75 69 97 75 71 61 46 69 64

Solution: Frequency distribution of Satisfaction score

Class Interval (Satisfaction score) Frequency
40 - 49 5
50 - 59 5
60 - 69 10
70 - 79 15
80 - 89 11
90 - 99 4

Example (4): The “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Summarize the data by constructing a frequency distribution with a class width of 3 hours.
Solution: Frequency distribution of Computer usage in hours
Class interval (computer usage in hours) Frequency
0-3
3-6
6-9
9-12
12-15
Total =
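A minimal sketch of the tally for Example (4), assuming exclusive classes [0, 3), [3, 6), ..., so that a value on a boundary (here 3.0) is counted in the upper class. The variable names are illustrative.

```python
# Example (4): weekly hours of personal computer usage for 50 persons
hours = [4.1, 1.5, 10.4, 5.9, 3.4, 5.7, 1.6, 6.1, 3.0, 3.7,
         3.1, 4.8, 2.0, 14.8, 5.4, 4.2, 3.9, 4.1, 11.1, 3.5,
         4.1, 4.1, 8.8, 5.6, 4.3, 3.3, 7.1, 10.3, 6.2, 7.6,
         10.8, 2.8, 9.5, 12.9, 12.1, 0.7, 4.0, 9.2, 4.4, 5.7,
         7.2, 6.1, 5.7, 5.9, 4.7, 3.9, 3.7, 3.1, 6.1, 3.1]

width, start, num_classes = 3, 0, 5
counts = [0] * num_classes
for h in hours:
    # Exclusive classes: index = how many whole widths h lies above the start
    counts[int((h - start) // width)] += 1

for i, c in enumerate(counts):
    lo = start + i * width
    print(f"{lo}-{lo + width}", c)
print("Total =", sum(counts))
```

The counts come out as 5, 28, 8, 6 and 3 (total 50), matching the table used for the same data in Example (10).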

Presentation of Data: Diagrams

The following points should be remembered while making diagrams and graphs.
A good diagram or graph:
• Provides a clear summary of the data
• Is a fair and honest representation
• Highlights underlying patterns
• Allows a lot of information to be extracted quickly.
A bad diagram:
• Confuses the viewer
• Misleads (either accidentally or intentionally).
Diagrams: Frequency diagram

Example (5) The following frequency diagram represents the number of confirmed cases of COVID-19
in India and the world.

Diagram: Simple Bar Diagrams

Example (6) The following bar chart represents the proportion of women in the total labour force (%).

[Figure: simple bar chart, “Proportion of Women in the Total Labour Force (%)”, vertical axis 0 to 50; the country labels on the horizontal axis are not recoverable.]

Diagrams: Multiple bar diagram

Example (7): The following diagram represents the record of disinvestment (Rs. in crores) for the
years 1991-92 to 2001-02.
[Figure: multiple bar chart, “Record of Disinvestment”, comparing “Target Set by Government” with “Actual Receipts” (Rs. in crores, vertical axis 0 to 14,000) by year from 1991-92 to 2001-02.]
Diagram: Subdivided bar diagram

Example (8): The following diagram represents the distribution of seniors, adults and children in hotel
accommodation among Irish, British, Mainland European and Rest of World guests.

Diagram: Pie diagram

Pie chart: With a small number of categories, we can use a pie chart. The angle of each sector can be
calculated using the formula

$$\text{Angle in degrees} = \frac{\text{Component value} \times 360}{\text{Total value of all components}}$$
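A small sketch of the angle formula. As illustrative input it uses the counts obtained by tallying the smartphone data of Example (1) in the frequency-distribution section (Android 12, iPhone 15, Windows Phone 8, Amazon Fire Phone 5); the function name `pie_angles` is hypothetical.

```python
def pie_angles(freqs):
    """Angle (degrees) of each sector: component value * 360 / total of all components."""
    total = sum(freqs.values())
    return {k: v * 360 / total for k, v in freqs.items()}

os_counts = {"Android": 12, "iPhone": 15, "Windows Phone": 8, "Amazon Fire Phone": 5}
angles = pie_angles(os_counts)
print(angles)   # {'Android': 108.0, 'iPhone': 135.0, 'Windows Phone': 72.0, 'Amazon Fire Phone': 45.0}
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic.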
Example (9): The following Pie diagram represents distribution of favorite types of movie.

Presentation of Data: Quantitative data

Presentation of Data: Histogram

Example (10): The “Computer Today” reported on home technology and its usage by persons aged
12 and older. The following data are the hours of personal computer usage during one week for a
sample of 50 persons:
4.1 1.5 10.4 5.9 3.4 5.7 1.6 6.1 3.0 3.7
3.1 4.8 2.0 14.8 5.4 4.2 3.9 4.1 11.1 3.5
4.1 4.1 8.8 5.6 4.3 3.3 7.1 10.3 6.2 7.6
10.8 2.8 9.5 12.9 12.1 0.7 4.0 9.2 4.4 5.7
7.2 6.1 5.7 5.9 4.7 3.9 3.7 3.1 6.1 3.1
Prepare the histogram representing the data. Calculate the mode from the histogram and verify your
answer by calculating it using the formula. (Answer = 4.6)
Class interval Frequency
0-3 5
3-6 28
6-9 8
9-12 6
12-15 3
Total = 50
Presentation of Data: Ogives

Cumulative (less-than type) frequency graph: It plots the number of observations less than
a given value. Plot the points by taking the upper limit of each class interval on the x-axis and the
corresponding cumulative frequency on the y-axis, and join these points with a smooth freehand curve.
Example (11): For the data given in Example (10), draw the less-than type cumulative frequency
curve. Hence find the value of the median from the graph and verify your answer by calculating it
using the formula. (Given answer = 5.22)
Example (12): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below.
Minutes Late No. of aircrafts
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2

Prepare the histogram representing the data. Calculate the mode from the histogram and verify your
answer by calculating it using the formula. (Mode = 6.14)
Draw the less than type cumulative curve. Hence find the value of median from the graph and
verify your answer by calculating it using the formula. (Median = 11)
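The ogive points for Example (12) can be computed with `itertools.accumulate`. To mimic reading the median off the graph, the sketch joins adjacent points with straight lines and reads x at height (n + 1)/2; straight-line joining is an assumption (the notes draw a smooth freehand curve), and with it the reading coincides with the median formula. The helper `read_ogive` is a hypothetical name.

```python
from itertools import accumulate

# Example (12): minutes late (upper class limits) and number of aircraft
upper_limits = [10, 20, 30, 40, 50, 60]
freq = [27, 10, 7, 5, 4, 2]
lower_start = 0   # lower limit of the first class

# Less-than ogive points: (upper class limit, cumulative frequency), starting from (0, 0)
points = [(lower_start, 0)] + list(zip(upper_limits, accumulate(freq)))
print(points)   # [(0, 0), (10, 27), (20, 37), (30, 44), (40, 49), (50, 53), (60, 55)]


def read_ogive(points, y):
    """Read x at height y by straight-line interpolation between adjacent ogive points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= y <= y1 and y1 > y0:
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0)
    raise ValueError("y is outside the range of the ogive")


n = sum(freq)
median = read_ogive(points, (n + 1) / 2)   # reading at height (55 + 1) / 2 = 28
print(median)   # 11.0
```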
Quartiles: Quartiles are not measures of central tendency but partition values, that is, they
are specific points that divide a large ordered data set into four quarters.
The data must first be arranged in ascending order; the quartiles are then given by:
First quartile (lower quartile), Q1: The first quartile, Q1, divides the ordered data set such that
25% of observations are at or below this value.

$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ value}$$

Second quartile, Q2: The second quartile, Q2, divides the ordered data set such that 50% of
observations are at or below this value.

$$Q_2 = \text{Median} = \left\{2\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ value}$$

Third quartile (upper quartile), Q3: The third quartile, Q3, divides the ordered data set such that
75% of observations are at or below this value.

$$Q_3 = \left\{3\left(\frac{n+1}{4}\right)\right\}^{\text{th}} \text{ value}$$

where $n$ is the number of observations in the data.
Example (1): The growing use of personal computers is suggested to be one of the reasons people can
operate at-home businesses. Following is a sample of age data for individuals working at home.
22 58 24 50 29 52 57 31 30 41
44 40 46 29 31 37 32 44 49 29
Compute the first, second and third quartiles.
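The positional rule can be sketched as below for Example (1). When the position k(n + 1)/4 is not a whole number, the sketch interpolates linearly between the two neighbouring ordered values; that interpolation is an assumption, since the notes give only the position. The function name `quartile` is hypothetical. As a cross-check, Python's `statistics.quantiles` with `method="exclusive"` (its default) uses the same (n + 1) rule.

```python
import statistics

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) as the {k(n + 1)/4}th ordered value.

    A fractional position is handled by linear interpolation between the
    two neighbouring ordered values (an assumption).
    """
    data = sorted(values)
    n = len(data)
    pos = k * (n + 1) / 4          # 1-based position in the ordered data
    if pos <= 1:
        return data[0]
    if pos >= n:
        return data[-1]
    lower = int(pos)               # whole-number part of the position
    frac = pos - lower             # fractional part
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

# Example (1): ages of individuals working at home (n = 20)
ages = [22, 58, 24, 50, 29, 52, 57, 31, 30, 41,
        44, 40, 46, 29, 31, 37, 32, 44, 49, 29]
q = [quartile(ages, k) for k in (1, 2, 3)]
print(q)                                                     # [29.25, 38.5, 48.25]
print(statistics.quantiles(ages, n=4, method="exclusive"))   # [29.25, 38.5, 48.25]
```

For instance, Q1 sits at position 21/4 = 5.25, a quarter of the way from the 5th value (29) to the 6th (30).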

Example (2) The IQ scores for a sample of 30 students who are entering their first year of high
school are shown below:
95 95 97 98 101
102 103 104 105 106
106 107 108 108 110
111 115 115 117 119
119 121 121 126 126
128 133 134 136 142

Find the three quartiles. Without calculating, give the value of median.
Example (3) Mr. XYZ is Quality control manager of ABC electrical limited. To check the quality of
the switches, he selects 30 switches randomly from the lot and observes the following number of defects in
the 30 switches. Find the three quartiles.
Class (No of defects) Frequency
0 2
1 8
2 10
3 6
4 4

Example (4): During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes
they were late is shown in the frequency table below. Calculate the three quartiles.
Minutes late No. of aircraft
0-10 27
10-20 10
20-30 7
30-40 5
40-50 4
50-60 2

Dispersion: Introduction

In addition to averages, some additional information about the observations is required to know the
extent to which the values vary from one another and from the central value.
A measure of the spread or scatter of the data is called a measure of variation or dispersion.
A measure of dispersion gives an idea of the reliability of an average. When the variability
is small, the average is more reliable and is a better estimate of the population average; when
the dispersion is large, the average is not a good representative of the data.
Measures of dispersion can also be used to compare two or more distributions. The one with less
dispersion is more consistent or homogeneous, and the one with more dispersion is less consistent.

Dispersion: Types of dispersion

There are two kinds of measures of dispersion, namely:

• Absolute measures of dispersion
• Relative measures of dispersion
Absolute measures of dispersion indicate the amount of variation in a set of values in
the units of the observations. For example, when rainfall data for different days are available
in mm, any absolute measure of dispersion gives the variation in rainfall in mm.
On the other hand, relative measures of dispersion (also known as coefficients of dispersion) are free from
the units of measurement of the observations. They are pure numbers and are used to
compare the variation in two or more data sets that have different units of measurement.
Absolute dispersion        Relative dispersion
Range                      Coefficient of range
Quartile deviation         Coefficient of Q.D.
Mean deviation             Coefficient of M.D.
Standard deviation         Coefficient of variation

Range: It is defined as the difference between the maximum and minimum observations in the
data.

$$\text{Range} = \text{Max.} - \text{Min.}$$

And the corresponding relative measure is given by

$$\text{Coeff. of Range} = \frac{\text{Max.} - \text{Min.}}{\text{Max.} + \text{Min.}}$$
Quartile Deviation (Q.D.): It is defined as the average spread of the middle 50% of the data, i.e. the average
of the distances of the first quartile $Q_1$ and the third quartile $Q_3$ from the median $Q_2$.

$$Q.D. = \frac{(Q_2 - Q_1) + (Q_3 - Q_2)}{2} = \frac{Q_3 - Q_1}{2}$$

And the corresponding relative measure is given by

$$\text{Coeff. of } Q.D. = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Mean Deviation: It is defined as the average of the absolute deviations of the values from the mean.

$$M.D. = \frac{1}{n}\sum \left|X - \bar{X}\right|$$

And the corresponding relative measure is given by

$$\text{Coeff. of } M.D. = \frac{M.D.}{\bar{X}}$$
Standard Deviation: It is defined as the square root of the average of the squared deviations of the values from the
mean.

$$S.D. = S = \sqrt{\frac{1}{n}\sum \left(X - \bar{X}\right)^2} = \sqrt{\frac{1}{n}\sum X^2 - \bar{X}^2}$$

And the corresponding relative measure is known as the coefficient of variation (C.V.) and is given by

$$C.V. = \frac{S.D.}{\bar{X}} \times 100$$

Example (1) Eight participants in a bike race had the following finishing times in minutes.
28 22 26 33 21 23 37 24
Compute the range, Q.D., M.D. and S.D. and their coefficients.

Example (2) The Los Angeles Times regularly reports the air quality index for various areas of
southern California. A sample of air quality index values for Pomona provided the following data:
28 42 58 48 45 55 60 49 50
Compute the range, Q.D., M.D. and S.D. and their coefficients.
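A sketch that computes all four absolute measures and their relative counterparts for ungrouped data, applied to the bike-race times of Example (1). The function name `dispersion_summary` is hypothetical. It uses the 1/n formulas above; quartiles come from `statistics.quantiles(..., method="exclusive")`, which follows the (n + 1) positional rule with linear interpolation between neighbouring values (the interpolation is an assumption about how fractional positions are handled).

```python
import math
import statistics

def dispersion_summary(values):
    """Absolute and relative measures of dispersion for ungrouped data (1/n formulas)."""
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")   # (n + 1) rule

    rng = data[-1] - data[0]
    qd = (q3 - q1) / 2
    md = sum(abs(x - mean) for x in data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)   # same as statistics.pstdev

    return {
        "range": rng, "coeff_range": rng / (data[-1] + data[0]),
        "qd": qd, "coeff_qd": (q3 - q1) / (q3 + q1),
        "md": md, "coeff_md": md / mean,
        "sd": sd, "cv_percent": sd / mean * 100,
    }

# Example (1): finishing times (minutes) of eight bike-race participants
bike_times = [28, 22, 26, 33, 21, 23, 37, 24]
for name, value in dispersion_summary(bike_times).items():
    print(f"{name:12s} {value:.4f}")
```

For this data the mean is 26.75 minutes, Q1 = 22.25 and Q3 = 31.75, so Q.D. = 4.75, M.D. = 4.4375 and S.D. = √27.9375 ≈ 5.29. The same function can be called on the Pomona data of Example (2).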

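For grouped data, the formulas are

$$M.D. = \frac{1}{n}\sum f\left|X - \bar{X}\right|$$

$$S.D. = S = \sqrt{\frac{1}{n}\sum f\left(X - \bar{X}\right)^2} = \sqrt{\frac{1}{n}\sum f X^2 - \bar{X}^2}$$

where $n = \sum f$ (for class intervals, X is the class mark).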

Example (3) The scores of 20 students in a color sensitivity test are given by the following frequency
distribution. Calculate the range, Q.D., M.D. and S.D. and their coefficients. (1 = least sensitive, 7 =
most sensitive)
score 1 2 3 4 5 6 7
frequency 3 1 3 4 6 2 1
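A minimal sketch of the grouped M.D. and S.D. formulas applied to Example (3). The function name `grouped_md_sd` is hypothetical; the S.D. uses the shortcut form √(Σfx²/n − x̄²), which is algebraically equal to the deviation form.

```python
import math

def grouped_md_sd(x, f):
    """Mean, mean deviation and standard deviation of a frequency distribution (1/n formulas)."""
    n = sum(f)
    mean = sum(fi * xi for xi, fi in zip(x, f)) / n
    md = sum(fi * abs(xi - mean) for xi, fi in zip(x, f)) / n
    # Shortcut form: sqrt(sum(f * x^2) / n - mean^2)
    sd = math.sqrt(sum(fi * xi ** 2 for xi, fi in zip(x, f)) / n - mean ** 2)
    return mean, md, sd

# Example (3): colour-sensitivity scores and number of students
scores = [1, 2, 3, 4, 5, 6, 7]
students = [3, 1, 3, 4, 6, 2, 1]
mean, md, sd = grouped_md_sd(scores, students)
print(round(mean, 2), round(md, 3), round(sd, 4))   # 3.95 1.365 1.6875
```

Here Σfx = 79 and Σfx² = 369, so the variance is 369/20 − 3.95² = 2.8475. For Examples (4) and (5), pass the class marks (mid-values) as `x`.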
Example (4) The following is the frequency distribution of the ages of Instagram users in a random survey.
Calculate the range, Q.D., M.D. and S.D. and their coefficients.

Age of Instagram user No. of users

12-18 9
18-24 34
24-30 35
30-36 16
36-42 8
42-48 4
48-54 2

Example (5) In a study of job satisfaction, a series of tests was administered to 50 subjects. The
following data were obtained; higher scores represent greater satisfaction. Calculate the range,
Q.D., M.D. and S.D. and their coefficients.

Class Interval (Satisfaction score) Frequency

40 - 49 5
50 - 59 5
60 - 69 10
70 - 79 15
80 - 89 11
90 - 99 4

Symmetry: The shape of a distribution is said to be symmetric if the observations are balanced,
or evenly distributed, about the mean.

In a symmetric distribution the mean and the median are equal.

Skewness: It refers to distortion or asymmetry, relative to a symmetrical bell curve (normal distribution),
in a set of data. A distribution is skewed if the observations are not symmetrically distributed about
the mean. If the curve is shifted to the left or to the right, it is said to be skewed.
Skewness can be quantified as a representation of the extent to which a given distribution varies
from a normal distribution.

A positively skewed (or right-skewed) distribution has a tail that extends to the right, in the
direction of positive values.

Generally, for a right-skewed distribution, mode < median < mean.

A negatively skewed (or left-skewed) distribution has a tail that extends to the left, in the
direction of negative values.

Generally, for a left-skewed distribution, mean < median < mode.
Example (1) The following data show the mean and median of the 3-year returns for two types of funds.
Comment on the skewness of the data.
Statistic    % return of Growth fund    % return of Value fund
Mean         22.44                      20.42
Median       22.32                      19.46

Kurtosis: It measures the extent to which values that are very different from the mean affect the shape of
the distribution of the data set, compared with the normal distribution. Kurtosis affects the peakedness of the curve of
the distribution, i.e. how sharply the curve rises approaching the center of the distribution.
In affecting the shape of the central peak, the relative concentration of values near the mean also
affects the ends, or tails, of the distribution of data. Thus, kurtosis describes both the “peakedness” and
the “tailedness” of the distribution of data.

Mesokurtic: A distribution of data that is neither very peaked nor very flat-topped (such as the normal
distribution) is called mesokurtic.
Leptokurtic: Here, the distribution has longer and fatter tails than the normal distribution. Moreover,
the peak is higher and sharper than that of the normal distribution. This means the
distribution produces more extreme outliers than the normal distribution does.

Platykurtic: Here, the distribution of the data has shorter and thinner tails than the normal distribution.
Moreover, the peak is lower and broader than that of the normal distribution. This means the
distribution produces fewer and less extreme outliers than the normal distribution does.

