Business Statistics
Course Code: MKT-206
Prepared by
Ahmed Sabbir
As per the syllabus of the EMBA Program of Patuakhali Science and Technology University
(PSTU), Patuakhali, Bangladesh.
Chapter
Introduction to Business Statistics
Definition of Statistics
The word Statistics refers to a special discipline, or a collection of procedures and principles, useful
as an aid in gathering and analyzing numerical information for the purpose of drawing
conclusions and making decisions.
Statistics is the branch of mathematics that transforms data into information for decision makers.
It is the science of collecting, organizing, presenting, analyzing and interpreting data to assist in
making more effective decisions.
Statistics is concerned with the scientific methods of collecting, organizing, summarizing,
presenting, and analyzing data, as well as drawing conclusions and making reasonable decisions on the
basis of such analysis.
“Statistics is a way to get information from data.”
Statistics is the study of numerical data, facts, figures and measurement. Statistics is used to
convert raw numerical data into useful information for relevant users.
[Figure: statistics converts data into information]
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating an understanding from a set of numbers.
Business Statistics and Its importance

Business Statistics involves application of statistical tools in the area of marketing, production,
finance, research and development, manpower planning etc. to extract relevant information for
the purpose of decision making.
 Deals with uncertainties by forecasting general economic fluctuations.
 Helps in sound decision making by providing accurate estimates of costs, demand, prices,
sales etc.
 Helps in business planning on the basis of sound predictions and assumptions.
 Helps in measuring variations in the performance of products, employees, business units
etc.
 Allows comparison of two or more products, business units, or assumptions.
 Helps in identifying relationships between variables and their effects on each other.
 Helps in validating generalizations and theoretical concepts formulated by managers.
Statistics in business Management
Statistics has many important uses in business. It gives managers more confidence
in dealing with uncertainty and in taking effective decisions. Statistical reports provide a summary
of business activities, which improves the capability of making more effective decisions regarding
future activities.
 To summarize business data
 To draw conclusions from the business data
 To make reliable forecasts about business activities
 To improve business processes.
Discussed below are certain activities of an organization where statistics plays an important role:
Marketing:
Before a product is launched, the market research team of an organization, through a survey, makes
use of various statistical techniques to analyze data on population, purchasing power, consumer
habits, competitors, pricing and a host of other aspects. Such studies reveal the possible market
potential for the product.
Analysis of sales volume in relation to purchasing power and concentration of population is
helpful in framing marketing strategies to improve sales.
Production:
Decisions regarding the quantity of production and the timing of raw material purchases are based
on statistical data. Statistical methods are also used to improve the quality of existing products
and to set standards for new ones.
Finance:
Financial forecasts, break-even analysis and investment decisions under uncertainty involve the
application of relevant statistical methods for analysis.
A statistical study of profits and dividends through correlation analysis helps to predict and
decide probable dividends for future years.

Personnel Management
In the process of manpower planning, the human resources or personnel department makes statistical
studies of wage rates, incentive plans, labor turnover, performance appraisal, employee rating, and
training and development programs.
Limitations of Statistics
 Statistics does not study qualitative phenomena: Since statistics deals with
numerical data, it cannot be applied to problems which cannot be
stated and expressed quantitatively. For example, the statement that the export volume
of Bangladesh has increased considerably during the last few years cannot be analyzed
statistically unless it is expressed in figures.
 Statistics does not study individuals: Statistics cannot consider any single or
isolated figure. Statistical laws are true on average. Statistics are aggregates of facts,
so a single observation is not a statistic; statistics deals with groups and aggregates only.
For example, when the average height of EMBA students is 6 ft, it shows the height
not of an individual but as found by the study of all individuals.
 Statistics can be misused: Statistics deals with figures which are innocent in
themselves but can be easily manipulated or distorted by people for their selfish
motives. Therefore, it is a dangerous tool in the hands of a non-expert.
Types of Statistical Methods

Statistical methods are of 2 types:
(i) Descriptive Statistics
(ii) Inferential Statistics

Descriptive statistics
Descriptive statistics includes statistical methods involving the collection, presentation, and
characterization of a set of data in order to describe the various features of that set of data.
In general, descriptive statistics include graphic methods and numeric measures. Bar charts, line
graphs and pie charts are graphic methods, whereas numeric measures include measures
of frequency, central tendency (mean, median, and mode), dispersion (range, variance,
standard deviation), position (percentile ranks, quartile ranks), skewness, and
kurtosis.
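As a minimal sketch of the numeric measures just listed, the Python snippet below computes measures of central tendency and dispersion with the standard `statistics` module. The `sales` figures are hypothetical values invented for illustration; they do not come from these notes.

```python
import statistics

# Hypothetical daily sales figures (units), invented for illustration only.
sales = [12, 15, 11, 15, 18, 20, 15, 13]

# Measures of central tendency
center = {
    "mean": statistics.mean(sales),
    "median": statistics.median(sales),
    "mode": statistics.mode(sales),
}

# Measures of dispersion (sample variance and sample standard deviation)
spread = {
    "range": max(sales) - min(sales),
    "variance": statistics.variance(sales),
    "std_dev": statistics.stdev(sales),
}

for name, value in {**center, **spread}.items():
    print(f"{name}: {value:.3f}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample (n - 1) formulas; `pvariance` and `pstdev` would be the population versions.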
Descriptive statistics involves-

 Collect Data-e.g., Survey
 Present data-e.g., Tables and graphs
 Characterize Data—e.g., the sample mean.
Features:
1. Statistics describes a numerical set of data by its-
 Center
 Variability
 Shape
2. Statistics describes a categorical set of data by
 Frequency, percentage or proportion of each category.
Inferential Statistics
Inferential statistics includes statistical methods which facilitate estimating the characteristics of
a population, or making decisions concerning a population, on the basis of sample results. That is,
statistical inference is the process of making an estimate, prediction, or decision about a population based
on a sample.
Inferential statistics starts with a sample and then generalizes to a population. The larger group of
units about which inferences are to be made is called the population, and a sample is a subset or portion
of that population.
Inferential statistics is used to examine the relationships between variables within a sample and
then make generalizations or predictions about how those variables will relate to a larger
population.
 Estimation: e.g., estimate the population mean weight using the sample mean weight.
 Hypothesis Testing: e.g., Test the claim that the population mean weight is 120 pounds.
Basic Terms of Statistics

 Variable- A characteristic of an item or an individual that will be analyzed using statistics.
E.g., Gender, the household income of the citizens who voted in the last election, the
number of varieties of a brand of cereal.

 Population: A population consists of all items and individuals about which we want to
reach conclusion. E.g., 5 lac voters in DUCSU election.
 Sample: A sample is the portion of population selected for analysis. E.g. a sample of 765
voters exit polled on Election Day, 100 boxes of cereal selected from a factory’s production
line. A sample is a subset of the population.
 Parameter: A numerical measure that describes a characteristic of a population. E.g., the
average weight of all the cereal boxes produced on a factory’s production line on a
particular day.
 Statistic: A numerical measure that describes a characteristic of a sample. e.g., the average
weight of a sample of cereal boxes produced on a factory’s production line on a particular
day.
 Statistical data set: A collection of data maintained in an organized form.
Problem:
The Rathburn Manufacturing Company makes electric wiring, which it sells to contractors in the
construction industry. Approximately 900 electric contractors purchase wire from Rathburn
annually. Rathburn’s director of marketing wants to determine electric contractors’ satisfaction
with Rathburn’s wire. He developed a questionnaire that yields a satisfaction score between 10
and 50 for participant responses. A random sample of 35 of the 900 contractors is asked to
complete a satisfaction survey. The satisfaction scores for the 35 participants are averaged to
produce a mean satisfaction score.
a. What is the population for this study? The 900 electric contractors who purchase wire from Rathburn.
b. What is the sample for this study? The 35 contractors randomly selected to complete the survey.
c. What is the statistic for this study? The sample mean satisfaction score (x̄) of the 35 participants.
d. What would be a parameter for this study? The population mean satisfaction score (μ) of all 900 contractors.
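The distinction between a parameter and a statistic in this problem can be sketched in Python. The population scores below are hypothetical: in the real study the 900 scores are unknown, which is exactly why a sample is taken. The names `population`, `sample`, `parameter_mu` and `statistic_xbar` are illustrative only.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical satisfaction scores (10 to 50) for all 900 contractors.
# These values are invented; the real population values are not known.
population = [random.randint(10, 50) for _ in range(900)]

# Simple random sample of 35 contractors, drawn without replacement.
sample = random.sample(population, 35)

parameter_mu = statistics.mean(population)  # population mean (parameter)
statistic_xbar = statistics.mean(sample)    # sample mean (statistic)

print(f"Population size: {len(population)}, sample size: {len(sample)}")
print(f"Parameter (population mean): {parameter_mu:.2f}")
print(f"Statistic (sample mean):     {statistic_xbar:.2f}")
```

The sample mean will usually differ somewhat from the population mean; inferential statistics is concerned with how far apart the two are likely to be.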

Types of Variables
In statistics, data are classified into two broad categories:
 Quantitative data and
 Qualitative data.
Qualitative or Attribute Variable/Categorical: The characteristic being studied is non-numeric.
Such variables have values that can only be placed into categories, such as “Yes” and “No”.
Example: Gender, religious affiliation, type of automobile owned.
Quantitative Variable/Numerical: Quantitative variables are those which are expressed in
numerical terms. Example: Number of children, weight of the students of the EMBA class.
Quantitative variables can be classified as-

i) Discrete Variables: can only assume certain values, and there are usually “gaps”
between values. For example, the number of bedrooms in a house, the number of students in
a class etc. Discrete variables arise from a counting process.
ii) Continuous Variables: can assume any numerical value within a specified range.
Continuous variables arise from a measuring process. The values can be
very close to each other, yet distinguishably different. For example, the
pressure in a tire, the weight or height of MBA students etc.
Variables
   Categorical. Examples: marital status, political party, eye color.
   Numerical
      Discrete. Examples: number of children, defects per hour.
      Continuous. Examples: weight, voltage.
Level of measurement
Level of measurement or scale of measure is a classification that describes the nature of
information within the values assigned to variables.
A variable has one of four different levels of measurement: Nominal, Ordinal, Interval, or
Ratio. (Interval and Ratio levels of measurement are sometimes called Continuous or Scale).

Nominal Scale:
At the nominal level, data are classified into categories that cannot be arranged in any particular
order.
For example, gender, religious affiliation, jersey number of a football player, favorite class subject,
makes of cars etc.
Properties:
 Observations of a qualitative variable can only be classified and counted.
 There is no particular order to the categories.
The nominal scale is said to be the least powerful of the four levels because it suggests no order
or distance relationship and has no arithmetic origin.
Ordinal Scale:
At the ordinal level, values are categorized to denote qualitative differences among
various categories and are also rank ordered in some meaningful way according to some preference.
The preferences would be ranked from best to worst, numbered 1, 2, and so on. So, at the ordinal level,
data can be arranged in some order, but differences between data values cannot be determined or are
meaningless.
Properties:
 Data classifications are represented by sets of labels or names (high, medium, low) that have
relative values.
 Because of the relative values, the classified data can be ranked or ordered.
Example of Ordinal scale
 type of residence (single house, village, town, city)
 Category of vehicle (compact car, medium-sized vehicle, luxury car, etc.)
 Student class designation (Freshman, Sophomore, Junior, Senior)
 Product Satisfaction (Satisfied, Neutral, Unsatisfied)
 Student Grade A, B, C, D, F
 Movie ratings
 Taste test of 4 soft drinks: Coca-Cola ranked number 1, Sprite 2, RC Cola 3, Lemu 4.
Interval Level:
A scale of measurement for a variable in which the interval between observations is expressed in
terms of a fixed standard unit of measurement.
The interval scale not only classifies individuals according to certain categories and determines
the order of these categories, it also measures the magnitude of the differences in preferences among
the individuals. In interval measurement the distance between attributes does have meaning.
For example, on an interval-level measure of student anxiety, the interval between scores of
10 and 11 is the same as the interval between scores of
40 and 41. Similarly, the difference between a temperature of 100 degrees and 90 degrees
is the same as the difference between 90 degrees and 80 degrees.
Properties:
 Data classifications are ordered according to the amount of the characteristic they possess.
 Equal differences in the characteristic are represented by equal differences in the
measurement.
Ratio Level
A ratio scale is a scale of measurement for a variable whose intervals are measurable in a
standard unit of measurement and which has a meaningful zero, i.e., the ratio of two values is meaningful.
It is the most powerful of the four scales because it has a unique zero origin. For example, a person
weighing 90 kg is twice as heavy as one who weighs 45 kg, a ratio of 2:1.
Examples of ratio variables are the following:
 weight in kilograms or pounds
 height in meters or feet
 distance of school from home
 amount of money spent during vacation
Properties:
 Data classifications are ordered according to the amount of the characteristic they possess.

 Equal differences in the characteristic are represented by equal differences in the numbers
assigned to the classification
 The zero point is the absence of the characteristic and the ratio between two numbers is
meaningful.
Exercise
State whether each of the following variables is qualitative or quantitative and indicate the measurement
scale that is appropriate for each:
i) Age
ii) Gender
iii) Class Rank
iv) Make of automobile
v) Annual sales
vi) Soft drink size (small, medium, large)
vii) Earnings per share
viii) Method of payment (cash, check, credit card)
Solution:
Variable                                             Type           Measurement Scale
i) Age                                               Quantitative   Ratio
ii) Gender                                           Qualitative    Nominal
iii) Class Rank                                      Qualitative    Ordinal
iv) Make of automobile                               Qualitative    Nominal
v) Annual sales                                      Quantitative   Ratio
vi) Soft drink size (small, medium, large)           Qualitative    Ordinal
vii) Earnings per share                              Quantitative   Ratio
viii) Method of payment (cash, check, credit card)   Qualitative    Nominal

Exercise:
Categorize each of the following by level of data, from lowest to highest: nominal (groups),
ordinal (ranks), interval (no true zero) and ratio (absolute zero).
a. Number of pizzas consumed per week per household: ratio
b. Age of pizza purchaser: ratio
c. Zip code of the survey respondent: nominal
d. Dollars spent per month on pizza per person: ratio
e. Time between pizza purchases: ratio
f. Rating of the taste of a given pizza brand on a scale from 1 to 10, where 1 is very poor
tasting and 10 is excellent tasting: ordinal
g. Ranking of the taste of four pizza brands on a taste test: ordinal
h. Number representing the geographical location of the survey respondent: nominal
i. Quality rating of a pizza brand as excellent, good, average, below average or
poor: ordinal
j. Number representing the pizza brand being evaluated: nominal
k. Sex of survey respondent: nominal

Chapter: 2
Data Organization
2.1 Data Collection:

Statistical data are the basic material needed to make an effective decision in a particular situation.
The initial function of statistics is the collection of data on the subject of interest. The main reasons for
collecting data are:
 To provide necessary inputs to a given phenomenon or situation under study.
 To measure performance in ongoing process such as production, service and so on.
 To enhance the quality of decision making by enumerating alternative courses of action in
a decision making process and selecting an appropriate one.
 To satisfy the desire to understand an unknown phenomenon.
 To assist in guessing the causes and probable effects on certain characteristics in given
situation.
2.2 Reliability of Data
Reliability is the degree of consistency of a measure. A test will be reliable when it gives the same
repeated result under the same conditions.
Before relying on any interpreted data, the following questions must be answered:
 Have the data come from an unbiased source?
 Do the data represent the entire population under study, i.e., how many observations should
represent the population?
 Do the data support other evidence already available? Is any evidence missing that
may lead to a different conclusion?
 Do the data support the logical conclusions drawn? Have any conclusions been made which are
not supported by the data?
2.3 Classification of Data
Classification of Data is the process of arranging data in groups/classes on the basis of certain
properties.
According to Secrist, “Classification is the process of arranging data into sequences and groups
according to their common characteristics”.
Classification means arranging the mass of data into different classes or groups on the basis of
their similarities and resemblances. All similar items of data are put in one class and all dissimilar
items of data are put in different classes. Statistical data is classified according to its characteristics.
For example, if we have collected data regarding the number of students admitted to a university
in a year, the students can be classified on the basis of sex. In this case, all male students will be
put in one class and all female students will be put in another class. The students can also be
classified on the basis of age, marks, marital status, height, etc.
The classification of data serves the following purposes:

 It condenses the raw data into a form suitable for statistical analysis.
 It removes complexities and highlights the features of the data
 It facilitates comparisons and drawing inferences from the data.
 It provides information about the mutual relationships among elements of data sets. For
example, based on literacy and criminal tendency of a group of people, it can be established
whether literacy has any impact or not on criminal tendency.
 It helps statistical analysis by separating elements of the data set into homogenous groups
and hence brings out the points of similarity and dissimilarity.
2.4 Bases for Classification

Statistical data are classified after taking into account the nature, scope and purpose of an
investigation. Generally, data are classified on the following four bases:
Geographical Classification:
In geographical classification, data are classified on the basis of geographical location, such as cities,
districts or villages. This type of classification is also known as areal or spatial classification.
Chronological Classification:
In chronological classification, data are classified or arranged by their time of occurrence, such as years,
months, weeks, days, etc. Such classifications are also called time series.
For example, sales figures of a company in different years, or the population of Bangladesh in different
years.
Year 1990 2000 2010
Population 11.1 12.0 14
Qualitative Classification:
In qualitative classification, data are classified on the basis of descriptive characteristics or
attributes, like sex, literacy, religion, caste, or education, which cannot be quantified.
This can be done in two ways:
a) Simple Classification: Only one attribute is studied and each class is subdivided into two
sub-classes, such as male and female; blind and not blind etc.
b) Manifold Classification: Each class is subdivided into more than two sub-classes, and more
than one attribute is studied.
Quantitative Classification
In quantitative classification, data are classified on the basis of some characteristic which can be
measured, such as height, weight, income, expenditure or sales etc.
Quantitative variables can be divided into two types:

a) Discrete variable: one whose values change by steps and which cannot assume fractional
values. E.g., the number of children in a family.
b) Continuous variable: one that can take any value within a range of numbers. Such data are obtained by
measurement.
Sources of Data

The choice of data collection method from a particular source depends on the facilities available,
the extent of accuracy required in the analysis, the expertise of the investigator, the time span of the
study, and the amount of money and other resources required for data collection.
Data sources are classified as:
1. Primary data: Data which do not already exist in any form, and thus have to be
collected for the first time from the primary source(s). By their very nature, these data
require fresh and first-time collection covering the whole population or a sample drawn
from it. Individuals, focus groups, and/or panels of respondents specially decided upon and
set up by the investigator for data collection are examples of primary data sources. The methods
of collecting primary data are:
(i) Direct personal observation
(ii) Direct or indirect oral interview
(iii) Administering a questionnaire.
2. Secondary Data: Secondary data are data which have been collected
earlier for some purpose other than the analysis currently being undertaken. Secondary
data sources can be external or internal.
The secondary sources of data:
 Newspaper
 Periodicals
 Journals
 Statement of the profit and loss
 Balance sheets
 Sales figures
 Inventory records
 Previous marketing studies
Organization of Data
The best way to examine a large set of numerical data is first to organize it in an appropriate format.
The data can be organized by using-
1. Data Array
2. Tabulation.

When a raw data set is arranged in rank order, from the smallest to the largest observation or vice versa,
the ordered sequence obtained is called a Data Array.
Tabulation is a method of summarizing data and presenting it in a meaningful fashion.
Tabulating the values of a variable and their respective frequencies side by side gives
a ‘frequency distribution’ of those data.
Frequency distribution:
A frequency distribution is a tabular summary of data showing the number of observations
(frequency) in each of several non-overlapping class intervals.
A frequency distribution is a listing of classes and their frequencies. It divides the
observations in the data set into conveniently established, numerically ordered classes (groups or
categories). The number of observations in each class is referred to as its frequency, denoted by f.
Table-1 presents the total number of overtime hours worked for 30 consecutive weeks by
machinists in a machine shop.
94 89 88 89 90 94 92 88 87 85
88 93 94 93 94 93 92 88 94 90
93 84 93 84 91 93 85 91 89 95
The data displayed here are in raw form, that is, the numerical observations are not arranged in any
particular order or sequence.
These raw data do not highlight any characteristic and do not easily reveal any significant
trend regarding the nature and pattern of variation therein. Moreover, as the number of observations
gets large, it becomes more difficult to focus on specific features in a set of data. Thus we need to
organize the observations so that we can better understand the information that the data reveal.
Ordered Array:
84 84 85 85 87 88 88 88 88 89
89 89 90 90 91 91 92 92 93 93
93 93 93 93 94 94 94 94 94 95
Constructing a frequency distribution

i) Select an appropriate number of non-overlapping class intervals.
If k represents the number of classes and N the total number of observations, then k
is taken as the smallest exponent of 2 such that $2^k \ge N$.
In our problem we have N = 30 observations. So

$2^4 = 16 < 30$;
$2^5 = 32 > 30$.
We may choose k = 5 as the number of classes.
According to Sturges' rule, $k = 1 + 3.322 \log_{10} N$.
ii) Determine the width of the class interval

It is desirable that the width of each class interval be equal in size. The width of the class interval is
$\text{Width} = \dfrac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes desired}}$
In our problem:
$\text{Width of the class interval} = \dfrac{95 - 84}{5} = \dfrac{11}{5} = 2.2$
For convenience, the selected width (or interval) of each class is rounded up to 3.
iii) Determine Class limits
The limits of each class interval should be clearly defined so that each observation of data set
belongs to one and only one class. Each class has two limits- a lower limit and upper limit.
The smallest and largest possible values in each class of a frequency distribution table are known
as class limits.
In our problem, we take 82 as the lower limit and 85 as the upper limit of class 1.
Midpoint of Class interval:
The class mid-point is the point halfway between the lower and upper class
limits of each class and is representative of all observations in that class.
$\text{Mid-point} = \dfrac{\text{Upper limit} + \text{Lower limit}}{2}$
There are two methods of defining class limits:
 Exclusive Method: The upper limit of one class is the lower limit of the next succeeding class
interval, so that no observation falls into more than one class interval.
For example, class intervals 0-10 (0 but less than 10), 10-20, 20-30, 30-40. An
observation equal to the upper limit of a class is included in the succeeding class.
We use the exclusive method when continuity of data is required, e.g., dividend declared
by a company, prices of a commodity, salary, etc.
 Inclusive Method: Here both the upper and lower limits of a class interval are included in the
interval itself. For example, 0-4, 5-9, 10-14 etc.
We use the inclusive method for data such as the number of units produced or daily shipments.
iv) Tally the observation into classes.

v) Count the number of items in each class.
Now, the problem is presented as:
Class Interval    Tally          Frequency
82-85             ||             2
85-88             |||            3
88-91             ||||| ||||     9
91-94             ||||| |||||    10
94-97             ||||| |        6
Total frequency                  30
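The construction steps above can be sketched in Python. This is only an illustration under stated assumptions: it uses the 30 values as they appear in the ordered array, the exclusive method, a starting lower limit of 82 and a class width of 3, as chosen in the text. The helper names `number_of_classes` and `frequency_table` are hypothetical.

```python
# The 30 overtime observations, as listed in the ordered array.
data = [84, 84, 85, 85, 87, 88, 88, 88, 88, 89,
        89, 89, 90, 90, 91, 91, 92, 92, 93, 93,
        93, 93, 93, 93, 94, 94, 94, 94, 94, 95]


def number_of_classes(n):
    """Return the smallest k such that 2**k >= n."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def frequency_table(values, lower, width, k):
    """Count values in k exclusive classes: lower <= value < lower + width, and so on."""
    counts = [0] * k
    for v in values:
        counts[(v - lower) // width] += 1
    return [(lower + i * width, lower + (i + 1) * width, counts[i])
            for i in range(k)]


k = number_of_classes(len(data))          # 5 classes for N = 30
raw_width = (max(data) - min(data)) / k   # (95 - 84) / 5 = 2.2
width = 3                                 # rounded up for convenience, as in the text

table = frequency_table(data, lower=82, width=width, k=k)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")
print("Total frequency:", sum(f for _, _, f in table))
```

Integer division by the class width assigns each value to exactly one class, which is what the exclusive method requires.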
Advantage of frequency distribution:
 The data are expressed in a more compact form. One can get a deeper insight into the
salient characteristics of the data at the very first glance.
 One can quickly note the pattern of distribution of observations falling in various classes.
 It permits the use of more complex statistical techniques which help reveal certain other
obscure and hidden characteristics of the data.
Disadvantage
 In the process of grouping there will be too much clustering of observations in various
classes, especially where number of class interval is too small.
 Another disadvantage is that in grouping process individual observation lose their
identity. It becomes difficult to notice how observations contained in each class are
distributed.
Problem-1
The following set of number represents mutual fund prices reported at the end of a week
for selected 40 nationally sold funds.
10 17 15 22 11 16 19 24 29 18
25 26 32 14 17 20 23 27 30 12
15 18 24 36 18 15 21 28 33 38
34 13 10 16 20 22 29 29 23 31
Arrange these prices into frequency distribution having a suitable number of classes.
Solution
Since the number of observations is 40, it seems reasonable to choose 6 classes ($2^6 = 64 > 40$).
$\text{Class interval} = \dfrac{38 - 10}{6} = 4.67$, or 5 after rounding up.
Frequency Distribution
Class Interval    Tally               Frequency
10-15             ||||| |             6
15-20             ||||| ||||| |       11
20-25             ||||| ||||          9
25-30             ||||| ||            7
30-35             |||||               5
35-40             ||                  2
Total                                 40
Problem-2
A computer company received a rush order for as many home computers as could be shipped
during a six week period. Company provide the following daily shipments:
22 65 65 67 55 50 65
77 75 30 62 54 48 65
79 60 63 45 51 68 79
83 33 41 49 28 55 61
65 75 55 75 39 87 45
50 66 65 59 25 35 53
Group these daily shipments figure into frequency distribution having suitable number of classes.
Solution:
Since the number of observations is 42, it seems reasonable to choose 6 classes ($2^6 = 64 > 42$).
$\text{Class interval} = \dfrac{87 - 22}{6} = 10.833$, or 11 after rounding up.
Frequency Distribution
Class Interval    Tally                  Frequency
22-32             ||||                   4
33-43             ||||                   4
44-54             ||||| ||||             9
55-65             ||||| ||||| ||||       14
66-76             ||||| |                6
77-87             |||||                  5
Total                                    42
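As a cross-check on Problem-2, the sketch below tallies each inclusive class directly from the listed shipments. It assumes six classes of width 11 starting at the smallest observation, as in the solution; the variable names are illustrative.

```python
# Daily shipments from Problem-2 (42 observations).
shipments = [22, 65, 65, 67, 55, 50, 65,
             77, 75, 30, 62, 54, 48, 65,
             79, 60, 63, 45, 51, 68, 79,
             83, 33, 41, 49, 28, 55, 61,
             65, 75, 55, 75, 39, 87, 45,
             50, 66, 65, 59, 25, 35, 53]

width = 11               # class width used in the solution
lower = min(shipments)   # first class starts at the smallest value

# Inclusive method: both limits belong to the class (22-32, 33-43, ...).
classes = [(lower + i * width, lower + (i + 1) * width - 1) for i in range(6)]
frequencies = [sum(1 for s in shipments if lo <= s <= hi) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
print("Total:", sum(frequencies))
```

Checking that the frequencies sum to the number of observations is a quick way to catch a tallying mistake.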
Note:
If a continuous variable is classified according to the inclusive method, then a certain adjustment is
needed to obtain continuity.
To ensure continuity, first calculate the correction factor:
$\text{Correction factor} = \dfrac{\text{Lower limit of next higher class} - \text{Upper limit of the class}}{2}$
Then subtract it from the lower limits of all classes and add it to the upper limits of all classes.
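A minimal sketch of this adjustment, taking the correction factor as half the gap between the upper limit of one class and the lower limit of the next. The function name `class_boundaries` is hypothetical, and the class limits are the inclusive classes used in Problem-2.

```python
def class_boundaries(inclusive_limits):
    """Convert inclusive class limits into continuous class boundaries."""
    # Gap between the first class's upper limit and the second class's lower limit.
    gap = inclusive_limits[1][0] - inclusive_limits[0][1]
    correction = gap / 2
    # Subtract the correction from every lower limit, add it to every upper limit.
    return [(lo - correction, hi + correction) for lo, hi in inclusive_limits]


limits = [(22, 32), (33, 43), (44, 54), (55, 65), (66, 76), (77, 87)]
boundaries = class_boundaries(limits)

for (lo, hi), (b_lo, b_hi) in zip(limits, boundaries):
    print(f"{lo}-{hi}  ->  {b_lo}-{b_hi}")
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so the classes are continuous.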
Difference between Uni-variate Data and Bivariate Data

No.  Univariate data compared with Bivariate data
01   Univariate: Involves a single variable.
     Bivariate: Involves two variables.
02   Univariate: Does not deal with causes or relationships.
     Bivariate: Deals with causes or relationships.
03   Univariate: The major purpose of univariate analysis is to describe.
     Bivariate: The major purpose of bivariate analysis is to explain.
04   Univariate: Central tendency (mean, mode, median); dispersion (range, variance, max, min,
     quartiles, standard deviation); frequency distributions; bar graph, histogram, pie chart,
     line graph, box-and-whisker plot.
     Bivariate: Analysis of two variables simultaneously; correlations; comparisons, relationships,
     causes, explanations; tables where one variable is contingent on the values of the other
     variable; independent and dependent variables.
05   Univariate sample question: How many of the students in the freshman class are female?
     Bivariate sample question: Is there a relationship between the number of females in Computer
     Programming and their scores in Mathematics?
Bivariate frequency Distribution

Example: The following figure indicate income (x) and percentage expenditure on food (y) of 25
families. Construct a bivariate frequency table, taking x into intervals 200-300, 300-400, ….and y
into 10-15, 15-20 etc.
x y x y x y x y x y
550 12 225 25 680 13 202 29 689 11
623 14 310 26 300 25 255 27 523 12
310 18 640 20 425 16 492 18 317 18
420 16 512 18 555 15 587 21 384 17
600 15 600 12 325 23 643 19 400 19
Write the marginal distribution of x and y and the conditional distribution of x when y lies between
15 and 20.

Solution: The two-way frequency table showing income (in Tk) and percentage
expenditure on food is shown below.
Expenditure on food (y),    Income (x)                                      Marginal
percentage                  200-300  300-400  400-500  500-600  600-700    frequencies fy
10-15                       0        0        0        2        4          6
15-20                       0        3        4        2        2          11
20-25                       0        1        0        1        1          3
25-30                       3        2        0        0        0          5
Marginal frequencies fx     3        6        4        5        7          25
The conditional distribution of x when y lies between 15 and 20 percent is as follows:
Income (x)            200-300  300-400  400-500  500-600  600-700  Total
Frequency (15-20%)    0        3        4        2        2        11
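The two-way table for this example can also be built programmatically. The sketch below uses `collections.Counter` and assumes exclusive classes of width 100 for income and width 5 for percentage expenditure, as specified in the example; the helper names `x_class` and `y_class` are hypothetical.

```python
from collections import Counter

# (income x, percentage expenditure on food y) for the 25 families.
pairs = [(550, 12), (225, 25), (680, 13), (202, 29), (689, 11),
         (623, 14), (310, 26), (300, 25), (255, 27), (523, 12),
         (310, 18), (640, 20), (425, 16), (492, 18), (317, 18),
         (420, 16), (512, 18), (555, 15), (587, 21), (384, 17),
         (600, 15), (600, 12), (325, 23), (643, 19), (400, 19)]


def x_class(x):
    """Exclusive income class of width 100, e.g. 300 -> '300-400'."""
    lower = (x // 100) * 100
    return f"{lower}-{lower + 100}"


def y_class(y):
    """Exclusive expenditure class of width 5, e.g. 15 -> '15-20'."""
    lower = (y // 5) * 5
    return f"{lower}-{lower + 5}"


joint = Counter((y_class(y), x_class(x)) for x, y in pairs)
marginal_x = Counter(x_class(x) for x, _ in pairs)
marginal_y = Counter(y_class(y) for _, y in pairs)

x_labels = [f"{lo}-{lo + 100}" for lo in range(200, 700, 100)]
y_labels = [f"{lo}-{lo + 5}" for lo in range(10, 30, 5)]

for y_label in y_labels:
    row = [joint[(y_label, x_label)] for x_label in x_labels]
    print(y_label, row, "fy =", marginal_y[y_label])
print("fx =", [marginal_x[x_label] for x_label in x_labels])

# Conditional distribution of x given that y lies in the 15-20 class.
conditional_x = [joint[("15-20", x_label)] for x_label in x_labels]
print("x | y in 15-20:", conditional_x)
```

A `Counter` returns 0 for cells with no observations, so empty cells of the table need no special handling.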
Problem-3
The following data give the points scored in a tennis match by two players X and Y at the end
of twenty games:
(10,12), (7,11), (7,9), (15,10), (17,21), (12,8), (16,10), (14,14), (22,18), (16,7), (15,16), (22,20),
(19,15), (7,18), (11,11), (12,18), (10,10),(5,13), (11,7), (10,10)
Taking class interval as 5-9, 10-14, 15-19 for X and Y.
i) Construct bivariate frequency table
ii) Conditional frequency distribution for Y given X>15
Solution
(i) Bivariate frequency Table
Y                          X                                Marginal
                           5-9    10-14   15-19   20-24    frequencies fy
5-9                        1      2       1       0        4
10-14                      2      5       2       0        9
15-19                      1      1       2       1        5
20-24                      0      0       1       1        2
Marginal frequencies fx    4      8       6       2        20
(ii) Conditional frequency distribution for Y given X>15 (X classes 15-19 and 20-24)
Y        15-19   20-24
5-9      1       0
10-14    2       0
15-19    2       1
20-24    1       1
Total    6       2
Problem
The data below shows the mass of 40 students in a class. The measurement is to the nearest kg.
55 70 57 73 55 59 64 72
60 48 58 54 69 51 63 78
75 64 65 57 71 78 76 62
49 66 62 76 61 63 63 76
52 76 71 61 53 56 67 71
Construct a frequency table for the data using an appropriate scale.

Types of frequency Distribution
1. Cumulative frequency distribution

2. Relative frequency distribution
3. Percentage frequency distribution
Cumulative frequency distribution is of two types
 Less than cumulative frequency distribution: It is obtained by successively adding
the frequencies of all the previous classes, including the class against
which it is written. Cumulation runs from the lowest class to the highest.
 More than cumulative frequency distribution: It is obtained by cumulating
the frequencies from the highest class down to the lowest.
$$\text{Relative frequency} = \frac{\text{frequency}}{\text{total frequency}}$$
$$\text{Percentage frequency} = \text{relative frequency} \times 100$$
Number of    Frequency   Less than    More than    Relative          Percentage
accidents    f           cumulative   cumulative   frequency         frequency
                         frequency    frequency
0-4            5            5           50         5/50 = 0.10       5/50*100 = 10
5-9           22           27           45         22/50 = 0.44      44
10-14         13           40           23         13/50 = 0.26      26
15-19          8           48           10         8/50 = 0.16       16
20-24          2           50            2         2/50 = 0.04        4
Total         50                                   1.00              100
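The four derived columns of the accident table can be reproduced with a few lines of Python. This is an illustrative sketch rather than part of the original notes; the variable names are assumptions.

```python
from itertools import accumulate

classes = ["0-4", "5-9", "10-14", "15-19", "20-24"]
freq = [5, 22, 13, 8, 2]          # number of weeks in each class
total = sum(freq)

# "Less than" cumulates from the lowest class upward.
less_than = list(accumulate(freq))

# "More than" cumulates from the highest class downward.
more_than = list(accumulate(reversed(freq)))[::-1]

relative = [f / total for f in freq]
percentage = [100 * r for r in relative]

for row in zip(classes, freq, less_than, more_than, relative, percentage):
    print(*row)
```

The relative frequencies always sum to 1 and the percentage frequencies to 100, which gives a quick check on the arithmetic.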
Exercise
The distribution of ages of 500 readers of a nationally distributed magazine is given
below:
Age (in years)    Number of readers
Below 14            20
15-19              125
20-24               25
25-29               35
30-34               80
35-39              140
40-44               30
45 and above        45
Find the relative and cumulative frequency distributions for this distribution.

Solution:
Age (in years)    Number of readers    Cumulative        Relative
                                       frequency (cf)    frequency
0-14                20                   20              20/500 = 0.04
15-19              125                  145              0.25
20-24               25                  170              0.05
25-29               35                  205              0.07
30-34               80                  285              0.16
35-39              140                  425              0.28
40-44               30                  455              0.06
45-49               45                  500              0.09
Total              500                                   1.00

Chapter-3
Presentation of Data
This refers to the organization of data into tables, graphs or charts, so that logical and statistical
conclusions can be derived from the collected measurements.
Data may be presented by three methods:

- Textual
- Tabular or
- Graphical.
Tabulation of Data
Tabulation of data is another way of summarizing and presenting the given data in a
systematic form in rows and columns. Such presentation facilitates comparison by
bringing related information close together and helps in further statistical analysis
and interpretation.
Tabulation is defined as the process of classifying data in a systematic form that
facilitates comparative study of data sets.
Objectives of tabulation:
 To simplify the complex data

 To economize space
 To depict trend
 To facilitate comparison
 To help as reference.
Essentials of a good table:
In general, a statistical table consists of the following eight parts. They are as follows:
(i) Table Number: Each table must be given a number. Table number helps in distinguishing one
table from other tables.
(ii) Title of the Table: Every table should have a suitable title. It should be short and clear, and
it should indicate the nature of the data contained in the table as well as where and
when the data were collected. It is placed either just below the table number or to its right.
(iii) Caption: Caption refers to the headings of the columns. It consists of one or more column
heads. A caption should be brief, concise and self-explanatory. A column heading is written in the
middle of the column in small letters.
(iv) Stub: Stub refers to the headings of rows.
(v) Body: This is the most important part of a table. It contains a number of cells. Cells are formed
due to the intersection of rows and column. Data are entered in these cells.
(vi) Head Note: The head-note (or prefatory note) contains the unit of measurement of data. It is
usually placed just below the title or at the right hand top corner of the table.
(vii) Foot Note: A foot note is given at the bottom of a table. It helps in clarifying the point which
is not clear in the table. A foot note may be keyed to the title or to any column or to any row
heading. It is identified by symbols such as *,+,@,£ etc.
(viii) Source Note: The source note shows the source of the data presented in the table. Reliability
and accuracy of data can be tested to some extent from the source note.
Exercise:
A survey of 370 students from the Commerce Faculty and 130 students from the Science Faculty
revealed that 180 students were studying for only C.A. examinations, 140 for only Costing
examinations and 80 for both C.A. and Costing examinations. The rest had opted for part-time
Management Courses. Of those studying Costing only, 13 were girls and 90 boys belonged to the
Commerce Faculty. Out of 80 students studying for both C.A. and Costing, 72 were from the
Commerce Faculty amongst which 70 were boys. Amongst those who opted for part-time
Management Courses, 50 boys were from the Science Faculty and 30 boys and 10 girls from the
Commerce Faculty. Of those studying C.A. only, 158 belonged to the Commerce Faculty, of whom 150
were boys, and 6 girls belonged to the Science Faculty. In all there were 110 boys in the Science Faculty.
(i) Present the above information in a tabular form.
(ii) Find the number of students from the Science Faculty studying for part-time Management
Courses.
Solution:
(i) Table-1: Distribution of students according to faculty and course
                        Commerce              Science               Total
Course                  Boys  Girls  Total    Boys  Girls  Total    Boys  Girls  Total
Part-time Management      30     10     40      50     10     60      80     20    100
C.A. only                150      8    158      16      6     22     166     14    180
Costing only              90     10    100      37      3     40     127     13    140
C.A. and Costing          70      2     72       7      1      8      77      3     80
Total                    340     30    370     110     20    130     450     50    500
(ii) From the table, 60 students from the Science Faculty (50 boys and 10 girls) were studying for
part-time Management Courses.

Problem-1:
In a sample study about coffee habit in two towns, the following information was received:
Town A: Females were 40%; Total coffee drinkers were 45% and male non-coffee drinkers were
20%.
Town B: Males were 55%. Male non-coffee drinkers were 30% and Female coffee drinkers were
15%.
Represent the above data in a tabular form.
Solution:
Table Showing the Coffee Drinking Habit of Towns A and B (in percent)
                        Town A                    Town B
Attribute               Males  Females  Total     Males  Females  Total     Total
Coffee drinkers            40        5     45        25       15     40        85
Non-coffee drinkers        20       35     55        30       30     60       115
Total                      60       40    100        55       45    100       200
Problem-2
Present the following information in suitable form:
In 2003, out of total 1950 workers of a factory, 1400 were members of a trade union. The
number of women employed was 400 of which 275 did not belong to a trade union. In
2008, the number of union workers increased to 1780 of which 1490 were men. On the
other hand, the number of non-union workers fell to 408 of which 280 were men.
In the year 2013, there were 2000 employees who belonged to a trade union and 250 who did
not. Of all employees in 2013, 500 were women, of whom only 208
did not belong to a trade union.
Solution:
Table: Year-wise trade union membership
             2003                        2008                        2013
Category     Member  Non-     Total      Member  Non-     Total      Member  Non-     Total
                     member                      member                      member
Men           1275    275     1550        1490    280     1770        1708     42     1750
Women          125    275      400         290    128      418         292    208      500
Total         1400    550     1950        1780    408     2188        2000    250     2250

Problem-3
A supermarket divided into 5 main sections – Grocery, Vegetables, Medicines, Textiles and
Novelties - recorded the following sales in 2005, 2006 and 2007:
In 2005, sales in Grocery, Vegetables, Medicines and Novelties were 6,25,000, 2,20,000,
1,88,000 and 94,000, respectively. Textiles accounted for 30% of the total sales during the
year.
In 2006, total sales showed a 40% increase over the previous year. Grocery and
Vegetables showed 80% and 10% increases over their corresponding figures in 2005, while Medicines
dropped by 13,000. Textiles stood at 5,36,000.
In 2007, though total sales remained the same as in 2006, Grocery fell by 22,000,
Vegetables by 32,000, Medicines by 10,000 and Novelties by 12,000.
a. Tabulate the information given above for the supermarket.
b. In what other ways could the above information be presented?
Problem-4
A survey of 1500 workers in a factory gave the following results. Tabulate the information.
One third of the workers were females; 80% of the female workers were below 40, while the
percentage of male workers below 40 was 50. 80% of male workers below 40 were skilled and
the remaining unskilled. 40% of the male workers above 40 were skilled and the remaining
unskilled. There was no skilled female worker
above 40, while 50 percent of the female workers below 40 were skilled.
SOLUTION:
DETAILS OF WORKERS IN A FACTORY
            Male                         Female                       Total
Age         Skilled  Unskilled  Total    Skilled  Unskilled  Total    Skilled  Unskilled  Total
Below 40      400       100      500       200       200      400       600       300      900
Above 40      200       300      500         0       100      100       200       400      600
Total         600       400     1000       200       300      500       800       700     1500

Contingency Table
A contingency table is a special type of frequency distribution table in which two variables are
shown simultaneously. For example, the contingency table below shows the frequency of invoices
categorized by size and by the presence of errors.
The intersections of the rows and columns are called cells, and each cell contains the value associated
with a unique pair of responses for the two variables.
Invoice size     No errors    Errors    Total
Small              170          20       190
Medium             100          40       140
Large               65           5        70
Total              335          65       400
Graphical Presentation of Data
Graphical presentation of a frequency distribution makes the data easier to understand
and interpret.
Advantages of Graphical Presentation:
 Diagrams give an attractive and elegant presentation.
 Diagrams have good visual impact.
 Diagrams facilitate comparison.
 Diagrams save time.
 Diagrams simplify complexity and depict the characteristics of the data.
Limitations:
 They provide only an approximate picture of the data.
 They cannot be used as an alternative to tabulation of data.
 They are capable of representing only homogeneous and comparable data.
 They can be used only for comparative study.

Types of Diagram:

Categorical data

Bar chart
 Used for both grouped and ungrouped data.
 The graph usually compares different categories.
 A bar chart is used when you have categories of data: types of movies, music genres, or dog breeds.
 A bar graph is not suitable for showing trends over a course of time.

Pie chart
 A pie chart is a type of graph that displays data in a circular graph.
 These diagrams are normally used to show the number of observations in each category of the
data set on a percentage basis.
 For example, yearly income or expenditure by sector, or the types of fruit people choose.
 Pie charts are useful for displaying data that are classified into nominal or ordinal categories.
 This kind of chart represents only one data set; a series of pie charts is needed to compare
multiple sets.

Pareto chart
 A vertical bar chart where categories are shown in descending order according to their
frequencies and are combined with a cumulative percentage line on the same chart.
 The Pareto diagram is a graphical overview of process problems in ranking order, from the most
frequent down to the least frequent.
 It is used when analyzing data about the frequency of problems or causes in a process.

Side-by-side bar chart
 A side-by-side bar chart uses sets of bars to show the joint responses from two categorical
variables.

Numerical data

The stem-and-leaf display

Frequency distribution and cumulative distribution

Histogram
 A histogram is a bar chart for grouped numerical data, in which vertical bars are used to
represent the frequencies or percentages in each group.
 A histogram presents a continuous data set, whereas a bar chart presents discrete data sets.
 There is no gap between adjacent bars.

Polygon
 A percentage polygon uses the midpoint of each class interval to represent the data of each class.
 It is useful when there are two or more groups to compare.
 In a percentage polygon, the data points are marked on the x-axis and the percentage of the
frequency of the data points is marked on the y-axis.
 It is a line graph drawn by joining all the percentages of the frequency of the data points.

Ogive
 An ogive (oh-jive) is a line graph that depicts cumulative frequencies.
 It is also called a cumulative frequency polygon.
 An ogive plots cumulative frequency on the y-axis and class boundaries along the x-axis.
 Two different types of ogive can be drawn: the less than type ogive and the more than type ogive.
 Ogives are useful for determining the number of values below and above a particular value, and
for comparing two sets of data.

Two numerical variables

Scatter plot

Time series plot
Question: For the data given below, construct a less than cumulative frequency table and plot
its ogive.
Marks 0 - 10 10 - 20 20 - 30 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80 80 - 90 90 -100
Frequency 3 5 6 7 8 9 10 12 6 4
Solution:
Marks        Frequency    Less than cumulative frequency
0 - 10          3            3
10 - 20         5            8
20 - 30         6           14
30 - 40         7           21
40 - 50         8           29
50 - 60         9           38
60 - 70        10           48
70 - 80        12           60
80 - 90         6           66
90 - 100        4           70

Plot the points whose abscissae are the upper class limits and whose ordinates are the cumulative
frequencies, namely (10, 3), (20, 8), (30, 14), (40, 21), (50, 29), (60, 38), (70, 48), (80, 60),
(90, 66), (100, 70), and join the points by a smooth curve.

Chapter-4
Descriptive Measures
Although frequency distributions and corresponding graphical representation make raw data more
meaningful, yet they fail to identify three major properties that describe a set of quantitative data.
These three major properties are:
1. The numerical value of an observation around which most of the values of other
observations in data set show a tendency to cluster or group, called central tendency.
2. Extent to which numerical values are dispersed around the central value, called variation.
3. The extent of departure of numerical values from symmetrical (normal) distribution around
the central value, called skewness.
These three properties (central tendency, variation and shape of the frequency distribution) may be
used to extract and summarize the major features of a data set by applying statistical
methods called descriptive measures or summary measures.
There are three types of descriptive measures:

1. Measures of central tendency
2. Measures of dispersion or variation.
3. Measures of symmetry-skewness.
If the descriptive summary measures are computed using sample data, they are called
sample statistics or simply statistics; if they are computed using data of the whole
population, they are called parameters.
Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. It reflects the extent to which the data values
group around a typical or central value. As such, measures of central tendency are sometimes called
measures of central location. The mean (often called the average) is the measure of
central tendency that most readers are likely to be familiar with.
Objective of Averaging
A single value which can represent the whole set of data is called an average. A few of the
objectives of calculating a typical central value or average to describe the entire data set are
given below:
 It is useful to extract and summarize the characteristics of the entire data set in a precise form.
 Since an ‘average’ represents the entire data set, it facilitates comparison between two or more
data sets. For example, the average sales figure of any month can be compared with those of the
preceding months.
 It offers a base for computing various other measures such as dispersion, skewness etc. that
help in many other phases of statistical analysis.
 To formulate policies or to help in decision making.
Measures of Central Tendency
The various measures of central tendency or average commonly used can be classified as:
1. Mathematical Average
(a) Arithmetic mean, commonly known as mean or average.
(i) Simple
(ii) Weighted
(b) Geometric mean
(c) Harmonic mean
2. Average of Position
(a) Median
(b) Quartiles
(c) Deciles
(d) Percentiles
(e) Mode.
Arithmetic Mean
 Ungrouped data which is also known as raw data is data that has not been placed in any
group or category after collection.
 Grouped (or classified) data is the type of data which is classified into groups after collection.
 There are two methods for calculating arithmetic mean for ungrouped and unclassified data:
i) Direct method
ii) Indirect or Short-cut method.
In the direct method, the arithmetic mean is calculated by adding all the observations and dividing the
total by the number of observations, i.e.,
Population mean:
$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Sample mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Alternative formula: When the observations x_i (i = 1, 2, 3, …) are grouped as a frequency
distribution, the arithmetic mean formula is rewritten as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} f_i x_i,$$
where f_i represents the frequency with which the value x_i occurs in the given data set.
Exercise: The number of new orders received by a company over the last 25 days were recorded
as follows: 3,0,1,4,4,4,2,5,3,6,4,5,1,4,2,3,0,2,0,5,4,2,3,3,1. Calculate the arithmetic mean for the
number of orders received over all similar working days.
Solution:
Number of orders, x_i    Frequency, f_i    f_i x_i
0                           3                 0
1                           3                 3
2                           4                 8
3                           5                15
4                           6                24
5                           3                15
6                           1                 6
Total                      25                71
Arithmetic mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} f_i x_i = \frac{71}{25} = 2.84 \approx 3 \text{ orders}$$
In the indirect or short-cut method, an arbitrary assumed mean is used as a basis for calculating
the deviations of the individual values in the data set. Let A be the arbitrary assumed arithmetic mean
and let
$$d_i = x_i - A \quad \text{or} \quad x_i = A + d_i$$
Now,
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{n} (A + d_i) = A + \frac{1}{n}\sum_{i=1}^{n} d_i$$
If the frequencies of the numerical values are also taken into consideration, then
$$\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} f_i d_i$$

Exercise: The daily earnings in Tk. of employees working on a daily basis in a firm are:
Daily earnings (Tk.)     100   120   140   160   180   200   220
Number of employees        3     6    10    15    24    42    75
Calculate the average daily earnings for all employees.
Solution: The calculation of the average daily earnings is shown below.
Let A = 160.
Daily earnings (Tk.), x_i    Number of employees, f_i    d_i = x_i - 160      f_i d_i
100                             3                        100 - 160 = -60       -180
120                             6                        -40                   -240
140                            10                        -20                   -200
160                            15                          0                      0
180                            24                         20                    480
200                            42                         40                   1680
220                            75                         60                   4500
Total                         175                                              6040
Arithmetic mean:
$$\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} f_i d_i = 160 + \frac{6040}{175} = 194.514$$
Arithmetic Mean of Grouped Data
It also follows the two methods:
(a) Direct method

(b) Indirect or Step-deviation method.
For calculating arithmetic mean for a grouped data set, the following assumptions are made:
i) The class interval must be closed

ii) The width of each class interval should be equal.
iii) The values of observations in each class interval must be uniformly distributed between
lower and upper limits.
iv) The mid value of each class interval represent the average of all values in that class.

Direct Method: The same formula is used, but x_i is replaced with the midpoint value m_i of each class interval.
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} f_i m_i$$
Indirect or Step-deviation Method:
$$\bar{x} = A + \frac{\sum f_i d_i}{n} \times h$$
Where, A = assumed arithmetic mean
h = width of the class interval
m_i = mid value of class interval
n = sum of all frequencies
d_i = deviation from the assumed mean in units of h, $d_i = \frac{m_i - A}{h}$
Exercise:
A company is planning to improve plant safety. For this, accident data for the last 50 weeks was
compiled. These data are grouped into frequency distribution as shown below:
Number of Accident : 0-4 5-9 10-14 15-19 20-24
Number of weeks : 5 22 13 8 2
i) Calculate the arithmetic mean of the number of accidents per week by both the direct and
indirect methods.
Solution: Take A = 12 and h = 5.
Number of    Mid point,   No. of weeks,   f_i m_i    d_i = (m_i - A)/h        f_i d_i
accidents    m_i          f_i                        = (m_i - 12)/5
0-4            2             5              10       (2 - 12)/5 = -2           -10
5-9            7            22             154       (7 - 12)/5 = -1           -22
10-14         12            13             156       (12 - 12)/5 = 0             0
15-19         17             8             136       (17 - 12)/5 = 1             8
20-24         22             2              44       (22 - 12)/5 = 2             4
Total                       50             500                                 -20

i) The arithmetic mean of the number of accidents per week by the direct method:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} f_i m_i = \frac{500}{50} = 10$$
ii) The arithmetic mean by the indirect method:
$$\bar{x} = A + \frac{\sum f_i d_i}{n} \times h = 12 + \frac{-20}{50} \times 5 = 12 - 2 = 10$$
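The direct and step-deviation calculations for the accident data can be compared in code. The following is a minimal sketch under the same choices as the worked solution (A = 12, h = 5, class midpoints as representatives); the variable names are illustrative.

```python
lower = [0, 5, 10, 15, 20]       # lower class limits
upper = [4, 9, 14, 19, 24]       # upper class limits
weeks = [5, 22, 13, 8, 2]        # f_i
A, h = 12, 5                     # assumed mean and class width

# Midpoint m_i of each class interval.
mid = [(lo + up) / 2 for lo, up in zip(lower, upper)]
n = sum(weeks)

# Direct method: mean = (sum of f_i * m_i) / n.
mean_direct = sum(f * m for f, m in zip(weeks, mid)) / n

# Step-deviation method: d_i = (m_i - A) / h, mean = A + (sum f_i d_i / n) * h.
d = [(m - A) / h for m in mid]
sum_fd = sum(f * di for f, di in zip(weeks, d))
mean_step = A + sum_fd / n * h

print(mid, d, sum_fd, mean_direct, mean_step)
```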
Advantages of Arithmetic Mean
1. The calculation of arithmetic mean is simple and it is unique, that is every data set has one
and only one mean.
2. The calculation of arithmetic mean is based on all values in the data set.
3. The arithmetic mean is reliable single value that reflects all values in the data set.
4. It is least affected by fluctuations of sampling.
5. It is readily put to algebraic treatment.
Disadvantages
 The arithmetic mean is highly affected by extreme values. Imagine a data set of 4, 5, 6, 7
and 8,578. The sum of the five numbers is 8,600 and the mean is 1,720, which doesn’t
tell us anything useful about the level of the individual numbers.
 It cannot average the ratios and percentages properly.
 It is not an appropriate average for highly skewed distributions.
 It cannot be computed accurately if any item is missing.
 The mean sometimes does not coincide with any of the observed value.
 The mean cannot be calculated for qualitative characteristics such as intelligence, beauty
or loyalty.
 The mean cannot be calculated for unequal or open-ended class intervals.
Weighted Arithmetic Mean
When calculating the arithmetic mean, all the items are considered to be of equal importance.
However, there may be situations in which the items under consideration are not of equal
importance. For example, we may want to find a student's average marks across
different subjects such as mathematics, statistics, physics and biology, and these subjects may not
have equal importance. The arithmetic mean computed by considering the relative importance
of each item is called the weighted arithmetic mean. To give due importance to each item under
consideration, we assign a number called a weight to each item in proportion to its relative
importance.
So, the weighted arithmetic mean is a measure of central tendency of a set of quantitative
observations when not all the observations have the same importance. We must assign a weight
to each observation depending on its importance relative to other observations.
The weighted arithmetic mean is computed by using the following formula:
$$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i}$$
The weighted mean equals the simple mean if the weights assigned to all the
values are equal.
Weighted arithmetic mean should be used:
 When the importance of all the numerical values in the given data set is not equal.
 When frequencies of various classes are widely varying.
 Where there is a change either in the population of numerical values or in the proportion
of their frequencies.
 When ratios, percentages or rates are being averaged.
Exercise: An examination was held to decide the awarding of a scholarship. The weights of the various
subjects were different. The marks obtained by 3 candidates (out of 100) are given below:
Subject        Weight    Student A    Student B    Student C
Mathematics      4          60           57           62
Physics          3          62           61           67
Chemistry        2          55           53           60
English          1          67           77           49
Calculate the weighted arithmetic mean to award the scholarship and compare it with the
simple arithmetic mean to take the decision.

Solution:
                             Student A            Student B            Student C
Subject        Weight, w_i   Marks x_i  x_i w_i   Marks x_i  x_i w_i   Marks x_i  x_i w_i
Mathematics       4             60        240        57        228        62        248
Physics           3             62        186        61        183        67        201
Chemistry         2             55        110        53        106        60        120
English           1             67         67        77         77        49         49
Total            10            244        603       248        594       238        618
We have,
$$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i}, \qquad \bar{x} = \frac{\sum x_i}{n}$$
For A,
$$\bar{x}_w = \frac{603}{10} = 60.3, \qquad \bar{x} = \frac{244}{4} = 61$$
For B,
$$\bar{x}_w = \frac{594}{10} = 59.4, \qquad \bar{x} = \frac{248}{4} = 62$$
For C,
$$\bar{x}_w = \frac{618}{10} = 61.8, \qquad \bar{x} = \frac{238}{4} = 59.5$$
From the above calculation, student B should get the scholarship according to the
simple arithmetic mean, but according to the weighted arithmetic mean student C should get
the scholarship, because the subjects of the examination are not of equal importance.
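The comparison between simple and weighted means for the three candidates can be sketched in Python as follows. The helper names `weighted_mean` and `simple_mean` are illustrative, not from the notes.

```python
weights = {"Mathematics": 4, "Physics": 3, "Chemistry": 2, "English": 1}

marks = {
    "A": {"Mathematics": 60, "Physics": 62, "Chemistry": 55, "English": 67},
    "B": {"Mathematics": 57, "Physics": 61, "Chemistry": 53, "English": 77},
    "C": {"Mathematics": 62, "Physics": 67, "Chemistry": 60, "English": 49},
}


def weighted_mean(values, weights):
    """Sum of x_i * w_i divided by the sum of w_i."""
    return sum(values[k] * weights[k] for k in weights) / sum(weights.values())


def simple_mean(values):
    """Ordinary arithmetic mean, every subject counted equally."""
    return sum(values.values()) / len(values)


weighted = {s: weighted_mean(m, weights) for s, m in marks.items()}
simple = {s: simple_mean(m) for s, m in marks.items()}

best_weighted = max(weighted, key=weighted.get)
best_simple = max(simple, key=simple.get)

print(weighted, simple)
print("best by weighted mean:", best_weighted, "| best by simple mean:", best_simple)
```

The ranking changes between the two means because the subject in which one candidate scores highest carries the smallest weight.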
Problem: An appliances manufacturing company is forecasting regional sales for the next year.
The Chittagong branch, with current yearly sales of Tk.387.6 million, is expected to achieve a sales
growth of 7.25%; the Sylhet branch, with current sales of Tk.158.6 million, is expected to grow by
8.20%; and the Barishal branch, with sales of Tk.115 million, is expected to increase sales by 7.15
percent. What is the average rate of growth forecast for the next year?
Solution: Using current sales as the weights,
$$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i} = \frac{387.6 \times 7.25 + 158.6 \times 8.20 + 115 \times 7.15}{387.6 + 158.6 + 115} = \frac{2810.10 + 1300.52 + 822.25}{661.20}$$
$$= \frac{4932.87}{661.20} = 7.46\%$$

Problem:
A management consulting firm has four types of professionals on its staff: managing consultants,
senior associates, field staff, and office staff. Average rates charged to consulting clients for the
work of each of these professional categories are Tk.3150/hour, Tk.1680/hour, Tk.1260/hour, and
Tk.630/hour respectively. Office records indicate the following number of hours billed last year in
each category: 8,000, 14,000, 24,000 and 35,000. If the firm is trying to come up with an average
billing rate for estimating client charges for the next year, what would you suggest they do and
what do you think is an appropriate rate?
Solution:
The data given in the problem are as below:
Staff                    Consulting charges (Tk. per hour), x_i    Hours billed, w_i
Managing consultants       3150                                       8000
Senior associates          1680                                      14000
Field staff                1260                                      24000
Office staff                630                                      35000
Now,
$$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i} = \frac{3150(8000) + 1680(14000) + 1260(24000) + 630(35000)}{8000 + 14000 + 24000 + 35000}$$
$$= \frac{25{,}200{,}000 + 23{,}520{,}000 + 30{,}240{,}000 + 22{,}050{,}000}{81{,}000} = \frac{101{,}010{,}000}{81{,}000} = \text{Tk. } 1247.037 \text{ per hour}$$
However, the firm should quote this average rate only to clients who use all four professional categories.
Problem
According to a utility company, utility plant expenditures per employee were approximately $50,845,
$43,690, $47,098, $56,121, and $49,369 for the years 2005 through 2009. Employees at the end of each
year numbered 4738, 4637, 4540, 4397, and 4026, respectively. Using the annual number of employees
as weights, what is the weighted mean for the annual utility plant investment per employee during this
period?

Geometric Mean
In many business and economics problems, we deal with quantities (variables) that change over a
period of time. In such cases, an average percentage change, rather than a simple average value, is
needed to represent the average rate of growth or decline in the variable over the period. Thus we
need another measure of central tendency, called the geometric mean.
A specific application of the geometric mean is to show multiplicative effects over time in
compound interest and inflation calculations.
Formula
$$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n} = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n}$$
Using logarithms,
$$G.M. = \text{antilog}\left(\frac{\sum \log x_i}{n}\right)$$
If the observations occur with frequencies f_i, then
$$G.M. = \text{antilog}\left(\frac{\sum f_i \log x_i}{n}\right)$$
Problem
The rate of increase in population of a country during the last three decades is 5%, 8% and 12%. Find
the average rate of growth during the last three decades.
Solution:
Decade    Rate of increase (%)    Population at the end of the decade
                                  (population at the start of the decade = 100)
1            5                      105
2            8                      108
3           12                      112
$$G.M. = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{1/n} = (105 \times 108 \times 112)^{1/3} = 108.295$$
Hence, the average rate of increase in population over the last three decades is
108.295 - 100 = 8.295, or about 8.3 percent per decade.
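The two forms of the geometric mean formula (n-th root of the product, and antilog of the mean logarithm) can be compared numerically on the population example. This is a minimal sketch assuming base-10 logarithms, as in the hand calculation; the variable names are illustrative.

```python
import math

growth_rates = [5, 8, 12]                      # percent increase per decade
factors = [100 + r for r in growth_rates]      # end-of-decade values on a base of 100
n = len(factors)

# Direct form: the n-th root of the product of the values.
gm_direct = math.prod(factors) ** (1 / n)

# Logarithmic form: antilog of the arithmetic mean of the logarithms.
gm_log = 10 ** (sum(math.log10(x) for x in factors) / n)

# Average growth rate per decade, in percent.
average_growth = gm_direct - 100

print(round(gm_direct, 3), round(gm_log, 3), round(average_growth, 3))
```

Both forms give the same value up to rounding; the logarithmic form is the one suited to hand calculation with log tables.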
Advantages:
 The fluctuations of the observations do not affect the geometric mean much.
 It is less affected by extreme values than the arithmetic mean.

 A geometric mean is based upon all the observations
 It is useful for averaging ratio and percentage as well in determining rate of increase or
decrease.
 It gives more weight to small items
Disadvantages of Geometric Mean
 Its calculation is very difficult.
 The value of GM cannot be calculated when any of the observation in the data set is either
negative or zero.
 When calculating a weighted geometric mean, equal importance is not given to each
observation in the data set.
Problem:
A given machine is assumed to depreciate 40 percent in value in the first year, 25 percent in
the second year, and 10 percent per year for the next three years, each percentage being
calculated on diminishing value. What is the average depreciation recorded on the
diminishing value for the period of five years?
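Solution:
Because each percentage is calculated on the diminishing value, the geometric mean is taken of the
value retained each year (100 minus the rate of depreciation), not of the depreciation rates themselves.
Rate of              Value retained,     No. of years, f    log x      f log x
depreciation (%)     x = 100 - rate
40                     60                   1               1.7782     1.7782
25                     75                   1               1.8751     1.8751
10                     90                   3               1.9542     5.8627
Total                                       5                          9.5160
We have,
$$G.M. = \text{antilog}\left(\frac{\sum f_i \log x_i}{n}\right) = \text{antilog}\left(\frac{9.5160}{5}\right) = \text{antilog}(1.9032) = 80.02$$
Hence, the average rate of depreciation over the five years is 100 - 80.02 = 19.98%.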
Harmonic Mean
A simple way to define the harmonic mean is as the reciprocal of the arithmetic mean of the
reciprocals of the observations. The most important requirement is that none of the observations
is zero.
The harmonic mean is used in averaging ratios. The most common examples of ratios are
speed and time, cost and unit of material, work and time, etc.
$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \quad \text{(for ungrouped data)}$$
$$HM = \frac{n}{\sum_{i=1}^{n} f_i \frac{1}{m_i}} \quad \text{(for grouped data)}$$
Problem:
Find the harmonic mean of the following distribution of data:
Dividend yield (percent)    2-6    6-10    10-14
Number of companies          10     12      18
Solution:
Dividend    Mid value,    No. of companies     Reciprocal of        f_i (1/m_i)
yield       m_i           (frequency, f_i)     mid value, 1/m_i
2-6           4              10                  1/4                  2.5
6-10          8              12                  1/8                  1.5
10-14        12              18                  1/12                 1.5
Total                        40                                       5.5
The harmonic mean:
$$HM = \frac{n}{\sum_{i=1}^{n} f_i \frac{1}{m_i}} = \frac{40}{5.5} = 7.27$$
Hence, the average dividend yield of the 40 companies is 7.27 percent.

Problem-2: Profit earned by 18 companies is given below:
Profit in Tk. lakh     20   21   22   23   24   25
No. of companies        4    2    7    1    3    1
Calculate the harmonic mean of profit earned.
Solution:
$$HM = \frac{n}{\sum_{i=1}^{n} f_i \frac{1}{x_i}} = \frac{18}{4 \cdot \frac{1}{20} + 2 \cdot \frac{1}{21} + 7 \cdot \frac{1}{22} + 1 \cdot \frac{1}{23} + 3 \cdot \frac{1}{24} + 1 \cdot \frac{1}{25}} = \frac{18}{0.8219} = 21.90 \text{ lakh}$$
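Both harmonic mean problems use the same frequency-weighted formula, which can be sketched as a single helper. The function name `harmonic_mean` and its signature are illustrative assumptions; for the dividend data the class midpoints stand in for the observations.

```python
def harmonic_mean(values, freqs):
    """Frequency-weighted harmonic mean: n divided by the sum of f_i / x_i."""
    if any(x == 0 for x in values):
        raise ValueError("harmonic mean is undefined when an observation is zero")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))


# Dividend yield problem: class midpoints 4, 8, 12 with 10, 12, 18 companies.
hm_dividend = harmonic_mean([4, 8, 12], [10, 12, 18])

# Profit problem: profit in Tk. lakh with the number of companies at each level.
hm_profit = harmonic_mean([20, 21, 22, 23, 24, 25], [4, 2, 7, 1, 3, 1])

print(round(hm_dividend, 2), round(hm_profit, 2))
```

Setting every frequency to 1 reduces the helper to the ungrouped formula.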
Advantages
 It is based upon all the observations
 The fluctuations of the observations do not affect the harmonic mean
 More weight is given to smaller items
 The original formula of HM can be extended to accommodate further analysis.

Disadvantages
 Its calculation is very difficult.
 The value of HM cannot be calculated when any of the observation in the data set is either
negative or zero.
Averages of Position
The term ‘position’ refers to the place of the value of an observation in the data set. Sometimes we
need to summarize qualitative characteristics of a data set, such as honesty or consumer acceptance,
for which the mathematical averages are unsuitable. In such cases other measures of central tendency
are used, namely,
1. Median
2. Quartiles
3. Deciles
4. Percentile
5. Mode
1. Median: The median may be defined as the middle value in the data set when its elements are
arranged in sequential order.
 Ungrouped data:
In this case, the data are first arranged in either ascending or descending order of
magnitude.
(i) If the number of observations n is odd, then
Med = value of the $\left(\frac{n+1}{2}\right)$th observation in the data set.
(ii) If the number of observations n is even, then
$$Med = \frac{\frac{n}{2}\text{th observation} + \left(\frac{n}{2} + 1\right)\text{th observation}}{2}$$
Exercise: Calculate the median of the following data, which relate to the number of patients examined
per hour in the outpatient ward of a hospital: 10, 12, 15, 20, 13, 24, 17, 18.
The data are arranged in ascending order as follows:
Observation no.:    1    2    3    4    5    6    7    8
No. of patients:   10   12   13   15   17   18   20   24
Since n = 8 is even, the median is the average of the 4th and 5th observations:
$$Med = \frac{15 + 17}{2} = 16$$
Thus the median number of patients examined per hour in the OPD of the hospital is 16.
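The odd and even cases of the ungrouped median rule translate directly into code. The function below is a small illustrative sketch (the name `median` is chosen here, not taken from the notes), applied to the outpatient data.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th observation (index mid when counting from 0).
        return ordered[mid]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


patients = [10, 12, 15, 20, 13, 24, 17, 18]
print(sorted(patients), median(patients))
```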
 Grouped data: To find the median of grouped data, first identify the class interval that contains
the median value, that is, the $\frac{n}{2}$th observation of the data set. To identify this class
interval, find the cumulative frequency of each class.
$$Med = l + \frac{\frac{n}{2} - cf}{f} \times h$$
l = lower limit of the median class interval
cf = cumulative frequency of the class interval prior to the median class interval
h = width of the class interval
f = frequency of the median class
n = total number of observations
Example: In a factory employing 3000 persons, 5 percent earn less than Tk.150 per day, 580 earn
Tk.151 to Tk.200 per day, 30% earn from Tk.201 to Tk.250 per day, 500 earn from Tk.251 to
Tk.300, 20 percent earn Tk.301 to Tk.350 per day, and the rest earn Tk.351 or more per day. What
is the median value?
Calculation of Median
Earnings (Tk)      Percent of workers    Number of persons    Cumulative frequency
Less than 150         5                    150                   150
151-200               -                    580                   730
201-250              30                    900                  1630
251-300               -                    500                  2130
301-350              20                    600                  2730
351 and above         -                    270                  3000
Median observation = $\frac{n}{2}$th = 3000/2 = 1500th observation. This observation lies in the class interval
201-250.
$$Med = l + \frac{\frac{n}{2} - cf}{f} \times h = 201 + \frac{1500 - 730}{900} \times 50 = 201 + 42.78 = \text{Tk. } 243.78$$
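The search for the median class and the interpolation step can be sketched as follows. This is an illustration under stated assumptions: the function name `grouped_median` is invented here, the class width is taken as 50 as in the worked example, and the open first class is given a nominal lower limit of 101, which does not affect the result because the median does not fall in that class.

```python
def grouped_median(lower_limits, freqs, h):
    """Median of grouped data: l + ((n/2 - cf) / f) * h for the median class."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for l, f in zip(lower_limits, freqs):
        if f > 0 and cf + f >= n / 2:
            # This class contains the (n/2)-th observation.
            return l + (n / 2 - cf) / f * h
        cf += f
    raise ValueError("distribution has no observations")


# Daily earnings of the 3000 factory workers.
lower_limits = [101, 151, 201, 251, 301, 351]   # 101 is an assumed nominal limit
persons = [150, 580, 900, 500, 600, 270]

med = grouped_median(lower_limits, persons, h=50)
print(round(med, 2))
```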
The measures that divide an ordered data set into several equal parts are called
partition values, such as quartiles, deciles and percentiles.
2. Quartiles: The values which divide an ordered data set into 4 equal parts. The first quartile
divides a distribution in such a way that 25 percent (= n/4) of the observations have values less than Q1.
The second quartile is the median of the data set, which divides the data set in half.
The formula is:
$$Q_i = l + \frac{i\left(\frac{n}{4}\right) - cf}{f} \times h, \quad i = 1, 2, 3$$
3. Deciles: The deciles are the nine values of the variable that divide an ordered data set into
ten equal parts. They determine the values below which 10%, 20%, ..., 90% of the data lie.
The formula is:
$$D_i = l + \frac{i\left(\frac{n}{10}\right) - cf}{f} \times h, \quad i = 1, 2, \ldots, 9$$
4. Percentile: In common use, the percentile usually indicates that a certain percentage falls
below that percentile. For example, if you score in the 25th percentile, then 25% of test takers
are below your score and 75 percent are at or above your score. The formula is:
$$P_i = l + \frac{i\left(\frac{n}{100}\right) - cf}{f} \times h$$
5. Mode: The mode is the value of an observation that occurs most frequently in the data set,
that is, the point (or class mark) with the highest frequency.
It is always preferable to calculate the mode from a grouped data set.
$$M_0 = l + \frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} \times h$$
where
l = lower limit of the modal class interval,
f_m = frequency of the modal class,
f_{m-1} = frequency of the class interval preceding the modal class interval,
f_{m+1} = frequency of the class interval following the modal class interval,
h = width of the modal class.
Problem:
You are working for the transport manager of a call center, which hires cars for its staff. You are
interested in the weekly distances covered by these cars. Kilometers recorded for a sample of hired
cars during a given week yielded the following data:
Kilometers covered   Number of cars
100-110                     4
110-120                     0
120-130                     3
130-140                     7
140-150                    11
150-160                     8
160-170                     5
170-180                     0
180-190                     2
Total                      40
Calculate:
(i) Median
(ii) Quartiles Q1, Q3
(iii) Percentiles P67, P75, P87
(iv) Decile D7
Solution:
Kilometers covered   Number of cars, f   Cumulative frequency, cf
100-110                     4                     4
110-120                     0                     4
120-130                     3                     7
130-140                     7                    14   (Q1 class)
140-150                    11                    25   (median class)
150-160                     8                    33   (Q3 class)
160-170                     5                    38
170-180                     0                    38
180-190                     2                    40
Total                      40
(i) Median:
We have,
$$\text{Med} = l + \frac{\frac{n}{2} - cf}{f} \times h$$
The median is the (n/2)th = (40/2) = 20th observation, which
lies in the class interval 140-150. Now we have,
$$\text{Med} = 140 + \frac{\frac{40}{2} - 14}{11} \times 10 = 140 + \frac{20 - 14}{11} \times 10 = 145.45$$
(ii) Quartiles:
We have
$$Q_i = l + \frac{i\left(\frac{n}{4}\right) - cf}{f} \times h$$
Since there are 40 observations in the data set, the first quartile is the 40/4 = 10th observation, which
lies in the class interval 130-140.
$$Q_1 = 130 + \frac{1\left(\frac{40}{4}\right) - 7}{7} \times 10 = 130 + 4.29 = 134.29$$
The third quartile is the 3(40/4) = 30th observation, which lies in the class interval 150-160.
$$Q_3 = 150 + \frac{3\left(\frac{40}{4}\right) - 25}{8} \times 10 = 150 + 6.25 = 156.25 \approx 156$$
(iii) Percentiles:
P67 is the 67(40/100)th = 26.8th, i.e. approximately the 27th, observation, which lies in the class interval
150-160. With the position rounded to 27,
$$P_{67} = l + \frac{27 - cf}{f} \times h = 150 + \frac{27 - 25}{8} \times 10 = 152.50$$
(Using the unrounded position 26.8 gives 152.25.)
P75 is the 75(40/100)th = 30th observation, which lies in the class interval 150-160.
$$P_{75} = 150 + \frac{75\left(\frac{40}{100}\right) - 25}{8} \times 10 = 150 + \frac{30 - 25}{8} \times 10 = 156.25$$
P87 is the 87(40/100)th = 34.8th, i.e. approximately the 35th, observation, which lies in the class interval 160-170.
The cumulative frequency before this class is 33 and its frequency is 5, so
$$P_{87} = 160 + \frac{35 - 33}{5} \times 10 = 164$$
(iv) Deciles:
D7 is the 7(40/10) = 28th observation, which lies in the class interval 150-160.
$$D_7 = 150 + \frac{7\left(\frac{40}{10}\right) - 25}{8} \times 10 = 150 + 3.75 = 153.75$$
Problem-2
The following distribution gives the pattern of overtime work per week done by 100 employees of
a company. Calculate the median, first quartile and seventh decile. Also calculate P60 and the mode of
the overtime work distribution.
Overtime hours     10-15   15-20   20-25   25-30   30-35   35-40
No. of employees     11      20      35      20       8       6
Solution:
Overtime hours   Number of employees   Cumulative frequency
10-15                   11                    11
15-20                   20                    31
20-25                   35                    66
25-30                   20                    86
30-35                    8                    94
35-40                    6                   100
Total                  100
Since the number of observations in the data set is 100, the median is the (n/2)th = (100/2) = 50th
observation. This observation lies in the class interval 20-25.

Calculation of Median:
We have
$$\text{Med} = l + \frac{\frac{n}{2} - cf}{f} \times h$$
Now,
$$\text{Med} = 20 + \frac{\frac{100}{2} - 31}{35} \times 5 = 20 + \frac{50 - 31}{35} \times 5 = 20 + 2.714 = 22.714 \text{ hours}$$
First Quartile:
We have,
$$Q_i = l + \frac{i\left(\frac{n}{4}\right) - cf}{f} \times h$$
Now, the first quartile is the value of the (n/4)th = (100/4) = 25th observation, which lies in the
class interval 15-20.
$$Q_1 = 15 + \frac{1\left(\frac{100}{4}\right) - 11}{20} \times 5 = 15 + \frac{25 - 11}{20} \times 5 = 15 + 3.5 = 18.5 \text{ hours}$$
Seventh Decile:
$$D_i = l + \frac{i\left(\frac{n}{10}\right) - cf}{f} \times h$$
The seventh decile is the value of the 7(n/10)th = 7(100/10) = 70th observation, which lies in the
class interval 25-30.
Thus,
$$D_7 = 25 + \frac{7\left(\frac{100}{10}\right) - 66}{20} \times 5 = 25 + \frac{70 - 66}{20} \times 5 = 25 + 1 = 26 \text{ hours}$$
Percentile Calculation
$$P_i = l + \frac{i\left(\frac{n}{100}\right) - cf}{f} \times h$$
P60 = value of the 60(n/100)th = 60(100/100) = 60th observation, which lies in the class
interval 20-25.
Thus,
$$P_{60} = 20 + \frac{60\left(\frac{100}{100}\right) - 31}{35} \times 5 = 20 + 4.14 = 24.14 \text{ hours}$$
Mode Calculation:
We have,
$$M_0 = l + \frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} \times h$$
The largest frequency corresponds to the class interval 20-25.
$$M_0 = 20 + \frac{35 - 20}{2 \times 35 - 20 - 20} \times 5 = 20 + 2.5 = 22.50 \text{ hours}$$
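The mode formula for grouped data can be checked with a short Python sketch. The function name `grouped_mode` is an assumption made here for illustration, and treating a missing neighbouring class as having frequency 0 is likewise a choice of this sketch, not something stated in the notes.

```python
def grouped_mode(lower, width, f_m, f_prev, f_next):
    """Mode of grouped data: l + (f_m - f_prev) / (2*f_m - f_prev - f_next) * h."""
    return lower + (f_m - f_prev) / (2 * f_m - f_prev - f_next) * width


# Overtime distribution from Problem-2: (lower class limit, frequency), width 5.
overtime = [(10, 11), (15, 20), (20, 35), (25, 20), (30, 8), (35, 6)]
freqs = [f for _, f in overtime]

k = freqs.index(max(freqs))                          # index of the modal class
f_prev = freqs[k - 1] if k > 0 else 0                # assumption: no neighbour -> 0
f_next = freqs[k + 1] if k < len(freqs) - 1 else 0   # assumption: no neighbour -> 0

mode_hours = grouped_mode(overtime[k][0], 5, freqs[k], f_prev, f_next)
print(mode_hours)
```

The modal class is 20-25, and the sketch returns the same 22.5 hours as the hand calculation.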
Problem-3
The following are the profit figures earned by 50 companies in the country.
Profit in (Tk. lakh) Number of companies

10 or less 4
20 or less 10
30 or less 30
40 or less 40
50 or less 47
60 or less 50

Calculate
(i) The median
(ii) The range of profit earned by middle 80 percent of the companies.
Problem-4
The following are the data on the profit margin (in percent) of three products and their corresponding
sales (in Tk.) during a particular period.
Product Profit Margin (Percent) Sales (Tk. thousand)

A 12.5 2000
B 10.3 6000
C 6.4 10000
(a) Determine the mean profit margin.

(b) Determine the weighted mean considering the Tk. sales as weight for each product.
(c) Which of the means calculated is the correct one?

Chapter-5
Measures of Variations
The measures of central tendency describe how the major part of the values in a data set tends to
concentrate (cluster) around a central value called the average, with the remaining values scattered on
either side of that value. But these measures do not reveal how the values are dispersed (spread
or scattered) on each side of the central value. The dispersion of values is indicated by the extent
to which these values tend to spread over an interval rather than cluster closely around an average.
So, variation is a way to show how data are dispersed, or spread out.
The statistical techniques to measure such dispersion are of two types:

(a) Measures of dispersion (or variation): measure the extent of variation or deviation of each
value from a measure of central tendency, usually the mean or median.
(b) Measures of skewness: measure the direction of variation in the distribution of values in
the data set.
Significance of Measuring Dispersion

Measures of dispersion are needed for four basic purposes
(i) To determine the reliability of an average.
(ii) To serve as a basis for the control of the variability.
(iii) To compare two or more series with regard to their variability.
(iv) To facilitate the use of other statistical measures.
A brief explanation of these points is given below:

(i) Measures of variation point out how far an average is representative of the mass.
When dispersion is small, the average is a typical value in the sense that it closely
represents the individual values, and it is reliable in the sense that it is a good estimate of the
average in the corresponding universe. On the other hand, when dispersion is large the
average is not so typical, and unless the sample is very large, the average may be quite
unreliable.
(ii) Another purpose of measuring dispersion is to determine the nature and cause of variation in
order to control the variation itself. In matters of health, variations in body temperature,
pulse beat and blood pressure are the basic guides to diagnosis. Prescribed
treatment is designed to control their variation. In industrial production, efficient operation
requires control of quality variation, the causes of which are sought through inspection and
quality control programmes. Thus measurement of dispersion is basic to the control of the
causes of variation. In engineering problems measures of dispersion are often especially
important. In the social sciences a special problem requiring the measurement of variability is
the measurement of “inequality” in the distribution of income or wealth, etc.

(iii) Measures of dispersion enable a comparison to be made of two or more series with regard
to their variability. The study of variation may also be looked upon as a means of
determining uniformity or consistency. A high degree of variation would mean little
uniformity or consistency whereas a low degree of variation would mean great uniformity
or consistency.
(iv) Many powerful analytical tools in statistics such as correlation analysis, the testing of
hypothesis, the analysis of fluctuations, techniques of production control, cost control, and
so on are based on measures of variation of one kind or another.
Classification of Dispersion
Measures of dispersion
  Algebraic (absolute or relative)
    Distance measures
      Range and its coefficient
      Interquartile range (or quartile deviation) and its coefficient
    Average deviation measures
      Mean absolute deviation and its coefficient
      Standard deviation and its coefficient
  Graphic
    Lorenz curve
Distance Measure
The distance measures describe the spread or dispersion of values of a variable in terms of
difference among values of data set. The average deviation measures describe the average
deviation for a given measure of central tendency.
Two distance measures are-
(i) Range
(ii) Interquartile deviation

Range: It is the difference between the largest and the smallest observation in a set of data.
Range, R = Highest value of an observation - Lowest value of an observation
= H - L
For example, if the smallest value of an observation in the data set is 160 and the largest value is
250, then the range is 250 - 160 = 90.
For a grouped frequency distribution, the range is the difference between the
upper class limit of the last class and the lower class limit of the first class.
Coefficient of Range
The relative measure of range is called the coefficient of range.
$$\text{Coefficient of Range} = \frac{H - L}{H + L} = \frac{\text{Range}}{H + L}$$
Exercise: The following are the sales figures of a firm for the last 12 months:
Month              1   2   3   4   5   6   7   8   9  10  11  12
Sales (Tk. '000)  80  82  82  84  84  86  86  88  88  90  90  92
Calculate the range and coefficient of range for sales.
Solution: Given that H = 92 and L = 80, therefore
Range = H - L = 92 - 80 = 12
and
$$\text{Coefficient of Range} = \frac{H - L}{H + L} = \frac{12}{92 + 80} = 0.070$$
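As a quick check of the range and its coefficient, the same calculation can be written in a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration.

```python
sales = [80, 82, 82, 84, 84, 86, 86, 88, 88, 90, 90, 92]   # monthly sales, Tk. '000

highest, lowest = max(sales), min(sales)
sales_range = highest - lowest                          # H - L
coefficient_of_range = sales_range / (highest + lowest) # (H - L) / (H + L)

print(sales_range, round(coefficient_of_range, 3))
```

Only the two extreme values enter the calculation, which is exactly why the range is so sensitive to outliers.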
Merits of Range
 It is the simplest of the measures of dispersion.
 Easy to calculate.
 Easy to understand.
 It is independent of the measures of central tendency.
 It is quite useful in cases where the purpose is only to find out the extent of extreme
variation, such as industrial quality control, temperature, rainfall and so on.
Disadvantages
 It is based on only the two extreme observations.
 It is largely influenced by the two extreme values and takes no account of the other
values.
 The range is not a reliable measure of dispersion.
 It depends on a change of scale.
Application of Range
 Fluctuation in share prices
 Quality control
 Weather forecast
Interquartile Range or Deviation

The limitations of the range can be partially overcome by using another measure of variation, which
measures the spread over the middle half of the values in the data set so as to minimize the influence
of outliers (extreme values).
The interquartile range is a measure of dispersion or spread of values in the data set between the
third quartile and the first quartile.
$$\text{Interquartile Range} = \text{Third Quartile} - \text{First Quartile} = Q_3 - Q_1$$
Half the distance between the third quartile and the first quartile is called the semi-interquartile range or
the quartile deviation.
$$\text{Quartile Deviation (QD)} = \frac{\text{Third Quartile} - \text{First Quartile}}{2} = \frac{Q_3 - Q_1}{2}$$
Coefficient of Quartile Deviation:
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Exercise:
Use an appropriate measure to evaluate the variation in the following data:
Farm size (acre)   No. of firms
Below 40                394
41-80                   461
81-120                  391
121-160                 334
161-200                 169
201-240                 113
241-above               148
Solution
Since the frequency distribution has open-end class intervals at the two extremes,
QD is the appropriate measure of variation. The computation of QD is shown in the
table below:

Farm size (acre)   No. of firms   Cumulative frequency, cf
Below 40                394              394
41-80                   461              855
81-120                  391             1246
121-160                 334             1580
161-200                 169             1749
201-240                 113             1862
241-above               148             2010
Total                  2010
We have,
$$\text{Quartile Deviation (QD)} = \frac{Q_3 - Q_1}{2}$$
First Quartile, Q1
$$Q_i = l + \frac{i\left(\frac{n}{4}\right) - cf}{f} \times h$$
Now, the first quartile is the value of the (n/4)th = (2010/4) = 502.5th observation, which lies in
the class interval 41-80.
$$Q_1 = 41 + \frac{\frac{2010}{4} - 394}{461} \times 40 = 41 + 9.41 = 50.41 \text{ acres}$$
Third Quartile, Q3
$$Q_3 = l + \frac{3\left(\frac{n}{4}\right) - cf}{f} \times h$$
Now, the third quartile is the value of the 3(n/4)th = 3(2010/4) = 1507.5th observation, which
lies in the class interval 121-160; therefore,
$$Q_3 = 121 + \frac{3\left(\frac{2010}{4}\right) - 1246}{334} \times 40 = 121 + 31.31 = 152.31 \text{ acres}$$
The quartile deviation is
$$\text{QD} = \frac{Q_3 - Q_1}{2} = \frac{152.31 - 50.41}{2} = 50.95 \text{ acres}$$
and the coefficient of QD is
$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{152.31 - 50.41}{152.31 + 50.41} = \frac{101.90}{202.72} = 0.503$$
Problem-1
You are given the data pertaining to kilowatt hours of electricity consumed by 100 persons in a
city.
Consumption (kilowatt hours)   No. of users
0-10                                 6
10-20                               25
20-30                               36
30-40                               20
40-50                               13
Calculate the range within which the middle 50 percent of consumption falls.
Also find the interquartile range and the coefficient of quartile deviation.
Ans. Q3 - Q1 = 34 - 17.6 = 16.4
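The stated answer can be verified with a short Python sketch that applies the grouped quartile formula to the electricity data. The helper name `value_at_position` and the (lower, upper, frequency) layout are assumptions made here for illustration.

```python
def value_at_position(classes, position):
    """Grouped-data interpolation: l + ((position - cf) / f) * h."""
    cf = 0  # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cf + freq >= position:
            return lower + (position - cf) / freq * (upper - lower)
        cf += freq
    raise ValueError("position exceeds the total frequency")


# (lower limit, upper limit, number of users) in kilowatt hours
usage = [(0, 10, 6), (10, 20, 25), (20, 30, 36), (30, 40, 20), (40, 50, 13)]
n = sum(freq for _, _, freq in usage)

q1 = value_at_position(usage, n / 4)        # 25th observation
q3 = value_at_position(usage, 3 * n / 4)    # 75th observation
interquartile_range = q3 - q1
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, round(interquartile_range, 2), round(coefficient_of_qd, 3))
```

The middle 50 percent of consumption therefore lies between Q1 and Q3, and the coefficient of quartile deviation follows directly from the same two values.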
Average Deviation Measures

The range and quartile deviation indicate the overall variation in a data set, but do not indicate the
spread or scatter around a central value (i.e., mean, median or mode). To
understand the nature of the distribution of values in the data set, we need to measure the ‘spread’ of
values around the mean to indicate how representative the mean is.
The measures are:
(i) Mean Absolute Deviation (MAD) or average deviation.
(ii) Variance and standard deviation.
Mean Absolute Deviation
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|, \quad \text{for a sample}$$
$$\text{MAD} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \mu|, \quad \text{for a population}$$

For a grouped frequency distribution,
$$\text{MAD} = \frac{\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|}{\sum f_i}$$
Coefficient of MAD
$$\text{Coefficient of MAD} = \frac{\text{MAD}}{\bar{x}} \times 100 \quad \text{or} \quad \frac{\text{MAD}}{\text{Me}} \times 100$$
Exercise:
The numbers of patients seen in the emergency ward of a hospital for a sample of 5 days in the last
month were: 153, 147, 151, 156, and 153. Determine the mean absolute deviation and interpret it.
Solution:
The mean number of patients, x̄ = (153 + 147 + 151 + 156 + 153)/5 = 152.
Number of patients, x   x - x̄              |x - x̄|
153                     153 - 152 = 1         1
147                     147 - 152 = -5        5
151                     151 - 152 = -1        1
156                     156 - 152 = 4         4
153                     153 - 152 = 1         1
Total                                         12
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| = \frac{12}{5} = 2.4 \text{ patients}$$
The mean absolute deviation is 2.4 patients per day: on average, the daily number of patients deviates
by about 2.4 from the mean of 152 patients per day.
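The sample MAD formula translates directly into Python. The sketch below is illustrative only; the variable names are chosen here and are not part of the course material.

```python
patients_per_day = [153, 147, 151, 156, 153]

n = len(patients_per_day)
mean_patients = sum(patients_per_day) / n
# Average of the absolute deviations from the mean (signs are ignored).
mad_patients = sum(abs(x - mean_patients) for x in patients_per_day) / n

print(mean_patients, mad_patients)
```

The sum of absolute deviations is 12, so the MAD comes out as 12/5 = 2.4 patients per day.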

Exercise-2
Find the mean absolute deviation from the mean for the following frequency distribution of sales (Tk
in thousand) in a co-operative store.
Sales            50-100   100-150   150-200   200-250   250-300   300-350
Number of days     11        23        44        19         8         7
Solution:
We have
$$\text{MAD} = \frac{\sum_{i=1}^{n} f_i\,|x_i - \bar{x}|}{\sum f_i}$$
Calculation for MAD: For calculation of the mean, x̄, let A = 175 and h = 50.
Sales (Tk)   Mid value, m   Frequency, f   d = (m - A)/h          fd    |x - x̄|   f|x - x̄|
50-100            75            11         (75 - 175)/50 = -2     -22    104.91    1154.01
100-150          125            23         (125 - 175)/50 = -1    -23     54.91    1262.93
150-200          175            44         (175 - 175)/50 = 0       0      4.91     216.04
200-250          225            19         (225 - 175)/50 = 1      19     45.09     856.71
250-300          275             8         (275 - 175)/50 = 2      16     95.09     760.72
300-350          325             7         (325 - 175)/50 = 3      21    145.09    1015.63
Total                          112                                 11              5266.04
$$\bar{x} = A + \frac{\sum f_i d_i}{n} \times h = 175 + \frac{11}{112} \times 50 = 179.91 \text{ per day}$$
The mean absolute deviation is
$$\text{MAD} = \frac{5266.04}{112} = 47.02$$
Coefficient of MAD
$$\text{Coefficient of MAD} = \frac{\text{MAD}}{\bar{x}} = \frac{47.02}{179.91} = 0.2613 = 26.13\%$$
Thus, the average sales is Tk. 179.91 thousand per day, the mean absolute deviation of sales is Tk.
47.02 thousand per day, and the relative measure of MAD is 26.13%.
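For grouped data the same idea applies, with each class mid-value weighted by its frequency. The following Python sketch recomputes the mean directly from the mid-values rather than through the assumed mean A; that shortcut is a choice of this sketch, and the variable names are illustrative.

```python
mid_values = [75, 125, 175, 225, 275, 325]    # class mid-points, Tk. thousand
frequencies = [11, 23, 44, 19, 8, 7]          # number of days in each class

n = sum(frequencies)
mean_sales = sum(f * m for f, m in zip(frequencies, mid_values)) / n
mad_sales = sum(f * abs(m - mean_sales)
                for f, m in zip(frequencies, mid_values)) / n
coefficient_of_mad = mad_sales / mean_sales

print(round(mean_sales, 2), round(mad_sales, 2), round(coefficient_of_mad, 4))
```

Because no intermediate value is rounded, the result can differ from the hand calculation in the second decimal place.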

Exercise: Calculate the mean absolute deviation from the median, and its coefficient, for the following
data.
Year    Sales (Tk. thousand)
        Product A   Product B
1996       23          36
1997       41          39
1998       29          36
1999       53          31
2000       38          47
The median sales of products A and B are 38 and 36 respectively.
Product A               Product B
Sales, x   |x - Me|     Sales, x   |x - Me|
23           15         31            5
29            9         36            0
38            0         36            0
41            3         39            3
53           15         47           11
n = 5        42         n = 5        19
For Product A,
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \text{Me}| = \frac{42}{5} = 8.4$$
Coefficient of MAD
$$\text{Coefficient of MAD} = \frac{\text{MAD}}{\text{Me}} = \frac{8.4}{38} = 0.221$$
For Product B,
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \text{Me}| = \frac{19}{5} = 3.8$$
$$\text{Coefficient of MAD} = \frac{\text{MAD}}{\text{Me}} = \frac{3.8}{36} = 0.106$$

Advantage of MAD
 It is based on all the observations of a series.

 It is simple to understand.
 It is easy to calculate.
 It shows the dispersion, or scatter, of the various items of a series from its central value.
 It is not very much affected by the values of extreme items of a series.
 It facilitates comparison between different items of a series.
 It truly represents the average of deviations of the items of a series.
 It has practical usefulness in the field of business and commerce.
Disadvantages/Demerits of Mean Deviation

 The algebraic signs are ignored while calculating MAD.
 It is not rigidly defined in the sense that it is computed from any central value viz. Mean,
Median, Mode etc. and thereby it can produce different results.
 It is not capable of further algebraic treatment.
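Variance and Standard Deviation
Variance: a measure of variability based on the squared deviations of the observed values in the
data set about the mean value.
Population variance,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
Sample variance,
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Standard Deviation
a) For ungrouped data:
Population standard deviation:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2} = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \quad d = x - A$$

Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2}{n - 1} - \frac{n\bar{x}^2}{n - 1}}$$
b) For grouped data:
Population standard deviation:
$$\sigma = \sqrt{\frac{1}{N}\sum f (x_i - \mu)^2} = \sqrt{\frac{1}{N}\sum f x^2 - \mu^2} = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times h, \quad \text{where } d = \frac{m - A}{h}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum f x^2}{n - 1} - \frac{\left(\sum f x\right)^2}{n(n - 1)}}$$
Coefficient of Variation
$$\text{Coefficient of Variation} = \frac{\text{SD}}{\text{Mean}} \times 100$$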
Advantage of Standard Deviation

 The value of Standard deviation is based on every observation in a data set.
 It is the only measure of variation capable of algebraic treatment
 It is less affected by fluctuation of sampling as compared to other measure of variation.
 It is possible to calculate combined SD of two or more sets of data.
 Standard Deviation has a definite relationship with the area under the symmetric curve of
a frequency distribution.
 Standard deviation is useful in further investigation.
Disadvantage
 It is more difficult to calculate than other measures of variation.
 While calculating SD, more weight is given to extreme values and less to those near the mean,
because large deviations, when squared, become proportionately larger than small
deviations.
 It cannot be used for comparing the dispersion of two or more series given in different
units.

Exercise: The wholesale prices of a commodity for seven consecutive days in a month are as
follows:
Day                        1     2     3     4     5     6     7
Commodity price/quintal   240   260   270   245   255   286   264
Calculate the variance and standard deviation.

Also calculate the coefficient of variation.
Solution:
Observation, x   x - x̄   (x - x̄)²
240               -20       400
260                 0         0
270                10       100
245               -15       225
255                -5        25
286                26       676
264                 4        16
Total 1820                 1442
$$\text{Mean, } \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{1820}{7} = 260$$
$$\text{Variance, } \sigma^2 = \frac{1}{N}\sum (x_i - \bar{x})^2 = \frac{1442}{7} = 206$$
$$\text{Standard deviation, } \sigma = \sqrt{\sigma^2} = \sqrt{206} = 14.353$$
$$\text{Coefficient of Variation} = \frac{14.353}{260} = 0.0552 \text{ (i.e. 5.52 percent)}$$
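The same figures can be obtained with Python's standard `statistics` module. The solution above divides by N, so the sketch uses the population functions `pvariance` and `pstdev`; the variable names are chosen here for illustration.

```python
import statistics

prices = [240, 260, 270, 245, 255, 286, 264]   # price per quintal over 7 days

mean_price = statistics.mean(prices)
variance = statistics.pvariance(prices)        # population variance: divides by N
std_dev = statistics.pstdev(prices)            # square root of the population variance
cv_percent = std_dev / mean_price * 100        # coefficient of variation in percent

print(mean_price, variance, round(std_dev, 3), round(cv_percent, 2))
```

If the seven prices were treated as a sample rather than a population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the matching functions.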
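Problem
A study of 100 engineering companies gives the following information.
Profit (Tk. in crore)   0-10   10-20   20-30   30-40   40-50   50-60
Number of companies       8      12      20      30      20      10
Calculate the standard deviation and coefficient of variation of the profit earned.

Solution:
Let the assumed mean, A, be 35, with class width h = 10.
Profit   Mid value, m   d = (m - 35)/10    f     fd    fd²
0-10          5              -3            8    -24     72
10-20        15              -2           12    -24     48
20-30        25              -1           20    -20     20
30-40        35               0           30      0      0
40-50        45               1           20     20     20
50-60        55               2           10     20     40
Total                                    100    -28    200
We have,
$$\sigma = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times h = \sqrt{\frac{200}{100} - \left(\frac{-28}{100}\right)^2} \times 10 = \sqrt{2 - 0.0784} \times 10 = 13.86$$
The mean profit is
$$\bar{x} = A + \frac{\sum fd}{N} \times h = 35 + \frac{-28}{100} \times 10 = 32.2$$
so that
$$\text{Coefficient of Variation} = \frac{\sigma}{\bar{x}} \times 100 = \frac{13.86}{32.2} \times 100 \approx 43\%$$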
Problem
Mr. Shad, a retired government servant, is considering investing his money in one of two proposals. He wants to
choose the one that has the higher average net present value and the lower standard deviation. The relevant
data are given below. Can you help him in choosing the proposal?
Proposal A
Net Present Value (NPV)   Chance of possible outcome of NPV
1559                      0.30
5662                      0.40
9175                      0.30
Proposal B
Net Present Value (NPV)   Chance of possible outcome of NPV
-10050                    0.30
5812                      0.40
20584                     0.30

Solution
To advise Mr. Shad on the proposal with the higher average net present value, first calculate the expected
(average) net present value (NPV) for both proposals.
Proposal A: Expected NPV
= 1559 × 0.30 + 5662 × 0.40 + 9175 × 0.30
= 467.70 + 2264.80 + 2752.50
= Tk. 5485
Proposal B: Expected NPV
= -10050 × 0.30 + 5812 × 0.40 + 20584 × 0.30
= -3015 + 2324.80 + 6175.20
= Tk. 5485
Since the expected NPV is the same in both cases, he would prefer the less risky proposal.
So we need to calculate the standard deviation in both cases.
For Proposal A:
NPV (x)   Expected NPV (x̄)   x - x̄    f      f(x - x̄)²
1559      5485                -3926    0.30   4624042.8
5662      5485                  177    0.40     12531.6
9175      5485                 3690    0.30   4084830.0
Total                                  1.0    8721404.4
$$s_A = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{8721404.40} = \text{Tk. } 2953.20$$
For Proposal B:
NPV (x)   Expected NPV (x̄)   x - x̄     f      f(x - x̄)²
-10050    5485                -15535    0.30   72400867.5
5812      5485                   327    0.40      42771.6
20584     5485                 15099    0.30   68393940.3
Total                                   1.0   140837579.4
$$s_B = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{140837579.4} = \text{Tk. } 11867.50$$
Since s_A < s_B, the net present value of proposal A is more uniform, i.e. less risky. Thus, proposal A may be chosen.
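The comparison of the two proposals can be sketched in Python by treating the chances as probability weights. The function name `expected_value_and_sd` is an assumption made here for illustration; the probabilities are taken to sum to 1, as they do in the problem.

```python
from math import sqrt


def expected_value_and_sd(outcomes):
    """outcomes: list of (value, probability) pairs whose probabilities sum to 1."""
    mean = sum(x * p for x, p in outcomes)
    variance = sum(p * (x - mean) ** 2 for x, p in outcomes)
    return mean, sqrt(variance)


proposal_a = [(1559, 0.30), (5662, 0.40), (9175, 0.30)]
proposal_b = [(-10050, 0.30), (5812, 0.40), (20584, 0.30)]

mean_a, sd_a = expected_value_and_sd(proposal_a)
mean_b, sd_b = expected_value_and_sd(proposal_b)

# With equal expected NPV, the proposal with the smaller SD is the less risky one.
less_risky = "A" if sd_a < sd_b else "B"
print(round(mean_a, 2), round(sd_a, 2), round(mean_b, 2), round(sd_b, 2), less_risky)
```

Both proposals have the same expected NPV, so the decision rests entirely on the standard deviations.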

Problem:
A charitable organization decided to give old age pension to people over sixty years of age. The
scales of pension were fixed as follows:
Age group   Pension (Tk per month)
60-65       20
65-70       25
70-75       30
75-80       35
80-85       40
The ages of 25 persons who secured the pension are as given below:
74 62 84 72 61 83 72 81 64 71 63 61
60 67 74 64 79 73 75 76 69 68 78 66
67
Calculate the monthly average pension payable per person and the standard deviation.
Solution
Age group   f    Pension (Tk per month), x   fx    fx²
60-65       7    20                          140    2800
65-70       5    25                          125    3125
70-75       6    30                          180    5400
75-80       4    35                          140    4900
80-85       3    40                          120    4800
Total       25                               705   21025
$$\text{Mean, } \bar{x} = \frac{\sum fx}{\sum f} = \frac{705}{25} = \text{Tk. } 28.20$$
Standard Deviation:
$$\sigma = \sqrt{\frac{\sum fx^2}{n} - \bar{x}^2} = \sqrt{\frac{21025}{25} - (28.20)^2} = \sqrt{45.76} = 6.76$$
Problem
The weekly sales of two products A and B were recorded as below:
Product A    59    75    27    63    27    28    56
Product B   150   200   125   310   330   250   225
Find out which of the two shows greater fluctuation in sales.
For Product A: Let A = 56 be the assumed mean of sales for product A.

Sales, x   f   d = x - A         fd    fd²
27         2   27 - 56 = -29    -58   1682
28         1   28 - 56 = -28    -28    784
56         1   56 - 56 = 0        0      0
59         1   59 - 56 = 3        3      9
63         1   63 - 56 = 7        7     49
75         1   75 - 56 = 19      19    361
Total      7                    -57   2885
$$\bar{x} = A + \frac{\sum fd}{\sum f} = 56 + \frac{-57}{7} = 47.86$$
$$s_A^2 = \frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2 = \frac{2885}{7} - \left(\frac{-57}{7}\right)^2 = 412.143 - 66.306 = 345.837$$
$$s_A = \sqrt{345.837} = 18.60$$
$$CV_A = \frac{s_A}{\bar{x}} \times 100 = \frac{18.60}{47.86} \times 100 = 38.86\%$$
For Product B: Let A = 225 be the assumed mean of sales for product B.
Sales, x   f   d = x - A    fd     fd²
125        1   -100        -100   10000
150        1    -75         -75    5625
200        1    -25         -25     625
225        1      0           0       0
250        1     25          25     625
310        1     85          85    7225
330        1    105         105   11025
Total      7                 15   35125
$$\bar{x} = A + \frac{\sum fd}{\sum f} = 225 + \frac{15}{7} = 227.14$$
$$s_B^2 = \frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2 = \frac{35125}{7} - \left(\frac{15}{7}\right)^2 = 5017.857 - 4.592 = 5013.265$$
$$s_B = \sqrt{5013.265} = 70.80$$
$$CV_B = \frac{s_B}{\bar{x}} \times 100 = \frac{70.80}{227.14} \times 100 = 31.17\%$$
Since the coefficient of variation for product A is greater than that for product B, the sales fluctuation
in the case of product A is higher.
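The comparison can be reproduced in Python without the assumed-mean shortcut. The sketch below uses `statistics.pstdev`, which divides by n, to match the formula used in the solution; the helper name `cv_percent` is an assumption made here.

```python
import statistics

product_a = [59, 75, 27, 63, 27, 28, 56]
product_b = [150, 200, 125, 310, 330, 250, 225]


def cv_percent(values):
    """Coefficient of variation in percent, using the divisor n for the SD."""
    return statistics.pstdev(values) / statistics.mean(values) * 100


cv_a = cv_percent(product_a)
cv_b = cv_percent(product_b)
more_variable = "A" if cv_a > cv_b else "B"

print(round(cv_a, 2), round(cv_b, 2), more_variable)
```

Although product B has the larger standard deviation in absolute terms, its sales are less variable relative to its mean, which is what the coefficient of variation captures.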
Problem:

An analysis of production rejects resulted in the following observations:
No. of rejects per operator   No. of operators
21-25                                5
26-30                               15
31-35                               28
36-40                               42
41-45                               15
46-50                               12
51-55                                3
Calculate the mean and standard deviation.
What is the Coefficient of Variation? Discuss its importance in business problems.
The coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard
deviation to the mean (average). For example, the expression “The standard deviation is 15% of
the mean” is a CV.
Mathematically, the coefficient of variation (CV) is defined as the ratio of the standard deviation σ to the mean μ, i.e.,
$$CV = \frac{\text{SD}}{\text{Mean}} = \frac{\sigma}{\mu}$$
It shows the extent of variability in relation to the mean of the population.
The CV is particularly useful when you want to compare results from two different surveys or tests
that have different measures or values, for example when comparing the results from two
tests that have different scoring mechanisms. If sample A has a CV of 12% and sample B has a
CV of 25%, you would say that sample B has more variation, relative to its mean.
The coefficient of variation shows the extent of variability of data in sample in relation to the mean
of the population. In finance, the coefficient of variation allows investors to determine how much
volatility, or risk, is assumed in comparison to the amount of return expected from investments.
The lower the ratio of standard deviation to mean return, the better risk-return trade-off.

Chapter-6
Correlation Analysis
A statistical technique that is used to analyze the strength and direction of the relationship between
two quantitative variables is called correlation analysis.
The measure of correlation is called the correlation coefficient.
The degree of relationship is expressed by a coefficient that ranges from -1 to +1 (-1 ≤ r ≤ +1).
The coefficient of correlation is a number that indicates the strength and direction of statistical
relationship between two variables.
 The Strength of relationship is determined by the closeness of the points to a straight line
when a pair of values of two variables are plotted on a graph. A straight-line is used as the
frame of reference for evaluating the relationship.
 The direction is determined by whether one variable generally increases or decreases when
the other variable increases.
Correlation is a bivariate analysis that measures the strength of association between two variables
and the direction of the relationship. In terms of the strength of relationship, the value of the
correlation coefficient varies between +1 and -1. A value of ± 1 indicates a perfect degree of
association between the two variables. As the correlation coefficient value goes towards 0, the
relationship between the two variables will be weaker. The direction of the relationship is
indicated by the sign of the coefficient; a + sign indicates a positive relationship and a – sign
indicates a negative relationship.
Figure shows how strength of association between two variables is represented by coefficient of
correlation

 If r is close to 1, we say that the variables are positively correlated. This means there is
likely a strong linear relationship between the two variables, with a positive slope.
 If r is close to -1, we say that the variables are negatively correlated. This means there is
likely a strong linear relationship between the two variables, with a negative slope.
 If r is close to 0, we say that the variables are not correlated. This means that there is likely
no linear relationship between the two variables, however, the variables may still be related
in some other way.
Typical Examples of Correlation
There are various methods of correlation analysis

(i) Scatter diagram
(ii) Karl Pearson’s coefficient of correlation
(iii)Spearman’s Rank correlation coefficient
(iv) Method of least squares
Karl Pearson’s Coefficient of Correlation method
Karl Pearson’s correlation coefficient measures quantitatively the extent to which two variables x
and y are correlated.
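It is the most popular mathematical method of calculating correlation;
the arithmetic mean and standard deviation are the basis for its calculation.
$$r = \frac{\text{Covariance}(x, y)}{\sqrt{\text{Var}\,x}\,\sqrt{\text{Var}\,y}} = \frac{\text{Cov}(x, y)}{\sigma_x \sigma_y}$$
where
$$\text{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$$
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}, \quad \text{the standard deviation of the sample data on variable } x$$
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}, \quad \text{the standard deviation of the sample data on variable } y$$
Hence,
$$r = \frac{\frac{1}{n}\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\frac{\sum (x - \bar{x})^2}{n}}\,\sqrt{\frac{\sum (y - \bar{y})^2}{n}}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$
Karl Pearson’s Coefficient of Correlation (assumed mean)

Ungrouped data:
$$r = \frac{n\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\,\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}}$$
Grouped data:
$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{n\sum f d_x^2 - \left(\sum f d_x\right)^2}\,\sqrt{n\sum f d_y^2 - \left(\sum f d_y\right)^2}}$$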
Problem:
The following table gives the distribution of items of production, and of the defective
items among them, according to size groups. Find the correlation coefficient between size and
defect in quality.
Size group               15-16   16-17   17-18   18-19   19-20   20-21
No. of items               200     270     340     360     400     300
No. of defective items     150     162     170     180     180     114

Solution:
Let the group size be denoted by variable x and the percent of defective items by variable y. Calculations
for Karl Pearson’s correlation coefficient are shown below (A = 17.5, h = 1):
Size group   Mid value, m   dx = (m - 17.5)/h   dx²   Percent of defective items, y   dy = y - 50   dy²   dx·dy
15-16        15.5                -2              4              75                        25        625    -50
16-17        16.5                -1              1              60                        10        100    -10
17-18        17.5                 0              0              50                         0          0      0
18-19        18.5                 1              1              50                         0          0      0
19-20        19.5                 2              4              45                        -5         25    -10
20-21        20.5                 3              9              38                       -12        144    -36
Total                             3             19                                        18        894   -106
Now,
$$r = \frac{n\sum d_x d_y - \left(\sum d_x\right)\left(\sum d_y\right)}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\,\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}} = \frac{6 \times (-106) - 3 \times 18}{\sqrt{6 \times 19 - (3)^2}\,\sqrt{6 \times 894 - (18)^2}} = -0.949$$
Since the value of r is negative and close to -1, the statistical association between x (size
group) and y (percent of defective items) is strong and negative. We conclude that when the size
of the group increases, the percent of defective items decreases, and vice versa.
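The product-moment formula can be checked in Python directly from the raw figures. The function name `pearson_r` is an assumption made here for illustration; because r is unaffected by a change of origin or scale, the sketch works with the mid-values and percentages themselves rather than with dx and dy.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


size_mid = [15.5, 16.5, 17.5, 18.5, 19.5, 20.5]   # mid-values of the size groups
items = [200, 270, 340, 360, 400, 300]
defective = [150, 162, 170, 180, 180, 114]

# Percent of defective items in each size group.
percent_defective = [100 * d / t for d, t in zip(defective, items)]

r = pearson_r(size_mid, percent_defective)
print(percent_defective, round(r, 3))
```

The sign of r gives the direction of the relationship and its magnitude gives the strength.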
Problem: The following table gives the frequency distribution, by age, of the marks obtained by 67 students
in an intelligence test. Measure the degree of relationship between age and marks.
Test marks   Age in years               Total
             18    19    20    21
200-250       4     4     2     1        11
250-300       3     5     4     2        14
300-350       2     6     8     5        21
350-400       1     4     6    10        21
Total        10    19    20    18        67

Solution
Let the age of students and the marks obtained by them be represented by variables x and y respectively.
Calculations for the correlation coefficient for this bivariate data are shown below, with
dx = x - 19 and dy = (m - A)/h = (m - 275)/50, e.g. (225 - 275)/50 = -1.
Each cell shows the frequency f, with f·dx·dy in parentheses.
Test marks, y   Mid value, m   dy    Age 18     Age 19    Age 20    Age 21    Total, f   f·dy   f·dy²   f·dx·dy
                                     (dx=-1)    (dx=0)    (dx=1)    (dx=2)
200-250         225            -1    4 (4)      4 (0)     2 (-2)    1 (-2)      11       -11     11       0
250-300         275             0    3 (0)      5 (0)     4 (0)     2 (0)       14         0      0       0
300-350         325             1    2 (-2)     6 (0)     8 (8)     5 (10)      21        21     21      16
350-400         375             2    1 (-2)     4 (0)     6 (12)    10 (40)     21        42     84      50
Total, f                             10         19        20        18        n = 67      52    116      66
f·dx                                 -10        0         20        36        Σf·dx = 46
f·dx²                                10         0         20        72        Σf·dx² = 102
f·dx·dy                              0          0         18        48        Σf·dx·dy = 66
Substituting the values in the following equation we get,
$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{n\sum f d_x^2 - \left(\sum f d_x\right)^2}\,\sqrt{n\sum f d_y^2 - \left(\sum f d_y\right)^2}} = \frac{67 \times 66 - 46 \times 52}{\sqrt{67 \times 102 - (46)^2}\,\sqrt{67 \times 116 - (52)^2}} = 0.415$$
Interpretation: Since the value of r is positive, the age of the students and the marks obtained
in the intelligence test are positively correlated to the extent of 0.415. Hence, we conclude that as
the age of the student increases, the score in the intelligence test also tends to increase.
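For a bivariate frequency table, every cell contributes its frequency to each sum. The Python sketch below works with the raw ages and class mid-values instead of dx and dy, which leaves r unchanged; the variable names are chosen here for illustration.

```python
from math import sqrt

ages = [18, 19, 20, 21]                 # x values (columns)
mark_mids = [225, 275, 325, 375]        # mid-values of the mark classes (rows)
freq = [[4, 4, 2, 1],                   # freq[i][j]: students with marks in
        [3, 5, 4, 2],                   # row class i and age ages[j]
        [2, 6, 8, 5],
        [1, 4, 6, 10]]

n = sum(sum(row) for row in freq)
sum_x = sum(f * x for row in freq for f, x in zip(row, ages))
sum_y = sum(f * y for row, y in zip(freq, mark_mids) for f in row)
sum_xx = sum(f * x * x for row in freq for f, x in zip(row, ages))
sum_yy = sum(f * y * y for row, y in zip(freq, mark_mids) for f in row)
sum_xy = sum(f * x * y
             for row, y in zip(freq, mark_mids)
             for f, x in zip(row, ages))

r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2))
print(n, round(r, 3))
```

Working with step deviations, as in the hand calculation, only keeps the numbers small; it does not change the value of r.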

Problem-2
Calculate the coefficient of correlation from the following bivariate frequency distribution:
Sales revenue     Advertising expense (Tk. in '000)     Total
(Tk. in lakh)     5-10    10-15    15-20    20-25
75-125              4       1        -        -            5
125-175             7       6        2        1           16
175-225             1       3        4        2           10
225-275             1       1        3        4            9
Total              13      11        9        7           40
Solution: Let advertising expenditure and sales revenue be represented by variables x and y
respectively. The calculations for the correlation coefficient are shown below:
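Advertising expenditure, x: class mid-values m = 7.5, 12.5, 17.5, 22.5, with dx = (m - 12.5)/5.
Sales revenue, y: class mid-values m = 100, 150, 200, 250, with dy = (m - 200)/50.
Each cell shows the frequency f, with f·dx·dy in parentheses.
Revenue, y   Mid value, m   dy    5-10       10-15     15-20     20-25     Total, f   f·dy   f·dy²   f·dx·dy
                                  (dx=-1)    (dx=0)    (dx=1)    (dx=2)
75-125       100            -2    4 (8)      1 (0)     -         -            5       -10     20       8
125-175      150            -1    7 (7)      6 (0)     2 (-2)    1 (-2)      16       -16     16       3
175-225      200             0    1 (0)      3 (0)     4 (0)     2 (0)       10         0      0       0
225-275      250             1    1 (-1)     1 (0)     3 (3)     4 (8)        9         9      9      10
Total, f                          13         11        9         7         n = 40     -17     45      21
f·dx                              -13        0         9         14        Σf·dx = 10
f·dx²                             13         0         9         28        Σf·dx² = 50
f·dx·dy                           14         0         1         6         Σf·dx·dy = 21
Here dy = 0 for the class 175-225, so its f·dy entry is 0 and Σf·dy = -10 - 16 + 0 + 9 = -17.

Substituting the values in the following equation we get,
$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{n\sum f d_x^2 - \left(\sum f d_x\right)^2}\,\sqrt{n\sum f d_y^2 - \left(\sum f d_y\right)^2}} = \frac{40 \times 21 - 10 \times (-17)}{\sqrt{40 \times 50 - (10)^2}\,\sqrt{40 \times 45 - (-17)^2}} = \frac{1010}{\sqrt{1900}\,\sqrt{1511}} = 0.596$$
Interpretation: Since the value of r is positive, advertising expenditure and sales
revenue are positively correlated to the extent of 0.596. Hence, we conclude that as the expenditure
on advertising increases, the sales revenue also tends to increase.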
Spearman’s Rank Correlation Coefficient
This method is applied to measure the association between two variables when only ordinal (or
rank) data are available. It is used in situations where certain qualitative factors such as
judgment, brands, TV programs, color or taste cannot be measured quantitatively but can be ranked.
Spearman’s rank correlation coefficient is defined as

$$R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Where,
R = rank correlation coefficient
d = R1 - R2, the difference between a pair of ranks
R1 = rank of an observation with respect to the first variable
R2 = rank of an observation with respect to the second variable
n = number of paired observations
The number 6 is placed in the formula as a scaling device; it ensures that the possible range of R
is from -1 to 1.
While using this method we may come across three types of cases:

1. When ranks are given.
2. When ranks are not given.
3. When ranks are equal (tied).
When ranks are equal:

Spearman’s rank correlation coefficient is

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$$

where m_i (i = 1, 2, 3, ...) is the number of times the i-th tied value is repeated. Every repeated
value in either variable contributes one such term.

Rules for solution
1. You may rank the observations by taking either the highest value or the lowest value as rank 1.
2. If two observations are ranked equal, the average rank is assigned. For example, if two
observations are ranked equal at third place, the average rank (3+4)/2 = 3.5 is assigned
to both. Similarly, if three observations are ranked equal at third place,
the average rank (3+4+5)/3 = 4 is assigned to all three.
3. For interpretation, the following information may be helpful.
Problem:
A financial analyst wanted to find out whether inventory turnover influences a company’s earnings
per share (in per cent). A random sample of 7 companies listed on a stock exchange was selected
and the following data were recorded for each.

Company    Inventory turnover     Earnings per share
           (number of times)      (per cent)
A          4                      11
B          5                       9
C          7                      13
D          8                       7
E          6                      13
F          3                       8
G          5                       8
Find the strength of association between inventory turnover and earnings per share. Interpret the
finding.
Solution:
Let us start ranking from the lowest value for both variables. Since there are tied values, the
ranks they would occupy are averaged and the average is assigned to each tied observation, as shown below.

Inventory       Rank, R1    Earnings per    Rank, R2    Difference,     d²
turnover, x                 share, y                    d = R1 - R2
4               2           11              5           -3.0            9.00
5               3.5          9              4           -0.5            0.25
7               6           13              6.5         -0.5            0.25
8               7            7              1            6.0           36.00
6               5           13              6.5         -1.5            2.25
3               1            8              2.5         -1.5            2.25
5               3.5          8              2.5          1.0            1.00
                                                        Σd² = 51

Note that the value 5 of variable x is repeated twice, so m1 = 2, and the values 8 and 13 of variable
y are each repeated twice, so m2 = 2 and m3 = 2.
Applying the formula,

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3)\right]}{n(n^2 - 1)}$$

$$R = 1 - \frac{6\left[51 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)\right]}{7(7^2 - 1)}$$

$$R = 1 - \frac{6\,[51 + 0.5 + 0.5 + 0.5]}{336} = 1 - 0.9375 = 0.0625$$
Interpretation: As R is positive but value is less than 0.20, so there is a very weak positive
association between two variables x and y, i.e., inventory turnover and earnings per share.
Problem
Obtain the rank correlation coefficient between the variables x and y from the following pairs of
observed values.
x 50 55 65 50 55 60 50 65 70 75
y 110 110 115 125 140 115 130 120 115 160
Solution:
Let us start ranking from the lowest value for both variables. Since certain observations in both
data sets are repeated, the tied observations are given the average of the ranks they occupy, as shown below:

Variable, x    Rank, R1    Variable, y    Rank, R2    Difference,     d²
                                                      d = R1 - R2
50             2           110            1.5          0.5            0.25
55             4.5         110            1.5          3.0            9.00
65             7.5         115            4            3.5           12.25
50             2           125            7           -5.0           25.00
55             4.5         140            9           -4.5           20.25
60             6           115            4            2.0            4.00
50             2           130            8           -6.0           36.00
65             7.5         120            6            1.5            2.25
70             9           115            4            5.0           25.00
75             10          160            10           0.0            0.00
                                                      Σd² = 134.00

Average ranks for x: 50: (1+2+3)/3 = 2; 55: (4+5)/2 = 4.5; 65: (7+8)/2 = 7.5
Average ranks for y: 110: (1+2)/2 = 1.5; 115: (3+4+5)/3 = 4

In the data set, for variable x, 50 is repeated thrice (m1 = 3), 55 is repeated twice (m2 = 2) and 65 is
repeated twice (m3 = 2). For variable y, 110 is repeated twice (m4 = 2) and 115 is repeated thrice (m5 = 3).
Applying the formula,

$$R = 1 - \frac{6\left[\sum d^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \frac{1}{12}(m_3^3 - m_3) + \frac{1}{12}(m_4^3 - m_4) + \frac{1}{12}(m_5^3 - m_5)\right]}{n(n^2 - 1)}$$

$$R = 1 - \frac{6\left[134 + \frac{1}{12}(3^3 - 3) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{10(10^2 - 1)}$$

$$R = 1 - \frac{6\,[134 + 2 + 0.5 + 0.5 + 0.5 + 2]}{990} = 1 - \frac{6 \times 139.5}{990} = 0.155$$
Interpretation: As R is positive and value is less than 0.5, so there is a weak positive relationship
between two variables x and y.
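The two rank-correlation solutions above can be reproduced with a small Python sketch of the tie-corrected formula. The function names `average_ranks` and `spearman_tie_corrected` are assumptions introduced for this illustration; ranking starts from the lowest value, as in the solutions.

```python
from collections import Counter


def average_ranks(values):
    """Rank from the lowest value (rank 1); tied values share the average rank."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    # A value occupying positions p .. p+m-1 receives the average rank p + (m-1)/2.
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman_tie_corrected(x, y):
    """R = 1 - 6[sum d^2 + sum (m^3 - m)/12] / (n(n^2 - 1))."""
    n = len(x)
    r1, r2 = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    # One correction term for every repeated value in either variable.
    correction = sum((m ** 3 - m) / 12
                     for data in (x, y)
                     for m in Counter(data).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Inventory turnover and earnings per share (7 companies)
R_1 = spearman_tie_corrected([4, 5, 7, 8, 6, 3, 5], [11, 9, 13, 7, 13, 8, 8])
# Second problem (10 pairs of x and y)
R_2 = spearman_tie_corrected([50, 55, 65, 50, 55, 60, 50, 65, 70, 75],
                             [110, 110, 115, 125, 140, 115, 130, 120, 115, 160])
print(round(R_1, 4), round(R_2, 3))  # 0.0625 0.155
```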
Problem
Use the method of rank correlation to determine the relationship between preference share prices and
debenture prices.
Answer: R = 0.125.
Hence, there is a very low degree of positive correlation, probably no correlation, between
preference share prices and debenture prices.

Chapter-7
Regression Analysis
The statistical technique that expresses the relationship between two or more variables in the form
of an equation, so that the value of one variable can be estimated from a given value of another, is
called regression analysis.
The variable whose value is estimated using the algebraic equation is called the dependent variable, and
the variable whose value is used to make the estimate is called the independent variable.
The linear algebraic equation used for expressing a dependent variable in terms of an independent
variable is called the linear regression equation.
Regression plays a significant role in many human activities, as it is a powerful and flexible tool for
estimating unknown past, present or future values on the basis of known past or present values. For instance,
a business’s future profit can be estimated on the basis of past records.
The basic difference between correlation and regression analysis is as follows.

Correlation analysis tells us whether an association exists between two variables x and y and how
strong it is. Regression analysis, on the other hand, predicts the value of the dependent variable
from a known value of the independent variable, assuming an average mathematical relationship
between the variables.
Correlation measures the strength of the linear relationship between two variables. Regression,
by contrast, fits the best line and estimates one variable on the basis of the other.
Simple Regression Model

The two variables x and y which are correlated can be expressed in terms of each other in the form
of straight-line equations called regression equations.
The regression equation of y on x, y = a + bx, is used for estimating the value of y for given values
of x.
The regression equation of x on y, x = c + dy, is used for estimating the value of x for given values of
y.

Regression coefficient
Regression coefficients are estimates of the unknown population parameters and describe the
relationship between a predictor variable and the response. In linear regression, coefficients are
the values that multiply the predictor values.
The Regression Coefficient is the constant ‘b’ in the regression equation that tells about the
change in the value of dependent variable corresponding to the unit change in the independent
variable.
One of the most popular methods for determining the parameters of a fitted regression equation is the
least squares method.
Let $\hat{y} = a + bx$ be the least squares line of y on x, where $\hat{y}$ is the estimated average value of the
dependent variable y. The best-fitting line is the one that minimizes the sum of squares of the
deviations of the observed values of y from the predicted values.
This gives the normal equations

$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$

from which a and b are obtained. Equivalently,

$$b = \frac{S_{xy}}{S_{xx}}, \qquad S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\sum x \sum y}{n}, \qquad S_{xx} = \sum (x - \bar{x})^2$$

and $a = \bar{y} - b\bar{x}$ follows from the first normal equation.

Types of Relationship
Problem:
Use least squares regression line to estimate the increase in sales revenue expected from the
increase of 7.5 percent in advertising expenditure.
Firm Annual increase in Annual Increase in Sales

advertising Expenditure, Revenue, Percentage
Percentage
A 1 1
B 3 2
C 4 2
D 6 4
E 8 6
F 9 8
G 11 8
H 14 9
Solution:
Assume sales revenue (y) is dependent on advertising expenditure (x), so y is the variable to be
estimated. The regression line is obtained from the following normal equations:

$$\sum y = na + b\sum x \qquad \text{(i)}$$
$$\sum xy = a\sum x + b\sum x^2 \qquad \text{(ii)}$$

Calculation for Normal Equations:

Sales Revenue, y    Advertising Expenditure, x    x²     xy
1                    1                              1      1
2                    3                              9      6
2                    4                             16      8
4                    6                             36     24
6                    8                             64     48
8                    9                             81     72
8                   11                            121     88
9                   14                            196    126
Σy = 40             Σx = 56                  Σx² = 524   Σxy = 373
From equation (i):
40 = 8a + 56b ......(iii)
From equation (ii):
373 = 56a + 524b ......(iv)
Dividing (iv) by 7 and subtracting (iii):
53.285 - 40 = (8a + 74.857b) - (8a + 56b)
13.285 = 18.857b
b = 0.704
Substituting b in (iii):
40 = 8a + 56 × 0.704
8a = 0.576
a = 0.072
Substituting these values in the regression equation $\hat{y} = a + bx$:
$$\hat{y} = 0.072 + 0.704x$$
Since the data are expressed in percentages, a 7.5 percent increase in advertising expenditure
corresponds to x = 7.5, and the estimated increase in sales revenue is
$$\hat{y} = 0.072 + 0.704(7.5) = 5.352$$
that is, about 5.35 percent.
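As a cross-check, the normal equations for this problem can be solved directly in Python. This is a minimal sketch: the helper name `least_squares` is an assumption, and x = 7.5 is used for the forecast on the assumption that the data are in percentage units, like the table values.

```python
x = [1, 3, 4, 6, 8, 9, 11, 14]  # annual increase in advertising expenditure (%)
y = [1, 2, 2, 4, 6, 8, 8, 9]    # annual increase in sales revenue (%)


def least_squares(x, y):
    """Solve the two normal equations for the intercept a and slope b."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    sum_xx = sum(u * u for u in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


a, b = least_squares(x, y)
y_hat = a + b * 7.5  # estimated increase in sales revenue for x = 7.5
print(round(a, 3), round(b, 3), round(y_hat, 2))  # 0.068 0.705 5.35
```

The intercept differs slightly from the hand calculation (0.068 against 0.072) because the hand calculation rounds b to 0.704 before solving for a; the forecast is the same to two decimal places.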

Problem:
The following data give the ages and blood pressures of 10 women.

Age               56    42    36    47    49    42    60    72    63    55
Blood pressure   147   125   118   128   145   140   155   160   149   150

a) Find the correlation coefficient between age and blood pressure.
b) Determine the least squares regression equation of blood pressure on age.
c) Estimate the blood pressure of a woman whose age is 45 years.
Solution:
Assume blood pressure (y) as the dependent variable and age (x) as the independent variable. The calculations
for the regression equation of blood pressure on age are shown in the table below:

Age, x    dx = x - 49    dx²     Blood pressure, y    dy = y - 145    dy²     dx·dy    xy
56          7             49     147                     2              4       14      8232
42         -7             49     125                   -20            400      140      5250
36        -13            169     118                   -27            729      351      4248
47         -2              4     128                   -17            289       34      6016
49          0              0     145                     0              0        0      7105
42         -7             49     140                    -5             25       35      5880
60         11            121     155                    10            100      110      9300
72         23            529     160                    15            225      345     11520
63         14            196     149                     4             16       56      9387
55          6             36     150                     5             25       30      8250
522        32           1202     1417                  -33           1813     1115     75188
a) The coefficient of correlation between age and blood pressure is given by

$$r = \frac{n\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{n\sum d_x^2 - (\sum d_x)^2}\,\sqrt{n\sum d_y^2 - (\sum d_y)^2}}$$

$$r = \frac{10 \times 1115 - 32 \times (-33)}{\sqrt{10(1202) - (32)^2}\,\sqrt{10 \times 1813 - (-33)^2}} = \frac{12206}{13689} = 0.892$$

We may conclude that there is a high degree of positive correlation between age and blood
pressure.

b) The regression equation of blood pressure on age is obtained from the normal equations, using Σx² = 28348:

$$\sum y = na + b\sum x \;\Rightarrow\; 1417 = 10a + 522b \qquad \text{(i)}$$
$$\sum xy = a\sum x + b\sum x^2 \;\Rightarrow\; 75188 = 522a + 28348b \qquad \text{(ii)}$$

Now, (i) × 52.2 - (ii):
73967.4 - 75188 = (522a + 27248.4b) - (522a + 28348b)
-1220.6 = -1099.6b
b = 1220.6 / 1099.6 = 1.11
Substituting b in (i):
1417 = 10a + 522 × 1.11
837.58 = 10a
a = 83.758
So, the regression equation is
$$\hat{y} = 83.758 + 1.11x$$
c) For a woman whose age is 45, the estimated average blood pressure will be
$$\hat{y} = 83.758 + 1.11(45) = 133.708$$

Hence, the likely blood pressure of a woman aged 45 years is about 134.
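The three parts of this problem can be verified from the raw data with the following Python sketch. It works with the original x and y values rather than the deviations dx and dy, which gives the same r because the correlation coefficient is unaffected by a change of origin. The variable names are chosen here for illustration.

```python
from math import sqrt

age = [56, 42, 36, 47, 49, 42, 60, 72, 63, 55]
bp = [147, 125, 118, 128, 145, 140, 155, 160, 149, 150]

n = len(age)
sum_x, sum_y = sum(age), sum(bp)
sum_xx = sum(x * x for x in age)
sum_yy = sum(y * y for y in bp)
sum_xy = sum(x * y for x, y in zip(age, bp))

# a) correlation coefficient
r = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2))
# b) regression of blood pressure on age
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = (sum_y - b * sum_x) / n
# c) estimate at age 45
bp_at_45 = a + b * 45
print(round(r, 3), round(b, 2), round(a, 2), round(bp_at_45, 1))
# 0.892 1.11 83.76 133.7
```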

Chapter-8
Index Number
An index number is a statistical device for comparing the general level of magnitude (scale, size)
of a group of related variables in two or more situations.
For example, the price level of 2018 may be compared with what it was in 2000.
An index number can be defined as a relative measure describing the average change in any
quantity over time. In other words, an index number measures the change in prices,
quantities or values over a period of time in relation to their value at some fixed point in time, called
the base period. These numbers are stated as a percentage of a base figure.
Mathematically,
$$\text{Index Number} = \frac{\text{Current Period Value}}{\text{Base Period Value}} \times 100$$
CHARACTERISTICS OF INDEX NUMBERS

Following are some of the important characteristics of index numbers :
 Index numbers are specialized averages.
 Index numbers are expressed in terms of percentages to show the extent of relative
change
 Index numbers measure changes which are not directly measurable.
 Index numbers measure relative changes. They measure the relative change in the value
of a variable or a group of related variables over a period of time or between places.
 Index numbers are for comparison.
Types of Index Number

 Price Index: Measures changes in price over a specified period of time. It is basically the
ratio of the price of a certain number of commodities in the current year to their price in the base
year.
 Quantity Index: Measures changes in the volume of goods produced
or consumed. It is useful for studying the output of an economy.
 Value Index: Compares the total value in a certain period with the total
value in the base period. Here total value is equal to the price of a commodity multiplied by the
quantity consumed. It is used to compare changes in the monetary value of imports,
exports, production or consumption of commodities.

Method of Construction of Index Number
Index numbers are constructed by un-weighted or weighted methods:
 Un-weighted: simple aggregative method; simple average of price relative method.
 Weighted: weighted aggregate method; weighted average of price relative method.
Simple Aggregate Method:

This is the simplest method of constructing index numbers. The prices of the different
commodities in the current year are added and the sum is divided by the sum of the prices of those
commodities in the base year.

$$\text{Simple aggregate Price Index} = P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

Where,
ΣP1 = total of prices for the current year
ΣP0 = total of prices for the base year.
Example
Calculate index number from the following data by simple aggregate method taking prices of
2000 as base.
Commodity Price per Unit (in Tk)
2000 2004
A 80 95
B 50 60
C 90 100
D 30 45
Solution

Commodity    2000 (P0)    2004 (P1)
A            80           95
B            50           60
C            90           100
D            30           45
Total        ΣP0 = 250    ΣP1 = 300

$$\text{Simple aggregate Price Index} = P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{300}{250} \times 100 = 120$$

It means the prices in 2004 were 120% of the base-year prices, that is, 20% higher than in 2000.
Simple average of price relative method

In this method, the price relative of each commodity is calculated first, and the average of these
relatives is then obtained using the arithmetic or the geometric mean.
When the arithmetic mean is used,

$$\text{Simple average of Price relatives} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{n}$$

Where,
P1 = price of a commodity in the current year
P0 = price of the same commodity in the base year
n = number of items or commodities
When the geometric mean is used for the average of price relatives,

$$\text{Simple average of Price relatives} = \text{Antilog}\left[\frac{\sum \log\left(\frac{P_1}{P_0} \times 100\right)}{n}\right]$$
Example 2
From the following data, construct an index for 2018 taking 2017 as base by the average of price
relatives method using (a) the arithmetic mean and (b) the geometric mean.

Commodity    2017    2018
A            50      70
B            40      60
C            80      100
D            20      30

Solution: Calculation of price relative index

Commodity    Price in 2017 (P0)    Price in 2018 (P1)    Price relative, P1/P0 × 100    log(P1/P0 × 100)
A            50                    70                    140                            2.1461
B            40                    60                    150                            2.1761
C            80                    100                   125                            2.0969
D            20                    30                    150                            2.1761
Total                                                    565                            8.5952

a) Calculation of price relative index using the arithmetic mean

$$\text{Simple average of Price relatives} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{n} = \frac{565}{4} = 141.25$$

b) Calculation of price relative index using the geometric mean

$$\text{Simple average of Price relatives} = \text{Antilog}\left[\frac{\sum \log\left(\frac{P_1}{P_0} \times 100\right)}{n}\right] = \text{Antilog}\left(\frac{8.5952}{4}\right) = 140.86$$
Problem
From the data given below, construct the index of price relatives for the year 2002 taking 2001 as
base year using a) arithmetic mean and (b) geometric mean.
Expense on Food Rent Clothing Education Misc
Price in 2001 1800 1000 700 400 700
Price in 2002 2000 1200 900 500 1000
Answer: a) 125.51; b) 125.08
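Both price-relative calculations above (Example 2 and the expense problem) can be checked with a few lines of Python. This is a sketch under one stated assumption: the geometric mean is computed as the n-th root of the product of the relatives, which is mathematically the same as the antilog of the average logarithm used in the hand calculation. The function name `price_relative_indices` is hypothetical.

```python
from math import prod


def price_relative_indices(base, current):
    """Return (arithmetic-mean index, geometric-mean index) of price relatives."""
    relatives = [p1 / p0 * 100 for p0, p1 in zip(base, current)]
    n = len(relatives)
    return sum(relatives) / n, prod(relatives) ** (1 / n)


# Example 2: commodities A-D, 2017 (base) and 2018 prices
am_1, gm_1 = price_relative_indices([50, 40, 80, 20], [70, 60, 100, 30])
# Expense problem: food, rent, clothing, education, misc., 2001 (base) and 2002
am_2, gm_2 = price_relative_indices([1800, 1000, 700, 400, 700],
                                    [2000, 1200, 900, 500, 1000])
print(round(am_1, 2), round(gm_1, 2))  # 141.25 140.87
print(round(am_2, 2), round(gm_2, 2))  # 125.51 125.08
```

The geometric mean for Example 2 prints as 140.87 rather than 140.86 because the hand calculation uses four-figure logarithms.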
Problem
In 2016, for working class people, wheat was selling at an average price of Tk.160 per 10kg, cloth
at Tk.40 per meter, house rent Tk.10,000 per house, and other items at Tk.100 per unit. By 2017
the cost of wheat rose by Tk.40 per 10 kg, house rent by Tk.1500 per house, and other items
doubled in price. The working class cost of living index for the year 2017 (with 2016 as base) was
160. By how much did the cloth price rise during the period 2016-2017?

Solution:
Let the price of cloth in 2017 be Tk. x per metre.

Commodity     Price in 2016, P0    Price in 2017, P1    Index (P1/P0 × 100)
Wheat         160                  200                  200/160 × 100 = 125
Cloth         40                   x                    x/40 × 100 = 2.5x
House rent    10000                11500                11500/10000 × 100 = 115
Misc.         100                  200                  200/100 × 100 = 200
Total                                                   440 + 2.5x

The index for 2017 is given as 160, which is the average of the four price relatives.
Therefore, the sum of the index numbers of the four commodities is 160 × 4 = 640.
440 + 2.5x = 640
2.5x = 200
x = 80
Hence, the rise in the price of cloth was Tk. (80 - 40) = Tk. 40 per metre.
Weighted aggregate Index Numbers

In order to attribute appropriate importance to each of the items used in an aggregate index number,
some reasonable weights must be used.
There are various methods of assigning weights and consequently of constructing index numbers. Here
we discuss only three of them.
1. Laspeyres’ method
2. Paasche’s method
3. Fisher’s ideal method.
Laspeyres’ Method:
The Laspeyres price index is a weighted aggregate price index in which the weights are the
quantities in the base period. It is given by

$$\text{Laspeyres' Price Index} = P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Paasche’s method:
Here the weights are the quantities in the current year.

$$\text{Paasche's Price Index} = P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Fisher’s ideal Method
Here the geometric mean of the Laspeyres and Paasche indices is used.

$$\text{Fisher's ideal index number} = P_{01}^{F} = \sqrt{L \times P} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

It is known as the ideal index number because

a) It is based on the geometric mean.
b) It is based on the quantities of the current year as well as the base year.
c) It conforms to certain tests of consistency.
d) It is free from bias.
Problem:
For the following data, calculate the price index number of 2001 with 2000 as the base year, using
a) Laspeyres’ method
b) Paasche’s method
c) Fisher’s ideal method.

Commodity    2000                 2001
             Price    Quantity    Price    Quantity
A            2        8           4        5
B            5        12          6        10
C            4        15          5        12
D            2        18          4        20

Solution

Commodity    Price, p0    Quantity, q0    Price, p1    Quantity, q1    p0q0    p0q1    p1q0    p1q1
A            2            8               4            5               16      10      32      20
B            5            12              6            10              60      50      72      60
C            4            15              5            12              60      48      75      60
D            2            18              4            20              36      40      72      80
Total                                                                  172     148     251     220

$$\text{Laspeyres' Price Index} = P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{251}{172} \times 100 = 145.93$$

$$\text{Paasche's Price Index} = P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{220}{148} \times 100 = 148.65$$

$$\text{Fisher's ideal index number} = P_{01}^{F} = \sqrt{L \times P} = \sqrt{\frac{251}{172} \times \frac{220}{148}} \times 100 = 147.28$$

Interpretation:
The result can be interpreted as follows:
If 100 Tk were used in the base year to buy the given commodities, we have to use Tk.145.93 in
the current year to buy the same amount of the commodities as per the Laspeyre’s method. Other
values give similar meaning.
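The three weighted indices for this problem can be recomputed with a short Python sketch. The tuple layout `(p0, q0, p1, q1)` is an arrangement chosen here for illustration.

```python
from math import sqrt

# (p0, q0, p1, q1) for commodities A, B, C, D: base year 2000, current year 2001
data = [(2, 8, 4, 5), (5, 12, 6, 10), (4, 15, 5, 12), (2, 18, 4, 20)]

sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in data)
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in data)
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in data)
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in data)

laspeyres = sum_p1q0 / sum_p0q0 * 100  # base-year quantities as weights
paasche = sum_p1q1 / sum_p0q1 * 100    # current-year quantities as weights
fisher = sqrt(laspeyres * paasche)     # geometric mean of the two indices

print(round(laspeyres, 2), round(paasche, 2), round(fisher, 2))
# 145.93 148.65 147.28
```

Since Fisher's index is the geometric mean of the other two, it always lies between the Laspeyres and Paasche values, which is a quick sanity check on any hand calculation.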
Problem
The Arapaho Valley Pediatrics Clinic has been in business for 18 years. The office manager noticed that
prices of clinic materials and office supplies fluctuate over time. To get a handle on the price trends for
running the clinic, the office manager examined prices of six items the clinic uses as part of its operations.
Shown here are the items, their prices, and the quantities for the years 2005 and 2006. Use these data to
develop unweighted aggregate price indexes for 2006 with a base year of 2005. Compute the Laspeyres
price index for the year 2006 using 2005 as the base year. Compute the Paasche index number for 2006
using 2005 as the base year.

Item                           2005                  2006
                               Price    Quantity     Price    Quantity
Syringes (dozen)                6.70    150           6.95    135
Cotton swabs (box)              1.35     60           1.45     65
Patient record forms (pad)      5.10      8           6.25     12
Children’s Tylenol (bottle)     4.50     25           4.95     30
Computer paper (box)           11.95      6          13.20      8
Thermometers                    7.90      4           9.00      2
Totals                         37.50                 41.80

Chapter-9
Fundamental of Hypothesis Testing
A statistical hypothesis is a claim about a population parameter.

e.g., population mean:
The mean monthly cell phone bill in this city is μ = $42.
Hypothesis testing is a step-by-step methodology that allows you to make inferences about a
population parameter by analyzing differences between the results observed (the sample statistic)
and the results that can be expected if some underlying hypothesis is actually true.
Six-step procedure for hypothesis testing
Step-1 State the null hypothesis and the alternative hypothesis.
 The null hypothesis, denoted H0, states the claim or assertion to be tested.

Example: The average diameter of a manufactured bolt is 30 mm. It is written as
H0: μ = 30.
It is always about a population parameter, not about a sample statistic.
Begin with the assumption that the null hypothesis is true.
 Alternative hypothesis:
It is the opposite of the null hypothesis.
e.g., the average diameter of a manufactured bolt is not equal to 30 mm.
H1: μ ≠ 30.
It never contains the ‘=’ sign.
It is generally the hypothesis that the researcher is trying to prove.
Step-2 State the level of significance, α (or the level of confidence, 1 - α)

The level of significance is the probability of rejecting the null hypothesis when it is really
true. It is the decision maker’s risk of rejecting the null hypothesis when it is true.
The most common alpha value is 0.05 or 5%. Other popular choices are 0.01 (1%) and 0.1 (10%).
[Figure: sampling distribution showing the acceptance region and the level of significance, α]

Step-3 Establish the critical or rejection region
 The acceptance region is a range of values of the sample statistic spread around the null
hypothesized population parameter. If the value of the sample statistic falls within the limits of
the acceptance region, the null hypothesis is accepted; otherwise it is rejected.
 The rejection region is the range of sample statistic values for which the null hypothesis
is rejected. For α = 0.05 in a two-tailed test, the critical Z values are ±1.96.
Step-4 Select a suitable test statistic and determine the appropriate technique
 Sample size is an important consideration in choosing an appropriate test statistic.
 For hypothesis testing about μ: if σ is known, use the Z test; if σ is unknown, use the t test.
Step-5 Collect data and compute the value of the test statistic.
Step-6 Make a test decision about the null hypothesis and interpret it.
 Decide, based on a comparison of the calculated value of the test statistic and the critical
value of the test, whether to reject the null hypothesis in favor of the alternative.
 Once we have found the p-value or rejection region, we make a statistical decision about
the null hypothesis (i.e., we reject the null or fail to reject the null). Following this
decision, we summarize our results into an overall conclusion for the test.
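The six steps can be sketched in Python for the bolt-diameter hypothesis H0: μ = 30 against H1: μ ≠ 30. The population standard deviation, sample size and sample mean below are hypothetical numbers invented purely for illustration; only the hypothesized mean of 30 mm and α = 0.05 come from the text. `NormalDist` is part of Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: H0: mu = 30, H1: mu != 30 (two-tailed test)
mu0 = 30.0
# Step 2: level of significance
alpha = 0.05
# Step 3: critical values of Z for a two-tailed test
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
# Step 4: sigma is assumed known, so the Z test is used
sigma = 0.8    # hypothetical population standard deviation (mm)
# Step 5: hypothetical sample data and the test statistic
n = 100        # hypothetical sample size
x_bar = 29.84  # hypothetical sample mean (mm)
z = (x_bar - mu0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
# Step 6: decision
reject_h0 = abs(z) > z_crit
print(round(z_crit, 2), round(z, 2), round(p_value, 4), reject_h0)
# 1.96 -2.0 0.0455 True
```

With these made-up figures the computed Z falls just outside ±1.96, so the null hypothesis would be rejected at the 5% level; the p-value below 0.05 leads to the same decision.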

Chapter-10
Basic Probability
Probability:
A probability is a numerical measure of the likelihood or chance of occurrence of an uncertain
event.
For example, when starting a new business there are three possible outcomes: profit,
loss or break-even. When tossing a coin, there are 2 possible outcomes.
Random Experiment
A random experiment is a process by which we observe something uncertain. After the experiment,
the result of the random experiment is known. An outcome is a result of a random
experiment.
A random experiment (also called an act or trial) is an activity that leads to the occurrence of one and
only one of several possible outcomes, which is not known until the activity is completed; that
is, the outcome is not perfectly predictable.
Examples: measuring the blood pressure of a group of individuals; tossing a coin and observing the
face that appears.
The process has following properties:

i) All possible outcomes can be specified in advance
ii) It can be repeated
iii) Actual outcome is not known in advance.
Sample Space
The set of all possible outcomes (events) for a random experiment is called sample space.
(i) No two or more of these outcomes occur simultaneously.
(ii) Exactly one of the outcomes must occur, whenever the experiment is performed.
It may be denoted by the capital letter S.
For example,
 Consider the experiment of tossing two coins. The four possible outcomes are HH, HT,
TH, TT. The sample space is S= {HH, HT, TH, TT}
 Random experiment: roll a die; sample space: S={1,2,3,4,5,6}.
 Random experiment: observe the number of iPhones sold by an Apple store in Boston in
2015; sample space: S={0,1,2,3,⋯}.
Event
An event is a set of outcomes from an experiment (a subset of the sample space).
Each experiment may result in one or more outcomes, which are called events.
For example, consider the experiment of tossing a coin. The outcome of this experiment is the
coin landing ‘heads’ or ‘tails’. These are the events connected with the experiment.
So when the coin lands tails, an event can be said to have occurred.
Event types

Simple event: A single possible outcome (or result) of an experiment is called a simple event. For
example, tossing a coin has the simple events H and T.
Mutually exclusive events: If two or more events cannot occur simultaneously in a single trial
of an experiment, then such events are called mutually exclusive. They are also called disjoint events.
For example, the numbers 2 and 3 cannot occur simultaneously on the roll of a die.
Collectively Exhaustive:
A list of events is said to be collectively exhaustive when it includes every possible outcome of
an experiment. That is, two or more events are said to be
collectively exhaustive if at least one of the events must occur. Two events A and B are known as
exhaustive events if the union of A and B gives the sample space.
If you are rolling a six-sided die, the set of events {1, 2, 3, 4, 5, 6} is collectively exhaustive. Any
roll must be represented by one of the set. Another example of an event that is both collectively
exhaustive and mutually exclusive is tossing a coin. The outcome must be either heads or tails, or
p (heads or tails) = 1, so the outcomes are collectively exhaustive. When heads occurs, tails can't
occur, or p (heads and tails) = 0, so the outcomes are also mutually exclusive.
Compound event: When two or more events occur in connection with each other, their
simultaneous occurrence is called a compound event. These events may be independent or
dependent.
For example, when we throw a die, an even number appearing is a compound
event, as it consists of more than one outcome: E = {2, 4, 6}.
Dependent and independent events: Two events are independent if the result of the second
event is not affected by the result of the first event. Two events, A and B, are independent if the
fact that A occurs does not affect the probability of B occurring.
Example: landing on heads after tossing a coin AND rolling a 5 on a single 6-sided die.
Two events are dependent if the result of the first event affects the outcome of the second event
so that the probability is changed. For example, when two marbles are drawn from a bag one after
another and the first marble is not replaced, the sample space for the second draw changes, and so
the events are dependent.
Equally likely event: Two or more events are said to be equally likely if each has an equal
chance to occur. In other words, equally likely events are events that have the same theoretical
probability (or likelihood) of occurring.
If we toss a coin, there are equal chances of getting a head or a tail. Hence, getting a head or a tail
by tossing a coin are equally likely events. If a die is rolled, then getting an odd number and
getting an even number are equally likely events, whereas getting an even number and getting a 1
are not equally likely events.
Complementary event: Two events are complementary when they are the only
two possible outcomes, so that exactly one of them must occur. For example, when flipping a coin
the result is either heads or tails. There are no other options, so these events are complementary.
The complement of A, denoted by Ā, A′ or A^c, consists of all outcomes in which the event A does
not occur.
Conceptual approaches to calculating the probability of an event
There are 3 ways to approach probability: classical probability, relative frequency probability,
and subjective probability.
Classical Approach
The classical approach describes probability in terms of the proportion of times that an event can be
theoretically expected to occur. This approach is based on the assumption that all possible outcomes
(finite in number) of an experiment are mutually exclusive and equally likely.
For example, most people know that if you toss a coin, it is 50/50 chance.
All outcomes are equally likely since neither head nor tail has a better chance of occurring.
Head can occur 50% of the time and tail can occur 50% of the time as well.
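Under the classical approach, a probability is the number of favourable outcomes divided by the number of equally likely outcomes. A minimal Python sketch of this counting, with a hypothetical helper named `classical_probability`, is:

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """Favourable outcomes / total outcomes, assuming equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


two_coins = list(product("HT", repeat=2))        # HH, HT, TH, TT
two_dice = list(product(range(1, 7), repeat=2))  # 36 ordered pairs

p_two_heads = classical_probability(two_coins, lambda o: o == ("H", "H"))
p_two_sixes = classical_probability(two_dice, lambda o: o == (6, 6))
print(p_two_heads, p_two_sixes)  # 1/4 1/36
```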
Relative Frequency Approach
The probability of an event A is the ratio of the number of times that A has occurred to the number
of trials, n, of an experiment.
This approach is based on the assumption that a random experiment can be repeated a large number
of times under identical conditions where trials are independent of each other. While conducting a
random experiment we may or may not observe the desired event. But as the experiment is
repeated many times, that event occurs in some proportion of the trials.
For example,
Experiment: Administering a Taste Test for a New Soup
Event B: A consumer likes the taste

As per this approach, the probability of occurrence of an event is given by the observed relative
frequency of an event in a very large number of trials. In other words, the probability of occurrence
of an event is the ratio of the number of times the event occurs to the total number of trials.
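The idea that the relative frequency settles down as the number of trials grows can be illustrated with a simulation. The sketch below simulates tosses of a fair coin rather than the taste test, because the true probability (0.5) is then known; the function name `relative_frequency` and the fixed seed are choices made for this illustration.

```python
import random


def relative_frequency(trials, seed=1):
    """Share of heads observed in `trials` simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for trials in (10, 1_000, 100_000):
    print(trials, relative_frequency(trials))
```

With only 10 trials the observed proportion can be far from 0.5, while with 100,000 trials it should be very close to it.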
Subjective approach
The subjective approach to calculating probability is based on degree of belief,
conviction and experience concerning the likelihood of occurrence of a random event.
This approach must be used when either sufficient data are not available or sources of information
giving different results are not known.
For example, you think you have a 50/50 chance of getting the job you applied for, because the
other applicant is also very qualified.
Subjective probability is a prediction that is based on an individual's personal judgment, not on
mathematical calculations. This approach gives no rational basis for people to agree on a right
answer.
Problem

Solution
a) P(machinist mildly supports) = (number of machinists in the “mildly support” class) / (total number of machinists polled) = 11/30
b) P(inspector undecided) = (number of inspectors in the “undecided” class) / (total number of inspectors polled) = 2/30
c) P(strongly or mildly supports) = (19 + 14)/60 = 33/60
d) Relative frequency.
Problem:
Classify the following probability estimates as classical, relative frequency, or subjective:
(a) The probability the Cubs will win the World Series this year is 0.175.
(b) The probability tuition will increase next year is 0.95.
(c) The probability that you will win the lottery is 0.00062.
(d) The probability a randomly selected flight will arrive on time is 0.875.
(e) The probability of tossing a coin twice and observing two heads is 0.25.
(f) The probability that your car will start on a very cold day is 0.97
(g) The probability of scoring on a penalty shot in ice hockey is 0.47.
(h) The probability that the current mayor will resign is 0.85.
(i) The probability of rolling two sixes with two dice is 1⁄36.
(j) The probability that a president elected in a year ending in zero will die in office is 7⁄10.
(k) The probability that you will go to Europe this year is 0.14
Solution:
a) Subjective
b) Relative
c) Classical
d) Relative frequency
e) Classical
f) Relative frequency
g) Relative frequency
h) Subjective
i) Classical.
j) Relative frequency
k) Subjective
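
The classical answers for (e) and (i) can be checked by listing the equally likely outcomes and counting the favourable ones. The sketch below is only an illustration; the helper name classical_probability is an assumption made for this example and is not part of the course material.

```python
from fractions import Fraction
from itertools import product


def classical_probability(sample_space, event):
    """Classical approach: favourable outcomes / total equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


# (e) Two tosses of a fair coin: 4 equally likely outcomes.
two_tosses = product("HT", repeat=2)
p_two_heads = classical_probability(two_tosses, lambda o: o == ("H", "H"))

# (i) Two fair dice: 36 equally likely outcomes.
two_dice = product(range(1, 7), repeat=2)
p_two_sixes = classical_probability(two_dice, lambda o: o == (6, 6))

print(p_two_heads, p_two_sixes)  # 1/4 1/36
```

The same counting cannot be applied to items such as (a), (h) or (k), because no list of equally likely outcomes exists for them.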

Problem:
Southern Bell is considering the distribution of funds for a campaign to increase long-distance
calls within North Carolina. The following table lists the markets that the company considers
worthy of focused promotions:
There is up to $800,000 available for these special campaigns.
(a) Are the market segments listed in the table collectively exhaustive? Are they mutually
exclusive?
(b) Make a collectively exhaustive and mutually exclusive list of the possible events of
the spending decision.
(c) Suppose the company has decided to spend the entire $800,000 on special campaigns.
Does this change your answer to part (b)? If so, what is your new answer?
Presentation of Events
A contingency table is used to classify sample observations according to two or more identifiable
characteristics.
Contingency table for all days in 2010:

            Jan    Not Jan    Total
Wed           4         48       52
Not Wed      27        286      313
Total        31        334      365

Decision tree for the same data:

All days in 2010
    Jan
        Wed          4
        Not Wed     27
    Not Jan
        Wed         48
        Not Wed    286
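
As a rough illustration of how the counts for the days of 2010 can be turned into probabilities, the sketch below computes a marginal, a joint, a union and a conditional probability for a randomly chosen day. The dictionary layout and the helper name prob are assumptions made for this example only.

```python
from fractions import Fraction

# Cell counts for all days in 2010: (day label, month label) -> number of days.
counts = {
    ("Wed", "Jan"): 4,
    ("Wed", "Not Jan"): 48,
    ("Not Wed", "Jan"): 27,
    ("Not Wed", "Not Jan"): 286,
}
total = sum(counts.values())  # 365 days


def prob(day=None, month=None):
    """Probability that a randomly chosen day matches the given labels."""
    matching = sum(
        n for (d, m), n in counts.items()
        if (day is None or d == day) and (month is None or m == month)
    )
    return Fraction(matching, total)


p_wed = prob(day="Wed")                        # marginal: 52/365
p_jan = prob(month="Jan")                      # marginal: 31/365
p_wed_and_jan = prob(day="Wed", month="Jan")   # joint: 4/365
p_wed_or_jan = p_wed + p_jan - p_wed_and_jan   # addition rule: 79/365
p_wed_given_jan = p_wed_and_jan / p_jan        # conditional: 4/31
```

Each branch of the decision tree corresponds to one cell of the table, so the four cell counts are all that the calculation needs.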

Question: Define independent and mutually exclusive events. Can two events be mutually
exclusive and independent simultaneously? Support your answer with an example.
Hints: Mutually exclusive events cannot happen at the same time. For example, when tossing a
coin, the result can be either heads or tails but cannot be both. Events are independent if the
occurrence of one event does not influence (and is not influenced by) the occurrence of the other(s).
Being mutually exclusive (disjoint) means that it is impossible for two events to
occur together. Two events A and B are mutually exclusive if P(A ∩ B) = 0. If two
mutually exclusive events both have nonzero probability, they cannot be independent.
Consider two events A and B that are both mutually exclusive and independent.
Since they are independent,
⇒ P(A∩B) = P(A)·P(B)
Also, since the events are mutually exclusive,
⇒ P(A∩B) = 0
⇒ P(A∩B) = P(A)·P(B) = 0
⇒ P(A)·P(B) = 0
⇒ At least one of A and B has a probability of occurrence = 0
Thus, if we choose any two events such that at least one of them has zero probability of occurring,
then those two events will be both mutually exclusive and independent.
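
A small numerical check of this argument, assuming a single roll of a fair die as the experiment. The events and function names below are chosen only for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one roll of a fair die


def p(event):
    """Classical probability of an event (a set of faces)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def mutually_exclusive(a, b):
    return p(a & b) == 0


def independent(a, b):
    return p(a & b) == p(a) * p(b)


even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})
low = frozenset({1, 2})
seven = frozenset()  # "rolling a 7": an impossible event, probability 0

for name, a, b in [("even vs odd", even, odd),
                   ("even vs low", even, low),
                   ("even vs seven", even, seven)]:
    print(name, mutually_exclusive(a, b), independent(a, b))
```

Under these assumptions, "even" and "odd" are mutually exclusive but not independent, "even" and "low" are independent but not mutually exclusive, and only the pair involving the impossible event satisfies both conditions.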
----------------

Some Problems
Problem-1
A research agency administers a demographic survey to 90 telemarketing companies to determine the size
of their operations. When asked to report how many employees now work in their telemarketing operation,
the companies gave responses ranging from 1 to 100. The agency’s analyst organizes the figures into a
frequency distribution.
Number of Employees Working in Telemarketing    Number of Companies
0–under 20                                      32
20–under 40                                     16
40–under 60                                     13
60–under 80                                     10
80–under 100                                    19
a. Compute the mean, median, and mode for this distribution.
b. Compute the sample standard deviation for these data.
Problem-2
The client company data from the Decision Dilemma reveal that 155 employees worked one of four types
of positions. Shown here again is the raw values matrix (also called a contingency table) with the frequency
counts for each category and for subtotals and totals containing a breakdown of these employees by type
of position and by sex.
i) If an employee of the company is selected randomly, what is the probability that the employee is
female or a professional worker?
ii) If an employee of the company is selected randomly, what is the probability that the
employee is a female worker?
iii) If a female employee of the company is selected randomly, what is the probability that the
employee is technical?

Problem-3
The frequency distribution represents the data obtained from a sample of 75 copying machine
service technicians. The values represent the days between service calls for various copying
machines.
Class boundaries    Frequency
15.5 – 18.5         14
18.5 – 21.5         12
21.5 – 24.5         18
24.5 – 27.5         10
27.5 – 30.5         15
30.5 – 33.5         6
Find the mean and the modal class.
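
One possible way to check an answer to Problem-3 is sketched below. It assumes the usual grouped-data approach in which every observation in a class is represented by the class midpoint; the variable names are illustrative only.

```python
# Class boundaries and frequencies from the Problem-3 table.
boundaries = [(15.5, 18.5), (18.5, 21.5), (21.5, 24.5),
              (24.5, 27.5), (27.5, 30.5), (30.5, 33.5)]
frequencies = [14, 12, 18, 10, 15, 6]

# Each class is represented by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in boundaries]

n = sum(frequencies)  # 75 technicians
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# The modal class is the class with the highest frequency.
modal_class = boundaries[frequencies.index(max(frequencies))]

print(round(grouped_mean, 2), modal_class)  # 23.72 (21.5, 24.5)
```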
Problem-4
The data below represent the overall miles per gallon of 2019 SUVs.
24 23 22 21 22 22 18 19 19 19 21 21 21 18
19 21 17 22 18 18 22 16 16
You are required to compute and comment on the first quartile, the third quartile, and the
interquartile range.
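
A sketch for checking the quartiles of Problem-4 with Python's standard library. Textbooks use several quartile conventions; this example assumes the (n + 1)p location rule, which is what statistics.quantiles applies by default (method "exclusive"), so a different convention may give slightly different values.

```python
import statistics

# Miles per gallon for the 23 SUVs listed in Problem-4.
mpg = [24, 23, 22, 21, 22, 22, 18, 19, 19, 19, 21, 21, 21, 18,
       19, 21, 17, 22, 18, 18, 22, 16, 16]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(mpg, n=4)
iqr = q3 - q1

print(q1, q3, iqr)  # 18.0 22.0 4.0
```

With this rule the middle 50% of the SUVs lie between 18 and 22 miles per gallon, a spread of 4.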
Problem-5
A corporation owns several companies. The strategic planner for the corporation believes dollars
spent on advertising can to some extent be a predictor of total sales dollars. As an aid in long-
term planning, she gathers the following sales and advertising information from several of the
companies for 2009 ($ millions).
Advertising cost    Sales
12                  148
4                   55
222                 338
60                  994
38                  541
6                   89
17                  126
41                  379
i) Find the degree of linear relationship between advertising and sales.
ii) Based on the relationship developed in (i) above, what would be the sales figure if the
advertising cost is $111.50?
