
Statistics For Management


Contents

UNIT 1: INTRODUCTION TO STATISTICS

UNIT 2: CLASSIFICATION, TABULATION AND PRESENTATION OF DATA

UNIT 3: DIAGRAMMATIC AND GRAPHICAL REPRESENTATION OF DATA

UNIT 4: MEASURE OF CENTRAL TENDENCY

UNIT 5: ARITHMETIC MEAN

UNIT 6: MEDIAN

UNIT 7: MODE

UNIT 8: MEASURE OF DISPERSION

UNIT 9: INTRODUCTION TO PROBABILITY

UNIT 10: SAMPLING

UNIT 11: REGRESSION

UNIT 12: INTRODUCTION TO CORRELATION AND COEFFICIENT


1 SHOOLINI UNIVERSITY

UNIT 1: INTRODUCTION TO STATISTICS

CONTENT:
Objectives
Introduction
1.1. History of Statistics
1.2. Managerial Application of Statistics
1.3. Statistics and Computers
1.4. Importance of Statistics in Different Fields
1.5. Summary
1.6. Key Words
1.7. Review Questions
1.8. Further Reading

OBJECTIVES:
• Relevance of Statistics in real life.

• Importance of statistics in different fields.

• Various applications of Statistics.



INTRODUCTION
Statistics is the mathematical study of collecting, organizing, analyzing, and interpreting
numerical data in order to draw conclusions, which are often expressed in terms of probability.
Because aggregates of data (unlike individual quantities) tend to behave in a regular, predictable
fashion, statistics can analyze collections of data too large to be understood by ordinary
observation. The discipline is divided into two main branches: descriptive statistics and
inferential statistics.

1.1. HISTORY OF STATISTICS


The word "statistics" is derived from the Latin word "status" or the Italian word "statista",
both of which relate to the "political state" or its "administration". Shakespeare used the word
"statist" in his play Hamlet (1602). In earlier times, statistics was used mainly by rulers:
monarchs and kings needed information on their states' territory, agriculture, commerce, and
population in order to assess their military strength, wealth, and taxation and to carry out other
political functions. Beyond such purposes, however, statistics was rarely used in those times.

In 1749, Gottfried Achenwall, at a German university, coined the term "Statistik" to describe
the political science of various countries. The Englishman W. Hooper introduced the term
"statistics" in 1771 while translating Baron J. F. von Bielfeld's work Elements of Universal
Erudition. According to Hooper, statistics is the science that informs us about the political
arrangement of all the modern states of the world. Although the old and the new statistics differ
considerably, the new statistics also includes the old.

Since English writers first used the word in the 18th century, statistics has evolved
considerably; by the end of the nineteenth century, a substantial body of statistical work had
been completed.

In the early 20th century, William S. Gosset developed methods for drawing conclusions from a
limited amount of data (small samples). Over the course of the 20th century, statisticians
concentrated on developing new statistical methods, theories, and applications. Today, the
availability of electronic computers is undoubtedly a crucial factor in the advancement of
statistics.

Every student of statistics should be acquainted with the various branches of the field in order
to appreciate it fully. Depending on one's line of work, some aspects of statistics may rarely be
encountered, but understanding the ideas that underlie statistical analysis is essential to
appreciating its importance and elegance.

The two main branches of statistics are descriptive statistics and inferential statistics. Both are
used in the scientific analysis of data, and both are equally important for the statistics
student.

Descriptive Statistics

Descriptive statistics deals with subjects such as the collection and presentation of data. This is
typically the first step in a statistical analysis. It is frequently harder than it sounds: the
statistician must design studies carefully, choose an appropriate group to study, and avoid the
biases that can so easily creep into an experiment.

Different research areas require different kinds of descriptive analysis. For instance, a
physicist studying turbulence in a laboratory needs average values of quantities that fluctuate
rapidly over short time periods. Because of the nature of the problem, these physical values must
be averaged over a large body of data recorded during the experiment.
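A minimal sketch of this kind of averaging, assuming the readings arrive as a plain list of numbers. The helper `block_averages` is hypothetical: it smooths short-term fluctuations by averaging consecutive, non-overlapping blocks of readings, and the readings themselves are invented for illustration.

```python
from statistics import mean

def block_averages(readings, block_size):
    """Average consecutive, non-overlapping blocks of readings.

    Any incomplete block left over at the end is ignored."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [mean(readings[i:i + block_size])
            for i in range(0, len(readings) - block_size + 1, block_size)]

# Hypothetical flow-velocity readings (m/s) fluctuating around 2.0
readings = [2.1, 1.8, 2.3, 1.9, 2.0, 2.2, 1.7, 2.0]
print(mean(readings))               # average over the whole experiment
print(block_averages(readings, 4))  # averages over two shorter time windows
```

The overall mean summarises the whole run, while the block averages show how the average itself varies from one time window to the next.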

Inferential Statistics

Inferential statistics extends our understanding of such data by analysing representative
subsets (samples) drawn from a wider population. It helps us form general conclusions about the
population using a variety of tests and tools.

To choose random samples that fairly represent the entire population, a variety of sampling
techniques are used. Simple random sampling, stratified sampling, cluster sampling, and
systematic sampling are some of the most important techniques.
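The four sampling techniques named above can be sketched with Python's standard `random` module. The population of customer IDs, the strata, the branch clusters, and all function names below are hypothetical; the sketch only shows how units are selected under each scheme, not how a real survey would be designed.

```python
import random

def simple_random_sample(population, n, rng):
    """Every group of n units has the same chance of being chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Choose a random start, then take every k-th unit, where k = N // n."""
    k = len(population) // n
    if k == 0:
        raise ValueError("sample size cannot exceed population size")
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)              # fixed seed so the run is reproducible
customers = list(range(1, 101))      # hypothetical population: 100 customer IDs

print(simple_random_sample(customers, 5, rng))
print(systematic_sample(customers, 5, rng))
print(stratified_sample({"urban": customers[:60], "rural": customers[60:]},
                        {"urban": 3, "rural": 2}, rng))
branches = {"north": [1, 2, 3], "south": [4, 5], "east": [6, 7, 8], "west": [9, 10]}
print(cluster_sample(branches, 2, rng))
```

The seeded generator is passed in explicitly so that each scheme can be rerun with the same random choices when checking results.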

1.2. MANAGERIAL CONCEPT OF STATISTICS


• Data collection, analysis, and interpretation are all part of the mathematical discipline
of statistics. Numerous specialties have emerged in order to apply statistical theory and
methodologies to a wide range of topics. Some topics, despite having the word
"statistical" in their names, involve manipulating probability distributions more than
statistical analysis.

• Actuarial science is the area that assesses risk in the insurance and financial industries
using statistical and mathematical methods.

• Astrostatistics is a discipline that applies statistical analysis to the understanding of
astronomical data.

• Biostatistics: the branch of statistics, including medical statistics, that applies statistical
analysis to biological phenomena and observations.

• Business analytics is a fast-evolving business process that uses statistical approaches to
analyse data sets, which are frequently quite large, to gain new understanding of business
performance and prospects.

• Chemometrics is the science of using mathematical or statistical techniques to relate
measurements taken on a chemical system or process to the condition of the system.

• Demography: the statistical study of populations, especially human populations. This highly
general science can be applied to any kind of dynamic population, that is, one that changes over
time or space.

• Econometrics: the field of economics that applies statistical techniques to the empirical
investigation of economic theories and relationships.

• Environmental statistics is the application of statistical techniques to the study of the
environment, including plant and animal populations as well as weather, climate, air quality,
and water quality.

• Geostatistics: a branch of statistics that focuses on spatial data, applied in fields including
meteorology, oceanography, geochemistry, petroleum geology, and hydrogeology.

• Operations research (or operational research) is an interdisciplinary branch of applied
mathematics and formal science that uses methods such as mathematical modelling, statistics,
and algorithms to find optimal or near-optimal solutions to difficult problems.

• Population ecology is a branch of ecology that studies the dynamics of species populations and
how they interact with their surroundings.

• Quality control examines the production and manufacturing processes. It may employ
statistical sampling of product items to inform judgments regarding process control or
accepting deliveries.

• Quantitative psychology: the study of mathematically explaining and modifying human thought
and behaviour.

• Statistical finance: the application of econophysics to financial markets. It is an empirical
attempt to move finance away from its normative roots towards a positivist framework, using
examples from statistical physics with an emphasis on the emergent or collective properties of
financial markets.

• Statistical mechanics is the application of probability theory to mechanics, which is concerned
with the motion of particles or objects subjected to forces. Probability theory provides
mathematical methods for dealing with very large populations.

• Statistical physics: a branch of physics that applies the techniques of probability theory and
statistics to solve physical problems.

1.3. STATISTICS AND COMPUTERS


Computers make it possible to put numbers to the test and observe the results. When you study
computer science and mathematics, you build mathematical models and derive formulas that solve
mathematical problems using algorithms and the theory of computation. In other words, you create
new tools that can make predictions.

Through the Computer Applications option, students can combine a traditional computer
science degree with a degree in a non-traditional field. Students can practise with the
technologies used in industry in the university's laboratories for high-performance computing,
networks, and artificial intelligence, and will learn through laboratories, lectures, and projects
to:

1. Investigate the limits of the algorithms and data structures that support complex software
systems.

2. Build new applications and tools for interdisciplinary science and research.

3. Explore opportunities for advanced computer modelling and simulation.

1.4. IMPORTANCE OF STATISTICS IN DIFFERENT FIELDS


Statistics benefits almost every area of human activity. It plays a significant part in
establishing a country's current per capita income, unemployment rate, population growth rate,
housing, educational system, medical infrastructure, and so on. Its use is very broad: almost
every discipline today, including business, commerce, trade, physics, chemistry, economics,
mathematics, biology, botany, psychology, and astronomy, relies on statistics in some capacity.
We now discuss a few key areas where statistics is regularly applied.

In the field of Business:

Businesses must use statistics to succeed. A successful businessperson must be able to make
decisions quickly and accurately, such as what to produce and sell and in what quantities, based
on what customers want. Businesspeople use statistics to plan their production efficiently.

Statistical methods based on consumer preferences can also be used to evaluate product quality
more efficiently. Statistical information is therefore necessary for all business operations: it
helps managers make informed choices about the location of the business, how products are
advertised, the resources available, and so on.

In the field of Economics:

Statistics plays a significant role in economics. National income accounts, a versatile tool for
economists and administrators, are prepared using statistical techniques. In economic research,
statistical techniques are used for data collection, analysis, and hypothesis testing. The
relationship between supply and demand is studied using statistical methods, and issues such as
imports and exports, inflation, and per capita income call for a solid understanding of
statistics.

In the field of Mathematics:

Almost all branches of the social and natural sciences depend heavily on statistics. Even the
methods of the natural sciences, the most reliable of all, yield conclusions that are only
probable, not certain, because the data on which they are based are never perfect. Statistical
analysis helps describe such measurements more accurately. Statistics is itself a branch of
applied mathematics: it uses a wide variety of techniques such as probability, averages,
dispersion, and estimation, and it draws on pure mathematical methods, including integration,
differentiation, and algebra.

In the field of Banking:

The banking industry places significant emphasis on statistical analysis and uses statistics for a
variety of purposes. Banking works on the principle that depositors do not all withdraw their
money at the same time, so a bank needs to keep only part of its deposits as cash to meet
withdrawals safely. By lending the remainder to other people and charging interest, the bank turns
these deposits into profit. To estimate how many depositors will make claims on a given day, and
how large those claims will be, bankers use statistical methods based on probability.
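The probability reasoning behind this can be sketched under a deliberately simple, hypothetical model: each of n depositors independently withdraws on a given day with the same probability p, so the number of withdrawals follows a binomial distribution. The sketch finds the smallest number of withdrawals the bank can cover with a chosen level of confidence. The depositor count, probability, confidence level, average amount, and function names are assumptions for illustration, not a description of how any particular bank works.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def withdrawals_to_cover(n_depositors, p_withdraw, confidence):
    """Smallest k such that P(at most k depositors withdraw) >= confidence,
    assuming each depositor withdraws independently with probability p_withdraw."""
    cumulative = 0.0
    for k in range(n_depositors + 1):
        cumulative += comb(n_depositors, k) * p_withdraw**k * (1 - p_withdraw)**(n_depositors - k)
        if cumulative >= confidence:
            return k
    return n_depositors  # guard against floating-point shortfall near confidence = 1

# Hypothetical branch: 1,000 depositors, each with a 5% chance of withdrawing today
k = withdrawals_to_cover(1000, 0.05, 0.99)
average_withdrawal = 2000  # hypothetical average amount per withdrawal
print(f"Plan for up to {k} withdrawals, roughly {k * average_withdrawal} in cash")
```

Raising the confidence level, or the withdrawal probability, raises the number of withdrawals the bank should be prepared to meet.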

In the field of Management:

The management of a state or country depends heavily on the gathering and processing of
statistical data. Numerous government programmes are built on statistical principles, and
nowadays almost all administrative decisions use statistical information. For example, if the
government decides to revise employees' pay scales in light of a rise in the cost of living,
statistical techniques are used to measure the size of that rise. The preparation of both the
federal and provincial budgets also relies heavily on statistics, because statistics help estimate
expected expenditure and the revenue obtained from various sources. In this respect, statistics
act as the eyes of the state's administration.
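One common way to measure such a rise is a cost-of-living price index. The sketch below assumes a Laspeyres-type index, that is, the current cost of a fixed base-period basket of goods relative to its base-period cost; the basket, prices, pay figure, and function name are invented for illustration, and official indices use far larger baskets and their own methodologies.

```python
def laspeyres_index(base_prices, current_prices, base_quantities):
    """Cost of the base-period basket at current prices as a percentage of
    its cost at base prices (base period = 100)."""
    base_cost = sum(base_prices[item] * qty for item, qty in base_quantities.items())
    current_cost = sum(current_prices[item] * qty for item, qty in base_quantities.items())
    return 100 * current_cost / base_cost

# Hypothetical monthly basket of a typical household
base_prices = {"rice": 40.0, "milk": 50.0, "fuel": 100.0}      # price per unit, base year
current_prices = {"rice": 44.0, "milk": 55.0, "fuel": 110.0}   # price per unit, current year
base_quantities = {"rice": 10, "milk": 20, "fuel": 5}          # units bought per month

index = laspeyres_index(base_prices, current_prices, base_quantities)
old_pay = 30000
new_pay = old_pay * index / 100   # pay needed to keep the same purchasing power
print(f"Cost-of-living index: {index:.1f}; adjusted pay: {new_pay:.0f}")
# prints: Cost-of-living index: 110.0; adjusted pay: 33000
```

Because every price in this made-up basket rose by 10%, the index is 110 and the pay scale would need to rise by the same 10% to preserve purchasing power.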

In the fields of Accounting and Auditing:

Accuracy is essential to the practice of accounting. For decision-making, however, such a high
level of precision is not always required; a decision can often be based on an approximation,
which statistics provides. The values of current assets are adjusted according to the purchasing
power of money, that is, what money is worth now. In auditing, sampling techniques are widely
used: an auditor decides the size of the sample of books to be audited based on the likelihood of
error.
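As one illustration of how the likelihood of error can drive sample size, the sketch below assumes a discovery-sampling approach: each entry is independently in error with a given rate, and the auditor wants a sample large enough to contain at least one error with a chosen confidence. The rates, confidence level, and function name are hypothetical; real audit sampling plans may use other methods.

```python
import math

def discovery_sample_size(error_rate, confidence):
    """Smallest n such that a sample of n entries contains at least one error
    with probability >= confidence, assuming each entry is independently in
    error with probability error_rate."""
    if not (0 < error_rate < 1 and 0 < confidence < 1):
        raise ValueError("error_rate and confidence must lie strictly between 0 and 1")
    # P(no error in n entries) = (1 - error_rate) ** n must be <= 1 - confidence
    return math.ceil(math.log(1 - confidence) / math.log(1 - error_rate))

print(discovery_sample_size(0.02, 0.95))  # hypothetical 2% error rate → 149
print(discovery_sample_size(0.05, 0.95))  # hypothetical 5% error rate → 59
```

In this model, the rarer the errors the auditor wants to be able to detect, the larger the sample must be.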

In the field of Natural and Social Sciences:

Statistics is an indispensable tool in the vast majority of natural and social science disciplines.
In fields as diverse as biology, physics, chemistry, mathematics, meteorology, research,
chambers of commerce, sociology, business, public administration, and communication and
information technology, statistical methods are frequently used to analyse the results of
experiments and to determine the significance of their findings.

In the field of Astronomy:

Astronomy is one of the oldest fields in which statistics has been applied; it uses observations
to determine the distances, sizes, masses, and densities of celestial bodies. Errors are
unavoidable in such measurements, so the most probable values are found using statistical
methods.
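A minimal sketch of this idea, assuming the measurement errors are random and independent (and, for the "most probable value" interpretation, normally distributed): the mean of repeated measurements is then the usual best estimate, and its standard error indicates how precise that estimate is. The observations and the helper name `best_estimate` are hypothetical.

```python
import math
from statistics import mean, stdev

def best_estimate(measurements):
    """Return the sample mean of repeated measurements and its standard error."""
    if len(measurements) < 2:
        raise ValueError("need at least two measurements")
    return mean(measurements), stdev(measurements) / math.sqrt(len(measurements))

# Hypothetical repeated measurements of the same distance (arbitrary units)
observations = [4.21, 4.25, 4.23, 4.26, 4.22]
estimate, std_error = best_estimate(observations)
print(f"best estimate = {estimate:.3f} ± {std_error:.3f}")
```

Taking more measurements shrinks the standard error roughly in proportion to the square root of their number.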

1.5. SUMMARY
The word "statistics" derives from either the Latin word "status" or the Italian word "statista".
The Englishman W. Hooper is thought to have been the first person to use the term, in 1771.
Throughout the 20th century, statisticians devoted much attention to developing new statistical
methods, theories, and applications. Statistics has two main branches: descriptive statistics and
inferential statistics. Descriptive statistics deals with the collection and presentation of data.

Inferential statistics analyses samples of data that are intended to be representative of a wider
population. Both branches are equally important for a statistics student to understand, and both
have many research applications. Chemometrics uses statistical techniques to relate the results
of measurements taken on a chemical system to the state of that system. Population ecology
studies the dynamics of species populations and how those populations interact with their
environment. Quantitative psychology is a branch of psychology that uses mathematics to
understand, predict, and shape human cognition and behaviour.

Students who choose the Computer Applications option can obtain both a degree in computer
science and a degree in a field not often connected with it. Statistics is crucial in establishing
a nation's current per capita income, unemployment rate, population growth rate, housing stock,
educational system, healthcare infrastructure, and so on.

Statistics is a branch of applied mathematics. It uses numerous techniques, including probability,
averages, dispersion, and estimation. State or national administration cannot operate without
the systematic collection and analysis of statistical data.

Sampling techniques are very frequently used in auditing: an auditor chooses the size of the
sample of books to be audited based on the likelihood of errors. Statistics is an essential tool
in most scientific and social science fields of study.

1.6. KEY WORDS:


• Statistical Analysis: the science of collecting, examining, and presenting large amounts of
data to find underlying patterns and trends.

• Descriptive Statistics: the branch of statistics that deals with the collection and presentation
of data.

• Inferential Statistics: the use of data analysis to deduce the properties of an underlying
probability distribution; also known as statistical inference.

1.7. REVIEW QUESTIONS


Q1. Briefly describe the importance of statistics.

Q2. Why is statistics important in the field of computers?

Q3. What is the managerial concept of statistics?

1.8. REFERENCES FOR FURTHER READING:


Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw Hill Book
Company, New York.

Balwani Nitin (2002), Quantitative Techniques, First Edition, Excel Books, New Delhi.

Bhardwaj R.S., Business Statistics, Excel Books.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.

Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

UNIT 2: CLASSIFICATION, TABULATION AND PRESENTATION OF DATA

CONTENT:
Objectives
Introduction
2.1. Classification
2.2. Frequency Distribution
2.3. Scope of Statistics
2.4. Summary
2.5. Keywords
2.6. Review Questions
2.7. References for Further Reading

OBJECTIVES:
• To understand classification of the data

• To visualise the data through different methods



INTRODUCTION
The raw material of statistics is statistical data. The data may relate to an activity of interest,
a phenomenon, or a question under investigation. They result from measuring, counting, or
observing.

Statistical data are, therefore, the elements of a problem scenario that may be measured,
quantified, counted, or categorised.

A variable is any subject, phenomenon, or action that generates data in this way. In other
words, a variable is a quantity whose value varies from one measurement to another.

In statistics, data are divided into two broad categories: quantitative data and qualitative data.

This categorisation is based on the type of attribute being measured.

2.1. CLASSIFICATION
The first type of data is quantitative, or numerical, data. Any number-based data that can be
collected, such as a person's age, height, or weight, is quantitative data. Qualitative data, by
contrast, describe a quality of something, such as black, brown, or red hair, and are not
numerical. Many people use the term categorical data instead of qualitative data.

Thus, the two primary types of data are numerical (quantitative) data and categorical
(qualitative) data, and each can be divided further. Quantitative data can be split into discrete
data and continuous data.

Discrete data take exact, separate values. For instance, your school grade can be 1, 2, 3, 4,
and so on up to 11 or 12; nobody is ever enrolled in grade 1.1 or 1.2. Shoe size is another
example of a discrete variable: you wear size 9, 9.5, or 10, never size 9.2576. Because discrete
values are exact, a discrete variable can take only certain values.

Continuous data, by contrast, can take any value within a range. Height is an example: between
1 metre and 1.000000001 metres there are infinitely many possible values. When you say you
are about 75 inches tall, your true height might be 75.76598623 inches; the figures we record,
such as 77.5 or 73, are rounded measurements of a continuous quantity. Thus, while discrete
values are exact, continuous values can take any value in an interval.

Qualitative data can be divided in a similar way into two categories: nominal data and ordinal
data. Nominal categorical data have no inherent order. Ordinal data, on the other hand, are
categories with a natural order. Consider military ranks: private, lance corporal, corporal,
sergeant, staff sergeant. These are not numerical data, but they certainly have a definite order.
Similarly, the grades you receive at the end of a semester are ordinal categories.

In summary, data may be quantitative (numbers) or qualitative (categories). Quantitative data
may be discrete, taking exact values, or continuous, taking any of an unlimited number of
values. Qualitative data may be nominal, such as black, brown, or red hair, or red, yellow,
green, and blue cars, or ordinal, such as military ranks and grades.
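The difference between nominal and ordinal data shows up in which summaries make sense. In this small sketch with made-up values, nominal data such as hair colour can only be counted (so the mode is meaningful), while ordinal data such as military ranks can also be sorted and given a median. The rank list and its order are assumptions for illustration.

```python
from collections import Counter

# Nominal data: categories with no natural order (hypothetical hair colours)
hair = ["black", "brown", "black", "red", "brown", "black"]
print(Counter(hair))                  # frequency of each category
print(Counter(hair).most_common(1))   # the mode is a meaningful summary

# Ordinal data: categories with a natural order (military ranks, lowest first)
RANK_ORDER = ["private", "lance corporal", "corporal", "sergeant", "staff sergeant"]
soldiers = ["sergeant", "private", "corporal", "private", "staff sergeant"]

ranked = sorted(soldiers, key=RANK_ORDER.index)   # sort by position in the rank order
median_rank = ranked[len(ranked) // 2]            # middle value (odd number of soldiers)
print(ranked)
print(median_rank)
```

Sorting hair colours alphabetically would be possible but meaningless, which is exactly what distinguishes nominal from ordinal categories.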

2.2. FREQUENCY DISTRIBUTION


A frequency distribution is simply a frequency table. It usually has two columns, although
occasionally there are more. The first column lists the classes, and it is present whether the
data are grouped into ranges of values or left ungrouped. The second column gives the frequency,
which is usually just a count of how often each data value or range of values occurs. A separate
column may also show the relative frequency, that is, each class's frequency divided by the total
number of observations.

Frequency distributions can be built in Excel or by hand, and the same questions must be
considered in both cases. The first question is how many classes the distribution should have.
Normally it should have between five and twenty classes, although a particular exercise may
specify the number of classes to use. Alternatively, the class width may be specified instead.
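The by-hand procedure can be mirrored in a few lines of Python instead of Excel. This sketch assumes whole-number data and inclusive class limits (for example, 10-14, 15-19); the helper name `frequency_table` and the scores are hypothetical. With a width of 5, the made-up data fall into eight classes, within the usual range of five to twenty.

```python
def frequency_table(data, start, width):
    """Group integer data into inclusive classes (start to start+width-1, and so on).

    Returns a list of (lower limit, upper limit, frequency, relative frequency)."""
    if start > min(data):
        raise ValueError("start must not exceed the smallest observation")
    n = len(data)
    rows = []
    lower = start
    while lower <= max(data):
        upper = lower + width - 1                          # inclusive upper limit
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq, freq / n))
        lower += width
    return rows

scores = [12, 15, 22, 27, 28, 31, 35, 35, 41, 48]   # hypothetical test scores
for lower, upper, freq, rel in frequency_table(scores, start=10, width=5):
    print(f"{lower}-{upper}\t{freq}\t{rel:.2f}")
```

Each printed row gives the class, its frequency, and its relative frequency; the frequencies add up to the number of observations and the relative frequencies to 1.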

Visualization of Data

Data visualisation is the presentation of information in a visual format, such as a map or
graph, to make it easier for the human brain to understand data and draw inferences from it.
One of its main objectives is to reveal patterns, trends, and outliers in large data sets. The terms
"information graphics", "information visualisation", and "statistical graphics" are often used
interchangeably.

Data visualisation is one step in the data science process: once data have been collected,
processed, and modelled, they must be visualised before conclusions can be drawn. Data
visualisation is also part of data presentation architecture (DPA), which aims to identify, locate,
manipulate, format, and present data as effectively as possible.

Almost every profession benefits from the ability to visualise data. Teachers can use it to display
their pupils' test results, computer scientists can use it to explore advances in artificial
intelligence (AI), and company executives can use it to communicate with stakeholders. It is
also crucial for big data projects. As businesses accumulated enormous amounts of data in the
early years of the big data trend, they needed a way to get a quick and easy overview of their
data, and visualisation tools were a natural fit.

For similar reasons, visualisation is central to advanced analytics. When a data scientist is
developing complex predictive analytics or machine learning (ML) algorithms, it becomes
essential to visualise the outputs in order to monitor results and ensure that the models are
performing as intended, because visual representations of complex algorithms are generally
easier to interpret than their numerical outputs.

Data visualisation and big data

The importance of visualisation has increased with the rise of big data and data analysis
projects. Businesses increasingly use machine learning to gather vast amounts of data, which
can be difficult and time-consuming to sort through, understand, and explain.

Visualisation can speed up this process and present the information to business owners and
stakeholders in ways they can understand.

Big data visualisation often goes beyond the typical methods used in traditional visualisation,
such as pie charts, histograms, and business graphs, and instead uses more complex
representations such as heat maps and fever charts. It requires powerful computer systems to
collect raw data, process it, and turn it into graphical representations from which users can
quickly draw conclusions.

Univariate analysis: a single variable is examined in order to describe nearly all of its
properties.

Bivariate analysis: data on exactly two variables are compared.

Multivariate analysis: more than two variables are compared.

Table 2.1

Univariate analysis
• involves a single variable
• does not deal with causes or relationships
• its major purpose is to describe
• central tendency: mean, mode, median
• dispersion: range, variance, maximum, minimum, quartiles, standard deviation
• frequency distributions
• bar graph, histogram, pie chart, line graph, box-and-whisker plot

Bivariate analysis
• involves two variables
• deals with causes or relationships
• its major purpose is to explain
• analysis of two variables simultaneously
• correlations
• comparisons, relationships, causes, explanations
• tables where one variable is contingent on the values of the other variable
• independent and dependent variables



2.3. SCOPE OF STATISTICS


Statistics is essential for business organisations to function effectively. Businesspeople must
make a range of decisions to ensure that various tasks are completed efficiently and on time, and
statistics helps them make sound decisions in an uncertain business environment. Statistical
methods can be used to adapt manufacturing processes and product delivery to consumer wants,
and businesses use statistical data to inform decisions about pricing, marketing, and financial
resources, among other things. Statistics and economics are closely related and frequently used
together. Statistics is used to collect, analyse, compare, and present data covering almost all
aspects of economics, such as production, consumption, and distribution. It is also used to
estimate demand and supply, to compute GDP and per capita income, and to calculate
import-export figures and inflation rates. By providing quantitative data, statistics helps resolve
economic problems.

2.4. SUMMARY
The collection of statistical data is the first step in the statistical process. The data may relate
to a phenomenon, a research question, or an activity of interest, and they are the results of
counting, measuring, or observing.

Statistical data are the elements of a problem scenario that can be measured, quantified,
counted, or categorised. A variable is anything that generates data in this way, such as a subject,
an event, or an action.

In other words, a variable is a quantity that varies between different measurements. In statistics,
data are divided into two primary groups, quantitative data and qualitative data, according to
the type of characteristic being measured.

2.5. KEYWORDS
Bivariate frequency distribution: a classification of data made simultaneously according to the
magnitude of two characteristics.

Classification: the process of arranging things (either physically or conceptually) into groups or
classes according to the characteristics they have in common.

Dichotomous classification: when a characteristic is an attribute, the data can be divided into
two groups according to whether each item possesses that attribute or not.

Statistical series: Statistical series are collections of classified data that are organised in some
logical order, such as by size, by time of occurrence, or by some other criterion.

2.6. REVIEW QUESTIONS


1. What exactly do you mean by "classification" and "tabulation"? Explain the significance
of these processes in statistical investigations.

2. Discuss the objectives, techniques, and importance of tabulation in a statistical inquiry.
Mention the types of tables that are commonly used.

3. Construct a frequency table from the following data using classes of width 10. Use the
inclusive method of classification.

30, 38, 43, 59, 82, 40, 45, 39, 83, 85, 72, 66, 45, 33, 53, 67, 70, 72, 52, 50, 43, 44, 60, 89, 67,
66, 78, 32, 56, 47, 65, 56, 38, 84, 64, 52, 43, 33, 31, 35, 38, 39, 40, 37, 52, 53, 60
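As a check for Question 3, the Python sketch below tallies the data into inclusive classes of width 10. It assumes the first class starts at 30, the smallest value in the data, and the function name `inclusive_frequency_table` is illustrative only. With integer data and the inclusive method, both limits belong to the class, so the classes are 30-39, 40-49 and so on.

```python
# Review Question 3 data (47 observations)
DATA = [30, 38, 43, 59, 82, 40, 45, 39, 83, 85, 72, 66, 45, 33, 53, 67,
        70, 72, 52, 50, 43, 44, 60, 89, 67, 66, 78, 32, 56, 47, 65, 56,
        38, 84, 64, 52, 43, 33, 31, 35, 38, 39, 40, 37, 52, 53, 60]


def inclusive_frequency_table(values, width, start):
    """Tally integer values into inclusive classes of the given width.

    In the inclusive method both limits belong to the class, so the
    classes are start..start+width-1, start+width..start+2*width-1, etc.
    """
    table = {}
    lower = start
    while lower <= max(values):
        upper = lower + width - 1
        table[f"{lower}-{upper}"] = sum(1 for v in values if lower <= v <= upper)
        lower += width
    return table


if __name__ == "__main__":
    freq = inclusive_frequency_table(DATA, width=10, start=30)
    for label, count in freq.items():
        print(f"{label:>6}  {count}")
    print(f"{'Total':>6}  {sum(freq.values())}")
```

The class totals should add up to the number of observations, which is a quick way to confirm that no value was missed or counted twice.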

2.7. REFERENCES FOR FURTHER READING


Bhardwaj R. S., Business Statistics, Excel Books.

Lindgren B.W (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.

Balwani Nitin, Quantitative Techniques, First Edition: 2002. Excel Books, New Delhi.

Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.

Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book
Company, New York.

Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Selvaraj R., Loganathan C., Quantitative Methods in Management.

Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.

Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.

Gupta S.P., Statistical Methods, Sultan Chand and Sons, New Delhi, 2008.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.

Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.

UNIT 3: DIAGRAMMATIC AND GRAPHICAL REPRESENTATION OF DATA

CONTENT:
▪ Objectives
Introduction
3.1. Importance of Diagrams
3.2. Types of Diagrams
3.3. Methods of Creating Bar Diagrams
3.4. Summary
3.5. Key Words
3.6. Review Questions
3.7. References for Further Reading

OBJECTIVES:
• To understand the concept of graphical representation of data

• To understand the various diagrams and bar graphs

• To learn the methods of creating bar diagrams



INTRODUCTION
Although tabulation is a very effective way to present data, diagrams offer a more attractive and
immediate way to depict it. A layperson may find tabular data difficult to understand, but with a
quick glance at a diagram one can grasp the facts being presented. As M.J. Moroney puts it,
diagrams "record a meaningful impression practically before we think."

3.1. IMPORTANCE OF DIAGRAMS:


• Diagrams present data very clearly. Even someone with no prior knowledge of the subject
can understand their meaning immediately and easily.

• Different samples can be compared and contrasted with relative ease, without using any
additional statistical techniques.

• Diagrams can be used in almost any situation, and in practice they are employed in a wide
range of contexts and academic disciplines.

• Diagrams are also highly effective: tabular data does not convey information as well as
diagrams do.

• Striking diagrams quickly convince the average person.

• Some statistical measures, such as the median and the mode, can be located graphically.

• Diagrams save time, effort and money; even high-quality diagrams can be produced at
comparatively low cost.

• Compared with tabulation, diagrams communicate information more readily; the tabulation
method has drawbacks of its own.

• Diagrams are easy to remember. What we actually see is much more likely to leave a
lasting impression than other ways of presenting data.

• Diagrams can condense large volumes of data. A simple figure may illustrate facts that
even 10,000 words cannot.

General Guidelines for Diagrams:

• Every diagram should have a suitable heading that clearly states what it shows, so that the
gist of the topic is brought into focus and the reader can understand it.

• The scale should be neither too large nor too small. If it is too large, the diagram may look
unattractive; if it is too small, the intended idea may not be conveyed. The size of the paper
must be taken into account when choosing the scale of every diagram.

• Footnotes should be placed at the bottom of the diagram to clarify any points that might
otherwise be uncertain. This helps the reader understand the diagram.

• Diagrams should be well organised and free of clutter, with no ambiguity and no areas
crowded with text.

• Simplicity: a diagram should make an impression at first sight, expressing its content in a
straightforward and simple manner.

• The scale must be shown along with the diagram.

• The diagram should be self-explanatory: it should indicate the nature of the data presented,
as well as their location and source.

• Using a variety of shades and colours makes diagrams much easier to understand.

• Vertical diagrams are generally preferred to horizontal ones.

• The diagram must be accurate. Accuracy should never be sacrificed to make a diagram more
appealing to the eye or more impressive to the viewer.

Limitations of Diagrammatic Representation

• Diagrams do not show small differences clearly.

• They can easily be misused.

• Multi-dimensional diagrams are difficult to draw without artistic skill.

• Diagrams are of little use for further statistical analysis.

• Diagrams only supplement tabulation; they cannot replace it.

• Only a limited amount of data can be presented in a diagram.

• Diagrammatic presentation of data can be a time-consuming process.

• Diagrams support only preliminary conclusions.

• Diagrammatic presentation of data shows only an approximation of the actual behaviour of
the variables.

3.2. TYPES OF DIAGRAMS:


Line Diagram
In a line diagram, each item is represented by a single line, which may be drawn vertically
or horizontally. To make comparison straightforward, the length of each line is proportional
to the value of the item it represents.

Figure 3.1.
Simple Bar Graph
Simple bar diagrams resemble line diagrams and are used when the data can be displayed using
only one dimension, length. The technique is essentially the same, except that each bar is
given some width instead of being a single line. Bars can be drawn horizontally or vertically.
All bars must have the same width, and the gaps between them should also be equal. The space
available on the page should be kept in mind when choosing the width of the bars and the gaps
between them.

Figure 3.2.

Multiple Bar Diagrams:

A multiple bar diagram is used when two or more related variables have to be compared. There
may be 2, 3, 4 or more variables: with two variables, a pair of bars is drawn for each group (a
double bar diagram); with three variables, three bars are drawn (a triple bar diagram), and so
on. The bars are drawn to the same proportional scale as in a simple bar diagram. Each variable
is given its own colour or shade, which is used consistently in every group.

Figure 3.3

Sub-divided Bar Diagram

A sub-divided bar diagram can present data that would otherwise be shown in a multiple bar
diagram. Here the values of the various components for one period are combined into a single
bar, so that the whole bar represents the total and each segment represents one component, as
shown in Figure 3.4.

The components must be arranged in the same order in every bar. This diagram works best with
three to five components rather than more.

Figure 3.4

Percentage Bar Graph

As in the sub-divided bar diagram, the data for a single period or variable are represented by
one bar, but the components are expressed as percentages of the total, so every bar has the
same length, representing 100%. To make comparisons easier, the components are kept in the
same order in every bar.
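A percentage bar diagram needs each component converted to a percentage of its bar's total, together with the running (cumulative) percentage that marks where each segment ends. A minimal Python sketch, using hypothetical expenditure figures for two families (not from the text); the function name is illustrative:

```python
def percentage_components(components):
    """Convert one bar's components into percentages of the bar's total.

    components: list of (name, value) pairs, in the order they are stacked.
    Returns (name, percent, cumulative_percent) tuples in the same order,
    so every bar totals 100 and the segments keep a fixed order.
    """
    total = sum(value for _, value in components)
    rows = []
    running = 0.0
    for name, value in components:
        pct = 100 * value / total
        running += pct
        rows.append((name, round(pct, 1), round(running, 1)))
    return rows


# Hypothetical monthly expenditure (in rupees) of two families
family_a = [("Food", 400), ("Rent", 300), ("Clothing", 100), ("Other", 200)]
family_b = [("Food", 600), ("Rent", 300), ("Clothing", 150), ("Other", 450)]

for family, parts in (("A", family_a), ("B", family_b)):
    print(family, percentage_components(parts))
```

Family B spends more on food in absolute terms (600 against 400), yet food takes the same 40% share of both bars; this is the kind of comparison a percentage bar diagram makes visible.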

Duo-directional Bar Graph

In a duo-directional bar graph, the bars extend on both sides of the base line: to its left and
right, or above and below it. This makes it suitable for showing values of opposite sign, such
as profit and loss.

Broken Bar Diagram

This diagram is used when the value of one item is much larger than the others. The bars for
the very large values are shown broken, so that the smaller values can still be displayed on a
readable scale.

One Dimensional Diagram

One-dimensional diagrams are those in which only one dimension, length, is drawn proportional
to the value of the data. They are also commonly called bar diagrams, and both vertical and
horizontal styles are acceptable. The bars in such a diagram differ from one another only in
length; their width is kept the same, and thickness is not considered. The width of each bar is
decided by the number of bars to be drawn and the size of the paper available. If a large number
of bars must be drawn on one sheet, they may be drawn as thin lines. However, the bars should
be neither too wide nor too thin, as both look unattractive. The following page contains
examples of these diagrams.

3.3. METHODS FOR CREATING BAR DIAGRAMS AND GRAPHS


The following is a list of the methods that can be used to draw bar diagrams:

(i) To begin, draw the base line, preferably in a horizontal direction, and then
divide it into a number of equal sections, keeping in mind the total number
of diagrams that are to be made.

(ii) After that, create the scale line, preferably in a vertical orientation, and split
it into a number of equal sections while keeping the maximum value that
needs to be represented in mind.

(iii) Next, establish a consistent width for each bar, bearing in mind both the total
number of bars that will be drawn and the distance that will be left between
each pair of bars.

(iv) Next, establish a consistent gap size between each of the two bars by fixing
the distance between them.

(v) Following that, adjust the lengths of the various bars so that they are
proportional to the values of the data.

(vi) After that, draw the bars with the lengths and width determined in the previous
steps, and arrange them in order of their length or of the time at which the
events occurred.

(vii) Finally, shade or colour the bars: use the same colours or shades for bars
representing similar characteristics of the data, and different ones for bars
representing different characteristics.
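Steps (ii), (v) and (vi) above can be sketched in Python as a text-mode bar diagram. This is an illustration under stated assumptions: the branch sales figures are hypothetical, the bars are printed horizontally only because that is convenient in text output, and `max_length` stands in for the space available on the paper.

```python
import math


def bar_lengths(data, max_length=40):
    """Steps (ii), (v) and (vi) for a text bar diagram.

    data: dict mapping each label to a non-negative value (at least one > 0).
    max_length: characters available for the longest bar (the "page space").
    Returns (units_per_char, bars) where bars is a list of
    (label, value, length) sorted from longest to shortest.
    """
    largest = max(data.values())
    # Step (ii): choose a whole-number scale so the largest bar fits
    units_per_char = math.ceil(largest / max_length)
    # Step (v): each bar's length is proportional to its value
    bars = [(label, value, round(value / units_per_char))
            for label, value in data.items()]
    # Step (vi): arrange the bars in order of their length
    bars.sort(key=lambda bar: bar[2], reverse=True)
    return units_per_char, bars


def draw(data, max_length=40):
    """Print a horizontal bar diagram; every bar has the same width (one row)."""
    scale, bars = bar_lengths(data, max_length)
    print(f"Scale: one '#' = {scale} units")
    for label, value, length in bars:
        print(f"{label:<6}|{'#' * length} {value}")


# Hypothetical sales (in lakh rupees) of four branches
sales = {"North": 120, "South": 80, "East": 200, "West": 150}
draw(sales)
```

Rounding the scale up to a whole number keeps it easy to read off the diagram, at the cost of the longest bar sometimes falling slightly short of `max_length`.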

Advantages

The following is a summary of the primary benefits that can be gained from using a
bar diagram:

1. It is very easy to draw and very easy to read.

2. It can present a large number of facts on a single sheet of paper.

3. It is possible to draw it in both the vertical and horizontal planes.

4. It improves the overall appearance and makes comparisons easier.



Disadvantages

1. It cannot display many characteristics of the data at the same time.

2. The width of the bars is fixed arbitrarily by the person drawing the diagram.

Two Dimensional:

A two-dimensional shape has only width and height, with no thickness. Squares, circles,
triangles, etc. are two-dimensional shapes. Also referred to as "2D". In two-dimensional
diagrams, the area of the shape, rather than just its length, is proportional to the value it
represents.

Figure 3.5

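Three Dimensional:

A three-dimensional object, like any object in the real world, has height, width and depth;
your body, for example, has three dimensions. Also referred to as "3D". In three-dimensional
diagrams such as cubes, spheres and cylinders, the volume of the figure is proportional to the
value it represents.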

Figure 3.6.

Pie Chart

A pie chart is a circular chart that shows the relative sizes of data as "pie slices". Imagine
you conducted a survey among your friends to find out which types of movies they most enjoy
watching:

Figure 3.7

Table: Favorite Type of Movie

Movie type:          Comedy   Action   Romance   Drama   SciFi
Number of friends:   4        5        6         1       4

You may illustrate the data with a pie chart such as Figure 3.7.

A pie chart is a good way to represent relative sizes: it is easy to see which movie types are
liked most (Romance) and which are liked least (Drama).
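To draw the chart, each slice's central angle is its share of the total multiplied by 360 degrees. For the movie survey (20 friends in total), a short Python sketch gives the angles; `pie_angles` is a hypothetical helper name.

```python
def pie_angles(counts):
    """Central angle of each slice in degrees: value / total * 360."""
    total = sum(counts.values())
    # Multiply before dividing so whole-number results stay exact
    return {label: value * 360 / total for label, value in counts.items()}


movies = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
for label, angle in pie_angles(movies).items():
    print(f"{label:<8} {angle:6.1f} degrees")
```

Romance (6 of 20) receives 108 degrees and Drama (1 of 20) only 18 degrees, and the angles always add up to 360 degrees.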

3.4. SUMMARY
Diagrams offer a highly understandable representation of the facts. According to M.J. Moroney,
diagrams "record a meaningful impression practically before we think." This approach is used in
a very wide range of fields and specialised disciplines of research. A diagram must express its
content in a simple, uncomplicated manner.

The size of the paper being used must be taken into account in every diagram. A diagram should
indicate not only the location and source of the data but also the nature of the data being
presented. Multiple bar diagrams are used to compare two or more variables, as they represent
the relationships between the variables visually. In a sub-divided bar diagram, the values of
several components for a period are combined into a single bar. Bar diagrams are among the most
frequently used diagrams for presenting data.

The width of each bar is chosen with regard to the number of bars to be drawn and the size of
the paper. Next, the lengths of the bars are made proportional to the values of the data, and
the bars are arranged in order of their length or of the time at which the events occurred.

3.5. KEY WORDS


• Diagram: a visual representation that explains rather than represents, in particular: a
diagram that depicts how pieces are arranged and related to one another

• Graph: A graph is a diagram that depicts the connections between two or more objects.

• Dimension: a measurement taken in one direction, specifically of something's height,
length, or width

• Data: information about something that is relevant to calculations, logic, or planning



3.6. REVIEW QUESTIONS


Q1. What are the methods of creating bar diagrams?

Q2. Describe the various types of diagrams.

Q3. What is the importance of diagrams? Also mention their limitations.

3.7. REFERENCES FOR FURTHER READING


Bhardwaj R. S., Business Statistics, Excel Books.

Lindgren B.W (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.

Balwani Nitin, Quantitative Techniques, First Edition: 2002. Excel Books, New Delhi.

Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.

Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book
Company, New York.

Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Selvaraj R., Loganathan C., Quantitative Methods in Management.

Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.

Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.

Gupta S.P., Statistical Methods, Sultan Chand and Sons, New Delhi, 2008.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.

Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.

Wine R.L. (1976), Beginning Statistics, Winthrop Publishers Inc., Massachusetts.



UNIT 4: MEASURE OF CENTRAL TENDENCY

CONTENT:

▪ Objectives
Introduction
4.1. Meaning of Central Tendency
4.2. Uses of Central Tendency
4.3. Limitations of Central Tendency
4.4. Summary
4.5. Keywords
4.6. Review Questions
4.7. References for Further Reading

OBJECTIVES:
• To understand the concept of central tendency

• To learn the uses of central tendency

• To understand the limitations of central tendency



INTRODUCTION
A measure of central tendency is a single value that attempts to characterise a set of data by
identifying the central position within that set. Measures of central tendency are therefore
often called measures of central location, and they are also classified as summary statistics.
The distribution of any attribute $x_i$ (for example, a geographical attribute), where
$i = 1, 2, 3, \dots, n$ and $n$ is the number of observations, is first described statistically by
measuring two dimensions.

The first is central tendency, which is based on analysing where the observed values are
concentrated: the statistical concentration of the $n$ items of the variable $x_i$. The second is
dispersion, which shows the spread of the distribution.

Dispersion is an equally essential dimension of statistical analysis, because two variables may
have the same central tendency but different dispersion, or the same dispersion but different
central tendencies. Consequently, studying both the central value and the dispersion is crucial
to understanding the distributional properties of a series or variable.

4.1. MEANING OF CENTRAL TENDENCY


The central tendency is the "middle" or typical value of the data, and it can be calculated
using the mean, the median, or the mode. Which measure should be used depends on the
situation, because each of them is computed differently.

When we study a population with respect to a characteristic of interest, we may obtain a large
number of observations. It is hard to understand that characteristic by considering every
observation, so a single number that represents the whole group is preferred. This number must
be a good representative, giving a clear picture of all the observations, so that it can stand
for the whole set. Depending on the context, this central value is called an average, a measure
of central tendency, or a measure of location. Five averages are commonly used: the arithmetic
mean, median and mode are simple averages, while the geometric mean and harmonic mean are
special averages.
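Python's standard `statistics` module provides all five averages mentioned above, which makes it easy to compare them on the same data. The sketch uses a small hypothetical data set; `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Hypothetical observations, chosen only for illustration
data = [2, 4, 4, 8]

averages = {
    "arithmetic mean": statistics.mean(data),           # (2 + 4 + 4 + 8) / 4
    "median": statistics.median(data),                  # average of the two middle values
    "mode": statistics.mode(data),                      # most frequent value
    "geometric mean": statistics.geometric_mean(data),  # Python 3.8+
    "harmonic mean": statistics.harmonic_mean(data),
}

for name, value in averages.items():
    print(f"{name:>15}: {value:.4f}")
```

For positive data that are not all equal, the arithmetic mean exceeds the geometric mean, which in turn exceeds the harmonic mean; here they are 4.5, about 4.0 and about 3.56.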

4.2. USES OF CENTRAL TENDENCY


1. An average gives an overall picture of a series in a single figure, which is useful because
we cannot remember every fact associated with a subject of study.

2. An average gives a clear picture of the subject of study, which helps in drawing conclusions
and giving advice.

3. It briefly describes the performance of the group as a whole and allows us to compare the
typical performance of two or more groups.

4.3. LIMITATIONS OF CENTRAL TENDENCY


1. It is not possible to determine the mean for categorical data since the values cannot be
added together.

2. Because the mean considers every value in the distribution, it is susceptible to the effects
of outliers and distributions that are skewed.
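Both limitations can be seen with Python's standard `statistics` module. The salary and colour data below are hypothetical.

```python
import statistics

# Hypothetical monthly salaries (in thousands of rupees)
salaries = [30, 32, 35, 36, 38]
with_outlier = salaries + [300]  # one extreme value added

mean_before, median_before = statistics.mean(salaries), statistics.median(salaries)
mean_after, median_after = statistics.mean(with_outlier), statistics.median(with_outlier)
print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after}, median={median_after}")

# Categorical data cannot be added, so no mean exists,
# but the mode (most frequent category) is still defined.
colours = ["red", "blue", "red", "green", "red", "blue"]
print("modal colour:", statistics.mode(colours))
```

A single extreme value moves the mean from 34.2 to 78.5, while the median only moves from 35 to 35.5. Calling `statistics.mean` on the colour list would raise a `TypeError`, since strings cannot be averaged.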

4.4. SUMMARY
A measure of central tendency is a single value that attempts to characterise a set of data by
identifying the central position within that set. This single value can be thought of as an
average. For this reason, measures of central tendency are often called "measures of central
location", and they are also classified as summary statistics. The distribution of any attribute
$x_i$, where $i = 1, 2, 3, \dots, n$ is the serial number of an observation and $n$ is the number of
observations, is first described statistically by measuring two dimensions.

The first is central tendency, which is determined by examining where the observed values are
concentrated; it describes the statistical concentration of the $n$ items of the variable $x_i$.

The second is dispersion, which shows how widely the distribution is spread. Dispersion is an
equally important dimension of statistical analysis, since two variables may have the same
central tendency but different dispersion, or the same dispersion but different central
tendencies. Understanding the distributional features of a series or variable therefore requires
examining both its central value and its dispersion.

4.5. KEYWORDS
Central Tendency: A central tendency is a central or typical value for a probability distribution
in statistics. In common parlance, measures of central tendency are typically referred to as
averages. The term central tendency originated in the 1920s. The arithmetic mean, the median,
and the mode are the most popular measurements of central tendency.

Mean: There are various types of mean in mathematics, particularly statistics. The arithmetic
mean, often known as the arithmetic average, is a measure of the central tendency of a finite
set of numbers. Specifically, the arithmetic mean is the sum of the values divided by the number
of values.

Median: In statistics and probability theory, the median is the dividing line between the upper
and lower halves of a data sample, population, or probability distribution. It can be thought of
as "the middle" of a data set.

Mode: The mode is the value that occurs most frequently in a given data set. If X is a discrete
random variable, the mode corresponds to the value x at which the probability mass function
reaches its highest value. In other words, it is the most likely value to be sampled.

4.6. REVIEW QUESTIONS


1. Define central tendency

2. Explain the advantages and disadvantages of mean

3. Explain the advantages and disadvantages of median

4. Explain the advantages and disadvantages of mode



4.7. REFERENCES FOR FURTHER READING


Gupta S.P., Statistical Methods, Sultan Chand and Sons, New Delhi, 2008.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.

Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.

Wine R.L. (1976), Beginning Statistics, Winthrop Publishers Inc., Massachusetts.



UNIT 5: ARITHMETIC MEAN

CONTENT:
▪ Objectives
Introduction
5.1. Mean
5.2. Functions of mean
5.3. Arithmetic mean
5.4. Merits and Demerits
5.5. Types of mean
5.6. Summary
5.7. Key Words
5.8. Review Questions
5.9. References for Further Reading

OBJECTIVES:
• To understand the functions of mean and arithmetic mean

• To understand the types of mean



INTRODUCTION
Measures of central tendency are also called measures of central location, and the two terms
are often used interchangeably. They are also referred to as summary statistics. The mean, also
called the average, is probably the measure of central tendency you are most familiar with.

5.1 MEAN
Multiple definitions exist for the average of a distribution. Important definitions include:

"An average is an attempt to discover a single number that describes the entire set of numbers."

— Clark and Schkade

"The average is a value that is indicative or typical of a set of facts."

— Murray R. Spiegel

"An average is a single value within the range of the data that is used to represent all the
values in a series. Since an average is somewhere within the range of the data, it is frequently
referred to as a 'measure of central value'."

— Croxton and Cowden

"A central tendency measure is a characteristic value around which other figures cluster."

— Simpson and Kafka

5.2. FUNCTIONS OF THE MEAN


1. To present a large mass of data in a condensed form: it is extremely difficult for the human
mind to grasp a vast number of numerical values. An average summarises such data in a single
number, making it easier to understand and remember.

2. To facilitate comparison: the averages of different data sets can be compared with one
another. For instance, the mean earnings of workers at two factories can be used to compare
the wage levels of their employees.

3. To help in decision-making: most decisions in research, planning, etc. are based on the
average values of particular variables. For instance, if a company's average monthly sales fall,
the sales manager may need to take steps to increase them.

Qualities of a Good Average

A good average should possess the following qualities. It should be:

1. rigidly defined, preferably by an algebraic formula, so that different individuals obtain
the same value for the same set of data;

2. simple to calculate;

3. simple to understand;

4. based on all the observations;

5. capable of further algebraic treatment;

6. not unduly affected by extreme observations;

7. not significantly affected by sampling fluctuations.

5.3. ARITHMETIC MEAN


Mean = (sum of all observations) / (number of observations)

Before discussing the arithmetic mean, some notation is introduced. Suppose there are $n$
observations whose values are denoted by $X_1, X_2, \dots, X_n$. Their total $X_1 + X_2 + \dots + X_n$
is written in abbreviated form as $\sum_{i=1}^{n} X_i$, where $\sum$ (called sigma) is the summation
sign. The subscript $i$ of $X$ is a positive integer representing the serial number of the
observation. Since there are $n$ observations, $i$ ranges from 1 to $n$; this is expressed by writing
$i = 1$ below and $n$ above the $\sum$. When there is no ambiguity about the range of summation,
these limits can be omitted and we write $X_1 + X_2 + \dots + X_n = \sum X_i$.

The arithmetic mean is defined as the sum of the observations divided by the total number of
observations. It may be computed as a simple or a weighted arithmetic mean. In a simple
arithmetic mean all observations are given equal weight, whereas in a weighted arithmetic mean
different observations are given different weights.

For five individual items, Mean $= (X_1 + X_2 + X_3 + X_4 + X_5)/5$.

For example, for the values 5, 7, 9, 11 and 13, Mean $= (5 + 7 + 9 + 11 + 13)/5 = 45/5 = 9$.

Example: compute the mean of the following data: 5, 8, 10, 11, 13, 15, 20, 22, 25 and 26.

Here $N = 10$ and the sum is 155, so Mean = Sum/N = 155/10 = 15.5.
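The simple arithmetic mean translates directly into code. A minimal Python sketch reproducing both examples above (the function name is illustrative):

```python
def arithmetic_mean(values):
    """Simple arithmetic mean: sum of the observations / number of observations."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([5, 7, 9, 11, 13]))                      # 9.0
print(arithmetic_mean([5, 8, 10, 11, 13, 15, 20, 22, 25, 26]))  # 15.5
```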

The mean of Grouped Data

Grouped data are data obtained by arranging the individual observations of a variable into
discrete groups (classes); in other words, data that have been organised into specific
categories. The mean, regarded as the average value of a set of data, is the most popular
statistical measure. Because the individual values within each class are not known, the mean of
grouped data usually cannot be calculated exactly, but it can always be estimated. To understand
this concept better, let us learn more about the mean of grouped data and the techniques for
determining it, and work through a few examples.

What Is the Mean of Grouped Data?

The mean of grouped data is the average of a set of data that has been divided into groups. To
calculate it quickly, the data and their frequencies are first arranged in a frequency table.

The direct method, the assumed mean method, and the step deviation method are the three main
approaches for finding the mean of grouped data. For calculating the mean, each of these
approaches has its own formulas and techniques.

Formula for the Mean of Grouped Data

The mean is defined as the sum of the observations divided by the total number of observations.
There are separate formulas for the mean of ungrouped data and of grouped data. For grouped
data the formula is:

$\bar{x} = \frac{\sum f_i x_i}{N}$

$\bar{x}$ = the mean of the given data set

$x_i$ = the value of the $i$-th observation (for a class interval, its class mark)

$f_i$ = the frequency of $x_i$

$N = \sum f_i$ = the total frequency

In other words, each value is weighted by its frequency before averaging.

Direct Method

The direct method is the most straightforward way to determine the mean of grouped data. If the
observations (or class marks) $x_1, x_2, \dots, x_n$ occur with frequencies $f_1, f_2, \dots, f_n$, the
mean is

$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_n f_n}{f_1 + f_2 + f_3 + \dots + f_n} = \frac{\sum x_i f_i}{\sum f_i}$

Here are the steps for calculating the mean of grouped data using the direct method:

● Create a table with columns for the class interval, the frequency $f_i$, the class mark $x_i$,
and the product $x_i f_i$.

● Find the class mark (mid-point) of each class interval:
$x_i = (\text{upper class limit} + \text{lower class limit})/2$.

● Compute the mean using the formula Mean $= \sum x_i f_i / \sum f_i$, where $f_i$ is the frequency
and $x_i$ is the class mark.

Table 5.1

Class Interval 0-10 10-20 20-30 30-40 40-50

Frequency (fi) 9 13 8 15 10

Following the steps above, first create the columns:

Table 5.2

Class Interval   Frequency (fi)   Class mark (xi)   xifi
0-10             9                5                 45
10-20            13               15                195
20-30            8                25                200
30-40            15               35                525
40-50            10               45                450
Total            55                                 1415

Estimated mean $= \sum x_i f_i / \sum f_i = 1415/55 \approx 25.73$
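The direct-method calculation for Table 5.1 can be reproduced with a short Python sketch. The helper names `class_mark` and `grouped_mean_direct` are illustrative, not from any library.

```python
def class_mark(lower, upper):
    """Mid-point of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


def grouped_mean_direct(classes, freqs):
    """Direct method: mean = sum(f_i * x_i) / sum(f_i), x_i being the class mark."""
    marks = [class_mark(lower, upper) for lower, upper in classes]
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    return sum_fx / sum(freqs)


# Table 5.1
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [9, 13, 8, 15, 10]
print(round(grouped_mean_direct(classes, freqs), 2))  # 25.73
```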

DISCRETE SERIES EXAMPLES BY DIRECT METHOD AND SHORT CUT METHOD.


Find the arithmetic mean of X = 8, 9, 10, 11, 12 and 13, with frequencies f = 3, 5, 8, 12, 7
and 5 respectively, by the direct method and by the short-cut method.
By the direct method, Mean $= \sum fX / \sum f = 430/40 = 10.75$.

By the short-cut method, Mean $= A + \frac{\sum f\,dx}{N}$, where $A$ is the assumed mean,
$dx = X - A$ and $N = \sum f$.
Here we assume $A = 10$, so $\sum f\,dx = 3(-2) + 5(-1) + 8(0) + 12(1) + 7(2) + 5(3) = 30$ and
$N = 40$.

Hence Arithmetic Mean $= 10 + 30/40 = 10.75$.
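The short-cut method gives the same answer as the direct method for any choice of assumed mean $A$, because $\sum f(X - A) = \sum fX - AN$. A Python sketch checking this on the example above (function names are illustrative):

```python
def mean_direct(xs, fs):
    """Direct method: sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_short_cut(xs, fs, assumed):
    """Short-cut method: A + sum(f * dx) / N, where dx = X - A and N = sum(f)."""
    n = sum(fs)
    sum_fdx = sum(f * (x - assumed) for x, f in zip(xs, fs))
    return assumed + sum_fdx / n


xs = [8, 9, 10, 11, 12, 13]
fs = [3, 5, 8, 12, 7, 5]
print(mean_direct(xs, fs))         # 10.75
print(mean_short_cut(xs, fs, 10))  # 10.75
print(mean_short_cut(xs, fs, 12))  # 10.75 (a different A, same answer)
```

The assumed mean only changes the size of the numbers handled by hand; choosing a value near the middle of the data keeps the deviations small.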

Suppose we have to find the arithmetic mean of the following frequency distribution: marks
0-7, 7-14, 14-21, 21-28, 28-35, 35-42 and 42-49, with 19, 25, 36, 72, 51, 43 and 28 students
respectively.

By the direct method, Mean $= \sum fm / N$, where $m$ is the mid-value of each class and
$N = \sum f$ is the total frequency. Here $\sum fm = 7259$ and $N = 274$, hence
Mean $= 7259/274 \approx 26.49$.

By the short-cut method, we take the mid-values of the intervals, 3.5, 10.5, 17.5, 24.5, 31.5,
38.5 and 45.5. Taking the assumed mean as 24.5, we get $N = 274$ and $\sum f\,dx = 546$, hence
Mean $= 24.5 + 546/274 = 24.5 + 1.99 = 26.49$.
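The step-deviation method, the third approach named earlier, divides each deviation by the common class width $h$, so that $u = (m - A)/h$ and Mean $= A + h\sum fu/N$. The sketch below applies it to the marks example with $A = 24.5$ and $h = 7$. It illustrates that standard method and is not a calculation given in the text; the function name is illustrative.

```python
def mean_step_deviation(mids, fs, assumed, width):
    """Step-deviation method: A + h * sum(f * u) / N, where u = (m - A) / h."""
    n = sum(fs)
    sum_fu = sum(f * (m - assumed) / width for m, f in zip(mids, fs))
    return assumed + width * sum_fu / n


mids = [3.5, 10.5, 17.5, 24.5, 31.5, 38.5, 45.5]  # mid-values of 0-7, ..., 42-49
fs = [19, 25, 36, 72, 51, 43, 28]
print(round(mean_step_deviation(mids, fs, assumed=24.5, width=7), 2))  # 26.49
```

Here $\sum fu = 78$, and $24.5 + 7 \times 78/274 \approx 26.49$, the same result as the direct and short-cut methods.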

CONTINUOUS SERIES EXAMPLES
Calculate the mean from the following data: X: 1-10, 11-20, 21-30, 31-40, 41-50 and 51-60, with corresponding frequencies f = 3, 5, 8, 10, 9 and 5.

By the direct method and by the short cut method, Mean = 33.5

Calculate the mean from the following data: X: 10-19, 20-29, 30-39, 40-49 and 50-59, with corresponding frequencies f = 5, 8, 12, 8 and 7.

By the direct method and by the short cut method, Mean = 35.5
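The two answers above are stated without working. The Python sketch below reproduces them with the direct method; the helper name is hypothetical. These classes use inclusive limits such as 1-10 and 11-20. Converting them to class boundaries (0.5-10.5, 10.5-20.5, ...) leaves every class mark unchanged, so the sketch takes the midpoints of the stated limits directly.

```python
def mean_from_classes(classes, freqs):
    """Direct method; class mark = (lower + upper) / 2 using the stated limits."""
    marks = [(lo + hi) / 2 for lo, hi in classes]
    return sum(m * f for m, f in zip(marks, freqs)) / sum(freqs)


series_1 = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
freq_1 = [3, 5, 8, 10, 9, 5]
series_2 = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freq_2 = [5, 8, 12, 8, 7]
print(mean_from_classes(series_1, freq_1))  # 33.5
print(mean_from_classes(series_2, freq_2))  # 35.5
```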

5.4. MERITS AND DEMERITS OF ARITHMETIC MEAN


Merits:
i. It is rigidly defined.

ii. It is simple to understand and easy to compute.

iii. It becomes more accurate and dependable as the number of items increases.

iv. It is a calculated value and does not depend on the position of the items in the series.

v. It can be calculated even when the individual values are not known, provided their total and their number are known.

vi. Of all the averages, it is the least affected by sampling fluctuations.

vii. It provides a good basis for comparison.

Demerits:
i. It cannot be located graphically (from a frequency curve) or by inspection.

ii. It cannot be used to study qualitative phenomena that cannot be measured numerically, such as intelligence, beauty or honesty.

iii. It depends on every item, so no single item can be ignored without losing accuracy.

iv. It is strongly affected by extreme values.

v. It cannot be computed for a distribution with open-end classes.

vi. It may lead to false conclusions if the details of the data from which it is computed are not given.

5.5. TYPES OF MEAN:


Weighted Mean
A weighted mean is an average in which each data point is assigned a weight that reflects its relative importance. If all the weights are equal, the weighted mean coincides with the simple arithmetic mean.

Like the arithmetic mean and the sample mean, the weighted mean is a measure of central tendency that summarises the data with a single value.

The weighted mean is used in place of the simple arithmetic mean when the items in the data are not of equal importance.

Weighted means generally behave much like arithmetic means, but a few of their properties run counter to intuition. Items with a larger weight contribute more to the weighted mean than items with a smaller weight.

Weights cannot be negative. Some weights may be zero, but not all of them: the sum of the weights is the divisor, and division by zero is not permitted. Weighted means play a significant role in data analysis and in weighted differential and integral calculus.

Formula:

$W = \dfrac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$

where

W = weighted average

n = number of terms to be averaged

wi = weight applied to each X value

Xi = data values to be averaged

Suppose a student scored the following marks out of 100: English 35, Maths 80, Accounts 90, Economics 45 and Commerce 55. His simple average is 305/5 = 61 marks.

In practice, however, he devotes 3 hours to English, 2 hours each to Maths and Accounts, and 1 hour each to Economics and Commerce. Using these hours as weights, ΣW = 3 + 2 + 2 + 1 + 1 = 9

ΣWX = 35×3 + 80×2 + 90×2 + 45×1 + 55×1 = 545

Weighted Mean = 545/9 = 60.56 marks (approximately)
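A minimal Python sketch of the weighted mean formula follows, applied to the student example. The function name is hypothetical. The checks on the weights mirror the rules stated above: weights must be non-negative and not all zero. The last line shows that equal weights give back the simple arithmetic mean.

```python
def weighted_mean(values, weights):
    """W = sum(w_i * X_i) / sum(w_i); weights must be non-negative, not all zero."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights cannot be negative")
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * x for x, w in zip(values, weights)) / total_w


marks = [35, 80, 90, 45, 55]  # English, Maths, Accounts, Economics, Commerce
hours = [3, 2, 2, 1, 1]       # study hours used as weights
print(sum(marks) / len(marks))                # 61.0
print(round(weighted_mean(marks, hours), 2))  # 60.56
print(weighted_mean(marks, [1] * 5))          # 61.0 (equal weights)
```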

5.6. SUMMARY
An average is a single value, lying within the range of the data, that is used to represent all the values in a series. Because it usually lies near the centre of the distribution, an average is often called a "measure of central tendency". In research and planning, most decisions depend on the average values of the relevant variables.

The arithmetic mean is calculated by dividing the sum of all the observations by the number of observations. It may be computed either as a simple arithmetic mean or as a weighted arithmetic mean.

The direct method, the assumed mean (short cut) method and the step deviation method are the three most commonly used approaches for calculating the mean of grouped data, and the direct method is the most straightforward of them. Its first step is to create a table with the columns class interval, frequency (fi), class mark (xi) and xifi. The mean is then Σxifi/Σfi, where xi is the midpoint of the class interval and fi is its frequency. When the items are not of equal importance, the weighted mean is used in place of the simple arithmetic mean.

Weighted means generally behave like arithmetic means; however, some of their properties run counter to what would normally be expected.

5.7. KEY WORDS:


Arithmetic mean: The arithmetic mean, also known as the average or mean value, is obtained by adding a set of numbers and dividing the total by how many numbers there are. It is one of the most important measures in statistics.

Central tendency: A central tendency is a central or typical value for a probability distribution
in statistics. In common parlance, measures of central tendency are typically referred to as
averages. The term central tendency originated in the 1920s. The arithmetic mean, the median,
and the mode are the most popular measurements of central tendency.

Grouped data: Grouped data are those that have been created by grouping together individual
observations of a variable in a way that makes it possible to summarise or analyse the data
easily using a frequency distribution table of the groups.

5.8. REVIEW QUESTIONS:


● What functions does an average serve? Discuss the relative advantages and disadvantages of the various statistical averages.

● State the requisites of a good measure of "Central Tendency." In what situations would a geometric mean or a harmonic mean be preferable to the arithmetic mean?

5.9. REFERENCE FOR FURTHER READINGS


Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book Company, New York.

Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Levin R.I. (2002), Statistics for Management, Pearson Education Asia, New Delhi.

Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Selvaraj R. and Loganathan C., Quantitative Methods in Management.

Sharma J.K. (2009), Business Statistics, Pearson Education Asia, New Delhi.

UNIT 6: MEDIAN

CONTENT:
▪ Objectives
Introduction
6.1. Median of the grouped data
6.2. Ungrouped or raw form
6.3. Cumulative frequency
6.4. Continuous series
6.5. Merits of median
6.6. Demerits of median
6.7. Summary
6.8. Key Words
6.9. Review Questions
6.10. Reference for further Reading

OBJECTIVES:
• To understand the median of the grouped and ungrouped data

• To learn about the merits and demerits of median

• To learn more about cumulative frequency



INTRODUCTION
When values are arranged in ascending or descending order, the median is the middle value of the distribution: half of the observations lie on either side of it. In a distribution with an odd number of observations, the median is the single middle value.

For example, in the following distribution of retirement ages (11 observations), the median is the middle (6th) value, 57 years:

54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60

When the number of observations in a distribution is even, the median value is the mean of the
two middle values. The two middle values in the following distribution are 56 and 57, hence
the median is 56.5 years:

52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60 , 60

Median-Individual Series

• Arrange the data in ascending order.

• Apply the formula Median = size of the ((N+1)/2)th item.

• Example: find the median of 5, 8, 10, 15, 13, 11, 13, 18, 20, 12.

• In increasing order the data are 5, 8, 10, 11, 12, 13, 13, 15, 18, 20. Here N = 10, so by the formula the median is the (10+1)/2 = 5.5th item.

• The median is therefore the mean of the 5th and 6th items, i.e. the average of 12 and 13 = 12.5

• Hence Median = 12.5

• Exercise: calculate the median of 10, 12, 14, 16, 15, 9, 13, 17, 20.
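A minimal Python sketch of the individual-series rule follows; the function name is hypothetical. After sorting, an odd N returns the single middle item, and an even N returns the mean of the two middle items. Python's standard `statistics.median` applies the same rule.

```python
def median_raw(data):
    """Median of ungrouped data: the ((N+1)/2)th item of the sorted values."""
    values = sorted(data)
    n = len(values)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return values[mid]  # odd N: the single middle item
    return (values[mid - 1] + values[mid]) / 2  # even N: mean of the two middle items


print(median_raw([5, 8, 10, 15, 13, 11, 13, 18, 20, 12]))        # 12.5
print(median_raw([54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]))  # 57
```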

Median-Discrete Series

• Find the median of the following data: X (marks) 10, 11, 13, 15 and 16 with f (number of students) 5, 8, 12, 9 and 6 respectively.

• Compute the cumulative frequencies (5, 13, 25, 34, 40) and apply the formula Median = size of the ((N+1)/2)th item.

• Here N = 40, so the median is the 20.5th item, which first falls within the cumulative frequency 25.

• The value of X against it is 13.

• Hence Median = 13
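The cumulative frequency procedure for a discrete series can be sketched in Python as below; the function name is hypothetical. It walks the running totals produced by `itertools.accumulate` and returns the first X whose cumulative frequency reaches the ((N+1)/2)th position. The sketch uses "reaches" (>=) rather than "exceeds" so that a whole-number position falling exactly on a class total is still counted in that class.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Locate the ((N+1)/2)th item using cumulative frequencies."""
    n = sum(freqs)
    position = (n + 1) / 2
    for x, cf in zip(values, accumulate(freqs)):
        # First cumulative frequency that reaches the median position
        if cf >= position:
            return x
    raise ValueError("frequencies must be non-empty and positive")


marks = [10, 11, 13, 15, 16]
students = [5, 8, 12, 9, 6]
print(median_discrete(marks, students))  # 13
```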

Median-Continuous Series

• Find the median of the following data: marks 0-7, 7-14, 14-21, 21-28, 28-35, 35-42 and 42-49, with f (number of students) 19, 25, 36, 72, 51, 43 and 28 respectively.

• In a continuous series the median is the (N/2)th item. Here N = 274, so it is the 137th item.

• The 137th item lies in the cumulative frequency 152, i.e. in the interval 21-28.

• Median = L1 + ((N/2 - c.f.)/f) × i

• Here L1 = lower limit of the median class interval, N/2 = the median item, c.f. = cumulative frequency of the class preceding the median class, f = frequency of the median class and i = width of the median class interval.

• Here L1 = 21, N/2 = 137, c.f. = 80, f = 72 and i = 7.

• Hence Median = 21 + (57 × 7)/72 = 21 + 5.54 = 26.54

• Find the median of the following wage distribution for a certain factory: monthly wages in Rupees 50-80, 80-100, 100-110, 110-120, 120-130, 130-150, 150-180 and 180-200, with 30, 127, 140, 240, 176, 135, 20 and 3 workers respectively.

• Here N = 871, so the median is the (N/2)th = 435.5th item.

• The 435.5th item lies in the cumulative frequency 537, i.e. in the class interval 110-120.

• L1 = 110, N/2 = 435.5, c.f. = 297, i = 10 and f = 240.

• Median = 110 + ((435.5 - 297)/240) × 10 = 110 + 5.77 = 115.77

• Find the median of the following data:

• X: below 40, 40-50, 50-60, 60-70, 70 and above

• Respective frequencies: 5, 8, 12, 3 and 2. The cumulative frequencies are 5, 13, 25, 28 and 30.

• N = 30, so the median is the 15th item, which lies in the interval 50-60.

• Here i = 10, L1 = 50, N/2 = 15, c.f. = 13 and f = 12.

• Median = 50 + ((15 - 13)/12) × 10 = 50 + 1.67

• Median = 51.67
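The interpolation formula Median = L1 + ((N/2 - c.f.)/f) × i can be sketched in Python as below, checked against the three continuous-series examples above. The function name is hypothetical. Encoding open-end limits as `None` is an assumption of this sketch. The median can still be computed for an open-ended distribution, as long as the median class itself is closed.

```python
from itertools import accumulate


def median_continuous(classes, freqs):
    """Median = L1 + ((N/2 - c.f.) / f) * i for a continuous series.

    classes: list of (lower, upper) limits; an open-end limit may be None
    provided it does not belong to the median class.
    """
    half = sum(freqs) / 2
    previous_cf = 0
    for (lower, upper), f, cf in zip(classes, freqs, accumulate(freqs)):
        if cf > half:  # first class whose cumulative frequency exceeds N/2
            if lower is None or upper is None:
                raise ValueError("median class is open-ended")
            return lower + (half - previous_cf) / f * (upper - lower)
        previous_cf = cf
    raise ValueError("could not locate the median class")


marks = [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35), (35, 42), (42, 49)]
wages = [(50, 80), (80, 100), (100, 110), (110, 120),
         (120, 130), (130, 150), (150, 180), (180, 200)]
open_ended = [(None, 40), (40, 50), (50, 60), (60, 70), (70, None)]
print(round(median_continuous(marks, [19, 25, 36, 72, 51, 43, 28]), 2))        # 26.54
print(round(median_continuous(wages, [30, 127, 140, 240, 176, 135, 20, 3]), 2))  # 115.77
print(round(median_continuous(open_ended, [5, 8, 12, 3, 2]), 2))               # 51.67
```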

Median-Comparing groups

• The following marks were obtained by 12 students in three statistics papers.

• Paper A - 60, 56, 41, 46, 54, 59, 55, 51, 52, 44, 37, 39

• Paper B - 58, 54, 21, 51, 59, 46, 65, 31, 68, 41, 70, 36

• Paper C - 65, 55, 26, 40, 30, 74, 45, 29, 85, 32, 80, 39

• After sorting each paper, the median is the average of the 6th and 7th values:

• Median A = (51+52)/2 = 51.5

• Median B = (51+54)/2 = 52.5

• Median C = (40+45)/2 = 42.5

• Hence the level of performance is highest in Paper B, since its median is the highest.
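The comparison can be checked with Python's standard `statistics.median`. For an even number of items, it returns the mean of the two middle values. The dictionary layout below is only one convenient way to hold the three papers.

```python
import statistics

papers = {
    "A": [60, 56, 41, 46, 54, 59, 55, 51, 52, 44, 37, 39],
    "B": [58, 54, 21, 51, 59, 46, 65, 31, 68, 41, 70, 36],
    "C": [65, 55, 26, 40, 30, 74, 45, 29, 85, 32, 80, 39],
}
# Median of each paper, then the paper with the highest median
medians = {name: statistics.median(marks) for name, marks in papers.items()}
best = max(medians, key=medians.get)
print(medians)  # {'A': 51.5, 'B': 52.5, 'C': 42.5}
print(best)     # B
```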

Benefit of the median:

The median is less affected by outliers and skewed data than the mean, which makes it the preferred measure of central tendency when the distribution is not symmetrical.

Restrictions on the median:

The median cannot be found for nominal categorical data, because such data have no natural order.

6.1. MEDIAN OF THE GROUPED DATA


In grouped data, the median cannot be read off as a particular observation from the cumulative frequencies. It lies somewhere inside a class interval, so we must find the value within that interval which divides the whole distribution into two equal halves. To do this, we first identify the median class.

To determine the median class, calculate the cumulative frequency of each class and n/2. The median class is the first class whose cumulative frequency is greater than n/2.

After determining the median class, use the following formula to calculate the median:

$\text{Median} = l + \left(\dfrac{\frac{n}{2} - cf}{f}\right) \times h$

where

l represents the lower limit of the median class

n represents the total number of observations

f represents the frequency of the median class

h represents the class size

cf represents the cumulative frequency of the class preceding the median class.

Table 6.1

Height (in cm) Number of Students

Less than 140 4

Less than 145 11

Less than 150 29

Less than 155 40

Less than 160 46

Less than 165 51

To find the median height, we must first determine the class intervals and their frequencies.

The given distribution is of the "less than" type, with upper class limits 140, 145, ..., 165. Thus, the classes are below 140, 140-145, 145-150, 150-155, 155-160 and 160-165.

From the given distribution:

4 students have heights below 140 cm, so the frequency of the class below 140 is 4.

11 students have heights below 145 cm, and 4 have heights below 140 cm.

Therefore, the frequency of the class interval 140-145 is 11 - 4 = 7

Similarly, the frequency of 145-150 is 29 - 11 = 18

the frequency of 150-155 is 40 - 29 = 11

the frequency of 155-160 is 46 - 40 = 6

the frequency of 160-165 is 51 - 46 = 5

Consequently, a frequency distribution table and cumulative frequencies are shown below:

Table 6.2

Class Interval Frequency Cumulative Frequency

Below 140 4 4

140-145 7 11

145-150 18 29

150-155 11 40

155-160 6 46

160-165 5 51

Here, n = 51.

Therefore, n/2 = 51/2 = 25.5

The 25.5th observation lies in the class interval 145-150, the first class whose cumulative frequency (29) exceeds 25.5. This class is therefore the median class.

Therefore,

Lower class limit, l = 145

Class size, h = 5

Frequency of the median class, f = 18

Cumulative frequency of the class preceding the median class, cf = 11.

Using the formula for the median of grouped data:

Median = 145 + ((25.5 - 11)/18) × 5

Median = 145 + (72.5/18)

Median = 145 + 4.03

Median = 149.03

Therefore, the median height for the given data is 149.03 cm.
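The whole procedure, from a "less than" cumulative table to the median, can be sketched in Python as below. The function name is hypothetical. It assumes equal class widths. For an open "below" first class, the lower limit upper_limits[0] - width is an assumption of the sketch, not given by the data. That limit would only matter if the first class were the median class.

```python
def median_from_less_than(upper_limits, less_than_counts, width):
    """Median from a 'less than' cumulative distribution with equal class width.

    The lower limit of class k is taken as upper_limits[k] - width; for an open
    'below' first class this is an assumption, not given by the data.
    """
    # Recover the class frequencies by differencing the cumulative counts
    freqs = [less_than_counts[0]]
    for k in range(1, len(less_than_counts)):
        freqs.append(less_than_counts[k] - less_than_counts[k - 1])

    half = less_than_counts[-1] / 2  # n/2, since the last count is n
    for k, cf in enumerate(less_than_counts):
        if cf > half:  # median class found
            lower = upper_limits[k] - width
            cf_before = less_than_counts[k - 1] if k > 0 else 0
            median = lower + (half - cf_before) / freqs[k] * width
            return median, freqs
    raise ValueError("could not locate the median class")


median, freqs = median_from_less_than(
    [140, 145, 150, 155, 160, 165], [4, 11, 29, 40, 46, 51], width=5
)
print(freqs)             # [4, 7, 18, 11, 6, 5]
print(round(median, 2))  # 149.03
```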

6.2. UNGROUPED OR RAW FORM


Arrange the values in increasing order, from lowest to highest. If there is an odd number of values, the median is the value in the centre; if there is an even number of values, the median is the mean of the two middle values.

Formula

When n is odd, Median = value of the ((n+1)/2)th item

When n is even, Median = average of the (n/2)th and ((n/2)+1)th items

Example

If the weights of sorghum ear-heads are 45, 60, 48, 100 and 65 g, calculate the median.

Solution

Here n = 5

First arrange the values in ascending order: 45, 48, 60, 65, 100

Median = ((5+1)/2)th item = 3rd item = 60 g

Example

If the weights of sorghum ear-heads are 5, 48, 60, 65, 65 and 100 g, calculate the median.

Solution
Here n = 6

Median = average of the 3rd and 4th items = (60 + 65)/2 = 62.5 g

6.3. CUMULATIVE FREQUENCY


The cumulative frequency of each class is obtained by adding the frequency of that class to the frequencies of all the classes before it, starting with the first class and working down to the last. The last (highest) cumulative frequency equals the total number of items.

For a discrete series, the median is found as follows:

Step 1: Find the cumulative frequencies.

Step 2: Find (N+1)/2.

Step 3: See in the cumulative frequency column the first value greater than (N+1)/2.

Step 4: The corresponding value of x is the median.

6.4. CONTINUOUS SERIES


To calculate the median of a continuous series, the following steps are used.

Step 1: Find the cumulative frequencies.

Step 2: Find n/2.

Step 3: See in the cumulative frequency column the first value greater than n/2. The corresponding class interval is called the median class.

Then apply the formula

$\text{Median} = l + \left(\dfrac{\frac{n}{2} - m}{f}\right) \times c$

where

l: lower limit of the median class

m: cumulative frequency preceding the median class

c: width of the class

f: frequency in the median class

n: total frequency

6.5. MERITS OF MEDIAN


1. As a positional average, the median is not affected by extreme values.

2. The median can be computed for a distribution with open-end intervals.

3. The median can be found even when the data are incomplete, provided the number of items and the middle items are known.

6.6. DEMERITS OF MEDIAN


1. The median may change considerably with even a minor change in the series.

2. When the number of items is even, or the series is continuous, the median is an estimate that may not coincide with any actual value in the series.

3. It is not suitable for further algebraic treatment; apart from the calculation of mean deviation, it has little use in further mathematical analysis.

4. It is not based on all the observations.

6.7. SUMMARY:
The median is the middle value of a distribution arranged in ascending or descending order. In the distribution of retirement ages with 11 observations, for example, it is the 6th value, 57 years. The median is less affected by outliers and skewed data than the mean, which makes it the preferred measure of central tendency when the distribution is not symmetrical.

For grouped data, the class frequencies are found first. From a "less than" cumulative distribution, for example, the frequency of the class 140-145 is 11 - 4 = 7 and that of 145-150 is 29 - 11 = 18. Cumulative frequencies are then formed, and the last (highest) of them equals the total number of items. The median class is the first class whose cumulative frequency exceeds n/2, and the median is obtained by interpolating within that class.

6.8. KEY WORDS:


Grouped data: Grouped data are obtained by organising individual observations of a variable into groups and presenting them in a frequency distribution table. Such a table provides a convenient means of summarising and analysing the data.

Median : The median is the number in the middle of a list of numbers arranged in ascending or descending order. It is often more descriptive of a data set than the mean.

Cumulative Frequency: The cumulative frequency of a class is found by adding its frequency to the total of the frequencies of the preceding classes in a frequency distribution table. Because every frequency has been added to the running total, the final cumulative frequency always equals the total number of observations.

Class Interval : The difference between the upper class limit and the lower class limit is referred to as the size of the class interval. For instance, for the class 26-30 the size of the class interval is 30 - 26 = 4.

6.9. REVIEW QUESTIONS:


Q1. Explain, with an example, how the median of grouped data is calculated.

Q2. Explain the advantages and disadvantages of the median.

6.10. REFERENCE FOR FURTHER READINGS


Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book Company, New York.

Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Levin R.I. (2002), Statistics for Management, Pearson Education Asia, New Delhi.

Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Selvaraj R. and Loganathan C., Quantitative Methods in Management.

Sharma J.K. (2009), Business Statistics, Pearson Education Asia, New Delhi.

Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons and Co. Private Limited, Bombay.

UNIT 7: MODE

CONTENT:
▪ Objectives
Introduction
7.1. Continuous Distribution
7.2. Percentile
7.3. Quartile
7.4. Summary
7.5. Key Words
7.6. Review Questions
7.7. Reference for further reading

OBJECTIVES:
• Concept of mode and its use

• Concept of Percentile and Quartile

• Gain some knowledge about continuous distributions



INTRODUCTION
The mode of a variable is the value that occurs most frequently, i.e. the value of the variable corresponding to the greatest frequency in the distribution. In any series, the item that is repeated most often is the most typical one. The mode can often be determined simply by examining the data: all that is required is to identify the item with the highest frequency of repetition.

• Mode: the most frequently occurring value, i.e. the value repeated the maximum number of times in the data (for example: What is the most popular course amongst Shoolini students? What is the most common car brand?).

• The mode is the value around which the items are most heavily concentrated.

If two or more values occur with the same highest frequency, each of them is considered a mode. A disadvantage of the mode as a measure of central tendency is that a data set may have no mode or several modes.

By contrast, a data set always has exactly one mean and one median. The adjective "modal" describes the number of modes a data set has. If a data set has only one value that occurs most frequently, it is called unimodal. Similarly, a data set with two values that occur most frequently is called bimodal, and a set with more than two values sharing the highest frequency is called multimodal. Finding the mode of an ungrouped data set requires no calculation, only careful observation. Consider the following example:

On a congested roadway, the posted speed limit is 80 kilometres per hour. The following are the speeds (in kilometres per hour) of vehicles that were stopped for exceeding the speed limit:

96 101 99 100 98 103 99 102 95

What is the mode?

There is no need to arrange the data, although ordering the numbers from smallest to largest can make the mode easier to identify. The number 99 appears twice in the data set above, while every other number appears only once. Since 99 appears most frequently, it is the mode of the data values.

Mode = 99 kilometres per hour

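• In an individual series, let us compute the mode.

• Consider the series X: 5, 8, 10, 11, 15, 18, 20, 22, 25, 30

• Y: 4, 8, 10, 12, 13, 10, 15, 12, 13, 12

• Z: 3, 5, 8, 12, 12, 13, 15, 13, 17, 19

• Series X has no mode, since no value is repeated.

• The mode of series Y is 12, which occurs three times.

• Series Z is bimodal: 12 and 13 each occur twice.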
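A minimal Python sketch for finding the mode(s) of ungrouped data uses `collections.Counter`; the function name is hypothetical. It follows the convention used above: a series in which every value occurs once has no mode. Python 3.8+ also provides `statistics.multimode`, but that function returns every value when none repeats, so it treats series X differently from the text.

```python
from collections import Counter


def modes(data):
    """Return all values with the highest frequency; [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())  # raises ValueError for empty data
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(value for value, c in counts.items() if c == top)


print(modes([5, 8, 10, 11, 15, 18, 20, 22, 25, 30]))    # []
print(modes([4, 8, 10, 12, 13, 10, 15, 12, 13, 12]))    # [12]
print(modes([3, 5, 8, 12, 12, 13, 15, 13, 17, 19]))     # [12, 13]
print(modes([96, 101, 99, 100, 98, 103, 99, 102, 95]))  # [99]
```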

Determining the Mode of Grouped Data

When calculating the mode of a grouped frequency distribution, we first determine the modal class, i.e. the class with the highest frequency. We then calculate the mode using the following formula:

$\text{Mode} = l + \left(\dfrac{f_m - f_1}{2f_m - f_1 - f_2}\right) \times h$

where l = lower limit of the modal class

fm = frequency of the modal class

f1 = frequency of the class preceding the modal class

f2 = frequency of the class succeeding the modal class

h = width of the modal class

To compute the mode of a grouped or continuous frequency distribution with equal class intervals, we can use the following steps:

Step 1: Prepare the frequency distribution table with the observations in the first column and the respective frequencies in the second column.

Step 2: Determine by inspection the class with the maximum frequency. This class is known as the modal class.

Step 3: Calculate the mode using the formula above.

Example

Table 7.1

Height (in cm) 125-130 130-135 135-140 140-145 145-150

Number of students 7 14 10 10 9

The maximum frequency in this case is 14, and the corresponding class is 130-135. Therefore, 130-135 is the modal class, such that l = 130, h = 5, fm = 14, f1 = 7 and f2 = 10.

Mode = 130 + ((14 - 7)/(2 × 14 - 7 - 10)) × 5 = 130 + (7/11) × 5 = 130 + 3.18 = 133.18 cm
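A minimal Python sketch of the grouped-mode formula follows, assuming equal class widths; the function name is hypothetical. Two choices in the sketch are assumptions not stated in the text. A missing neighbouring class is treated as having frequency 0. When two classes tie for the highest frequency, the first one is taken as the modal class. The checks also cover the two continuous-series exercises given later in this unit, which use the same formula.

```python
def grouped_mode(lowers, width, freqs):
    """Mode = l + (fm - f1) / (2*fm - f1 - f2) * h for equal-width classes."""
    # Modal class: index of the highest frequency (first one if tied)
    k = max(range(len(freqs)), key=lambda j: freqs[j])
    fm = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0               # preceding class, 0 if none
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # succeeding class, 0 if none
    denominator = 2 * fm - f1 - f2
    if denominator == 0:
        raise ValueError("formula undefined: modal class ties with both neighbours")
    return lowers[k] + (fm - f1) / denominator * width


print(round(grouped_mode([125, 130, 135, 140, 145], 5, [7, 14, 10, 10, 9]), 2))  # 133.18
print(round(grouped_mode([0, 10, 20, 30, 40], 10, [5, 8, 12, 6, 4]), 2))         # 24.0
print(grouped_mode([0, 20, 40, 60, 80], 20, [5, 10, 15, 12, 4]))                 # 52.5
```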

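7.1. CONTINUOUS DISTRIBUTION


Locate the highest frequency; the class corresponding to that frequency is called the modal class. Then apply the formula

$\text{Mode} = l + \dfrac{f_2}{f_1 + f_2} \times c$

where l = lower limit of the modal class

f1 = the frequency of the class preceding the modal class

f2 = the frequency of the class succeeding the modal class

and c = class interval

Example:

For the frequency distribution of weights of sorghum ear-heads given in the table below, calculate the mode.

Table 7.2

Weights of ear-heads (g) Number of ear-heads (f)

60-80 22

80-100 38

100-120 45

120-140 35

140-160 20

Total 160

Solution

The highest frequency is 45, so the modal class is 100-120.

Here, l = 100, f1 = 38, f2 = 35, c = 20

Mode = 100 + (35/(38 + 35)) × 20

= 100 + 9.589 = 109.589 g

This formula uses only the frequencies of the classes on either side of the modal class. The formula given earlier under "Determining the Mode of Grouped Data" also uses the modal frequency, and for the same data it gives 100 + (7/17) × 20 = 108.24 g, so the two formulas do not always agree.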
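The two mode formulas can be compared side by side on Table 7.2 with the Python sketch below; both function names are hypothetical. The first implements the neighbour-frequency formula of this section. The second implements the formula that also uses the modal frequency. The sketch shows that they give different answers for the same data.

```python
def mode_neighbour_formula(lower, f_before, f_after, width):
    """Mode = l + f2 / (f1 + f2) * c (uses only the neighbouring frequencies)."""
    return lower + f_after / (f_before + f_after) * width


def mode_modal_frequency_formula(lower, f_modal, f_before, f_after, width):
    """Mode = l + (fm - f1) / (2*fm - f1 - f2) * h, for comparison."""
    return lower + (f_modal - f_before) / (2 * f_modal - f_before - f_after) * width


# Table 7.2: modal class 100-120 (f = 45), neighbours 38 and 35, c = 20
print(round(mode_neighbour_formula(100, 38, 35, 20), 3))            # 109.589
print(round(mode_modal_frequency_formula(100, 45, 38, 35, 20), 2))  # 108.24
```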

• Mode in a continuous series

• X: 0-10, 10-20, 20-30, 30-40, 40-50

• f: 5, 8, 12, 6, 4

• The mode lies in the interval 20-30, as it has the maximum frequency, 12.

• Here f0 is the frequency of the preceding class, f1 that of the modal class and f2 that of the succeeding class: L1 = 20, f0 = 8, f1 = 12, f2 = 6, i = 10

• Applying the formula Mode = L1 + ((f1 - f0)/(2f1 - f0 - f2)) × i, we get Mode = 20 + (4/(4+6)) × 10 = 24

• Calculate the mode from the following data: X: 0-20, 20-40, 40-60, 60-80, 80-100 with f: 5, 10, 15, 12, 4

• The mode lies in the interval with the highest frequency, i.e. 40-60.

• We have L1 = 40, f0 = 10, f1 = 15, f2 = 12 and i = 20.

• By the formula, Mode = 40 + (5/8) × 20 = 40 + 12.5

Mode = 52.5

• Mode in a discrete series by the inspection method.

• We have X: 5, 6, 7, 8, 10, 11

• f: 4, 8, 12, 15, 11, 4

• By inspection, the mode is 8, since 15 is the highest frequency.

Relation between Mean, Median and Mode

• In a symmetrical distribution, Mean = Median = Mode.

• In a moderately asymmetrical distribution, the empirical relation Mode = 3 Median - 2 Mean holds approximately.

• Example: Mean = 45 and Mode = 41; calculate the median.

• Applying the formula Mode = 3(Median) - 2(Mean)

• 41 = 3(Median) - 2(45)

• 3(Median) = 41 + 90 = 131

• Median = 43.67

• Exercise: Mean = 30, Median = 40; calculate the mode.

• Exercise: Mode = 16, Mean = 15.60; calculate the median.

• Exercise: Median = 13, Mode = 14; calculate the mean.
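The empirical relation can be rearranged to give any one of the three averages from the other two. The small Python sketch below does exactly that; the helper names are hypothetical. Because the relation is only approximate for moderately skewed distributions, its results are estimates. The usage line reproduces the worked example; the exercises are left to the reader.

```python
def median_from_mean_mode(mean, mode):
    """Mode = 3*Median - 2*Mean  =>  Median = (Mode + 2*Mean) / 3."""
    return (mode + 2 * mean) / 3


def mode_from_mean_median(mean, median):
    """Mode = 3*Median - 2*Mean."""
    return 3 * median - 2 * mean


def mean_from_median_mode(median, mode):
    """Mean = (3*Median - Mode) / 2."""
    return (3 * median - mode) / 2


print(round(median_from_mean_mode(45, 41), 2))  # 43.67
```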

7.2. PERCENTILE
The percentiles divide the distribution of the data into 100 equal parts, each representing 1% of the entire distribution. The xth percentile is the value below which x percent of the observations fall. Note that the median is the 50th percentile: it divides the distribution into two halves of 50% each.

For raw data, first arrange the n observations in increasing order. Then the xth percentile is given by

$P_x = \text{value of the } \left(\dfrac{x(n+1)}{100}\right)^{\text{th}} \text{ item}$

For a frequency distribution, the xth percentile is given by

$P_x = l + \left(\dfrac{\frac{x \cdot n}{100} - cf}{f}\right) \times c$

where

l = lower limit of the percentile class, i.e. the class which contains the (x·n/100)th value

cf = cumulative frequency up to the class preceding the percentile class

f = frequency of the percentile class

c = class interval

n = total number of observations
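The frequency-distribution formula for P_x can be sketched in Python as below; the function name is hypothetical. The sketch uses the marks table from the median section as test data. With x = 50 it should reproduce the median found there (26.54), since the median is the 50th percentile.

```python
from itertools import accumulate


def percentile_grouped(x, classes, freqs):
    """P_x = l + ((x*n/100 - cf) / f) * c for a frequency distribution."""
    if not 0 <= x <= 100:
        raise ValueError("x must lie between 0 and 100")
    n = sum(freqs)
    target = x * n / 100  # position of the xth percentile
    cf_before = 0
    for (lower, upper), f, cf in zip(classes, freqs, accumulate(freqs)):
        if cf >= target and f > 0:  # percentile class found
            return lower + (target - cf_before) / f * (upper - lower)
        cf_before = cf
    raise ValueError("no class with a positive frequency")


marks = [(0, 7), (7, 14), (14, 21), (21, 28), (28, 35), (35, 42), (42, 49)]
students = [19, 25, 36, 72, 51, 43, 28]
print(round(percentile_grouped(50, marks, students), 2))  # 26.54
print(round(percentile_grouped(25, marks, students), 2))  # 18.76
```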

Percentile for Raw Data


The following are the paddy yields (kg/plot) from 14 plots, arranged in ascending order:
30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62 and 65

The computation of the 25th percentile (Q1) and the 75th percentile (Q3) is given below:

P25 = value of the (25 × 15/100)th item = 3.75th item
= 3rd item + 0.75 (4th item - 3rd item)
= 35 + 0.75 (38 - 35)
= 35 + 2.25 = 37.25 kg

P75 = value of the (75 × 15/100)th item = 11.25th item
= 11th item + 0.25 (12th item - 11th item)
= 58 + 0.25 (60 - 58)
= 58 + 0.5 = 58.5 kg
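A minimal Python sketch of the raw-data percentile rule follows; the function name is hypothetical. It locates the (x(n+1)/100)th item and interpolates linearly between its neighbours. It assumes positions outside 1..n are clamped to the smallest or largest value. Statistical software often defaults to a different positioning rule, so library results can differ slightly from this textbook convention.

```python
def percentile_raw(data, x):
    """xth percentile of raw data: value of the (x*(n+1)/100)th item, interpolated."""
    values = sorted(data)
    n = len(values)
    position = x * (n + 1) / 100  # 1-based position, may be fractional
    if position <= 1:
        return values[0]
    if position >= n:
        return values[-1]
    k = int(position)  # whole part: the kth item
    fraction = position - k
    return values[k - 1] + fraction * (values[k] - values[k - 1])


yields = [30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62, 65]
print(round(percentile_raw(yields, 25), 2))  # 37.25
print(round(percentile_raw(yields, 75), 2))  # 58.5
```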

7.3. QUARTILE
The three quartiles divide the distribution into four equal parts, each containing 25% of the observations. The lower quartile Q1 is the 25th percentile and the upper quartile Q3 is the 75th percentile. The second quartile Q2 coincides with the median, i.e. the 50th percentile, in any distribution.

Raw and Ungrouped Data


First arrange the given data in increasing order and use the formulas

$Q_1 = \text{value of the } \left(\dfrac{n+1}{4}\right)^{\text{th}} \text{ item}, \quad Q_3 = \text{value of the } \left(\dfrac{3(n+1)}{4}\right)^{\text{th}} \text{ item}$

The quartile deviation, Q.D., is then given by

$Q.D. = \dfrac{Q_3 - Q_1}{2}$

Example:

Compute the quartiles for the data given below (grains/panicle): 25, 18, 30, 8, 15, 5, 10, 35, 40, 45

Solution

Arranged in increasing order: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45 (n = 10)

Q1 = value of the ((10+1)/4)th item = (2.75)th item
= 2nd item + 0.75 (3rd item - 2nd item)
= 8 + 0.75 (10 - 8)
= 8 + 1.5
= 9.5

Q3 = value of the (3 × 2.75)th item = (8.25)th item
= 8th item + 0.25 (9th item - 8th item)
= 35 + 0.25 (40 - 35)
= 35 + 1.25
= 36.25
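The quartile formulas and the quartile deviation can be sketched in Python as below, using the grains/panicle data; the function name is hypothetical. The position-clamping behaviour at the ends is an assumption of the sketch. With q = 2, the function gives the median of the data.

```python
def quartile_raw(data, q):
    """qth quartile (q = 1, 2, 3): value of the (q*(n+1)/4)th item, interpolated."""
    values = sorted(data)
    n = len(values)
    position = q * (n + 1) / 4
    k = int(position)
    fraction = position - k
    if k < 1:
        return values[0]
    if k >= n:
        return values[-1]
    return values[k - 1] + fraction * (values[k] - values[k - 1])


grains = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
q1 = quartile_raw(grains, 1)
q3 = quartile_raw(grains, 3)
quartile_deviation = (q3 - q1) / 2
print(q1, q3, quartile_deviation)  # 9.5 36.25 13.375
```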

7.4. SUMMARY:
The mode of a variable is the value that appears most frequently in a data set. Finding the mode of an ungrouped data set requires no calculation, only careful observation of the data. For a grouped or continuous frequency distribution with equal class intervals, first prepare the frequency distribution table. Next, identify by inspection the class with the highest frequency, which is called the modal class, and then apply the formula to obtain the mode.

The percentiles divide the distribution of the data into a hundred equal parts, each accounting for 1% of the whole. The three quartiles divide the data into four equal parts, and the second quartile is identical with the median in any distribution.

7.5. KEY WORDS:


Mode: The mode is the value that occurs most frequently in a given data set. If X is a discrete random variable, the mode is the value x at which the probability mass function reaches its maximum. In other words, it is the value most likely to be sampled.

Percentile: A percentile compares a specific score with the scores of the other members of a group; it shows the percentage of scores that the given score exceeds. For instance, if your test score of 75 places you in the 85th percentile, your score is higher than those of 85% of the other test takers.

Quartile: A quartile is a statistical concept that separates the ordered observations into four intervals, each containing one quarter of the observations, based on the values of the data and how they compare with the whole set of observations.

7.6. REVIEW QUESTIONS


Q1. Explain percentile with formula.

Q2. What do you understand by mode?

7.7. REFERENCE FOR FURTHER READING:


Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book Company, New York.

Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Levin R.I. (2002), Statistics for Management, Pearson Education Asia, New Delhi.

Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Selvaraj R. and Loganathan C., Quantitative Methods in Management.

Sharma J.K. (2009), Business Statistics, Pearson Education Asia, New Delhi.

Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons and Co. Private Limited, Bombay.

UNIT 8: MEASURE OF DISPERSION

CONTENT:
▪ Objectives
Introduction
8.1. Definitions
8.2. Objectives of measuring dispersion
8.3. Characteristics of a good measure of dispersion
8.4. Measure of dispersion
8.5. Range
8.6. Interquartile range
8.7. Mean deviation of average deviation
8.8. Standard deviation
8.9. Keywords
8.10. Review questions
8.11. References for further reading

OBJECTIVES:
• Students will learn about dispersion and its use

INTRODUCTION
A measure of central tendency condenses the distribution of a variable into a single number that can be considered representative. This measure alone is not enough to describe a distribution, however, because two or more distinct distributions can have the same central value. Conversely, two or more distributions can share an identical pattern but have different central values. Additional summary measures are therefore needed to capture the properties of a distribution effectively. The measure of dispersion, or measure of variation, is one such measure.

Dispersion refers to the degree of scatter or variation in the observations. The variability of an observation is usually measured by its deviation from the mean. A suitable average of all such deviations is called a measure of dispersion.

Below are some essential definitions of dispersion:

"Dispersion is the measure of the variation of the items." (A.L. Bowley)

"Dispersion is the degree to which particular items differ from one another." (L.R. Connor)

"Variation or dispersion is the measurement of the scatteredness of the mass of figures in a series around the mean." (Simpson and Kafka)

"The degree to which numerical data tend to spread about an average value is called the variation or dispersion of the data." (Spiegel)

8.1. WHAT IS A MEASURE OF DISPERSION IN STATISTICS?

Measures of dispersion describe the variability in the data. Dispersion is a statistical concept that expresses how widely the data are spread out. Measures of dispersion are the statistics used to quantify the variation present in the data.

Measures of dispersion can be grouped into two broad categories:



1. Measures that express the dispersion of data in terms of the distance between selected values. Examples include the range, interquartile range and interpercentile range.

2. Measures that express the dispersion of observations as the average of their deviations from some central value. These are sometimes called averages of the second order. Examples include the mean deviation and standard deviation.

• Dispersion indicates the extent to which individual items differ.

• Dispersion shows the lack of uniformity in the size of items.

• Dispersion is the degree of variation of the values of a variable about a central value.

• Dispersion is a measure of variability.

• For example, 5 items that each have the value 20 have an arithmetic mean of 20 and no dispersion.

• The items 22, 21, 20, 19, 18 also have an arithmetic mean of 20, but there is moderate dispersion about the mean.

• The items 30, 25, 20, 15, 10 also have an arithmetic mean of 20, but there is significant dispersion about the mean.
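The short Python sketch below checks the three series above. It uses only the standard library and computes the mean, the range and the standard deviation of each series. It assumes the population standard deviation (`statistics.pstdev`, which divides by the number of items), matching the way this unit computes σ in Section 8.8. The function name `describe` is illustrative only.

```python
from statistics import mean, pstdev

# The three series from the bullets above: identical means, different spread.
series = {
    "no dispersion": [20, 20, 20, 20, 20],
    "moderate dispersion": [22, 21, 20, 19, 18],
    "significant dispersion": [30, 25, 20, 15, 10],
}

def describe(values):
    """Return (mean, range, population standard deviation) of a list."""
    return mean(values), max(values) - min(values), pstdev(values)

for label, values in series.items():
    avg, rng, sd = describe(values)
    print(f"{label:>22}: mean={avg}, range={rng}, s.d.={sd:.3f}")
```

All three series have a mean of 20. The range and the standard deviation are the numbers that tell them apart.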

8.2. OBJECTIVES OF MEASURES OF DISPERSION


● To judge the reliability of an average: A measure of dispersion can be used to check how trustworthy a mean is. A low value of dispersion indicates a greater degree of homogeneity among the items. Their mean can therefore be considered more reliable, or more representative of the distribution.

● To assess the degree of variation between two or more distributions: Comparing the
dispersions of two or more distributions allows for a comparison of their variability. A
distribution with a smaller dispersion value is more uniform or consistent.

● To facilitate the computation of other statistical measures: Measures of dispersion are used to compute many important statistical measures, such as correlation, regression, test statistics, confidence intervals and control limits.

● To serve as the basis for controlling variation: The primary purpose of computing a measure of dispersion is to determine whether the given observations are uniform. This knowledge has many applications. According to Spurr and Bonini, "in questions of health, fluctuations in body temperature, heart rate, and blood pressure are fundamental diagnostic indicators." The medication prescribed is intended to control these variations. In industrial production, inspection and quality-control programmes identify the causes of quality differences so that they can be controlled and production can run efficiently. Likewise, the extent of income and wealth disparities in a society can inform the choice of an effective policy to reduce them.

8.3. MEASURE OF DISPERSION:


Measures of dispersion are non-negative real numbers that indicate how homogeneous or heterogeneous the data being examined are. If all the data points in a collection are the same, there is no variation, and every measure of dispersion equals 0. As the variability of the data increases, the value of a measure of dispersion increases as well.

8.4. TYPES OF MEASURES OF DISPERSION


1. Absolute Dispersion

Absolute measurements of dispersion are those that are represented in the same unit as the
variable being measured, such as kilogrammes, rupees, centimetres, or marks.

2. Relative Dispersion

• Relative measures of dispersion are expressed as ratios or percentages of an average value.

• They are also called coefficients of dispersion.

• They are pure numbers, such as ratios or percentages, and do not depend on the units of measurement.

8.5. CHARACTERISTICS OF A GOOD MEASURE OF DISPERSION


• It should be easy to compute and easy to understand.

• It should be based on all the observations in the series.

• It should be rigidly defined.

• It should not be unduly affected by extreme values.

• It should not be unduly affected by sampling fluctuations.

• It should be capable of further mathematical treatment and statistical analysis.

The following are essential dispersion measures:

(a) Interquartile Range

(b) Mean Deviation

(c) Standard Deviation

(d) Range

8.6. RANGE
The range of a distribution is the difference between the distribution's largest and smallest
observations. R = L - S, where R represents the range and L and S represent the largest and
smallest observations, respectively.

R is an absolute measure of dispersion. The corresponding relative measure, known as the coefficient of range, is defined as follows:

• Range: the difference between the highest value (H.V.) and the lowest value (L.V.) of a series.

• R = H.V. - L.V.

• Coefficient of range: the ratio of the difference between the highest and lowest values of a series to their sum.

• Coefficient of range = (H.V. - L.V.)/(H.V. + L.V.)

• Example 1: Find the range and its coefficient from the following data: 22, 35, 32, 45, 42, 48, 39

• Range = 48 - 22 = 26

• Coefficient of range = 26/(48 + 22) = 26/70 = 0.37

• Example 2: Find the range and the coefficient of range for the discrete series X: 3, 4, 5, 6, 7, 8, 9, 10 with frequencies f: 35, 30, 20, 10, 6, 3, 2, 1. The frequencies do not affect the range, so Range = 10 - 3 = 7 and Coefficient of range = 7/(10 + 3) = 7/13 = 0.54.
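The two range examples above can be reproduced with a few lines of Python. The sketch assumes that, for a discrete series, only the X values with a non-zero frequency count as observed values. The function name `range_and_coefficient` is illustrative.

```python
def range_and_coefficient(values):
    """Range R = H.V. - L.V. and coefficient of range (H.V. - L.V.)/(H.V. + L.V.)."""
    high, low = max(values), min(values)
    return high - low, (high - low) / (high + low)

# Example 1: individual series.
r, coeff = range_and_coefficient([22, 35, 32, 45, 42, 48, 39])
print(r, round(coeff, 2))  # 26 0.37

# Example 2: discrete series. Frequencies do not change the range,
# so only the X values that actually occur (f > 0) are used.
x = [3, 4, 5, 6, 7, 8, 9, 10]
f = [35, 30, 20, 10, 6, 3, 2, 1]
observed = [xi for xi, fi in zip(x, f) if fi > 0]
r2, coeff2 = range_and_coefficient(observed)
print(r2, round(coeff2, 2))  # 7 0.54
```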

8.6.1. MERITS AND DEMERITS OF RANGE


Merits

1. It is simple to comprehend and compute.

2. It provides a rapid measurement of variability.

Demerits

1. It does not account for all observations.

2. It is strongly affected by extreme observations.

3. It gives only a rough picture of the distribution of observations.

4. It provides no information about the pattern of the distribution. Two distributions can have the same range but different patterns.

5. It is significantly affected by sampling variations.

6. It is not amenable to further mathematical treatment.

7. It cannot be determined for open-ended distributions.

8.6.2. USES OF RANGE


Despite these significant drawbacks, the range is useful in the following circumstances:

1. It is used to construct control charts for monitoring the quality of manufactured products.

2. It is also used to study fluctuations, such as those in the price of a product, a patient's temperature, or the amount of rainfall over a specific period.

Interquartile range

The interquartile range is an absolute measure of dispersion. It is calculated by subtracting the first quartile (Q1) from the third quartile (Q3):

Interquartile range = Q3 - Q1

Interpercentile Range

Interpercentile range, or simply percentile range, can also be used to deal with the problem of
extreme observations.

Symbolically, percentile range = P(100 - i) - Pi, where i < 50.

This measure discards i% of the observations at each end of the distribution. It is the range of the middle (100 - 2i)% of the observations.

In practice, the percentile range with i = 10, i.e. P90 - P10, is generally used. Since Q1 = P25 and Q3 = P75, the interquartile range is also a percentile range (with i = 25).
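The Python sketch below computes a percentile range. It rests on one assumption: percentiles of ungrouped data are located at position (n + 1)p/100, with linear interpolation between neighbouring ordered items. This is the same positional rule that this unit uses for quartiles. Other textbooks and software packages use different conventions, so their results can differ slightly. The data are the weekly incomes from the quartile deviation example later in this section. `percentile` and `percentile_range` are illustrative names, not functions from any library.

```python
def percentile(values, p):
    """P_p of ungrouped data: the (n + 1)p/100-th ordered item,
    interpolating linearly when the position is fractional."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) * p / 100          # 1-based position of P_p
    if pos <= 1:                     # assumption: clamp to the smallest item
        return data[0]
    if pos >= n:                     # assumption: clamp to the largest item
        return data[-1]
    lower = int(pos)                 # whole part of the position
    frac = pos - lower               # fractional part
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def percentile_range(values, i):
    """P(100 - i) - P(i): the spread of the middle (100 - 2i)% of the data, i < 50."""
    return percentile(values, 100 - i) - percentile(values, i)

incomes = [2100, 480, 650, 600, 310, 240, 1200, 1600, 780, 570, 370]
print("interquartile range (i = 25):", percentile_range(incomes, 25))
print("P90 - P10 (i = 10):", round(percentile_range(incomes, 10), 2))
```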

8.6.3. QUARTILE DEVIATION OR SEMI-INTERQUARTILE RANGE


The quartile deviation, also known as the semi-interquartile range, is half the difference between the third and first quartiles. It is given by Q.D. = (Q3 - Q1)/2.

8.6.4. MERITS AND DEMERITS OF QUARTILE DEVIATION


Merits
1. It is precisely defined.

2. It is simple to comprehend and simple to compute.


3. It is not affected by extreme observations, so it is an appropriate measure of dispersion for strongly skewed distributions.

4. It can be calculated even for distributions with open-ended classes.

Demerits

1. It is not based on all observations, because it ignores the lowest 25% and the highest 25% of the items. For this reason it is not a fully reliable measure of dispersion.

2. It is significantly affected by sampling fluctuations.

3. It is not amenable to further mathematical treatment.

Formulas

• The interquartile range is the difference between the third and first quartiles: Interquartile range = Q3 - Q1

• Quartile deviation (semi-interquartile range) = (Q3 - Q1)/2

• In a symmetrical distribution, Q3 - M = M - Q1, where M is the median. In every distribution, 25% of the items lie below Q1 and 25% lie above Q3.

• Coefficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1)

• Example 1: Calculate the quartile deviation and its coefficient from the following monthly sales (in Rs. lakh) for January to December:

• 55, 60, 70, 90, 90, 110, 120, 130, 145, 145, 155, 170 (already in increasing order, n = 12)

• Q1 = size of (12 + 1)/4 = 3.25th item = size of 3rd item + 1/4(size of 4th item - size of 3rd item) = 70 + 1/4(90 - 70) = 75

• Similarly, Q3 = size of 3(12 + 1)/4 = 9.75th item = size of 9th item + 3/4(size of 10th item - size of 9th item) = 145 + 3/4(145 - 145) = 145

• Hence Q.D. = (145 - 75)/2 = 35

• Coefficient of Q.D. = (145 - 75)/(145 + 75) = 70/220 = 0.318

• Example 2: Find the quartile deviation and its coefficient from the following weekly incomes: 2100, 480, 650, 600, 310, 240, 1200, 1600, 780, 570, 370

• Arranging the incomes in increasing order: 240, 310, 370, 480, 570, 600, 650, 780, 1200, 1600, 2100 (n = 11)

• Q1 = size of (11 + 1)/4 = 3rd item = 370

• Q3 = size of 3(11 + 1)/4 = 9th item = 1200

• Q.D. = (1200 - 370)/2 = 415

• Coefficient of Q.D. = (1200 - 370)/(1200 + 370) = 830/1570 = 0.529
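Both worked examples above can be checked with the Python sketch below. It places Q1 and Q3 at the k(n + 1)/4-th ordered item and interpolates linearly when the position is fractional, as the examples do. The clamping at the ends is an added assumption for very small data sets. The helper names are hypothetical.

```python
def quartile(values, k):
    """Q_k (k = 1, 2, 3) of ungrouped data: the k(n + 1)/4-th ordered item."""
    data = sorted(values)
    pos = k * (len(data) + 1) / 4
    lower = int(pos)
    frac = pos - lower
    if lower < 1:                  # position before the first item
        return data[0]
    if lower >= len(data):         # position at or beyond the last item
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

def quartile_deviation(values):
    """Return Q.D. = (Q3 - Q1)/2 and its coefficient (Q3 - Q1)/(Q3 + Q1)."""
    q1, q3 = quartile(values, 1), quartile(values, 3)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

sales = [55, 60, 70, 90, 90, 110, 120, 130, 145, 145, 155, 170]
incomes = [2100, 480, 650, 600, 310, 240, 1200, 1600, 780, 570, 370]

for name, data in (("sales", sales), ("incomes", incomes)):
    qd, coeff = quartile_deviation(data)
    print(f"{name}: Q.D. = {qd:.2f}, coefficient = {coeff:.3f}")
```

The sketch gives 70/220 (about 0.318) for the sales coefficient, which confirms that the denominator is 145 + 75 = 220.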

Quartile Deviation: Continuous Series

• Example 3: Find the quartile deviation and its coefficient from the following data:

• Marks: 0-10, 10-20, 20-30, 30-40, 40-50

• Number of students: 4, 15, 28, 16, 7

• N = 70 (cumulative frequencies 4, 19, 47, 63, 70). Q1 = size of N/4 = 17.5th item, which lies in the class 10-20. A quartile is found by the interpolation formula Q = L + ((item number - c.f.)/f) × h. Here L is the lower limit of the quartile class, c.f. is the cumulative frequency of the preceding class, f is the frequency of the quartile class and h is its width. Hence Q1 = 10 + ((17.5 - 4)/15) × 10 = 10 + 9 = 19

• Q3 = size of 3N/4 = 52.5th item, which lies in the class 30-40. Q3 = 30 + ((52.5 - 47)/16) × 10 = 30 + 3.44 = 33.44

• Q.D. = (33.44 - 19)/2 = 7.22

• Coefficient of Q.D. = (33.44 - 19)/(33.44 + 19) = 14.44/52.44 = 0.275

• Example 4: Find the quartile deviation and its coefficient from the following data:

• Income: less than 50, 50-70, 70-90, 90-110, 110-130, 130-150, above 150

• Number of employees: 54, 100, 140, 300, 230, 125, 51

• N = 1000 (cumulative frequencies 54, 154, 294, 594, 824, 949, 1000). Q1 = size of N/4 = 250th item, which lies in the class 70-90. Hence Q1 = 70 + ((250 - 154)/140) × 20 = 70 + 13.71 = 83.71

• Q3 = size of 3N/4 = 750th item, which lies in the class 110-130. Q3 = 110 + ((750 - 594)/230) × 20 = 110 + 3120/230 = 110 + 13.57 = 123.57

• Q.D. = (123.57 - 83.71)/2 = 19.93

• Coefficient of Q.D. = (123.57 - 83.71)/(123.57 + 83.71) = 39.86/207.28 = 0.1923
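The interpolation used in the two continuous-series examples above is sketched below in Python. Each class is assumed to be given as a (lower limit, width, frequency) tuple. Open-ended classes use `None` for the unknown limits, which works as long as the required quartile does not fall inside them. This shows why the quartile deviation can be computed even for open-ended distributions. The function names are hypothetical.

```python
def grouped_quantile(classes, fraction):
    """Locate the (fraction * N)-th item and interpolate within its class:
    Q = L + ((fraction * N - cf) / f) * h."""
    n = sum(freq for _, _, freq in classes)
    target = fraction * n
    cumulative = 0                          # cf: total frequency of earlier classes
    for lower, width, freq in classes:
        if freq and cumulative + freq >= target:
            if lower is None or width is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / freq * width
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")

def quartile_deviation(classes):
    """Return Q1, Q3, Q.D. = (Q3 - Q1)/2 and coefficient (Q3 - Q1)/(Q3 + Q1)."""
    q1 = grouped_quantile(classes, 0.25)
    q3 = grouped_quantile(classes, 0.75)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

# (lower limit, class width, frequency)
marks = [(0, 10, 4), (10, 10, 15), (20, 10, 28), (30, 10, 16), (40, 10, 7)]
income = [(None, None, 54), (50, 20, 100), (70, 20, 140), (90, 20, 300),
          (110, 20, 230), (130, 20, 125), (150, None, 51)]

for name, table in (("marks", marks), ("income", income)):
    q1, q3, qd, coeff = quartile_deviation(table)
    print(f"{name}: Q1={q1:.2f}, Q3={q3:.2f}, Q.D.={qd:.2f}, coefficient={coeff:.4f}")
```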

8.7. MEAN DEVIATION OR AVERAGE DEVIATION


The mean deviation is a measure of dispersion based on all observations. It is defined as the arithmetic mean of the absolute deviations of the observations from a central value such as the mean, median or mode. The dispersion of each observation is measured by its deviation from that central value. This deviation is positive for observations above the central value and negative for those below it. Taking absolute values ignores these signs, so positive and negative deviations do not cancel out.

8.7.1. CALCULATION OF MEAN DEVIATION


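Consider a series of observations X1, X2, X3, …, Xn with arithmetic mean X̄. The mean deviation from the mean is

$$\text{M.D. from } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} \lvert X_i - \bar{X} \rvert$$

The mean deviation from the median M is obtained in the same way, with X̄ replaced by M. These formulas give absolute measures of dispersion. The corresponding relative measures, often called coefficients of mean deviation, divide the mean deviation by the average from which it was calculated:

$$\text{Coefficient of M.D. from } \bar{X} = \frac{\text{M.D. from } \bar{X}}{\bar{X}}, \qquad \text{Coefficient of M.D. from } M = \frac{\text{M.D. from } M}{M}$$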

Example: Calculate mean deviation from mean and median for the following data of heights
(in inches) of 10 persons.

60, 62, 70, 69, 63, 65, 60, 68, 63, 64

Also calculate their respective coefficients.

Solution:

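Calculation of M.D. from X̄

$$\bar{X} = \frac{60+62+70+69+63+65+60+68+63+64}{10} = \frac{644}{10} = 64.4 \text{ inches}$$

Sum of observations greater than X̄ = 70 + 69 + 65 + 68 = 272. The number of such observations is k2 = 4.

Sum of observations less than X̄ = 60 + 62 + 63 + 60 + 63 + 64 = 372. The number of such observations is k1 = 6.

Splitting the absolute deviations into those above and those below the mean gives

$$\sum_{i=1}^{n} \lvert X_i - \bar{X} \rvert = \left(\sum_{X_i > \bar{X}} X_i - k_2\bar{X}\right) + \left(k_1\bar{X} - \sum_{X_i < \bar{X}} X_i\right)$$

so that

$$\text{M.D. from } \bar{X} = \frac{1}{10}\left[272 - 372 - (4 - 6) \times 64.4\right] = \frac{28.8}{10} = 2.88 \text{ inches}$$

Also, coefficient of M.D. from X̄ = 2.88/64.4 = 0.045.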
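The worked example asks for the mean deviation from both the mean and the median, but the solution above shows only the mean. The Python sketch below computes both and also checks the short-cut form with k1 and k2. The values it gives for the median (M = 63.5, M.D. = 2.8 inches, coefficient about 0.044) come from this sketch, not from the original solution. The name `mean_deviation` is illustrative.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations |X - centre|."""
    return sum(abs(x - centre) for x in values) / len(values)

heights = [60, 62, 70, 69, 63, 65, 60, 68, 63, 64]   # inches

x_bar = mean(heights)              # 64.4
m = median(heights)                # 63.5, the mean of the 5th and 6th ordered items
md_mean = mean_deviation(heights, x_bar)
md_median = mean_deviation(heights, m)

# Short-cut form from the worked example: k2 items above the mean, k1 below.
above = [x for x in heights if x > x_bar]
below = [x for x in heights if x < x_bar]
shortcut = (sum(above) - sum(below) - (len(above) - len(below)) * x_bar) / len(heights)

print(f"M.D. from mean   = {md_mean:.2f}, coefficient = {md_mean / x_bar:.3f}")
print(f"M.D. from median = {md_median:.2f}, coefficient = {md_median / m:.3f}")
print(f"short-cut check  = {shortcut:.2f}")
```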

8.7.2. MERITS AND DEMERITS OF MEAN DEVIATION


Merits

1. It is simple to comprehend and simple to compute.

2. It is founded on all observations.

3. It is less affected by extreme observations than the range or the standard deviation.

4. It is not much affected by sampling fluctuations.

Demerits

1. It is not amenable to further mathematical treatment. The mean deviation is the arithmetic mean of the absolute values of the deviations, so it does not lend itself to algebraic manipulation.

2. This motivates the search for a measure of dispersion that can be subjected to further mathematical analysis.

3. Deviations can be taken from any measure of central tendency, so this measure of dispersion is not rigidly defined.

8.8. STANDARD DEVIATION


The mean deviation ignores the signs of the deviations. This is mathematically awkward and makes the mean deviation unsuitable for further mathematical treatment. If the signs are retained, however, the sum of the deviations from the arithmetic mean is always zero.

That sum would suggest the observations are homogeneous, even though they actually differ from one another. Squaring the deviations removes their signs. The positive square root of the arithmetic mean of the squared deviations is therefore taken as the measure of dispersion.

This measure of dispersion is known as the standard deviation or root-mean-square deviation. The square of the standard deviation is the variance. Karl Pearson introduced the notion of standard deviation in 1893.

The standard deviation is denoted by the Greek letter σ, called "small sigma" or simply "sigma."

For observations X1, X2, …, Xn with mean X̄, the formula is:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}$$

σ is expressed in the same units as the observations X.

8.8.1. CALCULATION OF STANDARD DEVIATION


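The standard deviation can be calculated using the following steps.

Individual series:

Consider n observations X1, X2, X3, …, Xn.

(1) Compute the arithmetic mean X̄ = ΣX/n.

(2) Find the deviation of each observation from the mean, X - X̄.

(3) Square the deviations and add them to obtain Σ(X - X̄)².

(4) Divide this sum by n to obtain the variance σ², then take the positive square root to obtain σ.

These steps are convenient when X̄ is a whole number. If X̄ is not a whole number, the deviations become fractional. In that case it is easier to use the equivalent formula

$$\sigma = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$$

or to take deviations from an assumed mean, as shown later in this section.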

Example: Calculate the standard deviation of the weights of luggage carried by 10 persons:

45, 49, 55, 50, 41, 44, 60, 58, 53, 55 (weight in kg)

Table 8.1

Weights (X):   45   49   55   50   41   44   60   58   53   55   Total = 510
X - X̄:         -6   -2    4   -1  -10   -7    9    7    2    4   Total = 0
(X - X̄)²:      36    4   16    1  100   49   81   49    4   16   Total = 356

$$\bar{X} = \frac{510}{10} = 51 \text{ kg} \quad\text{and}\quad \sigma^2 = \frac{356}{10} = 35.6 \text{ kg}^2$$

Therefore, σ = √35.6 = 5.97 kg.
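The four steps above are traced below in Python on the luggage weights from Table 8.1. The sketch uses the population form of σ (dividing by n), as this unit does. The variable names are illustrative.

```python
from math import sqrt

weights = [45, 49, 55, 50, 41, 44, 60, 58, 53, 55]  # kg

n = len(weights)
x_bar = sum(weights) / n                        # step 1: 510 / 10 = 51
deviations = [x - x_bar for x in weights]       # step 2: X - X̄ (they sum to zero)
variance = sum(d * d for d in deviations) / n   # steps 3-4: 356 / 10 = 35.6
sigma = sqrt(variance)                          # positive square root
print(f"mean = {x_bar} kg, variance = {variance} kg^2, s.d. = {sigma:.2f} kg")
```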

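For grouped data

Consider the observations x1, x2, x3, …, xn with respective frequencies f1, f2, f3, …, fn, where Σ fi = N. Then

$$\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{X})^2}$$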

8.8.2. COEFFICIENT OF VARIATION


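Standard deviation is an absolute measure of dispersion, stated in the same units as the variable X. The coefficient of standard deviation is a relative measure of dispersion based on the standard deviation and equals σ/X̄. Multiplying it by 100 gives the coefficient of variation (C.V.):

$$\text{C.V.} = \frac{\sigma}{\bar{X}} \times 100$$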

Karl Pearson introduced the coefficient of variation. It is used to compare the variability, homogeneity, stability, uniformity and consistency of two or more data sets. The data set with the larger coefficient of variation is considered more dispersed, less uniform, less stable and less consistent.

• Calculation of Standard Deviation: Individual Observations

• For individual observations, the standard deviation may be computed by either of the following two methods:

• 1. By taking deviations of the items from the actual mean.

• 2. By taking deviations of the items from an assumed mean.

• Deviations taken from the actual mean: the following formula is applied: σ = √(Σx²/N), where x = X - X̄ and N is the number of observations.

• Example: Calculate the standard deviation of the following marks (out of 25) of 5 students of a tutorial group: 8, 12, 13, 15, 22

• Actual mean method: ΣX = 70, so X̄ = 70/5 = 14. The deviations x = X - X̄ are -6, -2, -1, 1, 8, and Σx² = 36 + 4 + 1 + 1 + 64 = 106. Hence S.D. = √(106/5) = √21.2 = 4.604

• Coefficient of S.D. = S.D./X̄ = 4.604/14 = 0.3289

• Coefficient of variation = Coefficient of S.D. × 100 = 32.89%

• Assumed mean method: take A = 13 and d = X - A. The deviations are -5, -1, 0, 2, 9, so Σd = 5 and Σd² = 25 + 1 + 0 + 4 + 81 = 111.

• Applying the formula σ = √(Σd²/N - (Σd/N)²) = √((NΣd² - (Σd)²)/N²) gives S.D. = √((5 × 111 - 5²)/25) = √(530/25) = √21.2 = 4.604

• Coefficient of S.D. = 4.604/14 = 0.3289

• Coefficient of variation = Coefficient of S.D. × 100 = 32.89%
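The Python sketch below implements both methods, taking deviations from the actual mean and from an assumed mean. It confirms that they agree on the tutorial-group marks. It also prints Σd and Σd² for A = 13. The function names are hypothetical, and both functions use the population form (dividing by N) as in the text.

```python
from math import sqrt

def sd_actual_mean(values):
    """sigma = sqrt(sum(x^2) / N), with x = X - mean."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((v - m) ** 2 for v in values) / n)

def sd_assumed_mean(values, a):
    """Short-cut method: d = X - A, sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2)."""
    n = len(values)
    d = [v - a for v in values]
    return sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)

marks = [8, 12, 13, 15, 22]
d = [x - 13 for x in marks]
print(sum(d), sum(x * x for x in d))  # 5 111
print(round(sd_actual_mean(marks), 3), round(sd_assumed_mean(marks, 13), 3))
print("coefficient of variation:", round(sd_actual_mean(marks) / 14 * 100, 2), "%")
```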

• The standard deviation is the square root of the mean of the squared deviations from the mean, i.e. the root-mean-square deviation from the mean.

• It was introduced by Karl Pearson in 1893.

• Its main purpose is to overcome the fact that deviations from the mean always sum to zero: squaring the deviations removes their signs.

• Coefficient of standard deviation = Standard deviation/Mean

• Example: Find the standard deviation by the direct method and by the short-cut method for the following data (in rupees): 40, 44, 54, 60, 62, 64, 70, 80, 90, 96

• Direct method: N = 10, ΣX = 660 and X̄ = 66. Computing d = X - X̄ for each item gives Σd² = 3048, so S.D. = √(Σd²/N) = √304.8 = Rs. 17.46 (approximately).

• Short-cut method: take the assumed mean A = 64 and compute d = X - A and d² = (X - A)² for each item.

• This gives Σd = 20 and Σd² = 3088, so X̄ = 64 + 20/10 = 66.

• Applying the formula, S.D. = √(3088/10 - (20/10)²) = √(308.8 - 4) = √304.8 = Rs. 17.46

• Coefficient of S.D. = S.D./X̄ = 17.46/66 = 0.2645
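The short-cut method gives the same σ whichever assumed mean A is chosen. The Python sketch below recomputes Σd and Σd² for the data above with three values of A: A = 64 (as in the text), A = 66 (the actual mean) and A = 50, which is arbitrary and chosen only for illustration. All three give σ = √304.8. The helper names are hypothetical.

```python
from math import sqrt

data = [40, 44, 54, 60, 62, 64, 70, 80, 90, 96]

def shortcut_sums(values, a):
    """Return (sum of d, sum of d squared) for d = X - A."""
    d = [v - a for v in values]
    return sum(d), sum(di * di for di in d)

def sd_from_sums(n, sum_d, sum_d2):
    """sigma = sqrt(sum_d2 / N - (sum_d / N) ** 2)."""
    return sqrt(sum_d2 / n - (sum_d / n) ** 2)

n = len(data)
for a in (64, 66, 50):
    s1, s2 = shortcut_sums(data, a)
    print(f"A = {a}: sum d = {s1}, sum d^2 = {s2}, s.d. = {sd_from_sums(n, s1, s2):.2f}")
```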



• Example: Calculate the S.D. from the following data. Marks: 12, 14, 16, 18, 20, 22, 24; number of students: 6, 12, 18, 26, 16, 10, 8

• N = 96. Taking the assumed mean A = 20, compute d = X - A, and then Σfd and Σfd².

• Σfd = -192 and Σfd² = 1376

• Applying the formula, S.D. = √(Σfd²/N - (Σfd/N)²) = √(1376/96 - (-192/96)²) = √(14.33 - 4) = 3.2145. The mean is 20 + (-192/96) = 18, so Coefficient of S.D. = 3.2145/18 = 0.1786
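For a discrete frequency distribution, the assumed-mean method weights each deviation by its frequency. The Python sketch below applies it to the marks distribution above. The function name `grouped_sd` is illustrative, and σ is the population form used throughout the unit.

```python
from math import sqrt

def grouped_sd(x, f, a):
    """Assumed-mean method for a frequency distribution, with d = X - A:
    mean = A + sum(f*d)/N and sigma = sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2)."""
    n = sum(f)
    d = [xi - a for xi in x]
    sum_fd = sum(fi * di for fi, di in zip(f, d))
    sum_fd2 = sum(fi * di * di for fi, di in zip(f, d))
    mean = a + sum_fd / n
    sigma = sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return n, sum_fd, sum_fd2, mean, sigma

marks = [12, 14, 16, 18, 20, 22, 24]
students = [6, 12, 18, 26, 16, 10, 8]
n, sum_fd, sum_fd2, mean, sigma = grouped_sd(marks, students, 20)
print(n, sum_fd, sum_fd2, mean, round(sigma, 4), round(sigma / mean, 4))
```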

• Example: Compute the standard deviation of the following data by the direct method: X: 5, 8, 11, 12, 14, 16, 18

• N = 7, ΣX = 84, X̄ = 12

• Computing d = X - X̄ for each item gives Σd = 0.

• Squaring the deviations gives Σd² = 122.

• S.D. = √(122/7) = 4.17

• Coefficient of S.D. = 4.17/12 = 0.3475

• Short-cut method: take the assumed mean A = 11.

• Compute d = X - A and d² for each item.

• This gives Σd = 7 and Σd² = 129, so the actual mean = 11 + 7/7 = 12.

• S.D. = √(129/7 - (7/7)²) = √17.428 = 4.17

• Coefficient of S.D. = 4.17/12 = 0.3475, and the coefficient of variation is 34.75%.

Comparing Variability: Coefficient of Variation

• Two judges, P and O, graded seven dramatic performances, each awarding marks independently as follows:

• Performance: 1 2 3 4 5 6 7

• Marks by P: 46 42 44 40 43 41 45

• Marks by O: 40 38 36 35 39 37 41

• Find the coefficient of variation of the marks awarded by each judge and interpret the result.

• Marks by P

• Mean = 301/7 = 43

• σ = √(Σx²/N) = √(28/7) = 2, where x = X - 43. C.V. = (2/43) × 100 = 4.65

• Marks by O

• Mean = 266/7 = 38

• σ = √(Σx²/N) = √(28/7) = 2, where x = X - 38

• C.V. = (2/38) × 100 = 5.26

• Judge P awards higher marks on average. The coefficient of variation of P's marks is also lower, so P's marking is more uniform (consistent) than O's.
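The judges' comparison above can be verified with Python's `statistics` module. The sketch assumes, as the text does, that σ is the population standard deviation (`pstdev`). It also confirms that judge O's total is 266. The function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, pstdev

marks_p = [46, 42, 44, 40, 43, 41, 45]
marks_o = [40, 38, 36, 35, 39, 37, 41]

def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, using the population standard deviation."""
    return pstdev(values) / mean(values) * 100

for judge, marks in (("P", marks_p), ("O", marks_o)):
    cv = coefficient_of_variation(marks)
    print(f"judge {judge}: total = {sum(marks)}, mean = {mean(marks)}, "
          f"s.d. = {pstdev(marks):.2f}, C.V. = {cv:.2f}%")
```

Both judges have the same σ, so the judge with the higher mean (P) has the lower C.V.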

Properties of Standard Deviation

8.8.4. SKEWNESS
A distribution's skewness refers to its asymmetry. A distribution is symmetrical when, for any given departure from the mean, there are equal numbers of observations on either side. If the distribution is asymmetrical or skewed, the frequency curve has an extended tail to the left or right. Skewness is therefore the departure of a distribution from symmetry. Two or more frequency distributions can share the same mean and standard deviation yet differ in skewness.

Figure 8.1

8.9. SUMMARY
A measure of central tendency condenses the distribution of a variable into a single number that can be regarded as representative of that variable. This measure alone, however, is not adequate to describe a distribution, because two or more distinct distributions can have the same central value.

Conversely, two or more distributions can have the same pattern but different central values. Additional summary measures are therefore needed to capture the properties of a distribution adequately.

One such measure is the measure of dispersion, also known as the measure of variation.

"Dispersion" refers to the degree of scatter or variation in the observations. In most cases, the variability of an observation is quantified by its deviation from the mean. A suitable average of all these deviations is called a measure of dispersion.

8.10. KEYWORDS
Averages of second order: The measurements that indicate the spread of data in terms of the
average of deviations of observations from some central value, such as mean deviation,
standard deviation, etc., are termed averages of second order.

Coefficient of standard deviation: a relative measure of dispersion based on standard deviation, equal to the standard deviation divided by the mean.

Dispersion: the degree to which individual items vary. Some measures express the dispersion of observations in terms of the distance between the values of selected observations. Examples include the range, interquartile range and interpercentile range.

Interquartile range: an absolute measure of dispersion equal to the difference between the third quartile (Q3) and the first quartile (Q1), i.e. Q3 - Q1.

Measure of variation: The measure of the dispersion of the mass of figures in a series around
a mean.

Quartile deviation or semi-interquartile range: half of the interquartile range, i.e. (Q3 - Q1)/2.

Range: the difference between the largest and smallest observations of a distribution, R = L - S, where L and S represent the largest and smallest observations, respectively.

Variance: The square of the standard deviation is the variance.

8.11. REVIEW QUESTIONS


1. "Frequency distributions may differ in the numerical magnitude of their averages, but not
necessarily in their formation, or they may have the same average values but differ in their
formations." Explain and show how dispersion measurements enhance the information
provided by averages regarding frequency distribution.

2. "Indeed, averages and measures of variation cover the majority of a practical statistician's
needs, but their interpretation and usage together require a solid understanding of statistical
theory." — Tippet. Examine this assertion using the arithmetic mean and standard deviation.

3. Why is the standard deviation regarded as a better measure of dispersion than the other measures? Does it have any limitations? If so, mention them.

4. Calculate the range and its coefficient from the following data:

(a) 159, 167, 139, 119, 117, 168, 133, 135, 147, 160

(b) Weights (lbs.): 115-125 125-135 135-145 145-155 155-165 165-175
Frequency: 4 5 6 3 1 1

5. Find out quartile deviation and its coefficient from the following data:

Class: 0-4 5-9 10-14 15-19 20-24 25-29

Frequency: 15 26 12 5 4 3

6. Find out the range of income of (a) middle 50% of workers, (b) middle 80% of the workers
and hence the coefficients of quartile deviation and percentile deviation from the following
data:

Wages less than : 40 50 60 70 80 90 100

No. of workers :5 8 15 20 30 33 35

8.12. REFERENCE FOR FURTHER READING


Balwani Nitin, Quantitative Techniques, First Edition, Excel Books, New Delhi, 2002.

Bhardwaj R.S., Business Statistics, Excel Books.

Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.

Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, McGraw-Hill Book Company, New York.

Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R. P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.

Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

UNIT 9: INTRODUCTION TO PROBABILITY

CONTENT:
▪ Objectives
Introduction
9.1. Concepts of Random Experiments and Random Variables
9.2. Views of Probability: Subjective, Classical
9.3. Types of Probability
9.4. Rules of Probability: Multiplication Rule for Independent and Dependent Events
9.5. Random Events and Probability
9.6. Counting techniques and calculation of probabilities
9.7. Conditional Probability
9.8. Summary
9.9. Keywords
9.10. Review Questions

OBJECTIVES:
• The concept of probability
• The rules of probability and where to apply them
• Counting techniques and the calculation of probabilities

INTRODUCTION
Probability is a measure of how likely an event is to occur. Suppose an event can occur in N mutually exclusive and equally likely ways, and m of these possess a trait E. The probability of the occurrence of E is then P(E) = m/N.

Concept:

Probability is the likelihood or chance that a particular event will or will not occur.

The theory of probability provides a quantitative measure of the uncertainty, or likelihood of occurrence, of different events resulting from a random experiment, on a scale ranging from 0 to 1.

This means that the probability of a certain event is 1 and the probability of an impossible event
is 0. In other words, a probability near 0 indicates that an event is unlikely to occur whereas a
probability near 1 indicates that an event is almost certain to occur.

For example:

Suppose the event is the success of a newly launched product. A probability of 0.90 indicates that the new product is likely to be successful, whereas a probability of 0.15 indicates that the product is unlikely to succeed in the market. A probability of 0.50 indicates that the product is just as likely to be successful as not.

Some Basic Concepts

Experiment: When we conduct a trial to obtain some statistical information, it is called an experiment.

Examples:

(i) Tossing of a fair coin is an experiment and it has two possible outcomes: Head (H) or Tail
(T).

(ii) Rolling a fair die is an experiment and it has six possible outcomes: appearance of 1 or 2
or 3 or 4 or 5 or 6 on the upper most face of a die.

(iii) Drawing a card from a well shuffled pack of playing cards is an experiment and it has 52
possible outcomes.


Events

The possible outcomes of a trial/experiment are called events.

Events are generally denoted by capital letters A, B, C, etc.

Examples: (i) If a fair coin is tossed, the outcomes - head or tail are called events.

(ii) If a fair die is rolled, the outcomes 1, 2, 3, 4, 5 or 6 appearing on the uppermost face are called events.

Exhaustive Events

The total number of possible outcomes of a trial/experiment are called exhaustive events.

In other words, if all the possible outcomes of an experiment are taken into consideration, then
such events are called exhaustive events.

Examples: (i) In rolling a die, the six possible outcomes 1, 2, 3, 4, 5 and 6 are exhaustive events.

(ii) In tossing a coin, the two outcomes H and T are exhaustive events.

(iii) In rolling two dice, the number of possible outcomes is 6 × 6 = 36.
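The idea of exhaustive events can be made concrete by listing every possible outcome. The Python sketch below does this with `itertools.product`. Each outcome for two dice is treated as an ordered pair (first die, second die), which is the assumption behind the count 6 × 6 = 36.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Exhaustive outcomes of rolling two dice: every ordered pair (first, second).
two_dice = list(product(die, repeat=2))
print(len(coin), len(die), len(two_dice))  # 2 6 36
```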

Equally-Likely Events

The events are said to be equally-likely if the chance of happening of each event is equal or
same.

In other words, events are said to be equally likely when one does not occur more often than
the others.

Examples: (i) If a fair coin is tossed, the events H and T are equally-likely events.

(ii) If a die is rolled, any face is as likely to come up as any other face. Hence, the six outcomes 1, 2, 3, 4, 5 or 6 appearing up are equally likely events.

Mutually Exclusive Events

Two events are said to be mutually exclusive when they cannot happen simultaneously in a
single trial.

In other words, two events are said to be mutually exclusive when the happening of one
excludes the happening of the other in a single trial.

Examples: (i) In tossing a coin, the events Head and Tail are mutually exclusive because both
cannot happen simultaneously in a single trial. Either head occurs or tail occurs. Both cannot
occur simultaneously. The happening of head excludes the possibility of happening of tail.

(ii) In rolling a die, the events 1, 2, 3, 4, 5 and 6 are mutually exclusive because no two of them can happen simultaneously in a single trial. If the number 1 turns up, none of the other five (2, 3, 4, 5 or 6) can turn up.

Complementary Events

Let there be two events A and B.

A is called the complementary event of B, and B the complementary event of A, if A and B are mutually exclusive and exhaustive.

Examples: (i) In tossing a coin, the occurrence of head (H) and the occurrence of tail (T) are complementary events.

(ii) In rolling a die, the occurrence of an even number (2, 4, 6) and the occurrence of an odd number (1, 3, 5) are complementary events.

Definition of Probability:

According to Laplace,

“Probability is the ratio of the favourable cases to the total number of equally likely cases”.

For example, if a bag contains 6 green and 4 red balls, the probability of drawing a green ball is 6/(6 + 4) = 6/10, because the total number of balls is 10 and the number of green balls is 6.

P(A) = p = Number of Favorable Cases/ Total Number of Equally Likely Cases =m/n

Where, P(A) = Probability of occurrence of an event A

m = Number of favorable cases

n = Total number of equally likely cases.

Similarly, the probability of non-occurrence of A is P(Ā) = q = 1 – P(A) = 1 – m/n, where q = probability of non-occurrence of the event A.

p + q = 1

“If an event can happen in m ways and fails to happen in n ways, then the probability of its happening is m/(m + n) and that of its failing to happen is n/(m + n)”.

Significance of Theory of Probability

The importance of probability is clear from the following points:

(1) Probability is used by sales managers, production managers, etc., in making economic decisions in situations of risk and uncertainty.

(2) Probability is used in the theory of games, which is further used in managerial decisions.

(3) Various tests of significance used in sampling, such as the Z-test, t-test and F-test, are based on the theory of probability.

(4) Probability is the backbone of insurance companies because life tables are based on the
theory of probability. Thus, probability is of immense utility in various fields.

Probability Scale

The probability of an event always lies between 0 and 1: an impossible event has probability 0 and a certain event has probability 1, i.e., 0 ≤ P(E) ≤ 1.

Calculation of Probability of an Event

The following steps are to be followed while calculating the probability of an event:

(1) Find the total number of equally likely cases, i.e., n.

(2) Obtain the number of cases favourable to the event, i.e., m.

(3) Divide the number of cases favourable to the event (m) by the total number of equally likely cases (n).

(4) This gives the probability of the event. Symbolically, the probability of occurrence of an event E is: P(E) = Number of cases favourable to E/Total number of equally likely cases = m/n

(5) Similarly, the probability of non-occurrence of the event E is: P(Ē) = 1 – P(E)
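The five steps above can be sketched in Python. This is a minimal illustration that assumes the favourable and equally likely cases have already been counted; the names `classical_probability` and `complement` are hypothetical helpers, not part of any library. Python's `fractions.Fraction` keeps answers such as 1/3 exact instead of rounding them.

```python
from fractions import Fraction


def classical_probability(favourable: int, total: int) -> Fraction:
    """Return P(E) = m/n for m favourable cases out of n equally likely cases."""
    if total <= 0:
        raise ValueError("the number of equally likely cases must be positive")
    if not 0 <= favourable <= total:
        raise ValueError("favourable cases must lie between 0 and the total")
    return Fraction(favourable, total)


def complement(p: Fraction) -> Fraction:
    """Probability of non-occurrence: P(not E) = 1 - P(E)."""
    return 1 - p


# Example 1: a head in one toss of a coin (m = 1, n = 2)
p_head = classical_probability(1, 2)
# Example 3: a black ball from a bag of 5 black and 10 white balls
p_black = classical_probability(5, 15)
# Example 4: a prize when there are 10 prizes and 90 blanks
p_prize = classical_probability(10, 100)

print(p_head, p_black, complement(p_black))        # 1/2 1/3 2/3
print(float(p_prize), float(complement(p_prize)))  # 0.1 0.9
```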

Example 1:

Find the probability of getting a head in a single toss of a coin.

Solution: When a coin is tossed, there are two possible outcomes - Head or Tail.

Total number of equally likely cases = n = 2

Number of cases favourable to H = m = 1

P(H) = m/n = ½

Example 2: What is the probability of getting an even number in a throw of an unbiased die?

Solution: When a die is thrown, there are 6 equally likely cases, i.e., 1, 2, 3, 4, 5, 6. Total number of equally likely cases = n = 6

Number of cases favourable to even points (2, 4, 6) = m = 3

∴ Probability of getting an even number = 3/6 = 1/2

Example 3:

A bag contains 5 black and 10 white balls. What is the probability of drawing (i) a black ball, (ii) a white ball?

Solution: Total number of balls = 5 + 10 = 15

(i) P (black ball) = No. of black balls/Total No. of balls = 5/15 = 1/3

(ii) P (white ball) = No. of white balls/Total No. of balls = 10/15 = 2/3

Example 4:

In a lottery, there are 10 prizes and 90 blanks. If a person holds one ticket, what are the chances of (i) getting a prize, (ii) not getting a prize?

Solution: Total No. of tickets = 10 + 90 = 100

(i) Probability of getting a prize: No. of prizes = 10 ∴ No. of favourable cases = 10, Total No. of cases = 100

Required Probability = 10/100 = 1/10 = 0.1

(ii) Probability of not getting a prize: No. of blanks = 90 ∴ Number of favourable cases = 90, Total Number of cases = 100

Required Probability = 90/100 = 0.9

Example 5:

What is the probability of getting a number greater than 4 with an ordinary die?

Solution: The numbers greater than 4 on a die are 5 and 6.

∴ Number of favourable cases = 2, Total number of cases = 6, Required Probability = 2/6 = 1/3

Example 6:

Find the probability of drawing a face card in a single random draw from a well shuffled pack
of 52 cards.

Solution: There are 52 cards in a pack of cards. Total number of cases = 52. Number of favourable cases (the face cards are the Jack, Queen and King of each of the four suits) = 12

Required Probability = 12/52=3/13

Example 7: A card is drawn from an ordinary pack of playing cards and a person bets that it is a spade or an ace. What are the odds against his winning this bet?

Solution: Total number of cases = 52. There are 13 spades and 4 aces, but the ace of spades is already counted among the spades, so only the 3 remaining aces are added.

Therefore, the favourable cases = 13 + 3 = 16

The probability of winning the bet = 16/52 = 4/13

The probability of losing the bet = 1 – 4/13 = 9/13. Hence, odds against winning the bet = 9/13 : 4/13 = 9 : 4
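Example 7 can be checked by listing all 52 cards and counting the favourable ones directly. The sketch below is illustrative only; the rank and suit labels are assumptions. Because each card is tested once with an "or" condition, the ace of spades cannot be counted twice.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card pack: 13 ranks in each of 4 suits.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))

# Favourable cases: every card that is a spade OR an ace (each card counted once).
favourable = [card for card in deck if card[1] == "spades" or card[0] == "A"]

p_win = Fraction(len(favourable), len(deck))
p_lose = 1 - p_win
# Odds against winning = P(lose) : P(win), reduced to whole numbers.
odds = p_lose / p_win

print(len(favourable), p_win, p_lose)            # 16 4/13 9/13
print(f"{odds.numerator} : {odds.denominator}")  # 9 : 4
```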

Example 8: A single letter is selected at random from the word ‘PROBABILITY’. What is the probability that it is a vowel?

Solution: There are 11 letters in the word ‘PROBABILITY’, out of which 1 is to be selected. ∴ Total No. of cases = 11

There are four vowels viz. O, A, I, I.

Therefore favourable number of cases = 4

Hence, the required probability = 4/11

Example 9: Find the probability of drawing an ace from a set of 52 cards.

Solution: Number of exhaustive cases (n) = 52

There are 4 ace cards in an ordinary pack.

∴ Favourable cases (m) = 4 ∴ Probability of getting an ace = 4/52 = 1/13

Example 10: What is the probability that a leap year selected at random will contain 53 Sundays?

Solution: Total number of days in a leap year = 366

Number of weeks in a leap year = 366/7 = 52 weeks and 2 days

The following are the 7 possible combinations of these two extra days: (i) Monday and Tuesday (ii) Tuesday and Wednesday (iii) Wednesday and Thursday (iv) Thursday and Friday (v) Friday and Saturday (vi) Saturday and Sunday (vii) Sunday and Monday

A selected leap year will have 53 Sundays if one of these two extra days is a Sunday. Total possible outcomes for the 2 extra days = n = 7, Number of cases containing a Sunday = m = 2

∴ The required probability = 2/7
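The reasoning of Example 10 can be reproduced by listing the possible runs of extra days, assuming (as the solution does) that each of the 7 runs is equally likely. `prob_53_sundays` is a hypothetical helper written only for this illustration.

```python
from fractions import Fraction

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def prob_53_sundays(days_in_year: int) -> Fraction:
    """P(53 Sundays), assuming each of the 7 possible runs of extra days is equally likely."""
    extra = days_in_year - 52 * 7          # 2 for a leap year, 1 otherwise
    outcomes = []
    for start in range(7):                 # weekday on which the extra days begin
        extra_days = [DAYS[(start + k) % 7] for k in range(extra)]
        outcomes.append(extra_days)
    favourable = sum(1 for days in outcomes if "Sunday" in days)
    return Fraction(favourable, len(outcomes))


print(prob_53_sundays(366))  # 2/7
print(prob_53_sundays(365))  # 1/7
```

Called with 365 days, the same sketch gives 1/7, the non-leap-year answer worked out in Example 15.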

Example 11: A bag contains 5 red and 4 black balls. A ball is drawn at random. What is the probability that it is a red ball?

Solution: Total No. of balls in the bag = 5 + 4 = 9, No. of red balls in the bag = 5 ∴ Probability of getting a red ball = 5/9

Example 12: What is the probability of getting a king in a draw from a pack of cards?

Solution: Number of exhaustive cases = n = 52 There are 4 king cards in an ordinary pack. ∴
Number of favorable cases = m = 4 ∴ Probability of getting a king = 4/52 = 1/13

Example 13:

In a single throw with two uniform dice, find the probability of throwing a total of (i) five, (ii) eight.

Solution.

The exhaustive number of cases in a single throw with two dice is 6 × 6 = 36. (i) A sum of 5 can be obtained on the two dice in the following mutually exclusive ways: (1, 4), (4, 1), (2, 3), (3, 2), i.e., 4 cases in all, where the first and second numbers in the bracket ( ) refer to the numbers on the 1st and 2nd die respectively.

∴ Required probability = 4/36 = 1/9

(ii) The cases favourable to the event of getting a sum of 8 on two dice are: (2, 6), (6, 2), (3, 5), (5, 3), (4, 4), i.e., 5 distinct cases in all. ∴ Required probability = 5/36
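Example 13 can be verified by enumerating all ordered pairs of faces. This is a short sketch; `prob_sum` is a hypothetical helper name. Ordered pairs matter: (1, 4) and (4, 1) are different outcomes, which is why there are 36 equally likely cases rather than 21.

```python
from fractions import Fraction
from itertools import product

# All 6 x 6 = 36 equally likely ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob_sum(target: int) -> Fraction:
    """Probability that the two dice show a total of `target`."""
    favourable = [pair for pair in outcomes if sum(pair) == target]
    return Fraction(len(favourable), len(outcomes))


print(len(outcomes))  # 36
print(prob_sum(5))    # 1/9
print(prob_sum(8))    # 5/36
```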

Example 14:

Four cards are drawn at random from a pack of 52 cards.

Find the probability that (i) they are a king, a queen, a jack and an ace.

(ii) Two are kings and two are aces.

(iii) All are diamonds.

(iv) Two are red and two are black.

(v) There is one card of each suit.

(vi) There are two cards of clubs and two cards of diamonds.

Solution:

Four cards can be drawn from a well-shuffled pack of 52 cards in 52C4 ways, which gives the exhaustive number of cases.

(i) 1 king can be drawn out of the 4 kings in 4C1 = 4 ways. Similarly, 1 queen, 1 jack and 1 ace can each be drawn in 4C1 = 4 ways. Since any one of the ways of drawing a king can be associated with any one of the ways of drawing a queen, a jack and an ace, the favourable number of cases is 4C1 × 4C1 × 4C1 × 4C1.

Hence, required probability = 4C1 × 4C1 × 4C1 × 4C1/52C4 = 256/52C4

(ii) Required probability = 4C2 × 4C2/52C4

(iii) Since 4 cards can be drawn out of the 13 cards of diamonds in a pack in 13C4 ways, required probability = 13C4/52C4

(iv) Since there are 26 red cards (diamonds and hearts) and 26 black cards (spades and clubs) in a pack of cards, required probability = 26C2 × 26C2/52C4

(v) Since there are 13 cards of each suit in a pack of cards, required probability = 13C1 × 13C1 × 13C1 × 13C1/52C4

(vi) Required probability = 13C2 × 13C2/52C4
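The six answers of Example 14 can be evaluated numerically with `math.comb` (available from Python 3.8), which returns nCr. This sketch only evaluates the expressions derived in the solution; the dictionary labels are descriptive names chosen for the illustration.

```python
from fractions import Fraction
from math import comb

total = comb(52, 4)  # exhaustive number of cases, 52C4

answers = {
    "(i) king, queen, jack, ace": comb(4, 1) ** 4,
    "(ii) two kings, two aces": comb(4, 2) * comb(4, 2),
    "(iii) all diamonds": comb(13, 4),
    "(iv) two red, two black": comb(26, 2) * comb(26, 2),
    "(v) one of each suit": comb(13, 1) ** 4,
    "(vi) two clubs, two diamonds": comb(13, 2) * comb(13, 2),
}

print("52C4 =", total)  # 52C4 = 270725
for label, favourable in answers.items():
    p = Fraction(favourable, total)
    print(f"{label}: {favourable}/{total} = {p} ~ {float(p):.4f}")
```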

Example 15:
What is the chance that a non-leap year should have fifty-three Sundays?

Solution: A non-leap year consists of 365 days, i.e., 52 full weeks and one over-day. A non-leap year will have 53 Sundays if this over-day is a Sunday. This over-day can be any one of the possible outcomes (i) Sunday (ii) Monday (iii) Tuesday (iv) Wednesday (v) Thursday (vi) Friday (vii) Saturday, i.e., 7 outcomes in all. Of these, the number of ways favourable to the required event, viz., the over-day being Sunday, is 1. ∴ Required probability = 1/7

Example 16:

A bag contains 6 black and 9 white balls. A person draws out 2 balls. If on every black ball he gets Rs. 20 and on every white ball Rs. 10, find out his expectation.

Solution: There may be the following three options for drawing 2 balls: (i) both are white, (ii) both are black, (iii) one is white and the other is black.

(i) Both balls are white: P (2W) = p = 9C2/15C2 = 12/35

Expectation = p × m = (12/35) × 10 × 2 = Rs. 6.86

(ii) Both balls are black: P (2B) = p = 6C2/15C2 = 1/7

Expectation = p × m = (1/7) × 20 × 2 = Rs. 5.71

(iii) One ball is white and the other is black: P (1W 1B) = p = 6C1 × 9C1/15C2 = 18/35

Expectation = p × m = (18/35) × (20 + 10) = Rs. 15.43

Total Expectation = 6.86 + 5.71 + 15.43 = Rs. 28
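Example 16 can be recomputed by looping over the number of black balls drawn (0, 1 or 2). This is a minimal sketch under the same assumptions as the solution: 2 balls drawn together at random, Rs. 20 per black ball and Rs. 10 per white ball. Exact fractions are used so that the total comes out as exactly 28.

```python
from fractions import Fraction
from math import comb

BLACK, WHITE = 6, 9
PAY_BLACK, PAY_WHITE = 20, 10   # rupees per ball drawn
total = comb(BLACK + WHITE, 2)  # 15C2 = 105 ways to draw 2 balls

expectation = Fraction(0)
for k_black in range(3):        # number of black balls among the 2 drawn
    k_white = 2 - k_black
    p = Fraction(comb(BLACK, k_black) * comb(WHITE, k_white), total)
    payoff = k_black * PAY_BLACK + k_white * PAY_WHITE
    expectation += p * payoff
    print(f"{k_black} black, {k_white} white: p = {p}, contribution = {float(p * payoff):.2f}")

print("Total expectation = Rs.", expectation)  # Total expectation = Rs. 28
```

The total also follows from linearity of expectation: on average 2 × 6/15 = 0.8 black and 1.2 white balls are drawn, giving 0.8 × 20 + 1.2 × 10 = Rs. 28.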

Example 17:

If it rains, a taxi driver can earn Rs. 1000 per day. If it is fair, he can lose Rs. 100 per day. If the probability of rain is 0.4, what is his expectation?

Solution: The distribution of earnings (X) is given as:

X: X1 = 1000, X2 = – 100

P: P1 = 0.4, P2 = 1 – 0.4 = 0.6

∴ E (X) = P1 X1 + P2 X2 = 0.4 × 1000 + 0.6 × (– 100) = Rs. 340

Example 18:

A petrol pump dealer sells petrol worth Rs. 80,000 on average on a rainy day and Rs. 95,000 on average on a clear day. The probability of clear weather on Tuesday is 76%. What will be the expected sale?

Solution: The distribution of sales (X) is given as:

X1 = 80,000 (rainy day), P1 = 1 – 0.76 = 0.24

X2 = 95,000 (clear day), P2 = 0.76

∴ E (X) = 80,000 × 0.24 + 95,000 × 0.76

= Rs. 91,400

Example 19:

A survey conducted over the last 25 years indicated that in 10 years the winter was mild, in 8 years it was cold and in the remaining 7 it was very cold. A company sells 1000 woollen coats in a mild year, 1300 in a cold year and 2000 in a very cold year. If a woollen coat costs Rs. 173 and is sold for Rs. 248, find the yearly expected profit of the company.

Solution: The profit on each coat is 248 – 173 = Rs. 75.

State of Nature Prob. P (X) Sale of woollen coat Profit (X)

Mild winter 10/25 = 0.4 1000 1000 × (248 – 173)

Cold winter 8/25 = 0.32 1300 1300 × (248 – 173)

Very cold winter 7/25 = 0.28 2000 2000 × (248 – 173)

∴ Expected profit is given by

E (X) = 1000 × 75 × 0.4 + 1300 × 75 × 0.32 + 2000 × 75 × 0.28

= 30,000 + 31,200 + 42,000 = Rs. 1,03,200
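Examples 17 to 19 all apply the same formula, E(X) = P1 X1 + P2 X2 + ... . A small helper (the name `expected_value` is hypothetical) makes the pattern explicit; it is a sketch that assumes the probabilities of all outcomes are listed and sum to 1. Results are rounded because decimal probabilities such as 0.24 are not stored exactly in floating point.

```python
def expected_value(values, probabilities):
    """E(X) = sum of x * p over all outcomes of a discrete distribution."""
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))


# Example 17: taxi driver (earnings in Rs.)
print(round(expected_value([1000, -100], [0.4, 0.6]), 2))             # 340.0
# Example 18: petrol pump (sales in Rs.)
print(round(expected_value([80_000, 95_000], [0.24, 0.76]), 2))       # 91400.0
# Example 19: woollen coats, profit of Rs. 75 per coat
profits = [1000 * 75, 1300 * 75, 2000 * 75]
print(round(expected_value(profits, [10 / 25, 8 / 25, 7 / 25]), 2))  # 103200.0
```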

Example 20:

A bag contains 20 tickets marked with numbers 1 to 20. One ticket is drawn at random. Find
the probability that it will be a multiple of (i) 2 or 5, (ii) 3 or 5.

Solution: One ticket can be drawn out of 20 tickets in 20C1 = 20 ways, which determines the exhaustive number of cases.

(i) The number of cases favourable to getting the ticket number which is : (a) a multiple of 2
are 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 i.e., 10 cases. (b) a multiple of 5 are 5, 10, 15, 20 i.e., 4
cases. Of these, two cases viz., 10 and 20 are duplicated. Hence the number of distinct cases
favourable to getting a number which is a multiple of 2 or 5 are 10 + 4 – 2 = 12. ∴ Required
probability = 12/20 = 3/5 = 0·6.

(ii) The cases favourable to getting a multiple of 3 are 3, 6, 9, 12, 15, 18, i.e., 6 cases in all, and those favourable to getting a multiple of 5 are 5, 10, 15, 20, i.e., 4 cases in all. Of these, one case, viz., 15, is duplicated. Hence, the number of distinct cases favourable to getting a multiple of 3 or 5 is 6 + 4 – 1 = 9.

Required probability = 9/20 = 0·45

9.1. CONCEPTS OF RANDOM EXPERIMENTS AND RANDOM VARIABLES


A set of conditions used to observe the behaviour of some variables is known as an experiment. A random experiment is one that is carried out under specified conditions but whose results cannot be predicted with certainty. Each performance of a random experiment is called a trial. The result of each trial can vary, which is why the experiment is considered random.

The set of all possible results of a random experiment is called the sample space, and each individual outcome is called a sample point. For instance, the sample space for the random experiment "counting the number of rainy days in June" consists only of the numbers 0 through 30. By contrast, the results of 'measuring the rainfall depth', 'soil moisture content' or 'wind speed' at a place may take any non-negative value, so the sample space for these random experiments comprises all real numbers from 0 to ∞.

Any subset of a sample space is called an event. An event may consist of one or more sample points (discrete sample space), or it may be a range of values from the sample space (continuous sample space). For the experiment "counting the number of rainy days in June", the event "the number of rainy days equals 10" is an example from a discrete sample space. For the continuous sample space of "wind speed" at a place, "wind speed greater than 100 km/h" is an example of an event.

Complement

Sometimes we want to know the probability that an event will not happen. The event opposite to the event of interest is called the complementary event. If A is an event, its complement is denoted by Aᶜ (also written Ā). Example: the complement of the event "male" is the event "female".

P(A) + P(Aᶜ) = 1

9.2. VIEWS OF PROBABILITY: SUBJECTIVE AND CLASSIC

Views of Probability: Subjective

A subjective probability is an estimate that reflects a person's opinion, or best guess, about whether an outcome will occur. It is important in medicine, where it forms the basis of a physician's opinion (based on information gained in the history and physical examination) about whether a patient has a specific disease. Such an estimate can be revised in the light of the results of diagnostic procedures.

Views of Probability: Classic

It is well known that the probability of getting a "tail" when flipping a fair coin is 0.50. If a coin is flipped 10 times, is there a guarantee that exactly 5 tails will be observed? What if the coin is flipped 100 times, or 1000 times? There is no such guarantee, but as the number of flips becomes larger, the proportion of flips that result in tails approaches 0.50.
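The long-run behaviour described above can be illustrated with a simulation. This sketch is an illustration, not part of the original text: it uses Python's pseudo-random generator as a stand-in for a fair coin, and a fixed seed (an arbitrary choice) so the run can be repeated. Individual runs differ, but the printed proportions tend to settle near 0.50 as n grows.

```python
import random


def tail_proportion(n_flips: int, rng: random.Random) -> float:
    """Flip a simulated fair coin n_flips times and return the share of tails."""
    tails = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return tails / n_flips


rng = random.Random(2024)  # fixed seed so the run can be repeated
for n in (10, 100, 1000, 100_000):
    print(n, tail_proportion(n, rng))
```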

9.3. TYPES OF PROBABILITY


• Marginal probabilities

• Conditional probabilities

• Joint probability

• Rules of probability

9.3.1. MARGINAL PROBABILITIES


In probability theory and statistics, the marginal distribution of a subset of a set of random variables is the probability distribution of the variables contained in that subset. It gives the probabilities of the various values of the variables in the subset without reference to the values of the other variables.

Example: In problem 1, P(Male) and P(Blood group A) are marginal probabilities. For instance, P(Male) = number of males/total number of subjects = 50/100 = 0.5

9.3.2. CONDITIONAL PROBABILITIES


It is the probability of an event on condition that a certain criterion is satisfied.

The likelihood of a particular outcome occurring given that another event has occurred is known as conditional probability. It is usually written as P(B|A) and read as "the probability of B given A"; it measures how likely B is once A is known to have occurred.

Example:

If a subject is selected at random and found to be female, what is the probability that she has blood group O? Here the total possible outcomes constitute a subset (the females) of the total number of subjects. This probability is termed the probability of O given F: P(O|F) = 20/50 = 0.40

9.3.3. JOINT PROBABILITIES


It is the probability of occurrence of two or more events together. Joint probability is the probability that two events will occur simultaneously, i.e., at the same time.

Example:

Probability of being male and belonging to blood group AB: P(M and AB) = P(M∩AB) = 5/100 = 0.05

where ∩ = intersection

Properties

• The probability of an outcome ranges between 0 and 1.

• If an outcome cannot occur, its probability is 0.

• If an outcome is certain, it has a probability of 1.

• The sum of the probabilities of mutually exclusive and exhaustive outcomes is equal to 1; for example, P(M) + P(F) = 1.

9.4. RULES OF PROBABILITY: MULTIPLICATION RULE FOR INDEPENDENT AND DEPENDENT EVENTS

Independent

If the occurrence of one event has no bearing on the likelihood that the other event will also
occur, then two events A and B are said to be independent.

P (A and B) = P(A) P(B)

Example:

The joint probability of being male and having blood type O.

• To check whether two events are independent, compute the marginal and the conditional probability of one of them. If they are equal, the two events are independent; if they are not equal, the two events are dependent.

• P(O) = 40/100 = 0.40

• P(O|M) = 20/50 = 0.40

• Since P(O|M) = P(O), the two events are independent.

• P(O∩M) = P(O) P(M) = (40/100) (50/100) = 0.20
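The independence check above can be reproduced from the counts quoted in this unit (100 subjects, 50 males, 40 with blood group O, of whom 20 are male). Only these counts are used, since the full table of problem 1 is not reproduced here; the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the text for the blood-group example (problem 1).
TOTAL = 100
MALES = 50
GROUP_O = 40
MALE_AND_O = 20

p_male = Fraction(MALES, TOTAL)                 # marginal P(M)
p_o = Fraction(GROUP_O, TOTAL)                  # marginal P(O)
p_o_given_male = Fraction(MALE_AND_O, MALES)    # conditional P(O|M)
p_o_and_male = Fraction(MALE_AND_O, TOTAL)      # joint P(O ∩ M)

independent = p_o_given_male == p_o
print("P(M) =", p_male, " P(O) =", p_o, " P(O|M) =", p_o_given_male)
print("independent:", independent)              # independent: True
# Multiplication rule for independent events: P(O ∩ M) = P(O) P(M)
print(p_o_and_male == p_o * p_male)             # True
```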

Dependent

The two occurrences are said to be dependent if the occurrence of one event affects the
likelihood that the other event will also occur.

P (A and B) = P(A) P(B|A)

Example:

The joint probability of being ill and eating barbecue

P(Ill) = 110/200 = 0.55

P(Ill|Eat B) = 90/120 = 0.75

Since P(Ill|Eat B) ≠ P(Ill), the two events are dependent.

P(Ill∩Eat B) = P(Eat B) P(Ill|Eat B)

= (120/200) (90/120)

= 0.45
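The barbecue example can be sketched the same way, using only the counts given in the text (200 people, 110 ill, 120 who ate barbecue, 90 who ate barbecue and became ill). Here the conditional probability differs from the marginal one, so the general multiplication rule P(A ∩ B) = P(A) P(B|A) is applied. Variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the text for the barbecue example.
TOTAL = 200          # all people surveyed
ILL = 110            # people who became ill
ATE = 120            # people who ate barbecue
ILL_AND_ATE = 90     # people who ate barbecue and became ill

p_ill = Fraction(ILL, TOTAL)
p_ate = Fraction(ATE, TOTAL)
p_ill_given_ate = Fraction(ILL_AND_ATE, ATE)

print("dependent:", p_ill_given_ate != p_ill)   # dependent: True
# Multiplication rule for dependent events: P(A ∩ B) = P(A) P(B|A)
joint = p_ate * p_ill_given_ate
print(joint, float(joint))                      # 9/20 0.45
```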

Addition Rule

Assume that there are two events, A and B. Depending on whether they are mutually exclusive,
two different rules apply.

Rule 1: When the events are mutually exclusive

When the events are mutually exclusive, the probability that either of them occurs is equal to the sum of their individual probabilities.

P(A OR B) = P(A ∪ B) = P(A) + P(B)

Example:

The probability of being either blood type O or blood type A (a person cannot belong to both groups, so the events are mutually exclusive)

P(O∪A) = P(O) + P(A)

= (40/100) + (35/100) = 0.75

Rule 2: When the events are not mutually exclusive, Rule 2 applies.

Two events that are not mutually exclusive can occur together, so the probability of their overlap would be counted twice if the individual probabilities were simply added. It is therefore subtracted once:

P (A OR B) = P (A ∪ B) = P(A) + P(B) − P (A ∩ B)
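The general addition rule can be checked by direct counting on the ticket data of Example 20 (tickets numbered 1 to 20). This sketch uses Python sets, where `|` is union (∪) and `&` is intersection (∩); `prob` is a hypothetical helper name.

```python
from fractions import Fraction

tickets = set(range(1, 21))                      # tickets numbered 1 to 20
mult_2 = {t for t in tickets if t % 2 == 0}
mult_3 = {t for t in tickets if t % 3 == 0}
mult_5 = {t for t in tickets if t % 5 == 0}


def prob(event: set) -> Fraction:
    """Classical probability: favourable tickets / all tickets."""
    return Fraction(len(event), len(tickets))


for a, b, name in ((mult_2, mult_5, "2 or 5"), (mult_3, mult_5, "3 or 5")):
    direct = prob(a | b)                          # count the union directly
    by_rule = prob(a) + prob(b) - prob(a & b)     # general addition rule
    print(name, direct, by_rule, direct == by_rule)
```

Both methods give 3/5 (= 0.6) for "2 or 5" and 9/20 (= 0.45) for "3 or 5", matching Example 20.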

9.5. RANDOM EVENTS AND PROBABILITY


• The sample space associated with an experiment is the set consisting of all possible outcomes and is called the sure event in the experiment. A sample space will be denoted by S. The sample space, together with the probabilities assigned to its events, forms a probability space.

• An outcome in S is also called a sample point. An event A is a subset of outcomes in S, that is, A ⊆ S. We say that an event A occurs if the outcome of the experiment is in A.

• The null subset ∅ of S is called an impossible event.

• The event A ∪ B consists of all outcomes that are in A or in B or in both.

• The event A ∩ B consists of all outcomes that are in both A and B.

• The event Aᶜ (the complement of A in S) consists of all outcomes that are in S but not in A.

9.6. COUNTING TECHNIQUES AND CALCULATION OF PROBABILITIES


In a sample space with a large number of outcomes, determining the number of outcomes
associated with the events through direct enumeration could be tedious.

9.6.1. MULTIPLICATION PRINCIPLE:


If the experiments A1, A2, ..., Am contain, respectively, n1, n2, ..., nm outcomes, such that for each possible outcome of A1 there are n2 possible outcomes for A2, and so on, then there are a total of n1 × n2 × ... × nm possible outcomes for the composite experiment A1, A2, ..., Am.

For m = 2, n1 = 2 and n2 = 3, the tree diagram is as follows. If we count the total number of branches at the top of the tree, we get the total number of possible outcomes for the composite experiment.

Figure 9.1

We can see that there are a total of six branches that represent all the possible outcomes of this experiment. Tree diagrams can be utilized for counting for any finite number of composite experiments.
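The multiplication principle for m = 2, n1 = 2 and n2 = 3 can be sketched with `itertools.product`, which lists every path through the tree diagram. The outcome labels a, b and x, y, z are hypothetical placeholders.

```python
from itertools import product

# Hypothetical outcome labels: experiment A1 has n1 = 2 outcomes, A2 has n2 = 3.
A1 = ["a", "b"]
A2 = ["x", "y", "z"]

# Each tuple is one path from the root to the top of the tree diagram.
branches = list(product(A1, A2))
print(len(branches), branches)
# 6 [('a', 'x'), ('a', 'y'), ('a', 'z'), ('b', 'x'), ('b', 'y'), ('b', 'z')]
```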

9.7. CONDITIONAL PROBABILITY


If A and B are two events in a sample space S and P(B) > 0, the conditional probability of A given that B has already occurred is obtained as the ratio of the probability of the intersection of A and B to the probability of B:

P(A|B) = P(A ∩ B)/P(B), provided P(B) > 0

9.8. BAYES’ THEOREM


Bayes’ rule shows how probabilities change in the light of information and how to calculate
them.

Bayes' theorem, also known as Bayes' rule, Bayes' law or Bayesian reasoning, determines the probability of an event on the basis of uncertain knowledge.

• In probability theory, it relates the conditional probability and marginal probabilities of two
random events.

• It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).

• Bayes' theorem allows the probability prediction of an event to be updated by observing new information about the real world.

Applying Bayes rule

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B) and P(A):

P(B|A) = P(A|B) P(B)/P(A)

This is very useful in cases where we have good estimates of these three terms and want to determine the fourth one.

• Suppose we observe the effect of some unknown cause and want to compute the probability of that cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause)/P(effect)
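Bayes' rule can be illustrated with a diagnostic-test setting like the physician example in 9.2. All the figures below are hypothetical, chosen only to show the calculation. P(effect) is obtained from the multiplication and addition rules of 9.4 as P(positive) = P(positive|disease) P(disease) + P(positive|healthy) P(healthy).

```python
def bayes(p_effect_given_cause: float, p_cause: float, p_effect: float) -> float:
    """Bayes' rule: P(cause|effect) = P(effect|cause) P(cause) / P(effect)."""
    return p_effect_given_cause * p_cause / p_effect


# Hypothetical figures (not from the text): a disease with 1% prevalence,
# a test that is positive in 90% of patients and in 5% of healthy people.
p_disease = 0.01
p_pos_given_disease = 0.90
p_pos_given_healthy = 0.05

# P(positive): disease and healthy are mutually exclusive and exhaustive.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

posterior = bayes(p_pos_given_disease, p_disease, p_pos)
print(f"P(disease | positive test) = {posterior:.3f}")  # P(disease | positive test) = 0.154
```

Even with a fairly accurate test, the updated probability stays low because the disease is rare in this hypothetical population.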

9.9. SUMMARY
A set of conditions that are used to observe the behaviour of some variables is referred to as an experiment. An event is a subset of a sample space. In a discrete sample space, an event can be made up of a single sample point or of multiple sample points; in a continuous sample space, an event can be a range taken from the sample space. Conditional probability describes the likelihood of one particular outcome taking place given that one or more other events have occurred. The joint probability of two events is the probability that both of those events occur simultaneously.

The marginal distribution of a subset of a set of random variables is the probability distribution of the variables contained in that subset. If the occurrence of one event does not in any way affect the likelihood that the other event will also take place, then the two events, A and B, are independent of one another. To check this, the marginal and conditional probabilities of one of the events are compared: if they are equal, the two events are independent; if they are not equal, the two events are dependent. For the composite experiment A1, A2, ..., Am, there are a total of n1 × n2 × ... × nm possible outcomes; for m = 2, n1 = 2 and n2 = 3, the tree diagram has six branches.

The method of sampling determines the total number of possibilities for selecting a random sample of size k. The permutation formula is used when a random sample of size k is taken from a total of n objects without replacement and the objects are arranged in a specific order; for example, the number of unique three-digit numbers that can be formed from a set of four digits is equal to the number of such permutations. With the help of Bayes' rule, we can compute the single term P(B|A) in terms of P(A|B), P(B) and P(A), drawing on prior experience and knowledge. Bayes' theorem allows the probability prediction of an event to be updated by observing new information about the real world.

9.10. KEY WORDS

Probability: Statistics uses probability to indicate the likelihood of an event happening. Probability has a value between 0 and 1. In real life, we may need to forecast the result of an event in a number of different situations, and we may be certain or uncertain about how it will turn out. In such circumstances, we say there is a chance that it will happen.

Event: A set of results (outcomes) of an experiment is referred to as an event in probability. For instance, suppose that you toss a coin as part of an experiment. The result of this experiment is that the coin lands either "heads" or "tails"; these can be regarded as the events related to the experiment.

Conditional Probability: The chance of a particular outcome happening given that another event has already happened is referred to as its conditional probability. It is written as P(B|A) and read as "the probability of B given A".

9.11. REVIEW QUESTIONS

Q.1. Elaborate the Rules of Probability: Multiplication Rule for Independent and Dependent Events.

Q.2. Briefly explain Bayes’ Theorem.

Q.3. What do you understand by Conditional Probability? Explain with examples.



UNIT 10: SAMPLING

CONTENT:

▪ Objectives

10.1 Introduction

10.2 Census Versus Sample Enumeration

10.3 Sampling Methods

10.4 Principles Of Sampling

10.5 Limitations Of Sampling

10.6 References For Further Readings

OBJECTIVES:
1. To understand the concept of sampling and its importance in statistical research.

2. To differentiate between census and sample enumeration methods.

3. To explore various sampling methods and techniques used for data collection.

4. To discuss the principles underlying the theory of sampling.

5. To recognize the limitations and challenges associated with sampling procedures.



10.1 INTRODUCTION
• A finite subset of the population, selected from it with the objective of investigating its
properties is called a sample and the number of units in the sample is known as the
sample size. Sampling is a tool which enables us to draw conclusions about the
characteristics of the population after studying only those objects or items that are
included in the sample.

• The main objectives of the sampling theory are: (i) To obtain the optimum results, i.e., the maximum information about the characteristics of the population with the available resources at our disposal in terms of time, money and manpower, by studying the sample values only. (ii) To obtain the best possible estimates of the population parameters.

10.2 CENSUS VERSUS SAMPLE ENUMERATION


• There are two methods of collecting the data: (i) The Census Method or Complete Enumeration. (ii) The Sample Method or Partial Enumeration.

• Census Method. In the census method we resort to 100% inspection of the population and enumerate each and every unit of the population. In the sample method, by contrast, we inspect only a selected, representative and adequate fraction (finite subset) of the population, and after analysing the results of the sample data we draw conclusions about the characteristics of the population.

• The census method has its obvious limitations and drawbacks given below :

(i) The complete enumeration of the population requires a lot of time, money, manpower and administrative personnel. As such, this method can be adopted only by the government and big organisations that have vast resources at their disposal.

(ii) Since the entire population is to be enumerated, the census method is usually very
time consuming. If the population is sufficiently large, then it is possible that the
processing and the analysis of the data might take so much time that when the results
are available they are not of much use because of changed conditions.

• When to use the Census Method? The census method is recommended in the following situations:

• (a) If the information is required about each and every unit of the population, there is no way but to resort to 100% enumeration.

• (b) In any manufacturing process in industry, recourse should be taken to 100% enumeration under the following conditions: (i) The occurrence of a defect may cause loss of life or serious casualty to personnel. (ii) A defect may cause serious malfunction of the equipment. It may also be desirable to carry out a complete census if (i) N, the lot size, is small and (ii) the incoming lot quality is poor or unknown.

• We now summarise the merits of the sample method over the census method.

• 1. Speed, i.e., less time. Since only a part of the population is to be inspected and
examined, the sample method results in considerable amount of saving in time and
labour. There is saving in time not only in conducting the sampling enquiry but also in
the processing, editing and analysing the data. This is a very sensitive and important
point for the statistical investigations where the results are urgently and quickly needed.

• 2. Economy, i.e., Reduced Cost of the Enquiry. The sample method is much more
economical than a complete census. In a sample enquiry, there is reduction in the cost
of collection of the information, administration, transport, training and man hours.
Although the labour and the expenses of obtaining information per unit are generally
larger in a sample enquiry than in the census method, the overall expenses of a sample
survey are relatively much less, since only a fraction of the population is to be
enumerated. This is particularly significant in conducting socio-economic surveys in
developing countries with budding economies who cannot afford a complete census
because of lack of finances.

• 3. Administrative Convenience. A complete census requires a very large administrative set-up involving a lot of personnel, trained investigators and, above all, co-ordination between the various operating agencies. On the other hand, the organisation and administration of a sample survey is much more convenient, as it requires fewer personnel and the field of enquiry is also limited.

10.3 SAMPLING METHODS


• The sampling techniques may be broadly classified as follows : (i) Purposive or
Subjective or Judgment Sampling. (ii) Probability Sampling. (iii) Mixed Sampling.

• Purposive or Subjective or Judgment Sampling. In this method, a desired number of sample units is selected deliberately or purposely depending upon the object of the enquiry, so that only the important items representing the true characteristics of the population are included in the sample.

• An obvious and serious drawback of this sampling scheme is that it is highly subjective
in nature, since the selection of the sample depends entirely on the personal
convenience, beliefs, biases and prejudices of the investigator. For example, if in a
socio-economic survey it is desired to study the standard of living of the people in New
Delhi and if the investigator wants to show that the standard has gone down, then he
may include individuals in the samples only from the low income stratum of the society
and exclude the people from the posh colonies like South Extension, Greater Kailash,
Jor Bagh, Chanakyapuri and so on.

• Probability Sampling. Probability sampling provides a scientific technique of drawing samples from the population according to some laws of chance, in which each unit in the universe has some definite pre-assigned probability of being selected in the sample. The different types of probability sampling are those in which: (i) each sample unit has an equal chance of being selected; (ii) sampling units have varying probabilities of being selected; (iii) the probability of selection of a unit is proportional to the size of the unit.

• Mixed Sampling

• Some of the important types of sampling schemes covered are:

(i) Simple Random Sampling
(ii) Stratified Random Sampling
(iii) Systematic Sampling
(iv) Multistage Sampling
(v) Quasi Random Sampling
(vi) Area Sampling
(vii) Simple Cluster Sampling
(viii) Multistage Cluster Sampling
(ix) Quota Sampling

• SIMPLE RANDOM SAMPLING :

Simple random sampling (S.R.S.) is the technique in which sample is so drawn that
each and every unit in the population has an equal and independent chance of being
included in the sample.

If the unit selected in any draw is not replaced in the population before making the next
draw, then it is known as simple random sampling without replacement (srswor) and if
it is replaced back before making the next draw, then the sampling plan is called simple
random sampling with replacement (srswr).
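The difference between srswor and srswr can be seen in a short sketch using Python's standard `random` module: `sample` draws without replacement, while repeated calls to `choice` draw with replacement. The population of 20 numbered units and the seed are hypothetical.

```python
import random

population = list(range(1, 21))   # hypothetical population of 20 numbered units
rng = random.Random(42)           # fixed seed so the draws can be reproduced

# srswor: a unit, once drawn, is not returned, so no unit can appear twice
srswor = rng.sample(population, k=5)

# srswr: every draw is made from the full population, so repeats are possible
srswr = [rng.choice(population) for _ in range(5)]

print("without replacement:", srswor)
print("with replacement:   ", srswr)
```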

Selection of a Simple Random Sample. Proper care must be exercised to ensure that
the sample drawn is random and therefore, representative of the population. A random
sample may be selected by : (i) Lottery Method. (ii) Use of Table of Random Numbers.

• Lottery Method. The simplest method of drawing a random sample is the lottery
system. This consists in identifying each and every member or unit of the population
with a distinct number which is recorded on a slip or a card. These slips should be as
homogeneous as possible in shape, size, colour, etc., to avoid the human bias. The lot
of these slips or cards is a kind of miniature of the population for sampling purposes. If
the population is small, then these slips are put in a bag and thoroughly shuffled and
then as many slips as units needed in the sample are drawn one by one, the slips being
thoroughly shuffled after each draw.

• For example, let us suppose that we want to draw a random sample of 10 individuals
from a population of 100 individuals. We assign the numbers 1 to 100, one number to
each individual of the population and prepare 100 identical slips bearing the numbers
from 1 to 100. These slips are then placed in a bag or container and shuffled thoroughly.
Finally, a sample of 10 slips is drawn out one by one.

• The lottery method gives a sample which is quite independent of the properties of the
population. It is one of the best and most commonly used methods of selecting random
samples. It is quite frequently used in the random draw of prizes, in the Tambola games
and so on.
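The lottery procedure described above (100 numbered slips, shuffled before each of 10 draws) can be mimicked in code. This is only a sketch: the function name `lottery_draw` and the seed are hypothetical, and a pseudo-random generator stands in for the physical shuffling of slips.

```python
import random

def lottery_draw(population_size, sample_size, seed=None):
    """Mimic the lottery method: numbered slips in a bag, shuffled before every draw."""
    rng = random.Random(seed)
    bag = list(range(1, population_size + 1))   # one slip per individual
    drawn = []
    for _ in range(sample_size):
        rng.shuffle(bag)          # shuffle thoroughly before each draw
        drawn.append(bag.pop())   # take one slip out; it is not put back
    return drawn

sample = lottery_draw(100, 10, seed=7)
print(sorted(sample))
```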

• Use of Table of Random Numbers.

• The most practical and inexpensive method of selecting a random sample consists in
the use of ‘Random Number Tables’, which have been so constructed that each of the
digits 0, 1, 2, …, 9 appears with approximately the same frequency and independently
of each other. If we have to select a sample from a population of size N(≤ 99), then the
numbers can be combined two by two to give pairs from 00 to 99. Similarly if N ≤ 999
or N ≤ 9999 and so on, then combining the digits three by three (or four by four and so
on), we get numbers from 000 to 999 or 0000 to 9999 and so on. Since each of the digits
0, 1, 2, …, 9 occurs with approximately the same frequency and independently of each
other, so does each of the pairs 00 to 99, triplets 000 to 999 or quadruplets 0000 to 9999
and so on.

• The method of drawing a random sample comprises the following steps: (i) Identify the N units in the population with the numbers 1 to N. (ii) Select any page of the ‘random number table’ at random and read the numbers in any row, column or diagonal, ignoring numbers greater than N (and, when sampling without replacement, numbers already selected) until the required number of units is obtained. (iii) The population units corresponding to the numbers selected in step (ii) constitute the random sample.
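The use of a random number table can be sketched as a function that reads a string of digits in fixed-width groups (two digits when N ≤ 99, three when N ≤ 999, and so on), skips groups outside 1 to N as well as repeats, and stops once n units are found. The digit string is produced by a pseudo-random generator as a stand-in for a printed table page; it is not taken from any published table, and the function name is hypothetical.

```python
import random

def sample_from_digit_table(digits, N, n):
    """Read digits in groups of width len(str(N)); keep distinct values in 1..N until n are found."""
    width = len(str(N))          # two-digit groups for N <= 99, three for N <= 999, ...
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= N and number not in chosen:   # skip out-of-range values and repeats
            chosen.append(number)
            if len(chosen) == n:
                break
    return chosen

# Worked case: groups 05, 12, 99 (too large), 05 (repeat), 12 (repeat), 47
print(sample_from_digit_table("0512990512473300", N=50, n=3))   # [5, 12, 47]

# Stand-in for a page of a random number table (hypothetical, generated here)
rng = random.Random(2024)
table_page = "".join(str(rng.randrange(10)) for _ in range(400))
print(sample_from_digit_table(table_page, N=80, n=8))
```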

• STRATIFIED RANDOM SAMPLING

• Stratified random sampling involves the following steps:

1. Stratify the given population into a number of sub-groups or sub-populations, known as strata, such that: (a) the units within each stratum (sub-group) are as homogeneous as possible; (b) the differences between the various strata are as marked as possible, i.e., the stratum means differ as widely as possible; (c) the strata are non-overlapping, which means that each and every unit in the population belongs to one and only one stratum.

2. Draw a simple random sample from each stratum; the sub-samples taken together constitute the stratified random sample.

• The criterion used for the stratification of the universe into various strata is known as the stratifying factor. In general, geographical, sociological or economic characteristics form the basis of stratification of the given population. Some of the commonly used stratifying factors are age, sex, income, occupation, education level, geographic area, economic status, etc.
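A minimal sketch of stratified random sampling: the population is divided into strata and a simple random sample is drawn from each. It assumes proportional allocation (each stratum's share of the sample equals its share of the population), which is one common choice but not the only one. The income-group strata, their sizes and the function name are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Proportional allocation n_h = n * N_h / N (rounded), then an SRS within each stratum."""
    rng = random.Random(seed)
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)
        sample[name] = rng.sample(units, n_h)   # simple random sample without replacement
    return sample

# Hypothetical population of 1,000 employees stratified by income group
strata = {
    "low income": [f"L{i}" for i in range(500)],
    "middle income": [f"M{i}" for i in range(300)],
    "high income": [f"H{i}" for i in range(200)],
}
sample = stratified_sample(strata, n=50, seed=3)
print({name: len(units) for name, units in sample.items()})
```

Because each n_h is rounded, the stratum sample sizes may not add up exactly to n for other population sizes; the allocation would then need a small adjustment.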

• Merits of Stratified Random Sampling

• More Representative Sample

• A stratified random sample gives adequate representation to each stratum or important section of the population and eliminates the possibility of any important group of the population being completely ignored.

• Greater Precision. As a consequence of the reduced variability within each stratum, stratified random sampling provides more efficient estimates than simple random sampling.

• Administrative Convenience. The division of the population into relatively homogeneous sub-groups brings administrative convenience.
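The "greater precision" merit can be checked with the standard variance formulas for the sample mean under sampling without replacement: (1 − n/N)S²/n for simple random sampling, and ΣW_h²(1 − n_h/N_h)S_h²/n_h for the stratified mean, where W_h = N_h/N and the variances use the divisor N − 1. The two-stratum population of 16 values below is hypothetical, chosen so that the strata differ sharply while the units within each stratum are similar.

```python
from statistics import variance

def var_srs_mean(values, n):
    """Variance of the sample mean under srswor: (1 - n/N) * S^2 / n."""
    N = len(values)
    return (1 - n / N) * variance(values) / n

def var_stratified_mean(strata, allocation):
    """Variance of the stratified mean: sum over strata of W_h^2 * (1 - n_h/N_h) * S_h^2 / n_h."""
    N = sum(len(values) for values in strata.values())
    total = 0.0
    for name, values in strata.items():
        N_h, n_h = len(values), allocation[name]
        W_h = N_h / N
        total += W_h ** 2 * (1 - n_h / N_h) * variance(values) / n_h
    return total

# Hypothetical population: two well-separated strata of 8 units each
strata = {
    "A": [10, 12, 11, 13, 9, 10, 12, 11],
    "B": [50, 52, 49, 51, 53, 50, 48, 51],
}
population = strata["A"] + strata["B"]

v_srs = var_srs_mean(population, n=4)
v_st = var_stratified_mean(strata, allocation={"A": 2, "B": 2})
print(f"SRS: {v_srs:.3f}   stratified: {v_st:.3f}")
```

With the same total sample size of 4, the stratified mean here has a far smaller variance, because almost all of the population's variability lies between the strata rather than within them.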

• SYSTEMATIC SAMPLING

• Systematic sampling is a slight variation of simple random sampling in which only the first sample unit is selected at random and the remaining units are automatically selected in a definite sequence, at equal spacing from one another.

• This technique is usually recommended if a complete and up-to-date list of the sampling units, i.e., the frame, is available and the units are arranged in some systematic order such as alphabetical, chronological or geographical order.

• This requires the sampling units in the population to be ordered in such a way that each item in the population is uniquely identified by its order, for example the names of persons in a telephone directory, the list of voters, etc.
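A minimal sketch of systematic selection: the sampling interval is k = N // n, the first unit is chosen at random from the first k units, and every k-th unit after it is taken. If N is not a multiple of n, integer division is used here and a few units at the end of the list can never be selected; this is a simplification. The voters' list and function name are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start within the first interval, then every k-th unit, where k = N // n."""
    N = len(frame)
    k = N // n                      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)        # 0-based position of the randomly chosen first unit
    return [frame[start + i * k] for i in range(n)]

# Hypothetical alphabetical voters' list of 100 names
voters = [f"Voter {i:03d}" for i in range(1, 101)]
print(systematic_sample(voters, n=10, seed=5))
```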

• CLUSTER SAMPLING

• In this case the total population is divided, depending on problem under study, into
some recognisable sub-divisions which are termed as clusters and a simple random
sample of these clusters is drawn.

• We then observe, measure and interview each and every unit in the selected clusters.

• For example, if we are interested in obtaining the income or opinion data in a city, the
whole city may be divided into N different blocks or localities (which determine the
clusters) and a simple random sample of n blocks is drawn. The individuals in the
selected blocks determine the cluster sample.

• In using cluster sampling the following points should be borne in mind : (i) Clusters
should be as small as possible consistent with the cost and limitations of the survey,
and (ii) The number of sampling units in each cluster should be approximately same.

• Thus, cluster sampling is not to be recommended if we are sampling areas in the city
where there are private residential houses, business and industrial complexes, apartment
buildings, etc., with widely varying number of persons or households.
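The city-blocks example can be sketched as follows: a simple random sample of blocks is drawn and every household in each selected block is enumerated. The city of six blocks with five households each and the function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Select n_clusters clusters by SRS and enumerate every unit in each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical city divided into 6 blocks, each with a list of households
city = {f"Block {b}": [f"B{b}-H{h}" for h in range(1, 6)] for b in range(1, 7)}
sample = cluster_sample(city, n_clusters=2, seed=11)
for block, households in sample.items():
    print(block, households)
```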

• MULTISTAGE SAMPLING

• Instead of enumerating all the sampling units in the selected clusters, one can obtain
better and more efficient estimates by resorting to sub-sampling within the clusters.

• The technique is called two-stage sampling, clusters being termed as primary units and
the units within the clusters as secondary units. The above technique may be generalised
to what is called multistage sampling.

• Multistage sampling refers to a sampling technique which is carried out in various stages. Here, the population is regarded as made up of a number of primary units, each of which is further composed of a number of secondary stage units and so on, till we ultimately reach the desired sampling unit in which we are interested.

• For example, if we are interested in obtaining a sample of, say, n households from a
particular State, the first stage units may be districts, the second stage units may be
villages in the districts and third stage units will be households in the villages. Each
stage thus results in a reduction of the sample size. Multistage sampling consists in
sampling first stage units by some suitable method of sampling. From among the
selected first stage units, a sub-sample of secondary stage units is drawn by some
suitable method of sampling which may be same as or different from the method used
in selecting first stage units. Further stages may be added to arrive at a sample of the
desired sampling units.
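The three-stage household example can be sketched as below, assuming simple random sampling at every stage (the text allows different methods at different stages) and a hypothetical state of 4 districts, each with 5 villages of 20 households. The function name and all labels are illustrative.

```python
import random

def multistage_sample(state, n_districts, n_villages, n_households, seed=None):
    """Three-stage sample: districts -> villages -> households, with SRS at every stage."""
    rng = random.Random(seed)
    sample = []
    for district in rng.sample(sorted(state), n_districts):        # first-stage units
        villages = state[district]
        for village in rng.sample(sorted(villages), n_villages):   # second-stage units
            households = villages[village]
            sample.extend(rng.sample(households, n_households))    # third-stage units
    return sample

# Hypothetical state: 4 districts x 5 villages x 20 households
state = {
    f"D{d}": {f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(20)] for v in range(5)}
    for d in range(4)
}
sample = multistage_sample(state, n_districts=2, n_villages=2, n_households=3, seed=4)
print(len(sample), sample[:3])
```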

• QUOTA SAMPLING

• Quota sampling may be looked upon as a special form of stratified sampling.

• In this method, the investigator is told in advance the number of the sample units he is
to examine or enumerate from the stratum assigned to him. In the language of stratified
sampling, the quota of the units to be examined by the investigator from the stratum
assigned to him is fixed for each investigator.

• The sampling quotas may be fixed according to some specified characteristic such as
income group, sex, occupation, political or religious affiliations, etc.

• The choice of the particular units or individuals for investigation is left to the
investigators themselves.

• They are merely given the quotas with the specific instruction to inspect (interview) a
specified number of units (informants) from each stratum.

• Quite often the investigator does not make a random selection of the sample units. He
usually applies his judgment and discretion in the choice of the sample and tries to get
the desired information as quickly as possible. Moreover, in case of non-response from
some of the selected sample units (due to certain reasons like non-availability of the
respondent even after repeated calls by the investigator, or the inability or refusal of the
informant to furnish the requisite information), the investigator selects some fresh units
himself to complete his quota. In doing so, he is likely to include some purposive units
to get the desired information.
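To show why quota sampling is not random, the sketch below fills fixed quotas from respondents in the order in which the investigator happens to meet them. The quotas, the arrival order and the function name are hypothetical.

```python
def fill_quotas(quotas, arrivals):
    """Accept respondents in order of arrival until every quota is full.

    No random mechanism is involved, so who ends up in the sample depends on
    whom the investigator happens to meet first.
    """
    remaining = dict(quotas)
    accepted = []
    for person, group in arrivals:
        if remaining.get(group, 0) > 0:
            accepted.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break
    return accepted, remaining

quotas = {"male": 2, "female": 3}
arrivals = [("P1", "male"), ("P2", "male"), ("P3", "male"), ("P4", "female"),
            ("P5", "female"), ("P6", "male"), ("P7", "female"), ("P8", "female")]
accepted, remaining = fill_quotas(quotas, arrivals)
print(accepted)
```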

10.4 PRINCIPLES OF SAMPLING


• We discuss below some important laws which form the basis of sampling theory.

• Principle of Statistical Regularity

• The principle of statistical regularity stresses the following two points: (i) Large sample size. As the sample size increases, the sample is more likely to reveal the true characteristics of the population and thus to provide better estimates of the parameters. The standard error of a sample estimate varies inversely as the square root of the sample size n, so its reliability (precision) as an estimate of the population parameter is proportional to √n. However, because of limitations of time, money and manpower, it is not always possible to take very large samples. Moreover, the effort and cost of drawing large samples might outweigh the utility of a sample study compared with complete enumeration (census). (ii) Random selection. The sample should be selected at random from the population. By random selection we mean a selection in which each and every unit in the population has an equal chance of being selected in the sample.

• For example, if we are interested in studying the average height of the students of Delhi University, it is not desirable to resort to 100% enumeration of the students in the university. A fairly adequate sample of the students from each college may be selected at random, and the average height of the students selected in the samples may be computed.
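The link between sample size and reliability can be illustrated with the standard error of a sample mean, σ/√n, which holds for simple random sampling with replacement (or from a very large population). The value σ = 6 cm for student heights is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 6.0    # hypothetical population standard deviation of heights, in cm
for n in (25, 100, 400, 1600):
    se = standard_error(sigma, n)
    print(f"n = {n:5d}   SE = {se:.2f} cm   reliability 1/SE = {1 / se:.2f}")
```

Each fourfold increase in n only halves the standard error, which is one reason why the cost of very large samples can outweigh their benefit.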

• Principle of Inertia of Large Numbers

• An immediate deduction from the Principle of Statistical Regularity is the Principle of Inertia of Large Numbers, which states: “Other things being equal, as the sample size increases, the results tend to be more reliable and accurate”. This is based on the fact that the behaviour of a phenomenon en masse, i.e., on a large scale, is generally stable. By this we mean that if individual events are observed, their behaviour may be erratic and unpredictable, but when a large number of events are considered, they tend to behave in a stable pattern.

• For example, if a coin is tossed, say, 20 times, nothing can be said with certainty about the proportion of heads: we may get 0, 1, 2, … or even all 20 heads. But if it is tossed at random a very large number of times, say 5,000 times, we may expect approximately 50% heads and 50% tails.
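The coin-tossing example can be simulated with a pseudo-random generator standing in for a fair coin. The function name and seeds are hypothetical; the point is only that the proportion of heads settles down as the number of tosses grows.

```python
import random

def proportion_of_heads(tosses, seed=None):
    """Toss a fair coin `tosses` times and return the proportion of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for n in (20, 200, 5000, 100000):
    print(f"{n:6d} tosses: proportion of heads = {proportion_of_heads(n, seed=n):.3f}")
```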

• Principle of Persistence of Small Numbers

• If some of the items in a population possess characteristics markedly distinct from those of the remaining items, this tendency will be revealed in the sample values also. The tendency persists even if the population size is increased, and even in the case of large samples.

• For example, if the day's production of a manufacturing unit is increased, say, four times, the proportion of defectives in the lot remains more or less the same.

• Principle of Validity.

• A sampling design is termed as valid if it enables us to obtain valid tests and estimates
about the population parameters.

• Principle of Optimisation.

• This principle stresses the need to obtain optimum results, in terms of both efficiency and cost, from the sampling design with the resources available at our disposal. The efficiency (reliability) of an estimate of a population parameter is measured by the reciprocal of the standard error of the estimate, and the cost of the design is determined by the total expenses incurred in terms of money and manpower. This principle aims at: (i) obtaining a desired level of efficiency at minimum cost, and (ii) obtaining the maximum possible efficiency for a given level of cost.
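Both aims can be made concrete under two simplifying assumptions that are not in the text: the standard error is σ/√n, so efficiency (1/SE) equals √n/σ, and the total cost is a fixed overhead plus a constant cost per unit sampled. All figures and function names are hypothetical.

```python
import math

def min_cost_for_efficiency(target_eff, sigma, fixed_cost, cost_per_unit):
    """Aim (i): smallest n with sqrt(n)/sigma >= target_eff; returns (n, total cost)."""
    n = math.ceil((target_eff * sigma) ** 2)
    return n, fixed_cost + cost_per_unit * n

def max_efficiency_for_budget(budget, sigma, fixed_cost, cost_per_unit):
    """Aim (ii): largest n the budget allows under the linear cost model; returns (n, efficiency)."""
    n = int((budget - fixed_cost) // cost_per_unit)
    return n, math.sqrt(n) / sigma

# Hypothetical survey: sigma = 10, overhead 5,000, 50 per unit interviewed
print(min_cost_for_efficiency(target_eff=2, sigma=10, fixed_cost=5000, cost_per_unit=50))
print(max_efficiency_for_budget(budget=15000, sigma=10, fixed_cost=5000, cost_per_unit=50))
```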

10.5 LIMITATIONS OF SAMPLING


• The sampling procedure has its limitations and problems, which are enumerated below:

• (i) If a sample survey is not properly planned (or designed) and carefully executed, the results obtained will not be reliable and might quite often even be misleading. In this context, it is worth quoting the words of Frederick F. Stephan: “Samples are like medicines. They can be harmful when they are taken carelessly or without knowledge of their effects…. Every good sample should have a proper label with instructions about its use”. The sampling design must therefore be sound; otherwise it might lead to serious complications in the final results. The omission of a few units in a complete census may be immaterial, but non-response or incomplete response from even one or two units in a small sample might have a significant effect on the final result.

• (ii) An efficient sampling scheme requires the services of qualified, skilled and
experienced personnel, better supervision and more sophisticated equipment and
statistical techniques for the planning and execution of the survey and for the collection,
processing and analysis of the sample data. In the absence of these, the results of the
survey may not be reliable.

• (iii) Sometimes a sample survey might require more time, money and labour than a complete census. This will be so if the sample size is a large proportion of the population size and if a complicated weighting system is used.

• (iv) Sampling procedure cannot be used if we want to obtain information about each
and every unit of the population. Further, if the population is too heterogeneous, it may
be impossible to use a sampling procedure.

10.6 REFERENCES FOR FURTHER READINGS


Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

Levin R.I. (2002), Statistics for Management, Pearson Education Asia, New Delhi.

Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.

Selvaraj R. and Loganathan C., Quantitative Methods in Management.

Sharma J.K. (2009), Business Statistics, Pearson Education Asia, New Delhi.

Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons and Co. Private Limited, Bombay.

Walker H.M. and Lev J. (1965), Elementary Statistical Methods, Oxford & IBH Publishing Co., Calcutta.

Wine R.L. (1976), Beginning Statistics, Winthrop Publishers Inc., Massachusetts.



UNIT 11: REGRESSION

CONTENT:

▪ Objectives

Introduction

11.1. Regression Lines

11.2. Least Square Methods

11.3. Summary

11.4. Keywords

11.5. Review Questions

11.6. References for further readings

OBJECTIVES:
• Using one or more independent or control variables to explain the variability of the
dependent variable.

INTRODUCTION
Redman provides an example. Imagine you're a sales manager trying to predict next month's numbers. Hundreds of factors, from the weather to a competitor's advertising to rumours of a new model, might affect the number. People in your company may have their own hunches about what will boost sales: “Have faith. More rain, more sales.” “Sales increase six weeks after a competitor's promotion.”

Regression analysis determines which of these variables actually have an impact. It answers questions such as: Which factors matter most? Which can we ignore? How do the factors relate to one another? How certain are we about each of them? To perform a regression analysis, data must be collected on the relevant variables. (Reminder: you probably don't need to perform the analysis yourself, but it is helpful to understand the process your data analyst colleague employs.)

You gather all of your monthly sales figures for, say, the past three years, as well as data on the independent variables of interest; in this example, that means the average monthly rainfall over the same three years.

Then all of this information is plotted on a chart, with monthly sales (the dependent variable) on the vertical axis and rainfall (the independent variable) on the horizontal axis.

11.1. REGRESSION LINES


A regression line is a line used to characterise the behaviour of a data set; in other words, it describes the trend that best fits the data. This section explains regression lines and their significance.

Why are regression lines significant?

Regression lines are useful in forecasting. Their objective is to describe the relationship between the dependent variable (the y variable) and one or more independent variables (the x variables).

Using the equation of the regression line, one can predict the behaviour of the dependent variable for given values of the independent variables.

Figure 10.1

11.2. LEAST SQUARE METHODS


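Simple linear regression fits a straight line to a two-dimensional (bivariate) data set. Because the line is chosen so that the sum of the squared deviations of the observed values from it is as small as possible, it is also known as the least squares regression line.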

Let Y be the dependent variable and x the independent variable. The regression line for the population is

Y = a0 + a1x

where a0 represents the constant, a1 represents the regression coefficient, and x represents the value of the independent variable.

If you have a random sample of observations, you may estimate the population regression line by:

Y' = b0 + b1x, where b0 is the estimated constant and b1 is the estimated regression coefficient.

Here, x represents the value of the independent variable, while Y' represents the value predicted for the dependent variable.
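The least squares estimates can be computed directly from the standard formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1x̄. The rainfall and sales figures below are hypothetical, in the spirit of Redman's example, and are not data from the text; the function name is illustrative.

```python
def least_squares_line(x, y):
    """Return (b0, b1) minimising the sum of squared deviations of y from b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx             # estimated regression coefficient (slope)
    b0 = y_bar - b1 * x_bar      # estimated constant (intercept)
    return b0, b1

# Hypothetical data: monthly rainfall (cm) and units sold
rainfall = [2, 4, 5, 7, 8, 10]
sales = [120, 150, 165, 190, 210, 240]
b0, b1 = least_squares_line(rainfall, sales)
print(f"Y' = {b0:.2f} + {b1:.2f}x")
print("predicted sales for 6 cm of rain:", round(b0 + b1 * 6, 1))
```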

11.3. SUMMARY
Redman presents an illustration. Imagine you're a sales manager projecting the numbers for the following month. Hundreds of variables, including the weather, a competitor's advertising and rumours of a new model, could affect the number.

Your organisation might have its own notions about what will increase sales: “Have faith. More rain, more sales.” “Six weeks after a competitor's promotion, sales climb.”

Regression analysis identifies the influential variables. Which factors are most crucial? Which can we disregard? How are the factors related to one another? How confident are we about each of them? To do a regression analysis, it is important to collect data on the variables in question.

(Reminder: you likely don't need to carry out the analysis yourself, but it is helpful to understand the process your data analyst colleague uses.) You collect your monthly sales figures for, say, the past three years, as well as data on the factors of interest, such as the average monthly rainfall over the same three years.

11.4. KEYWORDS
Exponential trend: A general expression for an exponential trend is $Y = a \cdot b^{t}$, where a and b are constants.

Method of Least Squares: This is one of the most widely used techniques for fitting a mathematical trend. The fitted trend is deemed the best when the sum of the squares of the deviations of the data from it is minimised.

The Regression Line Y on X: The usual form of the regression line of Y on X is $Y_{Ci} = a + bX_i$, where $Y_{Ci}$ represents the mean, predicted or calculated value of Y for a given value $X = X_i$. This line has two constants, a and b; b is referred to as the regression coefficient of Y on X.

The Regression Line X on Y: The usual form of the line of regression of X on Y is $X_{Ci} = c + dY_i$, where $X_{Ci}$ represents the projected, calculated or estimated value of X for a given value $Y = Y_i$, and c and d are constants. d is referred to as the regression coefficient of X on Y.

Linear Trend: The equation for the linear trend is $Y_t = a + bt$, where t represents a time period such as a year, month or day, and a and b are constants.

Parabolic Trend: The mathematical form of a parabolic trend is $Y_t = a + bt + ct^2$ (or $Y = a + bt + ct^2$), where a, b and c are constants.

Regression Equation: If the coefficient of correlation calculated for bivariate data $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is sufficiently strong and a cause-and-effect relationship is assumed to exist between the variables, the next natural step is to determine a functional relationship between them. In statistics, this functional relationship is known as the regression equation.
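The two regression lines defined above can both be fitted by least squares from the same sums of squares and products. This sketch uses a small hypothetical data set and an illustrative function name; it also shows the standard result that the product of the two regression coefficients equals r², so the lines coincide only when r = ±1.

```python
import math

def regression_lines(x, y):
    """Return the lines Y on X as (a, b), X on Y as (c, d), and the correlation r."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx          # regression coefficient of Y on X
    d = s_xy / s_yy          # regression coefficient of X on Y
    a = y_bar - b * x_bar
    c = x_bar - d * y_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return (a, b), (c, d), r

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
(a, b), (c, d), r = regression_lines(x, y)
print(f"Y on X: Y = {a:.2f} + {b:.2f}X")
print(f"X on Y: X = {c:.2f} + {d:.2f}Y")
print(f"b * d = {b * d:.4f}, r^2 = {r ** 2:.4f}")   # the product of the coefficients equals r^2
```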

11.5. REVIEW QUESTIONS


1. Differentiate between correlation and regression. Discuss the least squares method of fitting a regression line.

2. What do you understand by linear regression? Why are there two regression lines? Under what circumstances can there be only one line?

3. Define the regressions of Y on X and of X on Y for a bivariate data set $(X_i, Y_i)$, $i = 1, 2, \ldots, n$. What values would the correlation coefficient have if the two regression lines (a) cross at right angles and (b) coincide?

4. Write a short note on the standard error of the estimate.

5. “The regression line provides merely a ‘best estimate’ of the variable under consideration. The degree of uncertainty of this estimate can be determined by calculating its standard error.” Explain.

6. What is the method of least squares? Show that the two lines of regression obtained by this method are not reversible unless r = ±1. Explain.

11.6. REFERENCES FOR FURTHER READINGS


Gupta S.P. (2008), Statistical Method, Sultan Chand and Sons, New Delhi.

Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.

Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.

Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.

UNIT 12: INTRODUCTION TO CORRELATION AND COEFFICIENT

CONTENT:
▪ Objectives
Introduction
12.1. Correlation
12.1.1. Definition of correlation
12.1.2. Scope of correlation analysis
12.1.3. Scatter Diagram
12.1.4. Methods of correlation
12.2. Summary
12.3. Keywords
12.4. Review Questions
12.5. References for further reading

OBJECTIVES:
• The meaning of correlation.

• The uses of correlation analysis.



INTRODUCTION
Up to this point, we have only discussed distributions relating to a single attribute. Such distributions are called univariate distributions. When each unit of interest is observed simultaneously with reference to two attributes, the resulting distribution is called a bivariate distribution.

Consider, for instance, a study that records both the heights and the weights of college students. From such data we can determine the mean, variance, skewness and other statistics for each attribute separately.

In addition, when we study a bivariate distribution, we are interested in whether there is a relationship between the two characteristics, that is, the degree to which the two corresponding variables tend to move together in the same or in opposite directions; in other words, the degree to which they are associated.

An understanding of such a relationship is useful for predicting the value of one variable from a known value of the other. It is also useful for understanding and analysing a wide range of economic and commercial issues. It should be emphasised at this point that statistical relations are not the same as exact mathematical relations. If we are given a statistical relation between two variables X and Y, such as Y = a + bX, then the value of Y it gives for a particular value of X is only the value we would expect to find on average. The investigation of the relationships between two or more variables falls into two major classes:

1. To ascertain whether the variables are related at all and, if so, to measure the degree of correlation between them.

2. Given that the variables are correlated, to find the most appropriate form of the relationship between them.

The first class is the subject of 'Correlation', discussed in this unit, while the second pertains to 'Regression'.

12.1. CORRELATION
Different authors have defined correlation in different ways, but their definitions broadly agree that it is the degree of relationship between two or more variables.

12.1.1. DEFINITION OF CORRELATION


Below are some important definitions of correlation:

“If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other(s), then they are said to be correlated.” (L.R. Connor)

“Correlation is an analysis of the covariation between two or more variables.” (A.M. Tuttle)

“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.” (Croxton and Cowden)

“Correlation analysis attempts to determine the ‘degree of relationship’ between variables.” (Ya Lun Chou)

The correlation coefficient quantifies the degree of relationship between two or more variables.

• Correlation means a relationship between two groups (series) of data.

• In statistics, it is a measure of the relationship between two variables such that, as the values of one variable change, the values of the other variable also change. The two variables may relate to the same items, or they may belong to different series that depend on each other for some reason.

• For example, data on the heights and weights of a group of people relate to each member of the group. The prices of sugar and of sugarcane, on the other hand, are two different series altogether, yet there is some relation between their values, since the price of sugar depends on the price of sugarcane.

• This technique puts a tool into the hands of decision-makers, because it gives a better understanding of trends and of their dependence on other factors, so that the range of uncertainty associated with decision-making is reduced.

• Definition of Correlation:

• The term correlation indicates the relationship between two variables in which with
changes in the value of one variable, the values of the other variable also change.
Correlation has been defined by various eminent statisticians, mathematicians and
economists.

• Some of the important definitions of correlation are given below: (1) According to Ya Lun Chou, “Correlation analysis attempts to determine the degree of relationship between variables.”

• (2) As per W. I. King, “Correlation means that between two series or groups of data there exists some causal connection. ....... If it is proved true that in a large number of instances two variables tend always to fluctuate in the same or in opposite directions, we consider that the fact is established and that a relationship exists. This relationship is called correlation.”

• (3) In the words of L. R. Connor, “If two or more quantities vary in sympathy so that movements in the one tend to be accompanied by corresponding movements in the other(s), then they are said to be correlated.”

Types of Correlation

• Correlation can be classified as given ahead:

• (1) Positive and negative correlation: When the values of the two variables move in the same direction, i.e., an increase in one is associated with an increase in the other, or a decrease with a decrease, the correlation is said to be positive. If the values of the two variables move in opposite directions, i.e., an increase in the value of one variable is associated with a fall in the other, or vice versa, the correlation is said to be negative. For example, price and supply are positively correlated, but price and demand are negatively correlated.

• (2) Linear and non-linear correlation: If a unit change in the value of one variable is accompanied by a constant change in the value of the other variable, the correlation between them is said to be linear; that is, the relation between the variables fits Y = a + bX. When a unit change in one variable is not accompanied by a constant change in the other, the correlation is said to be non-linear or curvilinear.

• (3) Simple, multiple and partial correlation: When the relation between two variables is studied, it is simple correlation. When three or more variables are studied together to find their relationships, it is called multiple correlation. In partial correlation, three or more variables are involved, but the correlation between only two of them is studied, the other variables being held constant.
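Positive, negative and non-linear relationships can be contrasted with Karl Pearson's product-moment coefficient, the usual numerical measure of linear correlation. This is a sketch on small hypothetical data; the function name is illustrative. The last case shows that an exact but non-linear relation (Y = X² on values symmetric about zero) can give a coefficient of zero, because the coefficient measures only linear association.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3, 5, 7, 9, 11]))      # positive linear relation: Y = 1 + 2X
print(pearson_r(x, [10, 8, 6, 4, 2]))      # negative linear relation: Y = 12 - 2X
xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [v * v for v in xs]))  # exact but non-linear relation: Y = X^2
```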

Applications of Correlation

• Some of the important areas where correlation has been used successfully are:

• (1) In the field of genetics: Galton and Pearson developed a method of assessing correlation that was used in studying many problems of biology and genetics.

• (2) In the field of management: Management is basically about making decisions. The correlation technique puts a strong tool into the hands of the manager that reduces the range of uncertainty associated with decision-making. It also helps in identifying the stabilising factors in a disturbed economic situation.

• (3) Other fields of social science: Correlation helps in determining the interrelationships between different variables, and in this way it is very helpful in promoting research and opening new frontiers of knowledge.

Conceptual Summary

• Correlation measures the degree of relationship between two or more variables, but it does not indicate any cause-and-effect relationship between them. If a high degree of correlation is found to exist between two variables, there may be a reason for such a close relationship, but a cause-and-effect relation can be established only when other knowledge of the factors involved is brought to bear on the situation. This means that, to establish a ‘functional relationship’ between two or more variables, one has to go beyond the confines of statistical analysis to other factors. (A functional relationship means that two or more factors are interdependent.)

12.1.2. SCOPE OF CORRELATION ANALYSIS


If there is a correlation between two variables, it could be the result of any of the following:

1. One of the variables may influence the other: Calculating the correlation coefficient between the quantity of tea demanded and its price would only disclose that the degree of association between them is extremely high. It would not indicate whether the price of tea influences its demand or vice versa. To determine this, information beyond the correlation study is required. For instance, if further information suggests that the price of tea affects its demand, then price is the cause and quantity demanded is the effect. The causal variable is also called the independent variable, and the variable it affects is called the dependent variable.

2. The two variables may influence each other: In this instance, there is a cause-and-effect
relationship, but it may be difficult to determine which of the two variables is
independent. For instance, if we have data on the price of wheat and its cost of
production, the correlation between them may be very high, because a higher price of wheat
may encourage farmers to produce more wheat, and an increase in wheat production
may result in a higher cost of production, assuming it is an industry with rising costs.
The higher production costs may, in turn, increase the price of wheat. In such cases, we
can select either of the two variables as the independent variable for the purpose of
studying the relationship between them.

3. Both variables may be influenced by a third, external factor: In this scenario, there may
be a strong correlation between the two variables, but no direct cause-and-effect
relationship exists between them. For instance, rising incomes of customers may produce
a positive correlation between the demand for commodities X and Y, even though neither
demand affects the other. Such a correlation is referred to as a spurious or nonsense
correlation.

4. The correlation may be due to pure chance: This is another example of spurious
correlation. Given data on any two variables, it is possible to obtain a high correlation
coefficient while, in reality, the variables have no link. For instance, a high correlation
coefficient may be obtained between the size of people's shoes and their income.

12.1.3. SCATTER DIAGRAM


● The main properties of the correlation coefficient are as follows.

● The coefficient of correlation lies between −1 and +1; it cannot be less than −1 or more
than +1. Symbolically, $-1 \le r \le +1$, i.e., $|r| \le 1$.

● The correlation coefficient is independent of change of origin: adding or subtracting
any constant to or from every value of X (or of Y) has no effect on the correlation
coefficient.

● The correlation coefficient is symmetrical: $r_{xy} = r_{yx}$, i.e., the correlation of X with Y equals the correlation of Y with X.

● The correlation coefficient is independent of change of scale: multiplying or dividing
every value of X (or of Y) by a positive constant has no effect on the coefficient of
correlation. (Multiplying one variable by a negative constant reverses the sign of r.)

● The correlation coefficient measures only the linear relationship between X and Y. If
X and Y are independent, the correlation coefficient between them will be 0; the converse
need not hold, because r = 0 rules out only a linear relationship.

● The simplest device for determining relationship between two variables is a special type
of dot chart called scatter diagram. When this method is used the given data are plotted
on a graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot
and thus obtain as many points as the number of observations.

● The more the plotted points “scatter” over a chart, the less relationship there is between
the two variables.

● Given the following pairs of values of the variables X and Y:

X: 2 3 7 6 8 9
Y: 6 5 5 8 12 11

(a) Make a scatter diagram.
(b) Do you think that there is any correlation between the variables X and Y? Is it
positive or negative? Is it high or low?
(c) By graphic inspection, draw an estimating line.
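The diagram is normally drawn by hand on graph paper. As a rough, dependency-free sketch (the helper name `text_scatter` is hypothetical, chosen here for illustration, and it assumes whole-number data), the Python code below marks each (X, Y) pair from the exercise with a `*` on a character grid, with Y increasing upwards as on graph paper:

```python
def text_scatter(xs, ys):
    """Return the rows of a character-grid scatter diagram.

    Column = X value minus the smallest X; the first row holds the largest Y,
    so Y increases upwards. Assumes integer data; pairs with identical
    coordinates collapse into a single '*'.
    """
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of observations")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    width = x_max - x_min + 1
    height = y_max - y_min + 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        row = y_max - y   # top row corresponds to the largest Y
        col = x - x_min   # leftmost column corresponds to the smallest X
        grid[row][col] = "*"
    # Prefix each row with its Y value so the vertical axis can be read off.
    return [f"{y_max - i:>3} |" + "".join(cells) for i, cells in enumerate(grid)]


X = [2, 3, 7, 6, 8, 9]
Y = [6, 5, 5, 8, 12, 11]
for line in text_scatter(X, Y):
    print(line)
```

Reading the grid from left to right shows whether the dots tend to rise or fall, which is the judgement asked for in (b). A plotting library would give a proper graph; a character grid is used only so that the sketch runs with the standard library alone.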

• Merits and Limitations of the Method

Merits:

• It is a simple and non-mathematical method of studying correlation between the
variables. As such, it can be easily understood, and a rough idea can very quickly be
formed as to whether or not the variables are related.

• It is not influenced by the size of extreme items whereas most of the mathematical
methods of finding correlation are influenced by extreme items.

• Making a scatter diagram usually is the first step in investigating the relationship
between two variables.

Limitations:

• By applying this method we can get an idea about the direction of correlation and also
whether it is high or low. But we cannot establish the exact degree of correlation
between the variables as is possible by applying the mathematical methods.

12.1.4. METHODS OF CORRELATION


Methods for analysing the relationship between two variables include:

1. The scatter diagram technique: It is a graphical method of determining the link
between two variables. The x-axis represents the independent variable, while the y-axis
represents the dependent variable, and each pair of x and y values is plotted as a point.
If y tends to increase as x increases, there is a positive correlation; if y tends to
decrease as x increases, there is a negative correlation.

Merits-

1. It is straightforward and simple to use and comprehend.

2. The relationship between two variables can be explored without the use of mathematics.

Demerits-

1. It is not a mathematical procedure; hence the outcomes are neither exact nor precise.

2. It just provides an approximation of the relationship.



2. Graphic technique

This is an extension of line graphs. The values of the two (or more) variables are plotted
as curves on the same graph paper, usually against time. If the curves move in the same
direction, the correlation is positive; if they move in opposite directions, it is
negative. If there is no clear pattern, there is no association. Although it is a
straightforward procedure, it provides only an approximate assessment of the nature of
the relationship.

Merits

1. It is straightforward and simple to use.

2. The relationship between two variables can be explored without the use of mathematics.

Demerits

1. It is not a mathematical procedure; hence the outcomes are neither exact nor precise.

2. It just provides an approximation of the relationship.

Karl Pearson's correlation coefficient

Karl Pearson's Correlation Coefficient: The correlation coefficient is the most prominent
mathematical approach for calculating correlation. The basis of its calculation is the arithmetic
mean and standard deviation. Correlation coefficient (r), commonly known as the linear
correlation coefficient, quantifies the strength and direction of a linear link between two
variables. The value of r ranges from -1 to 1.

1. r is a pure number: it has no unit, irrespective of the units in which the variables are measured.

2. r is independent of change of origin and scale.

3. If two variables are independent of one another, then r equals zero.

The correlation coefficient quantifies the degree of association between two variables.

1. It measures direction as well.

2. It can be used to calculate the regression coefficient if the standard deviations of the
two variables are known.
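Point 2 relies on the standard relations between r and the two regression coefficients, written here with $b_{yx}$ for the regression of Y on X and $b_{xy}$ for the regression of X on Y (this notation is assumed for the note, not introduced elsewhere in this unit):

```latex
b_{yx} = r\,\frac{\sigma_y}{\sigma_x}, \qquad
b_{xy} = r\,\frac{\sigma_x}{\sigma_y}, \qquad
r = \pm\sqrt{b_{yx}\, b_{xy}}
```

In the last relation, r takes the common sign of the two regression coefficients.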

The Karl Pearson’s method, popularly known as Pearsonian coefficient of correlation, is most
widely used in practice.

The Pearsonian coefficient of correlation is denoted by the symbol r.

It is one of the very few symbols that is used universally for describing the degree of
correlation between two series.

The formula for computing Pearsonian r is:

$$r = \frac{\sum dx\,dy}{N\,\sigma_x\,\sigma_y}$$

which can also be written as

$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$

where

$dx = X - \bar{X}$ (deviation of X from its mean)

$dy = Y - \bar{Y}$ (deviation of Y from its mean)

$\sigma_x$ = standard deviation of series X

$\sigma_y$ = standard deviation of series Y

N = number of paired observations

• This method is to be applied only when the deviations of items are taken from actual
means and not from assumed means.

• The value of the coefficient of correlation as obtained by the above formula shall
always lie between ± 1.

• When r = + 1, it means there is perfect positive correlation between the variables.

• When r = – 1, it means there is perfect negative correlation between the variables.


• When r = 0, there is no linear relationship between the two variables.

• The coefficient of correlation describes not only the magnitude of correlation but also
its direction.

• Steps

• (i) Take the deviations of the X series from the mean of X and denote them by dx.

• (ii) Square these deviations and obtain the total, i.e., ∑dx².

• (iii) Take the deviations of the Y series from the mean of Y and denote them by dy.

• (iv) Square these deviations and obtain the total, i.e., ∑dy².

• (v) Multiply each pair of deviations dx and dy and obtain the total, i.e., ∑dxdy.

• (vi) Substitute the values of ∑dxdy, ∑dx² and ∑dy² in the above formula.
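Steps (i) to (vi) can be sketched in Python as follows. This is a minimal illustration of the actual-mean method rather than a production routine; the function name `pearson_r` is an assumption made for the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r, with deviations taken from the actual means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]                   # step (i)
    sum_dx2 = sum(d * d for d in dx)                # step (ii)
    dy = [y - mean_y for y in ys]                   # step (iii)
    sum_dy2 = sum(d * d for d in dy)                # step (iv)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # step (v)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("r is undefined when a series has no variation")
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)       # step (vi)


# Usage on the first practice exercise below (value left for the reader to check by hand)
X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]
print(round(pearson_r(X, Y), 3))
```

Because the function works from deviations about the means, shifting or positively rescaling either series leaves the result unchanged, which matches the properties of r listed earlier.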

• Examples for Practice

• Calculate Karl Pearson’s coefficient of correlation from the following data:

• X: 6 8 12 15 18 20 24 28 31

• Y: 10 12 15 15 18 25 22 26 28

• Find Karl Pearson’s coefficient of correlation for the following:

• Cost (Rs.) 39 65 62 90 82 75 25 98 36 78

• Sales (Rs.) 47 53 58 86 62 68 60 91 51 84

• Making use of the data summarised below, calculate the coefficient of correlation, r12:

Case: A B C D E F G H
X1: 10 6 9 10 12 13 11 9
X2: 9 4 6 9 11 13 8 4

• Find the coefficient of correlation between X and Y, given:

• Number of paired items = 15

• Arithmetic means of X and Y = 25 and 18

• Sums of squares of deviations from the means = 136 (X) and 138 (Y); sum of the products of the deviations of X and Y from their means = 122

• We have ∑dxdy = 122, ∑dx² = 136 and ∑dy² = 138, so

• $r = \frac{122}{\sqrt{136 \times 138}} = \frac{122}{137.0} = +0.89$

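• Find the coefficient of correlation for the following:

• X: 78 89 96 69 59 79 68 61

• Y: 125 137 156 112 107 136 123 108

• Take 69 and 112 as the assumed means of X and Y respectively.

• N = 8, ∑dx = 47, ∑dy = 108, ∑dx² = 1475, ∑dy² = 3468, ∑dxdy = 2116

• When deviations are taken from assumed means, the formula is

$$r = \frac{\sum dx\,dy - \frac{\sum dx \sum dy}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}}\;\sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}}$$

• Substituting, $r = \frac{2116 - 634.5}{\sqrt{1198.875}\,\sqrt{2010}} = \frac{1481.5}{1552.3} = 0.954$ by the assumed mean method.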

• We have: total of the products of the deviations of X and Y = 3044

• Number of pairs of observations = 10

• Total of deviations of X = −170

• Total of deviations of Y = −20

• Total of squares of deviations of X = 8288

• Total of squares of deviations of Y = 2264

• If the assumed means of X and Y are 82 and 68 respectively, find the coefficient of correlation.

• We have ∑dxdy = 3044, ∑dx = −170, ∑dy = −20, ∑dx² = 8288, ∑dy² = 2264, N = 10.

• Applying the assumed-mean formula, $r = \frac{3044 - 340}{\sqrt{8288 - 2890}\,\sqrt{2264 - 40}} = \frac{2704}{\sqrt{5398}\,\sqrt{2224}} = +0.78$
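The assumed-mean (short-cut) calculation can be checked with the short Python sketch below. The function name `pearson_r_assumed_mean` is hypothetical; it takes the six totals used in the worked examples and subtracts the correction terms ∑dx∑dy/N, (∑dx)²/N and (∑dy)²/N. When deviations are measured from the actual means, ∑dx = ∑dy = 0 and it reduces to the basic formula, so it also reproduces the earlier answer of 0.89.

```python
from math import sqrt


def pearson_r_assumed_mean(n, sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy):
    """Karl Pearson's r from deviations taken about assumed means.

    The correction terms remove the error introduced by not measuring
    the deviations from the actual means.
    """
    numerator = sum_dxdy - sum_dx * sum_dy / n
    ss_x = sum_dx2 - sum_dx ** 2 / n
    ss_y = sum_dy2 - sum_dy ** 2 / n
    return numerator / sqrt(ss_x * ss_y)


# Assumed means 69 and 112, N = 8
print(round(pearson_r_assumed_mean(8, 47, 108, 1475, 3468, 2116), 3))     # 0.954
# Assumed means 82 and 68, N = 10
print(round(pearson_r_assumed_mean(10, -170, -20, 8288, 2264, 3044), 2))  # 0.78
# Deviations from actual means (sum_dx = sum_dy = 0), N = 15
print(round(pearson_r_assumed_mean(15, 0, 0, 136, 138, 122), 2))          # 0.89
```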

Spearman’s Rank Coefficient of Correlation

This method is used to measure correlation between qualitative characteristics. Such
qualities as beauty, honesty and ability cannot be measured quantitatively, so the items
are ranked and the ranks are used to calculate the correlation coefficient.

• Spearman's rank correlation coefficient: $\rho = 1 - \frac{6\sum d_i^2}{n(n+1)(n-1)}$

• Here,

• n = number of pairs of observations of the two variables

• di = difference between the two ranks of the ith element

• The Spearman coefficient ρ can take any value from −1 to +1, where:

• A ρ value of +1 means a perfect positive association of ranks

• A ρ value of 0 means no association of ranks

• A ρ value of −1 means a perfect negative association between ranks

• The closer ρ is to 0, the weaker the association between the two rankings.

• The scores of 9 students in History and Geography are given below.

• Step 1- Create a table of the data obtained.

• Step 2- Rank each of the two data sets separately. Ranking can be done by assigning
rank "1" to the biggest number in the column, "2" to the second biggest number
and so forth, so that the smallest value gets rank n. This should be done for both sets
of measurements.

• Step 3- Add a column d to your data set, where d denotes the difference between the
two ranks of each student.

• For example, if the first student's History rank is 3 and Geography rank is 5, then the
difference in rank is d = 3 − 5 = −2. In the next column, square your d values (here d² = 4).

• History scores are 35, 23, 47, 17, 10, 43, 9, 6, 28

• whereas Geography scores are 30, 33, 45, 23, 8, 49, 12, 4, 31

• Ranking each set (rank 1 = highest score), the History ranks are 3, 5, 1, 6, 7, 2, 8, 9, 4 and the Geography ranks are 5, 3, 2, 6, 8, 1, 7, 9, 4.

• Step 4- Add up all your d² values: ∑d² = 12.

• Step 5- Insert these values in the formula:

• $\rho = 1 - \frac{6(12)}{9 \times 10 \times 8} = 1 - \frac{72}{720} = 1 - 0.1 = 0.9$

• The Spearman's rank correlation for this data is 0.9. Since this value is close to +1, the two rankings show a strong positive association.
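Steps 1 to 5 can be sketched in Python for data without ties. The helper names `ranks_descending` and `spearman_rho` are hypothetical, and the sketch refuses tied values because this simple formula assumes every rank is distinct:

```python
def ranks_descending(values):
    """Rank 1 = largest value. Assumes all values are distinct (no ties)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: use the formula for repeated ranks")
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman_rho(xs, ys):
    """Return (rho, sum of d squared) with rho = 1 - 6*sum(d^2)/(n(n+1)(n-1))."""
    n = len(xs)
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n + 1) * (n - 1)), sum_d2


history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
geography = [30, 33, 45, 23, 8, 49, 12, 4, 31]
rho, sum_d2 = spearman_rho(history, geography)
print(sum_d2, round(rho, 2))   # 12 0.9

english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]
rho, sum_d2 = spearman_rho(english, maths)
print(sum_d2, round(rho, 2))   # 54 0.67
```

The second call uses the English and Maths scores from the next example, so both worked answers can be checked with the same function.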

• An example of calculating Spearman's correlation

• To calculate a Spearman rank-order correlation on data without any ties, we will use the
following data:

• English scores are: 56, 75, 45, 71, 62, 64, 58, 80, 76, 61

• Maths scores are: 66, 70, 40, 60, 65, 56, 59, 77, 67, 63

• Ranking both sets and taking d = difference between ranks and d² = difference squared, we get ∑d² = 54.

• With n = 10, $\rho = 1 - \frac{6(54)}{10 \times 11 \times 9} = 1 - 0.33 = 0.67$

• Hence, ρ (or rs) is 0.67. This indicates a strong positive relationship between the ranks
individuals obtained in the Maths and English exams: students who ranked higher in Maths
tended to rank higher in English also, and vice versa.

• Spearman's Rank Correlation Formula for Repeated Ranks

• Spearman's correlation coefficient for tied ranks can be calculated using the formula

$$\rho = 1 - \frac{6\left[\sum d^2 + \sum \frac{m(m^2-1)}{12}\right]}{n(n+1)(n-1)}$$

• where m1, m2, … are the numbers of times the respective ranks are repeated, and m(m² − 1)/12 is the correction factor added for each group of repeated ranks.

• Like the ordinary Spearman's rank correlation coefficient, the tied-rank coefficient takes values only between +1 and −1, both included. +1 denotes a perfect positive correlation, −1 denotes a perfect negative correlation, and 0 indicates no correlation.

• The following table gives the marks obtained by 8 students in Commerce
and Mathematics. Compute the rank correlation coefficient.

• Marks in Commerce 15 20 28 12 40 60 20 80

• Marks in Mathematics 40 30 50 30 20 10 30 60

• Ranking each set in ascending order (rank 1 = lowest mark), we get ∑d² = 81.50.

• In Commerce (X), 20 is repeated two times, corresponding to ranks 3 and 4.
Therefore, 3.5 is assigned to both, with m1 = 2.
In Mathematics (Y), 30 is repeated three times, corresponding to
ranks 3, 4 and 5. Therefore, 4 is assigned to all three, with m2 = 3.

• The correction factors are 2(2² − 1)/12 = 0.5 and 3(3² − 1)/12 = 2, so

• $\rho = 1 - \frac{6(81.5 + 0.5 + 2)}{8 \times 9 \times 7} = 1 - \frac{504}{504} = 0$

• Hence, we can conclude that the marks in Commerce and Mathematics are not
correlated at all.
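The procedure for repeated ranks can be sketched in Python as follows, assuming ranks are assigned in ascending order as in the example above (the direction does not change ρ, provided both variables are ranked the same way). Tied values receive the mean of the ranks they occupy, and m(m² − 1)/12 is added for each group of m repeated values. The function names are illustrative only:

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 = smallest value; tied values share the mean of their ranks."""
    order = sorted(values)
    rank_of = {}
    for v in set(values):
        first = order.index(v) + 1            # first rank position held by v
        count = order.count(v)                # number of times v is repeated
        rank_of[v] = first + (count - 1) / 2  # mean of ranks first .. first+count-1
    return [rank_of[v] for v in values]


def correction_factor(values):
    """Sum of m(m^2 - 1)/12 over every group of m repeated values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_tied(xs, ys):
    """Return (rho, sum of d squared, total correction factor)."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = correction_factor(xs) + correction_factor(ys)
    rho = 1 - 6 * (sum_d2 + cf) / (n * (n + 1) * (n - 1))
    return rho, sum_d2, cf


commerce = [15, 20, 28, 12, 40, 60, 20, 80]
mathematics = [40, 30, 50, 30, 20, 10, 30, 60]
rho, sum_d2, cf = spearman_rho_tied(commerce, mathematics)
print(sum_d2, cf, round(rho, 3))   # 81.5 2.5 0.0
```

The same function can be applied to the later exercise with tied ranks (X: 80 78 75 75 68 67 60 59, Y: 12 13 14 14 14 16 15 17) to check its stated answer.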

• In a beauty contest, three judges accorded following ranks to 10 participants:

• Judge I 1 6 5 10 3 2 4 9 7 8

• Judge II 3 5 8 4 7 10 2 1 6 9

• Judge III 6 4 9 8 1 2 3 10 5 7

• Find out by Spearman's Rank Difference Method which pair of judges has a common
taste in respect of beauty.

• r12 = −0.212, r13 = +0.636 and r23 = −0.297

• Observation and Conclusion:

As the rank correlation coefficient between Judge I and Judge III is the highest and is
positive, it can be regarded that they have a common taste in respect of beauty.
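With three or more judges, every pair of judges has to be compared. A minimal Python sketch (the names `rho_from_ranks`, `judges` and `pairs` are illustrative) that computes ρ for each pair directly from the given ranks and picks the largest:

```python
from itertools import combinations


def rho_from_ranks(r1, r2):
    """Spearman's rho when the data are already untied ranks."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n + 1) * (n - 1))


judges = {
    "I": [1, 6, 5, 10, 3, 2, 4, 9, 7, 8],
    "II": [3, 5, 8, 4, 7, 10, 2, 1, 6, 9],
    "III": [6, 4, 9, 8, 1, 2, 3, 10, 5, 7],
}
# One coefficient for every pair of judges: (I, II), (I, III), (II, III).
pairs = {(a, b): rho_from_ranks(judges[a], judges[b])
         for a, b in combinations(judges, 2)}
for (a, b), rho in pairs.items():
    print(f"Judges {a} and {b}: rho = {rho:+.3f}")
# Judges I and II: rho = -0.212
# Judges I and III: rho = +0.636
# Judges II and III: rho = -0.297
best = max(pairs, key=pairs.get)
print("Most similar taste:", best)   # Most similar taste: ('I', 'III')
```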

• Calculate Spearman’s coefficient of rank correlation for the following data:

• X : 53 98 95 81 75 61 59 55

• Y : 47 25 32 37 30 40 39 45

• Cumulative Di square is 160

• Answer is -0.905 (ranks not tied)

• Calculate Spearman’s coefficient of rank correlation for the following data:

• X : 80 78 75 75 68 67 60 59

• Y : 12 13 14 14 14 16 15 17

• Cumulative Di square is 159.5

• Answer is -0.929 (ranks tied)

• Five competitors in a beauty contest are ranked by three judges in the following order:

• Rank by Judge A 1 2 3 4 5

• Rank by Judge B 2 4 1 5 3

• Rank by Judge C 1 3 5 2 4

• Using the rank correlation coefficient, determine which pair of judges has the nearest
approach to common tastes in beauty.

• For judges A and B, ∑d² = 14, so rAB = 0.30

• For judges A and C, ∑d² = 10, so rAC = 0.50

• For judges B and C, ∑d² = 28, so rBC = −0.40

• Since the coefficient of rank correlation is positive and highest for judges A and C, we
conclude that they have the most similar tastes in beauty. Judges B and C have very
different tastes.

• If N = 10 and ∑D² = 66, find Spearman's rank correlation coefficient. (Answer: 0.60)

• The rank correlation coefficient between the marks obtained by some students in Statistics
and Accountancy is 0.8, and the total of the squares of the rank differences is 33. Find
the number of students.

• $0.8 = 1 - \frac{6(33)}{N(N+1)(N-1)}$

• $\frac{2}{10} = \frac{198}{N(N+1)(N-1)}$

• $N(N+1)(N-1) = 990$

• $(10)(11)(9) = 990$, i.e., $(10)(10+1)(10-1) = N(N+1)(N-1)$

• N = 10
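Because N(N+1)(N−1) is a cubic in N, the simplest check is to try whole numbers until the product matches 6∑D²/(1 − ρ). The helper `solve_n` below is a hypothetical sketch of that search, not a standard routine:

```python
def solve_n(rho, sum_d2, n_max=1000):
    """Whole number N with 1 - 6*sum_d2 / (N(N+1)(N-1)) equal to rho.

    Rearranging gives N(N+1)(N-1) = 6*sum_d2 / (1 - rho); rho must be below 1.
    Returns None if no whole number of observations fits the data.
    """
    target = 6 * sum_d2 / (1 - rho)
    for n in range(2, n_max + 1):
        product = n * (n + 1) * (n - 1)
        if abs(product - target) < 1e-6 * target:  # tolerance for float rounding
            return n
        if product > target:  # the product only grows with n, so stop early
            break
    return None


print(solve_n(0.8, 33))   # 10
```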

• From the following data, find Spearman's rank correlation coefficient:

• Serial numbers 1 2 3 4 5 6 7 8 9 10

• Rank difference -2 -4 -1 +3 +2 0 ? +3 +3 -2

• Since the rank differences must add up to zero, the missing difference is −2. Then
∑D² = 60, and the Spearman rank coefficient is $1 - \frac{6(60)}{10 \times 11 \times 9} = 0.636$ (Answer).
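The missing value can be recovered because both variables are ranked 1 to n, so the two rank totals are equal and the rank differences always add up to zero. A small Python sketch (the helper name `fill_missing_difference` is hypothetical, and it assumes exactly one missing entry, marked `None`) fills the gap and evaluates ρ:

```python
def fill_missing_difference(diffs):
    """Replace the single None in a list of rank differences.

    The differences must sum to zero, so the missing one is minus the
    sum of the others. Assumes exactly one entry is None.
    """
    missing = -sum(d for d in diffs if d is not None)
    return [missing if d is None else d for d in diffs]


d = fill_missing_difference([-2, -4, -1, 3, 2, 0, None, 3, 3, -2])
n = len(d)
sum_d2 = sum(x * x for x in d)
rho = 1 - 6 * sum_d2 / (n * (n + 1) * (n - 1))
print(d[6], sum_d2, round(rho, 3))   # -2 60 0.636
```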

Merits

• This method is simpler to understand and easier to apply compared to Karl
Pearson's method. The answer obtained by this method is the same as that obtained by
applying Karl Pearson's method to the ranks, provided no value is repeated, i.e., all the
items are different.

• Where the data is of a qualitative nature like honesty, efficiency, intelligence, etc., this
method can be used with great advantage. For example, the workers of two factories
can be ranked in order of efficiency and degree of correlation established by applying
this method.

• This is the only method that can be used where we are given the ranks and not the actual
data.

• Even where actual data are given, the rank method can be applied for ascertaining the
degree of correlation.
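The equivalence mentioned in the first merit can be verified numerically: with no ties, Spearman's ρ equals Karl Pearson's r computed on the two sets of ranks, not on the raw scores. This sketch reuses the History and Geography example; the helper names `pearson` and `ranks` are illustrative:

```python
from math import sqrt


def pearson(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank 1 = largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


history = [35, 23, 47, 17, 10, 43, 9, 6, 28]
geography = [30, 33, 45, 23, 8, 49, 12, 4, 31]
rx, ry = ranks(history), ranks(geography)
n = len(rx)
spearman = 1 - 6 * sum((a - b) ** 2 for a, b in zip(rx, ry)) / (n * (n * n - 1))

print(round(pearson(rx, ry), 6), round(spearman, 6))  # 0.9 0.9
print(round(pearson(history, geography), 3))         # r on the raw scores, for comparison
```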

Limitations

• This method cannot be used for finding out correlation in a grouped frequency
distribution.

• Where the number of items exceeds 30 the calculations become quite tedious and
require a lot of time. Therefore, this method should not be applied where N is exceeding
30 unless we are given the ranks and not actual values of the variable.

• Since only ranks are used, information about the actual size of the differences between
values is lost. (Unlike Karl Pearson's r, however, the rank method does not assume a
linear relationship between the variables and is less influenced by extreme values.)

When to use Rank Correlation Coefficient

• The rank method has two principal uses:

• (1) The initial data are in the form of ranks.

• (2) If N is fairly small (say, not larger than 25 or 30), the rank method is sometimes applied
to interval data as an approximation to the more time-consuming r. This requires that
the interval data be converted to rank orders for both variables. If N is much in excess
of 30, the labour required in ranking the scores becomes greater than is justified by the
anticipated saving of time through the rank formula.

12.2. SUMMARY
So far we have discussed distributions of only one characteristic; these are called
univariate distributions. When each unit is observed with respect to two characteristics,
we obtain a bivariate distribution. An example is a study that records both the height
and the weight of each college student.

We may determine the mean, variance, skewness and other statistics for each characteristic
separately. When we study bivariate distributions, however, we also want to know whether
there is a relationship between the two characteristics, i.e., the degree to which the two
variables move together in the same or opposite directions, or the degree to which they
are associated.

Understanding this link helps in estimating the value of one variable from known values of
the other. It is also useful for analysing economic and business problems. Note, however,
that statistical relations are not the same as exact mathematical relations. If we have a
statistical relationship between two variables X and Y, such as Y = a + bX, we can only
obtain the average value of Y for a given value of X.

12.3. KEYWORDS
Bivariate Distribution: When many units are observed simultaneously with respect to two
properties, a Bivariate Distribution is obtained.

Correlation: Correlation is the statistical tool for detecting and measuring the relationship
between quantitative variables and expressing its degree in a single, concise figure.

Correlation analysis: The objective of correlation analysis is to establish the "degree of link"
between variables.

Correlation Coefficient: It quantifies the degree of correlation between two or more variables.

Dots of the diagram: On the graph, each pair of values (Xi, Yi) is represented by a point.
The collection of such points is known as the dots of the diagram.

Scatter Diagram: The bivariate data are represented as pairs (Xi, Yi), where i = 1, 2, ..., n.
Each pair is plotted on a graph in order to determine the extent of the relationship
between variables X and Y. The resulting diagram is known as a Scatter Diagram.

Spearman's Rank Correlation: This is a method for calculating the correlation between two
attributes. In this procedure, the items are ranked according to each of the two
characteristics, and the correlation between these ranks is computed.

12.4. REVIEW QUESTIONS


1. Describe the relationship between two variables. Differentiate positive correlation from
negative correlation. Employ diagrams to elucidate.

2. Write down an expression for Karl Pearson's linear correlation coefficient. Why is it called the
linear correlation coefficient? Explain.

3. The correlation between two independent variables is always zero, although the opposite is
not always true. Clarify the meaning of this assertion.

4. Differentiate between the coefficient of rank correlation of Spearman and the coefficient of
correlation of Karl Pearson. Explain the circumstances in which Spearman's rank correlation
coefficient can acquire a maximum and minimum value. Under what circumstances do
Spearman's formula and Karl Pearson's formula yield identical outcomes?

5. Write brief notes regarding the scatter diagram.
