Statistics For Management
CONTENT:
▪ Objectives
Introduction
1.1. History of Statistics
1.2. Managerial Application of Statistics
1.3. Statistics and Computers
1.4. Importance of Statistics in different fields
1.5. Summary
1.6. Key Words
1.7. Review Questions
1.8. Further Reading
OBJECTIVES:
• Relevance of Statistics in real life.
INTRODUCTION
Statistics is the mathematical study of collecting, organizing, analyzing, and interpreting
numerical data in order to draw conclusions whose reliability is assessed using probability.
Because aggregates of data (unlike individual quantities) tend to behave in a regular,
predictable fashion, statistics can analyze bodies of data too large to be understood by
ordinary observation. Statistics is divided into two component parts: descriptive statistics
and inferential statistics.
In 1749, Gottfried Achenwall coined the term Statistik at a German university to describe the
political science of different countries. The Englishman W. Hooper introduced the term
"statistics" into English in 1771 while translating Baron J.F. von Bielfeld's work Elements of
Universal Erudition. In that work, statistics is described as the science that informs us about
the political arrangement of all the modern states of the world. Although the modern meaning of
statistics differs considerably from the old one, it still includes it.
Statistics has evolved over the past few centuries, since English writers first used the term
in the 18th century. By the end of the nineteenth century, a substantial body of statistical
work had been completed.
Around the turn of the 20th century, William S. Gosset developed methods for making
decisions based on small samples of data. Over the course of the 20th century, many
statisticians concentrated on developing new statistical methods, theories, and
applications. The availability of electronic computers is without doubt a crucial factor in
the continued advancement of statistics.
3 SHOOLINI UNIVERSITY
Every student of statistics should be acquainted with the various parts of the field in order to
appreciate statistics from a more complete standpoint. Depending on one's line of work or
profession, some aspects of statistics may rarely be encountered, but understanding the
underlying ideas that drive statistical analysis is essential to appreciating its importance and
beauty.
The two main statistical subfields are descriptive statistics and inferential statistics. Both of
these are used in the analysis of data for scientific purposes, and both are equally important for
the statistics student.
Descriptive Statistics
Descriptive statistics deals with subjects such as the collection and presentation of data. This
is typically the initial step in a statistical analysis, and it is frequently not as simple as it
sounds: the statistician must take care in designing experiments, choosing a suitable sample
group, and avoiding the biases that so easily creep into an experiment.
Different research areas require different kinds of descriptive analysis. For instance, a
physicist studying turbulence in a laboratory needs the average values of quantities that
fluctuate over short time periods. Because of the nature of the problem, these physical values
must be averaged over the wide range of data acquired during the experiment.
Inferential Statistics
To choose random samples that fairly represent the entire population, a variety of sampling
techniques are used. Simple random sampling, stratified sampling, cluster sampling, and
systematic sampling are some of the most important techniques.
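The four techniques named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered units, the two strata ("small_firms" and "large_firms"), the cluster size of 10, and the function names are all hypothetical and do not come from the text.

```python
import random

def simple_random_sample(population, n, rng):
    """Every unit has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random starting point, then take every k-th unit."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from each stratum (group)."""
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Select whole clusters at random and keep every unit in them."""
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]

rng = random.Random(42)                      # fixed seed so runs are repeatable
population = list(range(1, 101))             # hypothetical units numbered 1..100
strata = {"small_firms": population[:60], "large_firms": population[60:]}
clusters = [population[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = cluster_sample(clusters, 2, rng)
print(len(srs), len(sys_sample), len(strat), len(clus))
```

In this sketch the stratified sample keeps the 60:40 split of the population, whereas the cluster sample keeps every unit of the two clusters that happen to be chosen.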
• Actuarial science is the area that assesses risk in the insurance and financial industries
using statistical and mathematical methods.
• Biostatistics, a branch of biology that also covers medical statistics, uses
statistical analysis to study biological phenomena and observations.
• Quality control examines the production and manufacturing processes. It may employ
statistical sampling of product items to inform judgments regarding process control or
accepting deliveries.
• Statistical finance is an empirical attempt to move finance away from its normative
roots and toward a positivist framework, using examples from statistical physics with a
focus on emergent or collective characteristics of financial markets.
Through the Computer Applications option, students can combine a traditional computer
science degree with a degree in a non-traditional field. You may practise using the technologies
you'll use in the field at our state-of-the-art labs for high-performance computing, networks,
and artificial intelligence. You'll also learn through laboratories, lectures, and projects:
1. Look into the limits of the algorithms and data structures that support complex software
systems.
2. Make new applications and tools for science and research areas that involve more than one
field.
Almost every discipline, including business, commerce, trade, physics, chemistry, economics,
mathematics, biology, botany, psychology, and astronomy, relies on statistics in some
capacity today. We'll now discuss a few key areas where statistics is regularly applied.
Businesses must use statistics to succeed. A successful businessperson must be able to decide
swiftly and accurately, knowing what customers want and therefore what to produce or sell
and in what quantities. Statistics helps businesspeople plan their production.
Statistical methods based on consumer preferences can also be used to evaluate product quality
more efficiently. Statistical information also supports informed choices about the location of
the business, how the products are advertised, the resources that are available, and so on. As
a result, statistical information is necessary for all business operations.
Statistics plays a significant role in economics. National income accounts, a versatile tool
for economists and administrators, are prepared using statistical techniques. In economic
research, statistical techniques are used for data collection, analysis, and hypothesis testing.
The relationship between supply and demand is studied using statistical methods, and issues
such as imports and exports, inflation, and per capita income call for a solid understanding
of statistics.
Almost all branches of the social and natural sciences depend heavily on statistics. Although
the methods employed in the natural sciences are the most reliable, the conclusions drawn
from them are only probable, not certain, because the data on which they are based are
imperfect. Statistical analysis helps to describe such measurements more accurately.
Statistics is also a branch of applied mathematics. We use a wide range of statistical
techniques in mathematics, such as averages, dispersion, estimation, and probability.
The banking industry relies heavily on statistical analysis. Banks operate on the principle
that customers do not all withdraw their money at the same time. This allows a bank to lend
part of its deposits to other people and charge them interest, turning those deposits into
profit while still meeting withdrawals. To estimate the number of depositors and the claims
they will make on a given day, bankers use statistical methods based on probability.
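A probability-based estimate of this kind can be sketched as follows. This is a simplified illustration, not a description of how any bank actually works: it assumes that each depositor decides independently and with the same probability whether to withdraw on a given day (a binomial model), and the figures of 10,000 depositors and a 5% daily withdrawal probability are hypothetical.

```python
import math

def expected_withdrawals(n_depositors, p_withdraw):
    """Mean and standard deviation of the daily number of withdrawals
    under a binomial model (independent depositors, equal probability)."""
    mean = n_depositors * p_withdraw
    std = math.sqrt(n_depositors * p_withdraw * (1 - p_withdraw))
    return mean, std

# Hypothetical figures: 10,000 depositors, each with a 5% chance
# of making a withdrawal on a given day.
mean, std = expected_withdrawals(10_000, 0.05)

# A rough planning figure: the mean plus three standard deviations.
upper = mean + 3 * std
print(f"expected: {mean:.0f}, std dev: {std:.1f}, planning figure: {upper:.0f}")
```

Under these assumptions the number of withdrawals on any one day varies only modestly around its expected value, which is why the bank can safely lend out part of its deposits.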
The administration of a state or country depends heavily on the gathering and processing of
statistical data. Numerous government programmes are built on statistical principles, and
statistical information is now used in almost all administrative decisions. For example, if the
government decides to revise employees' pay scales in light of an increase in the cost of
living, statistical techniques are used to determine the size of that increase. The preparation
of both federal and provincial budgets also relies heavily on statistics, which are used to
estimate expected expenditures and the revenues obtained from various sources. In this regard,
statistics act as the eyes of the state's administration.
Statistics is an indispensable tool in the vast majority of natural and social science disciplines.
In fields as diverse as biology, physics, chemistry, mathematics, meteorology, research,
chambers of commerce, sociology, business, public administration, and communication and
information technology, statistical methods are frequently used to analyse the results of
experiments and to determine the significance of their findings.
Astronomy is one of the oldest fields in which statistics has been applied; it uses
observations to determine the distances, sizes, masses, and densities of celestial bodies.
Errors are unavoidable when making these measurements; therefore, the most probable values
are found using statistical methods.
1.5. SUMMARY
The word "statistics" is derived from either the Latin word "status" or the Italian word
"statista." The first person to use the English term is thought to have been an Englishman
named W. Hooper, in 1771. Throughout the 20th century, statisticians devoted much attention to
the development of new statistical methods, theories, and applications. Descriptive statistics
and inferential statistics are the two main subfields that make up the statistical discipline.
Descriptive statistics covers the collection and presentation of data.
Inferential statistics centres on analysing samples of data that are intended to be
representative of a wider population. Both subfields are equally crucial for a statistics
student to understand, and both have various research applications. Chemometrics is the use of statistical
techniques to create a relationship between the state of a chemical system and the results of
measurements that were taken on it. Population ecology is the study of the dynamics of animal
populations and how those populations interact with their environments. Quantitative
Students who choose the Computer Applications option can obtain both a degree in a field not
often connected with computer science and a degree in computer science. Statistics are crucial
in establishing a nation's current per capita income, unemployment rate, population growth
rate, housing stock, educational system, healthcare infrastructure, etc.
Statistics is a branch of applied mathematics. Numerous statistical techniques are used in
mathematics, including averages, dispersion, estimation, and probability. Without the
systematic collection and analysis of statistical data, state or national administration
cannot operate.
In auditing, sampling approaches are used very frequently. Based on the likelihood that the
books contain errors, an auditor chooses the size of the sample of entries to be audited.
Statistics is a tool that is essential to the majority of natural and social science fields
of study.
• Descriptive Statistics: Descriptive statistics covers the collection and presentation of
data.
Balwani Nitin, Quantitative Techniques, First Edition: 2002. Excel Books, New Delhi.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R.P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.
Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.
Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.
CONTENT:
▪ Objectives
Introduction
2.1. Classification
2.2. Frequency distribution
2.3. Scope of Statistics
2.4. Summary
2.5. Keywords
2.6. Review Questions
2.7. References for further readings
OBJECTIVES:
• To understand classification of the data
INTRODUCTION
The raw material of statistics is statistical data. The data may belong to an activity of interest,
a phenomenon, or an investigational concern. They result from measuring, counting, or
watching.
Statistical data are, therefore, the elements of a problem scenario that may be measured,
quantified, counted, or categorised.
A variable is any subject, phenomenon, or action that generates data through this method. In
other words, a variable is a quantity that varies in some way when repeated measurements are
made.
In statistics, data are divided into two broad categories: quantitative data and qualitative data.
2.1. CLASSIFICATION
The types of data are related as follows. The first form of data is quantitative, or
numerical, data.
Any number-based data that can be collected, such as an individual's age, height, or weight,
is quantitative data. By contrast, a quality of something, such as black hair, brown hair, or
red hair, is not numerical; this is qualitative data. The term categorical is also used.
Many people do not use the word qualitative; instead, they use the term categorical.
Thus, the two primary categories of data are numerical data and categorical data. Each of
these can be broken down further. Quantitative data can be split into discrete and continuous
data. Let us take discrete data first: discrete data take exact, separate values.
For instance, your current school grade is one of 1, 2, 3, 4, ..., 11, 12. These values are
discrete: nobody is enrolled in grade 1.1 or 1.2. Shoe size is another example of a discrete
variable; you are a size nine, nine and a half, or ten, never a size 9.2576. Because they are
discrete, such variables can take only certain values. Now contrast this with continuous data,
which can vary without limit within a range. This may be surprising, but as you grew you
passed through every intermediate height, so there is an unlimited number of possible values
for any particular person.
Before reaching a height of 1.01 metres you were 1 metre tall, then 1.000000001 metres, and so
on up to whatever height you are now. You might say that you are about 75 inches tall,
although your exact height is more likely something like 75.76598623 inches. Therefore, while
discrete data take only specific values, continuous data can take any value within a range.
We can divide qualitative data in a manner similar to quantitative data. There are two kinds
of categorical data: nominal and ordinal. When you think of categorical data, you typically
think of nominal data, which means that the categories have no order. Ordinal data, by
contrast, are categories that do have an order. Consider military ranks: you are a private,
then a lance corporal, then a corporal, then a sergeant, and then a staff sergeant. This is
not numerical data, but it certainly has a definite order.
These are categories, but they are ordinal categories. The grades you may receive for your
work during the semester are classifications of the same kind. To summarise, data may be
quantitative (numbers) or qualitative (categories), and quantitative data may be discrete,
taking exact values, or continuous, with an unlimited number of possible values.
Qualitative data may be nominal, such as hair colour (black, brown, red) or car colour (red,
yellow, green, blue), or ordinal, where the categories have a natural order.
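The four data types can be illustrated with a short Python sketch that shows which summary suits each one. The records and variable names below are hypothetical; only the hair colours and the order of military ranks follow the examples in the text.

```python
from statistics import mean, mode

# Hypothetical records, one list per variable.
heights_cm = [162.4, 175.0, 168.7, 181.2]                # quantitative, continuous
school_grades = [9, 10, 10, 12]                          # quantitative, discrete
hair_colours = ["black", "brown", "black", "red"]        # qualitative, nominal
ranks = ["private", "corporal", "sergeant", "private"]   # qualitative, ordinal

# Ordinal data need an explicit order; this one follows the text's example.
RANK_ORDER = ["private", "lance corporal", "corporal", "sergeant", "staff sergeant"]

average_height = mean(heights_cm)                  # averaging suits quantitative data
most_common_grade = mode(school_grades)            # discrete values can be counted
most_common_hair = mode(hair_colours)              # nominal data: only counting applies
highest_rank = max(ranks, key=RANK_ORDER.index)    # ordinal data can also be ranked

print(average_height, most_common_grade, most_common_hair, highest_rank)
```

Note that asking for the "highest" hair colour would be meaningless, which is exactly the difference between nominal and ordinal data.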
To get started, we are going to use Excel along with hand calculations to create some
frequency distributions. When using Excel, you must think about many of the same issues that
you take into account when making a distribution by hand. The first question is how many
classes you want in the distribution. Normally the number should be between five and twenty,
and for this class we will suggest how many classes to use. Alternatively, we might tell you
the class width to use, which we'll discuss in a moment. Suppose, then, that we use five
classes for this data.
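The text works in Excel and by hand; purely as an illustration, the same calculation can be sketched in Python. The list of scores is hypothetical, and the rule used for the class width (the smallest whole number that lets five inclusive classes cover the data) is one common choice rather than the only one.

```python
import math

def frequency_distribution(data, n_classes):
    """Group whole-number data into equal-width, inclusive classes."""
    low, high = min(data), max(data)
    # Smallest whole-number width so that n_classes classes cover low..high.
    width = math.ceil((high - low + 1) / n_classes)
    table = []
    for i in range(n_classes):
        start = low + i * width
        end = start + width - 1          # inclusive upper limit of the class
        count = sum(1 for x in data if start <= x <= end)
        table.append((start, end, count))
    return table

# Hypothetical data set of 15 scores.
scores = [12, 15, 21, 22, 27, 30, 34, 35, 38, 41, 44, 47, 49, 50, 58]
table = frequency_distribution(scores, 5)
for start, end, count in table:
    print(f"{start}-{end}: {count}")
```

Here the range is 58 - 12 + 1 = 47 values, so five classes need a width of 10, giving the classes 12-21, 22-31, 32-41, 42-51, and 52-61.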
Visualization of Data
In order to make it easier for the human brain to comprehend and make inferences from the
data, an approach known as data visualisation involves presenting information in a visual
format, such as a map or graph. Finding patterns, trends, and outliers in sizable data sets is one
of data visualization's main objectives. It's common to use the terms "information graphics,"
"information visualisation," and "statistical graphics" interchangeably.
Data visualisation is one phase of the data science process: once data have been collected,
analysed, and modelled, they must be presented before conclusions can be drawn from them.
Data visualisation is also part of data presentation architecture (DPA), whose goal is to
identify, modify, prepare, and present data as effectively as possible.
Almost every profession requires the capacity to visualise data. Teachers can use it to display
test results for their pupils, computer scientists can use it to investigate developments in
artificial intelligence (AI), and company executives can use it to interact with stakeholders. It
is crucial for large-scale data projects as well. As businesses gathered enormous amounts of
data in the early stages of the big data trend, they needed a way to get an overview of their
data quickly and easily. Visualisation tools were a natural fit.
The importance of visualisation has increased as a result of big data and data analysis projects.
Businesses are using machine learning more and more to gather vast amounts of data, which
may be difficult and time-consuming to organise, understand, and explain.
This process can be sped up with the use of visualisation, which also makes information easier
for stakeholders and business owners to understand.
Big data visualisation frequently goes beyond the typical methods used in traditional
visualisation, such as pie charts, histograms, and business graphs. As an alternative, it uses
more intricate visualisations like heat maps and fever charts. Powerful computer systems are
required for big data visualisation in order to take raw data, understand it, and provide graphical
representations that allow users to draw conclusions quickly.
Univariate analysis: a single characteristic is examined on its own in order to assess nearly
all of its properties.
Bivariate analysis: data on exactly two characteristics are compared.
Multivariate analysis: more than two variables are compared.
Table 2.1
Univariate                          | Bivariate
bar graph, histogram, pie chart,    | • tables where one variable is
line graph, box-and-whisker plot    |   contingent on the values of the
                                    |   other variable.
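The difference between univariate and bivariate analysis can be sketched with a small example. The two variables below (advertising spend and sales) and their values are hypothetical, and the Pearson correlation coefficient is used here simply as one familiar way of comparing two characteristics; the text itself does not prescribe a particular measure.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observations for five periods.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

# Univariate analysis: one characteristic studied on its own.
sales_summary = {"mean": mean(sales), "min": min(sales), "max": max(sales)}

# Bivariate analysis: exactly two characteristics compared.
r = pearson_r(advertising, sales)
print(sales_summary, round(r, 3))
```

A multivariate analysis would extend the same idea to three or more variables, for example by computing the correlation for every pair.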
2.4. SUMMARY
The initial phase in the statistical process is the collection of statistical data. The data
may relate to a phenomenon, a research question, or an activity of interest. They are the
results of tasks such as counting, measuring, or observing.
Statistical data are the elements of a problem scenario that can be measured, quantified,
counted, or categorised. A variable is anything that generates data in this way, including a
subject, an event, or an action.
In other words, a variable is a quantity that varies in some way between different measurements
of the same thing. Data are divided into two primary groups in statistics: quantitative data and
qualitative data. This classification is based on the types of quantifiable characteristics.
2.5. KEYWORDS
Bivariate frequency: Bivariate frequency distributions are data classifications that are
categorised simultaneously based on the magnitude of two characteristics.
Statistical series: Statistical series are collections of classified data that are organised in some
logical order, such as by size, by time of occurrence, or by some other criterion.
2. In every statistical inquiry, discuss the goal, techniques, and importance of tabulation.
Mention the several types of tables that are commonly utilised.
3. Make a frequency table using the following data using a width of 10 for each class. Use
a method of classification that is inclusive.
30, 38, 43, 59, 82, 40, 45, 39, 83, 85, 72, 66, 45, 33, 53, 67, 70, 72, 52, 50, 43, 44, 60, 89, 67,
66, 78, 32, 56, 47, 65, 56, 38, 84, 64, 52, 43, 33, 31, 35, 38, 39, 40, 37, 52, 53, 60
Lindgren B.W (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.
Balwani Nitin, Quantitative Techniques, First Edition: 2002. Excel Books, New Delhi.
Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.
Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, Mc Graw Hill Book
Company, New York.
Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.
Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.
Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R.P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.
Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.
CONTENT:
▪ Objectives
Introduction
3.1. Importance of Diagram
3.2. Types of Diagrams
3.3. Methods of creating bar diagram
3.4. Summary
3.5. Key Words
3.6. Review Question
3.7. Reference for further Reading
OBJECTIVES:
• To understand the concept of graphical representation of data
INTRODUCTION
Although tabulation is a very effective way to convey data, diagrams are a more refined way
to depict it. A layperson may find tabular data difficult to understand, but a quick glance at
a diagram is often enough to grasp the facts being presented. As M.J. Moroney puts it,
diagrams "register a meaningful impression almost before we think."
• Diagrams let us compare and contrast different samples with relative ease, without
using any additional statistical techniques.
• The approach is widely applicable; in practice, it is employed in a wide range of
contexts and academic domains.
• Diagrams convey information more effectively than tabular data does.
• Statistical measures such as the mean, mode, and median can be estimated graphically
as part of a numerical analysis.
• Diagrams save time, effort, and money. Even high-quality diagrams can be produced at
comparatively little cost.
• Diagrams give us considerably more information at a glance than tabulation, which has
some drawbacks of its own.
• The details are easy to remember. Diagrams that we actually see leave a lasting
impression on the mind.
• Diagrams can condense large volumes of data. A simple figure may illustrate facts
that even 10,000 words cannot.
• The diagram should be drawn correctly from the very beginning. It should carry a
general heading that accurately conveys its purpose, so that the gist of the topic is
brought into focus and the reader can understand it.
• The scale must be neither too large nor too small. If it is too large, the diagram
may look unattractive; if it is too small, the intended idea may not be conveyed. The
size of the paper should be taken into consideration for every diagram, and the scale
of the diagram determined accordingly.
• Notes should be placed at the bottom of the diagram to resolve any points that might
otherwise be unclear. This helps the reader to understand the diagram.
• Ensure that the diagrams are well organised and free of clutter. There should be no
ambiguity in the diagram, nor any areas with excessive writing.
• The diagram should be simple, so that its meaning is clear at first sight. That is,
the diagram should express its content in a straightforward manner.
• The diagram should be self-explanatory. It should indicate the nature of the data
supplied, as well as its place and source.
• Using a variety of tints and colours makes diagrams much easier to comprehend.
• A vertical diagram is generally preferred to a horizontal one.
Figure 3.1.
Simple Bar Graph
These figures, which resemble line diagrams, are used when the data can be displayed
using only one dimension, namely length. The technique is essentially the same as for
a line diagram, except that each line is given some thickness. The bars can be drawn
either horizontally or vertically. All bars must have the same width, and the distance
between them should also be the same. When choosing the width of the bars and the
distance between them, it is important to keep the available space on the page in
mind.
Figure 3.2.
When we need to compare two or more variables, we turn to this diagram. There may be
2, 3, 4, or even more variables. If there are two variables, a pair of bars is drawn
for each group; similarly, if there are three variables, three bars are drawn. The
bars are drawn on the same proportional basis as simple bars. The same colour or shade
is applied to all bars that represent the same variable.
Figure 3.3
Data that would otherwise be displayed using several bars can be presented using this
design. As the examples that follow show, the values of several variables for a period
are combined onto a single bar, which is divided into components. The components must
appear in exactly the same order in every bar. This diagram works best when there are
three to five components rather than more.
Figure 3.4
In a manner analogous to that of the sub-divided bar diagram, the data pertaining to
a single period or variable is represented by a single bar, but this time the values are
expressed in terms of percentages. To make it easier to make comparisons, we keep
the components in each bar in the same order from one to the next.
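The conversion to percentages can be sketched as follows. The cost breakdown for two years is hypothetical and the function name is illustrative; the point is only that every bar then has the same total length of 100 and that the components keep the same order in each bar.

```python
def to_percentages(components):
    """Express each component as a percentage of the bar's total."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

# Hypothetical cost breakdown; components are listed in the same order each year.
costs = {
    "2021": {"materials": 40, "labour": 30, "overheads": 10},
    "2022": {"materials": 60, "labour": 45, "overheads": 45},
}

percent_bars = {year: to_percentages(parts) for year, parts in costs.items()}
for year, parts in percent_bars.items():
    print(year, parts)
```

Although total costs differ between the two years (80 and 150), the percentage bars make it easy to see that the share of overheads rose from 12.5% to 30%.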
In this particular case, the bars may lie on either side of the base line: to its
left or right, or above or below it.
This diagram is used when the value of one variable is significantly higher or lower
than that of the others. In this case, the bars that represent the much larger values
are drawn broken.
One-dimensional diagrams are those in which only one dimension, length, is drawn in
proportion to the value of the data. These diagrams are also frequently referred to
as bar diagrams. They may be drawn either vertically or horizontally. The bars in a
diagram differ from one another only in length; their other dimensions, namely
breadth and thickness, remain unchanged. The number of bars to be drawn and the size
of the paper at one's disposal are taken into consideration when determining the
width of each bar. If many bars are to be drawn on one sheet of paper, they may take
the shape of a line or a thread. However, their width should be neither too large nor
too small, because in both cases they appear unattractive. The thickness dimension is
not emphasised in these diagrams. The following page contains examples of these
diagrams.
(i) To begin, draw the base line, preferably in a horizontal direction, and then
divide it into a number of equal sections, keeping in mind the total number
of diagrams that are to be made.
(ii) After that, create the scale line, preferably in a vertical orientation, and split
it into a number of equal sections while keeping the maximum value that
needs to be represented in mind.
(iii) Next, establish a consistent width for each bar, bearing in mind both the total
number of bars that will be drawn and the distance that will be left between
each pair of bars.
(iv) Next, establish a consistent gap size between each of the two bars by fixing
the distance between them.
(v) Following that, adjust the lengths of the various bars so that they are
proportional to the values of the data.
(vi) After that, draw the various bars in accordance with their length and width
as determined by this step, and arrange them in the order of their length or
the time at which they occurred.
(vii) Finally, fill the bars with colours or shades that are the same, different,
or a combination of both, according to whether the characteristics of the
data are similar or different.
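The steps above can be sketched in Python as a plain-text bar diagram. This is a minimal illustration under stated assumptions: the sales figures are hypothetical, the bars are drawn horizontally because text output grows naturally from left to right (the steps above prefer a horizontal base line with vertical bars), and colouring (step vii) is left out.

```python
def bar_lengths(values, max_length=40):
    """Scale each value to a bar length proportional to it (step v)."""
    largest = max(values.values())
    return {label: round(max_length * v / largest) for label, v in values.items()}

def draw_bars(values, max_length=40):
    """Return one text line per bar. Every bar is one row thick and
    successive bars are separated by the same gap (steps iii and iv)."""
    lengths = bar_lengths(values, max_length)
    label_width = max(len(label) for label in values)
    return [
        f"{label:<{label_width}} | {'#' * lengths[label]} {values[label]}"
        for label in values          # bars kept in order of time (step vi)
    ]

# Hypothetical yearly sales figures.
sales = {"2019": 120, "2020": 90, "2021": 150, "2022": 60}
print("\n".join(draw_bars(sales)))
```

The largest value fills the full scale of 40 characters, and every other bar is shortened in proportion, so a value half as large produces a bar half as long.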
Advantages
The primary benefits of using a bar diagram are summarised below:
2. It can present a large number of facts on a single sheet of paper.
Disadvantages
Two Dimensional:
A shape that has only width and height, with no thickness. Squares, circles,
triangles, etc. are two-dimensional objects.
Also referred to as "2D."
Figure 3.5
Three Dimensional:
An object that, just like any other item in the real world, includes a height, width, and
depth. Consider the fact that your body has three dimensions.
Also referred to as "3D."
Figure 3.6.
Pie Chart
A specific type of chart that illustrates the relative sizes of data by using "pie slices." Imagine
you conducted a survey among your friends to see which types of movies they enjoy watching
the most:
Figure 3.7
It is a good approach to represent relative sizes: it is easy to see which movie types are the most
liked, and it is also easy to see which movie types are the least liked.
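The size of each slice follows directly from its share of the total: a category's angle is its count divided by the total, multiplied by 360 degrees. The sketch below uses hypothetical survey counts, since the figures behind the chart are not reproduced in the text, and the function name is illustrative.

```python
def pie_slices(counts):
    """Return (percentage, angle in degrees) for each category."""
    total = sum(counts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in counts.items()
    }

# Hypothetical answers from 20 friends about their favourite type of movie.
survey = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}

slices = pie_slices(survey)
for name, (percent, angle) in slices.items():
    print(f"{name}: {percent:.0f}% of the pie, {angle:.0f} degrees")
```

With these counts, Romance takes the largest slice (6 of 20 answers, or 108 degrees) and Drama the smallest (1 of 20, or 18 degrees).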
3.4. SUMMARY
Diagrams offer a readily comprehensible representation of the facts. According to
M.J. Moroney, diagrams "register a meaningful impression almost before we think." This
approach is used in a wide range of domains and specialised fields of research. A diagram must
express its content in a simple, uncomplicated manner.
In every diagram, the size of the paper being used must be taken into account. The diagram
should indicate the nature of the data provided as well as its place and source. When we need
to compare two or more variables, a diagram with one bar per variable represents the
relationships between them visually. In a sub-divided bar diagram, the values of several
variables for a period are combined onto a single bar. Bar diagrams are among the most
frequently used kinds of diagram. When choosing the width of each bar, the number of bars to
be drawn and the size of the paper available are taken into consideration. The lengths of the
bars are then made proportional to the values of the data, and the bars are arranged in order
of their length or of the time at which the values occurred.
• Graph: A graph is a diagram that depicts the connections between two or more objects.
Q3. What is the importance of making a diagram, also mention the limitation of the diagram?
Lindgren B.W (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.
Balwani Nitin, Quantitative Techniques, First Edition: 2002. Excel Books, New Delhi.
Garrett H.E. (1956), Elementary Statistics, Longmans, Green & Co., New York.
Guilford J.P. (1965), Fundamental Statistics in Psychology and Education, Mc Graw Hill Book
Company, New York.
Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.
Stockton and Clark,Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.
Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R.P., Statistics for Business and Economics, Macmillan India Delhi, 2008
Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.
CONTENT:
▪ Objectives
Introduction
4.1. Meaning of Central Tendency
4.2. Uses of Central Tendency
4.3. Limitations of Central Tendency
4.4. Keywords
4.5. Review questions
4.6. References for further reading
OBJECTIVES:
• Students will learn about central tendency
INTRODUCTION
A measure of central tendency is a single value that attempts to characterize a set of data by
identifying the central position within that set. As such, measures of central tendency
are frequently called measures of central location. They are also classed as summary
statistics. The distribution of any geographical attribute $x_i$, where i = 1, 2, 3, ..., n and
n denotes the number of observations, is first described statistically by measuring two
dimensions.
First, the central tendency, which is based on the concentration of the observed values: it
describes where the n values of the variable $x_i$ cluster. Second, the
dispersion, which shows the breadth of the distribution.
When we study a population with respect to a characteristic of interest, we may collect a
large number of observations. It is hard to form any understanding of the characteristic by
considering all the observations at once, so a single number for each group is
preferred. This number must be a good representative of all the observations and give a
clear picture of the characteristic. Such a representative number can be used as the central
value of these observations. Depending on the situation,
this central value may also be referred to as an average, a measure of central tendency, or a
measure of location. There are five averages in total. The mean, median, and mode are known
as simple averages, while the geometric mean and harmonic mean are known as special
averages.
2. The average number provides a clear image of the subject of study for advice and conclusion.
3. It provides a brief explanation of the performance of the group as a whole and allows us to
compare the normal performance of two or more groups.
2. Because the mean considers every value in the distribution, it is susceptible to the effects
of outliers and distributions that are skewed.
4.4. SUMMARY
A single value that attempts to characterise a set of data by identifying the central position
within that set is referred to as a measure of central tendency. This single value can be
thought of as an average. For this reason, measures of central tendency are commonly
called "measures of central location." They are also classified as
summary statistics. The distribution of any geographical attribute $x_i$, where
i = 1, 2, 3, ..., n and n denotes the number of observations, is first described statistically by
measuring two dimensions.
First, the central tendency, which is determined by doing an examination of the concentration
of the values that were observed. The statistical concentration of n items within the variable xi
can be found in this statement.
The second component is the dispersion, which shows how broad the distribution is. In
statistical analysis it is an equally important dimension, since there are situations in which
two variables have the same central tendency but different dispersion, or the same
dispersion but different central tendencies. Understanding the distributional
features of a series or variable requires examining both its central value
and its dispersion.
4.5. KEYWORDS
Central Tendency: A central tendency is a central or typical value for a probability distribution
in statistics. In common parlance, measures of central tendency are typically referred to as
averages. The term central tendency originated in the 1920s. The arithmetic mean, the median,
and the mode are the most popular measurements of central tendency.
Mean: There are various types of mean in mathematics, particularly statistics. The arithmetic
mean, often known as the arithmetic average, is a measure of the central tendency of a finite
set of numbers. Specifically, the arithmetic mean is the sum of the values divided by the number
of values.
Median: In statistics and probability theory, the median is the dividing line between the upper
and lower halves of a data sample, population, or probability distribution. It can be thought of
as "the middle" of a data set.
Mode: The mode is the value that occurs most frequently in a given data set. If X is a discrete
random variable, the mode corresponds to the value x at which the probability mass function
reaches its highest value. In other words, it is the most likely value to be sampled.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R.P., Statistics for Business and Economics, Macmillan India Delhi, 2008
Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.
CONTENT:
▪ Objectives
Introduction
5.1. Mean
5.2. Functions of mean
5.3. Arithmetic mean
5.4. Merits and Demerits
5.5. Types of mean
5.6. Summary
5.7. Key Words
5.8. Review Question
5.9. Reference on Further Reading
OBJECTIVES:
• To understand the functions of mean and arithmetic mean
INTRODUCTION
As a consequence of this, the concepts of central tendency and central location are often used
interchangeably. In some circles, they are referred to as summary statistics. The mean, which
is also sometimes referred to as the average, is probably the measure of central tendency that
you are most familiar with.
5.1 MEAN
Multiple definitions exist for the average of a distribution. Important definitions include:
"An average is an attempt to discover a single number that describes the entire set of numbers."
- Murray R. Spiegel
"An average is a single value within the data range that is used to represent all values in a series.
Since the mean falls somewhere inside the data distribution, it is frequently referred to as a
measure of central value."
"A central tendency measure is a characteristic value around which other figures cluster."
2. To enable comparison, the averages of distinct data sets can be compared to one another.
For instance, the mean (or average) earnings of workers at two factories can be used to
compare the wage levels of their employees.
3. The majority of judgments to be made in research, planning, etc. are based on the average
value of particular variables. For instance, if a company's average monthly sales are down,
the sales manager may need to take steps to increase them.
It should be :
1. rigidly defined, preferably with an algebraic formula, so that different individuals acquire
the same value for the same set of data.
2. simple to calculate.
3. simple to comprehend.
Before discussing the arithmetic mean, some notation will be introduced. It is assumed that
there are n observations whose respective values are denoted by $X_1, X_2, \ldots, X_n$. The
sum of these observations, $X_1 + X_2 + \cdots + X_n$, is abbreviated as $\sum_{i=1}^{n} X_i$, where
$\Sigma$ (called sigma) is the summation sign. The subscript i of X is a positive integer that
represents the observation's serial number. Since there are n observations, i ranges from
1 to n, which is expressed by writing i = 1 below and n above the summation sign. When there is
no ambiguity about the range of the summation, we can omit these limits and write
$X_1 + X_2 + \cdots + X_n = \sum X_i$.
The arithmetic mean is defined as the sum of the observations divided by the total number of
observations. It can be computed as either a simple or a weighted arithmetic mean. In the simple
arithmetic mean, all observations are given equal weight, whereas in the weighted arithmetic
mean, different observations are assigned different weights.
Suppose the observations are 5, 7, 9, 11 and 13. The mean is the sum of all items, 45, divided by 5, which comes to 9.
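The simple arithmetic mean of the five observations above can be checked with a short Python sketch. The variable names are illustrative only and are not part of the original text.

```python
# Simple arithmetic mean: sum of the observations divided by their number.
observations = [5, 7, 9, 11, 13]

total = sum(observations)            # 45
count = len(observations)            # 5
simple_mean = total / count

print(simple_mean)  # 9.0
```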
The mean of grouped data can be difficult to calculate precisely, but we can always estimate it.
To understand this concept, let us look at the mean of grouped data, the methods for
determining it, and a few examples.
Calculating the average of a set of data that has been divided into groups (classes) is known as
finding the mean of grouped data. The data must first be arranged in a frequency table
so that the mean of grouped data can be calculated quickly.
The direct method, the assumed mean method, and the step deviation method are the three main
approaches for finding the mean of grouped data. For calculating the mean, each of these
approaches has its own formulas and techniques.
The mean is defined as the sum of the observations divided by the total number of observations.
There are separate formulas for the mean of ungrouped and of grouped data.
For grouped data, the formula is:
$\bar{x} = \frac{\sum f_i x_i}{N}$
where $x_i$ is the class mark (midpoint) of the i-th class, $f_i$ is its frequency, and
$N = \sum f_i$ is the total frequency.
Direct Technique
The direct method is the most straightforward way to determine the mean of grouped data. If
the values of the observations are $x_1, x_2, \ldots, x_n$ with corresponding frequencies
$f_1, f_2, \ldots, f_n$, then the mean is $\bar{x} = \sum f_i x_i / \sum f_i$.
Here are the steps for calculating the mean of grouped data using the direct
method:
● Create a table with the columns class interval, class mark $x_i$, frequency $f_i$,
and the product $x_i f_i$.
● Determine each class mark with the formula $x_i$ = (upper class limit +
lower class limit)/2.
● Compute the mean using the formula Mean = $\sum x_i f_i / \sum f_i$, where $f_i$ is the
frequency and $x_i$ is the class interval's midpoint.
Table 5.1
Class interval    0-10   10-20   20-30   30-40   40-50
Frequency (fi)      9      13       8      15      10
Table 5.2
Class interval   Frequency (fi)   Class mark (xi)   fi xi
0-10                   9                 5             45
10-20                 13                15            195
20-30                  8                25            200
30-40                 15                35            525
40-50                 10                45            450
Total                 55                             1415
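As a rough illustration of the direct method, the following Python sketch rebuilds the columns of Table 5.2 and divides the total of fi xi by the total frequency. The function name `grouped_mean_direct` is a hypothetical helper introduced here for illustration.

```python
# Direct method for the mean of grouped data (values from Table 5.2).
class_intervals = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [9, 13, 8, 15, 10]

def grouped_mean_direct(intervals, freqs):
    """Return (sum of fi*xi, total frequency, mean) using class midpoints."""
    # Class mark xi = (lower class limit + upper class limit) / 2
    midpoints = [(lower + upper) / 2 for lower, upper in intervals]
    total_fx = sum(f * x for f, x in zip(freqs, midpoints))
    total_f = sum(freqs)
    return total_fx, total_f, total_fx / total_f

total_fx, total_f, direct_mean = grouped_mean_direct(class_intervals, frequencies)
print(total_fx, total_f, round(direct_mean, 2))
```

With the totals from the table (1415 and 55), the mean works out to about 25.73.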
Suppose we have to find the arithmetic mean from the following frequency distribution: marks 0-7,
7-14, 14-21, 21-28, 28-35, 35-42 and 42-49, with the number of students 19, 25, 36, 72, 51, 43 and 28
respectively.
By the short-cut (assumed mean) method, we take the mid-values of the intervals as 3.5, 10.5, 17.5,
24.5, 31.5, 38.5 and 45.5. Taking the assumed mean as A = 24.5 and the deviations d = x - A, we get
N = 274 and $\sum fd$ = 546. Hence
mean = A + $\sum fd$/N = 24.5 + 546/274 = 24.5 + 1.99 = 26.49
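The short-cut calculation above can be sketched in Python as follows. The helper name `grouped_mean_assumed` is hypothetical; the data and the assumed mean of 24.5 come from the example.

```python
# Assumed mean (short-cut) method: mean = A + sum(f * d) / N, with d = x - A.
mid_values = [3.5, 10.5, 17.5, 24.5, 31.5, 38.5, 45.5]
students = [19, 25, 36, 72, 51, 43, 28]

def grouped_mean_assumed(midpoints, freqs, assumed_mean):
    """Return (N, sum of f*d, mean) for the assumed mean method."""
    n = sum(freqs)
    # Deviation of each mid-value from the assumed mean, weighted by frequency
    sum_fd = sum(f * (x - assumed_mean) for f, x in zip(freqs, midpoints))
    return n, sum_fd, assumed_mean + sum_fd / n

n_students, sum_fd, mean_marks = grouped_mean_assumed(mid_values, students, 24.5)
print(n_students, sum_fd, round(mean_marks, 2))
```

Any mid-value can serve as the assumed mean; choosing one near the centre of the distribution simply keeps the deviations small.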
CONTINUOUS SERIES-EXAMPLES
Calculate mean from the following data , X: 1-10, 11-20,21-30,31-40,41-50 & 51-60 and
corresponding frequencies f being 3,5,8,10,9 and 5 respectively.
Calculate mean from the following data , X: 10-19, 20-29,30-39,40-49 & 50-59 and
corresponding frequencies f being 5,8,12,8 and 7 respectively .
iii. It is more accurate and dependable if there are a sufficient number of items.
iv. It is not predicated on its place in the series; rather, it is a computed value.
v. Calculations can be made even when part of the data's specifics are missing.
Demerits:
i. It cannot be discovered using a frequency graph or by visual observation.
ii. In order to explore qualitative phenomena, which cannot be measured numerically, i.e.,
iii. It cannot ignore any single item without a loss of accuracy.
vi. If the specifics of the data used to compute it are not provided, it could result in false
findings.
The weighted mean is a type of average of the given data. It is a measure of central tendency
analogous to the arithmetic mean and the sample mean.
The weighted mean is computed instead of the simple arithmetic mean when the observations do
not all carry equal importance.
Although weighted means generally behave like arithmetic means, they have a few properties
that run counter to common sense. The elements of the data set that have a higher weight
contribute more to the weighted mean than the elements that have a lower weight.
The weights cannot be negative. Some of them may be zero, but not all of them, because division
by zero is not permitted. Weighted means play an important role in
the systems of data analysis, as well as in weighted differential and integral calculus.
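Formula:
$W = \frac{\sum_{i=1}^{n} w_i X_i}{\sum_{i=1}^{n} w_i}$
W = weighted average, $X_i$ = value of the i-th observation, $w_i$ = weight of the i-th observation
Suppose a student scored the following marks out of 100 in different subjects: English 35, Maths
80, Accounts 90, Economics 45 and Commerce 55. His simple average marks = 305/5 = 61.
But in practice he devotes 3 hours to English, 2 hours to Maths, 2 hours to Accounts and 1 hour
each to Economics and Commerce. Taking these hours as weights, $\sum w_i$ = 3+2+2+1+1 = 9 and
$\sum w_i X_i$ = 105+160+180+45+55 = 545, so the weighted average is W = 545/9 = 60.56.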
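The following Python sketch compares the simple and weighted averages for the marks example, treating the study hours as weights. Note that the five marks listed add up to 305, so the simple average is 61. The dictionary layout and variable names are assumptions made for illustration.

```python
# Simple vs. weighted mean for the marks example; study hours act as weights.
marks = {"English": 35, "Maths": 80, "Accounts": 90, "Economics": 45, "Commerce": 55}
hours = {"English": 3, "Maths": 2, "Accounts": 2, "Economics": 1, "Commerce": 1}

simple_average = sum(marks.values()) / len(marks)

total_weight = sum(hours.values())                               # sum of w_i
weighted_total = sum(hours[subject] * marks[subject] for subject in marks)  # sum of w_i * X_i
weighted_average = weighted_total / total_weight

print(simple_average, total_weight, weighted_total, round(weighted_average, 2))
```

Because the lowest mark (English) carries the largest weight, the weighted average falls slightly below the simple average.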
5.6. SUMMARY
A single value from the data range that is used to represent all of the values in a series is referred
to as the average of those values. Because it lies within the data distribution,
the mean is often referred to as a "measure of central value." When conducting research and
making plans, most of the decisions that need to be made are based on the average value of the
relevant variables.
The arithmetic mean is calculated by dividing the sum of all the observations by the total number
of observations. It can be computed as either the simple arithmetic mean or the weighted
arithmetic mean.
The direct method, the assumed mean method, and the step deviation method are the three
most commonly used approaches for calculating the mean of grouped data. The direct method is
the most straightforward. The first step is to create a table containing the columns class
interval, class mark $x_i$, frequency $f_i$, and $x_i f_i$. The mean is then obtained from the
formula Mean = $\sum x_i f_i / \sum f_i$, where $x_i$ is the midpoint of the class interval and
$f_i$ is the frequency. The weighted mean is used when the observations do not all carry equal
weight.
Weighted means generally behave like arithmetic means; however, they have some properties
that run counter to intuition.
Central tendency: A central tendency is a central or typical value for a probability distribution
in statistics. In common parlance, measures of central tendency are typically referred to as
averages. The term central tendency originated in the 1920s. The arithmetic mean, the median,
and the mode are the most popular measurements of central tendency.
Grouped data: Grouped data are those that have been created by grouping together individual
observations of a variable in a way that makes it possible to summarise or analyse the data
easily using a frequency distribution table of the groups.
Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R. P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.
Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.
Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.
UNIT 6: MEDIAN
CONTENT:
▪ Objectives
Introduction
6.1. Median of the grouped data
6.2. Ungrouped or raw form
6.3. Cumulative frequency
6.4. Continuous series
6.5. Merits of median
6.6. Demerits of median
6.7. Summary
6.8. Key Words
6.9. Review Questions
6.10. Reference for further Reading
OBJECTIVES:
• To understand the median of the grouped and ungrouped data
INTRODUCTION
When values are sorted in ascending or descending order, the median is the middle value of
the distribution: 50 percent of the observations lie on either side of it. In a distribution
with an odd number of observations, the median is the single middle value.
The median is the middle number, which is 57 years, when examining the retirement age
distribution (with 11 observations):
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
When the number of observations in a distribution is even, the median value is the mean of the
two middle values. The two middle values in the following distribution are 56 and 57, hence
the median is 56.5 years:
52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60 , 60
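The two retirement-age cases above (odd and even numbers of observations) can be sketched in Python. The function `median_of` is an illustrative helper, not something defined in the text.

```python
# Median of raw data: middle value (odd n) or mean of the two middle values (even n).
def median_of(values):
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

ages_odd = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]        # 11 observations
ages_even = [52, 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]   # 12 observations

print(median_of(ages_odd))   # 57
print(median_of(ages_even))  # 56.5
```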
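Median-Individual Series
• Arrange the data in increasing order: 5, 8, 10, 11, 12, 13, 13, 15, 18, 20. We have
N = 10, and applying the formula Median = value of the ((N+1)/2)th item gives the 5.5th item,
that is, the average of the 5th item (12) and the 6th item (13).
• Hence Median = 12.5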
Median-Discrete Series
• Find the median from the following data: X (marks) are 10, 11, 13, 15 and 16, and f (number
of students) are 5, 8, 12, 9 and 6 respectively.
• The cumulative frequencies are 5, 13, 25, 34 and 40, so N = 40.
• Here Median = value of the ((N+1)/2)th = 20.5th item, which lies in the cumulative frequency 25.
• Against it X = 13
• Hence Median = 13
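The cumulative-frequency lookup used for the discrete series above can be sketched in Python. The helper `discrete_median` is hypothetical; it locates the items on either side of the ((N+1)/2)th position and averages them.

```python
from bisect import bisect_left
from itertools import accumulate

def discrete_median(values, freqs):
    """Median of a discrete series; values must be in ascending order."""
    cumulative = list(accumulate(freqs))      # running totals of the frequencies
    n = cumulative[-1]
    position = (n + 1) / 2                    # e.g. 20.5 when N = 40
    lower_rank = int(position)
    upper_rank = lower_rank if position == lower_rank else lower_rank + 1
    # First value whose cumulative frequency reaches the required rank
    lower_value = values[bisect_left(cumulative, lower_rank)]
    upper_value = values[bisect_left(cumulative, upper_rank)]
    return (lower_value + upper_value) / 2

marks_x = [10, 11, 13, 15, 16]
students_f = [5, 8, 12, 9, 6]
print(discrete_median(marks_x, students_f))  # 13.0
```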
Median-Continuous Series
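• Median = $L_1 + \frac{N/2 - c.f.}{f} \times i$
• Here $L_1$ = lower limit of the median class interval, N/2 is the median item, c.f. is the
cumulative frequency preceding the median class interval, f is the frequency of the median
class interval, and i is the width of the median class interval.
• Find the median from the following wage distribution for a certain factory: monthly wages
in rupees 50-80, 80-100, 100-110, 110-120, 120-130, 130-150, 150-180 and 180-200, and
number of workers 30, 127, 140, 240, 176, 135, 20 and 3 respectively.
• Here N = 871 and N/2 = 435.5, which lies in the class 110-120 (c.f. = 297, f = 240, i = 10).
• Median by calculation = 110 + (435.5 - 297)/240 × 10 = 110 + 5.77 = 115.77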
,Compute c.f.
• 50+1.67
• Median=51.67
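For the wage distribution example above, the continuous-series median can be sketched in Python. The sketch assumes the usual interpolation formula, lower limit + (N/2 - c.f.)/f × width, which reproduces the stated result of 115.77; the function name `grouped_median` is illustrative. Because each class carries its own limits, unequal class widths are handled directly.

```python
def grouped_median(classes, freqs):
    """Median of a continuous series by interpolation within the median class."""
    n = sum(freqs)
    half = n / 2
    cum_before = 0                      # cumulative frequency preceding the class
    for (lower, upper), f in zip(classes, freqs):
        if cum_before + f >= half:      # this is the median class
            width = upper - lower
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("no observations")

wage_classes = [(50, 80), (80, 100), (100, 110), (110, 120),
                (120, 130), (130, 150), (150, 180), (180, 200)]
workers = [30, 127, 140, 240, 176, 135, 20, 3]

median_wage = grouped_median(wage_classes, workers)
print(round(median_wage, 2))  # 115.77
```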
Median-Comparing groups
• Paper B -58,54,21,51,59,46,65,31,68,41,70, 36
• Paper C – 65,55,26,40,30,74,45,29,85,32,80,39
• Median A=(51+52)/2=51.5
• Median B=(51+54)/2=52.5
• Median C=(40+45)/2=42.5
The median is less susceptible to outliers and skewed data than the mean, making it the ideal
measure of central tendency when the distribution is not symmetrical.
It is impossible to identify the median for categorical nominal data since it cannot be properly
arranged.
To determine the median class, we must calculate the cumulative frequency of each class and
n/2. Then identify the first class whose cumulative frequency is just greater than n/2. This
class is known as the median class.
After determining the median class, use the formula below to calculate the median value.
$\text{Median} = l + \frac{n/2 - cf}{f} \times h$
Where
l = lower limit of the median class,
n = number of observations,
cf = cumulative frequency of the class before the median class,
f = frequency of the median class,
h = class size.
Table 6.1
To get the median height, we must first determine the class intervals and their associated
frequencies.
The given distribution is of the "less than" type, so 140, 145, 150, ..., 165 are upper class
limits. Thus, the classes are below 140, 140-145, 145-150, 150-155, 155-160, and 160-165.
Four girls are below 140. The frequency of the class interval below 140 is therefore 4.
There are 11 girls with heights below 145, and 4 with heights below 140.
Therefore, the frequency of the class interval 140-145 is 11 - 4 = 7. The remaining frequencies
are found in the same way.
Consequently, the frequency distribution table with cumulative frequencies is shown below:
Table 6.2
Class interval   Frequency   Cumulative frequency
Below 140            4               4
140-145              7              11
145-150             18              29
150-155             11              40
155-160              6              46
160-165              5              51
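Here, n = 51, so n/2 = 25.5.
This observation lies in the class interval 145-150, which is called the median class.
Therefore,
Lower limit of the median class, l = 145
Cumulative frequency of the class before the median class, cf = 11
Frequency of the median class, f = 18
Class size, h = 5
We know that the formula to find the median of the grouped data is:
$\text{Median} = l + \frac{n/2 - cf}{f} \times h = 145 + \frac{25.5 - 11}{18} \times 5 = 145 + 4.03$
Median = 149.03.
Therefore, the median height for the given data is 149.03 cm.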
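The height example starts from "less than" cumulative counts, so the class frequencies have to be recovered by subtraction before the median formula is applied. A minimal Python sketch of both steps is given below; the variable names are illustrative, and a class size of 5 is assumed for every class.

```python
# "Less than" cumulative counts for upper limits 140, 145, ..., 165.
upper_limits = [140, 145, 150, 155, 160, 165]
cumulative = [4, 11, 29, 40, 46, 51]
class_size = 5  # assumed width h of every class

# Class frequency = cumulative count minus the previous cumulative count.
class_freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

n = cumulative[-1]
half = n / 2
# Median class: first class whose cumulative frequency reaches n/2.
idx = next(i for i, cf in enumerate(cumulative) if cf >= half)
lower_limit = upper_limits[idx] - class_size
cf_before = cumulative[idx - 1] if idx > 0 else 0

median_height = lower_limit + (half - cf_before) / class_freqs[idx] * class_size
print(class_freqs, round(median_height, 2))
```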
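Formula
For raw data arranged in ascending order, the median is the value of the
$\left(\frac{n+1}{2}\right)$th item when n is odd, and the average of the
$\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items when n is even.
Example
If the weights of sorghum ear heads are 45, 60, 48, 100, 65 gms, calculate the median.
Solution
Here n = 5
First arrange it in ascending order: 45, 48, 60, 65, 100
Median = value of the ((5+1)/2)th item = 3rd item = 60
Example
If the sorghum ear-heads are 5, 48, 60, 65, 65, 100 gms, calculate the median.
Solution
Here n = 6
Median = average of the 3rd and 4th items = (60 + 65)/2 = 62.5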
Step 2: Find n/2.
Step 3: See in the cumulative frequency the value first greater than n/2. The corresponding
class interval is called the median class.
Where,
n: Total frequency
2. In the case of a distribution with open-end intervals, the median can be computed.
2. The median is an estimated value other than any value in the series when there are an even
number of items or a continuous series.
3. It can only be used to calculate mean deviation and cannot be subjected to further
mathematical analysis.
6.7. SUMMARY:
When looking at the distribution of retirement ages (with 11 observations), the median
is the value that falls exactly in the middle, which is 57 years. The mean is more likely to be
affected by outliers and skewed data than the median. When the distribution is not
symmetrical, the median is therefore the preferred measure of central tendency.
In the height example, there are 11 girls with heights below 145 and 4 with heights below
140, so the frequency of the class interval 140-145 is 11 - 4 = 7. Similarly, the frequency
of 145-150 is 29 - 11 = 18, the frequency of 150-155 is 40 - 29 = 11, and the frequency of
155-160 is 46 - 40 = 6. For grouped data, the highest cumulative frequency equals the total
number of items n, and the median class is the class in which the (n/2)th item falls.
Median : The median is the number that appears in the middle of a list of numbers that has
been arranged in either ascending or descending order. The median is often more descriptive
of a data set than the mean.
Cumulative Frequency: The cumulative frequency of a class is obtained by adding its frequency
to the total of the frequencies of all preceding classes in the frequency distribution table.
Because every frequency has been added by the last class, the final cumulative frequency
always equals the total number of observations.
Class Interval : The difference between the upper class limit and the lower class limit is
referred to as the class interval. For instance, for a first class with limits twenty-six and
thirty, the size of the class interval is four, obtained by subtracting twenty-six from thirty.
In a similar manner, the size of the class
interval for the second class is equal to four and is calculated as follows:
Gupta S.P., Statistical Method, Sultan Chand and Sons, New Delhi, 2008.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R. P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.
Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.
Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.
UNIT 7: MODE
CONTENT:
▪ Objectives
Introduction
7.1. Continuous Distribution
7.2. Percentile
7.3. Quartile
7.4. Summary
7.5. Key Words
7.6. Review Questions
7.7. Reference for further reading
OBJECTIVES:
• Concept of mode and its use
INTRODUCTION
The mode of a variable is the value that occurs most frequently, that is, the value of the
variable that corresponds to the greatest frequency of the distribution. In any series, the
value of the item that is repeated the most often is the most typical one. The mode can be
determined simply by examining the data: all that is required is to identify the item with the
highest frequency of repetition.
• Mode: the most frequently recurring value, that is, the value which is repeated the maximum
number of times in the data. (What is the most popular course amongst Shoolini
students? What is the most common car brand?)
• Mode is the value at the point around which items are most heavily concentrated.
If two or more values appear with the same highest frequency, they are each considered modes. A
disadvantage of using the mode as a measure of central tendency is that a data set may have no
mode or multiple modes.
However, the identical set of data will only have a single mean and a single median. The mode
of a data set is often referred to using the term modal. If a data collection has only one value
that happens most frequently, it is referred to as unimodal. Similarly, a data set with two values
that occur most frequently is referred to as bimodal. A set is termed multimodal when it has
more than two values that occur with the same highest frequency. No mathematics are required
when finding the mode of an ungrouped data set; nonetheless, good observation is required.
Consider the subsequent instances:
On a congested roadway, the posted speed limit is 80 kilometres per hour. The following are
the speeds (in kilometres per hour) of ten vehicles that were stopped for exceeding the speed
limit:
There is no need to organise the data, unless you believe that arranging the numbers from
smallest to largest would make it easier to identify the mode. The number 99 appears twice in
the data set presented above, while all other numbers appear just once. Given that 99 appears
most frequently, it is the mode of the data values.
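• If we have X: 5, 8, 10, 11, 15, 18, 20, 22, 25, 30
• Y: 4, 8, 10, 12, 13, 10, 15, 12, 13, 12
• Z: 3, 5, 8, 12, 12, 13, 15, 13, 17, 19
• No mode in series X, since every value occurs only once
• Mode is 12 in series Y, which occurs three times
• Series Z is bimodal with modes 12 and 13, each occurring twice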
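The modes of the three series above can be found by counting how often each value occurs. The Python sketch below uses `collections.Counter`; the helper `modes_of` is hypothetical, and it follows the convention used in the text that a series in which no value repeats has no mode.

```python
from collections import Counter

def modes_of(values):
    """Return all values with the highest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

series_x = [5, 8, 10, 11, 15, 18, 20, 22, 25, 30]
series_y = [4, 8, 10, 12, 13, 10, 15, 12, 13, 12]
series_z = [3, 5, 8, 12, 12, 13, 15, 13, 17, 19]

print(modes_of(series_x))  # []
print(modes_of(series_y))  # [12]
print(modes_of(series_z))  # [12, 13]
```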
When calculating the mode for a grouped frequency distribution, we first determine the modal
class, or the class with the highest frequency. Then, we will calculate the mode using the
following formula.
To compute the mode of a grouped or continuous frequency distribution with equal class
intervals, we can utilise the following steps:
Step 1: Prepare the frequency distribution table with the class intervals in the first column and
the respective frequencies in the second column.
Step 2: Determine the class with the maximum frequency by inspection. This
class is known as the modal class.
Example
Table 7.1
Height (in cm)        125-130  130-135  135-140  140-145  145-150
Number of students       7       14       10       10        9
The maximum frequency in this case is 14 and the corresponding class is 130-135. Therefore,
130-135 is the modal class.
Mode =
Example:
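For the frequency distribution of weights of sorghum ear-heads given in the table below,
calculate the mode.
Table 7.2
Weight class     Frequency
60-80               22
80-100              38
100-120             45   f
120-140             35
140-160             20
Total              160
Solution
Mode = $l + \frac{f_2}{f_1 + f_2} \times C$
Here, the modal class is 100-120, so l = 100, f = 45, $f_1$ = 38 (frequency of the preceding
class), $f_2$ = 35 (frequency of the following class) and C = 20 (class interval).
Mode = $100 + \frac{35}{38 + 35} \times 20$
= 100 + 9.589 = 109.589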
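The result 109.589 for the sorghum example is consistent with the formula Mode = l + f2/(f1 + f2) × C, where f1 and f2 are the frequencies of the classes before and after the modal class. The Python sketch below assumes that formula; other textbooks use l + (f - f1)/(2f - f1 - f2) × C, which gives a different value. The function name `grouped_mode` is illustrative.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data using l + f2 / (f1 + f2) * C (assumed formula)."""
    # Modal class: the class with the highest frequency
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    lower, upper = classes[i]
    f1 = freqs[i - 1] if i > 0 else 0                 # preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # following class
    if f1 + f2 == 0:
        raise ValueError("neighbouring frequencies are both zero")
    return lower + f2 / (f1 + f2) * (upper - lower)

weight_classes = [(60, 80), (80, 100), (100, 120), (120, 140), (140, 160)]
ear_heads = [22, 38, 45, 35, 20]

mode_weight = grouped_mode(weight_classes, ear_heads)
print(round(mode_weight, 3))  # 109.589
```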
• X:0-10,10-20,20-30,30-40 ,40-50
• f:5,8,12,6,4
Median=52.5
• We have X:5,6,7,8,10,11
• f:4,8,12,15,11,4
• 41=3(Median)-2(45)
• 3(Median)=41+90=131
• Median=43.67
7.2. PERCENTILE
The percentile values divide the distribution of the data into 100 equal groups, each
representing 1% of the entire distribution. The xth percentile is the point in the
distribution below which x percent of the values fall. Note that
the median is the 50th percentile.
For raw data, first arrange the n observations in increasing order. Then the xth
percentile is given by
$P_x = \left(\frac{x(n+1)}{100}\right)$th item
Where,
l= lower limit of the percentile class which contains the xth percentile value (x. n /100)
c = class interval
= 35 + 0.75 × (38 - 35)
= 35 + 2.25 = 37.25 kg
7.3. QUARTILE
The quartiles divide the distribution into four equal groups. Three
values, known as the first, second and third quartiles, split the data into these four
parts. The second quartile is the same as the median and the 50th percentile, because it
lies in the middle of the distribution. The lower (first) and upper (third) quartiles mark
the boundaries of the lowest and highest quarters of the data.
Example:
Compute quartiles for the data given below (grains/panicles): 25, 18, 30, 8, 15, 5, 10, 35, 40, 45
Solution
Arranging the data in ascending order: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45
$Q_1 = \left(\frac{n+1}{4}\right)$th item = $\left(\frac{10+1}{4}\right)$th item
= (2.75)th item
= 2nd item + $\frac{3}{4}$ (3rd item - 2nd item)
= 8 + $\frac{3}{4}$ × (10 - 8)
= 8 + 1.5
= 9.5
$Q_3 = 3 \times \left(\frac{n+1}{4}\right)$th item
= 3 × (2.75)th item
= (8.25)th item
= 8th item + $\frac{1}{4}$ (9th item - 8th item)
= 35 + $\frac{1}{4}$ × (40 - 35)
= 35 + 1.25
= 36.25
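The quartile calculation above places the first quartile at the (n+1)/4 = 2.75th position and the third quartile at the 3(n+1)/4 = 8.25th position, interpolating between neighbouring items. A minimal Python sketch of that positional rule follows; the helper names `item_at` and `quartiles_of` are illustrative, and statistical software may use slightly different conventions.

```python
def item_at(ordered, position):
    """Value at a 1-based, possibly fractional position, by linear interpolation."""
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    # e.g. position 2.75 -> 2nd item + 0.75 * (3rd item - 2nd item)
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def quartiles_of(values):
    ordered = sorted(values)
    n = len(ordered)
    return tuple(item_at(ordered, k * (n + 1) / 4) for k in (1, 2, 3))

grains = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
q1, q2, q3 = quartiles_of(grains)
print(q1, q2, q3)  # 9.5 21.5 36.25
```

The second quartile, 21.5, is the median of the ten observations: the average of the 5th item (18) and the 6th item (25).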
7.4. SUMMARY:
The value of a variable that appears most frequently in a data set is referred to as the
mode of the variable. Examining the data is a straightforward method for determining
the mode. Determining the mode of an ungrouped data set does not require
any calculation; however, it does require careful observation.
The following steps allow you to calculate the mode of a grouped or continuous
frequency distribution that has equal class intervals. First, create the
frequency distribution table.
Second, find by inspection the class with the highest frequency; this class is
referred to as the modal class. The formula should then be applied.
The percentile values divide the distribution of the data into a hundred equal
groups, with each group accounting for 1% of the whole. The three quartiles split the
data into four equal parts. Because they both lie in the middle of the distribution,
the median and the second quartile are identical.
Percentile: A percentile is a score that compares a specific score to the scores of the other
members of a group. It displays the proportion of other scores that a given score outperformed.
For instance, if you have a test score of 75 and are placed in the 85th percentile, it signifies that
your score is greater than those of 85% of other test takers.
CONTENT:
▪ Objectives
Introduction
8.1. Definitions
8.2. Objectives of measuring dispersion
8.3. Characteristics of a good measure of dispersion
8.4. Measure of dispersion
8.5. Range
8.6. Interquartile range
8.7. Mean deviation or average deviation
8.8. Standard deviation
8.9. Keywords
8.10. Review questions
8.11. References for further reading
OBJECTIVES:
• Students will learn about dispersion and its use
INTRODUCTION
A measure of central tendency condenses the distribution of a variable into a single number
that can be considered representative. However, this metric alone is insufficient to define a
distribution, as it is possible for two or more distinct distributions to have the same central
value. Conversely, two or more distributions may have identical patterns yet differ in
central tendency. To capture the properties of a distribution adequately, additional
summary measures are therefore required. The measure of dispersion, or measure of
variation, is one such measure.
Dispersion relates to the level of scatter or variation in observations. Typically, the variability
of an observation is measured as its divergence from the mean. A reasonable average of all
such variations is known as the dispersion measure.
"Dispersion is the degree to which particular items differ from one another."— L.R. Connor
The degree to which numerical data tend to disperse around a mean value is referred to as the
variation or dispersion of the data. (Spiegel)
Some measures express the dispersion of data in terms of the distance between selected values.
Examples include range, interquartile range, interpercentile range, etc. Other measures express
the dispersion of observations as the average of their deviations from some central value. These
are sometimes known as second-order averages; examples include mean deviation, standard deviation,
etc.
• For example, if we have 5 items each of value 20, the arithmetic mean is 20 and there is no
dispersion.
● To assess the degree of variation between two or more distributions: Comparing the
dispersions of two or more distributions allows for a comparison of their variability. A
distribution with a smaller dispersion value is more uniform or consistent.
● To facilitate the computation of other statistical measures: Dispersion measures are used
in the computation of numerous important statistical measures, such
as correlation, regression, test statistics, confidence intervals, control limits, etc.
● To serve as the foundation for variation control: The primary purpose of computing a
dispersion measure is to determine whether the given observations are uniform. This
knowledge has multiple applications. According to Spurr and Bonini, "in questions of
health, fluctuations in body temperature, heart rate, and blood pressure are fundamental
diagnostic indicators."
Absolute measurements of dispersion are those that are represented in the same unit as the
variable being measured, such as kilogrammes, rupees, centimetres, or marks.
2. Relative Dispersion
• Relative measures are pure numbers, such as ratios or percentages, which do not depend in any
way on the units of measure.
• A good measure of dispersion should not be unduly affected by sampling fluctuations.
(d) Range
8.6. RANGE
The range of a distribution is the difference between the distribution's largest and smallest
observations. R = L - S, where R represents the range and L and S represent the largest and
smallest observations, respectively.
R is the absolute range measurement. The definition of a relative measure of range, often
known as the coefficient of range, is as follows:
• Range- The difference between highest value and lowest value of a series is called
range.
• R = H.V. - L.V., where H.V. is the highest value and L.V. is the lowest value
• Coefficient of range = (H.V. - L.V.)/(H.V. + L.V.)
• Let us find range and its coefficient from the following data: 22,35,32,45,42,48,39
• Range=48-22=26
• Coefficient =26/70=0.37
• Find Range and coefficient of Range from the following data: X: 3, 4, 5, 6, 7, 8, 9, 10 and
f: 35, 30, 20, 10, 6, 3, 2, 1. The frequencies do not affect the range: Range = 10 - 3 = 7 and
Coefficient of Range = 7/(10 + 3) = 7/13 = 0.54
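The two range examples can be checked with a few lines of Python. This is only a sketch; the helper name `range_and_coefficient` is an assumption introduced for illustration.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a list of observations."""
    highest = max(values)
    lowest = min(values)
    value_range = highest - lowest                   # R = H.V. - L.V.
    coefficient = value_range / (highest + lowest)   # (H.V. - L.V.)/(H.V. + L.V.)
    return value_range, coefficient


r1, c1 = range_and_coefficient([22, 35, 32, 45, 42, 48, 39])
r2, c2 = range_and_coefficient([3, 4, 5, 6, 7, 8, 9, 10])
print(r1, round(c1, 2))
print(r2, round(c2, 2))
```

Only the largest and smallest observations enter the calculation, which is why the range ignores both the frequencies and the pattern of the values in between.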
Demerits
4. It provides no information about the distribution's pattern. There can exist two distributions
with the same range but distinct patterns.
1. It is used to create control charts for monitoring the quality of manufactured products.
2. It is also utilised in the research of fluctuations, such as the price of a product, a patient's
temperature, the amount of rainfall over a specific period, etc.
Interquartile range
Interpercentile Range
Interpercentile range, or simply percentile range, can also be used to deal with the problem of
extreme observations.
This measure eliminates i% of the observations at each end of the distribution and is a range of
the middle (100 – 2i)% of the observations.
Generally, a percentile range equivalent to i = 10, i.e., P90 – P10 is used. Since Q1 = P25 and Q3
= P75, therefore, interquartile range is also a percentile range.
Demerits
Since it is not based on all observations, it is not a trustworthy dispersion measure.
3. Interquartile range is the difference between the values of the third quartile (Q3) and the first quartile (Q1).
10. Q1 = Size of (12+1)/4th item = Size of 3.25th item = Size of 3rd item + 1/4 (size of 4th item - size of 3rd item)
11. = 70 + 1/4 (90 - 70) = 75
12. Similarly, Q3 = Size of 3/4 (12+1)th item = Size of 9.75th item = Size of 9th item + 3/4 (size of
10th item - size of 9th item) = 145 + 3/4 (145 - 145) = 145
15. Find values of Quartile Deviation and its coefficient from the following data:
18. 240,310,370,480,570,600,650,780,1200,1600,2100
21. Q.D.=(1200-370)/2=415
• Find values of Quartile Deviation and its coefficient from the following data:
• Marks: 0-10, 10-20, 20-30, 30-40, 40-50
• Number of Students: 4, 15, 28, 16, 7
• Q.D. = (Q3 - Q1)/2 = (33.44 - 19)/2 = 7.22
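The quartiles behind this result can be reproduced with the usual interpolation formula for grouped data, Q = l + ((kN/4 - c.f.)/f) x h. The sketch below assumes equal class widths and uses function and variable names of our own choosing.

```python
def grouped_quartile(lower_limits, width, freqs, k):
    """k-th quartile of a grouped distribution with equal class width."""
    n = sum(freqs)
    target = k * n / 4                  # kN/4
    cumulative = 0                      # cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:    # quartile class found
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must be positive")


lower_limits = [0, 10, 20, 30, 40]      # classes 0-10, 10-20, ..., 40-50
students = [4, 15, 28, 16, 7]

q1 = grouped_quartile(lower_limits, 10, students, 1)
q3 = grouped_quartile(lower_limits, 10, students, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(quartile_deviation, 2), round(coefficient_qd, 2))
```

With N = 70 the first quartile falls in the class 10-20 and the third in the class 30-40, giving Q1 = 19 and Q3 of about 33.44, in agreement with the quartile deviation of 7.22.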
• Find values of Quartile Deviation and its coefficient from the following data:
• Q.D.=(123.57-83.71)/2=19.93
Mean Deviation from $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} |X_i - \bar{X}|$
The above mean deviation formulas provide an absolute measure of dispersion. The following
are the formulas for relative measure, often known as the coefficient of mean deviation:
Example: Calculate mean deviation from mean and median for the following data of heights
(in inches) of 10 persons.
Solution:
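$\bar{X} = \frac{60 + 62 + 70 + 69 + 63 + 65 + 60 + 68 + 63 + 64}{10} = 64.4$ inches
Four observations (70, 69, 65, 68; sum 272) lie above the mean and six (60, 62, 63, 60, 63, 64; sum 372) lie below it, so
M.D. from $\bar{X} = \frac{1}{10}\,[272 - 372 - (4 - 6)(64.4)] = 2.88$ inches
Also, coefficient of M.D. from $\bar{X} = \frac{2.88}{64.4} = 0.045$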
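As a cross-check, the mean deviation can be computed directly from its definition. The sketch below uses Python's standard `statistics` module; the helper `mean_deviation` is a name introduced here for illustration, and the ten heights are those appearing in the solution.

```python
from statistics import mean, median


def mean_deviation(values, centre):
    """Average absolute deviation of the values from a chosen central value."""
    return sum(abs(x - centre) for x in values) / len(values)


heights = [60, 62, 70, 69, 63, 65, 60, 68, 63, 64]   # inches

md_from_mean = mean_deviation(heights, mean(heights))
md_from_median = mean_deviation(heights, median(heights))
coefficient_from_mean = md_from_mean / mean(heights)

print(round(md_from_mean, 2), round(md_from_median, 2), round(coefficient_from_mean, 3))
```

The same helper handles the deviation from the median asked for in the example, since only the central value passed in changes.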
23. It is less susceptible to extreme observations than the range or standard deviation.
Demerits
1. It cannot be further analysed mathematically. Since mean deviation is the arithmetic mean
of deviations' absolute values, it is not very amenable to algebraic manipulation.
2. This entails the search for a dispersion measure that can be submitted to further
mathematical analysis.
3. As deviations can be obtained from any measure of central tendency, this measure of
dispersion is not well-defined.
This would indicate that the observations are homogeneous. In practice, however, observations
differ from one another. As a measure of dispersion, the positive square root of
the arithmetic mean of the squares of the deviations from the mean is taken.
The standard deviation is represented by the Greek letter σ, sometimes known as "little sigma"
or just "sigma."
Individual series:
The above-mentioned steps are appropriate when the mean $\bar{X}$ is a whole number. If $\bar{X}$ is not
a whole number, the steps below can be used to find the standard deviation.
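Example: Calculate the standard deviation of the weight of luggage that 10 persons are carrying.
Table 8.1
Weights (X)         | 45 | 49 | 55 | 50 | 41  | 44 | 60 | 58 | 53 | 55 | Total: 510
X - $\bar{X}$       | -6 | -2 | 4  | -1 | -10 | -7 | 9  | 7  | 2  | 4  | 0
(X - $\bar{X}$)$^2$ | 36 | 4  | 16 | 1  | 100 | 49 | 81 | 49 | 4  | 16 | 356
$\bar{X} = \frac{510}{10} = 51$ kgs and $\sigma^2 = \frac{356}{10} = 35.6$ kgs$^2$, so $\sigma = \sqrt{35.6} \approx 5.97$ kgs
Coefficient of variation $= \frac{\sigma}{\bar{X}} \times 100$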
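The columns of Table 8.1 can be recomputed step by step in Python. This is a sketch of the direct method only; it treats the ten weights as the whole population (divisor n), as the example does, and the variable names are our own.

```python
from statistics import mean, pstdev

weights = [45, 49, 55, 50, 41, 44, 60, 58, 53, 55]   # kgs

x_bar = mean(weights)                                # arithmetic mean
deviations = [x - x_bar for x in weights]            # X - mean
squared = [d ** 2 for d in deviations]               # (X - mean)^2
variance = sum(squared) / len(weights)               # sigma squared
sigma = variance ** 0.5                              # standard deviation
cv = sigma / x_bar * 100                             # coefficient of variation, %

print(x_bar, sum(deviations), sum(squared), variance, round(sigma, 2), round(cv, 2))
```

The deviations sum to zero and their squares to 356, as in the table, and the result agrees with the population standard deviation returned by `statistics.pstdev`.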
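• Standard deviation can be computed in two ways: 1. by taking deviations of the items from the
actual mean; 2. by taking deviations of the items from an assumed mean.
• Deviations taken from Actual Mean: When deviations are taken from the actual mean, the following
formula is applied: σ = √(∑x²/N), where x = X - Mean and N = number of observations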
• 8 12 13 15 22
• Coefficient of SD=SD/Mean=4.604/14=0.3289
• Standard deviation is the square root of the mean of the squared deviations, or the root mean
square deviation from the mean.
• Find standard deviation by direct method and short cut method for following data:
40,44,54,60,62,64,70,80,90,96
• In short cut method we take assumed mean =64 , compute dx=X-A and dx square=(X-
A) square
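A short sketch comparing the two methods on this data set is given below. It assumes the standard short-cut formula σ = √(∑dx²/N - (∑dx/N)²) with assumed mean A = 64; the function names are introduced here for illustration.

```python
from math import sqrt


def sd_direct(values):
    """Standard deviation from deviations about the actual mean."""
    n = len(values)
    actual_mean = sum(values) / n
    return sqrt(sum((x - actual_mean) ** 2 for x in values) / n)


def sd_shortcut(values, assumed_mean):
    """Standard deviation from deviations dx = X - A about an assumed mean A."""
    n = len(values)
    dx = [x - assumed_mean for x in values]
    return sqrt(sum(d * d for d in dx) / n - (sum(dx) / n) ** 2)


data = [40, 44, 54, 60, 62, 64, 70, 80, 90, 96]
direct = sd_direct(data)
shortcut = sd_shortcut(data, 64)
print(round(direct, 2), round(shortcut, 2))
```

Both methods give the same value (about 17.46) because the correction term (∑dx/N)² removes the effect of choosing an assumed mean that differs from the actual mean of 66.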
• X:5,8,11,12,14,16,18
• Compute dx=X-A for each row and dx square for each row
• Performance: 1 2 3 4 5 6 7
• Marks by P: 46 42 44 40 43 41 45
• Marks by O: 40 38 36 35 39 37 41
• Find out coefficient of variation in the marks awarded by two judges and interpret the
result
• Marks by P
• Mean = 301/ 7 = 43
• Marks by O
• Mean = 266/ 7 = 38
• The average marks obtained by P are higher. Hence his performance is better. The
coefficient of variation is lower in case of P hence he is a more consistent student.
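The comparison can be verified numerically. The sketch below assumes the population standard deviation (divisor N) used elsewhere in this unit; the helper name `coefficient_of_variation` is our own.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Coefficient of variation in percent: (sigma / mean) x 100."""
    return pstdev(values) / mean(values) * 100


marks_p = [46, 42, 44, 40, 43, 41, 45]
marks_o = [40, 38, 36, 35, 39, 37, 41]

cv_p = coefficient_of_variation(marks_p)
cv_o = coefficient_of_variation(marks_o)
print(mean(marks_p), round(cv_p, 2))
print(mean(marks_o), round(cv_o, 2))
```

Both series have the same standard deviation of 2 marks, so the series with the higher mean (P) has the lower coefficient of variation, roughly 4.65% against 5.26%.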
8.8.4. SKEWNESS
A distribution's skewness refers to its asymmetry. The symmetry of a distribution denotes that
for a given departure from the mean, there are an equal number of observations on either side.
If the distribution is asymmetrical or skewed, the frequency curve will have an extended tail to
the left or right. The skewness of a distribution is therefore its departure from symmetry.
Two or more frequency distributions can share the same mean and standard deviation
yet differ in skewness.
Figure 8.1
8.9. SUMMARY
A variable's distribution is condensed into a single number that serves as a measure of its central
tendency. This number can be regarded indicative of the variable. On the other hand, this
measure by itself is not adequate to define a distribution because it is possible for two or more
separate distributions to have the same central value.
On the other hand, it is possible for there to be differences between the central tendency of two
or more examples despite the fact that their distribution patterns are the same. Therefore, the
establishment of additional summary metrics is required in order to fulfil the requirements for
successful property capture of a distribution.
One example of this kind of statistic is the measure of dispersion, often known as the measure
of variation.
The level of scatter or variance in observations is what is meant by the term "dispersion." In
most cases, the degree to which an observation deviates from the mean is used as a proxy for
quantifying the observation's degree of variability. The term "dispersion measure" refers to an
estimate that is based on an acceptable average of all of these deviations.
8.10. KEYWORDS
Averages of second order: The measurements that indicate the spread of data in terms of the
average of deviations of observations from some central value, such as mean deviation,
standard deviation, etc., are termed averages of second order.
Dispersion: The degree to which individual items vary from one another. Some measures
express the dispersion of observations in terms of the distance between the values of selected
observations; examples include range, interquartile range, interpercentile range, etc.
Interquartile Range: An absolute measure of dispersion based on the difference between the
third quartile (Q3) and the first quartile (Q1): Interquartile range = Q3 - Q1
Measure of variation: The measure of the dispersion of the mass of figures in a series around
a mean.
2. "Indeed, averages and measures of variation cover the majority of a practical statistician's
needs, but their interpretation and usage together require a solid understanding of statistical
theory." — Tippet. Examine this assertion using the arithmetic mean and standard deviation.
3. Explain why the standard deviation is a more accurate measure of dispersion than other
methods that have been tried in the past. If there are any, please mention them.
4. Calculate range and its coefficient from the following data: (a)159, 167, 139, 119, 117, 168,
133, 135, 147, 160
(b) Frequency: 4 5 6 3 1 1
5. Find out quartile deviation and its coefficient from the following data:
Frequency: 15 26 12 5 4 3
6. Find out the range of income of (a) middle 50% of workers, (b) middle 80% of the workers
and hence the coefficients of quartile deviation and percentile deviation from the following
data:
No. of workers :5 8 15 20 30 33 35
CONTENT:
▪ Objectives
Introduction
9.1. Concepts of Random Experiments and Random Variables
9.2. Views of Probability: Subjective, Classical
9.3. Types of Probability
9.4. Rules of Probability: Multiplication Rule for Independent and
Dependent
9.5. Random Events and Probability
9.6. Counting techniques and calculation of probabilities
9.7. Conditional Probability
9.8. Summary
9.9. Keywords
9.10. Review Questions
OBJECTIVES:
• The concept of probability
• The rules of probability and where to apply them
• The concept of Counting techniques and calculations of probabilities
INTRODUCTION
Probability is a metric for determining how likely an event is to occur. If an event can occur in
N mutually exclusive and equally likely ways, and if m of these possesses a trait, E, the
probability of the occurrence of E is read as P(E) = m/N
Concept:
Probability is the likelihood or chance that a particular event will or will not occur.
This means that the probability of a certain event is 1 and the probability of an impossible event
is 0. In other words, a probability near 0 indicates that an event is unlikely to occur whereas a
probability near 1 indicates that an event is almost certain to occur.
For example:
Suppose an event is the success of a new product launched. A probability 0.90 indicates that
the new product is likely to be successful whereas a probability of 0.15 indicates that the
product is unlikely to be successful in the market. A probability of 0.50 indicates that the
product is just as likely to be successful as not.
Examples:
(i) Tossing of a fair coin is an experiment and it has two possible outcomes: Head (H) or Tail
(T).
(ii) Rolling a fair die is an experiment and it has six possible outcomes: appearance of 1 or 2
or 3 or 4 or 5 or 6 on the upper most face of a die.
(iii) Drawing a card from a well shuffled pack of playing cards is an experiment and it has 52
possible outcomes.
Events
Examples: (i) If a fair coin is tossed, the outcomes - head or tail are called events.
(ii) If a fair die is rolled, the outcomes 1 or 2 or 3 or 4 or 5 or 6 appearing up are called events.
Exhaustive Events
The total number of possible outcomes of a trial/experiment are called exhaustive events.
In other words, if all the possible outcomes of an experiment are taken into consideration, then
such events are called exhaustive events.
Examples: (i) In case of tossing a die, the set of six possible outcomes, i.e., 1, 2, 3, 4, 5 and 6
are exhaustive events.
(ii) In case of tossing a coin, the set of two outcomes, i.e., H and T are exhaustive events. (iii)
In case of tossing of two dice, the set of possible outcomes are 6 × 6 = 36
Equally-Likely Events
The events are said to be equally-likely if the chance of happening of each event is equal or
same.
In other words, events are said to be equally likely when one does not occur more often than
the others.
Examples: (i) If a fair coin is tossed, the events H and T are equally-likely events.
(ii) If a dice is rolled, any face is as likely to come up as any other face. Hence, the six outcomes
-1 or 2 or 3 or 4 or 5 or 6 appearing up are equally likely events.
Two events are said to be mutually exclusive when they cannot happen simultaneously in a
single trial.
In other words, two events are said to be mutually exclusive when the happening of one
excludes the happening of the other in a single trial.
Examples: (i) In tossing a coin, the events Head and Tail are mutually exclusive because both
cannot happen simultaneously in a single trial. Either head occurs or tail occurs. Both cannot
occur simultaneously. The happening of head excludes the possibility of happening of tail.
(ii) In tossing a dice, the events 1, 2, 3, 4, 5 and 6 are mutually exclusive because all the six
events cannot happen simultaneously in a single trial. If number 1 turns up, all the other five
(i.e., 2, 3, 4, 5, or 6) cannot turn up.
Complementary Events
Examples: (i) In tossing a coin, occurrence of head (H) and tail (T) are complementary events.
(ii) In tossing a dice, occurrence of an even number (2, 4, 6) and odd number (1, 3, 5) are
complementary events
Definition of Probability:
According to Laplace,
“Probability is the ratio of the favourable cases to the total number of equally likely cases”.
For example, if a bag contains 6 green and 4 red balls, then the probability of getting a green
ball will be 6/(4 + 6) = 6/10 because the total number of balls is 10 and the number of green
balls is 6.
P(A) = p = Number of Favorable Cases/ Total Number of Equally Likely Cases =m/n
p+q=1
“If an event can happen in m ways and fails to happen in n ways, then probability of happening
is m/(m + n) and that of its failure to happen is n/(m + n)”.
The importance of probability is clear from the following points: (1) Probability is used in
making economic decisions in situations of risk and uncertainty by sales managers, production
managers, etc.
(2) Probability is used in theory of games which is further used in managerial decisions.
(3) Various sampling tests like Z-test, t-test and F-test are based on the theory of probability.
(4) Probability is the backbone of insurance companies because life tables are based on the
theory of probability. Thus, probability is of immense utility in various fields.
Probability Scale
The following steps are to be followed while calculating the probability of an event:
(3) Divide the number of favourable cases to the event (m) by the total number of equally
likely cases (n).
(4) This will give the probability of an event. Symbolically, Probability of occurrence of
an event E is: P(E) = Number of favourable cases to E/Total number of equally likely
cases = m/n
Example 1:
Solution: When a coin is tossed, there are two possible outcomes - Head or Tail.
P(H) = m/n = ½
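Example 2: What is the probability of getting an even number in a throw of an unbiased die?
Solution: When a die is tossed, there are 6 equally likely cases, i.e., 1, 2, 3, 4, 5, 6. Total
number of equally likely cases = n = 6. The even numbers are 2, 4 and 6, so the number of
favourable cases = m = 3. Required probability = m/n = 3/6 = 1/2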
Example 3:
A bag contains 5 black and 10 white balls. What is the probability of drawing (i) a black ball,
(ii) a white ball ?
(i) P (black ball) = No. of black balls/Total No. of balls = 5/15 = 1/3
(ii) P (white ball) = No. of white balls/Total No. of balls = 10/15 = 2/3
Example 4:
In a lottery, there are 10 prizes and 90 blanks. If a person holds one ticket, what are the chances
of (i) getting a prize (ii) not getting a prize?
Solution: Total No. of tickets = 10 + 90 = 100
(i) Probability of getting a prize: No. of prizes = 10 ∴ No. of favourable cases = 10 Total No. of
cases = 100 Required Probability = 10/100 = 1/10 = 0.1
(ii) Probability of not getting a prize: No. of blanks = 90 ∴ Number of favourable cases = 90
Total No. of cases = 100 Required Probability = 90/100 = 9/10 = 0.9
Example 5:
What is the probability of getting a number greater than 4 with an ordinary dice ?
∴ Number of favorable cases = 2 Total number of cases = 6 Required Probability = 2/6 = 1/3
Example 6:
Find the probability of drawing a face card in a single random draw from a well shuffled pack
of 52 cards.
Solution: There are 52 cards in a pack of cards. Total number of cases = 52 Number of favourable
cases (face cards are the Jack, Queen and King in each of the four suits) = 12
Required probability = 12/52 = 3/13
Example 7: A card is drawn from an ordinary pack of playing cards and a person bets that it
is a spade or an ace. What are the odds against his winning this bet?
Solution: Total number of cases = 52. Since there are 13 spades and 3 other aces (one ace is
already counted among the spades), the number of favourable cases = 13 + 3 = 16.
The probability of winning the bet = 16/52 = 4/13
The probability of losing the bet = 1 - 4/13 = 9/13 Hence, odds against winning the bet = 9/13
: 4/13 = 9 : 4
Example 8: A single letter is selected at random from the word ‘PROBABILITY’. What is the
probability that it is a vowel ?
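Solution: There are 11 letters in the word ‘PROBABILITY’, out of which 1 is to be selected. ∴
Total No. of cases = 11. The vowels are O, A, I and I, so the number of favourable cases = 4.
Required probability = 4/11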
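The classical definition P(E) = m/n used in these examples amounts to counting favourable and total cases. The sketch below applies it to three of the examples using exact fractions; the helper name `classical_probability` is an assumption introduced for illustration.

```python
from fractions import Fraction


def classical_probability(favourable, total):
    """P(E) = favourable cases / total equally likely cases, as an exact fraction."""
    return Fraction(favourable, total)


# Example 5: a number greater than 4 on an ordinary die
die_faces = range(1, 7)
p_greater_than_4 = classical_probability(
    sum(1 for face in die_faces if face > 4), len(die_faces)
)

# Example 6: a face card (Jack, Queen or King in each of 4 suits) from 52 cards
p_face_card = classical_probability(3 * 4, 52)

# Example 8: a vowel chosen from the letters of 'PROBABILITY'
word = "PROBABILITY"
p_vowel = classical_probability(sum(1 for ch in word if ch in "AEIOU"), len(word))

print(p_greater_than_4, p_face_card, p_vowel)
```

Using `Fraction` keeps the answers in the reduced m/n form in which the examples state them.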
Example 10: What is the probability that a leap year selected at random will contain 53
Sundays?
Solution: A leap year consists of 366 days, i.e., 52 full weeks and 2 extra days.
Following may be the 7 possible combinations of these two extra days: (i) Monday and Tuesday
(ii) Tuesday and Wednesday (iii) Wednesday and Thursday (iv) Thursday and Friday (v) Friday
and Saturday (vi) Saturday and Sunday (vii) Sunday and Monday
A selected leap year can have 53 Sundays if one of these two extra days happens to be a Sunday.
Total possible outcomes of 2 days = n = 7 Number of cases having a Sunday = m = 2
Required probability = m/n = 2/7
Example 11: From a bag containing 5 red and 4 black balls, a ball is drawn at random. What
is the probability that it is a red ball?
Solution: Total No. of balls in the bag = 5 + 4 = 9 No. of red balls in the bag = 5 ∴ Probability
of getting a red ball = 5/9
Example 12: What is the probability of getting a king in a draw from a pack of cards ?
Solution: Number of exhaustive cases = n = 52 There are 4 king cards in an ordinary pack. ∴
Number of favorable cases = m = 4 ∴ Probability of getting a king = 4/52 = 1/13
Example 13:
In a single throw with two uniform dice find the probability of throwing (i) Five, (ii) Eight.
Solution.
Exhaustive number of cases in a single throw with two dice is 6² = 36. (i) Sum of ‘5’
can be obtained on the two dice in the following mutually exclusive ways: (1, 4), (4, 1), (2,
3), (3, 2) i.e., 4 cases in all, where the first and second number in the bracket ( ) refer to the
numbers on the 1st and 2nd dice respectively. ∴ Required probability = 4/36 = 1/9
(ii) The cases favourable to the event of getting sum of 8 on two dice are: (2, 6), (6, 2), (3, 5),
(5, 3), (4, 4) i.e., 5 distinct cases in all ∴ Required probability = 5/36
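The 36 equally likely outcomes of two dice are easy to enumerate by machine, which gives an independent check on the counting. This is a minimal sketch; `probability_of_sum` is a name chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): 6 x 6 = 36 outcomes
outcomes = list(product(range(1, 7), repeat=2))


def probability_of_sum(target):
    """Probability that the two faces add up to the target."""
    favourable = [pair for pair in outcomes if sum(pair) == target]
    return Fraction(len(favourable), len(outcomes))


print(len(outcomes), probability_of_sum(5), probability_of_sum(8))
```

Because the pairs are ordered, (1, 4) and (4, 1) are counted as separate cases, exactly as in the solution.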
Example 14:
Four cards are drawn at random from a well shuffled pack of 52 cards. Find the probability that
(i) They are a king, a queen, a jack and an ace.
(vi) There are two cards of clubs and two cards of diamonds
Solution:
Four cards can be drawn from a well shuffled pack of 52 cards in 52C4 ways, which gives the
exhaustive number of cases.
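The two parts of Example 14 that survive in the text can be worked out with combinations. The sketch below is our own working, not the original solution: it assumes part (i) means one card of each of the four ranks and part (vi) means exactly two clubs and two diamonds.

```python
from fractions import Fraction
from math import comb

exhaustive = comb(52, 4)            # 52C4 ways of drawing four cards

# (i) one king, one queen, one jack and one ace: 4 choices for each rank
p_king_queen_jack_ace = Fraction(4 ** 4, exhaustive)

# (vi) two clubs out of 13 and two diamonds out of 13
p_two_clubs_two_diamonds = Fraction(comb(13, 2) * comb(13, 2), exhaustive)

print(exhaustive, p_king_queen_jack_ace, p_two_clubs_two_diamonds)
```

`math.comb(n, r)` returns nCr, so each favourable count is simply a product of independent choices divided by 52C4.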
Example 15:
What is the chance that a non-leap year should have fifty-three Sundays?
Solution.
A non-leap year consists of 365 days, i.e., 52 full weeks and one over-day. A non-leap
year will consist of 53 Sundays if this over-day is a Sunday. This over-day can
be any one of the possible outcomes
(i) Sunday (ii) Monday (iii) Tuesday (iv) Wednesday (v) Thursday (vi) Friday
(vii) Saturday, i.e., 7 outcomes in all. Of these, the number of ways favourable
to the required event, viz., the over-day being Sunday, is 1. ∴ Required
probability = 1/7
Example 16:
A bag contains 6 black and 9 white balls. A person draws out 2 balls. If on
every black ball he gets Rs. 20 and on every white ball Rs. 10, find out his
expectation.
Solution: There may be the following three options for drawing 2 balls: (i)
Both are white, (ii) Both are black, (iii) One is white and other is black.
(i) Both balls are white: P (2W) = p = 9C2/15C2 = 12/35
Expectation = p × m = (12/35) × 10 × 2 = Rs. 6.86
(ii) Both balls are black: P (2B) = p = 6C2/15C2 = 1/7
Expectation = p × m = (1/7) × 20 × 2 = Rs. 5.71
(iii) One ball is white and the other is black: P (1W 1B) = p = (6C1 × 9C1)/15C2 = 18/35
Expectation = p × m = (18/35) × (20 + 10) = Rs. 15.43
Total Expectation = 6.86 + 5.71 + 15.43 = Rs. 28
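The three cases of Example 16 can be generated in a loop over the number of black balls drawn. This is a sketch using exact fractions; the constant and variable names are our own.

```python
from fractions import Fraction
from math import comb

BLACK, WHITE, DRAWN = 6, 9, 2
PAY_BLACK, PAY_WHITE = 20, 10                       # rupees per ball

total_ways = comb(BLACK + WHITE, DRAWN)             # 15C2
expectation = Fraction(0)
case_probabilities = {}

for blacks in range(DRAWN + 1):                     # 0, 1 or 2 black balls
    whites = DRAWN - blacks
    p = Fraction(comb(BLACK, blacks) * comb(WHITE, whites), total_ways)
    payoff = PAY_BLACK * blacks + PAY_WHITE * whites
    case_probabilities[blacks] = p
    expectation += p * payoff                       # p x m for this case

print(case_probabilities, expectation)
```

The three probabilities add up to 1, which is a useful check that every way of drawing two balls has been counted exactly once.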
Example 17:
If it rains, a taxi driver can earn Rs. 1000 per day. If it is fair, he can lose
Rs. 100 per day. If the probability of rain is 0.4, what is his expectation?
Solution:
X1 = 1000
X2 = - 100
P1 = 0.4
P2 = 1 - 0.4 = 0.6
Expectation = P1X1 + P2X2 = 0.4 × 1000 + 0.6 × (- 100) = Rs. 340
Example 18:
A petrol pump dealer sells an average petrol of Rs. 80,000 on a rainy day and an average of
Rs. 95,000 at a clear day. The probability of clear weather is 76% on Tuesday. What will be
the expected sale ?
X1 = 80,000
X2 = 95,000
P1 = 1 - 0.76 = 0.24
P2 = 0.76
Expected sale = P1X1 + P2X2 = 0.24 × 80,000 + 0.76 × 95,000
= Rs. 91,400
Example 19:
A survey conducted over the last 25 years indicated that in 10 years, the winter was mild, in 8
years it was cold and in the remaining 7 it was very cold. A company sells 1000 woolen coats
in a mild year, 1300 in a cold year and 2000 in a very cold year. If a woolen coat costs Rs. 173
and is sold for Rs. 248, find the yearly expected profit of the company.
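Examples 17 to 19 all apply the same rule, E(X) = P1X1 + P2X2 + ... . A minimal sketch follows; the function name `expected_value` is introduced here, and the working for Example 19 is our own, assuming the winter frequencies over the 25 surveyed years are used as probabilities.

```python
from fractions import Fraction


def expected_value(outcomes):
    """Sum of value x probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)


# Example 17: taxi driver, rain with probability 0.4
taxi = expected_value([(1000, Fraction(4, 10)), (-100, Fraction(6, 10))])

# Example 18: petrol sales, clear weather with probability 0.76
petrol = expected_value([(80_000, Fraction(24, 100)), (95_000, Fraction(76, 100))])

# Example 19: coats sold in mild, cold and very cold years (10, 8, 7 out of 25)
coats = expected_value([
    (1000, Fraction(10, 25)),
    (1300, Fraction(8, 25)),
    (2000, Fraction(7, 25)),
])
profit_per_coat = 248 - 173
yearly_profit = coats * profit_per_coat

print(taxi, petrol, coats, yearly_profit)
```

Under these assumptions the company expects to sell 1376 coats a year at a profit of Rs. 75 each.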
Example 20:
A bag contains 20 tickets marked with numbers 1 to 20. One ticket is drawn at random. Find
the probability that it will be a multiple of (i) 2 or 5, (ii) 3 or 5.
One ticket can be drawn out of 20 tickets in 20C1 = 20 ways, which determines the exhaustive
number of cases.
(i) The number of cases favourable to getting the ticket number which is : (a) a multiple of 2
are 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 i.e., 10 cases. (b) a multiple of 5 are 5, 10, 15, 20 i.e., 4
cases. Of these, two cases viz., 10 and 20 are duplicated. Hence the number of distinct cases
favourable to getting a number which is a multiple of 2 or 5 are 10 + 4 – 2 = 12. ∴ Required
probability = 12/20 = 3/5 = 0·6.
(ii) The cases favourable to getting a multiple of 3 are 3, 6, 9, 12, 15, 18 i.e., 6 cases in all and
getting a multiple of 5 are 5, 10, 15, 20 i.e., 4 cases in all. Of these, one case viz., 15 is
duplicated. Hence, the number of distinct cases favourable to getting a multiple of 3 or 5 is 6 +
4 - 1 = 9. ∴ Required probability = 9/20 = 0·45.
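The "count both, then subtract the duplicates" reasoning of Example 20 corresponds to taking the union of two sets. A short sketch, with helper names of our own choosing:

```python
from fractions import Fraction

tickets = range(1, 21)                       # tickets numbered 1 to 20


def multiples(k):
    """Set of ticket numbers that are multiples of k."""
    return {t for t in tickets if t % k == 0}


def probability_multiple_of_either(a, b):
    """P(multiple of a or b); the set union removes duplicated cases."""
    favourable = multiples(a) | multiples(b)
    return Fraction(len(favourable), len(tickets))


p_2_or_5 = probability_multiple_of_either(2, 5)
p_3_or_5 = probability_multiple_of_either(3, 5)
duplicates_2_and_5 = multiples(2) & multiples(5)   # counted twice if simply added
print(p_2_or_5, p_3_or_5, sorted(duplicates_2_and_5))
```

The intersection holds the duplicated cases (10 and 20 for multiples of 2 and 5), which is the quantity subtracted in the solution.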
Sample space is the set of all possible outcomes of a random experiment, while each individual
outcome is a sample point. For instance, the sample space for the random
experiment "counting the number of rainy days in June" consists of the integers 0 through 30. The
results of 'measuring the rainfall depth', 'soil moisture content', or 'wind speed' at a place may
take any nonnegative value. This means that the sample space for these random experiments
comprises all real numbers from 0 to ∞.
A subset of a sample space is referred to as an event. The event could be made up of one
or more sample points (discrete sample space), or it could be a range from the sample space
(continuous sample space). For the experiment "counting the number of rainy days in June",
"the number of rainy days equals 10" is an example of an event. Likewise, in the continuous
sample space of "wind speed" at a place, "wind speed greater than 100 km/h" is an event.
Complement
Sometimes, we want to know the probability that an event will not happen; the
event opposite to the event of interest is called its complementary event. If A is an event, its
complement is denoted by A^C or Ā. Example: The complement of the event
"male" is the event "female".
P(A) + P(A^C) = 1
It is well known that the probability of flipping a fair coin and getting a “tail” is 0.50. If a coin
is flipped 10 times, is there a guarantee that exactly 5 tails will be observed? What if the coin is
flipped 100 times? Or 1000 times? There is no such guarantee, but as the number of flips becomes
larger, the proportion of coin flips that result in tails approaches 0.50.
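This long-run behaviour can be illustrated with a small simulation. The sketch below uses Python's pseudo-random generator as a stand-in for a fair coin; the function name and the fixed seed are assumptions made so that the run is repeatable.

```python
import random


def proportion_of_tails(flips, seed=0):
    """Simulate fair coin flips and return the observed proportion of tails."""
    rng = random.Random(seed)               # seeded so the run can be repeated
    tails = sum(rng.random() < 0.5 for _ in range(flips))
    return tails / flips


for n in (10, 100, 1000, 100_000):
    print(n, proportion_of_tails(n))
```

Small samples can stray well away from 0.50, while the proportion for the largest sample should sit very close to it; the exact figures depend on the seed.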
• Conditional probabilities
• Joint probability
• Rules of probability
Example:
If a subject is selected randomly and found to be female, what is the probability that she has
blood group O? Here the total possible outcomes constitute a subset (females) of the total
number of subjects. This probability is termed the probability of O given F: P(O|F) = 20/50 = 0.40
Example:
Probability of being male and belonging to blood group AB: P(M and AB) = P(M ∩ AB) = 5/100 =
0.05, where ∩ denotes intersection.
Properties
• P(M) + P(F) = 1
Two events A and B are said to be independent if the occurrence of one event has no bearing
on the likelihood that the other event will also occur.
Example:
• To check whether two events are independent, compute the marginal and conditional probabilities
of one of them. If they are equal, the two events are independent; if they are not equal, the two
events are dependent.
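The independence check described here can be sketched for a two-way table of counts. The tables below are hypothetical and were made up for illustration (only the 20 females with blood group O out of 50 females comes from the earlier example); the function name `is_independent` is also an assumption.

```python
from fractions import Fraction


def is_independent(table, row, col):
    """Compare the marginal P(col) with the conditional P(col | row)."""
    total = sum(table.values())
    row_total = sum(n for (r, _), n in table.items() if r == row)
    col_total = sum(n for (_, c), n in table.items() if c == col)
    marginal = Fraction(col_total, total)                   # P(col)
    conditional = Fraction(table[(row, col)], row_total)    # P(col | row)
    return marginal == conditional


# Hypothetical counts of 100 subjects by sex and blood group
table_a = {("M", "O"): 20, ("M", "other"): 30, ("F", "O"): 20, ("F", "other"): 30}
table_b = {("M", "O"): 10, ("M", "other"): 40, ("F", "O"): 20, ("F", "other"): 30}

print(is_independent(table_a, "F", "O"))
print(is_independent(table_b, "F", "O"))
```

In the first table P(O) = 40/100 equals P(O|F) = 20/50, so sex and blood group O are independent; in the second, P(O) = 30/100 differs from P(O|F) = 20/50, so they are dependent.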
Dependent
The two occurrences are said to be dependent if the occurrence of one event affects the
likelihood that the other event will also occur.
Example:
= (120/200) (90/120)
= 0.45
Addition Rule
Assume that there are two events, A and B. Depending on whether they are mutually exclusive,
two different rules apply.
Rule 1: When the events are mutually exclusive, the probability that one or the other occurs is
equal to the sum of the probabilities of the two events:
P (A OR B) = P (A U B) = P(A) + P(B)
Example:
Rule 2: When the events are not mutually exclusive, Rule 2 applies.
Two events that are not mutually exclusive usually overlap, so the probability of the overlap
must be subtracted once:
P (A OR B) = P (A U B) = P(A) + P(B) - P (A ∩ B)
The tree diagram is as follows. If we count the total number of branches at the top of the tree,
we get the total number of possible outcomes for the composite experiment.
Figure 9.1
We can see that there are a total of six branches that represent all the possible outcomes of this
experiment. Tree diagrams can be used for counting the outcomes of any finite number of
composite experiments.
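Listing the branches of such a tree is the same as forming every combination of one outcome from each experiment. The sketch below assumes a first experiment with 2 outcomes and a second with 3, matching the six branches mentioned above; the outcome labels are hypothetical.

```python
from itertools import product

first_experiment = ["a1", "a2"]              # n1 = 2 outcomes
second_experiment = ["b1", "b2", "b3"]       # n2 = 3 outcomes

# Each pair is one complete branch of the tree, from root to tip
branches = list(product(first_experiment, second_experiment))

for branch in branches:
    print(" -> ".join(branch))
print(len(branches))
```

The number of branches equals n1 x n2, and passing further lists to `product` extends the same count to any finite number of experiments.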
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
• In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
• Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful in cases where we have good estimates of these three terms and want to
determine the fourth one.
• Suppose we perceive the effect of some unknown cause and want to compute the probability
of that cause. Bayes' rule then becomes:
P(cause | effect) = P(effect | cause) P(cause) / P(effect)
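As a small numerical sketch, the standard form of Bayes' rule, P(B|A) = P(A|B) P(B) / P(A), can be applied to a cause and its effect. The three input probabilities below are hypothetical values chosen for illustration only.

```python
from fractions import Fraction

def bayes(p_effect_given_cause, p_cause, p_effect):
    """Return P(cause | effect) using Bayes' rule."""
    return p_effect_given_cause * p_cause / p_effect

# Hypothetical inputs (not from the text).
p_effect_given_cause = Fraction(4, 5)   # P(effect | cause) = 0.8
p_cause = Fraction(1, 50)               # P(cause) = 0.02
p_effect = Fraction(1, 10)              # P(effect) = 0.1

p_cause_given_effect = bayes(p_effect_given_cause, p_cause, p_effect)
print(p_cause_given_effect, float(p_cause_given_effect))
```

Observing the effect raises the probability of the cause from 0.02 to 0.16 in this example, which illustrates how new information updates a prior probability.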
9.8. SUMMARY
A set of conditions that are used to observe the behaviour of some variables is referred to as an
experiment. An event is a term that can be used to refer to a subset of a sample space. Either a
single sample point or multiple sample points can make up an event in a discrete sample space;
alternatively, an event can be a range taken from the sample space. Conditional probability is
a term used to describe the likelihood of one particular outcome taking place given the
existence of one or more other occurrences. The joint probability of two events is the
probability that both of those events will take place at the same time.
The probability distribution of the variables that are contained within a subset of a set of random
variables is referred to as the marginal distribution of the subset. If the occurrence of one event
does not in any way affect the likelihood that the other event will also take place, then we say
that the two events, A and B, are independent of one another. To check this, compare the
marginal and conditional probabilities of one of the events: if they are equal, the two events
are independent; if they are not equal, the two events are dependent. For the composite
experiment A1, A2, ..., Am, there are a total of n1 × n2 × ... × nm possible outcomes. For m = 2,
n1 = 2 and n2 = 3, this gives 2 × 3 = 6 outcomes, which can be displayed in a tree diagram. The
method of sampling will determine the total number of possibilities for selecting a random
sample of size k out of n objects. The permutation formula is used when a random
sample of size k is taken from a total of n objects without replacing any of the objects, and
the objects are arranged in a specific order. For example, the number of unique three-digit
numbers that can be formed from a set of four numbers is equal to the number of permutations
of four objects taken three at a time. With the help of Bayes' rule, we are able to compute the
single term P(B|A) in terms of P(A|B), P(B), and P(A) by utilising our prior experience and
knowledge. The Bayes theorem allows the probability prediction of an event to be updated by
observing new information from the real world.
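The permutation count mentioned above can be verified with a short sketch. The particular digits 1 to 4 are an assumption for illustration, and each digit is used at most once in a number.

```python
from itertools import permutations
from math import perm

digits = [1, 2, 3, 4]  # hypothetical set of four digits

# Build every three-digit number that uses three different digits in order.
three_digit_numbers = {100 * a + 10 * b + c for a, b, c in permutations(digits, 3)}

print(len(three_digit_numbers), perm(4, 3))
```

Both counts agree with the permutation formula n! / (n - k)! for n = 4 and k = 3.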
Conditional Probability: The chance of a particular outcome happening given that another
event has also happened is referred to as a conditional probability. It is frequently expressed
as the probability of B given A and is written as P(B|A), where the likelihood of B depends on
whether A has occurred.
CONTENT:
▪ Objectives
10.1 Introduction
OBJECTIVES:
1. To understand the concept of sampling and its importance in statistical research.
3. To explore various sampling methods and techniques used for data collection.
10.1 INTRODUCTION
• A finite subset of the population, selected from it with the objective of investigating its
properties is called a sample and the number of units in the sample is known as the
sample size. Sampling is a tool which enables us to draw conclusions about the
characteristics of the population after studying only those objects or items that are
included in the sample.
• The main objectives of the sampling theory are : (i) To obtain the optimum results, i.e.,
the maximum information about the characteristics of the population with the available
resources at our disposal in terms of time, money and manpower by studying the sample
values only. (ii) To obtain the best possible estimates of the population parameters.
• Census Method. In the census method we resort to 100% inspection of the population
and enumerate each and every unit of the population. In the sample method we inspect
only a selected representative and adequate fraction (finite subset) of the population
and after analysing the results of the sample data we draw conclusions about the
characteristics of the population.
• The census method has its obvious limitations and drawbacks given below :
(i) The complete enumeration of the population requires a lot of time, money, manpower
and administrative personnel. As such this method can be adopted only by the
government and big organisations who have vast resources at their disposal.
(ii) Since the entire population is to be enumerated, the census method is usually very
time consuming. If the population is sufficiently large, then it is possible that the
processing and the analysis of the data might take so much time that when the results
are available they are not of much use because of changed conditions.
• (a) If the information is required about each and every unit of the population, there is
no way but to resort to 100% enumeration.
• We now summarise the merits of the sample method over the census method.
• 1. Speed, i.e., less time. Since only a part of the population is to be inspected and
examined, the sample method results in considerable amount of saving in time and
labour. There is saving in time not only in conducting the sampling enquiry but also in
the processing, editing and analysing the data. This is a very sensitive and important
point for the statistical investigations where the results are urgently and quickly needed.
• 2. Economy, i.e., Reduced Cost of the Enquiry. The sample method is much more
economical than a complete census. In a sample enquiry, there is reduction in the cost
of collection of the information, administration, transport, training and man hours.
Although the labour and the expenses of obtaining information per unit are generally
larger in a sample enquiry than in the census method, the overall expenses of a sample
survey are relatively much less, since only a fraction of the population is to be
enumerated. This is particularly significant in conducting socio-economic surveys in
developing countries with budding economies who cannot afford a complete census
because of lack of finances.
• An obvious and serious drawback of this sampling scheme is that it is highly subjective
in nature, since the selection of the sample depends entirely on the personal
convenience, beliefs, biases and prejudices of the investigator. For example, if in a
socio-economic survey it is desired to study the standard of living of the people in New
Delhi and if the investigator wants to show that the standard has gone down, then he
may include individuals in the samples only from the low income stratum of the society
and exclude the people from the posh colonies like South Extension, Greater Kailash,
Jor Bagh, Chanakyapuri and so on.
• Mixed Sampling
• (i) Simple Random Sampling
• (ii) Stratified Random Sampling
• (iii) Systematic Sampling
• (iv) Multistage Sampling
• (v) Quasi Random Sampling
• (vi) Area Sampling
• (vii) Simple Cluster Sampling
• (viii) Multistage Cluster Sampling
• (ix) Quota Sampling
Simple random sampling (S.R.S.) is the technique in which the sample is so drawn that
each and every unit in the population has an equal and independent chance of being
included in the sample.
If the unit selected in any draw is not replaced in the population before making the next
draw, then it is known as simple random sampling without replacement (srswor) and if
it is replaced back before making the next draw, then the sampling plan is called simple
random sampling with replacement (srswr).
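The two sampling plans can be sketched with Python's standard `random` module. The population of units numbered 1 to 100, the sample size of 10 and the seed are assumptions made for illustration.

```python
import random

def srswor(population, n, rng):
    """Simple random sampling without replacement: no unit can repeat."""
    return rng.sample(population, n)

def srswr(population, n, rng):
    """Simple random sampling with replacement: a unit may be drawn again."""
    return [rng.choice(population) for _ in range(n)]

rng = random.Random(42)            # fixed seed so the run is reproducible
population = list(range(1, 101))   # hypothetical units numbered 1 to 100

sample_wor = srswor(population, 10, rng)
sample_wr = srswr(population, 10, rng)
print(sample_wor)
print(sample_wr)
```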
Selection of a Simple Random Sample. Proper care must be exercised to ensure that
the sample drawn is random and therefore, representative of the population. A random
sample may be selected by : (i) Lottery Method. (ii) Use of Table of Random Numbers.
• Lottery Method. The simplest method of drawing a random sample is the lottery
system. This consists in identifying each and every member or unit of the population
with a distinct number which is recorded on a slip or a card. These slips should be as
homogeneous as possible in shape, size, colour, etc., to avoid the human bias. The lot
of these slips or cards is a kind of miniature of the population for sampling purposes. If
the population is small, then these slips are put in a bag and thoroughly shuffled and
then as many slips as units needed in the sample are drawn one by one, the slips being
thoroughly shuffled after each draw.
• For example, let us suppose that we want to draw a random sample of 10 individuals
from a population of 100 individuals. We assign the numbers 1 to 100, one number to
each individual of the population and prepare 100 identical slips bearing the numbers
from 1 to 100. These slips are then placed in a bag or container and shuffled thoroughly.
Finally, a sample of 10 slips is drawn out one by one.
• The lottery method gives a sample which is quite independent of the properties of the
population. It is one of the best and most commonly used methods of selecting random
samples. It is quite frequently used in the random draw of prizes, in the Tambola games
and so on.
• The most practical and inexpensive method of selecting a random sample consists in
the use of ‘Random Number Tables’, which have been so constructed that each of the
digits 0, 1, 2, …, 9 appears with approximately the same frequency and independently
of each other. If we have to select a sample from a population of size N(≤ 99), then the
numbers can be combined two by two to give pairs from 00 to 99. Similarly if N ≤ 999
or N ≤ 9999 and so on, then combining the digits three by three (or four by four and so
on), we get numbers from 000 to 999 or 0000 to 9999 and so on. Since each of the digits
0, 1, 2, …, 9 occurs with approximately the same frequency and independently of each
other, so does each of the pairs 00 to 99, triplets 000 to 999 or quadruplets 0000 to 9999
and so on.
• The method of drawing a random sample comprises the following steps : (i) Identify
N units in the population with the numbers 1 to N. (ii) Select at random, any page of
the ‘random number table’ and pick up the numbers in any row, column or diagonal at
random. (iii) The population units corresponding to the numbers selected in step (ii)
constitute the random sample.
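The steps above can be sketched as follows. The string of digits stands in for one row of a printed random number table and is hypothetical, as are the population size (50) and sample size (5). The sketch also assumes that numbers outside 1 to N and repeated numbers are skipped, which is a common convention for sampling without replacement but is not stated in the text.

```python
def sample_from_digit_table(digits, population_size, sample_size):
    """Read fixed-width numbers from a digit string and keep valid unit numbers."""
    width = len(str(population_size))   # 2 digits for N <= 99, 3 for N <= 999, ...
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Assumption: skip numbers with no matching unit and units already chosen.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    if len(chosen) < sample_size:
        raise ValueError("not enough usable numbers in the table row")
    return chosen

# Hypothetical row of a random number table, written in pairs for readability.
row = "03 47 43 73 86 36 96 47 36 61 46 98 63 71 62 33 26 16 80 45".replace(" ", "")
selected_units = sample_from_digit_table(row, population_size=50, sample_size=5)
print(selected_units)
```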
• Stratified random sampling involves the following steps : 1. Stratify the given
population into a number of sub-groups or sub-populations known as strata such that :
(a) The units within each stratum (sub-group) are as homogeneous as possible. (b) The
differences between various strata are as marked as possible, i.e., the stratum means
differ as widely as possible. (c) Various strata are non-overlapping. This means each
and every unit in the population belongs to one and only one stratum.
• The criterion used for the stratification of the universe into various strata is known as
stratifying factor. In general, geographical, sociological or economic characteristics
form the basis of stratification of the given population. Some of the commonly used
stratifying factors are age, sex, income, occupation, education level, geographic area,
economic status, etc.
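A minimal sketch of stratified random sampling is given below. It assumes that a simple random sample is drawn from every stratum in proportion to the stratum size (proportional allocation); the text does not prescribe an allocation rule, and the population of 100 units with three income groups is hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Group units into strata, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, members in strata.items():
        # Assumption: proportional allocation, with at least one unit per stratum.
        k = max(1, round(fraction * len(members)))
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(0)
# Hypothetical population: (unit id, income group), income being the stratifying factor.
units = ([(i, "low") for i in range(60)]
         + [(i, "middle") for i in range(60, 90)]
         + [(i, "high") for i in range(90, 100)])

sample = stratified_sample(units, lambda unit: unit[1], 0.1, rng)
for stratum, chosen in sample.items():
    print(stratum, chosen)
```

Because each unit carries exactly one stratum label, the strata are non-overlapping, as condition (c) requires.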
• SYSTEMATIC SAMPLING
• Systematic sampling is a slight variation of the simple random sampling in which only
the first sample unit is selected at random and the remaining units are automatically
selected in a definite sequence at equal spacing from one another.
• This technique of drawing samples is usually recommended if the complete and up-to-
date list of the sampling units, i.e., the frame is available and the units are arranged in
some systematic order such as alphabetical, chronological, geographical order, etc.
• This requires the sampling units in the population to be ordered in such a way that each
item in the population is uniquely identified by its order, for example the names of
persons in a telephone directory, the list of voters, etc.
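Systematic sampling can be sketched as below. The sketch assumes a sampling interval k equal to the frame size divided by the sample size (rounded down) and a random start among the first k units; the frame of 100 numbered units is hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit from the ordered frame."""
    k = len(frame) // n            # sampling interval (assumed rule: k = N // n)
    start = rng.randrange(k)       # only the first unit is chosen at random
    return [frame[start + j * k] for j in range(n)]

rng = random.Random(7)
frame = list(range(1, 101))        # hypothetical ordered list of 100 units
sample = systematic_sample(frame, 10, rng)
print(sample)
```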
• CLUSTER SAMPLING
• In this case the total population is divided, depending on the problem under study, into
some recognisable sub-divisions which are termed as clusters and a simple random
sample of these clusters is drawn.
• We then observe, measure and interview each and every unit in the selected clusters.
• For example, if we are interested in obtaining the income or opinion data in a city, the
whole city may be divided into N different blocks or localities (which determine the
clusters) and a simple random sample of n blocks is drawn. The individuals in the
selected blocks determine the cluster sample.
• In using cluster sampling the following points should be borne in mind : (i) Clusters
should be as small as possible consistent with the cost and limitations of the survey,
and (ii) The number of sampling units in each cluster should be approximately same.
• Thus, cluster sampling is not to be recommended if we are sampling areas in the city
where there are private residential houses, business and industrial complexes, apartment
buildings, etc., with widely varying number of persons or households.
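The city-block example can be sketched in Python. The eight blocks with five households each are hypothetical, and equal cluster sizes are used in line with point (ii) above.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Draw a simple random sample of clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(3)
# Hypothetical city: 8 blocks (clusters) of 5 households each.
clusters = {f"Block {b}": [f"B{b}-H{h}" for h in range(1, 6)] for b in range(1, 9)}

selected = cluster_sample(clusters, 3, rng)
households = [h for members in selected.values() for h in members]
print(sorted(selected))
print(len(households))
```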
• MULTISTAGE SAMPLING
• Instead of enumerating all the sampling units in the selected clusters, one can obtain
better and more efficient estimates by resorting to sub-sampling within the clusters.
• The technique is called two-stage sampling, clusters being termed as primary units and
the units within the clusters as secondary units. The above technique may be generalised
to what is called multistage sampling.
• For example, if we are interested in obtaining a sample of, say, n households from a
particular State, the first stage units may be districts, the second stage units may be
villages in the districts and third stage units will be households in the villages. Each
stage thus results in a reduction of the sample size. Multistage sampling consists in
sampling first stage units by some suitable method of sampling. From among the
selected first stage units, a sub-sample of secondary stage units is drawn by some
suitable method of sampling which may be same as or different from the method used
in selecting first stage units. Further stages may be added to arrive at a sample of the
desired sampling units.
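The districts, villages and households example can be sketched as a three-stage procedure. The sketch assumes simple random sampling at every stage, although the text allows a different method at each stage; the State with 5 districts, 4 villages per district and 20 households per village is hypothetical.

```python
import random

def three_stage_sample(state, n_districts, n_villages, n_households, rng):
    """Sample districts, then villages within them, then households within those."""
    sample = []
    for district in rng.sample(sorted(state), n_districts):            # first stage
        villages = state[district]
        for village in rng.sample(sorted(villages), n_villages):       # second stage
            sample.extend(rng.sample(villages[village], n_households)) # third stage
    return sample

rng = random.Random(11)
# Hypothetical State: 5 districts x 4 villages x 20 households.
state = {
    f"D{d}": {f"V{v}": [f"D{d}-V{v}-H{h}" for h in range(1, 21)] for v in range(1, 5)}
    for d in range(1, 6)
}

households = three_stage_sample(state, n_districts=2, n_villages=2, n_households=5, rng=rng)
print(len(households))
```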
• QUOTA SAMPLING
• In this method, the investigator is told in advance the number of the sample units he is
to examine or enumerate from the stratum assigned to him. In the language of stratified
sampling, the quota of the units to be examined by the investigator from the stratum
assigned to him is fixed for each investigator.
• The sampling quotas may be fixed according to some specified characteristic such as
income group, sex, occupation, political or religious affiliations, etc.
• The choice of the particular units or individuals for investigation is left to the
investigators themselves.
• They are merely given the quotas with the specific instruction to inspect (interview) a
specified number of units (informants) from each stratum.
• Quite often the investigator does not make a random selection of the sample units. He
usually applies his judgment and discretion in the choice of the sample and tries to get
the desired information as quickly as possible. Moreover, in case of non-response from
some of the selected sample units (due to certain reasons like non-availability of the
respondent even after repeated calls by the investigator, or the inability or refusal of the
informant to furnish the requisite information), the investigator selects some fresh units
himself to complete his quota. In doing so, he is likely to include some purposive units
to get the desired information.
• The principle of statistical regularity stresses the following two points : (i)
Large sample size. Logically, it seems that as the sample size increases, the sample is
more likely to reveal the true characteristics of the population and thus provide better
estimates of the parameters. It is known that the reliability of the sample statistic as an
estimate of the population parameter is proportional to the square root of the sample
size n. But due to certain limitations in terms of time, money and manpower, it is not
always possible to take very large samples. Moreover, the effort and cost of drawing
large samples might outweigh the utility of the sample study as against the complete
enumeration (census). (ii) Random selection. The sample should be selected at random
from the population. By random selection we mean a selection in which each and every
unit in the population has an equal chance of being selected in the sample.
• For example, if we are interested in studying the average height of the students in Delhi
University, then it is not desirable to resort to 100% enumeration of the students in the
university. A fairly adequate sample of the students from each college may be selected
at random and the average height of the students selected in the samples may be
computed.
• For example, if a coin is tossed, say, 20 times then nothing can be said with certainty
about the proportion of heads. We may get 0, 1, 2, …, or even all the 20 heads. But if
it is thrown at random a very large number of times, say, 5,000 times, then we may
expect on the average 50% heads and 50% tails.
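The coin-tossing example can be simulated as a rough illustration. The fair coin, the seed and the helper name `proportion_of_heads` are assumptions; the exact proportions will vary from run to run if the seed is changed.

```python
import random

def proportion_of_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(1)
small = proportion_of_heads(20, rng)     # can be far from 0.5
large = proportion_of_heads(5000, rng)   # expected to be close to 0.5
print(small, large)
```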
• If some of the items in a population possess markedly distinct characteristics from the
remaining items, then this tendency would be revealed in the sample values also. Rather
this tendency of persistence will be there even if the population size is increased or even
in the case of large samples.
• For example, if the day's production of any manufacturing unit is increased four times, the
proportion of defectives in the lot remains more or less the same.
• Principle of Validity.
• A sampling design is termed as valid if it enables us to obtain valid tests and estimates
about the population parameters.
• Principle of Optimisation.
• This principle stresses the need of obtaining optimum results in terms of efficiency and
cost of the sampling design with the resources available at our disposal. A measure of
efficiency or reliability of an estimate of the population parameter is provided by the
reciprocal of the standard error of the estimate, and the cost of the design is determined
by the total expenses incurred in terms of money and manpower. This principle aims at
: (i) obtaining a desired level of efficiency at minimum cost and (ii) obtaining maximum
possible efficiency with a given level of cost.
• (i) If a sample survey is not properly planned (or designed) and executed carefully, the
results obtained will not be reliable and quite often might even be misleading. In this
context, it may be worthwhile to quote the words of Frederick F. Stephen : “Samples
are like medicines. They can be harmful when they are taken carelessly or without
knowledge of their effects…. Every good sample should have a proper label with
instructions about its use”. Sampling design must be perfect otherwise it might lead to
serious complications in the final results. The omission of a few units in a complete
census may be immaterial but non-response or incomplete response from even one or
two units in a small sample might have a significant effect on the final result.
• (ii) An efficient sampling scheme requires the services of qualified, skilled and
experienced personnel, better supervision and more sophisticated equipment and
statistical techniques for the planning and execution of the survey and for the collection,
processing and analysis of the sample data. In the absence of these, the results of the
survey may not be reliable.
• (iii) Sometimes the sample survey might require more time, money and labour than a
complete census. This will be so if the sample size is a large proportion of the
population size and if a complicated weighting system is used.
• (iv) Sampling procedure cannot be used if we want to obtain information about each
and every unit of the population. Further, if the population is too heterogeneous, it may
be impossible to use a sampling procedure.
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R. P., Statistics for Business and Economics, Macmillan India, Delhi, 2008.
Jaeger R.M (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Richard I. Levin, Statistics for Management, Pearson Education Asia, New Delhi, 2002.
Sharma J.K., Business Statistics, Pearson Education Asia, New Delhi, 2009.
Stockton and Clark,Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.
Walker H.M. and J. Lev, (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.
CONTENT:
▪ Objectives
Introduction
11.3. Summary
11.4. Keywords
OBJECTIVES:
• Using one or more independent or control variables to explain the variability of the
dependent variable.
INTRODUCTION
Redman provides an example. Imagine you're a sales manager predicting next month's
numbers. Hundreds of factors, from the weather to a competitor's advertising to rumours of a
new model, might affect the number. Your company may have a hunch about what will boost sales.
"Have faith. More rain, more sales." "Sales increase six weeks after a competitor's promotion."
Regression analysis determines which variables have an impact. Which factors matter most?
Which can we ignore? How do they relate to one another? How sure are we of all these factors?
In order to perform a regression analysis, it is necessary to collect data on the relevant variables.
(Reminder: you probably don't need to perform this yourself, but it is helpful to understand the
process your data analyst colleague employs.)
You gather all of your monthly sales figures for, say, the past three years, as well as any data
on the independent variables of interest. Also determine the average monthly precipitation over
the past three years.
Then, all of this information is plotted on a chart that looks like this:
Regression lines are useful in forecasting. Their objective is to describe the relationship
between the dependent variable (the y variable) and one or more independent variables (the
x variables).
Using the equation derived from the regression line, one can predict the future behaviour of the
dependent variables by varying the values of the independent variables.
Figure 10.1
Consider that y is the dependent variable and x is the independent variable. The regression
line for the population is
y = a0 + a1x
where a0 represents the constant, a1 represents the regression coefficient, and x represents the
value of the independent variable.
If you have a random sample of observations, you may estimate the population regression line
by:
Here, 'x' represents the value of the independent variable, while 'y' represents the value
predicted for the dependent variable.
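One common way to estimate the line from a sample is ordinary least squares, sketched below. The choice of least squares, the names `b0` and `b1` for the estimated constant and coefficient, and the rainfall and sales figures are all assumptions made for illustration; they are not taken from the text.

```python
def fit_line(xs, ys):
    """Least squares estimates of the intercept and slope of y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical monthly data: rainfall (x) and sales (y).
rainfall = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

b0, b1 = fit_line(rainfall, sales)
predicted = b0 + b1 * 6     # predicted sales for a rainfall value of 6
print(b0, b1, predicted)
```

The data here lie exactly on a line, so the fit recovers it; with real data the points scatter around the fitted line.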
11.3. SUMMARY
Redman presents an illustration. Imagine you're a sales manager projecting the numbers for the
following month. Hundreds of variables, including the weather, a competitor's advertising, and
rumours of a new model, could alter the number.
Your organisation might have a notion about what will increase sales. "Have faith. More rain,
more sales." "Six weeks following a competitor's promotion, sales climb."
Regression analysis identifies influential variables. Which factors are most crucial? Which can
we disregard? How do they relate to one another? How confident are we in each of these factors?
To do a regression analysis, it is important to collect data on the variables in question.
(Reminder: you likely don't need to execute this yourself, but it is helpful to understand the
process your data analyst colleague uses.) You collect your monthly sales figures for the past,
say, three years, as well as any data on the factors of interest. Additionally, calculate the average
monthly precipitation over the past three years.
11.4. KEYWORDS
Exponential trend: A general expression for an exponential trend is Y = a·b^t, where a and b are
constants.
Least Squares Method: This is one of the most widely used techniques for fitting a mathematical
trend. The fitted trend is deemed the best when the sum of squares of the deviations of the data
from it is minimised.
The Regression Line Y on X: The usual form of the regression line of Y on X is Y_Ci = a + bX_i,
where Y_Ci represents the mean, predicted, or calculated value of Y for a given value of X =
X_i. This line has two constants, denoted by a and b.
The Regression Line X on Y: The usual form of the line of regression of X on Y is X_Ci = c +
dY_i, where X_Ci represents the projected, calculated, or estimated value of X for a given value
of Y = Y_i, and c and d are constants. d is referred to as the regression coefficient of X on Y.
Linear Trend: The equation for the linear trend is Y_t = a + bt, where t represents a time period
such as a year, month, or day, and a and b are constants.
If the coefficient of correlation calculated for bivariate data (X_i, Y_i), i = 1, 2, ..., n, is
sufficiently strong and a cause-and-effect type of relationship is assumed to exist between
them, the next natural step is to determine a functional relationship between these variables. In
statistics, this functional relationship is known as the regression equation.
2. What is your comprehension of linear regression? Why are there two regression lines?
Under what circumstances can there just be one line?
3. Define the regression of Y on X and X on Y given a bivariate data set (X_i, Y_i), where i =
1, 2, ..., n. What values would the correlation coefficient have if the two regression lines
(a) cross at a right angle and (b) coincide?
5. "The regression line provides merely a 'best estimate' of the variable under consideration.
This estimate's degree of uncertainty can be determined by calculating its standard error."
Explain.
6. What is the least squares method? Demonstrate that the two lines of regression obtained
using this method are irreversible unless r = 1. Explain.
CONTENT:
▪ Objectives
Introduction
12.1. Correlation
12.1.1. Definition of correlation
12.1.2. Scope of correlation analysis
12.1.3. Scatter Diagram
12.1.4. Methods of correlation
12.2. Summary
12.3. Keywords
12.4. Review Questions
12.5. References for further reading
OBJECTIVES:
• Students will learn about correlation
INTRODUCTION
Up to this point, we have only discussed distributions in relation to a single attribute. The term
"Univariate Distribution" refers to these kinds of distributions. A bivariate distribution is
obtained by simultaneously observing several units of interest with reference to two attributes.
This creates the conditions necessary to derive the distribution.
Consider, for instance, research that looked at both the heights and weights of college students
at the same time. We are also able to determine the mean, variance, skewness, and other
statistics for each individual attribute based on such data.
In addition to this, when we study bivariate distributions, we are also interested in determining
whether or not there is a relationship between two of the characteristics, or, to put it another
way, the degree to which the two variables, which correspond to the two characteristics, tend
to move together in the same or opposite directions, i.e. the degree to which they are associated.
In other words, we want to know whether or not there is a relationship between the two
characteristics.
The understanding of this kind of relationship is beneficial for making accurate projections
regarding the value of one variable, given the value of the other. In addition
to this, it is useful for understanding and analysing a wide range of economic and commercial
issues. It is important to highlight at this point that statistical relations are not the same as
exact mathematical relations. If we are given a statistical connection
between two variables, X and Y, such as Y = a + bX, then the only value of Y that we can
obtain is the value that we would expect to find on average for a particular value of X. The
investigation of the connections between two or more variables can be broken down into one
of two major classes:
1. To ascertain whether or not the variables are connected in any way and, if
they are, to measure the degree of the correlation between them.
2. Given that there is a correlation between the variables, to find the form of
the connection between them that is the most appropriate.
The first category pertains to this unit's discussion of 'Correlation,' while the second
pertains to 'Regression.'
12.1. CORRELATION
Different authors have provided their different definitions of correlation, which, in general,
imply that it is the degree of link between two or more variables.
"Two or more quantities are said to be correlated if their fluctuations tend to be accompanied
by matching fluctuations in the other or others."— L.R. Connor
Analysis of covariation between two or more variables constitutes correlation. — A.M. Tuttle
When the link is quantitative in nature, correlation is the ideal statistical instrument for
detecting and measuring it and expressing it in a concise formula. — Croxton and Cowden
"The objective of correlation analysis is to establish the 'degree of link' between variables." —
Ya Lun Chou
The correlation coefficient quantifies the degree of relationship between two or more variables.
• For example, the data on height and weights of a group of people would relate to each
member of the group but prices of sugar and sugarcane are two different series
altogether but there would be some relation between the values of the two, prices of
sugar depending upon the prices of sugarcane.
• This technique provides a tool into the hands of decision-makers because it provides
better understanding of the trends and their dependence on other factors so that the
range of uncertainties associated with decision-making is reduced.
• Definition of Correlation:
• The term correlation indicates the relationship between two variables in which with
changes in the value of one variable, the values of the other variable also change.
Correlation has been defined by various eminent statisticians, mathematicians and
economists.
• Some of the important definitions of correlation are given below: (1) According to Ya
Lun Chou, “Correlation analysis attempts to determine the degree of relationship
between variables.”
• (2) As per W. I. King, “Correlation means that between two series or groups of data
there exists some causal connection. ....... If it is proved true that in a large number of
instances two variables tend always to fluctuate in the same or in opposite directions,
we consider that the fact is established and that a relationship exists. This relationship
is called correlation.”
• (3) In the words of L. R. Connor, “If two or more quantities vary in sympathy so that
movements in the one tend to be accompanied by corresponding movements in the
other/others then they are said to be correlated.”
Types of Correlation
• (1) Positive and negative correlation: When the values of the two variables move in the
same direction, i.e., an increase in one is associated with an increase in other, or vice
versa, the correlation is said to be positive. If the values of two variables move in the
opposite directions i.e., an increase in the value of one variable is associated with fall
in other, or vice versa, the correlation is said to be negative. For example, the price and
supply are positively correlated but price and demand are negatively correlated.
• (2) Linear and non-linear correlation: If, in response to a unit change in the value of one
variable, there is a constant change in the value of the other variable, the correlation
between them is said to be linear. This means, the relation between variables fits in Y
= a + bX. But when no constant change in variable is registered for a given unit change
in other variable, non-linear or curvilinear correlation is said to exist.
• (3) Simple, multiple and partial correlation: When the relationship between only two
variables is studied, it is simple correlation. When three or more variables are studied
together, it is called multiple correlation. In partial correlation, more than two
variables are recognized to be involved, but the correlation is studied between only two
of them, with the other variables held constant.
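The difference between linear and non-linear correlation can be checked numerically. The following is a minimal Python sketch, using made-up illustrative values and a hypothetical helper `pearson_r` (neither comes from the text). A series that follows Y = a + bX exactly gives a coefficient of +1 or -1, while a curvilinear series such as Y = X² over values symmetric about zero gives a linear coefficient of 0, even though Y depends entirely on X.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Linear correlation coefficient computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Illustrative values only (assumed, not taken from the text)
x = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 + 3 * v for v in x]     # Y = a + bX with b > 0
falling = [10 - 2 * v for v in x]   # Y = a + bX with b < 0
curved = [v * v for v in x]         # Y = X^2, a curvilinear relation

r_rising = pearson_r(x, rising)
r_falling = pearson_r(x, falling)
r_curved = pearson_r(x, curved)
print(r_rising, r_falling, r_curved)
```

The sign of the coefficient follows the sign of b, which is the positive and negative correlation described in (1); the zero for the curved series shows why a scatter diagram is worth inspecting before relying on the coefficient alone.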
Applications of Correlation
• Some of the important areas where correlation has been used successfully are:
• (1) In the field of genetics: Galton and Pearson developed a method of assessing
correlation which was used in studying many problems of biology and genetics.
• (2) In the field of management: Basically, management is all about making decisions.
The correlation technique places a strong tool in the hands of the manager.
• (3) Other fields of social science: Correlation helps in determining the
interrelationships between different variables. In this way it has immense utility in
various fields in promoting research and opening new frontiers of knowledge.
Conceptual Summary
• Correlation measures the degree of relationship between two or more variables, but it
does not indicate any cause-and-effect relationship between them. If a high degree of
correlation is found to exist between two variables, there must be a reason for such a
close relationship, but the cause-and-effect relation can be revealed only when other
knowledge of the factors involved is brought to bear on the situation. This means that,
to establish a ‘functional relationship’ between two or more variables, one has to go
beyond the confines of statistical analysis to other factors. (A functional relationship
means that two or more factors are interdependent.) A high degree of correlation may
arise in any of the following ways:
1. One of the variables could influence the other: Calculating the correlation coefficient
between the quantity demanded and the related price of tea would only disclose that the
degree of association between them is extremely high. It will not indicate whether the
price of tea influences its demand or vice versa. In order to determine this, further
information beyond the correlation study is required. For instance, if further
information suggests that the price of tea affects its demand, then the price will be the
cause and the quantity will be the effect. The causal variable is called the independent
variable, and the variable it affects is called the dependent variable.
2. The two variables are able to interact: In this instance, there is a cause-and-effect
relationship, but it may be challenging to determine which of the two variables is
independent. For instance, if we have data on the price of wheat and its cost of
production, the correlation between them may be very high, as a higher price of wheat
may encourage farmers to produce more wheat, and an increase in wheat production
may result in a higher cost of production, assuming it is an industry with rising costs.
In addition, the greater production costs may increase the price of wheat. For the
purpose of identifying a relationship between the two variables, we can select any one
of them as the independent variable in such cases.
3. Both variables are influenced by an outside factor: In this scenario, there may be
a strong correlation between the two variables, but no cause-and-effect relationship
exists between them. For instance, the rising incomes of customers may produce a
positive correlation between the demand for commodities X and Y. Such a correlation
is referred to as a spurious or nonsense correlation.
4. The correlation arises by chance: This is another example of spurious correlation.
Given the data on any two variables, it is possible to obtain a high correlation
coefficient while, in reality, the variables have no connection. For instance, a high
correlation coefficient may be obtained between the size of a person's shoes and
their income.
● The coefficient of correlation lies between -1 and +1; it cannot be less than -1 or more
than +1. Symbolically,
● -1 ≤ r ≤ +1, or | r | ≤ 1.
● The correlation coefficient measures only the linear relationship between X and Y. If
X and Y are independent, the correlation coefficient between them will be 0, although a
coefficient of 0 does not by itself prove independence.
● The simplest device for determining relationship between two variables is a special type
of dot chart called scatter diagram. When this method is used the given data are plotted
on a graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot
and thus obtain as many points as the number of observations.
● The more the plotted points “scatter” over a chart, the less relationship there is between
the two variables.
(a) Make a scatter diagram. (b) Do you think that there is any correlation between
the variables X and Y ?
Merits:
• It is not influenced by the size of extreme items whereas most of the mathematical
methods of finding correlation are influenced by extreme items.
• Making a scatter diagram usually is the first step in investigating the relationship
between two variables.
Limitations:
• By applying this method we can get an idea about the direction of correlation and also
whether it is high or low. But we cannot establish the exact degree of correlation
between the variables as is possible by applying the mathematical methods.
2. Graphic technique
This is an extension of the line graph. Here, the values of two or more variables are plotted
as curves on graph paper. If the curves move in the same direction, the correlation is
positive; if they move in opposite directions, the correlation is negative. If there is no clear
pattern in their movements, there is no correlation. Although it is a straightforward
procedure, it provides only an approximate assessment of the nature of the relationship.
Merits
2. The relationship between two variables can be explored without the use of mathematics.
Demerits
1. It is not a mathematical procedure; hence the outcomes are neither exact nor precise.
Karl Pearson's Correlation Coefficient: The correlation coefficient is the most prominent
mathematical approach for calculating correlation. The basis of its calculation is the arithmetic
mean and standard deviation. Correlation coefficient (r), commonly known as the linear
correlation coefficient, quantifies the strength and direction of a linear link between two
variables. The value of r ranges from -1 to 1.
1. The correlation coefficient quantifies the degree of association between two variables.
2. It can be used to calculate the regression coefficient if the standard deviations of the
two variables are known.
Karl Pearson's method, popularly known as the Pearsonian coefficient of correlation, is the
most widely used in practice. The coefficient is denoted by the symbol r, one of the very
few symbols used universally for describing the degree of correlation between two
series. It is calculated as
r = ∑dxdy / √(∑dx² × ∑dy²)
where
dx = X - mean of X
dy = Y - mean of Y
• This method is to be applied only when the deviations of items are taken from actual
means and not from assumed means.
• The value of the coefficient of correlation obtained by the above formula always lies
between -1 and +1.
• The coefficient of correlation describes not only the magnitude of correlation but also
its direction.
• Steps
• (i) Take the deviations of the X series from the mean of X and denote them by dx.
• (ii) Square these deviations and obtain the total, i.e., ∑dx².
• (iii) Take the deviations of the Y series from the mean of Y and denote them by dy.
• (iv) Square these deviations and obtain the total, i.e., ∑dy².
• (v) Multiply each dx by the corresponding dy and obtain the total, i.e., ∑dxdy.
• (vi) Substitute the values of ∑dxdy, ∑dx² and ∑dy² in the above formula.
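The six steps can be sketched in Python as follows. This is only an illustration: it assumes the usual deviation form of the coefficient, r = ∑dxdy / √(∑dx² × ∑dy²), and it uses the first X and Y series listed among the exercise data of this section. The variable names are chosen here for readability and are not part of the text.

```python
from math import sqrt

# First X and Y series from the exercise data in this section
X = [6, 8, 12, 15, 18, 20, 24, 28, 31]
Y = [10, 12, 15, 15, 18, 25, 22, 26, 28]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

dx = [x - mean_x for x in X]                    # step (i): deviations of X from its mean
sum_dx2 = sum(d * d for d in dx)                # step (ii): total of squared dx
dy = [y - mean_y for y in Y]                    # step (iii): deviations of Y from its mean
sum_dy2 = sum(d * d for d in dy)                # step (iv): total of squared dy
sum_dxdy = sum(a * b for a, b in zip(dx, dy))   # step (v): total of products dx * dy
r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)          # step (vi): substitute in the formula

print(sum_dx2, sum_dy2, sum_dxdy, round(r, 4))
```

For this series the means are 18 and 19, the totals are ∑dx² = 598, ∑dy² = 338 and ∑dxdy = 431, and r comes to about 0.96, a high positive correlation.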
• X: 6 8 12 15 18 20 24 28 31
• Y: 10 12 15 15 18 25 22 26 28
• Cost (Rs.) 39 65 62 90 82 75 25 98 36 78
• Sales (Rs.) 47 53 58 86 62 68 60 91 51 84
• Case: A B C D E F G H
• X1: 10 6 9 10 12 13 11 9
• X2: 9 4 6 9 11 13 8 4
• The sums of squares of the deviations of the X and Y series from their means are 136 and
138 respectively, and the sum of the products of the deviations of X and Y from their
means is 122.
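When the three totals are already given, as in the exercise above, only the final substitution remains. A minimal Python sketch, assuming the deviation form r = ∑dxdy / √(∑dx² × ∑dy²):

```python
from math import sqrt

sum_dx2 = 136    # sum of squared deviations of X from its mean
sum_dy2 = 138    # sum of squared deviations of Y from its mean
sum_dxdy = 122   # sum of products of the deviations of X and Y

r = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(round(r, 2))
```

The result is approximately 0.89, which indicates a high degree of positive correlation.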
• X: 78 89 96 69 59 79 68 61
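Spearman's rank correlation is a method for measuring correlation between qualitative
attributes. Such qualities as beauty, honesty, and ability cannot be measured quantitatively,
but they can be ranked. Therefore, ranks are employed to calculate the correlation coefficient:
ρ = 1 - 6∑d² / (n(n² - 1))
• Here, d is the difference between the two ranks of an item and n is the number of items
ranked.
• The closer the ρ value is to 0, the weaker the association between the two sets of ranks.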
• The scores of 9 students in History and Geography are mentioned in the table below.
• Step 2- Start by ranking the two data sets. Data ranking can be achieved by assigning
the ranking “1” to the biggest number in the column, “2” to the second biggest number
and so forth. The smallest value will usually get the lowest ranking. This should be
done for both sets of measurements.
• Step 3- Add a third column d to your data set, where d denotes the difference between
the two ranks.
• For example, if the first student’s physics rank is 3 and the math rank is 5, then
d = 3 - 5 = -2, a rank difference of 2. In the fourth column, square your d values.
• With ∑d² = 12 and n = 9, we get ρ = 1 - 6(12)/(9 × 10 × 8) = 1 - 72/720 = 1 - 1/10 = 0.9
• The Spearman’s Rank Correlation for this data is 0.9. A ρ value this close to +1
indicates a very strong, though not perfect, positive association between the ranks.
• To calculate a Spearman rank-order correlation on data without any ties we will use the
following data:
• ρ = 1 - 6(54)/(10 × 11 × 9) = 1 - 324/990 = 1 - 0.33 = 0.67,
• as n = 10. Hence, we have a ρ (or rs) of 0.67. This indicates a strong positive
relationship between the ranks individuals obtained in the maths and English exams.
That is, the higher you ranked in maths, the higher you ranked in English also, and vice
versa.
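The two worked calculations above can be reproduced with a short Python sketch. It assumes the untied-rank formula ρ = 1 - 6∑d² / (n(n² - 1)), where n(n² - 1) is the same quantity as n(n + 1)(n - 1); the function name `rank_correlation` is chosen here for illustration.

```python
def rank_correlation(sum_d2, n):
    """Spearman's coefficient for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

rho_first = rank_correlation(12, 9)     # first worked example: sum of d^2 = 12, n = 9
rho_second = rank_correlation(54, 10)   # second worked example: sum of d^2 = 54, n = 10
print(round(rho_first, 2), round(rho_second, 2))
```

The rounded values are 0.9 and 0.67, matching the hand calculations.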
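• The Spearman’s correlation coefficient for tied ranks can be calculated using the
formula ρ = 1 - 6[∑d² + ∑(m³ - m)/12] / (n(n² - 1)), where m is the number of items
sharing a tied rank and one correction term (m³ - m)/12 is added for each group of ties
in either series.
• Like the normal Spearman’s rank correlation coefficient, the tied rank coefficient will
have values only between +1 and -1, both included. +1 denotes a perfect positive
correlation, -1 denotes a perfect negative correlation, and 0 indicates no
correlation.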
• The following table gives the data of the marks obtained by 8 students in Commerce
and Mathematics. Compute the rank correlation coefficient.
• Marks in Commerce 15 20 28 12 40 60 20 80
• Marks in Mathematics 40 30 50 30 20 10 30 60
• Hence, we can conclude that the marks in Commerce and Mathematics are not
correlated at all.
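The calculation behind this conclusion can be sketched in Python. The sketch assumes the commonly used tie correction, in which (m³ - m)/12 is added to ∑d² for every group of m tied values in either series, and it gives tied values the average of the ranks they would otherwise occupy. The helper names `average_ranks` and `tie_correction` are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m tied values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

commerce = [15, 20, 28, 12, 40, 60, 20, 80]
mathematics = [40, 30, 50, 30, 20, 10, 30, 60]

rank_c = average_ranks(commerce)
rank_m = average_ranks(mathematics)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_c, rank_m))
correction = tie_correction(commerce) + tie_correction(mathematics)

n = len(commerce)
rho = 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))
print(rank_c, rank_m, sum_d2, correction, rho)
```

Here the two marks of 20 in Commerce share rank 5.5 and the three marks of 30 in Mathematics share rank 5. This gives ∑d² = 81.5 and a correction of 0.5 + 2 = 2.5, so ρ = 1 - 6(84)/504 = 0.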
• Judge I 1 6 5 10 3 2 4 9 7 8
• Judge II 3 5 8 4 7 10 2 1 6 9
• Judge III 6 4 9 8 1 2 3 10 5 7
• Find out by Spearman's Rank Difference Method which pair of judges has a common
taste in respect of beauty.
• X : 53 98 95 81 75 61 59 55
• Y : 47 25 32 37 30 40 39 45
• X : 80 78 75 75 68 67 60 59
• Y : 12 13 14 14 14 16 15 17
• Five competitors in a beauty contest are ranked by three judges in the following order
:
• Rank by Judge A 1 2 3 4 5
• Rank by Judge B 2 4 1 5 3
• Rank by Judge C 1 3 5 2 4
• Using rank correlation coefficient, determine which pair of judges has the nearest
approach to tastes in beauty.
• Since the coefficient of rank correlation is positive and highest for judges A and C,
we conclude that they have the nearest approach to common tastes in beauty. Judges B
and C have very different tastes.
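The pairwise comparison of the three judges can be checked with a short Python sketch. It assumes the untied-rank formula ρ = 1 - 6∑d² / (n(n² - 1)) and uses the ranks given for the five competitors; the function and variable names are illustrative only.

```python
from itertools import combinations

def rank_correlation(ranks_1, ranks_2):
    """Spearman's coefficient for two sets of untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

judges = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 1, 5, 3],
    "C": [1, 3, 5, 2, 4],
}

# Coefficient for every pair of judges
results = {
    (p, q): rank_correlation(judges[p], judges[q])
    for p, q in combinations(judges, 2)
}
for pair, rho in results.items():
    print(pair, round(rho, 2))

closest = max(results, key=results.get)   # pair with the highest coefficient
```

The coefficients are 0.3 for A and B, 0.5 for A and C, and -0.4 for B and C, which agrees with the conclusion above. The same function can be applied to any other set of judges' ranks, provided no ranks are tied.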
• 0.8 = 1 - 6(33) / (N(N+1)(N-1))
• 2/10 = 198 / (N(N+1)(N-1))
• N(N+1)(N-1) = 990
• N = 10
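The last step, finding N from N(N+1)(N-1) = 990, is done by trial. A minimal Python sketch of the same search follows, assuming the given values ρ = 0.8 and ∑d² = 33 from the derivation above; exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

rho = Fraction(8, 10)   # given rank correlation coefficient, 0.8
sum_d2 = 33             # given sum of squared rank differences

# Rearranging rho = 1 - 6*sum_d2 / (N(N+1)(N-1)) gives
# N(N+1)(N-1) = 6*sum_d2 / (1 - rho)
target = 6 * sum_d2 / (1 - rho)

# Try successive whole numbers until the product matches the target
n = next(k for k in range(2, 1000) if k * (k + 1) * (k - 1) == target)
print(target, n)
```

The search stops at N = 10, since 10 × 11 × 9 = 990.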
• Serial numbers 1 2 3 4 5 6 7 8 9 10
• Rank difference -2 -4 -1 +3 +2 0 ? +3 +3 -2
Merits
• This method is simpler to understand and easier to apply than Karl Pearson’s method.
When no value is repeated, i.e., all the items are different, it gives the same answer as
Karl Pearson’s formula applied to the ranks.
• Where the data is of a qualitative nature like honesty, efficiency, intelligence, etc., this
method can be used with great advantage. For example, the workers of two factories
can be ranked in order of efficiency and degree of correlation established by applying
this method.
• This is the only method that can be used where we are given the ranks and not the actual
data.
• Even where actual data are given, rank method can be applied for ascertaining degree
of correlation .
Limitations
• This method cannot be used for finding out correlation in a grouped frequency
distribution.
• Where the number of items exceeds 30, the calculations become tedious and
time-consuming. Therefore, this method should not be applied where N exceeds 30 unless
we are given the ranks and not the actual values of the variable.
• (2) If N is fairly small (say, not larger than 25 or 30), the rank method is sometimes
applied to interval data as an approximation to the more time-consuming r. This requires
that the interval data be converted to rank orders for both variables. If N is much in
excess of 30, the labour required in ranking the scores becomes greater than is justified
by the anticipated saving of time through the rank formula.
12.2. SUMMARY
So far we have discussed only distributions of a single characteristic. These are
"Univariate Distributions." When a number of units are observed with respect to two
characteristics simultaneously, we obtain a bivariate distribution. An example is a study
of the heights and weights of college students.
For each characteristic we may still determine the mean, variance, skewness, and other
statistics. When we study bivariate distributions, however, we also want to know whether
there is a relationship between the two characteristics, that is, the degree to which the
two corresponding variables move together in the same or in opposite directions.
Knowledge of this relationship helps in estimating the value of one variable from the
value of the other. It is also useful for analysing economic and business problems. Note
that statistical relations are not the same as exact mathematical relations. If we have a
statistical relationship between two variables, X and Y, such as Y = a + bX, we can only
get the average value of Y for a given value of X.
12.3. KEYWORDS
Bivariate Distribution: When many units are observed simultaneously with respect to two
properties, a Bivariate Distribution is obtained.
Correlation: Correlation is the statistical tool for detecting and measuring a
quantitative relationship between variables and expressing it in a concise formula.
Correlation analysis: The objective of correlation analysis is to establish the "degree of link"
between variables.
Correlation Coefficient: It quantifies the degree of correlation between two or more variables.
Dots of the diagram: On the graph, each pair of values (Xi, Yi) is represented by a
point. The collection of such points is known as the dots of the diagram.
Scatter Diagram: The bivariate data are represented as (Xi, Yi), where i = 1, 2, ..., n.
Each pair (Xi, Yi) is plotted on a graph in order to judge the extent of the
relationship between variables X and Y. The resulting diagram is known as a Scatter Diagram.
Spearman's Rank Correlation: This is an approximate method for calculating the correlation
between two attributes. In this procedure, the items are ranked according to each of the
two attributes, and the correlation between these ranks is computed.
2. Write down an expression for Karl Pearson's linear correlation coefficient. Why is it called
the linear correlation coefficient? Explain.
3. The correlation between two independent variables is always zero, although the converse is
not always true. Explain the meaning of this statement.
4. Distinguish between Spearman's coefficient of rank correlation and Karl Pearson's
coefficient of correlation. Explain the circumstances in which Spearman's rank correlation
coefficient attains its maximum and minimum values. Under what circumstances do
Spearman's formula and Karl Pearson's formula yield identical results?
Hannagan T.J. (1982), Mastering Statistics, The Macmillan Press Ltd., Surrey.
Hooda R.P. (2008), Statistics for Business and Economics, Macmillan India, Delhi.
Jaeger R.M. (1983), Statistics: A Spectator Sport, Sage Publications India Pvt. Ltd., New Delhi.
Lindgren B.W. (1975), Basic Ideas of Statistics, Macmillan Publishing Co. Inc., New York.
Levin R.I. (2002), Statistics for Management, Pearson Education Asia, New Delhi.
Sharma J.K. (2009), Business Statistics, Pearson Education Asia, New Delhi.
Stockton and Clark, Introduction to Business and Economic Statistics, D.B. Taraporevala Sons
and Co. Private Limited, Bombay.
Walker H.M. and Lev J. (1965), Elementary Statistical Methods, Oxford & IBH Publishing
Co., Calcutta.