Data Analysis and Interpretations Chapter 8
Quantitative data in a raw form, that is, before these data have been processed and analysed,
convey very little meaning to most people. These data therefore need to be processed to make them
useful, that is, to turn them into information. Quantitative analysis techniques such as graphs,
charts and statistics allow us to do this; helping us to explore, present, describe and examine
relationships and trends within our data.
Virtually any business and management research you undertake is likely to involve some
numerical data or contain data that could usefully be quantified to help you answer your research
question(s) and to meet your objectives. Quantitative data refers to all such data and can be a
product of all research strategies. It can range from simple counts such as the frequency of
occurrences to more complex data such as test scores, prices or rental costs. To be useful these
data need to be analysed and interpreted. Quantitative analysis techniques assist you in this
process. They range from creating simple tables or diagrams that show the frequency of
occurrence and using statistics such as indices to enable comparisons, through establishing
statistical relationships between variables to complex statistical modelling.
This chapter builds on the ideas outlined in earlier chapters about data collection. This section
of the chapter introduces the wide range of graphical and statistical techniques for analysing
data. It does not attempt an in-depth discussion of the many issues that need to be considered at
the planning and analysis stages of your research project; that material is left for students as a
reading assignment from the book Research Methods for Business Students, 4th edition (pages
406-462), by Mark Saunders, Philip Lewis and Adrian Thornhill. The issues to be considered
there are concerned with:
preparing, inputting into a computer and checking your data;
choosing the most appropriate tables and diagrams to explore and present your data;
choosing the most appropriate statistics to describe your data;
choosing the most appropriate statistics to examine relationships and trends in your data.
Statistics in research
Students are often intimidated by statistics. This brief overview is intended to place statistics in
context and to provide a reference sheet for those who are trying to interpret statistics that they
read. It does not attempt to show or to explain the mathematics involved. Although it is helpful if
those who use statistics understand the math, the computer age has rendered that understanding
unnecessary for many purposes. Practically speaking, students often simply want to know
whether a particular result is significant, i.e. how likely it is that the obtained result may be
attributable to something other than chance. Computer programs can easily produce numbers that
allow such conclusions, if the student knows which tests to use and has an understanding of what
the numbers mean. This summary is intended to help achieve that understanding.
Statistics can be defined from two perspectives: the definition given in the Concise Oxford
Dictionary, and the definition used in business statistics, which can be summarised as follows:
The emphasis on this course is not on the actual collection of numerical facts (data) but
on their classification, summarisation, display and analysis. These processes are carried
out in order to help us understand the data and make the best use of them.
Any data you use can take a variety of values or belong to various categories, either numerical or
non-numerical. The 'thing' being described by the data is therefore known as a variable. The
values or descriptions you use to measure or categorise this 'thing' are the measurements. These
are of different types, each with its own appropriate scale of measurement which has to be
considered when deciding on suitable methods of graphical display or numerical analysis.
Categorical data
This is generally non-numerical data which is placed into mutually exclusive categories and then counted
rather than measured. People are often categorised by sex or occupation. Cars can be categorised
by make or colour.
Nominal data
The scale of measurement for a variable is nominal if the data describing it are simple labels or
names which cannot be ordered. This is the lowest level of measurement. Even if it is coded
numerically, or takes the form of numbers, these numbers are still only labels. For example, car
registration numbers only serve to identify cars. Numbers on athletes' vests are only nominal and
make no value statements. All nominal data are placed in a limited number of exhaustive
categories and any analysis is carried out on the frequencies within these categories. No other
arithmetic is meaningful.
Ordinal Data
If the exhaustive categories into which the set of data is divided can be placed in a meaningful
order, without any measurement being taken on each case, then it is classed as ordinal data. This
is one level up from nominal. We know that the members of one category are more, or less, than
the members of another but we do not know by how much. For example, the cars can be ordered
as: 'small', 'medium', 'large' without the aid of a tape measure. Degree classifications are only
ordinal. Athletes' results depend on their order of finishing in a race, not on 'how much' separates
their times. Questionnaires are often used to collect opinions using the categories: 'Strongly
agree', 'Agree', 'No opinion', 'Disagree' or 'Strongly disagree'. The responses may be coded as 1,
2, 3, 4 and 5 for the computer but the differences between these numbers are not claimed to be
equal.
Interval data
There are very few examples of genuine interval scales. Temperature in degrees Centigrade
provides one example with the 'zero' on this scale being arbitrary. The difference between 30°C
and 50°C is the same as the difference between 40°C and 60°C but we cannot claim that 60°C is
twice as hot as 30°C. It is therefore interval data but not ratio data. Dates are measured on this
scale as again the zero is arbitrary and not meaningful.
Ratio data
Ratio data must have a meaningful zero as its lowest possible value so, for example, the time
taken for athletes to complete a race would be measured on this scale. Suppose Bill earns £20
000, Ben earns £15 000 and Bob earns £10 000. The intervals of £5000 between Bill and Ben and
also between Ben and Bob genuinely represent equal amounts of money. Also the ratio of Bob's
earnings to Bill's earnings are genuinely in the same ratio, 1 : 2, as the numbers which represent
them since the value of £0 represents 'no money'. This data set is therefore ratio as well as
interval.
Various definitions exist for the distinction between these two types of data. Non-numerical
(nominal) data is always described as qualitative or non-metric, since it describes some quality
but is not measured. Quantitative or metric data, which describes some measurement or
quantity, is always numerical and measured on the interval or ratio scales. All definitions agree
that interval or ratio data are quantitative. Some textbooks, however, use the term qualitative to
refer to words only, whilst others also include nominal or ordinal numbers. Problems of
definition can arise with numbers, such as house numbers, which identify or rank rather than
measure.
The population is the entire group of interest. This is not confined to people, as is usual in the
non-statistical sense. Examples may include such 'things' as all the houses in a local authority
area rather than the people living in them.
It is not usually possible, or not practical, to examine every member of a population, so we use a
sample, a smaller selection taken from that population, to estimate some value or characteristic
of the whole population. Care must be taken when selecting the sample as it must be
representative of the whole population under consideration otherwise it doesn't tell us anything
relevant to that particular population.
Occasionally the whole population is investigated by a census, such as is carried out every ten
years in the British Isles to produce a complete enumeration of the whole population. The data
are gathered from the whole population. A more usual method of collecting information is by a
survey in which only a sample is selected from the population of interest and its data examined.
Examples of this are the Gallup polls produced from a sample of the electorate when attempting
to forecast the result of a general election.
Analysing a sample instead of the whole population has many advantages such as the obvious
saving of both time and money. It is often the only possibility as the collecting of data may
sometimes destroy the article of interest, e.g. the quality control of rolls of film.
The ideal method of sampling is random sampling. By this method every member of the
population has an equal chance of being selected and every selection is independent of all the
others. This ideal is often not achieved for a variety of reasons and many other methods are used.
Descriptive Statistics
If the data available to us cover the whole population of interest, we may describe them or
analyse them in their own right, i.e. we are only interested in the specific group from which the
measurements are taken. The facts and figures usually referred to as 'statistics' in the media are
very often a numerical summary, sometimes accompanied by a graphical display, of this type of
data, e.g. unemployment figures. Much of the data generated by a business will be descriptive in
nature, as will be the majority of sporting statistics. In the next few weeks you will learn how to
display data graphically and summarise it numerically.
Inferential Statistics
Alternatively, we may have available information from only a sample of the whole population of
interest. In this case the best we can do is to analyse it to produce the sample statistics from
which we can infer various values for the parent population. This branch of statistics is usually
referred to as inferential statistics. For example we use the proportion of faulty items in a
sample taken from a production line to estimate the corresponding proportion of all the items.
A descriptive measure from the sample is usually referred to as a sample statistic and the
corresponding measure estimated for the population as a population parameter.
The problem with using samples is that each sample taken would produce a different sample
statistic giving us a different estimate for the population parameter. They cannot all be correct so
a margin of error is generally quoted with any sample estimations. This is particularly important
when forecasting future statistics.
In this chapter you will learn how to draw conclusions about populations from sample statistics
and estimate future values from past data.
Measures of Central Tendency
A measure of central tendency is meant to tell us the “center” of a data set or population. What
do we mean by “center”? That’s an inherently vague question. We might mean a typical value,
or the most common value, or a value that’s in the middle… We need to be more specific.
Mean
The mean is what we usually think of as the average (although “average” can be used to refer to
other measures of central tendency as well). For a sample, the mean is the sum of all
observations divided by the number of observations. Here is the formula for the sample mean:
x̄ = (x₁ + x₂ + … + xₙ)/n = (∑ xᵢ)/n, where the sum runs from i = 1 to n.
For the mean of a population, in principle we do the same thing: add up the values of all
possible observations, and divide by the number of possible observations. But what if there is no
maximum number of observations? Consider rolling a standard 6-sided die. We could roll it an
infinite number of times, so any finite number of rolls is only a sample. What is the population
mean? You have to take each possible value for an outcome (in this case, one through six),
multiply by its frequency as a fraction of all outcomes, and add up the results. Here is the
formula for the population mean:
μ = ∑ x·f(x), where the sum runs over every possible value x in the population X.
(The expression f(x) means the frequency of the value x in the population; it is a number between
0 and 1.) We use the Greek letter mu (μ) to stand for the population mean, which we sometimes
call the "true" mean. For the throw of a 6-sided die, the population mean is 1*(1/6) + 2*(1/6) +
3*(1/6) + 4*(1/6) + 5*(1/6) + 6*(1/6) = 3.5.
In most cases, we don’t actually know the population mean, so we try to estimate it with the
sample mean.
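As a check on the formula, the die's population mean can be computed in a few lines of standard-library Python (a sketch, using exact fractions for the frequencies):

```python
from fractions import Fraction

# Population mean of a fair 6-sided die: mu = sum of x * f(x) over
# all possible outcomes x, where f(x) is the frequency of x in the
# population (here each face has frequency 1/6).
outcomes = [1, 2, 3, 4, 5, 6]
f = {x: Fraction(1, 6) for x in outcomes}

mu = sum(x * f[x] for x in outcomes)
print(float(mu))  # 3.5
```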
The technique we just used to find the population mean is also useful when you’re given sample
data in the form of a frequency table, with each value that occurred alongside the frequency with
which it occurred. E.g.,
Answers to “How Many Times Did You Use the Restroom Today?”
Answer # Subjects
0 1
1 3
2 4
3 8
4 5
5 3
We take each possible answer and multiply by its frequency in the sample. The sample mean
here is [0(1) + 1(3) + 2(4) + 3(8) + 4(5) + 5(3)]/24 = 2.92. (Notice that this is the same as
finding the fractional frequency of each value as # Subjects/24, multiplying by the answer value,
and then adding them up.)
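The same frequency-table arithmetic can be sketched in Python, with the values and counts taken from the restroom table above:

```python
# Sample mean from a frequency table: multiply each value by its
# count, sum, and divide by the total number of observations.
table = {0: 1, 1: 3, 2: 4, 3: 8, 4: 5, 5: 3}  # answer -> # subjects

n = sum(table.values())  # 24 subjects in total
mean = sum(value * count for value, count in table.items()) / n
print(n, round(mean, 2))  # 24 2.92
```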
Median
The median is the value of the observation exactly in the middle of the sample or population,
such that half the observations have a higher value and half have a lower value. (If there is an
even number of observations, then there is no true middle value, so the median is defined as the
mean of the two middle values.)
In the restroom example, the median is 3 (because there are eight observations higher than 3 and
eight observations lower, and the middle eight observations are all the same). What would
happen if we added three more people who went to the restroom 6 times each? The total number
of observations would be 27, so we'd be looking for the 14th highest (and 14th lowest)
observation. It's still 3!
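A short sketch in standard-library Python confirms that the median is unchanged by the extra observations:

```python
import statistics

# Expand the frequency table into raw observations, then take the
# median before and after adding three people with 6 visits each.
table = {0: 1, 1: 3, 2: 4, 3: 8, 4: 5, 5: 3}
data = [value for value, count in table.items() for _ in range(count)]

print(statistics.median(data))  # 3.0 (24 observations; both middle values are 3)
data += [6, 6, 6]               # three more people, 6 visits each
print(statistics.median(data))  # 3 (27 observations; the 14th value)
```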
Mode
The mode is the most common outcome in the sample or population. For example, in a
population with more women than men, the modal sex is female. Mode means the single
most common outcome, not necessarily the majority outcome.
The examples involve nominal data, and that’s where mode is most often useful. But it can be
used with numbers, too. In the restroom frequency table above, the mode is 3. In this example,
the possible outcomes are numerical, but they are also discrete, meaning they take on a countable
number of different values. You can’t use the bathroom one-half a time!
The mode makes less sense with characteristics that are not discrete, but are instead continuous,
meaning the variable can take on an uncountable number of different values. Consider height. If
you measure height precisely enough, it’s difficult to find anyone who is exactly any height you
specify in advance – e.g., 6’0”. Everyone you find will be just slightly above or slightly below
it. The frequency of any precisely defined height is approximately zero! So to define the mode
in cases like this, you need to establish intervals (such as inches of height, which actually
includes an interval of heights rounded off to the nearest whole inch).
Variance: a measure of dispersion in which observations are weighted by the square of their
distance from the mean, as given by the formula s² = ∑(xᵢ − x̄)²/(n − 1), where the sum runs
from i = 1 to n.
Note that this gives greater weight to an observation the further it is from the mean. For
example, suppose the mean is zero. An observation of 4 (or -4) would be weighted four times as
much as an observation of 2 (or -2), despite being only twice as far from the mean.
Standard deviation: the square root of the variance (for both the sample and the population):
s = √s² and σ = √σ².
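Both measures can be sketched in a few lines of Python; the data set here is a small hypothetical sample used only to illustrate the arithmetic:

```python
import statistics

# Sample variance weights each observation by the squared distance
# from the mean and divides by n - 1; the sample standard deviation
# is its square root.
data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
sd = var ** 0.5

print(round(var, 3), round(sd, 3))
assert abs(var - statistics.variance(data)) < 1e-12  # cross-check against stdlib
```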
Quartiles, quintiles and deciles
We use these measures when we want to divide a group or population into a number of equal-
sized subgroups. Quartiles are four equal-sized subgroups; quintiles are five equal-sized
subgroups; deciles are ten equal-sized subgroups. (There are others, but these are the most
common.)
Note that “equal-sized” is with respect to the number of observations or members in each
subgroup, not the size of the group’s interval. For instance, households are often divided into
income quintiles. The top quintile includes the 20% of households that have the highest annual
income. The bottom quintile includes the 20% of households that have the lowest annual
income. These quintiles include equal numbers of households, but they will not correspond to
the same size intervals of incomes.
Percentiles or “Xiles” are used for various purposes, but most often in economics for dividing the
population into income groups. This can be useful for getting a sense of the dispersion of
incomes in the economy. But to see how they can be misleading, notice that the dividing lines
are much like the median: they can be invariant to changes on either side of them. E.g., people
at the top of the top quintile, or the bottom of the bottom quintile, could get richer or poorer
without affecting the quintile dividing lines.
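As an illustration, Python's standard library can compute quintile dividing lines directly; `statistics.quantiles` with `n=5` returns the four cut points between the five groups. The income figures below are hypothetical:

```python
import statistics

# Quintile dividing lines split the ordered observations into five
# equal-sized groups; n=5 returns the four cut points between them.
incomes = [12, 18, 25, 31, 40, 47, 55, 63, 78, 120]  # hypothetical, in $1,000s

cuts = statistics.quantiles(incomes, n=5)
print(cuts)  # four dividing lines, in increasing order
```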
Descriptive statistics: Tabular and Graphical methods
Frequency distribution is a tabular presentation of data, which shows the frequency of the
appearance of data elements in several nonoverlapping classes. The purpose of the frequency
distribution is to organize masses of data elements into smaller and more manageable groups.
The frequency distribution can present both qualitative and quantitative data. Besides, when
summarizing a large set of data it is often useful to classify the data into classes or categories and
to determine the number of individuals belonging to each class, called the class frequency. A
tabular arrangement of data by classes together with the corresponding frequencies is called a
frequency distribution or simply a frequency table. Consider the following definitions:
Class: A grouping of data elements in order to develop a frequency distribution.
Class Width: The length of the class interval. Each class has two limits. The lowest
value is referred to as the lower class limit, and the highest value is the upper class limit.
The difference between the upper and the lower class limits represents the class width.
Class Midpoint: The point in each class that is halfway between the lower and the upper
class limits.
Frequency: The number of observations in a class.
Relative Frequency Distribution: A tabular presentation of a set of data which shows
the frequency of each class as a fraction of the total frequency. The relative frequency
distribution can present both qualitative and quantitative data.
Percent Frequency Distribution: A tabular presentation of a set of data which shows
the percentage of the total number of items in each class. The percent frequency of a
class is simply the relative frequency multiplied by 100.
Cumulative Frequency Distribution: A tabular presentation of a set of quantitative data
which shows for each class the total number of data elements with values less than the
upper class limit.
Cumulative Relative Frequency Distribution: A tabular presentation of a set of
quantitative data which shows for each class the fraction of the total frequency with
values less than the upper class limit.
Cumulative Percent Frequency Distribution: A tabular presentation of a set of
quantitative data which shows for each class the percentage of the total frequency with
values less than the upper class limit.
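The definitions above can be sketched in Python; the observations and class limits below are hypothetical, chosen only to show the arithmetic:

```python
# Group raw observations into classes, then derive the frequency,
# relative, percent, and cumulative frequency distributions.
data = [63, 71, 88, 94, 102, 107, 115, 121, 130, 142]  # hypothetical values
classes = [(60, 89), (90, 119), (120, 149)]             # (lower, upper) class limits

freq = [sum(lo <= x <= hi for x in data) for lo, hi in classes]
n = len(data)
relative = [f / n for f in freq]       # fraction of the total frequency
percent = [100 * r for r in relative]  # relative frequency * 100

running = 0
cumulative = []                        # values <= each upper class limit
for f in freq:
    running += f
    cumulative.append(running)

print(freq, relative, percent, cumulative)
```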
In addition to the above points, it is important to discuss the graphs and charts used with
frequency distributions. Some of the common types of graphs and charts are described below:
1. Bar Graph: A graphical method of presenting qualitative data that have been
summarized in a frequency distribution or a relative frequency distribution.
2. Pie Chart: A graphical device for presenting qualitative data by subdividing a circle into
sectors that correspond to the relative frequency of each class.
3. Dot Plot: A graphical presentation of data, where the horizontal axis shows the range of
data values and each observation is plotted as a dot above the axis.
4. Histogram: A graphical method of presenting a frequency or a relative frequency
distribution.
5. Ogive: A graphical method of presenting a cumulative frequency distribution or a
cumulative relative frequency distribution.
6. Stem-and-Leaf Display: An exploratory data analysis technique (the use of simple
arithmetic and easy-to-draw pictures to look at data more effectively) that simultaneously
rank orders quantitative data and provides insight into the shape of the underlying
distribution.
7. Crosstabulation: A tabular presentation of data for two variables. Rows and columns
show the classes of categories for the two variables.
8. Scatter Diagram: A graphical method of presenting the relationship between two
quantitative variables. One variable is shown on the horizontal and the other on the
vertical axis.
Having discussed these tabular and graphical methods of descriptive statistics, let us now
illustrate each of them with some examples.
Example 1:
A student has completed 20 courses in the School of Accounting and Finance. Her grades in the
20 courses are shown below.
A B A B C
C C B B B
B A B B B
C B C B A
(a) Develop a frequency distribution for her grades.
Answer: To develop a frequency distribution we simply count her grades in each category.
Thus, the frequency distribution of her grades can be presented as
Grade    Frequency
A        4
B        11
C        5
Total    20
(b) Develop a relative frequency distribution for her grades.
Answer: The relative frequency distribution is a distribution that shows the fraction or
proportion of data items that fall in each category. The relative frequency of each category is
its frequency divided by the total number of observations (n = 20):

Grade    Relative Frequency
A        4/20 = 0.20
B        11/20 = 0.55
C        5/20 = 0.25

[Bar graph of the grade frequencies: grades (A, B, C) on the horizontal axis, frequency on the
vertical axis.]
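The counting in parts (a) and (b) can be sketched with Python's `collections.Counter`:

```python
from collections import Counter

# Count the grades to get the frequency distribution, then divide
# each count by the number of courses for the relative frequencies.
grades = list("ABABC" "CCBBB" "BABBB" "CBCBA")  # the 20 grades, row by row

freq = Counter(grades)
n = len(grades)  # 20 courses
relative = {g: freq[g] / n for g in "ABC"}

print(dict(freq))  # {'A': 4, 'B': 11, 'C': 5}
print(relative)    # {'A': 0.2, 'B': 0.55, 'C': 0.25}
```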
Exercise 1:
There are 800 students in the School of Business Administration at UTC. There are four majors
in the school: Accounting, Finance, Management and Marketing. The following shows the
number of students in each major:
Example 2:
In a recent campaign, many airlines reduced their summer fares in order to gain a larger share of
the market. The following data represent the prices of round-trip tickets from Atlanta to Boston
for a sample of nine airlines.
120 140 140
160 160 160
160 180 180
Construct a dot plot for the above data.
Answer: The dot plot is one of the simplest graphical presentations of data. The horizontal axis
shows the range of data values, and each observation is plotted as a dot above the axis. The
four dots shown at the value of 160 indicate that four airlines were charging $160 for the
round-trip ticket from Atlanta to Boston.

[Dot plot: fares on the horizontal axis, with one dot at 120, two at 140, four at 160 and two
at 180.]
Example 3:
A sample of 30 customer waiting times (in seconds) at First County Bank was recorded; the
lowest waiting time was 60 seconds and the highest was 359.
(a) Develop a frequency distribution for these waiting times.
Answer: The first step for developing a frequency distribution is to decide how many classes
are needed. There are no "hard" rules for determining the number of classes; but generally, using
anywhere from five to twenty classes is recommended, depending on the number of
observations. Fewer classes are used when there are fewer observations, and more classes are
used when there are numerous observations. In our case, there are only 30 observations. With
such a limited number of observations, let us use 5 classes. The second step is to determine the
width of each class, using the equation: class width = (highest value − lowest value) / number
of classes. In the above data set, the highest value is 359 and the lowest value is 60. Therefore,
class width = (359 − 60)/5 = 59.8, which we round up to a more convenient 60.
(b) What are the lower and the upper class limits for the first class of the above frequency
distribution?
Answer: The lower class limit shows the smallest value that is included in a class. Therefore,
the lower limit of the first class is 60. The upper class limit identifies the largest value included
in a class. Thus, the upper limit of the first class is 119. (Note: The difference between the
lower limits of adjacent classes provides the class width. Consider the lower class limits of the
first two classes, which are 60 and 120. We note that the class width is 120 - 60 = 60.)
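The class-width arithmetic and the resulting class limits can be sketched in Python:

```python
import math

# Class width = (highest value - lowest value) / number of classes,
# rounded up to a convenient whole number.
high, low, k = 359, 60, 5

width = math.ceil((high - low) / k)  # 59.8 rounds up to 60

# Lower limit of each class, and the matching upper limit one unit
# below the next class's lower limit.
lower_limits = [low + i * width for i in range(k)]
upper_limits = [l + width - 1 for l in lower_limits]
print(width, list(zip(lower_limits, upper_limits)))
```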
(c) Develop a relative frequency distribution and a percent frequency distribution for the
above.
Answer: The relative frequency for each class is determined by the equation:

Relative Frequency of a Class = frequency of the class / n

where n is the total number of observations. The percent frequency distribution is simply the
relative frequencies multiplied by 100. Hence, the relative frequency distribution and the
percent frequency distribution are developed as shown below.

Relative and percent frequency distributions of waiting times at First County Bank

Waiting Times                 Relative     Percent
(Seconds)      Frequency      Frequency    Frequency
60 - 119       6              0.2000       20.00
120 - 179      10             0.3333       33.33
180 - 239      8              0.2667       26.67
240 - 299      4              0.1333       13.33
300 - 359      2              0.0667       6.67
(d) Develop a cumulative frequency distribution for the above data.
Answer: The cumulative frequency distribution shows the number of data elements with values
less than or equal to the upper limit of each class. For instance, the number of people who
waited less than or equal to 179 seconds is 16 (6 + 10), and the number of people who waited
less than or equal to 239 seconds is 24 (6 + 10 + 8). Therefore, the frequency and the cumulative
frequency distributions for the above data will be as follows.
Frequency and cumulative frequency distributions for the waiting times at First County Bank

Waiting Times                 Cumulative
(Seconds)      Frequency      Frequency
60 - 119 6 6
120 - 179 10 16
180 - 239 8 24
240 - 299 4 28
300 - 359 2 30
(e) How many people waited less than or equal to 239 seconds?
Answer: The answer to this question is given in the table of the cumulative frequency. You can
see that 24 people waited less than or equal to 239 seconds.
(f) Develop a cumulative relative frequency distribution and a cumulative percent frequency
distribution.
Answer: The cumulative relative frequency distribution can be developed from the relative
frequency distribution. It is a table that shows the fraction of data elements with values less than
or equal to the upper limit of each class. Using the table of relative frequency, we can develop
the cumulative relative and the cumulative percent frequency distributions as follows:
Relative frequency, cumulative relative frequency and cumulative percent frequency
distributions of waiting times at First County Bank

Waiting Times    Relative     Cumulative Relative    Cumulative Percent
(Seconds)        Frequency    Frequency              Frequency
60 - 119 0.2000 0.2000 20.00
120 - 179 0.3333 0.5333 53.33
180 - 239 0.2667 0.8000 80.00
240 - 299 0.1333 0.9333 93.33
300 - 359 0.0667 1.0000 100.00
NOTE: To develop the cumulative relative frequency distribution, we could have used the
cumulative frequency distribution and divided all the cumulative frequencies by the total number
of observations, that is, 30.
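The note above can be sketched in Python, deriving the cumulative relative and percent frequencies directly from the running totals:

```python
from itertools import accumulate

# Cumulative relative frequencies can be obtained by dividing each
# cumulative frequency by the total number of observations, n = 30.
freq = [6, 10, 8, 4, 2]  # class frequencies for the five waiting-time classes
n = sum(freq)

cum = list(accumulate(freq))                    # [6, 16, 24, 28, 30]
cum_rel = [c / n for c in cum]                  # fractions of the total
cum_pct = [round(100 * c / n, 2) for c in cum]  # percentages

print(cum_rel)
print(cum_pct)  # [20.0, 53.33, 80.0, 93.33, 100.0]
```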
(g) Construct a histogram for the waiting times in the above example.
Answer: One of the most common graphical presentations of data sets is a histogram. We can
construct a histogram by measuring the class intervals on the horizontal axis and the frequencies
on the vertical axis. Then we can plot bars with the widths equal to the class intervals and the
height equivalent to the frequency of the class that they represent. In Figure 2.4, the histogram
of the waiting times is presented. As you note, the width of each bar is equal to the width of the
various classes (60 seconds), and the height represents the frequency of the various classes. Note
that the first class ends at 119; the next class begins at 120, and one unit exists between these two
classes (and all other classes). To eliminate these spaces, the vertical lines are drawn halfway
between the class limits. Thus, the vertical lines are drawn at 59.5, 119.5, 179.5, 239.5, 299.5,
and 359.5.
[Histogram of the waiting times at First County Bank: waiting-time classes (in seconds) on the
horizontal axis, frequency on the vertical axis.]

[Ogive for the cumulative frequency distribution of the waiting times at First County Bank:
waiting times (in seconds) on the horizontal axis, cumulative frequency on the vertical axis.]
Exercise 4:
The following data set shows the number of hours of sick leave that some of the employees of
Bastien's, Inc. have taken during the first quarter of the year (rounded to the nearest hour).
19 22 27 24 28 12
23 47 11 55 25 42
36 25 34 16 45 49
12 20 28 29 21 10
59 39 48 32 40 31
(a) Develop a frequency distribution for the above data. (Let the width of your classes be 10
units and start your first class as 10 - 19.)
(b) Develop a relative frequency distribution and a percent frequency distribution for the data.
(c) Develop a cumulative frequency distribution.
(d) How many employees have taken less than 40 hours of sick leave?
Exercise 5:
The sales record of a real estate company for the month of May shows the following house prices
(rounded to the nearest $1,000). Values are in thousands of dollars.
105 55 45 85 75
30 60 75 79 95
(a) Develop a frequency distribution and a percent frequency distribution for the house prices.
(Use 5 classes and have your first class be 20 - 39.)
(b) Develop a cumulative frequency and a cumulative percent frequency distribution for the
above data.
(c) What percentage of the houses sold at a price below $80,000?
Example 4:
The test scores of 14 individuals on their first statistics examination are shown below.
95 87 52 43 77 84 78
75 63 92 81 83 91 88
a) Construct a stem-and-leaf display for these data.
Answer: To construct a stem-and-leaf display, the first digit of each data item is arranged in an
ascending order and written to the left of a vertical line. Then, the second digit of each data item
is written to the right of the vertical line next to its corresponding first digit as follows.
4 3
5 2
6 3
7 7 8 5
8 7 4 1 3 8
9 5 2 1
Now, the second digits are rank ordered horizontally, thus leading to the following stem-and-leaf
display.
4 3
5 2
6 3
7 5 7 8
8 1 3 4 7 8
9 1 2 5
(b) Explain how the above stem-and-leaf display should be read.
Answer: Each line in the above display is called a stem, and each piece of information on a
stem is a leaf. For instance, let us consider the fourth line:
7 5 7 8
The stem indicates that there are 3 scores in the seventies. These values are
75 77 78
Similarly, we can look at line five (where the first digit is 8) and see
8 1 3 4 7 8
This stem indicates that there are 5 scores in the eighties, and they are
81 83 84 87 88
At a glance, one can see the overall distribution for the grades. There is one score in the forties
(43), one score in the fifties (52), one score in the sixties (63), three scores in the seventies (75,
77, 78), five scores in the eighties (81, 83, 84, 87, 88), and three scores in the nineties (91, 92,
95).
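The construction can be sketched in Python: grouping by the tens digit gives the stems, and sorting each group of units digits gives the ordered leaves.

```python
from collections import defaultdict

# Stem-and-leaf display for two-digit scores: the tens digit is the
# stem, the units digit the leaf; leaves are sorted on each stem.
scores = [95, 87, 52, 43, 77, 84, 78, 75, 63, 92, 81, 83, 91, 88]

stems = defaultdict(list)
for s in scores:
    stems[s // 10].append(s % 10)

display = {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}
for stem, leaves in display.items():
    print(stem, *leaves)
```

Running this reproduces the rank-ordered display above, one stem per line.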
Exercise 6:
Construct a stem-and-leaf display for the following data.
22 44 36 45 49 57 38 47 51 12
18 48 32 19 43 31 26 40 37 52
Example 5:
The following is a crosstabulation of starting salaries (in $1,000s) of a sample of business school
graduates by their gender.

                      Starting Salary
Gender    Less than 20    20 up to 25    25 and more    Total
Female    12              84             24             120
Male      20              48             12             80
Total     32              132            36             200
(a) What general comments can be made about the distribution of starting salaries and the
gender of the individuals in the sample?
Answer: Using the frequency distribution at the bottom margin of the above table, it is noted
that the majority of the individuals in the sample (132) have starting salaries in the range of
$20,000 up to $25,000, followed by 36 individuals whose salaries are at least $25,000; only 32
individuals had starting salaries of under $20,000. Now considering the right-hand margin, it is
noted that the majority of the individuals in the sample (120) are female, while 80 are male.
(b) Compute row percentages and comment on the relationship between starting salaries and
gender.
Answer: To compute the row percentages we divide the values of each cell by the row total and
express the results as percentages. Let us consider the row representing females. The row
percentages (across) are computed as (12/120)(100) = 10%; (84/120)(100) = 70%;
(24/120)(100) = 20%. Continuing in the same manner and computing the row percentages for
the other row, we determine the following row percentages table:

                      Starting Salary
Gender    Less than 20    20 up to 25    25 and more    Total
Female    10.0            70.0           20.0           100.0
Male      25.0            60.0           15.0           100.0
(c) Compute column percentages and comment on the relationship between gender and starting
salaries.
Answer: Column percentages are computed by dividing the value in each cell by the column total and
expressing the result as a percentage. For instance, for the category of "Less than 20" the column
percentages are computed as (12/32)(100) = 37.5% and (20/32)(100) = 62.5%. Continuing in
the same manner, the column percentages (rounded) will be as follows.

                       Starting Salary
Gender    Less than 20    20 up to 25    25 and more
Female    37.5            63.6           66.7
Male      62.5            36.4           33.3

Considering the "Less than 20" category, it is noted that the majority (62.5%) are male. In the
next category of "20 up to 25" the majority (63.6%) are female. Finally, in the last category of
"25 and more" the majority (66.7%) are female.
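As a quick check, the row and column percentages above can be recomputed with plain Python. This is an illustrative sketch only; the dictionary layout and function names are ours, with the cell values taken from the crosstabulation in Example 5:

```python
# Recomputing the row and column percentages for the salary crosstabulation;
# the dictionary layout and function names are illustrative only.
salary = {
    "Female": [12, 84, 24],   # Less than 20, 20 up to 25, 25 and more
    "Male":   [20, 48, 12],
}

def row_percentages(table):
    """Divide each cell by its row total and express the result as a percentage."""
    return {group: [100 * v / sum(row) for v in row] for group, row in table.items()}

def column_percentages(table):
    """Divide each cell by its column total and express the result as a percentage."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {group: [100 * v / t for v, t in zip(row, col_totals)]
            for group, row in table.items()}

print(row_percentages(salary)["Female"])      # [10.0, 70.0, 20.0]
print(column_percentages(salary)["Male"][0])  # 62.5
```

The printed values match the row and column percentage tables computed by hand above.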
Exercise 7:
A survey of 400 college seniors resulted in the following crosstabulation regarding their
undergraduate major and whether or not they plan to go to graduate school.
Undergraduate Major
Yes 35 42 63 140
No 91 104 65 260
Example 6:
The following data show the number of absences (x) and the average grade (y) for a sample of
8 students.

Student    Absences (x)    Average Grade (y)
1          1               94
2          2               78
3          2               70
4          1               88
5          3               68
6          4               40
7          8               30
8          3               60
Develop a scatter diagram for the relationship between the number of absences (x) and their
average grade (y).
Answer: A scatter diagram is a graphical method of presenting the relationship between two
variables. The scatter diagram is shown in Figure 2.6. The number of absences (x) is shown on
the horizontal axis and the average grade (y) on the vertical axis. The first student has one
absence (x=1) and an average grade of 94 (y=94). Therefore, a point with coordinates of x=1
and y=94 is plotted on the scatter diagram. In a similar manner all other points for all 8 students
are plotted.
The scatter diagram shows that there is a negative relationship between the number of absences
and the average grade. That is, the higher the number of absences, the lower the average grade
appears to be.
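As a rough numerical check of this visual impression, the Pearson correlation coefficient for the same data can be computed with only the standard library. A scatter diagram itself could be drawn with matplotlib's plt.scatter(x, y); the helper function below is ours:

```python
# Checking the direction of the absences/grades relationship numerically via
# the Pearson correlation coefficient. A stdlib-only sketch; the helper is ours.
from math import sqrt

x = [1, 2, 2, 1, 3, 4, 8, 3]           # number of absences per student
y = [94, 78, 70, 88, 68, 40, 30, 60]   # average grade per student

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sqrt(sum((a - mx) ** 2 for a in xs))
    sy = sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

r = pearson_r(x, y)
print(round(r, 3))  # -0.915
```

The coefficient is strongly negative (about −0.92), consistent with the downward pattern seen in the scatter diagram.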
Exercise 8:
You are given the following ten observations on two variables, x and y.
x y
1 8
5 15
6 20
4 12
2 10
8 20
9 26
1 5
6 18
8 26
(a) Develop a scatter diagram for the relationship between x and y.
(b) What relationship, if any, appears to exist between x and y?
Inferential Statistics
We have been talking about ways to calculate and describe characteristics about data.
Descriptive statistics tell us information about the distribution of our data, how varied the data
are, and the shape of the data. Now we are also interested in information related to our data
parameters. In other words, we want to know if we have relationships, associations, or
differences within our data and whether statistical significance exists. Inferential statistics help
us make these determinations and allow us to generalize the results to a larger population.
Inferential statistics is defined as the branch of statistics that is used to make inferences about the
characteristics of populations based on sample data. We provide background about parametric
and nonparametric statistics and then show basic inferential statistics that examine associations
among variables and tests of differences between groups.
In the world of statistics, distinctions are made in the types of analyses an evaluator can use
based on distribution assumptions and the level of measurement of the data. For example,
parametric statistics are based on the assumptions of a normal distribution and randomized
sampling that result in interval or ratio data. The statistical tests usually determine the significance
of a difference or of a relationship. These parametric statistical tests commonly include t-tests, Pearson
product-moment correlations, and analyses of variance.
Nonparametric statistics are known as distribution-free tests because they are not based on the
assumptions of the normal probability curve. Nonparametric statistics do not specify conditions
about parameters of the population but assume randomization and are usually applied to nominal
and ordinal data. Several nonparametric tests do exist for interval data, however, for use when the
sample size is small and the assumption of a normal distribution would be violated. The most
common forms of nonparametric tests are chi square analysis, Mann-Whitney U test, the
Wilcoxon matched-pairs signed ranks test, Friedman test, and the Spearman rank-order
correlation coefficient. These non-parametric tests are generally less powerful tests than the
corresponding parametric tests. The table below provides the parametric and nonparametric equivalent
tests used for data analysis. The following sections will discuss these types of tests and the
appropriate parametric and nonparametric choices.
Number of Samples
Different situations require different testing procedures. The following discussion is organized
according to the number of samples that is being evaluated: one sample, two samples, and more
than two samples. In each category, both parametric and nonparametric tests are explained.
Suppose that a manufacturer of car tires claims that the mean life time of a particular type of tire
(measured in miles driven) is 40,000 miles. A consumer organization tests 40 of these tires in
real life circumstances and finds a mean life time of 37,000 miles with a standard deviation of
7500 miles in the life times of the tires in this sample. The question now is: does the test of this
consumer organization indicate that the mean life time of this particular type of tire is
significantly different from the claimed 40,000 miles, or is it not significantly different from
the 40,000 miles claimed by the manufacturer? In this case the difference between the
manufacturer’s claim and the sample result could be explained by statistical fluctuations; taking
a different sample may yield a slightly different result.
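To make the tire question concrete, the gap between the sample result and the claim can be expressed in standard errors. A minimal sketch, assuming the large-sample test statistic z = (x̄ − μ) / (s / √n); the variable names are ours:

```python
# A sketch quantifying the tire question, assuming the large-sample
# z statistic z = (x̄ - μ) / (s / √n); names are ours.
from math import sqrt

x_bar = 37_000   # sample mean life time (miles)
mu = 40_000      # manufacturer's claimed mean
s = 7_500        # sample standard deviation
n = 40           # tires tested

z = (x_bar - mu) / (s / sqrt(n))
print(round(z, 2))  # -2.53
```

The sample mean lies about 2.5 standard errors below the claim; whether that counts as "significantly different" is exactly what the hypothesis tests below are designed to decide.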
To answer questions like these we will study a branch of statistics called hypothesis testing. This
can be done in a variety of different ways, but not all set-ups work well. In this and all subsequent
sections on hypothesis testing we will use the "Classical Approach", which always works. I request
that you do not deviate from this approach and use this set-up for all hypothesis testing problems.
Also a familiar set-up will make it easier for you to work with these rather lengthy problems.
The classical approach is a four step procedure. First, we formulate a working hypothesis called
the null hypothesis. Second, we will formulate a decision rule stating when to accept the null
hypothesis. Third, we will calculate the test statistic, a formula specific for the sort of hypothesis
test we will perform, and compare the value of this test statistic to the decision rule to decide
whether or not to accept the null hypothesis. Finally we formulate an answer statement to the
question we have been given.
Two tailed: The key word for a two tailed test is the mention of the words "significantly
different". In the conclusion we then use words like "is not significantly different from".
One tailed: We have two types of one tailed tests; left tailed and right tailed tests. It pays off to
be systematic.
A left tailed test is performed in case of a problem asking us whether or not the
population mean is significantly less than a stated mean value. The null hypothesis in a
left tailed test states that the population mean is not significantly less than the stated mean
value, and the decision rule is to accept the null hypothesis when the calculated value is
greater than the negative of the table value (for instance, in a z-test: z > −z_c). In the
conclusion we use words like "is not significantly less than" or "is significantly less than".
A right tailed test is performed when the question is something like "is the population
mean significantly more than"? The null hypothesis in case of a right tailed test states that
the population mean is not significantly more than the stated mean value, and the decision
rule is to accept the null hypothesis when the calculated value is less than the table value (for
instance, in a z-test: z < z_c). In the conclusion we use words like "is not significantly more
than" or "is significantly more than".

The t-test is used for hypothesis tests involving a given population mean µ and a sample mean x̄
when the sample size n is less than 30, in symbolic form n < 30. The result of the test will be a
conclusion in which we state that the population mean is, or is not, significantly different from,
or is less than or more than, what is stated.
The test statistic for this type of hypothesis testing is

t = (x̄ − μ) / (s / √n)

and the critical value t_c is to be found from the t-distribution table.
Example:
A manufacturer of strapping tape claims that the tape has a mean breaking strength of 500
pounds per square inch (psi). A random sample of 16 specimens is drawn from a large shipment
of tape and a sample mean breaking strength of 480 psi is computed. The sample standard
deviation is 50 psi. Can we conclude from these data that the mean breaking strength for this
shipment is less than what is claimed by the manufacturer? Use the 0.01 level of significance.
Solution:
The words "less than" in the last sentence of the problem tell us to perform a left tailed test.

Step 1: The Null Hypothesis
The mean breaking strength of the tape in this shipment is not significantly less than 500 psi.

Step 2: The Decision Rule
The decision rule requires the critical value t_c. As outlined in an earlier document called
"Confidence Intervals", the t-distribution table works with the concept of "degrees of freedom",
abbreviated d.f. Degrees of freedom is defined here as the sample size minus one, in symbols
d.f. = n − 1. To find the critical value for this test we use the "One tail" line of the table with
α = 0.01 and d.f. = 16 − 1 = 15, which gives t_c = 2.602. Since we are on the left side of the
mean we have to use a negative value for the critical value.
The decision rule is thus that we accept the null hypothesis if t > −2.602.
Step 3: The Test Statistic
Using t = (x̄ − μ) / (s / √n) we find t = (480 − 500) / (50 / √16) = −1.6.
Combining the info from steps 2 and 3 in a picture with a normal curve, the acceptance region
(shaded) lies to the right of t_c = −2.602, with the calculated value t = −1.6 between t_c and
t = 0.
We see that t is in the shaded region, i.e. in the acceptance region formulated in step 2. Thus we
conclude that at the 0.01 level of significance the mean breaking strength of the tape is not
significantly less than 500 psi. This means that the manufacturer's claim is accepted as correct.
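The four steps of the tape example reduce to a short computation. A sketch in Python; the variable names are ours, and the critical value is the one taken from the t-table above:

```python
# The tape example's four steps reduce to one computation; a sketch
# (variable names are ours; t_c comes from the t-table, alpha = 0.01, d.f. = 15).
from math import sqrt

x_bar, mu, s, n = 480, 500, 50, 16
t = (x_bar - mu) / (s / sqrt(n))   # test statistic t = (x̄ - μ) / (s / √n)
t_c = -2.602                       # negative critical value for a left tailed test

accept_null = t > t_c              # decision rule: accept H0 if t > -2.602
print(t, accept_null)              # -1.6 True
```

Since t = −1.6 is greater than −2.602, the null hypothesis is accepted, matching the conclusion reached above.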
Exercise 1:
Exercise 2:
A city health department wishes to determine if the mean bacteria count per unit volume of water
at Siesta Lake Beach is higher than the safety level of 200. Researchers have collected 10 water
samples and have found the bacteria count per unit volume to be 185, 190, 215, 198, 204, 207,
211, 205, 198 and 210. At the 0.1 level of significance, do the data warrant cause for concern?
A chi-square (χ²) test for goodness of fit is performed when the question is whether or not an
observed pattern or a distribution of numbers is significantly different from an expected pattern
or a distribution of numbers.
It is used to compare observed and expected frequencies within a group in a sample, i.e. whether
the observed results differ from the expected results, with the expected results derived either
from the whole population or from theoretical expectations.
In addition to the universal assumptions, the chi-square goodness of fit test rests on the
assumptions that the categories in the cross tabulation are mutually exclusive and exhaustive,
that the dependent variable is nominal, that no expected frequency is less than 1, and that no
more than 20% of the expected frequencies are less than 5.
The chi-square statistic is looked up in a table of critical values, and the statistic must be larger
than the critical value to reject the null hypothesis. Chi-square values range from 0 into the
hundreds, and higher values indicate a larger difference between the observed and expected
frequencies.
Examples:
The expected or claimed number of cars rented out per category (like small, medium size,
large, SUV etc) versus the actual number of cars rented out per category.
The expected number of customers per two-hour time periods entering a shop vs. the
actually observed number of customers per two-hour time periods.
The nationwide number of ex-convicts arrested again after their release from prison vs.
the number of arrests of ex-convicts observed in a particular city.
We will work these three examples in this chapter. It is essential to start each of the problems for
this test with a table that lists the categories along with their observed and expected frequencies,
looking like this one:

Category    Observed frequency (O)    Expected frequency (E)
We will always have a null hypothesis which states that the observed distribution is not
significantly different from the expected distribution and of course use words relevant to that
particular problem.
The decision rule for this test will always be χ² < χ²_c, where the critical value χ²_c has to be
read from the χ² distribution table. The only two numbers needed to look up this critical value
are the level of significance α and the number of degrees of freedom. The degrees of freedom for
this test will be defined as the number of categories minus 1. This is how we find the critical
value for a particular problem: suppose that we use α = 0.05 and have 5 degrees of freedom (6
categories). Reading the table where the row for 5 degrees of freedom meets the column for
α = 0.05, we find the critical value χ²_c = 11.071.
The test statistic is

χ² = Σ (E − O)² / E

where E and O are the expected and observed frequencies per category. How to find these values
and work out the problems will hopefully become clear when working the examples below.
All submitted work concerning hypothesis testing will have to follow the usual 4 step format.
Example:
A new branch of a large car rental company is to be opened on a sunny island. The general
management of this company from previous experiences expects 25% of the car rental contracts
to be for small cars, 35% for medium size cars, 10% for large cars, 25% for SUV’s and the
remaining 5% for specialty cars such as vans, pick ups etc. The local manager on this island
decides to test whether or not this distribution is actually what they see happening in their office.
Out of 200 randomly sampled car rental contracts they note that 37 are for small cars, 81 for
medium size cars, 14 for large cars, 61 for SUV’s and 7 for specialty cars. Can it, at the 0.01
level of significance be concluded that the general management’s claim is correct?
Solution:
Before we start our usual 4-step hypothesis testing routine we have to collect all the information
in a table. What we really need in the table is a list of the 5 categories of rental cars, the 5
observed frequencies for each of the categories and the 5 expected frequencies. These expected
frequencies we have to calculate from the percentages given but that is easily done. After all, the
general management claims that 25% of all cars rented are small cars. In the sample of 200 cars
this means that 0.25 · 200 = 50 are small cars. Likewise 0.35 · 200 = 70 are medium size cars.
Continuing like this we can collect all the info in the following table:

Category         Observed (O)    Expected (E)
Small            37              50
Medium           81              70
Large            14              20
SUV              61              50
Specialty Car    7               10
Step 1: The Null Hypothesis
The observed distribution of the local manager is not significantly different from the expected
distribution given by the general management of this company.

Step 2: The Decision Rule
Since there are 5 categories of rental cars, the number of degrees of freedom is
d.f. = 5 − 1 = 4. Using the χ² distribution table as explained above, we find a critical value of
χ²_c = 13.28. Using the form of the decision rule outlined above, we state that we accept the
null hypothesis if χ² < 13.28.
Step 3: The Test Statistic
Using χ² = Σ (E − O)² / E we find

χ² = (50 − 37)²/50 + (70 − 81)²/70 + (20 − 14)²/20 + (50 − 61)²/50 + (10 − 7)²/10
   = 3.38 + 1.73 + 1.80 + 2.42 + 0.90 = 10.23
Combining the info from steps 2 and 3 in a picture with a χ² distribution, the acceptance region
(shaded) lies to the left of χ²_c = 13.28. We see that χ² = 10.23 is in the shaded region, that is,
it is in the acceptance region formulated in step 2.
Thus we conclude that at the 0.01 level of significance the observed distribution is not
significantly different from the expected distribution as stated by the general management of the
car rental company.
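The same chi-square computation can be scripted. A sketch with the observed counts and expected percentages taken from the problem statement; the variable names are ours:

```python
# The chi-square goodness-of-fit computation for the car rental example;
# a sketch with counts and percentages taken from the problem statement.
observed = {"Small": 37, "Medium": 81, "Large": 14, "SUV": 61, "Specialty": 7}
shares = {"Small": 0.25, "Medium": 0.35, "Large": 0.10, "SUV": 0.25, "Specialty": 0.05}

n = sum(observed.values())                    # 200 sampled contracts
expected = {cat: share * n for cat, share in shares.items()}

# Test statistic: chi^2 = sum over categories of (E - O)^2 / E
chi2 = sum((expected[c] - observed[c]) ** 2 / expected[c] for c in observed)

chi2_crit = 13.28  # chi-square table, alpha = 0.01, d.f. = 5 - 1 = 4
print(round(chi2, 2), chi2 < chi2_crit)  # 10.23 True -> accept the null hypothesis
```

Because 10.23 < 13.28 the null hypothesis is accepted, matching the conclusion of the worked example.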
Exercise 1:
It is claimed that there is no preference for customers to come into a shop as far as the time of
day is concerned. To test the correctness of this claim the manager decides to tally the number of
customers that enter the shop during 6 two-hour periods in a particular week and arrives at the
following information:
Time period No of customers
08 – 10 19
10 – 12 27
12 – 14 38
14 – 16 38
16 – 18 32
18 – 20 26
Judging from these data, can it at the 0.05 level of significance be concluded that customers
indeed have no preference as far as the time of the day is concerned to visit this shop?
Exercise 2:
A national study revealed that, within 5 years of their release from prison, 20% of criminals had
not been arrested again, 38% had been arrested once, and so on. The table underneath shows the
complete distribution:

Number of arrests    Percent
0                    20.0
1                    38.0
2                    18.0
3                    13.5
4 or more            10.5
A social agency in a large city has developed a guidance program for former prisoners who settle
there. Anxious to compare local results with the national figures, the director of the social agency
selected at random 200 former prisoners who were in the guidance program. His results are
summarized in the following table:

Number of arrests    Number of former prisoners
0                    58
1                    62
2                    28
3                    16
4 or more            36
At the 0.01 level of significance, how would the director of this social agency compare the local
results with the national figures?