
Chapter-2 Representation of Data Lecture

Uploaded by

Ammar Nadeem

STATISTICS IS THE GRAMMAR OF SCIENCE

INTRODUCTION TO STATISTICS

CHAPTER – 2
REPRESENTATION OF DATA

PREPARED BY
HAZBER SAMSON
MATHEMATICS DEPARTMENT
NUML ISLAMABAD
INTRODUCTION TO STATISTICS REPRESENTATION OF DATA

REPRESENTATION OF DATA
After collection of sample data, we must “get acquainted” with them. The best way to
become acquainted with the data is to use an initial exploratory data-analysis. For this we
need to organize the data first and then visualize and analyze it.

There are two major components of data representation:

• Data Organization
• Data Visualization

DATA ORGANIZATION
The first step in visualizing and analyzing data is to organize it. We can organize data
in two ways:

• Organizing Qualitative Data
• Organizing Quantitative Data

Before going into the details of data organization, let’s study the idea of a frequency
distribution.

THE FREQUENCY DISTRIBUTION


FREQUENCY The number of times a particular distinct value occurs is called its frequency
(or count).

DISTRIBUTION The pattern of variability displayed by the data of a variable. The
distribution displays the frequency of each value of the variable.

FREQUENCY DISTRIBUTION A frequency distribution provides a table of the values of
the observations and how often they occur.

TYPES OF FREQUENCY DISTRIBUTION There are two types of frequency distributions
that are mostly used:

• Categorical frequency distribution
• Grouped frequency distribution

PREPARED BY HAZBER SAMSON MATHEMATICS DEPARTMENT NUML ISLAMABAD



ORGANIZING QUALITATIVE DATA


We organize categorical data by tallying responses by categories and placing the results
in tables. Typically, you construct a summary table to organize the data for a single
categorical variable and you construct a contingency table to organize the data from two
or more categorical variables.

There are two simple ways of organizing categorical data:

• The Summary Table
• The Contingency Table

THE SUMMARY TABLE


A summary table presents tallied responses as frequencies or percentages for each
category. A summary table helps you see the differences among the categories by
displaying the frequency, amount, or percentage of items in a set of categories in a
separate column. Table 2.1 shows a summary table that tallies the responses to a recent
survey that asked adults how they pay their monthly bills.

TABLE 2.1 Types of Bill Payment

From Table 2.1, you can conclude that more than half the people pay by check and 82%
pay by either check or by electronic/online forms of payment.

THE CONTINGENCY TABLE


A contingency table allows you to study patterns that may exist between the responses of
two or more categorical variables. This type of table cross-tabulates the responses of the
categorical variables.

TABLE 2.2 Contingency Table Displaying Type of Fund and Whether a Fee Is Charged
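
The cross-tabulation idea can be sketched in a few lines of Python. This is only an
illustration: the records below are hypothetical (fund type, fee charged) pairs invented
for the example, not the data of Table 2.2, and the function name contingency_table is an
assumption of this sketch.

```python
from collections import Counter

# Hypothetical (fund type, fee charged) pairs, invented for illustration only.
records = [
    ("Growth", "Yes"), ("Growth", "No"), ("Growth", "No"),
    ("Value", "Yes"), ("Value", "No"), ("Value", "No"), ("Value", "No"),
]

def contingency_table(pairs):
    """Cross-tabulate (row, column) pairs into {row: {column: count}}."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(records)
for row, cells in table.items():
    print(row, cells)
```

Each cell counts the observations that share one category of the first variable and one
category of the second, which is what lets patterns between the two variables show up.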

CATEGORICAL FREQUENCY DISTRIBUTIONS


The categorical frequency distribution is used for data that can be placed in specific
categories, such as nominal- or ordinal-level data. For example, data such as political
affiliation, religious affiliation, or major field of study would use categorical frequency
distributions.

PROCEDURE OF CONSTRUCTING FREQUENCY DISTRIBUTION OF QUALITATIVE DATA

EXAMPLE-1 FREQUENCY DISTRIBUTION OF QUALITATIVE DATA

Professor Weiss asked his introductory statistics students to state their political party
affiliations as Democratic (D), Republican (R), or Other (O). The responses of the 40
students in the class are given in Table 2.6. Determine a frequency distribution of these
data.

TABLE 2.6 Political party affiliations of the students in introductory statistics

SOLUTION We apply the procedure discussed above.



STEP-1 List the distinct values of the observations in the data set in the first column of a
table. The distinct values of the observations are Democratic, Republican, and Other,
which we list in the first column of Table 2.7.

STEP-2 For each observation, place a tally mark in the second column of the table in the
row of the appropriate distinct value. The first affiliation listed in Table 2.6 is Democratic,
calling for a tally mark in the Democratic row of Table 2.7. The complete results of the
tallying procedure are shown in the second column of Table 2.7.

STEP-3 Count the tallies for each distinct value and record the totals in the third column
of the table. Counting the tallies in the second column of Table 2.7 gives the
frequencies in the third column of Table 2.7. The first and third columns of Table 2.7
provide a frequency distribution for the data in Table 2.6.

TABLE 2.7 Table for constructing a frequency distribution for the political party affiliation
data in Table 2.6

Interpretation From Table 2.7, we see that, of the 40 students in the class, 13 are
Democrats, 18 are Republicans, and 9 are Other.

By simply glancing at Table 2.7, we can easily obtain various pieces of useful information.
For instance, we see that more students in the class are Republicans than any other
political party affiliation.
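
The tallying procedure can be sketched in Python as follows. The list of responses is
rebuilt from the counts quoted above (13 D, 18 R, 9 O), so its ordering is an assumption
and does not reproduce Table 2.6; the function name frequency_distribution is also just
a label chosen for this sketch.

```python
from collections import Counter

# Rebuilt from the counts quoted in the text (13 D, 18 R, 9 O);
# the original ordering of Table 2.6 is not reproduced here.
responses = ["D"] * 13 + ["R"] * 18 + ["O"] * 9

def frequency_distribution(observations):
    """Return {distinct value: frequency}, in order of first appearance."""
    return dict(Counter(observations))

freq = frequency_distribution(responses)
for value, count in freq.items():
    print(f"{value:<6}{count:>3}")
print(f"{'Total':<6}{sum(freq.values()):>3}")
```

Counter plays the role of the tally column: it makes one pass over the observations and
adds one to the count of each distinct value it meets.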

RELATIVE FREQUENCY DISTRIBUTION In addition to the frequency with which a particular
distinct value occurs, we are often interested in the relative frequency, which is the ratio
of the frequency to the total number of observations:

Relative frequency = Frequency / Number of observations

For instance, 13 of the 40 students are Democrats, so the relative frequency of Democrats
is 13/40 = 0.325. In terms of percentages, 32.5% of the students in Professor Weiss’s
introductory statistics class are Democrats. We see that a relative frequency is just a
percentage expressed as a decimal.

So a relative-frequency distribution provides a table of the values of the observations
and (relatively) how often they occur.

PROCEDURE OF CONSTRUCTING RELATIVE FREQUENCY DISTRIBUTION OF QUALITATIVE DATA

EXAMPLE-2 RELATIVE FREQUENCY DISTRIBUTION OF QUALITATIVE DATA

Professor Weiss asked his introductory statistics students to state their political party
affiliations as Democratic (D), Republican (R), or Other (O). The responses of the 40
students in the class are given in Table 2.6. Determine a relative frequency distribution of
these data.

TABLE 2.6 Political party affiliations of the students in introductory statistics

SOLUTION We apply the procedure discussed above.

STEP-1 Obtain a frequency distribution of the data.

We obtained a frequency distribution of the data in Example-1; specifically, see the first
and third columns of Table 2.7.

STEP-2 Divide each frequency by the total number of observations.


Dividing each entry in the third column of Table 2.7 by the total number of observations,
40, we obtain the relative frequencies displayed in the second column of Table 2.8. The
two columns of Table 2.8 provide a relative-frequency distribution for the data in Table 2.6.

TABLE 2.8 Relative-frequency distribution for the political party affiliation data

Interpretation From Table 2.8, we see that 32.5% of the students in Professor Weiss’s
introductory statistics class are Democrats, 45.0% are Republicans, and 22.5% are
Other.
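
As a quick check of the two steps, here is a minimal Python sketch that divides each
frequency by the total. The frequencies are the ones stated in the text; the function
name relative_frequencies is an assumption of this sketch.

```python
# Frequencies taken from the interpretation of Table 2.7 in the text.
freq = {"Democratic": 13, "Republican": 18, "Other": 9}

def relative_frequencies(frequencies):
    """Divide each frequency by the total number of observations."""
    total = sum(frequencies.values())
    return {value: count / total for value, count in frequencies.items()}

rel = relative_frequencies(freq)
for value, r in rel.items():
    # Show both the decimal form and the equivalent percentage.
    print(f"{value:<11}{r:.3f}  ({r:.1%})")
```

The relative frequencies sum to 1, which is a useful sanity check on any
relative-frequency table.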

NOTE: Relative-frequency distributions are better than frequency distributions for
comparing two data sets. Because relative frequencies always fall between 0 and 1, they
provide a standard for comparison.

ORGANIZING QUANTITATIVE DATA


We organize numerical data by creating ordered arrays or distributions. The amount of
data we have and what we seek to discover about our variables influence which
methods we choose, as does the arrangement of data in our worksheet.

In the case of quantitative data, we can define a frequency distribution as follows.

The procedures for constructing these distributions are shown now.

GROUPED FREQUENCY DISTRIBUTIONS


When the range of the data is large, the data must be grouped into classes that are more
than one unit in width, in what is called a grouped frequency distribution. For example, a
distribution of the number of hours that boat batteries lasted is the following.

In this distribution, the values 24 and 30 of the first class are called class limits. The
lower class limit is 24; it represents the smallest data value that can be included in the
class. The upper class limit is 30; it represents the largest data value that can be included
in the class. The numbers in the second column are called class boundaries. These
numbers are used to separate the classes so that there are no gaps in the frequency
distribution. The gaps are due to the limits; for example, there is a gap between 30 and
31.

Finally, the class width for a class in a frequency distribution is found by subtracting the
lower (or upper) class limit of one class from the lower (or upper) class limit of the next
class. For example, the class width in the preceding distribution on the duration of boat
batteries is 7, found from 31 - 24 = 7. The class width can also be found by subtracting
the lower boundary from the upper boundary for any given class. In this case, 30.5 - 23.5
= 7.
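
The relationship between limits, boundaries, and width can be checked with a short Python
sketch. It assumes whole-number data (so boundaries lie half a unit beyond the limits) and
a second class of 31 to 37, which follows from the stated lower limit of 31 and width of 7;
the function name class_boundaries is hypothetical.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Extend each class limit by half a unit so neighbouring classes meet."""
    half = unit / 2
    return lower_limit - half, upper_limit + half

first_class = (24, 30)   # class limits given in the text
next_class = (31, 37)    # assumed: lower limit 31 is stated, 37 follows from width 7

lower_b, upper_b = class_boundaries(*first_class)
width_from_limits = next_class[0] - first_class[0]   # 31 - 24
width_from_boundaries = upper_b - lower_b            # 30.5 - 23.5

print(lower_b, upper_b)
print(width_from_limits, width_from_boundaries)
```

Both ways of computing the width agree, and the upper boundary of the first class equals
the lower boundary of the next, which is how the boundaries close the gap between 30 and 31.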

A frequency distribution is a table formed by classifying n data values into k classes
called bins. The bin limits define the values to be included in each bin. Usually, all the
bin widths are the same. The table shows the frequency of data values within each bin.
Frequencies can also be expressed as relative frequencies or percentages of the total
number of observations.

CONSTRUCTING FREQUENCY DISTRIBUTIONS

The basic steps for constructing a frequency distribution are as follows


(1) sort the data in ascending order
(2) choose the number of bins
(3) set the bin limits
(4) put the data values in the appropriate bin
(5) create the table.
Let’s walk through these steps.
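
The five steps can be sketched in Python as follows. This is a minimal illustration under
stated assumptions: the data are hypothetical whole numbers invented for the example, the
number of bins k is simply chosen by hand, and the equal-width rule used for the bin limits
is one reasonable choice rather than the rule the lecture itself prescribes.

```python
import math

def grouped_frequency_distribution(data, k):
    """Return [((lower limit, upper limit), frequency), ...] for integer data."""
    values = sorted(data)                                  # step 1: sort
    low, high = values[0], values[-1]
    # Step 3: equal integer widths wide enough for k bins to cover low..high.
    width = math.ceil((high - low + 1) / k)
    bins = [(low + i * width, low + (i + 1) * width - 1) for i in range(k)]
    counts = [0] * k
    for v in values:                                       # step 4: place values
        counts[(v - low) // width] += 1
    return list(zip(bins, counts))                         # step 5: the table

# Hypothetical data, invented for illustration only.
data = [24, 27, 31, 35, 36, 40, 41, 44, 45, 52, 53, 58]
table = grouped_frequency_distribution(data, k=5)          # step 2: choose k
for (lo, hi), f in table:
    print(f"{lo}-{hi}: {f}")
```

With these values the width works out to 7 and the first bin is 24 to 30, echoing the
class limits discussed earlier; the frequencies always sum to the number of observations.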


DATA VISUALIZATION
When we organize our data, we sometimes begin to discover patterns or relationships in
it. To better explore and discover patterns and relationships, we can visualize our
data by creating various charts and special “displays.” As is the case when organizing
data, the techniques we use to visualize our data depend on the type of variable
(categorical or numerical) of our data.

We can visualize data in two ways:

• Qualitative Data Visualization
• Quantitative Data Visualization

VISUALIZING QUALITATIVE DATA


In this section we shall discuss graphs that are used to summarize qualitative, or attribute,
or categorical data. For a single categorical variable we consider two types of graphs:

• Bar Chart
• Pie Chart

BAR CHART
The most common graphic form to present a qualitative variable is a bar chart. A bar chart
compares different categories by using individual bars to represent the tallies for each
category. Bar graphs show the amount of data that belongs to each category as a
proportionally sized rectangular area. The length of a bar represents the amount,
frequency, or percentage of values falling into a category.

In most cases, the horizontal axis shows the variable of interest. The vertical axis shows
the frequency or fraction of each of the possible outcomes. A distinguishing feature of a
bar chart is that there is a gap between the bars. That is, because the variable of
interest is qualitative, the bars are not adjacent to each other. Thus, a bar chart graphically
describes a frequency table using a series of uniformly wide rectangles, where the height
of each rectangle is the class frequency.
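
To make the idea concrete without a plotting library, here is a minimal Python sketch that
draws a text bar chart. It uses the party-affiliation frequencies quoted earlier (13, 18, 9)
and, as a simplification of this sketch, draws the bars horizontally, one per line; the
function name text_bar_chart is hypothetical.

```python
def text_bar_chart(frequencies, symbol="#"):
    """Render one bar per category; bar length equals the category frequency."""
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, count in frequencies.items():
        # Each category sits on its own line, so the bars never touch.
        lines.append(f"{label:<{label_width}} | {symbol * count} {count}")
    return "\n".join(lines)

freq = {"Democratic": 13, "Republican": 18, "Other": 9}
chart = text_bar_chart(freq)
print(chart)
```

Because the categories are nominal, the rows could be listed in any order, for example
sorted by decreasing frequency, without changing what the chart says.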

EXAMPLE-3 TABLE 2.1 Frequency Table for Vehicles Sold Last Month at Applewood
Auto Group by Location

We use the Applewood Auto Group data as an example (Table 2.1). The variables of
interest are the location where the vehicle was sold and the number of vehicles sold at
each location. We label the horizontal axis with the four locations and scale the vertical
axis with the number sold. The variable location is of nominal scale, so the order of the
locations on the horizontal axis does not matter. In Figure 2.1, the locations are listed
alphabetically. The locations could also be in order of decreasing or increasing
frequencies.

FIGURE 2.1 Number of Vehicles Sold by Location


EXAMPLE-4 Table 2.1 shows a summary table that tallies the responses to a recent
survey that asked adults how they pay their monthly bills.

TABLE 2.1 Types of Bill Payment

Figure 2.2 displays the bar chart for the data of Table 2.1, which is based on
a recent survey that asked adults how they pay their monthly bills.

FIGURE 2.2 Bar Chart for how people pay their bills


PIE CHART
Another useful type of chart for depicting qualitative information is a pie chart. A pie chart
uses parts of a circle to represent the tallies of each category. The size of each part, or
pie slice, varies according to the percentage in each category.

EXAMPLE-5 We explain the details of constructing a pie chart using the information in
Table 2.3, which shows the frequency and percent of cars sold by the Applewood Auto
Group for each vehicle type.


EXAMPLE-6 Table 2.1 shows a summary table that tallies the responses to a recent
survey that asked adults how they pay their monthly bills.

TABLE 2.1 Types of Bill Payment

In Table 2.1, 54% of the respondents stated that they paid bills by check. To represent
this category as a pie slice, you multiply 54% by the 360 degrees that make up a circle
to get a pie slice that takes up 194.4 degrees of the 360 degrees of the circle. From Figure
2.4, you can see that the pie chart lets you visualize the portion of the entire pie that is in
each category. In this figure, paying bills by check is the largest slice, containing 54% of
the pie. The second largest slice is paying bills electronically/online, which contains 28%
of the pie.

FIGURE 2.4 Pie Chart for how people pay their bills
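
The slice arithmetic can be sketched in Python. Only the 54% (check) and 28%
(electronic/online) shares appear in the text; lumping the remaining 18% into a single
"All other methods" slice is an assumption made here so that the shares total 100%, and
the function name slice_angles is hypothetical.

```python
def slice_angles(percentages):
    """Convert each percentage share into degrees of a 360-degree circle."""
    return {category: pct * 360 / 100 for category, pct in percentages.items()}

shares = {
    "Check": 54,               # from the text
    "Electronic/online": 28,   # from the text
    "All other methods": 18,   # assumed: remainder lumped together
}

angles = slice_angles(shares)
for category, degrees in angles.items():
    print(f"{category:<18}{degrees:6.1f} degrees")
```

The angles always add up to 360 degrees when the percentages add up to 100, which is a
convenient check before drawing the chart.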


EXAMPLE-7


FIGURE 2.5 Bar Chart and Pie Chart


VISUALIZING QUANTITATIVE DATA
In this section we shall discuss graphs that are used to summarize quantitative data. There
are different types of graphs, including:

• Histogram
• Frequency Polygon
• Ogive

THE HISTOGRAM
Karl Pearson introduced the histogram in 1891. He used it to chart historical time
periods, such as the reigns of various Prime Ministers. A histogram for a frequency
distribution based on quantitative data is similar to the bar chart showing the distribution
of qualitative data. The classes are marked on the horizontal axis and the class frequencies
on the vertical axis. The class frequencies are represented by the heights of the bars.
However, there is one important difference based on the nature of the data. Quantitative
data are usually measured using scales that are continuous, not discrete. Therefore, the
horizontal axis represents all possible values, and the bars are drawn adjacent to each
other to show the continuous nature of the data.

A histogram is also called a frequency histogram, a relative frequency histogram, or
a percentage histogram depending on whether frequencies, relative frequencies, or
percentages are marked on the vertical axis.

PROCEDURE OF CONSTRUCTING A HISTOGRAM

EXAMPLE-8 RECORD HIGH TEMPERATURES

Construct a histogram to represent the data shown for the record high temperatures for
each of the 50 states having class boundaries and frequencies as follows

SOLUTION We apply the procedure discussed above.

STEP-1 Draw and label the x and y axes. The x axis is always the horizontal axis, and the
y axis is always the vertical axis.

STEP-2 Represent the frequency on the y axis and the class boundaries on the x axis.

STEP-3 Using the frequencies as the heights, draw vertical bars for each class. See Figure
2.11 below.

FIGURE 2.11 HISTOGRAM
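
The three steps can be sketched in Python without a plotting library. The first boundary
(99.5) and the class width (5) come from the boundaries quoted in this chapter, but the
frequency table did not survive in these notes, so the seven frequencies below are
assumed values chosen only so that they total 50; the function name histogram_bars is
hypothetical.

```python
def histogram_bars(first_boundary, width, frequencies):
    """Return (lower boundary, upper boundary, height) for each class."""
    bars = []
    for i, f in enumerate(frequencies):
        lower = first_boundary + i * width
        bars.append((lower, lower + width, f))
    return bars

# Assumed frequencies (the original table is missing); they sum to 50 states.
frequencies = [2, 8, 18, 13, 7, 1, 1]
bars = histogram_bars(99.5, 5, frequencies)

# Unlike a bar chart, each bar starts exactly where the previous one ends.
adjacent = all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))

for lower, upper, height in bars:
    print(f"{lower:5.1f}-{upper:5.1f} | {'#' * height}")
print("bars adjacent:", adjacent)
```

The adjacency check is the point of the sketch: using class boundaries rather than class
limits on the x axis is what removes the gaps between bars.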

DIFFERENT SHAPES OF HISTOGRAMS

Histograms are valuable tools. For example, the histogram of a sample should have a
distribution shape very similar to that of the population from which the sample was drawn.
If the reader of a histogram is at all familiar with the variable involved, he or she will usually
be able to interpret several important facts. The figure that follows the definitions below
presents histograms with specific shapes that suggest descriptive labels. Possible
descriptive labels are listed under each histogram.

Briefly, the terms used to describe histograms are as follows:

Symmetrical: Both sides of this distribution are identical (halves are mirror images).

Normal: A symmetrical distribution is mounded up about the mean and becomes sparse
at the extremes.

Uniform (rectangular): Every value appears with equal frequency.

Skewed: One tail is stretched out longer than the other. The direction of skewness is on
the side of the longer tail.

J-shaped: There is no tail on the side of the class with the highest frequency.

Bimodal: The two most populous classes are separated by one or more classes. This
situation often implies that two populations are being sampled.

FIGURE DIFFERENT SHAPES OF HISTOGRAM

THE FREQUENCY POLYGON

A frequency polygon also shows the shape of a distribution and is similar to a histogram.
It consists of line segments connecting the points formed by the intersections of the class
midpoints and the class frequencies. The midpoint of each class is scaled on the X-axis
and the class frequencies on the Y-axis.

EXAMPLE-9 RECORD HIGH TEMPERATURES

Construct a frequency polygon to represent the data shown for the record high
temperatures for each of the 50 states having class boundaries and frequencies as follows

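SOLUTION We apply the procedure discussed above.

STEP-1 Find the midpoints of each class. Recall that midpoints are found by adding
the upper and lower boundaries and dividing by 2. For the first class, with boundaries
99.5 and 104.5, the midpoint is (99.5 + 104.5) / 2 = 102.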

STEP-2 Draw the x and y axes. Label the x axis with the midpoint of each class, and then use a
suitable scale on the y axis for the frequencies.

STEP-3 Using the midpoints for the x values and the frequencies as the y values, plot the points.

STEP-4 Connect adjacent points with line segments. Draw a line back to the x axis at the beginning
and end of the graph, at the same distance that the previous and next midpoints would be located,
as shown in Figure 2.12.

FIGURE 2.12 FREQUENCY POLYGON
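
The points of the polygon can be computed with a short Python sketch. The boundaries
start at 99.5 with width 5, as quoted in this chapter; the frequencies are assumed values
(the original table is missing from these notes) chosen so that they total 50, and the
function name polygon_points is hypothetical.

```python
def polygon_points(boundaries, frequencies):
    """Return the (x, y) points of a frequency polygon, closed on the x axis."""
    # Step 1: midpoint of each class = (lower boundary + upper boundary) / 2.
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    width = boundaries[1] - boundaries[0]
    # Step 4: add zero-frequency points one class width before and after.
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

boundaries = [99.5 + 5 * i for i in range(8)]   # 99.5, 104.5, ..., 134.5
frequencies = [2, 8, 18, 13, 7, 1, 1]           # assumed; sums to 50

points = polygon_points(boundaries, frequencies)
for x, y in points:
    print(x, y)
```

Connecting consecutive points with line segments gives the polygon; the two extra points
are where the previous and next midpoints would fall, so the figure returns to the x axis.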



THE CUMULATIVE FREQUENCY POLYGON (OGIVE)

The cumulative frequency polygon, or ogive, uses the cumulative frequency distribution
to display the variable of interest along the X-axis and the cumulative frequencies (or
cumulative percentages) along the Y-axis. The cumulative frequency is the sum of the
frequencies accumulated up to the upper boundary of a class in the distribution.

EXAMPLE-10 RECORD HIGH TEMPERATURES

Construct a cumulative frequency polygon to represent the data shown for the record high
temperatures for each of the 50 states having class boundaries and frequencies as follows

SOLUTION We apply the procedure discussed above.

STEP-1 Find the cumulative frequency for each class.



STEP-2 Draw the x and y axes. Label the x axis with the class boundaries. Use an
appropriate scale for the y axis to represent the cumulative frequencies.

STEP-3 Plot the cumulative frequency at each upper class boundary, as shown in Figure
2.13. Upper boundaries are used since the cumulative frequencies represent the number
of data values accumulated up to the upper boundary of each class.

STEP-4 Starting with the first upper class boundary, 104.5, connect adjacent points with
line segments, as shown in Figure 2.13. Then extend the graph to the first lower class
boundary, 99.5, on the x axis.
FIGURE 2.13 CUMULATIVE FREQUENCY POLYGON
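
The ogive's points can be computed with a short Python sketch. The boundaries 99.5 and
104.5 come from the steps above; the remaining boundaries assume the same width of 5, and
the frequencies are assumed values (the original table is missing from these notes) chosen
so that they total 50. The function name ogive_points is hypothetical.

```python
from itertools import accumulate

def ogive_points(boundaries, frequencies):
    """Pair each upper class boundary with its cumulative frequency."""
    cumulative = list(accumulate(frequencies))          # step 1
    # Step 4: the graph starts at the first lower boundary with a count of 0.
    return [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))

boundaries = [99.5 + 5 * i for i in range(8)]   # 99.5, 104.5, ..., 134.5
frequencies = [2, 8, 18, 13, 7, 1, 1]           # assumed; sums to 50

points = ogive_points(boundaries, frequencies)
for x, y in points:
    print(x, y)
```

The last point always carries the total number of observations, and the cumulative
frequencies never decrease, so an ogive rises from left to right.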
