2.1 Descriptive Statistics (Tabular and Graphical)

Uploaded by

issatay.jax
Copyright
© All Rights Reserved

LECTURE 2: DESCRIBING DATA WITH TABLES & GRAPHS

Displaying data graphically often lets us extract important information quickly. In this note, we use several examples to demonstrate this. Experienced users of Excel and other spreadsheet programs know that spreadsheets are good at producing graphs. Typical graphs that can be created with Excel include:

• Pie Chart
• Bar Chart
• Histogram
• Line Chart
• Scatter Plot
• Area Chart
• Box Plot

There are, however, some other useful graphs that cannot easily be created with Excel, such as:

• Stem-and-Leaf Plot
• Dot Plot

Besides these graphical methods, it is sometimes useful to describe and summarize data in tabular form. Excel comes with a powerful tool called PivotTable, which does this very efficiently.

Creating Tables and Graphs with Excel


Example 1
A human resources manager at Beta Technologies, Inc., has collected current annual salary figures and related data for 52 of the company’s full-time employees. The data are in the file “Beta.xls”. In particular, these data include each selected employee’s gender, age, number of years of relevant work experience, number of years of post-secondary education, and annual salary. Based on these data, let’s try to answer the following questions:

• Do female employees of Beta get paid less than male employees?
• Does salary depend on seniority? On years of education?
To answer the first question, let’s first summarize the data in a neater form so that we get a better idea of what’s going on. To summarize data using PivotTable, place the cursor anywhere in the data range and select Data/PivotTable Report from the menu bar. Then simply follow the instructions that appear. Using PivotTable, for example, we get the following result:

Average of Annual_Salary
Gender Total
0 52126.0
1 43645.7

Here, “0” means male and “1” means female. The average salary of male workers is 52126.0, while the average salary of female workers is only 43645.7. So it appears that female workers are paid less on average.
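For readers working outside Excel, the one-way PivotTable above (the average of one field for each category of another) can be sketched in plain Python. This is a minimal sketch, not the PivotTable's own algorithm; the records are made-up values, not the Beta.xls data, and the field names `gender` and `annual_salary` are assumptions.

```python
from collections import defaultdict


def average_by(records, row_field, value_field):
    """Mimic a one-way PivotTable: average of value_field for each row_field category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        key = rec[row_field]
        totals[key] += rec[value_field]
        counts[key] += 1
    # Sort the categories so the output order is stable, like PivotTable rows.
    return {key: totals[key] / counts[key] for key in sorted(totals)}


# Hypothetical records (gender: 0 = male, 1 = female, as in the notes).
employees = [
    {"gender": 0, "annual_salary": 50000},
    {"gender": 0, "annual_salary": 54000},
    {"gender": 1, "annual_salary": 42000},
    {"gender": 1, "annual_salary": 46000},
    {"gender": 1, "annual_salary": 44000},
]

for gender, avg in average_by(employees, "gender", "annual_salary").items():
    print(gender, round(avg, 1))
```

With these invented records the sketch prints an average of 52000.0 for gender 0 and 44000.0 for gender 1.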

But there is a slight problem here: have we taken other factors into consideration yet? It may be that females are paid less because they have weaker qualifications in other areas (e.g., seniority or education).

The following table allows us to see whether workers, female or male, with the same educational level get about
the same average salary. (Again, it is produced with PivotTable.)

Average of Annual Salary    Education
Gender          0          2          4          6          8
0                    20985.3    51726.0    57308.9   109285.0
1         16442.3    17784.0    44329.1    49238.8    91875.5

Now we can see clearly from this table that, given the same educational level, females are always paid less than males on average. (So there may be a discrimination case to keep some lawyers busy and alive.)
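The two-way table above (gender in rows, education in columns, average salary in each cell) can be sketched the same way by grouping on a pair of keys. This is a minimal illustration with invented records, not the Beta.xls data; the field names are assumptions. A combination with no employees simply has no cell, which is why PivotTable leaves it blank.

```python
from collections import defaultdict


def pivot_average(records, row_field, col_field, value_field):
    """Mimic a two-way PivotTable: average of value_field in each (row, column) cell."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        cell = (rec[row_field], rec[col_field])
        sums[cell] += rec[value_field]
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical records, for illustration only.
employees = [
    {"gender": 0, "education": 4, "annual_salary": 50000},
    {"gender": 0, "education": 4, "annual_salary": 54000},
    {"gender": 1, "education": 4, "annual_salary": 45000},
    {"gender": 0, "education": 8, "annual_salary": 100000},
    {"gender": 1, "education": 8, "annual_salary": 90000},
    {"gender": 1, "education": 0, "annual_salary": 16000},
]

table = pivot_average(employees, "gender", "education", "annual_salary")
levels = sorted({edu for _, edu in table})
print("Gender", *levels)
for gender in sorted({g for g, _ in table}):
    # "-" marks a gender/education combination with no employees.
    row = [f"{table[(gender, edu)]:.1f}" if (gender, edu) in table else "-" for edu in levels]
    print(gender, *row)
```

Reading each column from top to bottom is the "same educational level" comparison made in the text.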

If the table is not convincing enough, look at the following graphs, which were created from the table above. Don’t tell me that you still cannot see who gets paid more or less.

You can also manipulate the PivotTable to get other information. As an example, see if you can answer the following question: Does salary depend on seniority or education? To help answer it, the following scatter plots were created:

• “Salary” against “Age”
• “Salary” against “Experience”
• “Salary” against “Years of education”

[Figure: Salary vs. Age. Scatter plot of Salary ($0 to $120,000) against Age (0 to 70).]

[Figure: Salary vs. Experience. Scatter plot of Salary ($0 to $120,000) against Experience (0 to 40).]

[Figure: Salary vs. Education. Scatter plot of Salary ($0 to $120,000) against Education (0 to 9).]

What can you say after seeing these scatter plots? How is “salary” related to “age”, “experience”, and
“education”?

We can also use PivotTable to generate other useful information, for example:
• The age distribution of male and female employees
• The salary distribution of male and female employees

The following table shows the age distribution of Beta’s employees, with females and males counted separately.

Count of Gender    Gender
Age                0    1
20 1
21 1
23 1 1
24 1
25 1 1
26 1
27 2
28 1
29 1 1
30 1
31 2
32 1
33 1 2
34 1 1
35 2
36 1
37 1 1
38 1
39 1
40 1
41 1
42 2
44 1 1
45 1
46 2
47 2
48 1
50 2
51 2
52 1
53 1
55 1
56 1
58 1
60 1
63 1
Sum 25 27

This table has too much detail. To make it easier to read, we can use the “Group” function in the PivotTable to combine categories. To use the “Group” function, do the following:
• Place the cursor anywhere in the “Age” column of the pivot table, then either right-click or choose Data/Group and Outline/Group from the menu bar.
• Specify how you want to group your data in the pop-up window.

After grouping, we get the following table (the exact output depends on how you group the data):

Count of Age    Gender
Age            0     1
20-24          2     3
25-29          4     4
30-34          5     4
35-39          2     5
40-44          2     4
45-49          4     2
50-54          1     5
55-59          3
60-64          2
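The "Group" step amounts to mapping each age to a fixed-width class. A minimal sketch, assuming classes start at age 20 and are 5 years wide as in the grouped table; the counts are the total number of employees at each age (males and females combined), read from the detailed age table above.

```python
from collections import Counter


def age_class(age, start=20, width=5):
    """Return the label of the class (e.g. '25-29') that contains age."""
    low = start + width * ((age - start) // width)
    return f"{low}-{low + width - 1}"


# Number of employees at each age (both genders), from the detailed age table.
count_at_age = {
    20: 1, 21: 1, 23: 2, 24: 1, 25: 2, 26: 1, 27: 2, 28: 1, 29: 2,
    30: 1, 31: 2, 32: 1, 33: 3, 34: 2, 35: 2, 36: 1, 37: 2, 38: 1, 39: 1,
    40: 1, 41: 1, 42: 2, 44: 2, 45: 1, 46: 2, 47: 2, 48: 1,
    50: 2, 51: 2, 52: 1, 53: 1, 55: 1, 56: 1, 58: 1, 60: 1, 63: 1,
}

grouped = Counter()
for age, n in count_at_age.items():
    grouped[age_class(age)] += n

for label in sorted(grouped):
    print(label, grouped[label])
```

Each printed count should equal the sum of the two gender columns in the corresponding row of the grouped table (for example, 2 + 3 = 5 for ages 20-24).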

We can also display the salary distribution, separated by age, in table form. This requires changing the PivotTable setup. Whenever you want to change the setup of your pivot table, do the following:
• Place the cursor anywhere on the pivot table.
• Right-click.
• Select Wizard.
• Make whatever changes you want.

The following table displays the salary distribution by age group. (The salary columns have been grouped.)

Count of Age Annual Salary


Age 10000-29999 30000-49999 50000-69999 70000-89999 90000-110000
20-24 5
25-29 6 2
30-34 1 8
35-39 7
40-44 3 3
45-49 1 5
50-54 1 2 3
55-59 2 1
60-64 1 1
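A table like this one counts employees in each (age class, salary class) cell. The sketch below assumes 5-year age classes starting at 20 and 20,000-wide salary classes starting at 10,000, matching the layout above; the (age, salary) pairs are invented, not the Beta.xls data.

```python
from collections import Counter


def band(value, start, width):
    """Label of the fixed-width class containing value, e.g. band(27, 20, 5) -> '25-29'."""
    low = start + width * ((value - start) // width)
    return f"{low}-{low + width - 1}"


# Hypothetical (age, annual salary) pairs, for illustration only.
employees = [(23, 18000), (24, 21000), (27, 33000), (28, 26000), (46, 61000), (47, 58000)]

# Count employees in each (age class, salary class) cell, like "Count of Age".
crosstab = Counter(
    (band(age, 20, 5), band(salary, 10000, 20000)) for age, salary in employees
)
for (age_cls, salary_cls), n in sorted(crosstab.items()):
    print(age_cls, salary_cls, n)
```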

Example 2
The file “Sky.xls” contains three years (1991-1993) of sales data for the Sky’s the Limit Women’s apparel store.
Each observation represents sales during a 4-week period. Thus, the first observation is sales during the first 4
weeks of 1991, and so on.

• Does there appear to be any trend in sales?
• Do sales appear to be seasonal? If so, discuss the nature of the seasonality.

A time series plot is particularly useful for identifying patterns in time series data, and Excel makes it easy to create one: simply create a line chart based on your time series data.

Is there any trend or seasonality in this time series data?

[Figure: Sky's Sales Data 1991-1993. Line chart of Sales (0 to 300,000) against observation number 1 to 39, one observation per 4-week period (horizontal axis labeled “Week”).]
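Besides reading the plot, one simple numerical check (not part of the original notes) is to total sales by year, which can reveal a trend, and to average each 4-week period across the three years, which can reveal a seasonal pattern. The sketch assumes the observations are in time order with 13 periods per year (52 weeks / 4); the sales figures are invented for illustration.

```python
from statistics import mean

PERIODS_PER_YEAR = 13  # 52 weeks split into 4-week periods


def yearly_totals(sales, periods_per_year=PERIODS_PER_YEAR):
    """Total sales in each consecutive block of periods_per_year observations."""
    return [sum(sales[i:i + periods_per_year]) for i in range(0, len(sales), periods_per_year)]


def period_averages(sales, periods_per_year=PERIODS_PER_YEAR):
    """Average sales for each period of the year (period 1, 2, ..., 13) across years."""
    return [mean(sales[p::periods_per_year]) for p in range(periods_per_year)]


# Invented sales: one seasonal pattern repeated each year, plus 10 units of growth per year.
pattern = [100, 90, 95, 110, 120, 115, 105, 100, 110, 125, 140, 180, 200]
sales = [value + 10 * year for year in range(3) for value in pattern]

print("Yearly totals:", yearly_totals(sales))  # rising totals suggest an upward trend
averages = period_averages(sales)
for period, avg in enumerate(averages, start=1):
    print(f"Period {period:2d}: {avg}")  # recurring highs and lows suggest seasonality
```

Here the yearly totals rise (1590, 1720, 1850) and the last periods of each year are consistently the highest, which is the kind of pattern the questions above ask you to look for.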

Example 3
A question of great interest is how the distribution of family income in the United States changed during the last 20 years. The file “Income.xls” contains data for a sample of 499 family incomes (in real 1995 dollars). For each family, the 1975 and 1995 incomes are listed. (Although these data are fictitious, they are consistent with what actually happened to U.S. family income during these years.) Based on these data, discuss as completely as possible how the distribution of family income in the United States changed from 1975 to 1995.

A box plot is very useful when we want to depict a large amount of data graphically. It shows the mean, the median, the first and third quartiles, and some other information. When two box plots are drawn side by side, we can also compare the distributions of the two data sets.

The keys to understanding a boxplot are the following.

• The left and right edges of the box are at the first and third quartiles. Therefore, the length of the box equals the interquartile range (IQR), and the box itself represents the middle 50% of the observations.
• The vertical line inside the box indicates the location of the median. The point inside the box indicates the location of the mean (some software does not show the mean).
• Horizontal lines (whiskers) are drawn from each side of the box to represent the observations that are either below the first quartile or above the third quartile.
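The numbers a box plot draws can also be computed directly. Programs use slightly different quartile conventions; this sketch uses the `method="inclusive"` option of Python's `statistics.quantiles`, which interpolates between order statistics in the same way as Excel's QUARTILE function. Whether PHStat's box plot uses the same convention is not assumed here. The incomes are invented.

```python
from statistics import mean, quantiles


def box_plot_summary(data):
    """Return the numbers a box plot displays: five-number summary, mean, and IQR."""
    values = sorted(data)
    # 'inclusive' interpolates between order statistics, as Excel's QUARTILE does.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "minimum": values[0],
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "maximum": values[-1],
        "mean": mean(values),
        "iqr": q3 - q1,  # length of the box
    }


# Hypothetical incomes, for illustration only.
incomes = [12, 18, 25, 31, 40, 52, 70, 95, 150]
for name, value in box_plot_summary(incomes).items():
    print(f"{name:>15}: {value}")
```

For these made-up values the box runs from 25 to 70 around a median of 40, while the mean (about 54.8) sits well to the right of the median and the right whisker (70 to 150) is much longer than the left (12 to 25): the typical picture of right-skewed data.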

Based on the following box plot, what can you say about the change in U.S. family income from 1975 to 1995?

[Figure: Income Distribution, 1975 vs. 1995. Side-by-side box plots of Year_1975 and Year_1995 on a common horizontal axis from 0 to 350.]

Five-number Summary
                  Year_1975   Year_1995
Minimum                   7           4
First Quartile           17          16
Median                   32          30
Third Quartile           54          56
Maximum                 275         318
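To put numbers on the visual comparison, the interquartile range (spread of the middle 50% of families) and the range (spread of all families) can be computed from the five-number summaries in the table. This is only an arithmetic aid; the values are copied from the table, and interpreting them is left to you.

```python
# Five-number summaries copied from the table above.
summary = {
    "Year_1975": {"min": 7, "q1": 17, "median": 32, "q3": 54, "max": 275},
    "Year_1995": {"min": 4, "q1": 16, "median": 30, "q3": 56, "max": 318},
}


def spreads(s):
    """Return (interquartile range, range) for one five-number summary."""
    return s["q3"] - s["q1"], s["max"] - s["min"]


for year, s in summary.items():
    iqr, data_range = spreads(s)
    print(year, "IQR =", iqr, "range =", data_range, "median =", s["median"])
```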

Example 4
The file “Grade.xls” contains the final exam grades of a business statistics class. Generate a stem-and-leaf plot, a frequency table, and a histogram to show the grade distribution.

To create the frequency distribution table and the histogram, we use Excel’s PivotTable and PivotChart Report. The results are shown below:

Count of ID
Grade         Total
<50               2
50-59             3
60-69             9
70-79            22
80-90            16
>90               3
Grand Total      55

[Figure: Grade Distribution. PivotChart histogram of Count of ID (0 to 25) by Grade class, <50 through >90.]
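The frequency table amounts to assigning each grade to a class and counting. In this sketch the 55 grades are read back from the stem-and-leaf plot given later in this example, the class boundaries follow the table above, and whole-number grades are assumed. The row of asterisks gives a rough text histogram.

```python
from collections import Counter

# The 55 final-exam grades, read back from the stem-and-leaf plot in this example.
STEM_LEAVES = {
    4: "99",
    5: "379",
    6: "235677899",
    7: "0012334445556667788999",
    8: "0001123344666779",
    9: "122",
}
grades = [10 * stem + int(leaf) for stem, leaves in STEM_LEAVES.items() for leaf in leaves]


def grade_class(grade):
    """Assign a whole-number grade to one of the classes used in the table above."""
    if grade < 50:
        return "<50"
    if grade <= 59:
        return "50-59"
    if grade <= 69:
        return "60-69"
    if grade <= 79:
        return "70-79"
    if grade <= 90:
        return "80-90"
    return ">90"


ORDER = ["<50", "50-59", "60-69", "70-79", "80-90", ">90"]
freq = Counter(grade_class(g) for g in grades)
for label in ORDER:
    print(f"{label:>6} {freq[label]:3d} {'*' * freq[label]}")  # text histogram
print("Grand Total", sum(freq.values()))
```

The counts reproduce the PivotTable column: 2, 3, 9, 22, 16, 3, for a total of 55.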

To create the stem-and-leaf plot, we use PHStat. From the menu bar, select PHStat/Descriptive Statistics/Stem-and-Leaf Display, then follow the instructions that appear. The stem-and-leaf plot is shown below:

Stem-and-Leaf Plot
Stem unit: 10

Statistics
Sample Size          55
Mean            75.0545
Median               76
Std. Deviation  10.0819
Minimum              49
Maximum              92

Stem  Leaves
4     99
5     379
6     235677899
7     0012334445556667788999
8     0001123344666779
9     122

How do you interpret this stem-and-leaf plot? For example, how many students got less than 60? How many
got higher than 90?
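A stem-and-leaf display splits each value into a stem (here the tens digit, since the stem unit is 10) and a leaf (the units digit), then lists the sorted leaves on each stem's row. The sketch below rebuilds the display and the accompanying statistics from the 55 grades; because those grades were read back from the plot itself, this is a round-trip check rather than new data. The standard deviation is the sample (n - 1) version, which matches the value reported above.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# The 55 grades in sorted order, read back from the plot above.
grades = [
    49, 49,
    53, 57, 59,
    62, 63, 65, 66, 67, 67, 68, 69, 69,
    70, 70, 71, 72, 73, 73, 74, 74, 74, 75, 75, 75, 76, 76, 76, 77, 77, 78, 78, 79, 79, 79,
    80, 80, 80, 81, 81, 82, 83, 83, 84, 84, 86, 86, 86, 87, 87, 89,
    91, 92, 92,
]


def stem_and_leaf(values, stem_unit=10):
    """Split whole numbers into stem and leaf; return {stem: 'sorted leaves'}."""
    rows = defaultdict(str)
    for v in sorted(values):
        stem, leaf = divmod(v, stem_unit)
        rows[stem] += str(leaf)
    return dict(rows)


for stem, leaves in stem_and_leaf(grades).items():
    print(stem, leaves)

# The statistics shown next to the plot.
print("Sample Size", len(grades))
print("Mean", round(mean(grades), 4))
print("Median", median(grades))
print("Std. Deviation", round(stdev(grades), 4))
print("Minimum", min(grades), "Maximum", max(grades))
```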

Using Excel’s AutoFilter


Excel’s AutoFilter tool allows us to perform simple queries on an existing Excel database with almost no effort. Details on how to use AutoFilter are given in Chapter 4 of the textbook; you can also read Excel’s help file. To use AutoFilter, make sure the cursor is anywhere within the database and select the Data/Filter/AutoFilter menu item. A drop-down arrow immediately appears next to each field name in the database. Clicking one of these arrows produces a list of filtering options for that field.
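An AutoFilter query keeps only the records that meet the criteria chosen in the drop-down lists. As a rough Python analogue, assuming hypothetical records and field names (not the Beta.xls headers), an exact-match filter and a custom condition look like this:

```python
# Hypothetical employee records; field names are assumptions.
employees = [
    {"id": 1, "gender": 1, "education": 6, "annual_salary": 48000},
    {"id": 2, "gender": 0, "education": 6, "annual_salary": 57000},
    {"id": 3, "gender": 1, "education": 2, "annual_salary": 17500},
    {"id": 4, "gender": 1, "education": 8, "annual_salary": 91000},
]


def auto_filter(records, **criteria):
    """Keep records whose fields equal every given value (all criteria must hold)."""
    return [r for r in records if all(r[field] == value for field, value in criteria.items())]


women_with_six_years = auto_filter(employees, gender=1, education=6)
print([r["id"] for r in women_with_six_years])

# A custom condition, similar in spirit to a custom AutoFilter criterion.
high_earners = [r for r in employees if r["annual_salary"] > 50000]
print([r["id"] for r in high_earners])
```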

Getting Summary Statistics


Getting Summary Statistics with Excel:
• Select Tools/Data Analysis. If you can’t find Data Analysis in the Tools menu, you need to load it first: select Tools/Add-Ins, check Analysis ToolPak, and press OK. The Data Analysis option should now appear in the Tools menu.
• Select Descriptive Statistics and press OK.
• Enter all necessary information in the pop-up window.

Using the file Beta.xls again, the summary statistics for “salary” are shown below:
Salary

Mean 47722.77
Standard Error 3351.308
Median 41081.5
Mode #N/A
Standard Deviation 24166.62
Sample Variance 5.84E+08
Kurtosis -0.05775
Skewness 0.877974
Range 94914
Minimum 14371
Maximum 109285
Sum 2481584
Count 52
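Most entries in the ToolPak table can be reproduced with Python's `statistics` module. This sketch assumes Excel's sample-adjusted formulas for skewness (SKEW) and kurtosis (KURT) and reports no mode when no value repeats, as the #N/A above does. It is checked here only on a tiny made-up list; running it on the Beta.xls salary column should reproduce the table above up to rounding if the assumed formulas match Excel's.

```python
import math
from collections import Counter
from statistics import mean, median, stdev, variance


def descriptive_statistics(data):
    """Sketch of the Analysis ToolPak "Descriptive Statistics" table.

    Skewness and kurtosis use the sample-adjusted formulas assumed for Excel's SKEW and KURT.
    Needs at least 4 values that are not all equal (the kurtosis formula divides by n - 3).
    """
    n = len(data)
    m = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 in the denominator)
    z = [(x - m) / s for x in data]
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    value, freq = Counter(data).most_common(1)[0]
    return {
        "Mean": m,
        "Standard Error": s / math.sqrt(n),
        "Median": median(data),
        "Mode": value if freq > 1 else None,  # no repeated value: Excel shows #N/A
        "Standard Deviation": s,
        "Sample Variance": variance(data),
        "Kurtosis": kurtosis,
        "Skewness": skewness,
        "Range": max(data) - min(data),
        "Minimum": min(data),
        "Maximum": max(data),
        "Sum": sum(data),
        "Count": n,
    }


for name, value in descriptive_statistics([1, 2, 3, 4, 5]).items():
    print(f"{name:<20} {value}")
```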
