5 Summarizing Data

The document discusses exploring and summarizing data. It describes sorting and filtering tables in Excel and Python to gain insights from data. Simple summary statistics like counts, averages, maximums and minimums are calculated in Excel using a total row and in Python using methods. Exploring extremes and outliers helps assess data quality issues. Summarizing subsets of data filtered on variables allows comparing groups.

Uploaded by

akmam.haque


5 Exploring Data – Summarizing Data

5.1 Sorting and Filtering a Table

When confronted with a large mass of information, we usually start by trying to condense it.
Summarizing data may start with grouping similar values together, but we will also want to find
simple “landmark” values that describe important characteristics of the data set.

However, before summarizing the data, it is advisable to simply explore the data to see what is
there. Play with the data as if you were in a sandbox, before you start forming opinions about
the data and prematurely drawing conclusions.

Exploring in Excel Tables

If your data is in a Table, then there are some functions that make basic exploration easy. In the
header row at the top of the Table, each variable name is followed with a down arrow. If you
click on it you can see that you can sort or filter your data. The first two sorts (smallest to largest
and largest to smallest for numeric variables and A to Z vs Z to A for text) are self-explanatory.

Open the file Student Survey Data 2010.xlsx.

Figure 5-1 Sorting and Filtering a numeric variable in a Table

Figure 5-2 Sorting and Filtering a text variable in a Table

The third is labeled Sort by Color. This is a custom sort. You can sort at multiple levels. That is,
sort students by their Program, then within each Program you can sort by Gender, then among
each Gender-Program combination you can sort by GPA,… Note that the whole file is sorted, not
simply the variable that you selected. Normally in Excel, you have to select the data you wish to
sort.

Figure 5-3 Custom Sort

Figure 5-4 Custom Sort dialog box

You can also Filter your data. This could be by selecting the specific values you wish to examine,
or by using a variety of functions to select ranges of values. You can do a Custom Filter that
combines ranges by using logical operators, such as OR and AND. You can filter on several
different variables separately. This is useful when you want to explore specific sub-groups in
your data.
Suppose that we want to look just at the Business students. We can click on the down arrow by
Program and uncheck all programs except Business.

Maybe we want only the International Business students. Click on the down arrow by Home and
select International.

We can continue selecting just Females or just 1st year students. We can also use more complex
text filters. Or we can select just those that expect to earn over $50,000, or between $40,000
and $80,000…

Remember to take filters off to return to the full data set.

We can also Sort the data in a similar fashion. If you select Sort by Color you can then choose
Custom Sort, which allows you to sort by several variables concurrently.

When first exploring data, doing a simple sort on each numeric variable can be insightful. For
example, sort Salary, either smallest to largest or largest to smallest.

Look at the smallest values. Someone expects to earn $0? $500? $9,450?

Look at the largest values. Someone expects to earn $650,000? $560,000?

Are these students crazy or are these data entry errors? Looking at the extremes will give you
some sense of data quality issues. Should we keep these values when we do our analyses?
Maybe the $650,000 was supposed to be $65,000, but do we know for sure? Maybe we should
investigate this record more closely. Are there other responses for this student that may give us
insights? Maybe the student gave all erroneous answers and we should discard all responses for
the student.

Notice how many gave no answer. Why? Should we be concerned?

When you are done sorting, it is recommended that you return your data to the original sort
order. Initially, the data was likely sorted by the record ID. Re-sort the Table by ID, smallest to
largest. If your data set does not have an ID column, it is recommended that you create one.
Without a record ID column that is initially set as smallest to largest, you do not have any way to
restore your original sort order.

As we explore our data and come to understand it, we may start asking ourselves questions
about it.

Sorting and Filtering in Python

Like most actions on “objects” in Python, these are executed by using “methods” attached to
the object. If we wanted to sort the data frame by Program2 and then by Salary,
we would give the following instruction.
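
A minimal sketch of such a sort, assuming the data frame is named df_Grad_Exp (the name used later in this chapter). The rows below are made up for illustration and stand in for the survey file.

```python
import pandas as pd

# Made-up rows for illustration; the real survey data is not reproduced here.
df_Grad_Exp = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Program2": ["Business", "Arts", "Science", "Arts", "Business", "Science"],
    "Salary": [55000, 42000, 60000, 38000, 650000, 0],
})

# Sort by Program2 first, then by Salary within each program.
# Both columns are sorted in ascending order by default.
sorted_df = df_Grad_Exp.sort_values(by=["Program2", "Salary"])
print(sorted_df)
```

As in Excel, whole rows move together: the ID column travels with its Program2 and Salary values.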

The default sort is ascending, but this can be changed by changing the ascending setting to
False. Note that we are sorting on Program2 and Salary, so the sort order will be descending
for BOTH of these variables.

The sort_values method does not change the data frame itself; it returns a sorted copy. To make
the sort permanent, either assign the result to a new object, or use the inplace=True option to
overwrite the data frame.
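
The sketch below illustrates a descending sort and the two ways of keeping the result. The data frame and its rows are made up for illustration; only the column names come from the text.

```python
import pandas as pd

# Made-up rows for illustration.
df_Grad_Exp = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Program2": ["Business", "Arts", "Science", "Arts", "Business", "Science"],
    "Salary": [55000, 42000, 60000, 38000, 650000, 0],
})

# Descending on BOTH columns; df_Grad_Exp itself is left untouched.
desc_copy = df_Grad_Exp.sort_values(by=["Program2", "Salary"], ascending=False)

# Option 1: keep the sorted result by assigning it to a new name.
df_by_salary = df_Grad_Exp.sort_values(by="Salary")

# Option 2: overwrite the data frame in place.
df_Grad_Exp.sort_values(by="Salary", inplace=True)
```

If different directions are wanted for each column, ascending also accepts a list with one True/False value per column, for example ascending=[True, False].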

In the examples above, only the first and last 5 rows are displayed. To display the top (head) 10
rows, add the method .head(10)

And if you want just the bottom (tail) 10 rows, add the method .tail(10)

Observe that we can chain “methods” together.
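
A small sketch of chaining, assuming a made-up data frame with 12 rows. The sort is applied first, and head or tail is then applied to the sorted result.

```python
import pandas as pd

# Made-up data: 12 students with salaries from 5,000 to 60,000.
df_Grad_Exp = pd.DataFrame({
    "ID": list(range(1, 13)),
    "Salary": [i * 5000 for i in range(1, 13)],
})

# Ten largest salaries: sort descending, then keep the first 10 rows.
top10 = df_Grad_Exp.sort_values(by="Salary", ascending=False).head(10)

# Ten smallest salaries: same sort, but keep the last 10 rows.
bottom10 = df_Grad_Exp.sort_values(by="Salary", ascending=False).tail(10)

print(top10)
print(bottom10)
```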

To filter cases, we must specify the condition(s) on which we wish to filter. For example, if we
want to select only observations for which Salary exceeded $100,000, we would write
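
One common way to write such a filter in pandas is sketched below. The rows are made up for illustration, and df_Grad_Exp is assumed to be the name of the data frame.

```python
import pandas as pd

# Made-up rows for illustration.
df_Grad_Exp = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Salary": [55000, 42000, 60000, 38000, 650000, 0],
})

# The condition produces a True/False value for every row ...
is_high = df_Grad_Exp["Salary"] > 100000

# ... and using it inside square brackets keeps only the True rows.
high_salary = df_Grad_Exp[is_high]
print(high_salary)
```

The original data frame is not changed, so there is no filter to "take off" afterwards as there is in Excel.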

When you want observations that equal a specific value, such as Age equal to 20 or Program2
equal to Arts, then you must write ==. You can also specify multiple conditions such as
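
A sketch of equality filters and combined conditions, using made-up rows. In pandas, conditions are combined with & (and) and | (or), and each condition needs its own parentheses.

```python
import pandas as pd

# Made-up rows for illustration.
df_Grad_Exp = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Program2": ["Business", "Arts", "Science", "Arts", "Business", "Science"],
    "Age": [20, 20, 22, 21, 23, 20],
    "Salary": [55000, 42000, 60000, 38000, 650000, 0],
})

# Equality uses ==, not a single =.
arts = df_Grad_Exp[df_Grad_Exp["Program2"] == "Arts"]

# Both conditions must hold (AND).
arts_age_20 = df_Grad_Exp[(df_Grad_Exp["Program2"] == "Arts") & (df_Grad_Exp["Age"] == 20)]

# At least one condition must hold (OR).
age_20_or_high = df_Grad_Exp[(df_Grad_Exp["Age"] == 20) | (df_Grad_Exp["Salary"] > 100000)]
```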

Using tab to save typing

Typing out long expressions, such as repeatedly typing df_Grad_Exp, can be simplified by
typing the first one or two letters and then pressing the tab key. Python will respond with a
list of all variables starting with these letters.

If you want to know what methods or functions you can apply to an object, type a period
after the object and then press tab.

Getting help with Python syntax

If you can’t remember the syntax associated with the function, type shift+tab and you will get
a help screen for the function.

5.2 Simple Summary Values

Although sorting gives you a sense of what are small and large values and whether they are just
extreme outliers or common, we also want some other “landmark” measures. Geographical
landmarks assist in giving you a sense of where you are by giving you reference points.
Statistical landmarks serve a similar purpose.

Summary Statistics in Excel

When in a Table, if you select the Design tab at the top of the screen, there is a checkbox labeled
Total Row. Click it.

Figure 5-5 - Adding a Total Row

A row will be added to the bottom of the table. Scroll to the last row. Select the entry in the
Salary column. A down arrow should appear. Click on the arrow.

Figure 5-6 Summary functions in Total Row

Select Count Numbers. 752 students reported a salary value. Look at Average, Max and Min.
The average salary was $52,914.76, with a maximum of $650,000 and a minimum of $0. The
$650,000 appears unrealistic. We noted before when Sorting, that some students gave values
that we thought were suspicious.

Go to the down arrow by Salary in the header row and select Number Filters. Select Between
and then select 20000 and 120000 as your limits. You now get an average of $51,480.40. It
would appear that the extremes at each end did not seriously distort our average.

Click on the ID cell in the Total row, row 813 (cell A813). Click on the arrow and select Count.
811 students attempted the survey. This is all of the students. Click on the Total cell for Year.
802 told us the year they started. Repeat this for Home and for Home2. Why is the count 811 for
Home and 801 for Home2? When we applied VLOOKUP to Home, blanks were coded as #N/A.
This may be a problem.

What happens when you look at Gender and Gender2? 811 and 665? Why? Initially, we used
VLOOKUP to code 0 and 1 as Male and Female, but VLOOKUP treated a blank as zero = Male. We
added an IF statement to treat blanks as blanks. But, Excel coded blanks as blank text. So the
cells are no longer empty and they were counted.

Lesson: Recoding may have unintended consequences.

Question: why did 802 tell us when they started but only 665 tell us gender?

Lesson: Watch out for survey fatigue with long surveys.

Are salary expectations of Business students similar to those of Arts students? We know that we
have some strange salaries reported, so let us Filter on those between $20,000 and $120,000.

Note: do not insert $ and commas. Type 20000 and 120000.

The average salary is $51,480.80.

Now filter on Program and select Business. The average is $52,463.20. Filter on Program and
select Arts. The average is $48,185. If you filter on Science you will see they have the highest
expectations, at $57,774.77.

Although tedious, you can answer many simple questions with the Total row and Filters.

Video: Sorting, Filtering and Total in an Excel Table

https://youtu.be/rs_H0T1sY3U

Summary Statistics in Python

Common summary measures can be found by applying the appropriate method. For example,
mean() will give us the average.

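If you do not specify a column, you will get the mean value for every numeric variable. (In recent
versions of pandas you may need to add numeric_only=True when the data frame also contains
text columns.)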
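
A minimal sketch of both uses of mean(), with made-up rows. The numeric_only=True argument is included because newer pandas releases do not silently skip text columns.

```python
import pandas as pd

# Made-up rows for illustration; one student gave no salary.
df_Grad_Exp = pd.DataFrame({
    "Program2": ["Business", "Arts", "Science", "Arts"],
    "Age": [20, 21, 22, 23],
    "Salary": [50000, 40000, 60000, None],
})

# Mean of one column; missing values are ignored.
avg_salary = df_Grad_Exp["Salary"].mean()

# Mean of every numeric column at once.
all_means = df_Grad_Exp.mean(numeric_only=True)

print(avg_salary)
print(all_means)
```

Other methods such as max(), min() and count() are used in the same way.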

You can obtain all of the most common summary measures by using the describe() method.

We will explain the meaning of each of these measures in the next chapter.
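
As a sketch, describe() can be applied to a single column or to the whole data frame. The values below are made up for illustration.

```python
import pandas as pd

# Made-up salaries; one value is missing.
df_Grad_Exp = pd.DataFrame({
    "Salary": [30000, 40000, 50000, 60000, None],
})

# count, mean, std, min, quartiles and max in one call.
summary = df_Grad_Exp["Salary"].describe()
print(summary)
```

Note that count reports only the non-missing values, much like Count Numbers in the Excel Total Row.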

With string (text) data, we can count the values with value_counts(). This method does not give
us a count of the total observations, but counts the number of observations taking on each
unique value.
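
A short sketch of value_counts() on a text column, using made-up rows. By default, missing values are left out of the counts.

```python
import pandas as pd

# Made-up rows for illustration; the last student gave no program.
df_Grad_Exp = pd.DataFrame({
    "Program2": ["Business", "Arts", "Business", "Science", "Business", "Arts", None],
})

# One count per unique value, largest count first.
program_counts = df_Grad_Exp["Program2"].value_counts()
print(program_counts)
```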

This last example is a good segue into our next topic of aggregating data.

5.3 Aggregating Data with Frequency Distributions

A frequency distribution is a tabular summary of how many values are in each group of values.
Suppose you take our breakdown of salaries from very low to unrealistic and count how many fit
each description.

Table 5-1 - Proposed grouping for Expected Salary data

Usually, frequency distributions do not have text descriptions, just the ranges of values.

We have looked at how to group data by making numeric values into categorical values. In
Excel, this was done with VLOOKUP using an approximate match. In Python, we used the cut
function in a similar manner. Both methods create categorical variables, but do not
automatically make a summary like the table above.

To build a frequency distribution, you must group the data.

What are the “best practices” in grouping?

The objectives for grouping data are to:
• Make the summary concise.
• Make the summary informative.

What does this mean? How do we do it well?

• Don’t have too many groups – not concise – normally limit to a maximum of 20.
• Don’t have too few groups – loss of information – have at least 4-5.
• Groups should be unambiguous (should not overlap so each value fits in only one group).
• The summary should include all the data.
• Use “nice” numbers. We like multiples of 2, 5, 10, 25, 50, 100 and not awkward starting or
ending values, such as 13 or 27 or 146.3.
• It is easier to compare groups that are of “equal size”. The proposed groups above went in
increments of $20,000, but “very high” had a range of $40,000. This is NOT a good practice.
• To satisfy all of the above, the 1st and/or last group may need to be larger to capture
outliers.

Frequency Distributions in Excel – Histograms

A histogram is a graphical depiction of a frequency distribution. There are several ways to create
a histogram in Excel. We will look at two methods in this section and a third method when we
look at pivot tables and charts.

If you wish to set up groupings of different sizes, then you will have to use the Analysis ToolPak
add-in in Excel.

Installing the Analysis ToolPak

In Excel for Windows:
• Go to the File tab.
• Select Options (bottom of list).
• Select Add Ins (near bottom of list).
• Select Manage |Excel Add Ins| Go (at the bottom of the screen).
• Click on Analysis ToolPak.

Figure 5-7 - Installing the Analysis ToolPak in Excel for Windows

If you are using Office 365 in the cloud, then installation is somewhat more complicated. The
Analysis ToolPak is not included in 365. Go to Insert and then select Add-Ins.

Figure 5-8 Excel 365 Add-ins

Select STORE from the options listed at the top of the pop-up screen. From the list of categories
at the left, select Data Analytics. Scroll down the list of suggestions until you see XLMiner.

Figure 5-9 XLMiner Add-in

The XLMiner add-in is free and behaves in a very similar way to the Analysis Toolpak.

If you are using a Mac, then you must go through a similar process to the above. You may need
to google to find installation instructions.

Using the Histogram Function

To use the Histogram function in Excel, you must define your groups. Excel calls the groups
“bins”.

Since the end of one bin is the start of the next, we only need to define the end of the bin. Note
that this differs from VLOOKUP that required us to define the start of each group. (confusing!)

Let us use a bin width of $10,000 and define bins up to $120,000. To keep our data sheet clean,
we recommend that you put the bins on a separate sheet, such as your LookUp Table
worksheet.

In column R, put a heading “Salary bins” in the 1st row and then enter 10000, 20000, 30000, … in
the 2nd, 3rd, 4th, … rows. Note: do not use commas. Write 10,000 as 10000.

Figure 5-10 - Salary bins for building Histograms

Go to the Data tab and on the far right, you should see Data Analysis. Click on it. A pop up
window appears. Select Histogram.

Figure 5-11 - Histogram function

A dialog box opens.

Figure 5-12 - Histogram dialog box

The Input Range is the location of the data to be summarized, $L$1:$L$812.

The Bin Range is the location of the bins, 'Lookup Tables'!$R$1:$R$13.
Our columns have Labels, so check this box.
We want Chart Output, so check this box.
Ask for a New Worksheet, so we do not add clutter to our data sheet.

Figure 5-13 - Histogram output

The histogram is a simple column chart. You can format the labels on the chart to make it more
attractive. It is common practice to remove the space between columns in a histogram.

Click on any column and then right click.

Select Format Data Series.

Figure 5-14 - Formatting the histogram

To the right are several formatting options. Change the Gap Width to 0.

Figure 5-15 - Histogram without gaps

The More group on the right is deceptive. Excel makes all columns the same width, but this
group actually represents values from $120,000 to $650,000! The tail on the right should be
very long, but that would distract us from looking at where most of the data is.

Figure 5-16 - Histogram with 65 bins

Don’t be distracted by outliers. They are important, but not the focus.
It would be nice to filter out extreme values. If you apply a filter and then construct your
histogram, Excel ignores the filter! The only way to filter data and then make a histogram is to
• Apply the filter.
• Copy the filtered data to a new sheet.
• Build the histogram using the copied data.

Video: Creating a Frequency Distribution and Histogram in Excel

https://youtu.be/3ysPbcSqDXE

Use the Chart Tools to create a Histogram

The most recent version of Excel has an alternate way to construct histograms.
• Select the Salary column.
• Go to Insert and select Recommended Charts.
• You may see an image of a histogram, but if not, select All Charts and select Histogram
from the list.

Figure 5-17 Histogram Chart

The result is very crude.

Figure 5-18 Histogram Chart Output

But, we can improve this significantly. Right click anywhere on the horizontal axis. Select
Format Axis. A dialog box appears. You will see that Automatic is checked. Let us select Bin
Width and enter 10000 (do not type it with a comma, as 10,000). We would also like to limit the number of
bins, but Excel only allows you to fix one setting.

Figure 5-19 Histogram Chart with Bin width set

The advantage of using an Excel Chart instead of the Analysis Toolpak is that we can use the
Filtering tools in Table. Select the down arrow beside Salary and select Number Filters and then
select Less than or equal to. Enter 120000.

Figure 5-20 Histogram Chart with Filtered values

This now looks similar to what we obtained with the Histogram function in the Analysis
Toolpak. As a Chart, we can change the title and label the axes. In general, it is recommended
that all charts have informative titles and labelled axes.

Figure 5-21 Formatted Histogram Chart

Frequency Distributions and Histograms in Python

As noted in a previous section, the value_counts() method will count the number of
observations taking a specific value of a categorical variable. For example, with Program2, we
had

If you wish to obtain relative frequencies, then set normalize=True.

Too many decimals are distracting and not informative. To round the results to 2 decimal places,
add the method round(2).
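
A sketch of relative frequencies rounded to 2 decimal places, using made-up rows in place of the survey data.

```python
import pandas as pd

# Made-up rows for illustration: 3 Business, 2 Arts, 1 Science.
df_Grad_Exp = pd.DataFrame({
    "Program2": ["Business", "Arts", "Business", "Science", "Business", "Arts"],
})

# normalize=True returns proportions instead of counts; round(2) trims the decimals.
rel_freq = df_Grad_Exp["Program2"].value_counts(normalize=True).round(2)
print(rel_freq)
```

Because of rounding, the displayed proportions may not add up to exactly 1.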

With numeric data, we saw that we can use the cut function to change numeric data to
categories.

In the example we looked at before, we defined the bins with a series of upper limits for the
bins.

This looks very confusing because, by default, value_counts sorts the results from the largest
count to the smallest. We can set sort=False to turn off this sorting, so that the bins are not
reordered by their counts.
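
The sketch below builds a frequency distribution with cut and user-defined bin limits. The salaries and the bin edges are made up for illustration. Calling sort_index() afterwards is one explicit way to list the bins from lowest to highest.

```python
import pandas as pd

# Made-up salaries for illustration.
salary = pd.Series(
    [15000, 25000, 27000, 45000, 52000, 58000, 61000, 95000, 650000],
    name="Salary",
)

# Bin limits: the first value is the lower bound, the rest are upper limits.
edges = [0, 20000, 40000, 60000, 80000, 100000, 700000]
groups = pd.cut(salary, bins=edges)

# Default: largest count first, so the bins appear out of order.
by_count = groups.value_counts()

# Counts without sorting by size, then ordered by bin.
in_bin_order = groups.value_counts(sort=False).sort_index()

print(by_count)
print(in_bin_order)
```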

We can also simply specify the number of bins we want and let Python create equal sized bins.
You lose control over whether the end points to the bins are “attractive” values and outliers can
seriously distort the groups. For example, if we were to split the Salary data into 20 bins, we
would get

Almost all the data is in the first 4 groups.
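
A sketch of the same idea with made-up salaries: passing a number to bins asks cut for that many equal-width bins between the smallest and largest value, so a single extreme value stretches every bin.

```python
import pandas as pd

# Made-up salaries for illustration, including one extreme value.
salary = pd.Series(
    [15000, 25000, 27000, 45000, 52000, 58000, 61000, 95000, 650000],
    name="Salary",
)

# 20 equal-width bins chosen by pandas, listed from lowest to highest.
counts20 = pd.cut(salary, bins=20).value_counts().sort_index()
print(counts20)
```

With these values, each bin is $31,750 wide, the end points are not "nice" numbers, and most bins are empty.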

To turn a frequency distribution into a histogram, we can use the plot method in Pandas, or we
can load Python’s library of plotting functions.

The quick approach would be to ask for a bar chart of the frequency distribution. Below is an
example with bins with widths of $10,000 and the last bin being all values above $100,000. We
can add axis labels and a title.
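
A sketch of the quick approach, assuming matplotlib is installed (pandas uses it for plotting). The salaries, labels and title are made up for illustration.

```python
import pandas as pd

# Made-up salaries for illustration.
salary = pd.Series(
    [0, 18000, 25000, 32000, 38000, 41000, 45000, 47000, 52000,
     55000, 58000, 63000, 67000, 74000, 88000, 120000, 650000],
    name="Salary",
)

# Bins of width 10,000 up to 100,000, plus one open-ended bin for everything above.
edges = list(range(0, 100001, 10000)) + [float("inf")]
freq = pd.cut(salary, bins=edges, include_lowest=True).value_counts().sort_index()

# Bar chart of the frequency distribution, with axis labels and a title.
ax = freq.plot.bar()
ax.set_xlabel("Expected salary ($)")
ax.set_ylabel("Number of students")
ax.set_title("Expected salary")
# In a notebook the chart is displayed automatically;
# in a script, import matplotlib.pyplot as plt and call plt.show().
```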

If you plan to let Python build your bins, then you do not need to create categories and frequency
distributions to get a histogram. You can ask for a histogram directly. If you don’t specify the
number of bins, Python uses 10 bins. The example below uses 50 bins.
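
A minimal sketch of a direct histogram, assuming matplotlib is installed. The salaries are generated from a simple made-up pattern for illustration.

```python
import pandas as pd

# Made-up salaries: 100 values from 20,000 to 119,000 in steps of 1,000.
salary = pd.Series([20000 + 1000 * i for i in range(100)], name="Salary")

# Ask for the histogram directly; pandas builds 50 equal-width bins.
ax = salary.plot.hist(bins=50)
ax.set_xlabel("Expected salary ($)")
ax.set_title("Expected salary")
# In a script, import matplotlib.pyplot as plt and call plt.show() to display the chart.
```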

Frequency Distributions versus Histograms – a Table or a Chart

Before leaving this topic, we must ask why we want frequency distributions or histograms.
The rationale for a frequency distribution may be very different from that for a histogram. It
goes to the question of why/how we use tables versus charts.

In a table, we are trying to summarize information while still retaining some detail. We are
interested in how the groups (bins) are defined and the percentage of values within a group.
Readability is important.

In a chart, we are visually looking for patterns. You “read” a table, but “feel” a chart. In most
instances, you do not look at the scales on the chart axes. You don’t care if the vertical axis
shows frequency or percentage. But you also don’t pay attention to the labeling of the
horizontal axis. You may look for a sense of where the middle is, and what are low and high
values, but the exact numbers are not important. You get a sense of shape – symmetry or
skewness.

Although we present frequency distributions and histograms as a single topic, they serve very
different purposes. Defining groups is very important for frequency distributions (tables), but
may not be important for histograms (charts). This distinction in how we mentally process
information will be revisited several times throughout the text.

5.5 Comparing Histograms in Excel

What if we wanted to compare the pattern of salaries for Arts students to Business students?
This might give deeper insight than just looking at averages as done before. You will need to
Filter on Business, then copy the data to a new sheet. Then you would do it for Arts and copy.
Then again for Science.

You could then construct 3 histograms and compare them. But this comparison is hard to do
visually.

Figure 5-22 - Comparing Histograms

How do you compare these distributions? Arts and Business samples are similar size, but Science
is less than half the size. If comparing frequency distributions or histograms, you need
comparable scales. Convert each frequency to percentage of the total sample size. We call these
relative frequencies.

We must take the three frequency distributions and combine them in a single table, with a
column for each of Arts, Business and Science. Then create three new columns of relative
frequencies, one per program, by dividing each frequency by the corresponding sample size.
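
The same calculation can be sketched in Python for readers who prefer it to the Excel steps. This is only an illustration with made-up rows and bin limits; crosstab with normalize="columns" divides each count by its column (program) total.

```python
import pandas as pd

# Made-up rows for illustration.
df_Grad_Exp = pd.DataFrame({
    "Program2": ["Arts", "Arts", "Arts", "Arts", "Business", "Business"],
    "Salary": [30000, 45000, 50000, 55000, 35000, 65000],
})

# Group salaries into bins (hypothetical limits).
salary_group = pd.cut(df_Grad_Exp["Salary"], bins=[0, 40000, 60000, 80000])

# Rows are salary bins, columns are programs, cells are relative frequencies.
rel_freq = pd.crosstab(salary_group, df_Grad_Exp["Program2"], normalize="columns")
print(rel_freq.round(2))
```

Each column adds up to 1, so programs with different sample sizes can be compared on the same scale.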

Figure 5-23 Frequencies and Relative Frequencies

Select the relative frequency columns and Insert a Clustered Column Chart.

Figure 5-23 Relative Frequencies as a Clustered Column Chart

Format the chart with an informative title and labelled axes.

Figure 5-24 Salary Expectations by Program

This chart allows you to more easily compare the patterns among the programs. We will see in
the next chapter an alternative way to summarize data and build charts that is more flexible
than using the Histogram function.

In a later chapter, we will look at a graphical technique that is better suited to comparing the
distributions of many groups. Although histograms are simple charts, they still contain too much
detail for comparing multiple groups.

5.6 Common Shapes of Histograms (Distributions)

What are these distributions and graphs showing us?
• Where are most values located?
• What are small (large) values?
• Are there many small (large) values?
• Are values widely scattered or closely bunched?
• Are there lots of stragglers at the top or bottom?
• Is the distribution symmetrical? Bell shaped?
• Is the distribution skewed with a long tail to the left? or the right?

Classic patterns include the Bell shape with balanced tails on left and right – most common when
summarizing averages. This curve is known as the Normal Distribution and is the basis of much
of statistical analysis.

Figure 5-25 - Normal Distribution

Skewed distributions with a long tail to left or right are very common. Long right tails are
common with financial data.

Figure 5-26 - Skewed distributions

Distributions that are lumpy are often ones that describe two distinct groups. For example, if we
were to construct a histogram of summer earnings (q10 in the original data set) and we had the
first group ending in zero, we would get

Figure 5-27 - Lumpy distribution

The first group represents those students who did not have a summer job and so earned
nothing, and the rest of the distribution is those with jobs. What causes the spikes? Most
students rounded their estimate to the nearest $1,000, but the chart has groupings in $500
increments. A different grouping would result in a smoother histogram. However, unless your
sample is extremely large, most histograms have some lumpiness due to randomness in the
observations. This is to be expected and is not a deficiency in the quality of the data.

Image Citations:
Figures 5-1 to 5-24: Images courtesy of author using Microsoft Excel
Figure 5-25: Image courtesy of Shishirdasika under CC BY-SA 3.0
Figures 5-26 to 5-27: Images courtesy of author using Microsoft Excel


Open the file Student Survey Data 2010.xlsx.

Figure 5-1 Sorting and Filtering a numeric variable in a Table

5 Exploring Data – Summarizing Data Page 1 of 29

Figure 5-3 Custom Sort

Remember to take filters off to return to the full data set.

Look at the largest values. Someone expects to earn $650,000? $560,000?

Notice how many gave no answer. Why? Should we be concerned?

Sorting and Filtering in Python

Observe that we can chain “methods” together.
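
Chaining can be sketched as follows, assuming the data has been loaded into a pandas DataFrame. The column names (Gender, ExpectedSalary) and the values are made up for illustration and are not the actual survey data.

```python
import pandas as pd

# Hypothetical stand-in for the survey data; names and values are invented.
survey = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "ExpectedSalary": [45000, 650000, 52000, None, 38000],
})

# Chain three methods: drop missing answers, sort largest first, keep three rows.
top_salaries = (
    survey
    .dropna(subset=["ExpectedSalary"])
    .sort_values("ExpectedSalary", ascending=False)
    .head(3)
)

# Filter with a Boolean condition, then chain a sort onto the filtered result.
women = survey[survey["Gender"] == "F"].sort_values("ExpectedSalary")

print(top_salaries)
print(women)
```

Each method returns a new DataFrame, which is why the next method can be applied directly to the result. The original table is left unchanged, so there is no filter to remove afterwards.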

Using tab to save typing

5.2 Simple Summary Values

Summary Statistics in Excel

Figure 5-5 - Adding a Total Row

Lesson: Recoding may have unintended consequences.

Lesson: Watch out for survey fatigue with long surveys.

Note: do not type $ signs or commas. Type 20000 and 120000.

The average salary is $51,480.80.

Video: Sorting, Filtering and Total in an Excel Table

Summary Statistics in Python
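
A minimal sketch of the same kind of summary values (count, average, maximum, minimum) using pandas methods. The salary figures below are invented for illustration, and the variable name salary is an assumption, not the survey's actual column.

```python
import pandas as pd

# Hypothetical expected-salary answers; None marks a respondent who gave no answer.
salary = pd.Series([45000, 52000, 38000, None, 60000], name="ExpectedSalary")

n_answers = salary.count()        # counts non-missing values only
average = salary.mean()           # missing values are skipped by default
largest = salary.max()
smallest = salary.min()
n_missing = salary.isna().sum()   # how many gave no answer

# describe() reports count, mean, std, min, quartiles and max in one call.
summary = salary.describe()

print(n_answers, average, largest, smallest, n_missing)
print(summary)
```

Note that the count and the average ignore the missing answer, so it is worth checking the number of missing values separately before interpreting the average.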

Table 5-1 - Proposed grouping for Expected Salary data

To build a frequency distribution, you must group the data.

What are the “best practices” in grouping?

What does this mean? How do we do it well?
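
Grouping values into a frequency distribution can be sketched with pandas as follows. The salaries and the 20,000-wide bin edges are assumptions chosen for illustration; they are not the grouping proposed in Table 5-1.

```python
import pandas as pd

# Hypothetical expected salaries (not the survey data).
salary = pd.Series([18000, 25000, 42000, 47000, 55000, 61000, 78000, 120000])

# Equal-width, non-overlapping groups: 0, 20000, ..., 140000.
edges = list(range(0, 160000, 20000))

# right=False makes each group include its lower edge and exclude its upper edge,
# so a value such as 40000 falls in exactly one group.
groups = pd.cut(salary, bins=edges, right=False)

# Count the values in each group, listed in bin order (empty groups show as 0).
freq = groups.value_counts().sort_index()

print(freq)
```

Equal-width groups that cover every value exactly once keep the counts comparable from one group to the next.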

Installing the Analysis ToolPak

Figure 5-7 - Installing the Analysis ToolPak in Excel for Windows

Figure 5-9 XLMiner Add-in

Figure 5-10 - Salary bins for building Histograms

Figure 5-12 - Histogram dialog box

The Input Range is the location of the data to be summarized, here $L$1:$L$812.

Click on any column and then right click.

Figure 5-14 - Formatting the histogram

Figure 5-16 - Histogram with 65 bins

Use the Chart Tools to create a Histogram

Figure 5-17 Histogram Chart

Figure 5-18 Histogram Chart Output

Figure 5-19 Histogram Chart with Bin width set

Figure 5-21 Formatted Histogram Chart

If you wish to obtain relative frequencies, then set normalize=True.
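
As an illustration, assuming the sentence above refers to the pandas value_counts method, the sketch below compares raw counts with relative frequencies. The year-of-study answers are made up.

```python
import pandas as pd

# Hypothetical categorical answers (not the survey data).
year = pd.Series(["First", "Second", "First", "Third",
                  "First", "Second", "First", "Third"])

counts = year.value_counts()                 # frequencies
shares = year.value_counts(normalize=True)   # relative frequencies, summing to 1

print(counts)
print(shares)
```

Relative frequencies are the counts divided by the total number of non-missing answers, which makes groups of different sizes easier to compare.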

Almost all the data is in the first 4 groups.

5.5 Comparing Histograms in Excel

Figure 5-23 Relative Frequencies as a Clustered Column Chart

Format the chart with an informative title and labelled axes.

5.6 Common Shapes of Histograms (Distributions)

Figure 5-26 - Skewed distributions
