Analytics On Spreadsheets and Data Visualization: Prof M.Shashi

Analytics on Spreadsheets and Data Visualization
Prof M.Shashi
Purchase orders Data in a Spreadsheet
Excel functions for Aggregates
• Functions are used extensively in spreadsheets to
calculate additional metrics or quantities from the
data available in the cells.
• All Excel formulas start with an "=" sign, followed by the
function name and its list of arguments in parentheses.
• For example, the formula =MIN(F4:F97) in cell B99 returns 90.
Example Conditional formulae
• Conditional functions such as SUMIF, AVERAGEIF, SUMIFS and
COUNTIFS embed IF logic within math functions.
• Syntax:
=SUMIF(range, criterion, [sum_range]), where sum_range is
optional and lets the sum be taken over a different range, as in
=SUMIF(D4:D97, "Airframe fasteners", G4:G97)
=SUMIFS(sum_range, range1, criterion1, range2, criterion2,
..., rangeN, criterionN), where SUM is applied to the cells
of sum_range whose rows satisfy all of the criteria, as in
=SUMIFS(F4:F97, A4:A97, "Alum Sheeting",
D4:D97, "Airframe fasteners")
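The logic of SUMIF and SUMIFS can be sketched outside Excel as follows. This is a minimal illustration only: the rows, the name "Supplier B" and the helper names sumif and sumifs are hypothetical, and the sketch uses exact matching, whereas Excel matches text criteria without regard to case.

```python
# Hypothetical purchase-order columns (not the actual spreadsheet data).
suppliers = ["Alum Sheeting", "Alum Sheeting", "Supplier B"]
items = ["Airframe fasteners", "Gasket", "Airframe fasteners"]
quantities = [100, 50, 200]
costs = [425.0, 210.0, 850.0]

def sumif(criteria_values, criterion, sum_values):
    """Add sum_values[i] wherever criteria_values[i] equals the criterion."""
    return sum(s for c, s in zip(criteria_values, sum_values) if c == criterion)

def sumifs(sum_values, *conditions):
    """Add sum_values[i] only where every (values, criterion) pair matches."""
    total = 0
    for i, value in enumerate(sum_values):
        if all(values[i] == criterion for values, criterion in conditions):
            total += value
    return total

# Like =SUMIF(D4:D97, "Airframe fasteners", G4:G97)
cost_of_fasteners = sumif(items, "Airframe fasteners", costs)

# Like =SUMIFS(F4:F97, A4:A97, "Alum Sheeting", D4:D97, "Airframe fasteners")
qty_from_alum = sumifs(quantities,
                       (suppliers, "Alum Sheeting"),
                       (items, "Airframe fasteners"))
```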
Formatting Excel data as Tables
Suppose that in the Credit Risk Data table, we wish to calculate the total amount of savings in column
C. We could, of course, simply use the formula =SUM(C4:C428).
However, with a table, we could use the formula =SUM(Table1[Savings]). The table name, Table1, can
be found (and changed) in the Properties group of the Table Tools Design tab. Note that Savings is the
name of the header in column C.
One advantage of doing this is that if we add new records to the table, the calculation is
updated automatically; we don't have to change the range in the formula, and we don't get a wrong
result if we forget to.
As another example, we could find the number of home owners using the formula
=COUNTIF(Table1[Housing], "Own").
Sorting Transaction Data
•Business analytics often involves sorting a table, either to bring a required collection of records together in one place or to locate specific
records with a given detail. The following tables show the sales transaction data before and after sorting: the original table is ordered by
customer ID, and the sorted table is ordered by transaction code so that records can be located by transaction code.

•To sort, select the required column, apply the Sort option available in the Excel ribbon, and choose the specific sort order.
•The original table has the sales transaction records sorted in ascending order of Cust ID. To locate records by
transaction code, select column D and apply the Sort option with ascending order (A to Z) as the
sub-option.
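The same re-sorting step can be sketched in code. The records and field names below are hypothetical stand-ins for the sales transaction table, which starts in customer ID order and is re-sorted by transaction code.

```python
# Hypothetical sales transactions, initially in ascending customer ID order.
transactions = [
    {"cust_id": 10001, "transaction_code": 93816545},
    {"cust_id": 10002, "transaction_code": 74083490},
    {"cust_id": 10003, "transaction_code": 64942368},
    {"cust_id": 10004, "transaction_code": 70560957},
]

# Ascending sort on the transaction code column (the "A to Z" sub-option).
# sorted() returns a new list, so the original order is left untouched.
by_code = sorted(transactions, key=lambda record: record["transaction_code"])

for record in by_code:
    print(record["transaction_code"], record["cust_id"])
```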
Pareto Analysis
• The Pareto principle, often called the "80-20 Rule", is observed
in many business situations.
• It refers to the generic situation in which 80% of some
output comes from 20% of some input.
– e.g.: a large % of inventory value corresponds to a small %
of items
– e.g.: a large % of quality defects stems from a small % of
sources
– e.g.: a large % of sales comes from a small % of customers
• Pareto analysis involves sorting the data in descending order and
calculating the cumulative percentage of the variable of interest.
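A Pareto analysis can be sketched as sort, then accumulate. The inventory values below are hypothetical and were chosen only to make the arithmetic easy to follow; pareto_table is an illustrative helper name.

```python
# Hypothetical inventory value per item.
inventory_value = {"A": 500.0, "B": 300.0, "C": 100.0, "D": 60.0, "E": 40.0}

def pareto_table(values):
    """Return (name, value, cumulative %) rows, largest value first."""
    total = sum(values.values())
    running = 0.0
    rows = []
    for name, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((name, value, 100.0 * running / total))
    return rows

for name, value, cumulative_pct in pareto_table(inventory_value):
    print(f"{name}: {value:8.2f}  cumulative {cumulative_pct:5.1f}%")
```

In this made-up data the first two of the five items already account for 80% of the total value, which is the kind of concentration the analysis is meant to expose.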
Filtering Data in Excel Tables
• For large files, finding a particular subset of records that meet a
specific characteristic by sorting can be tedious.
• Extracting a set of records having certain characteristics is called
Filtering.
• Excel provides two filtering tools:
– AutoFilter for simple criteria
– Advanced Filter for more complex criteria
• Applying a filter to identify orders involving items whose cost is at
least $200 from the purchase orders table:
– Click the drop-down arrow in the Item Cost (E) column, position the
cursor on Number Filters, and then select Greater Than Or Equal To...
from the list.
– This opens the Custom AutoFilter dialog, which accepts up to two
criteria combined using AND or OR logic. Enter 200 in the box and click OK.
– All records having item cost >= $200 are displayed.
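The "greater than or equal to" filter amounts to keeping only the rows that pass a test. A minimal sketch, assuming hypothetical records and field names:

```python
# Hypothetical purchase-order rows; only the item cost matters here.
purchase_orders = [
    {"item": "Airframe fasteners", "item_cost": 4.25},
    {"item": "Control panel", "item_cost": 255.00},
    {"item": "Pressure gauge", "item_cost": 200.00},
    {"item": "Gasket", "item_cost": 3.50},
]

def filter_at_least(records, field, threshold):
    """Keep the records whose field value is >= threshold."""
    return [record for record in records if record[field] >= threshold]

expensive = filter_at_least(purchase_orders, "item_cost", 200)
for record in expensive:
    print(record["item"], record["item_cost"])
```

Note that the row costing exactly 200 is kept, which is the difference between "greater than or equal to" and "greater than".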
Selecting records for Item cost Filtering
Specific Functions used for Business Analytics
• Financial models require calculating the
discounted cash flow, known as Net Present
Value (NPV); Excel provides a built-in
function for the NPV of a stream of cash flows.
– A cash flow of F rupees at time period t in the future has a
lower (discounted) value today because of
• the lost opportunity to earn a return by investing it now
• the risk or uncertainty of not receiving it until later
– Hence future cash flows are discounted at a rate i per period
when estimating NPV: a cash flow F at period t is worth F/(1+i)^t today.
Calculation of Net Present Value (NPV)

E.g.: The fixed cost of introducing a new product into the market is $25,000, which is
incurred just prior to its launch. The forecasted net sales revenues for the next 6
months are shown in C4:H4. Cell B8 finds the NPV using the formula =NPV(B6,C4:H4)-B5; the fixed cost is subtracted outside NPV because it is incurred at launch and is therefore not discounted.
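The calculation behind =NPV(rate, values) can be sketched as below. Excel's NPV discounts the first value by one full period, which the sketch mirrors. The monthly rate and the six revenue figures are assumptions made up for illustration; only the $25,000 fixed cost comes from the example.

```python
def npv(rate, cash_flows):
    """Discount each cash flow to today; the first one is one period away."""
    return sum(cf / (1 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

fixed_cost = 25000        # from the example, incurred just before launch
monthly_rate = 0.01       # assumed discount rate per month
revenues = [2500, 4000, 5000, 8000, 10000, 12500]  # assumed 6-month forecast

# Same structure as =NPV(B6,C4:H4)-B5
project_npv = npv(monthly_rate, revenues) - fixed_cost
print(round(project_npv, 2))
```

A positive result would indicate that the discounted revenues exceed the up-front cost under the assumed rate.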
Excel functions for Database Queries
Lookup functions for Database Queries
=VLOOKUP(10007,$A$4:$H$475,3) returns the type of payment (column 3 of the range) for the record whose first-column value is 10007 in the sales database
Excel functions for Database Queries
• To find the sales amount for a given product in a given month
from the Monthly product_sales database shown below, with the month in I5 and the product in I6:
– Cell I8 =VLOOKUP(I5,A4:F15,IF(I6="A",2,IF(I6="B",3,IF(I6="C",4,
IF(I6="D",5,IF(I6="E",6))))),FALSE) OR
– =VLOOKUP(I5,A4:F15,MATCH(I6,B3:F3,0)+1,FALSE) OR
– =INDEX(A4:F15,MATCH(I5,A4:A15,0),MATCH(I6,A3:F3,0))
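The INDEX/MATCH combination is a two-step lookup: find the row position of the month, find the column position of the product, then read the cell at that intersection. A minimal sketch with a made-up three-month table; match and index are illustrative helpers that use 1-based positions as Excel does.

```python
# Hypothetical slice of a monthly product sales table.
header = ["Month", "A", "B", "C", "D", "E"]
table = [
    ["January", 700, 500, 300, 310, 1000],
    ["February", 720, 480, 305, 290, 1100],
    ["March", 690, 530, 298, 350, 900],
]

def match(value, values):
    """1-based position of an exact match, like MATCH(value, range, 0)."""
    return values.index(value) + 1   # raises ValueError if not found

def index(rows, row_num, col_num):
    """Cell at the 1-based (row, column) position, like INDEX."""
    return rows[row_num - 1][col_num - 1]

month, product = "February", "C"
months = [row[0] for row in table]

# Like =INDEX(A4:F15, MATCH(I5,A4:A15,0), MATCH(I6,A3:F3,0))
amount = index(table, match(month, months), match(product, header))
print(amount)
```

Because the header list includes the "Month" column, the column position needs no "+1" adjustment, unlike the VLOOKUP variant that matches against B3:F3.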
Database Functions available in Excel
• All database functions in Excel start with "D"
– DSUM, DAVERAGE, DCOUNT, etc.
• They allow us to specify criteria so that the function is applied to a subset of the records in
the DB.
• Syntax of DSUM:
– DSUM(database, field, criteria), where database is the range that includes the column
labels, field is the column name (enclosed in quotes, or a reference) that contains
the values to sum, and criteria is the range that selects the subset of records
– E.g.: =DSUM(A8:J102,"Cost per order", A3:J5)
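How a criteria range selects records can be sketched as follows. In Excel, conditions on the same criteria row are combined with AND, and separate criteria rows are combined with OR. The sketch below is a simplification that supports only exact-equality conditions; the records and the dsum helper are hypothetical.

```python
# Hypothetical database rows (column label -> value).
database = [
    {"Supplier": "Alum Sheeting", "Item": "Gasket", "Cost per order": 210.0},
    {"Supplier": "Alum Sheeting", "Item": "Bolt", "Cost per order": 90.0},
    {"Supplier": "Supplier B", "Item": "Gasket", "Cost per order": 300.0},
    {"Supplier": "Supplier C", "Item": "Bolt", "Cost per order": 50.0},
]

def dsum(rows, field, criteria):
    """Sum `field` over rows matching ANY criteria row.

    Each criteria row is a dict of column -> required value; all of its
    conditions must hold (AND). Blank criteria cells are simply omitted.
    """
    def matches(row):
        return any(all(row[col] == wanted for col, wanted in crit.items())
                   for crit in criteria)
    return sum(row[field] for row in rows if matches(row))

# Two criteria rows: (Alum Sheeting AND Gasket) OR (Supplier C)
criteria = [
    {"Supplier": "Alum Sheeting", "Item": "Gasket"},
    {"Supplier": "Supplier C"},
]
total = dsum(database, "Cost per order", criteria)
print(total)
```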
Data Visualization
• Data Visualization is the process of displaying data in a meaningful
way to provide insights that will support better decision making.
• In finance, it is used to track revenues, costs, and profits over time,
to compare performance between years or among different
departments.
• In marketing, DV is used to show trends in customer satisfaction,
compare sales in different regions, and show the impact of
advertising strategies.
• In operations, DV illustrates the performance of different facilities /
equipment, product quality, call volumes in technical support
department, or supply chain metrics such as late deliveries.
• Visualizing data provides a way of communicating data at all levels
of the business, helps users quickly understand and interpret the data,
and can reveal surprising patterns and relationships.
• Visualizing a pattern helps analysts select an appropriate
mathematical function to model the phenomenon.
Tabular versus Visual Data Analysis
• Data related to Monthly Product
sales is shown in tabular form as
well as in visual form (line charts).
• A single glance at the visual
form shows that product C sells
the least but is stable, while the
sales of the other products
fluctuate.
Creating Charts in MS Excel
• General Steps to create charts:
– Highlight the range of the data we wish to chart
– Click the Insert tab in the Excel ribbon
– Click the chart type and then select the chart subtype to be used
– Once the chart is created, it can be customized using Design and
Format tabs
• In Design tab, we can change the type of chart, data included in it, chart layout
and styles.
• The chart can also be customized by right-clicking on the elements of the chart
Column and Bar Charts
In Excel, vertical and horizontal bar charts are called column charts and
bar charts, respectively.
A clustered column chart compares values across categories
using vertical rectangles.
A stacked column chart displays the contribution of each value to the
total by stacking the rectangles.

Figure: Bar chart (Category 1 to 4, Series 1 to 3) and Stacked Column Chart (Category 1 to 4, Series 1 to 3, vertical axis 0% to 100%)
Creating column charts
for EEO Employment Report data of Alabama State from A3:K6
Pie charts and Area charts

• The data distribution of a variable is
presented using Line charts, Pie charts
and Area charts, etc.
• Line charts display data over time,
as shown in the previous slides for
monthly product sales.
• Pie charts present the relative
proportion of each category within a
whole, represented as a circular pie.
• Area charts combine the features of
Line charts and Pie charts. The area
chart shows that the proportion of fossil
fuel consumption out of the total fuel
consumption is consistent at about half.
Scatter charts and Bubble charts
• The relationship between two variables is shown in Scatter, Orbit and Bubble charts.
• Given observations for a pair of variables, say Market Value and House Size, the scatter
plot shows that Market Value increases with House Size.
• An Orbit chart shows the path taken by temporal data over time. It can be created by
adding smooth lines and markers to a scatter plot with time on the X-axis.
• A Bubble chart is a type of scatter chart in which the size of the data marker corresponds to a
third variable; it represents 3 variables (price, Price/Earnings, market capitalization) in two dimensions.

This bubble chart shows the variation of Price/Earnings vs. Price; Market capitalization
is added as the 3rd variable using marker size.
Data Bars, Color Scales and Icon sets
• Data Bars display colored bars
that are scaled to the data
values and placed directly within
the cells.
• Color Scales (shown in fig.)
shade cells based on their
numerical value using a color
palette; they are an option in the
Conditional Formatting
menu.
• Icon Sets add to the cells
information similar to color
scales, using symbols such as
arrows or stoplights chosen by
thresholding on user-specified
values.
Dashboards
• A dashboard is a visual representation of a set of key
business measures.
• The name is derived from an automobile's control panel, which
displays speed, gasoline level, temperature, etc.
• An effective dashboard captures all the key information,
called key performance indicators (KPIs), required for
making good decisions.
• Dashboards are kept simple by
eliminating less important charts.
• The most important charts are positioned at the top
left corner.
Data Summarization with Statistics
• "Statistics is both the Science of Uncertainty
and the technology of extracting information
from data." David Hand, President of the Royal
Statistical Society, UK
• Statistical methods are applied for data
description and summarization in Excel using
functions, and using the Analysis Toolpak add-in
for complex tasks.
Frequency Distribution for Categorical Data
1. To get the number of orders for each item, count the occurrences of each distinct
value of 'Item Description' listed in A101:A113 using the COUNTIF function in
B101:B113.
2. To get the relative frequency distribution in column C, enter =SUM(B101:B113) in B114,
enter =B101/$B$114 in C101, and then copy the formula down column C.
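The two steps above (count each distinct value, then divide by the total) can be sketched as follows. The item descriptions are hypothetical examples, not the actual column contents.

```python
from collections import Counter

# Hypothetical 'Item Description' column of the purchase orders.
item_descriptions = [
    "Gasket", "Bolt-nut package", "Gasket", "Airframe fasteners",
    "Gasket", "Airframe fasteners", "Bolt-nut package", "Gasket",
]

# Step 1: frequency of each distinct value (the COUNTIF column).
counts = Counter(item_descriptions)

# Step 2: relative frequency = count / total (the =B101/$B$114 column).
total = sum(counts.values())
relative = {item: n / total for item, n in counts.items()}

for item, n in counts.most_common():
    print(f"{item:20s} {n:3d} {relative[item]:.3f}")
```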
Frequency Distribution for Numerical Data
•To construct histograms using the Analysis Toolpak:
Click Data Analysis in the Analysis group under the Data tab and select Histogram.
In the dialog box, select the Input Range and the Bin Range that specifies the groups. Including the column label is optional.
If the input is discrete valued, set up a column of these values and specify this range in the Bin Range
field.
The figure shows the histogram for the discrete-valued variable 'duration of A/P terms':

If the data has too many discrete values or is continuous valued, we have to specify
the number of groups, the width of each group, and the non-overlapping boundaries of each group.
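The counting rule of the Histogram tool can be sketched as below: a value is counted in a bin if it is less than or equal to that bin's number and greater than the previous one, and values above the last bin go to a "More" group. The A/P terms values and bin numbers here are hypothetical.

```python
from bisect import bisect_left

def histogram(values, bins):
    """Count values per bin; the extra last slot is the 'More' group."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for value in values:
        # bisect_left gives the first bin whose number is >= value,
        # or len(bins) when the value exceeds every bin.
        counts[bisect_left(bins, value)] += 1
    return counts

ap_terms = [15, 30, 30, 45, 60, 25]   # hypothetical durations
bin_range = [15, 25, 30, 45]          # hypothetical Bin Range

counts = histogram(ap_terms, bin_range)
for label, count in zip(bin_range + ["More"], counts):
    print(label, count)
```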
Computing Percentiles and Quantiles in Excel
• Excel has a tool that sorts the values of a numeric variable from high to low and computes the percentile associated
with each value.
• Select Rank and Percentile from the Data Analysis menu and specify the range in the
dialog box.
• To find the 90th percentile of the Cost per order variable in the purchase order
data set, use the formula =PERCENTILE.INC(G4:G97,0.9)
• For computing quartiles we can use =QUARTILE.INC(range,quart), where quart is 0 to 4 (1, 2 and 3 give the first, second and third quartiles)
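The inclusive percentile method can be sketched as follows: with n sorted values, the k-th percentile sits at position k*(n-1) counted from zero, and a fractional position is resolved by linear interpolation between its two neighbours. The function names are illustrative and the data is made up.

```python
def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 1), inclusive method with interpolation."""
    if not values or not 0 <= k <= 1:
        raise ValueError("need at least one value and 0 <= k <= 1")
    ordered = sorted(values)
    position = k * (len(ordered) - 1)
    lower = int(position)
    fraction = position - lower
    if lower + 1 < len(ordered):
        step = ordered[lower + 1] - ordered[lower]
        return ordered[lower] + fraction * step
    return ordered[lower]

def quartile_inc(values, quart):
    """Quartile 0..4 expressed through the percentile function."""
    return percentile_inc(values, quart / 4)

costs = [1, 2, 3, 4, 5]                 # hypothetical data
print(percentile_inc(costs, 0.9))       # position 3.6, between 4 and 5
print(quartile_inc(costs, 2))           # the median
```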
Data Exploration with Pivot Tables
• Pivot tables are used to quickly create cross-tables and custom summaries.
• Select any cell in the dataset, choose PivotTable in the Tables group of the Insert tab,
and follow the steps to select fields for rows, columns, values, etc.
• A pivot table of regions versus products is shown below:
– Cust ID is used as the Values field of the cross-table; since it is an identifier, Count is the only
summarization function that makes sense for it.
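A region-by-product cross-table of counts can be sketched as below. The transactions, regions and products are hypothetical; the point is that each cell counts the records in one (row, column) combination, which is what the Count summarization does.

```python
from collections import Counter

# Hypothetical sales records: (region, product, cust_id).
sales = [
    ("East", "Book", 10001),
    ("West", "DVD", 10002),
    ("East", "DVD", 10003),
    ("East", "Book", 10004),
    ("West", "Book", 10005),
]

# One counter key per (region, product) cell of the cross-table.
cell_counts = Counter((region, product) for region, product, _ in sales)

regions = sorted({region for region, _, _ in sales})
products = sorted({product for _, product, _ in sales})

# Rows are regions, columns are products, as in the pivot table layout.
cross_table = {region: {product: cell_counts[(region, product)]
                        for product in products}
               for region in regions}

for region in regions:
    print(region, cross_table[region])
```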
Data Summarization with Pivot tables with Filters
To get the number of customers for a
given region and product type, the
summarization operator on the Value field is set
to Count in the dialog box shown next.
The figure below shows how to apply a filter on
payment mode before counting the number of
customers for different products in a specific
region.
Pivot Table with Rows Sub-divided to Represent Cross-tables
• In order to drill-down further, additional variables are added to
elaborate the rows;
– Rows representing the region-wise customer counts are sub-divided based
on the source of the order.
