Analytics On Spreadsheets and Data Visualization: Prof M.Shashi
Data Visualization
Prof M.Shashi
Purchase orders Data in a Spreadsheet
Excel functions for Aggregates
• Functions are used extensively in spreadsheets to
calculate additional metrics or quantities from the
data available in the cells.
• Every Excel formula starts with an “=” sign, followed by the
function name and its list of arguments in parentheses.
• For example, the formula =MIN(F4:F97) in cell B99 returns 90
Example Conditional formulae
• Embed IF logic within math functions such as SUMIF,
AVERAGEIF, SUMIFS, COUNTIFS, etc.
• Syntax:
=SUMIF(range, criterion, [sum_range]) where sum_range is
optional; when it is given, the matching cells are added from
that different range, e.g.
=SUMIF(D4:D97, "Airframe fasteners", G4:G97)
=SUMIFS(sum_range, range1, criterion1, range2, criterion2,
..., rangeN, criterionN) where SUM is applied
only to cells in sum_range that satisfy all the criteria,
e.g. =SUMIFS(F4:F97, A4:A97, "Alum Sheeting",
D4:D97, "Airframe fasteners")
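To make the difference between SUMIF and SUMIFS concrete, here is a minimal Python sketch of their matching logic. It assumes exact-match criteria only (Excel also accepts comparison strings such as ">200" and wildcards, which are not handled), and the supplier names, quantities and costs are hypothetical stand-ins for columns A, D, F and G.

```python
def sumif(criteria_range, criterion, sum_range=None):
    """Sum sum_range[k] wherever criteria_range[k] == criterion.
    If sum_range is omitted, sum the matching cells of criteria_range itself."""
    if sum_range is None:
        sum_range = criteria_range
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criterion)


def sumifs(sum_range, *pairs):
    """pairs = (range1, criterion1, range2, criterion2, ...).
    A row is added only if it satisfies every criterion (AND logic)."""
    ranges, criteria = pairs[0::2], pairs[1::2]
    total = 0
    for k, value in enumerate(sum_range):
        if all(r[k] == c for r, c in zip(ranges, criteria)):
            total += value
    return total


# Hypothetical rows: supplier (col A), item (col D), quantity (col F), cost per order (col G)
supplier = ["Alum Sheeting", "Supplier B", "Alum Sheeting", "Supplier C"]
item = ["Airframe fasteners", "Airframe fasteners", "Bolt-nut package", "Airframe fasteners"]
quantity = [4500, 1000, 200, 900]
cost = [19125.0, 4250.0, 700.0, 3825.0]

print(sumif(item, "Airframe fasteners", cost))  # 27200.0
print(sumifs(quantity, supplier, "Alum Sheeting", item, "Airframe fasteners"))  # 4500
```

SUMIFS puts the sum range first because the number of range/criterion pairs is variable, which is why the sketch takes them as trailing arguments.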
Formatting Excel data as Tables
Suppose that in the Credit Risk Data table, we wish to calculate the total amount of savings in column
C. We could, of course, simply use the function SUM(C4:C428).
However, with a table, we could use the formula SUM(Table1[Savings]). The table name, Table1, can
be found (and changed) in the Properties group of the Table Tools Design tab. Note that Savings is the
name of the header in column C.
One of the advantages of doing this is that if we add new records to the table, the calculation will be
updated automatically, and we don’t have to change the range in the formula or get a wrong result if
we forget to.
As another example, we could find the number of homeowners using the function
COUNTIF(Table1[Housing], "Own").
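The benefit of referring to a column by its header rather than a fixed cell range can be mimicked in Python: if a calculation is written against a named column, rows appended later are included automatically. This is only an analogy, not how Excel implements tables; the records and the `column` helper are hypothetical.

```python
# Hypothetical records standing in for Table1 (only two of its columns)
table1 = [
    {"Savings": 1500, "Housing": "Own"},
    {"Savings": 300, "Housing": "Rent"},
    {"Savings": 2200, "Housing": "Own"},
]


def column(table, name):
    """Return every value in the named column, like Table1[Savings]."""
    return [row[name] for row in table]


def total_savings(table):
    return sum(column(table, "Savings"))  # like =SUM(Table1[Savings])


def count_owners(table):
    return column(table, "Housing").count("Own")  # like =COUNTIF(Table1[Housing], "Own")


print(total_savings(table1), count_owners(table1))  # 4000 2
table1.append({"Savings": 500, "Housing": "Own"})  # add a new record
print(total_savings(table1), count_owners(table1))  # 4500 3
```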
Sorting Transaction Data
•Business analytics often involves sorting the contents of a table to bring the required records together in one place or to locate specific
records with a given detail. The following tables show the sales transaction data before and after sorting: the original table is ordered by customer ID,
and re-sorting it by transaction code makes it easy to locate records by transaction code.
•After selecting the required column, apply the Sort option available in the Excel ribbon and choose the sort order.
•The original table has the sales transaction records sorted in ascending order of customer ID. To locate records by
transaction code, we can select column D and apply the Sort option with ascending order (A to Z), a
sub-option available within it.
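Re-sorting by a different column, as described above, amounts to sorting the rows on a new key. The sketch below uses hypothetical transaction rows (customer ID, region, payment, transaction code) and sorts on the fourth field; Python's `sorted` is stable, so rows that share a transaction code keep their original customer-ID order.

```python
from operator import itemgetter

# Hypothetical sales transactions: (cust_id, region, payment, transaction_code),
# initially in ascending order of customer ID
transactions = [
    (10001, "East", "Paypal", "T3"),
    (10002, "West", "Credit", "T1"),
    (10003, "North", "Paypal", "T2"),
    (10004, "South", "Credit", "T1"),
]

# Analogue of selecting column D and choosing Sort A to Z
by_code = sorted(transactions, key=itemgetter(3))
for row in by_code:
    print(row)
```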
Pareto Analysis
• Pareto principle, often called “80-20 Rule”, is observed
in many business situations.
• It refers to the general situation in which 80% of some
output comes from 20% of some input, for example:
– a large % of inventory value corresponds to a small %
of items
– a large % of quality defects stems from a small % of
sources
– a large % of sales comes from a small % of customers
• Pareto analysis involves sorting the data in descending order and calculating
the cumulative percentage of the variable of interest.
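The two steps of Pareto analysis (sort in descending order, then accumulate percentages) can be sketched in a few lines of Python. The ten item values are hypothetical inventory values chosen so that the top two items (20% of items) account for 78% of the total.

```python
def pareto(values):
    """Return (label, value, cumulative_percent) rows in descending order of value."""
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(values.values())
    rows, running = [], 0
    for label, value in ordered:
        running += value
        rows.append((label, value, 100 * running / total))
    return rows


# Hypothetical inventory value per item
inventory_value = {"A": 5000, "B": 2800, "C": 600, "D": 400, "E": 300,
                   "F": 300, "G": 200, "H": 200, "I": 100, "J": 100}

for label, value, cum_pct in pareto(inventory_value):
    print(f"{label}: {value:>5}  cumulative {cum_pct:5.1f}%")
```

In a spreadsheet the same result comes from sorting the column in descending order and adding a running-total column divided by the grand total.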
Filtering Data in Excel Tables
• For large files, finding a particular subset of records that meet a
specific characteristic by sorting can be tedious.
• Extracting a set of records having certain characteristics is called
Filtering.
• Excel provides two filtering tools:
– AutoFilter for simple criteria
– Advanced Filter for more complex criteria
• Applying a filter to identify orders whose item cost is at least
$200 in the purchase orders table:
– Click the drop-down arrow in the Item Cost column (column E), point to
Number Filters, and then select Greater Than Or Equal To...
from the list.
– This opens a Custom AutoFilter dialog, which accepts up to two
criteria joined by AND or OR logic. Enter 200 in the box and click OK.
– All records with item cost >= $200 are displayed.
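A Custom AutoFilter keeps the rows whose value satisfies one condition, or two conditions joined by AND or OR. The Python sketch below reproduces that logic on hypothetical (item description, item cost) rows; `custom_filter` is a hypothetical helper, not an Excel API.

```python
import operator

# Hypothetical purchase order rows: (item description, item cost)
orders = [
    ("Airframe fasteners", 4.25),
    ("Shielded cable", 1.10),
    ("Control panel", 255.00),
    ("Hydraulic cylinder", 200.00),
    ("Pressure gauge", 90.00),
]


def custom_filter(rows, key, cond1, cond2=None, join="and"):
    """Each condition is an (operator, threshold) pair, e.g. (operator.ge, 200).
    With two conditions, join="and" requires both and join="or" requires either."""
    def keep(row):
        v = key(row)
        r1 = cond1[0](v, cond1[1])
        if cond2 is None:
            return r1
        r2 = cond2[0](v, cond2[1])
        return (r1 and r2) if join == "and" else (r1 or r2)
    return [row for row in rows if keep(row)]


# Item cost >= 200, as in the example above
print(custom_filter(orders, key=lambda r: r[1], cond1=(operator.ge, 200)))

# Two criteria with AND: 50 <= item cost <= 210
print(custom_filter(orders, key=lambda r: r[1],
                    cond1=(operator.ge, 50), cond2=(operator.le, 210)))
```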
Selecting records for Item cost Filtering
Specific Functions used for Business
Analytics
• Financial models often require the Net Present
Value (NPV), the sum of discounted cash flows, and Excel provides a built-in
function for the NPV of a stream of cash flows.
– A cash flow of F rupees at time period t in the future is
worth less today because of
• the lost opportunity to earn a return by investing it now
• the risk or uncertainty involved in not receiving it until later
– Hence, estimating NPV involves discounting future cash flows
at a discount rate, i
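The discounting described above can be written with the slide's symbols F, t and i. Here F_t is the cash flow in period t, n is the number of periods, and F_0 (added for illustration) is an amount spent now, at t = 0, which is therefore not discounted.

```latex
PV = \frac{F}{(1+i)^{t}}, \qquad
NPV = \sum_{t=1}^{n} \frac{F_t}{(1+i)^{t}} - F_0
```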
Calculation of Net Present Value (NPV)
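Eg.: The fixed cost of introducing a new product into the market is $25,000 (cell B5), which is
incurred just prior to its launch. The forecasted net sales revenues for the next 6
months are shown in C4:H4, and B6 holds the discount rate. Cell B8 computes the NPV with the formula =NPV(B6,C4:H4)-B5.
The fixed cost is subtracted outside the NPV function because it occurs now and is not discounted, whereas NPV discounts its first value by one full period.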
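A short Python sketch of the same calculation helps show what =NPV(B6,C4:H4)-B5 does: every value passed to NPV is discounted by at least one period, and the launch cost is subtracted undiscounted. The monthly discount rate and the six revenue figures are hypothetical, since the slide's values are not reproduced here; the rate is assumed to be per month because the cash flows are monthly.

```python
def excel_npv(rate, cash_flows):
    """Mirror Excel's NPV(rate, values): the k-th value (k = 1, 2, ...) is divided by (1 + rate)**k."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


fixed_cost = 25000                                   # B5: spent just before launch (t = 0)
monthly_rate = 0.01                                  # B6: hypothetical monthly discount rate
revenues = [5000, 5500, 6000, 6000, 5500, 5000]      # C4:H4: hypothetical forecasts

npv = excel_npv(monthly_rate, revenues) - fixed_cost  # like =NPV(B6,C4:H4)-B5
print(round(npv, 2))
```

Writing the cost as a separate subtraction, rather than including it as the first NPV value, avoids discounting an amount that is paid immediately.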
Excel functions for Database Queries
Lookup functions for Database Queries
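=VLOOKUP(10007,$A$4:$H$475,3,FALSE) finds the type of payment (column 3) for key 10007 in column A of the sales database; the final FALSE requests an exact match (if it is omitted, VLOOKUP performs an approximate match, which assumes column A is sorted in ascending order)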
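The difference between VLOOKUP's exact and approximate modes matters when a key is missing. In the Python sketch below (hypothetical rows, 1-based column index as in VLOOKUP, and the string "#N/A" standing in for Excel's error value), the approximate version silently returns the row with the largest key not exceeding the lookup value, while the exact version reports the key as not found.

```python
import bisect

# Hypothetical sales rows sorted by the key in column A; column 3 holds the payment type
table = [
    (10001, "East", "Credit"),
    (10003, "West", "Paypal"),
    (10007, "North", "Credit"),
]


def vlookup_exact(key, rows, col):
    """Like VLOOKUP(key, rows, col, FALSE): first row whose first column equals key."""
    for row in rows:
        if row[0] == key:
            return row[col - 1]
    return "#N/A"


def vlookup_approx(key, rows, col):
    """Like VLOOKUP(key, rows, col, TRUE): assumes rows are sorted ascending on the
    first column and returns the row with the largest first-column value <= key."""
    keys = [row[0] for row in rows]
    pos = bisect.bisect_right(keys, key) - 1
    return rows[pos][col - 1] if pos >= 0 else "#N/A"


print(vlookup_exact(10007, table, 3))   # Credit
print(vlookup_exact(10005, table, 3))   # #N/A
print(vlookup_approx(10005, table, 3))  # Paypal (taken from row 10003)
```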
Excel functions for Database Queries
• To find the sales amount for a given product (entered in I6) in a given month (entered in I5)
from the Monthly Product Sales database shown below:
– Cell I8: =VLOOKUP(I5,A4:F15,IF(I6="A",2,IF(I6="B",3,IF(I6="C",4,
IF(I6="D",5,IF(I6="E",6))))),FALSE), or
– =VLOOKUP(I5,A4:F15,MATCH(I6,B3:F3,0)+1,FALSE), or
– =INDEX(A4:F15,MATCH(I5,A4:A15,0),MATCH(I6,A3:F3,0))
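The INDEX/MATCH formula above locates a value by its row label and column label. The Python sketch below mirrors it with 1-based positions: the header list plays the role of A3:F3 and each row, including its month label, plays the role of a row of A4:F15. The three months of sales figures are hypothetical, and `match_exact` and `index` are hypothetical helpers.

```python
# Hypothetical Monthly Product Sales data (only three months shown)
header = ["Month", "A", "B", "C", "D", "E"]      # like A3:F3
table = [                                        # like A4:F15
    ["Jan", 7792, 5554, 3105, 3168, 10350],
    ["Feb", 7268, 3024, 3228, 3751, 8965],
    ["Mar", 7049, 5543, 2147, 3319, 6827],
]
months = [row[0] for row in table]               # like A4:A15


def match_exact(value, labels):
    """Like MATCH(value, labels, 0): 1-based position of the first exact match.
    Raises ValueError where Excel would show #N/A."""
    return labels.index(value) + 1


def index(grid, row_num, col_num):
    """Like INDEX(grid, row_num, col_num) with 1-based positions."""
    return grid[row_num - 1][col_num - 1]


def sales_for(month, product):
    return index(table, match_exact(month, months), match_exact(product, header))


print(sales_for("Feb", "C"))  # 3228
```

Matching the product against the full header row (including "Month") gives the column position inside A4:F15 directly, which is why the VLOOKUP version that matches against B3:F3 needs the extra +1.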
Database Functions available in Excel
• All database functions in Excel start with “D”
– DSUM, DAVERAGE, DCOUNT, etc.
• They allow criteria to be specified so that the function is applied only to a subset of records in
the database.
• Syntax of DSUM:
– DSUM(database, field, criteria) where database is the range that includes the column
labels, field is the label of the column (enclosed in quotes or given as a cell reference) that contains
the values to sum, and criteria is the range of conditions that selects the subset of records
– E.g.: =DSUM(A8:J102, "Cost per order", A3:J5)
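In a criteria range such as A3:J5, the first row holds column labels and each row below it is one condition set: the non-blank cells in a row must all hold (AND), and a record qualifies if it meets any row (OR). The Python sketch below models that rule with each criteria row as a dict whose missing keys play the role of blank cells. It assumes exact-match criteria only (comparison strings such as ">200" are not handled), and the records are hypothetical.

```python
def dsum(database, field, criteria):
    """database: list of dicts (column label -> value).
    criteria: list of dicts, one per criteria row; keys within a row are ANDed,
    rows are ORed. An empty dict (blank criteria row) matches every record."""
    def matches(record):
        return any(all(record[k] == v for k, v in row.items()) for row in criteria)
    return sum(record[field] for record in database if matches(record))


# Hypothetical purchase order records
orders = [
    {"Supplier": "Alum Sheeting", "Item Description": "Airframe fasteners", "Cost per order": 19125.0},
    {"Supplier": "Supplier B", "Item Description": "Airframe fasteners", "Cost per order": 4250.0},
    {"Supplier": "Alum Sheeting", "Item Description": "Bolt-nut package", "Cost per order": 700.0},
    {"Supplier": "Supplier C", "Item Description": "Control panel", "Cost per order": 6375.0},
]

# Two criteria rows: Supplier = Alum Sheeting  OR  Item Description = Control panel
print(dsum(orders, "Cost per order",
           [{"Supplier": "Alum Sheeting"}, {"Item Description": "Control panel"}]))  # 26200.0

# One criteria row with two conditions: Supplier = Alum Sheeting AND item = Airframe fasteners
print(dsum(orders, "Cost per order",
           [{"Supplier": "Alum Sheeting", "Item Description": "Airframe fasteners"}]))  # 19125.0
```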
Data Visualization
• Data Visualization (DV) is the process of displaying data in a meaningful
way to provide insights that will support better decision making.
• In finance, it is used to track revenues, costs, and profits over time,
to compare performance between years or among different
departments.
• In marketing, DV is used to show trends in customer satisfaction,
compare sales in different regions, and show the impact of
advertising strategies.
• In operations, DV illustrates the performance of different facilities /
equipment, product quality, call volumes in technical support
department, or supply chain metrics such as late deliveries.
• Visualizing data provides a way of communicating data at all levels
of business, helps users to quickly understand / interpret the data
and can reveal surprising patterns and relationships.
• Visualizing a pattern helps analysts select an appropriate
mathematical function to model the phenomenon.
Tabular versus Visual Data Analysis
• Data related to Monthly Product
sales is shown in tabular form as
well as a visual form (line charts).
• At a single glance, the visual
form shows that product C sells
the least but its sales are stable, while
the sales of the other products
fluctuate.
Creating Charts in MS Excel
• General Steps to create charts:
– Highlight the range of the data we wish to chart
– Click the Insert tab in the Excel ribbon
– Click the chart type and then select the chart subtype to be used
– Once the chart is created, it can be customized using the Design and
Format tabs
• In the Design tab, we can change the type of chart, the data included in it, the chart layout
and styles.
• The chart can also be customized by right-clicking on its elements.
Column and Bar Charts
In Excel, vertical and horizontal bar charts are called column charts and
bar charts, respectively.
A clustered column chart compares values of categorical variables
using vertical rectangles.
A Stacked Column chart displays the contribution of each value to the
total by stacking the rectangles.
[Figure: sample column and bar charts of Series 1 to 3 for Categories 1 to 4]
Creating column charts for the EEO Employment Report data of Alabama State (A3:K6)
Pie charts and Area charts
If the data have too many distinct values, or are continuous, we must first group them by specifying:
the number of groups, the width of each group, and non-overlapping boundaries for each group.
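Once the number of groups, their width and their boundaries are fixed, counting the values in each group gives a frequency distribution that can be charted. The Python sketch below assumes equal-width, half-open groups [lower, upper); values outside the overall range are ignored. The data are hypothetical costs per order. Note that Excel's FREQUENCY function works with upper limits instead, counting values greater than the previous limit and up to and including each limit.

```python
def frequency_table(values, start, width, n_groups):
    """Count values in n_groups equal-width, non-overlapping groups
    [start, start + width), [start + width, start + 2*width), ...
    Values outside [start, start + n_groups*width) are not counted."""
    counts = [0] * n_groups
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_groups:
            counts[k] += 1
    return [((start + k * width, start + (k + 1) * width), counts[k]) for k in range(n_groups)]


# Hypothetical cost-per-order values
costs = [68.75, 120.0, 425.0, 1000.0, 1575.0, 2150.0, 2500.0]

for (low, high), count in frequency_table(costs, start=0, width=1000, n_groups=3):
    print(f"[{low}, {high}): {count}")
```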
Computing Percentiles and Quantiles in Excel
• Excel has a tool that sorts the values of a numeric variable from high to low and computes the rank and
percentile associated with each value.
• Select Rank and Percentile from the Data Analysis menu (Data tab) and specify the input range in the
dialog box.
• To find the 90th percentile of the Cost per order variable in the purchase order data set,
use the function =PERCENTILE.INC(G4:G97,0.9)
• For computing quartiles, we can use =QUARTILE.INC(range,quart), where quart is 0, 1, 2, 3 or 4
(minimum, first quartile, median, third quartile, maximum)
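PERCENTILE.INC interpolates linearly between sorted values: for p between 0 and 1 it looks at position p(n - 1) in the sorted data (counting from 0) and blends the two neighbouring values. The Python sketch below implements that rule, and QUARTILE.INC as the same rule with p = quart/4; the function names are hypothetical.

```python
import math


def percentile_inc(values, p):
    """Linear interpolation at zero-based rank h = p * (n - 1) in the sorted data,
    matching Excel's PERCENTILE.INC for 0 <= p <= 1."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    xs = sorted(values)
    h = p * (len(xs) - 1)
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quartile_inc(values, quart):
    """QUARTILE.INC(range, quart) is PERCENTILE.INC(range, quart / 4), quart = 0..4."""
    return percentile_inc(values, quart / 4)


print(percentile_inc([1, 2, 3, 4], 0.3))            # 1.9
print(quartile_inc([1, 2, 4, 7, 8, 9, 10, 12], 1))  # 3.5
```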
Data Exploration with Pivot Tables
• Pivot tables are used to quickly create cross-tables and custom summaries.
• Select any cell in a dataset, choose PivotTable in the Tables group of the Insert tab,
and follow the steps in the wizard to select fields for rows, columns, values, etc.
• A pivot table of regions versus products is shown below:
– Cust ID is taken as the values field in the cross-table; count is the only
summarization function that makes sense for an ID.
Data Summarization with Pivot tables with Filters
To get the number of customers for a
given region and product type, the
summarization operator on the Values field is set
to Count in the dialog box shown next.
The figure below shows how to apply a filter on
payment mode before counting the number of
customers for different products in a specific
region.
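Counting customers per region and product, with an optional filter on payment mode applied first, is what the pivot table above does. The Python sketch below reproduces that count with `collections.Counter` on hypothetical transactions; the `pivot_count` helper is hypothetical and only models the counting, not Excel's layout.

```python
from collections import Counter

# Hypothetical transactions: (cust_id, region, product, payment)
sales = [
    (10001, "East", "Book", "Paypal"),
    (10002, "West", "DVD", "Credit"),
    (10003, "East", "Book", "Credit"),
    (10004, "East", "DVD", "Paypal"),
    (10005, "West", "Book", "Paypal"),
]


def pivot_count(rows, payment=None):
    """Count customers per (region, product); if payment is given, keep only
    transactions with that payment mode (the role of the pivot table filter)."""
    return Counter((region, product) for _, region, product, pay in rows
                   if payment is None or pay == payment)


all_modes = pivot_count(sales)
paypal = pivot_count(sales, payment="Paypal")

# Print the Paypal-only cross-table; missing combinations count as 0
for region in ("East", "West"):
    print(region, [paypal[(region, product)] for product in ("Book", "DVD")])
```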
Pivot Table with Rows Sub-divided to Represent Cross-tables
• To drill down further, additional variables are added to
elaborate the rows:
– Rows representing the region-wise customer counts are sub-divided based
on the source of the order.
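Adding a second row field nests the counts: each region gets a total row plus one row per order source. The Python sketch below builds that nested summary from hypothetical (region, source, product) records; the `drilldown` helper and the "Total" key are hypothetical names chosen for illustration, and the sketch assumes no source is literally named "Total".

```python
# Hypothetical transactions: (region, source, product)
sales = [
    ("East", "Web", "Book"),
    ("East", "Email", "Book"),
    ("East", "Web", "DVD"),
    ("West", "Web", "Book"),
    ("West", "Web", "Book"),
]
products = ["Book", "DVD"]


def drilldown(rows, products):
    """Return {region: {"Total": counts, source: counts, ...}}, where counts is a
    list of customer counts in the same order as products."""
    table = {}
    for region, source, product in rows:
        block = table.setdefault(region, {"Total": [0] * len(products)})
        source_counts = block.setdefault(source, [0] * len(products))
        k = products.index(product)
        source_counts[k] += 1
        block["Total"][k] += 1
    return table


result = drilldown(sales, products)
for region, block in result.items():
    print(region, block["Total"])
    for source, counts in block.items():
        if source != "Total":
            print("   ", source, counts)
```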