Excel - Consolidated Lecture Notes
Agenda
● Excel Introduction
○ What is MS Excel?
○ MS Excel Marketplace
○ Job Opportunities
○ Purchase options : Excel 2021 vs Excel 365
○ Download and Install Excel
○ Excel Basics
● Excel Formulas
○ Problem Statement I & Solution
○ Problem Statement II & Solution
Excel Introduction
Excel is a powerful spreadsheet program that allows you to store, organize, manipulate, and
analyze information.
● Forrester Research found that 81% of businesses use Excel, so Excel skills are highly
marketable to employers.
● Excel is used by an estimated 750 million people worldwide.
● According to LinkedIn, Data Analyst roles that use Excel are among the top 10 most
in-demand jobs.
Note: Forrester is a research and advisory company that offers a variety of services including
research, consulting, and events.
Job Opportunities
● Data Analyst
● Business Analyst
● Financial Analyst
● Project Manager
Purchase options: Excel 2021 vs Excel 365
● Excel 2021 is a standalone ‘one-time purchase’ option of Rs. 11,999. Excel 2021 is yours to
keep but does not receive perpetual updates.
● Excel 365 is a subscription service. A fee of Rs. 4899/year for 1 person and Rs. 6199/year
for up to 6 persons to rent the software from Microsoft. Subscribers automatically receive the
latest updates.
Pre-read:
https://docs.google.com/document/d/1b_M1zbPYdUCHqsbz9gjMyaBcYY5wk5TDJIVlfNdZ7YM/edit?usp=sharing
Excel Basics
This doc is a pre-read that covers the basics of Excel. Please give it a good read.
https://docs.google.com/document/d/1W10kUxwMzPZPX6_JerTxFkzxh9wEa-G-6usCuAoCjxg/edit?usp=sharing
Problem Statement
After losing the IPL 2020 final to Mumbai Indians (Match ID 1237181), Delhi Capitals hired you
as a Data Analyst to analyze and understand why they lost the match.
Dataset Overview
The following dataset is used:
Source : IPL Complete Dataset (2008-2020) | Kaggle
Drive link: IPL dataset
Question 1
How many runs were scored by Mumbai Indians and Delhi Capitals?
Intuition
To solve this problem:
● We need to add the ball by ball runs for each of the innings.
● We will briefly discuss Navigation and Selection in Excel while solving this problem.
Solution
● Use total_runs column
● Two methods to do this :
1. Using Cell references -
Formula used:
○ =SUM(J9:J131) for DC (We need to manually select the cells)
○ =SUM(AE9:AE120) for MI
Demo:
● Click on cell K5, type =sum, press tab.
● Click on J9, press Ctrl + Shift + (Down Arrow) on keyboard, press Enter.
2. Using table name - Refer here on how to format range of cells as table
Formula used:
○ =SUM(DC_ball_by_ball[total_runs])
○ =SUM(MI_ball_by_ball[total_runs])
Demo:
Summing using table names with this formula is much clearer.
● To check the table name, click on any cell that is formatted as a table. A Table Design
menu appears in the ribbon; click on it to see the table name.
● Click on K4, type ‘=SUM’, press tab, start typing the table name, when the table name is
highlighted in the list press tab.
● Type ‘[’, type total_runs, press tab, close both brackets, press enter.
Introducing Functions
Question 2
Intuition
To solve the problem :
● Use the extra_runs column
● We will use SUM
Solution
● Formula used for DC Using table name:
○ =SUM(Ques2_DC[extra_runs])
● Formula used for MI [Scroll towards right in the same sheet]:
○ =SUM(Ques2_MI[extra_runs])
Demo:
1. Using extra runs column
○ In the Question 2 unworked sheet, click on any cell in the first table, go to table
design in the menu bar, see the table name on the left end.
○ Click on I3 cell, type =SUM(table_name[extra_runs]), press enter (use tab to
complete names from suggestions list)
Question 3
Intuition
● To solve this problem:
○ We will use the COUNTIF function.
○ In its simplest form, COUNTIF says
■ =COUNTIF(Where do you want to look?, What do you want to look for?)
Solution
● We will use the column is_wicket
● Formula used:
=COUNTIF(Ques3_MI[is_wicket],1) for MI
Alternate Solution
● Use of reference of another cell which contains 1
● Formula used:
=COUNTIF(Ques3_DC[is_wicket], AH23) for DC
Demo:
● Check the first table’s name, click on J3, enter the formula
=COUNTIF(table_name[is_wicket],1), press enter
● For DC, scroll to the right, check the second table's name, click on AI3, and enter the
formula up to =COUNTIF(table_name[is_wicket],
● Click on cell AH23, close the bracket press enter.
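For readers who want to check the logic outside Excel, the COUNTIF call above can be approximated in Python. This is only a sketch: the `countif` helper and the sample `is_wicket` values are hypothetical and are not taken from the IPL dataset.

```python
def countif(values, criterion):
    """Count the cells equal to criterion, like COUNTIF(range, value)."""
    return sum(1 for v in values if v == criterion)


# Hypothetical is_wicket column: 1 marks a ball on which a wicket fell.
is_wicket = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]

wickets = countif(is_wicket, 1)
print(wickets)
```

As in the Excel formula, the first argument says where to look and the second says what to look for.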
Question 4
What is the run rate comparison for MI and DC for overs 0-4, 5-9, 10-14 and 15-19?
Intuition
Definition of run rate: Number of runs divided by total number of overs in a given innings.
In our dataset, we can’t use the AVERAGE() formula because it would divide by the number of
balls, but we want to divide by the number of overs.
We will use SUM and MAX.
Solution
● Get total number of overs in an innings
● Get total number of runs
● Divide total runs/total # of overs
● Formula used:
=SUM(FirstInningsData[total_runs])/(MAX(FirstInningsData[over])+1)
Demo:
● Check table name, Click on H4, type the formula =SUM(table_name[total_runs]),
press enter
● Click on J4, type the formula =MAX(table_name[over])+1, press enter
● Click on Q4, enter formula =H4/J4
The demo below shows how to calculate the run rate for MI in overs 15-19. MI faced only 4 balls
in the 18th over, so the run rate must take this into account.
● Zoom out and go to the table named ‘Get run rate for MI in 15 to 19 overs’, click on
AA120, enter the formula =SUM(table_name[total_runs]), press enter.
● Click on cell AJ120, enter the formula =AA120/AG120 by selecting the cells.
Note:
● To calculate an accurate run rate, we also need to account for the number of balls faced.
Refer here for more details.
● Here we get the maximum number of balls in overs 15-19 and the total number of overs faced
manually; we will see later how to get them using an advanced function.
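The effect of an incomplete over on the run rate can be sketched in Python. This is an illustration only: it assumes 6 legal balls per over, and the function name and the sample figures are hypothetical rather than taken from the match data.

```python
def run_rate(total_runs, balls_faced, balls_per_over=6):
    """Runs per over, counting a partial over as a fraction of an over."""
    overs = balls_faced / balls_per_over
    return total_runs / overs


# Hypothetical block of play: 3 complete overs plus 4 balls = 22 balls.
rate_partial = run_rate(40, 22)

# Five complete overs (30 balls) for comparison.
rate_full = run_rate(36, 30)

print(round(rate_partial, 2), round(rate_full, 2))
```

Dividing by 4 overs instead of 22/6 overs would understate the first rate, which is why the number of balls faced matters.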
Question 5
Find the number of wickets that were lost in the first five overs for MI and DC.
Intuition
To solve the problem :
● Using is_wicket column
● We will use SUMIF
Solution
● Formula used for DC: =SUMIF(Ques5_DC[is_wicket],1)
● Formula used for MI: =SUMIF(Ques5_MI[is_wicket],1)
Demo:
● Find table name, click on I3, enter the formula =SUM(Table_16[is_wicket])
Question 6
For match id 1237181, given the list of batsmen who were out during the match: if a batsman
was out by a catch, we need to display the bowler and the fielder in the format “c fielder_name
b bowler_name” (as shown on the cricket match summary/scorecard).
Intuition
To solve this problem:
● In order to check if the person is out by a catch, we are checking the “dismissal_kind”
column by using the IF() formula and the desired output is obtained by using the
CONCAT() formula which is used to concatenate strings.
Solution
Formula used:
=IF(M12="caught",CONCAT("c ",O12," b ",G12),"Not Applicable")
Demo:
Question 7
If the player is bowled, then display the output as “b bowler_name”.
Intuition
We will use IF and CONCAT
Solution
Logic is similar to the previous question.
Formula used:
=IF(M9="bowled",CONCAT("b ",G9),"Not Applicable")
Question 8
If the player is dismissed by run-out, then display the output as “run out (fielders_name)”
Intuition
We will use IF and CONCAT
Solution
Logic is similar to the previous question.
Formula used:
=IF(M8="run out",CONCAT("run out (",O8,")"),"Not Applicable")
Question 9
Write a formula/function that combines all the three outputs of the previous three questions in
one i.e. depending on how the player is out the output should be displayed in an appropriate
format.
Intuition
To solve this problem, we will use Nested If.
Nested If is used if you need to test for more than one condition, then take one of several
actions, depending on the result of the tests.
Solution
Solution using Nested If :
Formula used:
=IF(M10="caught",CONCAT("c ",O10," b ",G10),IF(M10="bowled",CONCAT("b ",G10),IF(M10="run out",CONCAT("run out (",O10,")"),"Not Applicable")))
Demo:
● Click on F46, enter the formula -
=IF(M10="caught",CONCATENATE("c ",O10," b ",G10),IF(M10="bowled",CONCATENATE("b ",G10),IF(M10="run out",CONCATENATE("run out (",O10,")"),"Not Applicable")))
Nested If flowchart :
Is there any issue with the Nested if? Is it possible to write the nested if logic in a single
function?
● Nested if is error-prone.
● Very difficult to maintain/figure it out at a later point in time.
● As an alternative, there is a single function called IFS.
Alternate Solution
The IFS function checks whether one or more conditions are met, and returns a value that
corresponds to the first TRUE condition.
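The "first TRUE condition wins" behaviour can be sketched in Python as a chain of checks with a final fallback. This is a hedged illustration: the function name and the sample names are hypothetical, and the output formats are the ones described in Questions 6 to 8.

```python
def dismissal_text(dismissal_kind, bowler, fielder):
    """Return the scorecard text for one dismissal, checking conditions in order."""
    if dismissal_kind == "caught":
        return f"c {fielder} b {bowler}"
    if dismissal_kind == "bowled":
        return f"b {bowler}"
    if dismissal_kind == "run out":
        return f"run out ({fielder})"
    # Plays the role of the final TRUE, "Not Applicable" pair in IFS.
    return "Not Applicable"


print(dismissal_text("caught", "Bowler A", "Fielder B"))
```

Each `if` corresponds to one condition/value pair, so adding a new dismissal type means adding one more pair rather than one more level of nesting.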
Formula used:
=IFS(M10="caught", CONCAT("c ",O10," b ",G10), M10="bowled", CONCAT("b ",G10), M10="run out", CONCAT("run out (",O10,")"), TRUE, "Not Applicable")
To summarize, some of the reasons why DC lost the IPL final against MI were:
● Since both the teams gave away an equal number of extras, nothing can be concluded
here.
● DC lost more wickets in comparison to MI.
● The run rate in the first 5 overs for MI was 11.6 while DC had a run rate of 7 and so we
can say that MI was able to score a significant amount of runs in the first 5 overs itself.
● In the first 5 overs DC lost 3 wickets. They lost a few of their top order batsmen, while MI
lost only 1 wicket during the same duration.
Tables in Excel
● To make managing and analyzing a group of related data easier, you can turn a range of
cells into an Excel table (previously known as an Excel list).
● Import the data into the Excel file.
● Format it as a table with a suitable name so that it is easy to access later.
○ Note: table names cannot contain spaces
● A table name can then be used to reference its cell ranges.
● Reasons to use an Excel table
● Excel Table Benefits
Calculations in Excel
● Order of operations
● Formulas calculate values in a specific order.
● A formula in Excel always begins with an equal sign (=).
● Excel interprets the characters that follow the equal sign as a formula.
● Following the equal sign are the elements to be calculated (the operands), such as
constants or cell references.
● These are separated by calculation operators.
● Excel calculates the formula from left to right, according to a specific order for each
operator in the formula.
Note: If a formula contains multiple operators with the same priority (e.g. multiplication and
division, or addition and subtraction), Excel will evaluate the operators from left to right.
Question 10
Intuition
Solution
● We will create a new column which checks the month and year of the match and we will
use AND function to combine them.
○ Formula used:
=IF(AND((YEAR([@date])=2008),(MONTH([@date])=5)),1,0)
● We can either use the COUNTIF() or SUM() Formula to add up.
○ Formula used:
=COUNTIF([Condition],1)
Demo:
● In the header cell next to umpire2, type ‘condition’ and press enter to create a new
column.
● In the first cell of the new column, enter the formula
=IF(AND((YEAR([@date])=2008),(MONTH([@date])=5)),1,0)
● In any cell outside the table, enter =SUM(Table_24[Condition]) using either cell
selection or table reference.
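The helper-column approach above (flag each row with 1 or 0, then add up the flags) can be sketched in Python. This is an illustration under assumptions: the dates are hypothetical sample values, not rows from the dataset.

```python
from datetime import date


def in_may_2008(match_date):
    """Mirror of IF(AND(YEAR(date)=2008, MONTH(date)=5), 1, 0)."""
    return 1 if (match_date.year == 2008 and match_date.month == 5) else 0


# Hypothetical match dates.
match_dates = [
    date(2008, 4, 18),
    date(2008, 5, 1),
    date(2008, 5, 25),
    date(2009, 5, 3),
]

condition = [in_may_2008(d) for d in match_dates]  # the new helper column
matches_in_may_2008 = sum(condition)               # SUM or COUNTIF(...,1)
print(condition, matches_in_may_2008)
```

Because the flags are only 0 or 1, summing them and counting the 1s give the same result, which is why either SUM or COUNTIF works.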
Question 11
Complete ODI batting career data of Sachin Tendulkar is given in [dataset link]. Count the
number of 100s and 50s he has scored?
Solution :
Create a new column ‘cleaned runs’ and perform the above steps.
______________________________________________________________________________
Agenda
● Problem Statement
○ Lookup and Reference functions
■ Lookup - Vlookup & Hlookup
■ Index & Match
■ Dynamic Array Formulas: Filter, Unique & Xlookup
● Text functions
○ TextJoin
● Logical functions
○ IFError
● Math function
○ Sum, Sequence
Problem Statement
You are hired at ESPN Cricinfo as a Data Analyst. You have to implement a search functionality
where given a Match ID :
● You should be able to find the name of the winning team, the venue and city.
● Data can be retrieved based on whichever column is selected.
Lookup Functions
● The LOOKUP function is categorized under Excel Lookup and Reference functions.
● The function performs an approximate match lookup either in a one-row or one-column range
and returns the corresponding value from another one-row or one-column range.
● The more advanced versions of the LOOKUP function are HLOOKUP and VLOOKUP.
● LOOKUP cannot be used to search multiple rows and columns (like a table).
● Use (V/H)LOOKUP to search one row or column, or to search multiple rows and
columns (like a table). It is an improved version of LOOKUP.
Vlookup
=VLOOKUP(What you want to look up, where you want to look for it, the column number in the
range containing the value to return, return an Approximate or Exact match – indicated as
1/TRUE, or 0/FALSE).
Question 1
Given the ID of the Match, return the name of the team which won the match.
Intuition
To solve this problem :
● We will use Vlookup
● It is a function that makes Excel search for a certain value in a column, in order to return
a value from a different column in the same row.
● In its simplest form, the VLOOKUP function says:
=VLOOKUP(What you want to look up, where you want to look for it, the column number in the
range containing the value to return, return an Approximate or Exact match – indicated as
1/TRUE, or 0/FALSE).
Solution
=VLOOKUP(F10,IPLMatchData23,11,FALSE)
Demo:
Find the table name and enter the formula - =VLOOKUP(F10,table_name,11,FALSE) in
G10.
● Limitations:
○ VLOOKUP always returns the first match
○ Can only look up values in the leftmost column of the range
○ If a column is added or deleted, we need to update the column index in the formula
Question 2
Given the ID of the Match, return the name of the city and the venue.
Intuition
To solve this problem: We will use Vlookup
Solution
● Formula to get city :
=VLOOKUP(F10,IPLMatchData23434[[#Headers],[#Data]],2,FALSE)
● Or, =VLOOKUP(F10,IPLMatchData23434,2,FALSE)
● Formula to get venue:
=VLOOKUP(F10,IPLMatchData23434[[#Headers],[#Data]],5,FALSE)
Demo:
● Find the table name
● Enter formula =VLOOKUP(F10,table_name,2,FALSE) in G10
● Enter formula =VLOOKUP(F10,table_name,5,FALSE) in H10
We’ll look at a dummy example on how the Vlookup approximate match works.
Question
Determine the unit prices for the given quantity using the reference quantity table.
In an approximate match, VLOOKUP returns the row with the largest value that is less than or
equal to the lookup value.
Note:
● If range_lookup is TRUE or left out, the first column needs to be sorted alphabetically or
numerically in ascending order.
● If the first column isn’t sorted, the return value might be something you don’t expect.
● Either sort the first column, or use FALSE for an exact match.
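The approximate-match rule can be sketched in Python with a binary search over a sorted first column. This is an emulation for illustration, not Excel's implementation; the `vlookup_approx` helper and the quantity/price table are hypothetical.

```python
from bisect import bisect_right


def vlookup_approx(lookup_value, table, col_index):
    """Pick the last row whose first cell is <= lookup_value.

    table must be sorted ascending by its first column; col_index is 1-based.
    """
    keys = [row[0] for row in table]
    pos = bisect_right(keys, lookup_value)
    if pos == 0:
        return None  # nothing small enough; Excel would show #N/A
    return table[pos - 1][col_index - 1]


# Hypothetical reference table: (minimum quantity, unit price).
price_table = [(1, 20.0), (10, 18.0), (50, 15.0), (100, 12.0)]

print(vlookup_approx(25, price_table, 2))
```

A quantity of 25 falls between the rows for 10 and 50, so the row for 10 is used. If the first column were unsorted, the search would land on an arbitrary row, which is the unexpected result the note warns about.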
Question 3
For any Match ID, return the value of whichever column is selected.
Intuition
We need the position of an item in a range instead of the item itself, therefore we use Match
rather than Lookup.
Solution
● Match function formula used:
=MATCH(G10,IPLMatchData2345[[#Headers],[id]:[umpire2]],0)
● Or, =MATCH(G10,IPLMatchData2345[#Headers],0)
● Use this formula inside the Vlookup formula :
=VLOOKUP(F10,IPLMatchData2345,MATCH(G10,IPLMatchData2345[#Headers],0),FALSE)
● Add Data Validation so that we do not enter an incorrect column name
Demo:
● Go to H11 and enter the formula =MATCH(G10,table_name[#Headers],0) to get
column number
● In H10, enter the formula :
=VLOOKUP(F10,table_name,MATCH(G10,table_name[#Headers],0),FALSE)
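The combination of MATCH (find the column number from the header) and VLOOKUP (find the row and read that column) can be sketched in Python. The helper functions, headers and rows below are hypothetical stand-ins, not the workbook's table.

```python
# Hypothetical header row and data rows standing in for the matches table.
headers = ["id", "city", "venue", "winner"]
rows = [
    [101, "City A", "Venue A", "Team X"],
    [102, "City B", "Venue B", "Team Y"],
]


def match(value, values):
    """Return the 1-based position of value, like MATCH(value, values, 0)."""
    return values.index(value) + 1


def vlookup(lookup_value, table, col_index):
    """Return the col_index-th cell (1-based) of the first row whose first cell matches."""
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would show #N/A here


def lookup_field(match_id, column_name):
    """VLOOKUP with the column index supplied by MATCH on the header row."""
    return vlookup(match_id, rows, match(column_name, headers))


print(lookup_field(102, "venue"))
```

Because the column index is computed from the header name, adding or removing a column does not require editing the lookup itself.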
Adding data validation :
Question 4
Find the first Match Id for which Sachin Tendulkar was the Player of the match.
Intuition
● We notice that this is the case of reverse lookup or left lookup as the id column lies on
the left side of the “player_of_match” column.
● We will use the Index and Match functions.
● INDEX returns the value at a given position & MATCH returns the position of the matched
value.
Solution
Formula used:
=INDEX(IPLMatchData2345636[id],MATCH(F30,IPLMatchData[player_of_match],0),1)
To decide if you can use a VLOOKUP, or if you need to use INDEX and MATCH, you need to look
at two things: whether you’re explicitly asked to find the location of a single input, and
whether you need to look something up using two or more pieces of information. In either case,
you’ll need to use the MATCH and/or INDEX functions instead.
MATCH finds the position of an item in a range. INDEX retrieves the value at a given location in a
range.
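The left lookup from Question 4 can be sketched in Python: MATCH finds the position in one column and INDEX reads the same position from another column, regardless of which column is on the left. The sample ids and player names are hypothetical.

```python
# Hypothetical columns of the matches table.
ids = [101, 102, 103, 104]
player_of_match = ["Player A", "Player B", "Player A", "Player C"]


def match(value, values):
    """1-based position of the first exact match, like MATCH(value, values, 0)."""
    return values.index(value) + 1


def index(values, position):
    """Value at a 1-based position, like INDEX(values, position)."""
    return values[position - 1]


first_id = index(ids, match("Player A", player_of_match))
print(first_id)
```

MATCH stops at the first occurrence, so the result is the first match id for that player.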
HLOOKUP Demonstration
Question 5
Considering the data given, what is the Number of Centuries scored by V Sehwag?
Intuition
We will use HLOOKUP.
HLOOKUP searches for a value in the top row of a table or an array of values, and then returns
a value in the same column from a row you specify in the table or array.
Solution
Formula used:
=HLOOKUP(L9,Table12[[#All],[SR Tendulkar]:[V Kohli]],3,FALSE)
Demo:
● Go to cell L10 and enter =HLOOKUP(L9, Table_name[#All], 3,0)
Question 6
Intuition
We compare the calculation with and without an array function.
An array function can perform multiple calculations on one or more items in an array.
Array functions can return either multiple results, or a single result.
Solution
Demo:
● Enter =[@[Price per piece]]*[@Quantity] in G12
● Delete the value in G19
● Enter =SUM(G12:G18) in G20
Demo:
● Enter =SUM(E20:E26*F20:F26) in G33
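The two demos above compute the same total in different ways, which a short Python sketch can make explicit. The prices and quantities are hypothetical sample values.

```python
# Hypothetical price-per-piece and quantity columns.
prices = [10.0, 2.5, 4.0]
quantities = [3, 4, 5]

# Without an array formula: a helper column of row totals, then SUM over it.
line_totals = [p * q for p, q in zip(prices, quantities)]
total_with_helper = sum(line_totals)

# With an array formula: multiply the two ranges element by element
# and sum the products in a single expression.
total_array = sum(p * q for p, q in zip(prices, quantities))

print(line_totals, total_with_helper, total_array)
```

The array version needs no intermediate column, which is the main point of the comparison.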
Question 7
Consider the data for a match with id 335982. How many overs were bowled in the first innings
and second innings?
Intuition
Legacy formulas return the output in one cell, but dynamic array formulas return the output in
a range of cells. They return arrays of variable size, hence the name dynamic array.
Examples: FILTER, SEQUENCE, UNIQUE, XLOOKUP
Solution
Demo:
● Go to cell E242 and enter =FILTER(Table_name[over],Table_name[inning]=1)
● Go to cell K242 and enter =MAX(E242#)
● Go to Q242 and enter
=MAX(FILTER(Table_name[over],Table_name[inning]=2))
Question 8
Consider the data for a match with id 335982. How many overs were bowled in the first innings
and second innings?
Intuition
Solution
● Filter out rows for 1st innings similar to what we did previously.
=FILTER(MatchData15[over],MatchData15[inning]=1)
○ Syntax: =FILTER(array,include,[if_empty])
● Use max to get the maximum number of overs.
=MAX(E242#)
● Generate sequence of overs from 0 to max.
=SEQUENCE(L242+1,,0,1)
○ Syntax: =SEQUENCE(rows,[columns],[start],[step]).
● Get runs scored on each ball for the 1st over
=FILTER(MatchData15[total_runs],(MatchData15[over]=Z242)*(MatchData15[inning]=1))
● Use TEXTJOIN to combine them into a single comma-separated line, and do this for all
the overs
=TEXTJOIN(",",TRUE,FILTER(MatchData15[total_runs],(MatchData15[over]=AH242)*(MatchData15[inning]=1)))
○ Syntax: =TEXTJOIN(delimiter, ignore_empty, text1, [text2], …)
● Remove TEXTJOIN from the above formula and SUM the runs for each over.
=SUM(FILTER(MatchData15[total_runs],(MatchData15[over]=AP242)*(MatchData15[inning]=1)))
● Get the total score over by over
=SUM($AR$242:AR242)
Note: Repeat above steps to get the result for 2nd innings.
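The chain of steps above (FILTER, MAX, SEQUENCE, TEXTJOIN, SUM, running total) can be sketched end to end in Python. This is an illustration only: the ball-by-ball tuples are hypothetical and the variable names are not part of the workbook.

```python
from itertools import accumulate

# Hypothetical ball-by-ball rows: (inning, over, total_runs).
balls = [
    (1, 0, 1), (1, 0, 0), (1, 0, 4),
    (1, 1, 6), (1, 1, 0),
    (1, 2, 2), (1, 2, 1),
    (2, 0, 4), (2, 0, 1),
]


def runs_in_over(inning, over):
    """FILTER total_runs where both the over and the inning match."""
    return [r for (i, o, r) in balls if i == inning and o == over]


first_innings_overs = [o for (i, o, r) in balls if i == 1]   # FILTER
max_over = max(first_innings_overs)                          # MAX
overs = list(range(0, max_over + 1))                         # SEQUENCE from 0 to max
per_ball_text = [",".join(str(r) for r in runs_in_over(1, o)) for o in overs]  # TEXTJOIN
runs_per_over = [sum(runs_in_over(1, o)) for o in overs]     # SUM(FILTER(...))
cumulative = list(accumulate(runs_per_over))                 # running total

print(per_ball_text, runs_per_over, cumulative)
```

Multiplying the two conditions in the Excel FILTER plays the same role as the `and` in `runs_in_over`: a row is kept only when both tests are true.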
Dynamic array formulas vs CSE (Ctrl + Shift + Enter) formulas:
● New array formulas spill to many cells automatically. CSE formulas must be copied to a
range of cells to return multiple results.
● The output of dynamic array formulas automatically resizes as the data in the source range
changes. CSE formulas truncate the output if the return area is too small and return errors in
extra cells if the return area is too large.
● A dynamic array formula can be easily edited in a single cell. To modify a CSE formula, you
need to select and edit the whole range.
● With dynamic arrays, row insertion or deletion is not a problem. It is not possible to delete
and insert rows in a CSE formula range - you need to delete all existing formulas first.
Question 9
Determine the first match ID and city of match for which Sachin Tendulkar was the player of the
match.
Intuition
We have seen this question previously (Qn 4). This time we will use a dynamic array formula
(the XLOOKUP function), which works in any direction and returns exact matches by default.
Solution
Formula used:
=XLOOKUP(E10,IPLMatchDataTable[player_of_match],IPLMatchDataTable[[id]:[city]],,0,1)
Problems of Index Match (and better alternative)
______________________________________________________________________________
Agenda
● Problem Statement I
○ Functions
■ Unique, Filter
○ Pivot Tables
■ Helper column, Calculated Fields, Pivot Filter, Slicers
○ Duplicate Row Deletion
○ Custom formatting of cell
○ Conditional formatting
■ Top/Bottom Rules, Highlight rules, Data Bars
○ Charts
■ Bar, Pie, Stacked Bar
○ Conclusion
● Match Summary Analysis Demonstration
● Problem Statement II
○ Statistical Functions
■ Mean, Median, Mode, SD, Variance, Quartile, Min, Max
■ Box Plot, Histogram
Problem Statement 1
You have applied for a data analyst position in an Indian sports-tech startup company. In the
interview, you are given the IPL dataset and have been asked to analyze Mumbai Indians’
performance in IPL (2008-2020).
Question 1
Intuition
To solve the problem, we will use :
● Excel functions
● Charts
Solution
● Create two new columns for each team to count the total number of matches played and the
total number of wins (show how to do it for 2 teams only; for the rest, use the worked worksheet)
○ =IF([@winner]= "Mumbai Indians", 1, 0)
○ =IF( ([@team1]="Mumbai Indians")+([@team2]="Mumbai Indians"),1,0)
● Finally we calculate :
= Total wins / Total matches played
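The two helper columns and the final division can be sketched in Python. The match rows and the team names other than Mumbai Indians are hypothetical, so the resulting figures are not the real win percentages.

```python
# Hypothetical rows of the matches table.
matches = [
    {"team1": "Mumbai Indians", "team2": "Team B", "winner": "Mumbai Indians"},
    {"team1": "Team C", "team2": "Mumbai Indians", "winner": "Team C"},
    {"team1": "Team B", "team2": "Team C", "winner": "Team B"},
    {"team1": "Mumbai Indians", "team2": "Team C", "winner": "Mumbai Indians"},
]


def win_percentage(team, rows):
    """Total wins divided by total matches played, as a fraction."""
    played = sum(1 for m in rows if team in (m["team1"], m["team2"]))
    won = sum(1 for m in rows if m["winner"] == team)
    return won / played if played else 0.0


mi_win_pct = win_percentage("Mumbai Indians", matches)
print(round(mi_win_pct, 4))
```

The `played` count corresponds to the second helper column (team1 or team2 is the team) and the `won` count to the first (winner is the team).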
You can create a bar chart selecting the win % table and clicking Insert > Bar Chart > 2D
column.
Comparisons:
We can see that MI has played the most matches and has the second-highest winning
percentage among all the teams, narrowly behind Chennai Super Kings.
Question 2
Intuition
To solve the problem, we will use:
● Excel Remove duplicates button under data tab
● We will use Excel functions
● Charts
Solution
1. Method 1
● Removing duplicates using the remove duplicate button.
● Copy the column content to a new column and apply Remove Duplicates
2. Method 2
● Removing duplicates using the UNIQUE dynamic array function.
● Formula used:
=UNIQUE(Ques2[venue],FALSE,FALSE)
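What UNIQUE does to the venue column can be sketched in Python as order-preserving de-duplication. The `unique` helper and the venue names are hypothetical.

```python
def unique(values):
    """Distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


# Hypothetical venue column with repeats.
venues = ["Venue A", "Venue B", "Venue A", "Venue C", "Venue B"]

distinct_venues = unique(venues)
print(distinct_venues)
```

Unlike the Remove Duplicates button, this leaves the source list untouched and produces a new list, which is also how the UNIQUE formula behaves.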
Analysis: There are various venues where MI has played. So performance analysis is not
biased by venue.
Question 3
Intuition
To solve the problem, we will use :
● Pivot Table, Conditional Formatting (top/bottom rules)
● FILTER and LARGE functions
● Bar Chart
Solution
1. Select the entire table, click Insert > Pivot Table.
2. Select a location in the existing sheet, click OK.
3. Drag venue to rows, id to values, change value setting of id to count, drop winner on
filter, select Mumbai Indians from dropdown.
Adding conditional formatting and inserting chart :
We can select the data we got using Filter and Large function and can insert a Bar chart.
What are Pivot Tables?
A Pivot Table is a powerful tool to calculate, summarize, and analyze data that lets you see
comparisons, patterns, and trends in your data.
In its most basic form, a Pivot Table takes data and summarizes it so you can make sense of it.
Choosing Fields
1. Value Fields
2. Row Fields
3. Column Fields
4. Filter Fields
Our table fields are available in the fields list, and we can drop them into the appropriate
areas (i.e. row, column, filter, value) as per our requirement.
Question 4
Highlight winning margins that are greater than 50 and also find out top 10 winning margins
greater than 50.
Intuition
To solve the problem, we will use :
● Conditional formatting (Highlight rules)
● Pivot Table
● Bar Chart
● Sort and filter function
Solution
Pre-processing
Where the match was an eliminator, we change NA to 0 in the result margin column.
1. Method 1
● Show how to highlight cells using highlight rules of conditional formatting.
2. Method 2
● Using Pivot table and conditional formatting on pivot table.
3. Method 3
● Show how to get only top 10 results using the FILTER function and display the
result using a bar chart explaining how to select appropriate data to plot the chart
as per requirement.
○ =SORT(FILTER(S14:T133,T14:T133>50),2,-1)
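The SORT(FILTER(...)) formula can be sketched in Python as a filter followed by a descending sort on the second column. The rows are hypothetical, and the final slice to ten rows is an addition for illustration, since the formula as written returns every row above 50.

```python
# Hypothetical (label, result_margin) rows.
rows = [
    ("Match A", 62),
    ("Match B", 45),
    ("Match C", 98),
    ("Match D", 51),
    ("Match E", 50),
]

# FILTER: keep rows whose margin is strictly greater than 50.
filtered = [row for row in rows if row[1] > 50]

# SORT(..., 2, -1): order by the second column, descending.
result = sorted(filtered, key=lambda row: row[1], reverse=True)

# Assumption for illustration: cap the output at the top 10 rows.
top_10 = result[:10]
print(top_10)
```

A margin of exactly 50 is excluded because the condition is "greater than", not "greater than or equal to".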
Question 5
Intuition
To solve the problem, we will use :
● Pivot table
● Pivot chart
Solution
● Create pivot table
● Use the value filter available in the Pivot table to filter top 10
● Insert a pivot chart
Analysis: RG Sharma has won the man of the match title the most times, followed by KA
Pollard.
Question 6
Intuition
To solve the problem, we will use :
● Pivot table
● Calculated fields
● Custom formatting, and
● Pie chart
Solution
● We make 2 helper columns: one to check whether MI played the match, and the other to
check whether MI won the toss. We sum the second column to find the tosses won by MI.
○ =IF( ([@team1]="Mumbai Indians")+([@team2]="Mumbai Indians"),1,0)
○ =IF( [@[toss_winner]]="Mumbai Indians",1,0)
● Make a Pivot table
● Add Calculated fields to get % of toss won and lost for MI
● Formula for % toss won = ‘toss won by MI’ / ‘Total matches played by MI’
● Formula for % toss lost = ‘toss lost by MI’ / ‘Total matches played by MI’ (to get toss lost
by MI, we subtract toss won by MI from total matches played by MI)
______________________________________________________________________________
Agenda
In this lecture, we will try to connect the dots and try to build an end to end dashboard using the
same data set to summarize a match.
Note:
● IPL_Matches_2008_2020 - Table is referred as Matches table in the script
● IPL_Ball_by_Ball_2008_2020 - Table is referred to as the Ball by Ball table in the script.
What is a Dashboard?
● A dashboard is a visual representation of key metrics that allow you to quickly view and
analyze your data in one place.
● Dashboards not only provide consolidated data views, but a self-service business
intelligence opportunity, where users are able to filter the data to display just what’s
important to them.
2. Add a new column to number the matches based on season and the numbering restarts
with a new season.
○ Use formula: =COUNTIF($C$2:C2,C2) [Drag it to all the cells]
● Add a column to get the season of the match using match id.
○ Use formula: =VLOOKUP($B2,'IPL Matches 2008-2020'!$A$1:$U$817,3,FALSE)
● Add a column to get the match name using the match id.
○ Use formula: =VLOOKUP($C2,'IPL Matches 2008-2020'!$A$1:$U$817,21,FALSE)
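The per-season match numbering produced by =COUNTIF($C$2:C2,C2) can be sketched in Python. The range grows by one row each time the formula is dragged down, so each row counts how often its own season has appeared so far. The season values are hypothetical.

```python
def running_count(values):
    """For each item, count how many times it has appeared up to and including that row."""
    seen = {}
    result = []
    for value in values:
        seen[value] = seen.get(value, 0) + 1
        result.append(seen[value])
    return result


# Hypothetical season column, one entry per match.
seasons = [2008, 2008, 2008, 2009, 2009, 2010]

match_numbers = running_count(seasons)
print(match_numbers)
```

The numbering restarts at 1 whenever a new season begins, because the count for that season starts from zero.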
Creating a Slicer sheet
● Add a dummy pivot table to add slicers. [Select ball by ball table range]
● Slicers can only be added when we have a pivot table.
● Add Season, Match name and Innings slicer.
● Add another pivot table to get the match id and connect each slicer to multiple pivot
tables. [Select ball by ball table range]
● This will ensure that as we change values in a slicer the values in the pivot table are
updated.
1. Here we are getting the data of the match ID which is present in the slicer sheet. (2nd
pivot table containing only the match id field)
2. Get the following fields using vlookup from Matches table
○ Team 1: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,10,FALSE),"")
○ Team 2: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,11,FALSE),"")
○ Winner: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,14,FALSE),"")
○ Result: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,15,FALSE),"")
○ Result margin: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,16,FALSE),"")
○ Player of the match: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,7,FALSE),"")
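Wrapping VLOOKUP in IFERROR returns an empty string instead of an error when the match id is not found, for example when no match is selected in the slicer. A Python sketch of the same idea follows; the dictionary contents and the `lookup_or_blank` helper are hypothetical.

```python
# Hypothetical matches table keyed by match id.
matches = {
    101: {"team1": "Team A", "team2": "Team B", "winner": "Team A"},
    102: {"team1": "Team C", "team2": "Team D", "winner": "Team D"},
}


def lookup_or_blank(match_id, field):
    """Return the field for match_id, or "" if the lookup fails (like IFERROR(..., ""))."""
    try:
        return matches[match_id][field]
    except KeyError:
        return ""


print(lookup_or_blank(101, "winner"), repr(lookup_or_blank(999, "winner")))
```

Returning a blank keeps the dashboard cells clean instead of showing #N/A while the selection is incomplete.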
Creating a Dashboard
● Copy paste slicers from Slicer sheet to DB sheet and modify and adjust them until it
looks good.
● Modify the DB sheet to change the look and feel of that sheet and add a dashboard title.
Steps:
1. This is the screenshot of the dashboard we have made. We want to protect the
dashboard so that users can interact with Season, Match Name and Innings only, and cannot
check formulas or edit the dashboard.
2. Right click on these groups (Season, match name, inning) with which the user can
interact, And go to size and properties.
○ First go to properties -> uncheck the locked option and choose option: don't
move or size with cells.
○ Then go to position and layout under the same tab, and check disable resizing
so that the user cannot resize the slicer
○ Do the same for all with which user can interact with i.e. Season, match name
and innings
3. Then go to the Review tab and select Protect Sheet. Uncheck ‘Select locked cells’.
You can also set a password so that the user can’t unprotect the sheet without it.
Now users can interact with Season, Match name and inning only, and can check results of
others like batting data, bowling data, etc.
4. Since we have used other sheets for developing the dashboard, we want to hide them
as well so that users can’t check them and protect them as well.
○ So, first select all of the sheets which we want to hide and right click and hide
them.
○ After that, go to the Protect Workbook option under the Review tab and protect
the workbook. Here too, you can set a password.
Now a user can just interact with interacting groups (like season, match name, innings) and can
check the plot/dashboard of the same data.
Coffee Chain Capstone
1. You’ll be seeing questions related to the case study under the homework section from
the next lecture.
2. Each question will have a doc link wherein you’ll find a dataset download link and its
description.
3. Case Study Doc: link
1. Balls_helper: to keep a count of the balls while developing the batting pivot table
○ Enter 1 in every row of this column
2. Is_4: to check if the batsman has scored a 4
○ Formula Used: =IF($J2=4,1,0)
3. Is_6: to check if the batsman has scored a 6
○ Formula Used: =IF($J2=6,1,0)
4. Is_wide: to check if it is a wide ball
○ Formula Used: =IF($R2="wides",1,0)
5. Is_noball: to check if it is a no ball
○ Formula Used: =IF($R2="noballs",1,0)
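The five helper columns can be sketched in Python as one function applied to each ball. This assumes that column J holds the batsman's runs and column R holds the extras type, which the formulas above suggest but the notes do not state; the function and argument names are hypothetical.

```python
def helper_flags(batsman_runs, extras_type):
    """Compute the helper columns for one ball as 0/1 flags."""
    return {
        "balls_helper": 1,  # always 1, so that summing it counts balls
        "is_4": 1 if batsman_runs == 4 else 0,
        "is_6": 1 if batsman_runs == 6 else 0,
        "is_wide": 1 if extras_type == "wides" else 0,
        "is_noball": 1 if extras_type == "noballs" else 0,
    }


print(helper_flags(4, ""))
print(helper_flags(0, "wides"))
```

Because every flag is 0 or 1, a pivot table can sum these columns to get balls faced, fours, sixes, wides and no balls per batsman or bowler.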
1. Since we have added new columns we have to change the range of earlier pivot table to
accommodate the new columns.
2. Adding the pivot table on the dashboard.
Note:
● To overcome the problem of updating the range of every pivot table we can format the
matches and ball by ball as tables (This is where Excel tables shine).
● After formatting as tables even if we add new columns we just need to right click each
pivot table and select refresh it would automatically include all the new columns added.
● Once you are done with all the steps till slide 9 you can format both matches and ball by
ball as table
● Apply custom formatting for the cell display number up to 2 decimal places in percentage
format (by right click > format cells).
● Remove fields that are not necessary in the pivot table and make a pie chart
visualization.
Analysis: MI has a slightly greater toss win % in comparison to the toss loss %.
Question 7
Intuition
To solve the problem, we will use:
● Pivot table
● Calculated fields
● Formatting
● Stacked Bar Chart
Solution
1. We make 3 helper columns: the 1st to check if MI played the match, the 2nd to check if MI
batted first, and the 3rd to check if MI fielded first (use the same logic as in question 1)
○ =IF( ([@winner]="Mumbai Indians")*([@[toss_decision]]="bat"),1,0) // When Mumbai won the toss
○ =IF( (([@winner]="Mumbai Indians")*([@[toss_decision]]="bat")*([@[toss_winner]]="Mumbai Indians"))+(([@winner]="Mumbai Indians")*([@[toss_decision]]="field")*([@[toss_winner]]<>"Mumbai Indians")),1,0) // Doesn’t matter who wins the toss
2. Add a Pivot table
3. Add filter to get only venues where MI played using filter area in pivot table
4. Calculated field (make sure to not multiply by 100)
5. Format as percentage
Analysis: MI has higher batting first win %. Their batsmen have helped them win more matches
compared to that by their bowlers.
Question 8
Intuition
To solve the problem, we will use :
● Pivot table
● Formatting
Solution
1. We make a helper column to check if MI played the match
2. Add a Pivot table
3. Filter where MI played match and method was D/L using filter area in pivot table
4. No Chart
Analysis: No match of MI was decided using the D/L method across the IPL seasons.
Question 9
Intuition
To solve the problem, we will use :
● Pivot table
● Bar chart
Solution
● We will select id and apply count aggregation in values, winner is added to filters while
result field is added to the rows in the Pivot table.
● Go to Ribbon → Insert Tab and select 2D Column (Bar chart)
Analysis: MI has won more matches where they were defending the runs (opted to bat first)
compared to chasing the runs or number of matches that ended in a tie.
Question 10
How many of MI’s IPL matches went to a super over, and how many of them did MI win?
Intuition
To solve the problem, we will use:
● Pivot Table
Solution
● Build a pivot table
● In the filters area of the Pivot table filter where it was an eliminator match, MI played that
match and the winner was MI.
● No Chart
Analysis: MI faced 2 super over matches and won both of them.
Conclusion
● Mumbai Indians have the second highest winning % in the IPL history, while Chennai
Super Kings have the highest winning % by a narrow margin.
● Preliminary analysis tells us that the Mumbai Indians team has played at various venues,
thus their performance is not biased by a venue.
● But, doing a deeper analysis we find that MI has a significantly higher number of wins in
Wankhede Stadium in comparison to the other venues.
● We find that there are 12 matches where MI won with a margin greater than 50 runs.
For our analysis, we haven’t considered the wickets margin.
● RG Sharma has won the man of the match title the most times, followed by KA Pollard.
● MI has a slightly greater toss win % in comparison to the toss loss %.
● MI has higher batting first win %. Their batsmen have helped them win more matches
compared to that by their bowlers.
● No match of MI was decided using D/L method across the IPL seasons
● MI has won more matches where they were defending the runs (opted to bat first)
compared to chasing the runs or number of matches that ended in a tie.
● MI faced 2 super over matches and won both of them. We have very few data points
here but definitely a positive thing for MI to be able to win both the matches.
Problem Statement 2
You are working as a data analyst at Hotstar. You are asked to calculate the standard deviation
of the total runs and the most frequently occurring total runs scored by MI throughout their IPL
history.
Steps
● Create a pivot table on ball by ball data. Match id in Rows, batting team in Filters and
total runs as Sum in Values.
● Create chart Box plot and Histogram on column named Total Runs (copied from Pivot
table Sum of total runs column)
● Separately, use Excel statistical formulas such as:
○ Mean: =AVERAGE(V4:V206)
○ Mode: =MODE(V4:V206)
○ Median: =MEDIAN(V4:V206)
○ Variance: =VAR.P(V4:V206) (population) or =VAR(V4:V206) (sample)
○ Standard Deviation: =SQRT(Y5) or =SQRT(Y7)
○ First Quartile: =QUARTILE(V4:V206,1)
○ Third Quartile: =QUARTILE(V4:V206,3)
○ Min: =MIN(V4:V206)
○ Max: =MAX(V4:V206)
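The same summary statistics can be cross-checked in Python with the standard `statistics` module. This is a sketch on hypothetical totals, not the MI data; it assumes Python 3.8 or later, and it uses the "inclusive" quantile method, which is the interpolation that corresponds to Excel's QUARTILE.

```python
import math
import statistics

# Hypothetical match totals.
totals = [140, 150, 150, 160, 170, 190]

mean = statistics.mean(totals)                      # AVERAGE
mode = statistics.mode(totals)                      # MODE
median = statistics.median(totals)                  # MEDIAN
var_population = statistics.pvariance(totals)       # VAR.P
var_sample = statistics.variance(totals)            # VAR
sd_population = math.sqrt(var_population)           # SQRT of the population variance
q1, q2, q3 = statistics.quantiles(totals, n=4, method="inclusive")  # QUARTILE 1, 2, 3
lowest, highest = min(totals), max(totals)          # MIN, MAX

print(mean, mode, median, var_population, var_sample, q1, q3, lowest, highest)
```

The population variance divides by the number of values and the sample variance by one less, which is why VAR.P and VAR give different results on the same range.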
Summary
Macros Introduction
● When working in spreadsheets, you can enter into a loop of repetitive actions - copying
cell values, formatting, creating formulas, and so forth - which can grow tedious and lead
to mistakes.
● A macro is an action or a set of actions that you can run as many times as you want.
When you create a macro, you are recording your mouse clicks and keystrokes.
● These actions become the ‘script’ that gets repeated once you save and activate the
macro later.
Macros and VBA tools can be found on the Developer tab, which is hidden by default, so the
first step is to enable it.
Steps:
Problem Statement 3
You are working as a Data Analyst at an EdTech company and you are asked to automate the
task of computing class-wise total marks and percentage for each student.
Dataset: link
Summary Table Demonstration
Given the ball-by-ball data for match id 1237181 create a batting and bowling scorecard as
shown.
Given the ball by ball data for match id 1237181 create a graph which indicates the cumulative
score at the end of each over as shown:
To get running SUM → Select pivot table DC column → Show Values as → More Options →
Show Data as → Drop down list → Running total in