Excel - Consolidated Lecture Notesids

Uploaded by

minemail257
Copyright
© © All Rights Reserved

Excel Lecture 1: Introduction to Excel and Formulas

Agenda

● Excel Introduction
○ What is MS Excel?
○ MS Excel Marketplace
○ Job Opportunities
○ Purchase options : Excel 2021 vs Excel 365
○ Download and Install Excel
○ Excel Basics
● Excel Formulas
○ Problem Statement I & Solution
○ Problem Statement II & Solution

Excel Introduction

What is Microsoft Excel?

Excel is a powerful spreadsheet program that allows you to store, organize, manipulate, and
analyze information.

Excel is one of the most used software applications of all time*

Microsoft Excel Marketplace

● Forrester Research found that 81% of businesses use Excel. Because most businesses
rely on it, Excel skills are highly marketable to employers.
● Excel is used by an estimated 750 million people worldwide.
● According to LinkedIn, Data Analyst roles that use Excel are among the top 10 most
in-demand jobs.

Note: Forrester is a research and advisory company that offers a variety of services including
research, consulting, and events.

Excel Job opportunities

Some of the top jobs that require Excel skills:

● Data Analyst
● Business Analyst
● Financial Analyst
● Project Manager

Purchase options: Excel 2021 vs Excel 365


| Excel 2021 | Excel 365 |
| --- | --- |
| Excel 2021 is a standalone ‘one-time purchase’ option of Rs 11,999. | A subscription service. Fee of Rs. 4899/year for 1 person and Rs. 6199/year for up to 6 persons to rent the software from Microsoft. |
| Excel 2021 is yours to keep but does not receive perpetual updates. | Subscribers automatically receive the latest updates. |
| Excel 2021 can be installed on one device. | Excel 365 can be installed on up to 5 devices. |
| No access to an online version of Excel 2021. | Access to online applications included in subscription. |

For pricing refer here.

Office 365 Excel Download and Installation Steps

Pre-read:
https://docs.google.com/document/d/1b_M1zbPYdUCHqsbz9gjMyaBcYY5wk5TDJIVlfNdZ7YM/edit?usp=sharing

Excel Basics

This doc is a pre-read that covers the basics of Excel. Please give it a good read.
https://docs.google.com/document/d/1W10kUxwMzPZPX6_JerTxFkzxh9wEa-G-6usCuAoCjxg/edit?usp=sharing

Problem Statement

After losing the IPL 2020 final to Mumbai Indians (Match ID 1237181), Delhi Capitals hired you
as a Data Analyst to analyze and understand why they lost the match.

Dataset Overview
The following dataset is used:
Source : IPL Complete Dataset (2008-2020) | Kaggle
Drive link: IPL dataset

It has two tables:

1. IPL Matches 2008-2020 - Details of each match


2. IPL Ball by ball - Details of each ball in each match

Question 1

How many runs were scored by Mumbai Indians and Delhi Capitals?

Intuition
To solve this problem:
● We need to add the ball by ball runs for each of the innings.
● We will briefly discuss Navigation and Selection in Excel while solving this problem.

Solution
● Use total_runs column
● Two methods to do this :
1. Using Cell references -
Formula used:
○ =SUM(J9:J131) for DC (We need to manually select the cells)
○ =SUM(AE9:AE120) for MI

Demo:
● Click on cell K5, type =sum, press tab.
● Click on J9, press Ctrl + Shift + (Down Arrow) on keyboard, press Enter.
2. Using table name - Refer here on how to format range of cells as table
Formula used:
○ =SUM(DC_ball_by_ball[total_runs])
○ =SUM(MI_ball_by_ball[total_runs])

Demo:
Summing using table names with this formula is much clearer.

● To check the table name, click on any cell in the formatted table. A Table Design tab
appears in the ribbon; click it to see the table name.
● Click on K4, type ‘=SUM’, press tab, start typing the table name, when the table name is
highlighted in the list press tab.
● Type ‘[’, type total_runs, press tab, close both brackets, press enter.
Introducing Functions

1. We have seen SUM, what is SUM?


○ SUM is a mathematical function which adds values. You can add individual
values, cell references or ranges or a mix of all three.
2. What are functions?
○ Functions are predefined formulas that perform calculations by using specific
values, called arguments, in a particular order, or structure.
○ Functions can be used to perform simple or complex calculations.
3. Could you name a few different types of functions?
○ Note that we will see the following function types:
■ Maths
■ Logical
■ Date and Time
■ Text
■ Lookup
■ Array Formulas

Question 2

How many extra runs were given away by DC and MI?

Intuition
To solve the problem :
● Use the extra_runs column
● We will use SUM

Solution
● Formula used for DC Using table name:
○ =SUM(Ques2_DC[extra_runs])
● Formula used for MI [Scroll towards right in the same sheet]:
○ =SUM(Ques2_MI[extra_runs])

Demo:
1. Using extra runs column
○ In the Question 2 unworked sheet, click on any cell in the first table, go to table
design in the menu bar, see the table name on the left end.
○ Click on I3 cell, type =SUM(table_name[extra_runs]), press enter (use tab to
complete names from suggestions list)

To check extra runs given by MI to DC -


● Scroll to the right to see the other table.
● Find the table name.
● Type the formula =SUM(table_name[extra_runs]) in cell AF3.

Question 3

Find the number of wickets lost by MI and DC.

Intuition
● To solve this problem:
○ We will use the COUNTIF function.
○ In its simplest form, COUNTIF says
■ =COUNTIF(Where do you want to look?, What do you want to look for?)

Solution
● We will use the column is_wicket
● Formula used:
=COUNTIF(Ques3_MI[is_wicket],1) for MI

Alternate Solution
● Use of reference of another cell which contains 1
● Formula used:
=COUNTIF(Ques3_DC[is_wicket], AH23) for DC

Demo:
● Check the first table’s name, click on J3, enter the formula
=COUNTIF(table_name[is_wicket],1), press enter

● For DC, scroll to the right, check the second table’s name, click on AI3, and enter the
formula up to =COUNTIF(table_name[is_wicket],
● Click on cell AH23, close the bracket, and press Enter.
Question 4

Compare the run rates of MI and DC for overs 0-4, 5-9, 10-14 and 15-19.

Intuition
Definition of run rate: number of runs scored divided by the total number of overs in a given
innings.

In our dataset, we can’t use the AVERAGE() function because it would divide by the number of
balls (one row per ball), whereas we want to divide by the number of overs.
We will use SUM and MAX.

Solution
● Get total number of overs in an innings
● Get total number of runs
● Divide total runs/total # of overs
● Formula used:
=SUM(FirstInningsData[total_runs])/(MAX(FirstInningsData[over])+1)

We are adding 1 because over numbering starts from 0.

Demo:
● Check table name, Click on H4, type the formula =SUM(table_name[total_runs]),
press enter
● Click on J4, type the formula =MAX(table_name[over])+1, press enter
● Click on Q4, enter formula =H4/J4

The demo below shows how to calculate the run rate for MI in overs 15-19. MI faced only
4 balls in the 18th over, so the run rate must take this into account.

● Zoom out and go to the table named ‘Get run rate for MI in 15 to 19 overs’, click on
AA120, enter the formula =SUM(table_name[total_runs]), press enter.
● Click on cell AJ120, enter the formula =AA120/AG120 by selecting the cells.
Note:

● To calculate an accurate run rate, we also need to account for the number of balls faced.
Refer here for more details.
● Here we get the number of balls in overs 15-19 and the total number of overs faced
manually; we will see later how to get them using advanced functions.
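
The difference between dividing by overs started and dividing by balls faced can be sketched outside Excel. The Python snippet below is only an illustration: the sample rows are made up, and it assumes every row is a legal delivery (wides and no-balls would need to be excluded in real data).

```python
# Hypothetical ball-by-ball rows: (over, ball, total_runs); overs are numbered from 0.
balls = [
    (0, 1, 1), (0, 2, 0), (0, 3, 4), (0, 4, 0), (0, 5, 1), (0, 6, 2),
    (1, 1, 6), (1, 2, 0), (1, 3, 1), (1, 4, 1),  # innings ends 4 balls into the 2nd over
]

total_runs = sum(runs for _, _, runs in balls)

# Mirrors =SUM(total_runs)/(MAX(over)+1): the partial over counts as a full over.
overs_started = max(over for over, _, _ in balls) + 1
run_rate_by_overs = total_runs / overs_started

# Ball-aware version: convert balls faced into overs (6 balls per over).
# Assumption: every row is a legal delivery.
overs_faced = len(balls) / 6
run_rate_by_balls = total_runs / overs_faced

print(run_rate_by_overs, round(run_rate_by_balls, 2))
```

With a partial last over, the ball-aware figure is higher because the same runs are divided by fewer overs.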

Question 5

Find the number of wickets that were lost in the first five overs for MI and DC.

Intuition
To solve the problem :
● Using is_wicket column
● We will use SUMIF

Solution
● Formula used for DC: =SUMIF(Ques5_DC[is_wicket],1)
● Formula used for MI: =SUMIF(Ques5_MI[is_wicket],1)

Demo:
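● Find the table name, click on I3, and enter the formula =SUMIF(table_name[is_wicket],1).
Because is_wicket holds only 0 or 1, =SUM(Table_16[is_wicket]) returns the same count.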

● Scroll right and repeat the above steps for MI.

Question 6
From the match id 1237181, given the list of batsmen which were out during the match, if they
were out by a catch we need to display the bowler and the fielder in this format “c fielder_name
b bowler_name” (As shown on the cricket match summary/scorecard).

Intuition
To solve this problem:
● To check whether the batsman was out caught, we test the “dismissal_kind” column with
the IF() function. The desired output is built with the CONCAT() function, which
concatenates strings.

Solution
Formula used:
=IF(M12="caught",CONCAT("c ",O12," b ",G12),"Not Applicable")

For Google Sheets, use this formula:


=IF(M12="caught",CONCATENATE("c ",O12," b ",G12),"Not Applicable")

Demo:

● Click on G23 and enter the formula
=IF(M12="caught",CONCATENATE("c ",O12," b ",G12),"Not Applicable")

Question 7
If the player is bowled, then display the output as “b bowler_name”.

Intuition
We will use IF and CONCAT

Solution
Logic is similar to the previous question.

Formula used:
=IF(M9="bowled",CONCAT("b ",G9),"Not Applicable")

Question 8
If the player is dismissed by run-out, then display the output as “run out (fielders_name)”

Intuition
We will use IF and CONCAT

Solution
Logic is similar to the previous question.

Formula used:
=IF(M8="run out",CONCAT("run out (",O8,")"),"Not Applicable")

If using Google Sheets, use the CONCATENATE function:


=IF(M14="run out",CONCATENATE("run out (",O14,")"),"Not Applicable")

Question 9

Write a formula/function that combines all the three outputs of the previous three questions in
one i.e. depending on how the player is out the output should be displayed in an appropriate
format.

Intuition
To solve this problem, we will use Nested If.

Nested If is used if you need to test for more than one condition, then take one of several
actions, depending on the result of the tests.

Solution
Solution using Nested If :

Formula used:
=IF(M10="caught",CONCAT("c ",O10," b ",G10),IF(M10="bowled",CONCAT("b ",G10),IF(M10="run out",CONCAT("run out (",O10,")"),"Not Applicable")))

Note: If using Google Sheet, use the CONCATENATE function.

Demo:
● Click on F46, enter the formula -
=IF(M10="caught",CONCATENATE("c ",O10," b ",G10),IF(M10="bowled",CONCATENATE("b ",G10),IF(M10="run out",CONCATENATE("run out (",O10,")"),"Not Applicable")))

[Figure: Nested If flowchart]

Is there any issue with the Nested If? Is it possible to write the Nested If logic in a single
function?

● Nested If is error-prone.
● It is difficult to maintain or understand at a later point in time.
● As an alternative, there is a single function called IFS.
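
The branching used in Questions 6 to 9 reads more easily when written as a flat chain of checks, which is the idea behind IFS. The Python sketch below is only an illustration of that logic. The function name and the sample names are hypothetical, and unlike Excel's `=` comparison, the string comparison here is case-sensitive.

```python
def dismissal_text(kind: str, bowler: str, fielder: str) -> str:
    """Return the scorecard text for one dismissal, checking conditions in order."""
    if kind == "caught":
        return f"c {fielder} b {bowler}"
    if kind == "bowled":
        return f"b {bowler}"
    if kind == "run out":
        return f"run out ({fielder})"
    # Fallback branch, like the final "Not Applicable" in the Excel formulas.
    return "Not Applicable"


# Hypothetical sample values, not taken from the dataset.
print(dismissal_text("caught", "Bowler A", "Fielder B"))
print(dismissal_text("lbw", "Bowler A", ""))
```

Each condition is tested once and the first one that holds decides the output, so adding a new dismissal type means adding one more check rather than another level of nesting.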

Alternate Solution

Solution using IFS :

The IFS function checks whether one or more conditions are met, and returns a value that
corresponds to the first TRUE condition.

Formula used:
=IFS(M10="caught", CONCAT("c ",O10," b ",G10), M10="bowled", CONCAT("b ",G10), M10="run out", CONCAT("run out (",O10,")"), TRUE, "Not Applicable")

To summarize, our findings on why DC lost the IPL final against MI were:

● Both teams gave away an equal number of extras, so nothing can be concluded from
extras.
● DC lost more wickets than MI.
● The run rate in the first 5 overs was 11.6 for MI and 7 for DC, so MI scored a significant
share of its runs in the first 5 overs.
● In the first 5 overs DC lost 3 wickets, including a few top-order batsmen, while MI lost
only 1 wicket in the same period.

Tables in Excel

● To make managing and analyzing a group of related data easier, you can turn a range of
cells into an Excel table (previously known as an Excel list).
● Import the data into the Excel file.
● Format it as a table and give the table a suitable name so that it is easy to access later.
○ Note: table names cannot contain spaces
● The table name can then be used in formulas to reference its cell ranges.
● Reasons to use an Excel table
● Excel Table Benefits

Calculations in Excel

● Order of operations
● Formulas calculate values in a specific order.
● A formula in Excel always begins with an equal sign (=).
● Excel interprets the characters that follow the equal sign as a formula.
● Following the equal sign are the elements to be calculated (the operands), such as
constants or cell references.
● These are separated by calculation operators.
● Excel calculates the formula from left to right, according to a specific order for each
operator in the formula.
Note: If a formula contains multiple operators with the same priority (e.g. multiplication and
division, or addition and subtraction), Excel will evaluate the operators from left to right.

Date and Logical Function Demonstration

Question 10

Count the number of matches played in May-2008.

Intuition

To solve this problem:


We will use IF, AND, Date & Time functions (Year, Month)

Solution
● We will create a new column that checks the month and year of each match, using the
AND function to combine the two checks.
○ Formula used:
=IF(AND((YEAR([@date])=2008),(MONTH([@date])=5)),1,0)
● We can then use either COUNTIF() or SUM() to count the matching rows.
○ Formula used:
=COUNTIF([Condition],1)

Demo:
● In the cell next to umpire 2 in headers, type ‘condition’ and press enter to create a new
column.
● In the first cell of the new column, enter the formula
=IF(AND((YEAR([@date])=2008),(MONTH([@date])=5)),1,0)
● In any cell outside the table, enter =SUM(Table_24[Condition]) using either cell
selection or table reference.
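
The helper-column approach (flag each row with 1 or 0, then add up the flags) can be sketched in Python as follows. This is only an illustration; the dates are made-up sample values, not rows from the Matches table.

```python
from datetime import date

# Hypothetical match dates standing in for the [date] column.
match_dates = [date(2008, 4, 18), date(2008, 5, 1), date(2008, 5, 25), date(2009, 5, 1)]

# Mirrors =IF(AND(YEAR([@date])=2008, MONTH([@date])=5), 1, 0) for each row.
condition = [1 if d.year == 2008 and d.month == 5 else 0 for d in match_dates]

# Mirrors =SUM(table[Condition]) (or COUNTIF(..., 1)).
matches_in_may_2008 = sum(condition)
print(condition, matches_in_may_2008)
```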

Text Function Demonstration

Question 11

The complete ODI batting career data of Sachin Tendulkar is given in [dataset link]. Count the
number of 100s and 50s he has scored.

Solution :

1. Cleaning the data:
○ Getting rid of the asterisk.
Formula used:
=IF(RIGHT([@Runs],1)="*",LEFT([@Runs],LEN([@Runs])-1),[@Runs])
○ Converting it from text to a number using the NUMBERVALUE function.
Formula used:
=NUMBERVALUE(IF(RIGHT([@Runs],1)="*",LEFT([@Runs],LEN([@Runs])-1),[@Runs]))
○ If we get an error, resolve it using the IFERROR function.
Formula used:
=IFERROR(NUMBERVALUE(IF(RIGHT([@Runs],1)="*",LEFT([@Runs],LEN([@Runs])-1),[@Runs])),0)
2. Get number of 100’s:
=SUM(IF($J$11:$J$473>=100,1,0))
3. Get number of 50’s:
=SUM(IF(($J$11:$J$473>=50)*($J$11:$J$473<100),1,0))

Create a new column ‘cleaned runs’ and perform the above steps.
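
The same three cleaning steps (strip a trailing asterisk, convert to a number, fall back to 0 on error) followed by the two counts can be sketched in Python. This is only an illustration: the function name and the sample scores are hypothetical, not taken from the dataset.

```python
def clean_runs(raw: str) -> int:
    """Strip a trailing '*' (not out), convert to int, and return 0 if that fails."""
    text = raw[:-1] if raw.endswith("*") else raw  # RIGHT/LEFT/LEN step
    try:
        return int(text)                            # NUMBERVALUE step
    except ValueError:
        return 0                                    # IFERROR(..., 0) step


# Hypothetical entries for the Runs column.
scores = ["100*", "49", "DNB", "51", "100", "0", "99*"]
cleaned = [clean_runs(s) for s in scores]

hundreds = sum(1 for r in cleaned if r >= 100)
fifties = sum(1 for r in cleaned if 50 <= r < 100)
print(cleaned, hundreds, fifties)
```

The 50s count needs both bounds, which is what the multiplication of the two conditions does in the Excel array formula.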

______________________________________________________________________________

Excel Lecture 2 : Dynamic Array Formulas and Pivot Tables

Agenda
● Problem Statement
○ Lookup and Reference functions
■ Lookup - Vlookup & Hlookup
■ Index & Match
■ Dynamic Array Formulas: Filter, Unique & Xlookup
● Text functions
○ TextJoin
● Logical functions
○ IFError
● Math function
○ Sum, Sequence

Problem Statement
You are hired at ESPN Cricinfo as a Data Analyst. You have to implement a search functionality
where given a Match ID :

● You should be able to find the name of the winning team, the venue and city.
● Data can be retrieved based on whichever column is selected.

Lookup Functions

● The LOOKUP function is categorized under Excel Lookup and Reference functions.
● The function performs an approximate match lookup in either a one-row or one-column
range and returns the corresponding value from another one-row or one-column range.
● The more advanced versions of the LOOKUP function are HLOOKUP and VLOOKUP.

The issue with Lookup and a better alternative.

● LOOKUP cannot be used to search multiple rows and columns (like a table).
● Use VLOOKUP or HLOOKUP to search one row or column, or to search multiple rows and
columns (like a table). They are improved versions of LOOKUP.

Vlookup

In its simplest form, the VLOOKUP function says:

=VLOOKUP(What you want to look up, where you want to look for it, the column number in the
range containing the value to return, return an Approximate or Exact match – indicated as
1/TRUE, or 0/FALSE).

Question 1

Given the ID of the Match, return the name of the team which won the match.

Intuition
To solve this problem :
● We will use Vlookup
● It is a function that makes Excel search for a certain value in a column, in order to return
a value from a different column in the same row.
● In its simplest form, the VLOOKUP function says:
=VLOOKUP(What you want to look up, where you want to look for it, the column number in the
range containing the value to return, return an Approximate or Exact match – indicated as
1/TRUE, or 0/FALSE).

Solution
=VLOOKUP(F10,IPLMatchData23,11,FALSE)

● We want the information from 11th column hence we have 11


● Note the column numbering starts from 1

Demo:
Find the table name and enter the formula - =VLOOKUP(F10,table_name,11,FALSE) in
G10.

● What if we enter an incorrect Match ID? How do we handle it?
● Wrap the VLOOKUP function inside the IFERROR function.

=IFERROR(VLOOKUP(F10,IPLMatchData23,11,FALSE),"Incorrect Match ID")

● Limitations:
○ VLOOKUP always returns the first match
○ It can only look up values in the leftmost column of the range
○ If a column is added or deleted, we need to update the column index in the formula

Question 2
Given the ID of the Match, return the name of the city and the venue.

Intuition
To solve this problem: We will use Vlookup

Solution
● Formula to get city :
=VLOOKUP(F10,IPLMatchData23434[[#Headers],[#Data]],2,FALSE)
● Or, =VLOOKUP(F10,IPLMatchData23434,2,FALSE)
● Formula to get venue:
=VLOOKUP(F10,IPLMatchData23434[[#Headers],[#Data]],5,FALSE)

Demo:
● Find the table name
● Enter formula =VLOOKUP(F10,table_name,2,FALSE) in G10
● Enter formula =VLOOKUP(F10,table_name,5,FALSE) in H10

Approximate Match in Vlookup

We’ll look at a dummy example on how the Vlookup approximate match works.

Question

Determine the unit prices for the given quantity using the reference quantity table.
In an approximate match, VLOOKUP uses the row with the largest value in the first column that is
less than or equal to the lookup value.

Note:
● If range_lookup is TRUE or left out, the first column needs to be sorted alphabetically or
numerically.
● If the first column isn’t sorted, the return value might be something you don’t expect.
● Either sort the first column, or use FALSE for an exact match.
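
Approximate match on a sorted first column behaves like a "largest key not greater than the lookup value" search. The Python sketch below illustrates that rule with a binary search. The price table and the function name are hypothetical and are not the quantity table used in the lecture.

```python
from bisect import bisect_right

# Hypothetical price breaks: (minimum quantity, unit price), sorted by quantity.
price_table = [(1, 20.0), (10, 18.0), (50, 16.0), (100, 15.0)]


def approx_lookup(quantity, table):
    """Return the price for the largest key <= quantity, or None if there is none."""
    keys = [row[0] for row in table]
    pos = bisect_right(keys, quantity)  # number of keys that are <= quantity
    if pos == 0:
        return None  # Excel returns #N/A when the value is below the smallest key
    return table[pos - 1][1]


print(approx_lookup(25, price_table))   # falls in the 10-49 band
print(approx_lookup(50, price_table))   # exact hit on a key
```

The search relies on the keys being sorted, which is why an unsorted first column can give unexpected results.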

Question 3

For any Match ID, return the value from whichever column is selected.

Intuition

We need the position of an item in a range instead of the item itself, therefore we use Match
rather than Lookup.

Solution
● Match function formula used:
=MATCH(G10,IPLMatchData2345[[#Headers],[id]:[umpire2]],0)
● Or, =MATCH(G10,IPLMatchData2345[#Headers],0)
● Use this formula inside the Vlookup formula :
=VLOOKUP(F10,IPLMatchData2345,MATCH(G10,IPLMatchData2345[#Headers],0),FALSE)
● Add Data Validation so that an incorrect column name cannot be entered

Demo:
● Go to H11 and enter the formula =MATCH(G10,table_name[#Headers],0) to get
column number
● In H10, enter the formula :
=VLOOKUP(F10,table_name,MATCH(G10,table_name[#Headers],0),FALSE)
Adding data validation :

● Click on G10, open the ‘Data’ menu, click Data Validation.


● In Allow, choose List, in Source choose all the cells in the header/1st row.
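
The combination of MATCH (find the column position from the header name) and VLOOKUP (find the row by Match ID) can be sketched in Python. This is only an illustration: the headers are a hypothetical subset, the rows are made-up values, and the check on the column name stands in for Data Validation.

```python
# Hypothetical subset of the Matches table.
headers = ["id", "city", "date", "venue", "winner"]
rows = [
    [101, "City A", "2008-04-18", "Venue A", "Team X"],
    [102, "City B", "2008-04-19", "Venue B", "Team Y"],
]


def lookup(match_id, column_name):
    """Return the value in column_name for the row whose id equals match_id."""
    if column_name not in headers:           # plays the role of Data Validation
        raise ValueError(f"unknown column: {column_name}")
    col = headers.index(column_name)          # MATCH on the header row (0-based here)
    for row in rows:
        if row[0] == match_id:                # exact match on the leftmost column
            return row[col]
    return "Incorrect Match ID"               # IFERROR-style fallback


print(lookup(102, "venue"))
print(lookup(999, "city"))
```

Because the column position is computed from the header name, adding or deleting a column does not require editing the lookup.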

Question 4

Find the first Match Id for which Sachin Tendulkar was the Player of the match.
Intuition

● This is a case of reverse lookup (left lookup), as the id column lies to the left of the
“player_of_match” column.
● We will use the Index and Match functions.
● Match returns the position of the matched value, and Index returns the value at that
position.

Solution

Formula used:
=INDEX(IPLMatchData2345636[id],MATCH(F30,IPLMatchData[player_of_match],0),1)

● 0 is for exact match
● 1 is for column number 1
● Using Match we get the row position where Sachin was player of the match, and using
Index we get the corresponding match id

Index function Demonstration :

● In cell F13, enter =INDEX(D12:D21,2)


● In cell J13, enter =INDEX(G11:H22,3,2)
● In H31, enter =MATCH(F30,Table_name[player_of_match],0)
● In H30 enter =INDEX(Table_name[id],H31,1)
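
The two-step INDEX and MATCH pattern (find the position in one column, then read the same position from another column) can be sketched in Python. This is only an illustration with made-up ids and player names.

```python
# Hypothetical parallel columns from the Matches table.
ids = [101, 102, 103, 104]
player_of_match = ["Player A", "Player B", "Player A", "Player C"]


def first_match_id(player):
    """Return the id of the first row where the player was player of the match."""
    try:
        position = player_of_match.index(player)  # MATCH(..., 0): first exact match
    except ValueError:
        return None                               # Excel would return #N/A
    return ids[position]                          # INDEX: value at that position


print(first_match_id("Player A"))
```

The direction of the lookup does not matter here, because the position found in one column is simply reused in the other.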

When to use what?

To decide if you can use a VLOOKUP, or if you need to use INDEX and MATCH, you need to look
at two things:

● the input (the information you’re using to do the lookup), and


● the output (the information you’re trying to get).
When you’re trying to lookup a value based on a single input, you may be able to use a
VLOOKUP function.

If you’re explicitly asked to find the location of a single input, or to look something up using two
or more pieces of information, you’ll need to use the MATCH and/or INDEX functions instead.

MATCH finds the position of an item in a range. INDEX retrieves the value at a given location in a
range.

HLOOKUP Demonstration
Question 5

Considering the data given, what is the Number of Centuries scored by V Sehwag?

Intuition
We will use HLOOKUP.
HLOOKUP searches for a value in the top row of a table or an array of values, and then returns
a value in the same column from a row you specify in the table or array.

Solution

Formula used:
=HLOOKUP(L9,Table12[[#All],[SR Tendulkar]:[V Kohli]],3,FALSE)

Demo:
● Go to cell L10 and enter =HLOOKUP(L9, Table_name[#All], 3,0)

Array Function Demonstration

Question 6

Calculate the total bill amount of products.

Intuition
We compare the calculation with and without an array function.
An array function can perform multiple calculations on one or more items in an array.
Array functions can return either multiple results, or a single result.

Solution

Formula used without array function:


=[@[Price per piece]]*[@Quantity]

Demo:
● Enter =[@[Price per piece]]*[@Quantity] in G12
● Delete the value in G19
● Enter =SUM(G12:G18) in G20

Formula used with array function:


=SUM(E20:E26*F20:F26)

Demo:
● Enter =SUM(E20:E26*F20:F26) in G33
Question 7

Consider the data for a match with id 335982. How many overs were bowled in the first innings
and second innings?

Intuition

Legacy formulas return their output in one cell, but dynamic array formulas can return their
output in a range of cells.

They return arrays of variable size, hence the name dynamic array. Examples: FILTER,
SEQUENCE, UNIQUE, XLOOKUP

We use Filter and Max.

Solution

● To get the result for the 1st innings:
○ Filter out rows of 1st innings data:
=FILTER(Match33598323[over],Match33598323[inning]=1)
○ Take Max:
=MAX(E242#)
● Note: The # symbol refers to the complete output of a spilled dynamic array formula
(DAF), i.e. all the rows and columns spilled by the formula in E242, which is why we use
MAX(E242#).
○ Or we can combine them as:
=MAX(FILTER(Match33598323[over],Match33598323[inning]=1))
● To get the result for the 2nd innings:
=MAX(FILTER(Match33598323[over],Match33598323[inning]=2))
● Because over numbering starts from 0, add 1 to the MAX result to get the number of
overs.

Demo:
● Go to cell E242 and enter =FILTER(Table_name[over],Table_name[inning]=1)
● Go to cell K242 and enter =MAX(E242#)
● Go to Q242 and enter
=MAX(FILTER(Table_name[over],Table_name[inning]=2))

Question 8

Consider the data for a match with id 335982. How many overs were bowled in the first innings
and second innings?
Intuition

We will be using FILTER, MAX, SEQUENCE, TEXTJOIN, SUM

Solution

● Filter out rows for the 1st innings, as we did previously.
=FILTER(MatchData15[over],MatchData15[inning]=1)
○ Syntax: =FILTER(array,include,[if_empty])
● Use MAX to get the highest over number.
=MAX(E242#)
● Generate the sequence of overs from 0 to max.
=SEQUENCE(L242+1,,0,1)
○ Syntax: =SEQUENCE(rows,[columns],[start],[step])
● Get the runs scored on each ball of the 1st over.
=FILTER(MatchData15[total_runs],(MatchData15[over]=Z242)*(MatchData15[inning]=1))
● Use TEXTJOIN to combine them into a single comma-separated line, and do this for all
the overs.
=TEXTJOIN(",",TRUE,FILTER(MatchData15[total_runs],(MatchData15[over]=AH242)*(MatchData15[inning]=1)))
○ Syntax: =TEXTJOIN(delimiter, ignore_empty, text1, [text2], …)
● Replace TEXTJOIN in the above formula with SUM to total the runs for each over.
=SUM(FILTER(MatchData15[total_runs],(MatchData15[over]=AP242)*(MatchData15[inning]=1)))
● Get the cumulative score over by over.
=SUM($AR$242:AR242)
Note: Repeat above steps to get the result for 2nd innings.
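
The chain of steps above (filter by innings, take the highest over, generate the over sequence, list runs per ball, total per over, running total) can be sketched in Python. This is only an illustration: the function name and the sample rows are hypothetical, and the comments name the Excel function each line stands in for.

```python
from itertools import accumulate

# Hypothetical ball-by-ball rows: (inning, over, total_runs); overs start from 0.
balls = [
    (1, 0, 1), (1, 0, 4), (1, 0, 0),
    (1, 1, 6), (1, 1, 1),
    (2, 0, 2), (2, 0, 2),
]


def over_summary(inning):
    """Return per-over ball strings, per-over totals and the running score."""
    overs = [o for i, o, _ in balls if i == inning]             # FILTER on inning
    sequence = range(max(overs) + 1)                            # MAX, then SEQUENCE from 0
    per_ball = [
        [r for i, o, r in balls if i == inning and o == over]   # FILTER on over and inning
        for over in sequence
    ]
    joined = [",".join(str(r) for r in runs) for runs in per_ball]  # TEXTJOIN
    per_over = [sum(runs) for runs in per_ball]                 # SUM(FILTER(...))
    running = list(accumulate(per_over))                        # SUM($AR$242:AR242)
    return joined, per_over, running


print(over_summary(1))
```

Calling `over_summary(2)` repeats the same steps for the 2nd innings.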

Dynamic Array Formulas (DAF) vs Legacy Array Features

| Dynamic Array Formulas (DAF) | Legacy Ctrl+Shift+Enter (CSE) Array Formulas |
| --- | --- |
| A dynamic array formula is entered in one cell and completed with a regular Enter keystroke. | To complete an old-fashioned array formula, you need to press Ctrl + Shift + Enter. |
| New array formulas spill to many cells automatically. | CSE formulas must be copied to a range of cells to return multiple results. |
| The output of dynamic array formulas automatically resizes as the data in the source range changes. | CSE formulas truncate the output if the return area is too small and return errors in extra cells if the return area is too large. |
| A dynamic array formula can be easily edited in a single cell. | To modify a CSE formula, you need to select and edit the whole range. |
| With dynamic arrays, row insertion or deletion is not a problem. | It is not possible to delete and insert rows in a CSE formula range - you need to delete all existing formulas first. |

For more, refer to Excel Dynamic Arrays, Functions & Formulas.


XLOOKUP

Question 9

Determine the first match ID and city of match for which Sachin Tendulkar was the player of the
match.

Intuition

We have seen this question previously (Question 4). This time we will use a dynamic array
formula (the XLOOKUP function), which works in any direction and returns exact matches by default.

Solution

Syntax: =XLOOKUP(lookup_value, lookup_array, return_array, [if_not_found], [match_mode], [search_mode])

By default, XLOOKUP performs an exact match.

Formula used:
=XLOOKUP(E10,IPLMatchDataTable[player_of_match],IPLMatchDataTable[[id]:[city]],,0,1)
Problems of Index Match (and better alternative)

______________________________________________________________________________

Excel Lecture 3 - Charts, Statistical Functions & Macros

Agenda

● Problem Statement I
○ Functions
■ Unique, Filter
○ Pivot Tables
■ Helper column, Calculated Fields, Pivot Filter, Slicers
○ Duplicate Row Deletion
○ Custom formatting of cell
○ Conditional formatting
■ Top/Bottom Rules, Highlight rules, Data Bars
○ Charts
■ Bar, Pie, Stacked Bar
○ Conclusion
● Match Summary Analysis Demonstration
● Problem Statement II
○ Statistical Functions
■ Mean, Median, Mode, SD, Variance, Quartile, Min, Max
■ Box Plot, Histogram
Problem Statement 1

You have applied for a data analyst position in an Indian sports-tech startup company. In the
interview, you are given the IPL dataset and have been asked to analyze Mumbai Indians’
performance in IPL (2008-2020).

Question 1

What is the win % of MI in comparison to the other teams in the IPL?

Intuition
To solve the problem, we will use :
● Excel functions
● Charts

Solution

● Create two new columns for each team to count the total number of wins and the total
number of matches played (demonstrated for 2 teams only; for the rest, use the worked
worksheet)
○ =IF([@winner]="Mumbai Indians",1,0)
○ =IF(([@team1]="Mumbai Indians")+([@team2]="Mumbai Indians"),1,0)
● Finally we calculate :
= Total wins / Total matches played
You can create a bar chart by selecting the win % table and clicking Insert > Bar Chart > 2D
column.
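
The two helper columns and the final division can be sketched in Python. This is only an illustration: the team names and results are made up, and the function name is hypothetical.

```python
# Hypothetical rows: (team1, team2, winner).
matches = [
    ("A", "B", "A"),
    ("A", "C", "C"),
    ("B", "C", "B"),
    ("A", "B", "A"),
]


def win_percentage(team):
    """Return wins divided by matches played for one team (0.0 if it never played)."""
    played = sum(1 for t1, t2, _ in matches if team in (t1, t2))  # team1 or team2 column
    won = sum(1 for _, _, winner in matches if winner == team)    # winner column
    return won / played if played else 0.0


for team in ("A", "B", "C"):
    print(team, round(win_percentage(team), 2))
```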

Comparisons:

We can see that MI has played the most matches and has the highest winning percentage among
all the teams.
Question 2

Number of unique venues.

Intuition
To solve the problem, we will use:
● Excel Remove duplicates button under data tab
● We will use Excel functions
● Charts

Solution

1. Method 1
● Removing duplicates using the remove duplicate button.
● Copy the column content to a new column and apply Remove Duplicates

2. Method 2
● Removing duplicates using the UNIQUE DAF function.
● Formula used:
=UNIQUE(Ques2[venue],FALSE,FALSE)

Analysis: MI has played at a variety of venues, so the performance analysis is not biased by
venue.

Question 3

Top 5 winning venues for MI.

Intuition
To solve the problem, we will use :
● Pivot Table, Conditional Formatting (top/bottom rules)
● Filter and large function
● Bar Chart

Solution
1. Select the entire table, click Insert > Pivot Table.
2. Select a location in the existing sheet, click OK.
3. Drag venue to rows, id to values, change value setting of id to count, drop winner on
filter, select Mumbai Indians from dropdown.
Adding conditional formatting and inserting chart :

4. Select the table created above.
5. Go to Home > Conditional Formatting > Top/Bottom Rules > Top 10 Items. Choose top 5 and
green color.
6. Select the created table again. Insert a chart from Insert > Bar Chart > 2D chart.

Using Filter and Large function :

7. In a cell, enter =FILTER(row label cells, count of id cells>=LARGE(count of id cells, 5))
8. In another cell, enter =SORT(FILTER(row label cells, count of id
cells>=LARGE(count of id cells, 5)),1,-1)

We can select the data we got using Filter and Large function and can insert a Bar chart.
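
The FILTER with LARGE idea (keep every venue whose count is at least the 5th largest count) can be sketched in Python. This is only an illustration with made-up venue names and counts. As a choice made for this sketch, the result is sorted by count, and it assumes there are at least n venues.

```python
# Hypothetical pivot output: wins per venue.
wins_by_venue = {"V1": 10, "V2": 7, "V3": 7, "V4": 5, "V5": 3, "V6": 3, "V7": 1}


def top_n(counts, n):
    """Keep venues whose count is >= the n-th largest count, sorted by count."""
    threshold = sorted(counts.values(), reverse=True)[n - 1]          # LARGE(counts, n)
    kept = [(venue, c) for venue, c in counts.items() if c >= threshold]  # FILTER
    return sorted(kept, key=lambda item: item[1], reverse=True)       # sort descending


top_five = top_n(wins_by_venue, 5)
print(top_five)
```

Because the comparison is "greater than or equal to", ties at the threshold are all kept, so the result can contain more than 5 rows.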
What are Pivot Tables?

A Pivot Table is a powerful tool to calculate, summarize, and analyze data that lets you see
comparisons, patterns, and trends in your data.

In its most basic form, a Pivot Table takes data and summarizes it so you can make sense of it.

Choosing Fields

Fields are of 4 types:

1. Value Fields
2. Row Fields
3. Column Fields
4. Filter Fields

Our Table fields are available in the fields list, and we can drop them to the appropriate
fields(i.e. row , column, filter, value) as per our requirement.
Question 4

Highlight winning margins that are greater than 50 and also find out top 10 winning margins
greater than 50.

Intuition
To solve the problem, we will use :
● Conditional formatting (Highlight rules)
● Pivot Table
● Bar Chart
● Sort and filter function

Solution

Pre-processing
Where the match was an eliminator, we change NA to 0 in the result margin column.

1. Method 1
● Show how to highlight cells using highlight rules of conditional formatting.

2. Method 2
● Using Pivot table and conditional formatting on pivot table.
3. Method 3
● Show how to get only top 10 results using the FILTER function and display the
result using a bar chart explaining how to select appropriate data to plot the chart
as per requirement.
○ =SORT(FILTER(S14:T133,T14:T133>50),2,-1)

● Explain formatting of chart in excel (look and feel)


○ Using Right click > Select Data…
○ Using chart design in menu bar
Analysis: We find that there are 12 matches where MI won with a margin greater than 50
runs. For our analysis, we haven’t considered the wickets margin.

Question 5

Top 10 man of the match winners

Intuition
To solve the problem, we will use :
● Pivot table
● Pivot chart

Solution
● Create pivot table
● Use the value filter available in the Pivot table to filter top 10
● Insert a pivot chart
Analysis : RG Sharma has won the man of the match title the most times, followed by KA
Pollard.

Question 6

Percentage of toss win/loss

Intuition
To solve the problem, we will use :
● Pivot table
● Calculated fields
● Custom formatting, and
● Pie chart

Solution

● We make 2 helper columns: one to check whether MI played the match, and the other to
check whether MI won the toss. Summing the second column gives the tosses won by MI.
○ =IF(([@team1]="Mumbai Indians")+([@team2]="Mumbai Indians"),1,0)
○ =IF([@[toss_winner]]="Mumbai Indians",1,0)
● Make a Pivot table
● Add Calculated fields to get % of toss won and lost for MI
● Formula for % toss won = ‘toss won by MI’ / ‘Total matches played by MI’
● Formula for % toss lost = ‘toss lost by MI’ / ‘Total matches played by MI’ (to get toss lost
by MI, we subtract toss won by MI from total matches played by MI)

______________________________________________________________________________

Excel Lecture 4 - Dashboards & Excel Capstone Intro

Agenda

In this lecture, we will connect the dots and build an end-to-end dashboard that summarizes
a match, using the same data set.

Note:
● IPL_Matches_2008_2020 - Table is referred to as the Matches table in the script.
● IPL_Ball_by_Ball_2008_2020 - Table is referred to as the Ball by Ball table in the script.

What is a Dashboard?

● A dashboard is a visual representation of key metrics that allows you to quickly view and
analyze your data in one place.
● Dashboards provide not only consolidated data views, but also a self-service business
intelligence opportunity, where users are able to filter the data to display just what’s
important to them.

Building an End to End Dashboard


Steps:
1. Importing the Dataset
2. Formatting the Data
3. Adding Pivot Tables
4. Adding Slicers
5. Combining all of them together to create a Dashboard.

Importing the Dataset

● Load both the IPL datasets into the same workbook.


○ Worksheet: link
Adding new columns to Matches dataset

1. Add a new column to get the Season for each match.


○ Use formula: =CONCAT("IPL-",YEAR($E2)) [Drag it to all the cells]

2. Add a new column to number the matches within each season; the numbering restarts
with every new season.
○ Use formula: =COUNTIF($C$2:C2,C2) [Drag it to all the cells]

3. Create a column to convert the match number to text.


○ Use formula: =TEXT($D2,"0#") [Drag it to all the cells]

4. Create a column to get the match name.


○ Use formula: =CONCAT($E2,". ",$J2," Vs ",$K2) [Drag it to all the cells]
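
The four derived columns can be reproduced in a short Python sketch, which may help when checking the dragged formulas. The rows are hypothetical, the team names are placeholders, and the sketch assumes the "0#" format is meant to give a two-digit, zero-padded match number.

```python
from datetime import date

# Hypothetical rows: (match date, team 1, team 2), already ordered by date.
matches = [
    (date(2008, 4, 18), "Team A", "Team B"),
    (date(2008, 4, 19), "Team C", "Team D"),
    (date(2009, 4, 18), "Team A", "Team C"),
]


def add_columns(matches):
    seen = {}  # season -> number of matches counted so far (the running COUNTIF)
    out = []
    for played_on, team1, team2 in matches:
        season = f"IPL-{played_on.year}"             # like =CONCAT("IPL-",YEAR(date))
        seen[season] = seen.get(season, 0) + 1       # like =COUNTIF($C$2:C2,C2)
        number_text = f"{seen[season]:02d}"          # match number as two-digit text
        name = f"{number_text}. {team1} Vs {team2}"  # like =CONCAT(text,". ",team1," Vs ",team2)
        out.append((season, seen[season], number_text, name))
    return out


for row in add_columns(matches):
    print(row)
```

The running count works because the range $C$2:C2 grows by one row each time the formula is dragged down, so each row counts only the matches of its season seen so far.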

Adding new columns to the Ball by Ball data

● Add a column to get the season of the match using match id.
○ Use formula: =VLOOKUP($B2,'IPL Matches 2008-2020'!$A$1:$U$817,3,FALSE)

● Add a column to get the match name using the match id.
○ Use formula: =VLOOKUP($C2,'IPL Matches 2008-2020'!$A$1:$U$817,21,FALSE)
Creating a Slicer sheet

● Add a dummy pivot table to add slicers. [Select ball by ball table range]
● Slicers can only be added when we have a pivot table.
● Add Season, Match name and Innings slicer.
● Add another pivot table to get the match id and connect each slicer to multiple pivot
tables. [Select ball by ball table range]
● This will ensure that as we change values in a slicer the values in the pivot table are
updated.

Creating a Header sheet

1. Here we read the match ID that is present in the slicer sheet (the 2nd
pivot table, containing only the match id field).
2. Get the following fields using VLOOKUP from the Matches table
○ Team 1: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,10,FALSE),"")
○ Team 2: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,11,FALSE),"")
○ Winner: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,14,FALSE),"")
○ Result: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,15,FALSE),"")
○ Result margin: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,16,FALSE),"")
○ Player of the match: =IFERROR(VLOOKUP($D$3,'IPL Matches 2008-2020'!$A$1:$U$817,7,FALSE),"")
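
The IFERROR(VLOOKUP(...),"") pattern is an exact-match lookup with an empty-string fallback. A minimal Python sketch of that behaviour, assuming a hypothetical Matches table keyed by match id (the ids, teams and field names below are placeholders):

```python
# Hypothetical Matches table keyed by match id.
matches_by_id = {
    101: {"team1": "Team A", "team2": "Team B", "winner": "Team A"},
    102: {"team1": "Team C", "team2": "Team D", "winner": "Team D"},
}


def lookup(match_id, field, table=matches_by_id):
    """Exact-match lookup that returns "" when the id or field is missing,
    in the spirit of =IFERROR(VLOOKUP(id, table, column, FALSE), "")."""
    return table.get(match_id, {}).get(field, "")


print(lookup(101, "winner"))
print(repr(lookup(999, "winner")))
```

The fallback matters on the Header sheet because the selected match id can be empty or ambiguous while the slicers are being changed, and a blank cell reads better on a dashboard than an error value.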

Creating a Dashboard

● Copy and paste the slicers from the Slicer sheet to the DB sheet, then resize and arrange
them until the layout looks good.
● Modify the DB sheet to change the look and feel of that sheet and add a dashboard title.

● Bring in all the fields from the Header sheet to the DB sheet.


● Demonstrate how fields are updated based on the selections done in the slicers.
Excel worksheet/workbook Protection

Steps:
1. The screenshot shows the dashboard we have made. We want to protect the
dashboard so that users can interact with Season, Match name and Innings only, and
cannot view formulas or edit the dashboard.
2. Right click on each of these groups (Season, Match name, Innings) with which the user can
interact, and go to Size and Properties.
○ First go to Properties -> uncheck the Locked option and choose the option: don't
move or size with cells.
○ Then go to Position and Layout under the same tab and check disable resizing,
so that the user cannot resize the slicer.
○ Do the same for every group the user can interact with, i.e. Season, Match name
and Innings.
3. Then go to the Review tab and select Protect Sheet. Uncheck Select locked cells.
You can also set a password so that the user can't unprotect the sheet
without it.

Now users can interact with Season, Match name and Innings only, and can view the results
of the others, such as batting data, bowling data, etc.
4. Since we have used other sheets to develop the dashboard, we want to hide and protect
them as well so that users can’t inspect them.
○ First select all the sheets we want to hide, right click and choose Hide.
○ After that, go to the Protect Workbook option under the Review tab and protect
the workbook. Here too, you can set a password.

Now a user can only interact with the interactive groups (Season, Match name, Innings) and
view the resulting plots/dashboard.
Coffee Chain Capstone

1. You’ll be seeing questions related to the case study under the homework section from
the next lecture.
2. Each question will have a doc link wherein you’ll find a dataset download link and its
description.
3. Case Study Doc: link

Add the following additional columns to the Ball by Ball data:

1. Balls_helper: to keep a count of the balls while developing the batting pivot table
○ Add 1 to every row of this column
2. Is_4: to check if the batsman has scored a 4
○ Formula Used: =IF($J2=4,1,0)
3. Is_6: to check if the batsman has scored a 6
○ Formula Used: =IF($J2=6,1,0)
4. Is_wide: to check if it is a wide ball
○ Formula Used: =IF($R2="wides",1,0)
5. Is_noball: to check if it is a no ball
○ Formula Used: =IF($R2="noballs",1,0)
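
The five helper columns are simple 0/1 flags that a pivot table can later sum. A minimal Python sketch, assuming hypothetical deliveries in which batsman_runs and extras_type stand for the columns read as $J2 and $R2 in the formulas (those column names are an assumption, not confirmed by the notes):

```python
# Hypothetical deliveries; batsman_runs ~ column J, extras_type ~ column R (assumed names).
balls = [
    {"batsman_runs": 4, "extras_type": "NA"},
    {"batsman_runs": 0, "extras_type": "wides"},
    {"batsman_runs": 6, "extras_type": "NA"},
    {"batsman_runs": 1, "extras_type": "noballs"},
]


def add_flags(ball):
    """Return a copy of the delivery with the five 0/1 helper columns added."""
    flagged = dict(ball)
    flagged["Balls_helper"] = 1                                      # constant 1 on every row
    flagged["Is_4"] = int(ball["batsman_runs"] == 4)                 # =IF($J2=4,1,0)
    flagged["Is_6"] = int(ball["batsman_runs"] == 6)                 # =IF($J2=6,1,0)
    flagged["Is_wide"] = int(ball["extras_type"] == "wides")         # =IF($R2="wides",1,0)
    flagged["Is_noball"] = int(ball["extras_type"] == "noballs")     # =IF($R2="noballs",1,0)
    return flagged


flagged_balls = [add_flags(b) for b in balls]
totals = {key: sum(b[key] for b in flagged_balls)
          for key in ("Balls_helper", "Is_4", "Is_6", "Is_wide", "Is_noball")}
print(totals)
```

Summing a flag column gives the same figure the pivot table shows when the field is added to Values with the Sum aggregation.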

Batting Pivot Table

1. Create a new pivot table on Ball by ball data.


2. Add a calculated field to calculate Strike rate of batsman.
3. Name the Pivot table and column by using a meaningful name.
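
The notes do not spell out the calculated-field formula for strike rate. The conventional definition is runs scored per 100 balls faced, which the sketch below assumes. Returning 0 for a batsman with no balls faced is a choice made for this sketch, not something taken from the notes.

```python
def strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced (conventional batting strike rate)."""
    if balls_faced == 0:
        return 0.0  # assumption: report 0 rather than a division error
    return runs / balls_faced * 100


print(round(strike_rate(45, 30), 2))  # 150.0
```

If Balls_helper is summed as the number of balls faced, wides are included in the count, so the result may differ slightly from official scorecards.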

Connecting the Pivot Table to available Slicers

1. Since we have added new columns, we have to change the range of the earlier pivot tables to
accommodate the new columns.
2. Add the pivot table to the dashboard.

Note:
● To avoid updating the range of every pivot table, we can format the
matches and ball by ball data as tables (this is where Excel tables shine).
● After formatting as tables, even if we add new columns we just need to right click each
pivot table and select Refresh; it automatically includes all the new columns added.
● Once you are done with all the steps till slide 9, you can format both matches and ball by
ball as tables.
● Apply custom formatting so the cells display numbers up to 2 decimal places in percentage
format (right click > Format Cells).
● Remove fields that are not necessary in the pivot table and make a pie chart
visualization.
Analysis: MI has a slightly greater toss win % in comparison to the toss loss %.

Question 7

What is the win % of batting/bowling first for a venue

Intuition
To solve the problem, we will use:
● Pivot table
● Calculated fields
● Formatting
● Stacked Bar Chart

Solution

1. We make 3 helper columns: the 1st to check if MI played the match, the 2nd to check if MI batted
first and the 3rd to check if MI fielded first (use the same logic as we did in question 1)
○ =IF( ([@winner]="Mumbai Indians")*([@[toss_decision]]="bat"),1,0) // When Mumbai won the toss
○ =IF( (([@winner]="Mumbai Indians")*([@[toss_decision]]="bat")*([@[toss_winner]]="Mumbai Indians"))+(([@winner]="Mumbai Indians")*([@[toss_decision]]="field")*([@[toss_winner]]<>"Mumbai Indians")),1,0) // Doesn’t matter who wins the toss
2. Add a Pivot table
3. Add filter to get only venues where MI played using filter area in pivot table
4. Calculated field (make sure to not multiply by 100)
5. Format as percentage

6. Use stacked bar chart to visualize the result
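
The second helper formula combines two cases with multiplication (AND) and addition (OR). The logic can be read more easily in a small Python sketch. The rows are hypothetical, and only the column names winner, toss_winner and toss_decision come from the formula.

```python
TEAM = "Mumbai Indians"


def won_batting_first(row, team=TEAM):
    """Return 1 if `team` won after batting first, regardless of who won the toss."""
    team_won = row["winner"] == team
    # Case 1: the team won the toss and chose to bat.
    chose_to_bat = row["toss_winner"] == team and row["toss_decision"] == "bat"
    # Case 2: the opponent won the toss and chose to field.
    sent_in_to_bat = row["toss_winner"] != team and row["toss_decision"] == "field"
    return int(team_won and (chose_to_bat or sent_in_to_bat))


# Hypothetical rows; "Team B" is a placeholder opponent.
rows = [
    {"winner": TEAM, "toss_winner": TEAM, "toss_decision": "bat"},
    {"winner": TEAM, "toss_winner": "Team B", "toss_decision": "field"},
    {"winner": TEAM, "toss_winner": "Team B", "toss_decision": "bat"},
    {"winner": "Team B", "toss_winner": TEAM, "toss_decision": "bat"},
]
print([won_batting_first(r) for r in rows])
```

The two cases are mutually exclusive, which is why adding them in the Excel formula never produces a value above 1.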

Analysis: MI has a higher batting-first win %. Their batsmen have helped them win more matches
than their bowlers have.
Question 8

Number of matches decided by D/L method(runs/wickets)?

Intuition
To solve the problem, we will use :
● Pivot table
● Formatting

Solution
1. We make a helper column to check if MI played the match
2. Add a Pivot table
3. Filter where MI played match and method was D/L using filter area in pivot table
4. No Chart

Analysis: No match of MI was decided using the D/L method across the IPL seasons.

Question 9

Number of matches won by runs/wickets?

Intuition
To solve the problem, we will use :
● Pivot table
● Bar chart
Solution
● We will select id and apply count aggregation in values, winner is added to filters while
result field is added to the rows in the Pivot table.
● Go to Ribbon → Insert Tab and select 2D Column (Bar chart)

Analysis: MI has won more matches where they were defending the runs (opted to bat first)
compared to chasing the runs or number of matches that ended in a tie.

Question 10

How many of MI's matches in the IPL went to a super over, and how many of them did MI win?

Intuition
To solve the problem, we will use:
● Pivot Table

Solution
● Build a pivot table
● In the filters area of the Pivot table filter where it was an eliminator match, MI played that
match and the winner was MI.
● No Chart
Analysis: MI faced 2 super over matches and won both of them.

Conclusion

After analysis we found that:

● Mumbai Indians have the second highest winning % in IPL history, while Chennai
Super Kings have the highest winning % by a narrow margin.
● Preliminary analysis tells us that the Mumbai Indians team has played at various venues,
thus their performance is not biased by a venue.
● But, doing a deeper analysis, we find that MI has a significantly higher number of wins at
Wankhede Stadium in comparison to the other venues.
● We find that there are 12 matches where MI won with a margin greater than 50 runs.
For our analysis, we haven’t considered the wickets margin.
● RG Sharma has won the man of the match title the most times, followed by KA Pollard.
● MI has a slightly greater toss win % in comparison to the toss loss %.
● MI has a higher batting-first win %. Their batsmen have helped them win more matches
than their bowlers have.
● No match of MI was decided using the D/L method across the IPL seasons.
● MI has won more matches where they were defending the runs (opted to bat first)
compared to chasing the runs or number of matches that ended in a tie.
● MI faced 2 super over matches and won both of them. We have very few data points
here, but winning both matches is definitely a positive sign for MI.
Problem Statement 2

You are working as a data analyst at Hotstar. You are asked to calculate the standard deviation
of the total runs and the most frequently occurring total runs scored by MI throughout their IPL
history.

Dataset used: IPL ball by ball table of IPL dataset

To solve this we will use :


● Mean
● Median
● Mode (most frequently occurring value) - using histogram

Steps
● Create a pivot table on ball by ball data. Match id in Rows, batting team in Filters and
total runs as Sum in Values.
● Create a Box plot and a Histogram on the column named Total Runs (copied from the Pivot
table's Sum of total runs column).
● Separately, use Excel statistical formulas such as:
○ Mean: =AVERAGE(V4:V206)
○ Mode: =MODE(V4:V206)
○ Median: =MEDIAN(V4:V206)
○ Variance: =VAR.P(V4:V206) or =VAR(V4:V206)
○ Standard Deviation: =SQRT(Y5) or =SQRT(Y7)
○ First Quartile: =QUARTILE(V4:V206,1)
○ Third Quartile: =QUARTILE(V4:V206,3)
○ Min: =MIN(V4:V206)
○ Max: =MAX(V4:V206)
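
The same summary statistics can be checked with Python's standard statistics module. This is a minimal sketch on a small hypothetical sample, not the MI totals. It assumes that QUARTILE corresponds to the inclusive quantile method, that VAR.P is the population variance, and that VAR is the sample variance.

```python
import statistics

# Small hypothetical sample standing in for the Total Runs column (V4:V206).
totals = [2, 4, 4, 4, 5, 5, 7, 9]

q1, _, q3 = statistics.quantiles(totals, n=4, method="inclusive")

summary = {
    "mean": statistics.mean(totals),          # like AVERAGE
    "mode": statistics.mode(totals),          # like MODE
    "median": statistics.median(totals),      # like MEDIAN
    "var_p": statistics.pvariance(totals),    # like VAR.P (population)
    "var_s": statistics.variance(totals),     # like VAR (sample)
    "std_p": statistics.pstdev(totals),       # square root of the population variance
    "q1": q1,                                 # like QUARTILE(range,1)
    "q3": q3,                                 # like QUARTILE(range,3)
    "min": min(totals),
    "max": max(totals),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The population and sample variances differ only in the divisor (n versus n - 1), which is why the notes list two possible square roots for the standard deviation.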
Summary

The required answer is:

● Standard Deviation: 29.14044478


● The most frequently occurring total runs scored by MI throughout their IPL history is 157
(Mode)

Macros Introduction

● When working in spreadsheets, you can get stuck in a loop of repetitive actions - copying
cell values, formatting, creating formulas, and so forth - which can grow tedious and lead
to mistakes.
● A macro is an action or a set of actions that you can run as many times as you want.
When you create a macro, you are recording your mouse clicks and keystrokes.
● These recorded actions become the ‘script’ that gets repeated when you save and run the
macro later.

Enabling Macro in Excel

Macros and VBA tools can be found on the Developer tab, which is hidden by default, so the
first step is to enable it.

Steps:

● On the File tab, go to Options > Customize Ribbon.


● Under Customize the Ribbon and under Main Tabs, select the Developer check box.

Problem Statement 3

You are working as a Data Analyst at an EdTech company and you are asked to automate the
task of calculating class-wise total marks and percentage for each student.

Dataset: link
Summary Table Demonstration

Given the ball-by-ball data for match id 1237181 create a batting and bowling scorecard as
shown.

Ref link for getting overs in decimal format -


http://raaque.blogspot.com/2015/06/microsoft-excel-cricket-convert-balls.html
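
The linked post's exact formula is not reproduced here. As a rough illustration of the usual cricket convention, where 3.2 means 3 completed overs plus 2 balls, a ball count can be converted as in the Python sketch below. The function name is hypothetical.

```python
def balls_to_overs(legal_balls):
    """Convert a count of legal deliveries to cricket 'overs.balls' notation."""
    overs, balls = divmod(legal_balls, 6)  # 6 legal balls make one over
    return f"{overs}.{balls}"


print(balls_to_overs(20))  # 3.2
```

The result is kept as text because 3.2 overs is not the decimal number 3.2; the digit after the point only ever runs from 0 to 5.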

Ref link for using single slicer on multiple pivot table -


https://www.myexcelonline.com/blog/connect-slicers-to-multiple-excel-pivot-tables/

Cumulative scores summary using a line graph :

Given the ball by ball data for match id 1237181 create a graph which indicates the cumulative
score at the end of each over as shown:
To get running SUM → Select pivot table DC column → Show Values as → More Options →
Show Data as → Drop down list → Running total in
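
The Running total in option turns per-over runs into a cumulative score. A minimal Python sketch of the same calculation, using hypothetical runs per over (not the figures for match id 1237181):

```python
from itertools import accumulate

# Hypothetical runs scored in each over of an innings.
runs_per_over = [7, 4, 12, 9, 6]

# Each entry is the total score at the end of that over.
cumulative = list(accumulate(runs_per_over))
print(cumulative)  # [7, 11, 23, 32, 38]
```

Plotting the cumulative list against the over number gives the kind of line graph described above.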
