
Indian Premier League (IPL) - Data Analysis - Colab

This document outlines a data analysis project focused on the Indian Premier League (IPL) using Python libraries for data wrangling and visualization. It includes importing datasets related to players, matches, and teams, and provides insights into player statistics, match outcomes, and trends over seasons. The analysis highlights key findings such as the distribution of player ages, the number of matches per venue, and the performance of teams in terms of wins and toss outcomes.

Uploaded by 23eg106e26
Copyright © All Rights Reserved

3/4/25, 10:08 PM Indian Premier League(IPL): Data Analysis - Colab

#DATA WRANGLING & VISUALIZATION


#PROJECT BASED LEARNING
#23EG106E26
#23EG106E37
#24EG506A01
#24EG506B03


#Importing libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
import plotly.graph_objs as go
import plotly.offline as offline

# Initialize plotly for offline plotting


offline.init_notebook_mode(connected=True)

# Updated the paths to use the downloaded data source


players = pd.read_csv('/DIM_PLAYER.csv', encoding='ISO-8859-1')
player_match = pd.read_csv("/DIM_PLAYER_MATCH.csv", encoding='ISO-8859-1')
team = pd.read_csv("/DIM_TEAM.csv", encoding='ISO-8859-1')
ball_fact = pd.read_csv("/FACT_BALL_BY_BALL.csv", encoding='ISO-8859-1')
match = pd.read_excel("/DIM_MATCH.xlsx") # Removed the encoding argument as it is not supported by read_excel


PLAYER_MATCH

Let's start by looking at the player_match dataset.

player_match.head(2)

   Player_match_SK  PlayerMatch_key  Match_Id  Player_Id  Player_Name  DOB         Batting_hand    Bowling_skill       Country_Name  Role_Desc  ...
0  -1               -1               -1        -1         NaN          NaN         NaN             NaN                 NaN           NaN        ...
1  12694            33598700006      335987    6          R Dravid     1973-01-11  Right-hand bat  Right-arm offbreak  India         Captain    ...

2 rows × 22 columns

player_match.describe(include='all')

https://colab.research.google.com/#fileId=https%3A//storage.googleapis.com/kaggle-colab-exported-notebooks/indian-premier-league-ipl-data-an… 1/9

        Player_match_SK  PlayerMatch_key  Match_Id       Player_Id     Player_Name  DOB         Batting_hand    Bowling_skill       Country_Name
count   13993.000000     1.399300e+04     1.399300e+04   13993.000000  13992        13992       13992           12862               13992
unique  NaN              NaN              NaN            NaN           497          482         5               20                  12
top     NaN              NaN              NaN            NaN           SK Raina     1987-04-30  Right-hand bat  Right-arm offbreak  India
freq    NaN              NaN              NaN            NaN           160          251         10026           3359                9045
mean    19688.092832     6.371377e+10     6.371377e+05   168.732152    NaN          NaN         NaN             NaN                 NaN
std     4042.570934      2.350311e+10     2.350311e+05   129.453471    NaN          NaN         NaN             NaN                 NaN
min     -1.000000        -1.000000e+00    -1.000000e+00  -1.000000     NaN          NaN         NaN             NaN                 NaN
25%     16191.000000     4.191540e+10     4.191540e+05   56.000000     NaN          NaN         NaN             NaN                 NaN
50%     19689.000000     5.483820e+10     5.483820e+05   136.000000    NaN          NaN         NaN             NaN                 NaN
75%     23187.000000     8.297460e+10     8.297460e+05   267.000000    NaN          NaN         NaN             NaN                 NaN
max     26685.000000     1.082650e+11     1.082650e+06   497.000000    NaN          NaN         NaN             NaN                 NaN

11 rows × 22 columns

captain = player_match[player_match['Role_Desc'] == 'Captain']


captain.Player_Name.unique()

array(['R Dravid', 'SC Ganguly', 'Yuvraj Singh', 'V Sehwag', 'SK Warne',
'Harbhajan Singh', 'VVS Laxman', 'SM Pollock', 'SR Tendulkar',
'SR Watson', 'MS Dhoni', 'KP Pietersen', 'BB McCullum', 'A Kumble',
'G Gambhir', 'SK Raina', 'DPMD Jayawardene', 'KC Sangakkara',
'DJ Bravo', 'DL Vettori', 'V Kohli', 'JR Hopes', 'CL White',
'DJ Hussey', 'SPD Smith', 'RT Ponting', 'AD Mathews',
'LRPL Taylor', 'AJ Finch', 'RG Sharma', 'DA Warner', 'GJ Bailey',
'S Dhawan', 'DJG Sammy', 'JP Duminy', 'Z Khan', 'DA Miller',
'M Vijay', 'GJ Maxwell', 'AM Rahane', 'KK Nair'], dtype=object)
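Listing unique names does not show how often each player led a side. A minimal sketch, assuming each `Role_Desc == 'Captain'` row in `player_match` corresponds to one match captained (the same filter used above); the helper name `captaincy_counts` is hypothetical:

```python
import pandas as pd


def captaincy_counts(player_match: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n players with the most 'Captain' rows in player_match."""
    # Same filter as the notebook cell above
    captains = player_match[player_match["Role_Desc"] == "Captain"]
    # value_counts sorts by frequency, highest first
    return captains["Player_Name"].value_counts().head(n)


# Usage on the notebook's DataFrame:
# captaincy_counts(player_match)
```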

plt.figure(figsize=(14,6))
sns.countplot(x='Age_As_on_match',data=player_match)

[Figure: countplot, x-axis Age_As_on_match, y-axis count]

Age looks roughly normally distributed. There are some young players, probably talented enough to start playing early, and also some older players, well into their 40s, still playing in the IPL.
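Because `player_match` has one row per player per match, the chart above weights each player by the number of matches they appear in. A sketch, assuming the columns shown earlier (`Player_Id`, `Age_As_on_match`) and treating the `-1` row as a placeholder to drop, that compares the per-appearance view with a per-player view; `age_summary` is a hypothetical helper:

```python
import pandas as pd


def age_summary(player_match: pd.DataFrame) -> pd.DataFrame:
    """Describe Age_As_on_match per appearance and per distinct player."""
    # Drop the placeholder row whose keys are -1
    valid = player_match[player_match["Player_Id"] != -1]
    # Per appearance: every player-match row counts once
    per_appearance = valid["Age_As_on_match"].describe()
    # Per player: youngest recorded age, i.e. age at first appearance in the data
    per_player = valid.groupby("Player_Id")["Age_As_on_match"].min().describe()
    return pd.DataFrame({"per_appearance": per_appearance, "per_player": per_player})


# Usage on the notebook's DataFrame:
# age_summary(player_match)
```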

MATCH

Now let's analyze the second dataset, 'Match'.

match.head(2)


   Match_SK  match_id  Team1                        Team2                    match_date  Season_Year  Venue_Name                                 City_Name  Country_Name  Toss_Winner              match_winner             ...
0  546       980964    Royal Challengers Bangalore  Kolkata Knight Riders    2016-05-02  2016         M Chinnaswamy Stadium                      Bangalore  India         Kolkata Knight Riders    Kolkata Knight Riders    ...
1  547       980966    Gujarat Lions                Delhi Daredevils         2016-05-03  2016         Saurashtra Cricket Association Stadium     Rajkot     India         Delhi Daredevils         Delhi Daredevils         ...


match.describe(include='all')

        Match_SK    match_id      Team1                        Team2           match_date                     Season_Year  Venue_Name             City_Name  Country_Name  ...
count   637.000000  6.370000e+02  637                          637             637                            637.000000   636                    637        637           ...
unique  NaN         NaN           13                           13              NaN                            NaN          37                     32         3             ...
top     NaN         NaN           Royal Challengers Bangalore  Mumbai Indians  NaN                            NaN          M Chinnaswamy Stadium  Mumbai     India         ...
freq    NaN         NaN           85                           85              NaN                            NaN          66                     85         560           ...
mean    318.000000  6.378825e+05  NaN                          NaN             2012-10-27 10:46:31.836734720  2012.497645  NaN                    NaN        NaN           ...
min     0.000000    3.359870e+05  NaN                          NaN             2008-04-18 00:00:00            2008.000000  NaN                    NaN        NaN           ...
25%     159.000000  4.191550e+05  NaN                          NaN             2010-04-11 00:00:00            2010.000000  NaN                    NaN        NaN           ...
50%     318.000000  5.483830e+05  NaN                          NaN             2012-05-22 00:00:00            2012.000000  NaN                    NaN        NaN           ...
75%     477.000000  8.297480e+05  NaN                          NaN             2015-04-22 00:00:00            2015.000000  NaN                    NaN        NaN           ...
max     636.000000  1.082650e+06  NaN                          NaN             2017-05-21 00:00:00            2017.000000  NaN                    NaN        NaN           ...
std     184.030342  2.356312e+05  NaN                          NaN             NaN                            2.776600     NaN                    NaN        NaN           ...

match.isnull().sum(axis=0)

Match_SK         0
match_id         0
Team1            0
Team2            0
match_date       0
Season_Year      0
Venue_Name       1
City_Name        0
Country_Name     0
Toss_Winner      1
match_winner     3
Toss_Name        1
Win_Type         2
Outcome_Type     0
ManOfMach        4
Win_Margin       9
Country_id       0
dtype: int64
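The null counts above are easier to scan when restricted to the columns that actually have gaps. A small sketch (the helper name `missing_report` is hypothetical) that also shows the share of rows affected:

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """List only the columns that contain nulls, with count and percentage."""
    counts = df.isnull().sum()
    # Keep columns with at least one null, most affected first
    counts = counts[counts > 0].sort_values(ascending=False)
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })


# Usage on the notebook's DataFrame:
# missing_report(match)
```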

#Number of teams
print("Number of unique teams:", match.Team1.nunique())
print(match.Team1.unique())

Number of unique teams: 13
['Royal Challengers Bangalore' 'Gujarat Lions' 'Kolkata Knight Riders'
 'Delhi Daredevils' 'Sunrisers Hyderabad' 'Kings XI Punjab'
 'Mumbai Indians' 'Rising Pune Supergiants' 'Rajasthan Royals'
 'Deccan Chargers' 'Chennai Super Kings' 'Kochi Tuskers Kerala'
 'Pune Warriors']

#Most man of the matches awards


ManofMatch = match.groupby(['ManOfMach']).count()['match_winner']
ManOfMatch_count = ManofMatch.sort_values(axis=0, ascending=False)
ManOfMatch_count.head()

                match_winner
ManOfMach
CH Gayle        18
YK Pathan       16
AB de Villiers  15
DA Warner       15
RG Sharma       14

dtype: int64
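`groupby('ManOfMach').count()['match_winner']` counts the non-null `match_winner` values for each award winner, so an award given in a match with no recorded winner would be missed. Counting the award column directly avoids that dependency; a minimal sketch (helper name hypothetical):

```python
import pandas as pd


def top_player_of_match(match: pd.DataFrame, n: int = 5) -> pd.Series:
    """Most frequent Man of the Match winners, counted from ManOfMach itself."""
    # value_counts skips missing awards (NaN) and sorts by frequency, descending
    return match["ManOfMach"].value_counts().head(n)


# Usage on the notebook's DataFrame:
# top_player_of_match(match)
```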

#number of matches per season


plt.figure(figsize=(8,6))
sns.countplot(x='Season_Year', data=match)

[Figure: countplot, x-axis Season_Year, y-axis count]

The 2011-2013 seasons had noticeably more matches than the other seasons.
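The same counts can be read as numbers rather than bar heights. A minimal sketch, assuming `Season_Year` holds integer years as in `match.describe()` above (helper name hypothetical):

```python
import pandas as pd


def matches_per_season(match: pd.DataFrame) -> pd.Series:
    """Number of matches in each season, in chronological order."""
    return match["Season_Year"].value_counts().sort_index()


# Usage on the notebook's DataFrame:
# matches_per_season(match)
# matches_per_season(match).idxmax()  # season with the most matches
```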

#Number of matches per venue


plt.figure(figsize=(14,6))
sns.countplot(x='Venue_Name', data=match, order=match['Venue_Name'].value_counts().index)  # Series.value_counts replaces the deprecated top-level pd.value_counts
plt.xticks(rotation='vertical')
plt.show()


Venues in big cities with a home team have hosted the most matches. Up to 2017, M Chinnaswamy Stadium leads (66 matches), followed by Eden Gardens and Feroz Shah Kotla.
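To look at the same pattern at city level rather than venue level, a sketch that counts matches and distinct venues per `City_Name` using pandas named aggregation (helper name hypothetical):

```python
import pandas as pd


def matches_by_city(match: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Matches hosted and distinct venues used per city, busiest first."""
    summary = match.groupby("City_Name").agg(
        # match_id has no nulls, so its count equals the number of matches
        matches=("match_id", "count"),
        venues=("Venue_Name", "nunique"),
    )
    return summary.sort_values("matches", ascending=False).head(n)


# Usage on the notebook's DataFrame:
# matches_by_city(match)
```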

#Wins per team


plt.figure(figsize=(8,6))
ax=sns.countplot(x='match_winner', data=match, order=match['match_winner'].value_counts().index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()


Mumbai Indians have the most wins, followed by Chennai Super Kings and then Kolkata Knight Riders. Now let's see which teams win the toss most often.
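Raw win counts favour franchises that played more seasons; Kochi Tuskers Kerala, for instance, played only in 2011. A sketch that normalises wins by matches played, assuming every match lists its two sides in `Team1` and `Team2` (helper name hypothetical):

```python
import pandas as pd


def win_rate(match: pd.DataFrame) -> pd.DataFrame:
    """Matches played, wins and win percentage per team, best rate first."""
    # Each match counts once for both sides
    played = pd.concat([match["Team1"], match["Team2"]]).value_counts()
    # Matches without a recorded winner are skipped by value_counts
    wins = match["match_winner"].value_counts()
    table = pd.DataFrame({"played": played, "wins": wins}).fillna(0)
    table["wins"] = table["wins"].astype(int)
    table["win_pct"] = (table["wins"] / table["played"] * 100).round(1)
    return table.sort_values("win_pct", ascending=False)


# Usage on the notebook's DataFrame:
# win_rate(match)
```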

#Toss wins per team


plt.figure(figsize=(8,6))
ax=sns.countplot(x='Toss_Winner', data=match, order=match['Toss_Winner'].value_counts().index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.tight_layout()
plt.show()

Again, Mumbai Indians win the toss most often. Next, let's see what teams chose to do after winning the toss over the years.
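A related quick check is how often the toss winner goes on to win the match. A minimal sketch, skipping matches where either `Toss_Winner` or `match_winner` is missing (helper name hypothetical):

```python
import pandas as pd


def toss_win_match_rate(match: pd.DataFrame) -> float:
    """Fraction of decided matches won by the team that won the toss."""
    decided = match.dropna(subset=["Toss_Winner", "match_winner"])
    # Mean of a boolean Series is the share of True values
    return float((decided["Toss_Winner"] == decided["match_winner"]).mean())


# Usage on the notebook's DataFrame:
# toss_win_match_rate(match)
```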

#Normalise the toss decision labels ('Field' -> 'field', 'Bat' -> 'bat') in the Toss_Name column only
match['Toss_Name'] = match['Toss_Name'].replace({'Field': 'field', 'Bat': 'bat'})


plt.figure(figsize=(12,6))
sns.countplot(x='Season_Year', hue='Toss_Name', data=match)

[Figure: countplot, x-axis Season_Year, y-axis count, bars split by Toss_Name]

During the early IPL seasons, toss winners often chose to bat first, but the pattern has clearly shifted towards fielding first, especially in the last couple of seasons.
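The shift is easier to read as the share of toss winners choosing each option per season. A sketch using `pd.crosstab` with row normalisation; lower-casing the labels keeps 'Field' and 'field' in one column (helper name hypothetical):

```python
import pandas as pd


def toss_decision_share(match: pd.DataFrame) -> pd.DataFrame:
    """Per season, the share of toss winners choosing each option (rows sum to 1)."""
    # Lower-case so 'Field'/'field' and 'Bat'/'bat' are counted together
    decision = match["Toss_Name"].str.lower()
    return pd.crosstab(match["Season_Year"], decision, normalize="index").round(3)


# Usage on the notebook's DataFrame:
# toss_decision_share(match)
```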

match.head()

   Match_SK  match_id  Team1                        Team2                    match_date  Season_Year  Venue_Name                                 City_Name  Country_Name  Toss_Winner              match_winner             ...
0  546       980964    Royal Challengers Bangalore  Kolkata Knight Riders    2016-05-02  2016         M Chinnaswamy Stadium                      Bangalore  India         Kolkata Knight Riders    Kolkata Knight Riders    ...
1  547       980966    Gujarat Lions                Delhi Daredevils         2016-05-03  2016         Saurashtra Cricket Association Stadium     Rajkot     India         Delhi Daredevils         Delhi Daredevils         ...
2  548       980968    Kings XI Punjab              Kolkata Knight Riders    2016-05-04  2016         Eden Gardens                               Kolkata    India         Kings XI Punjab          Kolkata Knight Riders    ...
3  549       980970    Delhi Daredevils             Rising Pune Supergiants  2016-05-05  2016         Feroz Shah Kotla                           Delhi      India         Rising Pune Supergiants  Rising Pune Supergiants  ...
4  550       980972    Sunrisers Hyderabad          Gujarat Lions            2016-05-06  2016         Rajiv Gandhi International Stadium, Uppal  Hyderabad  India         Sunrisers Hyderabad      Sunrisers Hyderabad      ...


PLAYER

Now let's look at the third dataset, 'Player'.

players.head(2)

PLAYER_SK Player_Id Player_Name DOB Batting_hand Bowling_skill Country_Name

0 0 1 SC Ganguly 1972-07-08 Left-hand bat Right-arm medium India

1 1 2 BB McCullum 1981-09-27 Right-hand bat Right-arm medium New Zealand



Let's look at the batting and bowling styles of IPL players.

plt.figure(figsize=(8,6))
sns.countplot(x='Batting_hand', data=players)

[Figure: countplot, x-axis Batting_hand, y-axis count]

plt.figure(figsize=(12,6))
ax=sns.countplot(x='Bowling_skill', data=players, order=players['Bowling_skill'].value_counts().iloc[:10].index)
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right")
plt.show()

Right-hand batting and right-arm medium bowling are clearly the most common styles.
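To put numbers on "most common", a sketch that computes each style's share of players, as a percentage of non-null values (helper name hypothetical):

```python
import pandas as pd


def style_share(players: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of players for each value of a style column, largest first."""
    # normalize=True turns counts into fractions of the non-null values
    return (players[column].value_counts(normalize=True) * 100).round(1)


# Usage on the notebook's DataFrame:
# style_share(players, "Batting_hand")
# style_share(players, "Bowling_skill").head(10)
```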

Now, let's look at the countries.

Most players are from India, followed by Australia and South Africa.
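A sketch of how this ranking can be computed from the `Country_Name` column of `players` (helper name hypothetical); the commented lines show a plot in the same seaborn style as the cells above:

```python
import pandas as pd


def players_per_country(players: pd.DataFrame, n: int = 5) -> pd.Series:
    """Number of players per Country_Name, most common first."""
    return players["Country_Name"].value_counts().head(n)


# Usage on the notebook's DataFrame:
# players_per_country(players)
# sns.countplot(x='Country_Name', data=players,
#               order=players['Country_Name'].value_counts().index)
```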


BALL FACT

I will now take a quick look at the next dataset, Ball Fact, which has detailed information about every ball bowled in the IPL.

ball_fact.describe(include='all')

Ball_key MatcH_id Over_id Ball_id Innings_No Team_Batting Team_Bowling Striker_Batting_Position

count 1.504510e+05 1.504510e+05 150451.000000 150451.000000 150451.000000 150451.0 150451.0 136590.000000

unique NaN NaN NaN NaN NaN 23.0 29.0 NaN

top NaN NaN NaN NaN NaN 7.0 7.0 NaN

freq NaN NaN NaN NaN NaN 16975.0 16019.0 NaN

mean 6.362075e+12 6.362075e+05 10.142704 3.616639 1.482190 NaN NaN 3.583637

std 2.343623e+12 2.343623e+05 5.674255 1.807638 0.501768 NaN NaN 2.145090

min 3.359870e+12 3.359870e+05 1.000000 1.000000 1.000000 NaN NaN 1.000000

25% 4.191540e+12 4.191540e+05 5.000000 2.000000 1.000000 NaN NaN 2.000000

50% 5.483820e+12 5.483820e+05 10.000000 4.000000 1.000000 NaN NaN 3.000000

75% 8.297420e+12 8.297420e+05 15.000000 5.000000 2.000000 NaN NaN 5.000000

max 1.082650e+13 1.082650e+06 20.000000 9.000000 4.000000 NaN NaN 11.000000

11 rows × 53 columns
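Since each row describes one ball, simple group counts already say a lot. A sketch counting recorded balls per match and innings, using the `MatcH_id` and `Innings_No` columns exactly as spelled in the summary above (helper name hypothetical):

```python
import pandas as pd


def balls_per_innings(ball_fact: pd.DataFrame) -> pd.Series:
    """Number of recorded balls for each (MatcH_id, Innings_No) pair."""
    # size() counts rows per group, including rows with missing values
    return ball_fact.groupby(["MatcH_id", "Innings_No"]).size().rename("balls")


# Usage on the notebook's DataFrame:
# balls_per_innings(ball_fact).describe()
```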

An extra is a run scored by means other than a batsman hitting the ball. Other than runs scored off the bat from a no ball, a batsman is not

