
YEAR: 2024 - 2025: IPL Data Analysis Using MySQL and Python Connectivity

Class 12 CS investigatory project: IPL data analysis using MySQL

Uploaded by

Raghav.S /6218
YEAR: 2024 - 2025

IPL DATA ANALYSIS


USING MySQL AND PYTHON CONNECTIVITY

PROJECT BY:
S.Raghav
INDEX

S.No Contents

1 Objective
2 Introduction
3 Hardware and Software Used
4 Python Code
5 Table Structure
6 Output
7 Conclusion
8 Bibliography
OBJECTIVE

The main objective of this Python project on IPL Data Analysis is to explore
and analyze IPL match data, stored in a MySQL database, to gain insights into
team performance, player statistics, and match outcomes. This project
provides an interactive platform to visualize key statistics and trends,
enabling data-driven decision-making and a deeper understanding of
historical patterns in the IPL.

This project serves as an analytical tool for cricket analysts, enthusiasts, and
coaching staff, offering a Python interface to interact with IPL datasets stored
in MySQL. The purpose is to develop a system that streamlines data handling,
from retrieving and processing data in MySQL to generating reports and
visualizations, with automated data extraction and dynamic insights into
player and team performance.
INTRODUCTION

The IPL Data Analysis System is a Python-based application developed to
manage and analyze data from the Indian Premier League, providing in-depth
insights into match statistics, player performance, and team strategies. This
system leverages MySQL to securely store historical match data, player
details, and season summaries, creating a reliable data source for robust
analysis. Through an intuitive Python interface, users can retrieve and
visualize data, exploring trends across seasons, evaluating player consistency,
and comparing team performances.
Analysts and cricket enthusiasts can access a variety of features, including
match outcome statistics, player performance metrics, and season-over-
season comparisons. This system also supports custom reports and
visualizations, helping users gain a comprehensive understanding of IPL
dynamics and uncovering valuable insights for strategy enhancement and
decision-making.
Python acts as the front end of this system, processing data and presenting it
through an interactive interface, while MySQL serves as the back end,
organizing and storing IPL data securely in tables. The application is designed
to be efficient, user-friendly, and suitable for users with minimal technical
knowledge, providing a powerful tool for IPL data analysis and forecasting.
HARDWARE USED

Intel Core i5 processor
16 GB RAM
1 TB hard disk

SOFTWARE USED

Windows 10 (64-bit) operating system
Python 3.12.7
MySQL v8.0.0
Python Code
import math
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#PRINTING THE DETAILS OF FIRST TWO DELIVERIES IN IPL HISTORY


ipl_data=pd.read_csv("D:\\cs project\\deliveries.csv")
ipl_matches=pd.read_csv("D:\\cs project\\matches.csv")

ipl_data.head(2)

print('PRINTING DATA TYPES OF ALL FIELDS IN THE CSV FILE')


ipl_data.info()
#MODIFYING THE FILE TO BE IMPORTED AS MYSQL TABLE
ipl_data[ipl_data.duplicated(keep=False)]
ipl_data.drop_duplicates(inplace=True) #inplace=True modifies ipl_data itself instead of returning a copy
ipl_data.duplicated().sum()

ipl_data.isnull().sum()

import pandas as pd
import mysql.connector
#creating a connection between notebook and database.
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")

cursor=mydb.cursor() # making a cursor to execute queries.


cursor.execute('''Select * from ipl_ball_by_ball''') #SQL query execution.

mydb.close()
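The connect, cursor, execute and close steps above recur in every query that follows. A minimal sketch of wrapping them in one helper is given here. `run_query` is a hypothetical name, the rows are made up, and the standard-library `sqlite3` module stands in for `mysql.connector` so that the example runs without a MySQL server. Both modules follow the Python DB-API, although `sqlite3` uses `?` placeholders where `mysql.connector` uses `%s`.

```python
import sqlite3
from contextlib import closing


def run_query(conn, sql, params=()):
    """Execute one SELECT and return all rows as a list of tuples."""
    # closing() releases the cursor even if execute() raises.
    with closing(conn.cursor()) as cur:
        cur.execute(sql, params)
        return cur.fetchall()


# Stand-in database (assumption): an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipl_ball_by_ball (id INTEGER, batter TEXT, batsman_run INTEGER)"
)
conn.executemany(
    "INSERT INTO ipl_ball_by_ball VALUES (?, ?, ?)",
    [(1, "A", 4), (1, "A", 1), (1, "B", 6)],
)

rows = run_query(
    conn,
    "SELECT batter, batsman_run FROM ipl_ball_by_ball "
    "WHERE id = ? ORDER BY batsman_run",
    (1,),
)
print(rows)
conn.close()
```

With a helper of this shape, each analysis below would reduce to one SQL string and one call, and the connection would be opened and closed only once.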
print('To find the top five venues where the most IPL matches were played.')
top_played_venue=ipl_matches.groupby(['venue','id']).count().droplevel(level=1).index.value_counts().head()
top_played_venue=top_played_venue.reset_index() #reset_index() converts the Series to a DataFrame
top_played_venue.rename(columns={'count':'Total_match'},inplace=True) #Renaming the column to an appropriate name
top_played_venue
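Because every match id occurs once in the matches file, the groupby chain above amounts to counting rows per venue. A plain-Python cross-check of that idea is sketched here. The venue names and counts are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

# Made-up (match id, venue) pairs standing in for the rows of matches.csv.
matches = [
    (1, "Venue A"), (2, "Venue B"), (3, "Venue A"),
    (4, "Venue C"), (5, "Venue A"), (6, "Venue B"),
]

# Each match id appears once, so counting rows per venue counts matches.
venue_counts = Counter(venue for _match_id, venue in matches)

# most_common(n) returns the n largest counts in descending order.
top_venues = venue_counts.most_common(2)
print(top_venues)
```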

print('Which team has played highest number of matches till 2020.')


import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")

mycursor=mydb.cursor()
sql_statement='''
with cte_matches as (
select team1 as team from matches
UNION ALL
select team2 as team from matches)
select team,count(1) total_played from cte_matches group by team order by 2 desc'''

mycursor.execute(sql_statement)
total_match_df = pd.DataFrame(mycursor.fetchall(),columns=['Team Name','Total Played Matches'])
mydb.close() #close the connection once the rows are fetched
total_match_df.head(3)
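The query above stacks the `team1` and `team2` columns with UNION ALL so that every match contributes one row for each side, then counts rows per team. A small sketch of the same pattern on made-up rows follows. SQLite (standard library) stands in for MySQL here, and the team codes and results are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, team1 TEXT, team2 TEXT, winner TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        (1, "MI", "CSK", "MI"),
        (2, "CSK", "RCB", "CSK"),
        (3, "MI", "RCB", "RCB"),
        (4, "MI", "CSK", "CSK"),
    ],
)

sql = """
WITH cte_matches AS (
    SELECT team1 AS team FROM matches
    UNION ALL              -- keep duplicates: one row per appearance
    SELECT team2 AS team FROM matches
)
SELECT team, COUNT(1) AS total_played
FROM cte_matches
GROUP BY team
ORDER BY total_played DESC, team
"""
totals = conn.execute(sql).fetchall()
conn.close()
print(totals)
```

Plain UNION would remove duplicate team names and make every count 1, which is why UNION ALL is needed.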

print('Horizontal bar plot of matches played by each team')


import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(6,5))
sns.barplot(data=total_match_df,x='Total Played Matches',y='Team Name',orient='h',width=0.6)
plt.show()
print('How many matches each team has won and lost')
import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")

mycursor=mydb.cursor()
sql_statement='''with CTE_matches as(
select team1 as team,winner from matches
union all
select team2 as team,winner from matches
)
select team, count(case when team = winner then 1 end) as total_won,
count(case when team <> winner then 1 end) as total_loss
from CTE_matches
group by team order by 2 desc'''
mycursor.execute(sql_statement)
won_loss_df = pd.DataFrame(mycursor.fetchall(),columns=['Team','Won','Lose'])
mydb.close() #close the connection once the rows are fetched
won_loss_df.head(19)

print('WINNING RATIO OF TEAMS')


won_loss_df['Winning Ratio'] = round((won_loss_df['Won']/(won_loss_df['Lose'] +
won_loss_df['Won']))*100,2)
won_loss_df.head(19)
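The "Winning Ratio" column is a percentage: 100 * Won / (Won + Lose). Matches without a winner are left out of both counts, because a comparison with NULL is never true in SQL. The sketch below works this through on made-up rows. SQLite stands in for MySQL, and the team codes and results are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER, team1 TEXT, team2 TEXT, winner TEXT)"
)
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?, ?)",
    [
        (1, "MI", "CSK", "MI"),
        (2, "CSK", "RCB", "CSK"),
        (3, "MI", "RCB", "RCB"),
        (4, "MI", "CSK", "CSK"),
        (5, "MI", "RCB", None),  # no result: winner is NULL
    ],
)

sql = """
WITH cte_matches AS (
    SELECT team1 AS team, winner FROM matches
    UNION ALL
    SELECT team2 AS team, winner FROM matches
)
SELECT team,
       COUNT(CASE WHEN team = winner THEN 1 END)  AS total_won,
       COUNT(CASE WHEN team <> winner THEN 1 END) AS total_loss
FROM cte_matches
GROUP BY team
ORDER BY team
"""

win_pct = {}
for team, won, lost in conn.execute(sql):
    decided = won + lost  # matches with a winner
    # Guard against division by zero for a team with no decided matches.
    win_pct[team] = round(100 * won / decided, 2) if decided else None
conn.close()
print(win_pct)
```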

print('Most IPL centuries by a player')


import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")
mycursor=mydb.cursor()
statement='''WITH CTE_Run_scored as(
select id Match_id,batter,
sum(batsman_run) as 'Run_scored'
from ipl_ball_by_ball
group by id,batter
)
select * from (
Select batter,
count(Case when run_scored>=100 THEN 1 end) as total_centurie
from CTE_Run_scored
group by batter order by 2 desc) temp
where temp.total_centurie>0'''
mycursor.execute(statement)
most_centuries_df = pd.DataFrame(mycursor.fetchall(),columns=['Player','Total Centuries'])
mydb.close() #close the connection once the rows are fetched
most_centuries_df.head()
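The century count works in two steps: sum the runs per batter per match, then count the matches in which that sum reaches 100. A sketch on made-up deliveries follows. SQLite stands in for MySQL, the batter names are placeholders, and a HAVING clause is swapped in for the outer subquery used above; the two forms should give the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipl_ball_by_ball (id INTEGER, batter TEXT, batsman_run INTEGER)"
)
balls = (
    [(1, "A", 4)] * 25      # A scores 100 in match 1
    + [(1, "B", 6)] * 10    # B scores 60 in match 1
    + [(2, "A", 6)] * 17    # A scores 102 in match 2
    + [(2, "B", 4)] * 25    # B scores 100 in match 2
    + [(3, "A", 1)] * 99    # A scores 99 in match 3 (not a century)
)
conn.executemany("INSERT INTO ipl_ball_by_ball VALUES (?, ?, ?)", balls)

sql = """
WITH cte_run_scored AS (
    SELECT id AS match_id, batter, SUM(batsman_run) AS run_scored
    FROM ipl_ball_by_ball
    GROUP BY id, batter
)
SELECT batter,
       COUNT(CASE WHEN run_scored >= 100 THEN 1 END) AS total_centuries
FROM cte_run_scored
GROUP BY batter
HAVING COUNT(CASE WHEN run_scored >= 100 THEN 1 END) > 0
ORDER BY total_centuries DESC, batter
"""
centuries = conn.execute(sql).fetchall()
conn.close()
print(centuries)
```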

print('TOP 5 RUN SCORERS FROM EACH TEAM')


import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")
cursor=mydb.cursor()
SQL_statement='''
WITH IPL_Ranking as(
select Battingteam,batter,sum(batsman_run) as batsman_runs,
dense_rank() OVER(partition by battingteam order by sum(batsman_run) desc) as
'Player_Rank'
from ipl_ball_by_ball group by 1,2
)
select * from IPL_Ranking Where Player_rank <= %s
'''

cursor.execute(SQL_statement,(5,))
df=pd.DataFrame(cursor.fetchall(),columns=['Team_name','Batsman','Total_Run','Ranking'])
mydb.close()
#Output will contain top 5 batsman from each team but we will only see first 10
df.head(10)
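DENSE_RANK() gives tied batters the same rank, so a "top N per team" filter can return more than N rows for a team. The sketch below shows this on made-up rows. SQLite (version 3.25 or later, for window functions) stands in for MySQL, the names are placeholders, and the aggregation is split into its own CTE purely for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipl_ball_by_ball "
    "(battingteam TEXT, batter TEXT, batsman_run INTEGER)"
)
conn.executemany(
    "INSERT INTO ipl_ball_by_ball VALUES (?, ?, ?)",
    [
        ("MI", "R", 6), ("MI", "R", 4), ("MI", "S", 4), ("MI", "T", 1),
        ("CSK", "D", 6), ("CSK", "E", 6), ("CSK", "F", 2), ("CSK", "G", 1),
    ],
)

sql = """
WITH team_runs AS (
    SELECT battingteam, batter, SUM(batsman_run) AS batsman_runs
    FROM ipl_ball_by_ball
    GROUP BY battingteam, batter
),
ipl_ranking AS (
    SELECT battingteam, batter, batsman_runs,
           DENSE_RANK() OVER (
               PARTITION BY battingteam      -- rank restarts for each team
               ORDER BY batsman_runs DESC
           ) AS player_rank
    FROM team_runs
)
SELECT battingteam, batter, batsman_runs, player_rank
FROM ipl_ranking
WHERE player_rank <= ?
ORDER BY battingteam, player_rank, batter
"""
top_n = 2
ranked = conn.execute(sql, (top_n,)).fetchall()
conn.close()
for row in ranked:
    print(row)
```

Here CSK returns three rows for `top_n = 2` because D and E are tied at rank 1. ROW_NUMBER() could be used instead if exactly N rows per team are wanted.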

print('plot a bar chart over player runs')


import matplotlib.pyplot as plt
plt.figure(figsize=(6,4))
plt.bar(df['Batsman'].head(10).to_list(),df['Total_Run'].head(10).to_list(),width=0.3)
plt.xticks(rotation=90)
plt.show()
print('Find the total runs scored by Virat Kohli up to his 25th, 50th, 75th, 100th, 125th and 150th match')
import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")
cursor=mydb.cursor()
SQL_statement='''
WITH CTE_Run_scored as(
select id as Match_id,concat('Match-',row_number() Over(order by id)) as Match_No,
sum(batsman_run) as 'Run_scored'
from ipl_ball_by_ball
where batter='V Kohli'
group by id)
-- order the window by Match_id so the running total follows match order
-- (this assumes match ids increase chronologically)
Select * from (select Match_No,Run_scored,
sum(run_scored) over(order by Match_id rows between unbounded preceding and current row) as
'cumulative_run'
from CTE_Run_scored) temp
where temp.Match_No IN ("Match-25","Match-50","Match-75",
"Match-100","Match-125","Match-150")
'''
cursor.execute(SQL_statement)
cummulative_run_df=pd.DataFrame(cursor.fetchall(),columns=
["Match_No","Run_scored","cumulative_run"])
mydb.close()
cummulative_run_df
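A running total is only well defined when the window has an ORDER BY; without one, the order in which rows are accumulated is up to the database. The sketch below numbers the matches and accumulates the runs in the same order. It uses made-up deliveries inserted out of order, SQLite (3.25 or later) as a stand-in for MySQL, and hypothetical CTE names `per_match` and `numbered`. It also assumes that match ids increase chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipl_ball_by_ball (id INTEGER, batter TEXT, batsman_run INTEGER)"
)
conn.executemany(
    "INSERT INTO ipl_ball_by_ball VALUES (?, ?, ?)",
    [
        (30, "V Kohli", 0),
        (10, "V Kohli", 4), (10, "V Kohli", 6), (10, "V Kohli", 2),
        (40, "V Kohli", 6), (40, "V Kohli", 4),
        (20, "V Kohli", 1), (20, "V Kohli", 4),
        (20, "X", 6),  # another batter, filtered out by the WHERE clause
    ],
)

sql = """
WITH per_match AS (
    SELECT id, SUM(batsman_run) AS run_scored
    FROM ipl_ball_by_ball
    WHERE batter = ?
    GROUP BY id
),
numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY id) AS match_no,
           run_scored,
           SUM(run_scored) OVER (
               ORDER BY id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_run
    FROM per_match
)
SELECT match_no, run_scored, cumulative_run
FROM numbered
WHERE match_no IN (2, 4)
ORDER BY match_no
"""
milestones = conn.execute(sql, ("V Kohli",)).fetchall()
conn.close()
print(milestones)
```

Keeping `match_no` as an integer also avoids the string slicing needed later to build the x-axis labels.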

print('Adding one extra column in above dataframe to make easy x-axis.')


cummulative_run_df['Match']=cummulative_run_df['Match_No'].apply(lambda x:x[6:])
cummulative_run_df

print('plotting the graph of above data')

import matplotlib.pyplot as plt


import seaborn as sns
fig=plt.figure(figsize=(6,4))
axes=fig.add_axes([0.1,0.1,0.8,0.8])
cummulative_run_df['Match']=cummulative_run_df['Match_No'].apply(lambda x:x[6:])
axes.plot(cummulative_run_df['Match'].to_list(),cummulative_run_df['cumulative_run'].to_list(),
color='red',linestyle='--',marker='o',markerfacecolor='k')
plt.show()
print('Running average (runs per match) of Suresh Raina up to his 50th, 100th and 150th match')
import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")
cursor=mydb.cursor()

sql_statement='''
with CTE_Total_run as(
select id as Match_id,batter,concat('Match-',row_number() over(order by id)) as Match_No,
sum(batsman_run) as Run_scored
from ipl_ball_by_ball
where batter='SK Raina'
group by id
)
-- order the window by Match_id so the running average follows match order
Select * from(
select batter,Match_No,Run_scored,
Round(avg(Run_scored) OVER(order by Match_id rows between unbounded preceding and current row),2) as avg_each_match
from CTE_Total_run) temp
where temp.Match_No="Match-50"
OR temp.Match_No="Match-100"
OR temp.Match_No="Match-150"
'''
cursor.execute(sql_statement)
running_avg_raina=pd.DataFrame(cursor.fetchall(),columns=
["Batsman","Match_No","Run_scored","avg_each_match"])
mydb.close()
running_avg_raina
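The `avg_each_match` column is a running mean of runs per match, which differs from the conventional batting average (runs divided by dismissals). A plain-Python cross-check of what the AVG(...) OVER window computes is sketched below, using made-up scores that are assumed to be in match order already.

```python
from itertools import accumulate

# Made-up runs per match, already in match order.
runs_per_match = [10, 20, 0, 30]

# Running totals: [10, 30, 30, 60]
running_totals = list(accumulate(runs_per_match))

# Running average after match n = total runs so far / n
running_avg = [
    round(total / n, 2) for n, total in enumerate(running_totals, start=1)
]

# Pick out the milestone matches, as the WHERE clause does in the query.
milestones = [2, 4]
report = {n: running_avg[n - 1] for n in milestones}
print(report)
```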
print('Most Dot Ball by a Bowler')
import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host='localhost',
database='ipl',
user='root',
password='root')
mycursor=mydb.cursor()
sql_statement='''
select bowler, sum(dot_ball) as total_dot_ball from(
select id,bowler,count(case when total_run=0 then 1 end) as dot_ball
from ipl_ball_by_ball group by id,bowler
)temp
group by bowler order by 2 desc
'''
mycursor.execute(sql_statement)
dot_ball_df = pd.DataFrame(mycursor.fetchall(),columns=['Bowler','Total_dot_ball'])
mydb.close() #close the connection once the rows are fetched
dot_ball_df.head()
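The dot-ball query first counts per match and then sums per bowler. Since the per-match figures are not used afterwards, a single GROUP BY on the bowler should give the same totals. The sketch below compares the two forms on made-up deliveries, with SQLite standing in for MySQL and placeholder bowler names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ipl_ball_by_ball (id INTEGER, bowler TEXT, total_run INTEGER)"
)
conn.executemany(
    "INSERT INTO ipl_ball_by_ball VALUES (?, ?, ?)",
    [
        (1, "B1", 0), (1, "B1", 0), (1, "B1", 4), (2, "B1", 0),
        (1, "B2", 1), (2, "B2", 0),
        (2, "B3", 6),
    ],
)

# Two-step form, as in the project: count per match, then sum per bowler.
nested_sql = """
SELECT bowler, SUM(dot_ball) AS total_dot_ball FROM (
    SELECT id, bowler,
           COUNT(CASE WHEN total_run = 0 THEN 1 END) AS dot_ball
    FROM ipl_ball_by_ball
    GROUP BY id, bowler
) AS temp
GROUP BY bowler
ORDER BY total_dot_ball DESC, bowler
"""

# One-step form: count the zero-run deliveries per bowler directly.
flat_sql = """
SELECT bowler,
       COUNT(CASE WHEN total_run = 0 THEN 1 END) AS total_dot_ball
FROM ipl_ball_by_ball
GROUP BY bowler
ORDER BY total_dot_ball DESC, bowler
"""

nested = conn.execute(nested_sql).fetchall()
flat = conn.execute(flat_sql).fetchall()
conn.close()
print(nested)
print(nested == flat)
```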
Table Structure:
Python Output:
CONCLUSION

The IPL Data Analysis project using Python and MySQL helps analyze the
performance of players and teams in the Indian Premier League. By storing
match and player data in MySQL and using Python for data processing and
visualization, we can easily explore insights like top run-scorers, winning
ratios, and popular match venues. The project allows for a better
understanding of IPL trends and performances, making it a useful tool for
analyzing team and player statistics. This combination of database
management and data analysis provides a simple yet powerful way to
uncover key information from IPL data.
BIBLIOGRAPHY:
https://medium.com/@keep9647smile/ipl-data-analysis-11250e6ee603
https://www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020 (For Data)
https://chatgpt.com/
https://www.linkedin.com/pulse/python-practice-project-ipl-2022-cricket-sports-data-analysis-mishra
