YEAR: 2024 - 2025: IPL Data Analysis Using MySQL and Python Connectivity
PROJECT BY:
S.Raghav
INDEX
1 Objective
2 Introduction
3 Software Used
4 Python Code
5 Table Structure
6 Output
7 Conclusion
8 Bibliography
OBJECTIVE
The main objective of this Python project on IPL Data Analysis is to explore
and analyze IPL match data, stored in a MySQL database, to gain insights into
team performance, player statistics, and match outcomes. This project
provides an interactive platform to visualize key statistics and trends,
enabling data-driven decision-making and a deeper understanding of
historical patterns in the IPL.
This project serves as an analytical tool for cricket analysts, enthusiasts, and
coaching staff, offering a Python interface to interact with IPL datasets stored
in MySQL. The purpose is to develop a system that streamlines data handling,
from retrieving and processing data in MySQL to generating reports and
visualizations, with automated data extraction and dynamic insights into
player and team performance.
INTRODUCTION
SOFTWARE USED
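import pandas as pd
import mysql.connector
# Create a connection between the notebook and the database.
mydb=mysql.connector.connect(host="localhost",
database="ipl",
user="root",
password="root")
# Assumption: ipl_data holds the matches table; the original report does not show this loading step.
ipl_data=pd.read_sql("select * from matches",mydb)
mydb.close()
ipl_data.head(2) # preview the first two rows
ipl_data.isnull().sum() # count missing values in each column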
print('Top five venues where the most IPL matches were played.')
# Count each venue once per match id, then keep the five most frequent venues.
top_played_venue=ipl_matches.groupby(['venue','id']).count().droplevel(level=1).index.value_counts().head()
top_played_venue=top_played_venue.reset_index() # reset_index() converts the Series to a DataFrame
top_played_venue.rename(columns={'count':'Total_match'},inplace=True) # rename the count column to a descriptive name
top_played_venue
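The venue count above can be cross-checked without pandas. The following is a minimal sketch on hypothetical sample data (the `matches` list and the name `top_venues` are illustrative and not part of the project's dataset); it assumes one row per match, so each venue is counted once per match id.

```python
from collections import Counter

# Hypothetical sample: one (match id, venue) pair per match.
matches = [
    (1, "Eden Gardens"),
    (2, "Wankhede Stadium"),
    (3, "Eden Gardens"),
    (4, "Feroz Shah Kotla"),
    (5, "Eden Gardens"),
    (6, "Wankhede Stadium"),
]

# Count each venue once per match, then keep the five most frequent venues.
venue_counts = Counter(venue for _match_id, venue in matches)
top_venues = venue_counts.most_common(5)
print(top_venues)
```

`most_common(5)` plays the same role as `value_counts().head()` in the pandas version: both sort by frequency in descending order and keep the first five entries.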
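# Reopen the connection if an earlier cell closed it.
if not mydb.is_connected():
    mydb.reconnect()
mycursor=mydb.cursor()
# Stack team1 and team2 into one column so every team is counted once per match it played.
sql_statement='''
with cte_matches as (
select team1 as team from matches
UNION ALL
select team2 as team from matches)
select team,count(1) total_played from cte_matches group by team order by 2 desc'''
mycursor.execute(sql_statement)
total_match_df = pd.DataFrame(mycursor.fetchall(),columns=['Team Name','Total Played Matches'])
total_match_df.head(3)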
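To see why the query uses UNION ALL rather than UNION, here is a small sketch that runs the same CTE on an in-memory SQLite database. SQLite stands in for MySQL only so the example runs without a server, and the three-row `matches` table is hypothetical sample data.

```python
import sqlite3

# Hypothetical toy data: three matches among three teams.
conn = sqlite3.connect(":memory:")
conn.execute("create table matches (id integer, team1 text, team2 text)")
conn.executemany(
    "insert into matches values (?, ?, ?)",
    [(1, "CSK", "MI"), (2, "CSK", "RCB"), (3, "MI", "CSK")],
)

# UNION ALL keeps duplicate rows, so a team appears once for every match it played.
# Plain UNION would collapse the duplicates and every team would count as 1.
query = """
with cte_matches as (
    select team1 as team from matches
    union all
    select team2 as team from matches)
select team, count(1) as total_played
from cte_matches
group by team
order by 2 desc, team
"""
total_played = conn.execute(query).fetchall()
conn.close()
print(total_played)
```

The extra `team` sort key is only there to make the order of tied teams predictable in the example.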
mycursor=mydb.cursor()
sql_statement='''with CTE_matches as(
select team1 as team,winner from matches
union all
select team2 as team,winner from matches
)
select team, count(case when team = winner then 1 end) as total_won,
count(case when team <> winner then 1 end) as total_loss
from CTE_matches
group by team order by 2 desc'''
mycursor.execute(sql_statement)
won_loss_df = pd.DataFrame(mycursor.fetchall(),columns=['Team','Won','Lose'])
won_loss_df.head(19)
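The conditional counts in this query have a useful side effect worth illustrating: `count(case ... end)` skips NULLs, so a match with no recorded winner counts as neither a win nor a loss. The sketch below assumes that no-result matches store NULL in `winner` (the real dataset may encode them differently) and uses an in-memory SQLite table with hypothetical rows in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table matches (id integer, team1 text, team2 text, winner text)"
)
conn.executemany(
    "insert into matches values (?, ?, ?, ?)",
    [
        (1, "CSK", "MI", "CSK"),
        (2, "CSK", "RCB", "RCB"),
        (3, "MI", "RCB", None),  # assumed encoding of a no-result match
    ],
)

# A CASE without ELSE yields NULL when the condition is not true,
# and count() ignores NULLs, so each count only sees matching rows.
query = """
with cte_matches as (
    select team1 as team, winner from matches
    union all
    select team2 as team, winner from matches)
select team,
       count(case when team = winner then 1 end) as total_won,
       count(case when team <> winner then 1 end) as total_loss
from cte_matches
group by team
order by team
"""
won_loss = conn.execute(query).fetchall()
conn.close()
print(won_loss)
```

In this sample MI and RCB each played two matches but show only one result, because comparing a team name with a NULL winner is neither true nor false.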
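# SQL_statement is defined in a cell not shown in this report; it presumably ranks batsmen
# by total runs within each team and takes the number of batsmen to keep per team as its parameter.
cursor=mydb.cursor()
cursor.execute(SQL_statement,(5,))
df=pd.DataFrame(cursor.fetchall(),columns=['Team_name','Batsman','Total_Run','Ranking'])
mydb.close()
# The output contains the top 5 batsmen from each team; only the first 10 rows are displayed.
df.head(10)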
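The query behind this step is not included in the report, so the following is only one plausible way to produce a "top N batsmen per team" result, not the author's statement. It assumes a hypothetical `batting_team` column in `ipl_ball_by_ball`, uses toy rows, and runs on in-memory SQLite so it works without a MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ipl_ball_by_ball "
    "(id integer, batting_team text, batter text, batsman_run integer)"
)
# Hypothetical deliveries: (match id, batting team, batter, runs off the bat).
rows = [
    (1, "CSK", "A", 4), (1, "CSK", "B", 6), (1, "CSK", "C", 1),
    (1, "MI", "X", 2), (1, "MI", "Y", 3),
    (2, "CSK", "A", 1), (2, "MI", "Y", 4),
]
conn.executemany("insert into ipl_ball_by_ball values (?, ?, ?, ?)", rows)

# Step 1: total runs per batter within a team.
# Step 2: rank batters inside each team by those totals.
# Step 3: keep only ranks up to the requested limit.
query = """
with totals as (
    select batting_team, batter, sum(batsman_run) as total_run
    from ipl_ball_by_ball
    group by batting_team, batter),
ranked as (
    select *, dense_rank() over (
        partition by batting_team order by total_run desc) as ranking
    from totals)
select * from ranked
where ranking <= ?
order by batting_team, ranking
"""
top_batters = conn.execute(query, (2,)).fetchall()
conn.close()
print(top_batters)
```

SQLite marks parameters with `?`, whereas `mysql.connector` uses `%s`, which matches the `(5,)` tuple passed to `cursor.execute` in the project code.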
# Reopen the connection closed after the previous query and create a cursor.
if not mydb.is_connected():
    mydb.reconnect()
cursor=mydb.cursor()
sql_statement='''
with CTE_Total_run as(
select batter,row_number() over(order by id) as Match_Seq,
sum(batsman_run) as Run_scored
from ipl_ball_by_ball
where batter='SK Raina'
group by id,batter
)
select batter,concat('Match-',Match_Seq) as Match_No,Run_scored,avg_each_match from(
-- order the window by match sequence so the running average is deterministic
select *,Round(avg(Run_scored) OVER(order by Match_Seq rows between unbounded preceding and current row),2) as avg_each_match
from CTE_Total_run) temp
where temp.Match_Seq in (50,100,150)
order by temp.Match_Seq
'''
cursor.execute(sql_statement)
running_avg_raina=pd.DataFrame(cursor.fetchall(),columns=["Batsman","Match_No","Run_scored","avg_each_match"])
mydb.close()
running_avg_raina
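A running average depends on the order in which rows enter the window, which is easy to verify on a handful of matches. The sketch below uses a hypothetical batter "P" with four toy matches on in-memory SQLite (standing in for MySQL) and picks out the 2nd and 4th matches instead of the 50th, 100th and 150th.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ipl_ball_by_ball (id integer, batter text, batsman_run integer)"
)
# Hypothetical deliveries: match totals are 10, 0, 5 and 1 runs.
balls = [
    (10, "P", 4), (10, "P", 6),
    (20, "P", 0),
    (30, "P", 2), (30, "P", 3),
    (40, "P", 1),
]
conn.executemany("insert into ipl_ball_by_ball values (?, ?, ?)", balls)

query = """
with per_match as (
    select id, sum(batsman_run) as run_scored
    from ipl_ball_by_ball
    where batter = ?
    group by id),
numbered as (
    select row_number() over (order by id) as match_seq, run_scored
    from per_match),
running as (
    select match_seq, run_scored,
           round(avg(run_scored) over (
               order by match_seq
               rows between unbounded preceding and current row), 2) as running_avg
    from numbered)
select * from running
where match_seq in (2, 4)
order by match_seq
"""
running_avg = conn.execute(query, ("P",)).fetchall()
conn.close()
print(running_avg)
```

After two matches the average is (10 + 0) / 2 = 5.0, and after four it is (10 + 0 + 5 + 1) / 4 = 4.0. The filter on `match_seq` is applied only after the window has been computed over all matches, which is why the query needs the outer select.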
print('Most dot balls by a bowler')
import pandas as pd
import mysql.connector
mydb=mysql.connector.connect(host='localhost',
database='ipl',
user='root',
password='root')
mycursor=mydb.cursor()
sql_statement='''
select bowler, sum(dot_ball) as total_dot_ball from(
select id,bowler,count(case when total_run=0 then 1 end) as dot_ball
from ipl_ball_by_ball group by id,bowler
)temp
group by bowler order by 2 desc
'''
mycursor.execute(sql_statement)
dot_ball_df = pd.DataFrame(mycursor.fetchall(),columns=['Bowler','Total_dot_ball'])
dot_ball_df.head()
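The query above counts dot balls per match and then sums them per bowler. As a sanity check, a single `group by bowler` should give the same totals; the sketch below compares both forms on hypothetical rows using in-memory SQLite in place of MySQL. Like the project query, it treats any delivery with `total_run = 0` as a dot ball.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ipl_ball_by_ball (id integer, bowler text, total_run integer)"
)
# Hypothetical deliveries: (match id, bowler, total runs conceded on the ball).
conn.executemany(
    "insert into ipl_ball_by_ball values (?, ?, ?)",
    [(1, "Q", 0), (1, "Q", 4), (2, "Q", 0), (1, "R", 0), (2, "R", 1)],
)

# Two-level form, as in the project: per-match counts summed per bowler.
nested_query = """
select bowler, sum(dot_ball) as total_dot_ball from (
    select id, bowler, count(case when total_run = 0 then 1 end) as dot_ball
    from ipl_ball_by_ball
    group by id, bowler) temp
group by bowler
order by 2 desc, bowler
"""
# One-level form: count the dot balls directly per bowler.
flat_query = """
select bowler, count(case when total_run = 0 then 1 end) as total_dot_ball
from ipl_ball_by_ball
group by bowler
order by 2 desc, bowler
"""
nested_result = conn.execute(nested_query).fetchall()
flat_result = conn.execute(flat_query).fetchall()
conn.close()
print(nested_result, flat_result)
```

The two-level form is still useful when the per-match counts themselves are needed, for example to find a bowler's best single match.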
Table Structure:
Python Output:
CONCLUSION
The IPL Data Analysis project using Python and MySQL helps analyze the
performance of players and teams in the Indian Premier League. By storing
match and player data in MySQL and using Python for data processing and
visualization, we can easily explore insights like top run-scorers, winning
ratios, and popular match venues. The project allows for a better
understanding of IPL trends and performances, making it a useful tool for
analyzing team and player statistics. This combination of database
management and data analysis provides a simple yet powerful way to
uncover key information from IPL data.
BIBLIOGRAPHY:
https://medium.com/@keep9647smile/ipl-data-analysis-11250e6ee603
https://www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020 (For Data)
https://chatgpt.com/
https://www.linkedin.com/pulse/python-practice-project-ipl-2022-cricket-sports-data-analysis-mishra