IP Project
SCHOOL,
JAIPUR
By:
Name: Anant Acharya
Class: 12 E
Roll No:
Session: 2024-25
CERTIFICATE
This is to certify that Anant Acharya of Class XII E has successfully completed
the Informatics Practices project entitled IPL Player Performance Analysis
ROLL NO.
Data science is an essential part of any industry in this era of big data. It is
a field that works with vast volumes of data, using modern tools and techniques
to derive meaningful information, support business decisions, and make
predictions. The data used for analysis can come from multiple sources and in
various formats.
Python provides the tools needed to analyse huge datasets. It comes with
powerful statistical, numerical and visualisation libraries such as Pandas,
NumPy and Matplotlib, along with many more advanced libraries.
INTRODUCTION
In today’s fast-growing world, information plays a vital role. The
IT revolution has affected not only business, education, science and technology but
also the way people think. Rapid changes in the economy and globalization are
putting more and more stress on cutting-edge technology and on processing information
swiftly, accurately and reliably. The conventional system could not deliver this
accuracy and speed.
This project is designed as a comprehensive analytical tool for exploring and visualizing
statistics from the Indian Premier League (IPL). The goal is to provide cricket enthusiasts,
analysts, and fans with a structured, user-friendly interface for accessing a variety of insights
about player performances and match data. By utilizing three datasets—player statistics, match
details, and ball-by-ball deliveries—the project integrates a vast amount of information to offer
both detailed statistical summaries and insightful visualizations.
The tool supports several key functionalities. Users can retrieve detailed individual player
statistics, including batting and bowling averages, strike rates, and fielding contributions like
catches and stumpings. It also allows for the identification of top players in various categories
such as runs scored, wickets taken, strike rates, and economy rates. For in-depth analysis, users
can compare multiple players' performances side-by-side or examine trends in a specific
batsman's runs or a bowler's dismissals over matches. Additionally, users can analyze top run-
scorers across different IPL seasons.
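The ranking of top players by category could be sketched roughly as follows. This is a minimal illustration, not the project's own function: the sample rows and the helper names `add_rates` and `top_players` are hypothetical, and only the column names are taken from the player statistics dataset. It assumes strike rate means runs per 100 balls faced and economy means runs conceded per six-ball over.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the player statistics dataset.
stats = pd.DataFrame({
    "player": ["Batter A", "Batter B", "Bowler C", "Bowler D"],
    "runs": [600, 450, 40, 25],
    "balls_faced": [400, 360, 50, 25],
    "balls_bowled": [0, 12, 300, 240],
    "runs_conceded": [0, 20, 350, 320],
})


def add_rates(df: pd.DataFrame) -> pd.DataFrame:
    """Add strike rate (runs per 100 balls) and economy (runs per over)."""
    out = df.copy()
    out["batting_strike_rate"] = 100 * out["runs"] / out["balls_faced"]
    # Players who never bowled get NaN rather than a division by zero.
    overs = (out["balls_bowled"] / 6).where(out["balls_bowled"] > 0)
    out["bowling_economy"] = out["runs_conceded"] / overs
    return out


def top_players(df: pd.DataFrame, category: str, n: int = 3,
                lower_is_better: bool = False) -> pd.DataFrame:
    """Return the n best players in a category, ignoring missing values."""
    ranked = df.dropna(subset=[category])
    pick = ranked.nsmallest if lower_is_better else ranked.nlargest
    return pick(n, category)[["player", category]]


rated = add_rates(stats)
top_scorers = top_players(rated, "runs", n=2)
best_economy = top_players(rated, "bowling_economy", n=2, lower_is_better=True)
print(top_scorers)
print(best_economy)
```

Economy rate is ranked with `nsmallest` because a lower value is better. A real ranking would probably also need a minimum number of balls bowled, so that a player with only a few overs does not top the list.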
The implementation relies on Python's powerful data analysis libraries. The pandas library is
used for efficient data processing and manipulation, while matplotlib provides capabilities for
creating clear, insightful visualizations. The tool also employs techniques for handling and
validating datasets, ensuring robust and accurate analysis.
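The validation and missing-value handling mentioned above might look like the sketch below. The helper names `missing_columns` and `safe_int` and the sample frame are hypothetical, introduced only for illustration; the required column names are the ones used for the deliveries dataset.

```python
import pandas as pd

REQUIRED_DELIVERY_COLUMNS = {
    "match_id", "batter", "bowler", "batsman_runs", "dismissal_kind"
}


def missing_columns(df: pd.DataFrame, required: set) -> set:
    """Return the required column names that the DataFrame lacks."""
    return set(required) - set(df.columns)


def safe_int(value, default: int = 0) -> int:
    """Convert a possibly missing (NaN) value to int, with a fallback."""
    return default if pd.isna(value) else int(value)


# Hypothetical frame that lacks the 'dismissal_kind' column.
sample = pd.DataFrame({
    "match_id": [1], "batter": ["A"], "bowler": ["B"], "batsman_runs": [4]
})
missing = missing_columns(sample, REQUIRED_DELIVERY_COLUMNS)
print(missing)
print(safe_int(float("nan")), safe_int(12.0))
```

Checking columns once, right after loading, lets the tool stop with a clear message instead of raising a `KeyError` in the middle of an analysis.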
By automating complex statistical operations and offering intuitive visual representations, this
project streamlines the exploration of IPL data, making it an invaluable resource for anyone
interested in the intricacies of cricket analytics. Whether you're analyzing past performances or
comparing players, this tool makes it effortless to derive meaningful insights from IPL statistics.
DATA SOURCE
CSV file names: deliveries.csv, IPL Player Stat.csv, matches.csv
IMPLEMENTATION
SOURCE CODE
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from typing import Dict, List


class IPLStatsAnalyzer:
    def __init__(self, stats_file: str, matches_file: str, deliveries_file: str):
        """Initialize the IPL Stats Analyzer with the datasets"""
        self.stats = pd.read_csv(stats_file)
        self.matches = pd.read_csv(matches_file)
        self.deliveries = pd.read_csv(deliveries_file)
        self._verify_columns()

    def _verify_columns(self):
        """Verify that all required columns are present"""
        required_columns_stats = {
            'player', 'runs', 'boundaries', 'balls_faced', 'wickets',
            'balls_bowled', 'runs_conceded', 'matches', 'batting_avg',
            'batting_strike_rate', 'bowling_economy', 'bowling_avg',
            'bowling_strike_rate', 'catches', 'stumpings'
        }
        required_columns_matches = {'id', 'season', 'winner', 'team1', 'team2'}
        required_columns_deliveries = {'match_id', 'batter', 'bowler',
                                       'batsman_runs', 'dismissal_kind'}

        # Required columns that are absent from each loaded dataset
        missing_stats = required_columns_stats - set(self.stats.columns)
        missing_matches = required_columns_matches - set(self.matches.columns)
        missing_deliveries = required_columns_deliveries - set(self.deliveries.columns)

        if missing_stats:
            raise ValueError(f"Missing columns in stats dataset: {missing_stats}")
        if missing_matches:
            raise ValueError(f"Missing columns in matches dataset: {missing_matches}")
        if missing_deliveries:
            raise ValueError(f"Missing columns in deliveries dataset: {missing_deliveries}")
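
    def get_player_stats(self, player_name: str) -> Dict:
        """Get the statistics for a single player"""
        # The lookup and the error result are reconstructed (assumed): the
        # listing omits them. An exact match on the player name is assumed.
        player_data = self.stats[self.stats['player'] == player_name]
        if len(player_data) == 0:
            return {'error': f"No data found for player: {player_name}"}

        stats_dict = player_data.iloc[0].to_dict()
        # Missing bowling and fielding figures are reported as 0
        return {
            'name': stats_dict['player'],
            'matches': int(stats_dict['matches']),
            'runs': int(stats_dict['runs']),
            'batting_avg': round(stats_dict['batting_avg'], 2),
            'batting_strike_rate': round(stats_dict['batting_strike_rate'], 2),
            'boundaries': int(stats_dict['boundaries']),
            'wickets': int(stats_dict['wickets'])
            if not pd.isna(stats_dict['wickets']) else 0,
            'bowling_economy': round(stats_dict['bowling_economy'], 2)
            if not pd.isna(stats_dict['bowling_economy']) else 0,
            'catches': int(stats_dict['catches'])
            if not pd.isna(stats_dict['catches']) else 0,
            'stumpings': int(stats_dict['stumpings'])
            if not pd.isna(stats_dict['stumpings']) else 0
        }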
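
    def compare_players(self, player_names: List[str]):
        """Compare the batting averages of several players"""
        # The method name, the filter and the figure setup are reconstructed
        # (assumed): the listing starts at the empty-result check.
        players_data = self.stats[self.stats['player'].isin(player_names)]
        if len(players_data) == 0:
            print("No data found for the specified players.")
            return

        fig, ax1 = plt.subplots(figsize=(10, 6))

        # Batting Stats
        players_data.plot(kind='bar', x='player', y='batting_avg', ax=ax1,
                          color='skyblue')
        ax1.set_title('Batting Average Comparison')
        ax1.tick_params(axis='x', rotation=45)

        plt.tight_layout()
        plt.show()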
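
    def plot_batsman_performance(self, batsman_name: str):
        """Plot a batsman's runs in each match"""
        # The method name and the filter are reconstructed (assumed): the
        # listing omits them.
        batsman_data = self.deliveries[self.deliveries['batter'] == batsman_name]
        batsman_grouped = batsman_data.groupby('match_id')['batsman_runs'].sum()

        plt.figure(figsize=(12, 6))
        batsman_grouped.plot(kind='bar', color='blue', alpha=0.7)
        plt.title(f"{batsman_name}'s Runs per Match")
        plt.xlabel("Match ID")
        plt.ylabel("Runs")
        plt.show()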
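
    def plot_bowler_performance(self, bowler_name: str):
        """Plot the dismissals off a bowler's deliveries in each match"""
        # The method name and the filter are reconstructed (assumed): the
        # listing omits them.
        bowler_data = self.deliveries[self.deliveries['bowler'] == bowler_name]
        # Counts every delivery with a recorded dismissal, including run outs
        dismissals = bowler_data['dismissal_kind'].notna().groupby(
            bowler_data['match_id']).sum()

        plt.figure(figsize=(12, 6))
        dismissals.plot(kind='bar', color='green', alpha=0.7)
        plt.title(f"{bowler_name}'s Dismissals per Match")
        plt.xlabel("Match ID")
        plt.ylabel("Dismissals")
        plt.show()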

    def plot_top_run_scorers_by_season(self):
        """Analyze top run-scorers for each season"""
        # Attach the season to every delivery
        merged_data = pd.merge(self.deliveries, self.matches,
                               left_on='match_id', right_on='id')
        season_runs = merged_data.groupby(
            ['season', 'batter'])['batsman_runs'].sum().reset_index()
        # Keep the highest-scoring row of each season. idxmax is used instead of
        # groupby().apply(), which newer pandas versions handle differently.
        top_rows = season_runs.groupby('season')['batsman_runs'].idxmax()
        top_scorers = season_runs.loc[top_rows].reset_index(drop=True)

        plt.figure(figsize=(12, 6))
        # One bar per season, labelled with that season's top scorer
        for season in top_scorers['season'].unique():
            season_data = top_scorers[top_scorers['season'] == season]
            plt.bar(season_data['season'], season_data['batsman_runs'],
                    label=season_data['batter'].values[0])
        plt.legend(title="Top Scorers")
        plt.title("Top Run Scorers by Season")
        plt.xlabel("Season")
        plt.ylabel("Runs")
        plt.show()


def main():
    try:
        analyzer = IPLStatsAnalyzer('IPL Player Stat.csv', 'matches.csv',
                                    'deliveries.csv')
        while True:
            print("\n=== IPL Stats Analysis Tool ===")
            print("1. Player Statistics")
            print("2. Top Players by Category")
            print("3. Compare Players")
            print("4. Batsman Performance")
            print("5. Bowler Performance")
            print("6. Top Run Scorers by Season")
            print("7. Exit")

            # The input line is missing from the listing; the prompt text is assumed.
            choice = input("\nEnter your choice (1-7): ").strip()

            if choice == '1':
                player = input("Enter player name: ")
                stats = analyzer.get_player_stats(player)
                if "error" in stats:
                    print(f"\n{stats['error']}")
                else:
                    print(f"\nStatistics for {stats['name']}:")
                    print(f"Matches Played: {stats['matches']}")
                    print(f"Runs Scored: {stats['runs']}")
                    print(f"Batting Average: {stats['batting_avg']}")
                    print(f"Strike Rate: {stats['batting_strike_rate']}")
                    print(f"Boundaries: {stats['boundaries']}")
                    # Bowling figures are shown only for players with wickets
                    if stats['wickets'] > 0:
                        print(f"Wickets: {stats['wickets']}")
                        print(f"Bowling Economy: {stats['bowling_economy']}")
                    print(f"Catches: {stats['catches']}")
                    print(f"Stumpings: {stats['stumpings']}")
            # The handlers for options 2 to 5 are not reproduced in this listing.
            elif choice == '6':
                analyzer.plot_top_run_scorers_by_season()
            elif choice == '7':
                break
            else:
                print("Invalid choice. Please try again.")
    except Exception as e:
        print(f"An error occurred: {e}")


if __name__ == "__main__":
    main()
SAMPLE OUTPUTS
Main Menu
Player Statistics
Bowler Performance
Top Scorers by season:
BIBLIOGRAPHY
Informatics Practices Text Book (NCERT)
Informatics Practices by Sumita Arora
docs.python.org