Cricket Player Statistics Analysis
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
BELAGAVI-590 018
Mini-Project Report on
“Cricket Player Statistics Analysis”
Submitted in partial fulfillment of the requirements for the degree of
Master of Computer Applications
of Visvesvaraya Technological University, Belagavi
by
Student Name : Lakshmi B
USN : 1RN20MC026
Under the guidance of
Mrs. Roopa H M
Assistant Professor
Department of MCA
Estd : 2001
Department of Master of Computer Applications
RNS INSTITUTE OF TECHNOLOGY
Dr. Vishnuvardhan Road, Channasandra, Bengaluru – 560 098
2022
RNS INSTITUTE OF TECHNOLOGY
Dr. Vishnuvardhan Road, Channasandra, Bengaluru – 560 098
Department of Master of Computer Applications
Estd: 2001
CERTIFICATE
This is to certify that the Mini-Project work entitled “Cricket Player Statistics Analysis”
has been successfully carried out by Lakshmi B, bearing USN 1RN20MC026, a bona fide
student of RNS Institute of Technology, in partial fulfillment of the requirements for the award
of the degree of Master of Computer Applications of Visvesvaraya Technological
University, Belagavi, during the year 2021-22. It is certified that all corrections/suggestions
indicated for internal assessment have been incorporated in this report. The Mini-Project report
has been approved as it satisfies the academic requirements for the said degree.
_____________________ __________________
Mrs. Roopa H M Dr. N P Kavya
Project Coordinator Head of Department
Department of MCA Department of MCA
RNSIT, Bengaluru. RNSIT, Bengaluru.
External Viva
Name of Examiners Signature with Date
1.
2.
DECLARATION
I, Lakshmi B, student of 3rd MCA, RNS Institute of Technology, bearing USN:
1RN20MC026, hereby declare that the project entitled “Cricket Player Statistics
Analysis” has been carried out by me under the supervision of Project Coordinator Mrs.
Roopa H M, Assistant Professor, Department of MCA and submitted in partial fulfillment of
the requirements for the award of the Degree of Master of Computer Applications by the
Visvesvaraya Technological University during the academic year 2021-22. This report has
not been submitted to any other Organization/University for any award of degree or
certificate.
Name: Lakshmi B
USN: 1RN20MC026
Signature of the candidate
ACKNOWLEDGEMENT
The successful completion of Mini-Project work depends on the co-operation and help of
many people, other than those who directly execute the work. I take this opportunity to
acknowledge the valuable assistance and cooperation received from many sources.
Our institution has played a paramount role in guiding us in the right direction. I would like to
profoundly thank the Management of RNS Institute of Technology for providing such a
healthy environment for the successful completion of this project work.
I express my sincere words of gratitude to our Chairman Sri Dr. R N Shetty, for creating
an academic environment to brighten our career.
I would also like to thank our beloved Principal, Dr. M K Venkatesha, for providing the
necessary facilities to carry out this work.
I am extremely grateful to our beloved HoD, Dr. N P Kavya, for guiding me in the
right direction with all her wisdom.
I would also like to express my heartfelt thanks to our Project Coordinator, Mrs. Roopa H M,
Assistant Professor, Department of MCA, for her constant guidance and devoted support.
Name: Lakshmi B
USN: 1RN20MC026
ABSTRACT
In this project, we analyse a cricketer’s career data using a bulk data set.
We analyse match and player statistics using Python data analysis.
Cricket is gaining a lot of attention across the world. It is growing
rapidly to become one of the biggest business and entertainment providers in the
world. As the seasons go on, the data in the domain grows rapidly. We need
to keep track of this data for future analysis, so it is important to record each
match and each player’s data on a daily basis.
Data analytics is one of the most important tasks in every area of today’s
world, and the same holds in this field. It is useful for analysing the career of an
individual, a team, or a match, which also helps with future assumptions and
predictions. We achieve this by meeting the requirements specified below.
TABLE OF CONTENTS
Chapter Name Page No
Declaration i
Acknowledgement ii
Abstract iii
Table of contents iv
List of Figures vi
CHAPTERS
1. INTRODUCTION 08
1.1. Project Overview 08
1.2. Data Collection 09
2. LITERATURE SURVEY 09
2.1. Library/Module Requirements 10
2.2. Hardware & Software Requirements 10
2.3. Tools/ Languages/ Platform 10
3. DATA CLEANING AND WRANGLING MECHANISMS 10
4. DATA ANALYSIS AND VISUALIZATION 11
5. CONCLUSION 16
REFERENCES 16
LIST OF FIGURES
Figure No. Name Page No.
Fig. 1.1 Data Collection 09
Fig. 4.1 No. of Matches against opposition 11
Fig. 4.2 Runs scored against different oppositions 12
Fig. 4.3 Average against major teams 13
Fig. 4.4 Matches played by year 14
Fig. 4.5 Runs scored by year 14
Fig. 4.6 Career average progression by innings 15
1. INTRODUCTION
1.1 Project Overview
In this project, we use Python to analyze the performance of Indian cricketer
MS Dhoni across his One Day International (ODI) and T20 career.
Cricket, or the gentleman’s game, is a very old, widespread, and uncomplicated
pastime. The sport originated in southeast England in the late 16th century. It became
the country’s national sport in the 18th century, developed globally in the 19th and 20th
centuries, and remains one of the most popular games in the world today. The International
Cricket Council (ICC) Cricket World Cup, a One-Day International (ODI) tournament, is the
flagship event of the international cricket calendar and takes place every four years with
matches contested in a 50-over format. It is the biggest cricketing tournament and one of
the world’s most viewed sporting events. The Indian Premier League (IPL), a tournament in
India with matches contested in a 20-over format, is the most watched cricket league in the
world. Cricket is a game of uncertainty. The outcome cannot be predicted with certainty
until the last moment of the game, even though the possible results are known to all; an
appropriate probability model can therefore be applied to predict the result.
1.2 Data Collection
If you are familiar with web scraping, you can scrape the data from ESPNcricinfo.
If you are not, the data is also available as an Excel file.
Once you have the dataset, you will need to load it in Python, for example with the
pandas library.
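A minimal sketch of the loading step is given here. The column names and the three rows are made-up stand-ins (assumptions, not the actual dataset), read from an in-memory buffer so that the example runs on its own; with the real data, a file path would be passed instead.

```python
import io

import pandas as pd

# Made-up stand-in for the real file; column names and rows are assumptions.
sample_csv = io.StringIO(
    "score,runs_scored,balls_faced,opposition,date\n"
    "15,15,20,v Sri Lanka,2005-01-10\n"
    "40*,40,35,v Australia,2006-03-02\n"
    "DNB,-,-,v England,2007-08-21\n"
)

# With the real data, pass a path instead of the buffer, for example
# pd.read_csv("<path>.csv") or pd.read_excel("<path>.xlsx").
df = pd.read_csv(sample_csv, parse_dates=["date"])

print(df.head())   # first rows of the dataset
print(df.tail())   # last rows of the dataset
print(df.dtypes)   # check that 'date' was parsed as a datetime column
```

Parsing the date column at load time is a design choice that saves a separate conversion later; it can be left out if the dates need custom handling.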
Once the dataset has been read, we should look at the head and tail of the dataset to make
sure it is imported correctly. The head of the dataset should look like this:
Figure 1.1 : Data Collection
2. LITERATURE SURVEY
2.1 Library/Module Requirements
This project requires the following Python libraries and modules: pandas, NumPy,
Matplotlib, and Seaborn.
Pandas: pandas is a Python package providing fast, flexible, and expressive data
structures designed to make working with “relational” or “labeled” data both easy and
intuitive. It aims to be the fundamental high-level building block for doing practical, real-
world data analysis in Python. Additionally, it has the broader goal of becoming the most
powerful and flexible open source data analysis and manipulation tool available in any language.
NumPy: It is a Python library that provides a multidimensional array
object, various derived objects (such as masked arrays and matrices), and an
assortment of routines for fast operations on arrays, including mathematical,
logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear
algebra, basic statistical operations, random simulation and much more. At the core of the
NumPy package is the ndarray object. This encapsulates n-dimensional arrays of
homogeneous data types, with many operations being performed in compiled code for
performance. There are several important differences between NumPy arrays and the
standard Python sequences: for example, an array has a fixed size at creation, and all of
its elements share one data type.
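A short sketch of how NumPy arrays behave differently from Python lists. The variable names and values are illustrative only.

```python
import numpy as np

runs_list = [10, 20, 30]
runs_array = np.array(runs_list)

# '+' concatenates two lists but adds two arrays element by element.
doubled_list = runs_list + runs_list        # [10, 20, 30, 10, 20, 30]
doubled_array = runs_array + runs_array     # array([20, 40, 60])

# An array holds a single data type; mixed input is upcast to a common type.
mixed = np.array([1, 2.5])
print(mixed.dtype)                          # float64

# Reductions such as the mean run in compiled code, not in a Python loop.
print(runs_array.mean())                    # 20.0
```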
Matplotlib: Matplotlib is one of the most popular Python packages used for data
visualization. It is a cross-platform library for making 2D plots from data in arrays. It provides
an object-oriented API that helps in embedding plots in applications, and it can be used in
Python and IPython shells, Jupyter notebooks, and web application servers.
Seaborn: Seaborn is an open-source Python library built on top of matplotlib. It is
used for data visualization and exploratory data analysis. Seaborn works easily with
dataframes and the Pandas library. The graphs created can also be customized easily.
2.2 Hardware and Software Requirements
Hardware requirements:
Processor: i3 or higher
RAM: 4 GB or more
Input Devices: Keyboard, mouse
Hard Disk: 500 GB or more
Software requirements:
Windows 7 or higher
2.3 Tools/Language/Platform
Python Language
Jupyter Platform
3. DATA CLEANING MECHANISMS
Data cleansing matters because, over time, accumulated information can become
overwhelming. It can be difficult to find the most recent paperwork, and you may
have to wade through dozens of old files before you find the right one.
Disorganization can lead to stress and even lost documents. Data cleansing ensures that you
keep only the most recent files and important documents, so you can find them with ease
when you need them. It also helps ensure that you do not keep significant amounts of
personal information on your computer, which can be a security risk.
This data has been taken from a webpage, so it is not very clean. We will start by
removing the first 2 characters from the opposition string because they are not required.
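A minimal sketch of this step, assuming the column is named `opposition` and that each entry starts with a two-character prefix such as `v ` (both are assumptions about the scraped data):

```python
import pandas as pd

# Illustrative values; the 'v ' prefix is an assumption about the scraped format.
df = pd.DataFrame({"opposition": ["v Sri Lanka", "v Australia", "v England"]})

# Slice off the first two characters of every entry.
df["opposition"] = df["opposition"].str[2:]

print(df["opposition"].tolist())  # ['Sri Lanka', 'Australia', 'England']
```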
Next, we will create a column for the year in which the match was played. Please make sure
that the date column is stored in datetime format in your DataFrame. If not, please
use pd.to_datetime() to convert it to datetime format.
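A minimal sketch of the conversion and the new column, assuming columns named `date` and `year` (hypothetical names) and made-up dates:

```python
import pandas as pd

# Made-up dates stored as plain strings, as they might arrive from a scrape.
df = pd.DataFrame({"date": ["2004-12-23", "2005-04-05", "2009-08-14"]})

# Convert to datetime first; the .dt accessor only works on datetime columns.
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year

print(df)
```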
We will also create a column indicating whether Dhoni was not out in that innings or not.
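A minimal sketch of the not-out indicator. It assumes a `score` column in which an unbeaten innings carries a trailing asterisk, as on a standard scorecard; the column names and values are illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative scores; 'DNB' stands for an innings in which the player did not bat.
df = pd.DataFrame({"score": ["45", "72*", "DNB", "10*"]})

# 1 if the score ends with '*', otherwise 0.
df["not_out"] = np.where(df["score"].astype(str).str.endswith("*"), 1, 0)

print(df)
```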
We will also drop all those matches from our records where Dhoni did not bat, and store
this information in a new DataFrame.
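A minimal sketch of the filtering step. The markers `DNB` and `TDNB` for innings without a bat are assumptions about how the source labels them, and `df_new` is a hypothetical name for the new DataFrame.

```python
import pandas as pd

# Illustrative rows; 'DNB' / 'TDNB' are assumed markers for "did not bat".
df = pd.DataFrame({
    "score": ["45", "DNB", "72*", "TDNB"],
    "runs_scored": ["45", "-", "72", "-"],
})

# Keep only the innings in which the player actually batted.
did_not_bat = df["score"].isin(["DNB", "TDNB"])
df_new = df.loc[~did_not_bat].copy()   # .copy() avoids modifying a view of df later

print(df_new)
```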
Finally, we will fix the data types of all the columns present in our new DataFrame.
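A minimal sketch of the type conversion, assuming numeric columns named `runs_scored`, `balls_faced`, and `strike_rate` that were read in as text (names and values are illustrative):

```python
import pandas as pd

# Illustrative values stored as strings, as is common after scraping.
df_new = pd.DataFrame({
    "runs_scored": ["45", "72", "3"],
    "balls_faced": ["50", "60", "7"],
    "strike_rate": ["90.00", "120.00", "42.85"],
})

# Convert each column to the numeric type it should have.
df_new = df_new.astype({"runs_scored": int, "balls_faced": int, "strike_rate": float})

print(df_new.dtypes)
```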
4. DATA ANALYSIS AND VISUALIZATION
Firstly, we will look at how many matches he has played against different oppositions. You
can use the following piece of code for this purpose:
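A minimal sketch of counting and plotting matches per opposition. The rows are made up, the `opposition` column name is an assumption, and the headless backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: one row per match.
df = pd.DataFrame({
    "opposition": ["Sri Lanka", "Australia", "Sri Lanka",
                   "England", "Sri Lanka", "Australia"],
})

# Count the rows for each opposition, largest count first.
matches_per_team = df["opposition"].value_counts()

ax = matches_per_team.plot(kind="bar", title="Number of matches against each opposition")
ax.set_xlabel("Opposition")
ax.set_ylabel("Matches")
plt.tight_layout()
```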
The output should look like this:
Figure 4.1: No of Matches against opposition
We can see that he has played the majority of his matches against Sri Lanka, Australia,
England, West Indies, South Africa, and Pakistan.
Let us look at how many runs he has scored against different oppositions. You can use the
following code snippet to generate the result:
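A minimal sketch of the aggregation, assuming columns named `opposition` and `runs_scored` and using made-up values:

```python
import pandas as pd

# Made-up innings data.
df_new = pd.DataFrame({
    "opposition": ["Sri Lanka", "Australia", "Sri Lanka", "England"],
    "runs_scored": [50, 30, 70, 10],
})

# Total runs per opposition, highest total first.
runs_per_team = (
    df_new.groupby("opposition")["runs_scored"]
    .sum()
    .sort_values(ascending=False)
)

print(runs_per_team)
```

Calling `runs_per_team.plot(kind="bar")` on the result would draw the corresponding bar chart.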
The output will look like this:
Figure 4.2 : Runs scored against different oppositions
We can see that Dhoni has scored the most runs against Sri Lanka, followed by Australia,
England, and Pakistan. He has also played a lot of matches against these teams, so it makes
sense.
To get a clearer picture, let us look at his batting average against each team. The following
piece of code will help us with getting the desired result:
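A minimal sketch of the calculation. A batting average is runs scored divided by the number of dismissals, that is, innings minus not-outs. The column names, the team list, and the values are assumptions for illustration.

```python
import pandas as pd

# Made-up innings data; not_out is 1 for an unbeaten innings.
df_new = pd.DataFrame({
    "opposition": ["Australia", "Australia", "Australia", "England", "England"],
    "runs_scored": [40, 60, 20, 30, 50],
    "not_out": [0, 1, 0, 0, 0],
})

major_teams = ["Australia", "England"]  # hypothetical selection
subset = df_new[df_new["opposition"].isin(major_teams)]

# Runs, innings, and not-outs per team in a single pass.
grouped = subset.groupby("opposition").agg(
    runs=("runs_scored", "sum"),
    innings=("runs_scored", "count"),
    not_outs=("not_out", "sum"),
)

# Average = runs / dismissals, where dismissals = innings - not-outs.
grouped["average"] = grouped["runs"] / (grouped["innings"] - grouped["not_outs"])

print(grouped)
```

With these values, Australia comes to 120 / (3 - 1) = 60.0 and England to 80 / 2 = 40.0.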
For generating the plot, use the code snippet below:
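A minimal plotting sketch. The averages and the reference line are made-up numbers used only to show the chart layout; they are not results from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up averages per team, for illustration only.
averages = pd.Series({"Australia": 60.0, "England": 40.0, "South Africa": 25.0})
career_average = 45.0  # hypothetical reference value

fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(averages.index, averages.values)

# Dashed horizontal line to compare each team against the overall average.
ax.axhline(career_average, linestyle="--", color="grey", label="Career average")
ax.set_xlabel("Opposition")
ax.set_ylabel("Batting average")
ax.set_title("Average against major teams")
ax.legend()
fig.tight_layout()
```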
The output will look like this:
Figure 4.3 : Average against major teams
As we can see, Dhoni has performed remarkably against tough teams like Australia, England,
and Sri Lanka. His average against these teams is either close to his career average, or
slightly higher. The only team against whom he has not performed well is South Africa.
Let us now look at his year-on-year statistics. We will start by looking at how many matches
he has played each year after his debut. The code for that will be:
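A minimal sketch of the yearly count, assuming a `year` column like the one created during cleaning; the values are made up.

```python
import pandas as pd

# Made-up data: one row per match, with the year it was played in.
df = pd.DataFrame({"year": [2005, 2005, 2006, 2007, 2007, 2007]})

# Count matches per year and order the result chronologically.
matches_per_year = df["year"].value_counts().sort_index()

print(matches_per_year)
```

Sorting by index keeps the years in chronological order, which is what a year-on-year bar chart such as `matches_per_year.plot(kind="bar")` needs.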
The plot will look like this:
Figure 4.4 : Matches played by year
We can see that in 2012, 2014, and 2016, Dhoni played very few ODI matches for India.
Overall, after 2005-2009, the average number of matches he played reduced slightly.
We should also look at how many runs he has scored every year. The code for that will be:
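A minimal sketch of the yearly totals, assuming columns named `year` and `runs_scored` and using made-up values:

```python
import pandas as pd

# Made-up innings data.
df_new = pd.DataFrame({
    "year": [2007, 2007, 2008, 2009, 2009],
    "runs_scored": [40, 60, 35, 80, 20],
})

# Total runs per calendar year; groupby sorts the years in ascending order.
runs_per_year = df_new.groupby("year")["runs_scored"].sum()

print(runs_per_year)
```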
The output should look like this:
Figure 4.5 : Runs scored by year
It can be clearly seen that Dhoni scored the most runs in the year 2009, followed by 2007
and 2008. The number of runs started reducing post-2010 (because the number of matches
played also started reducing).
Finally, let’s look at his career batting average progression by innings. This is time-series
data and has been plotted on a line plot. The code for that will be:
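A minimal sketch of the running average. It assumes the rows are already in chronological order and that columns `runs_scored` and `not_out` exist; the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up innings in chronological order; not_out is 1 for an unbeaten innings.
df_new = pd.DataFrame({
    "runs_scored": [10, 50, 30, 40],
    "not_out": [0, 1, 0, 0],
})

# Running totals of runs and dismissals after each innings.
cumulative_runs = df_new["runs_scored"].cumsum()      # 10, 60, 90, 130
cumulative_outs = (1 - df_new["not_out"]).cumsum()    # 1, 1, 2, 3

# The average is undefined before the first dismissal, so mask zeros as NaN.
cumulative_outs = cumulative_outs.replace(0, np.nan)

df_new["career_average"] = cumulative_runs / cumulative_outs

print(df_new)
```

After the third innings in this example the average is 90 / 2 = 45.0, because the unbeaten second innings adds runs but no dismissal.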
The code snippet for the plot will be:
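A minimal line-plot sketch. The values below are made up to show the layout only and are not results from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line inside Jupyter
import matplotlib.pyplot as plt

# Made-up running averages, one value per innings.
innings_number = [1, 2, 3, 4, 5]
career_average = [10.0, 60.0, 45.0, 43.3, 48.0]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(innings_number, career_average)
ax.set_xlabel("Innings")
ax.set_ylabel("Career batting average")
ax.set_title("Career average progression by innings")
fig.tight_layout()
```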
The output plot will look like this:
Figure 4.6 : Career average progression by innings
5. CONCLUSION
Here, we have studied the performance of cricket players in both IPL season 9 (2016) and
the ICC World Cup 2015, following the approach of Sharma (2013). The statistical technique of
factor analysis has been employed to explore the interrelationships among the various
dimensions of batting and bowling in 20- and 50-over cricket matches. It has been applied
through PCA to assess item validity and to group items into meaningful clusters. In both
the 20- and 50-over formats, five dimensions are grouped into factor 1 (batting), while
three dimensions are grouped into factor 2 (bowling). The variance explained by factor 1
(batting) is much higher than the variance explained by factor 2 (bowling). We therefore
conclude that batting capability dominates bowling capability, which supports the findings
of Sharma (2013).
REFERENCES:
https://www.analyticsvidhya.com/blog/2021/06/analyze-cricket-data-with-python-a-hands-on-guide/#h2
Bailey, M.J. & Clarke, S.R.: Market inefficiencies in player head to head betting on the 2003
cricket world cup. In Economics, Management and Optimization in Sport, S. Butenko, J. Gil-
Lafuente & P.M. Pardalos, editors, Springer-Verlag, Heidelberg, pp. 185-202 (2004).
Barr, G.D.I. and Kantor, B.S.: A criterion for comparing and selecting batsmen in limited overs
cricket, Journal of the Operational Research Society, 55, pp. 1266-1274 (2004).