Frequency Distribution S AutoRecovered

This assignment presents a statistical analysis of a T20 cricket match between Bangladesh and India held on October 6, 2024, focusing on various statistical tools to interpret player performance and match trends. Key analyses include frequency distribution, measures of central tendency, dispersion, skewness, kurtosis, correlation, and regression analysis of runs scored and balls faced. The findings reveal insights into scoring patterns and relationships between variables, highlighting the importance of statistics in sports analytics.

Uploaded by

Arka Sarker
Copyright: © All Rights Reserved

Assignment

on

Statistical Analysis of a T20 Cricket Match

Submitted to:

Mahbub Parvez
Associate Professor
Department of Business Administration
Faculty of Business and Entrepreneurship
Daffodil International University

Submitted by:

Sanjida Islam Supti (0242310004281045)
Tasnim Jahan Jim (0242310004281037)
Muhammad Safinan Sabri (0242310004281024)
Dilwar Hussen Rafid (0242310004281034)
Arka Sarker (0242310004281026)
Department of THM
Faculty of Business and Entrepreneurship
Daffodil International University

Date: 13.04.2025


Topic: Statistical Analysis of Bangladesh vs. India T20 Match (October 6, 2024)

Venue: Gwalior International Stadium

Introduction

Statistics plays a vital role in sports analytics, especially in cricket, where data can reveal player
performance, match trends, and strategic insights. In this assignment, we explore the statistical analysis
of a T20 cricket match between Bangladesh and India held on October 6, 2024. We use various statistical
tools to interpret the runs scored by Bangladesh in relation to the number of balls faced. This analysis
includes frequency distribution, measures of central tendency, dispersion, skewness, kurtosis,
correlation, and regression.

Frequency distribution

A frequency distribution is a table or graph that displays the frequency of various outcomes in a sample.
It shows how data is distributed across different intervals or categories, helping to understand the
pattern of data spread. In cricket, it helps represent how often specific run values occur, giving insight
into scoring trends.

Data: 8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39

Highest value (H): 39

Lowest value (L): 1

Number of classes (k): the smallest k such that $2^k > n$

With $n = 15$ and $k = 4$: $2^4 = 16 > 15$

Class interval: $i = \frac{H - L}{k} = \frac{39 - 1}{4} = 9.5 \approx 10$

Runs     Tally    Frequency (f)   Midpoint (x)   f*x     Cumulative frequency
1-10     |||||    5               5.5            27.5    5
10-20    |||||    5               15             75      10
20-30    |||      3               25             75      13
30-40    ||       2               35             70      15
Total             ∑f = n = 15                    ∑fx = 247.5
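The construction of the frequency table can be sketched in Python. This is a minimal illustration, not the authors' procedure: the variable names are chosen here, the class width is rounded up to a multiple of 10 as an assumption, and a score equal to an upper class limit is assumed to fall in the next class (no such score occurs in this data).

```python
import math

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]
n = len(runs)

# Smallest k with 2**k > n (the rule used to choose the number of classes).
k = 1
while 2 ** k <= n:
    k += 1

# Class interval: (highest - lowest) / k, then rounded up to a multiple of 10
# (the rounding step is an assumption made for convenient class limits).
raw_width = (max(runs) - min(runs)) / k
width = math.ceil(raw_width / 10) * 10

# Classes labelled 1-10, 10-20, 20-30, 30-40. The first edge is 0 here,
# which changes nothing because no score is below 1.
edges = [0, 10, 20, 30, 40]
frequencies = [sum(lo <= r < hi for r in runs) for lo, hi in zip(edges, edges[1:])]

# Running total of the frequencies gives the cumulative frequency column.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

print(k, raw_width, width)
print(frequencies)
print(cumulative)
```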
Measures of Central Tendency

Mean, Median, Mode

Mean: The arithmetic average of a set of numbers. It is calculated by dividing the total sum of the values
by the number of values.

Median: The middle value of a data set when the values are arranged in ascending or descending order.
If the number of observations is even, the median is the average of the two central numbers.

Mode: The value that appears most frequently in a data set. A data set may have one mode, more than
one mode, or no mode at all.

Data - 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39

Mean
Runs     Midpoint (x)   Frequency (f)   f*x
1-10     5.5            5               27.5
10-20    15             5               75
20-30    25             3               75
30-40    35             2               70
Total                   ∑f = n = 15     ∑fx = 247.5

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{247.5}{15} = 16.5$
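As a cross-check, the grouped mean can be recomputed from the class limits. The sketch below assumes each midpoint is the average of the lower and upper class limit as labelled in the table; the names are illustrative.

```python
# Class limits as labelled in the table; midpoint = (lower + upper) / 2.
classes = [(1, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 5, 3, 2]

midpoints = [(lo + hi) / 2 for lo, hi in classes]
fx = [f * x for f, x in zip(frequencies, midpoints)]

# Grouped mean: sum of f*x divided by the total frequency.
grouped_mean = sum(fx) / sum(frequencies)

print(midpoints)
print(fx, sum(fx))
print(grouped_mean)
```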

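Median
Runs     Frequency (f)   Cumulative frequency
1-10     5               5
10-20    5               10
20-30    3               13
30-40    2               15

$\frac{n}{2} = \frac{15}{2} = 7.5$, so the median class is 10-20, with L = 10 (lower limit), cf = 5 (cumulative frequency before the median class), f = 5 (frequency of the median class) and c = 10 (class width).

$\text{Median} = L + \frac{\frac{n}{2} - cf}{f} \times c = 10 + \frac{7.5 - 5}{5} \times 10 = 15$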

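Mode
Runs     Frequency
1-10     5
10-20    5
20-30    3
30-40    2

$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c = 10 + \frac{0}{0 + 2} \times 10 = 10$

where

L = 10 (lower limit of the modal class 10-20)

∆1 = 5 - 5 = 0 (modal class frequency minus the preceding class frequency)

∆2 = 5 - 3 = 2 (modal class frequency minus the following class frequency)

c = 10 (class width)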
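The grouped median and mode formulas can be checked with a short Python sketch. It assumes the class 10-20 is taken as the modal class, as in the worked example (the classes 1-10 and 10-20 tie on frequency); variable names are illustrative.

```python
lower_limits = [1, 10, 20, 30]   # lower class limits as labelled in the table
frequencies = [5, 5, 3, 2]
c = 10                           # class width
n = sum(frequencies)

# Median class: the first class whose cumulative frequency reaches n / 2.
cum = 0
for i, f in enumerate(frequencies):
    if cum + f >= n / 2:
        median = lower_limits[i] + (n / 2 - cum) / f * c
        break
    cum += f

# Modal class: index 1 (10-20), following the worked example. This is an
# assumption, because the first two classes have the same frequency.
m = 1
d1 = frequencies[m] - frequencies[m - 1]
d2 = frequencies[m] - frequencies[m + 1]
mode = lower_limits[m] + d1 / (d1 + d2) * c

print(median, mode)
```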

Measures of Dispersion

A measure of dispersion is a statistical tool used to describe the spread or variability of a data set. It tells
how much the values in a dataset differ from the mean or median. Common measures include:

• Range
• Variance
• Standard Deviation
• Interquartile Range (IQR)

Dispersion helps in understanding consistency and variability in performance data, such as runs in
cricket.
Mean deviation

Data: 8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39

$\bar{x} = \frac{\sum x}{n} = \frac{248}{15} = 16.53$

x      x - x̄      |x - x̄|
8      -8.53      8.53
4      -12.53     12.53
27     10.47      10.47
12     -4.53      4.53
1      -15.53     15.53
8      -8.53      8.53
35     18.47      18.47
11     -5.53      5.53
12     -4.53      4.53
1      -15.53     15.53
29     12.47      12.47
16     -0.53      0.53
29     12.47      12.47
16     -0.53      0.53
39     22.47      22.47
                  ∑|x - x̄| = 152.65

$\text{M.D.} = \frac{\sum |x - \bar{x}|}{n} = \frac{152.65}{15} = 10.18$
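The mean deviation can be reproduced directly from the raw scores. A minimal sketch, with illustrative variable names:

```python
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]
n = len(runs)

mean = sum(runs) / n

# Mean deviation: average absolute distance of each score from the mean.
abs_deviations = [abs(x - mean) for x in runs]
mean_deviation = sum(abs_deviations) / n

print(round(mean, 2), round(mean_deviation, 2))  # 16.53 10.18
```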
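Population variance and standard deviation

Data: 8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39

x      x - μ      (x - μ)²
8      -8.53      72.75
4      -12.53     157.06
27     10.47      109.67
12     -4.53      20.52
1      -15.53     241.27
8      -8.53      72.75
35     18.47      341.18
11     -5.53      30.58
12     -4.53      20.52
1      -15.53     241.27
29     12.47      155.59
16     -0.53      0.28
29     12.47      155.59
16     -0.53      0.28
39     22.47      504.90
                  ∑(x - μ)² = 2123.23

$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{2123.23}{15} = 141.55$

$\sigma = \sqrt{141.55} = 11.90$

Sample variance and standard deviation

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{2123.23}{14} = 151.66$

$s = \sqrt{151.66} = 12.31$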
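Python's standard `statistics` module offers both the population and the sample versions of these measures, which makes the difference between dividing by N and by n - 1 easy to see. The sketch below keeps full precision, so its output can differ in the last digit from figures worked by hand with rounded deviations.

```python
import statistics

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

# Population versions divide the sum of squared deviations by N.
pop_var = statistics.pvariance(runs)
pop_sd = statistics.pstdev(runs)

# Sample versions divide by n - 1.
samp_var = statistics.variance(runs)
samp_sd = statistics.stdev(runs)

print(round(pop_var, 2), round(pop_sd, 2))
print(round(samp_var, 2), round(samp_sd, 2))
```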
Skewness & Kurtosis

Skewness: It indicates the asymmetry of a data distribution.

Positive skew (right-skewed): Tail is on the right; more lower values.

Negative skew (left-skewed): Tail is on the left; more higher values.

Zero skewness indicates a symmetrical distribution.

Kurtosis: It measures the "tailedness" or the peak of a distribution.

Leptokurtic (high kurtosis): Sharp peak, heavy tails.

Platykurtic (low kurtosis): Flatter peak, light tails.

Mesokurtic: Normal distribution.

Data - 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39

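Skewness:

Mean = $\frac{248}{15}$ = 16.53

Median: the sorted data are 1, 1, 4, 8, 8, 11, 12, 12, 16, 16, 27, 29, 29, 35, 39, and the middle (8th) value is 12.

Median = 12

Sample standard deviation:

$s = \sqrt{\frac{2123.23}{14}} = 12.31$

Pearson's coefficient of skewness:

$Sk = \frac{3(\text{mean} - \text{median})}{s} = \frac{3(16.53 - 12)}{12.31} = 1.10$

So, there is a moderate positive skewness in the data set.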
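The skewness coefficient used here, three times the gap between mean and median divided by the sample standard deviation, can be verified with a few lines of Python. This is a sketch with illustrative names, using the standard `statistics` module.

```python
import statistics

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

mean = statistics.mean(runs)
median = statistics.median(runs)   # middle value of the sorted scores
s = statistics.stdev(runs)         # sample standard deviation (n - 1)

# Pearson's median-based skewness coefficient.
sk = 3 * (mean - median) / s

print(median, round(sk, 2))  # 12 1.1
```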

Kurtosis

x      |x - x̄|    (x - x̄)²    (x - x̄)⁴
8      8.53       72.75       5294.01
4      12.53      157.06      24652.14
27     10.47      109.67      12020.93
12     4.53       20.52       421.07
1      15.53      241.27      58220.86
8      8.53       72.75       5294.01
35     18.47      341.18      116417.44
11     5.53       30.58       935.14
12     4.53       20.52       421.07
1      15.53      241.27      58220.86
29     12.47      155.59      24211.36
16     0.53       0.28        0.08
29     12.47      155.59      24211.36
16     0.53       0.28        0.08
39     22.47      504.90      254924.01
                              ∑(x - x̄)⁴ = 585244.42

$\mu_4 = \frac{\sum (x - \bar{x})^4}{N} = \frac{585244.42}{15} = 39016.29$

$\mu_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{2123.23}{15} = 141.55$

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{39016.29}{(141.55)^2} = 1.95$

Since $\beta_2 < 3$, the distribution is platykurtic.
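The moment-based kurtosis can be recomputed from the raw scores as a check on the column totals. The sketch below is illustrative: it uses unrounded deviations, so the last digits can differ from a hand calculation, and the classification threshold of 3 is the value for a normal distribution.

```python
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]
n = len(runs)
mean = sum(runs) / n

# Second and fourth central moments (population form, dividing by N).
mu2 = sum((x - mean) ** 2 for x in runs) / n
mu4 = sum((x - mean) ** 4 for x in runs) / n

beta2 = mu4 / mu2 ** 2

# Compare with 3, the kurtosis of a normal distribution.
if beta2 > 3:
    shape = "leptokurtic"
elif beta2 < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(round(mu2, 2), round(mu4, 2), round(beta2, 2), shape)
```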

Correlation Analysis

Correlation analysis is a statistical method used to measure the strength and direction of the
relationship between two variables. It tells whether and how strongly pairs of variables are related,
typically using a correlation coefficient (r) ranging from -1 to +1.

Data: (Runs) - 8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39

(Balls) - 9, 2, 25, 18, 2, 6, 32, 5, 13, 5, 19, 7, 14, 15, 16

Run (x)   Ball (y)   x - x̄     y - ȳ     (x - x̄)²   (y - ȳ)²   (x - x̄)(y - ȳ)
8         9          -8.53     -3.53     72.75      12.46      30.13
4         2          -12.53    -10.53    157.06     110.08     131.92
27        25         10.47     12.47     109.67     155.50     130.61
12        18         -4.53     5.47      20.52      29.79      -24.78
1         2          -15.53    -10.53    241.27     110.08     163.51
8         6          -8.53     -6.53     72.75      42.64      55.73
35        32         18.47     19.47     341.18     378.40     359.94
11        5          -5.53     -7.53     30.58      56.60      41.66
12        13         -4.53     0.47      20.52      0.22       -2.13
1         5          -15.53    -7.53     241.27     56.60      116.99
29        19         12.47     6.47      155.59     41.83      80.68
16        7          -0.53     -5.53     0.28       30.58      2.93
29        14         12.47     1.47      155.59     2.16       18.33
16        15         -0.53     2.47      0.28       6.09       -1.31
39        16         22.47     3.47      504.90     11.97      77.98

x̄ = 16.53, ȳ = 12.53

∑(x - x̄)² = 2123.23, ∑(y - ȳ)² = 1044, ∑(x - x̄)(y - ȳ) = 1182.19

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{1182.19}{\sqrt{2123.23 \times 1044}} = 0.79$

So the correlation coefficient between runs scored and balls faced is r = 0.79, which indicates a strong positive relationship between the two variables.
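The correlation coefficient can be recomputed from the two data series with the deviation formula. This is a minimal sketch with illustrative names; it works with unrounded deviations, so its result may not match a hand calculation built from rounded table entries to the last digit.

```python
import math

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]
balls = [9, 2, 25, 18, 2, 6, 32, 5, 13, 5, 19, 7, 14, 15, 16]
n = len(runs)

x_bar = sum(runs) / n
y_bar = sum(balls) / n

# Sums of squared deviations and of cross-products of deviations.
sxx = sum((x - x_bar) ** 2 for x in runs)
syy = sum((y - y_bar) ** 2 for y in balls)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(runs, balls))

# Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)

print(round(sxx, 2), round(syy, 2), round(sxy, 2))
print(round(r, 2))
```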

Regression Analysis

Regression analysis is a statistical method used to examine the relationship between one dependent
variable (the outcome you're trying to predict) and one or more independent variables (the predictors).

It helps:

• Predict values of the dependent variable based on the values of the independent variables.

• Understand the strength and type of relationship (positive, negative, or none) between
variables.

• Identify trends and patterns in data.

Data: (Runs) - 8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39

(Balls) - 9, 2, 25, 18, 2, 6, 32, 5, 13, 5, 19, 7, 14, 15, 16

Run (x)   Ball (y)   x - x̄     y - ȳ     (x - x̄)²   (x - x̄)(y - ȳ)
8         9          -8.53     -3.53     72.75      30.13
4         2          -12.53    -10.53    157.06     131.92
27        25         10.47     12.47     109.67     130.61
12        18         -4.53     5.47      20.52      -24.78
1         2          -15.53    -10.53    241.27     163.51
8         6          -8.53     -6.53     72.75      55.73
35        32         18.47     19.47     341.18     359.94
11        5          -5.53     -7.53     30.58      41.66
12        13         -4.53     0.47      20.52      -2.13
1         5          -15.53    -7.53     241.27     116.99
29        19         12.47     6.47      155.59     80.68
16        7          -0.53     -5.53     0.28       2.93
29        14         12.47     1.47      155.59     18.33
16        15         -0.53     2.47      0.28       -1.31
39        16         22.47     3.47      504.90     77.98

Σ(x - x̄)(y - ȳ) = 1182.19

Σ(x - x̄)² = 2123.23

x̄ = 16.53

ȳ = 12.53

$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{1182.19}{2123.23} = 0.56$

$a = \bar{y} - b\bar{x} = 12.53 - 0.56 \times 16.53 = 3.27$

Simple linear regression model:

$y_i = a + b x_i + \varepsilon_i$

Fitted regression line:

$\hat{y} = a + bx = 3.27 + 0.56x$

Estimate of balls faced (y) when runs (x) = 180:

$\hat{y} = 3.27 + 0.56 \times 180 = 104.07$

The regression coefficient b = 0.56 means that for every 1-unit increase in x (runs), the estimated y (balls faced) increases by
0.56 units on average. For x = 180 runs, the regression line estimates about 104.07 balls faced.
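The least-squares slope and intercept can be computed from the same deviation sums. In the sketch below, runs are the independent variable x and balls faced the dependent variable y, as in the tables above; `predict_balls` is a hypothetical helper name introduced only for illustration. Because the code keeps full precision instead of rounding b before computing a, its output differs slightly from a hand calculation.

```python
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]    # x
balls = [9, 2, 25, 18, 2, 6, 32, 5, 13, 5, 19, 7, 14, 15, 16]     # y
n = len(runs)

x_bar = sum(runs) / n
y_bar = sum(balls) / n

sxx = sum((x - x_bar) ** 2 for x in runs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(runs, balls))

# Least-squares estimates for the line y_hat = a + b * x.
b = sxy / sxx
a = y_bar - b * x_bar


def predict_balls(run_total):
    """Estimated balls faced for a given run total (hypothetical helper)."""
    return a + b * run_total


print(round(b, 2), round(a, 2))
print(round(predict_balls(180), 2))
```

Note that 180 runs lies far outside the observed range of 1 to 39 runs, so this estimate is an extrapolation of the fitted line.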

Findings

• The positive correlation between balls faced and runs scored shows that batters who stayed longer at the crease tended to score more runs.
• A few players scored significantly higher, indicating key performers.
• Lower scores (like 1s and 4s) show some inconsistency or early dismissals.
• The distribution is moderately positively skewed, with most scores clustering in the lower classes and a few high scores forming the right tail.

• The regression coefficient (b) is the slope of the regression line, showing how much the
dependent variable (Y) changes for a one-unit increase in the independent variable (X).

Recommendation

• Encourage batsmen to occupy the crease longer, as more balls faced tends to result in higher
scores.
• Focus on training for consistency to reduce very low individual scores.
• Use data-driven strategies to identify players with high strike potential and ensure they face
more deliveries.

Implications

• The team can enhance overall performance by balancing aggression with staying power.
• This data-driven approach can help in selecting line-ups, setting batting orders, and making
in-game decisions.
• Long-term use of such analysis may lead to improved match outcomes and strategic advantage
in future games.

• The regression model can serve as a useful forecasting tool within the studied data range.
Conclusion

This statistical analysis of Bangladesh's batting performance against India reveals important insights:

• The frequency distribution highlights scoring patterns and common run values.
• The mean, median, and mode provide a central view of the data.
• The strong positive correlation between balls faced and runs shows that more time at the crease
tends to result in higher scores.
• Skewness and kurtosis can further explain the nature of scoring, whether most players
underperform or outperform.

• The regression model can serve as a useful forecasting tool within the studied data range.

Overall, statistical tools provide a meaningful lens to interpret cricket data, aiding in player performance
evaluation and strategy building.
