Frequency Distribution S AutoRecovered
Introduction
Statistics plays a vital role in sports analytics, especially in cricket, where data can reveal player
performance, match trends, and strategic insights. In this assignment, we explore the statistical analysis
of a T20 cricket match between Bangladesh and India held on October 6, 2024. We use various statistical
tools to interpret the runs scored by Bangladesh in relation to the number of balls faced. This analysis
includes frequency distribution, measures of central tendency, dispersion, skewness, kurtosis, and
correlation.
Frequency distribution
A frequency distribution is a table or graph that displays the frequency of various outcomes in a sample.
It shows how data is distributed across different intervals or categories, helping to understand the
pattern of data spread. In cricket, it helps represent how often specific run values occur, giving insight
into scoring trends.
Data: 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39
Highest: 39
Lowest: 1
n = 15, k = 4
$2^k > n$: $2^4 = 16 > 15$
Class interval: $i = \frac{H - L}{k} = \frac{39 - 1}{4} = 9.5$, rounded up to a class width of 10
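The class-building steps can be sketched in Python as follows. This is a minimal illustration, not part of the original working: the variable names are hypothetical, and the class boundaries 1-10, 10-20, 20-30, 30-40 are assumed from the tables used later in the assignment, with each value counted in the class whose lower limit it meets or exceeds.

```python
import math

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

n = len(runs)

# Smallest k such that 2**k > n.
k = 1
while 2 ** k <= n:
    k += 1

raw_width = (max(runs) - min(runs)) / k  # (39 - 1) / 4 = 9.5
width = math.ceil(raw_width)             # rounded up to 10

# Assumed class boundaries (lower limit inclusive, upper limit exclusive).
edges = [1, 10, 20, 30, 40]
freq = [sum(lo <= x < hi for x in runs) for lo, hi in zip(edges, edges[1:])]

for lo, hi, f in zip(edges, edges[1:], freq):
    print(f"{lo}-{hi}: {f}")
```

No run value falls exactly on a boundary (10, 20 or 30), so the choice of which side is inclusive does not affect the counts for this data set.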
Mean: The arithmetic average of a set of numbers. It is calculated by dividing the total sum of the values
by the number of values.
Median: The middle value of a data set when the values are arranged in ascending or descending order.
If the number of observations is even, the median is the average of the two central numbers.
Mode: The value that appears most frequently in a data set. A data set may have one mode, more than
one mode, or no mode at all.
Data - 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39
Mean
Runs     Midpoint (x)   Frequency (f)   f·x
1-10     5.5            5               27.5
10-20    15             5               75
20-30    25             3               75
30-40    35             2               70
$\sum f = 15$, $\sum fx = 247.5$
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{247.5}{15} = 16.5$
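As a cross-check, the grouped mean can be sketched in Python. The snippet assumes each class midpoint is taken as (lower limit + upper limit) / 2; the names `classes`, `freq` and `grouped_mean` are illustrative only.

```python
# Class limits and frequencies from the table above.
classes = [(1, 10), (10, 20), (20, 30), (30, 40)]
freq = [5, 5, 3, 2]

# Midpoint of each class, assumed to be the average of its two limits.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

sum_f = sum(freq)
sum_fx = sum(f * x for f, x in zip(freq, midpoints))
grouped_mean = sum_fx / sum_f

print(midpoints)
print(sum_fx, grouped_mean)
```

The grouped mean is an approximation, because every value in a class is treated as if it sat at the class midpoint.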
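Median
Runs     Frequency   Cumulative Frequency
1-10     5           5
10-20    5           10
20-30    3           13
30-40    2           15
$\frac{n}{2} = \frac{15}{2} = 7.5$, so the median class is 10-20.
$\text{Median} = L + \left( \frac{\frac{n}{2} - F}{f} \right) \times c$
where L = 10 is the lower limit of the median class, F = 5 is the cumulative frequency before it, f = 5 is its frequency, and c = 10 is the class width.
$= 10 + \left( \frac{7.5 - 5}{5} \right) \times 10$
$= 15$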
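The grouped-median calculation can be sketched as a small Python function. The function name `grouped_median` and its parameters are hypothetical; it assumes equal class widths and locates the median class as the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_limits, freq, width):
    """Median of grouped data: L + ((n/2 - F) / f) * width."""
    n = sum(freq)
    half = n / 2
    cum = 0  # cumulative frequency before the current class (F)
    for lower, f in zip(lower_limits, freq):
        if cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty frequency table")


median = grouped_median([1, 10, 20, 30], [5, 5, 3, 2], 10)
print(median)  # 15.0
```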
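Mode
Runs     Frequency
1-10     5
10-20    5
20-30    3
30-40    2
Taking 10-20 as the modal class:
L = 10
∆1 = 5 - 5 = 0
∆2 = 5 - 3 = 2
C = 10
$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C$
$= 10 + \frac{0}{0 + 2} \times 10$
$= 10$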
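A short Python sketch of the grouped-mode formula is given here. Because the classes 1-10 and 10-20 tie with a frequency of 5, the modal class is passed in explicitly rather than detected; the function name and the `modal_index` parameter are assumptions made for this illustration.

```python
def grouped_mode(lower_limits, freq, width, modal_index):
    """Mode of grouped data: L + d1 / (d1 + d2) * width."""
    f1 = freq[modal_index]
    # Neighbouring frequencies; a missing neighbour is treated as 0.
    f0 = freq[modal_index - 1] if modal_index > 0 else 0
    f2 = freq[modal_index + 1] if modal_index + 1 < len(freq) else 0
    d1 = f1 - f0  # difference from the preceding class
    d2 = f1 - f2  # difference from the following class
    return lower_limits[modal_index] + d1 / (d1 + d2) * width


# Class 10-20 (index 1) is taken as the modal class, as in the working above.
mode = grouped_mode([1, 10, 20, 30], [5, 5, 3, 2], 10, modal_index=1)
print(mode)  # 10.0
```

The formula is undefined when d1 + d2 equals zero, which this sketch does not guard against.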
Measures of Dispersion
A measure of dispersion is a statistical tool used to describe the spread or variability of a data set. It tells
how much the values in a dataset differ from the mean or median. Common measures include:
• Range
• Variance
• Standard Deviation
• Interquartile Range (IQR)
Dispersion helps in understanding consistency and variability in performance data, such as runs in
cricket.
Mean deviation
Data : 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39
$\bar{x} = \frac{\sum x}{n} = \frac{248}{15} = 16.53$
x    x − x̄    |x − x̄|
8 -8.53 8.53
4 -12.53 12.53
27 10.47 10.47
12 -4.53 4.53
1 -15.53 15.53
8 -8.53 8.53
35 18.47 18.47
11 -5.53 5.53
12 -4.53 4.53
1 -15.53 15.53
29 12.47 12.47
16 -0.53 0.53
29 12.47 12.47
16 -0.53 0.53
39 22.47 22.47
$\sum |x - \bar{x}| = 152.65$
$\text{M.D.} = \frac{\sum |x - \bar{x}|}{n} = \frac{152.65}{15} = 10.18$
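The mean deviation can be verified with a few lines of Python. This is a minimal sketch with illustrative variable names; it works from the unrounded mean, so intermediate figures may differ slightly from the two-decimal values in the table.

```python
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

n = len(runs)
mean = sum(runs) / n

# Mean deviation: average absolute distance from the mean.
mean_dev = sum(abs(x - mean) for x in runs) / n

print(round(mean, 2), round(mean_dev, 2))  # 16.53 10.18
```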
Population variance and standard deviation
Data : 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39
x    x − μ    (x − μ)²
8 -8.53 72.75
4 -12.53 157.06
27 10.47 109.67
12 -4.53 20.52
1 -15.53 241.27
8 -8.53 72.75
35 18.47 341.18
11 -5.53 30.58
12 -4.53 20.52
1 -15.53 241.27
29 12.47 155.59
16 -0.53 0.28
29 12.47 155.59
16 -0.53 0.28
39 22.47 504.90
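$\sum (x - \mu)^2 = 2123.23$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{2123.23}{15} = 141.55$
Population standard deviation: $\sigma = \sqrt{141.55} = 11.9$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{2123.23}{14} = 151.66$
Sample standard deviation: $s = \sqrt{151.66} = 12.31$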
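Python's standard `statistics` module offers both the population and the sample versions of these measures, so the hand calculation can be checked as sketched below. Because the module works from the unrounded mean, its results can differ from the tabulated figures in the second decimal place.

```python
import statistics

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

pop_var = statistics.pvariance(runs)   # divides by N
pop_sd = statistics.pstdev(runs)
samp_var = statistics.variance(runs)   # divides by n - 1
samp_sd = statistics.stdev(runs)

print(round(pop_var, 2), round(pop_sd, 2))
print(round(samp_var, 2), round(samp_sd, 2))
```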
Skewness & Kurtosis
Data - 8,4,27,12,1,8,35,11,12,1,29,16,29,16,39
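Skewness:
Mean $= \frac{\sum x}{n} = \frac{248}{15} = 16.53$
Median: sorted data = 1,1,4,8,8,11,12,12,16,16,27,29,29,35,39
The middle (8th) value is 12, so Median = 12
$s = \sqrt{\frac{2123.23}{14}} = 12.31$
$Sk = \frac{3(\text{mean} - \text{median})}{s}$
$= \frac{3(16.53 - 12)}{12.31}$
$= 1.10$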
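The coefficient used above is Pearson's second (median-based) skewness coefficient. A minimal Python sketch of the same calculation follows, assuming the sample standard deviation (n - 1 divisor) as in the working; the variable names are illustrative.

```python
import statistics

runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

mean = statistics.mean(runs)
median = statistics.median(runs)  # middle value of the sorted data
s = statistics.stdev(runs)        # sample standard deviation

# Pearson's second skewness coefficient.
sk = 3 * (mean - median) / s

print(median, round(sk, 2))  # 12 1.1
```

A positive value indicates that the mean lies above the median, that is, a longer right tail.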
Kurtosis
x    |x − x̄|    (x − x̄)²    (x − x̄)⁴
8 8.53 72.75 5294.01
4 12.53 157.06 24652.14
27 10.47 109.67 12020.93
12 4.53 20.52 421.07
1 15.53 241.27 58220.86
8 8.53 72.75 5294.01
35 18.47 341.18 116417.44
11 5.53 30.58 935.14
12 4.53 20.52 421.07
1 15.53 241.27 58220.86
29 12.47 155.59 24211.36
16 0.53 0.28 0.08
29 12.47 155.59 24211.36
16 0.53 0.28 0.08
39 22.47 504.90 254924.01
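$\sum (x - \bar{x})^4 = 585244.42$
$\mu_4 = \frac{\sum (x - \bar{x})^4}{N} = \frac{585244.42}{15} = 39016.29$
$\mu_2 = \frac{\sum (x - \bar{x})^2}{N} = \frac{2123.23}{15} = 141.55$
$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{39016.29}{(141.55)^2}$
$= 1.95$
Since $\beta_2 < 3$, the distribution is platykurtic, that is, flatter than the normal distribution.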
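The moment-based kurtosis can be recomputed directly from the raw data, as sketched below. The snippet uses population moments (divisor N) and the unrounded mean, so its result may not match a hand calculation built from two-decimal table entries; the names `m2`, `m4` and `beta2` are illustrative.

```python
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]

n = len(runs)
mean = sum(runs) / n

# Second and fourth central moments about the mean.
m2 = sum((x - mean) ** 2 for x in runs) / n
m4 = sum((x - mean) ** 4 for x in runs) / n

# Kurtosis coefficient; a normal distribution has beta2 = 3.
beta2 = m4 / m2 ** 2

print(round(m2, 2), round(m4, 2), round(beta2, 2))
```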
Correlation Analysis
Correlation analysis is a statistical method used to measure the strength and direction of the
relationship between two variables. It tells whether and how strongly pairs of variables are related,
typically using a correlation coefficient (r) ranging from -1 to +1.
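(Balls) - 9,2,25,18,2,6,32,5,13,5,19,7,14,15,16
$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
$= 0.84$
So the correlation coefficient between runs scored (x) and balls faced (y) is 0.84, which indicates a strong positive relationship between the two variables.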
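Pearson's r can be computed from the two lists as sketched below. One assumption should be stated loudly: the snippet pairs the i-th run value with the i-th ball value in the order the two lists are printed, which the assignment does not confirm. If the original working paired the batsmen differently, the coefficient from this sketch will not match the hand-computed figure.

```python
import math

# ASSUMPTION: runs[i] and balls[i] belong to the same batsman.
runs = [8, 4, 27, 12, 1, 8, 35, 11, 12, 1, 29, 16, 29, 16, 39]
balls = [9, 2, 25, 18, 2, 6, 32, 5, 13, 5, 19, 7, 14, 15, 16]

n = len(runs)
x_bar = sum(runs) / n
y_bar = sum(balls) / n

# Sums of squares and cross-products about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(runs, balls))
sxx = sum((x - x_bar) ** 2 for x in runs)
syy = sum((y - y_bar) ** 2 for y in balls)

r = sxy / math.sqrt(sxx * syy)

print(round(x_bar, 2), round(y_bar, 2), round(r, 2))
```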
Regression Analysis
Regression analysis is a statistical method used to examine the relationship between one dependent
variable (the outcome you're trying to predict) and one or more independent variables (the predictors).
It helps:
• Predict values of the dependent variable based on the values of the independent variables.
• Understand the strength and type of relationship (positive, negative, or none) between
variables.
(Balls) - 9,2,25,18,2,6,32,5,13,5,19,7,14,15,16
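Working table columns: Run (x), Ball (y), (x − x̄), (y − ȳ), (x − x̄)², (x − x̄)(y − ȳ)
Totals: $\sum (x - \bar{x})^2 = 2123.23$, $\sum (x - \bar{x})(y - \bar{y}) = 1246.63$
$\bar{x} = 16.53$
$\bar{y} = 12.53$
$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{1246.63}{2123.23} = 0.59$
$a = \bar{y} - b\bar{x} = 12.53 - 0.59 \times 16.53 = 2.78$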
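Simple linear regression model:
$Y_i = a + bx_i + \varepsilon_i$
Fitted line:
$\hat{Y} = a + bx = 2.78 + 0.59x$
For x = 180:
$\hat{Y} = 2.78 + 0.59 \times 180 = 108.98$
The regression coefficient b = 0.59 means that for every one-unit increase in X, the estimated Y increases by 0.59 units on average. With runs as x and balls as y, a value of x = 180 gives an estimate of 108.98. Because 180 lies far outside the observed range of 1 to 39 runs, this estimate is an extrapolation.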
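The regression arithmetic can be retraced in Python from the summary figures quoted in the working (the two sums and the two means). This sketch deliberately starts from those reported totals rather than the raw lists, and rounds b and a to two decimals at the same points as the hand calculation; the function name `predict` is hypothetical.

```python
# Summary figures as reported in the working (not recomputed here).
sxy = 1246.63   # sum of (x - x_bar)(y - y_bar)
sxx = 2123.23   # sum of (x - x_bar)^2
x_bar = 16.53
y_bar = 12.53

# Slope and intercept, rounded to two decimals as in the hand calculation.
b = round(sxy / sxx, 2)
a = round(y_bar - b * x_bar, 2)


def predict(x):
    """Fitted value on the line y = a + b * x (no error term)."""
    return a + b * x


print(b, a, round(predict(180), 2))  # 0.59 2.78 108.98
```

The error term belongs to the model for individual observations, not to the fitted value, which is why `predict` returns only a + b * x.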
Findings
• The strong positive correlation (r = 0.84) between balls faced and runs scored shows that batsmen who stayed longer at the crease tended to score more runs.
• A few players scored significantly higher, indicating key performers.
• Lower scores (like 1s and 4s) show some inconsistency or early dismissals.
• The distribution is positively skewed (Sk = 1.10): most scores fall below 20, while a few high scores stretch the right tail.
• The regression coefficient (b) is the slope of the regression line, showing how much the
dependent variable (Y) changes for a one-unit increase in the independent variable (X).
Recommendation
• Encourage batsmen to occupy the crease longer, as more balls faced tends to result in higher
scores.
• Focus on training for consistency to reduce very low individual scores.
• Use data-driven strategies to identify players with high strike potential and ensure they face
more deliveries.
Implications
• The team can enhance overall performance by balancing aggression with staying power.
• This data-driven approach can help in selecting line-ups, setting batting orders, and making
in-game decisions.
• Long-term use of such analysis may lead to improved match outcomes and strategic advantage
in future games.
• The regression model can serve as a useful forecasting tool within the studied data range.
Conclusion
This statistical analysis of Bangladesh's batting performance against India reveals important insights:
• The frequency distribution highlights scoring patterns and common run values.
• The mean, median, and mode provide a central view of the data.
• The strong positive correlation between balls faced and runs shows that more time at the crease
tends to result in higher scores.
• Skewness and kurtosis can further explain the nature of scoring, whether most players
underperform or outperform.
• The regression model can serve as a useful forecasting tool within the studied data range.
Overall, statistical tools provide a meaningful lens to interpret cricket data, aiding in player performance
evaluation and strategy building.