Stats Project Doc
Our data is taken from the list of Pakistani ODI cricketers. The list is arranged in the order in which each
player won his first ODI cap; where more than one player won his first ODI cap in the same match, those
players are listed alphabetically. For simplicity’s sake, we have used only two columns for our statistical
analysis: the career years of each player (obtained by subtracting the starting year from the ending year)
and the runs made by the player during this time. Our analysis therefore relies solely on the relationship
between the time the players spent as professional cricketers and their batting performance.
There were a total of 228 records, of which we took 198 for the statistical analysis in SPSS. The 30
excluded records either had missing values or belonged to players who had yet to retire, whose
incomplete careers would have biased the analysis. As stated by the source of this data, the statistics
were correct as of 3 November 2020.
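The preparation described above can be sketched as follows. This is only a minimal illustration in Python, not the procedure actually used (the analysis itself was carried out in SPSS); the `Player` class, its field names and the four sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Player:
    """Hypothetical record layout; the real list has more columns."""
    name: str
    first_year: Optional[int]  # year of first ODI
    last_year: Optional[int]   # year of last ODI, None if still active
    runs: Optional[int]        # career runs, None if missing


def is_usable(p: Player) -> bool:
    """Keep only retired players with no missing values."""
    return None not in (p.first_year, p.last_year, p.runs)


def career_years(p: Player) -> int:
    """Ending year minus starting year, as described in the text.

    How single-season careers are counted is not stated in the report,
    so none appear in the sample below.
    """
    return p.last_year - p.first_year


# Made-up records, for illustration only.
players = [
    Player("A", 1973, 1980, 1200),
    Player("B", 1990, 1994, 35),
    Player("C", 2015, None, 850),   # still active -> excluded
    Player("D", 1985, 1992, None),  # missing runs -> excluded
]

table = [(p.name, career_years(p), p.runs) for p in players if is_usable(p)]
print(table)
```

Only the two complete, retired records survive the filter, which mirrors how the 228 records were reduced to 198.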
In the following sections we give the descriptive statistics of our data and draw certain inferences
from them. We then perform normality tests, state our assumptions about the data, and test our
hypotheses using the appropriate tests. The results are discussed in detail. Snapshots of our data as
loaded into SPSS are attached in the appendix of this document, and the source of the data is cited as
a reference at the very end.
Variables
Runs and the number of Career Years of ODI cricketers are quantitative variables and were treated as
ratio data: both have a true zero point, so they cannot take negative values and ratios between values
are meaningful. As SPSS groups ratio and interval data together as “scale”, both Runs and Career Years
were set to scale in SPSS.
Descriptive Statistics
runs
N (Valid)                198
N (Missing)              0
Mean                     840.85
Median                   88.00
Mode                     0
Std. Deviation           1925.076
Variance                 3705917.882
Skewness                 3.261
Std. Error of Skewness   .173
Kurtosis                 11.033
Std. Error of Kurtosis   .344
Minimum                  0
Maximum                  11701
Out of the 198 records, the minimum number of runs made by a player was zero, while the maximum was
11701. The mean is about 841, while the median is 88 and the mode is 0. From this we can deduce that
a small number of very high run totals pull the mean well above both the median and the mode. As the
median is not affected by these extreme values, it is only about one tenth of the mean. Coming to the
mode, the most common run total at the end of a career was zero. This value arises because the data
consists of all retired ODI cricketers, including those whose careers were cut short for whatever reason.
Had we excluded such players, the mean, median and mode would all have been different, with the
mode affected most visibly. We can also see that the standard deviation is about 1925, more than twice
the mean. This shows that the data is not clustered about a single value or group of values but is widely
spread out. The skewness and kurtosis of the data are analyzed in the coming sections. With respect to
normality, we can expect the data to be positively skewed, as the mean is greater than the median,
which is in turn greater than the mode. This expectation is investigated further in the coming sections.
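The pattern described above (mean far above the median, which is above the mode) can be reproduced on a small example. The sketch below uses Python's standard `statistics` module on a made-up list of run totals; the numbers are hypothetical and are not taken from the dataset.

```python
import statistics

# Hypothetical run totals: many small values and a few very large ones.
runs = [0, 0, 0, 4, 25, 88, 130, 524, 2028, 7381]

mean = statistics.mean(runs)      # pulled upwards by the large totals
median = statistics.median(runs)  # middle of the ordered values
mode = statistics.mode(runs)      # most frequent value
sd = statistics.stdev(runs)       # sample standard deviation (n - 1)

print(f"mean={mean}, median={median}, mode={mode}, sd={sd:.1f}")

# For a positively skewed sample we expect mean > median > mode.
print("positively skewed pattern:", mean > median > mode)
```

As in the runs variable, the standard deviation of this sample exceeds its mean, which is typical when a few extreme values dominate.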
Career_years
N (Valid)                198
N (Missing)              0
Mean                     5.04
Median                   3.50
Mode                     1
Std. Deviation           4.776
Variance                 22.811
Skewness                 1.167
Std. Error of Skewness   .173
Kurtosis                 .710
Std. Error of Kurtosis   .344
Minimum                  1
Maximum                  21
This variable gives the number of career years for each of the 198 players. The minimum number of
years that a player was part of the cricket team was 1, while the maximum was 21. The mean was about
5 years, the median was 3.5 years and the mode was 1 year. Relating this to the runs made by each
player, a mode of 1 career year is consistent with the mode of 0 runs: as the most common career length
was only one year, the run totals of those players were low, quite possibly zero. A player's run total can
be expected to increase with the number of career years. Furthermore, the maximum of 21 career years
helps to explain the large standard deviation of the runs made, as well as the extreme values that
inflated the mean; those extreme values are run totals far greater than the mean. Coming back to this
variable, the standard deviation is about 4.8 years, which suggests that the values are spread out from
the mean, but not as much as those for runs made. The values of skewness and kurtosis will be discussed
in the coming sections. For now it is sufficient to say that the distribution does not seem to be normal:
the mean is greater than the median, which in turn is greater than the mode, so the data is positively
skewed.
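Both tables report the same standard errors for skewness (.173) and kurtosis (.344) because these depend only on the sample size. The sketch below applies the usual formulas for the standard errors of sample skewness and excess kurtosis; it is an assumption that SPSS uses exactly these formulas, although the results for n = 198 agree with the tables.

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of sample excess kurtosis for a sample of size n."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


n = 198  # number of valid records in both variables
print(round(se_skewness(n), 3))
print(round(se_kurtosis(n), 3))
```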
Frequency Distribution:
Runs:
Value Frequency Percent Valid Percent Cumulative Percent
4 1 .5 .5 13.1
8 1 .5 .5 17.2
9 1 .5 .5 17.7
17 1 .5 .5 26.8
20 1 .5 .5 28.8
22 1 .5 .5 29.3
24 1 .5 .5 29.8
25 3 1.5 1.5 31.3
29 1 .5 .5 36.4
31 1 .5 .5 36.9
35 1 .5 .5 38.9
36 1 .5 .5 39.4
37 1 .5 .5 39.9
39 1 .5 .5 40.4
41 1 .5 .5 40.9
42 1 .5 .5 41.4
45 1 .5 .5 41.9
53 1 .5 .5 43.9
56 1 .5 .5 44.4
60 1 .5 .5 44.9
61 1 .5 .5 45.5
62 1 .5 .5 46.0
65 1 .5 .5 46.5
66 1 .5 .5 47.0
69 1 .5 .5 47.5
74 1 .5 .5 48.0
78 1 .5 .5 48.5
80 1 .5 .5 49.0
84 1 .5 .5 49.5
87 1 .5 .5 50.0
97 1 .5 .5 51.5
99 1 .5 .5 52.0
100 1 .5 .5 52.5
110 1 .5 .5 53.0
113 1 .5 .5 54.5
119 1 .5 .5 56.1
127 1 .5 .5 56.6
130 1 .5 .5 57.1
131 1 .5 .5 57.6
133 1 .5 .5 58.1
141 1 .5 .5 58.6
142 1 .5 .5 59.1
147 1 .5 .5 59.6
154 1 .5 .5 60.1
166 1 .5 .5 60.6
184 1 .5 .5 61.1
193 1 .5 .5 61.6
197 1 .5 .5 62.1
199 1 .5 .5 62.6
209 1 .5 .5 63.1
210 1 .5 .5 63.6
234 1 .5 .5 65.2
236 1 .5 .5 65.7
242 1 .5 .5 66.2
262 1 .5 .5 66.7
267 1 .5 .5 67.2
271 1 .5 .5 67.7
297 1 .5 .5 68.2
321 1 .5 .5 69.7
324 1 .5 .5 70.2
330 1 .5 .5 70.7
348 1 .5 .5 71.2
349 1 .5 .5 71.7
383 1 .5 .5 72.2
394 1 .5 .5 72.7
457 1 .5 .5 74.2
504 1 .5 .5 74.7
524 1 .5 .5 75.3
543 1 .5 .5 75.8
556 1 .5 .5 76.3
593 1 .5 .5 76.8
641 1 .5 .5 77.3
642 1 .5 .5 77.8
711 1 .5 .5 78.3
725 1 .5 .5 78.8
741 1 .5 .5 79.3
768 1 .5 .5 79.8
782 1 .5 .5 80.3
786 1 .5 .5 80.8
812 1 .5 .5 81.3
966 1 .5 .5 81.8
969 1 .5 .5 82.3
1068 1 .5 .5 82.8
1265 1 .5 .5 83.3
1269 1 .5 .5 83.8
1336 1 .5 .5 84.3
1418 1 .5 .5 84.8
1521 1 .5 .5 85.4
1579 1 .5 .5 85.9
1709 1 .5 .5 86.4
1719 1 .5 .5 86.9
1845 1 .5 .5 87.4
1877 1 .5 .5 87.9
1895 1 .5 .5 88.4
2028 1 .5 .5 88.9
2185 1 .5 .5 89.4
2572 1 .5 .5 89.9
2605 1 .5 .5 90.4
2653 1 .5 .5 90.9
3194 1 .5 .5 91.4
3236 1 .5 .5 91.9
3266 1 .5 .5 92.4
3709 1 .5 .5 92.9
3717 1 .5 .5 93.4
4780 1 .5 .5 93.9
5080 1 .5 .5 94.4
5122 1 .5 .5 94.9
5841 1 .5 .5 95.5
6564 1 .5 .5 96.0
7170 1 .5 .5 96.5
7240 1 .5 .5 97.0
7381 1 .5 .5 97.5
7534 1 .5 .5 98.0
8064 1 .5 .5 98.5
8823 1 .5 .5 99.0
9720 1 .5 .5 99.5
11701 1 .5 .5 100.0
Career_years:
Value Frequency Percent Valid Percent Cumulative Percent
16 1 .5 .5 97.0
17 1 .5 .5 97.5
18 1 .5 .5 98.0
20 1 .5 .5 99.5
21 1 .5 .5 100.0
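The columns of the frequency tables above can be derived from the raw values as sketched below. This is a minimal illustration with a hypothetical sample, assuming no missing values (so that Percent and Valid Percent coincide, as they do in our tables); `frequency_table` is an illustrative helper, not an SPSS function.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        freq = counts[value]
        cumulative += freq
        rows.append((value,
                     freq,
                     round(100 * freq / n, 1),
                     round(100 * cumulative / n, 1)))
    return rows


# Hypothetical career lengths, for illustration only.
sample = [1, 1, 1, 2, 3, 3, 5, 8]
rows = frequency_table(sample)
for row in rows:
    print(row)
```

The cumulative percent of the last row is always 100, which is a quick check that no record has been dropped.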
Runs:
From the histogram we can see that the graph is positively skewed. At first glance the curve also
appears platykurtic. However, when we look at the values of skewness and kurtosis, we find that the
graph is indeed positively skewed, with a value of 3.261, but it is not platykurtic in terms of the
measure of kurtosis. With a kurtosis value of 11.033, the graph is actually leptokurtic. We can also deduce
from the graph that the positive skewness is caused by the players who have more than 2000 runs. As
these values are far greater than the rest of the data, they have affected the normality of the curve and
made the graph positively skewed. It can also easily be seen that the greatest number of players have
between 0 and 1000 runs.
Career:
In this histogram of the career years of the players, we can see that the graph is positively skewed.
At first glance the curve also appears platykurtic. However, when we look at the values of skewness
and kurtosis, we find that the graph is indeed positively skewed, with a value of 1.167, but it is not
platykurtic in terms of the measure of kurtosis. With a kurtosis value of 0.710, the graph is actually
mildly leptokurtic. We can also deduce from the graph that the positive skewness is caused by the
handful of players who have more than 10 career years. As these values are far greater than the rest of
the data, they have affected the normality of the curve and made the graph positively skewed.
Furthermore, it can easily be seen that the greatest number of players have between 0 and 10 career
years.
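One common rule of thumb for judging whether skewness or kurtosis differs from the value expected under normality is to divide the statistic by its standard error and compare the result with 1.96. The sketch below applies this to the values reported in the Descriptive Statistics tables; the choice of 1.96 as the cut-off is an assumption of this illustration and is sometimes relaxed for large samples.

```python
# (statistic, standard error) pairs from the Descriptive Statistics tables.
reported = {
    "runs": {"skewness": (3.261, 0.173), "kurtosis": (11.033, 0.344)},
    "career_years": {"skewness": (1.167, 0.173), "kurtosis": (0.710, 0.344)},
}

CRITICAL = 1.96  # two-tailed 5% cut-off of the standard normal (assumed here)

z_scores = {
    variable: {name: stat / se for name, (stat, se) in stats.items()}
    for variable, stats in reported.items()
}

for variable, scores in z_scores.items():
    for name, z in scores.items():
        verdict = "beyond" if abs(z) > CRITICAL else "within"
        print(f"{variable:13s} {name:9s} z = {z:6.2f} ({verdict} +/-{CRITICAL})")
```

All four ratios exceed the cut-off, although the kurtosis of the career years does so only narrowly.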
Normality Tests
As discussed above, our variables did not appear to be normally distributed. Here we test for normality
using the Kolmogorov-Smirnov and Shapiro-Wilk tests in order to determine whether our earlier
inferences were correct.
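To give an idea of what the Kolmogorov-Smirnov test measures, the sketch below computes the largest vertical distance between the empirical distribution of a sample and a normal curve fitted with the sample mean and standard deviation. It is a simplified illustration on hypothetical data: it returns only the distance, not the significance value that SPSS reports, and `ks_distance` is an illustrative helper rather than a library function.

```python
import math
import statistics


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def ks_distance(values) -> float:
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    xs = sorted(values)
    n = len(xs)
    mu = statistics.mean(xs)
    sigma = statistics.stdev(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF steps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical samples, for illustration only.
skewed = [0, 0, 0, 4, 25, 88, 130, 524, 2028, 7381]
symmetric = [-2, -1, -1, 0, 0, 0, 1, 1, 2]

d_skewed = ks_distance(skewed)
d_symmetric = ks_distance(symmetric)
print(f"skewed: {d_skewed:.3f}, symmetric: {d_symmetric:.3f}")
```

The strongly skewed sample lies much further from its fitted normal curve than the symmetric one, which is the kind of departure the test is designed to detect.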
Runs
Tests of Normality
Kolmogorov-Smirnov Shapiro-Wilk
Career_Years:
Tests of Normality
Kolmogorov-Smirnov Shapiro-Wilk
T-Test
As our number of records exceeds 30, the sampling distribution of the mean can be treated as
approximately normal, and a z test would conventionally be considered. However, the population
variance is unknown and must be estimated from the sample, so the t test is the appropriate choice; it is
also the test that SPSS provides. With 198 records the t distribution is almost identical to the standard
normal distribution, so the two tests lead to essentially the same conclusions. Since our variables are
quantitative, we can apply the one-sample t test to test our hypotheses for the average runs made and
the average career years in the dataset. A t test can be either one-tailed or two-tailed. We have applied
the two-tailed test for our hypotheses.
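The one-sample t statistic can be checked by hand from the summary statistics. The sketch below does this for Career_years, using the mean, standard deviation and sample size from the Descriptive Statistics table and the test value of 8 shown in the Career output. The two-tailed p-value is only an approximation based on the standard normal distribution, which is an assumption justified here by the large number of degrees of freedom; SPSS itself uses the t distribution.

```python
import math
from statistics import NormalDist


def one_sample_t(mean: float, sd: float, n: int, test_value: float):
    """Return the t statistic and degrees of freedom for a one-sample t test."""
    standard_error = sd / math.sqrt(n)
    t = (mean - test_value) / standard_error
    return t, n - 1


# Career_years summary statistics from the table; test value from the output.
t, df = one_sample_t(mean=5.04, sd=4.776, n=198, test_value=8)

# Normal approximation to the two-tailed p-value (reasonable for df = 197).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(f"t = {t:.2f}, df = {df}, approximate two-tailed p = {p_approx:.4f}")
```

The resulting t of roughly -8.72 lies far beyond the two-tailed critical value of about 1.96, so under these figures the mean number of career years differs from 8.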
Runs
One-Sample Test
Career:
One-Sample Test
Test Value = 8