Descriptive Statistics
Statistics
Many studies generate large numbers of data points, and to make sense of all that data, researchers use statistics that summarize the data, providing a better understanding of overall tendencies within the distributions of scores.
Types of statistics
1. Descriptive (which summarize some characteristic of a sample)
Measures of central tendency, measures of variability, measures of skewness
2. Inferential (which test for significant differences between groups and/or significant relationships among variables within the sample)
t-ratio, chi-square, beta-value
Descriptive statistics
Descriptive statistics are a series of procedures designed to illuminate the data so that their principal characteristics and main features are revealed. This may mean sorting the data by size, putting them into a table, presenting them in an appropriate chart, or summarising them numerically.
Descriptive statistics are tools or techniques used to describe and organize the characteristics of a collection of information or data. The collection is called a data set, or just data.
Question: How spread out are the scores of this data set?
Question: How does a particular score compare to the rest of the set of scores for this data set?
Key terms
Central tendency measures: computed to give a center around which the measurements in the data are distributed.
Variability measures: describe data spread, or how far the measurements are from the center.
Formula for the mean
$\mu = \frac{\sum X}{N}$ (population)
$\bar{x} = \frac{\sum x_i}{n}$ (sample)
Example of Mean
Mean = 40/10 = 4
Measurements x:        3   5   5   1   7   2   6   7   0   4   (sum = 40)
Deviations x - mean:  -1   1   1  -3   3  -2   2   3  -4   0   (sum = 0)
Notice that the sum of the deviations is 0. Notice also that every single observation contributes to the computation of the mean.
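The mean example above can be reproduced in R, the language of the `range(x)` snippet later in these notes. This is a minimal sketch; the variable names `x`, `x_bar` and `deviations` are illustrative choices, not part of the original slides.

```r
# Measurements from the "Example of Mean" slide
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)

# Sample mean: the sum of the values divided by how many there are
x_bar <- sum(x) / length(x)   # same result as mean(x); here the mean is 4

# Deviation of each measurement from the mean
deviations <- x - x_bar

print(deviations)
print(sum(deviations))        # deviations from the mean cancel out, so this is 0
```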
The mean
Features:
1. One advantage of the mean over the median is that it uses all of the information in the data set.
2. It is affected by skewness in the distribution and by the presence of outliers in the data.
3. It cannot be used with ordinal data.
The median
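To find the median, sort the data from lowest to highest; the middle value is the median. Half of the values will be equal to or less than the median, and half equal to or greater than it. When the number of values is even, the median is conventionally the average of the two middle values.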
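The sorting procedure can be written as a hypothetical helper, `median_by_hand()`, and checked against base R's `median()`. It assumes the usual convention of averaging the two middle values when n is even.

```r
# Hypothetical helper: sort, then take the middle value
# (or the average of the two middle values when n is even)
median_by_hand <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)   # n = 10, so the two middle values are averaged
m <- median_by_hand(x)                 # sorted: 0 1 2 3 4 5 5 6 7 7, giving (4 + 5) / 2
stopifnot(m == median(x))              # agrees with base R's median()
```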
Exercise
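The survival times (in days) of 11 rats are: 4, 10, 7, 50, 3, 15, 2, 9, 13, >60, >60. Question: what is the average survival time?
Day:   2   3   4   7   9   10   13   15   50   >60   >60
Rank:  1   2   3   4   5    6    7    8    9    10    11
Two of the times are known only to exceed 60 days, so the mean cannot be computed. The middle (6th) rank corresponds to 10 days, so the median survival time is 10 days.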
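The exercise can be sketched in R as follows. Storing the two ">60" values as `Inf` is an assumption made only so that they sort above every known value; their true sizes are unknown.

```r
# Survival days for the 11 rats; the two ">60" values are stored as Inf
# (assumption: we only know that they exceed 60)
days <- c(4, 10, 7, 50, 3, 15, 2, 9, 13, Inf, Inf)

sorted_days <- sort(days)
rank_6 <- sorted_days[6]   # the middle of 11 ranked values
med <- median(days)        # same as the 6th ranked value
avg <- mean(days)          # Inf: the mean needs exact values, which we lack

# The median does not depend on how large the censored values really are
stopifnot(median(c(days[is.finite(days)], 61, 500)) == med)
```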
The median
Features:
1. One advantage of the median is that it is not much affected by skewness in the distribution or by the presence of outliers.
2. It discards a lot of information, because it ignores most of the values apart from those in the centre of the distribution.
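To see how differently the mean and the median react to an outlier, this sketch appends one hypothetical extreme score (100) to the mean-example data and measures how far each statistic moves.

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)
x_outlier <- c(x, 100)   # hypothetical: one extreme score added

mean_shift   <- mean(x_outlier) - mean(x)       # mean moves from 4 to about 12.73
median_shift <- median(x_outlier) - median(x)   # median moves from 4.5 to 5

print(c(mean_shift = mean_shift, median_shift = median_shift))
```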
Normal Distributions
The curve is bell-shaped and extends from $-\infty$ to $+\infty$.
It is symmetric, with scores more concentrated in the middle (around the mean) than in the tails.
The mean, median and mode coincide.
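The claims about the normal curve can be spot-checked with base R's `qnorm()` and `dnorm()`. The parameters 100 and 15 are arbitrary values chosen for illustration, and the mode is located on a finite grid, so this is a numerical check rather than a proof.

```r
# Arbitrary illustrative parameters (not from the slides)
mu <- 100
sigma <- 15

# Median: the value with half of the probability below it
median_normal <- qnorm(0.5, mean = mu, sd = sigma)

# Symmetry: the density is equal at the same distance either side of the mean
left  <- dnorm(mu - 10, mean = mu, sd = sigma)
right <- dnorm(mu + 10, mean = mu, sd = sigma)

# Mode: on a fine grid, the density peaks at the mean
grid <- seq(mu - 4 * sigma, mu + 4 * sigma, by = 0.5)
peak <- grid[which.max(dnorm(grid, mean = mu, sd = sigma))]

stopifnot(isTRUE(all.equal(median_normal, mu)),
          isTRUE(all.equal(left, right)),
          peak == mu)
```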
Mode
The most frequently occurring score.
Look at the simple frequency of each score.
Report the mode when using a nominal scale: it is the most frequently occurring category.
If you have a rectangular distribution (all scores equally frequent), do not report the mode.
Features:
1. The mode is a measure of common-ness or typical-ness.
2. The mode is not particularly useful with metric continuous data, where no two values may be the same.
Example of Mode
In this case the data have two modes, 5 and 7: both measurements are repeated twice.
Measurements x: 3 5 5 1 7 2 6 7 0 4
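Base R has no built-in function for the statistical mode (`mode()` returns an object's type, such as "numeric"), so this sketch defines a hypothetical helper `modes_of()` on top of `table()`. Returning no mode for a rectangular distribution follows the advice on the Mode slide, and returning the labels as character strings lets the same helper handle nominal categories.

```r
# Hypothetical helper: returns every most-frequent value as a character
# string, or character(0) when all values are equally frequent
modes_of <- function(v) {
  freq <- table(v)                  # frequency of each distinct value
  counts <- as.vector(freq)
  if (length(unique(counts)) == 1) {
    return(character(0))            # rectangular distribution: no mode reported
  }
  names(freq)[counts == max(counts)]
}

x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)
print(modes_of(x))                                  # the two modes, "5" and "7"
print(modes_of(c("red", "blue", "red", "green")))   # nominal data: "red"
```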
Measures of Variability
Range, variance, standard deviation
Range
The distance between the highest and lowest scores in a distribution.
It is sensitive to extreme scores. You can compensate by calculating the interquartile range (the distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution.
Range: example
Unit 1 (12 values): 9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9
Unit 2 (13 values): 9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4
[Figure: back-to-back stem-and-leaf plot of unit 1 (left) and unit 2 (right), stems 9 to 26]
R: diff(range(x)) (range(x) on its own returns the minimum and the maximum, not the distance between them)
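Applied to the unit 1 and unit 2 data, this sketch contrasts the range with the interquartile range. It assumes the values split between the two units as in the slide's table (12 values in unit 1, 13 in unit 2). `IQR()` uses R's default quantile definition (`type = 7`); other definitions give slightly different quartiles.

```r
unit1 <- c(9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9)
unit2 <- c(9.0, 11.2, 11.3, 11.7, 12.2, 12.5, 13.2, 13.8, 14.0, 15.5, 15.6, 16.2, 16.4)

# range() returns c(min, max); diff() turns the pair into max - min
range_u1 <- diff(range(unit1))   # 17.2, stretched by the single value 26.9
range_u2 <- diff(range(unit2))   # 7.4

# Interquartile range: spread of the middle half, unaffected by how large the top value is
iqr_u1 <- IQR(unit1)             # about 1.9
iqr_u2 <- IQR(unit2)             # 3.8
```

Unit 1 has the larger range but the smaller interquartile range, which is exactly the sensitivity to extreme scores described on the Range slide.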
Variance
The difference between an observed value and the mean is called the deviation from the mean. The variance is the mean squared deviation from the mean:
you subtract the mean from each value, square each result, and then take the average. Because the deviations are squared, the variance can never be negative.
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$
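The variance definition can be traced step by step on the ten measurements from the mean example. This sketch divides by n, matching the mean-squared-deviation definition; the variable names are illustrative.

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)

deviations <- x - mean(x)           # subtract the mean from each value
squared    <- deviations^2          # squaring makes every term >= 0

# Mean squared deviation: average the squared deviations (divide by n)
var_n <- sum(squared) / length(x)   # 54 / 10 = 5.4
```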
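What is σ?
σ (lowercase sigma) is the population standard deviation; the sample standard deviation ŝ ("s-hat") is the sample estimate of σ.
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
The larger the standard deviation, the more variability there is in the scores. The standard deviation is somewhat less sensitive to extreme outliers than the range (as N increases).
In a sample, the deviations are measured from the sample mean $\bar{x}$ rather than from $\mu$, so the sum of squared deviations tends to be too small and dividing by n would underestimate the population value. Therefore, statisticians alter the formula of the sample standard deviation by subtracting 1 from N:
$\hat{s} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$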
The population variance could be interpreted as the average squared difference from the population mean, and the sample variance has almost the same interpretation about the sample mean.
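This sketch compares the two denominators on the same ten measurements. R's built-in `var()` and `sd()` use the n - 1 (sample) version, so the divide-by-N population figures have to be computed by hand.

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)
n <- length(x)
ss <- sum((x - mean(x))^2)     # sum of squared deviations: 54

pop_var  <- ss / n             # divide by N: 5.4
samp_var <- ss / (n - 1)       # divide by n - 1: 6
pop_sd   <- sqrt(pop_var)      # about 2.32
samp_sd  <- sqrt(samp_var)     # about 2.45

# R's built-ins use the n - 1 (sample) denominator
stopifnot(isTRUE(all.equal(var(x), samp_var)),
          isTRUE(all.equal(sd(x), samp_sd)))
```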
Features
The variance and the standard deviation are the most appropriate measures of variation when the data come from a symmetric distribution; they are used to describe the spread of a numeric variable.
Skewness of distributions
Skewness measures look at how lopsided distributions are, that is, how far from the ideal of the normal curve they are.
When the median and the mean are different, the distribution is skewed. The greater the difference, the greater the skew.
Distributions that trail away to the left are negatively skewed, and those that trail away to the right are positively skewed.
If the skewness is extreme, the researcher should either transform the data to make them better resemble a normal curve or else use a different set of statistics (nonparametric statistics) to carry out the analysis.
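Using unit 1 from the range example, whose single value 26.9 trails far to the right, this sketch compares the mean and the median. The summary index 3(mean - median)/sd is Pearson's second skewness coefficient, chosen here because it directly scales the mean-median difference; it is one of several skewness measures and is not taken from the slides.

```r
unit1 <- c(9.7, 11.5, 11.6, 12.1, 12.4, 12.6, 13.1, 13.5, 13.6, 14.8, 16.3, 26.9)

m   <- mean(unit1)     # about 14.01, pulled to the right by 26.9
med <- median(unit1)   # 12.85, the average of the 6th and 7th sorted values

# Pearson's second skewness coefficient: positive when mean > median
pearson_skew <- 3 * (m - med) / sd(unit1)
print(pearson_skew)    # positive, so the distribution is positively skewed
```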
Percentiles
The pth percentile is a number such that at most p% of the measurements are below it and at most (100 - p)% of the data are above it. For example, if the 85th percentile of a data set is 340, then at most 15% of the measurements are above 340 and at most 85% are below it. Notice that the median is the 50th percentile.
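R's `quantile()` implements several percentile definitions through its `type` argument. This sketch uses `type = 1` (the inverse of the empirical distribution function) because it satisfies the "at most p% below, at most (100 - p)% above" definition; the default `type = 7` interpolates between neighbouring values. The data are the ten measurements from the mean example.

```r
x <- c(3, 5, 5, 1, 7, 2, 6, 7, 0, 4)

# 85th percentile under the inverse-ECDF definition (type = 1)
p85 <- quantile(x, 0.85, type = 1)   # 7
below <- mean(x < p85)               # share of measurements below: 0.8
above <- mean(x > p85)               # share of measurements above: 0

# The 50th percentile (default type = 7) equals the median
p50 <- quantile(x, 0.5)              # 4.5
stopifnot(unname(p50) == median(x))
```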
So
Descriptive statistics are used to summarize data from individual respondents, etc.
They help to make sense of large numbers of individual responses and to communicate the essence of those responses to others.
They focus on typical or average scores, the dispersion of scores over the available responses, and the shape of the response curve.