The document discusses descriptive statistics, focusing on measures of central tendency (mean, median, mode) and variability (variance, standard deviation, interquartile range, range). It explains the characteristics of normally distributed, positively skewed, and negatively skewed data, as well as methods for assessing normality such as skewness and the Shapiro-Wilk test. Additionally, it covers the use of frequency counts and percentages in data analysis using JASP software.
Descriptive Statistics
It is used to accurately describe data and the trends
or patterns in the data.
It involves collection, classification/organization,
presentation, analysis, and interpretation of data.
The most commonly and widely used descriptive statistics are the measures of central tendency: the mean, median, and mode.

Measures of Central Tendency
Measures of central tendency describe the center of a data distribution. The mean, median, and mode are the measures used for this purpose. A distribution may be normal (symmetrical) or skewed. A skewed distribution is either skewed to the left (negatively skewed) or skewed to the right (positively skewed).

Normally distributed data, also called a bell curve, produce a bell-shaped, symmetrical graph. The mean, median, and mode all have the same value and coincide with the peak of the curve. The figure of normally distributed data is shown below:
[Figure: normally distributed data]

Positively skewed means that most values are concentrated on the left side of the graph, while the right side has a longer tail with fewer values. In this distribution, the mean is typically greater than the median. The figure of positively skewed data is shown below:
[Figure: positively skewed data]

Negatively skewed means that most values are concentrated on the right side of the graph, while the left side has a longer tail with fewer values. In this distribution, the mean is typically lower than the median, and the mode is typically the highest of the three. The figure of negatively skewed data is shown below:
[Figure: negatively skewed data]

Mean
The mean is the arithmetic average of a set of scores. It is obtained by adding all the scores and dividing the sum by the number of scores. It is the most reliable measure of central tendency because it uses every score in the distribution. For the same reason, it is affected by extremely low or high scores. There is only one value for the mean. The mean is used when the data are measured at the interval or ratio level and the distribution is normal.

Median
The median is the middle score: it divides the distribution of scores into two halves, the top half and the bottom half. It is preferred when the data contain extreme scores.
If there are two middle scores (when n is even), the median is the average of the two middle scores. As with the mean, there is only one value for the median. The median is used when the data are measured at the ordinal, interval, or ratio level, and when the distribution is skewed (either negatively or positively).

Mode
The mode is considered the most popular score because it is the score that occurs most frequently. It is the least reliable measure of central tendency. Unlike the mean and median, a distribution can have several modes, or none at all:
Unimodal - there is only one mode in the distribution.
Bimodal - there are two modes in the distribution.
Trimodal or multimodal - there are three (trimodal) or more (multimodal) modes in the distribution.
The mode can be used when the data are measured at the nominal, ordinal, interval, or ratio level.

Normality of Distribution
Skewness
Skewness quantifies the degree to which a distribution departs from the symmetry of the normal distribution, which is evenly distributed on both sides of its center.
Skewness value = 0: the distribution is symmetrical, as in a normal distribution.
Skewness value < 0: the distribution is negatively skewed.
Skewness value > 0: the distribution is positively skewed.

Skewness values are described as follows:
1) The data are fairly symmetrical if the skewness is between -0.5 and 0.5.
2) The data are moderately skewed if the skewness is between -1 and -0.5 or between 0.5 and 1.
3) The data are highly skewed if the skewness is less than -1 or greater than 1.

The standard errors of skewness and kurtosis are also used to check normality, with the z-scores for skewness and kurtosis serving as the rule. If the absolute z-scores of skewness and kurtosis are smaller than 1.96 (for a 5% Type I error rate), the data are considered normal (Field, 2009; Kim, 2013).
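The measures of central tendency and the skewness rules of thumb above can be checked with a short Python sketch.

- The sketch assumes the adjusted Fisher-Pearson sample skewness and the standard error sqrt(6n(n-1)/((n-2)(n+1)(n+3))), which many statistics packages report. The lecture does not state which formulas JASP uses.
- The function names and the scores are hypothetical.
- Treating "every score occurs once" as "no mode" is one convention among several.

```python
import math
import statistics
from collections import Counter


def modes(scores):
    """Return every most-frequent score; an empty list means no mode."""
    counts = Counter(scores)
    top = max(counts.values())
    if top == 1:  # every score occurs once, so treat the data as having no mode
        return []
    return sorted(score for score, count in counts.items() if count == top)


def skewness(scores):
    """Sample skewness (adjusted Fisher-Pearson G1), an assumed formula.

    Needs at least 3 scores and at least two distinct values.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    m2 = sum((x - mean) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1  # small-sample adjustment


def se_skewness(n):
    """Standard error of skewness for a sample of size n (assumed formula)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def describe_skewness(s):
    """Apply the lecture's rule-of-thumb thresholds."""
    if abs(s) <= 0.5:
        return "fairly symmetrical"
    if abs(s) <= 1:
        return "moderately skewed"
    return "highly skewed"


scores = [2, 3, 3, 4, 5, 6, 20]  # hypothetical scores with one extreme value
s = skewness(scores)
z = s / se_skewness(len(scores))  # z-score of skewness

print("mean:", round(statistics.mean(scores), 2))  # pulled up by the extreme score
print("median:", statistics.median(scores))
print("mode:", modes(scores))
print("skewness:", round(s, 2), describe_skewness(s))
print("z:", round(z, 2), "considered normal" if abs(z) < 1.96 else "not considered normal")
```

Here the single extreme score pulls the mean above the median. That matches the description of a positively skewed distribution.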
A z-score is obtained by dividing the skewness value (or the excess kurtosis value) by its standard error. For small samples (n < 50), z-values within ±1.96 are considered sufficient to accept the normality of the data.

Shapiro-Wilk Test
The Shapiro-Wilk test can be used to decide whether or not a sample comes from a normal distribution. It is commonly used for small samples. Its null hypothesis is that the data are normally distributed.
- If the chosen alpha level is 0.05 and the p-value is less than 0.05, the null hypothesis is rejected. This means that the distribution is not normal.
- If the p-value is greater than 0.05, the null hypothesis is not rejected. This means that there is no evidence against normality, and the data can be treated as normally distributed.

Measures of Variability
Measures of variability describe the spread of the data, or its variation around a central value. The variance, standard deviation, and interquartile range are the most commonly used measures of variability. Each measure is paired with a measure of central tendency:
- The standard deviation and variance are the measures of variability for the mean.
- The interquartile range is the measure of variability for the median.
- The range is the measure of variability for the mode.

Variance
The variance is a numerical value that indicates how much the individuals within a group differ. If individual observations differ greatly from the group mean, the variance is large. If they deviate little from the group mean, the variance is small. Formally, the variance is the expected value of the squared deviation from the mean, so it measures how far a set of numbers is spread out from the mean.

Standard deviation
Like the variance, the standard deviation is a numerical value that describes the variability of individual data points within a group.
A large standard deviation reflects a heterogeneous data set: there is large variation in the observations around the distribution mean. A small standard deviation means that the observations lie close to the group mean, so there is little variation (a homogeneous data set). The standard deviation is the square root of the variance. It describes, on average, the distance of the scores from the mean: the higher the standard deviation, the farther the scores are from the mean.

Interquartile range
The interquartile range is the difference between the upper and lower quartiles, that is, the spread of the middle 50% of a set of data. It indicates where the majority of the values are located.

Range
The range is the distance between the highest and the lowest score in the distribution.

Frequency Counts and Percentages in JASP
The primary goal of a study is often to test a hypothesis, for example, whether there is a significant difference in test anxiety between male and female students. Before that, we usually begin by finding out how many respondents in a data set fall into each category of a study variable. For example, we might want to know "What percentage of the 1,000 respondents are women?" We might also ask "What percentage of students strongly agreed with one of the items of a self-efficacy questionnaire?" Frequency counts answer such questions without counting the responses by hand.

To illustrate this, we will use the Excel file Sample Matrix of Data Lecture CSV, which is shown below:
[Figure: Sample Matrix of Data Lecture CSV]

Frequency Counts, Percentages, Descriptive Statistics, and Normality Distribution in JASP
Import the file into JASP. Determine the frequency and percentages of the data.
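JASP produces the frequency table from the menus. The same counts can be computed directly as a cross-check of what the table reports.

This minimal Python sketch assumes the following:
- The column names (`Sex`, `YearLevel`, `Program`) and the rows are hypothetical, not taken from the lecture file.
- Blank cells count as missing.
- Percentages are computed over the non-missing responses.

```python
import csv
import io
from collections import Counter


def read_column(csv_text, column):
    """Return one column from CSV text; blank cells become None (missing)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row[column] or "").strip() or None for row in reader]


def frequency_table(values):
    """Count each category and its percentage of the non-missing responses."""
    valid = [v for v in values if v is not None]
    missing = len(values) - len(valid)
    counts = Counter(valid)
    table = {
        category: (count, round(100 * count / len(valid), 2))
        for category, count in sorted(counts.items())
    }
    return table, missing


# Hypothetical excerpt; the real lecture file's columns may differ.
sample_csv = """Sex,YearLevel,Program
Male,First,Program A
Female,First,Program B
Female,Second,Program A
Male,Second,Program A
Female,First,Program B
"""

for column in ("Sex", "YearLevel", "Program"):
    table, missing = frequency_table(read_column(sample_csv, column))
    print(column, "| Missing:", missing)
    for category, (count, percent) in table.items():
        print(f"  {category}: {count} ({percent}%)")
```

A missing count of 0 for every column corresponds to a complete data set.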
If the table shows 0 Missing, the data set is complete.

Present the results:
The frequency table shows that out of ____ respondents, ____ were male, or _____%. There are ____ females, comprising _____%. Similarly, _____ or _____% are enrolled in the ________ year level, while _____ or ____% are enrolled in the ________ year level. In terms of program, ____ (_____%) are in the ________ program, while ____ (_____%) are in the ________ program.

The output shows the descriptive statistics of males and females on the engagement scale. The mean age of the ___ males is _____, with a standard deviation of ____. Similarly, the mean age of the ____ females is ___, with a standard deviation of ____. The output also shows that the ages of the males are more ________ than the ages of the females. The overall mean of the ____ students on the engagement scale is _____.

Activity 2. Present the results of the data that you have gathered.
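The group descriptives in the results template above can be reproduced outside JASP as a cross-check. These are the means, the standard deviations, and which group is more homogeneous.

This minimal sketch uses Python's standard `statistics` module. The group names and ages are hypothetical. `method="inclusive"` for the quartiles is an assumption. Packages use different quartile conventions, so the IQR may differ slightly from JASP's output.

```python
import statistics


def describe(scores):
    """Mean and the variability measures from the lecture for one group."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),  # sample SD (n - 1 denominator)
        "variance": statistics.variance(scores),  # sample variance
        "iqr": q3 - q1,  # spread of the middle 50%
        "range": max(scores) - min(scores),
    }


# Hypothetical ages; replace with the columns from your own data set.
ages = {
    "Male": [18, 19, 20, 21, 22],
    "Female": [17, 18, 18, 25, 30],
}

summary = {group: describe(values) for group, values in ages.items()}
for group, stats in summary.items():
    print(group, {name: round(value, 2) for name, value in stats.items()})

# A smaller SD means the group's scores cluster more tightly around its mean.
more_homogeneous = min(summary, key=lambda g: summary[g]["sd"])
print("More homogeneous group:", more_homogeneous)
```

With these hypothetical ages, the males have the smaller standard deviation, so their ages would be described as more homogeneous than the females' ages.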