Descriptive Statistics
Descriptive statistics is the term given to the analysis of data that helps describe, show or
summarize data in a meaningful way such that, for example, patterns might emerge from the
data. Descriptive statistics do not, however, allow us to make conclusions beyond the data we
have analysed or reach conclusions regarding any hypotheses we might have made. They are
simply a way to describe our data.
Descriptive statistics are very important because if we simply presented our raw data it would be
hard to visualize what the data was showing, especially if there was a lot of it. Descriptive
statistics therefore enables us to present the data in a more meaningful way, which allows
simpler interpretation of the data. For example, if we had the results of 100 pieces of students'
coursework, we may be interested in the overall performance of those students. We would also
be interested in the distribution or spread of the marks. Descriptive statistics allow us to do this.
How to properly describe data through statistics and graphs is an important topic and is discussed in other Laerd Statistics guides. Typically, there are two general types of statistics that are used to describe data:
Measures of central tendency: these are ways of describing the central position of a frequency
distribution for a group of data. In this case, the frequency distribution is simply the distribution
and pattern of marks scored by the 100 students from the lowest to the highest. We can describe
this central position using a number of statistics, including the mode, median, and mean. You can
learn more in our guide: Measures of Central Tendency.
Measures of spread: these are ways of summarizing a group of data by describing how spread out
the scores are. For example, the mean score of our 100 students may be 65 out of 100. However,
not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be
lower and others higher. Measures of spread help us to summarize how spread out these scores
are. To describe this spread, a number of statistics are available to us, including the range,
quartiles, absolute deviation, variance and standard deviation.
When we use descriptive statistics it is useful to summarize our group of data using a
combination of tabulated description (i.e., tables), graphical description (i.e., graphs and charts)
and statistical commentary (i.e., a discussion of the results).
Inferential Statistics
We have seen that descriptive statistics provide information about our immediate group of data.
For example, we could calculate the mean and standard deviation of the exam marks for the 100
students and this could provide valuable information about this group of 100 students. Any group
of data like this, which includes all the data you are interested in, is called a population. A
population can be small or large, as long as it includes all the data you are interested in. For
example, if you were only interested in the exam marks of 100 students, the 100 students would
represent your population. Descriptive statistics are applied to populations, and the properties of
populations, like the mean or standard deviation, are called parameters as they represent the
whole population (i.e., everybody you are interested in).
Often, however, you do not have access to the whole population you are interested in
investigating, but only a limited number of data instead. For example, you might be interested in
the exam marks of all students in the UK. It is not feasible to measure all exam marks of all
students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100
students), which are used to represent the larger population of all UK students. Properties of
samples, such as the mean or standard deviation, are not called parameters, but statistics.
Inferential statistics are techniques that allow us to use these samples to make generalizations
about the populations from which the samples were drawn. It is, therefore, important that the
sample accurately represents the population. The process of achieving this is called sampling
(sampling strategies are discussed in detail in the section, Sampling Strategy, on our sister site).
Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a
sample is not expected to perfectly represent the population. The methods of inferential statistics
are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses.
Both descriptive and inferential statistics rely on the same set of data. Descriptive statistics rely solely on this set of data, whilst inferential statistics also rely on this data in order to make generalizations about a larger population.
What are the strengths of using descriptive statistics to examine a distribution of scores?
Beyond the clarity with which descriptive statistics can summarize large volumes of data, there is no uncertainty about the values you get (other than measurement error, etc.), because you are describing every observation rather than estimating from a sample.
Descriptive statistics are limited in that they only allow you to draw conclusions about the people or objects that you have actually measured. You cannot use the data you have collected to generalize to other people or objects (i.e., using data from a sample to infer the properties/parameters of a population). For example, if you tested a drug to beat cancer and it worked in your patients, you could not claim that it would work in other cancer patients relying on descriptive statistics alone (but inferential statistics would give you this opportunity).
CENTRAL TENDENCY
In statistics we have to deal with the mean, mode and the median. These are also called the
„Central Tendency“. These are just three different kinds of „averages” and certainly the most
popular ones.
The mean is simply the average and is considered the most reliable measure of central tendency for making inferences about a population from a single sample. Central tendency describes the tendency of the values in your data to cluster around the mean, mode, or median. The mean is computed by summing all values and dividing by the number of values.
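As a minimal sketch in Python (the exam marks here are made up for illustration):

```python
# Hypothetical exam marks for five students.
scores = [70, 55, 65, 80, 60]

# Mean: sum of all values divided by the number of values.
mean = sum(scores) / len(scores)
print(mean)  # 66.0
```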
The mode is the value or category that occurs most often within the data. A dataset therefore has no mode if no number is repeated or no category appears more than once. It is also possible for a dataset to have more than one mode, which I will cover in the "Modality" section below. The mode is the only measure of central tendency that can be used for categorical variables, since you cannot compute, for example, the average of the variable "gender". You simply report categorical variables as counts and percentages.
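A quick sketch of finding the mode of a categorical variable, using a made-up sample and the standard library's Counter:

```python
from collections import Counter

# Mode for a categorical variable, where an average is undefined.
genders = ["female", "male", "female", "female", "male"]

counts = Counter(genders)
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)  # female 3

# Categorical variables are reported as counts and percentages.
for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / len(genders):.0f}%)")
```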
The median is the "middle" value or midpoint in your data and is also called the "50th percentile". Note that the median is much less affected by outliers and skewed data than the mean. I will explain this with an example: imagine you have a dataset of housing prices that range mostly from $100,000 to $300,000 but contains a few houses that are worth more than 3 million dollars. These expensive houses will heavily affect the mean, since it is the sum of all values divided by the number of values. The median will not be heavily affected by these outliers, since it is only the "middle" value of all data points. The median is therefore a much better-suited statistic to report for such data.
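A small sketch of this effect, using invented prices in the range described above plus a single $3,000,000 outlier:

```python
import statistics

# Hypothetical housing prices: mostly $100,000-$300,000, plus one outlier.
prices = [120_000, 180_000, 200_000, 250_000, 300_000, 3_000_000]

mean_price = statistics.mean(prices)      # pulled far upward by the outlier
median_price = statistics.median(prices)  # barely affected by it

print(mean_price)    # 675000
print(median_price)  # 225000.0
```

The mean lands well above every typical house in the dataset, while the median still describes a typical house.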
In a normal distribution, these measures all fall at the same midline point. This means that the
mean, mode and median are all equal.
MEASURES OF VARIABILITY
The most popular variability measures are the range, interquartile range (IQR), variance, and
standard deviation. These are used to measure the amount of spread or variability within your
data.
The range describes the difference between the largest and the smallest points in your data.
The interquartile range (IQR) is a measure of statistical dispersion between upper (75th) and
lower (25th) quartiles.
While the range measures where your data points begin and end, the interquartile range is a measure of where the majority of the values lie.
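A short sketch of both measures on a made-up dataset; note that `statistics.quantiles` defaults to the "exclusive" method, so other tools may report slightly different quartiles:

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11, 12, 15, 21]  # hypothetical scores, sorted

# Range: difference between the largest and smallest data point.
data_range = max(data) - min(data)  # 19

# Quartiles: with n=4, statistics.quantiles returns Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # spread of the middle 50% of the values
print(data_range, iqr)
```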
The difference between the standard deviation and the variance is often a little bit hard to grasp
for beginners, but I will explain it thoroughly below.
The standard deviation and the variance, like the range and IQR, measure how spread apart our data is (i.e., the dispersion). Both are derived from the mean.
The variance is computed by finding the difference between every data point and the mean,
squaring them, summing them up and then taking the average of those numbers.
The squared differences are used in the calculation because they weight outliers more heavily than points that are near the mean, and because squaring prevents differences above the mean from cancelling out differences below the mean.
The problem with Variance is that because of the squaring, it is not in the same unit of
measurement as the original data.
Let’s say you are dealing with a dataset that contains centimeter values. Your variance would be
in squared centimeters and therefore not the best measurement.
This is why the Standard Deviation is used more often because it is in the original unit. It is
simply the square root of the variance and because of that, it is returned to the original unit of
measurement.
Let’s look at an example that illustrates the difference between variance and standard deviation:
Imagine a data set that contains centimeter values between 1 and 15, which results in a mean of
8. Squaring the difference between each data point and the mean and averaging the squares
renders a variance of 18.67 (squared centimeters), while the standard deviation is 4.3
centimeters.
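The example above can be reproduced in a few lines of Python:

```python
import statistics

# The example above: centimeter values 1 through 15, mean 8.
data = list(range(1, 16))
mean = statistics.mean(data)

# Population variance: average of the squared deviations from the mean.
variance = sum((x - mean) ** 2 for x in data) / len(data)
std_dev = variance ** 0.5  # square root brings us back to centimeters

print(round(variance, 2))  # 18.67
print(round(std_dev, 1))   # 4.3
```

The standard library's `statistics.pvariance()` and `statistics.pstdev()` return the same values without the manual loop.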
When you have a low standard deviation, your data points tend to be close to the mean. A high
standard deviation means that your data points are spread out over a wide range.
Standard deviation is best used when data is unimodal. In a normal distribution, approximately 34% of the data points lie between the mean and one standard deviation above or below the mean. Since a normal distribution is symmetrical, 68% of the data points fall between one standard deviation above and one standard deviation below the mean. Approximately 95% fall between two standard deviations below and two standard deviations above the mean, and approximately 99.7% fall between three standard deviations below and three above the mean.
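This 68-95-99.7 rule can be checked empirically, here with a simulation of 100,000 draws from a standard normal distribution (the seed and sample size are arbitrary choices for this sketch):

```python
import random
import statistics

# Simulate draws from a normal distribution with mean 0 and sd 1.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100_000)]

mu = statistics.mean(data)
sigma = statistics.pstdev(data)

# Fraction of points within k standard deviations of the mean.
coverage = {}
for k in (1, 2, 3):
    inside = sum(mu - k * sigma <= x <= mu + k * sigma for x in data)
    coverage[k] = inside / len(data)
    print(f"within {k} standard deviation(s): {coverage[k]:.1%}")
```

The printed fractions come out close to 68%, 95%, and 99.7%.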
MODALITY
A distribution can be unimodal, bimodal, or multimodal.
Unimodal means that the distribution has only one peak, i.e., only one frequently occurring score, clustered at the top. A bimodal distribution has two values that occur frequently (two peaks), and a multimodal distribution has more than two frequently occurring values.
SKEWNESS
We speak of a positive skew if the data is piled up on the left, which leaves the tail pointing to the right.
A negative skew occurs if the data is piled up on the right, which leaves the tail pointing to the left. Note that positive skews occur more frequently than negative ones.
A good measure of the skewness of a distribution is Pearson's skewness coefficient, which provides a quick estimate of a distribution's symmetry. In pandas you can compute the skewness simply by calling the "skew()" function.
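As a sketch using only the standard library, Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation, on invented right-skewed data (pandas' `skew()` computes the adjusted Fisher-Pearson sample skewness instead, which differs numerically but agrees in sign):

```python
import statistics

# Right-skewed (positively skewed) data: a long tail to the right.
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.pstdev(data)

# Pearson's second skewness coefficient: 3 * (mean - median) / sd.
# Positive values indicate a right skew, negative values a left skew.
pearson_skew = 3 * (mean - median) / sd
print(round(pearson_skew, 2))  # clearly positive for this data
```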
KURTOSIS
Kurtosis describes how heavy or light the tails of a distribution are compared to a normal distribution.
A good way to measure the kurtosis of a distribution mathematically is Fisher's measure of kurtosis, also called excess kurtosis, which subtracts 3 from the fourth standardized moment so that a normal distribution scores zero.
A normal distribution is called mesokurtic and has a kurtosis of, or around, zero. A platykurtic distribution has negative kurtosis, and its tails are very thin compared to the normal distribution. Leptokurtic distributions have positive kurtosis, and their fat tails mean that the distribution produces more extreme values, so more of its variance comes from infrequent extreme deviations.
If you have already recognized that a distribution is skewed, you don't need to calculate its kurtosis, since the distribution is already not normal. In pandas you can view the kurtosis simply by calling the "kurtosis()" function.
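A minimal sketch of Fisher's excess kurtosis using only the standard library (pandas' `kurtosis()` applies a sample-size bias correction on top of this, so its numbers differ slightly):

```python
import statistics

def excess_kurtosis(data):
    """Fisher's excess kurtosis: the fourth standardized moment minus 3,
    so a normal distribution scores approximately 0."""
    n = len(data)
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum((x - mu) ** 4 for x in data) / (n * sd ** 4) - 3

# A uniform distribution has very thin tails, so it is platykurtic:
# its excess kurtosis is negative (about -1.2).
print(round(excess_kurtosis(list(range(1, 101))), 2))
```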
Summary
This post gave you a proper introduction to descriptive statistics. You learned what a Normal
Distribution looks like and why it is important. Furthermore, you gained knowledge about the
three different kinds of averages (mean, mode and median), also called the Central Tendency.
Afterwards, you learned about the range, interquartile range, variance and standard deviation.
Then we discussed the three types of modality and that you can describe how much a distribution
differs from a normal distribution in terms of Skewness. Lastly, you learned about Leptokurtic,
Mesokurtic and Platykurtic distributions.