Statistical Analysis_ Descriptive Stat (2)

Statistical Analysis_ Descriptive Stat

Uploaded by

henthent1
Copyright
© All Rights Reserved

Descriptive Statistics: Meaning, Type

Descriptive statistics are brief summary coefficients that describe a given data set, which can
represent either an entire population or a sample of a population. Descriptive statistics are
broken down into measures of central tendency and measures of variability (spread). Measures of
central tendency include the mean, median, and mode, while measures of variability include
standard deviation, variance, and the minimum and maximum values. Kurtosis and skewness are often
listed alongside them, although they describe the shape of the distribution rather than its spread.

Descriptive statistics help describe and explain the features of a specific data set by giving short
summaries about the sample and measures of the data. The most recognized descriptive statistics
are the measures of center: the mean, median, and mode, which are used at almost all levels of
math and statistics to define and describe a data set. The mean, or the average, is calculated by
adding all the figures within the data set and then dividing by the number of figures within the set.
For example, the sum of the following data set is 20: (2, 3, 4, 5, 6). The mean is 4 (20/5). The
mode of a data set is the value appearing most often, and the median is the figure situated in the
middle of the data set once it is sorted. It is the figure separating the higher figures from the
lower figures within a data set. However, there are less common types of descriptive statistics
that are still very important.

Types of Descriptive Statistics


Central Tendency

Measures of central tendency focus on the average or middle values of data sets, whereas
measures of variability focus on the dispersion of data. These two measures use graphs, tables,
and general discussions to help people understand the meaning of the analyzed data.

Measures of central tendency describe the center position of a distribution for a data set. A
person analyzes the frequency of each data point in the distribution and describes it using the
mean, median, or mode, each of which summarizes the typical value of the analyzed data set.

Measures of Variability

Measures of variability (or measures of spread) aid in analyzing how dispersed the distribution is
for a set of data. For example, while the measures of central tendency may give a person the
average of a data set, they do not describe how the data is distributed within the set.

So while the average of the data might be 65 out of 100, there can still be data points at both 1
and 100. Measures of variability help communicate this by describing the shape and spread of
the data set. Range, quartiles, absolute deviation, and variance are all examples of measures of
variability.

Consider the following data set: 5, 19, 24, 62, 91, 100. The range of that data set is 95, which is
calculated by subtracting the lowest number (5) in the data set from the highest (100).
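
As a minimal sketch, the range calculation above can be checked in Python. The variable names are illustrative only and are not part of the original text.

```python
# Data set from the range example above.
data = [5, 19, 24, 62, 91, 100]

# Range = highest value minus lowest value.
data_range = max(data) - min(data)

print(data_range)  # 95
```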

Distribution

Distribution (or frequency distribution) refers to the number of times a data point occurs.
Alternatively, it can be how many times a data point fails to occur. Consider this data set: male,
male, female, female, female, other. The distribution of this data can be classified as:

● The number of males in the data set is 2.
● The number of females in the data set is 3.
● The number of individuals identifying as other is 1.
● The number of non-males is 4.
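
One possible way to tabulate this frequency distribution is with Python's standard `collections.Counter`. This is only a sketch; the variable names are assumptions made for illustration.

```python
from collections import Counter

# Data set from the distribution example above.
observations = ["male", "male", "female", "female", "female", "other"]

# Count how many times each category occurs.
counts = Counter(observations)

# Count how many observations are NOT "male" (the "fails to occur" view).
non_males = sum(n for label, n in counts.items() if label != "male")

print(counts["male"], counts["female"], counts["other"], non_males)  # 2 3 1 4
```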

Univariate vs. Bivariate


In descriptive statistics, univariate analysis examines only one variable. It is used to identify
characteristics of a single trait and is not used to analyze any relationships or causations.

For example, imagine a room full of high school students. Say you wanted to gather the average
age of the individuals in the room. This univariate data is only dependent on one factor: each
person's age. By gathering this one piece of information from each person, adding the ages
together, and dividing by the total number of people, you can determine the average age.

Bivariate analysis, on the other hand, attempts to link two variables by searching for correlation.
Two types of data are collected, and the relationship between the two pieces of information is
analyzed together. Because more than one variable is analyzed, bivariate analysis is the simplest
case of multivariate analysis.

Let's say each high school student in the example above takes a college assessment test, and we
want to see whether older students are testing better than younger students. In addition to
gathering the ages of the students, we need to find out each student's test score. Then, using data
analytics, we mathematically or graphically depict whether there is a relationship between
student age and test scores.
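
To make the bivariate idea concrete, here is a minimal sketch that computes the Pearson correlation coefficient between age and test score. The ages and scores are hypothetical values invented for illustration, and `pearson_r` is an assumed helper name; neither comes from the original text. Pearson correlation is one common choice for measuring a linear relationship, not the only one.

```python
import math

# Hypothetical ages and test scores, invented purely for illustration.
ages = [14, 15, 16, 17, 18]
scores = [62, 58, 71, 69, 80]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)


r = pearson_r(ages, scores)
print(round(r, 3))
```

A value of r close to +1 would suggest that older students in this made-up sample tend to score higher, a value close to -1 the opposite, and a value near 0 little linear relationship. Correlation alone does not establish causation.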

MEASURES OF CENTRAL TENDENCY

1. MEAN (Arithmetic Average):

Mean is the arithmetic average, computed by summing all the values in the data set and dividing
the sum by the number of data values. For a finite data set with measurement values X1,
X2, …, Xn (a set of n numbers), it is defined by the formula:

Mean Formula: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \dots + X_n}{n}$
The sample mean is represented by x-bar.
The population mean is represented by the Greek letter µ.

For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12
Sum of data points = (12+14+11+12+12+12+15+17+22+15+12) = 154
Number of data points = (take a total count of observations) = 11
Mean = (Divide sum of data points by total number of data points) = 154/11 = 14
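
The same arithmetic can be sketched in a few lines of Python. The variable names are illustrative assumptions.

```python
# Data set from the mean example above.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

total = sum(data)     # sum of data points
count = len(data)     # number of data points
mean = total / count  # arithmetic mean

print(total, count, mean)  # 154 11 14.0
```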

2. MEDIAN:
The median is the middle number in the data set when the values are arranged in ascending order
(small to large). If the number of observations is odd, the median is the ((n+1)/2)th ordered
value. If the number of observations is even, the median is the average of the two middle values.

For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12
Ascending Order: 11, 12, 12, 12, 12, 12, 14, 15, 15, 17, 22
With n = 11, the median is the (11+1)/2 = 6th ordered value. Thus, Median = 12
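
A minimal sketch of the odd/even rule described above follows. The function name `median_of` and the four-value list used for the even case are assumptions added for illustration; Python's standard `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]
odd_median = median_of(data)

# Hypothetical even-length list, used only to show the second branch.
even_median = median_of([11, 12, 14, 15])

print(odd_median, even_median)  # 12 13.0
```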

3. MODE:
Mode is the data point having the highest frequency (maximum occurrences).

For a given data set: 12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12
Maximum occurring data point, Mode = 12
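
As a sketch, the mode can be found with Python's standard `statistics` module. `statistics.multimode` is used here because a data set can have more than one most frequent value; the tied list is a hypothetical example added for illustration.

```python
import statistics

# Data set from the mode example above.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# multimode returns every value that shares the highest frequency.
modes = statistics.multimode(data)

# Hypothetical list with a tie, to show that more than one mode is possible.
tied_modes = statistics.multimode([1, 1, 2, 2, 3])

print(modes, tied_modes)  # [12] [1, 2]
```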

4. QUARTILES:
A quartile is any of the three values which divide the sorted data set into four equal parts, so that
each part represents one fourth of the sampled population.
● First quartile (designated Q1) = lower quartile = cuts off lowest 25% of data = 25th
percentile
● Second quartile (designated Q2) = median = cuts data set in half = 50th percentile
● Third quartile (designated Q3) = upper quartile = cuts off highest 25% of data, or lowest
75% = 75th percentile
● The difference between the upper and lower quartiles is called the interquartile range.
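
The quartiles of the data set used in the earlier examples can be sketched with `statistics.quantiles` (Python 3.8 or later). Note that several conventions exist for interpolating quartiles, and in general they can give slightly different answers; this sketch uses the library's default method and is not necessarily the convention the original handout had in mind.

```python
import statistics

# Same data set as in the mean, median, and mode examples.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

# n=4 splits the sorted data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: difference between the upper and lower quartiles.
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 12.0 12.0 15.0 3.0
```

Here Q2 matches the median found earlier (12), as expected.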

MEASURES OF DISPERSION/VARIATION

1. STANDARD DEVIATION:
It can be interpreted as the typical distance of the individual observations from the mean
(strictly, the square root of the average squared distance from the mean).
Standard deviation of the population is represented as "σ". Standard deviation of the sample is
represented as "s".

Standard Deviation Formula: $s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

In the above formula,

Sx stands for standard deviation of the sample.
xi is the value of each observation in the data set.
x bar represents the mean.
n is the total sample size.
And Σ stands for summation, i.e. it says that we need to take the sum of the squared differences
“(xi – x bar)²” for all values of x.
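
The sample standard deviation can be computed step by step as sketched below, using the 11-value data set from the central tendency examples. The variable names are illustrative assumptions, and the divisor n - 1 follows the worked variance example later in this document.

```python
import math

# Data set from the central tendency examples.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

n = len(data)
mean = sum(data) / n  # x bar

# Sum of squared differences between each value and the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum_sq_dev / (n - 1))

print(sum_sq_dev, round(sample_sd, 4))  # 104.0 3.2249
```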

2. VARIANCE:
Variance is defined as the square of standard deviation. Variance of the population is represented
as σ² (σ times σ). Variance for the sample is represented as s² ("s times s").

Variance Formula: $s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

In the above formula,

Sx² stands for the variance of the sample.
xi is the value of each observation in the data set.
x bar represents the mean.
n is the total sample size.
And Σ stands for summation, i.e. it says that we need to take the sum of the squared differences
“(xi – x bar)²” for all values of x.
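
The difference between sample and population variance can be illustrated with Python's standard `statistics` module: `variance` divides by n - 1 and `pvariance` divides by n. This sketch reuses the 11-value data set from the central tendency examples; the variable names are assumptions.

```python
import math
import statistics

# Data set from the central tendency examples.
data = [12, 14, 11, 12, 12, 12, 15, 17, 22, 15, 12]

sample_var = statistics.variance(data)       # divides by n - 1
population_var = statistics.pvariance(data)  # divides by n

# Variance is the square of the standard deviation.
sample_sd = statistics.stdev(data)

print(sample_var, round(population_var, 4))          # 10.4 9.4545
print(math.isclose(sample_sd ** 2, sample_var))      # True
```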

3. RANGE:
Range is defined as the difference between the largest value in a data set and the smallest value
in a data set.

Range Formula: Range = ValueMax – ValueMin

ValueMax stands for the highest (maximum) value in the data set and ValueMin stands for the
lowest (minimum) value in the data set.

For a given data set: 12, 13, 11, 12, 12

Range: 13 – 11 = 2
Mean: (12+13+11+12+12) / 5 = 12
Variance: Sum of [(X – mean) times (X – mean)] / (n – 1) = [0+1+1+0+0] / (5 – 1) = 2 / 4 = 0.50
Standard Deviation: Square Root of 0.50 = 0.7071
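
The four results of this worked example can be double-checked with a short Python sketch that relies on the standard `statistics` module. The variable names are illustrative assumptions.

```python
import statistics

# Data set from the worked example above.
data = [12, 13, 11, 12, 12]

data_range = max(data) - min(data)    # largest minus smallest value
mean = statistics.mean(data)          # arithmetic average
variance = statistics.variance(data)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)      # square root of the sample variance

print(data_range, mean, variance, round(std_dev, 4))  # 2 12 0.5 0.7071
```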
Source:

https://www.investopedia.com/

https://www.sixsigma-institute.org/
