
Freq. Distribution Characteristics

Uploaded by

thangams
Copyright
© © All Rights Reserved

Characteristics of Frequency Distribution — Descriptive Statistics

Frequency Distribution — In statistics, a frequency distribution is a list, table, graph, or data set organized to show how often each possible outcome of a repeatable event occurs when the event is observed many times.

For example, in the list of numbers (1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9), the frequency of the number 9 is 5 because it occurs 5 times.
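The count in the example above can be reproduced with a short sketch. It assumes Python and its standard-library `Counter`; the variable names are illustrative only.

```python
from collections import Counter

# The list of numbers from the example above
data = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]

# Counter maps each distinct value to the number of times it occurs
frequency = Counter(data)

# Print the frequency table in ascending order of value
for value in sorted(frequency):
    print(value, frequency[value])

print("Frequency of 9:", frequency[9])
```

The resulting table is the frequency distribution of the list: each distinct value paired with its number of occurrences.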

A frequency distribution is described by four characteristics:

Modality

Symmetry

Measure of Central Tendency

Measure of Dispersion or Variability

Modality
Modality — The modality of a distribution is determined by the number

of peaks it contains.

Types of Modality: Unimodal, Bimodal, Multimodal.

Types of Modality

Unimodal — A unimodal distribution has one value that occurs most frequently (one peak).

Bimodal — A bimodal distribution has two values that occur frequently (two peaks).

Multimodal — A multimodal distribution has three or more frequently occurring values (more than two peaks).

Symmetry

Symmetry — A distribution is symmetric when one half of the distribution is a mirror image of the other half.

Types of Symmetry: Symmetric, Asymmetric


Normal Curve(Symmetric) & Positive/Negative Skew(Asymmetric)

Symmetric — The normal distribution is a symmetric distribution with no skew. Its two tails are exactly the same, so the bell curve is equal on both sides of the center.

Normal Bell Curve (Symmetric)

Asymmetric — Asymmetry is the absence of, or a violation of, symmetry: the distribution is not identical on both sides of a central line.

Types of Asymmetry: Positive Skewness, Negative Skewness

Positive Skewness — Positive skewness is when the tail on the right side of the distribution is longer or fatter than the tail on the left side. The mean and median will typically be greater than the mode.

Negative Skewness — Negative skewness is when the tail on the left side of the distribution is longer or fatter than the tail on the right side. The mean and median will typically be less than the mode.


Negative & Positive Skewness(Asymmetric)
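One way to check the direction of skew numerically is the moment coefficient of skewness (third central moment divided by the 1.5th power of the second). This is a minimal sketch: the text does not name a specific formula, and the data sets below are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, chosen only to illustrate the sign of the result
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]   # long tail on the right
left_tailed = [-x for x in right_tailed]   # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)   # positive skew
print(skewness(left_tailed) < 0)    # negative skew
print(skewness(symmetric) == 0)     # no skew
```

A positive result indicates a longer right tail, a negative result a longer left tail, and zero a symmetric data set.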

Measure of Central Tendency

A measure of central tendency is a single value that attempts to

describe a set of data by identifying the central position within that set
of data. As such, measures of central tendency are sometimes

called measures of central location. They are also classed as summary

statistics.

In other words, central tendency computes the “center” around which

the data is distributed.

The mean, median and mode are all valid measures of central tendency:

Mean — (Average value)

Median — (Middle value)

Mode — (Value that occurs the maximum number of times)

Mean
The mean is equal to the sum of all the values in the data set divided by

the number of values in the data set.

For example, suppose we have 10 random numbers (1, 5, 2, 8, 4, 55, 9, 7, 3, 6). First we add all 10 numbers:

Sum of all 10 numbers → 1 + 5 + 2 + 8 + 4 + 55 + 9 + 7 + 3 + 6 = 100

Then we divide the sum by the count:

Mean → 100 / 10 = 10
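The arithmetic for the mean can be checked with a few lines of Python. This is only a sketch of the definition (sum divided by count), with illustrative variable names.

```python
# The 10 numbers from the example above
values = [1, 5, 2, 8, 4, 55, 9, 7, 3, 6]

total = sum(values)          # sum of all the values
count = len(values)          # number of values
mean = total / count         # mean = sum / count

print(total, count, mean)
```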

So, if we have n values in a data set and they have values x1, x2, …, xn, the sample mean, usually denoted by x̄ (pronounced “x bar”), is:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

This formula is usually written in a slightly different manner using the Greek capital letter ∑, pronounced “sigma”, which means “sum of…”:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

You may have noticed that the above formula refers to the sample mean.

So, why have we called it a sample mean?


Please take a look at my previous post, Population and Sample, to get a better understanding of samples and populations.

This is because, in statistics, samples and populations have very

different meanings and these differences are very important, even if, in

the case of the mean, they are calculated in the same way. To

acknowledge that we are calculating the population mean and not the

sample mean, we use the Greek lower case letter “mu”, denoted as μ:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Here n is the number of values in the population.

Disadvantages of Mean:

Let us take the example above to summarize.

We have 10 random numbers (1, 5, 2, 8, 4, 55, 9, 7, 3, 6).

Let us assume these 10 numbers are the salaries of 10 employees, in thousands:

(1k, 5k, 2k, 8k, 4k, 55k, 9k, 7k, 3k, 6k)

Outlier — Outliers are data points that are far from other data points.

In other words, they’re unusual values in a dataset.


Here one employee has a much larger salary (55k) than the others. This value is far from the other data points and affects the summary of the whole data set, so it is called an outlier.

Note: The mean is highly affected by outliers. Here the mean is being skewed by the one large salary: it is 10k, yet nine of the ten employees earn less than that. Therefore, in this situation, we would like to have a better measure of central tendency. As we will find out later, the median is a better measure of central tendency in this situation.
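The effect of the outlier can be seen by computing the mean and the median with and without the 55k salary. This sketch assumes Python's standard `statistics` module; dropping the outlier is done here only for comparison, not as a recommended cleaning step.

```python
from statistics import mean, median

# Salaries in thousands, from the example above
salaries = [1, 5, 2, 8, 4, 55, 9, 7, 3, 6]

# Same data with the single outlier (55) left out, for comparison only
without_outlier = [s for s in salaries if s != 55]

print("with outlier:   ", mean(salaries), median(salaries))
print("without outlier:", mean(without_outlier), median(without_outlier))
```

Removing one value moves the mean from 10 to 5, while the median only moves from 5.5 to 5, which is why the median is the more robust summary here.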

Median

The median is a simple measure of central tendency. To find the median,

we arrange the observations in order from smallest to largest value. If

there is an odd number of observations, the median is the middle value.

If there is an even number of observations, the median is the average of

the two middle values.

Simple way to remember: Middle Value is called Median

If the data count is odd:

1, 7, 6, 9, 8, 2, 3, 5, 4 → Arrange in ascending order

1, 2, 3, 4, 5, 6, 7, 8, 9 → Total count = 9 (odd number)

The middle value is the median → Median = 5

If the data count is even:

1, 7, 6, 9, 8, 2, 3, 5, 4, 10 → Arrange in ascending order

1, 2, 3, 4, 5, 6, 7, 8, 9, 10 → Total count = 10 (even number)

The middle 2 values are 5 and 6 → Average the 2 numbers to get the median

Median = (5 + 6) / 2 = 5.5
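The odd and even cases can be written as one small function. This is a minimal sketch of the procedure described above (sort, then take the middle value or the average of the two middle values); the function name is illustrative, and Python's built-in `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value of the data, averaging the two middle values
    when the count is even."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 7, 6, 9, 8, 2, 3, 5, 4]))       # odd count
print(median([1, 7, 6, 9, 8, 2, 3, 5, 4, 10]))   # even count
```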

Mode

The mode is the number that appears most often in a set of numbers. The mode is also used for categorical data, where we wish to know which is the most common category.

Example: in {5, 4, 6, 5, 9, 5, 7, 3} the mode is 5 (it occurs most often).

A problem with the mode is that it will not provide us with a very good measure of central tendency when the most common value is far away from the rest of the data in the data set.

Note: In such a data set, using the mode to describe the central tendency would be misleading.
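Finding the mode amounts to counting occurrences and picking the largest count. The sketch below assumes Python's `Counter`; the function name and the colour list are hypothetical, added only to show that the same approach works for categorical data.

```python
from collections import Counter

def mode(values):
    """Return (most common value, number of occurrences).
    If several values tie, the one encountered first is returned."""
    value, count = Counter(values).most_common(1)[0]
    return value, count

print(mode([5, 4, 6, 5, 9, 5, 7, 3]))            # numeric example from the text
print(mode(["red", "blue", "red", "green"]))     # hypothetical categorical data
```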

Measure of Dispersion or Variability


Measures of dispersion describe the spread of the data, that is, how spread out a group of data is. Variability is also referred to as spread, scatter or dispersion.

The common measures of variability are the range, interquartile range (IQR), variance and standard deviation. The range, the difference between the highest and lowest values, is the simplest measure of variability.

Measures of variability or dispersion are descriptive statistics that

can only be used to describe the data in a given data set or study.
Range
Variance
Standard Deviation
Inter Quartile Range (IQR)

Range

The range is the difference between the highest and lowest values.

Range = Maximum Value - Minimum Value (Max - Min)

Example: In {2, 4, 6, 9, 3, 7, 10}, after arranging the values in ascending order, the lowest value is 2 and the highest is 10.

Range = 10 - 2 = 8
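In code, the range needs no explicit sorting, since the maximum and minimum can be read off directly. A minimal sketch with illustrative names:

```python
# Data from the example above
values = [2, 4, 6, 9, 3, 7, 10]

# Range = Max - Min
data_range = max(values) - min(values)

print(min(values), max(values), data_range)
```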

Variance

The variance measures the average degree to which each point differs from the mean, the average of all data points.

Variance measures variability from the average or mean. For example, the variance statistic can help determine the risk an investor assumes when purchasing a specific security. A large variance indicates that numbers in the set are far from the mean and from each other, while a small variance indicates the opposite.

Unlike the range and quartiles, the variance combines all the values in a data set to produce a measure of spread. … It is calculated as the average squared deviation of each number from the mean of a data set.

For example, for the numbers 1, 2, and 3 the mean is 2 and the variance is ((1 - 2)² + (2 - 2)² + (3 - 2)²) / 3 = 2/3 ≈ 0.667.

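$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$

Here μ is the mean and n is the number of values. This is the population form, which gives 0.667 for the numbers 1, 2, and 3; the sample variance divides by n - 1 instead.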
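The figure of 0.667 for the numbers 1, 2, and 3 corresponds to dividing by the number of values (the population form). The sketch below computes it by hand and compares it with Python's `statistics` module; the sample variance, which divides by n - 1, is shown alongside only for contrast.

```python
import statistics

values = [1, 2, 3]

# Population variance: average squared deviation from the mean
mu = sum(values) / len(values)
population_variance = sum((x - mu) ** 2 for x in values) / len(values)

# Sample variance divides by (n - 1) instead of n
sample_variance = statistics.variance(values)

print(round(population_variance, 3))    # matches the 0.667 quoted in the text
print(statistics.pvariance(values))     # library equivalent of the hand calculation
print(sample_variance)
```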

Standard Deviation

The standard deviation is a statistic that measures the dispersion of a

dataset relative to its mean and is calculated as the square root of the

variance. If the data points are further from the mean, there is a

higher deviation within the data set; thus, the more spread out the data,

the higher the standard deviation.


$$\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2}$$
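Since the standard deviation is the square root of the variance, it can be sketched in two steps. The example reuses the numbers 1, 2, and 3 from the variance section and assumes the population form (dividing by the number of values).

```python
import math
import statistics

values = [1, 2, 3]

# Step 1: population variance (average squared deviation from the mean)
mu = sum(values) / len(values)
variance = sum((x - mu) ** 2 for x in values) / len(values)

# Step 2: standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print(round(std_dev, 3))
print(statistics.pstdev(values))   # library equivalent (population form)
```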

Inter Quartile Range (IQR)

Before going to the IQR, let us look at percentiles and quartiles.

Percentile — The Nth percentile is the value such that at least N% of the values are less than or equal to it and at least (100 - N)% are greater than or equal to it.

Put simply, being at the Nth percentile means that N% of people are below me.

Percentile Formula
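Several conventions for computing percentiles exist, and the one intended here is not recoverable from the text. As a hedged illustration, the sketch below computes a percentile rank as the percentage of values less than or equal to a given value; the function name and the scores are hypothetical.

```python
def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x.
    This is one common convention, assumed here for illustration."""
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)

# Hypothetical scores, 1 through 10
scores = list(range(1, 11))

print(percentile_rank(scores, 5))
print(percentile_rank(scores, 10))
```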

Quartile — In statistics, a quartile is a type of quantile which divides the

number of data points into four parts, or quarters, of more-or-less equal

size. The data must be ordered from smallest to largest to compute

quartiles; as such, quartiles are a form of order statistic.


Dividing the data into four parts (quarters)

Q1 → 1st Quartile — 25th Percentile

Q2 → 2nd Quartile — 50th Percentile

Q3 → 3rd Quartile — 75th Percentile

Inter Quartile Range — The IQR describes the middle 50% of values

when ordered from lowest to highest. To find the interquartile

range (IQR), first find the median (middle value) of the lower and upper

half of the data. These values are quartile 1 (Q1) and quartile 3 (Q3).

IQR = Q3-Q1
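The procedure above (median of the lower half and median of the upper half) can be sketched as follows. When the count is odd, this sketch leaves the overall median out of both halves, which is one of several conventions, so library functions may return slightly different quartiles. The function names are illustrative, and at least two values are assumed.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.
    With an odd count, the middle value is excluded from both halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                    # lower half of the data
    upper = ordered[len(ordered) - half:]     # upper half of the data
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: IQR = Q3 - Q1."""
    q1, q3 = quartiles(values)
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(quartiles(data), iqr(data))
```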
