STATISTICS
projecting the future return of a security or asset class based on returns in a sample
period.
Levels of Measurement
Levels of measurement, also called scales of measurement, tell you how precisely
variables are recorded. In scientific research, a variable is anything that can take on
different values across your data set (e.g., height or test scores).
Going from lowest to highest, the 4 levels of measurement are cumulative. This
means that they each take on the properties of lower levels and add new
properties.
Nominal level Examples of nominal scales
Ratio level
You can categorize, rank, and infer equal intervals between neighboring data
points, and there is a true zero point. A true zero means there is an absence of
the variable of interest. In ratio scales, zero does mean an absolute lack of the
variable.
Examples of ratio scales: height, age, weight, temperature in Kelvin.
locating and selecting or designing instruments and protocols to use
administering the data collection
Sampling Method
There are two primary types of sampling methods that you can use in your research:
Probability sampling involves random selection, allowing you to make
strong statistical inferences about the whole group.
Non-probability sampling involves non-random selection based on
convenience or other criteria, allowing you to easily collect data.
2. Systematic sampling
Systematic sampling is similar to simple random
sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a
number, but instead of randomly generating numbers,
individuals are chosen at regular intervals.
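A minimal Python sketch of choosing individuals at regular intervals. The function name `systematic_sample`, the random starting point, and the numbered list of 100 members are assumptions made for illustration, not part of the notes.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick every k-th member of a numbered list, with k = N // sample_size.

    The random starting point within the first interval is an assumption;
    the regular interval is the part described in the notes.
    """
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    rng = random.Random(seed)
    interval = len(population) // sample_size
    start = rng.randrange(interval)
    return [population[start + i * interval] for i in range(sample_size)]


# Hypothetical population: members numbered 1 to 100.
members = list(range(1, 101))
sample = systematic_sample(members, 10, seed=1)
print(sample)
```

With 100 members and a sample of 10, the interval is 10, so consecutive selected numbers are always 10 apart.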
3. Stratified sampling
Stratified sampling involves dividing the
population into subpopulations that may differ in
important ways. It allows you to draw more precise
conclusions by ensuring that every subgroup is
properly represented in the sample. To use this
sampling method, you divide the population into
subgroups (called strata) based on the relevant
characteristic (e.g., gender identity, age range,
income bracket, job role).
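The idea can be sketched in Python as follows. The helper `stratified_sample`, the proportional allocation (the same fraction from every stratum), and the staff records are hypothetical choices for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Randomly sample the same fraction from every stratum.

    `key` maps a record to its stratum (e.g. job role). Proportional
    allocation is one common choice, assumed here for simplicity.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, n))
    return sample


# Hypothetical population: 60 engineers and 20 managers.
staff = [{"id": i, "role": "engineer" if i < 60 else "manager"} for i in range(80)]
chosen = stratified_sample(staff, key=lambda r: r["role"], fraction=0.1, seed=0)
print(len(chosen))
```

Because each stratum is sampled separately, both job roles are guaranteed to appear in the sample in proportion to their size.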
4. Cluster sampling
Cluster sampling also involves dividing the
population into subgroups, but each subgroup
should have similar characteristics to the whole
sample. Instead of sampling individuals from
each subgroup, you randomly select entire
subgroups.
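A short Python sketch of selecting entire subgroups at random. The function `cluster_sample` and the office data are hypothetical and only illustrate the idea.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical clusters: employees grouped by office.
offices = {
    "north": ["Ana", "Ben", "Cara"],
    "south": ["Dev", "Elle"],
    "east": ["Finn", "Gia", "Hugo"],
    "west": ["Ivy", "Jon"],
}
picked = cluster_sample(offices, 2, seed=3)
print(picked)
```

Unlike stratified sampling, no individuals are drawn from the clusters that were not selected.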
2. Voluntary response sampling
Similar to a convenience sample, a voluntary response sample is mainly based on
ease of access. Instead of the researcher choosing participants and directly
contacting them, people volunteer themselves (e.g. by responding to a public
online survey). Voluntary response samples are always at least somewhat biased,
as some people will inherently be more likely to volunteer than others, leading
to self-selection bias.
DESCRIPTIVE ANALYSIS
Descriptive analysis, also known as descriptive analytics or descriptive
statistics, is the process of using statistical techniques to describe or summarize a
set of data. As one of the major types of data analysis, descriptive analysis is
popular for its ability to generate accessible insights from otherwise un-interpreted
data.
Measures of Frequency
In descriptive analysis, it's essential to know how frequently a certain event or
response occurs. This is the purpose of measures of frequency, like a count or
percent.
For example, consider a survey where 1,000 participants are asked about
their favorite ice cream flavor. A list of 1,000 responses would be difficult to
consume, but the data can be made much more accessible by measuring how many
times a certain flavor was selected.
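As a small illustration, counts and percents can be computed in Python like this. The eight responses are made-up stand-ins for the 1,000 survey answers.

```python
from collections import Counter

# Hypothetical survey responses (a tiny stand-in for the 1,000 answers).
responses = [
    "vanilla", "chocolate", "vanilla", "strawberry",
    "chocolate", "vanilla", "mint", "vanilla",
]

counts = Counter(responses)  # how many times each flavor was selected
total = len(responses)
percents = {flavor: 100 * n / total for flavor, n in counts.items()}

for flavor, n in counts.most_common():
    print(f"{flavor}: {n} ({percents[flavor]:.1f}%)")
```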
Measures of Central Tendency
In descriptive analysis, it's also worth knowing the central (or average) event
or response. Common measures of central tendency include the three averages —
mean, median, and mode.
Measures of Dispersion
Sometimes, it may be worth knowing how data is distributed across a range.
Measures of Position
Last of all, descriptive analysis can involve identifying the position of one
event or response in relation to others. This is where measures like percentiles and
quartiles can be used.
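A possible Python sketch of quartiles and a percentile rank. The scores are hypothetical, and the definitions used (the "inclusive" method of `statistics.quantiles`, and rank as the percent of values strictly below a score) are assumptions, since several conventions exist.

```python
import statistics

# Hypothetical test scores.
scores = [55, 60, 62, 68, 70, 75, 78, 80, 85, 90, 95]

# Three cut points that split the data into four equal parts.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")


def percentile_rank(values, value):
    """Percent of observations strictly below `value` (one common convention)."""
    below = sum(1 for v in values if v < value)
    return 100 * below / len(values)


print(q1, q2, q3)
print(percentile_rank(scores, 80))
```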
MEASURES OF CENTRAL TENDENCY
One way of summarizing a data set is to use descriptive measures. Among the
most commonly used descriptive measures are the measures of central tendency
and the measures of dispersion. The three measures of central tendency are the
mean, median, and mode, of which the mean is the most familiar measure of the
“center”. The mean of a population is symbolized by the lowercase Greek letter
mu, µ, while the mean of a sample is represented by x̄ (x-bar).
Example: The scores of five students selected randomly from a Math 01 class
are as follows: 44, 37, 41, 35, and 32. Find their average score.
Solution:
Applying the formula for the mean of ungrouped data gives
$$\bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8$$
Hence, the average score of the five students is 37.8.
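The same computation can be checked with a few lines of Python (the variable names are illustrative only):

```python
# Scores of the five students from the example.
scores = [44, 37, 41, 35, 32]

# Mean of ungrouped data: sum of the values divided by the number of values.
mean = sum(scores) / len(scores)
print(mean)
```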
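The means of subgroups can be combined to come up with the group mean, known
as the weighted mean. This can be calculated using the formula
$$\bar{x}_w = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$
where $n_i$ is the number of observations in subgroup $i$ and $\bar{x}_i$ is the
mean of subgroup $i$.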
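A brief Python sketch of combining subgroup means, assuming each subgroup mean is weighted by its number of observations. The function name and the two class sections are hypothetical.

```python
def weighted_mean(sizes, means):
    """Combine subgroup means, weighting each by its subgroup size (assumed)."""
    if len(sizes) != len(means) or not sizes:
        raise ValueError("sizes and means must be non-empty and of equal length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Hypothetical subgroups: two class sections with 10 and 30 students.
section_sizes = [10, 30]
section_means = [80.0, 90.0]
combined = weighted_mean(section_sizes, section_means)
print(combined)
```

The result lies closer to 90 than to 80 because the larger section carries more weight.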
The median is a single value which divides an array of observations into two
equal parts such that 50% of the observations fall above it and the remaining
50% fall below it. It is written symbolically as x̃, read as “x-tilde”.
Example:
The numbers of books owned by eleven children are as follows: 5, 2, 4, 6, 5, 10,
7, 6, 9, 8, 6. What is the median?
Solution:
Arrange the data in an array: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list
contains 11 numbers, the median is the middlemost value (the 6th number), which
is 6.
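The procedure can be sketched in Python as below. The handling of an even number of observations (averaging the two middle values) is not covered by the example and is added here as the usual convention.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Books owned by the eleven children in the example.
books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]
print(median(books))
```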
The mode is an observation that occurs most frequently in the given data set.
Example:
Find the mode in the following sets of scores.
a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10
Solution:
The mode in set A is 45 because 45 occurs most frequently in the list. Both 6 and 11
occur most often in set B; therefore, set B has two modes, 6 and 11.
The modes in set C are 25, 37, and 45, since these numbers have the highest
frequency. Each element in set D has the same number of occurrences; thus, the
data set has no mode. The distribution of data may be classified as unimodal,
bimodal, trimodal, or multimodal depending upon the number of
modal values in the given data set. In the above example, set A is unimodal, set
B is bimodal, and set C is trimodal.
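A small Python sketch that reproduces these answers. The function name `modes` and the choice to return an empty list when every value is equally frequent are illustrative conventions.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] when every value is equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []  # no mode: all values occur the same number of times
    return sorted(v for v, c in counts.items() if c == top)


set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]

for name, data in [("A", set_a), ("B", set_b), ("C", set_c), ("D", set_d)]:
    print(name, modes(data))
```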
Mean
The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although it is
most often used with continuous data. The mean is equal to the sum of all the
values in the data set divided by the number of values in the data set. For
grouped data, this becomes
$$\bar{x} = \frac{\sum fx}{n}$$
where:
fx = the product of the frequency and the class mark of each class
n = total frequency
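A minimal Python sketch of the grouped-data mean, taking the class mark as the midpoint of the class limits. The frequency table is hypothetical.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency)."""
    n = sum(f for _, _, f in classes)  # total frequency
    # Sum of f * x, where x is the class mark (midpoint of the class limits).
    sum_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return sum_fx / n


# Hypothetical frequency table: (lower limit, upper limit, frequency).
table = [(10, 19, 3), (20, 29, 5), (30, 39, 8), (40, 49, 4)]
mean = grouped_mean(table)
print(mean)
```

Here the class marks are 14.5, 24.5, 34.5, and 44.5, the sum of fx is 620, and the total frequency is 20.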
Median
The median is based on the middle value of the data set. For grouped data:
$$\text{median} = lb + \left( \frac{\frac{\sum f}{2} - {<}cf}{f_m} \right) cw$$
where:
lb = lower boundary of the median class
∑f = total frequency
<cf = cumulative frequency preceding the median class
fm = frequency of the median class
cw = class width
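A sketch of the grouped-data median in Python, assuming the usual form lb + ((∑f/2 − <cf) / fm) · cw and a table given by class boundaries. The function name and the frequency table are hypothetical.

```python
def grouped_median(classes):
    """classes: ordered list of (lower_boundary, upper_boundary, frequency)."""
    total = sum(f for _, _, f in classes)
    half = total / 2
    cumulative = 0  # <cf: cumulative frequency before the current class
    for lower, upper, freq in classes:
        # The median class is the first class whose cumulative frequency reaches n/2.
        if freq > 0 and cumulative + freq >= half:
            return lower + ((half - cumulative) / freq) * (upper - lower)
        cumulative += freq
    raise ValueError("frequency table is empty")


# Hypothetical table using class boundaries (class width 10).
table = [(9.5, 19.5, 3), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4)]
median_value = grouped_median(table)
print(median_value)
```

In this table the median class is 29.5 to 39.5, with <cf = 8 and fm = 8, so the median is 29.5 + (2/8)(10).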
Mode
The mode is the most frequent score in our data set. For grouped data:
$$\text{mode} = lb_{mo} + \left( \frac{D_1}{D_1 + D_2} \right) cw$$
where:
lbₘₒ = lower boundary of the modal class
D₁ = difference between the frequency of the modal class and that of the class preceding it
D₂ = difference between the frequency of the modal class and that of the class succeeding it
cw = class width
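A Python sketch of the grouped-data mode, assuming the form lb + (D₁ / (D₁ + D₂)) · cw. Treating a missing neighboring class as frequency 0 is an added assumption, and the table is hypothetical.

```python
def grouped_mode(classes):
    """classes: ordered list of (lower_boundary, upper_boundary, frequency)."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))  # first class with the highest frequency
    before = freqs[m - 1] if m > 0 else 0  # assumed 0 if no preceding class
    after = freqs[m + 1] if m < len(freqs) - 1 else 0  # assumed 0 if no succeeding class
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbors")
    lower, upper, _ = classes[m]
    return lower + (d1 / (d1 + d2)) * (upper - lower)


# Hypothetical table using class boundaries (class width 10).
table = [(9.5, 19.5, 3), (19.5, 29.5, 5), (29.5, 39.5, 8), (39.5, 49.5, 4)]
mode_value = grouped_mode(table)
print(round(mode_value, 2))
```

Here D₁ = 8 − 5 = 3 and D₂ = 8 − 4 = 4, so the mode is 29.5 + (3/7)(10).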
Measures of Dispersion
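Variance – measures how far the values in a data set spread out around the
mean. It is also the basis for tests that assess whether the populations that
groups come from significantly differ from each other.
$$\sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f}$$
Σf = sum of the frequencies
x̄ = mean
x = midpoint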
Standard Deviation
A measure of how dispersed the data is in relation to the mean.
Formula:
$$SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}}$$
x = midpoint
x̄ = mean
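A Python sketch of the variance and standard deviation for grouped data. It assumes the sum of f(x − x̄)² is divided by Σf; a sample version would divide by Σf − 1 instead. The table of midpoints and frequencies is hypothetical.

```python
import math


def grouped_variance(classes):
    """classes: list of (midpoint, frequency). Divides by the total frequency (assumed)."""
    n = sum(f for _, f in classes)
    mean = sum(f * x for x, f in classes) / n
    return sum(f * (x - mean) ** 2 for x, f in classes) / n


# Hypothetical table: (class midpoint, frequency).
table = [(14.5, 3), (24.5, 5), (34.5, 8), (44.5, 4)]
variance = grouped_variance(table)
sd = math.sqrt(variance)  # standard deviation is the square root of the variance
print(variance, round(sd, 2))
```

For this table the mean is 31, the sum of f(x − x̄)² is 1855, and the variance is 1855 / 20.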