
STATISTICS

Statistics is the mathematical science of data collection, organization, interpretation, analysis, and presentation, used across various scientific disciplines and industries to inform decision-making. It encompasses two main areas: descriptive statistics, which summarize data characteristics, and inferential statistics, which draw conclusions about populations based on sample data. Additionally, the document outlines levels of measurement, research methods, sampling techniques, and descriptive analysis, emphasizing the importance of central tendency and dispersion measures in data interpretation.

Uploaded by

Johaina Ampatuan

Essential Concept of Statistics

In mathematics, statistics concerns the collection, organization,
interpretation, analysis, and presentation of data. A main purpose of statistics
is to plan how data are collected, through experimental designs and statistical
surveys. Statistics is considered a mathematical science that works with numerical
data. In short, statistics is a crucial process that helps people make decisions based
on data.
Statistics is used in virtually all scientific disciplines, such as the physical
and social sciences as well as in business, medicine, the humanities, government,
and manufacturing. Statistics is fundamentally a branch of applied mathematics
that developed from the application of mathematical tools, including calculus and
linear algebra, to probability theory.
The Two Major Areas of Statistics
Descriptive Statistics
At the simplest level, descriptive statistics summarize and describe relatively
basic but essential features of a quantitative dataset – for example, a set of survey
responses. They provide a snapshot of the characteristics of your dataset and allow
you to better understand, roughly, how the data are “shaped” (more on this later).
For example, a descriptive statistic could include the proportion of males and
females within a sample or the percentages of different age groups within a
population.
Another common descriptive statistic is the humble average (which in
statistics-talk is called the mean). For example, if you undertook a survey and asked
people to rate their satisfaction with a particular product on a scale of 1 to 10, you
could then calculate the average rating. This is a very basic statistic, but as you can
see, it gives you some idea of where the ratings tend to fall.
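
The two descriptive statistics mentioned above (category proportions and the mean rating) can be sketched in a few lines of Python. The survey responses and the helper name `proportions` are invented for illustration and do not come from any real study.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (illustrative data only).
genders = ["female", "male", "female", "female", "male"]
ratings = [7, 9, 6, 8, 10]  # satisfaction on a scale of 1 to 10


def proportions(values):
    """Return each category's share of the sample as a fraction of 1."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


gender_share = proportions(genders)   # share of each gender in the sample
average_rating = mean(ratings)        # the mean satisfaction rating

print(gender_share)
print(average_rating)
```

Both numbers describe only the people who answered; they say nothing yet about anyone outside the sample.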
Inferential Statistics
Inferential statistics is a tool that statisticians use to draw conclusions about
the characteristics of a population based on the characteristics of a sample. It is
also used to determine how certain they can be of the reliability of those
conclusions. Based on the sample size and distribution, statisticians can calculate
the probability that statistics, which measure the central tendency, variability,
distribution, and relationships between characteristics within a data sample, provide
an accurate picture of the corresponding parameters of the whole population from
which the sample is drawn.
Inferential statistics are used to make generalizations about large groups,
such as estimating average demand for a product by surveying the buying habits of
a sample of consumers or attempting to predict future events. This might mean
projecting the future return of a security or asset class based on returns in a sample
period.
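
One common way to express how certain an estimate is, sketched here under stated assumptions, is a confidence interval for the population mean. The purchase counts below are hypothetical, the function name `mean_confidence_interval` is an illustrative choice, and the interval uses a normal approximation; for small samples a t distribution is usually preferred.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical weekly purchases reported by 10 surveyed consumers.
purchases = [3, 5, 4, 6, 2, 5, 4, 3, 6, 4]
low, high = mean_confidence_interval(purchases)
print(f"sample mean = {mean(purchases):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The interval narrows as the sample grows, which is one way the sample size affects how reliable the conclusion is.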
Levels of Measurement
Levels of measurement, also called scales of measurement, tell you how precisely
variables are recorded. In scientific research, a variable is anything that can take on
different values across your data set (e.g., height or test scores).

Nominal, ordinal, interval, and ratio data

Going from lowest to highest, the 4 levels of measurement are cumulative. This
means that they each take on the properties of lower levels and add new
properties.
Nominal level

You can categorize your data by labelling them in mutually exclusive groups, but
there is no order between the categories.

Examples of nominal scales:
• City of birth
• Gender
• Ethnicity
• Car brands
• Marital status

Ordinal level

You can categorize and rank your data in an order, but you cannot say anything
about the intervals between the rankings. Although you can rank the top 5 Olympic
medalists, this scale does not tell you how close or far apart they are in number
of wins.

Examples of ordinal scales:
• Top 5 Olympic medalists
• Language ability (e.g., beginner, intermediate, fluent)
• Likert-type questions (e.g., very dissatisfied to very satisfied)

Interval level

You can categorize, rank, and infer equal intervals between neighboring data
points, but there is no true zero point. The difference between any two adjacent
temperatures is the same: one degree. But zero degrees is defined differently
depending on the scale; it doesn't mean an absolute absence of temperature.

The same is true for test scores and personality inventories. A zero on a test is
arbitrary; it does not mean that the test-taker has an absolute lack of the trait
being measured.

Examples of interval scales:
• Test scores (e.g., IQ or exams)
• Personality inventories
• Temperature in Fahrenheit or Celsius

Ratio level

You can categorize, rank, and infer equal intervals between neighboring data
points, and there is a true zero point. A true zero means there is an absence of
the variable of interest. In ratio scales, zero does mean an absolute lack of the
variable.

For example, in the Kelvin temperature scale, there are no negative degrees of
temperature; zero means an absolute lack of thermal energy.

Examples of ratio scales:
• Height
• Age
• Weight
• Temperature in Kelvin
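
The practical difference between an interval scale and a ratio scale can be seen in a short Python sketch. The two temperatures are arbitrary illustrative values: differences agree on both scales, but a ratio such as "twice as hot" is only meaningful in Kelvin, where zero is a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


cold_c, warm_c = 10.0, 20.0
cold_k, warm_k = celsius_to_kelvin(cold_c), celsius_to_kelvin(warm_c)

# Differences are meaningful on both scales and agree with each other.
difference_c = warm_c - cold_c
difference_k = warm_k - cold_k

# Ratios are only meaningful on the ratio scale (Kelvin).
naive_ratio_c = warm_c / cold_c  # 2.0, yet 20 degrees C is not "twice as hot"
true_ratio_k = warm_k / cold_k   # roughly 1.035

print(difference_c, round(difference_k, 2))
print(naive_ratio_c, round(true_ratio_k, 3))
```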

Types of Research Methods


Qualitative methods
Qualitative research is a method that collects data using conversational
methods, usually open-ended questions. The responses collected are essentially
non-numerical. This method helps a researcher understand what participants think
and why they think in a particular way.
Types of qualitative methods include:
• One-to-one interviews
• Focus groups
• Ethnographic studies
• Text analysis
• Case studies
Quantitative methods
Quantitative methods deal with numbers and measurable forms. They use a
systematic way of investigating events or data, and they answer questions about
relationships between measurable variables in order to explain, predict, or control a
phenomenon.
Types of quantitative methods include:
• Survey research
• Descriptive research
• Correlational research
Collecting Quantitative Data
Steps in the Process of Data Collection
• Determining participants and sites to study
• Obtaining permissions (access) needed from individuals and organizations
• Considering what types of information to collect
• Locating and selecting or designing instruments and protocols to use
• Administering the data collection

Selecting Participants for the Study


There are many ways to select participants or respondents and to choose the type
of sample, depending on how you will use the information.
In research, SAMPLING is a word that refers to the method or process of
selecting respondents or people to answer questions meant to yield data for a
research study (Paris 2013).

Population: all members of a specified group; all possible outcomes or
measurements that are of interest.
Sample: the group of members or elements that actually participated in the study.

Sampling Methods
There are two primary types of sampling methods that you can use in your research:
• Probability sampling involves random selection, allowing you to make
strong statistical inferences about the whole group.
• Non-probability sampling involves non-random selection based on
convenience or other criteria, allowing you to easily collect data.

There are four main types of probability sample:


1. Simple random sampling
In a simple random sample, every member of the
population has an equal chance of being selected.
Your sampling frame should include the whole
population.

To conduct this type of sampling, you can use tools like random number generators
or other techniques that are based entirely on chance.
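
A minimal sketch of simple random sampling with Python's standard `random` module follows. The numbered population of 100 members and the function name `simple_random_sample` are assumptions made for illustration; the optional seed only makes the draw repeatable.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw members without replacement, each with an equal chance."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: members numbered 1 to 100.
population = list(range(1, 101))
chosen = simple_random_sample(population, 10, seed=42)
print(chosen)
```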

2. Systematic sampling
Systematic sampling is similar to simple random
sampling, but it is usually slightly easier to conduct.
Every member of the population is listed with a
number, but instead of randomly generating numbers,
individuals are chosen at regular intervals.
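
Systematic sampling can be sketched as follows, assuming a numbered list of 100 members and a random starting point within the first interval. The function name `systematic_sample` and the data are hypothetical.

```python
import random


def systematic_sample(sampling_frame, sample_size, seed=None):
    """Pick a random start, then take every k-th member of the list."""
    interval = len(sampling_frame) // sample_size      # the sampling interval k
    start = random.Random(seed).randrange(interval)    # random start in [0, k)
    return sampling_frame[start::interval][:sample_size]


# Hypothetical sampling frame: members numbered 1 to 100.
population = list(range(1, 101))
every_tenth = systematic_sample(population, 10, seed=3)
print(every_tenth)
```

If the list has a hidden periodic pattern that lines up with the interval, the sample can be skewed, so the ordering of the list is worth checking before using this method.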

3. Stratified sampling
Stratified sampling involves dividing the population into subpopulations that may
differ in important ways. It allows you to draw more precise conclusions by ensuring
that every subgroup is properly represented in the sample. To use this sampling
method, you divide the population into subgroups (called strata) based on the
relevant characteristic (e.g., gender identity, age range, income bracket, job role).

Based on the overall proportions of the population, you calculate how many people
should be sampled from each subgroup. Then you use random or systematic sampling
to select a sample from each subgroup.
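
The proportional allocation described above can be sketched in Python. The strata (80 staff and 20 managers) and the function name `stratified_sample` are hypothetical, and the sketch uses simple rounding, so with less convenient proportions the total may differ from the target by one or two.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """Draw a proportionally allocated random sample from each stratum.

    `strata` maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # The stratum's share of the sample matches its share of the population.
        quota = round(sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical population split by job role.
strata = {
    "staff": [f"staff_{i}" for i in range(80)],
    "managers": [f"manager_{i}" for i in range(20)],
}
picked = stratified_sample(strata, 10, seed=1)
print({name: len(members) for name, members in picked.items()})
```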

4. Cluster sampling
Cluster sampling also involves dividing the population into subgroups, but each
subgroup should have similar characteristics to the whole population. Instead of
sampling individuals from each subgroup, you randomly select entire subgroups.
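
A minimal sketch of cluster sampling, assuming the population is already grouped into named clusters. The office names, members, and the function name `cluster_sample` are invented for illustration.

```python
import random


def cluster_sample(clusters, number_of_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), number_of_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical clusters: employees grouped by office.
offices = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east": ["e1", "e2", "e3", "e4"],
    "west": ["w1", "w2"],
}
selected = cluster_sample(offices, 2, seed=7)
print(selected)
```

Unlike stratified sampling, only the selected clusters contribute members; the others are left out entirely.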

Non-probability sampling methods:


1. Convenience sampling - A convenience sample simply includes the individuals
who happen to be most accessible to the researcher.

2. Voluntary response sampling - Similar to a convenience sample, a voluntary
response sample is mainly based on ease of access. Instead of the researcher
choosing participants and directly contacting them, people volunteer themselves
(e.g., by responding to a public online survey). Voluntary response samples are
always at least somewhat biased, as some people will inherently be more likely to
volunteer than others, leading to self-selection bias.

3. Purposive sampling - choosing respondents whom you have judged to be people
with a good background in the topic or with great enthusiasm for it.

4. Snowball sampling - can be used to recruit participants via other participants.
The number of people you have access to “snowballs” as you get in contact with more
people. The downside here is also representativeness, as you have no way of knowing
how representative your sample is due to the reliance on participants recruiting
others.

5. Quota sampling - choosing specific samples that you know correspond to the
population in terms of one, two, or more characteristics.

DESCRIPTIVE ANALYSIS
Descriptive analysis, also known as descriptive analytics or descriptive
statistics, is the process of using statistical techniques to describe or summarize a
set of data. As one of the major types of data analysis, descriptive analysis is
popular for its ability to generate accessible insights from otherwise un-interpreted
data.

Types of Descriptive Analysis


1. Measures of Frequency
2. Central Tendency
3. Dispersion or Variation
4. Position

Measures of Frequency
In descriptive analysis, it's essential to know how frequently a certain event or
response occurs. This is the purpose of measures of frequency, like a count or
percent.
For example, consider a survey where 1,000 participants are asked about
their favorite ice cream flavor. A list of 1,000 responses would be difficult to
consume, but the data can be made much more accessible by measuring how many
times a certain flavor was selected.
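
Counts and percentages like these can be produced with `collections.Counter`. The handful of flavor responses and the function name `frequency_table` are hypothetical; a real survey would simply supply a longer list.

```python
from collections import Counter


def frequency_table(responses):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(responses)
    total = len(responses)
    return [
        (value, count, 100 * count / total)
        for value, count in counts.most_common()
    ]


# Hypothetical survey responses.
flavors = ["vanilla", "chocolate", "vanilla", "mango",
           "chocolate", "vanilla", "ube", "vanilla"]
table = frequency_table(flavors)
for value, count, percent in table:
    print(f"{value:10} {count:3} {percent:5.1f}%")
```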
Measures of Central Tendency
In descriptive analysis, it's also worth knowing the central (or average) event
or response. Common measures of central tendency include the three averages —
mean, median, and mode.
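Measures of Dispersion
Sometimes, it may be worth knowing how data is distributed across a range.
Measures such as the variance and the standard deviation describe how widely the
values are spread around the center.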
Measures of Position
Last of all, descriptive analysis can involve identifying the position of one
event or response in relation to others. This is where measures like percentiles and
quartiles can be used.
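
Quartiles can be computed with `statistics.quantiles` from the Python standard library. The scores are hypothetical, and the result depends on the quantile convention: the sketch uses the library's default "exclusive" method, and other software may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical scores from eleven respondents.
scores = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]

# n=4 splits the data into four equal groups and returns the three cut points
# (first quartile, median, third quartile).
quartiles = quantiles(scores, n=4)
print(quartiles)
```

Passing `n=100` instead returns the 99 percentile cut points.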
MEASURES OF CENTRAL TENDENCY

One way of summarizing a data set is to describe it using descriptive
measures. Among the most commonly used and most important descriptive
measures are the measures of central tendency and the measures of dispersion.
The three measures of central tendency are the mean, median, and mode, where
the mean is the most familiar measure of the “center”. The mean of a population
is symbolized by the lowercase Greek letter “mu”, µ, while the mean of a sample
is represented by x̄ (“x-bar”).

Example: The scores of five students who are selected randomly in a class of
Math 01 are as follows: 44, 37, 41, 35, and 32. Find their average score.

Solution:
Applying the mean of ungrouped data gives

\[ \bar{x} = \frac{44 + 37 + 41 + 35 + 32}{5} = \frac{189}{5} = 37.8 \]

Hence, the average score of the five students is 37.8.
The means of subgroups can be combined to come up with the group
mean, known as the weighted mean. This can be calculated using the formula

\[ \bar{x} = \frac{\sum w_i x_i}{n} \]

where

$x_i$ is the ith observation
$w_i$ is the frequency or weight for each observation
$n$ is the total of the frequencies, $n = \sum w_i$
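
As a sketch of the weighted mean, assume it is the sum of each observation times its weight, divided by the total of the weights. The section means and sizes below are hypothetical, as is the function name `weighted_mean`.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the total of the weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical: mean scores of three class sections and their sizes.
section_means = [80.0, 90.0, 70.0]
section_sizes = [10, 20, 10]
overall_mean = weighted_mean(section_means, section_sizes)
print(overall_mean)
```

The larger section pulls the overall mean toward its own value, which a plain average of the three section means (80.0) would miss.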

The median is a single value which divides an array of observations into two
equal parts such that 50% of the observations fall above it and the remaining
50% fall below it. It is written symbolically as x̃, read as “x-tilde”.

The median of a data set consisting of an odd number of observations is
the middlemost value in the list. That is,

\[ \tilde{x} = x_{\frac{n+1}{2}} \]

where n is the number of observations. If n is even, the median is the average of
the two middlemost values. It can be computed as

\[ \tilde{x} = \frac{m_1 + m_2}{2} \]

where $m_1$ and $m_2$ are the two middlemost values.
Take note that the observations are first arranged in an array (from
lowest to highest) before getting the median value.

Example:
The numbers of books owned by eleven children are as follows: 5, 2, 4, 6, 5, 10,
7, 6, 9, 8, 6. What is the median?

Solution:
Arrange the data in an array: 2, 4, 5, 5, 6, 6, 6, 7, 8, 9, 10. Since the list
contains 11 numbers, the median is the middlemost value (the 6th number), which
is 6.
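
The same procedure (sort first, then take the middle value or the average of the two middle values) can be sketched in Python using the book counts from the example. The function name `median_of` is an illustrative choice; the standard library's `statistics.median` does the same job.

```python
def median_of(observations):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(observations)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


books = [5, 2, 4, 6, 5, 10, 7, 6, 9, 8, 6]
print(median_of(books))         # odd number of observations
print(median_of([1, 2, 3, 4]))  # even number of observations
```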

The mode is an observation that occurs most frequently in the given data set.

Example:
Find the mode in the following sets of scores.
a) set A: 36, 36, 12, 29, 35, 45, 50, 45, 45, 53
b) set B: 8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10
c) set C: 39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50
d) set D: 2, 9, 8, 12, 5, 13, 6, 10
Solution:
The mode in set A is 45 because 45 occurs most frequently in the list. Both 6 and 11
occur most often in set B; therefore, set B has two modes, 6 and 11.
The modes in set C are 25, 37, and 45, since these numbers have the highest
frequency. Each element in set D has the same number of occurrences; thus, the
data set has no mode. The distribution of data may be classified as unimodal,
bimodal, trimodal, or multimodal depending upon the number of
modal values in the given data set. In the above example, set A is unimodal, set
B is bimodal, and set C is trimodal.
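
The four sets from the example can be checked with a short Python sketch. It follows the convention used in the solution, where a data set in which every value occurs equally often has no mode; the function name `modes` is an illustrative choice, and set A is read with a comma between 45 and 50.

```python
from collections import Counter


def modes(observations):
    """Return all values with the highest frequency, or [] if there is no mode."""
    counts = Counter(observations)
    highest = max(counts.values())
    # Convention from the text: if every value occurs equally often, no mode.
    if len(counts) > 1 and all(count == highest for count in counts.values()):
        return []
    return sorted(value for value, count in counts.items() if count == highest)


set_a = [36, 36, 12, 29, 35, 45, 50, 45, 45, 53]
set_b = [8, 7, 6, 5, 6, 9, 2, 3, 11, 11, 43, 10]
set_c = [39, 23, 25, 25, 63, 37, 45, 37, 48, 51, 28, 45, 50]
set_d = [2, 9, 8, 12, 5, 13, 6, 10]

for name, data in [("A", set_a), ("B", set_b), ("C", set_c), ("D", set_d)]:
    print(name, modes(data))
```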

Measures of Central Tendency for Grouped Data

Mean
The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although it is
most often used with continuous data. The mean is equal to the sum of all the values
in the data set divided by the number of values in the data set. For grouped data,
each class is represented by its class mark:

\[ \bar{x} = \frac{\sum f x}{n} \]

where:
fx = the product of frequency and class mark
n = total frequencies
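
A minimal sketch of the grouped mean, assuming each class is represented by its class mark (midpoint). The frequency table below (classes 10-19 through 50-59) is hypothetical, as is the function name `grouped_mean`.

```python
def grouped_mean(class_marks, frequencies):
    """Sum of frequency * class mark, divided by the total frequency."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(class_marks, frequencies)) / n


# Hypothetical frequency table for classes 10-19, 20-29, 30-39, 40-49, 50-59.
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5]
frequencies = [4, 6, 12, 5, 3]

mean_score = grouped_mean(class_marks, frequencies)
print(mean_score)  # sum of fx is 1005 and n is 30
```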
Median
Based on the middle data in a set

\[ \text{median} = lb + \left( \frac{\frac{\sum f}{2} - {<}cf}{f_m} \right) cw \]

where:
lb = lower boundary of the median class
∑f = total frequencies
<cf = cumulative frequency before/preceding the median class
fm = frequency of the median class
cw = class width
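
The grouped median can be sketched as: find the class where the cumulative frequency first reaches half of the total, then interpolate inside that class. The formula assumed here is lower boundary + ((∑f / 2 − <cf) / fm) × cw, and the frequency table and function name `grouped_median` are hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Interpolate the median inside the median class of a frequency table."""
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    half = total / 2
    cumulative_before = 0  # "<cf": cumulative frequency before the median class
    for lower_boundary, frequency in zip(lower_boundaries, frequencies):
        if cumulative_before + frequency >= half:
            fraction = (half - cumulative_before) / frequency
            return lower_boundary + fraction * class_width
        cumulative_before += frequency


# Hypothetical frequency table for classes 10-19, 20-29, 30-39, 40-49, 50-59.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [4, 6, 12, 5, 3]

median_score = grouped_median(lower_boundaries, frequencies, class_width=10)
print(round(median_score, 2))
```

Here half of the total is 15, the median class is 30-39 (lower boundary 29.5, frequency 12), and 10 observations come before it.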

Mode
The mode is the most frequent score in our data set

\[ \text{mode} = lb_{mo} + \left( \frac{D_1}{D_1 + D_2} \right) cw \]

where:
lbₘₒ = lower boundary of the modal class
D₁ = difference between the frequency of the modal class and that of the class preceding it
D₂ = difference between the frequency of the modal class and that of the class succeeding it
cw = class width
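
A sketch of the grouped mode, assuming the formula lbₘₒ + (D₁ / (D₁ + D₂)) × cw with cw as the class width. The frequency table and the function name `grouped_mode` are hypothetical, a single clear modal class is assumed, and a missing neighbor class (when the modal class is first or last) is treated as having frequency 0, which is an assumption of this sketch rather than something stated in the text.

```python
def grouped_mode(lower_boundaries, frequencies, class_width):
    """Estimate the mode from the modal class and its two neighbors."""
    modal_index = max(range(len(frequencies)), key=frequencies.__getitem__)
    modal_frequency = frequencies[modal_index]
    # Assumption: a missing neighbor class counts as frequency 0.
    before = frequencies[modal_index - 1] if modal_index > 0 else 0
    after = frequencies[modal_index + 1] if modal_index < len(frequencies) - 1 else 0
    d1 = modal_frequency - before  # D1: modal class minus preceding class
    d2 = modal_frequency - after   # D2: modal class minus succeeding class
    return lower_boundaries[modal_index] + d1 / (d1 + d2) * class_width


# Hypothetical frequency table for classes 10-19, 20-29, 30-39, 40-49, 50-59.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [4, 6, 12, 5, 3]

mode_score = grouped_mode(lower_boundaries, frequencies, class_width=10)
print(round(mode_score, 2))
```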

Measures of Dispersion
Variance – measures how widely the values are spread around the mean. It is also the
basis for assessing whether the populations that groups come from differ
significantly from each other.

\[ \sigma^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} \]

where:
Σf = sum of the frequencies
x̄ = mean
x = midpoint

Standard Deviation
A measure of how dispersed the data are in relation to the mean.
Formula:

\[ SD = \sqrt{\frac{\sum f (x - \bar{x})^2}{\sum f}} \]

where:
Σf = sum of the frequencies
x = midpoint
x̄ = mean
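
The variance and standard deviation of grouped data can be sketched as follows. The sketch assumes the population form, which divides the sum of f(x − x̄)² by Σf; when the data are a sample, Σf − 1 is commonly used as the denominator instead. The frequency table and function names are hypothetical.

```python
from math import sqrt


def grouped_variance(midpoints, frequencies):
    """Population variance of grouped data: sum of f * (x - mean)^2 over sum of f."""
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n


def grouped_standard_deviation(midpoints, frequencies):
    """Square root of the grouped variance."""
    return sqrt(grouped_variance(midpoints, frequencies))


# Hypothetical frequency table for classes 10-19, 20-29, 30-39, 40-49, 50-59.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5]
frequencies = [4, 6, 12, 5, 3]

variance = grouped_variance(midpoints, frequencies)
standard_deviation = grouped_standard_deviation(midpoints, frequencies)
print(variance, round(standard_deviation, 2))
```

The standard deviation is in the same units as the data, which usually makes it easier to interpret than the variance.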
