Statistics Lesson 1
STATISTICS
Question 1 of 5:
In statistics, a sample:
a. can be used for inferences but not for predictions.
b. is another word for population.
c. is only used in descriptive statistics.
d. is a set of data taken from the population to represent the population.
Answer: d
Question 2 of 5:
How do descriptive and inferential statistics differ?
a. Inferential statistics only attempt to describe data, while descriptive statistics attempt to make predictions based on data.
b. Inferential statistics are more computationally sophisticated than descriptive statistics.
c. Descriptive statistics are more computationally sophisticated than inferential statistics.
d. Descriptive statistics only attempt to describe data, while inferential statistics attempt to make predictions based on data.
Answer: d
Question 3 of 5:
Which two are examples of descriptive statistics?
Answer 3 of 5:
Question 4 of 5:
What is statistical estimation?
a. Methods for reducing errors in descriptive statistics.
b. Methods for reducing errors in inferential statistics.
c. Methods for rounding answers in statistical calculations.
d. Methods to determine the best graph to represent statistical data.
Answer: b
Question 5 of 5:
What are two examples of inferential statistics?
a. Regression analysis and hypothesis testing.
b. Variance and correlation.
c. Range and percentiles.
d. Mean and probability distributions.
Answer: a
Statistics
A branch of mathematics that examines and investigates ways to process and analyze the data gathered
Provides procedures for data collection, presentation, organization, and interpretation so that the data become meaningful and useful to business
Statistics is generally understood as the subject dealing with numbers and data. More broadly, it involves activities such as collecting data from surveys or experiments, summarizing or managing data, presenting results in a convincing format, analyzing data, and drawing valid inferences from findings.
Kinds of Statistics
Descriptive Statistics – the totality of methods and treatments employed in the collection, description, and analysis of numerical data
◦ Used to tell something about the particular group of observations
Inferential Statistics – the logical process of moving from the analysis of a sample to a generalization or conclusion about a population (also called statistical inference or inductive statistics)
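To make the contrast concrete, here is a minimal Python sketch on hypothetical quiz scores (the data are invented for illustration). The mean and standard deviation describe the sample itself, which is descriptive statistics; the confidence interval uses the sample to say something about the wider population, which is inference. The interval uses a normal-approximation critical value as a simplifying assumption.

```python
import statistics
from statistics import NormalDist

# Hypothetical quiz scores from a sample of 10 students (illustrative data only)
sample = [78, 85, 62, 90, 71, 88, 95, 67, 80, 84]

# Descriptive statistics: describe the sample itself
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
print(f"Sample mean = {mean:.1f}, sample SD = {sd:.2f}")

# Inferential statistics: generalize from the sample to the population.
# Approximate 95% confidence interval for the population mean using the normal
# critical value; for a sample this small, a t critical value would be more accurate.
z = NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * sd / len(sample) ** 0.5
print(f"Approximate 95% CI for the population mean: "
      f"{mean - margin:.1f} to {mean + margin:.1f}")
```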
Population and Sample
[Figure: a sample drawn from a population]
Population
Population – refers to the totality of observations or elements from a set of data.
◦ Example. Suppose a teacher conducts a study on the correlation between students’ entrance examination scores and their academic performance. To ensure the validity of his findings, he decides to include all the students enrolled in a certain program or course for the current school year, that is, the entire population.
Sample
Sample – refers to one or more elements taken from the population for a specific purpose.
◦ Example. Because of budget and feasibility concerns, the teacher decides to include only a group of 200 students in his study.
Parameter versus Statistic
Parameter – a numerical measure that describes the whole population
◦ If all students in a school are surveyed about their heights and an average height of 65 inches is found, then 65 inches is called a population parameter.
Statistic – a numerical measure that describes a sample
◦ If only 50 of the school’s 230 students are surveyed and their average height is 65 inches, then 65 inches is called a sample statistic.
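The parameter/statistic distinction can be sketched in a few lines of Python. The 230 heights below are simulated, hypothetical values (drawn around 65 inches with a fixed seed); the point is only that the same measure, the mean, is a parameter when computed over all 230 students and a statistic when computed over a sample of 50.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical heights (inches) of all 230 students in a school: the population
population = [round(random.gauss(65, 3), 1) for _ in range(230)]

# Parameter: a numerical measure computed from the whole population
mu = statistics.mean(population)

# Statistic: the same measure computed from a random sample of 50 students
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"Population parameter (mean of 230): {mu:.2f} in")
print(f"Sample statistic (mean of 50):      {x_bar:.2f} in")
```

The two means will usually be close but not identical; the gap between them is sampling error.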
Sources of Data
Primary Data come from an original source and are intended to answer specific research questions. They can be collected by interview, mail-in questionnaire, survey, or experimentation.
Secondary Data come from previously recorded sources, such as published research, industry financial statements, business periodicals, and government reports. They can also be obtained electronically, for example from internet websites or compact discs.
Characteristics of Objects, People, or Events
A constant is a characteristic of objects, people, or events that does not vary, like the temperature at which water boils at standard atmospheric pressure (100 degrees Celsius).
A variable is a characteristic of objects, people, or events that can take on different values. It can vary in quantity, like people’s weight, or in quality, like hair color.
Basic Types of Variables/Data
Qualitative
◦ Conceptualized and analyzed as distinct categories, with no continuum implied
◦ Also called a categorical variable
◦ Observations are placed in the same or different classes, each class being considered to possess some common characteristic that is not shared by those in other classes.
Example: eye color, gender, occupation, religious preference, etc.
Quantitative Data
◦ Also termed a numerical variable
◦ Variates that yield frequencies when counted give rise to discrete variables; variates obtained by measurement give rise to metric or continuous variables
◦ Conceptualized and analyzed along a continuum
◦ Differs in amount or degree
Example: height, weight, math aptitude, salary, etc.
Types of Variables
Variables
◦ Qualitative
◦ Quantitative
  ◦ Discrete
  ◦ Continuous
Mathematical Classification
Continuous variable – is a variable which can assume any of
an infinite number of values, and can be associated with
points on a continuous line interval.
◦ Example. Height, weight, volume, etc.
Discrete variable – is a variable which consists of either a finite number of values or a countable number of values
◦ Example. Gender, courses, Olympic games, etc.
Experimental Classification
Independent variables – are variables controlled by the
experimenter/researcher, and expected to have an effect on the
behavior of the subjects.
◦ Also called explanatory variables
Choosing a statistical test
Goal | Measurement (Gaussian population) | Rank, score, or measurement (non-Gaussian population) | Binomial (two possible outcomes) | Survival time
Compare one group to a hypothetical value | One-sample t-test | Wilcoxon test | Chi-square or binomial test | –
Compare two unpaired groups | Unpaired t-test | Mann-Whitney test | Fisher’s test (chi-square for large samples) | Log-rank test or Mantel-Haenszel
Compare two paired groups | Paired t-test | Wilcoxon test | McNemar’s test | Conditional proportional hazards regression
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test | Cox proportional hazard regression
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran’s Q | Conditional proportional hazards regression
Quantify association between two variables | Pearson correlation | Spearman rho correlation | Contingency coefficients | –
Predict value from another measured variable | Simple linear regression or nonlinear regression | Nonparametric regression | Simple logistic regression | Cox proportional hazard regression
Predict value from several measured or binomial variables | Multiple linear regression or multiple nonlinear regression | – | Multiple logistic regression | Cox proportional hazard regression
Sampling Techniques
Probability Sampling
◦ Each member of the population has a known probability of being selected for the sample
Nonprobability Sampling
◦ Selection is not random, so it may be biased, and there is no known probability that a given member will be included in the sample
Sampling Techniques
◦ Probability Sampling
  ◦ Simple Random Sampling
  ◦ Systematic Sampling
  ◦ Stratified Sampling
  ◦ Cluster Sampling
◦ Nonprobability Sampling
  ◦ Convenience Sampling
  ◦ Purposive Sampling
  ◦ Snowball Sampling
  ◦ Quota Sampling
Simple Random Sampling
(Probability Sampling)
Most commonly used sampling technique
Each member of the population has an equal chance of being selected as a participant
Done by choosing the members of the sample one by one, using either the lottery method or a table of random numbers
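A seeded random number generator can stand in for the lottery method. This minimal sketch assumes a hypothetical sampling frame of 500 student ID numbers and draws 50 of them; `random.sample` draws without replacement, so every member has the same chance of selection and no one is picked twice.

```python
import random

# Hypothetical sampling frame: 500 students identified by ID numbers 1..500
population = list(range(1, 501))

rng = random.Random(2024)  # seeded generator, playing the role of drawing lots
# Draw 50 members without replacement; each member is equally likely to be chosen
sample = rng.sample(population, k=50)

print(sorted(sample))
```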
Systematic Random Sampling
(Probability Sampling)
It selects every kth element of the population for the sample, starting from a randomly chosen point among the first k members, where k = N/n is the sampling interval
Systematic Random Sampling (Example)
◦ N = 100 units, numbered 1 to 100
◦ Want n = 20
◦ Sampling interval: N/n = 100/20 = 5
◦ Select a random number from 1-5: chose 4
◦ Start with #4 and take every 5th unit: 4, 9, 14, 19, ..., 99
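The worked example (N = 100, n = 20, interval 5, random start 4) can be reproduced with a short Python function. `systematic_sample` is a hypothetical helper written for this illustration, and it assumes the interval k = N // n is a whole number, as in the example.

```python
import random

def systematic_sample(population, n, start=None, rng=random):
    """Take every kth unit after a random start among the first k units.

    Assumes len(population) is a multiple of n, so k = N // n is exact.
    """
    N = len(population)
    k = N // n                      # sampling interval, here 100 // 20 = 5
    if start is None:
        start = rng.randint(1, k)   # random start chosen from 1..k
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # start is 1-based, so the first selected index is start - 1
    return population[start - 1::k][:n]

units = list(range(1, 101))         # N = 100 units numbered 1..100
chosen = systematic_sample(units, 20, start=4)
print(chosen)  # the 20 units 4, 9, 14, ..., 99
```

Passing `start=None` lets the function pick the random start itself, which is how the method would be applied in practice.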
Stratified Sampling
(Probability Sampling)
Particularly useful when the population can be divided into homogeneous groups (grouped according to a controlling variable in the study, such as gender, race, civil status, or nationality)
Homogeneous partitions are also called STRATA (singular form:
STRATUM).
Example.
A sample of 100 students is to be selected from a junior
high school population of 1000, of which
◦ 250 are in Grade 7
◦ 200 are in Grade 8
◦ 300 are in Grade 9
◦ 250 are in Grade 10
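The example above does not show how the 100 slots are divided among the grades. A common rule is proportional allocation, where each stratum's share of the sample equals its share of the population; the sketch below assumes that rule (the source does not state it), then draws a simple random sample inside each grade using hypothetical within-grade ID numbers.

```python
import random

# Strata from the example: junior high school population by grade level
strata = {"Grade 7": 250, "Grade 8": 200, "Grade 9": 300, "Grade 10": 250}
n = 100                      # desired total sample size
N = sum(strata.values())     # population size, 1000

# Proportional allocation (assumed rule): n_h = n * N_h / N
allocation = {grade: round(n * N_h / N) for grade, N_h in strata.items()}
print(allocation)  # {'Grade 7': 25, 'Grade 8': 20, 'Grade 9': 30, 'Grade 10': 25}

# Within each stratum, draw a simple random sample of the allocated size.
# Students are identified by hypothetical within-grade ID numbers 1..N_h.
rng = random.Random(7)
sample = {grade: rng.sample(range(1, N_h + 1), allocation[grade])
          for grade, N_h in strata.items()}
```

Here every share divides evenly; with other stratum sizes, rounding can make the allocations sum to slightly more or less than n and would need adjusting.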
Methods of Collecting Data
3. Registration Method
◦ This method of gathering information is governed by laws, such as those requiring the registration of births, deaths, and marriages.
4. Observation Method
◦ Data pertaining to the behavior of an individual or a group of individuals at the time a given situation occurs are best obtained by observation. One limitation of this method is that observation can be made only at the time the relevant events occur.
5. Experiment Method
◦ This method is used to determine the cause-and-effect relationship of certain phenomena under controlled conditions. It is usually employed by scientific researchers.
Methods of Presenting Data
Textual Method – narrative and paragraph forms
Tabular Method – tables in which figures are arranged in rows and columns for easier and more comprehensive comparison
Graphical Method – visual or pictorial forms that give a clear view of the data (histogram, Pareto chart, pictograph, etc.)
Summation Notation, Sigma Σ
Example. Write the following expressions in
expanded form.
1.
2.
3.
Solution:
1.
2.
3.
Example. Evaluate the following notations using the values below.
X1 = 1, X2 = 3, X3 = 2, X4 = 5
y1 = 0, y2 = 8, y3 = 1, y4 = 6
z1 = 4, z2 = 7, z3 = -2, z4 = 3
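The exercise's own notations are not reproduced here, so the sums below are illustrative choices rather than the original items. They show how each Σ expression translates into a loop over the indices i = 1 to 4 using the given values. Note in particular that ΣXi² and (ΣXi)² are different quantities.

```python
X = [1, 3, 2, 5]   # X1..X4
y = [0, 8, 1, 6]   # y1..y4
z = [4, 7, -2, 3]  # z1..z4

sum_X = sum(X)                                   # ΣXi = 1 + 3 + 2 + 5 = 11
sum_Xy = sum(a * b for a, b in zip(X, y))        # ΣXi·yi = 0 + 24 + 2 + 30 = 56
sum_X_sq = sum(a ** 2 for a in X)                # ΣXi² = 1 + 9 + 4 + 25 = 39
sq_sum_X = sum(X) ** 2                           # (ΣXi)² = 11² = 121
sum_y_plus_z = sum(b + c for b, c in zip(y, z))  # Σ(yi + zi) = 4 + 15 + (-1) + 9 = 27

print(sum_X, sum_Xy, sum_X_sq, sq_sum_X, sum_y_plus_z)
```

The last line also illustrates the rule Σ(yi + zi) = Σyi + Σzi, since 15 + 12 = 27.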
Frequency Distribution
After collecting data, the first task for a researcher is to
organize and simplify the data so that it is possible to get a
general overview of the results.
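A frequency distribution table is one way to do this organizing: each distinct score X is listed once with its frequency f. This sketch uses hypothetical quiz scores and Python's `collections.Counter` to do the tallying.

```python
from collections import Counter

# Hypothetical quiz scores (out of 10) for 20 students
scores = [7, 8, 6, 9, 7, 5, 8, 7, 10, 6, 7, 8, 9, 6, 7, 8, 5, 7, 9, 8]

freq = Counter(scores)  # maps each score X to its frequency f

print("X   f")
for x in sorted(freq, reverse=True):  # list scores from highest to lowest
    print(f"{x:<3} {freq[x]}")
print("N =", sum(freq.values()))
```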
Histograms
In a histogram, a bar is centered above each score (or
class interval) so that the height of the bar
corresponds to the frequency and the width extends
to the real limits, so that adjacent bars touch.
Polygons
In a polygon, a dot is centered above each score so
that the height of the dot corresponds to the
frequency. The dots are then connected by straight
lines. An additional line is drawn at each end to bring
the graph back to a zero frequency.
Bar graphs
When the score categories (X values) are measurements
from a nominal or an ordinal scale, the graph should be a
bar graph.
A bar graph is just like a histogram except that gaps or
spaces are left between adjacent bars.
Relative frequency
Many populations are so large that it is impossible to know
the exact number of individuals (frequency) for any specific
category.
In these situations, population distributions can be shown
using relative frequency instead of the absolute number of
individuals for each category.
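Relative frequency replaces each count f with the proportion p = f / N. The sketch below uses hypothetical counts from a sample of 400 respondents; the resulting proportions always sum to 1 and can be reported as percentages.

```python
# Hypothetical counts from a sample of 400 survey respondents
counts = {"Category A": 120, "Category B": 200, "Category C": 80}
N = sum(counts.values())

# Relative frequency (proportion) of each category: p = f / N
relative = {cat: f / N for cat, f in counts.items()}

for cat, p in relative.items():
    print(f"{cat}: p = {p:.2f} ({p:.0%})")
```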
Smooth curve
If the scores in the population are measured on an
interval or ratio scale, it is customary to present the
distribution as a smooth curve rather than a jagged
histogram or polygon.
The smooth curve emphasizes the fact that the
distribution is not showing the exact frequency for
each category.
Frequency distribution graphs
Frequency distribution graphs are useful because they show
the entire set of scores.
At a glance, you can determine the highest score, the lowest
score, and where the scores are centered.
The graph also shows whether the scores are clustered
together or scattered over a wide range.
Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side of the graph is (roughly)
a mirror image of the right side.
One example of a symmetrical distribution is the bell-shaped normal
distribution.
On the other hand, distributions are skewed when scores pile up on
one side of the distribution, leaving a "tail" of a few extreme values
on the other side.
Positively and Negatively Skewed Distributions
In a positively skewed distribution, the
scores tend to pile up on the left side of the
distribution with the tail tapering off to the
right.
In a negatively skewed distribution, the
scores tend to pile up on the right side and
the tail points to the left.
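The direction of skew can also be checked numerically. This sketch uses the Fisher-Pearson moment coefficient of skewness (one common measure, chosen here as an assumption; the lesson does not name one) on a small hypothetical data set with a long right tail, and on its mirror image. A positive value indicates positive skew, a negative value indicates negative skew, and in a positively skewed set the mean is pulled above the median by the tail.

```python
import statistics

def skewness(data):
    """Fisher-Pearson moment coefficient of skewness (population form, g1)."""
    n = len(data)
    m = statistics.fmean(data)
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: scores pile up on the left, with one extreme value on the right
right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 10]
left_tail = [-x for x in right_tail]  # mirror image: tail to the left

print(skewness(right_tail) > 0)  # positively skewed
print(skewness(left_tail) < 0)   # negatively skewed
print(statistics.mean(right_tail), statistics.median(right_tail))  # mean above median
```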
Stem-and-Leaf Displays
There are different qualitative data analysis methods to help you make
sense of qualitative feedback and customer insights, depending on
your business goals and the type of data you've collected.
What is qualitative data analysis?
Qualitative data analysis (QDA) is the process of organizing, analyzing, and interpreting
qualitative data—non-numeric, conceptual information and user feedback—to capture
themes and patterns, answer research questions, and identify actions to take to improve
your product or website.
Qualitative data often refers to user behavior data and customer feedback.
5 Qualitative Data Analysis Methods
Here are five methods of qualitative data analysis to help you make sense of
the data you've collected through customer interviews, surveys, and
feedback:
1. Content analysis
2. Thematic analysis
3. Narrative analysis
4. Grounded theory analysis
5. Discourse analysis
Content analysis
Content analysis is a research method that examines and quantifies the
presence of certain words, subjects, and concepts in text, image, video, or
audio messages. The method transforms qualitative input into quantitative
data to help you draw reliable conclusions about what customers think of
your brand and how you can improve their experience and opinion.
You can conduct content analysis manually or by using tools like Lexalytics
to reveal patterns in communications, uncover differences in individual or
group communication trends, and make connections between concepts.
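The quantifying step of content analysis, counting how often words appear, can be sketched in plain Python. The responses and the short stop-word list below are hypothetical, and this is only the counting core of the method, not a substitute for a dedicated tool such as Lexalytics or for a coding scheme designed by the researcher.

```python
import re
from collections import Counter

# Hypothetical open-ended survey responses (illustrative only)
responses = [
    "The checkout page is slow and the navigation is confusing.",
    "Love the product, but navigation on mobile is confusing.",
    "Slow loading times on the mobile app.",
]

# Common words ignored in the count (a tiny, assumed stop-word list)
STOP_WORDS = {"the", "is", "and", "but", "on", "of", "a"}

# Lowercase each response, split it into words, and drop stop words
words = [
    w
    for text in responses
    for w in re.findall(r"[a-z']+", text.lower())
    if w not in STOP_WORDS
]
counts = Counter(words)

print(counts.most_common(4))
```

Recurring words such as "slow", "navigation", "confusing", and "mobile" surface as the most frequent terms, pointing to the concerns worth a closer look.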
How can content analysis help your team?
Content analysis is often used by marketers and customer service specialists, helping them
understand customer behavior and measure brand reputation.
For example, you may run a customer survey with open-ended questions to discover users’
concerns—in their own words—about their experience with your product. Instead of having to
process hundreds of answers manually, a content analysis tool helps you analyze and group
results based on the emotion expressed in texts.
Some other examples of content analysis include:
Analyzing brand mentions on social media to understand your brand's reputation
Reviewing customer feedback to evaluate (and then improve) the customer and user experience (UX)
Researching competitors’ website pages to identify their competitive advantages and value propositions
Interpreting customer interviews and survey results to determine user preferences and set the direction for new product or feature development
Content analysis benefits and challenges
Content analysis has some significant advantages for small teams:
You don’t need to directly interact with participants to collect data
The process is easily replicable once standardized
You can automate the process or perform it manually
It doesn’t require high investments or sophisticated solutions
While content analysis and thematic analysis seem similar, they're different in concept:
Content analysis can be applied to both qualitative and quantitative data, and focuses on
identifying frequencies and recurring words and subjects.
Thematic analysis can only be applied to qualitative data, and focuses on identifying patterns
and ‘themes’.
How can thematic analysis help your team?
Thematic analysis can be used by pretty much anyone: from product marketers, to customer
relationship managers, to UX researchers.
For example, product teams can use thematic analysis to better understand user behaviors
and needs, and to improve UX. By analyzing customer feedback, you can identify themes
(e.g. ‘poor navigation’ or ‘buggy mobile interface’) highlighted by users, and get actionable
insight into what users really expect from the product.
Thematic analysis benefits and challenges
Some benefits of thematic analysis:
It’s one of the most accessible analysis forms, meaning you don’t have to train your teams on it
Teams can easily draw important information from raw data
It’s an effective way to process large amounts of data into digestible summaries
Narrative analysis does not work well for heavily structured interviews and written surveys, which don’t give participants as much opportunity to tell their stories in their own words.
How can narrative analysis help your team?
Narrative analysis provides product teams with valuable insight into the complexity of
customers’ lives, feelings, and behaviors.
This might look like analyzing daily content shared by your audiences’ favorite influencers on
Instagram, or analyzing customer reviews on sites like G2 or Capterra to understand
individual customers' experiences.
Narrative analysis benefits and challenges
Businesses turn to narrative analysis for a number of reasons:
The method provides you with a deep understanding of your customers' actions—and the
motivations behind them
It allows you to personalize customer experiences
It keeps customer profiles as wholes, instead of fragmenting them into components that can be
interpreted differently
Unlike other qualitative data analysis methods, grounded theory analysis develops theories from data, not the other way round.
How can grounded theory analysis help your team?
Grounded theory analysis is used by software engineers, product marketers, managers, and
other specialists that deal with data to make informed business decisions.
For example, product marketing teams may turn to customer surveys to understand the
reasons behind high churn rates, then use grounded theory to analyze responses and
develop hypotheses about why users churn, and how you can get them to stay.
Grounded theory can also be helpful in the talent management process. For example, HR
representatives may use it to develop theories about low employee engagement, and come
up with solutions based on their findings.
Grounded theory analysis benefits and challenges
Here’s why teams turn to grounded theory analysis:
It explains events that can’t be explained with existing theories
The findings are tightly connected to data
The results are data-informed, and therefore closely reflect the actual state of things
It’s a useful method for researchers who know very little about the topic
In contrast to content analysis, discourse analysis focuses on the contextual meaning of language:
it sheds light on what audiences think of a topic, and why they feel the way
they do about it.
How can discourse analysis help your team?
In a business context, the method is primarily used by marketing teams. Discourse analysis
helps marketers understand the norms and ideas in their market, and reveals why they play
such a significant role for their customers.
Once the origins of trends are uncovered, it’s easier to develop a company mission, create a
unique tone of voice, and craft effective marketing messages.
Discourse analysis benefits and challenges
Discourse analysis has the following benefits:
It uncovers the motivation behind your customers’ or employees’ words, written or spoken
It helps teams discover the meaning of customer data, competitors’ strategies, and
employee feedback
Choosing the right analysis method for your team isn't a matter of preference—selecting a
method that fits is only possible when you define your research goals and have a clear
intention. Once you know what you need (and why you need it), you can identify an analysis
method that aligns with your objectives.