
Chapter 1. Introduction To Data Analysis

Uploaded by rbqbjf5wx9
Copyright © All Rights Reserved

Introduction to Statistical Analysis

Statistics
- Singular sense: the branch of science that deals with the development of methods for collecting, organizing, presenting, analyzing, and interpreting data more effectively.
- Plural sense: numerical quantities derived from a set of data.
Major Areas/Categories of Statistics

Descriptive Statistics
- summary calculations, tabular and graphical displays, and describing important features of a set of data

Inferential Statistics
- concerned with making generalizations about a bigger group of observations based on information gathered from a smaller group
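The Python snippet below is a minimal sketch of the "summary calculations" that descriptive statistics covers. It summarizes a hypothetical set of 10 game scores for the badminton player in the trial exercise. The scores are invented for illustration, and the snippet uses only Python's standard `statistics` module. It describes these 10 games and makes no claim about future games, which would be an inferential question.

```python
import statistics

# Hypothetical scores of a badminton player in his last 10 games
scores = [21, 18, 15, 21, 19, 21, 17, 20, 16, 22]

# Descriptive summaries of this data set only (no generalization beyond it)
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    "sample std dev": statistics.stdev(scores),
}

for name, value in summary.items():
    print(f"{name:>15}: {round(value, 2)}")
```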
Exercises. Write D if you think the following illustrates descriptive statistics, otherwise,
write I for inferential statistics.
Trial: a. A badminton player wants to know his average score for the past 10 games.
b. Based on last year's electric bills, Mrs. Venegas forecasts that she will pay PHP 2,000 next month.
1. Janine wants to determine the variability of her six exam scores in Algebra.
2. Efren Reyes wants to estimate his chance of winning the next World Championship in Billiards based on his and his competitors' average scores in the last championship.
3. A car manufacturer wishes to estimate the average lifetime of batteries by testing a
sample of 50 batteries.
4. A politician wants to determine the total number of votes his rival obtained in the
past election based on his copies of the tally sheet of electoral returns.
5. A marketing research group wishes to determine the number of families not eating
three times a day in the sample used for their survey.
Basic Terminologies

Sample
- a representative subset of a population

Population
- the complete set of individuals, objects, or measurements under study
Data
- Facts or information that are collected for reference and analysis
- Can be numeric or non-numeric
Primary Data – original data gathered firsthand by the researcher
Ex. Recording the age of your classmates by asking them
Secondary Data – data that were previously gathered from an original source
Ex. Getting the list of 4Ps beneficiaries in Butuan City from the DSWD

Variable
- a characteristic that may take on different values
Examples: Civil Status – Single, Married, Divorced
Weight (in kg) – 45 kg, 50 kg, 60 kg, etc.
Let us determine the population under study and the variable of interest.
a. A group of researchers is interested in determining the number of children below 12 years old infected with COVID-19 in Caraga.
Population: Children below 12 years old in Caraga
Variable of Interest: Whether or not the child has ever been infected with COVID-19

b. A group of researchers is interested in determining the age of the patients infected with COVID-19 in MJ Santos Hospital.
Population: Patients infected with COVID-19 in MJ Santos Hospital
Variable of Interest: Age of the COVID-19 patients in MJ Santos Hospital
Types of Variables

Quantitative Variables
- variables that are measured on a numeric or quantitative scale
Discrete variables
- variables with a finite or countable number of possible values (e.g., age (in years), number of female enrollees in CSU for A.Y. 2024-2025, etc.)
Continuous variables
- variables that can assume any value in a given interval (e.g., height (in meters), weight (in kg), etc.)

Qualitative Variables
- variables that assume values that are names or labels, and thus can be categorized (categorical variables)
- Categories may be identified either by non-numerical descriptions or by numerical codes
- E.g., civil status, religious affiliation, etc.
Determine which type of variable.
1. Address – Qualitative
2. House No. – Qualitative
3. Weight (in kg) – Quantitative, Continuous
4. Diaper Size – Qualitative
5. Color of the Leaf – Qualitative
6. Number of Units Enrolled – Quantitative, Discrete
7. SSS Number – Qualitative
8. Telephone Number – Qualitative
9. Foot length (in cm) – Quantitative, Continuous
10. Brand of Cellphone – Qualitative
Levels of Measurement
Measurement - the process of determining the value or label of a variable based on what has been observed
1. Nominal Level
2. Ordinal Level
3. Interval Level
4. Ratio Level
(Figure: the four levels drawn as a pyramid, with Nominal at the base, then Ordinal and Interval, and Ratio at the top.)
1. Nominal Level
- Observations can be named, but no particular order or ranking is imposed on the data. Words, letters, and even numbers are used to classify the data.

2. Ordinal Level
- Describes ranking or order. The difference between two rankings may not always be the same.

3. Interval Level
- Indicates an actual amount (numerical). The order of the values and the differences between them can be known. Its limitation is that it has no "true zero".

4. Ratio Level
- Has the same properties as the interval level: the order and differences can be described. Additionally, it has a true zero, so the ratio between two values has a meaning.
Let us determine the level of measurement of each variable.
1. Zip code – Nominal
2. Brand of Shampoo used – Nominal
3. Weight – Ratio
4. Police Rank – Ordinal
5. Room Temperature in Celsius – Interval
6. Diaper Size – Ordinal
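The small Python sketch below shows why the absence of a true zero matters. The temperatures and weights are hypothetical, and the pound conversion factor (about 2.20462 lb per kg) is approximate. Converting Celsius readings to Fahrenheit changes the ratio between them, so 20 °C is not "twice as warm" as 10 °C. Differences still convert consistently. For weight, a ratio-level variable, the ratio stays the same in either unit.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds (1 kg is approximately 2.20462 lb)."""
    return kg * 2.20462


# Interval level: 0 degrees Celsius does not mean "no temperature"
t1, t2 = 10.0, 20.0
ratio_c = t2 / t1
ratio_f = celsius_to_fahrenheit(t2) / celsius_to_fahrenheit(t1)
diff_c = t2 - t1
diff_f = celsius_to_fahrenheit(t2) - celsius_to_fahrenheit(t1)
print(f"Celsius ratio {ratio_c:.2f} vs Fahrenheit ratio {ratio_f:.2f}")
print(f"Difference: {diff_c:.0f} C = {diff_f:.0f} F")

# Ratio level: 0 kg means no weight, so the ratio survives a change of unit
w1, w2 = 30.0, 60.0
ratio_kg = w2 / w1
ratio_lb = kg_to_lb(w2) / kg_to_lb(w1)
print(f"kg ratio {ratio_kg:.2f} vs lb ratio {ratio_lb:.2f}")
```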
Exercises. Determine whether each item is quantitative or qualitative and identify its
level of measurement.
1. Postal zip code
2. Student Number
3. Ranking of a student in class
4. Annual salary of employee
5. Body temperature of a child measured in Celsius
6. Tax identification number of an employee
7. Performance rating of an employee as excellent, very good, good, fair, or bad
8. Number of subjects enrolled in
9. Student’s score in a quiz
10. Citizenship
Some Mathematical Notations

The Summation Symbol
The summation symbol is denoted by Σ and is defined by

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$$

Example: Consider the set of values 5, 4, 8, and 6.

$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 5 + 4 + 8 + 6 = 23$$

$$\sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 5^2 + 4^2 + 8^2 + 6^2 = 25 + 16 + 64 + 36 = 141$$
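You can check both sums in the example with Python's built-in `sum`. The last quantity is an illustrative addition. It shows that the square of the sum, (Σ x_i)² = 529, is a different quantity from the sum of squares, Σ x_i² = 141.

```python
x = [5, 4, 8, 6]  # x_1, x_2, x_3, x_4

sum_x = sum(x)                            # sum of x_i
sum_x_squared = sum(xi ** 2 for xi in x)  # sum of x_i^2
square_of_sum = sum_x ** 2                # (sum of x_i)^2, not the same thing

print(sum_x, sum_x_squared, square_of_sum)  # 23 141 529
```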
The Factorial Symbol
The factorial symbol is denoted by ! and is defined by

$$n! = 1 \cdot 2 \cdot 3 \cdot \cdots \cdot n$$

- n! is the product of all positive integers less than or equal to n.
- By convention, 0! = 1.

Example: Solve for n! for n = 5 and n = 7.

$$5! = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$$

$$7! = 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5040$$
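The following is a minimal Python sketch of the definition. It builds the product 1 · 2 · ⋯ · n in a loop, and the empty product gives 0! = 1. The helper name `factorial` is our own. The standard library's `math.factorial` is used only as a cross-check.

```python
import math


def factorial(n: int) -> int:
    """n! = 1 * 2 * ... * n, with 0! = 1 by convention."""
    if n < 0:
        raise ValueError("factorial is defined only for non-negative integers")
    result = 1  # empty product, so factorial(0) == 1
    for k in range(2, n + 1):
        result *= k
    return result


for n in (0, 5, 7):
    assert factorial(n) == math.factorial(n)  # cross-check with the standard library
    print(f"{n}! = {factorial(n)}")
```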
Methods of Data Collection
Data can be collected:
• Directly (PRIMARY DATA), using
  • Questionnaires
  • Interviews
  • Experiments
  • Direct observations
• Indirectly (SECONDARY DATA), through
  • Existing documents/records
Methods of Communication

| Factor | Self-Administered Questionnaire | Telephone Interview | On-line Survey | Personal Interview |
|---|---|---|---|---|
| Cost | Inexpensive | Quite expensive | Quite expensive | Very expensive |
| Speed | Time-consuming | Fast | Moderately fast | Time-consuming |
| Response Rate | Poor | Average | Average | Very good |
| Interviewer Bias | None | Likely | None | Highly likely |
| Quality of Response | May be vague | Good | May be vague | Very good |
| Type of Information | Limited | Limited | Limited | Wide-range |
Exercises. What method of data collection is most appropriate for the following
cases? Give a brief explanation for your choice.
1. Studying two groups of patients and determining whether exercise lowers blood pressure.
2. The Department of Health monitors and evaluates the benefits of the family planning methods given to Brgy. Ampayon.
3. A nongovernmental organization compares the household expenditures of two districts in Butuan City.
4. A group of Anthropology students studies the culture and norms of two ethnic groups.
5. A social welfare organization gathers information on hospital patients with mental disorders.
6. A car manufacturer studies consumers' car preferences for its next production.
Census vs. Sample Survey
Sampling
Sampling is the process of selecting observations (a sample) to provide an adequate description of, and inferences about, the population.
Sample Size Determination

INFINITE POPULATION

$$n_0 = \frac{Z_{\alpha/2}^2 \, pq}{d^2}$$

FINITE POPULATION

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}$$

TERMS:
• Total Population ($N$) – the total number of units in the population.
• Margin of error ($d$) – a statistic expressing the amount of random sampling error in a survey's results.
• Level of Significance ($\alpha$) – the probability of committing a Type I error.
• $Z_{\alpha/2}$ – the critical value of the standard normal distribution that leaves an area of $\alpha/2$ in the upper tail.
• Sample proportion ($p$) – the proportion you expect the results to show. This can often be determined from the results of a previous survey or by running a small pilot study. If you are unsure, use 50%, which is conservative and gives the largest sample size.
• $q = 1 - p$.
Example. You are investigating the level of awareness of CHaSS students in CSU of the accessibility law, BP 344. Three (3) programs were used as the target populations, namely BS SW ($N_1 = 200$), AB Socio ($N_2 = 500$), and BS Psych ($N_3 = 800$). Since no data are available on the proportion of CHaSS students who are knowledgeable, you take the worst-case scenario and set p = 0.5 (and therefore q = 1 − 0.5 = 0.5). As this is a preliminary study, you are prepared to accept a margin of error of ±5%, so you set d = 0.05. How many students per program should you get for your sample?
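Given: $N = 1500$, $p = 0.5$, $q = 0.5$, $d = 0.05$, $\alpha = 0.05$. So, $Z_{\alpha/2} = Z_{0.025} = 1.96$.

Now,

$$n_0 = \frac{Z_{\alpha/2}^2 \, pq}{d^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16 \approx 385.$$

So,

$$n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}} = \frac{385}{1 + \dfrac{385 - 1}{1500}} = 306.53 \approx 307.$$

Sample sizes are rounded up to the next whole number.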
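| Stratum | Population ($N_i$) | Proportion ($P_i = N_i/N$) | Sample Needed ($n_i = n \times P_i$) |
|---|---|---|---|
| BS SW | 200 | 0.133 | 307 × 0.133 ≈ 41 |
| AB Socio | 500 | 0.333 | 307 × 0.333 ≈ 103 |
| BS Psych | 800 | 0.533 | 307 × 0.533 ≈ 164 |
| Total | 1500 | 1.000 | 308 |

Each $n_i$ is rounded up, so the total (308) is one more than $n = 307$.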
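You can reproduce the computation above with a short Python sketch. It uses the same inputs (p = 0.5, d = 0.05, α = 0.05, and the three program sizes) and, like the table, rounds every sample size up. The function names are hypothetical. The sketch takes Z_{α/2} from the standard library's `statistics.NormalDist` (about 1.95996) rather than the rounded 1.96. With these inputs, both values give n_0 = 385.

```python
import math
from statistics import NormalDist


def sample_size_infinite(p: float, d: float, alpha: float) -> int:
    """n0 = Z_{alpha/2}^2 * p * q / d^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    q = 1 - p
    return math.ceil(z ** 2 * p * q / d ** 2)


def sample_size_finite(n0: int, N: int) -> int:
    """Finite population correction n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / N))


def proportional_allocation(n: int, strata: dict) -> dict:
    """n_i = n * N_i / N for each stratum, each rounded up."""
    N = sum(strata.values())
    return {name: math.ceil(n * N_i / N) for name, N_i in strata.items()}


strata = {"BS SW": 200, "AB Socio": 500, "BS Psych": 800}
n0 = sample_size_infinite(p=0.5, d=0.05, alpha=0.05)
n = sample_size_finite(n0, N=sum(strata.values()))
allocation = proportional_allocation(n, strata)
print(n0, n, allocation, sum(allocation.values()))
```

Rounding each stratum up is what makes the allocated total exceed n. A study that must hit n exactly would need a different rounding rule.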
Probability Sampling
SIMPLE RANDOM SAMPLING
• All units of the frame are given an equal probability of being selected.
• Random number generators
• Lottery
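Here is a minimal sketch of simple random sampling in Python. It uses a hypothetical sampling frame of 1,500 student ID numbers (the CHaSS total) and the sample size n = 307 from the example. `random.sample` draws without replacement, so every subset of 307 units is equally likely. The fixed seed only makes the draw reproducible.

```python
import random

# Hypothetical sampling frame: student ID numbers 1 to 1500
frame = list(range(1, 1501))

rng = random.Random(2024)          # fixed seed so the draw can be reproduced
sample = rng.sample(frame, k=307)  # equal chance for every unit, no repeats
print(sorted(sample)[:10])
```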
SYSTEMATIC RANDOM SAMPLING
• Order all units in the sampling frame.
• Then every kth unit on the list is selected, beginning from a randomly chosen starting point.
• k = sampling interval (k = N/n)
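The next sketch shows systematic random sampling. It takes the interval as k = N // n and chooses the start at random among the first k units. The function name and the ordered frame of 1,500 hypothetical students are illustrative only.

```python
import random


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Select every k-th unit of an ordered frame after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the frame size")
    k = len(frame) // n       # sampling interval
    start = rng.randrange(k)  # random start among the first k units (0-based)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical ordered sampling frame of 1,500 students
frame = [f"Student {i:04d}" for i in range(1, 1501)]
sample = systematic_sample(frame, n=300, rng=random.Random(7))
print(sample[:5])
```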
STRATIFIED RANDOM SAMPLING
• The population is divided into two or more homogeneous groups called strata.
• Samples are randomly selected from each stratum.
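This sketch shows stratified random sampling in Python: it draws an independent simple random sample from each stratum. The frames are hypothetical placeholders for the three CHaSS programs. The per-stratum sizes are the ones allocated in the sample-size example.

```python
import random


def stratified_sample(strata: dict, sizes: dict, rng: random.Random) -> dict:
    """Draw an independent simple random sample of the requested size from each stratum."""
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}


# Hypothetical sampling frames for the three programs
strata = {
    "BS SW": [f"SW-{i:03d}" for i in range(1, 201)],
    "AB Socio": [f"SOC-{i:03d}" for i in range(1, 501)],
    "BS Psych": [f"PSY-{i:03d}" for i in range(1, 801)],
}
sizes = {"BS SW": 41, "AB Socio": 103, "BS Psych": 164}
sample = stratified_sample(strata, sizes, random.Random(1))
for name, units in sample.items():
    print(name, len(units), units[:3])
```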
CLUSTER SAMPLING
• The population is divided into natural groups (clusters).
• Some clusters are randomly picked from all the clusters.
• All units in the chosen clusters are completely enumerated.
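Below is a sketch of one-stage cluster sampling. Hypothetical barangays serve as the natural clusters. The code draws whole clusters at random and keeps every household in each chosen cluster. The function name and data are illustrative only.

```python
import random


def cluster_sample(clusters: dict, n_clusters: int, rng: random.Random) -> dict:
    """Randomly choose whole clusters, then keep every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)       # random selection of clusters
    return {name: list(clusters[name]) for name in chosen}  # complete enumeration


# Hypothetical frame: households grouped by barangay
clusters = {
    "Barangay A": ["A1", "A2", "A3"],
    "Barangay B": ["B1", "B2"],
    "Barangay C": ["C1", "C2", "C3", "C4"],
    "Barangay D": ["D1"],
    "Barangay E": ["E1", "E2", "E3"],
}
sample = cluster_sample(clusters, n_clusters=2, rng=random.Random(3))
print(sample)
```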
Nonprobability Sampling
CONVENIENCE SAMPLING
• Convenience sampling involves choosing respondents at the convenience of the researcher.
• Very low cost
• Extensively used
• Limited generalizability of results
JUDGEMENTAL SAMPLING
• The researcher uses his or her own "expert" judgment in selecting the respondents.
• Assures quality responses
• Meets the specific objectives of the study
• Biased selection of the sample may occur
• Time-consuming process
QUOTA SAMPLING
• Nonprobability sampling version of stratified sampling
• Strata exist, but individuals within each group are selected nonrandomly.
• The researcher simply sets a quota for each group.
SNOWBALL SAMPLING
• The researcher starts with a key person, who introduces the next respondent, forming a chain.
• Low cost
• Useful in specific circumstances and for locating rare populations
• Projecting data beyond the sample is not justified
Guidelines for Conducting Surveys
Before conducting a survey, the following steps should be completed.
• Develop a clear and concise purpose statement: what the researcher wants to know and why.
• Develop the items (questions, etc.) for the instrument.
• Test the questions on a group of at least 20 volunteers to determine whether the face validity of the items is adequate.
• Develop the introduction to the survey, the letters that will precede and accompany the distribution of the survey, and the reminders to be sent after the survey has gone out.
• Determine the modality of survey distribution.
• Schedule the sending of the initial letter, the launching of the survey instrument, and the sending of the follow-up letter(s).
• Avoid "coverage error" by gathering a sample list of potential participants that matches the population of interest as closely as possible.
Basic Terms and Concepts on Experimental Design
EXPERIMENTAL VS OBSERVATIONAL

Experimental Studies
• In an experimental study, the researcher intervenes (experiments) in some way to affect the manner in which the study units, or "experimental units", respond.
• The investigator controls how the subjects are assigned to different comparison groups and also regulates the experimental conditions of each group.
Example. The investigator conducts an experiment to determine the amount of fertilizer that produces a high yield in rice.

Observational Studies
• In an observational study, the researcher merely "observes" what is happening to the study units, or what has happened in the past, and tries to draw conclusions based on these observations.
• Here, the investigator has no control over the group designation of each subject.
Example. A study is conducted to observe the social behavior of 3-year-old toddlers in a certain playground.
FACTOR
• An experimental variable of interest that potentially affects the response variable.
Example. In a field experiment, an investigator is interested in the amount of fertilizer needed to optimize the yield of a certain crop.
Factor: Fertilizer

Example. A researcher conducted a study on the effect of 4 types of feeds on the growth of a certain breed of chicken.
Factor: Type of feeds
TREATMENTS
• In a single-factor experiment, the different levels or values of the factor are called treatments.
Example. A researcher conducted a study on the effect of 4 types of feeds on the growth of a certain breed of chicken.
Factor: Type of feeds
Treatments: 4 types of feeds

Example. In a chemical experiment, a chemist is testing the amount of a chemical compound that will dissolve in water at 5 different levels of temperature.
Factor: Temperature
Treatments: 5 different levels of temperature
TREATMENTS
• In an experiment with 2 or more factors, a treatment is a combination of the levels of the factors.
Example. In a biological experiment, four concentrations of a certain chemical are used to enhance the growth of two types of plants over a specified period of time.
Factors: A. Concentration of chemical (4 levels) and B. Type of plant (2 levels)
Treatments (4 × 2 = 8):
Con. 1 on plant A, Con. 1 on plant B, Con. 2 on plant A, Con. 2 on plant B,
Con. 3 on plant A, Con. 3 on plant B, Con. 4 on plant A, Con. 4 on plant B
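In a factorial arrangement, the treatments are the Cartesian product of the factor levels. This Python sketch uses `itertools.product` from the standard library to list the 4 × 2 = 8 combinations in the same order as above. The label strings simply mirror the text.

```python
from itertools import product

concentrations = ["Con. 1", "Con. 2", "Con. 3", "Con. 4"]  # factor A: 4 levels
plants = ["plant A", "plant B"]                            # factor B: 2 levels

# Every combination of one level of A with one level of B is a treatment
treatments = [f"{c} on {p}" for c, p in product(concentrations, plants)]

print(len(treatments))
for t in treatments:
    print(t)
```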
EXPERIMENTAL UNIT
• The object/unit to which a treatment is applied.
Example. A researcher conducted a study on the effect of 4 types of feeds on the growth of a certain breed of chicken.
Factor: Type of feeds
Treatments: 4 types of feeds
Experimental Unit: Chicken
REPLICATION
• Applying a treatment to more than one experimental unit
• Done to measure the experimental error, which is used in testing hypotheses in order to assess the validity and reliability of observations
Example. In the study comparing the effect of 4 types of feeds on the growth of a certain breed of chicken, if treatment 1 (feed 1) is given to 5 chickens, then the experiment has 5 replicates for treatment 1.

Example. Suppose we want to compare the efficiency of four tractors, and three workers will drive each tractor. Here, the 4 tractors are the treatments and the workers are the replicates.
EXPERIMENTAL ERROR
• The difference in values among experimental units treated alike
Example. In the experiment comparing the 4 types of feeds, the differences in the weight gained by the 5 chickens within a treatment are called the experimental error.
• How is the experimental error of a treatment measured? It is measured by computing the sample variance of the observations in that treatment.
Experimental Error for a Treatment:

$$\text{sample variance} = \frac{\sum (\text{observation} - \text{mean of the treatment group})^2}{\text{number of observations} - 1}$$

Overall experimental error – the pooled variance of the k treatments. When every treatment has the same number of replicates, this is simply the average of the k treatment variances.
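The following minimal Python sketch computes these two measures for hypothetical weight gains (in arbitrary units) of 5 chickens on each of the 4 feeds. Each treatment's experimental error is its sample variance (divisor n − 1). The overall error is a pooled variance weighted by the degrees of freedom n_i − 1. Every feed here has 5 replicates, so the pooled variance equals the plain average of the four variances.

```python
import statistics

# Hypothetical weight gains (arbitrary units) of 5 chickens per feed type
gains = {
    "Feed 1": [2, 4, 4, 4, 6],
    "Feed 2": [1, 2, 3, 4, 5],
    "Feed 3": [5, 5, 5, 5, 5],
    "Feed 4": [3, 5, 7, 9, 11],
}

# Experimental error of each treatment: the sample variance (divisor n - 1)
variances = {feed: statistics.variance(obs) for feed, obs in gains.items()}

# Overall experimental error: pooled variance, weighting each treatment's
# variance by its degrees of freedom (n_i - 1)
df_total = sum(len(obs) - 1 for obs in gains.values())
pooled = sum((len(obs) - 1) * variances[feed] for feed, obs in gains.items()) / df_total

for feed, var in variances.items():
    print(f"{feed}: s^2 = {var:.3f}")
print(f"Pooled variance: {pooled:.3f}")
```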
RANDOMIZATION PROCEDURE
• Refers to the methods used for randomly assigning treatments to the experimental units
• Randomizing the treatments helps eliminate any possible order effect
• The randomization procedures that will be discussed are:
  • Completely Randomized Design (CRD) – applicable if the experimental units have uniform characteristics.
  • Randomized Complete Block Design (RCBD) – applicable if the experimental units differ in certain characteristics that would affect the results of the study.
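You can sketch the two randomization procedures in Python as follows. The function names, the seed, and the blocking variable (initial body weight) are hypothetical. In the CRD, any unit can receive any treatment. In the RCBD, each block receives every treatment exactly once, in its own random order.

```python
import random


def crd_layout(treatments: list, replicates: int, rng: random.Random) -> list:
    """Completely randomized design: shuffle r copies of every treatment over all units."""
    layout = [t for t in treatments for _ in range(replicates)]
    rng.shuffle(layout)
    return layout  # layout[i] is the treatment assigned to experimental unit i


def rcbd_layout(treatments: list, blocks: list, rng: random.Random) -> dict:
    """Randomized complete block design: each block receives every treatment once,
    in an order randomized independently within that block."""
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order
    return layout


feeds = ["Feed 1", "Feed 2", "Feed 3", "Feed 4"]
rng = random.Random(42)

# CRD: 20 similar chickens, 5 per feed
crd = crd_layout(feeds, replicates=5, rng=rng)
# RCBD: hypothetical blocks of chickens grouped by initial body weight
rcbd = rcbd_layout(feeds, blocks=["Light", "Medium", "Heavy"], rng=rng)

for unit, feed in enumerate(crd, start=1):
    print(f"Chicken {unit:2d}: {feed}")
for block, order in rcbd.items():
    print(block, order)
```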
CONFOUNDING EFFECT
• Suppose you put several green plants in an area exposed to sunlight and several green plants in a dark area.
• After a few days, you observe that the plants in the dark area have turned yellow while those exposed to sunlight remained green.
• If the dark area has a cooler temperature, the yellowing could be due to the temperature and not to the lack of sunlight.
• This effect is what we call CONFOUNDING: the effect of light is confounded with that of temperature, that is, the effect of light is mixed up with the effect of temperature.
FIXED EFFECT
• When the levels/treatments of a factor are the only concern of the study
Example. Based on their scores in board examinations, a researcher wanted to compare the performance of graduates of five schools of nursing located in a city.
• The researcher is interested only in these five schools, so the school, which is the factor of the study, is a fixed effect.
RANDOM EFFECTS
Example. The researcher may be interested in finding out whether the performance of nursing graduates in the Philippines has something to do with the schools of nursing from which they graduated. It would not be practical to take every school of nursing as a level of the factor.
Instead, the researcher may take a random selection of, say, 10 nursing schools out of several thousand. The mean of each level is not a constant, since it varies from sample to sample. In this case, the factor is said to have random effects.
• An experiment may involve at least one factor having fixed effects and at least one factor having random effects. In this case, the factors in the experiment have mixed effects.
DESIGN OF AN EXPERIMENT
• Refers to the number of factors (whether it is a single-factor, two-factor, three-factor experiment, etc.), the choice of the treatments (whether fixed or random) to appear in the experiment, and the way in which the study units are assigned to the treatments.

Examples.
i. Single-Factor Experiment with Fixed Factor in Completely Randomized Design
ii. Single-Factor Experiment with Fixed Factor in Randomized Complete Block Design
