Data Analysis

This document provides an overview of key concepts in data analysis, including coding data, data cleaning, handling missing values, principal analyses, and levels of measurement. It also discusses descriptive versus inferential statistics. Descriptive statistics are used to summarize and organize data through methods like frequency distributions and measures of central tendency and dispersion. Inferential statistics allow inferences about populations based on samples through hypothesis testing, where statistical significance is determined by comparing a test statistic to critical values.
Assoc. Prof. Theresa A. Guino-o


Coding Data
⚫ The process of transforming data into symbols
compatible with computer analysis—usually
numeric values
⚫ Types of coding:
⚫ Inherently quantitative variables (e.g., weight)
⚫ Precoded data (e.g., yes/no)
⚫ Uncategorized data (e.g., open-ended
questions)
⚫ Missing values (e.g., refusals, don’t knows)
Data Cleaning
⚫ Includes checks for unusual values
⚫ Outliers (unusual/extreme values)
⚫ Wild codes (impossible values)
⚫ Includes consistency checks—internal consistency
of data within a case (e.g., any pregnant males?)
Missing Values Problems
Solutions (a short code sketch of these options follows this list):
⚫ Delete cases with missing values (listwise deletion)
⚫ Pairwise deletion: selective deletion of cases with
missing values, analysis by analysis
⚫ Delete the variable
⚫ Estimate the missing value (e.g., through regression)
Other data preparation tasks:
⚫ Performing item reversals
⚫ Constructing scales
⚫ Performing counts
⚫ Recoding variables
⚫ Meeting statistical assumptions

⚫ E.g., p. 420 scale
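The options above can be sketched in a few lines of Python using pandas. This is a minimal illustration, assuming a small hypothetical data set with made-up column names ("age" and "score"); mean imputation stands in for the more elaborate regression-based estimation.

```python
import pandas as pd

# Hypothetical survey data; "age" and "score" are illustrative column names.
df = pd.DataFrame({"age": [34, None, 52, 61], "score": [3, 4, None, 5]})

# Listwise deletion: drop any case that has a missing value on any variable.
listwise = df.dropna()

# Pairwise deletion happens analysis by analysis: each statistic uses only the
# cases that are complete for the variables involved (pandas does this by default).
pairwise_r = df["age"].corr(df["score"])

# Estimating missing values: mean imputation shown here; a regression-based
# estimate would replace the mean with a prediction from a fitted model.
imputed = df.fillna(df.mean(numeric_only=True))

print(listwise, pairwise_r, imputed, sep="\n")
```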


Principal Analyses
⚫ Consider the hypotheses/questions
⚫ Perform descriptive statistical analyses
⚫ Perform bivariate statistical analyses
⚫ Perform multivariate analyses
Levels of Measurement
(Chapter 19 – page 451-454)

⚫ Analyses that can be performed on data
depend on their measurement level

⚫ 4 levels of measurement:
⚫ Nominal
⚫ Ordinal
⚫ Interval
⚫ Ratio
Nominal Measurement
⚫ lowest level
⚫ involves assigning numbers to classify
characteristics into categories
⚫ numbers are merely symbols that represent
different values
⚫ categories must be mutually exclusive and
collectively exhaustive.
⚫ e.g., male (1), female (2)
Ordinal Measurement
⚫ Rank orders phenomena along some
dimension
⚫ involves sorting objects on the basis of their
relative standing or ranking on an attribute
⚫ Ex.
⚫ 1= low
⚫ 2=medium
⚫ 3= high
Interval Measurement
⚫ A measurement in which an attribute of a
variable is rank ordered on a scale that has
equal distances between points on that
scale.
⚫ No true (absolute) zero point
⚫ EX. Fahrenheit degrees
⚫ E.g. self-expectancy scale
Ratio Scale
⚫ A quantitative measurement in which
intervals are equal and there is a true zero
point.
⚫ The highest level of measurement
⚫ All arithmetic operations are permissible with
this measurement (add, subtract, multiply,
and divide numbers on this scale).
⚫ E.g., patient's weight, number of days in ICU,
height
Descriptive vs. Inferential
Statistics
Descriptive Statistics
⚫ Descriptive Statistics are used to present
quantitative descriptions in a manageable form.
⚫ This method works by reducing lots of data into a
simpler summary.
⚫ Organizes the data for easier understanding
Examples of Descriptive Statistics
⚫ Frequency Distribution-used to group data
⚫ Percentages
⚫ Central Tendencies (Mean, Median & Mode)
⚫ Measures of Dispersion
(Range, Difference scores, Sum of squares,
variance, Standard Deviation)
This is the examination across cases of one
variable at a time (UNIVARIATE analysis).
Described through a table or a graph
(histogram, bar chart)
A Frequency Distribution Table

Table 1 Age Distribution of Respondents

Age Category   Percent (%)
Under 35         9
36-45           21
46-55           45
56-65           19
66+              6
Total          100
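A frequency and percentage distribution like Table 1 can be built with pandas. This is a small sketch with hypothetical ages; the bin edges simply mirror the table's categories.

```python
import pandas as pd

# Hypothetical respondent ages, grouped into the categories used in Table 1.
ages = pd.Series([32, 40, 44, 47, 50, 52, 53, 58, 61, 67])

bins = [0, 35, 45, 55, 65, 120]
labels = ["Under 35", "36-45", "46-55", "56-65", "66+"]
groups = pd.cut(ages, bins=bins, labels=labels)

freq = groups.value_counts().sort_index()      # frequency distribution
percent = (freq / freq.sum() * 100).round(1)   # percentage distribution
print(pd.DataFrame({"Frequency": freq, "Percent (%)": percent}))
```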
Frequency Distribution –
Ungrouped (Bar graph)

1st Yr: 2   2nd Yr: 1   3rd Yr: 3   4th Yr: 5   Grad: 1

Fig. 1 Distribution of Respondents According to Year Levels in School
Frequency Distribution –
Grouped (histogram)
Ages 20-39 - 14
Ages 40-59 - 43
Ages 60-79 - 26
Ages 80-100 - 4

Fig. 2 Distribution of Respondents by Age Range


Percentage Distribution
Salaries 41.7%
Maintenance 8.3%
Equipment 16.7%
Fixed costs 8.3%
Supplies 25.0%

Fig. 3 Distribution of Company Expense Allocations


Measures of Central Tendency
⚫ An estimate of the “center” of a distribution
⚫ Three different types of estimates:
⚫ Mean
⚫ Median
⚫ Mode
Mean
● The sum of values divided by the number of
values being summed.
● The mean may not be a value in the data set.
● Appropriate for interval and ratio measures
● The numbers need not be arranged from highest
to lowest
● Example data set (n = 50):
1 1 1 1 3 3 3 3 3 3 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9
● Mean = 264 / 50 = 5.28
Median
⚫ The median is the score found at the exact
middle of the ordered set.
⚫ When the number of values is even, the median
may not be an actual value in the data set (it is
the average of the two middle values).
⚫ One must list all scores in numerical order
and then locate the score in the center of the
sample.
⚫ Example: If there are 500 scores in the list, the
median is the average of scores #250 and #251.
⚫ The median is useful because it is not distorted
by outliers.
⚫ Appropriate for ordinal measures
Mode
⚫ The mode is the most repeated score in the
set of results; it may not always be at the
center.
⚫ Let's take the set of scores: 15, 20, 21, 20, 36, 15,
25, 15
⚫ Lined up: 15, 15, 15, 20, 20, 21, 25, 36
⚫ 15 is the most repeated score and is therefore
labeled the mode.
⚫ Appropriate for nominal measures
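As a check on the three measures, here is a minimal sketch using Python's standard statistics module; the 50 scores reproduce the data set from the Mean slide, and the last line uses the scores from the Mode example.

```python
from statistics import mean, median, mode

# The 50-score data set from the Mean slide:
# four 1s, six 3s, eight 4s, fourteen 5s, eight 7s, six 8s, four 9s.
scores = [1]*4 + [3]*6 + [4]*8 + [5]*14 + [7]*8 + [8]*6 + [9]*4

print(mean(scores))    # 264 / 50 = 5.28
print(median(scores))  # middle of the ordered scores -> 5
print(mode(scores))    # most frequent score -> 5 (occurs 14 times)

print(mode([15, 20, 21, 20, 36, 15, 25, 15]))  # Mode slide example -> 15
```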
Measures of Dispersion
Range
● Obtained by subtracting the lowest score from the
highest score.
● Uses only the two extreme scores.
● Very crude measure and sensitive to outliers.
● Example (same 50-score data set as the Mean slide): 9 - 1 = 8
Standard Deviation
● The square root of the variance.
● The standard deviation is the "average" difference
score.
● Example (same data set): SD = 2.22
⚫ The standard deviation is a value
that shows the relation that individual
scores have to the mean of the sample.
⚫ If scores are normally distributed,
one can assume that
approximately 68% of the scores in the
sample fall within one standard deviation
of the mean and 95% of the scores fall within
two standard deviations of the mean.
⚫ Can use different parametric tests for
analysis
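The range and standard deviation for the same 50-score data set can be verified in a couple of lines; note that the slide's value of 2.22 corresponds to the sample (n - 1) standard deviation.

```python
import statistics

# Same 50-score data set as in the central tendency examples.
scores = [1]*4 + [3]*6 + [4]*8 + [5]*14 + [7]*8 + [8]*6 + [9]*4

value_range = max(scores) - min(scores)   # highest minus lowest: 9 - 1 = 8
sd = statistics.stdev(scores)             # sample SD = square root of the variance
print(value_range, round(sd, 2))          # 8 2.22
```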
Standard Deviations in a
Normal Distribution
Inferential Statistics
⚫ Based on the laws of probability
⚫ permit inferences on whether relationships
observed in a sample are likely to occur in a
larger population
⚫ It estimates population parameters from
sample statistics
⚫ Used in hypothesis testing
⚫ Based on rules of negative inference:
research hypotheses are supported if null
hypotheses can be rejected
What is bivariate data analysis?
●Comparison of summary
values from two groups on
the same variable or of two
variables within one group
Level of Significance: the risk of
making a Type I error
⚫ With a .05 significance level, we are accepting
the risk that out of 100 samples drawn from
a population, a true null hypothesis would
be rejected only 5 times.

⚫ With a .01 level of significance, the risk of a
Type I error is lower: in only 1 sample out of
100 would we erroneously reject the null
hypothesis.
Overview of Hypothesis-Testing
Procedures
1. Select an appropriate test statistic
2. Establish the level of significance (e.g.,
α = .05)
3. Select a one-tailed or a two-tailed test
4. Compute test statistic with actual data
5. Calculate degrees of freedom (df) for
the test statistic (the number of sample
values free to vary from the mean)
Overview of Hypothesis-Testing
Procedures (cont’d)
6. Obtain a tabled value for the statistical test
or obtain a p-value from the computer-
generated results (the computed probability of
a Type I error)
7. Determine the statistical significance of
results
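As a sketch of these steps, the following runs an independent-groups t-test in SciPy on two small hypothetical groups; the test is two-tailed by default, and the computed p-value is compared with α to decide significance.

```python
from scipy import stats

# Hypothetical scores from two independent groups.
group_a = [12, 14, 15, 16, 18, 19, 21]
group_b = [10, 11, 11, 13, 14, 15, 16]

alpha = 0.05                                          # step 2: level of significance
t_stat, p_value = stats.ttest_ind(group_a, group_b)  # steps 1 and 4: compute the test statistic
df = len(group_a) + len(group_b) - 2                  # step 5: degrees of freedom
print(t_stat, df, p_value)

# Step 7: the result is statistically significant if p < alpha.
print("reject the null hypothesis" if p_value < alpha else "retain the null hypothesis")
```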
Statistical significance
⚫ Means that the obtained results are not likely
to have been the result of chance.
⚫ If the absolute value or computed value of the
test statistic is larger than the tabled value, the
results are statistically significant.
⚫ OR if the p value is smaller than (i.e., more
stringent than) the α value (level of
significance), the results are statistically
significant.
⚫ DECISION: Null hypothesis is rejected
Non-significance
A NON -SIGNIFICANT RESULT means that
any observed difference or relationship
could have resulted from chance
fluctuations.
Null hypothesis is accepted
In one-tailed tests, the critical region of
improbable values is entirely in one tail of the
distribution (the tail corresponding to the
direction of the hypothesis).
Two-tailed test- this means that both ends or tails
of the sampling distribution are used to
determine improbable values.
Statistical Decisions are Either
Correct or Incorrect
Two types of incorrect decisions:
⚫ Type I error: a null hypothesis is rejected
when there is actually no relationship between
the variables
⚫ Risk of a Type I error is controlled by the
level of significance (alpha), e.g., α = .01
or .05
⚫ Type II error: a null hypothesis is accepted
when there is actually a relationship between
the variables
⚫ Risk of a Type II error is controlled by
increasing the sample size and power
Parametric Statistics
⚫ Involve the estimation of a parameter
⚫ Require measurements on an interval scale
or ratio scale
⚫ Involve several assumptions (e.g., that
variables are normally distributed in the
population)
⚫ More powerful than non-parametric tests
⚫ E.g., t-test, Pearson's r, ANOVA
Nonparametric Statistics
(Distribution-Free Statistics)

⚫ Do not estimate parameters


⚫ Involve variables measured on a nominal or
ordinal scale
⚫ Have less restrictive assumptions about the
shape of the variables’ distribution than
parametric tests
⚫ E.g., Mann-Whitney U test, Wilcoxon
signed-rank test, median test, chi-square test,
Fisher's exact test
Quick Guide to Bivariate Statistical Tests p. 592
t-Test
Tests the difference between two means
⚫ t-Test for independent groups
(between subjects)
⚫ t-Test for dependent groups
(within subjects)
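A minimal SciPy sketch of both t-test variants, with hypothetical scores; the pre/post lists stand in for paired measurements on the same subjects.

```python
from scipy import stats

# Hypothetical data for the two designs.
control   = [72, 75, 78, 80, 83]
treatment = [78, 81, 84, 86, 90]
pre  = [6.1, 5.8, 7.0, 6.4, 5.9]
post = [5.2, 5.0, 6.1, 5.9, 5.4]

print(stats.ttest_ind(control, treatment))  # independent groups (between subjects)
print(stats.ttest_rel(pre, post))           # dependent/paired groups (within subjects)
```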
Analysis of Variance (ANOVA)
⚫ Tests the difference between 3+ means
⚫ One-way ANOVA
⚫ Multifactor (e.g., two-way)
ANOVA
⚫ Repeated measures ANOVA
(within subjects)
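A one-way ANOVA comparing three hypothetical group means can be run the same way with SciPy.

```python
from scipy import stats

# Hypothetical scores for three independent groups.
group1 = [23, 25, 27, 30]
group2 = [31, 33, 35, 36]
group3 = [22, 24, 26, 29]

f_stat, p_value = stats.f_oneway(group1, group2, group3)  # one-way ANOVA on 3+ means
print(f_stat, p_value)
```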
Correlation
Pearson’s r, a parametric test
⚫ Tests the relationship between two
variables
⚫ Used when measures are on an interval or
ratio scale
Spearman’s Rho
⚫ Used when measures are on an ordinal scale
Correlation
● The results of the analysis provide two
pieces of information about the data
● the nature of the relationship
(positive or negative)
● the magnitude of the
relationship
● There is no indication of causality (which
variable influences the other) in the analysis
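Both coefficients are available in SciPy; the anxiety and pain scores below are purely illustrative.

```python
from scipy import stats

# Hypothetical interval-level measures on six subjects.
anxiety = [20, 25, 30, 35, 40, 45]
pain    = [3, 4, 4, 6, 7, 8]

r, p_r = stats.pearsonr(anxiety, pain)        # sign gives the nature, size gives the magnitude
rho, p_rho = stats.spearmanr(anxiety, pain)   # rank-based alternative for ordinal data
print(r, rho)
```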
Various Relationships Graphed
on Scatter Plots
Chi-Square Test (X²)
⚫ Tests the difference in proportions in
categories within a contingency table
⚫ Tests for differences between
frequencies expected if groups are
alike and frequencies actually
observed in the data.
⚫ Used with nominal data
⚫ A nonparametric test
Chi-Square Results
● Chi-square results will not tell you
which cells are different, only that the
observed proportions differ from those expected.
● Fisher's Exact Test: used if the data do
not meet the requirements of the chi-square
test (e.g., small expected cell counts)
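A sketch of the chi-square test on a hypothetical 2x2 contingency table, with Fisher's exact test as the small-sample alternative noted above.

```python
from scipy import stats

# Hypothetical 2x2 contingency table: rows = groups, columns = outcome (yes / no).
table = [[30, 10],
         [18, 22]]

chi2, p, dof, expected = stats.chi2_contingency(table)  # observed vs. expected frequencies
print(chi2, p, dof)

# Fisher's exact test, an option when expected cell counts are too small for chi-square.
odds_ratio, p_exact = stats.fisher_exact(table)
print(p_exact)
```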
Regression Analysis
●Used when one wishes to
predict the value of one variable
based on the value of another
variable.
● R² (coefficient of determination): the
proportion of variance in the outcome
explained by the predictor(s)
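A simple linear regression sketch with SciPy, using hypothetical predictor and outcome values; R² here is the squared correlation from the fitted line.

```python
from scipy import stats

# Hypothetical predictor (hours of exercise per week) and outcome (resting heart rate).
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [82, 79, 76, 74, 71, 69]

fit = stats.linregress(hours, heart_rate)
predicted = fit.intercept + fit.slope * 6   # predict the outcome for a new predictor value
r_squared = fit.rvalue ** 2                 # proportion of variance explained
print(predicted, r_squared)
```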
Cronbach's Alpha Coefficient
● Tests the internal consistency of a
measurement scale
● To what extent is the measure a true
reflection of subjects' responses?
● A reliability of 0.7 is the lowest acceptable alpha
● This means that roughly 70% of the variability
in the scores reflects true differences in what is
being measured rather than measurement error.
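SciPy has no built-in Cronbach's alpha, so this sketch computes it directly from the usual formula, α = k/(k-1) × (1 - Σ item variances / variance of total scores), on a hypothetical matrix of 5 respondents by 4 items.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    k = items.shape[1]                           # number of items in the scale
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses of 5 subjects to a 4-item scale.
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 5, 4],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5]])
print(round(cronbach_alpha(scores), 2))  # acceptable internal consistency if >= 0.70
```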
Phase 5: Interpretive Phase
⚫ Integrate and synthesize analyses
⚫ Interpreting statistical outcomes
⚫ Perform supplementary interpretive
analyses (e.g., power analysis)
Significant & Predicted Results
● in keeping with those predicted
by the researcher and support the
logical links developed by the
researcher between the
framework, questions, variables,
and measurement tools.
Nonsignificant Results (negative results)
Analysis showed no significant
differences or relationships
● Could be a true reflection of reality.
● If so, the researcher or the theory
used by the researcher to develop the
hypothesis is in error.
● Negative findings are an important
addition to the body of knowledge.
● Findings can have statistical
significance but not clinical
significance.
● Clinical Significance -Related to
practical importance of the findings
● ultimately a value judgment by:
● The patients and their families
● The clinician/researcher
● Society at large
Could be a Type II error, due to:
● Inappropriate methods
● Biased sample
● Small sample
● Internal validity problems
● Inadequate measurement
● Weak statistical measures
● Faulty analysis
Power Analysis
⚫ A method of reducing the risk of Type II errors
and estimating their occurrence
⚫ With power = .80, the risk of a Type II error (β) is
20%
⚫ Method is frequently used to estimate how large
a sample is needed to reliably test hypotheses
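A power analysis sketch using the statsmodels library, assuming an illustrative medium effect size of 0.5; it solves for the number of subjects per group needed for an independent t-test to reach power = .80 at α = .05.

```python
from statsmodels.stats.power import TTestIndPower

# Solve for sample size per group; power = .80 means the Type II error risk (beta) is 20%.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # roughly 64 per group under these assumptions
```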
Unexpected Results (questionable)
Relationships found between variables
that were not hypothesized and not
predicted from the framework being
used
● These findings can be useful in
● theory development
● modification of existing theory
● development of later studies
Significant & Not Predicted Results
● Opposite of those predicted
● May indicate flaws in the logic of
the researcher or of the theory
being tested
● If valid, are an important addition
to the body of knowledge
Mixed Results
● Most common outcome
of studies
● One variable may uphold predicted
characteristics while another does
not
● Two dependent measures of the same
variable may show opposite results
● May be due to methodology problems
● May indicate need to modify existing theory
Conclusions
● A synthesis of the findings using
● logical reasoning
● creative formation of a meaningful
whole from pieces of information
obtained through data analysis and
findings from previous studies
● receptivity to subtle clues in the data
● consideration of alternative
explanations of data
Implications
● The meanings of conclusions for the
body of nursing knowledge, theory,
and practice.
● Based on, but more specific than
conclusions.
● Provide specific suggestions for
implementing the findings.
Suggesting Further Studies
● Replications
● Different design
● Larger sample
● Hypotheses emerging from
findings
● Strategies to further test the
framework in use
THE END
