Data Analysis

This document provides an overview of key concepts in data analysis, including coding data, data cleaning, handling missing values, principal analyses, and levels of measurement. It also discusses descriptive versus inferential statistics. Descriptive statistics are used to summarize and organize data through methods like frequency distributions and measures of central tendency and dispersion. Inferential statistics allow inferences about populations based on samples through hypothesis testing, where statistical significance is determined by comparing a test statistic to critical values.
Assoc. Prof. Theresa A. Guino-o


Coding Data
⚫ The process of transforming data into symbols
compatible with computer analysis—usually
numeric values
⚫ Types of coding:
⚫ Inherently quantitative variables (e.g., weight)
⚫ Precoded data (e.g., yes/no)
⚫ Uncategorized data (e.g., open-ended
questions)
⚫ Missing values (e.g., refusals, don’t knows)
Data Cleaning
⚫ Includes checks for unusual values
⚫ Outliers (unusual/extreme values)
⚫ Wild codes (impossible values)
⚫ Includes consistency checks—internal consistency
of data within a case (e.g., any pregnant males?)
Missing Values Problems
Solutions (a short code sketch of these options follows this list):
⚫ Delete cases with missing values (listwise deletion)
⚫ Pairwise deletion: selective deletion of cases with
missing values, analysis by analysis
⚫ Delete the variable
⚫ Estimate the missing value (e.g., through regression)
Other data preparation tasks:
⚫ Performing item reversals
⚫ Constructing scales
⚫ Performing counts
⚫ Recoding variables
⚫ Meeting statistical assumptions

⚫ E.g., p. 420 scale
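The options above can be sketched in a few lines of Python using pandas. This is a minimal illustration, assuming a small hypothetical data set with made-up column names ("age" and "score"); mean imputation stands in for the more elaborate regression-based estimation.

```python
import pandas as pd

# Hypothetical survey data; "age" and "score" are illustrative column names.
df = pd.DataFrame({"age": [34, None, 52, 61], "score": [3, 4, None, 5]})

# Listwise deletion: drop any case that has a missing value on any variable.
listwise = df.dropna()

# Pairwise deletion happens analysis by analysis: each statistic uses only the
# cases that are complete for the variables involved (pandas does this by default).
pairwise_r = df["age"].corr(df["score"])

# Estimating missing values: mean imputation shown here; a regression-based
# estimate would replace the mean with a prediction from a fitted model.
imputed = df.fillna(df.mean(numeric_only=True))

print(listwise, pairwise_r, imputed, sep="\n")
```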


Principal Analyses
⚫ Consider the hypotheses/questions
⚫ Perform descriptive statistical analyses
⚫ Perform bivariate statistical analyses
⚫ Perform multivariate analyses
Levels of Measurement
(Chapter 19 – page 451-454)

⚫ Analyses that can be performed on data
depend on their measurement level

⚫ 4 levels of measurement:
⚫ Nominal
⚫ Ordinal
⚫ Interval
⚫ Ratio
Nominal Measurement
⚫ lowest level
⚫ involves assigning numbers to classify
characteristics into categories
⚫ numbers are merely symbols that represent
different values
⚫ categories must be mutually exclusive and
collectively exhaustive.
⚫ e.g., male (1), female (2)
Ordinal Measurement
⚫ Rank orders phenomena along some
dimension
⚫ involves sorting objects on the basis of their
relative standing or ranking on an attribute
⚫ Ex.
⚫ 1= low
⚫ 2=medium
⚫ 3= high
Interval Measurement
⚫ A measurement in which an attribute of a
variable is rank ordered on a scale that has
equal distances between points on that
scale.
⚫ No true (absolute) zero point
⚫ EX. Fahrenheit degrees
⚫ E.g. self-expectancy scale
Ratio Scale
⚫ A quantitative measurement in which
intervals are equal and there is a true zero
point.
⚫ The highest level of measurement
⚫ All arithmetic operations are permissible with
this measurement (add, subtract, multiply,
and divide numbers on this scale).
⚫ E.g., patient's weight, number of days in ICU,
height
Descriptive vs. Inferential
Statistics
Descriptive Statistics
⚫ Descriptive Statistics are used to present
quantitative descriptions in a manageable form.
⚫ This method works by reducing lots of data into a
simpler summary.
⚫ Organizes the data for easier understanding
Examples of Descriptive Statistics
⚫ Frequency Distribution-used to group data
⚫ Percentages
⚫ Central Tendencies (Mean, Median & Mode)
⚫ Measures of Dispersion
(Range, Difference scores, Sum of squares,
variance, Standard Deviation)
This is the examination across cases of one
variable at a time (UNIVARIATE analysis).
Described through a table or a graph
(histogram, bar chart)
A Frequency Distribution Table

Table 1 Age Distribution of Respondents

Age Category   Percent (%)
Under 35         9
36-45           21
46-55           45
56-65           19
66+              6
Total          100
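A frequency and percentage distribution like Table 1 can be built with pandas. This is a small sketch with hypothetical ages; the bin edges simply mirror the table's categories.

```python
import pandas as pd

# Hypothetical respondent ages, grouped into the categories used in Table 1.
ages = pd.Series([32, 40, 44, 47, 50, 52, 53, 58, 61, 67])

bins = [0, 35, 45, 55, 65, 120]
labels = ["Under 35", "36-45", "46-55", "56-65", "66+"]
groups = pd.cut(ages, bins=bins, labels=labels)

freq = groups.value_counts().sort_index()      # frequency distribution
percent = (freq / freq.sum() * 100).round(1)   # percentage distribution
print(pd.DataFrame({"Frequency": freq, "Percent (%)": percent}))
```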
Frequency Distribution –
Ungrouped (Bar graph)

1st Yr: 2   2nd Yr: 1   3rd Yr: 3   4th Yr: 5   Grad: 1

Fig. 1 Distribution of Respondents According to Year Levels in School
Frequency Distribution –
Grouped (histogram)
Ages 20-39 - 14
Ages 40-59 - 43
Ages 60-79 - 26
Ages 80-100 - 4

Fig. 2 Distribution of Respondents by Age Range


Percentage Distribution
Salaries 41.7%
Maintenance 8.3%
Equipment 16.7%
Fixed costs 8.3%
Supplies 25.0%

Fig. 3 Distribution of Company Expense Allocations


Measures of Central Tendency
⚫ An estimate of the “center” of a distribution
⚫ Three different types of estimates:
⚫ Mean
⚫ Median
⚫ Mode
Mean
● The sum of values divided by the number of
values being summed.
● The mean may not be a value in the data set.
● Appropriate for interval and ratio measures
● The numbers need not be arranged from highest
to lowest
● Example data set (n = 50):
1 1 1 1 3 3 3 3 3 3 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5 5 5 5 5
7 7 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9
● Mean = 264 / 50 = 5.28
Median
⚫ The median is the score found at the exact
middle of the ordered set.
⚫ When the number of values is even, the median
may not be an actual value in the data set (it is
the average of the two middle values).
⚫ One must list all scores in numerical order
and then locate the score in the center of the
sample.
⚫ Example: If there are 500 scores in the list, the
median is the average of scores #250 and #251.
⚫ The median is useful because it is not distorted
by outliers.
⚫ Appropriate for ordinal measures
Mode
⚫ The mode is the most repeated score in the
set of results; it may not always be at the
center.
⚫ Let's take the set of scores: 15, 20, 21, 20, 36, 15,
25, 15
⚫ Lined up: 15, 15, 15, 20, 20, 21, 25, 36
⚫ 15 is the most repeated score and is therefore
labeled the mode.
⚫ Appropriate for nominal measures
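As a check on the three measures, here is a minimal sketch using Python's standard statistics module; the 50 scores reproduce the data set from the Mean slide, and the last line uses the scores from the Mode example.

```python
from statistics import mean, median, mode

# The 50-score data set from the Mean slide:
# four 1s, six 3s, eight 4s, fourteen 5s, eight 7s, six 8s, four 9s.
scores = [1]*4 + [3]*6 + [4]*8 + [5]*14 + [7]*8 + [8]*6 + [9]*4

print(mean(scores))    # 264 / 50 = 5.28
print(median(scores))  # middle of the ordered scores -> 5
print(mode(scores))    # most frequent score -> 5 (occurs 14 times)

print(mode([15, 20, 21, 20, 36, 15, 25, 15]))  # Mode slide example -> 15
```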
Measures of Dispersion
Range
● Obtained by subtracting the lowest score from the
highest score.
● Uses only the two extreme scores.
● Very crude measure and sensitive to outliers.
● Example (same 50-score data set as the Mean slide): 9 - 1 = 8
Standard Deviation
● The square root of the variance.
● The standard deviation is the "average" difference
score.
● Example (same data set): SD = 2.22
⚫ The standard deviation is a value
that shows the relation that individual
scores have to the mean of the sample.
⚫ If scores are normally distributed,
one can assume that
approximately 68% of the scores in the
sample fall within one standard deviation
of the mean and 95% of the scores fall within
two standard deviations of the mean.
⚫ Can use different parametric tests for
analysis
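The range and standard deviation for the same 50-score data set can be verified in a couple of lines; note that the slide's value of 2.22 corresponds to the sample (n - 1) standard deviation.

```python
import statistics

# Same 50-score data set as in the central tendency examples.
scores = [1]*4 + [3]*6 + [4]*8 + [5]*14 + [7]*8 + [8]*6 + [9]*4

value_range = max(scores) - min(scores)   # highest minus lowest: 9 - 1 = 8
sd = statistics.stdev(scores)             # sample SD = square root of the variance
print(value_range, round(sd, 2))          # 8 2.22
```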
Standard Deviations in a
Normal Distribution
Inferential Statistics
⚫ Based on the laws of probability
⚫ permit inferences on whether relationships
observed in a sample are likely to occur in a
larger population
⚫ It estimates population parameters from
sample statistics
⚫ Used in hypothesis testing
⚫ Based on rules of negative inference:
research hypotheses are supported if null
hypotheses can be rejected
What is bivariate data analysis?
●Comparison of summary
values from two groups on
the same variable or of two
variables within one group
Level of Significance: the risk of
making a Type I error
⚫ With a .05 significance level, we are accepting
the risk that out of 100 samples drawn from
a population, a true null hypothesis would
be rejected only 5 times.

⚫ With a .01 level of significance, the risk of a
Type I error is lower: in only 1 sample out of
100 would we erroneously reject the null
hypothesis.
Overview of Hypothesis-Testing
Procedures
1. Select an appropriate test statistic
2. Establish the level of significance (e.g.,
α = .05)
3. Select a one-tailed or a two-tailed test
4. Compute test statistic with actual data
5. Calculate degrees of freedom (df) for
the test statistic (the number of sample
values free to vary from the mean)
Overview of Hypothesis-Testing
Procedures (cont’d)
6. Obtain a tabled value for the statistical test
or obtain a p-value from the computer-
generated results (the computed probability of
a Type I error)
7. Determine the statistical significance of
results
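As a sketch of these steps, the following runs an independent-groups t-test in SciPy on two small hypothetical groups; the test is two-tailed by default, and the computed p-value is compared with α to decide significance.

```python
from scipy import stats

# Hypothetical scores from two independent groups.
group_a = [12, 14, 15, 16, 18, 19, 21]
group_b = [10, 11, 11, 13, 14, 15, 16]

alpha = 0.05                                          # step 2: level of significance
t_stat, p_value = stats.ttest_ind(group_a, group_b)  # steps 1 and 4: compute the test statistic
df = len(group_a) + len(group_b) - 2                  # step 5: degrees of freedom
print(t_stat, df, p_value)

# Step 7: the result is statistically significant if p < alpha.
print("reject the null hypothesis" if p_value < alpha else "retain the null hypothesis")
```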
Statistical significance
⚫ Means that the obtained results are not likely
to have been the result of chance.
⚫ If the absolute value or computed value of the
test statistic is larger than the tabled value, the
results are statistically significant.
⚫ OR if the p value is smaller than (i.e., more
stringent than) the α value (level of
significance), the results are statistically
significant.
⚫ DECISION: Null hypothesis is rejected
Non-significance
A NON -SIGNIFICANT RESULT means that
any observed difference or relationship
could have resulted from chance
fluctuations.
Null hypothesis is accepted
In one-tailed tests, the critical region of
improbable values is entirely in one tail of the
distribution (the tail corresponding to the
direction of the hypothesis).
Two-tailed test- this means that both ends or tails
of the sampling distribution are used to
determine improbable values.
Statistical Decisions are Either
Correct or Incorrect
Two types of incorrect decisions:
⚫ Type I error: a null hypothesis is rejected
when there is actually no relationship between
the variables
⚫ Risk of a Type I error is controlled by the
level of significance (alpha), e.g., α = .01
or .05
⚫ Type II error: a null hypothesis is accepted
when there is actually a relationship between
the variables
⚫ Risk of a Type II error is controlled by
increasing the sample size and power
Parametric Statistics
⚫ Involve the estimation of a parameter
⚫ Require measurements on an interval scale
or ratio scale
⚫ Involve several assumptions (e.g., that
variables are normally distributed in the
population)
⚫ More powerful than non-parametric tests
⚫ E.g., t-test, Pearson's r, ANOVA
Nonparametric Statistics
(Distribution-Free Statistics)

⚫ Do not estimate parameters


⚫ Involve variables measured on a nominal or
ordinal scale
⚫ Have less restrictive assumptions about the
shape of the variables’ distribution than
parametric tests
⚫ E.g., Mann-Whitney U test, Wilcoxon
signed-rank test, median test, chi-square test,
Fisher's exact test
Quick Guide to Bivariate Statistical Tests p. 592
t-Test
Tests the difference between two means
⚫ t-Test for independent groups
(between subjects)
⚫ t-Test for dependent groups
(within subjects)
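A minimal SciPy sketch of both t-test variants, with hypothetical scores; the pre/post lists stand in for paired measurements on the same subjects.

```python
from scipy import stats

# Hypothetical data for the two designs.
control   = [72, 75, 78, 80, 83]
treatment = [78, 81, 84, 86, 90]
pre  = [6.1, 5.8, 7.0, 6.4, 5.9]
post = [5.2, 5.0, 6.1, 5.9, 5.4]

print(stats.ttest_ind(control, treatment))  # independent groups (between subjects)
print(stats.ttest_rel(pre, post))           # dependent/paired groups (within subjects)
```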
Analysis of Variance (ANOVA)
⚫ Tests the difference between 3+ means
⚫ One-way ANOVA
⚫ Multifactor (e.g., two-way)
ANOVA
⚫ Repeated measures ANOVA
(within subjects)
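A one-way ANOVA comparing three hypothetical group means can be run the same way with SciPy.

```python
from scipy import stats

# Hypothetical scores for three independent groups.
group1 = [23, 25, 27, 30]
group2 = [31, 33, 35, 36]
group3 = [22, 24, 26, 29]

f_stat, p_value = stats.f_oneway(group1, group2, group3)  # one-way ANOVA on 3+ means
print(f_stat, p_value)
```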
Correlation
Pearson’s r, a parametric test
⚫ Tests the relationship between two
variables
⚫ Used when measures are on an interval or
ratio scale
Spearman’s Rho
⚫ Used when measures are on an ordinal scale
Correlation
● The results of the analysis provide two
pieces of information about the data
● the nature of the relationship
(positive or negative)
● the magnitude of the
relationship
● There is no indication of causality (which
variable influences the other) in the analysis
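Both coefficients are available in SciPy; the anxiety and pain scores below are purely illustrative.

```python
from scipy import stats

# Hypothetical interval-level measures on six subjects.
anxiety = [20, 25, 30, 35, 40, 45]
pain    = [3, 4, 4, 6, 7, 8]

r, p_r = stats.pearsonr(anxiety, pain)        # sign gives the nature, size gives the magnitude
rho, p_rho = stats.spearmanr(anxiety, pain)   # rank-based alternative for ordinal data
print(r, rho)
```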
Various Relationships Graphed
on Scatter Plots
Chi-Square Test (X²)
⚫ Tests the difference in proportions in
categories within a contingency table
⚫ Tests for differences between
frequencies expected if groups are
alike and frequencies actually
observed in the data.
⚫ Used with nominal data
⚫ A nonparametric test
Chi-Square Results
● Chi-square results will not tell you
which cells are different, only that the
observed proportions differ from those expected.
● Fisher's Exact Test: used if the data do
not meet the requirements of the chi-square
test (e.g., small expected cell counts)
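A sketch of the chi-square test on a hypothetical 2x2 contingency table, with Fisher's exact test as the small-sample alternative noted above.

```python
from scipy import stats

# Hypothetical 2x2 contingency table: rows = groups, columns = outcome (yes / no).
table = [[30, 10],
         [18, 22]]

chi2, p, dof, expected = stats.chi2_contingency(table)  # observed vs. expected frequencies
print(chi2, p, dof)

# Fisher's exact test, an option when expected cell counts are too small for chi-square.
odds_ratio, p_exact = stats.fisher_exact(table)
print(p_exact)
```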
Regression Analysis
●Used when one wishes to
predict the value of one variable
based on the value of another
variable.
● R² (coefficient of determination): the
proportion of variance in the outcome
explained by the predictor(s)
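A simple linear regression sketch with SciPy, using hypothetical predictor and outcome values; R² here is the squared correlation from the fitted line.

```python
from scipy import stats

# Hypothetical predictor (hours of exercise per week) and outcome (resting heart rate).
hours = [0, 1, 2, 3, 4, 5]
heart_rate = [82, 79, 76, 74, 71, 69]

fit = stats.linregress(hours, heart_rate)
predicted = fit.intercept + fit.slope * 6   # predict the outcome for a new predictor value
r_squared = fit.rvalue ** 2                 # proportion of variance explained
print(predicted, r_squared)
```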
Cronbach's Alpha Coefficient
● Tests the internal consistency of a
measurement scale
● To what extent is the measure a true
reflection of subjects' responses?
● A reliability of 0.7 is the lowest acceptable alpha
● This means that roughly 70% of the variability
in the scores reflects true differences in what is
being measured rather than measurement error.
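SciPy has no built-in Cronbach's alpha, so this sketch computes it directly from the usual formula, α = k/(k-1) × (1 - Σ item variances / variance of total scores), on a hypothetical matrix of 5 respondents by 4 items.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) matrix of scores."""
    k = items.shape[1]                           # number of items in the scale
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical responses of 5 subjects to a 4-item scale.
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 5, 4],
                   [3, 3, 4, 3],
                   [5, 4, 5, 5]])
print(round(cronbach_alpha(scores), 2))  # acceptable internal consistency if >= 0.70
```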
Phase 5: Interpretive Phase
⚫ Integrate and synthesize analyses
⚫ Interpreting statistical outcomes
⚫ Perform supplementary interpretive
analyses (e.g., power analysis)
Significant & Predicted Results
● in keeping with those predicted
by the researcher and support the
logical links developed by the
researcher between the
framework, questions, variables,
and measurement tools.
Nonsignificant Results (negative results)
Analysis showed no significant
differences or relationships
● Could be a true reflection of reality.
● If so, the researcher or the theory
used by the researcher to develop the
hypothesis is in error.
● Negative findings are an important
addition to the body of knowledge.
● Findings can have statistical
significance but not clinical
significance.
● Clinical Significance -Related to
practical importance of the findings
● ultimately a value judgment by:
● The patients and their families
● The clinician/researcher
● Society at large
Could be a Type II error, due to:
● Inappropriate methods
● Biased sample
● Small sample
● Internal validity problems
● Inadequate measurement
● Weak statistical measures
● Faulty analysis
Power Analysis
⚫ A method of reducing the risk of Type II errors
and estimating their occurrence
⚫ With power = .80, the risk of a Type II error (β) is
20%
⚫ Method is frequently used to estimate how large
a sample is needed to reliably test hypotheses
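A power analysis sketch using the statsmodels library, assuming an illustrative medium effect size of 0.5; it solves for the number of subjects per group needed for an independent t-test to reach power = .80 at α = .05.

```python
from statsmodels.stats.power import TTestIndPower

# Solve for sample size per group; power = .80 means the Type II error risk (beta) is 20%.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # roughly 64 per group under these assumptions
```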
Unexpected Results (questionable)
Relationships found between variables
that were not hypothesized and not
predicted from the framework being
used
● These findings can be useful in
● theory development
● modification of existing theory
● development of later studies
Significant & Not Predicted Results
● Opposite of those predicted
● May indicate flaws in the logic of
the researcher or of the theory
being tested
● If valid, are an important addition
to the body of knowledge
Mixed Results
● Most common outcome
of studies
● One variable may uphold predicted
characteristics while another does
not
● Two dependent measures of the same
variable may show opposite results
● May be due to methodology problems
● May indicate need to modify existing theory
Conclusions
● A synthesis of the findings using
● logical reasoning
● creative formation of a meaningful
whole from pieces of information
obtained through data analysis and
findings from previous studies
● receptivity to subtle clues in the data
● consideration of alternative
explanations of data
Implications
● The meanings of conclusions for the
body of nursing knowledge, theory,
and practice.
● Based on, but more specific than
conclusions.
● Provide specific suggestions for
implementing the findings.
Suggesting Further Studies
● Replications
● Different design
● Larger sample
● Hypotheses emerging from
findings
● Strategies to further test the
framework in use
THE END
