1 Data Collection Procedure Research Instrument and Interpretation of Data

Uploaded by

johnbenedictrago
Copyright
© All Rights Reserved

A research instrument is a tool used to collect, measure, and analyze data related to your research interests.

These tools are mostly used in health sciences, social sciences, and education to assess patients, clients, students, teachers, staff, etc.

A research instrument can include interviews, tests, surveys, or checklists.

The research instrument is usually determined by the researcher and is tied to the study methodology.
Collecting data is a major component of any type of research.

Underestimating its importance can produce data inaccurate enough to render your research study invalid.

Hence, in collecting quantitative data, stress is placed on the accuracy and appropriateness of your data-gathering technique, as well as on choosing the right instrument to collect the data.
Name the following pictures presented:
• Bar graphs should be used for categorical, ordered, and discrete variables. If the number of units in a discrete variable is large, it may be displayed as a continuous variable.
• Line graphs should be used for continuous variables.
• Pie graphs (sometimes called pie or circle charts) are used to show the parts that make up a whole. They can be useful for comparing the relative sizes of parts. Because it is difficult to compare different circle graphs, and often hard to compare the angles of different sectors of the pie, it is sometimes better to choose other sorts of graphs.
• Tables are generally used to present large amounts of exact values of qualitative or quantitative data, rather than to show trends or patterns. Tables can be used to summarize information from the Methods or Results. When preparing tables, keep in mind that they must be able to stand alone.
Types of Variable
The types of variables you have usually determine what type of statistical test
you can use.
Quantitative variables represent amounts of things (e.g. the number of trees in a
forest). Types of quantitative variables include:
• Continuous (aka ratio variables): represent measures and can usually be
divided into units smaller than one (e.g. 0.75 grams).
• Discrete (aka integer variables): represent counts and usually can’t be
divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:
• Ordinal: represent data with an order (e.g. rankings).
• Nominal: represent group names (e.g. brands or species names).
• Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).
Descriptive statistics:
• summarize the characteristics of a data set
• allow you to describe a data set

Using descriptive statistics, you can report characteristics of your data:
• The distribution concerns the frequency of each value.
• The central tendency concerns the averages of the
values.
• The variability concerns how spread out the values are.
• You collect data on the NAT scores of all 11th graders in a school for three years.
• You can use descriptive statistics to get a quick overview of the school’s scores in those years. You can then directly compare the mean NAT score with the mean scores of other schools.
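As a rough illustration of reporting distribution, central tendency, and variability, the sketch below summarizes a small list of scores using Python's standard library. The scores and the `describe` helper are hypothetical and are not taken from any real NAT results.

```python
from collections import Counter
import statistics

# Hypothetical scores for illustration only; not real NAT data.
nat_scores = [78, 85, 85, 90, 72, 85, 90, 68, 78, 85]

def describe(scores):
    """Summarize the distribution, central tendency, and variability of scores."""
    return {
        # Distribution: how often each value occurs, listed from low to high.
        "distribution": dict(sorted(Counter(scores).items())),
        # Central tendency: the arithmetic mean.
        "mean": statistics.mean(scores),
        # Variability: the range and the sample standard deviation.
        "range": max(scores) - min(scores),
        "stdev": statistics.stdev(scores),
    }

summary = describe(nat_scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function could be run on each school's scores so that the means and spreads can be compared side by side.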
Measures of central tendency help you find the middle, or the average, of a dataset. The 3 most common measures of central tendency are the mode, median, and mean.

• Mode: the most frequent value.
• Median: the middle number in an ordered dataset.
• Mean: the sum of all values divided by the total number of values.
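The three measures can be computed with Python's built-in `statistics` module. This is a minimal sketch on a made-up dataset; `multimode` is used so that datasets with several modes are reported in full.

```python
import statistics

# Made-up values for illustration only.
values = [2, 3, 3, 5, 7, 8, 3, 5]

modes = statistics.multimode(values)   # every most-frequent value, as a list
median = statistics.median(values)     # middle of the ordered values
mean = statistics.mean(values)         # sum of values / number of values

print("mode(s):", modes)
print("median:", median)
print("mean:", mean)
```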
A dataset is a distribution of n scores or values.
In a normal distribution, data is symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center. The mean, mode and median are exactly the same in a normal distribution.
In skewed distributions, more values fall on one side of the center than the other, and the mean, median and mode all differ from each other. One side has a longer, more spread-out tail with fewer scores than the other. The direction of this tail tells you the side of the skew.

In a positively skewed distribution, there’s a cluster of lower scores and a spread-out tail on the right. In a negatively skewed distribution, there’s a cluster of higher scores and a spread-out tail on the left.
[Histogram: positively skewed distribution] In this histogram, your distribution is skewed to the right, and the central tendency of your dataset is on the lower end of possible scores. In a positively skewed distribution, mode < median < mean.

[Histogram: negatively skewed distribution] In this histogram, your distribution is skewed to the left, and the central tendency of your dataset is towards the higher end of possible scores. In a negatively skewed distribution, mean < median < mode.
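The ordering of mode, median, and mean in skewed data can be checked on small examples. The two datasets below are invented to have a long right tail and a long left tail respectively. The ordering is a typical pattern rather than a guarantee, so real data may not follow it exactly.

```python
import statistics

def central_tendency(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )

# Invented data: most scores are low, with a long tail to the right.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Invented data: most scores are high, with a long tail to the left.
left_skewed = [1, 7, 11, 12, 13, 13, 14, 14, 14, 15]

right_mode, right_median, right_mean = central_tendency(right_skewed)
left_mode, left_median, left_mean = central_tendency(left_skewed)

print("positive skew:", right_mode, right_median, right_mean)
print("negative skew:", left_mode, left_median, left_mean)
```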
The mode is the most frequently occurring value in the dataset. It’s possible to
have no mode, one mode, or more than one mode.

To find the mode, sort your dataset numerically or categorically and select the
response that occurs most frequently.
The median of a dataset is the value that’s exactly in the middle when it is ordered from low to high.
For an odd-numbered dataset, find the value that lies at the (n + 1)/2 position, where n is the number of values in the dataset.
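Finding the median by position can be sketched as follows. For an odd number of values, the function picks the value at the 1-based position (n + 1)/2 in the ordered data. The text here only covers the odd case; for an even number of values the sketch assumes the usual convention of averaging the two middle values. The function name `median_by_position` is hypothetical.

```python
def median_by_position(values):
    """Return the median by locating the middle position in the ordered data."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)              # order from low to high
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2           # 1-based middle position
        return ordered[position - 1]      # convert to a 0-based index
    # Assumed convention for even n: average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median_by_position([7, 1, 5]))      # odd number of values
print(median_by_position([4, 1, 3, 2]))   # even number of values
```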
The arithmetic mean of a dataset (which is different from the geometric mean) is
the sum of all values divided by the total number of values. It’s the most commonly
used measure of central tendency because all values are used in the calculation.
The 3 main measures of central tendency are best used in combination
with each other because they have complementary strengths and
limitations. But sometimes only 1 or 2 of them are applicable to your
dataset, depending on the level of measurement of the variable.

• The mode can be used for any level of measurement, but it’s most meaningful for nominal and ordinal levels.
• The median can only be used on data that can be ordered – that is, from ordinal, interval and ratio levels of measurement.
• The mean can only be used on interval and ratio levels of measurement because it requires equal spacing between adjacent values or scores in the scale.
Inferential statistics help you draw conclusions and make predictions based on your data.

Inferential statistics have two main uses:
• making estimates about populations (for example, the mean NAT score of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the relationship between NAT scores and family income).
The characteristics of samples and populations are described by numbers called statistics and parameters:

• A statistic is a measure that describes the sample (e.g., sample mean).
• A parameter is a measure that describes the whole population (e.g., population mean).
Statistical tests come in three forms:
tests of (1) comparison, (2) correlation or
(3) regression.
Comparison tests assess whether there are differences in means, medians or rankings of scores of two or more groups.

To decide which test suits your aim, consider whether your data meets the conditions necessary for parametric tests, the number of samples, and the levels of measurement of your variables.

Means can only be found for interval or ratio data, while medians and rankings are more appropriate measures for ordinal data.
T Test
• A t test is a statistical test that is used to compare the means of
two groups. It is often used in hypothesis testing to determine
whether a process or treatment actually has an effect on the
population of interest, or whether two groups are different from
one another.
• When choosing a t test, you will need to consider two things:
whether the groups being compared come from a single population
or two different populations, and whether you want to test the
difference in a specific direction.
• If the groups come from a single population (e.g., measuring
before and after an experimental treatment), perform a paired t
test. This is a within-subjects design.
• If the groups come from two different populations (e.g., two
different species, or people from two separate cities), perform a
two-sample t test (a.k.a. independent t test). This is a between-
subjects design.
• If there is one group being compared against a standard value
(e.g., comparing the acidity of a liquid to a neutral pH of 7),
perform a one-sample t test.
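The three kinds of t test differ mainly in how the t statistic is formed. The sketch below computes only the t statistic for each case using the standard library. Turning it into a p-value needs the t distribution, which is left out here. The function names and sample numbers are hypothetical, and the two-sample version assumes Welch's form, which does not require equal variances.

```python
import math
import statistics

def one_sample_t(sample, standard_value):
    """t statistic for one group compared against a standard value."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - standard_value) / standard_error

def paired_t(before, after):
    """t statistic for paired measurements (within-subjects design)."""
    differences = [a - b for a, b in zip(after, before)]
    # A paired test is a one-sample test on the differences against zero.
    return one_sample_t(differences, 0.0)

def two_sample_t(group_a, group_b):
    """Welch's t statistic for two independent groups (between-subjects design)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

# Hypothetical measurements for illustration only.
print(one_sample_t([6.8, 7.1, 7.3, 6.9, 7.4], 7.0))   # pH compared with neutral 7
print(paired_t([10, 12, 14], [11, 14, 17]))           # before vs. after treatment
print(two_sample_t([1, 2, 3], [4, 5, 6]))             # two separate groups
```

A larger absolute t value means the observed difference is large relative to the variation in the data.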
One-tailed or two-tailed t test
• If you only care whether the two populations are
different from one another, perform a two-tailed t
test.
• If you want to know whether one population mean is
greater than or less than the other, perform a one-
tailed t test.
ANOVA
• ANOVA, which stands for Analysis of Variance, is a statistical
test used to analyze the difference between the means of more
than two groups.
• A one-way ANOVA uses one independent variable, while a two-way
ANOVA uses two independent variables.
• Use a one-way ANOVA when you have collected data about one
categorical independent variable and one quantitative dependent
variable. The independent variable should have at least three
levels (i.e. at least three different groups or categories).
• ANOVA tells you if the dependent variable changes according to
the level of the independent variable. For example:
• Your independent variable is social media use, and you assign groups to
low, medium, and high levels of social media use to find out if there is a
difference in hours of sleep per night.
• Your independent variable is brand of soda, and you collect data on
Coke, Pepsi, and Sprite to find out if there is a difference in the price
per 100ml.
• Your independent variable is type of fertilizer, and you treat crop
fields with mixtures 1, 2 and 3 to find out if there is a difference in
crop yield.
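A one-way ANOVA compares the variation between group means with the variation within groups. The sketch below computes the F statistic by hand for three groups. The sleep-hours figures are invented for the social media example, and `one_way_anova_f` is a hypothetical helper. Looking up the p-value for F is not shown.

```python
import statistics

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    group_count = len(groups)
    total_count = len(all_values)

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Within-group sum of squares: how far each value is from its own group mean.
    ss_within = sum(
        (value - statistics.mean(group)) ** 2 for group in groups for value in group
    )

    mean_square_between = ss_between / (group_count - 1)
    mean_square_within = ss_within / (total_count - group_count)
    return mean_square_between / mean_square_within

# Invented hours of sleep per night for low, medium, and high social media use.
low_use = [7, 8, 9]
medium_use = [6, 7, 8]
high_use = [4, 5, 6]

f_statistic = one_way_anova_f([low_use, medium_use, high_use])
print("F =", f_statistic)
```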
Correlation tests determine the extent to which two variables are associated.

Although Pearson’s r is the most statistically powerful test, Spearman’s rho is appropriate for interval and ratio variables when the data doesn’t follow a normal distribution.

Of these tests, the chi-square test of independence is the only one that can be used with nominal variables.
Pearson Correlation Coefficient (r)
• The Pearson correlation coefficient (r) is the most common way
of measuring a linear correlation. It is a number between –1 and 1
that measures the strength and direction of the relationship
between two variables.
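Pearson's r can be computed directly from its definition: the covariation of the two variables divided by the product of their spreads. The sketch below is a plain-Python version. The helper name `pearson_r` and the sample values are illustrative only.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (numerator) and sums of squared deviations.
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return covariation / math.sqrt(ss_x * ss_y)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive relationship
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfect negative relationship
```

Values near 1 or –1 indicate a strong linear relationship, and values near 0 indicate a weak one.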
Spearman’s Rho
• Spearman’s Rho is used to understand the strength of the
relationship between two variables. Your variables of interest can
be continuous or ordinal and should have a monotonic relationship.
• Every statistical method has assumptions: properties your data must satisfy for the method’s results to be accurate. For Spearman’s Rho, the variables must be (1) continuous or ordinal and (2) monotonically related.
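One common way to compute Spearman's Rho is to replace each value by its rank and then calculate Pearson's r on the ranks. The sketch below takes that approach and gives tied values their average rank. All function names and data are illustrative assumptions.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1   # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(ss_x * ss_y)

def spearman_rho(x, y):
    """Spearman's Rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relationship (y is x squared).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

Because the relationship is monotonic but curved, Rho reaches 1 while Pearson's r stays below 1.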
Chi-square Test
• A Pearson’s chi-square test is a statistical test for
categorical data. It is used to determine whether your
data are significantly different from what you expected.
Chi-square Test of Independence
• You can use a chi-square test of independence when you have two
categorical variables. It allows you to test whether the two
variables are related to each other. If two variables are
independent (unrelated), the probability of belonging to a certain
group of one variable isn’t affected by the other variable.
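The test of independence compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the chi-square statistic and degrees of freedom only. The tables and the function name are made up, and the p-value lookup is omitted.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom

# Made-up counts: rows are groups of one variable, columns groups of the other.
related_table = [[10, 20], [20, 10]]
unrelated_table = [[10, 20], [20, 40]]

print(chi_square_independence(related_table))
print(chi_square_independence(unrelated_table))
```

In the second table every row has the same proportions, so the observed counts match the expected counts and the statistic is zero.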
Chi-square goodness of fit test
• You can use a chi-square goodness of fit test when you have one
categorical variable. It allows you to test whether the frequency
distribution of the categorical variable is significantly different
from your expectations. Often, but not always, the expectation is
that the categories will have equal proportions.
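For a goodness of fit test, the observed counts of one categorical variable are compared with the counts expected under some stated proportions. The sketch below defaults to equal proportions when none are given. The counts and the function name are hypothetical, and only the statistic and degrees of freedom are computed.

```python
def chi_square_goodness_of_fit(observed, expected_proportions=None):
    """Return (chi-square statistic, degrees of freedom) for observed counts."""
    category_count = len(observed)
    total = sum(observed)
    if expected_proportions is None:
        # Default expectation: every category is equally likely.
        expected_proportions = [1 / category_count] * category_count
    expected = [proportion * total for proportion in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, category_count - 1

# Hypothetical counts for three categories.
counts = [20, 30, 50]
print(chi_square_goodness_of_fit(counts))                    # equal proportions
print(chi_square_goodness_of_fit(counts, [0.2, 0.3, 0.5]))   # custom expectation
```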
Regression tests assess whether changes in predictor variables predict changes in an outcome variable. You can decide which regression test to use based on the number and types of variables you have as predictors and outcomes.

Most of the commonly used regression tests are parametric. If your data is not normally distributed, you can perform data transformations.

Data transformations help you bring your data closer to a normal distribution using mathematical operations, like taking the square root of each value.
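As a small illustration of a square-root transformation, the sketch below applies it to invented right-skewed values and compares the gap between mean and median before and after. A smaller gap suggests less skew, but it does not by itself show that the transformed data is normally distributed. The transformation only applies to non-negative values.

```python
import math
import statistics

def sqrt_transform(values):
    """Take the square root of each value (values must be non-negative)."""
    return [math.sqrt(value) for value in values]

# Invented right-skewed data: one value is far larger than the rest.
raw = [1, 4, 9, 16, 100]
transformed = sqrt_transform(raw)

raw_gap = statistics.mean(raw) - statistics.median(raw)
transformed_gap = statistics.mean(transformed) - statistics.median(transformed)

print("raw:", raw, "mean - median =", raw_gap)
print("transformed:", transformed, "mean - median =", transformed_gap)
```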
Simple Linear Regression
• Simple linear regression is used to estimate the
relationship between two quantitative variables. You can
use simple linear regression when you want to know:
• How strong the relationship is between two variables
(e.g., the relationship between rainfall and soil erosion).
• The value of the dependent variable at a certain value
of the independent variable (e.g., the amount of soil
erosion at a certain level of rainfall).
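A simple linear regression fits a straight line, outcome = intercept + slope × predictor, by least squares. The sketch below estimates the slope and intercept and then predicts the dependent variable at a chosen value of the independent variable. The rainfall and erosion numbers are invented, and the function names are illustrative.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    slope = covariation / ss_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_value):
    """Predicted value of the dependent variable at x_value."""
    return intercept + slope * x_value

# Invented data: rainfall (independent) and soil erosion (dependent).
rainfall = [1, 2, 3, 4]
erosion = [3, 5, 7, 9]

intercept, slope = fit_simple_regression(rainfall, erosion)
print("intercept:", intercept, "slope:", slope)
print("predicted erosion at rainfall 5:", predict(intercept, slope, 5))
```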
Multiple Linear Regression
• Multiple linear regression is used to estimate the relationship
between two or more independent variables and one dependent
variable. You can use multiple linear regression when you want to
know:
• How strong the relationship is between two or more
independent variables and one dependent variable (e.g. how
rainfall, temperature, and amount of fertilizer added affect
crop growth).
• The value of the dependent variable at a certain value of the
independent variables (e.g. the expected yield of a crop at
certain levels of rainfall, temperature, and fertilizer
addition).
Multiple Linear Regression
• Suppose we fit a multiple linear regression model using the
predictor variables hours studied and prep exams taken and a
response variable exam score.
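A model like this can be sketched as exam score = b0 + b1 × hours studied + b2 × prep exams taken, with the coefficients found by least squares. The code below solves the normal equations with a small Gaussian elimination routine so that it needs no external libraries. The data is invented (it was generated from b0 = 60, b1 = 5, b2 = 2) and the function names are hypothetical. In practice a statistics package would normally be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - known) / aug[r][r]
    return solution

def fit_multiple_regression(predictor_rows, outcomes):
    """Return [intercept, b1, b2, ...] from the least-squares normal equations."""
    rows = [[1.0] + [float(v) for v in row] for row in predictor_rows]
    width = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(width)] for i in range(width)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcomes)) for i in range(width)]
    return solve_linear_system(xtx, xty)

def predict(coefficients, predictors):
    """Predicted outcome for one set of predictor values."""
    return coefficients[0] + sum(b * v for b, v in zip(coefficients[1:], predictors))

# Invented data: (hours studied, prep exams taken) and the resulting exam score.
study_data = [(1, 1), (2, 0), (3, 2), (4, 1), (5, 3)]
exam_scores = [67, 70, 79, 82, 91]

coefficients = fit_multiple_regression(study_data, exam_scores)
print("intercept, hours, prep exams:", coefficients)
print("predicted score for 6 hours and 2 prep exams:", predict(coefficients, (6, 2)))
```

Each slope is read as the expected change in exam score for a one-unit change in that predictor while the other predictor is held constant.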