Research 04

The document provides an overview of data processing and testing concepts including measurement scales, tabulation, data analysis, correlation, regression, and hypothesis testing. It explains different measurement scales (nominal, ordinal, interval, ratio), the organization of data using coding sheets, and the types of statistical tests (parametric and non-parametric). Additionally, it details the hypothesis testing process, types of errors (Type-I and Type-II), and methods for testing means and proportions for both small and large samples.


Data Processing and Testing

Let's break down the key concepts related to Measurement Scales, Tabulation (Coding
Sheet), Data Analysis, Correlation & Regression, and Parametric & Non-Parametric Tests.
1. Measurement Scales
Measurement scales define how variables are measured and categorized. They determine
the types of analyses that can be performed.
 Nominal Scale: This is the most basic scale of measurement. It categorizes data
without a specific order or hierarchy.
o Example: Gender (Male, Female), Eye color (Blue, Green, Brown).
 Ordinal Scale: This scale ranks data in order but the intervals between the ranks are
not necessarily equal.
o Example: Educational level (High School, College, Postgraduate), Survey
ratings (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree).
 Interval Scale: Data is ordered, and the intervals between values are equal. However,
there is no true zero.
o Example: Temperature in Celsius or Fahrenheit.
 Ratio Scale: This is the most informative scale, which has a true zero and equal
intervals.
o Example: Height, Weight, Age, Income.
2. Tabulation (Coding Sheet)
Tabulation involves organizing data into tables for easier analysis. A coding sheet is used to
record responses and systematically organize raw data, particularly for survey or
questionnaire results.
 A coding sheet typically includes:
o Variables: The data points to be collected (e.g., age, gender, income).
o Codes: Numeric or categorical codes used to represent responses (e.g., "1"
for male, "2" for female).
o Respondent IDs: Unique identifiers for each participant or observation.
Here’s an example layout of a basic coding sheet:

Respondent ID | Age | Gender (1=Male, 2=Female) | Income (in thousands)
001 | 25 | 1 | 40
002 | 30 | 2 | 55
003 | 22 | 1 | 30
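A coding sheet translates directly into a data structure. Here is a minimal sketch in Python (the column names, codes, and decoding step mirror the example table above; the `gender_label` field is an illustrative addition, not part of the original sheet):

```python
# Each row of the coding sheet becomes one record; codes follow the
# scheme in the table above (Gender: 1 = Male, 2 = Female).
coding_sheet = [
    {"respondent_id": "001", "age": 25, "gender": 1, "income_k": 40},
    {"respondent_id": "002", "age": 30, "gender": 2, "income_k": 55},
    {"respondent_id": "003", "age": 22, "gender": 1, "income_k": 30},
]

# Decode the numeric gender codes back to labels for reporting.
GENDER_LABELS = {1: "Male", 2: "Female"}
for row in coding_sheet:
    row["gender_label"] = GENDER_LABELS[row["gender"]]

print(coding_sheet[0]["gender_label"])  # Male
```

Keeping codes numeric in the sheet and decoding them only for display is the usual practice, since statistical software expects numeric codes.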

3. Analysis of Data
Data analysis is the process of inspecting, cleaning, and transforming data to extract
meaningful insights. Here's a brief overview:
 Descriptive Analysis: Summarizes the data using measures like:
o Central Tendency: Mean, Median, Mode.
o Dispersion: Range, Variance, Standard Deviation.
 Inferential Analysis: Makes inferences or predictions about a population based on a
sample.
o Hypothesis Testing: Tests assumptions using various statistical tests.
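The descriptive measures above can be computed with Python's standard `statistics` module; the sample data here is invented for illustration:

```python
import statistics

scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

# Central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value when sorted
mode = statistics.mode(scores)      # most frequent value

# Dispersion
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (n - 1 denominator)
stdev = statistics.stdev(scores)        # sample standard deviation

print(mean, median, mode, data_range)
```

Note that `variance` and `stdev` here use the sample (n − 1) denominator, which is what inferential statistics on a sample normally requires.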
4. Correlation & Regression
Both are statistical techniques used to understand relationships between variables.
 Correlation: Measures the strength and direction of the relationship between two or
more variables.
o Pearson Correlation is used for linear relationships between continuous
variables.
o Spearman’s Rank Correlation is used for ordinal or non-linear data.
 Regression: Explores the relationship between a dependent variable and one or
more independent variables.
o Linear Regression: Predicts the value of a dependent variable based on the
linear relationship with one or more independent variables.
o Multiple Regression: Examines the relationship between one dependent
variable and several independent variables.
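The textbook formulas behind Pearson correlation and simple linear regression can be sketched in pure Python (the data below is invented for illustration; in practice, libraries handle this):

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Pearson correlation: covariance divided by the product of standard deviations.
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
var_x = sum((xi - mean_x) ** 2 for xi in x)
var_y = sum((yi - mean_y) ** 2 for yi in y)
r = cov / math.sqrt(var_x * var_y)

# Simple linear regression (least squares): y ≈ intercept + slope * x.
slope = cov / var_x
intercept = mean_y - slope * mean_x

print(round(r, 3), round(slope, 3), round(intercept, 3))  # 0.775 0.6 2.2
```

Note how the slope reuses the same covariance term as r: correlation and regression are closely related, with r measuring strength of association and the slope giving the predicted change in y per unit of x.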
5. Parametric & Non-Parametric Tests
These tests are used to analyze data based on certain assumptions.
 Parametric Tests: Assume that the data follows a specific distribution (usually
normal). These tests are used when data is continuous and normally distributed.
o Examples:
 t-Test: Compares means of two groups (independent or paired).
 ANOVA (Analysis of Variance): Compares means of more than two
groups.
 Pearson Correlation: Measures the relationship between two
continuous variables.
 Non-Parametric Tests: These tests do not assume a specific distribution. They are
often used when the data is not normally distributed or when dealing with ordinal
data.
o Examples:
 Chi-Square Test: Tests the association between categorical variables.
 Mann-Whitney U Test: Compares two independent groups when the
dependent variable is ordinal or not normally distributed.
 Kruskal-Wallis H Test: Compares more than two independent groups
for ordinal or non-normally distributed data.
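To make the non-parametric idea concrete, here is a minimal pure-Python sketch of the Mann-Whitney U statistic: for every cross-group pair, count how often a value in one group exceeds a value in the other (ties count one half). This computes only the statistic, not the p-value, and the sample data is invented:

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

a = [3, 4, 2, 6]
b = [9, 7, 5, 10]
u_a = mann_whitney_u(a, b)
u_b = mann_whitney_u(b, a)
# The two U values always sum to n_a * n_b.
print(u_a, u_b, u_a + u_b)  # 1.0 15.0 16.0
```

Because the statistic depends only on pairwise orderings, not on the raw magnitudes, it needs no normality assumption, which is exactly why the test suits ordinal or skewed data.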
Summary Table

Concept | Explanation | Examples
Measurement Scales | Different ways to categorize and measure data. | Nominal, Ordinal, Interval, Ratio
Tabulation (Coding Sheet) | Organizing raw data into a structured format for analysis. | Table with codes for responses and variables.
Data Analysis | Analyzing data for insights. | Descriptive and inferential analysis.
Correlation | Measures the strength of relationship between two variables. | Pearson, Spearman correlations.
Regression | Predicts one variable based on another. | Linear regression, Multiple regression.
Parametric Tests | Assumes normal distribution of data. | t-Test, ANOVA, Pearson Correlation.
Non-Parametric Tests | Does not assume normal distribution. | Chi-Square, Mann-Whitney U, Kruskal-Wallis.


Hypothesis Testing in Detail


Hypothesis testing is a fundamental process in statistics that allows us to make inferences
about a population based on sample data. The goal is to determine whether there is enough
evidence to reject a null hypothesis in favor of an alternative hypothesis.

1. Hypothesis: Basic Concepts


What is a Hypothesis?
 A hypothesis is a testable statement or assumption about the population
parameter(s). It’s a prediction or educated guess about how variables behave in a
population.
 Hypothesis testing involves using sample data to test the validity of the hypothesis.
Null Hypothesis (H₀):
 The null hypothesis is a statement that assumes no effect, no difference, or no
relationship between the variables under investigation.
 It is the status quo or a starting point of the analysis, which we aim to test.
o Example: A company believes that a new marketing strategy will not increase
sales. The null hypothesis would be that the strategy has no effect on sales.
Alternative Hypothesis (H₁ or Ha):
 The alternative hypothesis suggests that there is an effect, difference, or
relationship.
 It’s what researchers typically aim to support or prove.
o Example: The alternative hypothesis would be that the new marketing
strategy does increase sales.
Types of Hypotheses:
 One-tailed (directional) hypothesis: Specifies the direction of the effect (greater
than or less than).
o Example: "The new drug reduces recovery time compared to the current
treatment."
 Two-tailed (non-directional) hypothesis: Simply suggests that there is a difference
without specifying the direction.
o Example: "The new drug has a different effect on recovery time compared to
the current treatment."

2. Process of Hypothesis Testing


Step-by-Step Procedure:
1. Formulate the Hypotheses:
o Start by stating both the null (H₀) and alternative (H₁) hypotheses. For
example:
 H₀: "There is no significant difference between the means of Group A
and Group B."
 H₁: "There is a significant difference between the means of Group A
and Group B."
2. Set the Significance Level (α):
o α (alpha) is the probability of making a Type-I error, or in other words,
rejecting the null hypothesis when it is true.
o Common values for α are 0.05, 0.01, or 0.10. A significance level of 0.05
means we accept a 5% risk of making a Type-I error.
o If the p-value (probability of observing the data, assuming H₀ is true) is less
than or equal to α, then we reject the null hypothesis.
3. Collect Data:
o Gather sample data from your population or experiment. For example, you
might randomly select a sample of students to compare their test scores.
4. Conduct a Statistical Test:
o Choose an appropriate test based on the data type and hypothesis. For
example:
 t-test: Compares means between two groups.
 ANOVA: Compares means between three or more groups.
 Chi-square: Compares categorical data.
o Calculate the test statistic (e.g., t-value, F-value, Chi-square statistic).
5. Decision:
o If the p-value is less than or equal to α, reject the null hypothesis (indicating
evidence for the alternative hypothesis).
o If the p-value is greater than α, fail to reject the null hypothesis (indicating
insufficient evidence to support the alternative hypothesis).
6. Interpretation:
o Draw conclusions based on the test results. If the null hypothesis is rejected,
you can conclude that the alternative hypothesis is more likely to be true.
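The decision rule in steps 5 and 6 can be sketched in Python using the standard library's `NormalDist` to turn a z-statistic into a two-tailed p-value (the z-value here is an invented example):

```python
from statistics import NormalDist

def two_tailed_p_from_z(z):
    """Two-tailed p-value for a z-statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
z = 2.5  # example test statistic
p = two_tailed_p_from_z(z)

# Step 5: compare the p-value to the significance level.
decision = "reject H0" if p <= alpha else "fail to reject H0"
print(round(p, 4), decision)
```

For z = 2.5 the p-value is about 0.012, which is below α = 0.05, so the null hypothesis is rejected; a smaller z such as 1.0 would give p ≈ 0.32 and a fail-to-reject decision.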
3. Type-I and Type-II Errors
In hypothesis testing, we make decisions based on sample data. These decisions may result
in two types of errors: Type-I errors and Type-II errors.
Type-I Error (False Positive):
 Definition: A Type-I error occurs when we reject the null hypothesis when it is
actually true. Essentially, we conclude that there is a significant effect or relationship,
but in reality, there is none.
 Risk: The probability of making a Type-I error is denoted as α (alpha), also known as
the significance level.
o Example: In a drug trial, you conclude that a new drug is effective (reject the
null hypothesis), but in reality, it has no effect (the null hypothesis is true).
 Consequences:
o Incorrectly concluding that a new treatment works.
o Wasting resources on something that isn’t effective.
 Controlling Type-I Error:
o Set an appropriate significance level (e.g., α = 0.05); the Type-I error rate is controlled directly by the choice of α.
o Lower α (e.g., to 0.01) when a false positive would be costly, accepting that this makes the test more conservative.
Type-II Error (False Negative):
 Definition: A Type-II error occurs when we fail to reject the null hypothesis when it
is actually false. In this case, we incorrectly conclude that there is no effect or
relationship, even though the alternative hypothesis is true.
 Risk: The probability of making a Type-II error is denoted as β (beta), and 1 - β
represents the power of the test (the probability of correctly rejecting a false null
hypothesis).
o Example: In a drug trial, you conclude that the new drug is not effective (fail
to reject the null hypothesis), but in reality, the drug is effective (the null
hypothesis is false).
 Consequences:
o Missing the opportunity to adopt an effective new treatment.
o Failing to detect a true difference or effect.
 Controlling Type-II Error:
o Increase the sample size (larger samples lead to more powerful tests).
o Increase the effect size (stronger treatment effects are easier to detect).
o Use more powerful statistical tests.

Relationship Between Type-I and Type-II Errors


There is an inverse relationship between Type-I and Type-II errors:
 As α (Type-I error probability) is decreased, β (Type-II error probability) tends to
increase, and vice versa.
 Trade-off: The stricter you are about rejecting the null hypothesis (lower α), the
higher the chance that you’ll miss a true effect (higher β).
To minimize errors:
 Increase sample size: Larger samples increase the power of a test, reducing Type-II errors while α stays at whatever level the analyst sets.
 Use appropriate tests: Choosing the right test based on data type can improve
power and reduce errors.
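The claim that α is the Type-I error rate can be checked by simulation: draw many samples from a population where H₀ is actually true and count how often a two-tailed z-test at α = 0.05 wrongly rejects. A rough sketch (the seed, sample size, and trial count are arbitrary choices):

```python
import math
import random

random.seed(42)

alpha = 0.05
z_crit = 1.96  # two-tailed critical value for alpha = 0.05
n = 30         # sample size per trial
trials = 2000

rejections = 0
for _ in range(trials):
    # H0 is true by construction: population mean 0, sigma 1.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1  # each rejection here is a Type-I error

rate = rejections / trials
print(rate)  # should land close to alpha = 0.05
```

Rerunning with a stricter α (say 0.01) lowers the observed rejection rate accordingly, illustrating the trade-off described above.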

4. Summary Table of Type-I and Type-II Errors

Error Type | Outcome | Probability | Consequence
Type-I Error | Reject the null hypothesis when it is true | α (usually 0.05) | Incorrectly conclude a treatment effect or relationship exists (false positive).
Type-II Error | Fail to reject the null hypothesis when it is false | β (varies) | Miss a real effect or relationship (false negative).
Correct Decision | Correctly reject or fail to reject the null hypothesis | 1 − α (Type-I) or 1 − β (Type-II) | Correctly conclude whether there is an effect or not.

Key Takeaways
 Type-I Error: Occurs when we wrongly reject the null hypothesis (false positive).
 Type-II Error: Occurs when we wrongly fail to reject the null hypothesis (false
negative).
 Significance Level (α): The threshold for Type-I error.
 Power (1 - β): The probability of correctly rejecting a false null hypothesis.
 Sample Size: Larger sample sizes increase power, reducing the Type-II error rate and providing more reliable estimates (the Type-I rate stays at the chosen α).

Example of Hypothesis Testing with Type-I and Type-II Errors:


Let’s say you're testing a new drug for reducing cholesterol.
 Null Hypothesis (H₀): The new drug has no effect on cholesterol levels.
 Alternative Hypothesis (H₁): The new drug reduces cholesterol levels.
You conduct the experiment and collect data. Suppose:
1. Type-I Error: You incorrectly reject H₀ and conclude that the new drug reduces
cholesterol, but in reality, it doesn’t. This is a false positive.
2. Type-II Error: You fail to reject H₀ and conclude that the new drug doesn’t reduce
cholesterol, even though it actually does. This is a false negative.
By carefully setting the significance level (α), controlling sample size, and choosing an
appropriate test, you can balance the risks of these errors and improve your confidence in
the results.

Hypothesis Testing for Means and Proportions for Small and Large Samples
Hypothesis testing allows researchers to make inferences about population parameters
(such as the population mean or proportion) based on sample data. The process for testing
hypotheses varies depending on whether you are dealing with means or proportions and
whether the sample size is small or large.
We will cover:
1. Hypothesis Testing for Means
o Small samples (using t-tests)
o Large samples (using z-tests)
2. Hypothesis Testing for Proportions
o Small and large samples (using z-tests)
1. Hypothesis Testing for Means
a. Testing the Mean for Small Samples (n < 30)
When the sample size is small (less than 30) and the population standard deviation is unknown, we use the t-test for hypothesis testing. The t-distribution is used because estimating the standard deviation from a small sample adds variability, which its heavier tails account for.
 Formula for t-test:
t = (X̄ − μ₀) / (s / √n)
Where:
o X̄ = sample mean
o μ₀ = hypothesized population mean
o s = sample standard deviation
o n = sample size
 Steps for Conducting a t-test:
1. State the hypotheses:
 Null Hypothesis (H₀): The population mean is equal to the hypothesized value (μ = μ₀).
 Alternative Hypothesis (H₁): The population mean is not equal to the hypothesized value (μ ≠ μ₀), or is greater than or less than it (μ > μ₀ or μ < μ₀).
2. Select the significance level (α), commonly 0.05.
3. Calculate the t-statistic using the formula above.
4. Find the critical t-value from the t-distribution table based on the degrees of freedom (df = n − 1) and the significance level (α).
5. Make a decision:
 If the calculated t-value exceeds the critical t-value (for two-tailed tests), reject the null hypothesis.
 If the p-value is less than α, reject the null hypothesis.
b. Testing the Mean for Large Samples (n ≥ 30)
When the sample size is large (greater than or equal to 30), the Central Limit Theorem (CLT)
suggests that the sampling distribution of the sample mean is approximately normal, even if
the population distribution is not normal. For large samples, we typically use the z-test.
 Formula for z-test:
z = (X̄ − μ₀) / (σ / √n)
Where:
o X̄ = sample mean
o μ₀ = hypothesized population mean
o σ = population standard deviation (or the sample standard deviation if the population standard deviation is unknown)
o n = sample size
 Steps for Conducting a z-test:
1. State the hypotheses:
 Null Hypothesis (H₀): μ = μ₀
 Alternative Hypothesis (H₁): μ ≠ μ₀ (or a one-sided alternative).
2. Select the significance level (α), typically 0.05.
3. Calculate the z-statistic using the formula above.
4. Find the critical z-value from the z-distribution table based on the significance level (α).
5. Make a decision:
 If the calculated z-value exceeds the critical z-value, reject the null hypothesis.
 If the p-value is less than α, reject the null hypothesis.

2. Hypothesis Testing for Proportions


a. Testing Proportions for Small and Large Samples
When the data are proportions (e.g., success/failure, yes/no), we use the z-test for both small and large samples, subject to the normality condition given below.
 Formula for z-test for proportions:
z = (p̂ − p₀) / √(p₀(1 − p₀) / n)
Where:
o p̂ = sample proportion
o p₀ = hypothesized population proportion
o n = sample size
For large samples, the sample size n should be large enough to satisfy the normality condition, i.e., both np₀ and n(1 − p₀) should be greater than 5 to use the z-test.
Steps for Conducting a z-test for Proportions:
1. State the hypotheses:
o Null Hypothesis (H₀): p = p₀ (the population proportion equals the hypothesized value).
o Alternative Hypothesis (H₁): p ≠ p₀, p > p₀, or p < p₀.
2. Select the significance level (α), typically 0.05.
3. Calculate the z-statistic using the formula above.
4. Find the critical z-value from the z-distribution table based on the significance level (α).
5. Make a decision:
o If the calculated z-value exceeds the critical z-value, reject the null hypothesis.
o If the p-value is less than α, reject the null hypothesis.

Example of Hypothesis Testing for Means


Let’s assume we want to test whether the average height of male students in a university is
different from 170 cm. We will perform the test for both small and large samples.
Scenario 1: Small Sample (n = 25)
 Sample Mean (X̄) = 172 cm
 Sample Standard Deviation (s) = 8 cm
 Hypothesized Population Mean (μ₀) = 170 cm
 Sample Size (n) = 25
 Significance Level (α) = 0.05
We use the t-test formula to compute the t-statistic:
t = (172 − 170) / (8 / √25) = 2 / 1.6 = 1.25
Next, we compare the t-statistic with the critical t-value for 24 degrees of freedom (n − 1) at α = 0.05, which is ±2.064 for a two-tailed test. Since 1.25 < 2.064, we fail to reject the null hypothesis.
Scenario 2: Large Sample (n = 100)
 Sample Mean (X̄) = 172 cm
 Population Standard Deviation (σ) = 8 cm
 Hypothesized Population Mean (μ₀) = 170 cm
 Sample Size (n) = 100
 Significance Level (α) = 0.05
We use the z-test formula:
z = (172 − 170) / (8 / √100) = 2 / 0.8 = 2.5
We compare the z-statistic with the critical z-value (±1.96 for a two-tailed test at α = 0.05). Since 2.5 > 1.96, we reject the null hypothesis and conclude that the average height differs from 170 cm.
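The arithmetic in both scenarios can be checked directly in Python:

```python
import math

# Scenario 1: small sample, t-test
x_bar, mu0, s, n = 172, 170, 8, 25
t = (x_bar - mu0) / (s / math.sqrt(n))

# Scenario 2: large sample, z-test (same mean and hypothesis, n = 100)
sigma, n2 = 8, 100
z = (x_bar - mu0) / (sigma / math.sqrt(n2))

print(t, z)  # 1.25 2.5
```

Note that quadrupling the sample size doubled the test statistic for the same 2 cm difference: larger samples shrink the standard error, making the same effect easier to detect.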

Example of Hypothesis Testing for Proportions


Let’s assume a political party claims that 60% of voters support their candidate. We want to
test this claim using a sample of 200 voters.
 Sample Proportion (p̂) = 0.65
 Hypothesized Population Proportion (p₀) = 0.60
 Sample Size (n) = 200
 Significance Level (α) = 0.05
We use the z-test for proportions:
z = (0.65 − 0.60) / √(0.60 × (1 − 0.60) / 200) = 0.05 / √0.0012 ≈ 0.05 / 0.0346 ≈ 1.44
Next, compare the z-statistic with the critical z-value (±1.96 for α = 0.05). Since the z-statistic (≈1.44) is less than 1.96, we fail to reject the null hypothesis: the sample does not provide sufficient evidence that support differs from 60%.
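The proportion example can likewise be verified in Python (computed without intermediate rounding, z comes out to about 1.443):

```python
import math

p_hat, p0, n = 0.65, 0.60, 200
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
z = (p_hat - p0) / se

z_crit = 1.96  # two-tailed critical value for alpha = 0.05
decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(round(z, 3), decision)  # 1.443 fail to reject H0
```

Note that the standard error uses the hypothesized proportion p₀, not the sample proportion p̂, because the test statistic is computed under the assumption that H₀ is true.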

Conclusion
 For means: use a t-test when the sample is small (n < 30) and the population standard deviation is unknown, and a z-test when the sample is large (n ≥ 30).
 For proportions: use a z-test for both small and large samples, provided the normality condition holds (np₀ > 5 and n(1 − p₀) > 5).
Statistical calculations using SPSS: creating a data file; defining variables and data; frequencies; crosstabs; hypothesis testing; reliability tests (and the tests above); factor analysis; chi-square tests for goodness of fit and independence; t-test, z-test, F-test; Mann-Whitney U test, Kruskal-Wallis test, Wilcoxon test; multivariate analysis; one-way and two-way ANOVA.
