Statistics
Scales of Measurement
Interval Scale: The interval scale has equal intervals between consecutive points on the
scale, and it allows for meaningful comparisons of both order and the size of differences
between values. However, it lacks a true zero point. Examples include temperature measured
in Celsius or Fahrenheit. While you can say that 20 degrees is 10 degrees warmer than 10
degrees, there is no absolute zero temperature.
Ratio Scale: The ratio scale is the most sophisticated and includes a true zero point,
meaning that zero represents the absence of the measured quantity. In addition to having
equal intervals, the ratio scale enables meaningful ratios between values. Examples include
height, weight, income, and age. For instance, a person with a height of 180 cm is twice as
tall as someone with a height of 90 cm.
There are three main measures of central tendency: the mean, the median, and the mode.
1. Mean: The mean, also known as the average, is calculated by adding up all the
values in a dataset and dividing the sum by the number of observations.
2. Median: The median is the middle value in a dataset when it is arranged in
ascending or descending order. If there is an even number of observations,
the median is the average of the two middle values.
3. Mode: The mode is the value that appears most frequently in a dataset. A
dataset may have no mode (if all values occur equally often), one mode
(unimodal), or more than one mode (multimodal).
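As a quick illustration, the sketch below computes all three measures with Python's built-in statistics module; the dataset is hypothetical.

```python
import statistics

scores = [4, 7, 7, 8, 10, 12]  # hypothetical dataset

mean = statistics.mean(scores)        # (4+7+7+8+10+12) / 6 = 8
median = statistics.median(scores)    # even count: average of 7 and 8 = 7.5
modes = statistics.multimode(scores)  # [7], the most frequent value(s)

print(mean, median, modes)  # 8 7.5 [7]
```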
Standard Deviation
Standard deviation is a measure of how much variation (spread or dispersion)
from the mean exists. It quantifies the extent to which the values in a dataset
differ from their average.
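In symbols, the population standard deviation is σ = sqrt( Σ(xᵢ − μ)² / N ), where μ is the mean and N the number of values; the sample version divides by N − 1. A minimal sketch using Python's statistics module, with hypothetical data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores; mean = 5

# Population standard deviation: sqrt of the mean squared deviation from the mean
print(statistics.pstdev(data))  # 2.0

# Sample standard deviation divides by n - 1 instead of n
print(statistics.stdev(data))   # about 2.138
```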
Parametric and Non-Parametric Tests
1. Parametric Tests:
● Assumption: Parametric tests assume that the data being analyzed follow a specific probability
distribution, usually the normal distribution. Additionally, they often assume homogeneity of variance
across groups.
● Data Type: Parametric tests are typically applied to interval or ratio data.
● Examples: t-tests, analysis of variance (ANOVA), correlation, regression.
2. Non-Parametric Tests:
● Assumption: Non-parametric tests do not rely on specific assumptions about the distribution of the
population from which the sample is drawn. They are considered distribution-free or distribution-
agnostic.
● Data Type: Non-parametric tests can be applied to nominal, ordinal, interval, or ratio data.
● Examples: Mann-Whitney U test, Wilcoxon signed-rank test, Kruskal-Wallis test, Spearman's rank
correlation.
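To make the contrast concrete, the sketch below runs a parametric t-test and its non-parametric counterpart (the Mann-Whitney U test) on the same two hypothetical groups; it assumes SciPy is installed.

```python
from scipy import stats

group_a = [12, 15, 14, 16, 13, 18]  # hypothetical interval-scale scores
group_b = [10, 11, 13, 9, 12, 14]

# Parametric: assumes approximate normality within each group
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric: compares ranks, with no normality assumption
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

print(f"t-test:       t = {t_stat:.2f}, p = {t_p:.3f}")
print(f"Mann-Whitney: U = {u_stat:.2f}, p = {u_p:.3f}")
```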
Descriptive and Inferential Statistics
Descriptive Statistics:
Descriptive statistics aim to summarize and describe the main features of a dataset. They
provide a clear and concise summary of the essential characteristics of the data, such as
central tendency, variability, and distribution.
Methods:
1. Measures of Central Tendency: Descriptive statistics include measures like the mean,
median, and mode, which represent the central or typical value of a dataset.
2. Measures of Variability: Descriptive statistics also involve measures of variability,
such as the range, variance, and standard deviation, which indicate how spread out the
values are.
3. Frequency Distributions: Descriptive statistics can be presented through tables, charts,
and graphs, such as histograms or pie charts, to illustrate the distribution of values.
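For instance, a simple frequency distribution can be tabulated with Python's collections.Counter (hypothetical grades):

```python
from collections import Counter

grades = ["B", "A", "C", "B", "B", "A", "D", "C", "B"]  # hypothetical data

freq = Counter(grades)
for grade in sorted(freq):
    print(grade, freq[grade])
# A 2
# B 4
# C 2
# D 1
```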
Inferential Statistics:
Inferential statistics involve drawing conclusions or making inferences about a
population based on a sample of data. This branch of statistics allows researchers
to make generalizations and predictions beyond the specific data collected.
Methods:
● Hypothesis Testing: Inferential statistics often involves hypothesis
testing, where researchers test hypotheses about population
parameters using sample data.
● Confidence Intervals: Confidence intervals provide a range of
values within which a population parameter is likely to fall, based
on sample data.
● Inferential Statistical Tools: Inferential statistics include techniques
such as regression analysis, t-tests, ANOVA, chi-square tests, and correlation.
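As an illustration, the sketch below computes a 95% confidence interval for a population mean from a small hypothetical sample, using the t distribution (SciPy assumed):

```python
import statistics
from scipy import stats

sample = [52, 48, 55, 50, 53, 47, 51, 49]  # hypothetical scores
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / n ** 0.5  # standard error of the mean

# 95% CI: mean +/- t_critical * SEM, with n - 1 degrees of freedom
t_crit = stats.t.ppf(0.975, df=n - 1)
print(f"95% CI: ({mean - t_crit * sem:.2f}, {mean + t_crit * sem:.2f})")
```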
Levels of Significance
In statistics, the level of significance, often denoted by the symbol α (alpha), is a
predetermined threshold used to determine the statistical significance of a hypothesis test. It
represents the probability of rejecting a true null hypothesis. The most common levels of
significance are 0.05, 0.01, and 0.10.
a. 0.05 Level of Significance (α = 0.05): This is a commonly used level of significance,
indicating that there is a 5% chance of rejecting the null hypothesis when it is true. If the
p-value (probability value) obtained from the statistical test is less than 0.05, the
null hypothesis is rejected, suggesting that the result is statistically significant at the 5%
level.
b. 0.01 Level of Significance (α = 0.01): This is a stricter level of significance, indicating
that there is only a 1% chance of rejecting the null hypothesis when it is true. If the
p-value obtained from the statistical test is less than 0.01, the null hypothesis is
rejected, suggesting that the result is statistically significant at the 1% level.
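The decision rule itself is mechanical, as this minimal sketch shows:

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a test's p-value against the chosen significance level."""
    if p_value < alpha:
        return "Reject the null hypothesis (statistically significant)"
    return "Fail to reject the null hypothesis"

print(decide(0.03, alpha=0.05))  # significant at the 5% level
print(decide(0.03, alpha=0.01))  # not significant at the stricter 1% level
```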
Type I and Type II Errors
Type I Error (False Positive):
Definition: A Type I error occurs when the null hypothesis is rejected when it is actually true. In other
words, it is the mistake of concluding that there is a significant effect or difference when, in reality, there is
none.
Example: Suppose a medical researcher is testing a new drug and sets a significance level of 0.05. If,
based on the sample data, the researcher concludes that the drug is effective (rejecting the null
hypothesis), but in reality, the drug has no effect, it is a Type I error. The researcher mistakenly believes
there is a positive result when there isn't.
Type II Error (False Negative):
Definition: A Type II error occurs when the researcher fails to reject the null hypothesis when it is
actually false. It is the mistake of failing to detect a real effect or difference.
Example: Continuing with the drug example, if the researcher fails to reject the null hypothesis and concludes
that the new drug has no effect, but in reality, it does have a positive effect, it is a Type II error. The researcher
mistakenly accepts the null hypothesis when it is false.
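A small simulation can make the Type I error rate tangible: when the null hypothesis is true (both groups drawn from the same population), a test at α = 0.05 should reject in roughly 5% of repeated experiments. This sketch assumes NumPy and SciPy are installed; all numbers are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n_experiments = 0.05, 5000

false_positives = 0
for _ in range(n_experiments):
    # The null hypothesis is TRUE: both samples come from the same distribution
    a = rng.normal(100, 15, size=30)
    b = rng.normal(100, 15, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # a Type I error

print(false_positives / n_experiments)  # close to 0.05
```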
Normal Probability Curve (NPC)
● The normal probability curve, also known as the Gaussian distribution, is a
fundamental concept in statistics and probability theory.
● It describes the distribution of a continuous random variable and is characterized
by a symmetrical bell-shaped curve.
● Bell-shaped Curve: The normal distribution is characterized by a bell-shaped
curve, with the highest point at the mean (average) of the distribution.
● Symmetry: The normal probability curve is symmetric. It is a bell-shaped curve,
and the left and right sides are mirror images of each other. The highest point, or
peak, of the curve is at the mean (average), which is the center of symmetry.
● Mean, Median, and Mode: The mean, median, and mode of a normal
distribution are all located at the center of the distribution, and they are equal in
value. This is a unique property of the normal distribution.
● Z-Score: The Z-score is a measure of how many standard deviations a particular data
point is from the mean. It is calculated using the formula:
Z = (X − μ) / σ
where X is the data point or score, μ is the mean, and σ is the standard deviation.
● Standard Deviation: If the standard deviation is smaller, the data points cluster closely
around the mean and the curve becomes narrower. If the standard deviation is larger, the
data are more dispersed and the curve becomes wider. Standard deviations are used to
subdivide the area under the normal curve; each subdivided section defines the
percentage of data that falls into that region of the graph.
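These standard subdivisions (the 68-95-99.7 rule) can be checked directly from the normal cumulative distribution function; the sketch assumes SciPy is installed.

```python
from scipy.stats import norm

# Proportion of a normal distribution within k standard deviations of the mean
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```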
Skewness
Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical when
its left and right side are not mirror images. In many distributions which deviate from the
normal, the values of mean, median and mode are different and there is no symmetry between
the right and the left halves of the curve. A distribution can have right (or positive), left (or
negative), or zero skewness. A right-skewed distribution is longer on the right side of its peak,
and a left-skewed distribution is longer on the left side of its peak.
Pearson's Correlation Coefficient (Product-Moment Correlation)
The Pearson correlation coefficient (r) ranges from -1 to +1:
● 1 indicates a perfect positive linear relationship: as one variable increases, the other
variable also increases proportionally.
● -1 indicates a perfect negative linear relationship: as one variable increases, the other
variable decreases proportionally.
● 0 indicates no linear correlation: there is no linear relationship between the two variables.
Example - Suppose you have data on the number of hours students spend studying for an
exam (variable X) and their corresponding exam scores (variable Y) for a group of students.
Now, we want to calculate the Pearson correlation coefficient to measure the strength and
direction of the linear relationship between the hours studied and exam scores.
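With hypothetical data of this kind, the coefficient can be computed as follows (SciPy assumed):

```python
from scipy import stats

hours_studied = [1, 2, 3, 4, 5, 6]       # variable X (hypothetical)
exam_scores = [52, 55, 61, 65, 70, 78]   # variable Y (hypothetical)

r, p_value = stats.pearsonr(hours_studied, exam_scores)
print(f"r = {r:.3f}, p = {p_value:.4f}")  # r near +1: strong positive relationship
```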
Spearman’s Rank Method (Non-parametric Correlation)
Spearman's rank correlation coefficient, often referred to as Spearman's rho (ρ), is a non-
parametric measure of correlation between two variables. It assesses the strength and direction of
the monotonic relationship between the ranks of the data points rather than the actual values.
Spearman's rank correlation is particularly useful when dealing with data that may not meet
the assumptions of parametric correlation methods like Pearson correlation (e.g., non-
linear relationships or skewed distributions). It's widely used in various fields, especially when
dealing with ordinal or ranked data.
Here are some reasons why Spearman's rank correlation method is employed:
● Ordinal or Ranked Data: Spearman's rank correlation is particularly suitable for ordinal or
ranked data, where the values represent categories with a meaningful order but do not
necessarily have a consistent numerical difference. It allows researchers to assess the
monotonic relationship between the ranks of the variables.
● Assumption-Free: Spearman's rank correlation does not assume that the variables are
normally distributed. This makes it applicable to a wider range of data distributions,
especially when dealing with skewed or non-normally distributed data.
● Small Sample Sizes: Spearman's rank correlation tends to perform well with smaller sample
sizes. In situations where the sample size is limited, Spearman's method can provide more
reliable results than Pearson correlation.
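A minimal sketch, again assuming SciPy is installed and using hypothetical ranked data:

```python
from scipy import stats

# Hypothetical rankings of 6 contestants by two judges (ordinal data)
judge_1 = [1, 2, 3, 4, 5, 6]
judge_2 = [2, 1, 4, 3, 5, 6]

rho, p_value = stats.spearmanr(judge_1, judge_2)
print(f"rho = {rho:.3f}, p = {p_value:.4f}")
```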
T-Ratio
The term "t-ratio" typically refers to the t-statistic or t-ratio in statistics. The t-statistic is a measure
used in hypothesis testing to determine if there is a significant difference between the means of
two groups. It is commonly employed in situations where the sample size is small or the
population standard deviation is unknown.
Example - Imagine you are a researcher investigating the effectiveness of a new teaching method
in improving students' test scores. You have two groups of students: one group is taught using the
traditional method, and the other group is taught using the new method. You want to determine if
there is a statistically significant difference in the average test scores between the two groups.
In this example, the t-ratio helps you assess whether any observed differences in average test
scores are likely due to the teaching method itself or if they could occur by random chance.
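As a sketch of the computation, the two-sample t-ratio below is obtained from its definition (the difference between the group means divided by the standard error of that difference) and checked against SciPy; the scores are hypothetical.

```python
import statistics
from scipy import stats

traditional = [68, 72, 75, 70, 74, 69]  # hypothetical test scores
new_method = [78, 74, 80, 77, 82, 79]

m1, m2 = statistics.mean(traditional), statistics.mean(new_method)
v1, v2 = statistics.variance(traditional), statistics.variance(new_method)
n1, n2 = len(traditional), len(new_method)

# Pooled-variance t-ratio (equal group variances assumed)
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_manual = (m1 - m2) / (pooled_var * (1 / n1 + 1 / n2)) ** 0.5

t_scipy, p = stats.ttest_ind(traditional, new_method)
print(t_manual, t_scipy, p)  # the two t values agree
```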
One Way ANOVA
One-Way Analysis of Variance (ANOVA) is a statistical method used to analyze and compare the means
of three or more groups or treatments to determine if there are significant differences among them. It is
particularly useful when you want to assess whether a single categorical independent variable (often
called a factor) has a statistically significant effect on a continuous dependent variable.
In One-Way ANOVA, you have one categorical independent variable, which is the factor under
investigation. This factor has three or more levels or categories.
An investigator is interested in exploring the most effective method of instruction in the classroom. He
decides to try 3 methods - 1) Lecture 2) Seminar 3) Discussion
One way ANOVA deals with one independent variable (A) - method of teaching and can have many
sublevels (a1 Lecture, a2 Seminar, a3 Discussion).
Method of Teaching - IV
Achievement scores - DV
a1 Lecture
a2 Seminar
a3 Discussion
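Under the same setup, a one-way ANOVA can be run with SciPy's f_oneway on hypothetical achievement scores for the three methods:

```python
from scipy import stats

# Hypothetical achievement scores under each method of teaching
lecture = [65, 70, 68, 72, 66]
seminar = [75, 78, 74, 80, 77]
discussion = [71, 73, 69, 75, 72]

f_stat, p_value = stats.f_oneway(lecture, seminar, discussion)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one method's mean differs from the others
```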
Regression
Regression analysis models the relationship between a dependent variable and one or more
independent variables.
Dependent Variable (Response Variable): This is the variable that you are trying to predict
or explain. It is the outcome variable that is affected by changes in the independent
variables. In both simple linear regression and multiple regression there is one dependent
variable; what differs is the number of independent variables.
Independent Variable(s) (Predictor Variable(s)): These are the variables that are used to
predict or explain the variation in the dependent variable. In simple linear regression, there is
one independent variable, while in multiple regression, there are multiple independent
variables.
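A minimal simple-linear-regression sketch using NumPy's least-squares polyfit, with the same hypothetical study-hours data as above:

```python
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])      # independent variable X
exam_scores = np.array([52, 55, 61, 65, 70, 78])  # dependent variable Y

# Fit Y = slope * X + intercept by least squares
slope, intercept = np.polyfit(hours_studied, exam_scores, deg=1)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")

# Predict the score for a student who studies 7 hours
print(slope * 7 + intercept)
```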
Reliability
Reliability refers to the consistency, stability, or precision of a measurement or score: a reliable
test yields similar results on repeated administrations under comparable conditions.
Validity
Validity refers to the extent to which a measure or test actually measures what it is intended to measure.
In other words, "validity refers to the degree to which a test measures what it claims to measure."