
STATISTICS

UNIT 1
Introduction: Meaning of statistics, Classification of statistics – descriptive vs
inferential, parametric vs non-parametric. Levels of Measurement. Measures of central
tendency – Mean, median, Mode. Measures of variability – Inter Quartile Range,
quartile deviation, standard deviation. Normal Distribution – Meaning, importance,
properties.

MEANING OF STATISTICS
 The words ‘Statistics’ and ‘Statistical’ are derived from the Latin word Status, meaning
a political state.
 Statistics is concerned with scientific methods for collecting, organising,
summarising, presenting and analysing data, as well as with deriving valid conclusions and
making reasonable decisions on the basis of this analysis. In short, statistics is concerned with
the systematic collection of numerical data and its interpretation.

Classification of statistics – descriptive vs inferential, parametric vs non-parametric.

DESCRIPTIVE VS INFERENTIAL

What is Descriptive Statistics?

 Descriptive statistics describe the characteristics of a data set. They are a simple
technique to describe, show and summarize data in a meaningful way.
 You simply choose a group you’re interested in, record data about the group, and then
use summary statistics and graphs to describe the group’s properties.
 The process allows you to obtain insights and visualize the data rather than simply
poring over sets of raw numbers. With descriptive statistics, you can describe
both an entire population and an individual sample.
 Examples of descriptive statistics in psychology include calculating the average score
on a psychological test, finding the most frequent response in a survey, or measuring
the spread of scores to understand the variability within a group of participants.

What is Inferential Statistics?

 Inferential statistics go beyond describing data and aim to make inferences or draw
conclusions about populations based on sample data. These techniques allow
researchers to generalize their findings to larger populations and assess the likelihood
that observed differences or relationships are not due to random chance.
 For example, if a psychologist wants to know if a new therapy is more effective than
an existing one, they might use inferential statistics to analyze data from a sample of
patients and determine whether the observed improvement is likely to be a real effect
or just due to chance.

Similarities Between Descriptive and Inferential Statistics


 Descriptive and inferential statistics are both used to analyse and understand data.
 Both employ statistical techniques and tools to make judgements about a
population.
 Both rest on the same fundamental ideas in probability, such as sampling, randomization, and
probability distributions.
 Finally, both employ the same kinds of statistical programs, including
SPSS, SAS, and R.

Difference Between Descriptive and Inferential Statistics


 Descriptive statistics provide a summary of the features or attributes of a dataset,
while inferential statistics enable hypothesis testing and evaluation of how far the
findings apply to a larger population. The key differences between descriptive and
inferential statistics are summarized below:
Purpose: Descriptive statistics describe and summarize data; inferential statistics make
inferences and draw conclusions about a population based on sample data.

Data analysis: Descriptive statistics analyze and interpret the characteristics of a dataset;
inferential statistics use sample data to make generalizations or predictions about a larger
population.

Population vs sample: Descriptive statistics focus on the entire population or dataset;
inferential statistics focus on a subset of the population (a sample) to draw conclusions about
the entire population.

Measurements: Descriptive statistics provide measures of central tendency and dispersion;
inferential statistics estimate parameters, test hypotheses, and determine the level of
confidence or significance in the results.

Examples: Descriptive statistics include the mean, median, mode, standard deviation, range,
and frequency tables; inferential statistics include hypothesis testing, confidence intervals,
regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.

Goal: Descriptive statistics summarize, organize, and present data; inferential statistics
generalize findings to a larger population, make predictions, test hypotheses, evaluate
relationships, and support decision-making.

Population parameters: Not typically estimated with descriptive statistics; estimated using
sample statistics with inferential statistics.


Tools of Descriptive Statistics

Measures of central tendency (mean, median, mode), measures of variability (range, variance,
standard deviation), frequency distributions, histograms, scatterplots, and box plots are
examples of descriptive statistics tools.

Tools of Inferential Statistics


Hypothesis testing, confidence intervals, regression analysis, analysis of variance (ANOVA),
and chi-square tests are examples of inferential statistics tools.
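
To make the distinction concrete, here is a minimal Python sketch (hypothetical exam scores,
analysed with NumPy and SciPy; the data and library choice are assumptions, not something
named in these notes) that applies a descriptive summary and then an inferential test to the
same data:

```python
# Hedged sketch: hypothetical exam-score data for two groups.
import numpy as np
from scipy import stats

group_a = np.array([72, 85, 78, 90, 66, 81, 74, 88])
group_b = np.array([65, 70, 79, 61, 74, 68, 72, 75])

# Descriptive statistics: summarize each observed sample.
print("Mean A:", group_a.mean(), "SD A:", round(group_a.std(ddof=1), 2))
print("Mean B:", group_b.mean(), "Median B:", np.median(group_b))

# Inferential statistics: test whether the two population means differ.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print("t =", round(t_stat, 2), "p =", round(p_value, 4))
```

The descriptive lines only summarize the observed scores; the t-test uses the same sample
data to draw a conclusion about the wider populations the groups were sampled from.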

PROS AND CONS OF DESCRIPTIVE VS INFERENTIAL STATISTICS

Descriptive Statistics:

Pros:

1. Simplicity: Descriptive statistics are straightforward and easy to understand, making
them useful for summarizing data quickly.
2. Clarity: They provide a clear overview of the main features of the data, such as
central tendency and variability.
3. Initial Insights: Descriptive statistics help researchers identify patterns and trends in
data, leading to initial insights.

Cons:

1. Limited Inference: Descriptive statistics don't allow for generalizing findings beyond
the dataset itself; they lack the ability to make broader conclusions.
2. Lack of Context: Descriptive statistics may not reveal underlying relationships or
factors that might impact the data.
3. No Significance Testing: Descriptive statistics don't assess whether observed
differences are statistically significant or due to chance.

Inferential Statistics:

Pros:
1. Generalization: Inferential statistics enable researchers to draw conclusions about
populations beyond the sampled data.
2. Significance Testing: They allow researchers to test hypotheses and determine
whether observed differences are statistically significant or likely due to chance.
3. Complex Relationships: Inferential statistics help identify and understand complex
relationships among variables through techniques like regression and ANOVA.

Cons:

1. Complexity: Inferential statistics can be more complex and require a deeper
understanding of statistical concepts.
2. Assumptions: They often rely on assumptions about the data, such as normal
distribution, which might not always hold true.
3. Potential Misinterpretation: If not used correctly, inferential statistics can lead to
misinterpretation of results or drawing incorrect conclusions.

PARAMETRIC VS NON-PARAMETRIC

Parametric Tests:

Parametric tests assume that the data follow a specific distribution, usually the normal
distribution. These tests require certain assumptions to be met, such as homogeneity of
variance and normality of the data.

Common examples of parametric tests include:

 Student's t-test: Used to compare means of two groups.

 Analysis of Variance (ANOVA): Used to compare means of three or more groups.

 Linear Regression: Used to analyze the relationship between a dependent variable
and one or more independent variables.

Advantages of parametric tests:

 Generally more powerful when assumptions are met.

 Provide more precise estimates of parameters.

Disadvantages of parametric tests:


 Assumptions can be restrictive and may not be met in real-world data.

 Not suitable for data that deviate significantly from the assumed distribution.

Non-parametric Tests: Non-parametric tests, also known as distribution-free tests, do not
assume a specific distribution for the data. These tests are used when the data does not meet
the assumptions of normality and homogeneity of variance, or when dealing with ordinal or
nominal data.

Common examples of non-parametric tests include:

 Mann-Whitney U test: Used to compare medians of two groups.

 Wilcoxon signed-rank test: Used to compare paired data.

 Kruskal-Wallis test: Non-parametric equivalent of ANOVA.

 Spearman's rank correlation: Non-parametric equivalent of Pearson correlation for
ranked data.

Advantages of non-parametric tests:

 Do not rely on stringent assumptions about the distribution of data.

 Can be used with data that is not normally distributed or when sample sizes are small.

Disadvantages of non-parametric tests:

 Generally less powerful than parametric tests when assumptions of parametric tests
are met.

 Provide less precise estimates of parameters.
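
As a hedged illustration of how this choice plays out in practice, the sketch below
(hypothetical reaction-time data, analysed with SciPy; the specific numbers are invented for
illustration) checks normality and then runs a parametric test alongside its non-parametric
counterpart:

```python
# Hedged sketch: hypothetical reaction times (ms) for two independent groups.
import numpy as np
from scipy import stats

treatment = np.array([310, 295, 330, 360, 420, 305, 298, 340])
control = np.array([350, 365, 400, 390, 510, 372, 388, 395])

# Check the normality assumption first (Shapiro-Wilk).
print("Shapiro p (treatment):", round(stats.shapiro(treatment).pvalue, 3))
print("Shapiro p (control):  ", round(stats.shapiro(control).pvalue, 3))

# Parametric: independent-samples t-test (assumes normality).
t_res = stats.ttest_ind(treatment, control)
print("t-test:       t =", round(t_res.statistic, 2), "p =", round(t_res.pvalue, 4))

# Non-parametric alternative: Mann-Whitney U (no normality assumption).
u_res = stats.mannwhitneyu(treatment, control)
print("Mann-Whitney: U =", u_res.statistic, "p =", round(u_res.pvalue, 4))
```

If the normality check fails or the samples are very small, the non-parametric result is
usually the safer one to report.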

DIFFERENCE BETWEEN PARAMETRIC VS NON-PARAMETRIC TEST

ADVANTAGES AND DISADVANTAGES OF PARAMETRIC TESTS


Advantages:

1. Statistical Power: Parametric tests tend to have higher statistical power when the
assumptions are met. This means they are more likely to detect true differences or
relationships if they exist in the data.

2. Efficiency: When the assumptions are satisfied, parametric tests are often more
efficient, meaning they require smaller sample sizes to achieve the same level of
confidence compared to non-parametric tests.

3. Well-Developed: Parametric tests have been extensively studied and well-developed
over time. There is a wide range of parametric tests available for different types of
analyses.

4. More Information: They provide more detailed information about the data, including
estimates of population parameters (like means and variances), which can be useful in
drawing meaningful conclusions.

Disadvantages:

1. Assumptions: Parametric tests rely on assumptions about the underlying distribution
of the data. If these assumptions are violated, the results can be misleading or
incorrect. Common assumptions include normality of data and homogeneity of
variances.

2. Limited Applicability: Parametric tests might not be suitable for all types of data. If
the data doesn't follow the assumed distribution, using parametric tests can lead to
inaccurate results.

3. Sensitive to Outliers: Parametric tests can be sensitive to outliers, especially when
the sample size is small. Outliers can heavily influence the mean and variance, which
these tests rely on.

4. Data Transformation: Sometimes, to meet the assumptions, you might need to
transform your data. While this can make the assumptions hold, it can also complicate
the interpretation of results.

5. Less Robust: Parametric tests are less robust when assumptions are violated
compared to non-parametric tests, which don't rely on these assumptions.

NON-PARAMETRIC TESTS

MERITS

1. Robustness: Non-parametric tests are robust against deviations from assumptions like
normality and equal variances. They can handle data with outliers and non-normal
distributions without affecting the results significantly.
2. Flexibility: Non-parametric tests can be applied to a wide range of data types,
including ordinal and categorical data, which may not have a clear numerical
interpretation.

3. Fewer Assumptions: Non-parametric tests do not assume a specific distribution for
the population data, making them more suitable when data distribution assumptions
are not met.

4. Simplicity: Non-parametric tests often involve simpler calculations and fewer
assumptions, making them accessible to researchers without extensive statistical
training.

5. Small Sample Sizes: Non-parametric tests can be used effectively with small sample
sizes when parametric assumptions are not met.

6. Nonlinear Relationships: Non-parametric tests can detect relationships that
parametric tests might miss when the relationships between variables are non-linear.

7. Ease of Interpretation: Results from non-parametric tests are often straightforward
to interpret, especially for non-statisticians, as they often involve comparisons of
rankings or medians.

Disadvantages of Non-Parametric Tests:

1. Reduced Power: Non-parametric tests generally have lower statistical power than
parametric tests when the data meets the assumptions of the latter. This means they
might be less likely to detect true effects.

2. Less Precise Estimation: Non-parametric tests tend to provide less precise estimates
of population parameters compared to parametric tests when the data follows the
parametric assumptions.

3. Limited Test Options: There are fewer types of non-parametric tests available
compared to parametric tests. This can limit the ability to test specific research
questions.

4. Limited for Complex Analyses: Non-parametric tests might not be suitable for more
complex analyses involving multiple variables, interactions, and covariates.

5. Loss of Information: Non-parametric tests often involve converting data into ranks,
which can lead to a loss of information from the original data.

6. Difficulty Handling Ties: Non-parametric tests can struggle to handle tied
observations (when multiple data points have the same value), which can lead to
challenges in interpretation.

7. Less Widely Understood: While some non-parametric tests are widely known and
used, they might be less familiar to researchers and practitioners than their parametric
counterparts.
LEVELS OF MEASUREMENT

Levels of measurement, also known as measurement scales or scales of measurement, are a
way of categorizing different types of data based on the characteristics of the data and the
mathematical operations that can be performed on them. There are four commonly
recognized levels of measurement: nominal, ordinal, interval, and ratio.

Nominal Scale:

The nominal scale is the simplest level of measurement. Data at this level are
categorical and are used to classify items into distinct groups or categories. In this scale,
data points are assigned to different categories based on some shared characteristic, but the
categories themselves have no inherent order or numerical value. Examples of nominal data
include gender, ethnicity, and religious affiliation.

Ordinal Scale:

An ordinal scale is a type of measurement scale used in statistics that involves ordering
or ranking data based on some characteristic, without assigning specific numerical values that
represent the magnitude of the differences between the categories. Examples include
educational levels (e.g., high school, bachelor's degree, master's degree) or customer
satisfaction ratings (e.g., "very satisfied," "satisfied," "neutral," "dissatisfied," "very
dissatisfied")

Interval Scale:

Interval data is a type of measurement scale used in statistics that has ordered categories
and consistent intervals between them. In contrast to nominal and ordinal data, interval data
allows meaningful comparisons of the differences between values because the intervals
are equal and have a consistent meaning. However, interval data does not have a true zero
point, so ratios between values are not meaningful. Arithmetic operations such as
addition and subtraction can be performed on interval data. Examples of data at the interval
scale include temperatures measured in Celsius or Fahrenheit.

Ratio Scale:

The ratio scale is the most advanced level of measurement. It includes an ordered
scale, meaningful differences between values, a true zero point that indicates the absence of
the attribute, and consistent measurement units. Examples of ratio data include height,
weight, age, and income.

Valid operations at each level of measurement:

Frequency distributions: valid for nominal, ordinal, interval and ratio data.

Median and percentile ranges: valid for ordinal, interval and ratio data (not nominal).

Addition, subtraction, mean, standard deviation: valid for interval and ratio data only.

Multiplication, division, ratios, coefficient of variation: valid for ratio data only.
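
A brief sketch (with made-up values) of which summaries are meaningful at each level; the
variable names and data are illustrative assumptions only:

```python
# Hedged sketch: one example variable per level of measurement.
import statistics

religion = ["Hindu", "Muslim", "Christian", "Hindu"]   # nominal: only counts/mode make sense
satisfaction = [1, 2, 2, 3, 5]                         # ordinal ranks: median is meaningful
temperature_c = [20.0, 25.0, 30.0]                     # interval: differences and mean are meaningful
weight_kg = [50.0, 60.0, 75.0]                         # ratio: true zero, so ratios are meaningful

print(statistics.mode(religion))        # most frequent category
print(statistics.median(satisfaction))  # middle rank
print(statistics.mean(temperature_c))   # valid, but 30 °C is not "1.5 times as hot" as 20 °C
print(weight_kg[2] / weight_kg[0])      # 1.5, a meaningful ratio because 0 kg is a true zero
```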

Measures of central tendency – Mean, median, Mode.


Central tendency is a statistical measure that represents an entire distribution or dataset by a
single value. It aims to provide an accurate description of the entire data in
the distribution.

MEAN

The mean is one of the most commonly used measures of central tendency. It provides a way
to summarize a dataset by calculating the arithmetic average of all the values within it.

Mathematically, the mean is calculated by summing up all the values in the dataset and then
dividing the sum by the total number of values. The mean provides insight into the general
level or balance of the data, helping to understand its central location.

There are two main objectives of averaging

1. To get a single value that describes the characteristics of the entire group

2. To facilitate comparison between groups.

Mean formula: Mean = sum of all values / number of values.

Example: consider the ages 25, 28, 24, 26.

1. Add up the ages: Sum of ages = 25 + 28 + 24 + 26 = 103

2. Count the ages: Total number of ages = 4

3. Calculate the mean: Mean = Sum of ages / Total number of ages

4. Mean = 103 / 4 = 25.75
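
The same worked example can be reproduced with Python's built-in statistics module; this is
just a sketch of the calculation above, nothing beyond it:

```python
import statistics

ages = [25, 28, 24, 26]
print(sum(ages))              # 103, the sum of the ages
print(statistics.mean(ages))  # 25.75, i.e. 103 / 4
```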

Properties of Mean

1. Mean can be calculated for any set of numerical data.

2. A set of numerical data has one and only one mean.

3. Mean is the most reliable measure of central tendency since it takes into account every
item in the set of data.

4. The mean is affected by unusually large or small data values.

5. The sum of the differences between individual observations and the mean is zero.

6. The sum of squares of deviations of a set of values about its mean is minimum.

7. The mean is not independent of change of origin and change of scale.

When to use the mean?

The mean is used when both of the following conditions are met:

1. The data are scaled, i.e., measured with equal intervals, like speed, weight, height, temperature, etc.

2. The distribution is normal. Because the mean is sensitive to the outliers found in skewed
distributions, you should only use the mean when the distribution is more or less normal.

Advantages and disadvantages

MERIT
1. It can be easily calculated and easily understood, which is why it is the most widely
used measure of central tendency.

2. As every item is taken into the calculation, the mean is based on all the observations.

3. As the mathematical formula is rigid, the result remains the same however it is computed.

4. When repeated samples are gathered from the same population, fluctuations in this measure
of central tendency are minimal.

5. Unlike measures such as the mode and median, it can be subjected to further algebraic treatment.

6. The arithmetic mean (A.M.) has the advantage of being a calculated quantity that does not
depend on the order of terms in a series.

7. Due to its strict definition, it is widely used for comparisons.

DEMERITS

1. It cannot be located graphically.

2. A single component can have a significant impact on the outcome.

3. It is useful only if the frequencies are regularly distributed; if the skewness is greater,
the results will be less meaningful.

4. In the case of open end class intervals, we must assume the intervals’ boundaries, and a
small fluctuation in X is possible. This is not the case with median and mode, as the open end
intervals are not used in their calculations.

Other types of mean:

 Weighted mean: some values contribute more to the mean than others.
 Geometric mean: values are multiplied rather than summed up.

 Harmonic mean: reciprocals of values are used instead of the values themselves.
MEDIAN

The median is another measure of central tendency that represents the middle value in a
dataset when the values are arranged in order. It's the value that separates the data into two
equal halves: half of the values are greater than or equal to the median, and half are less than
or equal to it. The median is less sensitive to outliers and skewed distributions compared to
the mean.

PROPERTIES

In statistics, the properties of the median are explained in the following points:

1. There is a unique median for each data set.

2. It is not affected by extremely large or small values and is therefore a valuable
measure of central tendency when such values occur.
3. It is not applicable to qualitative (nominal) data.
4. The median is used when the data are ordinal.
5. The median can be determined by the graphic method.
6. The sum of the absolute deviations taken from the median is less than the sum
of the absolute deviations taken from any other observation in the data.

When to use the median?

The median is used when either one of two conditions is met:

1. The data are ordinal, or 2. The distribution is skewed or non-normal.

FORMULA

Calculating the median for individual series is as follows:

 The data are arranged in ascending or descending order.
 If the sample size n is odd, median = value of the ((n + 1) / 2)th item.
 If the sample size n is even, median = ½ [value of the (n / 2)th item + value of the
((n / 2) + 1)th item].
Calculating the median for discrete series is as follows:

 Arrange the data in ascending or descending order.
 Compute the cumulative frequencies.
 Median = the (n / 2)th term, where n refers to the cumulative (total) frequency.
The formula to find the median for a continuous distribution is:

Median = l + ((N / 2 − C) / f) × i

where l = lower limit of the median class

f = frequency of the median class

N = the sum of all frequencies

i = the width of the median class

C = the cumulative frequency of the class preceding the median class
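
A short sketch of these rules (the grouped-data figures below are hypothetical, chosen only to
show how the formula is applied):

```python
import statistics

# Individual series: odd and even sample sizes.
print(statistics.median([4, 7, 12, 18, 19]))  # 12  -> the (n + 1)/2-th item
print(statistics.median([4, 7, 12, 18]))      # 9.5 -> mean of the two middle items

def grouped_median(l, f, N, i, C):
    """Median = l + ((N/2 - C) / f) * i for a continuous frequency distribution."""
    return l + ((N / 2 - C) / f) * i

# Hypothetical median class 20-30: frequency f = 12, total N = 50,
# cumulative frequency below the class C = 18, class width i = 10.
print(grouped_median(l=20, f=12, N=50, i=10, C=18))  # about 25.83
```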

Merits or Uses of Median:


1. Median is rigidly defined as in the case of Mean.

2. Even if the value of an extreme item is very different from the other values, the median is
not much affected. For example, the median of 4, 7, 12, 18, 19 is 12, and if we add two
values, 450 and 10000, the new median is 18.

3. It can also be used for quantities that cannot be given an arithmetic mean, as in the case of
intelligence, etc. It is possible to arrange such data in order and locate the middle value, so
for such cases it is the best measure.

4. It can be located graphically.

5. It is also suitable for open-end intervals, since whatever values are assumed for the open
intervals, the value of the median remains the same.

6. It can be easily calculated and is also easy to understand.

7. The median is also used for other statistical devices such as the mean deviation and skewness.

8. It can be located by inspection in some cases.

9. Extreme items need not be available to obtain the median; as long as the number of terms
is known, the median can be located.

Demerits or Limitations of Median:

1. Because extreme items do not affect it much, the median sometimes fails to remain a
representative value of the series.

2. It is affected much more by fluctuations of sampling than the A.M.

3. The median cannot be used for further algebraic treatment. Unlike the mean, we can neither
find the total of the terms, as in the case of the A.M., nor the median of combined groups.

4. In a continuous series it has to be interpolated. We can find its true value only if the
frequencies are uniformly spread over the whole class interval in which the median lies.

5. If the number of terms is even, we can only estimate it, since the A.M. of the two
middle terms is taken as the median.

MODE

The mode is a statistical measure that represents the value that appears most frequently in a
dataset. In other words, the mode is the value that occurs with the highest frequency among
all the values in the dataset. Unlike the mean and median, which focus on the central
tendency of the data, the mode highlights the most common value(s).

Properties of Mode

1. The mode is the easiest to compute.

2. The mode is not always unique. A data set can have more than one mode, or the mode may
not exist for a data set.

3. Can be used for qualitative as well as quantitative data.

4. Not affected by extreme values.

5. The mode can be used when the data are nominal or categorical, such as religious
preference, gender, or political affiliation.

6. It cannot be manipulated algebraically: modes of subgroups cannot be combined.

When to use the mode? The Mode is used when you want to know the most frequent
response, number or observation in a distribution.
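
A tiny sketch using the statistics module (the survey responses are hypothetical); multimode
is handy because it returns every most-frequent value, so it also handles bimodal data and
nominal categories:

```python
import statistics

responses = ["agree", "neutral", "agree", "disagree", "neutral"]
print(statistics.multimode(responses))  # ['agree', 'neutral'] -> two modes, each occurring twice
```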

Merits of mode:

Following are the various merits of mode:

(1) Simple and popular: Mode is a very simple measure of central tendency. Sometimes, just
a glance at the series is enough to locate the modal value. Because of its simplicity, it is a very
popular measure of central tendency.

(2) Less effect of marginal values: Compared to the mean, the mode is less affected by marginal
values in the series. The mode is determined only by the value with the highest frequency.

(3) Graphic presentation: Mode can be located graphically, with the help of a histogram.

(4) Best representative: Mode is the value which occurs most frequently in the series.
Accordingly, mode is the best representative value of the series.

(5) No need of knowing all the items or frequencies: The calculation of mode does not
require knowledge of all the items and frequencies of a distribution. In a simple series, it is
enough to know the items with the highest frequencies.

Demerits of mode:

Following are the various demerits of mode:

(1) Uncertain and vague: Mode is an uncertain and vague measure of central tendency.

(2) Not capable of algebraic treatment: Unlike the mean, mode is not capable of further
algebraic treatment.

(3) Difficult: When the frequencies of all items are identical, it is difficult to identify the modal
value.

(4) Complex procedure of grouping: The calculation of mode involves a cumbersome procedure
of grouping the data. If the extent of grouping changes, there will be a change in the modal
value.

(5) Ignores extreme marginal frequencies: It ignores extreme marginal frequencies. To
that extent the modal value is not a representative value of all the items in a series. Besides, one
can question the representative character of the modal value, as its calculation does not
involve all the items of the series.

Advantages and disadvantages of measures of central tendency

Advantages:

1. Simplicity: Measures of central tendency are relatively easy to calculate and
understand. They provide a quick way to summarize a dataset and communicate its
general "center."

2. Useful for Summary: These measures provide a single value that represents the
center or average of the data, which can be helpful for summarizing large datasets and
making comparisons.

3. Easy Interpretation: The concept of central tendency is intuitive. It's easy to grasp
that the mean, median, or mode represents a "typical" value in the dataset.

4. Basis for Further Analysis: Measures of central tendency are often used as starting
points for more advanced statistical analyses and inferential procedures.

5. Data Reduction: When working with large datasets, these measures allow you to
condense a vast amount of information into a single value, simplifying analysis.
Disadvantages:

1. Sensitive to Outliers: The mean is particularly sensitive to extreme values (outliers)
and can be skewed by them. This can lead to misrepresentations of the overall dataset.

2. Lack of Information: Measures of central tendency don't provide insights into the
full distribution of the data. They might hide important details about how data is
spread out.

3. Unrepresentative for Skewed Data: If the data distribution is heavily skewed or not
symmetric, the mean might not accurately represent the center of the data.

4. Inapplicability to Categorical Data: The mean is not suitable for categorical or
nominal data. While median and mode can be applied, their interpretation might not
be as straightforward.

5. Mode Ambiguity: A dataset might have multiple modes or no clear mode, making the
mode less informative in some cases.

6. Dependence on Sample: The mean and mode can be influenced by the specific
sample you have. If you took a different sample from the same population, you might
get slightly different values.

7. Misleading for Bimodal Data: In cases where the data has two distinct peaks, the
mean might fall between the peaks and not represent either peak well.

NORMAL DISTRIBUTION
The normal distribution, also known as the Gaussian distribution, is a probability distribution
in which the values of a random variable are distributed symmetrically: values are equally
distributed on the left and the right side of the central tendency. Thus, a bell-shaped curve is
formed.

In graphical form, the normal distribution appears as a "bell curve".

IMPORTANCE

1. Central Limit Theorem: The normal distribution is closely tied to the Central Limit
Theorem (CLT), which states that the sum (or average) of a large number of
independent and identically distributed random variables tends to follow a normal
distribution, even if the individual variables are not normally distributed themselves.
This property is crucial in statistics as it allows us to make inferences about
population parameters based on sample data.
The central limit theorem, a statistical theory, states that when sufficiently large
samples are drawn from a population with finite variance, the sample means will be
approximately normally distributed, and the mean of the sample means will be approximately
equal to the mean of the whole population. As the sample size gets bigger, the mean of the
sample gets closer to the actual population mean. If the sample size is small, the actual
distribution of the data may or may not be normal, but as the sample size gets bigger, the
distribution of sample means can be approximated by a normal distribution. (A small
simulation sketch appears after this list.)

2. Statistical Inference: Many statistical tests and methods, such as t-tests, ANOVA, and
regression analysis, rely on assumptions of normality. When data are approximately
normally distributed, these tests tend to perform well and yield reliable results.
Deviations from normality can affect the validity of these tests.

3. Parameter Estimation: The normal distribution has only two parameters, the mean
(μ) and the standard deviation (σ), which are easy to interpret and estimate. This
makes it a convenient choice for modeling various phenomena in real-world
situations.

4. Real-world Phenomena: The normal distribution often occurs naturally in various
processes and measurements. Examples include the heights of individuals, errors in
measurements, test scores, and many biological, physical, and social variables.
Recognizing the normal distribution helps in understanding the characteristics of these
phenomena.

5. Predictive Modeling: In fields like finance and risk analysis, the normal distribution
is used to model asset prices and returns. It forms the foundation for various risk
assessment and portfolio management techniques.

6. Quality Control: In manufacturing and quality control, the normal distribution is
used to set acceptable limits for product dimensions and characteristics. Deviations
from the normal distribution can indicate potential defects or issues in the production
process.

7. Standardization and Z-Scores: The concept of standard scores or z-scores, which
indicate how many standard deviations a data point is from the mean, is based on the
properties of the normal distribution. This is useful for comparing data points from
different distributions.

8. Psychological Testing: Many psychological tests, such as IQ tests and aptitude tests,
are designed to have a normal distribution of scores in the general population. This
design allows for the identification of individuals who fall above or below a
certain threshold, aiding in diagnostic and decision-making processes.

9. Psychological Measurement: Many psychological traits, such as intelligence,
personality traits, and cognitive abilities, tend to follow a normal distribution in the
population. This distribution allows psychologists to understand the typical range
of scores and the prevalence of various traits within a population.
10. Clinical Psychology and Diagnosis: In clinical psychology, the normal distribution is
important for understanding the distribution of symptoms, behaviors, and disorders
within a population. It helps in setting diagnostic criteria and determining the
prevalence of specific psychological conditions.

11. Psychopathology and Abnormal Behavior: While many psychological traits follow a
normal distribution, some variables related to psychopathology and abnormal behavior
might deviate from this pattern. Studying deviations from the normal distribution can
provide insights into the prevalence and nature of psychological disorders.
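
A small simulation sketch of the central limit theorem mentioned in point 1 above: sample
means drawn from a clearly non-normal (exponential) population become approximately
normal. The population, sample size, and seed are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # strongly skewed population

# Repeatedly draw samples of size 50 and record each sample mean.
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]

print("Population mean:     ", round(float(population.mean()), 3))
print("Mean of sample means:", round(float(np.mean(sample_means)), 3))  # close to the population mean
print("SD of sample means:  ", round(float(np.std(sample_means)), 3))   # roughly population SD / sqrt(50)
```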

The probability density function of the normal distribution is:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

Tests for normal distribution

Shapiro-Wilk test, Kolmogorov-Smirnov test

PROPERTIES

Empirical Rule (68-95-99.7 Rule): This rule states that approximately 68% of the data falls
within one standard deviation of the mean, about 95% within two standard deviations, and
approximately 99.7% within three standard deviations.
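
A brief sketch checking the empirical rule and running one of the normality tests named above
(Shapiro-Wilk, via SciPy) on simulated, IQ-like scores; the parameters are illustrative
assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=100, scale=15, size=5_000)  # simulated normally distributed scores

mu, sigma = scores.mean(), scores.std()
print(round(float(np.mean(np.abs(scores - mu) <= 1 * sigma)), 3))  # about 0.68
print(round(float(np.mean(np.abs(scores - mu) <= 2 * sigma)), 3))  # about 0.95
print(round(float(np.mean(np.abs(scores - mu) <= 3 * sigma)), 3))  # about 0.997

# Shapiro-Wilk on a subsample: a non-significant p-value is consistent with normality.
print("Shapiro-Wilk p:", round(stats.shapiro(scores[:500]).pvalue, 3))
```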

Measures of variability – Inter Quartile Range, quartile deviation, standard deviation.

Measures of variability

Measures of variability, also known as measures of dispersion, are statistical metrics that
provide information about the spread or dispersion of a dataset. They help you understand
how the data points are spread out from the central tendency (mean, median, mode) of the
dataset.

INTERQUARTILE RANGE

Quartiles are the values that divide an ordered list of numerical data into four equal parts. There are
three quartiles, first, second and third, denoted by Q1, Q2 and Q3. Here, Q2 is nothing but the
median of the given data.

The interquartile range (IQR) measures the spread of the middle half of your
data. In statistics, the range is the simplest of all the measures of dispersion. It is the
difference between the two extreme observations of the distribution. In other words, the range
is the difference between the maximum and the minimum observation of the distribution.

It is defined by
Range = Xmax – Xmin

Where Xmax is the largest observation and Xmin is the smallest observation of the variable
values.

Interquartile Range Definition

The interquartile range (IQR) measures the spread of the middle half of your
data. The interquartile range defines the difference between the third and the first quartile.
Quartiles are the partitioned values that divide the whole series into 4 equal parts. So, there
are 3 quartiles. First Quartile is denoted by Q1 known as the lower quartile, the second
Quartile is denoted by Q2 and the third Quartile is denoted by Q3 known as the upper quartile.
Therefore, the interquartile range is equal to the upper quartile minus lower quartile.

Interquartile Range Formula

The difference between the upper and lower quartile is known as the interquartile range. The
formula for the interquartile range is given below

Interquartile range = Upper Quartile – Lower Quartile = Q3 – Q1

where Q1 is the first quartile and Q3 is the third quartile of the series

Semi Interquartile Range

The semi-interquartile range is a measure of dispersion defined as half of the interquartile
range. It is computed as one half of the difference between the 75th percentile (Q3) and the
25th percentile (Q1), i.e., one half of the difference between the third and first quartiles. The
formula for the semi-interquartile range is

Semi Interquartile Range = (Q3– Q1) / 2
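
A minimal sketch of these formulas with NumPy (the data values are hypothetical):

```python
import numpy as np

data = np.array([12, 15, 17, 19, 21, 24, 26, 30, 35, 48])

q1, q3 = np.percentile(data, [25, 75])
print("Range:", data.max() - data.min())           # Xmax - Xmin
print("IQR:", q3 - q1)                             # Q3 - Q1
print("Semi-interquartile range:", (q3 - q1) / 2)  # (Q3 - Q1) / 2
```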

Merits:
1. Robustness to Outliers: One of the most significant advantages of the IQR is its
resistance to outliers. Since it's based on quartiles (percentiles), extreme values have
less influence on its calculation compared to other measures like the range or standard
deviation.

2. Focus on Middle Data: The IQR concentrates on the middle 50% of the data,
providing insights into the variability of the central portion of the dataset. This is
particularly useful when the extreme values are not the primary concern.

3. Descriptive of Spread: The IQR gives a clear sense of how data points are distributed
within the middle range of the dataset. It describes the spread of the data that's more
representative of the majority of observations.

4. Non-Parametric Nature: The IQR doesn't make assumptions about the distribution
of the data, making it suitable for both symmetric and skewed datasets.

5. Useful in Comparisons: When comparing different datasets, the IQR can help you
assess differences in the spread of the middle portion of the data, independent of
differences in central tendency.

Demerits:

1. Limited Information: While the IQR is useful for understanding the spread within
the middle 50% of the data, it doesn't provide a comprehensive view of the entire
dataset. It can't tell you about the distribution of the data beyond the first and third
quartiles.

2. Ignores Extreme Values: While robustness to outliers is an advantage, it can also be
a disadvantage if you're interested in understanding the full range of data variability,
including the impact of outliers.

3. Lack of Balance: The IQR might not be the best choice if you're interested in a
measure that considers both the center and the spread of the data. In such cases, a
combination of mean and standard deviation might be more suitable.

4. Less Precise than Standard Deviation: The standard deviation provides more
detailed information about the spread of the data and is widely used in statistical
analyses. While the IQR has its own strengths, it lacks the precision offered by the
standard deviation.

Quartile Deviation, also known as the Semi-Interquartile Range, is a measure of variability
that quantifies the spread of data within the middle 50% of a dataset. It is calculated as half of
the difference between the third quartile (Q3) and the first quartile (Q1), which makes it
closely related to the interquartile range (IQR). However, the Quartile Deviation reports half
of the IQR rather than the full IQR.

Mathematically, the formula for Quartile Deviation is:


Quartile Deviation = (Q3 - Q1) / 2

Where:

Q1 is the first quartile (25th percentile).

Q3 is the third quartile (75th percentile).

The Quartile Deviation gives you an idea of the spread of data within the central 50% of the
dataset, similar to the IQR. It is often used as a measure of dispersion when you're interested
in understanding the spread of the middle portion of the data while being less sensitive to
outliers.

Like the IQR, the Quartile Deviation is a robust measure of dispersion that is less affected by
extreme values compared to the standard deviation or range

Merits:

1. Robustness to Outliers: Similar to the Interquartile Range (IQR), the Quartile
Deviation is less affected by outliers compared to measures like the standard deviation
or range. It focuses on the middle 50% of the data, making it more robust when
extreme values are present.

2. Simplicity: The calculation of Quartile Deviation is relatively simple compared to
some other measures of dispersion. It only requires finding the first quartile (Q1) and
the third quartile (Q3) and then taking half of their difference.

3. Interpretability: The Quartile Deviation has a direct interpretation: it represents the
average dispersion of data points within the central 50% of the dataset. This can be
particularly useful when you're interested in understanding the spread of the middle
portion of the data.

4. Suitable for Skewed Data: The Quartile Deviation doesn't assume a normal
distribution and is appropriate for datasets with asymmetric or skewed distributions.

Demerits:

1. Limited Information: Like the Interquartile Range, the Quartile Deviation provides
information only about the spread of the middle 50% of the data. It doesn't take into
account the full range of data variability, including the outer 25% of the data.

2. Neglects Data Points: Since it focuses only on the quartiles, the Quartile Deviation
doesn't provide insights into the actual data points themselves. It may not give you a
clear picture of how individual values are distributed within the range.
3. Less Precision: While the Quartile Deviation is less affected by outliers, it might not
provide the same level of precision in measuring variability as the standard deviation
or the range.

4. Limited Use in Statistical Analysis: In more advanced statistical analyses and
modeling, the Quartile Deviation might not be as widely used or suitable as other
measures. Measures like the standard deviation are often preferred due to their greater
versatility.

5. Less Commonly Used: The Quartile Deviation is not as commonly used as other
measures like the IQR or standard deviation. This could mean that it might be less
familiar to those interpreting your results.

STANDARD DEVIATION

Standard deviation is a measure which shows how much variation (spread or dispersion)
from the mean exists. The standard deviation indicates a "typical" deviation from the mean:
it calculates the extent to which the values differ from the average. Standard deviation, the
most widely used measure of dispersion, is based on all values. Therefore a change in even
one value affects the value of the standard deviation.
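
A short sketch of the standard deviation, reusing the ages example from the mean section; note
the population vs. sample distinction that appears again under the demerits below (ddof=1 gives
the sample estimate):

```python
import numpy as np

ages = np.array([25, 28, 24, 26])
print("Population SD:", round(float(np.std(ages)), 3))          # divides by N
print("Sample SD:    ", round(float(np.std(ages, ddof=1)), 3))  # divides by N - 1
```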

Merits:

1. Comprehensive Measure of Variability: The standard deviation takes into account
all data points in a dataset, providing a comprehensive assessment of how spread out
they are from the mean. It gives a sense of the average amount of deviation from the
mean.

2. Predictive Power: It allows for making predictions and estimates. If you know the
mean (average) and the standard deviation (a measure of how spread out the data is),
you can make educated guesses about where most of the data points will fall.

3. Mathematical Properties: The standard deviation is used in various statistical
analyses, hypothesis testing, and constructing confidence intervals. It's an essential
component of the normal distribution and is used in inferential statistics.

4. Easy Interpretation and Communication: The concept of standard deviation is
relatively easy to understand and explain to both technical and non-technical
audiences.

5. Basis for Z-Scores: Z-scores are calculated using the standard deviation, providing a
standardized way to measure how far a data point is from the mean. This is useful for
identifying outliers.

Demerits:
1. Sensitivity to Outliers: The standard deviation is highly affected by extreme values
(outliers) in the dataset. A single outlier can greatly increase the standard deviation
and potentially distort its interpretation.

2. Assumption of Normal Distribution: The standard deviation assumes a normal
distribution of data. In cases of non-normal distributions or skewed data, it might not
accurately represent the variability.

3. Not Robust for Skewed Data: In asymmetric distributions, such as highly skewed
data, the standard deviation might not provide an accurate reflection of data spread, as
it's influenced by extreme values.

4. Population vs. Sample: There's a distinction between the population and sample
standard deviations. Using the wrong formula (population vs. sample) can lead to
incorrect results.

5. Lack of Interpretability: While the standard deviation provides a measure of spread,
it doesn't give insight into the shape of the distribution or the arrangement of data
points.

6. Misinterpretation of Magnitude: A larger standard deviation doesn't always mean
that the data is more variable. It depends on the context and scale of the data.

Advantages and disadvantages of measures of variability:

Advantages:

1. Comprehensive Understanding: Measures of variability provide a way to
understand how spread out or dispersed the data points are in a dataset. This
information can be crucial for making accurate interpretations and decisions.

2. Sensitive to Data Variation: These measures capture the differences between data
points, giving you insights into the degree of variability. They help identify whether
the data points are tightly clustered or widely spread.

3. Comparative Analysis: Variability measures allow you to compare different datasets
or subsets within a dataset. This can be helpful in assessing which dataset or subset
has more consistent or less consistent values.

4. Identifying Outliers: High variability measures can signal the presence of outliers,
extreme values that may skew the interpretation of the data. This helps in identifying
data points that might need further investigation.

5. Assessment of Data Quality: In fields like quality control, variability measures help
in monitoring consistency and identifying potential issues in manufacturing processes.
Disadvantages:

1. Sensitivity to Outliers: Many measures of variability, like the range or standard
deviation, are highly sensitive to outliers. A single extreme value can greatly influence
the calculation, leading to potentially misleading results.

2. Limited Information: Measures of variability focus solely on the spread of data and
don't provide information about the shape or pattern of the distribution itself.

3. Dependence on Scale: Some measures, like the standard deviation, are influenced by
the scale of the data. If the data is measured in different units or on different scales,
direct comparisons of variability can be misleading.

4. Distribution Assumptions: Certain measures assume a normal distribution of data for
accurate interpretation. If the data doesn't follow a normal distribution, these measures
might not be appropriate or accurate.

5. Complex Interpretation: Some measures of variability, like the coefficient of
variation, might be more challenging to interpret for non-statisticians.

6. Lack of Context: Measures of variability alone might not give a full picture without
considering the context of the data.
