
Psychology Project

Introduction to Statistics

Uploaded by

dhanalaxmi.r
Copyright © All Rights Reserved

Psychology project

Statistics in psychology
Statistics is a branch of mathematics devoted to the collection, compilation, display, and interpretation of numerical data.

Psychologists rely heavily on statistics to help assess the meaning of their measurements. Sometimes
the measurements involve individuals who complete psychological tests; at other times, the
measurements include statistics that describe the general properties of groups of people or animals. In
psychological testing, the psychologist may interpret test results in light of norms, the typical results
obtained in previous testing. In research, psychologists use two kinds of statistics: descriptive and
inferential.
● According to Garrett [1966], statistics is a branch of mathematics that deals with the analysis of
complex numerical data, that is, data influenced by several causes.
● Quantitative research methods include measuring and counting.
● Qualitative research methods include interviewing and observing.

Statistical analysis in psychology involves collecting and analyzing data to discover patterns
and trends. The experimental process consists of the study design, sample group, variables,
testing, and measurements or research interpretations. Psychologists use statistical analysis to
interpret their data and draw conclusions from it. Ultimately, they want to test whether the data
from an experiment support or contradict their hypothesis, a testable assumption proposed by
the researcher before experimentation. The information from statistical analysis can lead to
predictions, decisions, and discoveries.
Statistical analysis involves the following steps:
★ Collecting numerical data from sample groups.
★ Summarizing the scores obtained from the different sample groups.
★ Analyzing the tabulated scores using appropriate statistical computations.
★ Interpreting the results.
★ Representing the results graphically through bar diagrams, pie charts, line graphs, etc.
★ Drawing a comprehensive conclusion based on all of the calculations and interpretations.

Uses of statistics
Statistics are used in psychology for:
➔ Summarizing and describing large amounts of data.
➔ Comparing individuals or groups of individuals in different ways.
➔ Determining whether certain aspects of behavior are related.
➔ Predicting future behavior from current information.

Types of statistics
In statistics, central tendency is a descriptive summary of a dataset: a single value that reflects the
center of the data distribution. It does not provide information about individual values in the
dataset; instead, it summarizes the dataset as a whole. The central tendency of a dataset is
generally described using one of several statistical measures. There are two types of statistics:
(i) Inferential statistics (ii) Descriptive statistics

(i) Inferential statistics: involves making inferences and predictions about a population based on
a sample of data taken from that population. It aims to draw conclusions that extend beyond the
immediate data, making generalizations or predictions about a larger group based on a smaller
sample. This branch of statistics involves hypothesis testing, estimation, and prediction. Common
techniques in inferential statistics include regression analysis, analysis of variance (ANOVA),
chi-square tests, t-tests, and correlation analysis. These methods help researchers make decisions
or predictions about a population based on limited sample data while accounting for the
uncertainty inherent in sampling.

(ii) Descriptive statistics: used to summarize and describe the basic features of the data in a
study. It provides simple summaries of the sample and the measures derived from it. Descriptive
statistics aim to present the data meaningfully, often in tables, charts, or numerical measures such
as mean, median, mode, standard deviation, range, and percentiles. Descriptive statistics help
researchers understand the characteristics of a dataset, identify patterns, and summarize large
amounts of data efficiently.

Definition of central tendency


Central tendency is a statistical measure that represents an entire distribution or dataset by a
single value. It aims to provide an accurate description of the entire data in the distribution.

Measures of Central Tendency


The central tendency of the dataset can be found using the three important measures namely,
mean, median, and mode.
Mean
The mean represents the average value of the dataset. In general, it refers to the arithmetic mean:
the sum of all the values in a dataset divided by the number of values. It is calculated as:

Mean = (Sum of all values) / (Number of values)

The mean is sensitive to the data's extreme values, also known as outliers. Even a single outlier can
significantly affect the mean, pulling it towards the extreme value.
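The following minimal Python sketch applies the mean formula above to hypothetical test scores. It shows how a single outlier pulls the mean toward the extreme value. The function name `mean` and the data are illustrative only.

```python
def mean(values):
    """Arithmetic mean: sum of all values divided by the number of values."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

# Hypothetical test scores, for illustration only
scores = [12, 14, 15, 15, 16, 18]
with_outlier = scores + [90]  # the same scores plus one extreme value

print(mean(scores))        # 15.0
print(mean(with_outlier))  # about 25.71: one outlier pulls the mean up by over 10 points
```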

Median

The median is the middle value of the dataset when the dataset is arranged in ascending or
descending order. When the dataset contains an even number of values, the median is the mean of
the two middle values. The median is found as follows:
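GROUPED DATA

$$\text{Median} = L_b + \left( \frac{\frac{n}{2} - CF_{<}}{f_M} \right) \times C$$

where $L_b$ is the lower class boundary of the median class, $n$ is the total frequency, $CF_{<}$ is
the cumulative frequency of the classes below the median class, $f_M$ is the frequency of the
median class, and $C$ is the class width.

UNGROUPED DATA

$$\text{Median} = \text{the } \left( \frac{n+1}{2} \right)^{\text{th}} \text{ value of the ordered data}$$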
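Here is a minimal Python sketch of both median calculations. For raw data, it sorts the values and takes the middle value, or the mean of the two middle values. For grouped data, it first finds the median class, which is the first class whose cumulative frequency reaches n/2. It then applies L_b + ((n/2 − CF) / f_M) × C. The function names and the frequency table are hypothetical.

```python
def median_ungrouped(values):
    """Median of raw data: the ((n + 1) / 2)th value of the sorted data."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: the single middle value
    return (data[mid - 1] + data[mid]) / 2  # even n: mean of the two middle values


def median_grouped(lower_boundaries, frequencies, width):
    """Median of grouped data: L_b + ((n/2 - CF) / f_M) * C, with C = class width."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0  # CF: cumulative frequency of the classes below the current one
    for lower, freq in zip(lower_boundaries, frequencies):
        if cumulative + freq >= n / 2:      # first class reaching n/2 is the median class
            return lower + (n / 2 - cumulative) / freq * width
        cumulative += freq
    raise ValueError("median class not found")


print(median_ungrouped([7, 3, 9]))     # 7
print(median_ungrouped([7, 3, 9, 5]))  # 6.0
# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 8, 12, 5
print(median_grouped([0, 10, 20, 30], [5, 8, 12, 5], 10))  # about 21.67
```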

Mode

The mode represents the most frequently occurring value in the dataset. A dataset may contain
multiple modes, and in some cases it has no mode at all.

The mode is useful for categorical data or discrete data with distinct categories. For example, in a
survey asking respondents to choose their favorite color from a list of options, the mode would
represent the most commonly chosen color.
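This small Python sketch finds the mode or modes of categorical or discrete data. It uses `collections.Counter` from the standard library. Conventions for "no mode" vary. This sketch assumes that a dataset in which every value occurs exactly once has no mode. The survey responses are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return every most frequent value in first-seen order; [] means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumption: if every value occurs exactly once (and there is more than one value),
    # the dataset is treated as having no mode.
    if top == 1 and len(counts) > 1:
        return []
    return [value for value, count in counts.items() if count == top]


colors = ["blue", "red", "blue", "green", "red", "yellow"]  # hypothetical survey answers
print(modes(colors))                   # ['blue', 'red'] (two modes)
print(modes(["blue", "red", "blue"]))  # ['blue']
print(modes([3, 8, 5]))                # [] (no mode)
```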

The measure of central tendency is selected based on the properties of the data.

● If you have a symmetrical distribution of continuous data, all three measures of central tendency are
appropriate. Most of the time, however, analysts use the mean because it takes every value in the
distribution or dataset into account.
● If you have a skewed distribution, the median is the best measure of central tendency.
● If you have ordinal data, the median and the mode are the best choices for measuring central tendency.
● If you have categorical data, the mode is the best choice for measuring central tendency.

For ungrouped data, the mode is simply the most common value.


Measure of variability
Variability, also known as dispersion, refers to the extent to which the values in a dataset differ
from the central tendency. Measures of variability describe the spread or scatter of data points
around the central value. Understanding variability is important because it shows how consistent
the data are and what range of values the dataset contains.
There are several common measures of variability, including the range, interquartile range (IQR),
variance, and standard deviation. Each measure has its strengths and weaknesses, and each suits
different situations depending on the characteristics of the data.

Range
The range is the simplest measure of variability and is calculated as the difference between the
maximum and minimum values in a dataset. It provides a quick and easy way to understand the
spread of data, but it is sensitive to outliers and may not accurately represent the variability in the
presence of extreme values.

Formula
Range = Maximum value - Minimum value

Range of Ungrouped Data

Range = Highest value of the data set – The lowest value of the data set

Range of Grouped Data

In the case of a continuous frequency distribution (grouped data), the range is defined as the
difference between the upper class boundary of the highest interval and the lower class boundary
of the lowest interval. It is the simplest measure of dispersion and gives a quick view of the total
spread of the observations. The formula for the range of grouped data is:

Range = Upper-class boundary of the highest interval – Lower-class boundary of the lowest interval
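Both range formulas are shown below as a minimal Python sketch. Grouped data are represented as hypothetical (lower, upper) class-boundary pairs. The function names and values are illustrative only.

```python
def range_ungrouped(values):
    """Range of raw data: highest value minus lowest value."""
    return max(values) - min(values)


def range_grouped(class_boundaries):
    """Range of grouped data from (lower, upper) class boundaries:
    upper boundary of the highest interval minus lower boundary of the lowest."""
    lowest_lower = min(lower for lower, _ in class_boundaries)
    highest_upper = max(upper for _, upper in class_boundaries)
    return highest_upper - lowest_lower


print(range_ungrouped([12, 7, 19, 3, 15]))                         # 16
print(range_grouped([(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]))    # 30.0
```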
Interquartile Range (IQR):

The interquartile range is a measure of statistical dispersion based on dividing a dataset into
quartiles. Quartiles divide the dataset into four equal parts, each containing approximately 25% of
the data. The IQR is the difference between the third quartile (Q3) and the first quartile (Q1),
representing the range of the middle 50% of the data.

IQR = Q3 - Q1

The IQR is less sensitive to outliers compared to the range and provides a more robust measure of
variability, particularly for skewed distributions.
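The following Python sketch compares the IQR with the range when one outlier is present. It assumes Python 3.8+ and uses `statistics.quantiles` with `method="inclusive"` (linear interpolation). Textbooks use several quartile conventions, so a hand calculation may give slightly different Q1 and Q3 values for the same data. The data are hypothetical.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range Q3 - Q1, using interpolated ('inclusive') quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

print(iqr(scores), max(scores) - min(scores))                    # 4.0 8
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))  # 4.0 89
```

The outlier changes the range from 8 to 89 but leaves the IQR unchanged, because Q1 and Q3 depend only on the middle of the ordered data.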

Variance
● Variance is a measure of the variability of data and describes how the data points are
spread out from the mean.

● There are two types of variance: sample variance and population variance.

● Data can be grouped or ungrouped. Thus, we can have grouped sample variance, ungrouped
sample variance, grouped population variance, and ungrouped population variance.

● The variance is the standard deviation squared.

● Covariance, a related measure, describes how two random variables vary together.

Properties
(1) If the variance is zero, every deviation $(a_i - \bar{a})$ is zero, which means each value in the set
is equal to the mean value $\bar{a}$.

(2) If the variance is small, the observations are close to the mean value; if it is large, the
observations deviate far from the mean value.

(3) If each observation is increased by a constant $a$, where $a \in \mathbb{R}$, the variance remains unchanged.

(4) If each observation is multiplied by a constant $a$, where $a \in \mathbb{R}$, the variance is multiplied by $a^2$.

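Formula

$$\sigma^2 = \frac{\sum (X - \mu)^2}{n} \qquad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

where $\sigma^2$ is the population variance and $s^2$ is the sample variance.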
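This minimal Python sketch implements the population variance, which divides by n, and the sample variance, which divides by n − 1. It also checks properties (1), (3), and (4) numerically. The function names and scores are hypothetical.

```python
def population_variance(values):
    """sigma^2: mean of the squared deviations from the mean (divide by n)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """s^2: sum of squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
shifted = [x + 10 for x in data]  # property (3): add a = 10 to every value
scaled = [3 * x for x in data]    # property (4): multiply every value by a = 3

print(population_variance(data))       # 4.0
print(sample_variance(data))           # about 4.571 (32 / 7)
print(population_variance(shifted))    # 4.0, unchanged by the shift
print(population_variance(scaled))     # 36.0, i.e. 3**2 * 4.0
print(population_variance([5, 5, 5]))  # 0.0: every value equals the mean (property 1)
```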

Standard deviation
The standard deviation is the average amount of variability in your dataset. It
tells you, on average, how far each value lies from the mean. A high standard
deviation means that values are generally far from the mean, while a low
standard deviation indicates that values are clustered close to the mean.

Properties
● It cannot be negative.
● It is only used to measure spread or dispersion around the mean of a data set.
● It shows how much variation or dispersion exists from the average value.
● It is sensitive to outliers. A single outlier can raise σ and, in turn, distort the picture of the
spread.
● For data with almost the same mean, the greater the spread, the greater the standard
deviation.
● Standard deviation can be used in conjunction with the mean to calculate data intervals when
analyzing normally distributed data.
● The standard deviation is the square root of the mean of the squared deviations of all values from
the mean of the data set.

● For this reason, the standard deviation is also called the root mean square deviation. The
smallest possible value of the standard deviation is zero.

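Formulas for Standard Deviation

Population Standard Deviation Formula

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{n}}$$

Sample Standard Deviation Formula

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Notations for Standard Deviation

● σ = Population standard deviation
● s = Sample standard deviation
● X = Individual values in the data
● μ = Population mean
● X̄ = Sample mean
● n = Total number of values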
Variance and Standard Deviation
Variance is a measure of data variability around the mean, obtained by squaring the standard
deviation. Standard deviation assesses how the data points deviate from the mean, indicating the
spread of the data. The symbol σ² stands for variance and σ for standard deviation. Variance is
expressed in squared units, while the standard deviation has the same units as the original data.
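The population and sample standard deviation formulas differ only in the divisor: n or n − 1. The following Python sketch computes both by hand and compares them with the standard library's `statistics.pstdev` and `statistics.stdev`. It also confirms that squaring the standard deviation gives the variance. The helper names and data are hypothetical.

```python
import math
import statistics


def population_sd(values):
    """sigma: square root of the mean squared deviation from mu (divide by n)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """s: square root of the squared deviations from x-bar divided by n - 1."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

print(population_sd(data), statistics.pstdev(data))  # both 2.0
print(sample_sd(data), statistics.stdev(data))       # both about 2.138
print(population_sd(data) ** 2)                      # 4.0, the population variance
```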

Mean Deviation Definition


In statistics, the difference between the observed value of a data point and the expected value is
known as deviation. Mean deviation, or mean absolute deviation, is the average absolute deviation
of the data points from the mean, median, or mode of the data set. Mean deviation can be
abbreviated as MAD.
Mean deviation is a form of average absolute deviation, which is defined as the average of the
absolute deviations from a central point of the data. The central point can be the mean, median,
or mode.

Mean Deviation Formula

Depending on the type of data available and the choice of central point, there are several
different formulas for calculating the mean deviation.
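For ungrouped data, the definition above reduces to MD = Σ|xᵢ − c| / n, where c is the chosen central point (mean, median, or mode). The following Python sketch implements that case with the standard library's `statistics.mean`, `median`, and `mode`. In Python 3.8+, `statistics.mode` returns the first mode it encounters when there are several. The function name and data are hypothetical.

```python
from statistics import mean, median, mode


def mean_deviation(values, center="mean"):
    """Average absolute deviation from the chosen central point (ungrouped data)."""
    central_point = {"mean": mean, "median": median, "mode": mode}[center](values)
    return sum(abs(x - central_point) for x in values) / len(values)


data = [1, 1, 2, 6, 10]  # hypothetical scores: mean 4, median 2, mode 1

for center in ("mean", "median", "mode"):
    print(center, mean_deviation(data, center))
# mean 3.2
# median 2.8
# mode 3.0
```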


Quartile deviation
Quartile deviation is useful for summarizing the variability of a dataset around the median,
especially when the data is not normally distributed or contains outliers. It provides insight into the
spread of the middle 50% of the data and complements other measures of dispersion such as the
range, standard deviation, and variance.

For ungrouped data:

● Arrange the dataset in ascending order.


● Calculate the first quartile (Q1), which represents the value below which 25% of the data
falls.
● Calculate the third quartile (Q3), which represents the value below which 75% of the data
falls.
● Compute the interquartile range (IQR) as the difference between Q3 and Q1.

Finally, calculate the quartile deviation as half of the IQR.

The formula for quartile deviation for ungrouped data:

Quartile Deviation = (Q3 - Q1) / 2
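The ungrouped steps above can be sketched in Python as shown below. Quartiles are computed with `statistics.quantiles(..., method="inclusive")` (Python 3.8+), which interpolates between ordered values. This is one of several quartile conventions, so a textbook method may give slightly different results. The data are hypothetical.

```python
from statistics import quantiles


def quartile_deviation(values):
    """Semi-interquartile range: (Q3 - Q1) / 2 for ungrouped data."""
    data = sorted(values)                                 # step 1: arrange in ascending order
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # steps 2-3: Q1 and Q3
    iqr = q3 - q1                                         # step 4: interquartile range
    return iqr / 2                                        # step 5: half of the IQR


print(quartile_deviation([15, 3, 8, 12, 5, 20, 9, 11, 7]))  # 2.5 (Q1 = 7, Q3 = 12)
```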

For grouped data:

● Determine the cumulative frequency distribution for the grouped data.


● Identify the classes containing the quartiles (Q1 and Q3): the first classes whose cumulative
frequencies reach 25% (N/4) and 75% (3N/4) of the total frequency, respectively.

Calculate the quartile deviation using the same formula as for ungrouped data.

Let's denote, for the class containing the quartile being calculated:

★ L: The lower class boundary of the class containing the quartile
★ N: The total frequency of the dataset
★ F: The cumulative frequency of the class preceding that class
★ C: The frequency of the class containing the quartile
★ h: The class interval

Using these notations, the quartiles for grouped data are:

Q1 = L + ((N/4 - F) / C) * h
Q3 = L + ((3N/4 - F) / C) * h

and the quartile deviation is:

Quartile Deviation = (Q3 - Q1) / 2

Each quartile formula starts at the lower boundary of the class containing the quartile. It then moves
into that class by the fraction (N/4 - F) / C of the class interval h, or (3N/4 - F) / C for Q3. This
assumes the observations are spread evenly within the class. L, F, and C are read from the class
containing Q1 when computing Q1, and from the class containing Q3 when computing Q3.
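The following Python sketch handles grouped data. It computes each quartile as L + ((kN/4 − F) / C) × h for k = 1 and k = 3, then takes half their difference. The frequency table and the function names are hypothetical, and the classes are assumed to be contiguous with equal width h.

```python
def grouped_quartile(lower_boundaries, frequencies, h, k):
    """k-th quartile (k = 1 or 3) of grouped data: L + ((k*N/4 - F) / C) * h."""
    n_total = sum(frequencies)                 # N
    if n_total == 0:
        raise ValueError("frequency table is empty")
    target = k * n_total / 4
    cumulative = 0                             # F: cumulative frequency before this class
    for lower, freq in zip(lower_boundaries, frequencies):
        if cumulative + freq >= target:        # first class reaching k*N/4 holds the quartile
            return lower + (target - cumulative) / freq * h  # freq plays the role of C
        cumulative += freq
    raise ValueError("quartile class not found")


def grouped_quartile_deviation(lower_boundaries, frequencies, h):
    """Half the distance between Q3 and Q1."""
    q1 = grouped_quartile(lower_boundaries, frequencies, h, 1)
    q3 = grouped_quartile(lower_boundaries, frequencies, h, 3)
    return (q3 - q1) / 2


# Hypothetical table: classes 10-20, 20-30, 30-40, 40-50, 50-60
lower = [10, 20, 30, 40, 50]
freq = [4, 6, 10, 6, 4]
print(grouped_quartile(lower, freq, 10, 1))         # about 25.83
print(grouped_quartile(lower, freq, 10, 3))         # about 44.17
print(grouped_quartile_deviation(lower, freq, 10))  # about 9.17
```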

z - Score
A z-score (also called a standard score) gives you an idea of how far from the mean a data point
is. More technically, it is a measure of how many standard deviations below or above the
population mean a raw score is.

A z-score can be placed on a normal distribution curve. In a normal distribution, nearly all
z-scores (about 99.7%) fall between -3 standard deviations (the far left of the curve) and +3
standard deviations (the far right of the curve), although more extreme values are possible.
To calculate a z-score, you need to know the population mean μ and the population standard
deviation σ.

Z-scores are a way to compare results to a “normal” population. Results from tests or
surveys have thousands of possible results and units; those results can often seem
meaningless. For example, knowing that someone’s weight is 150 pounds might be good
information, but if you want to compare it to the “average” person’s weight, looking at a
vast table of data can be overwhelming (especially if some weights are recorded in
kilograms). A z-score can tell you where that person’s weight is compared to the
average population’s mean weight.

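The basic z-score formula is:

z = (x - μ) / σ

where x is the raw score, μ is the population mean, and σ is the population standard deviation.
The parentheses matter: the mean is subtracted from the raw score before dividing by σ. When
only sample data are available, the sample mean x̄ and sample standard deviation s take their
place: z = (x - x̄) / s.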
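Here is a minimal Python sketch of the z-score formula. It uses a hypothetical test scale with mean 100 and standard deviation 15, chosen only for illustration. The function name `z_score` is also illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma  # parentheses: subtract the mean first, then divide


print(z_score(130, 100, 15))  # 2.0: two standard deviations above the mean
print(z_score(85, 100, 15))   # -1.0: one standard deviation below the mean
```

Writing `x - mu / sigma` without the parentheses would divide only μ by σ and give a very different number.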