
Central Tendency Dispersion Visualization

The document provides an overview of statistical concepts, focusing on central tendency, measures of dispersion, data visualization, and hypothesis testing. It explains key measures such as mean, median, mode, range, variance, and standard deviation, along with various data visualization techniques like bar charts and scatter plots. Additionally, it covers hypothesis testing methods including T-Test, Chi-Square Test, and One-Way ANOVA, detailing their assumptions and examples.

Uploaded by

krishna lasune

Central Tendency, Measures of Dispersion & Data Visualization
An Overview of Statistical Concepts
Central Tendency
Central Tendency refers to a single value that
represents the entire data set.
The three main measures are:
- Mean
- Median
- Mode
Mean (Average)
• The sum of all values divided by the number of
values.
• Formula: Mean = ΣX / N
• Sensitive to extreme values (outliers).
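As a small illustration of the formula and of the outlier sensitivity noted above, the sketch below uses Python's standard `statistics` module on made-up scores (the data and variable names are assumptions for illustration only).

```python
from statistics import mean

scores = [70, 72, 75, 78, 80]      # made-up sample
with_outlier = scores + [200]      # same data plus one extreme value

mean_plain = mean(scores)          # sum of values / number of values = 375 / 5
mean_outlier = mean(with_outlier)  # 575 / 6, pulled upward by the single outlier

print(mean_plain, round(mean_outlier, 2))
```

One extreme value shifts the mean by more than 20 points here, even though the other five values are unchanged.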
Median
• The middle value when data is arranged in
order.
• If odd number of observations: Middle value.
• If even number of observations: Average of
the two middle values.
• Robust to extreme values: a single outlier barely moves it, unlike the mean.
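The odd and even cases can be sketched with the standard `statistics.median` function; the lists below are made-up examples.

```python
from statistics import median

odd_count = [3, 1, 7, 5, 9]    # sorted: 1, 3, 5, 7, 9 -> middle value
even_count = [3, 1, 7, 5]      # sorted: 1, 3, 5, 7 -> average of 3 and 5
with_outlier = [3, 1, 7, 5, 900]

median_odd = median(odd_count)
median_even = median(even_count)
median_outlier = median(with_outlier)  # the extreme value 900 does not move the middle

print(median_odd, median_even, median_outlier)
```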
Mode
• The most frequently occurring value in a data
set.
• A data set can have one mode (unimodal), two
modes (bimodal), or more.
• Useful for categorical data.
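A brief sketch of finding modes, assuming Python 3.8 or later for `statistics.multimode`; the color and size data are invented for illustration.

```python
from statistics import mode, multimode

colors = ["red", "blue", "red", "green", "blue", "red"]  # categorical data
sizes = [1, 2, 2, 3, 3, 4]                               # two values tie for most frequent

top_color = mode(colors)       # single most frequent category
size_modes = multimode(sizes)  # list of all most frequent values (bimodal here)

print(top_color, size_modes)
```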
Measures of Dispersion
Dispersion measures how spread out the data is.
The key measures are:
- Range
- Variance
- Standard Deviation
Range
• Difference between the highest and lowest
values.
• Formula: Range = Max value - Min value.
• A simple measure but affected by outliers.
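The range formula is simple enough to compute directly; the sketch below also shows how a single outlier inflates it (values are made up).

```python
values = [12, 15, 11, 19, 14]
with_outlier = values + [95]

value_range = max(values) - min(values)                  # 19 - 11
outlier_range = max(with_outlier) - min(with_outlier)    # 95 - 11

print(value_range, outlier_range)
```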
Variance
• Measures the average squared deviation from
the mean.
• Formula:
Population Variance (σ²) = Σ(X - Mean)² / N
Sample Variance (s²) = Σ(X - Mean)² / (N-1)
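The two formulas differ only in the denominator. A minimal sketch on made-up data, computing both by hand and checking them against `statistics.pvariance` (divides by N) and `statistics.variance` (divides by N-1):

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values with mean 5
n = len(data)
m = mean(data)

sum_sq_dev = sum((x - m) ** 2 for x in data)  # Σ(X - Mean)²
population_var = sum_sq_dev / n               # treat data as the whole population
sample_var = sum_sq_dev / (n - 1)             # treat data as a sample

print(population_var, round(sample_var, 3))
print(pvariance(data), round(variance(data), 3))  # library equivalents
```

The sample variance is always slightly larger, because dividing by N-1 compensates for estimating the mean from the same data.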
Standard Deviation
• Square root of the variance.
• Indicates the spread of data around the mean.
• Formula: Standard Deviation (σ) = √Variance.
• A lower value indicates less variability.
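To illustrate that a lower standard deviation means less variability, the sketch below compares two invented data sets that share the same mean but differ in spread.

```python
from statistics import mean, pstdev

tight = [9, 10, 10, 11]   # values close to the mean
wide = [2, 8, 12, 18]     # values far from the mean

# Both data sets have mean 10, but very different spread.
sd_tight = pstdev(tight)  # population standard deviation = sqrt(population variance)
sd_wide = pstdev(wide)

print(mean(tight), mean(wide))
print(round(sd_tight, 3), round(sd_wide, 3))
```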
Data Visualization
Data visualization helps in understanding
patterns and trends.
Common types:
- Bar Chart
- Pie Chart
- Histogram
- Scatter Plot
Bar Chart
• Represents categorical data using bars.
• The length of each bar is proportional to the
value it represents.
• Useful for comparing different categories.
Pie Chart
• Represents data as a circular graph divided
into sectors.
• Each sector represents a proportion of the
whole.
• Best for showing percentage distributions.
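The proportions behind a pie chart can be worked out before any plotting. A minimal sketch, with invented category totals, that converts each value to a percentage and a sector angle:

```python
sales = {"Product A": 50, "Product B": 30, "Product C": 20}  # made-up totals
total = sum(sales.values())

# Each sector's share of the whole, as a percentage and as an angle in degrees.
percentages = {name: 100 * value / total for name, value in sales.items()}
angles = {name: 360 * value / total for name, value in sales.items()}

for name in sales:
    print(f"{name}: {percentages[name]:.0f}% -> {angles[name]:.0f} degrees")
```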
Histogram
• Represents the distribution of numerical data.
• Similar to a bar chart but used for continuous
data.
• Shows frequency of data within intervals.
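Counting values per interval is the core of a histogram. The sketch below bins made-up heights into intervals of width 10 and prints a text-only histogram; the bin width and data are assumptions chosen for illustration.

```python
from collections import Counter

heights = [151, 158, 162, 165, 167, 171, 174, 178, 183, 189]  # made-up, in cm
bin_width = 10

# Map each value to the lower edge of its interval, then count per interval.
counts = Counter((h // bin_width) * bin_width for h in heights)

for start in sorted(counts):
    print(f"{start}-{start + bin_width - 1}: {'#' * counts[start]}")
```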
Scatter Plot
• Displays relationships between two numerical
variables.
• Each point represents an observation.
• Helps in identifying correlations.
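The correlation that a scatter plot suggests visually can be quantified with the Pearson correlation coefficient. The helper below is a hand-written sketch (the function name `pearson_r` and the data are illustrative, not from the slides).

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    co_deviation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

hours_studied = [1, 2, 3, 4, 5]        # made-up observations
test_scores = [52, 55, 61, 64, 70]

r = pearson_r(hours_studied, test_scores)
print(round(r, 3))  # close to +1: strong positive linear relationship
```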
Hypothesis Testing

T-Test, Chi-Square Test, and One-Way ANOVA
Introduction to Hypothesis Testing
Hypothesis testing is a statistical method used to
determine whether there is enough evidence to
reject a null hypothesis.

Common tests include:


- T-Test
- Chi-Square Test
- One-Way ANOVA
T-Test
A T-Test is used to compare the means of two
groups to see if they are significantly different.

Assumptions:
- Data should be normally distributed
- Independent observations

Example: Comparing the average test scores of two classes.
t-test
• The t-test is about means: it evaluates a difference in means against the
distribution of the test statistic.
• The t-distribution is derived from the normal distribution.
• Its shape depends on sample size; as the sample size grows, it approaches
the normal distribution.
• The t-distribution is based on sample size and varies according to the
degrees of freedom.
One Sample T-Test Example
Hypothesis:
H0: The population mean is equal to a specified value.
H1: The population mean is not equal to a specified
value.

Example: Testing if the average IQ of students is 100.

Interpretation:
If p-value < 0.05, we reject H0.
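The IQ example can be sketched by hand with the standard library. The scores are invented, and instead of computing a p-value the sketch compares |t| with a two-tailed critical value taken from a standard t table (2.262 for alpha = 0.05 and 9 degrees of freedom), which leads to the same decision as checking p < 0.05.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the N-1 denominator
    return (mean(sample) - mu0) / standard_error, n - 1

iq_scores = [96, 104, 110, 99, 108, 112, 95, 103, 107, 106]  # made-up data
t_stat, df = one_sample_t(iq_scores, mu0=100)

T_CRITICAL = 2.262  # two-tailed, alpha = 0.05, df = 9 (from a t table)
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these numbers the sample mean is 104, but the evidence is not strong enough to reject H0 at the 5% level.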
Review 6 Steps for Significance Testing
1. Set alpha (p level).
2. State the hypotheses, Null and Alternative.
3. Calculate the test statistic (sample value).
4. Find the critical value of the statistic.
5. State the decision rule.
6. State the conclusion.
What is the t-test
• The t test is a useful technique for comparing the mean values of
two sets of numbers.
• The comparison provides a statistic for evaluating whether the
difference between two means is statistically significant.
• The t test is named after its inventor, William Sealy Gosset, who
published under the pseudonym "Student".
• The t test can be used either:
1. to compare two independent groups (independent-samples t
test)
2. to compare observations from two measurement occasions for
the same group (paired-samples t test).
What is the t-test
• The null hypothesis states that any difference
between the two means is the result of chance
(sampling variation).
• Remember, under the null hypothesis both samples
are drawn randomly from the same population.
• The test asks how likely a difference this large
would be if it arose from sampling variation alone.
• If both samples came from the same population, their
distributions should be equal apart from chance.
What is the t-test
• Then, what we intend:
"To find out whether the difference is due to chance"
• Logically, the larger the difference in means, the more likely
we are to find a significant t test.
• But, recall:
1. Variability
Less variability = less overlap between groups = difference easier to detect
2. Sample size
Larger sample size = less variability in the sample means = difference easier to detect
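How variability and sample size change the t statistic for the same difference in means can be sketched as follows. The helper `t_from_summary` is hypothetical and assumes two groups of equal size n that share the same standard deviation.

```python
from math import sqrt

def t_from_summary(mean_diff, sd, n):
    """t statistic for two equal-sized groups with a common standard deviation."""
    standard_error = sd * sqrt(2 / n)  # standard error of the difference in means
    return mean_diff / standard_error

t_baseline = t_from_summary(mean_diff=5, sd=10, n=20)
t_less_variable = t_from_summary(mean_diff=5, sd=5, n=20)  # half the variability
t_larger_sample = t_from_summary(mean_diff=5, sd=10, n=80) # four times the sample

print(round(t_baseline, 3), round(t_less_variable, 3), round(t_larger_sample, 3))
```

The difference in means is 5 in all three cases; only the reduced variability or the larger sample makes the t statistic bigger.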
Types
1. The independent-sample t test is used to compare two groups' scores
on the same variable. For example, it could be used to compare the
salaries of dentists and physicians to evaluate whether there is a
difference in their salaries.

2. The paired-sample t test is used to compare the means of two variables
within a single group. For example, it could be used to see if there is a
statistically significant difference between starting salaries and current
salaries among the general physicians in an organization.
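Both types can be sketched by hand with the standard library. The salary figures (in thousands) are invented, the function names are illustrative, and the independent version assumes equal variances (pooled variance).

```python
from math import sqrt
from statistics import mean, stdev, variance

def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / standard_error, na + nb - 2

def paired_t(first, second):
    """t statistic and degrees of freedom for paired measurements on the same subjects."""
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1

dentists = [120, 135, 128, 140, 132]     # made-up salaries, group 1
physicians = [150, 145, 160, 155, 148]   # made-up salaries, group 2
t_ind, df_ind = independent_t(dentists, physicians)

starting = [50, 55, 60, 52, 58]          # same five people, first measurement
current = [56, 59, 67, 55, 66]           # same five people, second measurement
t_pair, df_pair = paired_t(starting, current)

print(f"independent: t = {t_ind:.3f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.3f}, df = {df_pair}")
```

The paired version works on the per-person differences, so it has n-1 degrees of freedom rather than n1+n2-2.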
Assumption
1. The dependent variable should be continuous
(interval or ratio scale).
2. The groups should be randomly drawn from
normally distributed and independent
populations,
e.g. Male vs. Female
Dentist vs. Physician
Manager vs. Staff
No overlap: each subject belongs to only one group.
Assumption
3. The independent variable is categorical with two levels.
4. The dependent variable is normally distributed within each of the two groups.
5. Equal variance (homogeneity of variance).
6. Large variation within groups makes a significant t test less likely. Failing to
reject a null hypothesis that is actually false is a Type II error, a threat to power.
In the courtroom analogy, this is like letting a guilty person go free for lack of
evidence (sending an innocent person to jail would be a Type I error).
Chi-Square Test
Chi-Square test is used to determine if there is an
association between two categorical variables.

Assumptions:
- Data should be categorical
- Expected frequency in each cell should be at least 5

Example: Checking if gender and preference for a product are related.
Chi-Square Test of Independence Example

Hypothesis:
H0: There is no association between the two categorical
variables.
H1: There is an association between the two categorical
variables.

Example: Analyzing the relationship between education level and voting preference.

Interpretation:
If p-value < 0.05, we reject H0.
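A test of independence can be sketched by hand for a small contingency table. The counts are invented, the function name is illustrative, and no continuity correction is applied. The p-value line uses a closed form that holds only for 1 degree of freedom (a 2x2 table); larger tables would need a chi-square table or a statistics library.

```python
from math import erfc, sqrt

def chi_square_independence(observed):
    """Return (statistic, degrees of freedom, expected counts) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under H0 = row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected

# Rows: two groups; columns: prefers product / does not (made-up counts).
observed = [[30, 20], [20, 30]]
chi2, df, expected = chi_square_independence(observed)

p_value = erfc(sqrt(chi2 / 2))  # exact only when df == 1
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```

All expected counts here are 25, so the "at least 5 per cell" assumption is met, and the p-value falls just under 0.05.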
One-Way ANOVA
ANOVA (Analysis of Variance) is used to compare means
across three or more groups.

Assumptions:
- Independent samples
- Normal distribution
- Homogeneity of variance

Example: Comparing the average salaries of employees in three different companies.
One-Way ANOVA Example
Hypothesis:
H0: All group means are equal.
H1: At least one group mean is different.

Example: Testing if three different diets result in different weight loss.

Interpretation:
If p-value < 0.05, we reject H0.
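The diet example can be sketched by computing the F statistic by hand. The weight-loss figures are invented and the function name is illustrative. Instead of a p-value, the sketch compares F with the critical value from a standard F table (5.14 for alpha = 0.05 with 2 and 6 degrees of freedom), which gives the same decision as checking p < 0.05.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

diet_a = [2, 3, 4]   # made-up weight loss in kg
diet_b = [4, 5, 6]
diet_c = [6, 7, 8]
f_stat, df_b, df_w = one_way_anova_f(diet_a, diet_b, diet_c)

F_CRITICAL = 5.14    # alpha = 0.05, df = (2, 6), from an F table
reject_h0 = f_stat > F_CRITICAL
print(f"F = {f_stat:.2f}, df = ({df_b}, {df_w}), reject H0: {reject_h0}")
```

A significant F only says that at least one group mean differs; it does not say which one.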
