Chapter 1 Data Analysis Making Sense of Data(1)

The document covers key concepts in AP Statistics, including types of variables (categorical and numerical), descriptive analysis of categorical variables, and methods for displaying quantitative data through graphs and numbers. It explains how to analyze data distributions, create visual representations like bar charts and histograms, and calculate measures of central tendency and spread. Additionally, it emphasizes the importance of both graphical and numerical methods in data analysis.

Uploaded by

王一荣
Copyright
© © All Rights Reserved

AP Statistics Part I

ILO(Intended Learning Outcomes)


This section covers…

• The type of variable

• Descriptive analysis of categorical variables

• Display of quantitative variable with Graphs

• Display of quantitative variable with Numbers

• Generality measures of data


1. The type of variable

• Individuals have variables.

• A variable has values (not necessarily numbers).

• The values of a variable can vary from individual to individual.


1. Types of data

Variable
• Quantitative (discrete or continuous)
  Examples: Weight-2 kg; Height-12 cm; Age-3 weeks
• Qualitative (Categorical)
  Examples: Blood type-AB; Ear shape-Round; Favorite food-peanut
1. Types of data

• Categorical data is non-numerical data. It names or describes something
without reference to number or size. A variable that records categorical
data is called a qualitative (categorical) variable.
• Numerical data is data in number form. It can be an amount, a
measurement, a time or a score. A variable that records numerical data is
called a quantitative variable (from the word quantity).
1. Types of data
• Continuous data/variables: data/variables that can take any value
between two given values.

• Discrete data/variables: data/variables that can only take certain
values.

• Both continuous data and discrete data are numerical data.
2. Descriptive analysis of categorical variable

Definition: The distribution of a variable describes what values the
variable takes and how often it takes them.

The distribution of a categorical variable lists the categories and gives either

I. the count/number

II. the percent

III. the probability

of individuals who fall within each category.
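
As a minimal sketch of the definition above, the following Python snippet tabulates the count and percent for each category. The blood-type list and the helper name `categorical_distribution` are illustrative assumptions, not data or code from the slides.

```python
from collections import Counter

# Hypothetical blood types for 10 individuals (illustrative data, not from the slides).
blood_types = ["A", "B", "AB", "O", "O", "A", "O", "B", "O", "A"]

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}

distribution = categorical_distribution(blood_types)
for category, (count, percent) in sorted(distribution.items()):
    print(f"{category:>2}: count={count}, percent={percent:.0f}%")
```

The percents always add up to 100 because every individual falls in exactly one category.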



• Bar charts are normally used to display categorical data.


• Must have
1. a title
2. a number scale or axis
3. the categories displayed
4. bars
Note:
• Bars can be horizontal or vertical.
• A bar chart might not show exact figures.
• The intervals on the number scale are usually equal.
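
The required parts of a bar chart can be illustrated with a plain-text sketch. The counts below and the helper `text_bar_chart` are hypothetical; a plotting library would normally be used instead, but this version needs nothing beyond the standard library.

```python
# Hypothetical counts of favourite foods (illustrative data, not from the slides).
counts = {"peanut": 5, "apple": 3, "carrot": 2}

def text_bar_chart(title, counts):
    """Build a horizontal bar chart as a list of text lines.

    Each line shows the category, one '#' per individual, and the exact count.
    """
    width = max(len(category) for category in counts)
    lines = [title]
    for category, count in counts.items():
        lines.append(f"{category.ljust(width)} | {'#' * count} {count}")
    return lines

chart = text_bar_chart("Favourite food", counts)
print("\n".join(chart))
```

Printing the count after each bar is a design choice that works around the note that a bar chart might not show exact figures.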
2. Descriptive analysis of categorical variable-Practice

Pie charts
• A pie chart is a circular chart which uses slices or sectors of the circle to
show categorical data.
• Must have
1. a title
2. a number or percent for each slice
3. the categories displayed
4. slices (sectors) that together make up the whole pie
2. Descriptive analysis of categorical variable

• A two-way table describes two categorical variables


2. Descriptive analysis of categorical variable

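Marginal distribution

• The marginal distribution of one of the categorical variables in a two-way
table is the distribution of values of that variable among all individuals
described by the table.

• For example, the percent of these young adults who think they are almost
certain to be rich by age 30 is the total count for that response divided by
the total number of young adults in the table.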
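
A marginal distribution can be computed from row totals, as in this minimal sketch. The two-way table is made up for illustration (the slide's actual table is not reproduced in the text), and `marginal_distribution` is a hypothetical helper name.

```python
# Hypothetical two-way table of counts: rows are opinions, columns are genders.
# These numbers are invented for illustration only.
table = {
    "Almost certain": {"Female": 20, "Male": 30},
    "Some chance":    {"Female": 50, "Male": 40},
    "No chance":      {"Female": 30, "Male": 30},
}

def marginal_distribution(table):
    """Percent of all individuals in the table that fall in each row category."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    grand_total = sum(row_totals.values())
    return {row: 100 * total / grand_total for row, total in row_totals.items()}

marginal = marginal_distribution(table)
for opinion, percent in marginal.items():
    print(f"{opinion}: {percent:.1f}%")
```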
2. Descriptive analysis of categorical variable

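Conditional Distribution

• A conditional distribution describes the values of one variable among only
those individuals who have a specific value of the other variable.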
2. Descriptive analysis of categorical variable
Relationships Between Categorical Variables
We could have used a segmented bar graph to compare the distributions
of male and female responses in the previous example.

We say that there is an association between two variables if knowing the
value of one variable helps predict the value of the other.
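
One way to look for an association is to compare the conditional distributions of one variable for each value of the other. The sketch below does this for a made-up table; the counts and the helper `conditional_distribution` are assumptions for illustration only.

```python
# Hypothetical two-way table of counts: rows are opinions, columns are genders.
table = {
    "Almost certain": {"Female": 20, "Male": 30},
    "Some chance":    {"Female": 50, "Male": 40},
    "No chance":      {"Female": 30, "Male": 30},
}

def conditional_distribution(table, column):
    """Percent in each row category among individuals in the given column only."""
    column_total = sum(cells[column] for cells in table.values())
    return {row: 100 * cells[column] / column_total for row, cells in table.items()}

female = conditional_distribution(table, "Female")
male = conditional_distribution(table, "Male")
for opinion in table:
    print(f"{opinion}: Female {female[opinion]:.0f}%, Male {male[opinion]:.0f}%")
```

If the two conditional distributions were identical, knowing the gender would not help predict the opinion. In this invented table they differ, which would suggest an association.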


2. Descriptive analysis of categorical variable

I. Find the conditional distributions of superpower preference among
students from the United Kingdom and the United States.

II. Make an appropriate graph to compare the conditional distributions.

III. Is there an association between country of origin and superpower
preference? Give appropriate evidence to support your answer.
2. Descriptive analysis of categorical variable-Review

• The distribution of a categorical variable lists the categories and gives the count
(frequency) or percent (relative frequency) of individuals in each category.

• Pie charts and bar graphs display the distribution of a categorical variable

• A two-way table of counts organizes data about two categorical variables measured
for the same set of individuals.

• Marginal and conditional distributions are often used to analyze the
relationship between two categorical variables.

• There is an association between two variables if knowing the value of one variable
helps predict the value of the other.
2. Descriptive analysis of categorical variable -Exercise
3. Display of quantitative data with Graphs

Dotplots and stemplots

• Here are data on the number of goals scored by the team in the 12
months prior to the 2012 Olympics.
3. Display of quantitative data with Graphs

This stem-and-leaf diagram shows the number of employees at 20 companies.

What is the most common number of employees?

How many of the companies have fewer than 25 employees?


2 Representation of discrete data: stem-and-leaf diagrams
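
A stem-and-leaf diagram can be built by splitting each value into a stem (tens digit) and a leaf (units digit). The sketch below uses invented employee counts, since the slide's diagram is not reproduced in the text, and `stemplot` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical employee counts for 20 companies (illustrative data only).
employees = [12, 15, 18, 21, 21, 23, 24, 25, 25, 25,
             27, 28, 30, 31, 33, 34, 36, 40, 42, 47]

def stemplot(values):
    """Return stem-and-leaf rows, with stem = tens digit and leaf = units digit."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]

rows = stemplot(employees)
print("\n".join(rows))
print("Key: 2 | 1 means 21 employees")

# Answering the second question for this invented data set.
fewer_than_25 = sum(1 for value in employees if value < 25)
print("Companies with fewer than 25 employees:", fewer_than_25)
```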
3. Display of quantitative data with Graphs
3. Display of quantitative data with Graphs

Histograms are for quantitative data; bar graphs are for categorical data.
3. Display of quantitative data with Graphs

A histogram must have:

1. lower and upper class boundaries

2. class widths (intervals)

3. frequency or frequency density
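
When class widths are unequal, the height of each histogram bar is usually the frequency density, that is, the frequency divided by the class width. A minimal sketch with invented classes (the helper name `frequency_densities` is an assumption):

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 8)]

def frequency_densities(classes):
    """Frequency density = frequency / class width, one value per class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(classes)
for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency={freq}, density={density}")
```

The last class has a higher frequency than the first but a lower density, because it is twice as wide.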
3. Display of quantitative data with Graphs

• In any graph, look for the overall pattern and for striking departures
from that pattern.

• You can describe the overall pattern of a distribution by its shape,
center, and spread.

• An important kind of departure is an outlier, an individual value that
falls outside the overall pattern.
3. Display of quantitative data with Graphs

1. Central tendency
2. Spread
3. Shape
4. Outliers
3. Display of quantitative data with Graphs

1. Central tendency
3. Display of quantitative data with Graphs

2. Spread
3. Display of quantitative data with Graphs

Shape:

• From the graph only: symmetry and mode

• Symmetry: symmetric, skewed

• Mode: Unimodal, Bimodal, multimodal


3. Display of quantitative data with Graphs

4. Outliers

High/low outliers
3. Display of quantitative data with Graphs

1. Central tendency
2. Spread
3. Shape
4. Outliers
3. Display of quantitative data with Graphs

Comparing two distributions

1. Central tendency

2. Spread

3. Shape

4. Outliers

Compare the distributions of household size for these two countries.
3. Display of quantitative data with Graphs

Compare central tendency

E.g. Household sizes for the South African students tended to be larger
than for the U.K. students. The midpoints of the household sizes for the
two groups are 6 people and 4 people, respectively.
3. Display of quantitative data with Graphs
Compare spread

From the graph: compare the range, then the IQR. With numbers, compare the
variance (Var) or the standard deviation (SD).

E.g. The household sizes for the South African students vary more (from 3
to 26 people) than for the U.K. students (from 2 to 6 people).
3. Display of quantitative data with Graphs

Compare shape:

• From the graph only: symmetry and mode

E.g. The distribution of household size for the U.K. sample is roughly
symmetric and unimodal, while the distribution for the South African
sample is skewed to the right and unimodal.
3. Display of quantitative data with Graphs

Compare outliers

Describing the outliers is enough.

E.g. There don’t appear to be any outliers in the U.K. distribution. The
South African distribution seems to have two outliers in the right tail of
the distribution: students who reported living in households with 15 and
26 people.
4. Display of quantitative data with Numbers

• Exploring the exact features of a variable requires not only graphs
but also numbers.

4.1 Measurement of central tendency

4.2 Measurement of deviation tendency

4.3 Measurement of relative position


4. Display of quantitative data with Numbers

4.1 Measurement of central tendency


• An average is a single value used to represent a set of data.
• Three types of average, the mode, the median and the mean, are
used to summarise a collection of data.
4. Display of quantitative data with Numbers

For the data:

4 6 6 6 7 14 8

Mode = 6 (the most frequent value)

Mean = (4 + 6 + 6 + 6 + 7 + 14 + 8) / 7 = 51/7 ≈ 7.29 (the numerical average)

Median = 6 (6 is the middle value of the ordered data 4 6 6 6 7 8 14)
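
The three averages for this data set can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics

data = [4, 6, 6, 6, 7, 14, 8]  # the data set from the slide

mode = statistics.mode(data)      # most frequent value
median = statistics.median(data)  # middle value of the ordered data
mean = statistics.mean(data)      # sum of the values divided by how many there are

print("Mode:", mode)
print("Median:", median)
print("Mean:", round(mean, 2))
```

The mean is pulled above the median here by the single large value 14.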


4. Display of quantitative data with Numbers

$$\text{Mean} = E(X) = \frac{\sum x f}{\sum f}\ \text{(grouped data)} = \frac{\sum x p}{\sum p}\ \text{(random data)}$$
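
Both forms of the mean formula are the same weighted average, with either frequencies or probabilities as the weights. A minimal sketch, using invented tables and a hypothetical helper `weighted_mean`:

```python
def weighted_mean(pairs):
    """Mean of values x with weights w: sum(x * w) / sum(w).

    The weights can be frequencies (grouped data) or probabilities.
    """
    total_weight = sum(weight for _, weight in pairs)
    return sum(x * weight for x, weight in pairs) / total_weight

# Hypothetical frequency table: (value, frequency).
frequency_table = [(1, 3), (2, 5), (3, 2)]

# Hypothetical probability distribution: (value, probability).
probability_table = [(0, 0.25), (1, 0.5), (2, 0.25)]

grouped_mean = weighted_mean(frequency_table)
expected_value = weighted_mean(probability_table)
print("Mean from frequencies:", grouped_mean)
print("Mean from probabilities:", expected_value)
```

When the probabilities already sum to 1, the denominator has no effect and the formula reduces to the sum of x times p.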
4. Display of quantitative data with Numbers

4.2 Measurement of deviation tendency

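• Range: maximum − minimum

[Mean: $\bar{x} = \frac{\sum x_i}{n}$]

• Variance (sample): $s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$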
4. Display of quantitative data with Numbers
4. Display of quantitative data with Numbers

Many calculators report two standard deviations.

One is usually labeled $\sigma_x$, the symbol for the standard deviation of a
population. This standard deviation is calculated by dividing the sum of
squared deviations by $n$ instead of $n - 1$ before taking the square root.

If your data set consists of the entire population, then it’s appropriate to
use $\sigma_x$. Most often, the data we’re examining come from a sample.

In that case, we should use $s_x$.
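
The difference between the two standard deviations is only the divisor. The sketch below computes both by hand for the earlier data set and cross-checks them against `statistics.pstdev` (population) and `statistics.stdev` (sample). The helper `standard_deviation` is a hypothetical name.

```python
import math
import statistics

data = [4, 6, 6, 6, 7, 14, 8]  # the data set used earlier in the slides

def standard_deviation(values, population=False):
    """Square root of the sum of squared deviations divided by n or n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    divisor = n if population else n - 1
    return math.sqrt(sum_of_squares / divisor)

population_sd = standard_deviation(data, population=True)
sample_sd = standard_deviation(data)

print("Population SD:", round(population_sd, 3), "vs", round(statistics.pstdev(data), 3))
print("Sample SD:    ", round(sample_sd, 3), "vs", round(statistics.stdev(data), 3))
```

Dividing by the smaller number always makes the sample standard deviation slightly larger than the population one for the same data.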


4. Display of quantitative data with Numbers

4.3 Measurement of relative position

• Percentiles

• Z-scores
4. Display of quantitative data with Numbers

• The median is a special example of a percentile. It is placed exactly
halfway through a list of ordered data so that 50% of the data is
smaller than the median.
• The tenth percentile, for example, would lie such that 10% of the data
is smaller than its value.
• The 75th percentile would lie such that 75% of the values are smaller
than its value.
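
The two measures of relative position can be sketched as follows. The scores are invented, and both helper names are assumptions. The slides list z-scores without giving a formula, so the sketch assumes the standard definition, (value − mean) / standard deviation, with the sample standard deviation.

```python
import statistics

# Hypothetical test scores (illustrative data, not from the slides).
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

def percent_below(values, x):
    """Percent of the data that is smaller than x (x's percentile in this sense)."""
    return 100 * sum(1 for v in values if v < x) / len(values)

def z_score(values, x):
    """Number of standard deviations x lies above (+) or below (-) the mean.

    Assumption: the sample standard deviation is used.
    """
    return (x - statistics.mean(values)) / statistics.stdev(values)

print("Percent of scores below 85:", percent_below(scores, 85))
print("z-score of 85:", round(z_score(scores, 85), 2))
```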
4. Display of quantitative data with Numbers

• Quartiles
• Two very important percentiles are the lower and upper quartiles.
These lie 25% and 75% of the way through the data respectively.
• If the position does not turn out to be a whole number, you simply
find the mean of the pair of numbers on either side.
4. Display of quantitative data with Numbers

• For each of the following sets of data calculate the median, upper and
lower quartiles.
• In each case calculate the interquartile range.
I. 13 12 8 6 11 14 8 5 1 10 16 12
II. 14 10 8 19 15 14 9
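
Answers to this exercise can be checked with a short script. The sketch below assumes one common convention: the quartiles are the medians of the lower and upper halves of the ordered data, leaving out the middle value when the number of observations is odd. Other conventions (including some calculator and software defaults) can give slightly different quartiles, and `quartiles` is a hypothetical helper name.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

set_1 = [13, 12, 8, 6, 11, 14, 8, 5, 1, 10, 16, 12]
set_2 = [14, 10, 8, 19, 15, 14, 9]

for name, data in (("I", set_1), ("II", set_2)):
    q1, median, q3 = quartiles(data)
    print(f"Set {name}: Q1={q1}, median={median}, Q3={q3}, IQR={q3 - q1}")
```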
4. Display of quantitative data with Numbers

• As with the range, the interquartile range gives a measure of how spread
out or consistent the data is.

• If one set of data has a smaller IQR than another set, then the first set is
more consistent and less spread out. This can be a useful comparison tool.
The 1.5 × IQR rule for outliers
Call an observation an outlier if it falls more than 1.5 × IQR above the third
quartile or below the first quartile
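
The 1.5 × IQR rule can be sketched as follows. The household sizes are invented, the quartiles use the median-of-halves convention (an assumption, since conventions differ), and both helper names are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(upper)

def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical household sizes (illustrative data, not from the slides).
household_sizes = [2, 3, 4, 4, 5, 5, 6, 6, 7, 26]

outliers = iqr_outliers(household_sizes)
print("Outliers:", outliers)
```

Here Q1 = 4 and Q3 = 6, so the fences are 1 and 9, and only the household of 26 people is flagged.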
4. Display of quantitative data with Numbers

• Numerical methods are more accurate, specific and complicated.

• Graphical methods are more straightforward, general and comparable.

• Numerical summaries do not fully describe the shape of a distribution.

• Always plot your data.


End of chapter review

• Try to explain the following terms in your own words. You may use
description, examples or comparison.

Individual          Variable                Value
Quartile            Mode                    Median
IQR                 Outliers                Positively skewed
Frequency density   Marginal distribution   Conditional distribution
Association         Class interval          Standard deviation
Unimodal            Sample                  Population
