CHAPTER ONE: Descriptive Statistics (univariate)

Uploaded by nila.vishwas

CHAPTER ONE: Introduction

What is statistics? There are two meanings, or uses, of the word:

I) Descriptive Statistics
methods for organizing and summarizing (often large amounts of) information.
The purpose is to represent a large data set (i.e. a bunch of numbers) in a clear and efficient manner.
Examples of descriptive statistics:
o your SAT score(s)
o the Dow Jones Industrial Average
o a baseball box score
o pie charts showing political polling results

II) Inferential Statistics

methods for drawing and measuring the reliability of conclusions about a population based on information
from a sample
Examples of inferential statistics:
o confidence intervals
o hypothesis tests
o regression analysis

Definitions
The entire body of people or things that you wish to investigate is called the population. The subset of the population
that you directly observe and examine (from which information is recorded) is called the sample.

The first step in a statistical study is identifying the population, and that depends entirely on the question you’d like to
answer. (Information obtained from the entire population is called a census.) It is often the case that only some
members of the population can be examined. (Why?)

Examples
1. Suppose that you would like to know the average SAT score of all college freshmen in the United States.
The population: All college freshmen in the United States
A sample: 500 college freshmen, picked across various campuses

2. You would like to know who is favored in an upcoming congressional election in a New Jersey district.
The population: All registered voters (perhaps only those who “intend to vote”) in that district.
A sample: 1000 computer-selected voters

3. Does a newly developed drug work?

The population: All people who would ever take that new drug.
A sample: 200 people who agree to participate in a study to test the effects of the drug.

Why take samples? It is often the case that the population is much too big for you to gather information from every
single member. When using samples we must understand that there is no guarantee that our conclusions are 100%
correct. Ideally, you would like to use a sample that is “perfectly representative” of the population from which it
came, but there is no such guarantee and one cannot tell how representative it is simply by looking at the sample.
Instead, we must focus on the method of sampling.

Most sampling procedures involve random selection in some way: the people or things are selected blindly,
or by an outside process that cannot be predicted.
The most common sampling methods are:
1. Simple random sampling
2. Stratified random sampling
3. Cluster sampling

Definition
Simple random sampling is the sampling procedure that assigns equal likelihood to every possible sample of a given
size from the population.

Example
If two different students are to be selected from a class of 25, then a simple random sampling procedure would assign
equal likelihood to each of the 300 possible pairs of students. (We’ll see where “300” comes from later.)
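A minimal Python sketch of the idea, assuming the class is represented by hypothetical student labels 1 to 25. `math.comb(25, 2)` counts the unordered pairs of students, and `random.sample` picks one pair in such a way that every possible pair is equally likely, which is exactly what simple random sampling requires.

```python
import math
import random

# Hypothetical class roster: students labeled 1 through 25.
students = list(range(1, 26))

# Number of distinct (unordered) pairs of students: "25 choose 2".
num_pairs = math.comb(len(students), 2)
print(num_pairs)  # 300

# random.sample draws without replacement, and every 2-element subset
# of the roster is equally likely to be the one selected.
rng = random.Random(0)  # fixed seed only so the run is reproducible
pair = rng.sample(students, 2)
print(pair)
```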

If the sample is to be obtained one-by-one, random sampling requires that all individual members of the population
have the same chance of being included in the sample at all times in the selection process. There are two main types of sampling:

Sampling with replacement

each chosen person or thing is returned to the population to possibly be chosen again

Sampling without replacement

each chosen person or thing is removed from the population and cannot be chosen again

While most studies involve sampling without replacement, probabilities are more easily derived from samples chosen
with replacement. Fortunately, for most problems of interest, there isn’t much of a difference between “with
replacement” and “without replacement” as far as the actual probabilities are concerned. (More on this later.)
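To make the two schemes concrete, here is a small Python sketch with a hypothetical population of ten labeled items. `random.choices` samples with replacement, so an item can appear more than once; `random.sample` samples without replacement, so every pick is distinct. The last lines compare, for one particular member, the chance of landing in the sample under each scheme when the population (N = 10,000, an assumed size) is much larger than the sample (n = 100).

```python
import random

population = list(range(10))      # hypothetical population of 10 labeled items
rng = random.Random(1)            # seeded only so the run is reproducible

# With replacement: each chosen item is "returned", so repeats are possible.
with_repl = rng.choices(population, k=8)

# Without replacement: each chosen item is removed, so all picks are distinct.
without_repl = rng.sample(population, k=8)

print(with_repl)
print(without_repl)

# Chance that one particular member of a population of size N ends up in a
# sample of size n: exactly n/N without replacement, and
# 1 - (1 - 1/N)**n with replacement (it may be picked more than once).
N, n = 10_000, 100
p_without = n / N
p_with = 1 - (1 - 1 / N) ** n
print(p_without, round(p_with, 5))   # 0.01 0.00995
```

The two probabilities differ only in the fifth decimal place here, which is one way to see why the distinction rarely matters when the sample is small relative to the population.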

Before we learn how to summarize data, we first need to know what data is and how to classify it.

Definition
A variable is a characteristic that varies from one person or thing to another.

There are two fundamental types of variables:

o quantitative (numerical)
o qualitative (non-numerical, or categorical)

Examples of quantitative variables:
o the height of a building
o the number of pages in a newspaper
o the volume of a box
o the temperature in a room
o the dissolving time of a pill
o the number of floors in a building

Examples of qualitative variables:
o a person’s middle name
o the shape of a pill
o the company that makes a new drug
o the gender of a dog
o the state in which a US resident lives
o the presence (or absence) of a characteristic

Quantitative variables can be subdivided into two groups: discrete (if possible values are a countable set) and
continuous (if possible values are uncountable).

Definition
Observed values of a variable or variables are called data.

Example
The number of brothers a person has is a variable
The statement “Greg has one brother” is data.

Definition
A data set is a collection of multiple observations of one or more variables.
We can summarize data sets with
• numbers
• tables
• pictures

Numerical summaries

The two most important characteristics of a quantitative data set:

• the "center" of the data set, or the most typical value
• the extent to which the data values are scattered

We can identify/quantify both with a pair of numbers:

• measure of location
• measure of dispersion

In other words, we can use a very small set of (two) numbers to summarize a much larger set of numbers. By
themselves, these two measures can explain quite a bit about the data set from which they are computed.

The ordered values of a raw data set (from smallest to largest) will be denoted by

$$x_1, x_2, x_3, \ldots, x_n$$

where n is the total number of data values.

Measure of Location
- a single number that identifies the “center”, or most typical value of a data set.

We will study these, in particular:

1. mean
2. median
3. midrange
4. trimmed mean
5. midquartile (later in the chapter)

Definitions
The mean of a data set is the average of all of the values.

$$\text{Mean} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$

The median of a data set is the value that divides the ordered data values into two equal halves – a lower half and
an upper half. It is defined separately for two cases, depending on the sample size:

if n is odd, then the median is the single value in the middle.

$$\text{Median} = x_{(n+1)/2}$$

if n is even, then the median is the average of the two middle values.

$$\text{Median} = \frac{x_{n/2} + x_{n/2+1}}{2}$$

NOTE: When finding the median, do not forget to order the data values first!

Examples
n is odd:
For a data set with n=173 values, the median is the ((173+1)/2) = 87th smallest value.

n is even:
For a data set with n=256 values, the median is the average of the (256/2) = 128th and the ((256/2)+1) = 129th values.
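The definitions of mean and median translate directly into code. Below is a minimal Python sketch; the function names `mean`, `median` and `median_positions` are our own, and the median follows the odd/even rule above. Python's standard `statistics.mean` and `statistics.median` give the same results, which the sketch checks. The data set is hypothetical, not taken from the problems.

```python
import statistics

def mean(data):
    """Sum of the values divided by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value (n odd) or average of the two middle values (n even)."""
    xs = sorted(data)                  # always order the data first!
    n = len(xs)
    mid = n // 2                       # 0-based index of the upper-middle value
    if n % 2 == 1:
        return xs[mid]                 # x_{(n+1)/2} in 1-based notation
    return (xs[mid - 1] + xs[mid]) / 2 # average of x_{n/2} and x_{n/2+1}

def median_positions(n):
    """1-based position(s) of the value(s) used for the median."""
    return ((n + 1) // 2,) if n % 2 == 1 else (n // 2, n // 2 + 1)

sample = [7, 3, 9, 1, 5, 8]            # hypothetical data, n = 6 (even)
print(mean(sample))                    # 5.5
print(median(sample))                  # sorted: 1 3 5 7 8 9 -> (5 + 7)/2 = 6.0
assert mean(sample) == statistics.mean(sample)
assert median(sample) == statistics.median(sample)

print(median_positions(173))           # (87,)
print(median_positions(256))           # (128, 129)
```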

PROBLEM 1.1
The following data set consists of exam scores for ten students:

74 88 90 70 74 82 12 84 62 84

(a) What is the mean score?

(b) What is the median score?

PROBLEM 1.2
Refer to the data set in Problem 1.1.
(a) Suppose that the instructor “curved” the scores by adding nine points to each one. What are the mean and
median of the curved scores?
(b) Can you explain what causes the difference between the mean and the median? (See the graphic below.)

The dotplot pictured below displays the exam scores from Problem 1.1.

Definition
A resistant measure is one which isn’t affected much (if at all) by outliers or extreme values in a data set.

If you can make a measure equal whatever value you want just by changing a single data value, then it is
not resistant. (Otherwise, it is.)

How to determine if a measure is resistant:

1) Create, or at least imagine, a small set of numbers.
2) Compute the measure for these numbers.
3) Change the largest data value to something very large (say, 100,000,000).
4) Re-compute the measure.
- if the measure “blows up”, it is not resistant
- if the measure “stays put”, it is resistant
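The four-step check is easy to run by computer. In this sketch (the five-value data set is hypothetical) we apply it to the mean and the median: replacing the largest value by 100,000,000 makes the mean blow up, while the median stays put.

```python
import statistics

data = [2, 4, 5, 7, 9]                  # step 1: a small (hypothetical) data set
before_mean = statistics.mean(data)     # step 2: compute the measures
before_median = statistics.median(data)

changed = sorted(data)
changed[-1] = 100_000_000               # step 3: make the largest value huge

after_mean = statistics.mean(changed)   # step 4: recompute
after_median = statistics.median(changed)

print(before_mean, after_mean)          # 5.4 vs. about 20 million: mean is not resistant
print(before_median, after_median)      # 5 5: median is resistant
```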

Definition
The midrange of a data set is the average of the smallest and largest data values.

$$\text{Midrange} = \frac{x_1 + x_n}{2}$$
Question: Is the midrange a resistant measure?

Definition
A trimmed mean of a data set is the mean of a specified fraction of data values in the middle; i.e. the same number
of values on the low and high ends are “trimmed” off and the remaining values are averaged.

Trimmed means are identified by the percentage of data values that are trimmed off.

Example
For a data set with n = 50 values, the 20% trimmed mean drops 50 × 0.20 = 10 data values. Since half of 10 is 5:

the five smallest ($x_1, \ldots, x_5$) are trimmed off

the five largest ($x_{46}, \ldots, x_{50}$) are trimmed off

and the remaining 40 values in the “middle” ($x_6, \ldots, x_{45}$) are averaged.
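A sketch of the trimming recipe in Python. The function name `trimmed_mean` is ours, and so is one assumption: when the number of values to drop is odd (the text's examples never are), we round down to an even count so the same number comes off each end. If you use SciPy's `scipy.stats.trim_mean` instead, note that its `proportiontocut` is the fraction cut from each end, so a 20% trimmed mean in the sense used here corresponds to `proportiontocut=0.10`.

```python
def trimmed_mean(data, percent):
    """Mean of the middle values after trimming `percent`% of the data in total,
    half from the low end and half from the high end.

    Assumption: if the number to drop is odd, round it down to an even count."""
    xs = sorted(data)
    n = len(xs)
    total_to_drop = round(n * percent / 100)  # e.g. 50 * 20 / 100 = 10
    k = total_to_drop // 2                    # values dropped from EACH end
    middle = xs[k:n - k]
    if not middle:
        raise ValueError("trimmed away every value")
    return sum(middle) / len(middle)

data = list(range(1, 51))                     # hypothetical: the values 1, 2, ..., 50
print(trimmed_mean(data, 20))                 # drops 1..5 and 46..50, averages 6..45 -> 25.5

# Trimming also makes the mean resistant to a single wild value:
print(trimmed_mean([1, 2, 3, 4, 1000], 40))   # averages 2, 3, 4 -> 3.0
```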

PROBLEM 1.3
Recall once again the ten exam scores from Problem 1.1:

74 88 90 70 74 82 12 84 62 84

Find:
(a) the 20% trimmed mean
(b) the 40% trimmed mean
(c) the 60% trimmed mean
(d) the 80% trimmed mean

By itself, a measure of location does not provide an adequate summary. It can tell you where the center of the data
values is on the number line, but nothing about how spread out they are.

Example
Suppose that two brothers, Jimmy and Billy, each have six children of their own. The ages of their children:

Jimmy’s children: 10 11 12 13 14 15
Billy’s children: 3 8 9 16 17 22

Dotplots for each data set are given below.

[Figure: dotplots of the ages of Jimmy’s children and of Billy’s children]

Both data sets have the same mean, the same median, the same midrange, etc. Yet they are very different. How?

Measure of dispersion
- a single number that measures the extent to which the values of a data set are scattered

We will concentrate on these:

1. range
2. standard deviation
3. median absolute deviation
4. interquartile range (later)

Definition
The range of a data set is the difference between the largest and smallest data values.

$$\text{Range} = x_n - x_1$$

PROBLEM 1.4
For the exam scores in Problem 1.1, compute the range. Is the range a resistant measure?

*The range is easy to compute, but it is very wasteful: it uses only the two most extreme values and ignores the rest of the data. A better measure of dispersion is desired.

Before we proceed, we need to define some special sums:

Data values: $x_1, x_2, x_3, \ldots, x_n$

$$\sum x_i = x_1 + x_2 + x_3 + \cdots + x_n \qquad \text{(sum of data)}$$

Now let $\bar{x}$ denote the mean of the data values.

$$\sum (x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2 \qquad \text{(sum of squared deviations)}$$

So you have a bunch of data values: $x_1, x_2, x_3, \ldots, x_n$. You would like to quantify the amount of scatter.

Compute the mean as a measure of location and ask yourself: are the data values “close to” or “far from” $\bar{x}$?
In other words, consider these values:

$$(x_1 - \bar{x}),\ (x_2 - \bar{x}),\ (x_3 - \bar{x}),\ \ldots,\ (x_n - \bar{x})$$

These are called the deviations from the mean: the differences between each data value and the overall
mean. Ask yourself: what will these deviations look like if
- the data values are close together?
- the data values are widely scattered?

Clearly, these deviations contain information about the degree to which the original data values are scattered.
We would like to have a single number that measures scatter, so should we simply add up these deviations? NO!

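$$\sum (x_i - \bar{x}) = 0 \quad \text{ALWAYS!!! (i.e. for every data set)}$$

This is because $\sum (x_i - \bar{x}) = \sum x_i - n\bar{x}$ and $n\bar{x} = \sum x_i$, so the positive and negative deviations cancel exactly.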

Consider squaring each of the deviations first …

$$(x_1 - \bar{x})^2,\ (x_2 - \bar{x})^2,\ (x_3 - \bar{x})^2,\ \ldots,\ (x_n - \bar{x})^2$$

… before adding them up.

$$\sum (x_i - \bar{x})^2 \quad \text{WORKS!}$$

The sum of squared deviations will successfully measure the degree of scatter in a data set. This “special” sum is the
main component in the formula for standard deviation.
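A quick numerical check of both claims, using the ages of Jimmy’s and Billy’s children from the earlier example (the helper name `deviations` is ours): the deviations from the mean add up to zero for both families, but the sum of squared deviations is far larger for Billy’s more scattered ages.

```python
def deviations(data):
    """Differences between each value and the mean of the data."""
    xbar = sum(data) / len(data)
    return [x - xbar for x in data]

jimmy = [10, 11, 12, 13, 14, 15]
billy = [3, 8, 9, 16, 17, 22]

for name, ages in [("Jimmy", jimmy), ("Billy", billy)]:
    devs = deviations(ages)
    total = sum(devs)                    # always 0 (up to floating-point rounding)
    ssd = sum(d ** 2 for d in devs)      # sum of squared deviations
    print(name, total, ssd)
# Jimmy 0.0 17.5
# Billy 0.0 245.5
```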

Definition
The standard deviation of a data set is given by:

$$\text{Standard Deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Example
For the exam scores from Problem 1.1, the standard deviation is computed with the help of a three-column chart.
We need $n$, $\bar{x}$, and $\sum (x_i - \bar{x})^2$.

For the ages of Jimmy’s and Billy’s children, we see how the standard deviations reflect the difference in variability:

Jimmy’s children: $\sqrt{17.5/5} \approx 1.87$
Billy’s children: $\sqrt{245.5/5} \approx 7.01$

NOTE: Standard deviation can also be expressed as:

$$\text{Standard Deviation} = \sqrt{\frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}}$$
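Both formulas can be checked against each other in a few lines of Python. This sketch (function names are ours) uses the children’s ages from the earlier example and compares the results with `statistics.stdev`, which also divides by n − 1.

```python
import math
import statistics

def sd_definition(data):
    """sqrt( sum of squared deviations / (n - 1) )"""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

def sd_shortcut(data):
    """sqrt( (sum of x^2 - (sum of x)^2 / n) / (n - 1) )"""
    n = len(data)
    return math.sqrt((sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1))

jimmy = [10, 11, 12, 13, 14, 15]
billy = [3, 8, 9, 16, 17, 22]
for ages in (jimmy, billy):
    print(round(sd_definition(ages), 2), round(sd_shortcut(ages), 2),
          round(statistics.stdev(ages), 2))
# 1.87 1.87 1.87
# 7.01 7.01 7.01
```

The shortcut form is convenient by hand, but on a computer it can lose precision when the values are large and close together, because it subtracts two nearly equal big numbers; the definition form avoids that.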

PROBLEM 1.5
Suppose that the instructor of a college class wishes to curve an exam by adding 9 points to the raw scores (as in
Problem 1.2). What will happen to the standard deviation of these new “curved” scores?

Consider the next measure of dispersion which uses the median as its initial measure of location, instead of the
mean.

The Median Absolute Deviation (MAD):

The MAD works like standard deviation except that it relies on the median as its initial measure of location.
Given a set of data: $x_1, x_2, x_3, \ldots, x_n$ (in column 1 of a table)

Step 1 Find the median of the data set. Call it $\tilde{x}$.
Step 2 Subtract $\tilde{x}$ from each data value. This yields the deviations from the median. (column 2)
Step 3 Take absolute values of the deviations from the median. (column 3)
Step 4 Find the median of these absolute values. This is the MAD.

The MAD can be found – and more importantly, understood – with the help of another three-column chart.

Example:
Consider the following small data set, consisting of the ages of ten night-school students:

32 34 22 60 25 38 32 44 28 42

First create a three-column chart with the ordered data in column 1, and find the median. The deviations from the
median and their absolute values fill out the chart.

$x$    $x - \tilde{x}$    $|x - \tilde{x}|$
22 -11 11
25 -8 8
28 -5 5
32 -1 1
32 -1 1
34 1 1
38 5 5
42 9 9
44 11 11
60 27 27

Median = $\tilde{x}$ = 33

Finally, order the values in the third column and find their median:

1 1 1 5 5 8 9 11 11 27

MAD = (5+8)/2 = 6.5
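The four MAD steps in Python, checked against the night-school example (the function name `mad` is ours). The MAD here is the plain median of absolute deviations; some software rescales it by a constant (R’s `mad()` multiplies by about 1.4826 by default), so check which version a package reports before comparing numbers.

```python
import statistics

def mad(data):
    med = statistics.median(data)          # Step 1: the median, x~
    devs = [x - med for x in data]         # Step 2: deviations from the median
    abs_devs = [abs(d) for d in devs]      # Step 3: absolute values
    return statistics.median(abs_devs)     # Step 4: median of those values

ages = [32, 34, 22, 60, 25, 38, 32, 44, 28, 42]   # night-school students
print(statistics.median(ages))  # 33.0
print(mad(ages))                # 6.5
```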

Question: Is the MAD resistant?

Some notation

It is either the case that our data set constitutes a population in itself (if those values are all that interest us) or it is a
sample from some larger population. (It all depends on what our objective is!)

To distinguish between the two cases, we must introduce some notation.

If we are summarizing a population:

population size: N
population mean: $\mu$
population standard deviation: $\sigma$

If we are summarizing a sample:

sample size: n
sample mean: $\bar{x}$
sample standard deviation: s

Definitions
A parameter is a descriptive measure for a population; a statistic is a descriptive measure for a sample.

Statistics are used as estimates of unknown parameters! (In other words, $\bar{x}$ is used to estimate μ and s is used to
estimate σ.)

NOTES:
1. The variance of a data set, denoted by $s^2$ or $\sigma^2$, is simply the standard deviation “squared”.

2. The standard deviation of a population is defined by

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

The “n-1” used for the sample standard deviation is there primarily for estimation purposes: for independent random
samples, dividing by n − 1 makes $s^2$ an unbiased estimator of $\sigma^2$. Observe that both formulas adequately serve the same descriptive purpose.
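Python’s standard `statistics` module provides both versions: `pvariance` and `pstdev` divide by N (population), while `variance` and `stdev` divide by n − 1 (sample). A small sketch treating Jimmy’s children’s ages first as a complete population and then as a sample:

```python
import statistics

ages = [10, 11, 12, 13, 14, 15]   # Jimmy's children

sigma = statistics.pstdev(ages)   # population: sum of squared deviations / N
s = statistics.stdev(ages)        # sample: sum of squared deviations / (n - 1)

print(statistics.pvariance(ages), statistics.variance(ages))  # 17.5/6 and 17.5/5
print(round(sigma, 3), round(s, 3))                           # 1.708 1.871
```

Dividing by the smaller number n − 1 always makes the sample version slightly larger, and the gap shrinks as n grows.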

Quartiles

For a given data set, suppose the median is found. Divide the data values into two equal halves: a lower half and
an upper half.

• if n is odd, then include the median in both halves

• this is not done if n is even

Now find the median of each half. This yields the three quartiles of the data set: Q1, Q2, Q3

• 1st quartile (Q1) = median of lower half of data set

• 2nd quartile (Q2) = median of the full data set (reminder: you find this first)
• 3rd quartile (Q3) = median of upper half of data set

Quartiles give us another measure of location and another measure of dispersion:

$$\text{Midquartile (MQ)} = \frac{Q_1 + Q_3}{2} \qquad \text{(location)}$$

$$\text{Interquartile Range (IQR)} = Q_3 - Q_1 \qquad \text{(dispersion)}$$
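A sketch of the halves rule described above, with the median included in both halves when n is odd. This is only one convention: library routines such as `statistics.quantiles` or `numpy.percentile` interpolate between values and may give somewhat different Q1 and Q3, so the hand rule is coded directly here. The function name is ours, and the example reuses the night-school ages from the MAD example.

```python
import statistics

def quartiles(data):
    """Q1, Q2, Q3 using the halves rule: median included in both halves if n is odd."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower, upper = xs[:half + 1], xs[half:]   # middle value goes in both halves
    else:
        lower, upper = xs[:half], xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)

ages = [32, 34, 22, 60, 25, 38, 32, 44, 28, 42]   # night-school students
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)        # 28 33.0 42
print((q1 + q3) / 2)     # midquartile: 35.0
print(q3 - q1)           # interquartile range: 14
```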

PROBLEM 1.6
Recall the data set consisting of the exam scores for the ten students in Problem 1.1:

74 88 90 70 74 82 12 84 62 84

Find the quartiles and compute the MQ and the IQR.

It is still the case that the mean and standard deviation are the most popular choices for measuring location and
dispersion, respectively. The following theorem illustrates the significance of both.

Chebyshev’s Theorem

Suppose that, for a particular data set, you are only told its mean and its standard deviation. Nothing else. These two
numbers give you a pretty good idea where “most” of the values fall on the number line. You know for certain:

- at least 3/4 = 75% of the data values fall within 2 standard deviations of the mean
- at least 8/9 ≈ 88.9% of the data values fall within 3 standard deviations of the mean
- at least 15/16 = 93.75% of the data values fall within 4 standard deviations of the mean

Generally speaking, for any K > 1, at least

$$\left(1 - \frac{1}{K^2}\right) \times 100\%$$

of the data values fall within K standard deviations of the mean.

Example
You read in the newspaper that the population of seniors at City High School averaged 1030 on the SATs last year
with a standard deviation of 80. Given no other information, you know that:

- at least 75% of the seniors scored within $\mu \pm 2\sigma = 1030 \pm 2(80)$, i.e. in (870, 1190)

- at least 8/9 ≈ 88.9% of the seniors scored within $\mu \pm 3\sigma = 1030 \pm 3(80)$, i.e. in (790, 1270)
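Chebyshev’s bound is a one-line formula, so it is easy to tabulate. This sketch (the function name is ours) computes the exact bounds behind the percentages listed above and the two SAT intervals from the example.

```python
def chebyshev_min_fraction(k):
    """Guaranteed minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3, 4):
    print(k, f"{chebyshev_min_fraction(k):.2%}")
# 2 75.00%
# 3 88.89%
# 4 93.75%

sat_mean, sat_sd = 1030, 80   # SAT example
for k in (2, 3):
    print(k, (sat_mean - k * sat_sd, sat_mean + k * sat_sd))
# 2 (870, 1190)
# 3 (790, 1270)
```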

Graphical summaries: tables and pictures

We will now look at:

o frequency/relative frequency distributions
o frequency/relative frequency histograms
One way to summarize quantitative data is to put them into groups. Divide the number line into non-overlapping
intervals of equal length (i.e. count by 1s, or by 5s, or by 100s, or ….). Then tally up the data by counting the
number of values that fall into each group. This gives you the frequencies for each group. Dividing the frequencies
by the total number yields the relative frequencies.
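A minimal sketch of grouping into equal-width classes and tallying frequencies and relative frequencies. The scores and the class width of 10 are hypothetical, and the class labels assume whole-number scores (each class runs from a multiple of 10 up to 9 more). The row of asterisks printed for each class is a sideways text version of a histogram.

```python
from collections import Counter

scores = [45, 52, 58, 61, 64, 67, 68, 70, 73, 75, 77, 81, 84, 88, 93]  # hypothetical
width = 10   # hypothetical class width: count by 10s

# Each score goes in the class whose lower limit is the score rounded down
# to a multiple of the width (e.g. 67 -> 60, the class 60-69).
counts = Counter((x // width) * width for x in scores)

n = len(scores)
for lower in range(min(counts), max(counts) + 1, width):
    freq = counts[lower]          # Counter gives 0 for an empty class
    rel = freq / n                # relative frequency
    print(f"{lower}-{lower + width - 1}: freq {freq:2d}  rel {rel:.3f}  {'*' * freq}")
```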

Example
The raw scores on an old statistics midterm (40 scores total):

Mean = 66.8
Median = 68.5

Standard deviation = 15.6

The scores are also summarized in a frequency distribution (left) and a relative frequency distribution (right):

Definition
A histogram is a graphical display of the information contained in a frequency distribution. Groups are marked off
on the horizontal axis, and the frequency of each group is shown by the height of its bar.

Histogram for the above exam scores:

Histograms further summarize a data set by describing its shape. Here are three common shapes of data sets that
have only one “peak”:
Question: In each of these examples, how do the mean and median compare to each other? (Is one bigger than the
other, and if so, which one?)

Stem and Leaf Diagrams

We can obtain the same graphical summary as a histogram while retaining the information of individual data
values. A Stem and Leaf diagram replaces the bars of a histogram with digits.

For the midterm exam scores above:

Digits to the left of the bar are the stems and digits to the right are the leaves. Each data value is split into a
stem and a leaf. Observe that the entire data set can be reconstructed from a stem and leaf diagram; therefore you
can still find the mean, median, etc.

If the stem and leaf diagram is rotated counter-clockwise by 90 degrees, we obtain the same visual summary as the
histogram, i.e. we can observe that the data set is left-skewed.
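A short sketch that builds a stem and leaf diagram for two-digit whole numbers, taking the tens digit as the stem and the units digit as the leaf (an assumption that suits exam scores and ages; other data may need a different split). The function name is ours, and the example reuses the night-school ages from the MAD example. Empty stems are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return 'stem | leaves' lines for non-negative integers.

    Assumption: stem = tens digit(s), leaf = units digit."""
    leaves = defaultdict(list)
    for x in sorted(data):               # sorting keeps each row's leaves in order
        leaves[x // 10].append(x % 10)
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):
        # .get() so an empty stem still gets a row
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines

ages = [32, 34, 22, 60, 25, 38, 32, 44, 28, 42]   # night-school students
for line in stem_and_leaf(ages):
    print(line)
# 2 | 258
# 3 | 2248
# 4 | 24
# 5 |
# 6 | 0
```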

Example
A stem and leaf diagram for the raw scores from the first exam in Intro to Business Statistics, Fall 2023:
