Ch1 Prob&Stat NEW

1. The document discusses key concepts in statistics including the role of statistics in science, different types of studies, variables, types of statistics, and methods for describing data. 2. It covers descriptive statistics such as frequency distributions, graphs, measures of central tendency including mean, median, and mode, and measures of dispersion. 3. The document provides examples and explanations of important statistical terminology to build foundational knowledge in statistics.

Uploaded by

lokasfokaas42

INTRODUCTION AND GENERAL PRINCIPLES IN STATISTICS

WHAT DO SCIENTISTS DO?
A scientist is someone who solves problems of
interest to society with the efficient application of
scientific principles by:
• Refining existing products
• Designing new products or processes

STATISTICS SUPPORTS THE CREATIVE PROCESS
The field of statistics deals with the collection,
presentation, analysis, and use of data to:
• Make decisions
• Solve problems
• Design products and processes
It is the science of learning information from data.

BASIC TYPES OF STUDIES

Three basic methods for collecting data:

– A retrospective study using historical data
• Data collected in the past for other purposes.
– An observational study
• Data, presently collected, by a passive observer.
– A designed experiment
• Data collected in response to process input changes.

INTRODUCTION: BASIC TERMS
Population Vs. Sample

Population: A population consists of all elements (individuals, items,
or objects) whose characteristics are being studied.

Sample: A portion of the population selected for study is referred to
as a sample.

[Figure: a sample drawn from within the population]
TYPES OF VARIABLES
Variables
– Quantitative
• Ratio: income, age, height, sales
• Interval: number of houses, cars
– Qualitative
• Nominal: gender, marital status
• Ordinal: education
TYPES OF STATISTICS
I. Descriptive Statistics:
Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.

II. Inferential Statistics:

Inferential statistics consists of methods that use sample results to help
make decisions or predictions about a population.
DESCRIBING DATA USING TABLES AND GRAPHS
I. Organizing and graphing qualitative variables
How to organize and display qualitative data.
• Frequency distribution of a qualitative variable

Example: A sample of 10 students is selected, and each student is asked
how happy he or she is with this course. Suppose that their responses are
recorded below, where very represents very happy, somewhat means somewhat
happy, and none stands for not happy at all.

Somewhat Somewhat Very Very Somewhat
Very Somewhat None Very Very
Frequency distribution

Happiness with the course    Frequency
Very                         5
Somewhat                     4
None                         1
                             Sum = 10

Relative Frequency and Percentage Distributions

\[ \text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}} \]

\[ \text{Percentage} = (\text{Relative frequency}) \times 100 \]

Relative frequency and percentage distribution

Happiness with the course    Frequency    Relative frequency    Percentage
Very                         5            5/10 = 0.5            0.5 × 100 = 50
Somewhat                     4            4/10 = 0.4            0.4 × 100 = 40
None                         1            1/10 = 0.1            0.1 × 100 = 10
                                          Sum = 1               Sum = 100
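
The three columns of this table can be reproduced with a few lines of code. The following is a minimal sketch in Python (a language the slides do not use); the variable names are illustrative assumptions, and the data are the ten responses from the example.

```python
from collections import Counter

# The ten survey responses from the example.
responses = [
    "Somewhat", "Somewhat", "Very", "Very", "Somewhat",
    "Very", "Somewhat", "None", "Very", "Very",
]

frequency = Counter(responses)       # count of each category
total = sum(frequency.values())      # sum of all frequencies

# Relative frequency = frequency of the category / sum of all frequencies.
relative_frequency = {cat: count / total for cat, count in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {cat: rel * 100 for cat, rel in relative_frequency.items()}

for cat in ("Very", "Somewhat", "None"):
    print(f"{cat:<10}{frequency[cat]:>3}"
          f"{relative_frequency[cat]:>6.1f}{percentage[cat]:>6.0f}")
```

The relative frequencies should add up to 1 and the percentages to 100, which is a quick check that no response was miscounted.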
• Graphical presentation of qualitative data

The bar graph and the pie chart are two types of graphs that are
commonly used to display qualitative data.

Bar Graph: A graph made of bars whose heights represent the
frequencies of respective categories is called a bar graph.

Pie Chart: A pie chart is more commonly used to display percentages,
although it can be used to display frequencies or relative frequencies.
The whole pie (or circle) represents the total sample or population.
Then we divide the pie into different portions that represent the
different categories.

[Pie chart of the course responses with slices of 50%, 40%, and 10%]
II. Organizing and graphing quantitative variables
How to organize and display quantitative data.
• Frequency distribution of a quantitative variable: single-valued classes
Example: A sample of 10 students is selected, and each student is asked
how many cars his or her household owns.
3 1 0 2 1
1 2 1 1 0

Cars Owned    Frequency
0             2
1             5
2             2
3             1

• Graphical presentation of quantitative data

Quantitative data can be displayed in a histogram or polygon.
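
As a rough illustration, the single-valued frequency table for the cars example can be tallied and drawn as a text histogram. This is a sketch in Python; the star-based display is an assumption made here for illustration, not the graph used in the slides.

```python
from collections import Counter

# Number of cars owned by each of the 10 households in the example.
cars_owned = [3, 1, 0, 2, 1, 1, 2, 1, 1, 0]
frequency = Counter(cars_owned)

# One row per single-valued class, in increasing order of the value.
# Each star stands for one household.
for value in sorted(frequency):
    print(f"{value}  {frequency[value]}  {'*' * frequency[value]}")
```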

SHAPES OF HISTOGRAMS
A histogram can assume any one of a large number of shapes. The
most common of these shapes are:
1. Symmetric
2. Skewed
A symmetric histogram is identical on both sides of its central
point. The histogram shown in the accompanying figure is symmetric
around the dashed line that represents its central point.
A skewed histogram is non-symmetric. For a skewed histogram,
the tail on one side is longer than the tail on the other side. A
skewed-to-the-right histogram has a longer tail on the right side
(see Figure 1).
A skewed-to-the-left histogram has a longer tail on the left side
(see Figure 2).

[Figure 1: skewed-to-the-right histogram. Figure 2: skewed-to-the-left histogram.]
DESCRIBING DATA USING NUMERICAL MEASURES
Frequency distributions and graphs are important components of
statistics. However, it is also important to describe the main
characteristics of a data set numerically. We will discuss two kinds
of numerical summary measures:
1. Central tendency
2. Dispersion or spread
1. Measures of Central Tendency
A measure of central tendency gives the center of a histogram or a
frequency distribution curve. We will discuss four different
measures of central tendency: the mean, the trimmed mean, the
median, and the mode.
I. Mean
The mean is the most frequently used measure of central tendency.
\[ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
Mean for population data:
\[ \mu = \frac{\sum x}{N} \]
Mean for sample data:
\[ \bar{x} = \frac{\sum x}{n} \]
Example: The following are the ages (in years) of all eight employees
of a small company:
53 32 61 27 39 44 49 57
Calculate the mean age of these employees.
The population mean is
\[ \mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25 \]

If we take a sample of three employees from this company (32, 39, and
57) and calculate the mean age of those three employees, the sample
mean is
\[ \bar{x} = \frac{\sum x}{n} = \frac{32 + 39 + 57}{3} = 42.67 \]
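
The two calculations can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative assumptions.

```python
from statistics import mean

# Ages of all eight employees (the whole population in this example).
population_ages = [53, 32, 61, 27, 39, 44, 49, 57]
# Ages of the three sampled employees.
sample_ages = [32, 39, 57]

mu = mean(population_ages)   # population mean: sum of x divided by N
x_bar = mean(sample_ages)    # sample mean: sum of x divided by n

print(mu)                # 45.25
print(round(x_bar, 2))   # 42.67
```

The same function serves both cases because the arithmetic is identical; only the interpretation (population versus sample) differs.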
Sometimes a data set may contain a few very small or a few very large
values. Such values are called outliers or extreme values.
The table below lists the total sales of six Palestinian companies for 2014.

Company       Total Sales (Million)
Jawwal        325
Wattania      50
Siniora       55
Unipal        70
Al-juneidi    40
Plaza         45

Find the 2014 mean sales for these six companies.

\[ \bar{x} = \frac{\sum x}{n} = \frac{585}{6} = 97.5 \]
Notice that the sales of Jawwal are very large compared to those of
the other companies. Hence, it is an outlier. Excluding Jawwal, the
mean of the remaining five companies is:
\[ \bar{x} = \frac{\sum x}{n} = \frac{260}{5} = 52 \]

We should know that the mean is not always the best measure of
central tendency because it is heavily influenced by outliers.
Sometimes other measures of central tendency give a more accurate
impression of a data set. For example, when a data set has outliers,
instead of using the mean, we can use either the trimmed mean or
the median as a measure of central tendency.
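
The effect of the outlier can be seen by computing the mean and the median of the sales data with and without Jawwal. This is a sketch in Python using the standard `statistics` module; the median is defined in the section on the median, and the variable names are assumptions made for illustration.

```python
from statistics import mean, median

# 2014 total sales (in millions) from the sales table.
sales = {"Jawwal": 325, "Wattania": 50, "Siniora": 55,
         "Unipal": 70, "Al-juneidi": 40, "Plaza": 45}

all_values = list(sales.values())
without_outlier = [v for name, v in sales.items() if name != "Jawwal"]

print(mean(all_values))          # 97.5
print(mean(without_outlier))     # 52
print(median(all_values))        # 52.5
print(median(without_outlier))   # 50
```

Dropping the outlier moves the mean from 97.5 to 52, while the median moves only from 52.5 to 50.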
II. Trimmed Mean
The trimmed mean is calculated by dropping a certain percentage of values
from each end of a ranked data set. The trimmed mean is especially useful
as a measure of central tendency when a data set contains a few outliers at
each end.
Example: Suppose the following data give the ages (in years) of 10
employees of a company:
47 53 38 26 39 49 19 67 31 23
To calculate the 10% trimmed mean, first we rank these data values in
increasing order; then drop 10% of the smallest values and 10% of the
largest values. The mean of the remaining 80% of the values will give the
10% trimmed mean.
19 (dropped) 23 26 31 38 39 47 49 53 67 (dropped)
\[ \bar{x} = \frac{\sum x}{n} = \frac{306}{8} = 38.25 \]
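
The procedure above can be sketched as a short Python function. The function name `trimmed_mean` is hypothetical, and rounding the number of dropped values down when the percentage does not divide evenly is an assumption made here; the slides do not cover that case.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the ranked values from each end."""
    ranked = sorted(values)
    # Number of values to drop from each end (rounded down: an assumption).
    k = len(ranked) * percent // 100
    kept = ranked[k:len(ranked) - k]
    return sum(kept) / len(kept)

ages = [47, 53, 38, 26, 39, 49, 19, 67, 31, 23]
print(trimmed_mean(ages, 10))   # 38.25
```

With 10 values and a 10% trim, exactly one value (19) is dropped from the low end and one (67) from the high end.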
III. Median
Another important measure of central tendency is the median, which
is the value of the middle term in a data set that has been ranked in
increasing order.
• If n is odd, the median is the middle number
• If n is even, the median is the mean of the middle two numbers
Example: Suppose the following data give the ages (in years) of 10
employees of a company: 47 53 38 26 39 49 19 67 31 23
First, we rank the given data in increasing order as follows:
19 23 26 31 38 39 47 49 53 67
\[ \text{Median} = \frac{38 + 39}{2} = 38.5 \]
The advantage of using the median as a measure of central tendency is that it
is not influenced by outliers.
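
The odd and even cases can be written out directly. The following Python function is a minimal sketch of the rule stated above; the last two lines replace the largest age with a made-up extreme value (670) purely to illustrate that the median does not move.

```python
def median(values):
    """Middle value of the ranked data (mean of the two middle values if n is even)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

ages = [47, 53, 38, 26, 39, 49, 19, 67, 31, 23]
print(median(ages))   # 38.5

# Hypothetical outlier: replace 67 with 670. The median is unchanged.
with_outlier = [670 if age == 67 else age for age in ages]
print(median(with_outlier))   # 38.5
```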
IV. Mode
The mode is the value that occurs with the highest frequency in a
data set.
Example: The following data give the speeds (in miles per hour) of
eight cars that were stopped on a road for speeding violations:
77 82 74 81 79 84 74 78

Find the mode.
The value 74 occurs twice and every other value occurs once, so
Mode = 74

A major shortcoming of the mode is that a data set may have none or
may have more than one mode, whereas it will have only one mean
and only one median.
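
A data set with no mode, one mode, or several modes can be handled by counting how often each value occurs. This is a sketch in Python; the function name `modes` is hypothetical, and the two short lists at the end are made-up data used only to show the "none" and "more than one" cases.

```python
from collections import Counter

def modes(values):
    """Return all values with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []          # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

speeds = [77, 82, 74, 81, 79, 84, 74, 78]
print(modes(speeds))            # [74]

print(modes([1, 2, 3]))         # []      (made-up data: no mode)
print(modes([1, 1, 2, 2, 3]))   # [1, 2]  (made-up data: two modes)
```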
RELATIONSHIP AMONG THE MEAN, MEDIAN AND MODE
• As discussed previously, two of the many shapes that a histogram
can assume are symmetric and skewed.
• Knowing the values of the mean, median, and mode can give us
some idea about the shape of a frequency distribution curve.
I. For a symmetric histogram and frequency distribution curve with
one peak (see the accompanying figure), the values of the mean,
median, and mode are identical, and they lie at the center of the
distribution.
II. For a histogram skewed to the right (see the accompanying figure),
the value of the mean is the largest, that of the mode is the smallest,
and the value of the median lies between these two. (Notice that
the mode always occurs at the peak point.) The value of the mean
is the largest in this case because it is sensitive to outliers that
occur in the right tail. These outliers pull the mean to the right.
III. If a histogram and a frequency distribution curve are skewed to
the left (see the accompanying figure), the value of the mean is the
smallest and that of the mode is the largest, with the value of the
median lying between these two. In this case, the outliers in the
left tail pull the mean to the left.
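
The ordering described above can be illustrated numerically. The data below are made up for this sketch (they do not come from the slides): one small data set with a long right tail and its mirror image with a long left tail. The code is Python and uses the standard `statistics` module.

```python
from statistics import mean, median, mode

# Made-up data with a long right tail (illustration only).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same data: long left tail.
left_skewed = [15 - x for x in right_skewed]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(name, "mode:", mode(data), "median:", median(data), "mean:", mean(data))
```

For the right-skewed data the mode is smallest and the mean is largest; for the left-skewed data the order is reversed.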
DESCRIBING DATA USING NUMERICAL MEASURES
2. Measures of Dispersion
• The measures of central tendency, such as the mean, median, and
mode, do not reveal the whole picture of the distribution of a data
set. Two data sets with the same mean may have completely
different spreads. The variation among the values of observations
for one data set may be much larger or smaller than for the other
data set. (Note that the words dispersion, spread, and variation
have the same meaning.)
• Consider the following two data sets on the ages (in years) of all
workers working for each of two small companies.

Company 1: 47 38 35 40 36 45 39
Company 2: 70 33 18 52 27
• The mean age of workers in both these companies is the same, 40
years. If we do not know the ages of individual workers at these two
companies and are told only that the mean age of the workers at both
companies is the same, we may deduce that the workers at these two
companies have a similar age distribution.
• As we can observe, however, the variation in the workers' ages for
each of these two companies is very different. As illustrated in the
diagram, the ages of the workers at the second company have a much
larger variation than the ages of the workers at the first company.

[Diagram: ages of the workers at each company plotted on a number line]
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
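
The comparison can be verified with a short calculation. This is a minimal Python sketch using the two data sets given for the companies; the variable names are illustrative assumptions.

```python
from statistics import mean

company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

# Same mean, very different spread between youngest and oldest worker.
for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    print(name, "mean:", mean(ages),
          "youngest:", min(ages), "oldest:", max(ages))
```

Both means equal 40, yet the ages at the first company run from 35 to 47 while those at the second run from 18 to 70.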
• Thus, the mean, median, or mode by itself is usually not a sufficient
measure to reveal the shape of the distribution of a data set. We also
need a measure that can provide some information about the variation
among data values.

• The measures that help us learn about the spread of a data set are
called the measures of dispersion. The measures of central tendency
and dispersion taken together give a better picture of a data set than
the measures of central tendency alone. Here we will discuss three
measures of dispersion: range, variance, and standard deviation.
I. Range
The range is the simplest measure of dispersion to calculate. It is
obtained by taking the difference between the largest and the smallest
values in a data set.

Example: The following are the ages (in years) of all eight employees
of a small company:
53 32 61 27 39 44 49 57

Calculate the range.

Range = Largest value - Smallest value
Range = 61 - 27 = 34

The range, like the mean, has the disadvantage of being influenced
by outliers.
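
Both the calculation and the sensitivity to outliers can be sketched in a few lines of Python. The function name `value_range` is hypothetical; the second data set is the 2014 sales table from the discussion of the mean, where Jawwal (325) is the outlier.

```python
def value_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

ages = [53, 32, 61, 27, 39, 44, 49, 57]
print(value_range(ages))   # 34

# Sales data (in millions), with and without the outlier 325.
sales = [325, 50, 55, 70, 40, 45]
print(value_range(sales))                            # 285
print(value_range([v for v in sales if v != 325]))   # 30
```

A single extreme value changes the range from 30 to 285, because the range depends only on the two end values.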
II. Variance and Standard Deviation
• The standard deviation is the most-used measure of dispersion. The
value of the standard deviation tells how closely the values of a data
set are clustered around the mean.

• In general, a lower value of the standard deviation for a data set
indicates that the values of that data set are spread over a relatively
smaller range around the mean. In contrast, a larger value of the
standard deviation for a data set indicates that the values of that data
set are spread over a relatively larger range around the mean.

• The standard deviation is obtained by taking the positive square root
of the variance.
Population variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]

Sample variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Example: Suppose the final scores of a sample of four students are 82,
95, 67, and 92, respectively.
Calculate the variance and standard deviation for these data.
The mean score for these four students is
\[ \bar{x} = \frac{82 + 95 + 67 + 92}{4} = 84 \]
x      \(x - \bar{x}\)      \((x - \bar{x})^2\)
82     82 - 84 = -2         4
95     95 - 84 = 11         121
67     67 - 84 = -17        289
92     92 - 84 = 8          64
       \(\sum (x - \bar{x}) = 0\)      \(\sum (x - \bar{x})^2 = 478\)

Sample variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{478}{3} = 159.3 \]

Sample standard deviation:
\[ s = \sqrt{s^2} = \sqrt{159.3} = 12.62 \]
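
The table can be reproduced step by step in code. The following Python sketch computes the deviations by hand and then cross-checks the result against `variance` and `stdev` from the standard `statistics` module, which use the same n - 1 divisor; the variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import stdev, variance

scores = [82, 95, 67, 92]
n = len(scores)

x_bar = sum(scores) / n                           # sample mean: 84.0
deviations = [x - x_bar for x in scores]          # x - x_bar for each score
squared = [d ** 2 for d in deviations]            # (x - x_bar)^2

sample_variance = sum(squared) / (n - 1)          # 478 / 3
sample_sd = sqrt(sample_variance)                 # positive square root

print(sum(deviations))              # 0.0
print(round(sample_variance, 1))    # 159.3
print(round(sample_sd, 2))          # 12.62

# Cross-check with the standard library (sample formulas, divisor n - 1).
print(round(variance(scores), 1), round(stdev(scores), 2))
```

For population data, the divisor would be N rather than n - 1, which corresponds to `pvariance` and `pstdev` in the same module.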

Alternative formula for the sample variance
and standard deviation:
