
Ch1 Prob&Stat NEW

1. The document discusses key concepts in statistics including the role of statistics in science, different types of studies, variables, types of statistics, and methods for describing data. 2. It covers descriptive statistics such as frequency distributions, graphs, measures of central tendency including mean, median, and mode, and measures of dispersion. 3. The document provides examples and explanations of important statistical terminology to build foundational knowledge in statistics.

Uploaded by

lokasfokaas42
Copyright
© © All Rights Reserved

INTRODUCTION AND GENERAL PRINCIPLES IN STATISTICS

WHAT DO SCIENTISTS DO?
A scientist is someone who solves problems of
interest to society with the efficient application of
scientific principles by:
• Refining existing products
• Designing new products or processes

STATISTICS SUPPORTS THE CREATIVE
PROCESS
The field of statistics deals with the collection,
presentation, analysis, and use of data to:
• Make decisions
• Solve problems
• Design products and processes
It is the science of learning information from data.

BASIC TYPES OF STUDIES

Three basic methods for collecting data:

– A retrospective study using historical data
• Data collected in the past for other purposes.
– An observational study
• Data collected in the present by a passive observer.
– A designed experiment
• Data collected in response to changes in process inputs.

INTRODUCTION: BASIC TERMS
Population Vs. Sample

Population: A population consists of all elements (individuals,
items, or objects) whose characteristics are being studied.

Sample: A portion of the population selected for study is referred to
as a sample.

[Figure: Population and Sample]
TYPES OF VARIABLES
Variables

– Quantitative
• Ratio: income, age, height, sales
• Interval: number of houses, cars
– Qualitative
• Nominal: gender, marital status
• Ordinal: education
TYPES OF STATISTICS
I. Descriptive Statistics:
Descriptive statistics consists of methods for organizing, displaying, and
describing data by using tables, graphs, and summary measures.

II. Inferential Statistics:
Inferential statistics consists of methods that use sample results to help
make decisions or predictions about a population.
DESCRIBING DATA USING TABLES
AND GRAPHS
I. Organizing and graphing Qualitative variables
How to organize and display qualitative data.
• Frequency distribution of a qualitative variable

Example: A sample of 10 students is selected and asked how happy they are
with this course. Suppose their responses are recorded below, where very
represents very happy, somewhat means somewhat happy, and none stands
for not happy at all.

Somewhat Somewhat Very Very Somewhat
Very Somewhat None Very Very
Frequency distribution

Happiness with the course    Frequency
Very                         5
Somewhat                     4
None                         1
                             Sum = 10

Relative Frequency and Percentage Distributions

$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$

$$\text{Percentage} = (\text{Relative frequency}) \times 100$$


Relative frequency and percentage distribution

Happiness with the course    Frequency    Relative frequency    Percentage
Very                         5            5/10 = 0.5            0.5 × 100 = 50
Somewhat                     4            4/10 = 0.4            0.4 × 100 = 40
None                         1            1/10 = 0.1            0.1 × 100 = 10
                                          Sum = 1               Sum = 100
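
The frequency, relative frequency, and percentage columns above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name frequency_table is an assumption made for illustration and does not come from the slides.

```python
from collections import Counter

# Responses copied from the happiness example above.
responses = [
    "Somewhat", "Somewhat", "Very", "Very", "Somewhat",
    "Very", "Somewhat", "None", "Very", "Very",
]


def frequency_table(values):
    """Map each category to (frequency, relative frequency, percentage).

    Hypothetical helper written for this illustration.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {
        category: (freq, freq / total, 100 * freq / total)
        for category, freq in counts.items()
    }


table = frequency_table(responses)
for category, (freq, rel, pct) in table.items():
    print(category, freq, rel, pct)
```

The relative frequencies add up to 1 and the percentages to 100, which is a quick check that no response was lost while counting.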
• Graphical presentation of qualitative data

The bar graph and the pie chart are two types of graphs that are
commonly used to display qualitative data.

Bar Graph: A graph made of bars whose heights represent the
frequencies of respective categories is called a bar graph.

Pie Chart: A pie chart is more commonly used to display percentages,
although it can be used to display frequencies or relative frequencies.
The whole pie (or circle) represents the total sample or population.
Then we divide the pie into different portions that represent the
different categories.

[Pie chart with slices of 50%, 40%, and 10%]
II. Organizing and graphing Quantitative variables
How to organize and display quantitative data.
• Frequency distribution of a quantitative variable: single-valued classes
Example: A sample of 10 students is selected, and each is asked how many
cars their household owns.
3 1 0 2 1
1 2 1 1 0

Cars Owned    Frequency
0             2
1             5
2             2
3             1
• Graphical presentation of quantitative data

Quantitative data can be displayed in a histogram or polygon.


SHAPES OF HISTOGRAMS
A histogram can assume any one of a large number of shapes. The
most common of these shapes are:
1. Symmetric
2. Skewed
A symmetric histogram is identical on both sides of its central
point. The histograms shown in the figure below are symmetric
around the dashed lines that represent their central points.
A skewed histogram is non-symmetric. For a skewed histogram,
the tail on one side is longer than the tail on the other side. A
skewed-to-the-right histogram has a longer tail on the right side
(see Figure 1).
A skewed-to-the-left histogram has a longer tail on the left side
(see Figure 2).

[Figure 1: skewed-to-the-right histogram] [Figure 2: skewed-to-the-left histogram]
DESCRIBING DATA USING NUMERICAL
MEASURES
We have already seen that frequency distributions and graphs are
important components of statistics; however, it is also important to
describe the main characteristics of a data set numerically. We will
discuss two kinds of numerical summary measures, namely measures of:
1. Central tendency
2. Dispersion or spread
1. Measures of Central Tendency
A measure of central tendency gives the center of a histogram or a
frequency distribution curve. We will now discuss four different
measures of central tendency: the mean, the trimmed mean, the
median, and the mode.
I. Mean
The mean is the most frequently used measure of central tendency.

$$\text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}}$$

Mean for population data: $\mu = \frac{\sum x}{N}$

Mean for sample data: $\bar{x} = \frac{\sum x}{n}$
Example: The following are the ages (in years) of all eight employees
of a small company:
53 32 61 27 39 44 49 57
Calculate the mean age of these employees.
The population mean is

$$\mu = \frac{\sum x}{N} = \frac{362}{8} = 45.25$$

If we take a sample of three employees from this company (32, 39, and
57) and calculate the mean age of those three employees, we get

$$\bar{x} = \frac{\sum x}{n} = \frac{32 + 39 + 57}{3} \approx 42.67$$
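
The two calculations above can be checked with a short Python sketch. The function name mean_of is an assumption introduced only for this illustration; the data are the ages from the example.

```python
def mean_of(values):
    """Sum of all values divided by the number of values (hypothetical helper)."""
    return sum(values) / len(values)


# Ages of all eight employees (the population in the example).
ages = [53, 32, 61, 27, 39, 44, 49, 57]
# The three employees picked as a sample in the example.
sample = [32, 39, 57]

population_mean = mean_of(ages)   # 362 / 8
sample_mean = mean_of(sample)     # 128 / 3

print(population_mean)
print(round(sample_mean, 2))
```

The same arithmetic gives both means. Only the interpretation differs: one describes the whole population, the other a sample drawn from it.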
Sometimes a data set may contain a few very small or a few very large
values. Such values are called outliers or extreme values.
The table below lists the total sales of six Palestinian companies for 2014.

Company       Total Sales (Million)
Jawwal        325
Wattania      50
Siniora       55
Unipal        70
Al-juneidi    40
Plaza         45

Find the 2014 mean sales for these six companies.

$$\bar{x} = \frac{\sum x}{n} = \frac{585}{6} = 97.5$$
Notice that the sales of Jawwal are very large compared to those of
the other companies. Hence, it is an outlier. Excluding Jawwal, the mean
of the remaining five companies is:

$$\bar{x} = \frac{\sum x}{n} = \frac{260}{5} = 52$$

We should know that the mean is not always the best measure of
central tendency because it is heavily influenced by outliers.
Sometimes other measures of central tendency give a more accurate
impression of a data set. For example, when a data set has outliers,
instead of using the mean, we can use either the trimmed mean or
the median as a measure of central tendency.
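
To see how strongly a single outlier moves the mean, the sales figures above can be summarized with and without Jawwal. The sketch below uses Python's standard statistics module. The median is included only for comparison, and the variable names are assumptions made for illustration.

```python
from statistics import mean, median

# 2014 total sales (million), copied from the table above.
sales = {
    "Jawwal": 325, "Wattania": 50, "Siniora": 55,
    "Unipal": 70, "Al-juneidi": 40, "Plaza": 45,
}

all_values = list(sales.values())
without_outlier = [v for name, v in sales.items() if name != "Jawwal"]

mean_all = mean(all_values)               # 585 / 6
mean_without = mean(without_outlier)      # 260 / 5
median_all = median(all_values)           # middle of the ranked values
median_without = median(without_outlier)

print(mean_all, mean_without)
print(median_all, median_without)
```

Dropping Jawwal moves the mean from 97.5 to 52, while the median only moves from 52.5 to 50.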
II. Trimmed Mean
The trimmed mean is calculated by dropping a certain percentage of values
from each end of a ranked data set. The trimmed mean is especially useful
as a measure of central tendency when a data set contains a few outliers at
each end.
Example: Suppose the following data give the ages (in years) of 10
employees of a company:
47 53 38 26 39 49 19 67 31 23
To calculate the 10% trimmed mean, first we rank these data values in
increasing order; then drop 10% of the smallest values and 10% of the
largest values. The mean of the remaining 80% of the values will give the
10% trimmed mean.
Ranked data: 19 23 26 31 38 39 47 49 53 67 (19 and 67 are dropped)

$$\bar{x} = \frac{\sum x}{n} = \frac{306}{8} = 38.25$$
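
The trimming procedure described above (rank, drop the same percentage from each end, average the rest) might be sketched as follows. The function trimmed_mean and its percent parameter are assumptions made for this illustration. The sketch also assumes the percentage is small enough that some values remain after trimming.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end.

    Hypothetical helper; `percent` is a whole number such as 10.
    """
    ordered = sorted(values)                 # rank in increasing order
    k = len(ordered) * percent // 100        # values to drop from each end
    kept = ordered[k:len(ordered) - k]       # the remaining middle values
    return sum(kept) / len(kept)


# Ages of the 10 employees from the example.
ages = [47, 53, 38, 26, 39, 49, 19, 67, 31, 23]

result = trimmed_mean(ages, 10)   # drops 19 and 67, averages the other 8
print(result)
```

With 10 values and a 10% trim, exactly one value is removed from each end, which matches the hand calculation.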
III. Median
Another important measure of central tendency is the median which
is the value of the middle term in a data set that has been ranked in
increasing order.
• If n is odd the median is the middle number
• If n is even the median is the mean of the middle two numbers
Example: Suppose the following data give the ages (in years) of 10
employees of a company: 47 53 38 26 39 49 19 67 31 23
First, we rank the given data in increasing order as follows:
19 23 26 31 38 39 47 49 53 67
Since n = 10 is even, the median is the mean of the two middle values:

$$\text{Median} = \frac{38 + 39}{2} = 38.5$$

The advantage of using the median as a measure of central tendency is that it
is not influenced by outliers.
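
The odd/even rule for the median can be written directly in code. This is a minimal sketch: the function name median_of is an assumption, and the final lines replace the largest age with a deliberately extreme, made-up value only to illustrate the robustness claim.

```python
def median_of(values):
    """Middle value of the ranked data (hypothetical helper)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value
        return ordered[mid]
    # n even: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [47, 53, 38, 26, 39, 49, 19, 67, 31, 23]
median_age = median_of(ages)

# Illustration only: replace 67 with an invented extreme value of 670.
with_outlier = [670 if a == 67 else a for a in ages]
median_with_outlier = median_of(with_outlier)

print(median_age, median_with_outlier)
```

The median stays the same even after the largest value is made ten times larger, whereas the mean would change substantially.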
IV. Mode
The mode is the value that occurs with the highest frequency in a
data set.
Example: The following data give the speeds (in miles per hour) of
eight cars that were stopped on a road for speeding violations:
77 82 74 81 79 84 74 78

Find the mode. Because 74 occurs twice and every other value occurs only
once, Mode = 74.

A major shortcoming of the mode is that a data set may have none or
may have more than one mode, whereas it will have only one mean
and only one median.
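
The point that a data set may have no mode, one mode, or several can be illustrated with a small counting sketch. The helper modes_of is an assumption made for this illustration. It follows the convention that a data set in which every value occurs only once has no mode. The second and third lists are invented values used only to show the other two cases.

```python
from collections import Counter


def modes_of(values):
    """Return all values that share the highest frequency.

    Hypothetical helper; returns an empty list when no value repeats.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []          # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


# Speeds (miles per hour) from the example.
speeds = [77, 82, 74, 81, 79, 84, 74, 78]

print(modes_of(speeds))            # one mode
print(modes_of([1, 2, 3]))         # invented data: no mode
print(modes_of([1, 1, 2, 2, 3]))   # invented data: two modes
```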
RELATIONSHIP AMONG THE MEAN, MEDIAN AND
MODE
• As discussed previously, two of the many shapes that a histogram
can assume are symmetric and skewed.
• Knowing the values of the mean, median, and mode can give us
some idea about the shape of a frequency distribution curve.
I. For a symmetric histogram and frequency distribution curve with
one peak (see the figure below), the values of the mean, median, and
mode are identical, and they lie at the center of the distribution.
II. For a histogram skewed to the right (see the figure below), the
value of the mean is the largest, that of the mode is the smallest,
and the value of the median lies between these two. (Notice that
the mode always occurs at the peak point.) The value of the mean
is the largest in this case because it is sensitive to outliers that
occur in the right tail. These outliers pull the mean to the right.
III. If a histogram and a frequency distribution curve are skewed to
the left (see the figure below), the value of the mean is the
smallest and that of the mode is the largest, with the value of the
median lying between these two. In this case, the outliers in the
left tail pull the mean to the left.
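
The ordering of the three measures under skewness can be illustrated numerically. The two small data sets below are hypothetical values invented for this sketch and are not taken from the slides. The second is simply the mirror image of the first.

```python
from statistics import mean, median, mode

# Hypothetical data invented for illustration (long tail on the right).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the data above, so the long tail is on the left.
left_skewed = [16 - x for x in right_skewed]

r_mean, r_median, r_mode = mean(right_skewed), median(right_skewed), mode(right_skewed)
l_mean, l_median, l_mode = mean(left_skewed), median(left_skewed), mode(left_skewed)

print("right-skewed:", r_mode, r_median, r_mean)   # mode < median < mean
print("left-skewed: ", l_mean, l_median, l_mode)   # mean < median < mode
```

In the first set the large values 9 and 15 pull the mean above the median and mode. In the mirrored set the small values pull it below them.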
2. Measures of Dispersion
• The measures of central tendency, such as the mean, median, and
mode, do not reveal the whole picture of the distribution of a data
set. Two data sets with the same mean may have completely
different spreads. The variation among the values of observations
for one data set may be much larger or smaller than for the other
data set. (Note that the words dispersion, spread, and variation
have the same meaning.)
• Consider the following two data sets on the ages (in years) of all
workers working for each of two small companies.

Company 1: 47 38 35 40 36 45 39
Company 2: 70 33 18 52 27
• The mean age of workers in both these companies is the same, 40
years. If we do not know the ages of individual workers at these two
companies and are told only that the mean age of the workers at both
companies is the same, we may deduce that the workers at these two
companies have a similar age distribution.
• As we can observe, however, the variation in the workers’ ages for
each of these two companies is very different. As illustrated in the
diagram, the ages of the workers at the second company have a much
larger variation than the ages of the workers at the first company.

[Diagram: ranked ages of the workers at each company]
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
• Thus, the mean, median, or mode by itself is usually not a sufficient
measure to reveal the shape of the distribution of a data set. We also
need a measure that can provide some information about the variation
among data values.

• The measures that help us learn about the spread of a data set are
called the measures of dispersion. The measures of central tendency
and dispersion taken together give a better picture of a data set than
the measures of central tendency alone. Here we will discuss three
measures of dispersion: range, variance, and standard deviation.
I. Range
The range is the simplest measure of dispersion to calculate. It is
obtained by taking the difference between the largest and the smallest
values in a data set.

Example: The following are the ages (in years) of all eight employees
of a small company:
53 32 61 27 39 44 49 57

Calculate the Range.


Range = Largest value - Smallest value
Range = 61 – 27 = 34

The range, like the mean, has the disadvantage of being influenced
by outliers.
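
As a quick check, the range can be computed for the eight ages above and for the two companies from the dispersion example, whose means are both 40. The function name value_range is an assumption introduced for this sketch.

```python
def value_range(values):
    """Largest value minus smallest value (hypothetical helper)."""
    return max(values) - min(values)


# Ages of the eight employees from the range example.
ages = [53, 32, 61, 27, 39, 44, 49, 57]

# Ages at the two companies from the dispersion example.
company_1 = [47, 38, 35, 40, 36, 45, 39]
company_2 = [70, 33, 18, 52, 27]

print(value_range(ages))                                   # 61 - 27
print(sum(company_1) / len(company_1), value_range(company_1))
print(sum(company_2) / len(company_2), value_range(company_2))
```

Both companies have a mean age of 40, yet the ranges are 12 and 52, which puts a number on the difference in spread described earlier.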
II. Variance and Standard Deviation
• The standard deviation is the most-used measure of dispersion. The
value of the standard deviation tells how closely the values of a data
set are clustered around the mean.

• In general, a lower value of the standard deviation for a data set
indicates that the values of that data set are spread over a relatively
smaller range around the mean. In contrast, a larger value of the
standard deviation for a data set indicates that the values of that data
set are spread over a relatively larger range around the mean.

• The standard deviation is obtained by taking the positive square root
of the variance.
Population variance: $$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

Sample variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$


Example: Suppose the final scores of a sample of four students are 82,
95, 67, and 92, respectively.
Calculate the variance and standard deviation for these data.
The mean score for these four students is

$$\bar{x} = \frac{82 + 95 + 67 + 92}{4} = 84$$
x     x − x̄             (x − x̄)²
82    82 − 84 = −2      4
95    95 − 84 = 11      121
67    67 − 84 = −17     289
92    92 − 84 = 8       64
      Σ(x − x̄) = 0      Σ(x − x̄)² = 478

Sample variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{478}{3} \approx 159.3$$

Sample standard deviation: $$s = \sqrt{s^2} = \sqrt{159.3} \approx 12.62$$
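
The table-based calculation above can be mirrored step by step in Python. This is a sketch under the definitions given earlier (division by n − 1 for a sample). The function name sample_variance is an assumption, and the result is cross-checked against statistics.variance from the standard library.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1.

    Hypothetical helper that follows the table column by column.
    """
    n = len(values)
    x_bar = sum(values) / n
    deviations = [x - x_bar for x in values]        # (x - x_bar) column
    squared = [d ** 2 for d in deviations]          # (x - x_bar)^2 column
    return sum(squared) / (n - 1)


# Final scores of the four students in the example.
scores = [82, 95, 67, 92]

s2 = sample_variance(scores)   # 478 / 3
s = math.sqrt(s2)              # positive square root of the variance

print(round(s2, 1), round(s, 2))
print(statistics.variance(scores))  # library value for comparison
```

Dividing by N instead of n − 1 (as in statistics.pvariance) would give the population variance, which is appropriate only when the four scores are the entire population.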


Alternative formula for the sample variance
and standard deviation:
