Unit 1,2 Introduction N Summarization

Uploaded by

DIV3N
Copyright
© All Rights Reserved
STATISTICAL METHODS

Unit 1
Data Introduction
and Summarization
WHAT IS STATISTICS?
• Statistics is the science of learning from data.
• It is concerned with:
  o data collection
  o data analysis
  o data interpretation
TYPES OF STATISTICS
Descriptive Statistics
• It deals with collecting, summarizing and simplifying data that are otherwise unwieldy and voluminous.
• When the population of interest is small, we can directly describe the important aspects of the population measurements.

Inferential Statistics
• It is the science of using a sample to make generalizations about the important aspects of a population.
• A descriptive value for a population is called a parameter, and a descriptive value for a sample is called a statistic.
STATISTICAL DATA
• Statistical data are the basic raw material of statistics.
• The term refers to those aspects of a problem situation that can be measured, quantified or counted.
STATISTICAL DATA AND ITS USES
• Data are facts and figures from which conclusions can be drawn. These conclusions support decision making in many professions and organizations. For example:
  o Economists use conclusions drawn from the latest data on unemployment and inflation to help the government make policy decisions.
  o Financial planners use recent trends in stock market prices and economic conditions to make investment decisions.
  o Accountants use sample data on a company's actual sales revenues to assess whether the company's claimed sales revenues are valid.
  o Marketing professionals use data that reveal consumer preferences to help businesses decide which products to develop and market.
  o Production supervisors use manufacturing data to evaluate, control and improve product quality.
  o Politicians rely on data from public opinion polls to formulate legislation and to devise campaign strategies.
  o Physicians and hospitals use data on the effectiveness of drugs
APPLICATIONS OF STATISTICS IN
MANAGEMENT AND INDUSTRY
• Location of Plant
• Size of Plant
• Production Planning
• Quality Control
• Finance Decisions
• Marketing Decisions
• Personnel Decisions
• Purchase Decisions
• Sales Decisions
• Accounting Decisions
DATA SOURCES
Data sources are of two types:
• Secondary
• Primary
Secondary data: Data that already exist in some form, published or unpublished, in an identifiable secondary source. They are generally available from published sources, though not necessarily in the form actually required.
Primary data: Data that do not already exist in any form and thus have to be collected for the first time from the primary source(s). By their very nature, these data require fresh, first-time collection covering either the whole population or a sample drawn from it.
TYPES OF DATA
• In statistics, data are classified into two broad categories: quantitative and qualitative.
• Quantitative data: Data that can be quantified in definite units of measurement.
  o Discrete data, e.g. the number of customers visiting a departmental store every day, the number of incoming flights at an airport, or the number of defective items in a consignment received for sale.
  o Continuous data, e.g. characteristics such as weight, length, height, thickness, velocity and temperature.
TYPES OF DATA
• Qualitative data: Data that refer to the qualitative characteristics of a subject or an object.
  o Nominal data: The outcome of classifying the items or units comprising a sample or a population into two or more categories according to some quality characteristic, e.g. classification of students according to gender (as males and females), of workers according to skill (as skilled, semi-skilled and unskilled) and of employees according to level of education (as matriculates, undergraduates and post-graduates).
TYPES OF DATA
• Rank data
  o They are the result of assigning ranks to specify order in terms of the integers 1, 2, 3, ..., n.
  o Ranks may be assigned according to the level of performance in a test, a contest, a competition, an interview or a show. For example, the candidates appearing in an interview may be assigned ranks from 1 to n, depending on their performance in the interview.
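As a rough illustration of rank data, the sketch below assigns ranks 1 to n to interview candidates by score. The candidate names and scores are hypothetical and are not taken from the slides; ties are not handled, which a real ranking scheme would need to address.

```python
# Hypothetical interview scores (illustrative only; not from the slides).
scores = {
    "Candidate A": 72,
    "Candidate B": 88,
    "Candidate C": 65,
    "Candidate D": 91,
}

# Sort the candidates from best to worst score, then assign ranks 1, 2, ..., n.
# Assumption: no two candidates share a score, so no tie-breaking rule is needed.
ordered = sorted(scores, key=scores.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

for name in ordered:
    print(f"{ranks[name]}  {name}  (score {scores[name]})")
```

Note that the ranks keep only the order of the candidates; the size of the gap between two scores is lost.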
VARIABLES
• A variable is a characteristic or condition that can change or take on different values.
• Most research begins with a general question about the relationship between two variables for a specific group of individuals.
POPULATION
• A population is the set of all elements about which we wish to draw conclusions.

SAMPLE
• Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
• A sample is a subset of the elements of a population.
DATA CLASSIFICATION AND PRESENTATION

Meaning and Definition of Data Classification

"Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts" -- Secrist
METHODS OF CLASSIFICATION

Every item of the collected data has its own characteristics. These characteristics can be of two types:

(i) Descriptive (e.g. honesty, beauty): characteristics that cannot be measured directly but are counted on the basis of their presence or absence (non-measurable characteristics, or attributes).

(ii) Numerical (e.g. height, weight, profit)
TYPES OF CLASSIFICATION

Statistical data can be classified in two ways:

(1) Qualitative classification

(2) Quantitative classification

Qualitative classification can be of two types:

• Dichotomy or Two-fold Classification
• Manifold Classification

Students
   Male
      Employed
      Unemployed
   Female
      Employed
      Unemployed
QUANTITATIVE CLASSIFICATION

Classification of data on the basis of phenomena that are capable of quantitative measurement, such as age, height, weight, prices, production, income, expenditure, sales and profits.

The main methods of such classification are:

(i) Geographical Classification

(ii) Chronological Classification

(iii) Variable Classification

(a) Continuous Variable (b) Discrete Variable

(i) Geographical Classification: This type of classification is based on geographical or location differences between the items in the data, such as states, cities, regions or zones. For example, the yield of agricultural output per hectare for different countries in a given period may be presented as follows:

Agricultural Output of Different Countries (in kg per hectare)

Country        India   USA   Pakistan   Japan   China
Avg. Output    125     585   140        410     330
(ii) Chronological Classification: When data are classified with respect to different periods of time (hour, day, week, month, year, etc.), it is known as chronological or temporal classification. For example, the population of India for different decades may be presented as follows:

Population of India (in crores)

Year          1951   1961   1971   1981   1991   2001
Population    36.1   43.9   54.7   68.5   84.4   102.7
(iii) Variable Classification: Classification of data according to the values of a variable is known as variable classification. Variables are of two kinds:

(a) Discrete variable (b) Continuous variable

Classification based on discrete values

Height (cms.)   No. of Students
154              8
155             10
156              6
157              2
158             12
159             12

Classification based on continuous values

Income (Rs.)    No. of Employees
1000-1500        15
1500-2000        33
2000-2500        22
2500-3000        18
3000-3500        12
Total           100
TABULAR AND GRAPHICAL METHODS
• Summarizing Qualitative Data
• Summarizing Quantitative Data
• Exploratory Data Analysis
• Scatter Diagrams
SUMMARIZING QUALITATIVE DATA
• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart
FREQUENCY DISTRIBUTION
• A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes.
• The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
EXAMPLE: MARADA INN
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average or poor. The ratings provided by a sample of 20 guests are shown below.

Below Average   Average         Above Average   Average
Average         Below Average   Poor            Above Average
Poor            Above Average   Below Average   Average
Above Average   Average         Above Average   Above Average
Above Average   Above Average   Excellent       Above Average
EXAMPLE: MARADA INN
• Frequency Distribution

Rating          Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total           20
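The frequency distribution above can be reproduced by tallying the 20 ratings. The following is a minimal sketch using Python's `collections.Counter`; the variable names are illustrative, and the order in which the ratings are listed does not affect the counts.

```python
from collections import Counter

# Ratings of the 20 sampled Marada Inn guests; order is irrelevant for counting.
ratings = [
    "Below Average", "Average", "Above Average", "Average",
    "Average", "Below Average", "Poor", "Above Average",
    "Poor", "Above Average", "Below Average", "Average",
    "Above Average", "Average", "Above Average", "Above Average",
    "Above Average", "Above Average", "Excellent", "Above Average",
]

# Classes listed from worst to best so the table keeps its natural order.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)                      # tally each distinct rating
frequency = {c: counts[c] for c in classes}    # reorder into the class order

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
```

Because the classes are non-overlapping, every rating falls in exactly one class and the frequencies sum to the sample size.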
RELATIVE FREQUENCY DISTRIBUTION
• The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
• A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
PERCENT FREQUENCY DISTRIBUTION
• The percent frequency of a class is the relative frequency multiplied by 100.
• A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
EXAMPLE: MARADA INN
• Relative Frequency and Percent Frequency Distributions

Rating          Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                   5
Total           1.00                 100
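The relative and percent frequencies follow directly from the class frequencies: divide each frequency by the sample size, then multiply by 100. A short sketch, assuming the frequencies from the Marada Inn frequency distribution (variable names are illustrative):

```python
# Class frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}
n = sum(frequency.values())  # sample size (20 guests)

# Relative frequency = class frequency / total number of data items.
relative = {rating: f / n for rating, f in frequency.items()}
# Percent frequency = relative frequency * 100 (computed as 100 * f / n).
percent = {rating: 100 * f / n for rating, f in frequency.items()}

for rating in frequency:
    print(f"{rating:<15}{relative[rating]:>6.2f}{percent[rating]:>6.0f}")
print(f"{'Total':<15}{sum(relative.values()):>6.2f}{sum(percent.values()):>6.0f}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.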
BAR GRAPH
• A bar graph is a graphical device for depicting qualitative data.
• On the horizontal axis we specify the labels that are used for each of the classes.
• A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
• Using a bar of fixed width drawn above each class label, we extend the height of the bar to the value for that class on the vertical scale.
• The bars are separated to emphasize the fact that each class is a separate category.
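EXAMPLE: MARADA INN
• Bar Graph
[Figure: bar graph of the Marada Inn quality ratings, with Rating (Poor, Below Average, Average, Above Average, Excellent) on the horizontal axis and Frequency (scale 1 to 9) on the vertical axis]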
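For readers without the figure, a text rendering gives the same picture. The sketch below is one possible way to do it, not the slide's own method: it draws the bars horizontally (one row per class) rather than vertically, and the function name `text_bar_graph` is an illustrative choice.

```python
# Class frequencies from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

def text_bar_graph(freq, symbol="#"):
    """Return one line per class: padded label, a bar of symbols, and the count."""
    width = max(len(label) for label in freq)  # pad labels to a common width
    return [
        f"{label:<{width}} | {symbol * count} {count}"
        for label, count in freq.items()
    ]

for line in text_bar_graph(frequency):
    print(line)
```

Each class gets its own separate bar of fixed thickness, and only the bar length varies with the frequency.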
PIE CHART
• The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
• First draw a circle, then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
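The same calculation can be applied to every class. A small sketch, assuming the relative frequencies from the Marada Inn example (variable names are illustrative):

```python
# Relative frequencies from the Marada Inn relative frequency distribution.
relative = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}

# Each sector's central angle is its relative frequency times 360 degrees.
angles = {rating: rf * 360 for rating, rf in relative.items()}

for rating, angle in angles.items():
    print(f"{rating:<15}{angle:>6.1f} degrees")
print(f"{'Total':<15}{sum(angles.values()):>6.1f} degrees")
```

Since the relative frequencies sum to 1, the sector angles sum to the full 360 degrees.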
EXAMPLE: MARADA INN
• Pie Chart
[Figure: pie chart titled "Quality Ratings" with sectors Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]
EXAMPLE: MARADA INN
• Insights Gained from the Preceding Pie Chart
  o One-half of the customers surveyed gave Marada a quality rating of "above average" or "excellent" (looking at the left side of the pie). This might please the manager.
  o For each customer who gave an "excellent" rating, there were two customers who gave a "poor" rating (looking at the top of the pie). This should displease the manager.
EXPLORATORY DATA ANALYSIS
• The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly.
• One such technique is the stem-and-leaf display.
STEM-AND-LEAF DISPLAY
• A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
• It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
• The first digits of each data item are arranged to the left of a vertical line.
• To the right of the vertical line we record the last digit for each item in rank order.
• Each line in the display is referred to as a stem.
• Each digit on a stem is a leaf.

8 | 5 7
9 | 3 6 7 8
STEM-AND-LEAF DISPLAY
• Leaf Units
  o A single digit is used to define each leaf.
  o In the preceding example, the leaf unit was 1.
  o Leaf units may be 100, 10, 1, 0.1, and so on.
  o Where the leaf unit is not shown, it is assumed to equal 1.
EXAMPLE: LEAF UNIT = 0.1
If we have data with values such as
8.6 11.7 9.4 9.1 10.2 11.0 8.8
a stem-and-leaf display of these data will be

Leaf Unit = 0.1

 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7
EXAMPLE: HUDSON AUTO REPAIR
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts, rounded to the nearest dollar, are listed below.

 91  78  93  57  75  52  99  80  97  62
 71  69  72  89  66  75  79  75  72  76
104  74  62  68  97 105  77  65  80 109
 85  97  88  68  83  68  71  69  67  74
 62  82  98 101  79 105  79  69  62  73
EXAMPLE: HUDSON AUTO REPAIR
• Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9
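The construction rule (leading digits as the stem, last digit as the leaf, leaves in rank order) can be sketched in a few lines. This is one possible implementation under stated assumptions: the function name `stem_and_leaf` and its `leaf_unit` parameter are illustrative, values are assumed non-negative, and each value is rounded to a whole number of leaf units.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the display as a list of 'stem | leaves' strings.

    Each value is expressed in leaf units; its last digit becomes the leaf
    and the remaining leading digits become the stem.
    Assumes non-negative values.
    """
    stems = defaultdict(list)
    for value in values:
        units = round(value / leaf_unit)   # e.g. 8.6 with leaf unit 0.1 -> 86
        stem, leaf = divmod(units, 10)     # 86 -> stem 8, leaf 6
        stems[stem].append(leaf)
    lines = []
    # Walk every stem from smallest to largest, keeping empty stems
    # so that gaps in the data remain visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in sorted(stems.get(stem, [])))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Costs of parts from the 50 Hudson Auto Repair invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

for line in stem_and_leaf(costs):
    print(line)
```

Calling `stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1)` handles the earlier leaf unit = 0.1 data in the same way, giving stems 8 through 11.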
SCATTER DIAGRAM
• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the overall relationship between the variables.
EXAMPLE: PANTHERS FOOTBALL TEAM
• Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of     y = Number of
Interceptions     Points Scored
1                 14
3                 24
2                 18
1                 17
3                 27
EXAMPLE: PANTHERS FOOTBALL TEAM
• Scatter Diagram
[Figure: scatter diagram of the five games, with Number of Interceptions (x, 0 to 3) on the horizontal axis and Number of Points Scored (y, 0 to 30) on the vertical axis]
EXAMPLE: PANTHERS FOOTBALL TEAM
• The preceding scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
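Both observations can be checked numerically from the five data points. The sketch below is an informal check rather than a formal measure of association: it averages the points scored at each interception count and tests whether all points fall on one line. The helper name `all_on_one_line` is illustrative.

```python
from collections import defaultdict

# (interceptions, points scored) for the five games in the table.
games = [(1, 14), (3, 24), (2, 18), (1, 17), (3, 27)]

# Average points scored at each interception count.
by_x = defaultdict(list)
for x, y in games:
    by_x[x].append(y)
mean_points = {x: sum(ys) / len(ys) for x, ys in sorted(by_x.items())}
print(mean_points)  # {1: 15.5, 2: 18.0, 3: 25.5}

def all_on_one_line(points):
    """True if every point lies on the line through the first two points.

    Assumes the first two points are distinct. Uses cross-multiplication
    so no division (and no undefined slope) is involved.
    """
    (x0, y0), (x1, y1) = points[0], points[1]
    return all((x1 - x0) * (y - y0) == (y1 - y0) * (x - x0) for x, y in points)

print(all_on_one_line(games))  # False
```

The averages rise with the number of interceptions, which matches the positive relationship, while the failed straight-line check matches the remark that the relationship is not perfect.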
SCATTER DIAGRAM
Typical patterns (each plotted with x on the horizontal axis and y on the vertical axis):
• A Positive Relationship
• A Negative Relationship
• No Apparent Relationship
TABULAR AND GRAPHICAL PROCEDURES
Data
• Qualitative Data
  o Tabular Methods: Frequency Distribution, Rel. Freq. Dist., % Freq. Dist., Crosstabulation
  o Graphical Methods: Bar Graph, Pie Chart
• Quantitative Data
  o Tabular Methods: Frequency Distribution, Rel. Freq. Dist., Cum. Freq. Dist., Cum. Rel. Freq. Distribution, Stem-and-Leaf Display
  o Graphical Methods: Histogram, Ogive, Scatter Diagram
