Topic 5 Statistics and Introduction

The document discusses the importance of studying statistics, highlighting its relevance in data collection, presentation, analysis, and interpretation across various fields. It outlines the characteristics, types, and limitations of statistics, as well as the distinction between populations and samples. Additionally, it emphasizes the necessity of statistical methods in making informed decisions in professional and everyday contexts.

Uploaded by

msrect14
Copyright
© All Rights Reserved

“A knowledge of statistics is like a knowledge of foreign languages or of algebra; it may prove of use at any time under any circumstances.”
– Bowley
Why study statistics?
1. Data are everywhere.
2. Statistical techniques are used to make many decisions that affect our lives.
3. No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.
Statistics
- Refers to a set of pertinent activities such as the collection, presentation, analysis, and interpretation of quantitative data.
- A field of study which deals with the mathematical characterization of a group or groups of items.
Collection of Data
- Refers to the process of gathering numerical information. Methods of gathering pertinent information include interviews, questionnaires, experiments, observation, and documentary analysis.
Presentation of Data
- Once the data are gathered, the next step is to present them in appropriate tables and graphs. Such tables refer to frequency distributions, which may be either one-dimensional or two-dimensional. Graphical presentation includes bar graphs, frequency polygons, pie graphs, and many others.
Analysis of Data
- Refers to the activity of describing the properties or behaviour of the data, or the possible correlation of different quantities or variables. Such a description can be obtained after summarizing the data into measures such as averages.
Interpretation
- Has to be made based on the preliminary activities and other statistical methods. Such methods involve testing the significance of the results.
Characteristics of Statistics
a) Aggregate of facts/data
b) Numerically expressed
c) Affected by different factors
d) Collected or estimated
e) Reasonable standard of accuracy
f) Predetermined purpose
g) Comparable
h) Systematic collection.
TYPES OF STATISTICS
Descriptive Statistics
Descriptive statistics deals with collecting, summarizing, and simplifying data, which are otherwise quite unwieldy and voluminous. It seeks to do this in such a manner that meaningful conclusions can be readily drawn from the data. Descriptive statistics may thus be seen as comprising methods of bringing out and highlighting the latent characteristics present in a set of numerical data. It not only facilitates an understanding of the data and their systematic reporting but also makes them amenable to further discussion, analysis, and interpretation.
Methods of organizing, summarizing, and presenting data in an informative way.

Thus, the task of the statistician in this area is simply to select a few procedures, do some averaging, and eventually identify the significant features of the given data.
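As a small illustration of "doing some averaging", the sketch below summarizes a set of quiz scores with Python's standard `statistics` module. The scores are hypothetical numbers invented for this example and do not come from the document.

```python
import statistics

# Hypothetical raw data: quiz scores of ten students (illustrative only).
scores = [78, 85, 92, 67, 85, 74, 88, 95, 70, 85]

# A few summary measures that describe the data set as a whole.
summary = {
    "count": len(scores),
    "minimum": min(scores),
    "maximum": max(scores),
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
}

for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

Six numbers now stand in for the ten raw scores. This is the kind of simplification descriptive statistics aims for.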
Inferential Statistics
Inferential statistics, also known as inductive statistics, goes beyond describing a given problem situation by collecting, summarizing, and meaningfully presenting the related data. Instead, it consists of methods that are used for drawing inferences, or making broad generalizations, about a totality of observations on the basis of knowledge about a part of that totality. The totality of observations about which an inference may be drawn, or a generalization made, is called a population or a universe.
Thus, the task of the statistician here is not just to devise ways of giving a summary description of the data but also ways of testing the significance of the results.
Scope of Statistics
Statistical Methods: collection, classification, tabulation, presentation, analysis, interpretation, and forecasting.
Applied Statistics
Applied statistics is further divided into three parts:
Descriptive Applied Statistics: The purpose of this analysis is to provide descriptive information.
Scientific Applied Statistics: Data are collected for the purpose of some scientific research, and with the help of these data some particular theory or principle is propounded.
Business Applied Statistics: Under this branch, statistical methods are used for the study, analysis, and solution of various problems in the field of business.
Functions and Importance/Utility of Statistics
Statistical methods are used not only in the social, economic, and political fields but in every field of science and knowledge. Statistical analysis has become more significant in global relations and in the age of fast-developing information technology.
According to Prof. Bowley, “The proper function of statistics is to enlarge individual experiences.”
Following are some of the important functions of statistics:
a) To provide numerical facts.
b) To simplify complex facts.
c) To enlarge human knowledge and experience.
d) To help in the formulation of policies.
e) To provide comparison.
f) To establish mutual relations.
g) To help in forecasting.
h) To test the accuracy of scientific theories.
i) To study extensively and intensively.
The use of statistics has become almost essential for clearly understanding and solving a problem. Statistics proves very useful in unfamiliar fields of application and in complex situations such as:
a) Planning
b) Administration
c) Economics
d) Trade & Commerce
e) Production management
f) Quality control
g) Inspection
h) Insurance business
i) Railways & transport companies
j) Banking institutions
k) Speculation and gambling
l) Underwriters and investors
m) Politicians & social workers
LIMITATIONS OF STATISTICS
(i) There are certain phenomena or concepts where statistics cannot be used, because these phenomena or concepts are not amenable to measurement. For example, beauty, intelligence, and courage cannot be quantified. Statistics has no place in cases where quantification is not possible.
(ii) Statistics reveal the average behaviour, the normal, or the general trend. Applying the 'average' concept to an individual or a particular situation may lead to a wrong conclusion, which can sometimes be disastrous. For example, one may be misguided when told that the average depth of a river from one bank to the other is four feet, when there may be some points in between where its depth is far more than four feet. On this understanding, one may enter those deeper points, which may be hazardous.
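The river example can be checked with a few numbers. The sketch below uses hypothetical depth readings (invented for illustration, in feet) taken at equal intervals across a river. It shows an average of exactly four feet alongside points that are twice as deep.

```python
# Hypothetical depth readings (feet) at equal intervals from one bank to the other.
depths = [1, 2, 3, 7, 8, 2, 2, 3, 4, 8]

average_depth = sum(depths) / len(depths)           # the "four feet" one is told about
deepest = max(depths)                               # what the average hides
deeper_points = [d for d in depths if d > average_depth]

print(f"Average depth: {average_depth} ft")
print(f"Deepest point: {deepest} ft")
print(f"Readings deeper than the average: {deeper_points}")
```

The single average of 4.0 ft is accurate, yet three of the ten readings are 7 ft or more.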
(iii) Since statistics are collected for a particular purpose, such data may not be relevant or useful in other situations or cases. For example, secondary data (i.e., data originally collected by someone else) may not be useful to another person.
(iv) Statistics are not 100 per cent precise in the way Mathematics or Accountancy are. Those who use statistics should be aware of this limitation.
(v) In statistical surveys, sampling is generally used because it is not physically possible to cover all the units or elements comprising the universe. The results may therefore not be appropriate for the universe as a whole. Moreover, different surveys that use the same sample size but different sample units may yield different results.
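Limitation (v) can be seen in a small simulation. The sketch below builds a hypothetical population of 1,000 household incomes. These are random numbers, not real data. It then runs two "surveys" of the same sample size, each with a different random seed. The two sample means will generally differ from each other and from the population mean.

```python
import random
import statistics

# Hypothetical population of 1,000 household incomes (random, illustrative only).
population_rng = random.Random(2024)
population = [population_rng.randint(5_000, 60_000) for _ in range(1_000)]
population_mean = statistics.mean(population)

# Two surveys with the same sample size (50) but different sample units.
survey_a = random.Random(1).sample(population, 50)
survey_b = random.Random(2).sample(population, 50)

print(f"Population mean: {population_mean:.2f}")
print(f"Survey A mean:   {statistics.mean(survey_a):.2f}")
print(f"Survey B mean:   {statistics.mean(survey_b):.2f}")
```

Neither survey is wrong. Each simply reflects the units that happened to be selected.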
(vi) At times, the association or relationship between two or more variables is studied in statistics, but such a relationship does not indicate a cause-and-effect relationship. It simply shows the similarity or dissimilarity in the movement of the two variables. In such cases, the user has to interpret the results carefully, pointing out the type of relationship obtained.
(vii) A major limitation of statistics is that it does not reveal everything pertaining to a certain phenomenon. There is some background information that statistics does not cover. Similarly, some other aspects of the problem at hand are also not covered. The user of statistics has to be well informed and should interpret statistics with all other aspects relevant to the given problem in mind.
SAMPLE AND POPULATION
Suppose a statistician is interested in knowing the following:
1. The percentage of all families in the Philippines who earned less than Php 120,000.00 a year in 2010.
2. The 2011 gross sales of all companies in Manila.
3. The prices of all statistics books published in the Philippines during the past five years.
In these examples, the statistician is interested in all families, all firms, and all statistics books. Each of these groups is called the population for the respective example. In statistics, a population does not necessarily mean a collection of people; it can be a collection of people or of any kind of items, such as books, television sets, or cars.
POPULATION OR TARGET POPULATION
A population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population being studied is called the target population.
SAMPLE
A portion of the population selected for study.
Most of the time, decisions are made based on portions of populations. For example, the various election polls conducted during a presidential election to estimate the percentage of voters favouring the various candidates are based on only a few hundred or a few thousand voters selected from various precincts.
REPRESENTATIVE SAMPLE
A sample that represents the characteristics of the population as closely as possible.
As an example, to find the average income of families living in Quezon City by conducting a sample survey, the sample must contain families from the different income groups in almost the same proportions as they exist in the population.
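One common way to get "almost the same proportions" is proportional allocation, as used in stratified sampling. Each income group receives a share of the sample equal to its share of the population. The group sizes and the sample size of 200 below are hypothetical values chosen for illustration.

```python
# Hypothetical numbers of families in each income group (illustrative only).
population = {"low income": 6000, "middle income": 3000, "high income": 1000}
sample_size = 200

N = sum(population.values())

# Proportional allocation: each group's share of the sample equals its share of the population.
allocation = {
    group: round(sample_size * count / N)
    for group, count in population.items()
}

for group, n_group in allocation.items():
    share = 100 * population[group] / N
    print(f"{group:<14} {share:5.1f}% of families -> {n_group} sampled")
```

With less convenient numbers, rounding can make the group sizes add up to one more or one less than the intended sample size. That small adjustment is left to the researcher.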
RANDOM SAMPLE
A sample drawn in such a way that each element of the population has an equal chance of being selected.
One way to select a random sample is by lottery or draw. For example, if we are to select five employees from a total of 50, we write each of the 50 names on a separate piece of paper. Then we place all 50 slips in a box and mix them thoroughly. Finally, we randomly draw five slips from the box. The five names drawn give a random sample.
If we arrange all 50 names alphabetically and then select the first five names on the list, the result is a nonrandom sample, because the employees listed sixth to fiftieth have no chance of being included in the sample.
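The lottery method has a direct software counterpart. The sketch below uses Python's `random.sample`, which draws distinct items so that each name has the same chance of selection. It uses a hypothetical roster of 50 placeholder names and contrasts the draw with the nonrandom "first five alphabetically" choice.

```python
import random

# Hypothetical roster of 50 employees (placeholder names for illustration).
employees = [f"Employee {i:02d}" for i in range(1, 51)]

# Lottery method: draw 5 distinct names, every name equally likely to be chosen.
rng = random.Random()          # e.g. random.Random(7) would make the draw reproducible
random_sample = rng.sample(employees, 5)

# Nonrandom alternative from the text: the first five names of the alphabetical list.
first_five = sorted(employees)[:5]

print("Random sample:    ", random_sample)
print("Nonrandom sample: ", first_five)
```

Running the script several times gives different random samples. The nonrandom sample is always the same five names.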
A common problem encountered by the researcher is determining the sample size. What should the sample size be as a percentage of the population size? If we take only a part of the given population as the sample, not all members of the population are taken into consideration. In this regard, we have to consider the margin of error that results from sampling.
To compute the sample size n relative to the population size N, we use the formula

$n = \dfrac{N}{1 + N e^2}$

where: N is the population size
e is the margin of error
n is the sample size
Example:
A researcher is investigating the factors affecting the efficiency of the 185 faculty members of a certain college. If he wants a margin of error of 5%, how many of the faculty members should be taken as respondents?
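By hand, with N = 185 and e = 0.05: $n = \dfrac{185}{1 + 185(0.05)^2} = \dfrac{185}{1.4625} \approx 126.5$. The sketch below does the same arithmetic. This formula is commonly called Slovin's formula. The helper name `slovin_sample_size` is our own, hypothetical choice. Rounding up to a whole respondent is an assumption we make so the margin of error is not exceeded.

```python
import math

def slovin_sample_size(N, e):
    """Return (exact n, n rounded up) for n = N / (1 + N * e**2)."""
    n = N / (1 + N * e ** 2)
    return n, math.ceil(n)   # round up: a fraction of a respondent is not possible

exact, respondents = slovin_sample_size(185, 0.05)
print(f"n = {exact:.2f}, so take {respondents} respondents")  # n = 126.50, so take 127 respondents
```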
Types of Variables
Quantitative Variables
A variable that can be measured numerically. The data collected about a quantitative variable are called quantitative data.
Examples: income, gross sales, prices of homes, number of cars owned, stock prices, and number of accidents.
DISCRETE VARIABLES
A variable whose values are countable.
It can assume only certain values with
no intermediate values.

CONTINUOUS VARIABLES
A variable that can assume any
numerical value over a certain interval
or intervals.
QUALITATIVE OR CATEGORICAL VARIABLES
A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories. The data collected about such a variable are called qualitative data.
Examples: the softness of your skin, the grace with which you run, and the colour of your eyes.
Data can also be classified based on scales of measurement, also called levels of measurement. There are four scales of measurement: nominal, ordinal, interval, and ratio. Nominal-scale data are of the lowest level and ratio-scale data are of the highest level.
NOMINAL SCALE
applies to data that are divided into different
categories and these are used only for
identification purposes.
ORDINAL SCALE
applies to data that are divided into different
categories that can be ranked.
INTERVAL SCALE
applies to data that can be ranked and for
which the difference between two values can
be calculated and interpreted.
RATIO SCALE
applies to data that can be ranked and for
which all arithmetic operations can be done.
Summation notation is used to denote the sum of values. The uppercase Greek letter Σ (pronounced sigma) is used to denote the sum of all values.
Consider the following paired
observations represented by the
variables x and y.

x : 3, 2, 8, 6, 2, 0
y : 5, 0, 8, 3, 2, 5
Evaluate the following expressions:
1. $\sum x$
2. $\sum y$
3. $\sum x^2$
4. $\left(\sum x\right)^2$
5. $\sum (xy)$
6. $\sum x \sum y$
Tips:
1. The summation of x, $\sum x$, is simply the sum of all the values represented by the variable x.
2. The value of $\sum y$ is the sum of all values of y.
3. The expression $\sum x^2$ implies that the values represented by the variable x should be squared first before being added.
4. The expression $\left(\sum x\right)^2$ implies that the quantities represented by x should be added first before the sum is squared.
5. The expression $\sum (xy)$ implies that each value of x should first be multiplied by its corresponding value of y. Then, the products are added.
6. The expression $\sum x \sum y$ refers to the product of the summation of x and the summation of y. Thus, to get the product, we first get the sum of the values represented by x and the sum of the values represented by y.
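The six expressions can be checked with a short Python script, following the tips in order. The variable names are our own. The values in the comments come from the x and y data given above.

```python
x = [3, 2, 8, 6, 2, 0]
y = [5, 0, 8, 3, 2, 5]

sum_x = sum(x)                               # Σx    = 21
sum_y = sum(y)                               # Σy    = 23
sum_x_squared = sum(v ** 2 for v in x)       # Σx²   = 117 (square first, then add)
square_of_sum_x = sum(x) ** 2                # (Σx)² = 441 (add first, then square)
sum_xy = sum(a * b for a, b in zip(x, y))    # Σ(xy) = 101 (multiply pairs, then add)
product_of_sums = sum(x) * sum(y)            # ΣxΣy  = 483 (product of the two sums)

print(sum_x, sum_y, sum_x_squared, square_of_sum_x, sum_xy, product_of_sums)
```

Note that Σx² (117) and (Σx)² (441) differ, as do Σ(xy) (101) and ΣxΣy (483). The order of operations matters.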
$\sum_{k=1}^{4} (30 - k)$
$\sum_{k=1}^{3} (2k - 5)$
$\sum_{k=1}^{7} 3k$
$\sum_{k=1}^{5} (4k + 2)$
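An indexed sum such as $\sum_{k=1}^{4} (30 - k)$ means: substitute k = 1, 2, 3, 4 into the expression and add the results. The sketch below defines a small helper for this. The name `summation` is hypothetical, not a standard library function. The helper then evaluates the four exercises.

```python
def summation(term, lower, upper):
    """Return term(lower) + term(lower + 1) + ... + term(upper)."""
    return sum(term(k) for k in range(lower, upper + 1))

print(summation(lambda k: 30 - k, 1, 4))     # 29 + 28 + 27 + 26 = 110
print(summation(lambda k: 2 * k - 5, 1, 3))  # -3 + (-1) + 1 = -3
print(summation(lambda k: 3 * k, 1, 7))      # 3(1 + 2 + ... + 7) = 84
print(summation(lambda k: 4 * k + 2, 1, 5))  # 6 + 10 + 14 + 18 + 22 = 70
```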
Generally, data collected from different sources are unorganized and in a form unsuitable for immediate interpretation. In any statistical investigation, once the pertinent data have been gathered, the next step is to present them in organized form using appropriate tables and graphs.
When the mass of data is very large, it can hardly provide information that helps in making fast decisions. The management of a modern business demands fast and accurate decisions. Otherwise, the business can be ruined by competitors who know how to summarize a large mass of data and how to interpret it.
One such summary table is called a frequency distribution. A frequency distribution is an arrangement of the data that shows the frequency of different values, or groups of values, of a variable.
Ages (in years) of 100 Residents of Panhulan, Agoncillo, Batangas
14 27 27 23 29 21 20 12 22 17
23 24 18 20 27 16 12 22 19 19
15 20 29 25 24 20 20 17 18 18
12 22 23 17 23 26 16 21 21 20
17 18 26 18 28 27 18 22 19 16
14 16 19 20 20 18 25 19 26 15
28 13 18 17 14 27 24 20 18 25
17 20 23 18 18 24 19 19 14 18
21 21 25 24 14 25 20 17 17 17
15 12 26 23 17 20 24 25 18 15
What is the lowest age?
What is the highest age?
How many of the residents are 20 years old?
How many of the residents are 13 years old?
Did you find it difficult to look for the requested data?
What is your recommendation?
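The questions above can also be answered with a few lines of Python. The sketch below reads the 100 ages exactly as listed in the table, counts each value with `collections.Counter`, and prints the answers. Sorting the list at the end does the same job as the arranged table that follows.

```python
from collections import Counter

# The 100 ages from the table above, copied row by row.
AGES_TABLE = """
14 27 27 23 29 21 20 12 22 17
23 24 18 20 27 16 12 22 19 19
15 20 29 25 24 20 20 17 18 18
12 22 23 17 23 26 16 21 21 20
17 18 26 18 28 27 18 22 19 16
14 16 19 20 20 18 25 19 26 15
28 13 18 17 14 27 24 20 18 25
17 20 23 18 18 24 19 19 14 18
21 21 25 24 14 25 20 17 17 17
15 12 26 23 17 20 24 25 18 15
"""
ages = [int(token) for token in AGES_TABLE.split()]

counts = Counter(ages)  # how many times each age occurs

print("Lowest age: ", min(ages))    # 12
print("Highest age:", max(ages))    # 29
print("Aged 20:    ", counts[20])   # 12
print("Aged 13:    ", counts[13])   # 1

# Arranging the ages in order makes such questions easy to answer by eye.
print(sorted(ages, reverse=True))
```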


Ages (in years) of 100 Residents of Panhulan, Agoncillo, Batangas (arranged in descending order, reading down each column)
29 26 24 23 20 20 18 18 17 14
29 26 24 22 20 20 18 18 17 14
28 26 24 22 20 19 18 17 16 14
28 25 24 22 20 19 18 17 16 14
27 25 24 22 20 19 18 17 16 14
27 25 23 21 20 19 18 17 16 13
27 25 23 21 20 19 18 17 15 12
27 25 23 21 20 19 18 17 15 12
27 25 23 21 20 19 18 17 15 12
26 24 23 21 20 18 18 17 15 12
1. Find the range by getting the difference between the highest and lowest values in the set of data.
$R = H - L$
2. Determine the number of classes. There is no standard method for determining the number of classes. Generally, the number of classes should not be less than 5 and should not be more than 15. In some instances, however, the number of classes can be approximated by using the relation
$k = 1 + 3.3 \log n$ (logarithm to base 10)
where k is the number of classes and n is the sample size.
3. Determine the size of the class interval. The value of c can be obtained by dividing the range by the desired number of classes:
$c = \dfrac{R}{k}$
4. Construct the classes. In constructing the classes, we first determine the lowest lower limit of the distribution. The value of this lower limit can be chosen arbitrarily as long as the lowest value falls in the first interval and the highest value falls in the last interval.
5. Determine the frequency of each class. The frequencies are determined by counting the number of items that fall in each interval.
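The five steps can be traced on the ages data with the sketch below. It makes two assumptions of our own. First, c is rounded up to a whole number of years. Second, the lowest age is taken as the lowest lower limit. Under these choices it reproduces the class intervals 12 – 14, 15 – 17, 18 – 20, … used as examples later in this topic.

```python
import math

# The 100 ages from the data table, copied row by row.
AGES_TABLE = """
14 27 27 23 29 21 20 12 22 17
23 24 18 20 27 16 12 22 19 19
15 20 29 25 24 20 20 17 18 18
12 22 23 17 23 26 16 21 21 20
17 18 26 18 28 27 18 22 19 16
14 16 19 20 20 18 25 19 26 15
28 13 18 17 14 27 24 20 18 25
17 20 23 18 18 24 19 19 14 18
21 21 25 24 14 25 20 17 17 17
15 12 26 23 17 20 24 25 18 15
"""
ages = [int(token) for token in AGES_TABLE.split()]

# Step 1: range R = H - L
H, L = max(ages), min(ages)
R = H - L                              # 29 - 12 = 17

# Step 2: approximate number of classes, k = 1 + 3.3 log n (base-10 logarithm)
k = 1 + 3.3 * math.log10(len(ages))    # 1 + 3.3 * 2 = 7.6

# Step 3: class size c = R / k, rounded UP to a whole number of years (our assumption)
c = math.ceil(R / k)                   # 17 / 7.6 is about 2.24, rounded up to 3

# Step 4: construct classes, starting at the lowest value (our choice of lowest lower limit)
classes = []
lower = L
while lower <= H:                      # stop once the highest value is covered
    classes.append((lower, lower + c - 1))
    lower += c

# Step 5: count the ages in each class (limits are inclusive, ages are whole years)
frequencies = [sum(1 for a in ages if lo <= a <= hi) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo:>2} - {hi:>2}: {f:>3}")
```

This yields six classes, 12 – 14 through 27 – 29, with frequencies 10, 18, 32, 15, 16, and 9. That is within the suggested range of 5 to 15 classes. Rounding c up means fewer classes than the approximate k of 7.6.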
Frequency
- Refers to the number of observations belonging to a class interval, or the number of items within a category. A class interval is a grouping or category defined by a lower limit and an upper limit, such as 12 – 14, 15 – 17, 18 – 20, and so forth. In the class interval 21 – 23, for example, 21 is the lower limit and 23 is the upper limit.
Class marks are the midpoints of the classes. They are found by adding the lower limit and the upper limit and dividing by two (2). The size of a class interval is simply the length of a class, or the range of values it can contain.
Add an additional column for the class marks (x).
Class boundaries are more precise expressions of the class limits, obtained by extending each limit by half of the unit in which the data are recorded. If our data are prices rounded to the nearest peso, the class Php 5 – Php 9 actually contains all prices between Php 4.50 and Php 9.50. If our data are lengths rounded to the nearest tenth of a centimeter, the class 1.5 – 1.9 actually contains lengths between 1.45 and 1.95 centimeters. These pairs of values are usually called class boundaries or real class limits. For the distribution of ages of the 100 residents, the class boundaries are 11.5 and 14.5, 14.5 and 17.5, and so on.
Add an additional column for the class boundaries.
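The two new columns can be computed directly from the class limits. The sketch below assumes the classes continue the pattern 12 – 14, 15 – 17, … up to 27 – 29, which covers the highest age of 29. Because the ages are recorded in whole years, each limit is extended by 0.5.

```python
# Classes of the age distribution: 12-14, 15-17, ... continuing the pattern up to 27-29.
classes = [(12, 14), (15, 17), (18, 20), (21, 23), (24, 26), (27, 29)]

# Class mark (x): (lower limit + upper limit) / 2
class_marks = [(lo + hi) / 2 for lo, hi in classes]

# Class boundaries: ages are recorded to the nearest whole year, so extend each limit by 0.5.
unit = 1
boundaries = [(lo - unit / 2, hi + unit / 2) for lo, hi in classes]

for (lo, hi), x, (lb, ub) in zip(classes, class_marks, boundaries):
    print(f"{lo}-{hi}   x = {x}   boundaries {lb} - {ub}")
```

For data recorded to the nearest tenth, `unit` would be 0.1 and the limits would shift by 0.05.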
Derived Frequency Distribution
Given a frequency distribution, we can construct other frequency distributions, such as the relative frequency distribution and the cumulative frequency distribution.
RELATIVE FREQUENCY DISTRIBUTION
The relative frequency distribution of a given data set shows, in percent, the proportion of the frequency of each class to the total frequency. The relative frequency, denoted by “%f”, can be obtained by dividing the class frequency by the sample size and multiplying the result by 100.
The formula for converting the class frequency to percent is

$\%f = \dfrac{f}{n} \times 100$

where: %f = the relative frequency of each class interval
f = the frequency of each class
n = the sample size
Determine the relative frequency of the ages of the 100 residents of Panhulan, Agoncillo, Batangas.
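A sketch of the %f column is given below. The class frequencies are our own tally of the 100 ages into the classes 12 – 14 through 27 – 29. Check them against your own frequency table before relying on them. Multiplying by 100 before dividing keeps the percentages exact here.

```python
# Classes 12-14 ... 27-29 and their frequencies (our tally of the 100 ages).
classes = [(12, 14), (15, 17), (18, 20), (21, 23), (24, 26), (27, 29)]
frequencies = [10, 18, 32, 15, 16, 9]

n = sum(frequencies)                              # sample size, 100
relative = [100 * f / n for f in frequencies]     # %f = (f / n) x 100

for (lo, hi), f, pct in zip(classes, frequencies, relative):
    print(f"{lo}-{hi}: f = {f:>2}, %f = {pct:.1f}%")
```

The relative frequencies always add up to 100%, which is a quick check on the column.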
The cumulative frequency distribution can also be derived from the frequency distribution. It is obtained by successively adding the class frequencies. Unlike the relative frequency distribution, where the frequencies are converted to percents of the sample size, this type of distribution determines partial sums of the data classified in terms of classes.
There are two types of cumulative frequency distribution:
Less than cumulative frequency distribution
- Refers to the distribution whose cumulative frequencies give the number of observations less than, or below, the upper class boundary they correspond to. We shall let <cumf be the less than cumulative frequency.
Greater than cumulative frequency distribution
- Refers to the distribution whose cumulative frequencies give the number of observations greater than, or above, the lower class boundary they correspond to. We shall let >cumf be the greater than cumulative frequency.
Add columns for the <cumf and >cumf to the distribution of ages of the 100 residents of Panhulan, Agoncillo, Batangas.
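Both cumulative columns are running totals of the frequency column. The sketch below uses `itertools.accumulate`. The <cumf runs from the lowest class up, and the >cumf runs from the highest class down. As in the relative frequency example, the frequencies are our own tally of the 100 ages into the classes 12 – 14 through 27 – 29.

```python
from itertools import accumulate

# Classes 12-14 ... 27-29 and their frequencies (our tally of the 100 ages).
classes = [(12, 14), (15, 17), (18, 20), (21, 23), (24, 26), (27, 29)]
frequencies = [10, 18, 32, 15, 16, 9]

# <cumf: running total from the lowest class upward (count below each upper boundary)
less_than = list(accumulate(frequencies))
# >cumf: running total from the highest class downward (count above each lower boundary)
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for (lo, hi), f, lt, gt in zip(classes, frequencies, less_than, greater_than):
    print(f"{lo}-{hi}: f = {f:>2}, <cumf = {lt:>3}, >cumf = {gt:>3}")
```

The last <cumf and the first >cumf both equal the sample size, 100, which is a quick consistency check.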
