Lesson in Statistics
Statistics is a scientific body of knowledge that deals with the collection, organization or
presentation, analysis, and interpretation of data.
CATEGORIES OF STATISTICS
Example: We may describe a collection of persons by stating how many are poor and how many
are rich, how many fall into various categories of age, height, civil status, IQ, and many more.
Example: As a result of the increase in the number of patients in a hospital this week because of
a certain disease, it is expected that the number of patients will double next week.
1. The term population, as used in statistics, refers to an aggregate of people, objects, or
events.
4. Parameters – data obtained about the population. Ex. If the researcher uses the whole
population (N = 1500), then the average income obtained is called a parameter.
5. Statistic – data obtained about a sample. Ex. If the researcher makes use of a sample (n = 200), then the
average income obtained is called a statistic.
9. Quantitative data – data which are numerical in nature. These data are obtained from counting
or measuring. In addition, meaningful arithmetic operations can be done with this type of data.
Example
a) Qualitative data – sex (male or female), attitude (favorable or not favorable), emotional condition
(happy or sad)
b) Quantitative data
Example (Discrete numbers)
Number of barangays
Note: For discrete numbers, decimals have no meaning; for example, we say 100 families, not 100.47 families.
Example (Continuous numbers)
Classification of variables
1.) A discrete variable is one that can assume only a finite or countable number of values. In other words, it can
assume specific values only. These values are obtained through the process of counting.
2.) A continuous variable is one that can assume infinitely many values within a specified interval. The values of a
continuous variable are obtained through measuring.
1.) Nominal scale – this is the most primitive level of measurement. It is used when we want to
distinguish one object from another for identification purposes. Ex. zip codes, credit card
numbers, gender.
2.) Ordinal scale – data are arranged in some specific order or rank. When objects are measured
at this level, we can say that one is better or greater than the other, but we cannot tell how much
more or how much less one object is than the other. Ex. the rank of contestants in a beauty contest.
3.) Interval scale – if data are measured at this level, we can say not only that one object is greater
than or less than the other, but we can also specify the amount of difference. To illustrate, suppose
Maria got 50 in a math quiz while Martha got 40; we can say that Maria scored higher than Martha by 10 points.
4.) Ratio scale – this level of measurement is like the interval level. The only difference is that the ratio
level always starts from an absolute zero point. In addition, data at this level usually carry a unit
of measure. If data are measured at this level, we can say that one object is so many times as large or
as small as the other. Ex. Suppose Mrs. Reyes weighs 50 kg while her daughter weighs 25 kg. We can
say that Mrs. Reyes is twice as heavy as her daughter. Thus, weight is an example of data measured
on the ratio scale.
Exercise 1.1
A. Indicate whether the data represented in each of the following is a population or a sample.
1. Twenty-five cases of TB have been reported in the past year, and a patient study is to be carried out
using data from all 25 cases.
2. A total of 338 chest x-rays were performed during the past months. A quality control review is to
be carried out on 10% of the group.
B. Tell whether the following situations will make use of descriptive or inferential statistics.
1. A teacher computes the average grades of her students and determines the top 10 students.
1. Place of birth
2. Type of insurance
D. Determine whether the numbers obtained in the following variables are discrete or continuous.
1. Spots on a die
8. Books in a library
9. Height of basketball players
I. Collecting Data
2 Sources of data
1. Primary data refer to information gathered from an original source, or based
on direct or first-hand experience. Sources of data are government institutions, business
agencies, and other organizations.
2. Secondary data refer to information gathered by other individuals or agencies. Examples are
published books, newspapers, biographies, and the like.
1. The Direct or Interview Method – the researcher or interviewer has direct contact with the
interviewee. The researcher obtains the information needed by asking the interviewee questions.
In this method the researcher can get more accurate answers, since the interviewer can clarify
a question if the respondent does not understand it. However, this method
is costly and time-consuming.
Ex. A business firm would interview residents of a certain barangay regarding their favourite
brand of toothpaste.
2. The Indirect or Questionnaire Method – this method makes use of a questionnaire. The researcher
distributes the questionnaire to the respondents either by personal delivery or by mail. In this
method the researcher can save a lot of time and money because questionnaires can be given to a
large number of respondents at the same time. However, the researcher cannot expect that all
questionnaires will be answered, because some respondents simply ignore them. In addition,
no clarification can be given to a respondent who does not understand a question.
3. The Registration Method – this method of gathering data is enforced by certain laws.
4. The Experimental Method – this method is usually used to find out the cause-and-effect relationship of
certain phenomena under controlled conditions. Scientific researchers often use this method.
SAMPLING TECHNIQUES
a.) Random sampling – the procedure by which all the members of the population have an
equal chance of being selected.
It can be performed by the “fish bowl” or lottery method, wherein each individual in
a population is assigned a number and lots are drawn to determine which individuals will be
included in the sample, or by using a table of random numbers, or by using a calculator with a function
key labeled RAN that gives random numbers.
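The lottery method can also be imitated in software. What follows is only a minimal Python sketch, assuming the members of the population are simply numbered 1 to N. The function name `lottery_sample` and its `seed` parameter are illustrative and not part of the lesson.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw a simple random sample of member numbers, without replacement."""
    rng = random.Random(seed)                # the seed is optional; it only makes the draw repeatable
    members = range(1, population_size + 1)  # assumption: members are numbered 1..N
    return sorted(rng.sample(members, sample_size))

# Hypothetical usage: draw 5 members out of a population of 40.
sample = lottery_sample(population_size=40, sample_size=5, seed=1)
print(sample)
```

Every member has the same chance of being drawn, which is the defining property of random sampling stated above.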
b.) Systematic sampling – involves taking every kth element in a population as part of the
sample. The starting point is determined by the nature of the population or is selected at
random.
Ex. Mrs. Cruz wants to select 5 students out of her 40 students. First, we find the sampling
interval k by dividing the number of members by the target number of samples: 40/5 = 8.
Hence, in our case we have k = 8. The next step is to select a random starting point: write the numbers 1, 2, 3, 4, 5, 6, 7, and
8 on pieces of paper and draw one number by lottery. If we draw 5, this means
that we will select the 5th student in each group of 8, that is, the 5th, 13th, 21st, 29th, and 37th. If
instead we draw 7, then the members of the sample will be the 7th, 15th, 23rd, 31st, and 39th.
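The selection in Mrs. Cruz's example can be sketched in Python as follows. This is an illustration only, assuming the population size is an exact multiple of the sample size. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return the positions selected by systematic sampling (1-based)."""
    k = population_size // sample_size   # sampling interval, e.g. 40 // 5 = 8
    if start is None:
        start = random.randint(1, k)     # random starting point between 1 and k
    return [start + k * j for j in range(sample_size)]

print(systematic_sample(40, 5, start=5))  # [5, 13, 21, 29, 37]
print(systematic_sample(40, 5, start=7))  # [7, 15, 23, 31, 39]
```

The two printed lists match the two draws worked out in the example.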
c.) Stratified random sampling – the word stratified comes from the root word strata, which
means groups or categories (singular form: stratum). There are instances when the members
of the population do not belong to the same category or group. When we use this method,
we divide the elements of the population into different categories or
subpopulations, and then draw a random sample from each.
Strata            Number of Families
High-income       1000
Average-income    2500
Low-income        1500
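First, find the percentage of each stratum by dividing the number of families in each stratum by the
total number of families. Then multiply each percentage by the desired number of families in the
sample. The table below shows how it is done.
Strata            Number of Families    Percentage    Number of Families in the Sample
High-income       1000                  20%           0.20 x 200 = 40
Average-income    2500                  50%           0.50 x 200 = 100
Low-income        1500                  30%           0.30 x 200 = 60
Total             N = 5000              100%          n = 200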
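The proportional allocation just described can be checked with a short Python sketch. It is an illustration under the figures given above (N = 5000, n = 200). The function name `proportional_allocation` is hypothetical.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample to each stratum in proportion to the stratum's size."""
    total = sum(strata_sizes.values())   # N, the population size
    # Rounding can make the parts differ slightly from sample_size for other data;
    # with the figures below every share is a whole number.
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

strata = {"High-income": 1000, "Average-income": 2500, "Low-income": 1500}
allocation = proportional_allocation(strata, 200)
print(allocation)  # {'High-income': 40, 'Average-income': 100, 'Low-income': 60}
```

Within each stratum, the allocated number of families would then be chosen at random.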
d.) Cluster sampling – a sampling technique wherein groups or clusters, instead of individuals, are
randomly chosen. This is sometimes called area sampling because it usually applies when the
population is large.
Ex. Suppose we want to find the average income of the families in Zamboanga City. Assume that
there are 98 barangays in Zamboanga City. We can draw a random sample of 10 barangays using
simple random sampling, and then a certain number of families from each of the 10 barangays may be
chosen.
This is a sampling technique wherein the members of the sample are drawn from the population based on the
judgment of the researcher. The results of a study using this technique are relatively biased, and the technique
lacks objectivity of selection; hence, it is sometimes called subjective sampling. However, it is
convenient and economical.
a.) Convenience sampling – as the name implies, convenience sampling is used because of the
convenience it offers to the researcher.
For example, a researcher who wishes to investigate the most popular noontime show may just
interview respondents by telephone; the opinions of those without a telephone will not be included.
b.) Quota sampling – in this type of sampling, the proportions of the various subgroups in the
population are determined and the sample is drawn to have the same percentages in it. This is very
similar to stratified random sampling; the only difference is that the selection of the members is not
done randomly.
To illustrate this, let us suppose that we want to determine teenagers’ favourite brand of
t-shirt. If there are 1000 female and 500 male teenagers in the population and we want 150 members
in the sample, we can select 100 females and 50 males from the population without randomization.
Ex. Suppose that the target is to find the effectiveness of a certain kind of shampoo. Of course, bald
fellows will not be included in the sample.
To determine the sample size for a given population, Slovin’s formula is used.
Slovin’s formula:
$n = \frac{N}{1 + Ne^{2}}$
where
n = sample size
N = population size
e = margin of error
To illustrate, suppose we want to find the average age of the students in Zamboanga. However, due
to insufficient time, only the students in three particular schools are used to estimate the average
age. Obviously, the result is not the actual average but just an estimate; thus, there is usually an
error when we use a sample instead of the whole population.
Example:
A group of researchers will conduct a survey to find out the opinion of residents of a particular
community regarding the oil price hike. If there are 10000 residents in the community and the
researchers plan to use a sample using 10% margin of error, what should the sample size be?
Solution:
Here, N = 10000 and e = 10% or 0.1. Substituting the given values in the formula, we have:
$n = \frac{10000}{1 + 10000(0.1)^{2}} = \frac{10000}{1 + 100} = \frac{10000}{101} = 99.01$ or 99
Hence, the researchers need to survey only 99 residents. A 10% margin of error means
that the researchers tolerate a difference of up to 10% between the result obtained from the sample and
the result that would be obtained from the entire population.
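The computation in this example can be reproduced with a few lines of Python. This is a minimal sketch of Slovin's formula as used above. The function name `slovin` is illustrative, and the final rounding to the nearest whole number follows the worked example (some references round up instead).

```python
def slovin(population_size, margin_of_error):
    """Sample size from Slovin's formula: n = N / (1 + N * e**2)."""
    return population_size / (1 + population_size * margin_of_error ** 2)

n = slovin(10000, 0.10)   # N = 10000, e = 10%
print(round(n, 2))        # 99.01
print(round(n))           # 99
```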
Summation Notation
In the study of statistics, we shall be using mathematical symbols. One of the most common is the
summation notation, or simply summation (Σ).
Recall that variables are represented using capital letters. If our variable is age, then we can
represent this by X. Hence, if there are 40 students in a class, we can represent the age of the first
student by X1, the second student by X2, the third by X3, and so on. If we want to find the sum of
these ages, then we can write the sum in this way:
X1 + X2 + X3 + … + X40
In summation notation, this sum is written as $\sum_{i=1}^{40} X_i$.
Here i is the index of summation, and its value ranges from 1, the lower limit, to 40, the upper limit.
Observe also that when we write the sum of values in summation notation, we replace the subscript
of the summed variable by an arbitrary subscript i and indicate in the index the range of the
summation. Thus, $\sum_{i=1}^{40} X_i = X_1 + X_2 + X_3 + \dots + X_{40}$
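A summation is simply a loop over the index i. The Python sketch below illustrates this with five hypothetical ages (not data from the lesson). The function name `summation` is an assumption made for the illustration.

```python
def summation(values, lower, upper):
    """Sum X_lower + ... + X_upper, where X_1 is the first element of values."""
    total = 0
    for i in range(lower, upper + 1):   # i runs from the lower limit to the upper limit
        total += values[i - 1]          # X_i is stored at position i - 1 in the list
    return total

ages = [15, 16, 15, 17, 16]             # hypothetical ages X1..X5
print(summation(ages, 1, 5))            # 79
print(summation(ages, 2, 4))            # 48
```

Changing the lower and upper limits changes which terms are included, exactly as in the notation.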
Exercise 2
b. N= 8000, e = 10%
a.
b.
c. + Bi )
4.) Given the following: X1=2, X2 = 4, X3 = 5, Y1 =1, Y2 = 3 and Y3 = 7, find the sum.
a.
b.Yi
After data have been gathered and checked for possible errors, the next logical step is to present the
data in a manner that is easy to understand. The presentation should also convey the relevant information and the
important results at a glance.
Ungrouped data – data that are not organized, or, if arranged, are only ordered from lowest to
highest or highest to lowest.
Grouped data – data that are organized and arranged into different classes or categories.
1.) Textual – the presentation is in narrative or paragraph form. The data are within the text of the
paragraph. This involves enumerating the important characteristics, giving emphasis to significant
figures, and identifying important features of the data. This may not get the immediate interest of the
reader. However, it can present a more comprehensive picture of the data because of the further
written explanation of its nature.
Example:
Nominally, the peso improved by 1.4% as of April 14, 2003 compared to its level in 2002, followed by
Thai baht, which gained 0.86%; Indonesian rupiah, 0.68%; and Taiwan dollar, 0.2%. Other currencies,
on the other hand, depreciated during the same period. The Singapore dollar fell 2.33%. The South
Korean won slid 2.14% while the Japanese yen dropped 0.61% (Phil. Daily Inquirer, April 17, 2003,
p. B2).
2.) Tabular – sometimes we can hardly grasp information from a textual presentation of data. Thus,
we may present data by using tables. By organizing data in tables, important features of the data
can be readily understood and comparisons can be easily made. Thus, a table shows complete
information regarding the data.
Parts of a Table
1. Heading: It includes the following:
5. Footnote/Source note: This is placed below the table only when the data are not original;
that is, it indicates the source of the data.
Religion             Sex
                     Male     Female   Total
Roman Catholic       2,758    2,693    5,451
Islam                113      126      239
Iglesia ni Cristo    82       79       161
Others               231      275      506
Total                3,184    3,173    6,357
Source: 1994 Iligan Census Summary Report
B. Frequency Distribution
A frequency distribution is a grouping of all observations into intervals or classes,
together with a count of the number of observations that fall in each interval or class.
a. k=
b. k = 1 + 3.322 log n (log taken to base 10)
3. Estimate the width c of the interval (c = R/k, where R is the range of the data). Round this off to the same number of
decimal places as the original set of data.
4. List the lower and upper class limits of the first class. This interval should contain the lowest
observation in the data set.
5. List all the classes by adding the class width to the limits of the previous interval. The highest class
should contain the largest observation in the data set.
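The steps above can be sketched in Python. This is only an illustration for whole-number data, using the 1 + 3.322 log n rule for the number of classes. The function name `frequency_distribution` and the scores are hypothetical. Because the width is rounded, the number of classes actually listed can differ slightly from k for other data sets.

```python
import math

def frequency_distribution(data):
    """Group whole-number data into classes of (lower limit, upper limit, frequency)."""
    n = len(data)
    k = round(1 + 3.322 * math.log10(n))     # number of classes
    data_range = max(data) - min(data)       # R, the range
    width = max(1, round(data_range / k))    # c = R/k, rounded to a whole number
    classes = []
    lower = min(data)                        # the first class contains the lowest observation
    while lower <= max(data):                # add classes until the largest observation is covered
        upper = lower + width - 1
        frequency = sum(lower <= x <= upper for x in data)
        classes.append((lower, upper, frequency))
        lower += width                       # next limits = previous limits + class width
    return classes

scores = [12, 15, 17, 21, 22, 25, 28, 30, 31, 35, 38, 40, 41, 44, 47, 50]
table = frequency_distribution(scores)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

For these 16 scores the sketch gives k = 5 and c = 8, so the classes are 12-19, 20-27, 28-35, 36-43, and 44-51.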
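For a small data set, grouping data into intervals may still be done without loss of information by using a
stem-and-leaf plot. A stem-and-leaf plot is a table in which each observation is split into a stem (the leading digits) and a leaf (the last digit).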
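A stem-and-leaf plot can be built with a short Python sketch. It assumes non-negative whole numbers, with the tens digits as the stem and the units digit as the leaf. The function name `stem_and_leaf` and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens) to the sorted list of its leaves (units)."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # e.g. 34 -> stem 3, leaf 4
        plot[stem].append(leaf)
    return dict(plot)

scores = [34, 12, 21, 15, 30, 21, 38]
plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Every original value can be read back from the plot, which is why no information is lost.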
3.) Graphical
A bar chart is a graph in which the different classes are represented by rectangles or bars. The
width of each rectangle is the width of the interval represented by the class limits on the horizontal
axis, or the categories for nominal data. The length of each rectangle represents the frequency and is
drawn along the vertical axis.
A histogram is a graph that closely resembles the bar chart. The histogram employs the class
boundaries for the horizontal axis.
B. Frequency polygon
A frequency polygon is constructed by plotting the class marks against the frequency. To
complete the polygon, which is mathematically defined as a closed figure, an additional class mark is
added at the beginning and at the end of the distribution.
C. Frequency ogive