STATISTICS Notes 1
- a branch of mathematics that deals with the collection, presentation, analysis, and interpretation of data.
Collection of data – refers to the gathering of information or data
Presentation of data – involves summarizing data or information in textual, graphical, or tabular forms.
Analysis of data – involves describing data using statistical methods and procedures
Interpretation of data – refers to the process of making conclusions based on the analyzed data
History of Statistics
Sumerians (5,000 yrs ago) – counted their citizens for taxation purposes
Moses (1491 B.C.) and David (1017 B.C.) – censuses were undertaken
King Asoka (270 – 230 B.C.) – described methods of taking censuses
Servius Tullius (578 – 534 B.C.) – ruled as the 6th king of Rome; instituted the gathering of population data
William the Conqueror (England) – required the compilation of information on population and resources. This compilation, "The Domesday Book," is the first landmark in British statistics.
Achenwall (1719 – 1772) – first introduced the word “statistiks” in a preface to a statistical work
Zimmermann and Sinclair – introduced and popularized the name “statistics” in their books
Girolamo Cardano – Italian mathematician, physician, and gambler who wrote "Liber de Ludo Aleae," in which the first known study of the principles of probability appeared
Chevalier de Méré – posed the famous "Problem of Points" to Blaise Pascal, a problem which marked the beginning of the mathematics of probability
Quetelet (19th century) – Belgian astronomer who applied the theory of probability to anthropological measurements and extended the same principle to the physiological, psychological, physical, and chemical fields
Francis Galton (1822 – 1911) and Karl Pearson (1857 – 1936) – contributed greatly to the development of the theory of correlation. Galton developed the use of percentiles.
Sir Ronald Fisher (1890 – 1962) – made major contributions to modern statistical procedures, e.g. analysis of variance (ANOVA)
Application of Statistics
1. In Business
- A business firm collects data from its everyday operations. Statistics is used to summarize and
describe those data, such as the amount of sales, expenditures, and production, to enable management
to understand and determine the status of the firm. Data that have been organized and analyzed provide
management with baseline data for making wise decisions about the operation of the business.
2. In Education
- Through statistical tools, a teacher can determine the effectiveness of a particular teaching method by
analyzing test scores obtained by their students. The results of such a study may be used to improve
teaching-learning activities.
3. In Psychology
- Psychologists are able to meaningfully interpret aptitude tests, IQ tests, and other psychological tests
using statistical procedures or tools.
4. In Politics and Government
- Public opinion and election polls are commonly used to assess the opinions or preferences of the public
for issues or candidates of interest. Statistics plays an important role in conducting surveys or interviews
for that purpose.
5. In Medicine
- Statistics is also used in determining the effectiveness of new drug products in treating a particular type
of disease. To illustrate, a drug company wants to test the effectiveness of its new drug product in
treating tuberculosis. An experiment or a clinical trial is conducted. Ten tuberculosis patients are treated
using the new drug product and another ten are treated using the existing drug. The results are analyzed
statistically to find out if the new product is more effective in treating tuberculosis.
6. In Agriculture
- Through statistical tools, an agriculturist can determine the effectiveness of a new fertilizer in the
growth of plants or crops. Moreover, crop production and yield can be better analyzed through the use of
statistical methods.
7. In Entertainment
- The public's favorite actresses and actors can be determined by using surveys. Ratings of the members of
the board of judges in a beauty contest are statistically analyzed. Interviews are used to determine the
most widely viewed television show. The top-grossing movies of the year are reported based on
statistical records of movie houses. All these activities involve the use of statistics.
8. In Everyday Life
- The number of cars passing through streets or a highway is recorded to enable traffic enforcers to
manage traffic efficiently. Even the number of pedestrians crossing the street, the number of people entering a
warehouse or a department store, and the number of people engaged in video games involve the use of
statistics. In short, statistics is found and used in everyday life.
Types of Statistics
1. Descriptive Statistics
- a statistical procedure concerned with describing the characteristics and properties of a group of
persons, places, or things.
- involves gathering, organizing, presenting, and describing data
- statistical tools: mean, median, mode, range, standard deviation, variance, skewness and kurtosis
- key terms: count, number of, average, level of effectiveness
- e.g. The National Statistics Office conducts surveys to determine the average age, income, and other
characteristics of the Filipino population.
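As a rough illustration of the descriptive tools listed above, the Python sketch below computes several of them with the standard `statistics` module. The ages are made-up values, not survey data, and `describe` is a hypothetical helper. Note that `variance` and `stdev` are the sample versions (divisor n - 1); `pvariance` and `pstdev` would give the population versions.

```python
import statistics

# Hypothetical ages (in years) of ten respondents; made-up data for illustration.
ages = [23, 35, 35, 41, 29, 52, 35, 47, 29, 38]


def describe(values):
    """Return common descriptive measures of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        # Sample variance and standard deviation (divisor n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


for name, value in describe(ages).items():
    print(f"{name:>8}: {value:.2f}")
```

Skewness and kurtosis are not provided by the `statistics` module, so they are left out of this sketch.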
2. Inferential Statistics
- a statistical procedure that is used to draw inferences about the properties or characteristics of a
large group of people, places, or things on the basis of information obtained from a small portion of
that group.
- involves analysis of data so that meaningful interpretation or conclusion about a large group of people
can be formulated
- statistical tools: z-test, t-test, F-test or Analysis of Variance (ANOVA), simple linear correlation,
chi-square test, regression analysis, and time series analysis
- key terms: predict, significant relationship, significant difference
- e.g. A teacher conducts a study to determine whether there is a significant correlation between the
performance of students in math and in science.
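The teacher's example starts by measuring how strongly two sets of scores move together. The Python sketch below computes Pearson's correlation coefficient r from paired scores; the scores are hypothetical, and `pearson_r` is an illustrative helper written from the usual definition (Python 3.10 and later also provide `statistics.correlation`). Computing r only describes the sample; deciding whether the correlation is significant for the whole population needs a further test (for example, a t-test on r), which this sketch does not perform.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical math and science scores of six students (illustrative only).
math_scores = [78, 85, 62, 90, 70, 88]
science_scores = [74, 80, 65, 92, 68, 85]

print(round(pearson_r(math_scores, science_scores), 3))
```

A value of r near +1 indicates a strong positive relationship, near -1 a strong negative one, and near 0 little linear relationship.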
Terminology in Statistics
1. Population – refers to the entire collection of objects, persons, places, or things under study. To illustrate
this, suppose a researcher wants to determine the average income of the residents of a certain barangay and
there are 1,500 residents in the barangay. Then all of these residents comprise the population. The size of a
population is usually denoted by N. Hence, in this case, N = 1,500.
2. Sample – a small portion or part of a population. It could also be defined as a subgroup, subset, or
representative of a population. For instance, suppose the above-mentioned researcher does not have
enough time and money to conduct the study using the whole population and he wants to use only 200
residents. These 200 residents comprise the sample. The size of a sample is usually denoted by n; thus, n = 200.
3. Parameter – any numerical or nominal characteristic of a population. It is a value or measurement
obtained from a population. It is usually referred to as the true or actual value. If, in the preceding
illustration, the researcher uses the whole population (N = 1,500), then the average income obtained is
called a parameter.
4. Statistic – an estimate of a parameter. It is any value or measurement obtained from a sample. If the
researcher in the preceding illustration makes use of the sample (n = 200), then the average income
obtained is called a statistic.
5. Data – (singular form is datum) are facts, or a set of information or observations under study. More
specifically, data are gathered by the researcher from a population or from a sample.
Types:
a. Qualitative Data – data whose values describe attributes or qualities rather than amounts. These are
sometimes called categorical data. Data falling in this category cannot be subjected to meaningful
arithmetic operations. They cannot be added, subtracted, multiplied or divided.
Categories:
a.1 Dichotomous – 2 categories, e.g. Sex: __ Male __ Female
a.2 Trichotomous – 3 categories, e.g. Scale: __ Yes __ No __ Undecided
a.3 Multinomous – 4 or more categories, e.g. Nationality, or the scale:
      __ Very Effective
      __ Effective
      __ Moderately Effective
      __ Not Effective
b. Quantitative Data – data which are numerical in nature. These are data obtained from counting or
measuring. In addition, meaningful arithmetic operations can be done with this type of data.
Categories:
b.1 Discrete – assume a finite or countable number of values. The values of discrete data are obtained
through counting. These are data represented by whole numbers.
e.g. number of students in a class
b.2 Continuous – can assume infinitely many values within a specified interval. The values of continuous
data are obtained through measuring. These data are often expressed with decimals.
e.g. height of a building
6. Variable – a characteristic or property of a population or sample which makes the members different
from each other. If a class consists of boys and girls, then gender is a variable in this class. Height is also a
variable because different people have different heights.
Types:
a. Dependent Variable – a variable which is affected or influenced by another variable
b. Independent Variable – one which affects or influences the dependent variable.
e.g. The Effect of Absenteeism on the Academic Performance of Students
- absenteeism is the independent variable, while the academic performance of students is the dependent
variable
7. Constant – property or characteristic of a population or sample, which makes the members of the
group similar to each other. For example, if a class is composed of all boys, then gender is constant.
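To make the distinction between a parameter and a statistic concrete, the sketch below reuses the barangay illustration with randomly generated, purely hypothetical incomes. The mean of all N = 1,500 values is the parameter; the mean of a random sample of n = 200 is a statistic that estimates it. The income range and the seed are arbitrary assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical monthly incomes (in pesos) of all N = 1,500 residents.
# The values are randomly generated for illustration only.
population = [round(random.uniform(8_000, 60_000), 2) for _ in range(1_500)]

# Parameter: a value computed from the whole population.
parameter = statistics.mean(population)

# Statistic: a value computed from a random sample of n = 200 residents.
sample = random.sample(population, 200)
statistic = statistics.mean(sample)

print(f"N = {len(population)}, population mean (parameter) = {parameter:,.2f}")
print(f"n = {len(sample)}, sample mean (statistic) = {statistic:,.2f}")
```

Drawing a second sample from the same `population` list would generally give a different statistic, while the parameter does not change.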
Levels or Scales of Measurement
Statistics deals mostly with measurements. We define measurement as the assignment of symbols or
numerals to objects or events according to some rules. Since different rules may be used for the assignment
of symbols, different scales of measurement result.
1. Nominal Level
– categorical data
– most primitive level of measurement. It is characterized by data that consist of names, labels, or
categories only.
– e.g. gender, nationality, civil status, political party, survey responses of yes, no, or undecided
2. Ordinal
- ranked data
- involves data that may be arranged in some order but differences between data values either cannot be
determined or are meaningless
- e.g. job position, Likert scale (a scale measuring the degree to which respondents agree or disagree
with a statement)
3. Interval
- like the ordinal level, but meaningful amounts of difference between data values can be determined. It has no
inherent (natural) zero starting point (where none of the quantity is present).
- e.g. temperature in °C and °F, the years 1001, 1953, 2004, 2009
4. Ratio
- interval level modified to include the inherent zero starting point (where zero indicates that none of the
quantity is present).
- e.g. height, lengths of movies, weight, temperature in Kelvin, income
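The difference between the interval and ratio levels can be checked with a little arithmetic. In this Python sketch, `celsius_to_kelvin` is a helper written for illustration. Differences in °C are meaningful, but the ratio of two Celsius readings is not, because 0 °C is not the absence of temperature. Converting to kelvins, whose zero is absolute zero, shows that 20 °C is only about 3.5% warmer than 10 °C in absolute terms, not twice as hot.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvins (a ratio scale whose zero is absolute zero)."""
    return celsius + 273.15


low_c, high_c = 10.0, 20.0

# Interval level: differences are meaningful (20 °C is 10 degrees warmer than 10 °C).
diff_c = high_c - low_c

# Dividing Celsius readings is misleading: 0 °C does not mean "no temperature".
naive_ratio = high_c / low_c  # 2.0, yet 20 °C is not "twice as hot"

# Ratio level: 0 K is absolute zero, so a ratio of kelvin readings is meaningful.
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(diff_c, naive_ratio, round(kelvin_ratio, 3))
```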
COLLECTING DATA
Sources of Data
1. Primary Sources – government institutions, business agencies, and organizations
2. Secondary Sources - books, encyclopedias, journals, magazines, and research or studies conducted by
individuals
Ways of Collecting Data
1. The Direct or Interview Method
- The researcher has direct contact with the interviewee and obtains the needed information by asking
the interviewee questions.
2. The Indirect or Questionnaire Method
- This method makes use of a written questionnaire. The researcher gives or distributes the
questionnaire to the respondents either by personal delivery or by mail.
3. The Registration Method
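- This method of collecting data is governed by laws; for example, births, deaths, marriages, and motor
vehicles must be registered with the proper government offices, and the data are taken from these records.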
4. The Experimental Method
- This method is usually used to find out cause-and-effect relationships.
SAMPLING TECHNIQUES
As soon as we have chosen the method of collecting data and the sample size to be used in our study or
research, the next step is to choose the sampling technique to be employed.
Sampling Technique – a procedure used to determine the individuals or members of a sample.
Suppose a guidance counselor of a certain school wants to determine the average weekly allowance of the
students. If there are 2,000 students in this school and the guidance counselor decides to use only 100
students as a sample, who will be included in the sample?
Sampling techniques are used to answer the question concerning who will be included in the sample.
1. PROBABILITY SAMPLING
- a sampling technique wherein each member or element of the population has a known, nonzero chance of
being selected as a member of the sample (an equal chance, in simple random sampling).
- free from selection bias because the selection of members of the sample is left to chance and is not
predetermined by the researcher.
a. Random Sampling/Lottery Method
- This is done by using chance methods or random numbers. For example, number each subject in the
population, write each number on a card, place the cards in a bowl, and draw as many cards as needed.
The subjects whose numbers are drawn compose the sample.
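The lottery method can be imitated with Python's `random` module, which draws without replacement just like cards from a bowl. The function name `lottery_sample`, the seed, and the assumption that subjects are numbered 1 to N are illustrative choices; the numbers mirror the guidance counselor example (100 of 2,000 students).

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct subject numbers from 1..population_size at random.

    Like drawing numbered cards from a bowl: every subject has the same chance,
    and no number can be drawn twice (sampling without replacement).
    """
    rng = random.Random(seed)
    bowl = range(1, population_size + 1)  # one "card" per subject
    return sorted(rng.sample(bowl, sample_size))


chosen = lottery_sample(2000, 100, seed=42)
print(len(chosen), chosen[:10])
```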
b. Systematic Sampling
- This is done by numbering each subject of the population and then selecting every kth subject. For
example, there are 5,000 families in a city. Fifty families are needed as a sample for an experiment. Since
5,000 ÷ 50 = 100, then k = 100. This means that every 100th subject would be selected. However, the first
subject would be selected at random from subjects 1 to 100. Suppose subject 88 was selected; then
the sample would consist of subjects whose numbers were 88, 188, 288, and so on until 50 families were
obtained.
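The systematic rule translates directly into code. This Python sketch assumes the subjects are numbered 1 to N and that k is the whole-number part of N ÷ n; `systematic_sample` is a hypothetical helper name. Passing `start=88` reproduces the example in the text, and omitting it picks the random start.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Systematic sampling: pick a random start in 1..k, then take every k-th subject."""
    k = population_size // sample_size  # sampling interval, e.g. 5,000 // 50 = 100
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start from 1 to k
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + i * k for i in range(sample_size)]


# Reproduce the example in the text, where subject 88 was the random start.
families = systematic_sample(5000, 50, start=88)
print(families[:3], "...", families[-1])  # [88, 188, 288] ... 4988
```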
c. Stratified Random Sampling
- If a population has distinct groups, it is possible to divide the population into these groups and to draw
members of the sample from each group. The groups are called strata. Strata are designed so that
members within each stratum are more homogeneous, that is, more similar to each other. The subsamples
are then combined to form the sample. This technique is particularly useful in populations that can be
stratified into groups by gender, race, or geography.
E.g. Suppose a community consists of 5,000 families belonging to different income brackets. We will draw
200 families as our sample using stratified random sampling. Below are the subpopulations and the
corresponding number of families belonging to each subpopulation or stratum.
Strata                          Number of Families
High-Income Families                  1,000
Average-Income Families               2,500
Low-Income Families                   1,500
                                  N = 5,000
Solution:
Strata      Number of Families      Number of Families in the Sample
High              1,000             (1,000 ÷ 5,000) × 200 =  40
Average           2,500             (2,500 ÷ 5,000) × 200 = 100
Low               1,500             (1,500 ÷ 5,000) × 200 =  60
           N = 5,000 (Population)            n = 200 (Sample)
From the above computation, we see that if we are going to draw 200 members from a population of
5,000, we should draw 40 families belonging to the high-income, 100 from the average, and 60 from the
low-income group. Observe that the number of families drawn as a sample in each stratum is
proportional to the number of families from the population.
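The proportional computation in the solution, followed by a random draw within each stratum, can be sketched in Python as follows. The family ID numbers assigned to each income stratum are hypothetical, and `proportional_allocation` and `stratified_sample` are illustrative helper names.

```python
import random


def proportional_allocation(strata_sizes, n):
    """Sample size per stratum: (stratum size / N) * n, rounded to a whole number."""
    N = sum(strata_sizes.values())
    return {name: round(size * n / N) for name, size in strata_sizes.items()}


def stratified_sample(strata_members, n, seed=None):
    """Draw a simple random sample from each stratum, sized proportionally."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata_members.items()}
    quotas = proportional_allocation(sizes, n)
    return {name: rng.sample(members, quotas[name])
            for name, members in strata_members.items()}


# Hypothetical family ID numbers for each income stratum (5,000 families in all).
strata = {
    "High": list(range(1, 1001)),
    "Average": list(range(1001, 3501)),
    "Low": list(range(3501, 5001)),
}

print(proportional_allocation({k: len(v) for k, v in strata.items()}, 200))
# {'High': 40, 'Average': 100, 'Low': 60}
sample = stratified_sample(strata, 200, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because of rounding, the stratum sample sizes may not add up exactly to n for other populations; in this example the numbers divide evenly, so they do.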
d. Cluster Sampling
- This method uses intact groups called clusters. It is a sampling wherein groups or clusters, instead of
individuals, are randomly chosen. In cluster sampling, we first select clusters at random and then either
include all members of the selected clusters or draw a random sample of members from each of them. This
sampling technique is sometimes called area sampling because the clusters are often geographic areas; it is
usually applied when the population is large and spread out.
For example, suppose a medical researcher wants to study the patients in San Fernando City, La Union. It
would be very costly and time-consuming to obtain a random sample since they would be spread over
the different parts of San Fernando City, La Union. Rather, a few hospitals could be selected at random
and the patients in these hospitals would be studied in a cluster.
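A minimal sketch of one-stage cluster sampling, in which whole clusters are chosen at random and every member of a chosen cluster is kept, as in the hospital example. The hospital names, patient IDs, the helper `cluster_sample`, and the choice of two clusters are all hypothetical.

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random, keep all their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # random choice of cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical hospitals (the clusters) and their patients, for illustration only.
hospitals = {
    "Hospital A": ["A01", "A02", "A03"],
    "Hospital B": ["B01", "B02"],
    "Hospital C": ["C01", "C02", "C03", "C04"],
    "Hospital D": ["D01", "D02", "D03"],
    "Hospital E": ["E01", "E02"],
}

selected = cluster_sample(hospitals, 2, seed=5)
for name, patients in selected.items():
    print(name, patients)
```

For two-stage cluster sampling, a random sample of patients would then be drawn from each selected hospital instead of keeping all of them.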
e. Multi-Stage Sampling
- a combination of several sampling techniques. This method is used by researchers who are interested in
studying a very large population, say the whole island of Luzon or even the entire Philippines. This is done by
starting the selection of the members of the sample using cluster sampling and then dividing each cluster
or group into strata. Then, from each stratum individuals are drawn randomly using simple random
sampling.
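The three stages described above can be chained in a short Python sketch. The provinces used as clusters, the urban/rural strata, and the fixed number drawn from every stratum (`per_stratum`) are assumptions for illustration; in practice the number drawn per stratum is often proportional to the stratum size, as in stratified random sampling.

```python
import random


def multistage_sample(population, num_clusters, per_stratum, seed=None):
    """Random clusters -> strata within each cluster -> simple random sample per stratum."""
    rng = random.Random(seed)
    sample = []
    # Stage 1: cluster sampling (choose whole provinces at random).
    for cluster in rng.sample(sorted(population), num_clusters):
        # Stage 2: each chosen cluster is divided into strata.
        for stratum, members in population[cluster].items():
            # Stage 3: simple random sampling within each stratum.
            k = min(per_stratum, len(members))
            sample.extend(rng.sample(members, k))
    return sample


# Hypothetical provinces (clusters), each split into urban/rural strata of resident IDs.
population = {
    province: {
        "urban": [f"{province}-U{i}" for i in range(1, 51)],
        "rural": [f"{province}-R{i}" for i in range(1, 81)],
    }
    for province in ["P1", "P2", "P3", "P4", "P5", "P6"]
}

chosen = multistage_sample(population, num_clusters=2, per_stratum=10, seed=11)
print(len(chosen), chosen[:5])
```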
2. NON-PROBABILITY SAMPLING
- a sampling technique wherein members of the sample are drawn from the population based on the
judgment of the researchers. The results of a study using this sampling technique are relatively biased.
This technique lacks objectivity of selection; hence, it is sometimes called subjective sampling. Inferences
made based on a sample obtained using this technique are less reliable.
- non-probability sampling techniques are nevertheless used because they are convenient, economical,
and easy to conduct.
a. Convenience Sampling
- used because of the convenience it offers to the researcher. For example, a researcher who wishes to
investigate the most popular noontime show may just interview respondents by telephone.
The result of this interview will be biased because the opinions of those without a telephone will not be
included. Although convenience sampling may be used occasionally, we cannot depend on it for making
inferences about a population.
b. Quota Sampling
- In this type of sampling, the proportions of the various subgroups in the population are determined, and
the sample is drawn to have the same proportions. This is very similar to the stratified random
sampling discussed above. The only difference is that the selection of the members of the sample using
quota sampling is not done randomly. To illustrate this, let us suppose that we want to determine the
teenagers' favorite brand of T-shirt. If there are 1,000 female and 1,000 male teenagers in the
population and we want to draw 150 members for our sample, we can select 75 female and 75 male
teenagers from the population without using randomization.
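The quotas are computed proportionally, just as in stratified random sampling, but they are filled by whoever the researcher meets first rather than by a random draw. In this Python sketch, the order in which teenagers are met is made up, and `quota_sample` is a hypothetical helper.

```python
def quota_sample(arrivals, quotas):
    """Fill each subgroup's quota with whoever comes along first (no randomization)."""
    taken = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in taken and len(taken[group]) < quotas[group]:
            taken[group].append(person)
        if all(len(taken[g]) == quotas[g] for g in quotas):
            break  # every quota is filled
    return taken


# Quotas proportional to the population: 1,000 female and 1,000 male teenagers, n = 150.
population_counts = {"female": 1000, "male": 1000}
n = 150
N = sum(population_counts.values())
quotas = {g: round(count * n / N) for g, count in population_counts.items()}
print(quotas)  # {'female': 75, 'male': 75}

# Hypothetical stream of teenagers in the order the researcher meets them.
arrivals = [(f"teen{i}", "female" if i % 3 else "male") for i in range(1, 400)]
chosen = quota_sample(arrivals, quotas)
print({g: len(people) for g, people in chosen.items()})
```

Because whoever happens to come first fills the quota, the result depends on where and when the researcher looks, which is the source of bias in quota sampling.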
c. Purposive Sampling
- members of the sample are chosen deliberately, based on the researcher's judgment of who can best
provide the needed information. For example, let us suppose that we want to determine or predict the
candidate who will win in the upcoming election. We can conduct the survey or interview in places or
precincts where people voted for the winner in a series of past elections because we believe that they
will again vote for the next
winner in the upcoming election. Also, let us suppose that the target is to find out the effectiveness of a
certain kind of shampoo. Of course, bald fellows will not be included in the sample.
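d. Snowball Sampling or Chain-Referral Sampling
- the first respondents are asked to refer other members of the population whom they know, who in turn
refer others, so that the sample grows like a rolling snowball. This is useful when members of the
population are hard to locate or identify.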