Lectures in Educational Statistics
601
(Educational Statistics)
STATISTICS
Objectives:
At the end of the lesson, the students should be able to answer the following questions:
1. What is statistics?
2. What are its concerns?
3. What are the types of statistics?
4. What are the types of inferential techniques?
5. When can you use parametric and nonparametric statistics?
6. What is the difference between parametric and nonparametric statistics?
7. What are the levels of measurement scales?
8. What are the symbols used in this course?
INTRODUCTION
What is statistics?
Statistics refers to a field of study in which quantitative data are collected, organized and
presented, analyzed and interpreted.
Statistics is a science. As a science, it deals with the:
Collection of data
Presentation of data
Organization of data
Analysis of data
Interpretation of the result
Collection of data – this refers to the gathering of data from which the investigator obtains
information for the study.
Presentation of data – this refers to the arrangement of data into tables, graphs or charts so
that the reader can get a clear picture of the relationships presented.
Organization of data – this is the arrangement of raw data in an understandable order.
Organizing data includes classification, frequency distribution tables, pictorial representation,
graphical representation, etc.
Analysis of data – this is the process of extracting relevant information from the given data.
Interpretation of data – this refers to the task of drawing conclusions from the analyzed data.
What are the types of statistics?
The field of statistics may be divided into descriptive and inferential statistics.
Descriptive Statistics is concerned only with summarizing values and describing the group
characteristics of the data after the data have been gathered, classified, and presented. To do
this, it employs graphs, tables and frequency distributions, percentages, measures of central
tendency and position, and measures of variability. It does not generalize or draw conclusions
beyond the data at hand.
Inferential Statistics, on the other hand, is concerned with higher-order or critical thinking and
judgment, and it needs more complex mathematical procedures. Its aim is to give
generalizations, conclusions or information regarding large groups of data called the
population without necessarily dealing with each and every element of these groups. It uses
only a small, representative portion of the total set of data, called a sample, to draw
conclusions or generalizations regarding the entire population. To do this, it uses either
parametric or nonparametric statistics.
Parametric Statistics are inferential techniques which make the following assumptions
regarding the nature of the population from which the observations or data are drawn:
1. The observations must be independent. This means that choosing any element of the
population for inclusion in the sample must not affect the chances of other elements being
included.
2. The observations must be drawn from normally distributed populations. A crude
indication that a distribution is normal is that the mean, the median, and the mode are all
equal (mean = median = mode). If we draw the curve, we get a bell-shaped curve which has a
total area of one and is symmetrical about the vertical line through the mean.
3. If we analyze two groups, the populations must have the same variance; such
populations are called homoscedastic.
4. The variable must be measured on the interval or ratio scale, so that we can interpret
the results.
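The crude check in assumption 2 can be tried out with a short Python sketch. The scores below are hypothetical values invented for illustration, and the tolerance of one point is an arbitrary assumption. Equal mean, median and mode only suggest symmetry; they do not prove that a distribution is normal.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores, invented for illustration only.
scores = [70, 75, 75, 80, 80, 80, 85, 85, 90]

# The three measures of central tendency compared in the crude check.
center = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": mode(scores),
}
print(center)

# If the three measures are (nearly) equal, the data are at least roughly
# symmetric. The tolerance of 1 point is an arbitrary assumption.
gap = max(center.values()) - min(center.values())
roughly_symmetric = gap <= 1
print("roughly symmetric:", roughly_symmetric)
```

A formal test of normality would be needed before relying on this assumption in practice.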
Non-Parametric Statistics, in contrast, make fewer and weaker assumptions, such as:
1. The observations must be independent and the variable must have underlying continuity.
2. The observations are measured on either the nominal or the ordinal scale.
For a better understanding of when to use parametric and non-parametric statistics, please
refer to the table below:
Objectives:
At the end of the session, the participants will be able to:
1. Distinguish primary data from secondary data.
2. Gain experience in simple statistical work such as the collection, organization and
presentation of data.
The ultimate goal of data collection is information generation. In statistics, history is very
important. We gather data because we would like to know what happened in the past, and we
use that history to help us decide on or formulate a policy. Every bit of data that we gather,
from whatever source and by whatever method or procedure, is crucial to the decision made
or the policy formulated. Therefore, extra care and responsibility are the key words in data
handling, which we may call data management. Data are measured quantitatively and/or
qualitatively.
Data Collection
2. Source of data. Data can be obtained from two principal sources, namely (1) the direct or
primary source and (2) the secondary source. Data which arise from original
investigations such as observations, interviews, questionnaires, experiments and the like,
and which provide numerical information not available from other sources, are referred to
as primary data. Secondary sources, which include books, journals, records, reports, and
other publications, give rise to secondary data.
Generally, primary sources of data give a more accurate and reliable picture, as they contain
detailed definitions of terms, explanations and descriptions of the methods or procedures
used, limits and delimitations, and the like. On the other hand, secondary sources often
contain vague, general and at the same time abridged descriptions of phenomena. They are
also more likely to contain typographical errors.
3. Methods of collecting data. There are many ways of collecting data. The investigator
selects the method, or adopts a combination of methods, which he believes will yield the
desired data given the nature, kind and purposes of the study; the time factor; the cost;
and the population to be studied. The foremost consideration is to choose the method that
will yield accurate data in the shortest possible time at minimal cost.
a. Data may be obtained by counting or measuring. When data are obtained by these
methods, we say that the data have been collected by observation. Examples include
counting the number of births or deaths in a year, the number of members in some
families, the number of students who obtained a certain score in a test, the number of
vehicles registered as public utilities, the number of retired teachers, the number of
engineers and carpenters, and the like.
Data which have been furnished by persons in compliance with laws or ordinances,
such as those on the registration of births, marriages, deaths and public utility vehicles, are
referred to as data collected by the registration method. On the other hand, data collected by
measurement include those on height, weight, temperature, pressure, time, water and electric
consumption, and the like.
b. The interview or direct method, as the term implies, refers to obtaining data from
the respondents personally or in face-to-face contact. To ensure the reliability of the
results, the interviewer or data enumerator must be polite, kind and courteous to the
respondents; be careful not to bring personal biases, prejudices or opinions to the
respondents; not argue on any matter with the respondents; be honest and objective in
recording the responses; and bear in mind at all times the purposes, methods and
significance of the study.
As long as the necessary precautions such as those enumerated above are observed
and an interview guide is carefully prepared for the interviewer’s intelligent use, the data
obtained from interviews can be expected to be accurate and reliable.
Kinds of Data
Data may come in two forms, namely qualitative and quantitative data. Data are
said to be qualitative if they involve placing an observation into one of a number of
descriptive categories, with no measurement involved. On the other hand, if the
observation involves measurement such as height, weight, ratings and others, the data are said
to be quantitative. Quantitative data may be continuous or discrete. When we
count the number of unemployed graduates, the number of Filipinos working abroad, and the
like, we have a discrete set of data. We can say that counting or enumeration of things gives
rise to discrete data, while measurement results in continuous data.
Organization of Data
b. Ranked form. This refers to arranging the data in numerical order, either
from highest to lowest (descending order) or from lowest to highest (ascending order). With
this arrangement, we know who is at the top and who is at the bottom.
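As a small illustration of the ranked form, the Python sketch below arranges a set of scores in ascending and descending order. The scores are hypothetical values chosen only for the example.

```python
# Hypothetical raw scores, invented for illustration only.
raw_scores = [82, 95, 70, 88, 76]

# Ranked form: lowest to highest (ascending order).
ascending = sorted(raw_scores)

# Ranked form: highest to lowest (descending order).
descending = sorted(raw_scores, reverse=True)

print("ascending: ", ascending)
print("descending:", descending)
print("top score:", descending[0], "bottom score:", descending[-1])
```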
Presentation of Data.
There are three different ways of presenting data, namely: textual presentation,
tabular presentation, and graphical presentation.
a. Textual presentation. This presentation of data uses both text and figures to convey
statistical information. It is commonly found in statistical reports in magazines and
newspapers.
                      SEX
AGE (yr.)        Male     Female     TOTAL
1 - 9
10 - 18
19 - 27
28 - 36
37 and above
c. Graphical Presentation. This method of presenting data makes use of graphs,
which provide the reader with a picture of the significant relationships among the facts
presented. A graph can summarize and show visually, in a clear and appealing way,
what would otherwise have to be expressed in many words. Busy executives who
have little time to read can just glance at the graphical presentation of important
data and see at once what is happening to the company. The most
common graphs are the bar graph and the pie graph.
Measurement Scales
Nominal scale is the first and the lowest level of measurement. It is merely the grouping or
classifying of different objects into categories based upon some defined characteristics,
without paying attention to order or arrangement. Once the categories have been identified,
the frequencies, or the number of objects in each category, are counted. Nominal data have
the following properties:
1. The data categories are mutually exclusive (an object can belong to only one category).
2. The data categories have no logical order or arrangement.
There are two ways of classifying: the one-way classification and the two-way classification.
Example 1.
SEX YES NEUTRAL NO TOTAL
Male 20 10 30 60
Female 45 10 20 75
Total 65 20 50 135
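The totals in the two-way classification of Example 1 can be checked with a short Python sketch. The counts are taken from the table above; the variable names are illustrative assumptions.

```python
# Frequencies from Example 1: sex (rows) by response (columns).
counts = {
    "Male": {"YES": 20, "NEUTRAL": 10, "NO": 30},
    "Female": {"YES": 45, "NEUTRAL": 10, "NO": 20},
}

# Row totals: number of respondents of each sex.
row_totals = {sex: sum(responses.values()) for sex, responses in counts.items()}

# Column totals: number of respondents giving each answer.
column_totals = {}
for responses in counts.values():
    for answer, frequency in responses.items():
        column_totals[answer] = column_totals.get(answer, 0) + frequency

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```

Because the categories are mutually exclusive, the row totals and the column totals both add up to the same grand total.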
The ordinal scale is the second level of measurement. Here, there is a logical
ordering or arrangement of the categories, aside from the categories being mutually exclusive.
The process of measurement is the same as in the nominal scale, where the number of objects
in each category is counted. However, we can discern which is the highest or the lowest. For
example, in the ranks of a military organization, we know that private < corporal < sergeant <
lieutenant, etc., or in the academic ranks of college faculty, Instructor 1 < Instructor 2 <
Instructor 3 < Instructor 4, etc.
Example
Rank Frequency
Private 20
Corporal 15
Sergeant 10
Lieutenant 5
Interval scale is the third level of measurement. It possesses all the properties
of the preceding scales plus one additional property: the differences between adjacent
levels are equal on any part of the scale.
A common variable measured on an interval scale is temperature. The difference
between temperatures of 65 and 68 degrees is regarded as the same as the difference between
temperatures of 13 and 16 degrees. Here, zero is just another point on the scale. It does not
mean that there is no temperature. In fact, on the Celsius scale, zero is the freezing point of
water.
Ratio scale is the highest level of measurement. All properties of the interval scale
apply to the ratio scale, plus one additional property known as the “true zero
point”, which reflects the absence of the characteristic being measured.
For example, if the statistics teacher gives a quiz and a certain student gets zero, it means
that the student got no correct answers (score = 0).
In summary:
Objectives:
Sampling is the process of selecting a part (called a sample) from a given whole
(referred to as the population) with the ultimate goal of making generalizations about the
whole from the sample. The primary concern of the process is how to select a sample and
utilize the information derived from it so that we can make “useful” generalizations about the
unknown characteristics of the population. Usefulness is judged by criteria such as timeliness
(getting the information when we need it), economy (affordability), and accuracy (how close
the generalizations are to the unknown population characteristics).
The reason for sampling is that the researcher can gain accurate knowledge about a
population by measuring only a portion or sample of it. Besides, it may be impossible or
impractical to include the total number of cases. However, there may be situations where
complete enumeration is possible. Here are some reasons for sampling:
Timeliness. With fewer observations in the data set, the time spent on
collection, processing, and interpretation is shorter. Hence, results are obtained
much more quickly.
Wider scope and coverage. Most of the time one desires several pieces of
information from a single unit of the population. With complete enumeration, the
number of data items to be included is limited by cost and perhaps by time. When
dealing with fewer observations, one can take several measurements and
thus test several concepts at any given time.
Destructive observation. When the process of making observations is destructive,
only a sample can reasonably be examined.
Sampling techniques
There are basically two methods of drawing a sample from a given population. These are
probability sampling and non-probability sampling. Probability sampling is one wherein
every element of the population has a known chance of being included in the sample, and the
probability that any specified unit of the population is included in the sample is governed by
this known chance. The likelihood of inclusion is operationalized by the use of a randomizing
mechanism (e.g. a device that generates random numbers) and the probability assigned to
each unit. In most situations (particularly for finite populations) this
method of sampling requires a listing of the population units and the assignment of a unique
label or identifier (a positive integer, usually a counting number) to each one. Such a listing is
referred to as a sampling frame or simply a frame. Ordinarily, probability sampling may be
more costly and difficult to carry out (as having to enumerate hard-to-reach units is a distinct
possibility). However, this procedure allows an objective assessment of accuracy. If done
properly, such an assessment may be made using the sample data alone, without having to
make strong assumptions about how the sample is linked with the population.
• Simple random sampling (SRS). When one wants to be able to generalize to the
whole population from which the sample is drawn, or if the population is not scattered, that
is, it is more or less homogeneous with respect to the characteristics under investigation, and
a good frame is available or can easily be constructed, then simple random sampling may be
used. This is the simplest kind of probability sampling and is considered the most basic
of all probability sampling techniques. In fact, all other probability sampling techniques can
be considered modifications of SRS that cater to particular real situations. One advantage of
this method is the ease of subjecting the sample data to further statistical analysis. However,
the procedure may not be practical to implement, especially for large populations, because a
good-quality sampling frame may not exist and the selected units may be
extremely scattered, making it doubly difficult to implement.
There are several ways of drawing a sample of size n by simple random sampling:
a. Lottery sampling. This is done by writing on small pieces of paper the name of
each member listed in the sampling frame, which is numbered 1 to N. The pieces are then
mixed thoroughly in a container and n units are picked at random. Lottery sampling is easy
and simple if N is not very large. But as N increases, lottery sampling becomes
time-consuming and tedious.
b. Sampling through the table of random numbers (TRN). Here, the
selection of n from N is purely by chance, and again every member of the population has an
equal chance of being chosen.
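Drawing n units from a frame numbered 1 to N can also be done with a computer's random number generator in place of slips of paper or a printed table. The sketch below is a minimal illustration in Python; the frame size, the sample size and the seed are assumptions made only for the example.

```python
import random

N = 20   # hypothetical population size
n = 5    # hypothetical sample size

# Sampling frame: every unit gets a unique label from 1 to N.
frame = list(range(1, N + 1))

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# random.sample draws without replacement, so no unit is chosen twice
# and every unit has the same chance of being selected.
sample = sorted(rng.sample(frame, n))
print(sample)
```

In an actual study the seed would normally be omitted, or recorded, so that the draw is not chosen by the researcher.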
There are two ways of using the table of Random Numbers, namely:
In systematic sampling, the population is divided into n groups with k members each, and the
researcher takes one representative per group. The elements in the population are first
assigned numbers from 1 to N. The sampling interval k is then determined by taking the ratio
of N (the size of the population) to the sample size n. Then a random number from 1 to k,
called the random start, is selected. The unit assigned this number is included in the
sample, together with every kth unit thereafter.
Suppose a population consists of N = 300 members, numbered from 1 to 300, and you want
to draw a sample of size 5 (n = 5). To get the number of members per group (k),
we use the formula:
k = N/n = 300/5 = 60
So, the population of N = 300 is divided into n = 5 groups of 60 members each, and we draw
one member from each group. First we pick a random start r, which is any number with
1 ≤ r ≤ k; in this case, 1 ≤ r ≤ 60. We may prepare 60 pieces of paper, number them 1 to 60,
and draw one piece for the random start. Suppose no. 10 is drawn. Then the
tenth element in the sampling frame automatically becomes the first unit of the sample
(n). The members of the sample are as follows:
1st = r = 10
2nd = r + k = 10 + 60 = 70
3rd = r + 2k = 10 + 120 = 130
4th = r + 3k = 10 + 180 = 190
5th = r + 4k = 10 + 240 = 250
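The worked example above (N = 300, n = 5, random start r = 10) can be reproduced with a short Python sketch. The function name `systematic_sample` is a hypothetical helper introduced here for illustration, and the sketch assumes that N is an exact multiple of n.

```python
def systematic_sample(N, n, r):
    """Return the unit numbers chosen by systematic sampling.

    N is the population size, n the sample size, and r the random start.
    Assumes N is an exact multiple of n, so the interval k is a whole number.
    """
    k = N // n  # sampling interval
    if not 1 <= r <= k:
        raise ValueError("random start must be between 1 and k")
    # The first unit is r; each later unit is k places after the previous one.
    return [r + i * k for i in range(n)]


members = systematic_sample(N=300, n=5, r=10)
print(members)
```

In practice r would be drawn at random, for example with `random.randint(1, k)`, rather than fixed in advance.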
Stratified random sampling may be used when it is known in advance that a
special segment of the population would not have enough persons in the sample if a simple
random sample were drawn.
a. For Equal Allocation. This involves taking the same number of units from each
stratum to make up the desired sample size n, hence,
ni = n/L = k
Example 1. If a sample size n = 80 will be taken from the four strata, then:
n = 80
L = 4
Therefore:
ni = n/L=k
ni = 80/4 = 20
Hence, 20 units will be taken from each stratum to constitute the 80 sample elements
needed.
COLLEGE                          Ni     wi = Ni/N          ni = n(wi)
College of Agriculture           60     60/300 = 6/30      30(6/30) = 6
College of Arts and Sciences    120     120/300 = 12/30    30(12/30) = 12
College of Education             80     80/300 = 8/30      30(8/30) = 8
College of Engineering           40     40/300 = 4/30      30(4/30) = 4
TOTAL                           300     1                  30
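Both allocation schemes can be sketched in Python. The stratum sizes below come from the table above, where a sample of n = 30 is allocated in proportion to each college's share of the population. The function names are hypothetical, and the rounding step is an assumption: when the products n(Ni/N) are not whole numbers, rounded allocations may not add up exactly to n and need a manual adjustment.

```python
def equal_allocation(n, strata):
    """Take the same number of units from each stratum: ni = n / L."""
    L = len(strata)
    return {name: n // L for name in strata}


def proportional_allocation(n, strata):
    """Take units in proportion to stratum size: ni = n * (Ni / N)."""
    N = sum(strata.values())
    return {name: round(n * Ni / N) for name, Ni in strata.items()}


# Stratum sizes (Ni) from the table above.
colleges = {
    "Agriculture": 60,
    "Arts and Sciences": 120,
    "Education": 80,
    "Engineering": 40,
}

proportional = proportional_allocation(30, colleges)
equal = equal_allocation(80, colleges)  # Example 1: n = 80 over L = 4 strata
print(proportional)
print(equal)
```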
ACTIVITY 1
Answer the following questions:
1. If the students are stratified according to the college where they belong (College of Arts
and Sciences, College of Agriculture, College of Education, College of Engineering and
College of Veterinary Medicine), how many sample students shall we get from each of the
colleges? Show your solution.
Population: N = 1000
Size: n = 5
Formula: K = N/n
K = 1000/5 = 200 members each
2. Suppose you would like to allocate proportionately a sample size of 200 among the
colleges with the populations given in the table below. How many sampling units would you
allocate per stratum?
• Cluster sampling. This is another type of probability sampling, used when the
population from which a sample is to be drawn is very big and is distributed over a
large area. The procedure is the same as when individuals are
being sampled, except that it is geographic regions that are being sampled. As the
sampling proceeds, the area sampled becomes progressively smaller.
b. Quota sampling. This kind of sampling technique is a quick and cheap method to
operate. Each interviewer is given definite instructions and a quota for the sections of the
population he is to work on, but the final choice of the actual persons is left to his own
preference and is not predetermined by some carefully operated randomization plan.
As a result, the other sections of the population are not given a chance. For example, an
interviewer may just stand at the entrance of a department store and interview everyone who
enters. Then the people who do not come to the store will not have the chance of being
interviewed.
There are many ways of determining the sample size. One of them is the
formula of Slovin (1960):
n = N / (1 + N e^2)
where:
n = sample size
N = population size
e = desired margin of error (percent allowance for non-precision because of the use
of the sample instead of the population).
Always remember, however, that the assumption of a normal distribution of the population
should be considered. When the normal approximation to the population is poor,
this sample size formula does not apply.
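The sample size formula above, n = N / (1 + N e^2), commonly known as Slovin's formula, is easy to evaluate in Python. The function name `slovin_sample_size` and the population of 1000 with a 5% margin of error are assumptions made for illustration. Rounding the result up to the next whole number is a common convention, not something the formula itself prescribes.

```python
import math


def slovin_sample_size(N, e):
    """Sample size from n = N / (1 + N * e**2).

    N is the population size and e the margin of error as a decimal
    (0.05 for 5%). The result is rounded up, since a fraction of a
    respondent cannot be sampled.
    """
    if N <= 0 or not 0 < e < 1:
        raise ValueError("N must be positive and e must be between 0 and 1")
    return math.ceil(N / (1 + N * e ** 2))


# Hypothetical population of 1000 with a 5% margin of error:
# 1000 / (1 + 1000 * 0.0025) = 1000 / 3.5, which is about 285.7.
n = slovin_sample_size(1000, 0.05)
print(n)
```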
Gay (1976) offers some minimum acceptable size depending on the type of research as
follows:
d. Experimental research- 15 subjects per group. Some authorities believe that 30 per
group should be considered minimum.
ACTIVITY 2
Research the following topics:
1. T - test
a. t – test for Dependent or Correlated Samples.
b. t – test for Independent Samples
3. Regression Analysis
a. Simple Linear Regression Analysis
b. Multiple Linear Regression Analysis
4. Correlation Analysis
Give an example for each subtopic, and also give at least 3 examples of Parametric
Statistics and Non-Parametric Statistics.
ACTIVITY 3
Problem no. 1. Mrs. Dela Cruz conducted a study for her dissertation. She tried to determine
the factors affecting Teaching Competencies of Student Teachers in Western Mindanao.
Questions:
1. Interpret the tabulated results of the Statistical Treatment.
2. What are the factors that significantly influenced the Teaching Competencies of the
Student Teachers? What is the basis of your conclusion?
Problem no. 2. The following are hypothetical data relating sex to level of involvement in
community activities of parents in a public elementary school.
                              Male             Female
Level of Involvement        F       %        F       %
High                       160     61.5     130     52
Low                        100     38.5     120     48
Total                      260     100      250     100
Chi-square observed value = 4.30, chi-square critical value = 3.84, df = 1,
tested at alpha level (α) = 0.05.
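For readers who want to see how a chi-square statistic is obtained from such a table, the sketch below computes the uncorrected Pearson chi-square from the frequencies given above. It is an illustration only and is not necessarily the procedure used to obtain the reported value of 4.30; a continuity correction or rounding can give a somewhat different figure. The critical value of 3.84 is taken from the text.

```python
# Observed frequencies from the table: rows are High and Low involvement,
# columns are Male and Female.
observed = [
    [160, 130],
    [100, 120],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(column) for column in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: sum over all cells of (O - E)^2 / E, where the expected
# frequency E is (row total * column total) / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, observed_count in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (observed_count - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.84  # from the text, for df = 1 and alpha = 0.05

print("chi-square:", round(chi_square, 2))
print("df:", df)
print("exceeds critical value:", chi_square > critical_value)
```

A computed value larger than the critical value indicates that the difference between the two groups is significant at the chosen alpha level.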
Questions:
Prepared by