Statistics - MMW
Statistics
➢ is a science which deals with the collection, organization, presentation, analysis and
interpretation of numerical or quantitative data which may be used for prediction
or verification of relationships among variables
Purpose of statistics
1. To summarize and describe a set of data from a research study
2. To provide an objective basis for drawing conclusions from the data collected in
a research study
Descriptive statistics
➢ Refers to the field of statistics that includes the methods of collecting, classifying,
graphing and averaging data with the objective of simply describing the properties
or characteristics of the data gathered
The task of the statistician in this area is simply to select a few procedures, do some
averaging, and eventually identify significant features of the given data.
Inferential statistics
➢ Demands a somewhat higher degree of critical judgment and advanced mathematical
models.
➢ This is concerned with drawing conclusions or generalizations FROM ORGANIZED
DATA.
The task of the statistician here is not just to devise ways to summarize and describe
the data but also ways to test the significance of the results.
BASIC TERMINOLOGIES IN STATISTICS
• Data – facts, observations, and information that come from investigations
• Measurement data sometimes called quantitative data – the result of using
some instrument to measure something (e.g., test score, weight)
• Categorical data also referred to as frequency or qualitative data – things are
grouped according to common property(ies) and the number of members of
the group are recorded (e.g., males/females, vehicle type)
• Variable – property of an object or event that can take on different values. For
example, college major is a variable that takes on values like mathematics, computer
science, English, psychology, etc.
• Discrete variable – a variable with a limited number of values (e.g., gender
(male/female), college class (freshman/sophomore/junior/senior))
• Continuous variable - a variable that can take on many different values, in
theory, any value between the lowest and the highest points on the
measurement scale
• Independent variable - a variable that is manipulated, measured or selected by
the researcher as an antecedent condition to an observed behavior. In a
hypothesized cause-and-effect relationship, the independent variable is the
cause and the dependent variable is the outcome or effect
• Dependent variable - a variable that is not under the experimenter’s control –
the data. It is the variable that is observed and measured in response to the
independent variable
• Quantitative variable - a variable that is based on quantitative data
• Qualitative variable - a variable that is based on categorical data
• Universe (population) – the set of all the individuals or entities under consideration.
The totality of objects, persons, places or things used in a particular study of research.
• Sample – a small portion or part of the population. It may also be referred to as a
subgroup or subset of the population, a group of representatives of the population.
• Parameter – a numeric characteristic of a population
• Statistic - a numerical characteristic of a sample
• Measurement – is the process of determining the value or label of a particular variable
for a particular experimental unit
• Experimental unit – is the individual or object on which a variable is measured
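The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. The population of test scores below is made-up data, and the names `population`, `sample`, `population_mean` and `sample_mean` are assumptions introduced only for illustration.

```python
import random
import statistics

# Hypothetical population: test scores of N = 1000 students (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(50, 100) for _ in range(1000)]

# Parameter: a numerical characteristic of the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of n = 50 members drawn from the population.
sample = rng.sample(population, 50)

# Statistic: the same characteristic computed from the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two means are usually close but not equal, which is why a statistic is treated as an estimate of the corresponding parameter.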
2. Ordinal
o data collected are labels or classes with an implied ordering in these labels;
o the difference between the two labels cannot be quantified
o a level of measurement higher than nominal
o only ordering or ranking can be done on the data
o the ordering of such labels is meaningful
Example: rank of university faculty e.g. instructor, asst professor, assoc professor, full
professor
3. Interval
o data collected can be distinguished, ordered or ranked and possess a meaningful
difference
o difference between any two data values can be determined
o the unit of measurement is constant (but arbitrary) and the zero point is arbitrary,
i.e., zero does not mean the complete absence of the characteristic being measured
Example: level of academic performance of students in mathematics
Scale          Description
1.00 – 1.80    outstanding
1.81 – 2.60    very satisfactory
2.61 – 3.40    satisfactory
3.41 – 4.20    needs improvement
4.21 – 5.00    poor
4. Ratio
o data collected have all the properties of the interval scale and, in addition, can be
multiplied or divided, have a true zero point, and the ratio of two data values is meaningful
o highest level of measurement
o presence of units of measurement
Example: height, weight, volume
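The difference between what the ordinal and ratio levels allow can be sketched in Python. The faculty ranks come from the ordinal example above; the weights and the helper `outranks` are hypothetical and introduced only for illustration.

```python
# Ordinal: faculty ranks from the example above, listed from lowest to highest.
RANK_ORDER = ["instructor", "asst professor", "assoc professor", "full professor"]

def outranks(a: str, b: str) -> bool:
    """Return True if rank a is higher than rank b (ordering is all we can use)."""
    return RANK_ORDER.index(a) > RANK_ORDER.index(b)

# Ratio: weights in kilograms (hypothetical values) have a true zero,
# so both differences and ratios are meaningful.
weight_a, weight_b = 80.0, 40.0
difference = weight_a - weight_b   # how many kilograms heavier
ratio = weight_a / weight_b        # how many times as heavy

print(outranks("full professor", "instructor"))
print(difference, ratio)
```

Subtracting or dividing positions in `RANK_ORDER` would produce numbers, but they would carry no meaning, since the distance between two ranks cannot be quantified.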
METHODS OF SAMPLING
RANDOM SAMPLING
➢ is the most commonly used sampling technique in which each member of the
population is given an equal chance of being selected in the sample.
PROPERTIES OF RANDOM SAMPLING
1. EQUIPROBABILITY – means that each member of the population has an equal chance
of being selected and included in the sample.
2. INDEPENDENCE – means that the chance of one member being drawn does not
affect the chance of any other member.
Example: A study on the effectiveness of a new drug can be conducted on two groups of
animals, the control and experimental groups. Animals that belong to the
control group will not be treated with the new drug, while those that belong to the
experimental group will be treated with it. The selection of a sample of paired
animals should be restricted according to their degree of illness so that a
significant difference between the two groups can be accepted.
“Sampling techniques”
A. RANDOM SAMPLING
1. LOTTERY OR FISHBOWL SAMPLING
- this is done by simply writing the names or numbers of all the members of the
population on small rolled pieces of paper which are later placed in a container. The
researcher shakes the container thoroughly, then draws “n” out of “N” pieces of paper
as desired for a sample. This is usually done in a lottery.
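The lottery procedure can be imitated in code. The following is a minimal sketch, assuming the population is a list of numbered members; the function name `lottery_sample` and the `seed` parameter are hypothetical and exist only to make the draw repeatable.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw n distinct members, each with an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical population of N = 20 numbered members.
members = list(range(1, 21))
chosen = lottery_sample(members, 5, seed=1)
print(chosen)
```

Drawing without replacement mirrors the fishbowl: once a piece of paper is taken out, it cannot be drawn again.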
2. SAMPLING WITH THE USE OF TABLE OF RANDOM NUMBERS
- if the population is large, a more practical procedure is the use of a Table of Random
Numbers, which contains rows and columns of digits randomly ordered by a computer.
A sample of size “n” can be generated by beginning at an arbitrary point in the Table of
Random Numbers, for example by closing your eyes and haphazardly pointing at an entry.
Then proceed in any direction, vertically, horizontally, or diagonally, until “n” distinct
numbers are obtained. These numbers represent the numerically coded elements in the population.
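A rough sketch of this procedure in Python follows. It assumes the population members are coded 01 to at most 99, that the table is read horizontally two digits at a time, and that a generated grid of digits stands in for a printed table; `build_digit_table` and `sample_from_table` are hypothetical helper names.

```python
import random

def build_digit_table(rows, cols, seed=None):
    """Stand-in for a printed table: rows x cols random digits 0-9."""
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(cols)] for _ in range(rows)]

def sample_from_table(table, population_size, n, start_row=0):
    """Read two-digit numbers row by row until n distinct valid codes are found.

    Assumes population members are coded 01..population_size (at most 99).
    """
    if not 0 < n <= population_size <= 99:
        raise ValueError("need 0 < n <= population_size <= 99")
    # Flatten the table starting at the chosen row, wrapping around to the top.
    ordered_rows = table[start_row:] + table[:start_row]
    digits = [d for row in ordered_rows for d in row]
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        code = 10 * digits[i] + digits[i + 1]
        # Skip codes outside the population and codes already drawn.
        if 1 <= code <= population_size and code not in chosen:
            chosen.append(code)
            if len(chosen) == n:
                return chosen
    raise ValueError("table exhausted before n distinct codes were found")

table = build_digit_table(rows=20, cols=10, seed=7)
codes = sample_from_table(table, population_size=60, n=8, start_row=3)
print(codes)
```

Here `start_row` plays the role of the haphazardly chosen starting point; reading vertically or diagonally would only change the order in which the digits are visited.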
3. SYSTEMATIC SAMPLING
- is done by taking every kth element in the population. It applies to a group of individuals
arranged in a waiting line or in a methodical manner.
For instance, if the objective is to get the opinion of employees regarding
employee-management relations, a sample of size “n” will be selected from the list of
employees arranged alphabetically or according to age, experience, position or academic rank.
By systematic sampling, every kth employee from the listed order will be included in the
sample. If “N” is known, the value of k can be calculated as
k = N/n
where N is the population size and n is the sample size.
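As an illustration, systematic sampling with k = N/n might be sketched as follows. The random starting position within the first k elements and the use of integer division when N is not a multiple of n are assumptions not stated in the notes; `systematic_sample` and the employee list are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every kth element, with k = N // n and a random start in the first k."""
    N = len(ordered_list)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                                  # sampling interval
    start = random.Random(seed).randrange(k)    # random start: 0 .. k-1
    return [ordered_list[start + i * k] for i in range(n)]

# Hypothetical alphabetical list of N = 100 employees; n = 10 gives k = 10.
employees = [f"employee_{i:03d}" for i in range(1, 101)]
picked = systematic_sample(employees, 10, seed=3)
print(picked)
```

With N = 100 and n = 10, every 10th employee after the starting position is selected.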
2. QUOTA SAMPLING
- this is a relatively quick and inexpensive method to operate since the choice of the
number of persons or elements to be included in a sample is made at the researcher's own
convenience or preference and is not predetermined by some
carefully operated randomizing plan.
3. CLUSTER SAMPLING
- sometimes referred to as area sampling because it is usually applied on a
geographical basis. The population is grouped into clusters or small units, e.g., blocks or
districts, in a city or municipality.
Area sampling usually requires larger elementary units than those required in simple
random sampling. It is not a common practice, however, that every individual located in
a selected area is interviewed. Often additional sampling stages are introduced.
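A minimal two-stage sketch of cluster sampling is given below, assuming the population is already grouped by district. The district names, resident labels, and the function `cluster_sample` with its `per_cluster` parameter are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Stage 1: pick whole clusters at random.
    Stage 2 (optional): pick per_cluster members inside each chosen cluster."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in selected:
        members = clusters[name]
        if per_cluster is None or per_cluster >= len(members):
            result[name] = list(members)      # interview everyone in the cluster
        else:
            result[name] = rng.sample(members, per_cluster)
    return result

# Hypothetical city with 5 districts of 8 residents each.
districts = {
    f"district_{d}": [f"resident_{d}_{i}" for i in range(1, 9)]
    for d in "ABCDE"
}
sample = cluster_sample(districts, n_clusters=2, per_cluster=3, seed=5)
print(sample)
```

Leaving `per_cluster` as `None` corresponds to interviewing every individual in the selected areas; setting it adds the extra sampling stage mentioned above.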
4. INCIDENTAL SAMPLING
- this design is applied to those samples which are taken because they are the most
available. In an interview, for instance, an interviewer can simply choose to ask those
people around him or in a coffee shop where he is taking a break.
5. CONVENIENCE SAMPLING
- this method has been widely used in television and radio programs to find out the
opinions of TV viewers and listeners regarding controversial issues. While the issue is
being discussed in a talk show, the hosts will immediately get responses and comments
from those who call their telephone operators. This method, of course, is biased
against those without telephones in their houses.
SOURCES OF DATA
TWO SOURCES OF DATA
1. Primary source
2. Secondary source
PRIMARY SOURCE
- first-hand information, usually obtained by means of personal interview and
actual observation
SECONDARY SOURCE
- information is taken from others' works, news reports, readings, and records that are kept
by the National Statistics Office, Securities and Exchange Commission, SSS, and other
government and private agencies
Data are said to be an asset of a company if they are accurate, updated, and available
when needed. Hence, any institution or business organization must have a database
called a Management Information System where all information about its business is
made available in order to facilitate verification of claims and to arrive at wise
management decisions.
4. OBSERVATION METHOD
- is a scientific method of investigation that makes possible use of all senses to
measure or obtain outcomes/responses from the object of study.
5. EXPERIMENTATION
- is used when the objective is to determine the cause-and-effect relationship of a certain
phenomenon under some controlled conditions.
NOTE:
Data that are collected by these methods are usually referred to as raw data. Responses
from taped interviews, answered questionnaires, completed registration forms,
recorded observations and results of an experiment are considered raw data since they
are not yet organized and presented in a form ready for interpretation. These data can
only be understood if appropriate forms of presentation are adopted.
2. TABULAR PRESENTATION
- this form of presentation is better than textual form because it provides
numerical facts in a more concise and systematic manner.
- statistical tables are constructed to facilitate the analysis of relationships
- each class/subclass is assigned to a particular row or column and figures for
various classifications are noted in the appropriate cells
Advantages of tabular presentation
1. It is brief; it reduces the matter to the minimum.
2. It provides the reader a good grasp of the meaning of the quantitative relationship
indicated in the report.
3. It tells the whole story without the necessity of mixing textual matter with figures.
4. The systematic arrangement of columns and rows makes them easily read and
readily understood.
5. The column and rows make comparison easier.
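To make the idea concrete, a small frequency table can be built from raw categorical data as sketched below. The vehicle observations are made-up data and `frequency_table` is a hypothetical helper, shown only to illustrate how classes are assigned to rows and figures to cells.

```python
from collections import Counter

def frequency_table(values):
    """Return a plain-text table with one row per class and its frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    lines = [f"{'Class':<12}{'Frequency':>10}{'Percent':>10}"]
    for label, count in sorted(counts.items()):
        lines.append(f"{label:<12}{count:>10}{100 * count / total:>10.1f}")
    lines.append(f"{'Total':<12}{total:>10}{100.0:>10.1f}")
    return "\n".join(lines)

# Hypothetical raw data: vehicle types observed at an intersection.
vehicles = ["car", "bus", "car", "motorcycle", "car", "bus", "truck", "car"]
print(frequency_table(vehicles))
```

Each row holds one class and each column one kind of figure, so comparisons across classes can be read directly from the table.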
3. GRAPHICAL PRESENTATION
- this form is the most effective means of organizing and presenting statistical data
because the important relationships are brought out more clearly and creatively
in visual, solid and colorful figures.
1. LINE GRAPH
- it shows relationships between two sets of quantities
- this is done by plotting the points of the X set of quantities along the horizontal axis
against the Y set of quantities along the vertical axis in a Cartesian coordinate
plane
- the plotted points are then connected by line segments, which finally form the
line graph
- it is often used to predict growth trends over a longer period of time
2. BAR GRAPH OR HISTOGRAM
- it consists of bars or rectangles of equal widths, either drawn vertically or
horizontally, segmented or non-segmented
- is done by drawing rectangles with length proportional to the frequencies of
observed items or magnitude of classes under study
- two or more kinds of information can be compared by showing them in multiple
bar graphs, each of which is shaded with different colors to give distinctions of
each
- in some cases, bars can be shown in opposite directions above and below a zero
line to illustrate the relationship between profits/earnings (positive) and
losses/deficits (negative)
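The idea of bars with lengths proportional to the values, drawn on opposite sides of a zero line, can be sketched as a text-only horizontal bar graph. The quarterly figures are made-up data and `text_bar_graph` is a hypothetical helper; a plotting library would normally be used for real charts.

```python
def text_bar_graph(data, scale=1):
    """Return a horizontal bar graph; negative values extend left of the zero line."""
    width = max(abs(v) for v in data.values()) // scale
    lines = []
    for label, value in data.items():
        length = abs(value) // scale          # bar length proportional to value
        if value >= 0:
            bar = " " * width + "|" + "#" * length
        else:
            bar = " " * (width - length) + "#" * length + "|"
        lines.append(f"{label:<4}{bar}")
    return "\n".join(lines)

# Hypothetical quarterly results: profits are positive, losses are negative.
quarterly = {"Q1": 6, "Q2": -3, "Q3": 10, "Q4": -1}
print(text_bar_graph(quarterly))
```

The `|` character marks the zero line, so profits appear to its right and losses to its left.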