Rosalie Act. 2.0
ACTIVITY 5
A. Definition of terms
1. WHAT IS STATISTICS?
The term STATISTICS has both a plural and a singular sense. In its plural sense, it refers to numerical
facts that are systematically collected and analyzed.
Example: Readers of the business section of a newspaper would think of statistics as the consumer
price index, the returns of a particular stock, the peso-to-dollar exchange rate, and other figures
discussed in the newspaper.
In its singular sense, the word statistics refers to the scientific discipline consisting of theory and
methods for processing numerical information that one can use when making decisions in the face of
uncertainty. The recognition of uncertainty and the importance of statistical activities are likely to be as
old as civilization itself. Even before the art of counting was perfected, there is evidence to suggest that
herdsmen were putting notches on trees to keep track of their cattle. In both its plural and singular senses, the
term statistics refers to quantities computed from numerical information.
As a branch of applied mathematics, it deals with the systematic method of collecting, classifying,
presenting, analyzing, and interpreting numerical data.
Statistical methods range from the most elementary descriptive tools for summarizing data, to
rather esoteric procedures and models. These methods enable us to develop a way of thinking that
helps us
• describe or characterize persons, objects, situations, and phenomena with some reliability;
Statistical methods typically have two broad aims: (a) to describe, and (b) to infer. In the first
case, the main task is data organization and presentation (without drawing conclusions or
inferences beyond the data). These tools are called descriptive statistical methods. In the second case,
the task is to generalize results beyond the data collected, where the data are a part (sample) of a larger set of items
(population). In this case, the statistical analysis required is called inferential statistical methods.
A. DESCRIPTIVE STATISTICS
Descriptive statistics deals with methods for collecting, organizing, and describing data by using tables,
graphs, and summary measures.
Suppose that a test in a statistics course is given to a class at KSU and the test scores of all students are
collected. The test scores are then called a data set (the definition of this term is
discussed further in section 1.2). A data set in its original form is usually very large, so it is not easy
to use it to draw conclusions or make decisions; it is much easier to draw conclusions from
summary tables and diagrams than from the original data. Reducing the data set to a more
manageable form by constructing tables, drawing graphs, and providing some numerical characteristics is a
simple way to describe what descriptive statistics does.
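The reduction of a raw data set to a table and a few summary measures can be sketched in Python. This is only an illustration: the scores below are made-up values, not data from any real class, and the choice of summary measures is an assumption.

```python
import statistics
from collections import Counter

# Hypothetical test scores for a small class (illustrative values only).
scores = [72, 85, 90, 66, 85, 78, 94, 85, 70, 78]

# Summary measures condense the raw data set into a few numbers.
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "min": min(scores),
    "max": max(scores),
}

# A simple table: how many students obtained each score.
score_counts = Counter(scores)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
for score in sorted(score_counts):
    print(f"{score:>6}: {score_counts[score]} student(s)")
```

The printed table and summary are usually easier to read than the original list, which is the point of descriptive methods.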
B. INFERENTIAL STATISTICS
Inferential statistics deals with methods that use sample results to help estimate or make decisions
about the population.
The set of all elements (observations) of interest in a study is called a population, and the selected
elements from the population are called a sample. In statistical problems we may be interested in
making a decision or prediction about a population by using results obtained from selected samples.
For instance, we may be interested in finding the number of absent students at PY on a certain day of the week. To
do so, we may select 200 classes from PY and record the number of students who are absent on that day,
then use this information to make a decision. The area of statistics concerned with such decisions
is referred to as inferential statistics.
POPULATION is the set of all possible values of the variable. It is the totality of all the actual objects of a
certain class under consideration. Its size is denoted by a capital N.
Notice, the population answers who you want to measure and what you want to measure. Make sure
that your population always answers both of these questions. If it doesn’t, then you haven’t given
someone who is reading your study the entire picture. As an example, if you just say that you are going
to collect data from the senators in the U.S. Congress, you haven’t told your reader what you are going
to collect. Do you want to know their income, their highest degree earned, their voting record, their age,
their Political party, their gender, their marital status, or how they feel about a particular issue? Without
telling what you want to measure, your reader has no idea what your study is actually about.
Sometimes data on the entire population are very easy to collect. For example, if you are interested in finding the average
age of all of the current senators in the U.S. Congress, there are only 100 senators. This wouldn’t be
hard to find. However, if instead you were interested in knowing the average age at which a senator in the
U.S. Congress first took office, for all senators who ever served in the U.S. Congress, then this would be a
bit more work. It is still doable, but it would take a bit of time to collect. But what if you are interested in
finding the average diameter at breast height of all of the Ponderosa Pine trees in the Coconino National
Forest? This would be impossible to actually collect. What do you do in these cases? Instead of collecting
the entire population, you take a smaller group of the population, a kind of snapshot of the population.
This smaller group is called a sample.
How you collect your sample can determine how accurate the results of your study are. There are
many ways to collect samples, and some of them create better samples than others, although no sampling method is
perfect. Sampling techniques will be discussed later. For now, realize
that every time you take a sample you will find different data values. The sample is a snapshot of the
population, and there is more information than is in the picture. The idea is to try to collect a sample
that gives you an accurate picture, but you will never know for sure if your picture is the correct picture.
Unlike previous mathematics classes where there was always one right answer, in statistics there can be
many answers, and you don’t know which are right.
Once you have your data, either from a population or a sample, you need to know how you want to
summarize the data. As an example, suppose you are interested in finding the proportion of people who
like a candidate, the average height a plant grows to using a new fertilizer, or the variability of the test
scores. Understanding how you want to summarize the data helps to determine the type of data you
want to collect. Since the population is what we are interested in, then you want to calculate a number
from the population. This is known as a parameter. As mentioned already, you often can’t collect the
entire population. Even though the parameter is the number you are interested in, you can’t really calculate it.
Instead you use the number calculated from the sample, called a statistic, to estimate the parameter.
Since no sample is exactly the same, the statistic values are going to be different from sample to sample.
They estimate the value of the parameter, but again, you do not know for sure if your answer is correct.
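The difference between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population below is made up (randomly generated ages), and the sample size of 50 and the number of samples are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: ages of 10,000 people (made-up values).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: a number computed from the whole population.
parameter = statistics.mean(population)

# Statistics: numbers computed from samples; each sample gives its own value.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"parameter (population mean): {parameter:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {m:.2f}")
```

Each sample mean lands near the population mean but differs from sample to sample. In a real study only one such sample mean would be available and the parameter would stay unknown.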
VARIABLE is the attribute of interest observable on each entity in the universe. It is an observable
characteristic or phenomenon that is capable of taking several values or of being expressed in several
different categories.
1. Qualitative (Categorical) Data (Variable) result when the information has been sorted into
categories. They answer the question “what kind”. They can be either ordered or unordered.
Examples: Ordered – income data grouped into high, middle, and low income status. Unordered
– sex, religion, party affiliation
2. Quantitative (Numerical) Data are results of counting and measuring. They answer questions such as “how
much” or “how many”.
• Discrete data (variable) are those data that can be counted. They are quantified by the use of whole
numbers.
Examples: the number of days before some equipment fails, the ages of survey respondents measured
to the nearest year, the number of pandesal sold on a certain day.
• Continuous data (variable) are those that can be measured. They may take any value within a specified range
of values.
Examples: heights of survey respondents, exact volume of some liquid substance, weights of a sample of
10 children.
UNGROUPED DATA
Ungrouped data is defined as the data given as individual points (i.e. values or numbers) such as 15, 63,
34, 20, 25, and so on.
The measurement process is an integral part of data collection. If the unit of analysis is an individual
person, many characteristics of that person, some visible and others invisible, can be measured.
The kind of analysis that one can perform on the available data critically depends on its scale of
measurement. We describe the kind of information a measurement gives by stating the scale on which the
measurement is made.
NOMINAL SCALE is the most limited type of measurement. A measurement of a property has a nominal
scale if the measurement tells only what class a unit falls in with respect to that property. It is used to
differentiate classes or categories for purely classification or identification purposes.
Examples: sex, employment status, race, language spoken at home, plate numbers of cars
ORDINAL SCALE is a measurement used to tell when one unit has more of the property than does
another unit. It specifies the relative position of items with respect to a given characteristic. It is like the
nominal scale in that it consists of mutually exclusive and exhaustive categories. However, the categories
are ranked in order of their value on the property.
Examples: employee ratings, salary grade, ranks given to contestants in an essay writing contest
INTERVAL SCALE is used when one unit differs by a certain amount of the property from another unit. It
possesses the properties of the ordinal scale with the additional property of equal intervals between
rank-ordered items. It allows addition and subtraction operations, but it does not possess an absolute
zero.
RATIO SCALE is used when one unit has so many times as much of the property as does another unit.
The ratio scale possesses an absolute, fixed zero point and allows all arithmetic operations. The
existence of an absolute zero point is the only difference between ratio and interval measurements.
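The practical difference between interval and ratio scales can be sketched with temperature, an example not taken from the text above: degrees Celsius form an interval scale (0 °C does not mean "no temperature"), while kelvins form a ratio scale. The two temperature values are arbitrary illustrative choices.

```python
# Temperatures in degrees Celsius: an interval scale (no absolute zero).
celsius_a, celsius_b = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = celsius_b - celsius_a  # b is 10 degrees warmer than a

# Ratios are NOT meaningful here: 20 °C is not "twice as hot" as 10 °C.
naive_ratio = celsius_b / celsius_a

# Kelvin has an absolute zero, so it is a ratio scale and ratios make sense.
kelvin_a, kelvin_b = celsius_a + 273.15, celsius_b + 273.15
true_ratio = kelvin_b / kelvin_a

print(difference, naive_ratio, round(true_ratio, 3))
```

The naive Celsius ratio of 2 disappears once the same temperatures are expressed on a scale with a true zero, where the ratio is only about 1.035.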
RANDOM SAMPLING
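For a simple random sample of n elements drawn without replacement from a population of N elements, the probability P that a given element is selected is:
$P = 1 - \frac{N-1}{N} \cdot \frac{N-2}{N-1} \cdots \frac{N-n}{N-(n-1)}$
Cancelling the common factors gives $P = 1 - \frac{N-n}{N}$, so
$P = \frac{n}{N}$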
• When drawing with replacement, the chance that a given element is selected at least once is given by:
$P = 1 - \left(1 - \frac{1}{N}\right)^n$
Suppose a firm has 1000 employees, of whom 100 have to be selected for onsite work. All their
names are put in a basket and 100 names are pulled out. Each employee has an equal chance
of being selected, so we can easily calculate the probability (P) of a given employee being selected
since we know the sample size (n) and the population size (N).
$P = 1 - \left(1 - \frac{1}{N}\right)^n$
$P = 1 - \left(\frac{999}{1000}\right)^{100}$
$P \approx 0.0952$
$P \approx 9.5\%$
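The arithmetic for the employee example can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and both selection probabilities are computed side by side, assuming N = 1000 and n = 100 as in the example.

```python
N = 1000  # population size (employees)
n = 100   # sample size (names drawn)

# Drawing WITHOUT replacement: each employee is selected with probability n/N.
p_without_replacement = n / N

# Drawing WITH replacement: probability of being picked at least once in n draws.
p_with_replacement = 1 - (1 - 1 / N) ** n

print(f"without replacement: {p_without_replacement:.4f}")
print(f"with replacement:    {p_with_replacement:.4f}")
```

The two values are close (about 10% and about 9.5%) because the sample is small relative to the population.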
SCHEDULE
A schedule is the tool or instrument used to collect data from the respondents while an interview is
conducted. A schedule contains questions, statements (on which opinions are elicited), and blank
spaces/tables for recording the respondents’ answers.
TABULATION
• The process of placing classified data into tabular form is known as tabulation.
• Tabular presentation is the process of condensation of the data for convenience in statistical
processing, presentation and interpretation of the information.
• To do comparisons
• To simplify data
Methods of Tabulation
Simple tabulation:
• For example: a survey that determines the number of employees of a company using different
brands of mobile phones such as BlackBerry, Nokia, iPhone, etc.
Double tabulation:
• For example: the number of male and female employees in the company using different brands of mobile
phones such as BlackBerry, Nokia, iPhone, etc.
Complex tabulation:
• For example: the number of male (ages 22-24 and 24-32) and female (ages 22-24 and 24-32) employees in the company
using different brands of mobile phones such as BlackBerry, Nokia, iPhone, etc.
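Simple and double tabulation of the mobile phone example can be sketched in Python as follows. The survey records are hypothetical, and counting with `collections.Counter` is just one possible approach. Complex tabulation would add a third characteristic, such as age group, to the key.

```python
from collections import Counter

# Hypothetical survey records: (sex, phone brand) for each employee.
records = [
    ("male", "Nokia"), ("female", "iPhone"), ("male", "BlackBerry"),
    ("female", "Nokia"), ("male", "Nokia"), ("female", "iPhone"),
]

# Simple tabulation: counts by a single characteristic (brand).
simple = Counter(brand for _, brand in records)

# Double tabulation: counts by two characteristics (sex and brand).
double = Counter(records)

brands = sorted(simple)
print("Brand       Count")
for brand in brands:
    print(f"{brand:<12}{simple[brand]}")

print("\nSex     " + "".join(f"{b:<12}" for b in brands))
for sex in ("male", "female"):
    row = "".join(f"{double[(sex, b)]:<12}" for b in brands)
    print(f"{sex:<8}{row}")
```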
CUMULATIVE SAMPLING
FREQUENCY DISTRIBUTION
A frequency distribution is a representation, either in a graphical or tabular format, that displays the
number of observations within a given interval. The frequency is how often a value occurs in an interval
while the distribution is the pattern of frequency of the variable.
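A tabular frequency distribution can be built by assigning each observation to a class interval and counting. The sketch below is illustrative only: the data values are made up, and the class width of 10 is an assumed choice.

```python
from collections import Counter

# Hypothetical observations (illustrative values only).
data = [15, 63, 34, 20, 25, 41, 58, 37, 22, 49, 31, 66]

width = 10  # assumed class width

# Map each value to the lower limit of its class interval, then count.
frequency = Counter((x // width) * width for x in data)

print("Interval   Frequency")
for lower in sorted(frequency):
    print(f"{lower}-{lower + width - 1:<7}{frequency[lower]}")
```

The frequencies always add up to the number of observations, which is a quick check that no value was dropped or counted twice.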
FREQUENCY POLYGON
HISTOGRAM
A histogram is a type of chart that shows the frequency distribution of data points across a continuous
range of numerical values. The values are grouped into bins or buckets that are arranged in consecutive
order along the horizontal x-axis at the bottom of the chart. Each bin is represented by a vertical bar that
sits on the x-axis and extends upward to indicate the number of data points within that bin.
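The binning behind a histogram can be sketched in plain Python. The heights, bin width, and starting edge below are all assumed values for illustration, and the bars are drawn horizontally as text rather than as vertical bars on a chart.

```python
# Hypothetical measurements on a continuous scale (illustrative values only).
heights_cm = [151.2, 158.4, 160.1, 162.7, 163.3, 165.0, 167.8, 168.2, 171.5, 179.9]

bin_width = 5.0   # assumed bin width
start = 150.0     # assumed lower edge of the first bin
num_bins = 6      # bins cover 150 up to (not including) 180

# Count how many data points fall into each bin [lower, lower + bin_width).
counts = [0] * num_bins
for h in heights_cm:
    index = int((h - start) // bin_width)
    counts[index] += 1

# Draw one bar per bin; the bar length equals the bin's frequency.
for i, count in enumerate(counts):
    lower = start + i * bin_width
    print(f"{lower:5.0f}-{lower + bin_width:<5.0f} | {'#' * count}")
```

Changing `bin_width` changes the shape of the picture, so the choice of bins is part of how a histogram is read.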