Statistics Analysis With Software Application
MODULE 1: INTRODUCTION TO THE STATISTICAL
DEFINITION OF STATISTICS
Statistics plays a major role in many aspects of our lives. It is used in sports, for example, to help a general manager decide which player might be the best fit for a team. It is used in politics to help candidates understand how the public feels about various policies. And statistics is used in medicine to help
• Non-statistical considerations - These may include availability of resources, manpower, budget, ethics, and the sampling frame.
1. Level of Precision
Also called sampling error, the level of precision is the range within which the true value of the population is estimated to lie.
2. Confidence Level
It is a statistical measure of the number of times out of 100 that results can be expected to fall within a specified range. For example, a 90% confidence level means that if the survey were repeated many times, the results would be expected to fall within the specified range about 90% of the time.
3. Degree of Variability
Depending upon the target population and the attributes under consideration, the degree of variability varies considerably. The more heterogeneous a population is, the larger the sample size required to achieve a given level of precision.
Here are the commonly used methods for determining the sample size.
• Slovin’s Formula
- Slovin’s formula is used to calculate the sample size n given the population size and the error. It is computed as
$$n \geq \frac{N}{1 + Ne^{2}}$$
Where:
N is the total population.
e is the level of precision.
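Example:
A researcher plans to conduct a survey about the food preferences of BS Stat students. If the population of students is 1000, find the sample size if the error is 5%.
Solution:
$$n \geq \frac{N}{1 + Ne^{2}} = \frac{1000}{1 + 1000(0.05)^{2}} = \frac{1000}{3.5} = 285.71$$
Since a sample size must be a whole number, round up: n = 286 students.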
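The arithmetic of the example can be checked with a short Python sketch. The function names are illustrative and do not come from any library; rounding up with `math.ceil` reflects the convention that the sample size must be a whole number no smaller than the computed value.

```python
import math


def slovin(population: int, error: float) -> float:
    """Return N / (1 + N * e^2) before rounding (Slovin's formula)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 < error < 1:
        raise ValueError("error must be a proportion between 0 and 1")
    return population / (1 + population * error ** 2)


def slovin_sample_size(population: int, error: float) -> int:
    """Round Slovin's result up, since n must be at least N / (1 + N e^2)."""
    return math.ceil(slovin(population, error))


# Worked example from the text: N = 1000 students, e = 5%
print(round(slovin(1000, 0.05), 2))    # 285.71
print(slovin_sample_size(1000, 0.05))  # 286
```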
Online sample size calculators may also be used:
- https://www.calculator.net/sample-size-calculator.html
- https://ph.search.yahoo.com/search?fr=mcafee&type=E211PH885G0&p=raosoft+sample+size+calculator
BASIC SAMPLING DESIGN
The goal in sampling is to obtain individuals for a study in such a way that accurate information about
the population can be obtained.
Reason for Sampling
- It is important that the individuals included in a sample represent a cross section of the individuals in the population.
- If a sample is not representative, it is biased, and you cannot generalize from your statistical data to the population.
DEFINITIONS:
• Observation unit - An object on which a measurement is taken. This is the basic unit of observation,
sometimes called an element. In studying human populations, observation units are often individuals.
• Target population - The complete collection of observations we want to study.
• Sampled population - The collection of all possible observation units that might have been chosen in a sample;
the population from which the sample was taken.
• Sample - A subset of a population.
• Sampling unit - A unit that can be selected for a sample. For example, we may want to study individuals but not have a list of all individuals in the target population. Instead, households serve as the sampling units, and the observation units are the individuals living in the households.
• Sampling frame - A list, map, or other specification of sampling units in the population from which a sample
may be selected. For a survey using in-person interviews, the sampling frame might be a list of all street
addresses.
• Sampling technique/Sampling strategy - A plan you set forth to ensure that the sample used in your research study represents the population from which you drew it.
• Sampling bias - Problems in the sampling process that make the sample unrepresentative of the population.
ADVANTAGES OF SAMPLING OVER COMPLETE ENUMERATION
- Less Labor
- Reduced Cost
- Greater Speed
- Greater Scope
- Greater Efficiency and Accuracy
- Convenience
- Ethical Considerations
TWO TYPES OF SAMPLES
1. Probability Sample
• Samples are obtained using some objective chance mechanism, thus involving randomization.
• They require the use of a complete listing of the elements of the universe called the sampling frame.
• The probabilities of selection are known.
• They are generally referred to as random samples.
• They allow drawing of valid generalizations about the universe/population.
• Simple Random Sampling - The most basic method of drawing a probability sample. It assigns equal probabilities of selection to each possible sample. It is a technique that uses random numbers or codes to represent a population, with the numbers or codes covering every member of the population. Selection can be done by drawing a name from a box, using the random function of a calculator, or using random generator programs.
When to use: This is preferable if the population is not widely spread geographically. It is also more appropriate if the population is more or less homogeneous with respect to the characteristics under study.
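Selecting with a random generator program can be sketched with Python's standard `random` module. The sampling frame below is a hypothetical list of 1000 student numbers, and `simple_random_sample` is an illustrative name. `random.Random.sample` draws without replacement, so every member (and every possible sample of size n) has the same chance of selection; the fixed seed only makes the draw reproducible.

```python
from __future__ import annotations

import random


def simple_random_sample(population: list, n: int, seed: int | None = None) -> list:
    """Draw n distinct members, each equally likely, without replacement."""
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)  # independent generator; seed only for reproducibility
    return rng.sample(population, n)


# Hypothetical sampling frame: student numbers 1..1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 286, seed=42)
print(len(sample), sample[:5])
```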
• Systematic random sampling is a technique in which every kth member of the population is selected until the required sample size is achieved. A careful organization of the members is required to ensure that the selection process yields a sample that best represents the population. You may arrange them (in a list) alphabetically, by sex, by age, by socio-economic status, by marital status, etc.
Advantage: Drawing the sample is easy. It is easy to administer in the field, and the sample is spread evenly over the population.
Disadvantage: It may give poor precision when unsuspected periodicity is present in the population.
When to use: This is advisable if the ordering of the population is essentially random and when stratification with numerous data is used.
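A minimal sketch of systematic selection, assuming a common convention the text does not state: a sampling interval k = N // n and a random start among the first k members of the ordered list. The function name and roster are hypothetical.

```python
from __future__ import annotations

import random


def systematic_sample(ordered_population: list, n: int, seed: int | None = None) -> list:
    """Select every kth member after a random start, with k = N // n."""
    N = len(ordered_population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                                 # sampling interval
    start = random.Random(seed).randrange(k)  # random start among the first k members
    # Positions start, start + k, start + 2k, ...; keep exactly n of them
    return ordered_population[start::k][:n]


# Hypothetical roster of 100 students, already arranged (e.g., alphabetically)
roster = list(range(1, 101))
chosen = systematic_sample(roster, 10, seed=7)
print(chosen)  # ten members, each 10 positions apart
```

Because the list is traversed at a fixed interval, any periodicity in the ordering that lines up with k can bias the sample, which is the disadvantage noted above.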
• Stratified Random Sampling
When to use: If the characteristics of the respondents under consideration are concentrated in small segments spread across the population. Thus, this is preferred if precise estimates are desired for the different strata of the population and if sampling problems differ across the strata of the population.
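The text does not say how the sample is divided among the strata. The sketch below assumes proportional allocation (each stratum's share of n matches its share of N), with leftover units from rounding given to the strata with the largest fractional parts, and then draws a simple random sample within each stratum. Function names, stratum names, and sizes are hypothetical.

```python
from __future__ import annotations

import math
import random


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split n across strata in proportion to stratum size (largest-remainder rounding)."""
    N = sum(stratum_sizes.values())
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and the population size")
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata whose exact share was rounded down the most
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def stratified_sample(strata: dict[str, list], n: int, seed: int | None = None) -> dict[str, list]:
    """Draw a simple random sample within each stratum using proportional allocation."""
    rng = random.Random(seed)
    alloc = proportional_allocation({h: len(m) for h, m in strata.items()}, n)
    return {h: rng.sample(members, alloc[h]) for h, members in strata.items()}


# Hypothetical population of 1000 students split by sex
strata = {"Male": list(range(600)), "Female": list(range(600, 1000))}
sample = stratified_sample(strata, 50, seed=1)
print({h: len(s) for h, s in sample.items()})  # {'Male': 30, 'Female': 20}
```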
• Cluster Sampling
Disadvantage: In actual field applications, adjacent households tend to have more similar characteristics than households that are far apart.
When to use: If the population can be grouped into clusters whose individual elements are known to differ with respect to the characteristics under study, this is preferable to use.
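A minimal one-stage cluster sampling sketch: whole clusters are chosen at random, and every member of each chosen cluster is included. The clusters here are hypothetical city blocks of households, and the function name is illustrative.

```python
from __future__ import annotations

import random


def cluster_sample(clusters: dict[str, list], num_clusters: int, seed: int | None = None) -> dict[str, list]:
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    if not 0 <= num_clusters <= len(clusters):
        raise ValueError("num_clusters must be between 0 and the number of clusters")
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # sorted() gives a list of cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: 20 city blocks with 5 households each
blocks = {f"Block {b}": [f"B{b}-H{h}" for h in range(1, 6)] for b in range(1, 21)}
picked = cluster_sample(blocks, 4, seed=3)
households = [hh for members in picked.values() for hh in members]
print(len(picked), len(households))  # 4 20
```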
Take Note!
Use probability sampling if the main objective of the sample survey is to make inferences about the characteristics of the population under study.
2. Non-probability Sample
• Accidental Sampling - There is no system of selection; only those whom the researcher or interviewer meets by chance are included.
• Quota Sampling - A specified number of persons of certain types is included in the sample. The researcher is aware of the categories within the population and draws samples from each category. The size of each categorical sample is proportional to the share of the population that belongs to that category.
• Convenience Sampling - It is a process of picking out people in the most convenient and fastest way to get reactions immediately. For example, telephone interviews can be used to get the immediate reactions of a group of respondents to a certain issue.
• Purposive Sampling - It is based on certain criteria laid down by the researcher. People who satisfy the criteria are interviewed. The criteria determine which members of the target population will be taken for the study.
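Quota sampling, as described above, fills each category's quota with respondents in the order they are met; no random selection is involved. The sketch computes quotas proportional to each category's population share (rounded, so totals may differ from n by rounding) and accepts respondents in arrival order until every quota is full. Function names, categories, and respondents are hypothetical.

```python
from __future__ import annotations


def quota_sizes(category_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Quota per category proportional to its population share (rounded)."""
    N = sum(category_sizes.values())
    return {c: round(n * size / N) for c, size in category_sizes.items()}


def fill_quotas(arrivals: list[tuple[str, str]], quotas: dict[str, int]) -> dict[str, list[str]]:
    """Accept respondents in the order they are met until each category's quota is full."""
    accepted: dict[str, list[str]] = {c: [] for c in quotas}
    for name, category in arrivals:
        if category in accepted and len(accepted[category]) < quotas[category]:
            accepted[category].append(name)
        if all(len(accepted[c]) == quotas[c] for c in quotas):
            break  # every quota is filled; stop interviewing
    return accepted


# Hypothetical: population of 60 male and 40 female students, quota sample of 5
quotas = quota_sizes({"Male": 60, "Female": 40}, 5)  # {'Male': 3, 'Female': 2}
arrivals = [("Ana", "Female"), ("Ben", "Male"), ("Cara", "Female"),
            ("Dina", "Female"), ("Eli", "Male"), ("Fe", "Male"), ("Gil", "Male")]
print(fill_quotas(arrivals, quotas))
```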
Non-probability sampling is used when:
- Only a few are willing to be interviewed
- There are extreme difficulties in locating or identifying subjects
- Probability sampling is more expensive to implement
- The population elements cannot be enumerated
Sources of Errors in Sampling
1. Non-sampling Error
- Errors that result from the survey process.
- Any errors that cannot be attributed to the sample-to-sample variability.
Sources of Non-Sampling Error
1. Non-responses
2. Interviewer Error
3. Misrepresented Answers
4. Data Entry Errors
5. Questionnaire Design
6. Wording of Questions
7. Selection Bias
2. Sampling Error
- Error that results from taking one sample instead of examining the whole population.
- Error that results from using sampling to estimate information regarding a population.
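Sampling error can be made concrete with a small simulation. For a known (hypothetical) population of scores, the sampling error of one sample mean is taken here as the sample mean minus the population mean, and it changes from sample to sample. The function name is illustrative, and non-sampling errors are not modeled.

```python
from __future__ import annotations

import random
import statistics


def sampling_error(sample: list[float], population: list[float]) -> float:
    """Sample mean minus population mean for one sample."""
    return float(statistics.mean(sample) - statistics.mean(population))


# Hypothetical population: exam scores of 1000 students
rng = random.Random(0)
population = [rng.randint(60, 100) for _ in range(1000)]

# Draw five samples of 30 and show how the error varies from sample to sample
for trial in range(5):
    sample = rng.sample(population, 30)
    print(f"sample {trial + 1}: error = {sampling_error(sample, population):+.2f}")

# Examining the whole population leaves no sampling error
print(sampling_error(population, population))  # 0.0
```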
MODULE 3: DATA PRESENTATION
Data are usually collected in a raw format and thus the
Based on the data, assign numerical codes to sex (i.e., Male = 1, Female = 2).
Using a formula in Excel, calculate the frequency distribution to determine how many female and male respondents participated in the study.
1. Highlight the cells of the “Frequency Column”.
2. Type the formula =FREQUENCY(
3. For the data array, highlight the data in the “Sex Column”, type a comma, then highlight the data in the “Bin Array”.
4. Press Ctrl + Shift + Enter.
5. Sum up the total using the AutoSum command or =SUM(number1, number2), then press ENTER.
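Outside Excel, the same frequency count can be reproduced in Python with `collections.Counter`. The coded list below is hypothetical (1 = Male, 2 = Female, following the coding step above); replace it with the actual Sex column.

```python
from collections import Counter

SEX_LABELS = {1: "Male", 2: "Female"}  # coding used in the text

# Hypothetical coded responses; replace with the actual "Sex Column"
sex_codes = [1, 2, 1, 1, 2, 2, 1, 2, 1, 1]

counts = Counter(sex_codes)  # for codes 1 and 2, same counts as FREQUENCY() with bins 1 and 2
frequency = {SEX_LABELS[code]: counts.get(code, 0) for code in SEX_LABELS}
total = sum(frequency.values())  # like =SUM() under the Frequency column

print(frequency, total)  # {'Male': 6, 'Female': 4} 10
```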
Percentages
1. Construct the percentage table by adding a column to the frequency table.
2. Compute each percentage as the frequency divided by the total, multiplied by 100.
3. Sum up the total using the AutoSum command or =SUM(number1, number2), then press ENTER.
Now present the final output:
FINAL OUTPUT
Table 1 shows the frequency and percentage distribution of the respondents in terms of sex. It can be gleaned from the table that, out of the 20 respondents considered in the study, 11 or 55% are male and 9 or 45% are female.
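The percentages in Table 1 follow from percentage = frequency / total × 100. The short sketch below reproduces them from the reported counts (11 male, 9 female); the function name is illustrative.

```python
def percentage_table(frequency: dict) -> dict:
    """Return each category's percentage of the total: frequency / total * 100."""
    total = sum(frequency.values())
    return {category: count * 100 / total for category, count in frequency.items()}


counts = {"Male": 11, "Female": 9}  # frequencies reported for Table 1
percent = percentage_table(counts)
print(percent, sum(counts.values()))  # {'Male': 55.0, 'Female': 45.0} 20
```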
If the data are quantitative
The study aims to determine the Intrinsic and Extrinsic Motivation to the academic performance of the
Grade 10 Students of Maximo L. Gatlabayan Memorial National High School.
1. What is the profile of the respondents in terms of sex?
2. What is the level of the Academic Performance of the students?
STEPS:
1. Set an interval or range for your data. It is needed for the “BIN RANGE”.
2. Click “DATA” on the menu bar, then click “DATA ANALYSIS” on the toolbar.
3. In the “DATA ANALYSIS” dialog box, choose “HISTOGRAM”, then click OK.
4. Highlight your data for the “INPUT RANGE”.
5. Highlight the intervals set in Step 1 for the “BIN RANGE”.
6. Check the box “LABELS IN FIRST ROW”, then click “OK”.
7. The result will appear on a new worksheet of the Excel file. Compute the percentages and the total.
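Excel's Histogram tool counts, for each bin value, the data greater than the previous bin value and less than or equal to the current one, with a final “More” row for anything above the last bin. A Python sketch of that counting rule follows; the grades and the bin range of upper limits 79, 84, 89, and 100 are hypothetical, as is the function name.

```python
from bisect import bisect_left


def bin_counts(data: list, bin_upper_limits: list) -> list:
    """Count values per bin: previous limit < x <= limit; last entry is 'More'."""
    counts = [0] * (len(bin_upper_limits) + 1)
    for x in data:
        # index of the first limit >= x; equals len(limits) when x exceeds every limit
        counts[bisect_left(bin_upper_limits, x)] += 1
    return counts


grades = [78, 82, 85, 91, 88, 76, 95, 84, 80, 89]  # hypothetical academic grades
bins = [79, 84, 89, 100]                           # hypothetical BIN RANGE (sorted upper limits)
labels = ["79 and below", "80-84", "85-89", "90-100", "More"]

counts = bin_counts(grades, bins)
total = sum(counts)
for label, c in zip(labels, counts):
    print(f"{label:>12}: {c:2d}  {c * 100 / total:5.1f}%")
print(f"{'Total':>12}: {total:2d}")
```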
A graph is a very effective visual tool as it displays data at a glance, facilitates comparison, and can
reveal trends and relationships within the data such as changes over time, and correlation or relative
share of a whole.
It is considered an important medium of communication because it creates a pictorial representation of numerical figures.
Graphs are well suited for showing the results of a study to nonprofessionals and to people who dislike numbers and lengthy texts.
BAR GRAPH
- It is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category’s frequency or relative frequency.
- It is used to organize discrete data.
Simple Bar Graph. The simple bar chart is used for the case of one variable only.
Multiple Bar Graph/Grouped Column Chart. The multiple bar chart is an extension of the simple bar chart used when quantities of several variables are to be displayed. The bars representing the quantities for the different variables are placed next to one another for each attribute. The figure becomes very cumbersome when there are too many variables and components.
Component Bar Graph/Subdivided Column Chart. In this type of bar chart, the components (quantities) of each variable are stacked on top of one another. It saves space compared to a multiple bar chart. One of the disadvantages of this graph is that it is not always easy to compare the sizes of the components, or parts. It is used to represent data in which the total magnitude is divided into different components.
Remember!
Bar graphs may also be drawn with horizontal bars. Horizontal bars are preferable when category names
are lengthy.
In bar graphs, the order of the categories does not usually matter. However, bar graphs that have
categories arranged in decreasing order of frequency help prioritize categories for decision-making
purposes in areas such as quality control, human resources, and marketing.
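Following the note above, a horizontal bar graph with categories in decreasing order of frequency can be sketched in plain text. Rendering with characters instead of a charting tool is an assumption made only to keep the example self-contained; the function name and the food-preference counts are hypothetical.

```python
def text_bar_graph(frequencies: dict, symbol: str = "#") -> str:
    """Render horizontal bars, one per category, sorted by decreasing frequency."""
    ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    width = max(len(name) for name in frequencies)  # align bars after the longest label
    lines = [f"{name:<{width}} | {symbol * count} {count}" for name, count in ordered]
    return "\n".join(lines)


# Hypothetical survey counts of preferred food among students
prefs = {"Noodles": 7, "Salads": 2, "Rice meals": 12, "Sandwiches": 4}
print(text_bar_graph(prefs))
```

Horizontal bars leave room for long category names, and sorting puts the most frequent category first.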
HISTOGRAM
- It is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.
- It is a graph used to present quantitative data and is similar to the bar graph.
- It is used to organize continuous data.
PIE CHART
- It is a circle divided into sectors. Each sector represents a category of data. The area of each sector is
proportional to the frequency of the category.
- Pie charts are typically used to present the relative frequency of qualitative data. In most cases the data
are nominal, but ordinal data can also be displayed in a pie chart.
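Because each sector's area is proportional to the category's frequency, its central angle is the relative frequency × 360°. A small sketch using the counts reported in Table 1 (11 male, 9 female out of 20); the function name is illustrative.

```python
def pie_angles(frequencies: dict) -> dict:
    """Central angle of each sector in degrees: frequency / total * 360."""
    total = sum(frequencies.values())
    return {category: count * 360 / total for category, count in frequencies.items()}


angles = pie_angles({"Male": 11, "Female": 9})
print(angles)  # {'Male': 198.0, 'Female': 162.0}
```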
When should a bar graph or a pie chart be used?
Pie charts are useful for showing the division of all possible values of a qualitative variable into its parts.
Bar graphs are useful when we want to compare the different parts, not necessarily the parts to the
whole.
LINE GRAPH
- A graph that shows information that is connected in some way (such as change over time).
- Points are plotted for the data values, and line segments are then drawn connecting the points. It is used to organize continuous data.
- Very useful in identifying trends in the data over time.
DESCRIPTIVE STATISTICS
How to Calculate Measures of Central Tendency, Measures of Variation, Skewness and Kurtosis for
Ungrouped and Sample Data Using Excel?
Example:
The data given below are the scores of randomly selected applied statistics undergraduate students in Section A and Section B. Compare the scores of Section A and Section B based on measures of central tendency and measures of variation, and determine which section performed better in their final examination. Also, describe the shape of the distribution of these two data sets using skewness and kurtosis.
1. Click “DATA” on the menu bar, then click “DATA ANALYSIS” on the toolbar. The dialog box will appear.
2. Select “Descriptive Statistics”, then click “OK”.
3. Highlight your data for the “INPUT RANGE” and check the box “LABELS IN FIRST ROW”.
4. Check “Summary statistics”, then click “OK”. Repeat the process for Data Set B.
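The main Summary statistics can also be approximated in Python. This is a sketch, not Excel's own code: skewness and kurtosis below follow the sample formulas documented for Excel's SKEW and KURT functions, and kurtosis is reported as excess kurtosis, so a negative value indicates a platykurtic shape. The function names and the scores are hypothetical placeholders.

```python
from __future__ import annotations

import statistics


def summary_statistics(data: list[float]) -> dict[str, object]:
    """Rough Python counterpart of Excel's Descriptive Statistics summary."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    z = [(x - mean) / s for x in data]
    # Sample skewness, same formula as Excel SKEW
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Sample excess kurtosis, same formula as Excel KURT
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Median": statistics.median(data),
        "Mode": statistics.multimode(data),  # all most frequent values
        "Standard Deviation": s,
        "Sample Variance": statistics.variance(data),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Range": max(data) - min(data),
        "Count": n,
    }


def describe_shape(skew: float, kurt: float) -> str:
    """Label the shape using the signs of skewness and excess kurtosis."""
    side = "skewed to the right" if skew > 0 else "skewed to the left" if skew < 0 else "symmetric"
    peak = "platykurtic" if kurt < 0 else "leptokurtic" if kurt > 0 else "mesokurtic"
    return f"{peak}, {side}"


section_a = [70, 72, 75, 75, 78, 80, 83, 85, 90, 92]  # hypothetical scores
stats_a = summary_statistics(section_a)
print(stats_a)
print(describe_shape(stats_a["Skewness"], stats_a["Kurtosis"]))
```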
When comparing distributions, it is better to use a measure of variation/dispersion in addition to a measure of central tendency. However, because in this example Data Set A and Data Set B have the same values for the measures of central tendency, we will use only the measures of variation/dispersion to compare the two data sets.
Based on the results, Data Set B has greater variability, since it has larger values for the different measures of variation. This means that Data Set B is much more spread out than Data Set A.
In this example, the section that performed better is the one with a large mean and a small standard deviation. Section A and Section B have the same mean, but Section A has the smaller standard deviation; therefore, Section A performed better in the final examination. In terms of the shape of the distribution, the two data sets have the same shape based on skewness and kurtosis: both Data Set A and Data Set B are platykurtic and skewed to the right.
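The decision rule above can be written as a small function, assuming the means are compared first and the standard deviations are used only when the means tie. The function name and the two score lists are hypothetical; with real data, substitute the Section A and Section B scores.

```python
import statistics


def better_section(scores_a: list, scores_b: list) -> str:
    """Prefer the higher mean; if the means tie, prefer the smaller standard deviation."""
    mean_a, mean_b = statistics.mean(scores_a), statistics.mean(scores_b)
    if mean_a != mean_b:
        return "Section A" if mean_a > mean_b else "Section B"
    sd_a, sd_b = statistics.stdev(scores_a), statistics.stdev(scores_b)
    if sd_a == sd_b:
        return "tie"
    return "Section A" if sd_a < sd_b else "Section B"


# Hypothetical scores with equal means (80) but different spread
section_a = [76, 78, 80, 82, 84]
section_b = [65, 72, 80, 88, 95]
print(better_section(section_a, section_b))  # Section A
```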