Reviewer


Module 1 – Introduction to Bio Statistics and Epidemiology

ENGAGE

Did you know that statistics is at work in our everyday life? We are often unaware of
it, yet it is present everywhere: in a basketball game, in the number of children in a
community, in the number of people living in a certain barangay, in the total population
of the Philippines, and in the COVID-19 pandemic that we are experiencing now. These are
just a few of the events that produce data, and statistical methods can be applied to
those data. As you learn statistics, you will appreciate how different statistical tools
are used to treat data.

For example, any transactional data (from stores to banks), any tournament data (player
statistics), and any survey or census data (unemployment, houses, roads, population,
etc.) are statistical data. Your history and my history are statistical data too. A
statistical graph is one way of showing statistical data.

EXPLORE

What is the importance of statistics?

Statistical knowledge helps you use the proper methods to collect the data,
employ the correct analyses, and effectively present the results. Statistics is a crucial
process behind how we make discoveries in science, make decisions based on data,
and make predictions.
Statistics is the collection, calculation, summarization, presentation, analysis, and
interpretation of data.

According to Investopedia (April 9, 2019), statistics is a form of mathematical analysis
that uses quantified models, representations, and synopses for a given set of
experimental data or real-life studies. Statistics studies methodologies to gather,
review, analyze, and draw conclusions from data.

Branches of Statistics
Descriptive Statistics aims at summarizing and presenting data in a form that makes
them easier to analyze and interpret.
Inferential Statistics aims at drawing conclusions and making decisions about a
population based on evidence obtained from a sample.

Classification of statistics

Parametric Statistics is a statistical approach that assumes a random sample from a
normal distribution and involves testing hypotheses about the mean.

Examples: One Sample – z-test / t-test
Two Dependent Samples – t-test
Two Independent Samples – z-test
Three or More Independent Samples – One-way ANOVA
Three or More Dependent Samples – Two-way ANOVA

Nonparametric Statistics is a statistical approach with no underlying data distribution
assumed and involves hypothesis testing about a population median.

Examples: One Sample – Sign test
Two Dependent Samples – Wilcoxon signed-rank
Two Independent Samples – Mann-Whitney
Three or More Independent Samples – Kruskal-Wallis
Three or More Dependent Samples – Friedman

Measurement
- Refers to the process of assigning meaningful numbers (or labels) to individual
persons based on the degree to which they possess particular characteristics.

The four levels of measurement, namely:

1. Nominal scale consists of a finite set of possible values or categories that have
no natural order.
Example: cause of death (cancer, accident), gender (male, female), blood type,
nationality, occupation, civil status, and so on.
2. Ordinal scale consists of a finite set of possible values or categories that have
a natural order.
Example: pain level (none, mild, moderate, severe), social status, socio-economic
status, and so on.
The ordinal scale ranks the categories clearly, but the absolute distances between
categories are unknown. The numbers have limited meaning, and the real differences
between adjacent ranks may not be equal.
3. Interval scale is generally measured on a continuum, and the differences between
any two numbers on the scale are of known size.
Example: temperature in degrees Celsius, IQ score, calendar year, and so on.
An important property of the interval scale is that it has no true zero point. That is,
the value “0” is arbitrary and does not reflect absence of the attribute.
4. Ratio scale, like the interval scale, is also measured on a meaningful continuum.
The distinction is that the ratio scale has a meaningful zero point.
Example: weight in kilograms, height, age, income, tons of gravel, number of
COVID-19 positive cases, and so on.
The ratio scale is used when not only the order and interval size are important,
but also the ratio between two measurements is meaningful. This scale of
measurement is the highest or most precise scale.

Variables

- Refer to characteristics of persons or objects which can take on different values or
labels for different persons or objects under consideration.

Example: undergraduate major is a variable that can take on values such as mathematics
or science. Other variables include smoking habits, attitude toward studies, height,
faculty rank, and so on.

There are two types of variables

1. Response variable – may be continuous, ordinal, or nominal. In a regression
setting, it is called the dependent variable or Y variable.
2. Explanatory variable – a variable that is thought to affect the values of the
response variable. It is sometimes called an independent variable or X variable in a
regression setting. Like the response variable, an explanatory variable may be
continuous, ordinal, or nominal.

Classification of Variables

1. Qualitative variable is one whose categories are simply used as labels to
distinguish one group from another, rather than as a basis for saying that one
group is greater or less, higher or lower, or better or worse than another. This
variable has values that are intrinsically non-numeric. Qualitative variables
generally have either nominal or ordinal scales.
For example: cause of death, nationality, race, gender, severity of pain, and so on.
2. Quantitative variable is one whose categories can be measured and ordered
according to quantity. This variable has values that are intrinsically numeric. Both
interval and ratio scales belong to this classification. It can be further divided
into discrete and continuous variables.
For example: number of children in a family, age, population in a barangay, and so on.
3. Discrete variable is one whose set of possible values is either finite or
countably infinite, typically whole numbers.
For example: number of missing teeth, number of household members, number of
patients at Hospital X, among others. In a discrete variable, there are gaps between
its possible values.
4. Continuous variable is one whose set of possible values includes all values in an
interval of the real line, so it can be expressed with fractions or digits after a
decimal point.
Examples: body mass index, blood pressure, cholesterol level, height, and so on.

SUMMATION NOTATION
https://www.cliffsnotes.com/study-guides/algebra/algebra-ii/sequences-and-series/summation-notation

A simple method for indicating the sum of a finite (ending) number of terms in a
sequence is the summation notation. This involves the Greek letter sigma, Σ. When
using the sigma notation, the variable defined below the Σ is called the index of
summation. The lower number is the lower limit of the index (the term where the
summation starts), and the upper number is the upper limit of the summation (the term
where the summation ends). Consider

$\sum_{k=2}^{7} (2k + 3)$

This is read as “the summation of (2k + 3) as k goes from 2 to 7.” The replacements for
the index are always consecutive integers.
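
The sum read above, (2k + 3) as k goes from 2 to 7, can be expanded one term at a
time. The following is a minimal Python sketch of that expansion; the variable names
terms and total are illustrative and are not part of the module.

```python
# Expand and evaluate the sum of (2k + 3) for k = 2, 3, ..., 7.
# range(2, 8) yields the consecutive integers 2 through 7.
terms = [2 * k + 3 for k in range(2, 8)]
total = sum(terms)

print(terms)  # [7, 9, 11, 13, 15, 17]
print(total)  # 72
```

The index takes six consecutive values, so six terms are added.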

Example 1:
Write out the terms of the following sums; then compute the sum.

1.

2.

3.

1.

2.

Example 2:

Use sigma notation to express each series.

1. 8 + 11 + 14 + 17 + 20

2.

The first series is an arithmetic series with five terms whose first term is 8 and
whose common difference is 3. Therefore, $a_1 = 8$ and $d = 3$. The nth term of the
corresponding sequence is

$a_n = a_1 + (n - 1)d = 8 + 3(n - 1) = 3n + 5$

Since there are five terms, the given series can be written as

$\sum_{n=1}^{5} (3n + 5)$
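
As a quick check, the sketch below assumes the nth term a_n = a_1 + (n - 1)d with
a_1 = 8 and d = 3, which simplifies to 3n + 5, and confirms that n = 1 to 5 reproduces
the series 8 + 11 + 14 + 17 + 20. The helper name nth_term is illustrative only.

```python
def nth_term(n, first=8, diff=3):
    """nth term of an arithmetic sequence: a_n = a_1 + (n - 1) * d."""
    return first + (n - 1) * diff

series = [nth_term(n) for n in range(1, 6)]  # n = 1, 2, 3, 4, 5
print(series)       # [8, 11, 14, 17, 20]
print(sum(series))  # 70

# The simplified form 3n + 5 gives the same five terms.
assert series == [3 * n + 5 for n in range(1, 6)]
```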

1.

This is a geometric series with six terms whose first term is and whose

common ratio is . Therefore, and . The nth term of the


corresponding sequence is

Since there are six terms in the given series, the sum can be written

as

Classification of Data

1. Internal data refers to data that relate to the activities within the organization
collecting the data.
For example: the Department of Health, an agency tasked to collect data; the
Philippine Statistics Authority, the agency responsible for socio-demographic data;
the DOST; and so on.
2. External data refers to data that relate to activities outside the organization
collecting the data. All data obtained from computerized databases, books,
periodicals, government documents, and the like are considered external data.
External data are further classified as either statistical or non-statistical.
3. Statistical data are published data of the government, institutions, companies,
and associations which involve figures, tables, and graphs.
4. Non-statistical data are information which does not involve figures, tables, and
graphs, like periodicals, books, and pamphlets.

SOURCES OF DATA

Data must be handled accurately. To ensure the accuracy of data, one must know where
the data come from. There are two sources of data: the primary source and the
secondary source.

1. Primary source refers to data that come from the original source and are
collected especially for the task at hand.

2. Secondary source refers to data collected by others for another purpose. Examples
include paper-based sources such as books, journals, periodicals, abstracts, indexes,
directories, and research reports, and electronic sources such as CD-ROMs, online
databases, the internet, videos, and broadcasts.

METHODS OF DATA COLLECTION

1. Mailed questionnaires are one of the most popular means of collecting data;
however, they are difficult to design and are the most criticized method. This
method may be employed if we have the names and addresses of the intended
respondents, who fill out the forms themselves.
2. Interview is a method of data collection that is primarily used to gain an
understanding of the underlying reasons and motivations for people’s attitudes,
preferences, or behavior. It involves a face-to-face conversation between the
interviewer and the interviewee.
3. Observation refers to the method of extracting data which involves recording
the behavioral patterns of people, objects, and events in a systematic manner.
In this method, the information gathered may be documented using cameras,
video tape recorders, laboratory diagnostic apparatus, among others.
4. Telephone interview is an alternative form of personal interview. It is
considered the most popular method in provinces or cities where almost all
residents have a personal telephone.

EXPLAIN

Statistics is the collection, presentation, summarization, analysis, and
interpretation of data. There are two branches of statistics: descriptive and
inferential statistics. Data can be measured at four levels of measurement: the
nominal, ordinal, interval, and ratio scales. We also learned the uses of different
variables: qualitative and quantitative variables, and dependent (response) and
independent (explanatory) variables. We learned how to classify data as internal or
external, and as statistical or non-statistical. We also identified the sources of
data as primary or secondary. Lastly, we learned the methods of data collection: the
mailed questionnaire, the interview, observation, and the telephone interview.

Statistics helps you use the proper methods to collect data, employ the correct
analyses, and effectively present the results. Statistics is a crucial process behind
how we make discoveries in science, make decisions based on data, and make predictions.

Statistics plays a vital role in every field of human activity. It helps in
determining the current state of per capita income, unemployment, population, and
every field of endeavor.

Module 2 – Presentation of Data and Frequency Distribution

Learning Objectives:

At the end of this module, you must be able to:

1. Present data using different methods.
2. Differentiate raw data from arranged data.
3. Organize data into a frequency distribution.
4. Graph data as a frequency polygon or an ogive.

ENGAGE

The figure above shows different tabular and graphical presentations of data. From
such presentations we interpret the data and draw conclusions. They also help us
analyze the data.
In this module you will learn that textual presentation gives emphasis to significant
figures and is appropriate when there are few figures to be presented. Tabular
presentation is concise and easy to understand, facilitates analysis of categories of
the given variables, and presents data in more detail. Raw data are data that have not
been organized or processed numerically for use; they are data in their original form.
The frequency distribution is a useful way to present data, provided the number of
class intervals is neither too small nor too large. The ogive is a graph that
represents the cumulative frequencies of the classes. It is constructed by joining with
lines a series of points that plot the class boundaries against the less than or
greater than cumulative frequencies.

EXPLORE

Presentation of data and frequency distribution

Textual presentation refers to a method of presenting data which uses statements with
few numbers in order to describe the data, purposely to call attention to some
significant data. However, if there are many facts involved, this method should not
be used alone, because of the difficulty of reading and assimilating a list of facts
and figures.
For instance, of the 250 respondents interviewed, the following complaints were
noted: 27 for lack of books in the library, 25 for a dirty playground, 20 for lack of
laboratory equipment, and 17 for a poorly maintained university building, while another
17 complained of an unsanitary cafeteria because of a foul-smelling toilet. Another 13
complained that the food in the cafeteria is not enough, 10 perceived that the teachers
are not friendly, and five complained of the lack of a resting place.

Advantages of textual presentation are as follows:


1. Gives emphasis to significant figures
2. Appropriate when there are few figures to be presented.

Disadvantages of textual presentation are the following:

1. Data become hard to comprehend when a large amount of quantitative data is included
in the paragraph.
2. A paragraph involving many figures can be tiresome to most readers when the same
words are repeated many times.

TABULAR PRESENTATION

The presentation of data in a table is formally referred to as “tabular presentation”.
Tabular presentation refers to a method of presenting data in columns and rows. Data
should never be put in a table if they can be described efficiently in one or two
sentences. When a table is used alongside textual form, the discussion must come either
before or after the table.
The following pointers should be observed in the construction of a table:
1. Every table should be self-explanatory.
2. Position the table after the text where it is first cited.
3. The unit of measurement must be clearly stated.
4. Show totals, subtotals, percentages, and the like if necessary.
5. The number of variables in a table must be at most three.
6. Provide the source of the data when taken from another publication.

The following are the advantages of tabular presentation:


1. Concise and easy to understand.
2. Facilitates analysis of categories of the given variable.
3. Presents data in more detail.

The disadvantages of tabular presentation are given below:


1. Too many rows or columns could make it difficult for the reader to understand the
data. You may need to reduce the amount of data, or separate the data into additional
tables.
2. Requires more time to construct.

There are ten essential parts of a statistical table. These include the following: table
number, table title, column spanner, stub head, stub, column heads, body, divider,
footnotes, and source note.

Table Number refers to the relative position of the table within a series. It is placed
on the same line as the opening of the title, separated from the title proper by a
period. The number should be omitted for a single table. Tables should be numbered
consecutively in Arabic numerals beginning with 1.

Table Title refers to a brief statement about the table presented. The first letter of
each word in the title must be capitalized and the rest are in lower case. It should be
concise and should name the key variables shown in the table. It should never be more
than two lines. The period is left out at the end of the title. If the title is two
lines long, it must be single-spaced. It should always go above the table.

Stub head refers to the heading in the table that is placed above the leftmost column.
That column is the stub column, and it usually lists the independent variable. The
entries listed in the stub column are known as the stub. All other column headings are
simply referred to as column heads.
Body is the main part of the table, which contains the quantitative information. It is
the actual data in a table occupying the columns, for example, percentages,
frequencies, statistical test results, means, “N” (number of samples), among others.

Dividers are lines that frame the top and bottom of the table and/or mark the different
parts of a table. They are often used for division or emphasis within the body of a
table.

Footnote is any statement or note inserted at the foot or bottom of the table. You may
use table notes to explain anything in your table that is not self-explanatory. While basic
symbols and abbreviations like SD for standard deviation, N for sample size, and % for
percentage, are commonly used, you may have other technical terms or other issues
that you wish to explain.

Source Note refers to the specific source of the statistics. It is introduced by the word
“Source”. Thus, source notes may be included to acknowledge the origin of the data.
This is placed beneath the footnote.

GRAPHICAL PRESENTATION

A graph is another method of presenting data using a visual representation of the
relations between certain quantities plotted with reference to a set of axes. It may be
presented in the form of a bar graph, component bar, pie chart, line graph, histogram,
or frequency polygon. Other types of graph (scatter plot, box plot, dot plot) will be
discussed in other chapters.

Bar graph is a graph consisting of bars of the same width, which are drawn vertically
or horizontally for the purpose of comparing values to each other. Horizontal bar
graphs are usually used for qualitative variables.

Pie chart is a graph used to show how a whole is divided into its component parts. The
parts of the whole should sum to 100%. The pie chart is sometimes called the circle
graph. This kind of graph is useful for showing percentages effectively. The angle of
each wedge or “slice”, in degrees, is determined by multiplying the percentage
contribution of the component by 3.6 (since 100% corresponds to 360 degrees). To
highlight a specific component, a slice may be “exploded” or extended. This kind of
graph is most appropriate for nominal data. Different colors can be applied to the
slices of the pie chart for emphasis.
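
The slice-angle rule (percentage times 3.6) can be sketched in a few lines of Python.
This is only an illustration: the function name slice_angles and the blood-type
percentages are hypothetical and do not come from the module.

```python
def slice_angles(percentages):
    """Convert percentage shares into pie-slice angles in degrees (percent * 3.6)."""
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError("the parts of the whole must sum to 100%")
    return {label: pct * 3.6 for label, pct in percentages.items()}

# Hypothetical shares, for illustration only.
shares = {"O": 45, "A": 25, "B": 25, "AB": 5}
angles = slice_angles(shares)

for label, angle in angles.items():
    print(f"{label}: {angle:.1f} degrees")
```

Because the shares sum to 100%, the angles sum to 360 degrees, a full circle.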

Component bar is a graph made of a bar representing the whole, which is further divided
into smaller rectangles representing the parts, wherein the area of each smaller
rectangle is proportional to the relative contribution of the component to the whole.
The component bar is preferable to the pie chart in situations where the compositions
of two or more groups are to be compared. As with the pie chart, different colors can
be applied to the components to emphasize differences between parts of the whole.

Line graph is a graph used for displaying data that changes continuously over time.
The time is chronologically arranged on the horizontal axis and the relevant values are
indicated on the vertical axis. Variations in the data are indicated by a series of line
segments formed by joining consecutive points.

RAW DATA
Raw data are data as collected, in no particular order; they have not been organized
or processed numerically for use. They are data in their original form.

For example:

The grades of 30 freshmen students of statistics at Universidad de Zamboanga are given
below.
89 90 86 93 76 80 92 83 81 92 77 85 95
88 80 79 83 94 75 82 90 86 92 79 80 91
86 80 92 90

With the above raw data, it takes time to find the highest and lowest observations. To
make sense of the data, we have to arrange the observations from highest to lowest or
vice versa.

ARRAY
An array is an arrangement of the observations in a given data set according to their
magnitude, from highest to lowest or lowest to highest.

95 92 90 86 81 79
94 92 89 85 80 79
93 91 88 83 80 77
92 90 86 83 80 76
92 90 86 82 80 75

Having organized the data in an array, it is easier to find the highest and the lowest
observations. The frequency of each observation can also be determined easily and
quickly. Aside from that, it is easier to find the mode, the median, and all measures
of position (refer to the next module).
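
Arranging raw data into an array is a sorting task. The sketch below sorts the 30
grades listed under RAW DATA from highest to lowest and counts how often each grade
occurs. It uses only the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# The 30 raw grades, in the order given under RAW DATA.
grades = [
    89, 90, 86, 93, 76, 80, 92, 83, 81, 92, 77, 85, 95,
    88, 80, 79, 83, 94, 75, 82, 90, 86, 92, 79, 80, 91,
    86, 80, 92, 90,
]

array = sorted(grades, reverse=True)  # highest to lowest
counts = Counter(grades)              # frequency of each distinct grade

print(array)
print("highest:", array[0], "lowest:", array[-1])
for grade in sorted(counts, reverse=True):
    print(grade, counts[grade])
```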

FREQUENCY DISTRIBUTION
A useful way to present the data is the formation of a frequency distribution.
Frequency distribution refers to the tabulation of the number of observations that fall
within certain ranges of the data. To organize data into a frequency distribution, we
need to pick some convenient class intervals and tabulate the number of observations
that fall into each interval. There is no clear-cut rule for the number of class
intervals, but it should be neither too small nor too large; it should not be more than
20 and not less than 7, to avoid laborious tabulation and erroneous grouping.

The steps in preparing a frequency distribution:

1. Find the lowest and the highest observations;
2. Subtract the lowest from the highest observation to get the range;
3. Decide on the number of class intervals. The maximum number of class intervals is
20, the minimum number is 7, and the ideal number is between 10 and 15 inclusive;
4. Determine the interval size by dividing the result of step 2 by the desired number
of class intervals. Unless specified, it is advisable to use the ideal number of class
intervals;
5. Choose an appropriate lower limit for the first class interval. This number should
approach or equal, but not exceed, the lowest observation and should be exactly
divisible by the interval size;
6. Write the lowest limit at the bottom and from it develop the lower limits of the
next higher class intervals by adding the interval size to the preceding lower limit
until the class interval containing the highest observation is reached;
7. Read each observation in the given data and record a tally for it opposite the class
interval to which it belongs;
8. Count the number of tallies falling within each class interval to get the frequency
of each class interval; and
9. Accumulate the frequencies to get the total number of observations.
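
The nine steps above can be sketched as a short Python function. This is a minimal
illustration under stated assumptions, not the module's own procedure: it assumes
whole-number observations, it rounds the interval size to the nearest whole number, and
by default it starts the first class at the lowest observation. A different lower limit
(for example, one divisible by the interval size, as step 5 suggests) can be passed in.
The function and variable names are hypothetical.

```python
def frequency_distribution(data, num_classes=10, lower_limit=None):
    """Return (lower, upper, frequency) rows for whole-number data, lowest class first."""
    lowest, highest = min(data), max(data)                    # step 1
    data_range = highest - lowest                             # step 2
    size = max(1, round(data_range / num_classes))            # steps 3-4
    start = lowest if lower_limit is None else lower_limit    # step 5 (assumption)
    if start > lowest:
        raise ValueError("lower limit must not exceed the lowest observation")
    rows = []
    while start <= highest:                                   # step 6
        upper = start + size - 1
        freq = sum(1 for x in data if start <= x <= upper)    # steps 7-8
        rows.append((start, upper, freq))
        start += size
    return rows

# The 30 grades listed under RAW DATA.
grades = [
    89, 90, 86, 93, 76, 80, 92, 83, 81, 92, 77, 85, 95,
    88, 80, 79, 83, 94, 75, 82, 90, 86, 92, 79, 80, 91,
    86, 80, 92, 90,
]

table = frequency_distribution(grades)
for lower, upper, freq in reversed(table):                    # highest class first
    print(f"{lower} - {upper}: {freq}")
print("N =", sum(freq for _, _, freq in table))               # step 9
```

Each frequency is counted directly from the raw observations, so the total N always
equals the number of observations.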

Example 1:

Construct a frequency distribution of the grades of 30 freshmen students of statistics
at Universidad de Zamboanga randomly selected from sections A and B. The data are given
below.
89 90 86 93 76 80 92 83 81 92 77 85 95
88 80 79 83 94 75 82 90 86 92 79 80 91
86 80 92 90

Solution:
Find the highest score (H) first, then find the lowest score (L).
H – L = 95 – 75 = 20
Interval size $= \frac{H - L}{10} = \frac{95 - 75}{10} = 2$

Class Interval    Frequency    Class Boundary    Class Mark

95 – 96           1            94.5 – 96.5       95.5
93 – 94           2            92.5 – 94.5       93.5
91 – 92           5            90.5 – 92.5       91.5
89 – 90           4            88.5 – 90.5       89.5
87 – 88           1            86.5 – 88.5       87.5
85 – 86           4            84.5 – 86.5       85.5
83 – 84           2            82.5 – 84.5       83.5
81 – 82           2            80.5 – 82.5       81.5
79 – 80           6            78.5 – 80.5       79.5
77 – 78           1            76.5 – 78.5       77.5
75 – 76           2            74.5 – 76.5       75.5
                  _____________
                  N = 30

A grouping that defines a class, such as 75 – 76, is called a class interval.
The end numbers, 75 and 76, are called class limits.
The smaller number, 75, is the lower limit and the larger number, 76, is the upper
limit.
The number 74.5 is the lower class boundary.
The number 76.5 is the upper class boundary.
The size of a class interval is the difference between the lower and upper class
boundaries (here 76.5 – 74.5 = 2) and is also referred to as the class size.
The class mark is the midpoint of the class interval. It is obtained by adding the
lower and upper class limits (or the lower and upper class boundaries) and then
dividing the sum by 2.
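
The relationships among class limits, class boundaries, class size, and class mark can
be sketched as follows. The function name describe_class and the unit parameter (the
measurement unit, assumed to be 1 for whole-number grades) are illustrative and do not
come from the module.

```python
def describe_class(lower_limit, upper_limit, unit=1):
    """Return the boundaries, size, and class mark of one class interval."""
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "size": upper_boundary - lower_boundary,
        "mark": (lower_limit + upper_limit) / 2,
    }

print(describe_class(75, 76))
# {'boundaries': (74.5, 76.5), 'size': 2.0, 'mark': 75.5}
```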

HISTOGRAM

Histogram is a method of graphing a frequency distribution. It is constructed by
plotting the classes on the horizontal axis and the frequencies of the classes on the
vertical axis; the horizontal axis represents the class boundaries (or class marks)
while the vertical axis represents the frequency in each class interval. A histogram is
a bar graph wherein the vertical sides of the bars are erected at the class boundaries
(so that each bar is centered at its class mark) and the height of each bar corresponds
to the class frequency.

FREQUENCY POLYGON

Frequency polygon is another method of graphing a frequency distribution. It is
constructed by joining with straight lines a series of points which are the class marks
(or midpoints) of the classes plotted against their corresponding frequencies; the
polygon is closed by considering an additional class at each end, and the ends of the
lines are brought down to the horizontal axis at the class marks of the additional
classes.
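
The points of a frequency polygon, including the two extra zero-frequency classes that
close it, can be computed before any drawing is done. The sketch below assumes classes
of equal size given as (lower limit, upper limit, frequency), lowest class first. The
function name polygon_points and the three classes used are hypothetical.

```python
def polygon_points(classes):
    """Return (class mark, frequency) points, closed with an empty class at each end."""
    size = classes[1][0] - classes[0][0]          # distance between lower limits
    marks = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [freq for _, _, freq in classes]
    first = (marks[0] - size, 0)                  # additional class below the lowest
    last = (marks[-1] + size, 0)                  # additional class above the highest
    return [first] + list(zip(marks, freqs)) + [last]

# Hypothetical classes, for illustration only.
classes = [(10, 14, 3), (15, 19, 7), (20, 24, 5)]
points = polygon_points(classes)
print(points)
# [(7.0, 0), (12.0, 3), (17.0, 7), (22.0, 5), (27.0, 0)]
```

Joining these points in order with straight lines gives the closed polygon.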

CUMULATIVE FREQUENCY DISTRIBUTION

It is often desirable to accumulate the frequencies of the distribution when the number
of observations that lie below (less than) or above (greater than) a certain class
boundary is to be determined. These are called cumulative frequencies. There are two
types of cumulative frequencies: the “less than” and the “greater than”.
A less than cumulative frequency indicates the number of observations in the
distribution that fall below a specified upper class boundary. It is obtained by
successively accumulating the class frequencies in the distribution, starting from the
smallest and moving to the largest class interval.
A greater than cumulative frequency indicates the number of observations in the
distribution that lie above a certain lower class boundary. It is obtained by
successively accumulating the class frequencies in the distribution, starting from the
largest and moving to the smallest class interval.

Example;

The “less than” and the “greater than” cumulative frequency distributions of the grades
of 30 freshmen students of statistics at Universidad de Zamboanga randomly selected
from sections A and B are given below.

Class Interval    Frequency    <cf    >cf

95 – 96           1            30     1
93 – 94           2            29     3
91 – 92           5            27     8
89 – 90           4            22     12
87 – 88           1            18     13
85 – 86           4            17     17
83 – 84           2            13     19
81 – 82           2            11     21
79 – 80           6            9      27
77 – 78           1            3      28
75 – 76           2            2      30
                  _____________
                  N = 30
You can then graph the less than and greater than ogives of the freshmen students of
statistics. Draw the horizontal axis and the vertical axis, and label the vertical axis
as the cumulative frequency from zero to 30. On the horizontal axis, use the class
boundaries: the upper class boundaries for the less than ogive and the lower class
boundaries for the greater than ogive.
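
Both kinds of cumulative frequency are running totals, taken in opposite directions.
The sketch below tallies the class frequencies directly from the 30 raw grades listed
earlier in this module (classes 75 – 76 up to 95 – 96) and then accumulates them with
itertools.accumulate from the Python standard library. The function and variable names
are illustrative.

```python
from itertools import accumulate

def cumulative_frequencies(freqs_low_to_high):
    """Return (less-than cf, greater-than cf), both ordered lowest class first."""
    less_than = list(accumulate(freqs_low_to_high))
    greater_than = list(accumulate(reversed(freqs_low_to_high)))[::-1]
    return less_than, greater_than

# The 30 raw grades from this module.
grades = [
    89, 90, 86, 93, 76, 80, 92, 83, 81, 92, 77, 85, 95,
    88, 80, 79, 83, 94, 75, 82, 90, 86, 92, 79, 80, 91,
    86, 80, 92, 90,
]

lower_limits = range(75, 97, 2)  # 75, 77, ..., 95 (class size 2)
freqs = [sum(1 for g in grades if lo <= g <= lo + 1) for lo in lower_limits]
less_cf, greater_cf = cumulative_frequencies(freqs)

# Print the highest class first, as in the table layout used in this module.
rows = list(zip(lower_limits, freqs, less_cf, greater_cf))
for lo, f, lcf, gcf in reversed(rows):
    print(f"{lo} - {lo + 1}: f = {f}, <cf = {lcf}, >cf = {gcf}")
```

The less than total of the highest class and the greater than total of the lowest class
both equal N, which is a quick check on the arithmetic.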

EXPLAIN

From the presentation of data, you were able to see how data are presented, and from
the frequency distribution you were able to understand the different methods of
organizing data. Data are presented in different ways depending on the kind of data
gathered. The presentation can be textual, tabular, or graphical (bar graph, pie chart,
component bar, and line graph).
Text, tables, and graphs are very powerful communication tools for presenting data and
information. They can make an article easy to understand, attract and sustain the
interest of readers, and efficiently present large amounts of complex information
(May 19, 2017).
