MMW Chap 3 Data Management Statistics Part 1
INTRODUCTION
History of Statistics
Records also show that the Roman Empire was the first government to
gather extensive data about the population, area, and wealth of the territories
that it controlled. In Europe, few comprehensive censuses were made during
the Middle Ages. In the early 16th century, registration of deaths and births
began in England. Then in 1662 the first noteworthy statistical study of
population was made. In 1691, a similar study of mortality made in Breslau,
Germany was used by the English astronomer Edmond Halley as a basis for the
earliest mortality table. In the 19th century, investigators recognized the need
to reduce information to numerical values to avoid the ambiguity of verbal
description.
Proper Discussion
Meaning of Statistics
In its plural sense, the word statistics refers to numerical facts and figures collected in a
systematic manner with a definite purpose in any field of study. In this sense, statistics are also
aggregates of facts which are expressed in numerical form.
In its singular sense, it refers to the science comprising the methods used in the
collection, analysis, interpretation and presentation of numerical data.
Note:
“Statistic” refers to a numerical quantity, such as the mean, median, or
variance, calculated from sample values.
• Formulating and testing hypotheses is an important function of statistics. Statistics
thus helps examine the truth of claims and supports the development of new ideas.
• Statistics helps in formulating plans and policies in different fields. Statistical
analysis of data forms the beginning of policy formulation. Hence, statistics is
essential for planners, economists, scientists and administrators in preparing
different plans and programmes.
• Statistics helps in forecasting trends and tendencies. Statistical techniques are
used for predicting the future values of a variable.
Note:
Population refers to the total set of observations that can be made.
Sample refers to a set of observations drawn from a population.
➢ Descriptive Statistics
Example:
➢ Inferential Statistics
Example:
Of 350 randomly selected people in the town of Bangued, Abra, 280 people had
the last name Racsa. An example of inferential statistics is the following statement:
"80% of all people living in Abra have the last name Racsa."
We have no information about all people living in Abra, just about the 350 living
in Bangued. We have taken that information and generalized it to talk about all people
living in Abra. The easiest way to tell that this statement is not descriptive is by trying to
verify it based upon the information provided.
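The distinction can be checked with a little arithmetic. The sketch below uses only the two counts given in the example; the variable names are illustrative and not part of the original material.

```python
# Counts taken from the example above.
sample_size = 350      # people sampled in Bangued
with_surname = 280     # of those, people with the last name Racsa

# Descriptive statistic: a fact about the sample itself,
# verifiable from the data in hand.
sample_proportion = with_surname / sample_size

print(f"{sample_proportion:.0%} of the {sample_size} sampled people are named Racsa")
```

The 80% figure is descriptive because it can be verified from the 350 observations. Extending it to everyone in Abra is the inferential step, and nothing in the sample alone can confirm it.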
Levels of Measurement
The level of measurement refers to the relationship among the values that
are assigned to the attributes for a variable.
Why is Level of Measurement Important?
First, knowing the level of measurement helps you decide how to interpret the data from
that variable. When you know that a measure is nominal (like the one just described), then you
know that the numerical values are just short codes for the longer names. Second, knowing the
level of measurement helps you decide what statistical analysis is appropriate on the values that
were assigned. If a measure is nominal, then you know that you would never average the data
values or do a t-test on the data.
• Nominal
• Ordinal
• Interval
• Ratio
A nominal variable contains categorical data for mutually exclusive, but
not-ordered, categories.
Examples:
Examples:
Essentially, interval data are ordinal, but they have an extra property - the ability to
meaningfully add and subtract measurements. In interval-scaled data, the gaps between the
numbers are comparable, unlike with ordinal data. Any interval has the same meaning regardless
of its location on the scale. "X is five inches longer than Y" has meaning regardless of the values
of X and Y. However, ratios are meaningless on an interval scale because an interval scale has
no true zero. Temperature scales are an example of this, as are decibel scales. Zero degrees
Fahrenheit does not mean the total absence of temperature. Zero decibels does not mean there is no
sound. Furthermore, if it is 80 degrees outside today and it was only 40 degrees outside yesterday,
we cannot say that today is twice as hot as yesterday. Similarly, a sound level of 80 dB is not
twice as loud as a sound level of 40 dB. In short, if the data can be ordered and the arithmetic
difference is meaningful, then the data are at least interval data.
Examples:
• Money
• People
• Education (in years)
• Temperature scale (Fahrenheit or Celsius)
• Measurement of Sea Level
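One way to see why ratios fail on an interval scale is to express the same temperatures in two units. The following is a minimal sketch using the 40 and 80 degree readings from the discussion above; the third reading of 120 degrees and all names are assumptions added for illustration.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32) * 5 / 9

# 40 and 80 come from the text; 120 is an extra, made-up reading.
yesterday_f, today_f, hotter_f = 40.0, 80.0, 120.0

yesterday_c = fahrenheit_to_celsius(yesterday_f)
today_c = fahrenheit_to_celsius(today_f)
hotter_c = fahrenheit_to_celsius(hotter_f)

# Ratios depend on where the arbitrary zero sits, so the two scales disagree.
ratio_f = today_f / yesterday_f
ratio_c = today_c / yesterday_c

# Equal gaps in Fahrenheit remain equal gaps in Celsius.
gap_low_c = today_c - yesterday_c
gap_high_c = hotter_c - today_c

print(f"ratio in Fahrenheit: {ratio_f:.1f}, ratio in Celsius: {ratio_c:.1f}")
print(f"Celsius gaps: {gap_low_c:.2f} and {gap_high_c:.2f}")
```

The ratio comes out as 2 in Fahrenheit but 6 in Celsius, so "twice as hot" has no fixed meaning, while the two 40-degree Fahrenheit gaps convert to the same Celsius gap.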
Ratio data are the highest form of data measurement and the form we are most familiar
with. For ratio data both differences and ratios are interpretable. Ratio data have a natural zero.
Ratio data look a lot like interval data. However, the zero point has a special meaning in ratio-
scaled data: it indicates the absence of whatever property is being measured. Ratio data always
have the flavor of counting: when you measure the amount of money that you have, you are
counting up coins and bills. When you are measuring your height, you are counting the number
of inches off the ground to the top of your head. Both ratio and interval data make use of a wide
range of statistical analysis tools.
Examples:
tend to be less sensitive. At each level up the hierarchy, the current level includes all of the
qualities of the one below it and adds something new. In general, it is desirable to have a higher
level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).
Collecting Data:
Types of Questions
All researchers must make two basic decisions when designing a survey: 1) whether
to use an oral or written method, and 2) whether to choose questions that are
open-ended or closed-ended.
• Word questions simply and clearly; they must be objective, not ambiguous or vague
• Attractive in appearance (questions spaced out, and neatly arranged)
• Write a descriptive title for the questionnaire
• Write an introduction to the questionnaire
• Order questions in logical sequence
• Keep questionnaire uncluttered and easy to complete
• Delicate questions last (especially demographic questions)
• Design for easy tabulation
• Design to achieve objectives
• Define terms
• Avoid double negatives (I haven't no money)
• Avoid double barreled questions (this AND that)
• Avoid loaded questions ("Have you stopped beating your wife?")
• Phrase questions for all respondents
Survey. Statistical surveys are used to collect quantitative information from a specific
population. A survey may focus on opinions or factual information depending upon the purpose
of the study. Surveys may involve answering a questionnaire or being interviewed by a
researcher. The census is a type of survey.
Advantages:
Disadvantages:
• Are dependent upon the respondent's honesty and motivation when answering
• Can be flawed by non-response
• Can possess questions or answer choices that may be interpreted differently by
different respondents (such as the choice "agree slightly")
During a "controlled" experiment, the researcher will separate the sample population into
groups with one group established as the control group. All groups will be manipulated in some
manner, except for the control group which will remain the same.
Sampling Methods
Proper sampling methods are important for eliminating bias in the selection process.
They can also allow for the reduction of cost or effort in gathering samples.
In order to have a random selection method, you must set up some process or procedure
that assures that the different units in your population have equal probabilities of being chosen.
Humans have long practiced various forms of random selection, such as picking a name out of a
hat, or choosing the short straw. The key benefit of probability sampling methods is that they
make it far more likely that the sample chosen is representative of the population, which
supports valid statistical conclusions.
There are many ways to obtain a simple random sample. One way would be the lottery
method. Each of the N population members is assigned a unique number. The numbers are
placed in a bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers.
Population members having the selected numbers are included in the sample.
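The lottery method described above can be sketched in Python with the standard library's `random.sample`, which draws without replacement so that every set of n numbers is equally likely. The function name, the population size of 120, the sample size of 8, and the seed are assumptions chosen for illustration.

```python
import random

def lottery_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct numbers from 1..population_size."""
    rng = random.Random(seed)                  # the "bowl", optionally seeded
    numbers = range(1, population_size + 1)    # one unique number per member
    return sorted(rng.sample(numbers, sample_size))

# Hypothetical population of N = 120 members and a sample of n = 8.
chosen = lottery_sample(population_size=120, sample_size=8, seed=1)
print(chosen)
```

Passing a seed only makes the draw repeatable for checking; leaving it as `None` gives a fresh draw on each run.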
Examples:
Examples:
2. A campus survey is conducted. The population may be divided into strata based on the
colleges – College of Arts, College of Sciences, College of Engineering, College
of Accountancy, etc. Within each stratum, respondents may then be selected at
random.
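The campus survey above might be sketched as follows. This is a minimal illustration, assuming each college's roster is available as a list; the student IDs, roster sizes, and the choice of 5 respondents per college are all made up.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Draw `per_stratum` members at random from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }

# Hypothetical rosters: student IDs grouped by college.
strata = {
    "Arts": [f"ART-{i:03d}" for i in range(1, 61)],
    "Sciences": [f"SCI-{i:03d}" for i in range(1, 81)],
    "Engineering": [f"ENG-{i:03d}" for i in range(1, 101)],
    "Accountancy": [f"ACC-{i:03d}" for i in range(1, 41)],
}

respondents = stratified_sample(strata, per_stratum=5, seed=42)
for college, chosen in respondents.items():
    print(college, chosen)
```

Drawing the same number from every college gives equal representation; drawing in proportion to roster size is an equally common design.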
Examples:
1. In a study of the opinions of homeless people across a country, rather than study a few
homeless people in all towns, a number of towns are selected and a significant
number of homeless people are interviewed in each one.
2. In studying the common experiences of failing students on the campus, a study
of the common experiences of failing students in each college is conducted instead.
In each college, a number of students with failing marks are selected.
This method is different from simple random sampling since not every possible sample of n
elements is equally likely.
Example:
120/8=15, so every 15th house is chosen after a random starting point between 1
and 15. If the random starting point is 11, then the houses selected are 11, 26, 41,
56, 71, 86, 101, and 116.
If there were 125 houses, 125/8=15.625, so should you take every 15th house or
every 16th house? If you take every 16th house, 8*16=128 so there is a risk that
the last house chosen does not exist. To overcome this, the random starting point
should be between 1 and 10. On the other hand if you take every 15th house,
8*15=120 so the last five houses will never be selected. The random starting
point should now be between 1 and 20 to ensure that every house has some
chance of being selected.
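The house-selection arithmetic above can be reproduced with a short sketch. The function and parameter names are illustrative; the rule for the largest allowed starting point is simply the requirement that the last selected house exists.

```python
import random

def systematic_sample(population_size, interval, sample_size, start=None, seed=None):
    """Pick every `interval`-th unit, beginning at `start` (1-based)."""
    # Largest starting point for which the last pick still exists.
    max_start = population_size - interval * (sample_size - 1)
    if max_start < 1:
        raise ValueError("interval is too large for this population and sample size")
    if start is None:
        start = random.Random(seed).randint(1, max_start)
    elif not 1 <= start <= max_start:
        raise ValueError(f"start must be between 1 and {max_start}")
    return [start + interval * i for i in range(sample_size)]

# 120 houses, every 15th house, 8 houses, starting point 11.
print(systematic_sample(120, 15, 8, start=11))
# [11, 26, 41, 56, 71, 86, 101, 116]
```

With 125 houses and an interval of 15, the same bound works out to 125 - 15 * 7 = 20, which matches the range of starting points given above for that case.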
1. A study of rehabilitation after stroke collected a small sample for a focus group
of patients, care givers, and health care providers with unique expertise.
Quota sampling. A non-probability sampling technique wherein
the researcher ensures equal or proportionate representation of subjects,
depending on which trait is considered as the basis of the quota.
Examples:
1. If the basis of the quota is college year level and the researcher needs equal
representation with a sample size of 100, the researcher must select 25 first-year
students, 25 second-year students, 25 third-year students and 25 fourth-year students.
The bases of the quota are usually age, gender, education, race, religion and
socioeconomic status.
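The quota in this example can be worked out and filled as in the sketch below. It assumes equal quotas and a stream of candidates taken in whatever order the researcher meets them (which is what makes the method non-probability); the candidate list and all names are hypothetical.

```python
def equal_quotas(sample_size, groups):
    """Split `sample_size` evenly across `groups`."""
    quota, remainder = divmod(sample_size, len(groups))
    if remainder:
        raise ValueError("sample size does not divide evenly among the groups")
    return {group: quota for group in groups}

def fill_quotas(candidates, quotas):
    """Accept candidates in the order met until each group's quota is full."""
    selected = {group: [] for group in quotas}
    for name, group in candidates:
        if group in selected and len(selected[group]) < quotas[group]:
            selected[group].append(name)
    return selected

year_levels = ["1st year", "2nd year", "3rd year", "4th year"]
quotas = equal_quotas(100, year_levels)

# Hypothetical stream of 200 students in the order they were approached.
candidates = [(f"student-{i}", year_levels[i % 4]) for i in range(200)]
selected = fill_quotas(candidates, quotas)

print(quotas)
print({group: len(names) for group, names in selected.items()})
```

Each year level ends up with 25 students, but which 25 depends entirely on the order of approach, not on a random draw.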
Presentation of Data
A main part of statistics is the display of summarized data. Data are initially
collected from a given source, whether experiments, surveys, or observation, and are
presented in one of three ways: as text, in tables, or in graphs.
In a textual presentation, the writer can emphasize the importance of some figures.
This method of presenting data is not particularly effective, however: it can make for dull
reading and may not give a clear and concise picture of the quantitative relationships indicated
in a particular report.
Tables are constructed to facilitate the analysis of relationships, which is made possible by the
orderly arrangement of numerical facts in columns and rows.
a. Table Number. Each table must be given a number. The table number helps in
distinguishing one table from other tables. Tables are usually numbered according to
the order of their appearance in a chapter. For example, the first table in the first
chapter of a book should be numbered 1.1 and the second table of the same chapter
1.2. The table number should be placed at the top or towards the left of the table.
b. Title of the Table. Every table should have a suitable title. It should be short and clear.
The title should be such that one can know the nature of the data contained in the table as
well as where and when the data were collected. It is placed either just below the
table number or to its right. The title is the main heading, written in capitals at the
top of the table. It must explain the contents of the table and throw light on the table
as a whole. Different parts of the heading can be separated by commas; no full
stops are used in the title.
c. The Box Head (column captions). Caption refers to the headings of the columns. It
consists of one or more column heads. A caption should be brief, concise and
self-explanatory. A column heading is written in the middle of a column in small letters.
The heading of each column is called a column caption, while the section of a table
that contains the column captions is referred to as the box head.
The vertical headings and subheadings of the columns are called column captions. The
space where these column headings are written is called the box head. Only the first letter
of the box head is capitalized; the remaining words must be written in small
letters.
d. Stub (Row captions). Stub refers to the headings of rows. The horizontal headings
and subheadings of the rows are called row captions, and the space where these row
headings are written is called the stub.
e. Body. This is the most important part of a table. It contains a number of cells. Cells
are formed by the intersection of rows and columns. Data are entered in these cells.
It is the main part of the table, which contains the numerical information classified
with respect to row and column captions.
f. Head Note. The head note (or prefatory note) contains the unit of measurement of
the data. It is usually placed just below the title or at the top right-hand corner of the
table. A statement given below the title and enclosed in brackets, usually describing the
units of measurement, is called a prefatory note.
g. Foot Note. A foot note is given at the bottom of a table. It helps in clarifying a
point which is not clear in the table. A foot note may be keyed to the title or to any
column or row heading. It is identified by symbols such as *, +, @, £, etc. It
appears immediately below the body of the table and provides additional
explanation.
h. Source Note. The source note shows the source of the data presented in the table.
The reliability and accuracy of the data can be tested to some extent from the source note. It
shows the name of the author, title, volume, page, publisher’s name, year and place of
publication of the book or journal from which the data are compiled. The source note is
given at the end of the table, indicating the source from which the information has been
taken. It includes information about the compiling agency, publication, etc.
----THE TITLE----
----Prefatory Notes----
----Box Head----
----Row Captions---- ----Column Captions----
Foot Notes…
Source Notes…
Types of Graphs
Example: The following are the radii of water tanks that are available on the market,
with their corresponding quantities. This table serves as the general data used for the following
graphs.
Radius (decimeter)    f    x
0.98-1.01             7    0.995
0.94-0.97             5    0.955
0.90-0.93             2    0.915
0.86-0.89             6    0.875
0.82-0.85             4    0.835
0.78-0.81             5    0.795
0.74-0.77             3    0.755
0.70-0.73             8    0.715
0.66-0.69             4    0.675
0.62-0.65             2    0.635
0.58-0.61             1    0.595
0.54-0.57             3    0.555
N=50
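The x column appears to be the midpoint of each class, and N the sum of the frequencies. The sketch below recomputes both from the class limits and frequencies in the table as a check; the variable names are illustrative.

```python
# (lower limit, upper limit, frequency f) for each class in the table above.
classes = [
    (0.98, 1.01, 7), (0.94, 0.97, 5), (0.90, 0.93, 2), (0.86, 0.89, 6),
    (0.82, 0.85, 4), (0.78, 0.81, 5), (0.74, 0.77, 3), (0.70, 0.73, 8),
    (0.66, 0.69, 4), (0.62, 0.65, 2), (0.58, 0.61, 1), (0.54, 0.57, 3),
]

# Midpoint of each class: the average of its lower and upper limits.
midpoints = [round((lower + upper) / 2, 3) for lower, upper, _ in classes]

# Total number of observations.
total = sum(frequency for _, _, frequency in classes)

for (lower, upper, frequency), x in zip(classes, midpoints):
    print(f"{lower:.2f}-{upper:.2f}  f={frequency}  x={x:.3f}")
print("N =", total)
```

The computed midpoints agree with the x column, and the frequencies add up to the stated N of 50.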
• Pictograph
A pictograph uses an icon to represent a quantity of data values in order to decrease the
size of the graph. A key must be used to explain the icon.
• Pie chart
A pie chart displays data as a percentage of the whole. Each pie section should have a
label and percentage. A total data number should be included.
• Histogram
A histogram displays continuous data in ordered columns. Categories are of continuous
measure such as time, inches, temperature, etc.
• Bar graph
A bar graph displays discrete data in separate columns. A double bar graph can be used to
compare two data sets. Categories are considered unordered and can be rearranged
alphabetically, by size, etc.
• Frequency Polygon
A frequency polygon can be made from a line graph by shading in the area beneath the
graph. It can be made from a histogram by joining the midpoints of the tops of the columns.
• Scatter plot
A scatter plot displays the relationship between two factors of the experiment. A trend
line is used to determine positive, negative, or no correlation.
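As an illustration of the pie chart description above, the percentage for each section can be computed from the water tank frequencies before any drawing is done. This is a minimal sketch; the labels and frequencies are copied from the example table and the variable names are assumptions.

```python
# Class labels and frequencies from the water tank table.
labels = [
    "0.98-1.01", "0.94-0.97", "0.90-0.93", "0.86-0.89", "0.82-0.85", "0.78-0.81",
    "0.74-0.77", "0.70-0.73", "0.66-0.69", "0.62-0.65", "0.58-0.61", "0.54-0.57",
]
frequencies = [7, 5, 2, 6, 4, 5, 3, 8, 4, 2, 1, 3]

total = sum(frequencies)

# Each pie section is its class frequency as a percentage of the whole.
percentages = [100 * frequency / total for frequency in frequencies]

for label, percent in zip(labels, percentages):
    print(f"{label}: {percent:.0f}%")
print("total data values:", total)
```

The percentages sum to 100, and the total of 50 is the "total data number" that the pie chart should display alongside the labelled sections.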