Lectures and Notes MATH 212 (Part 1)
Lesson 1: INTRODUCTION
1.2 Data
What is data?
The word data is the plural form of datum. Data refers to a collection of
facts, such as values, measurements, observations, or even just descriptions of things.
Types of data
Data can take two different forms: qualitative and quantitative.
1. Qualitative data
“Qualitative data” is data that uses words and descriptions. Qualitative data can be
observed but is subjective and therefore difficult to use for the purposes of making
comparisons. Descriptions of texture, taste, or an experience are all examples of
qualitative data. Qualitative data collection methods include focus groups, interviews, or
open-ended items on a survey.
2. Quantitative data
“Quantitative data” is data that is expressed with numbers. Quantitative data is data
which can be put into categories, measured, or ranked. Length, weight, age, cost, and rating
scales are all examples of quantitative data. Quantitative data can be represented visually in
graphs and tables and can be statistically analyzed.
Categorical data
“Categorical data” is data that has been placed into groups. An item cannot
belong to more than one group at a time. Examples of categorical data would be
the individual’s current living situation, race, sex, age group, and educational
level.
1. Discrete data
“Discrete data” is numerical data obtained by counting, so it takes only separate, exact
values (for example, the number of students in a class).
2. Continuous data
“Continuous data” is numerical data measured on a continuous range or scale. In
continuous data, all values are possible with no gaps in between. Examples of
continuous data are a person’s height or weight, and temperature. Many types of
analysis can be used with continuous data, including effect size calculations.
1.3 Levels of Measurement
Types of Data & Measurement Scales: Nominal, Ordinal, Interval and Ratio
In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio.
Nominal
Let’s start with the easiest one to understand. Nominal scales are used for labelling
variables, without any quantitative value. “Nominal” scales could simply be called
“labels.” Here are some examples, below. Notice that all of these scales are mutually
exclusive (no overlap) and none of them have any numerical significance. A good way to
remember all of this is that “nominal” sounds a lot like “name” and nominal scales are kind
of like “names” or labels.
Ordinal
With ordinal scales, the order of the values is what’s important and significant, but the
differences between them are not really known. Ordinal scales are typically measures
of non-numeric concepts like satisfaction, happiness, discomfort, etc.
“Ordinal” is easy to remember because it sounds like “order” and that’s the key to
remember with “ordinal scales”–it is the order that matters, but that’s all you really get
from these.
Example:
Interval
Interval scales are numeric scales in which we know both the order and the exact
differences between the values. The classic example of an interval scale is
Celsius temperature because the difference between each value is the same. For example,
the difference between 60 and 50 degrees is a measurable 10 degrees, as is the
difference between 80 and 70 degrees.
Interval scales are nice because the realm of statistical analysis on these data sets
opens up. For example, central tendency can be measured by mode, median, or mean;
standard deviation can also be calculated.
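As a small illustration of these measures, here is a sketch using Python's standard `statistics` module. The temperature readings are made-up values, not data from the course.

```python
import statistics

# Hypothetical daily high temperatures in degrees Celsius (interval scale).
temps_c = [18, 21, 21, 23, 25, 19, 21]

mode = statistics.mode(temps_c)      # most frequent value
median = statistics.median(temps_c)  # middle value once sorted
mean = statistics.mean(temps_c)      # arithmetic average
stdev = statistics.stdev(temps_c)    # sample standard deviation

print(f"mode={mode}, median={median}, mean={mean:.2f}, stdev={stdev:.2f}")
```

All four measures are meaningful here because the differences between Celsius values are meaningful.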
Here’s the problem with interval scales: they don’t have a “true zero.” For example,
there is no such thing as “no temperature,” at least not with Celsius. In the case of
interval scales, zero doesn’t mean the absence of value, but is actually another number
used on the scale, like 0 degrees Celsius. Negative numbers also have meaning. Without a
true zero, it is impossible to compute ratios. With interval data, we can add and subtract,
but cannot multiply or divide.
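The point about ratios can be checked numerically. The sketch below uses two hypothetical Celsius readings and converts them to Kelvin, a temperature scale that does have a true zero. The helper name `celsius_to_kelvin` is only illustrative.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # hypothetical readings in degrees Celsius

# Differences are meaningful on an interval scale: both scales agree.
diff_c = b_c - a_c
diff_k = celsius_to_kelvin(b_c) - celsius_to_kelvin(a_c)

# Ratios are not: 20 C looks like "twice" 10 C only because of where
# the Celsius zero happens to sit.
naive_ratio = b_c / a_c
kelvin_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio: {naive_ratio:.3f} (Celsius), {kelvin_ratio:.3f} (Kelvin)")
```

The difference is the same on both scales, but the Celsius "ratio" of 2 does not survive the change of zero point, which is why ratios of interval data are not interpreted.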
Ratio
Ratio scales are the ultimate nirvana when it comes to data measurement scales
because they tell us about the order, they tell us the exact value between units, and they
also have an absolute zero–which allows for a wide range of both descriptive and
inferential statistics to be applied. At the risk of repeating myself, everything above
about interval data applies to ratio scales, plus ratio scales have a clear definition of
zero. Good examples of ratio variables include height, weight, and duration.
CLASSIFICATION OF DATA
1.3 Statistics
What is Statistics?
The field of Statistics deals with the collection, presentation, analysis, and use of
data to make decisions, solve problems, and design products and processes. (Montgomery,
D. and Runger G.)
Statistics is the science of learning from data, and of measuring, controlling, and
communicating uncertainty; and it thereby provides the navigation essential for controlling
the course of scientific and societal advances (Davidian, M. and Louis, T. A.,
10.1126/science.1218685).
Because many aspects of engineering practice involve working with data, obviously
some knowledge of statistics is important to any engineer. Specifically, statistical
techniques can be a powerful aid in designing new products and systems, improving existing
designs, and designing, developing, and improving production processes. Drawing conclusions
from data is vital in research, administration, and business. It is important to collect data
in a way which allows its analysis. The representation of collected data in a data set or
data matrix allows the application of a variety of statistical methods.
The main difference between a population and a sample has to do with how observations are
assigned to the data set.
A population includes all of the elements from a set of data.
A sample consists of one or more observations drawn from the population.
Other differences have to do with nomenclature, notation, and computations. For example,
a measurable characteristic of a population, such as a mean or standard deviation,
is called a parameter, but a measurable characteristic of a sample is called
a statistic.
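The distinction can be sketched in Python. The population below is simulated and purely hypothetical, and the names `mu`, `sigma`, `x_bar`, and `s` follow a common textbook convention that these notes do not themselves specify.

```python
import random
import statistics

random.seed(212)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of all 500 members of a group.
population = [random.gauss(165, 8) for _ in range(500)]

# Parameters describe the whole population.
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Statistics describe a sample drawn from that population.
sample = random.sample(population, 30)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"parameters: mu={mu:.2f}, sigma={sigma:.2f}")
print(f"statistics: x_bar={x_bar:.2f}, s={s:.2f}")
```

Note the computational difference: `pstdev` divides by N, while `stdev` divides by n - 1.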
What is sampling?
Sampling Methods
1. Probability sampling: Probability sampling is a sampling technique in which a researcher
sets a few selection criteria and chooses members of a population at random. Under this
selection method, all members have a known, and in the simplest case equal, chance of
being part of the sample.
A. Simple Random Sampling
A sampling method is a procedure for selecting sample elements from a
population. Simple random sampling refers to a sampling method that has the following
properties.
The population consists of N objects.
The sample consists of n objects.
All possible samples of n objects are equally likely to occur.
An important benefit of simple random sampling is that it allows researchers to use
statistical methods to analyze sample results. For example, given a simple random sample,
researchers can use statistical methods to define a confidence interval around a sample
mean. Statistical analysis is not appropriate when non-random sampling methods are used.
There are many ways to obtain a simple random sample. One way would be the lottery
method. Each of the N population members is assigned a unique number. The numbers are
placed in a bowl and thoroughly mixed. Then, a blindfolded researcher
selects n numbers. Population members having the selected numbers are included in the
sample.
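The lottery method can be sketched in a few lines of Python. The function name `lottery_sample` and the member labels are hypothetical. `random.sample` draws without replacement, so every possible set of n members is equally likely.

```python
import random

def lottery_sample(population, n, seed=None):
    """Draw a simple random sample of n members without replacement."""
    rng = random.Random(seed)
    # Assign each of the N population members a unique number (1..N).
    numbered = dict(enumerate(population, start=1))
    # "Mix the bowl" and draw n numbers blindly.
    drawn = rng.sample(sorted(numbered), n)
    # Members holding the drawn numbers form the sample.
    return [numbered[k] for k in drawn]

members = [f"member_{i}" for i in range(1, 21)]  # hypothetical population, N = 20
sample = lottery_sample(members, n=5, seed=1)
print(sample)
```

Passing a `seed` only makes the draw repeatable for checking. In a real study the draw would be left unseeded.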
Slovin's formula
- It is used to calculate the sample size (n) given the population size (N) and a margin of error
(e).
- It is a formula for estimating the sample size when random sampling is used.
- It is computed as
$$n = \frac{N}{1 + N e^{2}}$$
where:
n = sample size (no. of samples)
N = total population
e = margin of error
Example No. 1
Determine the sample size when N = 1000 and e = 0.05.
Solution:
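A worked solution, assuming the standard form of Slovin's formula, n = N / (1 + N e^2), and the common convention of rounding up to the next whole respondent (the notes do not state a rounding rule):

```latex
\begin{aligned}
n &= \frac{N}{1 + N e^{2}} = \frac{1000}{1 + 1000\,(0.05)^{2}} \\
  &= \frac{1000}{1 + 2.5} = \frac{1000}{3.5} \approx 285.71 \\
n &\approx 286
\end{aligned}
```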
Example No. 2
A researcher plans to conduct a survey. If the population of High City is
1,000,000, find the sample size if the margin of error is 2.5%.
Solution:
First: convert the margin of error, 2.5%, to a decimal by dividing it by 100.
Given:
e = 2.5% = 0.025
N = 1,000,000
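The remaining arithmetic can be checked with a short Python sketch. It assumes the standard form of Slovin's formula, n = N / (1 + N e^2), and rounds up to the next whole respondent, which is a common convention rather than something stated in these notes. The function name `slovin_sample_size` is hypothetical.

```python
import math

def slovin_sample_size(N, e):
    """Sample size from Slovin's formula, n = N / (1 + N * e**2), rounded up."""
    if N <= 0 or not 0 < e < 1:
        raise ValueError("N must be positive and e must be between 0 and 1")
    return math.ceil(N / (1 + N * e ** 2))

e = 2.5 / 100   # margin of error 2.5% as a decimal
N = 1_000_000   # population of High City

n = slovin_sample_size(N, e)
print(n)  # 1,000,000 / (1 + 625) = 1597.44..., rounded up to 1598
```

The same function gives 286 for N = 1000 and e = 0.05.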
B. Cluster sampling: Cluster sampling is a method where the researchers divide the
entire population into sections or clusters that represent the population. Clusters
are identified and included in a sample based on demographic parameters like age,
sex, location, etc. This makes it simpler for a survey creator to derive
effective inferences from the feedback.
EXAMPLE:
In a survey of students from a city, we first select a sample of schools, then we
select a sample of classrooms within the selected schools, and finally we select a
sample of students within the selected classes.
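The school example can be sketched as a three-stage selection in Python. The city layout, the sample sizes at each stage, and the function name `multistage_sample` are all hypothetical.

```python
import random

def multistage_sample(city, n_schools, n_classes, n_students, seed=None):
    """Select schools, then classrooms within them, then students within those."""
    rng = random.Random(seed)
    chosen = []
    for school in rng.sample(sorted(city), n_schools):            # stage 1
        classrooms = city[school]
        for room in rng.sample(sorted(classrooms), n_classes):    # stage 2
            for student in rng.sample(classrooms[room], n_students):  # stage 3
                chosen.append((school, room, student))
    return chosen

# Hypothetical city: 4 schools, each with 3 classrooms of 10 students.
city = {
    f"school_{s}": {
        f"class_{c}": [f"s{s}c{c}_student_{k}" for k in range(10)]
        for c in range(3)
    }
    for s in range(4)
}

sample = multistage_sample(city, n_schools=2, n_classes=2, n_students=3, seed=7)
print(len(sample))  # 2 schools x 2 classrooms x 3 students = 12
```

Only the selected schools and classrooms need to be listed in full, which is the practical appeal of sampling by clusters.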
C. Snowball sampling: Snowball sampling is a sampling method (usually classed as
non-probability sampling) that researchers apply when
the subjects are difficult to trace. For example, it is extremely challenging to
survey homeless people or undocumented immigrants. In such cases, researchers
start with a few subjects they can locate and ask them to refer others, so the sample
grows like a rolling snowball. Researchers
also use this sampling method when the topic is highly sensitive and
not openly discussed, for example, in surveys that gather information about HIV/AIDS. Not
many affected people will readily respond to the questions. Still, researchers can contact people
they know, or volunteers associated with the cause, to get in touch with them
and collect information.
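The referral process can be sketched as a breadth-first walk over a "who can introduce whom" network. The referral data, the labels A to F, and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals outward from the initial contacts."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        # Each participant refers acquaintances who have not been reached yet.
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: each key can introduce the listed people.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "E": ["F"]}

sample = snowball_sample(referrals, seeds=["A"], max_size=5)
print(sample)  # ['A', 'B', 'C', 'D', 'E']
```

Because who gets reached depends on the starting contacts and their networks, the resulting sample is not a random one.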