
Lectures and Notes MATH 212 (Part 1)

This document provides an introduction to data analysis concepts. It defines data analysis as evaluating data using analytical or statistical tools to discover useful information. It describes different types of data including qualitative, quantitative, categorical, discrete, and continuous data. It also discusses different levels of measurement for data including nominal, ordinal, interval, and ratio scales. Finally, it defines key terminology such as population, sample, primary data, secondary data, and provides an overview of statistical analysis methods for describing and comparing data.

LECTURE NOTES in MATH 212 (ENGINEERING DATA ANALYSIS)

Lesson 1: INTRODUCTION

1.1 What Is Data Analysis?


Data analysis is the process of evaluating data using analytical or statistical tools to
discover useful information.

1.2 Data

What is data?
The word data is the plural form of datum. Data refers to a collection of
facts, such as values, measurements, observations, or even just descriptions of things.

Types of data
Data can come in two main forms, qualitative and quantitative:

1. Qualitative data

“Qualitative data” is data that uses words and descriptions. Qualitative data can be
observed but is subjective and therefore difficult to use for the purposes of making
comparisons. Descriptions of texture, taste, or an experience are all examples of
qualitative data. Qualitative data collection methods include focus groups, interviews, or
open-ended items on a survey.

2. Quantitative data

“Quantitative data” is data that is expressed with numbers. Quantitative data is data
which can be counted, measured, or ranked. Length, weight, age, cost, and rating
scales are all examples of quantitative data. Quantitative data can be represented visually in
graphs and tables and can be statistically analyzed.

Categorical data

“Categorical data” is data that has been placed into groups. An item cannot
belong to more than one group at a time. Examples of categorical data would be
an individual’s current living situation, race, sex, age group, and educational
level.

There are two types of quantitative data: discrete and continuous.

1. Discrete data
Data that are obtained by counting and can take only distinct, separate values,
such as the number of students in a class.

2. Continuous data
“Continuous data” is numerical data measured on a continuous range or scale. In
continuous data, all values are possible with no gaps in between. Examples of
continuous data are a person’s height or weight, and temperature. Many types of
analysis can be used with continuous data, including effect size calculations.

1.3 Levels of Measurement

Types of Data & Measurement Scales: Nominal, Ordinal, Interval and Ratio
In statistics, there are four data measurement scales: nominal, ordinal, interval and ratio.

Nominal

Let’s start with the easiest one to understand. Nominal scales are used for labelling
variables, without any quantitative value. “Nominal” scales could simply be called
“labels.” Here are some examples, below. Notice that all of these scales are mutually
exclusive (no overlap) and none of them have any numerical significance. A good way to
remember all of this is that “nominal” sounds a lot like “name” and nominal scales are kind
of like “names” or labels.

Examples of Nominal Scales: sex, race, marital status, and nationality (each is just a label with no numerical meaning).

Ordinal

With ordinal scales, the order of the values is what’s important and significant, but the
differences between each one are not really known. Ordinal scales are typically measures
of non-numeric concepts like satisfaction, happiness, discomfort, etc.

“Ordinal” is easy to remember because it sounds like “order” and that’s the key to
remember with “ordinal scales”–it is the order that matters, but that’s all you really get
from these.

Example: a satisfaction rating from “very dissatisfied” to “very satisfied”; the categories are ordered, but the distance between them is not defined.

Interval

Interval scales are numeric scales in which we know both the order and the exact
differences between the values. The classic example of an interval scale is
Celsius temperature because the difference between each value is the same. For example,
the difference between 60 and 50 degrees is a measurable 10 degrees, as is the
difference between 80 and 70 degrees.
Interval scales are nice because the realm of statistical analysis on these data sets
opens up. For example, central tendency can be measured by mode, median, or mean;
standard deviation can also be calculated.

“Interval” itself means “space in between,” which is the important thing to
remember–interval scales not only tell us about order, but also about the value between
each item.

Here’s the problem with interval scales: they don’t have a “true zero.” For example,
there is no such thing as “no temperature,” at least not with Celsius. In the case of
interval scales, zero doesn’t mean the absence of value, but is actually another number
used on the scale, like 0 degrees Celsius. Negative numbers also have meaning. Without a
true zero, it is impossible to compute ratios. With interval data, we can add and subtract,
but cannot multiply or divide.
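The point about ratios can be made concrete with a short sketch. The temperatures below are illustrative; the Celsius-to-Kelvin shift is used only because Kelvin, unlike Celsius, has a true zero:

```python
# Why ratios fail on an interval scale: Celsius has no true zero,
# but Kelvin (a ratio scale) does.

def to_kelvin(celsius):
    """Shift a Celsius reading onto the Kelvin scale (true zero at -273.15 C)."""
    return celsius + 273.15

# Differences survive the shift, so addition and subtraction are valid:
diff_c = 60 - 50
diff_k = to_kelvin(60) - to_kelvin(50)   # same 10-degree difference

# But the naive Celsius ratio does not: 20 C is not "twice as hot" as 10 C.
naive = 20 / 10                          # 2.0, misleading
real = to_kelvin(20) / to_kelvin(10)     # roughly 1.035

print(diff_c, round(diff_k, 6), naive, round(real, 3))
```

The mismatch between the two ratios is exactly what "cannot multiply or divide" means for interval data.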

Ratio
Ratio scales are the ultimate nirvana when it comes to data measurement scales
because they tell us about the order, they tell us the exact value between units, and they
also have an absolute zero–which allows for a wide range of both descriptive and
inferential statistics to be applied. At the risk of repeating myself, everything above
about interval data applies to ratio scales, plus ratio scales have a clear definition of
zero. Good examples of ratio variables include height, weight, and duration.

Ratio scales provide a wealth of possibilities when it comes to statistical analysis.
These variables can be meaningfully added, subtracted, multiplied, and divided (ratios). Central
tendency can be measured by mode, median, or mean; measures of dispersion, such as
standard deviation and coefficient of variation, can also be calculated from ratio scales.
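As a quick sketch of those possibilities, Python's standard statistics module can compute each of the measures just listed on a small, made-up set of ratio-scale weights:

```python
import statistics

# Hypothetical ratio-scale data: body weights in kilograms.
weights = [55.0, 60.0, 62.5, 70.0, 82.5]

mean = statistics.mean(weights)        # central tendency
median = statistics.median(weights)
stdev = statistics.stdev(weights)      # sample standard deviation (dispersion)
cv = stdev / mean                      # coefficient of variation; meaningful only with a true zero

print(mean, median, round(stdev, 2), round(cv, 3))
```

The coefficient of variation is the tell-tale ratio-scale statistic here: dividing a spread by a mean only makes sense when zero genuinely means "none".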

CLASSIFICATION OF DATA

Data can be classified into Primary and Secondary Data


1. Primary Data: Primary data are data that you have collected yourself: data
collected at the source, originally gathered from individuals, focus groups, or
a panel of respondents specifically set up by the researcher, whose opinions may be
sought on specific issues from time to time (Matt, 2001; Afonja, 2001).

Advantages of Primary Data


 More accurate information can be obtained
 The researcher controls how the data are collected
 It is current (the information reflects the prevailing situation)

Disadvantages Of Primary Data


 It is time- and energy-consuming
 It is more difficult to collect

2. Secondary Data: A secondary data research project involves the gathering
and/or use of existing data that were originally collected for another purpose, for
example, computerized databases, company records or archives, government
publications, industry analyses offered by the media, information systems, and
computerized or mathematical models of environmental processes, and so on
(Tim, 1997; Matt, 2001).
Advantages of Secondary Data
 less expensive
 easier to collect than primary data
 cost effective
Disadvantages of Secondary Data
 it may not be current
 the error rate may be high, since the researcher did not control how the data were collected

How is data analyzed?


Data is analyzed using statistics. Essentially, statistics can be classified in two ways: some
help describe data and some help compare data.

1.4 Statistics

What is Statistics?
The field of Statistics deals with the collection, presentation, analysis, and use of
data to make decisions, solve problems, and design products and processes. (Montgomery,
D. and Runger G.)
Statistics is the science of learning from data, and of measuring, controlling, and
communicating uncertainty; and it thereby provides the navigation essential for controlling
the course of scientific and societal advances (Davidian, M. and Louis, T. A.,
10.1126/science.1218685).
Because many aspects of engineering practice involve working with data, obviously
some knowledge of statistics is important to any engineer. Specifically, statistical
techniques can be a powerful aid in designing new products and systems, improving existing
designs, and designing, developing, and improving production processes. Drawing conclusions
from data is vital in research, administration, and business. It is important to collect data
in a way which allows its analysis. The representation of collected data in a data set or
data matrix allows the application of a variety of statistical methods.

1.5 Population and Sample

Data is often collected to make statements or tell a story about a group or
“population” of interest. However, often a population is so large that we cannot possibly
measure the variables of interest for every person in the population. The alternative is to
select a “sample” from the population of interest. The tricky part of taking a sample is
ensuring that the sample is representative of the larger population about which you would
like to make statements.
A population is the totality of the observations with which a statistician is concerned. The
observations could refer to anything of interest, such as persons, animals or objects; it
need not be limited to people. The size of the population is defined to be the number of
observations in the population. In collecting data concerning a population, the statistician
is often interested in arriving at conclusions involving the entirety of the population.

A sample is a subset of a population. In the process of data gathering, it is often
impossible or impractical to obtain the entire set of observations for the given population.
Often, a sample of the population is taken, data are collected from it, and inferences about
the population are made based on the analysis of the sample data.

The main difference between a population and a sample has to do with how observations are
assigned to the data set.
 A population includes all of the elements from a set of data.
 A sample consists of one or more observations drawn from the population.
Other differences have to do with nomenclature, notation, and computations. For example,
 A measurable characteristic of a population, such as a mean or standard deviation,
is called a parameter; but a measurable characteristic of a sample is called
a statistic.
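A small sketch of the parameter-versus-statistic distinction (the population values and seed are made up for illustration):

```python
import random
import statistics

population = list(range(1, 101))         # all N = 100 observations
parameter = statistics.mean(population)  # a parameter: computed from the whole population

random.seed(0)                           # fixed seed so the draw is reproducible
sample = random.sample(population, 10)   # n = 10 observations drawn from the population
statistic = statistics.mean(sample)      # a statistic: estimates the parameter

print(parameter, statistic)
```

Re-running with a different seed changes the statistic but never the parameter, which is the practical content of the distinction.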

What is sampling?

Sampling definition: Sampling is a technique of selecting individual members or a subset
of the population in order to make statistical inferences from them and estimate characteristics
of the whole population. Different sampling methods are widely used by researchers.

Sampling Methods
1. Probability sampling: Probability sampling is a sampling technique in which a researcher
sets a few selection criteria and chooses members of a population at random. With this
selection parameter, all members have an equal opportunity to be part of the sample.
A. Simple Random Sampling
A sampling method is a procedure for selecting sample elements from a
population. Simple random sampling refers to a sampling method that has the following
properties.
 The population consists of N objects.
 The sample consists of n objects.
 All possible samples of n objects are equally likely to occur.
An important benefit of simple random sampling is that it allows researchers to use
statistical methods to analyze sample results. For example, given a simple random sample,
researchers can use statistical methods to define a confidence interval around a sample
mean. Statistical analysis is not appropriate when non-random sampling methods are used.
There are many ways to obtain a simple random sample. One way would be the lottery
method. Each of the N population members is assigned a unique number. The numbers are
placed in a bowl and thoroughly mixed. Then, a blindfolded researcher
selects n numbers. Population members having the selected numbers are included in the
sample.
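The lottery method above can be sketched in code, with random.shuffle playing the role of mixing the bowl (the member names are hypothetical):

```python
import random

# N = 50 population members, each assigned a unique number (their list position).
members = [f"member_{i}" for i in range(1, 51)]

random.seed(42)          # fixed seed so the draw is reproducible
bowl = members[:]        # copy, so the original roster is untouched
random.shuffle(bowl)     # "thoroughly mix" the numbered slips
sample = bowl[:5]        # the blindfolded researcher draws n = 5

print(sample)
```

Because every ordering of the shuffled bowl is equally likely, every set of 5 members is equally likely to be drawn, which is the defining property of simple random sampling.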
Slovin's formula
- is used to calculate the sample size (n) given the population size (N) and a margin of error
(e).
- it is a random sampling technique formula to estimate the sampling size.
- It is computed as

n = N / (1 + Ne²)

where:
n = number of samples
N = total population
e = margin of error

When to Use Slovin's Formula?


If a sample is taken from a population, a formula must be used to take into account
confidence levels and margins of error. When taking statistical samples, sometimes a lot is
known about a population, sometimes a little and sometimes nothing at all. For example, we
may know that a population is normally distributed (e.g., for heights, weights or IQs), we
may know that there is a bimodal distribution (as often happens with class grades in
mathematics classes) or we may have no idea about how a population is going to behave
(such as polling college students to get their opinions about quality of student life).
Slovin's formula is used when nothing about the behavior of a population is known at all.

How to Use Slovin's Formula?


- To use the formula, first decide what you want your margin of error (error tolerance) to be.
For example, you may be happy with a confidence level of 95 percent (giving a margin of error
of 0.05), or you may require a tighter accuracy of a 98 percent confidence level (a margin of
error of 0.02). Plug your population size and required margin of error into the formula.
The result will be the number of samples you need to take.

Example No. 1
Determine the sample size in research methodology, when N = 1000 and e = 0.05.
Solution:
n = N / (1 + Ne²) = 1000 / (1 + 1000(0.05)²) = 1000 / 3.5 ≈ 285.71, so take n = 286.

Example No. 2
A researcher plans to conduct a survey. If the population of High City is
1,000,000, find the sample size if the margin of error is 2.5%.
Solution:
First, convert the margin of error 2.5% to a decimal by dividing it by 100.
Given:
e = 2.5% = 0.025, N = 1,000,000
n = N / (1 + Ne²) = 1,000,000 / (1 + 1,000,000(0.025)²) = 1,000,000 / 626 ≈ 1597.44, so take n = 1598.
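Both worked examples can be checked with a short function (rounding up to the next whole respondent is a common convention, not part of the formula itself):

```python
import math

def slovin(N, e):
    """Slovin's formula: n = N / (1 + N * e^2), rounded up to a whole sample size."""
    return math.ceil(N / (1 + N * e ** 2))

print(slovin(1000, 0.05))        # Example 1: 1000 / 3.5 -> 286
print(slovin(1_000_000, 0.025))  # Example 2: 1,000,000 / 626 -> 1598
```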
B. Cluster sampling: Cluster sampling is a method where the researchers divide the
entire population into sections or clusters that each represent the population. Clusters
are identified and included in a sample based on demographic parameters like age,
sex, location, etc. This makes it very simple for a survey creator to derive
effective inferences from the feedback.

EXAMPLE:
In a survey of students from a city, we first select a sample of schools, then we
select a sample of classrooms within the selected schools, and finally we select a
sample of students within the selected classes.

C. Systematic sampling: Researchers use the systematic sampling method to
choose the sample members of a population at regular intervals. It requires the
selection of a starting point and a fixed sampling interval that is applied
repeatedly. Because this type of sampling method has a predefined range, it is
the least time-consuming.

For example, a researcher intends to collect a systematic sample of 500 people in a
population of 5000. He/she numbers each element of the population from 1 to 5,000
and will choose every 10th individual to be a part of the sample (total population /
sample size = 5000/500 = 10).
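The every-10th-individual procedure can be sketched as follows (starting from a random point within the first interval is one common variant; some texts simply start at element 1):

```python
import random

population = list(range(1, 5001))   # individuals numbered 1 to 5000
n = 500
k = len(population) // n            # sampling interval: 5000 / 500 = 10

random.seed(7)                      # fixed seed so the sketch is reproducible
start = random.randrange(k)         # random start within the first interval
sample = population[start::k]       # then every 10th individual

print(len(sample), sample[:3])
```

Note that a single random choice (the starting point) determines the entire sample, which is why systematic sampling is so quick to carry out.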

D. Stratified random sampling: Stratified random sampling is a method in which the
researcher divides the population into smaller groups that don’t overlap but
together represent the entire population. During sampling, these groups can be
organized, and a sample can then be drawn from each group separately.
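A minimal sketch of stratified random sampling, assuming hypothetical, non-overlapping student groups and a 10% draw from each stratum:

```python
import random

# Non-overlapping strata that together cover the whole (made-up) population.
strata = {
    "freshman":  [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior":    [f"J{i}" for i in range(30)],
}

random.seed(1)   # fixed seed so the sketch is reproducible
sample = []
for group, members in strata.items():
    # Draw a separate simple random sample from each stratum (10% of it).
    sample.extend(random.sample(members, len(members) // 10))

print(len(sample), sample)
```

Sampling each stratum in proportion to its size, as here, keeps the combined sample's group mix close to the population's.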

2. Non-probability sampling: In non-probability sampling, the researcher chooses
members for research arbitrarily, based on convenience or judgment rather than random
selection. This sampling method is not a fixed or predefined selection process, which
makes it difficult for all elements of a population to have an equal opportunity to be
included in a sample.

A. Convenience sampling: This method depends on ease of access to subjects,
such as surveying customers at a mall or passers-by on a busy street. It is termed
convenience sampling because of the researcher’s ease of carrying it out and
getting in touch with the subjects. Researchers have nearly no authority to
select the sample elements; selection is done purely on proximity, not
representativeness. This non-probability sampling method is used when there are
time and cost limitations in collecting feedback, or where resources are limited,
such as in the initial stages of research.
For example, startups and NGOs usually conduct convenience sampling at a mall to
distribute leaflets for upcoming events or promotion of a cause; they do that by
standing at the mall entrance and giving out pamphlets randomly.
B. Judgmental or purposive sampling: Judgmental or purposive samples are formed at
the discretion of the researcher. Researchers consider only the purpose of the study,
along with their understanding of the target audience. For instance, researchers may
want to understand the thought process of people interested in studying for their
master’s degree. The selection criterion will be: “Are you interested in doing your
masters in …?” and those who respond with a “No” are excluded from the sample.

C. Snowball sampling: Snowball sampling is a sampling method that researchers apply when
the subjects are difficult to trace. For example, it will be extremely challenging to
survey homeless people or illegal immigrants. In such cases, using the snowball method,
researchers can interview a few initial subjects and ask them to refer others. Researchers
also implement this sampling method in situations where the topic is highly sensitive and
not openly discussed, for example, surveys to gather information about HIV/AIDS. Not
many victims will readily respond to the questions. Still, researchers can contact people
they might know or volunteers associated with the cause to get in touch with the victims
and collect information.

D. Quota sampling: In quota sampling, members are selected based on a pre-set
standard. Because the sample is formed based on specific attributes, the created
sample will have the same qualities found in the total population. It is a rapid
method of collecting samples.
