
ENGINEERING

DATA ANALYSIS
FIRST SEMESTER
– A.Y. 2021 - 2022
LESSON 1
PART 1
◉ Descriptive and Inferential Statistics
◉ Population and Sample
◉ Types of Measurements and Scales
◉ Data Collection and Presentation
Statistics – is the study of the collection, organization, examination, summarization, manipulation,
interpretation, and presentation of quantitative data. It deals with all aspects of data including the planning of
data collection in terms of the design of surveys and experiments.
TWO MAJOR FUNCTIONS OF STATISTICS
 Descriptive Statistics – are brief descriptive coefficients that summarize a given data set, which can be
a representation of either the entire population or a sample drawn from it. Descriptive statistics are broken down into
measures of central tendency and measures of variability. In short, descriptive statistics help describe
and understand the features of a specific data set by giving short summaries about the sample and
measures of the data.
 Inductive/Inferential Statistics – are techniques that allow us to use samples to make generalizations
about the populations from which the samples were drawn. It is, therefore, important that the sample
accurately represent the population. The process of achieving this is called sampling. Inferential
statistics recognize that sampling naturally incurs sampling error, and thus a sample is not expected to
perfectly represent the population.
Some definitions:
Population – is a set of similar items or events which is of interest for some question or experiment. A
statistical population can be a group of actually existing objects or a hypothetical and potentially infinite group
of objects conceived as a generalization from experience.
Parameter – is any numerical quantity that characterizes a given population or some aspect of it. This means the
parameter tells us something about the whole population.
Data Sample – is a set of data collected and/or selected from a statistical population by a defined procedure.
Statistics - are numbers that summarize data from a sample.
Variable - the characteristic that is being studied. A variable may be qualitative or quantitative.
Typically, the population is very large, making a census or a complete enumeration of all the values in
the population either impractical or impossible. The sample usually represents a subset of manageable size.
Samples are collected and statistics are calculated from the samples so that one can make inferences or
extrapolations from the sample to the population.

POPULATION AND SAMPLE


In the language of statistics, one of the most basic concepts is sampling. In most statistical problems, a specified
number of measurements or data – a sample – is drawn from a much larger body of measurements, called the
population.
A population is the set of all measurements of interest to the investigator.
A sample is a subset of measurements selected from the population of interest.
TYPES OF MEASUREMENTS
 Continuous Data – is information that can be measured on a continuum or scale. Continuous data can
have almost any numeric value and can be meaningfully subdivided into finer and finer increments,
depending upon the precision of the measurement system. Ex. Standard Normal Distribution
 Discrete Data – is information that can be categorized into a classification. Discrete data is based on
counts. Only a finite number of values is possible, and the values cannot be subdivided meaningfully. It
typically consists of things counted in whole numbers. Ex. Binomial Probability Distribution
MEASUREMENT SCALES
 Nominal – used for labeling variables, without any quantitative value. Nominal scales could simply be
called “labels”. A good way to remember all of this is that “nominal” sounds a lot like “name”, and
nominal scales are kind of like “names” or labels.
Note: a subtype of nominal scale with only two categories (e.g. male/female) is called “dichotomous”.
 Ordinal – with ordinal scales, it is the order of the values that is important and significant, but the
differences between them are not really known. Ordinal scales are typically measures of non-numeric
concepts like satisfaction, happiness, discomfort, etc. “Ordinal” is easy to remember because it sounds
like “order”, and that’s the key to remember with ordinal scales – it is the order that matters, but that’s
all you really get from these.
 Interval – are numeric scales in which we know not only the order, but also the exact differences
between the values. The classic example of an interval scale is Celsius temperature because the
difference between each value is the same. “Interval” itself means “space between”, which is the
important thing to remember – interval scales not only tell us about order, but also about the value
between each item.
Here’s the problem with interval scales: they don’t have a “true zero”. For example, there is no such
thing as “no temperature”. Without a true zero, it is impossible to compute ratios. With interval data, we
can add and subtract, but cannot multiply or divide.
 Ratio – they tell us about the order, they tell us the exact value between units, and they also have an
absolute zero – which allows for a wide range of both descriptive and inferential statistics to be applied.
Good examples of ratio variables include height and weight.
COLLECTION OF DATA
Simple Random Sample – is a subset of a statistical population in which each member of the subset has an
equal probability of being chosen. An example of a simple random sample would be the names of 25 employees
being chosen out of a hat from a company of 250 employees. In this case, the population is all 250 employees,
and the sample is random because each employee has an equal chance of being chosen.
Stratified Sampling – is a method of sampling that involves the division of a population into smaller groups
known as strata. In stratified random sampling, or stratification, the strata are formed based on members’ shared
attributes or characteristics.
- Stratified random sampling is also called proportional random sampling or quota random sampling.
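Both sampling schemes can be illustrated with a short Python sketch. The employee roster, department strata, and sample sizes below are hypothetical and serve only to show the mechanics of equal-probability selection and proportional allocation.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Simple random sample: 25 employees drawn from a roster of 250,
# each with an equal chance of being chosen (hypothetical roster).
employees = [f"Employee-{i}" for i in range(1, 251)]
simple_random_sample = random.sample(employees, k=25)

# Stratified sample: divide the population into strata (here, by
# department) and draw from each stratum in proportion to its size.
strata = {
    "Engineering": [f"Eng-{i}" for i in range(1, 151)],  # 150 members
    "Finance":     [f"Fin-{i}" for i in range(1, 61)],   # 60 members
    "HR":          [f"HR-{i}" for i in range(1, 41)],    # 40 members
}
total = sum(len(members) for members in strata.values())
sample_size = 25

stratified_sample = []
for name, members in strata.items():
    n_stratum = round(sample_size * len(members) / total)  # proportional allocation
    stratified_sample.extend(random.sample(members, k=n_stratum))

print(len(simple_random_sample), len(stratified_sample))  # 25 and 25
```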
TABULAR AND GRAPHICAL METHODS IN DESCRIPTIVE STATISTICS
Frequency Distribution – is a list or graph that displays frequency of various outcomes in a sample. Each entry
in the table contains the frequency or count of the occurrences of values within a particular group or interval,
and in this way, the table summarizes the distribution of values in the sample.
Raw Data – are collected data that have not been organized numerically.
Array – an arrangement of raw data in ascending or descending order of magnitude.
Frequency – the number of times a value appears in the listing
Relative Frequency – actual frequency of the observation divided by the total frequency
Range – is the difference between the largest and smallest values
Class Intervals – range of values in a class consisting of a lower limit and an upper limit
Ungrouped Data – when the data is small (n ≤ 30) or when there are few distinct values, the data may be
organized without grouping.
Relative Frequency = f / Σf
Grouped Data – statistical data generated in large masses (n > 30) can be assessed by grouping the data into
different classes.
Frequency Distribution from Raw Data
1. Find the range (R)
2. Decide on a suitable number of classes.
m = 1 + 3.3 log n, where m = number of classes and n = number of observations
3. Determine the class size.
c = R / m
4. Find the number of observations in each class. This is the class frequency.
Class marks – the midpoint of the class interval. Ex. 31.5 is the class mark of 28-35.
Class boundaries – a point that represents halfway or dividing point between successive classes.
Ex. 35.5 is the upper boundary of 28-35. Its lower boundary is 27.5.
Where:
Class Boundary = (Upper limit of the first class + Lower limit of the second class) / 2 = (89 + 90) / 2 = 89.5
Class Mark = (Lower limit + Upper limit) / 2 = (80 + 89) / 2 = 84.5
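A minimal Python sketch of steps 1-4 above, together with the relative-frequency, class-mark, and class-boundary formulas. The 40 exam scores are made-up values used only to illustrate the computation.

```python
import math

# Hypothetical raw data: n = 40 exam scores (grouped data, since n > 30).
scores = [52, 88, 67, 73, 91, 58, 64, 77, 85, 69,
          72, 80, 95, 61, 70, 83, 55, 66, 74, 90,
          59, 68, 79, 86, 63, 71, 76, 82, 57, 65,
          75, 89, 60, 78, 84, 92, 62, 81, 87, 94]

n = len(scores)
R = max(scores) - min(scores)        # 1. range (95 - 52 = 43)
m = round(1 + 3.3 * math.log10(n))   # 2. number of classes (here, 6)
c = math.ceil(R / m)                 # 3. class size, rounded up (here, 8)

# 4. tally the class frequencies, then compute class marks, boundaries,
#    and relative frequencies for each class interval.
lower = min(scores)
for _ in range(m):
    upper = lower + c - 1
    freq = sum(lower <= x <= upper for x in scores)
    class_mark = (lower + upper) / 2
    boundaries = (lower - 0.5, upper + 0.5)
    rel_freq = freq / n
    print(f"{lower}-{upper}  f={freq}  mark={class_mark}  "
          f"boundaries={boundaries}  rel={rel_freq:.3f}")
    lower = upper + 1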

Cumulative Frequency – total frequency of all values either “less than” or “more than” any class boundary.
Frequency Histogram - a graph that uses vertical columns to show frequencies.
- there should not be any gaps between the bars.
Frequency Polygon - a frequency polygon is very similar to a histogram. In fact, they are almost identical,
except that frequency polygons can be used to compare sets of data or to display a cumulative frequency
distribution. In addition, a histogram uses rectangles while a frequency polygon resembles a line graph.
CUMULATIVE FREQUENCY POLYGON / OGIVE
- An ogive graph plots cumulative frequency on the y-axis and class boundaries along
the x-axis. It’s very similar to a histogram, only instead of rectangles, an ogive has a
single point marking where the top right of the rectangle would be.
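One possible way to draw the histogram, frequency polygon, and ogive with matplotlib, assuming the hypothetical frequency table built in the sketch above (class size 8, frequencies 5, 8, 8, 8, 8, 3); all values are illustrative.

```python
import matplotlib.pyplot as plt

# Hypothetical grouped data: class marks, class boundaries, and frequencies.
class_marks = [55.5, 63.5, 71.5, 79.5, 87.5, 95.5]
boundaries  = [51.5, 59.5, 67.5, 75.5, 83.5, 91.5, 99.5]
frequencies = [5, 8, 8, 8, 8, 3]

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))

# Frequency histogram: adjacent vertical bars with no gaps between them.
ax1.bar(class_marks, frequencies, width=8, edgecolor="black")
ax1.set_title("Frequency histogram")

# Frequency polygon: a line graph through the class marks.
ax2.plot(class_marks, frequencies, marker="o")
ax2.set_title("Frequency polygon")

# Ogive: "less than" cumulative frequency plotted at the class boundaries.
cumulative = [0]
for f in frequencies:
    cumulative.append(cumulative[-1] + f)
ax3.plot(boundaries, cumulative, marker="o")
ax3.set_title("Ogive (cumulative frequency)")

plt.tight_layout()
plt.show()
```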

STEMPLOT
- typically used when there is a moderate amount of quantitative data to analyze; stem plots of
more than 50 observations are unusual. The name "stem plot" comes from the fact that there is one "stem",
holding the largest place-value digits, to the left and the "leaves" to the right.

1. Select one or more leading digits for the stem values. The remaining digits become the leaves.
2. List all the possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value. Indicate the unit for stems
and leaves in the display.
4. A display having between 5 and 20 stems is recommended.
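A small Python sketch of this procedure, using the tens digit as the stem and the ones digit as the leaf; the data values are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: a moderate number of two-digit observations.
data = [52, 88, 67, 73, 91, 58, 64, 77, 85, 69, 72, 80, 95, 61, 70, 83]

# 1. The leading digit (tens) is the stem; the remaining digit (ones) is the leaf.
stems = defaultdict(list)
for x in sorted(data):
    stems[x // 10].append(x % 10)

# 2-3. List every possible stem in a vertical column and record each leaf
#      beside its stem; state the units of stems and leaves in the display.
print("Stem | Leaves   (stem = tens, leaf = ones)")
for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"  {stem}  | {leaves}")
```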

Pie chart – is the familiar circular graph that shows how the measurements are distributed among the categories.
Bar chart – shows the same distribution of measurements in categories, with the height of the bar measuring
how often a particular category was observed.
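A minimal matplotlib sketch of both charts, using made-up category counts.

```python
import matplotlib.pyplot as plt

# Hypothetical categorical data: how often each category was observed.
categories = ["Category A", "Category B", "Category C", "Category D"]
counts = [12, 8, 5, 15]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))

# Pie chart: how the measurements are distributed among the categories.
ax1.pie(counts, labels=categories, autopct="%1.1f%%")
ax1.set_title("Pie chart")

# Bar chart: bar height measures how often each category was observed.
ax2.bar(categories, counts)
ax2.set_title("Bar chart")

plt.tight_layout()
plt.show()
```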
Example (Introduction to Probability and Statistics by Mendenhall and Beaver, 13th edition, 2009, p. 12).
