Lesson 1: Engineering Data Analysis First Semester - A.Y. 2021 - 2022
LESSON 1
PART 1
◉ Descriptive and Inferential Statistics
◉ Population and Sample
◉ Types of Measurements and Scales
◉ Data Collection and Presentation
Statistics – the study of the collection, organization, examination, summarization, manipulation,
interpretation, and presentation of quantitative data. It deals with all aspects of data, including the planning of
data collection in terms of the design of surveys and experiments.
TWO MAJOR FUNCTIONS OF STATISTICS
Descriptive Statistics – brief descriptive coefficients that summarize a given data set, which can
represent either the entire population or a sample of it. Descriptive statistics are broken down into
measures of central tendency and measures of variability. In short, descriptive statistics help describe
and understand the features of a specific data set by giving short summaries of the sample and
measures of the data.
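As a minimal sketch of the two groups of descriptive measures, the snippet below uses Python's standard `statistics` module. The list `scores` is a made-up data set used only for illustration; it does not come from the lesson.

```python
import statistics

# Hypothetical sample: quiz scores of nine students (made-up values).
scores = [7, 8, 5, 9, 8, 6, 8, 8, 4]

# Measures of central tendency
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)

# Measures of variability
score_range = max(scores) - min(scores)
sample_variance = statistics.variance(scores)  # sample variance, divides by n - 1
sample_std = statistics.stdev(scores)          # square root of the sample variance

print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
print("range:", score_range, "variance:", sample_variance, "std dev:", round(sample_std, 3))
```

Here `statistics.variance` and `statistics.stdev` treat the data as a sample; `statistics.pvariance` and `statistics.pstdev` would be the choice if the data were the entire population.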
Inductive/Inferential Statistics – techniques that allow us to use samples to make generalizations
about the populations from which the samples were drawn. It is, therefore, important that the sample
accurately represent the population. The process of achieving this is called sampling. Inferential
statistics arises from the fact that sampling naturally incurs sampling error, so a sample is not expected to
perfectly represent the population.
Some definitions:
Population – a set of similar items or events which is of interest for some question or experiment. A
statistical population can be a group of actually existing objects or a hypothetical and potentially infinite group
of objects conceived as a generalization from experience.
Parameter – any numerical quantity that characterizes a given population or some aspect of it. This means the
parameter tells us something about the whole population.
Data Sample – a set of data collected and/or selected from a statistical population by a defined procedure.
Statistics – numbers that summarize data from a sample.
Variable – the characteristic that is being studied. A variable may be qualitative or quantitative.
Typically, the population is very large, making a census or a complete enumeration of all the values in
the population either impractical or impossible. The sample usually represents a subset of manageable size.
Samples are collected and statistics are calculated from the samples so that one can make inferences or
extrapolations from the sample to the population.
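The difference between a parameter and a statistic can be sketched with a small simulation. Everything in it is an assumption made for illustration: the population is 10,000 simulated values (not real data), the sample size is 100, and the seed is fixed only so the run is repeatable.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated measurements (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)         # parameter: describes the whole population

# Draw a simple random sample of manageable size, without replacement.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)          # statistic: computed from the sample only

# The sample mean estimates the population mean but rarely equals it exactly.
sampling_error = x_bar - mu
print(f"population mean = {mu:.2f}")
print(f"sample mean     = {x_bar:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

In practice the population mean is usually unknown; it is computed here only because the population is simulated, which makes the sampling error visible.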
Cumulative Frequency – total frequency of all values either “less than” or “more than” any class boundary.
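A short sketch of both kinds of cumulative frequency follows. The class boundaries and frequencies are made-up values chosen for illustration, not data from the lesson.

```python
from itertools import accumulate

# Hypothetical frequency distribution (made-up classes and counts).
lower_boundaries = [-0.5, 9.5, 19.5, 29.5, 39.5]
upper_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [3, 7, 12, 6, 2]
total = sum(frequencies)

# "Less than" cumulative frequency: observations below each upper boundary.
less_than = list(accumulate(frequencies))

# "More than" cumulative frequency: observations above each lower boundary.
more_than = [total - c + f for c, f in zip(less_than, frequencies)]

for lo, hi, lt, mt in zip(lower_boundaries, upper_boundaries, less_than, more_than):
    print(f"less than {hi:>4}: {lt:>2}    more than {lo:>4}: {mt:>2}")
```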
Frequency Histogram - a graph that uses vertical columns to show frequencies.
- there should not be any gaps between the bars.
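The counting behind a frequency histogram can be sketched as below. The data values, the starting point of 10, and the class width of 10 are all assumptions chosen for illustration; the bars are drawn sideways as text so that the sketch needs no plotting library.

```python
# Hypothetical data set (made-up values).
data = [12, 15, 21, 22, 22, 27, 31, 33, 34, 35, 38, 41, 44, 47, 52]

start = 10        # lower limit of the first class (assumed)
class_width = 10  # every class has the same width (assumed)
num_classes = 5

# Count how many observations fall in each class [lower, lower + width).
counts = [0] * num_classes
for x in data:
    index = (x - start) // class_width
    counts[index] += 1

# Text rendering: one bar per class. Adjacent classes share a boundary,
# which is why a histogram has no gaps between its bars.
for i, count in enumerate(counts):
    lower = start + i * class_width
    print(f"[{lower}, {lower + class_width}) {'#' * count} ({count})")
```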
Frequency Polygon - a frequency polygon is very similar to a histogram. In fact, they are almost identical,
except that frequency polygons can also be used to compare sets of data or to display a cumulative frequency
distribution. In addition, a histogram is drawn with rectangles, while a frequency polygon resembles a line graph.
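A minimal sketch of the points that make up a frequency polygon is shown below. It assumes the common convention of plotting each frequency above its class midpoint and adding an empty class at each end so the polygon closes on the horizontal axis. The classes and frequencies are made-up values.

```python
# Hypothetical frequency distribution (made-up classes and counts).
lower_limits = [10, 20, 30, 40, 50]
class_width = 10
frequencies = [2, 4, 5, 3, 1]

# Each frequency is plotted above the midpoint of its class.
midpoints = [lower + class_width / 2 for lower in lower_limits]

# Add a zero-frequency point one class width before the first midpoint
# and one after the last, so the line starts and ends on the x-axis.
points = (
    [(midpoints[0] - class_width, 0)]
    + list(zip(midpoints, frequencies))
    + [(midpoints[-1] + class_width, 0)]
)

for x, y in points:
    print(f"midpoint {x:>5}: frequency {y}")
```

Joining consecutive points with straight line segments gives the polygon; plotting two such point lists on the same axes is how two data sets can be compared.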
CUMULATIVE FREQUENCY POLYGON / OGIVE
- An ogive graph plots cumulative frequency on the y-axis and class boundaries along
the x-axis. It’s very similar to a histogram, only instead of rectangles, an ogive has a
single point marking where the top right of the rectangle would be.
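The points of a "less than" ogive can be sketched as follows. The class boundaries and frequencies are made-up values, and starting the curve at zero on the lowest boundary is an assumed (though common) convention.

```python
from itertools import accumulate

# Hypothetical frequency distribution (made-up boundaries and counts).
boundaries = [10, 20, 30, 40, 50, 60]   # 6 boundaries define 5 classes
frequencies = [2, 4, 5, 3, 1]

# Cumulative frequency at each boundary, starting from 0 at the lowest one.
cumulative = [0] + list(accumulate(frequencies))

# One point per class boundary: (boundary on the x-axis, cumulative frequency on the y-axis).
ogive_points = list(zip(boundaries, cumulative))

for boundary, cum in ogive_points:
    print(f"less than {boundary}: {cum}")
```

The last point always equals the total number of observations, which is a quick check that the cumulative column was built correctly.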
STEMPLOT
- typically used when there is a moderate number of quantitative observations to analyze; stem plots of
more than 50 observations are unusual. The name “stem plot” comes from the way each value is split: a “stem”
holding the largest place-value digits on the left and a “leaf” on the right.
1. Select one or more leading digits for the stem values. The remaining digits become the leaves.
2. List all the possible stem values in a vertical column.
3. Record the leaf for every observation beside the corresponding stem value. Indicate the unit for stems
and leaves in the display.
4. A display having between 5 and 20 stems is recommended.
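The steps above can be sketched in a few lines. The function name `stem_and_leaf` and the data are hypothetical, and the sketch assumes non-negative whole numbers whose last digit serves as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem plot for non-negative integers.

    Assumption: the last digit is the leaf and the leading digits are the stem.
    """
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # step 1: split into stem and leaf
        stems[stem].append(leaf)         # step 3: record each leaf beside its stem

    lines = []
    # Step 2: list every possible stem, including stems with no leaves.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}".rstrip())
    return lines

# Hypothetical data set (made-up values).
data = [23, 25, 31, 34, 34, 38, 42, 47, 49, 61]
plot_lines = stem_and_leaf(data)

print("Stem: tens digit, Leaf: ones digit")   # unit note for stems and leaves
print("\n".join(plot_lines))
```

With this data the display has five stems (2 through 6), and the empty stem 5 is kept so the gap in the data remains visible.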
Pie chart – the familiar circular graph that shows how the measurements are distributed among the categories.
Bar chart – shows the same distribution of measurements in categories, with the height of each bar measuring
how often a particular category was observed.
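Both charts are built from the same category counts, as the sketch below illustrates. The list `responses` is made-up categorical data; the pie chart is represented by the angle of each sector and the bar chart by a row of symbols whose length equals the frequency.

```python
from collections import Counter

# Hypothetical categorical data (made-up responses).
responses = ["A", "B", "A", "C", "B", "A", "D", "A", "B", "C"]

counts = Counter(responses)
total = sum(counts.values())

# Pie chart: each sector's angle is proportional to the category's share of the total.
angles = {category: 360 * count / total for category, count in counts.items()}

# Bar chart: each bar's length equals the category's frequency.
for category in sorted(counts):
    bar = "#" * counts[category]
    print(f"{category}: {bar:<5} frequency={counts[category]}  angle={angles[category]:.0f} degrees")
```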
Example (Introduction to Probability and Statistics by Mendenhall and Beaver, 13th edition, 2009, p. 12)