Engineering Data Analysis Reviewer

Chapter 1
Data: facts and figures from which conclusions can be drawn
Data set: the data that are collected for a particular study
Elements: may be people, objects, events, or other entities
Variable: any characteristic of an element; its value may change from one element to another in the population
Univariate: the data set consists of observations on a single variable
Bivariate: data that arise when observations are made on each of two variables
Multivariate: data that arise when observations are made on more than one variable
Measurement: a way to assign a value of a variable to an element
Quantitative: the possible measurements of the values of a variable are numbers that represent quantities
Qualitative: the possible measurements fall into several categories
Cross-sectional data: data collected at the same or approximately the same point in time
Time series data: data collected over different time periods

Data Sources
Existing sources: data already gathered by public or private sources, such as
o Internet
o Library
o US Government
o Data collection agency
Experimental and observational studies: data we collect ourselves for a specific purpose
Response variable: the variable of interest
Independent variable: a variable that is related to the variable of interest and will also be measured
Experimental study: we are able to manipulate the independent variables
Observational study: we are unable to control the independent variables

Population and Samples
Population: the set of all elements about which we wish to draw conclusions
Census: an examination of all the population measurements
Sample: a subset of the elements of a population
Descriptive Statistics: the science of describing the important aspects of a set of measurements
Statistical Inference: the science of using a sample of measurements to make generalizations about the important aspects of a population of measurements

Scales of Measurement
Nominative: tags or names for categories, with no meaningful order
Ordinal: categories that can be ordered or ranked
Interval: measurements with known, equal intervals between values
Ratio: measurements with equal intervals and a meaningful zero, so that ratios of values are meaningful

Chapter 2
Graphically Summarizing Qualitative Data
Frequency distribution: a table that summarizes the number (or frequency) of items in each of several non-overlapping classes
Relative frequency: summarizes the proportion of items in each class
Formula: $\text{relative frequency} = \dfrac{\text{class frequency}}{n}$, where $n$ is the total number of items

Steps in Constructing a Frequency Distribution
1. Find the number of classes
2. Find the class length
3. Form non-overlapping classes of equal width
4. Tally and count the number of measurements in each class
5. Graph the histogram

Contingency Tables
Classify data on two dimensions
o Rows classify according to one dimension
o Columns classify according to a second dimension
Require three variables
o The row variable
o The column variable
o The variable counted in the cells

Scatter Plots
Used to study relationships between two variables
o Place one variable on the x-axis
o Place a second variable on the y-axis
o Place a dot at each pair of coordinates

Types of Relationships
Linear: a straight-line relationship between the two variables
o Positive: when one variable goes up, the other variable goes up
o Negative: when one variable goes up, the other variable goes down
No linear relationship: there is no coordinated linear movement between the two variables

Chapter 3
Measures of Central Tendency
Mean, μ: the average or expected value
Median, Md: the value of the middle point of the ordered measurements
Mode, Mo: the most frequent value

Measures of Variation
Range: the largest measurement minus the smallest measurement
Variance: the average of the squared deviations of all the population measurements from the population mean
Standard Deviation: the square root of the variance
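
The measures of central tendency and variation defined above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module. The data values are invented for illustration, and the population versions of the variance and standard deviation (`pvariance`, `pstdev`) are used on the assumption that the data are the whole population, to match the definitions above.

```python
import statistics

# Hypothetical measurements, invented for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # average of the measurements
median = statistics.median(data)       # middle point of the ordered measurements
mode = statistics.mode(data)           # most frequent value
data_range = max(data) - min(data)     # largest minus smallest measurement
variance = statistics.pvariance(data)  # average squared deviation from the mean
std_dev = statistics.pstdev(data)      # square root of the population variance

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```

If the data were a sample rather than a full population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice instead.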
Empirical Rule
For a normally distributed population:
o 68.26% of the measurements lie within one standard deviation of the mean
o 95.44% of the measurements lie within two standard deviations of the mean
o 99.73% of the measurements lie within three standard deviations of the mean

Z-scores: the number of standard deviations that x is from the mean
o A positive z-score is for x above the mean
o A negative z-score is for x below the mean
o The mean has a z-score of zero
Formula: $z = \dfrac{x - \mu}{\sigma}$

Percentiles, Quartiles
The first quartile Q1 is the 25th percentile
The second quartile (or median) is the 50th percentile
The third quartile Q3 is the 75th percentile
The interquartile range IQR is Q3 - Q1
Formula: $IQR = Q_3 - Q_1$

Covariance
A positive covariance indicates a positive linear relationship between x and y
o As x increases, y increases
A negative covariance indicates a negative linear relationship between x and y
o As x increases, y decreases
Formula (sample covariance): $s_{xy} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

Correlation Coefficient: a measure of the strength of the linear relationship that does not depend on the magnitude of the data
Formula (sample correlation): $r = \dfrac{s_{xy}}{s_x s_y}$, where $s_x$ and $s_y$ are the sample standard deviations of x and y

Weighted Mean: calculated by multiplying each value by the weight (or probability) associated with it, summing the products, and dividing by the sum of the weights
Formula: $\bar{x}_w = \dfrac{\sum w_i x_i}{\sum w_i}$
Example:
      Weight (%)   Grade (%)
Q1    10           70
Q2    10           65
Q3    30           70
Q4    50           85
Weighted mean = (10(70) + 10(65) + 30(70) + 50(85)) / 100 = 77%

Geometric Mean
For rates of return of an investment, use the geometric mean
Suppose the rates of return are R1, R2, …, Rn for periods 1, 2, …, n
The mean of all these returns is calculated as the geometric mean:
Formula: $R_g = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$

Chapter 4
Experiment: any process of observation with an uncertain outcome
Sample space: the set of possible outcomes of an experiment; the individual outcomes are also known as experimental outcomes or sample space outcomes
Probability: a measure of the chance that an experimental outcome will occur when an experiment is carried out
o If E is a sample space outcome, then P(E) denotes the probability that E will occur
Conditions:
o 0 ≤ P(E) ≤ 1, such that:
  If E can never occur, then P(E) = 0
  If E is certain to occur, then P(E) = 1
o The probabilities of all the sample space outcomes must sum to 1

Sample Space
Sample Space: the set of all possible experimental outcomes
Sample Space Outcomes: the experimental outcomes in the sample space
Event: a set of sample space outcomes
Probability: the probability of an event is the sum of the probabilities of the sample space outcomes that correspond to the event

Probability Rules
Addition Rule
o If A and B are mutually exclusive, then the probability that A or B will occur is P(A∪B) = P(A) + P(B)
o If A and B are not mutually exclusive: P(A∪B) = P(A) + P(B) - P(A∩B), where P(A∩B) is the joint probability of A and B both occurring together
Conditional Probability
o The probability of an event A, given that the event B has occurred, is called the conditional probability of A given B, denoted P(A|B)
o Further, P(A|B) = P(A∩B) / P(B); P(B) ≠ 0
o Likewise, P(B|A) = P(A∩B) / P(A); P(A) ≠ 0
Multiplication Rule
o Given any two events, A and B: P(A∩B) = P(A)P(B|A) = P(B)P(A|B)

Chapter 5
Random variable: a variable that assumes numerical values determined by the outcome of an experiment
o Discrete random variable: possible values can be counted or listed
  Examples:
  The number of defective units in a batch of 20
  A rating on a scale of 1 to 5
o Continuous random variable: may assume any numerical value in one or more intervals
  Examples:
  The waiting time for a credit card authorization
  The interest rate charged on a business loan

Discrete Probability Distribution
Probability distribution: a table, graph, or formula that gives the probability associated with each possible value that the random variable can assume
o Denote the values of the random variable by x and each value's associated probability by p(x)

Binomial Experiments
1. The experiment consists of n identical trials
2. Each trial results in either “success” or “failure”
3. The probability of success, p, is constant from trial to trial
4. Trials are independent
o If x is the total number of successes in n trials of a binomial experiment, then x is a binomial random variable

Poisson Distribution: gives the probability of a given number of independent events occurring in a fixed interval of time when the events occur at a constant mean rate
Formula: $p(x) = \dfrac{e^{-\mu} \mu^{x}}{x!}$, where $\mu$ is the mean number of occurrences in the interval
Example:
In a café, customers arrive at a mean rate of 2 per minute. Find the probability of 5 arrivals in 1 minute.
With $\mu = 2$ and $x = 5$: $p(5) = \dfrac{e^{-2} \, 2^{5}}{5!} \approx 0.0361$

Hypergeometric Distribution: gives the probability of a certain number of successes in a series of draws made without replacement from a fixed population
Formula: $p(x) = \dfrac{\binom{r}{x} \binom{N - r}{n - x}}{\binom{N}{n}}$, where $N$ is the population size, $r$ is the number of successes in the population, and $n$ is the number of draws

Chapter 6
Continuous Probability Distribution
A continuous random variable may assume any numerical value in one or more intervals
o Car mileage
o Temperature
Properties of Continuous Probability Distribution
Properties of f(x): f(x) is a continuous function such that
o f(x) ≥ 0 for all x
o The total area under the curve of f(x) is equal to 1
Essential point: an area under a continuous probability distribution is a probability

Uniform Distribution: all values between a minimum and a maximum value are equally likely
Formula: $f(x) = \dfrac{1}{d - c}$ for $c \le x \le d$, and $f(x) = 0$ otherwise, where $c$ is the minimum and $d$ is the maximum

Normal Probability Distribution
On a normal probability plot, a straight line indicates a normal distribution
Formula: $f(x) = \dfrac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$

Finding Normal Probabilities
1. Formulate the problem in terms of x values
2. Calculate the corresponding z values, and restate the problem in terms of these z values
3. Find the required areas under the standard normal curve by using the table

Exponential Distribution

Chapter 7
Random Sample: a sample selected so that every set of n elements in the population has the same chance of being selected
Probability sampling: sampling where we know the chance that each element in the population will be included in the sample
o Allows making statistical inferences
Convenience sampling: sampling where we select elements because they are easy or convenient to sample
Voluntary response sampling: sampling where participants self-select
Judgment sampling: sampling where a knowledgeable person selects population elements
Sampling distribution of the sample mean: the probability distribution of the population of the sample means obtainable from all possible samples of size n from a population of size N
Sampling distribution of the sample proportion: the probability distribution of all possible sample proportions
o Formula: $\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
Central Limit Theorem: even for a non-normal population, the sampling distribution of the sample mean is approximately normal when the sample size n is large
o The larger n, the better the approximation
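
The Central Limit Theorem can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 (a skewed, non-normal choice made here, not one from the reviewer), and `sample_means`, `num_samples`, and `seed` are hypothetical names introduced for the example. It draws many samples of size n and summarizes the resulting sample means.

```python
import random
import statistics


def sample_means(n, num_samples=2000, seed=0):
    """Return the means of num_samples samples of size n drawn from an
    exponential population with mean 1 (an assumed, non-normal population)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means


for n in (2, 30, 200):
    means = sample_means(n)
    # The center stays near the population mean; the spread shrinks as n grows
    print(n, round(statistics.mean(means), 3), round(statistics.stdev(means), 3))
```

As n grows, the sample means cluster more tightly around the population mean, with a spread close to the population standard deviation divided by the square root of n. A histogram of `means` would also look increasingly symmetric and bell-shaped.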