Trust Wallet Spamming

How to spam

Uploaded by

oikugodswill
Copyright
© © All Rights Reserved
ABE 322 Lecture 1

STATISTICAL ANALYSIS IN
ENGINEERING RESEARCH
by
Akachukwu Nwaobi
Definition of Statistics

Statistics is the body of concepts, principles, and methods dealing
with collection, summarization, analysis, and interpretation of
data.
Statistics is the science of collecting, analyzing, and interpreting
numerical data, with a view to drawing valid conclusions. - Yule and
Kendall (1950)
Statistics can also be defined as the branch of scientific
methodology which deals with the collection, analysis,
interpretation, presentation, and organization of numerical data. -
Crocker (1952)
Importance of Statistics
1. Planning experiments efficiently
2. Interpreting experimental data
3. Identifying hidden regularities in biological phenomena
4. Scrutinizing existing theories or formulating new ones
5. Collecting data through field trials, laboratory experiments, and
sample surveys
6. Determining the kind and extent of data collection
7. Summarizing and analyzing data
8. Interpreting results
Limitations of Statistics
• Statistics cannot be applied to cases with no variability
- Deterministic events (e.g., chemical combinations) have no randomness.
- Statistical methods are unnecessary for predictable outcomes.
- Other analytical methods (e.g., mathematical modeling) may be more suitable.
• Statistical techniques cannot explain individual variations
- Statistics focuses on aggregate patterns, not individual deviations.
- Individual variations may be due to unique factors not captured by statistical models.
- Other methods (e.g., case studies, qualitative analysis) may be needed to understand individual cases.
• Statistical laws are true only on average, in the long run
- Statistical principles hold true in probability, not certainty.
- Individual observations may deviate from the average.
- Large sample sizes help ensure statistical laws apply.
• Statistics deals only with aggregates or groups
- Statistical analysis focuses on group-level patterns.
- Individual data points may not reflect the group's characteristics.
- Other methods (e.g., data mining, machine learning) may be needed for individual-level analysis.
• Statistical methods are not suitable for the study of qualitative variables
Basic Statistical Concepts- Variable(s)
A variable is a characteristic or attribute that can take on different values or levels.
Its value varies from observation to observation. Variables are the building blocks of
statistical analysis.
Types of Variables:
1. Quantitative Variables: Numerical values that can be measured or counted.
- Continuous: fractional measurements are possible (e.g., height, weight,
temperature)
- Discrete or discontinuous: only separate, countable values are possible
(e.g., number of children, number of cars)
2. Qualitative or Categorical Variables: Non-numerical values that can be
grouped into categories.
- Nominal (e.g., gender, color, city)
- Ordinal (e.g., education level, income bracket)
3. Binary Variables: Dichotomous variables with only two possible values.
Examples: Yes/No, 0/1, Male/Female
Basic Statistical Concepts- Variable(s)
• Examples:
Age (quantitative, continuous)
Gender (categorical, nominal)
City of residence (categorical, nominal)
Score in ABE 322 –(__________,____________)
Basic Statistical Concepts- Data
Data are the collected values or measurements of variables.
Types of Data:
1. Quantitative Data: Numerical values - Height, weight, temperature
2. Qualitative Data: Categorical or textual values. - Color, city,
occupation
Depending on Source
1. Primary Data: Collected directly from the source (e.g., surveys, experiments).
2. Secondary Data: Collected from existing sources (e.g., books, articles,
databases).
Examples:
- Age data: 25, 31, 42, 28, 35
- Gender data: Male, Female, Male, Female, Male
Basic Statistical Concepts- Population
• A population is the entire group of individuals, objects, or events that
you want to understand or describe. It is the aggregate or totality of
all possible objects possessing a given characteristic.
Characteristics of a Population:
1. Size: The number of elements in the population.
2. Boundaries: Geographic, temporal, or conceptual limits.
3. Homogeneity: Similarity among elements.
Examples:
- All students enrolled in a university (population)
- All residents of a city (population)
- All patients with a specific disease (population)
• Sample: A selected group of individuals, objects, or events
within a population. It is a subset of the population.
Basic Statistical Concepts- Population
• Population could be FINITE or INFINITE
• Population could be DISCRETE OR CONTINUOUS
• Population could be EXISTENT OR HYPOTHETICAL
Basic Statistical Concepts- SAMPLING
Sampling is the process of selecting a subset of individuals, objects, or
events from a larger population to represent the entire population. The goal
is to make inferences about the population based on the characteristics of
the sample.
Types of Sampling
1. Probability Sampling: Sampling in which each unit is selected with a
known probability.
• Random Sampling: Every individual has an equal chance of selection.
• Stratified Sampling: The population is divided into subgroups (strata),
and each subgroup is sampled separately.
• Cluster Sampling: The population is divided into clusters, and clusters
are selected at random.
• Systematic Sampling: Every nth individual is selected.
• Multistage Sampling: Combination of the above methods.
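The probability sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the population of 100 plot IDs, the sample size of 10, the two strata and the fixed seed are all assumptions made for the sketch, not values from the lecture.

```python
import random

# Hypothetical population: plot IDs 1..100 (an assumption, not lecture data).
population = list(range(1, 101))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simple random sampling: every unit has an equal chance of selection.
simple_sample = rng.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: divide the population into strata, sample each one.
strata = {"low_ids": population[:50], "high_ids": population[50:]}
stratified_sample = [
    unit for units in strata.values() for unit in rng.sample(units, 5)
]

print(simple_sample)
print(systematic_sample)
print(stratified_sample)
```

Systematic sampling needs only one random draw (the starting point), which is why it is often easier to carry out in the field than simple random sampling.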
Basic Statistical Concepts- SAMPLING
2. Non-Probability Sampling
• Convenience Sampling: Non-random selection, often based on
accessibility.
• Purposive or Judgement Sampling: Selecting individuals with
specific characteristics.
• Quota Sampling: Selecting individuals to meet specific quotas.
• Snowball Sampling: Participants recruit additional participants.
• Voluntary Sampling: Participants self-select.
Descriptive statistics
Descriptive statistics refers to a set of methods used to
summarize and describe the main features of a dataset, such as
its central tendency, variability, and distribution. These methods
provide an overview of the data and help identify patterns and
relationships.
The four types of descriptive statistics are:
• Measures of Frequency
• Measures of Central tendency
• Measures of Variability or dispersion
• Standards of Relative position
Inferential Statistics
• Inferential statistics is a branch of statistics that
makes use of various analytical tools to
draw inferences about the population from
sample data.
Types of Inferential Statistics:
1. Hypothesis Testing: Testing a specific
hypothesis about the population.
2. Confidence Intervals: Estimating a
population parameter with a margin of error.
3. Regression Analysis: Modeling relationships
between variables.
Difference between Inferential and Descriptive
statistics
Frequency Distributions- Measures of Frequency
• Once you have a set of data, you will need to organize it so that you
can analyze how frequently each datum occurs in the set
• Measures of frequency describe the number of occurrences or
observations within a dataset.
• Types of Frequency Measures:
1. Frequency Count: The number of times each value occurs.
2. Relative Frequency: A relative frequency is the ratio (fraction or
proportion) of the number of times a value of the data occurs in the set of all
outcomes to the total number of outcomes. To find the relative frequencies,
divide each frequency by the total number of observations in the sample.
Relative frequencies can be written as fractions, percents, or decimals.
3. Cumulative Frequency: The running total of frequencies.
Ungrouped Frequency Distribution
• Frequency = Number of occurrences of each value
• Relative Frequency = Frequency / Total observations
(multiply by 100 to express it as a percentage)
• Cumulative relative frequency is the accumulation of the
previous relative frequencies. To find the cumulative relative
frequencies, add all the previous relative frequencies to the
relative frequency for the current row
Example: Twenty students who attended the first class of BRE
322 were asked how many hours they study per day. Their
responses, in hours, are as follows:
• 5; 6; 3; 3; 2; 4; 7; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3.
Arrange these ungrouped data and find the frequency, relative
frequency, cumulative frequency and cumulative relative
frequency.
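One way to check a hand-built table for this example is a short Python sketch. It uses the twenty responses listed above; the variable names and the column layout are illustrative choices, not part of the lecture.

```python
from collections import Counter

# Study hours reported by the twenty students in the example.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]
n = len(hours)
counts = Counter(hours)

rows = []
cum_freq = 0
for value in sorted(counts):
    freq = counts[value]
    cum_freq += freq  # running total of frequencies
    rows.append((value, freq, freq / n, cum_freq, cum_freq / n))

print("hours   f  rel.f  cum.f  cum.rel.f")
for value, freq, rel, cf, crf in rows:
    print(f"{value:>5} {freq:>3} {rel:>6.2f} {cf:>6} {crf:>10.2f}")
```

The last row should show a cumulative frequency of 20 and a cumulative relative frequency of 1.00, which is a quick consistency check on any frequency table.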
Grouped Frequency Distribution

• A grouped frequency distribution is a table used to organize data in
which the data are grouped into classes more than one
unit in width. It is used when the data set is large, or when it makes sense to
group the data.
Grouped Frequency Distribution
Procedures for constructing a frequency
distribution table
• Sort the data and note the total number of observations
• Determine the minimum number of classes or categories
• Determine the class interval,
C = (Max value − Min value)/no. of classes
• Count the number of occurrences in each class or category
• Plot the frequency distribution as a frequency polygon, histogram,
bar chart or pie chart
Example
The following data relate to the yield of grain maize in (g per plot) of a
maize variety from 50 experimental plots of equal area.

196 142 169 174 162 151 129 186 168 201

209 126 132 156 109 122 148 163 169 109

89 98 104 65 74 147 192 149 162 154

170 162 76 89 54 66 139 177 219 116

137 104 199 177 104 99 58 86 206 191


Solution
• 1. Determine the number of classes/categories using Yule's equation
• Appropriate/minimum no. of classes = 2.5 × n^(0.25)
n = 50
No. of classes = 2.5 × 50^(0.25) ≈ 6.65 ≈ 7 classes
• 2. Class interval, $C = \frac{\text{max value} - \text{min value}}{\text{no. of classes}} = \frac{219 - 54}{7} = 23.57 \approx 24$
NB: A class interval of 25 was used instead.


Table of frequency distribution
Class (x)   Class boundaries   No. of plots (F)   Cumulative frequency (CF)   Relative frequency
50-75       49.5-74.5          5                  5                           5/50 = 0.10
75-100      74.5-99.5          6                  11                          0.12
100-125     99.5-124.5         7                  18                          0.14
125-150     124.5-149.5        9                  27                          0.18
150-175     149.5-174.5        12                 39                          0.24
175-200     174.5-199.5        7                  46                          0.14
200-225     199.5-224.5        4                  50                          0.08
Total                          50                                             1.00

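CF may be given in the "less than" form (number of observations below the upper limit of each class interval)
Or
CF may be given in the "greater than or equal to" form (number of observations at or above the lower limit of each class interval)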
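The construction of this table can be reproduced with a short Python sketch. It assumes the class boundaries start at 49.5 with a width of 25, as in the table; rounding the Yule estimate up with `math.ceil` is a choice made here, and the variable names are illustrative.

```python
import math
from itertools import accumulate

# Grain yields (g per plot) from the 50 experimental plots.
yields = [
    196, 142, 169, 174, 162, 151, 129, 186, 168, 201,
    209, 126, 132, 156, 109, 122, 148, 163, 169, 109,
    89, 98, 104, 65, 74, 147, 192, 149, 162, 154,
    170, 162, 76, 89, 54, 66, 139, 177, 219, 116,
    137, 104, 199, 177, 104, 99, 58, 86, 206, 191,
]
n = len(yields)

# Yule's rule for the number of classes, rounded up (6.65 -> 7).
num_classes = math.ceil(2.5 * n ** 0.25)
raw_width = (max(yields) - min(yields)) / num_classes  # about 23.57

# Assumption taken from the table: width 25, first lower boundary 49.5.
width = 25
first_boundary = 49.5

freqs = [0] * num_classes
for y in yields:
    freqs[int((y - first_boundary) // width)] += 1

cum_less_than = list(accumulate(freqs))  # "less than" CF
cum_at_least = [n - c + f for c, f in zip(cum_less_than, freqs)]  # ">=" CF
rel_freqs = [f / n for f in freqs]

for i, f in enumerate(freqs):
    lo = first_boundary + i * width
    print(f"{lo:>6.1f}-{lo + width:<6.1f} {f:>3} {cum_less_than[i]:>3} "
          f"{cum_at_least[i]:>3} {rel_freqs[i]:.2f}")
```

Both forms of cumulative frequency come from the same counts: the "less than" column accumulates from the first class, while the "greater than or equal to" column accumulates from the last.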
Presentation 2
Measures of Central Tendency

• Measures of central tendency are summary statistics that
represent the center point or typical value of a dataset.
• Examples of these measures include the mean, median, and
mode. These statistics indicate where most values in a
distribution fall and are also referred to as the central location of a
distribution.
• You can think of central tendency as the propensity for data points
to cluster around a middle value
Measures of Central Tendency
Mean- Sum of all observations divided by the total number of
observations.

Median- The middle or central value in an ordered set.

Mode- The most frequently occurring value in a data set. In a
frequency distribution, the mode may or may not exist. A distribution may
be unimodal, bimodal or multimodal.
https://www.cuemath.com/data/measures-of-central-tendency/
Class Brainstorm: If you want to express
the salaries of the employees using a single value,
which value would you use?
Measures of Central Tendency

The three distributions below represent different data conditions. In each distribution, look for the region
where the most common values fall. Even though the shapes and type of data are different, you can find that
central tendency. That’s the area in the distribution where the most common values are located.
Arithmetic Mean
• For a set of observations: Mean, $\bar{x} = \frac{\sum x}{n}$
• For a set of grouped data: Mean, $\bar{x} = \frac{\sum f x}{\sum f}$
where,
$\bar{x}$ = the mean value of the set of given data
f = frequency of each class
x = mid-interval value of each class
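Both formulas can be sketched in Python. The raw data below are the seven items 5, 6, 7, 7, 7, 8, 9 used later in these notes; the grouped data reuse the class frequencies of the maize yield table, with mid-interval values computed here from the class boundaries (an assumption of this sketch, since the table itself does not list midpoints).

```python
def mean(values):
    """Arithmetic mean of raw observations: sum(x) / n."""
    return sum(values) / len(values)


def grouped_mean(midpoints, freqs):
    """Arithmetic mean of grouped data: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


raw = [5, 6, 7, 7, 7, 8, 9]

# Midpoints of the boundaries 49.5-74.5, 74.5-99.5, ... (assumed here).
midpoints = [62, 87, 112, 137, 162, 187, 212]
freqs = [5, 6, 7, 9, 12, 7, 4]

print(mean(raw))                       # 7.0
print(grouped_mean(midpoints, freqs))  # 139.0
```

The grouped mean is an approximation of the raw mean, because every observation in a class is treated as if it sat at the class midpoint.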
GEOMETRIC MEAN

G.M. = Antilog $\left[\frac{\sum \log x}{n}\right]$
In case of a frequency distribution:
G.M. = Antilog $\left[\frac{\sum f (\log x)}{n}\right]$, where $n = \sum f$

https://www.cuemath.com/data/geometric-mean/
Example:
• Let 2, 4, 8, 16 and 18 be the 5 items needed for the
distribution of grains in a field. Determine the geometric
mean.
• Solution
G.M. = $\sqrt[5]{2 \times 4 \times 8 \times 16 \times 18}$
x       Log x
2
4
8
16
18
Total
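As a check on the hand calculation, the geometric mean of these five items can be computed both ways in Python: directly as the fifth root of the product, and through base-10 logarithms as in the antilog formula. The variable names are illustrative.

```python
import math

items = [2, 4, 8, 16, 18]
n = len(items)

# Direct definition: n-th root of the product of the items.
gm_root = math.prod(items) ** (1 / n)

# Logarithm route: antilog of the mean of log10(x).
log_values = [math.log10(x) for x in items]
gm_log = 10 ** (sum(log_values) / n)

for x, lx in zip(items, log_values):
    print(f"{x:>3} {lx:.4f}")
print(f"Total {sum(log_values):.4f}")
print(f"G.M. = {gm_root:.2f} (root), {gm_log:.2f} (antilog)")
```

Both routes give a geometric mean of about 7.13; the logarithm route is the one that matches the x and Log x working table.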
Harmonic Mean

The Harmonic Mean is rarely computed for frequency distributions.
Example
• There are 6 agricultural laborers. They can complete weeding
operations on a 100 square metre plot in 4, 5, 5, 6, 6 and 7 hours
respectively. If these laborers are employed for weeding a 500
sqm area, in how many hours will they complete the work?
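The notes do not write out the harmonic mean formula; the sketch below uses the standard definition, $H.M. = n / \sum (1/x)$, which gives the average time per 100 sqm for a laborer. The final step is an assumption about how to read the question: it supposes all six laborers work on the 500 sqm area at the same time, so their work rates add up.

```python
hours_per_100sqm = [4, 5, 5, 6, 6, 7]
n = len(hours_per_100sqm)

# Each laborer's rate is 1/x plots (of 100 sqm) per hour.
total_rate = sum(1 / x for x in hours_per_100sqm)

# Harmonic mean: n divided by the sum of reciprocals.
harmonic_mean = n / total_rate

# Assumption: all six work simultaneously on 500 sqm (5 plots of 100 sqm).
hours_for_500sqm = 5 / total_rate

print(f"H.M. = {harmonic_mean:.2f} hours per 100 sqm")
print(f"Time for 500 sqm working together = {hours_for_500sqm:.2f} hours")
```

The harmonic mean is the appropriate average here because the data are times for a fixed amount of work, so it is the rates (reciprocals) that average arithmetically.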
Weighted mean
Median:
• The middlemost item that divides the data into two equal parts when the items are
arranged in ascending order.
• For ungrouped data:
• For an odd number of observations, Median = $\left(\frac{n+1}{2}\right)$th term.
• For an even number of observations, Median = [(n/2)th term + ((n/2) + 1)th term]/2

For grouped data: Median = $l + \left[\frac{(n/2) - cf}{f}\right] \times C$
where,
l = Lower limit of the median class, i.e. the class in which the (n/2)th item falls
cf = Cumulative frequency of the class preceding the median class
f = Frequency of the median class
C = Class interval
n = Number of observations
Median class = Class in which the (n/2)th item lies
Example Problems:
• If the weights of 5 sorghum earheads are 34, 60, 48, 100 and 65, find the
median.
• If the weights of six sorghum earheads are 34, 60, 48, 65, 100 and 65,
what will the median now be?
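A small Python sketch of the odd and even rules, applied to the two sets of earhead weights. The function name is illustrative; the standard library's `statistics.median` follows the same rules and is used as a cross-check.

```python
import statistics


def median_ungrouped(values):
    """Median by the (n + 1)/2 rule for odd n, mean of two middle terms for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # (n + 1)/2-th term, 1-based
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


five = [34, 60, 48, 100, 65]
six = [34, 60, 48, 65, 100, 65]

print(median_ungrouped(five), statistics.median(five))  # 60 60
print(median_ungrouped(six), statistics.median(six))    # 62.5 62.5
```

Sorting comes first in both cases: the rules refer to positions in the ordered data, not in the order the weights were recorded.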
Example: Frequency distribution of weight of 190
sorghum ear-heads
Weight of earheads (g), X   Midpoint   No. of earheads (f)   < CF
40-60                       50         6                     6
60-80                       70         28                    34
80-100                      90         35                    69
100-120                     110        55                    124
120-140                     130        30                    154
140-160                     150        15                    169
160-180                     170        12                    181
180-200                     190        9                     190
Total                                  190

n/2 = 95; therefore the median class is 100-120
l = 100, C = 20, cf = 69, f = 55
Median = $l + \left[\frac{(n/2) - cf}{f}\right] \times C = 100 + \left[\frac{95 - 69}{55}\right] \times 20$ = 109.45 g
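The grouped median formula can be sketched as a small Python function and checked against this example. The function name and argument layout are illustrative assumptions; equal class widths are assumed.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + ((n/2 - cf) / f) * C for equal-width classes."""
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= n / 2:  # the (n/2)-th item falls in this class
            return lower + (n / 2 - cf) / f * width
        cf += f
    raise ValueError("empty frequency table")


lower_limits = [40, 60, 80, 100, 120, 140, 160, 180]
freqs = [6, 28, 35, 55, 30, 15, 12, 9]

print(round(grouped_median(lower_limits, freqs, 20), 2))  # 109.45
```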
Mode
For grouped data: Mode = $l + \left[\frac{f_s}{f_p + f_s}\right] \times C$
where,
l = lower limit of the modal class
$f_p$ = the frequency of the class preceding the modal class
$f_s$ = the frequency of the class succeeding the modal class
C = class interval

A distribution may have no mode, or it may be unimodal, bimodal or multimodal.
Example
Weight of seed yams (g), X   No. of insects (f)
40-60                        6
60-80                        28
80-100                       35
100-120                      55
120-140                      30
140-160                      15
160-180                      12
180-200                      9
Total                        190

The modal class is 100-120, so l = 100, $f_p$ = 35, $f_s$ = 30, C = 20
Mode = $100 + 20 \times \left(\frac{30}{35 + 30}\right)$ = 100 + 9.23 = 109.23 g
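The grouped mode formula used in these notes, Mode = l + [fs / (fp + fs)] × C, can be sketched as follows. The function name is illustrative, and treating a missing neighbouring class as frequency 0 is an assumption made for the sketch. Evaluated exactly, the example gives 109.23 g, which rounds to about 109 g.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + (fs / (fp + fs)) * C, using the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # index of modal class
    fp = freqs[i - 1] if i > 0 else 0               # preceding class frequency
    fs = freqs[i + 1] if i < len(freqs) - 1 else 0  # succeeding class frequency
    if fp + fs == 0:
        raise ValueError("formula undefined when both neighbours are empty")
    return lower_limits[i] + fs / (fp + fs) * width


lower_limits = [40, 60, 80, 100, 120, 140, 160, 180]
freqs = [6, 28, 35, 55, 30, 15, 12, 9]

print(round(grouped_mode(lower_limits, freqs, 20), 2))  # 109.23
```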
Quantiles
• Quantiles are values that split sorted data or a probability
distribution into equal parts. In general terms, a q-quantile divides sorted
data into q parts. The most commonly used quantiles have special
names:

• Quartiles (4-quantiles): Three quartiles split the data into four parts.
• Deciles (10-quantiles): Nine deciles split the data into 10 parts.
• Percentiles (100-quantiles): 99 percentiles split the data into 100 parts
Quartiles
• Quartiles are three values that split sorted data into four parts, each
with an equal number of observations. Quartiles are a type
of quantile.
• First quartile: Also known as Q1, or the lower quartile. This is the
value halfway, by position, between the lowest value and the median
of the sorted data; 25% of the observations lie below it.
• Second quartile: Also known as Q2, or the median. This is the middle
value of the sorted data; 50% of the observations lie below it.
• Third quartile: Also known as Q3, or the upper quartile. This is the
value halfway, by position, between the median and the highest
value of the sorted data; 75% of the observations lie below it.
Percentiles
• Assume that the elements in a data set are rank ordered from
the smallest to the largest. The values that divide a rank-ordered
set of elements into 100 equal parts are called percentiles.
• An element having a percentile rank of Pi would have a greater
value than i percent of all the elements in the set. Thus, the
observation at the 50th percentile would be denoted P50, and it
would be greater than 50 percent of the observations in the set.
An observation at the 50th percentile would correspond to the
median value in the set.
QUICK NOTE

• Q1 = 25th Percentile
• Q2 = 50th Percentile = median
• Q3 = 75th Percentile


Percentiles
• For raw data, first arrange the n observations in increasing order.
Then the xth percentile is given as
• $P_x = \left[\frac{x(n+1)}{100}\right]$th item
• For a frequency distribution
• $P_x = l + \left[\frac{x \cdot \frac{n}{100} - cf}{f}\right] \times C$
where
• l = lower limit of the percentile class
• f = the frequency of the percentile class
• cf = cumulative frequency of the class preceding the percentile class
• C = class interval
• n = total number of observations

Example Problem
• The following are the paddy yields (kg/plot) from 14 plots: 30, 32, 35,
38, 40, 42, 48, 49, 52, 55, 58, 60, 62 and 65. With the data arranged in
ascending order, compute the 25th percentile (Q1) and 75th
percentile (Q3).
• Solution
• P25 (or Q1) = $\left[\frac{x(n+1)}{100}\right]$th item = $\left[\frac{25(14+1)}{100}\right]$th item = 3.75th item
= 3rd item + (4th item − 3rd item) × (3/4)
= 35 + (38 − 35) × (3/4) = 37.25 kg
• P75 (or Q3) = $\left[\frac{x(n+1)}{100}\right]$th item = $\left[\frac{75(14+1)}{100}\right]$th item = 11.25th item
= 11th item + (12th item − 11th item) × (1/4)
= 58 + (60 − 58) × (1/4) = 58.5 kg
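The x(n+1)/100 rule with linear interpolation between neighbouring items can be sketched in Python. The function name and the handling of positions that fall outside the data are assumptions of this sketch. With the 14 yields as listed, the 3rd and 4th items are 35 and 38 and the 11th and 12th items are 58 and 60, so the rule gives 37.25 kg for P25 and 58.5 kg for P75.

```python
def percentile_raw(values, x):
    """x-th percentile by the x(n + 1)/100 rule with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = x * (n + 1) / 100  # 1-based, possibly fractional
    whole = int(position)
    fraction = position - whole
    if whole < 1:    # position before the first item (assumed: clamp)
        return ordered[0]
    if whole >= n:   # position at or beyond the last item (assumed: clamp)
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


paddy = [30, 32, 35, 38, 40, 42, 48, 49, 52, 55, 58, 60, 62, 65]

print(percentile_raw(paddy, 25))  # 37.25
print(percentile_raw(paddy, 75))  # 58.5
```

Statistical packages use several slightly different percentile conventions, so results from other software may differ a little from this rule.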
Example: Compute the 25th and 75th Percentile

Weight of seed yams (g), X   Midpoint   No. of insects (f)   CF
40-60                        50         6                    6
60-80                        70         28                   34
80-100                       90         35                   69
100-120                      110        55                   124
120-140                      130        30                   154
140-160                      150        15                   169
160-180                      170        12                   181
180-200                      190        9                    190
Total                                   190

$P_x = l + \left[\frac{x \cdot \frac{n}{100} - cf}{f}\right] \times C$

P25 = $80 + \left[\frac{25 \cdot \frac{190}{100} - 34}{35}\right] \times 20$ = 87.71 g

P75 = $120 + \left[\frac{75 \cdot \frac{190}{100} - 124}{30}\right] \times 20$ = 132.33 g
Find P50 or Q2
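The grouped percentile formula can be sketched as a Python function and checked against the two worked values. The function name and arguments are illustrative assumptions, and equal class widths are assumed; the same function can be used to check any other percentile once it has been worked by hand.

```python
def grouped_percentile(x, lower_limits, freqs, width):
    """P_x = l + ((x * n / 100 - cf) / f) * C for equal-width classes."""
    n = sum(freqs)
    target = x * n / 100
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= target:  # the percentile falls in this class
            return lower + (target - cf) / f * width
        cf += f
    raise ValueError("percentile outside the table")


lower_limits = [40, 60, 80, 100, 120, 140, 160, 180]
freqs = [6, 28, 35, 55, 30, 15, 12, 9]

print(round(grouped_percentile(25, lower_limits, freqs, 20), 2))  # 87.71
print(round(grouped_percentile(75, lower_limits, freqs, 20), 2))  # 132.33
```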
IMPORTANT CHARACTERISTICS OF A GOOD
AVERAGE
• An average is a representative item of a distribution. It should possess the following
properties:

1. It should take all the items into consideration.
2. It should not be affected by extreme values.
3. It should be stable from sample to sample.
4. It should be capable of being used for further statistical analysis.

The mean satisfies all these properties except that it is affected by the presence of
extreme items. For example, if the items are 5, 6, 7, 7, 7, 8 and 9, then the mean,
median and mode are all equal to 7. If the last value is 30 instead of 9, the mean will
be 10, whereas the median and mode are not changed. Though the median and mode are
better in this respect, they do not satisfy the other properties. Hence the mean is the best
average among these three.
Lecture 3

Measures of Dispersion
• RANGE
• QUARTILE DEVIATION
• MEAN DEVIATION
• STANDARD DEVIATION
Range
Suppose we have the distribution of the yield (kg/plot) of two maize varieties from 5 plots
each:

Variety 1 : 45 42 42 41 40
Variety 2 : 54 48 42 33 30

Calculate the mean

Find the range

Which variety is more reliable? Why?
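A short Python sketch for the two calculations asked for. It takes the range to be the largest value minus the smallest value, which is the usual definition; the variable and function names are illustrative.

```python
variety_1 = [45, 42, 42, 41, 40]
variety_2 = [54, 48, 42, 33, 30]


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


for name, yields in (("Variety 1", variety_1), ("Variety 2", variety_2)):
    print(f"{name}: mean = {mean(yields):.1f} kg/plot, "
          f"range = {value_range(yields)} kg/plot")
```

The two means are close (42.0 and 41.4 kg/plot), but the ranges differ widely (5 against 24 kg/plot), which is the contrast the reliability question is pointing at.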

• See you in next class
