## Univariate Analysis
3.1 Covariance
Data:
• Ventilation Rate (m³/s): 5, 10, 15, 20, 25
• Air Quality (% harmful gases): 12, 9, 7, 5, 3
Solution:
Compute the means of the ventilation rate and air quality data.
Use the Pearson correlation formula to find the correlation coefficient.
Interpretation: A negative correlation would imply that increasing
ventilation improves air quality (reduces harmful gases).
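The two solution steps can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `pearson_r` is chosen here for the example and is not part of the original notes.

```python
import math

# Data from the problem statement
ventilation = [5, 10, 15, 20, 25]   # ventilation rate, m^3/s
harmful_gas = [12, 9, 7, 5, 3]      # air quality, % harmful gases

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(ventilation, harmful_gas)
print(f"r = {r:.4f}")  # r = -0.9959
```

The coefficient is close to -1, which indicates a strong negative linear relationship for these five observations.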
Distance (meters)    Grade Variability (%)
10                   4.0
20                   3.6
30                   3.0
40                   2.9
50                   2.5
Solution
## Multivariate Analysis
3.3 Techniques of Multivariate Analysis
Problem:
We want to predict sales (Y) based on two independent variables:
advertising spending (X₁) and price discount (X₂). The relationship is
modeled as follows:
Solution:
Using the method of least squares, the multiple regression equation can
be derived. After computing the coefficients using statistical software, the
estimated regression equation might look something like this:
Interpretation:
Consider two random variables, height (X₁) and weight (X₂), of students in a
university. The goal is to compute the covariance matrix based on the
following data:
• Heights (X₁): [170, 160, 180, 175, 165]
• Weights (X₂): [65, 55, 75, 70, 60]
Solution
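A minimal sketch of the covariance matrix computation for the height and weight data. It assumes the sample covariance with the (n - 1) divisor; the helper name `sample_cov` is illustrative only. With the population divisor n, every entry would be 50 instead of 62.5.

```python
heights = [170, 160, 180, 175, 165]   # X1
weights = [65, 55, 75, 70, 60]        # X2

def sample_cov(x, y):
    """Sample covariance with the (n - 1) divisor (an assumption of this sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# 2 x 2 covariance matrix: variances on the diagonal, covariance off the diagonal
cov_matrix = [
    [sample_cov(heights, heights), sample_cov(heights, weights)],
    [sample_cov(weights, heights), sample_cov(weights, weights)],
]
for row in cov_matrix:
    print(row)  # [62.5, 62.5] for both rows
```

All four entries are equal because each weight deviates from its mean by exactly the same amount as the corresponding height, so the two variables are perfectly correlated in this dataset.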
CONDITIONS
The central limit theorem states that the sampling distribution of the mean is
approximately normal when the following conditions hold:
• The sample size is sufficiently large. This condition is usually met if the
sample size is n ≥ 30.
• The samples are independent and identically distributed (i.i.d.) random
variables. This condition is usually met if the sampling is random.
• The population’s distribution has finite variance. Most distributions have
finite variance.
2) The salaries at a very large corporation have a mean of $62,000 and a standard
deviation of $32,000. If 100 employees are randomly selected, what is the
probability their average salary exceeds $66,000?
Solution:
Population mean, μ = $62,000
Population standard deviation: σ = $32,000
Sample size: n = 100
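The standard deviation of the sample mean (the standard error) is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{32000}{\sqrt{100}} = 3200$$
The z-score for a sample mean of x̄ = $66,000 is
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{66000 - 62000}{3200} = 1.25$$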
Using the z-score table or normal CDF function on a statistical calculator,
P(z > 1.25) = 0.106
Thus, the probability that the average salary exceeds $66,000 is 10.6%.
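The same calculation can be checked numerically. The sketch below uses only the standard library and obtains the upper-tail normal probability from the complementary error function; the variable names are illustrative.

```python
import math

mu = 62_000        # population mean salary
sigma = 32_000     # population standard deviation
n = 100            # sample size
threshold = 66_000 # sample mean of interest

# Standard error of the mean and the corresponding z-score
standard_error = sigma / math.sqrt(n)
z = (threshold - mu) / standard_error

# Upper-tail probability of the standard normal: P(Z > z) = 0.5 * erfc(z / sqrt(2))
p_exceeds = 0.5 * math.erfc(z / math.sqrt(2))

print(f"standard error = {standard_error:.0f}")
print(f"z = {z:.2f}")
print(f"P(mean > {threshold}) = {p_exceeds:.4f}")  # about 0.1056
```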
Problem Statement
A region affected by industrial pollution has soil samples collected from five locations. The concentration of
a contaminant (in mg/kg) is recorded as follows:
Calculate the semivariance for a lag distance of 5 units and fit a simple spherical model to the
semivariogram.
Solution
1. Calculate Semivariance:
• Exponential Model: The semi-variance increases gradually but never fully levels off.
• Gaussian Model: The semi-variance increases smoothly, particularly over short distances.
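The difference in shape between these two models can be seen in a short sketch. The parameter values (zero nugget, unit partial sill, practical range of 100) are hypothetical, and the factor of 3 in the exponents is one common convention under which the model reaches about 95% of the sill at the practical range; other texts parameterize these models differently.

```python
import math

def exponential_model(h, nugget, partial_sill, practical_range):
    """Exponential semivariogram; approaches the sill asymptotically."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1 - math.exp(-3 * h / practical_range))

def gaussian_model(h, nugget, partial_sill, practical_range):
    """Gaussian semivariogram; parabolic (very smooth) near the origin."""
    if h == 0:
        return 0.0
    return nugget + partial_sill * (1 - math.exp(-3 * (h / practical_range) ** 2))

# Hypothetical parameters chosen only for illustration
params = dict(nugget=0.0, partial_sill=1.0, practical_range=100.0)
for h in (10, 50, 100, 200):
    g_exp = exponential_model(h, **params)
    g_gau = gaussian_model(h, **params)
    print(f"h = {h:>3}: exponential = {g_exp:.3f}, gaussian = {g_gau:.3f}")
```

At short lags the Gaussian curve stays well below the exponential one, which reflects its smoother behaviour near the origin.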
Univariate Statistics: The average copper concentration across the sampling points is found to be 2.5%,
with a standard deviation of 0.4%. While this gives a sense of the overall distribution, it does not reveal any
spatial patterns, such as whether copper concentrations are higher in certain areas of the mine.
Variogram: By plotting the variogram, the mining company notices that the semi-variance increases
rapidly at short distances (up to 50 meters), indicating strong spatial dependence. Beyond 150 meters, the
semi-variance levels off, suggesting that copper concentrations are no longer spatially correlated beyond
this distance (range = 150 meters).
In this case, the variogram provides valuable insight into the spatial distribution of copper, allowing the
company to identify zones with higher concentration and optimize their extraction strategy.
5. Example Problem
You need to fit a spherical variogram model to the data and calculate the nugget, range, and sill.
Step-by-Step Solution:
1. Plot the Empirical Variogram: The semi-variance increases rapidly up to 6 km and then flattens out,
suggesting a spherical model is appropriate.
2. Choose the Spherical Model: Since the semi-variance increases sharply and then stabilizes, the
spherical model is a good fit.
3. Fit the Nugget (C0): The semi-variance at short distances (0.5 km) is 0.03, indicating small-scale
variability or measurement noise. Set the nugget at C0=0.03.
4. Fit the Range (a): The semi-variance flattens out around 6 km, so the range is a=6 km. Beyond this
distance, there is no spatial correlation between data points.
5. Fit the Sill (C): The semi-variance stabilizes at 0.52, so set the sill at C = 0.52, representing the total
variance in the data.
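With the three fitted parameters, the spherical model can be evaluated at any lag. The sketch below assumes that 0.52 is the total sill (nugget included), so the partial sill is 0.52 - 0.03 = 0.49; the function name and default arguments are illustrative.

```python
def spherical_model(h, nugget=0.03, sill=0.52, range_km=6.0):
    """Spherical semivariogram with a total sill (nugget included).

    Assumes gamma(0) = 0 by convention and a flat sill beyond the range.
    """
    if h <= 0:
        return 0.0
    if h >= range_km:
        return sill
    ratio = h / range_km
    partial_sill = sill - nugget
    return nugget + partial_sill * (1.5 * ratio - 0.5 * ratio ** 3)

for h in (1, 3, 6, 10):
    print(f"gamma({h} km) = {spherical_model(h):.4f}")
```

The model rises from the nugget, reaches 0.52 at the 6 km range, and stays there for larger lags.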
Numerical Problems:
Given a dataset from a mineral field with ore grades measured at several
locations, let's compute the isotropic variogram.
Data:
• Ore grades at distances (in meters):
Z(0) = 1.5
Z(10) = 2
Z(20) = 2.5
Z(30) = 3
Z(40) = 2.8
Variogram Calculation:
With N(10) = 4 pairs separated by 10 m:
gamma(10) = 1/(2 * 4) * [(2 - 1.5)^2 + (2.5 - 2)^2 + (3 - 2.5)^2 + (2.8 - 3)^2]
gamma(10) = 1/8 * [0.25 + 0.25 + 0.25 + 0.04]
gamma(10) = 0.79 / 8 ≈ 0.099
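A minimal sketch of the empirical semivariance for this transect, assuming the standard estimator that divides the summed squared differences by 2N(h), where N(h) is the number of pairs at lag h. For a lag of 10 m there are N(10) = 4 pairs and the bracketed sum of squared differences is 0.79. The function name is illustrative.

```python
positions = [0, 10, 20, 30, 40]        # sample locations, metres
grades = [1.5, 2.0, 2.5, 3.0, 2.8]     # ore grades Z at those locations

def semivariance(positions, values, lag):
    """Return (gamma(lag), number of pairs) for pairs exactly `lag` apart."""
    squared_diffs = [
        (values[j] - values[i]) ** 2
        for i in range(len(positions))
        for j in range(i + 1, len(positions))
        if abs(positions[j] - positions[i]) == lag
    ]
    if not squared_diffs:
        raise ValueError(f"no pairs found at lag {lag}")
    n_pairs = len(squared_diffs)
    return sum(squared_diffs) / (2 * n_pairs), n_pairs

for lag in (10, 20):
    gamma, n_pairs = semivariance(positions, grades, lag)
    print(f"lag {lag} m: N(h) = {n_pairs}, gamma = {gamma:.4f}")
```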
Consider a scenario where we have measurements of a spatial variable Z(x) (e.g., soil
contamination levels) at three locations:
• Z(x1)=10
• Z(x2)=15
• Z(x3)=12
Let’s assume we’ve already computed the variogram and found that the following
relationships hold for distances between points:
1. Mathematical Example 2
Step 1: Collect Data
Assume we have a dataset of sample points with their corresponding values
(e.g., pollutant concentrations, mineral grades, etc.):
Where:
• N(h) is the number of pairs of points separated by a distance h
• Z(xi) and Z(xj) are the values at locations xi and xj.
For simplicity, let’s assume we calculate the variogram for h = 1 (the distance
between points).
γ(1) = 1/(2⋅3) × (25 + 25 + 25) = 75/6 = 12.5
Step 3: Kriging Estimation
To estimate the value at a new location (2,2) we can use the kriging formula:
For simplicity, assume we have determined the weights λ1 = 0.3, λ2 = 0.5 and λ3 = 0.2:
Z(2,2) = (0.3 × 10) + (0.5 × 15) + (0.2 × 20)
Calculating this:
Z(2,2)=3+7.5+4=14.5
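The kriging estimate is a weighted sum, so it is easy to verify in code. This sketch takes the weights and values from the example as given and also checks that the weights sum to 1, which is the unbiasedness condition of ordinary kriging.

```python
weights = [0.3, 0.5, 0.2]   # kriging weights from the example
values = [10, 15, 20]       # observed values at the three sample points

# Ordinary kriging weights must sum to 1 for an unbiased estimate
assert abs(sum(weights) - 1.0) < 1e-9

estimate = sum(w * z for w, z in zip(weights, values))
print(f"Z(2,2) = {estimate:.1f}")  # Z(2,2) = 14.5
```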
## Ordinary Kriging
## Co-Kriging
2 Cokriging Equations
The cokriging estimation at an unsampled location x0 is a weighted
linear combination of both the primary and secondary variables
from sampled locations:
$$Z_1^*(x_0) = \sum_{i=1}^{N} \lambda_i Z_1(x_i) + \sum_{j=1}^{M} \mu_j Z_2(x_j)$$
Where:
• Z1*(x0) is the estimated value of the primary variable at x0.
• λi are the cokriging weights for the primary variable, and µj are the cokriging weights
for the secondary variable.
• Z1(xi) and Z2(xj) are the observed values of the primary and secondary variables at
sampled locations.
Distances:
• d(x0, x1) = 1 km
• d(x0, x2) = 2 km
• d(x1, x2) = 1 km
Variogram for Z1 (primary variable):
γ11(x0, x1) = 0.5 × 1 = 0.5, γ11(x0, x2) = 0.5 × 2 = 1.0, γ11(x1, x2) = 0.5 × 1 = 0.5
Variogram for Z2 (Elevation):
γ22(x0, x1) = 0.4 × 1 = 0.4, γ22(x0, x2) = 0.4 × 2 = 0.8, γ22(x1, x2) = 0.4 × 1 = 0.4
Cross-variogram between Z1 and Z2:
γ12(x0, x1) = 0.3 × 1 = 0.3, γ12(x0, x2) = 0.3 × 2 = 0.6, γ12(x1, x2) = 0.3 × 1 = 0.3
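The values above all follow a linear model of the form γ(h) = slope × h, with slopes of 0.5, 0.4, and 0.3 read off the worked numbers. A small sketch that rebuilds the three tables from the distances (the dictionary layout is illustrative):

```python
# Distances between locations, in km
distances = {("x0", "x1"): 1.0, ("x0", "x2"): 2.0, ("x1", "x2"): 1.0}

# Slopes of the linear (cross-)variogram models, taken from the worked values
slopes = {"gamma11": 0.5, "gamma22": 0.4, "gamma12": 0.3}

# gamma(h) = slope * h for every model and every pair of locations
variograms = {
    name: {pair: slope * d for pair, d in distances.items()}
    for name, slope in slopes.items()
}

for name, table in variograms.items():
    for (a, b), value in table.items():
        print(f"{name}({a}, {b}) = {value:.1f}")
```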
• Weight Calculation: The weights λ1, λ2, µ1, µ2 are optimized to minimize the
estimation error. The correct determination of these weights is critical for accurate
cokriging predictions.
4 Applications of Cokriging
Cokriging is widely used in fields where spatial data is
available and multiple correlated variables can improve
predictions. Some common applications include:
• Mining and Resource Estimation: In mining, cokriging is used to estimate the
concentration of minerals in unsampled locations by considering both the primary
variable (e.g., ore grade) and secondary variables (e.g., geophysical data). This helps
in optimizing resource extraction and planning.
• Environmental Science: Cokriging is employed to predict pollutant levels in the
environment. For example, air pollution levels (primary variable) can be estimated using
meteorological data like temperature, humidity, and wind speed (secondary variables),
which are easier to measure and highly correlated with pollution.
• Agriculture: It is used for predicting soil properties such as nutrient content
(primary variable) by incorporating other measurable factors such as moisture content or
elevation (secondary variables). This helps farmers in better land management and
crop yield predictions.
6 Limitations of Cokriging
Despite its advantages, cokriging has certain limitations:
• Complexity: The cokriging system involves more complex mathematical models and
requires variogram and cross-variogram modeling for both the primary and secondary
variables, making it computationally more intensive and harder to implement than
simple kriging.
• Requires Extensive Data: To build accurate cross-variograms, a significant
amount of data is required for both the primary and secondary variables. If
secondary data is scarce or weakly correlated with the primary variable, cokriging
may not offer significant advantages.
• Time-Consuming: The variogram modeling, system setup, and solution process for
cokriging are more time-consuming, which can be a drawback for large datasets or
when quick estimations are needed.
## Indicator Kriging
2. Theoretical Framework
Indicator Variables:
In Indicator Kriging, the continuous data values Z(x) at location x are transformed into indicator
variables I(x), based on a cutoff or threshold value z_c:
I(x) = 1 if Z(x) >= z_c, 0 if Z(x) < z_c
The indicator function transforms the spatial data into binary form, facilitating the calculation of
conditional probabilities at unsampled locations.
Variogram in Indicator Kriging:
The experimental variogram γ(h) is computed for the indicator data. It quantifies the spatial
autocorrelation between two locations separated by distance h. This variogram is key to determining
the kriging weights.
Assumptions and Limitations:
1. The model assumes stationarity of the indicator variable.
2. Kriging does not provide absolute certainty; it provides a probabilistic estimate of exceeding a
threshold.
I(x) = 1 if Z(x) ≥ 5 g/t
I(x) = 0 if Z(x) < 5 g/t
Location Gold Concentration (g/t) Indicator Value
A 6.0 1
B 4.5 0
C 7.2 1
D 2.8 0
E 10.5 1
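The indicator column of the table can be reproduced with a one-line transform. A minimal sketch using the 5 g/t cutoff from the example:

```python
# Gold concentrations (g/t) at the five sampled locations
gold = {"A": 6.0, "B": 4.5, "C": 7.2, "D": 2.8, "E": 10.5}
cutoff = 5.0  # threshold z_c in g/t

# I(x) = 1 if Z(x) >= z_c, otherwise 0
indicators = {loc: 1 if z >= cutoff else 0 for loc, z in gold.items()}
print(indicators)  # {'A': 1, 'B': 0, 'C': 1, 'D': 0, 'E': 1}
```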
• A−B=2km
• A−C=3km
• A−D=4km
• A−E=5km
2. Compute squared differences:
Pair    Indicator Difference (I1 − I2)    (I1 − I2)²
A-B     1 − 0 = 1                         1
A-C     1 − 1 = 0                         0
A-D     1 − 0 = 1                         1
A-E     1 − 1 = 0                         0
λ1⋅γ(h1x)+λ2⋅γ(h2x)+...+λn⋅γ(hnx)=γ(hx)
λ1⋅0.8+λ2⋅0.6+λ3⋅0.4+λ4⋅0.3=1
I(X) = λ1⋅IA + λ2⋅IB + λ3⋅IC + λ4⋅ID + λ5⋅IE
λ1 = 0.3, λ2 = 0.2, λ3 = 0.25, λ4 = 0.15, λ5 = 0.1
So, the probability that the gold concentration at location X exceeds 5 grams per ton is 65%.
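The 65% figure is the weighted sum of the indicator values at A through E. The sketch below reproduces it from the weights stated in the example; the final clipping to [0, 1] is an addition made here as a common safeguard, since kriged indicator estimates are not guaranteed to stay inside that interval.

```python
indicators = [1, 0, 1, 0, 1]              # I_A ... I_E at the 5 g/t cutoff
weights = [0.3, 0.2, 0.25, 0.15, 0.1]     # kriging weights from the example

# Weighted sum of indicators = estimated probability of exceeding the cutoff
raw_estimate = sum(w * i for w, i in zip(weights, indicators))

# Safeguard added in this sketch: keep the estimate a valid probability
probability = min(1.0, max(0.0, raw_estimate))
print(f"P(Z(X) >= 5 g/t) = {probability:.2f}")  # 0.65
```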
I(x)=1 if Z(x)≥100ppm
I(x)=0 if Z(x)<100ppm
We now solve the linear system of equations for the weights λ1,λ2,...,λ6.
Using a solver (e.g., matrix inversion or Gaussian elimination), we get the following
kriging weights:
λ1=0.4,λ2=0.2,λ3=0.3,λ4=0.05,λ5=0.05,λ6=0.1
Step 4: Final Estimation
The estimated indicator value at the unsampled location X is a weighted sum of the
indicator values at the known locations, using the kriging weights:
I(X)=λ1⋅IF+λ2⋅IG+λ3⋅IH+λ4⋅II+λ5⋅IJ+λ6⋅IK
## Block Kriging
Key Differences Between Block Kriging and Point Kriging
Block kriging is used in various fields to estimate the distribution of resources or other variables over
larger areas:
• Mining: In mining operations, block kriging is often used to estimate the average grade of
minerals over mining blocks, helping in the efficient extraction of resources.
• Environmental Science: Block kriging is applied to estimate average pollutant concentrations
over a defined region, aiding in environmental impact assessments and monitoring.
• Agriculture: It is used to estimate soil properties or crop yields over agricultural fields,
facilitating better decision-making for land management and crop planning.
By applying block kriging, researchers and industry professionals gain valuable insights into the spatial
distribution of variables across large areas, enabling more effective decision-making and resource
management.
2. Mathematical Formulation of Block Kriging
Variogram and Covariance Function
The variogram and covariance function are fundamental tools in kriging, as they describe the spatial
correlation between data points. The variogram is a function that quantifies how the difference
between values at two locations increases as the distance between them increases. The variogram is
defined as:
where Z(x) and Z(x+h) are values at two locations separated by a distance h. The covariance function,
on the other hand, describes how similar values at two locations are, based on the distance between
them. The covariance function is related to the variogram as:
where:
• Z^(x0) is the estimated value at location x0,
• Z(xi) are the known values at surrounding data points,
• λi are the kriging weights determined from the spatial correlation between data points.
To find the weights λi we solve the ordinary kriging system:
where C(hij) represents the covariance between data points, and C(hi0) represents the covariance
between a data point and the location of interest.
In block kriging, instead of predicting a value at a point, we aim to estimate the average value over a
block. The kriging system is modified by integrating the covariance function over the block to account
for the spatial relationships within the block. The block kriging estimator becomes:
where Z^(B) is the estimated average value over the block, and the weights λi are determined by
solving a system that incorporates the covariances between the block and the data points.
The block-to-point covariances account for the average spatial correlation between the block and the
surrounding data points.
Problem Description
Suppose we are monitoring the concentration of a pollutant in a river basin. The goal is to estimate the
average pollutant concentration in a specific 1 km² block. We have concentration measurements at
several nearby locations (given as coordinates in kilometers), and we want to apply block kriging to
estimate the average concentration over this block.
• Dataset: The concentration of the pollutant Z(x) is measured at the following locations (in
mg/L):
o Z(0,0)=10 mg/L
o Z(1,0)=12 mg/L
o Z(0,1)=8 mg/L
o Z(1,1)=11 mg/L
• Block: The block for which we want to estimate the average concentration is the square area
with vertices at (0.5,0.5) to (1.5,1.5)
• Variogram Model: For simplicity, let’s use a spherical variogram model:
Solution
We start by calculating the covariances between the data points based on the variogram model.
1. Covariance:
C(1)=1.5+0.5−1.625=0.375
Repeat this process to calculate all pairwise covariances C(hij) between the known data points.
Now we compute the covariances between the block and each data point. Since the block is centered
at (1,1), we calculate the average covariance over the block for each point. For simplicity, we'll
approximate the covariance between the block and the point at (0,0) by taking the covariance between
the block’s center (1,1) and the point.
For example, the distance between the center of the block (1,1) and the point at (0,0) is:
$$d = \sqrt{(1 - 0)^2 + (1 - 0)^2} = \sqrt{2} \approx 1.414 \text{ km}$$
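The block-center shortcut can be compared with a discretized block average. In the sketch below, the spherical covariance function and its parameters (sill of 1.0, range of 3.0 km) are hypothetical stand-ins, not the model from this example; only the block geometry and the data point (0,0) come from the problem.

```python
import math

def spherical_cov(h, sill=1.0, range_km=3.0):
    """Hypothetical spherical covariance; the parameters are assumptions."""
    if h >= range_km:
        return 0.0
    r = h / range_km
    return sill * (1 - 1.5 * r + 0.5 * r ** 3)

def block_point_cov(point, block_min, block_max, cov, n=10):
    """Average cov(point, u) over an n x n grid of cell centres in the block."""
    (x0, y0), (x1, y1) = block_min, block_max
    total = 0.0
    for i in range(n):
        for j in range(n):
            bx = x0 + (i + 0.5) * (x1 - x0) / n
            by = y0 + (j + 0.5) * (y1 - y0) / n
            total += cov(math.hypot(bx - point[0], by - point[1]))
    return total / (n * n)

# Shortcut: covariance between the block centre (1,1) and the point (0,0)
center_distance = math.hypot(1.0 - 0.0, 1.0 - 0.0)
center_cov = spherical_cov(center_distance)

# Discretized average over the block from (0.5,0.5) to (1.5,1.5)
averaged_cov = block_point_cov((0.0, 0.0), (0.5, 0.5), (1.5, 1.5), spherical_cov)

print(f"distance to centre = {center_distance:.3f} km")
print(f"centre approximation = {center_cov:.4f}")
print(f"block average        = {averaged_cov:.4f}")
```

For a block this small relative to the range, the two values are close, which is why the center approximation is acceptable in a hand calculation.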
Finally, use the kriging weights to estimate the average pollutant concentration in the block:
In this example, we want to estimate the average gold concentration in a 100 m x 100 m mining block
using block kriging. Gold concentrations are measured at several drill holes located near the mining
block. The dataset, block size, and variogram model are as follows:
Solution
Using the exponential variogram, calculate the covariances between the data points:
h=50 m
As in the first example, compute the covariances between the block and each point. For simplicity,
approximate the covariance using the distance between the block's center (25,25) and the data points.
For example, the distance between the center of the block and the point (0,0) is:
$$d = \sqrt{(25 - 0)^2 + (25 - 0)^2} = \sqrt{1250} \approx 35.36 \text{ m}$$
Substitute this distance into the variogram to find the covariance C(B,x1).
Using the calculated covariances, set up and solve the kriging system to find the weights λ1,λ2,λ3.
Use the kriging weights to estimate the average gold content in the mining block:
## Cholesky Decomposition
Introduction
Cholesky Decomposition is a mathematical technique used to
decompose a positive-definite matrix into a product of a lower triangular
matrix and its transpose. In geostatistics, this method is widely used
for solving large systems of equations, such as those encountered in
spatial interpolation methods like Kriging. The decomposition
simplifies matrix inversion and provides numerical stability, making it
an essential tool in spatial data modeling and prediction.
Relevance in Geostatistics
Geostatistics involves the study and modeling of spatially distributed
data, where covariance matrices play a crucial role in capturing the
relationships between data points across space. Cholesky
Decomposition is particularly valuable because these covariance
matrices are typically large and dense. This decomposition allows for
efficient computation, making it feasible to handle large datasets
common in geostatistical problems.
2. Properties of Cholesky Decomposition
• Symmetry Preservation: Since the covariance matrix is symmetric, Cholesky
Decomposition preserves this structure by decomposing it into L × L^T.
• Positive-Definiteness: The covariance matrix is positive-definite, which
guarantees that the diagonal entries of L are positive.
• Efficiency: The decomposition has a complexity of O(n^3), where n is the
size of the matrix. LU Decomposition is also O(n^3), but Cholesky needs
roughly half as many operations because it exploits symmetry.
• Numerical Stability: Cholesky Decomposition is numerically stable and
less prone to rounding errors, making it ideal for large geostatistical
models.
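The factor-then-substitute procedure can be sketched in pure Python. The 3 × 3 covariance matrix `C` and right-hand side `c0` below are hypothetical values chosen so that the factor has whole-number entries; they are not taken from the worked examples in these notes. Note that this solves a system with the covariance matrix alone; an ordinary kriging system augmented with a Lagrange multiplier row is not positive-definite.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L * L^T (a must be symmetric positive-definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("matrix is not positive-definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) w = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                      # forward: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):            # backward: L^T w = y
        w[i] = (y[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w

# Hypothetical covariance matrix and data-to-target covariances
C = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
c0 = [2.0, 3.0, 4.0]

L = cholesky(C)
w = solve_cholesky(L, c0)
print(L)  # [[2.0, 0.0, 0.0], [1.0, 2.0, 0.0], [1.0, 1.0, 2.0]]
print(w)  # [0.125, 0.25, 0.5]
```

Once L is available, each additional right-hand side costs only two triangular solves, which is the main practical benefit when many locations are estimated from the same data.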
The covariance between the new location and known locations is:
Step 2: Solve for the Kriging Weights w
To estimate the ore concentration at the new location, we solve for the
Kriging weights by solving the system:
C × w = c^T
Using forward substitution and then back-substitution, we solve for
w1,w2,w3. This gives us the Kriging weights:
The covariance between the new location and the observed locations is:
C=L× L^T
After performing the decomposition, we get:
Step 2: Solve for the Kriging Weights w
Using the Cholesky factor L, solve the system:
Thus, the estimated contamination level at the new location is 37.1 mg/kg.
Applications of geostatistics:
Geostatistics is a maturing field that has been applied in many domains, including
soil science, hydrology, geology, zoology, agriculture, ecology, forestry,
computer science, mechanical engineering, medicine, and environmental engineering and
management.
1) Mining Industry:
● Application:
Geostatistics is used for ore reserve estimation and mineral deposit evaluation. It
predicts the concentration of minerals at different points using sparse data from
exploratory drilling.
● Applied:
○ Techniques like kriging and variogram analysis help in creating spatial models
of ore bodies.
○ These models estimate mineral grades and determine the most promising
locations for drilling.
● Benefits:
○ Reduces the number of necessary drill holes, cutting exploration costs.
○ Provides more accurate estimates of mineral resources, improving financial
forecasting for mining operations.
● Used in
Geostatistics was used in Indian coal mines to predict ore distribution, reducing
unnecessary excavation.
India aims to increase its coal production to 1,200 million metric tons (1,300 million
short tons) by 2023–24.
2) Environmental Science:
● Application:
Used for environmental monitoring, pollution assessment, and analyzing spatial
patterns of environmental variables such as air pollution, temperature, and soil
quality.
● Applied:
○ Geostatistical methods like interpolation create pollution maps that show
concentration levels of pollutants across different regions.
○ Models predict changes in air quality or temperature over time.
● Benefits:
○ Identifies pollution hotspots, aiding in policy and regulatory actions.
○ Helps in long-term environmental monitoring for climate change studies.
● Used in
Geostatistical models have been used to track air quality in large cities like Beijing, helping
in pollution control strategies.
3) Hydrogeology:
● Application:
Geostatistics is used in Hydrogeology to model groundwater systems, assess aquifer
properties, and predict groundwater flow patterns. These methods are essential in
water resource management, pollution control, and environmental studies.
● Applied:
○ Models help track the spread of contaminants in groundwater, predicting
areas at risk for water pollution.
○ Geostatistics supports the prediction of groundwater recharge zones and
discharge areas, improving water resource management.
● Benefits:
○ Helps in the management of water supplies by identifying areas where water
can be extracted without causing further depletion.
○ Ensures proper management of groundwater resources.
● Used in
In California’s Central Valley, geostatistics has been used to predict groundwater recharge
rates and manage water use during drought conditions, ensuring water availability for
agricultural and urban needs.
4) Hydrology:
● Application:
Used for groundwater modeling, flood prediction, and analyzing the spatial
distribution of hydrological variables.
● Applied:
○ Spatial models of rainfall, runoff, and groundwater levels help predict water
resource availability and flood risks.
○ Geostatistics assists in managing water resources, especially in arid regions.
● Benefits:
○ Helps in efficient water resource management, flood control, and drought
mitigation.
○ Supports sustainable urban planning by predicting water demands.
● Used in
Geostatistics was applied in the Nile Basin to model groundwater recharge and predict
flood risks.
5) Meteorology:
● Application:
Study of atmospheric phenomena, including weather forecasting and climate
analysis.
● Applied:
○ Meteorological models use satellite data and ground observations to predict
weather patterns.
○ Advanced forecasting techniques assess the impact of atmospheric conditions
on various sectors.
● Benefits:
○ Informs disaster response efforts and enhances public safety during severe
weather events.
○ Supports agricultural planning by providing forecasts for optimal planting and
harvesting times.
● Used in
The National Hurricane Center uses meteorological models to track and predict hurricanes,
providing crucial information for coastal communities.
6) Oceanography:
● Application:
In oceanography, geostatistical methods are applied to study ocean currents,
temperatures, and salinity levels. This helps in understanding marine ecosystems,
predicting climate impacts, and managing marine resources sustainably.
● Applied:
○ Oceanographic models analyze ocean currents, temperature, salinity, and
biological productivity.
○ Used in climate change studies, marine resource management, and pollution
tracking.
● Benefits:
○ Supports sustainable fishing practices and marine conservation efforts.
○ Enhances understanding of climate systems through ocean-atmosphere
interactions.
● Used in
Research on the India Stream's variability has helped understand its role in regulating
climate patterns across the Indian Ocean.
7) Geochemistry
● Application:
Geostatistics is crucial in geochemical analysis to assess the distribution of chemical
elements and contaminants in soils and waters. This application aids in
environmental monitoring, resource exploration, and remediation strategies.
● Applied:
○ Geochemical analysis identifies contaminants in soils, water, and sediments.
○ Used in mineral exploration and environmental remediation efforts.
● Benefits:
○ Helps in assessing the health of ecosystems and the impact of human
activities.
○ Supports the identification of mineral resources for sustainable extraction.
● Used in
Geochemical surveys in mining regions have been crucial for assessing environmental
impacts and guiding remediation efforts.
8) Geography
● Application:
Geostatistics supports geographic analyses by mapping and interpreting spatial data.
It enhances understanding of spatial relationships, informing urban planning,
resource management, and policy development based on geographic patterns.
● Applied:
○ Geographic Information Systems (GIS) analyze spatial data for urban planning,
environmental management, and disaster response.
○ Utilizes mapping techniques to visualize demographic and environmental
data.
● Benefits:
○ Informs land use planning and resource allocation.
○ Enhances understanding of human-environment interactions.
● Used in
GIS has been used in urban planning to identify suitable locations for infrastructure
development while minimizing environmental impacts.
9) Soil Sciences:
● Application:
Geostatistics is used in soil science to analyze spatial variability in soil properties,
aiding in precision agriculture and land management practices. It informs
assessments of soil fertility, contamination, and erosion risks.
● Applied:
○ Soil analysis assesses fertility, contamination, and erosion.
○ Techniques like soil mapping and profile analysis inform agricultural practices
and land use.
● Benefits:
○ Enhances agricultural productivity through informed soil management
practices.
○ Supports environmental conservation efforts by understanding soil health.
● Used in
Soil health assessments guide conservation practices in agricultural regions, promoting
sustainable farming.
10) Forestry:
● Application:
In forestry, geostatistics aids in the assessment of forest health and biomass
distribution. It supports sustainable management practices by analyzing spatial
patterns of tree species and their ecological impacts.
● Applied:
○ Forest inventories assess tree species, health, and biomass.
○ Remote sensing techniques monitor deforestation and forest cover changes.
● Benefits:
○ Supports sustainable timber production and conservation of forest resources.
○ Enhances biodiversity through habitat management practices.
● Used in
Monitoring forest cover changes in the Amazon rainforest helps inform conservation
strategies and combat deforestation.
11) Landscape Ecology
● Application:
Geostatistics helps analyze spatial patterns in landscapes, assessing habitat
fragmentation and connectivity. This application informs conservation efforts and
land use planning by understanding ecological dynamics and their implications for
biodiversity.
● Applied:
○ Landscape models assess habitat fragmentation and connectivity.
○ Used in conservation planning to maintain biodiversity and ecosystem
functions.
● Benefits:
○ Informs land use planning to minimize ecological impacts.
○ Supports habitat restoration efforts by identifying critical areas for
conservation.
● Used in
Landscape ecological studies have guided the restoration of degraded habitats in urban
areas, improving biodiversity.