Overarching Concepts
Generalization
Bias-Variance Tradeoff
The bias-variance tradeoff of a predictive model indicates that lower bias can lead to higher variance and vice versa. The tradeoff relates bias to underfitting and variance to overfitting of the data. A bias is introduced if the proportions of positive and negative examples in the training set do not represent the real-world data distribution; the result is a concept shift between the training set and the test set.
During model training, the best overall accuracy is achieved near the point where the bias and variance curves cross.
Model is too simple: it does not fit the data well (biased solution).
Model is too complex: a small change in the data results in a big change in the solution (high-variance solution). Independent data for validation and testing is required!
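A minimal sketch of this tradeoff (all data synthetic and hypothetical): polynomials of increasing degree are fitted to noisy samples of a sine wave; a low degree underfits (high bias), a high degree overfits (high variance).

```python
# Bias-variance sketch: compare train/test error across model complexity.
import numpy as np

rng = np.random.default_rng(0)
x_tr = rng.uniform(0, 1, 30)
y_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0, 0.2, 30)
x_te = rng.uniform(0, 1, 200)
y_te = np.sin(2 * np.pi * x_te) + rng.normal(0, 0.2, 200)

for degree in (1, 4, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)               # least-squares fit
    tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    # degree 1: biased (underfit); degree 15: high variance (overfit)
    print(f"degree {degree:2d}: train MSE {tr:.3f}, test MSE {te:.3f}")
```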
Transfer-Learning
Transfer learning utilizes knowledge from previously learned tasks and applies it to newer, related ones.
Learning a new task relies on previously learned tasks:
• The learning process can be faster, more accurate and/or need less training data
A domain is defined as a pair D = {X, P(X)}, which consists of a feature space X and a marginal distribution P(X) over the feature space.
A task is defined as a pair T = {Y, P(Y|X)}, which consists of a label space Y and a conditional distribution P(Y|X).
Given
• a source domain D_S and learning task T_S,
• a target domain D_T and learning task T_T,
transfer learning aims to improve the learning of the target predictive function f_T(·) using the knowledge in D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T.
[f_T(·) is not observed but can be learned from the training data; it is used to predict the corresponding label f_T(x) of a new instance x.]
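A hedged fine-tuning sketch (assuming PyTorch/torchvision and a two-class target task; not part of the lecture material): knowledge from the source task is kept in a frozen pretrained backbone, and only a new head is trained on the target data.

```python
# Transfer learning by reusing an ImageNet-pretrained backbone.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # knowledge from source task
for p in model.parameters():
    p.requires_grad = False                       # freeze source knowledge
model.fc = nn.Linear(model.fc.in_features, 2)     # new head for target task T_T
# Only model.fc.parameters() are trained on the (smaller) target data set.
```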
Example Transfer-Learning - Covid severity detection
Model for detection of general diseases based on X-ray pictures
Model for detection of Covid severity based on X-ray pictures
Categories of Transfer-Learning, based on the relationships between domains and/or tasks:
1. Inductive TL requires some labeled data. While the two domains may or may not differ (D_S ∼ D_T or D_S ≠ D_T), the target and source tasks are different (T_S ≠ T_T), e.g., 3D organ reconstruction across multiple anatomies.
2. Transductive TL (domain adaptation) requires labeled source data and unlabeled target data with related domains (D_S ∼ D_T) and the same task (T_S = T_T), while the marginal probability distributions differ (P(X_S) ≠ P(X_T)), e.g., lung tumor detection across X-ray and computed tomography images.
3. Unsupervised TL does not require labeled data in either domain and has different tasks (T_S ≠ T_T), e.g., classifying cancer for different anatomies using unlabeled histology images.
"The goal of domain adaptation is to adapt the model learned on the training data
to the test data of a different distribution."
"Such a distributional gap is often formulated as a shift between discrete concepts
of well defined data domains." 這種分佈差距通常被表述為定義明確的數據域的離散概念之間的轉變
Open Compound Domain Adaptation (OCDA) is a continuous and more realistic setting for domain adaptation. The task is to learn a model from labeled source-domain data and adapt it to unlabeled compound target-domain data, which can differ from the source domain in various factors.
Specific examples for Domain Adaptation
Self-driving car in all weather conditions: images collected in sunny weather versus those in rainy weather.
Number detection: this approach extracts and differentiates domain-focused factors and class-discriminative factors to become robust against domain changes. It separates characteristics specific to domains from those discriminative between classes, which is achieved by a class-confusion algorithm in an unsupervised manner.
Sim-to-Real Gap
Problem: Many learning techniques, e.g. deep learning and (pure) reinforcement
learning, are data-hungry... but data can be expensive!
Solution (among others): train on simulated data!
Advantages of simulated data:
• Cheap, fast and scalable
• Safe and already labeled
• Not limited to real-world probability distributions
Disadvantages of simulated data:
• It's hard to accurately and efficiently model sensors and physical systems
• Small modeling errors can lead to large control errors
Example: Sim-to-real reinforcement learning in robotics
Combining Machine Learning and Simulation
to a Hybrid Modelling Approach
The integration of machine learning techniques into simulation, often for a specific application such as car crash simulation, fluid simulation, or molecular simulation.
A typical motivation is to identify surrogate models, which offer an approximate but cheaper-to-evaluate model to replace the full simulation.
The integration of simulation into machine learning as an additional source for training
data, for example in autonomous driving, thermodynamics, or biomedicine.
A typical motivation is the augmentation of data for scenarios that are not sufficiently represented in the available data.
Physics-informed Machine Learning
Problem: Purely data-driven methods are often only accurate in regimes/cases/
situations that are covered by the training data
Approach: Incorporate physical domain knowledge into the training process or into
the model
Example: Single Mass Oscillator
Given some experimental data points that come from an unknown physical phenomenon (e.g., the orange points in the slide figure), the goal is to find a model which can accurately predict new experimental measurements.
While the plain neural network accurately models the physical process within the vicinity of the experimental data, it fails to generalize away from this training data.
The physics-informed neural network can predict the solution far away from the experimental data points, and thus performs much better than the naive network.
Example in Fluid Mechanics
Modelling incompressible laminar flows at low Reynolds numbers; comparison of a PINN to a common PDE solver.
No measurement data is used in this example.
A surrogate model is an engineering method used when an outcome of interest
cannot be easily measured or computed, so an approximate mathematical model of the
outcome is used instead.
Semi-supervised Learning (a.k.a. Weak Supervision)
Problem: Only a small amount of labeled data is available but
a lot of unlabeled data (e.g. medical images)
Core idea: Increase the available labeled data for training and
decrease the cost of human experts annotating the data
Assumptions:
Continuity / smoothness assumption: Points that are close to
each other are more likely to share a label.
Cluster assumption: The data tend to form discrete clusters,
and points in the same cluster are more likely to share a label.
Manifold assumption: The data lie approximately on a
manifold of much lower dimension than the input space.
Semi-supervised learning may refer to either transductive learning or inductive learning.
Transductive learning is to infer the correct labels for the given unlabeled data {x} only.
Inductive learning is to infer the correct mapping from X to Y.
A typical semi-supervised learning approach has three main steps:
(1) train a teacher model on labeled images,
(2) use the teacher to generate pseudo labels on unlabeled images,
(3) train a student model on the combination of labeled images and pseudo labeled images.
Example: Self-training with Noisy Student
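A minimal sketch of these three steps using scikit-learn on generic arrays (the model choice is an assumption; the noise/augmentation that Noisy Student adds to the student is omitted, as noted in the comment):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def teacher_student_round(X_lab, y_lab, X_unlab):
    # (1) train a teacher model on the labeled data
    teacher = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)
    # (2) use the teacher to generate pseudo labels on unlabeled data
    pseudo = teacher.predict(X_unlab)
    # (3) train a student on labeled + pseudo-labeled data combined
    # (Noisy Student additionally injects noise/augmentation here.)
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, pseudo])
    return RandomForestClassifier(random_state=1).fit(X_all, y_all)
```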
Learning Strategies
Active Learning: Can the model ask for help during training?
The goal is to minimize the number of labeled instances and the difference between the distribution of the training set and the real-world data.
The classifier can actively choose its training data while the size of the training set increases.
Setup: Given existing knowledge, want to choose where to collect more data
Access to cheap unlabeled points
Make a query to obtain expensive label
Want to find labels that are “informative”
Output: Classifier / predictor trained on less labeled data
Similar to "active learning" in classrooms: students ask questions, receive a response, and ask further questions, versus passive learning, where the student just listens to the lecturer.
Which unlabeled point should you choose? (e.g., to obtain one more labeled MRI image from a human expert)
Query-by-committee: queries an example based on the degree of disagreement between a committee of classifiers.
A point on the max-margin hyperplane does not reduce the number of valid hypotheses by much.
Example for Active Learning - Gene Expression and Cancer classification
Data: cancerous lung tissue samples
"Cheap" unlabeled data: gene expression profiles from Affymetrix microarrays, which can be represented as heat maps
Labeled data: 0-1 label for cancerous vs. normal samples
• Method:
Linear SVM for classifying cancerous vs. normal samples based on their
gene expression profiles
Measure of uncertainty: distance to SVM hyperplane
Use active learning with m being the number of examples selected to be
labeled in each iteration
In the ideal case, the learner correctly identifies all the cancerous samples (positives) using the minimum number of labeled training samples.
Active learning outperformed passive learning
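A hedged sketch of this pool-based uncertainty sampling (array names, loop sizes, and the assumption that the initial labeled set contains both classes are placeholders, not details from the study):

```python
import numpy as np
from sklearn.svm import LinearSVC

def active_learning_loop(X_pool, y_oracle, n_init=10, n_queries=20):
    labeled = list(range(n_init))                  # initial labeled set
    pool = list(range(n_init, len(X_pool)))        # "cheap" unlabeled pool
    for _ in range(n_queries):
        clf = LinearSVC().fit(X_pool[labeled], y_oracle[labeled])
        # uncertainty = |signed distance to the SVM hyperplane|
        margins = np.abs(clf.decision_function(X_pool[pool]))
        query = pool.pop(int(np.argmin(margins)))  # most uncertain sample
        labeled.append(query)                      # expensive expert label
    return LinearSVC().fit(X_pool[labeled], y_oracle[labeled])
```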
Incremental Learning
Incremental learning is a method in which input data is continuously used to extend the existing model's knowledge, i.e., to further train the model.
It represents a dynamic technique of supervised or unsupervised learning that can be applied when training data becomes available gradually over time or when its size exceeds the system memory limits.
Many traditional machine learning algorithms inherently support incremental learning
The aim of incremental learning is for the
learning model to adapt to new data
without forgetting its existing knowledge.
Online-Learning
Online ML adaptively learns from data points in real-time, providing timely and accurate predictions in data-rich environments. The model incrementally learns from a stream of data points in real-time; it is a dynamic process that adapts its predictive algorithm over time, allowing the model to change as new data arrives.
In online learning you train the system incrementally by feeding it data instances
sequentially, either individually or in small groups called mini-batches
Online learning processes data in real-time and continuously updates its model, while incremental learning processes chunks of data at scheduled intervals.
Each learning step is fast and cheap, so the system can learn about new data on
the fly, as it arrives
Great for systems that receive data as a continuous flow (e.g., stock prices) and
need to adapt to change rapidly or autonomously
[Figure: batch (offline) learning vs. online learning]
Example for Online-Learning: Driver Preference Learning
Online-Learning of driving behavior with Recursive Gaussian Process
Modeled driver preferences is used for further tuning of autonomous driving function
Based on vehicle sensors, the maneuvers are recognized, and the model is updated
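A minimal online-learning sketch on a synthetic stream (model and data are illustrative assumptions): the linear model is updated instance by instance with partial_fit instead of being retrained from scratch.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor(learning_rate="constant", eta0=0.01)
rng = np.random.default_rng(0)
for t in range(1000):                       # simulated continuous data stream
    X = rng.normal(size=(1, 3))             # one new instance per step
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 1)
    model.partial_fit(X, y)                 # incremental update, no retraining
print(model.coef_)                          # approaches [1, -2, 0.5]
```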
Responsible AI
Explainability
TRUST: Question AI decisions and illuminate the black box!
• When fairness is critical — right to an explanation (cf. GDPR)
• When consequences are severe — the cost of mistakes is high
Both are very relevant in health care (e.g. recommend surgery, classify tumors, ...)
ACTION ADVICE: Understand which input to change for obtaining a desired output change
DEBUG: Understand how to change the model when things go (seemingly) wrong
• Small perturbations lead to false image classification
• When new hypotheses are drawn — an example: "Pneumonia patients with asthma had lower risk of dying" (Caruana et al. 2015)
Two complementary goals: generating good explanations for accurate black-box models, and making inherently explainable models more accurate.
Global vs Local Explanations: Do they explain the model behavior on the
entire data set or only a small subset near a single data point?
• Global interpretability methods explain the entire ML model at once from input
to prediction, for example, decision trees and linear regression.
• Local interpretability methods explain how predictions change when the input changes and are applicable to a single prediction or a group of predictions.
Inherent versus Post-hoc: Does the model yield explanations directly or is
subsequent analysis required? (aka interpretable versus explainable)
• Intrinsically interpretable models are models that are interpretable by design,
and no postprocessing steps are needed to achieve interpretability.
• In post-hoc methods, explainability is achieved after the model is trained; it requires postprocessing using external methods to achieve interpretability.
Model-based versus Agnostic Methods for post-hoc explanations: Can
explanations be obtained only for a specific type of model or for any type?
• Model-specific techniques can be used for a specific architecture and require
training the model using a dataset.
• Model-agnostic methods can be used across many black-box models without considering their inner processing or internal representations and do not require training the model.
Example of Explainable AI: Predictions for the Prevention of Hypoxaemia during Surgery
Reliability & Resilience
Key factors for AI reliability and resilience:
High data quality and diversity; privacy and security to protect data integrity and confidentiality
Robust algorithms that can handle variations, outliers and unexpected input
Incorporate redundancy and failover mechanisms
Adaptability and continual learning in changing environments
Testing and validation: unit testing, integration testing, stress testing
Feedback loops by the user and system performance
Human oversight and intervention as additional safety net
Interpretability and explainability for diagnosis and debugging
Ethical considerations for addressing bias and fairness
Regulatory compliance with laws
Safety & Security
Poisoning or backdooring attack:
Injection of one or more manipulated data items into the training set
Training and test data still produce desired results -> hard to detect
Adversarial attack:
No white-box access to the victim model is needed!
A surrogate model is trained using a surrogate data set. Labels for this data set might optionally be obtained via queries to the victim model.
The trained surrogate model is used to generate adversarial input examples for attacking the victim model.
Key factors for AI safety and security:
Vulnerability Assessment: identify and mitigate vulnerabilities, security audits and testing
Incident Response: plan for security incidents, swift detection and containment
Data Security and Privacy: protect sensitive data, comply with regulations
Ethical Considerations: address bias and ethics, avoid harmful use
Minimize vulnerability of the AI software
Model Security: secure Al models, use encryption and secure deployment
Regular Updates and Patch Management: keep software up-to-date, apply security
patches
Third-party Security: assess third-party components, verify security standards
Regulatory Compliance: follow industry regulations, demonstrate compliance
Assure a safe and secure environment
Access Control: Manage access, strong authentication
Secure APIs and Interfaces: ensure secure communication, input validation
Auditing and Logging: monitor activities, audit logs for anomalies
Training and Awareness: educate personnel, raise security awareness
Ethics in AI
Fairness and Bias: Prevent discrimination and bias.
Bias Detection and Mitigation: Identify and address biases.
Ethical Data Use: Handle data ethically and with consent.
Transparency and Accountability: Make AI decisions transparent. Assign clear accountability.
Human Oversight: Ensure human control and intervention.
Privacy and Security: Safeguard user data and privacy.
Beneficence and Non-maleficence: Maximize benefits, minimize harm.
Societal Impact Assessment: Assess AI's societal effects.
Global Considerations: Respect cultural differences. Avoid global harm.
L06-Simulation
In simulation, experiments or training runs are performed on a model in order to gain insights about the real system.
Micro-level, e.g. Finite element analysis , Electromagnetic simulation
Product-level, e.g. Multi-body-simulation, Electrical and control simulation
Process-, environment- and network-level, e.g.
Production and logistics process simulation, Traffic flow simulation
Water/Energy grid simulation, Weather simulation
How do real world and simulation interact?
Disadvantages of regular physics-based simulation
• Computationally expensive,
• Laborious to derive/model,
• Limited flexibility,
• Time-consuming,
• Lacking uncertainty quantification
How can AI improve simulation?
Speed and efficiency
• Reduced order modeling
• Automatic data preprocessing
• Supporting user for repetitive tasks
• Novel computational methods
Accuracy and reliability
• Uncertainty quantification
• Flexible and optimized modelling
Design exploration and optimization
• Automatic (hyper-)parameter tuning.
• Synthetic data generation
Data analysis
• Finding correlations and patterns in large simulation datasets
Real-time applications
• Closed-loop application
• Adaptivity to changing conditions
Pitfalls of using AI in simulation
Data issues
• Ensuring quantity & quality (esp. no ethical bias!)
• Privacy & security
Validity issues
• Generalization, explainability & lack of domain knowledge
• Overfitting/underfitting issues
Resource issues
• Complexity during evaluation
• Cost of development & training
Legal and ethical issues
Opportunities of using AI in simulation
• Faster simulations & reduced-order modeling
• Optimized design exploration
• Enhanced simulation accuracy & adaptability
• Uncertainty quantification
• Data-driven insight generation
• Automated routine tasks and user support
AI-driven models for simulation: hybrid or stand-alone
• Accuracy and flexible modeling
• Speed and efficiency, reduced-order modeling
Example: Modified Gaussian Process Regression Models for Cyclic Capacity Prediction of Lithium-Ion Batteries
Problem: electric-vehicle battery capacity changes due to aging effects and is difficult to predict
Objective: accurate capacity prediction with quantified uncertainty
Approach: Gaussian process regression with prior physics knowledge and relevant features
Procedure:
1. Collect training data (capacity, temperature, discharge level)
2. Determine the GPR model f(x) ~ GP(m(x), k(x, x')) (inputs, outputs, kernel function)
3. Training and hyperparameter tuning
4. Evaluation
Example: Spatial modelling of topsoil properties using
geostatistical methods and machine learning
Problem: geological properties vary spatially, and precise treatment (e.g. precision farming) can increase yield, reduce cost, and prevent ecological risks.
Objective: obtain geodetic insights with few samples
Approach: use kriging to build a geostatistical simulation (stand-alone AI-driven model)
Also known as Gaussian process regression, kriging predicts the value of a function at a given point by computing a weighted average of the known values of the function in the neighborhood of the point.
Example: A deep learning approach to estimate stress distribution: a fast and
accurate surrogate of finite-element analysis (FEA)
Problem: knowing stress distributions in tissues enables new treatment strategies, but patient-specific FEA is complex and time-consuming
Objective: simple, accurate and fast stress prediction in human tissues
Approach: 1. deep neural network trained with FEA simulation data (stand-alone AI-driven approach); 2. use in time-sensitive clinical applications
AI-driven solvers & simulation algorithms
Speed and efficiency
• Automatic data preprocessing
• Solver improvement
• Novel numerical methods
Using Al for solving partial differential equations (PDE)
Constraints and initial conditions determine the solution for the specific scenario
Traditional solvers:
Discretize the problem (e.g. spatial Finite element grid)
Result: problem simplifies to a set of coupled ODEs
The remaining temporal problem can be solved by time integration
Data-driven solvers: approximate PDE using deep NNs
Using Al-augmented simulation for design optimization
Speed and efficiency -- Reduced-order modeling
Design exploration and optimization -- Automatic design parameter tuning
Shape Optimization of a Pin Fin Heat Sink
Objective: reduction of pressure drop and thermal resistance
Design optimization by genetic algorithm (GA) and computational fluid dynamics (CFD)
Traditional approach: slow, computationally costly
Surrogate-assisted design optimization
1. Sampling (Latin hypercube)
2. Run CFD-Simulations in parallel
3. Train Surrogate Model (e.g. GPR, Deep NN, ...)
4. Fast optimization loop
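A hedged sketch of these four steps, with a cheap analytic function standing in for the expensive CFD simulation (pure assumption):

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor

def cfd_sim(x):                      # placeholder for a costly CFD run
    return np.sum((x - 0.3) ** 2, axis=-1)

sampler = qmc.LatinHypercube(d=2, seed=0)
X = sampler.random(20)               # 1. Latin hypercube sampling
y = cfd_sim(X)                       # 2. run "simulations" (in parallel)
surrogate = GaussianProcessRegressor().fit(X, y)   # 3. train surrogate model
cand = qmc.LatinHypercube(d=2, seed=1).random(10000)
best = cand[np.argmin(surrogate.predict(cand))]    # 4. fast optimization loop
print("surrogate optimum near", best)
```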
A Machine Learning Approach For The Prediction of Time-Averaged
Unsteady Flows in Turbomachinery
Objective: Accurate prediction of unsteady flows between turbomachinery rotor blades
Simplified simulation (steady-flow assumption):
+ computationally efficient, - inaccurate
High-fidelity CFD simulation:
+ very accurate, - computational effort (~75 min for the data set of the study)
Neural network predictions (graph convolutional neural network):
+ very accurate, + computationally efficient (~5 s for the data set of the study)
Machine Learning for Alloy Composition Optimization
Objective: Find alloy compositions with desired characteristics (mechanical,
corrosion, surface, electric, magnetic)
Approach: Use machine learning methods to compose alloys virtually and predict
properties
Physics-informed neural networks for wind farm design
Using AI-augmented simulation for real-time applications
Speed and efficiency
• Reduced order modeling
Real-time applications
• Closed loop application
• Adaptivity to changing conditions
Estimation in mechatronic systems using data-based models
Objective: accurate estimation of friction coefficient (and system states)
Approach: learn the relationship between sensor data and friction coefficient using a neural network offline.
Evaluate a first NN for excitation monitoring and a second NN for friction estimation.
Model-based estimator:
+ physical interpretability
- accuracy is limited by the model accuracy
- identification of physical model parameters required
- tuning of the estimation algorithm required
Data-based estimator
+ no physical model needed
+ maximum information can be used
- no physical interpretability
- data for training and testing is required
Robot control using data-based models
Objective: reduction of control error
Approach: Feedforward control based
on online-learned GPs
For every time-step:
1. Measure new input-output data
2. Add the data point to the training data base
3. Update the GP
4. Optimize the GP hyperparameters
5. Predict mean and variance
6. Use the mean for feedforward control
Online learning of physical properties: learning in-operation, based on real data, adaptive to changes, but not physically interpretable.
Summary
Simulation
• Is important in various areas and at various detail levels,
• Is used for product & process optimization and risk assessment,
• Can be combined with AI.
AI can be used
• Inside high-fidelity analytical simulation,
• In combination with high-fidelity analytical simulation,
• As stand-alone simulation.
L07-Optimization
classical optimization
Minimize Objective function over (many) decision variables
Constraints (equalities/inequalities) need to be satisfied
Objective function often expensive to evaluate
How can AI improve optimization?
Complex problem solving
• Constraint handling
• Multi-objective optimization
• Automated problem formulation
Efficiency and speed
• Surrogate modeling & approximation
• Parallel and distributed computing
• Algorithm selection
Global solution search
• Enhanced search strategies
• Hyperparameter tuning
Uncertainty handling
Real-time and dynamic optimization
Rough categorization of optimization methods:
Non-AI methods, e.g.
• Gradient-based (e.g., stochastic gradient descent (SGD))
• Linear & quadratic programming
AI-based methods, e.g.
• Evolutionary algorithms
• Swarm intelligence algorithms (e.g. particle swarm optimization)
ML-based methods, e.g.
• Surrogate-assisted SGD
• Reinforcement learning for optimization
• Bayesian optimization
ML to approximate optimization results
Non-AI-Based Optimization Methods:
(Stochastic) Gradient Descent, Conjugate Gradient Method, Newton's Method, Quasi-Newton Methods (e.g., BFGS), Simplex Algorithm, Interior Point Methods, Linear Programming, Integer Programming, Quadratic Programming, Dynamic Programming, Simulated Annealing, Hill Climbing, Tabu Search, Genetic Algorithm (in some contexts)
Stochastic gradient descent (SGD)
Gradient descent is an iterative algorithm: it starts from a random point on a function and travels down its slope in steps until it reaches the lowest point of that function.
Gradient-based optimization algorithm
Performs gradient step only based on a single sample (online)
or few samples (mini-batch) in each iteration
Mini-batch gradient descent offers a compromise between batch gradient
descent and SGD by splitting the training data into smaller batches.
The steps for performing mini-batch gradient descent are identical to SGD.
Procedure:
1. Initialize the parameter vector at x_0 with learning rate (step size) η
2. While not converged:
   • Randomly shuffle the samples in the training set
   • For i = 1, 2, ..., N: compute the gradient ∇f(x_i) for the current training sample and update x_{i+1} := x_i − η ∇f(x_i)
The learning rate is used to calculate the step size at every iteration.
Too large a learning rate and the step sizes may overstep too far past the optimum value, causing the algorithm to diverge.
Too small a learning rate may require many iterations to reach a local minimum, i.e., slow convergence.
Typical implementations use an adaptive learning rate.
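A minimal numpy sketch of this procedure on a least-squares problem (synthetic data, fixed learning rate):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
b = A @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.01, 100)

x = np.zeros(3)                  # 1. initialization
eta = 0.01                       #    learning rate (step size)
for epoch in range(50):          # 2. until converged
    for i in rng.permutation(len(A)):        # randomly shuffled samples
        grad = 2 * A[i] * (A[i] @ x - b[i])  # gradient of (a_i·x - b_i)^2
        x -= eta * grad                      # per-sample update
print(x)                         # approaches [2, -1, 0.5]
```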
Extensions of standard SGD
Momentum: remember the last update and partially reuse it for the current step. The method remembers the update Δw at each iteration and determines the next update as a linear combination of the gradient and the previous update:
w := w − η ∇Q_i(w) + α Δw   (+ good for flat regions)
α is an exponential decay factor between 0 and 1 that determines the relative contribution of the current gradient and earlier gradients to the weight change. The method tends to keep traveling in the same direction, preventing oscillations.
AdaGrad (Adaptive Gradient Algorithm): chooses a learning rate per decision variable, i.e., uses different learning rates in different "directions" (+ good for sparse problems).
RMSProp (Root Mean Square Propagation): schedules the learning rate by dividing the learning rate for a weight by a running average of the magnitudes of recent gradients for that weight (+ fast but stable learning).
Adam (Adaptive Moment Estimation): a combination of the above; running averages with exponential forgetting of both the gradients and the second moments of the gradients are used.
Global optimization: multiple restarts needed
Computing the gradient:
Automatic differentiation (AD) computes gradients efficiently and automatically by saving and evaluating a "computational graph" of the function.
Stochastic gradient descent is an optimization algorithm often used in machine
learning applications to find the model parameters that correspond to the best fit
between predicted and actual outputs.
Standard for training of ML models
Decision variables: weights and biases of
neural networks
Objective function: accuracy of neural network
Training neural networks requires minimizing a
high-dimensional non-convex loss function
The gradient descent is a strategy that searches through a large or infinite
hypothesis space whenever
1) there are hypotheses continuously being parameterized and
2) the errors are differentiable based on the parameters.
The problem with gradient descent is that converging to a local minimum takes extensive time, and finding the global minimum is not guaranteed. In SGD, the user initializes the weights, and the process updates the weight vector using one data point at a time. The gradient descent continuously updates the weights incrementally whenever an error calculation is completed, improving convergence. The method seeks the steepest descent and reduces the number of iterations and the time taken to search large quantities of data points.
AI-Based Optimization Methods:
Particle Swarm Optimization (PSO), Evolutionary Algorithms (Genetic Algorithms, Genetic Programming, Evolution Strategies), Ant Colony Optimization (ACO), Bee Colony Optimization, Firefly Algorithm, Cuckoo Search, Bat Algorithm, Artificial Bee Colony Algorithm, Grey Wolf Optimizer, Harmony Search, Differential Evolution, Artificial Immune Systems
Evolutionary Algorithms
Mimic biological evolution, selecting the fittest solutions for reproduction and survival.
A population of potential solutions evolves iteratively through generations.
Genetic operators like mutation and crossover create diverse offspring solutions.
Fitness evaluation: measures solution quality, guiding the algorithm towards optimal solutions.
Adaptation and convergence: evolves towards optimal solutions by adapting the population based on fitness, aiming for convergence.
Firefly Algorithm
Biologically Inspired: Mimics flashing behavior of
fireflies for optimization in algorithms.
Attraction and Intensity: Fireflies are attracted based on
brightness, representing fitness or objective function.
Random Movements: Fireflies move randomly and adjust brightness, introducing
diversity for exploration.
Light Absorption: Light absorption influences attractiveness, aiding in
convergence and escape from local optima.
Contrast with PSO: Differs from Particle Swarm Optimization by emphasizing
attraction and randomness over swarm dynamics.
Particle swarm optimization
The particle swarm algorithm exploits the effect that swarms of birds or fish are
significantly more effective in finding food than single individuals.
PSO is a computational method that optimizes a problem by iteratively trying to
improve a candidate solution with regard to a given measure of quality. It solves
a problem by having a population of candidate solutions, here dubbed particles,
and moving these particles around in the search-space according to simple
mathematical formula over the particle's position and velocity. Each particle's
movement is influenced by its local best-known position but is also guided
toward the best-known positions in the search-space, which are updated as
better positions are found by other particles. This is expected to move the
swarm toward the best solutions, but not guaranteed.
PSO does not use the gradient of the problem being optimized, which means PSO does not require the optimization problem to be differentiable, as is required by classic optimization methods such as gradient descent and quasi-Newton methods.
− Each individual independently searches the (parameter) space within a certain
radius for suitable feeding sites.
− Each individual has a certain direction and speed.
− If an individual finds a good feeding site, it communicates with the other
individuals and passes on the location.
− Individuals adapt their direction and speed to the position and distance of the
feeding sites.
Example: Optimization of the Rastrigin function with PSO
Used for complex global optimization problems, e.g.
parameter identification
Decision variables: parameters in (dynamic) models
Objective function: error between measured and
simulated system outputs
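A minimal PSO sketch on the 2-D Rastrigin function from the example (inertia and acceleration constants are common defaults, not values from the lecture):

```python
import numpy as np

def rastrigin(x):
    return 10 * x.shape[-1] + np.sum(x**2 - 10 * np.cos(2 * np.pi * x), axis=-1)

rng = np.random.default_rng(0)
pos = rng.uniform(-5.12, 5.12, (30, 2))      # 30 particles
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), rastrigin(pos)  # personal bests
gbest = pbest[np.argmin(pbest_f)]            # global best

for _ in range(200):
    r1, r2 = rng.random((2, 30, 1))
    # inertia + attraction to personal and global bests
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, -5.12, 5.12)
    f = rastrigin(pos)
    better = f < pbest_f
    pbest[better], pbest_f[better] = pos[better], f[better]
    gbest = pbest[np.argmin(pbest_f)]
print(gbest, rastrigin(gbest))               # near the optimum at (0, 0)
```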
ML-based Optimization Methods
Bayesian Optimization, Reinforcement Learning for Optimization, Neural Architecture Search (NAS), AutoML (Automated Machine Learning), Transfer Learning for Optimization, Meta-Learning for Optimization
Bayesian Optimization
Bayesian optimization is a sequential design strategy for global optimization of noisy black-box functions (of unknown structure) that does not assume any functional form. It is usually employed to optimize expensive-to-evaluate functions and attempts to find the global optimum in a minimum number of steps.
Bayesian optimization incorporates a prior belief about f and updates the prior with samples drawn from f to get a posterior that better approximates f.
The model used for approximating the objective function is called the surrogate model.
Bayesian optimization uses an acquisition function that directs sampling to
areas where an improvement over the current best observation is likely.
Since the objective function is unknown, the Bayesian strategy is to treat it
as a random function and place a prior over it. At every step, we determine
what the best point to evaluate next is according to the acquisition function
by optimizing it. We then update our model and repeat this process to
determine the next point to evaluate.
Proposing sampling points in the search space is done by acquisition functions.
They trade off exploitation and exploration. Exploitation means sampling
where the surrogate model predicts a high objective and exploration means
sampling at locations where the prediction uncertainty is high.
Both correspond to high acquisition function values and the goal is to
maximize the acquisition function to determine the next sampling point.
Main idea: use a Gaussian process (GP) to approximate the expensive-to-evaluate objective function and search mainly in regions with the best expected result. This makes informed choices of the next evaluation location.
Procedure:
1. Initialization
2. While not converged:
   - Construct the GP surrogate μ(x), σ(x)
   - Choose x* with maximum expected improvement (based on μ(x), σ(x))
   - Evaluate at x*
Gaussian processes are also called kriging.
Bayesian Optimization algorithm
1. Choose a surrogate model for modeling the true function f and define its prior.
2. Given the set of observations (function evaluations), use Bayes' rule to obtain the posterior.
3. Use an acquisition function α(x), which is a function of the posterior, to decide the next sample point x_t = argmax_x α(x).
4. Add the newly sampled data to the set of observations and go to step 2 until convergence or the budget elapses.
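A hedged 1-D sketch of this loop (GP surrogate plus expected improvement for minimization; the objective is a cheap stand-in for an expensive black-box function):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

f = lambda x: np.sin(3 * x) + 0.1 * x**2          # "expensive" objective
X = np.array([[-2.0], [0.5], [2.5]])              # initial observations
y = f(X).ravel()
grid = np.linspace(-3, 3, 500).reshape(-1, 1)

for _ in range(15):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # posterior
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()                                # current best (minimization)
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # acquisition α(x)
    x_next = grid[np.argmax(ei)]                  # next sample point
    X = np.vstack([X, [x_next]])
    y = np.append(y, f(x_next))
print(X[np.argmin(y)], y.min())
```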
Bayesian optimization for hyperparameter optimization
Used for complex global optimization problems, e.g.,
hyperparameter optimization of ML methods
Decision variables: No. of layers / nodes /
activation function types in neural networks
Objective function: accuracy of neural network
A hyperparameter is a parameter used to control the learning process. Hyperparameters are set before learning, whereas the model parameters are learned from the data.
Hyperparameter optimization (also tuning) finds the best performing
hyperparameters on machine learning models. The objective function takes a
tuple of hyperparameters and returns the associated loss.
Other Approaches: Grid search, Random search, Gradient-based optimization.
Reinforcement Learning
Reinforcement learning (RL) is a biologically-
inspired strategy that allows an agent to improve
its behavior by interacting with its environment.
Humans and other animals learn by experience
• The agent has a goal or a task
• Behaviors that yield positive results are reinforced
• Behaviors that yield negative results are discouraged
Agent knows: goal/objective, current and past states and available actions
Agent doesn't know: own model or environment model
Can be applied to policy iteration → leads to optimal control results (e.g., an alternative solution of the LQR problem)
Transfer Learning for Optimization (e.g., for image classification)
knowledge learned from a task is re-used to boost performance on a related task
Utilize pre-trained model knowledge for faster
convergence in new optimization tasks.
Initialize model parameters with pre-learned
values, enhancing optimization efficiency.
Adjust pre-trained model parameters to adapt
to specific target optimization objectives.
Reinforcement learning vs. classical optimization
Objective. RL: learns decision-making through interaction with an environment, maximizing cumulative rewards over time. Classical optimization: finds optimal solutions for a predefined objective function, subject to constraints.
Learning. RL: involves learning from experience, trial and error. Classical optimization: often focuses on deterministic (non-learning) exploration of the solution space.
Exploration/exploitation. RL: balances exploration (trying new actions) and exploitation (choosing known high-reward actions). Classical optimization: often lacks the exploration-exploitation trade-off (e.g., SGD).
Dynamic environments. RL: adapts to dynamic, uncertain environments, suitable for changing system dynamics. Classical optimization: assumes relatively static environments with known parameters.
ML to approximate optimization results
Model predictive control (MPC)
In MPC, a discrete-time dynamic model of the process to be controlled is used to compute the future behavior of the process as a function of the input signals. This enables computing the input signal that is optimal in the sense of a cost function and leads to optimal output signals. Input, output, and state constraints can be taken into account at the same time. While the model behavior is predicted up to a certain time horizon N, usually only the input signal u for the next time step is applied, after which the optimization is repeated. In the next time step, the optimization is performed with the then-current (measured) state, which can be interpreted as feedback and makes MPC a closed-loop control scheme. This allows disturbances to be taken into account, but it also requires considerable computational power.
Optimization-based control method
Objective: minimize error between reference (set-point) and predicted state
trajectory based on system model
Decision variables: control input
Procedure: solve optimization problem in every time-step and apply first piece
of optimal control input
+ Good results for nonlinear systems
+ Extensions for robust & stochastic control
+ Theoretical guarantees
- Computationally expensive
Approximate controller with ML
Approximate MPC
Calculate results for (robust) MPC offline.
Apply ML to approximate the control law, e.g. by a neural network.
Validate the learned controller to guarantee stability.
Summary
Optimization is key to solve real-world problems
Depending on the technique, optimization methods can belong to the field of AI and/or ML.
For sure: optimization and AI are strongly interlinked.
• AI and/or ML can improve/replace optimization methods
• Advanced optimization methods are commonly used for
training and hyperparameter optimization of ML models
Pitfalls of using AI in optimization
Data quality and bias
• Some AI-driven algorithms assume clean and accurate data. Noise, outliers, or errors in the data may lead to suboptimal solutions, e.g., during crossover or mutation in EA.
Tuning of algorithm parameters
• Selecting appropriate priors or kernel functions in Bayesian optimization can
greatly impact the optimization process.
Limited computational resources
• Complex AI-driven optimization algorithms, like PSO, can be computationally expensive.
Lacking interpretability
• The "black-box" nature of some algorithms hinders the ability to provide clear explanations for chosen solutions, e.g., in RL.
Opportunities of using AI in optimization
Solving complex and multi-objective problems
Improve efficiency and speed
Global solution search (good exploration/exploitation trade-off)
Uncertainty handling
Real-time and dynamic optimization
Data Analysis
Data Analysis in Engineering Applications
Manufacturing
Condition monitoring
• Data: motor torques, RPM, temperature
• Question: machine condition?
Demand forecasting
• Data: order quantity, material flow, ...
• Question: resource demand?
Process monitoring
• Data: cycle times, light barrier signals
• Question: product quality? Bottlenecks?
Transport
Condition monitoring
• Data: load, road topology, tire characteristics
• Question: fuel consumption? Tire wear?
Behavior analysis
• Data: LIDAR, images
• Question: behavior of cars/pedestrians?
Recuperation potential
• Data: road topology, traffic density
• Question: recoverable energy?
Energy
Fault detection
• Data: voltages, currents
• Question: component fault?
Renewable energy forecasting
• Data: temperature, wind speeds, humidity
• Question: solar/wind energy?
Demand prediction
• Data: historical energy consumption
• Question: grid demand?
Agriculture
Condition monitoring, behavior analysis, recuperation potential (with the same data and questions as in the transport column)
Data Analysis
• Data analysis is the process of inspecting, cleansing, transforming, and modeling
data with the goal of discovering useful information, informing conclusions, and
supporting decision-making.
• Insights for designing, optimizing and problem solving
• Basis for making informed decisions.
• Differentiation between model creation and application
Model Creation Model Application
• Preprocessing • Real-Time Monitoring
• Feature Engineering • Predictive Analysis
• Training Data Preparation • Adaptive Solution
Data should be preprocessed and quality-checked: significant impact on model performance
Challenges in Model Creation
Quality
Noise and Outliers
Missing Data
Quantity
Insufficient Data
Data Distribution
Challenges in Model Creation
Feature Engineering: Relevance
Domain Knowledge
Model Complexity: Type of Model, Number of Parameters
Generalization vs Overfitting
Overfitting occurs when the model is so closely aligned to the training data that it does not know how to respond to new data.
− The machine learning model is too complex; it memorizes very subtle patterns in the training data that don't generalize well.
− The training data size is too small for the model complexity and/or contains large amounts of irrelevant information.
Underfitting: the model doesn't align well with the training data or generalize well to new data.
Error      Overfitting  Right fit  Underfitting
Training   Low          Low        High
Test       High         Low        High
If one relies only on the error of a model on the training data, overfitting is harder to detect than underfitting.
To avoid overfitting, validate a machine learning model before using it on test data.
Challenges in Model Application
Changing data distributions: shifts in data, concept drift
Real-time responsiveness: latency, stream processing
Interpretability and trust: black-box models, uncertainty
Feedback loop and learning: continuous improvement, human-in-the-loop
Data Processing Methods
Data processing is the collection and manipulation of digital data to produce meaningful information.
Noise Removal • FIR-filter • Kalman-filter • Moving average • Exponential smoothing • …
Outlier Removal • IQR-method • Z-score • Trimming • …
Missing Values • Interpolation • Imputation • Missingness as feature • …
Feature Engineering • Derived variables • Interaction terms • Texture • Frequencies • …
Data preparation is vital for machine learning methods as it lays the foundation for model
accuracy and reliability. It involves tasks like cleaning, feature engineering, and handling
missing values. Proper data preparation ensures that the dataset is consistent, accurate, and
relevant, which in turn prevents models from learning noise and irrelevant patterns. By
structuring the data effectively, we enable machine learning algorithms to extract meaningful
insights and relationships, leading to more robust and dependable models that can make
accurate predictions and generalizations, ultimately fulfilling the potential of machine learning
in solving real-world problems.
Data normalization is crucial for machine learning methods because it ensures that features or
variables in a dataset are on a consistent scale, preventing certain attributes from dominating the
learning process due to their larger magnitude. By bringing all features to a common scale,
normalization allows machine learning algorithms to learn patterns more effectively, converging
faster and making the model less sensitive to the magnitude of input data, ultimately leading to
improved model performance and generalization on diverse datasets.
Data Processing: Noise Removal
Noise: distortion of patterns and relationships
Method: filtering and smoothing (a sketch of the two simplest smoothers follows after this list)
• Moving average (rolling or running average): a calculation to analyze data points by creating a series of averages of different selections of the full data set. A moving average is commonly used with time-series data to smooth out short-term fluctuations and highlight longer-term trends or cycles; it acts as a low-pass filter in signal processing.
• Exponential smoothing or exponential moving average (EMA) smooths time-series data using the exponential window function, acting as a low-pass filter to remove high-frequency noise.
• FIR filter (finite impulse response): a filter whose impulse response is of finite duration because it settles to zero in finite time. Forward-backward filtering can be used (for non-real-time applications) to produce a zero-phase filtering effect and remove the tap delay inherent in FIR filters.
• Adaptive filter: a system with a linear filter whose transfer function is controlled by variable parameters, together with a means to adjust those parameters according to an optimization algorithm.
• Kalman filter (linear quadratic estimation, LQE): an algorithm that uses a series of measurements observed over time, including statistical noise and other inaccuracies, and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone, by estimating a joint probability distribution over the variables for each timeframe.
• Fourier analysis: the theory of Fourier series and Fourier integrals. It is mainly used to decompose temporal signals into their frequency components; the signal can be reconstructed from the sum of these frequency components.
• Gaussian processes (GP): a nonparametric supervised learning method used to solve regression and probabilistic classification problems; probabilistic statistical models.
• Median filter: a non-linear digital filtering technique, often used to remove noise from an image or signal. The median filter stores N measured values in a sorted array and uses only the value at the middle position of the array.
• Kalman smoothers are used widely to estimate the state of a linear dynamical
system from noisy measurements. The goal in smoothing is to reconstruct or
approximate the missing measurements given the known measurements.
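The sketch referenced above, for the two simplest smoothers (synthetic signal; window length and smoothing factor are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
x = np.sin(2 * np.pi * 3 * t) + rng.normal(0, 0.3, 200)   # noisy signal

k = 9
moving_avg = np.convolve(x, np.ones(k) / k, mode="same")  # rolling mean

alpha = 0.2                    # smoothing factor of the exponential window
ema = np.empty_like(x)
ema[0] = x[0]
for i in range(1, len(x)):
    ema[i] = alpha * x[i] + (1 - alpha) * ema[i - 1]      # EMA recursion
```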
Common pitfalls
• Loss of information
• Online vs. offline
• Parameter sensitivity
• Assuming noise is random
• Ignoring domain knowledge
• Misinterpretation of smoothed data
Online methods: FIR filter, moving average, exponential smoothing, adaptive filters, Kalman filter
Offline methods: Fourier analysis, Gaussian process, median filter, Kalman smoothing
Data Processing: Outlier Removal
Outliers: extreme values that deviate significantly from the rest of the data
IQR-method: Inter-Quartile-Range defined as the difference
between the 75th and 25th percentiles of the data.
To calculate the IQR, the data set is divided into quartiles,
or four rank-ordered even parts via linear interpolation.
The lower quartile corresponds with the 25th percentile
and the upper quartile corresponds with the 75th
percentile, so IQR = Q3 − Q1.
Outliers are defined as observations that fall
below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.
The median Q2 is the corresponding measure
of central tendency.
Z-score (standard score): a statistical measure giving the number of standard deviations a data point lies from the mean of a data set: z = (x − μ)/σ. It is a transformation of a random variable such that the resulting standardized random variable has expected value zero and variance one. (See the sketch below for both outlier rules.)
Trimming
Statistical tests
Common pitfalls: loss of information, biasing the analysis, subjectivity, data manipulation
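A minimal sketch of both outlier rules on a synthetic 1-D sample (the |z| ≤ 3 threshold is a common convention, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(10, 2, 500), [50.0, -40.0]])  # two outliers

q1, q3 = np.percentile(x, [25, 75])                # IQR method
iqr = q3 - q1
iqr_mask = (x >= q1 - 1.5 * iqr) & (x <= q3 + 1.5 * iqr)

z = (x - x.mean()) / x.std()                       # z-score method
z_mask = np.abs(z) <= 3                            # common |z| <= 3 rule

x_clean = x[iqr_mask & z_mask]                     # keep inliers only
```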
Data Processing: Missing Values
− Deletion
Interpolation
Imputation: preserves all cases by replacing missing data with an estimated value based on other available information.
Imputation methods: median, regression, k-nearest neighbors, multiple imputation
Missing data causes three main problems: it can introduce a substantial amount of bias, make the handling and analysis of the data more arduous, and create reductions in efficiency.
Special handling: missingness as feature, time-series imputation, using domain knowledge
Common pitfalls: unrepresentative imputation, overimputation, ignoring the impact on variability
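A short scikit-learn sketch of two imputation strategies on a toy array (values are made up):

```python
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan],
              [5.0, 6.0]])

median_filled = SimpleImputer(strategy="median").fit_transform(X)
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)  # uses similar rows
```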
Data Processing: Feature Engineering
Feature engineering: identifying and extracting relevant features from raw data for a machine learning algorithm.
It starts from selecting the most important characteristics (features), transforming them using mathematical operations, constructing new variables as required, and extracting features, to support training a downstream statistical model.
Goal: improving model performance, better representation of knowledge
Domain knowledge: derived variables, interaction terms, polynomial features
For time series: lagged variables, rolling statistics, frequencies
For images: texture, color histograms, edges
Feature Engineering: Deep Learning
Automatic feature learning with autoencoders
The encoder compresses the input into a lower-dimensional representation
• Dimensionality reduction, extracting the essential features
The decoder reconstructs the input from the encoding
• Minimizing the reconstruction error
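A minimal PyTorch autoencoder sketch (layer sizes and data are placeholders): the encoder compresses, the decoder reconstructs, and training minimizes the reconstruction error.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_in=20, n_code=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, 8), nn.ReLU(),
                                     nn.Linear(8, n_code))    # compression
        self.decoder = nn.Sequential(nn.Linear(n_code, 8), nn.ReLU(),
                                     nn.Linear(8, n_in))      # reconstruction

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 20)                        # placeholder data
for _ in range(100):
    loss = nn.functional.mse_loss(model(x), x)  # reconstruction error
    opt.zero_grad(); loss.backward(); opt.step()
# model.encoder(x) now yields learned low-dimensional features
```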
RNNs for sequential data
• LSTM / GRU
• Learning patterns over sequential data
• Time series prediction
Component Analysis: Importance of a Feature
Principal component analysis (PCA): the data is linearly transformed onto a new coordinate system such that the directions (principal components) capturing the largest variation in the data can be easily identified.
• Dimensionality reduction, noise reduction
• Data visualization, collinearity reduction (combines highly correlated variables into a set of uncorrelated variables)
Independent component analysis (ICA): separation of mixed signals into statistically independent components. ICA attempts to decompose a multivariate signal into independent non-Gaussian signals, separating it into additive subcomponents by assuming that at most one subcomponent is Gaussian and the subcomponents are statistically independent from each other.
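A short scikit-learn sketch of both analyses on synthetic data (the mixing matrix is an arbitrary assumption): PCA finds the directions of largest variance, FastICA unmixes two independent sources.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
S = np.c_[np.sin(2 * np.pi * 5 * t),            # source 1: sine
          np.sign(np.sin(2 * np.pi * 3 * t))]   # source 2: square wave
X = S @ np.array([[1.0, 0.5], [0.3, 1.0]])      # observed mixed signals

X_pca = PCA(n_components=2).fit_transform(X)        # max-variance directions
S_est = FastICA(n_components=2).fit_transform(X)    # independent sources
```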
Engineering Applications
Condition Monitoring of Wind Turbines
Goal: Detect ice on wind turbines: Lower efficiency, Faster wear, Danger from falling ice
The model must work on a class of wind turbines
• Each machine is slightly different; feature engineering is the main focus
Training on data from multiple turbines is not a solution
• Cost of data acquisition; new turbines differ from the training data
Generation of domain independent features
• Feature engineering with auto-encoders
• Features should not allow differentiation between domains
Condition Monitoring of Belt Drives
Goal: Monitor operating condition of drives
• Predictive maintenance, Lower maintenance cost
Belt drives: diverse industrial applications; correct belt tension is important for efficiency
Method must be suitable for industrial application
• No additional sensors, Few datasets, Domain independent model
Excitation has large impact on accuracy
Features from time- & frequency-Domain
Tests with different excitations: Multi-frequency-excitation, Jerk-limited trajectory
Tension estimation with random forest regression
Tissue Tracking
Goal: Tracking the deformation of tissue: Surgery automation
No labels for tissue deformation
• Training loss from tracking tissue forward and backward in
image sequence
• Backward tracking should result in original undeformed grid
Tracking based on optical flow:
• Domain knowledge: Motion model
Real-time capability: Knowledge distillation, Model reduction
Unsupervised learning: Teacher - Student Domain Adaptation
First, a teacher-student approach is used to transfer knowledge from a slow but accurate teacher model to a fast student model. Second, self-supervised tasks are developed in which the model is encouraged to learn from different but related examples.
a) A teacher model is used to produce pseudo labels ŷ.
b) The student can train on these pseudo labels.
c) Teacher warp: x₁ is warped using ŷ to create a pseudo image pair with a real label.
d) Zero-flow: given the same image twice, the real flow is 0. Image augmentation (aug) can be applied to increase the difficulty for the student model.
Detection of Accidents in Tunnels
Goal: Detect accidents in tunnels:
− Detect, identify and track an object over a sequence of images
Deep learning:
1- Detect objects, 2- Assign bounding box & ID,
3- Predict position in next frame, 4- Find object in next frame
Detection of accidents within 10 seconds
Automated Engine Maintenance
Goal: Detect damaged turbine blades, Image classification
High-pressure turbine blade: High operating temperature ,Cooling channels for air
Clogging of cooling channels increases probability of material failure
Analysis of X-ray images: Requires experts ,Small data set
Small set of training data: class imbalance; statistics of failure cases unknown
Inconsistency of training data: different image formats (8-bit vs. 16-bit), false labels, filtered and unfiltered images
Semi-supervised learning with unlabeled data: Circumvent manual labeling
Image augmentation to increase training data
L09_Decision Making
Decision-Making in Engineering Applications
Manufacturing
Predictive maintenance
o Data: remaining useful life
o Decision: schedule maintenance
Supply chain management
o Data: demand forecasts
o Decision: schedule supply tasks
Quality control
o Data: manufacturing data
o Decision: schedule quality checks / adapt process parameters
Transport
Traffic management
o Data: traffic density
o Decision: set states of traffic lights
Autonomous driving
o Data: predicted behaviors of target vehicles
o Decision: decide on ego vehicle trajectory
Public transport planning
o Data: predicted traffic density
o Decision: select optimal route
Energy
Battery management
o Data: forecasted energy demand
o Decision: load/unload battery
Smart grid management
o Data: forecasted energy demand/production
o Decision: set energy strategy
Supply chain management
o Data: demand forecasts
o Decision: schedule supply tasks
Agriculture
Automatic weeding
o Data: estimated weed/crop positions
o Decision: set weeding strategy
Harvesting automation
o Data: crop status
o Decision: decide on harvesting
Water management
o Data: soil moisture
o Decision: selective watering
Decision-Making
Selecting optimal choices from available alternatives
AI techniques enable enhanced decision-making by analyzing data and patterns.
Decision making can be framed as an optimization problem.
Methods of Decision Making
Rule-based methods: expert systems, decision trees, fuzzy logic
Model-based methods: genetic algorithms, dynamic programming, particle swarm
Data-driven methods: support vector machines, neural networks, Gaussian processes
Examples of Decision Making
Manufacturing: Predictive Maintenance (PdM)
Goal: Predict failure timing of machinery using equipment data
• Remaining useful life (RUL)
Example: semiconductor plasma etcher in wafer production
Maintenance is done regularly: Tasks during maintenance depend on RUL, Machine
failure can be prevented by countermeasures in preceding maintenance cycle.
Update maintenance schedule based on predicted RUL
• Additional tasks increase maintenance cost
First, the best degradation feature is selected using equipment data from degradation cases, and a model to predict the remaining useful life (RUL) based on the feature value trends is built. At the time of degradation diagnosis, the RUL is predicted based on the feature value trend, and the maintenance schedule is updated in accordance with the predictions.
In practice, the predicted RUL has a probabilistic variability.
Knowledge base for decision is uncertain.
• Prediction of RUL, Output is probability distribution.
Singular objective: Minimize maintenance cost, Dependent on multiple factors
Decision process is sequential: Online condition monitoring
Decision is binary: Schedule additional tasks Y/N
maintenance schedule update method determines the maintenance schedule into which
additional PdM work should be incorporated on the basis of the expected maintenance
costs at each scheduled maintenance timing by considering the probabilistic variability of
the predicted failure timing.
The proposed method sequentially calculates the expected increased maintenance costs
due to unplanned maintenance and early replacement of components at each planned
maintenance timing. The maintenance schedule is then updated when the increased
maintenance cost at the most recent planned maintenance is the lowest.
Step 1: The degradation feature value at the diagnosis timing is calculated using the wafer-etching monitoring sensor data.
Step 2: The predicted distribution of the failure timing is calculated using feature value trends.
Step 3: The expected maintenance costs at each future planned maintenance are calculated on the basis of the distribution.
Step 4: It is checked whether the cost at the most recent planned maintenance is the lowest; if so,
Step 5: the additional work is incorporated into that planned maintenance.
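The update rule in steps 2 to 5 can be made concrete with a small sketch. This is a minimal illustration, assuming the predicted failure timing follows a normal distribution and using hypothetical cost values; the actual method in the notes also accounts for schedule-specific cost terms.

```python
from scipy import stats

# Hypothetical costs and timings; the distribution stands in for step 2's
# predicted failure-timing distribution.
C_UNPLANNED = 100.0                        # cost of unplanned maintenance after a failure
C_EARLY = 20.0                             # cost of replacing the component early
failure = stats.norm(loc=55.0, scale=8.0)  # predicted failure timing (days)
planned = [30, 45, 60, 75]                 # planned maintenance timings (days)

def expected_cost(t):
    p_fail = failure.cdf(t)                # probability of failure before timing t
    return p_fail * C_UNPLANNED + (1.0 - p_fail) * C_EARLY  # step 3

costs = {t: expected_cost(t) for t in planned}
best = min(costs, key=costs.get)
if best == planned[0]:                     # step 4: most recent planned maintenance is cheapest
    print("incorporate additional PdM work at day", best)   # step 5
```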
Defining a Cost Function with Uncertainty
The cost function defines the decision making process: What will be a good decision?
Knowledge about the system may be uncertain, E. g.: Prediction of machine failure
Considerations for uncertainties can significantly impact the cost of a decision
o Treat uncertainty via best-case vs. worst-case analysis (min/max for low/high uncertainty)
o Use the maximum likelihood / mean, or the cumulative density of the prediction
Transport: Smart Traffic Light Control
Goal: Optimization of traffic flow: reduce congestion, increase throughput
Genetic algorithm for traffic light control
Simulation: Vehicles move in a predetermined direction. The vehicle at the leftmost cell of the road will turn left, the center vehicle will always move straight, and the rightmost vehicle will always turn right, with a random switch of lane after turning.
Knowledge base for decision is deterministic
• Number of vehicles waiting at intersection
Multiple objectives
• Minimize number of stopped cars, Minimize waiting time over all vehicles
Decision process is sequential
• State of traffic changes as vehicles move
• Update traffic lights based on changes in traffic density
Decision is discrete:
• Each traffic light has 6 discrete states
• Choose the state of the traffic light at each intersection
Solution space is discrete
• 6 states per intersection, 4 intersections → 6^4 = 1296 configurations
Metric: time to route 1000 vehicles; 22.8% faster than a passive system
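To make the search over the 1296 configurations concrete, here is a minimal genetic-algorithm sketch. The fitness function is a placeholder for the traffic simulation described above (a real setup would return, e.g., the negative time to route 1000 vehicles); all hyperparameters are illustrative.

```python
import random

# Each individual encodes one of 6 states for each of 4 intersections.
N_STATES, N_INTERSECTIONS, POP, GENS = 6, 4, 30, 50

def fitness(cfg):
    # placeholder: a real implementation would run the traffic simulation
    return -sum((s - 3) ** 2 for s in cfg)

pop = [[random.randrange(N_STATES) for _ in range(N_INTERSECTIONS)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]                      # selection: keep the fitter half
    children = []
    for _ in range(POP - len(parents)):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_INTERSECTIONS) # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.2:                  # mutation
            child[random.randrange(N_INTERSECTIONS)] = random.randrange(N_STATES)
        children.append(child)
    pop = parents + children
print("best configuration:", max(pop, key=fitness))
```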
Defining a Cost Function with Multiple Objectives
The cost of a decision may be influenced by multiple factors: multi-objective optimization.
Normally the cost function yields a scalar value: multiple objectives are weighted against each other.
Pay attention to the units of the individual objectives, e.g., traffic density vs. time!
• Normalization may be helpful
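A minimal sketch of such a scalarized cost, assuming hypothetical weights and normalization constants:

```python
# Weighted multi-objective cost with normalization; weights and scales are a
# design choice, not prescribed by the notes.
def cost(stopped_cars, waiting_time_s,
         w_cars=0.5, w_time=0.5,
         max_cars=100, max_time_s=600):
    # normalize both objectives to [0, 1] so their units become comparable
    return w_cars * (stopped_cars / max_cars) + w_time * (waiting_time_s / max_time_s)

print(cost(stopped_cars=40, waiting_time_s=180))   # 0.5*0.4 + 0.5*0.3 = 0.35
```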
Energy: Grid Management
Goal: Minimize cost of energy consumption
Smart buildings with distributed energy generation
Energy storage, Energy generation, Power grid supply
Reduce dependency on grid
Forecast demand, Forecast supply, Decide which power source to use
Modeling energy demand and supply: use an LSTM
Input to the LSTM: day of week, hour of day, temperature, humidity, air pressure
Features in the frequency domain: wavelet decomposition transformation
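A minimal sketch of the forecasting model, assuming PyTorch and the five input features listed above; data, layer sizes, and the training step are illustrative, and wavelet features could be appended as additional inputs.

```python
import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    def __init__(self, n_features=5, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)          # predict demand for the next hour
    def forward(self, x):                         # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])           # use the last time step

model = DemandLSTM()
x = torch.randn(8, 24, 5)                         # 8 sequences of the past 24 hours
y = torch.randn(8, 1)                             # synthetic demand targets
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = nn.functional.mse_loss(model(x), y)        # one illustrative training step
loss.backward()
opt.step()
```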
Agriculture: Automatic Weed Elimination
Goal: Increasing crop yield and quality, Reducing cost of labor
Automation involves multiple decision-processes
Where are plants? What plant is a weed? Elimination strategy, Avoid hurting crops
Detection by image processing: Segmentation, Object detection, Classification
Knowledge base for decision is uncertain: 100% accuracy of image processing is unlikely
Models are trained offline: global optimization based on training and test data
Decisions on crops are made per image
• Filtering can increase robustness: object detection and tracking
Positions of weeds are continuous: they can be anywhere in the image
Classification is discrete: crop/weed
Searching for a Good Decision
Decision making is (often) an optimization problem
• Optimization algorithm affects final result
It is important to understand the characteristics of an algorithm
• Gradient based vs. Evolutionary, Global optimization, Local minima?
The cost function factors into the performance and has to be evaluated by the optimizer
Consider the computational complexity and convergence speed of an optimizer
• Offline- vs. real-time applications
Benefit
• AI-driven data analysis enables complex decision making
• Real-time adaptability enables systems to respond dynamically to changing conditions
• Predictive capabilities enable planning and risk management
• Hybrid approaches leverage the strengths of different AI methods
Pitfalls & Remedy
• Avoid overfitting by appropriate validation techniques: E.g. K-fold cross validation
• Enable scalability by considering model type and complexity
• Consider safety & reliability issues: E.g. limits for actuator values
• Keep regulatory compliance in mind
• Consider cost efficiency: Operational cost vs. benefits
Physical Interaction I
A physical agent must solve a task that requires interaction with a physical environment
Tasks can typically be divided into
Sensing/Perception Planning/Reasoning Action/Control
Solutions can be classified into
• Divide & Conquer (solve sub-tasks separately) vs. End-to-End (solve the major task at once)
• Model-Driven vs. Data-Driven
Artificial Intelligence for Physical Interaction
Physical Artificial Intelligence: AI that is implemented physically. Physical
AI refers to using AI techniques to solve problems that involve direct interaction
with the physical world, e.g., by observing the world through sensors or by
modifying the world through actuators.
Digital Artificial Intelligence: AI that is implemented digitally
• Model-Driven Methods: Typically Divide & Conquer – Perception, Planning, Control
• Data-Driven Methods: Learning Control, Reinforcement Learning, Behavior Cloning
Vine Robots for Intubation
Goal: System that autonomously intubates patient
Problem: A tough challenge for robotics & digital AI
Solution: An inflatable soft robot
Design the robot such that physics "autonomously" solves the task
Model-Driven Methods for Physical Interaction
Divide & Conquer: Divide the motion task into Sensing/Perception, Planning, Acting/Control
Model-Driven Methods:
1. Build model of the real-world problem
2. Solve problem in the model-world
3. Apply solution to real-world
Advantages: Well-established methods available. Theoretical & empirical foundation
Disadvantages: Inherently limited by model quality, No end-to-end approaches
Sensing & Perception
Given is robot with
− State dynamics x_{k+1} = f(x_k, u_k) + d_k
− Measurement equation y_k = h(x_k) + w_k
− Environment E = {X_free, X_obstacle}
Problem Categories
• Filtering & Smoothing: remove the measurement noise w_k
• State Estimation: estimate x_k based on dynamics and measurements
• Localization and Mapping: estimate the environment E based on measurements y_k
• Computer Vision: measurements y_k are camera pictures
Example: Inertial motion tracking, drone indoor navigation, autonomous driving
Simultaneous Localization and Mapping via Particle-Filter
SLAM is the computational problem of constructing or updating a map of an unknown
environment while simultaneously keeping track of an agent's location within it.
Particle filters are a set of Monte Carlo algorithms used to find approximate solutions
for filtering problems for nonlinear state-space systems, such as signal
processing and Bayesian statistical inference. The filtering problem consists of
estimating the internal states in dynamical systems when partial observations are
made, and random perturbations are present in the sensors as well as in the
dynamical system. The objective is to compute the posterior distributions of the
states of a Markov process, given the noisy and partial observations. Particle filtering
uses a set of particles (also called samples) to represent the posterior distribution of
a stochastic process given the noisy and/or partial observations.
Given: a robot with Dynamics model and Measurement
Problem: Where are obstacles? Where is the robot?
Solution: Particle-Filter-based SLAM
Example: SLAM for Drones Navigating 3D Environments
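The filtering idea can be illustrated with a toy 1D localization example (not the full SLAM problem, which additionally estimates the map). A minimal sketch, assuming Gaussian motion and measurement noise:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
particles = rng.uniform(0, 10, N)            # prior over the robot's position

def step(particles, u, z, motion_noise=0.1, meas_noise=0.5):
    # predict: propagate each particle through the motion model
    particles = particles + u + rng.normal(0, motion_noise, particles.size)
    # update: weight particles by the measurement likelihood p(z | x)
    w = np.exp(-0.5 * ((z - particles) / meas_noise) ** 2)
    w /= w.sum()
    # resample: draw particles in proportion to their weights
    return particles[rng.choice(particles.size, particles.size, p=w)]

true_x = 2.0
for _ in range(20):
    true_x += 0.5                            # robot moves 0.5 per step
    z = true_x + rng.normal(0, 0.5)          # noisy position measurement
    particles = step(particles, 0.5, z)
print("estimate:", particles.mean(), "truth:", true_x)
```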
Motion Planning
Motion planning is a computational problem of finding a sequence of valid configurations that moves an object from a source to a destination. The term is used in computational geometry, computer animation, robotics, and computer games.
• Discrete Planning
a finite set of states X, a finite set of actions A, discrete dynamics f: (X, A) → X, and the task consists in finding a sequence of actions to connect an initial state x_I and a goal state x_G.
• Continuous Planning
state dynamics x_{k+1} = f(x_k, u_k), configuration y_k = h(x_k),
environment E = {Y_free, Y_obstacle}
Tasks are Categorized in Geometric Planning and Kinodynamic Planning
Geometric Planning: time and dynamics are not considered;
find a continuum of configurations to connect the initial configuration y_I and the goal configuration y_G.
Kinodynamic Planning: time and dynamics are considered;
find an input trajectory to connect the initial state x_I and the goal state x_G.
Kinodynamic planning is a class of problems for which velocity,
acceleration, and force/torque bounds must be satisfied, together with
kinematic constraints such as avoiding obstacles.
Method Categories
Discrete Planning, Sampling-Based Planning, Optimization-Based Planning
Sampling-Based Motion Planning via Rapidly Exploring Random Trees
Idea for continuous planning:
Build search tree via sampling random states/configurations of the robot
Iteratively
• Sample a random (or the goal) configuration (or state)
• Find the nearest configuration (or state) in the search tree
• Try to connect the two configurations (or states)
• If the connection is possible: add the new state to the search tree
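A minimal 2D geometric RRT sketch, assuming a unit-square workspace without obstacles; a collision check against X_obstacle would be added where marked.

```python
import random, math

STEP, GOAL_BIAS, MAX_ITERS = 0.05, 0.1, 5000
start, goal = (0.1, 0.1), (0.9, 0.9)
tree = {start: None}                      # node -> parent

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

new = start
for _ in range(MAX_ITERS):
    # sample a random (or, with goal bias, the goal) configuration
    sample = goal if random.random() < GOAL_BIAS else (random.random(), random.random())
    nearest = min(tree, key=lambda n: dist(n, sample))   # nearest node in the tree
    d = dist(nearest, sample)
    if d == 0:
        continue
    # steer a fixed step from the nearest node toward the sample
    new = (nearest[0] + STEP * (sample[0] - nearest[0]) / d,
           nearest[1] + STEP * (sample[1] - nearest[1]) / d)
    # a collision check against X_obstacle would go here
    tree[new] = nearest                   # connection possible: add to tree
    if dist(new, goal) < STEP:
        break
path, node = [], new                      # walk back to recover the path
while node is not None:
    path.append(node)
    node = tree[node]
print("path length:", len(path))
```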
Global Planning for Contact-Rich Manipulation
Solution
− Build quasi-dynamic contact models
− Smooth the contact dynamics
− Apply kinodynamic-RRT
Control
Model-Based Control
Given is robot with
State dynamics x_{k+1} = f(x_k, u_k) + d_k, output equation y_k = h(x_k)
The problem is to design a control law u_k = k(x_k, y_k, r_k, …) to typically solve:
• Reference Tracking: have the output y_k equal some reference r_k
• Set-Point Stabilization: stabilize the state x_k at a set-point x_S
• Disturbance Rejection: minimize the effect of the disturbance d_k
Approach
o Build a model of the dynamics
o Design a control law
o Evaluate performance
o Tune parameters
Typical Methods
o Robust Control
o Adaptive Control
o Model Predictive Control
Model-Predictive Control
MPC relies on dynamic models of the process, most often linear empirical
models obtained by system identification. The main advantage of MPC is
that it allows the current timeslot to be optimized while taking future
timeslots into account. This is achieved by optimizing over a finite time horizon, but
only implementing the current timeslot and then optimizing again, repeatedly.
Procedure
• Build a model
• Choose cost function
• Choose input by minimizing cost function
Advantages
• Can deal with constraints
• Can be extended to nonlinear systems
• Great results in real-world applications
Disadvantages
• Computationally expensive
• Performance is inherently limited by the quality of the model
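A minimal receding-horizon sketch, assuming a linear model (a hypothetical double integrator) and the cvxpy solver; both choices are ours, not from the notes. Only the first input of the optimized sequence would be applied before re-optimizing.

```python
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator dynamics
B = np.array([[0.0], [0.1]])
N = 20                                    # prediction horizon
x0 = np.array([5.0, 0.0])                 # current state

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = 0
constraints = [x[:, 0] == x0]
for k in range(N):
    # quadratic stage cost on state and input
    cost += cp.sum_squares(x[:, k + 1]) + 0.1 * cp.sum_squares(u[:, k])
    constraints += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
                    cp.abs(u[:, k]) <= 1.0]          # input constraint
prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
# receding horizon: apply only the first input, then re-optimize at the next step
print("first input to apply:", u.value[:, 0])
```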
Model Predictive Control for Rocket Landing
Build a model x_{t+1} = f(x_t, u_t) where
− the state x_t consists of the rocket's position, velocity, flight path angle, and mass,
− the input u_t consists of thrust magnitude and angle of attack.
Set glide-slope constraints x_min and x_max to ensure a safe landing.
Set a cost function J to minimize fuel.
Optimal control law: min_{u_t, u_{t+1}, …, u_{t+N}} J
Summary of Model-Based Methods and Divide-&-Conquer Approaches
Advantages
• Rule the industry
• Combination of perception, planning and control enables complex behaviors
• Well-established methods with theoretical understanding and thorough real-world validation
Disadvantages
• Performance is inherently limited by the quality of the model
• Design typically requires lots of expert time and knowledge
• Restricted by the inherent assumptions (e.g., dynamics can be modelled, a
state vector exists, output is known and measured)
Physical Interaction II
Data-Driven Methods for Physical Interaction
Motivation
• Reduce required expertise
• Reduce model requirements
• Unlock novel applications
− Impossible-to-model systems
− Visuomotor policies
Fields & Methods
• Learning for Control
− Iterative Learning Control
− Data-Driven Model Predictive Control
− Model-Based Reinforcement Learning
• Machine-Learning-Based
− Reinforcement Learning
− Imitation Learning
Learning for Control: Ideas & Concepts
Concepts
• Learn to solve control problem (stabilization or reference tracking)
• Employ prior knowledge (approximate model)
• Fast & robust real-world learning
Methods
• Iterative Learning Control
• Data-Driven Model Predictive Control
• Model-Based Reinforcement Learning / Hybrid Methods
Iterative Learning Control
Iterative learning control (ILC) is based on the notion that the performance of a
system that executes the same task multiple times can be improved by learning
from previous executions (trials, iterations, passes), in which a feedforward
input trajectory is applied to the system and an output trajectory results.
The objective of ILC is to improve performance by incorporating error information
into the control for subsequent iterations.
Goal: generate a feedforward control that tracks a specific reference or rejects a
repeating disturbance.
Because ILC generates its open-loop control through practice (feedback in the iteration domain), this high-performance control is also highly robust to system uncertainties.
• ILC modifies the control input, which is a signal.
• ILC is intended for discontinuous operation.
• In ILC, the initial conditions are set to the same value on each trial.
ILC often employs the so-called lifted framework, where the samples of a variable over a trial are collected in vectors called trajectories.
Application: ILC to iteratively learn an input trajectory that, if applied, leads to the output trajectories precisely tracking the desired reference trajectories.
Iterative Learning Control
Problem Formulation: Repetitive system with input/output dynamics y_j = P u_j + d
Repeated reference tracking task: make y_j track r
• Perform a trial: apply the current input trajectory u_j and obtain the corresponding output trajectory y_j = P u_j + d
• Compute the tracking error e_j = r − y_j
• Update the input trajectory: u_{j+1} = u_j + L e_j (proportional ILC)
L: learning gain, which regulates how strongly the error trajectory affects the input trajectory update.
L too high: the algorithm diverges and the system vibrates. L too low: slow convergence.
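A minimal simulation of the proportional update law above, assuming a hypothetical stable first-order plant in the lifted form y_j = P u_j + d:

```python
import numpy as np

T = 50                                       # samples per trial
# lifted plant matrix P: lower-triangular Toeplitz matrix of the impulse response
h = 0.5 ** np.arange(T)                      # impulse response of a stable toy plant
P = np.array([[h[i - k] if i >= k else 0.0 for k in range(T)] for i in range(T)])
d = 0.1 * np.ones(T)                         # repeating disturbance
r = np.sin(np.linspace(0, 2 * np.pi, T))     # reference trajectory
u = np.zeros(T)                              # initial input trajectory
L = 0.5                                      # learning gain (too high -> divergence)
for j in range(30):                          # trials
    y = P @ u + d                            # perform a trial
    e = r - y                                # tracking error
    u = u + L * e                            # proportional ILC update
print("final error norm:", np.linalg.norm(e))
```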
Advantages
• Simple approach
• Works in real-world applications
• Theoretical understanding
Disadvantages
• Reference tracking only
• Learning must be repeated for each motion
Iterative Learning Control of Two-Wheeled Inverted Pendulum Robot
Problem: The robot has to dive beneath an obstacle
Preliminaries
• Build an approximate linear model of the dynamics x_{t+1} = A x_t + B u_t
• Model-based feedback control u_t = −K x_t to stabilize the robot in the upright position
• Motion planning to find the reference r
Iterative Learning Control
• Model-based design via norm-optimal ILC to find the learning gain matrix L
Model-Based Reinforcement Learning for (Feedback) Control
Fundamental Idea: Combine Supervised Learning with Model-based Control
Iterative procedure of:
1. Train a nonlinear model of the unknown dynamics
2. Design a model-based controller (via optimization)
3. Apply the controller and gather experimental data
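A minimal sketch of this iterative procedure, assuming a hidden scalar toy plant, a linear least-squares dynamics model, and a simple random-shooting controller (stand-ins for the nonlinear model and optimized controller in steps 1 and 2):

```python
import numpy as np

rng = np.random.default_rng(0)

def real_step(x, u):                     # the true dynamics, unknown to the learner
    return 0.9 * x + 0.5 * u + rng.normal(0, 0.01)

X, U, Xn = [], [], []                    # experimental data: (x, u) -> next x
x = 1.0
for it in range(5):                      # outer learning iterations
    # 1. train a (here: linear) model of the dynamics via least squares
    if X:
        feats = np.column_stack([X, U])
        theta, *_ = np.linalg.lstsq(feats, np.array(Xn), rcond=None)
    else:
        theta = np.array([1.0, 0.0])     # crude prior model before any data
    # 2. controller: pick the action whose model prediction is closest to the goal 0
    # 3. apply the controller and gather experimental data
    for _ in range(20):
        candidates = rng.uniform(-1, 1, 64)
        preds = theta[0] * x + theta[1] * candidates
        u = candidates[np.argmin(np.abs(preds))]
        xn = real_step(x, u)
        X.append(x); U.append(u); Xn.append(xn)
        x = xn
print("final state (goal 0):", x)
```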
Advantages
− No a priori model requirement
− Applicable to nonlinear systems
− Fast & robust learning
− Applicable to real-world systems
Disadvantages
− No theoretical guarantees
− Model selection can be difficult
− Requires a priori knowledge of an effective control function
Probabilistic Inference for Learning Control (PILCO)
Problem
• Unknown, nonlinear dynamics x_{t+1} = f(x_t, u_t)
• Transition from an initial state x_0 to a goal state x_G
(Probabilistic models describe statistical problems in terms of probability theory and probability distributions.)
Assumptions
• State is known and measured
• An effective feedback function u_t = π(x_t) is known
• Dynamics are smooth
• Gradients of the cost can be computed analytically
PILCO: Application to Pendulum-on-a-Cart
Task: Swing-up and balance pendulum on a cart
Application
• State consists of the pendulum angle, cart position, and the respective velocities
• The feedback function is a radial-basis-function neural network
Advantages of PILCO
• Major breakthrough w.r.t. data efficiency and speed of learning
• Applicable to real-world problems
Disadvantages of PILCO
• Requires knowledge of effective feedback function
• Restricted to smooth dynamics
Bayesian Optimization for Learning to Walk
Given
− Some walking robot with unknown, hybrid dynamics
− Effective feedback policy u_t = π(x_t, θ)
Task
− Find the optimal parameters θ* that maximize walking speed
Iterative Approach
− Apply the controller to the real-world robot and retrieve the value/cost
− Model the objective function c = f(θ), which maps parameters to cost/value, by a GP
− Determine novel parameters via Bayesian optimization
Feedback policy
− Finite state machine
− Four parameters for thresholds to switch states
− Four parameters for the control action value in each state
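A minimal Bayesian-optimization sketch, assuming a one-dimensional parameter, a toy cost function standing in for trials on the real robot, scikit-learn's GP regressor, and an expected-improvement acquisition; the eight-parameter policy above would be handled the same way.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from scipy.stats import norm

rng = np.random.default_rng(0)
f = lambda th: (th - 0.6) ** 2 + 0.01 * rng.normal()    # hidden walking cost

thetas = list(rng.uniform(0, 1, 3))                     # initial random trials
costs = [f(t) for t in thetas]
for _ in range(15):
    # model the objective c = f(theta) with a GP
    gp = GaussianProcessRegressor().fit(np.array(thetas)[:, None], costs)
    cand = np.linspace(0, 1, 200)[:, None]
    mu, sigma = gp.predict(cand, return_std=True)
    best = min(costs)
    # expected improvement (minimization form) as the acquisition function
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    theta_new = float(cand[np.argmax(ei)])
    thetas.append(theta_new)
    costs.append(f(theta_new))                          # run the robot, record cost
print("best parameters:", thetas[int(np.argmin(costs))])
```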
Supervised Learning to Optimize Milling
Task: Cut a workpiece via milling
Approach
− Acquire data
− SVM to map local cutting conditions to shape deviation
− Model-based optimization of local cutting conditions
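A minimal sketch of the SVM step, assuming synthetic data and support-vector regression (SVR) from scikit-learn; the features and deviation model are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# hypothetical local cutting conditions: feed rate, spindle speed, depth of cut
X = rng.uniform([0.05, 1000, 0.5], [0.3, 8000, 3.0], size=(200, 3))
y = 0.01 * X[:, 0] * X[:, 2] + rng.normal(0, 1e-4, 200)  # synthetic shape deviation

model = SVR(kernel="rbf", C=10.0).fit(X, y)
# model-based optimization could now search for cutting conditions that
# minimize the predicted deviation, e.g. over a grid of candidates
print(model.predict([[0.1, 4000, 1.5]]))
```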
Summary of Learning for Control
Advantages
• Real-world applicable
• Can be combined with model-based techniques
Disadvantages
• No end-to-end learning
• Typically requires task-specific prior information
• Restricted to very specific problems
End-to-End Learning: Motivation & General Ideas
The model learns all the steps between the initial input phase and the final output result: a deep learning process where all the different parameters are trained simultaneously instead of sequentially.
In end-to-end learning, a model is trained to map raw inputs to desired outputs
using a large amount of labeled data. The model learns to extract useful
features from the data and to use these features to make predictions. This is
typically done using deep learning techniques, such as convolutional neural
networks or recurrent neural networks.
Motivation: Modeling is impossible (even from data)
• Camera information as measurement
• Contact-rich manipulation
Machine-Learning-Based Methods
Reinforcement Learning, Imitation Learning
Learning of Complex Motion Tasks in Simulation
Problem
• Learn the solutions to a variety of different control tasks
• Dynamics may be of high order, nonlinear, and contact-rich/hybrid
Little / no assumptions
• No task-specific policy is known
• Rewards / cost function can be sparse
• Learning approach must solve all problems
Reinforcement Learning for End-to-End Solutions
Reinforcement learning: an agent learns to make decisions through trial and error.
A basic reinforcement learning agent interacts with its environment in discrete
time steps; the question is how an intelligent agent ought to take actions in a
dynamic environment to maximize the cumulative reward. At each time t, the agent
receives the current state S_t and reward R_t. It then chooses an action A_t from
the set of available actions, which is subsequently sent to the environment.
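A minimal agent-environment interaction loop, assuming the Gymnasium API; a random policy stands in for the learned one to show the state-action-reward cycle.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)            # agent receives the initial state S_0
total_reward = 0.0
for t in range(500):
    action = env.action_space.sample()   # agent chooses action A_t (here: random)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # environment returns reward and next state
    if terminated or truncated:
        break
print("cumulative reward:", total_reward)
```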
Reinforcement Learning Approaches
o Policy Gradient Methods
o Actor-Critic Methods
o Proximal Policy Optimization
Advantages
o RL can solve almost any control/motion task
o One method can solve many different tasks
Disadvantages
o Hours/days of system interaction and millions of trials required!
o Not real-world applicable!
Reinforcement Learning & Sim-to-Real
Idea:
Learn the solutions in simulation,
Transfer solution to the real-world
Problem: Sim-to-Real Gap
Approaches: Precise Simulation, Domain Randomization
Sim-to-Real for Locomotion
Approach
Build a precise simulation model
Reinforcement Learning in Simulation
1. Reference Motion Tracking
2. Policy Refinement
3. Policy Distillation
Immediate sim-to-real transfer
Results
− Robust walking at 1.2 m/s
− Iterative tuning of the simulation model was required
− Iterative tuning of the rewards was required
Sim-to-Real for Soccer
Solving the Rubik's Cube by Automatic Domain Randomization
In simulation
• Reinforcement Learning via Proximal Policy Optimization
• Automatic Domain Randomization
In Reality
• Adaptive & robust policy
• Still limited performance (60% success rate)
Behavior Cloning / Learning from Demonstration
Behavioral cloning directly learns a policy by using supervised learning on
observation-action pairs from expert demonstrations; it is a form of imitation learning.
Fundamental Idea
• Operator demonstrates motion to solve task
• Robot records input/output observation pairs of the demonstration
• Supervised learning to learn the policy
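A minimal behavior-cloning sketch, assuming PyTorch and synthetic observation-action pairs in place of recorded demonstrations:

```python
import torch
import torch.nn as nn

# Synthetic stand-ins for the recorded demonstration data
obs = torch.randn(1000, 10)              # recorded observations
actions = torch.randn(1000, 2)           # expert actions for each observation

policy = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for epoch in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(policy(obs), actions)  # imitate the expert
    loss.backward()
    opt.step()
print("final imitation loss:", loss.item())
```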
Advantages
• Enables robots to solve complex tasks
• No reward function required to learn task
Disadvantages
• Expert demonstrations are required
• Limited by human-level performance
Behavior Cloning for Complex Manipulation Tasks
Behavior Cloning + Policy Optimization for Ball-in-a-Cup
Idea
• Generate initial policy by Behavior Cloning
• Improve policy by Reinforcement Learning
Advantages
• Real-world applicable
• Can exceed human performance
• Smaller requirements w.r.t. demonstrations
Disadvantages
• Still requires demonstrations
Summary of End-to-End Learning Approaches
Advantages
• Enables novel applications: contact-rich manipulation, visuomotor policies
• Reduces required expert time and knowledge
Disadvantages
• Still some prior knowledge required
• Learning in reality is challenging
Summary of Physical Interaction
Model-based methods
• Rule actual applications
• Require expert time and knowledge
• Application domains are limited
Data-driven methods
• Reduce required expert time and knowledge
• Open up novel application domains
• Severe limitations in real-world application