Decision Making
Motivation
Intelligent Environments are aimed at
improving the inhabitants’ experience
and task performance
Provide appropriate information
Automate functions in the home
Prediction techniques can only
determine what would happen next, not
what should happen next.
Automated functions can differ from the
actions the inhabitant would take
Computer has to determine actions that
would optimize inhabitant experience
Decision Making
Decision Making attempts to determine
the actions the system should take in
the current situation
Should a function be automated?
What should be done next?
Decisions should be based on the
current context and the requirements of
the inhabitants
Pre-programmed timers alone are not
sufficient for automation
Decision maker has to take into account the
stream of data
Decision Making in Intelligent Environments
Example Decision Making Tasks in
Intelligent Environments:
Automation of physical devices
Turn on lights
Regulate heating and air conditioning
Control media devices
Automate lawn sprinklers
Automate robotic components (vacuum cleaner, etc.)
Control of information devices
Provide recipe services in the kitchen
Construct shopping lists
Decide which types of alarms to display (and where)
Decision Making in Intelligent Environments
Objectives of decision making:
Optimize inhabitant productivity
Minimize operating costs
Maximize inhabitant comfort
Decision making process has to be safe
Decisions made can never endanger
inhabitants or cause damage
Decisions should be within the range
accepted by the inhabitants
Example Task
Should a light be turned on?
Decision Factors:
Inhabitant’s location (current and future)
Inhabitant’s task
Inhabitant’s preferences
Time of day
Other inhabitants
Energy efficiency
Security
Possible Decisions
Turn on
Do not automate
Decision Making Approaches
Pre-programmed decisions
Timer-based automation
Reactive decision making systems
Decisions are based on condition-action
rules
Decisions are driven by the available facts
Goal-based decision making systems
Decisions are made in order to achieve a
particular outcome
Utility-based decision making systems
Decisions are made in order to maximize a
given performance measure
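As a rough illustration, here is a minimal Python sketch of utility-based decision making; the states, decisions, transition probabilities, and utility values are all invented for illustration:

# Minimal utility-based decision sketch. All states, decisions,
# probabilities, and utility values here are hypothetical.

# P[(state, decision)] maps each outcome state to its probability
P = {
    ("dark", "turn_on_light"): {"lit": 0.95, "dark": 0.05},
    ("dark", "do_nothing"):    {"dark": 1.0},
}
U = {"lit": 10.0, "dark": 0.0}  # utility of each outcome state

def expected_utility(state, decision):
    return sum(p * U[s2] for s2, p in P[(state, decision)].items())

def decide(state, decisions):
    # choose the decision that maximizes the expected utility
    return max(decisions, key=lambda d: expected_utility(state, d))

print(decide("dark", ["turn_on_light", "do_nothing"]))  # turn_on_light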
Reactive Decision Making
Goal-Based Decision Making
Utility-Based Decision Making
Qualities of a Decision Making System
Ideal:
Complete: always makes a decision
Correct: decision is always right
Natural: knowledge easily expressed
Efficient
Rational: decisions made to maximize performance
Decision-Making Techniques
Reactive Decision Making
Rule-based expert system
Goal-Based Decision Making
Planning
Decision-Theoretic Decision Making
Belief Networks
Markov decision process
Learning Techniques
Neural Networks
Reinforcement Learning
Rule-Based Decision Making
Decisions are made based on rules and
facts
Facts represent the state of the environment
Represented in first-order predicate logic
Condition-Action rules represent heuristic
knowledge about what to do
Rules are implications that derive actions
from logical sentences about the facts
Inference mechanism
Deduction (modus ponens): {A, A → B} ⊢ B
The left-hand sides of rules are matched against the
set of facts
Rules whose left-hand side matches are active
Rule-Based Inference
Rules define what actions should be
executed for a given set of conditions (facts)
Actions can either be external actions
(“automation”) or internal updates of the set of
facts (“state update”)
Rules are often heuristics provided by an expert
Multiple rules can be active at any given
time
Conflict resolution to decide which rule to fire
Scheduling of active rules to perform a sequence
of actions
Example
Facts:
CurrentTime = 6:30
Location(CurrentTime,bedroom)
CurrentDay = Monday
Rules:
Internal actions:
(CurrentDay=Monday)^(CurrentTime>6:00)
^(CurrentTime<7:00)^(Location(CurrentTime,bedroom))
-> Set(Location(NextTime,bathroom))
External actions:
(Location(NextTime,X)) -> Action(TurnOnLight,X)
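A minimal Python sketch of how these two rules could be evaluated; the facts are held in a plain dict, and full first-order pattern matching is omitted for brevity:

from datetime import time

# Facts from the example, stored in a simple dictionary
facts = {
    "CurrentTime": time(6, 30),
    "CurrentDay": "Monday",
    "Location": {"CurrentTime": "bedroom"},  # Location(CurrentTime, bedroom)
}

def internal_rule(f):
    # (CurrentDay=Monday)^(6:00<CurrentTime<7:00)^(Location(CurrentTime,bedroom))
    #   -> Set(Location(NextTime, bathroom))
    if (f["CurrentDay"] == "Monday"
            and time(6, 0) < f["CurrentTime"] < time(7, 0)
            and f["Location"].get("CurrentTime") == "bedroom"):
        f["Location"]["NextTime"] = "bathroom"  # internal state update

def external_rule(f):
    # (Location(NextTime, X)) -> Action(TurnOnLight, X)
    if "NextTime" in f["Location"]:
        print("Action(TurnOnLight, {})".format(f["Location"]["NextTime"]))

internal_rule(facts)  # fires: sets Location(NextTime, bathroom)
external_rule(facts)  # fires: Action(TurnOnLight, bathroom)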
Rule-Based Expert Systems
Intended to simulate (and automate)
human reasoning process
Domain is modeled in first-order logic
State is represented by a set of facts
Internal rules model behavior of the environment
Experts provide sets of heuristic condition-action rules
Rules with internal actions can model reasoning
process
Rules with external actions indicate decisions the
expert would make
The system can optionally be provided
with queries by including them in the
set of facts
Internal Rules
Internal rules have to model the behavior
of the system
Persistence over time
E.g.:
(Location(CurrentTime,X))^(NoMove(CurrentTime))
-> Set(Location(NextTime,X))
Dynamic behavior of devices
E.g.: (Temperature(CurrentTime,X))^(HeatingOn)
-> Set(Temperature(NextTime,X+2))
Behavior of the inhabitants
E.g.: (Location(CurrentTime,bedroom))
^(CurrentTime>23:00)
^(LightOn(CurrentTime, bedroom))
-> Action(TurnOffLight, bedroom)
Rule-Based Expert Systems
[Diagram: rule-based expert system architecture. The inference engine's pattern matcher matches rules from the rule base against the working memory (facts); active rules are placed on the agenda, and the execution engine fires them.]
Markov Decision Process
The optimal utility function U* can be
computed using value iteration:
U_t+1(xi) = R(xi) + γ max_d Σ_xj P(xj | xi, d) U_t(xj)
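A small Python sketch of this update on a made-up two-state MDP; the states, rewards, transition model, and discount factor are all invented for illustration:

# Value iteration on a hypothetical two-state MDP
states = ["bedroom", "bathroom"]
decisions = ["stay", "move"]
R = {"bedroom": 0.0, "bathroom": 1.0}  # reward R(x)
gamma = 0.9                            # discount factor
# P[(xi, d)] maps each successor state xj to P(xj | xi, d)
P = {
    ("bedroom", "stay"):  {"bedroom": 1.0},
    ("bedroom", "move"):  {"bathroom": 0.9, "bedroom": 0.1},
    ("bathroom", "stay"): {"bathroom": 1.0},
    ("bathroom", "move"): {"bedroom": 0.9, "bathroom": 0.1},
}

U = {x: 0.0 for x in states}
for _ in range(100):  # repeat the update until U has (nearly) converged
    U = {x: R[x] + gamma * max(
             sum(p * U[xj] for xj, p in P[(x, d)].items())
             for d in decisions)
         for x in states}
print(U)  # approximates the optimal utility U*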
[Figure: example of neural-network-based decision making from [DLRM94]: a network predicts f(xi, d) for state xi and candidate decision d; the decision is then d(xi) = argmin_d f(xi, d)]
[DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994).
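A sketch of this decision rule in Python; the function f below is a hypothetical stand-in for a trained neural network that predicts the cost of taking a decision in a given state:

def f(state, decision):
    # Hypothetical learned cost: discomfort of a wrong setting
    # plus the energy cost of lighting
    discomfort = 1.0 if (state == "occupied" and decision == "light_off") else 0.0
    energy = 0.2 if decision == "light_on" else 0.0
    return discomfort + energy

def decide(state, decisions=("light_on", "light_off")):
    # d(xi) = argmin_d f(xi, d)
    return min(decisions, key=lambda d: f(state, d))

print(decide("occupied"))  # light_on
print(decide("empty"))     # light_off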
Neural Networks
Characteristics
Efficient
Advantages
Can learn arbitrary decision functions from
training data
Generalizes to new situations
Fast decision making
Problems
Requires training data that contains desired
decision or a goal/objective
Requires design of sufficient input
representation
Reinforcement Learning
Reinforcement learning learns an
optimal decision strategy from trial and
error and sparse reward feedback.
On-line method to solve Markov Decision
Processes (or, with extensions, POMDPs).
Reward, R, is a signal encoding the
instantaneous feedback to the system.
System learns a mapping from states to
decisions, π*(xi), which optimizes the
expected utility.
Q-Learning
Q-learning is the most popular
reinforcement learning technique for
MDPs.
Learns a utility function for state-action pairs
Q(x, d)
Utility: U(x) = max_d Q(x,d)
Learns by experimentation.
Update Q(xi ,d) after each observed transition from
state xi by comparing the expected utility of (xi,d)
with the expectation computed after observing the
actual outcome xj.
Q(xi,d) = Q(xi,d) + α (R(xi) + γ max_d' Q(xj,d') - Q(xi,d))
Decisions are made to optimize Q-values
π*(x) = argmax_d Q(x,d)
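A tabular Q-learning sketch of this update in Python; the states, decisions, rewards, and learning parameters are invented for illustration:

import random
from collections import defaultdict

Q = defaultdict(float)  # Q[(state, decision)], 0 by default
alpha, gamma, epsilon = 0.1, 0.9, 0.1
decisions = ["light_on", "light_off"]

def choose(x):
    # epsilon-greedy exploration: mostly exploit, sometimes explore
    if random.random() < epsilon:
        return random.choice(decisions)
    return max(decisions, key=lambda d: Q[(x, d)])

def update(xi, d, r, xj):
    # Q(xi,d) = Q(xi,d) + alpha*(R(xi) + gamma*max_d' Q(xj,d') - Q(xi,d))
    best_next = max(Q[(xj, dn)] for dn in decisions)
    Q[(xi, d)] += alpha * (r + gamma * best_next - Q[(xi, d)])

# One observed transition: took d in state xi, got reward r, landed in xj
update("dark_occupied", "light_on", r=1.0, xj="lit_occupied")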
Example System: Regulation in the Adaptive House [Moz98]
Neural network regulators can control
lighting and heating to achieve a given set point
Set point is learned using reinforcement learning, based on:
Energy usage
Inhabitant interactions