DSS07 CLS Rule Induction, k-NN, Naïve Bayesian


DECISION SUPPORT SYSTEM

Lesson 7: CLS - Rule Induction, k-NN, Naïve Bayesian

Dr. Le, Hai Ha


Contents

 Rule Induction
 k-Nearest Neighbors
 Naïve Bayesian

2
Rule Induction
Rule Induction
• Rule induction is a data science algorithm/process of
deducing if-then rules from a dataset
• E.g.
• “if it is 8:00 a.m. on a weekday, then highway traffic will be
heavy” and
• “if it is 8:00 p.m. on a Sunday, then the traffic will be light.”
• They are called symbolic decision rules
• They are not necessarily right all the time. E.g.
• The 8:00 a.m. weekday traffic may be light during a holiday
season.
• They explain an inherent relationship between the
attributes and class labels in a dataset.
• Many real-life experiences are based on intuitive rule
induction.

4
Rules deduced from the Golf dataset
• Using tree

Rule 1: if (Outlook = overcast) then yes
Rule 2: if (Outlook = rain) and (Wind = false) then yes
Rule 3: if (Outlook = rain) and (Wind = true) then no
Rule 4: if (Outlook = sunny) and (Humidity > 77.5) then no
Rule 5: if (Outlook = sunny) and (Humidity ≤ 77.5) then yes

5
Advantages of Rule Induction
• The rule induction provides a powerful
classification approach that can be easily
understood by the general audience.
• Apart from its use in data science by classification
of unknown data, rule induction is also used to
describe the patterns in the data.
• The easiest way to extract rules from a dataset is
from a decision tree that is developed on the same
dataset.

6
Rules – Rule set
• 𝑅 = {𝑟1 ∪ 𝑟2 ∪ 𝑟3 ∪ … ∪ 𝑟𝑘}
• where 𝑘 is the number of disjuncts in a rule set
• An individual disjunct can be represented as
𝑟𝑖 = if (antecedent or condition) then (consequent)
• Each attribute-value test is called a conjunct of the rule
• E.g. (Outlook = rain) is a conjunct
7
Rules
• There can be rule sets that are not mutually
exclusive
• If a record activates more than one rule in a rule set
and all the class predictions are the same, then
there is no conflict in the prediction
• If the class predictions differ, ambiguity exists on
which class is the prediction of the induction rule
model
• Techniques used to resolve conflict
• ordered list of rules
• highest vote
• The rule set discussed is also exhaustive.
• “else Class = Default Class Value”
8
E.g. “if the cylinder temperature continues to report more than 852°C,
then the machine will break down in the near future”

9
Approaches
Sequential covering
• Sequential covering is an iterative procedure of
extracting rules from a dataset.
• The sequential covering approach attempts to find
all the rules in the dataset class by class.
• One specific implementation of the sequential
covering approach is called the RIPPER, which
stands for Repeated Incremental Pruning to
Produce Error Reduction

11
Example
• Consider the dataset which has two attributes
(dimensions) on the 𝑋 and 𝑌 axis and two-class
labels marked by “+” and “-”

12
Algorithm
• Step 1: Class selection
• Select the least-frequent class label to develop rules
before moving on to the next class
• In the example: class “+” is selected

• Step 2: Rule development
• The objective is to cover all “+” data points using a
rectilinear box with none or as few “-” as possible
• E.g. rule 𝑟1 identifies the area of four “+”

13
Algorithm
• Step 3: Learn-One-Rule
• To develop each disjunct rule
• Each rule starts as an empty rule, and conjuncts are
added one by one to increase the rule accuracy.

• Step 4: Next Rule
• After a rule is developed, all the data points covered by
the rule are eliminated
• These steps are repeated for the next rule
• E.g. Rule 𝑟2 is developed
14
Algorithm
• Step 5: Development of Rule Set
• After the rule set is developed to identify all “+” data
points, the rule model is evaluated with a test dataset
used for pruning to reduce generalization errors
• The metric used to evaluate a rule is (𝑝 − 𝑛)/(𝑝 + 𝑛), where 𝑝
is the number of positive records and 𝑛 the number of negative
records covered by the rule
• A conjunct is iteratively removed if removing it improves the
metric
• All rules that identify “+” data points are aggregated to
form a rule group
• In multi-class problems, the previous steps are repeated
for the next class label

15
Rule Induction in RapidMiner

16
Tree to rules

17
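To make “tree to rules” concrete outside RapidMiner, here is a minimal
Python sketch using scikit-learn; the tiny golf-style dataset and its
numeric encoding are made up for illustration, and export_text prints
each root-to-leaf path as an if-then rule.

# Minimal sketch: extract rules from a decision tree with scikit-learn.
# The golf-style data below is fabricated for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: Outlook (0=sunny, 1=overcast, 2=rain), Wind (0=false, 1=true), Humidity
X = [[0, 0, 85], [0, 1, 90], [1, 0, 78], [2, 0, 96], [2, 0, 80],
     [2, 1, 70], [1, 1, 65], [0, 0, 95], [0, 0, 70], [2, 0, 80],
     [0, 1, 70], [1, 1, 90], [1, 0, 75], [2, 1, 91]]
y = ["no", "no", "yes", "yes", "yes", "no", "yes",
     "no", "yes", "yes", "yes", "yes", "yes", "no"]

tree = DecisionTreeClassifier().fit(X, y)
# Each printed root-to-leaf path corresponds to one rule in a rule set
print(export_text(tree, feature_names=["Outlook", "Wind", "Humidity"]))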
k-NEAREST NEIGHBORS
(k-NN)
k-NN
• Lazy learners - memorize the training set
• Old adage “birds of a feather flock together.”

19
Guess the species for A and B

A(2.1, 0.5)
B(5.7, 1.9)

20
k-NN
The 𝑘 in the k-NN algorithm indicates the number of close
training record(s) that need to be considered when making the
prediction for an unlabeled test record.

21
Measure of Proximity
• Distance
• The Euclidean distance between two points 𝑋(𝑥1, 𝑥2) and
𝑌(𝑦1, 𝑦2) is 𝑑 = √((𝑥1 − 𝑦1)² + (𝑥2 − 𝑦2)²)
• Generalized to n-dimensional space:
𝑑 = √(Σ𝑖 (𝑥𝑖 − 𝑦𝑖)²)
• E.g. in the Iris dataset, 𝑋 = (4.9, 3.0, 1.4, 0.2) and
𝑌 = (4.6, 3.1, 1.5, 0.2)

22
Measure of Proximity
• Other distances
• In general, the Minkowski distance:
𝑑 = (Σ𝑖 |𝑥𝑖 − 𝑦𝑖|^𝑝)^(1/𝑝)
(𝑝 = 1 gives the Manhattan distance, 𝑝 = 2 the Euclidean distance)

23
Measure of Proximity
Correlation similarity

Simple matching coefficient

Jaccard Similarity

Cosine similarity

24
How to determine predicted class
• Vote:

• Weights:

25
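As a concrete illustration of distance, voting, and weighting, here is
a minimal k-NN sketch in Python (NumPy); the points echo the earlier
A(2.1, 0.5) example, and the function name is ours, not RapidMiner’s.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, weighted=False):
    d = np.linalg.norm(X_train - x, axis=1)    # Euclidean distance to all records
    idx = np.argsort(d)[:k]                    # the k nearest neighbors
    if not weighted:
        # Simple vote: the most common class among the neighbors
        return Counter(y_train[idx]).most_common(1)[0][0]
    # Weighted vote: closer neighbors count more (weight = 1/distance)
    votes = {}
    for i in idx:
        votes[y_train[i]] = votes.get(y_train[i], 0) + 1.0 / (d[i] + 1e-9)
    return max(votes, key=votes.get)

X = np.array([[2.0, 0.4], [2.2, 0.6], [5.5, 2.0], [5.9, 1.8]])
y = np.array(["setosa", "setosa", "virginica", "virginica"])
print(knn_predict(X, y, np.array([2.1, 0.5])))   # point A -> "setosa"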
RapidMiner

26
NAÏVE BAYESIAN
Predict your commute time

http://www.wired.com/2015/08/pretty-maps-bay-area-hellish-commutes/#slide-2
Bayes’ theorem
• Bayes’ theorem is one of the most influential and
important concepts in statistics and probability theory
• It provides a mathematical expression for how a degree
of subjective belief changes to account for new
evidence
• Assume 𝑿 is the evidence (attribute set) and 𝑌 is the
outcome (class label)
• 𝑿 = {𝑋1 , 𝑋2 , 𝑋3 , . . . , 𝑋𝑛 }, 𝑋𝑖 is an individual attribute
• The probability of outcome 𝑃(𝑌) is called prior
probability - calculated from the training dataset
• 𝑃(𝑌|𝑿) is called the conditional probability, also called
posterior probability.
29
Bayes’ theorem

𝑃(𝑌|𝑿) = 𝑃(𝑌) × 𝑃(𝑿|𝑌) / 𝑃(𝑿)

• 𝑃(𝑌) is the prior probability (the probability of the outcome)
• 𝑃(𝑌|𝑿) is the posterior probability
• 𝑃(𝑿|𝑌) is another conditional probability, called the class
conditional probability. It is the probability of the existence
of conditions given an outcome.
• 𝑃(𝑿) is basically the probability of the evidence
Naïve Bayesian algorithm
• Naïve assumption: the attributes 𝑋𝑖 are independent of each other
• Posterior probability:
𝑃(𝑌|𝑿) = 𝑃(𝑌) × ∏𝑖 𝑃(𝑋𝑖|𝑌) / 𝑃(𝑿)
• For the purpose of comparing 𝑃(𝑌|𝑿) for different
𝑌, 𝑃(𝑿) does not need to be calculated

31
Example: Golf data set

𝑃(𝑌 = 𝑛𝑜) = 5/14, 𝑃(𝑌 = 𝑦𝑒𝑠) = 9/14


(Figure: class-conditional probabilities computed from the training
data; for the test record shown, the prediction is Play = no.)
Issue 1: Incomplete Training Set
• If an unseen test example contains the attribute value
Outlook = overcast, then P(Outlook = overcast|Y = no) = 0
because no training record with Play = no has that value,
so P(Y = no|X) = 0
• In this case, the test example will always be predicted as
Play = yes
• Now assume no training records have Temperature = low for
the outcome yes either => P(Y = no|X) and P(Y = yes|X)
will both be zero => the dilemma
35
Issue 1: Solution
• Laplace correction technique
• Assume that, for Y = no, the class-conditional probabilities
for the three Outlook values are 0/5, 2/5, and 3/5
• A controlled error can be added by adding 1 to all
numerators and 3 to all denominators, so the class-
conditional probabilities become 1/8, 3/8, and 4/8
• Generically, the Laplace correction adds one artificial
record per attribute value, so that no class-conditional
probability is exactly zero

36
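A minimal Python sketch of the correction above (the helper name and
counts are ours, mirroring the Outlook example for Y = no):

from collections import Counter

def class_conditional(values, v, domain):
    # Laplace-corrected P(attribute = v | class): add 1 to the numerator
    # and the number of attribute values to the denominator, so no
    # class-conditional probability is exactly zero.
    counts = Counter(values)
    return (counts[v] + 1) / (len(values) + len(domain))

# Outlook values among the five records with Play = no
outlook_no = ["rain", "rain", "sunny", "sunny", "sunny"]
domain = ["sunny", "overcast", "rain"]
print(class_conditional(outlook_no, "overcast", domain))   # 1/8 instead of 0/5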
Issue 2: Continuous Attributes
• Use a probability density function instead of counts
• E.g. the normal distribution:
𝑓(𝑥) = (1/√(2𝜋𝜎²)) 𝑒^(−(𝑥−𝜇)²/(2𝜎²))

37
38
Issue 3: Attribute Independence
• One of the fundamental assumptions in the naïve
Bayesian model is attribute independence.
• Before applying the naïve Bayesian algorithm, it
makes sense to remove strongly correlated
attributes.
• The independence of categorical attributes can be
tested by the chi-square (𝜒 2 ) test for
independence.

39
RapidMiner

40
DECISION SUPPORT SYSTEM
Lesson 8: CLS – Artificial Neural Network, Support
Vector Machine, Ensemble Learner

Dr. Le, Hai Ha


Contents

 Artificial Neural Network


 Support Vector Machine
 Ensemble Learner

2
Artificial Neural Network
Neuron – Biological vs. Artificial

A Biological Neuron An Artificial Neuron

An Artificial Neural Network

4
Some activation functions

5
Example: An ANN for Iris data set

Input: 4-dimensional vectors


Hidden: 1 Dense layer
Output: 1 Dense layer

6
How to learn?
• Learning/training means calibrating weights
• Define a loss function:
𝐿 = 𝑓(𝑦, 𝑦̂)
• Gradient descent:
𝑤 ← 𝑤 − 𝜆 ∙ ∂𝐿/∂𝑤
• 𝜆 – learning rate

7
Calculate gradient
• Numerical method:
∂𝐿/∂𝑤 ≈ Δ𝐿/Δ𝑤
• Analytical method:
calculate ∂𝐿/∂𝑤 directly
• Backpropagation method

8
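A minimal one-dimensional sketch of these ideas in Python (the toy loss
L(w) = (wx − y)² is ours, chosen only for illustration):

x, y = 2.0, 8.0              # a single training pair; the ideal weight is 4
w, lam = 1.0, 0.05           # initial weight and learning rate (lambda)

def loss(w):
    return (w * x - y) ** 2

h = 1e-6
print((loss(w + h) - loss(w - h)) / (2 * h))   # numerical gradient at w = 1
print(2 * (w * x - y) * x)                     # analytical gradient, same value

for _ in range(100):
    w -= lam * 2 * (w * x - y) * x             # w <- w - lambda * dL/dw
print(w)                                       # converges to ~4.0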
Example

(Slides 9–17 step through a worked example in figures.)
17
Process in RapidMiner

18
Parameters
• Hidden layer:
• Determines the number of hidden layers, the size of each
hidden layer, and the name of each layer for easy
identification in the output screen. The default layer size
is -1, which means the size is calculated as (number of
attributes + number of classes)/2 + 1. The default size can
be overridden by specifying a number, which does not include
the no-input threshold node added to each layer.
• Training cycles:
• This is the number of times the training cycle is repeated;
it defaults to 500. Because the weights change after every
training record is considered, the cycle must be repeated
many times for the network to converge.
19
Parameters
• Learning rate:
The value of λ determines the sensitivity of the change while
back propagating the error. It takes a value from 0 to 1. A value
closer to 0 means the new weight would be more based on the
previous weight and less on error correction. A value closer to 1
would be mainly based on error correction.
• Momentum:
This value is used to avoid getting stuck in local optima and
seeks globally optimized results by adding a fraction of the
previous weight update to the current one.
• Decay:
During the neural network training, ideally the error would be
minimal in the later portion of the training record sequence.
One wouldn’t want a large error due to any outlier records in
the last few records, as it would thereby impact the
performance of the model. Decay reduces the value of the
learning rate and brings it closer to zero for the last training
record.

20
Parameters
• Shuffle:
If the training record is sorted, one can randomize the sequence
by shuffling it. The sequence has an impact on the model,
particularly if the group of records exhibiting nonlinear
characteristics are all clustered together in the last segment of
the training set.
• Normalize:
Nodes using a sigmoid transfer function expect input in the
range of -1 to 1. Any real value of the input should be
normalized in an ANN model.
• Error epsilon:
The objective of the ANN model should be to minimize the
error, but not to make it zero, at which point the model
memorizes the training set and its performance degrades. The
model-building process can be stopped when the error is less
than a threshold called the error epsilon.

21
Support Vector Machine
Boundary

23
Boundary and margin
Many ways (boundaries)
can separate classes

The best boundary makes the margins to the two classes equal and as large as possible

24
Transforming linearly non-separable data

25
Optimal hyperplane

26
Ensemble Learner
Ensemble model
• Wisdom of the Crowd

• Meta learners = a combination of several base models

• Reduces the model generalization error

28
Ensemble models

29
Example
• Assume a meeting of three board members of a Company
• Individually, each board member makes wrong decisions
about 20% of the time
• The board needs to make a yes/no decision on a major
project proposal
• If all board members make the same unanimous decision
every time, then the error rate of the board as a whole is
20%.
• If each board member’s decisions are independent and their
outcomes are uncorrelated, the board makes an error only
when at least two board members make an error at the same
time.
• The error rate of the board can be calculated using the
binomial distribution.

30
Probability of the board’s error
Binomial distribution: the probability of 𝑘 successes in 𝑛 independent
trials, each with success rate 𝑝, is given by the probability mass function

𝑃(𝑘) = 𝐶(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘)

For the board (𝑛 = 3, 𝑝 = 0.2), the probability of a wrong majority
decision is 𝑃(2) + 𝑃(3) = 3(0.2)²(0.8) + (0.2)³ ≈ 10.4%
31
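The same calculation in a few lines of Python (the function name is ours):

from math import comb

def board_error(n=3, p=0.2):
    # Probability that a majority of the n members err at the same time
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(board_error())   # 0.104, i.e., 10.4% versus 20% for an individual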
Important criteria
• Each member of the ensemble should be
independent.
• The individual model error rate should be less than
50% for binary classifiers.

32
Achieving the Conditions for Ensemble Modeling

• Different model algorithms


• The same training set can be used to build different classifiers
• Parameters within the models
• Changing the parameters like depth of the tree, gain ratio, and
maximum split for the decision tree model can produce multiple
decision trees.
• Changing the training record set
• A training set can be divided into multiple sets and each set can be
used to build one base model.
• Sample training data with replacement from a dataset and repeat
the same process for each base model.
• Changing the attribute set
• Sample the attributes for each base model.

33
Vote model
• Many base models trained on the same training dataset

34
Vote model

35
Vote model

36
Bagging model
• Base models are developed by changing the training set for
every base model
• In a given training set 𝑇 of 𝑛 records, 𝑚 training sets are
developed each with 𝑛 records, by sampling with replacement

37
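A minimal sketch of bagging’s sampling step in Python (NumPy); the
numbers stand in for record indices:

import numpy as np

rng = np.random.default_rng(0)
T = np.arange(10)            # stand-in for a training set of n = 10 records
m = 3                        # number of base models
# m training sets, each of n records, sampled with replacement
samples = [rng.choice(T, size=len(T), replace=True) for _ in range(m)]
for s in samples:
    print(sorted(s))         # duplicates appear; some records are left out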
Bagging model

38
Bagging model

39
Boosting model
• Trains the base models in sequence one by one and assigns
weights for all training records
• The next model will focus on the hard-to-classify data space.

40
Boost model

41
Random forest
• Uses a concept similar to the one used in bagging
• When deciding on splitting each node in a decision tree, the
random forest only considers a random subset of the
attributes in the training set

42
Random forest
If there are 𝑛 training records with 𝑚 attributes, and
𝑘 number of trees in the forest; then for each tree
1. An n-size random sample is selected with
replacement. This step is similar to bagging.
2. A number 𝐷 is selected, where 𝐷 << 𝑚. 𝐷
determines the number of attributes to be considered
for node splitting.
3. A decision tree is started. For each node, instead of
considering all m attributes for the best split, a
random number of 𝐷 attributes are considered. This
step is repeated for every node.
4. As in any ensemble, the greater the diversity of the
base trees, the lower the error of the ensemble.

43
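The same recipe expressed with scikit-learn (a sketch, not the slide’s
own code): n_estimators is k, max_features is D, and bootstrap enables
the n-size sampling with replacement.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(
    n_estimators=50,     # k: the number of trees in the forest
    max_features=2,      # D: attributes considered per split (m = 4 here)
    bootstrap=True,      # step 1: n-size random sample with replacement
    random_state=0,
).fit(X, y)
print(forest.score(X, y))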
Random forest

44
DECISION SUPPORT SYSTEM
Lesson 9: Clustering

Dr. Le, Hai Ha


Contents

 What is clustering analysis


 k-Means clustering
 DBSCAN clustering
 Self-Organizing Maps

2
What is clustering analysis
Clustering
• Clustering is the process of finding meaningful groups in
data
• For example, the customers of a company can be grouped
based on purchase behavior.
• The task of clustering can be used in two different classes of
applications:
• To describe a given dataset and
• as a preprocessing step for other data science algorithms.

4
Clustering to describe the data
• The most common application of clustering is to explore the
data and find all the possible meaningful groups in the data
• Clustering a company’s customer records can yield a few
groups in such a way that customers within a group are
more like each other than customers belonging to a
different group
• Applications:
• Marketing: Finding the common groups of customers
• Document clustering: One common text mining task is to
automatically group documents
• Session grouping: In web analytics, clustering is helpful to
understand common groups of clickstream patterns

5
Clustering for preprocessing
• Clustering to reduce dimensionality
• Clustering for object reduction

6
Types of clustering techniques
• The clustering process seeks to find groupings in data, in
such a way that data points within a cluster are more similar
to each other than to data points in the other clusters
• One common way of measuring similarity is the Euclidean
distance measurement in n-dimensional space

7
Taxonomy based on data point’s membership
• Exclusive or strict partitioning clusters: Each data object
belongs to one exclusive cluster
• Overlapping clusters: The cluster groups are not exclusive,
and each data object may belong to more than one cluster.
• Hierarchical clusters: Each child cluster can be merged to
form a parent cluster.
• Fuzzy or probabilistic clusters: Each data point belongs to all
cluster groups with varying degrees of membership from 0
to 1.

8
Taxonomy by algorithmic approach
• Prototype-based clustering: In the prototype-based clustering,
each cluster is represented by a central data object, also called a
prototype (centroid clustering or center-based clustering)
• Density clustering:
• Each dense area can be assigned a cluster and the low-density area can
be discarded as noise
• Hierarchical clustering:
• Hierarchical clustering is a process where a cluster hierarchy is created
based on the distance between data points.
• The output of a hierarchical clustering is a dendrogram: a tree diagram
that shows different clusters at any point of precision which is specified
by the user
• Model-based clustering:
• Model-based clustering gets its foundation from statistics and
probability distribution models; this technique is also called
distribution-based clustering.
• Mixture of Gaussians is one of the model-based clustering techniques

9
K-Means Clustering
k-Means
• k-Means clustering is a prototype-based clustering method where
the dataset is divided into k-clusters.
• User specifies the number of clusters (k) that need to be grouped
in the dataset.
• The objective of k-means clustering is to find a prototype data
point for each cluster; all the data points are then assigned to the
nearest prototype, which then forms a cluster.
• The prototype is called the centroid, the center of the cluster.
• The center of the cluster can be the mean of all data objects in
the cluster, as in k-means, or the most represented data object, as
in k-medoid clustering.
• The cluster centroid or mean data object does not have to be a
real data point in the dataset and can be an imaginary data point
that represents the characteristics of all the data points within
the cluster.

11
k-Means
• The data objects inside a partition belong to the cluster.
• These partitions are also called Voronoi partitions, and each
prototype is a seed in a Voronoi partition.

12
Algorithm
• The logic of finding k-clusters within a given dataset is rather
simple and always converges to a solution
• However, the final result in most cases will be locally
optimal where the solution will not converge to the best
global solution
• Step 1: Initiate Centroids
• The first step in k-means algorithm is to initiate k random centroids
• Step 2: Assign Data Points
• all the data points are now assigned to the nearest centroid to form
a cluster.
• Step 3: Calculate New Centroids
• The new centroids are the means of each cluster

13
Algorithm
• Step 4: Repeat Assignment and Calculate New Centroids
• assigning data points to the nearest centroid is repeated until all
the data points are reassigned to new centroids
• Step 5: Termination
• Step 3—calculating new centroids, and step 4—assigning data
points to new centroids, are repeated until no further change in
assignment of data points happens.

14
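A minimal Python sketch of steps 1–5 (NumPy; the two-blob data is made
up for illustration):

import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # Step 1: initiate
    for _ in range(iters):
        # Step 2: assign every point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 3: new centroids are the means of each cluster
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):                   # Step 5: terminate
            break
        centroids = new                                   # Step 4: repeat
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)   # roughly (0, 0) and (5, 5)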
Some key issues
• Initiation: The final clustering grouping depends on the
random initiator and the nature of the dataset.
• Empty clusters: One possibility in k-means clustering is the
formation of empty clusters in which no data objects are
associated.
• Outliers: Since SSE (sum of squared errors) is used as an
objective function, k-means clustering is susceptible to
outliers
• Post-processing: Since k-means clustering seeks to be locally
optimal, a few post-processing techniques can be
introduced to force a new solution that has less SSE

15
Evaluation of Clusters
• Evaluation of clustering can be as simple as computing total
SSE
• Good models will have low SSE within the cluster and low
overall SSE among all clusters.
• SSE can also be referred to as the average within-cluster
distance and can be calculated for each cluster and then
averaged for all the clusters.
• Davies-Bouldin index is a measure of uniqueness of the
clusters and takes into consideration both cohesiveness of
the cluster (distance between the data points and center of
the cluster) and separation between the clusters

16
k-Means in RapidMiner

17
DBSCAN Clustering
DBSCAN
• A cluster can also be defined as an area of high
concentration (or density) of data objects surrounded by
areas of low concentration (or density) of data objects.
• A density-clustering algorithm identifies clusters in the data
based on the measurement of the density distribution in n-
dimensional space
• Specifying the number of clusters (𝑘) is not
necessary for density-based algorithms
• Thus, density-based clustering can serve as an important
data exploration technique
• Density can be defined as the number of data points in a
unit n-dimensional space

19
Algorithm
• Step 1: Defining Epsilon and
MinPoints
• Calculation of a density for all
data points in a dataset, with a
given fixed radius 𝜀 (epsilon).
• To determine whether a
neighborhood is high-density
or low-density, a threshold of
data points (MinPoints) will
have to be defined, above
which the neighborhood is
considered high-density
• Both 𝜀 and 𝑀𝑖𝑛𝑃𝑜𝑖𝑛𝑡𝑠 are
user-defined parameters
20
Algorithm
• Step 2: Classification of Data Points
Core points: All the data points inside the high-
density region of at least one data point are
considered a core point. A high-density region
is a space where there are at least MinPoints
data points within a radius of 𝜀 for any data
point.

Border points: Border points sit on the


circumference of radius 𝜀 from a
data point. A border point is the boundary
between high-density and low-density space.
Border points are counted within the high-
density space calculation

Noise points: Any point that is neither a core point nor border point is called a
noise point. They form a low-density region around the high density region.

21
Algorithm
• Step 3: Clustering
• Groups of core points form distinct clusters. If two core points are
within 𝜀 of each other, then both core points are within the same
cluster
• All these clustered core points form a cluster, which is surrounded
by low-density noise points
• A few data points are left unlabeled or associated to a default noise
cluster

22
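A short illustration with scikit-learn’s DBSCAN (its eps and
min_samples parameters correspond to 𝜀 and MinPoints; the toy data is
ours):

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense_a = rng.normal(0, 0.3, (40, 2))     # one high-density region
dense_b = rng.normal(5, 0.3, (40, 2))     # a second high-density region
noise = rng.uniform(-2, 7, (10, 2))       # scattered low-density points
X = np.vstack([dense_a, dense_b, noise])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print(set(labels))     # cluster labels 0 and 1, plus -1 for noise points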
Special Cases: Varying Densities
• The dataset has four distinct regions numbered from 1-4.
Region 1 is the high-density area A, regions 2 and 4 are of
medium density B, and between them is region 3, which is an
extremely low-density area C
• If the density threshold parameters are tuned to partition
and identify region 1, then regions 2 and 4 (with density B)
will be considered noise, along with region 3

23
DBSCAN in RapidMiner

24
Self-Organizing Maps
SOM
• A self-organizing map (SOM) is a powerful visual clustering
technique that evolved from a combination of neural
networks and prototype-based clustering
• A key distinction in this neural network is the absence of an
output target function to optimize or predict, hence, it is an
unsupervised learning algorithm
• SOM methodology is used to project data objects from data
space, mostly in 𝑛 dimensions, to grid space, usually
resulting in two dimensions

26
Algorithm
• Step 1: Topology Specification
• Two-dimensional rows and columns with either a rectangular lattice
or a hexagonal lattice are commonly used in SOMs
• The number of centroids is the product of the number of rows and
columns in the grid

27
Algorithm
• Step 2: Initialize Centroids
• A SOM starts the process by initializing the centroids. The initial
centroids are values of random data objects from the dataset. This
is similar to initializing centroids in k-means clustering.
• Step 3: Assignment of Data Objects
• After centroids are selected and placed on the grid in the
intersection of rows and columns, data objects are selected one by
one and assigned to the nearest centroid.
• Step 4: Centroid Update

28
Algorithm
• Step 5: Termination
• The entire algorithm is continued until no significant centroid
updates take place in each run or until the specified number of run
count is reached
• a SOM tends to converge to a solution in most cases but doesn’t
guarantee an optimal solution
• Step 6: Mapping a New Data Object
• any new data object can be quickly given a location on the grid
space, based on its proximity to the centroids.
• The characteristics of new data objects can be further understood
by studying the neighbors.

29
SOM in RapidMiner

30
SOM in RapidMiner

31
32
DECISION SUPPORT SYSTEM
Lesson 10: Text Mining

Dr. Le, Hai Ha


Contents
 Text data
 Corpus
 Preprocessing text
 Context
 Bag of Words
 Document Embedding
 Clustering
 Classification
 Sentiment Analysis

https://ucilnica.fri.uni-lj.si/pluginfile.php/164808/mod_resource/content/2/Text%20Mining.pdf

2
Text data
• Unstructured data (including text, audio, images, videos,
etc.) is the new frontier of data science
• If all the data in the world was equivalent to the water on
earth, then textual data is like the ocean, making up a
majority of the volume
• Text analytics is driven by the need to process natural
human language, but unlike numeric or categorical data,
natural language does not exist in a structured format
consisting of rows (of examples) and columns (of attributes)
• Text mining is, therefore, the domain of unstructured data
science

3
High-level process for text mining

4
Corpus
• A collection of documents
• A document: a collection of
sentences/words/characters
• Example: Grimm-tales-selected.tab

5
6
We need to remove all the bits that carry no information,
namely punctuation and stopwords.

7
Preprocessing Text
• Preprocessing is key to defining what is important
in the data. Is “Doctor” the same as “doctor”?
• Should we consider words such as “and”, “the”,
“when” or omit them?
• Do we wish to treat “said” and “say” as the same
word?
• Preprocessing defines the core units of the
analysis.
• Token is a basic unit of the analysis. It can be a
word, a bigram, a sentence… With preprocessing
we define our tokens for the analysis.

8
9
Preprocessing terminology
• Stopwords: articles, conjunctions, pronouns, prepositions,
and other similar terms that need to be filtered before
additional analysis. The process of removing these words is
called Stop word filtering
• Term filtering: process to remove some normal terms in
specific domains
• Stemming: process to convert words into their stem.
• n-gram: group 𝑛 words into a term
• POS tagger: tagging tags each token with a corresponding
part-of-speech tag (sons → noun, plural, tag = NNS)

10
Two of the most frequent words are “would” and “could”.
If we decide these two words are not important for our analysis, it
would be good to omit them.
We can do this with custom filtering.

11
12
Context
• Concordance shows the text around the given word.

13
Bag of Words
• Bag of Words creates a table with words in columns and
documents in rows. Values are word occurrences in each
document. They can be binary, but normally they are
counts.

14
Term Frequency-Inverse Document Frequency

• Example: search web pages with keywords


“RapidMiner books that describe text mining.”
1. Give a high weightage to those keywords that are
relatively rare.
2. Give a high weightage to those web pages that contain
a large number of instances of the rare keywords.
• The highest-weighted web pages are the ones for
which the product of these two weights is the
highest
• The technique of calculating this weighting is called
term TF-IDF, which stands for term frequency-
inverse document frequency.

15
TF-IDF
𝑇𝐹-𝐼𝐷𝐹 = (𝑛𝑘 / 𝑛) × log₂(𝑁 / 𝑁𝑘)

• In the example, when the high TF for “that” is multiplied by


its corresponding low IDF, a low (or zero) TF-IDF will be
reached, whereas when the low TF for “RapidMiner” is
multiplied by its corresponding fairly high IDF, a relatively
higher TF-IDF would be obtained
• Typically, TF-IDF scores for every word in the set of
documents is calculated in the preprocessing step of the
three-step process described earlier.

16
TF-IDF
• Term frequency (TF): the ratio of the number of times a keyword
appears in a given document, 𝑛𝑘 (where 𝑘 is the keyword), to the total
number of terms in the document, 𝑛:
𝑇𝐹 = 𝑛𝑘 / 𝑛
• E.g. “that” has a fairly high TF score, and “RapidMiner” will have a much
lower TF score
• Inverse document frequency (IDF):
𝐼𝐷𝐹 = log₂(𝑁 / 𝑁𝑘)
• 𝑁 is the number of documents, and 𝑁𝑘 is the number of documents
that contain the keyword, 𝑘
• “that” would arguably appear in every document and, thus, the ratio
(𝑁/𝑁𝑘 ) would be close to 1, and the IDF score would be close to zero.
“RapidMiner” would possibly appear in a relatively fewer number of
documents and so the ratio (𝑁/𝑁𝑘 ) would be much greater than 1

17
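The formulas above in a few lines of Python (the toy corpus is made up;
log base 2 matches the slide):

import math
from collections import Counter

docs = [["rapidminer", "books", "that", "describe", "text", "mining"],
        ["that", "is", "that", "common"],
        ["text", "mining", "that", "works"]]

def tf_idf(term, doc, docs):
    tf = Counter(doc)[term] / len(doc)        # TF = n_k / n
    n_k = sum(term in d for d in docs)        # documents containing the term
    idf = math.log2(len(docs) / n_k)          # IDF = log2(N / N_k)
    return tf * idf

print(tf_idf("that", docs[0], docs))          # common word -> 0.0
print(tf_idf("rapidminer", docs[0], docs))    # rare word -> higher score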
Example
• Corpus

• Document vector or term-document matrix (TDM): the matrix
whose columns are the tokens found in the documents and
whose cells are the counts of the number of times a token
appears

18
Example
• TDM using TF

• TDM using TF-IDF

19
20
Document Embedding
• Word embedders are based on pre-trained deep models
that map words into a language space. In such a model,
words with similar meanings and words from the same family
(car, Toyota, vehicle) are placed close together.
Computing a vector for an individual word based on the
model is called embedding.
Orange uses fastText pre-trained models to embed words. It
then averages the word vectors to produce a single document
vector (one can also use sum, min, or max aggregation)

21
22
23
Clustering & Distances
• One common task in text mining is finding interesting groups of similar
documents. That is, we would like to identify documents that are similar
to each other.
• We normally use Euclidean distance to measure the similarity, but the
Euclidean distance is not the only option.
• There are many distance measures and Euclidean doesn’t work very
well for text.

An example of the similarity

24
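Cosine similarity (listed earlier among the proximity measures) is a
common choice for text; a minimal NumPy sketch with made-up
bag-of-words counts:

import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), 0.0 = orthogonal (unrelated)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

doc1 = np.array([2, 0, 1, 3, 0])   # word counts over a shared vocabulary
doc2 = np.array([1, 0, 0, 2, 0])
print(cosine_similarity(doc1, doc2))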
Word Enrichment
• Word Enrichment compares a subset of documents
against the entire corpus and finds statistically
significant words for the selected subset. It uses
hypergeometric p-value to find words, that are
overrepresented in the subset.

25
26
Classification

27
Predictions

28
Sentiment Analysis

More advanced techniques for sentiment analysis are based on models, usually with
deep neural networks that learn from a large amount of labelled texts.
29
30
DECISION SUPPORT SYSTEM
Lesson 11: Deep learning

Dr. Le, Hai Ha


Contents

 Deep Learning
 Loss Functions
 Convolution Neural Network
 Practice with Keras/Tensorflow

2
AI/ML/Deep Learning

3
Deep learning
“Deep learning allows computational models that are
composed of multiple processing layers to learn
representations of data with multiple levels of abstraction.”

“Deep-learning methods are representation-learning methods with multiple levels


of representation, obtained by composing simple but non-linear modules that
each transform the representation at one level (starting with the raw input) into a
representation at a higher, slightly more abstract level. […] The key aspect of
deep learning is that these layers of features are not designed by human
engineers: they are learned from data using a general-purpose learning
procedure.”

4
Image classification
• Image classification is the task of assigning an input
image one label from a fixed set of categories.
• This is one of the core problems in Computer Vision

5
The problem is simple for humans,
but not for computers
The image is 248 pixels wide, 400 pixels
tall, and has three color channels: Red,
Green, and Blue (RGB for short).
Therefore, the image consists of 248 x
400 x 3 numbers, or a total of 297,600
numbers.
An image classification model is a function
that maps these 297,600 numbers (the inputs) to one
class number, or to a probability vector over the
10 classes (the outputs).

6
Many challenges

(Slides 7–9 illustrate the challenges in figures.)
9
Parametric approach

10
Linear Classifier

11
Linear Classifier

12
Linear Classifier - Bias addition

13
Example

14
Example

15
Algebraic view

16
Visual view

17
Geometric view

18
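Since the linear-classifier slides are figures, here is a minimal
numeric sketch of f(x) = Wx + b in Python (random weights and
CIFAR-10-style shapes, for illustration only):

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(3072)                          # a flattened 32x32x3 image
W = rng.standard_normal((10, 3072)) * 0.01    # one weight row per class
b = np.zeros(10)                              # one bias per class

scores = W @ x + b                            # ten class scores
print(scores.argmax())                        # predicted class: highest score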
CIFAR data

19
Loss functions

(Slides 20–26 present loss-function examples in figures.)
26
Softmax classifier

(Slides 27–30 present the softmax classifier in figures.)
30
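A minimal sketch of the softmax mapping itself (scores to class
probabilities), in Python:

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return e / e.sum()                  # normalize so the outputs sum to 1

print(softmax(np.array([3.2, 5.1, -1.7])))   # probabilities over 3 classes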
Loss Functions
• Broadly, loss functions can be classified into two major
categories depending upon the type of learning task we
are dealing with — Regression losses and Classification
losses.
• Regression losses:
• Mean Square Error/Quadratic Loss/L2 Loss
• Mean Absolute Error/L1 Loss
• Mean Bias Error
• Classification losses:
• Hinge Loss/Multi class SVM Loss
• Cross Entropy Loss/Negative Log Likelihood

33
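Minimal NumPy sketches of the losses listed above (illustrative, not
any particular library’s implementation):

import numpy as np

def mse(y, y_hat):                   # Mean Square Error / Quadratic / L2 loss
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):                   # Mean Absolute Error / L1 loss
    return np.mean(np.abs(y - y_hat))

def hinge(scores, correct):          # Hinge loss / multi-class SVM loss
    margins = np.maximum(0, scores - scores[correct] + 1)
    margins[correct] = 0
    return margins.sum()

def cross_entropy(probs, correct):   # Cross entropy / negative log likelihood
    return -np.log(probs[correct])

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))   # 0.625
print(hinge(np.array([3.2, 5.1, -1.7]), correct=0))      # 2.9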
CNN - Convolution Neural Network

34
CNN - Convolution Neural Network
• Convolution operation

• Image * kernel

35
CNN - Convolution Neural Network

36
CNN - Convolution Neural Network

37
CNN - Convolution Neural Network
Network

Recognise handwritten digits

Training model with 60,000


samples
(sized 28x28 pixels)

38
Practice with Keras/Tensorflow
• Install Tensorflow
• Install Keras (optional)
• Practice recognising handwritten digits

39
MNIST Data
• MNIST is handwritten digit database with 60000
training data and 10000 testing data
• Each image is in the size of 28X28, grayscale
• Labeled from 0 to 9 respectively with the images of
handwritten

40
MNIST Dataset

41
General model

42
Model design
Convolution2D(32, (3, 3), activation='relu', input_shape=(1,28,28))

Convolution2D(32, (3, 3), activation='relu')

MaxPooling2D(pool_size=(2,2))

Dropout(0.25)

Flatten()

Dense(128, activation='relu')

Dropout(0.5)

Dense(10, activation='softmax')

43
Model summary
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 32, 26, 26) 320
_________________________________________________________________
conv2d_2 (Conv2D) (None, 32, 24, 24) 9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 12, 12) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 32, 12, 12) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 4608) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 589952
_________________________________________________________________
dropout_2 (Dropout) (None, 128) 0
_________________________________________________________________
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 600,810
Trainable params: 600,810
Non-trainable params: 0
_________________________________________________________________

44
Numpy and Keras import
import numpy as np
np.random.seed(123) # for reproducibility

from keras import backend as ke


ke.set_image_data_format('channels_first')

#2
from keras.models import Sequential

#3
from keras.layers import Convolution2D, MaxPooling2D, Dropout
from keras.layers import Dense, Flatten

#4
from keras.utils import np_utils

45
Load MNIST dataset
#5
from keras.datasets import mnist
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

#6
print(X_train.shape)
# (60000, 28, 28)

#7
from matplotlib import pyplot as plt
plt.imshow(X_train[0])
plt.show()

46
Data preparation

#8
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)

#9
print(X_train.shape)
# (60000, 1, 28, 28)

#10
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

47
Data preparation
#11
print(y_train.shape)
# (60000,)

#12
print(y_train[:10])
# [5 0 4 1 9 2 1 3 1 4]

#13
# Convert 1-dimensional class arrays to 10-dimensional
class matrices
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)

#14
print(Y_train.shape)
# (60000, 10)

48
Model design
#15
model = Sequential()
#16
model.add(Convolution2D(32, (3, 3), activation='relu',
input_shape=(1,28,28)))
#17
print(model.output_shape)
# (None, 32, 26, 26)

#18
model.add(Convolution2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Dropout(0.25))

#19
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

49
ReLU and Softmax
• ReLU

• Softmax

50
Compile and fit the model

#20
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

#21
model.fit(X_train, Y_train,
          batch_size=32, epochs=10, verbose=1)
# Epoch 1/10
# 7744/60000 [==>...........................] - ETA: 96s -
# loss: 0.5806 - acc: 0.8164

51
Categorical Crossentropy

52
Adam optimizer
• The Adam optimization algorithm is an extension to
stochastic gradient descent
• Learning rate is depend on Gradients
• Achieves good results fast.

53
Evaluate and save model

#22
score = model.evaluate(X_test, Y_test, verbose=0)
print(score)

#23
model.save('mnist-guide_10_model_adam.h5')

54
Model training continuously

#Load partly trained model


from keras.models import load_model
model=load_model('mnist-guide_10_model_adam.h5')

#21
model.fit(X_train, Y_train,
          batch_size=32, epochs=2, verbose=1)
# Epoch 1/2
# 7744/60000 [==>...........................] - ETA: 96s
# - loss: 0.5806 - acc: 0.8164

55
Prediction

#Load partly trained model


from keras.models import load_model
model=load_model('mnist-guide_10_model_adam_2.h5')

img=X_test[1]
#7
from matplotlib import pyplot as plt
plt.imshow(img)
plt.show()

img=img.reshape(1,1,28,28)
img=img.astype(float)/255  # skip this division if X_test was already scaled to [0, 1] above

pred=model.predict(img)
print(np.argmax(pred))

56
Other models

def baseline_model():
# create model
model = Sequential()
model.add(Conv2D(32, (5, 5), input_shape=(1, 28, 28),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
return model

57
Other models

def larger_model():
# create model
model = Sequential()
model.add(Conv2D(30, (5, 5), input_shape=(1, 28, 28),
activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(15, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
# Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
return model

58
DECISION SUPPORT SYSTEM

Lesson 12B: Introduction to PowerBI


Content

1. Introduction to Power BI
2. How to use Power BI Desktop

2
Power BI
• A set of applications and services that transform data from
different sources into information/knowledge
• Contains three components:

3
Power BI
• For large enterprises using on-premises servers, Power BI Report
Server is used instead of the Power BI Service

4
Power BI Desktop
• Is a free desktop application that allows connecting, transforming and
visualizing data.
• The usual process is to use Power BI desktop to create reports and then
publish them on the Power BI service to share with other users

5
Three views on Power BI desktop
• Report: create report.
• Data: view tables,
measures
• Model: manage data
model

6
Connecting to data sources
• Connect to multi data
sources

7
Transform, clean and create data model
• Using Power Query Editor

8
Power Query Editor

9
Create dashboard
Many visualizations

10
Create report

11
Share report

12
Power BI Service

13
How to use PowerBI desktop
• Download and setup:
– https://powerbi.microsoft.com/en-us/desktop/

– Or from Power BI Service (app.powerbi.com)

14
Start Power BI

15
Interface for report creating

3 views: report,
data, relationship

16
Connect to data

https://www.bankrate.com/retirement/best-and-worst-states-for-retirement/

17
Load or Transform data

18
Change data type

19
Apply transformed steps

20
E.g. remove last 10 rows

21
E.g. remove columns

22
Modify transformed steps

23
Data association
• https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations
• Choose Codes and abbreviations for U.S. states, federal district, territories,
and other regions

24
Remove column and filter

25
Change column name

26
Merge query

27
Merge query

28
Power BI desktop
• Follow the tutorial
https://docs.microsoft.com/vi-vn/power-bi/desktop-getting-started

29
Create report
• Using Purchase Orders.xlsx

30
DECISION SUPPORT SYSTEM
Lesson 12: Data Warehousing

Dr. Le, Hai Ha


Learning Objectives
• Understand the basic definitions and concepts of
data warehouses
• Learn different types of data warehousing
architectures; their comparative advantages and
disadvantages
• Describe the processes used in developing and
managing data warehouses
• Explain data warehousing operations
• Explain the role of data warehouses in decision
support

2
Learning Objectives
• Explain data integration and the extraction,
transformation, and load (ETL) processes
• Describe real-time (a.k.a. right-time and/or active)
data warehousing
• Understand data warehouse administration and
security issues

3
Main Data Warehousing (DW) Topics
• DW definitions
• Characteristics of DW
• Data Marts
• ODS, EDW, Metadata
• DW Framework
• DW Architecture & ETL Process
• DW Development
• DW Issues

4
Data Warehouse Defined
• A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format

• “The data warehouse is a collection of integrated,
subject-oriented databases designed to support DSS
functions, where each unit of data is non-volatile
and relevant to some moment in time”

5
Characteristics of DW
• Subject oriented
• Integrated
• Time-variant (time series)
• Nonvolatile
• Summarized
• Not normalized
• Metadata
• Web based, relational/multi-dimensional
• Client/server
• Real-time and/or right-time (active)
6
Data Mart
A departmental data warehouse that stores only
relevant data

• Dependent data mart


A subset that is created directly from a data warehouse

• Independent data mart


A small data warehouse designed for a strategic
business unit or a department

7
Data Warehousing Definitions
• Operational data stores (ODS)
A type of database often used as an interim area for a data
warehouse
• Oper marts
An operational data mart.
• Enterprise data warehouse (EDW)
A data warehouse for the enterprise.
• Metadata
Data about data. In a data warehouse, metadata describe
the contents of a data warehouse and the manner of its
acquisition and use

8
A Conceptual Framework for DW
(Figure, no data marts option: data sources such as ERP, legacy
systems, point of sale (POS), other OLTP/web systems, and external
data feed the ETL process (select, extract, transform, integrate,
load) into an enterprise data warehouse; data marts (marketing,
engineering, finance, ...) can be derived from it. Metadata and
middleware provide access for applications such as routine business
reporting, data/text mining, OLAP, dashboards, and custom-built
applications.)
9
Generic DW Architectures
• Three-tier architecture
1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access
and analyze data from the warehouse
• Two-tier architecture
The first two tiers of the three-tier architecture are
combined into one
… and sometimes there is only one tier

10
Generic DW Architectures

• Three-tier: Tier 1: client workstation; Tier 2: application
server; Tier 3: database server
• Two-tier: Tier 1: client workstation; Tier 2: application &
database server

11
DW Architecture Considerations
• Issues to consider when deciding which
architecture to use:
• Which database management system (DBMS)
should be used?
• Will parallel processing and/or partitioning be
used?
• Will data migration tools be used to load the data
warehouse?
• What tools will be used to support data retrieval
and analysis?

12
A Web-based DW Architecture

(Figure: the client’s web browser connects over the Internet/
intranet/extranet to a web server, which works with an application
server and the data warehouse to deliver web pages.)

13
Alternative DW Architectures
(a) Independent data marts architecture: source systems → ETL →
staging area → independent data marts (atomic/summarized data) →
end-user access and applications

(b) Data mart bus architecture with linked dimensional data marts:
source systems → ETL → staging area → dimensionalized data marts
linked by conformed dimensions (atomic/summarized data) → end-user
access and applications

(c) Hub-and-spoke architecture (corporate information factory):
source systems → ETL → staging area → normalized relational
warehouse (atomic data) and dependent data marts (summarized/some
atomic data) → end-user access and applications

14
Alternative DW Architectures
(d) Centralized data warehouse architecture: source systems → ETL →
staging area → normalized relational warehouse (atomic/some
summarized data) → end-user access and applications

(e) Federated architecture: existing data warehouses, data marts,
and legacy systems are combined through data mapping/metadata and
logical/physical integration of common data elements → end-user
access and applications

15
Which Architecture is the Best?
• Bill Inmon versus Ralph Kimball
• Enterprise DW versus Data Marts approach

Empirical study by Ariyachandra and Watson (2006)

16
Data Warehousing Architectures
Ten factors that potentially affect the architecture selection decision:

1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors

17
Enterprise Data Warehouse
(by Teradata Corporation)

18
Data Integration and the Extraction,
Transformation, and Load (ETL) Process

• Data integration
Integration that comprises three major processes: data
access, data federation, and change capture.
• Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from
source systems into a data warehouse
• Enterprise information integration (EII)
An evolving tool space that promises real-time data
integration from a variety of sources
• Service-oriented architecture (SOA)
A new way of integrating information systems

19
Data Integration and the Extraction,
Transformation, and Load (ETL) Process

Extraction, transformation, and load (ETL) process

(Figure: data from packaged applications, legacy systems, and other
internal applications flows through a transient data source and the
extract → transform → cleanse → load steps into the data warehouse
and its data marts.)

20
ETL
• Issues affecting the purchase of an ETL tool
• Data transformation tools are expensive
• Data transformation tools may have a long learning
curve
• Important criteria in selecting an ETL tool
• Ability to read from and write to an unlimited number of
data sources/architectures
• Automatic capturing and delivery of metadata
• A history of conforming to open standards
• An easy-to-use interface for the developer and the
functional user

21
Benefits of DW
• Direct benefits of a data warehouse
• Allows end users to perform extensive analysis
• Allows a consolidated view of corporate data
• Better and more timely information
• Enhanced system performance
• Simplification of data access
• Indirect benefits of data warehouse
• Enhance business knowledge
• Present competitive advantage
• Enhance customer service and satisfaction
• Facilitate decision making
• Help in reforming business processes

22
Data Warehouse Development
• Data warehouse development approaches
• Inmon Model: EDW approach (top-down)
• Kimball Model: Data mart approach (bottom-up)
• Which model is best?
• There is no one-size-fits-all strategy to DW
• One alternative is the hosted warehouse

• Data warehouse structure:


• The Star Schema vs. Relational

• Real-time data warehousing?

23
DW Development Approaches
(Kimball Approach) (Inmon Approach)

24
DW Structure: Star Schema
(a.k.a. Dimensional Modeling)

Star schema example for an automobile insurance data warehouse:
• Facts: a central table (Claim Information) containing (usually
summarized) information, plus foreign keys to access each
dimension table
• Dimensions (Driver, Automotive, Location, Time): how the data
will be sliced/diced (e.g., by location, time period, type of
automobile or driver)

25
Dimensional Modeling

Data cube
A two-dimensional,
three-dimensional, or
higher-dimensional
object in which each
dimension of the data
represents a measure of
interest
- Grain
- Drill-down
- Slicing

26
Best Practices for Implementing DW
• The project must fit with corporate strategy
• There must be complete buy-in to the project
• It is important to manage user expectations
• The data warehouse must be built incrementally
• Adaptability must be built in from the start
• The project must be managed by both IT and business
professionals (a business–supplier relationship must be
developed)
• Only load data that have been cleansed/high quality
• Do not overlook training requirements
• Be politically aware.

27
Risks in Implementing DW
• No mission or objective
• Quality of source data unknown
• Skills not in place
• Inadequate budget
• Lack of supporting software
• Source data not understood
• Weak sponsor
• Users not computer literate
• Political problems or turf wars
• Unrealistic user expectations
(Continued …)
28
Risks in Implementing DW – Cont.
• Architectural and design risks
• Scope creep and changing requirements
• Vendors out of control
• Multiple platforms
• Key people leaving the project
• Loss of the sponsor
• Too much new technology
• Having to fix an operational system
• Geographically distributed environment
• Team geography and language culture

29
Things to Avoid for Successful Implementation of
DW
• Starting with the wrong sponsorship chain
• Setting expectations that you cannot meet
• Engaging in politically naive behavior
• Loading the warehouse with information just
because it is available
• Believing that data warehousing database design is
the same as transactional DB design
• Choosing a data warehouse manager who is
technology oriented rather than user oriented

30
Real-time DW
(a.k.a. Active Data Warehousing)

• Enabling real-time data updates for real-time


analysis and real-time decision making is growing
rapidly
• Push vs. Pull (of data)
• Concerns about real-time BI
• Not all data should be updated continuously
• Mismatch of reports generated minutes apart
• May be cost prohibitive
• May also be infeasible

31
Evolution of DSS & DW

32
Active Data Warehousing
(by Teradata Corporation)

33
Comparing Traditional and Active DW

34
Data Warehouse Administration
• Due to its huge size and its intrinsic nature, a DW
requires especially strong monitoring in order to
sustain its efficiency, productivity and security.
• The successful administration and management of
a data warehouse entails skills and proficiency that
go past what is required of a traditional database
administrator.
• Requires expertise in high-performance software,
hardware, and networking technologies

35
DW Scalability and Security
• Scalability
• The main issues pertaining to scalability:
• The amount of data in the warehouse
• How quickly the warehouse is expected to grow
• The number of concurrent users
• The complexity of user queries
• Good scalability means that queries and other data-
access functions will grow linearly with the size of the
warehouse
• Security
• Emphasis on security and privacy

36
DECISION SUPPORT SYSTEM
Lesson 12C: Business Performance Management

Dr. Le, Hai Ha


Learning Objectives
• Understand the all-encompassing nature of
performance management (BPM)
• Understand the closed-loop processes linking
strategy to execution
• Strategize: Where Do We Want to Go?
• Plan: How Do We Get There?
• Monitor: How Are We Doing?
• Act /Adjust: What Do We Need to Do Differently?
• Describe some of the best practices in planning
and management reporting

2
Learning Objectives
• Describe the difference between performance
management and measurement
• Understand the role of methodologies in BPM
• Describe the basic elements of the balanced
scorecard and Six Sigma methodologies
• Describe the differences between scorecards and
dashboards
• Understand some of the basic concepts of
dashboards and dashboard design

3
Business Performance Management (BPM)
Overview

• Business Performance Management (BPM) is…


A real-time system that alert managers to potential
opportunities, impending problems, and threats,
and then empowers them to react through models
and collaboration
• Also called, corporate performance management
(CPM by Gartner Group), enterprise performance
management (EPM by Oracle), strategic enterprise
management (SEM by SAP)

4
Business Performance Management (BPM)
Overview

• BPM refers to the business processes,


methodologies, metrics, and technologies used by
enterprises to measure, monitor, and manage
business performance
• BPM encompasses three key components
• A set of integrated, closed-loop management and
analytic processes, supported by technology …
• Tools for businesses to define strategic goals and then
measure/manage performance against them
• Methods and tools for monitoring key performance
indicators (KPIs), linked to organizational strategy

5
BPM versus BI
• BPM is an outgrowth of BI and incorporates many
of its technologies, applications, and techniques
• Same companies market and sell them
• BI has evolved so that many of the original differences
between the two no longer exist (e.g., BI used to be
focused on departmental rather than enterprise-wide
projects)
• BI is a crucial element of BPM

• BPM = BI + Planning (a unified solution)

6
A Closed-Loop Process to Optimize Business
Performance

• Process Steps
1. Strategize
2. Plan
3. Monitor/analyze
4. Act/adjust

Each with its own


process steps…

7
Strategize: Where Do We Want to Go?
• Strategic planning
• Common tasks for the strategic planning process:
1. Conduct a current situation analysis
2. Determine the planning horizon
3. Conduct an environment scan
4. Identify critical success factors
5. Complete a gap analysis
6. Create a strategic vision
7. Develop a business strategy
8. Identify strategic objectives and goals

8
Strategize: Where Do We Want to
Go?
• Strategic objective
A broad statement or general course of action prescribing
targeted directions for an organization
• Strategic goal
A quantified objective with a designated time period
• Strategic vision
A picture or mental image of what the organization should
look like in the future
• Critical success factors (CSF)
Key factors that delineate the things that an organization
must excel at to be successful

9
Strategize: Where Do We Want to Go?
“90 percent of organizations fail to execute their strategies”
• The strategy gap
• Four sources for the gap between strategy and
execution:
1. Communication (enterprise-wide)
2. Alignment of rewards and incentives
3. Focus (concentrating on the core elements)
4. Resources

10
Plan: How Do We Get There?
• Operational planning
• Operational plan: plan that translates an organization’s
strategic objectives and goals into a set of well-defined
tactics and initiatives, resources requirements, and
expected results for some future time period (usually a
year)
• Operational planning can be
• Tactic-centric (operationally focused)
• Budget-centric plan (financially focused)

11
Plan: How Do We Get There?
• Financial planning and budgeting
• An organization’s strategic objectives and key metrics
should serve as top-down drivers for the allocation of an
organization’s tangible and intangible assets
• Resource allocations should be carefully aligned with the
organization’s strategic objectives and tactics in order to
achieve strategic success

12
Monitor: How Are We Doing?
• A comprehensive framework for monitoring performance
should address two key issues:
• What to monitor
• Critical success factors
• Strategic goals and targets
• How to monitor

13
Monitor: How Are We Doing?
• Diagnostic control system
A cybernetic system that has inputs,
a process for transforming the
inputs into outputs, a standard or
benchmark against which to
compare the outputs, and a
feedback channel to allow
information on variances between
the outputs and the standard to be
communicated and acted upon.

14
Monitor: How Are We Doing?
• Pitfalls of variance analysis
• The vast majority of the exception analysis focuses on
negative variances when functional groups or
departments fail to meet their targets
• Rarely are positive variances reviewed for potential
opportunities, and rarely does the analysis focus on
assumptions underlying the variance patterns
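
To make the second point concrete, here is a minimal sketch (hypothetical departments, figures, and thresholds, not from the text) of a variance review that flags positive as well as negative variances:

```python
# Hypothetical sketch: review budget variances in both directions,
# not just the negative ones (all figures and names are illustrative).

targets = {"Sales": 120_000, "Marketing": 30_000, "Support": 45_000}
actuals = {"Sales": 131_000, "Marketing": 36_500, "Support": 41_000}

THRESHOLD = 0.05  # flag variances beyond +/- 5 percent of target

for dept, target in targets.items():
    variance = (actuals[dept] - target) / target
    if variance <= -THRESHOLD:
        print(f"{dept}: {variance:+.1%} negative variance - investigate shortfall")
    elif variance >= THRESHOLD:
        print(f"{dept}: {variance:+.1%} positive variance - review for opportunity")
    else:
        print(f"{dept}: {variance:+.1%} within tolerance")
```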

15
Monitor: How Are We Doing?

What if strategic assumptions (not the operations) are wrong?

16
Act and Adjust: What Do We Need to Do
Differently?
• Success (or mere survival) depends on new
projects: creating new products, entering new
markets, acquiring new customers (or
businesses), or streamlining some process.
• Most new projects and ventures fail!
• Hollywood movies: 60% chance of failure
• Mergers and acquisitions: 60%
• IT projects (large-scale): 70%
• New food products: 80%
• New pharmaceutical products: 90% …

17
Act and Adjust:
What Do We Need to Do Differently?

Harrah’s Closed-Loop
Marketing Model

18
Act and Adjust:
What Do We Need to Do Differently?
• The Hackett Group’s benchmarking results indicate
that world class companies:
• Are significantly more efficient than their peers at
managing costs
• Focus on operational excellence and experience
significantly reduced rates of employee turnover
• Provide management with the tools and training to
leverage corporate information and to guide strategic
planning, budgeting, and forecasting
• Closely align strategic and tactical plans, enabling
functional areas to contribute more effectively…

19
Performance Measurement
• Performance measurement system
A system that assists managers in tracking the
implementations of business strategy by comparing actual
results against strategic goals and objectives
• Comprises systematic comparative methods that
indicate progress (or lack thereof) against goals

20
Performance Measurement
• Key performance indicator (KPI)
A KPI represents a strategic objective and a metric that
measures performance against a goal
• Distinguishing features of KPIs

◼ Strategy
◼ Targets
◼ Ranges
◼ Encodings
◼ Time frames
◼ Benchmarks
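
As a rough illustration only, these distinguishing features could be captured in a simple data structure; the field names and example values below are assumptions, not a standard KPI schema:

```python
from dataclasses import dataclass

@dataclass
class KPI:
    # Illustrative fields mirroring the distinguishing features above;
    # names and example values are assumptions, not a standard schema.
    name: str
    strategy: str        # the strategic objective the KPI supports
    target: float        # desired level of performance
    value_range: tuple   # (worst acceptable, best expected)
    encoding: str        # how performance is scored/color-coded
    time_frame: str      # period over which the target applies
    benchmark: str       # baseline the target is derived from

on_time_delivery = KPI(
    name="On-time delivery rate",
    strategy="Improve customer satisfaction",
    target=0.95,
    value_range=(0.85, 0.99),
    encoding="red/yellow/green by position in range",
    time_frame="per quarter",
    benchmark="industry average (0.90)",
)
print(on_time_delivery)
```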

21
Performance Measurement
• Key performance indicator (KPI)
Outcome KPIs (lagging indicators, e.g., revenues) vs.
Driver KPIs (leading indicators, e.g., sales leads)

• Operational areas covered by driver KPIs


• Customer performance
• Service performance
• Sales operations
• Sales plan/forecast

22
Performance Measurement
• Problems with existing performance measurement systems
• The most popular system in use is some variant of the
balanced scorecard (BSC)
• 50–90% of all companies have implemented BSC
• BSC methodology is a holistic vision of a measurement
system tied to the strategic direction of the organization
and based on a four-perspective view of the world:
• Financial measures supported by customer, internal,
and learning and growth metrics

23
Performance Measurement
• The drawbacks of using financial data as the core
of a performance measurement:
• Financial measures are usually reported by
organizational structures and not by the processes that
produced them
• Financial measures are lagging indicators, telling us
what happened, not why it happened or what is likely
to happen in the future
• Financial measures are often the product of allocations
that are not related to the underlying processes that
generated them
• Financial measures are focused on the short-term
returns…
24
Performance Measurement
• Good performance measures should:
• Be focused on key factors
• Be a mix of past, present, and future
• Balance the needs of all stakeholders (shareholders,
employees, partners, suppliers, …)
• Start at the top and trickle down to the bottom
• Have targets that are based on research and reality
rather than be arbitrary

25
BPM Methodologies
• An effective performance measurement system should help:
• Align top-level strategic objectives and bottom-level initiatives
• Identify opportunities and problems in a timely fashion
• Determine priorities and allocate resources accordingly
• Change measurements when the underlying processes and
strategies change
• Delineate responsibilities, understand actual performance relative
to responsibilities, and reward and recognize accomplishments
• Take action to improve processes and procedures when the data
warrant it
• Plan and forecast in a more reliable and timely fashion

26
BPM Methodologies
• Balanced scorecard (BSC)
A performance measurement and management
methodology that helps translate an organization’s financial,
customer, internal process, and learning and growth
objectives and targets into a set of actionable initiatives
• "The Balanced Scorecard: Measures That Drive
Performance” (HBR, 1992)

27
BPM Methodologies: Balanced Scorecard

28
BPM Methodologies
• The meaning of “balance”
• BSC is designed to overcome the limitations of
systems that are financially focused
• Nonfinancial objectives fall into one of three
perspectives:
1. Customer
2. Internal business process
3. Learning and growth

29
BPM Methodologies
• In BSC, the term “balance” arises because the combined
set of measures is supposed to encompass indicators
that are:
• Financial and nonfinancial
• Leading and lagging
• Internal and external
• Quantitative and qualitative
• Short term and long term

30
BPM Methodologies
• Aligning strategies and actions
• A six-step process
1. Developing and formulating a strategy
2. Planning the strategy
3. Aligning the organization
4. Planning the operations
5. Monitoring and learning
6. Testing and adapting the strategy

31
BPM Methodologies
• Strategy map
A visual display that
delineates the relationships
among the key organizational
objectives for all four BSC
perspectives

32
BPM Methodologies
• Six Sigma
A performance management methodology aimed at
reducing the number of defects in a business process to
as close to zero defects per million opportunities (DPMO)
as possible
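
For reference, DPMO is conventionally computed as defects ÷ (units × opportunities per unit), scaled to one million; a quick sketch with made-up figures:

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    # Standard DPMO formula; the input figures below are made up.
    return defects / (units * opportunities_per_unit) * 1_000_000

# e.g., 25 defects found in 1,000 invoices, each with 5 error opportunities
print(dpmo(defects=25, units=1_000, opportunities_per_unit=5))  # 5000.0
```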

33
BPM Methodologies
• Six Sigma
• The DMAIC performance model
A closed-loop business improvement model that
encompasses the steps of defining, measuring,
analyzing, improving, and controlling a process
• Lean Six Sigma
• Lean manufacturing / lean production
• Lean production versus six sigma
(See Table 9.2 for a summary)

34
BPM Methodologies
• How to Succeed in Six Sigma
• Six Sigma is integrated with business strategy
• Six Sigma supports business objectives
• Key executives are engaged in the process
• Project selection is based on value potential
• There is a critical mass of projects and resources
• Projects-in-process are actively managed
• Team leadership skills are emphasized
• Results are rigorously tracked
• BSC + Six Sigma = Success (see Tech. Ins. 9.3)

35
BPM Methodologies
• Integrating six sigma with BSC by
• Translating their strategy into quantifiable objectives
• Cascading objectives through the organization
• Setting targets based on the voice of the customer
• Implementing strategic projects using Six Sigma
• Executing processes in a consistent fashion to deliver
business results

36
BPM Architecture and Applications

• BPM architecture
• The logical and physical design of a system
• BPM system consists of three logical parts:
1. BPM Applications
2. Information Hub
3. Source Systems
• BPM system consists of three physical parts:
1. Database tier
2. Application tier
3. Client or user interface

37
BPM Architecture and Applications

38
BPM Architecture and Applications

• BPM applications
1. Strategy management
2. Budgeting, planning, and
forecasting
3. Financial consolidation
4. Profitability modeling and
optimization
5. Financial, statutory, and
management reporting

39
BPM Architecture and Applications
• Leading BPM Application Suites/Vendors
• SAP Business Objects Enterprise Performance
Management
• Oracle Hyperion Performance Management
• IBM Cognos BI and Financial Performance Management

• Microstrategy
• Microsoft…

40
Performance Dashboards
• Dashboards and scorecards both provide visual displays of
important information that is consolidated and arranged on
a single screen so that information can be digested at a
single glance and easily explored

41
Performance Dashboards

42
Performance Dashboards
• Dashboards versus scorecards
• Performance dashboards
Visual display used to monitor operational performance
(free form…)
• Performance scorecards
Visual display used to chart progress against strategic
and tactical goals and targets (predetermined
measures…)

43
Performance Dashboards
• Dashboards versus scorecards
• Performance dashboard is a multilayered application
built on a business intelligence and data integration
infrastructure that enables organizations to measure,
monitor, and manage business performance more
effectively
- Eckerson
• Three types of performance dashboards:
1. Operational dashboards
2. Tactical dashboards
3. Strategic dashboards

44
Performance Dashboards
• Dashboard design
• “The fundamental challenge of dashboard design is to
display all the required information on a single screen,
clearly and without distraction, in a manner that can be
assimilated quickly"
(Few, 2005)

45
Performance Dashboards
• What to look for in a dashboard
• Use of visual components (e.g., charts, performance bars, spark
lines, gauges, meters, stoplights) to highlight, at a glance, the data
and exceptions that require action
• Transparent to the user, meaning that they require minimal training
and are extremely easy to use
• Combine data from a variety of systems into a single, summarized,
unified view of the business
• Enable drill-down or drill-through to underlying data sources or
reports
• Present a dynamic, real-world view with timely data updates
• Require little, if any, customized coding to implement, deploy, and
maintain

46
DECISION SUPPORT SYSTEM
Lesson 13: Knowledge Management

Dr. Le, Hai Ha


Content

 Introduction to Knowledge Management


 Organizational Learning and Transformation
 Approaches to Knowledge Management
 Information Technology (IT) in Knowledge
Management
 KM System Implementation
 Roles of People in Knowledge Management
 Ensuring the Success of Knowledge
Management Efforts

2
MITRE’s View to the KM Process

[Figure: Enabling technologies for knowledge management — the KM
life-cycle (create, share, identify, modify, apply, act, with
measurement feedback) at the center, surrounded by enabling
technologies (collaboration, communication, Internet, extranet,
intranet, Web 2.0, data mining, expert systems, search engines,
artificial intelligence, machine learning, databases, portals,
knowledge representation, Web technologies) and the influencing
factors of culture, process, and practice]

3
Introduction to Knowledge Management
• Knowledge management concepts and definitions
– Knowledge management
The active management of the expertise in an
organization. It involves collecting, categorizing, and
disseminating knowledge
– Intellectual capital
The invaluable knowledge of an organization’s
employees

4
Introduction to Knowledge Management
• Knowledge is
– information that is contextual, relevant, and actionable
– understanding, awareness, or familiarity acquired through
education or experience
– anything that has been learned, perceived, discovered,
inferred, or understood.

In a knowledge management system, “knowledge is information in action”

5
Introduction to Knowledge Management

[Figure: Data, once processed, becomes information; information that
is relevant and actionable becomes knowledge (and, further, wisdom).
In short: knowledge is relevant and actionable processed data]

6
Introduction to Knowledge Management
• Characteristics of knowledge
– Extraordinary leverage and increasing returns
– Fragmentation, leakage and the need to refresh
– Uncertain value
– Uncertain value of sharing

• Knowledge-based economy
The economic shift from natural resources to
intellectual assets

7
Introduction to Knowledge Management
• Explicit and tacit knowledge
– Explicit (leaky) knowledge
Knowledge that deals with objective, rational, and technical
material (data, policies, procedures, software, documents,
etc.)
– Easily documented, transferred, taught and learned
– Examples…

8
Introduction to Knowledge Management
• Explicit and tacit knowledge
– Tacit (embedded) knowledge
Knowledge that is usually in the domain of subjective,
cognitive, and experiential learning
– It is highly personal and hard to formalize
– Hard to document, transfer, teach and learn
– Involves a lot of human interpretation
– Examples…

9
Introduction to Knowledge Management
• Knowledge management systems (KMS)
A system that facilitates knowledge management by
ensuring knowledge flow from the person(s) who
know to the person(s) who need to know throughout
the organization; knowledge evolves and grows
during the process

10
Organizational Learning and Transformation
• Learning organization
An organization capable of learning from its past experience,
implying the existence of an organizational memory and a
means to save, represent, and share it through its personnel
• Organizational memory
Repository of what the organization “knows”

11
Organizational Learning and Transformation
• Organizational learning
– Development of new knowledge and insights that have the
potential to influence organization’s behavior
– The process of capturing knowledge and making it
available enterprise-wide
– Need to establish corporate memory
– Modern IT helps…
– People issues are the most important!

12
Organizational Learning and Transformation
• Organizational culture
The aggregate attitudes in an organization concerning a
certain issue (e.g., technology, computers, DSS)
– How do people learn the “culture”?
– Is it explicit or implicit?
– Can culture be changed? How?
– Give some examples of corporate culture: Microsoft,
Google, Apple, HP, GM, …

13
Organizational Learning and Transformation
• Why people don’t like to share knowledge:
– Lack of time to share knowledge and time to identify
colleagues in need of specific knowledge
– Fear that sharing may jeopardize one’s job security
– Low awareness and realization of the value and benefit of
the knowledge others possess
– Dominance in sharing explicit over tacit knowledge
– Use of a strong hierarchy, position-based status, and formal
power
– Insufficient capture, evaluation, feedback, communication,
and tolerance of past mistakes

14
Organizational Learning and Transformation
• Why people don’t like to share knowledge:
– Differences in experience and education levels
– Lack of contact time and interaction between knowledge
sources and recipients
– Poor verbal/written communication and interpersonal skills
– Age, gender, cultural and ethical defenses
– Lack of a social network
– Ownership of intellectual property
– Lack of trust in people because they may misuse knowledge
or take unjust credit for it
– Perceived lack of accuracy/credibility of knowledge

15
Knowledge Management Activities
• Knowledge management initiatives and activities
– Most knowledge management initiatives have one of
three aims:
1. To make knowledge visible
2. To develop a knowledge-intensive culture
3. To build a knowledge infrastructure

16
Knowledge Management Activities
• Knowledge creation is the generation of new insights, ideas,
or routines
• Four modes of knowledge creation:
– Socialization
– Externalization
– Internalization
– Combination

– Analytics-based knowledge creation?

17
Knowledge Management Activities
• Knowledge sharing
– Knowledge sharing is the willful explication of one
person’s ideas, insights, experiences to another individual
either via an intermediary or directly
– In many organizations, information and knowledge are
not considered organizational resources to be shared but
individual competitive weapons to be kept private

18
Knowledge Management Activities
• Knowledge seeking
– Knowledge seeking (knowledge sourcing) is the search
for and use of internal organizational knowledge
– Lack of time or lack of reward may hinder the sharing of
knowledge or knowledge seeking

19
Approaches to Knowledge Management
• Process approach to knowledge management attempts
to codify organizational knowledge through
formalized controls, processes and technologies
– Focuses on explicit knowledge and IT
• Practice approach focuses on building the social
environments or communities of practice necessary to
facilitate the sharing of tacit understanding
– Focuses on tacit knowledge and socialization

20
Approaches to Knowledge Management
• Hybrid approaches to knowledge management
– The practice approach is used so that a repository stores
only explicit knowledge that is relatively easy to document
– Tacit knowledge initially stored in the repository is contact
information about experts and their areas of expertise
– Increasing the amount of tacit knowledge over time
eventually leads to the attainment of a true process approach
(hybrid mix shifting from 80/20 toward 50/50)

21
Knowledge Management -
A Demand Led Business Activity
• Supply-driven vs. demand-driven KM

[Figure: Supply-driven (technology) approach, DIKAR: obtain Data,
summarize it into Information, contextualize it into Knowledge, and
utilize it through Action to produce Results; read in reverse, the
demand-driven, business-value approach (RAKID) starts from desired
Results and Actions and works back to the Knowledge, Information,
and Data required]

22
Approaches to Knowledge Management
• Best practices
In an organization, the best methods for solving
problems. These are often stored in the
knowledge repository of a knowledge
management system
• Knowledge repository is the actual storage
location of knowledge in a knowledge
management system. Similar in nature to a
database, but generally text-oriented

23
Approaches to Knowledge Management
• A comprehensive view to the knowledge repository
[Figure: Knowledge Management Platform (KMP) — a knowledge portal
(Web-based end-user interface) on top; knowledge utilization served
by human experts, ad hoc search, and an intelligent broker; beneath
it the knowledge repository (knowledge/information/data nuggets);
knowledge creation feeds the repository via Web crawlers, data/text
mining tools, and manual entries, drawing on diverse information/data
sources (weather, medical info, finance, agriculture, industrial)]

24
Approaches to Knowledge Management
• Developing a knowledge repository
– Knowledge repositories are developed using several
different storage mechanisms in combination
– The most important aspects and difficult issues are making
the contribution of knowledge relatively easy for the
contributor and determining a good method for cataloging
the knowledge

25
Information Technology (IT) in Knowledge
Management
• The KMS cycle
– KMS usually follow a six-step cycle:
1. Create knowledge
2. Capture knowledge
3. Refine knowledge
4. Store knowledge
5. Manage knowledge
6. Disseminate knowledge

26
Information Technology (IT) in Knowledge
Management

[Figure: The Cyclic Model of Knowledge Management —
1. Create knowledge → 2. Capture knowledge → 3. Refine knowledge →
4. Store knowledge → 5. Manage knowledge → 6. Disseminate knowledge →
back to 1]

27
Information Technology (IT) in Knowledge
Management
• Components of KMS
– KMS are developed using three sets of core technologies:
1. Communication

2. Collaboration

3. Storage and retrieval

– Technologies that support KM


• Artificial intelligence
• Intelligent agents
• Knowledge discovery in databases
• Extensible Markup Language (XML)

28
Information Technology (IT) in Knowledge
Management
• Artificial intelligence
– AI methods used in KMS:
• Assist in and enhance searching knowledge
• Help for knowledge representation (e.g., ES)
• Help establish knowledge profiles of individuals and
groups
• Help determine the relative importance of knowledge
when it is contributed to and accessed from the
knowledge repository

29
Information Technology (IT) in Knowledge
Management
• AI methods used in KMS:
– Scan e-mail, documents, and databases to perform
knowledge discovery, determine meaningful relationships
and rules
– Identify patterns in data (usually through neural networks
and other data mining techniques)
– Forecast future results by using data/knowledge
– Provide advice directly from knowledge by using neural
networks or expert systems
– Provide a natural language or voice command–driven user
interface for a KMS

30
Information Technology (IT) in Knowledge
Management
• Intelligent agents
– Intelligent agents are software systems that learn how users
work and provide assistance in their daily tasks
– They are used to elicit and identify knowledge
• See ibm.com, gentia.com for examples
– Combined with enterprise knowledge portal to proactively
disseminate knowledge

31
Information Technology (IT) in Knowledge
Management
• Knowledge discovery in databases (KDD)
A machine learning process that performs rule induction, or a
related procedure to establish (or create) knowledge from large
databases
– a.k.a. Data Mining (and/or Text Mining)

32
Information Technology (IT) in Knowledge
Management
• Model marts
Small, generally departmental repositories of
knowledge created by employing knowledge-
discovery techniques on past decision instances.
Similar to data marts
• Model warehouses
Large, generally enterprise-wide repositories of
knowledge created by employing knowledge-
discovery techniques. Similar to data warehouses

33
Information Technology (IT) in Knowledge
Management
• Extensible Markup Language (XML)
– XML enables standardized representations of data
structures so that data can be processed appropriately by
heterogeneous information systems without case-by-case
programming or human intervention
• Web 2.0
– The evolution of the Web from statically disseminating
information to collaboratively creating and sharing
information

34
KM System Implementation
• Knowledge management products and vendors
– Knowware
Technology tools (software/hardware products) that support
knowledge management
– Software development companies / vendors
• Collaborative computing tools
• Knowledge servers
• Enterprise knowledge portals (EKP)
An electronic doorway into a knowledge management
system…

35
KM System Implementation
• Software development companies / vendors
– Electronic document management (EDM)
A method for processing documents electronically,
including capture, storage, retrieval, manipulation, and
presentation

– Content management systems (CMS)


An electronic document management system that
produces dynamic versions of documents, and
automatically maintains the current set for use at the
enterprise level

36
KM System Implementation
• Software development tools
– Knowledge harvesting tools
– Search engines
– Knowledge management suites
– Knowledge management consulting firms
– Knowledge management ASPs

37
KMS Implementation
• Integration of KMS with other business information systems
– With DSS/BI Systems
– With AI
– With databases and information systems
– With CRM systems
– With SCM systems
– With corporate intranets and extranets

38
Roles of People in Knowledge Management
• Chief knowledge officer (CKO)
The person in charge of a knowledge management
effort in an organization
– Sets KM strategic priorities
– Establishes a repository of best practices
– Gains a commitment from senior executives
– Teaches information seekers how to better elicit it
– Creates a process for managing intellectual assets
– Obtain customer satisfaction information
– Globalizes knowledge management

39
Roles of People in Knowledge Management
• Skills required of a CKO include:
– Interpersonal communication skills
– Leadership skills
– Business acumen
– Strategic thinking
– Collaboration skills
– The ability to institute effective educational programs
– An understanding of IT and its role in advancing
knowledge management

40
Roles of People in Knowledge Management
• The CEO, other chief officers, and managers
– The CEO is responsible for championing a knowledge
management effort
– The officers make available the resources needed to get the
job done
• CFO ensures that the financial resources are available
• COO ensures that people begin to embed knowledge management
practices into their daily work processes
• CIO ensures IT resources are available
– Managers also support the KM efforts by providing access
to sources of knowledge

41
Roles of People in Knowledge Management
• Community of practice (CoP)
A group of people in an organization with a common
professional interest, often self-organized for
managing knowledge in a knowledge management
system
– See Application Case 11.7 as an example of how Xerox
successfully improved practices and cost savings through
CoP

42
Roles of People in Knowledge Management
• KMS developers
– The team members who actually develop the system
– Internal + External

• KMS staff
– Enterprise-wide KMS require a full-time staff to catalog
and manage the knowledge

43
Ensuring the Success of Knowledge Management
Efforts
• Success stories of knowledge management
– Implementing a good KM strategy can:
• Reduce…
– loss of intellectual capital
– costs by decreasing the number of times the
company must repeatedly solve the same problem
– redundancy of knowledge-based activities
• Increase…
– productivity
– employee satisfaction

44
Ensuring the Success of Knowledge Management
Efforts
• MAKE: Most Admired Knowledge Enterprises
“Annually identifying the best practitioners of KM”
– Criteria (performance dimensions):
1. Creating a knowledge-driven corporate culture
2. Developing knowledge workers through leadership
3. Fostering innovation
4. Maximizing enterprise intellectual capital
5. Creating an environment for collaborative knowledge sharing
6. Facilitating organizational learning
7. Delivering value based on stakeholder knowledge
8. Transforming enterprise knowledge into stakeholders’ value

45
Ensuring the Success of Knowledge Management
Efforts
• MAKE: Most Admired Knowledge Enterprises
“Annually identifying the best practitioners of KM”
– 2008 Winners:
1. McKinsey & Company
2. Google
3. Royal Dutch Shell
4. Toyota
5. Wikipedia
6. Honda
7. Apple
8. Fluor
9. Microsoft
10. PricewaterhouseCoopers
11. Ernst & Young
12. IBM
13. Schlumberger
14. Samsung Group
15. BP
16. Unilever
17. Accenture
18. …

46
Ensuring the Success of Knowledge Management
Efforts
• Useful applications of KMS
– Finding experts electronically and using expert location
systems
• Expert location systems (know-who)
Interactive computerized systems that help employees
find and connect with colleagues who have expertise
required for specific problems—whether they are across
the county or across the room—in order to solve
specific, critical business problems in seconds

47
Ensuring the Success of Knowledge Management
Efforts
• Knowledge management valuation
– Financial metrics for knowledge management valuation
• Focus knowledge management projects on specific
business problems that can be easily quantified
• When the problems are solved, the value and benefits of
the system become apparent

48
Ensuring the Success of Knowledge Management
Efforts
• Knowledge management valuation
– Nonfinancial metrics for knowledge management
valuation—new ways to view capital when evaluating
intangibles:
• Customer goodwill
• External relationship capital
• Structural capital
• Human capital
• Social capital
• Environmental capital

49
Ensuring the Success of Knowledge Management
Efforts
• Causes of knowledge management failure
– The effort mainly relies on technology and does not address
whether the proposed system will meet the needs and
objectives of the organization and its individuals
– Lack of emphasis on human aspects
– Lack of commitment
– Failure to provide reasonable incentive for people to use
the system…

50
Ensuring the Success of Knowledge Management
Efforts
• Factors that lead to knowledge management success
– A link to a firm’s economic value, to demonstrate financial
viability and maintain executive sponsorship
– A technical and organizational infrastructure on which to
build
– A standard, flexible knowledge structure to match the way
the organization performs work and uses knowledge

51
Ensuring the Success of Knowledge Management
Efforts
• Factors that lead to knowledge management success
– A knowledge-friendly culture that leads directly to user
support
– A clear purpose and language, to encourage users to buy
into the system
– A change in motivational practices, to create a culture of
sharing
– Multiple channels for knowledge transfer

52
Ensuring the Success of Knowledge Management
Efforts
• Factors that lead to knowledge management success
– A significant process orientation and valuation to make a
knowledge management effort worthwhile
– Nontrivial motivational methods to encourage users to
contribute and use knowledge
– Senior management support

53
Last words on KM
• Knowledge is an intellectual asset
• IT is “just” an important enabler
• Proper management of knowledge is a necessary ingredient for
success

• Key issues:
– Organizational culture
– Executive sponsorship
– Measurement of success

54
DECISION SUPPORT SYSTEM
Lesson 15: Expert Systems

Dr. Le, Hai Ha


Content

 Artificial intelligence
 Expert Systems

2
Artificial Intelligence (AI)
• Artificial intelligence (AI)
– A subfield of computer science, concerned with symbolic
reasoning and problem solving

• AI has many definitions…


– Behavior by a machine that, if performed by a human
being, would be considered intelligent
– “…study of how to make computers do things at which, at
the moment, people are better”
– Theory of how the human mind works

3
AI Objectives
• Make machines smarter (primary goal)
• Understand what intelligence is
• Make machines more intelligent and useful

• Signs of intelligence…
– Learn or understand from experience
– Make sense out of ambiguous situations
– Respond quickly to new situations
– Use reasoning to solve problems
– Apply knowledge to manipulate the environment

4
Test for Intelligence
Turing Test for Intelligence
• A computer can be considered to be
smart only when a human interviewer,
“conversing” with both an unseen
human being and an unseen computer,
can not determine which is which.
- Alan Turing

5
Symbolic Processing
• AI …
– represents knowledge as a set of symbols,
– uses these symbols to represent problems, and
– applies various strategies and rules to manipulate symbols to
solve problems
• A symbol is a string of characters that stands for some real-
world concept (e.g., Product, consumer,…)
• Examples:
– (DEFECTIVE product)

– (LEASED-BY product customer) - LISP

– Tastes_Good (chocolate)

6
AI Concepts
• Reasoning
– Inferencing from facts and rules using heuristics or other search
approaches
• Pattern Matching
– Attempt to describe and match objects, events, or processes in terms of
their qualitative features and logical and computational relationships

• Knowledge Base
[Figure: Inputs (questions, problems, etc.) → Computer (Knowledge
Base + Inference Capability) → Outputs (answers, alternatives, etc.)]

7
Evolution of artificial intelligence

[Figure: Complexity of the solutions rising from low to high over
time — naïve solutions (1960s), general methods (1970s), domain
knowledge (1980s), hybrid solutions (1990s), embedded applications
(2000+)]

8
Artificial vs. Natural Intelligence
• Advantages of AI
– More permanent
– Ease of duplication and dissemination
– Less expensive
– Consistent and thorough
– Can be documented
– Can execute certain tasks much faster
– Can perform certain tasks better than many people
• Advantages of Biological Natural Intelligence
– Is truly creative
– Can use sensory input directly and creatively
– Can apply experience in different situations

9
The AI Field
▪ AI is many different sciences and technologies
▪ It is a collection of concepts and ideas
– Linguistics ▪ Chemistry
– Psychology ▪ Physics
▪ Statistics
– Philosophy
▪ Mathematics
– Computer Science
▪ Management Science
– Electrical Engineering ▪ Management Information Systems
– Mechanics ▪ Computer hardware and software
– Hydraulics ▪ Commercial, Government and
– Physics Military Organizations
– Optics ▪ …
– Management and Organization Theory
– Chemistry

10
The AI Field…

• AI provides the scientific foundation for many commercial
technologies
[Figure: The AI tree — application branches such as intelligent
tutoring, intelligent agents, autonomous robots, speech
understanding, natural language processing, automatic programming,
voice recognition, machine learning, neural networks, computer
vision, genetic algorithms, game playing, expert systems, and fuzzy
logic, rooted in disciplines including philosophy, mathematics,
computer science, engineering, human behavior, neurology, logic,
robotics, management science, sociology, information systems,
statistics, psychology, human cognition, pattern recognition,
linguistics, and biology]

11
AI Areas
• Major…
– Expert Systems
– Natural Language Processing
– Speech Understanding
– Robotics and Sensory Systems
– Computer Vision and Scene Recognition
– Intelligent Computer-Aided Instruction
– Automated Programming
– Neural Computing

• Additional…
– Game Playing, Language Translation
– Fuzzy Logic, Genetic Algorithms
– Intelligent Software Agents

12
AI is often transparent in many commercial products

• Anti-lock Braking Systems (ABS)


• Automatic Transmissions
• Video Camcorders
• Appliances
– Washers, Toasters, Stoves
• Help Desk Software
• Subway Control…

13
Expert Systems (ES)
• Is a computer program that attempts to imitate
expert’s reasoning processes and knowledge in
solving specific problems
• Most Popular Applied AI Technology
– Enhance Productivity
– Augment Work Forces
• Works best with narrow problem areas/tasks
• Expert systems do not replace experts, but
– Make their knowledge and experience more widely
available, and thus
– Permit non-experts to work better
14
Important Concepts in ES
• Expert
A human being who has developed a high level of
proficiency in making judgments in a specific domain
• Expertise
The set of capabilities that underlines the
performance of human experts, including
✓ extensive domain knowledge,
✓ heuristic rules that simplify and improve approaches to
problem solving,
✓ meta-knowledge and meta-cognition, and
✓ compiled forms of behavior that afford great economy
in a skilled performance

15
Important Concepts in ES
• Experts
– Degrees or levels of expertise
– Nonexperts outnumber experts often by 100 to 1
• Transferring Expertise
– From expert to computer to nonexperts via
acquisition, representation, inferencing, transfer
• Inferencing
– Knowledge = Facts + Procedures (Rules)
– Reasoning/thinking performed by a computer
• Rules (IF … THEN …)
• Explanation Capability (Why? How?)

16
Applications of Expert Systems
• DENDRAL
– Applied knowledge (i.e., rule-based reasoning)
– Deduced likely molecular structure of compounds
• MYCIN
– A rule-based expert system
– Used for diagnosing and treating bacterial infections
• XCON
– A rule-based expert system
– Used to determine the optimal information systems
configuration
• New applications: Credit analysis, Marketing, Finance,
Manufacturing, Human resources, Science and
Engineering, Education, …

17
Structures of Expert
Systems

[Figure: Structure of an expert system — in the development
environment, the knowledge engineer elicits knowledge from human
experts and other knowledge sources (information gathering) into the
knowledge base(s) of long-term knowledge rules; in the consultation
(runtime) environment, the inference engine fires inferencing rules,
a user interface with an explanation facility exchanges
questions/answers with the user, intermediate facts sit on the
blackboard (workspace) in short-term working memory, knowledge
refinement yields refined rules, and external data sources (via WWW)
supply data and facts]

18
Conceptual Architecture of a Typical Expert System
[Figure: Expertise flows from the expert(s) and information from
printed materials to the knowledge engineer, who builds structured
knowledge into the knowledge base(s); the inference engine, working
memory, control structure, and external interfaces (databases,
spreadsheets) support a user interface that exchanges
questions/answers, solutions, and updates with the user. Example
domain: modeling of manufacturing systems]

19
The Human Element in ES
• Expert
– Has the special knowledge, judgment, experience and
methods to give advice and solve problems
• Knowledge Engineer
– Helps the expert(s) structure the problem area by
interpreting and integrating human answers to questions,
drawing analogies, posing counter examples, and
enlightening conceptual difficulties
• User
• Others
– System Analyst, Builder, Support Staff, …

20
Structure of ES
• Three major components in ES are:
– Knowledge base
– Inference engine
– User interface
• ES may also contain:
– Knowledge acquisition subsystem
– Blackboard (workplace)
– Explanation subsystem (justifier)
– Knowledge refining system

21
Structure of ES
• Knowledge acquisition (KA)
The extraction and formulation of knowledge derived from various sources,
especially from experts (elicitation)
• Knowledge base
A collection of facts, rules, and procedures organized into schemas. The
assembly of all the information and knowledge about a specific field of
interest
• Blackboard (working memory)
An area of working memory set aside for the description of a current
problem and for recording intermediate results in an expert system
• Explanation subsystem (justifier)
The component of an expert system that can explain the system’s reasoning
and justify its conclusions

22
Knowledge Engineering (KE)
• A set of intensive activities encompassing the
acquisition of knowledge from human experts (and
other information sources) and converting this
knowledge into a repository (commonly called a
knowledge base)
• The primary goal of KE is
– to help experts articulate how they do what they do, and
– to document this knowledge in a reusable form
• Narrow versus Broad definition of KE?

23
The Knowledge Engineering Process
Problem or opportunity
→ Knowledge acquisition (yields raw knowledge)
→ Knowledge representation (yields codified knowledge)
→ Knowledge validation (yields validated knowledge)
→ Inferencing/reasoning (yields metaknowledge)
→ Explanation and justification
→ Solution
(with a feedback loop of corrections and refinements throughout)

24
Major Categories of Knowledge in ES
• Declarative Knowledge
– Descriptive representation of knowledge that relates to a specific
object.
– Shallow - Expressed in a factual statements
– Important in the initial stage of knowledge acquisition
• Procedural Knowledge
– Considers the manner in which things work under different sets of
circumstances
– Includes step-by-step sequences and how-to types of instructions
• Metaknowledge
– Knowledge about knowledge

25
How ES Work: Inference Mechanisms
• Knowledge representation and organization
– Expert knowledge must be represented in a computer-
understandable format and organized properly in the
knowledge base
– Different ways of representing human knowledge include:
• Production rules (*)
• Semantic networks
• Logic statements

26
Forms of Rules
• IF premise, THEN conclusion
– IF your income is high, THEN your chance of being audited by the IRS
is high
• Conclusion, IF premise
– Your chance of being audited is high, IF your income is high
• Inclusion of ELSE
– IF your income is high, OR your deductions are unusual, THEN your
chance of being audited by the IRS is high, ELSE your chance of
being audited is low
• More Complex Rules
– IF credit rating is high AND salary is more than $30,000, OR assets are
more than $75,000, AND pay history is not "poor," THEN approve a
loan up to $10,000, and list the loan in category "B.”
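
As a sketch, such a rule can be encoded directly as a condition over known facts; the attribute names are illustrative, and the grouping below is one reasonable reading of the AND/OR structure in the loan rule above:

```python
# A sketch of the complex loan rule above as executable logic.
# Reading: (high rating AND salary > $30,000) OR assets > $75,000,
# AND pay history is not "poor".
def loan_rule(credit_rating, salary, assets, pay_history):
    premise = (((credit_rating == "high" and salary > 30_000)
                or assets > 75_000)
               and pay_history != "poor")
    if premise:
        return {"approve_up_to": 10_000, "category": "B"}
    return None  # rule does not fire

print(loan_rule("high", 42_000, 20_000, "good"))  # fires
print(loan_rule("low", 25_000, 20_000, "good"))   # does not fire -> None
```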

27
Knowledge and Inference Rules
• Two types of rules are common in AI:
– Knowledge rules and Inference rules
• Knowledge rules (declarative rules), state all the facts and
relationships about a problem
• Inference rules (procedural rules), advise on how to solve a
problem, given that certain facts are known
• Inference rules contain rules about rules (metarules)
• Knowledge rules are stored in the knowledge base
• Inference rules become part of the inference engine
• Example:
– IF needed data is not known THEN ask the user
– IF more than one rule applies THEN fire the one with the highest
priority value first

28
How ES Work: Inference Mechanisms
Inference is the process of chaining multiple rules
together based on available data
– Forward chaining
A data-driven search in a rule-based system
If the premise clauses match the situation, then the process
attempts to assert the conclusion
– Backward chaining
A goal-driven search in a rule-based system
It begins with the action clause of a rule and works backward
through a chain of rules in an attempt to find a verifiable set
of condition clauses

29
Inferencing with Rules:
Forward and Backward Chaining
• Firing a rule
– When all of the rule's hypotheses (the “if parts”) are satisfied, a rule
is said to be FIRED
– Inference engine checks every rule in the knowledge base in a forward
or backward direction to find rules that can be FIRED
– Continues until no more rules can fire, or until a goal is achieved

30
Backward Chaining
• Goal-driven: Start from a potential conclusion (hypothesis),
then seek evidence that supports (or contradicts with) it
• Often involves formulating and testing intermediate
hypotheses (or sub-hypotheses)

Knowledge Base — Investment Decision
Rule 1: A & C -> E
Rule 2: D & C -> F
Rule 3: B & E -> F (invest in growth stocks)
Rule 4: B -> C
Rule 5: F -> G (invest in IBM)
Variable definitions:
A = Have $10,000
B = Younger than 30
C = Education at college level
D = Annual income > $40,000
E = Invest in securities
F = Invest in growth stocks
G = Invest in IBM stock
[Figure: backward-chaining search — to prove the goal G, R5 requires
F; F is sought via R2 (D & C) or R3 (B & E); E via R1 (A & C); C via
R4 (B); the diagram numbers the resulting sequence of rule firings]
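
A minimal backward chainer over these rules might look like the following sketch; the encoding (dictionaries and a recursive prove function) is ours, not the chapter's:

```python
# Minimal backward-chaining sketch over the investment rules above
# (rule/fact encoding is ours; the rules themselves are from the slide).
RULES = {  # conclusion -> list of alternative premise sets
    "C": [{"B"}],                    # R4
    "E": [{"A", "C"}],               # R1
    "F": [{"D", "C"}, {"B", "E"}],   # R2, R3
    "G": [{"F"}],                    # R5
}
FACTS = {"A", "B"}  # facts known up front

def prove(goal, seen=frozenset()):
    """Try to prove `goal` from FACTS by working backward through RULES."""
    if goal in FACTS:
        return True
    if goal in seen:  # guard against circular reasoning
        return False
    for premises in RULES.get(goal, []):
        if all(prove(p, seen | {goal}) for p in premises):
            return True
    return False

print(prove("G"))  # True: G <- F <- (B & E); E <- (A & C); C <- B
```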

31
Forward Chaining
• Data-driven: Start from available information as it becomes
available, then try to draw conclusions
• Which One to Use?
– If all facts available up front - forward chaining
– Diagnostic problems - backward chaining

Knowledge Base (same rules as above)
Rule 1: A & C -> E
Rule 2: D & C -> F
Rule 3: B & E -> F (invest in growth stocks)
Rule 4: B -> C
Rule 5: F -> G (invest in IBM)
FACTS: A is TRUE, B is TRUE
[Figure: forward-chaining sequence of rule firings — 1: R4 (B -> C),
2: R1 (A & C -> E), 3: R3 (B & E -> F), 4: R5 (F -> G)]
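
A matching forward-chaining sketch (again, the encoding is ours) sweeps the rule base until no new facts can be derived, reproducing the firing sequence R4, R1, R3, R5:

```python
# Minimal forward-chaining sketch for the same knowledge base
# (encoding is ours; rules and facts are from the slide).
RULES = [  # (name, premises, conclusion)
    ("R1", {"A", "C"}, "E"),
    ("R2", {"D", "C"}, "F"),
    ("R3", {"B", "E"}, "F"),
    ("R4", {"B"}, "C"),
    ("R5", {"F"}, "G"),
]
facts = {"A", "B"}

fired = True
while fired:  # keep sweeping the rule base until nothing new fires
    fired = False
    for name, premises, conclusion in RULES:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)
            print(f"fired {name}: derived {conclusion}")
            fired = True

print(sorted(facts))  # ['A', 'B', 'C', 'E', 'F', 'G']
```

A real engine would add the conflict-resolution strategies discussed on the next slide (goal tests, rule priorities, specificity, recency).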

32
Inferencing Issues
• How do we choose between BC and FC
– Follow how a domain expert solves the problem
• If the expert first collect data then infer from it
=> Forward Chaining
• If the expert starts with a hypothetical solution and then attempts to find
facts to prove it => Backward Chaining
• How to handle conflicting rules
IF A & B THEN C
IF X THEN C
1. Establish a goal and stop firing rules when goal is achieved
2. Fire the rule with the highest priority
3. Fire the most specific rule
4. Fire the rule that uses the data most recently entered

33
Inferencing with Uncertainty
Theory of Certainty (Certainty Factors)
• Certainty Factors and Beliefs
• Uncertainty is represented as a Degree of Belief
• Express the Measure of Belief
• Manipulate degrees of belief while using knowledge-based
systems
• Certainty Factors (CF) express belief in an event based on
evidence (or the expert's assessment)
– 1.0 or 100 = absolute truth (complete confidence)
– 0 = certain falsehood

• CFs are NOT probabilities


• CFs need not sum to 100

34
Inferencing with Uncertainty
Combining Certainty Factors
• Combining Several Certainty Factors in One Rule where parts are
combined using AND and OR logical operators
• AND
IF inflation is high, CF = 50 percent, (A), AND
unemployment rate is above 7, CF = 70 percent, (B), AND
bond prices decline, CF = 100 percent, (C)
THEN stock prices decline
CF(A, B, and C) = Minimum[CF(A), CF(B), CF(C)]
=>
– The CF for “stock prices to decline” = 50 percent
– The chain is as strong as its weakest link

35
Inferencing with Uncertainty
Combining Certainty Factors
• OR
IF inflation is low, CF = 70 percent, (A), OR
bond prices are high, CF = 85 percent, (B)
THEN stock prices will be high
CF(A, B) = Maximum[CF(A), CF(B)]
=>
– The CF for “stock prices to be high” = 85 percent

– Notice that in OR only one IF premise needs to be true

36
Inferencing with Uncertainty
Combining Certainty Factors
• Combining two or more rules
– Example:
• R1: IF the inflation rate is less than 5 percent,
THEN stock market prices go up (CF = 0.7)
• R2: IF unemployment level is less than 7 percent,
THEN stock market prices go up (CF = 0.6)
– Inflation rate = 4 percent and the unemployment level = 6.5
percent
– Combined Effect
• CF(R1,R2) = CF(R1) + CF(R2)[1 - CF(R1)]; or
• CF(R1,R2) = CF(R1) + CF(R2) - CF(R1)  CF(R2)

37
Inferencing with Uncertainty
Combining Certainty Factors
• Example continued…
Given CF(R1) = 0.7 AND CF(R2) = 0.6, then:
CF(R1,R2) = 0.7 + 0.6(1 - 0.7) = 0.7 + 0.6(0.3) = 0.88
• Expert System tells us that there is an 88 percent chance that stock prices
will increase
• For a third rule to be added
CF(R1,R2,R3) = CF(R1,R2) + CF(R3) [1 - CF(R1,R2)]

R3: IF bond price increases THEN stock prices go up (CF = 0.85)

Assuming all rules are true in their IF part, the chance that stock prices will
go up is

CF(R1,R2,R3) = 0.88 + 0.85 (1 - 0.88) = 0.982

38
Inferencing with Uncertainty
Certainty Factors - Example
• Rules
R1: IF blood test result is yes
THEN the disease is malaria (CF 0.8)
R2: IF living in malaria zone
THEN the disease is malaria (CF 0.5)
R3: IF bit by a flying bug
THEN the disease is malaria (CF 0.3)

◼ Questions
What is the CF for having malaria (as it is calculated by the ES), if
1. The first two rules are considered to be true ?
2. All three rules are considered to be true?

39
Inferencing with Uncertainty
Certainty Factors - Example
◼ Questions
What is the CF for having malaria (as it is calculated by the ES), if
1. The first two rules are considered to be true ?
2. All three rules are considered to be true?

◼ Answer 1
1. CF(R1, R2) = CF(R1) + CF(R2) * (1 – CF(R1))
= 0.8 + 0.5 * (1 - 0.8) = 0.8 + 0.1 = 0.9
2. CF(R1, R2, R3) = CF(R1, R2) + CF(R3) * (1 - CF(R1, R2))
= 0.9 + 0.3 * (1 - 0.9) = 0.9 + 0.03 = 0.93
◼ Answer 2
1. CF(R1, R2) = CF(R1) + CF(R2) – (CF(R1) * CF(R2))
= 0.8 + 0.5 – (0.8 * 0.5) = 1.3 – 0.4 = 0.9
2. CF(R1, R2, R3) = CF(R1, R2) + CF(R3) – (CF(R1, R2) * CF(R3))
= 0.9 + 0.3 – (0.9 * 0.3) = 1.2 – 0.27 = 0.93
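
The certainty-factor arithmetic above is easy to sketch in code; the functions below implement the slide formulas and reproduce the malaria numbers:

```python
# Sketch of the certainty-factor arithmetic (formulas from the slides).
def cf_and(*cfs):           # premises joined by AND
    return min(cfs)

def cf_or(*cfs):            # premises joined by OR
    return max(cfs)

def cf_combine(cf1, cf2):   # two rules supporting the same conclusion
    return cf1 + cf2 * (1 - cf1)   # equivalently: cf1 + cf2 - cf1 * cf2

print(cf_and(0.5, 0.7, 1.0))  # 0.5  (AND: the chain is as strong as its weakest link)
print(cf_or(0.7, 0.85))       # 0.85 (OR: only one premise needs to be true)

# Malaria example: R1 = 0.8, R2 = 0.5, R3 = 0.3
cf12 = cf_combine(0.8, 0.5)
print(round(cf12, 2))                   # 0.9
print(round(cf_combine(cf12, 0.3), 2))  # 0.93
```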

40
Explanation as a Metaknowledge
• Explanation
– Human experts justify and explain their actions
… so should ES
– Explanation: an attempt by an ES to clarify reasoning,
recommendations, other actions (asking a question)
– Explanation facility = Justifier

• Explanation Purposes…
– Make the system more intelligible
– Uncover shortcomings of the knowledge bases (debugging)
– Explain unanticipated situations
– Satisfy users’ psychological and/or social needs
– Clarify the assumptions underlying the system's operations
– Conduct sensitivity analyses

41
Two Basic Explanations
• Why Explanations - Why is a fact requested?
• How Explanations - To determine how a certain
conclusion or recommendation was reached
– Some simple systems - only at the final conclusion
– Most complex systems provide the chain of rules used to
reach the conclusion

• Explanation is essential in ES
• Used for training and evaluation

42
How ES Work: Inference Mechanisms
• Development process of ES
– A typical process for developing ES includes:
• Knowledge acquisition
• Knowledge representation
• Selection of development tools
• System prototyping
• Evaluation
• Improvement /Maintenance

43
Development of ES
• Defining the nature and scope of the problem
– Rule-based ES are appropriate when the nature of the
problem is qualitative, knowledge is explicit, and experts
are available to solve the problem effectively and provide
their knowledge

• Identifying proper experts


– A proper expert should have a thorough understanding of:
• Problem-solving knowledge
• The role of ES and decision support technology
• Good communication skills

44
Development of ES
• Acquiring knowledge
– Knowledge engineer
An AI specialist responsible for the technical side of
developing an expert system. The knowledge engineer
works closely with the domain expert to capture the
expert’s knowledge
– Knowledge engineering (KE)
The engineering discipline in which knowledge is
integrated into computer systems to solve complex
problems normally requiring a high level of human
expertise

45
Development of ES
• Selecting the building tools
– General-purpose development environment
– Expert system shell (e.g., ExSys or Corvid)…
A computer program that facilitates relatively easy
implementation of a specific expert system

– Choosing an ES development tool


• Consider the cost benefits
• Consider the functionality and flexibility of the tool
• Consider the tool's compatibility with the existing information
infrastructure
• Consider the reliability of and support from the vendor

46
Development of ES
• Coding (implementing) the system
– The major concern at this stage is whether the coding (or
implementation) process is properly managed to avoid
errors…
• Assessment of an expert system
• Evaluation
• Verification
• Validation

47
Development of ES -
Validation and Verification of the ES
• Evaluation
– Assess an expert system's overall value
– Analyze whether the system would be usable, efficient and cost-
effective
• Validation
– Deals with the performance of the system (compared to the
expert's)
– Was the “right” system built (acceptable level of accuracy?)
• Verification
– Was the system built "right"?
– Was the system correctly implemented to specifications?

48
Problem Areas Addressed by ES
• Interpretation systems
• Prediction systems
• Diagnostic systems
• Repair systems
• Design systems
• Planning systems
• Monitoring systems
• Debugging systems
• Instruction systems
• Control systems, …

49
ES Benefits
• Capture Scarce Expertise
• Increased Productivity and Quality
• Decreased Decision Making Time
• Reduced Downtime via Diagnosis
• Easier Equipment Operation
• Elimination of Expensive Equipment
• Ability to Solve Complex Problems
• Knowledge Transfer to Remote Locations
• Integration of Several Experts' Opinions
• Can Work with Uncertain Information
• … more …
50
Problems and Limitations of ES
• Knowledge is not always readily available
• Expertise can be hard to extract from humans
– Fear of sharing expertise

– Conflicts arise in dealing with multiple experts

• ES work well only in a narrow domain of knowledge


• Experts’ vocabulary often highly technical
• Knowledge engineers are rare and expensive
• Lack of trust by end-users
• ES sometimes produce incorrect recommendations
• … more …

51
ES Success Factors
• Most Critical Factors
– Having a Champion in Management
– User Involvement and Training
– Justification of the Importance of the Problem
– Good Project Management
• Plus
– The level of knowledge must be sufficiently high
– There must be (at least) one cooperative expert
– The problem must be mostly qualitative
– The problem must be sufficiently narrow in scope
– The ES shell must be high quality, with friendly user interface,
and naturally store and manipulate the knowledge

52
Longevity of Commercial ES
• Only about 1/3 survived more than five years
• Generally ES failed due to managerial issues
– Lack of system acceptance by users
– Inability to retain developers
– Problems in transitioning from development to
maintenance (lack of refinement)
– Shifts in organizational priorities
• Proper management of ES development and
deployment could resolve most of them

53
Decision Support Systems

Advanced Intelligent Systems


Learning Objectives
• Understand the basic concepts and definitions of machine-
learning
– Learn the commonalities and differences between machine learning and
human learning
– Know popular machine-learning methods
• Know the concepts and definitions of case-based reasoning
systems (CBR)
• Be aware of the MSS applications of CBR
• Know the concepts behind and applications of genetic
algorithms
• Understand fuzzy logic and its application in designing
intelligent systems

55
Learning Objectives
• Understand the concepts behind support vector machines and
their applications in developing advanced intelligent systems
• Know the commonalities and differences between artificial
neural networks and support vector machines
• Understand the concepts behind intelligent software agents and
their use, capabilities, and limitations in developing advanced
intelligent systems
• Explore integrated intelligent support systems

56
Machine Learning Concepts and Definitions
• Machine learning (ML) is a family of artificial
intelligence technologies that is primarily concerned
with the design and development of algorithms that
allow computers to “learn” from historical data
– ML is the process by which a computer learns from
experience
– It differs from knowledge acquisition in ES: instead of
relying on experts (and their willingness) ML relies on
historical facts
– ML helps in discovering patterns in data

57
Machine Learning Concepts and Definitions
• Learning is the process of self-improvement, which is
a critical feature of intelligent behavior
• Human learning is a combination of many
complicated cognitive processes, including:
– Induction
– Deduction
– Analogy
– Other special procedures related to observing and/or
analyzing examples

58
Machine Learning Concepts and Definitions
• Machine Learning versus Human Learning
– Some ML behavior can challenge the performance of
human experts (e.g., playing chess)
– Although ML sometimes matches human learning
capabilities, it is not able to learn as well as humans or in
the same way that humans do
– There is no claim that machine learning can be applied in a
truly creative way
– ML systems are not anchored in any formal theories (why
they succeed or fail is not clear)
– ML success is often attributed to manipulation of symbols
(rather than mere numeric information)

59
Machine Learning Methods

Machine Learning
• Supervised Learning
– Classification: Decision Tree, Neural Networks, Support Vector
Machines, Case-based Reasoning, Rough Sets, Discriminant
Analysis, Logistic Regression, Rule Induction
– Regression: Regression Trees, Neural Networks, Support Vector
Machines, Linear Regression, Non-linear Regression, Bayesian
Linear Regression
• Reinforcement Learning: Q-Learning, Adaptive Heuristic Critic
(AHC), State-Action-Reward-State-Action (SARSA), Genetic
Algorithms, Gradient Descent
• Unsupervised Learning
– Clustering / Segmentation: SOM (Neural Networks), Adaptive
Resonance Theory, Expectation Maximization, K-Means, Genetic
Algorithms
– Association: Apriori, ECLAT Algorithm, FP-Growth,
One-attribute Rule, Zero-attribute Rule

60
Case-Based Reasoning (CBR)
• Case-based reasoning (CBR)
A methodology in which knowledge and/or inferences are
derived directly from historical cases/examples
– Analogical reasoning (= CBR)
Determining the outcome of a problem with the use of
analogies. A procedure for drawing conclusions about a
problem by using past experience directly (no intermediate
model?)
– Inductive learning
A machine learning approach in which rules (or models)
are inferred from the historic data

61
CBR vs. Rule-Based Reasoning
Criterion              | Rule-Based Reasoning           | Case-Based Reasoning
Knowledge unit         | Rule                           | Case
Granularity            | Fine                           | Coarse
Explanation mechanism  | Backtrack of rule firings      | Precedent cases
Advantages             | Flexible use of knowledge;     | Rapid knowledge acquisition;
                       | potentially optimal answers    | explanation by examples
Disadvantages          | Possible errors due to misfit  | Suboptimal solutions;
                       | rules and problem parameters;  | redundant knowledge base;
                       | black-box answers              | computationally expensive

62
Case-Based Reasoning (CBR)
• CBR is based on the premise that new problems are often
similar to previously encountered problems, and, therefore,
past successful solutions may be of use in solving the
current situation
[Figure: all cases are classified as repetitive, exceptional, or
unique; ossified cases yield rules via induction, pragmatic cases
yield experiences via indexing, and stories yield lessons via
induction and indexing — together forming knowledge]

63
The CBR Process
• The CBR Process (4R)
– Retrieve
– Reuse
– Revise
– Retain
[Figure: a new case (characteristics) arrives; (1) assign indexes to
the new case using indexing rules; (2) retrieve similar old cases
from the case library using matching/similarity rules; (3) modify
and/or refine the search using modification/repair rules, yielding
proposed solution(s) based on prior solutions to similar cases;
(4) test the proposed solution(s); if the solution works, (5a) deploy
it, (5b) assign indexes, and (5c) store/catalog the new case in the
library; if not, (6a) explain and learn from the failure through
causal analysis and (6b) repair the solution]
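
As an illustration of the retrieve step, here is a toy nearest-neighbor matcher over a case library; the cases, features, and distance measure are all made-up assumptions:

```python
# Toy sketch of the "retrieve" step: nearest-neighbor matching of a new
# case against a case library (cases, features, and weights are made up).
case_library = [
    {"features": {"loan_amount": 10, "credit": 0.9, "tenure": 5}, "solution": "approve"},
    {"features": {"loan_amount": 50, "credit": 0.4, "tenure": 1}, "solution": "reject"},
    {"features": {"loan_amount": 20, "credit": 0.7, "tenure": 3}, "solution": "approve with guarantor"},
]

def distance(a, b):
    # Simple unweighted Euclidean distance over shared numeric features
    return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5

def retrieve(new_case):
    return min(case_library, key=lambda c: distance(new_case, c["features"]))

new_case = {"loan_amount": 18, "credit": 0.75, "tenure": 4}
print(retrieve(new_case)["solution"])  # reuses the most similar prior solution
```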

64
Case-Based Reasoning (CBR)
• Advantages of using CBR
– Knowledge acquisition is improved
– System development time is faster
– Existing data and knowledge are leveraged
– Formalized domain knowledge is not required
– Experts feel better discussing concrete cases
– Explanation becomes easier
– Acquisition of new cases is easy
– Learning can occur from both successes and failures
– …more…

65
Case-Based Reasoning (CBR)
• Issues and challenges of CBR
– What makes up a case?
– How can we represent cases in memory?
– Automatic case-adaptation can be very complex!
– How is memory organized (the indexing rules)?
– How can we perform efficient searching (i.e., knowledge
navigation) of the cases?
– How can we organize the cases?
– The quality of the results is heavily dependent on the
indexes used
– … more …

66
Case-Based Reasoning (CBR)
• Success factors for CBR systems
– Determine specific business objectives
– Understand your end users (the customers)
– Obtain top management support
– Develop an understanding of the problem domain
– Design the system carefully and appropriately
– Plan an ongoing knowledge-management process
– Establish achievable returns on investment (ROI) and
measurable metrics
– Plan and execute a customer-access strategy
– Expand knowledge generation and access across the
enterprise

67
Genetic Algorithms
• It is a type of machine learning technique
• Mimics the biological process of evolution
• Genetic algorithms
– Software programs that learn in an evolutionary manner, similar to the
way biological systems evolve

• An efficient, domain-independent search heuristic for a broad
spectrum of problem domains

• Main theme: Survival of the fittest
– Moving toward better and better solutions by letting only the
fittest parents create the future generations

68
Evolutionary Algorithm

[Figure: one generational cycle of an evolutionary algorithm. The
current generation of encoded solutions (bit strings such as
10010110) passes its best members unchanged to the next generation
(elitism); the remainder of the next generation is produced by
selection of parents followed by reproduction through crossover
and mutation.]

69
GA Structure and GA Operators
• Each candidate solution is called a chromosome
• A chromosome is a string of genes
• Chromosomes can copy themselves, mate, and mutate via
evolution
• In GA we use specific genetic operators
– Reproduction
• Crossover
• Mutation

[Figure: GA flow. Start → represent the problem's chromosome
structure → generate initial solutions (the initial generation) →
test whether the solution is satisfactory. If yes, stop and deploy
the solution; if no, select elite solutions and carry them into the
next generation, select parents to reproduce, apply crossover and
mutation to create offspring, and test the next generation of
solutions.]

70
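The cycle in the figure can be sketched in a few lines of Python. The
sketch below evolves a bit-string chromosome toward all ones; the
population size, elite count, mutation rate, and fitness function are
illustrative assumptions, not a GA specified in the lecture.

```python
# A minimal sketch of the GA cycle (elitism, selection, crossover,
# mutation) on a toy problem: evolve a bit string toward all ones.
import random

GENES, POP, ELITE, MUT_RATE, GENERATIONS = 20, 30, 2, 0.02, 50

def fitness(chrom):
    """Fittest chromosome = the one with the most 1-genes."""
    return sum(chrom)

def crossover(p1, p2):
    """Single-point crossover of two parent chromosomes."""
    cut = random.randrange(1, GENES)
    return p1[:cut] + p2[cut:]

def mutate(chrom):
    """Flip each gene with a small probability."""
    return [1 - g if random.random() < MUT_RATE else g for g in chrom]

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    next_gen = population[:ELITE]                         # elitism
    while len(next_gen) < POP:
        p1, p2 = random.sample(population[:POP // 2], 2)  # select fit parents
        next_gen.append(mutate(crossover(p1, p2)))        # reproduce
    population = next_gen

print(max(fitness(c) for c in population))  # approaches GENES (all ones)
```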
Genetic Algorithms
• Limitations of Genetic Algorithms
– Does not guarantee an optimal solution (often settles on a
suboptimal solution / local optimum)
– Not all problems can be put into GA formulation
– Development and interpretation of GA solutions requires
both programming and statistical skills
– Relies heavily on the random number generators
– Locating good variables for a particular problem and
obtaining the data for the variables is difficult
– Selecting methods by which to evolve the system requires
experimentation and experience

71
Genetic Algorithm Applications
• Dynamic process control
• Optimization of induction rules
• Discovery of new connectivity topologies (NNs)
• Simulation of biological models of behavior
• Complex design of engineering structures
• Pattern recognition
• Scheduling, transportation and routing
• Layout and circuit design
• Telecommunication, graph-based problems

72
Fuzzy Logic and Fuzzy Inference System
• Fuzzy logic is a superset of conventional (Boolean) logic that
has been extended to handle the concept of partial truth: truth
values between "completely true" and "completely false"
• First introduced by Dr. Lotfi Zadeh of UC Berkeley in the
1960s as a means of modeling the uncertainty of natural language
• Uses the mathematical theory of fuzzy sets
• Simulates the process of normal human reasoning
• Allows the computer to behave less precisely
• Decision making involves gray areas

73
Fuzzy Logic Example: Tallness
Height    Proportion voted for
5'10"     0.05
5'11"     0.10
6'00"     0.60
6'01"     0.15
6'02"     0.10

• Jack is 6 feet tall
– Probability theory (cumulative probability): there is a
75 percent chance that Jack is tall
– Fuzzy logic: Jack's degree of membership within the set
of tall people is 0.75

[Figure: degree of membership (0.0 to 1.0) versus height (4'9" to
6'9") for the sets Short, Average, and Tall. In the crisp set,
membership in "Tall" jumps from 0 to 1 at a cutoff line ("you must
be taller than this line to be considered tall"); in the fuzzy set,
membership rises gradually across overlapping Short, Average, and
Tall curves.]

74
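A minimal sketch of the crisp/fuzzy contrast, assuming a linear
membership ramp whose breakpoints (66 and 74 inches) are chosen so
that Jack's membership comes out at the slide's 0.75; the cutoff and
ramp values are illustrative assumptions, not lecture data.

```python
# Crisp vs. fuzzy membership in the set "Tall" (heights in inches).
# Breakpoints are assumed so Jack (6'0") gets membership 0.75.

def crisp_tall(height_in):
    """Classical (crisp) set: tall or not, nothing in between."""
    return 1.0 if height_in >= 72 else 0.0  # assumed cutoff at 6'0"

def fuzzy_tall(height_in, low=66, high=74):
    """Membership rises linearly from 0 at `low` to 1 at `high`."""
    if height_in <= low:
        return 0.0
    if height_in >= high:
        return 1.0
    return (height_in - low) / (high - low)

jack = 72  # Jack is 6 feet tall
print(crisp_tall(jack))  # 1.0: crisply a member of "Tall"
print(fuzzy_tall(jack))  # 0.75: partial membership in "Tall"
```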
Advantages of Fuzzy Logic
• More natural to construct
• Easy to understand - Frees the imagination
• Provides flexibility
• More forgiving
• Shortens system development time
• Increases the system's maintainability
• Uses less expensive hardware
• Handles control or decision-making problems not
easily defined by mathematical models
• …more…
75
Fuzzy Inference System (FIS)
= Expert System + Fuzzy Logic
• An FIS consists of
– A collection of fuzzy membership functions
– A set of fuzzy rules called the rule base
– Fuzzy inference is a method that interprets the values in the
input vector and, based on some set of rules, assigns values
to the output vector
• In an FIS, the reasoning process consists of
– Fuzzification
– Inferencing
– Composition, and
– Defuzzification

76
The Reasoning Process for FIS
(the tipping example)

Example: What % tip to leave at a restaurant?
"Given the quality of the service and the food, how much
should I tip?"

Inputs: Service (0-10) and Food (0-10). Output: Tip (5-25%).
Rule 1: IF service is poor OR food is bad THEN tip is low
Rule 2: IF service is good THEN tip is average
Rule 3: IF service is excellent OR food is delicious THEN tip
is generous
After fuzzification of the two inputs, the three rules fire and
their outputs are summed and defuzzified into a single tip.

Fuzzy Inferencing Process:
Crisp Inputs → Fuzzification (membership functions) →
Inferencing (fuzzy rules) → Composition (composition
heuristics) → Defuzzification (defuzzification heuristics) →
Crisp Outputs

77
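A minimal sketch of this pipeline follows. The triangular membership
breakpoints are assumptions, and defuzzification is simplified to a
weighted average over fixed tip levels instead of a full centroid
composition, so treat it as an illustration of the stages only.

```python
# Tipping FIS sketch: fuzzify, apply the three rules, defuzzify.

def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, peak of 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fis_tip(service, food):  # both rated 0-10
    # Fuzzification (assumed membership functions)
    poor, good, excellent = (tri(service, -1, 0, 5),
                             tri(service, 0, 5, 10),
                             tri(service, 5, 10, 11))
    bad, delicious = tri(food, -1, 0, 5), tri(food, 5, 10, 11)

    # Inferencing: fuzzy OR is taken as max
    low = max(poor, bad)                  # Rule 1
    average = good                        # Rule 2
    generous = max(excellent, delicious)  # Rule 3

    # Composition + defuzzification: weighted average of tip levels
    levels = {5: low, 15: average, 25: generous}  # assumed tip % anchors
    total = sum(levels.values())
    return sum(pct * w for pct, w in levels.items()) / total if total else 15.0

print(round(fis_tip(service=8, food=9), 1))  # ~21.7: a fairly generous tip
```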
Fuzzy Applications
• In Manufacturing and Management
– Space shuttle vehicle orbiting
– Regulation of water temperature in shower heads
– Selection of stocks to purchase
– Inspection of beverage cans for printing defects
– Matching of golf clubs to customers' swings
– Risk assessment, project selection
– Consumer products (air conditioners, cameras, dishwashers), …
• In Business
– Strategic planning
– Real estate appraisals and valuation
– Bond evaluation and portfolio design, …

78
Intelligent Software Agents
• Intelligent Agent (IA): an autonomous computer program
that observes and acts upon an environment and directs its
activity toward achieving specific goals
• Relatively new technology

• Other names include


– Software agents
– Wizards
– Knowbots
– Intelligent software robots (Softbots)
– Bots

• Agent - Someone employed to act on one’s behalf

79
Definitions of Intelligent Agents
• "Intelligent agents are software entities that carry out some set of
operations on behalf of a user or another program, with some degree
of independence or autonomy, and in so doing, employ some
knowledge or representation of the user's goals or desires."
(“The IBM Agent”)

• Autonomous agents are computational systems that inhabit some


complex dynamic environment, sense and act autonomously in this
environment and by doing so realize a set of goals or tasks for which
they are designed
(Maes, 1995, p. 108)

80
Characteristics of Intelligent Agents
• Autonomy (empowerment)
– Agent takes initiative, exercises control over its actions. They are Goal-
oriented, Collaborative, Flexible, Self-starting
• Operates in the background
• Communication (interactivity)
• Automates repetitive tasks
• Proactive (persistence)
• Temporal continuity
• Personality
• Mobile agents
• Intelligence and learning

81
A Taxonomy for Autonomous Agents

Autonomous Agents
• Biological Agents
• Robotic Agents
• Computational Agents
– Software Agents
· Task-specific Agents
· Entertainment Agents
· Viruses
– Artificial-life Agents

82
Classification for Intelligent Agents by Characteristics

• Agents can be classified in terms of three important
characteristic dimensions
1. Agency
• Degree of autonomy and authority vested in the agent
– More advanced agents can interact with other agents/entities
2. Intelligence
• Degree of reasoning and learned behavior
– Tradeoff between size of an agent and its learning modules

3. Mobility
• Degree to which agents travel through the network
– Mobility requires approval for residence at foreign locations

83
Intelligent Agents’ Scope in Three Dimensions

[Figure: intelligent agents' scope plotted on three axes. Agency
grows from user interactivity through application interactivity to
agent interactivity (improved agency); Intelligence grows from fixed
preferences through reasoning and planning to learning (improved
intelligence); Mobility grows from fixed to mobile (improved
mobility). Intelligent agents occupy the high end of all three
dimensions.]

84
Internet-Based Software Agents
• Software Robots or Softbots
• Major Categories
– E-mail agents (mailbots)
– Web browsing assisting agents
– Frequently asked questions (FAQ) agents
– Intelligent search (or Indexing) agents
– Internet softbot for finding information
– Network Management and Monitoring
• Security agents (virus detectors)
– Electronic Commerce Agents (negotiators)

85
Leading Intelligent Agents Programs
• IBM [research.ibm.com/iagents]
• Carnegie Mellon [cs.cmu.edu/~softagents]
• MIT [agents.media.mit.edu]
• University of Maryland, Baltimore County [agents.umbc.edu]
• University of Massachusetts [dis.cs.umass.edu]
• University of Liverpool [csc.liv.ac.uk/research/agents]
• University of Melbourne [agentlab.unimelb.edu.au]
• Multi-agent Systems [multiagent.com]
• Commercial Agents/Bots [botspot.com]

86
DECISION SUPPORT SYSTEM
Lesson 14: Group Support Systems

Dr. Le, Hai Ha


Content

 Collaboration & Communication


 Group Support Systems (GSS)
 GSS Meeting Process

2
Collaboration
• What is it?
“… making joint effort toward achieving an agreed upon
goal.”

• Meeting is a common form of collaboration

• Why collaborate?

3
Why Collaborate?

• Make decisions
• Review
• Build trust
• Synergy
• Share the vision
• Share information
• Share work
• Solve problems
• Build consensus
• Socialize
4
Collaboration is Difficult

Causes of ineffective collaboration include:
• Waiting to speak        • Wrong people
• Domination              • Groupthink
• Fear of speaking        • Poor grasp of problem
• Misunderstanding        • Ignored alternatives
• Inattention             • Lack of consensus
• Lack of focus           • Poor planning
• Inadequate criteria     • Hidden agendas
• Premature decisions     • Conflict
• Missing information     • Inadequate resources
• Distractions            • Poorly defined goals

5
Collaboration is Expensive
❑ 15 Million formal Sessions / day
❑ ? Million Informal Sessions / day
❑ 4 Billion Sessions / year
❑ 30-80% Manager’s time

(Sources: Fortune 500 companies; a 3M Corporation study)

6
Collaboration is Essential
• No one has all the …
– Experience
– Knowledge
– Resources
– Insight, and
– Inspiration
…to do the job alone

• Bottom line:
Collaboration is difficult, expensive, and yet essential
for today’s organizations
7
How Do People Collaborate?

3 Levels of Collaboration Capability (by increasing degree of
collaborative effort):

• Level 1: Collected Work: uncoordinated individual efforts
("sprinters")
• Level 2: Coordinated Work: coordinated individual efforts
("relay")
• Level 3: Concerted Work: concerted team effort ("crew")

8
Meetings (a form of collaboration)
▪ Joint activity
▪ Equal or near equal status
▪ Outcome depends on participants' knowledge, etc.
▪ Outcome depends on group composition
▪ Outcome depends on decision-making process
▪ Disagreement settled by rank or negotiation

9
The Ideal Meeting

▪ Dozens of people attend


▪ Everyone …
talks at once
hears everything
understands
remembers
▪ The impossible dream?

10
Traditional Meetings

Only ONE person can speak at a time


11
GSS Meetings

By using the computer everyone can


SPEAK and be understood simultaneously

12
Communication Support
• Vital
• Needed for collaboration

• Modern information technologies provide


inexpensive, fast, capable, reliable means of
supporting communication
• Internet / Web

13
Supporting Communication
▪ Evolution of Communication
▪ Word of mouth
▪ Delivery persons
▪ Horseback
▪ Snailmail
▪ Telegraph
▪ Telephone
▪ Radio
▪ Television
▪ Videoconferencing
▪ Internet / Web…

14
A Time/Place Communication Framework

[Figure: 2x2 grid crossing time (same / different) with place
(same / different), ranging from same-time/same-place meetings to
different-time/different-place collaboration]
15
Groupware
▪ Lotus Notes / Domino Server
Includes Learning Space
▪ Netscape Collabra Server
▪ Microsoft NetMeeting
▪ Novell Groupwise
▪ GroupSystems
▪ TCBWorks
▪ WebEx

16
Group Support Systems
▪ Goal: to support groupwork
▪ Increase benefits / decrease losses of collaboration
▪ Based on traditional methods
▪ Nominal Group Technique
“Individuals work alone to generate ideas which are pooled under
guidance of a trained facilitator”
▪ Delphi Method
“A structured process for collecting and distilling knowledge from a
group of experts by means of questionnaires”
▪ Electronic Meeting System (EMS)

17
GSS – Important Features
Process Gains:
▪ Parallelism ( simultaneous contributions )
▪ Larger groups can participate
▪ Anonymity ( promotes equal participation )
▪ Focus on content not personalities
▪ Triggering ( stimulates thinking )
▪ Synergy ( integrates ideas )
▪ Structure ( facilitates problem solving )
▪ Record keeping ( promotes organizational memory )

Process Losses:
▪ Free-riding
▪ Flaming

18
Benefits of Anonymity

✓ Ideas considered on
merit not source
✓ Overcome fear of
speaking up
✓ More ideas lead to
more quality ideas
✓ Defuses tough
political discussions


19
GSS Enabling Technologies
▪ Decision room

▪ Multiple use facility

▪ Web-based

20
The Decision (Electronic Meeting) Room
▪ 12 to 30 networked personal computers
▪ Usually recessed into the desktop
▪ Server PC
▪ Large-screen projection system
▪ Breakout rooms
▪ Need a Trained Facilitator for Success

21
Cool Decision Rooms

IBM Corp.

22
Cooler Decision Rooms

US Air Force

23
Mobile Decision Rooms

Murraysville School District Bus

24
On-Demand Decision Rooms

25
Few Organizations Use Decision Rooms
▪ High Cost
▪ Need for a Trained Facilitator
▪ Requires Specific Software Support for Different
Cooperative Tasks
▪ Infrequent Use
▪ Different Place / Different Time Needs
▪ May Need More Than One

26
Other Technologies
▪ Multiple Use Facility
▪ Cheaper
▪ Still need a facilitator

▪ Web-based
▪ Cheaper: no extra hardware needed
▪ Still need facilitator

27
GroupSystems, Inc.
▪ From GroupSystems.com, Tucson, AZ

▪ Comprehensive groupware
▪ Windows and Web versions
▪ Leading software

▪ Tool: ThinkTank

28
ThinkTank: Supported Activities
▪ Supported tools and activities:
▪ Agenda and Other Planning Activities
▪ Electronic Brainstorming
▪ Group Outliner
▪ Topic Commenter
▪ Categorizer
▪ Vote

▪ Others…

29
GSS Meeting Process
[Figure: the GSS tool icons arranged in a cycle: Electronic
Brainstorming, Group Outliner, Topic Commenter, Categorizer, Vote,
Alternative Analysis, Survey. Iterate until the solution is
reached…]

30
Visit a GSS Meeting

31
Step 1: Prepare an Agenda
• Prepare an agenda

• …

32
Step 2: Collect Information

◼ Brainstorm risks

◼ Think about the risks to the company if they launch a new
line of products

33
Step 3: Refine Information

◼ Gather additional information

◼ Capture important issues for the listed items

34
Step 4: Prioritize Options

◼ Prioritize risks based on likelihood and impact

◼ Use an Alternative Analysis ballot with two criteria

35
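A minimal sketch of what such a two-criteria ballot computes: each
risk is rated on likelihood and impact, and risks are ranked by the
combined score. The risks and 1-10 ratings below are hypothetical
illustrations, not data from the session.

```python
# Rank risks by a combined likelihood-and-impact score.
risks = {
    "supply shortage":  {"likelihood": 7, "impact": 9},
    "brand dilution":   {"likelihood": 4, "impact": 6},
    "regulatory delay": {"likelihood": 6, "impact": 8},
}

def score(r):
    """Equal weighting of the two ballot criteria."""
    return r["likelihood"] * r["impact"]

for name in sorted(risks, key=lambda n: score(risks[n]), reverse=True):
    print(f"{name}: {score(risks[name])}")  # highest-priority risk first
```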
Step 5: Review Prioritized Options

◼ View and discuss results of voting

◼ …

36
Step 5: Review Prioritized Options…

◼ Choose risks for further analysis…

37
Step 5: Review Prioritized Options…

◼ Collect additional input on risks

◼ Collect additional comments on the top three risks…

38
Step 5: Review Prioritized Options…

◼ Review comments on risks…

39
Step 6: Create an Action Plan

◼ Create an
Action Plan…

40
Step 7: Distribute Session Transcripts

◼ Create and
Distribute a
Final Report…

41
Last Words about GSS?
▪ Why Successful?
▪ Parallelism
▪ Anonymity
▪ Synergy
▪ Structure
▪ Record keeping
▪ Needs…
▪ Organizational commitment
▪ Executive sponsor
▪ Dedicated well-trained facilitator
▪ Good planning

42
Collaborative Networks
• Integrated supply-chain
– Collaborative planning, forecasting, and replenishment
(CPFR)
– Collaborative design and product development
• Vendor Managed Inventories
– Wal-Mart, …
• Collective Intelligence
• Animal Intelligence (swarm intelligence)

43
Collaborative Planning, Forecasting, and
Replenishment (CPFR)

An industry-wide project in
which suppliers and retailers
collaborate in planning and
demand forecasting in order
to ensure that members of
the supply chain will have the
right amount of raw materials
and finished goods when they
need them

44
Collective Intelligence
◼ A shared intelligence that emerges from the intentional
cooperation, collaboration, and/or coordination of many
individuals.
◼ Examples: Wikipedia, video games, online advertising,
learner-generated context, …
◼ In order for CI to happen:
◼ Openness
◼ Peering
◼ Sharing
◼ Acting globally

For more info, see the Center for Collective Intelligence at MIT
(cci.mit.edu)

45
A Taxonomy of Collective Intelligence

46
Creativity
• Is it a fundamental human trait or something that can
be learned?
• Definition: Creativity is a characteristic of a person
that leads to production of acts, items and/or instances
of novelty
• Creativity is the product of …
a genius vs. an idea generation environment
• Creative people tend to have creative lives
• CREATIVITY  INNOVATION
• Idea Generation via Electronic Brainstorming
47
Creativity…
• What variables affect creativity
1. Cognitive variables: intelligence, knowledge, skills, etc.
2. Environmental variables: cultural and socioeconomic
factors, working conditions, etc.
3. Personality variables: motivation, confidence, sense of
freedom, etc.
• Creativity is fostered by
– Freedom
– Permission-to-fail

Allow and Enable rather than Structure and Control

48
Creativity…
• Software that shows creativity
– Intelligent Agents (Softbots)
– Creativity is an intelligent behavior
• Software that facilitates human creativity
– ThoughtPath: promotes outside-the-box thinking
– Creative WhackPack (Creative Think): whack you out of
your habitual thought process
– IdeaFisher: provides language-specific universality (a
thesaurus)

 Freedom, Collaboration, Prototyping

49
Summary
• Understand the basic concepts and processes of groupwork,
communication and collaboration
• Describe how computer systems facilitate communication and
collaboration in enterprises
• Know the concepts and importance of the time/place
framework
• Be aware of the underlying principles and capabilities of
groupware (e.g., GSS)
• Know the process gains and losses and how GSS
increases/decreases each of them
• Describe indirect support for decision making, especially in
synchronous environments

50
Summary
• Become familiar with the GSS products of the major vendors
(e.g., Lotus, Microsoft, WebEx, Groove)
• Understand the concept of GDSS and describe how to
structure an electronic meeting in a decision room
• Describe the three settings of GDSS
• Describe how a GDSS uses parallelism and anonymity and
how they lead to process/task gains and losses
• Understand how the Web enables collaborative computing and
group support of virtual meetings
• Describe the role of emerging technologies
• Define creativity and explain how it can be facilitated by
computers

51
