Cheatsheet 1

NTU EE6483 course cheatsheet


Search (no adversary)
Draw the state space or tree. A tree is a graph in which there is a unique path between every pair of nodes.

Uninformed Search (blind search)
1. Backtracking (CS: current state, SL: state list, NSL: new state list, DE: dead ends)
2. BFS (FIFO, i.e., a queue; open and closed lists)
If the branching factor B (average number of children) is large, the combinatorics may prevent the algorithm from finding a solution using the available space.
The space utilization of breadth-first search, measured in terms of the number of states on open, is an exponential function of the length of the path at any time, i.e., B^n states on level n.
3. DFS (LIFO, i.e., a stack; open (similar to NSL) and closed (similar to DE and SL))
Depth-first search gets quickly into a deep search space.
Depth-first search can get "lost" deep in a graph, missing shorter paths to a goal or even becoming stuck in an infinitely long path.
The space usage of depth-first search is a linear function of the length of the path: at each level, open retains only the children of a single state, i.e., B x n states to go n levels deep into the space.
4. DFS-ID / IDS (depth-first search with iterative deepening, using a depth bound)
No information about the state space is retained between iterations (see the sketch below).
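A minimal Python sketch of IDS (not from the original sheet; the graph, start, and goal names are illustrative assumptions):

```python
# Hedged sketch: iterative deepening DFS over a graph given as an adjacency dict.
def depth_limited_dfs(graph, node, goal, limit, path):
    """DFS that gives up below the depth bound; returns a path or None."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:                      # avoid cycles on the current path
            found = depth_limited_dfs(graph, child, goal, limit - 1, path + [child])
            if found:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    """Repeat depth-limited DFS with an increasing bound; nothing is kept between iterations."""
    for bound in range(max_depth + 1):
        result = depth_limited_dfs(graph, start, goal, bound, [start])
        if result:
            return result
    return None

# Example: iterative_deepening({"A": ["B", "C"], "B": ["D"], "C": [], "D": []}, "A", "D")
```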

Why represent a problem as a state-space graph? Graph theory can be used to analyze the structure and complexity of both the problem and the procedures used to solve it.
A strategy is defined by picking the order of node expansion in a graph.
Un-informed search = blind search; informed search = heuristics-applied search, i.e., heuristics lead the search towards a goal.
Data-driven search (forward chaining), goal-driven search (backward chaining), or a hybrid of the two.
Data-driven or goal-driven? If we search from Thomas Jefferson, through his descendants, towards John, we need to search about N^10 descendants, where N is the average number of children a person has. This represents more work than the previous (goal-driven) method if N > 2. But if all of Thomas Jefferson's descendants are known while few of John's ancestors are known, we do not have a choice.

Informed Search (heuristic search)
Why? We are forced to use heuristics because the computational cost of finding the solution is often prohibitive.
A heuristic can lead a search algorithm to a sub-optimal solution or fail to find any solution at all. This inherent limitation of heuristic search cannot be eliminated by "better" heuristics or more efficient search algorithms.
Hill-Climbing
1. Greedy Best-First (h(n); open and closed lists)
2. A* (f(n) = g(n) + h(n), where g(n) is the depth count; open and closed lists). The heuristic must be admissible: h(n) is not larger than the real cost. A sketch follows.
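A hedged Python sketch of A* with open (priority queue on f = g + h) and closed lists; the graph format and names are illustrative assumptions:

```python
import heapq

def a_star(graph, start, goal, h):
    """graph maps a node to [(neighbour, step_cost)]; h is an admissible heuristic."""
    open_list = [(h(start), 0, start, [start])]     # (f, g, node, path)
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g                          # path and its real cost
        if node in closed:
            continue
        closed.add(node)
        for child, cost in graph.get(node, []):
            if child not in closed:
                g_child = g + cost
                heapq.heappush(open_list, (g_child + h(child), g_child, child, path + [child]))
    return None, float("inf")

# Example: a_star({"S": [("A", 1), ("G", 10)], "A": [("G", 1)]}, "S", "G", h=lambda n: 0)
# returns (['S', 'A', 'G'], 2).
```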
Gaming (adversary / opponent + time limit): heuristic search of a game tree.
Minimax
Perfect decision (no time limit): generate the entire game tree, assess the terminal states, then determine the selection at every level, backing the values up to the root. Example: divide a pile of matches into two non-empty piles of different sizes; terminal states take the value 1 or 0.
Imperfect decision (time / space limit): partial tree search (replace the terminal test by a cut-off test); a heuristic evaluation such as E(n) = M(n) - O(n); fixed ply depth, i.e., n-move look-ahead, blind beyond the ply depth.
Alpha-Beta: advanced minimax with pruning (see the sketch below).
Properties of all algorithms
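A hedged sketch (not the course's code) of minimax with alpha-beta pruning on a game tree given as nested lists, where leaves are numeric terminal values:

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if not isinstance(node, list):          # terminal state: return its value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:               # beta cut-off: MIN will never allow this branch
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:               # alpha cut-off
                break
        return value

# Example: alphabeta([[3, 5], [2, [9, 0]]]) returns 3; the subtree [9, 0] is pruned.
```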
Association Analysis
Measures: support count (σ), support, confidence (a conditional probability), lift.
Association Rule Mining: given a set of transactions T, find all the rules X → Y having support ≥ minsup and confidence ≥ minconf, where minsup and minconf are the corresponding support and confidence thresholds.
The Apriori Principle: if an itemset is frequent, then all its subsets must also be frequent. Contrapositive: if {A, B} is an infrequent itemset, then all its supersets are infrequent. This strategy can be used to reduce the number of candidate itemsets.
Apriori Algorithm: generate candidate (k+1)-itemsets from the frequent k-itemsets (self-joining); pruning (1. prune candidate itemsets containing infrequent subsets; 2. count the support and eliminate the infrequent itemsets). A worked example of the measures follows.
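A hedged worked example (mine, not from the sheet) of support, confidence, and lift for the rule {milk} → {bread} on a toy transaction database:

```python
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk"},
    {"bread", "butter"},
]
N = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / N

sup_xy = support({"milk", "bread"})     # 2/5 = 0.4
conf = sup_xy / support({"milk"})       # 0.4 / 0.6 ≈ 0.667  (check against minconf)
lift = conf / support({"bread"})        # 0.667 / 0.8 ≈ 0.833 (< 1: negatively correlated)
print(sup_xy, conf, lift)
```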
FP-Tree & FP-Growth (Frequent Pattern Growth)
FP-Growth (frequent pattern growth) compresses the dataset by representing the frequent items in an FP-tree (frequent pattern tree).
FP-tree: a novel data structure storing compressed, crucial information about frequent patterns; compact yet complete for frequent pattern mining.
FP-Growth: an efficient method for mining frequent patterns in large databases, using a highly compact FP-tree and a divide-and-conquer method in nature.
1. The FP-tree is constructed by reading the dataset one transaction at a time and mapping each transaction onto a path in the FP-tree; infrequent items are discarded and frequent items are sorted in descending order of support count.
2. If different transactions have several items in common, their paths in the FP-tree may overlap.
3. Once an FP-tree has been constructed, FP-Growth uses a recursive divide-and-conquer approach to mine the frequent itemsets.
4. Pros: FP-Growth is efficient and scalable for mining both long and short frequent itemsets; it is about an order of magnitude faster than the Apriori algorithm.
Why FP-Growth rather than Apriori: 1. no candidate generation; 2. data compression (using the FP-tree); 3. recursive mining; 4. less disk I/O.
Mining procedure: construct an FP-tree from the database by scanning it twice (scan 1: find the frequent 1-itemsets and get the F-list; scan 2: construct the FP-tree, whose root is a null node, together with a header table); construct the conditional pattern base (CPB); construct the conditional FP-tree; recursively apply FP-Growth on the CPBs. A sketch of the first scan follows.
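A hedged sketch (not the course's code) of the first construction scan: count supports, build the F-list of frequent items in descending support order, and reorder each transaction accordingly before inserting it into the FP-tree:

```python
from collections import Counter

transactions = [["f", "a", "c", "d", "g"], ["a", "b", "c", "f"], ["b", "f", "h"]]
minsup_count = 2

counts = Counter(item for t in transactions for item in t)
f_list = [item for item, c in counts.most_common() if c >= minsup_count]
# e.g. ['f', 'a', 'c', 'b'] (ties broken arbitrarily)

order = {item: rank for rank, item in enumerate(f_list)}
sorted_transactions = [
    sorted((i for i in t if i in order), key=order.get)   # keep frequent items, F-list order
    for t in transactions
]
print(f_list, sorted_transactions)
```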


Association Rules Generation: pruning and generation.
Association Rule Evaluation (interestingness)
Lift is a measure between two itemsets, not just for a certain rule.
Lift = 1: independent
Lift > 1: positively correlated
Lift < 1: negatively correlated
Information Gain (ID3)
PCA(Principal Component Analysis)
Why do we need dimensionality reduction?
• Remove Feature Redundancy + Noise
• Prevent Overfitting
• Reduced Model Complexity
• Simplified Data distribution
• Better Visualization
• Simple and popular method.
• Unsupervised learning.
• To learn the “best” low-dimensional subspace for data projection.
Two ways to interpret the "best" low-dimensional subspace:
1. The mean square error (MSE) of the projected data is minimized.
2. The variance of the projected data is maximized.
• What is wrong with high dimensions? High dimensions = lots of features: a complex system to process, inefficient algorithms, and overfitting to noise or other data corruptions.
• Minimizing MSE <=> maximizing projected variance. Why? The subspace v that yields the higher projected variance also yields the smaller projected MSE (smaller perpendicular offsets).
• Eigenvalues and eigenvectors: the magnitude of an eigenvalue represents the variance along that direction; the corresponding eigenvector indicates the direction of the principal component.
PCA steps (a sketch follows):
1. compute the mean of the data
2. center the data
3. compute the covariance matrix
4. find the eigenvalues and eigenvectors
5. project the data onto the first (leading) eigenvectors
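A hedged NumPy sketch (not from the sheet) of the steps above; X is an (n_samples, n_features) array and k the target dimensionality:

```python
import numpy as np

def pca_project(X, k):
    mean = X.mean(axis=0)                         # 1. mean of the data
    Xc = X - mean                                 # 2. center the data
    cov = np.cov(Xc, rowvar=False)                # 3. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)        # 4. eigenvalues / eigenvectors
    order = np.argsort(eigvals)[::-1]             #    sort by decreasing variance
    W = eigvecs[:, order[:k]]                     #    top-k principal directions
    return Xc @ W                                 # 5. project onto the leading eigenvectors

# Example: pca_project(np.random.randn(100, 5), 2) has shape (100, 2).
```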
Regression: use the modified x for all data samples …

Backpropagation
Prevent exploding gradients: gradient clipping.
net_k is the net input of neuron k; t_k is the target output (target value); o_k is the actual output, i.e., the output computed through the activation function. A hedged sketch of the output-layer update follows.
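A minimal sketch (mine, assuming a sigmoid output unit and squared error, which matches the net_k / t_k / o_k notation above) of the output-layer delta and weight update, with gradient clipping:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def output_layer_update(w_k, x, t_k, lr=0.1, clip=5.0):
    """w_k: weights into output neuron k; x: the input vector it sees."""
    net_k = w_k @ x                           # net input of neuron k
    o_k = sigmoid(net_k)                      # actual output through the activation
    delta_k = (t_k - o_k) * o_k * (1 - o_k)   # output-layer delta (squared error, sigmoid)
    grad = delta_k * x                        # gradient of -E w.r.t. the incoming weights
    grad = np.clip(grad, -clip, clip)         # gradient clipping against exploding gradients
    return w_k + lr * grad                    # gradient-descent step on the squared error

# Example: output_layer_update(np.zeros(3), np.array([1.0, 0.5, -0.2]), t_k=1.0)
```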
SVM
SVMs choose the maximum margin hyperplane as the classifier's decision boundary (in the lecture figure, the boundary drawn with the widest margin). Each decision boundary D_i is associated with a pair of hyperplanes, d_i1 and d_i2, between which the distance is defined as the margin of the classifier.
Why are the support vectors chosen?
1. Support vectors are the data points that lie closest to the decision boundary (or margin) and play a crucial role in defining it.
2. The points selected as support vectors are the ones closest to the boundary, as they determine the optimal hyperplane.
3. Any other points in the dataset do not influence the decision boundary as significantly as these support vectors. Therefore, choosing these points ensures that the margin is maximized, leading to better generalization of the model.
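A hedged sketch (not from the notes) of fitting a linear SVM with scikit-learn and inspecting the support vectors that define the maximum-margin boundary; the toy data are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],      # class 0
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3).fit(X, y)             # large C ≈ hard margin
print(clf.support_vectors_)                             # the points closest to the boundary
print(clf.coef_, clf.intercept_)                        # w and b of the separating hyperplane
print(2 / np.linalg.norm(clf.coef_))                    # margin width = 2 / ||w||
```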
Bayes Decision
Bernoulli Naïve Bayes
1. Prior probabilities
2. Class-conditional probabilities
3. P(x) is the common denominator, so we can ignore it.
Multinomial Naïve Bayes (used for document classification)
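A hedged illustration of the decision rule: pick the class maximizing prior times the product of class-conditional probabilities, ignoring the common denominator P(x). The Bernoulli model, feature layout, and names are assumptions; log-probabilities are used for numerical stability.

```python
import math

def naive_bayes_predict(x, priors, cond_probs):
    """x: list of binary features; priors: {class: P(c)};
    cond_probs: {class: [P(x_i = 1 | c) for each feature i]} (Bernoulli model)."""
    best_class, best_score = None, float("-inf")
    for c, prior in priors.items():
        score = math.log(prior)                       # log prior
        for xi, p in zip(x, cond_probs[c]):
            score += math.log(p if xi == 1 else 1 - p)  # log class-conditionals
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Example: naive_bayes_predict([1, 0, 1],
#                              {"spam": 0.4, "ham": 0.6},
#                              {"spam": [0.8, 0.1, 0.7], "ham": [0.2, 0.4, 0.3]})
```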
CNN (Convolutional Neural Networks)
Def: a specialized (or simplified) form of a multilayer perceptron (MLP) where weights are shared through convolutional filters.
• The shared weights w are referred to as filters or kernels. The outputs of the convolutional layer are called feature maps or activation maps.
Pros:
1. Shared weights w: CNNs efficiently reduce the number of parameters (weights) compared to general neural networks.
2. CNNs scale efficiently to image data (fully-connected neural networks do not scale efficiently to image data and tend to overfit).
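A hedged illustration of point 1: parameter counts for a 32x32x3 input, comparing a fully-connected layer with 64 units against a conv layer with 64 3x3 filters (the numbers are mine, chosen only to show the scale difference):

```python
in_h, in_w, in_c = 32, 32, 3
units, k = 64, 3

fc_params   = (in_h * in_w * in_c) * units + units    # weights + biases = 196,672
conv_params = (k * k * in_c) * units + units          # shared filters   = 1,792
print(fc_params, conv_params)
```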
RNNs (Recurrent Neural Networks)
1. RNNs are designed to handle sequential data such as text, speech, time-series data, and biological sequences.
2. RNNs can have a variable number of layers, with each layer processing the input of its specific time step.
3. Each layer in an RNN uses the same set of network parameters, and the layer structure is repeated across time steps; the parameters are shared across time.
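A hedged sketch (mine) of point 3: a vanilla RNN unrolled over time, reusing the same parameters (W_x, W_h, b) at every time step:

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """xs: sequence of input vectors; returns the hidden state at each step."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in xs:                                  # one "layer" per time step
        h = np.tanh(W_x @ x_t + W_h @ h + b)        # same W_x, W_h, b at every step
        states.append(h)
    return states

# Example with 4-dim inputs and a 3-dim hidden state:
# rnn_forward([np.random.randn(4) for _ in range(5)],
#             np.random.randn(3, 4), np.random.randn(3, 3), np.zeros(3))
```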
Unsupervised Learning
Clustering
• Exploit similar structures amongst the data themselves.
• One way to summarize various data points with a single categorical variable, i.e., the cluster centroid.
Dimensionality Reduction
• Another way to simplify complex and high-dimensional data: summarize the data with a lower-dimensional real-valued vector.
K-Means (see the sketch at the end of this section)
HAC (Hierarchical Agglomerative Clustering)
Advantages: 1. we do not have to predefine the number of clusters; 2. any clustering result with the desired number of clusters K can be obtained by "cutting" the dendrogram at the corresponding level; 3. the result is independent of the initialization.
Linkage criteria: MIN (single linkage), MAX (complete linkage), average linkage, centroid linkage.
K-Means vs HAC
K-Means: a simple and cheap algorithm; results are sensitive to the initialization; the number of clusters needs to be pre-defined.
HAC: a deterministic algorithm, i.e., no randomness; shows a range of clustering results for different choices of K; more memory- and computation-intensive than K-Means.
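A hedged sketch (not the course's code) of the K-Means loop referenced above: alternate between assigning points to the nearest centroid and recomputing the centroids, starting from a random initialization (hence the sensitivity to initialization):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]      # random initialization
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)                              # assignment step
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):                  # converged
            break
        centroids = new_centroids                                  # update step
    return labels, centroids

# Example: kmeans(np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5]), k=2)
```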
