Search (no adversary)
Draw the state space or tree. A tree is a graph in which there is a unique path between every pair of nodes.
Why represent a problem as a state space graph? Graph theory can be used to analyze the structure and complexity of both the problem and the procedures used to solve it.
A search strategy is defined by picking the order of node expansion in the graph.
Uninformed Search (blind search)
1. Backtracking (CS, SL, NSL, DE)
2. BFS (FIFO (RILO), a queue, open and closed). If the branching factor B (average number of children) is large, the combinatorics may prevent the algorithm from finding a solution using the available space. The space utilization of breadth-first search, measured in terms of the number of states on open, is an exponential function of the length of the path at any time, or B^n states on level n. (A BFS sketch with open and closed lists follows this list.)
3. DFS (LIFO (LILO), a stack, open (similar to NSL) and closed (similar to DE and SL)). Depth-first search gets quickly into a deep search space, but it can get "lost" deep in a graph, missing shorter paths to a goal or even becoming stuck in an infinitely long path. The space usage of depth-first search is a linear function of the length of the path: at each level, open retains only the children of a single state, or B × n states to go n levels deep into the space.
4. DFS-ID or IDS (Depth-First Search with Iterative Deepening, depth bound). No information about the state space is retained between iterations.
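A minimal sketch of the BFS scheme above, using a FIFO open list and a closed set; the toy graph, start, and goal are made-up illustrations, not from the notes.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search with an open (FIFO queue) and closed (visited) list."""
    open_list = deque([[start]])   # queue of partial paths
    closed = {start}               # states already queued or expanded
    while open_list:
        path = open_list.popleft()
        state = path[-1]
        if state == goal:
            return path            # first goal found is a shallowest one
        for child in graph.get(state, []):
            if child not in closed:
                closed.add(child)
                open_list.append(path + [child])
    return None                    # open exhausted: no solution

# toy graph (hypothetical)
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(bfs(graph, "A", "E"))        # ['A', 'C', 'E']
```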
Informed Search (heuristic search)
Uninformed search = blind search; informed search = heuristics-applied search.
Data-driven search (forward chaining), goal-driven search (backward chaining), or a hybrid of the two.
Data-driven or goal-driven? If we search from Thomas Jefferson, through his descendants, towards John, we need to search N^10 descendants, where N is the average number of children a person has. This represents more work than the previous method if N > 2. But if all of Thomas Jefferson's descendants are known while few of John's ancestors are known, we do not have a choice.
Why heuristics? We are forced to use heuristics because the computational cost of finding the solution is often prohibitive. A heuristic can lead a search algorithm to a sub-optimal solution or fail to find any solution at all; this inherent limitation of heuristic search cannot be eliminated by "better" heuristics or more efficient search algorithms.
Hill-Climbing
1. Greedy Best-First (h(n), open, closed)
2. A* (depth count) (f(n) = g(n) + h(n), open, closed), where h(n) is not larger than the real cost (admissible). (A sketch follows this list.)
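A compact A* sketch matching f(n) = g(n) + h(n) with open and closed lists; the weighted graph and heuristic values here are illustrative assumptions, not from the notes.

```python
import heapq

def a_star(graph, start, goal, h):
    """A* search: f(n) = g(n) + h(n), with h admissible (never larger than the real cost)."""
    open_list = [(h(start), 0, start, [start])]   # (f, g, state, path)
    closed = set()
    while open_list:
        f, g, state, path = heapq.heappop(open_list)
        if state == goal:
            return path, g
        if state in closed:
            continue
        closed.add(state)
        for child, cost in graph.get(state, []):
            if child not in closed:
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h(child), g2, child, path + [child]))
    return None, float("inf")

# hypothetical weighted graph and (admissible) heuristic
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 3, "A": 4, "B": 1, "G": 0}.get
print(a_star(graph, "S", "G", h))   # (['S', 'B', 'G'], 5)
```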
Gaming (adversary/opponent + time)
Minimax (heuristics in gaming; search a game tree).
Example: divide a pile of matches into two non-empty and different piles; each position is assigned a value of 1 or 0.
Perfect decision (no time limit): generate the entire game tree, assess the terminal nodes, then propagate the selection at every level back up to the root.
Imperfect decision (time/space limit): partial tree search (replace the terminal test by a cut-off test), with an evaluation such as E(n) = M(n) - O(n); fixed ply depth: n-move look-ahead, blind beyond the ply depth.
Alpha-Beta (advanced minimax by pruning); it prunes branches that cannot influence the final minimax decision. (See the sketch after this section.)
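A minimal minimax-with-alpha-beta sketch over a hand-made two-ply game tree; the tree, leaf values, and helper callables are illustrative assumptions. Cutting off at a fixed ply depth replaces the terminal test with the cut-off test described above.

```python
def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning; children(node) and value(node) are supplied by the game."""
    kids = children(node)
    if depth == 0 or not kids:          # cut-off test or terminal node
        return value(node)
    if maximizing:
        best = float("-inf")
        for child in kids:
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False, children, value))
            alpha = max(alpha, best)
            if alpha >= beta:           # beta cut-off: MIN will never allow this branch
                break
        return best
    else:
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta, True, children, value))
            beta = min(beta, best)
            if alpha >= beta:           # alpha cut-off
                break
        return best

# toy 2-ply tree: root -> {L, R}; leaves carry static evaluations
tree = {"root": ["L", "R"], "L": ["L1", "L2"], "R": ["R1", "R2"]}
vals = {"L1": 3, "L2": 5, "R1": 2, "R2": 9}
print(alphabeta("root", 2, float("-inf"), float("inf"), True,
                lambda n: tree.get(n, []), lambda n: vals.get(n, 0)))   # 3
```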
Association Analysis
Association Rule Mining: given a set of transactions T, find all the rules X -> Y having support >= minsup and confidence >= minconf, where minsup and minconf are the corresponding support and confidence thresholds.
Measures: support count (σ), support, confidence (a conditional probability), lift.
The Apriori Principle: if an itemset is frequent, then all its subsets must also be frequent. Contrapositive: if {A,B} is an infrequent itemset, then all its supersets are infrequent. This strategy can be used to reduce the number of candidate itemsets.
Apriori Algorithm
Generate the (k+1)-candidates from the frequent k-itemsets (self-joining).
Pruning: 1. prune candidate itemsets containing infrequent itemsets; 2. count the support and eliminate the infrequent itemsets. (A sketch of these steps follows at the end of this section.)
FP-Tree & FP-Growth (Frequent Pattern Growth)
FP-Growth compresses the dataset by representing the frequent items in an FP-tree (frequent pattern tree).
1. The FP-tree is constructed by reading the dataset one transaction at a time and mapping each transaction onto a path in the FP-tree, where the infrequent items are discarded and frequent items are sorted in descending order of support counts.
2. If different transactions have several items in common, their paths in the FP-tree may overlap.
3. Once an FP-tree has been constructed, FP-Growth uses a recursive divide-and-conquer approach to mine the frequent itemsets.
4. Pros: FP-Growth is efficient and scalable for mining both long and short frequent itemsets; it is about an order of magnitude faster than the Apriori algorithm.
FP-tree: a novel data structure storing compressed, crucial information about frequent patterns; compact yet complete for frequent pattern mining.
FP-Growth: an efficient method for mining frequent patterns in a large database, using a highly compact FP-tree; divide-and-conquer in nature.
Why FP-Growth and not Apriori: 1. no candidate generation; 2. data compression (using the FP-tree); 3. recursive mining; 4. less disk I/O.
CPB: Conditional Pattern Base.
Construct an FP-tree from the database, scanning it twice: 1. find the frequent 1-itemsets and build the F-list; 2. construct the FP-tree (the root is a null node; a header table links the occurrences of each item). Then construct the conditional pattern base, construct the conditional pattern tree, and recursively apply FP-Growth on the conditional pattern base.
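A small pure-Python Apriori sketch illustrating the self-joining and pruning steps described above; the toy transactions and minsup value are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return all frequent itemsets (as frozensets) with support count >= minsup."""
    transactions = [set(t) for t in transactions]
    items = {frozenset([i]) for t in transactions for i in t}
    level = {c for c in items if sum(c <= t for t in transactions) >= minsup}
    frequent, k = {}, 1
    while level:
        for itemset in level:
            frequent[itemset] = sum(itemset <= t for t in transactions)
        # self-join: merge frequent k-itemsets that share k-1 items to get (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # prune: keep a candidate only if every k-subset is frequent (Apriori principle)
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # count support and keep the candidates meeting minsup
        level = {c for c in candidates if sum(c <= t for t in transactions) >= minsup}
        k += 1
    return frequent

# toy data (hypothetical)
T = [{"bread", "milk"}, {"bread", "diapers", "beer"}, {"milk", "diapers", "beer"},
     {"bread", "milk", "diapers"}, {"bread", "milk", "beer"}]
for itemset, count in sorted(apriori(T, minsup=3).items(), key=lambda kv: -kv[1]):
    print(set(itemset), count)
```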
Association Rules Generation
Pruning and generation.
Association Rule Evaluation (interestingness)
Lift is a measure between two itemsets, not just for a certain rule.
Lift = 1: independent
Lift > 1: positively correlated
Lift < 1: negatively correlated
A tiny worked example of these measures follows.
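This sketch uses the hypothetical support counts from the toy transactions above (X = {bread}, Y = {milk}) to show how support, confidence, and lift are computed.

```python
N = 5                                    # number of transactions
sigma_X, sigma_Y, sigma_XY = 4, 4, 3     # support counts for X, Y, and X u Y

support    = sigma_XY / N                # s(X -> Y) = sigma(X u Y) / N        = 0.6
confidence = sigma_XY / sigma_X          # c(X -> Y) = sigma(X u Y) / sigma(X) = 0.75
lift       = confidence / (sigma_Y / N)  # lift = c(X -> Y) / s(Y)             = 0.9375

print(support, confidence, lift)         # lift < 1: the two itemsets are (mildly) negatively correlated
```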
PCA (Principal Component Analysis)
Why do we need dimensionality reduction?
• Remove feature redundancy and noise
• Prevent overfitting
• Reduced model complexity
• Simplified data distribution
• Better visualization
What is wrong with high dimensions? High dimensions = lots of features: a complex system to process, inefficient algorithms, and overfitting to noise or other data corruptions.
PCA is a simple and popular method; it is unsupervised learning; it learns the "best" low-dimensional subspace for data projection.
Two ways to interpret "best":
1. The mean square error (MSE) of the projected data is minimized.
2. The variance of the projected data is maximized.
Minimizing MSE <=> maximizing projected variance: in the lecture figure, the subspace v on the left generates higher projected variance and, equivalently, smaller projected MSE (a smaller perpendicular offset).
Eigenvalues and eigenvectors: the magnitude of the eigenvalue represents the variance along that direction; the eigenvector indicates the direction of the principal component.
PCA algorithm:
1. Compute the mean of the data.
2. Center the data.
3. Compute the covariance matrix.
4. Find the eigenvalues and eigenvectors.
5. Project the data onto the first principal components.
Use the modified (centered and projected) x for all data samples. (A NumPy sketch of these steps follows.)
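A minimal NumPy sketch of the five PCA steps listed above; the random data and the choice of k = 2 components are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # 100 samples, 5 features (toy data)

mu = X.mean(axis=0)                      # 1. mean of the data
Xc = X - mu                              # 2. center the data
C = np.cov(Xc, rowvar=False)             # 3. compute the covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # 4. eigenvalues / eigenvectors (eigh: C is symmetric)
order = np.argsort(eigvals)[::-1]        # sort by variance, largest first
W = eigvecs[:, order[:2]]                # keep the first k = 2 principal components
Z = Xc @ W                               # 5. project the centered data onto the subspace

print(Z.shape)                           # (100, 2)
print(eigvals[order] / eigvals.sum())    # fraction of variance along each principal direction
```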
SVM
SVMs choose the maximum margin hyperplane as the classifier's decision boundary. In the example, D1 is the maximum margin hyperplane. Each decision boundary D_i is associated with a pair of hyperplanes, d_i1 and d_i2, and the distance between them is defined as the margin of the classifier.
Why are the support vectors chosen?
1. Support vectors are the data points that lie closest to the decision boundary (or margin) and play a crucial role in defining it.
2. The points selected as support vectors are the ones closest to the boundary, as they determine the optimal hyperplane.
3. Any other points in the dataset do not influence the decision boundary as significantly as these support vectors. Choosing these points ensures that the margin is maximized, leading to better generalization of the model.
Regression: Backpropagation
Prevent exploding gradients: gradient clipping.
Notation: net_k is the net input of neuron k; t_k is the target output (target value); o_k is the actual output, i.e., the output computed through the activation function.
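With this notation, and assuming a differentiable output unit and squared error (an assumption the notes do not state explicitly), the standard output-layer backpropagation update can be written as:

```latex
\delta_k = (t_k - o_k)\, f'(\mathrm{net}_k), \qquad
\Delta w_{jk} = \eta\, \delta_k\, o_j
```

where f is the activation function, η is the learning rate, and o_j is the output of the presynaptic neuron j.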
CNN (Convolutional Neural Networks)
Definition: a specialized (or simplified) form of a multilayer perceptron (MLP) where weights are shared through convolutional filters.
• The shared weights w are referred to as filters or kernels. The outputs of the convolutional layer are called feature maps or activation maps.
Pros: 1. shared weights w (CNNs efficiently reduce the number of parameters (weights) compared to general neural networks); 2. CNNs scale efficiently to image data (fully-connected neural networks do not scale efficiently for image data and tend to overfit).
RNNs (Recurrent Neural Networks)
1. Recurrent neural networks (RNNs) are designed to handle sequential data such as text, speech, time-series data, and biological sequences.
2. RNNs can have a variable number of layers, with each layer processing inputs based on its specific time step.
3. Each layer in an RNN uses the same set of network parameters, and the layer structure is repeated across time steps; the parameters are shared across time.
Bayes Decision
Bernoulli Naive Bayes; Multinomial Naive Bayes (used for document classification).
1. Prior probabilities
2. Class-conditional probabilities
3. P(x) is a common denominator, so we can ignore it.
Information Gain (ID3)
Unsupervised Learning
Clustering
• Exploit similar structures amongst the data themselves. One way to summarize various data points is with a single categorical variable, i.e., the cluster centroid.
• Dimensionality reduction: another way to simplify complex, high-dimensional data; summarize the data with a lower-dimensional real-valued vector.
K-Means
HAC (Hierarchical Agglomerative Clustering)
Advantages: 1. you do not have to predefine the number of clusters; 2. any clustering result with the desired number of clusters K can be obtained by "cutting" the dendrogram at the corresponding level; 3. the result is independent of the initialization.
Linkages: MIN (single linkage), MAX (complete linkage), average linkage, centroid linkage.
K-Means vs HAC
K-Means: a simple and cheap algorithm; results are sensitive to the initialization; the number of clusters needs to be pre-defined.
HAC: a deterministic algorithm, i.e., no randomness; shows a range of clustering results for different choices of K; more memory- and computationally-intensive than K-Means. (A bare-bones K-Means sketch follows.)
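A bare-bones K-Means sketch in NumPy restating the comparison above in code: random initialization (so results depend on the seed) and a pre-chosen k; the two toy blobs are made-up data.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain K-Means: sensitive to initialization, and k must be chosen in advance."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids
    for _ in range(iters):
        # assign each point to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
        # recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# two toy blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)          # roughly [0, 0] and [5, 5]
```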