Website - Machine Learning

This document provides links to resources about machine learning algorithms including support vector machines (SVM), topic modeling, Bayesian networks, and graphical models. It discusses concepts such as regularization, kernel methods, sparse solutions, inference techniques like belief propagation, and evaluating algorithms using benchmark data. Key algorithms mentioned include SVM, probabilistic latent semantic indexing (PLSI), conditional random fields, and Markov random fields.


Pattern Recognition and Machine Learning - Christopher M. Bishop

http://jmlr.csail.mit.edu/papers/volume2/manevitz01a/manevitz01a.pdf
http://www.dbis.ethz.ch/education/ss2007/07_dbs_algodbs/FreyReport.pdf
http://www.cc.gatech.edu/~ssahay/sauravsahay7001-2.pdf
http://www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf
http://code.google.com/p/textclassification/source/browse/trunk/src/java/libsvm/svm.java?r=13
http://code.google.com/p/textclassification/source/browse/wiki/HOWTO.wiki?r=13

Concepts:
http://en.wikipedia.org/wiki/Bhattacharyya_distance
http://en.wikipedia.org/wiki/Slack_variable

Manifold:
www.math.ucla.edu/~wittman/mani/
http://www.autonlab.org/tutorials/ (important)

SVM:
http://arxiv.org/PS_cache/arxiv/pdf/1107/1107.2347v1.pdf
http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Classification/SVM
http://opencv.itseez.com/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html
http://www.cs.berkeley.edu/~bartlett/courses/281b-sp08/6.pdf
http://www.support-vector-machines.org/SVM_soft.html
http://research.cs.wisc.edu/dmi/lsvm/
http://www.statsoft.com/textbook/support-vector-machines/
http://www.support-vector.net/icml-tutorial.pdf
http://research.microsoft.com/en-us/um/people/oferd/papers/dekelsi07.pdf
http://www.cs.columbia.edu/~kathy/cs4701/documents/jason_svm_tutorial.pdf
https://sites.google.com/site/postechdm/research/implementation/svm-java
http://fedc.wiwi.hu-berlin.de/xplore/tutorials/stfhtmlnode64.html
http://jmlr.csail.mit.edu/papers/volume1/mangasarian01a/html/node3.html
http://mlg.eng.cam.ac.uk/porbanz/teaching/sheet_ml__svm.pdf
http://www.ai.mit.edu/courses/6.867-f04/lectures/lecture-7-ho.pdf
http://www.yom-tov.info/Uploads/SVM.m
http://www.cs.cmu.edu/~ggordon/SVMs/svm.m
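
The SVM links above keep coming back to the soft margin and the slack-variable idea, so here is a minimal runnable sketch of a soft-margin SVM. The use of scikit-learn, the toy Gaussian blobs, and the C value are my own choices for illustration (the notes themselves point at SVM-light, libSVM, svm.java and the .m scripts); C is the penalty on the slack variables.

```python
# Minimal soft-margin SVM sketch (scikit-learn and the toy data are my own
# choices, not something the notes prescribe).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two Gaussian blobs as toy training data.
X = np.vstack([rng.normal(-1.0, 0.6, size=(50, 2)),
               rng.normal(+1.0, 0.6, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# C controls the slack-variable penalty: small C tolerates margin violations
# (smoother boundary), large C penalizes them heavily.
clf = SVC(kernel="linear", C=0.05).fit(X, y)

print("number of support vectors:", clf.support_vectors_.shape[0])
print("prediction for (0.8, 0.9):", clf.predict([[0.8, 0.9]])[0])
```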

PLSI / topic modeling:
http://ezcodesample.com/plsaidiots/NMFPLSA.html
http://people.csail.mit.edu/fergus/iccv2005/bagwords.html
http://videolectures.net/mlss09uk_blei_tm/
http://web.mit.edu/be.400/www/SVD/Singular_Value_Decomposition.htm

Paper: finding the solution path (author: Hastie).
Multiclass classification; string kernel, diffusion kernel, Fisher kernel?
http://bioinformatics.oxfordjournals.org/content/17/4/349.full.pdf+html
http://en.wikipedia.org/wiki/Positive-definite_matrix

Original (primal) form: we need to calculate w (based on knowing alpha).
Kernel points: (1) no need to calculate w; predictions can be made from alpha and kernel evaluations directly.
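
To make the "no need to calculate w" point above concrete, here is a small NumPy sketch of the kernelized decision function f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, which uses only the alphas and kernel evaluations. The RBF kernel, the toy training points, and the alpha/b values are placeholders I made up; a real dual SVM solver would produce them.

```python
# Sketch: with a kernel, prediction only needs the alphas and kernel values,
# never an explicit weight vector w.
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel between two vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def dual_decision(x, X_train, y_train, alpha, b, kernel=rbf_kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b  -- no w anywhere."""
    return sum(a_i * y_i * kernel(x_i, x)
               for a_i, y_i, x_i in zip(alpha, y_train, X_train)) + b

if __name__ == "__main__":
    X_train = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y_train = np.array([-1.0, 1.0, 1.0])
    alpha = np.array([0.3, 0.2, 0.1])   # made-up dual coefficients
    print(dual_decision(np.array([0.5, 0.5]), X_train, y_train, alpha, b=0.0))

# For a *linear* kernel one could instead recover w = sum_i alpha_i y_i x_i and
# predict with w.x + b; the kernel form avoids that step entirely.
```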

Derive: structural loss function (2nd slide, some formula).

Why a sigmoid for the Bernoulli? The sigmoid maps a real score into (0, 1), a valid Bernoulli parameter. Bernoulli -> tossing a coin (e.g. is it a book or not?).
1) For P(y | x; theta):
   case 1: y in R, Gaussian -> least squares.
   case 2: y in {0, 1} -> Bernoulli -> logistic regression (maximum likelihood).
   Softmax regression is the generalization of logistic regression; logistic regression is just the case k = 2.
2) Kernel, or Mercer kernel --> look into the source code of SVM-light (C = 0.05) (author: Thorsten) --> libSVM.

SVD - Singular Value Decomposition.
PCA: an objective function with a constraint (the projection direction is normalized to 1).
Bag-of-words and tf-idf do not link related documents that use different words with the same meaning, so we go to SVD.
(1) maximum likelihood; (3) posterior probability.
PLSA log-likelihood: sum over (d, w) of n(d, w) * log( sum over z of P(w|z) P(d|z) P(z) ).
Why P(w|z)? -> w and d are conditionally independent given z, so w is connected to d only through z.

HW: derive the Newton-Raphson method. What is the Gibbs distribution? What is a sparse solution in your probability distribution model? The L1 norm can be used for finding a sparse solution.

SVM, 1-norm: 1/C and w are piecewise linear. The penalty is larger, so the solution path of the objective function is not smoother. For C = 1, the square and the circle (formed at the end point of the training-data line plot) meet.
SVM, 2-norm: the penalty is higher because of the square, hence not smoother when choosing the solution path of the objective function; the circle is the 2-norm ball (formed at the end point of the training-data line plot) (for C = 1).

Generative model: P(w, d) = integral of P(w, d | theta) d theta; discriminative learning: P(y | x).
Sometimes supervised learning can be turned into unsupervised learning -> Restricted Boltzmann Machine (RBM); this is also a mixture model.
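
The "y in {0, 1} -> Bernoulli -> logistic regression (maximum likelihood)" line above can be spelled out in a few lines of code. This is a minimal sketch with synthetic data of my own choosing, fitting theta by gradient ascent on the Bernoulli log-likelihood; it is not code from the lecture.

```python
# Maximize sum_i [y_i log s_i + (1 - y_i) log(1 - s_i)] with
# s_i = sigmoid(theta . x_i), by plain gradient ascent.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
true_theta = np.array([2.0, -1.0])
y = (sigmoid(X @ true_theta) > rng.uniform(size=200)).astype(float)

theta = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ theta)
    grad = X.T @ (y - p)            # gradient of the Bernoulli log-likelihood
    theta += lr * grad / len(y)     # ascent step

print("estimated theta:", theta)    # roughly recovers the true theta [2, -1]
```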

Convex optimization: the first derivative is the gradient; the second derivative is the Hessian.
ML -> descent direction and position -> various ways to find the descent direction; <iteration> <-> convergence; step length <=> a small step (between 0 and 1) is favourable.
Second-order necessary condition: the Hessian at the solution is positive semidefinite (eigenvalues non-negative, zero allowed).
Second-order sufficient condition: the Hessian is positive definite, i.e. strictly positive (non-zero) eigenvalues.
If the objective is non-convex, we still want to find a solution: take the direction as the negative (descent) direction. Once we have the gradient, substitute it into the objective function to get the direction.
Wolfe conditions: condition (1) and condition (2): (1) is the step length satisfied? (2) is the line-search (curvature) condition satisfied?
Zoutendijk's theorem: a convergence result for line-search methods (it assumes the gradient is smooth). x-tilde = x(i+1).

Class 1: cross-validation: find lambda for regularization; the value whose F1 score is maximal is chosen as lambda. lambda (like C in SVM) = 0.05.
P(x, y) = P(y, x) -> joint probability.
Bayes' theorem -> the denominator can be removed, or treated as a constant, because P(y=1 | x) and P(y=-1 | x) share the same denominator, hence it cancels; the denominator is not needed for classification.
Expectation -> averaging with respect to a probability distribution; f(x) is, say, the weight of a ball to be drawn.
(1) Given the objective function, (2) find the gradient and descend, or take the second partial derivative and set it to zero. Newton-Raphson (quadratic convergence) converges much faster than gradient descent.
MAP (maximum a posteriori probability): an approach for maximizing the objective function (from Bayes' rule): (1) consider the prior, (2) only consider the conditional probability (maximum likelihood), (3) the denominator.
Relative entropy (KL divergence): the difference between two probability distributions.
Multivariate: Gaussian, Dirichlet (to concentrate).
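
The remark above that Newton-Raphson (quadratic convergence) is much faster than plain gradient descent is easy to see on a toy problem. The 1-D objective f(x) = x^4 + x^2, the step size, and the starting point below are all my own choices for illustration.

```python
# Newton uses the second derivative (the 1-D Hessian) and reaches the minimum
# at x = 0 in far fewer iterations than a fixed-step gradient descent.
def f_prime(x):
    return 4 * x**3 + 2 * x

def f_double_prime(x):
    return 12 * x**2 + 2

x_gd, x_newton = 3.0, 3.0
for i in range(1, 101):
    x_gd -= 0.01 * f_prime(x_gd)                               # gradient step
    x_newton -= f_prime(x_newton) / f_double_prime(x_newton)   # Newton step
    if i in (5, 20, 100):
        print(f"iter {i:3d}  gradient: {x_gd:+.6f}  newton: {x_newton:+.6f}")

# Both head toward x = 0, but Newton gets there in a handful of steps.
```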

Conjugate pairs: Dirichlet <-> multinomial; Gaussian <-> Gaussian.
--> RBM (deep learning) and RNN are similar, since they have a similar energy function.
Beta distribution. One coin toss -> Bernoulli; many tosses -> binomial; the number of people coming to a class -> Gaussian (most events in life are Gaussian).
Overfitting: penalty (lambda/2) * ||w||^2; either make w small or make lambda larger.
N(d, w) / N(d): we want to smooth this; if we just go with the raw ratio it may not work all the time, which is the reason for going to a prior distribution, e.g. a Dirichlet distribution.
The binomial has one parameter; the beta distribution has two parameters.
Conjugate prior: the posterior has the same distributional form as the prior.
Laplace distribution -> has a sparsifying effect? What is sparse -> force the vector to have fewer non-zero values.
With a conjugate prior it is easy to solve many objective functions.
PLSI - Dirichlet --> read up on the Gaussian, read topic modeling; how to avoid overfitting?
The prior is there to avoid overfitting, i.e. for smoothing.

Multivariate Gaussian: mu and the covariance matrix; the inverse of the covariance matrix is called the precision matrix. The covariance controls the orientation of the distribution and hence its contours.
1) Maximum likelihood 2) Bayesian (we have a prior, and that is the only difference).
http://en.wikipedia.org/wiki/Robbins-Monro_procedure
http://en.wikipedia.org/wiki/Student's_t-distribution
Slide: 1) compare the mappings, provided the feature space is the same.
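
The smoothing point above (the raw ratio N(d, w)/N(d) versus a prior) can be illustrated with the beta-binomial conjugate pair the notes mention: the prior's parameters act as pseudo-counts. The counts and the uniform Beta(1, 1) prior below are my own toy choices.

```python
# Raw estimate k/n breaks on sparse counts; a conjugate Beta(a, b) prior on a
# binomial gives the smoothed posterior mean (k + a) / (n + a + b).
def raw_estimate(k, n):
    return k / n if n else float("nan")

def beta_smoothed(k, n, a=1.0, b=1.0):
    """Posterior mean after k successes in n trials; a, b act as pseudo-counts."""
    return (k + a) / (n + a + b)

for k, n in [(0, 3), (3, 3), (40, 100)]:
    print(f"k={k:2d} n={n:3d}  raw={raw_estimate(k, n):.3f}  "
          f"smoothed={beta_smoothed(k, n):.3f}")

# With few observations the raw ratio hits the extremes 0 or 1; the prior pulls
# it toward 1/2, exactly the smoothing role described in the notes.
```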

www.cse.st.hk/TL/index.html
[email protected]

Topic: directed/undirected graphical models; inference on topic models.
Sparse: no structure (no link information between words), only instance information.
Sigmoid or Gaussian CDF (probit): converts a real value to a value between 0 and 1.
Graph cut: maximize the conditional probability.
http://en.wikipedia.org/wiki/Conditional_random_field
Conditional random field: (1/Z) * sum over i of E(y(i), G(i), X(i)), where 1/Z is the normalizing constant.
Label bias: ?
Factor graph: no direct link between two variables, but there is a function (a factor) that links both. (1) Marginal probability and (2) conditional probability are two concepts used in factor graphs.
Loopy belief propagation -> no exact inference; messages are simply passed between nodes.

Conjugate pairs: Gaussian mean - Gaussian; Gaussian variance - gamma; binomial - beta; multinomial - Dirichlet.
http://reality.media.mit.edu/
Normalization is hard: for n nodes we have 2^n configurations.
http://en.wikipedia.org/wiki/Conditional_random_field
http://en.wikipedia.org/wiki/Viterbi_algorithm
http://en.wikipedia.org/wiki/Forward-backward_algorithm
http://en.wikipedia.org/wiki/Markov_random_field
To get benchmark data: https://www.mturk.com/mturk/welcome
Maintain a dictionary to hold keywords like "traffic" and see how it influences the prediction. Ground truth: red, green, yellow. Prediction: street -> % of traffic. p(w, street) = sum over z of p(z) p(w|z) p(street|z).
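
The "normalization is hard: n nodes -> 2^n" remark above can be demonstrated by computing the partition function Z of a tiny binary Markov random field by brute force. The chain structure, the agreement potential, and n = 10 are my own toy choices; the point is only that the sum ranges over 2^n configurations.

```python
# Brute-force partition function Z of a small binary chain MRF.
import itertools
import math

n = 10                      # 2**10 = 1024 configurations; 2**n blows up fast
coupling = 0.8              # strength of the agreement potential (toy value)

def energy(labels):
    """Negative log-potential of a chain: reward equal neighbouring labels."""
    return -coupling * sum(1.0 if a == b else 0.0
                           for a, b in zip(labels, labels[1:]))

Z = sum(math.exp(-energy(cfg))
        for cfg in itertools.product((0, 1), repeat=n))
print(f"enumerated {2**n} configurations, Z = {Z:.2f}")

# A chain actually admits exact O(n) normalization (forward-backward / Viterbi,
# linked above); the brute-force sum is only here to show the 2^n growth.
```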

qinghua dong mer, yi fir ,
