MLT Unit-2
Machine Learning Techniques Unit 2 Notes
Ques 1) Discuss support vectors in SVM. (2020-21, 2M / 2020-21, 10M)

SVM or Support Vector Machine is a linear model for classification and regression problems. It can solve linear and non-linear problems and works well for many practical problems. It classifies data by finding the hyperplane that maximizes the margin between the classes in the training data. Hence, SVM is an example of a large margin classifier.

The idea of SVM is simple: the algorithm creates a line or a hyperplane which separates the data into classes.

According to the SVM algorithm, we find the points closest to the line from both classes. These points are called support vectors. We compute the distance between the line and the support vectors; this distance is called the margin. Our goal is to maximize the margin: the hyperplane for which the margin is maximum is the optimal hyperplane. Thus SVM tries to make a decision boundary in such a way that the separation between the two classes is as wide as possible.

SVM KERNELS
* SVM can work well on non-linear data by using the kernel trick.
* The function of the kernel trick is to map the low-dimensional input space into a higher-dimensional space.
* In simple words, a kernel converts a non-separable problem into a separable problem by adding more dimensions to it.
* It makes SVM more powerful, flexible and accurate.

[Figure: data that is not linearly separable in the input space becomes separable by a hyperplane after the kernel mapping.]

THREE TYPES OF KERNEL
1) Linear kernel: a linear kernel can be used as a normal dot product of any two given observations. The kernel function is:
K(x, xi) = sum(x * xi)

2) Polynomial kernel: a more generalized form of the linear kernel that can distinguish curved or non-linear input spaces. It is popular in image processing. The formula is:
K(x, xi) = (1 + sum(x * xi))^d, where d is the degree of the polynomial

3) Gaussian Radial Basis Function (RBF) kernel: mostly used in SVM classification, it maps the input space into an infinite-dimensional space. It is a general-purpose kernel, used when there is no prior knowledge about the data. The formula is:
K(x, xi) = exp(-gamma * sum((x - xi)^2)), where gamma = 1/(2*sigma^2)

APPLICATIONS OF SVM
Face detection: SVM classifies parts of an image as face or non-face and creates a square boundary around the face.

Handwriting recognition: SVMs are widely used to recognize handwritten characters.

Texture classification: we use images of certain textures and use that data to classify whether a surface is smooth or not.

Steganography detection in digital images: using SVM, we can find out whether an image is pure or adulterated. This can be used in security-based organizations to uncover secret messages, since messages can be encrypted in high-resolution images. In high-resolution images there are more pixels, so a hidden message is harder to find. We can segregate the pixels, store the data in various datasets, and analyze those datasets using SVM.

PROPERTIES OF SVM:
1. Flexibility in choosing a similarity function.
2. Sparseness of the solution when dealing with large data sets: only the support vectors are used to specify the separating hyperplane.
3. Ability to handle large feature spaces: the complexity does not depend on the dimensionality of the feature space.
4. Overfitting can be controlled by the soft margin approach (we intentionally let some data points enter the margin).
5. It is a simple convex optimization problem which is guaranteed to converge to a single global solution.

DISADVANTAGES OF SVM:
1. The SVM algorithm is not suitable for very large data sets because the required training time is high.
2. SVM does not perform very well when the data set has more noise, i.e. when the target classes overlap.
3. In cases where the number of features for each data point exceeds the number of training samples, the SVM will underperform.
4. SVMs with the 'wrong' kernel: choosing the right kernel function is key. For example, using the linear kernel when the data are not linearly separable results in the algorithm performing poorly.
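Putting the above together, here is a minimal sketch of an RBF-kernel SVM, assuming scikit-learn is available; the toy dataset and the parameter values (C, gamma) are illustrative choices, not taken from the notes:

```python
# Train an RBF-kernel SVM on non-linearly separable toy data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# gamma plays the role of 1/(2*sigma^2) in the RBF formula above.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X_train, y_train)

# Only the support vectors define the separating hyperplane (sparseness property).
print("number of support vectors:", clf.support_vectors_.shape[0])
print("test accuracy:", clf.score(X_test, y_test))
```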
Ques) What is regression? (2020-21, 2M)

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and determining the causal-effect relationship between variables.

Some examples of regression:
o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Regression is used to find the trends in data. By performing a regression, we can confidently determine the most important factor, the least important factor, and how each factor affects the other factors.

Linear Regression vs Logistic Regression:
1. Linear Regression is a supervised regression model; Logistic Regression is a supervised classification model.
2. In Linear Regression we predict a continuous numeric value; in Logistic Regression we predict the value 1 or 0.
3. Linear Regression is based on least square estimation; Logistic Regression is based on maximum likelihood estimation.
4. In Linear Regression, when we plot the training dataset, a straight line can be drawn that touches the maximum number of points. In Logistic Regression, any change in a coefficient changes both the direction and the steepness of the logistic function: positive slopes result in an S-shaped curve and negative slopes result in a Z-shaped curve.
5. Linear regression is used to estimate the dependent variable when the independent variables change, for example to predict the price of houses. Logistic regression is used to calculate the probability of an event, for example to classify whether tissue is benign or malignant.
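A hedged sketch of this contrast in code, assuming scikit-learn; the one-feature data is illustrative:

```python
# Linear regression predicts a continuous value (least squares);
# logistic regression predicts a probability of class 1 (maximum likelihood).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.arange(10).reshape(-1, 1)            # one predictor variable
y_cont = 3.0 * X.ravel() + 2.0              # continuous target
y_bin = (X.ravel() > 4).astype(int)         # 0/1 target

lin = LinearRegression().fit(X, y_cont)     # least square estimation
logit = LogisticRegression().fit(X, y_bin)  # maximum likelihood estimation

print(lin.predict([[12]])[0])               # ~38.0, a continuous value
print(logit.predict_proba([[12]])[0, 1])    # a probability between 0 and 1
```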
Ques) Differentiate between Bayesian networks and neural networks. (2020-21, 2M)

The vertices and edges in a Bayesian network have meaning: the network structure itself gives you important information about the conditional dependence between the variables. With neural networks, the network structure does not tell you anything of that kind. A similarity between an ANN and a Bayesian network is that both use directed graphs.

Ques 2) Explain Bayes theorem. (2021-22, 10M) Or Explain the Naive Bayes classifier. (2021-22, 10M)

Bayes theorem is one of the most popular machine learning concepts: it helps to calculate the probability of one event occurring, with uncertain knowledge, when another event has already occurred. Bayes theorem is a way of finding a probability when we know certain other probabilities:

P(X|Y) = P(Y|X) * P(X) / P(Y)

When we know how often Y happens given that X happens, written P(Y|X), how likely X is on its own, written P(X), and how likely Y is on its own, written P(Y), the equation tells us how often X happens given that Y happens, written P(X|Y). The above equation is called Bayes Rule or Bayes Theorem.

o P(X|Y) is called the posterior, which we need to calculate. It is defined as the updated probability after considering the evidence.
o P(Y|X) is called the likelihood. It is the probability of the evidence when the hypothesis is true.
o P(X) is called the prior probability: the probability of the hypothesis before considering the evidence.
o P(Y) is called the marginal probability. It is defined as the probability of the evidence under any consideration.

Hence, Bayes theorem can be written as: posterior = likelihood * prior / evidence.

EXAMPLE:
o Dangerous fires are rare (1%)
o But smoke is fairly common (10%) due to barbecues
o And 90% of dangerous fires make smoke

We can then discover the probability of a dangerous fire when there is smoke:
P(Fire|Smoke) = P(Fire) * P(Smoke|Fire) / P(Smoke) = (1% * 90%) / 10% = 9%

Naive Bayes Classifier Algorithm
o The Naive Bayes algorithm is a supervised learning algorithm, based on Bayes theorem and used for solving classification problems.
o It is mainly used in text classification, which involves high-dimensional training datasets.
o It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
o Some popular examples of the Naive Bayes algorithm are spam filtration, sentiment analysis, and classifying articles.

The distinction between Bayes theorem and Naive Bayes is that Naive Bayes assumes conditional independence where Bayes theorem does not: the relationships between all input features are assumed to be independent.

Working of the Naive Bayes Classifier:
The working of the Naive Bayes classifier can be understood with the help of the example below. Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we follow these steps:
1. Convert the given dataset into frequency tables.
2. Generate the likelihood table by finding the probabilities of the given features.
3. Use Bayes theorem to calculate the posterior probability.

Problem: if the weather is sunny, should the player play or not?

Dataset (first rows shown; 14 days in total):
Outlook | Play
Rainy | Yes
Sunny | Yes
Overcast | Yes
...

Frequency table:
Weather | Yes | No
Overcast | 5 | 0
Rainy | 2 | 2
Sunny | 3 | 2
Total | 10 | 4

Likelihood table:
Weather | No | Yes | P(Weather)
Overcast | 0 | 5 | 5/14 = 0.35
Rainy | 2 | 2 | 4/14 = 0.29
Sunny | 2 | 3 | 5/14 = 0.35
All | 4/14 = 0.29 | 10/14 = 0.71

Applying Bayes theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60

P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41

Since P(Yes|Sunny) > P(No|Sunny), on a sunny day the player can play the game.
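The same calculation can be reproduced in a few lines of code; a minimal sketch, assuming nothing beyond the counts in the frequency table above:

```python
# Reproduce the sunny-day Naive Bayes calculation from the frequency table
# (10 "Yes" days, 4 "No" days, 14 days in total).
counts = {"Yes": {"Sunny": 3, "Rainy": 2, "Overcast": 5},
          "No":  {"Sunny": 2, "Rainy": 2, "Overcast": 0}}

total = sum(sum(c.values()) for c in counts.values())  # 14 days

def posterior(play, outlook):
    prior = sum(counts[play].values()) / total                       # P(play)
    likelihood = counts[play][outlook] / sum(counts[play].values())  # P(outlook | play)
    evidence = sum(counts[p][outlook] for p in counts) / total       # P(outlook)
    return likelihood * prior / evidence

print(round(posterior("Yes", "Sunny"), 2))  # 0.6
print(round(posterior("No", "Sunny"), 2))   # 0.4 (the notes get 0.41 from rounded inputs)
```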
Ques 3) What problem does the EM algorithm solve? (2021-22, 10M) Or What is the task of the E-step in the EM algorithm? (2020-21, 2M)

The Expectation-Maximization (EM) algorithm is defined as the combination of various unsupervised machine learning algorithms, and is used to determine the local maximum likelihood estimates (MLE) or maximum a posteriori (MAP) estimates for unobservable variables in statistical models. It is also referred to as the latent variable model: it is a technique for finding maximum likelihood estimates when latent variables are present.

A latent variable model consists of both observable and unobservable variables. Observable variables can be predicted, while unobserved variables are inferred from the observed variables; these unobservable variables are known as latent variables.

Steps in the EM Algorithm
The EM algorithm is completed mainly in four steps: the Initialization step, the Expectation step, the Maximization step, and the Convergence step.

[Figure: EM flowchart — initial values → E-step → M-step → convergence check, looping back to the E-step until convergence.]

o 1st step: The very first step is to initialize the parameter values. The system is provided with incomplete observed data, with the assumption that the data is obtained from a specific model.
o 2nd step: This step is known as the Expectation or E-step, which is used to estimate or guess the values of the missing or incomplete data using the observed data. The E-step primarily updates the variables.
o 3rd step: This step is known as the Maximization or M-step, where we use the complete data obtained from the 2nd step to update the parameter values. The M-step primarily updates the hypothesis.
o 4th step: The last step is to check whether the values of the latent variables are converging. If yes, stop the process; else, repeat the process from step 2 until convergence occurs.
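A hedged sketch of these four steps, not taken from the notes: EM for a one-dimensional mixture of two Gaussians in plain NumPy, where the component that generated each point is the latent variable. The data, initial values, and fixed iteration count (standing in for the convergence check) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data drawn from two Gaussians; which component produced each point is latent.
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 150)])

# 1st step: initialize the parameters (weights, means, variances).
w, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):  # fixed iterations instead of an explicit convergence test
    # E-step: estimate the missing data, i.e. each component's
    # responsibility for each observed point.
    pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = w * pdf
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the completed data.
    n_k = resp.sum(axis=0)
    w = n_k / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / n_k
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(mu.round(2))  # should approach the true component means (-2 and 3)
```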
