MLT Part 1

mlt notes

Uploaded by

shubham samadariya

What is Learning?

• In machine learning, "learning" is the process by which a computer program or system gains knowledge and improves its performance over time.
• Machine learning is the general term for computers learning from data.
• It describes the intersection of computer science and statistics, where algorithms perform a specific task without being explicitly programmed; instead, they recognize patterns in the data and make predictions once new data arrives.

Types of Learning

We can divide learning methods into five types:
1. Rote learning (memorization): memorizing things without knowing the concept or logic behind them.
2. Passive learning (instructions): learning from a teacher or expert.
3. Analogy (experience): learning new things from our past experience.
4. Inductive learning (experience): formulating a generalized concept on the basis of past experience.
5. Deductive learning: deriving new facts from past facts.

Inductive learning is based on formulating a generalized concept after observing a number of instances (examples) of the concept.

Inductive vs Deductive Learning

[Figure: inductive and deductive reasoning flows. Labels: Data, Pattern, Conclusion, Experiment, Evidence]

Inductive Reasoning | Deductive Reasoning
A bottom-up approach | A top-down approach
Uses specific premises to form a general conclusion | Uses general premises to form a specific conclusion
Conclusions are probabilistic | Conclusions are certain

Inductive and Deductive Learning

Aspect | Inductive Learning | Deductive Learning
Approach | Bottom-up | Top-down
Data | Specific examples | Logical rules and procedures
Model creation | Finds correlations and patterns in data | Obeys clearly stated guidelines and instructions
Training | Adapts model parameters and learns from instances | Rules are programmed and established explicitly
Goal | Generalize and make predictions on fresh data | Build a model that complies precisely with the given guidelines and instructions
Examples | Decision trees, neural networks, clustering algorithms | Knowledge-based systems, expert systems, rule-based systems
Strengths | Can learn from a variety of complicated data; adaptable and versatile | Accurate when following established norms and processes; effective at specific duties
Limitations | Complex and diverse data may be difficult to manage; may overfit to specific facts | Limited to well-defined duties and norms; possibly incapable of adjusting to novel circumstances

Lecture 2: Content

• Well-posed learning problem
• Examples of learning problems
  • Checkers learning problem
  • Handwriting recognition learning problem
  • Robot driving learning problem
• Designing a learning system

Well-Posed Learning Problems

• Definition 1 (Mitchell 1997): A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Three features must be identified:
  - The class of tasks
  - The measure of performance to be improved
  - The source of experience
• Definition 2 (Hadamard 1902): A (machine learning) problem is well-posed if a solution to it exists, if that solution is unique, and if that solution depends on the data/experience but is not sensitive to (reasonably small) changes in the data/experience.
Hadamard believed that mathematical models of physical phenomena should have the following properties:
• a solution exists,
• the solution is unique,
• the solution's behavior changes continuously with the initial conditions.

Examples:
• A checkers learning problem
• A handwriting recognition learning problem
• A robot driving learning problem
• A spam mail detection learning problem

A handwriting recognition learning problem
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications

A robot driving learning problem
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance traveled before an error (as judged by a human overseer)
• Training experience E: a sequence of images and steering commands recorded while observing a human driver

A spam mail detection learning problem
• Task T: recognizing and classifying mails as 'spam' or 'not spam'
• Performance measure P: percent of mails correctly classified as 'spam' (or 'not spam') by the program
• Training experience E: a set of mails with given labels ('spam' / 'not spam')

Designing a Learning System

• The exact type of knowledge to be learned (choosing the target function)
• A representation for this target knowledge (choosing a representation for the target function)
• A learning mechanism (choosing an approximation algorithm for the target function)

Recall the definition: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
One of the examples discussed was learning the game of checkers. For this example the parameters T, P, and E are:
• T: playing checkers.
• P: percentage of games won against the opponent.
• E: playing practice games against itself.

Designing a Learning System

[Figure: training set and evaluation procedure]
1. Training inputs and outputs.
2. Target function: to learn the best move.
3. Learning algorithm: to improve from experience.
4. Evaluation procedure: to evaluate the improvement.

Steps to Design a Learning System

Designing a learning system is a five-step process:
1. Choosing the training experience
2. Choosing the target function
3. Choosing a representation for the target function
4. Choosing a function approximation algorithm
5. The final design

1. Choosing the Training Experience

The type of training experience chosen has a considerable impact on the algorithm. The characteristics of the training data need to be similar to those of the total data set. To choose the right training experience, consider these three attributes:
1. Feedback: whether the training experience provides direct or indirect feedback on the choices made by the performance system.
2. Degree of control: the extent to which the learner can control the sequence of training examples.
3. Representativeness: how well the training experience represents the distribution of examples over which performance will be tested.

2. Choosing the Target Function

The next design decision is to determine exactly what kind of knowledge will be acquired and how the performance program will use it. The choice of the target function is a key decision in designing the entire system. The target function is written $V: B \to \mathbb{R}$, meaning that V maps any legal board state from the set B to a real value.
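
To make the idea of a target function V: B -> R more concrete, here is a minimal sketch of how a performance program might use such a function to choose a move. The two-number board encoding, the scoring rule inside `v_hat`, and the function names are assumptions made for illustration only; they are not taken from these notes.

```python
from typing import Callable, List, Tuple

# Hypothetical board summary: (own piece count, opponent piece count).
Board = Tuple[int, int]

def v_hat(board: Board) -> float:
    """Toy stand-in for V: B -> R. A higher value means a better board for us."""
    own, opp = board
    return float(own - opp)

def choose_move(successors: List[Board],
                value: Callable[[Board], float] = v_hat) -> Board:
    """Pick the successor board state that the value function scores highest."""
    return max(successors, key=value)

# Three board states reachable from the current position (made-up data).
candidates = [(12, 12), (12, 11), (11, 12)]
best = choose_move(candidates)
print(best)
```

Any function from board states to real numbers could be plugged in as `value`; the learning problem is then to find a good one.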
3. Choosing a Representation for the Target Function

Once the target function is chosen, we have to choose a representation for it. When the learning algorithm has a complete list of all legal moves, the NextMove function selects the target move, that is, the move expected to increase the success rate. For example, if a chess-playing machine has four alternative moves, it will select the move most likely to lead to victory.

4. Choosing a Function Approximation Algorithm

In this step we choose a learning algorithm that can approximate the chosen target function. The step consists of two sub-steps:
a. Estimating the training values
b. Adjusting the weights

5. The Final Design

The final design consists of four modules, as shown in the figure.
1. Performance system: solves the given performance task.
2. Critic: takes the history of the game and generates training examples.
3. Generalizer: outputs the hypothesis that is its estimate of the target function.
4. Experiment generator: takes the current hypothesis and creates a new problem for the performance system to explore.

Limitations of Machine Learning

• Data quality
1. The most common issue when using ML is poor data quality. To get high-quality data, you must implement data evaluation, integration, exploration, and governance techniques before developing ML models.
2. The accuracy of ML is driven by the quality of the data.
3. Common issues include lack of good clean data, the ability to apply the correct learning algorithms, the black-box approach, bias in training data and algorithms, etc.

• Transparency
1. It is often very difficult to make definitive statements about how well a model is going to generalize in new environments.
2. It is a black box for most people.
Developers like to go through the code to figure out how things work, whereas customers only instrument the code.

• Traceability and reproduction of results
1. An experiment produces results for one scenario, and as things change during the experimentation process it becomes harder to reproduce the same results.
2. The best approach we have found is to simplify a need to its most basic construct and evaluate performance and metrics before applying ML further.

Issues in Machine Learning

• On the basis of training data
1. How much training data is sufficient?
2. What general bounds can be found to relate the confidence in learned hypotheses to the amount of training experience and the character of the learner's hypothesis space?

• On the basis of algorithms
1. What algorithms exist for learning general target functions from specific training examples?
2. In what settings will particular algorithms converge to the desired function, given sufficient training data?
3. Which algorithms perform best for which types of problems and representations?

• On the basis of prior knowledge
1. When and how can prior knowledge held by the learner guide the process of generalizing from examples?
2. Can prior knowledge be helpful even when it is only approximately correct?

• On the basis of training experience
1. What is the best strategy for choosing the training experience?
2. How does the choice of strategy alter the complexity of the learning problem?

• What is the best way to reduce the learning task to one or more function approximation problems?
• How can the learner automatically alter its representation to improve its ability to represent and learn the target function?

Regression analysis is a statistical method and supervised learning technique used to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us understand how the value of the dependent variable changes with one independent variable when the other independent variables are held fixed.

• It predicts continuous/real values such as temperature, age, salary, price, etc.
• In regression, we fit a line or curve to the given data points; using this fit, the machine learning model can make predictions about the data.
• Regression finds a line or curve on the target-predictor graph such that the vertical distance between the data points and the regression line is minimum. The line generally does not pass through every data point.
• The distance between the data points and the line tells whether a model has captured a strong relationship or not.

Terms used in regression analysis:
• Dependent variable
• Independent variable
• Outliers
• Multicollinearity
• Underfitting and overfitting

[Figure: underfitted, good fit (robust), and overfitted models]

Types of Regression

• Regression analysis helps in the prediction of a continuous variable.
• There are various real-world scenarios where we need future predictions, such as weather conditions, sales prediction, marketing trends, etc.
• For such cases we need regression analysis, a statistical method used in machine learning and data science.

• Linear regression is a statistical regression method used for predictive analysis.
• It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
• It is used for solving regression problems in machine learning.
• If there is only one input variable (x), the model is called simple linear regression.
• If there is more than one input variable, the model is called multiple linear regression.
• Linear regression shows a linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name.
• The relationship between the variables in a linear regression model can be seen in the figure below, which predicts the salary of an employee from years of experience.

[Figure: salary (Y-axis) against years of experience (X-axis) with a fitted straight line]

• The mathematical equation for linear regression is:

  $Y = aX + b$

• Here Y is the dependent (target) variable, X is the independent (predictor) variable, and a and b are the linear coefficients.
• Some popular applications of linear regression are:
  - Analyzing trends and sales estimates
  - Salary forecasting
  - Real estate prediction
  - Arriving at ETAs in traffic

• In machine learning, we try to determine the best hypothesis from some hypothesis space H, given the observed training data D.
• In Bayesian learning, the best hypothesis means the most probable hypothesis, given the data D plus any initial knowledge about the prior probabilities of the various hypotheses in H.
• Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.

• P(h) is the prior probability of hypothesis h: the initial probability that h holds, before observing the training data. P(h) may reflect any background knowledge we have about the chance that h is correct. If we have no such prior knowledge, each candidate hypothesis might simply get the same prior probability.
• P(D) is the prior probability of the training data D: the probability of D given no knowledge about which hypothesis holds.
• P(h|D) is the posterior probability of h given D. It reflects our confidence that h holds after we have seen the training data D. The posterior probability P(h|D) reflects the influence of the training data D, in contrast to the prior probability P(h), which is independent of D.
• P(D|h) is the probability (likelihood) of D given h: the probability of observing data D given some world in which hypothesis h holds. Generally, we write P(x|y) to denote the probability of event x given event y.

• In ML problems, we are interested in the probability P(h|D) that h holds given the observed training data D.
• Bayes theorem provides a way to calculate the posterior probability P(h|D) from the prior probability P(h), together with P(D) and P(D|h).

Bayes theorem:

  $P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$

• P(h|D) increases with P(h) and with P(D|h), according to Bayes theorem.
• P(h|D) decreases as P(D) increases, because the more probable it is that D will be observed independent of h, the less evidence D provides in support of h.

Sample space for events A and B

[Figure: sample space showing the outcomes where A holds and where B holds]

P(A) = 4/7, P(B) = 3/7

Is Bayes theorem correct?
P(B|A) = P(A|B) P(B) / P(A) = (2/3 × 3/7) / (4/7) = 2/4 → correct
P(A|B) = P(B|A) P(A) / P(B) = (2/4 × 4/7) / (3/7) = 2/3 → correct

Features of Bayesian learning methods:
• Each observed training example can incrementally decrease or increase the estimated probability that a hypothesis is correct.
  - This provides a more flexible approach to learning than algorithms that completely eliminate a hypothesis if it is found to be inconsistent with any single example.
• Prior knowledge can be combined with observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge is provided by asserting
  - a prior probability for each candidate hypothesis, and
  - a probability distribution over observed data for each possible hypothesis.

Basic probability formulas:
• Product rule: the probability $P(A \wedge B)$ of a conjunction of two events A and B:
  $P(A \wedge B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$
• Sum rule: the probability of a disjunction of two events A and B:
  $P(A \vee B) = P(A) + P(B) - P(A \wedge B)$
• Theorem of total probability: if events $A_1, \dots, A_n$ are mutually exclusive with $\sum_{i=1}^{n} P(A_i) = 1$, then
  $P(B) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)$

• The learner considers some set of candidate hypotheses H and is interested in finding the most probable hypothesis $h \in H$ given the observed data D.
• Any such maximally probable hypothesis is called a maximum a posteriori (MAP) hypothesis.
• We can determine the MAP hypotheses by using Bayes theorem to calculate the posterior probability of each candidate hypothesis:

  $h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \dfrac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$

• If we assume that every hypothesis in H is equally probable, i.e. $P(h_i) = P(h_j)$ for all $h_i$ and $h_j$ in H, we only need to consider P(D|h) to find the most probable hypothesis.
• P(D|h) is often called the likelihood of the data D given h.
• Any hypothesis that maximizes P(D|h) is called a maximum likelihood (ML) hypothesis, $h_{ML}$:

  $h_{ML} = \arg\max_{h \in H} P(D \mid h)$
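
The difference between the MAP and ML hypotheses can be seen in a small numeric sketch. The three hypotheses and all prior and likelihood values below are hypothetical, chosen only so that the two criteria disagree; P(D) is obtained from the theorem of total probability stated above.

```python
# Hypothetical priors P(h) and likelihoods P(D|h) for three candidate hypotheses.
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihoods = {"h1": 0.2, "h2": 0.5, "h3": 0.9}

# P(D) = sum over h of P(D|h) P(h), by the theorem of total probability.
p_data = sum(likelihoods[h] * priors[h] for h in priors)

# Bayes theorem: P(h|D) = P(D|h) P(h) / P(D).
posteriors = {h: likelihoods[h] * priors[h] / p_data for h in priors}

# MAP picks the largest posterior; ML ignores the prior and picks the largest likelihood.
h_map = max(posteriors, key=posteriors.get)
h_ml = max(likelihoods, key=likelihoods.get)

print("posteriors:", posteriors)
print("h_MAP =", h_map, " h_ML =", h_ml)
```

With these made-up numbers the strong prior on h1 and the high likelihood of h3 pull in different directions, so h_MAP and h_ML differ; with a uniform prior they would coincide.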
• For each hypothesis h in H, calculate the posterior probability

  $P(h \mid D) = \dfrac{P(D \mid h)\,P(h)}{P(D)}$

• Output the hypothesis $h_{MAP}$ with the highest posterior probability:

  $h_{MAP} = \arg\max_{h \in H} P(h \mid D)$

[Figure: types of machine learning. Supervised: task driven (predict next value). Unsupervised: data driven (identify clusters). Reinforcement: learn from mistakes]

1. Supervised Learning

• Input data is called training data and has a known label or result, such as spam/not-spam or a stock price at a given time.
• A model is prepared through a training process in which it is required to make predictions and is corrected when those predictions are wrong. The training process continues until the model achieves a desired level of accuracy on the training data.
• Example problems are classification and regression.

Supervised machine learning is a branch of artificial intelligence that focuses on training models to make predictions or decisions based on labeled training data.

[Figure: supervised learning. Labels: labeled data, test data, prediction]

2. Unsupervised Learning

• Input data is not labeled and does not have a known result.
• A model is prepared by deducing structures present in the input data. This may be to extract general rules, to systematically reduce redundancy through a mathematical process, or to organize data by similarity.
• Example problems are clustering, dimensionality reduction, and association rule learning.

Unsupervised learning is a type of machine learning where the algorithm learns from unlabeled data without any predefined outputs or target variables.

[Figure: unsupervised learning. Labels: input raw data (unlabeled), algorithms, processing, interpretation, outputs]
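
As an illustration of organizing unlabeled data by similarity, the sketch below runs k-means on a handful of one-dimensional points. k-means is not named in these notes; it is used here only as one common clustering method, and the data and starting centers are made up.

```python
from typing import List, Tuple

def kmeans_1d(points: List[float], centers: List[float],
              iters: int = 10) -> Tuple[List[float], List[int]]:
    """Alternate between assigning points to the nearest center and
    moving each center to the mean of its assigned points."""
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the closest center for every point.
        labels = [min(range(len(centers)), key=lambda k: abs(p - centers[k]))
                  for p in points]
        # Update step: mean of each cluster (keep the old center if it is empty).
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]  # no labels are given
centers, labels = kmeans_1d(data, centers=[0.0, 6.0])
print(centers, labels)
```

No labels are supplied at any point; the grouping comes only from distances between the points.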
3. Semi-Supervised Learning

• Input data is a mixture of labeled and unlabeled examples.
• There is a desired prediction problem, but the model must learn the structures to organize the data as well as make predictions.
• Example problems are classification and regression.

4. Reinforcement Learning

• Reinforcement learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.
• Reinforcement learning uses rewards and punishments as signals for positive and negative behavior.

Artificial Neural Networks (ANN)

Artificial neural networks (ANNs) are algorithms based on brain function and are used to model complicated patterns and prediction problems. The ANN is a deep learning method that arose from the concept of the biological neural networks of the human brain. The development of ANNs was the result of an attempt to replicate the workings of the human brain. The workings of ANNs are similar to those of biological neural networks, although they are not identical. An ANN algorithm accepts only numeric and structured data.

What is an Artificial Neural Network (ANN)?

An artificial neural network (ANN) is a computational model inspired by the neural structure of the human brain. It consists of interconnected nodes (neurons) organized into layers. Information flows through these nodes, and the network adjusts the connection strengths (weights) during training to learn from data, enabling it to recognize patterns, make predictions, and solve various tasks in machine learning and artificial intelligence. There are three kinds of layers in the network architecture: the input layer, one or more hidden layers, and the output layer.
Because of the numerous layers, such networks are sometimes referred to as MLPs (multi-layer perceptrons).

Types of Neural Networks

• Feedforward neural networks, or multi-layer perceptrons (MLPs): they consist of an input layer, one or more hidden layers, and an output layer. Data is fed into these models to train them, and they are the foundation for computer vision, natural language processing, and other neural networks.
• Convolutional neural networks (CNNs): similar to feedforward networks, but usually used for image recognition, pattern recognition, and/or computer vision.
• Recurrent neural networks (RNNs): a type of artificial neural network that uses sequential or time series data. These deep learning algorithms are commonly used for ordinal or temporal problems, such as language translation, natural language processing (NLP), speech recognition, and image captioning; they are incorporated into popular applications such as Siri, voice search, and Google Translate.

What is Clustering?

Clustering is the process of arranging a group of objects so that the objects in the same group (called a cluster) are more similar to each other than to the objects in any other group. Data professionals often use clustering in the exploratory data analysis phase to discover new information and patterns in the data. Because clustering is unsupervised machine learning, it does not require a labeled dataset.

Reinforcement Learning

Reinforcement learning (RL) is a type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experiences.
Both supervised and reinforcement learning use a mapping between input and output. However, in supervised learning the feedback provided to the agent is the correct set of actions for performing a task, whereas reinforcement learning uses rewards and punishments as signals for positive and negative behavior. Compared to unsupervised learning, reinforcement learning differs in its goals.

How to formulate a basic reinforcement learning problem?
1. Environment: the physical world in which the agent operates
2. State: the current situation of the agent
3. Reward: feedback from the environment
4. Policy: the method that maps the agent's state to actions
5. Value: the future reward that an agent would receive by taking an action in a particular state

[Figure: reinforcement learning model. Labels: agent, environment, state]

Decision Trees

Decision trees are constructed via an algorithmic approach that identifies ways to split a data set based on different conditions. They are one of the most widely used and practical methods for supervised learning.
• Decision trees are a non-parametric supervised learning method used for both classification and regression tasks.
• Tree models where the target variable can take a discrete set of values are called classification trees (for example, spam mail: yes/no).
• Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.

A tree can be explained by two entities: decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.

Day | Outlook | Temperature | Humidity | Wind | Play Tennis
1 | Sunny | Hot | High | Weak | No
2 | Sunny | Hot | High | Strong | No
3 | Overcast | Hot | High | Weak | Yes
4 | Rain | Mild | High | Weak | Yes
5 | Rain | Cool | Normal | Weak | Yes
6 | Rain | Cool | Normal | Strong | No
7 | Overcast | Cool | Normal | Strong | Yes
8 | Sunny | Mild | High | Weak | No
9 | Sunny | Cool | Normal | Weak | Yes
10 | Rain | Mild | Normal | Weak | Yes
11 | Sunny | Mild | Normal | Strong | Yes
12 | Overcast | Mild | High | Strong | Yes
13 | Overcast | Hot | Normal | Weak | Yes
14 | Rain | Mild | High | Strong | No

[Figure: decision tree with root Outlook. Sunny leads to Humidity (High: No, Normal: Yes); Overcast leads to Yes; Rain leads to Wind (Strong: No, Weak: Yes)]

Bayesian Networks

A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as: "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph." It is also called a Bayes network, belief network, decision network, or Bayesian model. Bayesian networks are probabilistic because they are built from a probability distribution and use probability theory for prediction and anomaly detection.

The structure is a directed acyclic graph (DAG) that expresses conditional independencies and dependencies among the random variables associated with the nodes. The parameters consist of the conditional probability distributions associated with each node. A Bayesian network is a compact, flexible, and interpretable representation of a joint probability distribution.

Bayesian Networks: Example

[Figure: example Bayesian network]

Support Vector Machine (SVM)

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it is able to categorize new data.

The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space. The hyperplane is chosen so that the margin between the closest points of different classes is as large as possible.

Types of SVMs

• Simple SVM: typically used for linear regression and classification problems.
• Kernel SVM: has more flexibility for non-linear data, because more features can be added to fit a hyperplane instead of working in a two-dimensional space.

  $Y = mx + c$

SVM: Working
[Figure: SVM working. Labels: maximum margin, maximum-margin hyperplane, positive hyperplane, negative hyperplane, support vectors]

Genetic Algorithm (GA)

A genetic algorithm is a search heuristic inspired by Charles Darwin's theory of natural evolution. The algorithm reflects the process of natural selection, where the fittest individuals are selected for reproduction in order to produce the offspring of the next generation.

A genetic algorithm (GA) is a heuristic search algorithm used to solve search and optimization problems. It belongs to the family of evolutionary algorithms. Genetic algorithms employ the concepts of genetics and natural selection to provide solutions to problems.

[Figure: gene, chromosome, and population, shown as bit strings]

Five phases are considered in a genetic algorithm:
1. Initial population: the process begins with a set of individuals called a population. An individual is characterized by a set of parameters (variables) known as genes. Genes are joined into a string to form a chromosome (solution).
2. Fitness function
3. Selection
4. Crossover
5. Mutation

Normally we consider:
• What is the most probable hypothesis given the training data?
We can also consider:
• What is the most probable classification of the new instance given the training data?

• The most probable classification of the new instance is obtained by combining the predictions of all hypotheses, weighted by their posterior probabilities.
• If the possible classification of the new example can take on any value $v_j$ from some set V, then the probability $P(v_j \mid D)$ that the correct classification for the new instance is $v_j$ is:

  $P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$

• Bayes optimal classification:

  $\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$

• Although the Bayes optimal classifier obtains the best performance that can be achieved from the given training data, it can be quite costly to apply.
• The expense is due to the fact that it computes the posterior probability for every hypothesis in H and then combines the predictions of each hypothesis to classify each new instance.

Example:

  $P(h_1 \mid D) = .4,\quad P(\ominus \mid h_1) = 0,\quad P(\oplus \mid h_1) = 1$
  $P(h_2 \mid D) = .3,\quad P(\ominus \mid h_2) = 1,\quad P(\oplus \mid h_2) = 0$
  $P(h_3 \mid D) = .3,\quad P(\ominus \mid h_3) = 1,\quad P(\oplus \mid h_3) = 0$

Probabilities:

  $\sum_{h_i \in H} P(\oplus \mid h_i)\,P(h_i \mid D) = .4$
  $\sum_{h_i \in H} P(\ominus \mid h_i)\,P(h_i \mid D) = .6$

Result:

  $\arg\max_{v_j \in \{\oplus, \ominus\}} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D) = \ominus$

• An alternative, less optimal method is the Gibbs algorithm:
1. Choose a hypothesis h from H at random, according to the posterior probability distribution over H.
2. Use h to predict the classification of the next instance x.

• One highly practical Bayesian learning method is the naive Bayes learner (naive Bayes classifier).
• The naive Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V.
• A set of training examples is provided, and a new instance is presented, described by the tuple of attribute values $(a_1, a_2, \dots, a_n)$.
• The learner is asked to predict the target value (classification) for this new instance.
• The Bayesian approach to classifying the new instance is to assign the most probable target value $v_{MAP}$, given the attribute values $(a_1, a_2, \dots, a_n)$ that describe the instance:

  $v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2, \dots, a_n)$

• By Bayes theorem:

  $v_{MAP} = \arg\max_{v_j \in V} \dfrac{P(a_1, a_2, \dots, a_n \mid v_j)\,P(v_j)}{P(a_1, a_2, \dots, a_n)} = \arg\max_{v_j \in V} P(a_1, a_2, \dots, a_n \mid v_j)\,P(v_j)$

• It is easy to estimate each $P(v_j)$ simply by counting the frequency with which each target value $v_j$ occurs in the training data.
• However, estimating the different $P(a_1, a_2, \dots, a_n \mid v_j)$ terms is not feasible unless we have a very, very large set of training data.
  - The problem is that the number of these terms is equal to the number of possible instances times the number of possible target values.
  - Therefore, we would need to see every instance in the instance space many times in order to obtain reliable estimates.

• The naive Bayes classifier is based on the simplifying assumption that the attribute values are conditionally independent given the target value.
• Given the target value of the instance, the probability of observing the conjunction $a_1, a_2, \dots, a_n$ is just the product of the probabilities for the individual attributes:

  $P(a_1, a_2, \dots, a_n \mid v_j) = \prod_i P(a_i \mid v_j)$

• Naive Bayes classifier:

  $v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$

• A Bayesian belief network is a key computer technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as:
• "A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian model.
• A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts:
  - Directed acyclic graph
  - Table of conditional probabilities
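
The two parts, a directed acyclic graph and conditional probability tables, can be written down directly as data. The sketch below uses a small hypothetical three-node network (Rain, Sprinkler, WetGrass) with made-up probabilities; none of these names or numbers come from the notes. It assumes the usual factorization of the joint distribution into one conditional probability per node given its parents.

```python
from itertools import product

# Part 1: the DAG, stored as node -> tuple of parent nodes.
parents = {
    "Rain": (),
    "Sprinkler": ("Rain",),
    "WetGrass": ("Sprinkler", "Rain"),
}

# Part 2: conditional probability tables, P(node = True | parent values).
# All numbers are illustrative only.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(True,): 0.01, (False,): 0.4},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

def joint_probability(assignment):
    """Multiply P(node | parents) over all nodes for one full assignment."""
    prob = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in pars)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

p = joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True})
print(p)  # 0.2 * 0.99 * 0.8

# Sanity check: the joint probabilities of all 8 assignments sum to 1.
total = sum(joint_probability(dict(zip(parents, values)))
            for values in product([True, False], repeat=len(parents)))
print(total)
```

Storing only one small table per node, rather than a full table over all variable combinations, is what makes the representation compact.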
- The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
- A Bayesian network graph is made up of nodes and arcs (directed links), where each node represents a random variable and each arc represents a direct dependency between two variables.

[Figure: example directed acyclic graph of nodes connected by arcs.]

- The Bayesian network has mainly two components:
  - Causal component
  - Actual numbers
- Each node in the Bayesian network has a conditional probability distribution $P(X_i \mid Parent(X_i))$, which determines the effect of the parent on that node.
- A Bayesian network is based on the joint probability distribution and conditional probability. For variables $x_1, \ldots, x_n$, the probabilities of the different combinations of their values are known as the joint probability distribution; by the chain rule, $P[x_1, \ldots, x_n] = P[x_1 \mid x_2, \ldots, x_n] \cdot P[x_2, \ldots, x_n]$, applied repeatedly.

- Problem: calculate the probability that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and David and Sophia both called Harry.

All events occurring in this network:

- Burglary (B)
- Earthquake (E)
- Alarm (A)
- David calls (D)
- Sophia calls (S)

[Figure: the alarm network with a conditional probability table at each node. B and E are the parents of A, and A is the parent of D and S. The table values are not legible in this copy.]

Expanding the joint probability with the chain rule, and dropping the variables a node does not depend on:

P[D, S, A, B, E] = P[D | S, A, B, E] . P[S, A, B, E]
= P[D | S, A, B, E] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A, B, E] . P[A, B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B, E]
= P[D | A] . P[S | A] . P[A | B, E] . P[B | E] . P[E]

There are two ways to understand the semantics of a Bayesian network:

1. To understand the network as the representation of the joint probability distribution.
   - It is helpful for understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements.
   - It is helpful in designing inference procedures.

- Support Vector Machine (SVM) is one of the most popular supervised learning algorithms and is used for classification as well as regression problems.
- The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes, so that we can easily put a new data point in the correct category in the future.
- This best decision boundary is called a hyperplane.
- SVM chooses the extreme points/vectors that help in creating the hyperplane.
- These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.

- SVM can be understood with an example. Suppose we see a strange cat that also has some features of dogs.
- If we want a model that can accurately identify whether it is a cat or a dog, such a model can be created by using the SVM algorithm.

[Figure: past labeled data is used for model training; the trained model then takes new data and produces a prediction as output.]

Since the SVM creates a decision boundary between these two sets of data (cat and dog) and chooses the extreme cases (support vectors), it will see the extreme cases of cat and dog. On the basis of the support vectors, it will classify the animal as a cat.

- Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes by using a single straight line, the data is termed linearly separable, and the classifier used is called a linear SVM classifier.
- Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified by using a straight line, the data is termed non-linear, and the classifier used is called a non-linear SVM classifier.

- Hyperplane: There can be multiple lines/decision boundaries that segregate the classes in n-dimensional space, but we need to find the best decision boundary for classifying the data points. This best boundary is known as the hyperplane of the SVM.
- Margin: The gap between the two lines through the closest data points of different classes. It can be calculated as the perpendicular distance from the line to the support vectors. A large margin is considered a good margin and a small margin is considered a bad margin.
- Support vectors: The data points or vectors that are closest to the hyperplane and that affect its position are termed support vectors. Since these vectors support the hyperplane, they are called support vectors.

- An SVM model is basically a representation of the different classes, separated by a hyperplane, in multidimensional space.
- The hyperplane is generated in an iterative manner by the SVM so that the error can be minimized.
- The goal of SVM is to divide the dataset into classes by finding a maximum marginal hyperplane (MMH).

[Figure: optimal hyperplane between two classes, with the support vectors marked on either side.]

- Finding the maximum marginal hyperplane can be done in the following two steps:
  - First, SVM generates hyperplanes iteratively that segregate the classes in the best way.
  - Then, it chooses the hyperplane that separates the classes correctly with the largest margin.

SVM kernel functions

- Need for Kernel Method
- Kernel rules
- SVM kernel functions

SVM algorithms use a set of mathematical functions that are defined as the kernel. The function of a kernel is to take data as input and transform it into the form required for processing. The name "kernel" is used because the set of mathematical functions used in a Support Vector Machine provides a window through which to manipulate the data.

- So, a kernel function generally transforms the training data so that a non-linear decision surface can be transformed into a linear equation in a higher-dimensional space.
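A small numeric check can make the idea of "a linear problem in a higher-dimensional space" concrete. The sketch below assumes a degree-2 homogeneous polynomial kernel on 2-D points, chosen only because its feature map is short enough to write out; the names `phi`, `dot` and `poly2_kernel` are hypothetical helpers, not part of any library.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map from 2-D input space to 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Kernel computed directly in input space: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


x = (1.0, 2.0)
y = (3.0, -1.0)

kernel_value = poly2_kernel(x, y)        # computed without leaving 2-D
feature_value = dot(phi(x), phi(y))      # computed after mapping to 3-D

print(kernel_value, feature_value)
```

Both computations give the same number, which is why an SVM can work with the kernel alone and never needs to build the higher-dimensional vectors explicitly.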
- Essentially, a kernel returns the inner product between two points in a suitable feature space.

[Figure: separation may be easier in higher dimensions. A feature map takes data whose separating boundary is complex in low dimensions to a space where a separating hyperplane is simple.]

Need for Kernel Method

- It is very difficult to solve this classification problem using a linear classifier, as there is no straight line that can separate the red and green dots: the points are randomly distributed.
- This is where the kernel function is used. It takes the points to higher dimensions, solves the problem there and returns the output.

Kernel rules

- Define the kernel, or window function, as follows:

$$K(\bar{x}) = \begin{cases} 1 & \text{if } \lVert \bar{x} \rVert \le 1 \\ 0 & \text{otherwise} \end{cases}$$

- The value of this function is 1 inside the closed ball of radius 1 centered at the origin, and 0 otherwise.
- To get a mathematical understanding of kernels, consider Lili Jiang's equation of a kernel:

$$K(x, y) = \langle f(x), f(y) \rangle$$

  where $K$ is the kernel function, $x$ and $y$ are n-dimensional inputs, $f$ is the map from n-dimensional to m-dimensional space, and $\langle \cdot, \cdot \rangle$ is the dot product.

SVM kernel functions

- Polynomial kernel: It is popular in image processing. Equation, where $d$ is the degree of the polynomial:

$$k(x_i, x_j) = (x_i \cdot x_j + 1)^d$$

- Gaussian kernel: It is a general-purpose kernel, used when there is no prior knowledge about the data. Equation:

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)$$

- Gaussian radial basis function (RBF): It is a general-purpose kernel, used when there is no prior knowledge about the data. Equation:

$$k(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right), \quad \gamma > 0$$

  Sometimes parametrized using $\gamma = 1 / 2\sigma^2$.
- Laplace RBF kernel: This kernel is less sensitive to changes in its parameter and is otherwise equivalent to the exponential kernel. Equation:

$$k(x, y) = \exp\left(-\frac{\lVert x - y \rVert}{\sigma}\right)$$

- Hyperbolic tangent kernel: We can use it in neural networks. Equation:

$$k(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + c)$$

- Sigmoid kernel: This function is equivalent to a two-layer perceptron model of a neural network, where it is used as the activation function for artificial neurons. Equation:

$$k(x, y) = \tanh(\alpha\, x^T y + c)$$

- Bessel function kernel: We can use it to remove the cross term in mathematical functions. (The equation is not legible in this copy.)
- ANOVA radial basis kernel: We can use it in regression problems.
- This kernel is known to perform very well in multidimensional regression problems, just like the Gaussian and Laplacian kernels. It also comes under the category of radial basis kernels.
- Equation (ANOVA radial basis kernel):

$$k(x, y) = \sum_{k=1}^{n} \exp\left(-\sigma (x^k - y^k)^2\right)^d$$

- Properties of SVM
- Applications of SVM
- Issues
- Weakness of SVMs

Properties of SVM

- Flexibility in choosing a similarity function.
- Sparseness of the solution when dealing with large data sets: only support vectors are used to specify the separating hyperplane.
- Ability to handle large feature spaces: complexity does not depend on the dimensionality of the feature space.
- Overfitting can be controlled by the soft margin approach.
- Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution.
- Feature selection.

Applications of SVM

- SVM has been used successfully in many real-world problems:
  - text (and hypertext) categorization
  - image classification
  - bioinformatics (protein classification, cancer classification)
  - hand-written character recognition

Weakness of SVMs

- It is sensitive to noise: a relatively small number of mislabeled examples can dramatically decrease the performance.
- It only considers two classes. How do we do multi-class classification with SVM? Answer:
  1. With m output classes, learn m SVMs, where each SVM separates one class from all the others.
  2. To predict the output for a new input, predict with each SVM and find out which one puts the prediction the furthest into the positive region.

Issues

- Choice of kernel
  - A Gaussian or polynomial kernel is the default.
  - If ineffective, more elaborate kernels are needed.
  - Domain experts can give assistance in formulating appropriate similarity measures.
- Choice of kernel parameters
  - e.g. $\sigma$ in the Gaussian kernel.
  - $\sigma$ is the distance between closest points with different classifications.
  - In the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters.
- Optimization criterion: hard margin vs. soft margin
  - A lengthy series of experiments in which various parameters are tested.

- Functions are the most important aspect of an application.
- A function can be defined as an organized block of reusable code, which can be called whenever required.
- Python allows us to divide a large program into basic building blocks known as functions.
- A function contains a set of programming statements grouped in an indented block. Python does not use {} to delimit blocks.
- A function can be called multiple times to provide reusability and modularity to the Python program.
- Python provides various built-in functions like range() or print().
- The user can also create their own functions, which are called user-defined functions.

There are mainly two types of functions.

- User-defined functions: functions defined by the user to perform a specific task.
- Built-in functions: functions that are pre-defined in Python.

- Python provides the def keyword to define a function.
- Syntax:

    def my_function(parameters):
        function_block
        return expression

- The keyword def introduces a function definition.
- Input parameters are placed, and defined, inside the parentheses ().
- The code block within every function starts with a colon (:).
- The return statement exits a function block. We can also use return with no argument.

- The def keyword, along with the function name, is used to define the function.
- The function name must follow the rules for identifiers.
- A function accepts parameters (arguments), and they can be optional.
- The function block is started with a colon (:), and the block statements must all be at the same indentation.
- The return statement is used to return a value. A function may contain several return statements, but only one of them is executed on any given call.

Example

    # Defining function print_str(str1)
    def print_str(str1):
        """This function prints the string being passed as an argument."""
        print(str1)
        return

The function prints the argument str1.
    # Calling user-defined function print_str(str1)
    print_str("Calling the user defined function")

- In Python, after a function is created, we can call it from another function. A function must be defined before the function call; otherwise, the Python interpreter gives an error.
- To call a function, use the function name followed by parentheses.

    # function definition
    def hello_world():
        print("hello world")

    # function calling
    hello_world()

- The return statement is used at the end of the function and returns the result of the function.
- It terminates the function execution and transfers the result to where the function is called. The return statement cannot be used outside of a function.

    # Defining function
    def sum():  # note: this name shadows Python's built-in sum()
        a = 10
        b = 20
        c = a + b
        return c

    # calling sum() function in print statement
    print("The sum is:", sum())

Artificial Neural Network

- An artificial neural network (ANN) is an efficient information processing system which resembles a biological neural network in its characteristics.
- Neural networks are information processing systems which are constructed and implemented to model the human brain.
- The main objective of neural network research is to develop a computational device that models the brain, in order to perform various computational tasks at a faster rate than traditional systems.
- ANNs perform various tasks such as pattern matching, classification, approximation etc.

- ANNs possess a large number of processing elements called nodes, units or neurons.
- Each neuron is connected with the others by connection links.
- Each link is associated with weights, which contain information about the input signal.
- This information is used by the neural net to solve a particular problem.
- The collective behaviour of an ANN is characterized by its ability to learn, recall and generalize training patterns or data, similar to that of a human brain.
- ANNs have the capability to model networks of the original neurons found in the brain. Thus the ANN processing elements are called neurons or artificial neurons.

[Figure: architecture of a simple ANN with two input neurons X1 and X2 connected to one output neuron Y.]

- From the simple neuron net architecture above, the net input can be calculated as:

$$y_{in} = x_1 w_1 + x_2 w_2$$

- X1 and X2 are input neurons, which transmit signals, and Y is the output neuron, which receives the signals over weighted interconnection links.
- $x_1$ and $x_2$ are the activations of the input neurons X1 and X2.
- The output $y$ of the output neuron Y is obtained by applying an activation over the net input, that is, as a function of the net input:

$$y = f(y_{in})$$

- The function applied over the net input is called the activation function.
- The calculation of the net input is similar to the calculation of the output of a linear straight-line equation.

Biological Neural Network

- A human brain consists of a huge number of neurons, approximately $10^{11}$, with numerous interconnections.
- The schematic diagram of a biological neuron is shown below.

[Figure: biological neuron diagram showing the nucleus and dendrites.]

- The biological neuron consists of three main parts:
  1. Soma, or cell body, where the cell nucleus is located.
  2. Dendrites, where the nerve is connected to the cell body.
  3. Axon, which carries the impulses of the neuron.

Relationship between biological neurons and ANNs

    Biological neuron    ANN
    Cell                 Neuron
    Dendrites            Weights or interconnections
    Soma                 Net input
    Axon                 Output
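The net input and activation step of a single artificial neuron, as described in the ANN notes, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the notes only say that some activation function f is applied to the net input, so the binary step function, its threshold, and the example weights used here are illustrative choices, and the function names are hypothetical.

```python
def net_input(inputs, weights):
    """Weighted sum of the input activations: y_in = x1*w1 + x2*w2 + ..."""
    return sum(x * w for x, w in zip(inputs, weights))


def binary_step(y_in, threshold=0.0):
    """Assumed activation function: fire (1) at or above the threshold."""
    return 1 if y_in >= threshold else 0


def neuron_output(inputs, weights, activation=binary_step):
    """Output of the neuron: y = f(y_in)."""
    return activation(net_input(inputs, weights))


# Two input neurons X1 and X2 feeding one output neuron Y.
x = [0.5, 1.0]    # activations x1, x2 (illustrative values)
w = [0.4, -0.1]   # weights w1, w2 (illustrative values)

y_in = net_input(x, w)
y = neuron_output(x, w)
print("net input:", y_in, "output:", y)
```

Passing a different function as `activation` changes the neuron's behaviour without touching the net input calculation, which mirrors the separation between $y_{in}$ and $f$ in the notes.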
