This document provides an introduction to machine learning, including definitions and examples. It discusses the difference between supervised, unsupervised, and semi-supervised learning. It also describes common machine learning algorithms like Naive Bayes, k-means clustering, support vector machines, Apriori, linear regression, and decision trees. The key applications and characteristics of each algorithm are outlined.
Chapter 1 ML
CHAPTER ONE
Introduction to machine learning and its applications

What is machine learning?
• Machine learning, sometimes referred to as statistical learning, is a subfield of artificial intelligence (AI) whereby algorithms “learn” patterns in data to perform specific tasks.
• Arthur Samuel, a scientist at IBM, first used the term machine learning in 1959.
• He used it to describe a form of AI that involved training an algorithm to learn to play the game of checkers.
• The word learning is what’s important here, as this is what distinguishes machine learning approaches from traditional AI.
• The ML process is viewed as two phases:
a. Learning phase: the training data is fed to the learning algorithm, which builds a trained model from it.
b. Prediction phase: the trained model produces predicted results for new data.

The difference between a model and an algorithm
• In practice, we call a set of rules that a machine learning algorithm learns a model.
• Once the model has been learned, we can give it new observations, and it will output its predictions for the new data.
• We refer to these as models because they represent real-world phenomena in a simple enough way that we and the computer can interpret and understand them.
• Just as a model of the Eiffel Tower may be a good representation of the real thing but isn’t exactly the same, so statistical models are attempted representations of real-world phenomena but won’t match them perfectly.

Classes of machine learning algorithms
• All machine learning algorithms can be categorized by their learning type and the task they perform. There are four learning types: supervised, unsupervised, semi-supervised, and reinforcement learning.
• The type depends on how the algorithms learn. Do they require us to hold their hand through the learning process? Or do they learn the answers for themselves?
• Supervised and unsupervised algorithms can be further split into two classes each: supervised algorithms into classification and regression, and unsupervised algorithms into dimension reduction and clustering.

Supervised Learning
• Imagine you are trying to get a toddler to learn about shapes by using blocks of wood.
• In front of them, they have a ball, a cube, and a star. You ask them to show you the cube, and if they point to the correct shape, you tell them they are correct; if they are incorrect, you also tell them.
• You repeat this procedure until the toddler can identify the correct shape almost all of the time.
• This is called supervised learning, because you, the person who already knows which shape is which, are supervising the learner by telling them the answers.
• A machine learning algorithm is said to be supervised if it uses a ground truth or, in other words, labeled data.
• For example, if we wanted to classify a patient biopsy as healthy or cancerous based on its gene expression, we would give an algorithm the gene expression data, labeled with whether that tissue was healthy or cancerous.
• The algorithm now knows which cases come from each of the two types, and it tries to learn patterns in the data that discriminate them.
• The aim of supervised machine learning (SML) is to learn the mapping function f from the input variable (X) to the output variable (Y).
• When new input data (X) arrives, the learned mapping function is used to predict the corresponding output variable (Y).
• The algorithm learns from the training set and stops learning when an acceptable level of performance is reached.
• SML is subdivided into classification (the output variable is a category) and regression (the output is a real value).
• Examples of SML algorithms include linear regression, random forest, and Support Vector Machine (SVM).

Unsupervised Learning
• Now imagine a toddler is given multiple balls, cubes, and stars but this time is also given three bags.
• The toddler has to put all the balls in one bag, the cubes in another bag, and the stars in another, but you won’t tell them if they’re correct; they have to work it out for themselves from nothing but the information they have in front of them.
• This is called unsupervised learning, because the learner has to identify patterns themselves with no outside help.
• A machine learning algorithm is said to be unsupervised if it does not use a ground truth and instead looks on its own for patterns in the data that hint at some underlying structure.
• For example, let’s say we take the gene expression data from lots of cancerous biopsies and ask an algorithm to tell us if there are clusters of biopsies.
• A cluster is a group of data points that are similar to each other but different from data in other clusters.
• This type of analysis can tell us if we have subgroups of cancer types that we may need to treat differently.
• The objective of unsupervised machine learning (USML) is to learn from, understand, and explore unlabeled input data (X) without any previously labeled examples.
• USML is subdivided into association (discovering rules that describe the data) and clustering (discovering inherent groups in the data).
• Examples of USML include the Apriori and k-means methods for association and clustering problems, respectively.

Semi-supervised Learning
• Most machine learning algorithms will fall into one of these categories, but there is an additional approach called semi-supervised learning.
• As its name suggests, semi-supervised machine learning is not quite supervised and not quite unsupervised.
• Semi-supervised learning often describes a machine learning approach that combines supervised and unsupervised algorithms, rather than strictly defining a class of algorithms in and of itself.
• The premise of semi-supervised learning is that, often, labeling a dataset requires a large amount of manual work by an expert observer.
• This process may be very time consuming, expensive, and error prone, and may be impossible for an entire dataset.
• So instead, we expertly label as many of the cases as is feasible, and then we build a supervised model using only the labeled data.
• We pass the rest of our data (the unlabeled cases) into the model to get their predicted labels, called pseudo-labels because we don’t know if all of them are actually correct.
• Now we combine the data with the manual labels and pseudo-labels, and use the result to train a new model.
• This approach allows us to train a model that learns from both labeled and unlabeled data, and it can improve overall predictive performance because we are able to use all of the data at our disposal.
• Semi-supervised machine learning (SSML) is considered a mixture of SML and USML, and arises when a large amount of unlabeled input data (X) is available together with only a few labeled examples (Y).
• Both SML and USML algorithms can be applied to extract knowledge from such data and to discover its structure.

Machine Learning Algorithms
• Naïve Bayes Classifier (NBC) Algorithm
The NBC algorithm helps to classify a webpage, a document, a lengthy text note, or an email, which is not easy to do manually. The classifier assigns each element of a population to one of the available categories. Spam filtering and sentiment analysis are popular applications. It performs well when the input variables are categorical.
• K-Means Clustering Algorithm
When a user searches the World Wide Web (WWW) for the keyword “jaguar”, the results will be a mix of the jaguar animal, the Jaguar car, and the Jaguar Mac operating system. Content with similar characteristics (text, images, videos, etc.) is grouped and displayed to the user. This grouping (also referred to as clustering) of similar features is done by the k-means clustering algorithm.
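The grouping step performed by k-means can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the fixed iteration count, the random seed, and the toy two-dimensional points are all assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Two well-separated groups of hypothetical 2-D feature vectors.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (8.0, 8.1), (7.9, 8.3), (8.2, 7.9)]
centroids, labels = kmeans(data, k=2)
print(labels)  # the first three points share one label, the last three the other
```

No ground-truth labels are supplied; the cluster numbers are arbitrary identifiers, so only the grouping itself is meaningful.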
Clustering of this kind is also used by search engines (for example, Google and Yahoo) to group web pages by similarity and to identify the “relevance rate” of search results.
• Support Vector Machine (SVM) Algorithm
SVM, a supervised algorithm, is used for classification problems. It learns a boundary (a line or, more generally, a hyperplane) that separates the classes in the input dataset, and uses that boundary to classify new data. SVMs are categorized as linear and non-linear. A linear SVM separates the classes with a hyperplane in the original feature space; a non-linear SVM handles data that cannot be separated this way, typically by using a kernel function. The technique makes no strong assumptions about the data. One application is forecasting stock market values.
• Apriori Algorithm
The Apriori algorithm, an unsupervised algorithm, derives association rules from the input dataset. An association rule states that when item X occurs, item Y is likely to occur as well, similar to an “if-then” rule. For instance, IF a person purchases a bike, THEN they are likely to purchase a helmet. Such purchase patterns are examined to support better decisions. Applications include detecting adverse drug reactions and auto-complete.
• Linear Regression (LR) Algorithm
The LR algorithm describes the association between two variables and how they influence each other. It shows the impact on the dependent variable (the factor of interest) when the independent variable (the explanatory variable) is changed. It is a widely used ML technique that is fast to train. Estimating sales and risk assessment are among its applications.
• Decision Tree Algorithm
A decision tree uses a branching method to represent the possible consequences of a decision in a specific scenario. The decisions in a tree are inversely proportional to the expected precision. The two sub-categories of decision trees are classification trees and regression trees.
Classification trees and regression trees are used when the target variable is categorical or continuous/numerical, respectively. Decision tree classification is mainly used in the finance and banking sectors to classify loan applicants based on the probability of repayment over specific periods.
• Random Forest Algorithm
The Random Forest algorithm works by creating a collection of decision trees, each trained on a randomly chosen subset of the data. Prediction performance is improved by repeatedly training a model on random samples. The final prediction is made by combining the outputs of the decision trees through a polling (voting) procedure.
• Logistic Regression Algorithm
This algorithm is mainly applied to classify discrete classes, for instance to detect whether an email is spam or not, whether an online transaction is genuine or fraudulent, or whether a tumor is benign or malignant. It passes a linear combination of the inputs through a sigmoid function to obtain a probability value. If there are two possible outputs (for instance, benign or malignant), it is referred to as binary logistic regression. If there are more than two categories (for instance, dogs, cats, horses, etc.), it is referred to as multinomial logistic regression.

Machine Learning Applications
• Spam Detection: Received emails are checked to determine whether they are spam; if so, they are not shown to the user alongside regular emails in the inbox but are kept in a separate folder. This is done by learning the characteristics of spam and recognizing them in newly received emails.
• Face Detection: Identifying faces in a set of photos and tagging them automatically the next time the same face is detected in a new photo.
Facebook is well known for using this technique: when we upload a picture, it automatically detects the friends who appear in it and suggests them to the user. Similarly, Google Photos groups pictures according to the people in them.
• Credit Card Fraud Detection: Based on a customer’s past transactions, if any unusual purchases are made, the customer is warned immediately.
• Digit Recognition: A machine’s camera detects handwritten postal codes so that letters can be sorted according to the geographical locations they have to reach. The machine is trained to recognize handwritten digits and convert them into digital form.
• Speech Understanding: The machine listens to the user’s speech, works out the user’s intentions, and acts on what it has understood. It is expected to follow the user’s instructions accurately. “Cortana” by Microsoft, “Siri” by Apple, and “Okay Google” by Google are popular and successfully implemented applications of this technique.
• Product Recommendation: Using data on a customer’s past purchases or online interests, the machine recommends products that the customer is likely to view and perhaps purchase. Flipkart, Amazon, and many other e-commerce sites implement this technique, showing recommended products as advertisements.
• Medical Diagnosis: To detect diseases more accurately, hospitals now use machines that can decide whether a person has a disease based on his or her symptoms, with the help of comprehensive data about diseases and symptoms.
• Stock Trading: Given the current and past price fluctuations of a stock, the machine decides whether to hold or sell the stock on behalf of the customer.
• Customer Segmentation: Using past behavior patterns, ML algorithms try to predict how many users will upgrade from trial versions to paid versions. Amazon Prime has implemented this technique.
• Chat smarter with Allo: Allo was a messenger application by Google (discontinued in 2019) that learned how its users responded to different kinds of messages and suggested replies that the user was likely to type.
• Financial Service: Companies can extract insights from financial-sector data and reduce the occurrence of financial fraud. Machine learning is also used to identify opportunities for investment and trade. Cyber surveillance can additionally be used to identify institutions that are prone to financial risk so that the necessary action can be taken to prevent fraud.
• Transportation: Based on travel history and travel patterns across various routes, machine learning helps transportation companies predict problems on routes and advise their customers to choose a different route. Transportation firms use machine learning to carry out data analysis and data modeling.
• Computational Intelligence: Computational intelligence has been actively developed for many years. Classical methods, such as machine learning algorithms, are being improved continually. Today computational intelligence is used, directly or indirectly, in many applications.
• Natural language processing: NLP is a field that involves both computer understanding and manipulation of human language, and it opens up new possibilities. It is often applied to large pools of legislation or other document sets to discover new patterns or to root out corruption. It offers a better way to analyze, understand, and find meaning in human language.
Using NLP, developers can perform tasks such as speech recognition, entity recognition, automatic translation, and summarization.
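As a small text-processing illustration, the spam-filtering use of the Naïve Bayes classifier described earlier can be sketched in plain Python. This is only a sketch under stated assumptions: the function names `train_naive_bayes` and `predict`, the four made-up training messages, the whitespace tokenization, and the add-one smoothing are choices made for this example and are not taken from the slides.

```python
import math
from collections import Counter

def train_naive_bayes(messages):
    """Learning phase: count words per class from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    class_counts = Counter()  # label -> number of training messages
    for text, label in messages:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def predict(model, text):
    """Prediction phase: return the label with the highest log-probability."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]  # 0 for words unseen in this class
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training data ("ham" means not spam).
training = [
    ("win a free prize now", "spam"),
    ("free money claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training)
print(predict(model, "claim your free prize"))        # spam
print(predict(model, "agenda for the team meeting"))  # ham
```

The two functions mirror the learning phase and the prediction phase: the model is just a set of word counts, and new messages are scored against it.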
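The pseudo-labeling procedure described in the semi-supervised learning section can likewise be sketched in plain Python. The nearest-centroid classifier used here is a stand-in chosen for brevity, not a method named in the slides; the function names, the two-dimensional toy values, and the "healthy"/"cancerous" labels are illustrative assumptions.

```python
def fit_centroids(points, labels):
    """Supervised step: learn one mean point (centroid) per class."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        lab: (sums[lab][0] / counts[lab], sums[lab][1] / counts[lab])
        for lab in sums
    }

def predict(centroids, point):
    """Assign the label of the nearest class centroid."""
    return min(
        centroids,
        key=lambda lab: (point[0] - centroids[lab][0]) ** 2
        + (point[1] - centroids[lab][1]) ** 2,
    )

# A few expertly labeled cases and a larger pool of unlabeled ones (made-up values).
labeled_x = [(1.0, 1.0), (9.0, 9.0)]
labeled_y = ["healthy", "cancerous"]
unlabeled_x = [(1.5, 0.5), (2.0, 1.5), (0.5, 2.0), (8.0, 9.5), (9.5, 8.0), (8.5, 8.5)]

# 1. Build a supervised model using only the labeled data.
first_model = fit_centroids(labeled_x, labeled_y)
# 2. Predict pseudo-labels for the unlabeled cases.
pseudo_y = [predict(first_model, p) for p in unlabeled_x]
# 3. Combine manual labels and pseudo-labels, then train a new model.
final_model = fit_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)

print(pseudo_y)
print(predict(final_model, (3.0, 3.0)))  # healthy
```

The final model is trained on eight cases rather than two, which is the benefit the approach aims for; its quality still depends on how many pseudo-labels are actually correct.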