DataAnalyticsChapter2Vision PDF

Machine Learning Overview

1. Introduction

In the last few years, we have often heard terms like Artificial Intelligence (AI) and Machine Learning. These have become buzzwords of our times. The reasons for this are the high volume of data produced by applications, the increase in computation power in the past few years and the development of better algorithms. AI and Machine Learning are used everywhere, from automating mundane tasks to offering intelligent insights. Almost all sectors are employing these techniques to improve their business, for decision making, formulating strategies and providing better service to customers. We may already be using them in many ways, for example through a wearable fitness tracker like Fitbit or an intelligent home assistant like Google Home.

Computers have long assisted humans in performing various tasks. While exploiting the power of computer systems, the curiosity of humans led them to wonder, “Can a machine think and behave like humans do?” Thus, the development of AI and Machine Learning started with the intention of creating in machines an intelligence similar to that found in humans.

In this chapter, we will understand terms like Artificial Intelligence and Machine Learning in more detail, along with their types and applications.

2. Introduction to Machine Learning, Deep Learning, Artificial Intelligence

There is a lot of confusion regarding the terms Artificial Intelligence, Machine Learning and Deep Learning. In this section, we will understand these terms and their relationship with each other.

2.1 Artificial Intelligence

Human beings possess natural intelligence. Human intelligence is a quality that helps humans in learning, understanding, and solving problems. They can learn from the environment, understand, reason and make decisions. For several decades, there has been ongoing research and effort to make machines think like humans, i.e., to make them intelligent.
Artificial Intelligence is a branch of Computer Science that aims to create intelligent machines. While human intelligence adjusts to new environments by using a combination of various cognitive processes, AI aims to create machines that can imitate human behavior and perform human-like actions. One of the greatest innovators in the field was John McCarthy, widely recognized as the father of Artificial Intelligence due to his astounding contribution to the field of Computer Science and AI. It was in the mid-1950s that McCarthy coined the term “Artificial Intelligence”, which he defined as “the science and engineering of making intelligent machines”.

Artificial Intelligence is the simulation of human intelligence by computer systems. The goal of AI is to create computer models that exhibit “intelligent behaviors” like humans. This means machines that can recognize a visual scene, understand text written in natural language, or perform intellectual tasks such as decision making, problem solving, perception, understanding human communication, and more.

Artificial Intelligence is a science and technology based on disciplines such as Computer Science, Biology, Psychology, Linguistics, Mathematics, and Engineering. It is employed in almost all sectors today. A few of the applications of AI are discussed below.

Applications of AI

Healthcare: The healthcare sector is applying AI to make better and faster diagnoses than humans. AI can help doctors with diagnosis and can inform them when patients are worsening so that medical help can reach the patient before hospitalization. AI can help in early detection, personalization, and even disease diagnosis.

Internet and e-commerce: AI is effectively used for various purposes like search engines, recommendation systems, targeted advertising and increasing internet engagement, virtual assistants, spam detection and filtering, language translation, and facial and image recognition.
Cyber security: AI is used in cyber security for network protection, intrusion detection systems, identifying attacks and malware, identifying suspect user behavior, and identity and fraud detection.

Automobiles: Artificial Intelligence is used to build self-driving vehicles. AI can be used along with the vehicle’s camera, radar, cloud services, GPS, and control signals to operate the vehicle. AI can improve the in-vehicle experience and provide additional systems like emergency braking, blind-spot monitoring, and driver-assist steering.

Security and Surveillance: AI has made it possible to develop face recognition tools which may be used for surveillance and security purposes. The footage from traffic and CCTV cameras can be analyzed in real time to provide immediate insights.

Education: In the education field, AI is used to automate administrative tasks to aid educators, create smart content, power voice assistants, and identify the special needs of different types of learners.

Navigation: AI is heavily used by many logistics companies to improve operational efficiency, analyze road traffic, and optimize routes. The travel industry uses AI to suggest hotels, flights, and best routes to customers.

Agriculture: In agriculture, AI has helped farmers identify areas that need irrigation, fertilization or pesticide treatments, increase yield, predict the ripening time for crops, monitor soil moisture, operate agricultural robots, and detect diseases and pests.

Entertainment: Online streaming services like Netflix and Amazon Prime rely heavily on the information collected from users. This helps with recommendations based upon previously viewed content. In the gaming sector, AI can be used to create smart, human-like objects that interact with the players.

Defence: AI can aid in the extraction of useful information from linked devices like radars and autonomous identification systems.
This data can aid in the detection of any unlawful or suspicious activity, as well as in alerting the appropriate authorities. AI is also used for image interpretation for target identification and classification, and for diagnosis and maintenance of sophisticated weapon systems such as radars and missiles.

2.2 Machine Learning

Artificial Intelligence is a general or broader concept that enables a machine to simulate intelligence and behavior. Learning is an important part of intelligence. Humans learn from the environment, improve their understanding with time and improvise as required. For a machine to be intelligent, it also has to learn. This “learning” can be enabled by using Machine Learning.

Machine Learning is a subset of Artificial Intelligence which allows a machine to automatically learn from past data without being programmed explicitly. ML is a field that focuses on the learning aspect of AI by developing algorithms that best represent a set of data in order to carry out a certain task. These programs or algorithms are designed in such a way that they learn and improve over time when they are exposed to new data. An ML algorithm takes examples with answers and learns the rules (patterns) that yield those answers given the data.

The term “Machine Learning” was defined in the 1950s by AI pioneer Arthur Samuel as “the field of study that gives computers the ability to learn without explicitly being programmed.”

Machine Learning vs. Traditional Programming

In Traditional Programming (Figure A), the programmer codes the algorithm, which then processes information according to the rules defined in the program and outputs results. In Traditional Programming, we write down the exact steps required to solve the problem. On the other hand, in Machine Learning, the inputs to the computer are a subset of the data and the corresponding outputs, from which the computer generates the program (rules or patterns) (Figure B). For example, if you feed sample emails as input along with the observed output, i.e. spam or not spam, the machine learning algorithm will formulate a program which would know how to predict whether a new email is spam or not. That program is called a model.

Figure 2.1: Traditional programming versus machine learning paradigm

(A) In Traditional Programming, a computer is supplied with a dataset and an algorithm. The algorithm informs the computer how to operate upon the dataset to create outputs.
(B) In Machine Learning, a computer is supplied with a dataset and associated outputs. The computer learns and generates an algorithm that describes the relationship between the two. This algorithm can be used for inference on future datasets.

Examples of machine learning problems include “Is this cancer?”, “What is the market value of this house?”, “Which of these people are good friends with each other?”, “Will this person like this movie?”, “Who is this?”, “What did you say?”, and many more.

In 1997, Tom Mitchell gave a “well-posed” definition of Machine Learning: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Example 1
Task (T): Image recognition
Experience (E): A database of thousands of images
Performance (P): Number of images correctly identified

Example 2
Task (T): Predict traffic on the road
Experience (E): Data about past traffic patterns
Performance (P): Number of vehicles predicted

2.3 Deep Learning

In the early years, AI-based systems focused heavily on rule-based systems that would make predictions using predefined sets of rules that had to be provided by a subject matter expert. In machine learning, machines understand patterns within data and use this underlying structure to make decisions about a given task. There are many ways in which machines aim to understand these underlying patterns. Deep learning is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.
With accelerated computational power and large data sets, deep learning algorithms are able to self-learn hidden patterns within data to make predictions.

Unlike machine learning models, where the developer has to choose and encode the features beforehand, deep learning uncovers the hidden features in the dataset on its own. The following figure shows the difference between machine learning and deep learning.

Machine Learning: Input -> Feature extraction -> Classification -> Output (Car / Not Car)
Deep Learning: Input -> Feature extraction + Classification -> Output (Car / Not Car)

Figure 2.2: ML vs. DL

How Deep Learning Works?

The human brain contains a biological neural network, which is a network or circuit of neurons. Neurons carry signals and transmit information between different areas of the brain, and between the brain and the rest of the nervous system. Deep learning algorithms use something called an Artificial Neural Network to find associations between a set of inputs and outputs. The neural network is the heart of deep learning models, and it was initially designed to mimic the working of the neurons in the human brain.

A neural network consists of three main types of layers:
i. The input layer
ii. The hidden layer(s)
iii. The output layer

All the layers are composed of “nodes”. The input and output layers are considered the visible layers. The input layer is where the network takes in data for processing, and the output layer gives the result of the classification, detection or prediction problem. The hidden layers apply given transformations to the input values inside the network. Most of the computation takes place in the hidden layers. The “deep” part of deep learning refers to creating deep neural networks, i.e., neural networks with a large number of layers. With the addition of more weights and biases, the neural network improves its ability to approximate more complex functions.
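As a rough illustration of how values move from the input layer through a hidden layer to the output layer, the sketch below computes a single forward pass. The layer sizes, the weights and biases, and the choice of a sigmoid activation are made-up values chosen for illustration; they are not taken from the text.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute the outputs of one fully connected layer.

    weights[j][i] is the weight from input i to node j of this layer.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]

def forward(inputs):
    """Pass the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(inputs, hidden_weights, hidden_biases)
    return layer_forward(hidden, output_weights, output_biases)

prediction = forward([1.0, 2.0])
print(prediction)  # a single value between 0 and 1
```

Adding more hidden layers would simply mean chaining more `layer_forward` calls, each with its own weights and biases.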
Figure 2.3: Artificial neural network (input layer, hidden layer, output layer)

The neural network is composed of neurons. An artificial neuron has the structure shown below.

Figure 2.4: Structure of a neuron (inputs, weights, summation function, activation function)

A neuron takes all the incoming values from the connected neurons, multiplies the values by their respective weights, adds them, and then applies an activation function. Then, the neuron is ready to send its new value to other neurons. Several activation functions are used in deep learning. These include the Step function, Sigmoid function, ReLU (Rectified Linear Unit), etc.

Network learning depends on two main processes:
i. Forward propagation: Forward propagation is the set of operations that transforms the inputs into the outputs.
ii. Back propagation: Back propagation aims to minimize the error by updating the weights and biases. It calculates the gradients of the loss function (also known as the cost function) with respect to the weights, biases and activations of the network.

Common use cases of deep learning are image classification, time series forecasting, and fraud detection.

Difference between Machine Learning and Deep Learning

Aspect | Machine Learning | Deep Learning
Data | Performs well on small to medium datasets. | Performs well on large datasets.
Preprocessing | The data must be well-curated, which means that the data is carefully preprocessed. | Does not require the data to be carefully preprocessed.
Features | Features need to be manually identified. | Learns features automatically.
Hardware | Works on low-end machines. | Requires significant computing power, e.g., GPU.
Training time | Quick to train. | Computationally intensive.

Relation between AI, ML and DL

There is a close connection and overlap among the fields of machine learning, AI, and deep learning, as depicted in the following figure.
Figure 2.5: Relation between AI, ML and DL

AI can be understood as the field of developing non-biological systems that exhibit human-like intelligence. After a lot of research in this area, machine learning emerged as one of the most promising directions towards developing such artificially intelligent systems. Machine learning is about teaching machines or computers how to learn from data without the need for humans to hand-code specific rules. Recently, deep learning emerged as a specialized form of machine learning that is concerned with algorithms, called artificial neural networks, inspired by the structure and function of the brain. A simple analogy to explain the difference between these three terms is given below.

Consider a basket containing fruits of different types. The fruits must be examined and put into separate baskets. If done manually, humans would use their knowledge and experience to identify each fruit and put it into its correct basket.

AI based approach: Stick a label onto each fruit. A scanner will scan the fruit, read the label and inform the algorithm, and the machine would route the fruit into its correct basket.

ML based approach: Define the features and attributes that characterize each fruit, such as sizes, colors, shapes, etc. The algorithm is trained to classify the fruits based on these features. The algorithm classifies the actual fruits and routes them to their correct baskets.

Deep learning approach: The DL approach does not require features to be extracted and fed to the algorithm. By providing the DL model with lots of images of the fruits, it will build up a pattern of what each fruit looks like. The images will be processed through different layers of the neural network. Each layer will define and add specific features of the images, like the shape, size, color etc. The machine will finally scan each fruit and put it into its correct basket.
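The ML based approach in the fruit analogy can be sketched in a few lines. The example below is only an illustration: the two features (diameter and length in centimetres), the measurements, and the nearest-centroid rule are all assumptions chosen to keep the code short, not a method prescribed by the text.

```python
import math

# Hypothetical hand-measured features: (diameter in cm, length in cm) -> fruit.
training_fruits = [
    ((7.0, 7.0), "apple"),
    ((8.0, 7.5), "apple"),
    ((3.5, 18.0), "banana"),
    ((4.0, 20.0), "banana"),
]

def train_centroids(samples):
    """Average the feature vectors of each fruit type (the 'training' step)."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    centroids = {}
    for label, rows in grouped.items():
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def classify(features, centroids):
    """Route a fruit to the basket whose average features are closest."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = train_centroids(training_fruits)
print(classify((7.5, 7.2), centroids))   # a round fruit
print(classify((3.8, 19.0), centroids))  # a long, thin fruit
```

Note that a person still had to decide which features to measure; in the deep learning approach that step is learned from the images themselves.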
2.4 Applications for Machine Learning in Data Science

We have an abundance of data today, generated in huge volumes every second. In order to make sense of this data and gain meaningful insights, complex analytics must be carried out on the data. Data science comprises various fields and techniques, like Statistics, Mathematics, Data analysis, and Data analytics, used to draw meaningful insights from the data. Machine learning is an umbrella term for a set of techniques and tools that help computers learn and adapt on their own using data. The idea behind Machine Learning is that you train machines by feeding them data and defining features. The machine observes the dataset, identifies patterns in it, learns automatically from the behavior, and makes predictions. Without data, there is very little that machines can learn. In machine learning, a data model is built automatically and further trained to make real-time predictions. This is where Machine Learning algorithms are used in the Data Science Lifecycle.

Machine Learning algorithms are used for three key problems in Data Science:
i. Regression
ii. Classification
iii. Clustering

Based on the above three categories, we have several Machine Learning use cases in data science. These include:
i. Fraud detection
ii. Speech recognition
iii. Medical diagnosis
iv. Online recommendation engines
v. Language translation
vi. Image recognition
vii. Sentiment analysis
viii. Customer profiling
ix. Recommender systems
x. Spam and malware detection
xi. Predictions: Weather prediction, Traffic prediction, Stock market trends, Housing prices
xii. Surveillance systems
xiii. Virtual assistants
xiv. Autonomous cars
xv. Dynamic pricing

3. The Modeling Process

The modeling phase consists of four steps:
i. Feature engineering and model selection
ii. Training the model
iii. Model validation and selection
iv. Applying the trained model to unseen data

Engineering Features and Selecting a Model

Machine learning starts with data: numbers, photos, or text, like bank transactions, pictures of people, shopping data, system logs, time series data from sensors, or sales reports. The data is gathered and prepared to be used as training data, i.e., the information the machine learning model will be trained on. In general, the more data, the better the program.

A model consists of constructs of information called features or predictors and a target or response variable. The goal of the model is to predict the target variable, for example, tomorrow's high temperature. The variables that help you do this are the features or predictor variables, such as today’s temperature, cloud movements, current wind speed, and so on. The best models are those that accurately predict the value. To achieve this, feature engineering is the most important part of modeling. It is important to choose or create possible predictors for the model. Some features are simply the variables you get from a data set. Others have to be derived using modeling techniques.

Training the Model

A major goal of the machine learning process is to find an algorithm f(X) that most accurately predicts future values based on a set of features. In other words, we want an algorithm that not only fits well with our past data, but more importantly, one that predicts a future outcome accurately. With the right predictors and a modeling technique in mind, the next step is to train the model. Machine learning models require sufficient amounts of data. The dataset is split into training, validation, and test sets.
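One simple way to produce such a three-way split is sketched below. The 70/15/15 ratio, the fixed random seed and the function name `split_dataset` are assumptions made for this example; other ratios are equally valid.

```python
import random

def split_dataset(rows, train_pct=70, val_pct=15, seed=42):
    """Shuffle the rows and split them into training, validation and test sets.

    Whatever is left after the training and validation shares becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle a copy, leave the input untouched
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

rows = list(range(100))  # stand-in for 100 observations
train, val, test = split_dataset(rows)
print(len(train), len(val), len(test))  # 70 15 15
```

Shuffling before splitting matters when the data is stored in some order (for example by date or by class); otherwise one of the sets could end up unrepresentative.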
Approximately 60-80% of the data is used for training the model. In this training phase, the model is presented with training data from which it can learn. Once a model is trained, the next step is to validate the model.

Validating the Model

Validation is extremely important because it determines whether your model works in real-life conditions. Once the model has been trained on training data, it must be validated. Some data is held out from the training data to be used as evaluation data, which tests how accurate the machine learning model is when it is shown new data. There are several modeling techniques, and the question is which one is the right one to use. A good model has two properties: it has good predictive power and it generalizes well to data it hasn’t seen. To achieve this, we have to evaluate the model based on an error measure (how wrong the model is) and a validation strategy. Several error measures for Classification (Accuracy, Precision, F1 score, ROC etc.) and Regression (Mean Squared Error, Mean Absolute Error etc.) have been defined.

Many validation strategies exist, including the following common ones:
i. Divide the data into a training set with X% of the observations and keep the rest as a holdout data set (a data set that’s never used for model creation).
ii. K-folds cross validation: This strategy divides the data set into k parts and uses each part one time as a test data set while using the others as a training data set. This has the advantage that you use all the data available in the data set.
iii. Leave-1 out: This approach is the same as k-folds but with k equal to the number of observations. You always leave one observation out and train on the rest of the data. This strategy is used when the dataset is very small.

Another technique often used is Regularization. Regularization is mainly used to stop a model from using too many features and thus prevent over-fitting. L1 regularization aims to use as few predictors as possible.
L2 regularization aims to keep the variance between the coefficients of the predictors as small as possible so that the actual impact of each predictor can be measured.

Predicting New Observations

After successfully performing the first three steps, the model is ready to be applied in a real situation. The process of applying the model to new and unseen data is called model scoring. Model scoring involves two steps:
i. Prepare a data set that has features exactly as required by the model. This means that the data preparation step has to be repeated for the new data set.
ii. Apply the model to this new data set, which results in a prediction.

Types of Machine Learning

Machine learning algorithms can be classified into the following three types:
i. Supervised learning: task driven (Classification/Regression)
ii. Unsupervised learning: data driven (Clustering)
iii. Reinforcement learning: learning from mistakes (Playing games)

Figure 2.6: Types of Machine Learning

Supervised Learning

Supervised machine learning models are trained with labeled data sets, which allow the models to learn and grow more accurate over time. For example, an algorithm would be trained with pictures of different fruits, all labeled by humans, and the machine would learn ways to identify fruits on its own. Supervised machine learning is the most common type used today.

Figure 2.7: Supervised learning (input raw data, training data set, algorithm, processing, desired output)

The supervised learning approach is similar to human learning under the supervision of a teacher. The teacher provides good examples for the student to memorize (learn), and the student then derives general rules from these specific examples to use on a new example. In other words, this algorithm learns from example data (training data) and the associated response (target). This is done to predict the correct response when given a new example (test data).
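The idea of learning from examples with known responses and then predicting for a new example can be shown with a one-feature least-squares line fit. This is a minimal sketch: the data (hours studied versus test score) is invented, and ordinary least squares is just one of many supervised algorithms.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept from labeled examples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the learned rule to a new, unseen example."""
    return slope * x + intercept

# Hypothetical training data: feature = hours studied, target = test score.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)
print(slope, intercept)               # learned rule: score = 2 * hours + 50
print(predict(5.0, slope, intercept)) # prediction for a new example
```

The targets in `scores` play the role of the teacher: they tell the algorithm what the right answer was for each training example.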
The supervised learning tasks are:
i. Regression: Regression involves predicting numeric data, such as test scores, laboratory values, or prices of an item, much like the housing price example.
ii. Classification: Classification, on the other hand, entails predicting to which category an example belongs.

Figure 2.8: Classification and Regression

Real-life Examples

Email Spam (Classification): The algorithm takes a set of spam and non-spam emails as input. It then finds patterns in the data to separate spam from other emails.

Stock Price Prediction (Regression): Historical market data is fed to the algorithm in this method. With proper regression analysis, the future price is predicted.

For example, suppose a real estate company would like to predict the price of a house based on specific features of the house. To begin, the company would first gather a dataset that contains many instances. Each instance represents a singular observation of a house and its associated features. Features are the recorded properties of a house that might be useful for predicting prices (e.g. total square-footage, number of floors, the presence of a terrace or balcony etc.). The target is the feature to be predicted, in this case the housing price. Datasets are generally split into training, validation, and testing datasets. Supervised learning uses patterns in the training dataset to map features to the target so that an algorithm can make housing price predictions on future datasets. This approach is supervised because the model infers an algorithm from feature-target pairs and is informed, by the target, whether it has predicted correctly.

The most widely used supervised learning approaches include:
i. Linear regression
ii. Logistic regression
iii. Decision trees
iv. Gradient boosted trees
v. Random forest
vi. Support vector machines
vii. K-nearest neighbors etc.

Unsupervised Learning

In unsupervised machine learning, a program looks for patterns in unlabeled data. In contrast to supervised learning, unsupervised learning aims to detect patterns in a dataset and assign individual instances in the dataset to categories. These algorithms are unsupervised because the patterns that may or may not exist in a dataset are not informed by a target and are left to be determined by the algorithm. Unsupervised machine learning can find patterns or trends that people aren't explicitly looking for. For example, an unsupervised machine learning program could look through online sales data and identify different types of clients making purchases. Some of the most common unsupervised learning tasks are clustering, association, and anomaly detection.

Figure 2.9: Unsupervised learning

This approach is similar to a student learning by himself or herself. Sometimes the student does not need explicit supervision of a teacher. Instead, the student acquires knowledge (learns) based on past experiences or observations (training data) and applies that knowledge in an unseen scenario (test data). In simple words, this learning occurs when an algorithm learns from plain examples without any associated target (response). It leaves the algorithm to determine the data patterns on its own.

Popular techniques used in Unsupervised Learning:
i. k-means clustering
ii. PCA (Principal Component Analysis)
iii. Association rule mining

Semi-supervised Learning

Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). The limitation of any Supervised Learning algorithm is that the dataset has to be hand-labeled either by a Machine Learning Engineer or a Data Scientist.
This is a very costly process, especially when dealing with large volumes of data. The limitation of Unsupervised Learning is that it has a limited number of applications.

Semi-supervised learning is an approach to machine learning in which the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a very small amount of labeled data and a very large amount of unlabeled data.

As an example, in Supervised learning, a student is under the supervision of a teacher. In Unsupervised learning, a student has to learn on his or her own. In Semi-supervised learning, a teacher teaches a few concepts in class and the student learns the remaining topics, which are based on similar concepts, on his or her own.

Assumptions followed by Semi-Supervised Learning

To work with the unlabeled dataset, there must be a relationship between the objects. Semi-supervised learning uses any of the following assumptions:
i. Continuity assumption: Objects near each other tend to share the same group or label. This assumption is also used in supervised learning, where the datasets are separated by decision boundaries.
ii. Cluster assumption: The data tend to form discrete clusters, and points in the same cluster are more likely to share a label (although data that shares a label may spread across multiple clusters).
iii. Manifold assumption: The data lie approximately on a manifold of much lower dimension than the input space.

The sklearn.semi_supervised module implements semi-supervised learning algorithms. These algorithms utilize small amounts of labeled data and large amounts of unlabeled data for classification tasks.
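To show the idea without relying on any library, the sketch below hand-codes a very small pseudo-labeling loop that leans on the continuity assumption: each unlabeled point receives the label of the nearest point that is already labeled. It does not use scikit-learn, and the one-dimensional data and the function name `self_train` are invented for illustration.

```python
def self_train(labeled, unlabeled):
    """Pseudo-label 1-D points by repeatedly copying the label of the
    nearest already-labeled point (continuity assumption)."""
    labeled = list(labeled)      # (value, label) pairs; grows as we go
    remaining = list(unlabeled)  # values that still have no label
    pseudo_labels = {}
    while remaining:
        # Find the unlabeled point that lies closest to any labeled point.
        best_point, best_label, best_dist = None, None, float("inf")
        for point in remaining:
            for value, label in labeled:
                dist = abs(point - value)
                if dist < best_dist:
                    best_point, best_label, best_dist = point, label, dist
        # Treat the pseudo-label as if it were a real label from now on.
        pseudo_labels[best_point] = best_label
        labeled.append((best_point, best_label))
        remaining.remove(best_point)
    return pseudo_labels

# Two hand-labeled points and six unlabeled ones (hypothetical data).
labeled_points = [(1.0, "A"), (10.0, "B")]
unlabeled_points = [2.0, 3.0, 4.0, 7.0, 8.0, 9.0]

result = self_train(labeled_points, unlabeled_points)
print(result)
```

Here two labels are enough to label the whole dataset because the points form two well-separated groups; when the assumptions above do not hold, pseudo-labels can spread errors instead.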
Difference between Supervised, Unsupervised and Semi-supervised learning

Aspect | Supervised learning | Unsupervised learning | Semi-supervised learning
Data | Labeled | Unlabeled | Partially labeled
Goal | Predict outcomes for new data. | Get insights from large volumes of new data. | Extract relevant features from data.
Computational complexity | Simple | Complex | Moderate
Accuracy | Higher | Lower | Lower
Mechanism | The algorithm learns from the labeled input and output data. | The algorithm learns on its own from unlabeled input data. | The algorithm learns from the partially labeled data and generates pseudo-labels for the unlabeled data.
Tasks | Classification and Regression. | Clustering and Association mining. | Classification and Clustering.
Human intervention | Required to label the dataset correctly. | Not required. | Required to label a subset of the data.
Dataset | Usually small to moderate volume. | Very large volume | Very large volume
Applications | Spam detection, sentiment analysis, weather forecasting and pricing predictions. | Anomaly detection, recommendation engines, customer profiling and medical imaging. | Medical images.

Reinforcement Learning

Reinforcement learning is a form of reward/punishment-based learning in which a machine learns a series of actions to perform a task. Each action of this algorithm is tied to a positive or negative reward. A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards for performing correctly and penalties for performing incorrectly. The agent learns without intervention from a human by maximizing its reward and minimizing its penalty.
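The reward-and-penalty loop can be sketched with the tabular Q-learning update rule on a toy problem. Everything specific here is an assumption for illustration: a four-cell corridor with the goal at the right end, a reward of +1 for reaching the goal, a penalty of -0.1 for every other move, and the values of the learning rate and discount factor. To keep the result deterministic, the sketch tries every state-action pair in turn instead of letting an exploring agent sample them.

```python
GOAL = 3  # states are 0, 1, 2, 3; reaching state 3 ends the episode
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Environment: return (next_state, reward) for one move in the corridor."""
    next_state = max(0, min(GOAL, state + ACTIONS[action]))
    reward = 1.0 if next_state == GOAL else -0.1  # reward at goal, penalty otherwise
    return next_state, reward

def q_learning(sweeps=200, alpha=0.5, gamma=0.9):
    """Learn action values Q[state][action] with the Q-learning update."""
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(GOAL + 1)}
    for _ in range(sweeps):
        for state in range(GOAL):  # the terminal state needs no update
            for action in ACTIONS:
                next_state, reward = step(state, action)
                future = 0.0 if next_state == GOAL else max(q[next_state].values())
                target = reward + gamma * future
                q[state][action] += alpha * (target - q[state][action])
    return q

q_table = q_learning()
# The learned policy picks the highest-valued action in each state.
policy = {s: max(q_table[s], key=q_table[s].get) for s in range(GOAL)}
print(policy)
```

Nobody tells the agent that moving right is correct; it follows from the rewards alone, which is what distinguishes this setting from supervised learning.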
Because reinforcement learning needs no labeled data, it is sometimes grouped with unsupervised learning, although it is usually treated as a separate category, as in Figure 2.6.

Figure 2.10: Reinforcement Learning (state, action, environment)

Reinforcement machine learning trains machines through trial and error to take the best action by establishing a reward system. Reinforcement learning can train models to play games or train autonomous vehicles to drive by telling the machine when it made the right decisions, which helps it learn over time what actions it should take. It is the closest attempt at modeling the human learning experience because it also learns from trial and error rather than data alone.

This approach is different from the previous approaches. Consider an example: sometimes, a student neither has any prior experience nor is taught by any teacher. But the student takes practice tests and gets +ve marks for right answers and -ve marks for wrong answers. Thus, the training is now aided by rewards (+ve marks) and punishments (-ve marks). This process makes the student learn from their mistakes. This is called Reinforcement Learning.

Applications of reinforcement learning are:
i. Self-driving cars
ii. Robotics
iii. Gaming

The most common reinforcement learning algorithms include:
i. Q-Learning
ii. SARSA (State-Action-Reward-State-Action)
iii. DQN (Deep Q-Network)

Ensemble Techniques

Let us consider a real-life example. Suppose you want to buy a new car. Will you simply go to the first car shop and purchase one based on the advice of the dealer? It’s highly unlikely. You will first browse a few websites where people have posted their reviews and compare different car models, checking their features and prices. You will also probably ask your friends and colleagues for their opinion. In short, you wouldn’t directly reach a conclusion, but will instead make a decision considering all options and opinions.

Ensemble models in machine learning operate on a similar idea.
They combine the decisions from multiple models to improve the overall performance. There are many machine learning models; each performs differently and may give different results. You can employ ensemble learning techniques when you want to combine multiple models and improve on the performance of any single one. An ensemble method is a machine learning technique that combines several base models in order to produce one optimal predictive model.

The three main classes of ensemble learning methods are:

i. Bagging: Bagging involves fitting many decision trees on different samples of the same dataset and averaging the predictions. Bootstrap aggregation, or bagging for short (the name is an abbreviation of Bootstrap AGGregatING), is an ensemble learning method that seeks a diverse group of ensemble members by varying the training data. As the name implies, the two key ingredients of bagging are bootstrap and aggregation. This typically involves using a single machine learning algorithm, almost always a decision tree, and training each model on a different sample of the same training dataset. The predictions made by the ensemble members are then combined using simple statistics, such as voting or averaging.

Figure 2.11: Bagging (the dataset is split into random subsets 1 to n, and the results are aggregated into the final prediction)

Bagging algorithms:
a. Bagging meta-estimator
b. Random forest

ii. Boosting: Let us consider an example of predicting rain based on six parameters, namely temperature, atmospheric (barometric) pressure, humidity, precipitation, solar radiation and wind. Suppose we use different machine learning models for the prediction. The outputs of the models may differ for these six parameters. One model which is evaluating air temperature may predict a sunny day, whereas another model may predict a rainy day based on humidity.
So, if we predict the outcome on the basis of just one single model, there is, say, a 50% probability of a false prediction. These individual models are weak learners. But if we combine all the weak learners to work as one, the prediction relies on different models and hence is more accurate.

Boosting involves adding ensemble members sequentially that correct the predictions made by prior models and output a weighted average of the predictions. If a data point is incorrectly predicted by the first model, and then by the next (probably by all models), will combining the predictions provide better results? Such situations are taken care of by boosting. Boosting is a sequential process, where each subsequent model attempts to correct the errors of the previous model. The succeeding models are dependent on the previous model. Thus, the boosting algorithm combines a number of weak learners to form a strong learner.

Figure 2.12: Boosting (Models 1, 2, ..., N are individual models, e.g. decision trees; each model carries a weight, and the ensemble is formed from a model together with all its predecessors)

Let's understand the way boosting works. A subset is created from the original dataset. A base model is created on this subset. This model is used to make predictions on the whole dataset. Errors are calculated using the actual values and predicted values. The observations which are incorrectly predicted are given higher weights. Another model is created which tries to correct the errors from the previous model. This continues till a desired level of accuracy is achieved. Thus, each model actually boosts the performance of the ensemble.

Boosting algorithms:
a. Adaptive Boosting (also known as AdaBoost)
b. Gradient Boosting (GBM)
c. Extreme Gradient Boosting Machine (XGBM)
d. LightGBM
e. CatBoost

iii. Stacking: Stacking involves fitting many different model types on the same data and using another model to learn how to best combine the predictions.
Unlike bagging, in stacking the models are typically different (e.g. not all decision trees) and fit on the same dataset. Unlike boosting, in stacking a single model is used to learn how to best combine the predictions from the contributing models (e.g. instead of a sequence of models that correct the predictions of prior models).

The architecture of a stacking model involves two or more base models, often referred to as level-0 models, and a meta-model that combines the predictions of the base models, referred to as a level-1 model.

Figure 2.13: Stacking (model stack with Level-0 and Level-1)

Level-0 Models (Base-Models): Models fit on the training data and whose predictions are compiled. Diverse and complex models can be used as the base models, such as decision trees, SVM, random forests etc.

Level-1 Model (Meta-Model): Model that learns how to best combine the predictions of the base models. Typically, simple models such as a linear regression model (for prediction) and a logistic regression model (for classification) are used as meta-models.

4. Regression Models

"Regression" is a generic term for statistical methods that attempt to fit a model to data, in order to quantify the relationship between the dependent (outcome) variable and the predictor (independent) variable(s). Regression analysis is used when you want to predict a continuous dependent variable from a number of independent variables.

Regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome variable') and one or more independent variables (often called 'predictors', 'covariates', or 'features'). Regression analysis is primarily used for two distinct purposes:
i. First, it is widely used for prediction and forecasting, which overlaps with the field of machine learning.
ii. Second, it is also used to infer causal relationships between independent and dependent variables.

4.1 Types of Regression

There are many types of regression analysis techniques, and the use of each method depends upon a number of factors. These factors include the type of target variable, the shape of the regression line, and the number of independent variables.
i. Linear regression
ii. Polynomial regression
iii. Logistic regression

Linear Regression

Linear regression analysis is the most widely used of all statistical techniques. It is an approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable's value is called the independent variable.

Linear regression attempts to model the relationship between two variables by fitting a linear equation (a straight line) to the observed data. One variable is considered to be an independent variable (e.g. your income), and the other is considered to be a dependent variable (e.g. your expenses). The linear regression algorithm shows a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression. Since linear regression shows the linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.

Figure 2.15: Plot of Linear Regression line with residual error

Types of Linear Regression

Linear regression can be further divided into two types of algorithm:

i. Simple linear regression: If a single independent variable is used to predict the value of a
numerical dependent variable, then such a linear regression algorithm is called simple linear regression. The equation is:

$$y = b_0 + b_1 x$$

ii. Multiple linear regression: If more than one independent variable is used to predict the value of a numerical dependent variable, then such a linear regression algorithm is called multiple linear regression. The regression equation is:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$$

Regression equation

$$Y = a + bX + e$$

• Y is the predicted value of the dependent variable for any given value of the independent variable (X).
• a is the intercept, the predicted value of Y when X is 0.
• b is the regression coefficient: how much we expect Y to change as X increases.
• X is the independent variable.
• e is the error of the estimate, or how much variation there is in our estimate of the regression coefficient.

The slope and intercept can be calculated as follows:

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}, \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Let us consider an example to understand linear regression. We will find the linear regression equation for the following set of data:

| x | 2 | 4 | 6 | 8 |
|---|---|---|---|---|
| y | 3 | 7 | 5 | 10 |

We first plot these values and observe the data points.

Figure 2.16

We now apply the linear regression equation to fit a straight line through these points. Perform the calculations as shown:

| x | y | x² | xy |
|---|---|---|---|
| 2 | 3 | 4 | 6 |
| 4 | 7 | 16 | 28 |
| 6 | 5 | 36 | 30 |
| 8 | 10 | 64 | 80 |
| Σx = 20 | Σy = 25 | Σx² = 120 | Σxy = 144 |

Applying these values (with n = 4) to calculate a and b, we get

$$a = \frac{25 \times 120 - 20 \times 144}{4 \times 120 - 20^2} = \frac{120}{80} = 1.5, \qquad b = \frac{4 \times 144 - 20 \times 25}{4 \times 120 - 20^2} = \frac{76}{80} = 0.95$$

Hence, the linear regression equation is given as:

$$Y = 1.5 + 0.95X$$

Let us now test this equation for the following X values:

| X | 2 | 4 | 6 | 8 |
|---|---|---|---|---|
| Y (predicted) | 3.4 | 5.3 | 7.2 | 9.1 |

As you can see, the predicted Y values differ from the actual Y values. The difference is the error.
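The hand calculation above can be checked with a few lines of NumPy. This is only a minimal sketch of the closed-form formulas for a and b applied to the four data points of the example; the variable names are chosen here for illustration and do not come from the text.

```python
import numpy as np

# Data points from the worked example
x = np.array([2, 4, 6, 8], dtype=float)
y = np.array([3, 7, 5, 10], dtype=float)
n = len(x)

# The four sums used in the calculation table
sum_x, sum_y = x.sum(), y.sum()
sum_x2 = (x ** 2).sum()
sum_xy = (x * y).sum()

# Closed-form least-squares intercept (a) and slope (b)
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

# Predicted values and residual errors (actual minus predicted)
y_hat = a + b * x
residuals = y - y_hat

print("a =", a, "b =", b)
print("predicted:", y_hat)
print("residuals:", residuals)
```

The residuals printed at the end are the per-point errors that the text refers to; for a least-squares line with an intercept they sum to zero (up to floating-point rounding).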
If we plot these points, we get the regression line as shown below.

Figure 2.17: Regression line (actual and predicted Y values)

Python Implementation

The class sklearn.linear_model.LinearRegression is used for implementing linear regression in Python.

Step 1: Import the libraries

    import numpy as np
    from sklearn.linear_model import LinearRegression
    import matplotlib.pyplot as plt

Step 2: Create the data

    x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
    y = np.array([5, 20, 14, 32, 22, 38])
    print(x)
    print(y)

Output

    [[ 5]
     [15]
     [25]
     [35]
     [45]
     [55]]
    [ 5 20 14 32 22 38]

Note that reshape(-1, 1) is used to convert x into a two-dimensional array.

Step 3: Plot the data points

    plt.scatter(x, y, s=10)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

Step 4: Create the model and fit the model

    # Create the model and fit it to the data
    model = LinearRegression()
    model.fit(x, y)

Step 5: Get the intercept and regression coefficient

The attributes of model are .intercept_, which represents the coefficient a, and .coef_, which represents b:

    print('intercept:', model.intercept_)
    print('coefficient:', model.coef_)

Step 6: Predict the values

    y_pred = model.predict(x)
    print('predicted response:', y_pred, sep='\n')

Output

    predicted response:
    [ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]

Step 7: Plot the actual and predicted values

    plt.scatter(x, y, s=10)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.plot(x, y_pred, color='red', marker='o')
    plt.show()

Step 8: Evaluate the model

    # Model performance
    from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
    mae = mean_absolute_error(y, y_pred)
    mse = mean_squared_error(y, y_pred)
    r2 = r2_score(y, y_pred)
    print("Mean Absolute Error :", mae)
    print("Mean Squared Error :", mse)
    print("R-Squared :", r2)

Output

    Mean Absolute Error : 5.466666666666666
    Mean Squared Error : 33.75555555555555
    R-Squared : 0.7158756137479542

Polynomial Regression

Sometimes, it is not possible to map all the points in a two-dimensional plane with a linear regression line. This is because the points themselves are arranged non-linearly. In the image shown below, we have taken a dataset which is arranged non-linearly. If we try to fit it with a linear regression model, we can clearly see that it hardly covers any data points. On the other hand, a curve, which is what the polynomial model produces, is suitable to cover most of the data points. Hence, if the data is arranged in a non-linear fashion, we should use the Polynomial Regression model instead of Simple Linear Regression.

Figure 5.7: Linear and Polynomial regression (panels: simple linear model; polynomial model $y = b_0 + b_1 x + b_2 x^2$)

There are many cases where polynomial regression is used. It is basically used to define or describe non-linear phenomena such as the growth rate of tissues, the progression of disease epidemics etc. The kth order polynomial regression can be written down as:

$$Y = b_0 + b_1 X + b_2 X^2 + \dots + b_k X^k + e$$

By transforming the straight line into a polynomial form, the curve can be made to follow the data points more closely.

While building a higher-order regression, either of two approaches can be adopted to decide on the degree of the polynomial:

i. Forward selection: In the forward selection procedure, the order of the polynomial is gradually increased until a further higher-order polynomial doesn't bring any significant difference to the result.
ii. Backward elimination: Here, we initially start with a regression of high degree and then keep reducing the degree as long as the curve still passes through the points.

Summary of Equations

Simple Linear Regression equation: $y = b_0 + b_1 x$
Multiple Linear Regression equation: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n$
Polynomial Regression equation: $y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$

Python Implementation

Let us consider the following dataset of experience vs. salary.

| experience | salary |
|---|---|
| 1 | 45000 |
| 2 | 50000 |
| 3 | 60000 |
| 4 | 80000 |
| 5 | 110000 |
| 6 | 150000 |
| 7 | 200000 |
| 8 | 300000 |
| 9 | 500000 |
| 10 | 1000000 |

Step 1: Create the dataframe

    import pandas as pd

    data = {'experience': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
            'salary': [45000, 50000, 60000, 80000, 110000, 150000,
                       200000, 300000, 500000, 1000000]}
    df = pd.DataFrame(data, columns=['experience', 'salary'])

Step 2: Plot the points

Let us plot these points to understand the type of regression model to be used.

    import matplotlib.pyplot as plt
    plt.scatter(x=df["experience"], y=df["salary"])
    plt.xlabel("Experience")
    plt.ylabel("Salary")

Figure 2.18 (salary plotted against experience)

As you can see, we cannot fit a straight linear regression line through these points. Hence, we go for polynomial regression.

Step 3: Create the polynomial using PolynomialFeatures

    from sklearn.preprocessing import PolynomialFeatures
    poly = PolynomialFeatures(degree=4, include_bias=False)

degree sets the degree of our polynomial function. degree=4 means that we want to work with a 4th degree polynomial. include_bias should be set to False, because we'll use PolynomialFeatures together with LinearRegression(), which fits the intercept itself.

Step 4: Split the X and y columns

    # X: the first column, which contains the experience values (kept two-dimensional)
    # y: the last column, which contains the salary values
    X = df.iloc[:, 0:1].values
    y = df.iloc[:, 1].values

Step 5: Generate the feature matrix

    poly_features = poly.fit_transform(X)

The fit_transform() method generates a new feature matrix consisting of all polynomial combinations of the features with degrees less than or equal to the specified degree. For example, if an input sample is two-dimensional and of the form [a, b], the degree-2 polynomial features (with the bias term included) are [1, a, b, a^2, ab, b^2].

Step 6: Apply the model

Polynomial regression is a variant of the linear regression model with a polynomial equation. Hence, we apply LinearRegression.

    from sklearn.linear_model import LinearRegression
    model = LinearRegression()
    model.fit(poly_features, y)
    y_pred = model.predict(poly_features)
    print(y_pred)

Output

    [ 53356.64335665  31759.90675992  58642.1911422   94632.86713289
     121724.94172497 143275.05827509 184003.49650352 289994.17249417
     528694.6386945  988916.08391599]

Step 7: Plot the regression curve

    plt.scatter(X, y)
    plt.plot(X, y_pred, c="red")
    plt.show()

Figure 2.19 (regression curve through the data points)

Step 8: Model Evaluation

    import numpy as np
    from sklearn.metrics import mean_squared_error
    poly_reg_rmse = np.sqrt(mean_squared_error(y, y_pred))
    print(poly_reg_rmse)

Output

    14503.2349096276

Step 9: Predict salary for new values of experience

    # Predict salary for new experience values, reusing the fitted transformer and model
    y_new = model.predict(poly.transform([[4.5], [8.5], [10.5]]))
    print(y_new)

Output

    [ 109582.14962124  387705.69274474 1335692.28948122]

Logistic Regression

Logistic regression is a type of regression analysis technique which is used when the dependent variable is discrete, for example 0 or 1, true or false, etc. For example, it can be used to predict whether an email is spam (1) or not (0), or whether a tumor is malignant (1) or not (0). This means the target variable can have only two values, and a sigmoid curve denotes the relation between the target variable and the independent variable.
The name "Logistic" comes from the Logit function, which is utilized to measure the relationship between the target variable and the independent variables. While linear regression is used to solve regression (prediction) problems, logistic regression is used for binary classification problems.

Figure 2.20: Sigmoid curve (with the threshold value marked on the Y axis)

Instead of fitting a regression line, we fit an 'S' shaped curve produced by a sigmoid function. The sigmoid function converts expected values to probabilities: it gives a value between 0 and 1. Usually, a threshold value is set for classification. Probabilities above the threshold are considered to be 1 and those below the threshold are considered to be 0.

How to interpret the probabilities

Consider the data points and the sigmoid curve shown below.

Figure 2.21 (probability of passing plotted against hours studied)

In this example, the Y axis is the probability that the student passes the test (0 means the student does not pass the test, 1 means that the student passes the test). The X axis is the number of hours studied. If we draw a line from a data point to the curve, the point where the line intersects the curve gives the probability. For example, the bottom left point intersects the curve at 0. The fourth point along the bottom has a probability of around 0.7, which can be interpreted as 1 (passed) if the threshold value is 0.5.

The primary difference between linear regression and logistic regression is that the output of logistic regression is bounded between 0 and 1. In addition, as opposed to linear regression, logistic regression does not require a linear relationship between the input and output variables.

The logistic regression equation is:

$$P = \frac{1}{1 + e^{-(a + bX)}}$$

where P is the probability, e is the base of the natural logarithm (about 2.718), and a and b are the parameters of the model.
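The equation and the thresholding rule can be sketched in a few lines of Python. The parameter values a = -4 and b = 1 and the hours-studied inputs below are hypothetical, chosen only to illustrate the shape of the curve; they are not fitted to any data in the text.

```python
import numpy as np

def sigmoid(z):
    """Map any real value to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_probability(x, a, b):
    """P = 1 / (1 + e^-(a + b*x)), the logistic regression equation."""
    return sigmoid(a + b * x)

# Hypothetical model parameters and inputs (assumptions for illustration only)
a, b = -4.0, 1.0
hours_studied = np.array([1.0, 3.0, 6.0])
threshold = 0.5

probabilities = predict_probability(hours_studied, a, b)
# Probabilities above the threshold are labelled 1 (pass), the rest 0 (fail)
labels = (probabilities > threshold).astype(int)

print(probabilities)
print(labels)
```

Raising or lowering the threshold moves the cut-off between the two classes without changing the fitted curve itself.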
Python Implementation

Consider the following dataset. A company selects candidates based on three factors, namely test_score, gpa, and work_experience. The selection status (1/0) is stored in the last column. The logistic model is trained using this dataset. The model is evaluated and tested on another dataset to determine whether candidates would get selected or not. The first 5 rows of the dataset are:

       test_score  gpa  work_experience
    1          78  4.0                2
    2          75  3.9              ...
    3          69  3.2                3
    4          71  ...                5
    5          68  3.9                4

Step 1: Import the libraries

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics
    import seaborn as sn
    import matplotlib.pyplot as plt

Step 2: Load the dataset

    candidates = {
        'test_score': [78, 75, 69, 71, 68, 73, 69, 72, 74, 69, 61, 71, 68, 77, 61, 58, 65, 54, 59, 62,
                       60, 55, 55, 57, 67, 66, 58, 65, 66, 64, 62, 66, 68, 65, 67, 58, 59, 63],
        'gpa': [...],
        'work_experience': [...],
        'selected': [...]
    }
    df = pd.DataFrame(candidates, columns=['test_score', 'gpa', 'work_experience', 'selected'])

Step 3: Split the data into training and testing set

    X = df[['test_score', 'gpa', 'work_experience']]
    y = df['selected']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

test_size=0.25 means that 25% of the data is used for the test set (75% is used for training). If you don't specify the random_state in the code, then whenever you execute your code a new random value is generated and the train and test datasets would have different values each time. However, if you use a particular value for random_state (random_state = 1 or any other value), the result will be the same every time, i.e. the same values in the train and test datasets.

Step 4: Create the model and train the model

    logistic_regression = LogisticRegression()
    logistic_regression.fit(X_train, y_train)
    y_pred = logistic_regression.predict(X_test)

Step 5: Display the predicted values

    print(y_test)
    print(y_pred)

Output (predicted values)

    [1 0 1 0 0 1 0 1 0 1]

Comparing the actual and predicted values, one of the predictions is wrong.

Step 6: Evaluate the model

    confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
    sn.heatmap(confusion_matrix, annot=True)
    print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
    plt.show()

Output

    Accuracy: 0.9

Figure 2.22 (confusion matrix heatmap, Actual vs. Predicted)

Step 7: Apply the model to new data

    new_candidates = {'test_score': [59, 74, 68, 61, 71],
                      'gpa': [2, 3.7, 3.3, 2.3, 3],
                      'work_experience': [3, 4, 6, 1, 5]}
    df2 = pd.DataFrame(new_candidates, columns=['test_score', 'gpa', 'work_experience'])
    y_pred = logistic_regression.predict(df2)
    print(df2)
    print(y_pred)

Output

       test_score  gpa  work_experience
    0          59  2.0                3
    1          74  3.7                4
    2          68  3.3                6
    3          61  2.3                1
    4          71  3.0                5
    [0 1 1 0 1]

5. Concept of Classification

As we know, supervised machine learning algorithms can be broadly classified into regression and classification algorithms. In regression algorithms, we predicted the output for continuous values, but to predict categorical values, we need classification algorithms. A common job of machine learning algorithms is to recognize objects and separate them into categories.

Classification is defined as the process of recognition, understanding, and grouping of objects into relevant groups or categories called "classes". In machine learning, classification refers to a supervised machine learning technique where a class label is predicted for a given example of input data. In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups. Classification takes labeled input data, which means it contains input with the corresponding output.
Unlike regression, the output variable of classification is a category, not a value, such as Yes or No, 0 or 1, spam or not spam, apple or banana, etc. Classes can also be called targets, labels or categories.

Figure 2.23: Classification

Examples of classification problems include:
• Given an email, classify if it is spam or not.
• Given a handwritten character, classify it as one of the known characters.
• Given user feedback, classify it as positive, negative or neutral.

Types of classification

The algorithm which implements the classification on a dataset is known as a classifier. There are two types of classifications:

i. Binary Classifier: If the classification problem has only two possible outcomes, then it is called a binary classifier. Examples: YES or NO, SPAM or NOT SPAM, BUY or NOT BUY.

ii. Multi-class Classifier: If a classification problem has more than two outcomes, then it is called a multi-class classifier. Examples: classification of types of species, types of music, character recognition.

Classification Algorithms

There are many classification algorithms available. Which algorithm to use depends on the application and the nature of the available dataset. The following are the most commonly used algorithms for classification:
i. Logistic Regression
ii. Support Vector Machines
iii. K-Nearest Neighbors
iv. Decision Tree Classification
v. Naive Bayes
vi. Random Forest Classification

The concept behind some of the algorithms is discussed in brief here.

Logistic regression is used to predict the probability of a target variable. It is used for binary classification problems like spam detection, diabetes prediction, cancer detection etc. A sigmoid curve denotes the relation between the target variable and the independent variable, and the function gives a probability value between 0 and 1.
Using this probability, we can further predict the class.

Figure 2.24: Logistic Regression

Support Vector Machines

Support vector machines (SVM) are used for both regression and classification. They are based on the concept of decision planes that define decision boundaries. A decision plane (hyperplane) is one that separates sets of objects having different class memberships.

The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes. It performs classification by finding the hyperplane that maximizes the margin between the two classes with the help of support vectors. This best decision boundary is called a hyperplane. SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed Support Vector Machine.

Figure 2.25: Support Vector Machine (maximum margin, maximum margin hyperplane, positive and negative hyperplanes, support vectors)

K-Nearest Neighbors

K-Nearest Neighbor is a classification and prediction algorithm that is used to divide data into classes based on the distance between the data points. K-Nearest Neighbor assumes that data points which are close to one another must be similar and hence, the data point to be classified will be grouped with the closest cluster. KNN works by finding the distances between a new object and the k nearest objects in the data. The new object is classified based on the most frequent label among them. The K-NN algorithm uses a similarity or distance measure such as Euclidean, Minkowski, Manhattan etc.

Figure 2.26: Classification using K-Nearest Neighbors (a new example to classify, surrounded by points of Class A and Class B)

The choice of k is important here. In the figure above, the new data point will be classified as class B if k=3 and class A if k=7.
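The dependence on k can be reproduced with a small from-scratch sketch. The coordinates below are hypothetical: they are not read off Figure 2.26, but are arranged so that the three nearest neighbours favour class B while the seven nearest favour class A. The function name knn_predict is likewise chosen here for illustration.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most frequent label among those neighbours
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data (assumed for illustration only)
X_train = np.array([
    [1.0, 0.0], [0.0, 1.2], [3.0, 0.0],               # class B
    [0.0, -1.5], [-2.0, 0.0], [0.0, 2.2],
    [2.4, 0.0], [-2.5, 0.0],                          # class A
])
y_train = ["B", "B", "B", "A", "A", "A", "A", "A"]
query = np.array([0.0, 0.0])

print("k=3:", knn_predict(X_train, y_train, query, k=3))
print("k=7:", knn_predict(X_train, y_train, query, k=7))
```

Using an odd k with two classes avoids tied votes; with more classes a tie-breaking rule would be needed, which this sketch does not attempt.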
There is no particular way to determine the best value for k, so we need to try some values to find the best among them. If k is very large, it results in high bias and low variance. Also, it is computationally expensive. If k is very small, it results in low bias and high variance, and noise will have a higher influence on the result. A general rule of thumb in choosing the value of k is k = sqrt(N)/2, where N is the number of samples.

Decision tree builds a classification model in the form of a tree structure. It utilizes an if-then rule set which is mutually exclusive and exhaustive for classification. The rules are learned sequentially using the training data, one at a time. Each time a rule is learned, the tuples covered by the rule are removed. This process is continued on the training set until a termination condition is met. The tree is constructed in a top-down, recursive, divide-and-conquer manner.

Figure 2.27: Classification using Decision Tree (internal nodes such as Humidity and Wind; leaves Yes / No)

In the above figure, depending on the weather conditions, the humidity and the wind, we can systematically decide if we should play golf or not. The decision tree has the features at the nodes and the resulting classes at the leaves.

Naive Bayes algorithm is a supervised learning algorithm which is based on Bayes' theorem and is used for solving classification problems. It is not one algorithm but a family of algorithms that all share a common principle, i.e. every pair of features being classified is independent of each other. For example, if given a banana, the classifier will see that the fruit is yellow in color, oblong-shaped, long and tapered. All of these features contribute independently to the probability of it being a banana and are not dependent on each other.

Naive Bayes calculates the probability of whether a data point belongs within a certain category or does not.
It uses Bayes' theorem to find the probability of an event occurring given the probability of another event that has already occurred. Bayes' theorem is stated mathematically as the following equation:

$$P(A \mid B) = \frac{P(B \mid A) \times P(A)}{P(B)}$$

where A and B are events and P(B) ≠ 0.

• P(A|B) = how often A happens given that B happens
• P(A) = how likely A will happen
• P(B) = how likely B will happen
• P(B|A) = how often B happens given that A happens

Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.

Example: Consider the following dataset

| No. | Outlook | Play Golf |
|---|---|---|
| 1 | Rainy | Yes |
| 2 | Sunny | Yes |
| 3 | Overcast | Yes |
| 4 | Overcast | Yes |
| 5 | Sunny | No |
| 6 | Rainy | Yes |
| 7 | Sunny | Yes |
| 8 | Overcast | Yes |
| 9 | Rainy | No |
| 10 | Sunny | No |
| 11 | Sunny | Yes |
| 12 | Rainy | No |
| 13 | Overcast | Yes |
| 14 | Overcast | Yes |

Frequency table for weather conditions

| Weather | Yes | No |
|---|---|---|
| Overcast | 5 | 0 |
| Rainy | 2 | 2 |
| Sunny | 3 | 2 |
| Total | 10 | 4 |

Likelihood table of weather conditions

| Weather | No | Yes | P(Weather) |
|---|---|---|---|
| Overcast | 0 | 5 | 5/14 ≈ 0.36 |
| Rainy | 2 | 2 | 4/14 ≈ 0.29 |
| Sunny | 2 | 3 | 5/14 ≈ 0.36 |
| All | 4/14 ≈ 0.29 | 10/14 ≈ 0.71 | |

Applying Bayes' theorem

P(Yes|Sunny) = P(Sunny|Yes) × P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3, P(Yes) = 10/14 ≈ 0.71, P(Sunny) = 5/14 ≈ 0.36
So P(Yes|Sunny) = (3/10 × 10/14) / (5/14) = 3/5 = 0.60

P(No|Sunny) = P(Sunny|No) × P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5, P(No) = 4/14 ≈ 0.29, P(Sunny) = 5/14 ≈ 0.36
So P(No|Sunny) = (2/4 × 4/14) / (5/14) = 2/5 = 0.40

As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny). Hence on a Sunny day, the player can play the game.

Random Forest Classification

Random forest is a popular machine learning algorithm that belongs to the supervised learning technique.
It can be used for both classification and regression problems in machine learning. It is based on the concept of ensemble learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the performance of the model.

The Random Forest Classifier is a set of decision trees, each built from a randomly selected subset of the training set. It aggregates the votes from the different decision trees to decide the final class of the test object. The random forest algorithm establishes the outcome based on the predictions of the decision trees: for classification it takes the majority vote, and for regression it takes the average or mean of the outputs of the various trees. Increasing the number of trees generally increases the precision of the outcome.

Figure 2.28: Random Forest (an instance is passed to Tree-1, Tree-2, ..., Tree-n; each tree predicts a class and the final class is chosen by majority voting)

Python Methods for Classification

| Algorithm Name | Python function |
|---|---|
| Logistic Regression | sklearn.linear_model.LogisticRegression |
| Support Vector Machines | sklearn.svm.SVC |
| K-Nearest Neighbors | sklearn.neighbors.KNeighborsClassifier |
| Decision Tree | sklearn.tree.DecisionTreeClassifier |
| Naive Bayes | sklearn.naive_bayes.GaussianNB |
| Random Forest | sklearn.ensemble.RandomForestClassifier |

Difference between Classification and Regression

Classification and regression algorithms are supervised learning algorithms. Both can be used for forecasting in machine learning and operate with labeled datasets. The distinction between classification and regression lies in how they are used on particular machine learning problems.

| Parameter | Classification | Regression |
|---|---|---|
| Goal | Used for mapping values to predefined classes. | Used for mapping values to a continuous output. |
| Task | We try to find the decision boundary, which can divide the dataset into different classes. | We try to find the best fit line, which can predict the output more accurately. |
| Involves prediction of | Discrete values | Continuous values |
| Nature of the predicted data | Unordered | Ordered |
| Model evaluation | By measuring accuracy | By measuring root mean square error |
| Algorithms | Logistic regression, Support Vector Machines, K-Nearest Neighbors, Decision Tree Classification, Naive Bayes | Linear, Multiple linear regression, Polynomial |
| Examples | Identification of spam emails, speech recognition, image recognition, etc. | Weather prediction, house price prediction, etc. |

6. Concept of Clustering

Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups (clusters). It divides the population or data points into a number of groups such that data points in the same group are more similar to each other than to those in other groups. In simple words, the aim is to segregate groups with similar features and assign them to clusters.

It does this by finding similar patterns in the unlabeled dataset, such as shape, size, color, behavior, etc., and divides the data according to the presence or absence of those patterns. It is an unsupervised learning method; hence no supervision is provided to the algorithm, and it deals with an unlabeled dataset.

Figure 2.29: Clustering (raw data is passed to the algorithm, which outputs the clusters)

Clustering is widely used in various tasks. Some of the most common uses of this technique are:
i. Market segmentation
ii. Social network analysis
iii. Image segmentation
iv. Anomaly detection
v. Document grouping
vi. Medical imaging etc.

Types of Clustering algorithms

Clustering algorithms use various approaches for forming clusters within the dataset. These are:

i. Partitioning-based clustering: It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-based method.
The most common example of partitioning clustering is the K-Means clustering algorithm. In this type, the dataset is divided into a set of K groups, where K is the number of predefined groups. The cluster centers are created in such a way that each data point is closer to the centroid of its own cluster than to the centroid of any other cluster.

Figure 2.30: Partitioning based clustering

ii. Density-based clustering: This method identifies different clusters in the dataset based on the density of the points. It connects the areas of high density into clusters. The dense areas in data space are divided from each other by sparser areas. In density-based clustering, data is grouped by areas of high concentration of data points surrounded by areas of low concentration of data points. Basically, the algorithm finds the places that are dense with data points and calls those clusters. The important point is that the clusters can be of any shape.

Figure 2.31: Density based clustering

iii. Hierarchical clustering: Hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. A hierarchical clustering method works by grouping data into a tree of clusters. We develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram. Any number of clusters can be selected by cutting the tree at the appropriate level.

Figure 2.32: Hierarchical Clustering (labels: Agglomerative; Divisive)

Hierarchical clustering technique is divided into two types:

a. Agglomerative Hierarchical Clustering: Agglomerative hierarchical clustering is the most common type of hierarchical clustering used to group objects in clusters based on their similarity. It is also known as AGNES (Agglomerative Nesting). It is a "bottom-up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
b. Divisive Hierarchical Clustering: This is a top-down approach to clustering. In this method, we assign all of the observations to a single cluster and then partition the cluster into the two least similar clusters. We then proceed recursively on each cluster until there is one cluster for each observation.

In both agglomerative and divisive hierarchical clustering, users can specify the desired number of clusters as a termination condition.

iv. Distribution model-based clustering: In the distribution model-based clustering method, the data is divided based on the probability that a data point belongs to a particular distribution. The grouping is done by assuming some distribution, most commonly the Gaussian distribution. The Gaussian model is most prominent where we have a fixed number of distributions and all the incoming data is fitted to them such that the likelihood of the data is maximized.

Figure 2.33: Distribution-model based Clustering

Clustering Algorithms

The clustering algorithms can be divided based on the models explained above. Some of the popular clustering algorithms that are widely used in machine learning are:

i. K-Means Algorithm: The k-means algorithm groups the dataset by dividing the samples into different clusters of equal variance. The number of clusters K must be specified in this algorithm. Initially, each data point is randomly assigned to a cluster. Then the cluster centroids are computed and the data points are re-assigned to the closest cluster centroid. The cluster centroids are re-computed and the process is repeated till no more improvements are possible.

ii. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise. It is an example of a density-based model similar to mean-shift, but with some advantages, such as explicit handling of noise points. In this algorithm, the areas of high density are separated by the areas of low density.
Because of this, the clusters can be found in any arbitrary shape.

iii. Expectation-Maximization Clustering using GMM: This algorithm is a distribution-based clustering method. Clustering methods such as K-means have hard boundaries, meaning a data point either belongs to a cluster or it doesn't. On the other hand, clustering methods such as Gaussian Mixture Models (GMM) have soft boundaries, where data points can belong to multiple clusters at the same time but with different degrees of belief, e.g. a data point can have a 60% probability of belonging to cluster 1 and a 40% probability of belonging to cluster 2.

iv. Agglomerative Clustering: The agglomerative hierarchical algorithm performs bottom-up hierarchical clustering. In this, each data point is treated as a single cluster at the outset and the clusters are then successively merged. The cluster hierarchy can be represented as a tree structure.

v. Mean-shift algorithm: It is a centroid-based algorithm, which works by updating candidates for centroids to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates and form the final set of centroids.

Python functions for clustering

| Name of Algorithm | Python Function |
|---|---|
| K-means algorithm | sklearn.cluster.KMeans |
| DBSCAN Algorithm | sklearn.cluster.DBSCAN |
| Expectation-Maximization Clustering using GMM | sklearn.mixture.GaussianMixture |
| Agglomerative Clustering | sklearn.cluster.AgglomerativeClustering |
| Mean-shift algorithm | sklearn.cluster.MeanShift |

Difference between Classification and Clustering

| Parameter | Classification | Clustering |
|---|---|---|
| Type | Type of supervised learning | Type of unsupervised learning |
| Dataset | Labeled data | Unlabeled data |
| Process | Process of classifying the input instances based on their corresponding class labels. | Grouping the instances based on their similarity without the help of class labels. |
| Model training | It has labels, so there is a need for training and testing datasets for verifying the model created. | There is no need for training and testing datasets. |
| Stages | Involves training and testing stages. | Involves only one stage: grouping of data based on similarity. |
| Complexity | More complex as compared to clustering. | Less complex as compared to classification. |
| Types | Logistic regression, K-nearest neighbor, Decision trees, SVM, etc. | Partitioning-based clustering, Hierarchical-based clustering, Density-based clustering, etc. |
| Example Algorithms | Logistic regression, Naive Bayes classifier, Support vector machines, etc. | k-means clustering algorithm, Gaussian (EM) clustering algorithm, etc. |
| Applications | Email spam detection, facial recognition, sentiment analysis. | Market and customer segmentation, social network analysis (SNA), search result clustering, biological data analysis, medical imaging analysis. |

Exercises

A. Multiple Choice Questions

1. ___ is a field of study which allows machines to learn from data or experience and make a prediction based on the experience.
   a. Data science   b. Deep learning   c. Machine learning   d. Artificial intelligence
2. ___ aims to create machines that can imitate human behavior and perform human actions.
   a. Machine learning   b. Data science   c. Artificial intelligence   d. Deep learning
3. The output of a machine learning algorithm is:
   a. [...]   b. [...]   c. Summary   d. Output
4. Artificial neural [...] in which of the following?
   a. [...]   b. Data science   c. [...]   d. Deep learning
5. Which of the following fits a straight line through the data points?
   a. Linear regression   b. [...]   c. Polynomial regression   d. All of the above
6. Which of the following [...] an S-shaped curve?
   a. Linear regression   b. [...]   c. Logistic regression   d. Polynomial regression
7. Which of the following [...]?
   [...] Decision trees [...] K-nearest neighbors [...] Support vector machines
8. [...] is which type of learning?
   a. Supervised learning   b. Unsupervised learning   c. Reinforcement learning   d. Semi-supervised learning
9. In which type of learning is a labeled training data set used?
   a. Supervised learning   b. Reinforcement learning   c. Unsupervised learning   d. All of the above
10. Machine learning algorithms build a model based on which data?
   a. Entire data set   b. Training data   c. Testing data   d. [...] data
11. Which of the following is a probability-based technique?
   a. Support vector machines   b. K-nearest neighbors   c. Decision trees   d. Naive Bayes
12. Which of the following is NOT a regression type?
   a. Linear   b. Non-linear   c. Logistic   d. Polynomial
13. Which of the following is an ensemble technique?
   a. Regression   b. Decision tree   c. Random forest   d. Naive Bayes
14. If a machine learning model's output involves a target variable, then that model is called a:
   a. [...]   b. Predictive   c. Reinforcement learning   d. Clustering
15. The problem of finding patterns in unlabeled data is called:
   a. Supervised learning   b. Unsupervised learning   c. [...]   d. None of the above
16. You are given reviews of [...] marked as positive, negative and neutral. Classifying reviews of [...] series is an example of:
   a. Supervised learning   b. Unsupervised learning   c. Semi-supervised learning   d. Reinforcement learning
17. Which of the following provides clusters of many different sizes and shapes?
   a. Decision Trees   b. Density-based clustering   c. Model-based [...]   d. K-means clustering
18. Which of the following is not an ensemble technique?
   a. Boosting   b. Bagging   c. Clustering   d. Stacking
19. In which ensemble technique does each subsequent model attempt to correct the errors of the previous model?
   a. Boosting   b. Bagging   c. Stacking   d. None of the above
20. What does K stand for in the K-means algorithm?
   a. Number of clusters   b. [...]   c. [...]   d. [...]
21. [...] is an example of:
   a. [...]   b. [...]   c. [...]   d. Reinforcement learning
22. [...]
   [...] Clustering [...] Linear regression [...] Classification [...]
23. Which of the following is used to analyze the relationship between the dependent (outcome) variable and the predictor (independent) variables?
   a. Classification   b. Regression   c. Reinforcement learning   d. Clustering
24. Which of the following is a type of supervised learning?
   a. Classification   b. Clustering   c. Reinforcement learning   d. None of the above
25. Which of the following is a type of unsupervised learning?
   a. Classification   b. Clustering   c. Reinforcement learning   d. Regression
26. In which type of learning is reward and punishment used?
   a. Classification   b. Clustering   c. Reinforcement learning   d. Regression
27. Reinforcement learning is:
   a. Supervised learning   b. Unsupervised learning   c. Semi-supervised learning   d. None of the above
28. In which ensemble technique are multiple weak learners combined to improve the performance?
   a. Bagging   b. Boosting   c. Stacking   d. None of the above
29. In which method are two or more base models used along with a meta-model that combines the predictions of the base models?
   a. Bagging   b. Boosting   c. Stacking   d. None of the above
30. Which of the following is used for prediction as well as inferring causal relationships between variables?
   a. Classification   b. [...]   c. Ensemble techniques   d. [...]
31. [...]
   [...] Clustering [...]
32. Which of the following is a hierarchical clustering technique?
   a. Agglomerative   b. Density [...]   c. Random forest   d. Support vector machine
33. The K-means clustering algorithm is based on:
   a. [...]   b. [...]   c. [...] clustering   d. Distribution model-based clustering
34. The tree-like structure in hierarchical clustering is called:
   a. Hierarchy tree   b. Dendrogram   c. Cluster   d. Class
35. Which classification method performs classification by finding the hyperplane that maximizes the margin between the two classes?
   a. Support Vector Machines   b. Decision tree   c. Random Forest   d. K-nearest neighbors

B. State True or False:

Machine learning and Artificial intelligence are interchangeable terms.
Machine learning is a subset of Artificial intelligence.
Deep learning is a subset of Machine learning.
AI is a field of study which allows machines to learn from data or experience.
Bagging is a sequential proces, Deep leaming models are trained by using large data sets of labled data Logistic regression is used for binary lassifation, ‘Boosting comecs errors rete by previous moses Bagging uses multiple learning algorithms ‘The sigmoid function maps predicted values ito probabilities. In bagging, the algorithm sand wing the same dataset, In polynomial regression, the relation between the dependent and independent variables is linea. ‘A deep neural network contins one input ayer one hen lyer and one outpt layer. ‘K-nearest neighbor performs clasification by finding the hyperplane that maximizes the ‘margin between the two classes K-means and K-nearest neighbor algorithms are both wsed fr casifcation. ‘The linear regression line sa stalgh line. Logistic regression is used for binary classification. In logistic regression, the dependent variable is continuo. Logistic egression its straight line hough the data pists [entifyng spam or nt spam is an example of claseig ‘Stacking involves fing many diferent model types on the same dt, “The goa of cles is to roup dt pint aig Nigh iy within he ster and high isimiarty wit pins in the ther cst, ‘Agelomeratve Hierarchical clustering uses the "bouom up" approach.2:58 / ime anaes Answer Questions va dns the term Arif ieee ‘What is mache ein? ‘What is deep ering? List te sages in the dling POSS ‘What ste we of mache lain? ‘Were is spervised leaming se? ‘Whereis unspervidlexing see? Explain sem-sopervised lang. Give the eterencesherween supervised unsupervised and semi-supervised leaning Give the wel posed definition of machine learning. ‘enify he ak. experience and pefomance forthe following examples: 4. Optical character recognition Went forcasing ‘Whar is te ies behing deep easng? "Wests bai tire: berween machin leaing and deep learning? What isthe ones of Arif earl network? (Give he sence ofan Anica eral newark, Give the ctu of euro, Whats he two poze ei dr ing ewes? What ate sper eng sts? 
State the unsupervised learning techniques.
What is the concept behind reinforcement learning?
What is ensemble technique of learning?
State the three main classes of ensemble learning methods.
What are the types of regression?
What are the types of linear regression?
Give the equation for polynomial regression.
Where is linear regression used?
What is forward and back propagation in neural networks?
Where is logistic regression used?
What are the assumptions of semi-supervised learning?
What does K represent in K-means algorithm?
Hierarchical clustering technique is divided into which types?

Answer Questions:

1. What is machine learning? Give its applications.
2. Write a note on Machine Learning.
3. Write a note on deep learning.
4. What is Artificial Intelligence? Give its applications.
5. What is the relationship between Machine Learning, Deep Learning and Artificial Intelligence?
6. Explain the modeling process.
7. What is the difference between machine learning and traditional programming?
8. Explain the structure of an Artificial neural network and neuron.
9. Explain deep learning.
10. What is deep learning? How does it work?
11. What is deep learning? How does it differ from traditional machine learning?
12. Give the difference between Supervised, Unsupervised and Semi-supervised learning.
13. Explain reinforcement learning.
14. Explain how bagging works in ensemble learning methods.
15. Explain how boosting works in ensemble learning methods.
16. Explain how stacking is carried out in ensemble learning.
17. What is classification? Explain its types.
18. How does classification differ from Regression?
19. What is clustering? Explain its types.
20. How does clustering differ from classification?

Answers

A.
1. [...]   2. [...]   3. b   4. [...]   5. a   6. c   7. [...]
8. [...]   9. [...]   10. b   11. d   12. [...]   13. c   14. [...]
15. b   16. a   17. b   18. c   19. a   20. a   21. b
22. [...]   23. b   24. a   25. b   26. c   27. [...]   28. [...]
29. c   30. d   31. a   32. a   33. [...]   34. b   35. a

B.
1. False   2. True   3. True   4. False   5. False   6. True
7. False   8. True   9. True   10. False   11. False   12. False
13. True   14. True   15. False   16. False   17. True   18. True
19. False   20. True   21. False   22. False   23. False   24. False
25. False   26. True   27. True   28. False   29. False   30. False
31. True   32. True   33. True
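
The Naive Bayes calculation on the Play Golf dataset earlier in this chapter can be reproduced with a few lines of Python. The following is a minimal sketch, not the book's own code: the function name `posterior` and the list `DATA` are hypothetical names introduced here for illustration. Probabilities are kept as exact fractions, so the results may differ in the last digit from values worked out by hand with rounded decimals.

```python
from collections import Counter
from fractions import Fraction

# The 14 (Outlook, Play Golf) rows of the chapter's example dataset.
# DATA and posterior() are illustrative names, not part of any library.
DATA = [
    ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"), ("Overcast", "Yes"),
    ("Sunny", "No"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Rainy", "No"), ("Sunny", "No"), ("Sunny", "Yes"), ("Rainy", "No"),
    ("Overcast", "Yes"), ("Overcast", "Yes"),
]


def posterior(label, outlook, data=DATA):
    """Return P(label | outlook) using Bayes' theorem, as an exact fraction."""
    n = len(data)
    label_counts = Counter(play for _, play in data)         # frequency of Yes / No
    outlook_counts = Counter(weather for weather, _ in data)  # frequency of each outlook
    joint_counts = Counter(data)                              # frequency of each (outlook, label) pair

    prior = Fraction(label_counts[label], n)                  # P(label)
    likelihood = Fraction(joint_counts[(outlook, label)],
                          label_counts[label])                # P(outlook | label)
    evidence = Fraction(outlook_counts[outlook], n)           # P(outlook)
    return likelihood * prior / evidence


p_yes = posterior("Yes", "Sunny")
p_no = posterior("No", "Sunny")
print("P(Yes|Sunny) =", p_yes, "=", float(p_yes))
print("P(No|Sunny)  =", p_no, "=", float(p_no))
print("Play" if p_yes > p_no else "Do not play")
```

Calling `posterior("Yes", "Rainy")` or `posterior("Yes", "Overcast")` in the same way gives the corresponding probabilities for the other outlook values in the table.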