
ML

Machine learning is a subset of artificial intelligence that enables machines to learn from data and improve their performance without explicit programming. It is classified into supervised, unsupervised, and reinforcement learning, with applications ranging from self-driving cars to fraud detection. The machine learning life cycle includes data gathering, preparation, analysis, model training, testing, and deployment, with an emphasis on the importance of quality datasets for effective model training.

Uploaded by

Garima Varshney
Copyright
© All Rights Reserved
MACHINE LEARNING NOTES

What is machine learning?

Machine learning enables a machine to automatically learn from data, improve its performance from experience, and make predictions without being explicitly programmed. Machine learning is a subset of artificial intelligence that is mainly concerned with the development of algorithms which allow a computer to learn from data and past experiences on its own. The term machine learning was first introduced by Arthur Samuel in 1959.

[Figure: training data, new data, and output]

A machine learning system learns from historical data, builds prediction models, and whenever it receives new data, predicts the output for it. The accuracy of the predicted output depends in part on the amount of data: a larger amount of data generally helps to build a better model that predicts the output more accurately.

Suppose we have a complex problem where we need to perform some predictions. Instead of writing code for it, we just need to feed the data to generic algorithms; with the help of these algorithms, the machine builds the logic from the data and predicts the output.

Features of Machine Learning:

Following are some key points which show the importance of machine learning:
o Machine learning uses data to detect various patterns in a given dataset.
o It can learn from past data and improve automatically.
o It is a data-driven technology.
o Machine learning is similar to data mining, as it also deals with huge amounts of data.

NEED OF MACHINE LEARNING

o Rapid increase in the production of data
o Solving complex problems which are difficult for a human
o Decision making in various sectors, including finance
o Finding hidden patterns and extracting useful information from data

The importance of machine learning can be easily understood from its use cases. Currently, machine learning is used in self-driving cars, cyber fraud detection, face recognition, friend suggestion by Facebook, etc.
Various top companies such as Netflix and Amazon have built machine learning models that use a vast amount of data to analyze user interest and recommend products accordingly.

Classification of Machine Learning

At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning

Supervised Learning

Supervised learning is a type of machine learning method in which we provide sample labelled data to the machine learning system in order to train it, and on that basis, it predicts the output.

The system creates a model using labelled data to understand the datasets and learn about each data point. Once the training and processing are done, we test the model by providing sample data to check whether it predicts the correct output or not.

The goal of supervised learning is to map input data to output data. Supervised learning is based on supervision, in the same way as a student learns things under the supervision of a teacher. An example of supervised learning is spam filtering.

Supervised learning can be grouped further into two categories of algorithms:
o Classification
o Regression

Unsupervised Learning

Unsupervised learning is a learning method in which a machine learns without any supervision.

The training is provided to the machine with a set of data that has not been labelled, classified, or categorized, and the algorithm needs to act on that data without any supervision. The goal of unsupervised learning is to restructure the input data into new features or into groups of objects with similar patterns.

In unsupervised learning, we don't have a predetermined result. The machine tries to find useful insights from the huge amount of data. It can be further classified into two categories of algorithms:
o Clustering
o Association

Reinforcement Learning

Reinforcement learning is a feedback-based learning method, in which a learning agent gets a reward for each right action and a penalty for each wrong action.
The agent learns automatically from this feedback and improves its performance. In reinforcement learning, the agent interacts with the environment and explores it. The goal of the agent is to collect the most reward points, and hence it improves its performance.

A robotic dog which automatically learns the movement of its arms is an example of reinforcement learning.

Machine learning at present

Machine learning research has advanced greatly, and it is present everywhere around us, in self-driving cars, Amazon Alexa, chatbots, recommender systems, and many more. It includes supervised, unsupervised, and reinforcement learning, with clustering, classification, decision tree, SVM algorithms, etc.

Modern machine learning models can be used for making various predictions, including weather prediction, disease prediction, stock market analysis, etc.

Prerequisites

Before learning machine learning, you must have basic knowledge of the following so that you can easily understand the concepts of machine learning:
o Fundamental knowledge of probability and linear algebra.
o The ability to code in any computer language, especially Python.
o Knowledge of calculus, especially derivatives of single-variable and multivariate functions.

APPLICATIONS OF MACHINE LEARNING

[Figure: applications of machine learning]

Machine Learning life-cycle

1. Gathering Data:

Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify and obtain all the data related to the problem.

In this step, we need to identify the different data sources, as data can be collected from various sources such as files, databases, the internet, or mobile devices. It is one of the most important steps of the life cycle. The quantity and quality of the collected data will determine the quality of the output. In general, the more data we have, the more accurate the prediction is likely to be.
This step includes the following tasks:
o Identify various data sources
o Collect data
o Integrate the data obtained from different sources

By performing the above tasks, we get a coherent set of data, also called a dataset. It will be used in further steps.

2. Data preparation

After collecting the data, we need to prepare it for further steps. Data preparation is a step where we put our data into a suitable place and prepare it for use in machine learning training.

In this step, we first put all the data together, and then randomize the ordering of the data.

This step can be further divided into two processes:
o Data exploration: It is used to understand the nature of the data that we have to work with. We need to understand the characteristics, format, and quality of the data. A better understanding of the data leads to a more effective outcome. In this process, we find correlations, general trends, and outliers.
o Data pre-processing: The next step is pre-processing of the data for its analysis.

3. Data Wrangling

Data wrangling is the process of cleaning and converting raw data into a usable format. It involves cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. It is one of the most important steps of the complete process. Cleaning of data is required to address quality issues.

Not all of the data we have collected is necessarily useful. In real-world applications, collected data may have various issues, including:
o Duplicate data
o Invalid data
o Noise

So, we use various filtering techniques to clean the data.

It is mandatory to detect and remove the above issues because they can negatively affect the quality of the outcome.

4. Data Analysis

Now the cleaned and prepared data is passed on to the analysis step.
This step involves:
o Selection of analytical techniques
o Building models
o Reviewing the result

The aim of this step is to build a machine learning model to analyze the data using various analytical techniques and review the outcome. It starts with determining the type of problem, where we select a machine learning technique such as classification, regression, cluster analysis, association, etc., then build the model using the prepared data, and evaluate the model.

5. Train Model

The next step is to train the model. In this step, we train our model to improve its performance for a better outcome on the problem.

We use datasets to train the model using various machine learning algorithms. Training a model is required so that it can learn the various patterns, rules, and features.

6. Test Model

Once our machine learning model has been trained on a given dataset, we test the model. In this step, we check the accuracy of our model by providing a test dataset to it.

Testing the model determines the percentage accuracy of the model as per the requirement of the project or problem.

7. Deployment

The last step of the machine learning life cycle is deployment, where we deploy the model in a real-world system.

If the prepared model produces accurate results as per our requirement with acceptable speed, then we deploy the model in the real system. But before deploying the project, we check whether it is improving its performance using the available data or not. The deployment phase is similar to making the final report for a project.

Anaconda and python

Artificial Intelligence (AI) and Machine Learning (ML) compared:

1. AI: Artificial intelligence is a technology which enables a machine to simulate human behavior.
   ML: Machine learning is a subset of AI which allows a machine to automatically learn from past data without being explicitly programmed.
2. AI: The goal of AI is to make a smart computer system that solves complex problems like humans.
   ML: The goal of ML is to allow machines to learn from data so that they can give accurate output.
3. AI: In AI, we make intelligent systems to perform any task like a human.
   ML: In ML, we teach machines with data to perform a particular task and give an accurate result.
4. AI: Machine learning and deep learning are the two main subsets of AI.
   ML: Deep learning is a main subset of machine learning.
5. AI: AI has a very wide scope.
   ML: Machine learning has a limited scope.
6. AI: AI is working to create an intelligent system which can perform various complex tasks.
   ML: Machine learning is working to create machines that can perform only those specific tasks for which they are trained.
7. AI: An AI system is concerned with maximizing the chances of success.
   ML: Machine learning is mainly concerned with accuracy and patterns.
8. AI: The main applications of AI are Siri, customer support using chatbots, expert systems, online game playing, intelligent humanoid robots, etc.
   ML: The main applications of machine learning are online recommender systems, Google search algorithms, Facebook auto friend tagging suggestions, etc.
9. AI: On the basis of capabilities, AI can be divided into three types: Weak AI, General AI, and Strong AI.
   ML: Machine learning can also be divided into three main types: supervised learning, unsupervised learning, and reinforcement learning.
10. AI: It includes learning, reasoning, and self-correction.
    ML: It includes learning and self-correction when introduced to new data.
11. AI: AI deals with structured, semi-structured, and unstructured data.
    ML: Machine learning deals with structured and semi-structured data.

How to get datasets for Machine Learning

The key to success in the field of machine learning, or to becoming a great data scientist, is to practice with different types of datasets. But discovering a suitable dataset for each kind of machine learning project is a difficult task. So, in this topic, we will provide details of the sources from which you can easily get a dataset according to your project.

Before looking at the sources of machine learning datasets, let's discuss datasets.
What is a dataset?

A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a series or an array to a database table. The table below shows an example of a dataset:

Country   Age   Salary        Purchased
India     38    4000          No
France    43    45000         Yes
Germany   30    54000         No
France    48    65000         No
Germany   40                  Yes
India     35    [illegible]   Yes

A tabular dataset can be understood as a database table or matrix, where each column corresponds to a particular variable, and each row corresponds to one record of the dataset. The most widely supported file type for a tabular dataset is the "Comma Separated File," or CSV. But to store "tree-like data," we can use a JSON file more efficiently.

Types of data in datasets

o Numerical data: such as house price, temperature, etc.
o Categorical data: such as Yes/No, True/False, Blue/Green, etc.
o Ordinal data: similar to categorical data, but the values can be ranked by comparison.

Note: A real-world dataset is of huge size, which is difficult to manage and process at the initial level. Therefore, to practice machine learning algorithms, we can use any dummy dataset.

Need of Dataset

To work on machine learning projects, we need a huge amount of data, because without data, one cannot train ML/AI models. Collecting and preparing the dataset is one of the most crucial parts of creating an ML/AI project.
The technology behind any ML project cannot work properly if the dataset is not well prepared and pre-processed.

During the development of an ML project, the developers rely completely on the datasets. In building ML applications, datasets are divided into two parts:
o Training dataset
o Test dataset

[Figure: training and test datasets; final performance estimate]

Popular sources for Machine Learning datasets

o Kaggle Datasets
o UCI Machine Learning Repository
o Datasets via AWS
o Google's Dataset Search Engine
o Microsoft Datasets
o Awesome Public Dataset Collection
o Government Datasets
o Computer Vision Datasets

SUPERVISED MACHINE LEARNING

Supervised learning is the type of machine learning in which machines are trained using well "labelled" training data, and on the basis of that data, machines predict the output. Labelled data means that some input data is already tagged with the correct output.

In supervised learning, the training data provided to the machines works as the supervisor that teaches the machines to predict the output correctly. It applies the same concept as a student learning under the supervision of a teacher.

Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input variable (x) to the output variable (y).

In the real world, supervised learning can be used for risk assessment, image classification, fraud detection, spam filtering, etc.

How Supervised Learning Works?

In supervised learning, models are trained using a labelled dataset, where the model learns about each type of data. Once the training process is completed, the model is tested on test data (a held-out portion of the labelled data that was not used for training), and then it predicts the output.
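The train-then-test workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy dataset `DATA`, the helper names, the 25% test fraction, and the choice of a 1-nearest-neighbour classifier are all hypothetical and do not come from the notes.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labelled rows and hold out a fraction of them for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]  # (train, test)


def predict_1nn(train, features):
    """Return the label of the training example closest to `features`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = min(train, key=lambda row: sq_dist(row[0], features))
    return nearest[1]


def accuracy(train, test):
    """Fraction of held-out examples whose predicted label is correct."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


# Hypothetical labelled data: (input features x, correct output y).
DATA = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.2), "B"), ((4.8, 5.1), "B"), ((5.2, 4.9), "B"), ((5.1, 5.0), "B"),
]

train_set, test_set = train_test_split(DATA)
print("training examples:", len(train_set))
print("test examples:", len(test_set))
print("accuracy on held-out data:", accuracy(train_set, test_set))
```

The important design point is that the test rows are never shown to the model during training, so the measured accuracy reflects how the model behaves on data it has not seen.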
The working of supervised learning can be easily understood from the example and diagram below:

[Figure: labelled data (shapes with labels such as Square and Triangle), model training, test data]

Suppose we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and polygons. The first step is to train the model on each shape:
o If the given shape has four sides, and all the sides are equal, then it will be labelled as a square.
o If the given shape has three sides, then it will be labelled as a triangle.
o If the given shape has six equal sides, then it will be labelled as a hexagon.

After training, we test our model using the test set, and the task of the model is to identify the shape.

The machine is already trained on all types of shapes, and when it finds a new shape, it classifies the shape on the basis of its number of sides and predicts the output.

Steps Involved in Supervised Learning:

o First, determine the type of training dataset.
o Collect/gather the labelled training data.
o Split the labelled dataset into a training dataset, a test dataset, and a validation dataset.
o Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
o Determine a suitable algorithm for the model, such as support vector machine, decision tree, etc.
o Execute the algorithm on the training dataset. Sometimes we need validation sets, which are subsets of the training data, to tune the control parameters.
o Evaluate the accuracy of the model by providing the test set. If the model predicts the correct output, our model is accurate.

Types of supervised Machine learning Algorithms

Supervised learning can be further divided into two types of problems:

1. Regression

Regression algorithms are used if there is a relationship between the input variable and the output variable. Regression is used for the prediction of continuous variables, such as in weather forecasting, market trends, etc.
Below are some popular regression algorithms which come under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means the output falls into classes such as Yes-No, Male-Female, True-False, etc. An example is spam filtering. Popular classification algorithms include:
o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Advantages of Supervised learning:

o With the help of supervised learning, the model can predict the output on the basis of prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us to solve various real-world problems such as fraud detection, spam filtering, etc.

Disadvantages of supervised learning:

o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

Regression Analysis in Machine learning

Regression analysis is a statistical method to model the relationship between a dependent (target) variable and one or more independent (predictor) variables. More specifically, regression analysis helps us to understand how the value of the dependent variable changes with an independent variable when the other independent variables are held fixed. It predicts continuous/real values such as temperature, age, salary, price, etc.

We can understand the concept of regression analysis using the example below:

Example: Suppose there is a marketing company A, which does various advertisements every year and gets sales from them.
The list below shows the advertisement made by the company in the last 5 years and the corresponding sales:

[Table: Advertisement and Sales]

Now, the company wants to spend $200 on advertisement in the year 2019 and wants to know the predicted sales for this year. To solve such prediction problems in machine learning, we need regression analysis.

Regression is a supervised learning technique which helps in finding the correlation between variables and enables us to predict a continuous output variable based on one or more predictor variables. It is mainly used for prediction, forecasting, time series modeling, and exploring cause-effect relationships between variables.

In regression, we plot a graph between the variables which best fits the given datapoints; using this plot, the machine learning model can make predictions about the data. In simple words, "Regression shows a line or curve that passes as close as possible to the datapoints on the target-predictor graph, in such a way that the vertical distance between the datapoints and the regression line is minimum." The distance between the datapoints and the line tells whether a model has captured a strong relationship or not.

Some examples of regression are:
o Prediction of rain using temperature and other factors
o Determining market trends
o Prediction of road accidents due to rash driving

Terminologies Related to the Regression Analysis

o Dependent Variable: The main factor in regression analysis which we want to predict or understand is called the dependent variable. It is also called the target variable.
o Independent Variable: The factors which affect the dependent variable, or which are used to predict the values of the dependent variable, are called independent variables, also called predictors.
o Outliers: An outlier is an observation which contains either a very low value or a very high value in comparison to other observed values. An outlier may distort the result, so it should be detected and handled.
o Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it creates problems while ranking the most influential variables.
o Underfitting and Overfitting: If our algorithm works well with the training dataset but not with the test dataset, the problem is called overfitting. If our algorithm does not perform well even with the training dataset, the problem is called underfitting.

Why do we use Regression Analysis?

As mentioned above, regression analysis helps in the prediction of a continuous variable. There are various scenarios in the real world where we need future predictions, such as weather conditions, sales prediction, marketing trends, etc. For such cases we need a technique which can make predictions accurately, and regression analysis, a statistical method used in machine learning and data science, is one such technique. Below are some other reasons for using regression analysis:
o Regression estimates the relationship between the target and the independent variables.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can estimate the most important factor, the least important factor, and how each factor affects the other factors.

Types of Regression

There are various types of regression which are used in data science and machine learning. Each type has its own importance in different scenarios, but at the core, all the regression methods analyze the effect of the independent variables on the dependent variable.
Here we discuss some important types of regression, which are given below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression

Linear Regression

o Linear regression is a statistical regression method which is used for predictive analysis.
o It is one of the simplest algorithms; it works on regression and shows the relationship between continuous variables.
o It is used for solving regression problems in machine learning.
o Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), hence the name linear regression.
o If there is only one input variable (x), it is called simple linear regression. If there is more than one input variable, it is called multiple linear regression.
o The relationship between the variables in the linear regression model can be explained using the image below. Here we are predicting the salary of an employee on the basis of years of experience.

[Figure: salary plotted against years of experience, with a fitted regression line]

$Y = aX + b$

Here, $Y$ = dependent variable (target variable), $X$ = independent variable (predictor variable), and $a$ and $b$ are the linear coefficients.

Some popular applications of linear regression are:
o Analyzing trends and sales estimates
o Salary forecasting
o Real estate prediction
o Arriving at ETAs in traffic

Logistic Regression

Logistic regression is another supervised learning algorithm, which is used to solve classification problems. In classification problems, the dependent variable is in a binary or discrete format, such as 0 or 1.

The logistic regression algorithm works with categorical variables such as 0 or 1, Yes or No, True or False, Spam or not spam, etc.
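As a rough sketch of how such a 0-or-1 answer can be produced, the standard logistic (sigmoid) function $f(x) = 1 / (1 + e^{-x})$ turns any real-valued score into a value between 0 and 1, and a threshold then turns that value into a class. The function names and the default threshold of 0.5 are assumptions made for illustration; the notes do not prescribe a particular implementation or threshold.

```python
import math


def sigmoid(x):
    """Standard logistic function: maps any real number to a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form that avoids overflow in exp().
    z = math.exp(x)
    return z / (1.0 + z)


def classify(score, threshold=0.5):
    """Hypothetical helper: 1 if the sigmoid output reaches the threshold, else 0."""
    return 1 if sigmoid(score) >= threshold else 0


for score in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(score, round(sigmoid(score), 3), classify(score))
```

In a fitted logistic regression model, the score passed to the sigmoid would be a weighted sum of the input features; here the scores are simply made-up numbers.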
o It is a predictive analysis algorithm which works on the concept of probability.
o Logistic regression is a type of regression, but it differs from the linear regression algorithm in how it is used.
o Logistic regression uses the sigmoid function, also called the logistic function, which maps any real-valued input to a value between 0 and 1. This sigmoid function is used to model the data in logistic regression. The function can be represented as:

$f(x) = \frac{1}{1 + e^{-x}}$

o $f(x)$ = output, a value between 0 and 1
o $x$ = input to the function
o $e$ = base of the natural logarithm

When we provide the input values (data) to the function, it gives an S-shaped curve.

[Figure: the S-curve of the sigmoid function]

o It uses the concept of threshold levels: values above the threshold level are rounded up to 1, and values below the threshold level are rounded down to 0.

There are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)

Polynomial Regression

o Polynomial regression is a type of regression which models a non-linear dataset using a linear model.
o It is similar to multiple linear regression, but it fits a non-linear curve between the values of x and the corresponding conditional values of y.
o Suppose there is a dataset whose datapoints are distributed in a non-linear fashion; in such a case, linear regression will not fit those datapoints well. To cover such datapoints, we need polynomial regression.
o In polynomial regression, the original features are transformed into polynomial features of a given degree and then modeled using a linear model, which means the datapoints are best fitted using a polynomial curve.
o The equation for polynomial regression is also derived from the linear regression equation: the linear regression equation $Y = b_0 + b_1 x$ is transformed into the polynomial regression equation $Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$.
o Here $Y$ is the predicted/target output, $b_0, b_1, \dots, b_n$ are the regression coefficients, and $x$ is our independent/input variable.
o The model is still linear as the coefficients are still linear with quadratic Y
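The transformation into polynomial features can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it fits the coefficients $b_0, \dots, b_n$ by ordinary least squares using the normal equations and Gaussian elimination, a fitting method chosen here for brevity (the notes do not say how the coefficients are computed). The function names and the sample data are hypothetical.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit of Y = b0 + b1*x + ... + bn*x^n; returns [b0, ..., bn]."""
    n = degree + 1
    # Polynomial features: each x becomes the row [1, x, x^2, ..., x^degree].
    rows = [[x ** p for p in range(n)] for x in xs]
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution to recover the coefficients.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aug[i][n] - known) / aug[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate b0 + b1*x + ... + bn*x^n for a single input x."""
    return sum(b * x ** p for p, b in enumerate(coeffs))


# Hypothetical data generated from Y = 1 + 2x + 3x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)
print("fitted coefficients b0, b1, b2:", [round(b, 6) for b in coeffs])
print("prediction at x = 4:", round(predict(coeffs, 4.0), 6))
```

Although the fitted curve is non-linear in x, the model is a linear combination of the features 1, x, x^2, and so on, which is why the same least-squares machinery as in linear regression applies. Calling `fit_polynomial` with `degree=1` gives simple linear regression.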
