Machine Learning
Once the data is collected, you need to validate whether the quantity is sufficient for the use case (for time-series data,
we typically need a minimum of 3–5 years of data).
The two most important things we do in a machine learning project are selecting a learning algorithm and
training the model on some of the acquired data.
As humans, we naturally tend to make mistakes, and as a result things may go wrong.
Here, the mistakes could be opting for the wrong model or selecting bad data.
NON-REPRESENTATIVE TRAINING DATA
The training data should be representative of the new cases we want to generalize to, i.e., the data used for training
should cover the kinds of cases the model will encounter in production.
By using a non-representative training set, the trained model is not likely to make accurate predictions.
A machine learning model is considered good when it makes accurate predictions for the general cases of the
business problem, i.e., when it performs well even on data it has never seen.
If the number of training samples is too small, we get sampling noise, i.e., non-representative data due to chance;
and even very large training sets can suffer from sampling bias if the method used to collect them is flawed.
POOR QUALITY OF DATA
In reality, we don’t start training the model straight away; analyzing the data is the most important step.
The data we collected might not be ready for training: some samples are abnormal, containing outliers
or missing values, for instance.
In these cases, we can remove the outliers, fill the missing features/values using the median or mean (e.g., to fill a
missing height), simply remove the attributes/instances with missing values, or train the model both with and
without these instances.
We don’t want our system to make false predictions, right? So the quality of data is very important to get
accurate results.
Data preprocessing therefore involves handling missing values and extracting and rearranging the features the model needs.
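A minimal sketch of this kind of clean-up, assuming pandas; the file name and column names (employees.csv, height, salary) are illustrative only:

```python
import pandas as pd

# Hypothetical dataset; file and column names are illustrative only.
df = pd.read_csv("employees.csv")

# Fill missing numeric values with the median (e.g., a missing height).
df["height"] = df["height"].fillna(df["height"].median())

# Or simply drop rows that still contain missing values.
df = df.dropna()

# Remove outliers, e.g., salaries more than 3 standard deviations from the mean.
mean, std = df["salary"].mean(), df["salary"].std()
df = df[(df["salary"] - mean).abs() <= 3 * std]
```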
IRRELEVANT/UNWANTED FEATURES
If the training data contains a large number of irrelevant features and not enough relevant features, the machine
learning system will not give the expected results.
One of the important aspects required for the success of a machine learning project is the selection of good
features to train the model, also known as feature selection.
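A minimal sketch of dropping irrelevant features, assuming scikit-learn; the data here is synthetic, with only a few informative features by construction:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 10 features, of which only 4 are informative.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

# Keep the 4 features with the strongest relationship to the target.
selector = SelectKBest(score_func=f_classif, k=4)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())  # boolean mask of the selected features
```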
UNDERFITTING THE TRAINING DATA
Underfitting, which is the opposite of overfitting, generally occurs when the model is too simple to learn the
underlying structure of the data.
It’s like trying to fit into undersized pants.
It generally happens when we have too little information to build an accurate model, or when we try to fit a linear
model to non-linear data.
Main options to reduce underfitting are:
Feature Engineering — feeding better features to the learning algorithm.
Removing noise from the data.
Increasing the number of model parameters or selecting a more powerful model (a short sketch follows this list).
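A minimal sketch of the last option, assuming scikit-learn and a synthetic non-linear dataset: a plain linear model underfits, while a more powerful (non-linear) model captures the structure:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Non-linear data: y = sin(x) plus a little noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# A linear model is too simple for this structure and underfits.
print("linear R^2:", LinearRegression().fit(X, y).score(X, y))

# A more powerful (non-linear) model fits the structure far better.
print("tree R^2:", DecisionTreeRegressor(max_depth=4).fit(X, y).score(X, y))
```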
OFFLINE LEARNING & DEPLOYMENT OF THE MODEL
Machine learning engineering follows these steps while building an application: 1) data collection, 2) data cleaning,
3) feature engineering, 4) analyzing patterns, 5) training and optimizing the model, 6) deployment.
Many machine learning practitioners can perform all of the earlier steps but lack the skills for deployment.
Bringing their applications into production has become one of the biggest challenges, due to lack of practice,
dependency issues, a poor understanding of the underlying models and the business problem, and unstable models.
Generally, many developers collect data from websites like Kaggle and start training the model.
But in reality, we need to build our own source of data collection, and that data varies dynamically.
Offline learning (batch learning) may not be suitable for this type of changing data: the system is trained once,
launched into production, and then runs without learning anymore.
Here the data might drift as it changes dynamically.
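A minimal sketch of the alternative, incremental (online) learning, assuming scikit-learn's SGDClassifier and synthetic data; the model keeps learning from new batches, which batch learning cannot do after deployment:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
classes = np.unique(y)

# Unlike batch (offline) learning, partial_fit lets the model keep learning
# from new chunks of data after it has been launched into production.
clf = SGDClassifier(random_state=0)
for start in range(0, len(X), 200):          # simulate data arriving in batches
    X_batch, y_batch = X[start:start + 200], y[start:start + 200]
    clf.partial_fit(X_batch, y_batch, classes=classes)

print("accuracy on all data seen so far:", clf.score(X, y))
```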
MACHINE LEARNING TECHNIQUES
SUPERVISED LEARNING
• Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the
mapping function from the input to the output.
Y = f(X)
• Goal: To approximate the mapping function so well that when you have new input data (x), you can predict the output
variables (Y) for that data
TYPES OF SUPERVISED LEARNING
• Linear Regression: used for predicting continuous values by fitting a linear relationship between input features and output.
• Logistic Regression: used for binary classification problems, predicting the probability of a binary outcome.
• Decision Trees: a tree-like model that makes decisions based on the features of the input data; it is intuitive and interpretable.
• Random Forest: an ensemble method that combines multiple decision trees to improve accuracy and reduce overfitting.
• Support Vector Machines (SVM): a powerful classification technique that finds the hyperplane that best separates different classes in the feature space.
• K-Nearest Neighbors (KNN): a non-parametric method where the classification of a data point is based on the majority class among its nearest neighbors.
• Neural Networks: complex models that mimic the human brain’s structure, suitable for both classification and regression tasks, especially in high-dimensional spaces.
• Gradient Boosting Machines (GBM): an ensemble technique that builds models sequentially, where each new model corrects the errors of the previous ones.
• AdaBoost: an ensemble method that combines multiple weak classifiers to create a strong classifier, focusing on misclassified instances.
• Naive Bayes: a probabilistic classifier based on Bayes’ theorem, assuming independence between features; it is particularly effective for text classification.
LINEAR REGRESSION
Linear regression is a statistical regression method that is used for predictive analysis.
It is one of the simplest regression algorithms and models the relationship between continuous variables.
Linear regression shows the linear relationship between the independent variable (X-axis) and the dependent
variable (Y-axis), hence called linear regression.
If there is only one input variable (x), then such linear regression is called simple linear regression.
If there is more than one input variable, then such linear regression is called multiple linear regression.
LINEAR REGRESSION
The relationship between the variables in a linear regression model can be illustrated by predicting the
salary of an employee on the basis of years of experience.
Below is the mathematical equation for Linear regression:
Y = aX + b
Here, Y = dependent variable (target variable),
X = independent variable (predictor variable),
a and b are the linear coefficients.
Some popular applications of linear regression are:
Analyzing trends and sales estimates
Salary forecasting
Real estate prediction
Arriving at ETAs in traffic.
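A minimal sketch of the salary-from-experience example, assuming scikit-learn; the numbers are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data only: years of experience vs. salary (in thousands).
X = np.array([[1], [2], [3], [5], [7], [10]])
y = np.array([35, 42, 50, 65, 80, 105])

model = LinearRegression().fit(X, y)
print("a (slope):", model.coef_[0])
print("b (intercept):", model.intercept_)
print("predicted salary for 4 years of experience:", model.predict([[4]])[0])
```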
POLYNOMIAL REGRESSION
• A type of regression that models the non-linear dataset using a linear model.
• Similar to multiple linear regression, but it fits a non-linear curve between the value of x and
corresponding conditional values of y.
• Suppose there is a dataset that consists of data points that are present in a non-linear fashion, so for
such a case, linear regression will not best fit those data points.
• To cover such data points, we need Polynomial regression.
• In Polynomial regression, the original features are transformed into polynomial features of a
given degree and then modeled using a linear model.
• The data points are best fitted using a polynomial line.
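A minimal sketch of polynomial regression, assuming scikit-learn: the original feature is transformed into polynomial features and then fitted with a linear model; the data is synthetic:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Data generated from a cubic curve with noise, so a straight line won't fit well.
rng = np.random.default_rng(1)
X = np.linspace(-2, 2, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 3 - X.ravel() + rng.normal(scale=0.3, size=50)

# Transform the original feature into polynomial features, then fit a linear model.
poly_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
poly_model.fit(X, y)
print("R^2 of the polynomial fit:", poly_model.score(X, y))
```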
LOGISTIC REGRESSION
• Logistic regression is another supervised learning algorithm that is used to solve classification problems.
• In classification problems, we have dependent variables in a binary or discrete format such as 0 or 1.
• Logistic regression algorithm works with the categorical variable such as 0 or 1, Yes or No, True or
False, Spam or not spam, etc.
• It is a predictive analysis algorithm that works on the concept of probability.
• Logistic regression uses a sigmoid or logistic function, which maps any real-valued input to a value between 0 and 1.
This sigmoid function is used to model the data in logistic regression.
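A minimal sketch of the sigmoid and of a logistic regression classifier, assuming scikit-learn and synthetic binary data (e.g., spam vs. not spam):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# The sigmoid squashes any real number into a probability between 0 and 1.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print("sigmoid(0) =", sigmoid(0.0))  # 0.5

# Synthetic binary classification data standing in for spam / not spam.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]))        # predicted classes (0 or 1)
print(clf.predict_proba(X[:5]))  # predicted probabilities per class
```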
SUPPORT VECTOR MACHINE
Support Vector Machine is a supervised learning algorithm that can be used for regression as well as
classification problems
Support Vector Regression (SVR) is a regression algorithm that works for continuous variables.
Below are some keywords that are used in Support Vector Machine:
• Kernel: It is a function used to map lower-dimensional data into higher dimensional data.
• Hyperplane: In general SVM, it is a separation line between two classes, but in SVR, it is a line that helps to
predict the continuous variables and covers most of the data points.
• Boundary line: Boundary lines are the two lines apart from the hyperplane, which creates a margin for data
points.
• Support vectors: Support vectors are the data points nearest to the hyperplane; they define the margin and the position of the hyperplane.
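A minimal sketch of SVM for classification (SVC) and regression (SVR), assuming scikit-learn; the data is synthetic and the regression target is only illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, SVR
from sklearn.model_selection import train_test_split

# Synthetic classification data to illustrate SVC with an RBF kernel.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# SVR works the same way for continuous targets (here we simply reuse y as a toy target).
reg = SVR(kernel="rbf").fit(X_train, y_train.astype(float))
print("first regression predictions:", reg.predict(X_test[:3]))
```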
K-NEAREST NEIGHBOR
K-NN assumes similarity between the new case/data and the available cases and puts the new case into the category
that is most similar to the available categories.
Stores all the available data and classifies a new data point based on the similarity.
Mostly used for classification problems.
K-NN is a non-parametric algorithm, which means it does not make any assumption on underlying data.
It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it
stores the dataset and, at the time of classification, performs an action on the dataset.
K-NEAREST NEIGHBOR
Suppose there are two categories, Category A and Category B, and we have a new data point x1;
in which of these categories will this data point lie?
To solve this type of problem, we need the K-NN algorithm.
With the help of K-NN, we can easily identify the category or class of a particular data point.
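A minimal sketch of this Category A / Category B example, assuming scikit-learn; the points are made up for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: points labelled Category A (0) and Category B (1).
X = np.array([[1, 2], [2, 1], [1.5, 1.8],    # Category A
              [6, 7], [7, 6], [6.5, 6.8]])   # Category B
y = np.array([0, 0, 0, 1, 1, 1])

# Classify a new data point x1 by the majority class among its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
x1 = np.array([[6, 6]])
print("x1 belongs to category:", "A" if knn.predict(x1)[0] == 0 else "B")
```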
UNSUPERVISED LEARNING
• The training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that
information without guidance
• The task of the unsupervised learning algorithm is to group unsorted information according to similarities, patterns, and
differences, without any prior training on the data
TYPES OF UNSUPERVISED LEARNING
• Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping
customers by purchasing behavior.
• Dimensionality reduction: Refers to techniques for reducing the number of input variables in training data
• More input features often make a predictive modeling task more challenging to model, a difficulty generally referred to as the
curse of dimensionality
• Association: An association rule learning problem is where you want to discover rules that describe large portions of your
data, such as people that buy X also tend to buy Y.
CLUSTERING
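A minimal sketch of clustering, assuming scikit-learn's KMeans on synthetic data standing in for, e.g., customer purchasing behaviour:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabelled data standing in for customer purchasing behaviour.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Discover 3 groupings in the data without any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments of the first 10 points:", kmeans.labels_[:10])
print("cluster centres:\n", kmeans.cluster_centers_)
```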
DIMENSIONALITY REDUCTION TECHNIQUES
Feature Selection: The process of selecting the subset of the relevant features and leaving out the
irrelevant features present in a dataset to build a model of high accuracy.
Three methods are used for feature selection:
Filter Methods
Wrapper Methods
Embedded Methods
DIMENSIONALITY REDUCTION TECHNIQUES
Feature Selection
Filter Methods: In this method, the dataset is filtered, and a subset that contains only the relevant features is taken.
Some common techniques of filter methods are:
Correlation
Chi-Square Test
ANOVA
Information Gain

Wrapper Methods: The wrapper method has the same goal as the filter method, but it takes a machine learning
model for its evaluation.
In this method, some features are fed to the ML model and the performance is evaluated.
The performance decides whether to add those features or remove them to increase the accuracy of the model.
This method is more accurate than the filter method but more complex to work with.

Embedded Methods: Embedded methods check the different training iterations of the machine learning model
and evaluate the importance of each feature.
Some common techniques of embedded methods are:
LASSO
Elastic Net
Ridge Regression, etc.
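A minimal sketch of one technique from each family, assuming scikit-learn (chi-square as a filter, recursive feature elimination as a wrapper, LASSO as an embedded method); the dataset and the regression use of LASSO on a 0/1 target are purely illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression, Lasso

X, y = load_breast_cancer(return_X_y=True)

# Filter method: score each feature independently (chi-square) and keep the best 10.
X_filter = SelectKBest(score_func=chi2, k=10).fit_transform(X, y)

# Wrapper method: recursive feature elimination, using a model to judge features.
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)
print("features kept by RFE:", rfe.support_.sum())

# Embedded method: LASSO drives the coefficients of unhelpful features to zero.
lasso = Lasso(alpha=0.1).fit(X, y)
print("features kept by LASSO:", (lasso.coef_ != 0).sum())
```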
DIMENSIONALITY REDUCTION TECHNIQUES
Feature Extraction: Feature extraction is the process of transforming the space containing many dimensions
into space with fewer dimensions.
This approach is useful when we want to keep the whole information but use fewer resources while processing the
information.
Some common feature extraction techniques are:
Principal Component Analysis
Linear Discriminant Analysis
Kernel PCA
Quadratic Discriminant Analysis
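A minimal sketch of feature extraction with Principal Component Analysis, assuming scikit-learn and its bundled digits dataset:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 64-dimensional digit images compressed down to 10 principal components.
X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=10)
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape, "reduced shape:", X_reduced.shape)
print("variance explained by 10 components:", pca.explained_variance_ratio_.sum())
```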
POPULAR DIMENSIONALITY REDUCTION TECHNIQUES
Backward Feature Selection: Mainly used while developing Linear Regression or Logistic Regression model.
Below steps are performed in this technique to reduce the dimensionality or in feature selection:
In this technique, firstly, all the n variables of the given dataset are taken to train the model.
The performance of the model is checked.
Now we will remove one feature each time and train the model on n-1 features for n times, and will compute the performance of the
model.
We will check which variable made the smallest (or no) change in the performance of the model and drop that
variable; after that, we will be left with n-1 features.
Repeat the complete process until no feature can be dropped.
In this technique, by selecting the optimum performance of the model and maximum tolerable error rate, we can define the optimal
number of features required for the machine learning algorithms.
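A minimal sketch of backward elimination, assuming scikit-learn's SequentialFeatureSelector as a greedy implementation of these steps; the dataset and the target number of features are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector

X, y = load_breast_cancer(return_X_y=True)

# Start from all 30 features and repeatedly drop the one whose removal
# hurts the cross-validated performance the least.
backward = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=10,
    direction="backward",
).fit(X, y)
print("number of features kept:", backward.get_support().sum())
```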
POPULAR DIMENSIONALITY REDUCTION TECHNIQUES
Forward Feature Selection: Follows the inverse process of the backward elimination process.
It means, in this technique, we don't eliminate the feature; instead, we will find the best features that can produce
the highest increase in the performance of the model.
Below steps are performed in this technique:
• We start with a single feature only, and progressively we will add each feature at a time.
• Here we will train the model on each feature separately.
• The feature with the best performance is selected.
• The process is repeated, adding one feature at a time, until adding further features no longer gives a significant increase in the performance of the model (a short sketch follows this list).
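A minimal sketch of forward selection, again assuming scikit-learn's SequentialFeatureSelector (the same class as in the backward sketch, with direction="forward"); dataset and feature count are illustrative:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector

X, y = load_breast_cancer(return_X_y=True)

# Start with no features and greedily add, one at a time, the feature that
# gives the largest improvement in cross-validated performance.
forward = SequentialFeatureSelector(
    LogisticRegression(max_iter=5000),
    n_features_to_select=10,
    direction="forward",
).fit(X, y)
print("selected feature indices:", forward.get_support().nonzero()[0])
```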
Missing Value Ratio: If a dataset has too many missing values, then we drop those variables as they
do not carry much useful information.
To perform this, we can set a threshold level, and if a variable has missing values more than that threshold, we
will drop that variable.
The lower the threshold value, the more variables are dropped and the more aggressive the reduction.
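A minimal sketch of the missing value ratio check, assuming pandas; the file name and the 40% threshold are illustrative:

```python
import pandas as pd

# Hypothetical dataset; drop every column whose fraction of missing values
# exceeds a chosen threshold (here 40%).
df = pd.read_csv("dataset.csv")  # illustrative file name
threshold = 0.40

missing_ratio = df.isnull().mean()              # fraction of missing values per column
df_reduced = df.loc[:, missing_ratio <= threshold]
print("dropped columns:", list(missing_ratio[missing_ratio > threshold].index))
```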
REINFORCEMENT LEARNING
• Reinforcement learning is the training of machine learning models to make a sequence of decisions
• The agent learns to achieve a goal in an uncertain, potentially complex environment
REINFORCEMENT LEARNING ALGORITHMS
SARSA (state-action-reward-state-action):
An on-policy reinforcement learning algorithm that estimates the value of the policy being followed.
In this algorithm, the agent learns the value of the policy it is currently following and uses that same policy to act.
The policy that is used for updating and the policy used for acting are the same, unlike in Q-learning.
An experience in SARSA is of the form ⟨S,A,R,S’, A’⟩, which means that
current state S,
current action A,
reward R, and
new state S’,
future action A’.
This experience provides a new target to update toward:
Q(S,A) ← Q(S,A) + α[R + γQ(S′,A′) − Q(S,A)]
where α is the learning rate and γ is the discount factor.
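A minimal sketch of this tabular SARSA update in Python; the state/action space sizes, α, γ, and the example experience are illustrative only:

```python
import numpy as np

# A minimal tabular SARSA update, assuming small discrete state/action spaces.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9   # learning rate and discount factor

def sarsa_update(Q, s, a, r, s_next, a_next):
    """Move Q(S, A) toward the target R + gamma * Q(S', A')."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

# One experience <S, A, R, S', A'>: illustrative values only.
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
print(Q)
```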