ML Unit-1
INTRODUCTION
1.a) What is Machine learning? Explain the need of it. [L2][CO1][2M]
Machine Learning is a subset of artificial intelligence that is mainly concerned with the
development of algorithms which allow a computer to learn from data and past experiences on its
own. The term "machine learning" was first introduced by Arthur Samuel in 1959. A machine has the
ability to learn if it can improve its performance by gaining more data. The need for machine learning
is increasing day by day, because it is capable of doing tasks that are too complex for a person to
implement directly.
Following are some key points which show the importance of Machine Learning:
• It can handle vast amounts of data efficiently, making it suitable for big data applications.
• It can solve complex problems which are difficult for a human.
• It can find hidden patterns and extract useful information from data.
1b) List out applications and some popular algorithms used in Machine
Learning. Explain them. [L2][CO1] [10M]
1. Healthcare: Machine learning revolutionizes healthcare by enabling advanced diagnostics,
personalized treatments, and predictive analysis. By analyzing medical data such as patient history, lab
results, and imaging scans, ML algorithms assist doctors in identifying diseases early and recommending
appropriate treatments. For instance, IBM Watson Health uses ML to analyze large datasets, helping
oncologists make informed decisions about cancer treatment plans, potentially improving patient
outcomes.
2. Finance: In the finance sector, machine learning plays a crucial role in fraud detection, risk
assessment, and algorithmic trading. By studying patterns in transactions, ML models can identify
suspicious activities and prevent fraud. PayPal, for example, uses ML-powered systems to monitor
real-time transaction data and detect anomalies that could indicate fraudulent behaviour, safeguarding
customers and businesses alike.
3. Retail and E-commerce: ML enhances the shopping experience by providing personalized
recommendations, optimizing inventory management, and improving marketing strategies. E-
commerce platforms like Amazon employ ML algorithms to analyze customer browsing and
purchasing habits, offering tailored product recommendations. This feature increases customer
satisfaction and drives sales growth for businesses.
4. Automotive Industry: Autonomous vehicles rely heavily on machine learning for navigation,
obstacle detection, and decision-making in real-time. Tesla’s Autopilot system uses ML to process data
from cameras, sensors, and GPS to detect lanes, identify obstacles, and drive safely without human
intervention. This technology is paving the way for self-driving cars to become a common feature on
roads.
5. Natural Language Processing (NLP): NLP-powered applications enable computers to
understand, interpret, and generate human language. Tools like Google Translate utilize ML-based
neural machine translation to deliver accurate translations while preserving context. Similarly, ML
drives virtual assistants and chatbots like Siri and Alexa, allowing them to understand voice commands
and provide helpful responses.
6. Image and Video Analysis: Machine learning is widely used for facial recognition, object
detection, and video analysis. Social media platforms like Facebook (Meta) use ML algorithms for
facial recognition, helping users tag friends in photos easily. ML is also used in entertainment, such as
creating realistic visual effects for movies or enhancing video content.
7. Cybersecurity: Cybersecurity applications of ML include real-time threat detection, vulnerability
assessment, and incident response. Companies like Darktrace use ML algorithms to monitor network
traffic, identify anomalies, and prevent cyberattacks before they cause damage. By constantly learning
from historical data, ML systems can adapt to new threats effectively.
8. Education: ML personalizes education by adapting lessons to students' learning styles and
proficiency levels. Platforms like Duolingo employ ML to customize language lessons based on the
user’s performance and pace, ensuring an engaging and effective learning experience. Additionally,
ML is used to automate grading and provide instant feedback.
9. Agriculture: Precision agriculture uses machine learning to monitor crops, predict weather
conditions, and optimize farming resources. John Deere has integrated ML into its smart tractors,
which analyze field data to determine the best methods for planting and harvesting. This technology
helps increase agricultural productivity and sustainability.
10. Energy Sector: Machine learning optimizes energy consumption, resource management, and
renewable energy integration. Google DeepMind uses ML in its data centers to predict and reduce
energy usage for cooling systems, achieving up to 40% cost savings. ML also aids in forecasting
energy demands and balancing supply in smart grids.
6. Random Forest: Random Forest is an ensemble learning algorithm that builds multiple decision
trees and combines their outputs for better accuracy and robustness. It is widely used for classification
and regression problems and handles missing data well.
2a) Explain the various types of Machine Learning techniques with neat diagrams. [L2][CO1] 8m
At a broad level, machine learning can be classified into three types:
1. Supervised learning
2. Unsupervised learning
3. Reinforcement learning
7. Optimize the Model:
- Fine-tune hyperparameters (e.g., learning rate, depth of the tree) using techniques like Grid
Search or Random Search (a grid-search sketch follows these steps).
- Prevent overfitting by using methods such as cross-validation, regularization, or dropout.
8. Deploy the Model:
- Once the model performs well on the testing data, deploy it to make predictions on real-world
data.
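As a minimal sketch of step 7, the following assumes scikit-learn and its built-in Iris dataset; the grid of parameter values is illustrative, not a recommendation.

```python
# Minimal grid-search sketch (assumes scikit-learn; grid values are illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 7: tune hyperparameters (here, depth of the tree) with 5-fold cross-validation.
param_grid = {"max_depth": [2, 3, 5, None], "min_samples_split": [2, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("test accuracy:", search.score(X_test, y_test))  # step 8 would deploy this model
```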
1. Regression
Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:
o Linear Regression
2. Classification: Classification algorithms are used when the output variable is categorical,
which means there are classes such as Yes-No, Male-Female, True-False, etc.
How It Works:
Unsupervised learning algorithms rely on discovering similarities or groupings in data based on their
inherent features. The model explores the data distribution and organizes it, providing insights that
would be difficult for humans to interpret manually, especially in large datasets.
1.Clustering: - Clustering involves grouping data points into clusters based on their similarity or
distance from one another. Points within the same cluster are more similar to each other compared to
those in other clusters.
2. Dimensionality Reduction:
- This technique reduces the number of features in a dataset while retaining its meaningful
characteristics. It helps in visualizing high-dimensional data and speeding up computations.
- Applications: Data compression, feature extraction, and preprocessing for supervised learning.
1. Data Collection:
- Example: Collect demographic information for users without knowing their preferences.
2. Preprocessing:
- Clean and normalize the data to ensure uniformity. Standardize feature scales to avoid dominance
of certain variables.
3. Algorithm Selection:
- Choose an unsupervised learning technique based on your objective (e.g., clustering for grouping,
PCA for dimensionality reduction).
4. Model Training:
- Feed the dataset to the selected algorithm. The model identifies patterns, clusters, or latent
features without any labelled guidance.
5. Interpret Results:
- Evaluate the clusters or patterns discovered by the algorithm and interpret their meaning in the
real-world context.
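A minimal sketch of this workflow, assuming scikit-learn and synthetic two-blob data (the dataset and the choice of K-means are illustrative):

```python
# Sketch of the unsupervised workflow: collect, preprocess, select, train, interpret.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # step 1: unlabelled data (two synthetic blobs)
               rng.normal(5, 1, (50, 2))])

X_scaled = StandardScaler().fit_transform(X)   # step 2: standardize feature scales
model = KMeans(n_clusters=2, n_init=10, random_state=0)  # step 3: choose an algorithm
labels = model.fit_predict(X_scaled)           # step 4: model finds groupings, no labels given

print("cluster sizes:", np.bincount(labels))   # step 5: interpret the discovered clusters
print("centroids:", model.cluster_centers_)
```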
1. Market Segmentation: Retail companies group customers into clusters based on purchasing
behaviours to design targeted marketing campaigns.
2. Anomaly Detection:
In banking and cybersecurity, unsupervised learning detects unusual behaviours, such as fraudulent
transactions or network breaches.
3. Recommender Systems:
Streaming platforms like Netflix use clustering to recommend shows or movies by grouping users
with similar viewing patterns.
4. Image Compression:
Dimensionality reduction algorithms reduce the size of images while retaining their essential
features.
5. Genomics: Clustering algorithms are used to group genes with similar expressions, aiding in medical research.
The unsupervised learning algorithm can be further categorized into two types of problems:
o Clustering: Clustering is a method of grouping objects into clusters such that objects with the most
similarities remain in one group and have few or no similarities with the objects of another group. Cluster
analysis finds the commonalities between the data objects and categorizes them as per the presence and
absence of those commonalities.
o Association: An association rule is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the set of items that occur together in
the dataset. Association rules make marketing strategy more effective: for example, people who buy item X
(say, bread) also tend to purchase item Y (butter/jam). A typical example of an association rule is
Market Basket Analysis.
Reinforcement learning
It is an area of Machine Learning concerned with taking suitable actions to maximize reward in a
particular situation. It is employed by various software and machines to find the best possible
behaviour or path to take in a specific situation. Reinforcement learning differs from
supervised learning in that supervised training data comes with the answer key,
so the model is trained with the correct answer itself, whereas in reinforcement learning
there is no answer key: the reinforcement agent decides what to do to perform the given task. In
the absence of a training dataset, it is bound to learn from its own experience.
Main points in Reinforcement learning:
• Input: The input should be an initial state from which the model will start.
• Output: There are many possible outputs, as there are a variety of solutions to a
particular problem.
• Training: The training is based upon the input; the model returns a state, and the
user decides to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
1. Email Spam Detection: The system identifies patterns within emails (like certain
keywords, sender information, or formatting) based on a dataset labelled as "spam" or "not
spam." It then applies these learned patterns to incoming emails to separate unwanted messages.
For example, spam filters in email services like Gmail automatically sort suspicious emails into
the spam folder, saving users from potential phishing or advertising overload.
2. Image Classification: By analyzing labelled images, the model understands visual
features like shapes, colours, and textures to categorize new images. For example, a photo
classification system trained on pictures of dogs and cats can distinguish between the two in
new images by recognizing fur patterns, facial structures, or tail shapes.
3. Fraud Detection: Using labelled transactional data that highlights fraudulent activity, the
model detects irregular spending patterns or unusual account behaviours to flag potential fraud.
For instance, banks monitor credit card transactions and alert users when they detect an anomaly,
such as unexpected purchases from foreign locations.
4. Sentiment Analysis: Based on text samples marked as positive, negative, or neutral, the
model can gauge sentiment in online reviews, social media posts, or customer feedback. For
instance, businesses use this technology to analyze tweets and product reviews, helping them
understand customer satisfaction and adapt their strategies accordingly.
5. Medical Diagnosis: With access to medical records and diagnostic data labelled by
healthcare experts, the model can identify diseases or conditions. For example, an AI-powered
tool can detect pneumonia by analyzing thousands of labelled chest X-ray images, providing
doctors with a reliable second opinion.
6. Stock Price Prediction: Using financial data like historical stock prices, market trends,
and economic indicators, the model estimates future stock values. For example, investment
firms leverage these predictions to guide their trading strategies and portfolio management.
Supervised learning is like a teacher-student model, where the data acts as the teacher guiding
the algorithm to make accurate predictions or classifications.
3a) Compare Machine Learning and Artificial Intelligence. [L6][CO5] [6M]
ARTIFICIAL INTELLIGENCE vs MACHINE LEARNING
AI: The term "Artificial Intelligence" was originally used in 1956 by John McCarthy, who also hosted the first AI conference.
ML: The term "Machine Learning" was first used in 1959 by IBM computer scientist Arthur Samuel, a pioneer in artificial intelligence and computer games.
AI: AI stands for Artificial Intelligence, where intelligence is defined as the ability to acquire and apply knowledge.
ML: ML stands for Machine Learning, which is defined as the acquisition of knowledge or skill.
AI: AI is the broader family consisting of ML and DL as its components.
ML: Machine Learning is a subset of Artificial Intelligence.
AI: The aim is to increase the chance of success, not accuracy.
ML: The aim is to increase accuracy; it does not care about the chance of success.
AI: It works as a computer program that does smart work.
ML: Here the machine takes data and learns from the data.
AI: The goal is to simulate natural intelligence to solve complex problems.
ML: The goal is to learn from data on a certain task to maximize performance on that task.
AI: AI has a very broad variety of applications.
ML: The scope of machine learning is more constrained.
AI: AI can work with structured, semi-structured, and unstructured data.
ML: ML can work only with structured and semi-structured data.
3b) Describe classification techniques in supervised learning with an example. [L2][CO1] [6M]
Classification techniques in supervised learning involve training models to categorize data into
predefined classes. These methods aim to learn patterns or features from labelled training data
and then use that knowledge to classify new data accurately. Common techniques include:
A classifier is a type of machine learning algorithm that assigns a label to a data input. Classifier
algorithms use labelled data and statistical methods to produce predictions about data input
classifications.
1. Logistic Regression
2. K-Nearest Neighbour
3. Support Vector Machine (Kernel SVM)
4. Naïve Bayes
1. Logistic Regression
- It predicts the probability of a data point belonging to a certain class by modelling the
relationship between input features and output labels.
- Example : Predicting whether an email is spam or not based on features like word
frequency, sender, and formatting.
2. Decision Trees
- A tree-like structure where each node represents a decision based on feature values, and leaf
nodes denote the classification outcome.
- Example : Categorizing animals as mammals, reptiles, or birds based on features like body
temperature and mode of reproduction.
3. Random Forest
- It combines multiple decision trees to improve classification accuracy by taking the majority
vote from all trees.
4. Support Vector Machines (SVM)
- Finds a hyperplane that separates data points into classes by maximizing the margin between
them.
- Example : Classifying handwritten digits (like '7' or '9') based on pixel intensity values.
5. K-Nearest Neighbours (KNN)
- Classifies data points based on the class of the nearest neighbours in the feature space.
- Example : Identifying whether a fruit is an apple or an orange based on size, colour, and
texture.
6. Neural Networks
- Mimics the structure of the human brain, consisting of layers of neurons to learn complex
patterns and classifications.
- Example : Recognizing faces in images by analyzing facial features like eyes, nose, and
mouth.
7. Naive Bayes
- Uses probabilities to classify data based on Bayes' theorem, assuming features are
independent.
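As a small illustration, the following sketch applies two of the classifiers above (K-Nearest Neighbour and Naive Bayes) to scikit-learn's built-in Iris dataset; the split and parameters are arbitrary choices, not recommendations.

```python
# Sketch: fitting two of the classifiers described above on labelled data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (KNeighborsClassifier(n_neighbors=5), GaussianNB()):
    model.fit(X_tr, y_tr)                      # learn from labelled examples
    print(type(model).__name__, "accuracy:", round(model.score(X_te, y_te), 3))
```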
4a) List out various Unsupervised learning techniques used in Machine Learning. [L1][CO5]
[5M]
Unsupervised learning is a type of machine learning in which models are trained using an
unlabelled dataset and are allowed to act on that data without any supervision.
o Clustering: Clustering is a method of grouping objects into clusters such that objects with
the most similarities remain in one group and have few or no similarities with the objects of another
group. Cluster analysis finds the commonalities between the data objects and categorizes them as
per the presence and absence of those commonalities.
o Association: An association rule is an unsupervised learning method used for finding
relationships between variables in a large database. It determines the set of items that
occur together in the dataset. Association rules make marketing strategy more effective: for example,
people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A
typical example of an association rule is Market Basket Analysis.
Types of Clustering Methods
The clustering methods are broadly divided into Hard clustering (each data point belongs to only one
group) and Soft clustering (data points can also belong to other groups). Various other
approaches to clustering also exist. Below are the main clustering methods used in
Machine learning:
1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering
4b) Illustrate the clustering techniques in unsupervised learning with examples. [L3][CO2] [7M]
1. Partitioning Methods
These methods divide the dataset into distinct non-overlapping clusters. Each data point belongs
to exactly one cluster.
K-Means Clustering : It partitions data into K clusters by minimizing the distance between
data points and their respective cluster centroid. K-means is ideal for spherical and compact
clusters but requires you to predefine the number of clusters.
K-Medoids Clustering : Similar to K-means, but uses actual data points (medoids) as cluster
centers instead of centroids. K-medoids is more robust to noise and outliers.
2. Hierarchical Clustering
- Agglomerative Clustering (Bottom-Up): Each data point starts as its own cluster, and
clusters are merged iteratively based on similarity.
- Divisive Clustering (Top-Down): All data points start in one cluster, which is split
recursively into smaller clusters.
Example : Breaking down customer groups based on broad categories, then into
subcategories.
3. Density-Based Clustering
Clusters are formed based on the density of data points, identifying regions of high data
concentration.
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
It groups data points in dense areas and labels sparse regions as noise. DBSCAN is suitable for
clusters of arbitrary shapes.
Example : Identifying star clusters in space based on the density of celestial objects.
4. Model-Based Clustering
This method assumes data is generated from a mixture of probability distributions, such as
Gaussian distributions.
- Gaussian Mixture Models (GMM) : Each cluster is modelled as a Gaussian distribution.
GMM provides probabilities for data points belonging to a cluster, allowing overlapping
clusters.
5. Grid-Based Clustering
The data space is divided into a grid of finite cells, and clusters are formed based on cell density.
- Wave Cluster : Uses wavelet transformation to group data points in grids, useful for
spatial clustering.
6. Fuzzy Clustering
Data points can belong to multiple clusters with varying degrees of membership.
- Fuzzy C-Means (FCM) : Each data point has a membership value to clusters rather than
being assigned to one exclusively.
Example : Classifying images with overlapping features, such as blending two colours.
Here we are discussing mainly popular Clustering algorithms that are widely used in machine
learning:
1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms.
It classifies the dataset by dividing the samples into different clusters of equal variances. The
number of clusters must be specified in this algorithm. It is fast, with fewer computations
required, and has linear complexity O(n).
2. Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in a smooth
density of data points. It is an example of a centroid-based model, which works by updating
candidates for centroids to be the centre of the points within a given region.
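To make the contrast between clustering families concrete, here is a hedged sketch comparing a density-based method (DBSCAN) with a model-based one (a Gaussian mixture) on synthetic "two moons" data; the dataset and parameter values are illustrative only.

```python
# Sketch: density-based (DBSCAN) vs model-based (GMM) clustering on toy data.
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)      # handles arbitrary shapes
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # assumes Gaussian components
gmm_labels = gmm.predict(X)

print("DBSCAN labels found:", sorted(set(db_labels)))   # -1 marks noise points, if any
print("GMM cluster sizes:", [int((gmm_labels == k).sum()) for k in range(2)])
```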
Guidelines for Machine Learning Experiments
Before we start experimentation, we need to have a good idea about what it is we are studying,
how the data is to be collected, and how we are planning to analyze it.
o Aim of the Study
o Selection of the Response Variable
o Choice of Factors and Levels
o Choice of Experimental Design
o Performing the Experiment
o Statistical Analysis of the Data
o Conclusions and Recommendations
A. Aim of the Study:
We need to start by stating the problem clearly, defining what the objectives are. In machine
learning, there may be several possibilities. As we discussed before, we may be interested in
assessing the expected error (or some other response measure) of a learning algorithm on a
particular problem and check that, for example, the error is lower than a certain acceptable
level.
Given two learning algorithms and a particular problem as defined by a dataset, we may want
to determine which one has less generalization error. These can be two different algorithms, or
one can be a proposed improvement of the other, for example, by using a better feature
extractor.
In the general case, we may have more than two learning algorithms, and we may want to
choose the one with the least error, or order them in terms of error, for a given dataset. In an
even more general setting, instead of on a single dataset, we may want to compare two or more
algorithms on two or more datasets.
B. Selection of the Response Variable
We need to decide on what we should use as the quality measure. Most frequently, error is
used that is the misclassification error for classification and mean square error for regression.
We may also use some variant; for example, generalizing from 0/1 to an arbitrary loss, we may
use a risk measure. In information retrieval, we use measures such as precision and recall.
In a cost-sensitive setting, not only the
output but also system parameters, for example, its complexity, are taken into account.
C. Choice of Factors and Levels
What the factors are depends on the aim of the study. If we fix an algorithm and want to find the
best hyperparameters, then those are the factors. If we are comparing algorithms, the learning
algorithm is a factor. If we have different datasets, they also become a factor. The levels of a
factor should be carefully chosen so as not to miss a good configuration and avoid doing
unnecessary experimentation. It is always good to try to normalize factor levels.
For example, in optimizing k of k-nearest neighbour, one can try values such as 1, 3, 5, and so
on, but in optimizing the spread h of Parzen windows, we should not try absolute values such as
1.0, 2.0, and so on, because that depends on the scale of the input; it is better to find some
statistic that is an indicator of scale (for example, the average distance between an instance
and its nearest neighbour) and try h as different multiples of that statistic. Though previous
expertise is a plus in general, it is also important to investigate all factors and factor levels that
may be of importance and not be overly influenced by past experience.
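The scale statistic suggested above can be computed directly. The following is a minimal sketch assuming scikit-learn and random toy data:

```python
# Sketch: average nearest-neighbour distance as a scale indicator for choosing h.
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.default_rng(0).normal(size=(100, 3))   # toy data, purely illustrative

nn = NearestNeighbors(n_neighbors=2).fit(X)
dists, _ = nn.kneighbors(X)          # column 0 is each point's distance to itself (zero)
scale = dists[:, 1].mean()           # mean distance to the true nearest neighbour

candidate_h = [0.5 * scale, scale, 2.0 * scale]   # try h as multiples of the statistic
print("scale statistic:", round(scale, 3), "candidate h values:", candidate_h)
```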
D. Choice of Experimental Design
It is always better to do a factorial design unless we are sure that the factors do not interact,
because mostly they do. Replication number depends on the dataset size; it can be kept small
when the dataset is large; we will discuss this in the next section when we talk about resampling.
However, too few replicates generate few data, and this will make comparing distributions
difficult; in the particular case of parametric tests, the assumptions of Gaussianity may not be
tenable. Generally, given some dataset, we leave some part as the test set and use the rest for
training and validation, probably many times by resampling. How this division is done is
important.
In practice, using small datasets leads to responses with high variance, and the differences will
not be significant and results will not be conclusive. It is also important to avoid as much as
possible toy, synthetic data and use datasets that are collected from real-world under real-life
circumstances.
E. Performing the Experiment
Before running a large factorial experiment with many factors and levels, it is best if one does a
few trial runs for some random settings to check that all is as expected. In a large experiment, it
is always a good idea to save intermediate results (or seeds of the random number generator),
so that a part of the whole
experiment can be rerun when desired.
All the results should be reproducible. In running a large experiment with many factors and
factor levels, one should be aware of the possible negative effects of software aging. It is
important that an experimenter be unbiased during experimentation. In comparing one’s
favourite algorithm with a competitor, both should be investigated equally diligently.
In large-scale studies, it may even be envisaged that testers be different from developers. One
should avoid the temptation to write one’s own “library” and instead, as much as possible, use
code from reliable sources; such code would have been better tested and optimized.
As in any software development study, the advantages of good documentation cannot be
overstated, especially when working in groups. All the methods developed for high-quality
software engineering should also be used in machine learning experiments.
F. Statistical Analysis of the Data
This corresponds to analyzing data in a way so that whatever conclusion we get is not subjective
or due to chance. We cast the questions that we want to answer in a hypothesis testing
framework and check whether the sample supports the hypothesis.
For example, the question "Is A a more accurate algorithm than B?" becomes the hypothesis
"Can we say that the average error of learners trained by A is significantly lower than the
average error of learners trained by B?" As always, visual analysis is helpful, and we can use
histograms of error distributions, whisker-and-box plots, range plots, and so on.
G. Conclusions and Recommendations
Once all data is collected and analysed, we can draw objective conclusions. One frequently
encountered conclusion is the need for further experimentation. Most statistical, and hence
machine learning or data mining, studies are iterative. It is for this reason that we never start
with all the experimentation. It is suggested that no more than 25 percent of the available
resources should be invested in the first experiment (Montgomery 2005). The first runs are for
investigation only. That is also why it is a good idea not to start with high expectations, or
promises to one’s boss or thesis advisor. We should always remember that statistical testing
never tells us if the hypothesis is correct or false, but how much the sample seems to concur with
the hypothesis. There is always a risk that we do not have a conclusive result or that our
conclusions are wrong, especially if the data is small and noisy. When our expectations are not
met, it is most helpful to investigate why they are not. For example, in checking why our
favourite algorithm A has worked awfully badly on some cases, we can get a splendid idea for
some improved version of A.
All improvements are due to the deficiencies of the previous version; finding a deficiency is but
a helpful hint that there is an improvement we can make! But we should not go to the next step
of testing the improved version before we are sure that we have completely analysed the current
data and learned all we could learn from it. Ideas are cheap, and useless unless tested, which is
costly.
6a) Explain Model Selection in Machine learning. [L2][CO1] [6M]
Model selection in machine learning is a crucial process where the most appropriate algorithm or
model is chosen to solve a specific problem based on factors like data, objectives, and
constraints. Here’s a detailed explanation:
1. Understanding the Problem
- Clearly define the task: Is it classification, regression, clustering, or something else?
- Identify the type of data: Tabular, text, images, or time-series data may require specific
models.
- Assess requirements: Are interpretability, scalability, or computational efficiency critical?
2. Exploring the Dataset
- Size of Data : Some models, like neural networks, require large datasets to perform well,
while simpler algorithms like linear regression can handle smaller datasets effectively.
- Feature Types : Categorical or numerical features may need preprocessing for some
models. For example, decision trees handle categorical data naturally, while Support Vector
Machines require numerical data.
- Quality of Data : Assess missing values, outliers, and noise, which can affect model
performance.
3. Criteria for Model Selection
- Accuracy : Models should perform well on training and test data to ensure reliability.
- Speed : If computational time is a concern, simpler models like Logistic Regression may be
preferable over complex ones like Gradient Boosted Trees or Deep Learning models.
- Scalability : Consider if the model can handle larger datasets or additional features
efficiently.
- Interpretability : If understanding the decision-making process is important, models like
decision trees or linear regression are easier to interpret than black-box models like neural
networks.
4. Evaluating Different Models
Before finalizing, different models can be evaluated using approaches like:
- Cross-Validation : Splitting the data into training and validation sets and measuring
performance across multiple iterations ensures the chosen model generalizes well.
- Hyperparameter Tuning : Adjust model parameters to find the optimal configuration for
performance.
- Performance Metrics :
- For classification: Accuracy, F1-score, precision, recall, etc.
- For regression: Mean squared error (MSE), R-squared, etc.
5. Testing with Baseline Models
Start with simpler models as benchmarks (e.g., Linear Regression, Decision Trees) and
compare them against more complex models like ensemble methods (Random Forest, boosting) or
deep learning (a cross-validation comparison is sketched below).
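A minimal sketch of such a baseline-versus-complex comparison, assuming scikit-learn and its built-in breast-cancer dataset; the two models simply stand in for "baseline" and "complex":

```python
# Sketch: comparing a baseline and a more complex model with 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

models = [("baseline: logistic regression", LogisticRegression(max_iter=5000)),
          ("complex: random forest", RandomForestClassifier(random_state=0))]
for name, model in models:
    scores = cross_val_score(model, X, y, cv=5)   # accuracy on each fold
    print(name, "mean CV accuracy:", round(scores.mean(), 3))
```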
6. Trade-Off Analysis
Every model has strengths and weaknesses. Consider trade-offs based on:
- Performance vs Interpretability: Neural networks may perform better but are harder to
interpret.
- Complexity vs Practicality: Ensemble methods might be accurate but require more resources.
7. Tools for Model Selection
Utilize frameworks like:
- Scikit-Learn : A wide range of algorithms for easy experimentation.
- TensorFlow/Keras/PyTorch : For deep learning tasks.
- AutoML Tools : Automatically select models and tune hyperparameters.
8. Testing on Real-World Data
Once a model is selected and trained, test it on unseen data or simulated environments to
ensure its robustness and applicability.
Discriminate generalization requires the model to:
Recognize the differences between various classes or categories in the training data.
Apply this differentiation to unseen data by generalizing its learning in a way that maintains
accuracy without confusing classes.
For example, in image classification, a model trained to distinguish between cats and dogs
should consistently classify a new, unseen image of a dog correctly, even if it's slightly
different (such as a unique breed or unusual angle).
Factors Affecting Discriminate Generalization
Feature Engineering: Models must focus on meaningful features that help discriminate
between classes. For instance, texture and shape might be important for distinguishing cats
from dogs.
Model Complexity: Overly simple models might fail to capture subtle differences between
classes, while overly complex models might memorize training data, leading to poor
generalization.
Training Data Quality: Data imbalance (e.g., far more images of cats than dogs) can hinder
a model's ability to generalize across all classes equally.
Regularization: Techniques like L2 regularization or dropout prevent the model from
overfitting, allowing better generalization.
Evaluation Metrics: Metrics like precision and recall for each class ensure the model is not
favouring one category at the expense of another.
Examples of Discriminate Generalization in Machine Learning
Example 1: Image Classification
A model trained to classify images into "cars" and "trucks" must learn features that
distinguish between them, such as:
Trucks often have larger, boxy bodies with cargo space. To test generalization, the model
might be evaluated on new images of vehicles with unusual designs (e.g., a sporty truck).
Example 2: Sentiment Analysis
A model trained on labelled reviews learns discriminating words, such as negative words:
"bad," "ugly," "sad." The model should generalize well by accurately
predicting sentiment in reviews containing rare, complex sentences.
Strategies for Better Discriminate Generalization
Robust Features: Extract features that capture the true essence of the difference between
categories (e.g., edges in image classification, or frequency of certain keywords in text
analysis).
Advanced Architectures: Employ models that are naturally good at discrimination, such as
convolutional neural networks (CNNs) for images or transformers for text.
Monitoring Overfitting: Regularly compare training and validation performance to ensure the
model is generalizing correctly.
Real-World Applications
1. Healthcare Diagnostics: Discriminate between benign and malignant tumours from medical
images.
2. Spam Detection: Separate spam emails from legitimate emails using NLP.
7a) Compare Supervised learning and Unsupervised learning. [L6][CO1] [6M]
Supervised: Supervised learning algorithms are trained using labelled data.
Unsupervised: Unsupervised learning algorithms are trained using unlabelled data.
Supervised: The model takes direct feedback to check whether it is predicting the correct output or not.
Unsupervised: The model does not take any feedback.
Supervised: The model predicts the output.
Unsupervised: The model finds the hidden patterns in data.
Supervised: Input data is provided to the model along with the output.
Unsupervised: Only input data is provided to the model.
Supervised: The goal is to train the model so that it can predict the output when given new data.
Unsupervised: The goal is to find the hidden patterns and useful insights from the unknown dataset.
Supervised: Supervised learning needs supervision to train the model.
Unsupervised: Unsupervised learning does not need any supervision to train the model.
Supervised: It can be used for cases where we know the inputs as well as the corresponding outputs.
Unsupervised: It can be used for cases where we have only input data and no corresponding output data.
The key components of reinforcement learning are:
1. Agent: The learner or decision-maker that interacts with the environment.
2. Environment: Everything the agent interacts with and receives feedback from.
3. State (S): The current situation of the agent in the environment.
4. Action (A): The possible moves or choices the agent can make.
5.Reward (R): Feedback the agent receives after performing an action; can be positive or
negative.
Interaction: The agent observes the state, takes an action, and receives a reward from the
environment.
Learning: Based on the action and reward, the agent updates its policy using algorithms like
Q-Learning or Deep Q-Learning.
Optimization: The agent continues this cycle to improve its decisions.
Example: An agent controlling a vehicle at a traffic signal.
State (S): The current state could be "red light," "green light," or "yellow light."
Action (A): The agent can choose to stop or to move.
Reward (R):
o Positive reward (+1) for stopping at a red light or moving at a green light.
o Negative reward (-1) for running a red light or stopping unnecessarily at a green light.
Example: The problem is as follows: We have an agent and a reward, with many hurdles in
between. The agent is supposed to find the best possible path to reach the reward. The following
example explains the problem more clearly.
The above image shows a robot, a diamond, and fire. The goal of the robot is to get the reward, that is,
the diamond, while avoiding the hurdles, that is, the fire. The robot learns by trying all the possible paths and
then choosing the path which gives it the reward with the fewest hurdles. Each right step gives the
robot a reward, and each wrong step subtracts from the robot's reward. The total reward is
calculated when it reaches the final reward, that is, the diamond.
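A toy version of this robot problem can be solved with tabular Q-learning. The sketch below is an invented illustration, not the exact problem above: a one-dimensional corridor where the robot starts in the middle, fire sits at one end, and the diamond at the other; rewards and hyperparameters are arbitrary.

```python
# Sketch: tabular Q-learning on a tiny corridor world (states 0..4).
# State 0 holds fire (-10, episode ends); state 4 holds the diamond (+10, episode ends).
import numpy as np

n_states, n_actions = 5, 2               # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

def step(state, action):
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    if nxt == 4: return nxt, 10.0, True   # reached the diamond
    if nxt == 0: return nxt, -10.0, True  # walked into the fire
    return nxt, -1.0, False               # small cost per step

for _ in range(500):                      # episodes of trial and error
    s, done = 2, False                    # robot starts in the middle
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s2, r, done = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() * (not done) - Q[s, a])
        s = s2

print("greedy action per state (1 = right):", Q.argmax(axis=1))
```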
ASSOCIATION RULES:
Association rule learning is a kind of unsupervised learning technique that tests for the dependence of one
data element on another data element and maps them accordingly so that the process can be more cost-effective. It
tries to discover interesting relations or associations between the variables of the dataset. It
relies on various rules to find interesting relations between variables in the database.
Association rule learning is an important approach in machine learning, and it is employed in
Market Basket Analysis, web usage mining, continuous production, etc. In market basket analysis, it is
an approach used by several big retailers to find the relations between items.
The following are the main types of association rule learning algorithms:
Apriori Algorithm − This algorithm uses frequent itemsets to generate association rules. It is designed
to work on databases that contain transactions. It uses a breadth-first search and a hash tree
to count itemsets efficiently.
It is generally used for market basket analysis and helps to learn which products can be purchased
together. It can also be used in the healthcare domain to discover drug reactions in patients.
Eclat Algorithm − Eclat stands for Equivalence Class Transformation. This algorithm
uses a depth-first search to discover frequent itemsets in a transaction database. It executes
faster than the Apriori algorithm.
F-P Growth Algorithm − F-P growth stands for Frequent Pattern growth. It is an enhanced
version of the Apriori algorithm. It represents the database in the form of a tree structure known
as a frequent pattern tree (FP-tree), whose aim is to extract the most frequent patterns. There are
various applications of association rules, as follows:
• Items purchased on a credit card, such as rental cars and hotel rooms, give insight into the
next product that customers are likely to buy.
• Optional services purchased by telecom users (call waiting, call forwarding, DSL, speed call,
etc.) help decide how to bundle these functions to maximize revenue.
• Banking services used by retail customers (money market accounts, CDs, investment services, car loans,
etc.) identify users likely to need other services.
• An unusual group of insurance claims can be an indication of fraud and can trigger further investigation.
• Medical patient histories can suggest likely complications based on a definite set
of treatments.
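As a hedged sketch of these ideas in practice, the following uses the mlxtend library (assumed to be installed) on a tiny, made-up one-hot transaction table:

```python
# Market-basket sketch with mlxtend's Apriori (data is invented for illustration).
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row is one basket; True means the item was purchased.
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 1, 1], [1, 1, 0]],
    columns=["bread", "butter", "jam"],
).astype(bool)

frequent = apriori(baskets, min_support=0.4, use_colnames=True)   # frequent itemsets
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```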
9) Analyze the classification and regression techniques in supervised learning. [L4][CO1] [12M]
Supervised Learning
Supervised learning is a branch of machine learning where models are trained using labelled data. It means
the input data comes with corresponding output labels. The goal is to learn the mapping between inputs and
outputs to make accurate predictions on new, unseen data.
The two primary types of supervised learning problems are classification (for discrete outputs) and
regression (for continuous outputs). Let’s dive deeper into each.
1. Classification Techniques
Classification involves predicting a categorical variable or class label from input data. The model assigns
the input to one of the predefined categories.
Characteristics of Classification
- Output : Discrete classes or categories (e.g., "Yes" or "No," "Spam" or "Not Spam").
- Goal : Minimize misclassification errors and improve accuracy.
- Example Problems : Email spam detection, disease diagnosis (e.g., cancer vs non-cancer), image
recognition.
Classification Process
1. Data Preparation :
- Ensure data is labelled (each input has a corresponding output class).
- Preprocess data (handle missing values, normalize features).
2. Model Selection :
- Simple Models : Logistic Regression for binary classification tasks.
- Complex Models : Neural Networks for tasks like image and text classification.
3. Training the Model :
- Apply a classification algorithm to learn decision boundaries.
- Use labelled data to adjust parameters.
4. Evaluation :
- Test the model on unseen data using metrics like accuracy, precision, recall, and F1-score.
1. Logistic Regression : Predicts probabilities of classes using the sigmoid function. Suitable for binary
classification tasks.
2. Support Vector Machines (SVM) : Finds hyperplanes that separate classes with maximum margin.
- Example : Classifying species of flowers based on petal dimensions.
3. Decision Trees : Uses a tree-like structure to split data based on feature values.
- Example : Classifying customers based on purchasing behaviour.
4. Random Forests : Combines multiple decision trees to improve accuracy and reduce overfitting.
- Example : Fraud detection in credit card transactions.
5. Neural Networks : Suitable for high-dimensional data like images, videos, and text.
- Example : Identifying handwritten digits.
Logistic Regression in detail: Logistic regression in Machine Learning is used to find the probability of
event = Success and event = Failure. We should use logistic regression when the dependent variable is
binary (0/1, True/False, Yes/No) in nature. Here the value of Y ranges from 0 to 1, and it can be
represented by the following equation:
odds = p / (1 - p) = probability of event occurrence / probability of event non-occurrence
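A short worked sketch of this relationship, where the score z is a hypothetical value of the model's linear combination of inputs:

```python
# Worked example: the sigmoid turns a linear score into p; odds = p / (1 - p).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 1.2                      # hypothetical linear score from the input features
p = sigmoid(z)               # probability that the event occurs (Y between 0 and 1)
odds = p / (1 - p)           # the odds from the equation above

print(round(p, 3), round(odds, 3), round(math.log(odds), 3))  # log(odds) recovers z
```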
Evaluation Metrics :
- Accuracy : Percentage of correct predictions.
- Precision : Fraction of relevant instances among the retrieved ones.
- Recall : Fraction of relevant instances retrieved out of all relevant ones.
- F1-Score : Combines precision and recall into a single metric.
- Confusion Matrix : Provides detailed insight into classification errors for each class.
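These metrics can be computed as in the following sketch, assuming scikit-learn and invented labels:

```python
# Sketch: the classification metrics above, on hypothetical true/predicted labels.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("accuracy:", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("F1:", round(f1_score(y_true, y_pred), 3))
print("confusion matrix:\n", confusion_matrix(y_true, y_pred))
```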
2. Regression Techniques
Regression is used for predicting a continuous numeric output. The goal is to find the relationship between
dependent and independent variables.
Characteristics of Regression
- Output : Numeric or continuous values (e.g., 0.75, 1000).
- Goal : Minimize the error between predicted and actual values.
- Example Problems : Forecasting stock prices, predicting house prices, estimating temperature.
Regression Process
1. Data Preparation :
- Handle missing values and outliers.
- Scale features for certain algorithms like gradient descent.
2. Model Selection :
- Simple Models : Linear Regression for straight-line relationships.
- Complex Models : Polynomial Regression for non-linear relationships.
3. Training the Model :
- Fit the chosen regression algorithm to the training data to learn the relationship between
features and the target.
4. Evaluation :
- Test the model on unseen data using metrics like mean squared error (MSE) and R² score.
1. Linear Regression : Fits a straight line to model the relationship between inputs and a
continuous output (e.g., predicting house prices).
2. Polynomial Regression : Fits a curve to capture non-linear relationships (e.g., modelling
growth trends over time).
3. Support Vector Regression (SVR) : Uses principles of SVM to predict continuous outputs.
- Example : Predicting temperature variations.
4. Decision Trees for Regression : Splits the data into regions with similar output values.
- Example : Predicting sales figures based on marketing spend.
5. Neural Networks for Regression : Suitable for modelling complex, non-linear dependencies.
- Example : Predicting energy usage from historical data.
Evaluation Metrics :
- Mean Absolute Error (MAE) : Average absolute differences between predicted and actual values.
- Mean Squared Error (MSE) : Average squared differences between predicted and actual values.
- Root Mean Squared Error (RMSE) : Square root of MSE for better interpretation.
- R² Score : Measures the proportion of variance in the output explained by the model.
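A parallel sketch for the regression metrics, again on invented values:

```python
# Sketch: MAE, MSE, RMSE, and R² on hypothetical predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 6.5])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = float(np.sqrt(mse))               # square root of MSE, easier to interpret
r2 = r2_score(y_true, y_pred)
print(round(mae, 3), round(mse, 3), round(rmse, 3), round(r2, 3))
```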
Conclusion
Both classification and regression are fundamental supervised learning techniques, addressing different
types of prediction problems. Classification is suited for categorical outputs, while regression handles
continuous numeric outputs. Selecting the right algorithm depends on the nature of the data, the problem,
and the desired accuracy.
Machine learning has found numerous applications across various industries, revolutionizing processes and
enabling the development of innovative solutions. Here are some real-world applications of machine
learning:
Healthcare: Machine learning is used for medical diagnosis, patient monitoring, and treatment
planning. It can analyze medical records, images, and genomic data to assist in early disease detection,
personalized medicine, and predicting patient outcomes. Machine learning models can also help identify
patterns and anomalies in large healthcare datasets for improved decision-making.
Finance: Machine learning is widely applied in financial institutions for fraud detection, credit
scoring, algorithmic trading, and risk assessment. It can analyze vast amounts of financial data to identify
fraudulent transactions, predict market trends, and optimize investment strategies. Machine learning
models are also used for automated trading based on historical and real-time market data.
Retail and E-commerce: Machine learning is used for personalized recommendations, demand
forecasting, inventory management, and pricing optimization. By analyzing customer behavior, browsing
history, and purchase patterns, machine learning models can recommend relevant products to users,
optimize pricing strategies, and predict customer preferences to improve sales and customer satisfaction.
Transportation and Logistics: Machine learning is utilized for route optimization, demand
forecasting, and predictive maintenance in transportation and logistics. It can analyze historical data,
real-time traffic information, and weather conditions to optimize routes for delivery vehicles, forecast
demand for transportation services, and detect anomalies in equipment performance to prevent
breakdowns.
Manufacturing: Machine learning is used in manufacturing industries for quality control,
predictive maintenance, and process optimization. It can analyze sensor data from production lines to
detect anomalies and ensure product quality. Machine learning models can also predict equipment failures,
enabling proactive maintenance to minimize downtime and maximize productivity.
Natural Language Processing (NLP): Machine learning techniques are applied in NLP
applications such as language translation, sentiment analysis, chatbots, and voice assistants. NLP models
can understand and generate human language, enabling accurate translation between languages, sentiment
analysis of customer feedback, and interactive conversational experiences.
Autonomous Vehicles: Machine learning plays a crucial role in autonomous vehicles by enabling object
detection and recognition, scene understanding, and decision-making. Machine learning models process
sensor data from cameras, LiDAR, and radar to detect and classify objects on the road, navigate complex
environments, and make real-time decisions to ensure safe driving.
Energy and Utilities: Machine learning is used for energy load forecasting, anomaly detection in power
grids, and optimizing energy consumption. It can analyze historical energy consumption data, weather
conditions, and other factors to predict future energy demand and optimize energy generation and
distribution.
These are just a few examples of the vast range of real-world applications of machine learning. The
versatility and potential of machine learning continue to expand, with ongoing research and development
pushing the boundaries of what is possible in various industries and domains.