S11BVAC14-Machine Learning Using Python-CSE Course Material Unit1

Uploaded by

subbaiyal6

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Machine Learning Using Python -S11BVAC14

Course Material
UNIT 1: Introduction
About Machine Learning - Applications of ML - Uses of ML - Machine learning methods -
Machine learning algorithms: Regression, Classification, Clustering, Association - A brief
introduction to Python libraries.

Introduction

What is Machine Learning

Humans learn from experience, whereas computers and other machines traditionally do only
what we instruct them to do. Can a machine also learn from experience or past data, as a
human does? This is the role of machine learning.

Traditional Programming

Traditional programming is a manual process: a person (the programmer) formulates the rules
and codes them into the program by hand.
In machine learning, on the other hand, the algorithm automatically formulates the rules from
the data.

Machine Learning Programming

Unlike traditional programming, machine learning is an automated process. It can increase the
value of your embedded analytics in many areas, including data prep, natural language
interfaces, automatic outlier detection, recommendations, and causality and significance
detection. All of these features help speed user insights and reduce decision bias.

For example, if you feed in customer demographics and transactions as input data and use
historical customer churn rates as your output data, the algorithm will formulate a program that
can predict if a customer will churn or not. That program is called a predictive model.

You can use this model to predict business outcomes in any situation where you have input and
historical output data:

1. Identify the business question you would like to ask.

2. Identify the historical input.

3. Identify the historically observed output (i.e., data samples for when the condition is
true and for when it’s false).
For instance, if you want to predict who will pay the bills late, identify the input (customer
demographics, bills) and the output (pay late or not), and let the machine learning use this data
to create your model.
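
The late-payment example can be sketched in a few lines of Python. This is a minimal
illustration rather than a production model: the bill amounts, the labels, the hand-picked
cut-off of 500, and the helper names handwritten_rule and learn_threshold are all hypothetical.
It contrasts a rule written by a programmer with a rule (a single threshold) derived from
historical input and output data.

```python
# Traditional programming: a person writes the rule by hand.
def handwritten_rule(amount_due):
    return amount_due > 500  # cut-off chosen by the programmer (hypothetical)


# Machine learning: the cut-off is derived from labelled historical examples.
def learn_threshold(amounts, labels):
    """Return the cut-off that misclassifies the fewest training examples."""
    best_threshold, best_errors = None, len(amounts) + 1
    for candidate in sorted(set(amounts)):
        # Count examples where "amount > candidate" disagrees with the label.
        errors = sum((a > candidate) != y for a, y in zip(amounts, labels))
        if errors < best_errors:
            best_threshold, best_errors = candidate, errors
    return best_threshold


# Hypothetical historical data: bill amount (input) and paid late or not (output).
amounts = [120, 300, 450, 700, 900, 1500]
paid_late = [False, False, False, True, True, True]

threshold = learn_threshold(amounts, paid_late)  # the "predictive model"
will_pay_late = 800 > threshold                  # prediction for a new bill
```

With this data the learned cut-off differs from the hand-written one, because it comes from
the examples rather than from the programmer's guess.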

In summary, traditional programming is rule-based and deterministic, relying on human-crafted
logic, whereas machine learning is data-driven and probabilistic, relying on patterns learned
from data.
ML algorithms help to solve different kinds of business problems, such as regression,
classification, forecasting, clustering, and association.

Based on how they learn, machine learning methods are divided into four main types:

1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
Machine learning algorithms are trained on a training dataset to create a model. When new input
data is introduced to the trained model, it uses what it has learned to make a prediction.

The prediction is then checked for accuracy. Based on its accuracy, the model is either
deployed or trained repeatedly with an augmented training dataset until the desired accuracy
is achieved.
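
The train, check, and retrain cycle can be sketched as follows. This is a simplified
illustration under stated assumptions: the data points are made up, the "model" is a plain
1-nearest-neighbour rule, and the names predict_1nn and accuracy are hypothetical helpers. In
practice the accuracy check would use a separate validation set rather than the final test set.

```python
def predict_1nn(train, x):
    """Predict the label of x from its nearest training point (1-NN)."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test examples whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


# Hypothetical (feature, label) pairs; the true label is 1 when feature >= 3.
train = [(0, 0), (10, 1)]
extra = [(3, 1), (2.5, 0)]            # further labelled examples for augmentation
test = [(1, 0), (2, 0), (4, 1), (8, 1)]
target = 1.0                           # desired accuracy

first_score = accuracy(train, test)
while accuracy(train, test) < target and extra:
    train.append(extra.pop(0))         # augment the training set and "retrain"
final_score = accuracy(train, test)
```

The loop stops as soon as the desired accuracy is reached or no further labelled data is
available.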

Benefits of Machine Learning

• Enhanced Efficiency and Automation: ML automates repetitive tasks, freeing up human
resources for more complex work. It also streamlines processes, leading to increased
efficiency and productivity.
• Data-Driven Insights: ML can analyze vast amounts of data to identify patterns and
trends that humans might miss. This allows for better decision-making based on real-
world data.
• Improved Personalization: ML personalizes user experiences across various
platforms. From recommendation systems to targeted advertising, ML tailors content
and services to individual preferences.
• Advanced Automation and Robotics: ML empowers robots and machines to perform
complex tasks with greater accuracy and adaptability. This is revolutionizing fields like
manufacturing and logistics.

Challenges of Machine Learning

• Data Bias and Fairness: ML algorithms are only as good as the data they are trained
on. Biased data can lead to discriminatory outcomes, requiring careful data selection
and monitoring of algorithms.
• Security and Privacy Concerns: As ML relies heavily on data, security breaches can
expose sensitive information. Additionally, the use of personal data raises privacy
concerns that need to be addressed.
• Interpretability and Explainability: Complex ML models can be difficult to
understand, making it challenging to explain their decision-making processes. This lack
of transparency can raise questions about accountability and trust.
• Job Displacement and Automation: Automation through ML can lead to job
displacement in certain sectors. Addressing the need for retraining and reskilling the
workforce is crucial.

Applications of Machine Learning

1. Image Recognition:

Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, and so on in digital images. A popular use case of image
recognition and face detection is automatic friend tagging suggestion:

Facebook provides a feature for automatic friend tagging suggestions. Whenever we upload a
photo with our Facebook friends, we automatically get a tagging suggestion with their names.
The technology behind this is machine learning's face detection and recognition algorithm. It
is based on the Facebook project named "DeepFace," which is responsible for face recognition
and person identification in the picture.

2. Speech Recognition

While using Google, we get the option to "Search by voice," which comes under speech
recognition, a popular application of machine learning. Speech recognition is the process of
converting voice instructions into text, and it is also known as "speech to text" or "computer
speech recognition." At present, machine learning algorithms are widely used in speech
recognition applications. Google Assistant, Siri, Cortana, and Alexa use speech recognition
technology to follow voice instructions.

3. Traffic prediction:

If we want to visit a new place, we take the help of Google Maps, which shows us the shortest
route and predicts the traffic conditions.

It predicts whether traffic is clear, slow-moving, or heavily congested in two ways:

o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time of day

Everyone who uses Google Maps helps to make the app better. It takes information from the
user and sends it back to its database to improve performance.

4. Product recommendations:

Machine learning is widely used by e-commerce and entertainment companies such as Amazon
and Netflix for product recommendations. Whenever we search for a product on Amazon, we
start getting advertisements for the same product while surfing the internet in the same
browser, and this is because of machine learning. Google understands the user's interests using
various machine learning algorithms and suggests products accordingly. Similarly, when we use
Netflix, we find recommendations for series, movies, and so on, and this is also done with the
help of machine learning.

5. Self-driving cars:

One of the most exciting applications of machine learning is self-driving cars, in which
machine learning plays a significant role. Tesla, for example, is working on self-driving
cars and trains its models on driving data to detect people and objects while driving.

6. Email Spam and Malware Filtering:

Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in the inbox marked with the important symbol, spam goes to the spam
box, and the technology behind this is machine learning. Below are some spam filters used by
Gmail:

o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters

Machine learning algorithms such as the Multi-Layer Perceptron, Decision Tree, and Naïve
Bayes classifier are used for email spam filtering and malware detection.

7. Virtual Personal Assistant:

We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
As the name suggests, they help us find information using voice instructions. These assistants
can help us in various ways through voice instructions alone, such as playing music, calling
someone, opening an email, or scheduling an appointment. Machine learning algorithms are an
important part of these virtual assistants. They record our voice instructions, send them to
a server in the cloud, decode them using ML algorithms, and act accordingly.

8. Online Fraud Detection:

Machine learning makes our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, fraud can take place in various
ways, such as fake accounts, fake IDs, and money stolen in the middle of a transaction. To
detect this, a feed-forward neural network checks whether a transaction is genuine or
fraudulent.

For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Each genuine transaction follows a specific pattern, which
changes for a fraudulent transaction; the network detects the change and so makes our online
transactions more secure.

9. Medical Diagnosis:

In medical science, machine learning is used for disease diagnosis. With this, medical
technology is growing very fast and is able to build 3D models that can predict the exact
position of lesions in the brain. This helps in finding brain tumors and other brain-related
diseases easily.

10. Automatic Language Translation:

Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
because machine learning helps us by converting the text into a language we know. Google's
GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine
translation system that translates text into our familiar language, and this is called
automatic translation.

The technology behind automatic translation is a sequence-to-sequence learning algorithm. It
can be combined with image recognition to translate text in an image from one language to
another.

Types of Machine Learning


There are several types of machine learning, each with special characteristics and applications.
Some of the main types of machine learning algorithms are as follows:
1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning
1. Supervised Machine Learning
Supervised learning is when a model is trained on a “labelled dataset”. Labelled datasets
have both input and output parameters. In supervised learning, algorithms learn to map inputs
to the correct outputs, and both the training and validation datasets are labelled.
Let’s understand it with the help of an example.
Example: Consider a scenario where you have to build an image classifier to differentiate
between cats and dogs. If you feed labelled images of dogs and cats to the algorithm, the
machine will learn to distinguish a dog from a cat from these labelled images. When we input
new dog or cat images that it has never seen before, it will use the learned model to predict
whether each is a dog or a cat. This is how supervised learning works, and this particular
task is image classification.
There are two main categories of supervised learning that are mentioned below:
• Classification
• Regression

Classification
Classification deals with predicting categorical target variables, which represent
discrete classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map the
input features to one of the predefined classes.
Here are some classification algorithms:
• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree
• K-Nearest Neighbors (KNN)
• Naive Bayes
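
As an illustration of classification, the spam / not-spam task can be sketched with a small
Naive Bayes classifier written in plain Python. This is a teaching sketch under stated
assumptions: the four training messages are made up, the label "ham" stands for not spam,
add-one (Laplace) smoothing is assumed, and the function names train_naive_bayes and classify
are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(messages):
    """messages: list of (text, label). Returns word counts, class counts, vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocabulary = set()
    for text, label in messages:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log-probability for the given text."""
    word_counts, class_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior plus log likelihood of each word, with add-one smoothing.
        score = math.log(class_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labelled training data.
training = [
    ("win free prize now", "spam"),
    ("free money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train_naive_bayes(training)
label_a = classify("free prize offer", model)
label_b = classify("monday project meeting", model)
```

The classifier maps each new message to one of the predefined classes, which is exactly the
input-to-class mapping described above.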
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn to
map the input features to a continuous numerical value.
Here are some regression algorithms:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Decision tree
• Random Forest
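
The house-price example can be sketched with simple linear regression, the first algorithm
in the list. This is a minimal illustration using the closed-form least-squares solution for
one input feature; the sizes and prices are made-up numbers, and fit_line is a hypothetical
helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size in square metres and price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept   # continuous prediction for a 90 m² house
```

Unlike a classifier, the model returns a continuous numerical value rather than a class label.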
Advantages of Supervised Machine Learning
• Supervised learning models can achieve high accuracy because they are trained
on labelled data.
• The decision-making process of supervised learning models is often
interpretable.
• Pre-trained models can often be reused, which saves time and resources compared
with developing new models from scratch.
Disadvantages of Supervised Machine Learning
• It may struggle with unseen or unexpected patterns that are not present in the
training data.
• It can be time-consuming and costly because it relies on labelled data.
• It may generalize poorly to new data.

Applications of Supervised Learning


Supervised learning is used in a wide variety of applications, including:
• Image classification: Identify objects, faces, and other features in images.
• Natural language processing: Extract information from text, such as sentiment,
entities, and relationships.
• Speech recognition: Convert spoken language into text.
• Recommendation systems: Make personalized recommendations to users.
• Predictive analytics: Predict outcomes, such as sales, customer churn, and stock
prices.
• Medical diagnosis: Detect diseases and other medical conditions.
• Fraud detection: Identify fraudulent transactions.
• Autonomous vehicles: Recognize and respond to objects in the environment.
• Email spam detection: Classify emails as spam or not spam.
• Quality control in manufacturing: Inspect products for defects.
• Credit scoring: Assess the risk of a borrower defaulting on a loan.
• Gaming: Recognize characters, analyze player behavior, and create NPCs.
• Customer support: Automate customer support tasks.
• Weather forecasting: Make predictions for temperature, precipitation, and other
meteorological parameters.
• Sports analytics: Analyze player performance, make game predictions, and
optimize strategies.
Unsupervised Machine Learning
Unsupervised learning is a type of machine learning technique in which an algorithm
discovers patterns and relationships using unlabeled data. Unlike supervised learning,
unsupervised learning doesn’t involve providing the algorithm with labeled target outputs.
The primary goal of unsupervised learning is often to discover hidden patterns, similarities,
or clusters within the data, which can then be used for purposes such as data exploration,
visualization, and dimensionality reduction.
Let’s understand it with the help of an example.
Example: Consider a dataset that contains information about the purchases customers have
made from a shop. Through clustering, the algorithm can group customers with similar
purchasing behavior, which reveals potential customer segments without predefined labels.
This type of information can help businesses target customers as well as identify outliers.
There are two main categories of unsupervised learning that are mentioned below:
• Clustering
• Association
Clustering
Clustering is the process of grouping data points into clusters based on their similarity. This
technique is useful for identifying patterns and relationships in data without the need for labeled
examples.
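Here are some clustering algorithms:
• K-Means Clustering algorithm
• Mean-shift algorithm
• DBSCAN Algorithm
The following unsupervised techniques are often listed alongside them, although they perform
dimensionality reduction or signal separation rather than clustering:
• Principal Component Analysis
• Independent Component Analysis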
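
Grouping by similarity can be sketched with a one-dimensional K-Means loop in plain Python.
This is a simplified illustration: the spending figures are made up, the starting centroids
are passed in explicitly so the result is repeatable (real implementations usually choose them
at random), and k_means is a hypothetical helper name.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and updating centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical monthly spend per customer; no labels are provided.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means(spend, centroids=[10, 12])
```

The algorithm separates low spenders from high spenders without ever being told which
customer belongs to which group.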
Association
Association rule learning is a technique for discovering relationships between items in a
dataset. It identifies rules stating that the presence of one item implies the presence of
another item with a specific probability.
Here are some association rule learning algorithms:
• Apriori Algorithm
• Eclat
• FP-growth Algorithm
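
The two quantities that these algorithms build on, support and confidence, can be computed
directly. The sketch below is not Apriori, Eclat, or FP-growth itself; it only evaluates one
candidate rule on a made-up set of shopping baskets, and the helper names support and
confidence are illustrative.

```python
# Hypothetical shopping baskets (transactions).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)


bread_support = support({"bread"})
rule_confidence = confidence({"bread"}, {"milk"})   # rule: bread -> milk
```

Here the confidence of the rule "bread implies milk" is the probability mentioned above: the
share of bread-containing baskets that also contain milk.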
Advantages of Unsupervised Machine Learning
• It helps to discover hidden patterns and various relationships between the data.
• It is used for tasks such as customer segmentation, anomaly detection, and data
exploration.
• It does not require labeled data and reduces the effort of data labeling.
• It has techniques such as autoencoders and dimensionality reduction that can be
used to extract meaningful features from raw data.
Disadvantages of Unsupervised Machine Learning
• Without labels, it may be difficult to evaluate the quality of the model’s output.
• Clusters may not be clearly interpretable and may not have meaningful
interpretations.
Applications of Unsupervised Learning
Here are some common applications of unsupervised learning:
• Clustering: Group similar data points into clusters.
• Anomaly detection: Identify outliers or anomalies in data.
• Dimensionality reduction: Reduce the dimensionality of data while preserving
its essential information.
• Recommendation systems: Suggest products, movies, or content to users based
on their historical behavior or preferences.
• Topic modeling: Discover latent topics within a collection of documents.
• Density estimation: Estimate the probability density function of data.
• Image and video compression: Reduce the amount of storage required for
multimedia content.
• Data preprocessing: Help with data preprocessing tasks such as data cleaning,
imputation of missing values, and data scaling.
• Market basket analysis: Discover associations between products.
• Genomic data analysis: Identify patterns or group genes with similar expression
profiles.
• Image segmentation: Segment images into meaningful regions.
• Community detection in social networks: Identify communities or groups of
individuals with similar interests or connections.
• Customer behavior analysis: Uncover patterns and insights for better marketing
and product recommendations.
• Content recommendation: Classify and tag content to make it easier to
recommend similar items to users.
• Exploratory data analysis (EDA): Explore data and gain insights before defining
specific tasks.
Semi-Supervised Learning
Semi-supervised learning is a machine learning approach that works between supervised and
unsupervised learning, so it uses both labelled and unlabelled data. It is particularly useful
when obtaining labeled data is costly, time-consuming, or resource-intensive, for example
when labeling requires specialist skills.
We use these techniques when a small part of the data is labeled and the remaining large
portion is unlabeled. We can use unsupervised techniques to predict labels and then feed
these labels to supervised techniques. This technique is mostly applicable to image datasets,
where usually not all images are labeled.

Let’s understand it with the help of an example.


Example: Consider that we are building a language translation model. Having labeled
translations for every sentence pair can be resource-intensive. Semi-supervised learning allows
the model to learn from both labeled and unlabeled sentence pairs, making it more accurate.
This technique has led to significant improvements in the quality of machine translation
services.
Types of Semi-Supervised Learning Methods
There are a number of different semi-supervised learning methods each with its own
characteristics. Some of the most common ones include:
• Graph-based semi-supervised learning: This approach uses a graph to represent
the relationships between the data points. The graph is then used to propagate labels
from the labeled data points to the unlabeled data points.
• Label propagation: This approach iteratively propagates labels from the labeled
data points to the unlabeled data points, based on the similarities between the data
points.
• Co-training: This approach trains two different machine learning models on
different views (feature subsets) of the labeled data. Each model then labels
unlabeled examples for the other, and both are retrained on the enlarged set.
• Self-training: This approach trains a machine learning model on the labeled data
and then uses the model to predict labels for the unlabeled data. The model is then
retrained on the labeled data and the predicted labels for the unlabeled data.
• Generative adversarial networks (GANs): GANs are a type of deep learning
algorithm that can be used to generate synthetic data. GANs can be used to generate
unlabeled data for semi-supervised learning by training two neural networks, a
generator and a discriminator.
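
Of the methods above, self-training is the simplest to sketch. The example below is a minimal
illustration, not a reference implementation: the data is made up, the "model" is a
nearest-centroid classifier on a single feature, every pseudo-label is accepted without a
confidence check, and the names fit_centroids and predict are hypothetical.

```python
def fit_centroids(labeled):
    """Train the model: the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))


# A little labeled data and a larger pool of unlabeled data (both hypothetical).
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [2.0, 3.0, 7.0, 8.0]

# Step 1: train on the labeled data only.
model = fit_centroids(labeled)
# Step 2: use the model to predict labels for the unlabeled data.
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
# Step 3: retrain on the labeled data plus the predicted labels.
model = fit_centroids(labeled + pseudo_labeled)
```

Practical versions usually keep only the most confident pseudo-labels and repeat steps 2 and 3
several times.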
Advantages of Semi-Supervised Machine Learning
• It can lead to better generalization than supervised learning alone, as it uses
both labeled and unlabeled data.
• It can be applied to a wide range of data.
Disadvantages of Semi-Supervised Machine Learning
• Semi-supervised methods can be more complex to implement compared to other
approaches.
• It still requires some labeled data that might not always be available or easy to
obtain.
• Noisy or unrepresentative unlabeled data can degrade model performance.
Applications of Semi-Supervised Learning
Here are some common applications of semi-supervised learning:
• Image Classification and Object Recognition: Improve the accuracy of models
by combining a small set of labeled images with a larger set of unlabeled images.
• Natural Language Processing (NLP): Enhance the performance of language
models and classifiers by combining a small set of labeled text data with a vast
amount of unlabeled text.
• Speech Recognition: Improve the accuracy of speech recognition by leveraging a
limited amount of transcribed speech data and a more extensive set of unlabeled
audio.
• Recommendation Systems: Improve the accuracy of personalized
recommendations by supplementing a sparse set of user-item interactions (labeled
data) with a wealth of unlabeled user behavior data.
• Healthcare and Medical Imaging: Enhance medical image analysis by utilizing
a small set of labeled medical images alongside a larger set of unlabeled images.
Reinforcement Machine Learning
Reinforcement learning is a learning method in which an agent interacts with the
environment by producing actions and discovering errors. Trial-and-error search and delayed
reward are the most relevant characteristics of reinforcement learning. In this technique, the
model keeps improving its performance by using reward feedback to learn the behavior or
pattern. These algorithms are trained for a particular problem, e.g. Google's self-driving car
or AlphaGo, where a bot competes with humans and even with itself to become a better and
better performer in the game of Go. Each interaction adds to the agent's experience, which
serves as its training data, so the more it interacts, the better trained it becomes.
Here are some of most common reinforcement learning algorithms:
• Q-learning: Q-learning is a model-free RL algorithm that learns a Q-function,
which maps state-action pairs to values. The Q-function estimates the expected
cumulative reward of taking a particular action in a given state.
• SARSA (State-Action-Reward-State-Action): SARSA is another model-free RL
algorithm that learns a Q-function. However, unlike Q-learning, SARSA updates the
Q-function using the next action that was actually taken, rather than the optimal action.
• Deep Q-learning: Deep Q-learning is a combination of Q-learning and deep
learning. Deep Q-learning uses a neural network to represent the Q-function, which
allows it to learn complex relationships between states and actions.
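
Tabular Q-learning can be sketched on a tiny environment. Everything about the environment
below is an assumption made for illustration: four positions in a row, a reward of 1 for
reaching the last position, and hypothetical names step and q_learning. To keep the run
repeatable, the sketch sweeps over every state-action pair instead of exploring at random, but
the update applied to each transition is the standard Q-learning rule.

```python
N_STATES, GOAL = 4, 3
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: returns (next_state, reward)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(alpha=0.5, gamma=0.9, sweeps=500):
    """Learn a Q-table with one row per state and one column per action."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for state in range(GOAL):            # the goal state is terminal
            for action in ACTIONS:
                next_state, reward = step(state, action)
                # Value of the best next action; zero once the episode has ended.
                best_next = 0.0 if next_state == GOAL else max(q[next_state])
                target = reward + gamma * best_next
                q[state][action] += alpha * (target - q[state][action])
    return q


q = q_learning()
# Greedy policy: in each non-terminal state, pick the action with the highest Q-value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

The learned values fall off with distance from the goal because of the discount factor gamma,
and the greedy policy moves right in every state.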

Let’s understand it with the help of an example.
Example: Consider that you are training an AI agent to play a game like chess. The agent
explores different moves and receives positive or negative feedback based on the outcome.
Reinforcement learning also finds applications in other settings where agents learn to perform
tasks by interacting with their surroundings.
Types of Reinforcement Machine Learning
There are two main types of reinforcement learning:
Positive reinforcement
• Rewards the agent for taking a desired action.
• Encourages the agent to repeat the behavior.
• Examples: Giving a treat to a dog for sitting, providing a point in a game for a
correct answer.
Negative reinforcement
• Removes an undesirable stimulus to encourage a desired behavior.
• Encourages the agent to repeat the behavior that removed the stimulus.
• Examples: Turning off a loud buzzer when a lever is pressed, avoiding a penalty
by completing a task.
Advantages of Reinforcement Machine Learning
• It supports autonomous decision-making and is well suited to tasks that require
learning a sequence of decisions, such as robotics and game-playing.
• This technique is preferred for achieving long-term results that are very difficult to
achieve otherwise.
• It is used to solve complex problems that cannot be solved by conventional
techniques.
Disadvantages of Reinforcement Machine Learning
• Training reinforcement learning agents can be computationally expensive and
time-consuming.
• Reinforcement learning is not preferable for solving simple problems.
• It needs a lot of data and a lot of computation, which can make it impractical and costly.
Applications of Reinforcement Machine Learning
Here are some applications of reinforcement learning:
• Game Playing: RL can teach agents to play games, even complex ones.
• Robotics: RL can teach robots to perform tasks autonomously.
• Autonomous Vehicles: RL can help self-driving cars navigate and make decisions.
• Recommendation Systems: RL can enhance recommendation algorithms by
learning user preferences.
• Healthcare: RL can be used to optimize treatment plans and drug discovery.
• Natural Language Processing (NLP): RL can be used in dialogue systems and
chatbots.
• Finance and Trading: RL can be used for algorithmic trading.
• Supply Chain and Inventory Management: RL can be used to optimize supply
chain operations.
• Energy Management: RL can be used to optimize energy consumption.
• Game AI: RL can be used to create more intelligent and adaptive NPCs in video
games.
• Adaptive Personal Assistants: RL can be used to improve personal assistants.
• Virtual Reality (VR) and Augmented Reality (AR): RL can be used to create
immersive and interactive experiences.
• Industrial Control: RL can be used to optimize industrial processes.
• Education: RL can be used to create adaptive learning systems.
• Agriculture: RL can be used to optimize agricultural operations.

Regression
Regression is a statistical method used in machine learning to model and analyze the
relationships between a dependent variable (output) and one or more independent variables
(inputs). It aims to predict the dependent variable’s value based on the independent variables’
values.
• In machine learning, regression is a type of supervised learning in which the model
learns from a dataset of input-output pairs. The model identifies patterns in the input
features to predict continuous numerical values of the output variable.
• Regression algorithms help solve regression problems by finding the relationship
between the data points and fitting a regression model.
• These algorithms attempt to find the best-fit line, curve, or surface that minimizes the
difference between predicted and actual values.
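
Minimizing the difference between predicted and actual values can be sketched with gradient
descent on the mean squared error. This is one possible way to find the best-fit line, shown
under stated assumptions: the data is made up and lies exactly on y = 2x + 1, the learning
rate and step count are illustrative choices, and mse and gradient_descent are hypothetical
helper names.

```python
def mse(w, b, xs, ys):
    """Mean squared error between predictions w * x + b and actual values."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Repeatedly move w and b against the gradient of the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical input-output pairs generated from y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

error_before = mse(0.0, 0.0, xs, ys)
w, b = gradient_descent(xs, ys)
error_after = mse(w, b, xs, ys)
```

The error shrinks as the line approaches the data, which is the sense in which the best-fit
line minimizes the difference between predicted and actual values.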

Applications of Regression Algorithms


Finance and Economics:
Stock Price Prediction: Predicting future stock prices based on historical data, market trends,
and economic indicators.
Economic Forecasting: Modeling economic indicators like GDP growth, unemployment
rates, and inflation trends.
Healthcare:
Disease Progression: Predicting the progression of diseases such as diabetes or cancer.
Patient Outcomes: Estimating patient survival rates, recovery times, and treatment
effectiveness.
Healthcare Costs: Forecasting hospital readmission rates and healthcare expenditures.
Marketing and Sales:
Customer Lifetime Value (CLV): Estimating the total value a customer will bring to a business
over the course of their relationship.
Sales Forecasting: Predicting future sales based on historical sales data, market conditions,
and promotional activities.
Engineering and Manufacturing:
Predictive Maintenance: Forecasting equipment failures and maintenance needs to reduce
downtime and repair costs.
Environmental Science:
Weather Forecasting: Predicting weather conditions such as temperature, rainfall, and wind
speed.
Climate Change Modeling: Estimating the impacts of climate change on various
environmental factors.
Pollution Levels: Forecasting air and water pollution levels based on industrial activities,
traffic, and meteorological data.
Retail and E-commerce:
Demand Forecasting: Predicting future product demand to optimize inventory levels and
supply chain management.
Price Optimization: Estimating the optimal pricing strategy to maximize revenue and profit.
Transportation and Logistics:
Delivery Time Estimation: Forecasting delivery times in logistics and supply chain operations
based on various factors, such as distance, traffic, and weather conditions.
1. Regression

Regression algorithms are used if there is a relationship between the input variable and the
output variable. It is used for the prediction of continuous variables, such as Weather
forecasting, Market Trends, etc. Below are some popular Regression algorithms which come
under supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

2. Classification

Classification algorithms are used when the output variable is categorical, which means it
takes one of two or more classes such as Yes-No, Male-Female, True-False, etc. A typical
example is spam filtering. Below are some popular classification algorithms:

o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines

Advantages of Supervised learning:

o With the help of supervised learning, the model can predict the output on the basis of
prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning model helps us to solve various real-world problems such as fraud
detection, spam filtering, etc.

Disadvantages of supervised learning:

o Supervised learning models are not suitable for handling some complex tasks.
o Supervised learning may not predict the correct output if the test data differs
substantially from the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.

What is Unsupervised Learning?

Unsupervised learning, on the other hand, is the method that trains machines to use data that is
neither classified nor labeled. It means no labeled training data is provided and the machine
is made to learn by itself. The machine must be able to group the data without any prior
information about the data.

The idea is to expose the machine to large volumes of varied data and allow it to learn from
that data, providing insights that were previously unknown and identifying hidden patterns. As
such, there aren’t necessarily defined outcomes from unsupervised learning algorithms. Rather,
the algorithm determines what is different or interesting in the given dataset.

The machine needs to be programmed to learn by itself. The computer needs to understand and
provide insights from both structured and unstructured data.

(Figure: illustration of unsupervised learning)
Unsupervised Machine Learning Categorization

1) Clustering is one of the most common unsupervised learning methods. The method of
clustering involves organizing unlabelled data into similar groups called clusters. Thus, a
cluster is a collection of similar data items. The primary goal here is to find similarities in the
data points and group similar data points into a cluster.

2) Anomaly detection is the method of identifying rare items, events or observations which
differ significantly from the majority of the data. We generally look for anomalies or outliers
in data because they are suspicious. Anomaly detection is often utilized in bank fraud and
medical error detection.
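
As a small illustration of anomaly detection, the sketch below flags values that lie far from the mean, using a z-score rule. This is only one simple technique among many; the threshold of 2.0, the function name find_outliers, and the transaction amounts are assumptions made for the example.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds
    `threshold` standard deviations (assumes the values are not all equal)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily transaction amounts; one value is far from the rest.
amounts = [52, 48, 50, 47, 53, 49, 51, 950, 50, 46]

outliers = find_outliers(amounts)
print(outliers)
```

No labels are supplied: the unusual value is identified purely from how it differs from the majority of the data.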

Applications of Unsupervised Learning Algorithms

Some practical applications of unsupervised learning algorithms include:

• Fraud detection
• Malware detection
• Identification of human errors during data entry
• Conducting accurate basket analysis, etc.

Types of clustering

• Partitioning Clustering
• Density-Based Clustering
• Distribution Model-Based Clustering
• Hierarchical Clustering
• Fuzzy Clustering

Partitioning Clustering

• A type of clustering that divides the data into non-hierarchical groups. It is also known
as the centroid-based method.

Density-Based Clustering

• Connects highly dense areas into clusters, so arbitrarily shaped clusters can form
as long as the dense regions can be connected. The algorithm identifies the different
clusters in the dataset by joining areas of high density.

Distribution Model-Based Clustering

• The data is divided based on the probability that a data point belongs to a particular
distribution. The grouping is done by assuming some distribution, most
commonly the Gaussian distribution.

Hierarchical Clustering

• The dataset is divided into clusters to create a tree-like structure, which is also called
a dendrogram. Any number of clusters can be selected by cutting the tree
at the appropriate level.

Fuzzy Clustering

• Fuzzy clustering is a type of soft clustering method in which a data object may belong to more
than one group or cluster.
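
The partitioning (centroid-based) approach described above can be sketched with k-means, a well-known centroid-based algorithm that this material does not name explicitly. The points, the starting centroids, and the fixed iteration count are illustrative assumptions; library implementations usually choose starting centroids automatically and stop once the centroids no longer move.

```python
def k_means(points, centroids, iterations=10):
    """Group 2-D points around the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append((sum(p[0] for p in cluster) / len(cluster),
                                      sum(p[1] for p in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its old centroid
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabelled data with two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Each point ends up in exactly one cluster, which is what separates this hard, partitioning style from fuzzy clustering.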

Applications of Clustering

• In identification of cancer cells
• In search engines: Search results appear based on the objects closest to the search
query. This is done by grouping similar data objects together, away from
dissimilar objects.
• Customer segmentation: It is used in market research to segment customers based
on their choices and preferences.
• In biology: It is used to classify different species of plants and animals using image
recognition techniques.

Association Rule Learning

• Association rule learning is a type of unsupervised learning technique that checks for
the dependency of one data item on another data item and maps them accordingly, so
that the discovered relationships can be put to profitable use.
• Typical applications:
◼ Basket analysis
◼ Web usage mining
◼ Continuous production
• To measure the associations between thousands of data items, there are several metrics.
These metrics are given below:
◼ Support
◼ Confidence
◼ Lift
• Apriori Algorithm
◼ This algorithm uses frequent itemsets to generate association rules. It is designed
to work on databases that contain transactions. It uses a
breadth-first search and a hash tree to count itemsets efficiently.
• Eclat Algorithm
◼ Eclat stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database. It
typically executes faster than the Apriori algorithm.
• F-P Growth Algorithm
◼ F-P stands for Frequent Pattern. F-P growth is an improved version
of the Apriori algorithm. It represents the database in the form of a tree structure
known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the
most frequent patterns.
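
The three metrics listed above (support, confidence, and lift) can be computed directly from a list of transactions. The sketch below uses the standard definitions on a tiny made-up shopping dataset; the item names and helper function names are assumptions for illustration, and it is not an implementation of Apriori, Eclat, or F-P growth.

```python
# Hypothetical transaction database: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in transactions containing the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is overall.
    A value above 1 suggests a positive association, below 1 a negative one."""
    return confidence(antecedent, consequent) / support(consequent)

# Evaluate the rule {bread} -> {milk}.
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

Algorithms such as Apriori exist because computing these values for every possible itemset becomes infeasible as the number of items grows.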

Python Libraries

1. NumPy

NumPy is a popular Python library for multi-dimensional array and matrix processing because
it can be used to perform a great variety of mathematical operations. Its capability to handle
linear algebra, Fourier transforms, and more makes NumPy well suited to machine learning and
artificial intelligence (AI) projects, where data is usually represented as arrays and matrices.
Because its array operations run in compiled code, NumPy is typically much faster than
equivalent loops written in pure Python.
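
A brief sketch of the kinds of operations mentioned above, assuming NumPy is installed. The matrix and vector values are arbitrary examples.

```python
import numpy as np

# A 2x2 matrix and a vector, stored as NumPy arrays.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system A @ x = b.
x = np.linalg.solve(A, b)

# Element-wise arithmetic applies to the whole array without an explicit loop.
doubled = A * 2

# Fourier transform of a short signal.
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))

print(x)
print(doubled)
print(spectrum)
```

Operating on whole arrays at once, rather than looping over elements in Python, is the usual way to write NumPy code.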
2. Scikit-learn

Scikit-learn is a very popular machine learning library that is built on NumPy and SciPy. It
supports most of the classic supervised and unsupervised learning algorithms, and it can also
be used for data mining, modeling, and analysis. Scikit-learn’s simple design offers a user-
friendly library for those new to machine learning.

3. Pandas

Pandas is another Python library that is built on top of NumPy, responsible for preparing
data sets for machine learning and training. It relies on two main data structures: the
one-dimensional Series and the two-dimensional DataFrame. This allows Pandas to be applicable in
a variety of industries including finance, engineering, and statistics. Unlike the slow-moving
animals themselves, the Pandas library is quick, compliant, and flexible.

4. TensorFlow

TensorFlow’s open-source Python library specializes in what’s called differentiable
programming, meaning it can automatically compute a function’s derivatives from code written
in a high-level language. Both machine learning and deep learning models are easily developed
and evaluated with TensorFlow’s flexible architecture and framework. TensorFlow can be used to
visualize machine learning models, and its models can be deployed on both desktop and mobile.

5. Seaborn

Seaborn is another open-source Python library, one that is based on Matplotlib (which focuses
on plotting and data visualization) but works with Pandas’ data structures. Seaborn is often used
in ML projects because it can generate plots of learning data. It is known for producing
aesthetically pleasing graphs and plots, making it an effective choice if you
also use it for marketing and data analysis.

6. Theano

Theano is a Python library that focuses on numerical computation and is specifically made for
machine learning. It is able to optimize and evaluate mathematical models and matrix
calculations that use multi-dimensional arrays to create ML models. Theano is almost
exclusively used by machine learning and deep learning developers or programmers.
7. Keras

Keras is a Python library that is designed specifically for developing neural networks for ML
models. It can run on top of Theano and TensorFlow to train neural networks. Keras is flexible,
portable, user-friendly, and easily integrated with multiple functions.

8. PyTorch

PyTorch is an open-source machine learning Python library based on Torch, a framework
written in C and Lua. It is mainly used in ML applications that involve natural language
processing or computer vision. PyTorch is known for being exceptionally fast at executing
computations on large, dense data sets and graphs.

9. Matplotlib

Matplotlib is a Python library focused on data visualization and primarily used for creating
beautiful graphs, plots, histograms, and bar charts. It is compatible with plotting data from
SciPy, NumPy, and Pandas. If you have experience using other types of graphing tools,
Matplotlib might be the most intuitive choice for you.
