S11BVAC14-Machine Learning Using Python-CSE Course Material Unit1
Course Material
UNIT 1: Introduction
About Machine learning - Applications of ML - Uses of ML - Machine learning methods -
Machine learning algorithms: Regression, Classification, Clustering, Association - A brief
introduction to Python libraries.
Introduction
In the real world, we are surrounded by humans, who can learn from their experiences, and by
computers or machines, which work on our instructions. But can a machine also learn from
experience or past data as a human does? This is where Machine Learning comes in.
Traditional Programming
Unlike traditional programming, machine learning is an automated process. It can increase the
value of your embedded analytics in many areas, including data prep, natural language
interfaces, automatic outlier detection, recommendations, and causality and significance
detection. All of these features help speed user insights and reduce decision bias.
For example, if you feed in customer demographics and transactions as input data and use
historical customer churn rates as your output data, the algorithm will formulate a program that
can predict if a customer will churn or not. That program is called a predictive model.
You can use this model to predict business outcomes in any situation where you have input and
historical output data:
3. Identify the historically observed output (i.e., data samples for when the condition is
true and for when it’s false).
For instance, if you want to predict who will pay the bills late, identify the input (customer
demographics, bills) and the output (pay late or not), and let the machine learning use this data
to create your model.
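The idea of letting an algorithm formulate the program from input data and historically observed output can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data values are made up, the single input feature (number of past late payments) is hypothetical, and the learner is a very simple one-threshold rule (a decision stump), not the method any particular product uses.

```python
# Hypothetical history: (number of past late payments, paid late this time?)
history = [(0, False), (1, False), (0, False), (2, False),
           (3, True), (4, True), (5, True), (3, True)]

def train_stump(samples):
    """Learn the threshold that best separates the two outcomes.

    The resulting model predicts "pays late" when the input is >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in samples}):
        # Count how many historical outcomes this threshold reproduces
        correct = sum((x >= threshold) == label for x, label in samples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def predict(threshold, x):
    """Apply the learned model to a new input."""
    return x >= threshold

threshold = train_stump(history)   # learned from the data, not hand-coded
print(threshold)
print(predict(threshold, 4))       # a customer with 4 past late payments
print(predict(threshold, 1))       # a customer with 1 past late payment
```

The rule itself is never written by the programmer; only the procedure for finding it is. Changing the historical data changes the learned threshold.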
Based on the methods and way of learning, machine learning is divided into mainly four types,
which are:
Further, the prediction is checked for accuracy. Based on its accuracy, the ML algorithm is
either deployed or trained repeatedly with an augmented training dataset until the desired
accuracy is achieved.
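The check-and-retrain cycle described above can be sketched as follows. The data, the target accuracy of 0.9, and the choice of a 1-nearest-neighbour model are all assumptions made for illustration; the point is only the loop that augments the training set until the held-out accuracy is good enough.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of the training point closest to x (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def accuracy(train, test):
    """Fraction of test samples whose predicted label matches the true one."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical labelled data: the label is 1 when the input is above 5.
pool = [(1, 0), (6, 1), (2, 0), (9, 1), (4, 0), (7, 1)]          # for training
test = [(0, 0), (3, 0), (4.5, 0), (5.5, 1), (8, 1), (10, 1)]     # held out

target = 0.9
train = pool[:2]               # start with a small training set
for extra in pool[2:]:
    if accuracy(train, test) >= target:
        break                  # accurate enough: the model can be deployed
    train.append(extra)        # otherwise augment the training set and retrain

print(len(train), accuracy(train, test))
```

The test set is kept separate from the training pool so that the accuracy reflects how the model behaves on data it has not seen.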
• Data Bias and Fairness: ML algorithms are only as good as the data they are trained
on. Biased data can lead to discriminatory outcomes, requiring careful data selection
and monitoring of algorithms.
• Security and Privacy Concerns: As ML relies heavily on data, security breaches can
expose sensitive information. Additionally, the use of personal data raises privacy
concerns that need to be addressed.
• Interpretability and Explainability: Complex ML models can be difficult to
understand, making it challenging to explain their decision-making processes. This lack
of transparency can raise questions about accountability and trust.
• Job Displacement and Automation: Automation through ML can lead to job
displacement in certain sectors. Addressing the need for retraining and reskilling the
workforce is crucial.
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, digital images, etc. A popular use case of image recognition
and face detection is automatic friend tagging suggestion:
Facebook provides a feature for automatic friend tagging suggestions. Whenever we upload a
photo with our Facebook friends, we automatically get a tagging suggestion with names, and the
technology behind this is machine learning's face detection and recognition algorithm. It is
based on the Facebook project named "DeepFace," which is responsible for face recognition
and person identification in the picture.
2. Speech Recognition
While using Google, we get an option to "Search by voice." This comes under speech
recognition, a popular application of machine learning. Speech recognition is the process of
converting voice instructions into text, and it is also known as "Speech to text" or "Computer
speech recognition." At present, machine learning algorithms are widely used by various
applications of speech recognition. Google Assistant, Siri, Cortana, and Alexa use speech
recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we take the help of Google Maps, which shows us the correct
path with the shortest route and predicts the traffic conditions.
It predicts traffic conditions, such as whether traffic is clear, slow-moving, or heavily
congested, in two ways:
o Real-time location of vehicles from the Google Maps app and sensors
o Average time taken on past days at the same time.
Everyone who uses Google Maps is helping to make the app better. It takes information
from the user and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by various e-commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to the user. Whenever we search for
some product on Amazon, we start getting advertisements for the same product while
surfing the internet on the same browser, and this is because of machine learning. Google
understands the user's interests using various machine learning algorithms and suggests
products according to those interests. Similarly, when we use Netflix, we find
recommendations for entertainment series, movies, etc., and this is also done with the help of
machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, where machine
learning plays a significant role. Tesla, a well-known car manufacturer, is working on
self-driving cars. It uses an unsupervised learning method to train the car models to detect
people and objects while driving.
6. Email Spam Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol, and spam emails go to
our spam box; the technology behind this is machine learning. Below are some spam
filters used by Gmail:
o Content Filter
o Header filter
o General blacklists filter
o Rules-based filters
o Permission filters
7. Virtual Personal Assistants:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri.
As the name suggests, they help us find information using our voice instructions. These
assistants can help us in various ways just by our voice instructions, such as playing music,
calling someone, opening an email, scheduling an appointment, etc. These virtual assistants use
machine learning algorithms as an important part. They record our voice instructions, send
them to a server in the cloud, decode them using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways in which
fraud can take place, such as fake accounts, fake IDs, and money stolen in the middle of a
transaction. To detect this, a feed-forward neural network helps us by checking
whether a transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Each genuine transaction follows a specific pattern,
which changes for a fraudulent transaction; the network detects this change and so makes our
online transactions more secure.
9. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With this, medical
technology is growing very fast and is able to build 3D models that can predict the exact
position of lesions in the brain. It helps in finding brain tumors and other brain-related
diseases easily.
10. Automatic Language Translation:
Nowadays, if we visit a new place and do not know the language, it is not a problem at all,
because machine learning helps us by converting the text into a language we know. Google's
GNMT (Google Neural Machine Translation) provides this feature. It is a neural machine
translation system that translates text into our familiar language, and this is called
automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm,
which is used with image recognition and translates text from one language to another.
Classification
Classification deals with predicting categorical target variables, which represent
discrete classes or labels. For instance, classifying emails as spam or not spam, or predicting
whether a patient has a high risk of heart disease. Classification algorithms learn to map the
input features to one of the predefined classes.
Here are some classification algorithms:
• Logistic Regression
• Support Vector Machine
• Random Forest
• Decision Tree
• K-Nearest Neighbors (KNN)
• Naive Bayes
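As an illustration of mapping input features to one of the predefined classes, here is a minimal sketch of K-Nearest Neighbors (KNN), one of the algorithms listed above, written in plain Python. The two e-mail features and all data values are hypothetical, and `knn_predict` is a name chosen for this example only.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical e-mail features: (number of links, number of words in capitals)
train = [((0, 1), "not spam"), ((1, 0), "not spam"), ((1, 2), "not spam"),
         ((7, 9), "spam"), ((8, 6), "spam"), ((9, 8), "spam")]

print(knn_predict(train, (8, 8)))   # close to the spam examples
print(knn_predict(train, (0, 0)))   # close to the non-spam examples
```

KNN does not build an explicit model during training; it stores the labelled examples and classifies a new point by a vote among its closest neighbours.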
Regression
Regression, on the other hand, deals with predicting continuous target variables, which
represent numerical values. For example, predicting the price of a house based on its size,
location, and amenities, or forecasting the sales of a product. Regression algorithms learn to
map the input features to a continuous numerical value.
Here are some regression algorithms:
• Linear Regression
• Polynomial Regression
• Ridge Regression
• Lasso Regression
• Decision Tree
• Random Forest
Advantages of Supervised Machine Learning
• Supervised Learning models can have high accuracy as they are trained
on labelled data.
• The process of decision-making in supervised learning models is often
interpretable.
• It can often be used in pre-trained models which saves time and resources when
developing new models from scratch.
Disadvantages of Supervised Machine Learning
• It has limitations in knowing patterns and may struggle with unseen or unexpected
patterns that are not present in the training data.
• It can be time-consuming and costly as it relies on labeled data only.
• It may lead to poor generalizations based on new data.
1. Regression
Regression is a statistical method used in machine learning to model and analyze the
relationships between a dependent variable (output) and one or more independent variables
(inputs). It aims to predict the dependent variable’s value based on the independent variables’
values.
• In machine learning, regression is a type of supervised learning in which the model
learns from a dataset of input-output pairs. The model identifies patterns in the input
features to predict continuous numerical values of the output variable.
• Regression algorithms help solve regression problems by finding the relationship
between the data points and fitting a regression model.
• These algorithms attempt to find the best-fit line, curve, or surface that minimizes the
difference between predicted and actual values.
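For a single input variable, the best-fit line can be computed directly with the closed-form least-squares formulas. The sketch below uses NumPy; the data points are hypothetical, and the variable names (`slope`, `intercept`, `mse`) are chosen for this example.

```python
import numpy as np

# Hypothetical data: an input feature x and a continuous output y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Closed-form least-squares estimates for the line y = slope * x + intercept
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

predicted = slope * x + intercept
mse = np.mean((y - predicted) ** 2)   # mean squared difference, minimized by this line

print(slope, intercept, mse)
print(slope * 6.0 + intercept)        # prediction for an unseen input
```

The same coefficients can be obtained with `np.polyfit(x, y, 1)`, which fits a degree-1 polynomial by least squares.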
Regression algorithms are used if there is a relationship between the input variable and the
output variable. They are used for the prediction of continuous variables, such as weather
forecasting, market trends, etc. Below are some popular regression algorithms which come
under supervised learning:
o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression
2. Classification
Classification algorithms are used when the output variable is categorical, which means the
output falls into discrete classes such as Yes-No, Male-Female, True-False, etc. Spam
filtering is a typical example. Some popular classification algorithms are:
o Random Forest
o Decision Trees
o Logistic Regression
o Support vector Machines
Advantages of supervised learning:
o With the help of supervised learning, the model can predict the output on the basis of
prior experiences.
o In supervised learning, we can have an exact idea about the classes of objects.
o Supervised learning models help us to solve various real-world problems such as fraud
detection, spam filtering, etc.
Disadvantages of supervised learning:
o Supervised learning models are not suitable for handling complex tasks.
o Supervised learning cannot predict the correct output if the test data is different from
the training dataset.
o Training requires a lot of computation time.
o In supervised learning, we need enough knowledge about the classes of objects.
Unsupervised learning, on the other hand, is the method that trains machines to use data that is
neither classified nor labeled. It means no labeled training data is provided and the machine is
made to learn by itself. The machine must be able to group the data without any prior
information about it.
The idea is to expose the machines to large volumes of varying data and allow them to learn from
that data to provide insights that were previously unknown and to identify hidden patterns. As
such, there aren’t necessarily defined outcomes from unsupervised learning algorithms. Rather,
the algorithm determines what is different or interesting in the given dataset.
The machine needs to be programmed to learn by itself. The computer needs to understand and
provide insights from both structured and unstructured data. Here’s an accurate illustration of
unsupervised learning:
Unsupervised Machine Learning Categorization
1) Clustering is one of the most common unsupervised learning methods. The method of
clustering involves organizing unlabelled data into similar groups called clusters. Thus, a
cluster is a collection of similar data items. The primary goal here is to find similarities in the
data points and group similar data points into a cluster.
2) Anomaly detection is the method of identifying rare items, events or observations which
differ significantly from the majority of the data. We generally look for anomalies or outliers
in data because they are suspicious. Anomaly detection is often utilized in bank fraud and
medical error detection.
• Fraud detection
• Malware detection
• Identification of human errors during data entry
• Conducting accurate basket analysis, etc.
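Anomaly detection, as described above, can be illustrated with a very simple statistical rule: flag any value that lies far from the mean, measured in standard deviations (its z-score). This is only one of many possible approaches. The transaction amounts and the threshold of 2.0 are assumptions for the example, and `find_outliers` is a hypothetical helper name.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical transaction amounts; one is far from the rest
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 500]
print(find_outliers(amounts))
```

No labels are used here: the rule decides what is unusual purely from how the data is distributed, which is what makes it an unsupervised method.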
Types of clustering
Partitioning Clustering
Density-Based Clustering
Distribution Model-Based Clustering
Hierarchical Clustering
Fuzzy Clustering
Partitioning Clustering
Partitioning clustering is a type of clustering that divides the data into non-hierarchical
groups. It is also known as the centroid-based method.
Density-Based Clustering
Density-based clustering connects highly dense areas into clusters, so arbitrarily shaped
clusters can form as long as the dense regions can be connected. The algorithm does this by
identifying the different clusters in the dataset and connecting the areas of high density into
clusters.
Distribution Model-Based Clustering
The data is divided based on the probability that a data point belongs to a particular
distribution. The grouping is done by assuming some distribution, most commonly the
Gaussian distribution.
Hierarchical Clustering
The dataset is divided into clusters to create a tree-like structure, which is also called
a dendrogram. The observations or any number of clusters can be selected by cutting the tree
at the correct level.
Fuzzy Clustering
Fuzzy clustering is a type of soft method in which a data object may belong to more
than one group or cluster.
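As an example of the partitioning (centroid-based) approach, the following is a minimal sketch of the k-means procedure using NumPy. The data points, the number of clusters, and the choice of starting centroids are assumptions made for illustration, and the sketch does not handle the case where a cluster ends up with no points.

```python
import numpy as np

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        # distances[i, j] = distance from point i to centroid j
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(len(centroids))])
    return labels, centroids

# Hypothetical unlabelled data forming two visible groups
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
initial = points[[0, 3]]          # assumption: start from two of the data points
labels, centroids = kmeans(points, initial)

print(labels)       # cluster index assigned to each point
print(centroids)    # final cluster centres
```

In practice the starting centroids are usually chosen at random and the procedure is repeated several times, since the result depends on the initialization.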
Applications of Clustering
Association rule learning is a type of unsupervised learning technique that checks for
the dependency of one data item on another data item and maps accordingly so
that it can be more profitable.
Basket analysis
Web usage mining
continuous production
To measure the associations between thousands of data items, there are several metrics.
These metrics are given below:
◼ Support
◼ Confidence
◼ Lift
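These three metrics have standard definitions. For a rule X → Y over N transactions: support(X) is the fraction of transactions containing X, confidence(X → Y) = support(X ∪ Y) / support(X), and lift(X → Y) = confidence(X → Y) / support(Y). A small sketch with hypothetical shopping baskets shows how they are computed:

```python
# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how common the consequent is on its own."""
    return confidence(antecedent, consequent) / support(consequent)

rule = ({"bread"}, {"milk"})
print(support(rule[0] | rule[1]))   # 3 of the 5 baskets contain both items
print(confidence(*rule))
print(lift(*rule))
```

A lift above 1 suggests the two items occur together more often than expected if they were independent; a lift below 1 suggests the opposite.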
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed
to work on databases that contain transactions. This algorithm uses a
breadth-first search and a hash tree to count itemsets efficiently.
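The level-by-level (breadth-first) search for frequent itemsets can be sketched as below. This is a simplified illustration, not a full implementation: it uses plain Python sets instead of a hash tree, it stops at finding frequent itemsets rather than generating rules, and the baskets and minimum support value are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([item]) for item in items}   # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that are frequent enough
        kept = {c: support(c) for c in level if support(c) >= min_support}
        frequent.update(kept)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate with an infrequent k-item subset (the Apriori property)
        k += 1
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))}
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"},
           {"milk"}, {"bread", "milk", "jam"}]
frequent = apriori(baskets, min_support=0.6)
for itemset, value in frequent.items():
    print(sorted(itemset), value)
```

The pruning step is what keeps the search manageable: an itemset can only be frequent if all of its subsets are frequent, so most larger candidates never need to be counted.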
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. This algorithm uses a
depth-first search technique to find frequent itemsets in a transaction database. It
typically executes faster than the Apriori algorithm.
F-P Growth Algorithm
The FP-growth algorithm (FP stands for Frequent Pattern) is an improved version
of the Apriori algorithm. It represents the database in the form of a tree structure that
is known as a frequent pattern tree (FP-tree). The purpose of this tree is to extract the
most frequent patterns.
Python Libraries
1. NumPy
NumPy is a popular Python library for multi-dimensional array and matrix processing, and it
supports a wide variety of mathematical operations. Its support for linear algebra, Fourier
transforms, and more makes NumPy well suited to machine learning and artificial intelligence
(AI) projects, where data is usually represented as arrays and matrices. NumPy's vectorized
array operations are typically much faster than equivalent pure-Python loops.
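A short example of the array, linear algebra, and Fourier transform features mentioned above. The matrices and values are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = a @ a.T                              # matrix multiplication
solution = np.linalg.solve(a, b)               # linear algebra: solve a @ x = b
spectrum = np.fft.fft([1.0, 0.0, 0.0, 0.0])    # Fourier transform of an impulse

print(product)
print(solution)
print(spectrum)
```

Each operation works on whole arrays at once, so no explicit Python loops are needed.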
2. Scikit-learn
Scikit-learn is a very popular machine learning library that is built on NumPy and SciPy. It
supports most of the classic supervised and unsupervised learning algorithms, and it can also
be used for data mining, modeling, and analysis. Scikit-learn’s simple design offers a user-
friendly library for those new to machine learning.
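A typical Scikit-learn workflow follows the same few steps regardless of the algorithm: split the data, fit a model, predict, and score. The sketch below assumes scikit-learn is installed and uses its bundled Iris dataset with a decision tree; the 70/30 split and the `random_state` values are arbitrary choices for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labelled dataset: flower measurements and species
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples to check the model on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)                      # learn from the training data
score = accuracy_score(y_test, model.predict(X_test))
print(score)
```

Swapping in another classifier, for example `KNeighborsClassifier` or `LogisticRegression`, only changes the line that creates the model; the `fit` and `predict` calls stay the same.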
3. Pandas
Pandas is another Python library built on top of NumPy, used for preparing and analyzing
data sets for machine learning and training. It relies on two types of data structures:
one-dimensional (Series) and two-dimensional (DataFrame). This makes Pandas applicable in
a variety of fields, including finance, engineering, and statistics. Unlike the slow-moving
animals themselves, the Pandas library is quick, compliant, and flexible.
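The two data structures can be seen in a short example. The column names and values below are hypothetical and assume Pandas is installed.

```python
import pandas as pd

ages = pd.Series([34, 45, 29], name="age")   # one-dimensional Series

customers = pd.DataFrame({                   # two-dimensional DataFrame
    "age": [34, 45, 29],
    "monthly_bill": [40.0, 55.5, 32.0],
    "paid_late": [False, True, False],
})

# Select the rows where the boolean column is True
late_payers = customers[customers["paid_late"]]

print(ages.mean())
print(customers.describe())   # summary statistics for the numeric columns
print(late_payers)
```

Each column of a DataFrame is itself a Series, so operations such as `mean()` work the same way on both.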
4. TensorFlow
5. Seaborn
Seaborn is another open-source Python library, one that is based on Matplotlib (which focuses
on plotting and data visualization) and works with Pandas’ data structures. Seaborn is often
used in ML projects because it can generate plots of learning data. Its default styles produce
attractive graphs and plots, making it an effective choice if you also use it for marketing and
data analysis.
6. Theano
Theano is a Python library that focuses on numerical computation and is specifically made for
machine learning. It is able to optimize and evaluate mathematical models and matrix
calculations that use multi-dimensional arrays to create ML models. Theano is almost
exclusively used by machine learning and deep learning developers or programmers.
7. Keras
Keras is a Python library that is designed specifically for developing neural networks for ML
models. It can run on top of Theano and TensorFlow to train neural networks. Keras is flexible,
portable, user-friendly, and easily integrated with multiple functions.
8. PyTorch
9. Matplotlib
Matplotlib is a Python library focused on data visualization and primarily used for creating
beautiful graphs, plots, histograms, and bar charts. It is compatible with plotting data from
SciPy, NumPy, and Pandas. If you have experience using other types of graphing tools,
Matplotlib might be the most intuitive choice for you.
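A minimal plotting sketch with a line plot and a histogram side by side. It assumes Matplotlib and NumPy are installed; the `Agg` backend and the output file name `example_plot.png` are choices made for this example so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")             # assumption: render to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))

left.plot(x, np.sin(x), label="sin(x)")   # line plot of NumPy data
left.set_xlabel("x")
left.legend()

right.hist(np.sin(x), bins=10)            # histogram of the same values
right.set_title("Distribution of sin(x)")

fig.tight_layout()
fig.savefig("example_plot.png")
```

In an interactive session, `plt.show()` can be used instead of saving the figure to a file.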