Part-1 Introduction of ML
Machine learning (ML) is a subdomain of artificial intelligence (AI) that focuses on developing
systems that learn, or improve their performance, based on the data they ingest. Artificial intelligence is a
broad term that refers to systems or machines that mimic human intelligence. Machine learning and
AI are frequently discussed together, and the terms are occasionally used interchangeably, although
they do not signify the same thing. A crucial distinction is that, while all machine learning is AI, not all
AI is machine learning.
A machine learning system learns from historical data, builds prediction models, and predicts the
output whenever it receives new data. The accuracy of the predicted output depends in part on the
amount of data: more data generally helps to build a better model.
Let's say we have a complex problem in which we need to make predictions. Instead of writing
code for it by hand, we feed the data to generic algorithms, which build the logic based on the data
and predict the output. Machine learning has changed the way we approach such problems.
The Machine Learning algorithm's operation is depicted in the following block diagram:
Features of Machine learning
Machine learning is a data-driven technology. Organizations generate large amounts of data on a
daily basis, and by identifying notable relationships in that data they can make better decisions.
A machine can learn from past data and improve automatically.
It detects various patterns in a given dataset.
For big organizations branding is important, and machine learning makes it easier to target a
relevant customer base.
It is similar to data mining, because it also deals with huge amounts of data.
The demand for machine learning is steadily rising. Because it is able to perform tasks that are too
complex for a person to directly implement, machine learning is required. Humans are constrained
by our inability to manually access vast amounts of data; as a result, we require computer systems,
which is where machine learning comes in to simplify our lives.
We can train machine learning algorithms by providing them with a large amount of data and
allowing them to automatically explore the data, build models, and predict the required output. A
cost function can be used to measure the algorithm's performance, which also depends on the
amount of data. We can save both time and money by using machine learning.
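As a minimal sketch of how a cost function measures performance, the snippet below computes the mean squared error (one common cost function, chosen here as an assumption) for two candidate models. The data values and the names `model_a` and `model_b` are hypothetical and only serve as illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual values and the predictions of two candidate models.
actual = [3.0, 5.0, 7.0, 9.0]
model_a = [2.5, 5.5, 7.0, 8.0]    # close to the actual values
model_b = [1.0, 2.0, 10.0, 12.0]  # further from the actual values

cost_a = mean_squared_error(actual, model_a)
cost_b = mean_squared_error(actual, model_b)
print(cost_a, cost_b)  # the lower cost indicates the better fit
```

A lower cost means the predictions are closer to the actual values, so here the first model would be preferred.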
The importance of AI can be easily understood from its use cases. Currently, AI is used in
self-driving cars, cyber fraud detection, face recognition, friend suggestion by Facebook, and
so on. Various top companies, such as Netflix and Amazon, have built AI models that use vast
amounts of data to analyze user interest and recommend products accordingly.
Following are some key points which show the importance of Machine Learning:
Machine learning is a buzzword in today's technology, and it is growing very rapidly.
We use machine learning in our daily life, often without knowing it, in products such as Google
Maps, Google Assistant, and Alexa. Below are some of the most popular real-world applications of
Machine Learning:
1. Image Recognition:
Image recognition is one of the most common applications of machine learning. It is used to
identify objects, persons, places, and so on in digital images. A popular use case of image
recognition and face detection is automatic friend tagging suggestion:
Facebook provides a feature that suggests friends to tag. Whenever we upload a photo with our
Facebook friends, we automatically get a tagging suggestion with their names, and the
technology behind this is machine learning's face detection and recognition algorithm.
It is based on the Facebook project named "DeepFace," which is responsible for face recognition
and person identification in the picture.
2. Speech Recognition
While using Google, we get the option to "Search by voice." This comes under speech recognition,
which is a popular application of machine learning.
Speech recognition is the process of converting voice instructions into text, and it is also known
as "speech to text" or "computer speech recognition." At present, machine learning algorithms are
widely used in speech recognition applications. Google Assistant, Siri, Cortana, and
Alexa use speech recognition technology to follow voice instructions.
3. Traffic prediction:
If we want to visit a new place, we can use Google Maps, which shows us the shortest route and
predicts the traffic conditions.
It predicts whether traffic is clear, slow-moving, or heavily congested using two sources:
Real-time location of vehicles from the Google Maps app and sensors
Average time taken on past days at the same time of day
Everyone who uses Google Maps helps to make the app better. It takes information from the user
and sends it back to its database to improve performance.
4. Product recommendations:
Machine learning is widely used by e-commerce and entertainment companies such as Amazon and
Netflix to recommend products to the user. Whenever we search for a product on Amazon, we start
seeing advertisements for the same product while browsing the internet in the same browser, and
this is because of machine learning.
Google infers the user's interests using various machine learning algorithms and suggests
products accordingly.
Similarly, when we use Netflix, we find recommendations for series, movies, and so on, and this
is also done with the help of machine learning.
5. Self-driving cars:
One of the most exciting applications of machine learning is self-driving cars, where machine
learning plays a significant role. Tesla, a well-known car manufacturer, is working on
self-driving cars. It is using an unsupervised learning method to train the car models to
detect people and objects while driving.
6. Email Spam and Malware Filtering:
Whenever we receive a new email, it is filtered automatically as important, normal, or spam.
Important mail arrives in our inbox marked with the important symbol, and spam emails go to our
spam box; the technology behind this is machine learning. Below are some spam filters used
by Gmail:
Content Filter
Header filter
General blacklists filter
Rules-based filters
Permission filters
Some machine learning algorithms such as Multi-Layer Perceptron, Decision tree, and Naïve
Bayes classifier are used for email spam filtering and malware detection.
7. Virtual Personal Assistant:
We have various virtual personal assistants such as Google Assistant, Alexa, Cortana, and Siri. As
the name suggests, they help us find information using voice instructions. These assistants
can help us in various ways, such as playing music, calling someone, opening an email, or
scheduling an appointment.
These assistants record our voice instructions, send them to a server in the cloud, decode them
using ML algorithms, and act accordingly.
8. Online Fraud Detection:
Machine learning is making our online transactions safe and secure by detecting fraudulent
transactions. Whenever we perform an online transaction, there are various ways that fraud can
take place, such as fake accounts, fake IDs, and money stolen in the middle of a
transaction. To detect this, a feed-forward neural network helps us by checking whether a
transaction is genuine or fraudulent.
For each genuine transaction, the output is converted into hash values, and these values
become the input for the next round. Each genuine transaction follows a specific pattern, which
changes for a fraudulent transaction; the network detects this change and so makes our online
transactions more secure.
9. Stock Market Trading:
Machine learning is widely used in stock market trading. In the stock market there is always a
risk of ups and downs in share prices, so a long short-term memory (LSTM) neural
network is used to predict stock market trends.
10. Medical Diagnosis:
In medical science, machine learning is used for disease diagnosis. With this, medical technology
is growing very fast and is able to build 3D models that can predict the exact position of lesions
in the brain.
11. Automatic Language Translation:
Nowadays, if we visit a new place and do not know the language, it is not a problem at
all, because machine learning helps us by converting text into a language we know.
Google's GNMT (Google Neural Machine Translation) provides this feature. It is a neural
machine translation system that translates text into our familiar language, and this is called
automatic translation.
The technology behind automatic translation is a sequence-to-sequence learning algorithm,
which can be combined with image recognition to translate text from one language to another.
Machine learning powers everything from translation apps to autonomous vehicles. It offers a way
to solve problems and answer complex questions. It is basically a process of training a piece of
software, called an algorithm or model, to make useful predictions from data. This article discusses
the categories of machine learning problems and the terminology used in the field of machine
learning.
Types of machine learning problems
There are various ways to classify machine learning problems. Here, we discuss the most obvious
ones.
1. On the basis of the nature of the learning “signal” or “feedback” available to a learning
system
Supervised learning: The model or algorithm is presented with example inputs and their
desired outputs and then finds patterns and connections between the input and the output. The
goal is to learn a general rule that maps inputs to outputs. The training process continues until
the model achieves the desired level of accuracy on the training data. Some real-life examples
are:
Image Classification: You train with images/labels. Then in the future, you give a
new image expecting that the computer will recognize the new object.
Examples:
Email Spam Detection: The model is trained on emails labeled as "spam" or "not spam."
Features such as keywords and metadata are used to classify new emails.
Credit Card Fraud Detection: The model uses labeled transaction data to predict whether
new transactions are fraudulent.
Unsupervised learning: No labels are given to the learning algorithm, leaving it on its
own to find structure in its input. It is used for clustering populations in different groups.
Unsupervised learning can be a goal in itself (discovering hidden patterns in data).
Clustering: You ask the computer to separate similar data into clusters. This is
essential in research and science.
A simple diagram that clarifies the difference between supervised and unsupervised learning is
shown below:
As you can see, the data in supervised learning is labelled, whereas the data in unsupervised
learning is unlabelled.
Semi-supervised learning: Problems where you have a large amount of input data and
only some of the data is labelled are called semi-supervised learning problems. These
problems sit between supervised and unsupervised learning. For example, a photo
archive where only some of the images are labelled (e.g. dog, cat, person) and the majority
are unlabelled.
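One simple way to use the unlabelled portion of the data is self-training, sketched below. This is only an illustration under stated assumptions: each item is reduced to a single hypothetical numeric feature, the classifier is a nearest-centroid rule, and the labels "small" and "large" are made up for the example.

```python
def centroids(labeled):
    """Mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(x, cents):
    """Return the class whose centroid is closest to x."""
    return min(cents, key=lambda y: abs(x - cents[y]))


def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabelled point closest to a centroid."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        cents = centroids(labeled)
        # Treat distance to the nearest centroid as confidence (smaller is better).
        best = min(pool, key=lambda x: min(abs(x - c) for c in cents.values()))
        labeled.append((best, predict(best, cents)))
        pool.remove(best)
    return centroids(labeled)


# Only two items carry labels; the rest are unlabelled (hypothetical values).
labeled = [(1.0, "small"), (9.0, "large")]
unlabeled = [1.5, 2.0, 3.0, 7.0, 8.0, 8.5]

cents = self_train(labeled, unlabeled)
print(predict(4.0, cents), predict(6.0, cents))
```

The model starts from the few labelled items and gradually absorbs the unlabelled ones, which is the basic idea behind many semi-supervised methods.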
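Reinforcement learning: An agent learns by interacting with an environment and receiving
feedback in the form of rewards or penalties.
Examples: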
Game Playing: Training agents to play games like chess or Go, where the agent learns
optimal strategies through trial and error.
Robotics: Training robots to navigate and manipulate objects in an environment.
Classification: Inputs are divided into two or more classes, and the learner must produce
a model that assigns unseen inputs to one or more (multi-label classification) of these classes
and predicts whether or not something belongs to a particular class. This is typically tackled
in a supervised way. Classification models can be categorized in two groups: Binary
classification and Multiclass Classification. Spam filtering is an example of binary
classification, where the inputs are email (or other) messages and the classes are “spam” and
“not spam”.
In this case, an email service provider uses a supervised learning algorithm to classify incoming
emails as "spam" or "not spam." Here's how it works:
1. Data Collection: The system is trained using a large dataset of emails that have been
labeled as spam or not spam by human users.
2. Feature Extraction: The algorithm extracts features from these emails, such as the
presence of certain keywords, the frequency of those keywords, the sender's email address,
and other metadata.
3. Training: The labeled dataset is used to train a machine learning model. The model learns
to associate certain features with spam emails and others with non-spam emails.
4. Model Application: Once trained, the model can analyze new incoming emails in real time,
using the learned features to predict whether each email is spam or not.
5. Feedback Loop: Users can mark emails as spam or not spam, and this feedback is used to
continually improve the model.
For example, when you receive an email, the spam detection system quickly evaluates the content
and metadata of the email and determines whether it should be placed in your inbox or the spam
folder. If you find a spam email in your inbox and mark it as spam, this information helps the
system to improve its future predictions.
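The five steps above can be sketched in a few lines. The example below is a minimal illustration, not how any real email provider works: it assumes a Naive Bayes classifier with Laplace smoothing, uses lowercase words as the only features, and trains on four made-up emails.

```python
import math
from collections import Counter


def tokenize(text):
    """Feature extraction: the lowercase words of the email."""
    return text.lower().split()


def train(emails):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "not spam": Counter()}
    label_counts = Counter()
    for text, label in emails:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["not spam"])
    return word_counts, label_counts, vocab


def classify(text, model):
    """Pick the label with the higher log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Steps 1-3: collect labelled emails (hypothetical) and train the model.
emails = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch with the team monday", "not spam"),
]
model = train(emails)

# Step 4: classify new incoming emails.
first = classify("claim your free money", model)
before = classify("lunch offer monday", model)

# Step 5: the user marks a missed email as spam, and the model is retrained.
emails.append(("lunch offer monday", "spam"))
model = train(emails)
after = classify("lunch offer monday", model)
print(first, before, after)
```

In this sketch the feedback loop is a full retrain on the enlarged dataset; a production system would more likely update the model incrementally.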
Regression: This is also a supervised learning problem, but the model predicts a numeric value,
so the outputs are continuous rather than discrete. For example, predicting stock prices using
historical data.
Problem: Predicting the price of a house based on features such as size, number of bedrooms,
location, age, and other characteristics.
Steps Involved:
1. Data Collection: Gather historical data on houses, including their features and actual sale
prices.
2. Feature Selection: Identify relevant features that influence house prices, such as:
o Square footage
o Number of bedrooms and bathrooms
o Location (e.g., distance to city center, neighborhood quality)
o Age of the house
o Amenities (e.g., swimming pool, garage)
3. Model Training: Use a regression algorithm (e.g., linear regression, polynomial regression,
or more advanced techniques like gradient boosting regression) to learn the relationship
between the features and the house prices.
4. Model Evaluation: Assess the model's accuracy using metrics like Mean Absolute Error
(MAE), Mean Squared Error (MSE), or R-squared on a validation dataset.
5. Real-Time Prediction: Apply the trained model to predict the prices of new houses coming
into the market based on their features.
Example:
Suppose a real estate company wants to use machine learning to predict house prices for their new
listings. They have historical data on thousands of houses:
The company uses this data to train a regression model. Once trained, the model can take the
features of a new house and predict its price in real-time. For example, if a new house comes on the
market with the following features:
The model can predict a price, say $350,000, based on the learned relationships between features
and house prices.
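The workflow above can be sketched with simple linear regression on a single feature. This is a minimal illustration under stated assumptions: only square footage is used, the model is fitted with ordinary least squares, and all training and validation values are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x, slope, intercept):
    return slope * x + intercept


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical training data: square footage and sale price.
train_sqft = [1000, 1500, 2000, 2500]
train_price = [150000, 200000, 250000, 300000]
slope, intercept = fit_line(train_sqft, train_price)

# Model evaluation on a small hypothetical validation set.
val_sqft = [1200, 1800]
val_price = [172000, 228000]
val_pred = [predict(x, slope, intercept) for x in val_sqft]
mae = mean_absolute_error(val_price, val_pred)

# Prediction for a new listing of 3000 square feet.
new_price = predict(3000, slope, intercept)
print(slope, intercept, mae, new_price)
```

A realistic model would use several features (bedrooms, location, age) and a larger validation set; the structure of fit, evaluate, and predict stays the same.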
Clustering: Here, a set of inputs is to be divided into groups. Unlike in classification, the
groups are not known beforehand, making this typically an unsupervised task.
Density estimation: The task is to find the distribution of inputs in some space.
Dimensionality reduction: It simplifies inputs by mapping them into a lower-dimensional
space. Topic modeling is a related problem, where a program is given a list of
human language documents and is tasked to find out which documents cover similar topics.
Problem: Identifying distinct customer groups within a large customer base to tailor marketing
strategies and offers.
Steps Involved:
Example:
An e-commerce company wants to improve its marketing strategy by segmenting its customers into
different groups based on their shopping behavior. They have data on thousands of customers,
including:
Purchase history: Total amount spent, frequency of purchases
Browsing behavior: Pages visited, products viewed
Demographics: Age, location, income
Product preferences: Categories of products frequently bought
The company applies a k-means clustering algorithm to this data. The algorithm groups the
customers into clusters such as:
Real-Time Application:
With these segments identified, the company can implement real-time targeted marketing
campaigns. For example:
Cluster 1: Send personalized recommendations and exclusive early access to new products.
Cluster 2: Offer premium services and high-end product promotions.
Cluster 3: Provide incentives like discounts on their first purchase to convert browsing into
buying.
Cluster 4: Send notifications about sales and special offers.
As customers interact with the website and make purchases, their behavior data is continuously fed
into the clustering model, allowing for dynamic updating of customer segments. This ensures that
the marketing strategies remain relevant and effective over time.
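A minimal k-means sketch of the segmentation idea follows. It rests on several assumptions made for brevity: each customer is described by only two hypothetical features (total amount spent, purchases per year), the number of clusters is 2 rather than 4, and the initial centroids are simply the first k points.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            for p in points]


def kmeans(points, k, iterations=10):
    centroids = [points[i] for i in range(k)]  # simple deterministic start
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)


# Hypothetical customers: (total amount spent, purchases per year).
customers = [(20, 1), (500, 12), (25, 2), (520, 10), (30, 1), (480, 11)]
centroids, labels = kmeans(customers, k=2)

# A new customer is placed in the segment with the nearest centroid.
new_segment = assign([(510, 9)], centroids)[0]
print(centroids, labels, new_segment)
```

In practice the features would be scaled first, since a feature with large values (such as amount spent) otherwise dominates the distance, and rerunning the algorithm on fresh data is what keeps the segments up to date.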
Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners to
create a strong learner. The weak models are trained in series
such that each model tries to correct the errors of the previous
model, until the training data is predicted correctly or a maximum number of
models has been added. One of the most well-known boosting algorithms is
AdaBoost (Adaptive Boosting).
Here are a few popular boosting algorithm frameworks:
AdaBoost (Adaptive Boosting): AdaBoost assigns different weights to
data points, focusing on challenging examples in each iteration. It combines
weighted weak classifiers to make predictions.
Gradient Boosting: Gradient Boosting, including algorithms like
Gradient Boosting Machines (GBM), XGBoost, and LightGBM, optimizes a
loss function by training a sequence of weak learners to minimize the
residuals between predictions and actual values, producing strong
predictive models.
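The AdaBoost idea can be sketched with decision stumps as weak learners. This is a minimal illustration, not a library implementation: it assumes one numeric feature, labels of +1 and -1, and a small made-up dataset that no single stump can classify correctly.

```python
import math


def stump_predict(x, threshold, polarity):
    """Decision stump: `polarity` left of the threshold, `-polarity` otherwise."""
    return polarity if x < threshold else -polarity


def best_stump(xs, ys, weights):
    """Weak learner: the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n  # start with equal weight on every example
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: no single threshold separates the two classes.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
print(predictions == ys)
```

Each stump alone misclassifies at least three points, but the weighted vote of three stumps fits this training set, which illustrates how weak learners combine into a stronger one.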