ML@Chapter 1
Outline
What is machine learning
Foundation of Machine learning
History and relationship to other fields
Applications of machine learning
Types of machine learning techniques
Overview of data mining and KDD process
Prediction vs Description modeling
Machine Learning
Machine learning (ML) is a subfield of artificial intelligence that enables machines to learn from experience.
Here, experience refers to the past information available to the learner, which typically takes
the form of electronic data collected and made available for analysis.
This data could be in the form of digitized human-labeled training sets, or other types of
In all cases, its quality and size are crucial to the success of the predictions made by the
learner.
These algorithms are trained on data to learn the hidden patterns and make predictions
Need for Machine Learning
Human beings, at this moment, are the most intelligent and advanced species on earth because they can
On the other hand, AI is still in its early stages and has not surpassed human intelligence in many aspects.
Then the question is, what is the need to make machines learn? The most suitable reason for doing this
Organizations are investing heavily in technologies such as artificial intelligence, machine learning, and
deep learning to extract key information from data, perform real-world tasks, and solve problems. These
are data-driven decisions taken by machines, particularly to automate processes.
These data-driven decisions can be used, instead of hand-written program logic, in problems whose
solutions cannot be explicitly programmed.
We cannot do without human intelligence, but we also need to solve real-world problems efficiently
at a huge scale. That is why the need for machine learning arises.
History and Foundations of Machine Learning
The history of machine learning dates back to 1959, when Arthur
Samuel developed a program that calculated the probability of winning at checkers for each
side.
The evolution of machine learning over the decades started with the question, "Can
machines think?". Then came the rise of neural networks between 1960 and 1970.
The revolution of Deep Learning started off in the 2010s with the evolution of tasks such
Today, machine learning has turned out to be a revolutionizing technology that has become
processing (NLP) to translate human speech into a written format. To perform voice search
(for example, Siri) or improve text accessibility, many mobile devices incorporate
Customer service: Chatbots are replacing human operators on websites and social media,
affecting client engagement. Chatbots answer shipping FAQs, offer personalized advice,
cross-sell products, and recommend sizes. Some common examples are virtual agents on
e-commerce sites, Slack and Facebook Messenger bots, and virtual and voice assistants.
meaningful information from digital images, videos, and other visual inputs that can then be
used for appropriate action. Computer vision, powered by convolutional neural networks, is
…
Recommendation engines: Using patterns in past data, AI algorithms can detect trends that are useful for
developing more effective marketing strategies. Online retailers use recommendation
engines to suggest relevant products to their customers during the purchasing process.
Robotic process automation (RPA): Also known as software robotics, RPA uses intelligent automation
Automated stock trading: AI-driven high-frequency trading platforms are designed to optimize stock
portfolios and make thousands or even millions of trades each day without human intervention.
Fraud detection: Machine learning can detect suspicious transactions for banks and others in the
financial sector. A model can be trained by supervised learning, based on knowledge of recent fraudulent
transactions. Anomaly detection can identify transactions that appear unusual and need to be followed up.
Social Media: Social media platforms are particularly popular among the youth for their user-friendly
For example, Facebook uses machine learning to observe and record users' activities, including
their chats, likes, and comments, and the time they spend on various posts. Based on these
observations and what it learns from the collected data, it suggests friends and pages to follow.
Types of Machine Learning Methods
Machine learning models can be categorized
mainly into the following four types: supervised, unsupervised, semi-supervised, and reinforcement learning.
Supervised Machine Learning
As input data is fed into the model, its weights are adjusted until it fits
The algorithm then uses this labeled data to make predictions about
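The idea of weights being adjusted until the model fits labeled data can be sketched with a one-variable linear model trained by gradient descent. This is only an illustration: the function name fit_line, the toy data, the learning rate, and the step count are all assumptions and do not come from the slides.

```python
# Minimal sketch: fit y ~ w * x + b by gradient descent on labeled pairs.
# The data, learning rate, and step count are made-up illustrative values.

def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Adjust weight w and bias b step by step to reduce mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors under the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Labeled training set generated from y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(train_x, train_y)
prediction = w * 5.0 + b  # predict the label of an input not in the training set
print(round(w, 3), round(b, 3), round(prediction, 3))
```

The labels (train_y) are what make this supervised: each update is driven by the gap between the model's prediction and the known answer.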
Unsupervised Machine Learning
Semi-supervised Machine Learning
As its name implies, semi-supervised learning is an integration of
regression tasks.
Hence, it is an appropriate method for problems where the data is only partially
labeled.
Self-training, co-training, and graph-based labeling are some of the popular
semi-supervised learning methods.
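As a rough illustration of self-training, the sketch below starts from two labeled points, pseudo-labels the unlabeled points it is confident about, and retrains. The nearest-centroid classifier, the margin-based confidence rule, the min_margin threshold, and the data are assumptions chosen to keep the example small; real self-training setups typically use a stronger base model.

```python
# Self-training sketch with a 1-D nearest-centroid classifier (hypothetical setup).

def centroids(labeled):
    """Mean feature value per class, from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(cents, x):
    """Return (label, margin): nearest class and its lead over the runner-up."""
    ranked = sorted((abs(x - c), y) for y, c in cents.items())
    return ranked[0][1], ranked[1][0] - ranked[0][0]

def self_train(labeled, unlabeled, min_margin=2.0):
    """Repeatedly pseudo-label confident points and refit until none qualify."""
    labeled = list(labeled)
    pool = list(unlabeled)
    while pool:
        cents = centroids(labeled)
        scored = [(x, predict(cents, x)) for x in pool]
        accepted = [(x, label) for x, (label, margin) in scored if margin >= min_margin]
        if not accepted:
            break  # nothing left that the model is confident about
        labeled.extend(accepted)
        accepted_values = {x for x, _ in accepted}
        pool = [x for x in pool if x not in accepted_values]
    return centroids(labeled), pool

seed_labels = [(1.0, 0), (9.0, 1)]          # small labeled set
unlabeled_values = [2.0, 8.0, 3.0, 7.0, 5.2]  # larger unlabeled set

final_centroids, leftover = self_train(seed_labels, unlabeled_values)
print(final_centroids, leftover)
```

Points near the class boundary stay unlabeled, which is the usual safeguard against reinforcing the model's own mistakes.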
Reinforcement Machine Learning
No predefined data
When to use: supervised vs. unsupervised learning or both?
You can use supervised learning techniques to solve problems that have known outcomes and labeled
data available.
Examples include risk evaluation, sales forecasting, email spam classification, image recognition, and stock
You can use unsupervised learning for scenarios where the data is unlabeled and the objective is to discover
You can also use it for exploratory tasks where labeled data is absent.
Examples include organizing large data archives, building recommendation systems, and grouping customers
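Grouping customers without any labels could be sketched with k-means clustering on a single feature. The function name k_means_1d, the naive choice of starting centers, and the monthly spending figures are assumptions for illustration only.

```python
# Unsupervised sketch: plain k-means (Lloyd's algorithm) on one numeric feature.

def k_means_1d(values, k=2, iterations=20):
    """Group numbers into k clusters; returns (centers, clusters)."""
    centers = sorted(values)[:k]  # naive deterministic start, fine for a toy example
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return centers, clusters

monthly_spend = [12, 15, 14, 90, 95, 88]  # hypothetical customer data, no labels
centers, groups = k_means_1d(monthly_spend, k=2)
print(centers, groups)
```

No target column is involved: the two groups emerge only from how the values cluster.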
Semi-supervised learning is when you apply both supervised and unsupervised learning techniques to a
You can apply semi-supervised learning when it’s difficult to obtain labels for a dataset.
You might have a smaller volume of labeled data but a significant amount of unlabeled data.
Stages of Machine Learning
Data collection
Data is a fundamental part of machine learning; the quality and quantity of your data directly affect model
performance.
Different sources such as databases, text files, pictures, sound files, or web scraping may be used for data collection.
Data pre-processing
Data needs to be prepared for machine learning once it has been collected.
This step organizes the data into an appropriate format and makes sure that it is useful for solving your problem.
It involves deleting duplicate data, fixing errors, managing missing data either by eliminating or filling it in, and adjusting
Pre-processing improves the quality of your data and ensures that your machine-learning model can read it correctly.
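Two of the pre-processing steps named above, removing duplicates and filling in missing values, can be sketched in plain Python. The record layout, the "age" column, and the choice of mean imputation are assumptions; dropping incomplete rows would be the other option the text mentions.

```python
# Pre-processing sketch: drop exact duplicate rows, then mean-fill one column.

def preprocess(rows, column="age"):
    """Return de-duplicated copies of rows with missing `column` values filled."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))      # copy so the input is left untouched
    known = [r[column] for r in unique if r[column] is not None]
    mean_value = sum(known) / len(known)
    for r in unique:
        if r[column] is None:
            r[column] = mean_value        # fill the gap with the column mean
    return unique

raw = [
    {"name": "Ann", "age": 30},
    {"name": "Bob", "age": None},   # missing value
    {"name": "Ann", "age": 30},     # exact duplicate
    {"name": "Cy", "age": 40},
]
clean = preprocess(raw)
print(clean)
```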
The next step is to select a machine learning model. Once the data is prepared, we apply it to a model such as linear
regression, a decision tree, or a neural network.
The choice of model generally depends on the kind of data you are dealing with and the problem you are solving.
The size and type of data, complexity, and computational resources should be taken into account when choosing a model to
Training the model
After you have chosen a model, the next step is to train it with the prepared data.
Training is about feeding the data to the model and enabling it to adjust its parameters to predict output more
Once a model has been trained, it is important to assess its performance before deployment. This
means that the model has to be tested on new data that it has not seen during training.
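Keeping some data unseen during training is usually done with a hold-out split. The sketch below is one simple way to do it; the function name train_test_split, the 20% test fraction, and the fixed seed are illustrative assumptions.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(10))                   # stand-in for real records
train_set, test_set = train_test_split(samples)
print(len(train_set), len(test_set))
```

The model is fitted on train_set only; test_set is touched once, at evaluation time.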
Common metrics for evaluating a model include accuracy for classification problems, precision and recall for
binary classification problems, and mean squared error for regression problems.
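The metrics named above can be computed directly from true and predicted values. The following is a minimal sketch; the function names and the sample labels are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_squared_error(y_true, y_pred):
    """Average of squared differences, for regression outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 0, 1, 1, 0]
acc = accuracy(labels, predictions)
prec, rec = precision_recall(labels, predictions)
mse = mean_squared_error([3.0, 5.0, 2.0], [2.5, 5.0, 3.0])
print(acc, prec, rec, mse)
```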
After evaluating the model, you may need to adjust its hyperparameters to improve its performance.
Grid search, where you try different combinations of hyperparameter values, and cross-validation, where you divide your
data into subsets and repeatedly train on all but one subset while validating on the held-out one to check that the model performs well on different data, are
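Grid search and cross-validation are often combined: each candidate hyperparameter value is scored by k-fold cross-validation, and the best-scoring value is kept. The sketch below does this for the number of neighbors in a tiny 1-D nearest-neighbor classifier. The model, the candidate values, the fold count, and the data are all assumptions picked to keep the example short.

```python
def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

def cross_val_accuracy(data, k_neighbors, n_folds=4):
    """Average validation accuracy over n_folds train/validate splits."""
    fold_scores = []
    for fold in range(n_folds):
        # Every n_folds-th point is held out; the rest is used for training.
        validation = data[fold::n_folds]
        training = [p for i, p in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(training, x, k_neighbors) == y for x, y in validation)
        fold_scores.append(correct / len(validation))
    return sum(fold_scores) / n_folds

def grid_search(data, candidates, n_folds=4):
    """Score every candidate value and return the best one with all scores."""
    scores = {k: cross_val_accuracy(data, k, n_folds) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical (feature, label) pairs with two slightly noisy points.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0), (3.0, 1), (3.5, 0),
        (6.0, 1), (6.5, 1), (7.0, 0), (7.5, 1), (8.0, 1), (8.5, 1)]

best_k, scores = grid_search(data, candidates=[1, 3, 5])
print(best_k, scores)
```

Odd candidate values are used so that a two-class vote can never tie.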
Once the model has been trained and optimized, it is ready to make predictions on new data.
This is done by feeding new data to the model and using its output for decision-making or further analysis.
Overview of DM and KDD
Data mining is the process of extracting hidden information from huge sets of
data. In other words, data mining is mining knowledge from data.
Data Mining (DM) is a part of the KDD process relating to methods for extracting patterns from
data [Fayyad].
Data Mining is a problem solving methodology that finds a logical or mathematical description,
of a complex nature, of patterns and regularities in a set of data [Decker and Focardi].
KDD (Knowledge Discovery in Databases) is a process that involves the extraction of useful,
previously unknown, and potentially valuable information from large datasets. The KDD
process is an iterative process and it requires multiple iterations of the above steps to extract
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel,
DM: The non-trivial extraction of implicit, previously unknown and potentially useful
Understand Business
• First identify the company's and project's objectives and the problems that need to be addressed
• Clean the data: handle missing data, data errors, default values, and data corrections.
Evaluation
• Validate models against business goals; change the model, adjust the business goal, or revisit the data
if needed
Deployment
• Generate business intelligence, and continually monitor and maintain the data mining application
Why Data Mining?
Extracting Insights: Data mining techniques allow users to extract useful information and patterns
Decision Making: Data mining contributes to the decision-making process. Businesses can predict
future trends and outcomes with a high degree of confidence through the analysis of historical data.
customers, data mining enables enterprises to gain a more accurate understanding of their clients.
Risk Management: Using data mining techniques to analyze patterns and anomalies in the data,
Improved Efficiency: Data mining, which can greatly enhance the efficiency of operations, aids in
Innovation: Hidden patterns and relationships in the data that can lead to new product ideas,
Personal Development: Knowledge of data mining enhances analytical and problem-solving skills.
Key Areas of Machine Learning