Machine Learning Unit-1.1


Overview of Course

1. Introduction
2. Linear Regression and Decision Trees
3. Instance-based learning and Feature Selection
4. Probability and Bayes Learning
5. Support Vector Machines
6. Neural Network
7. Introduction to Computational Learning Theory
8. Clustering
UNIT-1: Overview of Course
1. Introduction to Machine Learning
2. Definition of Machine Learning
3. Machine Learning with daily life examples
4. Different Types of Learning
5. Applications of Machine Learning
6. Examples of Machine Learning
7. Hypothesis space, Inductive Bias
8. Evaluation, Training and Test set, Cross-validation
9. Overfitting and Underfitting
Figure1:- Machine
• Artificial Intelligence is the concept of creating
smart, intelligent machines: the ability of a machine to
imitate the intelligence of the human brain.

• Machine Learning is a subset of artificial
intelligence that helps you build
AI-driven applications. It is the application of AI
that allows a system to automatically learn and
improve from experience.

• Deep Learning is a subset of Machine Learning
that uses complex algorithms and deep neural
networks to train a model.
Difference Between Machine Learning
And Artificial Intelligence

• Artificial Intelligence is the concept of creating
intelligent machines that simulate human
behaviour, whereas
• Machine learning is a subset of Artificial
Intelligence that allows machines to learn from
data without being explicitly programmed.
Introduction
• Popularity of this field in recent times and the
reasons behind it
– New software/algorithms
• Neural networks
• Deep learning
– New hardware
• GPUs
– Cloud enabled
– Availability of Big Data
• Ever since computers were invented, we have
wondered whether they might be made to learn. If we
could understand how to program them to learn (to
improve automatically with experience), the impact
would be dramatic.

• Imagine computers learning from medical records
which treatments are most effective for new diseases.

• A successful understanding of how to make computers
learn would open up new uses of computers and new
levels of competence and customization.
• We do not yet know how to make computers learn nearly as well as people
learn.

• However, algorithms have been invented that are effective for certain types of
learning tasks, and a theoretical understanding of learning is beginning to
emerge.

• Many practical computer programs have been developed to exhibit useful
types of learning, and significant commercial applications have begun to
appear.

• For problems such as speech recognition, algorithms based on machine
learning outperform all other approaches that have been attempted to date.

• In the field known as data mining, machine learning algorithms are being used
routinely to discover valuable knowledge from large commercial databases
containing equipment maintenance records, loan applications, financial
transactions, medical records, patient records, etc.
• A few specific achievements provide a glimpse of the
state of the art. Programs have been developed that successfully learn to:

• Recognize spoken words.

• Predict recovery rates of pneumonia patients, detect
fraudulent use of credit cards, and drive autonomous
vehicles on public highways.

• Play games such as backgammon at levels approaching
the performance of human world champions.
What is data science?
• Data science is the field of applying advanced analytics techniques and
scientific principles to extract valuable information from data for
decision-making, strategic planning and other business uses.

• Data science combines math and statistics, specialized programming,
advanced analytics, artificial intelligence (AI), and machine learning with
specific subject matter expertise to uncover actionable insights hidden in
an organization’s data.

• Data science is the study of data to extract meaningful insights for
business. It combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering
to analyze large amounts of data.

• These insights can be used to guide decision making and strategic
planning.
Prerequisite for Data Science
1. Machine Learning:- Machine learning is the backbone of data science.
Data scientists need a solid grasp of ML in addition to basic
knowledge of statistics.

2. Modeling:- Mathematical models enable you to make quick calculations
and predictions based on what you already know about the data.

3. Statistics:- A solid grasp of statistics helps you extract more
intelligence and obtain more meaningful results.

4. Programming:- The most common programming languages are Python
and R. Python is especially popular because it is easy to learn and
supports multiple libraries for data science and ML.

5. Databases:- A capable data scientist needs to understand how databases
work, how to manage them, and how to extract data from them.
Different Roles/Jobs in Data Science
• Data Scientist:
– A data scientist’s skillset is typically broader than that of the average data analyst.
Data scientists use common programming languages, such as R and Python,
to conduct statistical inference and data visualization.
– Data scientists require computer science and pure science skills beyond
those of a typical business analyst or data analyst.
– The data scientist must also understand the specifics of the business domain, such as
manufacturing, eCommerce, healthcare, or agriculture.
– Job role: Determine what the problem is, what questions need answers, and
where to find the data. Also, they mine, clean, and present the relevant
data.
– Skills needed: Programming skills (SAS, R, Python), storytelling and data
visualization, statistical and mathematical skills, knowledge of Hadoop, SQL,
and Machine Learning.
• Data Analyst:
– Deals particularly with exploratory data analysis and data visualization.
– Job role: Analysts bridge the gap between the data scientists and the
business analysts, organizing and analyzing data to answer the questions
the organization poses. They take the technical analyses and turn them
into qualitative action items.
– Skills needed: Statistical and mathematical skills, programming skills (SAS,
R, Python), plus experience in data wrangling and data visualization.
• Data Engineer:
– Job role: Data engineers focus on developing, deploying, managing, and
optimizing the organization’s data infrastructure and data pipelines.
Engineers support data scientists by helping to transfer and transform
data for queries.
– Skills needed: NoSQL databases (e.g., MongoDB, Cassandra DB),
programming languages such as Java and Scala, and frameworks (Apache
Hadoop).
• Business Managers:
– The business managers are the people in charge of
overseeing the data science training method. Their primary
responsibility is to collaborate with the data science team to
characterise the problem and establish an analytical method.
Their goal is to ensure projects are completed on time by
collaborating closely with data scientists and IT managers.

• IT Managers:
– They are primarily responsible for developing the
infrastructure and architecture to enable data science
activities. Data science teams are constantly monitored and
resourced accordingly to ensure that they operate efficiently
and safely. IT managers may also be in charge of creating and
maintaining IT environments for data science teams.
Data Science tools
• Data scientists rely on the following popular programming
languages to conduct data analysis and statistical regression.

• Open source tools support pre-built statistical modeling,
machine learning, and graphics capabilities.

1. R (RStudio): R is an open source programming language and
environment for statistical computing and graphics; RStudio is a
widely used development environment for it.

2. Python: A dynamic and flexible programming language.
Python includes numerous libraries, such as NumPy,
Pandas, and Matplotlib, for analyzing data quickly.
Data Science Tools

• Data Analysis: SAS, Jupyter, R Studio, MATLAB, Excel,
RapidMiner

• Data Warehousing: Informatica/Talend, AWS Redshift

• Data Visualization: Jupyter, Tableau, Cognos, RAW

• Machine Learning: Spark MLlib, Mahout, Azure ML
Studio
Programs vs learning algorithms
Algorithmic solution:
Data + Program → Computer → Output

Machine Learning solution:
Data + Output → Computer → Program
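The contrast between the two solutions can be sketched in a few lines of Python. This is a minimal illustration, not a real spam filter: the messages, the function names (handwritten_is_spam, learn_spam_filter) and the word-counting rule are all assumptions made up for this example. In the first case a person writes the rule; in the second, the rule is produced from data paired with the desired output.

```python
from collections import Counter

# Algorithmic solution: the programmer writes the rule by hand.
def handwritten_is_spam(message):
    return "free" in message.lower().split()

# Machine learning solution: data + desired outputs go in, a program comes out.
def learn_spam_filter(examples):
    """examples: list of (message, is_spam) pairs. Returns a classifier function."""
    spam_counts, ham_counts = Counter(), Counter()
    for message, is_spam in examples:
        target = spam_counts if is_spam else ham_counts
        target.update(set(message.lower().split()))
    # Keep words seen more often in spam than in non-spam messages.
    spam_words = {w for w, c in spam_counts.items() if c > ham_counts[w]}

    def learned_is_spam(message):
        return any(w in spam_words for w in message.lower().split())

    return learned_is_spam

# Hypothetical training data (message, desired output).
examples = [
    ("win free prize now", True),
    ("free prize inside", True),
    ("claim prize now", True),
    ("meeting at noon", False),
    ("lunch at noon tomorrow", False),
]
learned_is_spam = learn_spam_filter(examples)

print(handwritten_is_spam("prize draw today"))  # False: the fixed rule only knows "free"
print(learned_is_spam("prize draw today"))      # True: "prize" was learned from the examples
```

The learned function picks up the word "prize" without anyone having written a rule for it, which is the point of the second diagram.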
What is Machine Learning?
• Learning:- “Learning is any process by which a system
improves performance from experience.” - Herbert Simon

• Machine learning is a discipline of computer science that
uses computer algorithms and analytics to build predictive
models that can solve business problems.

• Tom Mitchell's definition of machine learning: “A
computer program is said to learn from experience E with
respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P, improves
with experience E.”
What is Machine Learning
• Machine learning is programming computers to
optimize a performance criterion using data or
past experience.
• There is no need to “learn” to calculate payroll
• Learning is used when:
– Human expertise does not exist (navigating on Mars),
– Humans are unable to explain their expertise (speech
recognition)
– Solution changes in time (routing on a computer
network)
– Solution needs to be adapted to particular cases (user
biometrics)
Machine Learning : Definition
• Learning is the ability to improve one's behaviour based on
experience.

• Build computer systems that automatically improve with
experience.

• What are the fundamental laws that govern all learning
processes?

• Machine Learning explores algorithms that can
– learn from data / build a model from data
– use the model for prediction, decision making or solving some
tasks
• “A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.” [Mitchell]
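Mitchell's definition can be made concrete with a toy learner. The sketch below is an illustration under assumed data, not part of the original slides: the task T is deciding whether a number is at least 50, the performance measure P is accuracy on a fixed test set, and the experience E is a growing list of labelled examples. The learner and the numbers are hypothetical.

```python
def learn_threshold(examples):
    """Place the threshold midway between the largest negative
    and the smallest positive example seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2

def accuracy(threshold, test_set):
    """Performance measure P: fraction of test examples classified correctly."""
    correct = sum((x >= threshold) == bool(label) for x, label in test_set)
    return correct / len(test_set)

# Experience E: labelled examples, revealed a few at a time (hypothetical data).
experience = [(10, 0), (70, 1), (30, 0), (60, 1), (48, 0), (52, 1)]

# Task T: label x as 1 when x >= 50. The test set is fixed in advance.
test_set = [(x, int(x >= 50)) for x in range(0, 100, 5)]

accuracies = []
for n in (2, 4, 6):
    threshold = learn_threshold(experience[:n])
    accuracies.append(accuracy(threshold, test_set))
    print(n, "examples -> threshold", threshold, "accuracy", accuracies[-1])
```

With this data the accuracy rises from 18 of 20 correct, to 19 of 20, to 20 of 20 as more examples are seen, so the program "learns" in Mitchell's sense.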
Components of a learning problem
• Task: The behaviour or task being improved.
– For example: classification, acting in an
environment

• Data: The experiences that are being used to
improve performance in the task.

• Measure of improvement:
– For example: increasing accuracy in prediction,
acquiring new skills, improved speed and efficiency
How Does Machine Learning Work?

• Machine learning accesses vast amounts of data (both structured and unstructured)
and learns from it to make predictions. It learns from the data by using multiple
algorithms and techniques. The flow below shows how a machine learns from
data.

Past Data → Machine Learning Algorithm → Output

• https://www.simplilearn.com/tutorials/artificial-intelligence-tutorial/ai-vs-machine-learning-vs-deep-learning
• Reference Books:
1. Tom Mitchell, Machine Learning. Publisher: McGraw-Hill
Domains and ML Applications
Domain:- Automobile
Example: A robot driving learning problem
• Task T: driving on public four-lane highways
using vision sensors
• Performance measure P: average distance
traveled before an error
• Training experience E: a sequence of images
and steering commands recorded while
observing a human driver
Domain: Health Care

• Diagnose a disease
– Input: symptoms, lab measurements, test results.
– Output: One of a set of possible diseases, or
“none of the above”

• Data: Historical medical records.

• Learn: which future patients will respond best to
which treatments
Association rule
• Association rule learning is an unsupervised learning technique that
checks for the dependency of one data item on another data item
and maps these dependencies so that they can be used profitably.

• Association rule learning is an important approach in
machine learning, and it is employed in market basket analysis,
web usage mining, continuous production, etc. In market basket
analysis, it is used by several big retailers to find the
relations between items.

• Association rule learning algorithms can be divided into three types:
1. Apriori algorithm
2. Eclat algorithm
3. FP-Growth algorithm
Associations: Market Basket Analysis
How does Association Rule Learning work?
• Association rule learning works on the concept of
if-then statements, such as: if A then B.

• Several metrics are used to measure the associations
between thousands of data items:
– Support
– Confidence
– Lift
Applications of Machine Learning:
Learning Associations:
TID Items
1 {Bread, Milk}
2 {Bread, Diaper, Beer, Eggs}
3 {Milk, Diaper, Beer, Cola}
4 {Bread, Milk, Diaper, Beer}
5 {Bread, Milk, Diaper, Cola}

Support Count (σ) – Frequency of occurrence of an itemset, i.e. the number of transactions that contain it.

Here σ({Milk, Diaper, Beer}) = 2
σ({Bread, Diaper, Beer}) = 2
• Association Rule – An implication expression of the
form X -> Y, where X and Y are any 2 itemsets.
• Example: {Milk, Diaper}->{Beer}

• Definition of Support:
• Support is how frequently an itemset appears in the
dataset, as a fraction of all transactions. For a rule X -> Y,
it is computed on the combined itemset of X and Y.
• From the above table, for {Milk, Diaper} -> {Beer}:

• Support(s) = σ({Milk, Diaper, Beer})/|T| = 2/5 = 0.4

Where |T| is the total number of transactions.
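The support count and support for the five-transaction table can be checked with a short Python sketch. The function names support_count and support are chosen for this illustration; transactions are represented as Python sets, which is an assumption about representation, not something the slides prescribe.

```python
# The five transactions from the table, one set of items per TID.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Support: support count divided by the total number of transactions |T|."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"Milk", "Diaper", "Beer"}, transactions))   # 2
print(support_count({"Bread", "Diaper", "Beer"}, transactions))  # 2
print(support({"Milk", "Diaper", "Beer"}, transactions))         # 0.4
```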


• Definition of Confidence:
• How often the items in Y occur in transactions that
already contain X. It is the ratio of the number of
transactions that contain both X and Y to the number
of transactions that contain X.
• From the example {Milk, Diaper} -> {Beer}:
• Confidence(c) = σ({Milk, Diaper, Beer})/σ({Milk, Diaper}) = 2/3 = 0.67
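The confidence of the rule {Milk, Diaper} -> {Beer} can be reproduced as follows. This is a small sketch under the same assumed representation (one Python set per transaction); the function name confidence is chosen for this illustration.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(x, y, transactions):
    """Confidence of the rule x -> y: sigma(x and y together) / sigma(x)."""
    return support_count(x | y, transactions) / support_count(x, transactions)

c = confidence({"Milk", "Diaper"}, {"Beer"}, transactions)
print(round(c, 2))  # 0.67
```

Note that confidence is not symmetric: the rule {Beer} -> {Milk, Diaper} uses σ({Beer}) in the denominator and may give a different value.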
• Definition of Lift(l): The lift of the rule X -> Y is the confidence of the rule
divided by the expected confidence, assuming that the itemsets X and Y
are independent of each other. The expected confidence is the
support (relative frequency) of {Y}.

• If Lift(l) = 1: X and Y appear together exactly as often as expected
under independence.

• If Lift(l) > 1: They appear together more often than expected.

• If Lift(l) < 1: They appear together less often than expected. Greater lift values
indicate stronger association.

• Lift(l) = Support(X,Y)/(Support(X)*Support(Y))
l = Supp({Milk, Diaper, Beer})/(Supp({Milk, Diaper})*Supp({Beer}))
= 0.4/(0.6*0.6)
= 1.11
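The two ways of writing lift (confidence divided by the support of Y, and the joint support divided by the product of the individual supports) give the same number. The sketch below checks this on the example table; the function names and the set-based representation are assumptions made for this illustration.

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(x, y, transactions):
    """Lift of x -> y: confidence of the rule divided by the support of y."""
    confidence = support(x | y, transactions) / support(x, transactions)
    return confidence / support(y, transactions)

x, y = {"Milk", "Diaper"}, {"Beer"}
lift_from_confidence = lift(x, y, transactions)
lift_from_supports = support(x | y, transactions) / (
    support(x, transactions) * support(y, transactions)
)

print(round(lift_from_confidence, 2))  # 1.11
print(round(lift_from_supports, 2))    # 1.11
```

A value slightly above 1 means that Beer appears with {Milk, Diaper} a little more often than it would if the two itemsets were independent.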
Applications of Association Rule Learning

• Below are some popular applications of association rule learning:

1. Market Basket Analysis: One of the most popular applications of
association rule mining. This technique is commonly used by big retailers to
determine the association between items. By discovering which items are
frequently purchased together, retailers can design their marketing accordingly.

2. Medical Diagnosis: Association rules can support diagnosis and treatment
by helping to identify the probability of illness for a particular
disease.

3. Protein Sequence: Association rules help in determining the synthesis of
artificial proteins.

4. Web usage mining: The extraction of various
types of interesting data that is readily available and accessible across the
huge number of web pages on the Internet.
Classification

• Example: Credit scoring
• Differentiating between low-risk and high-risk
customers from their income and savings

Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
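The discriminant translates directly into a small function. In the sketch below the threshold values THETA1 and THETA2 are hypothetical placeholders; in a real system they would be learned from labelled customer data rather than set by hand.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the discriminant: low-risk only if both thresholds are exceeded."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

# Hypothetical thresholds, chosen only for illustration.
THETA1 = 30000  # income threshold
THETA2 = 5000   # savings threshold

print(credit_risk(45000, 8000, THETA1, THETA2))  # low-risk
print(credit_risk(45000, 2000, THETA1, THETA2))  # high-risk
```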
Face Recognition

[Figure: training examples of a person and test images. Source: AT&T Laboratories, Cambridge UK]
http://www.uk.research.att.com/facedatabase.html
Clustering
Regression

• Example: Price of a used car
• x: car attributes, y: price
• Linear model: y = wx + w0
• General form: y = g(x | θ), where g( ) is the model
and θ are its parameters
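For the linear model y = wx + w0, the parameters w and w0 can be estimated from examples by ordinary least squares. The sketch below assumes a single car attribute (age in years) and uses made-up prices in thousands; both the data and the function names fit_line and predict are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w and w0 for the model y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0

def predict(x, w, w0):
    """Evaluate the fitted model g(x | w, w0)."""
    return w * x + w0

# Hypothetical data: car age in years (x) and price in thousands (y).
ages = [1, 2, 3, 4, 5]
prices = [18.5, 15.5, 14.0, 12.5, 9.5]

w, w0 = fit_line(ages, prices)
print("w =", round(w, 2), "w0 =", round(w0, 2))
print("predicted price at age 6:", round(predict(6, w, w0), 2))
```

The negative slope reflects that, in this made-up data, price falls as the car gets older.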
Some other applications
• Fraud detection: credit card providers
• Determine whether or not someone will
default on a home mortgage.
• Understand consumer sentiment based on
unstructured text data.
• Determine customer behavior based on
previous records/patterns.
• Speech recognition
• Face recognition
• Weather forecasting
• NLP:
– detect where entities are mentioned in natural language (NL)
– detect what facts are expressed in NL
– detect if a product/movie review is positive,
negative, or neutral
• Financial:
– Predict if a stock will rise or fall
• Predict if a user will click on an ad or not
