
Machine Learning

Summary of Unit 2

• Draw and explain the flow diagram of the machine learning procedure.


• What are the basic data types in machine learning? Give an example of each one of them.
• Why do we need to explore data? Is there a difference in the way of exploring qualitative data vis-
a-vis quantitative data?
• What are the different measures of central tendency? Why does the mean, in certain data sets, differ
widely from the median?
• Explain how bivariate relationships can be explored using scatter plot. Can outliers be detected
using scatter plot?
• What is a box plot? How can it be used to detect outliers? Draw a box plot for the given data.
• Explain, in detail, the different strategies for addressing missing data values.
• Explain, with proper example, different ways of exploring categorical data.
• What are the techniques provided in data preprocessing? Explain in brief.
Summary of Unit 2
Example: Box-Plot

Construct a box plot for the following data: 12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25
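As a quick check, here is a minimal sketch that draws the requested box plot; matplotlib is an assumed tool choice, since the slides do not prescribe one.

```python
import matplotlib.pyplot as plt

# Data from the exercise above
data = [12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25]

# matplotlib computes the quartiles and the 1.5 * IQR whiskers internally
plt.boxplot(data, vert=False)
plt.title("Box plot of the sample data")
plt.xlabel("Value")
plt.show()
```

Sorted, the data reads 5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53; under the median-of-halves convention this gives median 22, Q1 = 12, and Q3 = 36.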
Outlier detection using Box Plot

An outlier is one that appears to deviate markedly from the other members of the data set
in which it occurs.

 The Inter-Quartile Range (IQR) is the distance between the first and third quartiles.
 Multiply the IQR by 1.5.
 Subtract that value from the 1st quartile to get the lower boundary.
 Add that value to the 3rd quartile to get the upper boundary.
 Values in the data set that fall outside these limits are considered outliers (see the sketch below).
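The fence rule above can be captured in a small helper; this is a minimal sketch in plain Python, assuming the median-of-halves quartile convention (the slides do not fix one), with an illustrative function name.

```python
def iqr_fences(values):
    """Return the (lower, upper) outlier fences using the 1.5 * IQR rule."""
    s = sorted(values)
    n = len(s)

    def median(xs):
        mid = len(xs) // 2
        return xs[mid] if len(xs) % 2 else (xs[mid - 1] + xs[mid]) / 2

    # Median-of-halves convention: split the sorted data in two,
    # dropping the middle element when n is odd.
    q1 = median(s[: n // 2])
    q3 = median(s[n // 2 + n % 2:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```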
Outlier detection using Box Plot: Example 2

24, 58, 61, 67, 71, 73, 76, 79, 82, 83, 85, 87, 88, 88, 92, 93, 94, 97

Median = (82 + 83)/2 = 82.5
Q1 = 71, Q3 = 88

IQR = Q3 - Q1 = 88 - 71 = 17
Lower limit = Q1 - 1.5 · IQR = 71 - 1.5 * 17 = 45.5
Upper limit = Q3 + 1.5 · IQR = 88 + 1.5 * 17 = 113.5
Lower adjacent value = 58
Upper adjacent value = 97
Since 24 lies outside the lower and upper limits, it is a potential outlier.
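Applying the iqr_fences helper sketched earlier to the Example 2 data reproduces these limits:

```python
data = [24, 58, 61, 67, 71, 73, 76, 79, 82, 83, 85, 87, 88, 88, 92, 93, 94, 97]
lo, hi = iqr_fences(data)                     # (45.5, 113.5)
print([x for x in data if x < lo or x > hi])  # [24] -- the potential outlier
```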
Summary of Unit 4

• Explain the need of feature engineering in ML.


• Explain the process of encoding nominal variables.
• Explain the process of transforming numeric features to categorical features
• What are the different distance measures that can be used to determine similarity of features?
• When can a feature be termed as redundant? What are the measures to determine the potentially
redundant features?
• Why is cosine similarity a suitable measure in the context of text categorization? Two rows in a document-
term matrix have the values (2, 3, 2, 0, 2, 3, 3, 0, 1) and (2, 1, 0, 0, 3, 2, 1, 3, 1). Find the cosine similarity.
• How can we calculate Hamming distance? Find the Hamming distance between 10001011 and
11001111.
• Differentiate between PCA and LDA.
• Discuss the different feature selection processes (Filter, Wrapper, Hybrid, Embedded).
• Explain the overall process of feature selection
Cosine similarity measure

Let us calculate the cosine similarity of x and y, where

x = (2, 4, 0, 0, 2, 1, 3, 0, 0) and
y = (2, 1, 0, 0, 3, 2, 1, 0, 1).

x · y = 2*2 + 4*1 + 0*0 + 0*0 + 2*3 + 1*2 + 3*1 + 0*0 + 0*1 = 19
||x|| = sqrt(2^2 + 4^2 + 2^2 + 1^2 + 3^2) = sqrt(34) ≈ 5.83
||y|| = sqrt(2^2 + 1^2 + 3^2 + 2^2 + 1^2 + 1^2) = sqrt(20) ≈ 4.47

cos(x, y) = (x · y) / (||x|| · ||y||) = 19 / (5.83 * 4.47) ≈ 0.73
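The computation above, together with the Hamming-distance question from the Unit 4 list, is easy to verify in a few lines of plain Python; the function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (||a|| * ||b||)."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(c1 != c2 for c1, c2 in zip(s, t))

x = (2, 4, 0, 0, 2, 1, 3, 0, 0)
y = (2, 1, 0, 0, 3, 2, 1, 0, 1)
print(round(cosine_similarity(x, y), 2))         # 0.73

print(hamming_distance("10001011", "11001111"))  # 2 (bits 2 and 6 differ)
```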
Summary of Unit 5

• Examples using marginal probabilities and Bayes' theorem


Joint probabilities
 Joint probability is the probability of two events occurring simultaneously.
 For example, the expression P(height, nationality) describes the probability that a
person has some particular height and some particular nationality. Thus,
P(height = 165cm, nationality = Australian) = 0.5 means there is a 0.5 chance that a
person picked from the population is Australian and has a height of 165 cm.

 The probability of the joint event A and B is given by the product rule:

P(A, B) = P(A | B) * P(B)

 where P(A | B) is the conditional probability of event A happening given that event B
happens.
Marginal probability
 Marginal probability is the probability of the occurrence of a single event, i.e. the
probability of an event irrespective of the outcome of another variable.

 It is obtained by summing the joint probabilities over all outcomes of the second
variable, for a fixed event of the first variable:

P(X = A) = sum over all yi of P(X = A, Y = yi)
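A small sketch with a made-up 2 x 2 joint distribution (the numbers are illustrative, not from the slides) shows how the marginal falls out of the sum rule:

```python
# Hypothetical joint distribution P(X, Y) over X in {A, B} and Y in {y1, y2}
joint = {
    ("A", "y1"): 0.10, ("A", "y2"): 0.30,
    ("B", "y1"): 0.25, ("B", "y2"): 0.35,
}

# Marginal: P(X = A) = sum over yi of P(X = A, Y = yi)
p_x_a = sum(p for (x, y), p in joint.items() if x == "A")
print(p_x_a)  # 0.4
```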
Conditional probability
 The conditional probability of event A given event B is calculated as follows:

P(A | B) = P(A, B) / P(B)

Similarly,

P(B | A) = P(A, B) / P(A)
Bayes rule
 The Bayes rule, also known as Bayes' theorem, can be derived by combining the
definition of conditional probability with the product and sum rules, as below:

P(A | B) = P(B | A) * P(A) / P(B), where P(B) = sum over all a of P(B | A = a) * P(A = a)
Example
 An antibiotic resistance test (random variable T) has 1% false positives (i.e. 1% of
those not resistant to an antibiotic show a positive result in the test) and 5% false
negatives (i.e. 5% of those actually resistant to an antibiotic test negative). Let us
assume that 2% of those tested are resistant to antibiotics.
 Determine the probability that somebody who tests positive is actually resistant
(random variable D).

D = person actually resistant to the antibiotic

PT = positive test result
P(D=1 | PT=1) = ?
Example
Solution:
D = person actually resistant to the antibiotic
PT = positive test result
1% false positives (FP) => 99% true negatives (TN)
5% false negatives (FN) => 95% true positives (TP)
P(D=1) = 0.02
P(D=0) = 0.98
P(PT=1 | D=1) = 0.95
P(PT=0 | D=0) = 0.99
P(PT=0 | D=1) = 0.05
P(PT=1 | D=0) = 0.01

P(D=1 | PT=1) = P(PT=1 | D=1) * P(D=1) / [P(PT=1 | D=1) * P(D=1) + P(PT=1 | D=0) * P(D=0)]
             = (0.95 * 0.02) / (0.95 * 0.02 + 0.01 * 0.98)
             = 0.019 / 0.0288 ≈ 0.66
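The same computation as a short sanity check in Python; the helper name is illustrative.

```python
def bayes_posterior(prior, tpr, fpr):
    """P(H=1 | E=1) for a binary test, where tpr = P(E=1 | H=1) and fpr = P(E=1 | H=0)."""
    evidence = tpr * prior + fpr * (1 - prior)
    return tpr * prior / evidence

# Antibiotic resistance test: prior 2%, true positive rate 95%, false positive rate 1%
print(round(bayes_posterior(0.02, 0.95, 0.01), 2))  # 0.66
```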
Example
 In an airport security checking system, passengers are checked to find any intruder.
Let I, with i ∈ {0, 1}, be the random variable which indicates whether somebody is an
intruder (i = 1) or not (i = 0), and let A, with a ∈ {0, 1}, be the variable indicating the
alarm. An alarm is raised for an identified intruder with probability P(A = 1 | I = 1) = 0.98
and for a non-intruder with probability P(A = 1 | I = 0) = 0.001, which reflects the error
factor. In the population of passengers, the probability that someone is an intruder is
P(I = 1) = 0.00001. What is the probability that a person actually is an intruder when an
alarm is raised?
Example
This can be solved directly with Bayes' theorem:

P(I=1 | A=1) = P(A=1 | I=1) * P(I=1) / [P(A=1 | I=1) * P(I=1) + P(A=1 | I=0) * P(I=0)]
             = (0.98 * 0.00001) / (0.98 * 0.00001 + 0.001 * 0.99999)
             ≈ 0.0097
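Reusing the bayes_posterior helper sketched for the previous example:

```python
# Airport alarm: prior 0.00001, P(A=1 | I=1) = 0.98, P(A=1 | I=0) = 0.001
print(round(bayes_posterior(0.00001, 0.98, 0.001), 4))  # 0.0097
```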
Example
 In preparation for the exam, a student knows that the one question to be solved in the
exam is of type A, B, or C. The probabilities of types A, B, and C appearing in the exam
are 30%, 20%, and 50% respectively. During the preparation, the student solved
9 of 10 problems of type A, 2 of 10 problems of type B, and 6 of 10 problems of type
C.

1. What is the probability that the student will solve the problem of the exam?
2. Given that the student solved the problem, what is the probability that it was of type
A?
Example
 P(A) = 30%
 P(B) = 20%
 P(C) = 50%
 P(Solved | A) = 9/10
 P(Solved | B) = 2/10
 P(Solved | C) = 6/10
1. What is the probability that the student will solve the problem of the exam?

P(Solved) = P(A) * P(Solved | A) + P(B) * P(Solved | B) + P(C) * P(Solved | C)
          = 0.3 * 0.9 + 0.2 * 0.2 + 0.5 * 0.6 = 0.27 + 0.04 + 0.30 = 0.61
Example
 Solution
2. Given that the student solved the problem, what is the probability that it was of type
A?

P(A | Solved) = P(A) * P(Solved | A) / P(Solved) = (0.3 * 0.9) / 0.61 = 0.27 / 0.61 ≈ 0.44
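Both answers can be checked with the total-probability and Bayes formulas in a few lines of Python; the variable names are illustrative.

```python
priors = {"A": 0.3, "B": 0.2, "C": 0.5}
p_solved_given = {"A": 0.9, "B": 0.2, "C": 0.6}

# 1. Total probability: P(Solved) = sum of P(type) * P(Solved | type)
p_solved = sum(priors[t] * p_solved_given[t] for t in priors)
print(round(p_solved, 2))  # 0.61

# 2. Bayes rule: P(A | Solved) = P(A) * P(Solved | A) / P(Solved)
print(round(priors["A"] * p_solved_given["A"] / p_solved, 2))  # 0.44
```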
Summary of Unit 8

• Examples: Apriori algorithm

Apply the Apriori algorithm to the following data set, with the minimum support and
minimum confidence values set to 50% and 75% respectively, to generate large
itemsets and association rules.
Summary of Unit 8

• Examples: Apriori algorithm

Generate frequent itemsets and then association rules from them using the Apriori
algorithm. The minimum support is 50% and the minimum confidence is 70%.
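Since the original transaction tables did not survive extraction, here is a compact sketch of the Apriori procedure on a made-up basket data set, using the thresholds of the second exercise (support 50%, confidence 70%); the item names and transactions are illustrative only.

```python
from itertools import combinations

# Hypothetical transactions; the slides' actual data set is not reproduced here.
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Eggs"},
    {"Milk", "Eggs"},
]
min_support, min_confidence = 0.5, 0.7
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / n

# 1. Find frequent itemsets level by level (Apriori property: every
#    subset of a frequent itemset must itself be frequent).
frequent = {}
level = list({frozenset([i]) for t in transactions for i in t})
while level:
    level = [s for s in level if support(s) >= min_support]
    frequent.update({s: support(s) for s in level})
    # Candidate generation: join frequent k-itemsets into (k+1)-itemsets.
    level = list({a | b for a in level for b in level if len(a | b) == len(a) + 1})

# 2. Generate strong association rules from each frequent itemset.
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            conf = sup / support(lhs)
            if conf >= min_confidence:
                print(f"{set(lhs)} -> {set(itemset - lhs)} "
                      f"(support {sup:.0%}, confidence {conf:.0%})")
```

On this toy data the frequent itemsets are {Milk}, {Bread}, and {Milk, Bread}, and both Milk -> Bread and Bread -> Milk clear the 70% confidence bar at 75%.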
