The lecture covers classification and retrieval methods in text processing, focusing on decision trees as a classification technique. It discusses the process of tree induction, feature selection using information gain, and the challenges of overfitting. The importance of balancing model complexity and generalization is emphasized throughout the lecture.

Lecture 4: Classification and Retrieval

Dr. YI Cheng (易成)


School of Economics and Management
Mar 18, 2024
Last Lecture
• Information Organization
– Categorization/classification

• Text processing basics


– Statistical Properties of Text

• Zipf Distribution

• Statistical Dependence

– Text Processing Process



Document Processing Steps (L3)



Text Processing Applications
• Classification and prediction
– Text categorization (support browsing)

• Information retrieval (support querying)

• Other applications
– Clustering
– Information extraction
– …



Classification as a Prediction Problem
• The classification process



Classification Methods
• Decision trees
• Spatial techniques
• Probabilistic classifiers
• Neural networks
• …



Decision Tree
• The classification process is modeled using a set of
hierarchical decisions on the features, arranged in a
tree-like structure
– Tree structures are common for organizing classification
schemes
• The decision at a particular node, referred to as the
split criterion, is a condition on one or more features,
learned from the training data; it divides the training
data into two or more parts
• The goal is to identify a split criterion that maximizes
the separation of the different classes among the
child nodes (i.e., try to derive pure sets)
Decision Tree Example
A decision tree to help a doctor diagnose a patient’s disease:

[Figure: a tree whose root node tests “Pain” (none, throat, abdomen, chest);
internal nodes such as “Fever” and “Cough” split on yes/no; the leaves are
diagnoses such as Appendicitis, Heart attack, Flu, Strep, Cold, or None.]

Each branch node represents a choice among a number of features.
Each leaf node represents a class/decision.



Example of Tree Induction: Instance Data “buys_computer” (Training Set)
This follows an example of Quinlan’s ID3 algorithm: 14 instances, 2 classes (yes/no).

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no



Output: A Decision Tree for “buys_computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes



Tree Induction Algorithm

▪ The algorithm operates over a set of training instances, C.


▪ If all instances in C are in class P (i.e., pure), create a node
P and stop, otherwise select a feature or attribute F and
create a decision node.
▪ Partition the training instances in C into subsets S
according to the values V of F.
▪ Apply the algorithm recursively to each of the subsets S.
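A minimal sketch of this recursive procedure in Python (not code from the lecture; the dataset format, the entropy helper, and the information_gain function are assumptions that follow the formulas introduced on the following slides). Applied to the 14-instance buys_computer training set, a procedure like this should reproduce the tree shown on the surrounding slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Bits of information required to classify an arbitrary instance."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(F) = entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    children = 0.0
    for value in set(row[feature] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        children += (len(subset) / n) * entropy(subset)
    return entropy(labels) - children

def induce_tree(rows, labels, features):
    """ID3-style induction over training instances C (rows are feature->value dicts)."""
    if len(set(labels)) == 1:                      # pure: create a leaf for class P and stop
        return labels[0]
    if not features:                               # no features left: use the majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    branches = {}
    for value in set(row[best] for row in rows):   # partition C into subsets S by the values V of F
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = induce_tree([rows[i] for i in idx],      # recurse on each subset
                                      [labels[i] for i in idx],
                                      [f for f in features if f != best])
    return {best: branches}
```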



Output: A Decision Tree for “buys_computer”
Which feature to start? What feature follows the root feature? When to stop?

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes



Feature Selection Measure: Information Gain
◼ The most common splitting criterion is called information gain
◼ Based on a purity measure (i.e., homogeneity with respect to
the target variable): entropy

[Figure: three example sets: one not pure (a mix of yes and no, where
1 bit of information is required to distinguish yes and no), one pure
(100% no), and one pure (100% yes).]
Shannon’s Information Theory (L1)
▪ Information Entropy
• A measure of the disorder/uncertainty of a system and inversely
related to the amount of energy available to do work.
• For a discrete random variable X with possible values {x1, ..., xn}
and probability mass function p(xi):

    Entropy:  H(X) = − Σ (i=1..n)  p(xi) log2 p(xi)
    Information content of xi:  I(xi) = − log2 p(xi)
Feature Selection Measure:
Information Gain
◼ Consider a set S of 10 documents, seven of class A and three of class B
◼ p(A) = 7/10 = 0.7
◼ p(B) = 3/10 = 0.3
◼ entropy(S)
  = − [ p(A) log2 p(A) + p(B) log2 p(B) ]
  = − [ 0.7 × log2(0.7) + 0.3 × log2(0.3) ] ≈ 0.88
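A quick check of this number in plain Python (using log base 2, as on the slide):

```python
import math

p_A, p_B = 0.7, 0.3
entropy_S = -(p_A * math.log2(p_A) + p_B * math.log2(p_B))
print(round(entropy_S, 2))   # 0.88
```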



Feature Selection Measure: Information Gain
◼ What is the feature with the highest information gain?
◼ S contains si instances of class Ci, for i = 1, …, m; s is the total number of instances
◼ Information (I) measures the info required to classify any arbitrary instance
  (entropy of the “parent”):

    I(s1, s2, ..., sm) = − Σ (i=1..m)  (si / s) log2 (si / s)

◼ Entropy (E) of feature A with values {a1, a2, …, av}, where sij is the number of
  instances of class Ci in the subset with A = aj (weighted entropy of the “children”):

    E(A) = Σ (j=1..v)  [ (s1j + ... + smj) / s ] × I(s1j, ..., smj)

◼ Information gained by branching on feature A:

    Gain(A) = I(s1, s2, ..., sm) − E(A)


Output: A Decision Tree for “buys_computer”
Which feature to start?

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes



Feature Selection by Information Gain
◼ Class P: buys_computer = “yes”; Class N: buys_computer = “no”
◼ I(p, n) = I(9, 5) = 0.940
◼ Compute the entropy for age:

    age      pi  ni  I(pi, ni)
    <=30     2   3   0.971
    31…40    4   0   0
    >40      3   2   0.971

    E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

    (5/14) I(2,3) means that “age <=30” has 5 out of the 14 samples,
    with 2 yes’es and 3 no’s.

    Hence, Gain(age) = I(p, n) − E(age) = 0.246

◼ Similarly (using the training data shown earlier),
    Gain(income) = 0.029
    Gain(student) = 0.151
    Gain(credit_rating) = 0.048
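The same calculation can be scripted; below is a sketch in Python over the 14-instance training set shown earlier. The printed gains differ from the slide only in the third decimal place for age and student, because the slide subtracts rounded intermediate entropies (0.940 and 0.694).

```python
import math
from collections import Counter

# the 14 training instances: (age, income, student, credit_rating, buys_computer)
data = [
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31…40", "high", "no", "fair", "yes"),        (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),          (">40", "low", "yes", "excellent", "no"),
    ("31…40", "low", "yes", "excellent", "yes"),   ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),         (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31…40", "medium", "no", "excellent", "yes"),
    ("31…40", "high", "yes", "fair", "yes"),       (">40", "medium", "no", "excellent", "no"),
]
COLUMN = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def I(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

labels = [row[-1] for row in data]
print(round(I(labels), 3))                       # 0.94, i.e. the slide's I(9, 5) = 0.940

def gain(feature):
    col = COLUMN[feature]
    e = 0.0                                      # E(A): weighted entropy of the children
    for v in set(row[col] for row in data):
        subset = [row[-1] for row in data if row[col] == v]
        e += len(subset) / len(data) * I(subset)
    return I(labels) - e

for f in COLUMN:
    print(f, round(gain(f), 3))
# age 0.247, income 0.029, student 0.152, credit_rating 0.048
# (the slide's 0.246 and 0.151 come from subtracting the rounded intermediate values)
```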
Decision Tree Classification
• How good is each of the features individually (i.e.,
as the root of the tree)?
– Information gain
– Over the entire set of instances
• If we apply the tree algorithm to the data, what does the
tree look like?
– The feature with the highest information gain should be
the root
– Lower-level features/nodes are selected based on the
subset of instances that reaches them in the tree
– A recursive process of divide and conquer



Decision Tree Example:
Classifying News Stories – whether they
will likely result in a change in stock price



Decision Tree Example:
Classifying News Stories
• 1. Summit Tech announces revenues for the three months ended Dec 31, 1998 were $22.4 million,
an increase of 13%.
• 2. Summit Tech and Autonomous Technologies Corporation announce that the Joint
Proxy/Prospectus for Summit’s acquisition of Autonomous has been declared effective by the SEC.
• 3. Summit Tech said that its procedure volume reached new levels in the first quarter and that it had
concluded its acquisition of Autonomous Technologies Corporation.
• 4. Announcement of annual shareholders meeting.
• 5. Summit Tech announces it has filed a registration statement with the SEC to sell 4,000,000 shares
of its common stock.
• 6. A US FDA panel backs the use of a Summit Tech laser in LASIK procedures to correct
nearsightedness with or without astigmatism.
• 7. Summit up 1-1/8 at 27-3/8.
• 8. Summit Tech said today that its revenues for the three months ended June 30, 1999 increased
14%…
• 9. Summit Tech announces the public offering of 3,500,000 shares of its common stock priced at
$16/share.
• 10. Summit announces an agreement with Sterling Vision, Inc. for the purchase of up to six of
Summit’s state of the art, ApexPlus Laser Systems.
• 11. Preferred Capital Markets, Inc. initiates coverage of Summit Technology Inc. with a Strong Buy
rating and a 12-16 month price target of $22.50.
Feature Selection in Document Classification
• Each document is a feature vector, with features being
TF×IDF values of terms
– Labeled “change” or “no change” in training set

• What term is the most useful for distinguishing a news story
that will lead to substantial stock price changes from one
that will not?
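A sketch of building such feature vectors with scikit-learn's TfidfVectorizer (one common choice, not necessarily what was used for this example; the two story snippets and their class labels below are shortened, made-up stand-ins for the labeled training set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

stories = ["Summit Tech said revenues for the quarter increased 14 percent",
           "Announcement of annual shareholders meeting"]
labels = ["change", "no change"]                # hypothetical class labels

vectorizer = TfidfVectorizer()                  # each document becomes a vector of TFxIDF term weights
X = vectorizer.fit_transform(stories)           # sparse matrix: one row per document, one column per term
print(vectorizer.get_feature_names_out())       # the term vocabulary
print(X.toarray().round(2))                     # the TFxIDF feature vectors, ready for a classifier
```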



Decision Tree Example:
Classifying News Stories
• Terms with high information gain



Information Gain Drawbacks
• Problem: attributes with a large number of
values (extreme case: person’s name or ID code)
code  age     income  student  credit_rating  buys_computer
1     <=30    high    no       fair           no
2     <=30    high    no       excellent      no
3     31…40   high    no       fair           yes
4     >40     medium  no       fair           yes
5     >40     low     yes      fair           yes
6     >40     low     yes      excellent      no
7     31…40   low     yes      excellent      yes
8     <=30    medium  no       fair           no
9     <=30    low     yes      fair           yes
10    >40     medium  yes      fair           yes
11    <=30    medium  yes      excellent      yes
12    31…40   medium  no       excellent      yes
13    31…40   high    yes      fair           yes
14    >40     medium  no       excellent      no



Information Gain Drawbacks
• Subsets are more likely to be pure if there is a
large number of values for a feature
– Information gain is biased towards choosing
features with a large number of values
– This may result in overfitting
• Selection of an attribute that is non-optimal for
prediction in general



Gain Ratio
• Gain ratio: a modification of the information gain
that reduces its bias
• Gain ratio takes number and size of branches
into account when choosing an attribute
– It corrects the information gain by taking the split
information into account
– But it may overcompensate: choose an attribute just
because its split information is very low
• Standard fix: only consider attributes with greater than
average information gain, and then compare them on gain
ratio
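A small sketch of these quantities (split information and gain ratio as defined in C4.5, which the slide alludes to; the numbers in the comments are my own calculations for the earlier buys_computer example, not values given in the lecture):

```python
import math

def split_info(subset_sizes):
    """SplitInfo(A) = - sum over branches of (|Sj|/|S|) * log2(|Sj|/|S|)."""
    n = sum(subset_sizes)
    return -sum(s / n * math.log2(s / n) for s in subset_sizes)

def gain_ratio(gain, subset_sizes):
    return gain / split_info(subset_sizes)

# "age" splits the 14 buys_computer instances into subsets of size 5, 4 and 5
print(round(split_info([5, 4, 5]), 3))           # 1.577
print(round(gain_ratio(0.246, [5, 4, 5]), 3))    # 0.156

# an ID-code attribute splits the data into 14 singleton subsets:
print(round(split_info([1] * 14), 3))            # 3.807, a huge denominator, hence a tiny gain ratio
```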
Discussion
Q: Is a tree with only pure leaves always the best
classifier you can have?



Output: A Decision Tree for “buys_computer”

age?
  <=30   → student?
             no  → no
             yes → yes
  31…40  → yes
  >40    → credit rating?
             excellent → no
             fair      → yes



Output: A Decision Tree for
Consumer Churn Problem



Discussion
Q: Is a tree with only pure leaves always the best
classifier you can have?
A: No.
This tree is the best classifier on the training set,
but possibly not on new and unseen data.
Because of overfitting, the tree may not
generalize very well.



Overfitting
• We want models to apply not just to the exact
training set but to the general population
from which the training data came.
• There is a fundamental trade-off between
model complexity and the possibility of
overfitting.



Overfitting in Tree Induction
• If we continue to split the data, eventually the
subsets will be pure.
• Any training instance given to the tree for
classification will eventually land at the
appropriate leaf → perfectly accurate!
• But at some point, the tree will start to overfit:
it acquires details of the training set that are
not characteristics of the population in
general, as represented by the holdout set
A Typical Fitting Graph for Tree Induction

[Figure: fitting graph; the y-axis is the proportion of correct decisions.]



Avoiding Overfitting with Tree
Induction
• Two common techniques to avoid overfitting:
– (1) to stop growing the tree before it gets too
complex (Prepruning)
– (2) to grow the tree until it is too large, then
“prune” it back, reducing its size (Post-pruning)



Prepruning
• The simplest method is to specify a minimum number
of instances that must be present in a leaf
– But at what threshold?
• Alternatively, base the decision on a statistical significance test
– Stop growing the tree when there is no statistically
significant association between any attribute and the
class at a particular node
• Most popular test: the chi-squared test
• ID3 used the chi-squared test in addition to information gain
– Only statistically significant attributes were allowed to
be selected by the information gain procedure
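A sketch of such a significance check in plain Python (the contingency table is the age-versus-class split from the buys_computer example, the 5% critical values are the standard chi-squared thresholds, and this is an illustration rather than the exact test used in ID3):

```python
# observed counts of (attribute value x class) at the node: age vs. buys_computer
observed = [[2, 3],    # <=30:  2 yes, 3 no
            [4, 0],    # 31…40: 4 yes, 0 no
            [3, 2]]    # >40:   3 yes, 2 no

rows, cols = len(observed), len(observed[0])
n = sum(map(sum, observed))
row_tot = [sum(r) for r in observed]
col_tot = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]

# chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum((observed[i][j] - row_tot[i] * col_tot[j] / n) ** 2 / (row_tot[i] * col_tot[j] / n)
           for i in range(rows) for j in range(cols))

critical = {1: 3.841, 2: 5.991, 3: 7.815}        # 5% significance level
df = (rows - 1) * (cols - 1)
print(round(chi2, 2), chi2 > critical[df])       # about 3.55, False: not significant
# with only 14 instances the association is not significant, so this rule would stop the split
```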
Post-pruning
• Build a full tree first, then cut off leaves and
branches and replace them with leaves
• One general idea is to estimate whether replacing a
set of leaves or a branch with a leaf would reduce
accuracy
– If not, then go ahead and prune
– The process is iterated on successive subtrees
until any further removal or replacement
would reduce accuracy
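A minimal sketch of that idea (reduced-error style pruning against a holdout set; the nested-tuple tree mirrors the buys_computer tree from the lecture, and the two holdout instances are made up for illustration):

```python
# internal node = (feature, {value: subtree}, majority class at the node); a leaf is a class label
tree = ("age", {
    "<=30":  ("student", {"no": "no", "yes": "yes"}, "no"),
    "31…40": "yes",
    ">40":   ("credit_rating", {"fair": "yes", "excellent": "no"}, "yes"),
}, "yes")

def classify(node, x):
    while not isinstance(node, str):
        feature, branches, majority = node
        node = branches.get(x[feature], majority)       # unseen value -> majority class
    return node

def accuracy(node, data):
    return sum(classify(node, x) == y for x, y in data) / len(data)

def prune(node, data):
    """Bottom-up: prune the subtrees first, then replace this node with a leaf
    if doing so does not reduce accuracy on the holdout data."""
    if isinstance(node, str) or not data:
        return node
    feature, branches, majority = node
    branches = {v: prune(sub, [(x, y) for x, y in data if x[feature] == v])
                for v, sub in branches.items()}
    candidate = (feature, branches, majority)
    return majority if accuracy(majority, data) >= accuracy(candidate, data) else candidate

# hypothetical holdout instances: (features, true class)
holdout = [({"age": "<=30", "student": "yes", "credit_rating": "fair"}, "yes"),
           ({"age": ">40",  "student": "no",  "credit_rating": "excellent"}, "no")]
print(prune(tree, holdout))     # with this tiny holdout the full tree survives unpruned
```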



Subtree Replacement
• Bottom-up
• Consider replacing a tree
only after considering all its
subtrees



Summary
• Decision Trees
– splits – binary, multi-way
– split criteria – information gain, gain
ratio, …
– pruning
– …
• No method is always superior –
experiment!



Homework 2 (Due 5pm Mar 25)
• Resource:
https://docs.rapidminer.com/latest/studio/operators/



Information Life Cycle

[Figure: the Information Life Cycle Management cycle: analyzing user
requirement; creation; collection/capture; organization/indexing;
storage/retrieval; distribution/dissemination; reuse/leverage.]



Structure of an IR System

[Figure: an Information Storage and Retrieval System with two lines.
– Search line: interest profiles and queries are formulated in terms of
descriptors and stored in Store 1 (profiles / search requests).
– Storage line: documents and data are indexed (descriptive and subject
indexing) and stored in Store 2 (document representations).
– Rules of the game = rules for subject indexing + thesaurus (which
consists of lead-in vocabulary and indexing language).
– Comparison/matching between the two stores yields potentially
relevant documents.]



Central Concepts in IR
• Documents
• Collections
• User Interface and Queries
• Relevance
• Evaluation



The Retrieval Process

[Figure: the user expresses a need as text through the user interface;
text operations produce a logical view of the query and of the documents;
query operations build the query (refined by user feedback), while the
indexing module and DB manager build an inverted-file index over the
text database; searching the index returns retrieved docs, which are
ranked and presented to the user as ranked docs.]
What is a good structure for index?

- Is this a good algorithm?


- No! Query processing time should be largely
independent of database size.
- Probably proportional to answer size.
What is a good structure for index?
• We need a good data structure to support the
operation:
– “given term t, get all the documents that contain it”
• The structure must support this operation very
efficiently.
• It should be built at preprocessing time, not at
query time
– Can afford to spend some time in its construction.
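In practice this is a map from each term to its postings list, built once at preprocessing time; a minimal sketch in Python with two toy documents (the document contents are made up):

```python
from collections import defaultdict

docs = {"d1": "information organization and categorization",
        "d2": "information retrieval and indexing"}

index = defaultdict(set)                     # term -> set of documents containing it
for doc_id, text in docs.items():            # built once, at preprocessing time
    for term in text.split():
        index[term].add(doc_id)

# query time: a single dictionary lookup, independent of how many documents are stored
print(sorted(index["information"]))          # ['d1', 'd2']
print(sorted(index["retrieval"]))            # ['d2']
```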



Inverted Files
• The crucial data structure for indexing
• A file “inverted” so that rows become
columns and columns become rows

Document-term matrix:              Inverted (term-document) matrix:
docs  t1  t2  t3                   Terms  D1  D2  D3  D4  D5  D6  D7  …
D1    1   0   1                    t1     1   1   0   1   1   1   0
D2    1   0   0                    t2     0   0   1   0   1   1   1
D3    0   1   1                    t3     1   0   1   0   1   0   0
D4    1   0   0
D5    1   1   1
D6    1   1   0
D7    0   1   0
D8    0   1   0
D9    0   0   1
D10   0   1   1



Creating Inverted Files

[Figure: original documents → word extraction → word IDs, each mapped to
the IDs of the documents containing that word, e.g. W1: d1, d2, d3;
W2: d2, d4, d7, d9; …; Wn: di, … dn → the inverted file.]



Creating Inverted Files
• Map the file names to file IDs
• Consider the following original documents:

D1: The Department of Computer Science was established in 1984.
D2: The Department launched its first BSc(Hons) in Computer Studies in 1987.
D3: followed by the MSc in Computer Science which was started in 1991.
D4: The Department also produced its first PhD graduate in 1994.
D5: Our staff have contributed intellectually and professionally to the advancements in these fields.



Creating Inverted Files
• Remove stop words from documents D1–D5 above (the stop words are
highlighted in blue on the original slide).



Creating Inverted Files
• After stemming, make lowercase (optional) and delete numbers (optional):

D1: depart comput scienc establish
D2: depart launch bsc hons comput studi
D3: follow msc comput scienc start
D4: depart produc phd graduat
D5: staff contribut intellectu profession advanc field



Creating Inverted Files (unsorted)

Words       Documents        Words        Documents
depart      d1, d2, d4       produc       d4
comput      d1, d2, d3       phd          d4
scienc      d1, d3           graduat      d4
establish   d1               staff        d5
launch      d2               contribut    d5
bsc         d2               intellectu   d5
hons        d2               profession   d5
studi       d2               advanc       d5
follow      d3               field        d5
msc         d3
start       d3



Creating Inverted Files (sorted)

Words       Documents        Words        Documents
advanc      d5               msc          d3
bsc         d2               phd          d4
comput      d1, d2, d3       produc       d4
contribut   d5               profession   d5
depart      d1, d2, d4       scienc       d1, d3
establish   d1               staff        d5
field       d5               start        d3
follow      d3               studi        d2
graduat     d4
intellectu  d5
launch      d2
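The construction above can be scripted directly; a sketch that rebuilds this sorted inverted file from the stemmed documents D1–D5 (the stemmed term lists are copied from the earlier slide):

```python
from collections import defaultdict

stemmed = {
    "d1": "depart comput scienc establish",
    "d2": "depart launch bsc hons comput studi",
    "d3": "follow msc comput scienc start",
    "d4": "depart produc phd graduat",
    "d5": "staff contribut intellectu profession advanc field",
}

inverted = defaultdict(list)                 # word -> postings list of document IDs
for doc_id, text in stemmed.items():
    for word in text.split():
        if doc_id not in inverted[word]:     # one posting per (word, document) pair
            inverted[word].append(doc_id)

for word in sorted(inverted):                # sort the file alphabetically
    print(word, ",".join(inverted[word]))
# e.g. comput d1,d2,d3   depart d1,d2,d4   scienc d1,d3
```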



Another Example: Creating Inverted Files
• Documents are parsed to extract tokens
• These are saved with the Document ID

Doc 1: Now is the time for all good men to come to the aid of their country
Doc 2: It was a dark and stormy night in the country manor. The time was past midnight

Term      Doc #
now       1
is        1
the       1
time      1
for       1
all       1
good      1
men       1
to        1
come      1
to        1
the       1
aid       1
of        1
their     1
country   1
it        2
was       2
a         2
dark      2
and       2
stormy    2
night     2
in        2
the       2
country   2
manor     2
the       2
time      2
was       2
past      2
midnight  2



Creating Inverted Files
• After all documents have been parsed, the inverted file is sorted
alphabetically:

Term      Doc #
a         2
aid       1
all       1
and       2
come      1
country   1
country   2
dark      2
for       1
good      1
in        2
is        1
it        2
manor     2
men       1
midnight  2
night     2
now       1
of        1
past      2
stormy    2
the       1
the       1
the       2
the       2
their     1
time      1
time      2
to        1
to        1
was       2
was       2



Creating Inverted Files
• Multiple term entries for a single document are merged
• Within-document term frequency information is compiled

Term      Doc #  Freq
a         2      1
aid       1      1
all       1      1
and       2      1
come      1      1
country   1      1
country   2      1
dark      2      1
for       1      1
good      1      1
in        2      1
is        1      1
it        2      1
manor     2      1
men       1      1
midnight  2      1
night     2      1
now       1      1
of        1      1
past      2      1
stormy    2      1
the       1      2
the       2      2
their     1      1
time      1      1
time      2      1
to        1      2
was       2      2
Creating Inverted Files
• Then the file can be split into
– a Dictionary file, and
– a Postings file
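A sketch of that split in Python (the merged (term, doc, frequency) entries below are a small subset of the table on the next slide, just to show the mechanics; the offset field is an assumption about how a dictionary entry typically points into the postings file):

```python
from collections import defaultdict

# merged (term, doc_id, within-document frequency) entries
merged = [("the", 1, 2), ("the", 2, 2), ("their", 1, 1), ("time", 1, 1), ("time", 2, 1)]

grouped = defaultdict(list)
for term, doc_id, freq in merged:
    grouped[term].append((doc_id, freq))

dictionary = {}        # term -> (number of docs, total frequency, offset into the postings file)
postings = []          # flat list of (doc_id, freq) pairs

for term in sorted(grouped):
    entries = grouped[term]
    dictionary[term] = (len(entries), sum(f for _, f in entries), len(postings))
    postings.extend(entries)

print(dictionary["the"])    # (2, 4, 0): "the" appears in 2 docs, 4 times in total
print(postings)             # [(1, 2), (2, 2), (1, 1), (1, 1), (2, 1)]
```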



Creating Inverted Files
Dictionary (Term, N docs, Tot Freq) and Postings (Doc #, Freq), after
collapsing the list to resolve repeats:

Term      N docs  Tot Freq   Postings (Doc #: Freq)
a         1       1          2: 1
aid       1       1          1: 1
all       1       1          1: 1
and       1       1          2: 1
come      1       1          1: 1
country   2       2          1: 1,  2: 1
dark      1       1          2: 1
for       1       1          1: 1
good      1       1          1: 1
in        1       1          2: 1
is        1       1          1: 1
it        1       1          2: 1
manor     1       1          2: 1
men       1       1          1: 1
midnight  1       1          2: 1
night     1       1          2: 1
now       1       1          1: 1
of        1       1          1: 1
past      1       1          2: 1
stormy    1       1          2: 1
the       2       4          1: 2,  2: 2
their     1       1          1: 1
time      2       2          1: 1,  2: 1
to        1       2          1: 2
was       1       2          2: 2



How Inverted Files are Used
Query on “time” AND “dark” (using the dictionary and postings files above;
the postings give the separate documents for each term):
• 2 docs with “time” in the dictionary -> IDs 1 and 2 from the postings file
• 1 doc with “dark” in the dictionary -> ID 2 from the postings file
• Therefore, only doc 2 satisfies the query
Inverted Indexes
• For each term, you get a list consisting of:
– Document ID
– Frequency of term in doc (optional)
– Position of term in doc (optional)
• Permit fast search for individual terms
• These lists can be used to solve Boolean queries:
– country -> d1, d2
– manor -> d2
– country AND manor -> d2
• Also used for statistical ranking algorithms
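A sketch of answering such a Boolean query with set intersection (postings as in the examples above, written with d1/d2 document IDs):

```python
postings = {"country": {"d1", "d2"}, "manor": {"d2"},
            "time": {"d1", "d2"}, "dark": {"d2"}}

def boolean_and(*terms):
    """Intersect the postings lists of all the query terms."""
    result = postings[terms[0]]
    for term in terms[1:]:
        result = result & postings[term]
    return result

print(boolean_and("country", "manor"))   # {'d2'}
print(boolean_and("time", "dark"))       # {'d2'}: only doc 2 satisfies "time AND dark"
```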



Course Schedule

Week   Date           Lesson Topics
1      Feb 26         Introduction
2      Mar 4          Metadata and subject analysis (metadata schemes, controlled vocabularies)
3      Mar 11         Information categorization; computational classification: text processing basics
4      Mar 18         Computational classification: decision tree; information retrieval: inverted indexes
5-6    Mar 25, Apr 1  Information retrieval: models (Boolean, vector space, probabilistic) and evaluation
7      Apr 8          Project presentation
8-9    Apr 15, 22     Web search (link analysis, paid search)
10     Apr 29         Test 1; guest lecture
11-12  May 6, 13      Information and social network (information cascades, social network analysis)
13-14  May 20, 27     Social and ethical issues (pricing of information, information goods market, IP issues); review
15     Jun 3          Test 2
