DATA MINING
DATA CLASSIFICATION
DR PAUL HANCOCK
CURTIN UNIVERSITY
SEMESTER 2
CLASSIFICATION BASICS
Key concepts
- Data Partitioning
- Cross-validation
- Evaluation

Aggarwal Chapters/Sections
- 10.1, 10.2-10.2.1, 10.3, 10.5-10.5.1, 10.8, 10.9
- 11.1, 11.2, 11.3
D: Training data
- Already partitioned into groups, categories, or classes
- Class labels provided by domain experts
- Each sample has a label indicating which class it belongs to
- Used for selecting models and tuning parameters

Y: Test data
- Used to predict class labels
- A test sample must also belong to one of the known categories/classes
- Important statistical (learning) assumption: test samples come from the same distribution that generates the training samples
- If a test sample is identical to a sample in the training data, it must also have the same class label
Train:
  e | x y y a b n e c
  e | b s w a b g e c
  e | b y w l b n e c
  p | x y w p n p e e
  e | b s y a b g e c
  e | x y y l b g e c

Predict:
  ? | x y y a b n e c
  ? | b s y a b w e c

(UCI Mushroom dataset: the first column is the class label, e = edible, p = poisonous; the remaining columns are categorical attributes.)
https://archive-beta.ics.uci.edu/ml/datasets/73
DATA CLASSIFICATION
[Figure: labelled training samples in two groups, Apples and Pears; a new unlabelled sample must be classified as "Apple or Pear?"]
Training data is used to learn the structure of the groups.

Two main phases:
- Training: construct a predictive model from the training data
- Testing: apply the model to test samples to predict labels

Possible forms of the model:
- Extreme case: no model at all, just memory-based (k-NN)
- Partitioning the attribute space into regions of dominant labels (decision trees)
- Linear combination of attributes (SVM, linear discriminant analysis)
- A neural network with suitable weights
- Probabilistic representation (Bayesian methods)
Accuracy
- The fraction of test instances that were correctly labelled: (TP + TN) / Total

Precision
- Fraction of reported positives that are correct: TP / (TP + FP)

Recall
- Fraction of positives that are reported: TP / (TP + FN)

F1-measure
- 2 * Precision * Recall / (Precision + Recall) = 2*TP / (2*TP + FP + FN)

Confusion matrix

                Actual 1          Actual 0
  Predicted 1   True positive     False positive
  Predicted 0   False negative    True negative

Receiver Operating Characteristic (ROC) curve
- Area under the graph of TPR against FPR, a value in [0, 1]
https://en.wikipedia.org/wiki/Receiver_operating_characteristic
Practical interpretations
- Unavoidable trade-off between false alarm rate and detection probability
- Operating point: depends on the specific application
https://towardsdatascience.com/visual-guide-to-the-confusion-matrix-bb63730c8eba
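As a concrete check of the formulas above, here is a minimal Python sketch (the function name and the example counts are invented for illustration):

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)  # equal to 2*P*R / (P + R)
    return accuracy, precision, recall, f1

# Example with invented counts: 40 TP, 10 FP, 5 FN, 45 TN
print(classification_metrics(40, 10, 5, 45))
# -> (0.85, 0.8, 0.888..., 0.842...)
```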
- With class labels: for building the predictive model
- Without class labels: for predicting class labels by applying the model

Question: How to make best use of data with class labels?
- Accurately explaining the training data is not the main goal
- How to predict accuracy on unseen test samples? Take a subset of the labelled data out for validation
- The validation subset needs to reflect the statistics of unseen data
3. Determine the optimal parameters for the classification method using cross-validation
   This will use most of the data for training during each iteration
4. Once you have the optimal parameters, retrain on ALL the data
   This will use all data for training your best model (see the sketch below)
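One way to realise steps 3-4, as a minimal sketch using scikit-learn (the classifier and parameter grid are illustrative assumptions; X and y are assumed to hold all the labelled data):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}  # illustrative parameter grid

# Step 3: 5-fold cross-validation scores each setting, training on 4/5 of
# the data and validating on the remaining 1/5 in each iteration.
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

# Step 4: with refit=True (the default), the best parameter setting is
# automatically retrained on ALL of the data.
best_model = search.best_estimator_
```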
1. Use your training data to determine how you will scale your data (e.g. decide on min/max/μ/σ)
2. Apply your scaling to ALL your data (both the training and test data)
3. If you use test data to decide on the scaling properties, then you leak information into your training data set and become overconfident in the performance of your classifier (see the sketch below)
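A minimal sketch of this rule using scikit-learn's StandardScaler (X_train and X_test are assumed to exist; any scaler with fit/transform behaves the same way):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)                    # step 1: mu and sigma from TRAINING data only
X_train_s = scaler.transform(X_train)  # step 2: apply the scaling to training data...
X_test_s = scaler.transform(X_test)    # ...and to test data, with the SAME mu/sigma

# Fitting the scaler on X_test (or on train+test combined) would leak test
# statistics into training -- the pitfall described in step 3.
```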
Problems
- Some classes rarely occur in the training data and are often misclassified
- Training data is limited, and no new data can be obtained (easily)

Solutions (see the sketch after this list)
- Some algorithms can incorporate weights: down-weight common classes, up-weight rare classes
- Include weights into decision boundaries and/or evaluation metrics
- Biased sampling: over-sample rare classes, or under-sample common classes
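As an illustration of inverse-frequency weighting, a minimal sketch (class names and counts are invented; the formula matches scikit-learn's class_weight="balanced" option):

```python
import numpy as np

y = np.array(["common"] * 95 + ["rare"] * 5)  # invented, heavily imbalanced labels

# Inverse-frequency weights: down-weight common classes, up-weight rare ones.
classes, counts = np.unique(y, return_counts=True)
weights = {c: len(y) / (len(classes) * n) for c, n in zip(classes, counts)}
print(weights)  # {'common': ~0.53, 'rare': 10.0}
```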
Pros
- Simple
- Memory based: no need to train a model
- Can be used with few training examples

Cons
- Slow classification when the training set is large (solution: pre-processing by clustering, e.g. BIRCH)
- Curse of dimensionality: dimensionality reduction?
- Sensitive to local noise/outliers
- Does not exploit data structure

Nearest neighbours: smallest distances = most similar; use k = 1 for the single nearest neighbour (a minimal sketch follows).
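A minimal memory-based k-NN classifier in plain NumPy (the fruit data are invented), showing that "training" is just storing the samples:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=1):
    """Label x by majority vote among its k nearest training samples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the k smallest distances
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])
y_train = np.array(["apple", "apple", "pear", "pear"])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> apple
```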
Model
- A set of hierarchical decisions on the feature variables
- Tree-like structure
- Split criterion: divides a subset of the training data into two or more parts
- Internal nodes: where splits happen
Purity pi: fraction of samples in subset Si having the dominant class label
Error rate: ei = 1 - pi

Given an r-way split of set S into S1, S2, ..., Sr:
- Ni: number of samples in Si
- For each subset compute the error rate ei = 1 - pi
- Compute the weighted average of error rates: sum over i of (Ni / N) * ei, where N is the total number of samples in S
- Repeat this for all possible r-way splits
- Select the one with the lowest weighted average error rate
[Worked example (figure): two candidate splits of the same set of squares and circles. For split 1 the right half has 3 squares and 7 circles (10 samples, dominant class circle); for split 2 the right half has 2 squares and 5 circles (7 samples, dominant class circle).]
Comparing weighted average error rates, split 1 is better (see the sketch below).
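A minimal sketch of the comparison; the right-half counts come from the example above, while the left-half counts are invented so that both splits partition the same 20 samples (10 squares, 10 circles):

```python
def weighted_error(subsets):
    """subsets: one [squares, circles] count pair per part of the split."""
    total = sum(sum(s) for s in subsets)
    # error rate of each part = 1 - purity = 1 - dominant count / part size
    return sum(sum(s) / total * (1 - max(s) / sum(s)) for s in subsets)

split_1 = [[7, 3], [3, 7]]  # left half invented; right half: 3 squares, 7 circles
split_2 = [[8, 5], [2, 5]]  # left half invented; right half: 2 squares, 5 circles
print(weighted_error(split_1))  # 0.30
print(weighted_error(split_2))  # 0.35 -> split 1 has the lower weighted error
```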
SPLITTING WITH ENTROPY
Entropy of a set S: E(S) = - sum over classes j of pj * log2(pj), where pj is the fraction of samples in S with class label j. Select the split with the lowest weighted entropy (see the sketch below).
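The same two candidate splits scored with weighted entropy (a sketch reusing the invented counts from the error-rate example):

```python
import math

def weighted_entropy(subsets):
    """Weighted average of per-part entropies, weights = part size / total."""
    total = sum(sum(s) for s in subsets)
    def entropy(s):
        n = sum(s)
        return -sum(c / n * math.log2(c / n) for c in s if c > 0)
    return sum(sum(s) / total * entropy(s) for s in subsets)

print(weighted_entropy([[7, 3], [3, 7]]))  # ~0.881
print(weighted_entropy([[8, 5], [2, 5]]))  # ~0.927 -> split 1 again wins
```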
Overfitting in decision trees: deep trees can partition the data space with zero training error. Deep trees = more complex model => may not perform well on unseen data.

When to stop splitting?
- Purity: stop when the purity of the current set ≥ a pre-defined purity threshold π

Tree pruning: converting internal nodes to leaf nodes
- Consider both the training error and the tree complexity
- Error is typically measured on the validation subset to evaluate the effectiveness of pruning
[Table: pros and cons of decision trees (content not recovered from the slide)]
Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B), where A = {class of interest} and B = {all samples}.

Categorical attributes
- P(ci): probability of class ci = fraction of samples from ci
- P(x|ci): probability of x given that it belongs to class ci
- Example: the training data have 10 Red and 8 Blue values for class c1; the current test sample x is Red, so P(x|c1) = 10/18
- Note: if some categories have few or zero counts, adjust the base count with +1 (see the sketch below)

Numeric attributes
- Replace the probability with a density f
- Mean and standard deviation estimated from the training data
- Multi-dimensional data: mean vector and covariance matrix
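A minimal sketch of the categorical estimate with the +1 adjustment (Laplace smoothing); the helper name is invented:

```python
def smoothed_prob(count, class_total, n_categories):
    """P(value | class) with every base count adjusted by +1."""
    return (count + 1) / (class_total + n_categories)

# Class c1 has 10 Red and 8 Blue samples (two categories: Red, Blue):
print(10 / 18)                   # raw estimate of P(Red | c1)
print(smoothed_prob(10, 18, 2))  # smoothed: 11/20 = 0.55
print(smoothed_prob(0, 18, 2))   # an unseen category gets 1/20, never 0
```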
- Categorical: joint probability of attribute values
- Numeric: mean vectors and covariance matrix
Evaluation

Classification algorithms:
- k-NN
- Decision trees
- Naïve Bayes