III Unit Mtech 2023

Duplicate Data

Handling Missing Values


• Removing the missing data: we can delete the rows or columns that contain missing values.
• Imputation: we replace the missing values by filling their positions with some other value.
• Forward fill or backward fill: forward fill replaces a missing value with the previous non-missing value, whereas backward fill replaces it with the next non-missing value.
• Interpolation: the missing values are predicted from the observed values in the dataset.
• Linear interpolation assumes a linear relationship between the observed values and the missing data points; it predicts the missing value by fitting a straight line between the two adjacent non-missing points.
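As an illustration only, here is a minimal pandas sketch of these options, assuming a toy DataFrame df with a single numeric column "value" (the names and data are hypothetical):

import pandas as pd
import numpy as np

# Hypothetical toy data with missing entries
df = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan, 5.0]})

dropped = df.dropna()                            # removing rows with missing data
mean_imputed = df.fillna(df["value"].mean())     # imputation with the column mean
ffilled = df.ffill()                             # forward fill: previous non-missing value
bfilled = df.bfill()                             # backward fill: next non-missing value
interpolated = df.interpolate(method="linear")   # linear interpolation between neighbours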
Data discretization refers to a method of converting a large number of data values into a smaller set so that the evaluation and management of the data become easier. In other words, data discretization converts the values of continuous attributes into a finite set of intervals with minimal data loss. There are two forms of data discretization: supervised discretization and unsupervised discretization.
Some well-known data discretization techniques:
• Binarization is a process where numerical features are converted into
binary values based on a specified threshold. Values below the threshold
become 0, while values above or equal to the threshold become 1. This is
particularly useful when converting continuous data into discrete
categories.
The process of binarization involves the selection of a threshold value, and then
converting all pixel values below the threshold to 0 and all pixel values above
the threshold to 1. The choice of threshold is critical and can be determined
using various methods, including manual selection, global thresholding, or
adaptive thresholding.
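A minimal scikit-learn sketch of threshold-based binarization on a small illustrative array; the threshold of 0.5 and the data are arbitrary assumptions. Note that scikit-learn's Binarizer maps values strictly greater than the threshold to 1:

import numpy as np
from sklearn.preprocessing import Binarizer

X = np.array([[0.2, 0.8], [0.5, 0.1], [0.9, 0.4]])   # toy continuous features

binarizer = Binarizer(threshold=0.5)   # values > 0.5 become 1, others become 0
X_bin = binarizer.fit_transform(X)
print(X_bin)                           # [[0. 1.] [0. 0.] [1. 0.]]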
•Manual Threshold Selection: In this approach, the threshold value is chosen
manually by inspecting the histogram of the image or based on domain
knowledge. This method is straightforward but may not be robust across
different images with varying lighting conditions or contrast levels.
•Global Thresholding:
•Global thresholding techniques use a single threshold value for the entire
image. A popular global thresholding method is Otsu's method, which selects
the threshold by minimizing the intra-class variance of the black and white
pixels, effectively separating the background from the foreground.
•Adaptive Thresholding: Adaptive or local thresholding methods determine the
threshold value based on the local neighborhoods of each pixel. This approach
is more flexible and can handle images with varying illumination by considering
the local context of each pixel.

• (global, local, optimization-based)
Sampling means selecting the group that you will actually collect data from in your research.

Purposive sampling: the sample units are selected with a definite purpose in view.
Sample Size
Architecture for feature subset
selection
Entropy-based discretization is a supervised, top-down splitting approach. It uses class distribution information in its calculation and determination of split-points: the method chooses the value of A that gives the minimum entropy as a split-point, and recursively partitions the resulting intervals to arrive at a hierarchical discretization.
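A minimal sketch of this idea on made-up data (the attribute values and labels are illustrative, not from the lecture): it scans the candidate split-points of attribute A and keeps the one with minimum expected (weighted) entropy.

import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(values, labels):
    """Return the split-point of attribute A with minimum weighted entropy."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    best_t, best_e = None, np.inf
    for i in range(1, len(values)):
        t = (values[i - 1] + values[i]) / 2            # candidate midpoint
        left, right = labels[values <= t], labels[values > t]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if e < best_e:
            best_t, best_e = t, e
    return best_t, best_e

# Hypothetical toy data
A = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])
y = np.array(["no", "no", "no", "yes", "yes", "yes"])
print(best_split(A, y))   # split around 5.5 with entropy 0.0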
d = √[(x2 – x1)² + (y2 – y1)²]   (Euclidean distance)

d = max(|x2 – x1|, |y2 – y1|)   (Chebyshev / maximum distance)
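A small NumPy sketch of common point-to-point distance measures (Euclidean, Manhattan, Chebyshev); the point values are arbitrary:

import numpy as np

p1 = np.array([1.0, 2.0])   # (x1, y1)
p2 = np.array([4.0, 6.0])   # (x2, y2)

euclidean = np.sqrt(np.sum((p2 - p1) ** 2))   # sqrt[(x2-x1)^2 + (y2-y1)^2] -> 5.0
manhattan = np.sum(np.abs(p2 - p1))           # |x2-x1| + |y2-y1|           -> 7.0
chebyshev = np.max(np.abs(p2 - p1))           # max(|x2-x1|, |y2-y1|)       -> 4.0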
The cosine similarity of two vectors is defined as the cosine of the angle between them, that is, the dot product of the vectors divided by the product of their lengths.
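A minimal NumPy sketch of this definition (the vectors are arbitrary examples):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# dot product divided by the product of the vector lengths
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)   # 1.0, since b points in the same direction as a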
The density-based clustering tool works by detecting areas where points are concentrated and where they are separated by areas that are empty or sparse.

A dissimilarity measure for cluster analysis is presented and used in the context of probabilistic distance (PD) clustering, e.g., using a Gaussian approach.
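As an illustration of density-based clustering (DBSCAN is one such method, not the specific tool named above), here is a minimal scikit-learn sketch on synthetic points; the eps and min_samples values are arbitrary assumptions:

import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one isolated point in a sparse area
X = np.array([[1.0, 1.0], [1.1, 1.0], [0.9, 1.1],
              [8.0, 8.0], [8.1, 8.2], [7.9, 8.1],
              [4.5, 4.5]])

db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)   # e.g. [0 0 0 1 1 1 -1]; -1 marks a noise point in a sparse area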
The best data visualization tools include Google Charts, Tableau, Grafana, Chartist.js, FusionCharts, Datawrapper, Infogram, ChartBlocks, and D3.js.
Segmentation
• Data Segmentation is the process of taking the data you hold and dividing it up and
grouping similar data together based on the chosen parameters so that you can use it more
efficiently within marketing and operations. Examples of Data Segmentation could be:
Gender.

Types
Demographic, psychographic, behavioural and geographic segmentation are considered the four main types of market segmentation.
• Segmentation can also be approached in three main ways: firmographic, behavioural and needs-based.
• The most basic level of customer segmentation is demographics, also known as firmographics in B2B markets.
• Geographic segmentation splits your audience depending on where they are located (continent, country, region, city, district).
• Psychographic segmentation separates your audience by their personality (interests, attitudes, values).
• Behavioural segmentation divides your audience by their previous behaviour in relation to your brand.
• Needs-based segmentation groups your audience by the similar needs and/or benefits a particular group is seeking (problem-solving needs, emotions, functional needs, value alignment).
demographic transition is a phenomenon and theory which refers to the historical shift
from high birth rates and high death rates in societies with minimal technology,
education (especially of women) and economic development, to low birth rates and
low death rates in societies with advanced technology, education and economic
development, as well as the stages between these two scenarios.
Transactional segmentation, or RFM modelling, looks at the spending
patterns of your customers to identify who your most valuable customers are
and group them by behaviour.
The model catalogues customers according to:
•Recency. How recently a customer purchased from your business.
•Frequency. How often they purchase from you.
•Monetary. How much they spent.
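A minimal pandas sketch of computing recency, frequency and monetary value per customer from a hypothetical transactions table (the column names and data are assumptions, not part of the slide):

import pandas as pd

# Hypothetical transaction history: one row per purchase
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime(["2023-01-05", "2023-03-10", "2023-02-20",
                            "2023-01-15", "2023-02-25", "2023-03-28"]),
    "amount": [50.0, 20.0, 200.0, 10.0, 15.0, 30.0],
})

snapshot = tx["date"].max() + pd.Timedelta(days=1)
rfm = tx.groupby("customer_id").agg(
    recency=("date", lambda d: (snapshot - d.max()).days),  # days since last purchase
    frequency=("date", "count"),                            # number of purchases
    monetary=("amount", "sum"),                             # total spend
)
print(rfm)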
5 Image Segmentation Techniques

Segmentation is the process of classifying the market into several approachable groups.
•Edge-Based Segmentation.
•Threshold-Based Segmentation.
•Region-Based Segmentation.
•Cluster-Based Segmentation.
•Watershed Segmentation.
• The threshold segmentation process can be regarded as the process of
separating foreground from background. Threshold segmentation mainly
extracts foreground based on gray value information.
• Edge-based segmentation relies on edges found in an image using
various edge detection operators
• The basic idea of region splitting is to break the image into a set of
disjoint regions which are coherent within themselves
• Watershed is a region-based technique that utilizes image morphology
• Cluster-based: a method that performs pixel-wise image segmentation by grouping pixels into clusters, e.g., initially taking each point as a separate cluster
Types of Clustering

• Centroid-based Clustering.
• Density-based Clustering.
• Distribution-based Clustering.
• Hierarchical Clustering.
Data transformation
• Data transformation is the process of converting, cleansing, and structuring
data into a usable format that can be analyzed to support decision making
processes, and to propel the growth of an organization. Data transformation
is the process of converting raw data into a more suitable format or
structure for analysis, to improve its quality and make it compatible with the
requirements of a particular task or system.
Types
The most common types of data transformation are:
• Constructive: The data transformation process adds, copies, or replicates
data.
• Destructive: The system deletes fields or records.
• Aesthetic: The transformation standardizes the data to meet requirements or parameters.
Phases
• Understanding the four stages of digital transformation and what you need to move forward:
• Planning.
• Implementation.
• Acceleration.
• Measurement.
techniques
• The different types of data transformation
techniques such as manipulation,
normalization, attribute construction,
generalization, discretization, aggregation,
and smoothing can help solve various
problems that arise in data analysis projects.
• What are the 5 levels of transformation?
• The five stages of change are precontemplation, contemplation,
preparation, action, and maintenance.

Data Transformations Types


• Bucketing/Binning (data binning, also called data discrete binning or data bucketing, is a data pre-processing technique used to reduce the effects of minor observation errors; a short pandas sketch follows this list).
• Data Aggregation.
• Data Cleansing.
• Data Deduplication (data deduplication is a process that eliminates excessive copies of data and significantly decreases storage capacity requirements).
• Data Derivation.
• Data Filtering.
• Data Integration.
• Data Joining.
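A minimal pandas sketch of bucketing/binning, referenced above; the bin edges, labels and data are arbitrary assumptions:

import pandas as pd

ages = pd.Series([22, 25, 34, 41, 58, 63])

# Hand-chosen bin edges; labels mirror the age groups used later in the lecture
bins = [0, 30, 40, 100]
labels = ["<=30", "31-40", ">40"]
age_bins = pd.cut(ages, bins=bins, labels=labels)
print(age_bins.value_counts())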
Machine Learning Algorithm

CLASSIFIER
SVM—History and Applications
• Vapnik and colleagues (1992)—groundwork from Vapnik & Chervonenkis’
statistical learning theory in 1960s
• Features: training can be slow but accuracy is high owing to their ability to
model complex nonlinear decision boundaries (margin maximization)
• Used both for classification and prediction
• Applications:
– handwritten digit recognition, object recognition, speaker
identification, benchmarking time-series prediction tests



SVM—General Philosophy

(Figure: two candidate separating hyperplanes, one with a small margin and one with a large margin; the training tuples that lie on the margin boundaries are the support vectors.)


SVM—Linearly Separable
• A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, ..., wn} is a weight vector and b a scalar (bias)
• For 2-D it can be written as
w0 + w1x1 + w2x2 = 0
• The hyperplanes defining the sides of the margin:
H1: w0 + w1x1 + w2x2 ≥ 1 for yi = +1, and
H2: w0 + w1x1 + w2x2 ≤ –1 for yi = –1
• Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
• This becomes a constrained (convex) quadratic optimization problem: quadratic objective function and linear constraints → Quadratic Programming (QP) → Lagrangian multipliers
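A minimal scikit-learn sketch of a linear SVM on a toy linearly separable dataset; the data are illustrative, and the large C value is an assumption used to approximate the hard-margin case:

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable 2-D data
X = np.array([[1, 1], [2, 1], [1, 2],      # class -1
              [5, 5], [6, 5], [5, 6]])     # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ hard margin
print(clf.coef_, clf.intercept_)              # learned W and b of W.X + b = 0
print(clf.support_vectors_)                   # training tuples on the margin
print(clf.predict([[2, 2], [6, 6]]))          # -> [-1  1]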
SVM—When Data Is Linearly Separable

Let the data D be (X1, y1), …, (X|D|, y|D|), where Xi is a training tuple and yi its associated class label.
There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data).
SVM searches for the hyperplane with the largest margin, i.e., maximum
marginal hyperplane (MMH)

SVM vs. Neural Network
• SVM:
– Relatively new concept
– Deterministic algorithm
– Nice generalization properties
– Hard to learn: learned in batch mode using quadratic programming techniques
– Using kernels, can learn very complex functions
• Neural Network:
– Relatively old, but hot again
– Nondeterministic algorithm
– Generalizes well
– Can easily be learned in incremental fashion
– To learn complex functions, use a multilayer perceptron (not that trivial)
– Local minima
Bayesian Classification

CLASSIFIER
Bayesian Classification

• Statistical classifiers
• Based on Bayes' theorem
• Naïve Bayesian classification
• Class conditional independence
• Bayesian belief networks

Bayesian Theorem: Basics
• Let X be a data sample (“evidence”), class label is unknown
• Let H be a hypothesis that X belongs to class C
• Classification is to determine P(C|X), the probability that the hypothesis holds
given the observed data sample X
• P(C) (prior probability), the initial probability of C
– E.g., X will buy computer, regardless of age, income, …
• P(X): probability that sample data is observed
• P(X|C) (likelihood), the probability of observing the sample X, given that the
hypothesis H holds
– E.g., Given that X will buy computer, the prob. that X is 31..40, medium
income

Bayesian Theorem
• Given training data X, posteriori probability of a hypothesis H, P(C|X),
follows the Bayes theorem

P(C|X) = P(X|C) P(C) / P(X)
• Informally, this can be written as
posteriori = prior x likelihood / evidence
• Predicts X belongs to Ci iff the probability P(Ci|X) is the highest among all
the P(Ck|X) for all the k classes
• Practical difficulty: require initial knowledge of many probabilities,
significant computational cost

Towards Naïve Bayesian Classifier
• Let D be a training set of tuples and their associated
class labels, and each tuple is represented by an n-D
attribute vector X = (x1, x2, …, xn)
• Suppose there are m classes C1, C2, …, Cm.
• Classification is to derive the maximum posteriori,
i.e., the maximal P(Ci|X)
• This can be derived from Bayes’ theorem (1<= i <=
m)
• Since P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)

Derivation of Naïve Bayes Classifier
• A simplified assumption: attributes are conditionally
independent (i.e., no dependence relation between
attributes):
P(X|Ci) = ∏(k = 1..n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
• This greatly reduces the computation cost: Only counts
the class distribution
• If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having
value xk for Ak divided by |Ci, D| (# of tuples of Ci in D)
• If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:
g(x, μ, σ) = (1 / (√(2π) σ)) e^(−(x − μ)² / (2σ²))
and P(xk|Ci) = g(xk, μCi, σCi)
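A minimal sketch of this Gaussian likelihood for a continuous attribute; the class statistics μ = 38 and σ = 12 are arbitrary illustrative values:

import math

def gaussian(x, mu, sigma):
    """g(x, mu, sigma): Gaussian density used as P(xk|Ci) for a continuous attribute."""
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical: mean and standard deviation of 'age' among tuples of class Ci
mu_ci, sigma_ci = 38.0, 12.0
print(gaussian(35.0, mu_ci, sigma_ci))   # P(age = 35 | Ci) under the Gaussian assumption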
Naïve Bayesian Classifier: Training Dataset
Class: C1: buys_computer = 'yes'; C2: buys_computer = 'no'
Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student   credit_rating   buys_computer
<=30    high     no        fair            no
<=30    high     no        excellent       no
31…40   high     no        fair            yes
>40     medium   no        fair            yes
>40     low      yes       fair            yes
>40     low      yes       excellent       no
31…40   low      yes       excellent       yes
<=30    medium   no        fair            no
<=30    low      yes       fair            yes
>40     medium   yes       fair            yes
<=30    medium   yes       excellent       yes
31…40   medium   no        excellent       yes
31…40   high     yes       fair            yes
>40     medium   no        excellent       no
Naïve Bayesian Classifier: An Example
• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357

• Compute P(X|Ci) for each class


P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044


P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
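A short Python sketch that reproduces the arithmetic of this example (the counts are taken from the training table above):

# Naive Bayes arithmetic for X = (age<=30, income=medium, student=yes, credit_rating=fair)
p_yes, p_no = 9 / 14, 5 / 14                              # prior probabilities P(Ci)

likelihood_yes = (2 / 9) * (4 / 9) * (6 / 9) * (6 / 9)    # P(X|yes) ~ 0.044
likelihood_no = (3 / 5) * (2 / 5) * (1 / 5) * (2 / 5)     # P(X|no)  ~ 0.019

score_yes = likelihood_yes * p_yes                        # ~ 0.028
score_no = likelihood_no * p_no                           # ~ 0.007
print("yes" if score_yes > score_no else "no")            # -> yes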

Naïve Bayesian Classifier: Comments
• Advantages
– Easy to implement
– Good results obtained in most of the cases
• Disadvantages
– Assumption: class conditional independence, therefore loss of
accuracy
– Practically, dependencies exist among variables
• E.g., salary and age.
Symptoms: fever, cough etc., Disease: lung cancer, diabetes, etc.
• Dependencies among these cannot be modeled by Naïve Bayesian
Classifier
• How to deal with these dependencies?
– Bayesian Belief Networks
Play-tennis example: estimating P(xi|C)
Outlook    Temperature   Humidity   Windy   Class
sunny      hot           high       false   N
sunny      hot           high       true    N
overcast   hot           high       false   P
rain       mild          high       false   P
rain       cool          normal     false   P
rain       cool          normal     true    N
overcast   cool          normal     true    P
sunny      mild          high       false   N
sunny      cool          normal     false   P
rain       mild          normal     false   P
sunny      mild          normal     true    P
overcast   mild          high       true    P
overcast   hot           normal     false   P
rain       mild          high       true    N

P(p) = 9/14, P(n) = 5/14

outlook:      P(sunny|p) = 2/9      P(sunny|n) = 3/5
              P(overcast|p) = 4/9   P(overcast|n) = 0
              P(rain|p) = 3/9       P(rain|n) = 2/5
temperature:  P(hot|p) = 2/9        P(hot|n) = 2/5
              P(mild|p) = 4/9       P(mild|n) = 2/5
              P(cool|p) = 3/9       P(cool|n) = 1/5
humidity:     P(high|p) = 3/9       P(high|n) = 4/5
              P(normal|p) = 6/9     P(normal|n) = 2/5
windy:        P(true|p) = 3/9       P(true|n) = 3/5
              P(false|p) = 6/9      P(false|n) = 2/5
Play-tennis example: classifying X

• An unseen sample X = <rain, hot, high, false>

• P(X|p)·P(p) =
P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9·2/9·3/9·6/9·9/14 = 0.010582
• P(X|n)·P(n) =
P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5·2/5·4/5·2/5·5/14 = 0.018286

• Sample X is classified in class n (don’t play)

How effective are Bayesian classifiers?

• The class conditional independence assumption makes the computation feasible
• It yields optimal classifiers when the assumption is satisfied
• But the assumption is seldom satisfied in practice, as attributes (variables) are often correlated
• Attempts to overcome this limitation:
– Bayesian networks, that combine Bayesian reasoning with
causal relationships between attributes
– Decision trees, that reason on one attribute at the time,
considering most important attributes first

Using IF-THEN Rules for Classification

• Represent the knowledge in the form of IF-THEN rules


R: IF age = youth AND student = yes THEN buys_computer = yes
– Rule antecedent/precondition vs. rule consequent
• Assessment of a rule: coverage and accuracy (a small sketch follows this list)
– ncovers = # of data points covered by R
– ncorrect = # of data points correctly classified by R
coverage(R) = ncovers / |D|   /* D: training data set */
accuracy(R) = ncorrect / ncovers
• If more than one rule is triggered, we need conflict resolution
– Size ordering: assign the highest priority to the triggering rule that has the “toughest” requirement (i.e., with the most attribute tests)
– Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality (e.g., accuracy) or by experts
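The sketch referenced above: a minimal computation of coverage and accuracy for one rule over a hypothetical tuple list (the rule and the data are illustrative, not from the lecture):

# Rule R: IF age = "youth" AND student = "yes" THEN buys_computer = "yes"
D = [  # hypothetical training tuples: (age, student, buys_computer)
    ("youth", "yes", "yes"),
    ("youth", "yes", "no"),
    ("youth", "no", "no"),
    ("senior", "yes", "yes"),
]

covered = [t for t in D if t[0] == "youth" and t[1] == "yes"]   # tuples matching the antecedent
correct = [t for t in covered if t[2] == "yes"]                 # covered tuples with the predicted class

coverage = len(covered) / len(D)          # n_covers / |D|        -> 0.5
accuracy = len(correct) / len(covered)    # n_correct / n_covers  -> 0.5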

Rule Extraction from a Decision Tree
(Decision tree figure: the root node tests age? with branches <=30, 31..40 and >40; the <=30 branch leads to a student? node, the >40 branch to a credit rating? node, and each leaf holds a yes/no class prediction.)

• Rules are easier to understand than large trees
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction: the leaf holds the class prediction
• Rules are mutually exclusive and exhaustive
• Example: Rule extraction from our buys_computer decision-tree
IF age = young AND student = no THEN buys_computer = no
IF age = young AND student = yes THEN buys_computer = yes
IF age = mid-age THEN buys_computer = yes
IF age = old AND credit_rating = excellent THEN buys_computer = yes
IF age = old AND credit_rating = fair THEN buys_computer = no
