
Data Mining and Machine Learning:

Fundamental Concepts and Algorithms


dataminingbook.info

Mohammed J. Zaki (Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA)
Wagner Meira Jr. (Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)

Chapter 3: Categorical Attributes

Univariate Analysis: Bernoulli Variable
Consider a single categorical attribute, X, with domain dom(X) = {a_1, a_2, ..., a_m} comprising m symbolic values. The data D is an n × 1 symbolic data matrix, given as

D = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

where each point x_i ∈ dom(X).

Bernoulli Variable: special case when m = 2

X(v) = \begin{cases} 1 & \text{if } v = a_1 \\ 0 & \text{if } v = a_2 \end{cases}

i.e., dom(X) = {0, 1}.


Bernoulli Variable: Mean and Variance

The probability mass function (PMF) of X is given as

P(X = x) = f(x) = p^x (1 - p)^{1-x}

The expected value of X is given as

\mu = E[X] = 1 \cdot p + 0 \cdot (1 - p) = p

and the variance of X is given as

\sigma^2 = \mathrm{var}(X) = p(1 - p)

Assume that each symbolic point has been mapped to its binary value, so that {x_1, x_2, ..., x_n} is a random sample drawn from X. The sample mean is given as

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{n_1}{n} = \hat{p}

where n_1 is the number of points with x_i = 1 in the random sample (equal to the number of occurrences of the symbol a_1). The sample variance is given as

\hat{\sigma}^2 = \hat{p}(1 - \hat{p})
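A minimal Python sketch of these estimates (the symbolic sample and the mapping a1 → 1 are made up for illustration):

import numpy as np

# Map a two-valued symbolic sample to Bernoulli values and estimate p̂ and σ̂².
sample = np.array(["a1", "a2", "a2", "a1", "a2"])   # hypothetical data
x = (sample == "a1").astype(int)                     # 1 if v = a1, 0 if v = a2
p_hat = x.mean()                                     # sample mean = n1/n = p̂
var_hat = p_hat * (1 - p_hat)                        # sample variance = p̂(1 − p̂)
print(p_hat, var_hat)                                # 0.4 0.24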

Binomial Distribution: Number of Occurrences
Given the Bernoulli variable X, let {x_1, x_2, ..., x_n} be a random sample of size n. Let N be the random variable denoting the number of occurrences of the symbol a_1 (value X = 1). N has a binomial distribution, given as

f(N = n_1 \mid n, p) = \binom{n}{n_1} p^{n_1} (1 - p)^{n - n_1}

N is the sum of the n independent Bernoulli random variables x_i IID with X, that is, N = \sum_{i=1}^{n} x_i. The mean or expected number of occurrences of a_1 is

\mu_N = E[N] = E\left[\sum_{i=1}^{n} x_i\right] = \sum_{i=1}^{n} E[x_i] = \sum_{i=1}^{n} p = np

The variance of N is

\sigma_N^2 = \mathrm{var}(N) = \sum_{i=1}^{n} \mathrm{var}(x_i) = \sum_{i=1}^{n} p(1 - p) = np(1 - p)
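These quantities can be checked numerically; a short sketch with scipy (the values n = 5, p = 0.4, n1 = 3 are illustrative only):

from scipy.stats import binom

n, p = 5, 0.4
print(binom.pmf(3, n, p))         # f(N = 3 | n, p)
print(n * p, n * p * (1 - p))     # mean np and variance np(1 − p)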

Multivariate Bernoulli Variable
For the general case when dom(X ) = {a1 , a2 , . . . , am }, we model X as an
m-dimensional or multivariate Bernoulli random variable X = (A1 , A2 , . . . , Am )T ,
where each Ai is a Bernoulli variable with parameter pi denoting the probability of
observing symbol ai .
However, X can assume only one of the symbolic values at any one time. Thus,

X(v) = e_i \quad \text{if } v = a_i

where e_i is the i-th standard basis vector in m dimensions. The range of X consists of m distinct vector values {e_1, e_2, ..., e_m}.

The PMF of X is

P(X = e_i) = f(e_i) = p_i = \prod_{j=1}^{m} p_j^{e_{ij}}

with \sum_{i=1}^{m} p_i = 1.
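A minimal sketch of this encoding in Python (the four-symbol domain below is made up; any ordering of dom(X) works):

import numpy as np

domain = ["a1", "a2", "a3", "a4"]            # dom(X), m = 4
def encode(v):
    e = np.zeros(len(domain), dtype=int)
    e[domain.index(v)] = 1                   # e_i: i-th standard basis vector
    return e

print(encode("a2"))                          # [0 1 0 0] = e_2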

Multivariate Bernoulli: Mean

The mean or expected value of X can be obtained as

\mu = E[X] = \sum_{i=1}^{m} e_i f(e_i) = \sum_{i=1}^{m} e_i p_i = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} p_1 + \cdots + \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix} p_m = \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_m \end{pmatrix} = p

The sample mean is

\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \sum_{i=1}^{m} \frac{n_i}{n}\, e_i = \begin{pmatrix} n_1/n \\ n_2/n \\ \vdots \\ n_m/n \end{pmatrix} = \begin{pmatrix} \hat{p}_1 \\ \hat{p}_2 \\ \vdots \\ \hat{p}_m \end{pmatrix} = \hat{p}

where n_i is the number of occurrences of the vector value e_i in the sample, i.e., the number of occurrences of the symbol a_i. Furthermore, \sum_{i=1}^{m} n_i = n.
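In code, the sample mean is just the column-wise average of the one-hot encoded data matrix; a small made-up example:

import numpy as np

# n = 5 points over m = 4 symbols, already mapped to standard basis vectors.
X = np.array([[0, 1, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])
p_hat = X.mean(axis=0)            # \hat{p} = (n_1/n, ..., n_m/n)
print(p_hat)                      # [0.  0.6 0.4 0. ]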

Multivariate Bernoulli Variable: sepal length

We discretize sepal length into four bins:

Bins        Domain            Counts
[4.3, 5.2]  Very Short (a_1)  n_1 = 45
(5.2, 6.1]  Short (a_2)       n_2 = 50
(6.1, 7.0]  Long (a_3)        n_3 = 43
(7.0, 7.9]  Very Long (a_4)   n_4 = 12

We model sepal length as a multivariate Bernoulli variable X:

X(v) = \begin{cases} e_1 = (1,0,0,0)^T & \text{if } v = a_1 \\ e_2 = (0,1,0,0)^T & \text{if } v = a_2 \\ e_3 = (0,0,1,0)^T & \text{if } v = a_3 \\ e_4 = (0,0,0,1)^T & \text{if } v = a_4 \end{cases}

For example, the symbolic point x_1 = Short = a_2 is represented as the vector (0, 1, 0, 0)^T = e_2.

The total sample size is n = 150; the estimates p̂_i are:

p̂_1 = 45/150 = 0.300
p̂_2 = 50/150 = 0.333
p̂_3 = 43/150 = 0.287
p̂_4 = 12/150 = 0.080

[Figure: probability mass function f(x), with mass 0.3 at e_1 (Very Short), 0.333 at e_2 (Short), 0.287 at e_3 (Long), and 0.08 at e_4 (Very Long).]

Multivariate Bernoulli Variable: Covariance Matrix
We have X = (A1 , A2 , . . . , Am )T , where Ai is the Bernoulli variable corresponding to
symbol a_i. The variance for each Bernoulli variable A_i is

\sigma_i^2 = \mathrm{var}(A_i) = p_i(1 - p_i)

The covariance between A_i and A_j is

\sigma_{ij} = E[A_i A_j] - E[A_i] \cdot E[A_j] = 0 - p_i p_j = -p_i p_j

a negative relationship, since A_i and A_j cannot both be 1 at the same time.

The covariance matrix for X is given as

\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix} = \begin{pmatrix} p_1(1-p_1) & -p_1 p_2 & \cdots & -p_1 p_m \\ -p_1 p_2 & p_2(1-p_2) & \cdots & -p_2 p_m \\ \vdots & \vdots & \ddots & \vdots \\ -p_1 p_m & -p_2 p_m & \cdots & p_m(1-p_m) \end{pmatrix}

More compactly, \Sigma = \mathrm{diag}(p) - p\,p^T, where \mu = p = (p_1, \ldots, p_m)^T.
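A one-line check of the compact form with numpy, using the sepal length estimates p̂ from the earlier example:

import numpy as np

p = np.array([0.3, 0.333, 0.287, 0.08])      # \hat{p} for sepal length
Sigma = np.diag(p) - np.outer(p, p)          # diag(p) − p pᵀ
print(Sigma.round(3))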

Categorical, Mapped Binary and Centered Dataset
Modeling X as a multivariate Bernoulli variable is equivalent to treating the mapped points X(x_i) as a new n × m binary data matrix:

     X      |  A1  A2  |   Z1    Z2
x1   Short  |   0   1  |  -0.4   0.4
x2   Short  |   0   1  |  -0.4   0.4
x3   Long   |   1   0  |   0.6  -0.6
x4   Short  |   0   1  |  -0.4   0.4
x5   Long   |   1   0  |   0.6  -0.6

X is the multivariate Bernoulli variable

X(v) = \begin{cases} e_1 = (1, 0)^T & \text{if } v = \text{Long } (a_1) \\ e_2 = (0, 1)^T & \text{if } v = \text{Short } (a_2) \end{cases}

The sample mean and covariance matrix are

\hat{\mu} = \hat{p} = (2/5, 3/5)^T = (0.4, 0.6)^T

\hat{\Sigma} = \mathrm{diag}(\hat{p}) - \hat{p}\hat{p}^T = \begin{pmatrix} 0.24 & -0.24 \\ -0.24 & 0.24 \end{pmatrix}

From the centered data Z = (Z_1, Z_2)^T we obtain the same matrix:

\hat{\Sigma} = \frac{1}{5} Z^T Z = \begin{pmatrix} 0.24 & -0.24 \\ -0.24 & 0.24 \end{pmatrix}
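A quick numerical check of both forms of the sample covariance matrix for this small dataset:

import numpy as np

X = np.array([[0, 1], [0, 1], [1, 0], [0, 1], [1, 0]])   # binary matrix (A1, A2)
n = X.shape[0]
p_hat = X.mean(axis=0)                                    # (0.4, 0.6)
Z = X - p_hat                                             # centered data
print(np.diag(p_hat) - np.outer(p_hat, p_hat))            # diag(p̂) − p̂p̂ᵀ
print(Z.T @ Z / n)                                        # (1/n) Zᵀ Z, same matrix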
Multinomial Distribution: Number of Occurrences
Let {x_1, x_2, ..., x_n} be a random sample from X. Let N_i be the random variable denoting the number of occurrences of symbol a_i in the sample, and let N = (N_1, N_2, ..., N_m)^T. N has a multinomial distribution, given as

f\big(N = (n_1, n_2, \ldots, n_m) \mid p\big) = \binom{n}{n_1\, n_2\, \ldots\, n_m} \prod_{i=1}^{m} p_i^{n_i}

The mean and covariance matrix of N are:

\mu_N = E[N] = n\,E[X] = n \cdot \mu = n \cdot p = (np_1, \ldots, np_m)^T

\Sigma_N = n \cdot (\mathrm{diag}(p) - p\,p^T) = \begin{pmatrix} np_1(1-p_1) & -np_1p_2 & \cdots & -np_1p_m \\ -np_1p_2 & np_2(1-p_2) & \cdots & -np_2p_m \\ \vdots & \vdots & \ddots & \vdots \\ -np_1p_m & -np_2p_m & \cdots & np_m(1-p_m) \end{pmatrix}

The sample mean and covariance matrix for N are

\hat{\mu}_N = n\hat{p} \qquad \hat{\Sigma}_N = n\big(\mathrm{diag}(\hat{p}) - \hat{p}\hat{p}^T\big)
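A short scipy sketch for the sepal length counts (n = 150, using p̂ as the multinomial parameter purely for illustration):

import numpy as np
from scipy.stats import multinomial

n = 150
p = np.array([45, 50, 43, 12]) / n               # p̂ for sepal length
print(multinomial.pmf([45, 50, 43, 12], n, p))   # probability of the observed counts
print(n * p)                                      # μ_N = n·p
print(n * (np.diag(p) - np.outer(p, p)))          # Σ_N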
Bivariate Analysis
Assume the data comprises two categorical attributes, X1 and X2, with

dom(X1) = {a_{11}, a_{12}, ..., a_{1 m_1}}
dom(X2) = {a_{21}, a_{22}, ..., a_{2 m_2}}

We model X1 and X2 as multivariate Bernoulli variables X_1 and X_2 with dimensions m_1 and m_2, respectively. Their joint distribution is modeled as the (m_1 + m_2)-dimensional vector variable X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, with

X\big((v_1, v_2)^T\big) = \begin{pmatrix} X_1(v_1) \\ X_2(v_2) \end{pmatrix} = \begin{pmatrix} e_{1i} \\ e_{2j} \end{pmatrix}

provided that v_1 = a_{1i} and v_2 = a_{2j}.

The joint PMF for X is given as the m_1 × m_2 matrix

P_{12} = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1m_2} \\ p_{21} & p_{22} & \cdots & p_{2m_2} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m_1 1} & p_{m_1 2} & \cdots & p_{m_1 m_2} \end{pmatrix}

Bivariate Empirical PMF: sepal length and sepal width

X1: sepal length
Bins        Domain            Counts
[4.3, 5.2]  Very Short (a_1)  n_1 = 45
(5.2, 6.1]  Short (a_2)       n_2 = 50
(6.1, 7.0]  Long (a_3)        n_3 = 43
(7.0, 7.9]  Very Long (a_4)   n_4 = 12

X2: sepal width
Bins        Domain        Counts
[2.0, 2.8]  Short (a_1)   47
(2.8, 3.6]  Medium (a_2)  88
(3.6, 4.4]  Long (a_3)    15

Observed counts (n_ij):
                                X2
                     Short (e_21)  Medium (e_22)  Long (e_23)
X1  Very Short (e_11)      7            33              5
    Short (e_12)          24            18              8
    Long (e_13)           13            30              0
    Very Long (e_14)       3             7              2

Bivariate Empirical PMF: sepal length and sepal width

Joint probabilities: p̂_ij = n_ij / n.

[Figure: empirical joint PMF f(x) over the value pairs (e_1i, e_2j); for example, p̂_11 = 7/150 ≈ 0.047, p̂_12 = 33/150 = 0.22, and p̂_33 = 0/150 = 0.]
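The joint probabilities can be computed directly from the contingency counts; a brief numpy sketch:

import numpy as np

N12 = np.array([[ 7, 33, 5],
                [24, 18, 8],
                [13, 30, 0],
                [ 3,  7, 2]])      # observed counts n_ij
print((N12 / N12.sum()).round(3))  # empirical joint PMF p̂_ij = n_ij / n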

Attribute Dependence: Contingency Analysis

The contingency table for X_1 and X_2 is the m_1 × m_2 matrix of observed counts n_ij:

N_{12} = n \cdot \hat{P}_{12} = \begin{pmatrix} n_{11} & n_{12} & \cdots & n_{1m_2} \\ n_{21} & n_{22} & \cdots & n_{2m_2} \\ \vdots & \vdots & \ddots & \vdots \\ n_{m_1 1} & n_{m_1 2} & \cdots & n_{m_1 m_2} \end{pmatrix}

where \hat{P}_{12} is the empirical joint PMF for X_1 and X_2. The contingency table is augmented with the row and column marginal counts, as follows:

N_1 = n \cdot \hat{p}_1 = \big(n_1^1, \ldots, n_{m_1}^1\big)^T \qquad N_2 = n \cdot \hat{p}_2 = \big(n_1^2, \ldots, n_{m_2}^2\big)^T

N_1 and N_2 have multinomial distributions with parameters p_1 = (p_1^1, \ldots, p_{m_1}^1) and p_2 = (p_1^2, \ldots, p_{m_2}^2), respectively. N_{12} also has a multinomial distribution with parameters P_{12} = \{p_{ij}\}, for 1 ≤ i ≤ m_1 and 1 ≤ j ≤ m_2.

Contingency Table: sepal length vs. sepal width

                          Sepal width (X2)
                     Short     Medium    Long
Sepal length (X1)    (a_21)    (a_22)    (a_23)    Row counts
Very Short (a_11)       7         33        5      n_1^1 = 45
Short (a_12)           24         18        8      n_2^1 = 50
Long (a_13)            13         30        0      n_3^1 = 43
Very Long (a_14)        3          7        2      n_4^1 = 12
Column counts      n_1^2 = 47  n_2^2 = 88  n_3^2 = 15   n = 150

Chi-Squared Test for Independence
Assume X_1 and X_2 are independent. Then their joint PMF is

\hat{p}_{ij} = \hat{p}_i^1 \cdot \hat{p}_j^2

The expected frequency for each pair of values is

e_{ij} = n \cdot \hat{p}_{ij} = n \cdot \hat{p}_i^1 \cdot \hat{p}_j^2 = n \cdot \frac{n_i^1}{n} \cdot \frac{n_j^2}{n} = \frac{n_i^1 n_j^2}{n}

The \chi^2 statistic quantifies the difference between observed and expected counts:

\chi^2 = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}

The sampling distribution of the \chi^2 statistic follows the chi-squared density function

f(x \mid q) = \frac{1}{2^{q/2}\,\Gamma(q/2)}\, x^{\frac{q}{2}-1} e^{-\frac{x}{2}}

where q is the number of degrees of freedom:

q = |dom(X_1)| \cdot |dom(X_2)| - \big(|dom(X_1)| + |dom(X_2)|\big) + 1 = m_1 m_2 - m_1 - m_2 + 1 = (m_1 - 1)(m_2 - 1)
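In practice the whole test is one call in scipy; a sketch on the sepal length vs. sepal width contingency table shown earlier:

import numpy as np
from scipy.stats import chi2_contingency

N12 = np.array([[ 7, 33, 5],
                [24, 18, 8],
                [13, 30, 0],
                [ 3,  7, 2]])
chi2, p_value, dof, expected = chi2_contingency(N12)
print(chi2, dof, p_value)      # ≈ 21.8, 6, 0.0013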
Chi-Squared Test: sepal length and sepal width
Expected counts (e_ij):
                                X2
                     Short (a_21)  Medium (a_22)  Long (a_23)
X1  Very Short (a_11)    14.10         26.40          4.50
    Short (a_12)         15.67         29.33          5.00
    Long (a_13)          13.47         25.23          4.30
    Very Long (a_14)      3.76          7.04          1.20

Observed counts (n_ij):
                                X2
                     Short (a_21)  Medium (a_22)  Long (a_23)
X1  Very Short (a_11)      7            33              5
    Short (a_12)          24            18              8
    Long (a_13)           13            30              0
    Very Long (a_14)       3             7              2

The chi-squared statistic value is \chi^2 = 21.8. The number of degrees of freedom is

q = (m_1 - 1) \cdot (m_2 - 1) = 3 \cdot 2 = 6

Chi-Squared Distribution (q = 6).

The p-value of a statistic θ is defined as the probability of obtaining a value at least as extreme as the observed value. The null hypothesis, that X1 and X2 are independent, is rejected if p-value(z) ≤ α, say α = 0.01. We have p-value(21.8) = 0.0013. Thus, we reject the null hypothesis and conclude that X1 and X2 are dependent.

[Figure: chi-squared density f(x | q = 6); the H0 rejection region for α = 0.01 starts at the critical value x = 16.8, and the observed statistic 21.8 lies inside it.]
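The p-value and critical value can be recovered from the chi-squared distribution; a brief scipy check:

from scipy.stats import chi2

q = 6
print(chi2.sf(21.8, q))       # p-value ≈ 0.0013
print(chi2.ppf(0.99, q))      # critical value ≈ 16.8 for α = 0.01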
Multiway Contingency Analysis
Given X = (X_1, X_2, \ldots, X_d)^T, the chi-squared statistic is given as

\chi^2 = \sum_{\mathbf{i}} \frac{(n_{\mathbf{i}} - e_{\mathbf{i}})^2}{e_{\mathbf{i}}} = \sum_{i_1=1}^{m_1} \sum_{i_2=1}^{m_2} \cdots \sum_{i_d=1}^{m_d} \frac{(n_{i_1,i_2,\ldots,i_d} - e_{i_1,i_2,\ldots,i_d})^2}{e_{i_1,i_2,\ldots,i_d}}

Under the null hypothesis that the attributes are independent, the expected number of occurrences of the symbol tuple (a_{1i_1}, a_{2i_2}, \ldots, a_{di_d}) is given as

e_{\mathbf{i}} = n \cdot \hat{p}_{\mathbf{i}} = n \cdot \prod_{j=1}^{d} \hat{p}_{i_j}^{\,j} = \frac{n_{i_1}^1 n_{i_2}^2 \cdots n_{i_d}^d}{n^{d-1}}

The total number of degrees of freedom for the chi-squared distribution is given as

q = \prod_{i=1}^{d} |dom(X_i)| - \sum_{i=1}^{d} |dom(X_i)| + (d - 1) = \left(\prod_{i=1}^{d} m_i\right) - \left(\sum_{i=1}^{d} m_i\right) + d - 1
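The expected counts under full independence are just a scaled outer product of the marginal counts; a sketch for the d = 3 Iris example that follows:

import numpy as np

n = 150
n1 = np.array([45, 50, 43, 12])    # sepal length marginals
n2 = np.array([47, 88, 15])        # sepal width marginals
n3 = np.array([50, 50, 50])        # Iris type marginals
e = np.einsum('i,j,k->ijk', n1, n2, n3) / n**2   # e_i = n1_i · n2_j · n3_k / n^{d-1}
print(e[0, 0, 0])                  # ≈ 4.70 expected (a11, a21, a31) tuples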

3-Way Contingency Table
X1 : sepal length, X2 : sepal width and X3 : Iris type

[Figure: 3-way contingency table, a 4 × 3 × 3 cube of observed counts over X1 (sepal length, marginal counts 45, 50, 43, 12), X2 (sepal width, marginal counts 47, 88, 15), and X3 (Iris type, marginal counts 50, 50, 50), with n = 150.]
3-Way Contingency Analysis

Expected counts (the same value applies for each Iris type a_31, a_32, a_33, since each type contains 50 points):

                   X2
            a_21    a_22    a_23
     a_11   4.70    8.80    1.50
X1   a_12   5.22    9.78    1.67
     a_13   4.49    8.41    1.43
     a_14   1.25    2.35    0.40

The value of the χ2 statistic is χ2 = 231.06, and the number of degrees of freedom is
q = 4 · 3 · 3 − (4 + 3 + 3) + 2 = 36 − 10 + 2 = 28.
For a significance level of α = 0.01, the critical value of the chi-square distribution is
z = 48.28.
The observed value of χ2 = 231.06 is much greater than z, and it is thus extremely
unlikely to happen under the null hypothesis. We conclude that the three attributes are
not 3-way independent, but rather there is some dependence between them.

Distance and Angle
With the modeling of categorical attributes as multivariate Bernoulli variables, it is
possible to compute the distance or the angle between any two points x i and x j :
   
x_i = \begin{pmatrix} e_{1 i_1} \\ \vdots \\ e_{d i_d} \end{pmatrix} \qquad x_j = \begin{pmatrix} e_{1 j_1} \\ \vdots \\ e_{d j_d} \end{pmatrix}

The different measures of distance and similarity rely on the number of matching and mismatching values (or symbols) across the d attributes X_k. The number of matching values s is given as

s = x_i^T x_j = \sum_{k=1}^{d} (e_{k i_k})^T e_{k j_k}

The number of mismatches is simply d - s. Also useful is the norm of each point:

\|x_i\|^2 = x_i^T x_i = d

Distance and Angle

The Euclidean distance between x_i and x_j is given as

\delta(x_i, x_j) = \|x_i - x_j\| = \sqrt{x_i^T x_i - 2 x_i^T x_j + x_j^T x_j} = \sqrt{2(d - s)}

The Hamming distance is given as

\delta_H(x_i, x_j) = d - s

Cosine Similarity: The cosine of the angle is given as

\cos \theta = \frac{x_i^T x_j}{\|x_i\| \cdot \|x_j\|} = \frac{s}{d}

The Jaccard Coefficient is given as

J(x_i, x_j) = \frac{s}{2(d - s) + s} = \frac{s}{2d - s}
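A small sketch computing all four measures for two points over d = 2 attributes (the encoded values are made up):

import numpy as np

xi = np.concatenate([[1, 0, 0, 0], [0, 1, 0]])   # (Very Short, Medium)
xj = np.concatenate([[0, 1, 0, 0], [0, 1, 0]])   # (Short, Medium)
d = 2
s = int(xi @ xj)                   # number of matching symbols
print(np.sqrt(2 * (d - s)))        # Euclidean distance
print(d - s)                       # Hamming distance
print(s / d)                       # cosine similarity
print(s / (2 * d - s))             # Jaccard coefficient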

Discretization
Discretization, also called binning, converts numeric attributes into categorical ones.
Equal-Width Intervals: Partition the range of X into k equal-width intervals. The interval width is simply the range of X divided by k:

w = \frac{x_{max} - x_{min}}{k}

Thus, the i-th interval boundary is given as

v_i = x_{min} + i\,w, \quad \text{for } i = 1, \ldots, k - 1

Equal-Frequency Intervals: We divide the range of X into intervals that contain an (approximately) equal number of points. The intervals are computed from the empirical quantile or inverse cumulative distribution function

\hat{F}^{-1}(q) = \min\{x \mid P(X \le x) \ge q\}

We require that each interval contain 1/k of the probability mass; therefore, the interval boundaries are given as

v_i = \hat{F}^{-1}(i/k), \quad \text{for } i = 1, \ldots, k - 1
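Both schemes are a few lines with numpy; a sketch on made-up values in the sepal length range:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(4.3, 7.9, size=150)              # stand-in for the attribute values
k = 4

w = (x.max() - x.min()) / k                       # equal-width: v_i = x_min + i·w
width_edges = x.min() + w * np.arange(1, k)

freq_edges = np.quantile(x, np.arange(1, k) / k)  # equal-frequency: v_i = F̂^{-1}(i/k)
print(width_edges, freq_edges)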

Equal-Frequency Discretization: sepal length (4 bins)

Quartile values (range of sepal length: [4.3, 7.9]):

\hat{F}^{-1}(0.25) = 5.1
\hat{F}^{-1}(0.50) = 5.8
\hat{F}^{-1}(0.75) = 6.4

[Figure: empirical inverse CDF \hat{F}^{-1}(q) of sepal length for q ∈ [0, 1], ranging from 4.3 to 7.9.]

Bin         Width  Count
[4.3, 5.1]  0.8    n_1 = 41
(5.1, 5.8]  0.7    n_2 = 39
(5.8, 6.4]  0.6    n_3 = 35
(6.4, 7.9]  1.5    n_4 = 35

