Pattern Recognition - Tutorial 4
a) i) Write a function to generate n ∈ N random samples from the triangle distribution with
parameters a, b, c ∈ R. Plot a histogram of the n samples for the given parameter set.
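Task i) can be sketched via inverse-transform sampling of the piecewise-quadratic triangle CDF. The function name and the parameter values a = 0, b = 4, c = 1 below are illustrative assumptions, since the sheet's actual parameter set is not reproduced here:

```python
import numpy as np

def triangle_samples(n, a, b, c, rng=None):
    """Draw n samples from the triangle distribution on [a, b] with mode c
    by inverting the piecewise-quadratic CDF."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(n)
    fc = (c - a) / (b - a)          # CDF value at the mode c
    left = u < fc                   # which branch of the inverse CDF applies
    x = np.empty(n)
    x[left] = a + np.sqrt(u[left] * (b - a) * (c - a))
    x[~left] = b - np.sqrt((1.0 - u[~left]) * (b - a) * (b - c))
    return x

# illustrative parameter set (assumed, not from the sheet)
samples = triangle_samples(1000, a=0.0, b=4.0, c=1.0)
# histogram data; plot e.g. with matplotlib's plt.hist(samples, bins=30, density=True)
counts, edges = np.histogram(samples, bins=30, density=True)
```

Inverse-transform sampling is one of several valid options; rejection sampling or summing two uniforms (for the symmetric case) would work as well.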
ii) Implement a function that estimates the density from a set of n samples at m ∈ N equidistant
positions in the range given by the parameters from ∈ R and to ∈ R. The kernel is given by
k ∈ {exponential, epanechnikov} with window width h ∈ R. The function shall return a set
(e.g. a vector) of the m estimated densities at the equidistant positions. First, plot (only) the
implemented kernels.
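Task ii) could be sketched as follows; the function name, the vectorised layout, and the exact kernel normalisations (½·exp(−|u|) for the exponential, ¾·(1 − u²) on |u| ≤ 1 for the Epanechnikov) are choices of this sketch, not prescribed by the sheet:

```python
import numpy as np

def kernel_density(samples, frm, to, m, kernel, h):
    """Parzen window estimate of the density at m equidistant positions
    in [frm, to], given 1-D samples and window width h."""
    xs = np.linspace(frm, to, m)
    u = (xs[:, None] - samples[None, :]) / h     # (m, n) scaled distances
    if kernel == "exponential":
        k = 0.5 * np.exp(-np.abs(u))             # integrates to 1 over R
    elif kernel == "epanechnikov":
        k = 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)  # compact support [-1, 1]
    else:
        raise ValueError(f"unknown kernel: {kernel}")
    # average of the n kernel bumps, rescaled by h -> vector of m densities
    return k.sum(axis=1) / (len(samples) * h)
```

Both kernels integrate to one, so the returned estimate is itself (approximately) a density; the m × n intermediate array trades memory for simplicity and would be chunked for large n.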
iii) Use the implemented functions to estimate the densities with the parameter set using both
kernels. Display the results and the underlying triangle density in one plot.
iv) How would you change the window width for the exponential kernel? Please explain your
decision.
b) Use your implementation to generate estimates for the parameter set in a) with the Epanechnikov
kernel and five different reasonable values of h. Display the results and the underlying triangle
density in one plot. How would you choose h in this example? Please explain your answer. (2)
c) Repeat Task a) with n = 10000. What do you observe? (1)
d) Repeat Task b) with n = 10000. What do you observe? What impact does n have on h? (1)
2 Non-parametric classification
a) In which cases would you use Parzen window estimation, k-nearest-neighbour estimation, or posterior
estimation? You can argue, for example, with the size of the training set, the computational effort,
the memory requirements, or other useful factors. (3)
b) Name and explain the requirements for a valid Parzen window φ(u) (kernel). (1)
c) What is the worst-case error-rate if you choose k = n for a dataset D with n samples and c
categories when utilizing the k-Nearest-Neighbour classifier? Please explain your answer. (2)
d) Show that Pn(ωi | x) = ki/k (posterior estimation, see slide 4-18). Give an intuition for this
formula in your own words. (2)
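As a starting point, one possible route follows the standard non-parametric argument (cf. Duda, Hart & Stork): place a cell of volume V around x that captures k samples in total, ki of which belong to class ωi. Then

```latex
p_n(x, \omega_i) = \frac{k_i}{nV}, \qquad
p_n(x) = \sum_{j=1}^{c} p_n(x, \omega_j) = \frac{k}{nV},
\qquad\Rightarrow\qquad
P_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{p_n(x)} = \frac{k_i}{k}.
```

Intuitively: the estimated posterior is simply the fraction of the captured neighbours that carry label ωi.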
e) What is the naive approach to determine the k nearest neighbours of one sample x, considering a
dataset with n samples of dimensionality d? What is the computational complexity? How could
this approach be improved? (2)
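As a baseline for e), the brute-force approach can be sketched as follows (the function name is a choice of this sketch): compute all n distances in O(nd), then select the k smallest:

```python
import numpy as np

def knn_indices(X, x, k):
    """Naive k-NN: all n squared Euclidean distances in O(nd),
    then partial selection of the k smallest in O(n) on average."""
    d2 = ((X - x) ** 2).sum(axis=1)        # squared distances to every sample
    return np.argpartition(d2, k - 1)[:k]  # indices of the k nearest samples
```

A full sort would cost O(n log n) instead; typical improvements are space-partitioning structures such as k-d trees or ball trees, or approximate methods (e.g. locality-sensitive hashing) in high dimensions.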