KDD WS 24 25 Practical Tasks

The document outlines practical tasks for the 'Knowledge Discovery in Databases' course for the Winter Semester 2024/2025, focusing on data manipulation and analysis using Python's Pandas library. It includes three main tasks involving loading datasets, handling missing values, performing one-hot encoding, applying clustering algorithms, and generating association rules. Each task requires the use of Jupyter Notebook and an IDE, emphasizing understanding the underlying code rather than just executing it.

Uploaded by

jakeramsons

Winter Semester 2024/2025

„Knowledge Discovery in Databases“ – Practical Tasks

You can solve the following tasks within a Jupyter Notebook file and an IDE (e.g. VSCode) for
exam preparation. Not all kinds of tasks will necessarily appear in the exam, nor will it be as
simple as renaming some columns and running the identical code again. But if you understand
what the code does and how it produces its results, you should be fine.

Task 1
a) Load the dataset “Netflix.csv” into a Pandas dataframe. Print the first lines as well as
the shape of the data. How many rows and columns does it have?
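
For part a), a minimal sketch of loading and inspecting a CSV file could look like the following. The column names and rows are made up for illustration, since the sheet does not list the contents of “Netflix.csv”; with the real file, `pd.read_csv("Netflix.csv")` would replace the in-memory text.

```python
import io

import pandas as pd

# Hypothetical stand-in for "Netflix.csv"; the real columns are not given in the task.
csv_text = """title,type,release_year
Show A,Movie,2019
Show B,TV Show,2020
Show C,Movie,2021
"""

# With the real file this would be: df = pd.read_csv("Netflix.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())            # first rows (up to five by default)
n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```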

b) Identify missing values (NaN) and remove the affected columns or rows as appropriate.
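
One possible approach to part b) is sketched below on a small made-up dataframe. The rule "drop a column if more than half of its values are missing, then drop the remaining incomplete rows" is an assumption chosen for illustration, not a prescribed threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names are illustrative only.
df = pd.DataFrame({
    "title": ["Show A", "Show B", "Show C", "Show D"],
    "director": [np.nan, np.nan, np.nan, "Jane Doe"],  # mostly empty column
    "country": ["US", np.nan, "DE", "FR"],             # a single gap
})

# Count missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Drop columns where more than half of the values are missing (assumed threshold)
mostly_empty = missing_per_column[missing_per_column > len(df) / 2].index
cleaned = df.drop(columns=mostly_empty)

# Drop the remaining rows that still contain at least one NaN
cleaned = cleaned.dropna(axis=0)
print(cleaned.shape)
```

Dropping sparse columns first keeps more rows than calling `dropna()` on the whole dataframe right away.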

c) One column could be treated as an integer column, but it is not. Identify the cause and fix
it by replacing the offending values with the average of that column.
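
A sketch for part c), assuming the problem is a non-numeric entry inside an otherwise numeric column. The column name `release_year` and the value `"unknown"` are hypothetical.

```python
import pandas as pd

# Hypothetical column that should be integer but contains a stray string
df = pd.DataFrame({"release_year": ["2018", "2020", "unknown", "2022"]})
print(df["release_year"].dtype)  # not an integer dtype

# Convert to numbers; entries that cannot be parsed become NaN
years = pd.to_numeric(df["release_year"], errors="coerce")
print(df.loc[years.isna(), "release_year"])  # shows the offending entries

# Replace the NaN entries with the column mean (NaN is ignored by mean())
mean_year = years.mean()
df["release_year"] = years.fillna(mean_year).round().astype(int)
print(df["release_year"].tolist())
```

Rounding before the cast matters because the mean is generally not a whole number.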

d) The column “type” has typos in some of its values. Print the unique values of that
column and then replace the typos with their correct form.
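
For part d), the pattern could be sketched as follows. The misspellings `"Movi"` and `"TV Sho"` are invented examples; the actual typos have to be read off the output of `unique()` on the real data.

```python
import pandas as pd

# Hypothetical values with two invented typos
df = pd.DataFrame({"type": ["Movie", "TV Show", "Movi", "TV Sho", "Movie"]})

# Inspect which distinct values occur
print(df["type"].unique())

# Map each typo to its correct spelling; other values are left untouched
corrections = {"Movi": "Movie", "TV Sho": "TV Show"}
df["type"] = df["type"].replace(corrections)

print(df["type"].unique())
```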

Task 2
a) Load the dataset “PeopleLikes.csv” into a Pandas dataframe and print out the first few
lines.

b) Use the TransactionEncoder to perform a one-hot encoding of the data. Hint: Since
the data is now a dataframe instead of a list, use .values.tolist() on your dataframe to
give the TransactionEncoder a list as input type.

c) Apply Apriori or FPGrowth on the one-hot encoded data. Use a reasonable value for the
minimum support (roughly 30 itemsets should be frequent).

d) Create association rules with a confidence of 75% and answer the following questions:
• Where do the old people live?
• What can be said about people who like chocolate?
• If someone likes apples and lives in Schmalkalden, what is that person's gender?

Task 3
a) Load the dataset “ClusteringDataset.csv” into a Pandas dataframe and print out the
first few lines.

b) Use a scatterplot to visualize the data. From the plot you can already guess whether the
dataset is better suited to k-Means or DBSCAN.
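
A minimal scatterplot sketch for part b). Because the contents of “ClusteringDataset.csv” are not given, scikit-learn's `make_moons` serves as a hypothetical substitute, and the column names `x` and `y` are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_moons

# Hypothetical substitute for "ClusteringDataset.csv": two interleaved half-moons
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
data = pd.DataFrame(X, columns=["x", "y"])

fig, ax = plt.subplots()
ax.scatter(data["x"], data["y"], s=10)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Raw data")
plt.show()
```

Compact, roughly round groups suggest k-Means; elongated or curved groups of similar density suggest DBSCAN.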

c) Apply the k-Means algorithm with varying k from 2 to 5. What does the silhouette
coefficient tell you for each variant? Which one would be the best?
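
The comparison in part c) could be sketched as below with scikit-learn. The data again comes from `make_moons` as a hypothetical substitute for the course dataset. The silhouette coefficient lies between -1 and 1, and higher values indicate more compact, better separated clusters.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import silhouette_score

# Hypothetical substitute for "ClusteringDataset.csv"
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Fit k-Means for k = 2..5 and record the silhouette coefficient of each result
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

# The variant with the highest coefficient is the best one by this measure
best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```

A high silhouette value only says the clusters are compact and well separated; it does not guarantee that they match the visible structure of the data.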

d) Plot the data again, coloring each point by its k-Means cluster. Is the result what you expected?

e) Use a nearest-neighbor distance plot (k-distance plot) to identify a suitable value for the
epsilon parameter of DBSCAN. You can calculate the distance to the 3rd nearest neighbor of each
object. What would be a good value for the MinPts parameter of DBSCAN? How is this determined?
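
A sketch of the k-distance plot for part e), again on hypothetical `make_moons` data. It uses scikit-learn's `NearestNeighbors`. A common heuristic, stated here as an assumption rather than the course's required answer, is to read epsilon off the "knee" of the sorted curve and to set MinPts to k + 1 (here 4), since scikit-learn's `min_samples` counts the point itself.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

# Hypothetical substitute for "ClusteringDataset.csv"
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

k = 3
# Ask for k + 1 neighbors: each point is returned as its own nearest neighbor (distance 0)
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)

# Distance of every point to its k-th nearest other point, sorted ascending
k_dist = np.sort(distances[:, k])

fig, ax = plt.subplots()
ax.plot(k_dist)
ax.set_xlabel("points sorted by distance")
ax.set_ylabel(f"distance to {k}rd nearest neighbor")
plt.show()
```

Points to the right of the knee have unusually distant neighbors and would tend to be labeled as noise.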

f) Apply DBSCAN and plot the results, coloring each point by its assigned cluster. Does it
perform better than k-Means?
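
Part f) could be sketched as follows with scikit-learn's `DBSCAN`. The values `eps=0.2` and `min_samples=4` are assumptions chosen for the hypothetical `make_moons` data; on the course dataset they would come from the k-distance plot.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Hypothetical substitute for "ClusteringDataset.csv"
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps and min_samples are assumed values for this synthetic data
labels = DBSCAN(eps=0.2, min_samples=4).fit_predict(X)

# DBSCAN marks noise points with the label -1
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"{n_clusters} clusters, {n_noise} noise points")

# Color each point by its cluster label
fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=10)
ax.set_title("DBSCAN result")
plt.show()
```

The same `c=labels` argument works for coloring a plot by k-Means labels, which makes the two results easy to compare side by side.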
