KDD WS 24 25 Practical Tasks

The document outlines practical tasks for the 'Knowledge Discovery in Databases' course for the Winter Semester 2024/2025, focusing on data manipulation and analysis using Python's Pandas library. It includes three main tasks involving loading datasets, handling missing values, performing one-hot encoding, applying clustering algorithms, and generating association rules. Each task requires the use of Jupyter Notebook and an IDE, emphasizing understanding the underlying code rather than just executing it.

Winter Semester 2024/2025

„Knowledge Discovery in Databases“ – Practical Tasks

You can solve the following tasks in a Jupyter Notebook file and an IDE (e.g. VSCode) for
exam preparation. Not every kind of task will necessarily appear in the exam, and the exam is
not as simple as renaming some columns and running the identical code again. But if you have
obtained results and understood what the code is doing, you should be fine.

Task 1
a) Load the dataset “Netflix.csv” into a Pandas dataframe. Print the first lines as well as
the shape of the data. How many rows and columns does it have?
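
A minimal sketch of part a), assuming pandas is installed. The real "Netflix.csv" is not reproduced here, so a small made-up CSV stands in for it; all column names except "type" are hypothetical.

```python
import io
import pandas as pd

# Made-up stand-in for the file; with the real data use pd.read_csv("Netflix.csv").
csv_text = """title,type,release_year
Show A,Movie,2019
Show B,TV Show,2020
Show C,Movie,2021
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())            # first lines (up to 5 by default)
n_rows, n_cols = df.shape   # shape is a (rows, columns) tuple
print(f"{n_rows} rows, {n_cols} columns")
```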

b) Identify missing values (NaN) and remove them (columns, rows) accordingly.
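
One possible approach to part b), sketched on made-up data. The 50% threshold for dropping a column is an assumption for illustration, not a rule given by the task; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up data: one mostly empty column and one row with a missing value.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "director": [np.nan, np.nan, np.nan, "X"],
    "release_year": [2019, 2020, np.nan, 2021],
})

print(df.isna().sum())  # number of NaN values per column

# Assumed rule: drop columns in which more than half of the values are missing ...
mostly_empty = df.columns[df.isna().mean() > 0.5]
df_clean = df.drop(columns=mostly_empty)
# ... then drop the remaining rows that still contain a NaN.
df_clean = df_clean.dropna(axis=0)
print(df_clean.shape)
```

Dropping the sparse column first keeps more rows than calling dropna() on the rows right away, which is why the order matters here.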

c) One column could be treated as Integer, but it is not. Identify the error and solve it by
replacing the faulty values with the average of that column.
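
A sketch of part c), assuming the problem is a non-numeric entry in an otherwise numeric column. The column name "release_year" and the faulty value "unknown" are made up; the real dataset may contain a different kind of error.

```python
import pandas as pd

# Made-up column: years stored as text because one entry is not a number.
df = pd.DataFrame({"release_year": ["2019", "2021", "unknown", "2020"]})
print(df["release_year"].dtype)  # not an integer dtype

# Turn non-numeric entries into NaN, then show which rows caused the problem.
numeric = pd.to_numeric(df["release_year"], errors="coerce")
print(df[numeric.isna()])

# Replace the bad entries with the rounded column mean and convert to int.
mean_value = round(numeric.mean())
df["release_year"] = numeric.fillna(mean_value).astype(int)
print(df["release_year"].tolist())
```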

d) The column “type” has typos in some of its values. Print the unique values of that
column and then replace the typos with their correct form.
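
Part d) can be sketched as follows. The misspellings shown are invented for illustration; the actual typos have to be read off the output of unique().

```python
import pandas as pd

# Made-up values; the actual typos in the dataset may differ.
df = pd.DataFrame({"type": ["Movie", "Movi", "TV Show", "TV Sho", "Movie"]})
print(df["type"].unique())

# Map each misspelling to its correct form; values not listed stay unchanged.
corrections = {"Movi": "Movie", "TV Sho": "TV Show"}
df["type"] = df["type"].replace(corrections)
print(df["type"].unique())
```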

Task 2
a) Load the dataset “PeopleLikes.csv” into a Pandas dataframe and print out the first few
lines.

b) Use the TransactionEncoder to perform a one-hot encoding of the data. Hint: Since
the data is now a dataframe instead of a list, use .values.tolist() on your dataframe to
give the TransactionEncoder a list as input type.

c) Apply Apriori or FPGrowth on the one-hot encoded data. Use a reasonable value for
support (~30 itemsets should be frequent).

d) Create association rules with a confidence of 75% and answer the following questions:
• Where do the old people live?
• What can be said about people who like chocolate?
• If someone likes apples and lives in Schmalkalden, what is their gender?
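
To make clear what the 75% threshold in part d) measures, the sketch below computes the confidence of a single rule by hand with pandas. The data and the item name "male" are made up; the helper function confidence is hypothetical and only serves as an illustration.

```python
import pandas as pd

# Made-up one-hot data; item names loosely follow the questions above.
onehot_df = pd.DataFrame({
    "apples":       [True, True, False, True],
    "Schmalkalden": [True, True, False, False],
    "male":         [True, True, False, True],
})

def confidence(data, antecedent, consequent):
    """confidence(A -> C) = support(A and C) / support(A)."""
    has_a = data[antecedent].all(axis=1)
    has_both = has_a & data[consequent].all(axis=1)
    return has_both.mean() / has_a.mean()

conf = confidence(onehot_df, ["apples", "Schmalkalden"], ["male"])
print(conf, conf >= 0.75)
```

In mlxtend, the function association_rules applies the same criterion to all frequent itemsets at once when called with metric="confidence" and min_threshold=0.75.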

Task 3
a) Load the dataset “ClusteringDataset.csv” into a Pandas dataframe and print out the
first few lines.

b) Use a scatterplot to visualize the data. You can already guess if the dataset works well
with k-Means or DBSCAN.

c) Apply the k-Means algorithm with varying k from 2 to 5. What does the silhouette
coefficient tell you for each variant? Which one would be the best?
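
A sketch of part c), assuming scikit-learn is installed. Three synthetic blobs stand in for "ClusteringDataset.csv", so the best k found here says nothing about the real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in: three compact, well-separated blobs.
centers = [[0, 0], [10, 0], [5, 9]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)

scores = {}
for k in range(2, 6):  # k = 2, 3, 4, 5
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(k, round(scores[k], 3))

best_k = max(scores, key=scores.get)  # k with the highest silhouette coefficient
print("best k:", best_k)
```

The silhouette coefficient lies between -1 and 1; values close to 1 indicate compact clusters that are far apart from each other.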

d) Plot the data again, colored by the k-Means result. Is this what you expected?
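
Coloring a scatterplot by cluster label could look like this, assuming matplotlib and scikit-learn are installed. The two half-moons are a synthetic stand-in chosen because their clusters are not convex; the real dataset may look different.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons

# Synthetic stand-in: two interleaved half-moons.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

fig, ax = plt.subplots()
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=15)
ax.set_title("k-Means, k=2")
# In a notebook the figure is displayed automatically; in a script call plt.show().
```

k-Means assigns every point to its nearest centroid, so the border between two clusters is always a straight line, whatever the shape of the data.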

e) Use a nearest-neighbor diagram to identify a suitable value for the epsilon parameter
of DBSCAN. You can calculate the distance to the 3rd-nearest neighbor per object. What
would be a good value for the MinPts parameter of DBSCAN? How is this determined?
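
The distances for the nearest-neighbor diagram in part e) can be computed as sketched below, assuming scikit-learn and NumPy are installed. The data is again a synthetic stand-in.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

k = 3
# When the training points themselves are queried, each point is returned as
# its own nearest neighbor (distance 0), so ask for k + 1 neighbors.
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, k])  # sorted distance to the 3rd-nearest neighbor

# Plot k_dist (e.g. with plt.plot(k_dist)) and read epsilon off the "knee".
print(k_dist[:5], k_dist[-5:])
```

One common convention, which is not necessarily the one taught in the course, ties MinPts to k: if the point itself is counted, as scikit-learn does for min_samples, k = 3 corresponds to MinPts = 4.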

f) Apply DBSCAN and plot the results, coloring each point by its cluster. Does it perform
better than k-Means?
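
A minimal sketch of applying DBSCAN in part f), assuming scikit-learn is installed. The values for eps and min_samples are assumptions that fit the synthetic half-moons; for the real dataset, eps should come from the nearest-neighbor diagram.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# Assumed parameter values for this synthetic data only.
labels = DBSCAN(eps=0.2, min_samples=4).fit_predict(X)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise
n_noise = int((labels == -1).sum())
print(n_clusters, n_noise)
```

The label array can be passed to a scatterplot through the c argument in the same way as the k-Means labels, which makes the two results directly comparable.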
