Week 11 Assignment 11.1.2

This document provides instructions for an assignment to implement DBSCAN clustering, an unsupervised learning algorithm, on a housing price dataset. The objectives are to download the King County housing dataset, check its shape and contents, handle any missing values, select independent variables for clustering, find the optimal number of clusters using the elbow method by running DBSCAN with different epsilon and minimum points parameters, train DBSCAN on the dataset, and visualize the resulting clusters.

School of Computer Science Engineering and Technology

Course: BTech              Type: Core
Course Code: CSET301       Course Name: AIML
Year: 2022                 Semester: Odd
Date: 16-11-2022           Batch: V Sem

Lab Assignment No. 11_1.2

Exp. No.    Name                 CO-1    CO-2    CO-3
11.1.2      DBSCAN clustering    ✓       ✓       --

Objective: To implement Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Unsupervised Learning).

Description
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise and is one of
the clustering algorithms implemented in the scikit-learn library. It was proposed by Martin Ester,
Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996 in their well-known article “A Density-Based
Algorithm for Discovering Clusters in Large Spatial Databases with Noise”.
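
To make the idea concrete, here is a minimal sketch of calling scikit-learn's DBSCAN on a handful of made-up 2-D points. The toy data and the parameter values (eps=0.5, min_samples=3) are assumptions chosen only for illustration and are not part of the assignment. scikit-learn marks noise points with the label -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Made-up toy data: two tight groups of points plus one far-away point.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],
])

# eps is the neighbourhood radius; min_samples is the number of points
# (the point itself included) required within that radius for a point
# to be treated as a core point. Both values are illustrative.
model = DBSCAN(eps=0.5, min_samples=3)
labels = model.fit_predict(points)

# Noise is labelled -1, so it is excluded from the cluster count.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(labels, n_clusters, n_noise)
```

Unlike k-means, the number of clusters is not passed in; it falls out of the two density parameters.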

Download the dataset from https://www.kaggle.com/code/mahmoudlimam/dbscan-clustering-tutorial/data?select=kc_house_data.csv (10)

About this Dataset


This dataset contains house sale prices for King County, which includes Seattle. It covers homes
sold between May 2014 and May 2015. The dataset has the following features:
1. price
2. bedrooms
3. bathrooms
4. sqft_living
5. sqft_lot
6. floors
7. waterfront
8. view
9. grade
10. sqft_above
11. sqft_basement
12. yr_built
13. yr_renovated
14. sqft_living15
15. sqft_lot15
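
One possible way to load the file and keep only the features listed above is sketched below with pandas. The helper names load_features and summarize are hypothetical, and the tiny in-memory CSV is made-up sample data standing in for the real file so that the snippet runs on its own.

```python
import io
import pandas as pd

# Column names copied from the feature list above.
FEATURES = [
    "price", "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors",
    "waterfront", "view", "grade", "sqft_above", "sqft_basement",
    "yr_built", "yr_renovated", "sqft_living15", "sqft_lot15",
]

def load_features(csv_source):
    """Read a CSV (path or file-like object) and keep the listed columns
    that are actually present in it. Hypothetical helper."""
    df = pd.read_csv(csv_source)
    present = [c for c in FEATURES if c in df.columns]
    return df[present]

def summarize(df):
    """Return the shape, column list and per-column missing-value counts."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "missing": df.isna().sum().to_dict(),
    }

# Made-up two-row sample; the second row has a missing 'bedrooms' value.
sample_csv = io.StringIO(
    "id,price,bedrooms,sqft_living\n"
    "1,221900,3,1180\n"
    "2,538000,,2570\n"
)
df = load_features(sample_csv)
info = summarize(df)
print(info["shape"])
print(df.head(10))
```

With the real data, the same call would take the local path of the downloaded file, for example load_features("kc_house_data.csv"), where that path is an assumption about where the file was saved.
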
1. Check the shape of the dataset (5)
2. Print the first 10 rows of the dataset (5)
3. Display the list of columns of the dataset (5)
4. Plot pairplot graph using seaborn to understand the nature of data (5)
5. Check for missing values and handle them if present.
6. Select the features, i.e., identify the independent variables and extract them.
   (Hint: Remove the target column, as this is an unsupervised learning problem.)
7. Find the optimal number of clusters using the elbow method. (20)
   a) Run DBSCAN clustering on the given dataset for different values of
      a. Eps, ε: distance (8, 12.75, 0.25)
      b. MinPts: minimum number of points within distance Eps (3, 10)
   b) Because DBSCAN forms the clusters itself from these two parameters, check the
      number of clusters generated.
8. Train the DBSCAN algorithm on the training dataset. (Hint: from sklearn.cluster import DBSCAN) (15)
9. Visualize the clusters. (Hint: Use the .scatter() function) (20)
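
Steps 7 and 8 could be sketched roughly as follows. This is not a reference solution: it assumes that "(8,12.75,0.25)" means start, stop and step for Eps, and that "(3,10)" means a range of MinPts values from 3 up to, but not including, 10. The helper names count_clusters and sweep_dbscan are hypothetical, and the small synthetic array X stands in for the extracted feature matrix.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def count_clusters(labels):
    """Number of clusters in a DBSCAN label array; -1 marks noise."""
    return len(set(labels)) - (1 if -1 in labels else 0)

def sweep_dbscan(X, eps_values, min_pts_values):
    """Fit DBSCAN for every (Eps, MinPts) pair and record the outcome."""
    results = []
    for eps in eps_values:
        for min_pts in min_pts_values:
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
            results.append({
                "eps": float(eps),
                "min_pts": int(min_pts),
                "n_clusters": count_clusters(labels),
                "n_noise": int(np.sum(labels == -1)),
            })
    return results

# Synthetic stand-in data (not the housing data): three 5x5 grids of
# points with spacing 2, placed far apart, plus one isolated point.
grid = np.array([[2.0 * i, 2.0 * j] for i in range(5) for j in range(5)])
X = np.vstack([grid, grid + [100.0, 0.0], grid + [0.0, 100.0], [[50.0, 50.0]]])

eps_values = np.arange(8, 12.75, 0.25)  # assumed reading of "(8,12.75,0.25)"
min_pts_values = range(3, 10)           # assumed reading of "(3,10)"

results = sweep_dbscan(X, eps_values, min_pts_values)
for row in results[:3]:
    print(row)

# Step 8: fit once more with a chosen pair (values picked for illustration).
final_labels = DBSCAN(eps=8.0, min_samples=3).fit_predict(X)
```

For step 9, the label array can be passed as the c argument of matplotlib's scatter function so that each cluster, and the noise labelled -1, is drawn in its own colour.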

Suggested Platform: Python: Jupyter Notebook/Azure Notebook/Google Colab.
