Task 2

Uploaded by

kevire6047

TASK 2

Problem Statement
In modern urban planning, the efficient design of road networks is crucial for fostering
connectivity and sustainable development. Night-time satellite imagery offers a unique
perspective, with illuminated areas often indicative of urban centers. In this task, you have to
develop an algorithm to extract the cities from this satellite image.

You've got a simulated night-time satellite image, a grid of 64x64 squares with white dots
representing places where there's light. Your job is to create an algorithm that can find these dots
and locate the cities on the 64x64 grid by clustering them.
(NOTE: Refrain from employing any libraries such as Scikit Learn for clustering purposes)

The link to the images is given here: Images

After identifying cities through clustering, calculate the distances between them using the center
point of cities. Present the distances in a clear format, such as a table.
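As a rough illustration of the distance step, the sketch below computes pairwise Euclidean distances between city centers and prints them as a plain-text table. The centers shown are made-up values, and the helper names `distance_table` and `format_table` are assumptions for this example, not part of the task.

```python
import math

def distance_table(centers):
    """Return a square matrix of Euclidean distances between city centers."""
    return [[math.dist(a, b) for b in centers] for a in centers]

def format_table(matrix, names):
    """Render the distance matrix as an aligned plain-text table."""
    header = " " * 8 + "".join(f"{name:>8}" for name in names)
    rows = [
        f"{name:<8}" + "".join(f"{d:8.2f}" for d in row)
        for name, row in zip(names, matrix)
    ]
    return "\n".join([header] + rows)

# Hypothetical centers in (row, column) grid units, for illustration only.
centers = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
names = [f"City {i + 1}" for i in range(len(centers))]
table = distance_table(centers)
print(format_table(table, names))
```

In practice the centers would be the cluster centroids found on the 64x64 grid, so the distances come out in pixel units.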
Resources for the task
1. How are images stored?

Think of an image like a grid of tiny colored squares. Each square is called a "pixel". These
pixels come together to form the image you see on your screen. Each pixel has information about
its color. The images given in this task are grayscale images.

In a grayscale image, each pixel is represented by a single value that denotes its brightness or
intensity. This value typically ranges from 0 to 255, where 0 represents black and 255 represents
white. The values in between represent varying shades of gray.

To explore further, read: How Images Are Stored in a Computer | Grayscale & RGB

2. Working with Images in Python

Python offers various libraries for working with images. One popular library is Pillow. It allows
you to open, manipulate, and save many different image file formats.
Here you will use it to convert the image into a 2D NumPy array. This conversion allows
for easy manipulation and analysis of the image data using the functionality provided by NumPy.

Here is a guided example: A Guide to Converting PIL Images to NumPy Arrays in Python

(NumPy fundamentals: Python NumPy Tutorial for Beginners)
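A minimal sketch of the conversion, assuming Pillow and NumPy are installed. A synthetic 64x64 image stands in for a task image so the example runs on its own; the helper name `image_to_array` is an assumption for this example.

```python
import numpy as np
from PIL import Image

def image_to_array(img):
    """Convert a PIL image to a 2D uint8 array of brightness values (0-255)."""
    return np.asarray(img.convert("L"))  # "L" = 8-bit grayscale mode

# Synthetic stand-in for a task image: 64x64, all black, with one white dot.
# For a real file you would use Image.open("image.png") (hypothetical name).
demo = Image.new("L", (64, 64), color=0)
demo.putpixel((20, 10), 255)  # Pillow takes (x, y) = (column, row)

arr = image_to_array(demo)
print(arr.shape, arr.dtype)
print(arr[10, 20])  # NumPy indexes [row, column], so this is the white dot
```

Note the axis order: Pillow addresses pixels as (x, y), while the NumPy array is indexed as [row, column].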

3. Locating and Clustering using K Means

Now that your image has been converted into an array, you should be able to pinpoint
the coordinates of the lights. Clustering these points then helps
identify the cities depicted in the provided images.
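One possible way to pinpoint the lights is to threshold the brightness values and collect the indices of the bright pixels. The sketch below assumes NumPy; the threshold of 128 and the function name `light_coordinates` are assumptions, and a suitable threshold depends on the actual images.

```python
import numpy as np

def light_coordinates(arr, threshold=128):
    """Return an (N, 2) array of (row, column) indices of bright pixels."""
    return np.argwhere(arr > threshold)

# Small synthetic grid standing in for a converted task image.
demo = np.zeros((64, 64), dtype=np.uint8)
demo[5, 7] = 255
demo[40, 12] = 200
demo[41, 12] = 90  # dimmer than the threshold, so it is ignored

coords = light_coordinates(demo)
print(coords.tolist())
```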
You need to implement the K Means algorithm yourself, from scratch. Do not use the
Scikit Learn implementation.
Pseudo Code:

1. Initialize K cluster centroids randomly
2. Repeat until convergence:
   a. Assign each data point to the nearest centroid
   b. Recompute each centroid as the mean of the data points assigned to it
   c. Check the convergence criterion (e.g., small change in centroids or a fixed
      number of iterations)
3. Return the final centroids and cluster assignments
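The pseudo code above can be sketched in plain Python without any clustering library. This is one possible reading, not a reference solution: the initial centroids are sampled from the data points, an empty cluster keeps its previous centroid, and the parameter names (`max_iters`, `tol`, `seed`) are assumptions.

```python
import random

def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    """Cluster 2D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = [(float(x), float(y)) for x, y in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2a: assign each point to the nearest centroid (squared distance).
        labels = [
            min(range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                              + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Step 2b: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep as is
        # Step 2c: stop once no centroid moved by more than tol.
        shift = max(abs(a - c) + abs(b - d)
                    for (a, b), (c, d) in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    # Step 3: return the final centroids and cluster assignments.
    return centroids, labels

# Two well-separated toy blobs, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids), labels)
```

Because the result depends on the random initialization, running the routine with several seeds and keeping the best clustering is a common safeguard.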

Resources for K Means:

1. StatQuest: K-means clustering
2. Stanford CS229: Machine Learning | Summer 2019 | Lecture 16 - K-means, GMM,…

4. Determining the Optimal Number of Clusters Using the Elbow Method

The Elbow Method is a popular technique used in cluster analysis to determine the optimal
number of clusters in a dataset. It's a graphical approach that helps in finding the point of
diminishing returns when adding more clusters to the data.
Read more:
https://www.analyticsvidhya.com/blog/2021/01/in-depth-intuition-of-k-means-clustering-algorithm-in-machine-learning/
Pseudo Code:

1. Initialize an empty list to store the within-cluster sum of squares (WCSS) for each
   value of K
2. For k = 1 to max_clusters:
   - Perform clustering using the K-means algorithm with k clusters
   - Calculate the within-cluster sum of squares (WCSS) for the clustering result
   - Append the WCSS value to the list
3. Plot the WCSS values against the number of clusters (k)
4. Identify the "elbow point" where the rate of decrease in WCSS slows down
5. Return the optimal number of clusters (k) corresponding to the elbow point
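The two computational pieces of the elbow method can be sketched as follows, assuming centroids and labels come from some K-means routine. The `wcss` function follows the definition directly. The `elbow_k` function is only a heuristic stand-in for reading the elbow off the plot: it picks the k with the largest second difference of the WCSS curve, which is an assumption of this example and not the only way to locate the bend. The WCSS values in the demo are made up.

```python
def wcss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    total = 0.0
    for (x, y), label in zip(points, labels):
        cx, cy = centroids[label]
        total += (x - cx) ** 2 + (y - cy) ** 2
    return total

def elbow_k(wcss_values):
    """Pick k at the sharpest bend of the curve (largest second difference).

    wcss_values[i] must hold the WCSS obtained with k = i + 1 clusters.
    """
    if len(wcss_values) < 3:
        raise ValueError("need WCSS for at least three values of k")
    bends = [
        wcss_values[i - 1] - 2 * wcss_values[i] + wcss_values[i + 1]
        for i in range(1, len(wcss_values) - 1)
    ]
    return bends.index(max(bends)) + 2  # bends[0] corresponds to k = 2

# Two points sharing one centroid at (1, 0): each contributes 1 to the WCSS.
print(wcss([(0, 0), (2, 0)], [(1, 0)], [0, 0]))

# Made-up WCSS values for k = 1..5, for illustration only.
print(elbow_k([100.0, 60.0, 10.0, 8.0, 7.0]))
```

It is still worth plotting the curve and checking the chosen k by eye, since the heuristic can be misled by a noisy or smoothly decaying curve.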

Further Exploration (Optional)


K Means might not be the best-suited algorithm for this task because it requires the
number of clusters as an input. Other clustering algorithms can handle this
task better. You're free to explore further and come up with a better approach.
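As one example of an approach that does not need the number of clusters up front, the sketch below groups points by proximity: two points belong to the same city if they are linked by a chain of neighbors no more than `eps` apart (a connected-components grouping). This is only one possible direction, not a method the task prescribes; the function name `group_by_distance` and the default radius are assumptions.

```python
from collections import deque

def group_by_distance(points, eps=3.0):
    """Label points so that chains of neighbors within eps share a group."""
    eps_sq = eps * eps
    labels = [-1] * len(points)  # -1 means "not yet visited"
    n_groups = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        # Breadth-first search outward from an unvisited point.
        labels[start] = n_groups
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(len(points)):
                if labels[j] == -1:
                    dx = points[i][0] - points[j][0]
                    dy = points[i][1] - points[j][1]
                    if dx * dx + dy * dy <= eps_sq:
                        labels[j] = n_groups
                        queue.append(j)
        n_groups += 1
    return labels, n_groups

# Toy points forming two separate groups, for illustration only.
points = [(0, 0), (1, 1), (2, 2), (20, 20), (21, 20)]
labels, n_groups = group_by_distance(points)
print(labels, n_groups)
```

The trade-off is that the radius `eps` has to be chosen instead of K, and cities whose lights nearly touch will merge into one group.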
