Project in DSA Java
1 Introduction
When ordering food online or deciding on a place to eat, you may often have searched ‘places to
eat near me’ on Google or opened Zomato to look up restaurants near you. In this module,
you will implement a program that does exactly this – finds restaurants in the neighbourhood
of users.
2 Problem Statement
You need to write a Java program to solve the ‘places to eat near me’ problem. Your Java
program must solve this problem efficiently using 2-d trees. A description of the algorithms
to build 2-d trees and count elements within a range is provided in the next section.
The input to the program is given in two files: restaurants.txt and queries.txt.
restaurants.txt contains restaurant locations (latitude and longitude). queries.txt contains
user query locations. For each query, the output file output.txt should contain the number of
restaurants that are within (≤) 100 latitude units and 100 longitude units of the query location.
[Optional: Cost is an important deciding factor, and a user may want to filter restaurants based
on whether they are craving cheap fast food or want to eat somewhere fancy. Implement a
program that filters restaurants on three dimensions – latitude, longitude and cost. The user
will provide a budget, and the program should count only the restaurants within the budget.]
node and the orthogonal ‘range’ covered by the node. The range of a node is based on the
splits from the root to that node and is of the form ((xmin, xmax], (ymin, ymax]), except when
xmax or ymax is inf. A ‘]’ bracket indicates a closed end (end point included), whereas a ‘)’
or ‘(’ bracket indicates an open end (end point excluded). The range of the root is
((−inf, +inf), (−inf, +inf)), and with each split, the ranges of the children can be found
from that of the parent.
2. In class, you looked at an algorithm to find/count points on one side of an orthogonal
hyper-plane given a 2-d tree. Here, given a 2-d tree, we discuss a recursive algorithm to
count the number of points within a rectangular range R. The algorithm is as follows:
At the root node, find the intersection of R with the range of each child. Finding this
intersection should be O(1) using the range stored at each node. Consider a child c: if
range(c) is fully contained in R, then add the count of points in c to the total count of
points within range R; if range(c) is fully outside R, then skip node c; otherwise (range(c)
partially intersects R), recursively call the algorithm on c with c as the root. Repeat the
same with the other child. On reaching a leaf node, simply check whether the point stored at
the leaf is contained in R and update the count of points in range accordingly.
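The recursion above can be sketched in Java roughly as follows. This is a minimal illustration, not a reference solution: the class RangeCountSketch, the Node fields and the helpers leaf and inner are names assumed here for the example, leaves are assumed to hold exactly one point, and the small tree in main is built by hand so that the block stays self-contained.

```java
/** Sketch only: class, field and method names here are illustrative assumptions. */
final class RangeCountSketch {

    /** A tree node covering the range (xLo, xHi] x (yLo, yHi]. */
    static final class Node {
        final double xLo, xHi, yLo, yHi; // range covered by this node
        final int count;                 // number of points under this node
        final Node left, right;          // both null for a leaf
        final double px, py;             // the stored point (used for leaves only)

        Node(double xLo, double xHi, double yLo, double yHi,
             int count, Node left, Node right, double px, double py) {
            this.xLo = xLo; this.xHi = xHi; this.yLo = yLo; this.yHi = yHi;
            this.count = count; this.left = left; this.right = right;
            this.px = px; this.py = py;
        }
    }

    static Node leaf(double px, double py, double xLo, double xHi, double yLo, double yHi) {
        return new Node(xLo, xHi, yLo, yHi, 1, null, null, px, py);
    }

    static Node inner(Node left, Node right, double xLo, double xHi, double yLo, double yHi) {
        return new Node(xLo, xHi, yLo, yHi, left.count + right.count, left, right, 0, 0);
    }

    /** Counts the points p under n with x1 <= p.x <= x2 and y1 <= p.y <= y2. */
    static int count(Node n, double x1, double x2, double y1, double y2) {
        if (n.left == null && n.right == null) { // leaf: test the stored point directly
            boolean inside = x1 <= n.px && n.px <= x2 && y1 <= n.py && n.py <= y2;
            return inside ? n.count : 0;
        }
        return visit(n.left, x1, x2, y1, y2) + visit(n.right, x1, x2, y1, y2);
    }

    /** Applies the three cases (outside, contained, partial overlap) to one child. */
    private static int visit(Node c, double x1, double x2, double y1, double y2) {
        if (c == null) return 0;
        // (lo, hi] misses [q1, q2] when hi < q1 or lo >= q2.
        if (c.xHi < x1 || c.xLo >= x2 || c.yHi < y1 || c.yLo >= y2) return 0;
        // (lo, hi] lies inside [q1, q2] when lo >= q1 and hi <= q2.
        if (c.xLo >= x1 && c.xHi <= x2 && c.yLo >= y1 && c.yHi <= y2) return c.count;
        return count(c, x1, x2, y1, y2); // partial overlap: recurse
    }

    public static void main(String[] args) {
        double inf = Double.POSITIVE_INFINITY;
        // Hand-built tree over three points: root splits on x <= -1, right child on y <= 5.
        Node a = leaf(-6, 1, -inf, -1, -inf, inf);
        Node b = leaf(5, 5, -1, inf, -inf, 5);
        Node c = leaf(2, 8, -1, inf, 5, inf);
        Node root = inner(a, inner(b, c, -1, inf, -inf, inf), -inf, inf, -inf, inf);
        System.out.println(count(root, -6, -2, 1, 5)); // prints 1: only (-6, 1) is in range
    }
}
```

Both range tests are O(1) per child because each node stores its own range, so the recursion only descends into children whose range partially overlaps R.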
• Right Sub-tree – (1,2), (4,-6), (9,6), (5,5), (7,3), (2,8), (6,7), (8,-3)
Now, both of the sub-trees are at depth 1 and will thus be split based on the y coordinates.
Consider the left sub-tree: the median of the y coordinates of its points is −2, so
the left sub-tree will further be divided into –
• Left-left Sub-tree – (-2,-2), (-5,-7), (-8,-2), (-6,-3)
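[Figure: plot of the example points, with Latitude on the horizontal axis and Longitude on the vertical axis.]
[Figure: the 2-d tree for the example. Split conditions by level, left to right: x ≤ −1; y ≤ −2, y ≤ 3; x ≤ −6, x ≤ −6, x ≤ 4, x ≤ 5; y ≤ −3, y ≤ −7, y ≤ 1, y ≤ 4.]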
number of points under that node. For example, the node y ≤ −2 has 8 points in total and
has range (−inf, −1] for x coordinates and (−inf, +inf) for y coordinates. Note: Since we are
splitting the data based on the median, the tree will be balanced.
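The median-split construction used in this example can be sketched in Java as below. This is one possible reading, not the prescribed implementation: the class KdBuildSketch and its fields are names assumed for illustration, the split value is taken to be the lower median (which agrees with the splits y ≤ 3, x ≤ 4 and x ≤ 5 shown for the right sub-tree), and the handling of ties and repeated locations is an assumption added so that the recursion always terminates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch only: class, field and method names are illustrative assumptions. */
final class KdBuildSketch {
    static final double INF = Double.POSITIVE_INFINITY;

    static final class Node {
        double xLo, xHi, yLo, yHi; // node covers (xLo, xHi] x (yLo, yHi]
        int count;                 // number of points under this node
        boolean splitOnX;          // internal nodes: which coordinate is tested
        double split;              // internal nodes: condition is "coordinate <= split"
        Node left, right;          // both null for a leaf
        double[] point;            // leaves only: {x, y}
    }

    static Node build(List<double[]> pts) {
        return build(pts, 0, -INF, INF, -INF, INF);
    }

    /** Builds the sub-tree for a non-empty list; x is split at even depth, y at odd depth. */
    static Node build(List<double[]> pts, int depth,
                      double xLo, double xHi, double yLo, double yHi) {
        Node n = new Node();
        n.xLo = xLo; n.xHi = xHi; n.yLo = yLo; n.yHi = yHi;
        n.count = pts.size();

        double[] first = pts.get(0);
        boolean allSame = true;
        for (double[] p : pts) allSame &= p[0] == first[0] && p[1] == first[1];
        if (allSame) { // a single location (possibly repeated) becomes a leaf
            n.point = first;
            return n;
        }

        final int axis = depth % 2; // 0 = x, 1 = y
        List<double[]> sorted = new ArrayList<>(pts);
        sorted.sort(Comparator.comparingDouble((double[] p) -> p[axis]));
        int m = (sorted.size() - 1) / 2; // position of the lower median
        double max = sorted.get(sorted.size() - 1)[axis];
        // Assumed tie handling: step below the maximum so the right side is never empty.
        while (m >= 0 && sorted.get(m)[axis] == max) m--;
        if (m < 0) return build(pts, depth + 1, xLo, xHi, yLo, yHi); // no spread on this axis
        double split = sorted.get(m)[axis];

        List<double[]> left = new ArrayList<>(), right = new ArrayList<>();
        for (double[] p : sorted) (p[axis] <= split ? left : right).add(p);

        n.splitOnX = axis == 0;
        n.split = split;
        if (axis == 0) { // children inherit the parent's range, cut at the split value
            n.left = build(left, depth + 1, xLo, split, yLo, yHi);
            n.right = build(right, depth + 1, split, xHi, yLo, yHi);
        } else {
            n.left = build(left, depth + 1, xLo, xHi, yLo, split);
            n.right = build(right, depth + 1, xLo, xHi, split, yHi);
        }
        return n;
    }

    public static void main(String[] args) {
        // The eight right sub-tree points of the worked example, which sit at depth 1.
        double[][] raw = {{1, 2}, {4, -6}, {9, 6}, {5, 5}, {7, 3}, {2, 8}, {6, 7}, {8, -3}};
        Node n = build(Arrays.asList(raw), 1, -1, INF, -INF, INF);
        System.out.println((n.splitOnX ? "x" : "y") + " <= " + n.split); // prints y <= 3.0
    }
}
```

Sorting at every level keeps the sketch short at the cost of O(n log² n) build time. When many points tie on a coordinate, the halves can be uneven, so the balance noted above holds strictly only when coordinates are distinct.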
Example of the range query: Let’s say we wish to calculate the number of points
(restaurants) contained in the range R = ([−6, −2], [1, 5]), that is, −6 ≤ x ≤ −2 and
1 ≤ y ≤ 5. At the root node, we calculate the intersection of R with the ranges of the left
and right sub-trees. As the right sub-tree has range ((−1, +inf), (−inf, +inf)), the
intersection is empty and we therefore ignore this sub-tree. The range of the left sub-tree
intersects R but is not fully contained in it, so we apply the algorithm recursively at that
node. After calculating the intersections with the ranges of its children, we see that the left
child of the node y ≤ −2 does not intersect R. We keep applying the algorithm until we reach
the leaf (−6, 1). This is the only point that lies within the given range, and therefore
the number of restaurants in this region is 1.
Note: The range includes the end points.
Sample I/O: The sample I/O provided has a list of restaurants and queries as per the
specifications given. Figure 3 displays the locations of the restaurants and the squares
representing the ranges of the queries.
Figure 3: Representation of sample I/O queries
Acknowledgement: Thanks to Aniruddha Deb for providing Figure 3 for the rest of the
class.
• restaurants.txt has one comma separated entry per line for each restaurant (the first
line is column headings)
latitude,longitude
lat1,long1
lat2,long2
... and so on
• queries.txt has one comma separated entry per line for each query (the first line is
column headings)
latitude,longitude
lat1’,long1’
lat2’,long2’
... and so on
• output.txt has one number in each line corresponding to the number of restaurants that
satisfy each of the queries in queries.txt
n1
n2
... and so on
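As a rough illustration of the file handling described above, the following Java sketch reads the two input files, skips the header line of each, and writes one count per query. The class IoSketch and its methods are hypothetical names, coordinates are assumed to parse as decimal numbers, and countNear is only a brute-force stand-in for the 2-d tree query that the assignment actually requires; it can still be useful as a correctness check on small inputs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: class and method names are illustrative assumptions. */
final class IoSketch {

    /** Reads "latitude,longitude" rows, skipping the first (header) line. */
    static List<double[]> readPoints(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        List<double[]> pts = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) continue; // tolerate blank lines
            String[] parts = line.split(",");
            pts.add(new double[] {
                Double.parseDouble(parts[0].trim()),
                Double.parseDouble(parts[1].trim())
            });
        }
        return pts;
    }

    /** Brute-force stand-in for the 2-d tree range count (O(n) per query). */
    static int countNear(List<double[]> restaurants, double[] q, double radius) {
        int count = 0;
        for (double[] r : restaurants) {
            // "Within" includes the end points, hence <= on both coordinates.
            if (Math.abs(r[0] - q[0]) <= radius && Math.abs(r[1] - q[1]) <= radius) count++;
        }
        return count;
    }

    /** Writes one count per line, in the same order as the queries. */
    static void run(Path restaurants, Path queries, Path output) throws IOException {
        List<double[]> rs = readPoints(restaurants);
        List<String> out = new ArrayList<>();
        for (double[] q : readPoints(queries)) {
            out.add(Integer.toString(countNear(rs, q, 100)));
        }
        Files.write(output, out);
    }
}
```

In a full solution, the call to countNear would be replaced by a range query on the tree with R = ([lat − 100, lat + 100], [long − 100, long + 100]).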
6 Submission Instructions
• Your filename must be kdtree.java. This file reads restaurants.txt and queries.txt
and outputs the results in output.txt.
• You must create a directory whose name is your Kerberos id, followed by “Module6” (for
example, if your Kerberos id is “xyz120100”, then the folder name should be “xyz120100Module6”).
The directory must contain only kdtree.java. Finally, compress this directory, and
upload it on Moodle.
The zipped file name should be kerberosidModule6.zip (that is, xyz120100Module6.zip
in the above example).
7 Acknowledgements
This is Version 1 of the document. If you find any errors, please send an email to [email protected],
[email protected] and [email protected] (and participate in the
‘error-finding competition’).
Many thanks to Soumil Aggarwal, Aradhye Agarwal, Hitesh Reddy, Vansh Kachhwal, Riya
Sawhney and Dhruv Tyagi for finding errors in Version 0 of the lab-sheet.