
Saved Search Service for dubizzle.com (by Usman Gondal)

Separation of Concerns

The Saved Search Service is divided into three services/subsystems:

1. Insertion Service is responsible for creating, updating, and deleting Saved Searches. Only this
service is allowed to write to the Saved Searches database, and only this service will have
access to the MongoDB master.

2. Extraction Service is responsible for reading Saved Searches. This service will interact with
a secondary node because it only needs to read data. The only parameter in the query will be
email_address; this field will be indexed for faster retrieval.

3. Compute & Send Alert Service: the Compute Service will consume the main dubizzle.com
listings search service and save listing search results for each email_address. Since one user
can have many saved searches, these will be inserted into MongoDB as denormalized data
(a sketch follows this list).
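
To make the denormalized layout concrete, here is a minimal sketch (in Python) of what one
SavedSearch record might look like; every field name apart from email_address is an
illustrative assumption, not part of this spec:

    # Hypothetical shape of one denormalized SavedSearch record: all of a
    # user's saved searches live in a single document keyed by email_address.
    saved_search_record = {
        "email_address": "user@example.com",
        "searches": [
            {"category": "motors", "make": "toyota", "max_price": 50000},
            {"category": "property", "bedrooms": 2, "city": "dubai"},
        ],
    }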

Database Choice and Design (MongoDB)

1. We need dynamic field names within the same record

2. We need a simple key-value store

3. We want to keep things as simple and as denormalized as possible

The Saved Search Service will be powered by two separate MongoDB databases:

1. SavedSearch Database

2. Alerts Database
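
Under the assumption that an alert document stores the listing search results computed for one
user, a record in the Alerts database might look like the following sketch (all field names
apart from email_address are hypothetical):

    # Hypothetical shape of a document in the Alerts database: the listing
    # search results the Compute Service saved for one email_address.
    alert_document = {
        "email_address": "user@example.com",
        "alerts": [
            {"listing_id": "abc123", "title": "Toyota Corolla 2019", "price": 42000},
            {"listing_id": "def456", "title": "Toyota Yaris 2020", "price": 38000},
        ],
    }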

Scalability Principles

1. Having separate infrastructure, i.e. load balancers and app servers, for the insertion
service, extraction service, and compute service

2. Using sharding on MongoDB from the get-go, with the hashed sharding technique, which allows
for uniform load distribution when writing data; a setup sketch follows this list.
(https://www.mongodb.com/blog/post/performance-best-practices-sharding)

3. Implementing Memcached between the app servers and the database

4. Never disturbing the main dubizzle.com database: never writing to it in any way, only
requesting read access within the Compute Service to generate listing alerts and save them to
a separate "Alerts" database

5. Writing to the master via RabbitMQ: giving write access for the SavedSearch database to only
one service, via RabbitMQ, to cater for millions of requests daily, thus freeing up the master,
and using secondary/slave nodes to read the data

6. Updating the cache in the Compute Service to reduce load on the database.
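
As a setup sketch for principle 2, the following Python snippet enables hashed sharding via
pymongo; the mongos host name and the savedsearch.saved_searches database/collection names are
assumptions, not part of this spec:

    from pymongo import MongoClient

    # Connect to a mongos router (hypothetical host name).
    client = MongoClient("mongodb://mongos-host:27017")

    # Enable sharding for the database, then shard the collection on a
    # hashed email_address key for uniform write distribution.
    client.admin.command("enableSharding", "savedsearch")
    client.admin.command(
        "shardCollection",
        "savedsearch.saved_searches",
        key={"email_address": "hashed"},
    )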

Insert Service

REST API Interface

HTTP POST: insert.savedsearchapi.dubizzle.com/1/savedsearches/create

HTTP POST: insert.savedsearchapi.dubizzle.com/1/savedsearches/update

HTTP POST: insert.savedsearchapi.dubizzle.com/1/savedsearches/delete

In the interest of scalability, ease of development, documentation, and testing, and openness
of architecture, we will use separate top-level nodes (app servers) for reading data and
writing data. This is why we will have one API for inserting new SavedSearches into the system
and a separate API for extracting existing SavedSearches from the system.

Benefits of using HTTP POST for all operations

1. Ensures openness of architecture by allowing more data to be transmitted. For example, if in
the future dubizzle.com wants to allow a user to take a picture of a laptop or car, post it to
the website, and say "send me new listings which contain a laptop like this", an AI method will
then match and find similar listings. The POST call will still work because it can carry far
more data than a GET call, which is limited to data in the URL.

2. Developers will have to make fewer choices when writing code; they will know that by default
all calls on the SavedSearch API are POST calls, making their work easier. A sample client call
follows this list.
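
A sample client call against the create endpoint, sketched in Python with the requests library;
the payload shape is an illustrative assumption, not a defined contract:

    import requests

    # Hypothetical payload: field names other than email_address are
    # illustrative only.
    payload = {
        "email_address": "user@example.com",
        "search_params": {"category": "motors", "make": "toyota", "max_price": 50000},
    }

    resp = requests.post(
        "https://insert.savedsearchapi.dubizzle.com/1/savedsearches/create",
        json=payload,
        timeout=5,
    )
    resp.raise_for_status()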

Scalability and Openness of Architecture

1. A load balancer serving HTTP servers (for the insertion API) for horizontal scaling

2. The HTTP server code forwards the data to RabbitMQ, which fills in all key-value pairs and
converts them to the search API data representation (leveraging MongoDB dynamic fields)

3. This "Create Data Representations for all products on the fly" operation will contain code
which will generate all products' data representations given any 1 product's data.

4. Thus allowing for openness in architecture

5. And then writes to the MongoDB master (a consumer sketch follows this list)
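
A minimal consumer sketch for steps 2-5, using pika and pymongo; the queue name, host names,
and the upsert logic are assumptions rather than part of this design:

    import json

    import pika
    from pymongo import MongoClient

    # Hypothetical master connection and queue name.
    mongo = MongoClient("mongodb://mongo-master:27017")
    searches = mongo.savedsearch.saved_searches

    def on_message(channel, method, properties, body):
        doc = json.loads(body)
        # Upsert into the per-user record, leveraging MongoDB dynamic fields.
        searches.update_one(
            {"email_address": doc["email_address"]},
            {"$set": doc},
            upsert=True,
        )
        channel.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters("rabbitmq-host"))
    channel = connection.channel()
    channel.queue_declare(queue="savedsearch_writes", durable=True)
    channel.basic_consume(queue="savedsearch_writes", on_message_callback=on_message)
    channel.start_consuming()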



Extract Service

HTTP POST extract.savedsearchapi.dubizzle.com/1/savedsearches/search/email

Using only HTTP POST to ensure openness for image data.
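
A minimal read-path sketch for this service, assuming a replica set named rs0 and reading from
secondaries via the connection-string read preference; host and collection names are
hypothetical:

    from pymongo import MongoClient

    # Read from secondary nodes only, per the extraction design.
    client = MongoClient(
        "mongodb://mongo-host:27017/?replicaSet=rs0&readPreference=secondaryPreferred"
    )
    searches = client.savedsearch.saved_searches

    # email_address is indexed for faster retrieval.
    searches.create_index("email_address")

    def get_saved_searches(email_address):
        return list(searches.find({"email_address": email_address}, {"_id": 0}))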

Compute Service
Openness of Architecture

By creating data representations for all (new and existing) products on the fly, we will ensure
that if any number of new products are added to the system, their respective data
representations automatically become available. This will also allow backward compatibility if
we want to change the data representation in the next version of any of our products.

This is essentially a script which creates new key-value pairs for the other products, with the
same email address, in the same MongoDB record.

To get the key names for the other products, this script will read data from a static class:

    key_names = [
        ["title", "title_en", "en_title"],
        ["mileage", "km", "mi"],
        ["desc", "description", "<<<new_name>>>"],
        ["price", "pr", "p"],
    ]
