
LS1.1 - V2 Scaling With Traditional Databases

This document discusses scaling traditional databases to handle increasing load from a web analytics application. Initially, the application uses a simple database table to track page visits. As the portal grows popular with many concurrent users, the database write load becomes a bottleneck. To address this, an intermediate queue is introduced between the web server and database to hold messages and prevent data loss. However, as load increases further, database partitioning is used to divide data across multiple machines for parallel writes. While this improves scalability, it introduces complexity in management and repartitioning. Traditional approaches struggle with scalability due to the need for complex, bug-prone application code to handle issues like sharding and replication. In contrast, big data

Uploaded by

R Krish

Scaling with Traditional Databases

Pravin Y Pawar
Web Analytics Application
Example Analytics Application
• Designing an application to monitor the page hits for a portal
• Every time a user visits a portal page in a browser, the server side keeps track of that visit
• The server maintains a simple database table that holds information about each page hit
• If a user visits the same page again, the page hit count is increased by one
• This information is used to analyze which pages are popular among the users

Source: Adapted from Big Data by Nathan Marz
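
The page-hit table described above could be sketched as follows. This is only an illustration, not the schema from the slides: the table name `page_hits`, the columns `user_id`, `url` and `hits`, and the helper functions are all assumptions, and SQLite stands in for whatever database the portal actually uses.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create the (hypothetical) page_hits table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_hits ("
        " user_id TEXT NOT NULL,"
        " url TEXT NOT NULL,"
        " hits INTEGER NOT NULL,"
        " PRIMARY KEY (user_id, url))"
    )
    return conn


def record_hit(conn, user_id, url):
    """Increase the hit count by one, creating the row on the first visit."""
    cur = conn.execute(
        "UPDATE page_hits SET hits = hits + 1 WHERE user_id = ? AND url = ?",
        (user_id, url),
    )
    if cur.rowcount == 0:
        # First visit by this user to this page.
        conn.execute(
            "INSERT INTO page_hits (user_id, url, hits) VALUES (?, ?, 1)",
            (user_id, url),
        )
    conn.commit()


def popular_pages(conn, limit=10):
    """Return (url, total_hits) pairs, most visited pages first."""
    return conn.execute(
        "SELECT url, SUM(hits) AS total FROM page_hits"
        " GROUP BY url ORDER BY total DESC, url ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

Note that every page view triggers one database write here, which is what makes the write load grow directly with traffic.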


Scaling with intermediate layer
Using a queue
• The portal is very popular, with a lot of users visiting it
▪ Many users are visiting the pages of the portal concurrently
▪ Every time a page is visited, the database needs to be updated to keep track of this visit
▪ A database write is a heavy operation
▪ Database writes are now the bottleneck

• Solution
▪ Use an intermediate queue between the web server and the database
▪ The queue holds the page-hit messages until the database can process them
▪ Messages are not lost while they wait to be written
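
A minimal sketch of the queue idea, under several assumptions: Python's in-process `queue.Queue` stands in for a real message queue, a `Counter` stands in for the database, and the names `handle_request`, `run_worker` and `batch_size` are hypothetical. Applying the hits in batches is also an assumption, shown as one common way to reduce the number of writes.

```python
import queue
import threading
from collections import Counter


def handle_request(hit_queue, url):
    """Called by the web server for each page view; does no database work."""
    hit_queue.put(url)


def run_worker(hit_queue, store, batch_size=100):
    """Drain the queue and apply hits to the store in batches.

    A None message is used as a stop signal (an assumption of this sketch).
    """
    while True:
        batch = [hit_queue.get()]  # block until at least one message arrives
        while len(batch) < batch_size:
            try:
                batch.append(hit_queue.get_nowait())
            except queue.Empty:
                break
        # One update per batch instead of one write per page view.
        store.update(url for url in batch if url is not None)
        if None in batch:
            return


hit_queue = queue.Queue()
store = Counter()  # stand-in for the page-hit table
worker = threading.Thread(target=run_worker, args=(hit_queue, store))
worker.start()
for url in ["/home", "/about", "/home"]:
    handle_request(hit_queue, url)
hit_queue.put(None)  # ask the worker to stop
worker.join()
```

An in-memory queue like this one only absorbs bursts while the database is slow; for messages to survive a crash, the queue itself would have to be persistent.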
Scaling with Database Partitions
Using Database shards
• The application is now too popular
▪ Users are using it very heavily, increasing the load on the application
▪ Maintaining the page view count is becoming difficult even with the queue

• Solution
• Use database partitions
▪ Data is divided into partitions (shards), which are hosted on multiple machines
▪ Database writes are parallelized across the partitions
▪ Scalability increases
▪ Complexity also increases
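
Partitioning can be sketched as a routing step in application code. The example below assumes hash-based sharding on the page URL, which the slides do not specify; `shard_for` and `ShardedHitStore` are hypothetical names, and each shard is a plain dictionary standing in for a database on a separate machine.

```python
import zlib


def shard_for(url, num_shards):
    """Pick a shard index from a hash of the key.

    crc32 is used because it gives the same value in every process,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(url.encode("utf-8")) % num_shards


class ShardedHitStore:
    """Page-hit counts spread over several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one database partition.
        self.shards = [dict() for _ in range(num_shards)]

    def record_hit(self, url):
        shard = self.shards[shard_for(url, len(self.shards))]
        shard[url] = shard.get(url, 0) + 1

    def hits(self, url):
        shard = self.shards[shard_for(url, len(self.shards))]
        return shard.get(url, 0)
```

Writes for different URLs can now go to different machines, but the routing logic lives in the application, which is where the added complexity comes from.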
Issues Begin
Bottlenecks
• Disks are prone to failure, so a partition can become inaccessible
• Managing a large number of shards is complicated
• Repartitioning is required again whenever the load increases
• Application code becomes buggier as complexity increases
• It is difficult to recover from mistakes made by either application code or humans
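
The cost of repartitioning can be illustrated with a small calculation. This sketch assumes keys are assigned to shards by `hash mod number_of_shards`, which is an assumption and not something the slides state; `shard_for` and `fraction_moved` are hypothetical helpers.

```python
import zlib


def shard_for(key, num_shards):
    """Assign a key to a shard by hashing it (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def fraction_moved(keys, old_shards, new_shards):
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical page URLs used only for this illustration.
keys = [f"/page/{i}" for i in range(10_000)]
moved = fraction_moved(keys, 4, 5)
print(f"{moved:.0%} of keys change shard when going from 4 to 5 shards")
```

With a uniformly distributed hash, a key stays on its shard only when its hash gives the same remainder modulo 4 and modulo 5, which holds for about one key in five. Adding a single shard therefore moves most of the data, and all of that has to be handled by application and operations code.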
Rise of Big Data Systems
How it helps
• Main issue with traditional data processing applications
▪ Hard to make them scalable
▪ Hard to keep them simple
• Because everything is managed by application code
▪ Which is more prone to mistakes due to buggy implementations

• New-age systems, aka Big Data Systems
▪ They handle high data volumes arriving at a very fast rate from a variety of sources
▪ The systems are aware of their distributed nature, so they are capable of working with each other
▪ The application does not need to handle common issues such as sharding and replication
▪ Scalability is achieved by horizontal scaling – just add new machines
▪ Developers can focus on application logic rather than on maintaining the environment
Thank You!
In our next session: Big Data Systems
