Design a Google Analytics-Like Backend System
There are numerous ways to design a backend. We will take the microservices route because a
Google Analytics (GA) like backend requires web scalability. Microservices let us scale
horizontally and elastically in response to incoming network traffic, and a distributed stream
processing pipeline scales in proportion to the load.
Here is the high-level architecture of the Google Analytics (GA) like backend system.
[Figure: high-level architecture diagram. Recoverable labels: Analytic Customers, events, dashboard.]
Components Breakdown
With such a setup, we can run polyglot microservices to power the UI and the business logic
implementations.
Apache Spark is a good fit for the processing pipeline in our case. This is because Spark
achieves high performance for both batch and streaming data, using a state-of-the-art DAG
scheduler, a query optimizer, and a physical execution engine.
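To make the processing step concrete, here is a minimal sketch of the kind of aggregation such a job might compute: pageview counts per tumbling time window. It is written in plain Python rather than against the Spark API, so it only illustrates the logic. The `Event` fields, the `pageviews_per_window` helper, and the 60-second window are assumptions for illustration, not part of the original design.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical tracking event; the field names are assumptions."""
    site_id: str
    page: str
    ts: int  # event time in Unix seconds


def pageviews_per_window(events: Iterable[Event], window_s: int = 60) -> Counter:
    """Count events per (site_id, page, window_start) tumbling window."""
    counts: Counter = Counter()
    for e in events:
        # Floor the timestamp to the start of its window.
        window_start = e.ts - (e.ts % window_s)
        counts[(e.site_id, e.page, window_start)] += 1
    return counts


# Made-up sample events, two of which fall in the first 60-second window.
events = [
    Event("site-1", "/home", 0),
    Event("site-1", "/home", 59),
    Event("site-1", "/home", 60),
    Event("site-1", "/pricing", 61),
]
counts = pageviews_per_window(events)
```

In a real pipeline the same grouping would be expressed with the stream processor's own windowing operators, and the counts would be distributed across workers instead of held in a single in-memory `Counter`.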
Apache Ignite is a distributed, memory-centric database and caching platform. Here it is used
to share RDDs between Spark jobs and to persist them afterwards.
This will power the heavy computations needed to build aggregate data sets.
InfluxDB
InfluxDB is a time series database that supports efficient data ingestion and expensive time
series queries. It will store the processed data, which comes primarily from Apache Spark
processing and otherwise from the microservices.
Later, the microservices can consume data directly from InfluxDB, using its built-in aggregation support.
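As an illustration of the ingestion side, the sketch below formats one processed data point in InfluxDB's text-based line protocol (`measurement,tag_set field_set timestamp`). The helper `to_line_protocol`, the measurement name `pageviews`, and the tag and field names are hypothetical, and the sketch only builds the string. Sending it to a server is left out.

```python
from typing import Mapping, Union

FieldValue = Union[int, float, bool, str]


def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character in `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value: FieldValue) -> str:
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # String field values are double-quoted; escape backslashes and quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(
    measurement: str,
    tags: Mapping[str, str],
    fields: Mapping[str, FieldValue],
    ts_ns: int,
) -> str:
    """Build one line-protocol record with a nanosecond timestamp."""
    if not fields:
        raise ValueError("at least one field is required")
    head = _escape(measurement, ", ")
    tag_part = "".join(
        f",{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape(k, ',= ')}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{head}{tag_part} {field_part} {ts_ns}"


line = to_line_protocol(
    "pageviews",
    {"site_id": "site-1", "page": "/home"},
    {"count": 2},
    60_000_000_000,
)
```

Putting low-cardinality dimensions such as the site and page in tags, and the measured values in fields, is one reasonable layout under these assumptions, since tags are indexed and fields are not.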
Redshift
Redshift, an AWS-managed data warehouse, can be used to store historical datasets for later
retrieval and processing.
It also supports pre-planned queries across millions of records within milliseconds, so Redshift
can be used effectively to support basic Crystal Reports.
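A pre-planned report query of this kind could look like the sketch below. It uses an in-memory SQLite database purely as a local stand-in so the example runs without a warehouse. The table `daily_pageviews`, its columns, the helper `views_by_site`, and the sample rows are all made up for illustration, and a real deployment would run the query against Redshift through its own driver.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; this is an assumption of the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_pageviews (day TEXT, site_id TEXT, page TEXT, views INTEGER)"
)
conn.executemany(
    "INSERT INTO daily_pageviews VALUES (?, ?, ?, ?)",
    [
        ("2024-01-01", "site-1", "/home", 120),
        ("2024-01-01", "site-1", "/pricing", 30),
        ("2024-01-02", "site-1", "/home", 90),
        ("2024-01-02", "site-2", "/home", 45),
    ],
)

# The report query is fixed ahead of time; only the date range is a parameter.
REPORT_SQL = """
SELECT site_id, SUM(views) AS total_views
FROM daily_pageviews
WHERE day BETWEEN ? AND ?
GROUP BY site_id
ORDER BY site_id
"""


def views_by_site(connection: sqlite3.Connection, start_day: str, end_day: str):
    """Return (site_id, total_views) rows for the inclusive date range."""
    return connection.execute(REPORT_SQL, (start_day, end_day)).fetchall()


report = views_by_site(conn, "2024-01-01", "2024-01-02")
```

Keeping the query text constant and passing only parameters is what makes it "pre-planned": the same statement can be reused for every report request.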