
Design a Google Analytics-like Backend System

There are numerous ways to design a backend. We will take the microservices route because a Google Analytics (GA)-like backend requires web-scale elasticity. Microservices let us scale horizontally in response to incoming network traffic, and a distributed stream-processing pipeline scales in proportion to the load.

Here is the high-level architecture of the Google Analytics (GA)-like backend system.

[Architecture diagram] Analytics events and the customers dashboard enter through load balancers (HAProxy) into microservices running on Kubernetes (data plane) with Istio (control plane); Postgres holds CMS/OLTP data. Events flow through Kafka messaging into Apache Spark + Ignite processing, and results land in InfluxDB (time series database, with a time series plugin, serving the read-heavy path) and Redshift (warehouse).

Components Breakdown

Analytics events data source

Every web page or mobile site tracked by GA embeds a tracking code that collects data about the visitor. It loads an async script that assigns a tracking cookie to the user if one is not already set, and it sends an XHR request for every user interaction.
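
For illustration, here is a minimal sketch (in Python, standing in for the browser-side XHR) of the kind of payload the tracker might post for each interaction; the /collect URL and field names are assumptions, not GA's actual wire format.

import uuid
from datetime import datetime, timezone

import requests

# Hypothetical collection endpoint exposed by our backend (behind HAProxy).
COLLECT_URL = "https://analytics.example.com/collect"

def send_event(tracking_cookie, event_type, page):
    """Mimics the async tracker's XHR: one small JSON payload per interaction."""
    payload = {
        "visitor_id": tracking_cookie or str(uuid.uuid4()),  # assign an id if no cookie yet
        "site_id": "site-123",                               # which tracked property this is
        "event_type": event_type,                            # e.g. "pageview", "click"
        "page": page,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    # Short timeout so tracking never blocks the page for long.
    requests.post(COLLECT_URL, json=payload, timeout=2)

send_event(tracking_cookie=None, event_type="pageview", page="/pricing")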
HAProxy Load Balancer

HAProxy performs load balancing (layer 4 and layer 7 proxying) to improve the performance and reliability of the server environment by distributing the workload across multiple servers.

Istio Service Mesh and Kubernetes microservices cluster

Istio makes it easy to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, with few or no code changes in service code. You add Istio support to services by deploying a special sidecar proxy throughout your environment that intercepts all network communication between microservices, then configure and manage Istio using its control plane functionality, which includes:
● Automatic load balancing for HTTP, gRPC, WebSocket, and TCP traffic.
● Fine-grained control of traffic behavior with rich routing rules, retries, failovers, and fault
injection.
● A pluggable policy layer and configuration API supporting access controls, rate limits
and quotas.
● Automatic metrics, logs, and traces for all traffic within a cluster, including cluster ingress
and egress.
● Secure service-to-service communication in a cluster with strong identity-based
authentication and authorization.
Istio is designed for extensibility and meets diverse deployment needs.

With such a setup, we can run polyglot microservices to power the UI and business logic implementations; a sketch of one collection microservice follows.
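
A minimal sketch of such a collection microservice, assuming Flask for the HTTP layer and a hypothetical publish_event helper (sketched in the Kafka section below) that forwards events into the pipeline:

from flask import Flask, jsonify, request

from events_producer import publish_event  # hypothetical module, sketched in the Kafka section

app = Flask(__name__)

@app.route("/collect", methods=["POST"])
def collect():
    """Receives the tracker's XHR, does minimal validation, and hands the event to Kafka."""
    event = request.get_json(silent=True)
    if not event or "visitor_id" not in event or "event_type" not in event:
        return jsonify({"error": "malformed event"}), 400
    publish_event(event)
    return "", 204  # nothing to return to the tracker

if __name__ == "__main__":
    app.run(port=8080)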

Apache Kafka Streams


Apache Kafka is used for building real-time streaming data pipelines.
The ingested data is read directly from Kafka by Apache Spark for stream processing, which creates time series RDDs (Resilient Distributed Datasets).
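
A minimal producer sketch, assuming the kafka-python client and a topic named analytics-events (illustrative choices, not mandated by the design); this is the publish_event helper referenced by the collection service above:

# events_producer.py -- hypothetical module used by the collection microservice
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker address
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_event(event):
    """Publish one analytics event; keying by visitor keeps a visitor's events in order."""
    producer.send("analytics-events", key=event["visitor_id"], value=event)

def flush():
    producer.flush()  # call on shutdown so buffered events are not lost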

Apache Spark + Ignite Processing


Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput,
fault-tolerant stream processing of live data streams.
It provides a high-level abstraction called a discretized stream, or DStream, which represents a
continuous stream of data.

Apache Spark is a perfect choice in our case because it achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine.

Apache Ignite is a distributed, memory-centric database and caching platform that is used here to share RDDs between Spark jobs and to persist them for later use.

This powers the heavy computations needed to build collective data sets.
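
A minimal sketch of the Spark side, assuming Structured Streaming (the DStream API mentioned above would work similarly) and the spark-sql-kafka connector on the classpath; the Ignite-backed RDD sharing and persistence are not shown:

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("ga-like-stream-processing").getOrCreate()

# Shape of the JSON events produced by the collection service (an assumption).
schema = StructType([
    StructField("visitor_id", StringType()),
    StructField("site_id", StringType()),
    StructField("event_type", StringType()),
    StructField("page", StringType()),
    StructField("ts", StringType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "analytics-events")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .withColumn("ts", F.to_timestamp("ts"))
)

# Per-minute page view counts per site and page, tolerating slightly late events.
page_views = (
    events.withWatermark("ts", "2 minutes")
    .groupBy(F.window("ts", "1 minute"), "site_id", "page")
    .count()
)

# For the sketch, print to the console sink.
query = page_views.writeStream.outputMode("update").format("console").start()
query.awaitTermination()

In the real pipeline the console sink would be replaced with writers that push the windowed counts into InfluxDB and Redshift.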

InfluxDB
InfluxDB is a time series database chosen to support efficient data ingestion and expensive time series queries. It stores the processed data coming either from the Apache Spark processing (the primary path) or from the microservices.
Later, microservices can consume data directly from InfluxDB, which has built-in aggregation support.
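
A minimal write-and-query sketch, assuming the InfluxDB 1.x Python client and a database named analytics (names are illustrative):

from datetime import datetime, timezone

from influxdb import InfluxDBClient  # pip install influxdb (1.x client)

client = InfluxDBClient(host="localhost", port=8086, database="analytics")
client.create_database("analytics")  # no-op if it already exists

# One aggregated point as it might arrive from the Spark job.
client.write_points([
    {
        "measurement": "page_views",
        "tags": {"site_id": "site-123", "page": "/pricing"},
        "time": datetime.now(timezone.utc).isoformat(),
        "fields": {"count": 42},
    }
])

# Microservices can lean on InfluxDB's built-in aggregation at query time.
result = client.query(
    'SELECT SUM("count") FROM "page_views" '
    'WHERE time > now() - 1h GROUP BY time(5m), "site_id"'
)
print(list(result.get_points()))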

Redshift
Redshift, being an AWS-managed data warehouse, can be used to store historical datasets for later retrieval and processing.
It also serves pre-planned queries across millions of records within milliseconds, so Redshift can effectively support basic Crystal Reports-style reporting.
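
Because Redshift speaks the PostgreSQL wire protocol, such reports can be produced with a plain psycopg2 connection; the cluster endpoint, table, and column names below are assumptions for illustration:

import os

import psycopg2  # Redshift is reached over the PostgreSQL wire protocol

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",  # hypothetical endpoint
    port=5439,
    dbname="analytics",
    user="report_user",
    password=os.environ["REDSHIFT_PASSWORD"],
)

# Daily page views for the last 30 days: the kind of pre-planned query a report would run.
with conn.cursor() as cur:
    cur.execute("""
        SELECT site_id, DATE_TRUNC('day', ts) AS day, COUNT(*) AS page_views
        FROM events
        WHERE ts >= DATEADD(day, -30, GETDATE())
        GROUP BY site_id, day
        ORDER BY day
    """)
    for site_id, day, page_views in cur.fetchall():
        print(site_id, day, page_views)

conn.close()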
