Data Sources: Databases File System Website Object Storage

This document provides an overview of the architecture of a data analytics platform, including: 1) Data can be ingested from databases, file systems, websites, and object storage and processed via streaming, batch, or Spark/MapReduce jobs. 2) The platform includes tools like Kafka, HDFS, Zeppelin, Prometheus, and Grafana for data ingestion, storage, processing, and monitoring. 3) Key components include a NiFi cluster for data ingestion, a Kafka cluster for streaming data, HDFS for storage, Kubernetes for jobs, and Active Directory for security and user management.

Data Sources

Databases File System Website Object Storage

Users

SSP Portal Zeppelin

Streaming Metrics Data Security


Data Ingestion Batch
Data Monitoring
Processing User management
Portal
Kafka Manager

Map-Reduce/Spark
Nifi Cluster Kafka Cluster
Active Directory
Jobs
Prometheus

Admin
Grafana Apache Ranger

HDFS Storage system

Kubernetes
User UI
HDFSFileBrowser
SSP Portal
HUE HueDB Zeppelin
(Need to be Designed)

Monitoring Applications
Kafka manager Prometheus Grafana

Nifi Cluster
Nifi1 Nifi2 Nifi3

NifiRegistry1 NifiRegistry2 NifiRegistry3

Kafka Cluster

Kafka Server1 Kafka Server2 Kafka Server3

Zookeeper Service

Zookeeper1:2181 Zookeeper2:2182 Zookeeper3:2183
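Clients such as Kafka usually take the ZooKeeper ensemble as a single comma-separated connection string. A minimal sketch of building that string from the three nodes listed above follows; the host names are copied from the diagram labels and stand in for real host names, and the helper name `zookeeper_connect_string` and the `/kafka` chroot in the usage note are hypothetical.

```python
# Host names and ports are copied from the diagram labels; treat them as
# placeholders for the real host names in a given environment.
ZOOKEEPER_NODES = [
    ("Zookeeper1", 2181),
    ("Zookeeper2", 2182),
    ("Zookeeper3", 2183),
]


def zookeeper_connect_string(nodes, chroot=""):
    """Join (host, port) pairs into a comma-separated connection string.

    `chroot` is an optional path suffix (for example "/kafka") appended once
    after the last host, which is how ZooKeeper clients expect it.
    """
    if not nodes:
        raise ValueError("at least one ZooKeeper node is required")
    hosts = ",".join(f"{host}:{port}" for host, port in nodes)
    return hosts + chroot


if __name__ == "__main__":
    print(zookeeper_connect_string(ZOOKEEPER_NODES))
```

Passing `chroot="/kafka"` would append that path once at the end of the string, which keeps one application's znodes separate from others sharing the same ensemble.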

Journal Nodes
Journal Node 1 Journal Node 2 Journal Node 3

Namenode Service

Active namenode:8020 Passive namenode:8021

DataNode Cluster

DataNode 1 DataNode 2 DataNode 3 DataNode n


HDFS Architecture

Please refer to https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html for the configuration (environment variables) of the HDFS architecture.
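As a rough illustration of how the namenode pair and the three journal nodes map onto HDFS high-availability settings, the sketch below assembles the core `hdfs-site.xml` properties as a Python dictionary. The property names are the ones used by HDFS HA with the Quorum Journal Manager. Everything else is an assumption made for illustration: the nameservice ID `mycluster`, the namenode IDs `nn1` and `nn2`, all host names, and the journal node port 8485 (the usual default, not stated in the diagram).

```python
def hdfs_ha_properties(nameservice, namenodes, journal_nodes):
    """Build the core hdfs-site.xml properties for an HA namenode pair.

    nameservice   -- logical name of the HA cluster (hypothetical here)
    namenodes     -- dict mapping namenode ID -> "host:rpc_port"
    journal_nodes -- list of "host:port" strings for the journal nodes
    """
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Journal nodes are separated by ';' inside a qjournal:// URI.
        "dfs.namenode.shared.edits.dir": (
            "qjournal://" + ";".join(journal_nodes) + "/" + nameservice
        ),
    }
    for nn_id, address in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = address
    return props


# Ports 8020 and 8021 come from the diagram; host names, IDs and the
# journal node port 8485 are assumptions, not values from the document.
EXAMPLE = hdfs_ha_properties(
    nameservice="mycluster",
    namenodes={"nn1": "namenode1:8020", "nn2": "namenode2:8021"},
    journal_nodes=["journalnode1:8485", "journalnode2:8485", "journalnode3:8485"],
)

if __name__ == "__main__":
    for name, value in EXAMPLE.items():
        print(f"{name} = {value}")
```

Each key and value pair would become one `<property>` entry in `hdfs-site.xml`; the linked Hadoop page lists the remaining settings, such as fencing and failover.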
