See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/336915402

Big Data Architectures : A detailed and application oriented review

Article · October 2019


All content following this page was uploaded by Rajat Kumar Behera on 31 October 2019.


Big Data Architectures: A detailed and application oriented review

Godson Koffi Kalipe, Rajat Kumar Behera
School of Computer Engineering
Kalinga Institute of Industrial Technology (Deemed to be University)
Bhubaneswar, Odisha, India
[email protected] [email protected]

Abstract: Big Data refers to huge amounts of heterogeneous data from both traditional and new sources, growing at a higher rate than ever. Due to their high heterogeneity, it is a challenge to build systems that centrally process and efficiently analyze such huge amounts of data, which are both internal and external to an organization. A Big Data architecture describes the blueprint of a system handling massive volumes of data during storage, processing, analysis and visualization. Several architectures belonging to different categories have been proposed by academia and industry, but the field still lacks benchmarks. Therefore, a detailed analysis of the characteristics of the existing architectures is required in order to ease the choice between architectures for specific use cases or industry requirements. The types of data sources, the hardware requirements, the maximum tolerable latency, the fit to the industry and the amount of data to be handled are some of the factors that need to be considered carefully before choosing an architecture for a Big Data system. A wrong choice of architecture can result in a serious decline in a company's reputation and business. This paper reviews the most prominent existing Big Data architectures, their advantages and shortcomings, their hardware requirements, their open source and proprietary software requirements and some of their real-world use cases catering to each industry. For each architecture, we present a set of specific problems, related to particular application domains, that it can be leveraged to solve. Finally, a trade-off comparison between the various architectures is presented as the concluding remarks. The purpose of this body of work is to equip Big Data architects with the necessary resources to make better informed choices when designing optimal Big Data systems.

Keywords: Big Data Architecture, Big Data Architectural Patterns, Big Data Use Cases

I. INTRODUCTION

The non-stop growth of data, the frantic release of new electronic devices and the data-driven decision-making trend in companies are fueling a constant demand for more efficient Big Data processing systems. Investment in Big Data architecture has been growing rapidly in recent years and, according to Gartner, businesses will keep investing more in IT in 2018 and 2019, focusing on IoT, blockchain and Big Data [1,2]. 178 billion dollars were spent on Data Center Systems in 2017 and that number is expected to increase in the coming years [5]. Considering the substantial funds companies invest in their Big Data solutions, it appears obvious that careful planning should be done ahead of the actual implementation of a solution. However, according to the McKinsey institute, many organizations today are facing difficulties because of the absence of architectural planning of their data management solutions [38]. They develop overlapping functionalities and are not able to achieve sustainability because they usually develop technology-driven solutions.

Early and careful Big Data system architecting considers a holistic data strategy while focusing on real business objectives and requirements. It is of the utmost importance to write down current as well as future needs in order to take scalability considerations into account from the earliest stages of the design of the Big Data system. Once that list of use cases and requirements is clear, a company can move forward and select, among the many existing Big Data architectures, the one most suitable for its use.

The Lambda architecture was one of the first architectures to be proposed for Big Data processing and it has been established as the standard over time [4]. The Kappa architecture came next, followed by several other architectures [3] designed to address some of the limitations of the Lambda architecture for use cases where the former standard failed to offer satisfying results.

In this paper, we discuss different architectures with their optimal use cases, along with some of the factors that need to be considered to make the best choice from a pool of candidate architectures. The paper also highlights whether the architecture to be adopted for a given use case should be built from scratch or incrementally constructed from an existing architecture.

The structure of this paper is as follows. Section 2 presents an overview of the rise of Big Data and the challenges that have accelerated the need for new tools and architectures. The next section reviews the work that has been done in the Big Data field to survey the domain, propose architectures and eventually compare them. Section 4 gives, for each architecture, a brief description, its advantages and disadvantages, a set of problems it can solve, some of the fields where it can be used and the hardware and open source configuration required to set up an environment based on that architecture. An overall comparison of the architectures discussed is presented in Section 5 and Section 6 concludes the paper.

II. BACKGROUND

In 2013, the McKinsey institute reported that there were more than 2 billion Internet users worldwide [55]. In 2018, according to an article published by Forbes, that number has jumped to 3.7 billion users who are performing
over 5 billion searches every day [61, 62]. Social media remains one of the biggest sources of the data produced in the world. According to Domo's "Data Never Sleeps 6.0" report, every minute, Internet users watch more than 4 million videos on YouTube, close to 13 million text messages are exchanged, the Weather Channel receives 18 million forecast requests and 97,000 hours of video content are streamed on the Internet [63]. The growth is particularly apparent with social media, considering companies like Instagram, one of the most used social media platforms in the world, which grew its active user base from 600 million to 800 million users between December 2016 and 2018; those users are now posting 95 million photos and videos every day [64]. Companies across various industries are experiencing a similarly frenetic data growth. In 2018, Amazon shipped 1,111 packages every minute and Uber was used to book 1,389 rides every minute [63]. Another main contributor to the data flood is the Internet of Things industry. The International Data Corporation (IDC) and Intel predict that there will be 200 billion IoT devices in use by 2020 [65]. Considering that only 15 billion devices were identified in 2015, up from 2 billion in 2006, one starts to get an idea of the exponential rate at which data is growing in size. All those devices transmit information across networks, sometimes carrying sensitive data intended to trigger immediate reactions. The case of the voice control feature, which is now used by 8 million people every month, illustrates the point [63].

From all that has been described above, it is evident that traditional single machines can no longer process the diverse and enormous amount of data being produced at such a high speed. Several challenges have arisen with the birth of Big Data. They include data storage issues, of course, but also, for instance, the need to separate high-quality data from noise and errors as fast as possible because of the volatility of the data. Other challenges are faced throughout the data analysis process. During the acquisition of the data, there is a need to filter it, reduce it and associate it with metadata. There is also a need to transform unstructured data and eliminate errors from it in a cleaning process before consuming it. Heterogeneous data coming from various sources have to be integrated into single data repositories, requiring new designs and systems more complex than the traditional ones. In addition, most use cases require that integration to be automated. Queries also need to scale easily over different amounts of data and to execute in a matter of seconds for critical use cases. Finally, there is a need to reflect on the design of specific tools that present a human-friendly interpretation of the data being generated. Another category of challenges that has led to the conception of Big Data ecosystems is that of management-related issues such as privacy and security, among many others [66, 67, 68].

The landscape of Big Data has kept changing since its birth: the prices of storage devices have been considerably reduced while the number of data collection methods has kept increasing. Nowadays, in the same system, some data arrive at a very fast rate in constant streams while others arrive periodically in big batches. That diversity has led to the creation of Big Data architectures intended to accommodate various data flows and solve the issues specific to each of them [69].

III. LITERATURE REVIEW

Many reviews have been done in the field of Big Data. Most of them cover technologies, tools, challenges and opportunities in the field [55]. They try to shed more light on the field of Big Data and present its advantages and drawbacks [56]. Most of them review, for each of the traditional Big Data processing steps from data generation to analysis, the background, the technical challenges and the latest advances. There has also been some work done to review Big Data analytics methods and tools [60].

Reference architectures for Big Data ecosystems have been published by top tech companies such as IBM [53], Oracle [51] and Microsoft [52], and by the National Institute of Standards and Technology (NIST) [54]. Researchers have used various approaches to try to come up with a reference architecture that could be used across industries in a wide variety of use cases. Pekka, P. and Daniel, P. [39] have proposed a technology-independent architecture based on the study of seven major Big Data use cases at top tech companies like Facebook, LinkedIn or Netflix. They decomposed the 7 reviewed architectures into a set of components, which they then classified according to their roles into the 8 components forming their reference architecture. Nevertheless, not all use cases required all the components of their architecture and, considering use cases other than the reviewed ones, they acknowledged that more components might need to be added. Other authors have followed a similar approach to propose a five-layer reference architecture in [47]. Mert, O. G., et al. [40] searched more than 19 million projects in the GitHub database and the Apache Software Foundation project list for those related to Big Data. From 113 documents, including whitepapers and project documentation across diverse industries, they extracted the 241 most popular and actively developed open-source tools. The authors then classified the tools into 11 groups constituting the components of their reference architecture. They also discussed the suitability of different tools for implementing their architecture, taking into account factors such as timing, data size, platform independence and data-storage model requirements. Reference architectures have also been proposed to address specific issues such as security in Big Data ecosystems. An example is the Big Data Security reference architecture proposed in [50]. That architecture extends the NIST reference architecture to include, for each component, tools and specifications that ensure the protection of the elements of interest: encryption for data, authentication and authorization for networks, and containerization and isolation for process execution. The authors also presented a brief, high-level comparison of their architecture with other existing reference architectures.

There have been several industry-specific propositions too, based on the requirements of special use cases. Architectural solutions have been proposed in the fields of Supply Chain Management [48], Intelligent Transportation Systems [46], telecommunications [45], healthcare [44], communication network security (for fault detection and monitoring) [43], smart grids in electrical networks [42], and higher education and universities [41]. Those architectures all reuse all or some of the layers defined in the common reference architectures, namely: the data sources layer, the
extraction/collection/aggregation layer, the storage layer, the analysis layer and the visualization layer. Each of these industry-specific architectures defines the components of its layers in terms of the technological tools or features required by the use case.

Existing architectures have been extensively documented over time as they gained popularity. Most of the existing research focuses on two of the most popular ones: the Lambda and Kappa architectures. Zhelev and Rozeva [5] worked to equip data architects with decision-making information by reviewing cloud types, data persistence options, data processing paradigms and tools and, briefly, both the Lambda and Kappa architectures, specifying the strengths and flaws of each and the situations in which each would be suitable. Other works have presented both the Lambda and Kappa architectures along with some of their strengths and weaknesses [6]. The authors also presented a short comparison of both architectures before proposing a new architecture to overcome the deficiencies of both. The most exhaustive work has been done in [7], where seven popular architectures were described with the software requirements necessary to implement them. Our aim is to extend the work done in [7] by describing not only existing related use cases but also a set of specific problems each architecture can solve in a given industrial context.

From an industrial application point of view, a lot of work has been done to show how Big Data can be leveraged to provide better services or increase business profit in various fields [8, 9, 10]. [8] provides insights on the kind of hardware required to build a Big Data processing system, discussing electric energy, storage, processing and network requirements at a very high level. None of the existing works addressed detailed hardware requirements or attempted to classify use cases and target problems by architecture.

To the best of our knowledge, there does not yet exist any reference document that guides a Big Data system architect in choosing among the most popular Big Data architectures given the industry of application, the existing hardware architecture, the budget allotted to purchasing new components and the problems the system is expected to solve.

IV. BIG DATA ARCHITECTURES

Big Data architectures are designed to manage the ingestion, processing, visualization and analysis of data that are too large or too complex to handle with traditional tools. From one organization to the other, that data might consist of hundreds of gigabytes or hundreds of terabytes. In the context of this paper, the minimum amount we consider as Big Data is 1 TB.

A Big Data architecture determines how the collection, storage, analysis and visualization of data are done. We also refer to it to define how to transform structured, unstructured and semi-structured data for analysis and reporting. In this section, we discuss five of the most prominent Big Data architectures that have gained recognition in the industry over the years.

A. Lambda Architecture

The Lambda architecture is an approach to Big Data processing that aims to achieve low-latency updates while maintaining the highest possible accuracy. It is divided into 3 layers.

The first, the "batch layer", is composed of a distributed file system which stores the entirety of the collected data. The same layer stores a set of predefined functions to be run on the dataset to produce what is called a batch view.

Those views are stored in a database constituting the "serving layer", from which they can be queried interactively by the user.

The third layer, called the "speed layer", computes incremental functions on the new data as it arrives in the system. It processes only the data generated between two consecutive recomputations of the batch views and produces real-time views, which are also stored in the serving layer. The different views are queried together to obtain the most accurate possible results. A representation of this architecture is given in Figure 1.

Fig. 1. Lambda Architecture

1) Advantages

Nathan Marz proposed the Lambda architecture (LA) with the primary objective of alleviating the problems encountered when using fully incremental systems. Such systems have exhibited problems such as operational complexity (online compaction, for example), the need to handle eventual consistency in highly available systems and the lack of human fault tolerance. By contrast, a Lambda architecture-based system provides better accuracy, higher throughput and lower latency for reads and updates simultaneously, without compromising on data consistency. An LA-based system is also more resilient thanks to the distributed file system used to store the master dataset, mostly because it is less subject to human errors (such as unintended bulk deletions) than a traditional RDBMS. Finally, the Lambda architecture helps achieve the main requirements of a reliable Big Data system, among which are robustness and fault tolerance, provided through the batch layer. Each layer of the architecture is independently scalable, and the Lambda architecture can easily be generalized or extended for a great number of use cases while requiring only minimal maintenance [4]. This architecture provides both real-time data analysis, through ad-hoc querying of real-time views, and historical data analysis [11].
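
The interplay of the three Lambda layers can be illustrated with a toy, in-memory sketch. It is not taken from any of the cited systems: the class name `LambdaCounter`, its methods and the event-counting workload are all hypothetical, and the sketch assumes that a batch run is instantaneous, whereas a real batch layer runs for a long time while new events keep arriving.

```python
from collections import Counter


class LambdaCounter:
    """Toy in-memory model of the three layers; counts events per key."""

    def __init__(self):
        self.master = []                 # batch layer: append-only master dataset
        self.batch_view = Counter()      # serving layer: result of the last batch run
        self.realtime_view = Counter()   # serving layer: increments since that run

    def ingest(self, key):
        self.master.append(key)          # every event is kept, never updated in place
        self.realtime_view[key] += 1     # speed layer: incremental update

    def run_batch(self):
        # Batch layer: recompute the view from the entire master dataset,
        # then drop the real-time increments that the new batch view covers.
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, key):
        # Batch and real-time views are read together to answer a query.
        return self.batch_view[key] + self.realtime_view[key]


system = LambdaCounter()
for page in ["home", "home", "cart"]:
    system.ingest(page)
system.run_batch()        # batch view now covers the first three events
system.ingest("home")     # only the speed layer has seen this event
print(system.query("home"))
```

The query result is always the sum of a complete but stale batch view and a small, fresh real-time view, which is the property the architecture relies on to combine accuracy with low latency.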

2) Drawbacks

The main challenge that comes with the Lambda architecture is keeping the batch and speed layers synchronized. This consists in regularly discarding recent data from the speed layer once it has been committed to the immutable dataset in the batch layer. Another limitation to keep in mind is that only analytical operations are possible from the serving layer; no transactional operation is possible. Finally, one of the major disadvantages of this architecture is the need to maintain two similar code bases, one in the speed layer and another in the batch layer, to perform the same computation on different sets of data. That implies redundancy and requires two different sets of skills in order to write the logic for the streaming data and for the batch data [3].

3) Use Cases

Several companies spanning multiple industries have adopted the Lambda architecture over time. Many of them are referenced in [29], where specific use cases and best practices around the Lambda architecture are collected and made available to those interested in working with it.

A particularly suitable application of the Lambda architecture is found in log ingestion and analytics. The reason is that log messages are immutable and often generated at high speed in systems that need to offer high availability [12]. The Lambda architecture is preferred in cases where there is an equal need for real-time/fluid analysis of incoming data and for periodic analysis of the entire repository of data collected. Social media analysis, and especially the analysis of tweets, is a perfect example of such an application [12]. The Lambda architecture can also be used in other types of systems, for instance to keep track of users subscribing to an online meet-up [13]. The system in [13] is based on the Azure platform: HDInsight Blob Storage is used to permanently store the data and compute the batch views every 60 seconds, while a Redis key-value store is used to persist and display the new registrations between two computations of batch views. The serving layer returns a combination of the results of the two other layers in real time, via REST web services, always providing up-to-date information without much overhead. [14] presents an Amazon EC2-based system processing data from various sensors across a city in order to make efficient decisions. While some of those decisions require on-the-fly analysis of the sensed data, others require that the analysis be performed on massive batches of data accumulated over a long period of time. In such a case, the Lambda architecture again proves ideal to achieve both objectives.

The Lambda architecture is a good choice when data loss or corruption is not an option and when numerous clients expect rapid feedback, for example in the case of a fraudulent claims processing system [15]. Here, the speed layer, using Spark, runs in real time a machine learning model that detects whether a claim is genuine or needs further checking. In that manner, the overall processing time per claim from a user's point of view is considerably reduced.

4) Software requirements

Batch layer. The requirements of the batch layer make Hadoop the most suitable framework to use for its implementation. HDFS provides the perfect append-only technology to accommodate the master dataset. MapReduce, Pig and Hive can be used to develop the batch functions.

Speed layer. The speed layer can be implemented using real-time processing tools such as Storm or S4. Spark Streaming can also be used, although it treats data in micro-batches rather than in real streams. The advantage is that the Spark code can be reused in the batch layer [30].

Serving layer. Any random-access NoSQL database can host the real-time and batch views. Some examples are HBase, CouchDB, Voldemort or even MongoDB. Cassandra is particularly preferred because of the fast writes it provides.

Queuing system. A queuing system is necessary to ensure asynchronous and fault-tolerant transmission of the real-time data to the batch and speed layers. Popular options include Apache Kafka and Flume.

5) Hardware requirements

The hardware requirements presented here are estimated for 1 TB of data. For the calculation, we use a method detailed in [15]. To extend these figures to larger volumes, one can make the naive assumption that the hardware requirements grow proportionally with the amount of data to process. The data in the batch layer is usually not stored in a normalized form, so some additional storage space is required: approximately 30% of the original size of the data, amounting to a total of 1.3 TB in our case.

TABLE I. LAMBDA ARCHITECTURE HARDWARE REQUIREMENTS

Batch layer | 1 replicated master node (6-core CPU, 4 GB memory, RAID-1 storage, 64-bit operating system)
            | 2 worker nodes (12-core CPU, 4 GB memory, 2 TB storage, 1 GbE NIC)
            | 1 dedicated resource manager (YARN) node (4 GB memory, 4 cores)
Speed layer | Shares the Hadoop nodes
Serving layer | 2 nodes (1 TB, 4 cores, 16 GB memory)

The raw storage per worker node (rpsn) was calculated using the formula in equation (1). 2% of the total storage per node (tspn) is reserved for the operating system and other applications, and the remaining storage is divided by Hadoop's default replication factor (rf) of 3. As a result, for each 2 TB worker node, roughly 653 GB of space is available to store data.

$$rpsn = \frac{tspn \times (1 - 0.02)}{rf} \qquad (1)$$

The Spark documentation recommends running Apache Spark on the same nodes as Hadoop if possible [32]. Either way, to get a proper idea of the exact Spark hardware requirements, it is necessary to load the data into the Spark system and use the Spark monitoring feature to see how much memory it consumes.

Another important point to note is that, according to Cassandra's documentation, it is recommended to keep the utilization of each 1 TB node to around 600 GB [33]. Beyond that threshold, it is not uncommon to observe timeout rates and mean latencies explode, as well as node crashes.
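
The worker-node sizing of equation (1) can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, it uses decimal units (1 TB = 1000 GB), and it applies the 2% reservation, the replication factor of 3, the 30% storage overhead and the naive proportional-growth assumption stated in the text to the 2 TB worker nodes of Table I.

```python
import math


def raw_storage_per_node_gb(tspn_gb, reserved_fraction=0.02, rf=3):
    """Equation (1): usable space per worker node once the OS reservation
    and the HDFS replication factor are accounted for."""
    return tspn_gb * (1 - reserved_fraction) / rf


def worker_nodes_needed(data_gb, tspn_gb=2000, overhead=0.30):
    """Worker nodes required for `data_gb` of source data, assuming the
    30% denormalization overhead and purely proportional growth."""
    stored_gb = data_gb * (1 + overhead)
    return math.ceil(stored_gb / raw_storage_per_node_gb(tspn_gb))


print(round(raw_storage_per_node_gb(2000)))  # usable GB on one 2 TB worker
print(worker_nodes_needed(1000))             # workers for 1 TB of source data
print(worker_nodes_needed(10000))            # workers for 10 TB of source data
```

Under these assumptions a 2 TB worker offers about 653 GB, so the 1.3 TB to be stored fits on the two worker nodes listed in Table I; the proportional scaling is a rough planning estimate, not a benchmark.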
B. Kappa Architecture

The Kappa architecture was proposed to reduce the Lambda architecture's overhead of handling two separate code bases for stream and batch processing. Its author, Jay Kreps, observed that the necessity of a batch processing system came from the need to reprocess previously streamed data when the code changed. In the Kappa architecture, the batch layer is removed and the speed layer is enhanced to offer reprocessing capabilities. By using specific stream processing tools such as Apache Kafka, it becomes possible to store streamed data over a period of time and to create new stream processing jobs that reprocess that data when needed, replacing batch processing jobs. The functioning process is depicted in Figure 2.

Fig. 2. Kappa Architecture

1) Advantages

Kappa architecture-based systems bring a lot of simplification to data processing. We get the best of both worlds by maintaining a single code base while still allowing historical data querying and analysis through replays. The architecture has fewer moving parts than the Lambda architecture, which allows for a simpler programming model as well. It also allows replaying streams and recomputing results in case the processing code changes. The incoming data can still be stored in HDFS, but we do not rely on it to run reprocessing tasks on historical data.

2) Drawbacks

One of the challenges faced while using this architecture is that only analytical operations are possible, not transactional ones. Also, it is not possible to implement the Kappa architecture with native cloud services because they do not support streams with a long time to live (TTL). It is important to know that the data is not conserved for the long term: in a Kappa architecture-based system, data is kept for a limited, predefined period of time, after which it is discarded [11].

3) Use cases

The Kappa architecture is particularly suited for real-time applications because it focuses on the speed layer. LinkedIn, the company of the architecture's author Jay Kreps, has itself adopted it. Seyvet & Viela [16] have presented a detailed implementation of a Kappa architecture for real-time analytics of user, network and social data collected by a telco operator. We have inventoried two other use cases: a system for real-time calculation of Key Performance Indicators (KPI) in telecommunications and another in the IoT field [17].

4) Software requirements

The software requirements for the Kappa architecture are quite similar to those of the Lambda architecture, minus the Hadoop platform used to implement the batch layer, which is absent here.

The preferred ingestion tool is Apache Kafka because of its ability to retain ordered data logs, allowing the data reprocessing which is essential to the Kappa architecture.

Apache Flink is also particularly suitable for implementing the Kappa architecture because it allows building time windows for computations. A popular alternative to it is Apache Samza.

5) Hardware requirements

Table II summarizes the hardware requirements for a Kappa architecture-based system. The IBM Knowledge Center published a sizing example recommending the hardware requirements reported in the table for ingesting 1 TB of data [35]. [34] specifies the minimal hardware requirements for running Apache Storm in the speed layer in a production environment.

TABLE II. KAPPA ARCHITECTURE HARDWARE REQUIREMENTS

Ingestion tools | 10 servers, each with 12 physical processors and 16 GB RAM
Speed layer (Storm) | Minimum one server with 16 GB RAM, 6 CPU cores of 2 GHz (or more) each, 4 x 2 TB, 1 GB Ethernet
Serving layer | Same as for the Lambda architecture

Apache Zookeeper is necessary for the functioning of Apache Kafka and can be installed on the primary Apache Kafka server.

C. The Microservice Architecture

A system based on the microservice architecture is composed of a collection of loosely coupled services that are able to run independently and to communicate with each other via REST web services, remote calls or push messaging. Each service is implemented with the tools and the language that are most suitable for the task it performs. Each service runs on a dedicated server and has dedicated storage. The main difference between the microservice architecture and a simple SOA-based system is that in the microservice architecture, each service focuses on accomplishing only one specific task and represents a standalone application on its own [20]. The microservice architecture is described in Figure 3.

1) Advantages

Compared to monolithic systems, microservice-based systems allow for faster development, testing and deployment because each service is small and independent from the others, and thus easier to understand. Thanks to that independence between services, fault tolerance is higher and each service can be developed or rewritten at any time using the newest technology stacks without compromising the other services.
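
The idea of loosely coupled services, each with its own storage and reachable only through messages, can be sketched in a few lines. Everything in this example is hypothetical (the service names, the order/inventory scenario, the sample stock data); the two services run in a single process and exchange JSON strings that stand in for the bodies of REST calls, so this is a sketch of the communication contract rather than a deployable system.

```python
import json


class InventoryService:
    """Owns stock levels; no other service touches this store."""

    def __init__(self):
        self._stock = {"widget": 3}          # dedicated storage (sample data)

    def handle(self, request_json):
        request = json.loads(request_json)   # unmarshal the incoming message
        item, quantity = request["item"], request["quantity"]
        available = self._stock.get(item, 0) >= quantity
        if available:
            self._stock[item] -= quantity
        return json.dumps({"item": item, "reserved": available})  # marshal the reply


class OrderService:
    """Owns orders; reaches the inventory only through its message interface."""

    def __init__(self, send_to_inventory):
        self._orders = []                    # dedicated storage
        self._send = send_to_inventory       # stands in for an HTTP client

    def place_order(self, item, quantity):
        message = json.dumps({"item": item, "quantity": quantity})
        reply = json.loads(self._send(message))
        status = "accepted" if reply["reserved"] else "rejected"
        self._orders.append({"item": item, "quantity": quantity, "status": status})
        return status


inventory = InventoryService()
orders = OrderService(send_to_inventory=inventory.handle)
print(orders.place_order("widget", 2))
```

Because `OrderService` depends only on the message format, the inventory service could be rewritten in another language or moved to another server without changing it, as long as the JSON contract is preserved.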
Fig. 3. Microservice architecture

Different teams can work more efficiently by each being allocated specific services. Moreover, services are reusable across a business and any function can be scaled independently from the others.

2) Drawbacks

On the other hand, an inter-service communication mechanism is required and its development is quite complex. There is a need for strong team coordination, and the network communication among the components has to be heavily secured. When two services using two different technological stacks need to communicate, the changes of format (marshalling and unmarshalling) also create an overhead. Though deployments are faster, they are more complex to set up. Each service usually runs in its own container (possibly a JVM), so the overall memory consumption is much higher than what is required for a monolithic application [18].

3) Use cases

The microservice architecture has provided a solution for many tech giants such as Amazon, Netflix and eBay, as they have to handle a huge number of requests daily [20]. Modularity can be achieved to a certain extent in monolithic applications, so some of the factors indicating the need for a microservice architecture can be: the need for decentralization and a high existing (or predictable) traffic. Before choosing a microservice architecture, it is also important to keep in mind the substantial investment in time and manpower required from the early stages of development, before the production stage [21]. In [21], the authors describe the implementation of a microservice based

their results are further evaluated by another service to decide if the user should be allowed to proceed or not.

4) Software requirements

Each microservice is technologically independent. It can be developed using any language or technology. Many types of components intervene in the development of microservices; we list them below with examples of corresponding tools as used in [31], where the Spring Boot framework was used to develop Java-based microservices.

Container. Every microservice runs within a container. One tool that can be used to build, deploy and manage containers is OpenShift (particularly suitable for Docker-based containers). It orchestrates how and when container-based applications run and allows developers to fix and scale those applications seamlessly. The container management service Docker is used to create containers in which the applications are developed, shipped and run anywhere.

Distributed version control system. Microservice architecture projects generally imply the existence of several independent teams working on separate microservices. In order to facilitate collaboration between teams, Git is generally used as the source code repository.

Continuous integration tool. Continuous integration/continuous delivery (CI/CD) pipelines are built to facilitate the deployment of services by automating their deployment after they have passed a test suite. Their main objective is to allow early detection of integration bugs. The most popular framework used for that purpose is Jenkins. Other popular options include GitLab CI, Buildbot, Drone and Concourse.

5) Hardware requirements

Table III summarizes the hardware requirements for the microservice architecture. The hardware requirements for a multi-node cluster deployment of Docker, as specified in the IBM Cloud Private documentation, are described in [36]. The recommended hardware configuration for Jenkins in small teams has been specified in its documentation [37].

TABLE III. MICROSERVICE ARCHITECTURE HARDWARE REQUIREMENTS

Container | 1 boot node (1+ core, 4 GB RAM, 100+ GB storage)
scalable mobility service that helps blind users find the most management 1, 3 or 5 master nodes (2+ cores, 4+ GB RAM, 151+
suitable paths for them throughout a city leveraging system GB storage)
facilities like bus stops, stairs and audible traffic lights. A 1, 3 or 5 proxy nodes (2+ cores, 4 GB RAM, 40+ GB
microservice is particularly adapted for that use case storage)
because some of the services required such as dynamic 1+ worker nodes (1+ cores, 4GB RAM, 100+GB
planner service, crowd-sensing service and travelling storage)
service already exist (or can be developed independently as 1+ optional management node (4+ cores, 8+ GB RAM,
100+ GB storage)
standalone applications and reused). The only services that 2.4 GHz cores recommended
needed to be developed were the one that the user invokes CI/CD 1 node (1+ GB RAM, 50+ GB storage)
and a high-level orchestrator service to fetch and provide to
the user the useful information. [22] describes the
application of microservices in a fraud detection system. D. The Zeta Architecture
Fraud detection systems are extremely time sensitive The Zeta architecture proposes a novel approach in which
because, in a matter of seconds, a lot of processing has to be the technological solution of a company is directly integrated
done in order to determine whether or not a transaction is with the business/enterprise architecture. Any application
genuine to prevent a potential fraudster to get away with a required by a business can be “plugged in” this architecture.
customer’s money. Several microservices leveraging It provides containers which are isolated environments in
different databases (user past activity database, blacklist which softwares can be run and made to interact together
database, white list activities database etc.) can quickly and independently of the platform incompatibilities. Due to that,
simultaneously perform the necessary checking required and
many types of applications can be accommodated and run in a zeta architecture.

Fig. 4. Zeta architecture

1) Advantages
Since the hardware is not specifically dedicated to any particular set of services but is common to the entire system, it is better utilized and can be allocated to serve the most pressing need at any moment. The near-real-time backups also help avoid overly long recovery periods after failures. The architecture helps to discover issues more quickly too. It facilitates the testing and deployment phases by allowing the creation of binaries that can be deployed seamlessly in any environment without the need to modify them. An example of an advertising platform based on the zeta architecture is presented in [24]. It shows how intermediaries are eliminated because logs are saved, read and processed directly from the same Distributed File System.
2) Use cases
The zeta architecture is suitable for organizations handling real-time data processing as part of their internal business operations. For instance, the dynamic allocation of parking lots based on data coming from sensors is a good use case that has been mentioned in [23]. It is the architecture leveraged by Google for systems such as Gmail. The zeta architecture is also particularly suitable for complex data-centric web applications, machine-learning-based systems and Big Data analytics solutions [24].
3) Software requirements
There are many components in the zeta architecture playing different roles, each of which can be fulfilled by several existing tools. We list next some of the tools that can be useful to build a decent zeta architecture-based system.
The Distributed File System hosting the master data is generally implemented using the Hadoop Distributed File System, while the real-time storage can be implemented using NoSQL or NewSQL databases (HBase, MongoDB, VoltDB, etc.). The enterprise applications on the diagram generally consist of web servers or any other business application (varying from one business to the other).
The compute model/execution engine is intended to perform all analytics operations. Any data processing tool that is pluggable can be used for that purpose: MapReduce, Apache Spark and even Apache Drill. Apache Mesos or Apache YARN can serve as the global resource manager. The container management system can be chosen among Docker, Kubernetes or Mesos.
4) Hardware requirements
The requirements for each of the software components of this architecture have already been described in the sections on the previous architectures. The reader can refer to the hardware requirements sections for the Lambda, Kappa and microservice architectures for more details.
E. The IoT Architecture
The Internet of Things domain is so vast that no uniform architecture has been defined so far in the field. Nevertheless, several architectures have been proposed by scholars over time [25]. Michael Hausenblas has attempted to propose a high-level abstract architecture for all IoT projects based on the requirements of an IoT data processing system [26]. The architecture is called iot-a and it is the one we discuss here. It is shown in Figure 5.

Fig. 5. iot-a architecture

1) Advantages and Inconvenients
There has not yet been enough feedback on projects done using this architecture to provide any thorough evaluation of its performance and of its possible flaws.
2) Use cases
The discussed architecture is a solution designed to be a good fit for use cases such as smart homes and smart cities [27]. A specific example in the automotive sector describes how the Message Queue/Stream Processing layer helps alert a car user about failures in real time, thus preventing possible accidents [28]. The Database layer is used here to query the system and obtain information about the status of a car for a checkup or in order to develop a repair strategy. Finally, the Distributed File System layer allows the owner of a car to assess its overall metrics and performance weekly or monthly and possibly identify problems. [28] also lists three other potential use cases of this architecture, respectively in biometric database creation (the example of the Aadhaar system in India), financial services, and waste collection and recycling. Each of those use cases requires real-time processing of the data, whether to trigger instantaneous notifications or fraud detection alerts. But the interactive aspect is also important in order to generate better routes for trucks or to help target specific companies with banking offers, for instance.
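The division of work among the three iot-a layers in the automotive use case of [28] can be illustrated with a minimal in-memory sketch. Everything in it is an assumption made for illustration: the class name, the temperature threshold and the sample readings are hypothetical, and plain Python containers stand in for a real message queue, database and distributed file system.

```python
from collections import defaultdict
from statistics import mean

ENGINE_TEMP_LIMIT_C = 110.0  # hypothetical alert threshold, not taken from [28]


class IotPipeline:
    """Toy stand-in for the three iot-a layers (all names are illustrative)."""

    def __init__(self):
        self.alerts = []                   # produced by the MQ/SP layer
        self.latest = {}                   # Database layer: last known status per car
        self.history = defaultdict(list)   # DFS layer: append-only master dataset

    def ingest(self, car_id, engine_temp_c):
        # MQ/SP layer: inspect each reading as it arrives and alert immediately.
        if engine_temp_c > ENGINE_TEMP_LIMIT_C:
            self.alerts.append((car_id, engine_temp_c))
        # Database layer: keep the current state available for interactive queries.
        self.latest[car_id] = engine_temp_c
        # DFS layer: keep every reading for later batch analysis.
        self.history[car_id].append(engine_temp_c)

    def status(self, car_id):
        # Interactive query, e.g. during a checkup.
        return self.latest[car_id]

    def periodic_report(self, car_id):
        # Batch-style aggregation over the whole history, e.g. weekly or monthly.
        readings = self.history[car_id]
        return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


pipeline = IotPipeline()
for temp in (90.0, 95.0, 115.0):
    pipeline.ingest("car-1", temp)
```

Only the third reading exceeds the threshold, so a single alert is raised, while the status query and the periodic report are served from the other two stores.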
TABLE IV. COMPARISON OF THE ARCHITECTURES

Features | Lambda | Kappa | Iot-a | Microservice | Zeta
Analysis type | Batch/Real-time | Real-time | Batch/Real-time | Batch/Real-time | Batch/Real-time
Processing methodology | Query and reporting | Query and reporting | Query and reporting/Analytical/Predictive analysis | Query and reporting/Analytical | Query and reporting
Data frequency | Real-time feeds | Continuous feeds | On-demand feeds | On-demand feeds | On-demand feeds
Data type | Master data | Transactional data | Master data | Transactional data | Transactional data
Content format | Structured, Semi-structured & Unstructured | Structured, Semi-structured & Unstructured | Structured, Semi-structured & Unstructured | Structured, Semi-structured & Unstructured | Structured, Semi-structured & Unstructured
Data sources | Human & Machine generated, web or social media | Machine & Human generated, Web or social media | Machine generated | Internal data sources, machine generated | Web and social media, Internal Data sources
Data consumers | Human | Human | Human/Other data repositories | Business process | Enterprise applications
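As a hedged illustration of how Table IV might be used during design, two of its rows (analysis type and data type) can be transcribed into a small lookup and filtered against a project's needs. The dictionary keys, the function name and the choice of rows are assumptions made for this sketch; a real selection would weigh the other rows and the requirements sections as well.

```python
# Two rows of Table IV transcribed by hand; the field names are illustrative.
TABLE_IV = {
    "Lambda":       {"analysis": {"batch", "real-time"}, "data_type": "master"},
    "Kappa":        {"analysis": {"real-time"},          "data_type": "transactional"},
    "Iot-a":        {"analysis": {"batch", "real-time"}, "data_type": "master"},
    "Microservice": {"analysis": {"batch", "real-time"}, "data_type": "transactional"},
    "Zeta":         {"analysis": {"batch", "real-time"}, "data_type": "transactional"},
}


def shortlist(required_analysis, data_type=None):
    """Return the architectures whose Table IV entries cover the stated needs."""
    matches = []
    for name, row in TABLE_IV.items():
        # Every requested analysis type must be supported by the architecture.
        if not set(required_analysis) <= row["analysis"]:
            continue
        # The data type is only checked when the caller specifies one.
        if data_type is not None and row["data_type"] != data_type:
            continue
        matches.append(name)
    return matches


batch_and_realtime = shortlist({"batch", "real-time"})
realtime_transactional = shortlist({"real-time"}, data_type="transactional")
```

The result is only a shortlist: it narrows the candidates, and the final choice still depends on the use cases and requirements discussed for each architecture.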

Again, the Distributed File System layer can be leveraged with aggregation-like operations to investigate fraud cases that have been flagged over a certain period of time, in order to build a better detection model. The same layer can help municipalities generate useful reports on local waste recycling activities on a monthly basis.
3) Software requirements
The MQ/SP (Message Queuing and Stream Processing) layer can be implemented using Apache Kafka or fluentd for data collection and Apache Spark or Storm for processing. The interactive storage layer can be implemented using any NoSQL database along with tools like Apache Drill to interact with it. The DFS layer can use HDFS along with Hive and Apache Mahout for machine learning over the master dataset.
4) Hardware requirements
The requirements for each of the components of this architecture have already been described for the previous architectures. The reader can refer to the hardware requirements sections for the Lambda, Kappa and microservice architectures for more details.

V. ARCHITECTURES COMPARISON
Table 4 summarizes the discussion of the five architectures in a simple format that Big Data architects can refer to in order to make the right choice when designing a Big Data ecosystem, depending on their needs and requirements.

VI. CONCLUSION
This paper presents a review of the prominent existing Big Data architectures. We present an overall assessment of the recent review work done in the field of Big Data as a whole. Although there is a plethora of work concerning the characteristics of Big Data itself, its application domains, opportunities, challenges and technologies, there is a lack of comprehensive review work concerning its architecting process. We present reviews of several architectures that have been proposed in industry and in academia in the past years in an attempt to solve real-world problems or to serve as a roadmap for Big Data architects for a wide range of problems. Then, we focus on five architectures: the lambda architecture, the kappa architecture, the iot-a architecture, the microservice architecture and the zeta architecture. We describe and compare them in terms of the type of processing they can perform and the type of use cases they are suitable for. We
list and briefly explain, for each architecture, reliable real-world use cases that can be consulted for guidelines on how to make the most of each architecture during its implementation. We also present the known advantages and drawbacks of each architecture as well as a hardware requirements assessment which can help stakeholders plan wisely in terms of budget and infrastructure before going for a Big Data solution.
Big Data architecting is still in its early days and there is still a lack of reliable information about the technical aspects of how the industry is leveraging it. A lot more experimentation and applications will be needed in order to establish standards and performance statistics to refine the choice of an appropriate architecture. Our work can be further extended with more architectures, provided the data concerning their performance and requirements in real-world use cases is made available. Nevertheless, even at this stage, we hope it will be of great help to builders of efficient Big Data ecosystems.

REFERENCES
[1] Gartner Says Global IT Spending to Reach $3.7 Trillion in 2018. (2018, January 16). Retrieved from https://www.gartner.com/newsroom/id/3845563
[2] Press, G. (2017, January 20). 6 Predictions For The $203 Billion Big Data Analytics Market. Retrieved from https://www.forbes.com/sites/gilpress/2017/01/20/6-predictions-for-the-203-billion-big-data-analytics-market/#599b23752083
[3] Kreps, J. (2014, July 2). Questioning the Lambda Architecture. Retrieved May 26, 2018, from https://www.oreilly.com/ideas/questioning-the-lambda-architecture
[4] Marz, N., & Warren J. (2015). Big Data : Principles and best practices of scalable realtime data systems. Retrieved from https://www.manning.com/books/big-data
[5] Zhelev, S. & Rozeva, A. (2017, December). Big Data Processing in the Cloud - Challenges and Platforms. Paper presented at the 43rd international conference applications of mathematics in engineering and economics, Sozopol, Bulgaria. http://dx.doi.org/10.1063/1.5014007
[6] Ounacer S., Talhaoui M. A., Ardchir S., Daif A. & Azouazi M. (2017). A New Architecture for Real Time Data Stream Processing. International Journal of Advanced Computer Science and Applications, 8(11), 44-51. http://dx.doi.org/10.14569/IJACSA.2017.081106
[7] Singh, K., Behera R. J. & Mantri, J. K. (2018, February). Big Data Ecosystem - Review On Architectural Evolution. Paper presented at the International Conference on Emerging Technologies in Data Mining and Information Security, Kolkata, India. Retrieved from https://www.researchgate.net/publication/323387483_Big_Data_Ecosystem_-_Review_on_Architectural_Evolution
[8] Kambatla, K., Kollias, G., Kumar, V. & Grama, A. (2014). Trends in Big Data Analytics. Journal of Parallel and Distributed Computing, 74(7), 2561-2573. https://doi.org/10.1016/j.jpdc.2014.01.003
[9] Chen, M., Mao, S. & Liu, Y. (2014). Big Data : A Survey. Mobile Networks and Applications, 19(2), 171-209. https://doi.org/10.1007/s11036-013-0489-0
[10] Latinović, T. S., Preradović, D. M., Barz, C. R., Latinović, M. T., Petrica, P. P. & Pop-Vadean A. (2015, November). Big Data in Industry. Paper presented at the International Conference on Innovative Ideas in Science (IIS2015), Baia Mare, Romania. https://doi.org/10.1088/1757-899X/144/1/012006
[11] Buckley-Salmon, O. (2017). Using Hazelcast as the Serving Layer in the Kappa Architecture [PowerPoint slides]. Retrieved from https://fr.slideshare.net/OliverBuckleySalmon/using-hazelcast-in-the-kappa-architecture
[12] Kumar, N. (2017, January 31). Twitter’s tweets analysis using Lambda Architecture [Blog post]. Retrieved from https://blog.knoldus.com/2017/01/31/twitters-tweets-analysis-using-lambda-architecture/
[13] Dorokhov, V. (2017, March 23). Applying Lambda Architecture on Azure. Retrieved from https://www.codeproject.com/Articles/1171443/Applying-Lambda-Architecture-on-Azure
[14] Katkar, J. (2015). Study of Big Data Architecture Lambda Architecture (Master’s thesis). Retrieved from http://scholarworks.sjsu.edu/etd_projects/458/
[15] Lakhe, B. (2016). Case Study : implementing Lambda Architecture. In R. Hutchinson, M. Moodie & C. Collins (Eds.), Practical Hadoop Migration (pp. 209-251). https://doi.org/10.1007/978-1-4842-1287-5
[16] Seyvet, N. & Viela, I. M. (2016, May 19). Applying the Kappa Architecture in the telco industry. Retrieved from https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
[17] Garcia, J. (2015). Kappa Architecture [PowerPoint slides]. Retrieved from https://fr.slideshare.net/juantomas/aspgems-kappa-architecture
[18] Richardson, C. (n.d.). Pattern : Microservice architecture. Retrieved from http://microservices.io/patterns/microservices.html
[19] Huston, T. (n.d.). What is microservice architecture?. Retrieved from https://smartbear.com/learn/api-design/what-are-microservices/
[20] Kumar, M. (2016, January 5). Microservices Architecture : What, When, And How?. Retrieved from https://dzone.com/articles/microservices-architecture-what-when-how
[21] Melis, A., Mirri, S., Prandi, C., Prandini, M., Salomoni, P. & Callegati, F. (2016, November). A Microservice Architecture Use Case for Persons with disabilities. Paper presented at Smart Objects and Technologies for Social Good : Second International Conference, GOODTECHS 2016, Venice, Italy. https://doi.org/10.1007/978-3-319-61949-1_5
[22] Scott, J. (2017, February 21). Using microservices to evolve beyond the data lake. Retrieved from https://www.oreilly.com/ideas/using-microservices-to-evolve-beyond-the-data-lake
[23] Pal, K. (2015, September 28). What can the zeta Architecture do for Enterprise?. Retrieved from https://www.techopedia.com/2/31357/technology-trends/what-can-the-zeta-architecture-do-for-enterprise
[24] Konieczny, B. (2017, April 9). General Big Data. Retrieved from http://www.waitingforcode.com/general-big-data/zeta-architecture/read
[25] Madakam, S., Ramaswamy, R. & Tripathi, S. (2015). Internet of Things (IoT) : A Literature Review. Journal of Computer Science & Communications, 3(5), 164-173. http://dx.doi.org/10.4236/jcc.2015.35021
[26] Hausenblas, M. (2015, January 19). Key Requirements for an IOT data platform. Retrieved from https://mapr.com/blog/key-requirements-iot-data-platform/
[27] Hausenblas, M. (2014, September 9). iot-a : the internet of things architecture. Retrieved from https://github.com/mhausenblas/iot-a.info
[28] Hausenblas, M. (2015, April 4). A Modern IoT data processing toolbox [PowerPoint slides]. Retrieved from https://fr.slideshare.net/Hadoop_Summit/a-modern-iot-data-processing-toolbox
[29] Hausenblas, M. & Bijnens, N. (2014, July 1). Lambda Architecture. Retrieved from http://lambda-architecture.net/
[30] Chu, A. (2016, March 28). Implementing Lambda Architecture to track real-time updates. Retrieved from https://blog.insightdatascience.com/implementing-lambda-architecture-to-track-real-time-updates-f99f03e0c53
[31] Eudy, K. (2018, March 7). A healthcare use case for Business Rules in a Microservices Architecture. Retrieved from https://blog.vizuri.com/business-rules-in-a-microservices-architecture
[32] Hardware provisioning - Spark 2.3.1 documentation (n.d.). Retrieved from https://spark.apache.org/docs/latest/hardware-provisioning.html
[33] Cassandra/Hardware (2017, May 12). Retrieved from https://wikitech.wikimedia.org/wiki/Cassandra/Hardware
[34] Simplilearn (n.d.). Apache Storm - Installation and Configuration Tutorial. Retrieved from https://www.simplilearn.com/apache-storm-installation-and-configuration-tutorial-video
[35] Example sizing (n.d.). Retrieved from https://www.ibm.com/support/knowledgecenter/en/SSPFMY_1.3.5/com.ibm.scala.doc/config/iwa_cnf_scldc_hw_exmple_c.html
[36] Hardware requirements and recommendations (n.d.). Retrieved from https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0/supported_system_config/hardware_reqs.html
[37] Installing Jenkins (n.d.). Retrieved from https://jenkins.io/doc/book/installing/
[38] Blumberg, G., Bossert, O., Grabenhorst, H. & Soller, H. (2017, November). Why you need a digital data architecture to build a sustainable digital business. Retrieved from https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/why-you-need-a-digital-data-architecture
[39] Pekka, P., & Daniel, P. (2015). Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems. Big Data Research 2(4). 166-186. doi: https://doi.org/10.1016/j.bdr.2015.01.001
[40] Mert, O. G., & al. (2017). Big-Data Analytics Architecture for Businesses: a comprehensive review on new open-source big-data tools. Cambridge Service Alliance. Retrieved from https://cambridgeservicealliance.eng.cam.ac.uk/news/2017OctPaper
[41] Peter, M., Ján, Š. & Iveta Z. (2014). Concept Definition for Big Data Architecture in the Education System. Paper presented at the 12th International Symposium on Applied Machine Intelligence and Informatics, Herl’any, Slovakia, 2014. https://doi.org/10.1109/SAMI.2014.6822433
[42] Xing, H., Qi & al. (2017). A Big Data Architecture Design for Smart Grids based on Random Matrix Theory. IEEE Transactions on Smart Grid 8(2). 674-686. doi: https://doi.org/10.1109/TSG.2015.2445828
[43] Samuel, M., Xiuyan, J., Radu, S. & Thomas, E. (2014). A Big Data architecture for Large Scale Security Monitoring. Paper presented at IEEE International Congress of Big Data, Anchorage, AK, USA, 2014. https://doi.org/10.1109/BigData.Congress.2014.18
[44] Yichuan, W., LeeAnn, K. & Terry, A., B. (2016). Big Data Analytics : Understanding its capabilities and potential benefits for healthcare organizations. Technological forecasting and social change 126. 3-13. doi: https://doi.org/10.1016/j.techfore.2015.12.019
[45] Fei, S., Yi, P., Xu, M., Xinzhou, C., & Weiwei, C. (2016). The research of Big Data on Telecom industry. Paper presented at 16th International Symposium on Communications and Information Technologies (ISCIT), QingDao, China, 2016. https://doi.org/10.1109/ISCIT.2016.7751636
[46] Guilherme, G., Paulo, F., Ricardo, S., Ruben, C. & Ricardo, J. (2016). An Architecture for Big Data Processing on Intelligent Transportation Systems. Paper presented at IEEE 8th International Conference on Intelligent Systems, Sofia, Bulgaria, 2016. https://doi.org/10.1109/IS.2016.7737393
[47] Go, M. S., Lai, X., & Paul, V. (2016). A reference Architecture for Big Data Systems. Paper presented at 10th International Conference on Software, Knowledge, Information Management & Applications (SKIMA), Chengdu, China, 2016. doi: 10.1109/SKIMA.2016.7916249
[48] Sanjib, B. & Jaydip, S. (2017). A Proposed Architecture for Big Data Driven Supply Chain Analytics. ICFAI University Press (IUP) Journal of Supply Chain Management 13(3). 7-34. doi: https://arxiv.org/ct?url=https%3A%2F%2Fdx.doi.org%2F10.2139%252Fssrn.2795906&v=b6e0857f
[49] Julio, M., Manuel A. S., Eduardo, F. & Eduardo, B. F. (2018). Towards a Security Reference Architecture for Big Data. Paper presented at 21st International Conference on Extending Database Technology and 21st International Conference on Database Theory joint conference, Vienna, Austria, 2018. Retrieved from https://www.semanticscholar.org/paper/Towards-a-Security-Reference-Architecture-for-Big-Moreno-Serrano/3966ce99e50c741dcb401707c6c8cacd8420d27e
[50] Yuri, D., Canh, N. & Peter, M. (2013). Architecture Framework and Components for the Big Data Ecosystem. Paper presented at International Conference on Collaboration Technologies and Systems (CTS), Minneapolis, MN, USA, 2014. doi: https://doi.org/10.1109/CTS.2014.6867550
[51] Doug, C., Oracle. (2014). Information Management and Big Data : A Reference Architecture [White paper]. Retrieved from https://www.oracle.com/technetwork/topics/entarch/articles/info-mgmt-big-data-ref-arch-1902853.pdf
[52] Microsoft. (2014). Microsoft Big Data : Solution Brief. Retrieved from http://download.microsoft.com/download/f/a/1/fa126d6d-841b-4565bb26d2add4a28f24/microsoft_big_data_solution_brief.pdf
[53] IBM Corporation. (2014). IBM Big Data & Analytics Reference Architecture v1. Retrieved from https://www.ibm.com/developerworks/community/files/form/anonymous/api/library/e747a4bd-614d-4c5d-a411-856255c9ddc4/document/bbc80340-3bf4-4e0a-8caf-a43f64a22f05/media
[54] NIST NBD-WG. (2017). Draft NIST Big Data Interoperability Framework : Volume 6, Reference Architecture. Retrieved from https://bigdatawg.nist.gov/_uploadfiles/M0639_v1_9796711131.docx
[55] Nawsher, K. & al. (2014). Big Data: Survey, Technologies, Opportunities, and Challenges. The Scientific World Journal 2014(2014). 1-19. doi: http://dx.doi.org/10.1155/2014/712826
[56] Seref, S. & Duygu, S. (2013). Big Data : A Review. Paper presented at International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA. doi: https://doi.org/10.1109/CTS.2013.6567202
[57] Andrea, M., Marco, G., & Michele, G. (2015). What is Big Data? A Consensual Definition and a Review of Key Research Topics. Paper presented at 4th International Conference on Integrated Information, Madrid, Spain, 2014. doi: https://doi.org/10.1063/1.4907823
[58] Amir, G. & Murtaza, H. (2014). Beyond the hype : Big data concepts, methods and analytics. International Journal of Information Management 35(2). 137-144. doi: https://doi.org/10.1016/j.ijinfomgt.2014.10.007
[59] Chen, M., Mao, S. & Liu, Y. (2014). Big Data: A Survey. Mobile Networks and Applications 19(2). 171-209. doi: https://doi.org/10.1007/s11036-013-0489-0
[60] Elgendy N. & Elragal A. (2014). Big Data Analytics: A Literature Review Paper. Paper presented at Industrial Conference on Data Mining, St. Petersburg, Russia, 2014. doi: https://doi.org/10.1007/978-3-319-08976-8_16
[61] Bernard M. (2018, May 21). How Much Data Do We Create Everyday? The Mind-Blowing Stats Everyone Should Read. Retrieved from https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#4c87a72b60ba
[62] Tom, H. (2017, July 26). How much data does the world generate every minute? Retrieved from https://www.iflscience.com/technology/how-much-data-does-the-world-generate-every-minute/
[63] Josh J. (DOMO). (2018, June 5). Data Never Sleeps 6.0. Retrieved from https://www.domo.com/blog/data-never-sleeps-6/
[64] Mary, L. (WordStream) (2018, October 2017). 33 Mind-Boggling Instagram Stats & Facts for 2018. Retrieved from https://www.wordstream.com/blog/ws/2017/04/20/instagram-statistics
[65] International Data Corporation (IDC), Intel. A Guide to the Internet of Things. Retrieved from https://www.intel.in/content/www/in/en/internet-of-things/infographics/guide-to-iot.html
[66] Nasser, T., & Tariq, R. S. (2015). Big Data Challenges. Journal of Computer Engineering and Information Technology 4(3). 1-10. http://dx.doi.org/10.4172/2324-9307.1000135
[67] Chaowei, Y., Qunying, H., Zhenlong, L., Kai, L. & Fei H. (2017). Big Data and Cloud Computing : Innovation Opportunities and Cloud Computing. International Journal of Digital Earth 10(1). 13-53. https://doi.org/10.1080/17538947.2016.1239771
[68] Uthayasankar, S., Muhammad, M. K., Zahir, I. & Vishanth, W. (2016). Critical analysis of Big Data Challenges and Analytical Methods. Journal of Business Research 70. 263-286. https://doi.org/10.1016/j.jbusres.2016.08.001
[69] Zoiner, T., Mike, W. (2018, March 31). Big Data architectures. Retrieved from https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/

No ratings yet
Hadoop & HDFS Final
31 pages
REPEAT 2 Building Multi-Tenant-Aware SaaS Microservices ARC405-R2
100% (1)
REPEAT 2 Building Multi-Tenant-Aware SaaS Microservices ARC405-R2
14 pages
Real-Time Big Data Analytics - Sample Chapter
100% (2)
Real-Time Big Data Analytics - Sample Chapter
30 pages
Chatting Application SRS
No ratings yet
Chatting Application SRS
70 pages
Bigdata Analysis: Streaming Twitter Data With Apache Hadoop and V Isualizing Using Biginsights
No ratings yet
Bigdata Analysis: Streaming Twitter Data With Apache Hadoop and V Isualizing Using Biginsights
5 pages
Building A Big Data Architecture - Core Components, Best Practices
No ratings yet
Building A Big Data Architecture - Core Components, Best Practices
6 pages
(IJCST-V5I4P10) :M Dhavapriya
No ratings yet
(IJCST-V5I4P10) :M Dhavapriya
5 pages
Suppala Patricie
No ratings yet
Suppala Patricie
73 pages
IEEE BigDataOpenSourcePlatforms
No ratings yet
IEEE BigDataOpenSourcePlatforms
8 pages
BYTE D1-4 BigDataTechnologiesInfrastructures FINAL - Compressed
No ratings yet
BYTE D1-4 BigDataTechnologiesInfrastructures FINAL - Compressed
34 pages
Microservices: A de Nition of This New Architectural Term
No ratings yet
Microservices: A de Nition of This New Architectural Term
16 pages
The Comple DevOps Bootcamp With Azure Cloud
No ratings yet
The Comple DevOps Bootcamp With Azure Cloud
32 pages
LS1.1 - V6 Generalized Architecture of Big Data Systems
No ratings yet
LS1.1 - V6 Generalized Architecture of Big Data Systems
8 pages
Building Microservices and A CI
No ratings yet
Building Microservices and A CI
58 pages
Final Thesis
No ratings yet
Final Thesis
97 pages
Synchronous and Asynchronous Communication
No ratings yet
Synchronous and Asynchronous Communication
6 pages
Ingestion Layer PDF
No ratings yet
Ingestion Layer PDF
11 pages
Big Data For Org
No ratings yet
Big Data For Org
10 pages
Enhancing Reliability and Scalability of Microservices Through AI/ML-Driven Automated Testing Methodologies
No ratings yet
Enhancing Reliability and Scalability of Microservices Through AI/ML-Driven Automated Testing Methodologies
34 pages
Dipanjan Nandi
No ratings yet
Dipanjan Nandi
12 pages
Introduction To Microservices
No ratings yet
Introduction To Microservices
62 pages
Oracle: 1Z0-1084-20 Exam
No ratings yet
Oracle: 1Z0-1084-20 Exam
25 pages
Chương 1
No ratings yet
Chương 1
17 pages
Digital Twins: How Engineers Can Adopt Them To Enhance Performances
From Everand
Digital Twins: How Engineers Can Adopt Them To Enhance Performances
Isrin Ismail
No ratings yet
Architecting Big Data & Analytics Solutions - Integrated with IoT & Cloud
From Everand
Architecting Big Data & Analytics Solutions - Integrated with IoT & Cloud
Dr Mehmet Yildiz
4.5/5 (2)
Microservices Design Patterns
No ratings yet
Microservices Design Patterns
18 pages
Big Data for Enterprise Architects
From Everand
Big Data for Enterprise Architects
Dr Mehmet Yildiz
4.5/5 (3)
Amazon's E-Commerce Platform
No ratings yet
Amazon's E-Commerce Platform
22 pages
Future of DevOps by N4si
No ratings yet
Future of DevOps by N4si
17 pages
Developer Questions-2
No ratings yet
Developer Questions-2
6 pages
SpringBoot Interview Questions
No ratings yet
SpringBoot Interview Questions
18 pages
Automatic Detection of Security Deficiencies and Refactoring Advises For Microservices
No ratings yet
Automatic Detection of Security Deficiencies and Refactoring Advises For Microservices
10 pages
Saurabh Mamidwar
No ratings yet
Saurabh Mamidwar
5 pages
Sourabh Sharma Resume-16years
No ratings yet
Sourabh Sharma Resume-16years
2 pages
Alonzo CR
No ratings yet
Alonzo CR
4 pages
Test Lead HP
No ratings yet
Test Lead HP
3 pages
6 Data Management Patterns For Microservices - PROGRESSIVE CODER
No ratings yet
6 Data Management Patterns For Microservices - PROGRESSIVE CODER
7 pages

Big Data Architectures: A detailed and application oriented review

Article · October 2019

Godson Koffi Kalipe Rajat Kumar Behera

TABLE II. KAPPA ARCHITECTURE HARDWARE REQUIREMENTS

INGESTION TOOLS 10 SERVERS HAVING EACH: 12 PHYSICAL PROCESSORS, 16 GB

Fig. 2. Kappa Architecture

Fig. 4. Zeta architecture

Architectures Lambda Kappa IoT-a Microservice Zeta

Data type Master Transactional Master data Transactional Transactional

Again, the Distributed File System layer can be leveraged
