Big Data Architectures
All content following this page was uploaded by Rajat Kumar Behera on 31 October 2019.
Abstract: Big Data refers to huge amounts of heterogeneous data from both traditional and new sources, growing at a higher rate than ever. Due to this high heterogeneity, it is a challenge to build systems that centrally process and efficiently analyze such huge amounts of data, which are both internal and external to an organization. A Big Data architecture describes the blueprint of a system handling massive volumes of data during storage, processing, analysis and visualization. Several architectures belonging to different categories have been proposed by academia and industry, but the field still lacks benchmarks. Therefore, a detailed analysis of the characteristics of the existing architectures is required in order to ease the choice between architectures for specific use cases or industry requirements. The types of data sources, the hardware requirements, the maximum tolerable latency, the fit to the industry and the amount of data to be handled are some of the factors that need to be considered carefully before choosing an architecture for a Big Data system. A wrong choice of architecture can seriously damage a company’s reputation and business. This paper reviews the most prominent existing Big Data architectures, their advantages and shortcomings, their hardware requirements, their open source and proprietary software requirements and some of their real-world use cases in each industry. For each architecture, we present a set of specific problems, related to particular application domains, that it can be leveraged to solve. Finally, a trade-off comparison between the various architectures is presented as the concluding remarks. The purpose of this body of work is to equip Big Data architects with the necessary resources to make better informed choices when designing optimal Big Data systems.

Keywords: Big Data Architecture, Big Data Architectural Patterns, Big Data Use Cases

I. INTRODUCTION
The non-stop growth of data, the frantic release of new electronic devices and the data-driven decision-making trend in companies are fueling a constant demand for more efficient Big Data processing systems. Investment in Big Data architecture has been growing rapidly in recent years and, according to Gartner, businesses will keep investing more in IT in 2018 and 2019, focusing on IoT, blockchain and Big Data [1,2]. 178 billion dollars were spent on Data Center Systems in 2017 and that number is expected to increase in the coming years [5]. Considering the substantial funds companies invest in their Big Data solutions, careful planning should clearly be done before the actual implementation of a solution. However, according to the McKinsey institute, many organizations today are facing difficulties because of the absence of architectural planning of their data management solutions [38]. They develop overlapping functionalities and are not able to achieve sustainability because they usually develop technology-driven solutions.
Early and careful architecting of a Big Data system considers a holistic data strategy while focusing on real business objectives and requirements. It is of the utmost importance to write down current as well as future needs in order to take scalability into account from the earliest stages of the design of the Big Data system. Once that list of use cases and requirements is clear, a company can move forward and select, among the many existing Big Data architectures, the most suitable one for its use.
The Lambda architecture was one of the first architectures to be proposed for Big Data processing and it has been established as the standard over time [4]. The Kappa architecture came next, followed by several other architectures [3] designed to address some of the limitations of the Lambda architecture for use cases where the former standard failed to offer satisfying results.
In this paper, we discuss different architectures with their optimal use cases, along with some of the factors that need to be considered to make the best choice from a pool of candidate architectures. The paper also highlights whether the architecture to be adopted for a given use case should be built from scratch or incrementally constructed from an existing architecture.
The structure of this paper is as follows. Section 2 presents an overview of the rise of Big Data and the challenges that have accelerated the need for new tools and architectures. The next section reviews the work that has been done in the Big Data field to survey the domain, propose architectures and eventually compare them. Section 4 gives, for each architecture, a brief description, its advantages and disadvantages, a set of problems it can solve, some of the fields where it can be used and the hardware and open source configuration required to set up an environment based on that architecture. An overall comparison of the architectures discussed is presented in Section 5 and Section 6 concludes the paper.

II. BACKGROUND
In 2013, the McKinsey institute reported that there were more than 2 billion Internet users worldwide [55]. In 2018, according to an article published by Forbes, that number had jumped to 3.7 billion users who are performing
over 5 billion searches every day [61, 62]. Social media remains one of the biggest sources of the data produced in the world. According to Domo’s “Data Never Sleeps 6.0” report, every minute, Internet users watch more than 4 million videos on YouTube, close to 13 million text messages are exchanged, the weather channel receives 18 million forecast requests and 97 000 hours of video content are streamed on the Internet [63]. The growth is particularly apparent with social media, considering companies like Instagram, one of the most used social media platforms in the world, which grew its active user base between December 2016 and 2018 from 600 million to 800 million users, who now post 95 million photos and videos every day [64]. Companies across various industries are experiencing a similarly frenetic data growth. In 2018, Amazon shipped 1111 packages every minute and Uber was used to book 1389 rides every single minute [63]. Another main contributor to the data flood is the Internet of Things industry. The International Data Corporation (IDC) and Intel predict that there will be 200 billion IoT devices in use by 2020 [65]. Considering that only 15 billion devices were identified in 2015, up from 2 billion in 2006, it becomes easier to get an idea of the exponential rate at which data is growing in size. All those devices transmit information across networks, sometimes carrying sensitive data intended to trigger immediate reactions. The case of the voice control feature, which is now used by 8 million people every month, illustrates the point [63].
From all that has been described, it is evident that traditional single machines can no longer process the diverse and enormous amount of data being produced at such a high speed. Several challenges have arisen with the birth of Big Data. They include data storage issues, of course, but also, for instance, the need to separate high-quality data from noise and errors as fast as possible because of the volatility of the data. Other challenges are faced during the entirety of the data analysis process. During the acquisition of the data, there is a need to filter it, reduce it and associate it with metadata. There is also a need to transform unstructured data and eliminate errors from it in a cleaning process before consuming it. Heterogeneous data models coming from various sources have to be integrated into single data repositories, requiring new designs and systems more complex than the traditional ones. In addition, most use cases require that integration to be automated. Queries also need to scale easily over different amounts of data and to execute in a matter of seconds for critical use cases. Finally, there is a need to reflect on the design of specific tools that present human-friendly interpretations of the data being generated. Another category of challenges that has led to the conception of Big Data ecosystems is management-related issues such as privacy and security, among many others [66, 67, 68].
The landscape of Big Data has kept changing since its birth: the prices of storage devices have been considerably reduced while data collection methods have kept multiplying. Nowadays, in the same system, some data arrive at a very fast rate in constant streams while others arrive periodically in big batches. That diversity has led to the creation of Big Data architectures intended to accommodate various data flows and solve the issues specific to each of them [69].

III. LITERATURE REVIEW
Many reviews have been done in the field of Big Data. Most of them cover technologies, tools, challenges and opportunities in the field [55]. They try to shed more light on the field of Big Data and present its advantages and drawbacks [56]. The majority review, for each of the traditional Big Data processing steps from data generation to analysis, the background, the technical challenges and the latest advances. There has also been some work done to review Big Data analytics methods and tools [60].
Reference architectures for the Big Data ecosystem have been published by top tech companies such as IBM [53], Oracle [51] and Microsoft [52], and by the National Institute of Standards and Technology (NIST) [54]. Various approaches have been used by researchers to try to come up with a reference architecture that could be used across industries in a wide variety of use cases. Pekka, P. and Daniel, P. [39] have proposed a technology-independent architecture based on the study of seven major Big Data use cases at top tech companies like Facebook, LinkedIn or Netflix. They decomposed the 7 reviewed architectures into a set of components, which they then classified according to their roles into the 8 components forming their reference architecture. Nevertheless, not all use cases required all the components of their architecture and, when considering use cases other than the reviewed ones, they acknowledged that more components might need to be added. Other authors have followed a similar approach to propose a five-layer reference architecture in [47]. Mert, O. G., & al. [40] searched more than 19 million projects in the GitHub database and the Apache Software Foundation project list for the ones related to Big Data. From 113 documents, including whitepapers and project documentation across diverse industries, they extracted the 241 most popular and actively developed open-source tools. The authors then classified the tools into 11 groups constituting the components of their reference architecture. They also discussed the suitability of different tools for their architecture’s implementation, taking into account factors such as timing, data size, platform independence and data-storage model requirements. Reference architectures have also been proposed to address specific issues such as security in Big Data ecosystems. An example is the Big Data Security reference architecture proposed in [50]. This architecture extends the NIST reference architecture to include, for each component, tools and specifications to ensure the protection of the elements of interest: encryption for data, authentication and authorization for networks, and containerization and isolation for process execution. The authors also presented a brief, high-level comparison of their architecture with other existing reference architectures.
There have been several industry-specific propositions too, based on the sets of requirements of special use cases. Architectural solutions have been proposed in the fields of Supply Chain Management [48], Intelligent Transportation Systems [46], telecommunications [45], healthcare [44], communication network security (for fault detection and monitoring) [43], smart grids in electrical networks [42], and higher education and universities [41]. Those architectures reuse all or some of the layers defined in the common reference architectures, namely: the data sources layer, the
extraction/collection/aggregation layer, the storage layer, the analysis layer and the visualization layer. Each of these industry-specific architectures defines its layers’ components in terms of the technological tools or features required by the use case.
Existing architectures have been extensively documented over time as they gained popularity. Most of the existing research focuses on two of the most popular ones: the Lambda and Kappa architectures. Zhelev and Rozeva [5] worked to equip data architects with decision-making information by reviewing cloud types, data persistence options, data processing paradigms and tools and, briefly, both the Lambda and Kappa architectures, specifying each one’s strengths and flaws and mentioning in which situation each one would be suitable. Other works have presented both the Lambda and Kappa architectures along with some of their strengths and weaknesses [6]. The authors have also presented a short comparison of both architectures before proposing a new architecture to overcome the deficiencies of both. The most exhaustive work has been done in [7], where seven popular architectures were described with the software requirements necessary to implement them. Our aim is to extend the work done in [7] by describing not only existing related use cases but also a set of specific problems each architecture can solve in a given industrial context.
From an industrial application point of view, a lot of work has been done to show how Big Data can be leveraged to provide better services or increase business profit in various fields [8, 9, 10]. [8] provides insights on the kind of hardware required to build a Big Data processing system, discussing electric energy, storage, processing and network requirements at a very high level. None of the existing works addressed detailed hardware requirements or attempted to classify use cases and target problems architecture-wise.
To the best of our knowledge, there does not yet exist any reference document that can guide a Big Data system architect in choosing among the most popular Big Data architectures knowing the industry of application, the existing hardware architecture, the budget allotted to purchasing new components and the problems the system is expected to solve.

IV. BIG DATA ARCHITECTURES
Big Data architectures are designed to manage the ingestion, processing, visualization and analysis of data that are too large or too complex to handle with traditional tools. From one organization to the other, that data might consist of hundreds of gigabytes or hundreds of terabytes. In the context of this paper, the minimum amount we consider as Big Data is 1 TB.
A Big Data architecture determines how the collection, storage, analysis and visualization of data are done. We also refer to it to define how to transform structured, unstructured and semi-structured data for analysis and reporting. In this section, we discuss five of the most prominent Big Data architectures that have gained recognition in the industry over the years.

A. Lambda Architecture
The Lambda architecture is an approach to Big Data processing that aims to achieve low-latency updates while maintaining the highest possible accuracy. It is divided into 3 layers.
The first, the “batch layer”, is composed of a distributed file system which stores the entirety of the collected data. The same layer stores a set of predefined functions to be run on the dataset to produce what is called a batch view.
Those views are stored in a database constituting the “serving layer”, from which they can be queried interactively by the user.
The third layer, called the “speed layer”, computes incremental functions on the new data as it arrives in the system. It processes only the data generated between two consecutive recomputations of the batch views and produces real-time views, which are also stored in the serving layer. The different views are queried together to obtain the most accurate possible results. A representation of this architecture is given in Figure 1.

Fig. 1. Lambda Architecture

1) Advantages
Nathan Marz proposed the Lambda architecture (LA) with the primary objective of alleviating the problems encountered with fully incremental systems. Such systems have exhibited problems such as operational complexity (online compaction, for example), the need to handle eventual consistency in highly available systems and the lack of human fault tolerance. In contrast, a Lambda architecture-based system provides better accuracy, higher throughput and lower latency for reads and updates simultaneously, without compromising data consistency. An LA-based system is also more resilient thanks to the Distributed File System used to store the master dataset, mostly because it is less subject to human errors (such as unintended bulk deletions) than a traditional RDBMS. Finally, the Lambda architecture helps achieve the main requirements of a reliable Big Data system, among which are the robustness and fault tolerance provided through the batch layer. Each layer of the architecture is independently scalable and the Lambda architecture can easily be generalized or extended to a great number of use cases while requiring only minimal maintenance [4]. This architecture provides both real-time data analysis, through ad-hoc querying of real-time views, and historical data analysis [11].
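The interplay of the three Lambda layers described in this subsection can be illustrated with a minimal in-memory sketch. It is a toy model under stated assumptions, not a reference implementation: the class name `LambdaSketch`, its methods and the choice of a simple per-key count as the "predefined function" are all hypothetical, and plain Python containers stand in for the distributed file system and the serving database.

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the batch, speed and serving layers (hypothetical names)."""

    def __init__(self):
        self.master_dataset = []        # batch layer: append-only store of all records
        self.batch_view = Counter()     # serving layer: view computed by the batch layer
        self.realtime_view = Counter()  # serving layer: view maintained by the speed layer

    def ingest(self, key):
        # Every record is appended to the master dataset and never modified.
        self.master_dataset.append(key)
        # The speed layer updates its view incrementally as data arrives.
        self.realtime_view[key] += 1

    def recompute_batch_view(self):
        # The batch layer reruns its function over the entire master dataset.
        self.batch_view = Counter(self.master_dataset)
        # In this toy model the new batch view covers everything ingested so far,
        # so the whole real-time view can be discarded.
        self.realtime_view.clear()

    def query(self, key):
        # Batch and real-time views are queried together for an up-to-date answer.
        return self.batch_view[key] + self.realtime_view[key]
```

Calling `ingest` a few times, then `recompute_batch_view`, then `ingest` again shows that `query` returns the same totals before and after the recomputation, while the real-time view only ever holds what arrived since the last batch run.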
2) Drawbacks
The main challenge that comes with the Lambda Architecture is keeping the batch and speed layers synchronized. This consists in regularly discarding recent data from the speed layer once it has been committed to the immutable dataset in the batch layer. Another limitation to keep in mind is that only analytical operations are possible from the serving layer; no transactional operation is possible. Finally, one of the major disadvantages of this architecture is the need to maintain two similar code bases, one in the speed layer and another in the batch layer, to perform the same computation on different sets of data. That implies redundancy and requires two different sets of skills in order to write the logic for the streaming data and for the batch data [3].

3) Use Cases
Several companies spanning multiple industries have adopted the Lambda Architecture over time. Many of them are referenced in [29], where specific use cases and best practices around the Lambda architecture are collected and made available to those who are interested in working with it. A particularly suitable application of the Lambda architecture is found in log ingestion and analytics. The reason is that log messages are immutable and often generated at a high speed in systems that need to offer high availability [12]. The Lambda Architecture is preferred in cases where there is an equal need for real-time/fluid analysis of incoming data and for periodic analysis of the entire repository of data collected. Social media analysis, and especially the analysis of tweets, is a perfect example of such an application [12]. But the Lambda architecture can be used in other types of systems, for instance to keep track of users subscribing to a meet-up online [13]. The system in [13] is based on the Azure platform: HDInsight Blob Storage is used to permanently store the data and compute the batch views every 60 seconds, while a Redis key-value store is used to persist and display the new registrations between two computations of batch views. The serving layer returns a combination of the results of the two other layers in real time, via REST web services, always providing up-to-date information without much overhead. [14] presents an Amazon EC2 based system processing data from various sensors across a city in order to make efficient decisions. While some of those decisions require on-the-fly analysis of the sensed data, others require that the analyses be performed on massive batches of data accumulated over a long period of time. In such a case, the Lambda architecture again proves ideal to achieve both objectives.
The Lambda architecture is a good choice when data loss or corruption is not an option and where numerous clients expect rapid feedback, for example in the case of a fraudulent claims processing system [15]. Here, the speed layer, using Spark, runs in real time a machine learning model that detects whether a claim is genuine or needs further checking. In that manner, the overall processing time per claim from a user’s point of view is considerably reduced.

4) Software requirements
Batch layer. The requirements of the batch layer make Hadoop the most suitable framework to use for its implementation. HDFS provides the perfect append-only technology to accommodate the master dataset. MapReduce, Pig and Hive can be used to develop the batch functions.
Speed layer. The speed layer can be implemented using real-time processing tools such as Storm or S4. Spark Streaming can also be used, although it treats data in micro-batches rather than in real streams. The advantage is that the Spark code can be reused in the batch layer [30].
Serving layer. Any random-access NoSQL database can host the real-time and batch views. Some examples are HBase, CouchDB, Voldemort or even MongoDB. Cassandra is particularly preferred because of the fast writes that it provides.
Queuing system. A queuing system is necessary to ensure asynchronous and fault-tolerant transmission of the real-time data to the batch and speed layers. Popular options include Apache Kafka and Flume.

5) Hardware requirements
The hardware requirements presented here are estimated for 1 TB of data. For the calculation, we use a method detailed in [15]. To extrapolate these estimates to other volumes, one can make the naive assumption that the hardware requirements grow proportionally with the amount of data to process. The data in the batch layer is usually not stored in a normalized form, so some additional storage space is required, approximately 30% of the original size of the data, amounting to a total of 1.3 TB in our case.

TABLE I: LAMBDA ARCHITECTURE HARDWARE REQUIREMENTS

Batch layer     1 replicated master node (6-core CPU, 4 GB memory, RAID-1 storage, 64-bit operating system)
                2 worker nodes (12-core CPU, 4 GB memory, 2 TB storage, 1 GbE NIC)
                1 dedicated resource manager (YARN) node (4 GB memory, 4 cores)
Speed layer     Shares the Hadoop node
Serving layer   2 nodes (1 TB, 4 cores, 16 GB memory)

Each worker node’s raw storage per node (rpsn) was calculated using the formula in equation (1). 2% of the total storage per node (tspn) is reserved for the operating system and other applications, and the remaining storage is divided by Hadoop’s default replication factor (rf) of 3. As a result, each 2 TB worker node of Table I provides roughly 653 GB of raw space to store data.

$$ \mathrm{rpsn} = \frac{\mathrm{tspn} - 0.02 \times \mathrm{tspn}}{\mathrm{rf}} \qquad (1) $$

The Spark documentation recommends running Apache Spark on the same nodes as Hadoop if possible [32]. Either way, to get a proper idea of the exact Spark hardware requirements, it is necessary to load the data into the Spark system and use the Spark monitoring feature to see how much memory it consumes.
Another important point to note is that, according to Cassandra’s documentation, it is recommended to keep the utilization of each 1 TB node to around 600 GB [33]. Beyond that threshold, it is not uncommon to see timeout rates and mean latencies explode and nodes crash.
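The sizing reasoning of the Lambda hardware requirements can be reproduced with a short calculation. The sketch below is only an illustration: the function names are hypothetical, decimal units are assumed (1 TB = 1000 GB), and the inputs are the figures given in the text (30% denormalization overhead, 2% of each node reserved for the operating system, replication factor 3, and the 2 TB of storage per worker node listed in Table I).

```python
import math


def raw_storage_per_node_gb(tspn_gb, os_reserved=0.02, rf=3):
    """Usable raw space per worker node: (tspn - 2% of tspn) / rf."""
    return (tspn_gb - os_reserved * tspn_gb) / rf


def worker_nodes_needed(data_gb, tspn_gb, overhead=0.30, os_reserved=0.02, rf=3):
    """Number of worker nodes needed to hold the data plus the storage overhead."""
    required_gb = data_gb * (1 + overhead)  # e.g. 1 TB of data becomes 1.3 TB
    per_node_gb = raw_storage_per_node_gb(tspn_gb, os_reserved, rf)
    return math.ceil(required_gb / per_node_gb)


per_node = raw_storage_per_node_gb(2000)  # 2 TB worker node
nodes = worker_nodes_needed(1000, 2000)   # 1 TB of input data
print(f"{per_node:.0f} GB usable per node, {nodes} worker nodes")
```

With these inputs the calculation gives about 653 GB of usable space per worker node and 2 worker nodes, which agrees with the batch layer row of Table I. Under the proportionality assumption stated in the text, other data volumes can be explored by changing `data_gb`.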
B. Kappa Architecture
The Kappa architecture was proposed to reduce the Lambda architecture’s overhead that came with handling two separate code bases for stream and batch processing. Its author, Jay Kreps, observed that the necessity of a batch processing system came from the need to reprocess previously streamed data when the code changed. In the Kappa architecture, the batch layer was removed and the speed layer enhanced to offer reprocessing capabilities. By using specific stream processing tools such as Apache Kafka, it is possible to store streamed data over a period of time and create new stream processing jobs to reprocess that data when needed, replacing batch processing jobs. The functioning process is depicted in Figure 2.

4) Software requirements
The software requirements for the Kappa architecture are quite similar to those of the Lambda Architecture, minus the Hadoop platform used to implement the batch layer, which is absent here.
The preferred ingestion tool is Apache Kafka because of its ability to retain ordered data logs, allowing the data reprocessing which is essential to the Kappa architecture.
Apache Flink is also particularly suitable for implementing the Kappa architecture because it allows building time windows for computations. A popular alternative to it is Apache Samza.

5) Hardware requirements
Table 2 summarizes the hardware requirements for a Kappa architecture-based system. The IBM Knowledge Center published a sizing example recommending the hardware requirements reported in Table 2 to ingest 1 TB of data [35]. [34] specifies the minimal hardware requirements for running Apache Storm in the speed layer in a production environment.
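The reprocessing idea at the heart of the Kappa architecture, replaying a retained, ordered log through a new version of the stream job instead of running a separate batch job, can be sketched as follows. This is a simplified illustration under stated assumptions: `RetainedLog`, `run_job` and the two processing functions are hypothetical names, and an in-memory list stands in for a retained log such as a Kafka topic (no Kafka client API is used).

```python
class RetainedLog:
    """Append-only, ordered log standing in for a retained stream (hypothetical)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the appended record

    def read_from(self, offset=0):
        # Records are always replayed in the order they were appended.
        return iter(self._records[offset:])


def run_job(log, process, start_offset=0):
    """Run a stream job over the log from start_offset and return the view it builds."""
    view = {}
    for record in log.read_from(start_offset):
        process(view, record)
    return view


def count_v1(view, record):
    # First version of the job: number of events per user.
    user, _amount = record
    view[user] = view.get(user, 0) + 1


def total_v2(view, record):
    # Changed code: total amount per user instead of an event count.
    user, amount = record
    view[user] = view.get(user, 0) + amount


log = RetainedLog()
for rec in [("ann", 5), ("bob", 7), ("ann", 3)]:
    log.append(rec)

view_v1 = run_job(log, count_v1)  # original job
view_v2 = run_job(log, total_v2)  # new job replays the same log from offset 0
```

Once the new job has caught up with the log, the view it produced can replace the old one, so a single code base serves both live processing and reprocessing.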
1) Advantages
Since the hardware is not specifically dedicated to any particular set of services but is common to the entire system, it is better utilized and can be allocated to serve the most pressing need at any moment. The near-real-time backups also help avoid overly extended recovery periods after failures. The architecture helps to discover issues more quickly too. It facilitates the testing and deployment phases by allowing the creation of binaries that can be deployed seamlessly in any environment without the need to modify them. The example of an advertising platform based on the zeta architecture is presented in [24]. It shows how intermediaries are eliminated, as logs are directly saved to, read from and processed on the same Distributed File System.

2) Use cases
The zeta architecture is suitable for organizations handling real-time data processing as part of their internal business operations. For instance, the dynamic allocation of parking lots based on data coming from sensors is a good use case that has been mentioned in [23]. It is the architecture leveraged by Google for systems such as Gmail. The zeta architecture is also particularly suitable for complex data-centric web applications, machine learning based systems and Big Data analytics solutions [24].

3) Software requirements
There are many components in the zeta architecture playing different roles, each of which can be fulfilled by several existing tools. We list next some of the tools that can be useful to build a decent zeta architecture-based system.
The Distributed File System hosting the master data is generally implemented using the Hadoop Distributed File System, while the real-time storage can be implemented using NoSQL or NewSQL databases (HBase, MongoDB, VoltDB etc.). The enterprise applications on the diagram generally consist of web servers or any other business application (varying from one business to the other).
The compute model/execution engine is intended to perform all analytics operations. Any data processing tool that is pluggable can be used for that purpose: MapReduce, Apache Spark and even Apache Drill. Apache Mesos or Apache YARN can serve as the global resource manager. The container management system can be chosen among Docker, Kubernetes or Mesos.

Fig. 5. iot-a architecture

1) Advantages and Drawbacks
There has not yet been enough feedback on projects done using this architecture to provide any thorough evaluation of its performance and, eventually, of its flaws.

2) Use cases
The discussed architecture is designed to be a good fit for use cases such as smart homes and smart cities [27]. A specific example in the automotive sector describes how the Message Queue/Stream Processing layer helps alert a car user about failures in real time, thus preventing potential accidents [28]. The Database layer is used here to query the system and obtain information about the status of a car for a checkup or in order to develop a repair strategy. Finally, the Distributed File System layer can allow the owner of a car to assess, weekly or monthly, the overall metrics and performance of the car and possibly identify problems. [28] also lists three other potential use cases of this architecture, respectively in biometric database creation (the example of the Aadhaar system in India), financial services, and waste collection and recycling. Each of those use cases requires real-time processing of the data, whether to trigger instantaneous notifications or fraud detection alerts. But the interactive aspect is also important in order to generate better routes for trucks or to help target specific companies with banking offers, for instance.
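The division of labour between the three iot-a layers in the automotive use case (real-time alerts from the Message Queue/Stream Processing layer, status queries on the Database layer, periodic assessment from the Distributed File System layer) can be sketched as a toy model. Everything here is an assumption made for illustration: the class `IotASketch`, the engine temperature metric and the alert threshold do not come from the cited sources, and in-memory containers stand in for the real layers.

```python
from collections import defaultdict
from statistics import mean


class IotASketch:
    """Toy model routing each sensor reading to three layers (hypothetical names)."""

    def __init__(self, temp_alert_threshold=110.0):
        self.threshold = temp_alert_threshold  # assumed alert rule, for illustration only
        self.alerts = []         # output of the message queue / stream processing layer
        self.latest_status = {}  # database layer: latest reading per car
        self.archive = []        # distributed file system layer: full history

    def on_reading(self, car_id, engine_temp):
        reading = {"car_id": car_id, "engine_temp": engine_temp}
        self.archive.append(reading)          # keep everything for periodic analysis
        self.latest_status[car_id] = reading  # keep current state for interactive queries
        if engine_temp > self.threshold:      # react immediately to a suspected failure
            self.alerts.append(f"car {car_id}: engine temperature {engine_temp}")

    def status(self, car_id):
        # Interactive query, e.g. during a checkup.
        return self.latest_status.get(car_id)

    def periodic_report(self):
        # Weekly or monthly style assessment over the archived history.
        temps = defaultdict(list)
        for reading in self.archive:
            temps[reading["car_id"]].append(reading["engine_temp"])
        return {car: mean(values) for car, values in temps.items()}
```

Feeding a normal reading followed by an overheated one for the same car produces a single alert, while `status` returns the most recent reading and `periodic_report` averages over both.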
TABLE IV. COMPARISON OF THE

Processing methodology   Query and reporting | Query and reporting | Query and reporting/Analytical/Predictive analysis | Query and reporting/Analytical | Query and reporting
Data frequency           Real-time feeds | Continuous feeds | On-demand feeds | On-demand feeds | On-demand feeds
Big Data Research 2(4). 166-186. doi: https://doi.org/10.1016/j.bdr.2015.01.001
[40] Mert, O. G., & al. (2017). Big-Data Analytics Architecture for Businesses: a comprehensive review on new open-source big-data tools. Cambridge Service Alliance. Retrieved from https://cambridgeservicealliance.eng.cam.ac.uk/news/2017OctPaper
[41] Peter, M., Ján, Š. & Iveta Z. (2014). Concept Definition for Big Data Architecture in the Education System. Paper presented at the 12th International Symposium on Applied Machine Intelligence and Informatics, Herl’any, Slovakia, 2014. doi: https://doi.org/10.1109/SAMI.2014.6822433
[42] Xing, H., Qi & al. (2017). A Big Data Architecture Design for Smart Grids based on Random Matrix Theory. IEEE Transactions on Smart Grid 8(2). 674-686. doi: https://doi.org/10.1109/TSG.2015.2445828
[43] Samuel, M., Xiuyan, J., Radu, S. & Thomas, E. (2014). A Big Data architecture for Large Scale Security Monitoring. Paper presented at IEEE International Congress of Big Data, Anchorage, AK, USA, 2014. doi: https://doi.org/10.1109/BigData.Congress.2014.18
[44] Yichuan, W., LeeAnn, K. & Terry, A., B. (2016). Big Data Analytics: Understanding its capabilities and potential benefits for healthcare organizations. Technological forecasting and social change 126. 3-13. doi: https://doi.org/10.1016/j.techfore.2015.12.019
[45] Fei, S., Yi, P., Xu, M., Xinzhou, C., & Weiwei, C. (2016). The research of Big Data on Telecom industry. Paper presented at 16th International Symposium on Communications and Information Technologies (ISCIT), QingDao, China, 2016. doi: https://doi.org/10.1109/ISCIT.2016.7751636
[46] Guilherme, G., Paulo, F., Ricardo, S., Ruben, C. & Ricardo, J. (2016). An Architecture for Big Data Processing on Intelligent Transportation Systems. Paper presented at IEEE 8th International Conference on Intelligent Systems, Sofia, Bulgaria, 2016. doi: https://doi.org/10.1109/IS.2016.7737393
[47] Go, M. S., Lai, X., & Paul, V. (2016). A reference Architecture for Big Data Systems. Paper presented at 10th International Conference on
[53] IBM Corporation. (2014). IBM Big Data & Analytics Reference Architecture v1. Retrieved from https://www.ibm.com/developerworks/community/files/form/anonymous/api/library/e747a4bd-614d-4c5d-a411-856255c9ddc4/document/bbc80340-3bf4-4e0a-8caf-a43f64a22f05/media
[54] NIST NBD-WG. (2017). Draft NIST Big Data Interoperability Framework: Volume 6, Reference Architecture. Retrieved from https://bigdatawg.nist.gov/_uploadfiles/M0639_v1_9796711131.docx
[55] Nawsher, K. & al. (2014). Big Data: Survey, Technologies, Opportunities, and Challenges. The Scientific World Journal 2014(2014). 1-19. doi: http://dx.doi.org/10.1155/2014/712826
[56] Seref, S. & Duygu, S. (2013). Big Data: A Review. Paper presented at International Conference on Collaboration Technologies and Systems (CTS), San Diego, CA, USA. doi: https://doi.org/10.1109/CTS.2013.6567202
[57] Andrea, M., Marco, G., & Michele, G. (2015). What is Big Data? A Consensual Definition and a Review of Key Research Topics. Paper presented at 4th International Conference on Integrated Information, Madrid, Spain, 2014. doi: https://doi.org/10.1063/1.4907823
[58] Amir, G. & Murtaza, H. (2014). Beyond the hype: Big data concepts, methods and analytics. International Journal of Information Management 35(2). 137-144. doi: https://doi.org/10.1016/j.ijinfomgt.2014.10.007
[59] Chen, M., Mao, S. & Liu, Y. (2014). Big Data: A Survey. Mobile Networks and Applications 19(2). 171-209. doi: https://doi.org/10.1007/s11036-013-0489-0
[60] Elgendy N. & Elragal A. (2014). Big Data Analytics: A Literature Review Paper. Paper presented at Industrial Conference on Data Mining, St. Petersburg, Russia, 2014. doi: https://doi.org/10.1007/978-3-319-08976-8_16
[61] Bernard M. (2018, May 21). How Much Data Do We Create Everyday? The Mind-Blowing Stats Everyone Should Read. Retrieved from https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-
Software, Knowledge, Information Management & Applications (SKIMA), read/#4c87a72b60ba
Chengdu, China, 2016. Doi : 10.1109/SKIMA.2016.7916249 [62] Tom, H. (2017, July 26). How much data does the world generate
[48] Sanjib, B. & Jaydip, S. (2017). A Proposed Architecture for Big Data every minute? Retrieved from https://www.iflscience.com/technology/how-
Driven Supply Chain Analytics. ICFAI University Press (IUP) Journal of much-data-does-the-world-generate-every-minute/
Supply Chain Management 13(3). 7-34. Doi : [63] Josh J. (DOMO) , (2018, June 5). Data Never Sleeps 6.0. Retrieved
https://arxiv.org/ct?url=https%3A%2F%2Ffanyv88.com%3A443%2Fhttps%2Fdx.doi.org%2F10.2139%252Fss from https://www.domo.com/blog/data-never-sleeps-6/
rn.2795906&v=b6e0857f [64] Mary, L. (WordStream) (2018, October 2017). 33 Mind-Boggling
[49] Julio, M., Manuel A. S., Eduardo, F. & Eduardo, B. F. ( 2018). Instagram Stats & Facts for 2018. Retrieved from
Towards a Security Reference Architecture for Big Data. Paper presented at https://www.wordstream.com/blog/ws/2017/04/20/instagram-statistics
21st International Conference on Extending Database Technology and 21st [65] International Data Corporation (IDC), Intel. A Guide to the Internet of
International Conference on Database Theory joint conference, Vienna, Things. Retrieved from https://www.intel.in/content/www/in/en/internet-of-
Austria, 2018. Retrieved from things/infographics/guide-to-iot.html
https://www.semanticscholar.org/paper/Towards-a-Security-Reference- [66] Nasser, T., & Tariq, R. S. (2015). Big Data Challenges. Journal of
Architecture-for-Big-Moreno- Computer Engineering and Informatiion Technology 4(3). 1-10.
Serrano/3966ce99e50c741dcb401707c6c8cacd8420d27e http://dx.doi.org/10.4172/2324-9307.1000135
[50] Yuri, D., Canh, N. & Peter, M. (2013). Architecture Framework and [67] Chaowei, Y., Qunying, H., Zhenlong, L., Kai, L. & Fei H. (2017). Big
Components for the Big Data Ecosystem. Paper presented at International Data and Cloud Computing : Innovation Opportunities and Cloud
Conference on Collaboration Technologies and Systems (CTS), Computing. International Journal of Digital Earth 10(1). 13-53.
Minneapolis, MN, USA, 2014. Doi : https://doi.org/10.1080/17538947.2016.1239771
https://doi.org/10.1109/CTS.2014.6867550 [68] Uthayasankar, S., Muhammad, M. K., Zahir, I. & Vishanth, W. (2016).
[51] Doug, C., Oracle. (2014). Information Management and Big Data : A Critical analysis of Big Data Challenges and Analytical Methods. Journal
Reference Architecture [White paper]. Retrieved from of Business Research 70. 263-286.
https://www.oracle.com/technetwork/topics/entarch/articles/info-mgmt-big- https://doi.org/10.1016/j.jbusres.2016.08.001
data-ref-arch-1902853.pdf [69] Zoiner, T., Mike, W. (2018, March 31). Big Data architectures.
[52] Microsoft. (2014). Microsoft Big Data : Solution Brief. Retrieved from Retrieved from https://docs.microsoft.com/en-us/azure/architecture/data-
http://download.microsoft.com/download/f/a/1/fa126d6d-841b- guide/big-data/
4565bb26d2add4a28f24/microsoft_big_data_solution_brief.pdf