

Journal of Communications Vol. 13, No. 2, February 2018

Review: Big Data Techniques of Google, Amazon, Facebook and Twitter
Thulara N. Hewage, Malka N. Halgamuge, Ali Syed, and Gullu Ekici
School of Computing and Mathematics, Charles Sturt University, Melbourne, Victoria 3000, Australia
Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville, VIC 3010
Email: [email protected], {ASyed, gekici}@studygroup.com


Abstract—Google, Amazon, Facebook and Twitter have gained enormous advantages from big data methodologies and techniques. Certain questions regarding the process of big data remain unanswered; however, little research has been undertaken in this area yet. This review performs a comparative analysis of the big data techniques described in sixteen peer-reviewed scientific publications (2007-2015) about companies such as Google, Amazon, Facebook and Twitter. Google has invented many techniques that use big data methods to strategize against competitors. Google, Facebook, Amazon and Twitter are partially similar companies in that all of them use big data, despite their own business model requirements. As an illustration, Google requires a data warehousing approach to store trillions of records, and so do Facebook, which has more than one billion users, Twitter, which has 300 million active users, and Amazon. Since all these organizations require a data warehouse approach, Google has adopted a variety of data warehouse storage systems (Spanner, Photon, Fusion Tables) and data transaction methods, including F1 for executing queries via SQL, as well as different approaches to communication, such as Yedalog. Facebook and Twitter, the only social media companies in this study, have different requirements: their big data requirements are high, and these requirements are only partially dependent on each other. This study is a useful reference for researchers seeking to identify the differences between the big data approaches of Google, Facebook, Twitter and Amazon; it provides a technological analysis and comparison of their big data techniques and outlines their variations and similarities.

Index Terms—Big data, big data techniques, Amazon, social networks

I. INTRODUCTION

Conventional computers need manual setup to retrieve information continuously; they are designed to learn through human interaction and continuous feedback, and, essentially, they do not reprogram themselves. Big data, however, provides comprehensive predictive software that eliminates the effort of feedback and planning, which eventually becomes unfeasible as demand grows. Further, big data helps companies strategize future business moves using user performance analytics. The concept of big data arose in the early 2000s, when data storage was a problem. Newly invented technologies such as Hadoop and MapReduce provided solutions for particular issues. Big data reduces cost and time and opens the way to new products. Companies have become reliant on big data for accurate decision making. Big data provides skilled analytics that instantly decipher failures and problems, whether current or likely to occur in the near future. Simply put, big data detects risks early, allowing time to take preventive measures.

Companies such as Google, Facebook, Amazon and Twitter have core functions that are entirely related to big data. This paper discusses whether and how Google, Facebook, Amazon and Twitter apply big data techniques to their business models. These companies have different business models; Google's, for example, is based on a search engine. Organizations like Google tend to use big data techniques to improve their storage capacity, with accurate output that captures client queries and maintains query logs. The business model of Facebook (a social network) is completely different from that of a search-engine-based organization such as Google, yet both rely on big data, since their main necessities are storage, analysis and accuracy.

Based on scientific results presented in the literature, the Google File System (GFS) has successfully overcome the problem of Google's data storage. The largest GFS cluster described in [1] provides hundreds of terabytes of storage across thousands of disks on over a thousand machines and continuously serves client requests. Spanner is Google's globally distributed database. It shards data across machines in data centers around the world, and it automatically responds to failures and balances the requested load. It is designed to scale to millions of machines across hundreds of datacenters and trillions of database rows [2]. Another Google development, the F1 database management system, provides consistency and high availability and allows users to execute queries via SQL. Google advertisers can place bids, set budgets and get involved in campaigns that change and provide immediate feedback

Manuscript received May 19, 2017; revised January 15, 2018.
Corresponding author email: [email protected].
doi:10.12720/jcm.13.2.94-100

©2018 Journal of Communications 94


perform primary user events (search queries) [4]. The column-store database concept was started internally at Google and became very popular in the last decade. The concept is beneficial because it allows large datasets (billions of rows, such as log records) to be investigated within a few seconds. Dremel is a column-store system that can work on thousands of machines [5]. Yedalog, in turn, allows programmers to write code that processes and writes data in the same pipeline, and the same program can run on different platforms [6].

Hive, an open-source data warehousing solution developed at Facebook, lets users express MapReduce programs as queries. Its SQL-based query language, HiveQL, also allows custom MapReduce scripts to be plugged into queries [7]. Facebook has more than 1.59 billion active users at the moment, and it has also invented Scuba as a data management tool. Scuba currently ingests millions of rows per second and expires millions of rows per second. Facebook has more than a hundred servers, each with 144 GB of RAM, and Scuba stores its data in the memory of these servers [8]. As a solution for the rapid growth of data, Amazon built Dynamo, a highly available key-value storage system. Many functions of the Amazon website, such as the best-seller list, shopping cart, customer preferences, session management, sales rank and product catalog, need only primary-key access to data; to meet these requirements, Dynamo provides an interface with a simple primary key [9]. Twitter is a social network system with 1.3 billion active users. In its early stages, Twitter used application-specific logging, but it has since introduced a unified log format. Because many analytics tasks use the client session as the basis of analysis, Twitter builds session sequences, which summarize the answers to large classes of common queries as compactly as possible [10]. The results from these studies have not been without controversy. To address these controversies, the techniques mentioned above were selected from research papers published in 2007-2015.

II. MATERIAL AND METHODS

In our analysis, we include the results of research experiments on big data techniques used by Google, Facebook, Twitter and Amazon. We further investigated big data methods and techniques such as MapReduce, Paxos, the flattening technique, SQL, the tree-structured data model, Spark, Hadoop and classical Reed-Solomon codes.

The raw data presented in Tables I-III specify the variables for Google, Facebook, Twitter and Amazon. Table I identifies the big data techniques and describes how Google, Facebook, Twitter and Amazon use them coherently within their business models; it also outlines the analysis of the big data techniques used in these companies. The categorization of data concerns and the characteristics of the techniques are shown in Table II. Furthermore, the classification according to the characteristics and techniques of each technology is based on data warehouse properties, which are described in Table III.

A. Collection of Raw Data

This analysis pooled data from scientific research on whether and how big data techniques are used by Google, Facebook, Twitter and Amazon. The analysis excluded the numerical values of the scientific research experiments. It included the technological aspects of big data and the variations, differences and similarities of the technologies used by Google, Facebook, Twitter and Amazon.

B. Analysis and Comparison of Raw Data

For the analysis, we used a comparison method to pool the big data techniques used by Google, Facebook, Twitter and Amazon, each of which has invented technological products according to its business requirements and its use of techniques. Each technique is described by its category and by subcategories: the architecture of the technique, its data model, its API, the security of the product built on it, and the partitioning and replication of the technology.

C. Descriptive Analysis and Comparison

Big data techniques vary from one another; however, they are similar in many aspects. The technologies we consider in this analysis and comparison are grouped according to information technology categorizations. All other categories are excluded from this paper; nevertheless, all the required and provided aspects are included, together with a clear analysis and comparison. For categorization and comparison, we include the data warehouse (database models), data communication (query methods), the API of each technique, replication and architecture.

III. RESULTS

Table I gives an overview of the published articles by year of publication. For each article, it lists the authors and publication year, the name of the big data technique, the company that uses the technique and a description of the technique. Table II describes the author and publication year, big data technique, company using the technique, base technology, category of the technology and supporting areas. The conclusion is then drawn, followed by recommendations, which are provided in Table IV. In our final analysis, we describe our findings and the supported areas, categorized by the big data technologies, performance and techniques used by each company, in Table III.
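MapReduce is listed in Section II among the methods investigated, and several systems in Tables I-III (Hive in particular) build on it. As a rough, single-process illustration only, the Python sketch below mimics the map, shuffle and reduce phases of a word count over log lines. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical; this is not the Hadoop API or Google's implementation, both of which distribute each phase across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key (done by the framework in practice)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    # On a cluster, each record (or input split) would be mapped on a separate worker.
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["big data at Google", "big data at Facebook", "Twitter data"]
    print(word_count(logs))
```

In a real deployment the shuffle moves intermediate pairs over the network between workers; here it is only an in-memory grouping, which keeps the three phases visible without any cluster.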


TABLE I: THE ANALYSIS OF BIG DATA TECHNIQUES USED BY VARIOUS COMPANIES, DRAWN FROM SIXTEEN PEER-REVIEWED SCIENTIFIC ARTICLES PUBLISHED IN 2007-2015.

Author and published year | Big data technique name | Company using the technique | Description of the technique
Corbett et al. (2012) [2] | Spanner | Google | Google's globally distributed database. Shards data across sets of Paxos state machines and expands to millions of machines across hundreds of datacenters and trillions of database rows. Data centers balance the load and respond to failures.
DeCandia et al. (2007) [9] | Dynamo | Amazon | Key-value storage system. Provides an interface with a simple primary key. Runs continuously under failure conditions.
Abraham et al. (2013) [8] | Scuba | Facebook | Data management tool. Ingests millions of rows of data per second and expires millions of rows per second. Analyzes live data.
Ghemawat et al. (2003) [1] | Google File System | Google | Storage management. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines and continuously serves client requests. Provides fault tolerance while running on inexpensive commodity hardware.
Lee et al. (2012) [10] | Unified Logging Infrastructure for Data Analytics | Twitter | Introduced a unified log format. Captures messages in a common, well-structured format. Based on session sequences that summarize answers to a large class of common queries.
Rae et al. (2013) [3] | F1 | Google | Relational database management system. Allows users to execute queries via SQL.
Ananthanarayanan et al. (2013) [4] | Photon | Google | Makes real-time data available. Joins primary user events, such as a search query, with the events that follow them.
Melnik et al. (2010) [11] | Dremel | Google | Query engine. Dominance relation and semi-flattening. The flattening technique is a proposed method for maps.
Madhavan et al. (2012) [12] | Fusion Tables | Google | Cloud-based data management system. Supports sharing, collaboration, exploration, visualization and web publishing, and provides visualizations such as maps, timelines and network graphs that can be embedded in any web property.
Gupta et al. (2015) [13] | F1, Mesa and Photon | Google | Process and maintain advertising-related facts and send critical reports to Google's Ads users and clients, including the performance of their ad campaigns and the budgeting of the live serving system.
Hall et al. (2012) [5] | Processing a Trillion Cells per Mouse Click | Google | Column-store database technology. Used OLAP- or OLTP-like SQL interfaces. Additional approach to established products such as MonetDB [14], Netezza [15] and QlikTech [16].
Afrati et al. (2014) [17] | Dremel | Google | Tree-structured data model. One or more relations. Examples: JSON data format, Google's protocol buffers [17], nested relations; recent developments combining the relational and tree-structured models (Dremel, F1).
Chin et al. (2015) [6] | Yedalog | Google | Google's solution for assembling digital knowledge and search engine query logs. The main drawback of MapReduce and Spark is that they are not automated. Search input is parsed using dependency parsing.
Cheng et al. (2014) [18] | Cascades | Facebook | Prediction of re-sharing patterns. Focused on analyzing and characterizing cascades of re-shared content.
Thusoo et al. (2009) [7] | Hive | Facebook | Open-source data warehousing solution. SQL-based queries (HiveQL). Allows custom MapReduce scripts to be plugged into queries. Built on Hadoop.
Sathiamoorthy et al. (2013) [19] | XORing Elephants | Facebook | Coding technique used to save storage while providing redundancy. Classical Reed-Solomon codes.
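Table I notes that Twitter's unified logging infrastructure builds session sequences to answer a large class of common queries [10]. The sketch below shows the general idea under stated assumptions: log events are grouped by session, ordered by timestamp, and each event name is mapped to a single character so that a whole session becomes a short string that an ordinary regular expression can query. The event names, the `EVENT_SYMBOLS` table and both helper functions are hypothetical and are not taken from [10].

```python
import re
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LogEvent(NamedTuple):
    session_id: str
    timestamp: int
    event: str


# Hypothetical mapping from event names to one-character symbols.
EVENT_SYMBOLS: Dict[str, str] = {"home": "H", "search": "S", "click": "C", "tweet": "T"}


def build_session_sequences(events: List[LogEvent]) -> Dict[str, str]:
    """Group events by session, order them by time and encode each session as a string."""
    by_session: Dict[str, List[LogEvent]] = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)
    return {
        session_id: "".join(
            EVENT_SYMBOLS[ev.event] for ev in sorted(session, key=lambda ev: ev.timestamp)
        )
        for session_id, session in by_session.items()
    }


def count_sessions_matching(sequences: Dict[str, str], pattern: str) -> int:
    """Answer a common query, e.g. how many sessions contain a search later followed by a click."""
    regex = re.compile(pattern)
    return sum(1 for sequence in sequences.values() if regex.search(sequence))


if __name__ == "__main__":
    raw_log = [
        LogEvent("s1", 3, "click"),
        LogEvent("s1", 1, "home"),
        LogEvent("s1", 2, "search"),
        LogEvent("s2", 1, "home"),
        LogEvent("s2", 2, "tweet"),
    ]
    sequences = build_session_sequences(raw_log)
    print(sequences)                                   # {'s1': 'HSC', 's2': 'HT'}
    print(count_sessions_matching(sequences, "S.*C"))  # 1
```

The design choice being illustrated is that the expensive grouping and sorting of raw logs is done once, after which many common session-level questions reduce to cheap pattern matching over compact strings.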

TABLE II: THE ANALYSIS OF BIG DATA TECHNIQUES USED BY VARIOUS SOCIAL MEDIA COMPANIES: DATA CHARACTERISTICS OF THE TECHNIQUES FROM SIXTEEN PEER-REVIEWED SCIENTIFIC ARTICLES PUBLISHED IN 2007-2015.

Author and published year | Company using the technique | Big data technique | Base technology | Category of the technology | Subsidiary area
Rae et al. (2013) [3] | Google | F1 | Hybrid database technologies | Database | Google AdWords
Ananthanarayanan et al. (2013) [4] | Google | Photon | Query events | Database | Google: joining data streams
Hall et al. (2012) [5] | Google | Processing a Trillion Cells per Mouse Click | Composite range partitioning, column-oriented database system | System environment | Google: ad hoc queries over trillions of cells processed per single mouse click
Afrati et al. (2014) [17] | Google | Dremel | Tree-structured schemas | Programming model | Google query language features
Cheng et al. (2014) [18] | Facebook | Cascades | Data mining prediction | Programming model | Facebook cascade prediction framework
Sathiamoorthy et al. (2013) [19] | Facebook | XORing Elephants | Hadoop HDFS | Programming model | Facebook: overcoming the limitations of Reed-Solomon codes
Thusoo et al. (2009) [7] | Facebook | Hive | Hadoop | System environment | Facebook data warehousing solution
Chin et al. (2015) [6] | Google | Yedalog | Logic programming, data-structured and nested records | Programming model | Google: overcoming the manual-coding limitation of MapReduce and Spark
Corbett et al. (2012) [2] | Google | Spanner | Versioned key-value store evolved into a temporal multi-version database | Database (data warehouse environment) | Data warehousing and transaction of data
Ghemawat et al. (2003) [1] | Google | Google File System | Traditional file system | System environment | Data warehousing and transaction of data
Gupta et al. (2015) [13] | Google | F1, Mesa and Photon | Bigtable | Database | Multi-homing
Madhavan et al. (2012) [12] | Google | Google Fusion Tables | Google Maps | Database | Google Maps visualizations, interactive maps
DeCandia et al. (2007) [9] | Amazon | Dynamo | Primary key, decentralized techniques | Database | Amazon data warehousing
Abraham et al. (2013) [8] | Facebook | Scuba | Hadoop | Database | Facebook data warehousing
Lee et al. (2012) [10] | Twitter | Unified Logging Infrastructure for Data Analytics | Hadoop-based, running on a cluster of several | System environment | Twitter system environment solutions
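Tables I and II describe XORing Elephants [19] as a coding technique that saves storage while providing redundancy and overcomes limitations of classical Reed-Solomon codes. The sketch below shows only the simplest ingredient of that idea, under stated assumptions: a local parity computed as a plain byte-wise XOR over a small group of equal-length blocks, which lets one lost block be rebuilt from the rest of its group. The function names are hypothetical, and the sketch implements neither Reed-Solomon codes nor the full locally repairable code construction of [19].

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of a non-empty list of equal-length blocks."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks in a group must have the same length")
    result = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def make_local_parity(group: List[bytes]) -> bytes:
    """Parity block stored alongside a small group of data blocks."""
    return xor_blocks(group)


def repair_block(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing block of a group from the survivors and the parity.

    Works because x ^ x == 0: XOR-ing everything that is left cancels the
    surviving blocks and leaves exactly the lost one.
    """
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    group = [b"face", b"book", b"data"]                   # three equal-length data blocks
    parity = make_local_parity(group)
    rebuilt = repair_block([group[0], group[2]], parity)  # block 1 was lost
    print(rebuilt)                                        # b'book'
```

The property this illustrates is repair locality: rebuilding a block reads only the other members of its small group plus one parity, rather than every data block of a larger coding stripe.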

Despite the considerable number of publications demonstrating the use of big data techniques by well-known companies, some authors still dispute the growth and performance of these techniques. Our aim is to provide advanced knowledge of the big data techniques used by well-known companies. Consequently, we focus on the reported growth of data and the performance of the techniques, since some of the reported positive findings are flawed by limitations and shortcomings in data representation.
TABLE III: THE ANALYSIS OF BIG DATA TECHNIQUES USED IN VARIOUS COMPANIES: DATABASE-RELATED CHARACTERISTICS OF THE TECHNIQUES TAKEN FROM SIXTEEN PEER-REVIEWED SCIENTIFIC ARTICLES PUBLISHED IN 2007-2015.

Technique | Architecture | Data model | API (function) | Security | Partitioning | Replication
Dynamo | Decentralized | Key-value | Get, put | No security | Consistent hashing | Successor nodes in ring
BigTable | Centralized | Multidimensional sorted map | Get, scan, put, delete | Access control | Tablet server | Chunk servers in GFS
Megastore | Semi-relational | Access control | Create, update, delete | Access control | Hashing | Synchronous
Spanner | Semi-relational, TrueTime | Schematized | Paxos algorithm | Access control | Hierarchies of tables | Synchronous
F1 | Decentralized | Hierarchical schema | Create, update, delete | Access control | Relational | Not applied
Dynamo | MapReduce | Key-value | Create, update, delete, etc. | Access control | Multiple (sort key, partition key, etc.) | Synchronous, cross-region
Scuba | Hadoop-based | Semi-structured and sparse | Ad hoc queries | Access control | Not applied | Not applied
Mesa | Decentralized | Novel batch-update | Paxos algorithm | Access control | Not applied | Not applied
Photon | Decentralized | Near-exact semantics | Paxos algorithm | Access control | Not applied | Synchronous
Google Fusion Tables | Decentralized | Map-reduce | Selection, projection, grouping, aggregation, equijoin | Access control | Not applied | Map data servers
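Table III lists consistent hashing as Dynamo's partitioning scheme and successor nodes in the ring as its replication scheme. A minimal Python sketch of that combination follows, assuming MD5 as the ring hash, exactly one ring position per physical node (Dynamo itself adds virtual nodes to balance load, which this sketch omits) and a replication factor of three. The class and method names are hypothetical and are not Amazon's API.

```python
import bisect
import hashlib
from typing import List


def ring_position(name: str) -> int:
    """Place a key or node on the ring using its 128-bit MD5 digest."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Consistent hashing with successor-node replication (one position per node)."""

    def __init__(self, nodes: List[str], replicas: int = 3) -> None:
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and the number of nodes")
        self.replicas = replicas
        # Nodes sorted by their position on the ring.
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [position for position, _ in self._ring]

    def preference_list(self, key: str) -> List[str]:
        """Return the coordinator (first node clockwise with a larger position than
        the key) followed by the next replicas - 1 successor nodes."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        size = len(self._ring)
        # The modulo wraps past the highest position back to the start of the ring.
        return [self._ring[(start + i) % size][1] for i in range(self.replicas)]


if __name__ == "__main__":
    ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"], replicas=3)
    print(ring.preference_list("shopping-cart:42"))
```

Because each node owns only the arc between its predecessor and itself, adding or removing a node moves only the keys on that arc, which is the property that lets a decentralized store such as Dynamo grow incrementally without a central coordinator.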

All these techniques are related to databases. Some of the techniques are developed in early coding languages [20], for example cascade prediction [18]. In the above comparison, we discussed database-related techniques. Each technique or invention has to be adapted to the organization according to the company's requirements. At the same time, within the same organization, some techniques differ from each other owing to the complexity of the organization. For example, Google is a search-engine-based company, so databases are a massive requirement [4]; these databases, in turn, fall into several types of data warehouses [7].

IV. DISCUSSION

The conclusion drawn from this analysis is that big data techniques are used by various companies. This study has collected data concerning the characteristics and techniques of particular data methods, after a thorough analysis of sixteen peer-reviewed scientific articles published in 2007-2015. As discussed, the big data techniques of Google, Amazon, Facebook and Twitter have been analyzed, compared and contrasted with many other similar research papers. For example, MongoDB (a document-oriented data store) and Hadoop have been compared in terms of performance, scalability and fault tolerance using Advanced Light Source (ALS) data, which gives a partial understanding of two partially similar big data techniques [21]. MongoDB stores metadata and also provides a query language, whereas MapReduce allows users to write map and reduce functions.

This paper has discussed the Google File System and MapReduce technology. Big data can be used for many technological purposes; Google, for example, uses big data for Google Maps [22]. The geometry of the road map in Tunisia has been used to categorize and trace roads using big GPS data, dividing the data so that unstructured data can be handled [23]. MapReduce can run in a parallel mode and a sequential mode, and parallel processing was used in that implementation to find roads and trace vehicles. Gupta et al. [14] conducted a research experiment that processed 10 GB of raw data using MapReduce (for continuously growing data). Google, Facebook and others process trillions of records within a second or a minute, while some countries also gain advantages from big data by implementing popular big data techniques (MapReduce, Hadoop). Google's F1, Mesa and Photon use the Bigtable technique to provide multi-homing for Google search engine processes, and Bigtable itself is implemented on top of the Google File System. Furthermore, big data techniques connect with each other: F1, Mesa and Photon are three different products implemented by Google for different functions, yet they use the same big data implementation technique. Big data offers ample advantages; however, there are minor drawbacks. One drawback is that the classification of modern big data technologies is questionable and open to challenge. Big data classification techniques include representation learning, supervised learning and lifelong machine learning, whereas big data technologies include Hadoop and Hive. A suggested solution to this challenge is to integrate the Hadoop Distributed File System with representation learning techniques; this integration also addresses the prediction network [24] of the big data classification strategy and its continuity parameters. For Twitter, processing JSON logs at the front end can serve as a fast solution. Data categories, integration hooks, robust data dependencies and workflow scheduling schemas are all beneficial; nonetheless, schema techniques must be applied and a framework implemented. Together, JSON logs and schemas provide a solution that withstands and overcomes the challenges of data mining at Twitter [10].

Requirements of big data vary from one organization to another, depending on the company's requirements and the techniques in use. We considered these research experiments in the Results section, both as an analysis and as a comparison of views. Furthermore, Amazon's Dynamo and Google's Bigtable, for instance, are partially similar in their APIs, but key aspects such as architecture and data model are completely different. Moreover, investigating various techniques for big data databases [20], [22], [25], security [26]-[28], and prediction and pattern analysis [14] could be an interesting path to explore in future.

V. CONCLUSION

Google, Amazon, Facebook and Twitter have gained enormous returns from big data methodologies and techniques, yet research in this field is lacking. In this review, we have completed a comparison-based analysis of big data techniques used by various companies. The data have been collected from sixteen peer-reviewed scientific publications from 2007 to 2015 and relate to the characteristics and techniques of data storage. The analyses and comparisons in the literature discuss different experimental analyses by category of technology, data model, etc. In this review, we clearly differentiated all the big data techniques according to the ways in which they differ from each other. Both Facebook and Twitter are social-media-based companies that have different
requirements. Facebook requires a cascade prediction system and a coding language for data transition, whereas Twitter requires a system infrastructure to handle millions of Tweets (data transactions per minute). Nevertheless, Facebook and Twitter both require data warehousing as a solution. Google's requirements are similar to those of Facebook and Twitter, and Google's data warehouse solution is implemented with the Google File System, Spanner, etc. Besides that, Google owns YouTube for advertising and marketing, and F1 handles all the processes for budgeting, ad clicks and customer and user log handling. All these functions connect to big data technologies such as MapReduce and Hadoop. For instance, XORing Elephants is a Facebook programming model and Yedalog is a Google programming model: Facebook built XORing Elephants to overcome the limitations of Reed-Solomon codes, whereas Google built Yedalog to overcome the limitation of manual coding in MapReduce and Spark technologies. This research study is based on a comparison and analysis of big data techniques. It should be a useful reference for researchers, as it provides a technological analysis and comparison of the big data techniques of Google, Facebook, Twitter and Amazon.

AUTHOR CONTRIBUTION

T.N.H. and M.N.H. conceived the study idea and developed the analysis plan. T.N.H. analyzed the data and wrote the initial paper. M.N.H. helped to prepare the tables and finalize the manuscript. All authors read the manuscript.

REFERENCES

[1] S. Ghemawat, H. Gobioff, and S. Leung, “The Google file system,” ACM SIGOPS Operating Systems Review, vol. 37, no. 5, p. 29, 2003.
[2] J. Corbett, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, et al., “Spanner: Google’s globally distributed database,” ACM Transactions on Computer Systems, vol. 31, no. 3, pp. 1-22, 2013.
[3] I. Rae, E. Rollins, J. Shute, S. Sodhi, and R. Vingralek, “Online, asynchronous schema change in F1,” Proceedings of the VLDB Endowment, vol. 6, no. 11, pp. 1045-1056, 2013.
[4] R. Ananthanarayanan, S. Venkataraman, V. Basker, S. Das, A. Gupta, et al., “Photon: Fault-tolerant and scalable joining of continuous data streams,” in Proc. International Conference on Management of Data - SIGMOD '13, 2013.
[5] A. Hall, O. Bachmann, R. Büssow, S. Gănceanu, and M. Nunkesser, “Processing a trillion cells per mouse click,” Proceedings of the VLDB Endowment, vol. 5, no. 11, pp. 1436-1446, 2012.
[6] B. Chin, D. Dincklage, V. Ercegovac, P. Hawkins, et al., “Yedalog: Exploring Knowledge at Scale,” in Proc. 1st Summit on Advances in Programming Languages, 2015, pp. 63-78.
[7] A. Thusoo, J. Sarma, N. Jain, Z. Shao, P. Chakka, et al., “Hive: A warehousing solution over a map-reduce framework,” Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1626-1629, 2009.
[8] L. Abraham, S. Subramanian, J. Wiener, O. Zed, J. Allen, O. Barykin, et al., “Scuba: Diving into data at FaceBook,” Proceedings of the VLDB Endowment, vol. 6, no. 11, pp. 1057-1067, 2013.
[9] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, et al., “Dynamo: Amazon's highly available key-value store,” ACM SIGOPS Operating Systems Review, vol. 41, no. 6, p. 205, 2007.
[10] G. Lee, J. Lin, C. Liu, A. Lorek, and D. Ryaboy, “The unified logging infrastructure for data analytics at Twitter,” Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 1771-1780, 2012.
[11] S. Melnik, A. Gubarev, J. Long, G. Romer, S. Shivakumar, M. Tolton, and T. Vassilakis, “Dremel: Interactive analysis of web-scale datasets,” Communications of the ACM, vol. 54, no. 6, p. 114, 2011.
[12] J. Madhavan, S. Balakrishnan, K. Brisbin, et al., “Big data storytelling through Interactive maps,” IEEE Data Eng. Bull., 2012.
[13] A. Gupta and J. Shute, “High-Availability at massive scale: Building Google’s data infrastructure for ads,” in Proc. Workshop on Business Intelligence for the Real Time Enterprise, 2015.
[14] A. Gupta, A. Mohammad, A. Syed, and M. Halgamuge, “A comparative study of classification algorithms using data mining: Crime and accidents in Denver city the USA,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 7, 2016.
[15] IBM Netezza Data Warehouse Appliances – The Simple Data Warehouse Appliance for Serious Analytics. (2017). [Online]. Available: https://www-01.ibm.com/software/data/netezza/
[16] QlikTech. MongoDB. (2017). [Online]. Available: https://www.mongodb.com/partners/software/qliktech
[17] F. Afrati, D. Delorey, M. Pasumansky, and J. Ullman, “Storing and querying tree-structured records in Dremel,” Proceedings of the VLDB Endowment, vol. 7, no. 12, pp. 1131-1142, 2014.
[18] J. Cheng, L. Adamic, P. Dow, J. Kleinberg, and J. Leskovec, “Can cascades be predicted?” in Proc. 23rd International Conference on World Wide Web, 2014.
[19] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. Dimakis, R. Vadali, S. Chen and D. Borthakur, “XORing elephants,” Proceedings of the VLDB Endowment, vol. 6, no. 5, pp. 325-336, 2013.
[20] Vargas, A. Syed, A. Mohammad, and M. Halgamuge, “Pentaho and jaspersoft: A comparative study of business intelligence open source tools processing big data to evaluate performances,” International Journal of Advanced Computer Science and Applications, vol. 7, no. 10, 2016.
[21] S. Khalid, A. Syed, A. Mohammad, and M. Halgamuge, “Big-Data NoSQL databases: Comparison and analysis of ‘Big-Table’, ‘DynamoDB’, and ‘Cassandra’,” in Proc. IEEE 2nd International Conference on Big Data Analysis, 2017.
[22] K. Kaur, A. Syed, A. Mohammad, and M. Halgamuge, “Review: An evaluation of major threats in cloud computing associated with big data,” in Proc. IEEE 2nd International Conference on Big Data Analysis, 2017.
[23] S. Suthaharan, “Big data classification,” ACM SIGMETRICS Performance Evaluation Review, vol. 41, no. 4, pp. 70-73, 2014.
[24] J. Lin and D. Ryaboy, “Scaling big data mining infrastructure,” ACM SIGKDD Explorations Newsletter, vol. 14, no. 2, p. 6, 2013.
[25] S. Munugala, G. K. Brar, A. Syed, A. Mohammad, and M. N. Halgamuge, “The much needed security and data reforms of cloud computing in medical data storage,” Applying Big Data Analytics in Bioinformatics and Medicine, IGI Global, Chapter 5, pp. 99-113, February 2017.
[26] D. V. Pham, A. Syed, and M. N. Halgamuge, “Universal Serial Bus Based Software Attacks and Protection Solutions,” Digital Investigation, vol. 7, no. 3, pp. 172-184, Feb. 2011.
[27] D. V. Pham, A. Syed, A. Mohammad, and M. N. Halgamuge, “Threat analysis of portable hack tools from USB storage devices and protection solutions,” in Proc. International Conference on Information and Emerging Technologies, pp. 1-5, Karachi, Pakistan, June 14-16, 2010.

communication and social media, processes of data analytics and related computer vision.

Malka N. Halgamuge is a Research Fellow with the Department of Electrical and Electronic Engineering, University of Melbourne. She received the PhD degree from the same department in 2007. Since then she has published in areas including wireless communication, life sciences and data science/big data. She is an experienced researcher and educator with a demonstrated history of working with highly reputed research institutes all over the world on life sciences.

Ali Syed has wide experience as a lecturer and examiner of undergraduate and postgraduate courses in the field of Information Systems Management and Business Studies, and has been actively involved with course development and delivery. He is a member of several academic and practitioner communities. His experience and involvement with industry contribute a strong flavor to academia. His research interests are in the areas of systems development, systems security, knowledge management, ethical issues in information systems and the implications of information systems for people and their work environments.
[28] D. V. Pham, A. Syed, and M. N. Halgamuge, “Universal
Gullu Ekici is a PhD. Monash
serial bus based software attacks and protection solutions,” University (in progress - due to complete
Digital Investigation, vol. 7, no. 3, pp. 172-184, Feb. 2011. in 2018). Masters of Education,
(TESOL), Monash University. Bachelor
Thulara N. Hewage was born in of Arts (Linguistics), Monash University.
Western Province, Sri Lanka in 1990. Monash University I have more than 15
She received the B.Sc degree from the years’ experience in teaching/researching
University of Middlesex, London, in and academic skills support experience
2012 and the M.I.T degree from the at tertiary level in a range of settings.
University of Charles Sturt University
(CSU), Melbourne, in 2016, both in
Information Technology. Her research
interest includes Information Technology
enhanced theories, online
