An Outline of Big Data Tools & Technologies
1. Overview:
Many of the latest big data technologies have been developed by big data startups from around the world that have
found ways to deal with vast amounts of data. They have developed disruptive big data technology that
organizations can use to obtain valuable insights and turn data into information and wisdom. Of course, the large
established IT players have also developed substantial amounts of big data technology in recent years; those
technologies are used especially by large corporations that want an all-inclusive package installed. There are also
many different types of analysis that these startups perform, and each will have a different impact or result. Some
technologies integrate data from different sources directly into a platform, skipping the need for additional data
warehousing, while being able to deliver real-time interactive charts that are easy to interact with and to understand.
These PaaS (Platform as a Service) or DaaS (Data as a Service) solutions allow end users to work with the data
without requiring technical knowledge. There is broad agreement that big data is revolutionizing commerce in the
21st century: when it comes to business, big data offers unprecedented insight, improved decision-making and
untapped sources of profit.
There are big data technology vendors that focus on delivering the optimal graphical representation of big data.
Visualizing unstructured and structured data is necessary to make the data understandable and turn it into
information, but it is also very challenging. New big data startups, however, seem to understand the practice of
visualization and have developed different solutions. One example is visualization modeled on the human visual
cortex, which maximizes the brain's ability to recognize patterns and makes it easy to read and understand massive
amounts of relational data. The use of color and of threads of different thicknesses within the rendering allows users
to easily recognize patterns and discover abnormalities. Another approach is a technique called topological data
analysis. This type of analysis focuses on the shape of complex data and is able to identify clusters and any
statistical significance present in them. Data scientists can use this to reveal inherent patterns in those clusters. This
type of analysis is best visualized with 3D clusters that show the topological spaces and can be explored
interactively. Complex, innovative and interactive graphical representations are, however, not always a necessity.
Infographics are visual representations of information, data or knowledge, and they can help make difficult and
complex material quickly understandable. Dashboards that combine different data streams into ‘traditional’ graphs
(column charts, line charts, pie charts or bar charts) can also provide valuable insights, and there are many startups
offering such solutions. Sometimes, simple real-time graphs showing the status of processes provide more valuable
information for improving decision-making than complex, innovative visualizations. Visualizations on mobile
devices take on a completely new meaning when a user can play intuitively with the data by swiping, pinching,
rotating or zooming on a mobile device.
Having real-time analyses visualized well is important, but being able to predict future outcomes provides even
more value to organizations. Analyzing current and historical big data can help make predictions about future
events. This is a huge difference from existing business intelligence, which normally uses analytical tools to look
only at what has happened and says nothing about the future. Predictive analysis can help companies derive
actionable intelligence from that same data. Machine-learning platforms, such as Skytree, can also predict trends,
make recommendations and reveal untapped markets and customers based on available data. Machine learning goes
much further than general business intelligence: it is about creating algorithms and systems that can learn from the
data they process and analyze. The more data is processed, the better the algorithm becomes.
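As a rough illustration of that last claim, the following sketch trains the same simple classifier on increasing
amounts of data and measures its accuracy on held-out data. The use of scikit-learn is an illustrative assumption
(the paper names no specific library), and the dataset is synthetic:

```python
# Minimal sketch: a model generally improves as it processes more data.
# scikit-learn and the synthetic dataset are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

for n in (100, 1000, 10000):              # increasing amounts of training data
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(n, round(model.score(X_test, y_test), 3))  # accuracy tends to rise
```

The printed accuracy will normally rise with the amount of training data, which is exactly the property that makes
big data valuable for machine learning.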
Profiling of (potential) customers is used to target them better and to understand them better. The ultimate goal
should be to develop a 360-degree view of each individual customer, so that eventually an individual offering can
be created. Behavioral analytics can be used to discover patterns in (un)structured data across customer touch
points, giving organizations better insight into the different types of customers they have. The profiles can also be
used within recommendation systems. Of course, there are the recommendations from large web shops such as
Amazon.com, which suggest other products a user could buy during checkout. With big data, real-time and more
extensive recommendations become possible. Decide, for example, helps consumers with data-backed
recommendations on whether to buy now or wait for an upcoming product. Clearly, the possibilities of big data are
numerous, and the global big data landscape, with startups focusing on different areas, is growing rapidly.
Therefore we are developing the Big Data Strategy Model to provide some clarification and guidance for
organizations in finding the right big data technology. This innovative, one-of-a-kind model will be revealed soon
and will be available for free. It will enable organizations to understand what they can achieve with the data they
have, what data they need to develop a certain strategy, which big data technology they need to do that, and which
big data technology vendor could help them achieve that strategy.
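To make the recommendation idea concrete, here is a minimal item-based collaborative-filtering sketch with
NumPy; the rating matrix, users and products are hypothetical and only meant to show the mechanics:

```python
# Item-based recommendation sketch: score unrated products for a user by
# the cosine similarity of product rating columns. Data is hypothetical.
import numpy as np

# Rows = users, columns = products; entries are ratings (0 = not rated).
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

# Cosine similarity between product columns.
norms = np.linalg.norm(ratings, axis=0)
sim = (ratings.T @ ratings) / (np.outer(norms, norms) + 1e-9)

user = ratings[0]
scores = sim @ user                       # weight products by similarity
scores[user > 0] = -np.inf                # exclude already-rated products
print("recommend product:", int(np.argmax(scores)))
```

Real systems add normalization, implicit feedback and distributed computation, but the core of many recommenders
is this kind of similarity scoring.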
The Big Data landscape is dominated by two classes of technology: systems that provide operational capabilities for
real-time, interactive workloads where data is primarily captured and stored; and systems that provide analytical
capabilities for retrospective, complex analysis that may touch most or all of the data. These classes of technology
are complementary and frequently deployed together. Operational and analytical workloads for Big Data present
opposing requirements, and systems have evolved to address their particular demands separately and in very different
ways. Each has driven the creation of new technology architectures. Operational systems, such as the NoSQL
databases, focus on servicing highly concurrent requests while exhibiting low latency for responses operating on
highly selective access criteria. Analytical systems, on the other hand, tend to focus on high throughput; queries can
be very complex and touch most if not all of the data in the system at any time. Both systems tend to operate over
many servers operating in a cluster, managing tens or hundreds of terabytes of data across billions of records.
For operational Big Data workloads, NoSQL Big Data systems such as document databases have emerged to address
a broad set of applications, and other architectures, such as key-value stores, column family stores, and graph
databases are optimized for more specific applications. NoSQL technologies, which were developed to address the
shortcomings of relational databases in the modern computing environment, are faster and scale much more quickly
and inexpensively than relational databases. Critically, NoSQL Big Data systems are designed to take advantage of
new cloud computing architectures that have emerged over the past decade to allow massive computations to be run
inexpensively and efficiently. This makes operational Big Data workloads much easier to manage, and cheaper and
faster to implement. In addition to user interactions with data, most operational systems need to provide some
degree of real-time intelligence about the active data in the system. For example, in a multi-user game or financial
application, aggregates of user activity or instrument performance are displayed to users to inform their next
actions. Some NoSQL systems can provide insights into patterns and trends based on real-time data with minimal
coding and without the need for data scientists and additional infrastructure.
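As a concrete illustration of such real-time operational intelligence, here is a minimal sketch using MongoDB's
aggregation pipeline via PyMongo, computing a live leaderboard for a multi-user game; the database, collection and
field names are hypothetical:

```python
# Minimal sketch: real-time per-user aggregates in an operational NoSQL
# store, via MongoDB's aggregation pipeline. Names are hypothetical.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["game"]

top_players = db.events.aggregate([
    {"$group": {"_id": "$user_id", "points": {"$sum": "$points"}}},
    {"$sort": {"points": -1}},
    {"$limit": 10},                       # live top-10 leaderboard
])
for player in top_players:
    print(player["_id"], player["points"])
```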
                     Operational          Analytical
Access Pattern       Writes and Reads     Reads
Technology           NoSQL Database       MapReduce, MPP Database

Table 1: Overview of Operational vs. Analytical Systems
These technologies are also a reaction to the limitations of traditional relational databases and their lack of ability to
scale beyond the resources of a single server. Furthermore, MapReduce provides a new method of analyzing data
that is complementary to the capabilities provided by SQL. As applications gain traction and their users generate
increasing volumes of data, there are a number of retrospective analytical workloads that provide real value to the
business. Where these workloads involve algorithms that are more sophisticated than simple aggregation,
MapReduce has emerged as the first choice for analytics for Big Data. Some NoSQL systems provide native
MapReduce functionality that allows for analytics to be performed on operational data in place. Alternatively, data can
be copied from NoSQL systems into analytical systems such as Hadoop for MapReduce.
New technologies like NoSQL, MPP databases, and Hadoop have emerged to address Big Data challenges and to
enable new types of products and services to be delivered by the business. One of the most common ways
companies are leveraging the capabilities of both systems is by integrating a NoSQL database such as MongoDB
with Hadoop. The connection is easily made by existing APIs and allows analysts and data scientists to perform
complex, retrospective queries for analysis and insights while maintaining the efficiency and ease of use of a NoSQL
database. NoSQL, MPP databases and Hadoop are complementary: NoSQL systems should be used to capture Big
Data and provide operational intelligence to users and MPP databases and Hadoop should be used to provide
analytical insight for analysts and data scientists. Together, NoSQL, MPP databases and Hadoop enable businesses
to capitalize on Big Data.
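One very simple integration path, sketched below, is to export operational documents from MongoDB as
newline-delimited JSON and load the file into HDFS for MapReduce processing. Production deployments would
more likely use a purpose-built connector; the database, collection and paths here are hypothetical:

```python
# Sketch: dump MongoDB documents as newline-delimited JSON so they can be
# loaded into HDFS (e.g. `hadoop fs -put events.json /data/events/`) for
# MapReduce analysis. All names are hypothetical.
import json
from pymongo import MongoClient

events = MongoClient("mongodb://localhost:27017")["shop"]["events"]

with open("events.json", "w") as out:
    for doc in events.find({}, {"_id": 0}):   # drop ObjectId for plain JSON
        out.write(json.dumps(doc, default=str) + "\n")
```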
4.1 The user tools: Sometimes, ad hoc tools and applications are the best solution, especially when the
objective is to provide information for very specific requirements. For example, the results may be integrated into
e-commerce and online services, or be fundamental for customer and technical support services. In other situations,
where reports reflect more typical results in terms of charts, graphs, pivots or matrices of results, standard reporting
and BI tools may be the right solution. In this area, a large number of purely commercial products are paired with
some open source products that have commercial support. The clear trend is to provide these tools "as a service", so
that customers can benefit from vast scalability for their analysis. The "as a service" model is extremely cost-
effective for starting a project. One thing to consider, however, is whether it will remain possible to change this
approach if it becomes less convenient than the alternatives in the future; such a change may require re-engineering
all the operations.
4.2 The analysis software: Today, the majority of the software used to analyze different information is
commercial or open source with commercial support. Software varies in terms of type of data and objectives. For
structured data, data mining tools have been around for many years and have reached their maturity so they can be
effectively used in Big Data. Map/Reduce operations can crunch large amounts of structured data and find patterns
or reveal behaviors in ways that would not have been feasible only a few years ago. Extensions to this software can
successfully
analyze unstructured data, in terms of collected text, documents, images, as well as audio and video streams. These
extensions can find similarities in video and audio clips, or in photos. They can understand not only the text stored
in a document but also the sentiment and the emotions expressed in collected comments and texts.
4.3 Modeling software: Data modeling is closely tied to the kind of analysis to be performed. In general, this
software comes together with the analysis software, but some technologies can serve as alternatives and follow
different approaches.
4.4 Infrastructure: This is the area where commodity hardware and open source software are most commonly used.
There are many commercial solutions that promise improved performance and optimizations, but the common trend
is to use less expensive, generally available machines and products.
4.5 Choice of database: The technology at the center of a Big Data project is, without any doubt, the database.
There are several commercial options for Big Data, but the common trend is in the open source area. The set of
products developed by the Apache Foundation under the Hadoop umbrella, together with many side projects, is
extremely popular, and these products are considered the de facto standard for Big Data. The truth is that Hadoop
can solve only some aspects of the analysis required for Big Data, and it may be necessary to pair it with other
database technologies. NoSQL technologies are particularly popular for their scalability and performance.
Cassandra, for example, is mainly used to store large collections of data; it is very effective for fast data inserts
rather than for analysis, so many projects use Cassandra to store the acquired data with a denormalised model.
MongoDB is used in many cases to store documents, or unstructured data in general; the data can later be reviewed
and analyzed so that more structured information can be stored in other databases.
SQL databases are always a viable choice for Big Data, although they seem to be less popular than Hadoop,
Cassandra and MongoDB. Due to their internal architecture, relational databases may struggle if the acquired data is
unstructured or organized in large objects, such as documents and multimedia clips. In recent years much has been
done in this area, so the relational databases of today are very different from those used 10 or more years ago.
Certainly, the handling and analysis of structured data is where relational databases can play a leading role. Modern
relational databases combine the efficiency of SQL with functionality that provides faster indexing and optimized
access to the data. Columnar relational databases bring great improvements to traditional data analysis, and new
indexing algorithms solve the nuisances of data statistics rebuilds, index optimization and storage inefficiency when
data is moved in large sets. In addition, some relational databases provide a map/reduce approach similar to the one
available in Hadoop and in other NoSQL products.
MariaDB is a drop-in replacement for MySQL, the most used open source database for online applications.
MariaDB falls into the category of NewSQL products, i.e. products that provide unique NoSQL features together
with the typical features of relational databases. Therefore, aspects like transaction management, durability and
consistency are available together with schema-based or schema-less modelling, full-text storage and analysis, and
integration with other NoSQL technologies.
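Returning to the Cassandra ingestion pattern described above, here is a minimal sketch of write-heavy, denormalised
inserts using the DataStax Python driver (an illustrative assumption; the keyspace, table and columns are
hypothetical):

```python
# Sketch: fast, append-only inserts into a denormalised Cassandra table,
# via the DataStax Python driver. Keyspace/table/columns are hypothetical.
from uuid import uuid4
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("telemetry")

insert = session.prepare(
    "INSERT INTO readings_by_sensor (sensor_id, reading_id, value) VALUES (?, ?, ?)"
)
for value in (21.3, 21.7, 22.1):          # denormalised, write-optimized model
    session.execute(insert, ("sensor-42", uuid4(), value))
```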
There is an almost infinite number of technologies, and of combinations of technologies, that can give the
perception that a project is already meeting its requirements. The following are among the most relevant.
Column-oriented databases: Traditional, row-oriented databases are excellent for online transaction
processing with high update speeds, but they fall short on query performance as the data volumes grow and
as data become more unstructured. Column-oriented databases store data with a focus on columns, instead
of rows, allowing for huge data compression and very fast query times. The downside to these databases is
that they will generally only allow batch updates, having a much slower update time than traditional
models.
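A toy sketch of why column orientation helps analytical scans; this is pure Python rather than a real database, and is
only meant to show the layout difference:

```python
# Row layout vs. column layout: aggregating one attribute in a columnar
# store touches only that attribute's contiguous values.
rows = [("alice", 34, "NL"), ("bob", 41, "US"), ("carol", 29, "DE")]

# Row-oriented: scanning one attribute still walks every full row.
total_row = sum(row[1] for row in rows)

# Column-oriented: each attribute is its own contiguous array.
names, ages, countries = zip(*rows)
total_col = sum(ages)                     # touches only the 'age' column

assert total_row == total_col == 104
```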
Schema-less databases, or NoSQL databases: There are several database types that fit into this category,
such as key-value stores and document stores, which focus on the storage and retrieval of large volumes of
unstructured, semi-structured, or even structured data. They achieve performance gains by doing away with
some (or all) of the restrictions traditionally associated with conventional databases, such as read-write
consistency, in exchange for scalability and distributed processing.
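The following toy sketch shows the key-value idea: keys are hash-partitioned across "nodes" for scalability, and no
schema is enforced on the stored values (everything here is illustrative, in-process Python):

```python
# Toy key-value store: hash-partition keys across stand-in "nodes" and
# accept values of any shape, with no schema enforcement.
NODES = [dict() for _ in range(4)]        # stand-ins for separate servers

def put(key: str, value) -> None:
    NODES[hash(key) % len(NODES)][key] = value   # route by hash of the key

def get(key: str):
    return NODES[hash(key) % len(NODES)].get(key)

put("user:1", {"name": "alice", "tags": ["vip"]})   # semi-structured value
put("page:home", "<html>...</html>")                # unstructured value
print(get("user:1"))
```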
MapReduce: This is a programming paradigm that allows for massive job execution scalability against
thousands of servers or clusters of servers. Any MapReduce implementation consists of two tasks: the "Map" task,
in which an input dataset is converted into a different set of key/value pairs, or tuples; and the "Reduce" task, in
which several of the outputs of the "Map" task are combined to form a reduced set of tuples (hence the name), as
the sketch below illustrates.
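A minimal, single-process word-count sketch of the paradigm; real frameworks distribute each phase across many
machines:

```python
# MapReduce in miniature: map emits (key, value) tuples, a shuffle groups
# them by key, and reduce combines each group.
from collections import defaultdict

def map_phase(line):                      # "Map": line -> (word, 1) tuples
    for word in line.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):           # "Reduce": combine per-key values
    return (word, sum(counts))

documents = ["Big Data tools", "big data big insights"]

groups = defaultdict(list)                # the "shuffle" step
for line in documents:
    for key, value in map_phase(line):
        groups[key].append(value)

print([reduce_phase(word, counts) for word, counts in groups.items()])
```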
Hadoop: Hadoop is by far the most popular implementation of MapReduce, being an entirely open source
platform for handling Big Data. It is flexible enough to be able to work with multiple data sources, either
aggregating multiple sources of data in order to do large scale processing, or even reading data from a
database in order to run processor-intensive machine learning jobs. It has several different applications, but
one of the top use cases is for large volumes of constantly changing data, such as location-based data from
weather or traffic sensors, web-based or social media data, or machine-to-machine transactional data.
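A common way to run Python code on Hadoop is Hadoop Streaming, which pipes records through stdin/stdout. The
sketch below shows a word-count mapper and reducer in one file; the submit command in the comment and the
paths are assumptions:

```python
# Sketch: word count as a Hadoop Streaming job. A submit command would
# look roughly like (paths are assumptions):
#   hadoop jar hadoop-streaming.jar -input /data -output /out \
#       -mapper "wordcount.py map" -reducer "wordcount.py reduce" \
#       -file wordcount.py
import sys

def mapper():
    # Emit "word<TAB>1" for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key, so the counts for each
    # word arrive contiguously and can be summed in one pass.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current and current is not None:
            print(f"{current}\t{total}")
            total = 0
        current = word
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```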
Hive: Hive is a "SQL-like" bridge that allows conventional BI applications to run queries against a Hadoop
cluster. It was originally developed by Facebook but has been open source for some time now; it is a higher-level
abstraction of the Hadoop framework that allows anyone to query data
stored in a Hadoop cluster just as if they were manipulating a conventional data store. It amplifies the reach
of Hadoop, making it more familiar for BI users.
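A minimal sketch of querying Hive from Python via the PyHive client (an illustrative assumption; the paper names
no client, and the host and table are hypothetical):

```python
# Sketch: issuing a SQL-like query against a Hadoop cluster through Hive,
# using the PyHive client. Host and table names are hypothetical.
from pyhive import hive

cursor = hive.connect(host="hive-server.example.com", port=10000).cursor()
cursor.execute(
    "SELECT country, COUNT(*) AS visits "
    "FROM web_logs GROUP BY country ORDER BY visits DESC LIMIT 10"
)
for country, visits in cursor.fetchall():
    print(country, visits)
```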
PIG: PIG is another bridge that tries to bring Hadoop closer to the realities of developers and business
users, similar to Hive. Unlike Hive, however, PIG consists of a "Perl-like" scripting language (Pig Latin) rather than
a "SQL-like" one, allowing for query execution over data stored on a Hadoop cluster. PIG was developed by
Yahoo!, and, just like Hive, has also been made fully open source.
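For comparison with the Hive example, here is the canonical Pig Latin word count, wrapped in Python only to keep
a single example language; the input path is hypothetical and Pig must be installed:

```python
# Sketch: the classic Pig Latin word count, written to a file and run with
# the `pig` command-line tool. The input/output paths are hypothetical.
import subprocess

script = """
lines   = LOAD '/data/input.txt' AS (line:chararray);
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
STORE counts INTO '/data/wordcount';
"""
with open("wordcount.pig", "w") as f:
    f.write(script)
subprocess.run(["pig", "wordcount.pig"], check=True)  # assumes Pig installed
```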
WibiData: WibiData is a combination of web analytics with Hadoop, being built on top of HBase, which is
itself a database layer on top of Hadoop. It allows web sites to better explore and work with their user data,
enabling real-time responses to user behavior, such as serving personalized content, recommendations and
decisions.
PLATFORA: Perhaps the greatest limitation of Hadoop is that it is a very low-level implementation of
MapReduce, requiring extensive developer knowledge to operate. Between preparing, testing and running
jobs, a full cycle can take hours, eliminating the interactivity that users enjoyed with conventional
databases. PLATFORA is a platform that turns users' queries into Hadoop jobs automatically, thus creating
an abstraction layer that anyone can exploit to simplify and organize datasets stored in Hadoop.
SkyTree: SkyTree is a high-performance machine learning and data analytics platform focused specifically
on handling Big Data. Machine learning, in turn, is an essential part of Big Data, since the massive data
volumes make manual exploration, or even conventional automated exploration methods, unfeasible or too
expensive.
6. Conclusions: When it comes to Big Data, it is not always necessary to move away from well-known
technologies like relational databases. Modern NewSQL databases like MariaDB can achieve the objective and
provide all the features required. The result is a smoother learning curve, less risk, reuse of known technologies and
resources and ultimately a reduced total cost of a Big Data project. In other cases, MariaDB can be used in
conjunction with NoSQL technologies and integrated in many different ways. To that end, the most important point
to consider is that when a single technology is not enough for a successful project, ease of integration is a must.