Unit 9 Emerging Database Technology and Application
Big data involves the data produced by different devices and applications. Given below are
some of the fields that come under the umbrella of Big Data.
Black Box Data − The black box is a component of helicopters, airplanes, and jets. It
captures the voices of the flight crew, recordings from microphones and earphones, and
the performance information of the aircraft.
Social Media Data − Social media such as Facebook and Twitter hold information and
the views posted by millions of people across the globe.
Stock Exchange Data − Stock exchange data holds information about the ‘buy’ and
‘sell’ decisions that customers make on shares of different companies.
Power Grid Data − Power grid data holds information about the power consumed by a
particular node with respect to a base station.
Transport Data − Transport data includes the model, capacity, distance, and availability
of a vehicle.
Search Engine Data − Search engines retrieve large amounts of data from different databases.
Thus, Big Data involves huge volume, high velocity, and an extensible variety of data. The data
in it falls into three types: structured, semi-structured, and unstructured.
Using the information kept in social networks such as Facebook, marketing agencies
are learning about the response to their campaigns, promotions, and other advertising
media.
Using information from social media, such as the preferences and product perceptions of
their consumers, product companies and retail organizations are planning their
production.
Using data on the previous medical history of patients, hospitals are providing
better and quicker service.
Concept of NoSQL
A NoSQL database is a non-relational data management system that does not require a fixed
schema. It avoids joins and is easy to scale. NoSQL databases are mainly used for distributed
data stores with very large storage needs, such as Big Data and real-time web apps. For
example, companies like Twitter, Facebook and Google collect terabytes of user data every
single day.
NoSQL stands for "Not Only SQL" or "Not SQL." Though a better term would be
"NoREL", the name NoSQL caught on. Carlo Strozzi introduced the term NoSQL in 1998.
A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. In
contrast, NoSQL encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.
Over time, four major types of NoSQL databases emerged: document databases, key-value
databases, wide-column stores, and graph databases. Let’s examine each type.
Document databases
o Document databases store data in documents similar to JSON (JavaScript Object
Notation) objects. Each document contains pairs of fields and values.
o The values can typically be of a variety of types, including strings, numbers,
Booleans, arrays, or objects, and their structures typically align with the objects
developers are working with in code.
o Because of their variety of field value types and powerful query languages, document
databases suit a wide variety of use cases and can be used as a general-purpose
database.
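As a minimal sketch of the document model, the Python snippet below represents two documents as plain dictionaries and filters them by a nested field. The `users` collection, its field names, and the `find` helper are hypothetical; they only stand in for what a real document database's query language would provide.

```python
import json

_MISSING = object()  # sentinel for "this field is absent from the document"

# Two documents in a hypothetical "users" collection. They need not share
# the same fields: only the first one has an "address".
users = [
    {"_id": 1, "name": "Asha", "age": 29, "active": True,
     "tags": ["admin", "editor"],
     "address": {"city": "Kathmandu", "zip": "44600"}},
    {"_id": 2, "name": "Bikram", "age": 34, "active": False,
     "tags": ["viewer"]},
]

def find(collection, path, value):
    """Return documents whose field at a dot-separated path equals value."""
    matches = []
    for doc in collection:
        current = doc
        for key in path.split("."):
            if isinstance(current, dict) and key in current:
                current = current[key]
            else:
                current = _MISSING
                break
        if current is not _MISSING and current == value:
            matches.append(doc)
    return matches

result = find(users, "address.city", "Kathmandu")
print(json.dumps(result, indent=2))
```

The second document has no `address` field, so it is skipped rather than causing an error, which mirrors the schema flexibility described above.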
Key-value databases
o Key-value databases are a simpler type of database where each item contains a key and
a value. A value can typically only be retrieved by referencing its key, so learning how
to query for a specific key-value pair is typically simple.
o Key-value databases are great for use cases where you need to store large amounts of
data but you don’t need to perform complex queries to retrieve it. Common use cases
include storing user preferences or caching.
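The caching and user-preference use cases can be illustrated with a toy in-memory store. This is only a sketch: the `KeyValueCache` class, its `put`/`get` methods, and the key names are hypothetical and do not correspond to any particular product's API.

```python
import time

class KeyValueCache:
    """A toy in-memory key-value store with an optional expiry per key."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp or None)

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else time.monotonic() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on read
            return default
        return value

cache = KeyValueCache()
cache.put("user:42:theme", "dark")                        # a user preference
cache.put("session:abc", {"user_id": 42}, ttl_seconds=0)  # expires immediately
print(cache.get("user:42:theme"))
print(cache.get("session:abc", "expired"))
```

Every read and write goes through a single key, so there is no query language to learn; the trade-off is that values cannot be searched by their contents.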
Wide-column stores
o Wide-column stores store data in tables, rows, and dynamic columns. They provide a
lot of flexibility over relational databases because each row is not required to have the
same columns.
o Wide-column stores are commonly used for storing Internet of Things data and user
profile data. Cassandra and HBase are two of the most popular wide-column stores.
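The idea that rows in the same table may carry different columns can be sketched with nested dictionaries. The sensor data and the helper names `read_column` and `all_columns` are invented for illustration and are not the API of Cassandra or HBase.

```python
# Each row key maps to its own set of columns (a "dynamic" column set).
sensor_rows = {
    "sensor-001": {"temperature": 21.5, "humidity": 40},
    "sensor-002": {"temperature": 19.0, "battery": 87, "firmware": "1.2"},
}

def read_column(rows, row_key, column, default=None):
    """Look up one column of one row; absent columns return the default."""
    return rows.get(row_key, {}).get(column, default)

def all_columns(rows):
    """Collect every column name that appears in at least one row."""
    return sorted({col for cols in rows.values() for col in cols})

print(read_column(sensor_rows, "sensor-001", "humidity"))
print(read_column(sensor_rows, "sensor-001", "battery"))
print(all_columns(sensor_rows))
```

A relational table would need a `battery` column with a NULL for `sensor-001`; here the column simply does not exist for that row.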
Graph databases
o Graph databases store data in nodes and edges. Nodes typically store information about
people, places, and things, while edges store information about the relationships
between the nodes.
o Graph databases excel in use cases where you need to traverse relationships to look
for patterns, such as social networks, fraud detection, and recommendation engines.
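Traversing relationships to find patterns can be sketched with a breadth-first search over an adjacency map. The sample `friends` graph and the functions `within_hops` and `suggest_friends` are hypothetical; a real graph database would express this in its own query language.

```python
from collections import deque

# Nodes are people; each edge is an undirected "friend" relationship.
friends = {
    "Ana": {"Ben", "Chen"},
    "Ben": {"Ana", "Dev"},
    "Chen": {"Ana", "Dev", "Eli"},
    "Dev": {"Ben", "Chen"},
    "Eli": {"Chen"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop distance} up to max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def suggest_friends(graph, person):
    """Friends of friends who are not already direct friends."""
    reachable = within_hops(graph, person, 2)
    return sorted(name for name, hops in reachable.items() if hops == 2)

print(suggest_friends(friends, "Ana"))
```

For the sample graph this prints `['Dev', 'Eli']`: both are two hops from Ana, which is the basic pattern behind a simple recommendation engine.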
Data warehousing is the process of constructing and using a data warehouse. A data warehouse
is constructed by integrating data from multiple heterogeneous sources in order to support
analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing
involves data cleaning, data integration, and data consolidation.
There are decision support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively. They can
gather data, analyze it, and make decisions based on the information present in the warehouse.
A data warehouse has the following characteristics −
Subject Oriented
Integrated
Nonvolatile
Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
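The sample question above can be sketched as a query against a small sales-focused table. The snippet uses Python's built-in `sqlite3` module as a stand-in for a warehouse; the `sales` table, its columns, the sample rows, and the `best_customer` function are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales (
        customer TEXT, item TEXT, sale_year INTEGER, amount REAL
    )
""")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme Ltd", "widget", 2023, 1200.0),
        ("Acme Ltd", "widget", 2023, 300.0),
        ("Birch Co", "widget", 2023, 1400.0),
        ("Birch Co", "widget", 2022, 5000.0),
        ("Birch Co", "gadget", 2023, 900.0),
    ],
)

def best_customer(connection, item, year):
    """Return (customer, total) for the top buyer of an item in a given year."""
    return connection.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND sale_year = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, year),
    ).fetchone()

print(best_customer(conn, "widget", 2023))
```

Because the table is organized around one subject (sales), the question reduces to a single filter, group, and sort.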
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
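Resolving naming conflicts and unit mismatches might look like the following sketch. The two source systems, their field names, and the target format (`customer_id`, `weight_kg`) are hypothetical.

```python
# Source A reports weight in kilograms under "cust_id" / "weight_kg".
source_a = [{"cust_id": 7, "weight_kg": 2.0}]
# Source B reports weight in pounds under "customerNumber" / "weight_lb".
source_b = [{"customerNumber": 9, "weight_lb": 10.0}]

KG_PER_LB = 0.45359237  # definition of the international pound in kilograms

def from_source_a(record):
    """Rename source A's fields; the unit is already kilograms."""
    return {"customer_id": record["cust_id"], "weight_kg": record["weight_kg"]}

def from_source_b(record):
    """Rename source B's fields and convert pounds to kilograms."""
    return {
        "customer_id": record["customerNumber"],
        "weight_kg": round(record["weight_lb"] * KG_PER_LB, 3),
    }

# Every row loaded into the warehouse now uses one naming scheme and one unit.
warehouse_rows = [from_source_a(r) for r in source_a] + \
                 [from_source_b(r) for r in source_b]
print(warehouse_rows)
```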
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.
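The nonvolatile and time-variant properties can be sketched together as an append-only history: a change is stored as a new dated row instead of overwriting the old one, so past states stay available for analysis. The `price_history` list and the functions `record_price` and `price_as_of` are hypothetical names used only for this illustration.

```python
from datetime import date

# Append-only price history: a change is recorded as a new dated row,
# and earlier rows are never updated or deleted.
price_history = []

def record_price(history, product, price, effective):
    history.append({"product": product, "price": price, "effective": effective})

def price_as_of(history, product, when):
    """Latest recorded price for the product on or before the given date."""
    candidates = [r for r in history
                  if r["product"] == product and r["effective"] <= when]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["effective"])["price"]

record_price(price_history, "widget", 10.0, date(2023, 1, 1))
record_price(price_history, "widget", 12.5, date(2023, 7, 1))

print(price_as_of(price_history, "widget", date(2023, 3, 15)))
print(price_as_of(price_history, "widget", date(2023, 12, 31)))
```

An OLTP system would typically keep only the current price; keeping both rows is what allows an analyst to ask what the price was on any past date.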