Unit 9 Emerging Database Technology and Application

This document discusses emerging database technologies including big data, NoSQL databases, and geographic information systems (GIS) databases. It provides definitions and examples of big data, what types of data fall under big data, and the benefits of analyzing big data. It also defines NoSQL databases, describes the four main types (document, key-value, wide-column, and graph databases), and provides examples. Finally, it defines GIS databases and their components, including hardware, software, data, people, and methods.

Unit 9: Emerging Database Technology and Application

Concept of big data


Big data is a collection of large datasets that cannot be processed using traditional computing
techniques. It is not a single technique or a tool; rather it has become a complete subject, which
involves various tools, techniques and frameworks.

What Comes Under Big Data?

Big data involves the data produced by different devices and applications. Given below are
some of the fields that come under the umbrella of Big Data.

 Black Box Data − A black box is a component of helicopters, airplanes, and jets. It
captures the voices of the flight crew, recordings from microphones and earphones, and
the performance information of the aircraft.

 Social Media Data − Social media such as Facebook and Twitter hold information and
the views posted by millions of people across the globe.

 Stock Exchange Data − Stock exchange data holds information about the ‘buy’ and
‘sell’ decisions that customers make on shares of different companies.

 Power Grid Data − Power grid data holds information about the power consumed by a
particular node with respect to a base station.

 Transport Data − Transport data includes model, capacity, distance and availability of a
vehicle.

 Search Engine Data − Search engines retrieve lots of data from different databases.

Thus, Big Data involves huge volume, high velocity, and an extensible variety of data. The
data falls into three types.

 Structured data − Relational data.

 Semi-structured data − XML data.

 Unstructured data − Word, PDF, Text, Media Logs.
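
As a rough illustration of the three types, the sketch below holds one customer record as a relational row, as XML, and as free text, and reads the city back from each. The record, the field names, and the text pattern are all invented for this example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Structured: a fixed schema in a relational table (in-memory SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Asha', 'Pokhara')")
structured_city = conn.execute(
    "SELECT city FROM customer WHERE id = 1"
).fetchone()[0]

# Semi-structured: XML tags describe the data, but no schema is enforced.
xml_text = "<customer id='1'><name>Asha</name><city>Pokhara</city></customer>"
semi_structured_city = ET.fromstring(xml_text).findtext("city")

# Unstructured: free text; any structure has to be inferred, here with a
# deliberately naive pattern that only works for this sentence shape.
note = "Customer Asha called from Pokhara about a late delivery."
match = re.search(r"called from (\w+)", note)
unstructured_city = match.group(1) if match else None

print(structured_city, semi_structured_city, unstructured_city)
```

The same fact becomes harder to extract as the structure decreases: a query for the table, a tag lookup for the XML, and guesswork for the free text.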


Benefits of Big Data

 Using the information kept in social networks like Facebook, marketing agencies
are learning about the response to their campaigns, promotions, and other advertising
mediums.

 Using information from social media, such as the preferences and product perceptions of
their consumers, product companies and retail organizations are planning their
production.

 Using data on the previous medical history of patients, hospitals are providing
better and quicker service.

Concept of NoSQL

A NoSQL database is a non-relational data management system that does not require a fixed
schema. It avoids joins and is easy to scale. NoSQL databases are mainly used for
distributed data stores with very large storage needs, such as big data and real-time
web apps. For example, companies like Twitter, Facebook, and Google collect terabytes of
user data every single day.

NoSQL stands for "Not Only SQL" or "Not SQL." Though a better term would be
"NoREL", NoSQL caught on. Carlo Strozzi first used the term NoSQL in 1998.

A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. In
contrast, NoSQL encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured, and polymorphic data.

What are the Types of NoSQL Databases?

Over time, four major types of NoSQL databases emerged: document databases, key-value
databases, wide-column stores, and graph databases. Let’s examine each type.
 Document databases

o store data in documents similar to JSON (JavaScript Object Notation) objects. Each
document contains pairs of fields and values.

o The values can typically be a variety of types including things like strings, numbers,
Booleans, arrays, or objects, and their structures typically align with objects
developers are working with in code.

o Because of their variety of field value types and powerful query languages, document
databases are great for a wide variety of use cases and can be used as a general
purpose database.

o They can scale out horizontally to accommodate large data volumes.

o MongoDB is consistently ranked as the world’s most popular NoSQL database
according to DB-Engines and is an example of a document database.

 Key-value databases

o are a simpler type of database where each item contains keys and values. A value can
typically only be retrieved by referencing its key, so learning how to query for a
specific key-value pair is typically simple.

o Key-value databases are great for use cases where you need to store large amounts of
data but you don’t need to perform complex queries to retrieve it. Common use cases
include storing user preferences or caching.

o Redis and DynamoDB are popular key-value databases.

 Wide-column stores

o store data in tables, rows, and dynamic columns. Wide-column stores provide a
lot of flexibility over relational databases because each row is not required to have the
same columns.

o Many consider wide-column stores to be two-dimensional key-value databases.


o Wide-column stores are great for when you need to store large amounts of data and
you can predict what your query patterns will be.

o Wide-column stores are commonly used for storing Internet of Things data and user
profile data. Cassandra and HBase are two of the most popular wide-column stores.

 Graph databases

o store data in nodes and edges.

o Nodes typically store information about people, places, and things while edges store
information about the relationships between the nodes.

o Graph databases excel in use cases where you need to traverse relationships to look
for patterns such as social networks, fraud detection, and recommendation engines.

o Neo4j and JanusGraph are examples of graph databases.
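
The four data models can be loosely mimicked with plain Python structures, as sketched below. This is only an illustration of how the data is shaped, not how any of the products named above work internally; all keys, names, and values are made up.

```python
import json
from collections import deque

# Document model: one self-describing record with nested fields.
document = {"_id": 1, "name": "Asha", "tags": ["admin", "beta"],
            "address": {"city": "Pokhara"}}
document_json = json.dumps(document)  # one way it might be serialized

# Key-value model: an opaque value that is looked up only by its key.
kv_store = {"session:42": "user=1;theme=dark"}
session = kv_store["session:42"]

# Wide-column model: each row key maps to its own set of columns,
# so rows are not required to share the same columns.
wide_column = {
    "sensor-1": {"temp": 21.5, "humidity": 40},
    "sensor-2": {"temp": 19.0},  # this row has no humidity column
}

# Graph model: nodes plus edges that describe relationships.
edges = {"asha": ["bina"], "bina": ["chandra"], "chandra": []}

def reachable(start):
    """Breadth-first traversal: every node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in edges[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}

print(reachable("asha"))
```

The traversal in `reachable` is the kind of relationship-following query that graph databases are built for, while the key-value lookup shows why such stores are simple to query but limited to access by key.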

Concept of mobile and multimedia data


Mobile multimedia refers to various types of content that are either accessed via portable devices
or created using them. One area in which mobile multimedia has become ubiquitous is
smartphones that incorporate video and music playback capability, cameras, and wireless content
streaming. Other dedicated devices, such as digital music players and personal digital assistants
(PDAs), also provide access to multimedia. One major use of mobile multimedia is
to download or stream content such as television shows, sports, or news reports while away
from a home or office environment, but various devices also allow the creation of such content.
Text, picture, and video messaging services are examples of multimedia that people create and
share using mobile devices.
Multimedia is defined as content that consists of several different types of media. A simple
example of multimedia is television, since it can include visual and audio elements. Other types
of media include text, interactivity, and static images. These various types of media can be
combined in many different ways and played back in a variety of environments, including
mobile devices. When multimedia includes an element of interactivity, it is often referred to as
rich media.
One of the most common ways that mobile multimedia is experienced and created is with
smartphones. These mobile devices can combine a phone handset with a camera and what is
essentially a small computer. Many of these phones are capable of running forms of rich media
such as video games, accessing social media platforms, and displaying the same multimedia
websites that can be accessed from any home computer. Some devices can even be used to
stream television shows, sporting events, and other multimedia experiences.
It is often possible for users of smartphones and other similar devices to also create and share
their own mobile multimedia. At one point text messaging was a single media experience,
though advances in technology allowed this form of communication to expand in scope. Many
mobile phones can take pictures and videos that can then be sent and shared along with text. In
some circumstances, it is even possible to create videos using a smartphone or similar device
and transfer them directly to video sharing sites on the Internet.
Other types of mobile multimedia devices exist as well. Many digital music players are also
capable of playing back videos and other media. Small tablet computers and netbooks can also
be considered to be mobile multimedia devices due to the wide range of content that can be
experienced and created using these highly portable platforms. In many cases, these devices have
built-in cellular radios to download and upload multimedia content.

Concept of GIS database


A geographic information system (GIS) is a computer-based tool for mapping and analyzing
things that exist and events that happen on Earth. GIS technology integrates common database
operations such as query and statistical analysis with the unique visualization and geographic
analysis benefits offered by maps. These abilities distinguish GIS from other information
systems and make it valuable to a wide range of public and private enterprises for explaining
events, predicting outcomes, and planning strategies. Map making and geographic analysis are
not new, but a GIS performs these tasks better and faster than do the old manual methods. And,
before GIS technology, only a few people had the skills necessary to use geographic information
to help with decision making and problem solving.
Components of a GIS
A working GIS integrates five key components:
 Hardware
 Software
 Data
 People
 Methods
Hardware
Hardware is the computer on which a GIS operates. Today, GIS software runs on a wide range of
hardware types, from centralized computer servers to desktop computers used in stand-alone or
networked configurations.
Software
GIS software provides the functions and tools needed to store, analyze, and display geographic
information. Key software components are:
 Tools for the input and manipulation of geographic information
 A database management system (DBMS)
 Tools that support geographic query, analysis, and visualization
Data
Possibly the most important component of a GIS is the data. Geographic data and related tabular
data can be collected in-house or purchased from a commercial data provider. A GIS will
integrate spatial data with other data resources and can even use a DBMS, used by most
organizations to organize and maintain their data, to manage spatial data.
People
GIS technology is of limited value without the people who manage the system and develop plans
for applying it to real world problems. GIS users range from technical specialists who design and
maintain the system to those who use it to help them perform their everyday work.
Methods
A successful GIS operates according to a well-designed plan and business rules, which are the
models and operating practices unique to each organization.

How GIS Works


GIS stores information about the world as a collection of thematic layers that can be linked
together by geography. This simple but extremely powerful and versatile concept has proven
invaluable for solving many real-world problems from tracking delivery vehicles, to recording
details of planning applications, to modeling global atmospheric circulation.
Geographic References:
Geographic information contains either an explicit geographic reference such as a latitude and
longitude or national grid coordinate, or an implicit reference such as an address, postal code,
census tract name, forest stand identifier, or road name. An automated process called geocoding
is used to create explicit geographic references (multiple locations) from implicit references
(descriptions such as addresses). These geographic references allow you to locate features such
as a business or forest stand and events such as an earthquake on the Earth's surface for analysis.
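
A minimal sketch of geocoding is a lookup from a normalized address (the implicit reference) to a latitude and longitude (the explicit reference). The table, the addresses, and the coordinates below are invented for illustration; real geocoders rely on far larger reference datasets and fuzzy matching.

```python
# Hypothetical gazetteer mapping normalized addresses to (latitude, longitude).
GAZETTEER = {
    "10 main street, springfield": (39.80, -89.64),
    "22 lake road, springfield": (39.78, -89.65),
}

def geocode(address):
    """Return (lat, lon) for an address, or None if it is not in the table."""
    key = " ".join(address.lower().split())  # normalize case and spacing
    return GAZETTEER.get(key)

print(geocode("10  Main Street, Springfield"))
```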
Vector and Raster Models:
 Geographic information systems work with two fundamentally different types of
geographic models--the "vector model" and the "raster model."
Vector model
 In the vector model, information about points, lines, and polygons is encoded and stored
as a collection of x,y coordinates.
 The location of a point feature, such as a bore hole, can be described by a single x,y
coordinate.
 Linear features, such as roads and rivers, can be stored as a collection of point
coordinates.
 Polygonal features, such as sales territories and river catchments, can be stored as a
closed loop of coordinates.
 The vector model is extremely useful for describing discrete features, but less useful for
describing continuously varying features such as soil type or accessibility costs for
hospitals.
Raster model
 The raster model has evolved to model such continuous features.
 A raster image comprises a collection of grid cells rather like a scanned map or picture.
 Both the vector and raster models for storing geographic data have unique advantages
and disadvantages. Modern GISs are able to handle both models.
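
The two models can be sketched in a few lines of Python. The coordinates, the soil grid, and the helper names are assumptions made for this example; the area calculation uses the standard shoelace formula, which is one common way to measure a closed loop of coordinates.

```python
# Vector model: features stored as x,y coordinates.
bore_hole = (3.0, 4.0)                                   # point
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]              # line
territory = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0),
             (0.0, 3.0), (0.0, 0.0)]                     # closed loop

def line_length(points):
    """Sum of straight-line segment lengths along a line feature."""
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )

def polygon_area(ring):
    """Area of a closed coordinate loop using the shoelace formula."""
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2

# Raster model: a grid of cells, each holding one value.
soil_type = [
    ["clay", "clay", "sand"],
    ["clay", "loam", "sand"],
]

def cell_at(grid, x, y, cell_size=1.0):
    """Value of the cell containing point (x, y); row 0 starts at y = 0."""
    return grid[int(y // cell_size)][int(x // cell_size)]

print(line_length(road), polygon_area(territory), cell_at(soil_type, 1.2, 1.7))
```

The vector features carry exact boundaries, while the raster grid gives a value for every location, which is why it suits continuously varying properties such as soil type.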

Concept of Data Warehousing and data mining

Data warehousing is the process of constructing and using a data warehouse. A data warehouse
is constructed by integrating data from multiple heterogeneous sources, and it supports analytical
reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves
data cleaning, data integration, and data consolidation.

Using Data Warehouse Information

There are decision support technologies that help utilize the data available in a data warehouse.
These technologies help executives to use the warehouse quickly and effectively. They can
gather data, analyze it, and take decisions based on the information present in the warehouse.
The information gathered in a warehouse can be used in any of the following domains −

 Tuning Production Strategies − Product strategies can be fine-tuned by
repositioning products and managing product portfolios, based on comparing sales
quarterly or yearly.

 Customer Analysis − Customer analysis is done by analyzing the customer's buying
preferences, buying time, budget cycles, etc.

 Operations Analysis − Data warehousing also helps in customer relationship
management and in making environmental corrections. The information also allows us to
analyze business operations.

A common way of introducing data warehousing is to refer to the characteristics of a data
warehouse as set forth by William Inmon:

 Subject Oriented

 Integrated

 Nonvolatile

 Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
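
As a hedged sketch, the question above can be expressed as a query over a small sales table. The table layout, the column names, and the sample rows are assumptions for illustration, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, item TEXT, sale_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Asha", "widget", 2023, 120.0),
        ("Bina", "widget", 2023, 300.0),
        ("Asha", "widget", 2023, 250.0),
        ("Bina", "gadget", 2023, 900.0),  # different item, ignored below
        ("Bina", "widget", 2022, 999.0),  # different year, ignored below
    ],
)

def best_customer(item, year):
    """Customer with the highest total spend on an item in a given year."""
    return conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM sales
        WHERE item = ? AND sale_year = ?
        GROUP BY customer
        ORDER BY total DESC
        LIMIT 1
        """,
        (item, year),
    ).fetchone()

print(best_customer("widget", 2023))
```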

Integrated

Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
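
The sketch below shows one way such integration might look: two source systems name the same fields differently and record weight in different units, and each is mapped to a single warehouse format. The source systems, field names, and sample values are hypothetical.

```python
# Two hypothetical source systems describing the same kind of record
# with different field names and different units of measure.
store_a = [{"cust_name": "Asha", "weight_kg": 2.0}]
store_b = [{"customerName": "Bina", "weight_lb": 4.0}]

LB_TO_KG = 0.45359237  # kilograms per international pound

def from_store_a(row):
    """Store A already uses kilograms; only the field name changes."""
    return {"customer": row["cust_name"], "weight_kg": row["weight_kg"]}

def from_store_b(row):
    """Store B uses pounds; rename the field and convert the unit."""
    return {
        "customer": row["customerName"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
    }

# Every warehouse row now has the same field names and the same unit.
warehouse = [from_store_a(r) for r in store_a] + [from_store_b(r) for r in store_b]
print(warehouse)
```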

Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.

Benefits of data warehousing


A well-designed data warehouse is the foundation for any successful BI or analytics program. Its
main job is to power the reports, dashboards, and analytical tools that have become indispensable
to businesses today. A data warehouse provides the information for your data-driven decisions –
and helps you make the right call on everything from new product development to inventory
levels. There are many benefits of a data warehouse. Here are just a few:
 Better business analytics: With data warehousing, decision-makers have access to data from
multiple sources and no longer have to make decisions based on incomplete information.
 Faster queries: Data warehouses are built specifically for fast data retrieval and analysis. With a
DW, you can very rapidly query large amounts of consolidated data with little to no support from
IT.
 Improved data quality: Before being loaded into the DW, data cleansing cases are created by
the system and entered in a work list for further processing, ensuring data is transformed into a
consistent format to support analytics – and decisions – based on high quality, accurate data.
 Historical insight: By storing rich historical data, a data warehouse lets decision-makers learn
from past trends and challenges, make predictions, and drive continuous business improvement.

Types of Data Warehouse


Three main types of Data Warehouses (DWH) are:
1. Enterprise Data Warehouse (EDW):
Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support
service across the enterprise. It offers a unified approach for organizing and representing data. It
also provides the ability to classify data according to subject and give access according to
those divisions.
2. Operational Data Store:
An Operational Data Store (ODS) is a data store used when neither the data warehouse nor
the OLTP systems support an organization's reporting needs. An ODS is refreshed in real
time. Hence, it is widely preferred for routine activities like storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of
business, such as sales or finance. In an independent data mart, data can be collected
directly from sources.

Concept of data mining


Data mining is the process of finding potentially useful patterns in huge data sets. It is a
multi-disciplinary skill that uses machine learning, statistics, and AI to extract information and
evaluate the probability of future events. The insights derived from data mining are used for
marketing, fraud detection, scientific discovery, etc.
Data mining is all about discovering hidden, unsuspected, and previously unknown yet valid
relationships amongst the data. Data mining is also called Knowledge Discovery in Data (KDD),
knowledge extraction, data/pattern analysis, information harvesting, etc.
Types of data
Data mining can be performed on the following types of data:
 Relational databases
 Data warehouses
 Advanced DB and information repositories
 Object-oriented and object-relational databases
 Transactional and Spatial databases
 Heterogeneous and legacy databases
 Multimedia and streaming database
 Text databases
 Text mining and Web mining

Data Mining Techniques


1. Classification:
Classification assigns data items to predefined classes. It is used to retrieve important and
relevant information about data and metadata.
2. Clustering:
Clustering analysis is a data mining technique for identifying data items that are similar to each
other. This process helps to understand the differences and similarities between the data.
3. Regression:
Regression analysis is the data mining method of identifying and analyzing the relationship
between variables. It is used to identify the likelihood of a specific variable, given the presence
of other variables.
4. Association Rules:
This data mining technique helps to find the association between two or more Items. It discovers
a hidden pattern in the data set.
5. Outlier detection:
This data mining technique refers to the observation of data items in the dataset that do not
match an expected pattern or expected behavior. This technique can be used in a variety of
domains, such as intrusion detection, fraud detection, and fault detection. Outlier detection is
also called outlier analysis or outlier mining.
6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction
data over a certain period.
7. Prediction:
Prediction uses a combination of the other data mining techniques, such as trends, sequential
patterns, clustering, and classification. It analyzes past events or instances in the right sequence
to predict a future event.
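
To make one of these techniques concrete, the sketch below computes the two usual measures behind association rules, support and confidence, over a handful of made-up shopping baskets. The transactions and function names are assumptions for illustration; practical tools use algorithms such as Apriori to search many candidate rules efficiently.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))
print(confidence({"bread"}, {"milk"}))
```

Here the rule "bread implies milk" holds in two of the three baskets that contain bread, so a rule miner would report it only if that confidence clears whatever threshold the analyst sets.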

Challenges of Implementing Data Mining:


 Skilled Experts are needed to formulate the data mining queries.
 Overfitting: Because of a small training database, a model may not fit future states.
 Data mining needs large databases, which are sometimes difficult to manage.
 Business practices may need to be modified in order to use the information
uncovered.
 If the data set is not diverse, data mining results may not be accurate.
 Integrating the information needed from heterogeneous databases and global information
systems can be complex.

Benefits of Data Mining:


 Data mining techniques help companies to get knowledge-based information.
 Data mining helps organizations to make profitable adjustments in operation and
production.
 Data mining is a cost-effective and efficient solution compared to other statistical
data applications.
 Data mining helps with the decision-making process.
 It facilitates automated prediction of trends and behaviors as well as automated discovery
of hidden patterns.
 It can be implemented in new systems as well as existing platforms.
 It is a speedy process that makes it easy for users to analyze huge amounts of data
in less time.
