Advanced Concepts-Unit-5
A parallel database system improves the performance of data processing by using multiple
resources in parallel, for example multiple CPUs and disks.
It also parallelizes many operations, such as data loading and query processing.
Improve performance:
The performance of the system can be improved by connecting multiple CPUs and disks in
parallel. Many small processors can also be connected in parallel.
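As a rough illustration of partitioned query processing, the sketch below splits a table round-robin into partitions, computes a partial SUM on each partition concurrently, and then combines the partial results. It is a minimal sketch under stated assumptions: Python threads stand in for the separate CPUs and disks of a real parallel database, and the function names (`partition`, `partial_sum`, `parallel_sum`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin partitioning: row i is placed in partition i % n."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def partial_sum(part, column):
    """Local aggregation, computed independently on a single partition."""
    return sum(row[column] for row in part)


def parallel_sum(rows, column, workers=4):
    """SUM(column) evaluated as: partition -> parallel partial sums -> combine."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, parts, [column] * workers))
    # Combine step: merge the partial aggregates into the final result.
    return sum(partials)


sales = [{"region": "N", "amount": a} for a in range(1, 101)]
print(parallel_sum(sales, "amount"))  # 5050
```

SUM works well here because partial sums can be computed without any communication between partitions and simply added at the end; an aggregate such as AVG would need each partition to return both a sum and a count.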
Improve reliability:
The reliability of the system is improved through the completeness, accuracy and availability of data.
Multimedia databases
Multimedia databases are used to store multimedia data such as images, animation,
audio and video along with text. This data is stored in the form of multiple file types
such as .txt (text), .jpg (images), .swf (videos), .mp3 (audio), etc.
Contents of the Multimedia Database
The multimedia database stores the multimedia data and information related to it. This is
described in detail as follows:
Media data
This is the multimedia data that is stored in the database such as images, videos, audios,
animation etc.
Media format data
The Media format data contains the formatting information related to the media data such
as sampling rate, frame rate, encoding scheme etc.
Media keyword data
This contains the keyword data related to the media in the database. For an image the
keyword data can be date and time of the image, description of the image etc.
Media feature data
The media feature data describes the features of the media data. For an image, feature data
can be colours of the image, textures in the image etc.
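The four kinds of content described above can be pictured as fields of a single record. The sketch below is a minimal, hypothetical layout: the `MediaRecord` class, its field names and the `search` helper are illustrative assumptions, not the schema of any particular multimedia DBMS. It shows how keyword data and feature data can be queried without decoding the media itself.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRecord:
    media: bytes  # media data: the raw image/audio/video bytes
    format_data: dict = field(default_factory=dict)   # e.g. encoding, frame rate
    keyword_data: dict = field(default_factory=dict)  # e.g. date, description
    feature_data: dict = field(default_factory=dict)  # e.g. dominant colour, texture


def search(records, keyword=None, **features):
    """Return records whose description contains `keyword` (if given) and
    whose feature data matches every feature=value pair passed in."""
    hits = []
    for r in records:
        desc = r.keyword_data.get("description", "")
        if keyword is not None and keyword.lower() not in desc.lower():
            continue
        if all(r.feature_data.get(k) == v for k, v in features.items()):
            hits.append(r)
    return hits


photos = [
    MediaRecord(b"placeholder-jpeg-bytes", {"encoding": "JPEG"},
                {"date": "2023-05-01", "description": "Beach at sunset"},
                {"dominant_colour": "orange", "texture": "smooth"}),
    MediaRecord(b"placeholder-jpeg-bytes", {"encoding": "JPEG"},
                {"date": "2023-06-12", "description": "Forest trail"},
                {"dominant_colour": "green", "texture": "rough"}),
]
print([p.keyword_data["description"] for p in search(photos, "beach")])
# ['Beach at sunset']
```

A real system would typically extract feature data (colour histograms, textures) automatically and index it for similarity search rather than exact matching; exact matching is used here only to keep the sketch short.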
Challenges of Multimedia Database
There are many challenges in implementing a multimedia database. Some of these are:
Data independence: Separating the database and its management from the
application programs.
Query support: Multimedia databases should be able to query data (media data,
textual data) represented in different formats uniformly, query different media
sources simultaneously, and conduct classical database operations across them.
Disadvantages:
Multimedia data, such as video, is usually large; therefore, multimedia data often
requires a large amount of storage.
Multidimensional Databases
Multidimensional databases are used mostly for OLAP (online analytical processing)
and data warehousing. They can be used to show multiple dimensions of data to
users.
A multidimensional database is created from multiple relational databases. While
relational databases allow users to access data in the form of queries,
multidimensional databases allow users to ask analytical questions related to
business or market trends.
Multidimensional databases use MOLAP (multidimensional online analytical
processing) to access their data. This allows users to get answers to their
requests quickly by generating and analysing the data.
The data in multidimensional databases is stored in a data cube format. This means
that data can be seen and understood from many dimensions and perspectives.
Example
The revenue costs for a company can be understood and analysed on the basis of
various factors, such as the company's products, the geographical locations of the
company offices, the time taken to develop a product, the promotions done, etc.
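The revenue example can be sketched as a tiny data cube in which each cell is addressed by one value per dimension. The figures, dimension names and helper functions (`roll_up`, `slice_cube`) below are made-up illustrations; a MOLAP engine stores and pre-aggregates cubes far more efficiently, but roll-up (summing away dimensions) and slicing (fixing one dimension) mean the same thing there.

```python
from collections import defaultdict

DIMENSIONS = ("product", "region", "quarter")

# Hypothetical revenue figures: one value per (product, region, quarter) cell.
cube = {
    ("Laptop", "Delhi", "Q1"): 120,
    ("Laptop", "Mumbai", "Q1"): 80,
    ("Laptop", "Delhi", "Q2"): 90,
    ("Phone", "Delhi", "Q1"): 200,
    ("Phone", "Delhi", "Q2"): 150,
}


def roll_up(cube, keep):
    """Sum revenue over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for cell, revenue in cube.items():
        totals[tuple(cell[i] for i in idx)] += revenue
    return dict(totals)


def slice_cube(cube, dim, value):
    """Fix one dimension to a single value (a 'slice' of the cube)."""
    i = DIMENSIONS.index(dim)
    return {cell: v for cell, v in cube.items() if cell[i] == value}


print(roll_up(cube, ["product"]))  # {('Laptop',): 290, ('Phone',): 350}
print(roll_up(cube, ["region"]))   # {('Delhi',): 560, ('Mumbai',): 80}
print(roll_up(slice_cube(cube, "quarter", "Q1"), ["product"]))
# {('Laptop',): 200, ('Phone',): 200}
```

The same five cells answer "revenue per product", "revenue per region" and "revenue per product in Q1", which is what viewing the data "from many dimensions and perspectives" amounts to.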
A Data Warehouse (DW) is a relational database that is designed for query and analysis
rather than transaction processing. It includes historical data derived from transaction data
from single or multiple sources.
A Data Warehouse is a group of data specific to the entire organization, not only to a
particular group of users.
It is not used for daily operations and transaction processing but used for making decisions.
A Data Warehouse can be viewed as a data system with the following attributes:
o It is a database designed for investigative tasks, using data from various applications.
o It supports a relatively small number of clients with relatively long interactions.
o It includes current and historical data to provide a historical perspective of
information.
o Its usage is read-intensive.
o It contains a few large tables.
Subject-Oriented
A data warehouse targets the modelling and analysis of data for decision-makers.
Therefore, data warehouses typically provide a concise and straightforward view around a
particular subject, such as customer, product, or sales, instead of the global organization's
ongoing operations. This is done by excluding data that are not useful for the
subject and including all data needed by the users to understand the subject.
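As a small illustration of excluding data that are not useful for a subject, the sketch below derives a sales-subject view from hypothetical operational order records, keeping only the fields a sales analyst needs. The record layout, field names and the `subject_view` helper are assumptions made for the example.

```python
# Hypothetical operational records: they mix sales facts with details
# that matter only for day-to-day order processing.
orders = [
    {"order_id": 1, "customer": "C01", "product": "Laptop", "amount": 900,
     "courier_tracking": "TRK-881", "payment_gateway_ref": "PG-17"},
    {"order_id": 2, "customer": "C02", "product": "Phone", "amount": 400,
     "courier_tracking": "TRK-882", "payment_gateway_ref": "PG-18"},
]

SALES_SUBJECT_FIELDS = ("customer", "product", "amount")


def subject_view(records, fields):
    """Project each record onto the fields relevant to one subject."""
    return [{f: r[f] for f in fields} for r in records]


sales_view = subject_view(orders, SALES_SUBJECT_FIELDS)
print(sales_view[0])  # {'customer': 'C01', 'product': 'Laptop', 'amount': 900}
```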
Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and
online transaction records. It requires performing data cleaning and integration during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.
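The cleaning and integration step can be sketched as mapping each source's naming conventions and value encodings onto one warehouse schema. The two source layouts, the mapping tables, the assumed DD/MM/YYYY date format of the billing source and the function names are all hypothetical; ETL tools express the same idea through configurable transformations.

```python
# Two hypothetical sources describing the same customer differently.
crm_rows = [{"cust_id": "17", "gender": "M", "joined": "2021-03-04"}]
billing_rows = [{"CustomerID": 17, "Sex": "male", "JoinDate": "04/03/2021"}]

# Per-source mapping: source column name -> warehouse column name.
COLUMN_MAP = {
    "crm": {"cust_id": "customer_id", "gender": "gender", "joined": "join_date"},
    "billing": {"CustomerID": "customer_id", "Sex": "gender", "JoinDate": "join_date"},
}
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def to_iso(date_text):
    """Accept 'YYYY-MM-DD' or (assumed) 'DD/MM/YYYY'; return 'YYYY-MM-DD'."""
    if "/" in date_text:
        day, month, year = date_text.split("/")
        return f"{year}-{month}-{day}"
    return date_text


def integrate(source, rows):
    out = []
    for row in rows:
        rec = {COLUMN_MAP[source][k]: v for k, v in row.items()}  # unify names
        rec["customer_id"] = int(rec["customer_id"])             # unify types
        rec["gender"] = GENDER_CODES[str(rec["gender"]).lower()]  # unify codes
        rec["join_date"] = to_iso(rec["join_date"])               # unify formats
        out.append(rec)
    return out


warehouse = integrate("crm", crm_rows) + integrate("billing", billing_rows)
print(warehouse[0] == warehouse[1])  # True
```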
Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from
3 months, 6 months, 12 months, or even earlier from a data warehouse. This contrasts
with a transaction system, where often only the most current data is kept.
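Keeping history can be sketched as storing dated snapshots of a value and answering "as of" queries against them, instead of overwriting the current value as an operational system would. The snapshot dates, the values and the `as_of` helper are illustrative assumptions.

```python
import bisect
from datetime import date

# Hypothetical snapshots of one customer's credit limit, oldest first.
snapshots = [
    (date(2024, 1, 31), 5000),
    (date(2024, 4, 30), 7000),
    (date(2024, 7, 31), 6500),
]


def as_of(snapshots, when):
    """Return the value from the latest snapshot taken on or before `when`."""
    dates = [d for d, _ in snapshots]
    i = bisect.bisect_right(dates, when)
    if i == 0:
        return None  # no snapshot existed yet at that date
    return snapshots[i - 1][1]


print(as_of(snapshots, date(2024, 5, 15)))  # 7000
print(as_of(snapshots, date(2024, 8, 1)))   # 6500
```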
Non-Volatile
The data warehouse is a physically separate data store, whose data is transformed from the
source operational RDBMS. The operational updates of data do not occur in the data
warehouse, i.e., update, insert, and delete operations are not performed. It usually requires
only two procedures for data access: initial loading of data and access to data. Therefore,
the DW does not require transaction processing, recovery, and concurrency capabilities,
which allows for substantial speedup of data retrieval. Non-volatile means that once data
has entered the warehouse, it should not change.
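The "load, then only read" access pattern can be sketched as a table object that offers a bulk `load` and a read-only `scan`, but no update or delete, with rows stored as immutable tuples. The `WarehouseTable` class and its methods are hypothetical; a real warehouse enforces this through its loading process and access permissions rather than through the programming language.

```python
class WarehouseTable:
    """Append-only table: rows are bulk-loaded and afterwards only read."""

    def __init__(self, columns):
        self.columns = tuple(columns)
        self._rows = []

    def load(self, batch):
        """Bulk-load a batch; each row is frozen into a tuple."""
        for row in batch:
            self._rows.append(tuple(row[c] for c in self.columns))

    def scan(self, predicate=lambda row: True):
        """Read access: yield matching rows as column -> value dicts."""
        for values in self._rows:
            row = dict(zip(self.columns, values))
            if predicate(row):
                yield row

    # Deliberately no update() or delete(): once loaded, rows are only read.


sales = WarehouseTable(["day", "product", "amount"])
sales.load([{"day": "2024-01-01", "product": "Laptop", "amount": 900},
            {"day": "2024-01-01", "product": "Phone", "amount": 400}])
print(sum(r["amount"] for r in sales.scan(lambda r: r["product"] == "Phone")))  # 400
```

Because nothing is ever modified in place, reads need no locking against concurrent writers, which is the source of the retrieval speedup mentioned above.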
History of Data Warehouse
The idea of data warehousing emerged in the late 1980s, when IBM researchers Barry Devlin
and Paul Murphy developed the "Business Data Warehouse."
In essence, the data warehousing idea was intended to support an architectural model for
the flow of information from operational systems to decision support environments.
The concept attempted to address the various problems associated with this flow, mainly the
high costs associated with it.
In the absence of a data warehousing architecture, a vast amount of space was required to
support multiple decision support environments. In large corporations, it was common for
various decision support environments to operate independently.
Need for Data Warehouse
1. Business User: Business users require a data warehouse to view summarized data
from the past. Since these people are non-technical, the data may be presented to
them in an elementary form.
2. Store historical data: A data warehouse is required to store time-variable data
from the past. This data is used for various purposes.
3. Make strategic decisions: Some strategies may depend upon the data in the
data warehouse, so a data warehouse contributes to making strategic decisions.
4. For data consistency and quality: By bringing the data from different sources to a
common place, users can effectively bring uniformity and consistency to the data.
5. Fast response time: A data warehouse has to be ready for somewhat unexpected
loads and types of queries, which demands a significant degree of flexibility and
quick response time.
Introduction to NoSQL
A NoSQL database (the name originally referred to "non-SQL" or "non-relational") provides a
mechanism for storage and retrieval of data that is modeled in means other than the
tabular relations used in relational databases. Such databases came into existence in the
late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in the early
twenty-first century. NoSQL databases are used in real-time web applications and big data,
and their use is increasing over time. NoSQL systems are also sometimes called "Not only
SQL" to emphasize the fact that they may support SQL-like query languages.
The features of NoSQL databases include simplicity of design, simpler horizontal scaling to
clusters of machines and finer control over availability. The data structures used by NoSQL
databases are different from those used by default in relational databases, which makes some
operations faster in NoSQL. The suitability of a given NoSQL database depends on the
problem it must solve. Data structures used by NoSQL databases are sometimes also
viewed as more flexible than relational database tables.
Many NoSQL stores compromise consistency in favor of availability, speed and partition
tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level
query languages, lack of standardized interfaces, and huge previous investments in
existing relational databases. Most NoSQL stores lack true ACID (Atomicity, Consistency,
Isolation, Durability) transactions but a few databases, such as MarkLogic, Aerospike,
FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas
LMDB, and OrientDB have made them central to their designs.
Most NoSQL databases offer a concept of eventual consistency, in which database changes
are propagated to all nodes eventually, so queries for data might not return updated data
immediately or might read data that is not accurate, a problem known as stale reads.
Also, some NoSQL systems may exhibit lost writes and other forms of data
loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data
loss. For distributed transaction processing across multiple databases, data consistency is
an even bigger challenge. This is difficult for both NoSQL and relational databases. Even
current relational databases do not allow referential integrity constraints to span
databases. There are few systems that maintain both X/Open XA standards and ACID
transactions for distributed transaction processing.
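Stale reads under eventual consistency can be demonstrated with a toy model in which a write lands on one node immediately and reaches the other replicas only when a propagation step runs. The class and method names are hypothetical, and `propagate()` merely stands in for the asynchronous replication a real NoSQL system performs in the background; no particular product works exactly this way.

```python
class EventuallyConsistentStore:
    """Toy model: writes hit node 0 at once and reach the other
    replicas only when propagate() runs."""

    def __init__(self, num_replicas=3):
        self.replicas = [{} for _ in range(num_replicas)]
        self.pending = []  # writes not yet applied to every replica

    def write(self, key, value):
        self.replicas[0][key] = value  # applied on the first node immediately
        self.pending.append((key, value))

    def read(self, key, replica):
        return self.replicas[replica].get(key)

    def propagate(self):
        for key, value in self.pending:
            for node in self.replicas[1:]:
                node[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 100)
store.propagate()
store.write("balance", 150)
print(store.read("balance", replica=2))  # 100 (stale read)
store.propagate()
print(store.read("balance", replica=2))  # 150 (replicas have converged)
```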
Advantages of NoSQL:
There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
1. High scalability –
NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data
and placing the partitions on multiple machines; range-based sharding additionally keeps
the data in key order. Vertical scaling means adding more resources to the existing machine,
whereas horizontal scaling means adding more machines to handle the data. Vertical scaling is
not that easy to implement, but horizontal scaling is easy to implement. Examples of
horizontally scaling databases are MongoDB, Cassandra, etc. NoSQL can handle huge
amounts of data because of this scalability: as the data grows, NoSQL scales out to handle
that data in an efficient manner.
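A minimal sketch of hash-based sharding: each key is hashed to pick one of several shards, so the same key always lands on the same machine. The `ShardedStore` class and `shard_for` function are hypothetical, and plain dicts stand in for separate machines; `zlib.crc32` is chosen because it gives the same value in every process, unlike Python's built-in `hash()` for strings.

```python
import zlib


def shard_for(key, num_shards):
    """Hash-based placement: a given key always maps to the same shard."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


class ShardedStore:
    def __init__(self, num_shards):
        # Each dict stands in for a separate machine.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})
print(store.get("user:42"))             # {'id': 42}
print([len(s) for s in store.shards])   # how the 1000 keys spread over 4 shards
```

Two design caveats: hash placement does not preserve key order (range sharding does), and with simple modulo placement, changing the number of shards moves most keys to a different shard, which is why many systems use range partitioning or consistent hashing instead.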
2. High availability –
The auto-replication feature in NoSQL databases makes them highly available because, in case
of a failure, the data can be restored to its previous consistent state from the replicas.
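A toy model of replication for availability: every write is copied to all live replicas, so a read can still be served after a node fails. The class and method names are hypothetical, and the sketch leaves out what real systems must add, such as failure detection and re-synchronising a node when it recovers.

```python
class ReplicatedStore:
    def __init__(self, num_replicas=3):
        self.nodes = [{} for _ in range(num_replicas)]
        self.alive = [True] * num_replicas

    def put(self, key, value):
        # Synchronously copy the write to every live replica.
        for node, up in zip(self.nodes, self.alive):
            if up:
                node[key] = value

    def get(self, key):
        # Serve the read from the first replica that is still up.
        for node, up in zip(self.nodes, self.alive):
            if up:
                return node.get(key)
        raise RuntimeError("all replicas are down")

    def fail(self, i):
        self.alive[i] = False


store = ReplicatedStore()
store.put("order:1", "shipped")
store.fail(0)                # the first node crashes
print(store.get("order:1"))  # shipped (served by another replica)
```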
Disadvantages of NoSQL:
NoSQL has the following disadvantages.
1. Narrow focus –
NoSQL databases have a narrow focus: they are mainly designed for storage and provide
little other functionality. Relational databases are a better choice than NoSQL in the field of
transaction management.
2. Open-source –
Many NoSQL databases are open source, and there is no reliable standard for NoSQL yet. In
other words, two NoSQL database systems are likely to differ.
3. Management challenge –
The purpose of big data tools is to make management of a large amount of data as
simple as possible, but this is not so easy. Data management in NoSQL is much more
complex than in a relational database. NoSQL, in particular, has a reputation for being
challenging to install and even more difficult to manage on a daily basis.
4. GUI is not available –
GUI tools to access the database are not widely available in the market.
5. Backup –
Backup is a major weak point for some NoSQL databases, such as MongoDB, which has
historically lacked an approach for backing up data in a consistent manner.
6. Large document size –
Some database systems, such as MongoDB and CouchDB, store data in JSON (or JSON-like)
format, which means that documents are quite large (affecting big data storage, network
bandwidth and speed), and having descriptive key names actually hurts, since the key names
are stored in every document and increase the document size.
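The cost of descriptive key names can be measured directly by serialising the same document twice. The document and key names below are hypothetical, and plain JSON text is used for simplicity; MongoDB actually stores a binary encoding (BSON), but it likewise stores every field name inside every document, so the same per-document overhead applies.

```python
import json

# The same hypothetical customer document with descriptive vs. abbreviated keys.
descriptive = {"customer_first_name": "Asha", "customer_last_name": "Rao",
               "customer_age": 31}
abbreviated = {"fn": "Asha", "ln": "Rao", "age": 31}

long_size = len(json.dumps(descriptive).encode("utf-8"))
short_size = len(json.dumps(abbreviated).encode("utf-8"))

# Key names are repeated in every document, so the overhead scales with the collection.
docs = 1_000_000
extra = long_size - short_size
print(extra, "bytes extra per document")
print(extra * docs, "bytes extra for", docs, "documents")
```

The trade-off is readability: short keys save space but make the stored data harder to understand without a separate mapping.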
Types of NoSQL database:
Types of NoSQL databases, and the names of database systems that fall into each
category, are:
1. Key-value store: Memcached, Redis, Coherence
2. Tabular: HBase, Bigtable, Accumulo
3. Document-based: MongoDB, CouchDB, Cloudant
MongoDB, for example, falls in the category of document-based NoSQL databases.
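To make the categories concrete, the sketch below stores the same hypothetical user in the shape each model favours, using plain Python structures in place of real databases. No Redis, HBase or MongoDB API is used or implied; the layouts are simplified assumptions meant only to show what each model lets the database "see".

```python
import json

# Key-value store: the value is opaque to the database; lookup is by key only.
kv = {"user:42": json.dumps({"name": "Asha", "city": "Pune"})}

# Document store: the database understands the document's fields,
# so it can filter on them directly.
documents = [
    {"_id": 42, "name": "Asha", "city": "Pune", "orders": [{"item": "Laptop"}]},
    {"_id": 43, "name": "Ravi", "city": "Delhi", "orders": []},
]

# Tabular (wide-column) store: row key -> column family -> column -> value.
table = {
    "42": {"profile": {"name": "Asha", "city": "Pune"},
           "activity": {"last_order": "Laptop"}},
}

# Typical access pattern in each model:
print(json.loads(kv["user:42"])["city"])                     # decode the value in the app
print([d["name"] for d in documents if d["city"] == "Pune"])  # query by a field
print(table["42"]["profile"]["city"])                        # row key, family, column
```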