Unit 1
1.0 INTRODUCTION
Database technology has advanced from the relational model to distributed DBMSs
and object-oriented databases. The technology has also advanced to support data
formats using XML. In addition, data warehousing and data mining technologies have
become very popular in the industry from the viewpoint of decision making and
planning.
1.1 OBJECTIVES
After going through this unit, you should be able to:
Emerging Trends and
Example DBMS
Architectures
1.2 MULTIMEDIA DATABASE
Multimedia and its applications have experienced tremendous growth. Multimedia
data is typically defined as containing digital images, audio, video, animation and
graphics along with the textual data. In the past decade, with the advances in network
technologies, the acquisition, creation, storage and processing of multimedia data and
its transmission over networks have grown tremendously.
(iii) Applications
With the rapid growth of computing and communication technologies, many
multimedia applications have come to the forefront, and future applications will
increasingly support everyday life with multimedia data. This trend is expected to
continue. Some such application areas are:
• media commerce
• medical media databases
• bioinformatics
• ease of use of home media
• news and entertainment
• surveillance
• wearable computing
• management of meeting/presentation recordings
• biometrics (people identification using image, video and/or audio data).
Multimedia Databases (MMDBs) must cope with the large volume of multimedia
data used in various software applications, such as digital multimedia libraries,
art and entertainment, and journalism. Characteristics of multimedia data such as
size and format have a direct or indirect influence on the design and development
of a multimedia database.
Media Format Data: This data defines the format of the media data after the
acquisition, processing, and encoding phases. For example, such data may consist of
information about sampling rate, resolution, frame rate, encoding scheme etc. of
various media data.
Media Keyword Data: This contains the keywords related to the description of media
data. For example, for a video, this might include the date, time, and place of
recording, the person who recorded, the scene description, etc. This is also known as
content description data.
Media Feature Data: This contains the features derived from the media data. A
feature characterises the contents of the media. For example, this could contain
information on the distribution of colours, the kinds of textures and the different
shapes present in an image. This is also referred to as content dependent data.
The last three types are known as meta data, as they describe several different aspects
of the media data. The media keyword data and media feature data are used as indices
for searching. The media format data is used to present the retrieved
information.
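The media data and the three kinds of meta data described above can be pictured as one record. The following is a minimal sketch in Python, not the schema of any particular product; the class name `MediaRecord`, its field names and the sample values are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRecord:
    """One multimedia object with its three kinds of meta data (hypothetical layout)."""
    media_data: bytes                                  # the encoded image/audio/video itself
    format_data: dict = field(default_factory=dict)    # e.g. resolution, frame rate, encoding
    keyword_data: dict = field(default_factory=dict)   # content description data
    feature_data: dict = field(default_factory=dict)   # content dependent data


def search_by_keyword(records, key, value):
    """Return the records whose keyword meta data holds the given value for key."""
    return [r for r in records if r.keyword_data.get(key) == value]


clip = MediaRecord(
    media_data=b"\x00\x01",  # placeholder bytes, not a real video
    format_data={"frame_rate": 30, "encoding": "MPEG-2"},
    keyword_data={"place": "Delhi", "recorded_by": "Rakesh"},
    feature_data={"dominant_colour": "blue"},
)
photo = MediaRecord(
    media_data=b"\x02",
    format_data={"resolution": "640x480"},
    keyword_data={"place": "Mumbai"},
)

# Keyword data acts as the index for searching; format data is used for presentation.
matches = search_by_keyword([clip, photo], "place", "Delhi")
print(len(matches), matches[0].format_data["frame_rate"])
```

Here the keyword data is searched and the format data of the match is then read for presentation, which mirrors the division of roles stated in the text.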
6) One of the main requirements for such a database would be to handle different
kinds of indices. Multimedia data is inexact and subjective in nature; thus,
the keyword-based indices and exact range searches used in traditional
databases are ineffective in such databases. For example, the retrieval of records
of students based on enrolment number is precisely defined, but the retrieval of
records of students having certain facial features from a database of facial
images requires content-based queries and similarity-based retrievals. Thus,
the multimedia database may require indices that are content dependent, in
addition to keyword indices.
7) The Multimedia database requires developing measures of data similarity that
are closer to perceptual similarity. Such measures of similarity for different
media types need to be quantified and should correspond to perceptual
similarity. This will also help the search process.
8) Multimedia data is created all over the world, so it could have distributed database
features that cover the entire world as the geographic area. Thus, the media data
may reside in many different distributed storage locations.
9) Multimedia data may have to be delivered over available networks in real-time.
Please note, in this context, the audio and video data is temporal in nature. For
example, the video frames need to be presented at the rate of about 30
frames/sec for smooth motion.
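Requirements 6 and 7 above contrast exact-match lookup with similarity-based retrieval. A minimal sketch of the latter is given below, assuming that each image has already been reduced to a small normalised colour histogram. The histogram intersection measure, the file names and the three-bin histograms are illustrative assumptions, not a method prescribed by this unit.

```python
def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms: 1.0 means identical, 0.0 means disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query, database, top_k=2):
    """Rank the stored items by their similarity to the query feature vector."""
    scored = [(histogram_intersection(query, feats), name)
              for name, feats in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical feature data: fractions of red, green and blue pixels per image.
image_features = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "beach.jpg": [0.4, 0.3, 0.3],
}

query_features = [0.6, 0.2, 0.2]
ranking = most_similar(query_features, image_features)
print(ranking)
```

Unlike a lookup by enrolment number, the query returns a ranked list of near matches rather than a single exact answer. How well the ranking agrees with human judgement depends on how close the chosen measure is to perceptual similarity.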
Multimedia data is now being used in many database applications. Thus, multimedia
databases are required for efficient management and effective use of enormous
amounts of data.
Such software is used to provide support for a wide variety of media types,
specifically different media file formats such as image formats, video formats, etc.
These files need to be managed, segmented, linked and searched.
The later commercial systems handle multimedia content by providing complex object
types for various kinds of media. In such databases, object orientation provides the
facilities to define new data types and operations appropriate for the media, such as
video, image and audio. Therefore, broadly, MMDBMSs are extensible
Object-Relational DBMSs (ORDBMSs). The most advanced solutions presently include
Oracle 10g, IBM DB2 and IBM Informix. These solutions propose similar
approaches for extending the search facility for video using similarity-based techniques.
Some of the newer projects address the needs of applications for richer semantic
content. Most of them are based on the new MPEG standards MPEG-7 and MPEG-21.
MPEG-7
MPEG-7 is the ISO/IEC 15938 standard for multimedia descriptions that was issued
in 2002. It is an XML-based multimedia meta-data standard and describes various
elements of the multimedia processing cycle, from capture and analysis/filtering to
delivery and interaction.
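Because such descriptions are XML based, they can be read with ordinary XML tools. The sketch below only illustrates that idea: the element and attribute names (`MediaDescription`, `Format`, `Keyword`, `frameRate`) and the sample values are hypothetical and are not taken from the MPEG-7 schema.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical description; NOT a schema-conformant MPEG-7 document.
DESCRIPTION = """
<MediaDescription>
  <Title>Convocation recording</Title>
  <Format frameRate="30" encoding="MPEG-2"/>
  <Keyword>convocation</Keyword>
  <Keyword>university</Keyword>
</MediaDescription>
"""

root = ET.fromstring(DESCRIPTION.strip())

title = root.findtext("Title")                          # text of the first <Title> child
frame_rate = int(root.find("Format").get("frameRate"))  # attribute values are strings
keywords = [k.text for k in root.findall("Keyword")]    # all <Keyword> children, in order

print(title, frame_rate, keywords)
```

A real MPEG-7 description would use the element names and namespaces defined by the standard itself, but the way of extracting format and keyword meta data would be similar.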
• the applications utilising multimedia data are very diverse in nature. There is a
need for the standardisation of such database technologies,
• technology is ever changing, thus, creating further hurdles in the way of
multimedia databases,
• there is still a need to refine the algorithms to represent multimedia information
semantically. This also creates problems with respect to information
interpretation and comparison.
Check Your Progress 1
1) What are the reasons for the growth of multimedia data?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2) List four application areas of multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3) What are the contents of a multimedia database?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
4) List the challenges in designing multimedia databases.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
Requirements of a GIS
The data in GIS needs to be represented in graphical form. Such data would require
any of the following formats:
A GIS must also support the analysis of data. Some of the sample data analysis
operations that may be needed for typical applications are:
Once the data is captured in GIS it may be processed through some special operations.
Some such operations are:
The GIS also requires the process of visualisation in order to display the data in a
suitable visual form.
Thus, GIS is not a database that can be implemented using either the relational or
object oriented database alone. Much more needs to be done to support them. A
detailed discussion on these topics is beyond the scope of this unit.
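As a small illustration of why spatial data needs its own representations, the sketch below stores one feature in vector form (a list of coordinates) and one in raster form (a grid of cells). The coordinates, the elevation values and the function names are assumptions made for the example. They do not describe how any particular GIS product stores its data.

```python
import math

# Vector form: a feature (here a road) is stored as an ordered list of (x, y) points.
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def polyline_length(points):
    """Sum the straight-line lengths of consecutive segments."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Raster form: the area is a grid of cells, each holding a value
# (here a hypothetical land elevation in metres).
elevation = [
    [120, 125, 130],
    [118, 122, 128],
    [115, 119, 124],
]


def highest_cell(grid):
    """Return (row, column, value) of the cell with the largest value."""
    value, row, col = max((v, r, c)
                          for r, line in enumerate(grid)
                          for c, v in enumerate(line))
    return row, col, value


length = polyline_length(road)
peak = highest_cell(elevation)
print(length, peak)
```

Neither structure maps naturally onto a flat relational row, which is one reason a GIS needs support beyond a plain relational or object-oriented database.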
Biological data by nature is enormous. Bioinformatics is one such key area that has
emerged in recent years, and it addresses the issues of information management
of genetic data related to DNA sequences. A detailed discussion on this topic is beyond
the scope of this unit. However, let us identify some of the basic characteristics of
biological data.
The Human Genome Initiative is an international research initiative for the creation of
detailed genetic and physical maps for each of the twenty-four different human
chromosomes and the finding of the complete deoxyribonucleic acid (DNA) sequence
of the human genome. The term Genome is used to define the complete genetic
information about a living entity. A genetic map shows the linear arrangement of
genes or genetic marker sites on a chromosome. There are two types of genetic maps:
genetic linkage maps and physical maps. Genetic linkage maps are created on the
basis of the frequency with which genetic markers are co-inherited. Physical maps are
used to determine actual distances between genes on a chromosome.
One of the major uses of such databases is in computational genomics, which refers
to the application of computational molecular biology in genome research. On the
basis of the principles of the molecular biology, computational genomics has been
classified into three successive levels for the management and analysis of genetic data
in scientific databases. These are:
• Genomics.
• Gene expression.
• Proteomics.
1.4.1 Genomics
Genomics is a scientific discipline that focuses on the systematic investigation of the
complete set of chromosomes and genes of an organism. Genomics consists of two
component areas:
Genome Databases
Genome databases are used for the storage and analysis of genetic and physical maps.
Chromosome genetic linkage maps represent distances between markers based on
meiotic recombination frequencies. Chromosome physical maps represent distances
between markers based on numbers of nucleotides.
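The two kinds of distance can be contrasted with a short calculation. The sketch below uses the common convention that a 1% recombination frequency corresponds to roughly one centimorgan (cM). This approximation holds only for closely spaced markers, because multiple crossovers between distant markers go uncounted. The offspring counts and nucleotide positions are made-up values for illustration.

```python
def linkage_distance_cm(recombinant_offspring, total_offspring):
    """Approximate genetic distance in centimorgans (1 cM is about 1% recombination)."""
    return 100.0 * recombinant_offspring / total_offspring


def physical_distance_bp(position_a, position_b):
    """Physical distance between two markers, counted in nucleotides (base pairs)."""
    return abs(position_b - position_a)


# Hypothetical cross: 12 of 400 offspring show recombination between two markers.
genetic = linkage_distance_cm(12, 400)

# Hypothetical positions of the same two markers on the chromosome sequence.
physical = physical_distance_bp(1_250_000, 4_100_000)

print(genetic, physical)
```

The two figures measure different things and are not proportional in general, which is why a genome database stores genetic and physical map data as separate data types.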
Genome databases should define four data types:
• Sequence
• Physical
• Genetic
• Bibliographic
• Sequence-tagged sites
• Coding regions
• Non-coding regions
• Control regions
• Telomeres
• Centromeres
• Repeats
• Metaphase chromosome bands.
• Locus name
• Location
• Recombination distance
• Polymorphisms
• Breakpoints
• Rearrangements
• Disease association
• Bibliographic references should cite primary scientific and medical literature.
• Spatial information
• Quantification
• Gene products
• User annotation of existing data
• Linked entries
• Links to other databases
o Internet access
o Internet submission.
Gene expression databases have not established defined standards for the collection,
storage, retrieval and querying of gene expression data derived from libraries of gene
expression experiments.
Data visualisation is used to display the partial results of cluster analysis generated
from a large gene expression database.
1.4.3 Proteomics
Proteomics is the use of quantitative protein-level measurements of gene expression in
order to characterise biological processes and describe the mechanisms of gene
translation. The objective of proteomics is the quantitative measurement of protein
expression particularly under the influence of drugs or disease perturbations. Gene
expression monitors gene transcription whereas proteomics monitors gene translation.
Proteomics provides a more direct response to functional genomics than the indirect
approach provided by gene expression.
Proteome Databases
Proteome databases also provide integrated data management and analysis systems for
the translational expression data generated by large-scale proteomics experiments.
Proteome databases integrate expression levels and properties of thousands of proteins
with the thousands of genes identified on genetic maps and offer a global approach to
the study of gene expression.
Proteome databases address five research problems that cannot be resolved by DNA
analysis:
The creation of comprehensive databases of genes and gene products will lay the
foundation for further construction of comprehensive databases of higher-level
mechanisms, e.g., regulation of gene expression, metabolic pathways and signalling
cascades.
A detailed discussion on these databases is beyond the scope of this Unit. You may
wish to refer to the further readings for more information.
Knowledge databases are databases used for knowledge management. But what is
knowledge management? Knowledge management is the way to gather, manage and
use the knowledge of an organisation. The basic objectives of knowledge management
are to achieve improved performance, competitive advantage and higher levels of
innovation in various tasks of an organisation.
• Please note that in the representation of a fact, the data is represented
using the attribute values only and not the attribute names. The attribute name
is determined on the basis of the position of the data. For instance, in the
example above Rakesh is the Mgrname.
• The rules in Datalog do not contain the data. They are evaluated on the
basis of the stored data in order to deduce more information.
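The two points above, positional facts and rules that deduce new information, can be sketched in a few lines of Python. This is not a Datalog engine. It is a hand-written evaluation of two rules over a small set of facts, repeated until nothing new is deduced. The predicate names `manages` and `superior`, and all names other than Rakesh, are hypothetical.

```python
# Facts are positional tuples: position 0 is the manager name (Mgrname),
# position 1 is the employee name. No attribute names are stored.
manages = {
    ("Rakesh", "Mohan"),
    ("Mohan", "Sita"),
    ("Sita", "Anil"),
}


def derive_superior(manages_facts):
    """Evaluate two rules repeatedly until no new fact can be deduced.

    superior(X, Y) :- manages(X, Y).
    superior(X, Y) :- manages(X, Z), superior(Z, Y).
    """
    superior = set(manages_facts)  # first rule: every manages fact is a superior fact
    while True:
        # second rule: join manages(X, Z) with superior(Z, Y) on the shared Z
        new_facts = {
            (x, y)
            for (x, z) in manages_facts
            for (z2, y) in superior
            if z == z2
        } - superior
        if not new_facts:
            return superior
        superior |= new_facts


superior = derive_superior(manages)
print(sorted(superior))
```

The rules themselves hold no data. The fact that Rakesh is a superior of Anil is stored nowhere and is deduced only by evaluating the rules against the stored facts.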
Please note the distinction here. Information visualisation mainly focuses on
computer-supported tools for exploring and presenting large amounts of
data in formats that may be easily understood.
You can refer to more details on this topic in the fifth semester course.
1.7 SUMMARY
This unit provides an introduction to some of the later developments in the area of
database management systems. Multimedia databases are used to store and deal with
multimedia information in a cohesive fashion. Multimedia databases are very large in
size and also require support of algorithms for searches based on various media
components. Spatial databases primarily deal with multi-dimensional data. GIS is a
spatial database that can be used for many cartographic applications such as irrigation
system planning, vehicle monitoring systems, etc. This database system may represent
information in a multi-dimensional way.
A genome database is another very large database system that is used for the purpose of
genomics, gene expression and proteomics. Knowledge databases store information
either as a set of facts and rules or as semantic models. These databases can be utilised
in order to deduce more information from the stored rules using an inference engine.
Information visualisation is an important area that may be linked to databases from
the point of visual presentation of information for better user interactions.
1.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1) a) Advances in technology, in the form of digital devices and support for
capture and display equipment.
b) High speed data communication networks and software support for
multimedia data transfer.
c) Newer applications requiring multimedia support.
2) Medical media databases
Bio-informatics
Home media
News etc.
3) Content can be of two basic types:
a) Media Content
b) Meta data, which includes media format data, media keyword data and
media feature data.
4) Some of the challenges are:
a) Support for different types of input/output
b) Handling many compression algorithms and formats
c) Differences in OS and hardware
d) Integrating with different database models
e) Support for queries for a variety of media types
f) Handling different kinds of indices
g) Data distribution over the world etc.
Check Your Progress 2
1) GIS is a spatial database application where the spatial and non-spatial data is
represented along with the map. Some of the applications of GIS are:
• Cartographic applications
• 3-D Digital modeling applications like land elevation records
• Geographic object applications like traffic control system.
2) A GIS has the following requirements:
• Data representation through vector and raster
• Support for analysis of data
• Representation of information in an integrated fashion
• Capture of information
• Visualisation of information
• Operations on information
3) The data may need to be organised for the following three levels:
• Genomics: Where four different types of data are represented. The
physical data may be represented using eight different fields.
• Gene expression: Where data is represented in fourteen different fields
• Proteomics: Where data is used for five research problems.
Check Your Progress 3
1) A good knowledge database will have good information, good classification and
structure and an excellent search engine.
Emerging Database
Models, Technologies
and Applications-I