Database Management Systems
eBooksForAll Edition
(www.ebooks-for-all.com)
PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information.
PDF generated at: Sun, 20 Oct 2013 01:48:50 UTC
Contents
Articles
Database 1
Database model 16
Database normalization 23
Database storage structures 31
Distributed database 33
Federated database system 36
Referential integrity 40
Relational algebra 41
Relational calculus 53
Relational database 53
Relational database management system 57
Relational model 59
Object-relational database 69
Transaction processing 72
Concepts 76
ACID 76
Create, read, update and delete 79
Null (SQL) 80
Candidate key 96
Foreign key 98
Unique key 102
Superkey 105
Surrogate key 107
Armstrong's axioms 111
Objects 113
Relation (database) 113
Table (database) 115
Column (database) 116
Row (database) 117
View (SQL) 118
Database transaction 120
Transaction log 123
Database trigger 124
Database index 130
Stored procedure 135
Cursor (databases) 138
Partition (database) 143
Components 145
Concurrency control 145
Data dictionary 152
Java Database Connectivity 154
XQuery API for Java 157
ODBC 163
Query language 169
Query optimization 170
Query plan 173
Functions 175
Database administration and automation 175
Replication (computing) 177
References
Article Sources and Contributors 234
Image Sources, Licenses and Contributors 240
Article Licenses
License 241
Database 1
Database
A database is an organized collection of data. The data are typically organized to model relevant aspects of reality in a way that supports processes requiring this information. For example, modeling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
Database management systems (DBMSs) are specially designed applications that interact with the user, other applications, and the database itself to capture and analyze data. A general-purpose database management system (DBMS) is a software system designed to allow the definition, creation, querying, update, and administration of databases. Well-known DBMSs include MySQL, PostgreSQL, SQLite, Microsoft SQL Server, Oracle, SAP, dBASE, FoxPro, IBM DB2, LibreOffice Base and FileMaker Pro. A database is not generally portable across different DBMSs, but different DBMSs can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one database.
Terminology and overview
Formally, the term "database" refers to the data itself and supporting data structures. Databases are created to operate
large quantities of information by inputting, storing, retrieving, and managing that information. Databases are set up
so that one set of software programs provides all users with access to all the data.
A "database management system" (DBMS) is a suite of computer software providing the interface between users and a database or databases. Because they are so closely related, the term "database" when used casually often refers to both a DBMS and the data it manipulates.
Outside the world of professional information technology, the term database is sometimes used casually to refer to any collection of data (perhaps a spreadsheet, maybe even a card index). This article is concerned only with databases where the size and usage requirements necessitate use of a database management system.[1]
The interactions catered for by most existing DBMSs fall into four main groups:
• Data definition. Defining new data structures for a database, removing data structures from the database, modifying the structure of existing data.
• Update. Inserting, modifying, and deleting data.
• Retrieval. Obtaining information either for end-user queries and reports or for processing by applications.
• Administration. Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information if the system fails.
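The first three interaction groups can be sketched with SQLite, a lightweight DBMS bundled with Python. The table and column names below are invented for the example; administration tasks (users, security, recovery) are handled by the DBMS itself rather than through SQL statements like these.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create (and later alter or drop) data structures.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

# Update: insert, modify, and delete data.
conn.execute("INSERT INTO employee (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE employee SET name = 'Ada Lovelace' WHERE id = 1")

# Retrieval: obtain information for queries and reports.
rows = conn.execute("SELECT id, name FROM employee").fetchall()
conn.commit()
```

After these statements, `rows` holds the single updated record.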
A DBMS is responsible for maintaining the integrity and security of stored data, and for recovering information if the system fails.
Both a database and its DBMS conform to the principles of a particular database model.[2] "Database system" refers collectively to the database model, database management system, and database.[3]
Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. RAID is used for recovery of data if any of the disks fails. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions.[citation needed] Since DBMSs comprise a significant economical market, computer and storage vendors often take into account DBMS requirements in their own development plans.[citation needed]
Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or
XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to
access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability,
resilience, and security.
Applications and roles
Most organizations in developed countries today depend on databases for their business operations. Increasingly,
databases are not only used to support the internal operations of the organization, but also to underpin its online
interactions with customers and suppliers (see Enterprise software). Databases are not used only to hold
administrative information, but are often embedded within applications to hold more specialized data: for example
engineering data or economic models. Examples of database applications include computerized library systems, flight reservation systems, and computerized parts inventory systems.
Client-server or transactional DBMSs are often complex to maintain high performance, availability and security when many users are querying and updating the database at the same time. Personal, desktop-based database systems tend to be less complex. For example, FileMaker and Microsoft Access come with built-in graphical user interfaces.
General-purpose and special-purpose DBMSs
A DBMS has evolved into a complex software system and its development typically requires thousands of person-years of development effort.[4] Some general-purpose DBMSs such as Adabas, Oracle and DB2 have been undergoing upgrades since the 1970s. General-purpose DBMSs aim to meet the needs of as many applications as possible, which adds to the complexity. However, the fact that their development cost can be spread over a large number of users means that they are often the most cost-effective approach. Nevertheless, a general-purpose DBMS is not always the optimal solution: in some cases it may introduce unnecessary overhead. Therefore, there are many examples of systems that use special-purpose databases. A common example is an email system: email systems are designed to optimize the handling of email messages, and do not need significant portions of a general-purpose DBMS's functionality.
Many databases have application software that accesses the database on behalf of end-users, without exposing the
DBMS interface directly. Application programmers may use a wire protocol directly, or more likely through an
application programming interface. Database designers and database administrators interact with the DBMS through dedicated interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding of how DBMSs operate, their external interfaces, and their tuning parameters.
General-purpose databases are usually developed by one organization or community of programmers, while a
different group builds the applications that use it. In many companies, specialized database administrators maintain
databases, run reports, and may work on code that runs on the databases themselves (rather than in the client
application).
History
With the progress in technology in the areas of processors, computer memory, computer storage and computer networks, the sizes, capabilities, and performance of databases and their respective DBMSs have grown in orders of magnitude.
The development of database technology can be divided into three eras based on data model or structure: navigational, SQL/relational, and post-relational.[5] The two main early navigational data models were the hierarchical model, epitomized by IBM's IMS system, and the Codasyl model (network model), implemented in a number of products such as IDMS.
The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that
applications should search for data by content, rather than by following links. The relational model is made up of
ledger-style tables, each used for a different type of entity. It was not until the mid-1980s that computing hardware
became powerful enough to allow relational systems (DBMSs plus applications) to be widely deployed. By the early
1990s, however, relational systems were dominant for all large-scale data processing applications, and they remain
dominant today (2013) except in niche areas. The dominant database language is the standard SQL for the relational model, which has influenced database languages for other data models.[citation needed]
Object databases were invented in the 1980s to overcome the inconvenience of object-relational impedance mismatch, which led to the coining of the term "post-relational" but also development of hybrid object-relational databases.
The next generation of post-relational databases in the 2000s became known as NoSQL databases, introducing fast
key-value stores and document-oriented databases. A competing "next generation" known as NewSQL databases
attempted new implementations that retained the relational/SQL model while aiming to match the high performance
of NoSQL compared to commercially available relational DBMSs.
1960s navigational DBMS
The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared interactive use rather than daily batch processing. The Oxford English Dictionary cites a 1962 report by the System Development Corporation of California as the first to use the term "data-base" in a specific technical sense.
IBM also had their own DBMS system in 1968, known as IMS. IMS was a development of software written for the Apollo program on the System/360. IMS was generally similar in concept to Codasyl, but used a strict hierarchy for its model of data navigation instead of Codasyl's network model. Both concepts later became known as navigational databases due to the way data was accessed, and Bachman's 1973 Turing Award presentation was The Programmer as Navigator. IMS is classified as a hierarchical database. IDMS and Cincom Systems' TOTAL database are classified as network databases.
1970s relational DBMS
Edgar Codd worked at IBM in San Jose, California, in one of their offshoot offices that was primarily involved in the development of hard disk systems. He was unhappy with the navigational model of the Codasyl approach, notably the lack of a "search" facility. In 1970, he wrote a number of papers that outlined a new approach to database construction that eventually culminated in the groundbreaking A Relational Model of Data for Large Shared Data Banks.[6]
In this paper, he described a new system for storing and working with large databases. Instead of records being stored in some sort of linked list of free-form records as in Codasyl, Codd's idea was to use a "table" of fixed-length
records, with each table used for a different type of entity. A linked-list system would be very inefficient when
storing "sparse" databases where some of the data for any one record could be left empty. The relational model
solved this by splitting the data into a series of normalized tables (or relations), with optional elements being moved
out of the main table to where they would take up room only if needed. Data may be freely inserted, deleted and edited in these tables, with the DBMS doing whatever maintenance is needed to present a table view to the application/user.
The relational model also allowed the content of the database to evolve without constant rewriting of links and pointers. The relational part comes from entities referencing other entities in what is known as a one-to-many relationship, like a traditional hierarchical model, and a many-to-many relationship, like a navigational (network) model. Thus, a relational model can express both hierarchical and navigational models, as well as its native tabular model, allowing for pure or combined modeling in terms of these three models, as the application requires.
For instance, a common use of a database system is to track information about users, their name, login information, various addresses and phone numbers. In the navigational approach all of these data would be placed in a single record, and unused items would simply not be placed in the database. In the relational approach, the data would be normalized into a user table, an address table and a phone number table (for instance). Records would be created in these optional tables only if the address or phone numbers were actually provided.
In the relational model, related records are linked together with a "key".
Linking the information back together is the key to this system. In the relational model, some bit of information was used as a "key", uniquely defining a particular record. When information was being collected about a user, information stored in the optional tables would be found by searching for this key. For instance, if the login name of a user is unique, addresses and phone numbers for that user would be recorded with the login name as its key. This simple "re-linking" of related data back into a single collection is something that traditional computer languages are not designed for.
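The re-linking idea can be sketched with SQLite: separate, normalized user and address tables, joined back together on the login name acting as the key. The table and column names (and the sample data) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (login TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE address (login TEXT, street TEXT)")

conn.execute("INSERT INTO user VALUES ('ecodd', 'Edgar Codd')")
# An address row exists only because one was actually provided.
conn.execute("INSERT INTO address VALUES ('ecodd', '1 Relational Way')")

# Re-link the related data back into a single collection via the key.
row = conn.execute(
    "SELECT u.name, a.street FROM user u JOIN address a ON u.login = a.login"
).fetchone()
```

Here `row` combines the two tables' data for the one user whose key matched.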
Database machines and appliances
In the 1970s and 1980s attempts were made to build database systems with integrated hardware and software. The
underlying philosophy was that such integration would provide higher performance at lower cost. Examples were
IBM System/38, the early offering of Teradata, and the Britton Lee, Inc. database machine.
Another approach to hardware support for database management was ICL's CAFS accelerator, a hardware disk controller with programmable search capabilities. In the long term, these efforts were generally unsuccessful because specialized database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus most database systems nowadays are software systems running on general-purpose hardware, using general-purpose computer data storage. However this idea is still pursued for certain applications by some companies like Netezza and Oracle (Exadata).
Late-1970s SQL DBMS
IBM started working on a prototype system loosely based on Codd's concepts as System R in the early 1970s. The first version was ready in 1974/5, and work then started on multi-table systems in which the data could be split so that all of the data for a record (some of which is optional) did not have to be stored in a single large "chunk". Subsequent multi-user versions were tested by customers in 1978 and 1979, by which time a standardized query language – SQL – had been added.[citation needed] Codd's ideas were establishing themselves as both workable and superior to Codasyl, pushing IBM to develop a true production version of System R, known as SQL/DS, and, later, Database 2 (DB2).
Larry Ellison's Oracle started from a different chain, based on IBM's papers on System R, and beat IBM to market when the first version was released in 1978.[citation needed]
Stonebraker went on to apply the lessons from INGRES to develop a new database, Postgres, which is now known as PostgreSQL. PostgreSQL is often used for global mission-critical applications (the .org and .info domain name registries use it as their primary data store, as do many large companies and financial institutions).
In Sweden, Codd's paper was also read and Mimer SQL was developed from the mid-1970s at Uppsala University. In 1984, this project was consolidated into an independent enterprise. In the early 1980s, Mimer introduced transaction handling for high robustness in applications, an idea that was subsequently implemented on most other DBMSs.
Another data model, the entity-relationship model, emerged in 1976 and gained popularity for database design as it emphasized a more familiar description than the earlier relational model. Later on, entity-relationship constructs were retrofitted as a data modeling construct for the relational model, and the difference between the two has become irrelevant.[citation needed]
1980s desktop databases
The 1980s ushered in the age of desktop computing. The new computers empowered their users with spreadsheets like Lotus 1-2-3 and database software like dBASE. The dBASE product was lightweight and easy for any computer user to understand out of the box. C. Wayne Ratliff, the creator of dBASE, stated: "dBASE was different from programs like BASIC, C, FORTRAN, and COBOL in that a lot of the dirty work had already been done. The data manipulation is done by dBASE instead of by the user, so the user can concentrate on what he is doing, rather than having to mess with the dirty details of opening, reading, and closing files, and managing space allocation." dBASE was one of the top selling software titles in the 1980s and early 1990s.[12]
1980s object-oriented databases
The 1980s, along with a rise in object-oriented programming, saw a growth in how data in various databases were handled. Programmers and designers began to treat the data in their databases as objects. That is to say that if a person's data were in a database, that person's attributes, such as their address, phone number, and age, were now considered to belong to that person instead of being extraneous data. This allows for relations between data to be relations to objects and their attributes and not to individual fields.[13] The term "object-relational impedance mismatch" described the inconvenience of translating between programmed objects and database tables. Object databases and object-relational databases attempt to solve this problem by providing an object-oriented language (sometimes as extensions to SQL) that programmers can use as an alternative to purely relational SQL. On the programming side, libraries known as object-relational mappings (ORMs) attempt to solve the same problem.
2000s NoSQL and NewSQL databases
The next generation of post-relational databases in the 2000s became known as NoSQL databases, including fast key-value stores and document-oriented databases. XML databases are a type of structured document-oriented database that allows querying based on XML document attributes.
NoSQL databases are often very fast, do not require fixed table schemas, avoid join operations by storing
denormalized data, and are designed to scale horizontally.
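A toy key-value store illustrates how storing denormalized data avoids joins: the whole user record, addresses included, is kept under one key, so a single lookup returns everything. This is purely illustrative (a plain dictionary standing in for a real store); the key and field names are invented.

```python
import json

store = {}  # stand-in for a networked key-value store

def put(key, document):
    # Store the entire document, serialized, under one key.
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

put("user:ecodd", {
    "name": "Edgar Codd",
    "addresses": ["1 Relational Way"],  # embedded, not a separate table
})

doc = get("user:ecodd")  # one lookup, no join needed
```

The trade-off relative to the relational approach is that the embedded address is duplicated wherever it appears, so updates must touch every copy.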
In recent years there has been a high demand for massively distributed databases with high partition tolerance, but according to the CAP theorem it is impossible for a distributed system to simultaneously provide consistency, availability and partition tolerance guarantees. A distributed system can satisfy any two of these guarantees at the same time, but not all three. For that reason many NoSQL databases use what is called eventual consistency to provide both availability and partition tolerance guarantees with a maximum level of data consistency.
The most popular NoSQL systems include MongoDB, Riak, Oracle NoSQL Database, memcached, Redis, CouchDB, Hazelcast, Apache Cassandra and HBase; all are open-source software products.
A number of new relational databases that continue to use SQL but aim for performance comparable to NoSQL are known as NewSQL.
Database research
Database technology has been an active research topic since the 1960s, both in academia and in the research and development groups of companies (for example IBM Research). Research activity includes theory and the development of prototypes. Notable research topics have included models, the atomic transaction concept and related concurrency control techniques, query languages and query optimization methods, RAID, and more.
The database research area has several dedicated academic journals (for example, ACM Transactions on Database Systems (TODS) and Data and Knowledge Engineering (DKE)) and annual conferences (e.g., ACM SIGMOD, ACM PODS, VLDB, IEEE ICDE).
Database type examples
One way to classify databases involves the type of their contents, for example: bibliographic, document-text,
statistical, or multimedia objects. Another way is by their application area, for example: accounting, music
compositions, movies, banking, manufacturing, or insurance. A third way is by some technical aspect, such as the
database structure or interface type. This section lists a few of the adjectives used to characterize different kinds of
databases.
• An in-memory database is a database that primarily resides in main memory, but is typically backed up by non-volatile computer data storage. Main memory databases are faster than disk databases, and so are often used where response time is critical, such as in telecommunications network equipment. SAP HANA is a prominent in-memory database platform. By May 2012, HANA was able to run on servers with 100 TB of main memory powered by IBM. The co-founder of the company claimed that the system was big enough to run the eight largest SAP customers.
• An active database includes an event-driven architecture which can respond to conditions both inside and outside the database. Possible uses include security monitoring, alerting, statistics gathering and authorization. Many databases provide active database features in the form of database triggers.
• A cloud database relies on cloud technology. Both the database and most of its DBMS reside remotely, "in the cloud," while its applications are both developed by programmers and later maintained and utilized by the application's end-users through a web browser and Open APIs.
• Data warehouses archive data from operational databases and often from external sources such as market research firms. The warehouse becomes the central source of data for use by managers and other end-users who may not have access to operational data. For example, sales data might be aggregated to weekly totals and converted from internal product codes to use UPCs so that they can be compared with ACNielsen data. Some basic and essential components of data warehousing include retrieving, analyzing, and mining data, and transforming, loading and managing data so as to make them available for further use.
• A deductive database combines logic programming with a relational database, for example by using the Datalog language.
• A distributed database is one in which both the data and the DBMS span multiple computers.
• A document-oriented database is designed for storing, retrieving, and managing document-oriented, or semi-structured, information. Document-oriented databases are one of the main categories of NoSQL databases.
• An embedded database system is a DBMS which is tightly integrated with an application software that requires access to stored data in such a way that the DBMS is hidden from the application's end-users and requires little or no ongoing maintenance.[14]
• End-user databases consist of data developed by individual end-users. Examples of these are collections of documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support such databases. Some of them are much simpler than full-fledged DBMSs, with more elementary DBMS functionality.
• A federated database system comprises several distinct databases, each with its own DBMS. It is handled as a single database by a federated database management system (FDBMS), which transparently integrates multiple autonomous DBMSs, possibly of different types (in which case it would also be a heterogeneous database system), and provides them with an integrated conceptual view.
• Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a less integrated (e.g., without an FDBMS and a managed integrated schema) group of databases that cooperate in a single application. In this case, middleware is typically used for distribution, which typically includes an atomic commit protocol (ACP), e.g., the two-phase commit protocol, to allow distributed (global) transactions across the participating databases.
• A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and properties to represent and store information. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
• In a hypertext or hypermedia database, any word or a piece of text representing an object, e.g., another piece of text, an article, a picture, or a film, can be hyperlinked to that object. Hypertext databases are particularly useful for organizing large amounts of disparate information. For example, they are useful for organizing online encyclopedias, where users can conveniently jump around the text. The World Wide Web is thus a large distributed hypertext database.
• A knowledge base (abbreviated KB, kb or Δ[15]) is a special kind of database for knowledge management, providing the means for the computerized collection, organization, and retrieval of knowledge. It is also a collection of data representing problems with their solutions and related experiences.
• A mobile database can be carried on or synchronized from a mobile computing device.
• Operational databases store detailed data about the operations of an organization. They typically process relatively high volumes of updates using transactions. Examples include customer databases that record contact, credit, and demographic information about a business's customers; personnel databases that hold information such as salary, benefits, and skills data about employees; enterprise resource planning systems that record details about product components and parts inventory; and financial databases that keep track of the organization's money, accounting and financial dealings.
• A parallel database seeks to improve performance through parallelization for tasks such as loading data, building indexes and evaluating queries.
The major parallel DBMS architectures, which are induced by the underlying hardware architecture, are:
• Shared memory architecture, where multiple processors share the main memory space, as well as other data storage.
• Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its own main memory, but all units share the other storage.
• Shared nothing architecture, where each processing unit has its own main memory and other storage.
• Probabilistic databases employ fuzzy logic to draw inferences from imprecise data.
• Real-time databases process transactions fast enough for the result to come back and be acted on right away.
• A spatial database can store data with multidimensional features. The queries on such data include location-based queries, like "Where is the closest hotel in my area?".
• A temporal database has built-in time aspects, for example a temporal data model and a temporal version of SQL. More specifically the temporal aspects usually include valid-time and transaction-time.
• A terminology-oriented database builds upon an object-oriented database, often customized for a specific field.
• An unstructured-data database is intended to store, in a manageable and protected way, diverse objects that do not fit naturally and conveniently in common databases. It may include email messages, documents, journals, multimedia objects, etc. The name may be misleading since some objects can be highly structured. However, the entire possible object collection does not fit into a predefined structured framework. Most established DBMSs now support unstructured data in various ways, and new dedicated DBMSs are emerging.
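The in-memory database described above can be sketched with SQLite, whose ":memory:" mode keeps all data in RAM, and whose backup facility copies it to non-volatile storage, matching the "backed up by non-volatile computer data storage" pattern. Table and column names are invented for the example.

```python
import os
import sqlite3
import tempfile

# An in-memory database: all data lives in RAM.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE metrics (k TEXT, v REAL)")
mem.execute("INSERT INTO metrics VALUES ('latency_ms', 0.3)")
mem.commit()

# Back up the in-memory contents to a disk file (non-volatile storage).
path = os.path.join(tempfile.mkdtemp(), "backup.db")
disk = sqlite3.connect(path)
mem.backup(disk)
disk.close()

# Reopening the disk copy shows the data survived.
restored = sqlite3.connect(path).execute("SELECT v FROM metrics").fetchone()
```

A production in-memory DBMS would do this persistence continuously (checkpoints or logging) rather than as a one-shot backup.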
Database design and modeling
The first task of a database designer is to produce a conceptual data model that reflects the structure of the
information to be held in the database. A common approach to this is to develop an entity-relationship model, often
with the aid of drawing tools. Another popular approach is the Unified Modeling Language. A successful data model
will accurately reflect the possible state of the external world being modeled: for example, if people can have more
than one phone number, it will allow this information to be captured. Designing a good conceptual data model
requires a good understanding of the application domain; it typically involves asking deep questions about the things
of interest to an organisation, like "can a customer also be a supplier?", or "if a product is sold with two different
forms of packaging, are those the same product or different products?", or "if a plane flies from New York to Dubai
via Frankfurt, is that one flight or two (or maybe even three)?". The answers to these questions establish definitions
of the terminology used for entities (customers, products, flights, flight segments) and their relationships and
attributes.
Producing the conceptual data model sometimes involves input from business processes, or the analysis of workflow
in the organization. This can help to establish what information is needed in the database, and what can be left out.
For example, it can help when deciding whether the database needs to hold historic data as well as current data.
Having produced a conceptual data model that users are happy with, the next stage is to translate this into a schema
that implements the relevant data structures within the database. This process is often called logical database design,
and the output is a logical data model expressed in the form of a schema. Whereas the conceptual data model is (in
theory at least) independent of the choice of database technology, the logical data model will be expressed in terms of
a particular database model supported by the chosen DBMS. (The terms data model and database model are often
used interchangeably, but in this article we use data model for the design of a specific database, and database model
for the modelling notation used to express that design.)
The most popular database model for general-purpose databases is the relational model, or more precisely, the
relational model as represented by the SQL language. The process of creating a logical database design using this
model uses a methodical approach known as normalization. The goal of normalization is to ensure that each
elementary "fact" is only recorded in one place, so that insertions, updates, and deletions automatically maintain
consistency.
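The goal of normalization can be sketched with SQLite: the product's name is a fact recorded exactly once, in a product table, and orders reference it by key, so a rename needs one UPDATE and stays consistent everywhere. The schema and data are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, product_id INTEGER)")
conn.execute("INSERT INTO product VALUES (1, 'Widget')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.execute("INSERT INTO orders VALUES (101, 1)")

# The fact "product 1 is named ..." lives in exactly one place,
# so one update keeps every order consistent.
conn.execute("UPDATE product SET name = 'Widget Mk II' WHERE id = 1")

names = conn.execute(
    "SELECT DISTINCT p.name FROM orders o "
    "JOIN product p ON o.product_id = p.id"
).fetchall()
```

Had the name been copied into each order row, the same rename would require finding and updating every copy, risking inconsistency.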
The final stage of database design is to make the decisions that affect performance, scalability, recovery, security, and the like. This is often called physical database design. A key goal during this stage is data independence, meaning
that the decisions made for performance optimization purposes should be invisible to end-users and applications.
Physical design is driven mainly by performance requirements, and requires a good knowledge of the expected
workload and access patterns, and a deep understanding of the features offered by the chosen DBMS.
Another aspect of physical database design is security. It involves both defining access control to database objects as
well as defining security levels and methods for the data itself.
Database models
A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated. The most popular example of a database model is the relational model (or the SQL approximation of relational), which uses a table-based format.
External, conceptual, and internal views
A database management system provides three views of the database data:
• The external level defines how each group of end-users sees the organization of data in the database. A single database can have any number of views at the external level.
• The conceptual level unifies the various external views into a compatible global view. It provides the synthesis of all the external views. It is out of the scope of the various database end-users, and is rather of interest to database application developers and database administrators.
[Figure: Traditional view of data[16]]
• The internal level (or physical level) is the internal organization of data inside a DBMS (see Implementation section below). It is concerned with cost, performance, scalability and other operational matters. It deals with storage layout of the data, using storage structures such as indexes to enhance performance. Occasionally it stores data of individual views (materialized views), computed from generic data, if performance justification exists for such redundancy. It balances all the external views' performance requirements, possibly conflicting, in an attempt to optimize overall performance across all activities.
While there is typically only one conceptual (or logical) and physical (or internal) view of the data, there can be any
number of different external views. This allows users to see database information in a more business-related way
rather than from a technical, processing viewpoint. For example, a financial department of a company needs the
payment details of all employees as part of the company's expenses, but does not need details about employees that
are the interest of the human resources department. Thus different departments need different views of the company's database.
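A department-specific external view of this kind can be sketched as a SQL view (the schema below is a hypothetical illustration, run here with SQLite through Python's sqlite3 module): the finance department's view exposes payment-related columns while hiding the rest of the employee record.

```python
import sqlite3

# Hypothetical employee table with both payroll and medical data.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        salary REAL,
        medical_notes TEXT
    );
    -- External view for the finance department: no medical data.
    CREATE VIEW payroll AS
        SELECT emp_id, name, salary FROM employee;
""")
con.execute("INSERT INTO employee VALUES (1, 'Ada', 5000.0, 'confidential')")
payroll_rows = con.execute("SELECT * FROM payroll").fetchall()
print(payroll_rows)   # [(1, 'Ada', 5000.0)] -- medical_notes not exposed
```

Any number of such views can be defined over the same conceptual schema, one per user group.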
The three-level database architecture relates to the concept of data independence, which was one of the major initial
driving forces of the relational model. The idea is that changes made at a certain level do not affect the view at a
higher level. For example, changes in the internal level do not affect application programs written using conceptual
level interfaces, which reduces the impact of making physical changes to improve performance.
The conceptual view provides a level of indirection between internal and external. On one hand it provides a common
view of the database, independent of different external view structures, and on the other hand it abstracts away
details of how the data is stored or managed (internal level). In principle every level, and even every external view,
can be presented by a different data model. In practice usually a given DBMS uses the same data model for both the
external and the conceptual levels (e.g., relational model). The internal level, which is hidden inside the DBMS and
depends on its implementation (see Implementation section below), requires a different level of detail and uses its own types of data structures.
Separating the external, conceptual and internal levels was a major feature of the relational database model
implementations that dominate 21st century databases.
Database languages
Database languages are special-purpose languages, which do one or more of the following:
• Data definition language - defines data types and the relationships among them
• Data manipulation language - performs tasks such as inserting, updating, or deleting data occurrences
• Query language - allows searching for information and computing derived information
Database languages are specific to a particular data model. Notable examples include:
• SQL combines the roles of data definition, data manipulation, and query in a single language. It was one of the
first commercial languages for the relational model, although it departs in some respects from the relational model as
described by Codd (for example, the rows and columns of a table can be ordered). SQL became a standard of the
American National Standards Institute (ANSI) in 1986, and of the International Organization for Standards (ISO)
in 1987. The standards have been regularly enhanced since and are supported (with varying degrees of
conformance) by all mainstream commercial relational DBMSs.
• OQL is an object model language standard (from the Object Data Management Group). It has influenced the design of
some of the newer query languages like JDOQL and EJB QL.
• XQuery is a standard XML query language implemented by XML database systems such as MarkLogic and eXist,
by relational databases with XML capability such as Oracle and DB2, and also by in-memory XML
processors such as Saxon.
• SQL/XML combines XQuery with SQL.
A database language may also incorporate features like:
• DBMS-specific configuration and storage engine management
• Computations to modify query results, like counting, summing, averaging, sorting, grouping, and cross-referencing
• Constraint enforcement (e.g. in an automotive database, only allowing one engine type per car)
• Application programming interface version of the query language, for programmer convenience
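The three roles listed above can be seen in a single short session. This is an illustrative sketch (table and column names are invented), using SQLite through Python's sqlite3 module, which also demonstrates the "application programming interface version of the query language" mentioned in the last bullet:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Data definition language: define a typed structure.
con.execute("CREATE TABLE car (car_id INTEGER PRIMARY KEY, engine TEXT)")

# Data manipulation language: insert and update data occurrences.
con.execute("INSERT INTO car VALUES (1, 'diesel')")
con.execute("UPDATE car SET engine = 'petrol' WHERE car_id = 1")

# Query language: search and compute derived information.
count = con.execute("SELECT COUNT(*) FROM car").fetchone()[0]
engine = con.execute("SELECT engine FROM car WHERE car_id = 1").fetchone()[0]
print(count, engine)   # 1 petrol
```

In SQL all three roles live in one language; other systems split them into separate sublanguages.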
Performance, security, and availability
Because of the critical importance of database technology to the smooth running of an enterprise, database systems
include complex mechanisms to deliver the required performance, security, and availability, and allow database
administrators to control the use of these features.
Databasestorage
Database storage is the container of the physical materialization of a database. It comprises the internal (physical)
level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data",
and internal data structures) to reconstruct the conceptual level and external level from the internal level when
needed. Putting data into permanent storage is generally the responsibility of the database engine a.k.a. "storage
engine". Though typically accessed by a DBMS through the underlying operating system (and often utilizing the
operating system's file systems as intermediates for storage layout), storage properties and configuration settings are
extremely important for the efficient operation of the DBMS, and thus are closely maintained by database
administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g.,
memory and external storage). The database data and the additional needed information, possibly in very large
amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the
way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these
levels' reconstruction when needed by users and programs, as well as for computing additional types of needed
information from the data (e.g., when querying the database).
Some DBMSs support specifying which character encoding was used to store data, so multiple encodings can be used
in the same database.
Various low-level database storage structures are used by the storage engine to serialize the data model so it can be
written to the medium of choice. Techniques such as indexing may be used to improve performance. Conventional
storage is row-oriented, but there are also column-oriented and correlation databases.
Database materialized views
Often storage redundancy is employed to increase performance. A common example is storing materialized views,
which consist of frequently needed external views or query results. Storing such views saves the expensive
computing of them each time they are needed. The downsides of materialized views are the overhead incurred when
updating them to keep them synchronized with their original updated database data, and the cost of storage
redundancy.
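The trade-off can be sketched concretely. SQLite (used here through Python's sqlite3 module) has no CREATE MATERIALIZED VIEW statement, so this illustrative example emulates one by storing a query result in an ordinary table and refreshing it on demand; the names are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10.0), ("north", 5.0), ("south", 7.0)])

def refresh_totals(con):
    """Recompute the stored summary from the base table (the 'refresh'
    cost that keeps the materialized result synchronized)."""
    con.executescript("""
        DROP TABLE IF EXISTS sales_totals;
        CREATE TABLE sales_totals AS
            SELECT region, SUM(amount) AS total
            FROM sales GROUP BY region;
    """)

refresh_totals(con)
# Repeated reads are now cheap lookups instead of re-aggregation.
totals = con.execute("SELECT * FROM sales_totals ORDER BY region").fetchall()
print(totals)   # [('north', 15.0), ('south', 7.0)]
```

Until `refresh_totals` runs again after a base-table change, the stored result is stale; that is exactly the synchronization overhead described above.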
Database and database object replication
Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to increase data availability (both to improve performance of simultaneous multiple end-user accesses to the same database object, and to provide resiliency in the case of partial failure of a distributed database). Updates of a replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.
Database security
Database security deals with all aspects of protecting the database content, its owners, and its users. It ranges
from protection from intentional unauthorized database uses to unintentional database accesses by unauthorized
entities (e.g., a person or a computer program).
Database access control deals with controlling who (a person or a certain computer program) is allowed to access
what information in the database. The information may comprise specific database objects (e.g., record types, specific
records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or utilizing
specific access paths to the former (e.g., using specific indexes or other data structures to access information).
Database access controls are set by special authorized (by the database owner) personnel who use dedicated, protected security DBMS interfaces.
This may be managed directly on an individual basis, or by the assignment of individuals and privileges to groups, or (in the
most elaborate models) through the assignment of individuals and groups to roles which are then granted
entitlements. Data security prevents unauthorized users from viewing or updating the database. Using passwords,
users are allowed access to the entire database or subsets of it called "subschemas". For example, an employee
database can contain all the data about an individual employee, but one group of users may be authorized to view
only payroll data, while others are allowed access to only work history and medical data. If the DBMS provides a
way to interactively enter and update the database, as well as interrogate it, this capability allows for managing
personal databases.
Data security in general deals with protecting specific chunks of data, both physically (i.e., from corruption, or
destruction, or removal; e.g., see physical security), or the interpretation of them, or parts of them to meaningful
information (e.g., by looking at the strings of bits that they comprise, concluding specific valid credit-card numbers;
e.g., see data encryption).
Change and access logging records who accessed which attributes, what was changed, and when it was changed.
Logging services allow for a forensic database audit later by keeping a record of access occurrences and changes.
Sometimes application-level code is used to record changes rather than leaving this to the database. Monitoring can
be set up to attempt to detect security breaches.
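Change logging done inside the database, rather than in application code, is commonly implemented with triggers. The following is a minimal sketch with an invented schema, using SQLite through Python's sqlite3 module: every salary update automatically leaves an audit row recording the old and new values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, salary REAL);
    CREATE TABLE audit_log (
        emp_id INTEGER,
        old_salary REAL,
        new_salary REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Fires on every salary update; the application cannot forget to log.
    CREATE TRIGGER employee_salary_audit
    AFTER UPDATE OF salary ON employee
    BEGIN
        INSERT INTO audit_log (emp_id, old_salary, new_salary)
        VALUES (OLD.emp_id, OLD.salary, NEW.salary);
    END;
""")
con.execute("INSERT INTO employee VALUES (1, 1000.0)")
con.execute("UPDATE employee SET salary = 1200.0 WHERE emp_id = 1")
log = con.execute(
    "SELECT emp_id, old_salary, new_salary FROM audit_log").fetchall()
print(log)   # [(1, 1000.0, 1200.0)]
```

Because the trigger runs inside the DBMS, the audit record is written in the same transaction as the change it documents.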
Transactions and concurrency
Database transactions can be used to introduce some level of fault tolerance and data integrity after recovery from a
crash. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported in databases and also in other systems.
Each transaction has well defined boundaries in terms of which program/code executions are included in that
transaction (determined by the transaction's programmer via special transaction commands).
The acronym ACID describes some ideal properties of a database transaction: Atomicity, Consistency, Isolation, and
Durability.
Migration
See also section Database migration in article Data migration
A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in
some situations it is desirable to move or migrate a database from one DBMS to another. The reasons are primarily
economical (different DBMSs may have different total costs of ownership or TCOs), functional, and operational
(different DBMSs may have different capabilities). The migration involves the database's transformation from one
DBMS type to another. The transformation should maintain (if possible) the database related application (i.e., all
related application programs) intact. Thus, the database's conceptual and external architectural levels should be
maintained in the transformation. It may also be desirable to maintain some aspects of the internal architectural level. A complex or large database migration may be a complicated and costly (one-time) project by itself,
which should be factored into the decision to migrate. This holds in spite of the fact that tools may exist to help migration between specific DBMSs. Typically a DBMS vendor provides tools to help import databases from other popular
DBMSs.
Database building, maintaining, and tuning
After designing a database for an application, the next stage is building the database. Typically an appropriate
general-purpose DBMS can be selected to be utilized for this purpose. A DBMS provides the needed user interfaces
to be utilized by database administrators to define the needed application's data structures within the DBMS's
respective data model. Other user interfaces are used to select needed DBMS parameters (like security related,
storage allocation parameters, etc.).
When the database is ready (all its data structures and other needed components are defined), it is typically populated with the application's initial data (database initialization, which is typically a distinct project; in many cases using specialized DBMS interfaces that support bulk insertion) before making it operational. In some cases the database becomes operational while empty of application data, and data are accumulated during its operation.
After the database is built and made operational, the database maintenance stage arrives: various database parameters may need changes and tuning for better performance, the application's data structures may be changed or added, new related application programs may be written to add to the application's functionality, etc.
Databases are often confused with spreadsheets such as Microsoft Excel, which is a different kind of tool from a DBMS such as Microsoft Access. Both can be used to store information, but a database serves this function better. A brief comparison:
• Spreadsheet strengths: very simple data storage; relatively easy to use; requires less planning.
• Spreadsheet weaknesses: data integrity problems, including inaccurate, inconsistent, and out-of-date data; formulas may be incorrect.
• Database strengths: methods for keeping data up to date and consistent; data of higher quality than data stored in spreadsheets; good for storing and organizing information.
• Database weakness: requires more planning and design.
Backup and restore
Sometimes it is desired to bring a database back to a previous state (for many reasons, e.g., cases when the database
is found corrupted due to a software error, or if it has been updated with erroneous data). To achieve this a backup
operation is done occasionally or continuously, where each desired database state (i.e., the values of its data and their
embedding in database's data structures) is kept within dedicated backup files (many techniques exist to do this
effectively). When this state is needed, i.e., when it is decided by a database administrator to bring the database back
to this state (e.g., by specifying this state by a desired point in time when the database was in this state), these files
are utilized to restore that state.
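As a small concrete sketch, SQLite exposes an online backup facility through Python's sqlite3 module (`Connection.backup`, Python 3.7+). The scenario below is illustrative: a state is preserved, erroneous data loss occurs, and the preserved state remains available from the backup copy.

```python
import sqlite3

# Source database with some committed state.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE t (x INTEGER)")
source.execute("INSERT INTO t VALUES (42)")
source.commit()

# Take a backup: in practice the target would be a file, not :memory:.
backup = sqlite3.connect(":memory:")
source.backup(backup)            # page-by-page snapshot of current state

# Simulate erroneous data loss in the live database.
source.execute("DELETE FROM t")
source.commit()

# The earlier state survives in the backup and can be restored from it.
restored = backup.execute("SELECT x FROM t").fetchall()
print(restored)   # [(42,)]
```

Production systems layer more machinery on top (incremental backups, point-in-time recovery from transaction logs), but the principle is the same: a preserved state plus a restore path.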
Other
Other DBMS features might include:
• Database logs
• Graphics component for producing graphs and charts, especially in a data warehouse system
• Query optimizer - performs query optimization on every query to choose for it the most efficient query plan (a partial order (tree) of operations) to be executed to compute the query result. May be specific to a particular storage engine.
• Tools or hooks for database design, application programming, application program maintenance, database
performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a
DBMS and related database may span computers, networks, and storage units) and related database mapping
(especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.
References
[1] Jeffrey Ullman 1997: First Course in Database Systems, Prentice-Hall Inc., Simon & Schuster, Page 1, ISBN 0-13-861337-0.
[2] Tsitchizris, D. C. and F. H. Lochovsky (1982). Data Models. Englewood-Cliffs, Prentice-Hall.
[3] Beynon-Davies P. (2004). Database Systems 3rd Edition. Palgrave, Basingstoke, UK. ISBN 1-4039-1601-2
[4] This article quotes a development time of 5 years involving 750 people for DB2 release 9 alone.
[5] (Turing Award Lecture 1973)
[6] Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). In: Communications of the ACM 13 (6): 377–387.
[7] William Hershey and Carol Easthope, "A set theoretic data structure and retrieval language" (https://docs.google.com/open?id=0B4t_NX-QeWDYNmVhYjAwMWMtYzc3ZS00YjI0LWJhMjgtZTYyODZmNmFkNThh), Spring Joint Computer Conference, May 1972 in ACM SIGIR Forum, Volume 7, Issue 4 (December 1972), pp. 45-55, DOI=10.1145/1095495.1095500 (http://doi.acm.org/10.1145/1095495.1095500)
[8] Ken North, "Sets, Data Models and Data Independence" (http://drdobbs.com/blogs/database/228700616), Dr. Dobb's, 10 March 2010
[9] Description of a set-theoretic data structure (http://hdl.handle.net/2027.42/4163), D. L. Childs, 1968, Technical Report 3 of the CONCOMP (Research in Conversational Use of Computers) Project, University of Michigan, Ann Arbor, Michigan, USA
[10] Feasibility of a Set-Theoretic Data Structure: A General Structure Based on a Reconstituted Definition of Relation (http://hdl.handle.net/2027.42/4164), D. L. Childs, 1968, Technical Report 6 of the CONCOMP (Research in Conversational Use of Computers) Project, University of Michigan, Ann Arbor, Michigan, USA
[11] MICRO Information Management System (Version 5.0) Reference Manual (http://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B4t_NX-QeWDYZGMwOTRmOTItZTg2Zi00YmJkLTg4MTktN2E4MWU0YmZlMjE3), M. A. Kahn, D. L. Rumelhart, and B. L. Bronson, October 1977, Institute of Labor and Industrial Relations (ILIR), University of Michigan and Wayne State University
[12] Interview with Wayne Ratliff (http://www.foxprohistory.org/interview_wayne_ratliff.htm). The FoxPro History. Retrieved on 2013-07-12.
[13] Development of an object-oriented DBMS; Portland, Oregon, United States; Pages: 472-482; 1986; ISBN 0-89791-204-7
[14] Graves, Steve. "COTS Databases For Embedded Systems" (http://www.embedded-computing.com/articles/id/?2020), Embedded Computing Design magazine, January 2007. Retrieved on August 13, 2008.
[15] Argumentation in Artificial Intelligence by Iyad Rahwan, Guillermo R. Simari
[16] itl.nist.gov (1993) Integration Definition for Information Modeling (IDEFIX) (http://www.itl.nist.gov/fipspubs/idef1x.doc). 21 December 1993.
Further reading
• Ling Liu and Tamer M. Özsu (Eds.) (2009). "Encyclopedia of Database Systems" (http://www.springer.com/computer/database+management+&+information+retrieval/book/978-0-387-49616-0), 4100 p. 60 illus. ISBN 978-0-387-49616-0.
• Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
• Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
• Date, C. J. (2003). An Introduction to Database Systems, Fifth Edition. Addison Wesley. ISBN 0-201-51381-1.
• Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann Publishers, 1992.
• Kroenke, David M. and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
• Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems (http://pages.cs.wisc.edu/~dbbook/)
• Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts (http://www.db-book.com/)
• Discussion on database systems (http://www.bbconsult.co.uk/Documents/Database-Systems.docx)
• Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 0-12-369389-6.
• Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press, 2005. ISBN 0-12-685352-5
External links
• Database (http://www.dmoz.org/Computers/Data_Formats/Database/) at the Open Directory Project
Database model
A database model is a type of data model that determines the logical structure of a database and fundamentally determines in which manner data can be stored, organized, and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.
Common logical data models for databases include:
• Hierarchical database model
• Network model
• Relational model
• Entity–relationship model
• Enhanced entity–relationship model
[Figure: Collage of five types of database models.]
• Object model
• Document model
• Entity–attribute–value model
• Star schema
An object-relational database combines the two related structures.
Physical data models include:
• Inverted index
• Flat file
Other models include:
• Associative model
• Multidimensional model
• Multivalue model
• Semantic model
• XML database
• Named graph
• Triplestore
Relationships and functions
A given database management system may provide one or more of the five models. The optimal structure depends on
the natural organization of the application's data, and on the application's requirements, which include transaction
rate (speed), reliability, maintainability, scalability, and cost. Most database management systems are built around
one particular data model, although it is possible for products to offer support for more than one model.
Various physical data models can implement any given logical model. Most database software will offer the user
some level of control in tuning the physical implementation, since the choices that are made have a significant effect
on performance.
A model is not just a way of structuring data: it also defines a set of operations that can be performed on the data. The relational model, for example, defines operations such as select, project, and join. Although these operations may
not be explicit in a particular query language, they provide the foundation on which a query language is built.
Flat model
The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, columns for name and password might be used as a part of a system security database. Each row would have the specific password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This tabular format is a precursor to the relational model.
[Figure: Flat File Model.]
Early data models
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in old legacy systems. They
are characterized primarily by being navigational with strong connections between their logical and physical
representations, and deficiencies in data independence.
Hierarchical model
In a hierarchical model, data is organized into a tree-like structure, implying a single parent for each record. A sort field keeps sibling records in a particular order. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one one-to-many relationship between two types of data. This structure is very efficient for describing many relationships in the real world: recipes, tables of contents, ordering of paragraphs/verses, any nested and sorted information.
[Figure: Hierarchical Model.]
This hierarchy is used as the physical order of records in storage. Record access is done by navigating through the
data structure using pointers combined with sequential accessing. Because of this, the hierarchical structure is
inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also
included for each record. Such limitations have been compensated for in later IMS versions by additional logical
hierarchies imposed on the base physical hierarchy.
Network model
The network model expands upon the hierarchical structure, allowing many-to-many relationships in a tree-like structure that allows multiple parents. It was the most popular before being replaced by the relational model, and is defined by the CODASYL specification.
[Figure: Network Model.]
A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and
a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may
be established between any two record types, e.g., type A is the owner of B. At the same time another set may be
defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a
direction), or network construct. Access to records is either sequential (usually in each record type) or by navigation
in the circular linked lists.
The network model is able to represent redundancy in data more efficiently than in the hierarchical model, and there
can be more than one path from an ancestor node to a descendant. The operations of the network model are
navigational in style: a program maintains a current position, and navigates from one record to another by following
the relationships in which the record participates. Records can also be located by supplying key values.
Although it is not an essential feature of the model, network databases generally implement the set relationships by
means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at
the expense of operations such as database loading and reorganization.
Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained a
considerable customer base; in the 1980s, it adopted the relational model and SQL in addition to its original tools and
languages.
Most object databases (invented in the 1990s) use the navigational concept to provide fast navigation across networks
of objects, generally using object identifiers as "smart" pointers to related objects. Objectivity/DB, for
instance, implements named one-to-one, one-to-many, many-to-one, and many-to-many relationships that can cross
databases. Many object databases also support SQL, combining the strengths of both models.
Inverted file model
In an inverted file or inverted index, the contents of the data are used as keys in a lookup table, and the values in the
table are pointers to the location of each instance of a given content item. This is also the logical structure of
contemporary database indexes, which might only use the contents of particular columns in the lookup table. The
inverted file data model can put indexes in a second set of files next to existing flat database files, in order to
efficiently directly access needed records in these files.
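The core data structure is easy to sketch in plain Python (hypothetical record contents): the words of each record become lookup keys, and each key maps to the set of record locations containing it.

```python
from collections import defaultdict

# Hypothetical records: id -> text content.
records = {
    1: "relational model and sql",
    2: "network model",
    3: "sql query language",
}

# Invert: content word -> set of record ids where it occurs.
index = defaultdict(set)
for record_id, text in records.items():
    for word in text.split():
        index[word].add(record_id)

print(sorted(index["model"]))   # [1, 2]
print(sorted(index["sql"]))     # [1, 3]
```

A lookup by content is then a single dictionary access instead of a scan of every record, which is exactly why such indexes sit "next to" the flat data files.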
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS has gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.
Relational model
The relational model was introduced by E.F. Codd in 1970[2] as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory, and systems implementing it have been used by mainframe, midrange and microcomputer systems.
The products that are generally referred to as relational databases in fact implement a model that is only an
approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational
database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns
of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.
The basic data structure of the relational model is the table, where information about a particular entity (say, an
employee) is represented in rows (also called tuples) and columns. Thus, the "relation" in "relational database" refers
to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the
entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a
specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes
of a single employee.
All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First,
the ordering of columns is immaterial in a table. Second, there can't be identical tuples or rows in a table. And third,
each tuple will contain a single value for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the
strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the
same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit
integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying
parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a
designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each
tuple in the table.
A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join
or combine data from two or more tables. For example, an Employee table may contain a column named Location
which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes,
which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be
Database model 21
grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a
key even if it was not originally intended to be one.
A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number) is
sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary
or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both
generated and natural keys, because generated keys can be used internally to create links between rows that cannot
break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For
example, records in two independently developed databases could be matched up by social security number, except
when the social security numbers are incorrect, missing, or have changed.)
The most common query language used with the relational model is the Structured Query Language (SQL).
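The Employee/Location example above can be sketched in SQL (run here with SQLite through Python's sqlite3 module; the column names, such as `location_id`, are invented for illustration): the Location table's primary key is referenced by a column in Employee, and a join combines rows from the two tables through that key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,      -- primary key
        city TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,           -- surrogate key
        name TEXT NOT NULL,                   -- a "natural" attribute
        location_id INTEGER NOT NULL
            REFERENCES location(location_id)  -- matches Location's key
    );
""")
con.execute("INSERT INTO location VALUES (10, 'Ann Arbor')")
con.execute("INSERT INTO employee VALUES (1, 'Brown', 10)")

# Join: combine data from the two tables through the shared key.
rows = con.execute("""
    SELECT e.name, l.city
    FROM employee e
    JOIN location l ON l.location_id = e.location_id
""").fetchall()
print(rows)   # [('Brown', 'Ann Arbor')]
```

Here `emp_id` plays the role of a generated surrogate key, while a value such as the employee's name would be a less reliable natural key, as the surrounding text explains.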
Dimensional model
The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses
in a way that data can be easily summarized using online analytical processing, or OLAP queries. In the dimensional
model, a database schema consists of a single large table of facts that are described using dimensions and measures.
A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and
is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for
example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as
revenue. It is important that measures can be meaningfully aggregated; for example, the revenue from different
locations can be added together.
In an OLAP query, dimensions are chosen and the facts are grouped and aggregated together to create a summary.
The dimensional model is often implemented on top of the relational model using a star schema, consisting of one
highly normalized table containing the facts, and surrounding denormalized tables containing each dimension. An
alternative physical implementation, called a snowflake schema, normalizes multi-level hierarchies within a
dimension into multiple tables.
A data warehouse can contain multiple dimensional schemas that share dimension tables, allowing them to be used
together. Coming up with a standard set of dimensions is an important part of dimensional modeling.
Its high performance has made the dimensional model the most popular database structure for OLAP.
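A minimal star schema and an OLAP-style rollup can be sketched with SQLite; every table, column, and data value below is invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# One fact table referencing two dimension tables (a tiny star schema).
cur.executescript("""
    CREATE TABLE date_dim  (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, country TEXT, city TEXT);
    CREATE TABLE sales_fact (
        date_id  INTEGER REFERENCES date_dim(date_id),
        store_id INTEGER REFERENCES store_dim(store_id),
        revenue  REAL          -- the measure: meaningfully additive
    );
""")
cur.executemany("INSERT INTO date_dim VALUES (?,?,?)", [(1, 2013, 9), (2, 2013, 10)])
cur.executemany("INSERT INTO store_dim VALUES (?,?,?)",
                [(1, "UK", "London"), (2, "FR", "Paris")])
cur.executemany("INSERT INTO sales_fact VALUES (?,?,?)",
                [(1, 1, 100.0), (2, 1, 150.0), (2, 2, 80.0)])

# OLAP-style query: choose dimensions (country, month), aggregate the measure.
summary = cur.execute("""
    SELECT s.country, d.month, SUM(f.revenue)
    FROM sales_fact f
    JOIN date_dim d  ON f.date_id  = d.date_id
    JOIN store_dim s ON f.store_id = s.store_id
    GROUP BY s.country, d.month
    ORDER BY s.country, d.month
""").fetchall()
print(summary)   # [('FR', 10, 80.0), ('UK', 9, 100.0), ('UK', 10, 150.0)]
```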
Post-relational database models
Products offering a more general data model than the relational model are sometimes classified as post-relational.[3]
Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such
products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that
all information in the database must be cast explicitly in terms of values in relations and in no other way.
Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational
model. For example, they allow representation of a directed graph with trees on the nodes. The German company
sones implements this concept in its GraphDB.
Some post-relational products extend relational systems with non-relational features. Others arrived in much the same
place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically
pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.
The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.
Graph model
Graph databases allow even more general structure than a network database; any node may be connected to any other node.
Multivalue model
Multivalue databases are "lumpy" data, in that they can store data exactly the same way as relational databases, but they
also permit a level of depth which the relational model can only approximate using sub-tables. This is nearly identical
to the way XML expresses data, where a given field/attribute can have multiple right answers at the same time.
Multivalue can be thought of as a compressed form of XML.
An example is an invoice, which in either multivalue or relational data could be seen as (A) Invoice Header Table -
one entry per invoice, and (B) Invoice Detail Table - one entry per line item. In the multivalue model, we have the
option of storing the data as one table, with an embedded table to represent the detail: (A) Invoice Table - one entry
per invoice, no other tables needed.
The advantage is that the atomicity of the Invoice (conceptual) and the Invoice (data representation) are one-to-one.
This also results in fewer reads, fewer referential integrity issues, and a dramatic decrease in the hardware needed to
support a given transaction volume.
Object database models
A variety of approaches have been tried for storing objects in a database. Some products have approached the problem
from the application programming end, by making the objects manipulated by the program persistent. This typically requires
the addition of some kind of query language, since conventional programming languages do not have the ability to
find objects based on their information content. Others have attacked the problem from the database end, by defining
an object-oriented data model for the database, and defining a database programming language that allows full
programming capabilities as well as traditional query facilities.
Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they
were never implemented well enough to ensure interoperability between products. Nevertheless, object databases
have been used successfully in many applications: usually specialized applications such as engineering databases or
molecular biology databases rather than mainstream commercial data processing. However, object database ideas
were picked up by the relational vendors and influenced extensions made to these products and indeed to the
SQL language.
An alternative to translating between objects and relational databases is to use an object-relational mapping (ORM)
library.
References
[2] E.F. Codd (1970). "A relational model of data for large shared data banks". Communications of the ACM, Vol. 13, Issue 6 (June 1970), pp. 377–387.
[3] Introducing databases by Stephen Chu, in Conrick, M. (2006) Health informatics: transforming healthcare with technology, Thomson, ISBN 0-17-012731-1, p. 69.
Database normalization
Database normalization is the process of organizing the fields and tables of a relational database to minimize
redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant)
tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and
modifications of a field can be made in just one table and then propagated through the rest of the database using the
defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know
as the First Normal Form (1NF) in 1970. Codd went on to define the Second Normal Form (2NF) and Third Normal
Form (3NF) in 1971, and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in 1974.[1]
Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form.[2][3]
Most 3NF tables are free of insertion, update, and deletion anomalies.
A standard piece of database design guidance is that the designer should first create a fully normalized design; then
selective denormalization can be performed for performance reasons.[4]
Objectives of normalization
A basic objective of the first normal form defined by Edgar Frank "Ted" Codd in 1970 was to permit data to be
queried and manipulated using a "universal data sub-language" grounded in first-order logic.[5] (SQL is an example
of such a data sub-language, albeit one that Codd regarded as seriously flawed.[6])
The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are
introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable
to change as time goes by.
—E.F. Codd, "Further Normalization of the Data Base Relational Model"[7]
The sections below give details of each of these objectives.
Free the database of modification anomalies
When an attempt is made to modify (update, insert into, or delete from) a table, undesired side-effects may follow.
Not all tables can suffer from these side-effects; rather, the side-effects can only arise in tables that have not been
sufficiently normalized. An insufficiently normalized table might have one or more of the following characteristics:
• The same information can be expressed on multiple rows; therefore updates to the table may result in logical
inconsistencies. For example, each record in an "Employees' Skills" table might contain an Employee ID,
Employee Address, and Skill; thus a change of address for a particular employee will potentially need to be
applied to multiple records (one for each skill). If the update is not carried through successfully (that is, if the
employee's address is updated on some records but not others) then the table is left in an inconsistent state.
Specifically, the table provides conflicting answers to the question of what this particular employee's address is.
This phenomenon is known as an update anomaly.
[Figure: An update anomaly. Employee 519 is shown as having different addresses on different records.]
[Figure: An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course…]
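The update anomaly in the "Employees' Skills" example can be reproduced, and then eliminated by normalization, in a small sketch; all data values below are invented:

```python
# Unnormalized: the address is repeated once per skill, so a partial
# update leaves the table contradicting itself (an update anomaly).
employees_skills = [
    {"emp_id": 519, "address": "73 Industrial Way", "skill": "Typing"},
    {"emp_id": 519, "address": "73 Industrial Way", "skill": "Shorthand"},
]
employees_skills[0]["address"] = "812 Oak Street"   # update applied to one row only
addresses = {row["address"] for row in employees_skills if row["emp_id"] == 519}
print(len(addresses))   # 2 -> two conflicting answers for one employee

# Normalized: the address lives in exactly one row of an Employee table,
# and skills reference the employee by key, so the anomaly cannot occur.
employee = {519: {"address": "73 Industrial Way"}}
skills = [(519, "Typing"), (519, "Shorthand")]
employee[519]["address"] = "812 Oak Street"         # a single place to update
print(employee[519]["address"])
```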
Minimize redesign when extending the database structure
When a fully normalized database structure is extended to allow it to accommodate new types of data, the
pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications
interacting with the database are minimally affected.
Make the data model more informative to users
Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and
their interrelationships.
Avoid bias towards any particular pattern of querying
Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including
future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend
themselves to some types of queries, but not others.
For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the
obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the
table as, say, a homogeneous string of authors and titles.
With this design, though, the database can answer only that one single query. It cannot by itself answer interesting but
unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage? How
does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special
adaptive tools completely separate from the database. One tool might be software written especially to handle such
queries. This special adaptive software has just one single purpose: in effect to normalize the non-normalized field.
Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.
Example
Querying and manipulating the data within a data structure which is not normalized, such as the following non-1NF
representation of customers' credit card transactions, involves more complexity than is really necessary:
[Table: one row per customer (Jones, Wilkinson, Stevens); a Transactions column holds each customer's repeating group of transactions, each with a Date and an Amount.]
To each customer corresponds a repeating group of transactions. The automated evaluation of any query relating to
customers' transactions therefore would broadly involve two stages:
1. Unpacking one or more customers' groups of transactions, allowing the individual transactions in a group to be
examined, and
2. Deriving a query result based on the results of the first stage
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all
customers, the system would have to know that it must first unpack the Transactions group of each customer, then
sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.
One of Codd's important insights was that this structural complexity could always be removed completely, leading to much
greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated (by
the DBMS). The normalized equivalent of the structure above would look like this:
[Table: one row per individual transaction, with Customer, Date, and Amount columns.]
Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest,
simply by finding all rows with a Date falling in October, and summing their Amounts. The data structure places all
of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in
queries; whereas in the previous situation some values were embedded in lower-level structures that had to be
handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the
unnormalized design does not.
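As a sketch of the normalized design (customer names, dates, and amounts are invented), the October 2003 total becomes a single filtered aggregate rather than an unpack-then-sum procedure:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Normalized form: one row per individual credit card transaction.
cur.execute("CREATE TABLE transactions (customer TEXT, tr_date TEXT, amount INTEGER)")
cur.executemany("INSERT INTO transactions VALUES (?,?,?)", [
    ("Jones",     "2003-10-14",  -87),
    ("Jones",     "2003-11-02",  -50),
    ("Wilkinson", "2003-10-28", -200),
    ("Stevens",   "2003-09-30", -120),
])
# No unpacking stage: just filter on the Date and sum the Amounts.
(total,) = cur.execute("""
    SELECT SUM(amount) FROM transactions
    WHERE tr_date BETWEEN '2003-10-01' AND '2003-10-31'
""").fetchone()
print(total)   # -287
```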
Background to normalization: definitions
Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y) if
and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that
includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee
ID} → {Employee Date of Birth} would hold. It follows from the previous two sentences that each
{Employee ID} is associated with precisely one {Employee Date of Birth}.
Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee
ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee Address}.
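The definition can be checked mechanically: X → Y holds exactly when no two rows agree on X but differ on Y. A small sketch, with rows as dicts and attribute names following the example above (the data values are invented):

```python
def holds(rows, X, Y):
    """Return True if the functional dependency X -> Y holds in `rows`."""
    seen = {}
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if seen.setdefault(x_val, y_val) != y_val:
            return False          # same X value, different Y value
    return True

employees = [
    {"Employee ID": 1, "Employee Date of Birth": "1970-01-01", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Date of Birth": "1970-01-01", "Skill": "Filing"},
    {"Employee ID": 2, "Employee Date of Birth": "1982-05-17", "Skill": "Typing"},
]
print(holds(employees, ["Employee ID"], ["Employee Date of Birth"]))  # True
print(holds(employees, ["Skill"], ["Employee ID"]))                   # False
# A trivial dependency: Y is a subset of X, so it always holds.
print(holds(employees, ["Employee ID", "Skill"], ["Skill"]))          # True
```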
Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of
immunity against logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less
vulnerable it is. Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the
requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that
all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF
design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization
typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is in 3NF, it is
overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually
require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to
meet the requirements of these higher normal forms.
The main normal forms are summarized below.
1NF, First normal form. Two versions: E.F. Codd (1970), C.J. Date (2003).[8] A relation is in first normal form if the
domain of each attribute contains only atomic values, and the value of each attribute contains only a single value
from that domain.
3NF, Third normal form. Two versions: E.F. Codd (1971), C. Zaniolo (1982).[9] Every non-prime attribute is
non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the
description of the primary key are removed from the table. In other words, no transitive dependency is allowed.
5NF, Fifth normal form. Ronald Fagin (1979).[11] Every non-trivial join dependency in the table is implied by the
superkeys of the table.
DKNF, Domain/key normal form. Ronald Fagin (1981).[12] Every constraint on the table is a logical consequence of
the table's domain constraints and key constraints.
6NF, Sixth normal form. C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002).[13] Table features no non-trivial join
dependencies at all (with reference to generalized join operator).
Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended
for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small transactions
such as updating a sales record at a supermarket checkout counter. The expectation is that each transaction will leave
the database in a consistent state. By contrast, databases intended for OLAP operations are primarily "read mostly"
databases. OLAP applications tend to extract historical data that has accumulated over a long period of time. For such
databases, redundant or "denormalized" data may facilitate business intelligence applications. Specifically,
dimensional tables in a star schema often contain denormalized data. The denormalized or redundant data must be
carefully controlled during extract, transform, load (ETL) processing, and users should not be permitted to see the
data until it is in a consistent state. The normalized alternative to the star schema is the snowflake schema. In many
cases, the need for denormalization has waned as computers and RDBMS software have become more powerful, but
since data volumes have generally increased along with hardware and software performance, OLAP databases often
still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers as in computerized cash registers and
mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be used
when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift response
is crucial.
Non-first normal form (NF² or N1NF)
Denormalization is the opposite of normalization. In recognition that denormalization can be deliberate and useful,
the non-first normal form is a definition of database designs which do not conform to first normal form, by allowing
"sets and sets of sets to be attribute domains" (Schek 1982). The languages used to query and manipulate data in the
model must be extended accordingly to support such values.
One way of looking at this is to consider such structured values as being specialized types of values (domains), with
their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which
the relational model and the languages used to query it are extended with a general mechanism for such structure; for
instance, the nested relational model supports the use of relations as domain values, by adding two additional operators
(nest and unnest) to the relational algebra that can create and flatten nested relations, respectively.
Consider the following table:
First Normal Form
Person FavouriteColour
Bob blue
Bob red
Jane green
Jane yellow
Jane red
Assume a person has several favourite colours. Obviously, favourite colours consist of a set of colours modeled by
the given table. To transform a 1NF into an NF² table a "nest" operator is required which extends the relational
algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:
Non-First Normal Form
Person FavouriteColours
Bob
FavouriteColour
blue
red
Jane
FavouriteColour
green
yellow
red
To transform this NF² table back into a 1NF an "unnest" operator is required which extends the relational algebra of
the higher normal forms. The unnest, in this case, would make "colours" into its own table.
Although "unnest" is the mathematical inverse to "nest", the operator "nest" is not always the mathematical
inverse of "unnest". Another constraint required is for the operators to be bijective, which is covered by the
Partitioned Normal Form (PNF).
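The nest/unnest pair can be sketched over two-column rows, using the favourite-colours table above (a simplification of the relational operators, assuming grouping on the first attribute):

```python
def nest(rows):
    """Group 1NF (person, colour) rows into (person, set-of-colours) rows."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, set()).add(value)
    return [(key, frozenset(values)) for key, values in groups.items()]

def unnest(nested):
    """Flatten (person, set-of-colours) rows back into 1NF rows."""
    return [(key, value) for key, values in nested for value in values]

flat = [("Bob", "blue"), ("Bob", "red"),
        ("Jane", "green"), ("Jane", "yellow"), ("Jane", "red")]
nf2 = nest(flat)
print(nf2[0][0], sorted(nf2[0][1]))            # Bob ['blue', 'red']
# Here unnest is the inverse of nest (row order aside):
print(sorted(unnest(nf2)) == sorted(flat))     # True
```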
Notes and references
[1] Codd, E.F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[2] Codd, E.F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[3] C.J. Date. An Introduction to Database Systems. Addison-Wesley (1999), p. 290.
[4] Chris Date, for example, writes: "I believe firmly that anything less than a fully normalized design is strongly contraindicated ... [Y]ou should 'denormalize' only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements." Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 152.
[5] "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in first normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented)." Codd, "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html), p. 381.
[6] Codd, E.F. Chapter 23, "Serious Flaws in SQL", in The Relational Model for Database Management: Version 2. Addison-Wesley (1990), pp. 371–389.
[7] Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34.
[8] Date, C.J. "What First Normal Form Really Means" in Date on Database: Writings 2000–2006 (Springer-Verlag, 2006), pp. 127–128.
[9] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3), September 1982.
[10] Codd, E.F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[11] Ronald Fagin. "Normal Forms and Relational Database Operators". ACM SIGMOD International Conference on Management of Data, May 31 – June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979.
[12] Ronald Fagin (1981) "A Normal Form for Relational Databases That Is Based on Domains and Keys" (http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf), Communications of the ACM, vol. 6, pp. 387–415.
[13] C.J. Date, Hugh Darwen, Nikos Lorentzos. Temporal Data and the Relational Model. Morgan Kaufmann (2002), p. 176.
• Paper: "Non First Normal Form Relations" by G. Jaeschke, H.-J. Schek; IBM Heidelberg Scientific Center. A paper
studying the normalization and denormalization operators nest and unnest, as briefly described at the end of this
article.
Further reading
• Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
• Date, C. J. (1999), An Introduction to Database Systems
(http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley
Longman. ISBN 0-321-19784-4.
• Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
• H.-J. Schek, P. Pistor: Data Structures for an Integrated Data Base Management and Information Retrieval System
External links
• Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by Mike Chapple (About.com)
• Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://www.databasejournal.com/sqletc/article.php/26861_1474411_1)
• An Introduction to Database Normalization (http://mikehillyer.com/articles/an-introduction-to-database-normalization/) by Mike Hillyer
• A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
• DB Normalization Examples (http://www.dbnormalization.com/)
• Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
• Database Normalization and Design Techniques (http://www.barrywise.com/2008/01/database-normalization-and-design-techniques/) by Barry Wise, recommended reading for the Harvard MIS
• A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm)
Database storage structures
Database tables and indexes may be stored on disk in one of a number of forms, including ordered/unordered
flat files, ISAM, heap files, hash buckets, or B+ trees. Each form has its own particular advantages and
disadvantages. The most commonly used forms are B+ trees and ISAM. Such forms or structures are one aspect of
the overall schema used by a database engine to store information.
Unordered
Unordered storage typically stores the records in the order they are inserted. Such storage offers good insertion
efficiency (O(1)), but inefficient retrieval times (O(n)). Typically these retrieval times are better in practice, however,
as most databases use indexes on the primary keys, resulting in O(log n) or O(1) retrieval.
Ordered
Ordered storage typically stores the records in order and may have to rearrange or increase the file size when a new
record is inserted, resulting in lower insertion efficiency. However, ordered storage provides more efficient retrieval
as the records are pre-sorted, resulting in a complexity of O(log n).
Structured files
Heap files
• Simplest and most basic method
• insert efficient, with new records added at the end of the file, providing chronological order
• retrieval inefficient as searching has to be linear
• deletion is accomplished by marking selected records as "deleted"
• requires periodic reorganization if the file is very volatile
• Advantages
• efficient for bulk loading data
• efficient for relatively small relations as indexing overheads are avoided
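The heap-file behaviour listed above can be illustrated with a toy sketch (not a real storage engine; record layout is an assumption):

```python
class HeapFile:
    """A toy heap file: records are appended in arrival (chronological) order."""
    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(record)            # O(1): new records go at the end

    def find(self, key):
        for rec in self.records:               # O(n): searching must be linear
            if rec is not None and rec["key"] == key:
                return rec
        return None

    def delete(self, key):
        for i, rec in enumerate(self.records): # deletion only marks the slot
            if rec is not None and rec["key"] == key:
                self.records[i] = None

    def reorganize(self):
        """Periodic reorganization: physically drop the deleted slots."""
        self.records = [r for r in self.records if r is not None]

hf = HeapFile()
for k in (3, 1, 2):
    hf.insert({"key": k})
hf.delete(1)
print(hf.find(2))        # {'key': 2}
hf.reorganize()
print(len(hf.records))   # 2
```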
Hash buckets
• Hash functions calculate the address of the page in which the record is to be stored based on one or more fields in the record
• hashing functions chosen to ensure that addresses are spread evenly across the address space
• 'occupancy' is generally 40% to 60% of the total file size
• unique address not guaranteed, so collision detection and collision resolution mechanisms are required
• Open addressing
• Chained/unchained overflow
• Pros and cons
• efficient for exact matches on key field
• not suitable for range retrieval, which requires sequential storage
• calculates where the record is stored based on fields in the record
• hash functions ensure even spread of data
• collisions are possible, so collision detection and resolution is required
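A sketch of hash buckets with chained overflow; the bucket count and hash function here are arbitrary illustrative choices:

```python
NUM_BUCKETS = 8

def bucket_of(key):
    # The hash function maps the key field to a page/bucket address.
    return hash(key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]     # chained overflow: one list per bucket

def insert(record):
    buckets[bucket_of(record["key"])].append(record)

def find(key):
    # An exact-match lookup touches only one bucket (plus its chain).
    for rec in buckets[bucket_of(key)]:
        if rec["key"] == key:
            return rec
    return None

for k in ("alpha", "beta", "gamma"):
    insert({"key": k})
print(find("beta"))       # {'key': 'beta'}
print(find("delta"))      # None
# Range retrieval is not supported: keys are scattered across buckets.
```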
B+ trees
These are the most commonly used in practice.
• Time taken to access any record is the same because the same number of nodes is searched
• Index is a full index, so the data file does not have to be ordered
• Pros and cons
• versatile data structure – sequential as well as random access
• access is fast
• supports exact, range, part key and pattern matches efficiently
• volatile files are handled efficiently because the index is dynamic – it expands and contracts as the table grows and shrinks
• less well suited to relatively stable files – in this case, ISAM is more efficient
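A real B+ tree keeps keys in balanced pages with a high fan-out; as a rough sketch of the access pattern it provides (logarithmic exact lookup plus in-order range scans), a sorted index built with Python's `bisect` behaves similarly (class and method names are invented):

```python
import bisect

class SortedIndex:
    """Stand-in for a B+ tree leaf chain: sorted keys plus parallel row pointers."""
    def __init__(self):
        self.keys = []
        self.rows = []

    def insert(self, key, row):
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def get(self, key):                        # exact match via O(log n) search
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.rows[i]
        return None

    def range(self, lo, hi):                   # range match via sequential scan
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return self.rows[i:j]

idx = SortedIndex()
for k in (30, 10, 20, 40):
    idx.insert(k, f"row-{k}")
print(idx.get(20))        # 'row-20'
print(idx.range(15, 35))  # ['row-20', 'row-30']
```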
Data orientation
Most conventional relational databases use "row-oriented" storage, meaning that all data associated with a given row
is stored together. By contrast, a column-oriented DBMS stores all data from a given column together in order to more
quickly serve data warehouse-style queries. Correlation databases are similar to row-based databases, but apply a
layer of indirection to map multiple instances of the same value to the same numerical identifier.
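The two orientations can be sketched with plain Python containers (the schema and values are invented):

```python
# The same three records in the two layouts.
row_store = [
    (1, "Jones", 50000),
    (2, "Smith", 62000),
    (3, "Brown", 58000),
]
# Column-oriented: all values of one column are stored together.
col_store = {
    "id":     [1, 2, 3],
    "name":   ["Jones", "Smith", "Brown"],
    "salary": [50000, 62000, 58000],
}

# Fetching one whole record favours the row store (one contiguous read) ...
record = row_store[1]
print(record)                       # (2, 'Smith', 62000)

# ... while a warehouse-style aggregate over one column favours the
# column store: only the salary column is touched.
print(sum(col_store["salary"]))     # 170000
```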
Distributed database
A distributed database is a database in which storage devices are not all attached to a common processing unit such
as the CPU, controlled by a distributed database management system (together sometimes called a distributed
database system). It may be stored in multiple computers located in the same physical location, or may be
dispersed over a network of interconnected computers. Unlike parallel systems, in which the processors are tightly
coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that
share no physical components.
System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A
distributed database can reside on network servers on the Internet, on corporate intranets or extranets, or on other
company networks. Because they store data across multiple computers, distributed databases can improve
performance at end-user worksites by allowing transactions to be processed on many machines, instead of being
limited to one.[1]
Two processes ensure that the distributed databases remain up-to-date and current: replication and duplication.
1. Replication involves using specialized software that looks for changes in the distributed database. Once the
changes have been identified, the replication process makes all the databases look the same. The replication
process can be complex and time-consuming depending on the size and number of the distributed databases. This
process can also require a lot of time and computer resources.
2. Duplication, on the other hand, has less complexity. It basically identifies one database as a master and then
duplicates that database. The duplication process is normally done at a set time after hours. This is to ensure that each
distributed location has the same data. In the duplication process, users may change only the master database.
This ensures that local data will not be overwritten.
Both replication and duplication can keep the data current in all distributed locations.
Besides distributed database replication and fragmentation, there are many other distributed database design
technologies. For example, local autonomy, synchronous and asynchronous distributed database technologies. These
technologies' implementation can and does depend on the needs of the business and the sensitivity/confidentiality of
the data stored in the database, and hence the price the business is willing to spend on ensuring data security,
consistency and integrity.
When discussing access to distributed databases, Microsoft favors the term distributed query, which it defines in a
protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and
rowsets from one or more external OLE DB data sources". Oracle provides a more language-centric view in which
distributed queries and distributed transactions form part of distributed SQL.
Architecture
A database user accesses the distributed database through:
Local applications
applications which do not require data from other sites.
Global applications
applications which do require data from other sites.
A homogeneous distributed database has identical software and hardware running all database instances, and may
appear through a single interface as if it were a single database. A heterogeneous distributed database may have
different hardware, operating systems, database management systems, and even data models for different databases.
Homogeneous DDBMS
In a homogeneous distributed database all sites have identical software, are aware of each other, and agree to
cooperate in processing user requests. Each site surrenders part of its autonomy in terms of its right to change schema
or software. A homogeneous DDBMS appears to the user as a single system. The homogeneous system is much easier
to design and manage. The following conditions must be satisfied for a homogeneous database:
• The operating system used at each location must be the same or compatible.
• The data structures used at each location must be the same or compatible.
• The database application (or DBMS) used at each location must be the same or compatible.
Heterogeneous DDBMS
In a heterogeneous distributed database, different sites may use different schemas and software. Differences in schema are a major problem for query processing and transaction processing. Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. In heterogeneous systems, different nodes may have different hardware and software, and the data structures at various nodes or locations may also be incompatible. Different computers and operating systems, database applications or data models may be used at each of the locations. For example, one location may have the latest relational database management technology, while another location may store data using conventional files or an old version of a database management system. Similarly, one location may run the Windows NT operating system, while another may run UNIX. Heterogeneous systems are usually used when individual sites use their own hardware and software. In a heterogeneous system, translations are required to allow communication between different sites (or DBMSs). The users must be able to make requests in a database language at their local sites; usually the SQL database language is used for this purpose. If only the hardware is different, the translation is straightforward: computer codes and word lengths are converted. A fully heterogeneous system is often not technically or economically feasible; in such a system, a user at one location may be able to read but not update the data at another location.
Important considerations
Care must be taken with a distributed database to ensure the following:
• The distribution is transparent — users must be able to interact with the system as if it were one logical system. This applies to the system's performance and methods of access, among other things.
• Transactions are transparent — each transaction must maintain database integrity across multiple databases. Transactions must also be divided into sub-transactions, each sub-transaction affecting one database system.
There are two principal approaches to store a relation r in a distributed database system:
A) Replication
B) Fragmentation/Partitioning
A) Replication: In replication, the system maintains several identical replicas of the same relation r at different sites.
• Data is more available in this scheme.
• Parallelism is increased when read requests are served.
• Overhead on update operations increases, as each site containing a replica must be updated to maintain consistency.
• Multi-datacenter replication provides geographical diversity: http://basho.com/tag/multi-datacenter-replication/
B) Fragmentation: The relation r is fragmented into several relations r₁, r₂, r₃, ..., rₙ in such a way that the actual relation can be reconstructed from the fragments; the fragments are then scattered to different locations. There are basically two schemes of fragmentation:
• Horizontal fragmentation - splits the relation by assigning each tuple of r to one or more fragments.
• Vertical fragmentation - splits the relation by decomposing the schema R of relation r.
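The two fragmentation schemes can be sketched over a small in-memory relation. This is only an illustration; the relation, attribute names and split criteria below are invented, not taken from any particular DBMS.

```python
# Sketch of horizontal vs. vertical fragmentation of a relation r.
# Relation r has schema (emp_id, name, dept); the data is illustrative only.
r = {(1, "Alice", "Sales"), (2, "Bob", "HR"), (3, "Carol", "Sales")}

# Horizontal fragmentation: each tuple of r is assigned to a fragment
# (here: one fragment per department).
r1 = {t for t in r if t[2] == "Sales"}
r2 = {t for t in r if t[2] == "HR"}
assert r1 | r2 == r  # the original relation is reconstructed by union

# Vertical fragmentation: the schema is decomposed; each fragment keeps
# the key (emp_id) so the relation can be reconstructed by a join.
v1 = {(t[0], t[1]) for t in r}   # (emp_id, name)
v2 = {(t[0], t[2]) for t in r}   # (emp_id, dept)
reconstructed = {(i, n, d) for (i, n) in v1 for (j, d) in v2 if i == j}
assert reconstructed == r        # reconstruction by join on the key
```

Note that the vertical fragments must share the key attribute; without it, the join could not reassemble the original tuples.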
Advantages
• Management of distributed data with different levels of transparency, such as network transparency, fragmentation transparency, replication transparency, etc.
• Increased reliability and availability
• Easier expansion
• Reflects organizational structure — database fragments are potentially stored within the departments they relate to
• Local autonomy or site autonomy — a department can control the data about itself (as it is the one familiar with it)
• Protection of valuable data — in a catastrophic event such as a fire, all of the data would not be in one place, but distributed across multiple locations
• Improved performance — data is located near the site of greatest demand, and the database systems themselves are parallelized, allowing load on the databases to be balanced among servers. (A high load on one module of the database won't affect other modules of the database in a distributed database.)
• Economics — it may cost less to create a network of smaller computers with the power of a single large computer
• Modularity — systems can be modified, added and removed from the distributed database without affecting other modules (systems)
• Reliable transactions — due to replication of the database
• Hardware, operating-system, network, fragmentation, DBMS, replication and location independence
• Continuous operation, even if some nodes go offline (depending on design)
• Distributed query processing can improve performance
• Distributed transaction management
• Single-site failure does not affect the performance of the system.
• All transactions follow the ACID properties:
• A — atomicity: the transaction takes place as a whole or not at all
• C — consistency: maps one consistent DB state to another
• I — isolation: each transaction sees a consistent DB
• D — durability: the results of a transaction must survive system failures
The merge replication method is popularly used to consolidate data between databases.
Disadvantages
• Complexity — DBAs may have to do extra work to ensure that the distributed nature of the system is transparent. Extra work must also be done to maintain multiple disparate systems, instead of one big one. Extra database design work must also be done to account for the disconnected nature of the database — for example, joins become prohibitively expensive when performed across multiple systems.
• Economics — increased complexity and a more extensive infrastructure mean extra labour costs
• Security — remote database fragments must be secured, and as they are not centralized the remote sites must be secured as well. The infrastructure must also be secured (for example, by encrypting the network links between remote sites).
• Difficult to maintain integrity — enforcing integrity over a network may require too much of the network's resources to be feasible
• Inexperience — distributed databases are difficult to work with, and in such a young field there is not much readily available experience in "proper" practice
• Lack of standards — there are no tools or methodologies yet to help users convert a centralized DBMS into a distributed DBMS
• Database design more complex — besides the normal difficulties, the design of a distributed database has to consider fragmentation of data, allocation of fragments to specific sites and data replication
• Additional software is required
• The operating system should support a distributed environment
• Concurrency control poses a major issue; it can be addressed with locking and timestamping
• Distributed access to data
• Analysis of distributed data
References
[1] O'Brien, J. & Marakas, G. M. (2008). Management Information Systems (pp. 185–189). New York, NY: McGraw-Hill Irwin
• M. T. Özsu and P. Valduriez, Principles of Distributed Databases (3rd edition) (2011), Springer, ISBN 978-1-4419-8833-1
• Elmasri and Navathe, Fundamentals of database systems (3rd edition), Addison-Wesley Longman, ISBN 0-201-54263-3
• Oracle Database Administrator's Guide 10g (Release 1), http://docs.oracle.com/cd/B14117_01/server.101/b10739/ds_concepts.htm
Federated database system
A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. Since the constituent database systems remain autonomous, a federated database system is a contrastable alternative to the (sometimes daunting) task of merging several disparate databases. A federated database, or virtual database, is a composite of all constituent databases in a federated database system. There is no actual data integration in the constituent disparate databases as a result of data federation.
McLeod and Heimbigner were among the first to define a federated database system, as one which "define[s] the architecture and interconnect[s] databases that minimize central authority yet support partial sharing and coordination among database systems".
Through data abstraction, federated database systems can provide a uniform user interface, enabling users and clients to store and retrieve data in multiple noncontiguous databases with a single query — even if the constituent databases are heterogeneous. To this end, a federated database system must be able to decompose the query into subqueries for submission to the relevant constituent DBMSs, after which the system must composite the result sets of the subqueries. Because various database management systems employ different query languages, federated database systems can apply wrappers to the subqueries to translate them into the appropriate query languages.
• Note: this description of federated databases does not accurately reflect the McLeod/Heimbigner definition of a federated database. Rather, this description fits what McLeod/Heimbigner called a composite database. McLeod/Heimbigner's federated database is a collection of autonomous components that make their data available to other members of the federation through the publication of an export schema and access operations; there is no unified, central schema that encompasses the information available from the members of the federation.
Among other surveys, a federated database has been defined as a collection of cooperating component systems which are autonomous and possibly heterogeneous. The three important components of an FDBS are autonomy, heterogeneity and distribution. Another dimension which has also been considered is the networking environment (computer network), e.g., many DBSs over a LAN or many DBSs over a WAN, together with the update-related functions of participating DBSs (e.g., no updates, nonatomic transitions, atomic updates).
FDBS architecture
A DBMS can be classified as either centralized or distributed. A centralized system manages a single database, while a distributed system manages multiple databases. A component DBS in a DBMS may be centralized or distributed. A multiple DBS (MDBS) can be classified into two types depending on the autonomy of the component DBSs: federated and nonfederated. A nonfederated database system is an integration of component DBMSs that are not autonomous. A federated database system consists of component DBSs that are autonomous yet participate in a federation to allow partial and controlled sharing of their data.
Federated architectures differ based on levels of integration with the component database systems and the extent of services offered by the federation. An FDBS can be categorized as a loosely or tightly coupled system.
• Loosely coupled systems require component databases to construct their own federated schema. A user will typically access other component database systems by using a multidatabase language, but this removes any level of location transparency, forcing the user to have direct knowledge of the federated schema. A user imports the data they require from other component databases and integrates it with their own to form a federated schema.
• Tightly coupled systems consist of component systems that use independent processes to construct and publicize an integrated federated schema.
Multiple DBSs, of which FDBSs are a specific type, can be characterized along three dimensions: distribution, heterogeneity and autonomy. Another characterization could be based on the dimension of networking, for example single databases or multiple databases in a LAN or WAN.
Distribution
Distribution of data in an FDBS is due to the existence of a multiple DBS before an FDBS is built. Data can be distributed among multiple databases, which could be stored on a single computer or on multiple computers. These computers could be geographically located in different places but interconnected by a network. Data distribution helps to increase availability and reliability as well as improve access times.
Heterogeneity
Heterogeneities in databases arise due to factors such as differences in structures, semantics of data, the constraints supported or the query language. Differences in structure occur when two data models provide different primitives, such as object-oriented (OO) models that support specialization and inheritance and relational models that do not. Differences due to constraints occur when two models support two different constraints. For example, the set type in a CODASYL schema may be partially modeled as a referential integrity constraint in a relational schema; CODASYL supports insertion and retention semantics that are not captured by referential integrity alone. The query language supported by one DBMS can also contribute to heterogeneity between other component DBMSs. For example, differences in query languages with the same data models, or different versions of query languages, could contribute to heterogeneity.
Semantic heterogeneities arise when there is a disagreement about the meaning, interpretation or intended use of data. At the schema and data level, classifications of possible heterogeneities include:
• Naming conflicts, e.g. databases using different names to represent the same concept.
• Domain conflicts or data representation conflicts, e.g. databases using different values to represent the same concept.
• Precision conflicts, e.g. databases using the same data values from domains of different cardinalities for the same data.
• Metadata conflicts, e.g. the same concepts are represented at schema level and instance level.
Schema matching, schema mapping
Dealing with incompatible data types or query syntax is not the only obstacle to a concrete implementation of an FDBS. In systems that are not planned top-down, a generic problem lies in matching semantically equivalent, but differently named parts from different schemas (= data models) (tables, attributes). A pairwise mapping between n attributes would result in n(n−1)/2 mapping rules (given equivalence mappings) — a number that quickly gets too large for practical purposes. A common way out is to provide a global schema that comprises the relevant parts of all member schemas and to provide mappings in the form of database views. Two principal solutions can be realized, depending on the direction of the mapping:
1. Global as View (GaV): the global schema is defined in terms of the underlying schemas
2. Local as View (LaV): the local schemas are defined in terms of the global schema
Both are explained in more detail in the article Data integration. Alternate approaches to the schema matching problem and a classification of the same are explained in more detail in the article Schema matching.
Autonomy
Fundamental to the difference between an MDBS and an FDBS is the concept of autonomy. It is important to understand the aspects of autonomy for component databases and how they can be addressed when a component DBS participates in an FDBS. There are four kinds of autonomy addressed:
• Design autonomy, which refers to the ability to choose its own design irrespective of data, query language or conceptualization, and the functionality of the system implementation. Heterogeneities in an FDBS are primarily due to design autonomy.
• Communication autonomy, which refers to the general ability of the DBMS to decide whether or not to communicate with other DBMSs.
• Execution autonomy, which allows a component DBMS to control the operations requested by local and external operations.
• Association autonomy, which gives a component DBS the power to disassociate itself from a federation, meaning an FDBS can operate independently of any single DBS.
The ANSI/X3/SPARC Study Group outlined a three-level data description architecture, the components of which are the conceptual schema, internal schema and external schema of databases. The three-level architecture is, however, inadequate for describing the architectures of an FDBS. It was therefore extended to support the three dimensions of the FDBS, namely distribution, autonomy and heterogeneity. The five-level schema architecture is explained below.
Concurrency control
The Heterogeneity and Autonomy requirements pose special challenges concerning concurrency control in an FDBS,
which is crucial for the correct execution of its concurrent transactions (see also Global concurrency control).
Achieving global serializability, the major correctness criterion, under these requirements has been characterized as
very difficult and unsolved. Commitment ordering, introduced in 1991, has provided a general solution for this issue
(See Global serializability; See Commitment ordering also for the architectural aspects of the solution).
Five-level schema architecture for FDBSs
The five-level schema architecture includes the following:
• Local schema is the conceptual schema expressed in the primary data model of the component DBMS.
• Component schema is derived by translating the local schema into a model called the canonical data model or common data model. Component schemas are useful when semantics missed in the local schema are incorporated into the component. They help in the integration of data for a tightly coupled FDBS.
• Export schema represents a subset of a component schema that is available to the FDBS. It may include access control information regarding its use by a specific federation user. Export schemas help in managing the flow of control of data.
• Federated schema is an integration of multiple export schemas. It includes information on data distribution that is generated when integrating export schemas.
• External schema defines a schema for a user/application or a class of users/applications.
While accurately representing the state of the art in data integration, the five-level schema architecture above does suffer from a major drawback, namely an IT-imposed look and feel. Modern data users demand control over how data is presented; their needs are somewhat in conflict with such bottom-up approaches to data integration.
External links
• Schema coordination in federated database management: a comparison with schema integration (http://citeseer.ist.psu.edu/cache/papers/cs/9149/http:zSzzSzwww.bm.ust.hkzSz~zhaozSzDSS96.pdf/schema-coordination-in-federated.pdf)
• Storage of Behaviour of Object Database (http://www.computing.dcu.ie/~dalenk/publications/PhD Transfertalk.ppt)
• DB2 and Federated Databases (http://www.ibm.com/developerworks/db2/library/techarticle/dm-0504zikopoulos/)
• Tutorial on Federated Database (http://www.vldb.org/conf/1991/P489.PDF)
• GaV and LaV explained (http://www.dcs.bbk.ac.uk/~lucas/talks/SCSIS_RD_200507.pps)
• Issues of where to perform the join aka "pushdown" and other performance characteristics (http://www.ibm.com/developerworks/db2/library/techarticle/0304lurie/0304lurie.html)
• Worked example federating Oracle, Informix, DB2, and Excel (http://www.ibm.com/developerworks/db2/library/techarticle/0307lurie/0307lurie.html)
• Composite Information Server — a commercial federated database product (http://www.compositesw.com/products/cis.shtml)
• Freitas, André, Edward Curry, João Gabriel Oliveira, and Sean O’Riain. 2012. “Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends.” (http://www.edwardcurry.org/publications/freitas_IC_12.pdf) IEEE Internet Computing 16 (1): 24–33.
• IBM Gaian Database: A dynamic Distributed Federated Database (https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=f6ce657b-f385-43b2-8350-458e6e4a344f)
• Federated system and methods and mechanisms of implementing and using such a system (http://www.google.com/patents/US7392255)
Referential integrity
Referential integrity is a property of data which, when satisfied, requires every value of one attribute (column) of a relation (table) to exist as a value of another attribute in a different (or the same) relation (table).
Formalization
An inclusion dependency over two (possibly identical) predicates R and S from a schema is written R[A₁, ..., Aₙ] ⊆ S[B₁, ..., Bₙ], where the Aᵢ, Bᵢ are distinct attributes (column names) of R and S. It implies that the tuples of values appearing in columns A₁, ..., Aₙ for facts of R must also appear as a tuple of values in columns B₁, ..., Bₙ for some fact of S.
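An inclusion dependency of this kind can be checked mechanically over in-memory relations. The sketch below is illustrative only; the table and column names (customers, orders, id, customer_id) are invented, and relations are modeled as lists of dicts.

```python
# Check an inclusion dependency R[A1,...,An] ⊆ S[B1,...,Bn]:
# every tuple of values in r's listed columns must appear among
# the tuples of values in s's listed columns (referential integrity).

def satisfies_inclusion(r, r_cols, s, s_cols):
    """True iff the projection of r on r_cols is contained in
    the projection of s on s_cols."""
    r_proj = {tuple(t[c] for c in r_cols) for t in r}
    s_proj = {tuple(t[c] for c in s_cols) for t in s}
    return r_proj <= s_proj

customers = [{"id": 1}, {"id": 2}]
orders = [{"order_id": 10, "customer_id": 1},
          {"order_id": 11, "customer_id": 2}]

# orders[customer_id] ⊆ customers[id] holds...
assert satisfies_inclusion(orders, ["customer_id"], customers, ["id"])
# ...until a dangling reference is introduced.
orders.append({"order_id": 12, "customer_id": 99})
assert not satisfies_inclusion(orders, ["customer_id"], customers, ["id"])
```

In a real DBMS this check is enforced declaratively (e.g. by a FOREIGN KEY constraint) rather than by scanning the relations.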
Logical implication between inclusion dependencies can be axiomatized by inference rules[2] and can be decided by a PSPACE algorithm. The problem can be shown to be PSPACE-complete by reduction from the acceptance problem for a linear bounded automaton.[3] However, logical implication between dependencies that can be inclusion dependencies or functional dependencies is undecidable, by reduction from the word problem for monoids.[4]
References
[1] Coronel et al. (2013). Database Systems, 10th ed. Cengage Learning, ISBN 978-1-111-96960-8
[2] Abiteboul, Hull, Vianu. Foundations of Databases. Addison-Wesley, 1994. Section 9.1, p. 193. Freely available online (http://webdam.inria.fr/Alice/).
[3] ibid., p. 196
[4] ibid., p. 199
Relational algebra
In computer science, relational algebra is an offshoot of first-order logic and of the algebra of sets concerned with operations over finitary relations, usually made more convenient to work with by identifying the components of a tuple by a name (called an attribute) rather than by a numeric column index; such a named finitary relation is called a relation in database terminology.
The main application of relational algebra is providing a theoretical foundation for relational databases, particularly
query languages for such databases, chief among which is SQL.
Introduction
Relational algebra received little attention outside of pure mathematics until the publication of E. F. Codd's relational model of data in 1970. Codd proposed such an algebra as a basis for database query languages. (See section Implementations.)
Both a named and an unnamed perspective are possible for relational algebra, depending on whether the tuples are endowed with component names or not. In the unnamed perspective, a tuple is simply a member of a Cartesian product. In the named perspective, tuples are functions from a finite set U of attributes (of the relation) to a domain of values (assumed distinct from U).[1] The relational algebras obtained from the two perspectives are equivalent.[2] The typical undergraduate textbooks present only the named perspective, though, and this article follows suit.
Relational algebra is essentially equivalent in expressive power to relational calculus (and thus first-order logic); this
result is known as Codd's theorem. One must be careful to avoid a mismatch that may arise between the two
languages because negation, applied to a formula of the calculus, constructs a formula that may be true on an infinite
set of possible tuples, while the difference operator of relational algebra always returns a finite result. To overcome
these difficulties, Codd restricted the operands of relational algebra to finite relations only and also proposed
restricted support for negation (NOT) and disjunction (OR). Analogous restrictions are found in many other logic-based
computer languages. Codd defined the term relational completeness to refer to a language that is complete with
respect to first-order predicate calculus apart from the restrictions he proposed. In practice the restrictions have no
adverse effect on the applicability of his relational algebra for database purposes.
Primitive operations
As in any algebra, some operators are primitive and the others are derived in terms of the primitive ones. It is useful if the choice of primitive operators parallels the usual choice of primitive logical operators.
Five primitive operators of Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or cross join), the set union, and the set difference. Another operator, rename, was not noted by Codd, but the need for it is shown by the inventors of ISBL. These six operators are fundamental in the sense that omitting any one of them causes a loss of expressive power. Many other operators have been defined in terms of these six. Among the most important are set intersection, division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product with the natural join, of which the Cartesian product is a degenerate case.
Altogether, the operators of relational algebra have an expressive power identical to that of domain relational calculus or tuple relational calculus. However, for the reasons given in section Introduction, relational algebra is less expressive than first-order predicate calculus without function symbols. Relational algebra corresponds to a subset of first-order logic, namely Horn clauses without recursion and negation.
Set operators
The relational algebra uses set union, set difference, and Cartesian product from set theory, but adds additional
constraints to these operators.
For set union and set difference, the two relations involved must be union-compatible — that is, the two relations must
have the same set of attributes. Because set intersection can be defined in terms of set difference, the two relations
involved in set intersection must also be union-compatible.
For the Cartesian product to be defined, the two relations involved must have disjoint headers—that is, they must not
have a common attribute name.
In addition, the Cartesian product is defined differently from the one in set theory in the sense that tuples are
considered to be "shallow" for the purposes of the operation. That is, the Cartesian product of a set of n-tuples with a
set of m-tuples yields a set of "flattened" (n + m)-tuples (whereas basic set theory would have prescribed a set of 2-tuples, each containing an n-tuple and an m-tuple). More formally, R × S is defined as follows:
R × S = {(r₁, r₂, ..., rₙ, s₁, s₂, ..., sₘ) | (r₁, r₂, ..., rₙ) ∈ R, (s₁, s₂, ..., sₘ) ∈ S}
The cardinality of the Cartesian product is the product of the cardinalities of its factors, i.e., |R × S| = |R| × |S|.
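The "flattening" behaviour can be made concrete with a small sketch over Python tuples (the data is invented for illustration):

```python
# The flattened Cartesian product of relational algebra: pairing an
# n-tuple with an m-tuple yields one (n+m)-tuple, not a nested 2-tuple.

def cartesian_product(R, S):
    # Tuple concatenation r + s performs the flattening.
    return {r + s for r in R for s in S}

R = {(1, "a"), (2, "b")}   # 2-tuples
S = {("x",), ("y",)}       # 1-tuples

P = cartesian_product(R, S)  # 3-tuples
assert P == {(1, "a", "x"), (1, "a", "y"), (2, "b", "x"), (2, "b", "y")}
assert len(P) == len(R) * len(S)  # |R × S| = |R| × |S|
```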
Projection (π)
A projection is a unary operation written as π_{a₁, ..., aₙ}(R), where a₁, ..., aₙ is a set of attribute names. The result of such a projection is defined as the set that is obtained when all tuples in R are restricted to the set {a₁, ..., aₙ}.
This specifies the specific subset of columns (attributes of each tuple) to be retrieved. To obtain the names and phone numbers from an address book, the projection might be written π_{contactName, contactPhoneNumber}(addressBook). The result of that projection would be a relation which contains only the contactName and contactPhoneNumber attributes for each unique entry in addressBook.
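A sketch of projection over dict-tuples (the addressBook rows below are invented for illustration; tuples are frozen into sorted item-tuples so duplicates collapse, preserving set semantics):

```python
# Projection π_{attrs}(R): restrict every tuple of R to the given
# attribute names; duplicates collapse because the result is a set.

def project(R, attrs):
    return {tuple((a, t[a]) for a in sorted(attrs)) for t in R}

address_book = [
    {"contactName": "Ann", "contactPhoneNumber": "555-0101", "city": "Oslo"},
    {"contactName": "Bob", "contactPhoneNumber": "555-0102", "city": "Oslo"},
]

# π_{contactName, contactPhoneNumber}(addressBook)
result = project(address_book, {"contactName", "contactPhoneNumber"})
assert result == {
    (("contactName", "Ann"), ("contactPhoneNumber", "555-0101")),
    (("contactName", "Bob"), ("contactPhoneNumber", "555-0102")),
}
```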
Selection (σ)
A generalized selection is a unary operation written as σ_φ(R), where φ is a propositional formula that consists of atoms as allowed in the normal selection and the logical operators ∧ (and), ∨ (or) and ¬ (negation). This selection selects all those tuples in R for which φ holds.
To obtain a listing of all friends or business associates in an address book, the selection might be written as σ_{isFriend = true ∨ isBusinessContact = true}(addressBook). The result would be a relation containing every attribute of every unique record where isFriend is true or where isBusinessContact is true.
In Codd's 1970 paper, selection is called restriction.
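The address-book selection above can be sketched by passing the formula φ as a predicate (the rows and attribute names are invented for illustration):

```python
# Generalized selection σ_φ(R): keep the tuples of R for which
# the propositional formula φ (here a Python predicate) holds.

def select(R, phi):
    return [t for t in R if phi(t)]

address_book = [
    {"contactName": "Ann", "isFriend": True,  "isBusinessContact": False},
    {"contactName": "Bob", "isFriend": False, "isBusinessContact": True},
    {"contactName": "Eve", "isFriend": False, "isBusinessContact": False},
]

# σ_{isFriend = true ∨ isBusinessContact = true}(addressBook)
result = select(address_book,
                lambda t: t["isFriend"] or t["isBusinessContact"])
assert [t["contactName"] for t in result] == ["Ann", "Bob"]
```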
Rename (ρ)
A rename is a unary operation written as ρ_{a/b}(R), where the result is identical to R except that the b attribute in all tuples is renamed to an a attribute. This is simply used to rename the attribute of a relation or the relation itself.
Joins and join-like operators
Natural join (⋈)
Natural join (⋈) is a binary operator that is written as (R ⋈ S), where R and S are relations.[3] The result of the natural join is the set of all combinations of tuples in R and S that are equal on their common attribute names. For an example consider the tables Employee and Dept and their natural join.
The natural join can also be defined as R ⋈ S = { r ∪ s : r ∈ R, s ∈ S, Fun(r ∪ s) }, where Fun is a predicate that is true for a relation r if and only if r is a function. It is usually required that R and S must have at least one common attribute, but if this constraint is omitted, and R and S have no common attributes, then the natural join becomes exactly the Cartesian product.
The natural join can be simulated with Codd's primitives as follows. Assume that c₁, ..., cₘ are the attribute names common to R and S, r₁, ..., rₙ are the attribute names unique to R and s₁, ..., sₖ are the attribute names unique to S. Furthermore, assume that the attribute names x₁, ..., xₘ are neither in R nor in S. In a first step we can now rename the common attribute names in S:
T = ρ_{x₁/c₁}(ρ_{x₂/c₂}(... ρ_{xₘ/cₘ}(S) ...))
Then we take the Cartesian product and select the tuples that are to be joined:
P = σ_{c₁ = x₁}(σ_{c₂ = x₂}(... σ_{cₘ = xₘ}(R × T) ...))
Finally we take a projection to get rid of the renamed attributes:
U = π_{r₁, ..., rₙ, c₁, ..., cₘ, s₁, ..., sₖ}(P)
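A direct sketch of the natural join over dict-tuples, loosely following the Employee/Dept example (the rows below are invented, since the original tables do not appear here):

```python
# Natural join R ⋈ S: combine tuples of R and S that agree on all of
# their common attribute names.

def natural_join(R, S):
    common = set(R[0]) & set(S[0]) if R and S else set()
    return [{**r, **s} for r in R for s in S
            if all(r[c] == s[c] for c in common)]

employee = [{"Name": "Harry", "EmpId": 3415, "DeptName": "Finance"},
            {"Name": "Sally", "EmpId": 2241, "DeptName": "Sales"},
            {"Name": "George", "EmpId": 3401, "DeptName": "Finance"}]
dept = [{"DeptName": "Finance", "Manager": "George"},
        {"DeptName": "Sales", "Manager": "Harriet"}]

joined = natural_join(employee, dept)
assert {(t["Name"], t["Manager"]) for t in joined} == \
    {("Harry", "George"), ("Sally", "Harriet"), ("George", "George")}
```

With no common attributes, `common` is empty and every pair qualifies, so the function degenerates to the Cartesian product, as the text notes.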
θ-join and equijoin
Consider tables Car and Boat which list models of cars and boats and their respective prices. Suppose a customer wants to buy a car and a boat, but she does not want to spend more money for the boat than for the car. The θ-join (⋈_θ) on the predicate CarPrice ≥ BoatPrice produces a table with all the possible options. When using a condition where the attributes are equal, for example Price, then the condition may be specified as Price = Price or alternatively (Price) itself.
R ⋈_θ S = σ_θ(R × S)
In case the operator θ is the equality operator (=) then this join is also called an equijoin.
Note, however, that a computer language that supports the natural join and rename operators does not need θ-join as well, as this can be achieved by selection from the result of a natural join (which degenerates to Cartesian product when there are no shared attributes).
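The Car/Boat scenario can be sketched directly from the definition R ⋈_θ S = σ_θ(R × S) (the prices and model names below are invented for illustration):

```python
# θ-join: a selection over the Cartesian product.
# Headers of R and S are assumed disjoint, as the product requires.

def theta_join(R, S, theta):
    return [{**r, **s} for r in R for s in S if theta({**r, **s})]

car = [{"CarModel": "CarA", "CarPrice": 20000},
       {"CarModel": "CarB", "CarPrice": 30000}]
boat = [{"BoatModel": "Boat1", "BoatPrice": 10000},
        {"BoatModel": "Boat2", "BoatPrice": 40000}]

# θ is CarPrice ≥ BoatPrice: every option where the boat costs no
# more than the car.
options = theta_join(car, boat, lambda t: t["CarPrice"] >= t["BoatPrice"])
assert {(t["CarModel"], t["BoatModel"]) for t in options} == \
    {("CarA", "Boat1"), ("CarB", "Boat1")}
```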
Semijoin (⋉ and ⋊)
The left semijoin is a joining similar to the natural join, written as R ⋉ S, where R and S are relations.[4] The result of this semijoin is the set of all tuples in R for which there is a tuple in S that is equal on their common attribute names. For an example consider the tables Employee and Dept and their semijoin.
Antijoin (▷)
The antijoin, written as R ▷ S, where R and S are relations, is similar to the semijoin, but the result of an antijoin is only those tuples in R for which there is no tuple in S that is equal on their common attribute names.[5]
For an example consider the tables Employee and Dept and their antijoin.
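Semijoin and antijoin are complements of each other, which the following sketch makes explicit (the Employee/Dept rows are invented for illustration):

```python
# Semijoin (⋉): tuples of R that have a match in S on the common
# attribute names. Antijoin (▷): tuples of R that have no such match.

def semijoin(R, S):
    common = set(R[0]) & set(S[0]) if R and S else set()
    return [r for r in R
            if any(all(r[c] == s[c] for c in common) for s in S)]

def antijoin(R, S):
    common = set(R[0]) & set(S[0]) if R and S else set()
    return [r for r in R
            if not any(all(r[c] == s[c] for c in common) for s in S)]

employee = [{"Name": "Harry", "DeptName": "Finance"},
            {"Name": "Sally", "DeptName": "Sales"},
            {"Name": "Tim", "DeptName": "Executive"}]
dept = [{"DeptName": "Finance", "Manager": "George"},
        {"DeptName": "Sales", "Manager": "Harriet"}]

assert [r["Name"] for r in semijoin(employee, dept)] == ["Harry", "Sally"]
assert [r["Name"] for r in antijoin(employee, dept)] == ["Tim"]
```

Note that, unlike a natural join, both operators return tuples of R only; no attributes from S appear in the result.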
Division (÷)
The division is a binary operation that is written as R ÷ S. The result consists of the restrictions of tuples in R to the attribute names unique to R, i.e., in the header of R but not in the header of S, for which it holds that all their combinations with tuples in S are present in R. For an example see the tables Completed, DBProject and their division:
Completed (Student, Task):
Fred    Database1
Fred    Database2
Fred    Compiler1
Eugene  Database1
Eugene  Compiler1
Sarah   Database1
Sarah   Database2
DBProject (Task):
Database1
Database2
Completed ÷ DBProject (Student):
Fred
Sarah
If DBProject contains all the tasks of the Database project, then the result of the division above contains exactly the
students who have completed both of the tasks in the Database project.
More formally the semantics of the division is defined as follows:
R ÷ S = { t[a₁, ..., aₙ] : t ∈ R ∧ ∀s ∈ S ( (t[a₁, ..., aₙ] ∪ s) ∈ R ) }
where {a₁, ..., aₙ} is the set of attribute names unique to R and t[a₁, ..., aₙ] is the restriction of t to this set. It is usually required that the attribute names in the header of S are a subset of those of R because otherwise the result of the operation will always be empty.
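The definition above translates almost directly into code. The sketch below loosely follows the Completed/DBProject example (with relations modeled as lists of dicts):

```python
# Division R ÷ S: restrict R to the attributes unique to R and keep
# those restrictions whose combination with *every* tuple of S is in R.

def divide(R, S):
    r_attrs = set(R[0]) - set(S[0])   # attribute names unique to R
    restrictions = {tuple(sorted((a, t[a]) for a in r_attrs)) for t in R}
    r_set = {tuple(sorted(t.items())) for t in R}
    return {t for t in restrictions
            if all(tuple(sorted(dict(t, **s).items())) in r_set
                   for s in S)}

completed = [
    {"Student": "Fred",   "Task": "Database1"},
    {"Student": "Fred",   "Task": "Database2"},
    {"Student": "Fred",   "Task": "Compiler1"},
    {"Student": "Eugene", "Task": "Database1"},
    {"Student": "Eugene", "Task": "Compiler1"},
    {"Student": "Sarah",  "Task": "Database1"},
    {"Student": "Sarah",  "Task": "Database2"},
]
db_project = [{"Task": "Database1"}, {"Task": "Database2"}]

# Only Fred and Sarah completed every task in db_project.
result = divide(completed, db_project)
assert result == {(("Student", "Fred"),), (("Student", "Sarah"),)}
```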
Common extensions
In practice the classical relational algebra described above is extended with various operations such as outer joins,
aggregate functions and even transitive closure.
Outer joins
Whereas the result of a join (or inner join) consists of tuples formed by combining matching tuples in the two operands, an outer join contains those tuples and additionally some tuples formed by extending an unmatched tuple in one of the operands by "fill" values for each of the attributes of the other operand. Note that outer joins are not considered part of the classical relational algebra discussed so far.
The operators defined in this section assume the existence of a null value, ω, which we do not define, to be used for
the fill values; in practice this corresponds to the NULL in SQL. In order to make subsequent selection operations on
the resulting table meaningful, a semantic meaning needs to be assigned to nulls; in Codd's approach the
propositional logic used by the selection is extended to a three-valued logic, although we elide those details in this
article.
Three outer join operators are defined: left outer join, right outer join, and full outer join. (The word "outer" is sometimes omitted.)
Left outer join (⟕)
[6]
The left outer join is written as RS where R and S are relations. The result of the left outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition (loosely speaking) to
tuples in R that have no matching tuples in S.
For an example consider the tables Employee and Dept and their left outer join:
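A runnable sketch of such a left outer join, using SQLite through Python's sqlite3 module. The Employee/Dept rows below are assumed sample data standing in for the article's example tables; unmatched Employee rows receive NULL for the Dept attributes.

```python
import sqlite3

# Illustrative stand-ins for the article's Employee and Dept tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Employee (Name TEXT, EmpId INTEGER, DeptName TEXT);
    CREATE TABLE Dept     (DeptName TEXT, Manager TEXT);
    INSERT INTO Employee VALUES ('Harry',   3415, 'Finance'),
                                ('Sally',   2241, 'Sales'),
                                ('George',  3401, 'Finance'),
                                ('Harriet', 2202, 'Production');
    INSERT INTO Dept VALUES ('Sales', 'Harriet'), ('Finance', 'George');
""")
# Every Employee row survives; rows with no matching Dept get NULL "fill" values.
rows = con.execute("""
    SELECT e.Name, e.DeptName, d.Manager
    FROM Employee e LEFT OUTER JOIN Dept d USING (DeptName)
    ORDER BY e.Name
""").fetchall()
```

Harriet's department, Production, has no Dept row, so her Manager attribute comes back as NULL (Python `None`), exactly the ω fill value discussed above.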
Right outer join (⟖)
The right outer join behaves almost identically to the left outer join, but the roles of the tables are switched.
The right outer join of relations R and S is written as R ⟖ S.[7] The result of the right outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have
no matching tuples in R.
For example consider the tables Employee and Dept and their right outer join:
Full outer join (⟗)
The outer join or full outer join in effect combines the results of the left and right outer joins.
The full outer join is written as R ⟗ S[8] where R and S are relations. The result of the full outer join is the set of all
combinations of tuples in R and S that are equal on their common attribute names, in addition to tuples in S that have
no matching tuples in R and tuples in R that have no matching tuples in S in their common attribute names.
For an example consider the tables Employee and Dept and their full outer join:
Aggregation
Furthermore, computing various functions on a column, like summing up its elements, is also not possible using
the relational algebra introduced so far. There are five aggregate functions that are included with most relational
database systems. These operations are Sum, Count, Average, Maximum and Minimum. In relational algebra the
aggregation operation over a schema (A1, A2, …, An) is written as follows:
G1,…,Gm g f1(A1′),…,fk(Ak′) (r)
where each Aj′, 1 ≤ j ≤ k, is one of the original attributes Ai, 1 ≤ i ≤ n.
The attributes preceding the g are grouping attributes, which function like a "group by" clause in SQL. Then
there are an arbitrary number of aggregation functions applied to individual attributes. The operation is applied to an
arbitrary relation r. The grouping attributes are optional, and if they are not supplied, the aggregation functions are
applied to the entire relation to which the operation is applied.
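The aggregation operator corresponds directly to SQL's GROUP BY with aggregate functions. A small sketch via SQLite; the Sales table and its rows are made up for illustration.

```python
import sqlite3

# Grouping attribute: Branch. Aggregation functions: SUM and COUNT.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Sales (Branch TEXT, Amount INTEGER);
    INSERT INTO Sales VALUES ('North', 10), ('North', 30), ('South', 5);
""")
grouped = con.execute(
    "SELECT Branch, SUM(Amount), COUNT(*) FROM Sales "
    "GROUP BY Branch ORDER BY Branch"
).fetchall()

# With no grouping attributes, the aggregates apply to the whole relation.
total = con.execute("SELECT SUM(Amount) FROM Sales").fetchone()[0]
```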
Transitive closure
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural
operators on relations which cannot be expressed by relational algebra. One of them is the transitive closure of a
binary relation. Given a domain D, let binary relation R be a subset of D×D. The transitive closure R⁺ of R is the
smallest subset of D×D containing R which satisfies the following condition:
∀x, y, z ∈ D: ( (x,y) ∈ R⁺ ∧ (y,z) ∈ R⁺ ) ⇒ (x,z) ∈ R⁺
There is no relational algebra expression E(R) taking R as a variable argument which produces R⁺. This can be
proved using the fact that, given a relational expression E for which it is claimed that E(R) = R⁺, where R is a
variable, we can always find an instance r of R (and a corresponding domain d) such that E(r) ≠ r⁺.
SQL, however, has officially supported such fixpoint queries since 1999 (as recursive queries), and it had
vendor-specific extensions in this direction well before that.
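The SQL:1999 fixpoint mechanism is the recursive common table expression (WITH RECURSIVE), which SQLite also supports. A sketch computing the transitive closure of a small, made-up edge relation:

```python
import sqlite3

# Transitive closure of a binary relation via a recursive CTE.
# The Edge relation and its rows are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Edge (src TEXT, dst TEXT);
    INSERT INTO Edge VALUES ('a','b'), ('b','c'), ('c','d');
""")
closure = con.execute("""
    WITH RECURSIVE Reach(src, dst) AS (
        SELECT src, dst FROM Edge
        UNION
        SELECT r.src, e.dst FROM Reach r JOIN Edge e ON r.dst = e.src
    )
    SELECT src, dst FROM Reach ORDER BY src, dst
""").fetchall()
```

The UNION (not UNION ALL) makes the recursion terminate once no new pairs appear, which is exactly the fixpoint computation that plain relational algebra cannot express.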
Use of algebraic properties for query optimization
Queries can be represented as a tree, where
• the internal nodes are operators,
• leaves are relations,
• subtrees are subexpressions.
Our primary goal is to transform expression trees into equivalent expression trees, where the average size of the
relations yielded by subexpressions in the tree is smaller than it was before the optimization. Our secondary goal
is to try to form common subexpressions within a single query, or if there is more than one query being evaluated
at the same time, in all of those queries. The rationale behind the second goal is that it is enough to compute common
subexpressions once, and the results can be used in all queries that contain that subexpression.
Here we present a set of rules that can be used in such transformations.
Selection
Rules about selection operators play the most important role in query optimization. Selection is an operator that very
effectively decreases the number of rows in its operand, so if we manage to move the selections in an expression tree
towards the leaves, the internal relations (yielded by subexpressions) will likely shrink.
Basic selection properties
Selection is idempotent (multiple applications of the same selection have no additional effect beyond the first one),
and commutative (the order selections are applied in has no effect on the eventual result).
1. σ_A(σ_A(R)) = σ_A(R)
2. σ_A(σ_B(R)) = σ_B(σ_A(R))
Breaking up selections with complex conditions
A selection whose condition is a conjunction of simpler conditions is equivalent to a sequence of selections with
those same individual conditions, and a selection whose condition is a disjunction is equivalent to a union of selections.
These identities can be used to merge selections so that fewer selections need to be evaluated, or to split them so that
the component selections may be moved or optimized separately.
1. σ_{A∧B}(R) = σ_A(σ_B(R))
2. σ_{A∨B}(R) = σ_A(R) ∪ σ_B(R)
Selection and cross product
Cross product is the costliest operator to evaluate. If the input relations have N and M rows, the result will contain
N·M rows. Therefore it is very important to do our best to decrease the size of both operands before applying the
cross product operator.
This can be effectively done if the cross product is followed by a selection operator, e.g. σ_A(R × P). Considering the
definition of join, this is the most likely case. If the cross product is not followed by a selection operator, we can try
to push down a selection from higher levels of the expression tree using the other selection rules.
In the above case we break up condition A into conditions B, C and D using the split rules about complex selection
conditions, so that A = B ∧ C ∧ D and B only contains attributes from R, C contains attributes only from P and D contains
the part of A that contains attributes from both R and P. Note that B, C or D are possibly empty. Then the following
holds:
σ_A(R × P) = σ_D(σ_B(R) × σ_C(P))
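A quick check of this pushdown on concrete data. The relations R and P, and the predicates B, C and D, are made up for illustration; both plans yield the same tuples, but the pushed-down plan filters each operand before forming the product.

```python
from itertools import product

R = [(1, 'x'), (2, 'y'), (3, 'x')]   # attributes (r1, r2)
P = [(10, 'x'), (20, 'y')]           # attributes (p1, p2)

B = lambda r: r[0] >= 2              # touches only R
C = lambda p: p[0] <= 15             # touches only P
D = lambda r, p: r[1] == p[1]        # touches attributes of both

# Unoptimized: filter after the full cross product.
naive = [(r, p) for r, p in product(R, P) if B(r) and C(p) and D(r, p)]

# Optimized: shrink both operands first, then apply the join condition D.
pushed = [(r, p)
          for r, p in product([r for r in R if B(r)],
                              [p for p in P if C(p)])
          if D(r, p)]
```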
Selection and set operators
Selection is distributive over the setminus, intersection, and union operators. The following three rules are used to
push selection below set operations in the expression tree. Note, that in the setminus and the intersection operators it
is possible to apply the selection operator to only one of the operands after the transformation. This can make sense
in cases where one of the operands is small, and the overhead of evaluating the selection operator outweighs the
benefits of using a smaller relation as an operand.
1. σ_A(R ∖ P) = σ_A(R) ∖ σ_A(P) = σ_A(R) ∖ P
2. σ_A(R ∩ P) = σ_A(R) ∩ σ_A(P) = σ_A(R) ∩ P = R ∩ σ_A(P)
3. σ_A(R ∪ P) = σ_A(R) ∪ σ_A(P)
Selection and projection
Selection commutes with projection if and only if the fields referenced in the selection condition are a subset of the
fields in the projection. Performing selection before projection may be useful if the operand is a cross product or join.
In other cases, if the selection condition is relatively expensive to compute, moving selection outside the projection
may reduce the number of tuples which must be tested (since projection may produce fewer tuples due to the
elimination of duplicates resulting from omitted fields).
Projection
Basic projection properties
Projection is idempotent, so that a series of (valid) projections is equivalent to the outermost projection.
Projection and set operators
Projection is distributive over set union.
Projection does not distribute over intersection and set difference. Counterexamples are given by:
π_x(R ∩ S) ≠ π_x(R) ∩ π_x(S)   and   π_x(R ∖ S) ≠ π_x(R) ∖ π_x(S)
(take, for instance, R = {(x=1, y=2)} and S = {(x=1, y=3)}).
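The intersection case can be verified on a two-tuple instance. Relations are modelled as Python sets of tuples; the specific values are illustrative.

```python
# R and S agree on attribute x but differ on y, so R ∩ S is empty.
R = {(1, 2)}
S = {(1, 3)}

def project_first(rel):
    """pi over the first attribute only."""
    return {(t[0],) for t in rel}

lhs = project_first(R & S)                 # pi_x(R ∩ S)
rhs = project_first(R) & project_first(S)  # pi_x(R) ∩ pi_x(S)
```

Here the left-hand side is empty while the right-hand side contains (1,), so projection and intersection do not commute.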
Rename
Basic rename properties
Successive renames of a variable can be collapsed into a single rename. Rename operations which have no variables
in common can be arbitrarily reordered with respect to one another, which can be exploited to make successive
renames adjacent so that they can be collapsed.
1. ρ_{a/b}(ρ_{b/c}(R)) = ρ_{a/c}(R)
2. ρ_{a/b}(ρ_{c/d}(R)) = ρ_{c/d}(ρ_{a/b}(R)) if {a, b} ∩ {c, d} = ∅
Rename and set operators
Rename is distributive over set difference, union, and intersection.
1. ρ_{a/b}(R ∖ P) = ρ_{a/b}(R) ∖ ρ_{a/b}(P)
2. ρ_{a/b}(R ∪ P) = ρ_{a/b}(R) ∪ ρ_{a/b}(P)
3. ρ_{a/b}(R ∩ P) = ρ_{a/b}(R) ∩ ρ_{a/b}(P)
Implementations
The first query language to be based on Codd's algebra was ISBL, and this pioneering work has been acclaimed by
many authorities as having shown the way to make Codd's idea into a useful language. Business System 12 was a
short-lived industry-strength relational DBMS that followed the ISBL example.
In 1998 Chris Date and Hugh Darwen proposed a language called Tutorial D, intended for use in teaching relational
database theory, and its query language also draws on ISBL's ideas. Rel is an implementation of Tutorial D.
Even the query language of SQL is loosely based on a relational algebra, though the operands in SQL (tables) are not exactly
relations and several useful theorems about the relational algebra do not hold in the SQL counterpart (arguably to the
detriment of optimisers and/or users). The SQL table model is a bag (multiset), rather than a set. For example, the
expression (R ∪ S) − T = (R − T ) ∪ (S − T) is a theorem for relational algebra on sets, but not for relational
algebra on bags; for a treatment of relational algebra on bags see chapter 5 of the "Complete" textbook by
Garcia-Molina, Ullman and Widom.
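The failing identity can be demonstrated with Python's `collections.Counter` as a bag: `+` models additive bag union (SQL's UNION ALL) and `-` models bag difference, which drops non-positive multiplicities. The element values are illustrative.

```python
from collections import Counter

# One copy of 'a' in each bag.
R = Counter({'a': 1})
S = Counter({'a': 1})
T = Counter({'a': 1})

lhs = (R + S) - T          # (R ∪ S) − T: {'a': 2} minus {'a': 1} leaves one 'a'
rhs = (R - T) + (S - T)    # (R − T) ∪ (S − T): each difference is empty
```

On sets both sides would be empty, so the set-algebra theorem holds there but fails under bag (SQL) semantics.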
References
[1] Serge Abiteboul, Richard Hull, Victor Vianu, Foundations of Databases, Addison-Wesley, 1995, ISBN 0-201-53771-0, pp. 29–33
[2] Serge Abiteboul, Richard Hull, Victor Vianu, Foundations of Databases, Addison-Wesley, 1995, ISBN 0-201-53771-0, pp. 59–63 and p. 71
[3] In Unicode, the bowtie symbol is ⋈ (U+22C8).
[4] In Unicode, the ltimes symbol is ⋉ (U+22C9). The rtimes symbol is ⋊ (U+22CA).
[5] In Unicode, the antijoin symbol is ▷ (U+25B7).
[6] In Unicode, the left outer join symbol is ⟕ (U+27D5).
[7] In Unicode, the right outer join symbol is ⟖ (U+27D6).
[8] In Unicode, the full outer join symbol is ⟗ (U+27D7).
Further reading
Practically any academic textbook on databases has a detailed treatment of the classic relational algebra.
• Imieliński, T.; Lipski, W. (1984). "The relational model of data and cylindric algebras". Journal of Computer and
System Sciences 28: 80–102. doi:10.1016/0022-0000(84)90077-1 (http://dx.doi.org/10.1016/0022-0000(84)90077-1).
(For relationship with cylindric algebras.)
External links
• RAT. Software Relational Algebra Translator to SQL (http://www.slinfo.una.ac.cr/rat/rat.html)
• Lecture Notes: Relational Algebra (http://www.databasteknik.se/webbkursen/relalg-lecture/index.html) – A
quick tutorial to adapt SQL queries into relational algebra
• LEAP – An implementation of the relational algebra (http://leap.sourceforge.net)
• Relational – A graphic implementation of the relational algebra (http://galileo.dmi.unict.it/wiki/relational/)
• Query Optimization (http://www-db.stanford.edu/~widom/cs346/ioannidis.pdf) This paper is an introduction
into the use of the relational algebra in optimizing queries, and includes numerous citations for more in-depth study.
• bandilab.org – neat graphical illustrations of the relational operators (http://bandilab.org/bandicoot-algebra.pdf)
• Relational Algebra System for Oracle and Microsoft SQL Server (http://www.cse.fau.edu/~marty#RADownload)
Relational calculus 53
Relational calculus
Relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are
part of the relational model for databases and provide a declarative way to specify database queries. This is in
contrast to the relational algebra, which is also part of the relational model but provides a more procedural way for
specifying queries.
The relational algebra might suggest these steps to retrieve the phone numbers and names of book stores that supply
Some Sample Book:
1. Join book stores and titles over the BookstoreID.
2. Restrict the result of that join to tuples for the book Some Sample Book.
3. Project the result of that restriction over StoreName and StorePhone.
The relational calculus would formulate a descriptive, declarative way:
Get StoreName and StorePhone for supplies such that there exists a title BK with the same BookstoreID value
and with a BookTitle value of Some Sample Book.
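The same query in SQL, runnable against SQLite. The column names (BookstoreID, BookTitle, StoreName, StorePhone) come from the example above; the table names Bookstore and Book and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Bookstore (BookstoreID INTEGER, StoreName TEXT, StorePhone TEXT);
    CREATE TABLE Book      (BookstoreID INTEGER, BookTitle TEXT);
    INSERT INTO Bookstore VALUES (1, 'City Books', '555-1234'),
                                 (2, 'Readmore',   '555-9876');
    INSERT INTO Book VALUES (1, 'Some Sample Book'), (2, 'Another Book');
""")
# Join over BookstoreID, restrict to the title, project name and phone.
result = con.execute("""
    SELECT s.StoreName, s.StorePhone
    FROM Bookstore s JOIN Book b ON s.BookstoreID = b.BookstoreID
    WHERE b.BookTitle = 'Some Sample Book'
""").fetchall()
```

The query states *what* is wanted, in the calculus style; the join/restrict/project steps above are one way an optimizer might evaluate it.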
The relational algebra and the relational calculus are essentially logically equivalent: for any algebraic expression,
there is an equivalent expression in the calculus, and vice versa. This result is known as Codd's theorem.
References
• Date, Christopher J. (2004). An Introduction to Database Systems (8th ed.). Addison Wesley. ISBN 0-321-19784-4.
Relational database
A relational database is a database that has a collection of tables of data items, all of which is formally described
and organized according to the relational model. The data in a single table represent a relation, from which this type
of database takes its name. In typical solutions, tables may have additionally defined relationships with each other.
In the relational model, each table schema must identify a column or group of columns, called the primary key, to
uniquely identify each row. A relationship can then be established between each row in the table and a row in
another table by creating a foreign key, a column or group of columns in one table that points to the primary key of
another table. The relational model offers various levels of refinement of table organization and reorganization called
database normalization. (See Normalization below.) The database management system (DBMS) of a relational
database is called an RDBMS, and is the software of a relational database.
The relational database was first defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory.
Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules. A relational database has become the
predominant choice in storing data. Other models besides the relational model include the hierarchical database
model and the network model.
Relational database 54
Terminology
Relational database theory uses mathematical terminology, which is roughly equivalent to the SQL database
terminology concerning normalization. The table below summarizes some of the most important relational database
terms and their SQL database equivalents; it was first introduced in 1970 following the work of E. F. Codd.
A row or tuple has a relation schema, but an entire database has a relational schema.
Relational database terminology:
  Row (SQL) = Tuple (relational model): a data set with specific instances in the range of each member
Relations or tables
A relationis defined as a set of tuples that have the same attributes. A tuple usually represents an object and
information about that object. Objects are typically physical objects or concepts. A relation is usually described as a
table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and
conform to the same constraints.
The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose
no order on the attributes. Applications access data by specifying queries, which use operations such as select to
identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the
insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly,
queries identify tuples for updating or deleting.
Tuples by definition are unique. If the tuple contains a candidate or primary key then obviously it is unique;
however, a primary key need not be defined for a row or record to be a tuple. The definition of a tuple requires that it be
unique, but does not require a primary key to be defined. Because a tuple is unique, its attributes by definition
constitute a superkey.
Base and derived relations
In a relational database, all data are stored and accessed via relations. Relations that store data are called "base
relations", and in implementations are called "tables". Other relations do not store data, but are computed by
applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations
these are called "views" or "queries". Derived relations are convenient in that they act as a single relation, even
though they may grab information from several relations. Also, derived relations can be used as an abstraction layer.
Domain
A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of
the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an
element of the specified set. The character data value 'ABC', for instance, is not in the integer domain, but the
integer value 123 is in the integer domain.
Constraints
Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a
given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules
in the database. SQL implements constraint functionality in the form of check constraints. Constraints restrict the data
that can be stored in relations. These are usually defined using expressions that result in a boolean value, indicating
whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting
combinations of attributes) or to an entire relation. Since every attribute has an associated domain, there are
constraints (domain constraints). The two principal rules for the relational model are known as entity integrity and
referential integrity.
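The 1-to-10 restriction mentioned above can be written as a SQL check constraint; a sketch in SQLite, with an illustrative table and column name:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Rating (Score INTEGER CHECK (Score BETWEEN 1 AND 10))")
con.execute("INSERT INTO Rating VALUES (7)")       # satisfies the constraint

try:
    con.execute("INSERT INTO Rating VALUES (42)")  # violates it
    violated = False
except sqlite3.IntegrityError:
    violated = True

count = con.execute("SELECT COUNT(*) FROM Rating").fetchone()[0]
```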
Primary keys
A primary key uniquely specifies a tuple within a table. In order for an attribute to be a good primary key it must not
repeat. While natural attributes (attributes used to describe the data being entered) are sometimes good primary keys,
surrogate keys are often used instead. A surrogate key is an artificial attribute assigned to an object which uniquely
identifies it (for instance, in a table of information about students at a school they might all be assigned a student ID
in order to differentiate them). The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its
ability to uniquely identify a tuple. Another common occurrence, especially with regard to N:M cardinality, is the
composite key. A composite key is a key made up of two or more attributes within a table that (together) uniquely
identify a record. (For example, in a database relating students, teachers, and classes, classes could be uniquely
identified by a composite key of their room number and time slot, since no other class could have exactly the same
combination of attributes. In fact, use of a composite key such as this can be a form of data verification, albeit a
weak one.)
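The room-number-plus-time-slot example can be sketched as a composite primary key in SQLite; table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Class (
        Room     INTEGER,
        TimeSlot TEXT,
        Teacher  TEXT,
        PRIMARY KEY (Room, TimeSlot)   -- the two attributes together form the key
    )
""")
con.execute("INSERT INTO Class VALUES (101, 'Mon 9:00',  'Smith')")
con.execute("INSERT INTO Class VALUES (101, 'Mon 10:00', 'Jones')")  # same room, new slot: allowed

try:
    # Same room AND same slot duplicates the composite key.
    con.execute("INSERT INTO Class VALUES (101, 'Mon 9:00', 'Brown')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```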
Foreign key
A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key
can be used to cross-reference tables. Foreign keys need not have unique values in the referencing relation. Foreign
keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes
in the referencing relation. A foreign key could be described formally as: "For all tuples in the referencing relation
projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same
attributes such that the values in each of the referencing attributes match the corresponding values in the referenced
attributes."
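The formal statement above can be exercised in SQLite, which enforces foreign keys only after `PRAGMA foreign_keys = ON`. Table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite disables enforcement by default
con.executescript("""
    CREATE TABLE Dept (DeptID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee (
        EmpID  INTEGER PRIMARY KEY,
        DeptID INTEGER REFERENCES Dept(DeptID)
    );
    INSERT INTO Dept VALUES (1, 'Finance');
    INSERT INTO Employee VALUES (100, 1);   -- referenced Dept row exists
""")
try:
    con.execute("INSERT INTO Employee VALUES (101, 99)")  # no Dept 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert fails because no tuple in the referenced relation matches DeptID 99, which is exactly the restriction described above.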
Stored procedures
A stored procedure is executable code that is associated with, and generally stored in, the database. Stored procedures
usually collect and customize common operations, like inserting a tuple into a relation, gathering statistical
information about usage patterns, or encapsulating complex business logic and calculations. Frequently they are used
as an application programming interface (API) for security or simplicity. Implementations of stored procedures on
SQL RDBMSs often allow developers to take advantage of procedural extensions (often vendor-specific) to
the standard declarative SQL syntax. Stored procedures are not part of the relational database model, but all
commercial implementations include them.
Index
An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on
a relation. Queries that filter using those attributes can find matching tuples directly via the index, without having
to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the
information you are looking for is found, so that you do not have to read the entire book to find what you are looking
for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some
combination of data distribution, relation size, and typical access pattern. Indices are usually implemented via
B+trees, R-trees, and bitmaps. Indices are usually not considered part of the database, as they are considered an
implementation detail, though indices are usually maintained by the same group that maintains the other parts of the
database. Note that efficient indexes on both primary and foreign keys can dramatically improve
query performance. This is because B-tree indexes result in query times proportional to log(n) where n is the number
of rows in a table and hash indexes result in constant time queries (no size dependency so long as the relevant part of
the index fits into memory).
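A small sketch of index creation in SQLite, using EXPLAIN QUERY PLAN to confirm that an equality filter is answered via the index rather than by scanning every tuple. Table and index names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Person (Id INTEGER, Surname TEXT);
    CREATE INDEX idx_person_surname ON Person (Surname);
""")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Person WHERE Surname = 'Codd'"
).fetchall()
# The plan's detail column names the index when the planner uses it.
uses_index = any("idx_person_surname" in row[-1] for row in plan)
```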
Relational operations
Queries made against the relational database, and the derived relvars in the database, are expressed in a relational
calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators
in two groups of four operators each. The first four operators were based on the traditional mathematical set
operations:
• The union operator combines the tuples of two relations and removes all duplicate tuples from the result. The
relational union operator is equivalent to the SQL UNION operator.
• The intersection operator produces the set of tuples that two relations share in common. Intersection is
implemented in SQL in the form of the INTERSECT operator.
• The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist
in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
• The cartesian product of two relations is a join that is not restricted by any criteria, resulting in every tuple of the
first relation being matched with every tuple of the second relation. The cartesian product is implemented in SQL
as the CROSS JOIN operator.
The remaining operators proposed by Codd involve special operations specific to relational databases:
• The selection, or restriction, operation retrieves tuples from a relation, limiting the results to only those that meet
a specific criterion, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query
statement with a WHERE clause.
• The projection operation extracts only the specified attributes from a tuple or set of tuples.
• The join operation defined for relational databases is often referred to as a natural join. In this type of join, two
relations are connected by their common attributes. SQL's approximation of a natural join is the INNER JOIN
operator. In SQL, an INNER JOIN prevents a cartesian product from occurring when there are two tables in a
query. For each table added to a SQL query, one additional INNER JOIN is added to prevent a cartesian product.
Thus, for N tables in a SQL query, there must be N-1 INNER JOINs to prevent a cartesian product.
• The relational division operation is a slightly more complex operation, which involves essentially using the tuples
of one relation (the dividend) to partition a second relation (the divisor). The relational division operator is
effectively the opposite of the cartesian product operator (hence the name).
Other operators have been introduced or proposed since Codd's introduction of the original eight including relational
comparison operators and extensions that offer support for nesting and hierarchical data, among others.
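The four set-based operators above can be sketched against SQLite in their SQL forms; the relations R and S and their rows are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE R (x INTEGER);
    CREATE TABLE S (x INTEGER);
    INSERT INTO R VALUES (1), (2), (3);
    INSERT INTO S VALUES (2), (3), (4);
""")
union_rows     = con.execute("SELECT x FROM R UNION SELECT x FROM S ORDER BY x").fetchall()
intersect_rows = con.execute("SELECT x FROM R INTERSECT SELECT x FROM S ORDER BY x").fetchall()
except_rows    = con.execute("SELECT x FROM R EXCEPT SELECT x FROM S ORDER BY x").fetchall()
cross_count    = con.execute("SELECT COUNT(*) FROM R CROSS JOIN S").fetchone()[0]
```

With 3 rows in each operand, the cross join yields 3 × 3 = 9 tuples, while UNION removes duplicates as the set semantics require.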
Normalization
Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of
procedures designed to eliminate nonsimple domains (non-atomic values) and the redundancy (duplication) of data,
which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of
normalization applied to databases are called the normal forms.
References
Relational database management system
A relational database management system (RDBMS) is a database management system (DBMS) that is based on the
relational model as introduced by E. F. Codd, of IBM's San Jose Research Laboratory. Many popular databases
currently in use are based on the relational database model.
RDBMSs have become since the 1980s a predominant choice for the storage of information in new databases used
for financial records, manufacturing and logistical information, personnel data, and much more. Relational databases
have often replaced legacy hierarchical databases and network databases because they are easier to understand and
use. However, relational databases have been challenged by object databases, which were introduced in an attempt
to address the object-relational impedance mismatch in relational databases, and by XML databases.
Market share
According to research company Gartner, the five leading commercial relational database vendors by revenue in 2011
were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%).
The three leading open source implementations are MySQL, PostgreSQL, and SQLite. MariaDB is a prominent fork
of MySQL prompted by Oracle's acquisition of MySQL AB.
According to Gartner, in 2008, the percentage of database sites using any given technology were as follows (a given
site may deploy multiple technologies):
• Oracle Database - 70%
• Microsoft SQL Server - 68%
• MySQL (Oracle Corporation) - 50%
• IBM DB2 - 39%
• IBM Informix - 18%
• SAP Sybase Adaptive Server Enterprise - 15%
• SAP Sybase IQ - 14%
• Teradata - 11%
According to DB-Engines, the most popular systems are Oracle, MySQL, Microsoft SQL Server, PostgreSQL and
IBM DB2.
Relational database management system 58
History
In 1974, IBM began developing System R, a research project to develop a prototype RDBMS. Its first commercial
product was SQL/DS, released in 1981. However, the first commercially available RDBMS was Oracle, released in
1979 by Relational Software, now Oracle Corporation. Other examples of an RDBMS include DB2, SAP Sybase
ASE, and Informix.
Historical usage of the term
The term "relational database" was invented by E. F. Codd at IBM in 1970. Codd introduced the term in his seminal
paper "A Relational Model of Data for Large Shared Data Banks".[1] In this paper and later papers, he defined what
he meant by "relational". One well-known definition of what constitutes a relational database system is composed of
Codd's 12 rules. However, many of the early implementations of the relational model did not conform to all of
Codd's rules, so the term gradually came to describe a broader class of database systems, which at a minimum:
• Present the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each
table consisting of a set of rows and columns);
• Provide relational operators to manipulate the data in tabular form.
The first systems that were relatively faithful implementations of the relational model were from the University of
Michigan (Micro DBMS, 1969), the Massachusetts Institute of Technology (1971),[2] and the IBM UK Scientific
Centre at Peterlee (IS1, 1970–72, and its follow-on PRTV, 1973–79). The first system sold as an RDBMS was
Multics Relational Data Store, first sold in 1978. Others have been Berkeley Ingres QUEL and IBM BS12. The most
popular definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even
if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not
all of Codd's 12 rules. A second school of thought argues that if a database does not implement all of Codd's rules (or
the current understanding of the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it
is not relational. This view, shared by many theorists and other strict adherents to Codd's principles, would disqualify
most DBMSs as not relational. For clarification, they often refer to some RDBMSs as Truly-Relational Database
Management Systems (TRDBMS), naming others Pseudo-Relational Database Management Systems (PRDBMS).
As of 2009, most commercial relational DBMSes employ SQL as their query language. Alternative
query languages have been proposed and implemented, notably the pre-1996 implementation of Berkeley Ingres
QUEL.
References
[1] "A Relational Model of Data for Large Shared Data Banks" (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf)
[2] SIGFIDET '74: Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) workshop on Data description, access and control
Relational model 59
Relational model
The relational model for database management is a database model based on first-order predicate logic, first
formulated and proposed in 1969 by Edgar F. Codd.[1] In the relational model of a database, all data is represented in
terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.
The purpose of the relational model is to provide a declarative method for specifying data and queries: users directly
state what information the database contains and what information they want from it, and let the database
management system software take care of describing data structures for storing the data and retrieval procedures for
answering queries.
Overview
The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate
variables, describing constraints on the possible values and combinations of values. The content of the database at
any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that
all predicates are satisfied. A request for information from the database (a database query) is also a predicate.
Alternatives to the relational model
Other models are the hierarchical model and network model. Some systems using these older architectures are still
in use today in data centers with high data volume needs, or where existing systems are so complex and abstract it
would be cost-prohibitive to migrate to systems employing the relational model; also of note are newer
object-oriented databases.
(Figure: Relational model concepts.)
Implementation
There have been several attempts to produce a true implementation of the relational database model as originally
defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far. Rel is one
of the more recent attempts to do this.
The relational model was the first database model to be described in formal mathematical terms. Hierarchical and
network databases existed before relational databases, but their specifications were relatively informal. After the
relational model was defined, there were many attempts to compare and contrast the different models, and this led to
the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data
manipulation interfaces for hierarchical and network databases limited the scope for formalization.
History
The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and
developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and
Darwen show how the relational model can accommodate certain desired object-oriented features.
Controversies
Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing
or NULL) version of it to deal with missing information, and in his The Relational Model for Database Management
Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but
Inapplicable) version. But these have never been implemented, presumably because of attending complexity. SQL's
NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in
the standard and in its implementations.
Relational model topics
The model
The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition: either true or false (and in particular no third value such as unknown or not applicable, either of which are often associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational algebra.
Relational model 61
Interpretation
To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of the extension of some predicate, this being the set of true propositions that can be formed by replacing each free variable in that predicate by a name (a term that designates something).
There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the
relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting
each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the
relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation, but which does not appear in the body, is deemed to be false. This assumption is known as the closed world assumption: it is often violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills cannot necessarily be taken as evidence that John does not speak Spanish.
For a formal exposition of these ideas, see the section Set-theoretic formulation, below.
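These notions can be sketched in a few lines of Python, treating a relation body as a set of attribute-to-value mappings, with the closed world assumption deciding truth by set membership. The "Speaks" relation and its data below are invented for the illustration:

```python
# Each tuple is a mapping from attribute names to values, frozen so that it
# can live in a set (a relation body is a *set* of tuples, not a list of rows).
def tuple_(**attrs):
    return frozenset(attrs.items())

# Body of a hypothetical "Speaks" relation.
speaks = {
    tuple_(person="John", language="English"),
    tuple_(person="Mary", language="Spanish"),
}

def holds(body, **attrs):
    """Closed world assumption: a proposition is true iff its tuple is in the body."""
    return tuple_(**attrs) in body

print(holds(speaks, person="John", language="English"))  # True
print(holds(speaks, person="John", language="Spanish"))  # False: absence is read as falsity
```

As the surrounding text notes, this reading of absence-as-falsity is exactly what practical databases often cannot guarantee.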
Application to databases
A data type as used in a typical relational database might be the set of integers, the set of character strings, the set of dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined types in addition to the built-in ones provided by the system.
Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A table data structure is specified as a list of column definitions, each of which specifies a unique column name and the type of the values that are permitted for that column. An attribute value is the entry in a specific column and row, such as "John Doe" or "35".
A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never by its ordinal position within the tuple.) An attribute name might be "name" or "age".
A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure. The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar (relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified in the table declaration, and its body is that most recently assigned to it by invoking some update operator (typically INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the heading is not always a set of column definitions as described above, because it is possible for a column to have no name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in SQL it is possible for the same row to appear more than once in the same body.)
SQL and the relational model
SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several
places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts.
However, it is possible to create a database conforming to the relational model using SQL if one does not use certain
SQL features.
The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.
Duplicate rows
The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a
relation.
Anonymous columns
A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational
model requires every attribute to be named and referenceable.
Duplicate column names
Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on
account of the obvious ambiguity. The relational model requires every attribute to be referenceable.
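The duplicate-rows deviation is easy to observe in practice. A small demonstration with Python's standard-library sqlite3 module (the table and data are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")  # no key declared: duplicates allowed
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (1,)])

rows = conn.execute("SELECT a FROM t").fetchall()
print(rows)       # [(1,), (1,)] -- an SQL table may hold the same row twice
print(set(rows))  # {(1,)}       -- a relation, being a set, cannot
```

Declaring a key on every table (or using SELECT DISTINCT) restores set semantics, which is why keyed tables are closer to true relations.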
Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many Web sites, such as Wikipedia, perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The
simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return
just the answer wanted.
Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all
possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In
practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of
techniques.
There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators, many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and if you permit relation values as attributes (RVA, relation-valued attribute), then operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.
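As a rough illustration (not tied to any particular DBMS), the core operators can be sketched over relations modeled as sets of frozen attribute/value pairs; the relation and attribute names below are invented for the example. Union, difference, and intersect then come for free from Python's set operations:

```python
# A relation is a set of tuples; a tuple is a frozenset of (attribute, value) pairs.

def project(relation, attrs):
    """Eliminate all columns except those named in attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def restrict(relation, pred):
    """Eliminate the rows for which pred(row_as_dict) is false."""
    return {t for t in relation if pred(dict(t))}

def join(r, s):
    """Natural join: Cartesian product filtered to tuples agreeing on common attributes."""
    out = set()
    for t in r:
        for u in s:
            td, ud = dict(t), dict(u)
            if all(td[a] == ud[a] for a in td.keys() & ud.keys()):
                out.add(frozenset({**td, **ud}.items()))
    return out

employee = {frozenset({"name": "Ann", "dept": "D1"}.items()),
            frozenset({"name": "Bob", "dept": "D2"}.items())}
dept = {frozenset({"dept": "D1", "city": "Oslo"}.items())}

print(project(employee, {"name"}))                     # two one-column tuples: Ann, Bob
print(restrict(employee, lambda t: t["dept"] == "D1")) # only Ann's tuple survives
print(join(employee, dept))                            # only Ann matches D1/Oslo
```

Real systems do not materialize the full Cartesian product, of course; as the text notes, the optimizer rewrites such queries into faster equivalent plans.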
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database
designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades).
This has made the idea and implementation of relational databases very popular with businesses.
Database normalization
Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first
normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no
modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form,
and the database cannot meet the requirements for higher level normal forms without first having met all the
requirements of the lesser normal forms.[4]
Examples
Database
An idealized, very simple example of a description of some relvars (relation variables) and their attributes:
• Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email)
• Order (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
• Order Line (Order No, Order Line No, Product Code, Qty)
• Invoice (Invoice No, Customer ID, Order No, Date, Status)
• Invoice Line (Invoice No, Invoice Line No, Product Code, Qty Shipped)
• Product (Product Code, Product Description)
In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold,
underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.
Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other
candidate keys, which are then called alternate keys.
A candidate key is a unique identifier enforcing that no tuple will be duplicated; duplication would make the relation into something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which include candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed to a relvar.
Customer relation

Customer ID | Tax ID | Name | Address | [More fields…]
If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction such as this that would render the database inconsistent by a violation of an integrity constraint.
Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in another relation. For example, in the Order relation the attribute Customer ID is a foreign key. A join is the operation that draws on information from several relations at once. By joining relvars from the example above we could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific customer, we would specify this using a restriction condition.
If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.
There is a flaw in our database design above. The Invoice relvar contains an Order No attribute. So, each tuple in the
Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in reality
an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar
contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always
true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice.
In other words there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the database a new relvar should be introduced whose role is to specify the correspondence between Orders and Invoices:
OrderInvoice (Order No, Invoice No)
Now, the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Invoice relvar. If we want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in Invoice.
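A minimal sketch of the resolved many-to-many design, run through Python's standard-library sqlite3 module. The table names follow the example; the column types and sample data are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order"  (OrderNo   INTEGER PRIMARY KEY);
    CREATE TABLE Invoice  (InvoiceNo INTEGER PRIMARY KEY);
    -- Junction relvar resolving the many-to-many relationship.
    CREATE TABLE OrderInvoice (
        OrderNo   INTEGER REFERENCES "Order"(OrderNo),
        InvoiceNo INTEGER REFERENCES Invoice(InvoiceNo),
        PRIMARY KEY (OrderNo, InvoiceNo)
    );
""")
conn.executemany('INSERT INTO "Order" VALUES (?)', [(1,), (2,)])
conn.executemany("INSERT INTO Invoice VALUES (?)", [(10,), (11,)])
# Invoice 10 covers both orders; order 1 is also billed on invoice 11.
conn.executemany("INSERT INTO OrderInvoice VALUES (?, ?)",
                 [(1, 10), (2, 10), (1, 11)])

# Every Invoice for Order 1, via the junction table.
rows = conn.execute("""
    SELECT i.InvoiceNo
    FROM "Order" o
    JOIN OrderInvoice oi ON oi.OrderNo  = o.OrderNo
    JOIN Invoice      i  ON i.InvoiceNo = oi.InvoiceNo
    WHERE o.OrderNo = 1
    ORDER BY i.InvoiceNo
""").fetchall()
print(rows)  # [(10,), (11,)]
```

(Order is a reserved word in SQL, hence the quoting; a production schema would likely pick a different name.)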
Set-theoretic formulation
Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use the variables r, s, t and a, b, c to range over them. Another basic notion is the set D of atomic values that contains values such as numbers and strings.
Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:
Tuple
A tuple is a partial function from attribute names to atomic values.
Header
A header is a finite set of attribute names.
Projection
The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.
Relation
A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.
Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic except that here we identify the places in the predicate with attribute names. Usually in the relational model a database schema is said to consist of a set of relation names, the headers that are associated with these names and the constraints that should hold for every instance of the database schema.
Relation universe
A relation universe U over a header H is a non-empty set of relations with header H.
Relation schema
A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H. A relation satisfies a relation schema (H, C) if it has header H and satisfies C.
Key constraints and functional dependencies
One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every instance of a certain relational schema the tuples can be identified by their values for certain attributes.
Superkey
A superkey is written as a finite set of attribute names. A superkey K holds in a relation (H, B) if:
• K ⊆ H and
• there exist no two distinct tuples t1, t2 ∈ B such that t1[K] = t2[K].
A superkey holds in a relation universe U if it holds in all relations in U.
Theorem: A superkey K holds in a relation universe U over H if and only if K ⊆ H and K → H holds in U.
Candidate key
A superkey K holds as a candidate key for a relation universe U if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.
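These two definitions translate almost literally into code. A brute-force sketch (the "people" relation and its attribute names are invented for the example):

```python
from itertools import combinations

def is_superkey(K, body):
    """K (a set of attribute names) is a superkey of a relation body
    (a list of dicts) if no two distinct tuples agree on all of K."""
    seen = set()
    for t in body:
        key = tuple(t[a] for a in sorted(K))
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(header, body):
    """Candidate keys: superkeys none of whose proper subsets are superkeys.
    Enumerates subsets smallest-first; fine for small headers only."""
    found = []
    for n in range(1, len(header) + 1):
        for K in combinations(sorted(header), n):
            K = set(K)
            if is_superkey(K, body) and not any(c < K for c in found):
                found.append(K)
    return found

people = [
    {"id": 1, "email": "a@x.org", "name": "Ann"},
    {"id": 2, "email": "b@x.org", "name": "Ann"},
]
print(candidate_keys({"id", "email", "name"}, people))
# [{'email'}, {'id'}] -- "name" alone repeats, so it is not a key
```

Note that this checks one relation (one body), whereas the definitions quantify over a whole relation universe; a key that happens to hold in today's data need not hold as a constraint.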
Functional dependency
A functional dependency (FD) is written as X → Y for X, Y finite sets of attribute names. A functional dependency X → Y holds in a relation (H, B) if:
• X, Y ⊆ H and
• for all tuples t1, t2 ∈ B, t1[X] = t2[X] implies t1[Y] = t2[Y].
A functional dependency holds in a relation universe U if it holds in all relations in U.
Trivial functional dependency
A functional dependency is trivial under a header H if it holds in all relation universes over H.
Theorem: An FD X → Y is trivial under a header H if and only if Y ⊆ X ⊆ H.
Closure
Armstrong's axioms: The closure of a set S of FDs under a header H, written S⁺, is the smallest superset of S such that:
• X → Y ∈ S⁺ if Y ⊆ X ⊆ H (reflexivity),
• X → Y ∈ S⁺ and Y → Z ∈ S⁺ implies X → Z ∈ S⁺ (transitivity) and
• X → Y ∈ S⁺ and Z ⊆ H implies X ∪ Z → Y ∪ Z ∈ S⁺ (augmentation).
Theorem: Armstrong's axioms are sound and complete; given a header H and a set S of FDs that only contain subsets of H, X → Y ∈ S⁺ if and only if X → Y holds in all relation universes over H in which all FDs in S hold.
Completion
The completion of a finite set of attributes X under a finite set of FDs S, written as X⁺, is the smallest superset of X such that:
• Y → Z ∈ S and Y ⊆ X⁺ implies Z ⊆ X⁺.
The completion of an attribute set can be used to compute if a certain dependency is in the closure of a set of FDs.
Theorem: Given a set S of FDs, X → Y ∈ S⁺ if and only if Y ⊆ X⁺.
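The completion (often called the attribute closure) is straightforward to compute by fixed-point iteration; the FDs below are invented for illustration:

```python
def completion(X, fds):
    """Smallest superset of X such that Y -> Z in fds and Y ⊆ X+ implies Z ⊆ X+."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Y, Z in fds:
            if Y <= closure and not Z <= closure:
                closure |= Z
                changed = True
    return closure

# Toy FDs: A -> B and B -> C.
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(completion({"A"}, fds))  # contains A, B and C
# So by the completion theorem, A -> C is in the closure: {'C'} ⊆ {'A'}+.
```

This loop runs in polynomial time, whereas materializing S⁺ itself can be exponentially large, which is why the completion is the usual computational tool.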
Irreducible cover
An irreducible cover of a set S of FDs is a set T of FDs such that:
• S⁺ = T⁺,
• there exists no U ⊂ T such that S⁺ = U⁺,
• X → Y ∈ T implies Y is a singleton set and
• X → Y ∈ T and Z ⊂ X implies Z → Y ∉ S⁺.
Algorithm to derive candidate keys from functional dependencies

INPUT:  a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
        all relation universes over H in which all FDs in S hold
begin
  C := ∅;        // found candidate keys
  Q := { H };    // superkeys that contain candidate keys
  while Q <> ∅ do
    let K be some element from Q;
    Q := Q – { K };
    minimal := true;
    for each X -> Y in S do
      K' := (K – Y) ∪ X;    // derive new superkey
      if K' ⊂ K then
        minimal := false;
        Q := Q ∪ { K' };
      end if
    end for
    if minimal and there is not a subset of K in C then
      remove all supersets of K from C;
      C := C ∪ { K };
    end if
  end while
end
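A direct Python transcription of the pseudocode above, with frozensets standing in for the attribute sets (the sample header and FDs are invented):

```python
def candidate_keys(header, fds):
    """Derive candidate keys from FDs, following the algorithm above.
    header: set of attribute names; fds: iterable of (X, Y) pairs."""
    header = frozenset(header)
    fds = [(frozenset(x), frozenset(y)) for x, y in fds]
    C = set()         # found candidate keys
    Q = {header}      # superkeys that contain candidate keys
    while Q:
        K = Q.pop()
        minimal = True
        for X, Y in fds:
            K2 = (K - Y) | X              # derive new superkey
            if K2 < K:                    # proper subset: K was not minimal
                minimal = False
                Q.add(K2)
        if minimal and not any(c <= K for c in C):
            C = {c for c in C if not c >= K}   # remove supersets of K
            C.add(K)
    return C

# Header {A, B, C} with A -> B and B -> C: the only candidate key is {A}.
print(candidate_keys({"A", "B", "C"}, [({"A"}, {"B"}), ({"B"}, {"C"})]))
```

The superset-removal step matters: depending on pop order the loop may provisionally accept a non-minimal superkey, which a later, smaller key then evicts.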
References
[1] E.F. Codd, "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", IBM Research Report, 1969.
[2] Data Integration Glossary (http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf), U.S. Department of Transportation, August 2001.
[3] E.F. Codd, The Relational Model for Database Management, Addison-Wesley Publishing Company, 1990, ISBN 0-201-14192-2.
[4] David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130–144.
Further reading
• Date, C. J.; Darwen, Hugh (2000). Foundation for Future Database Systems: The Third Manifesto; a detailed study of the impact of type theory on the relational model of data, including a comprehensive model of type inheritance (2nd ed.). Reading, Mass.: Addison-Wesley. ISBN 0-201-70928-7.
• Date, C. J. (2007). An Introduction to Database Systems (8th ed.). Boston: Pearson Education. ISBN 0-321-19784-4.
External links
• Feasibility of a set-theoretic data structure: a general structure based on a reconstituted definition of relation (http://hdl.handle.net/2027.42/4164) (Childs' 1968 research cited by Codd's 1970 paper)
• The Third Manifesto (TTM) (http://www.thethirdmanifesto.com/)
• Relational Databases (http://www.dmoz.org/Computers/Software/Databases/Relational/) at the Open Directory Project
• Relational Model (http://c2.com/cgi/wiki?RelationalModel)
• Binary relations and tuples compared with respect to the semantic web (http://blogs.sun.com/bblfish/entry/why_binary_relations_beat_tuples)
Object-relational database 69
Object-relational database
An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language. In addition, just as with pure relational systems, it supports extension of the data model with custom data types and methods.
An object-relational database can be said to provide a middle ground between relational databases and object-oriented databases (OODBMS). In object-relational databases, the approach is essentially that of relational databases: the data resides in the database and is manipulated collectively with queries in a query language; at the other extreme are OODBMSes, in which the database is essentially a persistent object store for software written in an object-oriented programming language, with a programming API for storing and retrieving objects, and little or no specific support for querying.
Polymorphism in this context means "one interface, many implementations". The other OOP principles, inheritance and encapsulation, are related to both methods and attributes. Method inheritance is included in type inheritance. Encapsulation in OOP is a declared degree of visibility, expressed for example through the PUBLIC, PRIVATE and PROTECTED modifiers.
History
Object-relational database management systems grew out of research that occurred in the early 1990s. That research
extended existing relational database concepts by adding object concepts. The researchers aimed to retain a
declarative query-language based on predicate calculus as a central component of the architecture. Probably the most
notable research project, Postgres (UC Berkeley), spawned two products tracing their lineage to that research: Illustra
and PostgreSQL.
In the mid-1990s, early commercial products appeared.[1] These included Illustra (Illustra Information Systems,
acquired by Informix Software which was in turn acquired by IBM), Omniscience (Omniscience Corporation,
acquired by Oracle Corporation and became the original Oracle Lite), and UniSQL (UniSQL, Inc., acquired by
KCOMS). Ukrainian developer Ruslan Zasukhin, founder of Paradigma Software, Inc., developed and shipped the
first version of Valentina database in the mid-1990s as a C++SDK. By the next decade, PostgreSQL had become a
commercially viable database and is the basis for several products today which maintain its ORDBMS features.
Computer scientists came to refer to these products as "object-relational database management systems" or ORDBMSs.[2]
Many of the ideas of early object-relational database efforts have largely become incorporated into SQL:1999 via structured types. In fact, any product that adheres to the object-oriented aspects of SQL:1999 could be described as an object-relational database management product. For example, IBM's DB2, Oracle Database, and Microsoft SQL Server make claims to support this technology and do so with varying degrees of success.
Comparison to RDBMS
An RDBMS might commonly involve SQL statements such as these:

CREATE TABLE Customers (
    Id        CHAR(12)    NOT NULL PRIMARY KEY,
    Surname   VARCHAR(32) NOT NULL,
    FirstName VARCHAR(32) NOT NULL,
    DOB       DATE        NOT NULL
);

SELECT InitCap(Surname) || ', ' || InitCap(FirstName)
FROM Customers
WHERE Month(DOB) = Month(getdate())
  AND Day(DOB) = Day(getdate());
In an object-relational database, one might see something like this, with user-defined data types and expressions such as BirthDay():

CREATE TABLE Customers (
    Id Cust_Id NOT NULL PRIMARY KEY,
    ...
);

SELECT Formal(C.Id)
FROM Customers C
WHERE BirthDay(C.DOB) = TODAY;
The object-relational model can offer another advantage in that the database can make use of the relationships between data to easily collect related records. In an address book application, an additional table would be added to the ones above to hold zero or more addresses for each customer. Using a traditional RDBMS, collecting information for both the user and their address requires a join.
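The join referred to is not shown above; a hypothetical version, with an invented Addresses table, might look like this (demonstrated through Python's standard-library sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        Id      INTEGER PRIMARY KEY,
        Surname TEXT NOT NULL
    );
    -- Hypothetical side table: zero or more addresses per customer.
    CREATE TABLE Addresses (
        CustomerId INTEGER REFERENCES Customers(Id),
        Address    TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO Customers VALUES (1, 'Doe')")
conn.executemany("INSERT INTO Addresses VALUES (?, ?)",
                 [(1, '12 Main St'), (1, '9 Harbour Rd')])

# Collecting a customer together with all their addresses requires a join.
rows = conn.execute("""
    SELECT c.Surname, a.Address
    FROM Customers c
    JOIN Addresses a ON a.CustomerId = c.Id
    ORDER BY a.Address
""").fetchall()
print(rows)  # [('Doe', '12 Main St'), ('Doe', '9 Harbour Rd')]
```

In an ORDBMS, a collection-valued or reference-valued attribute could let the same data be fetched without writing the join explicitly.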
External links
• Savushkin, Sergey (2003), A Point of View on ORDBMS (http://savtechno.com/articles/ViewOfORDBMS.html), retrieved 2012-07-21.
• JPA Performance Benchmark (http://www.jpab.org/), a comparison of Java JPA ORM products (Hibernate, EclipseLink, OpenJPA, DataNucleus).
• PolePosition Benchmark (http://www.polepos.org/), shows the performance trade-offs for solutions in the object-relational impedance mismatch context.
Transaction processing 72
Transaction processing
In computer science, transaction processing is information processing that is divided into individual, indivisible operations, called transactions. Each transaction must succeed or fail as a complete unit; it cannot remain in an intermediate state.
Since most, though not necessarily all, transaction processing today is interactive, the term is often treated as synonymous with online transaction processing.
Description
Transaction processing is designed to maintain a system's integrity (typically a database or some modern filesystems) in a known, consistent state, by ensuring that interdependent operations on the system are either all completed successfully or all canceled successfully.
For example, consider a typical banking transaction that involves moving $700 from a customer's savings account to
a customer's checking account. This transaction involves at least two separate operations in computer terms: debiting
the savings account by $700, and crediting the checking account by $700. If one operation succeeds but the other
does not, the books of the bank will not balance at the end of the day. There must therefore be a way to ensure that
either both operations succeed or both fail, so that there is never any inconsistency in the bank's database as a whole.
Transaction processing links multiple individual operations in a single, indivisible transaction, and ensures that either all operations in a transaction are completed without error, or none of them are. If some of the operations are completed but errors occur when the others are attempted, the transaction-processing system "rolls back" all of the operations of the transaction (including the successful ones), thereby erasing all traces of the transaction and restoring the system to the consistent, known state that it was in before processing of the transaction began. If all operations of a transaction are completed successfully, the transaction is committed by the system, and all changes to the database are made permanent; the transaction cannot be rolled back once this is done.
Transaction processing guards against hardware and software errors that might leave a transaction partially
completed. If the computer system crashes in the middle of a transaction, the transaction processing system
guarantees that all operations in any uncommitted transactions are cancelled.
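This all-or-nothing behavior can be demonstrated with Python's standard-library sqlite3 module; the account table and amounts below are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY,"
             " balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("savings", 700), ("checking", 0)])
conn.commit()

try:
    # Transfer more than the savings account holds: the debit violates
    # the CHECK constraint, so the whole transaction must be undone.
    conn.execute("UPDATE accounts SET balance = balance + 800 WHERE name = 'checking'")
    conn.execute("UPDATE accounts SET balance = balance - 800 WHERE name = 'savings'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()  # erase every trace, including the successful credit

print(dict(conn.execute("SELECT name, balance FROM accounts")))
# {'savings': 700, 'checking': 0}: the partial credit did not survive
```

Had the credit been allowed to stand while the debit failed, the books would not balance, which is exactly the inconsistency the text describes.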
Generally, transactions are issued concurrently. If they overlap (i.e. need to touch the same portion of the database),
this can create conflicts. For example, if the customer mentioned in the example above has $150 in his savings
account and attempts to transfer $100 to a different person while at the same time moving $100 to the checking
account, only one of them can succeed. However, forcing transactions to be processed sequentially is inefficient.
Therefore, concurrent implementations of transaction processing are programmed to guarantee that the end result reflects a conflict-free outcome, the same as could be reached if the transactions were executed sequentially in any order (a property called serializability). In our example, this means that no matter which transaction was issued first, either the transfer to a different person or the move to the checking account succeeds, while the other one fails.
Methodology
The basic principles of all transaction-processing systems are the same. However, the terminology may vary from one transaction-processing system to another, and the terms used below are not necessarily universal.
Rollback
Transaction-processing systems ensure database integrity by recording intermediate states of the database as it is
modified, then using these records to restore the database to a known state if a transaction cannot be committed. For
example, copies of information on the database prior to its modification by a transaction are set aside by the system
before the transaction can make any modifications (this is sometimes called a before image). If any part of the
transaction fails before it is committed, these copies are used to restore the database to the state it was in before the
transaction began.
Rollforward
It is also possible to keep a separate journal of all modifications to a database (sometimes called after images). This is not required for rollback of failed transactions, but it is useful for updating the database in the event of a database failure, so some transaction-processing systems provide it. If the database fails entirely, it must be restored from the most recent back-up. The back-up will not reflect transactions committed since the back-up was made. However, once the database is restored, the journal of after images can be applied to the database (rollforward) to bring the database up to date. Any transactions in progress at the time of the failure can then be rolled back. The result is a database in a consistent, known state that includes the results of all transactions committed up to the moment of failure.
Deadlocks
In some cases, two transactions may, in the course of their processing, attempt to access the same portion of a
database at the same time, in a way that prevents them from proceeding. For example, transaction A may access
portion X of the database, and transaction B may access portion Y of the database. If, at that point, transaction A then tries to access portion Y of the database while transaction B tries to access portion X, a deadlock occurs, and neither
transaction can move forward. Transaction-processing systems are designed to detect these deadlocks when they
occur. Typically both transactions will be cancelled and rolled back, and then they will be started again in a different
order, automatically, so that the deadlock doesn't occur again. Or sometimes, just one of the deadlocked transactions
will be cancelled, rolled back, and automatically restarted after a short delay.
Deadlocks can also occur between three or more transactions. The more transactions involved, the more difficult they are to detect, to the point that transaction processing systems find there is a practical limit to the deadlocks they can detect.
Compensating transaction
In systems where commit and rollback mechanisms are not available or undesirable, a compensating transaction is often used to undo failed transactions and restore the system to a previous state.
ACID criteria
Jim Gray defined properties of a reliable transaction system in the late 1970s under the acronym ACID: atomicity, consistency, isolation, and durability.
Atomicity
A transaction's changes to the state are atomic: either all happen or none happen. These changes include database changes, messages, and actions on transducers.
Consistency
Consistency: A transaction is a correct transformation of the state. The actions taken as a group do not violate any of
the integrity constraints associated with the state.
Isolation
Even though transactions execute concurrently, it appears to each transaction T, that others executed either before T
or after T, but not both.
Durability
Once a transaction completes successfully (commits), its changes to the state survive failures.
Benefits
Transaction processing has these benefits:
• It allows sharing of computer resources among many users
• It shifts the time of job processing to when the computing resources are less busy
• It avoids idling the computing resources without minute-by-minute human interaction and supervision
• It is used on expensive classes of computers to help amortize the cost by keeping high rates of utilization of those expensive resources
Implementations
Standard transaction-processing software, notably IBM's Information Management System, was first developed in the 1960s, and was often closely coupled to particular database management systems. Client–server computing implemented similar principles in the 1980s with mixed success. However, in more recent years, the distributed client–server model has become considerably more difficult to maintain. As the number of transactions grew in response to various online services (especially the Web), a single distributed database was not a practical solution. In addition, most online systems consist of a whole suite of programs operating together, as opposed to a strict client–server model where the single server could handle the transaction processing. Today a number of transaction processing systems are available that work at the inter-program level and which scale to large systems, including mainframes.
One well-known (and open) industry standard is the X/Open Distributed Transaction Processing (DTP) (see also JTA, the Java Transaction API). However, proprietary transaction-processing environments such as IBM's CICS are still very popular, although CICS has evolved to include open industry standards as well.
The term 'Extreme Transaction Processing' (XTP) has been used to describe transaction processing systems with uncommonly challenging requirements, particularly throughput requirements (transactions per second). Such systems may be implemented via distributed or cluster style architectures.
References
External references
• Nuts and Bolts of Transaction Processing (http://www.subbu.org/articles/nuts-and-bolts-of-transaction-processing)
• Managing Transaction Processing for SQL Database Integrity (http://www.informit.com/articles/article.aspx?p=174375)
Further reading
• Gerhard Weikum, Gottfried Vossen, Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2002, ISBN 1-55860-508-8
• Jim Gray, Andreas Reuter, Transaction Processing: Concepts and Techniques, 1993, Morgan Kaufmann, ISBN 1-55860-190-2
• Philip A. Bernstein, Eric Newcomer, Principles of Transaction Processing, 1997, Morgan Kaufmann, ISBN 1-55860-415-4
• Ahmed K. Elmagarmid (Editor), Transaction Models for Advanced Database Applications, Morgan Kaufmann, 1992, ISBN 1-55860-214-3
Concepts
ACID
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that
database transactions are processed reliably. In the context of databases, a single logical operation on the data is
called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple
changes such as debiting one account and crediting another, is a single transaction. The chosen initials refer to the
acid test.
Jim Gray defined these properties of a reliable transaction system in the late 1970s and developed technologies to
achieve them automatically.[1]
In 1983, Andreas Reuter and Theo Härder coined the acronym ACID to describe them.[2]
Characteristics
Atomicity
Atomicity requires that each transaction is "all or nothing": if one part of the transaction fails, the entire transaction
fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every
situation, including power failures, errors, and crashes. To the outside world, a committed transaction appears (by its
effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
Consistency
The consistency property ensures that any transaction will bring the database from one valid state to another. Any
data written to the database must be valid according to all defined rules, including but not limited to constraints,
cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the
application programmer might have wanted (that is the responsibility of application-level code) but merely that any
programming errors do not violate any defined rules.
Isolation
The isolation property ensures that the concurrent execution of transactions results in a system state that would be
obtained if transactions were executed serially, i.e. one after the other. Providing isolation is the main goal of
concurrency control. Depending on concurrency control method, the effects of an incomplete transaction might not
[citation needed]
even be visible to another transaction.
Durability
Durability means that once a transaction has been committed, it will remain so, even in the event of power loss,
crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to
be stored permanently (even if the database crashes immediately thereafter). To defend against power loss,
transactions (or their effects) must be recorded in a non-volatile memory.
Examples
The following examples further illustrate the ACID properties. In these examples, the database table has two
columns, A and B. An integrity constraint requires that the value in A and the value in B must sum to 100. The
following SQL code creates a table as described above:
CREATE TABLE acidtest (A INTEGER, B INTEGER CHECK (A + B = 100));
Atomicity failure
Assume that a transaction attempts to subtract 10 from A and add 10 to B. This is a valid transaction, since the data
continue to satisfy the constraint after it has executed. However, assume that after removing 10 from A, the
transaction is unable to modify B. If the database retained A's new value, atomicity and the constraint would both be
violated. Atomicity requires that both parts of this transaction, or neither, be complete.
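This all-or-nothing behaviour can be observed with Python's sqlite3 module; the table and starting values below are illustrative (the CHECK constraint is omitted here because SQLite validates it per statement, which would reject the intermediate state before a rollback could be demonstrated):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acidtest (A INTEGER, B INTEGER)")
con.execute("INSERT INTO acidtest VALUES (60, 40)")
con.commit()

try:
    with con:  # transaction: commit on success, rollback on exception
        con.execute("UPDATE acidtest SET A = A - 10")
        raise RuntimeError("simulated failure before B is updated")
except RuntimeError:
    pass

# Atomicity: the half-finished transaction left no trace.
print(con.execute("SELECT A, B FROM acidtest").fetchone())  # (60, 40)
```

Because the transaction aborted before its second statement ran, the engine rolls back the first statement as well, leaving A and B unchanged.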
Consistency failure
Consistency is a very general term which demands that the data must meet all validation rules. In the previous
example, the validation is a requirement that A + B = 100. Also, it may be inferred that both A and B must be
integers. A valid range for A and B may also be inferred. All validation rules must be checked to ensure consistency.
Assume that a transaction attempts to subtract 10 from A without altering B. Because consistency is checked after
each transaction, it is known that A + B = 100 before the transaction begins. If the transaction removes 10 from A
successfully, atomicity will be achieved. However, a validation check will show that A + B = 90, which is
inconsistent with the rules of the database. The entire transaction must be cancelled and the affected rows rolled back
to their pre-transaction state. If there had been other constraints, triggers, or cascades, every single change operation
would have been checked in the same way as above before the transaction was committed.
Isolation failure
To demonstrate isolation, we assume two transactions execute at the same time, each attempting to modify the same
data. One of the two must wait until the other completes in order to maintain isolation.
• T1 subtracts 10 from A.
• T1 adds 10 to B.
• T2 subtracts 10 from B.
• T2 adds 10 to A.
If these operations are performed in order, isolation is maintained, although T2 must wait. Consider what happens if
T1 fails half-way through. The database eliminates T1's effects, and T2 sees only valid data.
By interleaving the transactions, the actual order of actions might be:
• T1 subtracts 10 from A.
• T2 subtracts 10 from B.
• T2 adds 10 to A.
• T1 adds 10 to B.
Again, consider what happens if T1 fails halfway through. By the time T1 fails, T2 has already modified A; it cannot
be restored to the value it had before T1 without leaving an invalid database. This is known as a write-write failure,
because two transactions attempted to write to the same data field. In a typical system, the problem would be
resolved by reverting to the last known good state, canceling the failed transaction T1, and restarting the interrupted
transaction T2 from the good state.
Durability failure
Assume that a transaction transfers 10 from A to B. It removes 10 from A. It then adds 10 to B. At this point, a
"success" message is sent to the user. However, the changes are still queued in the disk buffer waiting to be
committed to the disk. Power fails and the changes are lost. The user assumes (understandably) that the changes have
been made.
Implementation
Processing a transaction often requires a sequence of operations that is subject to failure for a number of reasons. For
instance, the system may have no room left on its disk drives, or it may have used up its allocated CPU time.
There are two popular families of techniques: write-ahead logging and shadow paging. In both cases, locks must be
acquired on all information that is updated, and depending on the level of isolation, possibly on all data that is read
as well. In write-ahead logging, atomicity is guaranteed by copying the original (unchanged) data to a log before
changing the database. That allows the database to return to a consistent state in the event of a crash.
In shadowing, updates are applied to a partial copy of the database, and the new copy is activated when the
transaction commits.
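The logging idea can be illustrated with a toy undo log in Python. The dictionary "database", the log format, and the function names are all invented for illustration and bear no relation to any real DBMS's internals:

```python
# Toy write-ahead log: copy the original value to a log BEFORE changing the
# database, so a crash mid-transaction can be undone by replaying the log
# in reverse.
db = {"A": 60, "B": 40}
undo_log = []  # records (key, old_value) before each change


def write(key, value):
    undo_log.append((key, db[key]))  # log first ...
    db[key] = value                  # ... then update


def rollback():
    while undo_log:
        key, old = undo_log.pop()    # undo in reverse order
        db[key] = old


# Transaction: move 10 from A to B, but "crash" after the first write.
write("A", db["A"] - 10)
# -- simulated crash here: B is never updated --
rollback()  # recovery returns the database to a consistent state
print(db)  # {'A': 60, 'B': 40}
```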
Locking vs multiversioning
Many databases rely upon locking to provide ACID capabilities. Locking means that the transaction marks the data
that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction
succeeds or fails. The lock must always be acquired before processing data, including data that are read but not
modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well
as blocking other transactions. For example, if user A is running a transaction that has to read a row of data that user
B wants to modify, user B must wait until user A's transaction completes. Two-phase locking is often applied to
guarantee full isolation.
An alternative to locking is multiversion concurrency control, in which the database provides each reading
transaction the prior, unmodified version of data that is being modified by another active transaction. This allows
readers to operate without acquiring locks, i.e. writing transactions do not block reading transactions, and readers do
not block writers. Going back to the example, when user A's transaction requests data that user B is modifying, the
database provides A with the version of that data that existed when user B started his transaction. User A gets a
consistent view of the database even if other users are changing data. One implementation, namely snapshot
isolation, relaxes the isolation property.
Distributed transactions
Guaranteeing ACID properties in a distributed transaction across a distributed database, where no single node is
responsible for all data affecting a transaction, presents additional complications. Network connections might fail, or
one node might successfully complete its part of the transaction and then be required to roll back its changes because
of a failure on another node. The two-phase commit protocol (not to be confused with two-phase locking) provides
atomicity for distributed transactions to ensure that each participant in the transaction agrees on whether the
transaction should be committed or not. Briefly, in the first phase, one node (the coordinator) interrogates the other
nodes (the participants), and only when all reply that they are prepared does the coordinator, in the second phase,
formalize the transaction.
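The two phases can be sketched as a toy simulation in Python. The class name, vote flags, and state strings are illustrative only; a real implementation must also handle timeouts, durable logging, and coordinator failure:

```python
# Toy two-phase commit: phase 1 collects votes from every participant;
# phase 2 commits only if all voted yes, otherwise everyone aborts.
class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):                 # phase 1: vote yes/no
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit):          # phase 2: apply coordinator's decision
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]  # phase 1
    decision = all(votes)
    for p in participants:                       # phase 2
        p.finish(decision)
    return decision


nodes = [Participant("n1", True), Participant("n2", False), Participant("n3", True)]
print(two_phase_commit(nodes))    # False: one participant voted no
print([p.state for p in nodes])   # ['aborted', 'aborted', 'aborted']
```

Because n2 cannot prepare, the coordinator's decision is to abort, and every node, including the ones that were prepared, rolls back.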
References
[1] Gray, Jim, and Reuter, Andreas, Distributed Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993. ISBN 1-55860-190-2.
[2] These four properties, atomicity, consistency, isolation, and durability (ACID), describe the major highlights of the transaction paradigm, which has influenced many aspects of development in database systems.
Create, read, update and delete
In computer programming, create, read, update and delete (CRUD) (sometimes called SCRUD, with an "S" for
Search) are the four basic functions of persistent storage. Sometimes CRUD is expanded with the words retrieve
instead of read, modify instead of update, or destroy instead of delete. It is also sometimes used to describe user
interface conventions that facilitate viewing, searching, and changing information, often using computer-based forms
and reports. The term was likely first popularized by James Martin in his 1983 book Managing the Data-base
Environment. The acronym may be extended to CRUDL to cover listing of large data sets which bring additional
complexity, such as pagination when the data sets are too large to hold easily in memory.
Another variation of CRUD is BREAD, an acronym for "Browse, Read, Edit, Add, Delete".
Database applications
The acronym CRUD refers to all of the major functions that are implemented in relational database applications.
Each letter in the acronym can map to a standard SQL statement and HTTP method:

Operation   SQL       HTTP
Create      INSERT    PUT / POST
Read        SELECT    GET
Update      UPDATE    PUT / POST / PATCH
Delete      DELETE    DELETE

Making full use of HTTP methods, along with other constraints, is considered "RESTful".
Although a relational database provides a common persistence layer in software applications, numerous other
persistence layers exist. CRUD functionality can be implemented with an object database, an XML database, flat
text files, custom file formats, tape, or card, for example.
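With an embedded relational store, the four verbs map directly onto SQL statements. A minimal sketch using Python's sqlite3 module (the table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")

# Create
con.execute("INSERT INTO contacts (name) VALUES (?)", ("Ada",))
# Read
print(con.execute("SELECT name FROM contacts WHERE id = 1").fetchone())  # ('Ada',)
# Update
con.execute("UPDATE contacts SET name = ? WHERE id = ?", ("Grace", 1))
print(con.execute("SELECT name FROM contacts WHERE id = 1").fetchone())  # ('Grace',)
# Delete
con.execute("DELETE FROM contacts WHERE id = ?", (1,))
print(con.execute("SELECT COUNT(*) FROM contacts").fetchone())  # (0,)
```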
User interface
CRUD is also relevant at the user interface level of most applications. For example, in address book software, the
basic storage unit is an individual contact entry. As a bare minimum, the software must allow the user to:
• Create or add new entries
• Read, retrieve, search, or view existing entries
• Update or edit existing entries
• Delete/deactivate existing entries
Without at least these four operations, the software cannot be considered complete. Because these operations are so
fundamental, they are often documented and described under one comprehensive heading, such as "contact
management", "content management" or "contact maintenance" (or "document management" in general, depending
on the basic storage unit for the particular application).
Null (SQL)
Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the
database. Introduced by the creator of the relational database model, E. F. Codd, SQL Null serves to fulfil the
requirement that all true relational database management systems (RDBMS) support a representation of "missing
information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol
to represent Null in database theory. NULL is also an SQL reserved keyword used to identify the Null special
marker.
Null has been the focus of controversy and a source of debate because of its associated three-valued logic (3VL),
special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL
grouping operators. Computer science professor Ron van der Meyden summarized the various issues as: "The
inconsistencies in the SQL standard mean that it is not possible to ascribe any intuitive logical semantics to the
treatment of nulls in SQL."[1] Although various proposals have been made for resolving these issues, the complexity
of the alternatives has prevented their widespread adoption.
For people new to the subject, a good way to remember what null means is to remember that in terms of information,
"lack of a value" is not the same thing as "a value of zero"; similarly, "lack of an answer" is not the same thing as "an
answer of no".
For example, consider the question "How many books does Juan own?" The answer may be "zero" (we know that he
owns none) or "null" (we do not know how many he owns, or doesn't own). In a database table, the column reporting
this answer would start out with a value of null, and it would not be updated with "zero" until we have ascertained
that Juan owns no books.
History
E. F. Codd mentioned nulls as a method of representing missing data in the relational model in a 1975 paper in the
FDT Bulletin of ACM-SIGMOD. Codd's paper that is most commonly cited in relation with the semantics of Null (as
adopted in SQL) is his 1979 paper in the ACM Transactions on Database Systems, in which he also introduced his
Relational Model/Tasmania, although many of the other proposals from the latter paper have remained obscure.
Section 2.3 of his 1979 paper details the semantics of Null propagation in arithmetic operations as well as
comparisons employing a ternary (three-valued) logic when comparing to nulls; it also details the treatment of Nulls
in other set operations (the latter issue still controversial today). In database theory circles, the original proposal of
Codd (1975, 1979) is now referred to as "Codd tables". Codd later reinforced his requirement that all RDBMSs
support Null to indicate missing data in a 1985 two-part article published in ComputerWorld magazine.
The 1986 SQL standard basically adopted Codd's proposal after an implementation prototype in IBM System R.
Although Don Chamberlin recognized nulls (alongside duplicate rows) as one of the most controversial features of
SQL, he defended the design of Nulls in SQL invoking the pragmatic arguments that it was the least expensive form
of system support for missing information, saving the programmer from many duplicative application-level checks
(see semipredicate problem) while at the same time providing the database designer with the option not to use nulls
if he so desires; for example, in order to avoid well-known anomalies (discussed in the semantics section of this
article). Chamberlin also argued that besides providing some missing-value functionality, practical experience with
Nulls also led to other language features which rely on Nulls, like certain grouping constructs and outer joins.
Finally, he argued that in practice Nulls also end up being used as a quick way to patch an existing schema when it
needs to evolve beyond its original intent, coding not for missing but rather for inapplicable information; for
example, a database that quickly needs to support electric cars while having a miles-per-gallon column.
Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null
mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to
indicate the reason why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values'
and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's
recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system.
Because of this additional complexity, the idea of multiple Null-type values has not gained widespread acceptance in
the database practitioners' domain. It remains an active field of research though, with numerous papers still being
published.
Null propagation
Arithmetic operations
Because Null is not a data value, but a marker for an unknown value, using mathematical operators on Null results in
an unknown value, which is represented by Null. In the following example, multiplying 10 by Null results in Null:

10 * NULL          -- Result is Null
String concatenation
String concatenation operations, which are common in SQL, also result in Null when one of the operands is Null.
The following example demonstrates the Null result returned by using Null with the SQL || string concatenation
operator:

'Fish ' || NULL || 'Chips'   -- Result is Null
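Null propagation through arithmetic and concatenation can be observed from Python via sqlite3, which maps SQL Null to None:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Any arithmetic or string operation with a Null operand yields Null (None).
print(con.execute("SELECT 10 * NULL").fetchone())        # (None,)
print(con.execute("SELECT 'Fish' || NULL").fetchone())   # (None,)
```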
Comparisons with NULL and the three-valued logic (3VL)
Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder)
indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but
always in a third logical result, Unknown. The logical result of the expression below, which compares the value 10
to Null, is Unknown:

SELECT 10 = NULL       -- Results in Unknown

However, certain operations on Null can return values if the value of Null is not relevant to the outcome of the
operation. Consider the following example:

SELECT NULL OR TRUE    -- Results in True
In this case, the fact that the value on the left of OR is unknowable is irrelevant, because the outcome of the OR
operation would be True regardless of the value on the left.
SQL implements three logical results, so SQL implementations must provide for a specialized three-valued logic
(3VL). The rules governing SQL three-valued logic are shown in the tables below (p and q represent logical states).
The truth tables SQL uses for AND, OR, and NOT correspond to a common fragment of the Kleene and Łukasiewicz
three-valued logic (which differ in their definition of implication; however, SQL defines no such operation).

p        NOT p
True     False
False    True
Unknown  Unknown

p        q        p AND q   p OR q
True     True     True      True
True     False    False     True
True     Unknown  Unknown   True
False    True     False     True
False    False    False     False
False    Unknown  False     Unknown
Unknown  True     Unknown   True
Unknown  False    False     Unknown
Unknown  Unknown  Unknown   Unknown
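These truth tables can be checked against a real engine from Python via sqlite3, where the Unknown result surfaces as None:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# SQLite evaluates AND/OR/NOT with three-valued logic (NULL = Unknown).
print(con.execute("SELECT NULL OR 1").fetchone())   # (1,)    Unknown OR True   = True
print(con.execute("SELECT NULL OR 0").fetchone())   # (None,) Unknown OR False  = Unknown
print(con.execute("SELECT NULL AND 0").fetchone())  # (0,)    Unknown AND False = False
print(con.execute("SELECT NOT NULL").fetchone())    # (None,) NOT Unknown       = Unknown
```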
Effect of Unknown in WHERE clauses
SQL three-valued logic is encountered in Data Manipulation Language (DML) in comparison predicates of DML
statements and queries. The WHERE clause causes the DML statement to act on only those rows for which the
predicate evaluates to True. Rows for which the predicate evaluates to either False or Unknown are not acted on by
INSERT, UPDATE, or DELETE DML statements, and are discarded by SELECT queries. Interpreting Unknown
and False as the same logical result is a common error encountered while dealing with Nulls. The following simple
example demonstrates this fallacy:

SELECT *
FROM t
WHERE i = NULL;

The example query above logically always returns zero rows because the comparison of the i column with Null
always returns Unknown, even for those rows where i is Null. The Unknown result causes the SELECT statement to
summarily discard each and every row. (However, in practice, some SQL tools will retrieve rows using a comparison
with Null.)
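The behaviour can be reproduced from Python via sqlite3 (the table t and its contents are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (i INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])

# `i = NULL` evaluates to Unknown for every row, so nothing is returned ...
print(con.execute("SELECT * FROM t WHERE i = NULL").fetchall())   # []
# ... whereas the Null-specific predicate does find the row holding Null.
print(con.execute("SELECT * FROM t WHERE i IS NULL").fetchall())  # [(None,)]
```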
Null-specific and 3VL-specific comparison predicates
Basic SQL comparison operators always return Unknown when comparing anything with Null, so the SQL standard
provides for two special Null-specific comparison predicates. The IS NULL and IS NOT NULL predicates (which
use a postfix syntax) test whether data is, or is not, Null.
The SQL standard contains an extension F571 "Truth value tests" that introduces three additional logical unary
operators (six in fact, if we count their negation, which is part of their syntax), also using postfix notation. They have
the following truth tables:[2]

p        p IS TRUE   p IS FALSE   p IS UNKNOWN
True     True        False        False
False    False       True         False
Unknown  False       False        True
The F571 extension is orthogonal to the presence of the boolean datatype in SQL (discussed later in this article) and,
despite syntactic similarities, F571 does not introduce boolean or three-valued literals in the language. The F571
extension was actually present in SQL92, well before the boolean datatype was introduced to the standard in 1999.
The F571 extension is implemented by few systems, however; PostgreSQL is one of those implementing it.
The addition of IS UNKNOWN to the other operators of SQL's three-valued logic makes the SQL three-valued logic
functionally complete,[3] meaning its logical operators can express (in combination) any conceivable three-valued
logical function.
On systems which don't support the F571 extension, it is possible to emulate p IS UNKNOWN by going over every
argument that could make the expression p Unknown and testing those arguments with IS NULL or other
NULL-specific functions, although this may be more cumbersome.
Law of the excluded fourth (in WHERE clauses)
In SQL's three-valued logic the law of the excluded middle, p OR NOT p, no longer evaluates to true for all p. More
precisely, in SQL's three-valued logic p OR NOT p is unknown precisely when p is unknown and true otherwise.
Because direct comparisons with Null result in the unknown logical value, the following query

SELECT * FROM stuff WHERE (x = 10) OR NOT (x = 10);

is not equivalent in SQL with

SELECT * FROM stuff;

if the column x contains any Nulls; in that case the second query would return some rows the first one does not
return, namely all those in which x is Null. In classical two-valued logic, the law of the excluded middle would allow
the simplification of the WHERE clause predicate, in fact its elimination. Attempting to apply the law of the
excluded middle to SQL's 3VL is effectively a false dichotomy. The second query is actually equivalent with:

SELECT * FROM stuff;
-- is (because of 3VL) equivalent to:
SELECT * FROM stuff WHERE (x = 10) OR NOT (x = 10) OR x IS NULL;

Thus, to correctly simplify the first statement in SQL requires that we return all rows in which x is not null:

SELECT * FROM stuff WHERE x IS NOT NULL;
From the above, it is easy to observe that for SQL's WHERE clause a tautology similar to the law of excluded
middle can be written. Assuming the IS UNKNOWN operator is present, p OR (NOT p) OR (p IS UNKNOWN) is
true for every predicate p. Among logicians, this is called the law of excluded fourth.
There are some SQL expressions in which it is less obvious where the false dilemma occurs, for example:

SELECT 'ok' WHERE 1 NOT IN (SELECT CAST (NULL AS INTEGER))
UNION
SELECT 'ok' WHERE 1 IN (SELECT CAST (NULL AS INTEGER));

produces no rows because IN translates to an iterated version of equality over the argument set and 1 <> NULL is
Unknown, just as 1 = NULL is Unknown. (The CAST in this example is needed only in some SQL implementations
like PostgreSQL, which would reject it with a type checking error otherwise. In many systems plain SELECT NULL
works in the subquery.) The missing case above is of course:

SELECT 'ok' WHERE (1 IN (SELECT CAST (NULL AS INTEGER))) IS UNKNOWN;
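Evaluated as bare expressions in sqlite3 (which needs no CAST here), both memberships come out Unknown, which is why neither branch of the UNION above returns a row:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Both IN and NOT IN against a set containing only NULL yield Unknown (None).
print(con.execute("SELECT 1 IN (SELECT NULL)").fetchone())      # (None,)
print(con.execute("SELECT 1 NOT IN (SELECT NULL)").fetchone())  # (None,)
# Against a definite non-match, NOT IN is simply True (1).
print(con.execute("SELECT 1 NOT IN (SELECT 2)").fetchone())     # (1,)
```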
Effect of Null and Unknown in other constructs
Joins
Joins evaluate using the same comparison rules as for WHERE clauses. Therefore, care must be taken when using
nullable columns in SQL join criteria. In particular a table containing any nulls is not equal with a natural self-join of
itself, meaning that whereas R ⋈ R = R is true for any relation R in relational algebra, a SQL self-join will exclude
all rows having a null value anywhere.[4] An example of this behavior is given in the section analyzing the
missing-value semantics of Nulls.
The SQL COALESCE function or CASE expressions can be used to "simulate" Null equality in join criteria, and the
IS NULL and IS NOT NULL predicates can be used in the join criteria as well. The following predicate tests for
equality of the values A and B and treats Nulls as being equal:

(A = B) OR (A IS NULL AND B IS NULL)

CASE expressions
SQL provides two flavours of conditional expressions. One is called "simple CASE" and operates like a switch
statement. The other is called a "searched CASE" in the standard, and operates like an if...elseif.
The simple CASE expressions use implicit equality comparisons which operate under the same rules as the DML
WHERE clause rules for Null. Thus, a simple CASE expression cannot check for the existence of Null directly. A
check for Null in a simple CASE expression always results in Unknown, as in the following:

SELECT CASE i WHEN NULL THEN 'Is Null'  -- This will never be returned
              WHEN 0    THEN 'Is Zero'  -- This will be returned when i = 0
              WHEN 1    THEN 'Is One'   -- This will be returned when i = 1
       END
FROM t;

A searched CASE expression, by contrast, can use predicates such as IS NULL in its conditions:

SELECT CASE WHEN i IS NULL THEN 'Null Result'  -- This will be returned when i is NULL
            WHEN i = 0     THEN 'Zero'         -- This will be returned when i = 0
            WHEN i = 1     THEN 'One'          -- This will be returned when i = 1
       END
FROM t;

In the searched CASE expression, the string 'Null Result' is returned for all rows in which i is Null.
Oracle's dialect of SQL provides a built-in function DECODE which can be used instead of the simple CASE
expressions and considers two nulls equal.
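The difference between the two CASE flavours is easy to observe via sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Simple CASE compares with =, so WHEN NULL can never match;
# searched CASE can test IS NULL explicitly.
simple = "SELECT CASE NULL WHEN NULL THEN 'matched' ELSE 'not matched' END"
searched = "SELECT CASE WHEN NULL IS NULL THEN 'is null' END"

print(con.execute(simple).fetchone())    # ('not matched',)
print(con.execute(searched).fetchone())  # ('is null',)
```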
IF statements in procedural extensions
SQL/PSM (SQL Persistent Stored Modules) defines procedural extensions for SQL, such as the IF statement.
However, the major SQL vendors have historically included their own proprietary procedural extensions. Procedural
extensions for looping and comparisons operate under Null comparison rules similar to those for DML statements
and queries. The following code fragment, in ISO SQL standard format, demonstrates the use of Null 3VL in an IF
statement:

IF i = NULL THEN
  SELECT 'Result is True'
ELSEIF NOT (i = NULL) THEN
  SELECT 'Result is False'
ELSE
  SELECT 'Result is Unknown';

The IF statement performs actions only for those comparisons that evaluate to True. For statements that evaluate to
False or Unknown, the IF statement passes control to the ELSEIF clause, and finally to the ELSE clause. The result
of the code above will always be the message 'Result is Unknown' since the comparisons with Null always evaluate
to Unknown.

Analysis of SQL Null missing-value semantics
The groundbreaking work of T. Imielinski and W. Lipski (1984) provided a framework in which to evaluate the
intended semantics of various proposals to implement missing-value semantics. This section roughly follows chapter
19 of the "Alice" textbook. A similar presentation appears in the review of Ron van der Meyden, §10.4.
In selections and projections: weak representation
Constructs representing missing information, such as Codd tables, are actually intended to represent a set of
relations, one for each possible instantiation of their parameters; in the case of Codd tables, this means replacement
of Nulls with some concrete value. For example, a Codd table Emp in which Harriet's age is Null represents every
relation obtained by substituting a concrete age for that Null.
A construct (such as a Codd table) is said to be a strong representation system (of missing information) if any
answer to a query made on the construct can be particularized to obtain an answer for any corresponding query on
the relations it represents, which are seen as models of the construct. More precisely, if q is a query formula in the
relational algebra (of "pure" relations) and if q̄ is its lifting to a construct intended to represent missing information,
a strong representation has the property that for any query q and (table) construct T, q̄ lifts all the answers to the
construct, i.e.:

models(q̄(T)) = q(models(T))
(The above has to hold for queries taking any number of tables as arguments, but the restriction to one table suffices
for this discussion.) Clearly Codd tables do not have this strong property if selections and projections are considered
as part of the query language. For example, all the answers to the query selecting the rows with Age = 22 cannot be
captured by a Codd table; they can, however, be represented by a conditional table (c-table), in which each row
carries a condition:

Result
Name     Age   condition
Harriet  ω1    ω1 = 22

where the condition column is interpreted as: the row doesn't exist if the condition is false. It turns out that because
the
formulas in the condition column of a c-table can be arbitrary propositional logic formulas, an algorithm for the
problem of whether a c-table represents some concrete relation has co-NP-complete complexity, and thus is of little
practical value.
A weaker notion of representation is therefore desirable. Imielinski and Lipski introduced the notion of weak
representation, which essentially allows (lifted) queries over a construct to return a representation only for sure
information, i.e. if it's valid for all "possible world" instantiations (models) of the construct. Concretely, a construct
is a weak representation system if

⋂ models(q̄(T)) = ⋂ q(models(T))
The right-hand side of the above equation is the sure information, i.e. information which can be certainly extracted
from the database regardless of what values are used to replace Nulls in the database. In the example we considered
above, it's easy to see that the intersection of all possible models (i.e. the sure information) of the query selecting
WHERE Age = 22 is actually empty because, for instance, the (unlifted) query returns no rows for the relation
EmpH37. More generally, it was shown by Imielinski and Lipski that Codd tables are a weak representation system
if the query language is restricted to projections, selections (and renaming of columns). However, as soon as we add
either joins or unions to the query language, even this weak property is lost, as evidenced in the next section.
If joins or unions are considered: not even weak representation
Let us consider the following query over the same Codd table Emp from the previous section:

SELECT Name FROM Emp WHERE Age = 22
UNION
SELECT Name FROM Emp WHERE Age <> 22;

Whatever concrete value one would choose for the NULL age of Harriet, the above query will return the full column
of names of any model of Emp, but when the (lifted) query is run on Emp itself, Harriet will always be missing, i.e.
we have:

Query result on Emp:      Query result on any model of Emp:

Name                      Name
George                    George
Charles                   Harriet
                          Charles
Thus when unions are added to the query language, Codd tables are not even a weak representation system of
missing information, meaning that queries over them don't even report all sure information. It's important to note
here that semantics of UNION on Nulls, which are discussed in a later section, did not even come into play in this
query. The "forgetful" nature of the two sub-queries was all that it took to guarantee that some sure information went
unreported when the above query was run on the Codd table Emp.
For natural joins, the example needed to show that sure information may be unreported by some query is slightly
more complicated. Consider the table J:

F1   F2     F3
11   NULL   13
21   NULL   23
31   32     33

and the query

SELECT F1, F3 FROM
  (SELECT F1, F2 FROM J) AS F12
NATURAL JOIN
  (SELECT F2, F3 FROM J) AS F23;

Query result on J:     Query result on any model of J:

F1  F3                 F1  F3
31  33                 11  13
                       21  23
                       31  33
The intuition for what happens above is that the Codd tables representing the projections in the subqueries lose track
of the fact that the Null values in the columns F12.F2 and F23.F2 are actually copies of the originals in the table J.
This observation suggests that a relatively simple improvement of Codd tables (which works correctly for this
example) would be to use Skolem constants (meaning Skolem functions which are also constant functions), say
ω12 and ω22, instead of a single NULL symbol. Such an approach, called v-tables or Naive tables, is computationally
less expensive than the c-tables discussed above. However it is still not a complete solution for incomplete
information in the sense that v-tables are only a weak representation for queries not using any negations in selection
(and not using any set difference either). The first example considered in this section is using a negative selection
clause, WHERE Age <> 22, so it is also an example where v-tables queries would not report sure information.
Check constraints and foreign keys
The primary place in which SQL three-valued logic intersects with SQL Data Definition Language (DDL) is in the
form of check constraints. A check constraint placed on a column operates under a slightly different set of rules than
those for the DML WHERE clause. While a DML WHERE clause must evaluate to True for a row, a check
constraint must not evaluate to False. (From a logic perspective, the designated values are True and Unknown.) This
means that a check constraint will succeed if the result of the check is either True or Unknown. The following
example table with a check constraint will prohibit any integer values from being inserted into column i, but will
allow Null to be inserted since the result of the check will always evaluate to Unknown for Nulls:
Because of the change in designated values relative to the WHERE clause, from a logic perspective the law of
CREATETABLEt(
excludediINTEGER,
middle is a tautology for CHECK constraints, meaning CHECK (p OR NOT p) always succeeds.
Furthermore, assuming Nulls are to be interpreted as existing but unknown values, some pathological CHECKs like
CONSTRAINTck_iCHECK(i<0ANDi=0ANDi>0));
the one above allow insertion of Nulls that could never be replaced by any non-null value.
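The difference in designated values is easy to observe in practice; a minimal sketch using Python's sqlite3 module, since SQLite follows the standard rule that a CHECK constraint fails only when it evaluates to False:

```python
import sqlite3

# The contradictory check from the example above: False for every integer,
# Unknown for Null.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t (
        i INTEGER,
        CONSTRAINT ck_i CHECK (i < 0 AND i = 0 AND i > 0))
""")

try:
    conn.execute("INSERT INTO t (i) VALUES (5)")  # check is False: rejected
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

conn.execute("INSERT INTO t (i) VALUES (NULL)")   # check is Unknown: accepted
stored = conn.execute("SELECT i FROM t").fetchall()
print(rejected, stored)  # True [(None,)]
```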
In order to constrain a column to reject Nulls, the NOT NULL constraint can be applied, as shown in the example
below. The NOT NULL constraint is semantically equivalent to a check constraint with an IS NOT NULL predicate.

CREATE TABLE t (i INTEGER NOT NULL);

By default, check constraints against foreign keys succeed if any of the fields in such keys are Null. For example, the
table

CREATE TABLE Books
    (title VARCHAR(100),
     author_last VARCHAR(20),
     author_first VARCHAR(20),
     FOREIGN KEY (author_last, author_first)
         REFERENCES Authors(last_name, first_name));

would allow insertion of rows where author_last or author_first is NULL, irrespective of how the table Authors is
defined or what it contains. More precisely, a null in any of these fields would allow any value in the other one, even
one that is not found in the Authors table. For example, if Authors contained only ('Doe', 'John'), then ('Smith',
NULL) would satisfy the foreign key constraint. SQL-92 added two extra options for narrowing down the matches in
such cases. If MATCH PARTIAL is added after the REFERENCES declaration, then any non-null must match the
foreign key, e.g. ('Doe', NULL) would still match, but ('Smith', NULL) would not. Finally, if MATCH FULL is added,
then ('Smith', NULL) would not match the constraint either, but (NULL, NULL) would still match it.
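The default ("simple") match behavior can be demonstrated with SQLite through Python's sqlite3 module; note that SQLite implements only this default rule, not MATCH PARTIAL or MATCH FULL, and enforces foreign keys only when asked:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE Authors (
        last_name  VARCHAR(20),
        first_name VARCHAR(20),
        PRIMARY KEY (last_name, first_name))
""")
conn.execute("""
    CREATE TABLE Books (
        title        VARCHAR(100),
        author_last  VARCHAR(20),
        author_first VARCHAR(20),
        FOREIGN KEY (author_last, author_first)
            REFERENCES Authors (last_name, first_name))
""")
conn.execute("INSERT INTO Authors VALUES ('Doe', 'John')")

# A Null in any column of the composite key satisfies the constraint,
# even though no author named Smith exists.
conn.execute("INSERT INTO Books VALUES ('A Book', 'Smith', NULL)")

# A fully non-null key with no matching parent row is rejected.
try:
    conn.execute("INSERT INTO Books VALUES ('Another', 'Smith', 'Jane')")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)  # False
```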
Outer joins

SQL outer joins, including left outer joins, right outer joins, and full outer joins, automatically produce Nulls as
placeholders for missing values in related tables. For left outer joins, for instance, Nulls are produced in place of
rows missing from the table appearing on the right-hand side of the LEFT OUTER JOIN operator. The following
simple example uses two tables to demonstrate Null placeholder production in a left outer join.

The first table (Employee) contains employee ID numbers and names, while the second table (PhoneNumber)
contains related employee ID numbers and phone numbers, as shown below.

Employee:

 ID  LastName   FirstName
  1  Johnson    Joe
  3  Thompson   Thomas
  4  Patterson  Patricia

PhoneNumber:

 ID  Number
  1  555-2323

The following sample SQL query performs a left outer join on these two tables.

SELECT e.ID, e.LastName, e.FirstName, pn.Number
FROM Employee e
LEFT OUTER JOIN PhoneNumber pn
    ON e.ID = pn.ID;

The result set generated by this query demonstrates how SQL uses Null as a placeholder for values missing from the
right-hand (PhoneNumber) table, as shown below.

Query result:

 ID  LastName   FirstName  Number
  1  Johnson    Joe        555-2323
  3  Thompson   Thomas     NULL
  4  Patterson  Patricia   NULL
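The same join can be run against SQLite through Python's sqlite3 module, where the produced Nulls surface as Python's None; only rows recoverable from the tables above are loaded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (ID INTEGER, LastName TEXT, FirstName TEXT)")
conn.execute("CREATE TABLE PhoneNumber (ID INTEGER, Number TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, 'Johnson', 'Joe'),
                  (3, 'Thompson', 'Thomas'),
                  (4, 'Patterson', 'Patricia')])
conn.execute("INSERT INTO PhoneNumber VALUES (1, '555-2323')")

rows = conn.execute("""
    SELECT e.ID, e.LastName, e.FirstName, pn.Number
    FROM Employee e
    LEFT OUTER JOIN PhoneNumber pn ON e.ID = pn.ID
    ORDER BY e.ID
""").fetchall()
# Employees with no phone row get None (SQL Null) as a placeholder.
print(rows)
# [(1, 'Johnson', 'Joe', '555-2323'),
#  (3, 'Thompson', 'Thomas', None),
#  (4, 'Patterson', 'Patricia', None)]
```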
Aggregate functions

SQL defines aggregate functions to simplify server-side aggregate calculations on data. Except for the COUNT(*)
function, all aggregate functions perform a Null-elimination step, so that Null values are not included in the final
result of the calculation.

Note that the elimination of Null values is not equivalent to replacing those values with zero. For example, in the
following table, AVG(i) (the average of the values of i) will give a different result from that of AVG(j):
i j
150 150
200 200
250 250
NULL 0
Here AVG(i) is 200 (the average of 150, 200, and 250), while AVG(j) is 150 (the average of 150, 200, 250, and
0). A well-known side effect of this is that in SQL, AVG(z) is not equivalent to SUM(z)/COUNT(*).
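The null-elimination step and the AVG versus SUM/COUNT(*) discrepancy can be checked with a short sketch using Python's sqlite3 module and the table above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (i INTEGER, j INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(150, 150), (200, 200), (250, 250), (None, 0)])

avg_i, avg_j, sum_i, cnt = conn.execute(
    "SELECT AVG(i), AVG(j), SUM(i), COUNT(*) FROM t").fetchone()
print(avg_i, avg_j)   # 200.0 150.0  (AVG ignores the Null, not the zero)
print(sum_i / cnt)    # 150.0  (SUM(i)/COUNT(*) differs from AVG(i))
```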
When two nulls are equal: grouping, sorting, and some set operations
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in
order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one
another, or any two Nulls", as "not distinct". This definition of not distinct allows SQL to group and sort Nulls when
the GROUP BY clause (and other keywords that perform grouping) are used.
Other SQL operations, clauses, and keywords use "not distinct" in their treatment of Nulls. These include the
following:
• PARTITION BY clause of ranking and windowing functions like ROW_NUMBER
• UNION, INTERSECT, and EXCEPT operators, which treat NULLs as the same for row comparison/elimination
purposes
• DISTINCT keyword used in SELECT queries
The principle that Nulls aren't equal to each other (but rather that the result is Unknown) is effectively violated in the
SQL specification for the UNION operator, which does identify nulls with each other. Consequently, some set
operations in SQL, like union or difference, may produce results not representing sure information, unlike operations
involving explicit comparisons with NULL (e.g. those in a WHERE clause discussed above). In Codd's 1979
proposal (which was basically adopted by SQL92), this semantic inconsistency is rationalized by arguing that removal of
duplicates in set operations happens "at a lower level of detail than equality testing in the evaluation of retrieval
operations."
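The "not distinct" rule for grouping can be observed directly; a minimal sketch with Python's sqlite3 module, where DISTINCT collapses the two Nulls and GROUP BY puts them in one group:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(None,), (None,), (1,), (1,)])

# DISTINCT treats the two Nulls as "not distinct": only one survives.
distinct = conn.execute("SELECT DISTINCT x FROM t ORDER BY x").fetchall()
# GROUP BY likewise groups the Nulls together.
groups = conn.execute(
    "SELECT x, COUNT(*) FROM t GROUP BY x ORDER BY x").fetchall()
print(distinct)  # [(None,), (1,)]
print(groups)    # [(None, 2), (1, 2)]
```

(SQLite happens to sort Nulls first by default; as the next paragraph notes, the standard leaves the default sort position of Nulls to the vendor.)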
The SQL standard does not explicitly define a default sort order for Nulls. Instead, on conforming systems, Nulls can
be sorted before or after all data values by using the NULLS FIRST or NULLS LAST clauses of the ORDER BY list,
respectively. Not all DBMS vendors implement this functionality, however. Vendors who do not implement this
functionality may specify different treatments for Null sorting in the DBMS.
Effect on index operation
Some SQL products do not index keys containing NULL values. For instance, PostgreSQL versions prior to 8.3 did
not, with the documentation for a B-tree index stating that
B-trees can handle equality and range queries on data that can be sorted into some ordering. In particular, the
PostgreSQL query planner will consider using a B-tree index whenever an indexed column is involved in a
comparison using one of these operators: <, ≤, =, ≥, >
Constructs equivalent to combinations of these operators, such as BETWEEN and IN, can also be implemented
with a B-tree index search. (But note that IS NULL is not equivalent to = and is not indexable.)
In cases where the index enforces uniqueness, NULL values are excluded from the index and uniqueness is not
enforced between NULL values. Again, quoting from the PostgreSQL documentation:
When an index is declared unique, multiple table rows with equal indexed values will not be allowed. Null
values are not considered equal. A multicolumn unique index will only reject cases where all of the indexed
columns are equal in two rows.
This is consistent with the SQL:2003-defined behavior of scalar Null comparisons.
Another method of indexing Nulls involves handling them as not distinct in accordance with the SQL:2003-defined
behavior. For example, Microsoft SQL Server documentation states the following:
For indexing purposes, NULL values compare as equal. Therefore, a unique index, or UNIQUE constraint,
cannot be created if the key values are NULL in more than one row. Select columns that are defined as NOT
NULL when columns for a unique index or unique constraint are chosen.
Both of these indexing strategies are consistent with the SQL:2003-defined behavior of Nulls. Because indexing
methodologies are not explicitly defined by the SQL:2003 standard, indexing strategies for Nulls are left entirely to
the vendors to design and implement.
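The first strategy (Nulls not considered equal in a unique index) is the one SQLite also follows, so it can serve as a quick sketch via Python's sqlite3 module:

```python
import sqlite3

# SQLite, like PostgreSQL, treats Nulls as distinct for unique indexes:
# multiple rows with a Null key are allowed, duplicate non-null keys are not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER)")
conn.execute("CREATE UNIQUE INDEX t_k ON t (k)")
conn.execute("INSERT INTO t VALUES (NULL)")
conn.execute("INSERT INTO t VALUES (NULL)")   # accepted: Nulls are not equal
conn.execute("INSERT INTO t VALUES (1)")
try:
    conn.execute("INSERT INTO t VALUES (1)")  # rejected: duplicate key
    dup_allowed = True
except sqlite3.IntegrityError:
    dup_allowed = False
n = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(n, dup_allowed)  # 3 False
```

A DBMS following the second strategy (such as the SQL Server behavior quoted above) would instead reject the second NULL row.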
Null-handling functions
SQL defines two functions to explicitly handle Nulls: NULLIF and COALESCE. Both functions are abbreviations
for searched CASE expressions.
NULLIF

The NULLIF function accepts two parameters. If the first parameter is equal to the second parameter, NULLIF
returns Null. Otherwise, the value of the first parameter is returned.

NULLIF(value1, value2)

Thus, NULLIF is an abbreviation for the following CASE expression:

CASE WHEN value1 = value2 THEN NULL ELSE value1 END
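A quick check of the equivalence, using Python's sqlite3 module as one implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULLIF returns Null when its arguments are equal, else the first argument;
# the equivalent searched CASE gives the same result.
row = conn.execute("""
    SELECT NULLIF(5, 5),
           NULLIF(5, 6),
           CASE WHEN 5 = 5 THEN NULL ELSE 5 END
""").fetchone()
print(row)  # (None, 5, None)
```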
COALESCE

The COALESCE function accepts a list of parameters, returning the first non-Null value from the list:

COALESCE(value1, value2, value3, ...)

COALESCE is defined as shorthand for the following SQL CASE expression:

CASE WHEN value1 IS NOT NULL THEN value1
     WHEN value2 IS NOT NULL THEN value2
     WHEN value3 IS NOT NULL THEN value3
     ...
END

Some SQL DBMSs implement vendor-specific functions similar to COALESCE. Some systems (e.g. Transact-SQL)
implement an ISNULL function, or other similar functions that are functionally similar to COALESCE. (See Is
functions for more on the IS functions in Transact-SQL.)
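The shorthand and its CASE expansion can be compared directly; a sketch with Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# COALESCE returns the first non-Null argument; the expanded searched CASE
# from the definition above behaves identically.
first, case = conn.execute("""
    SELECT COALESCE(NULL, NULL, 3),
           CASE WHEN NULL IS NOT NULL THEN NULL
                WHEN NULL IS NOT NULL THEN NULL
                WHEN 3 IS NOT NULL THEN 3
           END
""").fetchone()
print(first, case)  # 3 3
```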
NVL

The Oracle NVL function accepts two parameters. It returns the first non-NULL parameter, or NULL if all
parameters are NULL.

A COALESCE expression can be converted into an equivalent NVL expression thus:

COALESCE(val1, ..., val{n})

turns into:

NVL(val1, NVL(val2, NVL(..., NVL(val{n-1}, val{n}))))
Data typing of Null and Unknown

The NULL literal is untyped in SQL, meaning that it is not designated as an integer, character, or any other specific
data type. Because of this, it is sometimes mandatory (or desirable) to explicitly convert Nulls to a specific data type.
For instance, if overloaded functions are supported by the RDBMS, SQL might not be able to automatically resolve
to the correct function without knowing the data types of all parameters, including those for which Null is passed.
Conversion from the NULL literal to a Null of a specific type is possible using the CAST introduced in SQL-92. For
example, the query

SELECT 'ok' WHERE (NULL <> 1) IS NULL;

parses and executes successfully in some environments (e.g. SQLite or PostgreSQL), which unify a NULL boolean
with Unknown, but fails to parse in others (e.g. in SQL Server Compact). MySQL behaves similarly to PostgreSQL
in this regard (with the minor exception that MySQL regards TRUE and FALSE as no different from the ordinary
integers 1 and 0). PostgreSQL additionally implements an IS UNKNOWN predicate, which can be used to test
whether a three-value logical outcome is Unknown, although this is merely syntactic sugar.
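The query above can be tried through Python's sqlite3 module; SQLite is one of the environments that unify the Unknown truth value with NULL, so a row is returned:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# (NULL <> 1) evaluates to Unknown, which SQLite represents as NULL,
# so the IS NULL test succeeds and the row is produced.
rows = conn.execute("SELECT 'ok' WHERE (NULL <> 1) IS NULL").fetchall()
print(rows)  # [('ok',)]
```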
BOOLEAN data type

The ISO SQL:1999 standard introduced the BOOLEAN data type to SQL; however, it is still just an optional,
non-core feature, coded T031.

When restricted by a NOT NULL constraint, the SQL BOOLEAN works like the Boolean type from other
languages. Unrestricted, however, the BOOLEAN data type, despite its name, can hold the truth values TRUE,
FALSE, and UNKNOWN, all of which are defined as boolean literals according to the standard. The standard also
asserts that NULL and UNKNOWN "may be used interchangeably to mean exactly the same thing".[5]
The Boolean type has been the subject of criticism, particularly because of the mandated behavior of the UNKNOWN
literal, which is never equal to itself because of its identification with NULL.
As discussed above, in the PostgreSQL implementation of SQL, the null value is used to represent all UNKNOWN
results, including the UNKNOWN BOOLEAN. PostgreSQL does not implement the UNKNOWN literal (although it
does implement the IS UNKNOWN operator, which is an orthogonal feature). Most other major vendors do not
support the Boolean type (as defined in T031) as of 2012.[6] The procedural part of Oracle's PL/SQL supports
BOOLEAN variables, however; these can also be assigned NULL, and the value is considered the same as
UNKNOWN.
Controversy
Common mistakes
Misunderstanding of how Null works is the cause of a great number of errors in SQL code, both in ISO standard
SQL statements and in the specific SQL dialects supported by real-world database management systems. These
mistakes are usually the result of confusion between Null and either 0 (zero) or an empty string (a string value with a
length of zero, represented in SQL as ''). Null is defined by the ISO SQL standard as different from both an empty
string and the numerical value 0, however. While Null indicates the absence of any value, the empty string and
numerical zero both represent actual values.
A classic rookie error is attempting to use the equality operator to find NULL values. Most SQL implementations
will execute the following query as syntactically correct (and therefore give no error message), but it never returns
any rows, regardless of whether NULL values exist in the table.

SELECT *
FROM sometable
WHERE num = NULL;  -- Should be "WHERE num IS NULL"

In a related, but more subtle example, a WHERE clause or conditional statement might compare a column's value
with a constant. It is often incorrectly assumed that a missing value would be "less than" or "not equal to" a constant
if that field contains Null, but, in fact, such expressions return Unknown. An example is below:

SELECT *
FROM sometable
WHERE num <> 1;  -- Rows where num is NULL will not be returned,
                 -- contrary to many users' expectations.

Similarly, Null values are often confused with empty strings. Consider the LENGTH function, which returns the
number of characters in a string. When a Null is passed into this function, the function returns Null. This can lead to
unexpected results, if users are not well versed in 3-value logic. An example is below:

SELECT *
FROM sometable
WHERE LENGTH(string) < 20;  -- Rows where string is NULL will not be returned.

This is complicated by the fact that in some database interface programs (or even database implementations like
Oracle's), NULL is reported as an empty string, and empty strings may be incorrectly stored as NULL.
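All three pitfalls can be reproduced in one short sketch using Python's sqlite3 module (table and column names as in the examples above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (num INTEGER, string TEXT)")
conn.executemany("INSERT INTO sometable VALUES (?, ?)",
                 [(2, 'abc'), (None, None)])

# num = NULL yields Unknown for every row: no rows returned.
eq = conn.execute("SELECT * FROM sometable WHERE num = NULL").fetchall()
# IS NULL is the correct predicate.
is_null = conn.execute("SELECT * FROM sometable WHERE num IS NULL").fetchall()
# num <> 1 silently drops the Null row as well.
neq = conn.execute("SELECT * FROM sometable WHERE num <> 1").fetchall()
# LENGTH(NULL) is Null, so the Null row also fails the < 20 test.
short = conn.execute(
    "SELECT * FROM sometable WHERE LENGTH(string) < 20").fetchall()
print(eq, is_null, neq, short)
# [] [(None, None)] [(2, 'abc')] [(2, 'abc')]
```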
Criticisms

The ISO SQL implementation of Null is the subject of criticism, debate and calls for change. In The Relational
Model for Database Management: Version 2, Codd suggested that the SQL implementation of Null was flawed and
should be replaced by two distinct Null-type markers. The markers he proposed were to stand for "Missing but
Applicable" and "Missing but Inapplicable", known as A-values and I-values, respectively. Codd's recommendation,
if accepted, would have required the implementation of a four-valued logic in SQL. Others have suggested adding
additional Null-type markers to Codd's recommendation to indicate even more reasons that a data value might be
"Missing", increasing the complexity of SQL's logic system. At various times, proposals have also been put forth to
implement multiple user-defined Null markers in SQL. Because of the complexity of the Null-handling and logic
systems required to support multiple Null markers, none of these proposals have gained widespread acceptance.

Chris Date and Hugh Darwen, authors of The Third Manifesto, have suggested that the SQL Null implementation is
inherently flawed and should be eliminated altogether, pointing to inconsistencies and flaws in the implementation of
SQL Null-handling (particularly in aggregate functions) as proof that the entire concept of Null is flawed and should
be removed from the relational model. Others, like author Fabian Pascal, have stated a belief that "how the function
calculation should treat missing values is not governed by the relational model."[citation needed]
Closed world assumption
Another point of conflict concerning Nulls is that they violate the closed world assumption model of relational
databases by introducing an open world assumption into it. The closed world assumption, as it pertains to databases,
states that "Everything stated by the database, either explicitly or implicitly, is true; everything else is false." This
view assumes that the knowledge of the world stored within a database is complete. Nulls, however, operate under
the open world assumption, in which some items stored in the database are considered unknown, making the
database's stored knowledge of the world incomplete.
References

[1] Ron van der Meyden, "Logical approaches to incomplete information: a survey (http://books.google.com/books?id=gF0b85IuqQwC&pg=PA344)" in Chomicki, Jan; Saake, Gunter (Eds.), Logics for Databases and Information Systems, Kluwer Academic Publishers, ISBN 978-0-7923-8129-7, p. 344; PS preprint (http://www.cse.unsw.edu.au/~meyden/research/indef-review.ps) (note: page numbering differs in preprint from the published version)
[2] C. J. Date (2004), An Introduction to Database Systems, 8th ed., Pearson Education, p. 594
[3] C. J. Date, Relational Database Writings, 1991-1994, Addison-Wesley, 1995, p. 371
[4] C. J. Date (2004), An Introduction to Database Systems, 8th ed., Pearson Education, p. 584
[5] ISO/IEC 9075-2:2011 §4.5
[6] Troels Arvin, Survey of BOOLEAN data type implementation (http://troels.arvin.dk/db/rdbms/#data_types-boolean)
Further reading

• E. F. Codd. Understanding relations (installment #7). FDT Bulletin of ACM-SIGMOD, 7(3-4):23–28, 1975.
• Codd, E. F. (1979). "Extending the database relational model to capture more meaning". ACM Transactions on Database Systems 4 (4): 397. doi:10.1145/320107.320109 (http://dx.doi.org/10.1145/320107.320109). Especially §2.3.
• Date, C. J. (2000). The Database Relational Model: A Retrospective Review and Analysis: A Historical Account and Assessment of E. F. Codd's Contribution to the Field of Database Technology. Addison Wesley Longman. ISBN 0-201-61294-1.
• Klein, Hans-Joachim. "How to modify SQL queries in order to guarantee sure answers (http://www.acm.org/sigmod/record/issues/9409/sql.ps)". ACM SIGMOD Record 23.3 (1994): 14-20.
• Claude Rubinson, Nulls, Three-Valued Logic, and Ambiguity in SQL: Critiquing Date's Critique (http://www.u.arizona.edu/~rubinson/scrawl/Rubinson.2007.Nulls_Three-Valued_Logic_and_Ambiguity_in_SQL.pdf),
SIGMOD Record, December 2007 (Vol. 36, No. 4)
• John Grant, Null Values in SQL (http://www09.sigmod.org/sigmod/record/issues/0809/p23.grant.pdf). SIGMOD Record, September 2008 (Vol. 37, No. 3)
• Waraporn, Narongrit, and Kriengkrai Porkaew. "Null semantics for subqueries and atomic predicates (http://www.iaeng.org/IJCS/issues_v35/issue_3/IJCS_35_3_08.pdf)". IAENG International Journal of Computer Science 35.3 (2008): 305-313.
• Bernhard Thalheim, Klaus-Dieter Schewe, "NULL 'Value' Algebras and Logics" in Anneli Heimbürger, Yasushi Kiyoki, Takehiro Tokuda, Hannu Jaakkola, Naofumi Yoshida (eds.), Information Modelling and Knowledge Bases XXII, Frontiers in Artificial Intelligence and Applications, Volume 225, 2011, IOS Press, ISBN 978-1-60750-689-8, pp. 354–367, doi:10.3233/978-1-60750-690-4-354 (http://dx.doi.org/10.3233/978-1-60750-690-4-354)
• Enrico Franconi and Sergio Tessaris, On the Logic of SQL Nulls (http://ceur-ws.org/Vol-866/paper8.pdf), Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management, Ouro Preto, Brazil, June 27–30, 2012. pp. 114–128
External links

• Oracle NULLs (http://www.psoug.org/reference/null.html)
• The Third Manifesto (http://www.thethirdmanifesto.com/)
• Implications of NULLs in sequencing of data (http://www.sqlexpert.co.uk/2006/05/treatment-of-nulls-by-oracle-sql.html)
• Java bug report about jdbc not distinguishing null and empty string, which Sun closed as "not a bug" (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4032732)
• The Integration Engineer (http://www.theintegrationengineer.com/the-nature-of-null/) explains how NULL works and the logic behind it.
Candidate key
In the relational model of databases, a candidatekeyof a relation is a minimal superkey for that relation; that is, a
set of attributes such that
1. the relation does not have two distinct tuples (i.e. rows or records in common database language) with the same
values for these attributes (which means that the set of attributes is a superkey)
2. there is no proper subset of these attributes for which (1) holds (which means that the set is minimal).

The constituent attributes are called prime attributes. Conversely, an attribute that does not occur in ANY candidate
key is called a non-prime attribute.
Since a relation contains no duplicate tuples, the set of all its attributes is a superkey if NULL values are not used. It
follows that every relation will have at least one candidate key.
The candidate keys of a relation tell us all the possible ways we can identify its tuples. As such they are an important
concept for the design of database schema.
Example
The definition of candidate keys can be illustrated with the following (abstract) example. Consider a relation variable
(relvar) R with attributes (A, B, C, D) that has only the following two legal values r1 and r2:
r1
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a2 b1 c2 d1
r2
A B C D
a1 b1 c1 d1
a1 b2 c2 d1
a1 b1 c2 d2
Determining candidate keys

The set of all candidate keys can be computed e.g. from the set of functional dependencies. To this end we need to
define the attribute closure α⁺ for an attribute set α. The set α⁺ contains all attributes that are functionally implied
by α.

It is quite simple to find a single candidate key. We start with a set α of attributes and try to remove successively
each attribute. If after removing an attribute the attribute closure stays the same, then this attribute is not necessary
and we can remove it permanently. We call the result minimize(α). If α is the set of all attributes, then minimize(α)
is a candidate key.

Actually we can detect every candidate key with this procedure by simply trying every possible order of removing
attributes. However, there are many more permutations of attributes (n!) than subsets (2ⁿ). That is, many attribute
orders will lead to the same candidate key.

There is a fundamental difficulty for efficient algorithms for candidate key computation: certain sets of functional
dependencies lead to exponentially many candidate keys. Consider the 2n functional dependencies
{A_i → B_i, B_i → A_i : 1 ≤ i ≤ n}, which yield 2ⁿ candidate keys: each candidate key contains, for every i, either
A_i or B_i. That is, the best we can expect is an algorithm that is efficient with respect to the number of candidate
keys.
The following algorithm actually runs in polynomial time in the number of candidate keys and functional
dependencies:

K[0] := minimize(A);  /* A is the set of all attributes */
n := 1;               /* Number of keys known so far */
i := 0;               /* Currently processed key */
while i < n do
    for each α → β ∈ F do
        S := α ∪ (K[i] − β);
        found := false;
        for j := 0 to n−1 do
            if K[j] ⊆ S then found := true;
        if not found then
            K[n] := minimize(S);
            n := n + 1;
    i := i + 1;

The idea behind the algorithm is that given a candidate key K[i] and a functional dependency α → β, the reverse
application of the functional dependency yields the set α ∪ (K[i] − β), which is a key, too. It may however be
covered by other already known candidate keys. (The algorithm checks this case using the 'found' variable.) If not,
then minimizing the new key yields a new candidate key. The key insight is (pun not intended) that all candidate
keys can be created this way.
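The pseudocode above can be sketched in Python; this is an illustrative implementation (not optimized), where attribute sets are frozensets and F is a list of (α, β) pairs:

```python
def closure(attrs, F):
    """Attribute closure: all attributes functionally implied by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for alpha, beta in F:
            if alpha <= result and not beta <= result:
                result |= beta
                changed = True
    return frozenset(result)

def minimize(attrs, all_attrs, F):
    """Drop each attribute whose removal keeps the closure equal to all_attrs."""
    key = set(attrs)
    for a in sorted(attrs):
        if closure(key - {a}, F) == all_attrs:
            key.discard(a)
    return frozenset(key)

def candidate_keys(all_attrs, F):
    """Enumerate every candidate key by reverse application of each FD."""
    all_attrs = frozenset(all_attrs)
    keys = [minimize(all_attrs, all_attrs, F)]
    i = 0
    while i < len(keys):
        for alpha, beta in F:
            s = alpha | (keys[i] - beta)      # reverse application: also a key
            if not any(k <= s for k in keys):  # the 'found' check
                keys.append(minimize(s, all_attrs, F))
        i += 1
    return set(keys)

# The dependencies A -> B and B -> A give the two candidate keys {A} and {B}.
F = [(frozenset('A'), frozenset('B')), (frozenset('B'), frozenset('A'))]
print(candidate_keys('AB', F))  # {frozenset({'A'}), frozenset({'B'})}
```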
References
• Date, Christopher (2003). "5: Integrity". An Introduction to Database Systems. Addison-Wesley. pp. 268–276.
ISBN 978-0-321-18956-1.
External links
• Relational Database Management Systems - Database Design - Terms of Reference - Keys
(http://rdbms.opengrass.net/2_Database Design/2.1_TermsOfReference/2.1.2_Keys.html): An overview
of the different types of keys in an RDBMS (Relational Database Management System).
Foreign key
In the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely
identifies a row of another table. In other words, a foreign key is a column or a combination of columns that is used
to establish and enforce a link between the data in two tables.
For example, consider a database with two tables, a CUSTOMER table that includes all customer data and an
ORDER table that includes all customer orders. Suppose that the business requires that each order must refer to a
single customer. To reflect this in the database, the primary key (e.g., CUSTOMERID) in the CUSTOMER table is
added to the ORDER table, where it is called a foreign key. Since CUSTOMERID in the ORDER table uniquely
identifies a row of the CUSTOMER table, it says which customer placed the order.
The table containing the foreign key is called the referencing or child table, and the table containing the candidate
key is called the referenced or parent table. Since the purpose of the foreign key in the referencing table is to identify
a row of the referenced table, the value of the foreign key must be equal to the candidate key's value in some row of
the parent table, or else have no value, i.e., the NULL value. This rule is called a referential integrity constraint between the two
tables. Because violations of referential integrity constraints can be the source of many database problems, most
database management systems enforce referential integrity constraints, providing mechanisms to ensure that every
non-null foreign key corresponds to a row of the referenced (or parent) table.
Foreign keys play an essential role in database design. One important part of database design is making sure that
relationships between real-world entities are reflected in the database by references, using foreign keys to refer from
one table to another. Another important part of database design is database normalization, in which tables are broken
apart and foreign keys make it possible for them to be reconstructed.
Multiple rows in the referencing (or child) table may refer to the same row in the referenced (or parent) table. For
this reason, the relationship between the two tables is called a one-to-many relationship between the referenced
table and the referencing table. The child and parent table may be the same table, i.e. the foreign key refers back to
the same table. Such a foreign key is known in SQL:2003 as a self-referencing or recursive foreign key.
A table may have multiple foreign keys, and each foreign key can have a different parent table. Each foreign key is
enforced independently by the database system. Therefore, cascading relationships between tables can be established
using foreign keys.
Defining foreign keys

Foreign keys are defined in the ISO SQL Standard, through a FOREIGN KEY constraint. The syntax to add such a
constraint to an existing table is defined in SQL:2003 as shown below. Omitting the column list in the
REFERENCES clause implies that the foreign key shall reference the primary key of the referenced table.

ALTER TABLE <table identifier>
    ADD [ CONSTRAINT <constraint identifier> ]
    FOREIGN KEY ( <column expression> {, <column expression>} ... )
    REFERENCES <table identifier> [ ( <column expression> {, <column expression>} ... ) ]
    [ ON UPDATE <referential action> ]
    [ ON DELETE <referential action> ]

Likewise, foreign keys can be defined as part of the CREATE TABLE SQL statement:

CREATE TABLE table_name (
    id   INTEGER PRIMARY KEY,
    col2 CHARACTER VARYING(20),
    col3 INTEGER,
    ...
    FOREIGN KEY (col3)
        REFERENCES other_table (key_col) ON DELETE CASCADE,
    ...)

If the foreign key is a single column only, the column can be marked as such using the following syntax:

CREATE TABLE table_name (
    id   INTEGER PRIMARY KEY,
    col2 CHARACTER VARYING(20),
    col3 INTEGER REFERENCES other_table (column_name),
    ...)

Foreign keys can also be defined with a stored procedure statement:

sp_foreignkey tabname, pktabname, col1 [, col2] ... [, col8]

• tabname: the name of the table or view that contains the foreign key to be defined.
• pktabname: the name of the table or view that has the primary key to which the foreign key applies. The primary
key must already be defined.
• col1: the name of the first column that makes up the foreign key. The foreign key must have at least one column
and can have a maximum of eight columns.
Referential actions

Because the database management system enforces referential constraints, it must ensure data integrity if rows in a
referenced table are to be deleted (or updated). If dependent rows in referencing tables still exist, those references
have to be considered. SQL:2003 specifies 5 different referential actions that shall take place in such occurrences:

• CASCADE
• RESTRICT
• NO ACTION
• SET NULL
• SET DEFAULT
CASCADE
Whenever rows in the master (referenced) table are deleted (resp. updated), the respective rows of the child
(referencing) table with a matching foreign key column will get deleted (resp. updated) as well. This is called a
cascade delete (resp. update).
RESTRICT

A value cannot be updated or deleted when a row exists in a referencing or child table that references the value in
the referenced table.

Similarly, a row cannot be deleted as long as there is a reference to it from a referencing or child table.
NO ACTION

NO ACTION and RESTRICT are very much alike. The main difference between NO ACTION and RESTRICT is
that with NO ACTION the referential integrity check is done after trying to alter the table. RESTRICT does the
check before trying to execute the UPDATE or DELETE statement. Both referential actions act the same if the
referential integrity check fails: the UPDATE or DELETE statement will result in an error.

In other words, when an UPDATE or DELETE statement is executed on the referenced table using the referential
action NO ACTION, the DBMS verifies at the end of the statement execution that none of the referential
relationships are violated. This is different from RESTRICT, which assumes at the outset that the operation will
violate the constraint. Using NO ACTION, the triggers or the semantics of the statement itself may yield an end state
in which no foreign key relationships are violated by the time the constraint is finally checked, thus allowing the
statement to complete successfully.
SET DEFAULT, SET NULL
In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON DELETE or
ON UPDATE: The value of the affected referencing attributes is changed to NULL for SET NULL, and to the
specified default value for SET DEFAULT.
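Two of the ON DELETE actions can be compared side by side in SQLite via Python's sqlite3 module, used here as one concrete implementation (foreign-key enforcement must be switched on first):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child_cascade (
        pid INTEGER REFERENCES parent (id) ON DELETE CASCADE)
""")
conn.execute("""
    CREATE TABLE child_setnull (
        pid INTEGER REFERENCES parent (id) ON DELETE SET NULL)
""")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child_cascade VALUES (1)")
conn.execute("INSERT INTO child_setnull VALUES (1)")

# Deleting the parent deletes the CASCADE child row and nulls the SET NULL one.
conn.execute("DELETE FROM parent WHERE id = 1")
n_cascade = conn.execute("SELECT COUNT(*) FROM child_cascade").fetchone()[0]
setnull = conn.execute("SELECT pid FROM child_setnull").fetchall()
print(n_cascade, setnull)  # 0 [(None,)]
```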
Triggers

Referential actions are generally implemented as implied triggers (i.e. triggers with system-generated names, often
hidden). As such, they are subject to the same limitations as user-defined triggers, and their order of execution
relative to other triggers may need to be considered; in some cases it may become necessary to replace the referential
action with its equivalent user-defined trigger to ensure proper execution order, or to work around mutating-table
limitations.

Another important limitation appears with transaction isolation: your changes to a row may not be able to fully
cascade because the row is referenced by data your transaction cannot "see", and therefore cannot cascade onto. An
example: while your transaction is attempting to renumber a customer account, a simultaneous transaction is
attempting to create a new invoice for that same customer; while a CASCADE rule may fix all the invoice rows your
transaction can see to keep them consistent with the renumbered customer row, it won't reach into another
transaction to fix the data there; because the database cannot guarantee consistent data when the two transactions
commit, one of them will be forced to roll back (often on a first-come-first-served basis).
Example
As a first example to illustrate foreign keys, suppose an accounts database has a table with invoices and each invoice
is associated with a particular supplier. Supplier details (such as name and address) are kept in a separate table; each
supplier is given a 'supplier number' to identify it. Each invoice record has an attribute containing the supplier number
for that invoice. Then, the 'supplier number' is the primary key in the Supplier table. The foreign key in the Invoices
table points to that primary key. The relational schema is the following. Primary keys are marked in bold, and
foreign keys are marked in italics.

Supplier(SupplierNumber, Name, Address, Type)
Invoices(InvoiceNumber, SupplierNumber, Text)

The corresponding Data Definition Language statement is as follows.

CREATE TABLE Supplier (
    SupplierNumber  INTEGER       NOT NULL,
    Name            VARCHAR(20)   NOT NULL,
    Address         VARCHAR(50)   NOT NULL,
    Type            VARCHAR(10),
    CONSTRAINT supplier_pk PRIMARY KEY (SupplierNumber),
    CONSTRAINT number_value CHECK (SupplierNumber > 0)
)

CREATE TABLE Invoices (
    InvoiceNumber   INTEGER       NOT NULL,
    SupplierNumber  INTEGER       NOT NULL,
    Text            VARCHAR(4096),
    CONSTRAINT invoice_pk PRIMARY KEY (InvoiceNumber),
    CONSTRAINT inumber_value CHECK (InvoiceNumber > 0),
    CONSTRAINT supplier_fk FOREIGN KEY (SupplierNumber)
        REFERENCES Supplier (SupplierNumber)
        ON UPDATE CASCADE ON DELETE RESTRICT
)

References

External links

• SQL-99 Foreign Keys (https://kb.askmonty.org/en/constraint_type-foreign-key-constraint/)
• PostgreSQL Foreign Keys (http://www.postgresql.org/docs/9.2/static/tutorial-fk.html)
• MySQL Foreign Keys (http://dev.mysql.com/doc/refman/5.1/en/create-table-foreign-keys.html)
• Firebird SQL Foreign Keys (http://www.firebirdsql.org/manual/nullguide-keys.html#nullguide-keys-fk)
• SQLite support for Foreign Keys (http://www.sqlite.org/foreignkeys.html)
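The ON UPDATE CASCADE and ON DELETE RESTRICT actions in the schema above can be exercised directly. The sketch below is illustrative only: it runs the (trimmed) Supplier/Invoices schema against SQLite through Python's sqlite3 module, and assumes SQLite's requirement that foreign-key enforcement be switched on with a PRAGMA.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on

conn.execute("""CREATE TABLE Supplier (
    SupplierNumber INTEGER NOT NULL,
    Name           VARCHAR(20) NOT NULL,
    CONSTRAINT supplier_pk PRIMARY KEY (SupplierNumber))""")

conn.execute("""CREATE TABLE Invoices (
    InvoiceNumber  INTEGER NOT NULL,
    SupplierNumber INTEGER NOT NULL,
    CONSTRAINT invoice_pk PRIMARY KEY (InvoiceNumber),
    CONSTRAINT supplier_fk FOREIGN KEY (SupplierNumber)
        REFERENCES Supplier (SupplierNumber)
        ON UPDATE CASCADE ON DELETE RESTRICT)""")

conn.execute("INSERT INTO Supplier VALUES (1, 'Acme')")
conn.execute("INSERT INTO Invoices VALUES (100, 1)")

# ON UPDATE CASCADE: renumbering the supplier propagates to its invoices.
conn.execute("UPDATE Supplier SET SupplierNumber = 7 WHERE SupplierNumber = 1")
print(conn.execute("SELECT SupplierNumber FROM Invoices").fetchone())  # (7,)

# ON DELETE RESTRICT: the supplier cannot be deleted while invoices refer to it.
try:
    conn.execute("DELETE FROM Supplier WHERE SupplierNumber = 7")
except sqlite3.IntegrityError as e:
    print("delete blocked:", e)
```

The same statements should behave equivalently on any RDBMS from the links above that supports these referential actions.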
Unique key 102
Unique key
In an entity relationship diagram of a data model, one or more unique keys may be declared for each data entity.
Each unique key is composed from one or more data attributes of that data entity. The set of unique keys declared
for a data entity is often referred to as the candidate keys for that data entity. From the set of candidate keys, a
single unique key is selected and declared the primary key for that data entity. In an entity relationship diagram,
each entity relationship uses a unique key, most often the primary key, of one data entity and copies the unique key
data attributes to another data entity to which it relates. This inheritance of the unique key data attributes is referred
to as a foreign key and is used to provide data access paths between data entities. Once the data model is
instantiated into a database, each data entity usually becomes a database table, unique keys become unique indexes
associated with their assigned database tables, and entity relationships become foreign key constraints. In integrated
data models,[1][2] commonality relationships do not become foreign key constraints since commonality relationships
are a peer-to-peer type of relationship.
In a relational database, a "Primary Key" is a key that uniquely defines the characteristics of each row (also known
as a record or tuple). The primary key has to consist of characteristics that cannot be duplicated by any other row.
The primary key may consist of a single attribute or of multiple attributes in combination. For example, a birthday
could be shared by many people and so would not be a prime candidate for the Primary Key, but a social security
number or driver's license number would be ideal, since each identifies a single person. Another characteristic of a
Primary Key as it pertains to a relational database is that a Primary Key can also serve as a Foreign Key on a
related table.[citation needed] For example:
Author Table Schema:
AuthorTable(AUTHOR_ID, AuthorName, CountryBorn, YearBorn)

Book Table Schema:
BookTable(ISBN, Author_ID, Title, Publisher, Price)

Here we can see that AUTHOR_ID serves as the Primary Key in AuthorTable but also serves as the Foreign Key on
the BookTable. The Foreign Key serves as the link and therefore the connection between the two "related" tables in
this sample database.

In a relational database, a unique key index can uniquely identify each row of data values in a database table. A
unique key index comprises a single column or a set of columns in a single database table. No two distinct rows or
data records in a database table can have the same data value (or combination of data values) in those unique key
index columns if NULL values are not used. Depending on its design, a database table may have many unique key
indexes but at most one primary key index.
A unique key constraint does not imply the NOT NULL constraint in practice. Because NULL is not an actual value (it
represents the lack of a value), when two rows are compared, and both rows have NULL in a column, the column
values are not considered to be equal. Thus, in order for a unique key to uniquely identify each row in a table, NULL
values must not be used.[3] According to the SQL standard and relational model theory, a unique key (unique
constraint) should accept NULL in several rows/tuples; however, not all RDBMS implement this feature correctly.[4][5]

A unique key should uniquely identify all possible rows that exist in a table and not only the currently existing
rows.[citation needed] Examples of unique keys are Social Security numbers (associated with a specific person[6]) or
ISBNs (associated with a specific book). Telephone books and dictionaries cannot use names, words, or Dewey
Decimal system numbers as candidate keys because they do not uniquely identify telephone numbers or words.
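The NULL behaviour described above can be observed directly. This sketch assumes SQLite (one RDBMS that follows the standard here), driven through Python's sqlite3 module; the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (ssn CHAR(11) UNIQUE, name VARCHAR(40))")

# Two rows with NULL in the unique column are accepted, because NULL is not
# considered equal to NULL when the constraint is checked.
conn.execute("INSERT INTO person VALUES (NULL, 'Alice')")
conn.execute("INSERT INTO person VALUES (NULL, 'Bob')")

# Two rows with the same non-NULL value are rejected.
conn.execute("INSERT INTO person VALUES ('123-45-6789', 'Carol')")
try:
    conn.execute("INSERT INTO person VALUES ('123-45-6789', 'Dave')")
except sqlite3.IntegrityError:
    print("duplicate ssn rejected")

print(conn.execute("SELECT COUNT(*) FROM person").fetchone()[0])  # 3
```

As the text notes, not every RDBMS behaves this way; some reject the second NULL row.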
A table can have at most one primary key, but more than one unique key. A primary key is a combination of
columns which uniquely specify a row. It is a special case of a unique key. One difference is that primary keys have
an implicit NOT NULL constraint while unique keys do not. Thus, the values in unique key columns may or may not
be NULL; in some implementations such a column may contain at most one NULL value, while others permit
multiple NULLs.[7] Another difference is that primary keys must be defined using a different syntax.

The relational model, as expressed through relational calculus and relational algebra, does not distinguish between
primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the
application programmer.[citation needed]
Unique keys as well as primary keys can be referenced by foreign keys.
Defining primary keys

Primary keys are defined in the ANSI SQL Standard, through the PRIMARY KEY constraint. The syntax to add such
a constraint to an existing table is defined in SQL:2003 like this:

ALTER TABLE <table identifier>
    ADD [ CONSTRAINT <constraint identifier> ]
    PRIMARY KEY ( <column expression> {, <column expression>} ... )

The primary key can also be specified directly during table creation. In the SQL Standard, primary keys may consist
of one or multiple columns. Each column participating in the primary key is implicitly defined as NOT NULL. Note
that some DBMS require explicitly marking primary-key columns as NOT NULL.[citation needed]

CREATE TABLE table_name (
    ...
)

If the primary key consists only of a single column, the column can be marked as such using the following syntax:

CREATE TABLE table_name (
    id_col  INT  PRIMARY KEY,
    col2    CHARACTER VARYING(20),
    ...
)

Differences between Primary Key and Unique Key:

Primary Key
1. A primary key cannot allow null values. (You cannot define a primary key on columns that allow nulls.)
2. Each table can have at most one primary key.
3. On some RDBMS a primary key automatically generates a clustered table index by default.

Unique Key
1. A unique key can allow null values. (You can define a unique key on columns that allow nulls.)
2. Each table can have multiple unique keys.
3. On some RDBMS a unique key automatically generates a non-clustered table index by default.
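Point 2 of each list can be observed on a concrete system. The sketch below uses SQLite via Python's sqlite3 module; behaviour on other RDBMS may differ in detail (SQLite, notably, does not enforce the implicit NOT NULL on non-INTEGER primary key columns), and the table names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# At most one primary key per table: declaring two is rejected outright.
try:
    conn.execute("CREATE TABLE bad (a INT PRIMARY KEY, b INT PRIMARY KEY)")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# One primary key plus any number of unique keys is fine.
conn.execute("""CREATE TABLE ok (
    a INT PRIMARY KEY,
    b INT UNIQUE,
    c INT UNIQUE)""")
```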
Defining unique keys

The definition of unique keys is syntactically very similar to primary keys.

ALTER TABLE <table identifier>
    ADD [ CONSTRAINT <constraint identifier> ]
    UNIQUE ( <column expression> {, <column expression>} ... )

Likewise, unique keys can be defined as part of the CREATE TABLE SQL statement.

CREATE TABLE table_name (
    id_col   INT,
    col2     CHARACTER VARYING(20),
    key_col  SMALLINT,
    ...
    CONSTRAINT key_unique UNIQUE (key_col),
    ...
)

CREATE TABLE table_name (
    id_col   INT  PRIMARY KEY,
    col2     CHARACTER VARYING(20),
    key_col  SMALLINT UNIQUE,
    ...
)
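Both CREATE TABLE forms can be tried out directly. This sketch assumes SQLite via Python's sqlite3 module (SQLite does not support the ALTER TABLE ... ADD CONSTRAINT form shown above, so only the CREATE TABLE variants are exercised), and uses the column names from the examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table-level constraint form, as in the first CREATE TABLE above.
conn.execute("""CREATE TABLE t1 (
    id_col  INT,
    col2    CHARACTER VARYING(20),
    key_col SMALLINT,
    CONSTRAINT key_unique UNIQUE (key_col))""")

# Column-level shorthand, as in the second CREATE TABLE above.
conn.execute("""CREATE TABLE t2 (
    id_col  INT PRIMARY KEY,
    col2    CHARACTER VARYING(20),
    key_col SMALLINT UNIQUE)""")

# Either way, a duplicate value in the unique column is rejected.
conn.execute("INSERT INTO t1 (key_col) VALUES (1)")
try:
    conn.execute("INSERT INTO t1 (key_col) VALUES (1)")
except sqlite3.IntegrityError:
    print("duplicate key_col rejected")
```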
Surrogate keys

In some design situations the natural key that uniquely identifies a tuple in a relation is difficult to use for software
development. For example, it may involve multiple columns or large text fields. A surrogate key can be used as the
primary key. In other situations there may be more than one candidate key for a relation, and no candidate key is
obviously preferred. A surrogate key may be used as the primary key to avoid giving one candidate key artificial
primacy over the others.

Since primary keys exist primarily as a convenience to the programmer, surrogate primary keys are often used, in
many cases exclusively, in database application design.

Due to the popularity of surrogate primary keys, many developers and in some cases even theoreticians have come
to regard surrogate primary keys as an inalienable part of the relational data model. This is largely due to a
migration of principles from the object-oriented programming model to the relational model, creating the hybrid
object-relational model. In the ORM, these additional restrictions are placed on primary keys:
• Primary keys should be immutable, that is, not changed until the record is destroyed.
• Primary keys should be anonymous integer or numeric identifiers.
However, neither of these restrictions is part of the relational model or any SQL standard. Due diligence should be
applied when deciding on the immutability of primary key values during database and application design. Some
database systems even imply that values in primary key columns cannot be changed using the UPDATE SQL
statement.[citation needed]
Alternate key

It is commonplace in SQL databases to declare a single primary key, the most important unique key. However,
there could be further unique keys that would serve the same purpose. These should be marked as 'unique' keys.
This is done to prevent incorrect data from entering a table (a duplicate entry is not valid in a unique column) and to
make the database more complete and useful. These could be called alternate keys.[8]
References
[1] Data Model Integration | The Integration of Data Models (http://www.strins.com/data-model-integration.html)
[2] Commonality Relationships | Commonality Constraints (http://www.strins.com/commonality-relationships.html)
[3] Summary of ANSI/ISO/IEC SQL (http://www.xcdsql.org/SummaryofSQL.html#chapter-Tableconstraints)
[4] Constraints - SQL Database Reference Material (http://www.sql.org/sql-database/postgresql/manual/ddl-constraints.html#AEN1832)
[5] Comparison of different SQL implementations (http://troels.arvin.dk/db/rdbms/#constraints-unique)
[6] SSN uniqueness: Rare SSN duplicates do exist in the field, a condition that led to problems with early commercial
computer systems that relied on SSN uniqueness. Practitioners are taught that well-known duplications in SSN
assignments occurred in the early days of the SSN system. This situation points out the complexity of designing
systems that assume unique keys in real-world data.
[7] MySQL 5.5 Reference Manual :: 12.1.14. CREATE TABLE Syntax (http://dev.mysql.com/doc/refman/5.5/en/create-table.html)
"For all engines, a UNIQUE index permits multiple NULL values for columns that can contain NULL."
[8] Alternate key - Oracle FAQ (http://www.orafaq.com/wiki/Alternate_key)
External links

• Relation Database terms of reference, Keys (http://rdbms.opengrass.net/2_DatabaseDesign/2.1_TermsOfReference/2.1.2_Keys.html):
An overview of the different types of keys in an RDBMS
Superkey
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable for
which it holds that in all relations assigned to that variable, there are no two distinct tuples (rows) that have the same
values for the attributes in this set. Equivalently, a superkey can also be defined as a set of attributes of a relation
schema upon which all attributes of the schema are functionally dependent.
Note that the set of all attributes is a trivial superkey, because in relational algebra duplicate rows are not permitted.
Also note that if attribute set K is a superkey of relation R, then at all times it is the case that the projection of R over
K has the same cardinality as R itself.
Informally, a superkey is a set of attributes within a table whose values can be used to uniquely identify a tuple. A
candidate key is a minimal set of attributes necessary to identify a tuple; this is also called a minimal superkey. For
example, given an employee schema consisting of the attributes employeeID, name, job, and departmentID, we
could use the employeeID in combination with any or all other attributes of this table to uniquely identify a tuple in
the table. Examples of superkeys in this schema would be {employeeID, name}, {employeeID, name, job}, and
{employeeID, name, job, departmentID}. The last example is known as a trivial superkey, because it uses all
attributes of this table to identify the tuple.
In a real database we do not need values for all of those attributes to identify a tuple. We only need, per our
example, the set {employeeID}. This is a minimal superkey; that is, a minimal set of attributes that can be used to
identify a single tuple. So, employeeID is a candidate key.
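These definitions can be checked mechanically on a small relation: a set of columns is a superkey exactly when projecting the rows onto those columns produces no duplicates, and the candidate keys are the minimal such sets. The sketch below is illustrative; the rows and the helper name `is_superkey` are invented for the example.

```python
from itertools import combinations

# Employee schema from the text; the rows are made up. Two employees share
# name, job, and department, so only employeeID distinguishes them.
attrs = ("employeeID", "name", "job", "departmentID")
rows = [
    (101, "Sato",   "engineer", 1),
    (102, "Suzuki", "manager",  2),
    (103, "Sato",   "engineer", 1),
]

def is_superkey(cols):
    """True if projecting the rows onto `cols` yields no duplicate tuples."""
    idx = [attrs.index(c) for c in cols]
    projection = [tuple(r[i] for i in idx) for r in rows]
    return len(set(projection)) == len(projection)

superkeys = [set(c)
             for n in range(1, len(attrs) + 1)
             for c in combinations(attrs, n)
             if is_superkey(c)]

# Candidate keys are the minimal superkeys: no proper subset is a superkey.
candidates = [k for k in superkeys
              if not any(other < k for other in superkeys)]
print(candidates)  # → [{'employeeID'}]
```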
Superkey 106
Example
English Monarchs

Monarch Name | Monarch Number | Royal House
Edward       | II             | Plantagenet
Henry        | IV             | Lancaster
References
• Silberschatz, Abraham (2011). Database System Concepts (6th ed.). McGraw-Hill. pp. 45–46.
ISBN 978-0-07-352332-3.
External links

• Relation Database terms of reference, Keys (http://rdbms.opengrass.net/2_DatabaseDesign/2.1_TermsOfReference/2.1.2_Keys.html):
An overview of the different types of keys in an RDBMS
Surrogate key 107
Surrogate key

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the
database. The surrogate key is not derived from application data.
Definition
There are at least two definitions of a surrogate:
Surrogate (1) – Hall, Owlett and Codd (1976)
A surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is
nevertheless visible to the user or application.

Surrogate (2) – Wieringa and De Jonge (1991)
A surrogate represents an object in the database itself. The surrogate is internally generated by the system and
is invisible to the user or application.

The Surrogate (1) definition relates to a data model rather than a storage model and is used throughout this article.
See Date (1998).
An important distinction between a surrogate and a primary key depends on whether the database is a current
database or a temporal database. Since a current database stores only currently valid data, there is a one-to-one
correspondence between a surrogate in the modeled world and the primary key of the database. In this case the
surrogate may be used as a primary key, resulting in the term surrogate key. In a temporal database, however, there
is a many-to-one relationship between primary keys and the surrogate. Since there may be several objects in the
database corresponding to a single surrogate, we cannot use the surrogate as a primary key; another attribute is
required, in addition to the surrogate, to uniquely identify each object.
Although Hall et al. (1976) say nothing about this, others have argued that a surrogate should have the following
characteristics:
• the value is unique system-wide, hence never reused
• the value is system generated
• the value is not manipulable by the user or application
• the value contains no semantic meaning
• the value is not visible to the user or application
• the value is not composed of several values from different domains.
Surrogates in practice

In a current database, the surrogate key can be the primary key, generated by the database management system and
not derived from any application data in the database. The only significance of the surrogate key is to act as the
primary key. It is also possible that the surrogate key exists in addition to the database-generated UUID (for
example, an HR number for each employee other than the UUID of each employee).

A surrogate key is frequently a sequential number (e.g. a Sybase or SQL Server "identity column", a PostgreSQL or
Informix serial, an Oracle SEQUENCE or a column defined with AUTO_INCREMENT in MySQL) but doesn't
have to be. Having the key independent of all other columns insulates the database relationships from changes in
data values or database design (making the database more agile) and guarantees uniqueness.
In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically,
every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the
database; the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For
example, table Staff may contain two rows for "John Smith", one row for when he was employed between 1990 and
1999, another row for when he was employed between 2001 and 2006. The surrogate key is identical (non-unique)
in both rows; however, the primary key will be unique.

Some database designers use surrogate keys systematically regardless of the suitability of other candidate keys,
while others will use a key already present in the data, if there is one.
A surrogate key may also be called a synthetic key, an entity identifier, a system-generated key, a database sequence
number, a factless key, a technical key, or an arbitrary unique identifier.[citation needed] Some of these terms describe
the way of generating new surrogate values rather than the nature of the surrogate concept.
Approaches to generating surrogates include:
• Universally Unique Identifiers (UUIDs)
• Globally Unique Identifiers (GUIDs)
• Object Identifiers (OIDs)
• Sybase or SQL Server identity column IDENTITY or IDENTITY(n,n)
• Oracle SEQUENCE
• PostgreSQL or IBM Informix serial
• MySQL AUTO_INCREMENT
• AutoNumber data type in Microsoft Access
• AS IDENTITY GENERATED BY DEFAULT in IBM DB2
• Identity column (implemented in DDL) in Teradata
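All of these mechanisms work the same way from the application's point of view: omit the key column on insert and the system supplies the next value. A minimal sketch, assuming SQLite (whose INTEGER PRIMARY KEY AUTOINCREMENT plays the role of the identity/serial columns listed above) driven through Python's sqlite3 module, with an illustrative table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# emp_id is the surrogate key: system-generated, not derived from the data.
conn.execute("""CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name   VARCHAR(40) NOT NULL)""")

# The application never supplies emp_id; the DBMS assigns 1, 2, ...
conn.execute("INSERT INTO employee (name) VALUES ('Smith')")
conn.execute("INSERT INTO employee (name) VALUES ('Jones')")

print(conn.execute("SELECT emp_id, name FROM employee").fetchall())
# → [(1, 'Smith'), (2, 'Jones')]
```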
Advantages
Immutability

Surrogate keys do not change while the row exists. This has the following advantages:
• Applications cannot lose their reference to a row in the database (since the identifier never changes).
• The primary key data can always be modified, even with databases that do not support cascading updates across
related foreign keys.
Requirement changes
Attributes that uniquely identify an entity might change, which might invalidate the suitability of the natural,
compound keys. Consider the following example:
An employee's network user name is chosen as a natural key. Upon merging with another company, new
employees must be inserted. Some of the new network user names create conflicts because their user names
were generated independently (when the companies were separate).
In these cases, generally a new attribute must be added to the natural key (for example, an original_company
column). With a surrogate key, only the table that defines the surrogate key must be changed. With natural keys, all
tables (and possibly other, related software) that use the natural key will have to change.
Some problem domains do not clearly identify a suitable natural key. A surrogate key avoids choosing a natural key
that might turn out to be incorrect.
Performance

Surrogate keys tend to be a compact data type, such as a four-byte integer. This allows the database to query the
single key column faster than it could multiple columns. Furthermore, a non-redundant distribution of keys causes
the resulting b-tree index to be completely balanced. Surrogate keys are also less expensive to join (fewer columns
to compare) than compound keys.
Compatibility
When using database application development systems, drivers, and object-relational mapping systems such as
Ruby on Rails or Hibernate, it is much easier to use an integer or GUID surrogate key for every table instead of
natural keys in order to support database-system-agnostic operations and object-to-row mapping.
Uniformity
When every table has a uniform surrogate key, some tasks can be easily automated by writing the code in a
table-independent way.
Validation
It is possible to design key-values that follow a well-known pattern or structure which can be automatically verified.
For instance, the keys that are intended to be used in some column of some table might be designed to "look
differently from" those that are intended to be used in another column or table, thereby simplifying the detection of
application errors in which the keys have been misplaced. However, this characteristic of the surrogate keys should
never be used to drive any of the logic of the applications themselves, as this would violate the principles of
database normalization.
Disadvantages
Disassociation
The values of generated surrogate keys have no relationship to the real-world meaning of the data held in a row.
When inspecting a row holding a foreign key reference to another table using a surrogate key, the meaning of the
surrogate key's row cannot be discerned from the key itself. Every foreign key must be joined to see the related data
item.[citation needed] This can also make auditing more difficult, as incorrect data is not obvious.
Surrogate keys are unnatural for data that is exported and shared. A particular difficulty is that tables from two
otherwise identical schemas (for example, a test schema and a development schema) can hold records that are
equivalent in a business sense, but have different keys. This can be mitigated by not exporting surrogate keys, except
as transient data (most obviously, in executing applications that have a "live" connection to the database).
Query optimization
Relational databases assume a unique index is applied to a table's primary key. The unique index serves two
purposes: (i) to enforce entity integrity, since primary key data must be unique across rows and (ii) to quickly search
for rows when queried. Since surrogate keys replace a table's identifying attributes—the natural key—and since the
identifying attributes are likely to be those queried, then the query optimizer is forced to perform a full table scan
when fulfilling likely queries. The remedy to the full table scan is to apply indexes on the identifying attributes, or
sets of them. Where such sets are themselves a candidate key, the index can be a unique index.
These additional indexes, however, will take up disk space and slow down inserts and deletes.
Normalization
The presence of a surrogate key can result in the database administrator forgetting to establish, or accidentally
removing, a secondary unique index on the natural key of the table. Without a unique index on the natural key,
duplicate rows can appear and once present can be difficult to identify.
Business process modeling

Because surrogate keys are unnatural, flaws can appear when modeling the business requirements. Business
requirements, relying on the natural key, then need to be translated to the surrogate key. A strategy is to draw a
clear distinction between the logical model (in which surrogate keys do not appear) and the physical implementation
of that model, to ensure that the logical model is correct and reasonably well normalised, and to ensure that the
physical model is a correct implementation of the logical model.
Inadvertent disclosure

Proprietary information can be leaked if sequential key generators are used. By subtracting a previously generated
sequential key from a recently generated sequential key, one could learn the number of rows inserted during that
time period. This could expose, for example, the number of transactions or new accounts per period. There are a
few ways to overcome this problem:
• Increase the sequential number by a random amount.
• Generate a completely random primary key. However, to prevent duplication, which would cause an insert
rejection, a randomly generated primary key must either be queried (to check that it is not already in use), or the
key must contain enough entropy that one can be confident that collisions will not happen.
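The first mitigation can be sketched in a few lines. Everything here is illustrative: the function name `next_surrogate` and the step bound of 1000 are invented, and in practice the counter would live in the database (for example via an Oracle SEQUENCE with a randomized increment), not in an application variable.

```python
import random

last_key = 0  # stand-in for the persistent sequence state

def next_surrogate():
    """Advance the sequence by a random step so that the difference between
    two keys no longer reveals how many rows were inserted in between."""
    global last_key
    last_key += random.randint(1, 1000)  # random gap instead of +1
    return last_key

keys = [next_surrogate() for _ in range(5)]
# Keys are still unique and strictly increasing, just no longer dense.
assert keys == sorted(keys) and len(set(keys)) == 5
```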
Inadvertent assumptions

One might incorrectly infer from sequentially generated surrogate keys that events with a higher primary key value
occurred after events with a lower primary key value. The sequential primary key implies nothing of the kind. It is
possible for inserts to fail and leave gaps, and for those gaps to be filled at some later time. A sequential key value is
not a reliable indicator of chronology. If chronology is important, rely not upon the sequential key but upon a
timestamp. A random key prevents a person from assuming that the key has some bearing on real-world chronology
only if the person making the assumption is aware that the key is indeed random and has no bearing upon
chronology. A randomly generated primary key must be queried before being assigned, to prevent duplication that
would cause an insert rejection.[citation needed]
References
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008
and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
• Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co.
ISBN 0-7204-0459-2.
• Engles, R.W. (1972), A Tutorial on Data-Base Organization, Annual Review in Automatic Programming, Vol. 7,
Part 1, Pergamon Press, Oxford, pp. 1–64.
• Langefors, B (1968). Elementary Files and Elementary File Records, Proceedings of File 68, an IFIP/IAG
International Seminar on File Organisation, Amsterdam, November, pp. 89–96.
• Wieringa, R.; de Jonge, W. (1991). The identification of objects and roles: Object identifiers revisited. CiteSeerX:
10.1.1.16.3195 [1]
• Date, C.J. (1998). "Chapters 11 and 12". Relational Database Writings 1994–1997. ASIN 0201398141 [2]
• Carter, Breck. "Intelligent Versus Surrogate Keys" [3]. Retrieved 2006-12-03.
• Richardson, Lee. "Create Data Disaster: Avoid Unique Indexes – (Mistake 3 of 10)" [4]. Retrieved 2008-01-19.
• Berkus, Josh. "Database Soup: Primary Keyvil, Part I" [5]. Retrieved 2006-12-03.
References
[1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.16.3195
[2] http://www.amazon.co.uk/dp/0201398141
[3] http://www.bcarter.com/intsurr1.htm
[4] http://www.nearinfinity.com/blogs/page/lrichard?entry=create_data_disaster_avoid_unique
[5] http://blogs.ittoolbox.com/database/soup/archives/primary-keyvil-part-i-7327
Armstrong's axioms

Armstrong's axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional
dependencies on a relational database. They were developed by William W. Armstrong in his 1974 paper.[1]
The axioms are sound in generating only functional dependencies in the closure of a set of functional dependencies
(denoted as F+) when applied to that set (denoted as F). They are also complete in that repeated application of
these rules will generate all functional dependencies in the closure F+.

More formally, let ⟨R(U), F⟩ denote a relational scheme over the set of attributes U with a set of functional
dependencies F. We say that a functional dependency f is logically implied by F, and denote it with F ⊨ f, if
and only if for every instance r of R that satisfies the functional dependencies in F, r also satisfies f. We denote
by F+ the set of all functional dependencies that are logically implied by F.

Furthermore, with respect to a set of inference rules A, we say that a functional dependency f is derivable from
the functional dependencies in F by the set of inference rules A, and we denote it by F ⊢_A f if and only if f is
obtainable by means of repeatedly applying the inference rules in A to functional dependencies in F. We denote by
F*_A the set of all functional dependencies that are derivable from F by inference rules in A.

Then, a set of inference rules A is sound if and only if the following holds:

    F*_A ⊆ F+

that is to say, we cannot derive by means of A functional dependencies that are not logically implied by F. The set
of inference rules A is said to be complete if the following holds:

    F+ ⊆ F*_A

more simply put, we are able to derive by A all the functional dependencies that are logically implied by F.
Axioms

Let R(U) be a relation scheme over the set of attributes U. Henceforth we will denote by letters X, Y, Z any
subset of U and, for short, the union of two sets of attributes X and Y by XY instead of the usual X ∪ Y; this
notation is rather standard in database theory when dealing with sets of attributes.

Axiom of reflexivity
If Y ⊆ X, then X → Y.

Axiom of augmentation
If X → Y, then XZ → YZ for any Z.

Armstrong's axioms 112

Axiom of transitivity
If X → Y and Y → Z, then X → Z.
Additional rules

These rules can be derived from the above axioms.

Union
If X → Y and X → Z, then X → YZ.

Decomposition
If X → YZ, then X → Y and X → Z.

Pseudotransitivity
If X → Y and YZ → W, then XZ → W.
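The axioms and derived rules justify the standard attribute-closure algorithm found in database textbooks: to decide which attributes a set X functionally determines, repeatedly fire dependencies whose left side is already covered until nothing changes. This sketch is a common construction, not taken from the article; the dependency set is illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`
    under `fds`, a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)        # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # transitivity (with augmentation): if lhs is already
            # determined and rhs adds something new, absorb rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Illustrative dependencies: A -> B and B -> C.
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C"))]
print(sorted(closure({"A"}, fds)))  # → ['A', 'B', 'C']
```

A set X is then a superkey exactly when its closure contains every attribute of the schema.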
Armstrong relation

Given a set of functional dependencies F, the Armstrong relation is a relation which satisfies all the functional
dependencies in the closure F+ and only those dependencies. Unfortunately, the minimum-size Armstrong relation
for a given set of dependencies can have a size which is an exponential function of the number of attributes in the
dependencies considered.
External links

• UMBC CMSC 461 Spring '99 [2]
• CS345 Lecture Notes from Stanford University [3]
References
[1] William Ward Armstrong: Dependency Structures of Data Base Relationships, pages 580–583. IFIP Congress, 1974.
[2] http://www.cs.umbc.edu/courses/461/current/burt/lectures/lec14/
[3] http://www-db.stanford.edu/~ullman/cs345notes/slides01-1.ps
113
Objects
Relation (database)

In relational database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where
each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the
usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element
is termed an attribute value. An attribute is a name paired with a domain (nowadays more commonly referred to as
a type or data type). An attribute value is an attribute name paired with an element of that attribute's domain, and a
tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a
tuple is described as a function, mapping names to values.

[Figure: Relation, tuple, and attribute represented as table, row, and column.]
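The tuple-as-function view can be made concrete with Python dictionaries, whose equality, like tuple equality in the relational model, ignores the order in which the name/value pairs are written. The employee values reuse the example discussed later in this article; the representation itself is just an illustration.

```python
# A tuple as a mapping from attribute names to values: the two dicts below
# are the same tuple, because attribute order is irrelevant.
t1 = {"ID": 102, "Name": "Yonezawa Akinori", "Address": "Naha, Okinawa"}
t2 = {"Address": "Naha, Okinawa", "ID": 102, "Name": "Yonezawa Akinori"}
assert t1 == t2  # dict equality ignores ordering, like a relational tuple

# A relation is a heading paired with a body: the heading is a set of
# attribute names, the body a set of tuples all sharing that heading.
heading = frozenset({"ID", "Name", "Address"})
body = {frozenset(t1.items())}  # frozenset makes the tuple hashable
assert all(frozenset(name for name, _ in row) == heading for row in body)
```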
A set of attributes in which no two distinct elements have the same name is called a heading. A set of tuples having
the same heading is called a body. A relation is thus a heading paired with a body, the heading of the relation being
alsotheheadingofeachtupleinitsbody.Thenumberofattributesconstitutingaheadingiscalledthedegree,which term also
applies to tuples and relations. The term n-tuple refers to a tuple of degree n (n>=0).
E.F.Coddusedthetermrelationinitsmathematicalsenseofafinitaryrelation,asetoftuplesonsomesetofnsets
S , S ,,S .Thus,ann-aryrelationisinterpreted,undertheClosedWorldAssumption,astheextensionofsome n-adic
1 2 n
predicate: all and only those n-tuples whose values, substituted for corresponding free variables in the predicate,
yield propositions that hold true, appear in the relation.
The term relation schema refers to a heading paired with a set of constraints defined in terms of that heading. A
relation can thus be seen as an instantiation of a relation schema if it has the heading of that schema and it satisfies
the applicable constraints.
Sometimes a relation schema is taken to include a name. A relational database definition (database schema,
sometimes referred to as a relational schema) can thus be thought of as a collection of named relation schemas.
In implementations, the domain of each attribute is effectively a data type and a named relation schema is effectively
a relation variable or relvar for short (see Relation Variables below).
In SQL, a database language for relational databases, relations are represented by tables, where each row of a table
represents a single tuple, and where the values of each attribute form a column.
Examples
Below is an example of a relation having three named attributes: 'ID' from the domain of integers, and 'Name' and
'Address' from the domain of strings:
Relation (database) 114
A predicate for this relation, using the attribute names to denote free variables, might be "Employee number ID is
known as Name and lives at Address". Examination of the relation tells us that there are just four tuples for which
the predicate holds true. So, for example, employee 102 is known only by that name, Yonezawa Akinori, and does
not live anywhere else but in Naha, Okinawa. Also, apart from the four employees shown, there is no other
employee who has both a name and an address.

Under the definition of body, the tuples of a body do not appear in any particular order - one cannot say "The tuple
of 'Murata Makoto' is above the tuple of 'Matsumoto Yukihiro'", nor can one say "The tuple of 'Yonezawa Akinori'
is the first tuple." A similar comment applies to the rows of an SQL table.

Under the definition of heading, the elements of a heading do not appear in any particular order either, nor,
therefore, do the elements of a tuple. A similar comment does not apply here to SQL, which does define an ordering
to the columns of a table.
RelationVariables
A relational database consists of named relation variables (relvars) for the purposes of updating the database in
response to changes in the real world. An update to a single relvar causes the body of the relation assigned to that
variable to be replaced by a different set of tuples. Such variables are classified into two classes: base relation
variables and derived relation variables, the latter also known as virtual relvars but usually referred to by the
short term view.
A base relation variable is a relation variable which is not derived from any other relation variables. In SQL the
term base table equates approximately to base relation variable.
A view can be defined by an expression using the operators of the relational algebra or the relational calculus. Such
an expression operates on one or more relations and when evaluated yields another relation. The result is sometimes
referred to as a "derived" relation when the operands are relations assigned to database variables. A view is defined
by giving a name to such an expression, such that the name can subsequently be used as a variable name. (Note that
the expression must then mention at least one base relation variable.)
By using a Data Definition Language (DDL), it is possible to define base relation variables. In SQL, CREATE TABLE
syntax is used to define base tables. The following is an example.

CREATE TABLE List_of_people (
    ID      INTEGER,
    Name    CHAR(40),
    Address CHAR(200),
    PRIMARY KEY (ID)
)

The Data Definition Language (DDL) is also used to define derived relation variables. In SQL, CREATE VIEW
syntax is used to define a derived relation variable. The following is an example.

CREATE VIEW List_of_Okinawa_people AS (
    SELECT ID, Name, Address
    FROM List_of_people
    WHERE Address LIKE '%, Okinawa'
)
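The two definitions above can be exercised end to end. The following sketch runs them against an in-memory SQLite database through Python's sqlite3 module; the sample rows are invented for illustration, echoing the article's example employees.

```python
import sqlite3

# Illustrative sketch: the CREATE TABLE / CREATE VIEW definitions above,
# run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE List_of_people (
        ID      INTEGER,
        Name    CHAR(40),
        Address CHAR(200),
        PRIMARY KEY (ID)
    )
""")
conn.execute("""
    CREATE VIEW List_of_Okinawa_people AS
    SELECT ID, Name, Address
    FROM List_of_people
    WHERE Address LIKE '%, Okinawa'
""")
# Hypothetical sample rows in the spirit of the article's example
conn.executemany(
    "INSERT INTO List_of_people VALUES (?, ?, ?)",
    [(102, "Yonezawa Akinori", "Naha, Okinawa"),
     (202, "Murata Makoto", "Tokyo")],
)
# Querying the derived relation variable (the view) filters the base table
okinawa = conn.execute("SELECT Name FROM List_of_Okinawa_people").fetchall()
```

Querying the view returns only the rows whose Address matches the LIKE pattern, without storing any data of its own.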
References
• Date, C. J. (2004). An Introduction to Database Systems (8th ed.). Addison–Wesley. ISBN 0-321-19784-4.
Table (database)
In relational databases and flat file databases, a table is a set of data elements (values) that is organized using a model of vertical columns (which are identified by their name) and horizontal rows, the cell being the unit where a row and column intersect. A table has a specified number of columns, but can have any number of rows. Each row is identified by the values appearing in a particular column subset which has been identified as a unique key index.
Table is another term for relation, although there is the difference that a table is usually a multiset (bag) of rows whereas a relation is a set and does not allow duplicates. Besides the actual data rows, tables generally have associated with them some metadata, such as constraints on the table or on the values within particular columns.
The data in a table does not have to be physically stored in the database. Views are also relational tables, but their data are calculated at query time. Another example are nicknames, which represent a pointer to a table in another database.
Comparisons with other data structures
In non-relational systems, such as hierarchical databases, the distant counterpart of a table is a structured file, representing the rows of a table in each record of the file and each column in a record. This structure implies that a record can have repeating information, generally in the child data segments. Data are stored in a sequence of records, the equivalent of a relational database's table, with each record corresponding to a row.
Unlike a spreadsheet, the datatype of a field is ordinarily defined by the schema describing the table. Some SQL systems, such as SQLite, are less strict about field datatype definitions.
Tables versus relations
In terms of the relational model of databases, a table can be considered a convenient representation of a relation, but
the two are not strictly equivalent. For instance, an SQL table can potentially contain duplicate rows, whereas a true
relation cannot contain duplicate tuples. Similarly, representation as a table implies a particular ordering to the rows
and columns, whereas a relation is explicitly unordered. However, the database system does not guarantee any
ordering of the rows unless an ORDER BYclause is specified in the SELECTstatement that queries the table.
An equally valid representation of a relation is as an n-dimensional chart, where n is the number of attributes (a
table's columns). For example, a relation with two attributes and three values can be represented as a table with two
columns and three rows, or as a two-dimensional graph with three points. The table and graph representations are
only equivalent if the ordering of rows is not significant, and the table has no duplicate rows.
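The bag-versus-set distinction described above can be demonstrated directly. In this hypothetical SQLite sketch, a table declared without a key accepts duplicate rows (a bag), while SELECT DISTINCT recovers the duplicate-free set a relation would contain.

```python
import sqlite3

# Sketch: an SQL table is a multiset (bag) of rows; a relation is a set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")  # no key: duplicates allowed
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (1, 2), (3, 4)])

bag = conn.execute("SELECT a, b FROM t").fetchall()                # keeps duplicates
relation = conn.execute("SELECT DISTINCT a, b FROM t").fetchall()  # set semantics
```

The table holds three rows, two of them identical; the DISTINCT projection corresponds to the relation, which cannot contain duplicate tuples.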
Table types
Two types of tables exist:
• Arelational table, which is the basic structure to hold userdata in a relational database.
• An object table, which is a table that uses an object type to define a column. It is defined to hold instances of objects of a defined type.
In SQL, the CREATE TABLE statement creates these tables.
Column (database)
In the context of a relational database table, a column is a set of data values of a particular simple type, one for each row of the table.[1] The columns provide the structure according to which the rows are composed.
The term field is often used interchangeably with column, although many consider it more correct to use field (or field value) to refer specifically to the single item that exists at the intersection between one row and one column. In relational database terminology, a column's equivalent is called an attribute.
For example, a table that represents companies might have the following columns:
• ID (integer identifier, unique to each row)
• Name (text)
• Address line 1 (text)
• Address line 2 (text)
• City (integer identifier, drawn from a separate table of cities, from which any state or country information would be drawn)
• Postal code (text)
• Industry (integer identifier, drawn from a separate table of industries)
• etc.
Each row would provide a data value for each column and would then be understood as a single structured data value, in this case representing a company. More formally, each row can be interpreted as a tuple, composed of a set of pairs, each consisting of the name of the relevant column and the value this row provides for that column.
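The companies table described above can be sketched in SQLite; the column names follow the article (underscores replacing spaces), while the types are illustrative choices and the sample row is invented.

```python
import sqlite3

# Sketch of the companies table described in the article.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE companies (
        ID            INTEGER PRIMARY KEY,  -- unique to each row
        Name          TEXT,
        Address_line1 TEXT,
        Address_line2 TEXT,
        City          INTEGER,              -- key into a separate cities table
        Postal_code   TEXT,
        Industry      INTEGER               -- key into a separate industries table
    )
""")
# One hypothetical row: each column receives one data value
conn.execute(
    "INSERT INTO companies VALUES (1, 'Acme', '1 Main St', NULL, 10, '90210', 3)"
)
# PRAGMA table_info reports the column structure the rows are composed against
columns = [row[1] for row in conn.execute("PRAGMA table_info(companies)")]
```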
Examples of database systems that organize data into such columns include MySQL, SQL Server, Access, Oracle, Sybase, and DB2; they are queried using SQL (Structured Query Language). See more at SQL.
References
[1] The term "column" also has equivalent application in other, more generic contexts. See e.g., Flat file database, Table (information).
Row (database)
In the context of a relational database, a row—also called a record or tuple—represents a single, implicitly
structured data item in a table. In simple terms, a database table can be thought of as consisting of rows and columns
or fields. Each row in a table represents a set of related data, and every row in the table has the same structure.
For example, in a table that represents companies, each row would represent a single company. Columns might
represent things like company name, company street address, whether the company is publicly held, its VAT number,
etc. In a table that represents the association of employees with departments, each row would associate one
employee with one department.
In a less formal usage, e.g. for a database which is not formally relational, a record is equivalent to a row as described
above, but is not usually referred to as a row.
The implicit structure of a row, and the meaning of the data values in a row, requires that the row be understood as
providing a succession of data values, one in each column of the table. The row is then interpreted as a tuple composed of a set of pairs, with each pair consisting of the name of the relevant column and the value this row provides for that column.
Each column expects a data value of a particular type. For example, one column might require a unique identifier,
another might require text representing a person's name, another might require an integer representing hourly pay in
cents.
View (SQL)
In database theory, a view is the result set of a stored query — or of map-and-reduce functions — on the data, which the
database users can query just as they would a persistent database collection object. This pre-established query
command is kept in the database dictionary. Unlike ordinary base tables in a relational database, a view does not
form part of the physical schema: as a result set, it is a virtual table computed or collated from data in the database,
dynamically when access to that view is requested. Changes applied to the data in a relevant underlying table are reflected in the data shown in subsequent invocations of the view. In some NoSQL databases, views are the only way to query data.
Views can provide advantages over tables:
• Views can represent a subset of the data contained in a table; consequently, a view can limit the degree of exposure of the underlying tables to the outer world: a given user may have permission to query the view, while denied access to the rest of the base table.
• Views can join and simplify multiple tables into a single virtual table.
• Views can act as aggregated tables, where the database engine aggregates data (sum, average, etc.) and presents the calculated results as part of the data.
• Views can hide the complexity of data; for example a view could appear as Sales2000 or Sales2001, transparently partitioning the actual underlying table.
• Views take very little space to store; the database contains only the definition of a view, not a copy of all the data which it presents.
• Depending on the SQL engine used, views can provide extra security.
Just as a function (in programming) can provide abstraction, so can a database view. In another parallel with
functions, database users can manipulate nested views, thus one view can aggregate data from other views. Without
the use of views, the normalization of databases above second normal form would become much more difficult.
Views can make it easier to create lossless join decomposition.
Just as rows in a base table lack any defined ordering, rows available through a view do not appear with any default
sorting. A view is a relational table, and the relational model defines a table as a set of rows. Since sets are not
ordered - by definition - nor are the rows of a view. Therefore, an ORDER BY clause in the view definition is
meaningless; the SQL standard (SQL:2003) does not allow an ORDER BY clause in the subquery of a CREATE
VIEW command, just as it is refused in a CREATE TABLE statement. However, sorted data can be obtained from a
view, in the same way as any other table — as part of a query statement on that view. Nevertheless, some DBMS
(such as Oracle Database) do not abide by this SQL standard restriction.
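The point about ordering can be illustrated with a small SQLite sketch (table and values invented): the view itself imposes no order, and sorted output is obtained by placing ORDER BY in the query against the view.

```python
import sqlite3

# Sketch: rows of a view have no inherent order; sorting belongs to the
# query run against the view, not to the view definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(2001, 50), (2000, 70), (2002, 60)])
conn.execute("CREATE VIEW sales_view AS SELECT year, amount FROM sales")

# ORDER BY in the outer query yields deterministic, sorted output
ordered = conn.execute(
    "SELECT year FROM sales_view ORDER BY amount DESC"
).fetchall()
```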
Read-only vs. updatable views
Database practitioners can define views as read-only or updatable. If the database system can determine the reverse
mapping from the view schema to the schema of the underlying base tables, then the view is updatable. INSERT,
UPDATE, and DELETE operations can be performed on updatable views. Read-only views do not support such
operations because the DBMS cannot map the changes to the underlying base tables. A view update is done by key
preservation.
Some systems support the definition of INSTEAD OF triggers on views. This technique allows the definition of other logic for execution in place of an insert, update, or delete operation on the views. Thus database systems can
implement data modifications based on read-only views. However, an INSTEAD OF trigger does not change the
read-only or updatable property of the view itself.
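The INSTEAD OF technique can be sketched in SQLite, whose views are read-only by default; all table, view, and trigger names here are invented for illustration. The trigger routes an INSERT on the view to the underlying base table.

```python
import sqlite3

# Sketch: an INSTEAD OF trigger makes a read-only view accept inserts by
# redirecting them to the base table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.execute(
    "CREATE VIEW okinawans AS SELECT name FROM people WHERE city = 'Okinawa'"
)
conn.execute("""
    CREATE TRIGGER okinawans_insert INSTEAD OF INSERT ON okinawans
    BEGIN
        INSERT INTO people (name, city) VALUES (NEW.name, 'Okinawa');
    END
""")
# The insert on the view fires the trigger instead of failing
conn.execute("INSERT INTO okinawans (name) VALUES ('Makoto')")
stored = conn.execute("SELECT name, city FROM people").fetchall()
```

Note that, as the article says, the trigger supplies replacement logic; the view itself remains a stored query.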
Advanced view features
Various database management systems have extended the views from read-only subsets of data.
Oracle Database introduced the concept of materialized views: pre-executed, non-virtual views commonly used in data warehousing. They give a static snapshot of the data and may include data from remote sources. The accuracy of a materialized view depends on the frequency of trigger mechanisms behind its updates. IBM DB2 provides so-called "materialized query tables" (MQTs) for the same purpose. Microsoft SQL Server introduced in its 2000 version indexed views which only store a separate index from the table, but not the entire data. PostgreSQL implemented materialized views in its 9.3 release.
Equivalence
A view is equivalent to its source query. When queries are run against views, the query is modified. For example, if there exists a view named accounts_view with the content as follows:

accounts_view:
SELECT name,
       money_received,
       money_sent,
       (money_received - money_sent) AS balance,
       address,
       ...
FROM table_customers c JOIN accounts_table a
  ON a.customer_id = c.customer_id

then the application could run a simple query such as:

Simple query:
SELECT name,
       balance
FROM accounts_view

The RDBMS then takes the simple query, replaces the equivalent view, then sends the following to the query optimizer:

Preprocessed query:
SELECT name,
       balance
FROM (SELECT name,
             money_received,
             money_sent,
             (money_received - money_sent) AS balance,
             address,
             ...
      FROM table_customers c JOIN accounts_table a
        ON a.customer_id = c.customer_id)
From this point on the optimizer takes the query, removes unnecessary complexity (for example: it is not necessary to read the address, since the parent invocation does not make use of it) and then sends the query to the SQL engine for processing.
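The equivalence above can be verified with a runnable sketch. This uses the article's accounts_view example against an in-memory SQLite database, with a simplified schema and one invented customer row; querying the view and running the expanded query return the same result.

```python
import sqlite3

# Sketch of the accounts_view equivalence: a query on the view equals the
# same query with the view's definition substituted in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_customers (customer_id INTEGER, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE accounts_table "
    "(customer_id INTEGER, money_received INTEGER, money_sent INTEGER)"
)
conn.execute("INSERT INTO table_customers VALUES (1, 'Alice', '1 Main St')")
conn.execute("INSERT INTO accounts_table VALUES (1, 500, 120)")
conn.execute("""
    CREATE VIEW accounts_view AS
    SELECT name, money_received, money_sent,
           (money_received - money_sent) AS balance, address
    FROM table_customers c JOIN accounts_table a
      ON a.customer_id = c.customer_id
""")

via_view = conn.execute("SELECT name, balance FROM accounts_view").fetchall()
expanded = conn.execute("""
    SELECT name, balance
    FROM (SELECT name, money_received, money_sent,
                 (money_received - money_sent) AS balance, address
          FROM table_customers c JOIN accounts_table a
            ON a.customer_id = c.customer_id)
""").fetchall()
```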
External links
• Materialized query tables in DB2 [1]
• Views in Microsoft SQL Server [2]
• Views in MySQL [3]
• Views in PostgreSQL [4]
• Views in SQLite [5]
• Views in Oracle 11.2 [6]
• Views in CouchDB [7]
• Materialized Views in Oracle 11.2 [8]
References
[1] http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2z10.doc.intro/src/tpc/db2z_typesoftables.htm
[2] http://msdn.microsoft.com/en-us/library/ms187956.aspx
[3] http://dev.mysql.com/doc/refman/5.1/en/views.html
[4] http://www.postgresql.org/docs/current/interactive/tutorial-views.html
[5] http://www.sqlite.org/lang_createview.html
[6] http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_8004.htm#SQLRF01504
[7] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
[8] http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/statements_6002.htm#SQLRF01302
Database transaction
A transaction comprises a unit of work performed within a database management system (or similar system) against a database, and treated in a coherent and reliable way independent of other transactions. Transactions in a database environment have two main purposes:
1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, when execution stops (completely or partially) and many operations upon a database remain uncompleted, with unclear status.
2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.
A database transaction, by definition, must be atomic, consistent, isolated and durable.[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.
Transactions provide an "all-or-nothing" proposition, stating that each work-unit performed in a database must either
complete in its entirety or have no effect whatsoever. Further, the system must isolate each transaction from other
transactions, results must conform to existing constraints in the database, and transactions that complete successfully
must get written to durable storage.
Purpose
Databases and other data stores which treat the integrity of data as paramount often include the ability to handle
transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work,
each reading and/or writing information to a database or other data store. When this happens it is often important to
ensure that all such processing leaves the database or data store in a consistent state.
Examples from double-entry accounting systems often illustrate the concept of transactions. In double-entry
accounting every debit requires the recording of an associated credit. If one writes a check for $100 to buy groceries,
a transactional double-entry accounting system must record the following two entries to cover the single transaction:
1. Debit $100 to Groceries Expense Account
2. Credit $100 to Checking Account
A transactional system would make both entries pass or both entries would fail. By treating the recording of multiple
entries as an atomic transactional unit of work the system maintains the integrity of the data recorded. In other words, nobody ends up with a situation in which a debit is recorded but no associated credit is recorded, or vice versa.
Transactional databases
A transactional database is a DBMS where write transactions on the database are able to be rolled back if they are not completed properly (e.g. due to power or connectivity loss).
Most modern relational database management systems fall into the category of databases that support transactions.
In a database system a transaction might consist of one or more data-manipulation statements and queries, each
reading and/or writing information in the database. Users of database systems consider consistency and integrity of
data as highly important. A simple transaction is usually issued to the database system in a language like SQL
wrapped in a transaction, using a pattern similar to the following:
1. Begin the transaction
2. Execute a set of data manipulations and/or queries
3. If no errors occur then commit the transaction and end it
4. If errors occur then rollback the transaction and end it
If no errors occurred during the execution of the transaction then the system commits the transaction. A transaction
commit operation applies all data manipulations within the scope of the transaction and persists the results to the
database. If an error occurs during the transaction, or if the user specifies a rollback operation, the data manipulations within the transaction are not persisted to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.
Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.
There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are similar but have a few extra properties. Another type of transaction is the compensating transaction.
In SQL
SQL is inherently transactional, and a transaction is automatically started when another ends. Some databases extend SQL and implement a START TRANSACTION statement, but while seemingly signifying the start of the transaction it merely deactivates autocommit.
The result of any work done after this point will remain invisible to other database users until the system processes a COMMIT statement. A ROLLBACK statement can also occur, which will undo any work performed since the last transaction. Both COMMIT and ROLLBACK will end the transaction and start a new one. If autocommit was disabled using START TRANSACTION, autocommit will often also be reenabled.
Some database systems allow the synonyms BEGIN, BEGIN WORK and BEGIN TRANSACTION, and may have
other options available.
Distributed transactions
Database systems implement distributed transactions as transactions against multiple applications or hosts. A
distributed transaction enforces the ACID properties over multiple systems or data stores, and might include systems
such as databases, file systems, messaging systems, and other applications. In a distributed transaction a coordinating service ensures that all parts of the transaction are applied to all relevant systems. As with database and other transactions, if any part of the transaction fails, the entire transaction is rolled back across all affected systems.
Transactional filesystems
The Namesys Reiser4 filesystem for Linux supports transactions,[3] and as of Microsoft Windows Vista, the Microsoft NTFS filesystem supports distributed transactions across networks.[4]
References
[1] A transaction is a group of operations that are atomic, consistent, isolated, and durable (ACID (http://msdn.microsoft.com/en-us/library/aa366402(VS.85).aspx)).
[3] http://namesys.com/v4/v4.html#committing
[4] http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/fs/portal.asp
Further reading
• Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition (http://www.elsevierdirect.com/product.jsp?isbn=9781558606234), Morgan Kaufmann (Elsevier), ISBN 978-1-55860-623-4
• Gerhard Weikum, Gottfried Vossen (2001), Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, ISBN 1-55860-508-8
Transaction log
In the field of databases in computer science, a transaction log (also transaction journal, database log, binary log or audit trail) is a history of actions executed by a database management system to guarantee ACID properties over crashes or hardware failures. Physically, a log is a file of updates done to the database, stored in stable storage.
If, after a start, the database is found in an inconsistent state or to have not been shut down properly, the database management system reviews the database logs for uncommitted transactions and rolls back the changes made by these transactions. Additionally, all transactions that are already committed but whose changes were not yet materialized in the database are re-applied. Both are done to ensure atomicity and durability of transactions.
This term is not to be confused with other, human-readable logs that a database management system usually provides.
Anatomy of a general database log
A database log record is made up of:
• Log Sequence Number: A unique ID for a log record. With LSNs, logs can be recovered in constant time. Most logs' LSNs are assigned in monotonically increasing order, which is useful in recovery algorithms, like ARIES.
• PrevLSN: A link to the transaction's previous log record. This implies database logs are constructed in linked-list form.
• Transaction ID number: A reference to the database transaction generating the log record.
• Type: Describes the type of database log record.
• Information about the actual changes that triggered the log record to be written.
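The general record structure above can be sketched as a small data type; the field names follow the article, while the layout and payload shape are invented and correspond to no specific DBMS's on-disk format. Walking the PrevLSN links recovers one transaction's records in reverse order, as a recovery algorithm would.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative sketch of the general log-record fields listed above.
@dataclass
class LogRecord:
    lsn: int                 # Log Sequence Number, monotonically increasing
    prev_lsn: Optional[int]  # link to this transaction's previous record
    xid: int                 # transaction ID generating the record
    type: str                # e.g. 'update', 'commit', 'abort'
    payload: dict = field(default_factory=dict)  # details of the change

log = [
    LogRecord(1, None, 7, "update", {"page": 4}),
    LogRecord(2, 1, 7, "update", {"page": 9}),
    LogRecord(3, 2, 7, "commit"),
]

def chain(log, last_lsn):
    """Walk PrevLSN links back through one transaction's records."""
    by_lsn = {r.lsn: r for r in log}
    lsns, cur = [], last_lsn
    while cur is not None:
        lsns.append(cur)
        cur = by_lsn[cur].prev_lsn
    return lsns
```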
Types of database log records
All log records include the general log attributes above, and also other attributes depending on their type (which is recorded in the Type attribute, as above).
• Update Log Record notes an update (change) to the database. It includes this extra information:
  • PageID: A reference to the Page ID of the modified page.
  • Length and Offset: Length in bytes and offset of the page are usually included.
  • Before and After Images: Includes the value of the bytes of the page before and after the page change. Some databases may have logs which include one or both images.
• Compensation Log Record notes the rollback of a particular change to the database. Each corresponds with exactly one other Update Log Record (although the corresponding update log record is not typically stored in the Compensation Log Record). It includes this extra information:
  • undoNextLSN: This field contains the LSN of the next log record that is to be undone for the transaction that wrote the last Update Log Record.
• Commit Record notes a decision to commit a transaction.
• Abort Record notes a decision to abort and hence roll back a transaction.
• Checkpoint Record notes that a checkpoint has been made. These are used to speed up recovery. They record information that eliminates the need to read a long way into the log's past. This varies according to the checkpoint algorithm. If all dirty pages are flushed while creating the checkpoint (as in PostgreSQL), it might contain:
  • redoLSN: This is a reference to the first log record that corresponds to a dirty page, i.e. the first update that wasn't flushed at checkpoint time. This is where redo must begin on recovery.
  • undoLSN: This is a reference to the oldest log record of the oldest in-progress transaction. This is the oldest log record needed to undo all in-progress transactions.
• Completion Record notes that all work has been done for this particular transaction. (It has been fully committed or aborted.)
Tables
These tables are maintained in memory, and can be efficiently reconstructed (if not exactly, then to an equivalent state) from the log and the database:
• Transaction Table: The table contains one entry for each active transaction. This includes Transaction ID and lastLSN, where lastLSN describes the LSN of the most recent log record for the transaction.
• Dirty Page Table: The table contains one entry for each dirty page that hasn't been written to disk. The entry contains recLSN, where recLSN is the LSN of the first log record that caused the page to be dirty.
• Transaction Log: A DBMS uses the transaction log to keep track of all transactions that update the database. The information stored in this log is used by the DBMS for a recovery requirement triggered by a 'Roll Back' statement.
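The rollback/redo behaviour described in this article can be sketched in miniature: re-apply (redo) updates from committed transactions, and skip (undo) those from transactions that never committed before the crash. The log format here is invented purely for illustration; real systems operate on pages and use the LSN machinery above.

```python
# Minimal sketch of log-driven recovery, with an invented in-memory log.
log = [
    {"xid": 1, "type": "update", "key": "a", "value": 10},
    {"xid": 2, "type": "update", "key": "b", "value": 20},
    {"xid": 1, "type": "commit"},
    # transaction 2 never committed before the crash
]

def recover(log):
    # First pass: find which transactions reached a commit record
    committed = {r["xid"] for r in log if r["type"] == "commit"}
    db = {}
    # Second pass: redo committed updates; uncommitted work is discarded,
    # which plays the role of rollback in this simplified model
    for r in log:
        if r["type"] == "update" and r["xid"] in committed:
            db[r["key"]] = r["value"]
    return db

db = recover(log)
```

Only transaction 1's update survives recovery; transaction 2's change is rolled back, preserving atomicity and durability.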
Database trigger
A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information in the database. For example, when a new record (representing a new worker) is added to the employees table, new records should also be created in the tables for taxes, vacations and salaries.
Theneedandtheusage
Triggers are commonly used to:
• audit changes (e.g. keep a log of the users and roles involved in changes)
• enhance changes (e.g. ensure that every change to a record is time-stamped by the server's clock)
• enforce business rules (e.g. require that every invoice have at least one line item)
• execute business rules (e.g. notify a manager every time an employee's bank account number changes)
• replicate data (e.g. store a record of every change, to be shipped to another database later)
• enhance performance (e.g. update the account balance after every detail transaction, for faster queries)
The examples above are called Data Manipulation Language (DML) triggers because the triggers are defined as part
of the Data Manipulation Language and are executed at the time the data is manipulated.
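The auditing use case can be sketched with a DML trigger in SQLite (via Python's sqlite3); the table names, schema, and salary figures are invented for illustration.

```python
import sqlite3

# Sketch: an AFTER UPDATE trigger records every salary change in an
# audit table, one of the auditing uses listed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.execute(
    "CREATE TABLE salary_audit "
    "(emp_id INTEGER, old_salary INTEGER, new_salary INTEGER)"
)
conn.execute("""
    CREATE TRIGGER audit_salary AFTER UPDATE OF salary ON employees
    BEGIN
        INSERT INTO salary_audit VALUES (OLD.id, OLD.salary, NEW.salary);
    END
""")
conn.execute("INSERT INTO employees VALUES (1, 50000)")
# The UPDATE fires the trigger automatically; no application code logs it
conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")
audit = conn.execute(
    "SELECT emp_id, old_salary, new_salary FROM salary_audit"
).fetchall()
```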
Some systems also support non-data triggers, which fire in response to Data Definition Language (DDL) events such as creating tables, or runtime events such as logon, commit and rollback. Such DDL triggers can be used for database auditing purposes.
The following are major features of database triggers and their effects:
• triggers do not accept parameters or arguments (but may store affected data in temporary tables)
• triggers cannot perform commit or rollback operations because they are part of the triggering SQL statement (except through autonomous transactions)
Triggers in DBMS
Below follows a series of descriptions of how some popular DBMSs support triggers.
Oracle
In addition to triggers that fire when data is modified, Oracle 9i supports triggers that fire when schema objects (that is, tables) are modified and when user logon or logoff events occur. These trigger types are referred to as "schema-level triggers".
Schema-level triggers
• After Creation
• Before Alter
• After Alter
• Before Drop
• After Drop
• Before Logoff
• After Logon
The four main types of triggers are:
1. Row Level Trigger: This gets executed before or after any column value of a row changes
2. Column Level Trigger: This gets executed before or after the specified column changes
3. For Each Row Type: This trigger gets executed once for each row of the result set caused by insert/update/delete
4. For Each Statement Type: This trigger gets executed only once for the entire result set, but fires each time the statement is executed.
Mutating tables
When a single SQL statement modifies several rows of a table at once, the order of the operations is not well-defined; there is no
"order by" clause on "update" statements, for example. Row-level triggers are executed as each row is modified, so
the order in which trigger code is run is also not well-defined. Oracle protects the programmer from this uncertainty
by preventing row-level triggers from modifying other rows in the same table – this is the "mutating table" in the
error message. Side-effects on other tables are allowed, however.
One solution is to have row-level triggers place information into a temporary table indicating what further changes
need to be made, and then have a statement-level trigger fire just once, at the end, to perform the requested changes
and clean up the temporary table.
Because a foreign key's referential actions are implemented via implied triggers, they are similarly restricted. This
may become a problem when defining a self-referential foreign key, or a cyclical set of such constraints, or some
other combination of triggers and CASCADE rules, e.g. user deletes a record from table A, CASCADE rule on table
A deletes a record from table B, trigger on table B attempts to SELECT from table A, error occurs.
Microsoft SQL Server
Microsoft SQL Server supports triggers either before, after, or instead of an insert, update or delete operation. They
can be set on tables and views with the constraint that a view can be referenced only by an INSTEAD OF trigger.
Microsoft SQL Server 2005 introduced support for Data Definition Language (DDL) triggers, which can fire in reaction to a very wide range of events, including:
• Drop table
• Create table
• Alter table
• Login events
A full list is available on MSDN.[1]
Performing conditional actions in triggers (or testing data following modification) is done through accessing the temporary Inserted and Deleted tables.
PostgreSQL
PostgreSQL introduced support for triggers in 1997. The following functionality in SQL:2003 was previously not implemented in PostgreSQL:
• SQL allows triggers to fire on updates to specific columns; as of version 9.0 of PostgreSQL this feature is also implemented in PostgreSQL.
• The standard allows the execution of a number of SQL statements other than SELECT, INSERT, UPDATE, such as CREATE TABLE, as the triggered action. This can be done through creating a stored procedure or function to call CREATE TABLE.[2]
Synopsis:

CREATE TRIGGER name { BEFORE | AFTER } { event [ OR ... ] }
    ON TABLE
    [ FOR [ EACH ] { ROW | STATEMENT } ]
    EXECUTE PROCEDURE funcname ( arguments )

Firebird
Firebird supports multiple row-level, BEFORE or AFTER, INSERT, UPDATE, DELETE (or any combination thereof) triggers per table, where they are always "in addition to" the default table changes, and the order of the triggers relative to each other can be specified where it would otherwise be ambiguous (POSITION clause). Triggers may also exist on views, where they are always "instead of" triggers, replacing the default updatable view logic. (Before version 2.1, triggers on views deemed updatable would run in addition to the default logic.)
Firebird does not raise mutating table exceptions (like Oracle), and triggers will by default both nest and recurse as required (SQL Server allows nesting but not recursion, by default). Firebird's triggers use NEW and OLD context variables (not Inserted and Deleted tables), and provide UPDATING, INSERTING, and DELETING flags to indicate the current usage of the trigger.
{CREATE | RECREATE | CREATE OR ALTER} TRIGGER name FOR {table name | view name}
    [ACTIVE | INACTIVE]
    {BEFORE | AFTER}
    {INSERT [OR UPDATE] [OR DELETE] | UPDATE [OR INSERT] [OR DELETE] |
     DELETE [OR UPDATE] [OR INSERT]}
    [POSITION n]
AS
BEGIN
    .....
END
As of version 2.1, Firebird additionally supports the following database-level triggers:
• CONNECT (exceptions raised here prevent the connection from completing)
• DISCONNECT
• TRANSACTION START
• TRANSACTION COMMIT (exceptions raised here prevent the transaction from committing, or preparing if a two-phase commit is involved)
• TRANSACTION ROLLBACK
Database-level triggers can help enforce multi-table constraints, or emulate materialized views. If an exception is
raised in a TRANSACTION COMMIT trigger, the changes made by the trigger so far are rolled back and the client
application is notified, but the transaction remains active as if COMMIT had never been requested; the client
application can continue to make changes and re-request COMMIT.
Syntax for database triggers:
{CREATE | RECREATE | CREATE OR ALTER} TRIGGER name
[ACTIVE | INACTIVE] ON
{CONNECT | DISCONNECT | TRANSACTION START | TRANSACTION COMMIT | TRANSACTION ROLLBACK}
[POSITION n] AS
BEGIN
.....
END

MySQL
MySQL 5.0.2 introduced support for triggers. MySQL supports these trigger types:
• Insert Trigger
• Update Trigger
• Delete Trigger
Note: MySQL allows only one trigger of each type on each table (i.e. one before insert, one after insert, one before update, one after update, one before delete and one after delete).
Note: MySQL does NOT fire triggers outside of a statement (i.e. APIs, foreign key cascades).
The SQL:2003 standard mandates that triggers give programmers access to record variables by means of a syntax such as REFERENCING NEW AS n. For example, if a trigger is monitoring for changes to a salary column one could write a trigger like the following:
CREATE TRIGGER salary_trigger
    BEFORE UPDATE ON employee_table
    REFERENCING NEW ROW AS n, OLD ROW AS o
    FOR EACH ROW
    IF n.salary <> o.salary THEN
    END IF;
;
-- First of all, drop any other trigger with the same name
DROP TRIGGER IF EXISTS `Mytrigger`;
-- Create New Trigger
DELIMITER $$
CREATE
    /*[DEFINER = { user | CURRENT_USER }]*/
    TRIGGER `DB`.`mytriggers` BEFORE/AFTER INSERT/UPDATE/DELETE
    ON `DB`.`<Table Name>`
    FOR EACH ROW BEGIN
END$$
DELIMITER ;

-- Example:
DROP TRIGGER IF EXISTS `Mytrigger`;
DELIMITER $$
CREATE TRIGGER `Mytrigger`
AFTER INSERT ON Table_Current
FOR EACH ROW
BEGIN
    UPDATE Table_Record ...
END$$
DELIMITER ;
IBM DB2 LUW
IBM DB2 for distributed systems, known as DB2 for LUW (LUW means Linux, Unix, Windows), supports three trigger types: before trigger, after trigger and instead-of trigger. Both statement-level and row-level triggers are supported. If there are multiple triggers for the same operation on a table, the firing order is determined by trigger creation date. Since version 9.7, IBM DB2 supports autonomous transactions [3].
A before trigger is for checking data and deciding whether an operation should be permitted. If an exception is thrown from a before trigger, the operation is aborted and no data are changed. In DB2, before triggers are read-only; you can't modify data in before triggers. After triggers are designed for post-processing after the requested change has been performed. After triggers can write data into tables and, unlike in some other databases, you can write into any table, including the table on which the trigger operates. Instead-of triggers are for making views writable.
Triggers are usually programmed in the SQL PL language.
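The veto behavior of a before trigger described above can be sketched in any engine that supports RAISE-style aborts. A minimal illustration in SQLite (not DB2; table and trigger names are invented for the example), where a BEFORE INSERT trigger checks the data and aborts the operation so that no data are changed:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")

# Stand-in for a "before trigger" used for checking data: the WHEN
# clause inspects the incoming row and RAISE(ABORT) vetoes it.
conn.execute("""
    CREATE TRIGGER check_salary BEFORE INSERT ON employee
    WHEN NEW.salary < 0
    BEGIN
        SELECT RAISE(ABORT, 'negative salary not permitted');
    END
""")

conn.execute("INSERT INTO employee VALUES ('Alice', 50000)")  # permitted
try:
    conn.execute("INSERT INTO employee VALUES ('Bob', -1)")   # vetoed
except sqlite3.DatabaseError as e:
    print("rejected:", e)

# Only the valid row was stored; the aborted insert changed nothing.
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0])
```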
SQLite
SQLite only supports row-level triggers, not statement-level triggers.
Updateable views, which are not supported in SQLite, can be emulated with INSTEAD OF triggers.
CREATE [TEMP | TEMPORARY] TRIGGER [IF NOT EXISTS] [database_name.] trigger_name
[BEFORE | AFTER | INSTEAD OF] {DELETE | INSERT | UPDATE [OF column_name [, column_name] ...]}
ON {table_name | view_name}
[FOR EACH ROW] [WHEN condition]
BEGIN
    ...
END

XML databases
An example of implementation of triggers in a non-relational database is Sedna, which provides support for triggers based on XQuery. Triggers in Sedna were designed to be analogous to SQL:2003 triggers, but natively based on XML query and update languages (XPath, XQuery and the XML update language).
A trigger in Sedna is set on any nodes of an XML document stored in the database. When these nodes are updated, the trigger automatically executes the XQuery queries and updates specified in its body. For example, the following trigger cancels person node deletion if there are any open auctions referenced by this person:
CREATE TRIGGER "trigger3"
BEFORE DELETE
ON doc("auction")/site//person
FOR EACH NODE
DO
{
   if (exists($WHERE//open_auction/bidder/personref/@person = $OLD/@id))
   then ( )
   else $OLD;
}

References
[1] http://msdn2.microsoft.com/en-us/library/ms189871(SQL.90).aspx
[2] http://www.postgresql.org/docs/9.0/static/sql-createtrigger.html
[3] http://www.ibm.com/developerworks/data/library/techarticle/dm-0907autonomoustransactions/index.html

External links
• Microsoft SQL Server DROP TRIGGER (http://msdn2.microsoft.com/en-us/library/aa258846(SQL.80).aspx)
• MySQL Database triggers (http://dev.mysql.com/doc/refman/5.0/en/triggers.html)
• MySQL DB Create Triggers (http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html)
• DB2 CREATE TRIGGER statement (http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.admin.doc/doc/r0000931.htm)
• Oracle CREATE TRIGGER (http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7004.htm#sthref7885)
Database trigger 130
• PostgreSQL CREATE TRIGGER (http://www.postgresql.org/docs/8.2/static/sql-createtrigger.html)
• Oracle Mutating Table Problems with DELETE CASCADE (http://www.akadia.com/services/ora_mutating_table_problems.html)
• SQLite Query Language: CREATE TRIGGER (http://www.sqlite.org/lang_createtrigger.html)
Database index
A database index is a data structure that improves the speed of data retrieval operations on a database table at the
cost of additional writes and the use of more storage space to maintain the extra copy of data. Indexes are used to
quickly locate data without having to search every row in a database table every time a database table is accessed.
Indexes can be created using one or more columns of a database table, providing the basis for both rapid random
lookups and efficient access of ordered records.
In a relational database, indexes are used to quickly and efficiently provide the exact location of the corresponding data. An index is a copy of select columns of data from a table that can be searched very efficiently and that also includes a low-level disk block address or direct link to the complete row of data it was copied from. Some databases extend the power of indexing by allowing indices to be created on functions or expressions. For example, an index could be created on upper(last_name), which would only store the upper-case versions of the last_name field in the index. Another option sometimes supported is the use of "filtered" indices, where index entries are created only for those records that satisfy some conditional expression. A further aspect of flexibility is to permit indexing on user-defined functions, as well as expressions formed from an assortment of built-in functions.
Usage
Support for fast lookup
Most database software includes indexing technology that enables sub-linear time lookup to improve performance, as linear search is inefficient for large databases.
Suppose a database contains N data items and it is desired to retrieve one or two of them based on the value of one of the fields. A naive implementation would retrieve and examine each item until a match was found. A successful lookup would retrieve half the objects on average; an unsuccessful lookup, all of them for each attempt. This means that the number of operations in the worst case is O(N) or linear time. Since databases commonly contain millions of objects and since lookup is a common operation, it is often desirable to improve on this performance.
An index is any data structure that improves the performance of lookup. There are many different data structures used for this purpose, and in fact a substantial proportion of the field of computer science is devoted to the design and analysis of index data structures.[citation needed] There are complex design trade-offs involving lookup performance, index size, and index update performance. Many index designs exhibit logarithmic (O(log(N))) lookup performance and in some applications it is possible to achieve flat (O(1)) performance.
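The O(N) versus O(log(N)) contrast above can be sketched with a sorted key list and binary search, as a stand-in for the B-tree traversal a real index performs (the data set and functions here are invented for illustration):

```python
import bisect

# Unindexed lookup: examine every row until a match is found -> O(N).
def full_scan(rows, key):
    for i, row in enumerate(rows):
        if row == key:
            return i
    return None

# "Indexed" lookup: binary search over sorted keys -> O(log N),
# a simplified stand-in for descending a B-tree.
def index_lookup(sorted_keys, key):
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return i
    return None

keys = list(range(0, 1_000_000, 2))   # half a million sorted even keys
assert full_scan(keys, 123456) == index_lookup(keys, 123456)
assert index_lookup(keys, 7) is None  # odd keys are absent
```

The scan touches up to 500,000 entries; the binary search touches about 19.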
Policing the database constraints
Indices are used to police database constraints, such as UNIQUE, EXCLUSION, PRIMARY KEY and FOREIGN
KEY. An index may be declared as UNIQUE which creates an implicit constraint on the underlying table. Database
systems usually implicitly create an index on a set of columns declared PRIMARY KEY, and some are capable of
using an already existing index to police this constraint. Many database systems require that both referencing and
referenced sets of columns in a FOREIGN KEY constraint are indexed, thus improving performance of inserts,
updates and deletes to the tables participating in the constraint.
Some database systems support an EXCLUSION constraint, which ensures that, for a newly inserted or updated record, a certain predicate would hold for no other record. This may be used to implement a UNIQUE constraint (with an equality predicate) or more complex constraints, like ensuring that no overlapping time ranges or no intersecting geometry objects would be stored in the table. An index supporting fast searching for records satisfying the predicate is required to police such a constraint.[1]
Index architecture
Non-clustered
The data is present in arbitrary order, but the logical ordering is specified by the index. The data rows may be spread throughout the table regardless of the value of the indexed column or expression. The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the record (page and the row number in the data page in page-organized engines; row offset in file-organized engines).
In a non-clustered index:
• The physical order of the rows is not the same as the index order.
• Non-clustered indexes are typically created on non-primary-key columns used in JOIN, WHERE, and ORDER BY clauses.
There can be more than one non-clustered index on a database table.
Clustered
Clustering alters the data block into a certain distinct order to match the index, resulting in the row data being stored
in order. Therefore, only one clustered index can be created on a given database table. Clustered indices can greatly
increase overall speed of retrieval, but usually only where the data is accessed sequentially in the same or reverse
order of the clustered index, or when a range of items is selected.
Since the physical records are in this sort order on disk, the next row item in the sequence is immediately before or after the last one, and so fewer data block reads are required. The primary feature of a clustered index is therefore the ordering of the physical data rows in accordance with the index blocks that point to them. Some databases separate the data and index blocks into separate files; others put two completely different data blocks within the same physical file(s).
Cluster
When multiple databases and multiple tables are joined, it's referred to as a cluster (not to be confused with the clustered index described above). The records for the tables sharing the value of a cluster key shall be stored together in the same or nearby data blocks. This may improve the joins of these tables on the cluster key, since the matching records are stored together and less I/O is required to locate them.[2] The data layout in the tables which are parts of the cluster is defined by the cluster configuration. A cluster can be keyed with a B-tree index or a hash table. The data block in which the table record will be stored is defined by the value of the cluster key.
Column order
The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If you are given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won't use the index.
Applications and limitations
Indices are useful for many applications but come with some limitations. Consider the following SQL statement: SELECT first_name FROM people WHERE last_name = 'Smith';. To process this statement without an index the database software must look at the last_name column on every row in the table (this is known as a full table scan). With an index the database simply follows the B-tree data structure until the Smith entry has been found; this is much less computationally expensive than a full table scan.
Consider this SQL statement: SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';. This query would yield an email address for every customer whose email address ends with "@yahoo.com", but even if the email_address column has been indexed the database must perform a full index scan. This is because the index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the database software is unable to use the underlying B-tree data structure (in other words, the WHERE-clause is not sargable). This problem can be solved through the addition of another index created on reverse(email_address) and a SQL query like this: SELECT email_address FROM customers WHERE reverse(email_address) LIKE reverse('%@yahoo.com');. This puts the wild-card at the right-most part of the query (now moc.oohay@%), which the index on reverse(email_address) can satisfy.
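The sargability distinction is visible in SQLite's EXPLAIN QUERY PLAN output (a sketch; the table and index names are invented, and the exact plan wording varies between SQLite versions): an equality predicate on the indexed column can search the index, while the leading-wildcard LIKE forces a scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email_address TEXT)")
conn.execute("CREATE INDEX idx_email ON customers(email_address)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row describes the step.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on the indexed column: the planner can SEARCH via the index.
eq = plan("SELECT * FROM customers WHERE email_address = 'a@b.com'")
# Leading wildcard: not sargable, so the planner must SCAN every entry.
like = plan("SELECT * FROM customers WHERE email_address LIKE '%@yahoo.com'")

print(eq)
print(like)
```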
Types of indexes
Bitmap index
A bitmap index is a special kind of index that stores the bulk of its data as bit arrays (bitmaps) and answers most queries by performing bitwise logical operations on these bitmaps. The most commonly used indexes, such as B+ trees, are most efficient if the values they index do not repeat or repeat a small number of times. In contrast, the bitmap index is designed for cases where the values of a variable repeat very frequently. For example, the gender field in a customer database usually contains two distinct values: male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.
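The bitwise-logic idea can be sketched with Python integers as bit arrays: one bitmap per distinct value, with bit i set when row i holds that value, and an AND of two bitmaps answering a conjunctive query (the rows and column names are invented for the example):

```python
# Toy table: (gender, tier) per row; low-cardinality columns suit bitmaps.
rows = [("F", "gold"), ("M", "basic"), ("F", "basic"), ("M", "gold")]

# One bitmap (an int used as a bit array) per (column, value) pair.
bitmaps = {}
for i, (gender, tier) in enumerate(rows):
    for key in (("gender", gender), ("tier", tier)):
        bitmaps[key] = bitmaps.get(key, 0) | (1 << i)

# WHERE gender = 'F' AND tier = 'basic'  ->  bitwise AND of two bitmaps.
hits = bitmaps[("gender", "F")] & bitmaps[("tier", "basic")]
matching_rows = [i for i in range(len(rows)) if hits & (1 << i)]
print(matching_rows)  # [2]
```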
Dense index
A dense index in databases is a file with pairs of keys and pointers for every record in the data file. Every key in this file is associated with a particular pointer to a record in the sorted data file. In clustered indices with duplicate keys, the dense index points to the first record with that key.[3]
Sparse index
A sparse index in databases is a file with pairs of keys and pointers for every block in the data file. Every key in this
file is associated with a particular pointer to the block in the sorted data file. In clustered indices with duplicate keys,
the sparse index points to the lowest search key in each block.
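A sparse index, with one (key, pointer) pair per block rather than per record, can be sketched as a binary search over the lowest key of each block followed by a scan inside the single selected block (block size and data are invented for the example):

```python
import bisect

# Sorted data file split into blocks of 4 records each.
data = list(range(40))
BLOCK = 4
blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

# Sparse index: only the lowest search key of each block is indexed.
index_keys = [blk[0] for blk in blocks]

def lookup(key):
    # Find the last block whose leading key is <= the search key,
    # then scan only inside that one block.
    n = max(bisect.bisect_right(index_keys, key) - 1, 0)
    return key in blocks[n]

print(lookup(13), lookup(40))  # True False
```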
Reverse index
A reverse key index reverses the key value before entering it in the index. E.g., the value 24538 becomes 83542 in
the index. Reversing the key value is particularly useful for indexing data such as sequence numbers, where new key
values monotonically increase.
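The reversal itself is a simple digit-string transform; the point is that consecutive keys end up differing in their leading character, so inserts spread across the index instead of piling onto one hot block (a sketch; the helper name and fixed width are assumptions):

```python
def reverse_key(n, width=5):
    # 24538 -> '83542': zero-pad to a fixed width, then reverse the digits.
    return str(n).zfill(width)[::-1]

sequential = [24536, 24537, 24538, 24539]
print([reverse_key(k) for k in sequential])
# Consecutive keys now start with different characters, landing in
# different parts of the index.
```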
Index implementations
Indices can be implemented using a variety of data structures. Popular indices include balanced trees, B+ trees and
hashes.
In Microsoft SQL Server, the leaf node of the clustered index corresponds to the actual data, not simply a pointer to
data that resides elsewhere, as is the case with a non-clustered index. Each relation can have a single clustered index
and many non-clustered indices.
Index concurrency control
An index is typically being accessed concurrently by several transactions and processes, and thus needs concurrency control. While in principle indexes can utilize the common database concurrency control methods,
specialized concurrency control methods for indexes exist, which are applied in conjunction with the common
methods for a substantial performance gain.
Covering index
In most cases, an index is used to quickly locate the data record(s) from which the required data is read. In other words, the index is only used to locate data records in the table and not to return data.
A covering index is a special case where the index itself contains the required data field(s) and can return the data. Consider the following table (other fields omitted):
ID Name OtherFields
12 Plug ...
13 Lamp ...
14 Fuse ...
To find the Name for ID 13, an index on (ID) will be useful, but the record must still be read to get the Name.
However, an index on (ID, Name) contains the required data field and eliminates the need to look up the record.
A covering index can dramatically speed up data retrieval but may itself be large due to the additional keys, which slow down data insertion and update. To reduce such index size, some systems allow non-key fields to be included in the index. Non-key fields are not themselves part of the index ordering but only included at the leaf level, allowing for a covering index with less overall index size.
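The (ID, Name) example above can be reproduced in SQLite, whose query planner reports when an index alone satisfies a query (table and index names are invented; the exact plan wording varies by version, but it indicates the index is used without touching the table rows):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (ID INTEGER, Name TEXT, Other TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?)",
    [(12, "Plug", "..."), (13, "Lamp", "..."), (14, "Fuse", "...")])

# The composite index carries both the search key and the wanted column.
conn.execute("CREATE INDEX idx_id_name ON parts(ID, Name)")

row = conn.execute("SELECT Name FROM parts WHERE ID = 13").fetchone()
detail = " ".join(r[-1] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT Name FROM parts WHERE ID = 13"))
print(row[0], "|", detail)
```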
Standardization
There is no standard for creating indexes, because the ISO SQL standard does not cover physical aspects. Indexes are one of the physical parts of database conception, among others like storage (tablespaces or filegroups). RDBMS vendors all provide a CREATE INDEX syntax with some specific options that depend on the functionality they provide to customers.
References
[1] PostgreSQL 9.1.2 Documentation: CREATE TABLE (http://www.postgresql.org/docs/9.1/static/sql-createtable.html)
[2] Overview of Clusters (http://download.oracle.com/docs/cd/B12037_01/server.101/b10743/schema.htm#sthref1069) Oracle Database Concepts 10g Release 1 (10.1)
[3] Database Systems: The Complete Book. Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom
Stored procedure
A stored procedure is a subroutine available to applications that access a relational database system. A stored procedure (sometimes called a proc, sproc, StoPro, StoredProc, sp or SP) is actually stored in the database data dictionary.
Typical uses for stored procedures include data validation (integrated into the database) or access control mechanisms. Furthermore, stored procedures can consolidate and centralize logic that was originally implemented in applications. Extensive or complex processing that requires execution of several SQL statements is moved into stored procedures, and all applications call the procedures. One can use nested stored procedures by executing one stored procedure from within another.
Stored procedures are similar to user-defined functions (UDFs). The major difference is that UDFs can be used like any other expression within SQL statements, whereas stored procedures must be invoked using the CALL statement:[1]
CALL procedure(...)
or
EXECUTE procedure(...)
Stored procedures may return result sets, i.e. the results of a SELECT statement. Such result sets can be processed using cursors, by other stored procedures, by associating a result-set locator, or by applications. Stored procedures may also contain declared variables for processing data and cursors that allow them to loop through multiple rows in a table. Stored procedure flow-control statements typically include IF, WHILE, LOOP, REPEAT, and CASE statements, and more. Stored procedures can receive variables, return results or modify variables and return them, depending on how and where the variable is declared.
Implementation
The exact and correct implementation of stored procedures varies from one database system to another. Most major database vendors support them in some form. Depending on the database system, stored procedures can be implemented in a variety of programming languages, for example SQL, Java, C, or C++. Stored procedures written in non-SQL programming languages may or may not execute SQL statements themselves.
The increasing adoption of stored procedures led to the introduction of procedural elements to the SQL language in the SQL:1999 and SQL:2003 standards, in the part SQL/PSM. That made SQL an imperative programming language. Most database systems offer proprietary and vendor-specific extensions exceeding SQL/PSM. A standard specification for Java stored procedures also exists (SQL/JRT).
Database system    Implementation language
CUBRID             Java
DB2                SQL PL (close to the SQL/PSM standard) or Java
Informix           SPL or Java
MySQL              own stored procedures, closely adhering to the SQL/PSM standard
PostgreSQL         PL/pgSQL, can also use own function languages such as PL/Perl or PL/PHP
Other uses
In some systems, stored procedures can be used to control transaction management; in others, stored procedures run
inside a transaction such that transactions are effectively transparent to them. Stored procedures can also be invoked
from a database trigger or a condition handler. For example, a stored procedure may be triggered by an insert on a
specific table, or update of a specific field in a table, and the code inside the stored procedure would be executed.
Writing stored procedures as condition handlers also allows database administrators to track errors in the system with greater detail by using stored procedures to catch the errors and record some audit information in the database or an external resource like a file.
Comparison with dynamic SQL
Overhead: Because stored procedure statements are stored directly in the database, they may remove all or part of the compilation overhead that is typically required in situations where software applications send inline (dynamic) SQL queries to a database. (However, most database systems implement "statement caches" and other mechanisms to avoid repetitive compilation of dynamic SQL statements.) In addition, while they avoid some overhead, pre-compiled SQL statements add to the complexity of creating an optimal execution plan because not all arguments of the SQL statement are supplied at compile time. Depending on the specific database implementation and configuration, mixed performance results will be seen from stored procedures versus generic queries or user-defined functions.
Avoidance of network traffic: A major advantage with stored procedures is that they can run directly within the
database engine. In a production system, this typically means that the procedures run entirely on a specialized
database server, which has direct access to the data being accessed. The benefit here is that network communication
costs can be avoided completely. This becomes particularly important for complex series of SQL statements.
Encapsulation of business logic: Stored procedures allow programmers to embed business logic as an API in the database, which can simplify data management and reduce the need to encode the logic elsewhere in client programs. This can result in a lesser likelihood of data corruption by faulty client programs. The database system can ensure data integrity and consistency with the help of stored procedures.
Delegation of access rights: In many systems, stored procedures can be granted access rights to the database that users who execute those procedures do not directly have.
Some protection from SQL injection attacks: Stored procedures can be used to protect against injection attacks. Stored procedure parameters will be treated as data even if an attacker inserts SQL commands. Also, some DBMSs will check the parameter's type. A stored procedure that in turn generates dynamic SQL using the input is, however, still subject to SQL injection unless proper precautions are taken.
Comparison with functions
• A function is a subprogram written to perform certain computations.
• A scalar function returns only a single value (or NULL), whereas a table function returns a (relational) table comprising zero or more rows, each row with one or more columns.
• Functions must return a value (using the RETURN keyword), but for stored procedures this is not compulsory.
• Stored procedures can use the RETURN keyword, but without any value being passed.
• Functions could be used in SELECT statements, provided they don't do any data manipulation. However, procedures cannot be included in SELECT statements.
• A stored procedure can return multiple values using the OUT parameter, or return no value at all.
• A stored procedure can save the query compilation time.
Comparison with prepared statements
Prepared statements take an ordinary statement or query and parameterize it so that different literal values can be
used at a later time. Like stored procedures, they are stored on the server for efficiency and provide some protection
from SQL injection attacks. Although simpler and more declarative, prepared statements are not ordinarily written to
use procedural logic and cannot operate on variables. Because of their simple interface and client-side
implementations, prepared statements are more widely reusable between DBMSs.
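The parameterization that both prepared statements and stored procedure parameters provide is visible in most client APIs. A sketch in Python's sqlite3 (one such client-side implementation; the table and input are invented), contrasting a bound parameter with naive string concatenation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "x' OR '1'='1"

# Parameterized: the driver binds the value, so the quote characters
# in the input are treated as data and no rows match.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()

# String concatenation: the input is parsed as SQL, and the injected
# OR clause makes the predicate always true.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + malicious + "'").fetchall()

print(len(safe), len(unsafe))  # 0 1
```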
Disadvantages
• Stored procedure languages are quite often vendor-specific. Switching to another vendor's database most likely requires rewriting any existing stored procedures.
• Stored procedure languages from different vendors have different levels of sophistication.
• For example, Oracle's PL/SQL has more language features and built-in features (via packages such as DBMS_ and UTL_ and others) than Microsoft's T-SQL.[citation needed]
• Tool support for writing and debugging stored procedures is often not as good as for other programming languages, but this differs between vendors and languages.
• For example, both PL/SQL and T-SQL have dedicated IDEs and debuggers. PL/pgSQL can be debugged from various IDEs.
References
[1] Call Procedure (http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=/db2/rbafzmstcallstmt.htm)
External links
• Stored Procedures in MySQL FAQ (http://dev.mysql.com/doc/refman/5.7/en/faqs-stored-procs.html)
• An overview of PostgreSQL Procedural Language support (http://www.postgresql.org/docs/current/interactive/xplang.html)
• Using a stored procedure in Sybase ASE (http://www.petersap.nl/SybaseWiki/index.php/Stored_procedure)
• PL/SQL Procedures (http://infolab.stanford.edu/~ullman/fcdb/oracle/or-plsql.html#procedures)
• Oracle Database PL/SQL Language Reference (http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28370/toc.htm)
Cursor (databases)
In computer science, a database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. The database cursor characteristic of traversal makes cursors akin to the programming-language concept of iterator.
Cursors are used by database programmers to process individual rows returned by database system queries. Cursors
enable manipulation of whole result sets at once. In this scenario, a cursor enables the rows in a result set to be
processed sequentially.
In SQL procedures, a cursor makes it possible to define a result set (a set of data rows) and perform complex logic on
a row by row basis. By using the same mechanics, a SQL procedure can also define a result set and return it directly
to the caller of the SQL procedure or to a client application.
A cursor can be viewed as a pointer to one row in a set of rows. The cursor can only reference one row at a time, but
can move to other rows of the result set as needed.
Usage
To use cursors in SQL procedures, you need to do the following:
1. Declare a cursor that defines a result set.
2. Open the cursor to establish the result set.
3. Fetch the data into local variables as needed from the cursor, one row at a time.
4. Close the cursor when done.
To work with cursors you must use the following SQL statements: DECLARE, OPEN, FETCH and CLOSE.
This section introduces the ways the SQL:2003 standard defines how to use cursors in applications in embedded SQL. Not all application bindings for relational database systems adhere to that standard, and some (such as CLI or JDBC) use a different interface.
A programmer makes a cursor known to the DBMS by using a DECLARE ... CURSOR statement and assigning the cursor a (compulsory) name:
DECLARE cursor_name CURSOR FOR SELECT ... FROM ...
Before code can access the data, it must open the cursor with the OPEN statement. Directly following a successful opening, the cursor is positioned before the first row in the result set.
OPEN cursor_name
Programs position cursors on a specific row in the result set with the FETCH statement. A fetch operation transfers the data of the row into the application.
FETCH cursor_name INTO ...
Once an application has processed all available rows or the fetch operation is to be positioned on a non-existing row (compare scrollable cursors below), the DBMS returns an SQLSTATE '02000' (usually accompanied by an SQLCODE +100) to indicate the end of the result set.
The final step involves closing the cursor using the CLOSE statement:
CLOSE cursor_name
After closing a cursor, a program can open it again, which implies that the DBMS re-evaluates the same query or a different query and builds a new result set.
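The declare/open/fetch/close life cycle maps directly onto most client APIs. A sketch using Python's sqlite3 binding (an example interface, not the SQL:2003 embedded one; the table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

cur = conn.cursor()                 # declare a cursor
cur.execute("SELECT x FROM t")      # open it: establish the result set

rows = []
while True:
    row = cur.fetchone()            # FETCH one row at a time
    if row is None:                 # end of result set (cf. SQLSTATE '02000')
        break
    rows.append(row[0])

cur.close()                         # close the cursor when done
print(rows)  # [1, 2, 3]
```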
Scrollable cursors
Programmers may declare cursors as scrollable or not scrollable. The scrollability indicates the direction in which a
cursor can move.
With a non-scrollable (or forward-only) cursor, you can FETCH each row at most once, and the cursor
automatically moves to the next row. After you fetch the last row, if you fetch again, you will put the cursor after the
last row and get the following code: SQLSTATE 02000 (SQLCODE +100).
A program may position a scrollable cursor anywhere in the result set using the FETCH SQL statement. The keyword SCROLL must be specified when declaring the cursor. The default is NO SCROLL, although different language bindings like JDBC may apply a different default.
DECLARE cursor_name sensitivity SCROLL CURSOR FOR SELECT ... FROM ...
The target position for a scrollable cursor can be specified relatively (from the current cursor position) or absolutely (from the beginning of the result set).
FETCH [ NEXT | PRIOR | FIRST | LAST ] FROM cursor_name
FETCH ABSOLUTE n FROM cursor_name
FETCH RELATIVE n FROM cursor_name
Scrollable cursors can potentially access the same row in the result set multiple times. Thus, data modifications
(insert, update, delete operations) from other transactions could have an impact on the result set. A cursor can be
SENSITIVE or INSENSITIVE to such data modifications. A sensitive cursor picks up data modifications impacting
the result set of the cursor, and an insensitive cursor does not. Additionally, a cursor may be ASENSITIVE, in which
case the DBMS tries to apply sensitivity as much as possible.
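An insensitive scrollable cursor can be sketched as a position maintained over a materialized copy of the result set (the class and method names are invented for illustration, mirroring the FETCH forms above):

```python
class ScrollableCursor:
    """Sketch of an INSENSITIVE SCROLL cursor over a materialized result set."""

    def __init__(self, rows):
        self.rows = list(rows)   # private copy: later data changes not seen
        self.pos = 0             # 0 = positioned before the first row

    def fetch_absolute(self, n):          # FETCH ABSOLUTE n
        self.pos = n
        return self._current()

    def fetch_relative(self, n):          # FETCH RELATIVE n
        self.pos += n
        return self._current()

    def fetch_next(self):                 # FETCH NEXT
        return self.fetch_relative(1)

    def fetch_prior(self):                # FETCH PRIOR
        return self.fetch_relative(-1)

    def _current(self):
        if 1 <= self.pos <= len(self.rows):
            return self.rows[self.pos - 1]
        return None              # outside the result set (cf. SQLSTATE '02000')

cur = ScrollableCursor(["a", "b", "c"])
print(cur.fetch_next(), cur.fetch_absolute(3), cur.fetch_prior())  # a c b
```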
"WITHHOLD"
Cursors are usually closed automatically at the end of a transaction, i.e. when a COMMIT or ROLLBACK (or an implicit termination of the transaction) occurs. That behavior can be changed if the cursor is declared using the WITH HOLD clause. (The default is WITHOUT HOLD.) A holdable cursor is kept open over COMMIT and closed upon ROLLBACK. (Some DBMSs deviate from this standard behavior and also keep holdable cursors open over ROLLBACK.)
DECLARE cursor_name CURSOR WITH HOLD FOR SELECT ... FROM ...
When a COMMIT occurs, a holdable cursor is positioned before the next row. Thus, a positioned UPDATE or positioned DELETE statement will only succeed after a FETCH operation occurred first in the transaction.
Note that JDBC defines cursors as holdable per default. This is done because JDBC also activates auto-commit per
default. Due to the usual overhead associated with auto-commit and holdable cursors, both features should be
explicitly deactivated at the connection level.
Positioned update/delete statements
Cursors can not only be used to fetch data from the DBMS into an application but also to identify a row in a table to
be updated or deleted. The SQL:2003 standard defines positioned update and positioned delete SQL statements for
that purpose. Such statements do not use a regular WHERE clause with predicates. Instead, a cursor identifies the
row. The cursor must be opened and already positioned on a row by means of a FETCH statement.
UPDATE table_name
SET ...
WHERE CURRENT OF cursor_name

DELETE
FROM table_name
WHERE CURRENT OF cursor_name
The cursor must operate on an updatable result set in order to successfully execute a positioned update or delete
statement. Otherwise, the DBMS would not know how to apply the data changes to the underlying tables referred to
in the cursor.
Cursors in distributed transactions
Using cursors in distributed transactions (X/Open XA Environments), which are controlled using a transaction
monitor, is no different than cursors in non-distributed transactions.
One has to pay attention when using holdable cursors, however. Connections can be used by different applications.
Thus, once a transaction has been ended and committed, a subsequent transaction (running in a different application)
could inherit existing holdable cursors. Therefore, an application developer has to be aware of that situation.
Cursors in XQuery
The XQuery language allows cursors to be created using the subsequence() function.
The format is:
let $displayed-sequence := subsequence($result, $start, $item-count)
where $result is the result of the initial XQuery, $start is the item number to start at, and $item-count is the
number of items to return.
Equivalently, this can also be done using a positional predicate, for example
$result[position() >= $start and position() < $start + $item-count].
Disadvantages of cursors
The following information may vary depending on the specific database system.
Fetching a row from the cursor may result in a network round trip each time. This uses much more network
bandwidth than would ordinarily be needed for the execution of a single SQL statement like DELETE. Repeated
network round trips can severely impact the speed of the operation using the cursor. Some DBMSs try to reduce this
impact by using block fetch. A block fetch means that multiple rows are sent together from the server to the client.
The client stores a whole block of rows in a local buffer and retrieves the rows from there until that buffer is
exhausted.
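The block-fetch idea can be illustrated with Python's DB-API, where fetchmany() pulls a batch of rows per call and the application drains each batch from a local buffer. This is only a sketch using the built-in sqlite3 module; the table and block size are invented for illustration:

```python
import sqlite3

# In-memory database with a small sample table (names are illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(10)])

cur = conn.cursor()
cur.execute("SELECT id, name FROM items ORDER BY id")

# Instead of one round trip per row, pull rows in blocks of 4 and
# serve them from the local buffer until it is exhausted.
blocks = []
while True:
    block = cur.fetchmany(4)   # one "block fetch"
    if not block:
        break
    blocks.append(block)

print([len(b) for b in blocks])   # block sizes: [4, 4, 2]
cur.close()
conn.close()
```

With a network-attached DBMS the same pattern reduces round trips from one per row to one per block.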
Cursors allocate resources on the server, for instance locks, packages, processes, temporary storage, etc. For
example, Microsoft SQL Server implements cursors by creating a temporary table and populating it with the query's
result set.
If a cursor is not properly closed (deallocated), the resources will not be freed until the SQL session (connection)
itself is closed. This wasting of resources on the server can not only lead to performance degradations but also to
failures.
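Deterministic cleanup avoids this kind of resource leak. A minimal sketch using Python's sqlite3 and contextlib.closing (table and column names are invented) shows one way to guarantee the cursor is closed even when an exception occurs:

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

# closing() guarantees cursor.close() runs even if an exception is raised,
# so server-side resources tied to the cursor are freed deterministically
# instead of lingering until the connection itself is closed.
with closing(conn.cursor()) as cur:
    cur.execute("SELECT x FROM t")
    row = cur.fetchone()

print(row[0])  # 1
conn.close()
```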
Example
The following PL/SQL block declares an explicit cursor (a sample cursor known as EE) over the
EMPLOYEES_DETAILS table and fetches its rows:

DECLARE
   v_JOB_ID      EMPLOYEES_DETAILS.JOB_ID%TYPE := 'IT_PROG';
   v_EMPLOYEE_ID EMPLOYEES_DETAILS.EMPLOYEE_ID%TYPE;
   v_FIRST_NAME  EMPLOYEES_DETAILS.FIRST_NAME%TYPE;
   v_LAST_NAME   EMPLOYEES_DETAILS.LAST_NAME%TYPE;
   CURSOR c_EMPLOYEES_DETAILS IS
      SELECT EMPLOYEE_ID, FIRST_NAME, LAST_NAME
        FROM EMPLOYEES_DETAILS
       WHERE JOB_ID = v_JOB_ID;
BEGIN
   OPEN c_EMPLOYEES_DETAILS;
   LOOP
      FETCH c_EMPLOYEES_DETAILS INTO v_EMPLOYEE_ID, v_FIRST_NAME, v_LAST_NAME;
      EXIT WHEN c_EMPLOYEES_DETAILS%NOTFOUND;
      DBMS_OUTPUT.put_line(v_EMPLOYEE_ID);
      DBMS_OUTPUT.put_line(v_FIRST_NAME);
      DBMS_OUTPUT.put_line(v_LAST_NAME);
   END LOOP;
   CLOSE c_EMPLOYEES_DETAILS;
END;
References
• Christopher J. Date: Database in Depth, O'Reilly & Associates, ISBN 0-596-10012-4
• Thomas M. Connolly, Carolyn E. Begg: Database Systems, Addison-Wesley, ISBN 0-321-21025-5
• Ramiz Elmasri, Shamkant B. Navathe: Fundamentals of Database Systems, Addison-Wesley, ISBN 0-201-54263-3
• Neil Matthew, Richard Stones: Beginning Databases with PostgreSQL: From Novice to Professional, Apress, ISBN 1-59059-478-9
• Thomas Kyte: Expert One-On-One: Oracle, Apress, ISBN 1-59059-525-4
• Kevin Loney: Oracle Database 10g: The Complete Reference, Oracle Press, ISBN 0-07-225351-7
External links
• Cursor Optimization Tips (for MS SQL Server) [2]
• Descriptions from Portland Pattern Repository [3]
• PostgreSQL Documentation [4]
• Berkeley DB Reference Guide: Cursor operations [5]
• Java SE 7 [6]
• Q3SqlCursor Class Reference [7]
• OCI Scrollable Cursor [8]
• function oci_new_cursor [9]
• MySQL's Cursor Documentation [10]
• Firebird SQL cursors documentation [11]
• Cursors in DB2 CLI applications [12]; Cursors in DB2 SQL stored procedures [13]
• A Simple Example of a MySQL Stored Procedure that uses a cursor [14]
• MariaDB/MySQL Cursors: a brief Tutorial [15]
References
[1] http://en.wikibooks.org/wiki/XQuery/Searching,Paging_and_Sorting#Paging
[2] http://www.mssqlcity.com/Tips/tipCursor.htm
[3] http://c2.com/cgi/wiki?DistributedCursor
[4] http://www.postgresql.org/docs/8.3/interactive/plpgsql-cursors.html
[5] http://sleepycat.com/docs/ref/am/cursor.html
[6] http://download.oracle.com/javase/7/docs/
[7] http://doc.trolltech.com/4.0/q3sqlcursor.html
[8] http://www.oracle.com/technology/products/oracle9i/daily/mar15.html
[9] http://de2.php.net/manual/en/function.oci-new-cursor.php
[10] http://dev.mysql.com/doc/refman/5.5/en/cursors.html
[11] http://www.firebirdsql.org/refdocs/langrefupd20-psql-declare.html
[12] http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.apdv.cli.doc/doc/c0007645.htm
[13] http://publib.boulder.ibm.com/infocenter/db2luw/v9/topic/com.ibm.db2.udb.apdv.sql.doc/doc/c0024361.htm
[14] http://www.kbedell.com/2009/03/02/a-simple-example-of-a-mysql-stored-procedure-that-uses-a-cursor/
[15] http://falseisnotnull.wordpress.com/2013/06/05/mariadbmysql-cursors-a-brief-tutorial/
Partition (database)
A partition is a division of a logical database or its constituent elements into distinct independent parts. Database
partitioning is normally done for manageability, performance or availability reasons.
Benefits of multiple partitions
A popular and favourable application of partitioning is in a distributed database management system. Each partition
may be spread over multiple nodes, and users at the node can perform local transactions on the partition. This
increases performance for sites that have regular transactions involving certain views of data, whilst maintaining
availability and security.
Partitioning criteria
Current high-end relational database management systems provide for different criteria to split the database. They
take a partitioning key and assign a partition based on certain criteria. Common criteria are:
Range partitioning
Selects a partition by determining if the partitioning key is inside a certain range. An example could be a
partition for all rows where the column zipcode has a value between 70000 and 79999.
List partitioning
A partition is assigned a list of values. If the partitioning key has one of these values, the partition is chosen.
For example all rows where the column Country is either Iceland, Norway, Sweden, Finland or
Denmark could build a partition for the Nordic countries.
Hash partitioning
The value of a hash function determines membership in a partition. Assuming there are four partitions, the hash
function could return a value from 0 to 3.
Composite partitioning allows for certain combinations of the above partitioning schemes, for example first
applying a range partitioning and then a hash partitioning. Consistent hashing could be considered a composite of
hash and list partitioning, where the hash reduces the key space to a size that can be listed.
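The three basic criteria can be sketched as routing functions that map a partitioning key to a partition. This is a hypothetical illustration, not any particular DBMS's API; the names, band width, and partition count are invented:

```python
import zlib

def range_partition(zipcode: int) -> str:
    # Range criterion: e.g. one partition per 10000-wide zipcode band,
    # so 70000..79999 all land in the same partition.
    return f"p_range_{zipcode // 10000}"

NORDIC = {"Iceland", "Norway", "Sweden", "Finland", "Denmark"}

def list_partition(country: str) -> str:
    # List criterion: the partition is chosen if the key is in its list.
    return "p_nordic" if country in NORDIC else "p_other"

def hash_partition(key: str, partitions: int = 4) -> int:
    # Hash criterion: a stable hash (not Python's randomized hash())
    # keeps the routing repeatable across runs.
    return zlib.crc32(key.encode()) % partitions

print(range_partition(73301))            # p_range_7
print(list_partition("Sweden"))          # p_nordic
print(0 <= hash_partition("abc") < 4)    # True
```

A composite scheme would simply compose two of these functions, e.g. range first and then hash within each range partition.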
Partitioning methods
The partitioning can be done by either building separate smaller databases (each with its own tables, indices, and
transaction logs), or by splitting selected elements, for example just one table.
Horizontal partitioning (also see: shard) involves putting different rows into different tables. Perhaps customers
with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to
50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a
view with a union might be created over both of them to provide a complete view of all customers.
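The ZIP-code split above can be sketched with sqlite3: rows are routed to one of two partition tables, and a UNION ALL view restores the complete customer set. The schema is a minimal stand-in for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two horizontal partitions split on ZIP code (schema is illustrative).
conn.execute("CREATE TABLE CustomersEast (id INTEGER, zip INTEGER)")
conn.execute("CREATE TABLE CustomersWest (id INTEGER, zip INTEGER)")

def insert_customer(cid, zipcode):
    # Route each row by the partitioning key, as described above.
    table = "CustomersEast" if zipcode < 50000 else "CustomersWest"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (cid, zipcode))

insert_customer(1, 10001)   # goes to CustomersEast
insert_customer(2, 90210)   # goes to CustomersWest

# A union view provides a complete view of all customers.
conn.execute("""CREATE VIEW Customers AS
                SELECT * FROM CustomersEast
                UNION ALL
                SELECT * FROM CustomersWest""")

rows = conn.execute("SELECT id, zip FROM Customers ORDER BY id").fetchall()
print(rows)   # [(1, 10001), (2, 90210)]
```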
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining
columns.[1] Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond
that and partitions columns even when already normalized. Different physical storage might be used to realize
vertical partitioning as well; storing infrequently used or very wide columns on a different device, for example, is a
method of vertical partitioning. Done explicitly or implicitly, this type of partitioning is called "row splitting" (the
row is split by its columns). A common form of vertical partitioning is to split dynamic data (slow to find) from
static data (fast to find) in a table where the dynamic data is not used as often as the static. Creating a view across the
two newly created tables restores the original table with a performance penalty; however, performance will increase
when accessing the static data, e.g. for statistical analysis.
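Row splitting and the restoring view can be sketched in the same way: static columns in one table, a wide column in another, joined on the primary key. The schema and data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Vertical partitioning: static columns in one table, a wide/dynamic
# column in a second table, linked by the primary key.
conn.execute("CREATE TABLE product_static (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE product_notes  (id INTEGER PRIMARY KEY, notes TEXT)")
conn.execute("INSERT INTO product_static VALUES (1, 'widget')")
conn.execute("INSERT INTO product_notes  VALUES (1, 'a very long description...')")

# A view joining the two partitions restores the original row shape
# (at the cost of a join), while queries touching only product_static
# never read the wide notes column.
conn.execute("""CREATE VIEW product AS
                SELECT s.id, s.name, n.notes
                FROM product_static s JOIN product_notes n ON s.id = n.id""")

restored = conn.execute("SELECT name, notes FROM product WHERE id = 1").fetchone()
print(restored)   # ('widget', 'a very long description...')
```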
References
[1] Vertical Partitioning Algorithms for Database Design, by Shamkant Navathe, Stefano Ceri, Gio Wiederhold, and Jinglie Dou,
Stanford University, 1984 (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.97.8306)
External links
• IBM DB2 partitioning (http://publib.boulder.ibm.com/infocenter/db2help/index.jsp?topic=/
com.ibm.db2.udb.doc/admin/c0004885.htm)
• MySQL partitioning (http://dev.mysql.com/doc/refman/5.5/en/partitioning.html)
• Oracle partitioning (http://www.oracle.com/us/products/database/options/partitioning/index.htm)
• SQL Server partitions (http://msdn.microsoft.com/en-us/library/ms190787.aspx)
• PostgreSQL partitioning (http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html)
• Sybase ASE 15.0 partitioning (http://www.sybase.com/detail?id=1036923)
• MongoDB partitioning (http://www.mongodb.org/display/DOCS/Sharding)
• ScimoreDB partitioning (http://scimore.com/wiki/Distributed_schema)
• VoltDB partitioning (http://community.voltdb.com/docs/UsingVoltDB/ChapAppDesign#DesignPartition)
Components
Concurrency control
In information technology and computer science, especially in the fields of computer programming, operating
systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations
are generated, while getting those results as quickly as possible.
Computer systems, both software and hardware, consist of modules, or components. Each component is designed to
operate correctly, i.e., to obey or to meet certain consistency rules. When components that operate concurrently
interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be
violated by another component. The general area of concurrency control provides rules, methods, design
methodologies, and theories to maintain the consistency of components operating concurrently while interacting,
and thus the consistency and correctness of the whole system. Introducing concurrency control into a system means
applying operation constraints which typically result in some performance reduction. Operation consistency and
correctness should be achieved with as good efficiency as possible, without reducing performance below reasonable
levels. For example, a failure in concurrency control can result in data corruption from torn read or write operations.
Concurrency control in databases
Comments:
1. This section is applicable to all transactional systems, i.e., to all systems that use database transactions (atomic
transactions; e.g., transactional objects in systems management and in networks of smartphones, which typically
implement private, dedicated database systems), not only general-purpose database management systems
(DBMSs).
2. DBMSs need to deal also with concurrency control issues not typical just to database transactions but rather to
operating systems in general. These issues (e.g., see Concurrency control in operating systems below) are out of
the scope of this section.
Concurrency control in database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen
2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing)
ensures that database transactions are performed concurrently without violating the data integrity of the respective
databases. Thus concurrency control is an essential element for correctness in any system where two or more
database transactions, executed with time overlap, can access the same data, e.g., virtually in any general-purpose
database system. Consequently a vast body of related research has been accumulated since database systems
emerged in the early 1970s. A well established concurrency control theory for database systems is outlined in the
references mentioned above: serializability theory, which makes it possible to effectively design and analyze
concurrency control methods and mechanisms. An alternative theory for concurrency control of atomic transactions
over abstract data types is presented in (Lynch et al. 1993), and is not utilized below. This theory is more refined,
more complex, has a wider scope, and has been less utilized in the database literature than the classical theory above.
Each theory has its pros and cons, emphasis and insight. To some extent they are complementary, and their merging
may be useful.
To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless
serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not
harmed. For maintaining correctness in cases of failed (aborted) transactions (which can always happen for many
reasons), schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no
effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related
database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have
become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990s,
and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special
attention.
Database transaction and the ACID rules
The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well understood
database system behavior in a faulty environment where crashes can happen at any time, and recovery from a crash
to a well understood database state. A database transaction is a unit of work, typically encapsulating a number of
operations over a database (e.g., reading a database object, writing, acquiring a lock, etc.), an abstraction supported
in database and also other systems. Each transaction has well defined boundaries in terms of which program/code
executions are included in that transaction (determined by the transaction's programmer via special transaction
commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database
system is designed to guarantee them for the transactions it runs):
• Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a
transaction is completed (committed or aborted respectively). In other words, to the outside world a committed
transaction appears (by its effects on the database) to be indivisible, atomic, and an aborted transaction does not
leave effects on the database at all, as if it never existed.
• Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the
predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction
must transform a database from one consistent state to another consistent state (however, it is the responsibility of
the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it
intends to perform (from the application's point of view) while the predefined integrity rules are enforced by the
DBMS). Thus since a database can be normally changed only by transactions, all the database's states are
consistent. An aborted transaction does not change the database state it has started from, as if it never existed
(atomicity above).
• Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually
(depending on the concurrency control method) the effects of an incomplete transaction are not even visible to
another transaction. Providing isolation is the main goal of concurrency control.
• Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording
the transaction's effects and its commit event in non-volatile memory).
The concept of the atomic transaction has been extended over the years to what has become the business transaction,
which actually implements types of workflow and is not atomic. However, even such enhanced transactions
typically utilize atomic transactions as components.
Why is concurrency control needed?
If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists.
However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some
unexpected, undesirable result may occur. Here are some typical examples:
1. The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value
written by a first concurrent transaction, and the first value is lost to other transactions running concurrently which
need, by their precedence, to read the first value. The transactions that have read the wrong value end with
incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value
disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The
reading transactions end with incorrect results.
3. The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a
repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does
not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions
(if one is executed before the other), but rather some random result, depending on the timing of the updates, and
on whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their performance
requirements. Thus, without concurrency control such systems can neither provide correct results nor keep their
databases consistent.
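The lost update problem can be made concrete with a small deterministic simulation: two "transactions" both read a balance, then both write back read-value plus their own deposit, with the writes interleaved after both reads. All names and amounts here are invented for illustration:

```python
# Deterministic simulation of the lost-update anomaly: both transactions
# read the same initial balance, so the second write overwrites the first.

def run_interleaved(balance, deposit_a, deposit_b):
    read_a = balance               # T1 reads the balance
    read_b = balance               # T2 reads it too, before T1 writes
    balance = read_a + deposit_a   # T1 writes its update
    balance = read_b + deposit_b   # T2 writes, overwriting T1's update
    return balance

def run_serial(balance, deposit_a, deposit_b):
    balance = balance + deposit_a  # T1 runs to completion first
    balance = balance + deposit_b  # then T2
    return balance

print(run_interleaved(100, 10, 20))  # 120 -- T1's deposit of 10 is lost
print(run_serial(100, 10, 20))       # 130 -- the correct serial result
```

A concurrency control mechanism prevents exactly this interleaving, e.g. by making T2's read wait for a lock held by T1 until T1's write has been applied.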
Concurrency control mechanisms
Categories
The main categories of concurrency control mechanisms are:
• Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity rules (e.g.,
serializability and recoverability) until its end, without blocking any of its (read, write) operations ("...and be
optimistic about the rules being met..."), and then abort a transaction to prevent the violation, if the desired rules
are to be violated upon its commit. An aborted transaction is immediately restarted and re-executed, which incurs
an obvious overhead (versus executing it to the end only once). If not too many transactions are aborted, then
being optimistic is usually a good strategy.
• Pessimistic - Block an operation of a transaction, if it may cause violation of the rules, until the possibility of
violation disappears. Blocking operations typically involves a performance reduction.
• Semi-optimistic - Block operations in some situations, if they may cause violation of some rules, and do not
block in other situations while delaying rule checking (if needed) to the transaction's end, as done with optimistic.
Different categories provide different performance, i.e., different average transaction completion rates (throughput),
depending on transaction types mix, computing level of parallelism, and other factors. If selection and knowledge
about trade-offs are available, then category and method should be chosen to provide the highest performance.
The mutual blocking between two or more transactions (where each one blocks another) results in a deadlock, where
the transactions involved are stalled and cannot reach completion. Most non-optimistic mechanisms (with blocking)
are prone to deadlocks, which are resolved by an intentional abort of a stalled transaction (which releases the other
transactions in that deadlock), and its immediate restart and re-execution. The likelihood of a deadlock is typically
low.
Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-offs between the
categories.
Methods
Many methods for concurrency control exist. Most of them can be implemented within either main category above.
The major methods, which each have many variants, and in some cases may overlap or be combined, are:[1]
1. Locking (e.g., Two-phase locking - 2PL) - Controlling access to data by locks assigned to the data. Access of a
transaction to a data item (database object) locked by another transaction may be blocked (depending on lock type
and access operation type) until lock release.
2. Serialization graph checking (also called Serializability, or Conflict, or Precedence graph checking) - Checking
for cycles in the schedule's graph and breaking them by aborts.
3. Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking access to data by
timestamp order.
4. Commitment ordering (or Commit ordering; CO) - Controlling or checking transactions' chronological order of
commit events to be compatible with their respective precedence order.
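The serialization graph checking method (item 2 above) reduces to cycle detection in the precedence graph: nodes are transactions and an edge T1 -> T2 records that an operation of T1 conflicts with, and precedes, an operation of T2. A minimal sketch, with illustrative transaction names:

```python
# Sketch of serialization-graph checking: build a precedence graph from
# conflicting operations and test it for cycles via depth-first search.

def has_cycle(edges):
    """Detect a cycle in a directed graph given as {node: {successors}}."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / in progress / done
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY
        for m in edges.get(n, ()):
            if color.get(m, WHITE) == GRAY:
                return True                # back edge => cycle
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

# T1 -> T2 (a T1 operation conflicts with and precedes a T2 operation) and
# T2 -> T1: the cycle means the schedule is not conflict-serializable, and
# the checker would break it by aborting one of the two transactions.
print(has_cycle({"T1": {"T2"}, "T2": {"T1"}}))  # True
print(has_cycle({"T1": {"T2"}, "T2": set()}))   # False
```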
Concurrency control 148
Other major concurrency control types that are utilized in conjunction with the methods above include:
• Multiversion concurrency control (MVCC) - Increasing concurrency and performance by generating a new
version of a database object each time the object is written, and allowing transactions' read operations of several
last relevant versions (of each object), depending on the scheduling method.
• Index concurrency control - Synchronizing access operations to indexes, rather than to user data. Specialized
methods provide substantial performance gains.
• Private workspace model (Deferred update) - Each transaction maintains a private workspace for its accessed
data, and its changed data become visible outside the transaction only upon its commit (e.g., Weikum and Vossen
2001). This model provides a different concurrency control behavior with benefits in many cases.
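The MVCC idea above can be sketched as a versioned store: every write appends a new timestamped version, and a read at snapshot time t returns the latest version written at or before t, so readers are never blocked by later writers. This is an invented minimal model, not any real DBMS's implementation:

```python
# Minimal multiversion store sketch: writes create versions, reads see
# a snapshot (all names are illustrative).

class MVStore:
    def __init__(self):
        self.versions = {}   # key -> list of (timestamp, value)
        self.clock = 0       # a simple logical clock

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Latest version written at or before the snapshot timestamp.
        candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                      if ts <= snapshot_ts]
        return max(candidates)[1] if candidates else None

store = MVStore()
store.write("x", "old")          # version at ts=1
snapshot = store.clock           # a reader takes its snapshot here
store.write("x", "new")          # version at ts=2, after the snapshot
print(store.read("x", snapshot))      # 'old' -- the reader is unaffected
print(store.read("x", store.clock))   # 'new' -- a later reader sees it
```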
The most common mechanism type in database systems since their early days in the 1970s has been Strong strict
Two-phase locking (SS2PL; also called Rigorous scheduling or Rigorous 2PL), which is a special case (variant) of
both Two-phase locking (2PL) and Commitment ordering (CO). It is pessimistic. In spite of its long name (for
historical reasons) the idea of the SS2PL mechanism is simple: "Release all locks applied by a transaction only after
the transaction has ended." SS2PL (or Rigorousness) is also the name of the set of all schedules that can be generated
by this mechanism, i.e., these are SS2PL (or Rigorous) schedules, which have the SS2PL (or Rigorousness) property.
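The SS2PL rule quoted above can be sketched as a tiny lock manager: a transaction acquires locks on data items as it accesses them and releases all of them only when it ends. This is an illustrative toy (single exclusive lock mode, no blocking queue), not a production lock manager:

```python
# Sketch of the SS2PL rule: locks are acquired during the transaction and
# released only after the transaction has ended (names are illustrative).

class SS2PLManager:
    def __init__(self):
        self.locks = {}          # data item -> owning transaction

    def acquire(self, txn, item):
        owner = self.locks.get(item)
        if owner is not None and owner != txn:
            return False         # conflict: the caller must block or abort
        self.locks[item] = txn   # grant (or re-grant) the lock
        return True

    def end_transaction(self, txn):
        # "Release all locks applied by a transaction only after it ends."
        for item in [i for i, o in self.locks.items() if o == txn]:
            del self.locks[item]

mgr = SS2PLManager()
mgr.acquire("T1", "x")
print(mgr.acquire("T2", "x"))    # False -- T2 is blocked while T1 holds x
mgr.end_transaction("T1")        # T1 ends (commits), releasing everything
print(mgr.acquire("T2", "x"))    # True -- now T2 may proceed
```

Because no lock is released before the transaction's end, a transaction never sees another's uncommitted changes, which is what yields the SS2PL (Rigorousness) schedule property.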
Major goals of concurrency control mechanisms
Concurrency control mechanisms firstly need to operate correctly, i.e., to maintain each transaction's integrity rules
(as related to concurrency; application-specific integrity rules are out of the scope here) while transactions are
running concurrently, and thus the integrity of the entire transactional system. Correctness needs to be achieved with
as good performance as possible. In addition, increasingly a need exists to operate effectively while transactions are
distributed over processes, computers, and computer networks. Other subjects that may affect concurrency control
are recovery and replication.
Correctness
Serializability
For correctness, a common major goal of most concurrency control mechanisms is generating schedules with the
Serializability property. Without serializability undesirable phenomena may occur, e.g., money may disappear from
accounts, or be generated from nowhere. Serializability of a schedule means equivalence (in the resulting database
values) to some serial schedule with the same transactions (i.e., in which transactions are sequential with no overlap
in time, and thus completely isolated from each other: no concurrent access by any two transactions to the same data
is possible). Serializability is considered the highest level of isolation among database transactions, and the major
correctness criterion for concurrent transactions. In some cases compromised, relaxed forms of serializability are
allowed for better performance (e.g., the popular Snapshot isolation mechanism) or to meet availability requirements
in highly distributed systems (see Eventual consistency), but only if the application's correctness is not violated by
the relaxation (e.g., no relaxation is allowed for money transactions, since by relaxation money can disappear, or
appear from nowhere).
Almost all implemented concurrency control mechanisms achieve serializability by providing Conflict
serializability, a broad special case of serializability (i.e., it covers and enables most serializable schedules, and does
not impose significant additional delay-causing constraints) which can be implemented efficiently.
Recoverability
See Recoverability in Serializability.
Comment:While in the general area of systems the term "recoverability" may refer to the ability of a system to
recover from failure or from an incorrect/forbidden state, within concurrency control of database systems this term
has received a specific meaning.
Concurrency control typically also ensures the Recoverability property of schedules, for maintaining correctness in
cases of aborted transactions (which can always happen for many reasons). Recoverability (from abort) means that
no committed transaction in a schedule has read data written by an aborted transaction. Such data disappear from the
database (upon the abort) and are parts of an incorrect database state. Reading such data violates the consistency rule
of ACID. Unlike Serializability, Recoverability cannot be compromised or relaxed in any case, since any relaxation
results in quick database integrity violation upon aborts. The major methods listed above provide serializability
mechanisms. None of them in its general form automatically provides recoverability, and special considerations and
mechanism enhancements are needed to support recoverability. A commonly utilized special case of recoverability
is Strictness, which allows efficient database recovery from failure (but excludes optimistic implementations; e.g.,
Strict CO (SCO) cannot have an optimistic implementation, but has semi-optimistic ones).
Comment: Note that the Recoverability property is needed even if no database failure occurs and no database
recovery from failure is needed. It is rather needed to correctly and automatically handle transaction aborts, which
may be unrelated to database failure and recovery from it.
Distribution
With the fast technological development of computing the difference between local and distributed computing over
low latency networks or buses is blurring. Thus the quite effective utilization of local techniques in such distributed
environments is common, e.g., in computer clusters and multi-core processors. However the local techniques have
their limitations and use multi-processes (or threads) supported by multi-processors (or multi-cores) to scale. This
often turns transactions into distributed ones, if they themselves need to span multi-processes. In these cases most
local concurrency control techniques do not scale well.
Distributed serializability and Commitment ordering
See Distributed serializability in Serializability.
As database systems have become distributed, or started to cooperate in distributed environments (e.g., Federated
databases in the early 1990s, and nowadays Grid computing, Cloud computing, and networks with smartphones),
some transactions have become distributed. A distributed transaction means that the transaction spans processes,
and may span computers and geographical sites. This generates a need for effective distributed concurrency control
mechanisms. Achieving the Serializability property of a distributed system's schedule (see Distributed serializability
and Global serializability (Modular serializability)) effectively poses special challenges typically not met by most of
the regular serializability mechanisms, originally designed to operate locally. This is especially due to a need for
costly distribution of concurrency control information amid communication and computer latency. The only known
general effective technique for distribution is Commitment ordering, which was disclosed publicly in 1991 (after
being patented). Commitment ordering (Commit ordering, CO; Raz 1992) means that transactions' chronological
order of commit events is kept compatible with their respective precedence order. CO does not require the
distribution of concurrency control information and provides a general effective solution (reliable, high-performance,
and scalable) for both distributed and global serializability, also in a heterogeneous environment with database
systems (or other transactional objects) with different (any) concurrency control mechanisms. CO is indifferent to
which mechanism is utilized, since it does not interfere with any transaction operation scheduling (which most
mechanisms control), and only determines the order of commit events. Thus, CO enables the efficient distribution of
all other mechanisms, and also the distribution of a mix of different (any) local mechanisms, for achieving
distributed and global serializability. The existence of such a solution has been considered "unlikely" until 1991, and
by many experts also later, due to misunderstanding of the CO solution (see Quotations in Global serializability). An
important side-benefit of CO is automatic distributed deadlock resolution. Contrary to CO, virtually all other
techniques (when not combined with CO) are prone to distributed deadlocks (also called global deadlocks) which
need special handling. CO is also the name of the resulting schedule property: A schedule has the CO property if the
chronological order of its transactions' commit events is compatible with the respective transactions' precedence
(partial) order.
SS2PL mentioned above is a variant (special case) of CO and thus also effective to achieve distributed and global
serializability. It also provides automatic distributed deadlock resolution (a fact overlooked in the research literature
even after CO's publication), as well as Strictness and thus Recoverability. Possessing these desired properties
together with known efficient locking based implementations explains SS2PL's popularity. SS2PL has been utilized
to efficiently achieve Distributed and Global serializability since the 1980s, and has become the de facto standard for
it. However, SS2PL is blocking and constraining (pessimistic), and with the proliferation of distribution and
utilization of systems different from traditional database systems (e.g., as in Cloud computing), less constraining
types of CO (e.g., Optimistic CO) may be needed for better performance.
Comments:
1. The Distributed conflict serializability property in its general form is difficult to achieve efficiently, but it is
achievedefficientlyviaitsspecialcaseDistributedCO:Eachlocalcomponent(e.g.,alocalDBMS)needsbothto provide
some form of CO, and enforce a special vote ordering strategy for the Two-phase commit protocol(2PC: utilized
to commit distributed transactions). Differently from the general Distributed CO, Distributed SS2PL exists
automatically when all local components are SS2PL based (in each component CO exists, implied, and the vote
ordering strategy is now met automatically). This fact has been known and utilized since the 1980s (i.e., that
SS2PL exists globally, without knowing about CO) for efficient Distributed SS2PL, which implies Distributed
serializability and strictness (e.g., see Raz 1992, page 293; it is also implied in Bernstein et al. 1987, page 78).
Less constrained Distributed serializability and strictness can be efficiently achieved by Distributed Strict
CO (SCO), or by a mix of SS2PL-based and SCO-based local components.
2. About the references and Commitment ordering: (Bernstein et al. 1987) was published before the discovery of
CO in 1990. The CO schedule property is called Dynamic atomicity in (Lynch et al. 1993, page 201). CO is
described in (Weikum and Vossen 2001, pages 102, 700), but the description is partial and misses CO's essence.
(Raz 1992) was the first refereed and accepted-for-publication article about CO algorithms (however, publications
about an equivalent Dynamic atomicity property can be traced to 1988). Other CO articles followed. (Bernstein and
Newcomer 2009) note CO as one of the four major concurrency control methods, and CO's ability to provide
interoperability among other methods.
Distributedrecoverability
Unlike Serializability, Distributed recoverability and Distributed strictness can be achieved efficiently in a
straightforward way, similarly to the way Distributed CO is achieved: In each database system they have to be
applied locally, and employ a vote ordering strategy for the Two-phase commit protocol (2PC; Raz 1992, page 307).
As has been mentioned above, Distributed SS2PL, including Distributed strictness (recoverability) and Distributed
commitment ordering (serializability), automatically employs the needed vote ordering strategy, and is achieved
(globally) when employed locally in each (local) database system (as has been known and utilized for many years; as
a matter of fact, locality is defined by the boundary of a 2PC participant (Raz 1992)).
Other major subjects of attention
The design of concurrency control mechanisms is often influenced by the following subjects:
Recovery
All systems are prone to failures, and handling recovery from failure is a must. The properties of the generated
schedules, which are dictated by the concurrency control mechanism, may have an impact on the effectiveness and
efficiency of recovery. For example, the Strictness property (mentioned in the section Recoverability above) is often
desirable for an efficient recovery.
Replication
For high availability, database objects are often replicated. Updates of replicas of the same database object need to be
kept synchronized. This may affect the way concurrency control is done (e.g., Gray et al. 1996).
References
• Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987): Concurrency Control and Recovery in Database Systems [2] (free PDF download), Addison Wesley Publishing Company, 1987, ISBN 0-201-10715-5
• Gerhard Weikum, Gottfried Vossen (2001): Transactional Information Systems [3], Elsevier, ISBN 1-55860-508-8
• Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete (1993): Atomic Transactions in Concurrent and Distributed Systems [4], Morgan Kaufmann (Elsevier), August 1993, ISBN 978-1-55860-104-8, ISBN 1-55860-104-X
• Yoav Raz (1992): "The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous Environment of Multiple Autonomous Resource Managers Using Atomic Commitment" [5] (PDF [6]), Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB), pp. 292-312, Vancouver, Canada, August 1992. (Also DEC-TR 841, Digital Equipment Corporation, November 1990.)
Footnotes
[1] Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition (http://www.elsevierdirect.com/product.jsp?isbn=9781558606234), Morgan Kaufmann (Elsevier), June 2009, ISBN 978-1-55860-623-4 (page 145)
[2] http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
[3] http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description
[4] http://www.elsevier.com/wps/find/bookdescription.cws_home/680521/description#description
[5] http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Raz92.html
[6] http://www.vldb.org/conf/1992/P292.PDF
Concurrency control in operating systems
Multitasking operating systems, especially real-time operating systems, need to maintain the illusion that all tasks
running on top of them are all running at the same time, even though only one or a few tasks really are running at any
given moment due to the limitations of the hardware the operating system is running on. Such multitasking is fairly
simple when all tasks are independent of each other. However, when several tasks try to use the same resource, or
when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent computing is
to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they risk causing
problems of their own, such as deadlock. Other solutions are non-blocking algorithms.
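A minimal Java sketch of the lock-based approach follows (illustrative only, not tied to any particular operating system): when every task acquires the shared locks in the same global order, the deadlock described above cannot arise, and concurrent updates to shared state are not lost.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrdering {
    static final ReentrantLock A = new ReentrantLock();
    static final ReentrantLock B = new ReentrantLock();
    static int shared = 0;

    // Every task takes A before B; a consistent global order prevents deadlock.
    static void task() {
        A.lock();
        try {
            B.lock();
            try {
                shared++;   // protected update: no lost increments
            } finally {
                B.unlock();
            }
        } finally {
            A.unlock();
        }
    }

    public static void main(String[] args) {
        Runnable work = () -> { for (int i = 0; i < 1000; i++) task(); };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println(shared);   // 2000 when both threads complete
    }
}
```

Had one task taken B before A while another took A before B, each could end up holding one lock and waiting forever for the other; the fixed acquisition order is what rules that out.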
Data dictionary
A data dictionary, or metadata repository, as defined in the IBM Dictionary of Computing, is a "centralized
repository of information about data such as meaning, relationships to other data, origin, usage, and format."[1] The
term may have one of several closely related meanings pertaining to databases and database management systems
(DBMS):
• a document describing a database or collection of databases
• an integral component of a DBMS that is required to determine its structure
• a piece of middleware that extends or supplants the native data dictionary of a DBMS
Documentation
The terms data dictionary and data repository are used to indicate a more general software utility than a catalogue.
A catalogue is closely coupled with the DBMS software. It provides the information stored in it to the user and the
DBA, but it is mainly accessed by the various software modules of the DBMS itself, such as DDL and DML
compilers, the query optimiser, the transaction processor, report generators, and the constraint enforcer. On the other
hand, a data dictionary is a data structure that stores metadata, i.e., (structured) data about data. The software
package for a stand-alone data dictionary or data repository may interact with the software modules of the DBMS,
but it is mainly used by the designers, users and administrators of a computer system for information resource
management. These systems are used to maintain information on system hardware and software configuration,
documentation, applications and users, as well as other information relevant to system administration.[2]
If a data dictionary system is used only by the designers, users, and administrators and not by the DBMS software, it
is called a passive data dictionary. Otherwise, it is called an active data dictionary or data dictionary. When a
passive data dictionary is updated, it is done so manually and independently from any changes to the DBMS (database)
structure. With an active data dictionary, the dictionary is updated first, and changes occur in the DBMS automatically
as a result.
Database users and application developers can benefit from an authoritative data dictionary document that catalogs
the organization, contents, and conventions of one or more databases.[3] This typically includes the names and
descriptions of the various tables (records or entities) and their contents (fields), plus additional details, such as the
type and length of each data element. Another important piece of information that a data dictionary can provide is
the relationship between tables. This is sometimes depicted in entity-relationship diagrams or, if using set
descriptors, by identifying in which sets database tables participate.
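As a rough illustration of what such a dictionary document records, here is a minimal sketch in Java; the table and field names are invented for the example:

```java
import java.util.List;

public class MiniDictionary {
    // One data-element entry: table, field, type, and length, as a data
    // dictionary document might record them (names here are hypothetical).
    record ColumnDef(String table, String field, String type, int length) {}

    static final List<ColumnDef> DICTIONARY = List.of(
        new ColumnDef("customer", "id",      "INTEGER", 4),
        new ColumnDef("customer", "name",    "VARCHAR", 80),
        new ColumnDef("order",    "id",      "INTEGER", 4),
        new ColumnDef("order",    "cust_id", "INTEGER", 4)  // relates order to customer
    );

    // Look up every field recorded for a given table.
    static List<ColumnDef> describe(String table) {
        return DICTIONARY.stream().filter(c -> c.table().equals(table)).toList();
    }

    public static void main(String[] args) {
        for (ColumnDef c : describe("customer")) {
            System.out.println(c.field() + " " + c.type() + "(" + c.length() + ")");
        }
    }
}
```

A real dictionary would also carry descriptions, constraints, and relationship metadata, but the essential shape, a queryable list of data-element records, is the same.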
In an active data dictionary, constraints may be placed upon the underlying data. For instance, a range may be
imposed on the value of numeric data in a data element (field), or a record in a table may be forced to participate
in a set relationship with another record type. Additionally, a distributed DBMS may have certain location
specifics described within its active data dictionary (e.g., where tables are physically located).
The data dictionary consists of record types (tables) created in the database by system-generated command files,
tailored for each supported back-end DBMS. Command files contain SQL statements for CREATE TABLE,
CREATE UNIQUE INDEX, ALTER TABLE (for referential integrity), etc., using the specific statement required by
that type of database.
Middleware
In the construction of database applications, it can be useful to introduce an additional layer of data dictionary
software, i.e. middleware, which communicates with the underlying DBMS data dictionary. Such a "high-level" data
dictionary may offer additional features and a degree of flexibility that goes beyond the limitations of the native
"low-level" data dictionary, whose primary purpose is to support the basic functions of the DBMS, not the
requirements of a typical application. For example, a high-level data dictionary can provide alternative
entity-relationship models tailored to suit different applications that share a common database.[4] Extensions to the
data dictionary also can assist in query optimization against distributed databases.[5] Additionally, DBA functions are
often automated using restructuring tools that are tightly coupled to an active data dictionary.
Software frameworks aimed at rapid application development sometimes include high-level data dictionary facilities,
which can substantially reduce the amount of programming required to build menus, forms, reports, and other
components of a database application, including the database itself. For example, PHPLens includes a PHP class
library to automate the creation of tables, indexes, and foreign key constraints portably for multiple databases.[6]
Another PHP-based data dictionary, part of the RADICORE toolkit, automatically generates program objects, scripts,
and SQL code for menus and forms with data validation and complex joins.[7] For the ASP.NET environment, Base
One's data dictionary provides cross-DBMS facilities for automated database creation, data validation, performance
enhancement (caching and index utilization), application security, and extended data types.[8] Visual DataFlex[9]
provides the ability to use DataDictionaries as class files to form a middle layer between the user interface and the
underlying database. The intent is to create standardized rules to maintain data integrity and enforce business rules
throughout one or more related applications.
References
[1] ACM, IBM Dictionary of Computing (http://portal.acm.org/citation.cfm?id=541721), 10th edition, 1993
[2] Ramez Elmasri, Shamkant B. Navathe: Fundamentals of Database Systems, 3rd ed., sect. 17.5, p. 582
[3] TechTarget, SearchSOA, What is a data dictionary? (http://searchsoa.techtarget.com/sDefinition/0,,sid26_gci211896,00.html)
[4] U.S. Patent 4774661, Database management system with active data dictionary (http://www.freepatentsonline.com/4774661.html), 19 November 1985, AT&T
[5] U.S. Patent 4769772, Automated query optimization method using both global and parallel local optimizations for materialization access planning for distributed databases (http://www.freepatentsonline.com/4769772.html), 28 February 1985, Honeywell Bull
[6] PHPLens, ADOdb Data Dictionary Library for PHP (http://phplens.com/lens/adodb/docs-datadict.htm)
[7] RADICORE, What is a Data Dictionary? (http://www.radicore.org/viewarticle.php?article_id=5)
[8] Base One International Corp., Base One Data Dictionary (http://www.boic.com/b1ddic.htm)
[9] VISUAL DATAFLEX, features (http://www.visualdataflex.com/features.asp?pageid=1030)
External links
• Yourdon, Structured Analysis Wiki, Data Dictionaries (http://yourdon.com/strucanalysis/wiki/index.php?title=Chapter_10)
Java Database Connectivity
JDBC
Type: Data Access API
JDBC is a Java-based data access technology (Java Standard Edition platform) from Oracle Corporation. This
technology is an API for the Java programming language that defines how a client may access a database. It provides
methods for querying and updating data in a database. JDBC is oriented towards relational databases. A
JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the JVM host environment.
History and implementation
Sun Microsystems released JDBC as part of JDK 1.1 on February 19, 1997. It has since formed part of the Java
Standard Edition.
The JDBC classes are contained in the Java packages java.sql[1] and javax.sql.[2]
Starting with version 3.1, JDBC has been developed under the Java Community Process. JSR 54 specifies JDBC 3.0
(included in J2SE 1.4), JSR 114 specifies the JDBC Rowset additions, and JSR 221 is the specification of JDBC 4.0
(included in Java SE 6).[3]
The latest version, JDBC 4.1, is specified by a maintenance release of JSR 221[4] and is included in Java SE 7.[5]
Functionality
JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism
for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The Driver
Manager is used as a connection factory for creating JDBC connections.
JDBC connections support creating and executing statements. These may be update statements such as SQL's
CREATE, INSERT, UPDATE and DELETE, or they may be query statements such as SELECT. Additionally, stored
procedures may be invoked through a JDBC connection. JDBC represents statements using one of the following
classes:
• Statement[6] – the statement is sent to the database server each and every time.
• PreparedStatement[7] – the statement is cached and then the execution path is pre-determined on the database server, allowing it to be executed multiple times in an efficient manner.
• CallableStatement[8] – used for executing stored procedures on the database.
Update statements such as INSERT, UPDATE and DELETE return an update count that indicates how many rows
were affected in the database. These statements do not return any other information.
Query statements return a JDBC row result set. The row result set is used to walk over the result set. Individual
columns in a row are retrieved either by name or by column number. There may be any number of rows in the result
set. The row result set has metadata that describes the names of the columns and their types.
There is an extension to the basic JDBC API in the javax.sql package.[2]
JDBC connections are often managed via a connection pool rather than obtained directly from the driver. Examples
of connection pools include BoneCP[9], C3P0[10] and DBCP.[11]
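The statement and result-set workflow described above can be sketched as follows. This is a minimal illustration: the JDBC URL, credentials, and the item table are hypothetical, and for simplicity the connection is obtained directly from DriverManager rather than from a pool.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JdbcSketch {
    // Hypothetical URL; a real application substitutes its driver's URL format.
    static final String URL = "jdbc:somedb://localhost/inventory";

    // Pure helper: render one row of an "item" result as "name: price".
    static String formatRow(String name, double price) {
        return name + ": " + price;
    }

    public static void main(String[] args) {
        // A PreparedStatement is parsed once and can be executed many times
        // with different parameter values bound to the ? marker.
        String sql = "SELECT name, price FROM item WHERE price <= ?";
        try (Connection conn = DriverManager.getConnection(URL, "user", "secret");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDouble(1, 20.0);                 // bind the parameter marker
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {                // walk the row result set
                    System.out.println(formatRow(rs.getString("name"),
                                                 rs.getDouble("price")));
                }
            }
        } catch (SQLException e) {
            // No driver or database is registered in this sketch's environment.
            System.out.println("connection unavailable: " + e.getMessage());
        }
    }
}
```

The try-with-resources blocks close the ResultSet, PreparedStatement, and Connection in reverse order, which avoids the resource leaks that plague naive JDBC code.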
JDBC drivers
JDBC drivers are client-side adapters (installed on the client machine, not on the server) that convert requests from
Java programs to a protocol that the DBMS can understand.
Types
There are commercial and free drivers available for most relational database servers. These drivers fall into one of
the following types:
• Type 1, which calls native code of the locally available ODBC driver.
• Type 2, which calls the database vendor's native library on the client side. This code then talks to the database over the network.
• Type 3, the pure-Java driver that talks with server-side middleware that then talks to the database.
• Type 4, the pure-Java driver that uses the database's native protocol.
There is also a type called the internal JDBC driver, a driver embedded with the JRE in Java-enabled SQL databases.
It is used for Java stored procedures. This does not belong to the above classification, although it would likely be
either a type 2 or type 4 driver (depending on whether the database itself is implemented in Java or not). An example
of this is the KPRB driver supplied with Oracle RDBMS. "jdbc:default:connection" is a relatively standard way of
making such a connection (at least Oracle and Apache Derby support it). The distinction here is that the JDBC client
is actually running as part of the database being accessed, so access can be made directly rather than through network
protocols.
Sources
• SQLSummit.com publishes a list of drivers, including JDBC drivers and vendors
• Oracle provides a list of some JDBC drivers and vendors[12]
• Simba Technologies ships an SDK for building custom JDBC drivers for any custom/proprietary relational data source
• RSSBus Type 4 JDBC drivers for applications, databases, and web services[13]
• DataDirect Technologies provides a comprehensive suite of fast Type 4 JDBC drivers for all major databases, which they advertise as Type 5
• IDS Software provides a Type 3 JDBC driver for concurrent access to all major databases. Supported features include resultset caching, SSL encryption, custom data sources, and dbShield
• OpenLink Software ships JDBC drivers for a variety of databases, including bridges to other data access mechanisms (e.g., ODBC, JDBC) which can provide more functionality than the targeted mechanism
• JDBaccess is a Java persistence library for MySQL and Oracle which defines major database access operations in an easily usable API above JDBC
• JNetDirect provides a suite of fully Sun J2EE certified high-performance JDBC drivers
• HSQLDB is an RDBMS with a JDBC driver and is available under a BSD license
• SchemaCrawler is an open source API that leverages JDBC and makes database metadata available as plain old Java objects (POJOs)
References
[1] http://download.oracle.com/javase/7/docs/api/java/sql/package-summary.html
[2] http://download.oracle.com/javase/7/docs/api/javax/sql/package-summary.html
[3] JDBC API Specification Version: 4.0 (http://java.sun.com/products/jdbc/download.html#corespec40).
[4] JSR-000221 JDBC API Specification 4.1 (Maintenance Release) (http://jcp.org/aboutJava/communityprocess/mrel/jsr221/index.html)
[5] http://docs.oracle.com/javase/7/docs/technotes/guides/jdbc/jdbc_41.html
[6] http://download.oracle.com/javase/7/docs/api/java/sql/Statement.html
[7] http://download.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html
[8] http://download.oracle.com/javase/7/docs/api/java/sql/CallableStatement.html
[9] http://jolbox.com
[10] http://sourceforge.net/projects/c3p0
[11] http://commons.apache.org/dbcp
[12] http://devapp.sun.com/product/jdbc/drivers
[13] http://www.rssbus.com/jdbc/
External links
• Java SE 7 (http://download.oracle.com/javase/7/docs/) This documentation has examples where the JDBC resources are not closed appropriately (swallowing primary exceptions and being able to cause NullPointerExceptions) and has code prone to SQL injection[citation needed]
• java.sql (http://download.oracle.com/javase/7/docs/api/java/sql/package-summary.html) API Javadoc documentation
• javax.sql (http://download.oracle.com/javase/7/docs/api/javax/sql/package-summary.html) API Javadoc documentation
• O/RBroker(http://www.orbroker.org)ScalaJDBCframework
• SqlTool (http://www.hsqldb.org/doc/2.0/util-guide/sqltool-chapt.html) Open source, command-line, generic
JDBC client utility. Works with any JDBC-supporting database.
• JDBC URL strings and related information for all databases (http://codeoftheday.blogspot.com/2012/12/java-database-connectivity-jdbc-url.html)
XQuery API for Java
XQJ
Developer(s): Java Community Process
Website: JSR 225: XQuery API for Java[1]
History and implementation
The XQuery API for Java was developed at the Java Community Process as JSR 225. It had some big technology
backers such as Oracle, IBM,[4][5][6][7] BEA Systems,[8] Software AG,[9] Intel, Nokia and DataDirect.
Version 1.0 of the XQuery API for Java Specification was released on June 24, 2009,[10] along with JavaDocs, a
reference implementation and a TCK (Technology Compatibility Kit) to which implementing vendors must conform.
[Figure: General architecture of how an XQJ driver is used to communicate with an XML database from Java]
XQJ classes are contained in the Java package javax.xml.xquery.[11]
Functionality
XQJ allows multiple implementations to exist and be used by the same application.
XQJ connections support creating and executing XQuery expressions. Expressions may be updating[12] and may
include full text searches.[13] XQJ represents XQuery expressions using one of the following classes:
• XQExpression[14] – the expression is sent to the XQuery processor every time.
• XQPreparedExpression[15] – the expression is cached and the execution path is pre-determined, allowing it to be executed multiple times in an efficient manner.
XQuery expressions return a result sequence of XDM items which in XQJ are represented through the
XQResultSequence[16] interface. The programmer can use an XQResultSequence[16] to walk over individual
XDM items in the result sequence. Each item in the sequence has XDM type information associated with it, such as
its node type, e.g. element(), document-node(), or an XDM atomic type such as xs:string,
xs:integer or xs:dateTime. XDM type information in XQJ can be retrieved via the XQItemType[17] interface.
Atomic XQuery items can be easily cast to Java primitives via XQItemAccessor[18] methods such as getByte()[19]
and getFloat()[20]. Also, XQuery items and sequences can be serialized to DOM Node[21], SAX
ContentHandler[22], StAX XMLStreamReader[23], and the generic IO Reader[24] and InputStream[25] classes.
Examples
Basic Example
The following example illustrates creating a connection to an XML database, submitting an XQuery expression, then
processing the results in Java. Once all of the results have been processed, the connection is closed to free up all
resources associated with it.
// Create a new connection to an XML database
XQConnection conn = vendorDataSource.getConnection("myUser", "myPassword");

// Create a reusable XQuery expression object
XQExpression expr = conn.createExpression();

// Execute an XQuery expression
XQResultSequence result = expr.executeQuery(
    "for $n in fn:collection('catalog')//item " +
    "return fn:data($n/name)");

// Process the result sequence iteratively
while (result.next()) {
    // Print the current item in the sequence
    System.out.println("Product name: " + result.getItemAsString(null));
}

// Free all resources created by the connection
conn.close();
Binding a value to an external variable
The following example illustrates how a Java value can be bound to an external variable in an XQuery expression.
Assume that the connection conn already exists.

XQExpression expr = conn.createExpression();

// The XQuery expression to be executed
String es = "declare variable $x as xs:integer external; " +
            "for $n in fn:collection('catalog')//item " +
            "where $n/price <= $x " +
            "return fn:data($n/name)";

// Bind a value (21) to an external variable with the QName x
expr.bindInt(new QName("x"), 21, null);

// Execute the XQuery expression
XQResultSequence result = expr.executeQuery(es);

// Process the result (sequence) iteratively
while (result.next()) {
    // Process the result ...
}

Default data type mapping
Mapping between Java and XQuery data types is largely flexible; however, the XQJ 1.0 specification does have
default mapping rules for mapping data types when they are not specified by the user. These mapping rules bear
great similarities to the mapping rules found in JAXB.
The following table illustrates the default mapping rules for when binding Java values to external variables in
XQuery expressions.

Default conversion rules when mapping from Java data types to XQuery data types

Java Datatype              Default XQuery Data Type(s)
boolean                    xs:boolean
byte                       xs:byte
byte[]                     xs:hexBinary
double                     xs:double
float                      xs:float
int                        xs:int
long                       xs:long
short                      xs:short
Boolean[26]                xs:boolean
Byte[27]                   xs:byte
Float[28]                  xs:float
Double[29]                 xs:double
Integer[30]                xs:int
Long[31]                   xs:long
Short[32]                  xs:short
String[33]                 xs:string
BigDecimal[34]             xs:decimal
BigInteger[35]             xs:integer
Duration[36]               xs:dayTimeDuration if the Duration object's state is xs:dayTimeDuration;
                           xs:yearMonthDuration if the Duration object's state is xs:yearMonthDuration;
                           xs:duration if the Duration object's state is xs:duration
XMLGregorianCalendar[37]   xs:date if the XMLGregorianCalendar object's state is xs:date;
                           xs:dateTime if the XMLGregorianCalendar object's state is xs:dateTime;
                           xs:gDay if the XMLGregorianCalendar object's state is xs:gDay;
                           xs:gMonth if the XMLGregorianCalendar object's state is xs:gMonth;
                           xs:gMonthDay if the XMLGregorianCalendar object's state is xs:gMonthDay;
                           xs:gYear if the XMLGregorianCalendar object's state is xs:gYear;
                           xs:gYearMonth if the XMLGregorianCalendar object's state is xs:gYearMonth;
                           xs:time if the XMLGregorianCalendar object's state is xs:time
QName[38]                  xs:QName
Document[39]               document-node(element(*, xs:untyped))
DocumentFragment[40]       document-node(element(*, xs:untyped))
Element[41]                element(*, xs:untyped)
Attr[42]                   attribute(*, xs:untypedAtomic)
Comment[43]                comment()
ProcessingInstruction[44]  processing-instruction()
Text[45]                   text()
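The scalar rows of the table above can be mirrored as a plain Java lookup. This is an illustrative sketch only; the map below is not part of the XQJ API, it simply restates the wrapper-class rules from the table:

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Map;

public class XqjDefaultMapping {
    // Default XQuery type for Java wrapper classes, mirroring the
    // XQJ 1.0 default conversion table (scalar cases only).
    static final Map<Class<?>, String> DEFAULT_XQUERY_TYPE = Map.of(
        Boolean.class,    "xs:boolean",
        Byte.class,       "xs:byte",
        Float.class,      "xs:float",
        Double.class,     "xs:double",
        Integer.class,    "xs:int",
        Long.class,       "xs:long",
        Short.class,      "xs:short",
        String.class,     "xs:string",
        BigDecimal.class, "xs:decimal",
        BigInteger.class, "xs:integer"
    );

    // Answer which XQuery type a bound value would default to.
    static String defaultTypeFor(Object value) {
        return DEFAULT_XQUERY_TYPE.getOrDefault(value.getClass(), "unknown");
    }

    public static void main(String[] args) {
        System.out.println(defaultTypeFor(21));                    // xs:int
        System.out.println(defaultTypeFor(new BigDecimal("1.5"))); // xs:decimal
    }
}
```

So in the earlier binding example, passing the int 21 without an explicit XQItemType lets the implementation fall back to xs:int, which satisfies the declared xs:integer variable by the usual numeric type promotion.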
Known implementations
Native XML databases
The following is a list of native XML databases which are known to have XQuery API for Java implementations.
• MarkLogic[46]
• eXist[47]
• BaseX[48]
• Sedna[49]
• Oracle XDB[50][51]
• Tamino[52]
• TigerLogic
Relational databases
DataDirect provides XQJ adapters for relational databases by translating XQuery code into SQL on the fly, then
converting SQL result sets into a format suitable for XQJ to process further. The following are some known
implementations:
• OracleDB(NotXDB)
• IBM DB2
• Microsoft SQL Server
• Sybase ASE
• Informix
• MySQL
• PostgreSQL
Client-side implementations
The following is a list of client-side XQuery processors which provide an XQuery API for Java interface.
• Saxon XSLT and XQuery processor
• Zorba[53]
• MXQuery
• Oracle XQuery Processor[54]
References
[1] http://jcp.org/en/jsr/detail?id=225
[2] XQuery 1.0 and XPath 2.0 Data Model (XDM) (http://www.w3.org/TR/xpath-datamodel/)
[3] Binding Java Variables (http://www.cfoster.net/articles/xqj-tutorial/binding-java-variables.xml)
[4] Querying XML: XQuery, XPath, and SQL/XML in context - Jim Melton and Stephen Buxton. ISBN 978-1558607118
[5] XQJ - XQuery Java API is Completed, Marc Van Cappellen, Zhen Hua Liu, Jim Melton and Maxim Orgiyan (http://www.sigmod.org/publications/sigmod-record/0912/p07.article.cappellen.pdf)
[6] IBM and Oracle Submit XQuery API for Java (XQJ) Java Specification Request (http://xml.coverpages.org/ni2003-06-12-b.html)
[7] An Early Look at XQuery API for Java (XQJ) - Andrew Eisenberg, IBM and Jim Melton, Oracle (http://www.sigmod.org/publications/sigmod-record/0406/JimAndrew.pdf)
[8] The BEA Streaming XQuery Processor (http://www.cfoster.net/pdf/reference/10.1.1.92.2337.pdf#page=17)
[9] XQJ Interface for Tamino Native XML Database (http://documentation.softwareag.com/webmethods/wmsuites/wmsuite8-2_ga/CentraSite/8-2-SP1_CentraSite/dg-xqj/overview.htm)
[10] JSR-000225 XQuery API for Java (Final Release) (http://jcp.org/aboutJava/communityprocess/final/jsr225/index.html)
[11] http://xqj.net/javadoc/
[12] XQuery Update Facility
[13] XQuery Full Text (http://www.w3.org/TR/xpath-full-text-10/)
[14] http://xqj.net/javadoc/javax/xml/xquery/XQExpression.html
XQueryAPIfor Java 162
[15] http://xqj.net/javadoc/javax/xml/xquery/XQPreparedExpression.html
[16] http://xqj.net/javadoc/javax/xml/xquery/XQResultSequence.html
[17] http://xqj.net/javadoc/javax/xml/xquery/XQItemType.html
[18] http://xqj.net/javadoc/javax/xml/xquery/XQItemAccessor.html
[19] http://xqj.net/javadoc/javax/xml/xquery/XQItemAccessor.html#getByte()
[20] http://xqj.net/javadoc/javax/xml/xquery/XQItemAccessor.html#getFloat()
[21] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Node.html
[22] http://download.oracle.com/javase/7/docs/api/org/xml/sax/ContentHandler.html
[23] http://download.oracle.com/javase/7/docs/api/javax/xml/stream/XMLStreamReader.html
[24] http://download.oracle.com/javase/7/docs/api/java/io/Reader.html
[25] http://download.oracle.com/javase/7/docs/api/java/io/InputStream.html
[26] http://download.oracle.com/javase/7/docs/api/java/lang/Boolean.html
[27] http://download.oracle.com/javase/7/docs/api/java/lang/Byte.html
[28] http://download.oracle.com/javase/7/docs/api/java/lang/Float.html
[29] http://download.oracle.com/javase/7/docs/api/java/lang/Double.html
[30] http://download.oracle.com/javase/7/docs/api/java/lang/Integer.html
[31] http://download.oracle.com/javase/7/docs/api/java/lang/Long.html
[32] http://download.oracle.com/javase/7/docs/api/java/lang/Short.html
[33] http://download.oracle.com/javase/7/docs/api/java/lang/String.html
[34] http://download.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
[35] http://download.oracle.com/javase/7/docs/api/java/math/BigInteger.html
[36] http://download.oracle.com/javase/7/docs/api/javax/xml/datatype/Duration.html
[37] http://download.oracle.com/javase/7/docs/api/javax/xml/datatype/XMLGregorianCalendar.html
[38] http://download.oracle.com/javase/7/docs/api/javax/xml/namespace/QName.html
[39] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Document.html
[40] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/DocumentFragment.html
[41] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Element.html
[42] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Attr.html
[43] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Comment.html
[44] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/ProcessingInstruction.html
[45] http://download.oracle.com/javase/7/docs/api/org/w3c/dom/Text.html
[46] MarkLogic XQJ API (http://xqj.net/marklogic)
[47] eXist XQJ API (http://xqj.net/exist)
[48] BaseX XQJ API (http://xqj.net/basex)
[49] Sedna XQJ API (http://xqj.net/sedna)
[50] http://www.oracle.com/technetwork/database-features/xmldb/overview/index.html
[51] Oracle XML DB Support for XQJ (http://docs.oracle.com/cd/E16655_01/appdev.121/e17604/adx_j_xqjxdb.htm#ADXDK136)
[52] Software AG - Working with the CentraSite XQJ Interface (http://documentation.softwareag.com/webmethods/wmsuites/wmsuite8-2_ga/CentraSite/8-2-SP1_CentraSite/dg-xqj/working_xqjdriver.htm)
[53] Zorba 2.5 ships with a long awaited XQJ binding, 14th June 2012 (http://www.zorba-xquery.com/html/entry/2012/06/14/Zorba_25)
[54] Oracle XML Developer's Kit (XDK) provides a standalone XQuery 1.0 processor for use by Java applications (http://docs.oracle.com/cd/E16655_01/appdev.121/e17604/adx_j_xqj.htm#ADXDK99930)
External links
• Javadoc for XQJ (http://xqj.net/javadoc/)
• XQJ Tutorial (http://www.cfoster.net/articles/xqj-tutorial/)
• Building Bridges from Java to XQuery, Charles Foster. XML Prague 2012 (http://archive.xmlprague.cz/2012/files/xmlprague-2012-proceedings.pdf#page=197) (Prezi Presentation (http://prezi.com/lviyahwtaxge/building-bridges-from-java-to-xquery/))
• Java Integration of XQuery, Hans-Jürgen Rennau. Balisage 2010 (http://www.balisage.net/Proceedings/vol5/html/Rennau01/BalisageVol5-Rennau01.html)
• Orbeon Forms using XQJ (http://wiki.orbeon.com/forms/doc/developer-guide/processors-xquery-generator#TOC-XQuery-processor-implementations)
• Spring Integration XQuery Support (https://github.com/SpringSource/spring-integration-extensions/tree/master/spring-integration-xquery)
• XQS: XQuery for Scala (sits on top of XQJ) (https://github.com/fancellu/xqs)
ODBC
In computing, ODBC (Open Database Connectivity) is a standard programming language middleware API for
accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of
database systems and operating systems; an application written using ODBC can be ported to other platforms, both
on the client and server side, with few changes to the data access code.
ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application
and the DBMS. The application uses ODBC functions through an ODBC driver manager with which it is linked,
and the driver passes the query to the DBMS. An ODBC driver can be thought of as analogous to a printer or other
driver, providing a standard set of functions for the application to use, and implementing DBMS-specific
functionality. An application that can use ODBC is referred to as "ODBC-compliant". Any ODBC-compliant
application can access any DBMS for which a driver is installed. Drivers exist for all major DBMSs, many other
data sources like address book systems and Microsoft Excel, and even for text or CSV files.
ODBC was originally developed by Microsoft during the early 1990s, and became the basis for the Call Level
Interface (CLI) standardized by SQL Access Group in the Unix and mainframe world. ODBC retained a number of
features that were removed as part of the CLI effort. Full ODBC was later ported back to those platforms, and
became a de facto standard considerably better known than CLI. The CLI remains similar to ODBC, and
applications can be ported from one platform to the other with few changes.
History
Prior to ODBC
The introduction of the mainframe-based relational database during the 1970s led to a proliferation of data access
methods. Generally these systems operated hand-in-hand with a simple command processor that allowed the user to
type in English-like commands, and receive output. The best-known examples are SEQUEL from IBM and QUEL
from the Ingres project. These systems may or may not allow other applications to access the data directly, and those
that did used a wide variety of methodologies. The introduction of SQL aimed to solve the problem of language
standardization, although substantial differences in implementation remained.
Additionally, since the SQL language had only rudimentary programming features, it was often desired to use SQL
within a program written in another language, say Fortran or C. This led to the concept of Embedded SQL, which
allowed SQL code to be "embedded" within another language. For instance, a SQL statement like SELECT * FROM
city could be inserted as text within C source code, and during compilation it would be converted into a custom
format that directly called a function within a library that would pass the statement into the SQL system. Results
returned from the statements would be interpreted back into C data formats like char * using similar library code.
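The flavor of this arrangement can be sketched in miniature with Python's built-in sqlite3 module standing in for the SQL library layer (the city table is a made-up example echoing the statement above; this is an analogy to the Embedded SQL pipeline, not Embedded SQL itself): the query travels to the SQL system as plain text, and results come back converted into host-language types.

```python
import sqlite3

# In-memory database standing in for a real SQL system (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.execute("INSERT INTO city VALUES ('Zurich', 421000), ('Basel', 173000)")

# The statement is passed to the library as text, much as the
# precompiler-generated code passed the embedded SQL to the SQL system.
rows = conn.execute("SELECT * FROM city").fetchall()

# Results are interpreted back into host-language types (str and int here),
# analogous to the char * conversions the Embedded SQL library performed.
for name, population in rows:
    print(name, population)
```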
There were a number of problems with the Embedded SQL approach. Like the different varieties of SQL, the
Embedded SQLs that used them varied widely, not only from platform to platform, but even across languages on a
single platform - a system that allowed calls into IBM's DB2 would look entirely different from one that called into
their own SQL/DS. Another key problem with the Embedded SQL concept was that the SQL code could only be
changed in the program's source code, so that even small changes to the query required considerable programmer
effort to modify. The SQL market referred to this as "static SQL", as opposed to "dynamic SQL" which could be
changed at any time - like the command-line interfaces that shipped with almost all SQL systems, or a
programming interface that left the SQL as plain text until it was called.
Early efforts
By the mid-1980s the rapid improvement in microcomputers, and especially the introduction of the graphical user
interface and data-rich application programs like Lotus 1-2-3, led to an increasing interest in using personal
computers as the client-side platform of choice in client-server computing. Under this model, large mainframes and
minicomputers would be used primarily to serve up data over local area networks to microcomputers that would
interpret, display and manipulate that data. For this model to work, a data access standard was a requirement - in
the mainframe world it was highly likely that all of the computers in a shop were from a single vendor and clients
were computer terminals talking directly to them, but in the micro world there was no such standardization and any
client might access any server using any networking system.
By the late 1980s there were a number of efforts underway to provide an abstraction layer for this purpose. Some of
these were mainframe related, designed to allow programs running on those machines to translate between the
variety of SQLs and provide a single common interface which could then be called by other mainframe or
microcomputer programs. These solutions included IBM's Distributed Relational Database Architecture (DRDA)
and Apple Computer's Data Access Language. Much more common, however, were systems that ran entirely on
microcomputers, including a complete protocol stack that included any required networking or file translation
support.
One of the early examples of such a system was Lotus Development's DataLens, initially known as Blueprint.
Blueprint, developed for 1-2-3, supported a variety of data sources, including SQL/DS, DB2, FOCUS and a variety
of similar mainframe systems, as well as microcomputer systems like dBase and the early Microsoft/Ashton-Tate
efforts that would eventually develop into Microsoft SQL Server.[1] Unlike the later ODBC, Blueprint was a purely
code-based system, lacking anything approximating a command language like SQL. Instead, programmers used
data structures to store the query information, constructing a query by linking many of these structures together.
Lotus referred to these compound structures as "query trees".[2]
Around the same time, an industry team including members from Sybase, Tandem Computers and Microsoft were
working on a standardized dynamic SQL concept. Much of the system was based on Sybase's DB-Library system,
with the Sybase-specific sections removed and several additions to support other platforms.[3] DB-Library was
aided by an industry-wide move from library systems that were tightly linked to a particular language, to library
systems that were provided by the operating system and required the languages on that platform to conform to its
standards. This meant that a single library could be used with (potentially) any programming language on a given
platform.
The first draft of the Microsoft Data Access API was published in April 1989, about the same time as Lotus'
announcement of Blueprint.[4] In spite of Blueprint's great lead - it was running when MSDA was still a paper
project - Lotus eventually joined the MSDA efforts as it became clear that SQL would become the de facto
database standard.[2] After considerable industry input, in the summer of 1989 the standard became SQL
Connectivity, or SQLC for short.[5]
SAG and CLI
In 1988 a number of vendors, mostly from the Unix and database communities, formed the SQL Access Group
(SAG) in an effort to produce a single basic standard for the SQL language. At the first meeting there was
considerable debate over whether or not the effort should work solely on the SQL language itself, or attempt a
wider standardization which included a dynamic SQL language-embedding system as well, what they called a Call
Level Interface (CLI).[6] While attending the meeting with an early draft of what was then still known as MS Data
Access, Kyle Geiger of Microsoft invited Jeff Balboni and Larry Barnes of Digital Equipment Corporation (DEC)
to join the SQLC meetings as well. SQLC was a potential solution to the call for the CLI, which was being led by
DEC.
The new SQLC "gang of four", MS, Lotus, DEC and Sybase, brought an updated version of SQLC to the next SAG
meeting in June 1990.[7] The SAG responded by opening the standard effort to any competing design, but of the
many proposals, only Oracle Corp had a system that presented serious competition. In the end, SQLC won the
votes and became the draft standard, but only after large portions of the API were removed - the standards
document was trimmed from 120 pages to 50 during this time. It was also during this period that the name Call
Level Interface was formally adopted.[7] In 1995 SQL/CLI became part of the international SQL standard, ISO/IEC
9075-3.[8] The SAG itself was taken over by the X/Open group in 1996, and, over time, became part of The Open
Group's Common Application Environment.
MS continued working with the original SQLC standard, retaining many of the advanced features that were removed
from the CLI version. These included features like scrollable cursors, and metadata information queries. The
commands in the API were split into groups; the Core group was identical to the CLI, the Level 1 extensions were
commands that would be easy to implement in drivers, while Level 2 commands contained the more advanced
features like cursors. A proposed standard was released in December 1991, and industry input was gathered and
worked into the system through 1992, resulting in yet another name change to ODBC.[9]
JET and ODBC
During this time, Microsoft was in the midst of developing their Jet database system. Jet combined three primary
subsystems: an ISAM-based database engine (also known as "Jet", confusingly), a C-based interface allowing
applications to access that data, and a selection of driver DLLs that allowed the same C interface to redirect input
and output to other ISAM-based databases, like Paradox and xBase. Jet allowed programmers to use a single set of
calls to access common microcomputer databases in a fashion similar to Blueprint (by this point known as
DataLens).
However, Jet did not use SQL; like DataLens, the interface was in C and consisted of data structures and function
calls.
The SAG standardization efforts presented an opportunity for Microsoft to adapt their Jet system to the new CLI
standard. This would not only make Windows a premier platform for CLI development, but also allow users to use
SQL to access both Jet and other databases as well. What was missing was the SQL parser that could convert those
calls from their text form into the C-interface used in Jet. To solve this, MS partnered with PageAhead Software to
use their existing query processor, "SIMBA". SIMBA was used as a parser above Jet's C library, turning Jet into an
SQL database. And because Jet could forward those C-based calls to other databases, this also allowed SIMBA to
query other systems. Microsoft included drivers for Excel to turn its spreadsheet documents into SQL-accessible
database tables.
Release and continued development
ODBC 1.0 was released in September 1992. At the time, there was little direct support for SQL databases (as
opposed to ISAM), and early drivers were noted for poor performance. Some of this was unavoidable due to the
path that the calls took through the Jet-based stack; ODBC calls to SQL databases were first converted from
SIMBA's SQL dialect to Jet's internal C-based format, then passed to a driver for conversion back into SQL calls
for the database. Digital Equipment and Oracle both contracted Simba to develop drivers for their databases as
well.[10]
Meanwhile the CLI standard effort dragged on, and it was not until March 1995 that the definitive version was
finalized. By this time Microsoft had already granted Visigenic Software a source code license to develop ODBC
on non-Windows platforms. Visigenic ported ODBC to a wide variety of Unix platforms, where ODBC quickly
became the de facto standard.[11] "Real" CLI is rare today. The two systems remain similar, and many applications
can be ported from ODBC to CLI with few or no changes.[12]
Over time, database vendors took over the driver interfaces and provided direct links to their products. Skipping the
intermediate conversions to and from Jet or similar wrappers often resulted in higher performance. However, by
this time Microsoft had changed focus to their OLE DB concept, which provided direct access to a wider variety of
data sources, from address books to text files. Several new systems followed which further turned their attention
from ODBC, including DAO, ADO and ADO.NET, which interacted more or less with ODBC over their lifetimes.
As Microsoft turned its attention away from working directly on ODBC, the Unix world was increasingly
embracing it. This was propelled by two changes within the market: the introduction of GUIs like GNOME, which
created a need to access these sources in non-text form, and the emergence of open-source database systems like
PostgreSQL and MySQL, initially under Unix. The later adoption of ODBC by Apple for Mac OS X 10.4 using the
standard Unix-side iODBC package further cemented ODBC as the standard for cross-platform data access.
Sun Microsystems used the ODBC system as the basis for their own open standard, JDBC. In most ways, JDBC can
be considered a version of ODBC for the Java programming language as opposed to C. JDBC-to-ODBC "bridges"
allow JDBC programs to access data sources through ODBC drivers on platforms lacking a native JDBC driver,
although these are now relatively rare.
ODBC today
ODBC remains largely universal today, with drivers available for most platforms and most databases. It is not
uncommon to find ODBC drivers for database engines that are meant to be embedded, like SQLite, as a way to
allow existing tools to act as front-ends to these engines for testing and debugging.[13]
However, the rise of thin client computing using HTML as an intermediate format has reduced the need for ODBC.
Many web development platforms contain direct links to target databases - MySQL being particularly common. In
these scenarios, there is no direct client-side access nor multiple client software systems to support; everything
goes through the programmer-supplied HTML application. The virtualization that ODBC offers is no longer a
strong requirement, and development of ODBC is no longer as active as it once was.
Version history
Version history:
• 1.0: released in September 1992
• 2.0: ca 1994
• 2.5
• 3.0: ca 1995; John Goodson of Intersolv and Frank Pellow and Paul Cotton of IBM provided significant input to
ODBC 3.0[14]
• 3.5: ca 1997
• 3.8: ca 2009, with Windows 7
Drivers and Managers
Drivers
ODBC is based on the device driver model, where the driver encapsulates the logic needed to convert a standard set
of commands and functions into the specific calls required by the underlying system. For instance, a printer driver
presents a standard set of printing commands, the API, to applications using the printing system. Calls made to those
APIs are converted by the driver into the format used by the actual hardware, say PostScript or PCL.
In the case of ODBC, the drivers encapsulate a number of functions that can be broken down into several broad
categories. One set of functions is primarily concerned with finding, connecting to and disconnecting from the
DBMS that the driver talks to. A second set is used to send SQL commands from the ODBC system to the DBMS,
converting or interpreting any commands that are not supported internally. For instance, a DBMS that does not
support cursors can emulate this functionality in the driver. Finally, another set of commands, mostly used
internally, is used to convert data from the DBMS's internal formats to a set of standardized ODBC formats, which
are based on the C language formats.
An ODBC driver enables an ODBC-compliant application to use a data source, normally a DBMS. Some
non-DBMS drivers exist, for such data sources as CSV files, by implementing a small DBMS inside the driver
itself. ODBC drivers exist for most DBMSs, including Oracle, PostgreSQL, MySQL, Microsoft SQL Server (but
not for the Compact aka CE edition), Sybase ASE, and DB2. Because different technologies have different
capabilities, most ODBC drivers do not implement all functionality defined in the ODBC standard. Some drivers
offer extra functionality not defined by the standard.
Driver Manager
Device drivers are normally enumerated, set up and managed by a separate Manager layer, which may provide
additional functionality. For instance, printing systems often include spooling functionality on top of the drivers,
providing print spooling for any supported printer.
In ODBC the Driver Manager (DM) provides these features. The DM can enumerate the installed drivers and
present this as a list, often in a GUI-based form.
But more important to the operation of the ODBC system is the DM's concept of Data Source Names, or DSNs.
DSNs collect additional information needed to connect to a particular data source, as opposed to the DBMS itself.
For instance, the same MySQL driver can be used to connect to any MySQL server, but the connection information
to connect to a local private server is different than the information needed to connect to an internet-hosted public
server. The DSN stores this information in a standardized format, and the DM provides this to the driver during
connection requests. The DM also includes functionality to present a list of DSNs using human readable names, and
to select them at run-time to connect to different resources.
The DM also includes the ability to save partially complete DSNs, with code and logic to ask the user for any
missing information at runtime. For instance, a DSN can be created without a required password. When an ODBC
application attempts to connect to the DBMS using this DSN, the system will pause and ask the user to provide the
password before continuing. This frees the application developer from having to create this sort of code, as well as
having to know which questions to ask. All of this is included in the driver and the DSNs.
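The division of labor between the Driver Manager, DSNs, and drivers can be sketched as a toy model. To be clear, every class and attribute name below is invented for illustration; real ODBC is a C API with functions such as SQLConnect, not this Python model.

```python
# Toy sketch of the ODBC driver manager / DSN architecture. All names are
# invented for illustration only.

class FakeMySQLDriver:
    """Stands in for a DBMS-specific ODBC driver."""
    def connect(self, host, user, password):
        if password is None:
            # A real DM would pause here and prompt the user.
            raise ValueError("password required")
        return f"connected to mysql://{user}@{host}"

class DriverManager:
    def __init__(self):
        self.drivers = {}  # driver name -> driver object
        self.dsns = {}     # DSN name -> stored connection attributes

    def register_driver(self, name, driver):
        self.drivers[name] = driver

    def add_dsn(self, name, driver, **attrs):
        # A DSN names one data source and stores how to reach it; it may
        # be left partially complete (e.g. no password).
        self.dsns[name] = {"driver": driver, **attrs}

    def connect(self, dsn_name, **extra):
        # Attributes supplied at connect time fill in anything the DSN
        # left incomplete.
        attrs = {**self.dsns[dsn_name], **extra}
        driver = self.drivers[attrs.pop("driver")]
        return driver.connect(**attrs)

dm = DriverManager()
dm.register_driver("mysql", FakeMySQLDriver())
# The same driver serves two DSNs: a local private server and a public one.
dm.add_dsn("local", driver="mysql", host="localhost", user="app", password=None)
dm.add_dsn("public", driver="mysql", host="db.example.com", user="ro", password="s3cret")

public = dm.connect("public")
local = dm.connect("local", password="hunter2")  # missing attribute supplied at run time
```

The point of the sketch is the indirection: the application names a DSN, the DM resolves it and picks the driver, and only the driver knows the DBMS-specific details.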
Bridging configurations
A bridge is a special kind of driver: a driver that uses another driver-based technology.
JDBC-ODBC bridges
A JDBC-ODBC bridge consists of a JDBC driver which employs an ODBC driver to connect to a target database.
This driver translates JDBC method calls into ODBC function calls. Programmers usually use such a bridge when a
particular database lacks a JDBC driver. Sun Microsystems included one such bridge in the JVM, but viewed it as a
stop-gap measure while few JDBC drivers existed. Sun never intended its bridge for production environments, and
generally recommends against its use. As of 2008[15] independent data-access vendors deliver JDBC-ODBC
bridges which support current standards for both mechanisms, and which far outperform the JVM
built-in.[citation needed]
ODBC-JDBC bridges
An ODBC-JDBC bridge consists of an ODBC driver which uses the services of a JDBC driver to connect to a
database. This driver translates ODBC function-calls into JDBC method-calls. Programmers usually use such a
bridge when they lack an ODBC driver for a particular database but have access to a JDBC driver.
OLE DB
Microsoft provides an OLE DB-ODBC bridge for simplifying development in COM-aware languages (e.g. Visual
Basic). This bridge forms part of the MDAC system component bundle, together with other database drivers.
References
Citations
[1] Evan McGlinn, "Blueprint Lets 1-2-3 Access Outside Data" (http://books.google.ca/books?id=6D4EAAAAMBAJ), InfoWorld, 4 April
1988, p. 1, 69
[2] Geiger 1995, p. 65.
[3] Geiger 1995, pp. 86-87.
[4] Geiger 1995, p. 56.
[5] Geiger 1995, p. 106.
[6] Geiger 1995, p. 165.
[7] Geiger 1995, pp. 186-187.
[8] ISO/IEC 9075-3 -- Information technology -- Database languages -- SQL -- Part 3: Call-Level Interface (SQL/CLI)
[9] Geiger 1995, p. 203.
[10] "Our History" (http://www.simba.com/simba-history.htm), Simba Technologies
[11] Roger Sippl, "SQL Access Group's Call-Level Interface" (http://www.drdobbs.com/sql-access-groups-call-level-interface/184410032),
Dr. Dobb's, 1 February 1996
[12] "Similarities and differences between ODBC and CLI"(http://publib.boulder.ibm.com/infocenter/iisclzos/v9r5/index.jsp?topic=/
com.ibm.swg.im.iis.fed.classic.clientsref.doc/topics/iiyfcodbcclisimdiff.html), InfoSphere Classic documentation, IBM, 26 September
2008
[13] Christian Werner, "SQLite ODBC Driver" (http://www.ch-werner.de/sqliteodbc/)
[14] Microsoft Corporation. Microsoft ODBC 3.0 Programmer's Reference and SDK Guide, Volume 1. Microsoft Press. February 1997.
(ISBN 13: 9781572315167)
[15] http://en.wikipedia.org/w/index.php?title=ODBC&action=edit
Bibliography
• Kyle Geiger, "Inside ODBC" (http://books.google.ca/books?id=G-ZQAAAAMAAJ&), Microsoft Press, 1995
External links
• Microsoft ODBC Overview (http://support.microsoft.com/kb/110093)
• List of ODBC Drivers at databasedrivers.com (http://www.databasedrivers.com/odbc/)
• List of ODBC Drivers at SQLSummit.com (http://www.SQLSummit.com/ODBCVend.htm)
• OS400 and i5OS ODBC Administration (http://publib.boulder.ibm.com/infocenter/iseries/v5r3/topic/rzaii/rzaiiodbcadm.htm)
• Presentation slides from www.roth.net (http://www.roth.net/perl/odbc/conf/sld002.htm)
• Early ODBC White Paper (http://www.openlinksw.com/info/docs/odbcwhp/tableof.htm)
• Microsoft ODBC & Data Access APIs History Article (http://blogs.msdn.com/data/archive/2006/12/05/data-access-api-of-the-day-part-i.aspx)
Query language
Query languages are computer languages used to make queries into databases and information systems.
Broadly, query languages can be classified according to whether they are database query languages or information
retrieval query languages. The difference is that a database query language attempts to give factual answers to
factual questions, while an information retrieval query language attempts to find documents containing information
that is relevant to an area of inquiry.
Examples include:
• .QL is a proprietary object-oriented query language for querying relational databases; successor of Datalog;
• PL/SQL is Oracle Corporation's procedural extension language for SQL and the Oracle relational database;
• Contextual Query Language (CQL) is a formal language for representing queries to information retrieval systems
such as web indexes or bibliographic catalogues;
• CQLF (CODASYL Query Language, Flat) is a query language for CODASYL-type databases;
• Concept-Oriented Query Language (COQL) is used in the concept-oriented model (COM). It is based on a novel
data modeling construct, concept, and uses such operations as projection and de-projection for multi-dimensional
analysis, analytical operations and inference;
• DMX is a query language for Data Mining models;
• Datalog is a query language for deductive databases;
• F-logic is a declarative object-oriented language for deductive databases and knowledge representation;
• Gellish English is a language that can be used for queries in Gellish English Databases, for dialogues (requests
and responses) as well as for information modeling and knowledge modeling;[1]
• HTSQL is a query language that translates HTTP queries to SQL;
• ISBL is a query language for PRTV, one of the earliest relational database management systems;
• LINQ query-expressions is a way to query various data sources from .NET languages;
• LDAP is an application protocol for querying and modifying directory services running over TCP/IP;
• MQL is a cheminformatics query language for a substructure search allowing beside nominal properties also
numerical properties;
• MDX is a query language for OLAP databases;
• OQL is Object Query Language;
• OCL (Object Constraint Language). Despite its name, OCL is also an object query language and an OMG
standard;
• OPath, intended for use in querying WinFS stores;
• OttoQL, intended for querying tables, XML, and databases;
• Poliqarp Query Language is a special query language designed to analyze annotated text. Used in the Poliqarp
search engine;
• QUEL is a relational database access language, similar in most ways to SQL;
• RDQL is an RDF query language;
• SMARTS is the cheminformatics standard for a substructure search;
• SPARQL is a query language for RDF graphs;
• SPL is a search language for machine-generated big data, based upon Unix piping and SQL;
• SQL is a well known query language and Data Manipulation Language for relational databases;
• SuprTool is a proprietary query language for SuprTool, a database access program used for accessing data in
Image/SQL (formerly TurboIMAGE) and Oracle databases;
• TMQL (Topic Map Query Language) is a query language for Topic Maps;
• Tutorial D is a query language for truly relational database management systems (TRDBMS);
• XQuery is a query language for XML data sources;
• XPath is a declarative language for navigating XML documents;
• XSPARQL is an integrated query language combining XQuery with SPARQL to query both XML and RDF data
sources at once;
• YQL is an SQL-like query language created by Yahoo!
References
[1] http://gellish.wiki.sourceforge.net/Querying+a+Gellish+English+database
Query optimization
Query optimization is a function of many relational database management systems. The query optimizer attempts
to determine the most efficient way to execute a given query by considering the possible query plans.
Generally, the query optimizer cannot be accessed directly by users: once queries are submitted to the database
server and parsed by the parser, they are passed to the query optimizer, where optimization occurs. However, some
database engines allow guiding the query optimizer with hints.
A query is a request for information from a database. It can be as simple as "finding the address of a person with
SS# 123-45-6789," or more complex like "finding the average salary of all the employed married men in California
between the ages 30 to 39, that earn less than their wives." Query results are generated by accessing relevant
database data and manipulating it in a way that yields the requested information. Since database structures are
complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected
from a database by accessing it in different ways, through different data structures, and in different orders. Each
different way typically requires different processing time. Processing times of the same query may have large
variance, from a fraction of a second to hours, depending on the way selected. The purpose of query optimization,
which is an automated process, is to find the way to process a given query in minimum time. The large possible
variance in time justifies performing query optimization, though finding the exact optimal way to execute a query,
among all possibilities, is typically very complex, time consuming by itself, maybe too costly, and often practically
impossible. Thus query optimization typically tries to approximate the optimum by comparing several
common-sense alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate
much from the best possible result.
General considerations
There is a trade-off between the amount of time spent figuring out the best query plan and the quality of the choice;
the optimizer may not choose the best answer on its own. Different database management systems balance these
two factors in different ways. Cost-based query optimizers evaluate the resource footprint of various query plans
and use this as the basis for plan selection. These assign an estimated "cost" to each possible query plan, and
choose the plan with the smallest cost. Costs are used to estimate the runtime cost of evaluating the query, in terms
of the number of I/O operations required, CPU path length, amount of disk buffer space, disk storage service time,
interconnect usage between units of parallelism, and other factors determined from the data dictionary. The set of
query plans examined is formed by examining the possible access paths (e.g., primary index access, secondary
index access, full file scan) and various relational table join techniques (e.g., merge join, hash join, product join).
The search space can become quite large depending on the complexity of the SQL query. There are two types of
optimization: logical optimization, which generates a sequence of relational algebra expressions to solve the query,
and physical optimization, which determines the means of carrying out each operation.
Implementation
Most query optimizers represent query plans as a tree of "plan nodes". A plan node encapsulates a single operation
that is required to execute the query. The nodes are arranged as a tree, in which intermediate results flow from the
bottom of the tree to the top. Each node has zero or more child nodes—those are nodes whose output is fed as input
to the parent node. For example, a join node will have two child nodes, which represent the two join operands,
whereas a sort node would have a single child node (the input to be sorted). The leaves of the tree are nodes which
produce results by scanning the disk, for example by performing an index scan or a sequential scan.
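A toy sketch of such a plan tree follows; the node classes and the two tiny tables are invented for illustration, and a real engine's leaf nodes would scan disk pages rather than Python lists. Intermediate results flow upward: the scan leaves feed a join node, whose output feeds a sort node at the root.

```python
# Minimal plan-node tree: leaves produce rows, inner nodes consume their
# children's output. All classes and data here are illustrative only.

class SeqScan:
    """Leaf node: produces results by scanning a (here, in-memory) table."""
    def __init__(self, rows):
        self.rows = rows
    def execute(self):
        return list(self.rows)

class NestedLoopJoin:
    """Join node with two child nodes, matching rows on a key position."""
    def __init__(self, left, right, key=0):
        self.left, self.right, self.key = left, right, key
    def execute(self):
        out = []
        for l in self.left.execute():        # outer relation
            for r in self.right.execute():   # inner relation
                if l[self.key] == r[self.key]:
                    out.append(l + r[1:])
        return out

class Sort:
    """Single-child node: sorts the input fed to it."""
    def __init__(self, child):
        self.child = child
    def execute(self):
        return sorted(self.child.execute())

# Build the tree and evaluate it bottom-up.
emp = SeqScan([(1, "Ada"), (2, "Grace")])
dept = SeqScan([(1, "R&D"), (2, "Ops")])
plan = Sort(NestedLoopJoin(emp, dept))
result = plan.execute()
```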
Join ordering
The performance of a query plan is determined largely by the order in which the tables are joined. For example,
when joining 3 tables A, B, C of size 10 rows, 10,000 rows, and 1,000,000 rows, respectively, a query plan that
joins B and C first can take several orders of magnitude more time to execute than one that joins A and C first.
Most query optimizers determine join order via a dynamic programming algorithm pioneered by IBM's System R
database project.[citation needed] This algorithm works in stages:
1. First, all ways to access each relation in the query are computed. Every relation in the query can be accessed via
a sequential scan. If there is an index on a relation that can be used to answer a predicate in the query, an index
scan can also be used. For each relation, the optimizer records the cheapest way to scan the relation, as well as
the cheapest way to scan the relation that produces records in a particular sorted order.
2. The optimizer then considers combining each pair of relations for which a join condition exists. For each pair,
the optimizer will consider the available join algorithms implemented by the DBMS. It will preserve the
cheapest way to join each pair of relations, in addition to the cheapest way to join each pair of relations that
produces its output according to a particular sort order.
3. Then all three-relation query plans are computed, by joining each two-relation plan produced by the previous
phase with the remaining relations in the query.
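Under some simplifying assumptions, the scheme above can be sketched in a few lines. The base-table cardinalities, the single flat join selectivity, and the crude cost model (cost of a join = product of input cardinalities, plus the cost of producing each input) are all invented for illustration; every pair of relations is assumed joinable, and like the original System R optimizer this version grows only left-deep plans, adding one base relation at a time. A real optimizer would also track interesting orders and choose among join algorithms.

```python
from itertools import combinations

# Toy Selinger-style dynamic programming over join orders. All numbers and
# the cost model are invented for illustration.
card = {"A": 10, "B": 10_000, "C": 1_000_000}
selectivity = 0.001

# Maps a frozenset of relation names to (cost, output cardinality, plan).
best = {}
for rel, n in card.items():
    best[frozenset([rel])] = (n, n, rel)  # stage 1: access path = full scan

relations = list(card)
for size in range(2, len(relations) + 1):  # later stages: grow the plans
    for subset in map(frozenset, combinations(relations, size)):
        candidates = []
        for right in subset:  # left-deep: 'right' is always a base relation
            rest = subset - {right}
            lcost, lcard, lplan = best[rest]
            rcost, rcard, rplan = best[frozenset([right])]
            out_card = lcard * rcard * selectivity
            cost = lcost + rcost + lcard * rcard
            candidates.append((cost, out_card, f"({lplan} JOIN {rplan})"))
        best[subset] = min(candidates)  # keep only the cheapest plan per subset

cost, out_card, plan = best[frozenset(relations)]
```

With these toy numbers the cheapest full plan joins the small tables first, ((A JOIN B) JOIN C), echoing the A/B/C example above: joining B and C first would cost orders of magnitude more.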
In this manner, a query plan is eventually produced that joins all the relations in the query. Note that the algorithm
keeps track of the sort order of the result set produced by a query plan, also called an interesting order. During
dynamic programming, one query plan is considered to beat another query plan that produces the same result, only
if they produce the same sort order. This is done for two reasons. First, a particular sort order can avoid a
redundant sort operation later on in processing the query. Second, a particular sort order can speed up a subsequent
join because it clusters the data in a particular way.
Historically, System-R derived query optimizers would often only consider left-deep query plans, which first join two base tables together, then join the intermediate result with another base table, and so on. This heuristic reduces the number of plans that need to be considered (n! instead of 4^n), but may result in not considering the optimal query plan. This heuristic is drawn from the observation that join algorithms such as nested loops only require a single tuple (aka row) of the outer relation at a time. Therefore, a left-deep query plan means that fewer tuples need to be held in memory at any time: the outer relation's join plan need only be executed until a single tuple is produced, and then the inner base relation can be scanned (this technique is called "pipelining").
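The pipelining observation can be sketched with Python generators standing in for plan operators (relation contents are made up for illustration): the outer side of each nested-loop join yields a single tuple at a time, so a left-deep plan never materializes its intermediate results.

```python
def scan(rows):
    # A base-table scan operator: yields one tuple at a time.
    for row in rows:
        yield row

def nested_loop_join(outer, inner_rows, pred):
    # Pull one outer tuple, then rescan the inner base relation for matches.
    for o in outer:
        for i in scan(inner_rows):
            if pred(o, i):
                yield {**o, **i}

r = [{"id": 1}, {"id": 2}]
s = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
t = [{"x": "a", "y": 10}]

# Left-deep plan: (R join S) join T. The (R join S) result is consumed one
# tuple at a time by the outer loop of the second join, never stored whole.
result = list(nested_loop_join(
    nested_loop_join(scan(r), s, lambda o, i: o["id"] == i["id"]),
    t,
    lambda o, i: o["x"] == i["x"],
))
print(result)
```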
Subsequent query optimizers have expanded this plan space to consider "bushy" query plans, where both operands to a join operator could be intermediate results from other joins. Such bushy plans are especially important in parallel computers because they allow different portions of the plan to be evaluated independently.
Query planning for nested SQL queries
A SQL query to a modern relational DBMS does more than just selections and joins. In particular, SQL queries often nest several layers of SPJ blocks (Select-Project-Join), by means of group by, exists, and not exists operators. In some cases such nested SQL queries can be flattened into a select-project-join query, but not always. Query plans for nested SQL queries can also be chosen using the same dynamic programming algorithm as used for join ordering, but this can lead to an enormous escalation in query optimization time. So some database management systems use an alternative rule-based approach that uses a query graph model.
Cost estimation
One of the hardest problems in query optimization is to accurately estimate the costs of alternative query plans. Optimizers cost query plans using a mathematical model of query execution costs that relies heavily on estimates of the cardinality, or number of tuples, flowing through each edge in a query plan. Cardinality estimation in turn depends on estimates of the selection factor of predicates in the query. Traditionally, database systems estimate selectivities through fairly detailed statistics on the distribution of values in each column, such as histograms.
This technique works well for estimation of selectivities of individual predicates. However, many queries have conjunctions of predicates such as select count(*) from R where R.make='Honda' and R.model='Accord'. Query predicates are often highly correlated (for example, model='Accord' implies make='Honda'), and it is very hard to estimate the selectivity of the conjunct in general. Poor cardinality estimates and uncaught correlation are among the main reasons why query optimizers pick poor query plans. This is one reason why a database administrator should regularly update the database statistics, especially after major data loads/unloads.
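The independence assumption behind multiplying per-predicate selectivities can be seen going wrong on exactly the Honda/Accord case above. The table contents below are invented so that model fully determines make (perfectly correlated predicates).

```python
from collections import Counter

# Made-up car table: every Accord is a Honda, so the predicates are correlated.
rows = [("Honda", "Accord")] * 30 + [("Honda", "Civic")] * 30 + [("Toyota", "Camry")] * 40
n = len(rows)

# Per-column statistics, roughly what a database keeps (frequency histograms).
make_hist = Counter(make for make, _ in rows)
model_hist = Counter(model for _, model in rows)

sel_make = make_hist["Honda"] / n      # 0.6
sel_model = model_hist["Accord"] / n   # 0.3

# Independence assumption: multiply the individual selectivities.
estimated = sel_make * sel_model * n   # 0.6 * 0.3 * 100 = 18 rows
actual = sum(1 for make, model in rows if make == "Honda" and model == "Accord")
print(estimated, actual)               # estimates 18 rows; the true answer is 30
```

The optimizer's estimate (18 rows) undercounts the true result (30 rows) because the correlation is invisible to single-column statistics.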
References
• Chaudhuri, Surajit (1998). "An Overview of Query Optimization in Relational Systems" [2]. Proceedings of the ACM Symposium on Principles of Database Systems. pp. 34–43. doi:10.1145/275487.275492 [3].
• Ioannidis, Yannis (March 1996). "Query optimization" [4]. ACM Computing Surveys 28 (1): 121–123. doi:10.1145/234313.234367 [5].
• Selinger, P. G.; Astrahan, M. M.; Chamberlin, D. D.; Lorie, R. A.; Price, T. G. (1979). "Access Path Selection in a Relational Database Management System". Proceedings of the 1979 ACM SIGMOD International Conference on Management of Data. pp. 23–34. doi:10.1145/582095.582099 [6]. ISBN 089791001X.
References
[1] http://toolserver.org/%7Edispenser/cgi-bin/dab_solver.py?page=Query_optimization&editintro=Template:Disambiguation_needed/
editintro&client=Template:Dn
[2] http://citeseer.ist.psu.edu/chaudhuri98overview.html
[3] http://dx.doi.org/10.1145%2F275487.275492
[4] http://citeseer.ist.psu.edu/487912.html
[5] http://dx.doi.org/10.1145%2F234313.234367
[6] http://dx.doi.org/10.1145%2F582095.582099
Query plan
A query plan (or query execution plan) is an ordered set of steps used to access data in a SQL relational database management system. This is a specific case of the relational model concept of access plans.
Since SQL is declarative, there are typically a large number of alternative ways to execute a given query, with widely varying performance. When a query is submitted to the database, the query optimizer evaluates some of the different, correct possible plans for executing the query and returns what it considers the best alternative. Because query optimizers are imperfect, database users and administrators sometimes need to manually examine and tune the plans produced by the optimizer to get better performance.
Generating query plans
A given database management system may offer one or more mechanisms for returning the plan for a given query. Some packages feature tools which will generate a graphical representation of a query plan. Other tools allow a special mode to be set on the connection to cause the DBMS to return a textual description of the query plan. Another mechanism for retrieving the query plan involves querying a virtual database table after executing the query to be examined. In Oracle, for instance, this can be achieved using the EXPLAIN PLAN statement.
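As a runnable stand-in for Oracle's EXPLAIN PLAN, SQLite exposes the same idea through its EXPLAIN QUERY PLAN statement, reachable from Python's built-in sqlite3 module. The table names below are made up and loosely mirror the example later in this article; the exact wording of the plan text varies by SQLite version.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, contact_id INTEGER)")
con.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, last_name TEXT)")

# EXPLAIN QUERY PLAN returns the plan as result rows instead of running the query.
plan_rows = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM employee AS e
    JOIN contact AS c ON e.contact_id = c.id
    ORDER BY c.last_name
""").fetchall()

details = [row[3] for row in plan_rows]  # the human-readable plan step text
for d in details:
    print(d)
```

Typical output shows a scan of one table and an index search on the other, much like the textual plans discussed below.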
Graphical plans
The SQL Server Management Studio tool which ships with Microsoft SQL Server, for example, shows this graphical plan when executing this two-table join against a sample database:

SELECT *
FROM HumanResources.Employee AS e
INNER JOIN Person.Contact AS c
ON e.ContactID = c.ContactID
ORDER BY c.LastName

The UI allows exploration of various attributes of the operators involved in the query plan, including the operator type, the number of rows each operator consumes or produces, and the expected cost of each operator's work.

Textual plans

The textual plan for the same query:

 |--Sort(ORDER BY:([c].[LastName] ASC))
      |--Nested Loops(Inner Join, OUTER REFERENCES:([e].[ContactID], [Expr1004]) WITH UNORDERED PREFETCH)
           |--Clustered Index Scan(OBJECT:([AdventureWorks].[HumanResources].[Employee].[PK_Employee_EmployeeID] AS [e]))
           |--Clustered Index Seek(OBJECT:([AdventureWorks].[Person].[Contact].[PK_Contact_ContactID] AS [c]), SEEK:([c].[ContactID]=[AdventureWorks].[HumanResources].[Employee].[ContactID] as [e].[ContactID]) ORDERED FORWARD)
It indicates that the query engine will do a scan over the primary key index on the Employee table and a matching seek through the primary key index (the ContactID column) on the Contact table to find matching rows. The resulting rows from each side will be shown to a nested loops join operator, sorted, then returned as the result set to the connection.
In order to tune the query, the user must understand the different operators that the database may use, and which ones might be more efficient than others while still providing semantically correct query results.
Database tuning
Reviewing the query plan can present opportunities for new indexes or changes to existing indexes. It can also show
that the database is not properly taking advantage of existing indexes (see query optimizer).
Query tuning
The query optimizer will not always choose the best query plan for a given query. In some databases the query plan
can be reviewed, problems found, and then the query optimizer given hints on how to improve it. In other databases
alternatives to express the same query (other queries that return the same results) can be tried. Some query tools can
generate embedded hints in the query, for use by the optimizer.
Some databases, like Oracle, provide a plan table for query tuning. This plan table will return the cost and time for executing a query. In Oracle there are two optimization techniques:
1. CBO or Cost-Based Optimization
2. RBO or Rule-Based Optimization
The RBO is slowly being deprecated. For the CBO to be used, all the tables referenced by the query must be analyzed. To analyze a table, the DBMS_STATS package can be used.
Other methods for query optimization include:
1. SQL Trace
2. Oracle Trace
3. TKPROF
• Video tutorial on how to perform SQL performance tuning with reference to Oracle [1]
References
[1] http://seeingwithc.org/sqltuning.html
Functions
Database administration and automation
Database administration is the function of managing and maintaining database management systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft SQL Server needs ongoing management. As such, corporations that use DBMS software often hire specialized IT (information technology) personnel called database administrators or DBAs.
DBA responsibilities
• Installation, configuration and upgrading of Database server software and related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Setup and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give consultation.
• Database recovery.
Types of database administration
There are three types of DBAs:
1. Systems DBAs (also referred to as physical DBAs, operations DBAs or production support DBAs): focus on the physical aspects of database administration such as DBMS installation, configuration, patching, upgrades, backups, restores, refreshes, performance optimization, maintenance and disaster recovery.
2. Development DBAs: focus on the logical and development aspects of database administration such as data model design and maintenance, DDL (data definition language) generation, SQL writing and tuning, coding stored procedures, collaborating with developers to help choose the most appropriate DBMS feature/functionality and other pre-production activities.
3. Application DBAs: usually found in organizations that have purchased 3rd party application software such as ERP (enterprise resource planning) and CRM (customer relationship management) systems. Examples of such application software include Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the fence between the DBMS and the application software and are responsible for ensuring that the application is fully optimized for the database and vice versa. They usually manage all the application components that interact with the database and carry out activities such as application installation and patching, application upgrades, database cloning, building and running data cleanup routines, data load process management, etc.
While individuals usually specialize in one type of database administration, in smaller organizations it is not uncommon to find a single individual or group performing more than one type of database administration.
Nature of database administration
The degree to which the administration of a database is automated dictates the skills and personnel required to manage databases. On one end of the spectrum, a system with minimal automation will require significant experienced resources to manage; perhaps 5-10 databases per DBA. Alternatively, an organization might choose to automate a significant amount of the work that could be done manually, therefore reducing the skills required to perform tasks. As automation increases, the personnel needs of the organization split into highly skilled workers who create and manage the automation and a group of lower-skilled "line" DBAs who simply execute the automation.
Database administration work is complex, repetitive, time-consuming and requires significant training. Since databases hold valuable and mission-critical data, companies usually look for candidates with multiple years of experience. Database administration often requires DBAs to put in work during off-hours (for example, for planned after-hours downtime, in the event of a database-related outage or if performance has been severely degraded). DBAs are commonly well compensated for the long hours.
One key skill required and often overlooked when selecting a DBA is database recovery (a part of disaster recovery). It is not a case of "if" but a case of "when" a database suffers a failure, ranging from a simple failure to a full catastrophic failure. The failure may be data corruption, media failure, or user-induced error. In any of these situations the DBA must have the skills to recover the database to a given point in time to prevent a loss of data. A highly skilled DBA may need anywhere from a few minutes to exceedingly long hours to get the database back to the operational point.
Database administration tools
Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools are called native
tools. For example, Microsoft SQL Server comes with SQL Server Enterprise Manager and Oracle has tools such as
SQL*Plus and Oracle Enterprise Manager/Grid Control. In addition, 3rd parties such as BMC, Quest Software,
Embarcadero Technologies, EMS Database Management Solutions and SQL Maestro Group offer GUI tools to
monitor the DBMS and help DBAs carry out certain functions inside the database more easily.
Another kind of database software exists to manage the provisioning of new databases and the management of existing databases and their related resources. The process of creating a new database can consist of hundreds or thousands of unique steps, from satisfying prerequisites to configuring backups, where each step must be successful before the next can start. A human cannot be expected to complete this procedure in the same exact way time after time - exactly the goal when multiple databases exist. As the number of DBAs grows, without automation the number of unique configurations frequently grows to be costly/difficult to support. All of these complicated procedures can be modeled by the best DBAs into database automation software and executed by the standard DBAs. Software has been created specifically to improve the reliability and repeatability of these procedures, such as Stratavia's Data Palette and GridApp Systems Clarity.
The impact of IT automation on database administration
Recently, automation has begun to impact this area significantly. Newer technologies such as Stratavia's Data Palette suite and GridApp Systems Clarity have begun to increase the automation of databases, reducing the number of database-related tasks. However, at best this only reduces the amount of mundane, repetitive activities and does not eliminate the need for DBAs. The intention of DBA automation is to enable DBAs to focus on more proactive activities around database architecture, deployment, performance and service level management.
Every database requires a database owner account that can perform all schema management operations. This
account is specific to the database and cannot log in to Data Director. You can add database owner accounts after
database creation. Data Director users must log in with their database-specific credentials to view the database, its
entities, and its data or to perform database management tasks. Database administrators and application developers
can manage databases only if they have appropriate permissions and roles granted to them by the organization administrator. The permissions and roles must be granted on the database group or on the database, and they only apply within the organization in which they are granted.
Learning database administration
There are several education institutes that offer professional courses, including late-night programs, to allow candidates to learn database administration. Also, DBMS vendors such as Oracle, Microsoft and IBM offer certification programs to help companies hire qualified DBA practitioners. A college degree in Computer Science or a related field is helpful but not necessarily a prerequisite.
External references
• "A set theoretic data structure and retrieval language" [1]. SIGIR Forum (ACM Special Interest Group on Information Retrieval) 7 (4): 45–55. Winter 1972.
• Thomas Haigh (June 2006). "Origins of the Data Base Management System" [2] (PDF). SIGMOD Record (ACM Special Interest Group on Management of Data) 35 (2).
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.
References
[1] http://portal.acm.org/citation.cfm?id=1095495.1095500
[2] http://www.tomandmaria.com/tom/Writing/VeritableBucketOfFactsSIGMOD.pdf
Replication (computing)
Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.
Terminology
One speaks of:
• data replication, if the same data is stored on multiple storage devices [1]
• computation replication, if the same computing task is executed many times.
A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device.
The access to a replicated entity is typically uniform with access to a single, non-replicated entity. The replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas is hidden as much as possible. The latter refers to data replication with respect to Quality of Service (QoS) aspects.[2]
Computer scientists talk about active and passive replication in systems that replicate data or services:
• active replication is performed by processing the same request at every replica.
• passive replication involves processing each single request on a single replica and then transferring its resultant state to the other replicas.
If at any time one master replica is designated to process all the requests, then we are talking about the primary-backup scheme (master-slave scheme) predominant in high-availability clusters. On the other side, if any replica processes a request and then distributes a new state, then this is a multi-primary scheme (called multi-master in the database field). In the multi-primary scheme, some form of distributed concurrency control must be used, such as a distributed lock manager.
Load balancing differs from task replication, since it distributes a load of different (not the same) computations across machines, and allows a single computation to be dropped in case of failure. Load balancing, however, sometimes uses data replication (especially multi-master replication) internally, to distribute its data among machines.
Backup differs from replication in that it saves a copy of data unchanged for a long period of time.[citation needed] Replicas, on the other hand, undergo frequent updates and quickly lose any historical state. Replication is one of the oldest and most important topics in the overall area of distributed systems.
Whether one replicates data or computation, the objective is to have some group of processes that handle incoming
events. If we replicate data, these processes are passive and operate only to maintain the stored data, reply to read
requests, and apply updates. When we replicate computation, the usual goal is to provide fault-tolerance. For
example, a replicated service might be used to control a telephone switch, with the objective of ensuring that even if
the primary controller fails, the backup can take over its functions. But the underlying needs are the same in both
cases: by ensuring that the replicas see the same events in equivalent orders, they stay in consistent states and hence
any replica can respond to queries.
Replication models in distributed systems
A number of widely cited models exist for data replication, each having its own properties and performance:
1. Transactional replication. This is the model for replicating transactional data, for example a database or some other form of transactional storage structure. The one-copy serializability model is employed in this case, which defines legal outcomes of a transaction on replicated data in accordance with the overall ACID properties that transactional systems seek to guarantee.
2. State machine replication. This model assumes that the replicated process is a deterministic finite automaton and that atomic broadcast of every event is possible. It is based on a distributed computing problem called distributed consensus and has a great deal in common with the transactional replication model. This is sometimes mistakenly used as a synonym of active replication. State machine replication is usually implemented by a replicated log consisting of multiple subsequent rounds of the Paxos algorithm. This was popularized by Google's Chubby system, and is the core behind the open-source Keyspace data store.
3. Virtual synchrony. This computational model is used when a group of processes cooperate to replicate in-memory data or to coordinate actions. The model defines a distributed entity called a process group. A process can join a group, and is provided with a checkpoint containing the current state of the data replicated by group members. Processes can then send multicasts to the group and will see incoming multicasts in the identical order. Membership changes are handled as a special multicast that delivers a new membership view to the processes in the group.
Database replication
Database replication can be used on many database management systems, usually with a master/slave relationship between the original and the copies. The master logs the updates, which then ripple through to the slaves. The slave outputs a message stating that it has received the update successfully, thus allowing the sending (and potentially re-sending until successfully applied) of subsequent updates.
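The log-and-acknowledge loop just described can be sketched in a few lines. This is a minimal single-process model with invented class names, not a real replication API: the master appends each update to a log and keeps re-sending entries until the slave acknowledges them.

```python
class Slave:
    def __init__(self):
        self.data = {}
        self.applied = 0            # sequence number of the last update applied

    def receive(self, seq, key, value):
        if seq == self.applied + 1: # apply strictly in order; ignore duplicates
            self.data[key] = value
            self.applied = seq
        return self.applied         # the acknowledgement sent back to the master

class Master:
    def __init__(self, slave):
        self.data = {}
        self.log = []               # the update log: (seq, key, value)
        self.slave = slave

    def update(self, key, value):
        self.data[key] = value
        self.log.append((len(self.log) + 1, key, value))
        self.ship()

    def ship(self):
        # Send (and potentially re-send) every update not yet acknowledged.
        acked = self.slave.applied
        while acked < len(self.log):
            seq, key, value = self.log[acked]
            acked = self.slave.receive(seq, key, value)

slave = Slave()
master = Master(slave)
master.update("a", 1)
master.update("b", 2)
print(slave.data)   # {'a': 1, 'b': 2}
```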
Multi-master replication, where updates can be submitted to any database node, and then ripple through to other
servers, is often desired, but introduces substantially increased costs and complexity which may make it impractical
in some situations. The most common challenge that exists in multi-master replication is transactional conflict
prevention or resolution. Most synchronous or eager replication solutions do conflict prevention, while asynchronous solutions have to do conflict resolution. For instance, if a record is changed on two nodes simultaneously, an eager replication system would detect the conflict before confirming the commit and abort one of the transactions. A lazy replication system would allow both transactions to commit and run a conflict resolution during resynchronization. The resolution of such a conflict may be based on a timestamp of the transaction, on the hierarchy of the origin nodes or on much more complex logic, which decides consistently on all nodes.
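The simplest of the resolution policies mentioned, a transaction timestamp with a deterministic tie-break, can be sketched as "last writer wins". The timestamps, node ids and values below are illustrative only.

```python
# Each version of a record is (timestamp, node_id, value). Comparing tuples
# compares the timestamp first; node_id breaks exact-timestamp ties, so every
# node running this function reaches the same decision.
def resolve(local, remote):
    return max(local, remote)

node_a = (1697760000.0, "A", "draft-1")
node_b = (1697760005.0, "B", "draft-2")   # written 5 seconds later on node B

winner = resolve(node_a, node_b)
print(winner[2])   # draft-2 wins on both nodes after resynchronization
```

The tie-break is what makes the rule "decide consistently on all nodes": without it, two writes with equal timestamps could be resolved differently on each side.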
Database replication becomes difficult when it scales up. Usually, the scale-up goes along two dimensions, horizontal and vertical: horizontal scale-up has more data replicas, while vertical scale-up has data replicas located further away in distance. Problems raised by horizontal scale-up can be alleviated by a multi-layer, multi-view access protocol. Vertical scale-up causes fewer problems as internet reliability and performance improve.
When data is replicated between database servers, so that the information remains consistent throughout the database
system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit
replication transparency.
Disk storage replication
Active (real-time) storage replication is usually implemented by
distributing updates of a block device to several physical hard disks.
This way, any file system supported by the operating system can be
replicated without modification, as the file system code works on a
level above the block device driver layer. It is implemented either in
hardware (in a disk array controller) or in software (in a device driver).
The main characteristic of such cross-site replication is how write operations are handled:
• Synchronous replication - guarantees "zero data loss" by means of an atomic write operation, i.e. the write either completes on both sides or not at all. A write is not considered complete until acknowledgement by both local and remote storage. Most applications wait for a write transaction to complete before proceeding with further work, hence overall performance decreases considerably. Inherently, performance drops proportionally to distance, as latency is caused by the speed of light. For a 10 km distance, the fastest possible roundtrip takes 67 μs, whereas nowadays a whole local cached write completes in about 10-20 μs.
• An often-overlooked aspect of synchronous replication is the fact that failure of the remote replica, or even just the interconnection, stops by definition any and all writes (freezing the local storage system). This is the behaviour that guarantees zero data loss. However, many commercial systems at such a potentially dangerous point do not freeze, but just proceed with local writes, losing the desired zero recovery point objective.
• The main difference between synchronous and asynchronous volume replication is that synchronous replication needs to wait for the destination server in any write operation.[3]
• Asynchronous replication - the write is considered complete as soon as local storage acknowledges it. Remote storage is updated, but probably with a small lag. Performance is greatly increased, but in case of losing the local storage, the remote storage is not guaranteed to have the current copy of data and the most recent data may be lost.
• Semi-synchronous replication[citation needed] - this usually means that a write is considered complete as soon as local storage acknowledges it and a remote server acknowledges that it has received the write either into memory or to a dedicated log file. The actual remote write is not performed immediately but is performed asynchronously, resulting in better performance than synchronous replication but offering no guarantee of durability.
• Point-in-time replication - introduces periodic snapshots that are replicated instead of primary storage. If the replicated snapshots are pointer-based, then during replication only the changed data is moved, not the entire volume. Using this method, replication can occur over smaller, less expensive bandwidth links such as iSCSI or T1 instead of fiber optic lines.
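The 67 μs round-trip figure quoted for synchronous replication can be reproduced directly. The sketch below assumes light in vacuum for the theoretical floor, and roughly 2/3 of that speed for propagation in optical fiber.

```python
c_vacuum = 299_792_458            # speed of light in vacuum, m/s
c_fiber = c_vacuum * 2 / 3        # light in optical fiber travels at roughly 2/3 c

distance_m = 10_000               # 10 km between the two storage sites, one way
rtt_vacuum_us = 2 * distance_m / c_vacuum * 1e6
rtt_fiber_us = 2 * distance_m / c_fiber * 1e6

# ~67 us in vacuum (the theoretical floor) and ~100 us in fiber: several times
# the 10-20 us a whole local cached write takes, before any protocol overhead.
print(f"{rtt_vacuum_us:.0f} us (vacuum floor), {rtt_fiber_us:.0f} us (fiber)")
```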
To address the limits imposed by latency, techniques of WAN optimization can be applied to the link.
Notable implementations
Many distributed filesystems use replication to ensure fault tolerance and avoid a single point of failure. See the lists
of distributed fault-tolerant file systems and distributed parallel fault-tolerant file systems.
Other notable storage replication software includes:
• CA - ARCserve Replication and High Availability (RHA) [4][5]
• Dell - AppAssure Backup, replication and disaster recovery
• Dell - Compellent Remote Instant Replay
• EMC - EMC RecoverPoint
• EMC - EMC SRDF
• EMC - EMC VPLEX
• DataCore SANsymphony & SANmelody
• StarWind iSCSI SAN & NAS
• FalconStor Replication & Mirroring (sub-block heterogeneous point-in-time, async, sync)
• FreeNAS - replication handled by ssh + zfs file system [6]
• Hitachi TrueCopy
• Hewlett-Packard - Continuous Access (HP CA)
• IBM - Peer to Peer Remote Copy (PPRC) and Global Mirror (known together as IBM Copy Services)
• Linux - DRBD - open source module
• HAST - DRBD-like open source solution for FreeBSD
• MapR volume mirroring
• NetApp SyncMirror
• NetApp SnapMirror
• Symantec Veritas Volume Replicator (VVR)
• VMware - Site Recovery Manager (SRM) [7]
File-based replication
File-based replication is replicating files at a logical level rather than replicating at the storage block level. There are
many different ways of performing this. Unlike with storage-level replication, the solutions almost exclusively rely
on software.
Capture with a kernel driver
With the use of a kernel driver (specifically a filter driver), that intercepts calls to the filesystem functions, any
activity is captured immediately as it occurs. This utilises the same type of technology that real time active virus
checkers employ. At this level, logical file operations are captured like file open, write, delete, etc. The kernel driver
transmits these commands to another process, generally over a network to a different machine, which will mimic the operations of the source machine. Like block-level storage replication, file-level replication allows both synchronous and asynchronous modes. In synchronous mode, write operations on the source machine are held and not allowed to occur until the destination machine has acknowledged the successful replication. Synchronous mode is less common with file replication products although a few solutions exist.[8]
File-level replication solutions yield a few benefits. Firstly, because data is captured at a file level, the solution can make an informed decision on whether to replicate based on the location of the file and the type of file. Hence, unlike block-level storage replication, where a whole volume needs to be replicated, file replication products have the ability to exclude temporary files or parts of a filesystem that hold no business value. This can substantially reduce the amount of data sent from the source machine as well as decrease the storage burden on the destination machine. A further benefit to decreasing bandwidth is that the data transmitted can be more granular than with block-level replication. If an application writes 100 bytes, only the 100 bytes are transmitted, not a complete disk block, which is generally 4096 bytes.
On the negative side, as this is a software-only solution, it requires implementation and maintenance at the operating system level, and uses some of the machine's processing power (CPU).
Notable implementations:
• CA ARCserve Replication [4][5]
• Cofio Software AIMstor Replication [9]
• Double-Take Software Availability [10]
• EDpCloud Software EDpCloud Real Time Replication [11]
Filesystem journal replication
In many ways working like a database journal, many filesystems have the ability to journal their activity. The journal can be sent to another machine, either periodically or in real time. It can be used there to play back events.
Notable implementations:
• Microsoft DPM (periodical updates, not in real time)
Batch replication
This is the process of comparing the source and destination filesystems and ensuring that the destination matches the
source. The key benefit is that such solutions are generally free or inexpensive. The downside is that the process of
synchronizing them is quite system-intensive, and consequently this process generally runs infrequently.
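A toy version of this comparison, hashing every file on both sides and copying only the differences, can illustrate why the process is system-intensive (it must read all data on both filesystems). This is a crude stand-in for what tools like rsync do far more efficiently; all paths and file contents are made up.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path

def digest(path):
    # Hash the full file content: the expensive part of batch comparison.
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def sync(src, dst):
    copied = []
    for name in sorted(os.listdir(src)):
        s_path, d_path = os.path.join(src, name), os.path.join(dst, name)
        if not os.path.exists(d_path) or digest(s_path) != digest(d_path):
            shutil.copy2(s_path, d_path)   # copy only missing or changed files
            copied.append(name)
    return copied

src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
Path(src, "a.txt").write_text("hello")
Path(src, "b.txt").write_text("world")
Path(dst, "a.txt").write_text("hello")   # already identical on both sides

print(sync(src, dst))   # ['b.txt']: only the differing file is copied
```

A second run finds the two sides identical and copies nothing, which is why such jobs are typically scheduled infrequently rather than run continuously.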
Notable implementations:
• rsync
Distributed shared memory replication
Another example of using replication appears in distributed shared memory systems, where many nodes of the system may share the same page of memory, which usually means that each node has a separate copy (replica) of this page.
Primary-backup and multi-primary replication
Many classical approaches to replication are based on a primary/backup model where one device or process has
unilateral control over one or more other processes or devices. For example, the primary might perform some
computation, streaming a log of updates to a backup (standby) process, which can then take over if the primary fails.
This approach is the most common one for replicating databases, despite the risk that if a portion of the log is lost during a failure, the backup might not be in a state identical to the one the primary was in, and transactions could then be lost.
A weakness of primary/backup schemes is that in settings where both processes could have been active, only one is
actuallyperformingoperations.We'regainingfault-tolerancebutspendingtwiceasmuchmoneytogetthisproperty.
Forthisreason,startingintheperiodaround1985,thedistributedsystemsresearchcommunitybegantoexplore alternative
methods of replicating data. An outgrowth of this work was the emergence of schemes in which a groupof replicas
could cooperate, with each process backup up the others, and each handling some share of the workload. Jim Gray, a
[12]
towering figure within the database community, analyzed multi-primary replication schemes
underthetransactionalmodelandultimatelypublishedawidelycitedpaperskepticaloftheapproach"TheDangersof
[13]
ReplicationandaSolution ".Inanutshell,hearguedthatunlessdatasplitsinsomenaturalwaysothatthe
database can be treated as n disjoint sub-databases, concurrency control conflicts will result in seriously degraded
performance and the group of replicas will probably slow down as a function of n. Indeed, he suggests that the most
common approaches are likely to result in degradation that scales as O(n³). His solution, which is to partition the
data, is only viable in situations where data actually has a natural partitioning key.
The situation is not always so bleak. For example, in the 1985-1987 period, the virtual synchrony model was
proposed and emerged as a widely adopted standard (it was used in the Isis Toolkit, Horus, Transis, Ensemble,
Totem, Spread, C-Ensemble, Phoenix and Quicksilver systems, and is the basis for the CORBA fault-tolerant
computing standard; the model is also used in IBM Websphere to replicate business logic and in Microsoft's
Windows Server 2008 enterprise clustering technology). Virtual synchrony permits a multi-primary approach in
which a group of processes cooperate to parallelize some aspects of request processing. The scheme can only be used for some
forms of in-memory data, but when feasible, provides linear speedups in the size of the group.
A number of modern products support similar schemes. For example, the Spread Toolkit supports this same virtual
synchrony model and can be used to implement a multi-primary replication scheme; it would also be possible to use
C-Ensemble or Quicksilver in this manner. WANdisco permits active replication where every node on a network is
an exact copy or replica and hence every node on the network is active at one time; this scheme is optimized for use
in a wide area network.
References
[1] http://searchsqlserver.techtarget.com/definition/database-replication
[2] V. Andronikou, K. Mamouras, K. Tserpes, D. Kyriazis, T. Varvarigou, "Dynamic QoS-aware Data Replication in Grid Environments", Elsevier
Future Generation Computer Systems - The International Journal of Grid Computing and eScience, 2012
[3] Open-E Knowledgebase. "What is the difference between asynchronous and synchronous volume replication?" (http://kb.open-e.com/What-
is-the-difference-between-asynchronous-and-synchronous-volume-replication-_682.html) 12 August 2009.
[4] http://www.arcserve.com/gb/default.aspx
[5] http://www.arcserve.com/gb/products/ca-arcserve-replication/ca-arcserve-replication-features-overview.aspx
[6] http://doc.freenas.org/index.php/Replication_Tasks
[7] http://pubs.vmware.com/srm-51/index.jsp?topic=
%2Fcom.vmware.srm.install_config.doc%2FGUID-B3A49FFF-E3B9-45E3-AD35-
093D896596A0.html
[8] AIMstor Replication (http://www.cofio.com/AIMstor-Replication/)
[9] http://www.cofio.com/AIMstor-Replication/
[10] http://www.doubletake.com/uk/products/double-take-availability/Pages/default.aspx
[11] http://www.enduradata.com/
[12] Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data: SIGMOD '99, Philadelphia, PA, USA; June 1-3,
1999, Volume 28; p. 3.
[13] http://research.microsoft.com/~gray/replicas.ps
Database Products
Comparison of object database management systems
This is a comparison of notable object database management systems, showing what fundamental object database
features are implemented natively.
• Db4o 8.0. Languages: C#, Java. SQL support: db4o-sql [1]. Data types: .NET and Java data types [2]. License: GPL, custom, proprietary. Features: Native Queries, LINQ support, automatic schema evolution, Transparent Activation/Persistence, replication to RDBMS, ObjectManager plugin for Visual Studio and Eclipse.
• Objectivity/DB 10.2.1. Languages: C++, C#, Java, Python, Smalltalk and XML. SQL support: SQL superset. License: Proprietary. Features: Distributed, Parallel Query Engine.
• Perst 4.2. Languages: Java (including Java SE, Java ME & Android), C# (including .NET, .NET Compact Framework, Mono & Silverlight). SQL support: JSQL - object-oriented subset of SQL. Data types: Java and .NET data types. License: GPL, Proprietary. Features: Small footprint embedded database. Diverse indexes and specialized collection classes; LINQ; replication; ACID transactions; native full text search; includes Silverlight, Android and Java ME demo apps.
References
[1] http://code.google.com/p/db4o-sql/
[2] http://www.db4o.com/about/company/legalpolicies/docl.aspx
[3] Wakanda Commercial license (http://www.wakanda.org/license/commercial)
Comparison of object-relational database management systems
This is a comparison of object-relational database management systems (ORDBMSs). Each system has at least some
features of an object-relational database; they vary widely in their completeness and the approaches taken.
The following tables compare general and technical information; please see the individual products' articles for
further information. This article is neither all-inclusive nor necessarily up to date. Unless otherwise specified in
footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.
Basic data
Name | Vendor | License | OS | Notes
Object features
Information about what fundamental ORDBMS features are implemented natively.
Data types
Information about what data types are implemented natively.
LogicSQL ? ? ? ? ?
References
[1] http://sourceforge.net/projects/gigabase/
[2] http://webdocs.cs.ualberta.ca/~yuan/databases/index.html
[3] http://www.valentina-db.com/
[4] http://webdocs.cs.ualberta.ca/~yuan/databases/index.html
[5] No private methods, no way to call super method from a child.
External links
• Arvin.dk (http://troels.arvin.dk/db/rdbms/), Comparison of different SQL implementations
List of relational database management systems
This is a list of relational database management systems.
List of software
• 4th Dimension
• Adabas D
• Alpha Five
• Apache Cassandra
• Apache Derby
• Aster Data
• Altibase
• BlackRay
• CA-Datacom
• Clarion
• Clustrix
• CSQL
• CUBRID
• Daffodil database
• DataEase
• Database Management Library
• Dataphor
• dBase
• Derby aka Java DB
• Empress Embedded Database
• EXASolution
• EnterpriseDB
• eXtremeDB
• FileMaker Pro
• Firebird
• Greenplum
• GroveSite
• H2
• Helix database
• HSQLDB
• IBM DB2
• IBM Lotus Approach
• IBM DB2 Express-C
• Infobright
• Informix
• Ingres
• InterBase
• InterSystems Caché
• GT.M
• Linter
• MariaDB
• MaxDB
• MemSQL
• Microsoft Access
• Microsoft Jet Database Engine (part of Microsoft Access)
• Microsoft SQL Server
• Microsoft SQL Server Express
• Microsoft Visual FoxPro
• Mimer SQL
• MonetDB
• mSQL
• MySQL
• Netezza
• NexusDB
• NonStop SQL
• Openbase
• OpenLink Virtuoso (Open Source Edition)
• OpenLink Virtuoso Universal Server
• OpenOffice.org Base
• Oracle
• Oracle Rdb for OpenVMS
• Panorama
• Pervasive PSQL
• Polyhedra
• PostgreSQL
• Postgres Plus Advanced Server
• Progress Software
• RDM Embedded
• RDM Server
• The SAS system
• SAND CDBMS
• SAP HANA
• SAP Sybase Adaptive Server Enterprise
• SAP Sybase IQ
• SQL Anywhere (formerly known as Sybase Adaptive Server Anywhere and Watcom SQL)
• ScimoreDB
• SmallSQL
• solidDB
• SQLBase
• SQLite
• Sybase Advantage Database Server
• Teradata
• TimesTen
• txtSQL
• mizanSQL
• Unisys RDMS 2200
• UniData
• UniVerse
• Vertica
• VMDS
Historical
• Britton Lee IDMs
• Cornerstone
• IBM System R
• MICRO Information Management System
• Oracle Rdb
• Paradox
• Pick
• PRTV
• QBE
• IBM SQL/DS
• Sybase SQL Server
Relational by the Date-Darwen-Pascal Model
Current
• Alphora Dataphor (a proprietary virtual, federated DBMS and RAD MS .Net IDE).
• Rel (free Java implementation).
Obsolete
• IBM Business System 12
• IBM IS1
• IBM PRTV (ISBL)
• Multics Relational Data Store
Comparison of relational database management systems
The following tables compare general and technical information for a number of relational database management
systems. Please see the individual products' articles for further information. This article is not all-inclusive or
necessarily up to date. Unless otherwise specified in footnotes, comparisons are based on the stable versions without
any add-ons, extensions or external programs.
General information
Maintainer | First public release date | Latest stable version | Latest release date | Software license
Operating system support
The operating systems that the RDBMSes can run on.
Windows | OS X | Linux | BSD | UNIX | AmigaOS | Symbian | z/OS | iOS | Android
DB2: Yes | Yes (Express C) | Yes | No | Yes | No | No | Yes | Yes | No
EXASolution: No | No | Yes | No | No | No | No | No | No | No
InterBase: Yes | Yes | Yes | No | Yes (Solaris) | No | No | No | No | No
Linter SQL RDBMS: Yes | Yes | Yes | Yes | Yes | No | No | Under Linux on System z | No | Yes
Microsoft Access (JET): Yes | No | No | No | No | No | No | No | No | No
Microsoft Visual Foxpro: Yes | No | No | No | No | No | No | No | No | No
Microsoft SQL Server: Yes | No | No | No | No | No | No | No | No | No
Microsoft SQL Server Compact (Embedded Database): Yes | No | No | No | No | No | No | No | No | No
MySQL: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? | Yes [2]
Oracle Rdb: No | No | No | No | No | No | No | No | No | No
Pervasive PSQL: Yes | Yes (OEM only) | Yes | No | No | No | No | No | No | No
PostgreSQL: Yes | Yes | Yes | Yes | Yes | No | No [3] | Under Linux on System z | No | Yes
R:Base: Yes | No | No | No | No | No | No | No | No | No
ScimoreDB: Yes | No | No | No | No | No | No | No | No | No
SQLite: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Maybe | Yes | Yes
Xeround Cloud Database: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Fundamental features
Information about what fundamental RDBMS features are implemented natively.
ACID | Referential integrity | Transactions | Unicode | Interface
ADABAS: Yes | No | Yes | Yes | proprietary direct call & SQL (via 3rd party)
MariaDB: Yes^2 | Partial^3 | Yes^2 except for DDL | Yes [4] | SQL
Note (1): Currently only supports read uncommitted transaction isolation. Version 1.9 adds serializable isolation and
version 2.0 will be fully ACID compliant.
Note (2): MySQL provides ACID compliance through the default InnoDB storage engine.
Note (3): "For other [than InnoDB] storage engines, MySQL Server parses and ignores the FOREIGN KEY and
REFERENCES syntax in CREATE TABLE statements. The CHECK clause is parsed but ignored by all storage
engines."
Note (4): Support for Unicode is new in version 10.0.
Note (5): MySQL provides a GUI interface through MySQL Workbench.
Note (6): Pervasive PSQL provides UTF-8 storage.
Limits
Information about data size limits.
Max DB size | Max table size | Max row size | Max columns per row | Max Blob/Clob size | Max CHAR size | Max NUMBER size | Min DATE value | Max DATE value | Max column name size
Apache Derby: Unlimited | Unlimited | Unlimited | 1,012 (5,000 in views) | 2,147,483,647 chars | 254 (VARCHAR: 32,672) | 64 bits | 0001-01-01 | 9999-12-31 | 128
Drizzle: Unlimited | 64 TB | 8 KB | 1,000 | 4 GB (longtext, longblob) | 64 KB (text) | 64 bits | 0001 | 9999 | 64
DB2: Unlimited | 2 ZB | 32,677 B | 1,012 | 2 GB | 32 KiB | 64 bits | 0001-01-01 | 9999-12-31 | 128
Empress Embedded Database: Unlimited | 2^63-1 bytes | 2 GB | 32,767 | 2 GB | 2 GB | 64 bits | 0000-01-01 | 9999-12-31 | 32
EXASolution: Unlimited | Unlimited | Unlimited | 10,000 | N/A | 2 MB | 128 bits | 0001-01-01 | 9999-12-31 | 256
HSQLDB: 64 TB | Unlimited^8 | Unlimited^8 | Unlimited^8 | 64 TB^7 | Unlimited^8 | Unlimited^8 | 0001-01-01 | 9999-12-31 | 128
H2: 64 TB | Unlimited^8 | Unlimited^8 | Unlimited^8 | 64 TB^7 | Unlimited^8 | 64 bits | -99999999 | 99999999 | Unlimited^8
Informix Dynamic Server: ~128 PB | ~128 PB | 32,765 bytes (exclusive of large objects) | 32,765 | 4 TB | 32,765 | 10^32 | 01/01/0001^10 | 12/31/9999 | 128 bytes
Ingres: Unlimited | Unlimited | 256 KB | 1,024 | 2 GB | 32,000 B | 64 bits | 0001 | 9999 | 256
InterBase: Unlimited^1 | ~32 TB | 65,536 B | Depends on data types used | 2 GB | 32,767 B | 64 bits | 100 | 32768 | 31
Linter SQL RDBMS: Unlimited | 2^30 rows | 64 KB (w/o BLOBs), 4 GB (BLOB) | 250 | 4 GB | 4 KB | 64 bits | 0001-01-01 | 9999-12-31 | 66
Microsoft Access (JET): 2 GB | 2 GB | 16 MB | 255 | 64 KB (memo field), 1 GB ("OLE Object" field) | 255 B (text field) | 32 bits | 0100 | 9999 | 64
Microsoft Visual Foxpro: Unlimited | 2 GB | 65,500 B | 255 | 2 GB | 16 MB | 32 bits | 0001 | 9999 | 10
MySQL 5: Unlimited | MyISAM storage limits: 256 TB; InnoDB storage limits: 64 TB | 64 KB^3 | 4,096^4 | 4 GB (longtext, longblob) | 64 KB (text) | 64 bits | 1000 | 9999 | 64
Oracle: Unlimited (4 GB * block size per tablespace) | 4 GB * block size (with BIGFILE tablespace) | 8 KB | 1,000 | Unlimited | 32,767 B^11 | 126 bits | -4712 | 9999 | 30
Pervasive PSQL: 4 billion objects | 256 GB | 2 GB | 1,536 | 2 GB | 8,000 bytes | 64 bits | 01-01-0001 | 12-31-9999 | 128 bytes
Polyhedra: Limited by available RAM, address space | 2^32 rows | Unlimited | 65,536 | 4 GB (subject to RAM) | 4 GB (subject to RAM) | 32 bits | 0001-01-01 | 8000-12-31 | 255
SQL Anywhere: 104 TB (13 files, each file up to 8 TB (32 KB pages)) | Limited by file size | Limited by file size | 45,000 | 2 GB | 2 GB | 64 bits | 0001-01-01 | 9999-12-31 | ?
SQLite: 128 TB (2^31 pages * 64 KB max page size) | Limited by file size | Limited by file size | 32,767 | 2 GB | 2 GB | 64 bits | No DATE type^9 | No DATE type^9 | Unlimited
UniVerse: Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited | Unlimited
Xeround Cloud Database: Unlimited | Unlimited | 32 GB, depending on available memory | 1,000 | 4 GB | 64 KB | 64 bits | 1000 | 9999 | 64
Tables and views
Information about what tables and views (other than basic ones) are supported natively.
Temporary table | Materialized view
ADABAS: ? | ?
Apache Derby: Yes | No
Clustrix: Yes | No
CUBRID: No | No
EXASolution: Yes | No
HSQLDB: Yes | No
H2: Yes | No
InterBase: Yes | No
LucidDB: No | No
MaxDB: Yes | No
Microsoft Access (JET): No | No
Microsoft SQL Server Compact (Embedded Database): Yes | No
MonetDB/SQL: Yes | No
Pervasive PSQL: Yes | No
RDM Embedded: Yes | No
RDM Server: Yes | No
ScimoreDB: No | No
SQLite: Yes | No
UniData: Yes | No
UniVerse: Yes | No
Xeround Cloud Database: Yes | No
Note (1): Server provides tempdb, which can be used for public and private (for the session) temp tables.
Note (2): Materialized views are not supported in Informix; the term is used in IBM’s documentation to refer to a
temporary table created to run the view’s query when it is too complex, but one cannot for example define the way it
is refreshed or build an index on it. The term is defined in the Informix Performance Guide.
Note (3): Query optimizer support only in Developer and Enterprise Editions. In other versions, a direct reference to
materialized view and a query hint are required.
Note (4): Materialized views can be emulated using stored procedures and triggers.
Note (5): Materialized views are now standard but can be emulated in versions prior to 9.3 with stored procedures
and triggers using PL/pgSQL, PL/Perl, PL/Python, or other procedural languages.
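Notes (4) and (5) above describe emulating a materialized view with triggers. A minimal sketch of that approach using Python's sqlite3 module; the sales schema and all table names are invented for illustration:

```python
import sqlite3

# Emulate a materialized view: an ordinary table holding precomputed
# totals, kept in sync by a trigger on the base table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);

-- The "materialized view": a plain table of precomputed per-region totals.
CREATE TABLE sales_totals (region TEXT PRIMARY KEY, total INTEGER);

-- Keep it in sync on every insert (a full solution also handles
-- UPDATE and DELETE with matching triggers).
CREATE TRIGGER sales_ins AFTER INSERT ON sales
BEGIN
    UPDATE sales_totals SET total = total + NEW.amount
     WHERE region = NEW.region;
    INSERT INTO sales_totals(region, total)
    SELECT NEW.region, NEW.amount
     WHERE NOT EXISTS (SELECT 1 FROM sales_totals WHERE region = NEW.region);
END;
""")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("east", 10), ("west", 5), ("east", 7)])
print(dict(con.execute("SELECT region, total FROM sales_totals")))
# {'east': 17, 'west': 5}
```

Unlike a true materialized view, refresh policy and indexing are entirely in the application's hands, which is exactly the limitation Note (2) describes for Informix.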
Indexes
Information about what indexes (other than basic B-/B+ tree indexes) are supported natively.
R-/R+ tree | Hash | Expression | Partial | Reverse | Bitmap | GiST | GIN | Full-text | Spatial | FOT
ADABAS: ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
Adaptive Server Enterprise: No | No | Yes | No | Yes | No | No | No | Yes | ? | ?
Advantage Database Server: No | No | Yes | No | Yes | Yes | No | No | Yes | ? | ?
Apache Derby: No | No | No | No | No | No | No | No | No | ? | ?
Clustrix: No | Yes | No | No | No | No | No | No | No | No | ?
Drizzle: No | No | No | No | No | No | No | No | No | ? | ?
Empress Embedded Database: Yes | No | No | Yes | No | Yes | No | No | No | ? | ?
EXASolution: No | Yes | No | No | No | No | No | No | No | ? | ?
HSQLDB: No | No | No | No | No | No | No | No | No | ? | ?
H2: No | Yes | No | No | No | No | No | No | Yes | ? | ?
Informix Dynamic Server: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Ingres: Yes | Yes | Ingres v10 | No | No | No | No | No | Ingres v10 | ? | ?
InterBase: No | No | No | No | No | No | No | No | No | ? | ?
Linter SQL RDBMS^10: No | No | No | No | No | No | No | No | Yes | No | No
LucidDB: No | No | No | No | No | Yes | No | No | No | ? | ?
MaxDB: No | No | No | No | No | No | No | No | No | ? | ?
Microsoft Access (JET): No | No | No | No | No | No | No | No | No | ? | ?
Microsoft Visual Foxpro: No | No | Yes | Yes | Yes^2 | Yes | No | No | No | ? | ?
Microsoft SQL Server: ? | Non/Cluster & fill factor | Yes^3 | Yes^4 | No^3 | No | No | No | Yes | Yes | ?
Microsoft SQL Server Compact (Embedded Database): No | No | No | No | No | No | No | No | No | ? | ?
MonetDB/SQL: No | Yes | No | No | No | No | No | No | ? | ? | ?
Oracle: Yes^11 | Cluster Tables | Yes | Yes^6 | Yes | Yes | No | No | Yes | Yes | ?
Pervasive PSQL: No | No | No | No | No | No | No | No | No | No | No
Polyhedra DBMS: No | Yes | No | No | No | No | No | No | No | No | ?
PostgreSQL: Yes | Yes | Yes | Yes | Yes^7 | Yes^8 | Yes | Yes | Yes | PostGIS | ?
ScimoreDB: No | No | No | No | No | No | No | No | Yes | ? | ?
SQL Anywhere: No | No | No | No | No | No | No | No | Yes | ? | ?
Note (1): The users need to use a function from the freeAdhocUDF library or similar.
Note (2): Can be implemented for most data types using expression-based indexes.
Note (3): Can be emulated by indexing a computed column (doesn't easily update) or by using an "Indexed View"
(proper name; not just any view works).
Note (4): Can be implemented by using an indexed view.
Note (5): InnoDB automatically generates adaptive hash index entries as needed.
Note (6): Can be implemented using function-based indexes in Oracle 8i and higher, but the function needs to be
used in the SQL for the index to be used.
Note (7): A PostgreSQL functional index can be used to reverse the order of a field.
Note (8): PostgreSQL will likely support on-disk bitmap indexes in a future version. Version 8.2 supports a related
technique known as "in-memory bitmap scans".
Note (10): B+ tree and full-text only for now.
Note (11): R-Tree indexing available in base edition with Locator but some functionality requires Personal Edition
or Enterprise Edition with Spatial option.
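Notes (2) and (6) refer to expression-based (function-based) indexes. A small sketch using Python's sqlite3 module, since SQLite (3.9+) supports indexes on expressions; the table and index names are made up:

```python
import sqlite3

# Expression (function-based) index: index the result of lower(email)
# so case-insensitive lookups can use the index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
con.execute("CREATE INDEX idx_users_email_ci ON users (lower(email))")
con.executemany("INSERT INTO users(email) VALUES (?)",
                [("Alice@Example.com",), ("bob@example.com",)])

# The query must use the same expression for the index to apply,
# mirroring the Oracle caveat in Note (6).
row = con.execute(
    "SELECT id FROM users WHERE lower(email) = lower(?)",
    ("alice@example.com",)).fetchone()

# Inspect the plan: it should show a search using the expression index.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE lower(email) = ?",
    ("alice@example.com",)).fetchall()
```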
Database capabilities
Union | Intersect | Except | Inner joins | Outer joins | Inner selects | Merge joins | Blobs and Clobs | Common Table Expressions | Windowing Functions | Parallel Query
ADABAS: Yes | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
Adaptive Server Enterprise: Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? | ? | ? | ?
Advantage Database Server: Yes | No | No | Yes | Yes | Yes | Yes | Yes | No | ? | ?
Altibase: Yes | Yes | Yes, via MINUS | Yes | Yes | Yes | Yes | Yes | No | No | No
CUBRID: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | ?
DB2: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Empress Embedded Database: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? | ? | ?
EXASolution: Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes
HSQLDB: Yes | Yes | Yes | Yes | Yes | Yes | Yes [13] | Yes | Yes | No | Yes
Microsoft Visual Foxpro: Yes | Yes | Yes | Yes | Yes | ? | ? | ? | ? | ? | ?
Microsoft SQL Server: Yes | Yes (2005 and beyond) | Yes (2005 and beyond) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Microsoft SQL Server Compact (Embedded Database): Yes | No | No | Yes | Yes | No | Yes | No | No | ? | ?
MonetDB/SQL: ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
Oracle: Yes | Yes | Yes, via MINUS | Yes | Yes | Yes | Yes | Yes | Yes^1 | Yes | Yes
Oracle Rdb: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | ? | ? | ?
PostgreSQL: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No
ScimoreDB: Yes | Yes | Yes | Yes | LEFT only | Yes | ? | Yes | ? | ? | ?
SmallSQL: ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
SQL Anywhere: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
SQLite: Yes | Yes | Yes | Yes | LEFT only | Yes | No | Yes | No | No | No
Teradata: Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes
Note (1): Recursive CTEs, introduced in 11gR2, supersede the similar construct called CONNECT BY.
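The Common Table Expressions column above can be illustrated with a recursive CTE; a minimal sketch using Python's sqlite3 module (SQLite has supported WITH RECURSIVE since version 3.8.3):

```python
import sqlite3

# A recursive common table expression generating the integers 1..5.
con = sqlite3.connect(":memory:")
rows = con.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1                                 -- anchor member
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5    -- recursive member
    )
    SELECT n FROM counter
""").fetchall()
print([n for (n,) in rows])   # [1, 2, 3, 4, 5]
```

The same WITH RECURSIVE syntax works on the other engines marked Yes in the CTE column, such as DB2, SQL Server and PostgreSQL, though each has its own dialect details.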
Data types
4th Dimension: Static. Integer: UUID (16-bit), SMALLINT (16-bit), INT (32-bit), BIGINT (64-bit), NUMERIC (64-bit). Floating point: REAL, FLOAT. Decimal: REAL, FLOAT. String: CLOB, TEXT, VARCHAR. Binary: BIT, BIT VARYING, BLOB. Date/Time: DURATION, INTERVAL, TIMESTAMP. Boolean: BOOLEAN. Other: PICTURE.
Clustrix: Static. Integer: TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit), INT (32-bit), BIGINT (64-bit). Floating point: FLOAT (32-bit), DOUBLE. Decimal: DECIMAL. String: CHAR, BINARY, VARCHAR, VARBINARY, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT. Binary: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB. Date/Time: DATETIME, DATE, TIMESTAMP, YEAR. Boolean: BIT(1), BOOLEAN. Other: ENUM, SET.
CUBRID: Static. Integer: SMALLINT (16-bit), INT (32-bit), BIGINT (64-bit). Floating point: FLOAT, REAL (32-bit). Decimal: DECIMAL, NUMERIC. String: CHAR, VARCHAR, NCHAR. Binary: BLOB. Date/Time: DATE, DATETIME, TIME, TIMESTAMP. Boolean: BIT. Other: MONETARY, BIT VARYING.
Empress Embedded Database: Static. Integer: TINYINT, SQL_TINYINT, or INTEGER8; SMALLINT, SQL_SMALLINT, or INTEGER16; INTEGER, INT, SQL_INTEGER, or INTEGER32; BIGINT, SQL_BIGINT, or INTEGER64. Floating point: REAL, SQL_REAL, or FLOAT32; DOUBLE PRECISION, SQL_DOUBLE, or FLOAT64; FLOAT, or SQL_FLOAT; EFLOAT. Decimal: DECIMAL, DEC, NUMERIC, SQL_DECIMAL, or SQL_NUMERIC; DOLLAR. String: CHARACTER, ECHARACTER, CHARACTER VARYING, NATIONAL CHARACTER, NATIONAL CHARACTER VARYING, NLSCHARACTER, CHARACTER LARGE OBJECT, TEXT, NATIONAL CHARACTER LARGE OBJECT, NLSTEXT. Binary: BINARY LARGE OBJECT or BLOB; BULK. Date/Time: DATE, EDATE, TIME, ETIME, EPOCH_TIME, TIMESTAMP, MICROTIMESTAMP. Boolean: BOOLEAN. Other: SEQUENCE32, SEQUENCE.
EXASolution: Static. Integer: TINYINT, SMALLINT, INTEGER, BIGINT. Floating point: REAL, FLOAT, DOUBLE. Decimal: DECIMAL, DEC, NUMERIC, NUMBER. String: CHAR, NCHAR, VARCHAR, VARCHAR2, NVARCHAR, NVARCHAR2, CLOB, NCLOB. Binary: N/A. Date/Time: DATE, TIMESTAMP, INTERVAL. Boolean: BOOLEAN, BOOL. Other: GEOMETRY.
HSQLDB: Static. Integer: TINYINT (8-bit), SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit). Floating point: DOUBLE (64-bit). Decimal: DECIMAL, NUMERIC. String: CHAR, VARCHAR, LONGVARCHAR, CLOB. Binary: BINARY, VARBINARY, LONGVARBINARY, BLOB. Date/Time: DATE, TIME, TIMESTAMP, INTERVAL. Boolean: BOOLEAN. Other: OTHER (object), BIT, BIT VARYING, ARRAY.
Informix Dynamic Server: Static. Integer: SMALLINT (16-bit), INT (32-bit), INT8 (64-bit proprietary), BIGINT (64-bit). Floating point: SMALLFLOAT (32-bit), FLOAT (64-bit). Decimal: DECIMAL (32 digits float/fixed), MONEY. String: CHAR, VARCHAR, NCHAR, NVARCHAR, LVARCHAR, CLOB, TEXT. Binary: TEXT, BYTE, BLOB, CLOB. Date/Time: DATE, DATETIME, INTERVAL. Boolean: BOOLEAN. Other: SET, LIST, MULTISET, ROW, TIMESERIES, SPATIAL, USER DEFINED TYPES.
Ingres: Static. Integer: TINYINT (8-bit), SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit). Floating point: FLOAT4 (32-bit), FLOAT (64-bit). Decimal: DECIMAL. String: C, CHAR, VARCHAR, LONG VARCHAR, NCHAR, NVARCHAR, LONG NVARCHAR, TEXT. Binary: BYTE, VARBYTE, LONG VARBYTE (BLOB). Date/Time: DATE, ANSIDATE, INGRESDATE, TIME, TIMESTAMP, INTERVAL. Boolean: N/A. Other: MONEY, OBJECT_KEY, TABLE_KEY, USER-DEFINED DATA TYPES (via OME).
Linter SQL RDBMS: Static. Integer: SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit). Floating point: REAL (32-bit), DOUBLE (64-bit). Decimal: DECIMAL, NUMERIC. String: CHAR, VARCHAR, NCHAR, NVARCHAR, BLOB. Binary: BYTE, VARBYTE, BLOB. Date/Time: DATE. Boolean: BOOLEAN. Other: GEOMETRY, EXTFILE.
Microsoft SQL Server: Static. Integer: TINYINT, SMALLINT, INT, BIGINT. Floating point: FLOAT, REAL. Decimal: NUMERIC, DECIMAL, SMALLMONEY, MONEY. String: CHAR, VARCHAR, TEXT, NCHAR, NVARCHAR, NTEXT. Binary: BINARY, VARBINARY, IMAGE, FILESTREAM. Date/Time: DATE, DATETIMEOFFSET, DATETIME2, SMALLDATETIME, DATETIME, TIME. Boolean: BIT. Other: CURSOR, TIMESTAMP, HIERARCHYID, UNIQUEIDENTIFIER, SQL_VARIANT, XML, TABLE.
Microsoft SQL Server Compact (Embedded Database): Static. Integer: TINYINT, SMALLINT, INT, BIGINT. Floating point: FLOAT, REAL. Decimal: NUMERIC, DECIMAL, MONEY. String: NCHAR, NVARCHAR, NTEXT. Binary: BINARY, VARBINARY, IMAGE. Date/Time: DATETIME. Boolean: BIT. Other: TIMESTAMP, ROWVERSION, UNIQUEIDENTIFIER, IDENTITY, ROWGUIDCOL.
MySQL: Static. Integer: TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit), INT (32-bit), BIGINT (64-bit). Floating point: FLOAT (32-bit), DOUBLE (aka REAL) (64-bit). Decimal: DECIMAL. String: CHAR, BINARY, VARCHAR, VARBINARY, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT. Binary: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB. Date/Time: DATETIME, DATE, TIMESTAMP, YEAR. Boolean: BIT(1), BOOLEAN (aka BOOL) = synonym for TINYINT. Other: ENUM, SET, GIS data types (Geometry, Point, Curve, LineString, Surface, Polygon, GeometryCollection, MultiPoint, MultiCurve, MultiLineString, MultiSurface, MultiPolygon).
OpenLink Virtuoso: Static + Dynamic. Integer: INT, INTEGER, SMALLINT. Floating point: REAL, DOUBLE PRECISION, FLOAT, FLOAT'('INTNUM')'. Decimal: DECIMAL, DECIMAL'('INTNUM')', DECIMAL'('INTNUM','INTNUM')', NUMERIC, NUMERIC'('INTNUM')', NUMERIC'('INTNUM','INTNUM')'. String: CHARACTER, CHAR'('INTNUM')', VARCHAR, VARCHAR'('INTNUM')', NVARCHAR, NVARCHAR'('INTNUM')'. Binary: BLOB. Date/Time: TIMESTAMP, DATETIME, TIME, DATE. Boolean: n/a. Other: GEOMETRY, REFERENCE (URI), UDT (User Defined Type).
Oracle: Static + Dynamic (through ANYDATA). Integer: NUMBER. Floating point: BINARY_FLOAT, BINARY_DOUBLE. Decimal: NUMBER. String: CHAR, VARCHAR2, CLOB, NCLOB, NVARCHAR2, NCHAR, LONG (deprecated). Binary: BLOB, RAW, LONG RAW (deprecated), BFILE. Date/Time: DATE, TIMESTAMP (with/without TIME ZONE), INTERVAL. Boolean: N/A. Other: SPATIAL, IMAGE, AUDIO, VIDEO, DICOM, XMLType.
Pervasive PSQL: Static. Integer: BIGINT, INTEGER, SMALLINT, TINYINT, UBIGINT, UINTEGER, USMALLINT, UTINYINT. Floating point: BFLOAT4, BFLOAT8, DOUBLE, FLOAT. Decimal: DECIMAL, NUMERIC, NUMERICSA, NUMERICSLB, NUMERICSLS, NUMERICSTB, NUMERICSTS. String: CHAR, LONGVARCHAR, VARCHAR. Binary: BINARY, LONGVARBINARY, VARBINARY. Date/Time: DATE, DATETIME, TIME. Boolean: BIT. Other: CURRENCY, IDENTITY, SMALLIDENTITY, TIMESTAMP, UNIQUEIDENTIFIER.
Polyhedra: Static. Integer: INTEGER8, INTEGER (32-bit), INTEGER64 (64-bit). Floating point: FLOAT32 (32-bit), FLOAT (64-bit). Decimal: N/A. String: VARCHAR, LARGE VARCHAR. Binary: LARGE BINARY. Date/Time: DATETIME. Boolean: BOOLEAN. Other: N/A.
PostgreSQL: Static. Integer: SMALLINT (16-bit), INTEGER (32-bit), BIGINT (64-bit). Floating point: REAL (32-bit), DOUBLE PRECISION (64-bit). Decimal: DECIMAL, NUMERIC. String: CHAR, VARCHAR, TEXT. Binary: BYTEA. Date/Time: DATE, TIME (with/without TIME ZONE), TIMESTAMP (with/without TIME ZONE), INTERVAL. Boolean: BOOLEAN. Other: ENUM, POINT, LINE, LSEG, arrays, composites, ranges, custom.
RDM Embedded: Static. Integer: tinyint, smallint, integer, bigint. Floating point: real, float, double. Decimal: N/A. String: char, varchar, wchar, varwchar, longvarchar, longvarwchar. Binary: binary, varbinary, longvarbinary. Date/Time: date, time, timestamp. Boolean: bit. Other: N/A.
RDM Server: Static. Integer: tinyint, smallint, integer, bigint. Floating point: real, float, double. Decimal: decimal, numeric. String: char, varchar, wchar, varwchar, longvarchar, longvarwchar. Binary: binary, varbinary, longvarbinary. Date/Time: date, time, timestamp. Boolean: bit. Other: rowid.
SQLite: Dynamic. Integer: INTEGER (64-bit). Floating point: REAL (aka FLOAT, DOUBLE) (64-bit). Decimal: N/A. String: TEXT (aka CHAR, CLOB). Binary: BLOB. Date/Time: N/A. Boolean: N/A. Other: N/A.
UniData: Dynamic. All other categories: N/A.
UniVerse: Dynamic. All other categories: N/A.
Xeround Cloud Database: Static. Integer: TINYINT (8-bit), SMALLINT (16-bit), MEDIUMINT (24-bit), INT (32-bit), BIGINT (64-bit). Floating point: FLOAT (32-bit), DOUBLE (aka REAL) (64-bit). Decimal: DECIMAL. String: CHAR, BINARY, VARCHAR, VARBINARY, TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT. Binary: TINYBLOB, BLOB, MEDIUMBLOB, LONGBLOB. Date/Time: DATETIME, DATE, TIMESTAMP, YEAR. Boolean: BOOLEAN (aka BOOL) = synonym for TINYINT. Other: ENUM, SET.
Other objects
Information about what other objects are supported natively.
Data Domain | Cursor | Trigger | Function | Procedure | External routine
DB2: Yes via CHECK CONSTRAINT | Yes | Yes | Yes | Yes | Yes
Microsoft Access (JET): Yes | No | Yes, but single DML/DDL operation | No | No | Yes
Microsoft SQL Server Compact (Embedded Database): No | Yes | No | No | No | No
Note (1): Both function and procedure refer to internal routines written in SQL and/or a procedural language like
PL/SQL. External routine refers to one written in a host language, such as C, Java, Cobol, etc. "Stored procedure" is
a commonly used term for these routine types. However, its definition varies between different database vendors.
Note (2): In Derby, H2, LucidDB, and CUBRID, users code functions and procedures in Java.
Note (3): An ENUM datatype exists. The CHECK clause is parsed, but not enforced at runtime.
Note (4): In Drizzle the user codes functions and procedures in C++.
Note (5): Informix supports external functions written in Java, C, & C++.
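The distinction in Note (1) between internal and external routines can be illustrated with SQLite, where an external routine is simply a host-language function registered with the engine; the function name here is invented for illustration:

```python
import sqlite3

# An "external routine" in the sense of Note (1): a function written in
# the host language (Python) and registered with the database engine,
# after which it is callable from SQL like any built-in function.
def reverse_text(s):
    return s[::-1]

con = sqlite3.connect(":memory:")
con.create_function("reverse_text", 1, reverse_text)  # name, nargs, callable
result = con.execute("SELECT reverse_text('stored')").fetchone()[0]
print(result)   # 'derots'
```

Engines such as Informix or DB2 do the same thing at larger scale, loading compiled C or Java routines into the server rather than a callback in the client process.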
Partitioning
Information about what partitioning methods are supported natively.
4th Dimension: ? | ? | ? | ? | ?
ADABAS: ? | ? | ? | ? | ?
Adaptive Server Enterprise: Yes | Yes | No | Yes | ?
Advantage Database Server: No | No | No | No | ?
Apache Derby: No | No | No | No | ?
Clustrix: Yes | No | No | No | No
EXASolution: No | Yes | No | No | No
Firebird: No | No | No | No | ?
HSQLDB: No | No | No | No | ?
H2: No | No | No | No | ?
InterBase: No | No | No | No | ?
Linter SQL RDBMS: No | No | No | No | ?
MaxDB: No | No | No | No | ?
Microsoft Access (JET): No | No | No | No | ?
Microsoft Visual Foxpro: No | No | No | No | ?
Microsoft SQL Server: Yes | No | No | No | ?
Microsoft SQL Server Compact (Embedded Database): No | No | No | No | ?
OpenBase SQL: ? | ? | ? | ? | ?
Pervasive PSQL: No | No | No | No | No
Polyhedra DBMS: No | No | No | No | No
RDM Server: No | No | No | No | ?
ScimoreDB: No | Yes | No | No | ?
SQL Anywhere: No | No | No | No | ?
SQLite: No | No | No | No | ?
Note (1): PostgreSQL 8.1 provides partitioning support through check constraints. Range, List and Hash methods
can be emulated with PL/pgSQL or other procedural languages.
Note (2): RDM Embedded 10.1 requires the application programs to select the correct partition (using range, hash or
composite techniques) when adding data, but the database union functionality allows all partitions to be read as a
single database.
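Notes (1) and (2) describe emulating partitioning where it is not native. A minimal sketch in Python's sqlite3 module, in the same spirit: one table per range, CHECK constraints to police the ranges, and a view to read all partitions as one (table names invented for illustration):

```python
import sqlite3

# Emulated range partitioning: one physical table per year, guarded by a
# CHECK constraint, plus a UNION ALL view that reads all partitions as
# a single logical table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events_2012 (id INTEGER, year INTEGER CHECK (year = 2012));
CREATE TABLE events_2013 (id INTEGER, year INTEGER CHECK (year = 2013));
CREATE VIEW events AS
    SELECT * FROM events_2012 UNION ALL SELECT * FROM events_2013;
""")

def insert_event(event_id, year):
    # The application routes each row to the correct partition, much as
    # Note (2) says RDM Embedded requires.
    con.execute(f"INSERT INTO events_{year} VALUES (?, ?)", (event_id, year))

insert_event(1, 2012)
insert_event(2, 2013)
count = con.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)   # 2
```

The CHECK constraints reject rows inserted into the wrong partition, which is what lets an optimizer with constraint exclusion (as in PostgreSQL 8.1) skip irrelevant partitions.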
Access control
Information about access control functionalities (work in progress).
Native network encryption^1 | Brute-force protection | Enterprise directory compatibility | Password complexity rules^2 | Patch access^3 | Run unprivileged^4 | Audit | Resource limit | Separation of duties (RBAC)^5 | Security Certification | Label Based Access Control (LBAC)
4D: Yes (with SSL) | ? | ? | ? | ? | ? | ? | ? | ? | ? | ?
Adaptive Server Enterprise: Yes (optional?) | Partial (need to register; depends on which product) | Yes (optional; to pay) | Yes | Yes | Yes | Yes | Yes | Yes | Yes (EAL4+^1) | ?
Advantage Database Server: Yes | No | No | No | Yes | Yes | No | No | Yes | ? | ?
Empress Embedded Database: No | No | Yes | Yes | Yes | No | Yes | No | ? | ? | ?
Firebird: Partial (no security page) | No | Yes | Yes (Windows trusted authentication) | No | Yes | No | No | No^7 | ? | ?
Microsoft SQL Server Compact (Embedded Database): No (not relevant, only file permissions) | No (not relevant) | No (not relevant) | No (not relevant) | Yes (file access) | Yes | Yes | Yes | No | ? | ?
OpenBase SQL: ? | Yes (Open Directory, LDAP) | Yes | No | ? | ? | ? | ? | ? | ? | ?
OpenLink Virtuoso: Yes (optional) | Yes (optional) | Yes (optional) | Yes | Yes | Yes | Yes (optional) | Yes | Yes | No | ?
Oracle: ? | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes (EAL4+^1) | ?
Pervasive PSQL: Yes | No | No | Yes | Yes | Yes^12 | No | No | No | ? | ?
Polyhedra DBMS: No | No | No | No | No | Yes | Yes^13 | Yes | Yes^13 | No | ?
RDM Embedded: No | No | No | No | No | Yes | No | No | No | No | ?
SQL Anywhere: Yes | Yes (Kerberos) | Yes | Yes | Yes | No | Yes | Yes (EAL3+^1 as Adaptive Server Anywhere) | ? | ? | ?
Xeround Cloud Database: Yes (SSL with 4.0) | No | No | No | Yes | No | No | No | No | N/A (database as a service) | ?
Note (1): Network traffic could be transmitted in a secure way (not clear-text; in general, SSL encryption). Specify
whether the option is the default, an included option, or an extra module to buy.
Note (2): Options are present to set a minimum size for passwords and to enforce complexity, like the presence of
numbers or special characters.
Note (3): How do you get security updates? Is it free access, do you need a login, or do you have to pay? Is there
easy access through a Web/FTP portal or RSS feed, or only through offline access (mail CD-ROM, phone)?
Note (4): Does the database process run as root/administrator or as an unprivileged user? What is the default
configuration?
Note (5): Is there a separate user to manage special operations like backup (only dump/restore permissions), security
officer (audit), administrator (add user/create database), etc.? Is it default or optional?
Note (6): Common Criteria certified product list.
Note (7): Firebird SQL seems to only have a SYSDBA user and DB owner. There are no separate roles for backup
operator and security administrator.
Note (8): User can define a dedicated backup user but nothing particular in the default install.
Note (9): Authentication methods.
Note (10): Informix Dynamic Server supports PAM and other configurable authentication. By default it uses OS
authentication.
Note (11): Authentication methods.
Note (12): With the use of Pervasive AuditMaster.
Note (13): User-based security is optional in Polyhedra, but when enabled it can be enhanced to a role-based model
with auditing.
Databases vs schemas (terminology)
The SQL specification makes clear what an "SQL schema" is; however, different databases implement it incorrectly.
To compound this confusion, the functionality can, when incorrectly implemented, overlap with that of the
parent database. An SQL schema is simply a namespace within a database; things within this namespace are
addressed using the member operator dot ".". This seems to be universal among all of the implementations.
A true fully (database, schema, and table) qualified query is exemplified as such: SELECT * FROM
database.schema.table
Now the issue: both a schema and a database can be used to isolate one table, "foo", from another like-named table
"foo". The following is pseudo code:
• SELECT * FROM db1.foo vs. SELECT * FROM db2.foo (no explicit schema between db and table)
• SELECT * FROM [db1.]default.foo vs. SELECT * FROM [db1.]alternate.foo (no explicit db prefix)
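The pseudo code above can be made concrete with SQLite, whose ATTACH command gives each database file a schema-like qualifier; the name "db2" and the table contents are invented for illustration:

```python
import sqlite3

# Two like-named tables "foo", isolated by a qualifier. The initial
# connection is the "main" database; ATTACH adds a second one under
# the name db2, so main.foo and db2.foo address different tables.
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS db2")
con.execute("CREATE TABLE main.foo (v TEXT)")
con.execute("CREATE TABLE db2.foo (v TEXT)")
con.execute("INSERT INTO main.foo VALUES ('from main')")
con.execute("INSERT INTO db2.foo VALUES ('from db2')")
a = con.execute("SELECT v FROM main.foo").fetchone()[0]
b = con.execute("SELECT v FROM db2.foo").fetchone()[0]
print(a, "|", b)   # from main | from db2
```

Whether such a qualifier is called a "database" (MySQL, SQLite) or a "schema" (PostgreSQL, Oracle) is precisely the terminology dispute this section describes.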
The problem that arises is that former MySQL users will create multiple databases for one project. In this context,
MySQL databases are analogous in function to Postgres schemas, insomuch as Postgres lacks the off-the-shelf
cross-database functionality that MySQL has. Conversely, PostgreSQL has applied more of the specification,
implementing cross-table and cross-schema functionality, and has left room for future cross-database functionality.
MySQL aliases schema with database behind the scenes, such that CREATE SCHEMA and CREATE DATABASE
are analogs. It can therefore be said that MySQL has implemented cross-database functionality, skipped schema
functionality entirely, and provided similar functionality in its implementation of a database. In summary, Postgres
fully supports schemas but lacks some functionality MySQL has with databases, while MySQL does not even
attempt to support true schemas.
Oracle has its own spin where creating a user is synonymous with creating a schema. Thus a database administrator
can create a user called PROJECT and then create a table PROJECT.TABLE. Users can exist without schema
objects, but an object is always associated with an owner (though that owner may not have privileges to connect to
the database). With the Oracle 'shared-everything' RAC architecture, the same database can be opened by multiple
servers concurrently. This is independent of replication, which can also be used, whereby the data is copied for use by different servers. In the Oracle view, the 'database' is a set of files which contains the data, while the 'instance' is a set
of processes (and memory) through which a database is accessed.
Informix supports multiple databases in a server instance, like MySQL. It supports the CREATE SCHEMA syntax as a way to group DDL statements into a single unit, with all objects created as part of the schema owned by a single owner. Informix supports a database mode called ANSI mode which supports creating objects with the same name but owned by different users.
The end result is confusion between the database factions. The Postgres and Oracle communities maintain that one
database is all that is needed for one project, per the definition of database. MySQL and Informix proponents
maintain that schemas have no legitimate purpose when the functionality can be achieved with databases. Postgres
adheres to the SQL specification in a more intuitive fashion (bottom-up), while MySQL's pragmatic counterargument allows its users to get the job done while creating conceptual confusion.
References
[1] hsqldb (http://sourceforge.net/projects/hsqldb/files/hsqldb/hsqldb_2_2/)
[2] Run Apache, Mysql, Php – Web server on Android mobile or Tablet (http://techotv.com/run-apache-mysql-php-http-web-server-android-os-phone-tablet/)
[3] http://www.oss4zos.org/mediawiki/index.php?title=PostgreSQL#z.2FOS
[4] Transactional DDL in PostgreSQL: A Competitive Analysis (http://wiki.postgresql.org/wiki/Transactional_DDL_in_PostgreSQL:_A_Competitive_Analysis)
[5] SQLite Full Unicode support is optional and not installed by default in most systems (like Android, Debian…) (http://www.sqlite.org/faq.html#q18)
[6] http://grokbase.com/t/postgresql/pgsql-general/12bsww982c/large-insert-leads-to-invalid-memory-alloc
[7] http://www.postgresql.org/docs/9.3/static/lo-intro.html
[8] The SQLite R*Tree Module (http://www.sqlite.org/rtree.html)
[9] SQLite Partial Indexes (http://sqlite.org/partialindex.html)
[10] SQLite FTS3 Extension (http://www.sqlite.org/fts3.html)
[11] geospatial
[12] How does Drizzle handle parallel "things"? (https://answers.launchpad.net/drizzle/+question/135548)
[13] New Features in HyperSQL 2.2 (http://hsqldb.org/web/features200.html)
[14] H2 > Advanced > Recursive Queries (http://h2database.com/html/advanced.html#recursive_queries)
[15] H2 Roadmap (http://www.h2database.com/html/roadmap.html)
[16] Informix parallel data query (PDQ) (http://portal.acm.org/citation.cfm?id=382443)
External links
• Comparison of different SQL implementations against SQL standards (http://troels.arvin.dk/db/rdbms/).
Includes Oracle, DB2, Microsoft SQL Server, MySQL and PostgreSQL. (08/Jun/2007)
• Features, strengths and weaknesses comparison between Oracle and MS SQL (independent) (http://www.wisdomforce.com/resources/docs/MSSQL2005_ORACLE10g_compare.pdf)
• The SQL92 standard (http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt)
• MetaMarket Druid IMDB (http://metamarkets.com/druid/)
• VMware Redis IMDB (http://redis.io/)
• CSQL DB (http://www.csqldb.com)
Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing
document-oriented information, also known as semi-structured data. Document-oriented databases are one of the
main categories of so-called NoSQL databases, and the popularity of the term "document-oriented database" (or "document store") has grown[citation needed] with the use of the term NoSQL itself. In contrast to well-known relational databases and their notions of "relations" (or "tables"), these systems are designed around an abstract notion of a "document".
Documents
The central concept of a document-oriented database is the notion of a Document. While each document-oriented
database implementation differs on the details of this definition, in general, they all assume documents encapsulate
and encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML,
JSON, and BSON, as well as binary forms like PDF and Microsoft Office documents (MS Word, Excel, and so on).
Documents inside a document-oriented database are similar, in some ways, to records or rows in relational databases, but they are less rigid. They are not required to adhere to a standard schema, nor will they have all the same sections, slots, parts, or keys. For example, the following is a document:
{
    FirstName: "Bob",
    Address: "5 Oak St.",
    Hobby: "sailing"
}
A second document might be:
{
    FirstName: "Jonathan",
    Address: "15 Wanamassa Point Road",
    Children: [
        {Name: "Michael", Age: 10},
        {Name: "Jennifer", Age: 8},
        {Name: "Samantha", Age: 5},
        {Name: "Elena", Age: 2}
    ]
}
These two documents share some structural elements with one another, but each also has unique elements. Unlike a relational database, where every record contains the same fields and unused fields are left empty, there are no empty 'fields' in either document (record) in the above example. This approach allows new information to be added to some records without requiring that every other record in the database share the same structure.
Keys
Documents are addressed in the database via a unique key that represents that document. This key is often a simple
string, a URI, or a path. The key can be used to retrieve the document from the database. Typically, the database
retains an index on the key to speed up document retrieval.
Retrieval
Another defining characteristic of a document-oriented database is that, beyond the simple key-document (or key-value)
lookup that can be used to retrieve a document, the database offers an API or query language that allows the user to
retrieve documents based on their content. For example, you may want a query that retrieves all the documents with
a certain field set to a certain value. The set of query APIs or query language features available, as well as the
expected performance of the queries, varies significantly from one implementation to the next.
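Both styles of access — key lookup and content-based retrieval — can be sketched with plain Python dictionaries. This is a toy model to illustrate the concepts, not any particular product's API:

```python
# A toy document store: unique key -> schema-less document.
store = {
    "doc1": {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    "doc2": {"FirstName": "Jonathan", "Address": "15 Wanamassa Point Road",
             "Children": [{"Name": "Michael", "Age": 10}]},
}

def get(key):
    """Key-document lookup, typically backed by an index on the key."""
    return store[key]

def find(field, value):
    """Content-based retrieval: all documents whose field has the value.
    Documents lacking the field simply never match -- no empty fields."""
    return [k for k, doc in store.items() if doc.get(field) == value]

print(get("doc1")["Hobby"])           # sailing
print(find("FirstName", "Jonathan"))  # ['doc2']
```

Real document stores differ mainly in how `find` is made efficient (secondary indexes, query planners, MapReduce views) rather than in this basic shape.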
Organization
Implementations offer a variety of ways of organizing documents, including notions of
• Collections
• Tags
• Non-visible Metadata
• Directory hierarchies
• Buckets
Implementations
XML database implementations
Most XML databases are document-oriented databases.
References
[1] http://www.arangodb.org/
[2] http://www.triagens.com/
[3] ArangoDBRESTAPI(http://www.arangodb.org/manuals/current/ImplementorManual.html)
[4] http://basex.org/
[5] https://cloudant.com/
[6] http://www.clusterpoint.com
[7] Clusterpoint DBMS Licensing Options (http://www.clusterpoint.com/licensing/)
[8] Documentation (http://www.couchbase.com/docs/). Couchbase. Retrieved on 2013-09-18.
[9] CouchDB Overview (http://couchdb.apache.org/docs/overview.html)
[10] CouchDB Document API (http://wiki.apache.org/couchdb/HTTP_Document_API)
[11] http://exist-db.org
[12] eXist-db Open Source Native XML Database (http://exist-db.org). Exist-db.org. Retrieved on 2013-09-18.
[13] http://fleetdb.org/
[14] http://fleetdb.org/docs/protocol.html
[15] http://developer.marklogic.com/licensing
[16] MongoDB License (http://www.mongodb.org/display/DOCS/Licensing)
[17] MongoDB REST Interfaces (http://www.mongodb.org/display/DOCS/Http+Interface#HttpInterface-RESTInterfaces)
[18] Extreme Database programming with MUMPS Globals (http://gradvs1.mgateway.com/download/extreme1.pdf)
[19] GT.M MUMPS FOSS on SourceForge (http://sourceforge.net/projects/fis-gtm/)
[20] http://www.orientechnologies.com/
[21] http://hibernatingrhinos.com
[22] Ravendb Licensing (http://ravendb.net/licensing)
[23] http://sqrrl.com/
Further reading
• Assaf Arkin (2007, September 20). Read Consistency: Dumb Databases, Smart Services (http://blog.labnotes.org/2007/09/20/read-consistency-dumb-databases-smart-services/). Labnotes: Don't let the bubble go to your head!
External links
• http://solprovider.com/articles/20020612&cat=Lotus/IBM
Graph database
A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. By definition, a graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent element and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases.
Structure
Graph databases are based on graph theory. Graph databases employ nodes, properties, and edges. Nodes are very
similar in nature to the objects that object-oriented programmers will be familiar with.
Nodes represent entities such as people, businesses, accounts, or any other item you might want to keep track of.
Properties are pertinent information that relate to nodes. For instance, if "Wikipedia" were one of the nodes, one
might have it tied to properties such as "website", "reference material", or "word that starts with the letter 'w'",
depending on which aspects of "Wikipedia" are pertinent to the particular database.
Edges are the lines that connect nodes to nodes or nodes to properties, and they represent the relationship between the two. Most of the important information is really stored in the edges. Meaningful patterns emerge when one examines the connections and interconnections of nodes, properties, and edges.
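The node/property/edge model can be sketched in a few lines: each node carries a property dict, and each edge is a labeled link stored directly on its source node, so following a relationship needs no index lookup — the "index-free adjacency" defined above. This is an illustrative toy, not any product's data model:

```python
# Minimal property graph: nodes with properties, labeled edges between them.
class Node:
    def __init__(self, **properties):
        self.properties = properties
        self.edges = []  # outgoing (label, target-node) pairs

    def connect(self, label, other):
        # Store a direct pointer to the adjacent node: index-free adjacency.
        self.edges.append((label, other))

    def neighbors(self, label):
        return [n for lbl, n in self.edges if lbl == label]

wikipedia = Node(name="Wikipedia", kind="website")
alice = Node(name="Alice")
alice.connect("reads", wikipedia)

# Traversal follows direct pointers rather than join or index lookups.
print([n.properties["name"] for n in alice.neighbors("reads")])  # ['Wikipedia']
```

Note how the relationship itself ("reads") carries the meaning, consistent with the point above that much of the important information lives in the edges.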
Properties
Compared with relational databases, graph databases are often faster for associative data sets[citation needed], and map
more directly to the structure of object-oriented applications. They can scale more naturally to large data sets as they
do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable to
manage ad hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at
performing the same operation on large numbers of data elements.
Graph databases are a powerful tool for graph-like queries, for example computing the shortest path between two
nodes in the graph. Other graph-like queries can be performed over a graph database in a natural way (for example, computing a graph's diameter or performing community detection).
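A shortest-path query of the kind mentioned above is a plain breadth-first traversal over the adjacency structure. A sketch, using a small hypothetical social graph as the data:

```python
from collections import deque

# Adjacency-list graph (undirected; a hypothetical social network).
graph = {
    "Ann": ["Bob", "Cal"],
    "Bob": ["Ann", "Dee"],
    "Cal": ["Ann", "Dee"],
    "Dee": ["Bob", "Cal", "Eve"],
    "Eve": ["Dee"],
}

def shortest_path(start, goal):
    """Breadth-first search: returns one shortest node sequence, or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("Ann", "Eve"))  # e.g. ['Ann', 'Bob', 'Dee', 'Eve']
```

In a graph database the inner loop would follow direct edge pointers instead of dictionary lookups, which is why such traversals avoid the join costs a relational system would incur.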
Graph database projects
The following is a list of several well-known graph database projects:[1]
• Horton[13] (proprietary; C#) - a graph database from the Microsoft Research Extreme Computing Group (XCG)[14], based on the cloud programming infrastructure Orleans[15].
• HyperGraphDB[16] 1.2 (2012) (LGPL; Java) - a graph database supporting generalized hypergraphs, where edges can point to other edges.
• InfiniteGraph[17] 3.0 (January 2013) (GPLv3; Java) - a distributed and cloud-enabled commercial product with flexible licensing.
• OpenLink Virtuoso - an RDF graph database server, deployable as a local embedded instance (as used in the Nepomuk Semantic Desktop), a single-instance network server, or a shared-nothing network cluster instance.
• Ontotext OWLIM[28] 5.3 (OWLIM Lite is free; OWLIM SE and Enterprise are commercially licensed; Java) - a graph database engine based entirely on Semantic Web standards from W3C: RDF, RDFS, OWL, SPARQL. OWLIM Lite is an "in-memory" engine, OWLIM SE is a robust standalone database engine, and OWLIM Enterprise is a clustered version which offers horizontal scalability, failover support, and other enterprise features.
• R2DF[29] - a framework for ranked path queries over weighted RDF graphs.
Graph database features
The following compares the features of the above graph databases:
• Bigdata[3]
• Filament[10]
• Graphd
• InfoGrid[18] - dynamically typed, object-oriented graph, multigraphs, semantic models
• jCoreDBGraph[20]
• OpenLink Virtuoso
• OQGRAPH[27]
• R2DF[29]
• ROIS[30]
• sones GraphDB
• VertexDB[37]
Distributed Graph Processing
• Angrapa[46] - graph package in Hama[47], a bulk synchronous parallel (BSP) platform.
• Apache Hama[47] - a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph and network algorithms.
• Bigdata[3] - an RDF/graph database capable of clustered deployment.
• Faunus[48] - a Hadoop-based graph computing framework that uses Gremlin as its query language. Faunus provides connectivity to Titan, Rexster-fronted graph databases, and to text/binary graph formats stored in HDFS. Faunus is developed by Aurelius[34].
• FlockDB - an open source distributed, fault-tolerant graph database based on MySQL and the Gizzard framework for managing Twitter-like graph data (single-hop relationships). FlockDB on GitHub[49].
• Giraph[50] - a graph processing infrastructure that runs on Hadoop (see Pregel).
• GraphBase[51] - Enterprise Edition supports embedding of callable Java Agents within the vertices of a distributed graph.
• GoldenOrb[52] - Pregel implementation built on top of Apache Hadoop.
• GraphLab[53] - a framework for machine learning and data mining in the cloud.
• HipG[54] - a library for high-level parallel processing of large-scale graphs. HipG is implemented in Java and is designed for distributed-memory machines.
• InfiniteGraph[17] - a commercially available distributed graph database that supports parallel load and parallel queries.
• JPregel[55] - in-memory Java-based Pregel implementation.
• KDT[56] - an open-source distributed graph library with a Python front-end and C++/MPI backend (Combinatorial BLAS[57]).
• OpenLink Virtuoso - the shared-nothing Cluster Edition supports distributed graph data processing.
• Oracle Spatial and Graph[25] - loading, inferencing, and querying workloads are automatically and transparently distributed across the nodes in an Oracle Real Application Cluster, Oracle Exadata Database Machine, and Oracle Database Appliance.
• Phoebus[58] - Pregel implementation written in Erlang.
• Pregel[59] - Google's internal graph processing platform, details released in an ACM paper.
• Powergraph[60] - distributed graph-parallel computation on natural graphs.
• Sedge[61] - a framework for distributed large graph processing and graph partition management (including an open source version of Google's Pregel).
• Signal/Collect[62] - a framework for parallel graph processing written in Scala.
• Sqrrl Enterprise - distributed graph processing utilizing Apache Accumulo and featuring cell-level security, massive scalability, and JSON support.
• Titan[45] - a distributed, disk-based graph database developed by Aurelius[34].
• Trinity[63] - distributed in-memory graph engine under development at Microsoft Research Labs.
• Parallel Boost Graph Library (PBGL)[64] - a C++ library for graph processing on distributed machines, part of the Boost framework.
• Mizan[65] - an optimized Pregel clone that can be deployed easily on Amazon EC2, local clusters, or stand-alone Linux systems.
APIs and Graph Query/Programming Languages
• Bounds Language[66] - terse C-style syntax which initiates concurrent traversals in GraphBase and supports interaction between them.
• Blueprints[67] - a Java API for property graphs from TinkerPop[68], supported by a few graph database vendors.
• Blueprints.NET[69] - a C#/.NET API for generic property graphs.
• Bulbflow[70] - a Python persistence framework for Rexster, Titan, and Neo4j Server.
• Cypher[71] - a declarative graph query language for Neo4j that enables ad hoc as well as programmatic (SQL-like) access to the graph.
• Gremlin[72] - an open-source graph programming language that works over various graph database systems.
• Neo4jClient[73] - a .NET client for accessing Neo4j.
• Neography[74] - a thin Ruby wrapper that provides access to Neo4j via REST.
• Neo4jPHP[75] - a PHP library wrapping the Neo4j graph database.
• Node-Neo4j[76] - a Node.js driver for Neo4j that provides access to Neo4j via REST.
• Pacer[77] - a Ruby dialect/implementation of the Gremlin graph traversal language.
• Pipes[78] - a lazy dataflow framework written in Java that forms the foundation for various property graph traversal languages.
• Pixy[39] - a declarative graph query language that works on any Blueprints-compatible graph database.
• PYBlueprints[79] - a Python API for property graphs.
• Pygr[80] - a Python API for large-scale analysis of biological sequences and genomes, with alignments represented as graphs.
• Rexster[81] - a graph database server that provides a REST or binary protocol API (RexPro). Supports Titan, Neo4j, OrientDB, Dex, and any TinkerPop/Blueprints-enabled graph.
• SPARQL - a query language for databases, able to retrieve and manipulate data stored in Resource Description Framework format.
• SPASQL[82] - an extension of the SQL standard, allowing execution of SPARQL queries within SQL statements, typically by treating them as subquery or function clauses. This also allows SPARQL queries to be issued through "traditional" data access APIs (ODBC, JDBC, OLE DB, ADO.NET, etc.)
• Spring Data Neo4j[83] - an extension to Spring Data[84] (part of the Spring Framework), providing direct/native access to Neo4j.
• Oracle SQL and PL/SQL APIs[25] - have graph extensions for Oracle Spatial and Graph.
• Styx[85] (previously named Pipes.Net) - a dataflow framework for C#/.NET for processing generic graphs and property graphs.
• Thunderdome[86] - a Titan Rexster Object-Graph Mapper for Python.
References
[1] http://graph-database.org
[2] http://www.arangodb.org
[3] http://www.bigdata.com/blog
[4] http://bitbucket.org/lambdazen/bitsy
[5] http://www.brightstardb.com
[6] http://brightstardb.com/blog/2013/02/brightstardb-goes-open-source/
[7] http://sparsity-technologies.com/dex
[8] http://sparsity-technologies.com
[9] http://www.dama.upc.edu/technology-transfer/dex
[10] http://filament.sourceforge.net/
[11] http://graphbase.net/
[12] http://factnexus.com/
[13] http://research.microsoft.com/en-us/projects/ldg
[14] http://research.microsoft.com/en-us/labs/xcg
[15] http://research.microsoft.com/en-us/projects/orleans/default.aspx
[16] http://www.hypergraphdb.org
[17] http://infinitegraph.com
[18] http://infogrid.org/
[19] http://infogrid.org/wiki/Docs/License
[20] http://www.jcoredb.org
[21] http://www.neo4j.org/download
[22] neo4j.org (http://www.neo4j.org)
[23] Neo4j, World's Leading Graph Database (http://www.neotechnology.com/neo4j-graph-database/). Retrieved September 16, 2013.
[24] DB-Engines Ranking of Graph DBMS (http://db-engines.com/en/ranking/graph+dbms). Retrieved July 19, 2013.
[25] http://www.oracle.com/technetwork/database-options/spatialandgraph/overview/index.html
[26] http://www.oracle.com/technetwork/products/nosqldb/overview/index.html
[27] http://openquery.com/graph
[28] http://www.ontotext.com/owlim
[29] http://dl.acm.org/citation.cfm?id=1988736/
[30] http://rois.eggbird.eu/
[31] http://sones.com/
[32] http://www.asterdata.com/
[33] http://titan.thinkaurelius.com/
[34] http://thinkaurelius.com
[35] http://www.VelocityGraph.com
[36] http://www.VelocityDB.com
[37] http://www.dekorte.com/projects/opensource/vertexdb/
[38] https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
[39] https://github.com/lambdazen/pixy/wiki
[40] http://sparsity-technologies.com/dex
[41] http://graphbase.net/Enterprise.html/
[42] http://graphbase.net/Agility.html/
[43] http://sqrrl.com
[44] http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
[45] http://thinkaurelius.github.com/titan/
[46] http://wiki.apache.org/hama/GraphPackage
[47] http://incubator.apache.org/hama/
[48] http://thinkaurelius.github.com/faunus/
[49] https://github.com/twitter/flockdb
[50] http://incubator.apache.org/giraph/
[51] http://graphbase.net/Enterprise.html
[52] http://www.goldenorbos.org
[53] http://graphlab.org
[54] http://www.cs.vu.nl/~ekr/hipg/
[55] http://kowshik.github.com/JPregel/
[56] http://kdt.sourceforge.net
[57] http://gauss.cs.ucsb.edu/~aydin/CombBLAS/html/index.html
[58] http://github.com/xslogic/phoebus
[59] http://portal.acm.org/citation.cfm?id=1582723
[60] http://graphlab.org/powergraph-presented-at-osdi/
[61] http://grafia.cs.ucsb.edu/sedge/
[62] http://code.google.com/p/signal-collect/
[63] http://research.microsoft.com/en-us/projects/trinity/
[64] http://www.boost.org/doc/libs/1_51_0/libs/graph_parallel/doc/html/index.html
[65] http://thegraphsblog.wordpress.com/the-graph-blog/mizan/
[66] http://graphbase.net/JavaAPIHelp.html#BoundsLanguage
[67] http://blueprints.tinkerpop.com
[68] http://www.tinkerpop.com/
[69] https://github.com/Vanaheimr/Blueprints.NET
[70] http://bulbflow.com
[71] http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html
[72] http://gremlin.tinkerpop.com/
[73] http://hg.readify.net/neo4jclient
[74] https://github.com/maxdemarzi/neography/
[75] https://github.com/jadell/neo4jphp/wiki
[76] https://github.com/thingdom/node-neo4j
[77] http://github.com/pangloss/pacer
[78] http://pipes.tinkerpop.com
[79] http://pypi.python.org/pypi/pyblueprints/0.1
[80] http://code.google.com/p/pygr/
[81] http://rexster.tinkerpop.com
[82] http://www.w3.org/wiki/SPASQL
[83] http://www.springsource.org/spring-data/neo4j
[84] http://www.springsource.org/spring-data
[85] https://github.com/ahzf/Styx
[86] https://github.com/StartTheShift/thunderdome
External links
• NoSQL Frankfurt 2010 - The GraphDB Landscape and sones (http://www.slideshare.net/ahzf/nosql-frankfurt-2010-the-graphdb-landscape-and-sones)
• Graph Databases and the Future of Large-Scale Knowledge Management (http://highscalability.com/paper-graph-databases-and-future-large-scale-knowledge-management)
• Graphs in the database: SQL meets social networks (http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/)
• Social networks in the database: using a graph database (http://blog.neo4j.org/2009/09/social-networks-in-database-using-graph.html)
• Scaling Online Social Networks without Pains (http://netdb09.cis.upenn.edu/netdb09papers/netdb09-final3.pdf)
• Large-scale Graph Computing at Google (http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html)
• Eric Lai (2009, July 1). No to SQL? Anti-database movement gains steam (http://www.computerworld.com/s/article/9135086/No_to_SQL_Anti_database_movement_gains_steam_)
• Renzo Angles, Claudio Gutierrez. Survey of graph database models (http://portal.acm.org/citation.cfm?id=1322433). ACM Computing Surveys, Feb. 2008.
• InfoGrid (http://infogrid.org/) - an open-source application platform including a graph database.
• Rodriguez, M.A., Neubauer, P., The Graph Traversal Pattern (http://arxiv.org/abs/1004.1001) article.
• Optimizing Schema-Last Tuple-Store Queries in Graphd (http://portal.acm.org/citation.cfm?id=1807283). SIGMOD 2010.
NoSQL
A NoSQL database provides a mechanism for storage and retrieval of data that employs less constrained consistency models than traditional relational databases. Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. NoSQL databases are often highly optimized key–value stores intended for simple retrieval and appending operations, with the goal of significant performance benefits in terms of latency and throughput. NoSQL databases are seeing significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be used.
History
Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface. Strozzi suggests that, because the current NoSQL movement "departs from the relational model altogether", it should more appropriately have been called 'NoREL'.
Eric Evans (then a Rackspace employee) reintroduced the term NoSQL in early 2009 when Johan Oskarsson of
Last.fm wanted to organize an event to discuss open-source distributed databases. The name attempted to label the
emergence of a growing number of non-relational, distributed data stores that often did not attempt to provide
atomicity, consistency, isolation and durability guarantees that are key attributes of classic relational database
systems.
Taxonomy
There have been various approaches to classify NoSQL databases, each with different categories and subcategories.
Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational
databases. Nevertheless, the basic classification that most would agree on is based on data model. A few of these and
their prototypes are:
• Column: HBase, Accumulo
• Document: MongoDB, Couchbase
• Key-value: Dynamo, Riak, Redis, Cache, Project Voldemort
• Graph: Neo4J, Allegro, Virtuoso
Classification based on data model
Stephen Yen, in his blog post "NoSQL is a Horseless Carriage",[1] suggests the following:
• KV Store - Ordered: Tokyo Tyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
• KV Cache: Memcached, Repcached, Coherence, Hazelcast, Infinispan, eXtreme Scale, JBoss Cache, Velocity, Terracotta
Classification based on feature
Ben Scofield categorized NoSQL databases based on nonfunctional categories ("(il)ities") plus a rating of their feature coverage:[citation needed]
Examples
Document store
The central concept of a document store is the notion of a "document". While each document-oriented database
implementation differs on the details of this definition, in general, they all assume that documents encapsulate and
encode data (or information) in some standard formats or encodings. Encodings in use include XML, YAML, and
JSON as well as binary forms like BSON, PDF and Microsoft Office documents (MS Word, Excel, and so on).
Different implementations offer different ways of organizing and/or grouping documents:
• Collections
• Tags
• Non-visible Metadata
• Directory hierarchies
Compared to relational databases, collections could be considered analogous to tables, and documents analogous to records. But they are different: every record in a table has the same sequence of fields, while documents in a collection may have fields that are completely different.
Documents are addressed in the database via a unique key that represents that document. One of the other defining characteristics of a document-oriented database is that, beyond the simple key-document (or key–value) lookup that you can use to retrieve a document, the database will offer an API or query language that will allow retrieval of
documents based on their contents. Some NoSQL document stores offer an alternative way to retrieve information using MapReduce techniques. In CouchDB, the use of MapReduce is mandatory to retrieve documents based on their contents; such a query is called a "view" and is an indexed collection produced by the MapReduce algorithms.
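The view mechanism can be approximated in a few lines: a map function emits (key, value) pairs per document, and a reduce function folds the values for each key. Real systems index the emitted keys so the view is queried rather than recomputed. A simplified illustration with made-up documents, not CouchDB's actual API:

```python
from collections import defaultdict

# Hypothetical documents: blog posts with an author and a word count.
docs = [
    {"type": "post", "author": "ann", "words": 120},
    {"type": "post", "author": "bob", "words": 300},
    {"type": "post", "author": "ann", "words": 80},
]

def map_fn(doc):
    # Emit one (key, value) pair per document, as a CouchDB-style map would.
    yield doc["author"], doc["words"]

def reduce_fn(values):
    # Fold the values for one key; here, total words per author.
    return sum(values)

# Build the "view": group emitted pairs by key, then reduce each group.
groups = defaultdict(list)
for doc in docs:
    for key, value in map_fn(doc):
        groups[key].append(value)
view = {key: reduce_fn(vals) for key, vals in groups.items()}

print(view)  # {'ann': 200, 'bob': 300}
```

Querying the view by key is then an index lookup, which is why content queries in such systems stay fast even as the document set grows.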
• OpenLink Virtuoso (C++, C#, Java; SPARQL) - middleware and database engine hybrid
Graph
This kind of database is designed for data whose relations are well represented as a graph (elements interconnected
with an undetermined number of relations between them). The kind of data could be social relations, public transport
links, road maps or network topologies, for example.
• FlockDB (Scala)
• Neo4j (Java)
• OpenLink Virtuoso (C++, C#, Java; SPARQL) - middleware and database engine hybrid
• OrientDB (Java)
• sones GraphDB (C#)
• Sqrrl Enterprise (Java) - distributed, real-time graph database featuring cell-level security
• VelocityGraph[5] (C#) - fully Tinkerpop Blueprints[6] compliant; scalable hybrid object database and graph database
Key–value stores
Key–value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. The following types exist:
KV - eventually consistent
• Apache Cassandra
• Dynamo
• Hibari
• OpenLink Virtuoso
• Project Voldemort
• Riak
KV - hierarchical
• GT.M
• InterSystems Caché
KV - cache in RAM
• memcached
• OpenLink Virtuoso
• Hazelcast
• Oracle Coherence
KV - solid state or rotating disk
• Aerospike
• BigTable
• CDB
• Couchbase Server
• Keyspace
• LevelDB
• MemcacheDB (using Berkeley DB)
• MongoDB
• OpenLink Virtuoso
• phpFastCache
• Tarantool
• Tokyo Cabinet
• Tuple space
• Oracle NoSQL Database
• IBM WebSphere DataPower XC10 Appliance
KV - ordered
• Berkeley DB
• FoundationDB
• IBM Informix C-ISAM
• InfinityDB
• MemcacheDB
• NDBM
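What distinguishes the "ordered" stores listed above is that keys are kept sorted, so range scans (for example, all keys sharing a prefix) are cheap. A sketch using the standard `bisect` module; the key names are illustrative and this models no particular product:

```python
import bisect

class OrderedKV:
    """Toy ordered key-value store: a sorted key list enables range scans."""
    def __init__(self):
        self._keys, self._vals = [], []

    def put(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._vals[i] = value            # overwrite existing key
        else:
            self._keys.insert(i, key)        # keep keys sorted on insert
            self._vals.insert(i, value)

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        raise KeyError(key)

    def scan(self, lo, hi):
        """All (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_left(self._keys, hi)
        return list(zip(self._keys[i:j], self._vals[i:j]))

kv = OrderedKV()
for k, v in [("user:3", "Cal"), ("user:1", "Ann"), ("user:2", "Bob"), ("item:9", "lamp")]:
    kv.put(k, v)
print(kv.scan("user:", "user;"))  # all "user:" keys, in sorted order
```

Production ordered stores use B-trees or LSM-trees instead of a sorted array, but the interface — point get/put plus ordered range scan — is the same idea.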
Object database
• db4o
• GemStone/S
• InterSystems Caché
• JADE
• NeoDatis ODB
• ObjectDB
• Objectivity/DB
• ObjectStore
• ODABA
• OpenLink Virtuoso
• Versant Object Database
• WakandaDB
• ZODB
Tabular
• Apache Accumulo
• BigTable
• Apache HBase
• Hypertable
• Mnesia
• OpenLink Virtuoso
Tuple store
• Apache River
• OpenLink Virtuoso
• Tarantool
Triple/Quad Store (RDF) database
• Meronymy SPARQL Database Server
• Virtuoso Universal Server
• Ontotext-OWLIM
• Apache JENA
• OracleNoSQLdatabase
Hosted
• Freebase
• OpenLink Virtuoso
• Datastore on Google App Engine
• Amazon DynamoDB
• Cloudant Data Layer (CouchDB)
Multivalue databases
• Northgate Information Solutions Reality, the original Pick/MV Database
• Extensible Storage Engine (ESE/NT)
• OpenQM
• Revelation Software's OpenInsight
• Rocket U2
• D3 Pick database
• InterSystems Caché
• InfinityDB
Cell database
• Boardwalk
References
[1] A Yes for a NoSQL Taxonomy (http://highscalability.com/blog/2009/11/5/a-yes-for-a-nosql-taxonomy.html). High Scalability (2009-11-05). Retrieved on 2013-09-18.
[2] The enterprise class NoSQL database (http://djondb.com). djondb. Retrieved on 2013-09-18.
[3] http://tinman.cs.gsu.edu/~raj/8711/sp13/djondb/Report.pdf
[4] Undefined Blog: Meeting with DjonDB (http://undefvoid.blogspot.com/2013/03/meeting-with-djondb.html). Undefvoid.blogspot.com. Retrieved on 2013-09-18.
[5] https://github.com/VelocityDB/VelocityGraph
[6] https://github.com/Loupi/Frontenac
Further reading
• Pramod Sadalage and Martin Fowler (2012). NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Addison-Wesley. ISBN 0-321-82662-0.
• Christof Strauch (2012). "NoSQL Databases" (http://www.christof-strauch.de/nosqldbs.pdf).
• Moniruzzaman A B, Hossain S A (2013). "NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison" (http://arxiv.org/abs/1307.0191).
• Kai Orend (2013). Analysis and Classification of NoSQL Databases and Evaluation of their Ability to Replace an Object-relational Persistence Layer (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.184.483&rep=rep1&type=pdf).
• Ganesh Krishnan, Sarang Kulkarni, Dharmesh Kirit Dadbhawala. "Method and system for versioned sharing, consolidating and reporting information" (https://www.google.com/patents/US7383272?pg=PA1&dq=ganesh+krishnan&hl=en&sa=X).
External links
• Christoph Strauch. "NoSQL whitepaper" (http://www.christof-strauch.de/nosqldbs.pdf). Hochschule der Medien, Stuttgart.
• Martin Fowler. "NoSQL Guide" (http://martinfowler.com/nosql.html).
• Stefan Edlich. "NoSQL database List" (http://nosql-database.org/).
• Peter Neubauer (2010). "Graph Databases, NOSQL and Neo4j" (http://www.infoq.com/articles/graph-nosql-neo4j).
• Sergey Bushik (2012). "A vendor-independent comparison of NoSQL databases: Cassandra, HBase, MongoDB, Riak" (http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html). NetworkWorld.
NewSQL
NewSQL is a class of modern relational database management systems that seek to provide the same scalable
performance as NoSQL systems for online transaction processing (read-write) workloads while still maintaining the
ACID guarantees of a traditional database system.
History
The term was first used by 451 Group analyst Matthew Aslett in a 2011 research paper discussing the rise of new
database systems as challengers to established vendors. Many enterprise systems that handle high-profile data (e.g.,
financial and order processing systems) also need to be able to scale but are unable to use NoSQL solutions because
they cannot give up strong transactional and consistency requirements. The only options previously available to
these organizations were to either purchase a more powerful single-node machine or develop custom middleware that
distributes queries over traditional DBMS nodes. Both approaches are prohibitively expensive and thus are not an
option for many. In the paper, Aslett discusses how NewSQL upstarts are poised to challenge the supremacy
of commercial vendors, in particular Oracle.
Systems
Although NewSQL systems vary greatly in their internal architectures, the two distinguishing features common
amongst them are that they all support the relational data model and use SQL as their primary interface. One of the
first known NewSQL systems is the H-Store parallel database system.
NewSQL systems can be loosely grouped into three categories:
New architectures
The first type of NewSQL systems are completely new database platforms. These are designed to operate in a
distributed cluster of shared-nothing nodes, in which each node owns a subset of the data. Though many of the new
databases have taken different design approaches, two primary categories are evolving. The first type of system sends
the execution of transactions and queries to the nodes that contain the needed data: SQL queries are split into query
fragments and sent to the nodes that own the data. These databases are able to scale linearly as additional nodes are
added.
• General-purpose databases — These maintain the full functionality of traditional databases, handling all types of
queries. These databases are often written from scratch with a distributed architecture in mind, and include
components such as distributed concurrency control, flow control, and distributed query processing. Examples include
Google Spanner, Clustrix, NuoDB and TransLattice.
• In-memory databases — The applications targeted by these NewSQL systems are characterized as having a large
number of transactions that (1) are short-lived (i.e., no user stalls), (2) touch a small subset of data using index lookups
(i.e., no full table scans or large distributed joins), and (3) are repetitive (i.e., executing the same queries with
different inputs). These NewSQL systems achieve high performance and scalability by eschewing much of the
legacy architecture of the original System R design, such as heavyweight recovery or concurrency control
algorithms. Two example systems in this category are VoltDB and GoPivotal's SQLFire.
MySQL engines
The second category consists of highly optimized storage engines for SQL. These systems provide the same programming
interface as MySQL, but scale better than built-in engines such as InnoDB. Examples of these new storage engines
include TokuDB, MemSQL, and Akiban.
Transparentsharding
Thesesystemsprovideashardingmiddlewarelayertoautomaticallysplitdatabasesacrossmultiplenodes.Examples of this
type of system includes dbShards, Scalearc, ScaleBase and MySQL Cluster.
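The core idea of such a middleware layer can be sketched in a few lines: hash the key, pick a node, and route the operation there transparently. This is an illustrative toy (the `ShardedStore` class and node names are invented for the example, and plain dicts stand in for the underlying DBMS nodes), not how any of the listed products is implemented:

```python
import hashlib

class ShardedStore:
    """Sketch of a sharding middleware layer: a hash of the key picks the node."""
    def __init__(self, node_names):
        # Each dict stands in for a separate DBMS node
        self.shards = {name: {} for name in node_names}
        self.names = list(node_names)

    def _node_for(self, key):
        # Deterministic hash so the same key always routes to the same node
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.names[h % len(self.names)]

    def put(self, key, value):
        self.shards[self._node_for(key)][key] = value

    def get(self, key):
        return self.shards[self._node_for(key)].get(key)

# The application sees one store; the data is spread across three nodes
db = ShardedStore(["node-a", "node-b", "node-c"])
for i in range(100):
    db.put(f"order:{i}", {"id": i})
print(db.get("order:7"))
print({name: len(shard) for name, shard in db.shards.items()})
```

Real sharding middleware additionally handles rebalancing when nodes are added or removed (often via consistent hashing) and cross-shard queries, which this sketch omits.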
Article Sources and Contributors 235
ArticleSourcesandContributors
DatabaseSource: http://en.wikipedia.org/w/index.php?oldid=577356344Contributors: *drew, 05winsjp, 069952497a, 10285658sdsaa, 10metreh, 110808028 amol,
16@r,2001:558:6033:AE:4A:74EE:6356:A9B8,206.31.111.xxx,25or6to4,28421u2232nfenfcenc,28nebraska,2D,4twenty42o,65.10.163.xxx,APH,AaronBrenneman,Abhikumar1995,
Addihockey10, Aditya gopal3, Admfirepanther, Adrian J. Hunter, Aepanico, Afluegel, Ahodgkinson, Ahoerstemeier, Ahy1, Aitias, Aj.robin, Akamad, Al Wiseman, Alain Amiouni,
Alansohn,Alasdair, Ale jrb, Allan McInnes, Allecher, Alpha Quadrant (alt), Alphax, Alzpp, Amaraiel, Amaury, Amd628, Anders Torlind, Andonic, Andre Engels, Andrewferrier, AndriuZ,
Angela,Anikingos, AnjaliSinha, AnmaFinotera, Ann Stouter, AnonUser, Anonymous Dissident, Antandrus, Antrax, Apparition11, Arbitrarily0, Arcann, Arctic Kangaroo, Argon233,
Arjun01,Armen1304, ArnoLagrange, Arthena, Arthur Rubin, Arved, ArwinJ, AsceticRose, Asyndeton, AtheWeatherman, Atkinsdc, Autumn Wind, AutumnSnow, Avenged Eightfold, AwamerT,
Ayecee,AzaToth, Baa, Babbling.Brook, Barneca, Bbatsell, Bbb23, Bblank, Bcartolo, Bcontins, Beeblebrox, Beetstra, Beland, Ben Ben, Ben-Zin, Benni39, BentlijDB, Bentogoa, Bernd in Japan,
Beta M,Betterusername, Bharath357, BigPimpinBrah, Bjcubsfan, Bkhouser, Blanchardb, BlindEagle, Blood Red Sandman, BluCreator, Bluemask, Bluerocket, BobStepno, Bobblewik,
Bogdangiusca,Bogey97, Boing! said Zebedee, Boli1107, Bongwarrior, Bowlderizer, Branzman, Brick Thrower, BrokenSphere, BryanG, Btilm, Bubba hotep, Bunnyhop11, Burner0718, Buzzimu,
Bwhynot14,C12H22O11, CIreland, COMPFUNK2, CableCat, Calabe1992, Call me Bubba, Callanecc, Calliopejen1, Caltas, Calutuigor, Cambalachero, Cambapp, Cammo33, Camw, Can't sleep,
clown willeat me, CanisRufus, Canterbury Tail, Cantras, Capricorn42, Captain-n00dle, Captain-tucker, Carbonite, CardinalDan, Caster23, CasualVisitor, Cavanagh, Cenarium, CesarB, Cevalsi,
Ceyjan,Chaojoker, Cheolsoo, Chester Markel, Childzy, Chirpy, Chocolateboy, ChorizoLasagna, Chrax, Chris 73, Chris G, ChrisGualtieri, Chrislk02, Chrism, Christophe.billiottet, Chriswiki,
Chtuw,Chuckhoffmann,ChuunenBaka,Clarince63,Clark89,Click23,Closedmouth,Colindolly,ColoniesChris,Cometstyles,CommanderKeane,Compfreak7,Comps,Constructive,Conversionscript,Co
urcelles, Cpereyra, Cpl Syx, Cpuwhiz11, Craftyminion, Craig Stuntz, Crashdoom, Credema, Crucis, Cryptic, Culverin, Cyan, Cybercobra, Cyberjoac, CynicalMe, D. Recorder, DARTHSIDIOUS
2, DEddy, DFS454, DJ Clayworth, DVD R W, DVdm, DamnRandall, Dan100, Dancayta, Dancter, Danhash, Daniel.Cardenas, DanielCD, Danieljamesscott, Danim, Dart88, DarthMike, Darth
Panda, Darthvader023, Davewild, David Fuchs, David0811, Dbates1999, Dbfirs, DePiep, Dead3y3, DeadEyeArrow, DeadlyAssassin, Deathlasersonline,
Decrease789,DeirdreGerhardt,Denisarona,DerBorg,DerHexer,Deville,Dgw,Diamondland,DigitalEnthusiast,Discospinster,Djordjes,Djsasso,Dkastner,Dkwebsub,Docglasgow,Doddsy1993,Dogpos
ter,Donama, Doniago, Donner60, DougBarry, Dougofborg, Doulos Christos, DragonLord, Dreadstar, Dreamyshade, Drivenapart, Drumroll99, Duyanfang, Dwolt, Dysepsion, E23,
Eagleal,Earlypsychosis, EarthPerson, EastTN, Echartre, Eddiecarter1, Eddiejcarter, Edgar181, Edgarde, Edivorce, Edward, Eeekster, Ejrrjs, ElKevbo, Elwikipedista, Epbr123, Era7bd, Eric
Bekins,
EricBurnett,Ericlaw02,Erikrj,EscapeOrbit,Etxrge,EugeneZelenko,EvergreenFir,Everyking,Evildeathmath,Excirial,Exor674,Explicit,Eyesnore,Ezeu,FFGeyer,FangAili,FatalError,Favonian,Feedm
ecereal,FetchcommsAWB,Feydey,Fieldday-sunday,Filx,FinlayMcWalter,Flewis,Flubeca,Fluffernutter,Flyer22,FlyingToaster,Fooker69,Fortdj33,Foxfax555,Fraggle81,Frankman,Franky21, Franl,
Fratrep, Freebiekr, Frsparrow, Fubar Obfusco, Furrykef, Fuzzie, Fæ, G12kid, GDonato, GHe, GLaDOS, Gadfium, Gail, Galzigler, Garyzx, Gburd, Giftlite, Ginsengbomb,Ginsuloft, Girl2k,
Gishac, Glacialfox, GlenPeterson, GnuDoyng, Gogo Dodo, GoingBatty, Gonfus, Gozzy345, Graeme Bartlett, GraemeL, Graham87, GrayFullbuster, GregWPhoto, Gregfitzy,GregorB, Grim23,
Grsmca, Grstain, Gsallis, Gscshoyru, Gwizard, Gzkn, Haakon, Hadal, HaeB, Hamtechperson, Hankhuck, HappyInGeneral, Harej, Hasek is the best, HeliXx,
Helixblue,Helloher,HexaChord,Heymid,Heysim0n,Hotstaff,Hshoemark,Hugsandy,Huntthetroll,Hurricane111,HybridBoy,HydrogenIodide,IComputerSaysNo,IElonex!,IcedKola,Igoldste,Imfargo,
Imnotminkus, Imran, Informatwr, Insineratehymn, Inspector 34, Intgr, IrfanSha, Ironman5247, Isfisk, Itafran2010, ItsZippy, Ixfd64, J.delanoy, JCLately, JForget, JJdaboss, JLaTondre,
JMRyan,Ja 62, JaGa, Jab843, Jabby11, Jack Greenmaven, Jackacon, JamesBWatson, Jan1nad, Jarble, Jasimab, Jasper Deng, Jauerback, Javert, Jaxl, Jay, Jb-adder, Jclemens, Jdlambert,
JeffTan,JeffreyYasskin, Jennavecia, JephapE, Jerome Charles Potts, Jk2q3jrklse, Jmanigold, Jni, JoanneB, Joel7687, Joffeloff, John Vandenberg, John of Reading, Johnuniq, Jojalozzo, Jonathan
Webley,JonathanFreed, Jondel, Jonearles, Jonwynne, Joshnpowell, Joshwa1234567890, Journalist, Jschnur, Jstaniek, JunWan, Jvhertum, Jwoodger, Jwy, KILLERKEA23, Kanonkas,
Karlhahn,Karmafist, Katalaveno, Keenan Pepper, Keilana, Kekekecakes, Kellyk99, Kenny sh, Kevins, KeyStroke, Khazar2, Khoikhoi, Kiand, Kimberly ayoma, Kimera Kat, King of Hearts,
Kingius,Kingpin13, KingsleyIdehen, Kivar2, Kkailas, Knbanker, Koavf, Kocio, Komal.Ar, KoshVorlon, KotetsuKat, Kozmando, Kraftlos, Krashlandon, Kslays, Kukini, Kunaldeo, Kungfuadam,
Kuru,Kushal one, Kvasilev, Kwiki, KyraVixen, Kzzl, L Kensington, LC, Lamp90, LaosLos, Larsinio, Latka, Leaderofearth, Leandrod, LeaveSleaves, LeeHam2007, Leonnicholls07, LessHeard
vanU,Levin, Levin Carsten, Lexo, Lflores92201, Lfstevens, Lguzenda, Lights, LindaEllen, Lingliu07, Lingwitt, Linkspamremover, LittleOldMe, LittleWink, Llyntegid, Lod, Logan,
Lotje,Lovefamosos, Lovelac7, Lowellian, Lradrama, Lsschwar, LuK3, Lucyin, Lugia2453, Luizfsc, Luna Santin, M.badnjki, M4gnum0n, MBisanz, MECiAf., MER-C, MJunkCat,
Machdohvah,Madhava 1947, Magioladitis, Majorly, Makecat, Malvikiran, Mandarax, Manikandan 2030, Mannafredo, Marasmusine, Mark Arsten, Mark Renier, MarkSutton, MartinSpamer,
Materialscientist,Mathewforyou, Mato, Matthewrbowker, Matticus78, Mattisse, Maty18, Maury Markowitz, Max Naylor, Maxferrario, Maxime.Debosschere, Maxmarengo, Mayur, Mazca,
Mboverload,McGeddon, Mdd, Meaghan, Mean as custard, Mediran, Megatronium, Melody Lavender, Melucky2getu, Mentifisto, Menublogger, Mercy11, Methnor, Mhkay, Michael Hardy,
Michael Slone,Microchip08, Mike Dillon, Mike Rosoft, Mike Schwartz, Mike99999, MikeSy, Mikeblas, Mikey180791, MilerWhite, Millermk, Milo99, Mindmatrix, Minimac, Minna Sora no
Shita, Mkeranat,Moazzam chand, Mojo Hand, Moreschi, Morwen, MrNoblet, Mrozlog, Mrt3366, Mspraveen, Mugaliens, Mukherjeeassociates, Mulad, Mumonkan, Mushroom, Mwaci11,
Mxn,N1RK4UDSK714, N25696, NAHID, NSR, Nafclark, Namlemez, Nanshu, NathanBeach, NawlinWiki, NetManage, Netizen, NewEnglandYankee, Ngpd, Nick, Nicoosuna, Niteowlneils,
Nk,Noah Salzman, Noctibus, Noldoaran, Northamerica1000, Northernhenge, Nsaa, Nurg, Ocaasi, Oda Mari, Odavy, Oddbodz, Ohka-, Oho1, Oli Filth, Olinga, OllieFury, OnePt618,
OrgasGirl,Oroso,OverlordQ,PJM,PaePae,Pak21,PappaAvMin,Parzi,PatrikR,PaulAugust,PaulDrye,PaulEEster,PaulFoxworthy,Paulinho28,Pcb21,Pdcook,Peashy,PeeTern,PeeAeMKay,Pengo,Pere
grineAY, Peruvianllama, Pete1248, Peter Karlsen, Peter.C, Pgk, Phantomsteve, Pharaoh of the Wizards, Phearlez, PhilKnight, Philip Trueman, Philippe, Phinicle, Phoenix-wiki, Piano nontroppo,
Pillefj, Pingveno, Pinkadelica, Pjoef, Plrk, Pnm, Poeloq, Pol098, Poor Yorick, Poterxu, Praba tuty, Prabash.A, Prari, Prashanthns, Pratyya Ghosh, PrePress, Preet91119,
Proofreader77,Prunesqualer, Psaajid, Psb777, Puchiko, Pvjohnson, Pyfan, Quadell, Qwertykris, Qwyrxian, Qxz, R'n'B, R3miixasim, RIH-V, RadioFan, RadioKirk, Rafaelschp, Railgun, Rakeki,
Raspalchima,Ravinjit, Ray Lightyear, RayGates, RayMetz100, RazorXX8, Rdsmith4, Reaper Eternal, Reatlas, Refactored, Regancy42, Reidh21234, RenamedUser01302013, Rettetast, RexNL,
Rhobite, RichFarmbrough, Ricky81682, Ringbang, Rishu arora11, Riverraisin, Rj Haseeb, Rjwilmsi, Robert Merkel, Robert Skyhawk, Robocoder, Robth, Rocketrod1960, Rockonomics,
Rohitj.iitk, Romanm,Rotanagol, Rothwellisretarded, Roux, Rursus, Ruud Koot, Ryager, Ryanslater, Ryanslater2, Ryulong, S.K., SAE1962, SDSWIKI, SFK2, SJP, SWAdair, Sae1962, Saiken79,
Salvio giuliano, SamBarsoom, Sam Korn, SamJohnston, Samir, Sander123, Sandman, Sango123, Sarchand, SarekOfVulcan, Satellizer, SatuSuro, Saturdayswiki, Savh, ScMeGr, Sceptre, Seanust
1, Seaphoto,SebastianHelm, Serketan, Several Pending, Sewebster, Shadowjams, Shadowseas, Sheeana, Shipmaster, Shirulashem, Siebren, Silly rabbit, Simeon, Simetrical, SimonMorgan,
Sintaku, SirNicholas de Mimsy-Porpington, Sissi's bd, Siteobserver, Sjakkalle, Sjc, Skamecrazy123, Skybrian, Slakr, Sleske, SnoFox, Somchai1029, Sonett72, Sonia, Soosed, Sophus Bie,
Soumark, SpK,Spartaz, Spazure, Spdegabrielle, SpikeTorontoRCP, SpuriousQ, Squids and Chips, Srdju001, Srikeit, Ssd, StaticVision, Stdazi, Stephen Gilbert, Stephenb, Stevertigo, Stifle,
Stirling Newberry,Storm Rider, Strike Eagle, Strongsauce, Stuhacking, Sucker666, Sudarevic, Suffusion of Yellow, SuperHamster, Supertouch, Supreme Deliciousness, Supten, SwisterTwister,
SymlynX, Sythy2,Tabletop, Tablizer, TakuyaMurata, TalkyLemon, Tasc, Tazmaniacs, Technopat, Tgeairn, Th1rt3en, Thatperson, The Anome, The Thing That Should Not Be, The Wiki
Octopus, The wub,TheGrimReaper NS, TheNewPhobia, Thedjatclubrock, Thehulkmonster, Theimmaculatechemist, Theodolite, Theory of deadman, Thingg, Think outside the box, Thinktdub,
Thomasryno,ThumbFinger, Thumperward, Tictacsir, Tide rolls, Tim Q. Wells, TimBentley, Title302, TittoAssini, Tobias Bergemann, Tolly4bolly, Tomatronster, Tonydent, Toquinha,
Tpbradbury, Treekids,TrentonLipscomb, Trevor MacInnis, Triwbe, Troels Arvin, Trusilver, Tualha, Tudorol, Tuhl, Turlo Lomon, Turnstep, Twebby, Twelvethirteen, TwistOfCain,
TwoTwoHello, Twsx, TyA, Tyler,UberScienceNerd, Ubiq, Ubiquity, Ugebgroup8, Ulric1313, Ultraexactzz, Uncle Dick, Unyoyega, VNeumann, Vary, Velella, Verajohne, Versus22,
Veryprettyfish, Vespristiano, Victor falk,Vikreykja, Vincent Liu, Vipinhari, Vishnava, Visor, Vivacewwxu, VoxLuna, Vrenator, W mccall, W163, WOSlinker, Waggers, Waveguy, Wavelength,
Weetoddid, Welsh, Werdna, Widefox,Widr, Wifione, Wik, Wiki alf, Wiki tiki tr, Wikidrone, Wikipelli, WikiuserNI, Willking1979, Wimt, Windsok, Winterst, Wipe, Wmahan, Woohookitty,
Woseph, Writeread82, Wulfila, Wwmbes,Wya7890,Xfact,Xhelllox,Xin0427,Yintan,Yossman007,ZenerV,Zhenqinli,Zipircik,Zippanova,ZooPro,Zro,Zundark,Zzuuzz,Σ,МИФ,ﻋﻘﯿﻞﮐﺎﺷﻒ, 雷 大 伟
, 3114anonymousedits
DatabasemodelSource:http://en.wikipedia.org/w/index.php?oldid=570714755Contributors:AGK,ARUNKUMARP.R,AgadaUrbanit,Airplaneman,Alansohn,Amywhattt,AutumnSnow,Beland,
Bill Slawski, CharlesBarouch, Cybercobra, Danim, Decrease789, Dkwebsub, Dwils098, ENeville, Edward, FatalError, J.delanoy, Jabbba, Jbolden1517, JoyMundy, Jwoodger, LaosLos,
Magomaitin,MainFrame,MarkRenier,Materialscientist,Mdd,Mihai-gr,Mindmatrix,Minimac,MinnaSoranoShita,Mr.Vernon,Nn123645,Porterjoh,Razorbliss,Richramos,Roenbaeck,SunCreator,
Tim1357, Vegaswikian, Wikipelli, Woohookitty, 81 anonymous edits
DatabasenormalizationSource:http://en.wikipedia.org/w/index.php?
oldid=577707258Contributors:1exec1,4pq1injbok,A3nm,ARPITSRIVASTAV,Ahoerstemeier,Akamad,Akhristov,Alai,Alasdair, Alest, Alexey.kudinkin, Alpha 4615, Amr40, AndrewWTaylor,
Antonielly, Anwar saadat, Apapadop, Arakunem, Arashium, Archer3, Arcturus, Arthena, Arthur Schnabel, Ascend,
AstroWiki, AubreyEllenShomo, Autocracy, AutumnSnow, Azhar600-1, BMF81, Babbling.Brook, Bernard François, Bewildebeast, Bgwhite, Billben74, Billpennock, BillyPreset, Black
Eagle,Blade44,Blakewest,Blanchardb,Bloodshedder,Blowdart,BlueNovember,BlueWanderer,Bongwarrior,Boson,Bovineone,BradBeattie,BrickThrower,BrokenSegue,Bruceshining,Bschmidt,Bu
gsbunny1611, BuzCo, CLW, Callavinash1, Can't sleep, clown will eat me, Chairboy, Chrislk02, Citral, Cl22333, CodeNaked, Combatentropy, Conversion script, Creature, Crenner,Crosbiesmith,
DARTH SIDIOUS 2, Damian Yerrick, DanMS, Dancraggs, Danim, Danlev, Datasmid, David Colbourn, DavidConrad, DavidHOzAu, Davidhorman, Dean001, Decrease789,Demosta,
Denisarona, DerHexer, Dfass, Dflock, Discospinster, DistributorScientiae, Doc vogt, DocRuby, Docu, Don Hammond, Doud101, Dqmiller, Dreftymac, Drowne, Dthomsen8, DukeGanote, Ed
Poor, Edward Z. Yang, Eghanvat, Elcool83, Electricmuffin11, Elwikipedista, EmmetCaulfield, Emperorbma, Emw, Encognito, Enric Naval, Epepke, Eric Burnett, Escape Orbit,Ethan,
Evilyuffie, Ewebxml, Falcon8765, Farquaadhnchmn, Fathergod, FauxFaux, Fieldday-sunday, Fireman biff, Flewellyn, Fluffernutter, Fmjohnson, Fraggle81, Fred Bradstadt, Furrykef,Gadfium,
GateKeeper, Gimboid13, Ginsuloft, Gk5885, Gogo Dodo, Gottabekd, Gregbard, GregorB, Groganus, Gustavb, Guybrush, HMSSolent, Hadal, Hairy Dude, Hanifbbz, Hapsiainen, HbJ,Hbf,
Heracles31, HiDrNick, Hoo man, Hu12, Hydrogen Iodide, Hz.tiang, Ianblanes, IceUnshattered, Imre Fabian, Inquam, Intgr, Jadvinia, Jakew, James086, JamesBWatson, Jamesday,Jamesjusty,
Jan Hidders, Japo, Jarble, Jason Quinn, Javert16, Jdlambert, Jgro, Jjjjjjjjjj, Jklin, Joness59, Joseph Dwayne, Jpatokal, Jpo, Justin W Smith, KAtremer, KathrynLybarger, Keane2007,Keegan,
KevinOwen, KeyStroke, Keyvez, Kgwikipedian, Kingpin13, Klausness, Kushalbiswas777, L Kensington, L'Aquatique, LOL, Larsinio, Lawrence Cohen, Leandrod, Lee J Haywood,Legless the
oaf, Leleutd, Leotohill, Lerdthenerd, Les boys, Lethe, Libcub, Lifeweaver, Linhvn88, LittleOldMe, Longhair, Lssilva, Lujianxiong, Lulu of the Lotus-Eaters, Lumingz, Luna
Santin,M4gnum0n,MER-
C,Magantygk,Manavkataria,MarkRenier,Marknew,MarownIOM,MartinHarper,Masterstupid,Materialscientist,Matmota,Matthew1130,Mckaysalisbury,Metaeducation,Michael Hardy, Michalis
Famelis, Michealt, Microtony, Mike Rosoft, Mikeblas, Mikeo, Mindmatrix, Miss Madeline, Mjhorrell, Mo0, Modeha, Mooredc, Mpd, Mr Stephen, MrDarcy, MrOllie,Nabav, NawlinWiki,
Nick1nildram, NickCT, NoahWolfe, Nocat50, Noisy, Northamerica1000, Nsaa, NubKnacker, Obradovic Goran, Ocrow, OliverMay, Olof nord, Opes, Oxymoron83,
Pagh,Peachey88,Pearle,Perfectblue97,Pete142,PharaohoftheWizards,PhilBoswell,PhilipTrueman,PieMan360,Pinethicket,Plasticrat,Polluxian,Prakicov,ProveIt,Purplepiano,Quarl,RB972,
Article Sources and Contributors 236
RBarryYoung, RadioFan, Railgun, Rathgemz, Rdsmith4, Rdummarf, RealityApologist, Reedy, Regancy42, Reinyday, Remy B, Reofi, RichF, Rjwilmsi, Robert McClenon,
Robomaeyhem,Rockcool19,Rodasmith,Romke,Ronfagin,Rp,Rumplefish,RuudKoot,Ryulong,SamHocevar,Sasha.sheinberg,SchuminWeb,ScottJ,Scwlong,Seaphoto,Sfnhltb,Shadowjams,Shaking
lord,Shawn wiki, Shreyasjoshis, Shyamal, Silpi, Simeon, Simetrical, Sixpence, Skritek, Smjg, Smurfix, Snezzy, Snigbrook, Socialservice, Sonett72, Soulpatch, Soumyasch, Spacesoon,
Sstrader,Stacyshaelo, Stannered, Starwiz, Stephen e nelson, Stephenb, SteveHL, Stifle, Stolkin, Strike Eagle, Sue Rangell, Superjaws, Sydneyw, Sylvain Mielot, Szathmar, Taw, Tbhotch,
Tcamacho,Tedickey, Teknic, Tgantos, Thane, The Thing That Should Not Be, The undertow, The1physicist, Tide rolls, Titofhr, Tobias Bergemann, Toddst1, Tom Lougheed, Tom Morris,
Tommy2010,Toxicwaste288,Traxs7,TroelsArvin,Turnstep,Twinney12,Tyc20,Unforgettableid,Upholder,Utcursch,Vald,Valdor65,Vampyrium,VanishedUserABC,Velella,VinceBowdren,Vladsin
ger,Vodak, Voidxor, Waggers, Wakimakirolls, Wammes Waggel, Wavelength, Wexcan, WikiPuppies, WikipedianYknOK, Wildheat, Wilfordbrimley, Wilsondavidc, Winterst,
Wjhonson,Woohookitty,WookieInHeat,XiongChiamiov,Xiroth,Yong-YeolAhn,Zedla,Zeyn1,Zhenqinli,Zzuuzz,石 庭 豐 , 1332anonymousedits
DatabasestoragestructuresSource: http://en.wikipedia.org/w/index.php?oldid=565777290Contributors: Abdull, Alai, Andrewman327, Beland, Decrease789, ElKevbo, Grafen, Jaytwist,Lenshapir,
Mark Renier, Mskfisher, Rocketrod1960, Rursus, Troels Arvin, TubularWorld, 21 anonymous edits
DistributeddatabaseSource:http://en.wikipedia.org/w/index.php?
oldid=575281332Contributors:Alansohn,Ammubhave,Anthony,ArthurRubin,Beland,Bomazi,Bporopat,CanisRufus,Centrx, Compfreak7, Danim, Derbeth, Dewritech, Donhalcon, Dpkade,
Eastlaw, Eliz81, Gary King, Gensanders, GeorgeBills, Gregbard, Hooperbloob, Hu, Intelligentfool, Intgr, JCLately,
Jamelan, Jandalhandler, Jason.yosinski, Jim1138, KeyStroke, Kku, Kuteni, Lguzenda, LilHelpa, M4gnum0n, Magioladitis, MelRobinson, Mere Mortal, Michaellacorte, Miym,
Mschlindwein,Nikhilsearch,Nivix,Nonnompow,Owenja,Ozsu,Passport90,Pebkac,Perfecto,Peruvianllama,PigFluOink,Prasanna8585,Ramaksoud2000,Satellizer,Sboehringer,Shibaji.paul,Sparky1
32,Squiddy, Sun Creator, Tempodivalse, Terry1944, TheThomas, Uncle Dick, Vektor330, Wbigger, Wizgha, 161 anonymous edits
FederateddatabasesystemSource: http://en.wikipedia.org/w/index.php?oldid=571954221Contributors: Beland, Bovineone, Cantonnier, Chris the speller, Comps, DBigXray, Dfoxvog,
Frap,Gilo1969, Hu12, Joy, Khazar2, KingsleyIdehen, Kku, MacTed, Mark Renier, Martarius, Meena610, Mgh12, P. Dantressangle, Pmehra5730, Pumba lt, R'n'B, Repentsinner, Rettetast,
Ringbang,
Rjwilmsi,Sfan00IMG,Shyamal,Tabletop,Tankiitr,TheThingThatShouldNotBe,TheParanoidOne,Threazy,Uthbrian,VanishedUserABC,Venullian,Woodshed,YannGripay,63anonymousedits
ReferentialintegritySource:http://en.wikipedia.org/w/index.php?oldid=576386750Contributors:A3nm,AimHere,Allens,Amux,AndyDingley,AnubisAscended,AutumnSnow,BL,Bearcat,
Brandon, Brick Thrower, Daniel.Cardenas, Darkunor, Daviburg, DavidLevinson, Elwikipedista, Excirial, FatalError, Flon22, Friendlydata, Greentryst, I dream of horses, KeyStroke,
Kmarshba,Losttourist,MarkRenier,Materialscientist,Michael.Urban,Mindmatrix,Mtking,MuslimloJuheu,Nburden,Neurolysis,Niceguyedc,Nivix,ObradovicGoran,Omicronpersei8,PatrickJCollins,
Penartur, Philip Trueman, Psb777, Reatlas, Reedy, RuM, Sae1962, Sam Hocevar, Sietse Snel, Simtay, Snodnipper, Staszek Lem, Suvs2011, Ta bu shi da yu, Tarquin,Tolly4bolly, Varuna,
Wjhonson, Wlievens, 113 anonymous edits
RelationalalgebraSource:http://en.wikipedia.org/w/index.php?
oldid=574653092Contributors:2620:0:1002:1003:C06E:62E9:72C9:FDA2,Agquarx,AlainAmiouni,AlanLiefting,Alansohn,AlecTaylor, AndrewWarden, Anuj royal, Arthur Rubin,
Arunloboforever, Austinflorida, AutumnSnow, Banazir, BiT, Blahedo, Blaisorblade, Brick Thrower, Bug, CALR, CRGreathouse,
Cdrdata, Charvest, Chewings72, Chocolateboy, Chris the speller, Clawed, Cmdrjameson, Combatentropy, Cometstyles, CountMacula, Cryout, Cybercobra, Cycchina, DaveVoorhis,
Davidfstr,Davnor, Derbeth, Dessources, Dhanuthilaka, DoriSmith, Download, Drowne, Drunken Pirate, EagleFan, Ed g2s, Edcolins, Egmontaz, Egriffin, Elektron, Elwikipedista, Esalder,
Ezrakilty, FabianPijcke, Falcor84, Flyhighplato, Fresheneesz, FuFoFuEd, Gazpacho, Geira, Giftlite, Gregbard, GregorB, Hadal, Hans Adler, Hasanv, Hughitt1, Hussaibi, Hypergraph,
IceCreamAntisocial,Infestor, IvanLanin, JackPotte, Jan Hidders, Jan1nad, Jarble, Javert16, JingguoYao, Jleedev, Joebolte, JohnyDog, Jon Awbrey, Joseph Dwayne, Jsnx, Juansempere, Justin W
Smith, Kanenas,Keegan, KelvSYC, Khalid hassani, Kinaro, Kjetil r, Kku, Klausness, KnightRider, LOL, Lambiam, Larsinio, Leaflord, Lemycanh, Lfstevens, LtWorf, Magic5ball, Maksim-e,
Mandries,
Mani1,MarkRenier,Matthiaspaul,Mckaysalisbury,Mcthree,Mdd,Mets501,MichaelHardy,Michealt,Mikeblas,Mindmatrix,Msnicki,Myheimu,Nbarth,NewEnglandYankee,Ntmatter,O.Koslowski,Oc
ranom, Oleg Alexandrov, PanagosTheOther, Peruvianllama, Peter.vanroose, Pgan002, Phamthelong, Polluxian, Popol1991, Qwertyus, R'n'B, Rathgemz, Reedy, Rgrimson, Rishig327,Rjwilmsi,
Rleyton, Rsrikanth05, Ruakh, Rursus, Salix alba, Sam Staton, Samppi111, Sbrenesms, Scf1984, Schmid, Sdorrance, ShadowPhox, Shreyasjoshis, Sir Nicholas de Mimsy-
Porpington,Slgcat,SnowFire,Sspecter,Stephan202,Tablizer,Tgeairn,Theundertow,Tijfo098,Tommy2010,Tompsci,TroelsArvin,Twimoki,Vaucouleur,Vegpuff,Viperlight89,Wavelength,Way2veers
,WayneSlam,We64,Wifki,Wikipelli,Wrp103,Xcpenguin,Yoosofan,ZinkDawg,Ziusudra,と あ る 白 い 猫 , 343anonymousedits
RelationalcalculusSource:http://en.wikipedia.org/w/index.php?
oldid=541136897Contributors:AutumnSnow,Cdrdata,Elwikipedista,Gregbard,Guppyfinsoup,JanHidders,Jim1138,Joieko,Jpbowen, Kku, Leandrod, Lfstevens, Mark Renier, Michael Hardy,
Mikeblas, Mindmatrix, Omnipaedista, Opabinia regalis, Pewwer42, Remuel, Robert L Pendleton, Rsrikanth05, SqlPac,
TheTito, 26 anonymous edits
RelationaldatabasemanagementsystemSource: http://en.wikipedia.org/w/index.php?oldid=576713352Contributors: 16@r, AMD, Abb615, Acrider, Afabbro, Aldie, Ale And Quail,Altenmann,
Anastrophe, Anvish, Anwar saadat, Apokrif, Athaenara, AutumnSnow, BL, BMF81, Ballin Insane10, Beland, Bgibbs2, Bob hoskins, BodyTag, Bonadea, Borgx, Brassrat70s,
Bressan, Brick Thrower, Brilliantwiki, Cactus26, Chaitrabhat7, Chris Roy, Cnb, Craig Stuntz, Crosbiesmith, Cryptic, Cwitty, DB 103245, Darx9url, Davedx, Daverocks,
Dewritech,DigitalEnthusiast, Dockurt2k, Elwikipedista, EoganOD, Faizan, FatalError, Fatehyab ahmed, Flata, George Rodney Maruri Game, Grunt, Gurch, HEAdrian, Heimstern, II MusLiM
HyBRiD
II,Igoldste,IgorYalovecky,Ikhzter,J.delanoy,J36miles,JCLately,JFM,JHMM13,JamesBWatson,Jameshfisher,JanHidders,Jdthood,Jnlin,Joao.matos,Josemanimala,JosephDwayne,Josephchennai,
Jtgerman, Kaihsu, Karada, Kate, Kernel.package, KeyStroke, Kingston Dominik, Klausness, Kotika98, Kuru, Larsinio, Leandrod, Lfstevens, LinguistAtLarge, Lowellian, Lulu of theLotus-
Eaters, Mangoe, Mark Renier, Maximaximax, Mckaysalisbury, MelbourneStar, MikeSchinkel, Mikeblas, Mindmatrix, Minghong, Mintleaf, Mr4top, Mxn, Neilc,
Nicks100,NuclearWarfare,Nylex,Oberiko,ObradovicGoran,Ohnoitsjamie,Ohyoko,Palica,Paulcolmer,Payal2820,PhiLiP,Pi,Pichpich,PratyyaGhosh,QuestforTruth,RedHillian,Reddi,Reedy,Rfl,Rh
obite, Robert Brockway, Rror, Sasquatch, Setppo, Shabbirbhimani, Shepazu, Smyth, Sparkiegeek, SqlPac, Stevegiacomelli, Szajd, Tablizer, TallMagic, TenPoundHammer,
Tolly4bolly,Tomcat66 g500, Troels Arvin, Turnstep, UnDeRTaKeR, Uriber, Useight, VTPG, Vanished user qkqknjitkcse45u3, Vegaswikian, Vincent Liu, WIKIWIZWORKER, Wykypydya,
Xcasejet,Xphile2868, Лев Дубовой, 382 anonymous edits
RelationalmodelSource:http://en.wikipedia.org/w/index.php?
oldid=573775447Contributors:130.94.122.xxx,62.114.199.xxx,A930913,Adamcscott,Altenmann,AndrewWTaylor,AndrewWarden, AndyKali, AnonMoos, Arthur Rubin, Ashrust,
Asukite, Audiodude, AutumnSnow, Aytharn, BD2412, BMF81, Babbling.Brook, Bblfish, Beland, Bento00, Bobo192,
BonsaiViking, Brick Thrower, Brion VIBBER, Budloveall, CBM, Cadr, Cathy Linton, Cconnett, ChaosControl, Chessphoon, Chrisahn, Chrissi, Conti, Conversion script, Craig
Stuntz,Crashoffer12345, Crosbiesmith, DARTH SIDIOUS 2, Damian Yerrick, Danim, DaveVoorhis, David Eppstein, Derek Ross, Dreadstar, Drunken Pirate, EagleFan, Ehn, Elwikipedista, Emx,
EnricNaval, Erik Garrison, Evildeathmath, Furrykef, Fyrael, Gadfium, Gary D, Gary King, Giftlite, Gilliam, Grassnbread, Greenrd, Gregbard, GregorB, Gurch, Hans Adler, Helvetius,
Hyacinth,Ideogram,Iluvitar,Immunize,Irishguy,Ixfd64,J04n,JCLately,Jadedcrypto,Jalesh,JanHidders,Jarble,Jbolden1517,Jeff3000,JesseW,Jklowden,Jmabel,Joelm,JonAwbrey,Jpbowen,Kassie,Ke
ndrick Hang, Ketiltrout, Khalid hassani, Kimchi.sg, Kjkolb, Klausness, Korrawit, Lahiru k, Larsinio, Leandrod, Leifbk, Lethe, Lfstevens, Lopifalko, MER-C, Madjestic,
Magioladitis,Maokart444, MarXidad, Marc Venot, Mark Renier, Materialscientist, Matt Deres, Mblumber, Mckaysalisbury, Mdd, Metaeducation, Mets501, Mhkay, Michael Hardy, MilerWhite,
Mindmatrix,Moogwrench, Muntfish, NSash, Nad, Nascar1996, Neilc, Niteowlneils, NonDucor, Nsd, Ocaasi, Ocrow, Ozten, Pablothegreat85, Paul Foxworthy, Pitix, Pmsyyz, Pol098,
PsychoAlienDog,Quazak Zouski, R'n'B, Razorbliss, Rbrwr, Reedy, Reyk, RonaldKunenborg, Ronhjones, Rp, Rursus, Ruud Koot, S.K., Sae1962, Sdorrance, Seraphim, SeventyThree, Sietse Snel,
Simetrical,SimonP, Sonett72, Spartan-James, Spellcast, SpuriousQ, SqlPac, SteinbDJ, Stevertigo, THEN WHO WAS PHONE?, Tatrgel, Teknic, The-G-Unit-Boss, Tjic, Tobias Bergemann,
Tohobbes,Tony1, Toreau, Troels Arvin, Tualha, Turnstep, Vikreykja, Welsh, Wgsimon, Windharp, Winhunter, Wjhonson, Woohookitty, Zklink, 303 anonymous edits
TransactionprocessingSource: http://en.wikipedia.org/w/index.php?oldid=574602965Contributors: 16@r, Abdull, Adolphus79, Agateller, Akulkis, Alkamins, Atlant, Avb,
Awolski,BBCWatcher,BD2412,Baiji,Beland,Beve,Bnicolae,Bruvajc,CaroleHenson,Cbwash,ChairmanS.,Charleyrich,Clausen,Cliffb,CraigStuntz,CutOffTies,DGG,Danielle009,Danim,Donsez,
Download,Ellynwinters,Gfuip,Ghaskins,Gordonjcp,GregRobson,Gutza,JCLately,JHunterJ,Jan1nad,Jmcw37,Jorgenev,JoshuaScott,Kgf0,Khalidhassani,Kubanczyk,Lear'sFool,LuísFelipe Braga,
M4gnum0n, MER-C, MONGO, Mandarax, Mark Renier, Maury Markowitz, Mika au, Mikeblas, Mindmatrix, MrOllie, Oo7nets, Oxymoron83, Pcap, Peter Flass,
Pratyeka,Radagast83,Rbpasker,Rettetast,RuudKoot,SEWilco,StephanLeeds,Stymiee,Suruena,TobiasBergemann,Tschristoppe,Unimath,Uzume,Wirelessfriend,Wtmitchell,Zippy,Zzuuzz,106anon
ymousedits
Null (SQL) Source: http://en.wikipedia.org/w/index.php?oldid=577846588 Contributors: Abdull, Alejos, Andreas Kaufmann, Andylkl, Andyyso, Arcann, Beno1000, Bgwhite,
BradeosGraphon, Cedar101, Cybercobra, Daniel.Cardenas, Dudegalea, EJSawyer, Ehdrive11, Elwikipedista, Furrykef, GaiusCornelius, Gregbard, GregorB, Halo, Harryboyles, Haus, HaywardRoy,
Hu12, Igor Yalovecky, IncnisMrsi, Iqbalhosan, Jdlambert, John of Reading, JonathandeBoynePollard, Julesd, Koavf, Kobrabones, Langpavel, Lightmouse, LilHelpa, Loadmaster, Luís Felipe Braga,
MER-C, Mahahahaneapneap, Malleus Fatuorum, Mark Renier, Matozqui, Mblumber, MeekMark, Michael Hardy, Mikeblas, Mindmatrix, Modify, Mwtoews, Myheimu, Nigelj,
Nirion, Northernhenge, Ntounsi, Ott2, Plustgarten, Quadell, Random832, Rich Farmbrough, Rjwilmsi, RockMFR, Ruakh, Senpai71, Simetrical, SmallRepair, Smith609, SnappingTurtle, SqlPac, Stolze,
Tasdevil13, Terrifictriffid, Teukros, The Fortunate Unhappy, Three-quarter-ten, Tijfo098, Tony1, Visor, Voer, Xoneca, Zeeyanwiki, Zhenqinli, 109 anonymous edits
Candidate key Source: http://en.wikipedia.org/w/index.php?oldid=575890156 Contributors: Acroterion, AlexPlank, Amikake3, AndrewWarden, Arravikumar, Axiomsofchoice, BrickThrower, Bryant1410,
Captmjc, Charles Matthews, Crosbiesmith, CyborgTosser, DMacks, DVdm, Dharmabum420, Docu, Ejrrjs, Eric22, FuthaMukker, Hans Adler, HenningThielemann, J.delanoy, Jan
Hidders, Jleedev, Jorge Stolfi, Josephbui, JoshDuffMan, Kalyson, KeyStroke, LanguageMan, Mark Renier, Massic80, Materialscientist, MiloszD, Mindmatrix, Mwtoews, Nabav,
Neilc, ObradovicGoran, Patrioticdissent, Possession, ProcerusDecor, Prodizy, Rbrewer42, Rholton, Richaraj, RonaldS.Davis, SqlPac, Sreyan, Stolkin, Torzsmokus, Weedwhacker128,
Yahya Abdal-Aziz, ZeroOne, Δ, 石庭豐, 96 anonymous edits
Foreign key Source: http://en.wikipedia.org/w/index.php?oldid=577704950 Contributors: 16@r, Abdull, Amix.pal, Anbu121, AndrewWarden, Arichnad, Arthur Schnabel,
Arunsinghil, Aurochs, AutumnSnow, Beesforan, Biochemza, BrickThrower, Can't sleep, clown will eat me, Cander0000, Causasui, Clarificationgiven, CoryDonnelly, Cpiral, Cww, DHN, DarthPanda,
Derek Balsam, DireWolf, Dobi, Dougher, EWikist, Eldavan, Electricnet, Entropy, FatalError, Feder raz, Fluffernutter, Flyer22, Frap, Govorun, GregorB, Gsm1011, IanHarvey,
JHunterJ, Jadriman, Jesselong, Jim1138, Jk2q3jrklse, Jlenthe, Joebeone, John of Reading, KeyStroke, Kf4bdy, Kubntk, Larsinio, Marcusfriedman, Mark Renier, MexicanMan24, Mike Rosoft, Mikeblas,
MikeyTheK, Mindmatrix, Minimac, Mmtrebuchet, Mogism, Mormegil, MrOllie, Mrt3366, NatalieErin, Ngriffeth, O.Koslowski, ObradovicGoran, PPOST, Pbwest, Peak, Polypus74, RIL-sv, Reedy, Rjwilmsi,
Rror, Rsrikanth05, SDS, Salvatore Ingala, Sboosali, Selfworm, Semperf, Shreyasjoshis, Species8473, Staecker, Stolze, Svenmathijssen, Tarquin, Tgeairn, The Thing
That Should Not Be, TheExtruder, Threeacres, Timhowardriley, TobiasPersson, Troels Arvin, Unordained, Waskyo, WikHead, Zipz0p, Δ, 石庭豐, 220 anonymous edits
Unique key Source: http://en.wikipedia.org/w/index.php?oldid=571010506 Contributors: Aberdeen01, Ahoerstemeier, Akyadav324, Alessandro57, Alexjbest, AmandeepJ,
Ambuj.Saxena, Andre.psantos, Baa, BlueSquareThing, Boson, Causasui, ChrisGualtieri, Ctimko, DarkFalls, DeadEyeArrow, Dgc03052, Dougher, Drphilharmonic, Ewebxml, Faizan, Feraudyh, Fnielsen,
Frap, Gurch, Hike395, J.delanoy, JHunterJ, Jberkus, Jbodilytm, Jdeperi, Jesdisciple, Joe.dolivo, Jorge Stolfi, Jyothisdavid4u, KeithB, L'Aquatique, L337 kybldmstr, LittleOldMe,
Loren.wilton, Mahemoff, Materialscientist, Mindmatrix, Minimac, Mwtoews, Nabav, NatalieErin, Northamerica1000, O.Koslowski, ObradovicGoran, Pevernagie, Praveentech, ProcerusDecor,
Raja200682, Ratarsed, RobIII, Spartaz, Special Cases, Spinality, Stolze, Subversive.sound, Themusicgod1, Thumperward, Tijfo098, TommyG, Troels Arvin, Unixxx, Velavan, Vjosullivan,
Wensceslao, Whitejay251, Wiki.Tango.Foxtrot, Winterst, X201, Yintan, Ykliu, Zzuuzz, 113 anonymous edits
Superkey Source: http://en.wikipedia.org/w/index.php?oldid=573338089 Contributors: 2001:700:303:D:8DF3:FDC7:B975:9C41, 2001:700:303:D:BCEB:D496:EA00:FD55, AndrewWarden, Anog,
AutumnSnow, Boson, CeleronNutcase, CharlotteWebb, ColinFine, Crosbiesmith, Dawynn, Fatherlinux, Fimp, Igor Yalovecky, IronGargoyle, James Crippen, Jan Hidders, Jorge Stolfi,
Jusdafax, Jwulff, Katieh5584, KeyStroke, Kranix, LOL, Larsinio, M. Frederick, Magioladitis, Mark Renier, Metron4, Michaelcomella, Mikeblas, Millermk, Mindmatrix, Nabav,
Pimlottc, ProcerusDecor, Reedy, Rhoerbe, SpuriousQ, Sss41, Stbrob, The Thing That Should Not Be, TheParanoidOne, Tobias Bergemann, Torzsmokus, Twarther, Voidxor, Welsh, Wikitanvir,
YayuntotheChicken, Zzuuzz, ʘx, Δ, 石庭豐, 74 anonymous edits
Surrogate key Source: http://en.wikipedia.org/w/index.php?oldid=572624859 Contributors: 2001:4898:98:2041:2468:9C5:9E15:D5AF, Barliner, BrickThrower, Bryant1410, ChrisNoe, Chrisxue815,
DVdm, Darinw, Demitsu, Djankowski, Dtuinhof, Egrabczewski, Favonian, Govorun, Groggy Dice, Hairy Dude, Hsauzier, Int19h, Jberkus, Jimgawn, Joeharris76, KeyStroke,
Kgaughan, Kjkolb, LachlanA, Leandrod, Lucianosother, M4gnum0n, Mark Renier, Mcbridematt, Mcclarke, Mdchachi, Mdd, Mindmatrix, MyTigers, Neilc, Pearle, PhiLiP, Phil
Boswell, Pinkadelica, Raggatt2000, Reddyfire, Reedy, Rich Farmbrough, Rjwilmsi, RobertKS, Shadowjams, Shenme, Simetrical, Sleske, Stewartadcock, Template namespace initialisation script,
Tfitzg, Tim.spears, Timhowardriley, Toh, Tomas e, Troels Arvin, Vjosullivan, WikipedianMarlith, Xenan, 160 anonymous edits
Armstrong's axioms Source: http://en.wikipedia.org/w/index.php?oldid=575689045 Contributors: A3 nm, Aednichols, Andonic, Arosa, CBM, Can't sleep, clown will eat me,
Charles Matthews, ChrisGualtieri, Cornellcloud, Entropeter, Inklein, Jh559, Jonemerson, JosephDwayne, LouI, Mark Renier, Mentoz86, PaoloSerafino, Q-lio, Telofy, Tijfo098, Vegpuff, Wavelength,
62 anonymous edits
Relation (database) Source: http://en.wikipedia.org/w/index.php?oldid=576215376 Contributors: AndrewWarden, Asfreeas, Asocall, AutumnSnow, Crosbiesmith, Ed Poor, Fratrep, Georgeryp, Icairns,
Lfstevens, MaD70, Mark Renier, MusiKk, NickCT, Nigwil, Rob Bednark, Subversive.sound, Tijfo098, Universalss, 14 anonymous edits
Table (database) Source: http://en.wikipedia.org/w/index.php?oldid=571781837 Contributors: 12george1, 16@r, Abdull, Ajraddatz, Alai, Arcann, AutumnSnow, Blanchardb, Bobo192, Bongwarrior,
Bruxism, C.Fred, Cbrunschen, Correogsk, Cyfal, DARTH SIDIOUS 2, Danim, Dreftymac, Dzlinker, Epbr123, FattyMcjimmy, Feder raz, Funnyfarmofdoom, Gurch, IMSoP,
IanCarter, J36miles, Jamelan, Jerome Charles Potts, Krishna Vinesh, Larsinio, LeonardoGregianin, Lfstevens, Mark Renier, Materialscientist, Mblumber, Mikeblas, Mikeo,
Mindmatrix, Morad86, N0nr3s, Nibs208, Nikuwap, Ofus, Pyfan, Quentar, S.K., Sae1962, Scs, Senator2029, Sietse Snel, SimonP, Sippsin, Sonett72, SqlPac, Stolze, TheParanoidOne, TommyG, Turnstep,
Txomin, Versageek, Widefox, Yug1rt, Zhenqinli, 101 anonymous edits
Column (database) Source: http://en.wikipedia.org/w/index.php?oldid=575439689 Contributors:
AbsoluteFlatness, Arcann, CesarB, CommonsDelinker, Danim, Dreftymac, Frietjes, Fæ, GermanX, Huiren92, Jmabel, KeyStroke, Mark Renier, Mark T, Mzuther, Petrb, RJFJR, Sae1962, Sietse Snel,
SqlPac, 石庭豐, 13 anonymous edits
Row (database) Source: http://en.wikipedia.org/w/index.php?oldid=555504177 Contributors: 2help, Allen3, Asfreeas, CommonsDelinker, D4g0thur, Danim, David H Braun (1964), Flip, GLaDOS,
Gail, GermanX, Glacialfox, GregorySmith, Jamespurs, Jerroleth, Jmabel, KKramer, KeyStroke, Liujiang, Mark Renier, Mark T, Mxg75, Mzuther, O.Koslowski, Oyauguru, Pnm,
Pol098, Retodon8, Rjd0060, Ronhjones, Shaka one, Sietse Snel, SootySwift, Troels Arvin, Yamamoto Ichiro, 29 anonymous edits
JeepdaySock, Jerome Charles Potts, Joaquin008, Jobbin, Jtgerman, Jwoodger, Kibbled bits, Kku, Kuru, Larsinio, Mark Renier, Mathmo, Matinict, Mikeblas, Mindmatrix, MrOllie,
Muchium, Nothings, Quentar, Quuxplusone, RaBa, Raeky, Rfl, Ricardorivaldo, Rjwilmsi, Roux, Rsrikanth05, Ruud Koot, S.K., Sappy, Scrool, Sippsin, Smalljim, Stolze, TheDamian, TommyG,
Toyotaprius2, Troels Arvin, UncleDouggie, WmLGann, Woohookitty, Wwphx, Z.E.R.O., Zerodeux, Zhenqinli, 144 anonymous edits
Database transaction Source: http://en.wikipedia.org/w/index.php?oldid=576565184 Contributors: 16@r, Adi92, Ajk, Al3ksk, AmandeepJ, AnmaFinotera, Appypani, Babbage, Binksternet, Burschik,
CharlotteWebb, Clausen, Comps, Craig Stuntz, DCEdwards1966, Damian Yerrick, Daniel0524, Dauerad, Derbeth, DnetSvg, Forderud, Fratrep, Geniac, Georgeryp, Gerd-HH, Gf uip,
Ghettoblaster, GregRobson, Haham hanuka, Hbent, Hede2000, Highguard, HumphreyW, Intgr, JCLately, Jarble, Jason Quinn, Jeltz, Karel Anthonissen, KellyCoinGuy, KeyStroke,
Khukri, Larsinio, Leeborkman, Lingliu07, Lubos, Luc4, Lysy, M4gnum0n, Mark Renier, Matiash, MegaHasher, Mike Schwartz, Mikeblas, Mindmatrix, Mintleaf, Neilc, Nixdorf, OMouse,
ObradovicGoran, Owen, Paul Foxworthy, Pcap, Pepper, RedWolf, RichMorin, Rocketrod1960, Roesser, SAE1962, Sandrarossi, SebastianHelm, Sobiaakhtar, SqlPac, Stevag, T0m, Timo, Triwger,
Troels Arvin, Turnstep, WeißNix, Zerksis, Zhenqinli, 107 anonymous edits
Transaction log Source: http://en.wikipedia.org/w/index.php?oldid=575149952 Contributors: Clausen, DGG, Damian Yerrick, Gustronico, Intgr, JCLately, JLaTondre, KeyStroke, Larsinio, Lupin,
Mark Renier, Mikeblas, Mindmatrix, Neoconfederate, Pelister, Poor Yorick, SJP, Sleske, SoledadKabocha, Stolze, Twimoki, 38 anonymous edits
Database trigger Source: http://en.wikipedia.org/w/index.php?oldid=576465806 Contributors: Abdull, Acha11, Adarshramesh, Bevo, BobHindy, BrickThrower, Bucketsofg, Cadillac, Can't sleep,
clown will eat me, Cedar101, ClamDip, ClanCC, CodeNaked, DanBishop, DanielcWiki, Deineka, Denisarona, Derbeth, Dffgd, Dirkbb, Fizalhaji, Fæ, Grondemar, Gurch, HMSSolent,
Hazard-SJ, Heron, Hu12, Jerome Charles Potts, John of Reading, Jyujin, Knakts, L337 kybldmstr, Larsinio, Lugia2453, M2Ys4U, Magioladitis, Mark Renier, Matinict, Mecanismo, Mike
Rosoft, Mikeblas, Mindmatrix, Minna Sora no Shita, Mlpearc, Mortense, MrOllie, Mschlindwein, NathanBeach, Nickleus, Niteowlneils, Noah Salzman, Noelweichbrodt, PaD, Pimlottc,
Pinethicket, RJFJR, Ramkrish, Reedy, Rimonon, RossFraser, Rsrikanth05, S.K., Sae1962, Sampsonvideos, Sippsin, SluggoOne, Stolze, SuperHamster, Superhilac, Svick, Tlaresch, Troels Arvin,
Unordained, Windofkeltia, Yourbane, Yrithinnd, علی ویکی, 295 anonymous edits
Database index Source: http://en.wikipedia.org/w/index.php?oldid=577312945 Contributors: 16@r, 31stCenturyMatt, Abolen, Afriza, Antandrus, Apavlo, Arcann, Arleyl, Arny, Atree, Aurlee,
Bezenek, Brian Tvedt, Cander0000, Carmichael, Ccare, Ceyockey, Chamoquemas, Chire, ChrisGualtieri, CloudNine, ColinFrayn, Comps, Cybercobra, DJPohly, Dainis, Danlev, Deon Steyn,
Dionyziz, Dominiktesla, Dougher, Drewnoakes, Dvik, Echawkes, Ercanyuz, ErikHaugen, Euryalus, Excirial, Flewis, Flyer22, Flyrev, Focus22, Furrykef, Gaur1982, Gergie, Glacialfox,
Gnaaye, GordonFindlay, Groffg, Groves.w, Gwyant, InShaneee, Interiot, Intgr, JCLately, Jadecristal, Jamesjiao, Jasimab, Jerryji1976, Jfroelich, JimCarnicelli, Jivadent, Jlehew, JohnF1980,
JonAwbrey, Jschnur, Jspashett, Jwchong, Kayau, KnightRider, Kuru, Larsinio, Leuko, Lfstevens, Lsschwar, Mabuali, Machadoman, MahSim, Manishkarwa, Mark Renier, MarkusWinand, Mereman,
Mets501, Mike Rosoft, Mindmatrix, Morfeuz, Movses, MrOllie, Mxcatania, Müslimix, NGPriest, Nahoo, NellieBly, Nepenthes, NicDumZ, Nicolas1981, Norm mit, Oxymoron83, P21n7, Pawanjain19,
Pietrow, Planetneutral, Ppntori, Radagast83, Raypereda, Rich Farmbrough, Rl, RobSimpson, Ruzihm, S.K., Salvio giuliano, Samroar, Samson ayalew, Sandgem Addict,
Sbisolo, Searcherfinder, Sideswipe091976, SimonP, Sippsin, Sleske, SpaniardGR, SpeedyGonsales, StefanUdrea, Stegop, THEN WHO WAS PHONE?, Taka, The Thing That Should Not Be, Tide rolls,
Tommy2010, Triddle, Turnstep, TutterMouse, Vacio, Wbm1058, Wikiwikithe3rd, William Avery, Woohookitty, X7q, Yalckram, Yamaguchi先生, Zhenqinli, 469 anonymous edits
Stored procedure Source: http://en.wikipedia.org/w/index.php?oldid=574743100 Contributors: 0goodiegoodie0, 3Jane, Aaadamaa, Abdull, Aleenf1, Amaury, Andreas
Kaufmann, Andy.ruddock, AravindVR, AvicAWB, Avé, Berny68, Bevo, BobHindy, Bobo192, Bovineone, Brenan99, Brianray, Calane83, Cedear, ChristopherGautier, ClementSeveillac, Coachbudka,
Cww, DVdm, Dcoetzee, Derbeth, Dogsgomoo, Dougher, DrakeRedcrest, Dreamofthedolphin, Duster.Cleaner, EdgeOfEpsilon, Elockid, EvanCarroll, EvanSeeds, Farazbs20, Favonian, Flashspot, Frap,
Frecklefoot, Fred Bradstadt, Friendlydata, Gambhava, GoldenTorc, Graham87, GregorB, Harryboyles, Homestarmy, Honeplus, Hu12, IO Device, Ichimonji10, Izogi, JCLately, Jay, Jeffreyarcand,
Jeltz, Jim1138, Jogloran, KeyStroke, Kmsimon, Kuru, Kvdveer, Kyledmorgan, Larsinio, Lewissall1, Lights, Luckypayal, M4gnum0n, MER-C, Mariolina, Markblue, Marr75, Martincamino,
Materialscientist, Matticus78, Mayur, Merbabu, [email protected], Mikeblas, Mikesheffler, MilerWhite, Mindmatrix, Mitchandsherri, Modster, Moe Epsilon, MrJones, MrOllie,
Mschlindwein, NYCDA, Neilc, Nickdc, Nsaa, Ohiostandard, Pedro, Petersap, Pinethicket, Pravs, Primalmoon, Pseudonym, Rajeearul, Red Thrush, Regani, RevRagnarok, Rich Farmbrough, Riki,
Rror, Rythie, S.Örvarr.S, Sava chankov, Scadavidson, Scgtrp, Sdorrance, SimonP, Sippsin, Sqlinfo, Stevietheman, Stolze, SymlynX, Taeshadow, Thumperward, Tijfo098, Tobias Bergemann,
Troels Arvin, Unyoyega, Velella, Winston Chuen-Shih Yang, Xenium, Xzilla, Zhenqinli, Σ, 440 anonymous edits
Cursor (databases) Source: http://en.wikipedia.org/w/index.php?oldid=577352465 Contributors: Abdull, Abi79, AgujeroNegro, Aitias, AlikKirillovich, BlastOButter42, Catgut, Cedar101,
Christian75, Cwolfsheep, Danielx, DarkFalls, DarthPanda, DigitalEnthusiast, Dmccreary, DougBell, Ejdzej, Epbr123, Feder raz, Ffu, Fieldday-sunday, Greenrd, Habitmelon, Haleyga, Hutcher,
Ilyanep, Ivantalk, Jamestochter, Janigabor, Jerome Charles Potts, Justinc, Kamots, Kubieziel, Larsinio, Mark Renier, Mikeblas, Mindmatrix, Miten.morakhia, MrOllie, Mwtoews, Nagae, NawlinWiki,
NeonMerlin, NewEnglandYankee, OsamaK, PhilHibbs, Primalmoon, RHaworth, RandyFischer, Reedy, Richi, S.K., S.Örvarr.S, Sachzn, SarekOfVulcan, SimonP, Sippsin, SkyWalker, Stolze,
Tbannist, TechTony, Tobias Bergemann, Underpants, Wallie, Winston Chuen-Shih Yang, Wjasongilmore, Wknight94, Xiphoris, 92 anonymous edits
Partition (database) Source: http://en.wikipedia.org/w/index.php?oldid=577826075 Contributors: Alai, Andrew.rose, Andy Dingley, Angusmca, Beaddy1238, Brian1975, Ccubedd, Ceva,
Doubleplusjeff, Drrngrvy, Ehn, Fholahan, Foonly, Geoffmcgrath, Georgewilliamherbert, Habitmelon, Highflyerjl, Isheden, Jamelan, Jan.hasller, Jonstephens, Lurkfest, Mark Renier,
Materialscientist, Mdfst13, Mikeblas, Mindmatrix, MrOllie, Peak, Pinkadelica, S.K., Salobaas, Semmerich, SmallRepair, Stevelihn, Vishnava, Wordstext, Yahya Abdal-Aziz, 42 anonymous edits
Concurrency control Source: http://en.wikipedia.org/w/index.php?oldid=550119263 Contributors: 2GooD, Acdx, Adrianmunyua, Augsod, Bdesham, BrickThrower, CanisRufus, Carl Hewitt,
Christian75, Clausen, Comps, Craig Stuntz, Cyberpower678, DavidCary, Furrykef, Gdimitr, GeraldH, JCLately, Jesse Viviano, Jirislaby, John of Reading, JonHarder, Jose Icaza, Karada,
KeyStroke, Kku, Leibniz, M4gnum0n, Magioladitis, Malbrain, Mark Renier, Mgarcia, Mindmatrix, Miym, N3rV3, Nealcardwell, NguyenThanhQuang, Peak, Poor Yorick, Reedy, Rholton, Ruud Koot,
Siskus, Smallman12q, The Anome, Thingg, Thoreaulylazy, Tikuko, TonyW, Touko vk, Tumble, Victor falk, Vincnet, Wbm1058, Wikidrone, YUL89YYZ, 87 anonymous edits
Data dictionary Source: http://en.wikipedia.org/w/index.php?oldid=564190594 Contributors: Aednichols, Ajdlinux, AlistairMcMillan, Aluion, Andreas Kaufmann, Awis, Barek,
BozMo, Bubbahotep, ChrisGualtieri, Cybercobra, DEddy, Dan100, Daniel5127, Daswani.Amit, Davnor, DePiep, Dekisugi, Dicklyon, Dmccreary, Floweracre, Friendlydata, Gilliam, Gioto, Giso6150,
Gobbleswoggler, Hadal, Ham Pastrami, Hardyplants, Haymaker, Helwr, Hooperbloob, Icey, Immunize, Iridescent, Jacobko, Jaksckajwsb, Jamelan, Jeff3000, JeffTan, Joelemaltais,
Joinarnold, Jwissick, Karada, Ketiltrout, KeyStroke, Kku, Klaun, Lauri.pirttiaho, M.rsantoshkumar., Mark Renier, MarkDWikiUser, Materialscientist, MaxHund, Maziotis, Mdd, Mentifisto,
Michael Hardy, Mindmatrix, Mushroom, N1RK4UDSK714, Olaf Davis, Omicronpersei8, PartyDude!, Pavel Vozenilek, Perrydegner, Pnm, RattusMaximus, RayGates, Rettetast, Riana, Rich Farmbrough,
RickBeton, Shadowjams, Sprachpfleger, Sstrader, The Thing That Should Not Be, TheParanoidOne, Thetorpedodog, Tigerente, Ttwaring, Veyklevar, Violetriga, Wireless friend, Woohookitty,
Xphile2868, Xxsquishyxx, Zondor, Тне ежесабботочно, Филатов Алексей, 152 anonymous edits
XQuery API for Java Source: http://en.wikipedia.org/w/index.php?oldid=572931513 Contributors: A5b, Bgwhite, Bxj, Codename Lisa, F331491, Frap, GurtPosh, KlemenKocjancic, Mhkay, Protonk,
RHaworth, Vegaswikian, Yutsi, 23 anonymous edits
ODBC Source: http://en.wikipedia.org/w/index.php?oldid=566327529 Contributors: AKGhetto, AdventurousSquirrel, AlistairMcMillan, Allens, Andreas Kaufmann, AndriuZ, Andy Dingley, Arch
dude, Auric, AvicAWB, Avé, Beevvy, Bigpru, BobGibson, BonsaiViking, Borgx, Bovineone, BryEllis, Bunnyhop11, Cander0000, CanisRufus, Canterbury Tail, Charlesgold, Chealer,
ClaudioSantos, Computafreak, Craig Stuntz, DFulbright, Danim, David Gerard, Derbeth, Discospinster, Dittaeva, DragonHawk, Drewgut, Eglobe55, Electrolite, Epim, EverlastingWinner, Gcm,
GreyCat, Gwern, Harry Wood, Inzy, Irish all the way, JLaTondre, Jandalhandler, Jay, Jerome Charles Potts, Jkelly, Jklowden, John of Reading, JonathanMonroe, KeyStroke, KingsleyIdehen, Kuru,
Kyouteki, Kzafer, Larsinio, Lkstrand, Lowellian, Lurcher300b, MacTed, Magnus.de, Manop, Mark Renier, Markhurd, Martijn Hoekstra, Materialscientist, Maury Markowitz, Maximaximax,
MeltBanana, Michael Hardy, Mikeblas, Mindmatrix, Minesweeper, Minghong, Mintleaf, Misog, Mitsukai, NapoliRoma, Nikos 1993, Nixdorf, NoahSussman, Not Sure, Orlady, Orpheus, Oxda,
Pajz, Paul Foxworthy, Pedant17, Pmsyyz, Polluks, Power piglet, PrisonerOfIce, Quuxa, Raffaele Megabyte, Rajkumar9795, Reconsider the static, Reedy, RenniePet, Rjwilmsi, Rrabins,
Seanwong, Sega381, Spellmaster, Sspecter, Struway, Swhalen, The Anome, The wub, Thumperward, Tide rolls, Tin, Todorojo, TommyG, Viridae, Wez, Whpq, Wikipelli, William Avery,
Winterst, Woohookitty, Ysangkok, Yug1rt, Хаш-Эрдэнэ, 231 anonymous edits
Query language Source: http://en.wikipedia.org/w/index.php?oldid=575926344 Contributors: ASHPvanRenssen, Adarw, Ahoerstemeier, Ahunt, Albert688, AminHashem, AmirMehri, AndrewWarden,
BCable, Bacchus123, Beaton1131, BenAveling, Bkonrad, Chtirrell, CxQL, DEng, Danakil, Danim, Davidfstr, Deepugn, Devourer09, Diamondland, ERfan111, Edward,
Ehajiyev, Elwikipedista, Face, Frieda, Groovenstein, Grutness, HanielBarbosa, Honys, Ihenriksen, Inverse.chi, IvanLanin, Jay42, Joerg Kurt Wegner, John Vandenberg, John of
Reading, Jonathan.mark.lingard, KeyStroke, Kwiki, Larsinio, Logiphile, MarXidad, Mark Arsten, Mark Renier, Markhobley, Mgreenbe, Mhkay, MichaelSpeer, Mild Bill Hiccup, Msnicki, NGC
2736, Nikola Smolenski, Ojigiri, OsamaK, Peter Gulutzan, Retireduser1111, Rfl, SarekOfVulcan, Shekhardtu, Slipstream, Soumyasch, Srandrews, StevenWalling, Svick, Tassedethe, Techno.modus,
Throbblefoot, TommyG, Toussaint, Trevor MacInnis, Troels Arvin, Usien6, Valafar, Vanished user qkqknjitkcse45u3, Vmenkov, Wikiolap, Xodlop, ZygmuntKrynicki, 48 anonymous edits
Query optimization Source: http://en.wikipedia.org/w/index.php?oldid=574425789 Contributors: Abdull, Andreas Kaufmann, Andy Dingley, Avalon, Bearcat, Beland, Cadvga,
Cedar101, Danim, Edward, Ginsuloft, Glux, GregorB, Gzuckier, Isulica, JoshRosen, LessThanFree, MBisanz, Mild Bill Hiccup, Mouchoir le Souris, MrOllie, Mrmatiko, Nadeemhussain, Neilc,
Owl3638, Paige Master, Pascal.Tesson, Ronwarshawsky, Sct72, Sudhir h, TechPurism, Vitriden, Walter Görlitz, 24 anonymous edits
Query plan Source: http://en.wikipedia.org/w/index.php?oldid=574422744 Contributors: Aaronbrick, AlphaQuadrant, Ammar.w, AnchetaWis, Arcann, Bevo, Cedar101, Cww, Freezegravity, Grace
Note, Hardeeps, James barton, Larsinio, Mark Renier, Mbarbier, Mdesmet, Mikeblas, Mindmatrix, Neilc, Nikola Smolenski, Reedy, Rl, Ronwarshawsky, SimonP, Sippsin, Slaniel,
TheParanoidOne, UnitedStatesian, Walter Görlitz, Woodshed, ZoBlitz, 26 anonymous edits
Database administration and automation Source: http://en.wikipedia.org/w/index.php?oldid=575478582 Contributors: Aflorin27, Akerans, Anas2048, Beetstra, Cuttysc, Dabron,
David.lamberth, DrGangrene, Drunkenmonkey, Elonka, Ericgross, Hffmgb899, ITautomationFreak, JEH, JaGa, Jamesx12345, Jpbowen, Kjkolb, Kku, Kukushk, MZMcBride, Maahela,
Mwtoews, OliFilth, Pianonontroppo, Pointillist, Qwyrxian, R'n'B, Ronz, Rwwww, Rybec, ShelfSkewed, The Thing That Should Not Be, Theopolisme, Vegaswikian, 64 anonymous edits
Comparison of object-relational database management systems Source: http://en.wikipedia.org/w/index.php?oldid=566897826 Contributors: Akagel, Alexandre.Morgaut, Anas2048, Beland,
Beta m, Calabrese, Chikako, Chris the speller, Christian75, Cigano, Cubridorg, DRady, Donhalcon, Garyzx, Ghp, Gudeldar, JJay, Jeff3000, Jerome Charles Potts, Karnesky, KingsleyIdehen,
Leotohill, Lotje, MER-C, Mark Renier, Minghong, Palosirkka, Pamri, Pianonontroppo, Reedy, Requestion, Ruud Koot, Rwwww, Salix alba, Skyezx, SquidsandChips, Versus22, Wmahan,
19 anonymous edits
Natishalom, Nawk, Nawroth, Netmesh, Neustradamus, Nick Number, Nileshbansal, Nosql.analyst, Ntoll, OmerMor, Omidnoorani, Orenfalkowitz, Ostrolphant, PatrickFisher, Pcap, Peak,
Pereb, Peter Gulutzan, Phillips-Martin, Philu, Phoe6, Phoenix720, Phunehehe, Plustgarten, Pnm, Poohneat, ProfessorBaltasar, QuiteUnusual, Qwertyus, R39132, RA0808, Rabihnassar,
Raysonho, Razorflame, Really Enthusiastic, Rediosoft, Rfl, Robert1947, RobertG, Robhughadams, Ronz, Rossturk, Rpk512, Rtweed1955, Russss, Rzicari, Sae1962, Sagarjhobalia,
SamJohnston, Sandy.toast, Sanspeur, Sasindar, ScottConroy, Sdrkyj, Sduplooy, Seancribbs, Seraphimblade, Shadowjams, Shepard, Shijucv, Smyth, Socialuser, Somewherepurple, Sorenriise,
Sstrader, StanContributor, Stephen Bain, Stephen E Browne, Steve03Mills, Stevedekorte, Stevenguttman, Stimpy77, Strait, Syaskin, TJRC, Tabletop, Tagishsimon, Techsaint, Tedder, Tgrall,
The-verver, Theandrewdavis, Thegreeneman5, Thomas.uhl, ThomasMueller, Thumperward, ThurnerRupert, Thüringer, Timwi, Tobiasivarsson, Tomdo08, Trbdavies, Tshanky, Tsm32, Tsvljuchsh,
Tuvrotya, Tylerskf, Ugurbost, Uhbif19, Vegaswikian, Violaaa, Viper007Bond, Volt42, Voodootikigod, Vychtrle, Walter Görlitz, Wavelength, Weimanm, Whimsley, White gecko, Whooym,
Williamgreenly, Winston Chuen-Shih Yang, Winterst, Woohookitty, Wyverald, Xtremejames183, YPavan, Zapher67, Zaxius, Zond, Милан Јелисавчић, 654 anonymous edits
NewSQL Source: http://en.wikipedia.org/w/index.php?oldid=577827531 Contributors: Akim.demaille, Amux, Apavlo, Beland, Brianna.galloway, Diegodiazespinoza, Ibains, Intgr, Julian Mehnle,
MPH007, MacTed, Maury Markowitz, MrOllie, Mwaci99, Plothridge, Quuxplusone, Stuartyeates, UMD-Database, 12 anonymous edits
Image Sources, Licenses and Contributors
File:CodasylB.png Source: http://en.wikipedia.org/w/index.php?title=File:CodasylB.png License: Creative Commons Attribution-ShareAlike 3.0 Unported Contributors: Jean-Baptiste Waldner, User:Jbw
Image:Relational key.png Source: http://en.wikipedia.org/w/index.php?title=File:Relational_key.png License: Public Domain Contributors: LionKimbro
File:Database models.jpg Source: http://en.wikipedia.org/w/index.php?title=File:Database_models.jpg License: Creative Commons Attribution-Sharealike 3.0 Contributors: Marcel Douwe Dekker
Image:A2 2 Traditional View of Data.jpg Source: http://en.wikipedia.org/w/index.php?title=File:A2_2_Traditional_View_of_Data.jpg License: Public Domain Contributors: itl.nist.gov
Image:Flat File Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Flat_File_Model.svg License: Public Domain Contributors: Wgabrie (talk) 16:48, 13 March 2009 (UTC)
Image:Hierarchical Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Hierarchical_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
Image:Network Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Network_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
File:Emp Tables (Database).PNG Source: http://en.wikipedia.org/w/index.php?title=File:Emp_Tables_(Database).PNG License: Public Domain Contributors: Jamesssss
Image:Object-Oriented Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Object-Oriented_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
File:Update anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Update_anomaly.svg License: Public Domain Contributors: Nabav,
File:Insertion anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Insertion_anomaly.svg License: Public domain Contributors: en:User:Nabav, User:Stannered
File:Deletion anomaly.svg Source: http://en.wikipedia.org/w/index.php?title=File:Deletion_anomaly.svg License: Public domain Contributors: en:User:Nabav, User:Stannered
File:Referential integrity broken.png Source: http://en.wikipedia.org/w/index.php?title=File:Referential_integrity_broken.png License: GNU Free Documentation License Contributors: en:User:Ta bu shi da yu
Image:Relational database terms.svg Source: http://en.wikipedia.org/w/index.php?title=File:Relational_database_terms.svg License: Public Domain Contributors: User:Booyabazooka
File:Relational Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Relational_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
File:Relational key.png Source: http://en.wikipedia.org/w/index.php?title=File:Relational_key.png License: Public Domain Contributors: LionKimbro
File:Relational model concepts.png Source: http://en.wikipedia.org/w/index.php?title=File:Relational_model_concepts.png License: GNU Free Documentation License Contributors: User:AutumnSnow
File:Object-Oriented Model.svg Source: http://en.wikipedia.org/w/index.php?title=File:Object-Oriented_Model.svg License: Public Domain Contributors: U.S. Department of Transportation vectorization:
File:Db null.png Source: http://en.wikipedia.org/w/index.php?title=File:Db_null.png License: GNU Free Documentation License Contributors: User:SqlPac
File:XQJ-Architecture.svg Source: http://en.wikipedia.org/w/index.php?title=File:XQJ-Architecture.svg License: GNU Free Documentation License Contributors: F331491
File:Storage replication.png Source: http://en.wikipedia.org/w/index.php?title=File:Storage_replication.png License: Creative Commons Attribution-Sharealike 3.0 Contributors: User:Speculos
File:GraphDatabase PropertyGraph.png Source: http://en.wikipedia.org/w/index.php?title=File:GraphDatabase_PropertyGraph.png License: Creative Commons Zero Contributors: User:Obersachse
License
Creative Commons Attribution-Share Alike 3.0
//creativecommons.org/licenses/by-sa/3.0/