TOPIC II_View of data and Data Models

The document discusses database systems, focusing on data abstraction, data models, and database languages. It outlines the three levels of abstraction (physical, logical, and view), explains various data models (relational, entity-relationship, object-based, and semi-structured), and describes the roles of data-definition and data-manipulation languages. Additionally, it highlights the importance of data dictionaries and differentiates between database users and administrators.

Uploaded by

kbjoash

TOPIC II: View of Data

A database system is a collection of interrelated data and a set of programs that allow users to access
and modify these data. A major purpose of a database system is to provide users with an abstract view
of the data. That is, the system hides certain details of how the data are stored and maintained.

Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers
to use complex data structures to represent data in the database. Since many database-system users
are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:

Figure 1.2: Levels of Abstraction in a DBMS

• Physical level (or Internal View / Schema): The lowest level of abstraction describes how the data
are actually stored. The physical level describes complex low-level data structures in detail.

• Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes what
data are stored in the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may involve complex physical-level
structures, the user of the logical level does not need to be aware of this complexity. This is referred to
as physical data independence. Database administrators, who must decide what information to keep
in the database, use the logical level of abstraction.

• View level (or External View / Schema): The highest level of abstraction describes only part of the
entire database. Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database. Many users of the database system do not need
all this information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many views for
the same database. Figure 1.2 shows the relationship among the three levels of abstraction.

An analogy to the concept of data types in programming languages may clarify the distinction among
levels of abstraction. Many high-level programming languages support the notion of a structured type.
For example, we may describe a record as follows:
type instructor = record
    ID : char (5);
    name : char (20);
    dept_name : char (20);
    salary : numeric (8,2);
end;

This code defines a new record type called instructor with four fields. Each field has a name
and a type associated with it. A university organization may have several such record types,
including:
• department, with fields dept_name, building, and budget
• course, with fields course_id, title, dept_name, and credits
• student, with fields ID, name, dept_name, and tot_cred

At the physical level, an instructor, department, or student record can be described as a block of
consecutive storage locations. The compiler hides this level of detail from programmers. Similarly, the
database system hides many of the lowest-level storage details from database programmers. Database
administrators, on the other hand, may be aware of certain details of the physical organization of the
data.

At the logical level, each such record is described by a type definition, as in the previous code segment,
and the interrelationship of these record types is defined as well. Programmers using a programming
language work at this level of abstraction. Similarly, database administrators usually work at this level of
abstraction.

Finally, at the view level, computer users see a set of application programs that hide details of the data
types. At the view level, several views of the database are defined, and a database user sees some or
all of these views. In addition to hiding details of the logical level of the database, the views also provide
a security mechanism to prevent users from accessing certain parts of the database. For example,
clerks in the university registrar’s office can see only that part of the database that has information about
students; they cannot access information about salaries of instructors.
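
As a rough illustration of the view level, the sketch below uses Python's built-in sqlite3 module. The instructor table follows the record type above; the view name instructor_public and the sample rows are made up for this example. SQLite has no user accounts, so the sketch shows only how a view hides the salary column, not how access is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    -- Logical level: the full record type, including salary.
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2)
    );
    -- Made-up sample rows, for illustration only.
    INSERT INTO instructor VALUES ('10101', 'Instructor A', 'Comp. Sci.', 65000);
    INSERT INTO instructor VALUES ('12121', 'Instructor B', 'Finance', 90000);

    -- View level: a hypothetical view that leaves salary out.
    CREATE VIEW instructor_public AS
        SELECT ID, name, dept_name FROM instructor;
""")

cursor = conn.execute("SELECT * FROM instructor_public ORDER BY ID")
view_columns = [col[0] for col in cursor.description]  # column names seen through the view
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

A program that reads only instructor_public never sees salaries, which is the same idea as the registrar clerks in the example above.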

Instances and Schemas

Databases change over time as information is inserted and deleted. The collection of information stored
in the database at a particular moment is called an instance of the database. The overall design of the
database is called the database schema. Schemas are changed infrequently, if at all. The concept of
database schemas and instances can be understood by analogy to a program written in a programming
language. A database schema corresponds to the variable declarations (along with associated type
definitions) in a program.
Each variable has a particular value at a given instant. The values of the variables in a program at a
point in time correspond to an instance of a database schema. Database systems have several
schemas, partitioned according to the levels of abstraction. The physical schema describes the
database design at the physical level, while the logical schema describes the database design at the
logical level. A database may also have several schemas at the view level, sometimes called
subschemas, which describe different views of the database. Of these, the logical schema is by far the
most important, in terms of its effect on application programs, since programmers construct applications
by using the logical schema. The physical schema is hidden beneath the logical schema, and can
usually be changed easily without affecting application programs. Application programs are said to
exhibit physical data independence if they do not depend on the physical schema, and thus need not
be rewritten if the physical schema changes.
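
The difference between a schema and an instance, and the idea of physical data independence, can be sketched with Python's sqlite3 module. The student table follows the fields listed earlier; the helper names, the index name, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: the design of the table, declared once and changed rarely.
conn.execute(
    "CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, tot_cred INTEGER)"
)

def schema_columns(conn):
    # PRAGMA table_info returns one row per column; position 1 holds the column name.
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]

def instance(conn):
    # Instance: whatever rows happen to be stored at this moment.
    return conn.execute("SELECT ID, name FROM student ORDER BY ID").fetchall()

schema_before = schema_columns(conn)
instance_empty = instance(conn)

# Made-up rows: the instance changes as data are inserted and deleted.
conn.execute("INSERT INTO student VALUES ('00128', 'Student A', 'Comp. Sci.', 102)")
conn.execute("INSERT INTO student VALUES ('12345', 'Student B', 'Physics', 32)")
instance_two = instance(conn)

conn.execute("DELETE FROM student WHERE ID = '12345'")
instance_one = instance(conn)

# Physical-level change: add an index. The query text and its result stay the same.
conn.execute("CREATE INDEX student_dept_idx ON student (dept_name)")
instance_after_index = instance(conn)
schema_after = schema_columns(conn)
```

The list of columns is identical before and after, while the stored rows differ at each step. Adding the index changes how the data can be reached on disk but not what the query returns.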

Topic III: Data Models

Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints. A data model
provides a way to describe the design of a database at the physical, logical, and view levels.

The data models can be classified into four different categories:


• Relational Model. The relational model uses a collection of tables to represent both data and the
relationships among those data. Each table has multiple columns, and each column has a unique
name. Tables are also known as relations. The relational model is an example of a record-based
model.
Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed number
of fields, or attributes. The columns of the table correspond to the attributes of the record type. The
relational data model is the most widely used data model, and a vast majority of current database
systems are based on the relational model.

• Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of basic
objects, called entities, and relationships among these objects.
An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-
relationship model is widely used in database design.

• Object-Based Data Model. Object-oriented programming (especially in Java, C++, or C#) has
become the dominant software-development methodology. This led to the development of an object-
oriented data model that can be seen as extending the E-R model with notions of encapsulation,
methods (functions), and object identity. The object-relational data model combines features of the
object-oriented data model and relational data model.

• Semi-structured Data Model. The semi-structured data model permits the specification of data
where individual data items of the same type may have different sets of attributes. This is in contrast to
the data models mentioned earlier, where every data item of a particular type must have the same set
of attributes. The Extensible Markup Language (XML) is widely used to represent semi-structured
data.

Historically, the network data model and the hierarchical data model preceded the relational data model.
These models were tied closely to the underlying implementation, and complicated the task of modeling
data. As a result they are used little now, except in old database code that is still in service in some places.
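
To make the contrast between the record-based and semi-structured models concrete, here is a small Python sketch. The field names follow the instructor record used earlier; the XML element names (office, phone) and all values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Record-based: every record of a type has the same fixed set of fields.
INSTRUCTOR_FIELDS = ("ID", "name", "dept_name", "salary")
fixed_records = [
    ("10101", "Instructor A", "Comp. Sci.", 65000.00),
    ("12121", "Instructor B", "Finance", 90000.00),
]
fixed_ok = all(len(rec) == len(INSTRUCTOR_FIELDS) for rec in fixed_records)

# Semi-structured: items of the same type may carry different attributes.
doc = ET.fromstring("""
<instructors>
  <instructor><ID>10101</ID><name>Instructor A</name><office>Room 12</office></instructor>
  <instructor><ID>12121</ID><name>Instructor B</name><phone>555-0100</phone><phone>555-0101</phone></instructor>
</instructors>
""")

# Collect the distinct child tags of each instructor element.
attribute_sets = [
    sorted({child.tag for child in inst}) for inst in doc.findall("instructor")
]
print(attribute_sets)
```

Both XML elements are instructors, yet one has an office and the other has two phone numbers. A fixed-format record type would need every record to carry the same fields.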

Sub topic: Database Languages


A database system provides a data-definition language to specify the database schema and a data-
manipulation language to express database queries and updates. In practice, the data-definition and
data-manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.

Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database

There are basically two types of DML:


• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data.
Declarative DMLs are usually easier to learn and use than are procedural DMLs. However, since a user
does not have to specify how to get the data, the database system has to figure out an efficient means
of accessing data. A query is a statement requesting the retrieval of information. The portion of a DML
that involves information retrieval is called a query language. Although technically incorrect, it is
common practice to use the terms query language and data-manipulation language synonymously.
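
The two styles can be compared in a short Python sketch using sqlite3. The Python loop only stands in for a procedural DML (it still fetches rows through SQL), and the table contents are made up; the point is the difference between spelling out how to find the rows and stating only what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [  # made-up sample rows
        ("10101", "Instructor A", "Comp. Sci.", 65000),
        ("12121", "Instructor B", "Finance", 90000),
        ("45565", "Instructor C", "Comp. Sci.", 75000),
        ("83821", "Instructor D", "Comp. Sci.", 92000),
    ],
)

# Procedural style: say what is needed AND how to get it
# (visit every row, test it, collect the matches, sort them).
procedural = []
for _id, name, dept_name, salary in conn.execute(
    "SELECT ID, name, dept_name, salary FROM instructor"
):
    if dept_name == "Comp. Sci." and salary > 70000:
        procedural.append(name)
procedural.sort()

# Declarative style: say only what is needed; the system picks the access path.
declarative = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor "
        "WHERE dept_name = ? AND salary > ? ORDER BY name",
        ("Comp. Sci.", 70000),
    )
]
print(procedural, declarative)
```

Both versions return the same names. In the declarative version the database system is free to use an index or any other strategy, which is why it must figure out an efficient access method on its own.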

Data-Definition Language (DDL)


We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL). The DDL is also used to specify additional properties of the data.

We specify the storage structure and access methods used by the database system by a set of
statements in a special type of DDL called a data storage and definition language. These statements
define the implementation details of the database schemas, which are usually hidden from the users.

The data values stored in the database must satisfy certain consistency constraints.
For example, suppose the university requires that the account balance of a department must never be
negative. The DDL provides facilities to specify such constraints. The database system checks these
constraints every time the database is updated. In general, a constraint can be an arbitrary predicate
pertaining to the database. However, arbitrary predicates may be costly to test. Thus, database
systems implement integrity constraints that can be tested with minimal overhead.
• Domain Constraints. A domain of possible values must be associated with every attribute (for
example, integer types, character types, date/time types). Declaring an attribute to be of a particular
domain acts as a constraint on the values that it can take. Domain constraints are the most elementary
form of integrity constraint. They are tested easily by the system whenever a new data item is entered
into the database.

• Referential Integrity. There are cases where we wish to ensure that a value that appears in one
relation for a given set of attributes also appears in a certain set of attributes in another relation
(referential integrity). For example, the department listed for each course must be one that actually
exists. More precisely, the dept_name value in a course record must appear in the dept_name attribute
of some record of the department relation.
Database modifications can cause violations of referential integrity. When a referential-integrity
constraint is violated, the normal procedure is to reject the action that caused the violation.

• Assertions. An assertion is any condition that the database must always satisfy. Domain constraints
and referential-integrity constraints are special forms of assertions. However, there are many
constraints that we cannot express by using only these special forms. For example, “Every department
must have at least five courses offered every semester” must be expressed as an assertion. When an
assertion is created, the system tests it for validity. If the assertion is valid, then any future modification
to the database is allowed only if it does not cause that assertion to be violated.

• Authorization. We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization, the most common being: read authorization, which allows reading, but not
modification, of data; insert authorization, which allows insertion of new data, but not modification of
existing data; update authorization, which allows modification, but not deletion, of data; and delete
authorization, which allows deletion of data. We may assign the user all, none, or a combination of
these types of authorization.
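
Domain constraints and referential integrity can be tried out with Python's sqlite3 module, as in the sketch below. It assumes that the "account balance" of a department corresponds to the budget field listed earlier, and the helper name rejected and the sample rows are invented. SQLite does not support general assertions or authorization statements, so those two items are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        building  TEXT,
        budget    NUMERIC CHECK (budget >= 0)   -- constraint: never negative
    );
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY,
        title     TEXT,
        dept_name TEXT REFERENCES department (dept_name),  -- referential integrity
        credits   INTEGER
    );
    INSERT INTO department VALUES ('Physics', 'Building A', 70000);
""")

def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# The department exists, so this course is accepted.
ok_course = not rejected(
    "INSERT INTO course VALUES ('PHY-101', 'Physical Principles', 'Physics', 4)"
)
# No 'Biology' department exists, so the reference is rejected.
bad_reference = rejected(
    "INSERT INTO course VALUES ('BIO-101', 'Intro. to Biology', 'Biology', 4)"
)
# A negative budget violates the CHECK constraint.
negative_budget = rejected(
    "UPDATE department SET budget = -1 WHERE dept_name = 'Physics'"
)
print(ok_course, bad_reference, negative_budget)
```

In each failing case the system rejects the action that caused the violation and leaves the stored data unchanged.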

The DDL, just like any other programming language, gets as input some instructions (statements) and
generates some output. The output of the DDL is placed in the data dictionary, which contains
metadata—that is, data about data. The data dictionary is considered to be a special type of table that
can only be accessed and updated by the database system itself (not a regular user). The database
system consults the data dictionary before reading or modifying actual data.

Data Dictionary
We can define a data dictionary as a DBMS component that stores the definition of data characteristics
and relationships. You may recall that such “data about data” were labeled metadata. The DBMS data
dictionary provides the DBMS with its self-describing characteristic. In effect, the data dictionary
resembles an X-ray of the company’s entire data set, and is a crucial element in the data
administration function.
Two main types of data dictionary exist: integrated and stand-alone. An integrated data dictionary is
included with the DBMS. For example, all relational DBMSs include a built-in data dictionary or system
catalog that is frequently accessed and updated by the RDBMS. Other DBMSs, especially older types,
do not have a built-in data dictionary; instead, the DBA may use third-party stand-alone data dictionary
systems.
Data dictionaries can also be classified as active or passive. An active data dictionary is automatically
updated by the DBMS with every database access, thereby keeping its access information up-to-date.
A passive data dictionary is not updated automatically and usually requires a batch process to be run.
Data dictionary access information is normally used by the DBMS for query optimization purposes.
The data dictionary’s main function is to store the description of all objects that interact with the
database. Integrated data dictionaries tend to limit their metadata to the data managed by the DBMS.
Stand-alone data dictionary systems are usually more flexible and allow the DBA to describe and
manage all the organization’s data, whether or not they are computerized. Whatever the data
dictionary’s format, its existence provides database designers and end users with a much improved
ability to communicate. In addition, the data dictionary is the tool that helps the DBA to resolve data
conflicts.
Although there is no standard format for the information stored in the data dictionary, several features
are common. For example, the data dictionary typically stores descriptions of all:
• Data elements that are defined in all tables of all databases. Specifically, the data dictionary stores
the names, data types, display formats, internal storage formats, and validation rules. The data
dictionary tells where an element is used, by whom it is used, and so on.
• Tables defined in all databases. For example, the data dictionary is likely to store the name of
the table creator, the date of creation, access authorizations, the number of columns, and so
on.
• Indexes defined for each database table. For each index, the DBMS stores at least the index
name, the attributes used, the location, specific index characteristics, and the creation date.
• Defined databases: who created each database, the date of creation, where the database is located,
who the DBA is, and so on.
• End users and administrators of the database.
• Programs that access the database, including screen formats, report formats, application
formats, SQL queries, and so on.
• Access authorizations for all users of all databases.
• Relationships among data elements: which elements are involved, whether the relationships are
mandatory or optional, the connectivity and cardinality, and so on.
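
An integrated data dictionary can be inspected directly in many systems. The sketch below uses SQLite, whose built-in catalog table is called sqlite_master; other DBMSs name and organize their catalogs differently. The department table and the index name are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT, budget NUMERIC);
    CREATE INDEX department_building_idx ON department (building);
""")

# sqlite_master holds one row per table, index, view, or trigger.
# Names starting with "sqlite_" are internal objects and are skipped here.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY type, name"
).fetchall()

# PRAGMA table_info describes the data elements of one table:
# position 1 is the column name, position 2 is the declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(department)")]

print(catalog)
print(columns)
```

The rows in catalog are metadata: they describe the tables and indexes rather than the departments themselves. SQLite maintains this catalog itself whenever a DDL statement runs.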

Sub topic: Database Administrators and Database Users

A primary goal of a database system is to retrieve information from and store new information in the
database.
People who work with a database can be categorized as database users or database administrators.

Database Users and User Interfaces

There are four different types of database-system users, differentiated by the way they expect to
interact with the system. Different types of user interfaces have been designed for the different types of
users.
Naive users are unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously. For example, a bank teller who needs to transfer $50 from
account A to account B invokes a program called transfer. This program asks the teller for the amount
of money to be transferred, the account from which the money is to be transferred, and the account to
which the money is to be transferred.
As another example, consider a user who wishes to find her account balance over the World Wide
Web. Such a user may access a form, where she enters her account number. An application program
at the Web server then retrieves the account balance, using the given account number, and passes this
information back to the user. The typical user interface for naive users is a forms interface, where the
user can fill in appropriate fields of the form. Naive users may also simply read reports generated from
the database.
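
The transfer program that the teller invokes might look roughly like the Python sketch below. The account table, its columns, the function signature, and the starting balances are all assumptions; a real banking application would involve far more checks.

```python
import sqlite3

def transfer(conn, from_account, to_account, amount):
    """Move `amount` from one account to another, or change nothing on failure."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        row = conn.execute(
            "SELECT balance FROM account WHERE account_id = ?", (from_account,)
        ).fetchone()
        if row is None or row[0] < amount:
            raise ValueError("unknown source account or insufficient funds")
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE account_id = ?",
            (amount, from_account),
        )
        updated = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE account_id = ?",
            (amount, to_account),
        )
        if updated.rowcount != 1:
            raise ValueError("unknown destination account")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 20)])
conn.commit()

# The teller supplies only the amount and the two account identifiers.
transfer(conn, "A", "B", 50)
balances = dict(conn.execute("SELECT account_id, balance FROM account"))
print(balances)
```

The naive user never sees the SQL statements. The form or teller screen collects the three inputs and the application program does the rest.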
Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces. Rapid application development
(RAD) tools are tools that enable an application programmer to construct forms and reports without
writing a program. There are also special types of programming languages that combine imperative
control structures (for example, for loops, while loops and if-then-else statements) with statements of
the data manipulation language. These languages, sometimes called fourth-generation languages,
often include special features to facilitate the generation of forms and the display of data on the screen.
Most major commercial database systems include a fourth generation language.
Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
Analysts who submit queries to explore data in the database fall in this category.
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of
data in different ways. For instance, an analyst can see total sales by region (for example, North, South,
East, and West), or by product, or by a combination of region and product (that is, total sales of each
product in each region). The tools also permit the analyst to select specific regions, look at data in more
detail (for example, sales by city within a region) or look at the data in less detail (for example,
aggregate products together by category).
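
The kinds of summaries described above can be approximated with ordinary grouped queries. The sketch below uses sqlite3 with a made-up sales table; plain GROUP BY only stands in for what a real OLAP tool offers interactively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [  # made-up figures
        ("North", "Widget", 100),
        ("North", "Gadget", 50),
        ("South", "Widget", 70),
        ("South", "Gadget", 30),
        ("North", "Widget", 25),
    ],
)

# Total sales by region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# Total sales by product.
by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
).fetchall()

# Total sales of each product in each region.
by_region_product = conn.execute(
    "SELECT region, product, SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product"
).fetchall()

print(by_region)
print(by_product)
print(by_region_product)
```

Moving from by_region to by_region_product corresponds to looking at the data in more detail, and going the other way corresponds to looking at it in less detail.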
Another class of tools for analysts is data mining tools, which help them find certain kinds of patterns
in data.
Specialized users are sophisticated users who write specialized database applications that
do not fit into the traditional data-processing framework. Among these applications are
computer-aided design systems, knowledge base and expert systems,
systems that store data with complex data types (for example, graphics data and audio data), and
environment-modeling systems.
