About Structured Data
A data model is an abstract model that organizes elements of data and standardizes how
they relate to one another and to the properties of real-world entities.[2][3] For instance, a
data model may specify that the data element representing a car be composed of a number
of other elements which, in turn, represent the color and size of the car and define its
owner.
The corresponding professional activity is generally called data modeling or, more specifically, database design. Data models are typically specified by a data expert, data specialist, data scientist, data librarian, or data scholar.
A data modeling language and notation are often represented in graphical form as diagrams.[4]
A data model can sometimes be referred to as a data structure, especially in the context of
programming languages. Data models are often complemented by function models,
especially in the context of enterprise models.
A data model explicitly determines the structure of data; conversely, structured data is data
organized according to an explicit data model or data structure. Structured data is in
contrast to unstructured data and semi-structured data.
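The contrast above can be made concrete with a small Python sketch; the car/owner data below is invented purely for illustration:

```python
import json

# Structured: conforms to an explicit data model (fixed fields, known types).
structured = {"make": "Volvo", "color": "red", "owner": "Ada"}

# Semi-structured: self-describing, but fields may vary from record to record.
semi_structured = json.loads('{"make": "Volvo", "notes": {"color": "red"}}')

# Unstructured: no data model; meaning must be extracted from free text.
unstructured = "Ada owns a red Volvo."

# Structured and semi-structured data can be queried by field name;
# unstructured data cannot.
assert structured["color"] == "red"
assert semi_structured["notes"]["color"] == "red"
```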
The term data model can refer to two distinct but closely related concepts. Sometimes it
refers to an abstract formalization of the objects and relationships found in a particular
application domain: for example the customers, products, and orders found in a
manufacturing organization. At other times it refers to the set of concepts used in defining
such formalizations: for example concepts such as entities, attributes, relations, or tables. So
the "data model" of a banking application may be defined using the entity–relationship
"data model". This article uses the term in both senses.
One persistent problem in practice is a lack of standards that would ensure that data models both meet business needs and are consistent.[5]
A data model explicitly determines the structure of data. Typical applications of data models
include database models, design of information systems, and enabling exchange of data.
Usually, data models are specified in a data modeling language.[3]
A data model instance may be one of three kinds according to ANSI in 1975:[6] a conceptual schema (describing the semantics of a domain), a logical schema (describing the semantics as represented by a particular data management technology, such as tables and columns), or a physical schema (describing the physical means by which data are stored, such as partitions and tablespaces).
The significance of this approach, according to ANSI, is that it allows the three perspectives
to be relatively independent of each other. Storage technology can change without affecting
either the logical or the conceptual model. The table/column structure can change without
(necessarily) affecting the conceptual model. In each case, of course, the structures must
remain consistent with the other model. The table/column structure may be different from
a direct translation of the entity classes and attributes, but it must ultimately carry out the
objectives of the conceptual entity class structure. Early phases of many software
development projects emphasize the design of a conceptual data model. Such a design can
be detailed into a logical data model. In later stages, this model may be translated into a physical data model. However, it is also possible to implement a conceptual model directly.
One of the earliest pioneering works in modeling information systems was done by Young
and Kent (1958),[7][8] who argued for "a precise and abstract way of specifying the
informational and time characteristics of a data processing problem". They wanted to create
"a notation that should enable the analyst to organize the problem around any piece of
hardware". Their work was the first effort to create an abstract specification and invariant
basis for designing different alternative implementations using different hardware
components. The next step in IS modeling was taken by CODASYL, an IT industry
consortium formed in 1959, who essentially aimed at the same thing as Young and Kent: the
development of "a proper structure for machine-independent problem definition language,
at the system level of data processing". This led to the development of a specific IS
information algebra.[8]
In the 1960s data modeling gained more significance with the initiation of the management
information system (MIS) concept. According to Leondes (2002), "during that time, the
information system provided the data and information for management purposes. The first
generation database system, called Integrated Data Store (IDS), was designed by Charles
Bachman at General Electric. Two famous database models, the network data model and the
hierarchical data model, were proposed during this period of time".[9] Towards the end of
the 1960s, Edgar F. Codd worked out his theories of data arrangement, and proposed the
relational model for database management based on first-order predicate logic.[10]
In the 1970s G.M. Nijssen developed the Natural Language Information Analysis Method (NIAM), and developed it further in the 1980s in cooperation with Terry Halpin into Object–Role Modeling (ORM). However, it was Terry Halpin's 1989 PhD thesis that created the formal foundation on which Object–Role Modeling is based.
Bill Kent, in his 1978 book Data and Reality,[11] compared a data model to a map of a
territory, emphasizing that in the real world, "highways are not painted red, rivers don't
have county lines running down the middle, and you can't see contour lines on a mountain".
In contrast to other researchers who tried to create models that were mathematically clean
and elegant, Kent emphasized the essential messiness of the real world, and the task of the
data modeler to create order out of chaos without excessively distorting the truth.
In the 1980s, according to Jan L. Harrington (2000), "the development of the object-oriented
paradigm brought about a fundamental change in the way we look at data and the
procedures that operate on data. Traditionally, data and procedures have been stored
separately: the data and their relationship in a database, the procedures in an application
program. Object orientation, however, combined an entity's procedure with its data."[12]
During the early 1990s, three Dutch mathematicians, Guido Bakema, Harm van der Lek, and Jan Pieter Zwart, continued the development of the work of G.M. Nijssen. They focused more on the communication part of the semantics. In 1997 they formalized the method as Fully Communication Oriented Information Modeling (FCO-IM).
A data structure diagram (DSD) is a diagram and data model used to describe conceptual
data models by providing graphical notations which document entities and their
relationships, and the constraints that bind them. The basic graphic elements of DSDs are
boxes, representing entities, and arrows, representing relationships. Data structure
diagrams are most useful for documenting complex data entities.
Data structure diagrams are an extension of the entity–relationship model (ER model). In
DSDs, attributes are specified inside the entity boxes rather than outside of them, while
relationships are drawn as boxes composed of attributes which specify the constraints that
bind entities together. DSDs differ from the ER model in that the ER model focuses on the
relationships between different entities, whereas DSDs focus on the relationships of the
elements within an entity and enable users to fully see the links and relationships between
each entity.
There are several styles for representing data structure diagrams, with the notable
difference in the manner of defining cardinality. The choices are between arrow heads,
inverted arrow heads (crow's feet), or numerical representation of the cardinality.
Generic data models are generalizations of conventional data models. They define
standardized general relation types, together with the kinds of things that may be related by
such a relation type. Generic data models are developed as an approach to solving some
shortcomings of conventional data models. For example, different modelers usually produce
different conventional data models of the same domain. This can lead to difficulty in
bringing the models of different people together and is an obstacle for data exchange and
data integration. Invariably, however, this difference is attributable to different levels of
abstraction in the models and differences in the kinds of facts that can be instantiated (the
semantic expression capabilities of the models). The modelers need to communicate and
agree on certain elements that are to be rendered more concretely, in order to make the
differences less significant.
A semantic data model in software engineering is a technique to define the meaning of data
within the context of its interrelationships with other data. A semantic data model is an
abstraction that defines how the stored symbols relate to the real world.[13] A semantic
data model is sometimes called a conceptual data model.
The logical data structure of a database management system (DBMS), whether hierarchical,
network, or relational, cannot totally satisfy the requirements for a conceptual definition of
data because it is limited in scope and biased toward the implementation strategy employed
by the DBMS. Therefore, the need to define data from a conceptual view has led to the
development of semantic data modeling techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined within physical data stores. A semantic data model is an abstraction that defines how the stored symbols relate to the real world; the model must therefore be a true representation of the real world.[13]
Data architecture is the design of data for use in defining the target state and the
subsequent planning needed to hit the target state. It is usually one of several architecture
domains that form the pillars of an enterprise architecture or solution architecture.
A data architecture describes the data structures used by a business and/or its applications.
There are descriptions of data in storage and data in motion; descriptions of data stores,
data groups, and data items; and mappings of those data artifacts to data qualities,
applications, locations, etc.
Essential to realizing the target state, data architecture describes how data is processed, stored, and utilized in a given system. It provides criteria for data processing operations that make it possible to design data flows and also to control the flow of data in the system.
Data modeling in software engineering is the process of creating a data model by applying
formal data model descriptions using data modeling techniques. Data modeling is a
technique for defining business requirements for a database. It is sometimes called
database modeling because a data model is eventually implemented in a database.[16]
The figure illustrates the way data models are developed and used today. A conceptual data
model is developed based on the data requirements for the application that is being
developed, perhaps in the context of an activity model. The data model will normally consist
of entity types, attributes, relationships, integrity rules, and the definitions of those objects.
This is then used as the start point for interface or database design.[5]
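As a rough sketch, the ingredients of such a conceptual data model (entity types, attributes, relationships, and integrity rules) might be expressed in Python dataclasses; every name here is hypothetical, not drawn from any particular modeling method:

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Product:
    product_id: int
    name: str
    unit_price: float

@dataclass
class Order:
    order_id: int
    customer: Customer  # relationship: each order belongs to one customer
    lines: list = field(default_factory=list)  # (Product, quantity) pairs

    def add_line(self, product: Product, quantity: int) -> None:
        # Integrity rule: order quantities must be positive.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        self.lines.append((product, quantity))

    def total(self) -> float:
        return sum(p.unit_price * q for p, q in self.lines)
```

In a real project the same information would typically be captured in a modeling notation (such as an ER diagram) rather than in code, and then used as the starting point for database design.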
Some important properties of data for which requirements need to be met are:
Another kind of data model describes how to organize data using a database management
system or other data management technology. It describes, for example, relational tables
and columns or object-oriented classes and attributes. Such a data model is sometimes
referred to as the physical data model, but in the original ANSI three schema architecture, it
is called "logical". In that architecture, the physical model describes the storage media
(cylinders, tracks, and tablespaces). Ideally, this model is derived from the more conceptual
data model described above. It may differ, however, to account for constraints like
processing capacity and usage patterns.
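A minimal sketch of such a table/column model (the "logical" level, in ANSI terms), using SQLite from Python; the person/account schema is invented for illustration:

```python
import sqlite3

# Two relational tables derived from a (hypothetical) conceptual model:
# a Person entity and an Account entity related to it by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        owner_id   INTEGER NOT NULL REFERENCES person(person_id),
        balance    REAL NOT NULL DEFAULT 0.0
    );
""")
conn.execute("INSERT INTO person VALUES (1, 'Ada')")
conn.execute("INSERT INTO account VALUES (10, 1, 250.0)")
row = conn.execute(
    "SELECT p.name, a.balance "
    "FROM account a JOIN person p ON p.person_id = a.owner_id"
).fetchone()
# row == ('Ada', 250.0)
```

The physical schema underneath (pages, storage layout) is managed by SQLite itself, which is precisely the independence of levels the ANSI architecture describes.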
While data analysis is a common term for data modeling, the activity actually has more in
common with the ideas and methods of synthesis (inferring general concepts from
particular instances) than it does with analysis (identifying component concepts from more
general ones). (Presumably we call ourselves systems analysts because no one can say "systems synthesists".) Data modeling strives to bring the data structures of interest together into a cohesive, inseparable whole by eliminating unnecessary data redundancies and by relating data structures with relationships.
A different approach is to use adaptive systems such as artificial neural networks that can
autonomously create implicit models of data.
A data structure is a way of storing data in a computer so that it can be used efficiently. It is
an organization of mathematical and logical concepts of data. Often a carefully chosen data
structure will allow the most efficient algorithm to be used. The choice of the data structure
often begins from the choice of an abstract data type.
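A small Python illustration of how the choice of data structure affects efficiency: a membership test on a list scans every element, while a set uses hashing, so the same operation on the same data runs much faster:

```python
import timeit

items = list(range(100_000))
as_list = items          # list: membership test is O(n)
as_set = set(items)      # set: membership test is O(1) on average

target = 99_999          # worst case for the list: the last element
t_list = timeit.timeit(lambda: target in as_list, number=100)
t_set = timeit.timeit(lambda: target in as_set, number=100)
# t_set is far smaller than t_list, although both answer the same question
```

Both structures implement the same abstract data type here (a collection supporting membership tests), which is why the choice typically begins from the abstract data type and only then settles on a concrete structure.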
A data model describes the structure of the data within a given domain and, by implication,
the underlying structure of that domain itself. This means that a data model in fact specifies
a dedicated grammar for a dedicated artificial language for that domain. A data model
represents classes of entities (kinds of things) about which a company wishes to hold
information, the attributes of that information, and relationships among those entities and
(often implicit) relationships among those attributes. The model describes the organization
of the data to some extent irrespective of how data might be represented in a computer
system.
The entities represented by a data model can be the tangible entities, but models that
include such concrete entity classes tend to change over time. Robust data models often
identify abstractions of such entities. For example, a data model might include an entity
class called "Person", representing all the people who interact with an organization. Such an
abstract entity class is typically more appropriate than ones called "Vendor" or "Employee",
which identify specific roles played by those people.
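The role-versus-entity distinction can be sketched in Python (all names hypothetical): a single Person entity class, with "Employee" and "Vendor" modeled as roles that reference it rather than as separate entity classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    person_id: int
    name: str

@dataclass(frozen=True)
class Role:
    person: Person
    role_type: str  # e.g. "employee", "vendor"

pat = Person(1, "Pat")
roles = [Role(pat, "employee"), Role(pat, "vendor")]
# The same Person participates in both roles; if Pat stops being a
# vendor, the Person record itself is unaffected.
assert {r.person for r in roles} == {pat}
```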
For example, in the relational model, the structural part is based on a modified concept of
the mathematical relation; the integrity part is expressed in first-order logic and the
manipulation part is expressed using the relational algebra, tuple calculus and domain
calculus.
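These parts can be sketched in plain Python, with a relation modeled as a list of attribute dictionaries; the data and the operator implementations are illustrative stand-ins, not the formal algebra:

```python
employees = [
    {"emp_id": 1, "name": "Ada", "dept": "ENG"},
    {"emp_id": 2, "name": "Grace", "dept": "OPS"},
]
departments = [
    {"dept": "ENG", "site": "Berlin"},
    {"dept": "OPS", "site": "Oslo"},
]

def select(rel, pred):            # sigma: keep tuples satisfying a predicate
    return [t for t in rel if pred(t)]

def project(rel, attrs):          # pi: keep only the named attributes
    return [{a: t[a] for a in attrs} for t in rel]

def natural_join(r, s):           # join on attributes with shared names
    shared = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in shared)]

eng = select(employees, lambda t: t["dept"] == "ENG")
joined = natural_join(employees, departments)

# Integrity part, stated as a first-order condition over the relations:
# every employee's dept value must appear in the departments relation.
assert all(any(t["dept"] == d["dept"] for d in departments)
           for t in employees)
```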
A data model instance is created by applying a data model theory. This is typically done to
solve some business enterprise requirement. Business requirements are normally captured
by a semantic logical data model. This is transformed into a physical data model instance
from which is generated a physical database. For example, a data modeler may use a data
modeling tool to create an entity–relationship model of the corporate data repository of
some business enterprise. This model is transformed into a relational model, which in turn
generates a relational database.
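A toy version of this pipeline, assuming a hypothetical dictionary-based ER description that is mechanically transformed into relational DDL and then used to generate an actual SQLite database:

```python
import sqlite3

# Hypothetical ER description: entity types and their attributes.
er_model = {
    "Customer": {"attributes": {
        "customer_id": "INTEGER PRIMARY KEY",
        "name": "TEXT NOT NULL"}},
    "Order": {"attributes": {
        "order_id": "INTEGER PRIMARY KEY",
        "customer_id": "INTEGER REFERENCES Customer(customer_id)"}},
}

def to_ddl(model):
    """Transform the ER description into one CREATE TABLE per entity."""
    stmts = []
    for entity, spec in model.items():
        cols = ", ".join(f"{c} {t}" for c, t in spec["attributes"].items())
        stmts.append(f'CREATE TABLE "{entity}" ({cols});')
    return stmts

conn = sqlite3.connect(":memory:")
for stmt in to_ddl(er_model):
    conn.execute(stmt)
tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

Real data modeling tools perform a far richer version of this transformation, mapping relationships to foreign keys and generating indexes and constraints along the way.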
Patterns[18] are common data modeling structures that occur in many data models.
It is common practice to draw a context-level data-flow diagram first which shows the
interaction between the system and outside entities. The DFD is designed to show how a
system is divided into smaller portions and to highlight the flow of data between those
parts. This context-level data-flow diagram is then "exploded" to show more detail of the
system being modeled.
An information model is not a type of data model, but more or less an alternative model.
Within the field of software engineering, both a data model and an information model can
be abstract, formal representations of entity types that include their properties,
relationships and the operations that can be performed on them. The entity types in the
model may be kinds of real-world objects, such as devices in a network, or they may
themselves be abstract, such as for the entities used in a billing system. Typically, they are
used to model a constrained domain that can be described by a closed set of entity types,
properties, relationships and operations.
In computing the term object model has a distinct second meaning of the general properties
of objects in a specific computer programming language, technology, notation or
methodology that uses them. For example, the Java object model, the COM object model, or
the object model of OMT. Such object models are usually defined using concepts such as
class, message, inheritance, polymorphism, and encapsulation. There is an extensive
literature on formalized object models as a subset of the formal semantics of programming
languages.
Object–Role Modeling (ORM) is a method for conceptual modeling, and can be used as a tool
for information and rules analysis.[25]
UML offers a mix of functional models, data models, and database models.