Module 1 Notes
NOTES
COMPILED BY:
Mrs. RAMEESA K, Assistant Professor
2024-25
UNIT - I
The warehouse design process consists of the following steps:
1. Choose a business process to model (e.g., orders, invoices, shipments, inventory, or sales).
2. Choose the grain of the business process, that is, the fundamental, atomic level of data to be represented in the fact table.
3. Choose the dimensions that will apply to each fact table record (e.g., time, item, customer, location).
4. Choose the measures that will populate each fact table record (numeric, additive quantities such as dollars sold and units sold).
Characteristics of Data Warehouse
Subject-Oriented
A data warehouse targets the modeling and analysis of data for decision-makers. Therefore,
data warehouses typically provide a concise and straightforward view around a particular
subject, such as customer, product, or sales, instead of the global organization's ongoing
operations. This is done by excluding data that are not useful concerning the subject and
including all data needed by the users to understand the subject.
Integrated
A data warehouse integrates various heterogeneous data sources like RDBMS, flat files, and
online transaction records. It requires performing data cleaning and integration during data
warehousing to ensure consistency in naming conventions, attribute types, etc., among
different data sources.
Time-Variant
Historical information is kept in a data warehouse. For example, one can retrieve data from
3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts
with a transaction system, where often only the most current data are kept.
Non-Volatile
The data warehouse is a physically separate data storage, which is transformed from the source
operational RDBMS. The operational updates of data do not occur in the data warehouse, i.e.,
update, insert, and delete operations are not performed. It usually requires only two procedures
in data accessing: Initial loading of data and access to data. Therefore, the DW does not require
transaction processing, recovery, and concurrency capabilities, which allows for substantial
speedup of data retrieval. Non-volatile means that, once entered into the warehouse, data
should not change.
A Three Tier Data Warehouse Architecture:
Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database
system. Back-end tools and utilities are used to feed data into the bottom tier from operational
databases or other external sources (such as customer profile information provided by external
consultants). These tools and utilities perform data extraction, cleaning, and transformation
(e.g., to merge similar data from different sources into a unified format), as well as load and
refresh functions to update the data warehouse. The data are extracted using application
program interfaces known as gateways. A gateway is supported by the underlying DBMS and
allows client programs to generate SQL code to be executed at a server. Examples of gateways
include ODBC (Open Database Connectivity) and OLE DB (Object Linking and Embedding,
Database) by Microsoft, and JDBC (Java Database Connectivity). This tier also contains a
metadata repository, which stores information about the data warehouse and its contents.
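As a rough illustration of how a client program might push SQL through such a gateway, the sketch below uses Python's pyodbc module over an ODBC connection. The DSN name "SalesDW", the credentials, and the sales_fact table are hypothetical; in practice the gateway and schema are whatever the warehouse exposes.

```python
# Illustrative only: assumes an ODBC data source named "SalesDW" is configured
# and that the warehouse exposes a sales_fact table (both are hypothetical).
import pyodbc

# The gateway (ODBC driver) lets the client generate SQL that is executed at the server.
conn = pyodbc.connect("DSN=SalesDW;UID=report_user;PWD=secret")
cursor = conn.cursor()

# An extraction-style query pushed through the gateway to the bottom-tier database.
cursor.execute("SELECT item_key, SUM(units_sold) FROM sales_fact GROUP BY item_key")
for item_key, total_units in cursor.fetchall():
    print(item_key, total_units)

conn.close()
```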
Tier-2:
The middle tier is an OLAP server that is typically implemented using either
a relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations
on multidimensional data to standard relational operations; or
a multidimensional OLAP (MOLAP) model, that is, a special-purpose server that
directly implements multidimensional data and operations.
Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis
tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
Multidimensional Model
A multidimensional model views data in the form of a data-cube. A data cube enables data to
be modelled and viewed in multiple dimensions. It is defined by dimensions and facts. The
dimensions are the perspectives or entities concerning which an organization keeps records.
For example, a shop may create a sales data warehouse to keep records of the store's sales for
the dimensions time, item, and location. These dimensions allow the store to keep track of
things such as monthly sales of items and the locations at which the items were sold. Each
dimension has a table related to it, called a dimensional table, which describes the dimension
further. For example, a dimensional table for an item may contain the attributes item name,
brand, and type. A multidimensional data model is organized around a central theme, for
example, sales. This theme is represented by a fact table. Facts are numerical measures. The
fact table contains the names of the facts or measures of the related dimensional tables.
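A minimal sketch of this idea using pandas (one common but not required choice): a small sales fact table with time, item, and location dimensions, pivoted so a measure can be viewed along two of those dimensions. The column names and values are invented for illustration.

```python
import pandas as pd

# Fact table: each row records the measures (units_sold, dollars_sold)
# for one combination of dimension values (time, item, location).
sales_fact = pd.DataFrame({
    "time":     ["2024-Q1", "2024-Q1", "2024-Q2", "2024-Q2"],
    "item":     ["TV",      "Laptop",  "TV",      "Laptop"],
    "location": ["Vancouver", "Victoria", "Vancouver", "Victoria"],
    "units_sold":   [120, 80, 150, 95],
    "dollars_sold": [60000, 96000, 75000, 114000],
})

# One 2-D view of the cube: time x item, summing the measure over location.
cube_view = sales_fact.pivot_table(index="time", columns="item",
                                   values="dollars_sold", aggfunc="sum")
print(cube_view)
```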
Data Warehouse Models:
1. Enterprise warehouse:
An enterprise warehouse collects all of the information about subjects spanning the entire
organization. It provides corporate-wide data integration, usually from one or more operational
systems or external information providers, and is cross-functional in scope.
It typically contains detailed data as well as summarized data, and can range in size
from a few gigabytes to hundreds of gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes,
computer super servers, or parallel architecture platforms. It requires extensive
business modeling and may take years to design and build.
2. Data mart:
A data mart contains a subset of corporate-wide data that is of value to a specific
group of users. The scope is confined to specific selected subjects. For example, a
marketing data mart may confine its subjects to customer, item, and sales. The data
contained in data marts tend to be summarized.
Data marts are usually implemented on low-cost departmental servers that are
UNIX/LINUX- or Windows-based. The implementation cycle of a data mart is
more likely to be measured in weeks rather than months or years. However, it may
involve complex integration in the long run if its design and planning were not
enterprise-wide.
Depending on the source of data, data marts can be categorized as independent or
dependent. Independent data marts are sourced from data captured from one or more
operational systems or external information providers, or from data generated
locally within a particular department or geographic area. Dependent data marts are
sourced directly from enterprise data warehouses.
3. Virtual warehouse:
A virtual warehouse is a set of views over operational databases. For efficient query
processing, only some of the possible summary views may be materialized.
A virtual warehouse is easy to build but requires excess capacity on operational
database servers.
Metadata Repository:
Metadata are data about data. When used in a data warehouse, metadata are the data that define
warehouse objects. Metadata are created for the data names and definitions of the given
warehouse. Additional metadata are created and captured for time stamping any extracted data,
the source of the extracted data, and missing fields that have been added by data cleaning or
integration processes.
A metadata repository should contain the following:
o A description of the structure of the data warehouse, which includes the warehouse
schema, view, dimensions, hierarchies, and derived data definitions, as well as data
mart locations and contents.
o Operational metadata, which include data lineage (history of migrated data and the
sequence of transformations applied to it), currency of data (active, archived, or
purged), and monitoring information (warehouse usage statistics, error reports, and
audit trails).
o The algorithms used for summarization, which include measure and dimension
definition algorithms, data on granularity, partitions, subject areas, aggregation,
summarization, and predefined queries and reports.
o The mapping from the operational environment to the data warehouse, which includes
source databases and their contents, gateway descriptions, data partitions, data
extraction, cleaning, transformation rules and defaults, data refresh and purging rules,
and security (user authorization and access control).
o Data related to system performance, which include indices and profiles that improve
data access and retrieval performance, in addition to rules for the timing and scheduling
of refresh, update, and replication cycles.
o Business metadata, which include business terms and definitions, data ownership
information, and charging policies.
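The physical structure of such a repository is implementation-specific; the dictionary below is only a toy sketch (all field names invented) of the kinds of entries it might hold for one warehouse object.

```python
# Toy sketch of metadata a repository might record for one extracted table.
# All field names and values are invented for illustration.
warehouse_metadata = {
    "object": "sales_fact",
    "schema": {"keys": ["time_key", "item_key", "branch_key", "location_key"],
               "measures": ["dollars_sold", "units_sold"]},
    "lineage": {"source": "orders_oltp.order_lines",            # history of migrated data
                "transformations": ["deduplicate", "currency_to_CAD"]},
    "currency_of_data": "active",            # active / archived / purged
    "extracted_at": "2024-07-01T02:00:00",   # time stamp of the extraction
    "refresh_rule": "nightly",
}
print(warehouse_metadata["lineage"]["source"])
```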
Schema Design:
Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
The entity-relationship data model is commonly used in the design of relational databases, where
a database schema consists of a set of entities and the relationships between them. Such a data
model is appropriate for on-line transaction processing. A data warehouse, however, requires
a concise, subject-oriented schema that facilitates on-line data analysis. The most popular data
model for a data warehouse is a multidimensional model. Such a model can exist in the form of
a star schema, a snowflake schema, or a fact constellation schema. Let's look at each of these
schema types.
Star schema:
The most common modeling paradigm is the star schema, in which the data warehouse contains
(1) a large central table (fact table) containing the bulk of the data, with no redundancy, and
(2) a set of smaller attendant tables (dimension tables), one for each dimension.
A star schema for All Electronics sales is shown in the figure. Sales are considered along four
dimensions, namely, time, item, branch, and location. The schema contains a central fact table
for sales that contains keys to each of the four dimensions, along with two measures: dollars
sold and units sold. To minimize the size of the fact table, dimension identifiers (such as time
key and item key) are system-generated identifiers. Notice that in the star schema, each
dimension is represented by only one table, and each table contains a set of attributes. For
example, the location dimension table contains the attribute set {location key, street, city,
province or state, country}. This constraint may introduce some redundancy.
For example, “Vancouver” and “Victoria” are both cities in the Canadian province of British
Columbia. Entries for such cities in the location dimension table will create redundancy among
the attributes province or state and country, that is, (..., Vancouver, British Columbia, Canada)
and (..., Victoria, British Columbia, Canada). Moreover, the attributes within a dimension table
may form either a hierarchy (total order) or a lattice (partial order).
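To make the star schema concrete, the sketch below uses Python's built-in sqlite3 purely as a stand-in for the warehouse DBMS. It mirrors the fact table and the location dimension table described above; the street and city values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One dimension table per dimension; the location table repeats province and
# country values across rows, which is the redundancy noted in the text.
conn.execute("""CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT)""")

# Central fact table: system-generated keys to each dimension plus the measures.
conn.execute("""CREATE TABLE sales_fact (
    time_key INTEGER, item_key INTEGER, branch_key INTEGER,
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL, units_sold INTEGER)""")

conn.execute("INSERT INTO location_dim VALUES (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada')")
conn.execute("INSERT INTO sales_fact VALUES (202401, 7, 3, 1, 600.0, 2)")

# A typical star join: total sales per city.
for row in conn.execute("""SELECT l.city, SUM(f.dollars_sold)
                           FROM sales_fact f JOIN location_dim l USING (location_key)
                           GROUP BY l.city"""):
    print(row)
```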
Snowflake schema:
A snowflake schema for All Electronics sales is given in the figure. Here, the sales fact table is
identical to that of the star schema. The main difference between the two schemas is
in the definition of dimension tables.
The single dimension table for item in the star schema is normalized in the snowflake schema,
resulting in new item and supplier tables. For example, the item dimension table now contains
the attributes item key, item name, brand, type, and supplier key, where supplier key is linked
to the supplier dimension table, containing supplier key and supplier type information.
Similarly, the single dimension table for location in the star schema can be normalized into two
new tables: location and city.
Notice that further normalization can be performed on province or state and country in the
snowflake schema.
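The sketch below (again using sqlite3 as a stand-in, with invented values) shows the normalization step the text describes: the star schema's single item dimension is split into separate item and supplier tables, so supplier attributes are reached through one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In the star schema the supplier attributes would sit inside item_dim;
# in the snowflake schema they are normalized into their own table.
conn.execute("""CREATE TABLE supplier_dim (
    supplier_key INTEGER PRIMARY KEY, supplier_type TEXT)""")
conn.execute("""CREATE TABLE item_dim (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key))""")

conn.execute("INSERT INTO supplier_dim VALUES (10, 'wholesale')")
conn.execute("INSERT INTO item_dim VALUES (7, '42in TV', 'Acme', 'electronics', 10)")

# Queries now need one extra join to reach supplier attributes.
print(conn.execute("""SELECT i.item_name, s.supplier_type
                      FROM item_dim i JOIN supplier_dim s USING (supplier_key)""").fetchall())
```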
Fact constellation:
A fact constellation schema is shown in the figure. This schema specifies two fact tables, sales
and shipping. The sales table definition is identical to that of the star schema. The shipping
table has five dimensions, or keys: item key, time key, shipper key, from location, and to
location, and two measures: dollars cost and units shipped.
A fact constellation schema allows dimension tables to be shared between fact tables. For
example, the dimension tables for time, item, and location are shared between both the sales
and shipping fact tables. In data warehousing, there is a distinction between a data warehouse
and a data mart.
A data warehouse collects information about subjects that span the entire organization, such as
customers, items, sales, assets, and personnel, and thus its scope is enterprise-wide. For data
warehouses, the fact constellation schema is commonly used, since it can model multiple,
interrelated subjects. A data mart, on the other hand, is a departmental subset of the data
warehouse that focuses on selected subjects, and thus its scope is department-wide. For data
marts, the star or snowflake schema is commonly used, since both are geared toward modeling
single subjects, although the star schema is more popular and efficient.
Measures can be organized into three categories (i.e., distributive, algebraic, holistic), based
on the kind of aggregate functions used.
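A brief Python illustration of the three categories on invented data: count and sum are distributive (they can be computed by combining partial results from partitions), average is algebraic (derivable from the distributive sum and count), and median is holistic (it cannot, in general, be computed exactly from bounded per-partition summaries).

```python
from statistics import median

# Partitioned data, e.g. sales values held on two different nodes.
part1, part2 = [4, 8, 15], [16, 23, 42]

# Distributive: apply the function per partition, then combine the partial results.
total = sum([sum(part1), sum(part2)])
count = sum([len(part1), len(part2)])

# Algebraic: computed from a bounded number of distributive measures.
average = total / count

# Holistic: the median needs the full data (or an approximation),
# not just fixed-size per-partition summaries.
overall_median = median(part1 + part2)

print(total, count, average, overall_median)
```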
The basic analytical operations of OLAP are consolidation (roll-up), drill-down, and slicing and dicing:
• Consolidation involves the aggregation of data that can be accumulated and computed
in one or more dimensions. For example, all sales offices are rolled up to the sales
department or sales division to anticipate sales trends.
• The drill-down is a technique that allows users to navigate through the details. For
instance, users can view the sales by individual products that make up a region’s sales.
• Slicing and dicing is a feature whereby users can take out (slicing) a specific set of
data of the OLAP cube and view (dicing) the slices from different viewpoints.
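A rough pandas sketch of these operations on a small, invented sales table: roll-up (consolidation) by grouping at a coarser level, drill-down by grouping at a finer level, and a slice/dice by filtering on a dimension value and viewing the result along other dimensions.

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["West", "West", "East", "East"],
    "office":  ["Vancouver", "Victoria", "Toronto", "Ottawa"],
    "product": ["TV", "Laptop", "TV", "Laptop"],
    "month":   ["2024-01", "2024-01", "2024-02", "2024-02"],
    "amount":  [100, 200, 150, 250],
})

# Consolidation (roll-up): aggregate offices up to the region level.
rollup = sales.groupby("region")["amount"].sum()

# Drill-down: navigate back to the finer office/product level within each region.
drilldown = sales.groupby(["region", "office", "product"])["amount"].sum()

# Slice: fix one dimension value; dice: view that slice along other dimensions.
west_slice = sales[sales["region"] == "West"]
dice = west_slice.pivot_table(index="product", columns="month",
                              values="amount", aggfunc="sum")

print(rollup, drilldown, dice, sep="\n\n")
```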
Types of OLAP:
1. Relational OLAP (ROLAP):
• ROLAP works directly with relational databases. The base data and the dimension
tables are stored as relational tables and new tables are created to hold the aggregated
information. It depends on a specialized schema design.
• This methodology relies on manipulating the data stored in the relational database to
give the appearance of traditional OLAP's slicing and dicing functionality. In essence,
each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL
statement. ROLAP tools do not use pre-calculated data cubes but instead pose the query
to the standard relational database and its tables in order to bring back the data required
to answer the question.
• ROLAP tools feature the ability to ask any question because the methodology is not limited
to the contents of a cube. ROLAP also has the ability to drill down to the lowest
level of detail in the database.
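To illustrate the point about WHERE clauses, the fragment below builds the kind of SQL a ROLAP tool might send for one slice-and-dice request. The fact and dimension table names follow the hypothetical star schema sketched earlier, and the string construction is deliberately simplified (a real tool would parameterize the query).

```python
# Hypothetical slice/dice request: fix location to 'Vancouver', dice by item and quarter.
filters = {"l.city": "Vancouver"}

where_clause = " AND ".join(f"{col} = '{val}'" for col, val in filters.items())
sql = (
    "SELECT i.item_name, t.quarter, SUM(f.dollars_sold) AS dollars_sold "
    "FROM sales_fact f "
    "JOIN item_dim i USING (item_key) "
    "JOIN time_dim t USING (time_key) "
    "JOIN location_dim l USING (location_key) "
    f"WHERE {where_clause} "
    "GROUP BY i.item_name, t.quarter"
)
print(sql)  # each further slice/dice action just adds or changes WHERE conditions
```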
Benefits:
• It is compatible with data warehouses and OLTP systems.
Limitations:
• SQL functionality is constrained.
• It’s difficult to keep aggregate tables up to date.
2. Multidimensional OLAP (MOLAP):
Limitations:
• It is difficult to change the dimensions without re-aggregating.
• Since all calculations are performed when the cube is built, a large amount of data cannot be
stored in the cube itself.
3. Hybrid OLAP (HOLAP):
• There is no clear agreement across the industry as to what constitutes Hybrid OLAP, except
that a database will divide data between relational and specialized storage.
• For example, for some vendors, a HOLAP database will use relational tables to hold the
larger quantities of detailed data, and use specialized storage for at least some aspects of the
smaller quantities of more-aggregate or less-detailed data.
• HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the capabilities
of both approaches.
• HOLAP tools can utilize both pre-calculated cubes and relational data sources.
Limitations:
• Because it supports both MOLAP and ROLAP servers, HOLAP architecture is extremely
complex.
DATA MINING:
Data mining refers to extracting or mining knowledge from large amounts of data. The term is
actually a misnomer; data mining would more appropriately have been named knowledge
mining, which emphasizes mining knowledge from large amounts of data.
It is the computational process of discovering patterns in large data sets involving methods at
the intersection of artificial intelligence, machine learning, statistics, and database systems.
The overall goal of the data mining process is to extract information from a data set and
transform it into an understandable structure for further use.
Given databases of sufficient size and quality, data mining technology can generate new
business opportunities by providing these capabilities:
Data mining automates the process of finding predictive information in large databases.
Questions that traditionally required extensive hands-on analysis can now be answered directly
from the data — quickly.
A typical example of a predictive problem is targeted marketing. Data mining uses data on past
promotional mailings to identify the targets most likely to maximize return on
investment in future mailings. Other predictive problems include forecasting bankruptcy and
other forms of default, and identifying segments of a population likely to respond similarly to
given events.
In some cases, users may have no idea regarding what kinds of patterns in their data may be
interesting, and hence may like to search for several different kinds of patterns in parallel. Thus,
it is important to have a data mining system that can mine multiple kinds of patterns to
accommodate different user expectations or applications. Furthermore, data mining systems
should be able to discover patterns at various granularities (i.e., different levels of abstraction).
Data mining systems should also allow users to specify hints to guide or focus the search for
interesting patterns. Because some patterns may not hold for all of the data in the database, a
measure of certainty or “trustworthiness” is usually associated with each discovered pattern.
Data mining functionalities, and the kinds of patterns they can discover, are described below.
Mining Frequent Patterns, Associations, and Correlations
Frequent patterns, as the name suggests, are patterns that occur frequently in data. There are
many kinds of frequent patterns, including item sets, subsequences, and substructures.
A frequent item set typically refers to a set of items that frequently appear together in a
transactional data set, such as milk and bread. A frequently occurring subsequence, such as the
pattern that customers tend to purchase first a PC, followed by a digital camera, and then a
memory card, is a (frequent) sequential pattern. A substructure can refer to different structural
forms, such as graphs, trees, or lattices, which may be combined with item sets or
subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern.
Mining frequent patterns leads to the discovery of interesting associations and correlations
within data.
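A minimal sketch of frequent itemset counting over a toy transactional data set (not a full Apriori implementation): it counts how often each pair of items appears together and keeps the pairs that meet a minimum support threshold. The transactions and threshold are invented.

```python
from itertools import combinations
from collections import Counter

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]
min_support = 3  # absolute count; often expressed instead as a fraction of transactions

# Count co-occurrences of every 2-item set across the transactions.
pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):
        pair_counts[pair] += 1

frequent_pairs = {pair: c for pair, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)  # ('bread', 'milk') appears in 3 of the 4 transactions
```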
Architecture of Data Mining
A typical data mining system may have the following major components.
1. Knowledge Base:
This is the domain knowledge that is used to guide the search or evaluate the interestingness of
resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes
or attribute values into different levels of abstraction. Knowledge such as user beliefs, which
can be used to assess a pattern’s interestingness based on its unexpectedness, may also be
included. Other examples of domain knowledge are additional interestingness constraints or
thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).
2. Data Mining Engine:
This is essential to the data mining system and ideally consists of a set of functional modules
for tasks such as characterization, association and correlation analysis, classification,
prediction, cluster analysis, outlier analysis, and evolution analysis.
3. Pattern Evaluation Module:
This component typically employs interestingness measures and interacts with the data mining
modules so as to focus the search toward interesting patterns. It may use interestingness
thresholds to filter out discovered patterns. Alternatively, the pattern evaluation module may
be integrated with the mining module, depending on the implementation of the data mining
method used. For efficient data mining, it is highly recommended to push the evaluation of
pattern interestingness as deep as possible into the mining process so as to confine the search to
only the interesting patterns.
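As a rough sketch of such threshold-based filtering (the rule structure and threshold values are invented), the following keeps only association rules whose support and confidence meet user-specified minimums.

```python
# Each discovered rule carries its own interestingness measures.
rules = [
    {"rule": "milk -> bread", "support": 0.35, "confidence": 0.80},
    {"rule": "eggs -> beer",  "support": 0.02, "confidence": 0.90},
    {"rule": "tv -> hdmi",    "support": 0.10, "confidence": 0.45},
]

MIN_SUPPORT, MIN_CONFIDENCE = 0.05, 0.60  # user-supplied interestingness thresholds

# Filter out patterns that fail either threshold.
interesting = [r for r in rules
               if r["support"] >= MIN_SUPPORT and r["confidence"] >= MIN_CONFIDENCE]
print(interesting)  # only "milk -> bread" survives the filter
```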
4. User interface:
This module communicates between users and the data mining system, allowing the user to
interact with the system by specifying a data mining query or task, providing information to
help focus the search, and performing exploratory data mining based on the intermediate data
mining results. In addition, this component allows the user to browse database and data
warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in
different forms.
Data mining systems can be categorized according to various criteria, as follows:
Classification according to the kinds of databases mined: A data mining system can be
classified according to the kinds of databases mined. Database systems can be classified
according to different criteria (such as data models, or the types of data or applications
involved), each of which may require its own data mining technique. Data mining systems can
therefore be classified accordingly.
For instance, if classifying according to data models, we may have a relational, transactional,
object- relational, or data warehouse mining system. If classifying according to the special
types of data handled, we may have a spatial, time-series, text, stream data, multimedia data
mining system, or a World Wide Web mining system.
Classification according to the kinds of knowledge mined: Data mining systems can be
categorized according to the kinds of knowledge they mine, that is, based on data mining
functionalities, such as characterization, discrimination, association and correlation analysis,
classification, prediction, clustering, outlier analysis, and evolution analysis. A comprehensive
data mining system usually provides multiple and/or integrated data mining functionalities.
Moreover, data mining systems can be distinguished based on the granularity or levels of
abstraction of the knowledge mined, including generalized knowledge (at a high level of
abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels
(considering several levels of abstraction). An advanced data mining system should facilitate
the discovery of knowledge at multiple levels of abstraction
Data mining systems can also be categorized as those that mine data regularities (commonly
occurring patterns) versus those that mine data irregularities (such as exceptions, or outliers).
In general, concept description, association and correlation analysis, classification, prediction,
and clustering mine data regularities, rejecting outliers as noise. These methods may also help
detect outliers.
Classification according to the applications adapted: Data mining systems can also be
categorized according to the applications they adapt. For
example, data mining systems may be tailored specifically for finance, telecommunications,
DNA, stock markets, e-mail, and so on. Different applications often require the integration of
application-specific methods. Therefore, a generic, all-purpose data mining system may not fit
domain-specific mining tasks.
Data Mining is a process of discovering various models, summaries, and derived values from
a given collection of data.
The general experimental procedure adapted to data-mining problems involves the following
steps:
1. State the problem and formulate the hypothesis
Most data-based modeling studies are performed in a particular application domain. Hence,
domain-specific knowledge and experience are usually necessary in order to come up with a
meaningful problem statement. Unfortunately, many application studies tend to focus on the
data- mining technique at the expense of a clear problem statement. In this step, a modeler
usually specifies a set of variables for the unknown dependency and, if possible, a general
form of this dependency as an initial hypothesis. There may be several hypotheses formulated
for a single problem at this stage.
The first step requires the combined expertise of an application domain and a data-mining
model. In practice, it usually means a close interaction between the data-mining expert and the
application expert. In successful data-mining applications, this cooperation does not stop in the
initial phase; it continues during the entire data-mining process.
2. Collect the data
This step is concerned with how the data are generated and collected. In general, there are two
distinct possibilities. The first is when the data-generation process is under the control of an
expert (modeler): this approach is known as a designed experiment.
The second possibility is when the expert cannot influence the data- generation process: this is
known as the observational approach. An observational setting, namely, random data
generation, is assumed in most data-mining applications.
Typically, the sampling distribution is completely unknown after data are collected, or it is
partially and implicitly given in the data-collection procedure. It is very important, however, to
understand how data collection affects its theoretical distribution, since such a priori knowledge
can be very useful for modeling and, later, for the final interpretation of results. Also, it is
important to make sure that the data used for estimating a model and the data used later for
testing and applying a model come from the same, unknown, sampling distribution. If this is
not the case, the estimated model cannot be successfully used in a final application of the
results.
A data mining task can be specified in the form of a data mining query, which is input to the
data mining system. A data mining query is defined in terms of data mining task primitives.
These primitives allow the user to interactively communicate with the data mining system
during discovery to direct the mining process or examine the findings from different angles or
depths. The data mining primitives specify the following: the set of task-relevant data to be
mined, the kind of knowledge to be mined, the background knowledge to be used in the
discovery process, the interestingness measures and thresholds for pattern evaluation, and the
expected representation for visualizing the discovered patterns.
Integration of a data mining system with a database system
The data mining system is integrated with a database or data warehouse system so that it can
perform its tasks effectively. A data mining system operates in an environment that requires it
to communicate with other data systems, such as a database system. The possible schemes for
integrating these systems are as follows:
No coupling − No coupling defines that a data mining system will not use any function of a
database or data warehouse system. It can retrieve data from a specific source (including a file
system), process the data using some data mining algorithms, and then save the mining results
in a different file.
Loose Coupling − In this scheme, the data mining system uses some services of a database or
data warehouse system. The data is fetched from a data repository handled by these systems,
data mining approaches are used to process the data, and the processed results are then saved
either in a file or in a designated area of a database or data warehouse (see the sketch after
these coupling schemes). Loose coupling is better than no coupling because it can fetch any
portion of the data stored in databases by using query processing or other system facilities.
Semitight Coupling − In this scheme, efficient implementations of a few essential data mining
primitives are provided in the database/data warehouse system. These primitives can include
sorting, indexing, aggregation, histogram analysis, multi-way join, and pre-computation of
some important statistical measures, including sum, count, max, min, standard deviation, etc.
Tight coupling − Tight coupling means that the data mining system is smoothly integrated
into the database/data warehouse system. The data mining subsystem is treated as one
functional component of the information system.
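The sketch below illustrates the loose-coupling idea mentioned above: data are fetched from a database (sqlite3 here, standing in for the DBMS) with an ordinary query, mined inside the application, and the result saved back to a designated table. All table and column names are invented.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, item TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)",
                 [("ann", "milk"), ("bob", "milk"), ("ann", "bread")])

# Loose coupling: use the database only to fetch the task-relevant data ...
rows = conn.execute("SELECT item FROM purchases").fetchall()

# ... mine it in the application (here, just item frequencies) ...
item_counts = Counter(item for (item,) in rows)

# ... and store the mining result back in a designated area of the database.
conn.execute("CREATE TABLE mining_results (item TEXT, frequency INTEGER)")
conn.executemany("INSERT INTO mining_results VALUES (?, ?)", item_counts.items())
print(conn.execute("SELECT * FROM mining_results").fetchall())
```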
Major issues in data mining include the following:
Presentation and visualization of data mining results. - Once the patterns are discovered, they
need to be expressed in high-level languages and visual representations. These representations
should be easily understandable by the users.
Handling noisy or incomplete data. - Data cleaning methods are required that can handle
noise and incomplete objects while mining the data regularities. Without such methods, the
accuracy of the discovered patterns will be poor.
Pattern evaluation. - This refers to the interestingness problem: many of the patterns discovered
may be uninteresting to the user because they represent common knowledge or lack novelty.
Efficiency and scalability of data mining algorithms. - In order to effectively extract
information from the huge amounts of data in databases, data mining algorithms must be
efficient and scalable.
Parallel, distributed, and incremental mining algorithms. - The factors such as huge size of
databases, wide distribution of data, and complexity of data mining methods motivate the
development of parallel and distributed data mining algorithms. These algorithms divide the
data into partitions, which are processed in parallel; the results from the partitions are then
merged. Incremental algorithms incorporate database updates without having to mine the entire
data again from scratch.
Data preprocessing: Data preprocessing is converting raw data into legible and defined sets
that allow businesses to conduct data mining, analyze the data, and process it for business
activities. It's important for businesses to preprocess their data correctly, as they use various
forms of input to collect raw data, which can affect its quality. Preprocessing data is an
important step, as raw data can be inconsistent or incomplete in its formatting. Effectively
preprocessing raw data can increase its accuracy, which in turn improves the quality and
reliability of the projects that rely on it.
3. It increases the data's readability for algorithms. Preprocessing enhances the data's quality
and makes it easier for machine learning algorithms to read, use, and interpret it.
Data Cleaning
Data cleaning is an essential step in the data mining process and is crucial to the construction
of a model. It is a required step that is nevertheless frequently overlooked. The major problem
in quality information management is data quality: problems with data quality can arise at any
place in an information system, and data cleansing offers a solution to these issues.
Data cleaning is the process of correcting or deleting inaccurate, damaged, improperly
formatted, duplicated, or insufficient data from a dataset. Even if results and algorithms appear
to be correct, they are unreliable if the data is inaccurate. There are numerous ways for data to
be duplicated or incorrectly labeled when merging multiple data sources.
In general, data cleaning lowers errors and raises the caliber of the data. Although it might be
a time-consuming and laborious operation, fixing data mistakes and removing incorrect
information must be done. Data mining itself, a method for finding useful information in data,
can be a crucial aid in cleaning it up: data quality mining is a methodology that uses data mining
methods to find and fix data quality issues in sizable databases. Data mining automatically
extracts intrinsic and hidden information from large data sets. Data cleansing
can be accomplished using a variety of data mining approaches.
To arrive at a precise final analysis, it is crucial to understand and improve the quality of the
data, and the data must be prepared before key patterns can be identified. Before doing business
analysis and gaining insights, data cleaning in data mining enables the user to identify erroneous
or missing data.
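A small pandas sketch of routine cleaning steps on an invented data set: dropping a duplicated row, filling a missing numeric value with the column median, and normalizing inconsistently formatted labels.

```python
import pandas as pd

raw = pd.DataFrame({
    "customer": ["Ann", "Ann", "Bob", "Cara"],
    "city":     ["vancouver", "vancouver", "Victoria", None],
    "amount":   [100.0, 100.0, None, 250.0],
})

cleaned = (
    raw.drop_duplicates()  # remove the duplicated Ann row
       .assign(
           # standardize label formatting and flag missing cities explicitly
           city=lambda d: d["city"].str.title().fillna("Unknown"),
           # fill the missing amount with the median of the remaining values
           amount=lambda d: d["amount"].fillna(d["amount"].median()),
       )
)
print(cleaned)
```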
Data Integration
Data integration is the process of combining data from multiple sources into a cohesive and
consistent view. This process involves identifying and accessing the different data sources,
mapping the data to a common format, and reconciling any inconsistencies or discrepancies
between the sources. The goal of data integration is to make it easier to access and analyze data
that is spread across multiple systems or platforms, in order to gain a more complete and
accurate understanding of the data.
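As a rough sketch of mapping two sources to a common format and reconciling a discrepancy (the column names and the unit mismatch are invented), the following renames columns, converts units, and merges the sources on a shared key.

```python
import pandas as pd

# Source A uses "cust_id" and stores revenue in dollars.
source_a = pd.DataFrame({"cust_id": [1, 2], "revenue_usd": [100.0, 250.0]})

# Source B uses "customer_id" and stores revenue in cents.
source_b = pd.DataFrame({"customer_id": [1, 2], "revenue_cents": [40000, 90000]})

# Map both sources to a common schema before combining them.
a = source_a.rename(columns={"cust_id": "customer_id",
                             "revenue_usd": "online_revenue"})
b = source_b.assign(store_revenue=source_b["revenue_cents"] / 100.0)[
        ["customer_id", "store_revenue"]]

# Integrated, consistent view across both systems.
integrated = a.merge(b, on="customer_id", how="outer")
print(integrated)
```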
Data Transformation:
In data transformation, the data are transformed or consolidated into forms appropriate for
mining.
Data transformation can involve the following:
1. Smoothing, which works to remove noise from the data. Such techniques include
binning, regression, and clustering.
2. Aggregation, where summary or aggregation operations are applied to the data. For
example, the daily sales data may be aggregated so as to compute monthly and annual
total amounts. This step is typically used in constructing a data cube for analysis of the
data at multiple granularities.
3. Generalization of the data, where low-level or "primitive" (raw) data are replaced by
higher-level concepts through the use of concept hierarchies. For example, categorical
attributes, like street, can be generalized to higher- level concepts, like city or country.
4. Normalization, where the attribute data are scaled so as to fall within a small specified
range, such as -1.0 to 1.0, or 0.0 to 1.0 (see the sketch after this list).
5. Attribute construction (or feature construction), where new attributes are constructed
and added from the given set of attributes to help the mining process.
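A minimal sketch of min-max normalization, the scaling mentioned in item 4 above; the sales values and target ranges are invented for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly so they fall within [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

daily_sales = [200, 400, 600, 1000]
print(min_max_normalize(daily_sales))             # scaled to [0.0, 1.0]
print(min_max_normalize(daily_sales, -1.0, 1.0))  # scaled to [-1.0, 1.0]
```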
Data Reduction
Data reduction techniques can be applied to obtain a reduced representation of the data set that
is much smaller in volume, yet closely maintains the integrity of the original data. That is,
mining on the reduced data set should be more efficient yet produce the same (or almost the
same) analytical results.
Data reduction strategies include the following:
Numerosity reduction, where the data are replaced or estimated by alternative, smaller data
representations such as parametric models (which need store only the model parameters
instead of the actual data) or nonparametric methods such as clustering, sampling, and the use
of histograms.
Discretization and concept hierarchy generation, where raw data values for attributes are
replaced by ranges or higher conceptual levels. Data discretization is a form of numerosity
reduction that is very useful for the automatic generation of concept hierarchies. Discretization
and concept hierarchy generation are powerful tools for data mining, in that they allow the
mining of data at multiple levels of abstraction.
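As a short illustration of discretization, the sketch below bins raw ages into higher-level ranges, one simple level of a concept hierarchy; the bin edges and labels are invented.

```python
import pandas as pd

ages = pd.Series([13, 22, 35, 41, 58, 67, 72])

# Replace raw values by ranges (a higher conceptual level).
age_groups = pd.cut(ages, bins=[0, 18, 40, 65, 120],
                    labels=["youth", "young_adult", "middle_aged", "senior"])
print(age_groups.value_counts())
```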