Week 5 Database
Foundations of Business Intelligence: Databases and Information Management
Managing Data in a Traditional File Environment
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
FIGURE 6-1
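The hierarchy described above can be sketched in Python. This is only an illustration: the field names and sample values (student_id, name, course) are hypothetical and do not come from the figure.

```python
# Hypothetical illustration of the data hierarchy:
# bit -> byte -> field -> record -> file -> database.

# A byte is a group of 8 bits and can represent one character.
byte_value = "A".encode("ascii")[0]      # integer code of the character
bits = format(byte_value, "08b")         # the same byte written as 8 bits

# A field is a group of bytes (characters), such as a name.
name_field = "Ann"

# A record groups related fields.
record = {"student_id": 1, "name": name_field, "course": "IS 101"}

# A file collects related records.
course_file = [
    record,
    {"student_id": 2, "name": "Bob", "course": "IS 101"},
]

# A database organizes related files.
database = {"course_file": course_file, "financial_file": []}

print(bits)
print(len(database["course_file"]), "records in course_file")
```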
TRADITIONAL FILE PROCESSING
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
FIGURE 6-2
Managing Data in a Traditional File Environment
• Database
– Serves many applications by centralizing data and controlling redundant data
• Database management system (DBMS)
– Interfaces between applications and physical data files
– Separates logical and physical views of data
– Solves problems of traditional file environment
• Controls redundancy
• Eliminates inconsistency
• Uncouples programs and data
• Enables organization to centrally manage data and data security
Types of Database
A relational database organizes data in the form of two-dimensional tables. Illustrated here are tables for the entities SUPPLIER and PART showing how they represent each entity and its attributes. Supplier Number is a primary key for the SUPPLIER table and a foreign key for the PART table.
FIGURE 6-4
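The primary key and foreign key relationship between SUPPLIER and PART can be sketched with Python's built-in sqlite3 module. The column names and sample rows below are assumptions made for illustration, not the contents of the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Supplier Number is the primary key of SUPPLIER.
conn.execute("""
    CREATE TABLE supplier (
        supplier_number INTEGER PRIMARY KEY,
        supplier_name   TEXT NOT NULL
    )
""")
# Supplier Number appears again in PART as a foreign key.
conn.execute("""
    CREATE TABLE part (
        part_number     INTEGER PRIMARY KEY,
        part_name       TEXT NOT NULL,
        supplier_number INTEGER NOT NULL
            REFERENCES supplier(supplier_number)
    )
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO supplier VALUES (8259, 'Supplier A')")
conn.execute("INSERT INTO part VALUES (137, 'Door latch', 8259)")

# A part that points at a supplier that does not exist is rejected.
try:
    conn.execute("INSERT INTO part VALUES (145, 'Side mirror', 9999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
print(rejected, part_count)
```

The foreign key is what lets the DBMS keep the two tables consistent: every row in PART must refer to a supplier that actually exists in SUPPLIER.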
Capabilities of Database Management Systems (DBMSs)
FIGURE 6-5 The select, join, and project operations enable data from two different tables to be combined and only selected
attributes to be displayed.
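The three operations can be sketched in plain Python over lists of rows. The rows, column names, and the selection criterion are hypothetical; a real DBMS would perform these operations through its query language.

```python
# Hypothetical sample rows for two tables.
part = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 8259},
    {"part_number": 150, "part_name": "Door seal", "supplier_number": 8263},
    {"part_number": 152, "part_name": "Compressor", "supplier_number": 8444},
]
supplier = [
    {"supplier_number": 8259, "supplier_name": "Supplier A"},
    {"supplier_number": 8263, "supplier_name": "Supplier B"},
    {"supplier_number": 8444, "supplier_name": "Supplier C"},
]

# Select: keep only the rows that meet a stated criterion.
selected = [p for p in part if p["part_number"] in (137, 150)]

# Join: combine rows from both tables on the shared supplier_number.
supplier_by_number = {s["supplier_number"]: s for s in supplier}
joined = [{**p, **supplier_by_number[p["supplier_number"]]} for p in selected]

# Project: keep only the columns the user wants to see.
columns = ("part_number", "part_name", "supplier_name")
projected = [{c: row[c] for c in columns} for row in joined]

for row in projected:
    print(row)
```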
Capabilities of Database Management Systems (DBMSs)
• Designing Databases
– Conceptual (logical) design: abstract model from business perspective
– Physical design: How database is arranged on direct-access storage devices
• Design process identifies:
– Relationships among data elements, redundant database elements
– Most efficient way to group data elements to meet business requirements, needs of
application programs
• Normalization
– Streamlining complex groupings of data to minimize redundant data elements and
awkward many-to-many relationships
AN UNNORMALIZED RELATION FOR ORDER
FIGURE 6-9 An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each
order. There is only a one-to-one correspondence between Order_Number and Order_Date.
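One way to picture normalization is to split the repeating groups of an unnormalized ORDER relation into separate, smaller tables. The sketch below assumes hypothetical attribute names and sample values; it illustrates the idea and is not the exact decomposition used in the course text.

```python
# Unnormalized: each order carries a repeating group of parts and suppliers.
unnormalized_orders = [
    {
        "order_number": 1001,
        "order_date": "2024-01-15",
        "items": [
            {"part_number": 137, "part_name": "Door latch", "quantity": 2,
             "supplier_number": 8259, "supplier_name": "Supplier A"},
            {"part_number": 150, "part_name": "Door seal", "quantity": 1,
             "supplier_number": 8263, "supplier_name": "Supplier B"},
        ],
    },
    {
        "order_number": 1002,
        "order_date": "2024-01-16",
        "items": [
            {"part_number": 137, "part_name": "Door latch", "quantity": 5,
             "supplier_number": 8259, "supplier_name": "Supplier A"},
        ],
    },
]

# Normalized tables, each keyed so that a fact is stored only once.
orders, parts, suppliers = {}, {}, {}
line_items = []

for order in unnormalized_orders:
    orders[order["order_number"]] = {"order_date": order["order_date"]}
    for item in order["items"]:
        line_items.append({
            "order_number": order["order_number"],
            "part_number": item["part_number"],
            "quantity": item["quantity"],
        })
        parts[item["part_number"]] = {
            "part_name": item["part_name"],
            "supplier_number": item["supplier_number"],
        }
        suppliers[item["supplier_number"]] = {
            "supplier_name": item["supplier_name"],
        }

print(len(orders), len(line_items), len(parts), len(suppliers))
```

After the split, the name of part 137 is stored once in parts even though the part appears on two orders.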
Capabilities of Database Management Systems (DBMSs)
Attributes shown: BookNo, Phone, State, Title, Address, Date
Tools for Improving Business Performance and Decision Making
• Big data
• Massive sets of unstructured/semi-structured data from Web traffic, social media, sensors, and so on
• Petabytes, exabytes of data
• Volumes too great for typical DBMS
• Can reveal more patterns and anomalies
Tools for Improving Business Performance and Decision Making
• Data warehouse:
– Stores current and historical data from many core operational transaction
systems
– Consolidates and standardizes information for use across enterprise, but data
cannot be altered
– Provides analysis and reporting tools
• Data marts:
– Subset of data warehouse
– Summarized or focused portion of data for use by specific population of users
– Typically focuses on single subject or line of business
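The relationship between a data warehouse and a data mart can be sketched as a filter followed by a summary. The rows, the column names, and the build_data_mart helper are hypothetical and serve only to illustrate the "summarized or focused portion" idea.

```python
from collections import defaultdict

# Hypothetical warehouse rows: current and historical data, several business lines.
warehouse = [
    {"year": 2023, "line": "retail", "region": "East", "sales": 120.0},
    {"year": 2024, "line": "retail", "region": "East", "sales": 150.0},
    {"year": 2024, "line": "retail", "region": "West", "sales": 90.0},
    {"year": 2024, "line": "wholesale", "region": "East", "sales": 300.0},
]


def build_data_mart(rows, line):
    """Return sales for one line of business, summarized by (year, region)."""
    totals = defaultdict(float)
    for row in rows:
        if row["line"] == line:
            totals[(row["year"], row["region"])] += row["sales"]
    return dict(totals)


# The mart is built from the warehouse; the warehouse rows are left unchanged.
retail_mart = build_data_mart(warehouse, "retail")
print(retail_mart)
```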
CONTEMPORARY BUSINESS INTELLIGENCE INFRASTRUCTURE
A contemporary business intelligence infrastructure features capabilities and tools to manage and analyze large quantities and different types of data from multiple sources. Easy-to-use query and reporting tools for casual business users and more sophisticated analytical toolsets for power users are included.
FIGURE 6-12
Tools for Improving Business Performance and Decision Making
• In-memory computing
• Used in big data analysis
• Uses the computer's main memory (RAM) for data storage to avoid delays in
retrieving data from disk storage
• Can reduce hours/days of processing to seconds
• Requires optimized hardware
• Analytic platforms
• High-speed platforms using both relational and non-relational tools
optimized for large datasets
Tools for Improving Business Performance and Decision Making
• CYFE
• GOOGLE SEARCH CONSOLE
Tools for Improving Business Performance and Decision Making
• Data mining:
• Finds hidden patterns, relationships in datasets
• Example: customer buying patterns
• Infers rules to predict future behavior
• Types of information obtainable from data mining:
• Associations
• Sequences
• Classification
• Clustering
• Forecasting
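As a small illustration of the "associations" type of information, the sketch below counts how often items are bought together and computes a simple confidence value for a rule such as "customers who buy chips also buy cola". The baskets and the confidence helper are hypothetical; production data mining tools use more elaborate algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets.
baskets = [
    {"chips", "cola"},
    {"chips", "cola", "salsa"},
    {"chips", "salsa"},
    {"cola", "bread"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


print(confidence("chips", "cola"))
print(confidence("salsa", "chips"))
```

In this made-up data every basket with salsa also contains chips, so the rule "salsa implies chips" has the highest possible confidence.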
Tools for Improving Business Performance and Decision Making
• Text mining
• Extracts key elements from large unstructured data sets
• Stored e-mails
• Call center transcripts
• Legal cases
• Patent descriptions
• Service reports, and so on
• Sentiment analysis software
• Mines e-mails, blogs, social media to detect opinions
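A toy version of opinion detection can be written by counting positive and negative words. This word-list approach is an assumption chosen for brevity: the word sets and the sentiment function are hypothetical, and commercial sentiment analysis software relies on far richer language models.

```python
import re

# Hypothetical, deliberately tiny word lists.
POSITIVE = {"great", "love", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "refund"}


def sentiment(text):
    """Label text as positive, negative, or neutral by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


messages = [
    "I love the fast delivery!",
    "The app is slow and the login is broken.",
    "Order arrived on Tuesday.",
]
for message in messages:
    print(sentiment(message), "-", message)
```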
Tools for Improving Business Performance and Decision Making
• Web mining
– Discovery and analysis of useful patterns and information from the Web
– Understand customer behavior
– Evaluate effectiveness of Web site, and so on
– Web content mining
• Mines content of Web pages
– Web structure mining
• Analyzes links to and from a Web page
– Web usage mining
• Mines user interaction data recorded by Web server
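Web usage mining can be sketched as parsing Web server log lines and counting page views. The log lines below are made up and assume a layout similar to the Common Log Format; a real server's log layout may differ.

```python
import re
from collections import Counter

# Hypothetical log lines in a Common Log Format style layout.
log_lines = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2024:13:56:01 +0000] "GET /cart HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2024:14:02:10 +0000] "GET /products HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2024:14:02:45 +0000] "GET /missing HTTP/1.1" 404 128',
]

# Capture the client address, request method, path, and status code.
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

page_views = Counter()
visitors = set()
for line in log_lines:
    match = LINE.match(line)
    if match is None:
        continue  # skip lines that do not fit the assumed layout
    ip, method, path, status = match.groups()
    visitors.add(ip)
    if status == "200":
        page_views[path] += 1  # count only successful requests

print(page_views.most_common())
print(len(visitors), "distinct visitors")
```

Counts like these are the raw material for the goals listed above, such as understanding customer behavior and evaluating how effective a Web site is.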
Tools for Improving Business Performance and Decision Making