BI-Class Notes - Unit 1-4-5
1
Business Intelligence Introduction [6 Hours]
• Definition
• History of Business intelligence
• Leveraging Data and Knowledge for BI
• BI Components
• Business Intelligence and Business Analytics
• BI Life Cycle
• Business intelligence architectures
• Effective and timely decisions
Definition & History of Business Intelligence
• In 1865, Richard Millar Devens introduced the phrase "Business
Intelligence" (BI) in the "Cyclopædia of Commercial and Business
Anecdotes."
• He used it to describe how Sir Henry Furnese, a banker, profited from
information by gathering and acting on it before his competition.
• More recently, in 1958, an article was written by an IBM computer
scientist named Hans Peter Luhn, describing the potential of gathering
business intelligence (BI) through the use of technology.
• Business intelligence, as it is understood today, uses technology to
gather and analyze data, translate it into useful information, and act on
it "before the competition."
• Essentially, the modern version of BI focuses on technology as a
way to make decisions quickly and efficiently, based on the right
information at the right time.
History
• In 1968, only individuals with extremely specialized skills
could translate data into usable information.
• At this time, data from multiple sources was normally stored in
silos, and research was typically presented in a fragmented,
disjointed report that was open to interpretation.
• Edgar Codd recognized this as a problem, and published a
paper in 1970, altering how people thought about databases.
• His proposal of developing a "relational database model"
gained tremendous popularity and was adopted worldwide.
History
• Decision support systems (DSS) were the first database
management systems to be developed.
• Many historians suggest the modern version of business
intelligence evolved from the DSS database. The number of BI
vendors grew in the 1980s, as business people discovered the
value of business intelligence.
• An assortment of tools was developed during this time, to
access and organize data in simpler ways.
• OLAP, executive information systems, and data warehouses
were some of the tools developed to work with DSS.
Business Intelligence and Business Analytics
• Currently, the two terms are used interchangeably.
• Both describe the general practice of using data in making
informed, intelligent business decisions.
• The term business intelligence has evolved to depend on a
range of technologies that provide useful insights.
• Conversely, analytics represents the tools and processes that can translate
raw data into actionable, useful information for decision-making purposes.
• Different forms of analytics have been developed, including
streaming analytics, which works in real time.
Descriptive Analytics
• Descriptive analytics describes, or summarizes, data and is
focused primarily on historical information.
• This type of analytics describes the past, allowing for an understanding of
how previous behaviors affect the present.
• Descriptive analytics can be used to explain how a company operates and to
describe different aspects of the business.
• In the best-case scenario, descriptive analytics tells a story with a relevant
theme and provides useful information.
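As a rough illustration of the idea, descriptive analytics can be sketched as a summary of historical figures. The monthly sales values and the `describe` helper below are hypothetical and chosen only for this example.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (illustrative values only).
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 128, "Apr": 150, "May": 160, "Jun": 155}

def describe(sales):
    """Summarize historical figures: total, mean, median, best and worst month."""
    values = list(sales.values())
    return {
        "total": sum(values),
        "mean": mean(values),
        "median": median(values),
        "best_month": max(sales, key=sales.get),
        "worst_month": min(sales, key=sales.get),
    }

summary = describe(monthly_sales)
print(summary)
```

The summary only describes what has already happened; it makes no claim about future months.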
Predictive Analytics
• Predictive analytics is used to predict the future.
• This type of analytics uses statistical data to supply companies with useful
insights about upcoming changes, such as identifying sales trends,
purchasing patterns, and forecasting customer behavior.
• The business uses of predictive analytics typically include anticipating
year-end sales growth, identifying which products customers might purchase
together, and forecasting inventory totals.
• Credit scores offer an example of this type of analytics, with financial
services using them to determine a customer's probability of making
payments on time.
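One simple way to picture a sales forecast is a straight-line trend fitted to past figures. The sketch below uses ordinary least squares on made-up quarterly sales; the data and the `fit_line` helper are assumptions for illustration, and real predictive models are usually far richer.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for quarters 1 to 4 (illustrative values only).
quarters = [1, 2, 3, 4]
sales = [100.0, 110.0, 120.0, 130.0]

slope, intercept = fit_line(quarters, sales)
forecast_q5 = slope * 5 + intercept  # extrapolate the trend one quarter ahead
print(forecast_q5)
```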
Prescriptive Analytics
• Prescriptive analytics is a relatively new field, and is still a little hard to work
with.
• This type of analytics "prescribes" several different possible actions and
guides people toward a solution.
• Prescriptive analytics is designed to provide advice.
• Essentially, it predicts multiple futures and allows organizations to assess
many possible outcomes, based upon their actions.
• In the best-case scenario, prescriptive analytics predicts what will
happen and why it will happen, and provides recommendations.
• Larger companies have used prescriptive analytics to successfully optimize
scheduling, revenue streams, and inventory, in turn, improving the customer
experience.
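The idea of assessing several possible outcomes and recommending an action can be sketched with a small inventory decision. The demand scenarios, probabilities, unit cost and unit price below are all hypothetical; the sketch simply scores each candidate stock level by expected profit and recommends the best one.

```python
# Hypothetical demand scenarios: name -> (probability, units demanded).
scenarios = {"low": (0.3, 80), "expected": (0.5, 100), "high": (0.2, 130)}
UNIT_COST = 6    # assumed cost to stock one unit
UNIT_PRICE = 10  # assumed selling price per unit

def expected_profit(stock):
    """Average profit over all scenarios for a given stock level."""
    total = 0.0
    for prob, demand in scenarios.values():
        sold = min(stock, demand)  # cannot sell more than is stocked
        total += prob * (sold * UNIT_PRICE - stock * UNIT_COST)
    return total

def recommend(options):
    """Score every candidate action and return the best one with all scores."""
    outcomes = {stock: expected_profit(stock) for stock in options}
    best = max(outcomes, key=outcomes.get)
    return best, outcomes

best, outcomes = recommend([80, 100, 130])
print(best, outcomes)
```

Here the recommendation follows directly from the assumed probabilities, so changing them changes the advice.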
Streaming Analytics
• Streaming analytics is the real-time processing of data. It is designed to
constantly calculate, monitor, and manage data-based statistical information,
and respond immediately.
• The process deals with recognizing and responding to specific situations, as
they happen.
• Streaming analytics has significantly improved the development and use of
business information.
• Data for streaming analytics can come from a variety of sources, including
mobile phones, the Internet of Things (IoT), market data, transactions, and
other mobile devices (tablets, laptops).
Streaming Analytics
• It connects management to external data sources, allowing applications to combine and
merge data into an application flow, or update external databases with processed
information, quickly and efficiently. Streaming analytics supports:
• Minimizing damage caused by social media meltdowns, security breaches, airplane
crashes, manufacturing defects, stock exchange meltdowns, customer churn, etc.
• Analyzing routine business operations in real time
• Finding missed opportunities with big data
• The option to create new business models, revenue streams, and product innovations
• Some examples of streaming data are social media feeds, real-time stock trades, up-to-the-
minute retail inventory management, or ride-sharing apps.
• For instance, when a customer calls Lyft, streams of data are joined to create seamless user
experiences. The application merges real-time location tracking, pricing, traffic stats, and
real-time traffic data to provide the customer with the nearest available driver, pricing, and
a time estimate to the destination using both historical and real-time data.
• Streaming analytics has become an extremely useful tool for short-term coordination, as
well as developing business intelligence over the long term.
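Recognizing and responding to situations as they happen can be sketched with a sliding window over incoming events. The `monitor` generator, the window size, the alert threshold and the event values below are assumptions for illustration only, not part of any particular streaming product.

```python
from collections import deque

def monitor(stream, window=3, threshold=1.5):
    """Yield (value, baseline, is_alert) for each event as it arrives.

    The baseline is the mean of the previous `window` events; an event is
    flagged when it exceeds `threshold` times that baseline.
    """
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) == window:
            baseline = sum(recent) / window
            alert = value > threshold * baseline
        else:
            baseline = None  # not enough history yet
            alert = False
        yield value, baseline, alert
        recent.append(value)

# Hypothetical event values, e.g. orders per minute.
events = [10, 11, 9, 10, 30, 12]
alerts = [value for value, _, is_alert in monitor(events) if is_alert]
print(alerts)
```

Because `monitor` is a generator, each event is evaluated when it arrives rather than after the whole data set has been collected.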
BI Life Cycle
• Business intelligence is a strategic initiative that helps organizations
measure the effectiveness of their plans in the market.
• A successful company must know how to plan and execute a BI
strategy so that the projects involved in the process achieve
maximum profitability.
• Company managers, together with each project manager, should adopt a
specific methodology based on the needs they have identified.
BI vs Analytics
2. Data Warehousing
• A data warehouse is a storage area for large volumes of collected data.
. Business Decision
• Once the visualization is available, the next step is to make a decision based on the
information obtained.
Business intelligence architectures
A BI architecture contains a set of core components that collectively support the different stages of the
BI process, from data collection, integration, data storage and analysis to data visualization, information
delivery and the use of BI data in business decision-making.
The core components of a BI architecture include the following:
• Source systems.
• These are all of the systems that capture and hold the transactional and
operational data identified as essential for the enterprise BI program.
• For example, this can include enterprise resource planning, customer
relationship management, flat files, application programming interfaces,
finance, manufacturing and supply chain management systems as well
as secondary sources, such as market data and customer databases from
outside information providers.
• As a result, both internal and external data sources are often
incorporated into a BI architecture.
• Important criteria in the data source selection process include data
relevancy, data currency, data quality and the level of detail in the
available data sets.
• In addition, a combination of structured, semi-structured and
unstructured data types might be required to meet the data analysis and
decision-making needs of executives and other end users.
Business intelligence architectures
• Data integration and cleansing tools.
• To effectively analyze the collected data for a BI program, an organization
must integrate and consolidate different data sets to create unified views of
them.
• The most widely used data integration technology for BI applications is
extract, transform and load (ETL) software, which pulls data from source
systems in batch processes.
• A variant of ETL is extract, load and transform (ELT), a technology in which data
is extracted and loaded as-is and transformed later for specific BI uses.
• Other methods include real-time data integration, such as change data
capture and streaming integration to support real-time analytics applications,
and data virtualization, which combines data from different source systems
virtually.
• A BI architecture typically also includes data profiling and data cleansing
tools that are used to identify and fix data quality issues.
• They help BI and data management teams provide clean, consistent data
that's suitable for BI uses.
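The extract, transform and load steps, together with a basic cleansing rule, can be sketched in a few lines. The raw records, the `orders` table and the cleansing rules below are hypothetical; an in-memory SQLite database stands in for the target data store.

```python
import sqlite3

# Hypothetical raw records from a source system (illustrative values only).
raw_orders = [
    {"customer": " alice ", "amount": "120.50"},
    {"customer": "BOB", "amount": "80"},
    {"customer": "carol", "amount": ""},  # missing amount: a data quality issue
]

def extract():
    """Pull a batch of records from the (simulated) source system."""
    return list(raw_orders)

def transform(rows):
    """Cleanse and standardize: drop rows without an amount, tidy names."""
    clean = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records that fail the quality check
        clean.append((row["customer"].strip().title(), float(row["amount"])))
    return clean

def load(conn, rows):
    """Write the cleansed rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT customer, amount FROM orders ORDER BY customer").fetchall()
print(loaded)
```

In an ELT variant, the raw rows would be loaded first and the `transform` logic would run later inside the target store.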
The core components of a BI architecture include the following
• Analytics data stores.
• This encompasses the various repositories where BI data is stored and
managed.
• The primary repository is a data warehouse, which usually stores
structured data in a relational, columnar or multidimensional database
and makes it available for querying and analysis.
• An enterprise data warehouse can also be tied to smaller data marts set
up for individual departments and business units with data that's specific
to their BI needs.
• BI architectures often include an operational data store (ODS) that's an
interim repository for data before it goes into a data warehouse. An
ODS can also be used to run analytical queries against recent transaction
data. Depending on the size of a BI environment, a data warehouse, data
mart and an ODS can be deployed on a single database server or
separate business intelligence systems.
• A well-planned architecture should specify which of the different data
stores is best suited for particular BI uses.
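The relationship between an enterprise data warehouse and a departmental data mart can be illustrated with a small sketch. The table, view and department names are hypothetical, and a database view is only one possible way to model a data mart; in practice marts are often separately stored and loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A simplified warehouse table holding data for every department.
conn.execute("CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 500.0),
        ("finance", "audit", 300.0),
        ("marketing", "events", 200.0),
    ],
)

# A data mart modeled here as a view exposing only one department's data.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, amount FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```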
The core components of a BI architecture include the following
• BI and data visualization tools.
• The tools used to analyze data and present information to
business users include a suite of technologies that can be
built into a BI architecture -- for example, ad hoc query,
data mining and online analytical processing software.
• The growing adoption of self-service BI tools enables
business analysts and managers to run queries themselves
instead of relying on the members of the BI team to do
that for them.
• BI software also includes data visualization tools that can
be used to create graphical representations of data in the
form of charts, graphs and other types of visualizations
designed to illustrate trends, patterns and outlier
elements in data sets.
The core components of a BI architecture include the following
• Dashboards, portals and reports.
• These information delivery tools give users visibility into the
results of BI and analytics applications with built-in data
visualizations and, often, self-service capabilities to do
additional data analysis.
• For example, BI dashboards and online portals can be
designed to provide real-time data access with configurable
views and give users the ability to drill down into data.
Reports tend to present data in a more static format.
• Other components that increasingly are part of a BI
architecture include data preparation software used to
structure and organize data for analysis and a metadata
repository, a business glossary and a data catalog, which can
help users find relevant data and understand its lineage and
meaning.
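The drill-down capability mentioned for dashboards can be pictured as moving from a summary level to a more detailed level of the same data. The sales rows and the `rollup` helper below are hypothetical and serve only as a sketch.

```python
from collections import defaultdict

# Hypothetical rows: (region, city, sales amount).
sales = [
    ("North", "Oslo", 100),
    ("North", "Bergen", 50),
    ("South", "Rome", 70),
    ("South", "Milan", 30),
]

def rollup(rows, key):
    """Total the sales amount for each group produced by `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Summary view: one total per region.
by_region = rollup(sales, key=lambda r: r[0])

# Drill down into one region to see the cities behind its total.
north_detail = rollup((r for r in sales if r[0] == "North"), key=lambda r: r[1])

print(by_region)
print(north_detail)
```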
The core components of a BI architecture include the following
• BI architecture tools
• BI architecture tools facilitate the centralization of
data collection as well as data analysis and
visualization.
• These tools play an integral role in empowering
businesses to make informed decisions and extract
insights from extensive data sets.
• Tools:
• Microsoft Power BI
• Oracle Business Intelligence
• SAS Business Intelligence
• Tableau
Business Intelligence Architecture With Components
• Effective decisions.
• Applying rigorous analytical methods allows decision makers to
rely on information and knowledge. The ensuing in-depth
examination and thought lead to a deeper awareness
and comprehension of the underlying logic of the
decision-making process.
• Timely decisions.
• If decision makers can rely on a business intelligence
system facilitating their activity, we can expect that the
overall quality of the decision-making process will be
greatly improved.
[Unit 4]
Data Warehousing
❖ Definition of data warehouse
❖ Data marts
❖ Data quality
❖ Data warehouse architecture
❖ ETL tools
❖ Metadata
❖ Schemas Used in Data Warehouses: Star, Snowflake and
fact constellation
❖ Cubes and multidimensional analysis
❖ Hierarchies of concepts
❖ OLAP operations, OLAP vs OLTP
❖ Materialization of cubes of data
Metadata
• The standard definition of metadata is "data
about the data," which unfortunately is not a
particularly enlightening description.
• It is useful to think of metadata as a catalog of the
intellectual capital that surrounds the creation,
management, and use of a collection of
information.
• That can range from simple observations about
the number of columns in a database table to
complex descriptions about the way that data
flowed from multiple sources into the target
Metadata
• From relatively humble beginnings as the data
dictionary associated with mainframe database
tables, the concept of metadata has evolved over
time to become a major component of a BI
program.
• Essentially, metadata is a sharable master key to
all the information that is feeding the business
analytics, from the extraction and population of the
central repository to the provisioning of data out of
the warehouse and onto the screens of the business
clients.
• Metadata are data about data (e.g., see Sen, 2004; and
Zhao, 2005).
• Metadata describe the structure of and some meaning about
data, thereby contributing to their effective or ineffective
use.
• Metadata are generally defined in terms of usage as
technical or business metadata.
• Pattern is another way to view metadata.
• According to the pattern view, we can differentiate between
• syntactic metadata (i.e., data describing the syntax of data),
• structural metadata (i.e., data describing the structure of the data),
and
• semantic metadata (i.e., data describing the meaning of the data in
a specific domain).
Metadata
• The primary purpose of metadata should be to provide
context to the reported data.
• In many ways, metadata assist in the conversion of
data and information into knowledge.
• Zhao (2005) described five levels of metadata
management maturity: (1) ad hoc, (2) discovered, (3)
managed, (4) optimized, and (5) automated.
• The design, creation, and use of metadata (descriptive
or summary data about data) and its accompanying
standards may involve ethical issues.
The Importance of Metadata
The management of metadata is probably one of the most critical tasks associated
with a successful BI program, for a number of reasons.
❖ Metadata encapsulates both the logical and physical business
knowledge required to transform disparate data sets into a coherent
warehouse.
❖ Metadata captures the structure and meaning of the data that is being
fed into the warehouse.
❖ The recording of operational metadata provides a road map for
deriving an information audit trail.
❖ One can capture differences associated with how data is manipulated
over time (as well as the corresponding business rules), which is critical
with data warehouses whose historical data spans large periods of time.
❖ Metadata provides the means for tracing the evolution of information
as a way to validate and verify results derived from an analytical
process.
Metadata is divided into two areas:
• technical metadata, which describes the
data mechanics, and
• business metadata, which describes the
business perception of that same
information.
Technical Metadata
• Technical metadata describes the structure
of information, whether it is the data that is
sourcing the warehouse or the data in the
warehouse.
• Technical metadata characterizes the structure
of data, the way that data moves, and how it is
transformed as it moves from one location to
another.
• This may incorporate some or all of the
following.
• Connectivity metadata, which describes the ways that
data consumers interact with the database system,
including the names used to establish connections,
database names, data source names, whether connections
can be shared, and the connection timeout.
• Table information, including table names; the
description of what is modeled by each table; in which
database the table is stored; the physical location, size,
and growth rate of the table; the data sources that feed
each table; update histories (including the date of last
update and of last refresh); the results of the last update;
candidate keys; foreign keys; the degrees of the foreign
key cardinality (e.g., 1:1 versus 1:many); referential
integrity constraints; functional dependencies; and
indexes.
• Record structure information, which describes the
structure of the record; overall record size; whether the
record is of variable or static length; all column names,
types, descriptions, and sizes; source of values that populate
each column; whether a column is an automatically
generated unique key; null status; domain restrictions; and
validity constraints.
• Record manipulation metadata, which includes record
creation time, time of last update, the last person to modify
the record, and the results of the last modification.
• Index metadata, which describes what indexes exist, on
which tables those indexes are made, the columns that are
used to perform the indexing, whether nulls are allowed,
and whether the index is automatically or manually updated
• Data practitioner metadata, which enumerates the staff members who
work with data, their contact information (e.g., telephone number,
e-mail address), and the objects to which they have access.
• Security and access metadata, which identifies the owner of the
data, the ownership paradigm, who may access the data and with
which permissions (e.g., read-only versus modify)
• Data model metadata, which captures entity-relationship
diagrams, dimensional layouts and star join structures, logical
data models, and physical data models
• Physical features metadata, such as the size of tables, the
number of records in each table, and the maximum and minimum
record sizes if the records are of variable length
• Reference metadata, such as defined enumerated data domains,
value ranges, likely values (for reasonableness tests), and
mappings between data domains
• Management metadata, such as the history of a data table or
database, stewardship information, and responsibility matrices
• Transformation metadata, which describes the data sources
that feed into the data warehouse, the ultimate data destination,
and, for each destination data value, the set of transformations
used to materialize the datum and a description of the
transformation
• Process metadata, which describes the information flow and
sequence of extraction and transformation processing, including
data profiling, data cleansing, standardization, and integration
• Supplied data metadata, which, for all supplied data sets, gives
the name of the data set, the name of the supplier, the names of
individuals responsible for data delivery, the delivery mechanism
(including time, location, and method), the expected size of the
supplied data, the data sets that are sourced using each supplied
data set, and any transformations to be applied upon receiving the
Business Metadata
Business metadata incorporates much of the same information as technical
metadata, as well as:
• Metadata that describes the structure of data as perceived by business
clients
• Descriptions of the methods for accessing data for client
analytical applications
• Business meanings for tables and their attributes
• Data ownership characteristics and responsibilities
• Data domains and mappings between those domains, for validation
• Aggregation and summarization directives
• Reporting directives
• Security and access policies
• Business rules that describe constraints or directives associated with data
within a record or between records as joined through a join condition
The Metadata Repository
• Metadata is data, which means that it can be modeled and
managed the same way other data is managed.
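Because metadata is itself data, a repository can be sketched as ordinary tables. The example below is a minimal illustration, assuming SQLite as the database: it reads record structure information with SQLite's `PRAGMA table_info` and stores it next to a business meaning for each column. The `customer` table, the `column_metadata` table and the descriptions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A data table whose structure we want to document.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT)"
)

# A metadata table holding technical and business metadata side by side.
conn.execute(
    "CREATE TABLE column_metadata ("
    "table_name TEXT, column_name TEXT, data_type TEXT, "
    "nullable INTEGER, business_meaning TEXT)"
)

# Hypothetical business descriptions supplied by data owners.
business_meanings = {
    "id": "Unique customer number",
    "name": "Customer's full name",
    "region": "Sales region the customer belongs to",
}

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
for _cid, name, col_type, notnull, _default, _pk in columns:
    conn.execute(
        "INSERT INTO column_metadata VALUES (?, ?, ?, ?, ?)",
        ("customer", name, col_type, int(not notnull), business_meanings.get(name)),
    )

catalog = conn.execute(
    "SELECT column_name, data_type, nullable, business_meaning "
    "FROM column_metadata ORDER BY column_name"
).fetchall()
for entry in catalog:
    print(entry)
```

Once stored this way, the metadata can be queried, versioned and secured like any other table.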