Business Intelligence Architectures

Business intelligence (BI) architectures typically include three main components: data sources, data warehouses/marts, and BI methodologies. Data is extracted from various sources using ETL tools and stored in data warehouses/marts. It is then analyzed using BI methodologies like multidimensional analysis, data mining, and optimization models to support decision making. The pyramid depicts the building blocks of a BI system, with data sources and data warehouses/marts at the base, followed by data exploration, data mining, and optimization, and with decisions at the top.

Uploaded by

Mrunal Mestry

Q1) What is BI? Describe the architecture of BI.

Business intelligence architectures


The architecture of a business intelligence system, depicted in Figure 1.2,
includes three major components.
Data sources. In a first stage, it is necessary to gather and integrate the
data stored in the various primary and secondary sources, which are
heterogeneous in origin and type. The sources consist for the most part of
data belonging to operational systems, but may also include unstructured
documents, such as emails and data received from external providers.
Generally speaking, a major effort is required to unify and integrate the
different data sources, as shown in Chapter 3.
Data warehouses and data marts. Using extraction and transformation tools
known as extract, transform, load (ETL), the data originating from the
different sources are stored in databases intended to support business
intelligence analyses. These databases are usually referred to as data
warehouses and data marts, and they will be the subject of Chapter 3.
Business intelligence methodologies. Data are finally extracted and used to
feed mathematical models and analysis methodologies intended to support
decision makers. In a business intelligence system, several decision support
applications may be implemented, most of which will be described in the
following chapters:
• multidimensional cube analysis;
• exploratory data analysis;
• time series analysis;
• inductive learning models for data mining;
• optimization models.
The pyramid in Figure 1.3 shows the building blocks of a business
intelligence system. So far, we have seen the components of the first two
levels when discussing Figure 1.2. We now turn to the description of the
upper tiers.
Data exploration. At the third level of the pyramid we find the tools for
performing a passive business intelligence analysis, which consist of query
and reporting systems, as well as statistical methods. These are referred to
as passive methodologies because decision makers are requested to generate
prior hypotheses or define data extraction criteria, and then use the
analysis tools to find answers and confirm their original insight. For
instance, consider the sales manager of a company who notices that revenues
in a given geographic area have dropped for a specific group of customers.
Hence, she might want to bear out her hypothesis by using extraction and
visualization tools, and then apply a statistical test to verify that her
conclusions are adequately supported by data. Statistical techniques for
exploratory data analysis will be described in Chapters 6 and 7.
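The sales manager's check can be illustrated with a small statistical test.
The sketch below is only one possible choice (a one-sided permutation test),
not a method prescribed by the text. The revenue figures, the function name
permutation_p_value and its parameters are all hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(before, after, n_permutations=5000, seed=0):
    """One-sided permutation test: did the mean drop from `before` to `after`?"""
    rng = random.Random(seed)
    observed = mean(before) - mean(after)
    pooled = list(before) + list(after)
    n_before = len(before)
    hits = 0
    for _ in range(n_permutations):
        # Reassign the pooled values to the two periods at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_before]) - mean(pooled[n_before:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical monthly revenues (in thousands) for the customer group.
revenue_last_year = [120, 131, 118, 125, 129, 122]
revenue_this_year = [101, 97, 110, 104, 99, 108]

p = permutation_p_value(revenue_last_year, revenue_this_year)
print(f"estimated p-value: {p:.4f}")
```

A small p-value supports the prior hypothesis that revenues dropped. The
analysis remains passive: the hypothesis came from the decision maker, and
the tool only confirms or refutes it.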
Data mining. The fourth level includes active business intelligence
methodologies, whose purpose is the extraction of information and knowledge
from data. These include mathematical models for pattern recognition,
machine learning and data mining techniques, which will be dealt with in
Part II of this book. Unlike the tools described at the previous level of
the pyramid, the models of an active kind do not require decision makers to
formulate any prior hypothesis to be later verified. Their purpose is
instead to expand the decision makers’ knowledge.
Optimization. By moving up one level in the pyramid we find optimization
models that allow us to determine the best solution out of a set of
alternative actions, which is usually fairly extensive and sometimes even
infinite. Example 1.2 shows a typical field of application of optimization
models. Other optimization models applied in marketing and logistics will be
described in Chapters 13 and 14.
Decisions. Finally, the top of the pyramid corresponds to the choice and the
actual adoption of a specific decision, and in some way represents the
natural conclusion of the decision-making process. Even when business
intelligence methodologies are available and successfully adopted, the
choice of a decision pertains to the decision makers, who may also take
advantage of informal and unstructured information available to adapt and
modify the recommendations and the conclusions achieved through the use of
mathematical models.
As we progress from the bottom to the top of the pyramid, business
intelligence systems offer increasingly more advanced support tools of an
active type. Even roles and competencies change. At the bottom, the required
competencies are provided for the most part by the information systems
specialists within the organization, usually referred to as database
administrators. Analysts and experts in mathematical and statistical models
are responsible for the intermediate phases. Finally, the activities of
decision makers responsible for the application domain appear dominant at
the top.
As described above, business intelligence systems address the needs of
different types of complex organizations, including agencies of public
administration and associations. However, if we restrict our attention to
enterprises, business intelligence methodologies can be found mainly within
three departments of a company, as depicted in Figure 1.4: marketing and
sales; logistics and production; accounting and control. The applications of
business intelligence described in Part III of this volume will be precisely
devoted to these topics.

Q2) List applications of BI and explain any one.

Q3) Write a short note on data preparation.

Q4) Define local and global optimization and explain the stochastic hill
climbing algorithm with a flowchart.

The strategy of the Stochastic Hill Climbing algorithm is to iterate the
process of randomly selecting a neighbor for a candidate solution and to
accept it only if it results in an improvement. The strategy was proposed to
address the limitations of deterministic hill climbing techniques, which
were likely to get stuck in local optima due to their greedy acceptance of
neighboring moves.

• Stochastic Hill Climbing was designed to be used in discrete domains with
explicit neighbors such as combinatorial optimization (compared to
continuous function optimization).
• The algorithm's strategy may be applied to continuous domains by making
use of a step-size to define candidate-solution neighbors (such as
Localized Random Search and Fixed Step-Size Random Search).
• Stochastic Hill Climbing is a local search technique (compared to global
search) and may be used to refine a result after the execution of a global
search algorithm.
• Even though the technique uses a stochastic process, it can still get
stuck in local optima.
• Neighbors with better or equal cost should be accepted, allowing the
technique to navigate across plateaus in the response surface.
• The algorithm can be restarted and repeated a number of times after it
converges to provide an improved result (called Multiple Restart Hill
Climbing).
• The procedure can be applied to multiple candidate solutions concurrently,
allowing multiple algorithm runs to be performed at the same time (called
Parallel Hill Climbing).

Algorithm:
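The original flowchart is not reproduced in these notes. As a stand-in, the
following is a minimal sketch of the strategy described above, assuming a toy
discrete problem (maximizing the number of 1 bits in a bit string) and a
neighbor obtained by flipping one random bit. The function names, the
objective and the parameter values are assumptions made for illustration.

```python
import random


def one_max(bits):
    """Toy objective to maximize: the number of 1 bits."""
    return sum(bits)


def random_neighbor(bits, rng):
    """Flip one randomly chosen bit to obtain a neighboring candidate."""
    i = rng.randrange(len(bits))
    neighbor = bits[:]
    neighbor[i] = 1 - neighbor[i]
    return neighbor


def stochastic_hill_climbing(num_bits=32, max_iterations=2000, seed=1):
    rng = random.Random(seed)
    # Start from a random candidate solution.
    current = [rng.randint(0, 1) for _ in range(num_bits)]
    current_cost = one_max(current)
    for _ in range(max_iterations):
        candidate = random_neighbor(current, rng)
        candidate_cost = one_max(candidate)
        # Accept better-or-equal neighbors so the search can cross plateaus.
        if candidate_cost >= current_cost:
            current, current_cost = candidate, candidate_cost
        if current_cost == num_bits:  # known optimum of this toy problem
            break
    return current, current_cost


best, best_cost = stochastic_hill_climbing()
print(best_cost)
```

This toy objective has a single optimum, so the local and the global optimum
coincide and the search cannot get trapped. On objectives with several local
optima the same loop may stall, which is the case that restarts are meant to
address.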

Q5) Describe the ETL process.

ETL refers to the software tools that are devoted to performing in an
automatic way three main functions: extraction, transformation and loading
of data into the data warehouse.
Extraction. During the first phase, data are extracted from the available
internal and external sources. A logical distinction can be made between the
initial extraction, where the available data relative to all past periods
are fed into the empty data warehouse, and the subsequent incremental
extractions that update the data warehouse using new data that become
available over time. The selection of data to be imported is based upon the
data warehouse design, which in turn depends on the information needed by
business intelligence analyses and decision support systems operating in a
specific application domain.
Transformation. The goal of the cleaning and transformation phase is to
improve the quality of the data extracted from the different sources,
through the correction of inconsistencies, inaccuracies and missing values.
Some of the major shortcomings that are removed during the data cleansing
stage are:
• inconsistencies between values recorded in different attributes having
the same meaning;
• data duplication;
• missing data;
• existence of inadmissible values.
During the cleaning phase, preset automatic rules are applied to correct
most recurrent mistakes. In many instances, dictionaries with valid terms
are used to substitute the supposedly incorrect terms, based upon the level
of similarity. Moreover, during the transformation phase, additional data
conversions occur in order to guarantee homogeneity and integration with
respect to the different data sources. Furthermore, data aggregation and
consolidation are performed in order to obtain the summaries that will
reduce the response time required by subsequent queries and analyses for
which the data warehouse is intended.
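The cleaning rules described above can be sketched in a few lines. This is a
hedged illustration rather than the behavior of any particular ETL tool: the
dictionary VALID_CITIES, the record layout, the 0.8 similarity cutoff and
the choice to set aside bad records instead of repairing them are all
assumptions. Similarity is measured with Python's standard difflib module.

```python
import difflib

# Hypothetical dictionary of valid terms.
VALID_CITIES = ["Milan", "Rome", "Turin"]


def correct_term(value, valid_terms, cutoff=0.8):
    """Substitute a term with its closest valid term if it is similar enough."""
    matches = difflib.get_close_matches(value, valid_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else value


def clean(records):
    """Apply preset rules: drop duplicates, set aside bad values, fix terms."""
    seen_ids = set()
    cleaned, rejected = [], []
    for rec in records:
        if rec["order_id"] in seen_ids:       # data duplication
            continue
        seen_ids.add(rec["order_id"])
        amount = rec["amount"]
        if amount is None or amount < 0:      # missing or inadmissible value
            rejected.append(rec)
            continue
        city = correct_term(rec["city"].strip().title(), VALID_CITIES)
        cleaned.append({"order_id": rec["order_id"], "city": city,
                        "amount": amount})
    return cleaned, rejected


raw = [
    {"order_id": 1, "city": "Milann", "amount": 100.0},
    {"order_id": 1, "city": "Milann", "amount": 100.0},  # duplicate
    {"order_id": 2, "city": " turin ", "amount": None},   # missing amount
    {"order_id": 3, "city": "rome", "amount": -5.0},      # inadmissible amount
    {"order_id": 4, "city": "Romee", "amount": 40.0},
]
cleaned, rejected = clean(raw)
print(cleaned)
print(rejected)
```

Rejected records are kept in a separate list here so that they can be
reviewed; a real pipeline might instead impute or look up the missing values.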
Loading. Finally, after being extracted and transformed, data are loaded
into the tables of the data warehouse to make them available to analysts and
decision support applications.
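The three phases can be tied together in a small end-to-end sketch. It
assumes an in-memory SQLite database standing in for the data warehouse, a
Python list standing in for the operational source, and a date-based
watermark to separate the initial extraction from an incremental one. The
table name, columns and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical operational source: (sale_id, sale_date, city, amount).
SOURCE_ROWS = [
    (1, "2024-01-05", "Milan", 100.0),
    (2, "2024-01-20", "Rome", 60.0),
    (3, "2024-02-03", "Milan", 40.0),
]


def extract(source, since=None):
    """Initial extraction when `since` is None, incremental otherwise."""
    return [row for row in source if since is None or row[1] > since]


def transform(rows):
    """Homogenize formats: derive a month key, upper-case city names."""
    return [(sale_id, date[:7], city.upper(), round(amount, 2))
            for sale_id, date, city, amount in rows]


def load(conn, rows):
    """Write the transformed rows into the warehouse table."""
    conn.executemany(
        "INSERT OR REPLACE INTO sales (sale_id, month, city, amount) "
        "VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, "
             "month TEXT, city TEXT, amount REAL)")

# Initial extraction: all past data go into the empty warehouse.
load(conn, transform(extract(SOURCE_ROWS)))
watermark = max(row[1] for row in SOURCE_ROWS)

# Later, new data become available and only they are extracted.
SOURCE_ROWS.append((4, "2024-02-10", "Rome", 25.0))
load(conn, transform(extract(SOURCE_ROWS, since=watermark)))

# Aggregated summary of the kind that speeds up later queries.
summary = conn.execute(
    "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
).fetchall()
print(summary)
```

The watermark is one simple way to identify new rows; sources without a
reliable timestamp would need a different change-detection rule.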

Q6) Write a short note on metadata.

In order to document the meaning of the data contained in a data warehouse,
it is recommended to set up a specific information structure, known as
metadata, i.e. data describing data. The metadata indicate for each
attribute of a data warehouse the original source of the data, their meaning
and the transformations to which they have been subjected. The documentation
provided by metadata should be constantly kept up to date, in order to
reflect any modification in the data warehouse structure. The documentation
should be directly accessible to the data warehouse users, ideally through a
web browser, according to the access rights pertaining to the roles of each
analyst.
In particular, metadata should perform the following informative tasks:
• a documentation of the data warehouse structure: layout, logical views,
dimensions, hierarchies, derived data, localization of any data mart;
• a documentation of the data genealogy, obtained by tagging the data
sources from which data were extracted and by describing any
transformation performed on the data themselves;
• a list keeping the usage statistics of the data warehouse, by indicating
how many accesses to a field or to a logical view have been performed;
• a documentation of the general meaning of the data warehouse with
respect to the application domain, by providing the definition of the
terms utilized, and fully describing data properties, data ownership and
loading policies.
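As a rough illustration of these tasks, the sketch below models the metadata
of a single attribute as a small record holding its source, meaning,
transformations and usage count. The class AttributeMetadata, its field
names and the sample entry are hypothetical and do not correspond to any
specific metadata standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AttributeMetadata:
    """Metadata for one data warehouse attribute (hypothetical layout)."""
    name: str
    source: str                  # original source of the data
    meaning: str                 # definition with respect to the domain
    transformations: List[str] = field(default_factory=list)  # genealogy
    access_count: int = 0        # usage statistics


catalog: Dict[str, AttributeMetadata] = {}


def register(meta):
    """Add or update the documentation of an attribute."""
    catalog[meta.name] = meta


def read_attribute(name):
    """Look up an attribute's metadata and count the access."""
    meta = catalog[name]
    meta.access_count += 1
    return meta


register(AttributeMetadata(
    name="revenue",
    source="orders table of the sales system",
    meaning="Invoiced amount net of tax",
    transformations=["converted to EUR", "summed by month"],
))
read_attribute("revenue")
info = read_attribute("revenue")
print(info.source, info.transformations, info.access_count)
```

Access rights per analyst role, which the text also calls for, are left out
of this sketch.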

Q7) Compare data warehouses and data marts.

Q8) Differentiate between OLAP and OLTP.


Q9) List operations in OLAP. Explain any two.

Roll-up. A roll-up operation, also termed drill-up, consists of an aggregation
of data in the cube, which can be obtained alternatively in the following two
ways.
• Proceeding upwards to a higher level along a single dimension defined
over a concept hierarchy. For example, for the {location} dimension
it is possible to move upwards from the {city} level to the {province}
level and to consolidate the measures of interest through a grouped
sum over all records whose city belongs to the same province.
• Reducing by one dimension. For example, the removal of the {time}
dimension leads to consolidated measures through the sum over all
time periods existing in the data cube.
Roll-down. A roll-down operation, also referred to as drill-down, is the
opposite operation to roll-up. It allows navigation through a data cube from
aggregated and consolidated information to more detailed information. The
effect is to reverse the result achieved through a roll-up operation. A
drill-down operation can therefore be carried out in two ways.
• Shifting down to a lower level along a single dimension hierarchy. For
example, in the case of the {location} dimension, it is possible to shift
from the {province} level to the {city} level and to disaggregate the
measures of interest over all records whereby the city belongs to the
same province.
• Adding one dimension. For example, the introduction of the {time}
dimension disaggregates the measures of interest over all time periods
existing in the data cube.
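Roll-up and drill-down can be sketched on a tiny cube stored as a Python
dictionary keyed by (city, month). The city-to-province mapping, the sales
figures and the function names are hypothetical, chosen only to mirror the
{location} and {time} examples above.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for the {location} dimension.
CITY_TO_PROVINCE = {"Toronto": "Ontario", "Ottawa": "Ontario",
                    "Montreal": "Quebec"}

# Detailed cube: (city, month) -> sales.
cube = {
    ("Toronto", "2024-01"): 100,
    ("Toronto", "2024-02"): 40,
    ("Ottawa", "2024-01"): 30,
    ("Montreal", "2024-01"): 60,
}


def roll_up_location(cube):
    """Roll up from the {city} level to the {province} level."""
    out = defaultdict(int)
    for (city, month), sales in cube.items():
        out[(CITY_TO_PROVINCE[city], month)] += sales
    return dict(out)


def roll_up_drop_time(cube):
    """Roll up by removing the {time} dimension."""
    out = defaultdict(int)
    for (city, _month), sales in cube.items():
        out[city] += sales
    return dict(out)


def drill_down_location(cube, province, month):
    """Drill down from one province cell to its city-level detail."""
    return {city: sales for (city, m), sales in cube.items()
            if CITY_TO_PROVINCE[city] == province and m == month}


by_province = roll_up_location(cube)
by_city = roll_up_drop_time(cube)
detail = drill_down_location(cube, "Ontario", "2024-01")
print(by_province)
print(by_city)
print(detail)
```

Drill-down reads the detailed cube again: the finer values cannot be
recovered from the aggregated totals alone.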
Slice and dice. Through the slice operation the value of an attribute is
selected and fixed along one dimension. For example, Table 3.3 has been
obtained by fixing the region at the {Usa} value. The dice operation obtains
a cube in a subspace by selecting several dimensions simultaneously.
Pivot. The pivot operation, also referred to as rotation, produces a
rotation of the axes, swapping some dimensions to obtain a different view of
a data cube.
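The slice, dice and pivot operations can be sketched in the same spirit on a
cube keyed by (region, product, year). The data and function names are
hypothetical; only the {Usa} value is taken from the text.

```python
# Hypothetical cube: (region, product, year) -> units sold.
cube = {
    ("Usa", "laptop", 2023): 10,
    ("Usa", "phone", 2023): 7,
    ("Usa", "laptop", 2024): 12,
    ("Europe", "laptop", 2023): 8,
    ("Europe", "phone", 2024): 5,
}


def slice_region(cube, region):
    """Slice: fix one value along the {region} dimension."""
    return {(product, year): units
            for (r, product, year), units in cube.items() if r == region}


def dice(cube, regions, products, years):
    """Dice: select values along several dimensions at the same time."""
    return {key: units for key, units in cube.items()
            if key[0] in regions and key[1] in products and key[2] in years}


def pivot(table):
    """Pivot: swap the two axes of a two-dimensional view."""
    return {(b, a): value for (a, b), value in table.items()}


usa = slice_region(cube, "Usa")
sub_cube = dice(cube, {"Usa", "Europe"}, {"laptop"}, {2023})
rotated = pivot(usa)
print(usa)
print(sub_cube)
print(rotated)
```

The slice has one dimension fewer than the original cube, whereas the dice
keeps all three dimensions and only restricts their values.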

Q10) Explain the star schema with an example.


Here are some of the criteria for combining the tables into a dimensional
model:
• The model should provide the best data access.
• The whole model must be query-centric.
• It must be optimized for queries and analyses.
• The model must show that the dimension tables interact with the fact
table.
• It should also be structured in such a way that every dimension can
interact equally with the fact table.
• The model should allow drilling down or rolling up along dimension
hierarchies.
With these requirements, we find that a dimensional model with the fact
table in the middle and the dimension tables arranged around the fact table
satisfies the conditions. In this arrangement, each of the dimension tables
has a direct relationship with the fact table in the middle. This is
necessary because every dimension table with its attributes must have an
even chance of participating in a query to analyze the attributes in the
fact table.
Such an arrangement in the dimensional model looks like a star formation,
with the fact table at the core of the star and the dimension tables along
the spikes of the star. The dimensional model is therefore called a STAR
schema.
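The question asks for an example, which these notes do not include. The
following is a minimal sketch of one, assuming a sales fact table surrounded
by product, store and date dimensions in an in-memory SQLite database. All
table names, columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT,
                          category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT,
                          province TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, month TEXT,
                          year INTEGER);

-- Fact table: the core of the star, linked directly to every dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units      INTEGER,
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'),
                               (2, 'Phone', 'Mobile');
INSERT INTO dim_store   VALUES (1, 'Toronto', 'Ontario'),
                               (2, 'Montreal', 'Quebec');
INSERT INTO dim_date    VALUES (1, '2024-01', 2024), (2, '2024-02', 2024);
INSERT INTO fact_sales  VALUES (1, 1, 1, 2, 2000.0),
                               (2, 1, 1, 3, 1500.0),
                               (1, 2, 2, 1, 1000.0);
""")

# Any dimension can take part in a query through a single join to the fact
# table; here revenue is rolled up to the province and category levels.
rows = conn.execute("""
    SELECT s.province, p.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store s   ON s.store_id = f.store_id
    GROUP BY s.province, p.category
    ORDER BY s.province, p.category
""").fetchall()
print(rows)
```

Each dimension table is one join away from the fact table, which is what
gives every dimension an even chance of participating in a query.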
