Business Intelligence Architectures
Q4) Define local and global optimization and explain the stochastic hill climbing algorithm with a flowchart
A local optimum is a solution that is better than all of its neighboring solutions, whereas a global optimum is the best solution over the entire search space. Local optimization therefore seeks the best solution within a neighborhood of the current candidate, while global optimization seeks the best solution over the whole feasible region.
The strategy of the Stochastic Hill Climbing algorithm is to iterate the process of randomly selecting a neighbor of a candidate solution and accepting it only if it results in an improvement. The strategy was proposed to address the limitations of deterministic hill climbing techniques, which were likely to get stuck in local optima due to their greedy acceptance of neighboring moves.
Algorithm (flowchart not reproduced here): start from a random candidate solution; repeatedly pick a random neighbor and move to it whenever it improves the objective; stop after a fixed number of iterations, returning the best solution found.
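The loop above can be sketched in a few lines of Python. This is a minimal sketch under the description given here; the objective function, step size and iteration limit are illustrative assumptions, not part of the source.

import random

def objective(x):
    # Hypothetical objective to maximize: a simple concave function.
    return -(x - 3.0) ** 2

def stochastic_hill_climbing(start, step_size=0.1, max_iterations=1000):
    current = start
    current_value = objective(current)
    for _ in range(max_iterations):
        # Randomly select a neighbor of the current candidate solution.
        neighbor = current + random.gauss(0.0, step_size)
        neighbor_value = objective(neighbor)
        # Accept the neighbor only if it results in an improvement.
        if neighbor_value > current_value:
            current, current_value = neighbor, neighbor_value
    return current, current_value

best, best_value = stochastic_hill_climbing(start=random.uniform(-10, 10))
print(f"best x = {best:.4f}, objective = {best_value:.4f}")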
Q5) Describe the ETL process
ETL refers to the software tools that are devoted to performing in an automatic way three main functions: extraction, transformation and loading of data into the data warehouse.
Extraction. During the first phase, data are extracted from the available internal and external sources. A logical distinction can be made between the initial extraction, where the available data relative to all past periods are fed into the empty data warehouse, and the subsequent incremental extractions that update the data warehouse using new data that become available over time. The selection of data to be imported is based upon the data warehouse design, which in turn depends on the information needed by business intelligence analyses and decision support systems operating in a specific application domain.
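A minimal Python sketch of the distinction between initial and incremental extraction, assuming the source rows carry a last_modified timestamp; the table, column names and sample data are illustrative, not prescribed by the text.

import sqlite3

# Illustrative source: an in-memory table whose rows carry a
# last_modified timestamp (an assumption for this sketch).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (amount REAL, last_modified TEXT)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(100.0, "2024-01-05"), (250.0, "2024-02-11"), (75.0, "2024-03-02")],
)

def extract(conn, last_load_time=None):
    if last_load_time is None:
        # Initial extraction: data for all past periods feed the
        # empty warehouse.
        rows = conn.execute("SELECT * FROM sales")
    else:
        # Incremental extraction: only data that became available
        # after the previous load are imported.
        rows = conn.execute(
            "SELECT * FROM sales WHERE last_modified > ?", (last_load_time,)
        )
    return rows.fetchall()

print(extract(source))               # initial load: all rows
print(extract(source, "2024-02-01"))  # incremental: newer rows only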
Transformation. The goal of the cleaning and transformation phase is to improve the quality of the data extracted from the different sources, through the correction of inconsistencies, inaccuracies and missing values. Some of the major shortcomings that are removed during the data cleansing stage are:
• inconsistencies between values recorded in different attributes having the same meaning;
• data duplication;
• missing data;
• existence of inadmissible values.
During the cleaning phase, preset automatic rules are applied to correct the most recurrent mistakes. In many instances, dictionaries of valid terms are used to substitute supposedly incorrect terms, based upon their level of similarity.
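As an illustration, Python's standard difflib module can stand in for the similarity-based dictionary lookup described above; the dictionary of valid terms and the similarity cutoff are assumptions of this sketch.

from difflib import get_close_matches

# Illustrative dictionary of valid terms.
VALID_TERMS = ["Milan", "Rome", "Turin", "Naples"]

def clean_term(term, cutoff=0.8):
    # Substitute a supposedly incorrect term with the closest valid
    # term, provided the similarity exceeds the cutoff; otherwise
    # flag the value for manual review by returning None.
    matches = get_close_matches(term, VALID_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(clean_term("Milann"))  # -> "Milan"
print(clean_term("Berlin"))  # -> None (no sufficiently similar term)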
Moreover, during the transformation phase, additional data conversions occur in order to guarantee homogeneity and integration with respect to the different data sources. Furthermore, data aggregation and consolidation are performed in order to obtain the summaries that will reduce the response time required by subsequent queries and analyses for which the data warehouse is intended.
Loading. Finally, after being extracted and transformed, data are loaded into the tables of the data warehouse to make them available to analysts and decision support applications.
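A minimal end-to-end sketch of the aggregation and loading steps, assuming an in-memory SQLite database stands in for the warehouse; the schema, records and figures are illustrative.

import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_summary (region TEXT, month TEXT, total REAL)"
)

# Cleaned and transformed records, as they would arrive from the
# transformation phase: (region, month, amount).
records = [
    ("North", "2024-01", 120.0),
    ("North", "2024-01", 80.0),
    ("South", "2024-01", 45.0),
]

# Aggregation and consolidation: pre-compute the monthly totals that
# will reduce the response time of subsequent queries.
totals = {}
for region, month, amount in records:
    totals[(region, month)] = totals.get((region, month), 0.0) + amount

# Loading: insert the summaries into the warehouse tables.
warehouse.executemany(
    "INSERT INTO sales_summary VALUES (?, ?, ?)",
    [(r, m, t) for (r, m), t in totals.items()],
)
warehouse.commit()

for row in warehouse.execute("SELECT * FROM sales_summary"):
    print(row)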
Q6) Write a short note on metadata
In order to document the meaning of the data contained in a data warehouse, it is recommended to set up a specific information structure, known as metadata, i.e. data describing data. The metadata indicate, for each attribute of a data warehouse, the original source of the data, their meaning and the transformations to which they have been subjected. The documentation provided by metadata should be constantly kept up to date, in order to reflect any modification in the data warehouse structure. The documentation should be directly accessible to the data warehouse users, ideally through a web browser, according to the access rights pertaining to the roles of each analyst.
In particular, metadata should perform the following informative tasks:
• a documentation of the data warehouse structure: layout, logical views, dimensions, hierarchies, derived data, localization of any data mart;
• a documentation of the data genealogy, obtained by tagging the data sources from which data were extracted and by describing any transformation performed on the data themselves;
• a list keeping the usage statistics of the data warehouse, by indicating how many accesses to a field or to a logical view have been performed;
• a documentation of the general meaning of the data warehouse with respect to the application domain, by providing the definition of the terms utilized, and fully describing data properties, data ownership and loading policies.
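A minimal sketch of what a metadata record for a single warehouse attribute might look like, assuming a simple Python dataclass representation; all field names and sample values are illustrative.

from dataclasses import dataclass, field

@dataclass
class AttributeMetadata:
    name: str                     # attribute in the data warehouse
    source: str                   # original data source (genealogy)
    meaning: str                  # definition in the application domain
    transformations: list = field(default_factory=list)  # steps applied
    access_count: int = 0         # usage statistics

customer_age = AttributeMetadata(
    name="customer_age",
    source="crm.customers.birth_date",
    meaning="Customer age in completed years at load time",
    transformations=["derived: load_date - birth_date",
                     "missing values -> median"],
)
customer_age.access_count += 1  # updated on each access to the field
print(customer_age)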