BI-Class Notes - Unit 1-4-5

The document provides an introduction to Business Intelligence (BI), covering its definition, history, components, and various analytics types such as descriptive, predictive, prescriptive, and streaming analytics. It emphasizes the importance of BI in making effective and timely business decisions and outlines the BI life cycle, including data collection, warehousing, analysis, and reporting. Additionally, it discusses BI architecture, including core components and tools that support data integration and visualization for informed decision-making.

UNIT 1
Business Intelligence Introduction [6 Hours]
• Definition
• History of Business Intelligence
• Leveraging Data and Knowledge for BI
• BI Components
• Business Intelligence and Business Analytics
• BI Life Cycle
• Business Intelligence Architectures
• Effective and Timely Decisions
Definition & History of Business Intelligence
• In 1865, Richard Millar Devens presented the phrase "Business Intelligence" (BI) in the "Cyclopædia of Commercial and Business Anecdotes."
• He used it to describe how Sir Henry Furnese, a banker, profited from information by gathering and acting on it before his competition.
• More recently, in 1958, an article was written by an IBM computer scientist named Hans Peter Luhn, describing the potential of gathering business intelligence (BI) through the use of technology.
• Business intelligence, as it is understood today, uses technology to gather and analyze data, translate it into useful information, and act on it "before the competition."
• Essentially, the modern version of BI focuses on technology as a way to make decisions quickly and efficiently, based on the right information at the right time.
History
• In 1968, only individuals with extremely specialized skills could translate data into usable information.
• At this time, data from multiple sources was normally stored in silos, and research was typically presented in a fragmented, disjointed report that was open to interpretation.
• Edgar Codd recognized this as a problem, and published a paper in 1970 that altered how people thought about databases.
• His proposal of a "relational database model" gained tremendous popularity and was adopted worldwide.
History
• Decision support systems (DSS) were the first database management systems to be developed.
• Many historians suggest the modern version of business intelligence evolved from the DSS database.
• The number of BI vendors grew in the 1980s, as business people discovered the value of business intelligence.
• An assortment of tools was developed during this time to access and organize data in simpler ways.
• OLAP, executive information systems, and data warehouses were some of the tools developed to work with DSS.
Business Intelligence and Business Analytics
• Currently, the two terms are used interchangeably.
• Both describe the general practice of using data in making informed, intelligent business decisions.
• The term business intelligence has evolved to depend on a range of technologies that provide useful insights.
• Conversely, analytics represents the tools and processes that can translate raw data into actionable, useful information for decision-making purposes.
• Different forms of analytics have been developed, including streaming analytics, which works in real time.
Descriptive Analytics
• Descriptive analytics describes, or summarizes, data and is focused primarily on historical information.
• This type of analytics describes the past, allowing for an understanding of how previous behaviors affect the present.
• Descriptive analytics can be used to explain how a company operates and to describe different aspects of the business.
• In the best-case scenario, descriptive analytics tells a story with a relevant theme and provides useful information.
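To make the idea concrete, here is a minimal descriptive-analytics sketch in Python, assuming a hypothetical sales table with region and revenue columns; it only summarizes historical data, which is all descriptive analytics does.

```python
# A minimal descriptive-analytics sketch (hypothetical sales data).
import pandas as pd

# Historical transactions -- the "past" that descriptive analytics summarizes.
sales = pd.DataFrame({
    "region":  ["North", "South", "North", "East", "South"],
    "revenue": [1200.0, 800.0, 1500.0, 950.0, 700.0],
})

# Summarize the past: totals, averages and counts per region.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean", "count"])
print(summary)
```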
Predictive Analytics
• Predictive analytics is used to predict the future.
• This type of analytics uses statistical data to supply companies with useful insights about upcoming changes, such as identifying sales trends, purchasing patterns, and forecasting customer behavior.
• The business uses of predictive analytics normally include anticipating sales growth at the end of the year, what products customers might purchase simultaneously, and forecasting inventory totals.
• Credit scores offer an example of this type of analytics, with financial services using them to determine a customer's probability of making payments on time.
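As a hedged illustration of the forecasting idea, the sketch below fits a simple linear trend to twelve months of made-up sales figures and predicts the next month; real predictive analytics would use richer models and validated data.

```python
# A minimal predictive-analytics sketch: forecasting next month's sales
# from past monthly totals with a simple linear trend (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.arange(1, 13).reshape(-1, 1)          # months 1..12 as features
sales = np.array([100, 104, 110, 113, 120, 125,   # observed monthly sales
                  131, 135, 142, 147, 153, 160])

model = LinearRegression().fit(months, sales)     # learn the trend from history
forecast = model.predict([[13]])                  # predict month 13
print(f"Forecast for month 13: {forecast[0]:.1f}")
```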
Prescriptive Analytics
• Prescriptive analytics is a relatively new field and is still a little hard to work with.
• This type of analytics "prescribes" several different possible actions and guides people toward a solution.
• Prescriptive analytics is designed to provide advice.
• Essentially, it predicts multiple futures and allows organizations to assess many possible outcomes, based upon their actions.
• In the best-case scenario, prescriptive analytics will predict what will happen, why it will happen, and provide recommendations.
• Larger companies have used prescriptive analytics to successfully optimize scheduling, revenue streams, and inventory, in turn improving the customer experience.
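Since prescriptive analytics is essentially optimization-driven advice, a minimal sketch might phrase a production decision as a linear program; the products, profits, and capacity limits below are entirely made up.

```python
# A minimal prescriptive-analytics sketch: recommending how much of two
# products to produce to maximize profit under capacity limits (toy numbers).
from scipy.optimize import linprog

# Profit per unit of products A and B (linprog minimizes, so negate).
profit = [-40, -30]

# Constraints: machine hours (2A + 1B <= 100) and labor hours (1A + 2B <= 80).
A_ub = [[2, 1], [1, 2]]
b_ub = [100, 80]

result = linprog(profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
a, b = result.x
print(f"Recommended plan: produce {a:.0f} of A and {b:.0f} of B")  # 40 and 20
```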
Streaming Analytics
• Streaming analytics is the real-time processing of data. It is designed to constantly calculate, monitor, and manage data-based statistical information, and respond immediately.
• The process deals with recognizing and responding to specific situations as they happen.
• Streaming analytics has significantly improved the development and use of business information.
• Data for streaming analytics can come from a variety of sources, including mobile phones, the Internet of Things (IoT), market data, transactions, and mobile devices (tablets, laptops).
Streaming Analytics
• It connects management to external data sources, allowing applications to combine and merge data into an application flow, or update external databases with processed information, quickly and efficiently. Streaming analytics supports:
• Minimizing damage caused by social media meltdowns, security breaches, airplane crashes, manufacturing defects, stock exchange meltdowns, customer churn, etc.
• Analyzing routine business operations in real time
• Finding missed opportunities with big data
• The option to create new business models, revenue streams, and product innovations
• Some examples of streaming data are social media feeds, real-time stock trades, up-to-the-minute retail inventory management, and ride-sharing apps.
• For instance, when a customer calls Lyft, streams of data are joined to create a seamless user experience. The application merges real-time location tracking, pricing, and traffic data to provide the customer with the nearest available driver, pricing, and a time estimate to the destination, using both historical and real-time data.
• Streaming analytics has become an extremely useful tool for short-term coordination, as well as for developing business intelligence over the long term.
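A minimal streaming-analytics sketch, assuming a simulated event feed: it maintains a rolling statistic over a sliding window and responds the moment a threshold is crossed, mirroring the constant calculate-monitor-respond loop described above.

```python
# A minimal streaming-analytics sketch: a rolling average over an event
# stream with an immediate response when a threshold is crossed (toy data).
from collections import deque

def monitor(stream, window=3, threshold=100.0):
    recent = deque(maxlen=window)          # keep only the most recent events
    for value in stream:
        recent.append(value)
        avg = sum(recent) / len(recent)    # statistic maintained continuously
        if avg > threshold:                # respond as the situation happens
            print(f"ALERT: rolling average {avg:.1f} exceeds {threshold}")

# Simulated feed, e.g., per-minute transaction amounts.
monitor([90, 95, 102, 120, 130, 80])
```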
BI Life Cycle
• Business Intelligence is a strategic initiative which helps organizations measure the effectiveness of their plans in the market.
• A successful company must know how to plan and how to address a BI strategy so that the project or projects involved in the process have maximum profitability.
• Company managers, together with each project manager, should adopt a specific methodology based on the needs they know they have.

BI vs Analytics
❖ BI uses past data for current business operating decisions. Meanwhile, Analytics uses past data for planning future business decisions.
❖ BI is descriptive (demographic answers and performance answers), while Analytics is predictive (predictive answers and recommendations).
❖ BI discusses reporting and dashboards. Analytics talks about future-looking probability.
Business Intelligence Lifecycle
1. Data Collection
• Because BI uses data to obtain information, data collection can be done in two ways: primary (such as interviews) and secondary (such as searching the internet).
2. Data Warehousing
• Data warehousing is a storage area for the data that has been collected, which is large in volume.
3. Data Analysis & Q.A.
• The raw data is then analyzed by entering a query or question to derive useful information and conclusions.
4. Reporting, Dashboards, KPIs, Trends
• After entering the query or question, the information is returned in the form of reporting, dashboards, KPIs, and trends.
5. Business Decision
• Once the visualization is available, the final step is to make the decision by considering the information obtained.
Business Intelligence Architectures

Business intelligence architecture components and diagram

A BI architecture can be deployed in an on-premises data center or in the cloud.

In either case, it contains a set of core components that collectively support the different stages of the BI process, from data collection, integration, data storage and analysis to data visualization, information delivery and the use of BI data in business decision-making.

The core components of a BI architecture include the following:
• Source systems.
• These are all of the systems that capture and hold the transactional and
operational data identified as essential for the enterprise BI program.
• For example, this can include enterprise resource planning, customer
relationship management, flat files, application programming interfaces,
finance, manufacturing and supply chain management systems as well
as secondary sources, such as market data and customer databases from
outside information providers.
• As a result, both internal and external data sources are often
incorporated into a BI architecture.
• Important criteria in the data source selection process include data
relevancy, data currency, data quality and the level of detail in the
available data sets.
• In addition, a combination of structured, semi-structured and
unstructured data types might be required to meet the data analysis and
decision-making needs of executives and other end users.
Business Intelligence Architectures
• Data integration and cleansing tools.
• To effectively analyze the collected data for a BI program, an organization
must integrate and consolidate different data sets to create unified views of
them.
• The most widely used data integration technology for BI applications is
extract, transform and load (ETL) software, which pulls data from source
systems in batch processes.
• A variant of ETL is extract, load and transform, a technology in which data
is extracted and loaded as-is and transformed later for specific BI uses.
• Other methods include real-time data integration, such as change data
capture and streaming integration to support real-time analytics applications,
and data virtualization, which combines data from different source systems
virtually.
• A BI architecture typically also includes data profiling and data cleansing
tools that are used to identify and fix data quality issues.
• They help BI and data management teams provide clean, consistent data
that's suitable for BI uses.
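To illustrate the extract-transform-load pattern described above, here is a minimal ETL sketch in Python, assuming a hypothetical CSV feed and a SQLite stand-in for the warehouse; production BI stacks would use dedicated ETL software, but the phases are the same.

```python
# A minimal ETL sketch: extract from a (stand-in) source file, transform,
# and load into a SQLite table standing in for the warehouse.
import csv
import io
import sqlite3

raw = io.StringIO("id,amount\n1, 100 \n2,250\n")   # stand-in for a source file

# Extract: pull rows from the source system.
rows = list(csv.DictReader(raw))

# Transform: clean and convert types before loading.
cleaned = [(int(r["id"]), float(r["amount"].strip())) for r in rows]

# Load: write the unified result into the warehouse (here, SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
print(conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0])  # 350.0
```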
The core components of a BI architecture include the following:
• Analytics data stores.
• This encompasses the various repositories where BI data is stored and
managed.
• The primary repository is a data warehouse, which usually stores
structured data in a relational, columnar or multidimensional database
and makes it available for querying and analysis.
• An enterprise data warehouse can also be tied to smaller data marts set
up for individual departments and business units with data that's specific
to their BI needs.
• BI architectures often include an operational data store (ODS) that's an
interim repository for data before it goes into a data warehouse. An
ODS can also be used to run analytical queries against recent transaction
data. Depending on the size of a BI environment, a data warehouse, data
mart and an ODS can be deployed on a single database server or
separate business intelligence systems.
• A well-planned architecture should specify which of the different data
stores is best suited for particular BI uses.
The core components of a BI architecture include the following:
• BI and data visualization tools.
• The tools used to analyze data and present information to
business users include a suite of technologies that can be
built into a BI architecture -- for example, ad hoc query,
data mining and online analytical processing software.
• The growing adoption of self-service BI tools enables
business analysts and managers to run queries themselves
instead of relying on the members of the BI team to do
that for them.
• BI software also includes data visualization tools that can
be used to create graphical representations of data in the
form of charts, graphs and other types of visualizations
designed to illustrate trends, patterns and outlier
elements in data sets.
The core components of a BI architecture include the following:
• Dashboards, portals and reports.
• These information delivery tools give users visibility into the
results of BI and analytics applications with built-in data
visualizations and, often, self-service capabilities to do
additional data analysis.
• For example, BI dashboards and online portals can be
designed to provide real-time data access with configurable
views and give users the ability to drill down into data.
Reports tend to present data in a more static format.
• Other components that increasingly are part of a BI
architecture include data preparation software used to
structure and organize data for analysis and a metadata
repository, a business glossary and a data catalog, which can
help users find relevant data and understand its lineage and
meaning.
The core components of a BI architecture include the following:
• BI architecture tools
• BI architecture tools facilitate the centralization of
data collection as well as data analysis and
visualization.
• These tools play an integral role in empowering
businesses to make informed decisions and extract
insights from extensive data sets.
• Tools:
• Microsoft Power BI
• Oracle Business Intelligence
• SAS Business Intelligence
• Tableau
[Figure: Business Intelligence Architecture With Components]

[Figure: BI Architecture Framework In Modern Business]

[Figure: Architecture of Business Intelligence]

[Figure: Main differences between business intelligence and data warehousing]
Effective and Timely Decisions
• Business intelligence may be defined as a set of mathematical models and analysis methodologies that exploit the available data to generate information and knowledge useful for complex decision-making processes.
Effective and Timely Decisions
• In complex organizations, public or private, decisions are made on a continual basis.
• The ability of these knowledge workers to make decisions, both as individuals and as a community, is one of the primary factors that influence the performance and competitive strength of a given organization.
• Such decisions require a more rigorous attitude based on analytical methodologies and mathematical models.
• Example: retention in the mobile phone industry, where low customer loyalty is also known as customer attrition or churn. Given a budget adequate to pursue a customer retention campaign, the problem becomes:
• choosing those customers to be contacted so as to optimize the effectiveness of the campaign, and
• targeting the best group of customers, thus reducing churn and maximizing customer retention.
The main purpose of business intelligence systems is to provide
knowledge workers with tools and methodologies that allow them to
make effective and timely decisions.

• Effective decisions.
• The application of rigorous analytical methods allows decision makers to rely on information and knowledge; the ensuing in-depth examination and thought lead to a deeper awareness and comprehension of the underlying logic of the decision-making process.
• Timely decisions.
• If decision makers can rely on a business intelligence
system facilitating their activity, we can expect that the
overall quality of the decision-making process will be
greatly improved.
[Unit 4] Data Warehousing
❖ Definition of data warehouse
❖ Data marts
❖ Data quality
❖ Data warehouse architecture
❖ ETL tools
❖ Metadata
❖ Schemas Used in Data Warehouses: Star, Snowflake and fact constellation
❖ Cubes and multidimensional analysis
❖ Hierarchies of concepts
❖ OLAP operations, OLAP vs OLTP
❖ Materialization of cubes of data
Metadata
• The standard definition of metadata is "data about the data," which unfortunately is not a particularly enlightening description.
• It is useful to think of metadata as a catalog of the intellectual capital that surrounds the creation, management, and use of a collection of information.
• That can range from simple observations about the number of columns in a database table to complex descriptions about the way that data flowed from multiple sources into the target data warehouse.
Metadata
• From relatively humble beginnings as the data dictionary associated with mainframe database tables, the concept of metadata has evolved over time to become a major component of a BI program.
• Essentially, metadata is a sharable master key to all the information that is feeding the business analytics, from the extraction and population of the central repository to the provisioning of data out of the warehouse and onto the screens of the business clients.
• Metadata are data about data (e.g., see Sen, 2004; and
Zhao, 2005).
• Metadata describe the structure of and some meaning about
data, thereby contributing to their effective or ineffective
use.
• Metadata are generally defined in terms of usage as
technical or business metadata.
• Pattern is another way to view metadata.
• According to the pattern view, we can differentiate between
• syntactic metadata (i.e., data describing the syntax of data),
• structural metadata (i.e., data describing the structure of the data),
and
• semantic metadata (i.e., data describing the meaning of the data in
a specific domain).
Metadata
• The primary purpose of metadata should be to provide context to the reported data.
• In many ways, metadata assist in the conversion of data and information into knowledge.
• Zhao (2005) described five levels of metadata management maturity: (1) ad hoc, (2) discovered, (3) managed, (4) optimized, and (5) automated.
• The design, creation, and use of metadata (descriptive or summary data about data) and its accompanying standards may involve ethical issues.
The Importance of Metadata
The management of metadata is probably one of the most critical tasks associated
with a successful BI program, for a number of reasons.
❖ Metadata encapsulates both the logical and physical business
knowledge required to transform disparate data sets into a coherent
warehouse.
❖ Metadata captures the structure and meaning of the data that is being
fed into the warehouse.
❖ The recording of operational metadata provides a road map for
deriving an information audit trail.
❖ One can capture differences associated with how data is manipulated
over time (as well as the corresponding business rules), which is critical
with data warehouses whose historical data spans large periods of time.
❖ Metadata provides the means for tracing the evolution of information
as a way to validate and verify results derived from an analytical
process.
Metadata is divided into two areas:
• technical metadata, which describes the data mechanics, and
• business metadata, which describes the business perception of that same information.
Technical Metadata
• Technical metadata describes the structure of information, whether it is the data sourcing the warehouse or the data in the warehouse itself.
• Technical metadata characterizes the structure of data, the way that data move, and how they are transformed as they move from one location to another.
• This may incorporate some or all of the following.
• Connectivity metadata, which describes the ways that
data consumers interact with the database system,
including the names used to establish connections,
database names, data source names, whether connections
can be shared, and the connection timeout.
• Table information, including table names; the
description of what is modeled by each table; in which
database the table is stored; the physical location, size,
and growth rate of the table; the data sources that feed
each table; update histories (including the date of last
update and of last refresh); the results of the last update;
candidate keys; foreign keys; the degrees of the foreign
key cardinality (e.g., 1:1 versus 1:many); referential
integrity constraints; functional dependencies; and
indexes.
• Record structure information, which describes the
structure of the record; overall record size; whether the
record is a variable or static length; all column names,
types, descriptions, and sizes; source of values that populate
each column; whether a column is an automatically
generated unique key; null status; domain restrictions; and
validity constraints.
• Record manipulation metadata, which includes record
creation time, time of last update, the last person to modify
the record, and the results of the last modification.
• Index metadata, which describes what indexes exist, on
which tables those indexes are made, the columns that are
used to perform the indexing, whether nulls are allowed,
and whether the index is automatically or manually updated.
• Data practitioners, which enumerates the staff members who
work with data, their contact information (e.g., telephone number,
e-mail address), and the objects to which they have access.
• Security and access metadata, which identifies the owner of the
data, the ownership paradigm, who may access the data and with
which permissions (e.g., read-only versus modify)
• Data model metadata, which captures entity-relationship
diagrams, dimensional layouts and star join structures, logical
data models, and physical data models
• Physical features metadata, such as the size of tables, the
number of records in each table, and the maximum and minimum
record sizes if the records are of variable length
• Reference metadata, such as defined enumerated data domains,
value ranges, likely values (for reasonableness tests), and
mappings between data domains
• Management metadata, such as the history of a data table or
database, stewardship information, and responsibility matrices
• Transformation metadata, which describes the data sources
that feed into the data warehouse, the ultimate data destination,
and, for each destination data value, the set of transformations
used to materialize the datum and a description of the
transformation
• Process metadata, which describes the information flow and
sequence of extraction and transformation processing, including
data profiling, data cleansing, standardization, and integration
• Supplied data metadata, which, for all supplied data sets, gives
the name of the data set, the name of the supplier, the names of
individuals responsible for data delivery, the delivery mechanism
(including time, location, and method), the expected size of the
supplied data, the data sets that are sourced using each supplied
data set, and any transformations to be applied upon receiving the data.
Business Metadata
Business metadata incorporates much of the same information as technical
metadata, as well as:
• Metadata that describes the structure of data as perceived by business
clients
• Descriptions of the methods for accessing data for client
analytical applications
• Business meanings for tables and their attributes
• Data ownership characteristics and responsibilities
• Data domains and mappings between those domains, for validation
• Aggregation and summarization directives
• Reporting directives
• Security and access policies
• Business rules that describe constraints or directives associated with data
within a record or between records as joined through a join condition
The Metadata Repository
• Metadata is data, which means that it can be modeled and managed the same way other data is managed.
• As the primary source of knowledge about the inner workings of the BI environment, it is important to build and maintain a metadata repository that is available to all knowledge workers involved in the BI program.
• Whether the metadata repository is physically centralized or distributed across multiple systems, and however it is accessed, it is important to provide a mechanism for publishing metadata.
• The existence of disparate data systems that contribute information to the BI environment complicates this process, because each system may have its own methods for managing its own metadata.
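As a small illustration of "metadata is data", the sketch below models technical metadata for a warehouse table as an ordinary Python record and keeps it in a queryable repository; the field names are illustrative assumptions, not a standard.

```python
# A minimal sketch of a metadata repository: table metadata modeled and
# managed like any other data (field names are illustrative).
from dataclasses import dataclass

@dataclass
class TableMetadata:
    name: str
    database: str
    source_systems: list            # lineage: which systems feed this table
    columns: dict                   # column name -> type
    last_refresh: str = "unknown"

repository = {}                     # the metadata repository itself

repository["sales"] = TableMetadata(
    name="sales",
    database="warehouse_db",
    source_systems=["crm", "erp"],
    columns={"id": "INTEGER", "amount": "REAL", "region": "TEXT"},
    last_refresh="2024-01-31",
)

# Publishing metadata: any knowledge worker can look up structure and lineage.
print(repository["sales"].source_systems)
```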
Management Issues
• As a manager, it is important to know that the area of
data warehousing is not just about building BI
frameworks. As is typical with any loosely structured
technology, the amount of buzz surrounding data
warehousing seems to be inversely proportional to the
number of truly successful implementations, and my
guess is that the number of available experts on data
warehousing is probably equal to the number of failed
data warehousing projects. The significant
management issues associated with the topics in this
chapter deal with aspects of this.
[Unit 5] Data Mining and Application of BI
• Data mining
• Definition of data mining
• Models and methods for data mining
• Classical statistics & OLAP
• Applications of data mining
• Representation of input data
• Data mining process
• Applications of BI: Data Warehousing Helps MultiCare Save More Lives
• Smarter Insurance: Infinity P&C Improves Customer Service and Combats Fraud with Predictive Analytics
Data Mining
• The term data mining indicates the process of exploration and analysis of a dataset, usually of large size, in order to find regular patterns, to extract relevant knowledge and to obtain meaningful recurring rules.
Definition of Data Mining
• Data mining activities constitute an iterative process aimed at the analysis of large databases, with the purpose of extracting information and knowledge that may prove accurate and potentially useful for knowledge workers engaged in decision making and problem solving.
• The term data mining refers therefore to the overall process consisting of data gathering and analysis, development of inductive learning models and adoption of practical decisions and consequent actions based on the knowledge acquired.
• Data mining activities can be subdivided into two major investigation streams: interpretation and prediction.
o Interpretation. The purpose of interpretation is to identify regular patterns in the data and to express them through rules and criteria that can be easily understood by experts in the application domain.
o Prediction. The purpose of prediction is to anticipate the value that a random variable will assume in the future.
Models and Methods for Data Mining
• There are several learning methods that are available to perform the different data mining tasks.
• A number of techniques originated in the field of computer science, such as classification trees or association rules, and are referred to as machine learning or knowledge discovery in databases.
Applications of Data Mining
• Data mining methodologies can be applied to a variety of domains, from marketing and manufacturing process control to the study of risk factors in medical diagnosis, from the evaluation of the effectiveness of new drugs to fraud detection.
▪ Relational marketing: identification of customer segments, prediction of the rate of positive responses.
▪ Fraud detection: identifying illegal use of credit cards and bank checks.
▪ Risk evaluation: estimating the risk connected with future decisions.
▪ Text mining: representing unstructured data, in order to classify articles, books, documents.
▪ Image recognition: the treatment and classification of digital images. It is useful to recognize written characters, compare and identify human faces, apply correction filters to photographic equipment and detect suspicious behaviors through surveillance video cameras.
▪ Web mining: intended for the analysis of so-called clickstreams – the sequences of pages visited and the choices made by a web surfer.
▪ Medical diagnosis: learning models are an invaluable tool within the medical field for the early detection of diseases using clinical test results.
Key properties of Data Mining:
1. Automatic discovery of patterns
2. Prediction of likely outcomes
3. Creation of actionable information
4. Focus on large datasets and databases
Classical Statistics & OLAP
• OLAP
  – extraction of details and aggregate totals from data
  – information: distribution of incomes of home loan applicants
• Statistics
  – verification of hypotheses formulated by analysts
  – validation: analysis of variance of incomes of home loan applicants
• Data mining
  – identification of patterns and recurrences in data
  – knowledge: characterization of home loan applicants and prediction of future applicants
Data Mining vs OLAP

Data Mining: refers to the field of computer science which deals with the extraction of data, trends and patterns from huge sets of data.
OLAP: a technology for immediate access to data with the help of multidimensional structures.

Data Mining: deals with detailed transaction-level data.
OLAP: deals with the data summary.

Data Mining: is discovery-driven.
OLAP: is query-driven.

Data Mining: is used for future data prediction.
OLAP: is used for analyzing past data.

Data Mining: handles a huge number of dimensions.
OLAP: handles a limited number of dimensions.

Data Mining: bottom-up approach.
OLAP: top-down approach.

Data Mining: is an emerging field.
OLAP: is widely used.
Alternative names for Data Mining:
1. Knowledge discovery (mining) in databases (KDD)
2. Knowledge extraction
3. Data/pattern analysis
4. Data archaeology
5. Data dredging
6. Information harvesting
7. Business intelligence
Data Mining Process
• Definition of objectives. Data mining analyses are carried out in specific application domains and are intended to provide decision makers with useful knowledge.
Data Mining Process
• Data gathering and integration. Once the objectives of the investigation have been identified, the gathering of data begins. Data may come from different sources and therefore may require integration.
• Exploratory analysis. In the third phase of the data mining process, a preliminary analysis of the data is carried out with the purpose of getting acquainted with the available information and carrying out data cleansing.
• Attribute selection. In the subsequent phase, the relevance of the different attributes is evaluated in relation to the goals of the analysis (selecting appropriate attributes means choosing columns such as roll no, rank, name, state).
• Model development and validation. Once a high-quality dataset has been assembled and possibly enriched with newly defined attributes, pattern recognition and predictive models can be developed (and validated against training sets).
• Prediction and interpretation. Knowledge workers may be able to use the model to draw predictions and acquire a more in-depth knowledge of the phenomenon of interest.
Tasks of Data Mining
• Anomaly detection (outlier/change/deviation detection) – the identification of unusual data records that might be interesting, or data errors that require further investigation.
• Association rule learning (dependency modeling) – searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis (see the sketch after this list).
• Clustering – the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
• Classification – the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
• Regression – attempts to find a function which models the data with the least error.
Tasks of Data Mining
• Summarization – providing a more compact representation of the data set, including visualization and report generation.
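As promised above, a minimal market-basket sketch: it counts how often product pairs co-occur in toy transactions. Real association-rule mining (e.g., the Apriori algorithm) adds support and confidence thresholds on top of exactly this kind of counting.

```python
# A minimal market-basket sketch: counting co-occurring product pairs
# across transactions (toy data).
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk", "butter"},
    {"bread", "butter"},
    {"milk", "diapers"},
    {"bread", "milk", "butter", "diapers"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):   # every product pair in the basket
        pair_counts[pair] += 1

# The most frequently co-purchased pairs suggest candidate rules.
print(pair_counts.most_common(3))
```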
Knowledge Discovery in Databases (KDD)
Data Cleaning – in this step, noise and inconsistent data are removed.
Data Integration – in this step, multiple data sources are combined.
Data Selection – in this step, data relevant to the analysis task are retrieved from the database.
Data Transformation – in this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations.
Data Mining – in this step, intelligent methods are applied in order to extract data patterns.
Pattern Evaluation – in this step, data patterns are evaluated.
Knowledge Presentation – in this step, knowledge is represented to the user.
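A minimal sketch chaining the KDD steps as plain functions over toy records, one stage per step listed above; the record layout and the trivial "pattern" mined at the end are illustrative assumptions.

```python
# A minimal KDD pipeline sketch over toy records.
def clean(records):                       # Data Cleaning: drop inconsistent rows
    return [r for r in records if r.get("amount") is not None]

def select(records):                      # Data Selection: keep relevant fields
    return [{"region": r["region"], "amount": r["amount"]} for r in records]

def transform(records):                   # Data Transformation: aggregate by region
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

def mine(totals):                         # Data Mining: a trivial "pattern"
    return max(totals, key=totals.get)    # region with the highest total

data = [{"region": "N", "amount": 10}, {"region": "S", "amount": None},
        {"region": "N", "amount": 5}, {"region": "S", "amount": 20}]
print(mine(transform(select(clean(data)))))   # Knowledge Presentation: "S"
```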
Major Issues in Data Mining:
• Mining different kinds of knowledge in databases – the needs of different users are not the same, and different users may be interested in different kinds of knowledge. Therefore it is necessary for data mining to cover a broad range of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction – the data mining process needs to be interactive because it allows users to focus the search for patterns, providing and refining data mining requests based on returned results.
• Incorporation of background knowledge – background knowledge can be used to guide the discovery process and to express the discovered patterns, not only in concise terms but at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining – a data mining query language that allows the user to describe ad hoc mining tasks should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
• Presentation and visualization of data mining results – once the patterns are discovered, they need to be expressed in high-level languages and visual representations. These representations should be easily understandable by the users.
• Handling noisy or incomplete data – data cleaning methods are required that can handle noise and incomplete objects while mining the data regularities. If data cleaning methods are not available, the accuracy of the discovered patterns will be poor.
• Pattern evaluation – this refers to the interestingness of the problem. Patterns that merely represent common knowledge or lack novelty are not interesting; discovered patterns must therefore be evaluated for interestingness.
• Efficiency and scalability of data mining algorithms – in order to effectively extract information from huge amounts of data in databases, data mining algorithms must be efficient and scalable.
• Parallel, distributed, and incremental mining algorithms – factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions which are processed in parallel, and the results from the partitions are then merged. Incremental algorithms update databases without having to mine the data again from scratch.
Representation of Input Data
• Input to a data mining analysis takes the form of a two-dimensional table, called a dataset, irrespective of the actual logic and material representation adopted to store the information in files, databases, data warehouses and data marts used as data sources.
• The rows in the dataset correspond to the observations recorded in the past and are also called examples, cases, instances or records.
• The columns represent the information available for each observation and are termed attributes, variables, characteristics or features.
• Attributes contained in a dataset can be categorized as categorical or numerical.
❖ Categorical. Categorical attributes assume a finite number of distinct values.
❖ Numerical. Numerical attributes assume a finite or infinite number of values and lend themselves to subtraction or division operations.
• Counts. Counts are categorical attributes in relation to which a specific property can be true or false.
• Nominal. Nominal attributes are categorical attributes without a natural ordering, such as the province of residence.
• Ordinal. Ordinal attributes, such as education level, are categorical attributes that lend themselves to a natural ordering.
• Discrete. Discrete attributes are numerical attributes that assume a finite number or a countable infinity of values.
• Continuous. Continuous attributes are numerical attributes that assume an uncountable infinity of values.
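The sketch below assembles a tiny illustrative dataset whose columns exercise each attribute type just listed: nominal, ordinal, discrete and continuous.

```python
# A minimal sketch of a dataset with categorical and numerical attributes
# (rows = observations, columns = attributes), using illustrative data.
import pandas as pd

dataset = pd.DataFrame({
    "province":  ["MH", "GJ", "MH"],                # nominal (no natural order)
    "education": pd.Categorical(["BSc", "MSc", "PhD"],
                                categories=["BSc", "MSc", "PhD"],
                                ordered=True),      # ordinal (natural order)
    "children":  [0, 2, 1],                         # discrete numerical
    "income":    [41200.5, 58300.0, 47950.25],      # continuous numerical
})
print(dataset.dtypes)
```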
Analysis Methodologies
• A first fundamental distinction is between supervised and unsupervised learning processes.
• Supervised learning.
• In a supervised (or direct) learning analysis, a target attribute represents the class to which each record belongs (a set of rules is defined and observed).
• Unsupervised learning.
• Unsupervised (or indirect) learning analyses are not guided by a target attribute.
• Here one is interested in identifying clusters of records that are similar within each cluster and different from members of other clusters.
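A minimal sketch contrasting the two methodologies on the same toy points, assuming scikit-learn is available: the classifier is guided by a target attribute, while k-means discovers clusters with no target at all.

```python
# A minimal supervised-vs-unsupervised sketch on the same toy points.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = [[1, 1], [1, 2], [8, 8], [9, 8]]   # observations (rows) with 2 attributes
y = [0, 0, 1, 1]                       # target attribute: the known class

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)   # supervised: uses y
print(clf.predict([[2, 1]]))                          # -> [0]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # unsupervised: no y
print(km.labels_)                      # cluster memberships discovered from X
```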
Seven Basic Data Mining Tasks
• Characterization and discrimination (arrange data according to characterization and divide it by a set of rules, e.g., students vs. professors, or by class or subject)
• Classification (each observation is described by a given number of attributes whose value is known)
• Regression (used when the target variable takes on continuous values, e.g., predicting the sales of a product before launch)
• Time series analysis (predicting the value of the target variable for one or more future periods, based on the history of that variable)
• Association rules (identifying interesting and recurring associations between groups of records of a dataset, e.g., who buys what, and how many times)
• Clustering (the term cluster refers to a homogeneous subgroup existing within a population)
• Description and visualization (representation is justified by the remarkable conciseness of the information achieved through a well-designed chart)
• Characterization and Discrimination
• For example, in the AllElectronics store, classes of items for sale include computers and printers, and concepts of customers include bigSpenders and budgetSpenders.
• Classification
• Checking whether it is raining or not; fraud detection.
• Regression
• Predicting house prices; predicting the weather; predicting the impact of SAT/GRE scores on college admissions.
• Time series analysis
• Stock market analysis.
Applications of BI
• Data Warehousing Helps MultiCare Save More Lives
• Smarter Insurance: Infinity P&C Improves Customer Service and Combats Fraud with Predictive Analytics
