Unit 4
INTRODUCTION TO BUSINESS INTELLIGENCE
Objectives
After studying this unit, you will be able to:
• Understand the depth of knowledge of Business Intelligence (BI) and
relative terminologies.
• Recognize the usage of various Business Intelligence tools and
techniques to collect, analyze, and interpret data from different sources.
• Gain insights into business operations, customer behaviour, market
trends, and other key areas of interest.
• Evaluate how BI provides decision-makers with the information they
need to optimize business performance, reduce costs, increase revenue,
and achieve strategic objectives.
Structure
4.1 Introduction to Business Intelligence
4.2 Data Warehousing
4.2.1 Data Modeling and schema design
4.3 Data Mining and Analytics
4.4 Data Governance and Security
4.5 Business Intelligence Applications
4.6 Summary
4.7 Self-Assessment Exercises
4.8 Further Readings
After the data is collected, it is converted into a format that can be analyzed
with BI tools such as data visualization software, dashboards, and reporting
tools. These tools enable businesses to analyze large datasets, identify trends,
and gain a better understanding of their operations. One of the most
significant advantages of business intelligence is that it allows organizations
to make data-driven decisions. Businesses can identify areas for improvement
in their operations and make changes as a result of real-time data analysis.
For example, if a company's sales are declining, it can use BI tools to identify
the root cause of the problem and take corrective action. Another advantage
of BI is that it can assist organizations in optimizing their operations.
Businesses, for example, can identify areas where they can improve customer
satisfaction and increase sales by analyzing customer behaviour. They can
also use business intelligence to track inventory levels and optimize supply
chain operations to save money.
Assume a retail company wants to increase sales in its physical stores. The
company can use business intelligence to analyze customer behaviour and
identify areas where operations can be improved. First, the company can
collect data from various sources, such as point-of-sale systems, customer
loyalty programs, and social media platforms, using data mining techniques.
This information may include customer demographics, purchasing habits, and
product preferences. The company can then use data visualization tools to
build dashboards and reports that highlight key performance metrics like
sales per store, sales per employee, and customer satisfaction ratings. These
reports can assist the company in identifying customer behaviour trends and
patterns, as well as tracking the effectiveness of marketing campaigns and
promotions.
The company can make data-driven decisions to improve its operations based
on the insights gained from data analysis. For example, if data shows that a
specific product sells well in one store but not in others, the company can
shift stock toward the store where demand is strong and investigate why the
product underperforms elsewhere. It can also use the data to identify peak
shopping hours and adjust staffing levels accordingly so that customers are
served promptly. By analyzing customer behaviour with business intelligence,
the company can optimize operations and increase sales in its physical
stores. It can also use the insights to improve its online sales and marketing
efforts, resulting in higher revenue and profits.
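The metrics mentioned in this example, such as sales per store and peak shopping hours, can be computed with very little code once the point-of-sale data is collected. The following is a minimal sketch only: the store names, hours, amounts, and function names are hypothetical and are not taken from any real system.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (store, hour of day, sale amount).
# All values are made up for illustration.
pos_records = [
    ("Store A", 10, 120.0),
    ("Store A", 17, 310.0),
    ("Store A", 17, 95.0),
    ("Store B", 12, 80.0),
    ("Store B", 17, 150.0),
]


def sales_per_store(records):
    """Sum the sale amounts for each store."""
    totals = defaultdict(float)
    for store, _hour, amount in records:
        totals[store] += amount
    return dict(totals)


def peak_hour(records):
    """Return the hour of day with the largest number of transactions."""
    counts = defaultdict(int)
    for _store, hour, _amount in records:
        counts[hour] += 1
    return max(counts, key=counts.get)


print(sales_per_store(pos_records))
print(peak_hour(pos_records))
```

A dashboard tool would present the same figures as charts; the calculation underneath is this kind of grouping and counting.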
Purpose of BI:
Business intelligence (BI) serves several functions, including the following:
The data sources are the first component of a BI system. This includes all
data-generating and data-capture systems and applications, such as
transactional databases, customer relationship management (CRM)
systems, and social media platforms. Data sources are the raw materials
from which a BI system generates insights. Internal databases, external
sources such as social media and market research, and data warehouses
are examples of these sources. To provide a comprehensive view of the
organization's performance, a BI system must be able to access and
integrate data from all of these sources.
In the diagram above, data is collected from various sources, and the ETL
process converts raw, often unstructured data into structured and meaningful
information. This information is loaded into a data warehouse, which supports
the reporting and analysis used to make better decisions.
Following the definition of the data model, the next step is to design the
schema that will be used to implement the data model in a specific database
management system. Selecting appropriate data types for each attribute,
defining tables and their columns, and establishing relationships between
tables using primary and foreign keys are all part of this process. Data Model
Schemas are commonly used to visually represent the architecture of a
database and serve as the foundation for an organization's Data Management
practice.
Choosing the right Data Model Schema can help to eliminate bottlenecks and
anomalies during software project execution. An incorrect Schema Design,
on the other hand, can cause several errors in an application and make
refactoring expensive. For example, if you didn't realize early on that your
application would require multiple table JOINs, your service will eventually
stop when you reach a certain number of users and data.
The Data Model Schema design begins with a high level of abstraction and
progresses to become more concrete and specific, as with any design process.
Based on their level of abstraction, data models are generally classified into
three types. The process will start with a Conceptual Model, then a Logical
Model, and finally a Physical Model. Such data models provide a conceptual
framework within which a database user can specify the requirements,
structure, and set of attributes for configuring a Database Schema. A Data
Model also offers users a high-level design implementation that dictates what
can be included in the schema.
The following are some popular data model schemas:
• Hierarchical Schema
• Relational Schema
• Network Schema
• Object-Oriented Schema
• Entity-Relationship Schema
Hierarchical Schema:
A hierarchical schema is a type of database schema that organizes data in a
tree-like structure with a single root, with each node having one parent and
potentially multiple children. A tree schema or a parent-child schema is
another name for this type of schema. Data is organized top-down in a
hierarchical schema, with the parent node at the top of the tree representing
the most general information and child nodes below it representing more
specific information.
A hierarchical schema for a company, for example, might have "Company"
as the root node, with child nodes for "Departments," "Employees," and
"Projects." One of the primary benefits of a hierarchical schema is that it is
simple to understand and apply, making it ideal for small to medium-sized
databases with simple data relationships. However, when dealing with
complex relationships or when changes to the schema structure are required,
it can be limiting. Furthermore, because some data may need to be repeated at
multiple levels of the hierarchy, this type of schema can result in data
redundancy.
This data model arranges data in a tree-like structure, with the root node at
the top. Records at the top level are known as root segments. Nodes are linked
together by branches. Each node has exactly one parent, and a parent may have
multiple children, which gives a one-to-many relationship between the
different types of data. The information is stored as records that are linked
together.
The "Company" node is the root of the hierarchy in this schema, with three
child nodes representing the company's departments. Each department node
has a manager node as its first child, followed by one or more employee
nodes. This structure enables efficient querying of data related to specific
departments or employees, as well as easy navigation of the company's
organizational structure.
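The parent-child structure described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real hierarchical database: the Node class, the department names, and the employee names are all hypothetical.

```python
class Node:
    """One record in the hierarchy: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Walk up the parent links to build the path from the root."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))

    def descendants(self):
        """Yield every node below this one, depth first."""
        for child in self.children:
            yield child
            yield from child.descendants()


# Hypothetical company with three departments.
company = Node("Company")
sales = Node("Sales", company)
engineering = Node("Engineering", company)
finance = Node("Finance", company)

Node("Sales Manager", sales)
Node("Sales Employee 1", sales)
eng_manager = Node("Engineering Manager", engineering)

print(eng_manager.path())
print([n.name for n in sales.descendants()])
```

Because every node stores a single parent link, finding everything under one department only requires following links downward from that department's node.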
Advantages:
• Easy to understand and implement: A hierarchical schema is a simple
and intuitive way to organize data, so it is easy to comprehend and
implement.
• Efficient querying: Because the hierarchical schema is organized in a
tree-like structure, data can be found by following the links between
parent and child nodes, which makes querying along the hierarchy
efficient.
• Data Integrity: A hierarchical schema ensures that data is always
consistent and that data integrity is maintained. This is because each
child node can only have one parent node, preventing data duplication
and inconsistency.
• Improved Security: A hierarchical schema improves security because
access to nodes can be easily controlled by setting permissions at the
appropriate levels.
Challenges:
• Limited flexibility: Hierarchical schema has limited flexibility because
it can only represent data in a tree-like structure. This makes representing
complex data relationships difficult.
• Data redundancy: Because data may need to be duplicated at multiple
levels in the hierarchy, hierarchical schema can lead to data redundancy.
• Difficult to scale: Hierarchical schema can be difficult to scale because
adding new levels or nodes requires significant restructuring of the
schema.
• Inefficient updates: Updating data in a hierarchical schema can be
inefficient because changes to a parent node may necessitate updates to
all of its child nodes.
Relational schemas are important because they standardize the way data is
organized and accessed in a database. They make data management easier
and ensure data integrity by imposing rules and constraints on the data. They
also allow for efficient data querying and reporting, making it easier for
applications to retrieve the information they require.
Employee table:
Department table:
Foreign Key:
The field "department_id" in the "Employee" table is a foreign key that refers
to the field "department_id" in the "Department" table. This indicates that
each employee is assigned to a specific department. By joining the
"Employee" and "Department" tables on the "department_id" field, we can
answer questions like "What is the name of the department that employee
John Smith belongs to?" This is just a simple example; in practice, a
relational schema could be much more complex, with many more tables and
relationships.
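The foreign-key join described above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: only the "department_id" field and the employee name John Smith come from the text, while the remaining column names, the department name, and the helper function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE Employee (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES Department(department_id)
    )
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO Department VALUES (1, 'Sales')")
conn.execute("INSERT INTO Employee VALUES (100, 'John Smith', 1)")


def department_of(employee_name):
    """Join Employee to Department on department_id and return the department name."""
    row = conn.execute(
        """
        SELECT d.name
        FROM Employee AS e
        JOIN Department AS d ON e.department_id = d.department_id
        WHERE e.name = ?
        """,
        (employee_name,),
    ).fetchone()
    return row[0] if row else None


print(department_of("John Smith"))
```

With the foreign key in place, an attempt to insert an employee whose department_id does not exist in the Department table is rejected, which is one of the ways a relational schema protects data integrity.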
Advantages:
• Standardization: Relational schema provides a standardized method of
organizing and accessing data in a database. This facilitates the
understanding of the data structure by developers and users, as well as
the access and manipulation of the data by applications.
Challenges:
Network Schema:
The network schema is a type of database schema that organizes data
similarly to the hierarchical schema, but with a more flexible and complex
structure. Data is organized as a graph in a network schema, with nodes
representing entities and edges representing relationships between them. In
contrast to the hierarchical schema, which allows only one parent for each
child, nodes in a network schema can have multiple parents, allowing for
more complex relationships between entities.
Employee record:
Set Fields:
Manager: Pointer to the manager's employee record.
Employee: Pointer to the employee records that belong to the department.
Project record:
Set Fields:
Manager: Pointer to the manager's employee record
Employee: Pointer to the employee records working on the project
We can use this schema to answer questions like "What are the names of the
employees working on project X?" by following the pointers from the project
record to the employee records, and "Who is the supervisor of employee Y?"
by following the pointer from the employee record to the supervisor's
employee record.
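The idea of following pointers between records can be illustrated with ordinary Python object references. This is a sketch under assumptions: the class layout, the employee names, and the project names are hypothetical, and a real network database stores such links on disk rather than in memory.

```python
class Employee:
    def __init__(self, name, supervisor=None):
        self.name = name
        self.supervisor = supervisor  # pointer to the supervisor's employee record


class Project:
    def __init__(self, name, manager):
        self.name = name
        self.manager = manager  # pointer to the manager's employee record
        self.employees = []     # pointers to the employee records on the project


alice = Employee("Alice")
bob = Employee("Bob", supervisor=alice)
carol = Employee("Carol", supervisor=alice)

project_x = Project("X", manager=alice)
project_x.employees.extend([bob, carol])

project_z = Project("Z", manager=alice)
project_z.employees.append(bob)  # Bob is linked from two projects (multiple parents)


def employees_on(project):
    """Follow the pointers from a project record to its employee records."""
    return [employee.name for employee in project.employees]


def supervisor_of(employee):
    """Follow the pointer from an employee record to the supervisor's record."""
    return employee.supervisor.name if employee.supervisor else None


print(employees_on(project_x))
print(supervisor_of(bob))
```

Note that the same employee record is reachable from more than one project, which a strict hierarchy with a single parent per node cannot express without duplicating the record.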
The network schema's flexibility in representing complex relationships
between entities is one of its advantages. Entities with multiple parents can
have more complex and flexible relationships than in the hierarchical schema.
Furthermore, the network schema supports many-to-many relationships,
which allow entities to have multiple relationships with other entities.
The network schema's complexity, on the other hand, can make it difficult to
understand and manage. It is more difficult to program and may be less
efficient than the hierarchical schema. Furthermore, the use of pointers or
links between records can make navigating and querying the data more
difficult.
Object-Oriented Schema:
Advantages:
• Encapsulation: One of the primary characteristics of object-oriented
schema is encapsulation. It allows data to be hidden from other parts of
the program, limiting access to only defined interfaces. This reduces
complexity, improves modularity, and boosts security.
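Encapsulation can be illustrated with a small class whose data is reachable only through a defined interface. This is a sketch with a hypothetical Account class. Python hides the attribute by convention and name mangling rather than by strict enforcement, so it stands in here for the access control a real object-oriented database would provide.

```python
class Account:
    """Hypothetical customer account; the balance is changed only through methods."""

    def __init__(self, owner, opening_balance=0.0):
        self.owner = owner
        # Double underscore: the name is mangled, so it is not part of the public interface.
        self.__balance = float(opening_balance)

    @property
    def balance(self):
        """Read-only view of the balance."""
        return self.__balance

    def deposit(self, amount):
        """The defined interface for changing the balance; rejects invalid input."""
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount


account = Account("Alice", 100)
account.deposit(50)
print(account.balance)
```

Because `balance` is a read-only property, code outside the class cannot assign to it directly and must go through `deposit`, where the validation lives.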
Challenges:
Entity-Relationship Schema:
An entity-relationship (ER) schema is a diagrammatic representation of a
database's entities, attributes, and relationships between them. It is a high-
level conceptual model of a database's structure. The ER schema is
commonly used to design relational databases and to communicate database
designs to developers and stakeholders. An entity in an ER schema is a real-
world object, concept, or event with its own identity and the ability to be
uniquely identified. Attributes describe the characteristics of entities and are
used to define their properties. Relationships describe the connection between
entities.
Entities, attributes, and relationships are the three main components of an ER
schema.
Entities: A rectangle represents an entity, and its name is written inside the
rectangle. An entity can be a person, a place, a thing, an event, or a concept.
Attributes: An oval represents an attribute, and a line links it to its
entity. An attribute defines a property of an entity and provides
additional information about it.
Book: book_id, title, genre, publish_year, publisher_id
Author: author_id, name, nationality
Publisher: publisher_id, name, address, phone
Borrower: borrower_id, Name, Email
Borrowed Book: book_id, borrower_id, borrow_date, return_date
Borrowing Log: borrow_id, book_id, borrower_id, borrow_date, return_date
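Since an ER schema is commonly used to design relational databases, some of the entities listed above can be turned into tables in which shared attributes such as publisher_id, book_id, and borrower_id act as the links. The sketch below uses Python's built-in sqlite3 module and covers only four of the entities with a subset of their attributes. The table name BorrowedBook, the sample rows, and the helper function are assumptions made for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE Publisher (publisher_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Book (
        book_id      INTEGER PRIMARY KEY,
        title        TEXT,
        publisher_id INTEGER REFERENCES Publisher(publisher_id)
    );
    CREATE TABLE Borrower (borrower_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE BorrowedBook (
        book_id     INTEGER REFERENCES Book(book_id),
        borrower_id INTEGER REFERENCES Borrower(borrower_id),
        borrow_date TEXT,
        return_date TEXT
    );
    -- Hypothetical sample rows.
    INSERT INTO Publisher VALUES (1, 'Example Press');
    INSERT INTO Book VALUES (10, 'Sample Title', 1);
    INSERT INTO Borrower VALUES (7, 'Sample Borrower');
    INSERT INTO BorrowedBook VALUES (10, 7, '2024-01-05', NULL);
""")


def titles_borrowed_by(borrower_id):
    """Follow BorrowedBook to Book and then to Publisher for one borrower."""
    return db.execute(
        """
        SELECT b.title, p.name
        FROM BorrowedBook AS bb
        JOIN Book AS b ON bb.book_id = b.book_id
        JOIN Publisher AS p ON b.publisher_id = p.publisher_id
        WHERE bb.borrower_id = ?
        ORDER BY b.title
        """,
        (borrower_id,),
    ).fetchall()


print(titles_borrowed_by(7))
```

Here the "Borrowed Book" entity becomes a linking table, which is how a many-to-many relationship between books and borrowers is usually represented in a relational design.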
The lines connecting the entities represent their relationships. The "Book"
entity, for example, has a "publisher_id" attribute that links it to the
"Publisher" entity. The "Borrowed Book" entity has attributes "book_id" and
"borrower_id" that link it to the "Book" and "Borrower" entities, respectively.
Finally, the "Borrowing Log" entity has attributes that describe how
borrowers borrow and return books, and it is linked to both the "Book" and
"Borrower" entities.
Now that the data is in the data warehouse, the e-commerce company can
analyze the data and gain insights into its business operations using tools
such as SQL queries, data visualization software, and machine learning
algorithms. This can assist the company in making data-driven decisions to
improve customer satisfaction, boost sales, and optimize its supply chain.
ETL is a Data Warehousing process that stands for Extract, Transform, and
Load. An ETL tool extracts data from various data source systems,
transforms it in the staging area, and then loads it into the Data Warehouse
system.
Extraction:
Extraction is the first step in the ETL process. In this step, data from various
source systems, which may be in formats such as relational databases, NoSQL
stores, XML, and flat files, is extracted into the staging area. Because the
extracted data arrives in different formats and may be corrupted, it is stored
first in the staging area rather than directly in the data warehouse. Loading it
directly into the data warehouse could damage the warehouse, and rollback
would be much more difficult. This makes extraction one of the most crucial
steps in the ETL process.
Transformation:
Transformation is the second step in the ETL process. In this step, the
extracted data is subjected to a set of rules or functions to be converted into a
single standard format. It could include the following processes/tasks:
• Filtering is the process of loading only specific attributes into a data
warehouse.
• Cleaning entails replacing NULL values with default values, mapping
variants such as U.S.A, United States, and America to USA, and so on.
• Joining is the process of combining multiple attributes into one.
• Splitting is the process of dividing a single attribute into multiple
attributes.
• Sorting is the process of ordering tuples based on some attribute
(generally the key attribute).
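The transformation tasks listed above can be sketched as small Python functions applied to rows held in a staging area. This is an illustration only: the field names (customer_id, full_name, city, country, quantity, notes), the default value of 0, and the function names are assumptions, not part of any particular ETL tool.

```python
# Cleaning: map different spellings of a country to one standard value.
COUNTRY_ALIASES = {
    "u.s.a": "USA",
    "united states": "USA",
    "america": "USA",
    "usa": "USA",
}


def clean(row):
    """Replace missing values with defaults and standardize the country name."""
    cleaned = dict(row)
    if cleaned.get("quantity") is None:
        cleaned["quantity"] = 0  # assumed default for a missing quantity
    key = (cleaned.get("country") or "").strip().lower().rstrip(".")
    cleaned["country"] = COUNTRY_ALIASES.get(key, cleaned.get("country"))
    return cleaned


def split_name(row):
    """Splitting: divide the single attribute full_name into two attributes."""
    out = dict(row)
    first, _, last = out.pop("full_name").partition(" ")
    out["first_name"], out["last_name"] = first, last
    return out


def join_location(row):
    """Joining: combine the attributes city and country into one attribute."""
    out = dict(row)
    out["location"] = f"{out.pop('city')}, {out['country']}"
    return out


def select(row, keep):
    """Filtering: keep only the attributes destined for the warehouse."""
    return {name: row[name] for name in keep}


def transform(rows):
    keep = ("customer_id", "first_name", "last_name", "location", "quantity")
    staged = [select(join_location(split_name(clean(r))), keep) for r in rows]
    # Sorting: order the tuples by the key attribute.
    return sorted(staged, key=lambda r: r["customer_id"])


# Hypothetical rows as they might arrive in the staging area.
extracted = [
    {"customer_id": 2, "full_name": "Ravi Kumar", "city": "Austin",
     "country": "United States", "quantity": None, "notes": "x"},
    {"customer_id": 1, "full_name": "Ana Lopez", "city": "Boston",
     "country": "U.S.A", "quantity": 3, "notes": "y"},
]

for row in transform(extracted):
    print(row)
```

Each function corresponds to one of the tasks in the list, so the order in which they are chained inside `transform` can be changed to suit the requirements of a particular warehouse.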
Loading:
Loading is the third and final step in the ETL process. In this step, the
transformed data is loaded into the data warehouse. In some systems the
warehouse is updated very frequently; in others, data is loaded at longer but
regular intervals. The rate and duration of loading depend on the requirements
and differ from system to system.
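A loading step is often written so that a batch is either loaded completely or not at all, which keeps a faulty batch from damaging the warehouse. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse. The table name sales_fact, its columns, and the sample batches are hypothetical.

```python
import sqlite3


def load(conn, rows):
    """Load a batch in one transaction; undo the whole batch if any row fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                "INSERT INTO sales_fact (sale_id, amount) VALUES (?, ?)",
                rows,
            )
    except sqlite3.Error:
        return False
    return True


warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)

good_batch = [(1, 19.99), (2, 5.00)]
bad_batch = [(3, 7.50), (3, 8.25)]  # duplicate key, so the whole batch is rejected

loaded_good = load(warehouse, good_batch)
loaded_bad = load(warehouse, bad_batch)
row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]

print(loaded_good, loaded_bad, row_count)
```

Whether `load` is called every few minutes or once a night is a scheduling decision outside this function, in line with the point that loading frequency differs from system to system.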
4.3 INTRODUCTION TO DATA MINING AND ANALYTICS
Data mining and analytics are two related fields that involve extracting
insights and knowledge from data using computer algorithms and statistical
techniques. While the two terms are frequently used interchangeably, there
are some distinctions between them. Data mining is the process of identifying
patterns and relationships in large datasets by using algorithms. The goal of
data mining is to extract previously unknown insights and knowledge from
data that can then be used to make better decisions and predictions.
Data mining techniques can be used for a variety of purposes, including fraud
detection, market segmentation, and customer churn prediction. Analytics, on
the other hand, entails analyzing and interpreting data using statistical and
mathematical techniques. Analytics can be used to spot trends, forecast future
outcomes, and test hypotheses. In business, analytics is frequently used to
inform decision-making, such as optimizing pricing strategies or improving
supply chain efficiency.
Data mining and analytics are inextricably linked because both involve
working with data and extracting insights from it. Many techniques, such as
clustering, classification, and regression, are also shared. Data mining, on the
other hand, focuses on identifying patterns and relationships, whereas
analytics focuses on analyzing and interpreting data to make informed
decisions. Data mining and analytics are both critical tools for businesses and
organizations looking to maximize the value of their data. Businesses can
gain valuable insights into customer behaviour, market trends, and operational
efficiency by using these techniques, which can help them stay ahead of the
competition and make data-driven decisions.
As an example, suppose a business wants to improve customer retention by
identifying customers who are likely to cancel their subscriptions. It has
a large dataset with data on customer demographics, behaviour, and
transaction history. Using data mining techniques, the company could group
customers based on their behaviour and transaction history with clustering
algorithms. The resulting groups might show, for example, that customers who
have made large purchases in the past are more likely to renew their
subscriptions, whereas customers who have recently decreased their purchase
activity are more likely to cancel.
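The grouping step in this example can be sketched with k-means, one common clustering algorithm. This is a simplified illustration under assumptions: the two features (average monthly spend and percentage change in recent purchases), the customer values, and the starting centroids are hypothetical, and real projects would normally scale the features and use a tested library.

```python
def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (average monthly spend, % change in recent purchases).
customers = [
    (200.0, 5.0), (180.0, 0.0), (220.0, 10.0),
    (40.0, -60.0), (55.0, -45.0), (30.0, -70.0),
]

# Fixed starting centroids keep the example deterministic.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])

print(clusters[0])  # steady, higher-spending customers
print(clusters[1])  # customers whose purchase activity has dropped
```

The algorithm only forms the groups. Deciding that the second group is at risk of cancelling is an interpretation that an analyst would confirm against actual renewal data.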
4.4 DATA GOVERNANCE AND SECURITY
Data governance and security are two essential elements of any organization's
data management strategy. Data governance refers to the policies, procedures,
and standards for managing data throughout its lifecycle, whereas data
security refers to the safeguards put in place to prevent unauthorized access,
disclosure, modification, or destruction of data. Data governance covers data
quality, privacy, security, and compliance management, as well as overall data
asset management. Its aim is to ensure that data is managed effectively and
efficiently and that it is used to support business goals.
• Data mining tools: Extract information from large data sets to identify
patterns, correlations, and trends. This data can be used to inform
business decisions and improve operations.
4.6 SUMMARY
This unit gives an overview of Business Intelligence (BI), which is the
process of gathering, analyzing, and transforming raw data into useful
information that businesses can use to make informed decisions. BI employs
a wide range of tools, technologies, and strategies to access and analyze data
from a variety of sources, including databases, spreadsheets, and data
repositories. This unit discusses the advantages of business intelligence, such
as its ability to provide insights into business operations, identify areas for
improvement, and enable data-driven decision-making, which can increase
revenue and profitability. Dashboards, reports, and data visualizations are
also highlighted as tools to assist decision-makers in interpreting complex
data and identifying patterns and trends.
The unit also discusses some common BI tools and technologies, such as data
warehouses, ETL (Extract, Transform, Load) tools, analytics software, and
data visualization platforms. It also discusses the significance of data quality,
data governance, and data security in business intelligence. Overall, this unit
provides a thorough overview of Business Intelligence and its significance in
modern business operations. It focuses on the key concepts, strategies, and
technologies involved in business intelligence and explains how they can be
used to gain a competitive advantage.