
UNIT 4 INTRODUCTION TO BUSINESS INTELLIGENCE

Objectives
After studying this unit, you will be able to:
• Understand Business Intelligence (BI) in depth, along with its related
terminology.
• Recognize how various Business Intelligence tools and techniques are
used to collect, analyze, and interpret data from different sources.
• Gain insights into business operations, customer behaviour, market
trends, and other key areas of interest.
• Appreciate that the goal of BI is to provide decision-makers with the
information they need to optimize business performance, reduce costs,
increase revenue, and achieve strategic objectives.

Structure
4.1 Introduction to Business Intelligence
4.2 Data Warehousing
4.2.1 Data Modeling and Schema Design
4.3 Data Mining and Analytics
4.4 Data Governance and Security
4.5 Business Intelligence Applications
4.6 Summary
4.7 Self-Assessment Exercises
4.8 Further Readings

4.1 INTRODUCTION TO BUSINESS INTELLIGENCE
Business intelligence (BI) is the process of collecting, analyzing, and
presenting data to assist organizations in making informed business
decisions. Organizations can use BI tools and technologies to access and
analyze large amounts of data from various sources, transforming it into
actionable insights. To collect and analyze data, BI typically employs data
warehouses, data mining, and data visualization tools. It can assist businesses
in identifying trends, patterns, and relationships in their data, which can then
be used to inform strategic decisions and drive business growth.

Business intelligence can be used to analyze customer behaviour, track sales
performance, optimize supply chain operations, and monitor financial
performance. It is an essential component of modern business strategy, and
its importance is growing in today's data-driven business environment.
Business intelligence (BI) is a process that collects, organizes, and analyses
data to gain insights that can help organizations make informed decisions. BI
systems collect data in a variety of ways, including data extraction from
databases, web analytics, and social media platforms.

After the data is collected, it is converted into a format that can be analyzed
with BI tools such as data visualization software, dashboards, and reporting
tools. These tools enable businesses to analyze large datasets, identify trends,
and gain a better understanding of their operations. One of the most
significant advantages of business intelligence is that it allows organizations
to make data-driven decisions. Businesses can identify areas for improvement
in their operations and make changes as a result of real-time data analysis.
For example, if a company's sales are declining, it can use BI tools to identify
the root cause of the problem and take corrective action. Another advantage
of BI is that it can assist organizations in optimizing their operations.
Businesses, for example, can identify areas where they can improve customer
satisfaction and increase sales by analyzing customer behaviour. They can
also use business intelligence to track inventory levels and optimize supply
chain operations to save money.

Assume a retail company wants to increase sales in its physical stores. The
company can use business intelligence to analyze customer behaviour and
identify areas where operations can be improved. First, the company can
collect data from various sources, such as point-of-sale systems, customer
loyalty programs, and social media platforms, using data mining techniques.
This information may include customer demographics, purchasing habits, and
product preferences. The company can then use data visualization tools to
build dashboards and reports that highlight key performance metrics like
sales per store, sales per employee, and customer satisfaction ratings. These
reports can assist the company in identifying customer behaviour trends and
patterns, as well as tracking the effectiveness of marketing campaigns and
promotions.
The company can make data-driven decisions to improve its operations based
on the insights gained from data analysis. For example, if data shows that a
specific product sells well in one store but not in others, the company can
stock more of that product in the underperforming stores. They can also use
the data to identify peak shopping hours and adjust staffing levels
accordingly to ensure that customers are served as soon as possible. By
analyzing customer behavior with business intelligence, the company can
optimize operations and increase sales in its physical stores. They can also
use the data analysis insights to improve their online sales and marketing
efforts, resulting in higher revenue and profits.
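The store-level analysis described above can be sketched in a few lines of Python. The sales figures, store names, and the 50% underperformance threshold below are invented for illustration; a real BI pipeline would pull these records from point-of-sale systems rather than hard-coding them.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (store, product, units_sold)
sales = [
    ("Store A", "Widget", 120), ("Store A", "Gadget", 80),
    ("Store B", "Widget", 15),  ("Store B", "Gadget", 95),
    ("Store C", "Widget", 110), ("Store C", "Gadget", 70),
]

def sales_by_store_product(records):
    """Aggregate units sold per (store, product) pair."""
    totals = defaultdict(int)
    for store, product, units in records:
        totals[(store, product)] += units
    return dict(totals)

def underperforming_stores(totals, product, threshold_ratio=0.5):
    """Flag stores selling a product at less than `threshold_ratio`
    of the average across all stores."""
    per_store = {s: u for (s, p), u in totals.items() if p == product}
    avg = sum(per_store.values()) / len(per_store)
    return [s for s, u in per_store.items() if u < avg * threshold_ratio]

totals = sales_by_store_product(sales)
print(underperforming_stores(totals, "Widget"))  # ['Store B']
```

With this output, the company would know to investigate why Store B moves so few Widgets, exactly the kind of data-driven decision the text describes.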

Purpose of BI:
Business intelligence (BI) serves several functions, including the following:

Improved decision-making: Business intelligence (BI) provides organizations
with real-time insights into their operations, allowing decision-makers to
make more informed decisions. Organizations can identify trends,
patterns, and relationships in data that will help them identify opportunities
and potential risks.
Increased efficiency: Business intelligence can assist organizations in
optimizing their operations and increasing efficiency. Organizations can
identify areas where they can cut costs and streamline operations by
analyzing data on inventory levels, sales performance, and customer
behaviour.

Competitive advantage: BI can provide organizations with a competitive
advantage by providing insights that help them make better decisions. By
analyzing competitor data, businesses can identify areas where they can
improve their operations and gain an edge over rivals.
Better customer insights: Business intelligence can assist organizations in
gaining a better understanding of their customers. Organizations can identify
preferences and trends in customer behaviour, which can help them tailor
their products and services to meet customer needs.

Improved collaboration: Business intelligence tools and technologies allow
organizations to share data and insights with employees throughout the
organization. This can improve collaboration and help teams work together
more effectively.

Faster reporting and analysis: BI tools can automate data collection,
analysis, and reporting, saving time and improving accuracy. Organizations
can generate reports faster and make decisions in real time by automating
these processes.

Historical development of BI:

Business intelligence (BI) in its modern form has been around since the
mid-twentieth century. An overview of BI's historical development is as
follows:
• Early decision-making systems: Companies began using simple
decision-support systems to analyze financial data in the mid-twentieth
century. These systems analyzed data and generated reports using basic
statistical methods.
• The rise of databases: Databases became more common in the
1960s and 1970s, and businesses began to use them to store and manage
large amounts of data. This resulted in the creation of more sophisticated
data analysis tools.
• The advent of data warehousing: Data warehousing first appeared in the
1980s as a method for businesses to store and manage large amounts of
data. Companies were able to analyze data from multiple sources and
generate more comprehensive reports as a result.
• The rise of OLAP: Online analytical processing (OLAP) tools gained
popularity in the 1990s. These tools allowed users to quickly analyze
data from various perspectives and generate reports.
• The evolution of data mining: Data mining emerged in the late 1990s
and early 2000s as a method for businesses to identify patterns and
trends in large amounts of data. Companies were able to gain new
insights into their operations and make better decisions as a result of this.
• The emergence of big data: The growth of big data over the last decade
has resulted in the development of new BI tools and technologies. These
tools enable businesses to analyze massive amounts of data in real time
and generate previously unattainable insights.

Fig 4.1: Historical development of BI

BI's historical development has been fueled by technological advances and an
increasing need for organizations to analyze data and make informed
decisions. As data volumes increase, BI will continue to evolve, with new
tools and technologies emerging to assist organizations in gaining insights
into their operations and making better decisions.
Key components of a BI system:
A business intelligence (BI) system consists of several components that work
together to collect, process, and analyze data. Below are the key components
of a BI system:

Fig 4.2: Key components of BI System


i) Data sources:

The data sources are the first component of a BI system. This includes all
data-generating and data-capture systems and applications, such as
transactional databases, customer relationship management (CRM)
systems, and social media platforms. Data sources are the raw materials
from which a BI system generates insights. Internal databases, external
sources such as social media and market research, and data warehouses
are examples of these sources. To provide a comprehensive view of the
organization's performance, a BI system must be able to access and
integrate data from all of these sources.

ii) Data integration:


The data integration layer is the second component of a BI system. This
component collects data from various sources, formats it, and loads it
into a central data repository. The process of combining data from
various sources and transforming it into a usable format for analysis is
known as data integration. Cleaning, validating, and standardizing data is
required to ensure accuracy and consistency. Data integration aims to
create a unified view of data that can be used for analysis and reporting.
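The cleaning, validating, and standardizing steps described above can be sketched as a small Python routine. The customer records, field names, and source systems below are invented for illustration; they stand in for a CRM export and a web-analytics export feeding the integration layer.

```python
import re

# Hypothetical customer records pulled from two source systems with
# inconsistent formats; field names are illustrative only.
crm_rows = [{"name": "  Ada Lovelace ", "phone": "(555) 010-2030"}]
web_rows = [{"name": "ada lovelace", "phone": "555.010.2030"}]

def standardize(row):
    """Clean and standardize one record: trim and title-case the name,
    keep only the digits of the phone number."""
    return {
        "name": row["name"].strip().title(),
        "phone": re.sub(r"\D", "", row["phone"]),
    }

def integrate(*sources):
    """Merge records from several sources, dropping duplicates after
    standardization (a simple form of entity resolution)."""
    seen, unified = set(), []
    for source in sources:
        for row in source:
            clean = standardize(row)
            key = (clean["name"], clean["phone"])
            if key not in seen:
                seen.add(key)
                unified.append(clean)
    return unified

print(integrate(crm_rows, web_rows))
# Both source rows collapse into one standardized record
```

Note how the two differently formatted rows become a single consistent record: this is the "unified view" that the integration layer hands to the data warehouse.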

iii) Data warehouse:


The data warehouse is the third component of a BI system. This is a
central repository of integrated data that is optimized for reporting and
analysis. The data warehouse stores historical information and allows for
complex queries and analysis. A data warehouse is a central repository
where all integrated data is stored in a structured format that is optimized
for analysis. The data warehouse is intended to facilitate efficient
querying and reporting by organizing data into subject areas that
correspond to the organization's business processes. In addition, the data
warehouse offers a historical perspective on the data, allowing users to
analyze trends and patterns over time.

iv) Data modeling:


The data modelling layer is the fourth component of a BI system. This
component includes tools and technologies that enable users to create
data models that represent data element relationships. This encompasses
both logical and physical data models. Data modelling is a critical
component of Business Intelligence (BI) that entails developing a
conceptual representation of the data used in a BI system. A data model
is a graphical representation of data and the relationships between data
entities. The goal of data modelling is to provide a framework for
understanding and organizing data in a business intelligence system.

v) Business intelligence tools:


Business intelligence tools are the fifth component of a BI system. This
includes tools for reporting, dashboards, data visualization, and ad-hoc
querying. Users can use these tools to analyze data and generate reports
and visualizations to help them make decisions. BI tools are software
applications that allow users to access, analyze, and visualize data to gain
insights into their organization's performance. BI tools are an important
part of a Business Intelligence (BI) system, which is designed to give
users the information they need to make data-driven decisions. Data from
a variety of sources, including databases, spreadsheets, and other data
sources, is processed using BI tools. They provide a range of capabilities,
including reporting, data visualization, data mining, predictive analytics,
and OLAP (Online Analytical Processing). BI tools are designed to make
it easy for users to access and analyze data without requiring advanced
technical skills.

vi) Analytics and data mining:


The analytics and data mining layer is the sixth component of a BI
system. This section contains tools for statistical analysis, predictive
modeling, and data mining algorithms. Users can use these tools to
identify patterns and trends in data and make predictions based on
historical data. Analytics in business intelligence entails the application
of statistical and mathematical techniques to analyze data and generate
insights that can assist organizations in making informed decisions.
Regression analysis, time series analysis, and clustering analysis are
examples of such techniques. BI analytics tools allow organizations to
analyze data in real-time, perform ad hoc analysis, and create custom
reports and dashboards. In business intelligence, data mining entails
using machine learning algorithms and statistical techniques to uncover
hidden patterns and relationships in data. Techniques such as association
rule mining, decision trees, and neural networks are examples of this.
Data mining tools in BI enable organizations to identify patterns and
insights that may not be obvious at first glance and to use this
information to make more informed decisions.
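Association rule mining, one of the techniques named above, can be illustrated with the two measures at its core: support and confidence. The market-basket transactions below are invented for illustration; this is a minimal sketch, not a full algorithm such as Apriori.

```python
# Hypothetical purchase transactions (market-basket data)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """P(consequent | antecedent): of the transactions containing the
    antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)

# Rule: customers who buy bread also buy milk
print(support({"bread", "milk"}, transactions))        # 0.5
print(confidence({"bread"}, {"milk"}, transactions))   # ≈ 0.67
```

A rule with high support and confidence (here: two thirds of bread buyers also buy milk) is exactly the kind of hidden pattern a retailer could act on, for example through product placement or bundled promotions.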

vii) Metadata management:


The metadata management layer is the seventh component of a BI
system. This component is in charge of the metadata, which is the
information that describes the data. This includes information about the
data's origin, data definitions, and data lineage. Metadata management is
a critical component of Business Intelligence (BI) that involves the
administration of metadata, which is data that describes other data.
Metadata describes the context, meaning, and structure of data in a
business intelligence system. The metadata repository is a centralized
location for storing and managing metadata. The metadata repository
contains information about the BI system's data sources, data models,
data transformations, and other components. It enables users to
comprehend and interpret the information presented in the system.
Organizations can ensure that their BI system delivers meaningful
insights that can drive better business decisions by providing accurate,
consistent, and comprehensive metadata.
These components work together to give users insights into their operations
and to help them make decisions.
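A metadata repository like the one described in component (vii) can be sketched as a simple registry that records each dataset's origin, definition, and lineage. The dataset names and fields below are invented for illustration and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Minimal metadata entry: origin, meaning, and lineage of a dataset.
    Field names are illustrative, not a standard."""
    name: str
    source: str            # where the data comes from (data origin)
    definition: str        # what the data means (data definition)
    lineage: list = field(default_factory=list)  # upstream datasets

repository = {}  # the centralized metadata repository

def register(meta):
    repository[meta.name] = meta

register(DatasetMetadata("raw_sales", "POS system", "Daily till receipts"))
register(DatasetMetadata("sales_fact", "data warehouse",
                         "Cleaned daily sales", lineage=["raw_sales"]))

# Trace where a warehouse table's data originally came from (data lineage)
print(repository["sales_fact"].lineage)  # ['raw_sales']
```

Even this toy repository shows the payoff: a user looking at `sales_fact` can immediately see what it means and which source it was derived from.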
4.2 DATA WAREHOUSING

Data warehousing is a critical component of Business Intelligence (BI) that
entails collecting, storing, and organizing large amounts of data from various
sources. A data warehouse is a centralized repository that stores historical
and current data from various sources in a structured format that can be
accessed and analyzed easily. Data warehouses typically collect information
from operational systems such as transactional databases, customer
relationship management (CRM) systems, and enterprise resource planning
(ERP) systems. After that, the data is transformed, cleaned, and integrated to
provide a unified view of the business. A data warehouse's primary function
is to provide a single source of truth for data analysis and reporting. This
provides business users with accurate, timely, and consistent data to generate
insights and inform decision-making. Data warehouses also serve as a
platform for advanced analytics and data mining, which can assist businesses
in identifying patterns and trends in their data.

In BI, data warehousing involves the use of a variety of technologies and
tools, such as Extract, Transform, Load (ETL) processes, data modeling, and
data visualization. ETL processes are used to extract data from source
systems, transform the data into a data warehouse-compatible format, and
load the data into the warehouse. Data modeling entails developing a logical
data model and a physical data model, which define the relationships between
data elements and the database structure. Data visualization tools are used to
create reports and dashboards that business users can use to communicate
insights.
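The extract-transform-load sequence just described can be sketched end to end with Python's built-in `sqlite3` module. The table names, columns, and sample rows are invented for illustration; in practice the source would be an operational system and the warehouse a dedicated database.

```python
import sqlite3

# Hypothetical ETL run: extract rows from an operational source system,
# transform them into a clean, consistent format, and load them into a
# warehouse table. All names and data are invented for illustration.

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders_raw (id INTEGER, amount TEXT)")
source.executemany("INSERT INTO orders_raw VALUES (?, ?)",
                   [(1, " 19.50 "), (2, "5.25"), (3, None)])

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders_fact (id INTEGER PRIMARY KEY, amount REAL)")

def extract(conn):
    """Extract: read raw rows from the source system."""
    return conn.execute("SELECT id, amount FROM orders_raw").fetchall()

def transform(rows):
    """Transform: drop rows with missing amounts, cast text to numbers."""
    return [(oid, float(amt.strip())) for oid, amt in rows if amt is not None]

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO orders_fact VALUES (?, ?)", rows)
    conn.commit()

load(warehouse, transform(extract(source)))
print(warehouse.execute("SELECT COUNT(*), SUM(amount) FROM orders_fact").fetchone())
# (2, 24.75)
```

The row with the missing amount is filtered out during the transform step, so the warehouse ends up holding only clean, analysis-ready data.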

Data warehousing serves as a central repository of data that can be accessed
and analyzed to gain insights into business performance, trends, and
opportunities. A data warehouse provides a consolidated view of data from
various sources. This enables companies to combine data from various
systems and departments, such as sales, finance, and operations.

Fig 4.3: ETL Process & Data Warehouse

A data warehouse stores historical data over a long period, allowing
businesses to analyze performance trends and patterns over time. This can aid
in identifying areas for improvement as well as informing strategic decision-
making. Data warehouses serve as the foundation for business intelligence
tools like dashboards and reports. These tools assist businesses in analyzing
and visualizing data to gain insights and make informed decisions. Data
warehouses also support processes that improve data quality, such as data
cleansing and validation. This contributes to the accuracy and dependability
of the data used for analysis.

In the diagram above, data is collected from various sources, and the ETL
process converts unstructured data into structured and meaningful
information. This leads to the creation of a data warehouse, reporting, and
analysis to make better decisions.

Data modeling and schema design:


The process of creating a conceptual representation of data structures and
relationships that exist within a specific domain or system is known as data
modelling. It entails identifying and defining the entities, attributes, and
relationships required to effectively represent and manage data. The process
of translating a conceptual data model into a physical database schema that
can be implemented in a specific database management system is known as
schema design. It entails selecting appropriate data types, defining tables and
their columns, and establishing table relationships.
Typically, the data modelling process begins with identifying the entities that
will be represented in the system. A real-world object, concept, or event that
can be uniquely identified and described is referred to as an entity. Let us
consider, Customers, orders, products, and invoices are examples of entities
in a customer relationship management system. After identifying the entities,
the next step is to define the attributes associated with each entity. A
characteristic or property of an entity that can be used to describe or
distinguish it is referred to as an attribute. A customer entity's attributes
might include, for example, name, address, phone number, and email address.
Following the definition of the entities and attributes, the next step is to
identify the relationships between the entities. The associations or
connections that exist between two or more entities are referred to as
relationships. In a customer relationship management system, for example, a
customer may place multiple orders, each of which may contain multiple
products.
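The entities, attributes, and relationships just described can be sketched in code. The class names mirror the CRM example in the text; the sample data and the `example.com` address are invented for illustration, and this is a conceptual-model sketch, not a database implementation.

```python
from dataclasses import dataclass, field
from typing import List

# Conceptual CRM model: entities (Customer, Order, Product) with their
# attributes, and the relationships between them.

@dataclass
class Product:
    name: str
    price: float

@dataclass
class Order:
    order_id: int
    products: List[Product] = field(default_factory=list)  # an order contains many products

@dataclass
class Customer:
    name: str
    email: str
    orders: List[Order] = field(default_factory=list)      # a customer places many orders

widget = Product("Widget", 9.99)
order = Order(1, products=[widget])
alice = Customer("Alice", "alice@example.com", orders=[order])

# Navigate the relationships: which products has this customer ordered?
print([p.name for o in alice.orders for p in o.products])  # ['Widget']
```

The nesting makes the one-to-many relationships explicit: a customer holds a list of orders, and each order holds a list of products, exactly the associations the conceptual model has to capture before schema design begins.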

Following the definition of the data model, the next step is to design the
schema that will be used to implement the data model in a specific database
management system. Selecting appropriate data types for each attribute,
defining tables and their columns, and establishing relationships between
tables using primary and foreign keys are all part of this process. Data Model
Schemas are commonly used to visually represent the architecture of a
database and serve as the foundation for an organization's Data Management
practice.

Choosing the right Data Model Schema can help to eliminate bottlenecks and
anomalies during software project execution. An incorrect Schema Design,
on the other hand, can cause several errors in an application and make
refactoring expensive. For example, if you didn't realize early on that your
application would require multiple table JOINs, your service will eventually
stop performing when you reach a certain number of users and volume of
data.

To resolve such complications, data will almost certainly need to be moved to
new tables, code will need to point to those new tables, and the tables will
require the necessary JOINs. This implies that you'll need a very strong test
environment (Database and Source Code) to test your changes, as well as a
strategy for managing Data Integrity while also upgrading your database and
source code. Once you begin migrating your database to a new schema, there
is almost no turning back. To avoid such complexities in the early stages of a
data project, it is critical to choose the appropriate schema, avoiding
unprecedented bottlenecks.

The Data Model Schema design begins with a high level of abstraction and
progresses to become more concrete and specific, as with any design process.
Based on their level of abstraction, data models are generally classified into
three types. The process will start with a Conceptual Model, then a Logical
Model, and finally a Physical Model. Such data models provide a conceptual
framework within which a database user can specify the requirements,
structure, and set of attributes for configuring a Database Schema. A Data
Model also offers users a high-level design implementation that dictates what
can be included in the schema.
The following are some popular data model schemas:
• Hierarchical Schema
• Relational Schema
• Network Schema
• Object-Oriented Schema
• Entity-Relationship Schema

Fig 4.4: Types of Data Models

Hierarchical Schema:
A hierarchical schema is a type of database schema that organizes data in a
tree-like structure with a single root, with each node having one parent and
potentially multiple children. A tree schema or a parent-child schema is
another name for this type of schema. Data is organized top-down in a
hierarchical schema, with the parent node at the top of the tree representing
the most general information and child nodes below it representing more
specific information.

A hierarchical schema for a company, for example, might have "Company"
as the root node, with child nodes for "Departments," "Employees," and
"Projects." One of the primary benefits of a hierarchical schema is that it is
simple to understand and apply, making it ideal for small to medium-sized
databases with simple data relationships. However, when dealing with
complex relationships or when changes to the schema structure are required,
it can be limiting. Furthermore, because some data may need to be repeated at
multiple levels of the hierarchy, this type of schema can result in data
redundancy.

This data model arranges data using a tree-like structure, with the root node
at the top. When there are multiple nodes at the top level, they are called
root segments. Nodes are linked together by branches; each node has one
parent, which may have multiple children, creating a one-to-many
relationship between different types of data. The information is saved as
records that are linked together.

A hierarchical schema is a database schema that organizes data into a tree-
like structure with one or more child nodes for each node. Here's an example
of a hierarchical organizational structure for a company:

Fig 4.5: A hierarchical schema

The "Company" node is the root of the hierarchy in this schema, with three
child nodes representing the company's departments. Each department node
has a manager node as its first child, followed by one or more employee
nodes. This structure enables efficient querying of data related to specific
departments or employees, as well as easy navigation of the company's
organizational structure.
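The company hierarchy of Fig 4.5 can be sketched as nested nodes, where each node has exactly one parent, mirroring the parent-child rule of a hierarchical schema. The department and employee names below are invented for illustration.

```python
class Node:
    """One node of a hierarchical schema: a name plus its child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def find(self, name):
        """Top-down navigation: depth-first search through the tree."""
        if self.name == name:
            return self
        for child in self.children:
            found = child.find(name)
            if found:
                return found
        return None

# "Company" is the root; departments are its children; each department
# has a manager node followed by employee nodes (names invented).
company = Node("Company", [
    Node("Sales", [Node("Sales Manager"), Node("Clerk A")]),
    Node("IT", [Node("IT Manager"), Node("Developer B")]),
])

dept = company.find("IT")
print([c.name for c in dept.children])  # ['IT Manager', 'Developer B']
```

Queries about a specific department reduce to walking down one branch of the tree, which is why the text describes hierarchical navigation as efficient for this kind of question.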

Advantages:
• Easy to understand and implement: A hierarchical schema is a simple
and intuitive way to organize data. It is simple to comprehend and
implement.
• Efficient querying: Querying data is efficient because the hierarchical
schema is organized in a tree-like structure. We can easily navigate the
hierarchy by following the links between parent and child nodes.
• Data Integrity: A hierarchical schema ensures that data is always
consistent and that data integrity is maintained. This is because each
child node can only have one parent node, preventing data duplication
and inconsistency.
• Improved Security: A hierarchical schema improves security because
access to nodes can be easily controlled by setting permissions at the
appropriate levels.

Challenges:
• Limited flexibility: Hierarchical schema has limited flexibility because
it can only represent data in a tree-like structure. This makes representing
complex data relationships difficult.
• Data redundancy: Because data may need to be duplicated at multiple
levels in the hierarchy, hierarchical schema can lead to data redundancy.
• Difficult to scale: Hierarchical schema can be difficult to scale because
adding new levels or nodes requires significant restructuring of the
schema.
• Inefficient updates: Updating data in a hierarchical schema can be
inefficient because changes to a parent node may necessitate updates to
all of its child nodes.

Relational Schema Model:


A relational schema is a formal description of how data in a relational
database is organized. It defines a database's structure, including the tables,
fields, and relationships between them. The schema defines the names of the
tables and columns, as well as the column data types and any constraints or
rules that apply to the data. It also specifies the primary and foreign keys that
are used to connect tables.

A simple relational schema might consist of two tables, "Customers" and
"Orders," for example. Columns in the "Customers" table would include
"customer_id," "first_name," "last_name," and "email," while columns in the
"Orders" table would include "order_id," "customer_id," "order_date," and
"total_amount." A foreign key would connect the "customer_id" column in
the "Orders" table to the "customer_id" column in the "Customers" table.

Relational schemas are important because they standardize the way data is
organized and accessed in a database. They make data management easier
and ensure data integrity by imposing rules and constraints on the data. They
also allow for efficient data querying and reporting, making it easier for
applications to retrieve the information they require.

A simple relational schema for a company's employee database is shown
below:

Employee table:

Field Name      Data Type     Description
employee_id     Integer       Unique identifier for each employee
first_name      Varchar(50)   First name of the employee
last_name       Varchar(50)   Last name of the employee
hire_date       Date          Date when the employee was hired
job_title       Varchar(50)   Job title of the employee
department_id   Integer       The ID of the department the employee belongs to
manager_id      Integer       The ID of the employee's manager

Primary Key: employee_id

Department table:

Field Name        Data Type     Description
department_id     Integer       Unique identifier for each department
department_name   Varchar(50)   Name of the department
location          Varchar(50)   Location of the department

Primary Key: department_id

Foreign Key:
The field "department_id" in the "Employee" table is a foreign key that refers
to the field "department_id" in the "Department" table. This indicates that
each employee is assigned to a specific department. By joining the
"Employee" and "Department" tables on the "department_id" field, we can
answer questions like "What is the name of the department that employee
John Smith belongs to?" This is just a simple example; in practice, a
relational schema could be much more complex, with many more tables and
relationships.
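The join described above can be run against a real database engine using Python's built-in `sqlite3` module. The sketch below builds a trimmed version of the Employee and Department tables (only the columns needed for the join) and answers the question from the text; the sample department and its location are invented for illustration.

```python
import sqlite3

# A trimmed version of the Employee/Department schema above, queried
# with a JOIN on department_id. Sample data is invented.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE department (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT,
        location        TEXT
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        first_name    TEXT,
        last_name     TEXT,
        department_id INTEGER REFERENCES department(department_id)
    );
    INSERT INTO department VALUES (10, 'Finance', 'London');
    INSERT INTO employee  VALUES (1, 'John', 'Smith', 10);
""")

# "What is the name of the department that employee John Smith belongs to?"
row = db.execute("""
    SELECT d.department_name
    FROM employee e
    JOIN department d ON e.department_id = d.department_id
    WHERE e.first_name = 'John' AND e.last_name = 'Smith'
""").fetchone()
print(row[0])  # Finance
```

The foreign key on `department_id` is what makes this join meaningful: every employee row points at exactly one department row, so the query can follow that link declaratively instead of navigating pointers by hand.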

Advantages:
• Standardization: Relational schema provides a standardized method of
organizing and accessing data in a database. This facilitates the
understanding of the data structure by developers and users, as well as
the access and manipulation of the data by applications.

• Data Integrity: Relational schema allows you to define constraints and
rules for your data. This ensures that the data in the database is correct,
consistent, and meets certain requirements. It aids in the prevention of
data duplication, loss, or corruption.

• Scalability: Relational databases can handle large amounts of data while
maintaining performance. The schema enables efficient data querying,
indexing, and searching.

• Flexibility: A relational schema stores data in tables that can be easily
manipulated, queried, and joined to other tables to extract the required
data. This makes it simple to adapt the database to changing needs.

Challenges:

• Complexity: Relational schema can be complicated, particularly in large
databases with many tables and relationships. Designing an optimal
schema that balances efficiency and flexibility can be difficult.

• Performance: Although relational databases are scalable, performance
issues can arise when dealing with very large datasets or complex
queries. This can be mitigated by indexing, caching, and other
optimizations, but it remains a concern.

• Maintenance and updates: The relational schema requires ongoing
maintenance and updates to ensure that the data remains accurate and
consistent over time. This can be time-consuming and requires
specialized knowledge and skills.
• Cost: Setting up and maintaining relational databases can be costly,
especially for large-scale applications. The cost of hardware, software,
and licensing can quickly add up.

Network Schema:
The network schema is a type of database schema that organizes data
similarly to the hierarchical schema, but with a more flexible and complex
structure. Data is organized as a graph in a network schema, with nodes
representing entities and edges representing relationships between them. In
contrast to the hierarchical schema, which allows only one parent for each
child, nodes in a network schema can have multiple parents, allowing for
more complex relationships between entities.

Entities are represented as records in a network schema, and relationships
between entities are represented as pointers or links. Each network schema
record contains two types of fields: data fields, which store the values
associated with the entity, and set fields, which store pointers to related
records. Set fields are used to represent the various relationships between
records and can have multiple values.

Consider the employee database of a company. Employee records in a
network schema would include data fields such as name, hire date, and job
title, as well as set fields for the employee's supervisor, department, and
projects they are working on. The department records would include data
fields such as name and location, as well as set fields for the department's
manager and the department's employees.
An example of a network schema for a company's employee database:

Employee record:

Field Name    Data Type     Description
Employee ID   Integer       Unique identifier for each employee
Name          Varchar(50)   Name of the employee
Hire Date     Date          Date when the employee was hired
Job Title     Varchar(50)   Job title of the employee
Set Fields:
Supervisor: This is a link to the supervisor's employee record.
Department: A pointer to the employee's department record.
63
Business Intelligence Project: A reference to the project record on which the employee is working.
& Decision Making
Department record:

Field Name      Data Type     Description
Department ID   Integer       Unique identifier for each department
Name            Varchar(50)   Name of the department
Location        Varchar(50)   Location of the department

Set fields:
Manager: Pointer to the manager's employee record.
Employee: Pointer to the employee records that belong to the department.
Project record:

Field Name   Data Type     Description
Project ID   Integer       Unique identifier for each project
Name         Varchar(50)   Name of the project
Start Date   Date          Date when the project started
End Date     Date          Date when the project ended

Set fields:
Manager: Pointer to the manager's employee record.
Employee: Pointer to the employee records working on the project.

We can use this schema to answer questions like "What are the names of the
employees working on project X?" by following the pointers from the project
record to the employee records, and "Who is the supervisor of employee Y?"
by following the pointer from the employee record to the supervisor's
employee record.
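The pointer-following queries above can be sketched in plain Python, using dictionaries as records and object references as set-field pointers. The record contents below are invented for illustration; a real network DBMS would manage these links itself:

```python
# Minimal in-memory sketch of the network schema above: records are dicts,
# and set fields hold direct references (pointers) to related records.

# Employee and project records, with only a few data fields for brevity.
alice = {"employee_id": 1, "name": "Alice", "projects": [], "supervisor": None}
bob = {"employee_id": 2, "name": "Bob", "projects": [], "supervisor": alice}
project_x = {"project_id": 10, "name": "Project X", "employees": []}

# Wire up the many-to-many set fields between employees and projects.
for emp in (alice, bob):
    emp["projects"].append(project_x)
    project_x["employees"].append(emp)

# "What are the names of the employees working on project X?"
# -- follow the pointers from the project record to the employee records.
names = [emp["name"] for emp in project_x["employees"]]
print(names)  # ['Alice', 'Bob']

# "Who is the supervisor of employee Bob?"
# -- follow the pointer from the employee record to the supervisor's record.
print(bob["supervisor"]["name"])  # Alice
```

The same traversal style applies in either direction: because each record stores pointers to its related records, answering a query is a matter of chasing links rather than matching key values as in a relational join.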
The network schema's flexibility in representing complex relationships
between entities is one of its advantages. Entities with multiple parents can
have more complex and flexible relationships than in the hierarchical schema.
Furthermore, the network schema supports many-to-many relationships,
which allow entities to have multiple relationships with other entities.

The network schema's complexity, on the other hand, can make it difficult to
understand and manage. It is more difficult to program and may be less
efficient than the hierarchical schema. Furthermore, the use of pointers or
links between records can make navigating and querying the data more
difficult.

Object-Oriented Schema:

An object-oriented schema is a type of data model used in computer programming and software engineering. It is based on the principles of object-oriented programming (OOP), which emphasizes the use of objects to represent real-world entities and concepts. In an object-oriented schema, data is organized into objects, each of which has a set of attributes or properties and a set of methods or operations that can be performed on those attributes. Objects are defined by classes, which are templates that define the attributes and methods that all instances of the class will have.
Object-oriented schemas also support inheritance, which allows one class to
inherit properties and methods from another. Because common functionality
can be defined in a parent class and inherited by child classes, this helps to
reduce code duplication and improve code organization. One of the primary
advantages of using an object-oriented schema is that it can aid in the
modularity and maintainability of code. It is easier to modify and extend code
when it is broken down into objects with well-defined interfaces.

Object-oriented schemas are widely used in software development, especially for large-scale applications and systems. Many popular programming languages, including Java, C++, and Python, support object-oriented programming and provide tools for creating and manipulating object-oriented schemas.

Fig 4.6: Object-oriented schemas – relationships between the classes

We have three classes in this example: Account, CheckingAccount, and SavingsAccount. The deposit and withdraw methods update the account balance. CheckingAccount is a subclass of Account with the private attribute overdraft_limit; it overrides the withdraw method, allowing withdrawals up to the overdraft limit. SavingsAccount is a subclass of Account with the private attribute interest_rate. Fig 4.6 depicts the relationships between the classes: both CheckingAccount and SavingsAccount derive from Account.
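A minimal Python sketch of these classes, assuming simple method signatures (the add_interest method is added here for illustration and is not part of the figure):

```python
class Account:
    """Base class: holds a balance and the common deposit/withdraw methods."""

    def __init__(self, balance=0.0):
        self._balance = balance  # encapsulated: accessed via the property below

    def deposit(self, amount):
        self._balance += amount

    def withdraw(self, amount):
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount

    @property
    def balance(self):
        return self._balance


class CheckingAccount(Account):
    """Overrides withdraw to allow withdrawals up to the overdraft limit."""

    def __init__(self, balance=0.0, overdraft_limit=100.0):
        super().__init__(balance)
        self._overdraft_limit = overdraft_limit

    def withdraw(self, amount):
        if amount > self._balance + self._overdraft_limit:
            raise ValueError("overdraft limit exceeded")
        self._balance -= amount


class SavingsAccount(Account):
    """Adds an interest rate and a method that applies it to the balance."""

    def __init__(self, balance=0.0, interest_rate=0.03):
        super().__init__(balance)
        self._interest_rate = interest_rate

    def add_interest(self):
        self._balance += self._balance * self._interest_rate


checking = CheckingAccount(balance=50.0, overdraft_limit=100.0)
checking.withdraw(120.0)   # allowed: within the overdraft limit
print(checking.balance)    # -70.0

savings = SavingsAccount(balance=1000.0, interest_rate=0.05)
savings.add_interest()
print(savings.balance)     # 1050.0
```

Note how polymorphism works here: code that calls withdraw on an Account reference behaves differently when the object is actually a CheckingAccount, without the caller needing to know which subclass it holds.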

Advantages:
• Encapsulation: One of the primary characteristics of object-oriented
schema is encapsulation. It allows data to be hidden from other parts of
the program, limiting access to only defined interfaces. This reduces
complexity, improves modularity, and boosts security.

• Reusability: Object-oriented schema encourages code reuse by allowing developers to create classes that can be used throughout the programme. This reduces the amount of redundant code that must be written and aids in code maintenance.

• Inheritance: Classes can inherit properties and methods from other classes. This encourages code reuse, simplifies code maintenance, and makes it easier for developers to create complex programmes.

• Polymorphism: The ability to treat objects of different classes as if they are of the same type. This allows for the creation of generic code that can be applied to a wide variety of objects, increasing code flexibility and reducing the amount of code that must be written.

Challenges:

• Complexity: Object-oriented schema can be more complex than other data models, making it more difficult for beginners to learn and apply effectively.

• Performance: Because of the overhead associated with encapsulation, inheritance, and polymorphism, object-oriented programmes can at times be slower than other types of programmes.

• Over-engineering: Object-oriented schema can occasionally lead to over-engineering, with developers producing overly complex code that is difficult to understand and maintain.

• Difficulty in debugging: Because of the complexity of the code and the interactions between different objects, object-oriented programmes can be difficult to debug.

Entity-Relationship Schema:
An entity-relationship (ER) schema is a diagrammatic representation of a
database's entities, attributes, and relationships between them. It is a high-
level conceptual model of a database's structure. The ER schema is
commonly used to design relational databases and to communicate database
designs to developers and stakeholders. An entity in an ER schema is a real-
world object, concept, or event with its own identity and the ability to be
uniquely identified. Attributes describe the characteristics of entities and are
used to define their properties. Relationships describe the connection between
entities.
Entities, attributes, and relationships are the three main components of an ER
schema.

Entities: A rectangle represents an entity, and its name is written inside the
rectangle. An entity can be a person, a place, a thing, an event, or a concept.

Attributes: Attributes are represented by an oval shape and are linked to their respective entities by a line. An attribute defines an entity's properties and provides additional information about it.

Relationships: A relationship connects two entities and is represented by a diamond shape. It describes the relationship between the two entities and can include constraints on cardinality and participation. Cardinality describes the number of entities that participate in the relationship, whereas participation constraints describe whether the entities are required or optional in the relationship.

Consider a straightforward ER schema for a library database. Entities in this schema could include "book," "author," and "borrower." Attributes such as "book title," "author name," and "borrower ID" would be assigned to each entity. The schema may include relationships such as "written by" between "book" and "author" and "borrowed by" between "book" and "borrower." These entities and relationships would be visually represented in the ER schema, providing a high-level overview of the database structure.

Book: book_id, title, genre, publish_year, publisher_id
Author: author_id, name, nationality
Publisher: publisher_id, name, address, phone
Borrower: borrower_id, name, email
Borrowed Book: book_id, borrower_id, borrow_date, return_date
Borrowing Log: borrow_id, book_id, borrower_id, borrow_date, return_date
Fig 4.7: ER Schema

We have four entities in this ER diagram (Fig 4.7): "Book," "Author," "Publisher," and "Borrower." Each entity has its own set of attributes, and the lines connecting them represent the relationships between the entities. Attributes of the "Book" entity include "book_id," "title," "genre," "publish_year," and "publisher_id." Attributes of the "Publisher" entity include "publisher_id," "name," "address," and "phone." Attributes of the "Author" entity include "author_id," "name," and "nationality." Attributes of the "Borrower" entity include "borrower_id," "name," and "email."

The lines connecting the entities represent their relationships. The "Book" entity, for example, has a "publisher_id" attribute that links it to the "Publisher" entity. The "Borrowed Book" entity has attributes "book_id" and "borrower_id" that link it to the "Book" and "Borrower" entities, respectively. Finally, the "Borrowing Log" entity has attributes that describe how borrowers borrow and return books, and it is linked to both the "Book" and "Borrower" entities.
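One way to realize such an ER schema is to map each entity to a relational table and each relationship to a foreign key. A sketch using Python's built-in sqlite3 module follows the diagram's table and column names; the sample rows are invented, and only a subset of the tables is shown for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Each entity becomes a table; relationships become foreign key columns.
cur.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT,
                        address TEXT, phone TEXT);
CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, genre TEXT,
                   publish_year INTEGER,
                   publisher_id INTEGER REFERENCES publisher(publisher_id));
CREATE TABLE borrower (borrower_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE borrowed_book (book_id INTEGER REFERENCES book(book_id),
                            borrower_id INTEGER REFERENCES borrower(borrower_id),
                            borrow_date TEXT, return_date TEXT);
""")

cur.execute("INSERT INTO publisher VALUES (1, 'Acme Press', '12 Main St', '555-0100')")
cur.execute("INSERT INTO book VALUES (1, 'BI Basics', 'Non-fiction', 2020, 1)")
cur.execute("INSERT INTO borrower VALUES (1, 'Ravi', 'ravi@example.com')")
cur.execute("INSERT INTO borrowed_book VALUES (1, 1, '2024-01-10', NULL)")

# Which borrowers currently hold which books? Join along the foreign keys.
cur.execute("""
    SELECT br.name, b.title
    FROM borrowed_book bb
    JOIN book b ON b.book_id = bb.book_id
    JOIN borrower br ON br.borrower_id = bb.borrower_id
    WHERE bb.return_date IS NULL
""")
print(cur.fetchall())  # [('Ravi', 'BI Basics')]
```

The join mirrors the ER diagram: each JOIN clause follows one relationship line, matching a foreign key in the relationship table against the primary key of the entity it references.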

Processes for extract, transform, and load (ETL):
ETL (extract, transform, load) is a data integration and data warehousing process. It entails extracting data from various sources, transforming it to meet the needs of the target system, and loading it into the destination system.

ETL is divided into three stages:
• Extract: Data is extracted from various sources such as databases, files,
APIs, and web services at this stage. Connecting to the source systems
and retrieving the data in the required format may be required.
• Transform: The extracted data is transformed at this stage to meet the
specific requirements of the target system. This can include data
cleaning, validation, standardization, enrichment, aggregation, and
integration.
• Load: The transformed data is loaded into the target system, such as a
data warehouse or a data lake, at this stage. Loading data into tables or
files, creating indexes, and optimizing the database for faster query
performance are all examples of this.
Consider an e-commerce company with multiple data sources, such as sales
transactions, customer profiles, and product information, as an example of an
ETL process. The company intends to build a data warehouse to analyze this
data and gain insights to improve its business operations.

Fig 4.8: Stages in ETL Process

Now that the data is in the data warehouse, the e-commerce company can
analyze the data and gain insights into its business operations using tools
such as SQL queries, data visualization software, and machine learning
algorithms. This can assist the company in making data-driven decisions to
improve customer satisfaction, boost sales, and optimize its supply chain.
ETL is a Data Warehousing process that stands for Extract, Transform, and Load. An ETL tool extracts data from various data source systems, transforms it in the staging area, and then loads it into the Data Warehouse system.
Extraction:

Extraction is the first step in the ETL process. In this step, data from various source systems is extracted into the staging area in various formats such as relational databases, NoSQL stores, XML, and flat files. Because the extracted data arrives in different formats and may be corrupted, it is critical to store it first in the staging area rather than loading it directly into the data warehouse: loading corrupted data directly could damage the warehouse, and rolling it back would be much more difficult. This makes extraction one of the most crucial steps in the ETL process.

Fig 4.9: ETL Process

Transformation:
Transformation is the second step in the ETL process. In this step, the
extracted data is subjected to a set of rules or functions to be converted into a
single standard format. It could include the following processes/tasks:
• Filtering is the process of loading only specific attributes into a data
warehouse.
• Cleaning entails replacing NULL values with default values, mapping
the U.S.A, United States, and America into the USA, and so on.
• Joining is the process of combining multiple attributes into one.
• Splitting is the process of dividing a single attribute into multiple
attributes.
• Sorting is the process of organizing tuples based on some attribute (generally the key attribute).

Loading:
Loading is the third and final step in the ETL process. The transformed data is finally loaded into the data warehouse in this step. In some systems the data warehouse is refreshed very frequently, while in others loading happens at longer but regular intervals; the rate and duration of loading are solely determined by the requirements and differ from system to system.
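The three stages above can be sketched end to end in Python. The source rows, country mappings, and default values below are invented for illustration; a real pipeline would read from databases or files and load into a warehouse:

```python
# --- Extract: pull raw rows from a source system into a staging area. ---
raw_rows = [
    {"customer": "Asha", "country": "U.S.A", "amount": "120.50"},
    {"customer": "Ben", "country": "United States", "amount": None},
    {"customer": "Chen", "country": "India", "amount": "75.00"},
]

# --- Transform: cleaning, standardization, and sorting, as described above. ---
COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}

def transform(rows):
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer": row["customer"],
            # Cleaning: map U.S.A / United States / America into USA.
            "country": COUNTRY_MAP.get(row["country"], row["country"]),
            # Cleaning: replace NULL values with a default value.
            "amount": float(row["amount"]) if row["amount"] is not None else 0.0,
        })
    # Sorting: organize tuples based on a key attribute.
    return sorted(cleaned, key=lambda r: r["customer"])

# --- Load: append the transformed rows into the target "warehouse" table. ---
warehouse = []
warehouse.extend(transform(raw_rows))
print(warehouse[0])  # {'customer': 'Asha', 'country': 'USA', 'amount': 120.5}
```

Keeping the transform step as a pure function over the staged rows mirrors the rationale given earlier for the staging area: if a batch turns out to be corrupted, it can be discarded and rerun without touching the warehouse.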
4.3 INTRODUCTION TO DATA MINING AND ANALYTICS
Data mining and analytics are two related fields that involve extracting
insights and knowledge from data using computer algorithms and statistical
techniques. While the two terms are frequently used interchangeably, there
are some distinctions between them. Data mining is the process of identifying
patterns and relationships in large datasets by using algorithms. The goal of
data mining is to extract previously unknown insights and knowledge from
data that can then be used to make better decisions and predictions.

Data mining techniques can be used for a variety of purposes, including fraud
detection, market segmentation, and customer churn prediction. Analytics, on
the other hand, entails analyzing and interpreting data using statistical and
mathematical techniques. Analytics can be used to spot trends, forecast future
outcomes, and test hypotheses. In business, analytics is frequently used to
inform decision-making, such as optimizing pricing strategies or improving
supply chain efficiency.

Data mining and analytics are inextricably linked because both involve
working with data and extracting insights from it. Many techniques, such as
clustering, classification, and regression, are also shared. Data mining, on the
other hand, focuses on identifying patterns and relationships, whereas
analytics focuses on analyzing and interpreting data to make informed
decisions. Data mining and analytics are both critical tools for businesses and
organizations looking to maximize the value of their data. Businesses can
gain valuable insights into customer behavior, market trends, and operational
efficiency by using these techniques, which can help them stay ahead of the
competition and make data-driven decisions.
As an example, suppose a business wants to improve customer retention by
identifying customers who are likely to cancel their subscriptions. They have
a large dataset with data on customer demographics, behaviour, and
transaction history. To use data mining techniques, the company could group
customers based on their behaviour and transaction history using clustering
algorithms. Customers who have made large purchases in the past are more
likely to renew their subscriptions, whereas customers who have recently
decreased their purchase activity are more likely to cancel.

The company could use analytics techniques such as regression analysis to determine which customer attributes are most strongly correlated with subscription cancellation. Customers who are in a certain age group, live in a certain geographic area, or use a specific payment method may be more likely to cancel. The company can gain a better understanding of customer behavior and develop more effective retention strategies by combining data mining and analytics techniques. For example, based on data mining and analytics insights, they may create targeted promotions for customers who are about to cancel their subscriptions.
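A toy version of the clustering step can be sketched in pure Python with a tiny k-means. The customer tuples (months since last purchase, total spend) are invented, and production work would use a library such as scikit-learn:

```python
import random

def kmeans(points, k=2, iters=20, seed=0):
    """Tiny k-means for 2-D points; enough to sketch customer clustering."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Each customer: (months since last purchase, total spend) -- invented data.
customers = [(1, 900), (2, 1100), (1, 1000),   # active, high-spend customers
             (10, 120), (12, 80), (11, 150)]   # lapsing, low-spend customers

centers, clusters = kmeans(customers, k=2)
print(sorted(len(cl) for cl in clusters))  # two groups of three customers each
```

With well-separated behaviour like this, the algorithm recovers the "likely to renew" and "likely to cancel" groups; the retention team could then target the low-spend, long-inactive cluster with promotions.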

4.4 DATA GOVERNANCE AND SECURITY

Data governance and security are two essential elements of any organization's
data management strategy. Data governance refers to the processes and
policies that ensure data is managed and used effectively and efficiently,
whereas data security refers to the safeguards put in place to prevent
unauthorized access, disclosure, modification, or destruction of data. The
creation of policies, procedures, and standards for managing data throughout
its lifecycle is referred to as data governance. It includes data quality,
privacy, security, and compliance management, as well as overall data asset
management. Data governance aims to ensure that data is managed
effectively and efficiently and that it is used to support business goals.

The following are the key components of data governance:

• Data ownership refers to the identification of individuals or teams who are in charge of managing specific data sets.
• Data quality is the assurance that data is correct, complete, and consistent.
• Data privacy is the protection of data following applicable laws and regulations.
• Data security is the protection of data from unauthorized access, disclosure, modification, or destruction.
• Data lifecycle management is the process of ensuring that data is managed from creation to disposal.

Data governance also entails developing a data governance framework that includes data management policies, processes, and procedures. This framework should be backed up by appropriate technology solutions that help automate data governance processes and ensure policy and standard compliance.
Data security entails safeguarding data against unauthorized access,
disclosure, modification, or destruction. This can be accomplished through
several means, including:
• Access controls: Restricting data access to only those who require it.
• Encryption: The process of converting data into an unreadable format
that can only be decrypted using a key.
• Backup and recovery: Ensuring that data can be recovered promptly.

Importance of data governance:

Data governance is the process of managing the availability, usability, integrity, and security of an organization's data. Data governance is critical because it ensures that organizations can effectively manage their data assets to meet business objectives, comply with regulations, and maintain the trust of their stakeholders. Here are some of the main reasons why data governance is critical:
• Data integrity: Effective data governance ensures that data is of high quality: accurate, complete, and consistent. This improves decision-making and reduces the likelihood of errors and inefficiencies.

• Compliance: Organizations must follow various laws and regulations regarding data privacy, security, and confidentiality. Effective data governance ensures that data is managed in accordance with these standards, lowering the risk of legal and financial penalties.

• Risk management: Effective data governance lowers the risk of data breaches, unauthorized access, and other security threats, thereby protecting an organization's reputation and limiting potential financial losses.
• Efficiency: Effective data governance aids in the streamlining of
processes and the reduction of costs associated with data management,
resulting in increased efficiency and productivity.
• Business strategy: Data governance assists organizations in aligning
their data management practices with their business objectives, allowing
them to better leverage their data assets and gain a competitive
advantage.

Data governance framework:

A data governance framework is a collection of policies, procedures, standards, and guidelines that define how a company manages and protects its data assets. A data governance framework's purpose is to ensure that data is managed consistently, effectively, and securely across an organization while adhering to applicable laws, regulations, and industry standards.

A typical data governance framework consists of the following elements:

• Data policies are high-level statements that define the organization's approach to data management, such as how data is collected, processed, stored, and shared.
• Data standards are detailed specifications outlining the requirements for
data quality, security, and data management processes. Standards help to
ensure consistency and accuracy in data handling.
• Data management processes are the procedures and workflows that help
with the data lifecycle, which includes data acquisition, processing,
storage, and distribution.
• Data stewardship is the delegation of responsibility for the oversight and
management of specific data assets to individuals or teams within an
organization.
• Data security and privacy includes policies and procedures for protecting data from unauthorized access, use, or disclosure, as well as for compliance with data privacy regulations.
• Data architecture and technology refer to the infrastructure, systems, and
tools used to manage data throughout an organization.
• Data quality management encompasses processes for ensuring data accuracy, completeness, and consistency, as well as procedures for resolving data quality issues.
• Roles and responsibilities of data governance: This defines the roles and
responsibilities of various stakeholders involved in data management,
such as data owners, data stewards, and data users.

Data security and privacy:

Modern technology and the digital age rely heavily on data security and privacy. They refer to safeguarding sensitive information against unauthorized access, theft, and misuse. Data security refers to the safeguards put in place to protect information from malicious attacks and cyber threats. This can include using encryption to protect data, installing firewalls to prevent unauthorized access, and establishing authentication mechanisms to ensure only authorized users have access.

Data privacy, on the other hand, is concerned with the management of personal data, ensuring that it is collected, used, and shared in a manner that respects individuals' privacy rights. This includes obtaining consent before collecting data, securely storing data, and giving individuals control over how their data is used and shared. Individuals and organizations can use several best practices to ensure data security and privacy. These are some examples:
• Using strong passwords and enabling two-factor authentication.
• Keeping software up to date to prevent vulnerabilities from being exploited.
• Using encryption to protect sensitive data both at rest and in transit.
• Collecting only necessary data and storing it securely.
• Providing clear and concise privacy policies, and obtaining consent before data collection.
• Reviewing data protection policies and procedures regularly to ensure they are up to date and effective.
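As a concrete illustration of the password practice, passwords should be stored only as salted hashes, never in plain text. A sketch using Python's standard library; the iteration count and salt size here are illustrative choices, not recommendations:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted hash; store (salt, digest), never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, digest):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

The per-password salt means two users with the same password get different digests, and the slow key-derivation function makes brute-forcing a stolen table far more expensive than a plain hash would.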

4.5 BUSINESS INTELLIGENCE APPLICATIONS

Business intelligence (BI) applications are software tools that analyze, process, and present large amounts of data from various sources to assist organizations in making better business decisions. These applications typically include features such as reporting, data visualization, dashboards, and data mining that enable businesses to collect, process, and analyze data from various departments within an organization. BI applications' primary goal is to assist businesses in identifying trends and patterns in data and making data-driven decisions that can lead to improved business performance. Among the most common BI applications are:

• Reporting tools: These applications assist businesses in producing reports and presentations based on data gathered from a variety of sources. These reports can be used to monitor key performance indicators, identify areas for improvement, and make strategic decisions.
• Data visualization tools: These are used to create visual representations of data to assist users in quickly understanding complex data sets. Examples include graphs, charts, and interactive dashboards.

• Data mining tools: These extract information from large data sets to identify patterns, correlations, and trends, which can be used to inform business decisions and improve operations.

• Performance management applications: These applications assist businesses in tracking and measuring performance metrics across departments and business units. Financial metrics, sales metrics, and customer engagement metrics are examples of such metrics.

4.6 SUMMARY
This unit gives an overview of Business Intelligence (BI), which is the
process of gathering, analyzing, and transforming raw data into useful
information that businesses can use to make informed decisions. BI employs
a wide range of tools, technologies, and strategies to access and analyze data
from a variety of sources, including databases, spreadsheets, and data
repositories. This unit discusses the advantages of business intelligence, such
as its ability to provide insights into business operations, identify areas for
improvement, and enable data-driven decision-making, which can increase
revenue and profitability. Dashboards, reports, and data visualizations are
also highlighted as tools to assist decision-makers in interpreting complex
data and identifying patterns and trends.

The unit also discusses some common BI tools and technologies, such as data
warehouses, ETL (Extract, Transform, Load) tools, analytics software, and
data visualization platforms. It also discusses the significance of data quality,
data governance, and data security in business intelligence. Overall, this unit
provides a thorough overview of Business Intelligence and its significance in
modern business operations. It focuses on the key concepts, strategies, and
technologies involved in business intelligence and explains how they can be
used to gain a competitive advantage.

4.7 SELF-ASSESSMENT EXERCISES


• Consider any data set and assess your data analysis skills by
answering the following questions:
1. Can you interpret complex data sets and identify patterns and
trends?
2. Can you use statistical methods and tools to analyze data?
3. Are you familiar with data visualization techniques and tools?
• Case let: The Marketing Manager's Dilemma
Eva is the marketing manager for a large consumer goods company. She
is responsible for launching a new line of products and needs to make
some key decisions about the marketing strategy. She has access to a lot
of data, but she's not sure how to use it to make informed decisions.
Questions:
1. What is the problem that Eva is facing?
2. How can Business Intelligence (BI) help Eva solve this problem?
3. What data sources should Eva consider when making decisions about the
marketing strategy?
4. What are some potential insights that BI could provide to Eva?
5. What types of BI tools and technologies could Eva use to analyze the
data and generate insights?
6. How can Eva ensure that the data she is using is of high quality and
reliable?
7. What steps can Eva take to ensure that she effectively communicates the
insights from BI to stakeholders within the company?
8. How can Eva use BI to measure the marketing strategy's success and
make necessary adjustments?

4.8 FURTHER READINGS


• "Business Intelligence: A Managerial Perspective on Analytics" by
Ramesh Sharda, Dursun Delen, Efraim Turban.
• "The Data Warehouse Toolkit: The Definitive Guide to Dimensional
Modeling" by Ralph Kimball and Margy Ross.
• "Data Science for Business: What You Need to Know about Data
Mining and Data-Analytic Thinking" by Foster Provost and Tom
Fawcett.
