Data Mining and Business Intelligence
Data-driven strategy for business transformation
www.bpbonline.com
First Edition 2025
ISBN: 978-93-65892-239
All Rights Reserved. No part of this publication may be reproduced, distributed or transmitted in
any form or by any means or stored in a database or retrieval system, without the prior written
permission of the publisher, with the exception of the program listings, which may be entered,
stored and executed in a computer system but cannot be reproduced by means of
publication, photocopy, recording, or by any electronic or mechanical means.
All trademarks referred to in the book are acknowledged as properties of their respective
owners, but BPB Publications cannot guarantee the accuracy of this information.
Dedicated to
My late parents
My wife Mandakini
and
Daughters Mugdha and Snigdha
Dr. Jyotiranjan Hota is a distinguished academic with nearly 20 years of teaching, research
and software consulting experience. He holds a B.E. in computer science and engineering
from NIT Rourkela, a PGDBM from Xavier Institute of Management, Bhubaneswar, and a
Ph.D. in Management Studies from Aligarh Muslim University, which was a joint program
with AIMA, New Delhi. His areas of expertise include business analytics, artificial
intelligence, machine learning, data mining, text mining, visual analytics, and functional
modules of SAP S/4HANA (SD, MM, PP, and FI-CO). He is also proficient in programming
languages like R and Python and conversant with tools such as KNIME and Power BI. Dr.
Hota's Ph.D. research delved into the adoption of Multivendor ATM technology in India.
He analyzed challenges and opportunities from the perspectives of customers, suppliers,
and bankers. Through this work, he developed and validated several qualitative and
quantitative models addressing the drivers and barriers to technology adoption.
His broader interests focus on integrating information technology across various functional
areas of management. An AIMA-accredited management teacher in the IT domain, Dr.
Hota has an extensive record of publications in prominent journals, including the
International Journal of Bank Marketing, Asia Pacific Journal of Information Systems,
International Journal of Management in Education, and The IUP Journal of Applied
Economics, among others. He has also contributed to works by leading publishers such as
Sage, Springer and Palgrave Macmillan. Dr. Hota has actively participated in international
conferences in India and abroad, serving in key roles such as program committee member,
advisory board member, technical committee member, track chair, and session chair. In
recognition of his academic contributions, he was awarded the ICBM-AMP Academic
Excellence Award 2018 in the Best Professor in IT and Operations category in Hyderabad,
India.
• Tong Zhi is a seasoned data scientist and data engineer with expertise in the private
equity sector. His professional experience encompasses the development and
deployment of advanced analytical and predictive models, such as Markov Chain
Monte Carlo simulations, sophisticated time series forecasting, and classification
algorithms.
In addition to his role in private equity, he is also the founder of a startup venture,
where he successfully architected and implemented a comprehensive business
intelligence platform from scratch.
Tong holds a master of science degree in business analytics and a bachelor of science
degree in finance. Recognized for his expertise, he has frequently been invited to speak
at prestigious international data science conferences and events. Currently, he serves
at RoundShield Partners, a firm specializing in private equity and private credit, while
concurrently managing HireHarbour, an innovative executive assistant outsourcing
agency.
• Anup Sahoo is a Cloud Technical Lead at Insight India with over 14 years of rich
experience in the field of Quality Engineering, Test Automation, and DevOps. As a
seasoned professional and a lifelong learner, Anup is passionate about solving real-
world problems by merging deep technical expertise with cutting-edge technologies.
He is a Generative AI enthusiast and researcher, exploring how large language
models (LLMs) and AI-driven automation can transform the future of software testing
and quality engineering. As a Technical Author, Anup has shared his insights through
technical blogs, research-backed frameworks, and a growing portfolio of practical
tools that aim to make QA smarter and more adaptive.
When he's not immersed in designing intelligent test frameworks or experimenting with
AI-infused pipelines, Anup channels his energy into mentoring aspiring professionals,
creating impactful DevOps and automation content, and exploring nature through
trekking. His curiosity-driven approach and commitment to innovation make him a
driving force in both the tech and learning communities.
Acknowledgement
I would like to extend my heartfelt gratitude to everyone who has supported me on the
challenging journey of writing this book as a sole author.
My wife, Mandakini, and my daughters, Mugdha and Snigdha, have been a constant
source of love, motivation, and strength, and I am grateful for their encouragement
and emotional support.
The KSOM fraternity gave me consistent writing support, from the early drafts to the final
manuscript. The team's encouragement was instrumental in shaping this scholarly work
into its final published version.
My sincere appreciation goes to BPB Publications for their invaluable guidance and
expertise in bringing this book to life. I would also like to acknowledge the technical
reviewers and editors who contributed their valuable feedback to this manuscript. Their
insights and suggestions have greatly enhanced the quality of the book.
Last but not least, I want to express my gratitude to the readers who have shown interest
in the book. Your support and encouragement have been deeply appreciated.
Thank you to everyone who has played a part in making this book a reality.
Preface
In today’s era of digital transformation, information has become the cornerstone of progress
and innovation. The ability to uncover actionable insights from data and effectively use
business intelligence tools is a skill that spans industries, domains, and geographies. This
book is carefully crafted to provide readers with the knowledge, techniques, case studies,
and practical applications required to thrive in this transformative era.
The book is divided into eight thoughtfully designed chapters, which offer a balanced
blend of theoretical understanding and practical exercises. It provides a gradual progression
that takes readers from foundational concepts to advanced analytics to prepare them to
navigate the complexities of real-world data challenges.
The journey begins with Chapter 1, which introduces the basics of data mining and
business intelligence. This chapter highlights the significance of these fields, explains
core principles and emphasizes the importance of leveraging data effectively. Readers
are also introduced to key differences between online analytical processing (OLAP) and
online transactional processing (OLTP) systems. Chapter 2 focuses on pre-processing
techniques, regression, and classification methods. It equips readers with essential tools to
improve data quality and build reliable predictive models, laying a strong foundation for
tackling real-world challenges.
Chapter 3 presents association rule mining, which is key to discovering patterns and
relationships in data. This chapter explains metrics such as support, confidence and lift
while introducing algorithms like Apriori for identifying valuable insights. Chapter 4
discusses clustering techniques and their applications across various domains. It provides
practical examples to illustrate foundational methods like k-means clustering, as well as
advanced algorithms for grouping and analyzing data effectively.
The middle chapters explore the domain of business intelligence. Chapter 5 introduces its
fundamentals, examining the driving forces, market dynamics, and tactical applications
that define this transformative field. Chapter 6 puts these ideas into practice by exploring
business intelligence architecture and concepts such as slicing and dicing, and by using
Power BI to model data and create impactful dashboards.
This book is thoughtfully written to be clear and useful for readers from various
backgrounds, including students, professionals, and lifelong learners. With its clear
explanations, practical examples, and comprehensive coverage, Data Mining and Business
Intelligence provides a valuable resource for mastering the concepts and applications
of these fields. It is hoped that this book inspires curiosity, fosters critical thinking, and
empowers readers to unlock the potential of data to create meaningful and impactful
solutions in their respective areas of study and work. Through practical examples,
comprehensive explanations, and a structured approach, this book aims to equip readers
with a solid understanding of digital systems and technology. Whether you are a beginner
or an experienced learner, I hope this book will serve as a valuable resource in your journey
of exploring the foundations of data-driven insights.
Chapter 1: Introduction to Data Mining and Business Intelligence - This chapter briefly
describes the fundamental principles of data mining and business intelligence tools and
techniques. Readers will gain an overall idea of how to use various techniques to extract
meaningful information from large datasets. It also covers the scope, issues, and future
trends.
clustering are discussed with examples using R. A few advanced clustering algorithms are
discussed for dealing with very large datasets.
Chapter 7: Advanced Data Mining and Business Intelligence Techniques - This chapter
describes advanced data mining techniques for handling large volumes of data from various
functional domains to extract meaningful insights. It covers topics such as text mining,
big data, edge analytics, cognitive analytics, and real-time analytics, and shows how they
integrate with data mining and business intelligence tools to support informed decisions in
organizations. Organizations gain a strategic edge by uncovering hidden gems in large
datasets.
Chapter 8: Data Mining and Business Intelligence Ethical Framework - The lifeblood
of data mining and business intelligence is data. The primary responsibilities and challenges
of data governance revolve around data collection, access, preservation, security, privacy,
and democratization. Fairness, transparency, and trust can be achieved through a proper
ethical framework based on good data governance practices. This chapter discusses the
forces that inhibit and facilitate ethical practice in data mining and business intelligence,
with the aim of ensuring trust and transparency, lowering the costs incurred by
organizations, and addressing the social implications for society as a whole.
https://rebrand.ly/30gtva2
The code bundle for the book is also hosted on GitHub at
https://github.com/bpbpublications/Data-Mining-and-Business-Intelligence.
In case there’s an update to the code, it will be updated on the existing GitHub repository.
We have code bundles from our rich catalogue of books and videos available at
https://github.com/bpbpublications. Check them out!
Errata
We take immense pride in our work at BPB Publications and follow best practices to
ensure the accuracy of our content and provide an engaging reading experience to our
subscribers. Our readers are our mirrors, and we use their inputs to reflect and improve
upon human errors, if any, that may have occurred during the publishing processes
involved. To help us maintain quality and reach out to any readers who might be
having difficulties due to any unforeseen errors, please write to us at:
[email protected]
Your support, suggestions, and feedback are highly appreciated by the BPB Publications
family.
Did you know that BPB offers eBook versions of every book published, with PDF
and ePub files available? You can upgrade to the eBook version at www.bpbonline.com,
and as a print book customer, you are entitled to a discount on the eBook copy.
Get in touch with us at:
[email protected] for more details.
Piracy
If you come across any illegal copies of our works in any form on the internet,
we would be grateful if you would provide us with the location address or
website name. Please contact us at [email protected] with a link to
the material.
Reviews
Please leave a review. Once you have read and used this book, why not leave
a review on the site that you purchased it from? Potential readers can then see
and use your unbiased opinion to make purchase decisions. We at BPB can
understand what you think about our products, and our authors can see your
feedback on their book. Thank you!
Table of Contents
Regression
Metrics and coefficients of regression
Assumptions of linear regression
Case study
Holdout and cross-validation in classification tasks
Performance metrics of classification techniques
Decision tree
Terminologies of decision tree structure
Tree construction and metrics of decision tree
Gini value
Logistic regression
Other classification techniques
Conclusion
Multiple choice questions
Answers
Practice exercises
4. Clustering
Introduction
Structure
Objectives
Distance metrics
Euclidean distance measure
Manhattan distance measure
Cosine similarity measure
Jaccard coefficient measure
K-means clustering, applications and challenges
K-medoids clustering for data partitioning, applications and challenges
Hierarchical clustering, applications and challenges
Advanced clustering algorithms
Data visualization on clustering using R
K-means clustering application
K-means clustering using R
K-medoids application
Hierarchical clustering application
Conclusion
Multiple choice questions
Answers
Practice exercises
Index
Chapter 1
Introduction to Data Mining and Business Intelligence
Introduction
In this chapter, we will discuss the fundamentals, techniques, applications, and challenges
of data mining and business intelligence. We will first focus on their importance and the
motivation for studying them, to build a foundational grasp of the subject. The difference
between Online Analytical Processing (OLAP) and Online Transactional Processing (OLTP)
systems is explained through real-life examples. Emerging trends, top vendors, tools, and
markets for data mining and business intelligence are also explained to build a foundation
for subsequent chapters.
Structure
This chapter covers the following topics:
• Reasons for studying data mining
• Evolution
• Introduction to OLTP, OLAP and data mining
• Associated fields of data mining
• Data mining techniques
• Business intelligence techniques
Objectives
After going through this chapter, you will understand the fundamental principles of
data mining and business intelligence tools, techniques, markets, and privacy concerns.
Readers will also gain an overall idea of how to apply various techniques to extract
meaningful information from large datasets.
"Data is like a faint light when you are lost in a dark room. Follow it, try to make sense of
it, and you might actually know where you are and what is around you."
- David Sides
Evolution
Data mining is a blend of statistics, mathematics, computer science, machine learning,
and big data. Though there is no fixed timeline, its origins can be traced to the eighteenth
century and the development of Bayes’ theorem, which is based on conditional probability.
Bayes’ theorem is still applied extensively in data mining today. Regression, a basic
prediction technique in data mining, was developed during the early nineteenth century.
These applications strengthened the field of statistics. By the mid-twentieth century, they
were followed by advances in computing and by neural networks.
Charles Babbage had already demonstrated the power of machine computation with his
mechanical calculating engines in the nineteenth century. Alan Turing took a step further
with the Turing machine, a formal model of computation, and later proposed that a machine
might be able to think like a human. Similarly, neural networks laid a foundation for data
mining when the first mathematical model of an artificial neuron was proposed in 1943.
Data mining developed rapidly after the mid-twentieth century with the arrival of databases,
genetic algorithms, and further evolutionary computation. Real business applications
gained momentum during the last decade of the twentieth century, driven by data
warehousing and by the use of data mining as a prediction technique as data science
emerged. With the rapid growth of social media, big data, cloud computing, and IoT
applications, data mining came to be used in all functional areas of business during the
twenty-first century. The stages in the evolution of data mining are shown in Figure 1.1:
From purchase histories, data mining techniques can identify patterns and associations
among the products that customers buy. Marketers can then plan strategies such as
cross-selling, upselling, and product bundling.
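The following minimal Python sketch shows one way such associations can be quantified. It counts how often pairs of products appear together in a small, hypothetical purchase history. For each pair, it then reports support, confidence, and lift, the metrics used in association rule mining (Chapter 3). The basket contents, the `pair_rules` helper, and the `min_support` threshold of 0.4 are illustrative assumptions. The sketch only examines item pairs and is not a full implementation of the Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) tuples
    for item pairs whose joint support is at least min_support.
    Hypothetical helper for illustration only."""
    n = len(baskets)
    # Number of baskets containing each single item.
    item_counts = Counter(item for basket in baskets for item in basket)
    # Number of baskets containing each unordered pair of items.
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return rules


for antecedent, consequent, support, confidence, lift in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

On this toy data, the rule milk -> butter has a confidence of 1.00 and a lift of 1.25. Butter appears in every basket that contains milk, and more often than its overall frequency alone would suggest. By contrast, bread -> butter has the highest support but a lift slightly below 1. Lift above 1 is therefore the more useful signal when choosing candidates for cross-selling or bundling.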