
Lecture 2.1.1 2.1.2

The document outlines the course outcomes and content for a Business Analytics course, focusing on data warehousing and data mining concepts. It details the differences between data warehousing and data mining, as well as OLTP and OLAP systems, and introduces data marts and data lakes. Additionally, it emphasizes the importance of architectural aspects in data mining and provides references for further reading.

Uploaded by

Siddharth Singh

INSTITUTE-USB

DEPARTMENT-BBA
Bachelor of Business Administration
Business Analytics (22BAT-264)
Instructor: Ms. Nikita Bhardwaj

DISCOVER . LEARN . EMPOWER
Business Analytics
Course Outcome
CO Number | Title | Level
CO1 | To demonstrate the concepts and methods of business analytics and their role in business and society | Understand
CO2 | To apply data processing tools for exploratory analysis and to demonstrate their effectiveness to a diverse audience | Application
CO3 | To enhance the analytical skills of students by providing the knowledge of various analytical software and tools | Analyze
CO4 | To evaluate analytical solutions for assessing their effectiveness in the contemporary business world | Evaluate
CO5 | To build the expertise in delivering practical solutions for complex business problems | Application
Contents to be Covered
• Data Warehousing
• Data Mining
• Architectural aspects of Data Mining
• Differences Between Data Warehouse and Data Mining
• OLTP (Online Transaction Processing)
• OLAP (Online Analytical Processing)
• Data Mart
• Data Lake
Data Warehousing:

• Data warehousing involves the process of collecting, storing,
and managing large volumes of data from various sources to
support decision-making processes within an organization.
• It integrates data from different operational systems into a
central repository, known as a data warehouse, which is
optimized for querying and analysis.
• The data in a warehouse is structured in a way that facilitates
reporting, analytics, and data mining activities.
• Data warehousing helps organizations to consolidate and
organize their data for efficient analysis and reporting,
providing a single source of truth for decision-making.
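The integration idea above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory SQLite database stands in for the warehouse; the two source lists, the table names (dim_customer, fact_sales) and the sample values are hypothetical and not part of the lecture.

```python
import sqlite3

# Hypothetical extracts from two operational systems (illustrative data only).
sales_rows = [("C001", "2025-01-10", 120.0),
              ("C002", "2025-01-11", 80.0),
              ("C001", "2025-01-12", 45.5)]
crm_rows = [("C001", "Asha", "North"),
            ("C002", "Ravi", "South")]

def build_warehouse(sales, customers):
    """Load both sources into one central store (here: in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer "
                 "(customer_id TEXT PRIMARY KEY, name TEXT, region TEXT)")
    conn.execute("CREATE TABLE fact_sales "
                 "(customer_id TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", customers)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", sales)
    conn.commit()
    return conn

def revenue_by_region(conn):
    """A reporting query that needs data from both original sources."""
    query = """
        SELECT c.region, SUM(s.amount)
        FROM fact_sales AS s
        JOIN dim_customer AS c ON c.customer_id = s.customer_id
        GROUP BY c.region
        ORDER BY c.region
    """
    return conn.execute(query).fetchall()

warehouse = build_warehouse(sales_rows, crm_rows)
print(revenue_by_region(warehouse))
```

Once both sources sit in one repository, a single query can answer a question (revenue per region) that neither source system could answer alone.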
Data Mining:
• Data mining is the process of discovering patterns, trends, and
insights from large datasets using various statistical,
mathematical, and machine learning techniques.
• It involves extracting useful information and knowledge from
the data stored in a data warehouse or other repositories.
• Data mining techniques can be used to uncover hidden patterns,
relationships, and correlations within the data, which can then be
utilized for predictive analysis, anomaly detection, and other
decision-making tasks.
• Data mining algorithms can identify meaningful patterns in data
that may not be immediately apparent, helping organizations
gain valuable insights and make informed decisions.
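As one small illustration of pattern discovery, the sketch below counts how often pairs of items are bought together (support) and how often one item implies another (confidence). The baskets are made-up data, and this brute-force counting is only an assumption-level stand-in for a real association rule algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]

def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)

print(pair_support(transactions))
print(confidence(transactions, "bread", "milk"))
```

A pair with high support and high confidence is a candidate rule such as "customers who buy bread tend to buy milk".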
January 23, 2025 Data Mining: Concepts and Techniques
Architectural aspects of Data Mining
• The architectural aspects of data mining involve the design and implementation of systems and
components that support the data mining process. Here are the key architectural aspects:
1. Data Sources:
• Data mining systems typically begin with various data sources, including databases, data
warehouses, data lakes, flat files, APIs, and external data sources.
• Architectural considerations involve identifying and accessing relevant data sources, ensuring data
quality, and integrating data from disparate sources into a unified data environment.
2. Data Preprocessing:
• Before data mining algorithms can be applied, data preprocessing is necessary to clean, transform,
and prepare the data for analysis.
• Architectural aspects include defining preprocessing tasks such as missing value imputation,
outlier detection, data normalization, feature selection, and dimensionality reduction.
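Two of the preprocessing tasks named above, missing value imputation and normalization, can be sketched as follows. The function names and the sample ages are hypothetical; mean imputation and min-max scaling are just one common choice among several.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

raw_ages = [20, None, 40, 30]        # hypothetical column with one missing value
clean_ages = impute_mean(raw_ages)   # the gap is filled with the mean of 20, 40, 30
scaled_ages = min_max_normalize(clean_ages)
print(clean_ages, scaled_ages)
```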

3. Data Mining Algorithms:
• Data mining algorithms are the core components of a data mining system, responsible for discovering
patterns, trends, and insights from the data.
• Architectural considerations involve selecting appropriate algorithms based on the characteristics of the
data and the objectives of the analysis, implementing algorithms efficiently, and integrating them into the
system.
4. Model Evaluation and Validation:
• Data mining models need to be evaluated and validated to assess their performance and reliability.
• Architectural aspects include defining evaluation metrics, designing validation procedures such as cross-
validation or holdout validation, and implementing mechanisms for comparing and selecting the best-
performing models.
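The cross-validation procedure mentioned above can be sketched without any library. This is a simplified version under stated assumptions: folds are contiguous (no shuffling), the "model" is a baseline that predicts the training mean, and the metric is mean absolute error. All names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

def cross_validate_mean_model(targets, k):
    """Average mean absolute error of a predict-the-training-mean baseline."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(targets), k):
        prediction = sum(targets[i] for i in train_idx) / len(train_idx)
        error = sum(abs(targets[i] - prediction) for i in test_idx) / len(test_idx)
        fold_errors.append(error)
    return sum(fold_errors) / len(fold_errors)

print(cross_validate_mean_model([10.0, 12.0, 11.0, 13.0], k=2))
```

Every record is used for testing exactly once, so the averaged error is less dependent on one lucky or unlucky split than a single holdout would be. Comparing this score across candidate models is one way to select the best performer.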
5. Model Deployment:
• Once data mining models are developed and validated, they need to be deployed into production systems
for real-world use.
• Architectural considerations involve integrating models into operational systems, designing interfaces for
model input and output, monitoring model performance, and updating models as new data becomes
available.

6. Scalability and Performance:
• Architectural aspects of data mining systems include considerations for scalability and
performance to handle large volumes of data and complex analytical tasks efficiently.
• This may involve distributed computing architectures, parallel processing, optimization
techniques, and resource management strategies to ensure scalability and performance.
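The split-and-combine pattern behind parallel processing can be sketched as below. This is only an illustration of the idea: the helper names are hypothetical, and Python threads are used for simplicity even though they do not speed up CPU-bound work in CPython. A production system would more likely use separate processes or a distributed engine.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def partial_sum(chunk):
    """Work done independently on each chunk: return (sum, count)."""
    return sum(chunk), len(chunk)

def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by combining partial results from several workers."""
    if not values:
        raise ValueError("values must not be empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

print(parallel_mean(list(range(10)), chunk_size=3))
```

Because each chunk only produces a small (sum, count) pair, the same structure still works when the chunks live on different machines.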
7. Integration with Business Processes:
• Data mining systems need to be integrated with existing business processes and decision-
making workflows to derive actionable insights and value.
• Architectural considerations involve aligning data mining activities with business objectives,
providing tools and interfaces for decision-makers to access and interpret results, and
incorporating feedback loops for continuous improvement.

Differences Between Data Warehouse and Data Mining
Purpose:
• Data warehousing focuses on the process of storing and managing data to support
reporting and analysis.
• Data mining focuses on extracting insights and knowledge from data through advanced
analytical techniques.
Activities:
• Data warehousing involves data integration, transformation, and storage in a central
repository.
• Data mining involves pattern recognition, predictive modeling, and knowledge discovery
from the stored data.
Goal:
• The goal of data warehousing is to provide a centralized and structured repository of data
for analysis and reporting purposes.
• The goal of data mining is to uncover hidden patterns and insights within the data that can
be used for decision-making and predictive analysis.

Output:
• Data warehousing produces structured data repositories optimized for querying and
reporting.
• Data mining produces insights, patterns, and models derived from the data analysis
process.
Techniques:
• Data warehousing primarily involves data integration, schema design, and data
storage techniques.
• Data mining involves various statistical, mathematical, and machine learning
techniques such as clustering, classification, regression, and association rule
mining.
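Of the data mining techniques listed above, regression is the easiest to show by hand. The sketch below fits a straight line by ordinary least squares; the advertising and sales figures are hypothetical and chosen so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1.0, 2.0, 3.0, 4.0]   # hypothetical input variable
sales = [3.0, 5.0, 7.0, 9.0]      # hypothetical target variable
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
print(slope * 5.0 + intercept)    # prediction for an unseen spend of 5.0
```

The fitted line is the "model" that data mining produces; the warehouse only supplies the cleaned `ad_spend` and `sales` columns it is trained on.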
OLTP (Online Transaction Processing):

• OLTP is a type of database processing that focuses on managing and executing high-volume
transactional workloads in real time.
• It is optimized for handling a large number of short, atomic transactions such as inserting,
updating, and deleting records in a database.
• OLTP systems are designed to ensure data integrity, concurrency control, and high availability to
support day-to-day operational activities of an organization.
• These systems typically have normalized database schemas to minimize redundancy and
maintain consistency in transaction processing.
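A short atomic transaction of the kind OLTP systems handle can be sketched with SQLite. The account table, the balances and the transfer function are hypothetical; the point is that both updates succeed together or neither is kept.

```python
import sqlite3

def open_bank():
    """Create a tiny hypothetical accounts table with a no-overdraft rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account ("
                 "id INTEGER PRIMARY KEY, "
                 "balance REAL NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [(1, 100.0), (2, 50.0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

bank = open_bank()
print(transfer(bank, 1, 2, 30.0), balances(bank))
print(transfer(bank, 1, 2, 500.0), balances(bank))
```

The second transfer fails on the debit step, and the rollback also discards the credit that had already been applied, which is what keeps the data consistent.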
OLAP (Online Analytical Processing):
• OLAP is a technology used for querying, analyzing, and aggregating large
volumes of data to facilitate decision-making and business intelligence
activities.
• It enables users to perform complex multidimensional analysis, drill-down, and
slice-and-dice operations on data stored in a data warehouse or a
multidimensional database.
• OLAP systems provide fast query performance and support advanced analytical
functions such as data mining, forecasting, and trend analysis.
• These systems often utilize denormalized or star/snowflake schema designs to
optimize query performance and enable efficient analytical processing.
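The roll-up, drill-down and slice operations described above can be approximated with plain SQL aggregation. This sketch assumes a single denormalized sales table with three dimensions (year, region, product); the table, the figures and the function names are hypothetical, and a real OLAP engine would precompute or index these aggregates.

```python
import sqlite3

def build_cube():
    """A small denormalized fact table with hypothetical sales figures."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales "
                 "(year INTEGER, region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (2023, "North", "Laptop", 100.0),
        (2023, "South", "Laptop", 60.0),
        (2024, "North", "Laptop", 120.0),
        (2024, "North", "Phone", 80.0),
        (2024, "South", "Phone", 40.0),
    ])
    conn.commit()
    return conn

def roll_up_by_year(conn):
    """Roll-up: aggregate away region and product, keep only year."""
    return conn.execute("SELECT year, SUM(amount) FROM sales "
                        "GROUP BY year ORDER BY year").fetchall()

def drill_down_regions(conn, year):
    """Drill-down: break one year's total into its regions."""
    return conn.execute("SELECT region, SUM(amount) FROM sales "
                        "WHERE year = ? GROUP BY region ORDER BY region",
                        (year,)).fetchall()

def slice_by_product(conn, product):
    """Slice: fix one product and view the remaining year dimension."""
    return conn.execute("SELECT year, SUM(amount) FROM sales "
                        "WHERE product = ? GROUP BY year ORDER BY year",
                        (product,)).fetchall()

cube = build_cube()
print(roll_up_by_year(cube))
print(drill_down_regions(cube, 2024))
print(slice_by_product(cube, "Laptop"))
```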

Differences Between OLTP and OLAP:
1. Purpose:
• OLTP systems are designed for transaction processing, focusing on capturing and managing day-to-day operational data.
• OLAP systems are designed for analytical processing, supporting complex querying and analysis of historical and aggregated data.
2. Workload:
• OLTP systems handle a large number of short, atomic transactions involving insertions, updates, and deletions of data.
• OLAP systems handle analytical queries that involve aggregating, summarizing, and analyzing large volumes of historical data.
3. Data Structure:
• OLTP systems typically have normalized database schemas to minimize redundancy and ensure data integrity.
• OLAP systems often use denormalized or star/snowflake schema designs to optimize query performance and facilitate complex analysis.
4. Query Complexity:
• OLTP queries are typically simple and focused on retrieving or modifying individual records or small subsets of data.
• OLAP queries are more complex and involve aggregating, grouping, and analyzing data across multiple dimensions to derive insights.
5. Usage:
• OLTP systems are used for day-to-day operational activities such as order processing, inventory management, and customer transactions.
• OLAP systems are used for business intelligence, reporting, and decision support activities such as sales analysis, financial reporting, and market trend analysis.
Data Mart:
• A data mart is a subset of a data warehouse that is focused on a
specific area or department within an organization.
• It contains a smaller, more focused set of data that is tailored to
meet the needs of a particular group of users, such as sales,
marketing, or finance.
• Data marts are typically designed to support specific business
functions or analytical requirements, providing users with easy
access to relevant data for analysis and reporting.
• They are often created using a top-down approach, where data is
extracted from the central data warehouse and transformed to meet
the specific needs of the target business unit or department.
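The top-down approach described above can be sketched as copying one department's rows out of the central warehouse into their own table. The warehouse_facts table, the mart_ naming scheme and the sample metrics are hypothetical, and SQLite again stands in for a real warehouse.

```python
import sqlite3

def build_warehouse():
    """Central store holding hypothetical metrics for several departments."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE warehouse_facts "
                 "(department TEXT, metric TEXT, value REAL)")
    conn.executemany("INSERT INTO warehouse_facts VALUES (?, ?, ?)", [
        ("sales", "revenue", 500.0),
        ("sales", "orders", 42.0),
        ("finance", "expenses", 300.0),
        ("marketing", "leads", 120.0),
    ])
    conn.commit()
    return conn

def create_data_mart(conn, department):
    """Extract one department's data into its own table (the data mart)."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not department.isalpha():
        raise ValueError("department must contain letters only")
    mart = f"mart_{department}"
    conn.execute(f"CREATE TABLE {mart} (metric TEXT, value REAL)")
    conn.execute(f"INSERT INTO {mart} "
                 "SELECT metric, value FROM warehouse_facts WHERE department = ?",
                 (department,))
    conn.commit()
    return mart

central = build_warehouse()
sales_mart = create_data_mart(central, "sales")
mart_rows = central.execute(
    f"SELECT metric, value FROM {sales_mart} ORDER BY metric").fetchall()
print(sales_mart, mart_rows)
```

Users of the sales mart see only the two sales metrics, not the finance or marketing rows held in the central table.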
Data Lake:
• A data lake is a centralized repository that allows organizations to store all their
structured and unstructured data at any scale.
• It enables organizations to store data in its raw format, without the need for
extensive pre-processing or schema design, making it suitable for storing
diverse types of data, such as text, images, videos, and sensor data.
• Data lakes are designed to support a wide range of use cases, including data
exploration, advanced analytics, machine learning, and data discovery.
• They are often built using scalable distributed storage systems such as Hadoop
Distributed File System (HDFS) or cloud-based storage solutions like Amazon
S3 or Azure Data Lake Storage.
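Storing data in raw form and applying structure only when reading it can be sketched with a temporary directory standing in for the lake. The file names, payloads and helper functions are hypothetical; a real lake would sit on HDFS or cloud object storage rather than a local folder.

```python
import json
import tempfile
from pathlib import Path

def land_raw(lake_dir, name, payload):
    """Write bytes to the lake exactly as received, with no schema applied."""
    path = Path(lake_dir) / name
    path.write_bytes(payload)
    return path

def read_json_events(lake_dir):
    """Schema-on-read: interpret only the JSON files, at the time of analysis."""
    events = []
    for path in sorted(Path(lake_dir).glob("*.json")):
        events.append(json.loads(path.read_text(encoding="utf-8")))
    return events

with tempfile.TemporaryDirectory() as lake:
    # Three different kinds of data land side by side in their native format.
    land_raw(lake, "sensor_001.json", b'{"device": "s1", "temp_c": 21.5}')
    land_raw(lake, "note.txt", "free-form text".encode("utf-8"))
    land_raw(lake, "image.bin", bytes([0, 255, 16]))
    stored_files = sorted(p.name for p in Path(lake).iterdir())
    sensor_events = read_json_events(lake)

print(stored_files)
print(sensor_events)
```

Nothing was rejected or reshaped on the way in; the decision about which files to parse, and how, is deferred to whoever analyzes the data later.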

Differences Between Data Mart and Data Lake:
Structure:

• Data marts are structured repositories that contain curated and pre-processed
data, typically organized around specific business functions or departments.
• Data lakes store raw data, whether structured, semi-structured, or
unstructured, in its native format, without the need for extensive
schema design or data modeling.
Scope:

• Data marts have a narrow scope and are focused on specific business areas
or departments within an organization.
• Data lakes have a broader scope and can store data from multiple sources
and domains, catering to a wide range of analytical and business needs.
Data Processing:
• Data marts involve the extraction, transformation, and loading (ETL) of data from the central
data warehouse or operational systems to create curated datasets.
• Data lakes store raw data in its native format, allowing for on-demand processing and analysis
using tools and technologies such as Apache Spark, Hadoop, or cloud-based analytics services.
Use Cases:
• Data marts are used for structured reporting, ad-hoc querying, and analysis within specific
business units or departments.
• Data lakes are used for exploratory data analysis, advanced analytics, machine learning, and data
science initiatives that require access to raw and diverse datasets.
Agility and Scalability:
• Data marts are relatively rigid and may require significant effort to modify or extend to
accommodate new data sources or analytical requirements.
• Data lakes are highly flexible and scalable, allowing organizations to store and analyze vast
amounts of data from diverse sources with minimal constraints on schema design or data
processing.
References

• TEXT BOOKS
Introduction to Data Mining, Tan, Steinbach and Kumar, Pearson Education, 2016
• REFERENCE BOOKS
Data Mining: Concepts and Techniques, Han, Kamber and Pei, Elsevier (2nd edition)
• Journals
• https://www.sciencedirect.com/topics/computer-science/data-generalization
Thank you

Please Send Your Queries on:

e-Mail: [email protected]
