Lecture 2.1.1 2.1.2
DEPARTMENT-BBA
Bachelor Of Business Administration
Business Analytics (22BAT-264)
Instructor: Ms. Nikita Bhardwaj
OLTP (Online Transaction Processing):
• OLTP is a type of database processing that focuses on managing and executing high-volume
transactional workloads in real-time.
• It is optimized for handling a large number of short, atomic transactions such as inserting,
updating, and deleting records in a database.
• OLTP systems are designed to ensure data integrity, concurrency control, and high availability to
support day-to-day operational activities of an organization.
• These systems typically have normalized database schemas to minimize redundancy and
maintain consistency in transaction processing.
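To make "short, atomic transactions" and "normalized schemas" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema (customers, products, orders), the function names make_db and place_order, and the sample rows are all hypothetical and chosen for illustration. The order insert and the stock update either commit together or roll back together.

```python
import sqlite3


def make_db():
    """Create an in-memory OLTP database with a small, hypothetical normalized schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    # Normalized design: each customer and product is stored once, and orders refer to them by key.
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            stock      INTEGER NOT NULL CHECK (stock >= 0)
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            product_id  INTEGER NOT NULL REFERENCES products(product_id),
            qty         INTEGER NOT NULL CHECK (qty > 0)
        );
    """)
    with conn:  # illustrative seed data
        conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
        conn.execute("INSERT INTO products VALUES (10, 'Notebook', 5)")
    return conn


def place_order(conn, customer_id, product_id, qty):
    """Record an order and reduce stock in one short, atomic transaction."""
    # The connection's context manager commits if the block completes and
    # rolls back every statement in it if an exception is raised.
    with conn:
        cur = conn.execute(
            "INSERT INTO orders (customer_id, product_id, qty) VALUES (?, ?, ?)",
            (customer_id, product_id, qty),
        )
        # If stock would drop below zero, the CHECK constraint raises
        # sqlite3.IntegrityError, and the INSERT above is undone as well.
        conn.execute(
            "UPDATE products SET stock = stock - ? WHERE product_id = ?",
            (qty, product_id),
        )
        return cur.lastrowid


if __name__ == "__main__":
    conn = make_db()
    print(place_order(conn, 1, 10, 3))  # succeeds: stock goes from 5 to 2
    try:
        place_order(conn, 1, 10, 4)     # would make stock negative
    except sqlite3.IntegrityError as exc:
        print("rolled back:", exc)
    print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

The failed second order leaves no partial state behind. The orders table still holds one row and the stock stays at 2, which is the integrity guarantee an OLTP system is built around.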
OLAP (Online Analytical Processing):
• OLAP is a technology used for querying, analyzing, and aggregating large
volumes of data to facilitate decision-making and business intelligence
activities.
• It enables users to perform complex multidimensional analysis, drill-down, and
slice-and-dice operations on data stored in a data warehouse or a
multidimensional database.
• OLAP systems provide fast query performance and support advanced analytical
functions such as data mining, forecasting, and trend analysis.
• These systems often utilize denormalized or star/snowflake schema designs to
optimize query performance and enable efficient analytical processing.
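OLTP vs. OLAP:
1.Purpose:
• OLTP systems are designed for transaction processing, supporting the real-time recording and updating of day-to-day operational data.
• OLAP systems are designed for analytical processing, supporting complex querying and analysis of historical and aggregated data.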
2.Workload:
• OLTP systems handle a large number of short, atomic transactions involving insertions, updates, and deletions of data.
• OLAP systems handle analytical queries that involve aggregating, summarizing, and analyzing large volumes of historical data.
3.Data Structure:
• OLTP systems typically have normalized database schemas to minimize redundancy and ensure data integrity.
• OLAP systems often use denormalized or star/snowflake schema designs to optimize query performance and facilitate complex analysis.
4.Query Complexity:
• OLTP queries are typically simple and focused on retrieving or modifying individual records or small subsets of data.
• OLAP queries are more complex and involve aggregating, grouping, and analyzing data across multiple dimensions to derive insights.
5.Usage:
• OLTP systems are used for day-to-day operational activities such as order processing, inventory management, and customer transactions.
• OLAP systems are used for business intelligence, reporting, and decision support activities such as sales analysis, financial reporting, and
market trend analysis.
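The following sketch illustrates the Data Structure and Query Complexity contrasts. It builds a small star schema: one fact table joined to date, product and region dimension tables. It then runs typical OLAP operations: roll-up, drill-down, slice and dice. All table names, columns, the DIMENSIONS whitelist and the sales figures are hypothetical. An in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3


def make_warehouse():
    """Hypothetical star schema: one fact table surrounded by small dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_region  (region_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE fact_sales  (date_id INTEGER, product_id INTEGER,
                                  region_id INTEGER, amount REAL);
        INSERT INTO dim_date    VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Toys');
        INSERT INTO dim_region  VALUES (1, 'North'), (2, 'South');
        INSERT INTO fact_sales  VALUES (1, 1, 1, 100), (1, 2, 1, 50), (1, 1, 2, 70),
                                       (2, 1, 1, 120), (2, 2, 2, 80);
    """)
    return conn


# Whitelist of dimension attributes that may be grouped or filtered on
# (this also keeps user input out of the SQL text).
DIMENSIONS = {"year": "d.year", "quarter": "d.quarter",
              "category": "p.category", "region": "r.region"}


def olap_query(conn, group_by, filters=None):
    """SUM(amount) grouped by the named dimensions; `filters` fixes dimension values.

    group_by must contain at least one key of DIMENSIONS.
    """
    cols = [DIMENSIONS[name] for name in group_by]
    sql = ("SELECT " + ", ".join(cols) + ", SUM(f.amount) FROM fact_sales f"
           " JOIN dim_date d    ON f.date_id = d.date_id"
           " JOIN dim_product p ON f.product_id = p.product_id"
           " JOIN dim_region r  ON f.region_id = r.region_id")
    params = []
    if filters:
        sql += " WHERE " + " AND ".join(DIMENSIONS[k] + " = ?" for k in filters)
        params = list(filters.values())
    sql += " GROUP BY " + ", ".join(cols) + " ORDER BY " + ", ".join(cols)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    wh = make_warehouse()
    print(olap_query(wh, ["category"]))                       # roll-up: totals per category
    print(olap_query(wh, ["category", "quarter"]))            # drill-down: add the quarter level
    print(olap_query(wh, ["category"], {"region": "North"}))  # slice: fix one dimension
    print(olap_query(wh, ["category"], {"region": "North", "quarter": "Q1"}))  # dice: fix two
```

With these sample rows, the roll-up returns Books 290 and Toys 130. Each query joins several tables and aggregates many fact rows, whereas an OLTP query typically reads or modifies only a handful of rows.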
Data Mart:
• A data mart is a subset of a data warehouse that is focused on a
specific area or department within an organization.
• It contains a smaller, more focused set of data that is tailored to
meet the needs of a particular group of users, such as sales,
marketing, or finance.
• Data marts are typically designed to support specific business
functions or analytical requirements, providing users with easy
access to relevant data for analysis and reporting.
• They are often created using a top-down approach, where data is
extracted from the central data warehouse and transformed to meet
the specific needs of the target business unit or department.
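The top-down approach can be sketched as a small extract-transform-load step. It takes only the sales department's rows from a central warehouse, cleans and aggregates them, and loads the result into a separate mart. The transactions table, the sales_summary table, the cleanup rule and the data are hypothetical. Two in-memory SQLite databases stand in for the warehouse and the mart.

```python
import sqlite3


def make_central_warehouse():
    """Hypothetical central warehouse holding rows for several departments."""
    wh = sqlite3.connect(":memory:")
    wh.executescript("""
        CREATE TABLE transactions (department TEXT, region TEXT, month TEXT, amount REAL);
        INSERT INTO transactions VALUES
            ('sales',   'north',  '2024-01', 100),
            ('sales',   'North ', '2024-01',  50),
            ('sales',   'south',  '2024-01',  70),
            ('finance', 'north',  '2024-01', 999),
            ('sales',   'north',  '2024-02',  30);
    """)
    return wh


def build_sales_mart(warehouse, mart):
    """Top-down ETL: extract sales rows, clean and aggregate them, load the mart."""
    # Extract: only the subset the sales department needs.
    rows = warehouse.execute(
        "SELECT region, month, amount FROM transactions WHERE department = 'sales'"
    ).fetchall()
    # Transform: clean up region labels and total the amounts per (region, month).
    totals = {}
    for region, month, amount in rows:
        key = (region.strip().title(), month)
        totals[key] = totals.get(key, 0.0) + amount
    # Load: full refresh of the mart table inside one transaction.
    with mart:
        mart.execute("CREATE TABLE IF NOT EXISTS sales_summary "
                     "(region TEXT, month TEXT, total REAL, PRIMARY KEY (region, month))")
        mart.execute("DELETE FROM sales_summary")
        mart.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)",
                         [(r, m, t) for (r, m), t in sorted(totals.items())])


if __name__ == "__main__":
    mart = sqlite3.connect(":memory:")
    build_sales_mart(make_central_warehouse(), mart)
    print(mart.execute("SELECT * FROM sales_summary ORDER BY region, month").fetchall())
```

The finance row never reaches the mart, and sales users query a small, pre-aggregated table shaped for their reporting needs. The full-refresh load is only one possible design choice. Incremental loads are also common.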
January 23, 2025 Data Mining: Concepts and Techniques
Data Lake:
• A data lake is a centralized repository that allows organizations to store all their
structured and unstructured data at any scale.
• It enables organizations to store data in its raw format, without the need for
extensive pre-processing or schema design, making it suitable for storing
diverse types of data, such as text, images, videos, and sensor data.
• Data lakes are designed to support a wide range of use cases, including data
exploration, advanced analytics, machine learning, and data discovery.
• They are often built using scalable distributed storage systems such as Hadoop
Distributed File System (HDFS) or cloud-based storage solutions like Amazon
S3 or Azure Data Lake Storage.
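The sketch below illustrates "storing data in its raw format". A local directory stands in for HDFS, Amazon S3 or Azure Data Lake Storage. Files are written byte-for-byte, with no parsing or schema applied. The raw/<source>/date=<day>/ layout and the function name land_raw are assumptions for illustration, loosely following the common partition-folder convention. They are not part of any specific product's API.

```python
import tempfile
from pathlib import Path


def land_raw(lake_root, source, ingest_date, filename, payload):
    """Store `payload` (bytes) unchanged under <lake_root>/raw/<source>/date=<ingest_date>/.

    No parsing or schema is applied on write; the bytes keep their native format.
    """
    target_dir = Path(lake_root) / "raw" / source / f"date={ingest_date}"
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / filename
    path.write_bytes(payload)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake:
        # Structured, semi-structured and binary data land side by side.
        land_raw(lake, "crm", "2025-01-23", "customers.csv", b"id,name\n1,Asha\n")
        land_raw(lake, "iot", "2025-01-23", "sensor.json", b'{"device": "t1", "temp": 21.5}\n')
        land_raw(lake, "web", "2025-01-23", "banner.png", b"\x89PNG\r\n\x1a\n")
        for p in sorted(Path(lake).rglob("*")):
            if p.is_file():
                print(p.relative_to(lake))
```

Nothing about the CSV columns or JSON fields is declared up front. Deciding how to interpret each file is deferred until someone reads it.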
Data Mart vs. Data Lake:
Data Structure:
• Data marts are structured repositories that contain curated and pre-processed
data, typically organized around specific business functions or departments.
• Data lakes store raw data (structured, semi-structured, or unstructured) in its
native format, without the need for extensive schema design or data modeling.
Scope:
• Data marts have a narrow scope and are focused on specific business areas
or departments within an organization.
• Data lakes have a broader scope and can store data from multiple sources
and domains, catering to a wide range of analytical and business needs.
Data Processing:
• Data marts involve the extraction, transformation, and loading (ETL) of data from the central
data warehouse or operational systems to create curated datasets.
• Data lakes store raw data in its native format, allowing for on-demand processing and analysis
using tools and technologies such as Apache Spark, Hadoop, or cloud-based analytics services.
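"On-demand processing" of raw lake data is often called schema-on-read: a structure is imposed only when the data is queried, not when it is stored. Engines such as Apache Spark do this at scale. The pure-Python sketch below only illustrates the idea on a list of JSON Lines strings. The field names, the converter-based schema format and the helper names are hypothetical.

```python
import json


def _convert(value, conv):
    """Apply a converter, returning None for missing or unconvertible values."""
    if value is None:
        return None
    try:
        return conv(value)
    except (TypeError, ValueError):
        return None


def read_with_schema(raw_lines, schema):
    """Schema-on-read: project raw JSON Lines records onto `schema` at query time.

    `schema` maps field name -> converter (e.g. float). Missing fields become None,
    extra fields are ignored, and lines that are not JSON objects are skipped.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {field: _convert(record.get(field), conv) for field, conv in schema.items()}


if __name__ == "__main__":
    raw = [
        '{"device": "t1", "temp": "21.5", "ts": "2025-01-23T10:00"}',
        '{"device": "t2", "temp": 19, "humidity": 40}',
        'not json at all',
        '{"device": "t3"}',
    ]
    rows = list(read_with_schema(raw, {"device": str, "temp": float}))
    temps = [r["temp"] for r in rows if r["temp"] is not None]
    print(rows)
    print(sum(temps) / len(temps))  # on-demand aggregate over the parsed rows
```

A data mart would instead run this kind of cleanup once, during ETL, and store only the result. The lake keeps the raw lines, so a different analysis can later apply a different schema, for example one that reads "humidity".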
Use Cases:
• Data marts are used for structured reporting, ad-hoc querying, and analysis within specific
business units or departments.
• Data lakes are used for exploratory data analysis, advanced analytics, machine learning, and data
science initiatives that require access to raw and diverse datasets.
Agility and Scalability:
• Data marts are relatively rigid and may require significant effort to modify or extend to
accommodate new data sources or analytical requirements.
• Data lakes are highly flexible and scalable, allowing organizations to store and analyze vast
amounts of data from diverse sources with minimal constraints on schema design or data
processing.