Business Intelligence Notes
Data Warehouses
• Centralized repositories that integrate data from various sources, providing a comprehensive and consistent view for analysis and reporting.
Extract, Transform, Load (ETL) Process
• A process used to move data from source systems to a target database or data warehouse.
Stages:
• Extract: Gather data from various sources.
• Transform: Clean, format, and integrate the data.
• Load: Insert the transformed data into the target system.
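A minimal sketch of the three stages in Python, assuming a hypothetical "sales.csv" source file with region and amount columns, and with SQLite standing in for the target warehouse:

```python
import csv
import sqlite3

# Extract: gather raw rows from a source file.
# "sales.csv" and its column names are hypothetical examples.
def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Transform: clean, format, and integrate the data.
def transform(rows):
    return [
        {
            "region": row["region"].strip().title(),  # normalize text
            "amount": float(row["amount"]),           # cast text to a number
        }
        for row in rows
    ]

# Load: insert the transformed rows into the target table.
def load(rows, db_path="warehouse.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (:region, :amount)", rows
    )
    conn.commit()
    conn.close()

load(transform(extract("sales.csv")))
```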
Data Marts
• Subsets of a data warehouse designed to serve specific business units or functions.
Data Lakes
• Storage repositories that hold raw, unstructured, and structured data in its native format.
NoSQL Databases
• Databases designed to handle a variety of data models other than the traditional relational model.
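A toy illustration of the document data model, one common NoSQL model, using plain Python dicts rather than a real NoSQL database; note that the two records do not need to share a schema:

```python
# Two "documents" in the same collection can carry different fields,
# unlike rows in a relational table, which share one fixed schema.
users = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin", "beta"], "last_login": "2024-01-05"},
]

# A simple query: find documents containing a given value in an optional field.
admins = [u for u in users if "admin" in u.get("tags", [])]
print(admins)  # [{'_id': 2, 'name': 'Lin', ...}]
```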
Hadoop
• An open-source framework designed for distributed storage and processing of large data sets.
In-Memory Databases
• Databases that store data primarily in memory (RAM) rather than on disk.
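A quick way to see the idea is SQLite's built-in in-memory mode, where the special ":memory:" path keeps the entire database in RAM for the lifetime of the connection:

```python
import sqlite3

# ":memory:" tells SQLite to hold the whole database in RAM;
# the data disappears when the connection is closed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, value REAL)")
conn.execute("INSERT INTO metrics VALUES ('latency_ms', 4.2)")
print(conn.execute("SELECT * FROM metrics").fetchall())  # [('latency_ms', 4.2)]
conn.close()
```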
ACID Properties
• ACID properties are a set of principles that ensure reliable transactions in database systems. They stand for:
1. Atomicity: Transactions are all-or-nothing. Either the entire transaction is completed, or none of it is. This ensures that partial transactions do not affect the database.
2. Consistency: A transaction brings the database from one valid state to another, maintaining database invariants and constraints.
3. Isolation: Transactions are executed independently of one another. The intermediate state of a transaction is not visible to other transactions until it is complete.
4. Durability: Once a transaction is committed, its changes are permanent, even in the case of a system failure.
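A small sketch of atomicity using Python's sqlite3 module; the account data and the simulated crash are contrived for illustration. The failure between the two updates triggers a rollback, so the half-finished transfer never becomes visible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

try:
    # Transfer 30 from alice to bob: two updates in one transaction.
    conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
    raise RuntimeError("simulated crash between the two updates")
    conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # atomicity: the partial update is undone

# Balances are unchanged: [('alice', 100.0), ('bob', 50.0)]
print(conn.execute("SELECT * FROM accounts ORDER BY name").fetchall())
```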
The Hadoop Distributed File System
• The Hadoop Distributed File System (HDFS) is a key component of the Apache Hadoop framework, designed to store and manage large volumes of data across a distributed computing environment.
• Apache Hadoop is an open-source framework designed for processing and storing large datasets in a distributed computing environment.
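A conceptual toy sketch of the HDFS storage idea in plain Python, not the real HDFS API: files are split into fixed-size blocks, and each block is replicated on several data nodes. The tiny block size and replication factor below are made up for readability; HDFS defaults are 128 MB blocks and 3 replicas:

```python
BLOCK_SIZE = 8   # bytes; hypothetical toy value (HDFS default is 128 MB)
REPLICATION = 2  # copies per block; hypothetical (HDFS default is 3)

# node id -> {block id: block bytes}; stand-ins for DataNodes.
data_nodes = {0: {}, 1: {}, 2: {}}

def put(name, payload):
    """Split a file into blocks and replicate each block across nodes."""
    blocks = [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]
    for i, block in enumerate(blocks):
        block_id = f"{name}#{i}"
        for r in range(REPLICATION):
            node = (i + r) % len(data_nodes)  # round-robin replica placement
            data_nodes[node][block_id] = block
    return len(blocks)

print(put("report.txt", b"large dataset split across nodes"))  # 4 blocks
```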
MapReduce
• MapReduce is a programming model and processing technique used to handle large-scale data processing across distributed computing environments, such as those managed by Hadoop. It breaks down a task into smaller, manageable pieces, processes them in parallel, and then combines the results.
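The classic word-count example, sketched in plain Python: the map phase emits (word, 1) pairs, a shuffle step groups the pairs by key (in Hadoop the framework does this), and the reduce phase sums each group:

```python
from collections import defaultdict
from itertools import chain

documents = ["big data big insights", "data drives insights"]

# Map: turn each input record into (key, value) pairs.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

# Shuffle: group all emitted values by key.
grouped = defaultdict(list)
for key, value in chain.from_iterable(map_phase(d) for d in documents):
    grouped[key].append(value)

# Reduce: combine each key's values into a final result.
counts = {word: sum(ones) for word, ones in grouped.items()}
print(counts)  # {'big': 2, 'data': 2, 'insights': 2, 'drives': 1}
```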
Descriptive Analysis
• Descriptive Analysis is a statistical method that helps to summarize and describe the main features of a dataset. It's often the first step in data analysis, providing an overview of the data through measures such as mean, median, mode, and standard deviation, as well as through visualizations like charts and graphs.
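For example, Python's standard statistics module computes these summary measures directly; the sales figures are invented for illustration:

```python
import statistics

sales = [12, 15, 15, 18, 22, 22, 22, 30]  # hypothetical data

print("mean:  ", statistics.mean(sales))    # 19.5
print("median:", statistics.median(sales))  # 20.0
print("mode:  ", statistics.mode(sales))    # 22
print("stdev: ", statistics.stdev(sales))   # sample standard deviation
```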
Visual Analytics
• Visual Analytics is the science of analytical reasoning facilitated by interactive visual interfaces. It combines data analysis and visualization to help users understand complex datasets and extract insights. By turning raw data into visual representations such as charts, graphs, and dashboards, visual analytics makes it easier to spot trends, patterns, outliers, and correlations.
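A minimal sketch using matplotlib, a common Python plotting library, with invented monthly revenue figures; even a simple line chart makes the trend easier to spot than the raw numbers:

```python
import matplotlib.pyplot as plt

# Hypothetical monthly revenue figures for illustration.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [10, 12, 11, 15, 19, 24]

fig, ax = plt.subplots()
ax.plot(months, revenue, marker="o")  # line chart highlights the trend
ax.set_title("Monthly revenue")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue (k$)")
plt.show()
```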
Regression Analysis