
Unit 3

The document outlines the concepts of Enterprise Architecture (EA) and Data Architecture, emphasizing their roles in aligning IT infrastructure with business goals and managing data effectively. It details key components of EA, principles of good data architecture, and various data sources and systems, including data lakes, data warehouses, and OLTP systems. The conclusion highlights the importance of understanding these architectures for building scalable and secure data systems that support organizational objectives.

UNIT - III

Designing Good Data Architecture


Enterprise Architecture
Enterprise Architecture (EA) is a conceptual blueprint that defines the structure
and operation of an organization. It outlines how an organization's IT infrastructure,
processes, and systems interconnect to meet its business goals. EA aligns IT
infrastructure with business strategy by providing a clear vision of how technology
supports business operations.

● Key Components of EA:
○ Business Architecture: Defines the business strategy, governance,
organization, and key business processes.
○ Application Architecture: Describes the software applications, their
interactions, and how they support the business processes.
○ Data Architecture: Describes how data is collected, stored, and
managed across the organization.
○ Technology Architecture: Focuses on the hardware, software, and
network technologies used to support the applications and data.
Importance:

● Alignment with business goals: EA ensures that technology investments align with the organization's strategic objectives.
● Cost Efficiency: Helps optimize IT resources and reduce redundancy by streamlining systems and processes.
● Agility: Provides a framework to adapt to changes in technology and business needs.
Data Architecture
Data Architecture is a framework that defines how an organization collects,
stores, manages, and uses its data. It specifies the structure of data systems, the
relationships between data components, and the tools used to process, analyze,
and protect data.

● Key Components of Data Architecture:
○ Data Models: Define the structure of data and its relationships (e.g.,
conceptual, logical, and physical models).
○ Data Storage: Determines how and where data will be stored, whether in
databases, data lakes, or cloud platforms.
○ Data Integration: Defines how data from different sources will be
integrated and made accessible across systems (e.g., ETL, data pipelines).
○ Data Security & Governance: Ensures that data is protected, complies
with regulations, and is managed effectively.
○ Data Access: Defines who can access data and how it will be accessed
(e.g., APIs, BI tools).
Principles of Good Data Architecture
Good data architecture ensures that data management systems are designed for
scalability, flexibility, and efficiency. Here are some principles of good data architecture:

1. Scalability: The architecture should accommodate the growth of data over time.
It should be able to handle increasing data volumes without significant
performance degradation.
2. Flexibility & Adaptability: The architecture should be adaptable to new
technologies, tools, or business needs, enabling the addition of new data sources
or capabilities.
3. Data Quality: The architecture should ensure the accuracy, consistency, and
reliability of data across the organization.
4. Data Security: Security measures should be integrated into the architecture to
protect sensitive data and comply with data privacy regulations (e.g., GDPR,
CCPA).
5. Data Accessibility: Data should be easy to access for authorized users or
applications, promoting collaboration and insight generation.
6. Efficiency: The architecture should be designed to optimize performance,
reducing unnecessary processing and storage costs.
7. Governance and Compliance: The architecture should ensure that data is
managed in accordance with regulatory requirements and internal governance
policies.
8. Automation: The data pipeline should automate data processing, storage,
and integration to reduce human intervention and errors.
Major Architecture Concepts
Data Lakes vs Data Warehouses:

● Data Lake: A large, centralized repository that stores raw, unstructured, or
semi-structured data. It can handle diverse data types, including social media
posts, logs, and sensor data.
● Data Warehouse: A structured and optimized system for storing relational
data that is used for reporting and analytics. It supports complex queries,
aggregations, and business intelligence tools.

Cloud vs On-Premise Architecture:

● Cloud Architecture: Data and applications are hosted on cloud providers like
AWS, Azure, or Google Cloud, offering scalability and flexibility.
● On-Premise Architecture: The infrastructure is managed internally within an
organization's data center.
Microservices Architecture:

● Involves breaking down applications into smaller, independent services that
communicate via APIs. This concept promotes modularity and scalability in
data systems.

Event-Driven Architecture (EDA):

● An architectural paradigm where events (such as data changes) trigger
automated processes or actions in systems. It is commonly used in
streaming data systems and real-time analytics.
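The publish/subscribe pattern behind EDA can be sketched in a few lines. This is a minimal in-memory illustration, not a production message broker; the `EventBus` class and the `"order_created"` event name are made up for this example.

```python
# Minimal event-driven sketch: handlers subscribe to event types,
# and publishing an event triggers every subscribed handler.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every handler registered for this event type fires automatically.
        for handler in self._handlers[event_type]:
            handler(payload)

bus = EventBus()
audit_log = []
bus.subscribe("order_created", lambda e: audit_log.append(e["order_id"]))
bus.publish("order_created", {"order_id": 42})
print(audit_log)  # [42]
```

In a real system the bus would be a durable broker (e.g., Kafka or a cloud equivalent), but the contract is the same: producers emit events without knowing who consumes them.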

Data Integration Layers (ETL/ELT):

● ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are
two common methods for moving and transforming data between systems.
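The ETL flow described above can be sketched end to end with the standard library. The source records, validation rule, and `sales` table here are fabricated for illustration; a real pipeline would extract from files, APIs, or databases.

```python
# A toy ETL pipeline: extract rows from a source, transform them
# (trim names, drop records that fail validation), load into SQLite.
import sqlite3

def extract():
    # Stand-in for reading from a file, API, or source database.
    return [{"name": " Alice ", "amount": "10"},
            {"name": "Bob", "amount": "bad"}]

def transform(rows):
    clean = []
    for row in rows:
        try:
            clean.append((row["name"].strip(), int(row["amount"])))
        except ValueError:
            continue  # discard records with non-numeric amounts
    return clean

def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

conn = load(transform(extract()))
print(conn.execute("SELECT name, amount FROM sales").fetchall())
# [('Alice', 10)]
```

ELT would reverse the last two steps: load the raw rows first, then run the transformation inside the target system (typically as SQL in the warehouse).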
Data Generation in Source Systems
Data generation in source systems refers to the ways in which raw data is
created or collected. Different types of data sources exist, and understanding
them is critical to building a robust data architecture.

Sources of Data:

1. Files and Unstructured Data
2. APIs (Application Programming Interfaces)
3. Application Databases (OLTP)
4. OLAP (Online Analytical Processing)
5. Change Data Capture (CDC)
6. Logs
7. Database Logs
8. CRUD (Create, Read, Update, Delete)
Sources of Data
1. Files and Unstructured Data:
○ Includes log files, text files, CSVs, PDFs, images, videos, and other data
formats that don’t fit neatly into structured databases.
○ These types of data are stored in a Data Lake for further processing.
2. APIs (Application Programming Interfaces):
○ APIs allow applications to request and exchange data in real-time. Many
modern systems, such as social media platforms or cloud services, expose
APIs to allow programmatic access to their data.
○ Example: Pulling stock prices in real-time from a financial API.
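Consuming such an API usually means parsing a JSON response. The payload below imitates what a hypothetical stock-price endpoint might return; a real integration would fetch it over HTTP (e.g., with `urllib.request`) before parsing.

```python
# Parsing a JSON API response into a Python dict.
# The response body is a fabricated sample, not a real API's output.
import json

response_body = '{"symbol": "ACME", "price": 123.45, "currency": "USD"}'
quote = json.loads(response_body)
print(f"{quote['symbol']}: {quote['price']} {quote['currency']}")
```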
3. Application Databases (OLTP):
○ OLTP (Online Transaction Processing) systems are designed for
managing real-time transactional data (e.g., e-commerce platforms, banking
systems).
○ These databases are optimized for frequent reads and writes with a focus on
data integrity.
○ Example: A relational database storing customer orders in an e-commerce
system.
4. OLAP (Online Analytical Processing):
○ OLAP systems are designed for complex queries and analysis,
typically involving large datasets. They support decision-making
processes by enabling fast and interactive analysis of data.
○ Example: A data warehouse where business analysts query
historical sales data to identify trends.
5. Change Data Capture (CDC):
○ CDC involves capturing changes (insertions, updates, and deletions)
made to data in a database in real-time or near real-time. It is often
used to synchronize data across systems or replicate changes to a
data warehouse.
○ Example: Using CDC to propagate updates from an operational
database to a data warehouse.
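One simple CDC style is polling a change log and remembering the last change applied. The sketch below assumes a made-up `change_log` table; production CDC tools typically read the database's own transaction log instead of a polled table.

```python
# Simplified polling-based CDC: the consumer tracks the highest
# change id it has applied and fetches only newer rows each pass.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE change_log (id INTEGER PRIMARY KEY, op TEXT, row_key TEXT)")
conn.executemany("INSERT INTO change_log (op, row_key) VALUES (?, ?)",
                 [("INSERT", "cust:1"), ("UPDATE", "cust:1"), ("DELETE", "cust:2")])

last_seen = 0
replica = []  # stand-in for the downstream data warehouse

def poll_changes():
    global last_seen
    rows = conn.execute(
        "SELECT id, op, row_key FROM change_log WHERE id > ? ORDER BY id",
        (last_seen,)).fetchall()
    for change_id, op, row_key in rows:
        replica.append((op, row_key))  # apply the change downstream
        last_seen = change_id
    return len(rows)

print(poll_changes())  # 3 — all pending changes applied
print(poll_changes())  # 0 — nothing new since the last poll
```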
6. Logs:

○ Logs are records of events that occur in a system, often used for
monitoring and debugging. Logs can be unstructured (e.g., server logs)
and are often ingested into data lakes for processing and analysis.
○ Example: Web server logs that track user interactions and system events.
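Ingesting such logs usually starts with parsing each line into structured fields. The sketch below handles one line in the Common Log Format; the sample line itself is fabricated.

```python
# Parsing one Common Log Format line into named fields with a regex.
import re

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+)')

line = '203.0.113.9 - - [12/Mar/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120'
record = LOG_PATTERN.match(line).groupdict()
print(record["method"], record["path"], record["status"])
# GET /index.html 200
```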
7. Database Logs:

○ Database logs track all changes made to a database, including


transactions, queries, and modifications. These logs are crucial for data
recovery and ensuring data consistency.
○ Example: MySQL’s binary logs that record all changes made to a
database, used for replication and recovery.

8. CRUD (Create, Read, Update, Delete):

○ These are the basic operations for managing data in a database.
■ Create: Adding new data to the system.
■ Read: Retrieving or querying data.
■ Update: Modifying existing data.
■ Delete: Removing data from the system.
○ These operations define how data flows and is manipulated within a source
system or database.
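The four operations map directly onto SQL statements. Here is a minimal demonstration against an in-memory SQLite table; the `users` table and its columns are illustrative.

```python
# The four CRUD operations against an in-memory SQLite table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Create: add new data to the system.
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Alice')")
# Read: retrieve or query data.
name = conn.execute("SELECT name FROM users WHERE id = 1").fetchone()[0]
# Update: modify existing data.
conn.execute("UPDATE users SET name = 'Alicia' WHERE id = 1")
# Delete: remove data from the system.
conn.execute("DELETE FROM users WHERE id = 1")

remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(name, remaining)  # Alice 0
```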
Source System Practical Details

1. Data Source Design: It is crucial to design source systems to ensure the data
is clean, well-structured, and easily accessible for extraction and integration
into the broader data architecture.
2. Data Collection and Sampling: Depending on the use case, data collection
methods (e.g., streaming, batch) should be carefully selected. For example,
real-time systems (e.g., IoT data streams) may require event-based data
collection.
3. Data Consistency and Integrity: Ensuring the consistency and integrity of
data at the source is essential, especially in OLTP systems where transactions
must adhere to ACID properties (Atomicity, Consistency, Isolation, Durability).
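Atomicity, the "A" in ACID, can be demonstrated concretely: when one statement in a transaction fails, the whole transaction is rolled back. The `accounts` table and balances below are invented for the example.

```python
# Demonstrating atomicity: both statements run in one transaction,
# so when the second fails, the first is rolled back with it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    with conn:  # opens a transaction; rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("INSERT INTO accounts VALUES (1, 50)")  # violates PRIMARY KEY
except sqlite3.IntegrityError:
    pass

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100 — the debit was undone when the insert failed
```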

4. Data Volume and Velocity: Depending on the system, the data might be
generated in massive volumes (big data systems) or at high velocity (e.g.,
real-time systems). The architecture should be capable of handling both
types effectively.

5. Data Storage and Accessibility: Data generated from these sources must
be stored in a way that is scalable and accessible for downstream processing,
typically leveraging both cloud storage (for large volumes) and databases (for
transactional and analytical needs).
Conclusion
Understanding Enterprise Architecture and Data Architecture is critical for
building a system that supports business objectives and ensures the effective use
of data across the organization. Data generation from various sources like APIs,
OLTP systems, and logs demands a careful design of the infrastructure to
efficiently handle, process, and secure data. By integrating good data
management practices, architectures, and tools like CDC and OLAP,
organizations can build scalable, flexible, and secure data systems that drive
business insights and decision-making.
