Storage

The document provides an overview of data storage systems in data engineering, detailing structured, unstructured, and semi-structured data storage methods. It discusses key components such as data warehouses, data lakes, and distributed file systems, along with the importance of data access frequency categorized into hot, lukewarm, and cold data. Additionally, it highlights key considerations for effective data storage, including compatibility, scalability, performance, and query capabilities.

Uploaded by

rajapraneesh

Storage Systems

Unit - 2

Prepared By
Dr M Praneesh
Sri Ramakrishna College of Arts & Science
Data Storage

In data engineering, data storage refers to the mechanisms and technologies used to store, organize, and manage large volumes of data efficiently. It encompasses various storage systems, databases, file systems, and data formats designed to meet the diverse needs of storing and accessing data throughout its lifecycle.

Data Storage Systems

Structured data storage involves organizing data into predefined schemas, tables, and columns, typically used in relational database management systems (RDBMS) like MySQL, PostgreSQL, or SQL Server.

Semi-structured data storage accommodates data with flexible schemas or irregular structures, such as JSON, XML, or key-value pairs. NoSQL databases like MongoDB, Cassandra, or Couchbase are commonly used for semi-structured data storage due to their flexible schemas and scalability for handling unstructured or semi-structured data.

Unstructured data storage deals with storing data without a predefined schema or structure, such as documents, images, videos, audio files, and log files. Object storage solutions like Amazon S3, Google Cloud Storage, or Azure Blob Storage are popular choices for storing unstructured data due to their scalability, durability, and cost-effectiveness.
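
The three storage styles above can be contrasted in a few lines of Python. This is only a minimal sketch using the standard library: an in-memory SQLite table stands in for an RDBMS, JSON strings stand in for documents in a NoSQL store, and a byte string stands in for an object in object storage. The table name, field names, and sample values are made up for illustration.

```python
import json
import sqlite3

# Structured: a predefined schema (table and columns) enforced by a relational engine.
# An in-memory SQLite database stands in for MySQL/PostgreSQL/SQL Server here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Coimbatore"), (2, "Ravi", "Chennai")],
)
structured_rows = conn.execute(
    "SELECT name, city FROM customers ORDER BY id"
).fetchall()

# Semi-structured: each JSON document carries its own fields, so two records
# in the same collection do not have to share a shape.
documents = [
    json.dumps({"id": 1, "name": "Asha", "tags": ["premium"]}),
    json.dumps({"id": 2, "name": "Ravi", "address": {"city": "Chennai"}}),
]
parsed_documents = [json.loads(doc) for doc in documents]
field_sets = [sorted(doc.keys()) for doc in parsed_documents]

# Unstructured: the store sees only opaque bytes plus a key; no schema applies.
# The key and payload below are hypothetical placeholders.
object_store = {"images/logo.png": b"\x89PNG placeholder bytes"}
blob_size = len(object_store["images/logo.png"])

print(structured_rows)
print(field_sets)
print(blob_size)
```

The point of the contrast is where the structure lives: in the database schema for structured data, inside each record for semi-structured data, and nowhere the storage layer can see for unstructured data.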
Key Components of Storage
• Data Warehouse: Data warehouses are specialized databases designed for storing and analyzing large volumes of structured and semi-structured data for business intelligence (BI) and analytics purposes. Examples: Amazon Redshift, Google BigQuery, and Snowflake.

• Data Lake: Data lakes are centralized repositories that store vast amounts of raw, unprocessed data in its native format, enabling organizations to perform advanced analytics, machine learning, and data exploration. Technologies like Apache Hadoop, Apache Spark, and AWS Glue are commonly used for building and managing data lakes, offering support for batch and real-time data processing, data ingestion, and data governance.

• File System: Distributed file systems like Hadoop Distributed File System (HDFS) and Google File System (GFS) provide scalable, fault-tolerant storage solutions for distributed computing environments. They are optimized for storing and processing large datasets across multiple nodes in a distributed computing cluster, supporting parallel data processing and fault tolerance.
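
How a distributed file system gets fault tolerance from multiple nodes can be illustrated with a toy model: split a file into fixed-size blocks and keep each block on more than one node. The sketch below is an assumption-laden simplification, not the placement policy of HDFS or GFS; the round-robin placement, the node names, and the tiny block size are all hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the file contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: dict[int, list[str]], failed: set[str]) -> list[int]:
    """A block stays readable while at least one node holding it is still up."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


blocks = split_into_blocks(b"x" * 10, block_size=4)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c"], replication=2)

print(placement)
print(readable_blocks(placement, failed={"node-b"}))
print(readable_blocks(placement, failed={"node-a", "node-b"}))
```

With two replicas per block, losing any single node leaves every block readable; losing two nodes can make a block unavailable, which is why real systems let the replication factor be tuned.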
Data Access Frequency

Data access frequency determines the “temperature” of your data.

• Hot Data: Frequently accessed data, needing fast retrieval, stored in high-speed storage solutions.
• Lukewarm Data: Accessed occasionally, stored in moderately fast storage solutions.
• Cold Data: Rarely accessed, suitable for archival storage solutions with lower storage costs but slower, more expensive retrieval.
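
A temperature rule of this kind can be written as a small classifier. The sketch below assumes access frequency is measured as the number of reads in the last 30 days; the two thresholds are hypothetical values chosen for illustration and would need tuning for a real workload.

```python
# Hypothetical thresholds: reads per 30 days that separate the tiers.
HOT_MIN_ACCESSES = 100
LUKEWARM_MIN_ACCESSES = 5

# Tier descriptions follow the wording of the slide.
STORAGE_TIER = {
    "hot": "high-speed storage",
    "lukewarm": "moderately fast storage",
    "cold": "archival storage",
}


def data_temperature(accesses_last_30_days: int) -> str:
    """Classify a dataset as hot, lukewarm, or cold from its recent access count."""
    if accesses_last_30_days < 0:
        raise ValueError("access count cannot be negative")
    if accesses_last_30_days >= HOT_MIN_ACCESSES:
        return "hot"
    if accesses_last_30_days >= LUKEWARM_MIN_ACCESSES:
        return "lukewarm"
    return "cold"


for name, accesses in [("orders_today", 500), ("monthly_report", 10), ("logs_2015", 0)]:
    temperature = data_temperature(accesses)
    print(f"{name}: {temperature} -> {STORAGE_TIER[temperature]}")
```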
Key Considerations

Key factors:

• Compatibility
• Scalability
• Data Retrieval
• Performance Bottlenecks
• Understanding of Technology
• Query Capabilities
