Lecture 21-30 CC
The cloud data lifecycle refers to the stages data goes through during its time in the cloud.
Here's a breakdown of the common phases:
1. Data Ingestion: Uploading data to the cloud storage service. This can involve transferring
data from on-premises systems, applications, or user uploads.
2. Data Storage: Storing the data in a cloud storage solution based on its access frequency,
regulatory requirements, and cost considerations. Different cloud storage types offer
varying levels of performance, durability, and cost.
3. Data Processing: Analyzing, transforming, or manipulating the data in the cloud.
Processing can involve using cloud-based analytics services or big data frameworks.
4. Data Management: Organizing, securing, and controlling access to the data throughout its
lifecycle. This includes defining access control policies, data encryption, and version
control.
5. Data Archiving: Moving less frequently accessed data to a lower-cost storage tier for long-
term retention. Cloud providers offer archive storage options for data that doesn't require
immediate access.
6. Data Governance: Establishing policies and procedures for data management across the
lifecycle. This ensures data compliance with regulations and organizational needs.
7. Data Deletion: Securely erasing data that is no longer required. This includes following
data retention policies and regulations.
Understanding the data lifecycle helps you choose the right cloud storage and services for each
stage, optimizing cost, performance, and security.
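The lifecycle stages above often map onto storage tiers by access frequency. A minimal sketch of that idea, where the tier names follow common AWS classes but the threshold values and function name are illustrative assumptions, not official guidance:

```python
def choose_storage_tier(days_since_last_access: int) -> str:
    """Pick a storage tier based on how recently the data was accessed.
    Thresholds here are illustrative, not a provider recommendation."""
    if days_since_last_access <= 30:
        return "standard"            # hot data: frequent read/write
    if days_since_last_access <= 90:
        return "infrequent-access"   # warm data: cheaper, retrieval fee
    return "archive"                 # cold data: lowest cost, slow retrieval

print(choose_storage_tier(7))
print(choose_storage_tier(365))
```

In practice, cloud providers automate this kind of rule with lifecycle policies that transition objects between tiers after a configured number of days.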
Cloud providers offer several storage types suited to different workloads:
● Object Storage: Stores data as objects (files plus metadata) accessed by unique identifiers.
Ideal for large datasets, unstructured data, and frequent retrieval. (e.g., Amazon S3)
● Block Storage: Provides virtual disk volumes that function similarly to physical hard drives.
Suitable for applications requiring frequent read/write access, like databases. (e.g., Amazon
Elastic Block Store (EBS))
● File Storage: Offers a hierarchical file system structure familiar to traditional file servers.
Good for user data, collaboration, and application data that needs a file system structure.
(e.g., Amazon Elastic File System (EFS))
● Archive Storage: Low-cost, long-term storage for infrequently accessed data. Retrieval
times may be longer compared to other storage types. (e.g., Amazon Glacier)
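The four storage types above can be summarized as a simple lookup from workload to recommended type. The function name and workload labels below are hypothetical, chosen only to illustrate the mapping:

```python
def recommend_storage(workload: str) -> str:
    """Map a workload description to a cloud storage type (illustrative)."""
    mapping = {
        "unstructured objects": "object storage (e.g., Amazon S3)",
        "database volumes": "block storage (e.g., Amazon EBS)",
        "shared file system": "file storage (e.g., Amazon EFS)",
        "long-term archive": "archive storage (e.g., Amazon Glacier)",
    }
    # Object storage is a common default for data that fits no other category.
    return mapping.get(workload, "object storage (default)")

print(recommend_storage("database volumes"))
```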
Amazon Elastic Block Store (EBS) provides the following benefits:
● Scalability: You can easily scale storage capacity up or down by attaching additional EBS
volumes to your EC2 instances as needed.
● Durability: EBS volumes are automatically replicated within their Availability Zone to protect
against component failure and provide high data durability.
● Performance: EBS offers different volume types optimized for various performance needs,
allowing you to choose the right option for your workload (e.g., high-performance SSD vs.
cost-optimized HDD volumes).
● Flexibility: EBS volumes can be detached from one instance and attached to another,
providing flexibility for data migration and application deployment.
EBS is ideal for storing data that requires frequent read/write access, such as databases,
application data, and frequently accessed files.
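The detach/attach flexibility described above can be modeled with a toy class. This is not the boto3 API; the class, method names, and IDs are illustrative assumptions meant only to show how a volume's data travels with it between instances:

```python
class Volume:
    """Toy model of an EBS-style volume that can move between instances."""

    def __init__(self, volume_id: str):
        self.volume_id = volume_id
        self.attached_to = None  # instance ID, or None when detached

    def attach(self, instance_id: str) -> None:
        if self.attached_to is not None:
            raise RuntimeError("volume already attached; detach it first")
        self.attached_to = instance_id

    def detach(self) -> None:
        self.attached_to = None


vol = Volume("vol-1234")
vol.attach("i-app-server-1")
vol.detach()
vol.attach("i-app-server-2")  # the volume's data is now visible to the new instance
```

The real service enforces the same invariant: a volume attaches to one instance at a time, so migrating data means detaching from the old instance before attaching to the new one.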
Amazon Glacier offers several advantages for archival storage:
● Cost-effective: Glacier is ideal for long-term archiving of data that doesn't require immediate
access, significantly reducing storage costs compared to frequently accessed storage tiers.
● Scalability: Glacier scales seamlessly to accommodate massive amounts of archived data.
● Durability: Glacier offers high data durability, ensuring your archived data is safe and
retrievable over long periods.
Glacier is a good choice for backups, logs, medical records, and other data that needs to be
retained for compliance or historical purposes but doesn't require fast retrieval.
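The cost advantage of archive storage is easy to quantify. The per-GB prices below are assumed round numbers for illustration only, not current AWS pricing:

```python
def monthly_cost(gb: int, price_per_gb: float) -> float:
    """Storage cost for one month, ignoring retrieval and request fees."""
    return gb * price_per_gb

# Assumed illustrative prices in USD per GB-month (not actual AWS rates).
STANDARD_PRICE = 0.023
ARCHIVE_PRICE = 0.004

data_gb = 10_000  # e.g., 10 TB of compliance logs
savings = monthly_cost(data_gb, STANDARD_PRICE) - monthly_cost(data_gb, ARCHIVE_PRICE)
print(f"monthly savings: ${savings:.2f}")
```

Note that archive tiers trade this lower storage price for retrieval delays and per-retrieval fees, so the savings only hold for data that is rarely read back.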