Amazon S3 Data Protection
Prerequisites:
You will need to prepare the necessary resources for this workshop. These resources are in specific AWS Regions, so you will need permission to use us-east-1 and ap-south-1, as well as us-west-2. The commands below also assume the $account and $bucket_suffix variables are already set in your CloudShell session (see the account and bucket-suffix commands later in this guide).
Now create the S3 buckets for modules 3-5 in their respective Regions:
for buckets_in_uswest2 in {dp-workshop-module-3-source-bucket,dp-workshop-module-3-report,dp-workshop-module-4-us-west-2,dp-workshop-module-5-primary-bucket}
do
aws s3api create-bucket --region us-west-2 --bucket $buckets_in_uswest2-$bucket_suffix --create-bucket-configuration LocationConstraint=us-west-2
done
for buckets_in_useast1 in {dp-workshop-module-3-destination-bucket,dp-workshop-module-4-us-east-1,dp-workshop-module-5-secondary-bucket}
do
aws s3api create-bucket --region us-east-1 --bucket $buckets_in_useast1-$bucket_suffix
done
aws s3api create-bucket --region ap-south-1 --bucket dp-workshop-module-4-ap-south-1-$bucket_suffix --create-bucket-configuration LocationConstraint=ap-south-1
Create an S3 Multi-Region Access Point for module 4 with the following command. Note that the command captures the creation request token ARN in a shell variable.
mrapcreationrequestarn=$(aws s3control create-multi-region-access-point --region us-west-2 --account-id $account \
--details Name=mrap-1,Regions=[{Bucket=dp-workshop-module-4-us-east-1-$bucket_suffix},{Bucket=dp-workshop-module-4-us-west-2-$bucket_suffix},{Bucket=dp-workshop-module-4-ap-south-1-$bucket_suffix}] \
| jq -r '.RequestTokenARN')
aws s3control describe-multi-region-access-point-operation --region us-west-2 --account-id $account --request-token-arn $mrapcreationrequestarn #displays the status of the Multi-Region Access Point creation, using the account ID and request token variables
Module 1: S3 Versioning:
Versioning-enabled buckets can help you recover objects from accidental deletion or
overwrite. For example, if you delete an object, Amazon S3 inserts a delete marker
instead of removing the object permanently. The delete marker becomes the current
object version. If you overwrite an object, it results in a new object version in the
bucket. You can always restore the previous version.
To make running the commands in the workshop easier, assign the module 1 bucket name to a bash variable in the CloudShell session by running the following command:
module1_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-1"))') #assigns the bucket name to the bash variable
echo $module1_bucket #displays the value of the bash variable
echo "No versioning on this object" > unversioned.txt #creates a text file and writes
content to it
aws s3api put-object --bucket $module1_bucket --key unversioned.txt --body
unversioned.txt #puts the object(file) into the bucket
aws s3api get-bucket-versioning --bucket $module1_bucket #to check whether
bucket versioning is enabled or not.
Each object has a Version ID (VersionId), whether or not S3 Versioning is enabled. If
S3 Versioning is not enabled, Amazon S3 sets the value of the version ID to null.
aws s3api list-object-versions --bucket $module1_bucket #lists the objects in the bucket with their version IDs
aws s3api put-bucket-versioning --bucket $module1_bucket --versioning-configuration Status=Enabled #enables versioning on the bucket
Versioning is enabled at the bucket level and applies to all objects in the bucket.
aws s3api get-bucket-versioning --bucket $module1_bucket #checks the versioning status of the bucket
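The next put-object call assumes a local file named important.txt exists; a minimal way to create one (the content here is just an example) is:
echo "This is version 1 of important.txt" > important.txt #creates the file to upload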
aws s3api put-object --bucket $module1_bucket --key important.txt --body important.txt #uploads the file to the bucket now that versioning is enabled
The version ID value distinguishes that object from other versions with the same
key (name).
aws s3api list-objects-v2 --bucket $module1_bucket #displays only the current versions of the objects
aws s3api list-object-versions --bucket $module1_bucket #displays all the objects with their version IDs
aws s3api delete-object --bucket $module1_bucket --key important.txt #performs a simple delete, which adds a delete marker as the current version of the object
When versioning is enabled, a simple delete cannot permanently delete an object.
Instead, Amazon S3 inserts a delete marker in the bucket, and that marker becomes
the current version of the object with a new ID. Now when you perform a list API
call, you won’t see the file anymore as the delete marker is masking it.
aws s3api list-objects-v2 --bucket $module1_bucket #displays the objects in the bucket; the deleted object no longer appears
To restore a deleted object, you need to perform a version-specific delete on the delete marker, i.e. run the delete-object command and specify the version ID of the delete marker. This permanently deletes the delete marker and promotes the newest remaining version to be the current version.
aws s3api delete-object --bucket $module1_bucket --key important.txt --version-id "VERSION ID of delete marker" #deletes the delete marker with that particular version ID
aws s3api list-objects-v2 --bucket $module1_bucket #displays the restored file after the delete marker has been removed
To permanently delete an object version, run the delete-object command and specify the version ID of the version you wish to permanently delete. When a version-specific delete is made, no delete marker is created and that version of the object is permanently removed from the bucket.
You may wish to prevent the use of delete-object-version, to ensure versions are retained.
Keeping noncurrent versions of objects can increase your storage costs. You can use
lifecycle rules to manage the lifetime and the cost of storing multiple versions of
your objects. You can have lifecycle rules configured to permanently remove these
older versions.
You can also set up a lifecycle rule that archives all of your previous versions to the
lower-cost Amazon S3 Glacier Instant Retrieval or Amazon S3 Glacier Flexible
Retrieval storage classes and then deletes them after a specific number of days,
giving you a window to roll back any changes on your data while lowering your
storage costs.
If you wish to prevent the permanent deletion of object versions, you can configure a bucket policy that denies the use of delete-object-version, adding an extra layer of protection against object deletion.
echo '{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Deny delete-object-version",
"Effect": "Deny",
"Principal": {
"AWS": "*"
},
"Action": "s3:DeleteObjectVersion",
"Resource": "arn:aws:s3:::'$module1_bucket'/*"
}
]
}' > deny-deleteobjectversion-policy.json
aws s3api put-bucket-policy --bucket $module1_bucket --policy file://deny-deleteobjectversion-policy.json #applies the policy to the bucket created above
A versioning-enabled bucket can have many versions of the same object, one
current version and zero or more noncurrent (previous) versions. Using a lifecycle
policy, you can define actions specific to current and noncurrent object versions.
We are going to set up lifecycle rules applying to all objects with the "example/" prefix to:
- Move all noncurrent (previous) versions of your objects to the S3 Glacier Instant Retrieval storage class, if they are at least 400 KB in size;
- Delete noncurrent versions after they have been noncurrent for 91 days;
- Remove expired delete markers, i.e. objects where only a delete marker remains and there are no other versions.
Create a lifecycle configuration file using the command below. We will use this to
pass the configuration using the CLI.
echo '{
"Rules": [
{
"ID": "Lifecycle rule to transition noncurrent versions to GIR",
"Filter": {
"And": {
"Prefix": "example/",
"ObjectSizeGreaterThan": 409600
}
},
"Status": "Enabled",
"NoncurrentVersionTransitions": [{
"NoncurrentDays": 0,
"StorageClass": "GLACIER_IR"
}]
},
{
"ID": "Lifecycle rule to expire non-current versions after 91 days",
"Filter": {
"Prefix": "example/"
},
"Status": "Enabled",
"NoncurrentVersionExpiration": {
"NoncurrentDays": 91
},
"Expiration": {
"ExpiredObjectDeleteMarker": true
}
}
]
}' > lifecycle.json
Run the command below to add the lifecycle configuration to the bucket.
aws s3api put-bucket-lifecycle-configuration --bucket $module1_bucket --lifecycle-configuration file://lifecycle.json
Module 2: S3 Object Lock:
Creating a new bucket:
Create a unique suffix based on your account number; this will ensure the bucket name is unique.
bucket_suffix=$(aws sts get-caller-identity --query Account --output text | md5sum | awk '{print substr($1,length($1)-5)}') #creates a suffix for the bucket name
Next, create the bucket. Note that we are enabling Object Lock during the creation process with --object-lock-enabled-for-bucket. Enabling Object Lock also enables versioning.
aws s3api create-bucket --region $AWS_DEFAULT_REGION --bucket dp-workshop-module-2-$bucket_suffix \
--object-lock-enabled-for-bucket --create-bucket-configuration LocationConstraint=$AWS_DEFAULT_REGION
In governance mode, you can't overwrite or delete an object version or alter its lock settings unless you have the s3:BypassGovernanceRetention permission. The S3 console includes the x-amz-bypass-governance-retention:true header by default, so if you try to delete objects protected by governance mode and have the s3:BypassGovernanceRetention permission, the operation will succeed. For this reason, we are using compliance mode in this workshop.
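Although the retention commands themselves aren't shown here, a hedged sketch of applying a compliance-mode default retention to the module 2 bucket would look like the following (the 1-day period is illustrative; compliance-mode retention cannot be shortened or removed once set):
aws s3api put-object-lock-configuration --bucket dp-workshop-module-2-$bucket_suffix \
--object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":1}}}'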
Module 3: S3 Replication:
S3 Replication enables automatic, asynchronous copying of objects across Amazon
S3 buckets. You can replicate objects to a single destination bucket or to multiple
destination buckets. The destination buckets can be in different AWS Regions or
within the same Region as the source bucket.
Setting up the bash variables for the buckets (need to reset if the shell is restarted or disconnected):
module3_source_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-3-source-bucket"))') #source bucket
module3_dest_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-3-destination-bucket"))') #destination bucket
module3_report_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-3-report"))') #report bucket
We will also create some objects in the source bucket before configuring replication.
echo "prestaged-object-1" > prestaged-object-1.txt
echo "prestaged-object-2" > prestaged-object-2.txt
echo "prestaged-object-3" > prestaged-object-3.txt
echo "prestaged-object-4" > prestaged-object-4.txt
aws s3api put-object --bucket $module3_source_bucket --key prestaged-object-1.txt --body prestaged-object-1.txt
aws s3api put-object --bucket $module3_source_bucket --key prestaged-object-2.txt --body prestaged-object-2.txt
aws s3api put-object --bucket $module3_source_bucket --key prestaged-object-3.txt --body prestaged-object-3.txt
aws s3api put-object --bucket $module3_source_bucket --key prestaged-object-4.txt --body prestaged-object-4.txt
Before creating a replication rule, we need to create an IAM role that the S3 service
will leverage to replicate objects from the source to the destination bucket.
(a) First create a trust policy which grants the Amazon S3 service principal permission to assume the role. The command below will create a file named s3-role-trust-policy.json that we can use when calling the create-role command in a later step.
echo '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "s3.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}' > s3-role-trust-policy.json
(b) Next create the role and attach the trust policy that allows S3 to assume this
role.
aws iam create-role --role-name dp-workshop-replicationRole --assume-role-policy-document file://s3-role-trust-policy.json
(c) After creating the role, we must attach a permissions policy to it. The command below will create a file named s3-role-permissions-policy.json. This policy grants permissions for the Amazon S3 bucket and object actions needed to replicate objects between buckets in the same account.
echo '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObjectVersionForReplication",
"s3:GetObjectVersionAcl",
"s3:GetObjectVersionTagging"
],
"Resource": [
"arn:aws:s3:::'$module3_source_bucket'/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetReplicationConfiguration"
],
"Resource": [
"arn:aws:s3:::'$module3_source_bucket'"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ReplicateObject",
"s3:ReplicateDelete",
"s3:ReplicateTags"
],
"Resource": "arn:aws:s3:::'$module3_dest_bucket'/*"
}
]
}' > s3-role-permissions-policy.json
(d) Apply the permissions policy to the role, to specify what it is allowed to do.
aws iam put-role-policy --role-name dp-workshop-replicationRole --policy-name dp-workshop-replicationRolePolicy \
--policy-document file://s3-role-permissions-policy.json
(e) Finally, we need the ARN of the IAM role, so let's capture it in a variable with the following command:
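A minimal sketch of that capture (the $replication_role_arn variable name is an assumption):
replication_role_arn=$(aws iam get-role --role-name dp-workshop-replicationRole | jq -r '.Role.Arn') #captures the replication role ARN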
(a) Create the replication configuration. In the Destination section of the replication rule, we set the destination bucket. Then we specify the storage class we want to use in the destination bucket; this can be different from the storage class used in the source bucket. In this instance we've chosen GLACIER_IR (Glacier Instant Retrieval), which is a cost-effective storage class for objects that are accessed less than once a quarter and require immediate access. We also enable Replication Time Control (RTC). S3 RTC replicates most objects that you upload to Amazon S3 in seconds, and 99.99 percent of those objects within 15 minutes.
The next property we've configured is delete marker replication. This means that any object replaced with a delete marker in the source bucket will cause a delete marker to be placed on its replica in the destination bucket. A sketch of such a configuration file is shown below.
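A hedged sketch of what that replication configuration file could look like, assuming the $replication_role_arn and $module3_dest_bucket variables set earlier (the rule ID and file name are illustrative):
echo '{
"Role": "'$replication_role_arn'",
"Rules": [
{
"ID": "dp-workshop-replication-rule",
"Priority": 1,
"Status": "Enabled",
"Filter": {"Prefix": ""},
"DeleteMarkerReplication": {"Status": "Enabled"},
"Destination": {
"Bucket": "arn:aws:s3:::'$module3_dest_bucket'",
"StorageClass": "GLACIER_IR",
"ReplicationTime": {"Status": "Enabled", "Time": {"Minutes": 15}},
"Metrics": {"Status": "Enabled", "EventThreshold": {"Minutes": 15}}
}
}
]
}' > replication-config.json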
(b) Now we are going to add the replication configuration rule to the source bucket.
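Assuming the configuration file above was saved as replication-config.json, the rule can be applied with:
aws s3api put-bucket-replication --bucket $module3_source_bucket --replication-configuration file://replication-config.json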
(c) Once the rule is applied, any objects created from now on will be replicated to the destination bucket. However, objects that already existed in the source bucket before the rule was created are not replicated.
In this section we will create an IAM role which will be used by S3 Batch Replication
to replicate existing objects to the destination bucket.
(a) First let's create a trust policy that grants the Amazon S3 Batch Operations
principal permissions to assume the role, by running the following command,
which will create a file named batch-ops-role-trust-policy.json:
echo '{
"Version":"2012-10-17",
"Statement":[
{
"Effect":"Allow",
"Principal":{
"Service":"batchoperations.s3.amazonaws.com"
},
"Action":"sts:AssumeRole"
}
]
}' > batch-ops-role-trust-policy.json
(b) Next create the role and attach the trust policy that allows S3 batch
operations to assume this role.
aws iam create-role --role-name dp-workshop-batchopsRole --assume-role-policy-document file://batch-ops-role-trust-policy.json
(c) After creating a role, we must attach a permissions policy to the role. Run the
command below to create a permissions policy and save it to a file
named batchops-role-permissions-policy.json. This policy grants
permissions for S3 batch operations to initiate replication on the existing
objects in the S3 bucket.
echo '{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:InitiateReplication"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::'$module3_source_bucket'/*"
]
},
{
"Action": [
"s3:GetReplicationConfiguration",
"s3:PutInventoryConfiguration"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::'$module3_source_bucket'"
]
},
{
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Effect": "Allow",
"Resource": [
"arn:aws:s3:::'$module3_report_bucket'/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:PutObject"
],
"Resource": [
"arn:aws:s3:::'$module3_report_bucket'*"
]
}
]
}' > batchops-role-permissions-policy.json
(d) Apply the permissions policy (in this example an inline policy) to the role, to specify what it is allowed to do. A successful put-role-policy call returns no output.
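A sketch of that call (the inline policy name is an assumption):
aws iam put-role-policy --role-name dp-workshop-batchopsRole --policy-name dp-workshop-batchopsRolePolicy \
--policy-document file://batchops-role-permissions-policy.json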
(e) Let's capture the batch role ARN in a variable to use in the next part of the lab, where we will create the batch job to replicate existing objects.
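A minimal sketch of that capture (the $batch_role_arn variable name is an assumption):
batch_role_arn=$(aws iam get-role --role-name dp-workshop-batchopsRole | jq -r '.Role.Arn') #captures the batch operations role ARN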
Before creating the job, we need to capture the AWS account number using
the following command:
account_id=$(aws sts get-caller-identity | jq -r '.Account')
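A hedged sketch of the Batch Replication job creation, where the job generates its own manifest of existing objects and writes a completion report to the report bucket (the priority, report prefix, and filter settings are assumptions):
job_id=$(aws s3control create-job --account-id $account_id --region us-west-2 \
--operation '{"S3ReplicateObject":{}}' \
--report '{"Bucket":"arn:aws:s3:::'$module3_report_bucket'","Prefix":"batch-replication-report","Format":"Report_CSV_20180820","Enabled":true,"ReportScope":"AllTasks"}' \
--manifest-generator '{"S3JobManifestGenerator":{"SourceBucket":"arn:aws:s3:::'$module3_source_bucket'","EnableManifestOutput":false,"Filter":{"EligibleForReplication":true}}}' \
--priority 1 --role-arn $batch_role_arn --no-confirmation-required \
| jq -r '.JobId')
echo $job_id #displays the job ID to use with describe-job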
To obtain the status of the job, run the describe-job command with the job ID output by the previous command. The output shows the status of the job, the number of tasks that succeeded, and the number of tasks that failed, among other details.
aws s3control describe-job --account-id $account_id --job-id "Job-Id"
Earlier in the workshop we enabled delete marker replication when configuring the
replication rule. This means that when an object is deleted in the source bucket and
a delete marker created on top of it, the delete marker will also be replicated to the
destination bucket.
aws s3api delete-object --bucket $module3_source_bucket --key test.txt
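To confirm the delete marker was replicated (this assumes a test.txt object had previously been uploaded to the source bucket), you can list the versions of that key in the destination bucket and look for a DeleteMarkers entry:
aws s3api list-object-versions --bucket $module3_dest_bucket --prefix test.txt --region us-east-1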
Module 4: S3 Multi-Region Access Points:
Write S3 objects
Set variables for your account ID, the bucket names, and your S3 Multi-Region Access Point's ARN with the following commands:
account=$(aws sts get-caller-identity | jq -r '.Account')
ap_south_1_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-4-ap-south-1"))')
us_east_1_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-4-us-east-1"))')
us_west_2_bucket=$(aws s3api list-buckets --query "Buckets[].Name" | jq -r '.[]|select(. | startswith("dp-workshop-module-4-us-west-2"))')
mrap_alias=$(aws s3control list-multi-region-access-points --account-id $account | jq -r '.AccessPoints[0] | "\(.Alias)"')
mrap_arn=arn:aws:s3::$account:accesspoint/$mrap_alias
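Before writing through the Multi-Region Access Point, create a small test file and (as the head-object checks below assume) write one object directly to each regional bucket; the file name and content here are illustrative:
echo "Module 4 test content" > module4.txt
aws s3api put-object --bucket $ap_south_1_bucket --key written-to-ap-south-1.txt --body module4.txt --region ap-south-1
aws s3api put-object --bucket $us_east_1_bucket --key written-to-us-east-1.txt --body module4.txt --region us-east-1
aws s3api put-object --bucket $us_west_2_bucket --key written-to-us-west-2.txt --body module4.txt --region us-west-2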
Copy and paste the following line into CloudShell to write an object to your Multi-Region Access Point:
aws s3api put-object --bucket $mrap_arn --key written-to-multi-region-access-point.txt --body module4.txt
aws s3api list-objects-v2 --bucket $mrap_arn #displays the objects via the Multi-Region Access Point
Let's check the metadata of each object by running some s3api head-object commands:
aws s3api head-object --bucket $mrap_arn --key written-to-multi-region-access-point.txt
aws s3api head-object --bucket $mrap_arn --key written-to-ap-south-1.txt
aws s3api head-object --bucket $mrap_arn --key written-to-us-east-1.txt
aws s3api head-object --bucket $mrap_arn --key written-to-us-west-2.txt
Note that two of the objects written directly to S3 buckets are replicas ("ReplicationStatus": "REPLICA"). The object written to the us-west-2 bucket is not a replica, and neither is the one written to the Multi-Region Access Point ARN. This is because the S3 bucket in us-west-2 is closest to the CloudShell instance, which also runs in us-west-2 (funnily enough!), so that's where our requests are being routed.
Now you will use S3 Multi-Region Access Point failover controls to set the S3 bucket in us-west-2 to passive, using the AWS Console.
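If you prefer the CLI to the console, a hedged sketch of the equivalent failover call (the routing control APIs are served from a limited set of Regions, so the --region value here is an assumption) is:
aws s3control submit-multi-region-access-point-routes --region us-west-2 --account-id $account \
--mrap $mrap_arn --route-updates Bucket=$us_west_2_bucket,TrafficDialPercentage=0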
If you run the head-object commands again, note that now only the object written to the us-east-1 bucket is not a replica; the other objects show "ReplicationStatus": "REPLICA".
This tells us that, with us-west-2 set to passive and not available for routing, our nearest active AWS Region is us-east-1.