Centralizing and automating data protection helps you support your business continuity and regulatory compliance goals. Centralized data protection and enhanced visibility across backup operations can reduce the risks of disasters, improve business continuity, and simplify the auditing process. Many organizations have requirements to retain backups of their compute instances for a certain time based on compliance requirements, and in fulfilling these requirements, the same organizations seek cost-effective archival storage of their backups.
With AWS Backup, a fully managed and centralized backup service, you can centralize data protection for your application data stored in AWS services for compute, storage, and databases and meet your business continuity goals. An AWS Backup recovery point for an Amazon EC2 instance is represented by an Amazon Machine Image (AMI). When you back up an Amazon EC2 instance, AWS Backup protects all Amazon EBS volumes attached to the instance as an AMI. The AMI retains information about block device mapping that specifies the volumes to attach to the instance when it’s launched.
In this post, we explain the customer use case that gave us the opportunity to develop this solution. The customer runs approximately 500 Amazon EC2 instances across multiple accounts to support critical business applications, and uses AWS Backup to manage and govern the EC2 instance backups. The customer’s organization has HIPAA and SOC 2 compliance requirements to maintain protected backups of each EC2 instance for up to 6 months. To align with compliance, the customer decided to archive backups over 30 days old to the lowest-cost archive storage that still provides high performance and instant retrieval, so that they can meet a low recovery time objective (RTO). Working with the customer, we identified the Amazon S3 Glacier Instant Retrieval storage class as suitable for this scenario: it delivers the lowest-cost storage for long-lived data that is rarely accessed and requires retrieval in milliseconds.
In this blog post (the first in a two-part series), we walk through the solution that we provided to our customer to archive old EC2 backup recovery points (AMIs) to the Amazon S3 Glacier Instant Retrieval storage class. In part 2, we discuss restoring archived backups from the Amazon S3 Glacier storage classes.
This solution uses an event-driven architecture built using the following services: AWS Backup, Amazon S3, Amazon EventBridge, AWS Step Functions, AWS Systems Manager Parameter Store, AWS Lambda, AWS IAM, and Amazon SQS. The solution has a workflow that retrieves the EC2 backup recovery points from AWS Backup, copies the backup recovery points to an S3 bucket, immediately moves the objects to an Amazon S3 Glacier storage class of your choice, and deletes the backup recovery points from AWS Backup. The solution uses AWS Systems Manager Parameter Store to store the values that you provide when deploying the solution. These values are retrieved from Parameter Store at execution time.
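To make the Parameter Store lookup concrete, here is a minimal Python sketch of how a Lambda function might read the deployment values at execution time. It is not the solution's actual code: the function name `load_parameters` and the idea that all values live under one path prefix are our assumptions for illustration. The `ssm_client` argument is expected to be a Boto3 Systems Manager client.

```python
from typing import Any, Dict


def load_parameters(ssm_client: Any, prefix: str) -> Dict[str, str]:
    """Return {short_name: value} for every parameter stored under `prefix`.

    `prefix` is a hypothetical path; the real solution may name its
    parameters differently.
    """
    params: Dict[str, str] = {}
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=prefix, Recursive=True, WithDecryption=True):
        for param in page.get("Parameters", []):
            # Keep only the last path segment, for example "BackupVaultName".
            short_name = param["Name"].rsplit("/", 1)[-1]
            params[short_name] = param["Value"]
    return params
```

In a Lambda function you would call this as `load_parameters(boto3.client("ssm"), "/your/prefix")`, with the prefix replaced by whatever path your deployment actually uses.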
The following diagram illustrates the solution architecture:
Figure 1: Solution architecture for archiving Amazon EC2 backup recovery points in AWS Backup to Amazon S3 Glacier Storage Classes
1. An Amazon EventBridge rule is configured to run on a schedule based on the cron expression provided in the parameters.
2. An AWS Step Functions state machine is configured as the target of the Amazon EventBridge rule. The state machine definition performs a series of steps, which include invoking multiple Lambda functions.
3. The archival process begins with:
a. Retrieving the list of EC2 backup recovery points (AMIs) created more than ‘X’ days ago from the backup vault in AWS Backup. The list contains one or more recovery points, each with the Amazon Resource Name (ARN) of the AMI.
b. Validating that only the recovery points in Completed status are retrieved.
c. If no recovery points are returned for a particular date, the state machine execution completes without processing any recovery points.
4. A CreateStoreImageTask gets initiated to store an AMI as a single object in an Amazon S3 bucket.
a. This is an asynchronous task that copies the AMI to an S3 bucket as a compressed object file.
b. The AMI object is named with AMI ID followed by bin extension, for example, ami-xxxxxxxxxxxxxxxxx.bin
c. The path of the object is recovery-points/account_id/aws_region/ec2_instance_id/year/month/day/ami-xxxxxxxxxxxxxxxxx.bin where the year, month, and day is the actual date when the AMI was first created by AWS Backup.
d. The object stored in S3 from this process is by default in the S3 Standard storage class.
e. The workflow checks the status of the CreateStoreImageTask repeatedly until the task is completed.
5. An S3 copy task using Boto3 gets triggered to change the object’s storage class from the S3 Standard storage class to an Amazon S3 Glacier storage class of your choice. This Boto3 action will perform multipart copy in multiple threads if necessary. (Note – This solution does not use Amazon S3 Lifecycle to transition the objects to Amazon S3 Glacier storage).
a. Currently supported S3 Glacier storage classes are GLACIER, DEEP_ARCHIVE and GLACIER_IR
b. This process will validate the object’s storage class change, and it will wait until the change is completed.
6. Once the EC2 backup recovery point (AMI) is transitioned to S3 Glacier storage, the recovery point will be deleted from the backup vault in AWS Backup.
a. Execution status of the recovery point is published as a message to an SQS queue.
b. Repeat Step 4 to Step 6a until all the EC2 recovery points are processed.
7. In this final step, a consolidated report is generated.
a. A process gets triggered to generate a log file containing the consolidated status report of each recovery point from the SQS queue.
b. The log file is placed in a different folder in the same S3 bucket in which the recovery point objects are stored.
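The selection rule in step 3 and the object path in step 4c can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the solution's code: the function names are ours, the input dictionaries mimic the shape of the recovery points that AWS Backup returns (`RecoveryPointArn`, `ResourceArn`, `Status`, `CreationDate`), and we assume the month and day in the path are zero-padded.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def select_recovery_points(points: List[Dict], days: int, now: datetime) -> List[Dict]:
    """Keep recovery points that are COMPLETED and were created more than `days` days ago."""
    cutoff = now - timedelta(days=days)
    return [
        p for p in points
        if p["Status"] == "COMPLETED" and p["CreationDate"] < cutoff
    ]


def build_object_key(point: Dict) -> str:
    """Build recovery-points/account_id/aws_region/ec2_instance_id/year/month/day/ami-id.bin."""
    # Recovery point ARN of an EC2 backup: arn:aws:ec2:<region>::image/<ami-id>
    ami_id = point["RecoveryPointArn"].rsplit("/", 1)[-1]
    # Resource ARN: arn:aws:ec2:<region>:<account-id>:instance/<instance-id>
    _, _, _, region, account_id, resource = point["ResourceArn"].split(":", 5)
    instance_id = resource.rsplit("/", 1)[-1]
    created = point["CreationDate"]
    # Zero-padded date segments are an assumption of this sketch.
    return (
        f"recovery-points/{account_id}/{region}/{instance_id}/"
        f"{created:%Y/%m/%d}/{ami_id}.bin"
    )
```

Keeping these two functions free of AWS calls makes the filtering and naming rules easy to unit test before they are wired into a Lambda function.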
The following diagram illustrates the AWS Step Functions workflow.
Figure 2: AWS Step Functions state machine definition showing each step involved in the archival process
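The store and transition steps of the workflow (steps 4 and 5) could look roughly like the following Python sketch. It uses the EC2 `create_store_image_task` and `describe_store_image_tasks` calls and the Boto3 managed `copy` transfer, which performs a multipart copy when needed. Several things here are our assumptions rather than the solution's confirmed behavior: the function names, the blocking polling loop (a state machine would more likely use Wait states between checks), and the choice to copy the object to its final key and then delete the original. The `ec2` and `s3` arguments are expected to be Boto3 clients.

```python
import time
from typing import Any


def store_ami_and_wait(ec2: Any, ami_id: str, bucket: str, poll_seconds: float = 30.0) -> str:
    """Start a store image task, wait until it finishes, and return the object key."""
    key = ec2.create_store_image_task(ImageId=ami_id, Bucket=bucket)["ObjectKey"]
    while True:
        task = ec2.describe_store_image_tasks(ImageIds=[ami_id])["StoreImageTaskResults"][0]
        state = task["StoreTaskState"]
        if state == "Completed":
            return key
        if state == "Failed":
            raise RuntimeError(task.get("StoreTaskFailureReason", "store image task failed"))
        time.sleep(poll_seconds)


def archive_object(s3: Any, bucket: str, source_key: str, target_key: str,
                   storage_class: str) -> None:
    """Copy the object to `target_key` in the archive storage class and verify the result."""
    s3.copy(
        CopySource={"Bucket": bucket, "Key": source_key},
        Bucket=bucket,
        Key=target_key,
        ExtraArgs={"StorageClass": storage_class},
    )
    head = s3.head_object(Bucket=bucket, Key=target_key)
    # head_object omits StorageClass for objects in S3 Standard.
    actual = head.get("StorageClass", "STANDARD")
    if actual != storage_class:
        raise RuntimeError(f"expected {storage_class}, found {actual}")
    # Assumption of this sketch: remove the original once the archived copy is verified.
    if target_key != source_key:
        s3.delete_object(Bucket=bucket, Key=source_key)
```

Verifying the storage class before deleting anything mirrors the order described above: the recovery point is removed from AWS Backup only after the archived copy is confirmed.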
In this section, we cover this solution’s prerequisites and instructions to deploy the solution using AWS CloudFormation.
Assuming that you are currently leveraging AWS Backup to manage backups for your Amazon EC2 instances, the following are the prerequisites required to deploy the solution:
- Name of the AWS Backup vault that is currently in use.
- Verify that your backup vault’s access policy doesn’t block performing the
- Make sure you have a few EC2 recovery points in AWS Backup to test the solution.
- IAM permissions to create AWS CloudFormation stack.
Deploy the solution
We created an AWS CloudFormation template that you can launch to deploy the entire solution within minutes in your desired AWS Region. This template creates the following resources in your account:
- Amazon S3 bucket to store the backup recovery points.
- AWS Systems Manager Parameter Store to store the parameter values.
- AWS IAM Role for the AWS Lambda functions to execute, and an AWS IAM Role for the AWS Step Functions state machine to execute.
- AWS Lambda functions to perform the tasks involved.
- AWS Step Functions state machine with definition.
- Amazon CloudWatch log group to capture the state machine logs.
- Amazon EventBridge rule to trigger the workflow on a schedule.
- Amazon SQS standard queue with a message retention period of 2 days to record the status of each execution.
Create an AWS CloudFormation stack in the account and Region where the EC2 recovery points exist.
1. Launch the stack here. This opens the AWS CloudFormation console in your AWS account with the template preloaded. If you’re not logged in to your AWS account, you will be prompted to sign in.
2. Select the Region where you want to deploy this solution.
Figure 3: Create the CloudFormation stack using the quick-create link
3. You are required to provide a Stack name and the following parameters for the solution to work, then select Next.
a. ArchivalBucketName – A unique name for the S3 bucket to archive the EC2 backup recovery points.
b. ArchiveStorageClass – Type of the S3 Glacier storage class. Allowed: GLACIER_IR, GLACIER, DEEP_ARCHIVE
c. BackupVaultName – Name of the backup vault in AWS Backup that is currently in use.
d. DaysRecoveryPointsCreatedBefore – The age threshold, in days, for the EC2 backup recovery points to retrieve. For example, to get the list of recovery points created more than 5 days before today, set the parameter value to 5.
e. ScheduleWindow – A cron expression to determine the schedule for the workflow to run automatically.
Figure 4: Configuring stack name and parameters for the CloudFormation stack
4. On the Configure stack options page, select Next.
5. On the Review page, make sure you select the check box I acknowledge that AWS CloudFormation might create IAM resources. When you’re ready, select Create stack.
Figure 5: Creating the CloudFormation stack
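If you prefer to script the deployment, the parameters from step 3 can be checked and converted into the list format that the CloudFormation API expects. The sketch below is a hedged illustration: the function name `to_stack_parameters` is ours, the checks are only the ones implied by this post plus general S3 bucket naming rules, and we do not validate the cron expression because this post does not specify the exact format that the template expects.

```python
import re
from typing import Dict, List

ALLOWED_STORAGE_CLASSES = {"GLACIER_IR", "GLACIER", "DEEP_ARCHIVE"}
REQUIRED = [
    "ArchivalBucketName",
    "ArchiveStorageClass",
    "BackupVaultName",
    "DaysRecoveryPointsCreatedBefore",
    "ScheduleWindow",
]


def to_stack_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Validate the stack inputs and return them in CloudFormation's Parameters format."""
    missing = [name for name in REQUIRED if not values.get(name)]
    if missing:
        raise ValueError(f"missing parameters: {', '.join(missing)}")
    if values["ArchiveStorageClass"] not in ALLOWED_STORAGE_CLASSES:
        raise ValueError("ArchiveStorageClass must be GLACIER_IR, GLACIER, or DEEP_ARCHIVE")
    if not values["DaysRecoveryPointsCreatedBefore"].isdigit():
        raise ValueError("DaysRecoveryPointsCreatedBefore must be a whole number of days")
    # General S3 naming rule: 3-63 characters, lowercase letters, digits, dots, hyphens.
    if not re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", values["ArchivalBucketName"]):
        raise ValueError("ArchivalBucketName is not a valid S3 bucket name")
    return [{"ParameterKey": name, "ParameterValue": values[name]} for name in REQUIRED]
```

The returned list can be passed as the `Parameters` argument of the Boto3 CloudFormation `create_stack` call, together with the template location and `Capabilities=["CAPABILITY_IAM"]` or `["CAPABILITY_NAMED_IAM"]`, depending on how the template defines its IAM roles.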
Test the solution
We deployed this solution with a scheduled EventBridge rule that uses the cron expression provided in the input parameters, so the workflow is triggered automatically at the scheduled time. You can also trigger the solution manually by navigating to Step Functions in the AWS Management Console, selecting the state machine that was just deployed, and selecting Start execution. The execution time of the workflow depends on the number and size of the recovery points retrieved.
Figure 6: Manually start an execution of Step Functions state machine
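The same manual start can be done programmatically. The following Python sketch starts an execution and waits for a terminal status using the Step Functions `start_execution` and `describe_execution` calls. The function names are ours, and we assume the state machine needs no input because it reads its settings from Parameter Store; adjust the input if your deployment differs. The `sfn` argument is expected to be a Boto3 Step Functions client.

```python
import json
import time
from datetime import datetime, timezone
from typing import Any


def start_archival(sfn: Any, state_machine_arn: str) -> str:
    """Start one execution of the archival state machine and return its ARN."""
    name = "manual-" + datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=name,
        input=json.dumps({}),  # assumption: no input is required
    )
    return response["executionArn"]


def wait_for_execution(sfn: Any, execution_arn: str, poll_seconds: float = 30.0) -> str:
    """Poll until the execution leaves the RUNNING state and return its final status."""
    while True:
        status = sfn.describe_execution(executionArn=execution_arn)["status"]
        if status != "RUNNING":
            return status
        time.sleep(poll_seconds)
```

Because the run time depends on the number and size of the recovery points, choose a polling interval that suits your workload.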
Note – This solution is configured to process one EC2 backup recovery point at a time. It can be modified to process multiple recovery points in parallel by changing the MaxConcurrency property of the Map task in the state machine definition, as explained below. Before making this change, we recommend that you review the AMI store and restore service limits documentation, in particular the maximum size of an AMI that can be stored in S3 using the CreateStoreImageTask API and the maximum total size of in-progress store image tasks.
Steps to change the MaxConcurrency property of the Map Task:
- Navigate to AWS Step Functions console, then go to State machines and edit the state machine.
Figure 7: Modifying Step Functions state machine definition
- Select the Workflow Studio button on the top right.
- Select the map task in the canvas, and change the Maximum concurrency value in the form on the right side. Then select the Apply and exit button.
Figure 8: Modifying the Maximum concurrency property of the state machine
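The same change can be made to the state machine definition directly. The sketch below takes a definition that has already been parsed from JSON and sets `MaxConcurrency` on its Map states. The function name is ours, and for simplicity it only looks at top-level states, so a Map state nested inside another state would not be touched.

```python
import copy
from typing import Any, Dict


def set_map_concurrency(definition: Dict[str, Any], max_concurrency: int) -> Dict[str, Any]:
    """Return a copy of the definition with MaxConcurrency set on top-level Map states."""
    if max_concurrency < 0:
        raise ValueError("max_concurrency must be 0 (no limit) or a positive integer")
    updated = copy.deepcopy(definition)
    for state in updated.get("States", {}).values():
        if state.get("Type") == "Map":
            state["MaxConcurrency"] = max_concurrency
    return updated
```

With Boto3, you could read the current definition with `describe_state_machine`, parse its `definition` string with `json.loads`, apply this function, and write the result back with `update_state_machine`. Keep the AMI store and restore limits mentioned above in mind when raising the value.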
Validate the solution
To validate the solution, navigate to the Amazon S3 console, and browse the S3 bucket that is created through this solution. You will be able to view the archived recovery points in the path recovery-points/account_id/aws_region/ec2_instance_id/year/month/day/ami-xxxxxxxxxxxxxxxxx.bin. The year/month/day in the object’s path describes the actual time a particular recovery point was created by AWS Backup.
Figure 9: Sample data showing archived EC2 backup recovery points in S3 Glacier storage
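As an alternative to browsing the console, you can count the archived objects by storage class. This is a small sketch, assuming the `recovery-points/` prefix and `.bin` suffix described above; the function name is ours, and the `s3` argument is expected to be a Boto3 S3 client.

```python
from collections import Counter
from typing import Any, Dict


def count_by_storage_class(s3: Any, bucket: str, prefix: str = "recovery-points/") -> Dict[str, int]:
    """Count archived AMI objects under `prefix`, grouped by storage class."""
    counts: Counter = Counter()
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".bin"):
                counts[obj.get("StorageClass", "STANDARD")] += 1
    return dict(counts)
```

If every object has been transitioned, the result contains only the storage class that you selected at deployment, for example GLACIER_IR.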
After all of the recovery points are processed in a single execution of the state machine, a log file will be placed in the same S3 bucket that is created through this solution. You will be able to download the log file located in the path logs/ with the latest timestamp.
Figure 10: Download the log file from the S3 bucket
Figure 11: Sample log file containing the status of each recovery point in a state machine execution
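To fetch the most recent report without opening the console, you can pick the newest object under the `logs/` prefix. The sketch below selects it by the object's `LastModified` time rather than by parsing the file name, because this post does not describe the file naming format. The function name is ours, and the `s3` argument is expected to be a Boto3 S3 client.

```python
from typing import Any, Optional


def latest_log_key(s3: Any, bucket: str, prefix: str = "logs/") -> Optional[str]:
    """Return the key of the most recently modified object under `prefix`, if any."""
    latest = None
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return latest["Key"] if latest else None
```

The returned key can then be passed to the Boto3 `download_file` call to save the report locally.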
To delete the resources that were created through this solution, navigate to the CloudFormation console, select the stack, and delete it. All of the resources are deleted except for the S3 bucket, which is retained because it now holds the archived data of your EC2 backup recovery points. You can validate and delete the objects and the bucket manually if you no longer need the recovery points. Note that objects stored in S3 Glacier Instant Retrieval and S3 Glacier Flexible Retrieval are charged for a minimum storage duration of 90 days, and objects stored in S3 Glacier Deep Archive have a minimum storage duration of 180 days.
Figure 12: Deleting a CloudFormation stack
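The minimum storage durations mentioned above are easy to turn into a date check before you delete archived objects. This sketch maps each supported storage class to its minimum duration and returns the first date on which the minimum has been met. The function name is ours, and the date arithmetic is a simple whole-day approximation.

```python
from datetime import date, timedelta

# Minimum storage duration in days, as stated above.
MINIMUM_STORAGE_DAYS = {"GLACIER_IR": 90, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def minimum_duration_met_on(archived_on: date, storage_class: str) -> date:
    """Return the date on which the minimum storage duration has elapsed."""
    return archived_on + timedelta(days=MINIMUM_STORAGE_DAYS[storage_class])
```

For example, an object archived to S3 Glacier Instant Retrieval on January 1, 2024 reaches its 90-day minimum on March 31, 2024.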
In this post, we started with a brief recap of a customer scenario that led us to design a solution to further optimize costs. The solution uses a workflow to copy EC2 backup recovery points from AWS Backup to an Amazon S3 bucket. We then showed how the solution immediately changes the storage class of each recovery point object from S3 Standard to one of the S3 Glacier storage classes, and how it finally deletes the recovery point from AWS Backup. We tested the solution with EC2 instances of 100 GB, 500 GB, and 1 TB in size, loaded with sample data. Lastly, we provided you with a CloudFormation template to deploy the solution.
In part 2 of this blog, we show you a workflow to retrieve and restore the archived EC2 backup recovery points from S3 Glacier storage.
Thanks for reading this blog post! Don’t hesitate to leave your feedback or comments in the comments section.