
Data retention and classification in AWS

Overview of data retention and classification features and services in AWS

Data retention and classification in AWS (Amazon Web Services) involve organizing and managing an organization's data according to specific criteria within the AWS ecosystem. 


These practices are essential for determining the protection, access, and retention of data over time, ensuring data security, maintaining compliance with legal and regulatory requirements, and facilitating efficient data management.



Data retention


Data retention in organizations involves defining the duration for storing data, guided by a clear policy that aligns with both regulatory mandates and business requirements. 


Retention depends on classification: identifying and categorizing data within AWS services according to its sensitivity, importance, and usage tells you which retention period applies to which data. Knowing what is stored in turn improves data management, security, and compliance.


AWS provides a range of services for effective data retention management:


Amazon S3 Lifecycle Policies


Automate the transition of objects from S3 Standard to lower-cost storage classes such as S3 Standard-Infrequent Access or S3 Glacier, and automatically delete (expire) objects after a set period.
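As a minimal sketch of what such a policy can look like, the snippet below builds a lifecycle configuration as a plain Python dictionary. The prefix `logs/`, the rule ID, and the day counts are illustrative assumptions, not recommendations.

```python
import json


def build_lifecycle_rule(prefix, ia_after_days, glacier_after_days, expire_after_days):
    """Return one S3 lifecycle rule as a plain dict (all values are illustrative)."""
    if not (ia_after_days < glacier_after_days < expire_after_days):
        raise ValueError("expected ia_after_days < glacier_after_days < expire_after_days")
    return {
        "ID": f"retain-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move objects to cheaper storage classes as they age.
        "Transitions": [
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
        # Delete objects once the retention period has passed.
        "Expiration": {"Days": expire_after_days},
    }


# Hypothetical policy: logs move to Standard-IA after 30 days,
# to Glacier after 90 days, and are deleted after one year.
lifecycle_configuration = {"Rules": [build_lifecycle_rule("logs/", 30, 90, 365)]}
print(json.dumps(lifecycle_configuration, indent=2))
```

With boto3, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call, together with a `Bucket` name.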



Amazon S3 Intelligent-Tiering


Automatically moves objects between access tiers based on usage patterns, optimizing costs by shifting data that has not been accessed recently into lower-cost tiers while keeping frequently accessed data readily available.



Amazon EBS Snapshots and Amazon Data Lifecycle Manager


Manage EBS snapshots with policies for automatic creation and deletion after a defined retention period.
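The idea of an age-based retention period can be illustrated with a few lines of Python. This is only a sketch of the selection logic, not how Amazon Data Lifecycle Manager is implemented; the snapshot IDs, dates, and the 14-day period are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshots(snapshots, retention_days, now):
    """Return the IDs of snapshots created before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [snap["id"] for snap in snapshots if snap["created"] < cutoff]


# Hypothetical snapshot inventory with fixed dates so the result is reproducible.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
snapshots = [
    {"id": "snap-hypothetical-1", "created": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"id": "snap-hypothetical-2", "created": datetime(2024, 6, 20, tzinfo=timezone.utc)},
    {"id": "snap-hypothetical-3", "created": datetime(2024, 6, 29, tzinfo=timezone.utc)},
]

to_delete = expired_snapshots(snapshots, retention_days=14, now=now)
print(to_delete)  # ['snap-hypothetical-1']
```

In practice a managed policy performs this selection for you; the sketch only shows what "delete after a defined retention period" means for a set of snapshots.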



AWS Backup


Centralizes backup policies across AWS services, storing backups (recovery points) in backup vaults and automatically deleting them after a specified retention period.
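A backup plan with a retention rule might be sketched as follows. The plan name, rule name, schedule, and day counts are assumptions chosen for illustration; the 90-day check reflects the AWS Backup requirement that a recovery point stay in cold storage for at least 90 days before deletion.

```python
def build_backup_rule(name, vault, cold_after_days, delete_after_days):
    """Return one backup rule dict with a lifecycle (values are illustrative)."""
    # Recovery points moved to cold storage must stay there at least 90 days.
    if delete_after_days - cold_after_days < 90:
        raise ValueError("delete_after_days must be at least 90 days after cold_after_days")
    return {
        "RuleName": name,
        "TargetBackupVaultName": vault,
        "ScheduleExpression": "cron(0 5 ? * * *)",  # daily at 05:00 UTC
        "Lifecycle": {
            "MoveToColdStorageAfterDays": cold_after_days,
            "DeleteAfterDays": delete_after_days,
        },
    }


# Hypothetical plan: daily backups, cold storage after 30 days, deleted after a year.
backup_plan = {
    "BackupPlanName": "hypothetical-daily-plan",
    "Rules": [build_backup_rule("daily", "Default", 30, 365)],
}
print(backup_plan["Rules"][0]["Lifecycle"])
```

With boto3, a dictionary of this shape can be passed as the `BackupPlan` argument of the Backup client's `create_backup_plan` call.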



AWS Glue Crawlers


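Schedule crawlers to periodically scan data stores like S3 and DynamoDB and update the AWS Glue Data Catalog with their schemas and metadata. Crawlers do not modify or delete the data itself, but an up-to-date catalog helps you see what data exists and apply retention rules to it.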



Resource Tagging


Tag resources with their data class so that retention policies can be applied, and regulatory compliance demonstrated, per class of data.
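One way to picture tag-driven retention is a lookup from a classification tag to a retention period. The tag key `DataClassification`, the class names, and the day counts below are hypothetical; an organization would define its own in its retention policy.

```python
# Hypothetical mapping from data class to retention period in days.
RETENTION_DAYS_BY_CLASS = {
    "public": 90,
    "internal": 365,
    "confidential": 2555,  # roughly seven years
}


def retention_days_for(tags, tag_key="DataClassification"):
    """Look up the retention period for a resource from its tags.

    `tags` is a list of {"Key": ..., "Value": ...} dicts, the shape many
    AWS APIs use. Returns None when the resource is untagged, so that it
    can be flagged for review instead of silently receiving a default.
    """
    tag_map = {tag["Key"]: tag["Value"] for tag in tags}
    data_class = tag_map.get(tag_key)
    if data_class is None:
        return None
    data_class = data_class.lower()
    if data_class not in RETENTION_DAYS_BY_CLASS:
        raise ValueError(f"unknown data class: {data_class}")
    return RETENTION_DAYS_BY_CLASS[data_class]


print(retention_days_for([{"Key": "DataClassification", "Value": "Internal"}]))  # 365
print(retention_days_for([{"Key": "Owner", "Value": "analytics"}]))  # None
```

Returning `None` for untagged resources is a design choice in this sketch: it keeps missing classification visible rather than hiding it behind a default.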



Additional Retention Mechanisms:


Data Archiving


Employ Amazon S3 Glacier for cost-effective, long-term archiving of infrequently accessed data.



Monitoring and Audits


Utilize AWS CloudTrail and AWS Config for ongoing auditing and monitoring of data access and lifecycle policies.



Legal Hold and Compliance


Use mechanisms like Amazon S3 Object Lock to prevent data from being deleted or overwritten for a fixed retention period, or indefinitely while a legal hold is in place.
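The sketch below computes a retain-until date and assembles the arguments for an Object Lock retention request. The bucket name, object key, and seven-year period are placeholders, and the bucket is assumed to have Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone


def retention_request(bucket, key, retain_days, mode="GOVERNANCE", now=None):
    """Build keyword arguments for an S3 Object Lock retention request."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Retention": {
            "Mode": mode,
            "RetainUntilDate": now + timedelta(days=retain_days),
        },
    }


# Hypothetical example with a fixed "now" so the result is reproducible.
request = retention_request(
    "example-records-bucket",
    "contracts/2024/agreement.pdf",
    retain_days=7 * 365,
    mode="COMPLIANCE",
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(request["Retention"]["RetainUntilDate"].date())
```

With boto3, these arguments match the S3 client's `put_object_retention` call. A legal hold is set separately through `put_object_legal_hold` with `LegalHold={"Status": "ON"}` and has no expiry date.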



Disaster Recovery


Align retention policies with disaster recovery and business continuity plans using AWS Backup.



Data classification


Data classification in AWS involves categorizing data stored within its services by sensitivity, importance, and usage. This process is key to understanding the nature of the data, guiding how it should be protected and accessed. It plays a critical role in applying appropriate security controls, such as enhanced access management and encryption, especially for more sensitive data. 


Classification is essential for regulatory compliance and data risk management. AWS offers services such as Amazon Macie, alongside storage services such as Amazon S3 and S3 Glacier, to support the classification process and the security measures that follow from it.



Resource Tagging


Utilize AWS resource tagging to assign metadata tags for classifying resources by attributes such as confidentiality level.



AWS Identity and Access Management (IAM)


Provides fine-grained access control to manage resource permissions based on their classification.
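As an illustration of classification-based access control, the sketch below builds an IAM policy document that denies reading S3 objects tagged as confidential unless the caller carries a matching tag. The tag keys `DataClassification` and `Clearance`, the tag value, and the bucket name are assumptions made for this example.

```python
import json


def deny_unless_cleared(bucket, data_class):
    """Build a policy denying s3:GetObject on objects of a given data class
    to principals whose Clearance tag does not match (tag names are hypothetical)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyClassifiedWithoutClearance",
                "Effect": "Deny",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Both conditions must hold for the deny to apply.
                "Condition": {
                    "StringEquals": {
                        "s3:ExistingObjectTag/DataClassification": data_class
                    },
                    "StringNotEquals": {
                        "aws:PrincipalTag/Clearance": data_class
                    },
                },
            }
        ],
    }


policy = deny_unless_cleared("example-records-bucket", "confidential")
print(json.dumps(policy, indent=2))
```

A policy like this would normally sit alongside allow statements that grant baseline access; it should be tested with your own tagging scheme before being relied on.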



AWS Organizations


Facilitates central governance of accounts and enforces policies, including mandatory resource tagging for classification.
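A tag policy that standardizes a classification tag across accounts might be sketched like this. The tag key and the allowed values are hypothetical, and the structure follows the tag policy syntax as I understand it; check it against the AWS Organizations documentation before use.

```python
import json


def classification_tag_policy(tag_key, allowed_values):
    """Build a tag policy document that fixes the capitalization of a tag key
    and restricts it to a set of allowed values (names are illustrative)."""
    return {
        "tags": {
            tag_key: {
                "tag_key": {"@@assign": tag_key},
                "tag_value": {"@@assign": list(allowed_values)},
            }
        }
    }


tag_policy = classification_tag_policy(
    "DataClassification", ["public", "internal", "confidential"]
)
print(json.dumps(tag_policy, indent=2))
```

The JSON output is the policy content; attaching it to an organizational unit or account is done through AWS Organizations.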



Data Labeling Services:


Amazon SageMaker


Supports data labeling for machine learning projects, for example through SageMaker Ground Truth.



AWS Glue DataBrew


Useful for profiling and classifying data in data lakes as part of ETL processes.



Amazon Macie


Automates sensitive data discovery and classification using machine learning and pattern matching, ideal for identifying PII and financial data.
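To make "pattern matching" concrete, here is a toy detector for email addresses and card-like numbers. It is a simplified illustration of the general technique, not how Macie works internally; the regular expressions and finding names are assumptions for the example.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(number):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def classify_text(text):
    """Return a sorted list of sensitive data types found in the text."""
    findings = set()
    if EMAIL_RE.search(text):
        findings.add("EMAIL_ADDRESS")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # The checksum filters out most random digit runs.
        if luhn_valid(digits):
            findings.add("CREDIT_CARD_NUMBER")
    return sorted(findings)


# 4111 1111 1111 1111 is a well-known test number that passes the Luhn check.
sample = "Contact jane@example.com, card 4111 1111 1111 1111"
print(classify_text(sample))  # ['CREDIT_CARD_NUMBER', 'EMAIL_ADDRESS']
```

A managed service adds what this sketch lacks: many more data types, machine learning models, scanning at scale, and reporting of findings.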



Monitoring and Compliance


Continuously monitor data access and security to ensure ongoing compliance with classification standards.




