Jan 29, 2024 · 10 min read

Storage types with associated characteristics in AWS

Overview of storage types in AWS and their use cases

Full-Stack Cloud Engineer

TABLE OF CONTENTS

Object storages
Block storages
File storages
Hybrid Storages
References

There are three distinct (+ one hybrid) storage types used in the AWS ecosystem, each with unique characteristics and use cases.

Object storages

Object storage is a method for managing unstructured data such as photos, videos, and emails as objects within a flat storage structure called buckets, rather than in hierarchical file systems.

Each object consists of the data, metadata providing context, and a unique identifier. This structure enables efficient access and management of large data volumes, making object storage ideal for data lakes, cloud-native applications, analytics, and machine learning, which require scalability and flexibility.

The flat nature of object storage, where data isn't stored in folders but as individual objects within buckets, allows for easier retrieval and analysis based on metadata and unique identifiers. This approach removes the scalability limitations of traditional storage systems and is particularly suitable for cloud environments.
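The flat model can be illustrated with a small in-memory sketch. It is not how S3 is implemented; the `Bucket` and `StoredObject` classes, the key names, and the metadata fields are all hypothetical and exist only to show that each object carries data, metadata, and an identifier, and that "folders" are nothing more than key prefixes.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the data itself, descriptive metadata, and a unique identifier."""
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class Bucket:
    """A flat namespace: every object lives at the top level, addressed by its key."""

    def __init__(self, name: str):
        self.name = name
        self._objects: dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata) -> StoredObject:
        obj = StoredObject(key=key, data=data, metadata=metadata)
        self._objects[key] = obj
        return obj

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # "Folders" are only a naming convention: a prefix filter over flat keys.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def find_by_metadata(self, **criteria) -> list[str]:
        # Select objects by metadata rather than by position in a directory tree.
        return sorted(
            k for k, o in self._objects.items()
            if all(o.metadata.get(name) == value for name, value in criteria.items())
        )


bucket = Bucket("media")
bucket.put("photos/2024/cat.jpg", b"...", content_type="image/jpeg")
bucket.put("photos/2024/dog.jpg", b"...", content_type="image/jpeg")
bucket.put("mail/inbox/1.eml", b"...", content_type="message/rfc822")

print(bucket.list_keys(prefix="photos/"))
print(bucket.find_by_metadata(content_type="message/rfc822"))
```

Because lookups go through a single key-to-object mapping, adding more objects never requires reorganizing a directory tree, which is the property that lets real object stores scale out.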

Key benefits of object storage include its virtually unlimited scalability, cost-effectiveness for storing and managing large volumes of data, and enhanced durability and resilience. Data is distributed across multiple devices and locations, ensuring high availability and protecting against data loss.

AWS object storages

Amazon S3 is the sole object storage service offered by AWS, available in various forms known as storage classes to cater to diverse requirements and use cases.

S3 Standard

Best for data that is frequently accessed, offering high throughput and low latency. It is designed for 99.99% availability, making it reliable for active use cases.

S3 Intelligent-Tiering

Ideal for data with variable or unknown access patterns. This class automatically moves objects between access tiers (Frequent Access, Infrequent Access, and Archive Instant Access, plus optional archive tiers) to optimize storage costs based on usage.

S3 Standard-Infrequent Access (S3 Standard-IA)

Suitable for less frequently accessed data that still requires quick retrieval. It offers lower storage costs compared to S3 Standard, albeit with a retrieval fee.

S3 One Zone-Infrequent Access (S3 One Zone-IA)

Stores data in a single Availability Zone, providing a cost-effective solution for infrequently accessed data that does not need to survive the loss of a zone.

S3 Glacier and S3 Glacier Deep Archive

Designed for long-term data archiving and backup. These classes are cost-efficient for rarely accessed data, with retrieval times ranging from minutes to hours (Glacier) up to about 12 hours for a standard retrieval or 48 hours for a bulk retrieval (Glacier Deep Archive).

Amazon S3 on Outposts

Provides on-premises object storage with AWS Outposts, catering to applications requiring local data processing and residency.

S3 can also transition objects between these classes automatically as they age, using S3 Lifecycle policies. Unlike Intelligent-Tiering, Lifecycle rules are driven by object age rather than by observed access patterns. This flexibility allows for efficient data management and cost optimization across various storage needs.
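A Lifecycle rule is written as a list of transitions keyed by object age in days. The sketch below builds such a configuration in the dictionary shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, the `logs/` prefix, and the day thresholds are illustrative assumptions, and `storage_class_at` is a hypothetical helper that only simulates where an object would end up.

```python
# Lifecycle configuration in the shape accepted by boto3's
# put_bucket_lifecycle_configuration. The rule ID, prefix, and day
# thresholds are illustrative assumptions, not recommendations.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}


def storage_class_at(age_days: int, rule: dict, initial: str = "STANDARD") -> str:
    """Return the storage class an object would be in at a given age under the rule."""
    current = initial
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (0, 45, 120, 400):
    print(age, storage_class_at(age, rule))

# Applying it to a real bucket would look roughly like this (requires boto3,
# AWS credentials, and an existing bucket; the bucket name is a placeholder):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration,
#   )
```

The simulation runs locally without touching AWS, so the thresholds can be sanity-checked before the configuration is applied to a bucket.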

Block storages

Block storage splits data into fixed-size chunks called blocks, which explains the origin of the term 'block storage'. Blocks are grouped into volumes, and a typical volume consists of many such blocks. The core concept is that each block has its own address and can be read or written independently of other blocks, significantly enhancing the overall performance of the storage system.

This type of storage is typically utilized by Virtual Machines (VMs) like AWS EC2 instances, or directly by applications on host operating systems. These applications often access block storage through a network, usually via Storage Area Networks (SANs).

Key characteristics and advantages of block storage include:

Independent Block Access

Data in each block can be accessed and written to independently, enhancing the performance for read/write operations. This is particularly beneficial for applications that require high random read/write capabilities.

High Performance

Due to the ability to directly access storage blocks, block storage systems usually offer high performance. This makes them suitable for applications where speed is crucial, such as database servers and high-traffic websites.

Flexibility and Management

Users have the flexibility to create, modify, delete, and manage blocks as per their requirements. This adaptability makes block storage a versatile choice for a variety of applications.

Use Cases

Commonly employed for file systems, databases, and virtual machines, block storage is ideal for scenarios demanding frequent and intensive read/write operations. It is often used to provide boot disks for virtual machines and persistent storage for applications.
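The independent block access described above can be sketched with a toy volume backed by a byte array. The `BlockVolume` class and the 4096-byte block size are assumptions made for illustration; real block devices expose the same idea through the operating system's disk interface rather than a Python class.

```python
BLOCK_SIZE = 4096  # bytes per block; an illustrative assumption


class BlockVolume:
    """A volume modeled as a fixed number of equally sized, individually addressable blocks."""

    def __init__(self, block_count: int):
        self.block_count = block_count
        self._storage = bytearray(block_count * BLOCK_SIZE)

    def _check_index(self, index: int) -> None:
        if not 0 <= index < self.block_count:
            raise IndexError("block index out of range")

    def write_block(self, index: int, data: bytes) -> None:
        self._check_index(index)
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        offset = index * BLOCK_SIZE
        # Pad to a full block so the write touches exactly one block.
        self._storage[offset:offset + BLOCK_SIZE] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, index: int) -> bytes:
        self._check_index(index)
        offset = index * BLOCK_SIZE
        return bytes(self._storage[offset:offset + BLOCK_SIZE])


volume = BlockVolume(block_count=8)
volume.write_block(5, b"database page")
volume.write_block(2, b"boot record")

# Rewriting block 5 leaves block 2 untouched.
volume.write_block(5, b"updated page")
print(volume.read_block(2).rstrip(b"\x00"))
print(volume.read_block(5).rstrip(b"\x00"))
```

Because every block sits at a computable offset, a random read or write costs the same regardless of which block is targeted, which is why databases favor this kind of storage.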

AWS block storages

In AWS there are two types of block storage: Amazon EBS (Elastic Block Store) and Amazon EC2 instance store.

EBS

EBS functions as a typical block storage system accessed over the network, which introduces some overhead compared to directly attached disks. EBS volumes can be manually attached to EC2 instances and serve as both root (boot) and secondary volumes. EBS is also utilized behind the scenes by other AWS services, such as RDS (Relational Database Service).

EBS volumes come in a variety of performance tiers, such as General Purpose SSD, Provisioned IOPS SSD, Throughput Optimized HDD, and Cold HDD, each tailored to meet diverse storage requirements. However, it's important to note that an EBS volume can be attached only to EC2 instances that are in the same Availability Zone. To achieve high availability across different zones, strategies like database replication are necessary.
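The Availability Zone constraint is easy to check before attempting an attachment. The following is a minimal local sketch, not an AWS API call: the `Volume` and `Instance` classes, the IDs, and the zone names are hypothetical, while the short type codes (`gp3`, `io2`, `st1`, `sc1`, and so on) are the identifiers the EC2 API uses for the tiers listed above.

```python
from dataclasses import dataclass

# EBS volume type identifiers as used by the EC2 API, keyed by tier name.
VOLUME_TYPES = {
    "General Purpose SSD": ("gp2", "gp3"),
    "Provisioned IOPS SSD": ("io1", "io2"),
    "Throughput Optimized HDD": ("st1",),
    "Cold HDD": ("sc1",),
}


@dataclass(frozen=True)
class Volume:
    volume_id: str          # placeholder ID, not a real resource
    availability_zone: str
    volume_type: str


@dataclass(frozen=True)
class Instance:
    instance_id: str        # placeholder ID, not a real resource
    availability_zone: str


def can_attach(volume: Volume, instance: Instance) -> bool:
    """An EBS volume can only be attached to an instance in the same Availability Zone."""
    return volume.availability_zone == instance.availability_zone


data_volume = Volume("vol-example", "eu-west-1a", "gp3")
same_az = Instance("i-example-a", "eu-west-1a")
other_az = Instance("i-example-b", "eu-west-1b")

print(can_attach(data_volume, same_az))
print(can_attach(data_volume, other_az))
```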

Amazon EC2 instance store

Instance store also provides block storage in AWS, but it is available exclusively to EC2 instances, and only on instance types that include it. Because instance store volumes are physically attached to the EC2 host hardware, they offer faster performance. However, there is a trade-off: data stored in instance stores is ephemeral, meaning it is temporary. The data survives a reboot, but it is lost when the instance is stopped, hibernated, or terminated, or when the underlying disk fails.

File storages

File storage, a method for organizing data in a hierarchical format, involves structuring data into files and folders. This system, resembling a physical filing cabinet, is user-friendly and allows easy data access and management. It is typically the most expensive of the three storage types available in AWS.

Key characteristics of file storage include:

File System

Utilizes a file system to manage data storage and retrieval. Shared file systems are typically accessed over network protocols such as NFS and SMB.

Individual File Accessibility

Enables direct access to individual files, facilitating tasks like document editing or multimedia file handling.

Metadata Management

Accompanies each file with metadata, providing essential details like file name, size, and modification date, aiding in organization and search.

Permissions and Sharing

Allows users to manage file permissions and share files, offering controlled access for viewing or editing by specific users.

Network-Attached Storage (NAS)

Commonly implemented in network environments through NAS devices, providing centralized, file-level access for multiple users and devices.

Cloud File Storage Solutions

Includes managed services like Amazon Elastic File System (EFS) and FSx, offering scalable storage options for EC2 instances and various applications. EFS provides a simple, scalable file system for multiple EC2 instances, while FSx caters to both Windows and Linux systems, with FSx for Lustre optimized for high-performance workloads and FSx for NetApp ONTAP supporting enterprise applications.

Application Suitability

Ideal for applications requiring shared file access and standard file system interfaces.
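The hierarchy and per-file metadata described above are visible through any standard file system interface. The sketch below uses a temporary local directory as a stand-in for a shared file system; the folder and file names are made up for illustration, and the same calls would apply to a mounted NFS or SMB share.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Hierarchy: files live inside nested folders and are addressed by path.
    report = Path(root) / "projects" / "2024" / "report.txt"
    report.parent.mkdir(parents=True)
    report.write_bytes(b"quarterly numbers\n")

    # Metadata kept by the file system alongside the file contents.
    info = report.stat()
    file_name = report.name
    file_size = info.st_size
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    permissions = oct(info.st_mode & 0o777)

    # Walk the tree to list every file relative to the root folder.
    listing = sorted(
        p.relative_to(root).as_posix() for p in Path(root).rglob("*") if p.is_file()
    )

print(file_name, file_size, modified.isoformat(), permissions)
print(listing)
```

Unlike the object model, the path itself encodes where a file lives, so moving or renaming a folder changes the address of everything beneath it.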

AWS file storages

AWS offers several managed file storage services tailored for diverse requirements. They differ in interfaces and use cases.

Amazon Elastic File System (EFS)

This service provides scalable file storage for EC2 instances and on-premises systems using the NFS protocol. Ideal for applications that require shared file storage, EFS is suitable for various use cases like content repositories, development environments, and data analytics. Its scalability ensures you pay only for the storage you use.

Amazon FSx

This suite of services offers fully managed file systems with native compatibility, including:

Amazon FSx for Windows File Server: Designed for Windows-based applications, it delivers fully managed file shares using the SMB protocol, built on Windows Server.

Amazon FSx for Lustre: Tailored for compute and data-intensive workloads like HPC and machine learning, offering high-performance capabilities.

Amazon FSx for OpenZFS: Facilitates the migration of Linux-based file servers to AWS without code changes, utilizing the OpenZFS file system.

Amazon S3 File Gateway

As part of AWS Storage Gateway, it provides a file interface to Amazon S3, supporting NFS and SMB protocols. This service is beneficial for applications requiring on-premises access to S3 objects, with the added advantage of local caching for frequently accessed data.

Hybrid Storages

Hybrid storage in AWS refers to storage solutions that integrate on-premises data storage systems with cloud-based storage. This approach allows organizations to maintain essential applications and data on-premises for rapid, low-latency access while leveraging the scalability, availability, and cost-efficiency of cloud storage. Hybrid storage is ideal for balancing local operational requirements with the benefits of cloud computing.

AWS hybrid storages

In AWS there are two types of hybrid storage: gateways and data transfer services.

AWS Storage Gateway

This service integrates on-premises environments with AWS cloud storage. It offers various modes:

File Gateway: Enables storing and retrieving Amazon S3 objects using protocols like NFS and SMB, commonly used for file storage, backups, and archiving.

Volume Gateway: Available in Stored and Cached Volumes; Stored Volumes keep primary data on-premises and back it up as EBS snapshots, while Cached Volumes store primary data in S3, with frequently accessed data kept locally.

Tape Gateway: Mimics a virtual tape library for cloud-based backup and archiving, leveraging S3 and S3 Glacier.
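The cached-volume idea, where the full data set lives in the cloud and only frequently accessed data stays local, behaves like a read-through cache. The sketch below is a loose model under stated assumptions: the `CachedVolume` class is hypothetical, a plain dictionary stands in for S3, and the least-recently-used eviction policy is chosen for illustration rather than taken from how Storage Gateway actually manages its cache.

```python
from collections import OrderedDict


class CachedVolume:
    """Read-through cache: the full data set lives in the backing store,
    and only the most recently used blocks are kept locally."""

    def __init__(self, backing_store: dict, capacity: int):
        self.backing_store = backing_store      # stands in for data held in S3
        self.capacity = capacity                # number of blocks kept on-premises
        self.local_cache: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id: int) -> bytes:
        if block_id in self.local_cache:
            self.hits += 1
            self.local_cache.move_to_end(block_id)   # mark as most recently used
            return self.local_cache[block_id]
        self.misses += 1
        data = self.backing_store[block_id]          # slow path: fetch from the cloud
        self.local_cache[block_id] = data
        if len(self.local_cache) > self.capacity:
            self.local_cache.popitem(last=False)     # evict the least recently used block
        return data


cloud = {i: f"block-{i}".encode() for i in range(10)}
volume = CachedVolume(cloud, capacity=2)

for block_id in (1, 2, 1, 3, 1, 2):
    volume.read(block_id)

print(volume.hits, volume.misses, list(volume.local_cache))
```

Reads that hit the local cache avoid a round trip to the cloud, which is the latency benefit cached volumes aim for while keeping on-premises capacity small.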

Data Transfer Services

AWS Snow Family: Includes Snowcone, Snowball, and Snowmobile, ideal for transferring large volumes of data to AWS, particularly useful for environments with limited connectivity.

AWS DataSync: Facilitates efficient online data movement between on-premises storage and AWS services, supporting use cases like data migration and synchronization for disaster recovery.