Use Cases of AWS DataSync
The following are some use cases of AWS DataSync:
- Data Discovery: The Discovery feature of AWS DataSync gives you insights into the performance and utilization of your on-premises storage.
- Data Migration: Using AWS DataSync, you can move active datasets rapidly to AWS storage services. It supports automatic encryption and integrity validation.
- Data Archives: You can move cold data from your on-premises storage directly into Amazon S3 Glacier, freeing up your local storage.
- In-Cloud Data Processing: With DataSync, you can schedule data transfers to and from different cloud services to support workflows such as machine learning, video processing, and big-data analytics.
The next section will discuss some benefits of using AWS DataSync.
Benefits of Using AWS DataSync
You get the following benefits when you use AWS DataSync:
- Simplified Migration Planning: DataSync Discovery minimizes the time, effort, and costs needed for planning your data migration to AWS. You also don’t have to write or maintain complicated scripts to handle data transfers.
- Data Security: DataSync provides end-to-end data security through encryption and data validation. It uses AWS Identity and Access Management (IAM) roles to access your data.
- Reduced Operation Costs: AWS DataSync has per-gigabyte pricing, which means you are charged only for the amount of data you transfer.
Now that you are familiar with AWS DataSync, let's look at the concepts and terminology used by the service.
Terminologies of AWS DataSync
The following are some terms related to AWS DataSync transfers:
Agent
A DataSync agent is a virtual machine used for reading from and writing to your storage during a transfer operation. You can deploy a DataSync agent in your storage environment on the following hypervisors:
- VMware ESXi
- Linux Kernel-based Virtual Machine (KVM)
- Microsoft Hyper-V
Task
A task identifies a source and a destination location and describes how to copy data between them. You can also specify how each task handles metadata, deleted files, and permissions.
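As a rough illustration of these per-task settings, the sketch below builds an options dictionary in the shape accepted by the `Options` parameter of the boto3 DataSync `create_task` call. The key names and values come from the DataSync API, while the helper name `build_task_options` and its defaults are assumptions made only for this example.

```python
def build_task_options(keep_deleted_files=True, keep_permissions=True):
    """Return an Options dict for a DataSync task (hypothetical helper)."""
    return {
        # Keep or remove destination files that no longer exist in the source.
        "PreserveDeletedFiles": "PRESERVE" if keep_deleted_files else "REMOVE",
        # Copy POSIX permissions along with the file data, or ignore them.
        "PosixPermissions": "PRESERVE" if keep_permissions else "NONE",
        # Preserve the last-modified timestamp recorded at the source.
        "Mtime": "PRESERVE",
    }


options = build_task_options(keep_deleted_files=False)
print(options)
```

You would typically pass such a dictionary as `Options=options` when creating the task, together with the source and destination location ARNs.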
When a task executes, it can go through the following phases during the transfer:
- Queuing
- Launching
- Preparing
- Transferring
- Verifying
- Success
- Error
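To see these phases in practice, you can poll a task execution until it finishes. The following is a minimal sketch, assuming `client` behaves like `boto3.client("datasync")`, whose `describe_task_execution` call returns a `Status` field such as `QUEUED`, `LAUNCHING`, or `SUCCESS`. The function name `wait_for_execution` and the polling interval are assumptions for illustration.

```python
import time

# Status values that mean the task execution has finished.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def wait_for_execution(client, execution_arn, poll_seconds=30, sleep=time.sleep):
    """Poll a task execution until it ends and return the statuses seen.

    `client` is assumed to behave like boto3.client("datasync").
    """
    seen = []
    while True:
        response = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = response["Status"]
        if not seen or seen[-1] != status:
            seen.append(status)  # record each phase once, in order
        if status in TERMINAL_STATUSES:
            return seen
        sleep(poll_seconds)
```

The `sleep` parameter exists only so that the waiting behavior can be replaced in tests. In normal use you can leave it at its default.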
Now let us understand how AWS DataSync transfers files.
How DataSync Transfers Files
When you run a data transfer task, AWS DataSync first examines your source and destination storage systems to determine what needs to be synced. It does this by recursively scanning the contents and metadata of both systems and identifying the differences between them. The duration of this step depends on the number of files involved and the performance of the storage systems. Once the examination is done, the data is transferred according to how the task was configured. For example, you can choose to perform data integrity checks during the transfer or after it is completed. Let’s discuss how data integrity checks are performed.
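Before that, here is a small sketch of the scan-and-compare idea on two local directories. It is only an illustration, not how DataSync is implemented internally: the function names and the choice of comparing file size and modification time are assumptions for this example.

```python
import os
import tempfile


def scan(root):
    """Recursively map each relative file path under root to (size, mtime)."""
    entries = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            entries[os.path.relpath(path, root)] = (info.st_size, int(info.st_mtime))
    return entries


def files_to_transfer(source_root, destination_root):
    """Return relative paths that are missing or different at the destination."""
    source = scan(source_root)
    destination = scan(destination_root)
    return sorted(path for path, meta in source.items() if destination.get(path) != meta)


# Small demonstration on two temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(src, name), "w") as handle:
            handle.write("data for " + name)
    print(files_to_transfer(src, dst))
```

Because the destination directory starts empty, the demonstration reports both files as needing a transfer.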
Data Integrity Checks
AWS DataSync calculates the checksum for every file in the source and destination storage and compares them. It also compares the metadata of every file transferred from source to target.
If there is a difference, verification fails with an error code specifying exactly what failed.
You may see errors such as the following if the data integrity check fails:
- Checksum failure
- Metadata failure
- Files were added
- Files were removed
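The idea behind such a check can be sketched with the Python standard library. The example below assumes SHA-256 as the checksum and uses file size as a stand-in for metadata. Both are assumptions for illustration, since this article does not specify which algorithm or metadata fields DataSync uses, and the failure labels are not real DataSync error codes.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(source_root, destination_root, relative_paths):
    """Return {relative_path: reason} for every file that fails verification."""
    failures = {}
    for rel in relative_paths:
        src = os.path.join(source_root, rel)
        dst = os.path.join(destination_root, rel)
        if not os.path.exists(dst):
            failures[rel] = "missing at destination"
        elif os.path.getsize(src) != os.path.getsize(dst):
            failures[rel] = "metadata mismatch (size)"
        elif sha256_of(src) != sha256_of(dst):
            failures[rel] = "checksum mismatch"
    return failures
```

An empty dictionary returned by `verify` means every listed file matched at the destination.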
The next section covers some features of AWS DataSync.
Features of AWS DataSync
The following are some key features of AWS DataSync:
Discovery
DataSync Discovery simplifies migration planning and accelerates the data migration process. It gives you critical information about your on-premises storage performance and recommends different AWS Storage services suitable for your use case.
Specialized Network Protocol
AWS DataSync uses an AWS-designed data transfer protocol that speeds up transfers. It is highly optimized for sending and receiving data over the network, and these optimizations include in-line compression, sparse file detection, in-line data validation, and encryption.
Optimized Bandwidth Control
The data transfer process doesn’t have to affect your business, as the service supports granular control over bandwidth consumption. You can throttle the transfer speed during business hours and allow it to use the full bandwidth when the network is less busy.
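A throttling policy like this can be sketched as follows. The example assumes `client` behaves like `boto3.client("datasync")`, whose `update_task_execution` call accepts a `BytesPerSecond` option, and that `-1` means no limit. The business hours, the 10 MiB/s cap, and the function names are assumptions chosen for illustration.

```python
from datetime import time as clock

UNLIMITED = -1  # assumed to mean "no bandwidth limit" for BytesPerSecond


def bytes_per_second_for(now, business_start=clock(9, 0),
                         business_end=clock(18, 0), business_limit_mib=10):
    """Return a bandwidth limit for the given time of day."""
    if business_start <= now < business_end:
        return business_limit_mib * 1024 * 1024  # throttle during business hours
    return UNLIMITED  # use the full bandwidth otherwise


def apply_limit(client, execution_arn, now):
    """Apply the limit to a running task execution and return it."""
    limit = bytes_per_second_for(now)
    client.update_task_execution(
        TaskExecutionArn=execution_arn,
        Options={"BytesPerSecond": limit},
    )
    return limit
```

You could call `apply_limit` at the start and end of the working day to switch a long-running transfer between the two modes.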
Transfer Scheduling
DataSync has built-in support for data transfer task scheduling. It allows you to periodically run data transfer tasks that can detect changes in your source storage and copy them to the destination.
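As a minimal sketch, a schedule can be attached to an existing task through the `Schedule` parameter of the boto3 DataSync `update_task` call. The example assumes `client` behaves like `boto3.client("datasync")`. The helper name `schedule_task` and the 12-hour interval are assumptions for illustration.

```python
def schedule_task(client, task_arn, expression="rate(12 hours)"):
    """Attach a schedule expression to a task and return the expression.

    `client` is assumed to behave like boto3.client("datasync").
    """
    client.update_task(
        TaskArn=task_arn,
        Schedule={"ScheduleExpression": expression},
    )
    return expression
```

A rate expression suits regular intervals, while a cron expression is a better fit when the task should run at a specific time of day.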
Data Encryption
DataSync encrypts all of your data during transfers using the Transport Layer Security (TLS) protocol.
Frequently Asked Questions
What is a virtual machine hypervisor?
A virtual machine hypervisor is a type of software that creates and manages virtual machines on a computer or server. It allows you to create multiple VMs, each running its own operating system. It abstracts the underlying hardware resources and allocates them to the virtual machines.
What is the Network File System (NFS) protocol?
NFS is a distributed file system protocol that allows a computer to access files over a network. It makes it simple to share files between different systems. It works using the client-server architecture, where one system acts as the NFS server and others as clients.
What is the TLS protocol?
TLS stands for Transport Layer Security, and it is a cryptographic protocol that is used for securely communicating over a network. This protocol ensures data privacy and integrity during transmission. It is the successor to the Secure Sockets Layer (SSL) protocol.
Conclusion
AWS DataSync is a fully managed data transfer service that you can use for transferring data from a local storage system to an AWS storage service. We discussed its features, use cases, and benefits.
Happy Learning!