Table of contents
1. Introduction
2. What is AWS DataSync?
3. Use Cases of AWS DataSync
4. Benefits of Using AWS DataSync
5. Terminologies of AWS DataSync
5.1. Agent
5.2. Task
6. How DataSync Transfers Files
6.1. Data Integrity Checks
7. Features of AWS DataSync
7.1. Discovery
7.2. Specialized Network Protocol
7.3. Optimized Bandwidth Control
7.4. Transfer Scheduling
7.5. Data Encryption
8. Frequently Asked Questions
8.1. What is a virtual machine hypervisor?
8.2. What is the network file system protocol?
8.3. What is the TLS protocol?
9. Conclusion
Last Updated: Mar 27, 2024

AWS DataSync

Author Abhinav Anand

Introduction

Migrating data from your on-premises storage solution to a cloud provider is a hassle. AWS DataSync makes migrating your data to AWS servers easy and safe. 


This article will teach you about AWS DataSync, its use cases, and its features.

What is AWS DataSync?

AWS DataSync is a fully managed data transfer service that makes it easy to migrate data from your local storage solutions to AWS storage services. It uses the Transport Layer Security (TLS) protocol to ensure that your data is transferred securely. You can also use it to move data from one AWS storage service to another, which helps you maintain backups of your primary AWS storage.

[Figure: DataSync transfer flowchart. Credit: aws.amazon.com]

AWS DataSync supports the following storage location types:-

  • Network File System (NFS) shares
     
  • Server Message Block (SMB) shares
     
  • Hadoop Distributed File Systems (HDFS)
     
  • Self-managed Object Storage
     
  • Google Cloud Storage
     
  • Azure Files and more.
     

AWS DataSync automates the manual tasks that slow down data migrations, such as writing and maintaining copy scripts. The DataSync software agent connects to your Network File System (NFS) and Server Message Block (SMB) storage and transfers data over TLS. According to AWS, transfers can run up to 10 times faster than open-source tools.

Let's take a look at some use cases of AWS DataSync.

Use Cases of AWS DataSync

The following are some use cases of the AWS DataSync service:-

  • Data Discovery: The Discovery feature of AWS DataSync can be used for getting insights into your on-premises storage performance and utilization
     
  • Data Migration: Using AWS DataSync, you can move active datasets rapidly to AWS storage services. It supports automatic encryption and integrity validation
     
  • Data Archives: You can move cold data stored in your on-premises storage directly into Amazon S3 Glacier, freeing up your local storage
     
  • In-Cloud Data Processing: With DataSync, you can schedule data transfers to and from different cloud services to perform different workflow tasks such as machine learning, video processing, big-data analytics, etc

 

The next section will discuss some benefits of using AWS DataSync.

Benefits of Using AWS DataSync

You can get the following benefits if you choose to use AWS DataSync:-

  • Simplified Migration Planning: DataSync Discovery minimizes the time, effort, and costs needed for planning your data migration to AWS. You also don’t have to maintain or write complicated scripts to deal with data transfers
     
  • Data Security: DataSync provides end-to-end data security by using advanced encryption and data validation techniques. It uses AWS Identity and Access Management (IAM) roles to access your data
     
  • Reduced Operation Costs: AWS DataSync has per-gigabyte pricing, which means it only charges you for the amount of data you transferred
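
To make the per-gigabyte model concrete, here is a minimal sketch of the arithmetic. The rate used below is a hypothetical placeholder, not a quoted AWS price; actual rates depend on the region and should be taken from the AWS pricing page.

```python
def estimate_transfer_cost(gigabytes: float, rate_per_gb: float) -> float:
    """Return the estimated charge for copying `gigabytes` of data."""
    if gigabytes < 0 or rate_per_gb < 0:
        raise ValueError("gigabytes and rate_per_gb must be non-negative")
    # Per-gigabyte pricing: the charge grows linearly with the data copied.
    return round(gigabytes * rate_per_gb, 2)


# Hypothetical rate in USD per GB, used only for illustration.
EXAMPLE_RATE_PER_GB = 0.0125

print(estimate_transfer_cost(2_000, EXAMPLE_RATE_PER_GB))
```

Because the charge depends only on the volume copied, a task run that finds nothing to copy costs nothing under this model.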
     

Now that you are familiar with AWS DataSync, let's look at the service's different concepts and terminologies.

Terminologies of AWS DataSync

The following are some terminologies related to AWS DataSync transfers:-

Agent

A DataSync agent is a virtual machine that reads from and writes to your storage during a transfer operation. You can deploy a DataSync agent in your storage environment on the following hypervisors:-

  • VMware ESXi
     
  • Linux Kernel-based Virtual Machine (KVM)
     
  • Microsoft Hyper-V

Task

A task identifies a source and a destination location and describes how to copy data between them. You can also specify how each task handles metadata, deleted files, and permissions.
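
As a rough sketch, a task definition can be pictured as a small set of request parameters. The function below only builds a dictionary shaped like the DataSync CreateTask request; the ARNs are placeholders, the helper name is hypothetical, and the option values are illustrative choices rather than recommendations.

```python
def build_task_request(source_arn: str, destination_arn: str, name: str) -> dict:
    """Build keyword arguments shaped like DataSync's CreateTask request."""
    return {
        "SourceLocationArn": source_arn,            # where data is read from
        "DestinationLocationArn": destination_arn,  # where data is written to
        "Name": name,
        "Options": {
            # Verify only the files that this run transferred.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            # Keep destination files even if they were deleted at the source.
            "PreserveDeletedFiles": "PRESERVE",
            # Carry POSIX permissions over with the file metadata.
            "PosixPermissions": "PRESERVE",
        },
    }


# Placeholder ARNs: replace with the ARNs of locations you have created.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-SOURCE",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-EXAMPLE-DEST",
    "nightly-nfs-to-s3",
)
print(request["Options"])
```

With the boto3 SDK installed and credentials configured, a dictionary like this could be passed as `boto3.client("datasync").create_task(**request)`.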

When a task executes, it moves through the following phases during the transfer and ends in either Success or Error:-

  • Queuing
     
  • Launching
     
  • Preparing
     
  • Transferring
     
  • Verifying
     
  • Success
     
  • Error
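
The phases listed above can be modelled as a simple state machine. This is a minimal sketch for illustration only: the `Phase` enum and `next_phase` helper are hypothetical names, and the assumption that any running phase can fail into Error is ours, not something the service documents in this form.

```python
from enum import Enum


class Phase(Enum):
    QUEUING = "Queuing"
    LAUNCHING = "Launching"
    PREPARING = "Preparing"
    TRANSFERRING = "Transferring"
    VERIFYING = "Verifying"
    SUCCESS = "Success"
    ERROR = "Error"


# Order of phases when nothing goes wrong.
HAPPY_PATH = [
    Phase.QUEUING,
    Phase.LAUNCHING,
    Phase.PREPARING,
    Phase.TRANSFERRING,
    Phase.VERIFYING,
    Phase.SUCCESS,
]


def next_phase(current: Phase, failed: bool = False) -> Phase:
    """Return the phase that follows `current`."""
    if current in (Phase.SUCCESS, Phase.ERROR):
        raise ValueError(f"{current.value} is a final phase")
    if failed:
        return Phase.ERROR  # assumption: any running phase can end in Error
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1]


phase = Phase.QUEUING
while phase not in (Phase.SUCCESS, Phase.ERROR):
    print(phase.value)
    phase = next_phase(phase)
print(phase.value)
```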

Now let us understand how AWS DataSync transfers files.

How DataSync Transfers Files

When you run a data transfer task, AWS DataSync first examines your source and destination storage systems to determine what to sync. It does this by recursively scanning the contents and metadata of both systems and identifying the differences between them. The duration of this scan depends on the number of files involved and the performance of the storage systems. Once the scan is done, the data is transferred according to the task's settings. For example, you can choose to perform data integrity checks during the transfer or after it is completed. Let’s discuss how data integrity is verified.
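
The scan-and-compare step can be sketched in a few lines. This is an illustrative model, not DataSync's actual implementation: it assumes each side has already been scanned into a mapping of path to metadata, and it treats size and modification time as the only metadata that matters. `FileMeta` and `plan_transfer` are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class FileMeta(NamedTuple):
    size: int      # file size in bytes
    mtime: float   # last-modified timestamp


def plan_transfer(source: Dict[str, FileMeta],
                  destination: Dict[str, FileMeta]) -> List[str]:
    """Return the paths that are missing from, or differ at, the destination."""
    return sorted(
        path for path, meta in source.items()
        if destination.get(path) != meta
    )


source = {
    "a.txt": FileMeta(10, 1.0),
    "b.txt": FileMeta(20, 2.0),
    "c.txt": FileMeta(5, 3.0),
}
destination = {
    "a.txt": FileMeta(10, 1.0),   # identical, so it is skipped
    "b.txt": FileMeta(25, 2.5),   # differs, so it is copied again
}
print(plan_transfer(source, destination))  # ['b.txt', 'c.txt']
```

Only the changed file and the missing file are selected, which is why repeated runs of the same task move less data than the first run.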

Data Integrity Checks

AWS DataSync calculates a checksum for every file in the source and destination storage and compares them. It also compares the metadata of every file transferred from source to destination.

If there is a difference, verification fails with an error code specifying exactly what failed. 

You can see the following error codes if the data integrity check fails:-

  • Checksum failure
     
  • Metadata failure
     
  • Files were added
     
  • Files were removed, and more.
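
The idea behind checksum verification can be sketched as follows. The article does not say which checksum algorithm DataSync uses, so SHA-256 here is an assumption, and the problem labels returned by the hypothetical `verify` function are illustrative, not DataSync's real error codes.

```python
import hashlib
from typing import Dict


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def verify(source: Dict[str, bytes], destination: Dict[str, bytes]) -> Dict[str, str]:
    """Map each problematic path to a short description of the mismatch."""
    problems = {}
    for path in sorted(source.keys() | destination.keys()):
        if path not in destination:
            problems[path] = "missing at destination"
        elif path not in source:
            problems[path] = "unexpected at destination"
        elif sha256_of(source[path]) != sha256_of(destination[path]):
            problems[path] = "checksum mismatch"
    return problems


source = {"a.txt": b"hello", "b.txt": b"world"}
destination = {"a.txt": b"hello", "b.txt": b"w0rld", "c.txt": b"extra"}
print(verify(source, destination))
```

An empty result means every file exists on both sides with identical contents.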
     

The next section covers some features of AWS DataSync.

Features of AWS DataSync

The following are some key features of AWS DataSync:-

Discovery

DataSync Discovery simplifies migration planning and accelerates the data migration process. It gives you critical information about your on-premises storage performance and recommends different AWS Storage services suitable for your use case.

Specialized Network Protocol

AWS DataSync uses an AWS-designed data transfer protocol that speeds up transfers. It is highly optimized for sending and receiving data over the internet. These optimizations include in-line compression, sparse file detection, in-line data validation, and encryption.
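
The DataSync protocol itself is proprietary, so the following is only a toy sketch of three of the general ideas named above: skipping all-zero (sparse) blocks, compressing the rest in-line, and validating each block with a checksum. The block size, the tuple layout, and the function names are all assumptions made for illustration.

```python
import zlib
from typing import List, Tuple

BLOCK_SIZE = 4096  # hypothetical block size

Block = Tuple[str, bytes, int]  # (kind, payload, crc32 of the original block)


def encode_blocks(data: bytes) -> List[Block]:
    """Split data into blocks, recording zero-filled blocks as holes."""
    encoded = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        if block.count(0) == len(block):
            # Sparse block: send only its length, not the zeros themselves.
            encoded.append(("hole", len(block).to_bytes(4, "big"), 0))
        else:
            # Data block: compress it and attach a checksum for validation.
            encoded.append(("data", zlib.compress(block), zlib.crc32(block)))
    return encoded


def decode_blocks(encoded: List[Block]) -> bytes:
    """Rebuild the original bytes, validating each data block."""
    out = bytearray()
    for kind, payload, checksum in encoded:
        if kind == "hole":
            out += bytes(int.from_bytes(payload, "big"))
        else:
            block = zlib.decompress(payload)
            if zlib.crc32(block) != checksum:
                raise ValueError("block failed validation")
            out += block
    return bytes(out)


original = b"A" * 4096 + bytes(4096) + b"B" * 10
encoded = encode_blocks(original)
print([kind for kind, _, _ in encoded])  # ['data', 'hole', 'data']
```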

Optimized Bandwidth Control

The data transfer process doesn’t have to affect your business because the service supports granular control over bandwidth consumption. You can throttle transfer speeds during business hours and allow the full bandwidth once the network is free.
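
A throttling policy of this kind can be sketched as a small function that picks a limit from the time of day. The business hours and the 10 MiB/s cap are hypothetical values. The use of -1 for "no limit" follows the convention of the `BytesPerSecond` task option in the DataSync API.

```python
UNLIMITED = -1  # DataSync's BytesPerSecond option uses -1 for "no limit"


def bandwidth_limit(hour: int,
                    business_start: int = 9,
                    business_end: int = 18,
                    business_limit_bytes: int = 10 * 1024 * 1024) -> int:
    """Return the bytes-per-second limit to apply at the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if business_start <= hour < business_end:
        return business_limit_bytes  # throttle while people are working
    return UNLIMITED                 # full speed outside business hours


for hour in (8, 9, 17, 18):
    print(hour, bandwidth_limit(hour))
```

The returned value is what you would supply as the task's bandwidth limit when starting or updating a task execution.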

Transfer Scheduling

DataSync has built-in support for data transfer task scheduling. It allows you to periodically run data transfer tasks that can detect changes in your source storage and copy them to the destination.
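
As a small sketch, a daily schedule can be expressed as an AWS-style cron expression and wrapped in the `Schedule` structure that the DataSync task API accepts. The helper name `daily_schedule` is hypothetical, and times in the expression are assumed to be in UTC.

```python
def daily_schedule(hour_utc: int, minute: int = 0) -> dict:
    """Build a Schedule structure that runs a task once a day."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour_utc must be 0-23 and minute must be 0-59")
    # AWS cron fields: minutes hours day-of-month month day-of-week year.
    # Day-of-week is "?" because day-of-month is already specified.
    return {"ScheduleExpression": f"cron({minute} {hour_utc} * * ? *)"}


print(daily_schedule(2))  # {'ScheduleExpression': 'cron(0 2 * * ? *)'}
```

A structure like this would be supplied as the `Schedule` parameter when creating or updating a task.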

Data Encryption

DataSync encrypts all of your data during transfers using the Transport Layer Security (TLS) protocol.

Frequently Asked Questions

What is a virtual machine hypervisor?

A virtual machine hypervisor is a type of software that creates and manages virtual machines on a computer or server. It allows you to create multiple VMs, each running its own operating system. It abstracts the underlying hardware resources and allocates them to the virtual machines.

What is the network file system protocol?

NFS is a distributed file system protocol that allows a computer to access files over a network. It makes it simple to share files between different systems. It works using the client-server architecture, where one system acts as the NFS server and others as clients.

What is the TLS protocol?

TLS stands for Transport Layer Security, and it is a cryptographic protocol that is used for securely communicating over a network. This protocol ensures data privacy and integrity during transmission. It is the successor to the Secure Sockets Layer (SSL) protocol.

Conclusion

AWS DataSync is a fully managed data transfer service that you can use for transferring data from a local storage system to an AWS storage service. We discussed its features, use cases, and benefits.


Happy Learning!
