Introduction
Have you ever tried to transfer data from another public cloud platform into Cloud Storage? Do you know how hectic that becomes when the data set is very large?
In this article, we will learn about Storage Transfer Service in the Google Cloud Platform in detail. Storage Transfer Service automates the transfer of data from other public cloud platforms to Cloud Storage.
We will also discuss the transfer between file systems, transfer between cloud storage buckets, and permissions of the Storage Transfer Service in detail.
Storage Transfer Service can be understood as a fully managed, highly scalable service that is used to automate transfers from other public cloud platforms into Cloud Storage.
When feasible, the Storage Transfer Service employs checksum metadata to identify changes between objects in the Cloud Storage and the source storage system. When transferring file systems, Storage Transfer Service compares the latest changed time and size of the source object to the last time the object was copied to Cloud Storage.
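The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the service's actual implementation: the ObjectInfo record, the needs_copy function, and the last_copied parameter are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectInfo:
    """Hypothetical record of the metadata compared for one object."""
    size: int                       # size in bytes
    mtime: float                    # last-modified time, seconds since epoch
    checksum: Optional[str] = None  # checksum metadata, when available


def needs_copy(source: ObjectInfo, dest: Optional[ObjectInfo],
               last_copied: Optional[float] = None) -> bool:
    """Return True if the source object should be copied (again)."""
    if dest is None:
        return True  # nothing at the destination yet
    # Prefer checksum metadata when both sides expose it.
    if source.checksum is not None and dest.checksum is not None:
        return source.checksum != dest.checksum
    # Otherwise fall back to size and last-modified time.
    if source.size != dest.size:
        return True
    if last_copied is None:
        return True  # no record of a previous copy, so copy to be safe
    return source.mtime > last_copied
```

For example, needs_copy(ObjectInfo(10, 300.0), ObjectInfo(10, 50.0), last_copied=200.0) reports that the object must be copied again, because the source changed after the last copy.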
Features of Storage Transfer Service for Cloud-to-Cloud transfers
The Storage Transfer Service for Cloud-to-Cloud transfers has some features:
It supports transfers into Cloud Storage from Amazon S3 as well as from HTTP (HyperText Transfer Protocol) sources.
It supports daily copies of any objects that have been modified.
It does not currently support data transfers to Amazon S3.
Storage Transfer Service can also be used for on-premises data transfers from the network file system or NFS storage to Cloud Storage.
Features of Storage Transfer Service for On-Premises transfers
The Storage Transfer Service for on-premises data has some features as well:
It is designed for large-scale transfers (up to petabytes (PB) of data and billions of files).
It supports full copies or incremental copies.
We can set it up by installing on-premises software (known as agents) onto the computers in the data center.
It has a simple, managed graphical user interface (GUI); even non-technically savvy users (after the setup) can use it to move data.
It provides robust error reporting and a record of all files and objects that are moved.
It supports executing recurring transfers on a schedule.
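To make the idea of a recurring transfer concrete, here is a minimal sketch of the schedule fragment of a transfer job. It assumes the field names scheduleStartDate and repeatInterval from the transferJobs REST resource as we understand it; please check them against the current API reference. The daily_schedule helper is a hypothetical name used only in this example.

```python
from datetime import date


def daily_schedule(start: date, repeat_hours: int = 24) -> dict:
    """Build a schedule fragment for a recurring transfer job.

    Field names are assumed from the transferJobs REST resource.
    """
    if repeat_hours < 1:
        raise ValueError("repeat_hours must be at least 1")
    return {
        "scheduleStartDate": {
            "year": start.year,
            "month": start.month,
            "day": start.day,
        },
        # Durations are written as a number of seconds followed by "s".
        "repeatInterval": f"{repeat_hours * 3600}s",
    }


schedule = daily_schedule(date(2024, 1, 15))
```

The resulting dictionary can be placed under the schedule key of a transfer job request body.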
Now, let's move on to the data transfer options.
Data Transfer Options
Google offers multiple solutions for transferring your data to or from Cloud Storage or between file systems. We can:
Move or back up data from another cloud provider to Cloud Storage.
Move data between Cloud Storage buckets.
Move or back up data from on-premises storage, from either of the following:
Locations with good internet connectivity.
Locations with poor or no internet connectivity.
Transfer from Cloud Storage to a file system.
For different sources and destinations, we have different recommendations:
Transferring more than 1 TB from another Cloud Storage region: use Storage Transfer Service.
Now, let’s see how to transfer between Cloud Storage buckets next.
Transfer between Cloud Storage buckets
Storage Transfer Service can be used to transfer large amounts of data between Cloud Storage buckets, either within the same Google Cloud project or between different projects.
When transferring more than 1 terabyte (TB), we use Storage Transfer Service. It is a managed transfer option that provides out-of-the-box security, reliability, and performance. It also eliminates the need to optimize and maintain scripts and to handle retries.
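As an illustration, a bucket-to-bucket transfer job can be described by a small request body. The sketch below only builds that body as a Python dictionary; it does not call any API. The field names (transferSpec, gcsDataSource, gcsDataSink, bucketName) are assumed from the transferJobs REST resource, and the project and bucket names are placeholders.

```python
import json


def bucket_to_bucket_job(project_id: str, source_bucket: str,
                         sink_bucket: str, description: str = "") -> dict:
    """Build a request body for a Cloud Storage to Cloud Storage transfer job."""
    return {
        "projectId": project_id,
        "description": description,
        "status": "ENABLED",
        "transferSpec": {
            # Bucket names are given without the gs:// prefix.
            "gcsDataSource": {"bucketName": source_bucket},
            "gcsDataSink": {"bucketName": sink_bucket},
        },
    }


# Placeholder names, for illustration only.
job = bucket_to_bucket_job("my-project", "source-bucket", "sink-bucket",
                           description="Copy between buckets")
print(json.dumps(job, indent=2))
```

The source and sink buckets may live in different projects; only projectId, which owns the job itself, refers to a single project.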
Transfer data between file systems
Having covered transfers between Cloud Storage buckets, let’s see the steps involved in transferring data between file systems:
First, you need to create agent pools and install agents:
For this, first create a source agent pool.
Then, install agents for the source agent pool.
After that, create a destination agent pool and install agents for it.
The second step is to create a Cloud Storage bucket as an intermediary:
You will need to manage intermediary buckets.
Finally, you can create a transfer job to complete the process.
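The steps above come together in the transfer job definition, which names both agent pools, both directories, and the intermediary bucket. Here is a minimal sketch that builds such a request body. The field names (posixDataSource, posixDataSink, sourceAgentPoolName, sinkAgentPoolName, gcsIntermediateDataLocation) and the agent pool name pattern are assumed from the REST API as we understand it, and all concrete values are placeholders.

```python
def agent_pool_name(project_id: str, pool_id: str) -> str:
    """Return the full resource name of an agent pool (assumed pattern)."""
    return f"projects/{project_id}/agentPools/{pool_id}"


def filesystem_job(project_id: str, source_dir: str, source_pool: str,
                   sink_dir: str, sink_pool: str,
                   intermediate_bucket: str) -> dict:
    """Build a request body for a file system to file system transfer job."""
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            "sourceAgentPoolName": agent_pool_name(project_id, source_pool),
            "sinkAgentPoolName": agent_pool_name(project_id, sink_pool),
            "posixDataSource": {"rootDirectory": source_dir},
            "posixDataSink": {"rootDirectory": sink_dir},
            # Data passes through this bucket on its way to the sink.
            "gcsIntermediateDataLocation": {"bucketName": intermediate_bucket},
        },
    }


# Placeholder values, for illustration only.
fs_job = filesystem_job("my-project", "/mnt/source", "source-pool",
                        "/mnt/destination", "sink-pool", "staging-bucket")
```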
After this, let’s move on to the most important part which is permissions and roles.
Permissions and roles
Storage Transfer Service utilizes Identity and Access Management (IAM) permissions and roles for controlling who can access resources of the Storage Transfer Service.
The main types of resources available in the Storage Transfer Service are:
jobs,
operations, and
agent pools.
In the IAM policy hierarchy, jobs are the child resources of projects, and operations are child resources of jobs.
Permissions
Let’s check out the different permissions for different types of users. You can grant the following Storage Transfer Service permissions:
Transfer project permission
Permission: Description
storagetransfer.projects.getServiceAccount: Can read the GoogleServiceAccount used by the Storage Transfer Service to access Cloud Storage buckets.
Transfer job permission
The following table describes permissions for Storage Transfer Service jobs:
Permission: Description
storagetransfer.jobs.create: Can create new transfer jobs.
storagetransfer.jobs.delete: Can delete existing transfer jobs. Transfer jobs are deleted by calling the patch function. However, users must have this permission when deleting transfer jobs to avoid permission errors.
storagetransfer.jobs.get: Can retrieve specific jobs.
storagetransfer.jobs.list: Can list all transfer jobs.
storagetransfer.jobs.run: Can run all transfer jobs.
storagetransfer.jobs.update: Can update transfer job configurations without deleting them.
Transfer operations permissions
The following table describes permissions for Storage Transfer Service operations:
Permission: Description
storagetransfer.operations.assign: Used by transfer agents to assign operations.
storagetransfer.operations.cancel: Can cancel transfer operations.
storagetransfer.operations.get: Can get details of transfer operations.
storagetransfer.operations.list: Can list all transfer job operations.
storagetransfer.operations.pause: Can pause transfer operations.
storagetransfer.operations.report: Used by transfer agents to report operation status.
storagetransfer.operations.resume: Can resume paused transfer operations.
Transfer agent pool permissions
The following table describes permissions for file system transfer agent pools:
Permission: Description
storagetransfer.agentpools.create: Can create agent pools.
storagetransfer.agentpools.update: Can update agent pools.
storagetransfer.agentpools.delete: Can delete agent pools.
storagetransfer.agentpools.get: Can get information on specific agent pools.
storagetransfer.agentpools.list: Can list information for all agent pools in the project.
storagetransfer.agentpools.report: Used by transfer agents to report status.
Predefined roles
The platform provides three main predefined roles:
Storage Transfer Admin (roles/storagetransfer.admin)
Storage Transfer User (roles/storagetransfer.user)
Storage Transfer Viewer (roles/storagetransfer.viewer)
Custom Roles
You can also create and apply custom IAM roles to meet your organization's access requirements.
When you create custom roles, we recommend basing them on a combination of predefined roles to ensure that the correct permissions are included together.
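As a rough sketch, a custom role is a titled list of permissions. The example below assembles such a definition from permissions listed earlier in this article. The field names (title, description, includedPermissions, stage) are assumed from the IAM role resource, while the custom_role helper and the "Transfer Operator" role are hypothetical and exist only for illustration. The prefix check is an illustrative guard, not an IAM requirement.

```python
from typing import Iterable


def custom_role(title: str, description: str,
                permissions: Iterable[str]) -> dict:
    """Build a custom IAM role definition from a list of permissions."""
    unique = sorted(set(permissions))  # drop duplicates, keep a stable order
    # Illustrative guard: this sketch only accepts transfer permissions.
    unexpected = [p for p in unique if not p.startswith("storagetransfer.")]
    if unexpected:
        raise ValueError(f"unexpected permissions: {unexpected}")
    return {
        "title": title,
        "description": description,
        "includedPermissions": unique,
        "stage": "GA",
    }


# Hypothetical role: can view and run jobs, but not create or delete them.
operator_role = custom_role(
    "Transfer Operator",
    "Runs and monitors existing transfer jobs",
    [
        "storagetransfer.jobs.get",
        "storagetransfer.jobs.list",
        "storagetransfer.jobs.run",
        "storagetransfer.operations.get",
        "storagetransfer.operations.list",
    ],
)
```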
File system transfer permissions
Before we move on to the file system transfer permissions of the different accounts, we need to know about a special user known as the Google Cloud project administrator.
A Google Cloud project administrator is a user with resourcemanager.projects.setIamPolicy privileges. We can use it to grant Identity and Access Management (IAM) permissions or roles to the appropriate users and service accounts.
The only purpose of the Google Cloud project administrator account is to grant permissions to users and service accounts. It isn't required to start transfer jobs.
Administrator Accounts: The Storage Transfer Service administrator accounts are superuser accounts supporting colleagues that perform transfers. The admins manage transfer agents, set bandwidth usage limits, and can delete transfer jobs.
User Accounts: The Storage Transfer Service user accounts can be used to create and execute transfers. These user accounts typically don't have access to delete transfer jobs.
Service Accounts: The Storage Transfer Service uses a Google-managed service account to move your data. This account is automatically created the first time you create a transfer job, create an agent pool, call googleServiceAccounts.get, or visit the job creation page in the Google Cloud console.
Transfer agents: Storage Transfer Service transfer agents can be run with either the user's account or with a service account.
In order to grant the required permissions to the service agent, you must have the relevant permissions on the source bucket:
storage.buckets.getIamPolicy
storage.buckets.setIamPolicy
The Storage Legacy Bucket Owner role (roles/storage.legacyBucketOwner) or the Storage Admin role (roles/storage.admin) provide the required permissions.
Auto-granting permissions in the console
If you're using the console to create your transfer and have the permissions listed in User permissions, the service agent will automatically be granted the required permissions on your source bucket.
Required permissions
The following predefined roles together grant the required permissions:
One of:
Storage Object Viewer (roles/storage.objectViewer) if the transfer is to another Cloud Storage bucket.
Storage Object Creator (roles/storage.objectCreator) if the transfer is to a file system.
Plus one of:
Storage Legacy Bucket Writer (roles/storage.legacyBucketWriter) if object delete permission is required.
Storage Legacy Bucket Reader (roles/storage.legacyBucketReader) if object delete permission is not required.
These combined roles grant the required permissions.
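The role combination above can be expressed as a small helper. This is a minimal sketch that follows the rules exactly as listed in this article; the source_bucket_roles function and its parameters are hypothetical names introduced for the example.

```python
from typing import List


def source_bucket_roles(sink_is_file_system: bool,
                        delete_required: bool) -> List[str]:
    """Pick the predefined roles for the service agent on the source bucket."""
    # "One of": depends on where the data is being transferred to.
    if sink_is_file_system:
        object_role = "roles/storage.objectCreator"
    else:
        object_role = "roles/storage.objectViewer"
    # "Plus one of": depends on whether objects must be deleted.
    if delete_required:
        bucket_role = "roles/storage.legacyBucketWriter"
    else:
        bucket_role = "roles/storage.legacyBucketReader"
    return [object_role, bucket_role]


roles = source_bucket_roles(sink_is_file_system=False, delete_required=False)
```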
If you want to grant the required permissions to the service agent, you'll need to follow the steps below.
First, you need to find the service agent’s email.
Then, you can add the service agent to a bucket-level policy.
After this, let's see how to configure access to a sink.
The destination bucket does not need to belong to the same project as the service agent. The steps are the same regardless of which project the bucket is in.
User permissions
In order to grant the appropriate permissions to the service agent, you must have the relevant permissions on the destination bucket:
storage.buckets.getIamPolicy
storage.buckets.setIamPolicy
The Storage Legacy Bucket Owner role (roles/storage.legacyBucketOwner) or the Storage Admin role (roles/storage.admin) provide the required permissions.
Auto-granting permissions in the console
If you're using the console to create your transfer and have the permissions listed in User permissions, the service agent will automatically be granted the required permissions on your destination bucket.
Required permissions
The following predefined role grants the required permissions:
Storage Legacy Bucket Writer (roles/storage.legacyBucketWriter)
Any Cloud Storage role marked as a legacy role can only be granted at the bucket level.
Grant the required permissions
To grant the Storage Legacy Bucket Writer role to the service agent, follow the steps below.
First, find the service agent’s email.
Then, add the service agent to a bucket-level policy.
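The two steps above can be sketched as plain data manipulation on a bucket's IAM policy. The example below does not call any Google Cloud API; it only shows how a binding would be added to a policy dictionary. The service agent email pattern is an assumption that you should confirm for your project (for example with googleServiceAccounts.get), and the helper names and project number are hypothetical.

```python
import copy


def service_agent_email(project_number: str) -> str:
    """Return the service agent's email (assumed naming pattern)."""
    return (f"project-{project_number}"
            "@storage-transfer-service.iam.gserviceaccount.com")


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of the policy with the member granted the role."""
    updated = copy.deepcopy(policy)  # leave the caller's policy untouched
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder project number, for illustration only.
member = "serviceAccount:" + service_agent_email("123456789012")
policy = add_binding({"bindings": []},
                     "roles/storage.legacyBucketWriter", member)
```

In practice, you would read the bucket's current policy, apply a change like this, and write the policy back, which is why the storage.buckets.getIamPolicy and storage.buckets.setIamPolicy permissions are needed.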
Configure access to a sink: file systems
For transfers involving a file system as the sink, you must install agents on a machine with access to your file system. If agents in the same agent pool are installed on different machines, they must all have uniform access to the file system, or the transfer will fail.
Frequently Asked Questions
What is a data transfer service?
The Data Transfer Service is a product that allows users to move or back up data to a Cloud Storage bucket from other cloud storage providers as well as from on-premises storage. We can also move data from one Cloud Storage bucket to another so that it is available to different groups of users or applications.
What are different ways of data transfer to GCP?
Suppose we're transferring data between our private data center and the Google Cloud. There are 3 main approaches: A public internet connection by using a public API, Direct Peering by using a public API, and Cloud Interconnect by using a private API.
How do you transfer data from on-premise to GCP?
The Transfer Appliance is a physical server that you request from Google. You connect the appliance to your data center, upload the data to it, and ship it back to Google. All the data that you upload to the appliance is encrypted. Google then uploads the data to your Cloud Storage account for you.
What is storage transfer in cloud?
Storage Transfer Service is a product that allows you to move or back up data to a Cloud Storage bucket from other cloud storage providers or from a local or cloud POSIX file system. It also allows you to move data from one Cloud Storage bucket to another so that it is available to different groups of users or applications.
How does Google transfer work?
Google's content transfer tool enables you to transfer email and Google Drive files from your organizational Google account to another Google account. It includes all your emails in Gmail, all the documents you own in Google Drive, and all the files in My Drive that you have edit access to.
Conclusion
In this article, we have studied Storage Transfer Service in the Google Cloud Platform in detail. We have also discussed transfers between file systems, transfers between Cloud Storage buckets, and the permissions of Storage Transfer Service.