Table of contents
1. Introduction
2. Managing Data in AWS Data Exchange
2.1. Assets
2.2. Revisions
2.3. Data sets
2.4. Amazon S3 object data Set
2.5. API Data Set
2.6. Amazon Redshift Data Set
3. What are Jobs?
3.1. Job Properties
4. Security in Data Exchange
4.1. AWS Marketplace Catalog API
5. AddRevisions in AWS Data Exchange
5.1. Set up IAM Permissions
5.2. Access the AWS Marketplace Catalog API
5.3. Start a Change Request
5.4. Check the status of your change set
5.5. AddRevisions Exceptions
6. AddDataSets
6.1. Set up IAM Permissions
6.2. Access the AWS Marketplace Catalog API
6.3. Start a Change Request
6.4. AddDataSets Exceptions
7. Frequently Asked Questions
7.1. What is the connection between the Availability Zones and the Region?
7.2. What is the limit of the number of S3 buckets you can have?
7.3. What should we do if we want to gain access to Amazon Simple Storage buckets and use the data for access audits?
7.4. What are the native AWS security logging capabilities?
8. Conclusion

Last Updated: Mar 27, 2024

Data Exchange in AWS

Author ANKIT MISHRA

Introduction

AWS Data Exchange is a service that allows AWS customers to easily find, subscribe to, and use third-party data in the AWS Cloud.

As a subscriber, you can browse and subscribe to thousands of data products from qualified data providers. Then, using the AWS Data Exchange console or APIs, you can create, view, manage, and access data sets for use with several AWS analytics and machine learning services. Anyone with an AWS account can be an AWS Data Exchange subscriber. See Subscribing to data products on AWS Data Exchange for further information on how to do so.

For providers, AWS Data Exchange eliminates the need to build and maintain any data delivery, entitlement, or billing technology. It gives providers a secure, transparent, and reliable channel to reach AWS customers and to grant subscriptions to existing customers more efficiently. Becoming an AWS Data Exchange provider requires a few steps to determine eligibility.

Let's see how AWS Data Exchange works, starting with how it organizes data.

Managing Data in AWS Data Exchange

AWS Data Exchange organizes data using three building blocks:

  • Assets – A piece of data.
  • Revisions – A container for one or more assets.
  • Data sets – A series of one or more revisions.

These three building blocks form the foundation of a product, which you manage using either the AWS Data Exchange console or the AWS Data Exchange API.

You can use the AWS Data Exchange console, the AWS Command Line Interface (AWS CLI), your own REST client, or any of the AWS SDKs to create, view, update, or delete data sets.
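
To make the hierarchy concrete, here is a minimal sketch of the three building blocks as plain Python data classes. This is only a mental model: the class names, fields, and the `latest_finalized` helper are illustrative assumptions and are not part of any AWS SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Asset:
    """A single piece of data, for example one file or S3 object."""
    name: str


@dataclass
class Revision:
    """A container for one or more assets."""
    comment: str
    assets: List[Asset] = field(default_factory=list)
    finalized: bool = False  # a revision is finalized before subscribers see it


@dataclass
class DataSet:
    """A series of one or more revisions, oldest first."""
    name: str
    revisions: List[Revision] = field(default_factory=list)

    def latest_finalized(self) -> Optional[Revision]:
        """Return the newest finalized revision, or None if there is none."""
        for revision in reversed(self.revisions):
            if revision.finalized:
                return revision
        return None


# Illustrative data: two revisions, only the first one finalized.
january = Revision("January prices", [Asset("prices-2024-01.csv")], finalized=True)
february = Revision("February prices", [Asset("prices-2024-02.csv")])
data_set = DataSet("Daily prices", [january, february])
```

In this model, `data_set.latest_finalized()` returns the January revision until the February revision is finalized as well.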

Assets

In AWS Data Exchange, assets are the data itself. The asset type determines how the data is delivered to the subscriber through the data sets and products that contain it.

Any of the following can be an asset:

  • A file stored on your local computer.
  • A file stored as an object in Amazon Simple Storage Service (Amazon S3).
  • A REST API created with Amazon API Gateway.
  • An Amazon Redshift data set.

Revisions

A revision is a container for one or more assets. You use revisions to update data in Amazon S3. For example, you can group a collection of .csv files, or a single .csv file and a dictionary, to create a revision. As new data becomes available, you create new revisions and add assets to them. After you create and finalize a revision using the AWS Data Exchange console, it is immediately available to subscribers.
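
The same create-then-finalize flow can also be driven from an SDK. The sketch below assumes `client` is a boto3 AWS Data Exchange client (for example `boto3.client("dataexchange")`) and uses its `create_revision` and `update_revision` operations. The two wrapper function names are hypothetical, and importing assets into the revision, which normally happens between the two calls, is left out.

```python
from typing import Any


def create_revision(client: Any, data_set_id: str, comment: str) -> str:
    """Create an empty revision in an existing data set and return its ID.

    `client` is assumed to be boto3.client("dataexchange").
    """
    response = client.create_revision(DataSetId=data_set_id, Comment=comment)
    return response["Id"]


def finalize_revision(client: Any, data_set_id: str, revision_id: str) -> bool:
    """Mark a revision as finalized once its assets have been added."""
    response = client.update_revision(
        DataSetId=data_set_id,
        RevisionId=revision_id,
        Finalized=True,
    )
    return bool(response.get("Finalized"))
```

Passing the client in as an argument keeps the functions easy to exercise with a stub before pointing them at a real account.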

Data sets

In AWS Data Exchange, a data set is a collection of data that might change over time.

When subscribers access an API data set, they get a data set containing API assets. These assets let subscribers make API calls to AWS Data Exchange-managed endpoints, which are then proxied through to the provider's endpoints.

When subscribers access an Amazon S3 data set, they get access to a specific revision of the data set. This structure lets providers update the data available in a data set over time without worrying about changes to historical data.

Amazon S3 object data Set

  • An Amazon S3 object data set is a data set that contains flat files of any type that Amazon S3 allows.
  • You can export data as a data subscriber either locally (to your computer) or to your Amazon S3 bucket.
  • You can import any type of flat file from your Amazon S3 bucket and add it to the data set as a data provider.

API Data Set

An API data set is a collection of API assets. Subscribers can use API assets to perform API calls to AWS Data Exchange-managed endpoints, which are subsequently proxied to provider endpoints.

As a data provider, you create an API in Amazon API Gateway and add it to the data set to grant subscribers access to your API.

Amazon Redshift Data Set

An Amazon Redshift data set contains AWS Data Exchange datashares for Amazon Redshift. When you subscribe to a data set with datashares, you are added as a consumer of the datashare. This grants you read-only access to the schemas, tables, views, and user-defined functions that the provider added to the datashares.

As a data subscriber, you can create a database from the datashare in Amazon Redshift and then query live data without extracting, transforming, or loading files. You are automatically granted access to the datashare when your subscription is activated, and you lose access when your subscription expires.

As a data provider, you create a datashare in Amazon Redshift and add it to the data set to grant subscribers access to your datashare.

What are Jobs?

AWS Data Exchange jobs perform asynchronous import or export activities.

As a provider, you can create and manage the data sets that you want to publish to a product. You can download (export) your assets or revisions, or copy them to Amazon Simple Storage Service (Amazon S3) or to a signed URL. Providers can also import assets from an Amazon API Gateway API or from an Amazon Redshift datashare.

As a subscriber, you can view and access the data sets that you are entitled to through a subscription. The API operations let you download (export) or copy your entitled data sets to Amazon S3 for use with several AWS analytics and machine learning services.

You can use the AWS Management Console, the AWS Command Line Interface (AWS CLI), your own REST application, or one of the AWS SDKs to create or copy assets or revisions through jobs.

Jobs are deleted 90 days after they are created.

Job Properties

Job ID - A unique id for the job generated when the job is created.

Job type - The following job types are supported:

  • Import from Amazon Simple Storage Service (Amazon S3)
  • Import from a signed URL
  • Import from an Amazon API Gateway API
  • Import from an AWS Data Exchange datashare for Amazon Redshift
  • Export to Amazon S3
  • Export to a signed URL

Amazon Resource Name (ARN) - A unique identifier for AWS resources.

Job state - The job states are WAITING, IN_PROGRESS, COMPLETED, CANCELLED, ERROR, and TIMED_OUT. A newly created job stays in the WAITING state until it is started.

Job details - Information about the operation the job performs, such as the export destination or import source.
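
As a rough illustration of the job lifecycle described above, the sketch below creates a job, starts it, and polls until it reaches a final state. It assumes `client` is a boto3 AWS Data Exchange client (`boto3.client("dataexchange")`) with its `create_job`, `start_job`, and `get_job` operations. The helper names, the polling interval, and the retry limit are illustrative choices, not values taken from AWS.

```python
import time
from typing import Any, Dict

# States after which a job will not change any more.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "ERROR", "TIMED_OUT"}


def import_from_s3_details(
    data_set_id: str, revision_id: str, bucket: str, key: str
) -> Dict[str, Any]:
    """Build the Details payload for an IMPORT_ASSETS_FROM_S3 job."""
    return {
        "ImportAssetsFromS3": {
            "DataSetId": data_set_id,
            "RevisionId": revision_id,
            "AssetSources": [{"Bucket": bucket, "Key": key}],
        }
    }


def run_job(
    client: Any,
    job_type: str,
    details: Dict[str, Any],
    poll_seconds: float = 5.0,
    max_polls: int = 60,
) -> str:
    """Create a job, start it, and return the final state it reaches."""
    job_id = client.create_job(Type=job_type, Details=details)["Id"]
    client.start_job(JobId=job_id)  # moves the job out of WAITING
    for _ in range(max_polls):
        state = client.get_job(JobId=job_id)["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A caller would typically treat any returned state other than COMPLETED as a failure and inspect the job for error details.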

Security in Data Exchange

At AWS, cloud security is of the utmost importance. As an AWS customer, you benefit from data centers and a network architecture built to meet the requirements of the most security-sensitive organizations.

Security is a responsibility shared between AWS and you. The shared responsibility model describes this as security of the cloud and security in the cloud:

Security of the cloud - AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of AWS security as part of the AWS compliance programs. See AWS Services in Scope by Compliance Program for more information on the compliance programs that apply to AWS Data Exchange.

Security in the cloud - Your responsibility is determined by the AWS services you use. You are also responsible for other factors, such as the sensitivity of your data, your company's requirements, and applicable laws and regulations.

For data protection, we recommend that you protect AWS account credentials and set up individual users with AWS Identity and Access Management (IAM). That way, each user is granted only the permissions needed to do their job. We also advise you to secure your data in the following ways:

  • With each account, use multi-factor authentication (MFA).
  • To communicate with AWS resources, use SSL/TLS. TLS 1.2 or later is recommended.
  • Configure AWS CloudTrail to log API and user activity.
  • Use AWS encryption solutions and all AWS service default security measures.
  • Utilize advanced managed security services such as Amazon Macie, which aids in the discovery and protection of personal data stored in Amazon S3.
  • Use a FIPS endpoint if you need FIPS 140-2 validated cryptographic modules when accessing AWS through a command-line interface or an API. For the FIPS endpoints that are currently available, see the AWS documentation on Federal Information Processing Standard (FIPS) 140-2.

According to the official AWS Data Exchange documentation:

We strongly advise you to never include confidential or sensitive information, such as your customers' email addresses, into tags or free-form entries like the Name field. This includes using the console, API, AWS CLI, or AWS SDKs to interact with AWS Data Exchange or other AWS services. Any information entered into tags or free-form fields for names may be used for billing or troubleshooting logs. If you offer a URL to an external server, we strongly advise you not to include any credentials in the URL to validate your request to that site.

  • AWS Data Exchange encrypts all data products stored in the service at rest. This encryption happens automatically and requires no further configuration.
  • AWS Data Exchange employs Transport Layer Security (TLS) and client-side encryption for encryption in transit. AWS Data Exchange always communicates via HTTPS, ensuring that your data is always encrypted in transit. When you utilize AWS Data Exchange, this encryption is enabled by default.

AWS Marketplace Catalog API

To update your AWS Data Exchange products, use the AWS Marketplace Catalog API. You can view your products with the ListEntities and DescribeEntity API calls. To update an AWS Data Exchange product, you first create a new change set, a Catalog API resource that represents an asynchronous product management operation.

Keep the following in mind when working with the Catalog API:

  • Each AWS Data Exchange product is represented in the Catalog API as an Entity.
  • AWS Data Exchange products have DataProduct as the EntityType.
  • Each product can have only one running change set at a time. This means that you can't create a second change set until the first one has finished running.
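
The ListEntities and DescribeEntity calls mentioned above could be wrapped as in the sketch below. It assumes `client` is a boto3 AWS Marketplace Catalog client (`boto3.client("marketplace-catalog", region_name="us-east-1")`) and relies on its `list_entities` and `describe_entity` operations. The two function names are hypothetical helpers.

```python
import json
from typing import Any, Dict, Iterator


def iter_data_products(client: Any) -> Iterator[Dict[str, Any]]:
    """Yield an entity summary for every AWS Data Exchange product.

    Follows NextToken so that all result pages are returned.
    """
    kwargs: Dict[str, Any] = {"Catalog": "AWSMarketplace", "EntityType": "DataProduct"}
    while True:
        page = client.list_entities(**kwargs)
        yield from page.get("EntitySummaryList", [])
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token


def describe_product(client: Any, entity_id: str) -> Dict[str, Any]:
    """Return the product details, which the API delivers as a JSON string."""
    response = client.describe_entity(Catalog="AWSMarketplace", EntityId=entity_id)
    return json.loads(response["Details"])
```

Each yielded summary carries the `EntityId` that later change sets refer to as the product ID.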

AddRevisions in AWS Data Exchange

Create a change set of type AddRevisions to publish new data set revisions to your AWS Data Exchange product. To do so, use the StartChangeSet API operation and supply the change type, product id, product type, and specifics such as data set and revision Amazon Resource Names (ARNs).

A single AddRevisions change set can update multiple products. Each change applies to a single data set within a product, so if a product has several data sets that need updating, create a separate change for each data set.

This section walks through the steps needed to publish new AWS Data Exchange data set revisions to an existing product. The high-level steps are as follows.

Set up IAM Permissions

To use the AWS Marketplace Catalog API, you must first set up AWS Identity and Access Management (IAM) permissions. These permissions are in addition to the ones required to use AWS Data Exchange.

  • Navigate to the IAM console in your browser and sign in using an AWS account that can manage IAM permissions.
  • Select Policies from the left navigation pane.
  • Select Create policy.
  • Select the JSON tab and enter a policy that grants the required permissions. A policy that grants complete access to the AWS Marketplace Catalog API is the simplest starting point; you can limit access as needed for your use case.
  • Select Review policy.
  • Give the policy a name (for example, CatalogAPIFullAccess) and click Create Policy.
  • Select the users, groups, or roles to which you wish to apply the policy using the IAM console.
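
The policy document for the JSON tab is not reproduced in this article. As a hedged sketch, the snippet below builds one possible full-access policy and prints it as JSON for pasting into the console. The action list is an assumption based on the Catalog API operation names used in this article plus `dataexchange:PublishDataSet`, so verify it against the current AWS documentation before relying on it.

```python
import json

# Assumed action names; check them against the AWS documentation.
CATALOG_API_FULL_ACCESS = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:CancelChangeSet",
                "aws-marketplace:ListChangeSets",
                "aws-marketplace:DescribeEntity",
                "aws-marketplace:StartChangeSet",
                "aws-marketplace:ListEntities",
                "aws-marketplace:DescribeChangeSet",
                "dataexchange:PublishDataSet",
            ],
            # "*" allows the actions on every resource; narrow this if you can.
            "Resource": "*",
        }
    ],
}

policy_json = json.dumps(CATALOG_API_FULL_ACCESS, indent=2)
print(policy_json)
```

To limit access, remove the actions you do not need from the `Action` list before creating the policy.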

Access the AWS Marketplace Catalog API

Use the following endpoint from your HTTP client to access the AWS Marketplace Catalog API.

catalog.marketplace.us-east-1.amazonaws.com

Start a Change Request

Get your product ID from the AWS Data Exchange console and make a note of it; it is the entity ID for the change request.

Send a StartChangeSet request with an AddRevisions change type. In the request body, the details of the AddRevisions change object should include the following:

DataSetArn – The data set to which revisions should be added.

RevisionArns – The revisions that you want to publish to the product's data set. See AWS Data Exchange quotas for more information on the number of revisions that a single update can contain.
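
A minimal sketch of how such a request could be assembled is shown below. The function name is hypothetical. The payload layout (a `Catalog` of `AWSMarketplace`, an entity type of `DataProduct@1.0`, and `Details` passed as a JSON-encoded string) reflects my understanding of the Catalog API and should be checked against the current API reference.

```python
import json
from typing import Any, Dict, Sequence


def build_add_revisions_request(
    product_id: str, data_set_arn: str, revision_arns: Sequence[str]
) -> Dict[str, Any]:
    """Build StartChangeSet parameters for one AddRevisions change.

    One change covers one data set; add more entries to "ChangeSet"
    to update several data sets.
    """
    details = {"DataSetArn": data_set_arn, "RevisionArns": list(revision_arns)}
    return {
        "Catalog": "AWSMarketplace",
        "ChangeSet": [
            {
                "ChangeType": "AddRevisions",
                "Entity": {"Identifier": product_id, "Type": "DataProduct@1.0"},
                # The API expects Details as a JSON string, not a nested object.
                "Details": json.dumps(details),
            }
        ],
    }
```

With boto3, the result could be sent as `boto3.client("marketplace-catalog", region_name="us-east-1").start_change_set(**request)`; the response contains the change set ID used to check the status afterwards.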

Check the status of your change set

After starting the change request with the StartChangeSet API operation, you can check its status with the DescribeChangeSet operation. Pass the change set ID that was returned in the StartChangeSet API response.
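
Because change sets run asynchronously, the status check is usually repeated until the change set finishes. The sketch below assumes `client` is a boto3 AWS Marketplace Catalog client and uses its `describe_change_set` operation. The function name, the polling interval, and the retry limit are illustrative assumptions.

```python
import time
from typing import Any

# Statuses after which a change set no longer changes.
FINAL_STATUSES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def wait_for_change_set(
    client: Any,
    change_set_id: str,
    poll_seconds: float = 10.0,
    max_polls: int = 60,
) -> str:
    """Poll DescribeChangeSet until the change set reaches a final status."""
    for _ in range(max_polls):
        response = client.describe_change_set(
            Catalog="AWSMarketplace", ChangeSetId=change_set_id
        )
        status = response["Status"]
        if status in FINAL_STATUSES:
            return status
        time.sleep(poll_seconds)  # still PREPARING or APPLYING
    raise TimeoutError(f"change set {change_set_id} is still running")
```

Since a product can have only one running change set at a time, waiting for a final status before starting the next change set avoids conflicts.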

AddRevisions Exceptions

When using the AWS Marketplace Catalog API with AWS Data Exchange, the following exceptions may occur:

  • REVISION_NOT_FOUND
  • REVISION_NOT_FINALIZED
  • DATA_SET_NOT_FOUND
  • INVALID_INPUT
  • DATA_SET_NOT_PUBLISHED
  • REVISION_DUPLICATE_PROVIDED
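
When a change set fails, these codes can be read from the DescribeChangeSet response. The sketch below assumes the response shape returned by boto3's `describe_change_set`, where each change may carry an `ErrorDetailList` of `ErrorCode` and `ErrorMessage` pairs. The function name and the sample response are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def collect_errors(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (error code, message) pairs from a DescribeChangeSet response."""
    errors: List[Tuple[str, str]] = []
    for change in response.get("ChangeSet", []):
        for detail in change.get("ErrorDetailList", []):
            errors.append((detail["ErrorCode"], detail.get("ErrorMessage", "")))
    return errors


# Hand-written sample response, not real API output.
sample_response = {
    "Status": "FAILED",
    "ChangeSet": [
        {
            "ChangeType": "AddRevisions",
            "ErrorDetailList": [
                {
                    "ErrorCode": "REVISION_NOT_FINALIZED",
                    "ErrorMessage": "Example message for illustration.",
                }
            ],
        }
    ],
}
errors = collect_errors(sample_response)
```

Checking the code rather than the message text makes the handling robust to wording changes in the messages.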

AddDataSets

Begin a change set of type AddDataSets to add data sets to your AWS Data Exchange product. To do so, use the StartChangeSet API operation and supply the change type, product identifier, product type, and specifics such as the data set Amazon Resource Name (ARN).

This section walks through the process of adding new AWS Data Exchange data sets to a published product. The high-level steps are as follows.

Set up IAM Permissions

To use the AWS Marketplace Catalog API, you must first set up AWS Identity and Access Management (IAM) permissions. These permissions are in addition to the ones required to use AWS Data Exchange.

  • Navigate to the IAM console in your browser and sign in using an AWS account that can manage IAM permissions.
  • Select Policies from the left navigation pane.
  • Select Create policy.
  • Select the JSON tab and enter a policy that grants the required permissions. A policy that grants complete access to the AWS Marketplace Catalog API is the simplest starting point; you can limit access as needed for your use case.
  • Select Review policy.
  • Give the policy a name (for example, CatalogAPIFullAccess) and click Create Policy.
  • Select the users, groups, or roles to which you wish to apply the policy using the IAM console.

Access the AWS Marketplace Catalog API

Use the following endpoint from your HTTP client to access the AWS Marketplace Catalog API.

catalog.marketplace.us-east-1.amazonaws.com

Get your product ID from the AWS Data Exchange console.

Start a Change Request

Start a change request to add a data set to your product.

  • Get your product ID from the AWS Data Exchange console and make a note of it; it is the entity ID for the change request.
  • Send a StartChangeSet request with an AddDataSets change type.

After starting the change request with the StartChangeSet API operation, you can check its status with the DescribeChangeSet operation. Pass the change set ID that was returned in the StartChangeSet API response.
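
As a sketch of the AddDataSets request, the function below assembles the StartChangeSet parameters. The function name is hypothetical, and the shape of `Details` (a `DataSets` list of objects with an `Arn` key, encoded as a JSON string) is my assumption about the Catalog API format; confirm it in the API reference before use.

```python
import json
from typing import Any, Dict, Sequence


def build_add_data_sets_request(
    product_id: str, data_set_arns: Sequence[str]
) -> Dict[str, Any]:
    """Build StartChangeSet parameters for one AddDataSets change."""
    details = {"DataSets": [{"Arn": arn} for arn in data_set_arns]}
    return {
        "Catalog": "AWSMarketplace",
        "ChangeSet": [
            {
                "ChangeType": "AddDataSets",
                "Entity": {"Identifier": product_id, "Type": "DataProduct@1.0"},
                # Details is sent as a JSON string, not a nested object.
                "Details": json.dumps(details),
            }
        ],
    }
```

The returned dictionary can be passed to a boto3 `marketplace-catalog` client as `client.start_change_set(**request)`.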

AddDataSets Exceptions

  • DATA_SET_NOT_FOUND
  • INVALID_INPUT
  • DATA_SET_ALREADY_PUBLISHED
  • DATA_SET_DUPLICATE_PROVIDED

I hope all the discussed sections are clearly understood. 

Frequently Asked Questions

What is the connection between the Availability Zones and the Region?

AWS Regions are geographically separate areas, such as US West (N. California) and Asia Pacific (Mumbai). Availability Zones, on the other hand, are isolated locations within a Region. Workloads can be replicated across the Availability Zones of a Region as needed.

What is the limit of the number of S3 buckets you can have?

At the time of writing, the default limit is 100 S3 buckets per AWS account. This is a soft limit that you can raise by requesting a service quota increase.

What should we do if we want to gain access to Amazon Simple Storage buckets and use the data for access audits?

AWS CloudTrail, which records and tracks API calls and also supports Amazon S3 (including object-level activity), can be used in this scenario.

What are the native AWS security logging capabilities?

The native AWS security logging capabilities include AWS CloudTrail, AWS Config, AWS detailed billing reports, Amazon S3 access logs, Elastic Load Balancing access logs, Amazon CloudFront access logs, and Amazon VPC Flow Logs, among others.

Conclusion

In this article, we discussed AWS Data Exchange in detail: how it organizes data, how jobs work, its security model, and how to publish revisions and data sets through the AWS Marketplace Catalog API. I hope you now have a fair understanding of the topic.

To read more about the topic, you can refer to Important AWS Interview Questions, SQS in AWS, and AWS Archives.

Happy Learning!
