What are Jobs?
AWS Data Exchange jobs perform asynchronous import or export activities.
As a provider, you can create and manage the data sets that you want to publish to a product. You can download (export) your assets or revisions, or copy them to Amazon Simple Storage Service (Amazon S3) or a signed URL. Providers can also import assets from an Amazon API Gateway API or an Amazon Redshift datashare.
As a subscriber, you can view and access the data sets that you are entitled to through a subscription. The API operations let you download (export) or copy your entitled data sets to Amazon S3 for use with a variety of AWS analytics and machine learning services.
You can use the AWS Management Console, the AWS Command Line Interface (AWS CLI), your own REST application, or one of the AWS SDKs to create jobs that export or copy assets or revisions.
Jobs are deleted 90 days after they are created.
Job Properties
Job ID - A unique ID for the job, generated when the job is created.
Job type - The following job types are supported:
- Import from Amazon Simple Storage Service (Amazon S3)
- Import from a signed URL
- Import from an Amazon API Gateway API
- Import from an AWS Data Exchange datashare for Amazon Redshift
- Export to Amazon S3
- Export to a signed URL
Amazon Resource Name (ARN) - A unique identifier for Amazon Web Services (AWS) resources.
Job state - The job states are WAITING, IN_PROGRESS, COMPLETED, CANCELLED, ERROR, and TIMED_OUT. A newly created job stays in the WAITING state until it is started.
Job details - Information about the operation the job performs, such as export destination or import source details.
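To make the job lifecycle concrete, here is a minimal Python sketch of an export-to-S3 job: create it (WAITING), start it, then poll until it reaches a terminal state. It assumes a boto3-style AWS Data Exchange client is passed in. The function name and all IDs, bucket, and key values are hypothetical placeholders, and the polling interval is an arbitrary choice.

```python
import time

# States after which a job will not change any further.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "ERROR", "TIMED_OUT"}


def export_asset_to_s3(client, data_set_id, revision_id, asset_id,
                       bucket, key, poll_seconds=5.0):
    """Create, start, and wait for an export job; return its final state.

    `client` is assumed to behave like boto3.client("dataexchange").
    """
    job = client.create_job(
        Type="EXPORT_ASSETS_TO_S3",
        Details={
            "ExportAssetsToS3": {
                "DataSetId": data_set_id,
                "RevisionId": revision_id,
                "AssetDestinations": [
                    {"AssetId": asset_id, "Bucket": bucket, "Key": key}
                ],
            }
        },
    )
    job_id = job["Id"]  # the job exists now but is still WAITING
    client.start_job(JobId=job_id)

    # Poll until the job leaves WAITING / IN_PROGRESS.
    while True:
        state = client.get_job(JobId=job_id)["State"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, the client would typically come from `boto3.client("dataexchange")`. A production version would probably also cap the number of polls.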
Security in Data Exchange
At AWS, cloud security is the highest priority. As an AWS customer, you benefit from data centers and a network architecture built to meet the requirements of the most security-sensitive organizations.
Security is a responsibility shared between AWS and you. The shared responsibility model describes this as security of the cloud and security in the cloud:
Security of the cloud - AWS is responsible for protecting the infrastructure that runs AWS services in the AWS Cloud. AWS also provides you with services that you can use securely. As part of the AWS compliance programs, third-party auditors regularly test and verify the effectiveness of this security. See AWS Services in Scope by Compliance Program for more information on the compliance programs that apply to AWS Data Exchange.
Security in the cloud - Your responsibility is determined by the AWS services you use. You are also responsible for other factors, such as the sensitivity of your data, your company's requirements, and applicable laws and regulations.
For data protection purposes, we recommend that you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM). That way, each user is granted only the permissions required to carry out their job duties. We also advise you to secure your data in the following ways:
- With each account, use multi-factor authentication (MFA).
- To communicate with AWS resources, use SSL/TLS. TLS 1.2 or later is recommended.
- Configure AWS CloudTrail to log API and user activity.
- Use AWS encryption solutions and all AWS service default security measures.
- Utilize advanced managed security services such as Amazon Macie, which aids in the discovery and protection of personal data stored in Amazon S3.
- Use a FIPS endpoint if you need FIPS 140-2 validated cryptographic modules when accessing AWS through a command-line interface or an API. For more information on the FIPS endpoints that are currently available, see the AWS documentation on FIPS 140-2.
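As a small illustration of the TLS recommendation in the list above, the following sketch builds a client-side TLS context with Python's standard `ssl` module that refuses anything older than TLS 1.2. The function name is hypothetical. AWS SDKs manage TLS themselves, so this only matters if you write your own HTTPS client.

```python
import ssl


def tls12_context():
    """Return a default client context that rejects TLS versions below 1.2."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to standard-library clients, for example `urllib.request.urlopen(url, context=tls12_context())`.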
According to the official AWS Data Exchange documentation:
We strongly advise you never to put confidential or sensitive information, such as your customers' email addresses, into tags or free-form fields such as the Name field. This applies whether you work with AWS Data Exchange or other AWS services through the console, API, AWS CLI, or AWS SDKs. Any information entered into tags or free-form name fields may be used for billing or diagnostic logs. If you provide a URL to an external server, we strongly advise you not to include credentials in the URL to validate your request to that server.
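The advice about URLs can be checked mechanically. This is a minimal sketch, using only the Python standard library, of a guard that flags URLs carrying a username or password. The function name is hypothetical, and the check only covers the `user:password@host` form, not secrets passed as query parameters.

```python
from urllib.parse import urlsplit


def has_embedded_credentials(url):
    """Return True if the URL contains a username or password component."""
    parts = urlsplit(url)
    return parts.username is not None or parts.password is not None
```

A caller could run this before storing or submitting any external URL and reject the ones that return True.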
- AWS Data Exchange encrypts all data products stored in the service at rest. This happens automatically and requires no further configuration.
- AWS Data Exchange employs Transport Layer Security (TLS) and client-side encryption for encryption in transit. Communication with AWS Data Exchange always takes place over HTTPS, so your data is always encrypted in transit. This encryption is enabled by default.
AWS Marketplace Catalog API
To update your AWS Data Exchange products, use the AWS Marketplace Catalog API. You can view your products with the ListEntities and DescribeEntity API calls. To update an AWS Data Exchange product, you first create a new change set, a Catalog API resource that represents an asynchronous product management operation.
Keep the following in mind when working with the Catalog API:
- Each AWS Data Exchange product is represented in the Catalog API as an Entity.
- AWS Data Exchange products have DataProduct as the EntityType.
- Each product can have only one change set running at a time. This means that you can't create a second change set until the first one has finished running.
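For illustration, viewing your products with ListEntities might look like the sketch below. It assumes a boto3-style AWS Marketplace Catalog client is passed in. The function name is hypothetical, and the response field names (`EntitySummaryList`, `NextToken`) follow my understanding of the API and should be checked against the reference.

```python
def list_data_products(client):
    """Yield (entity_id, name) for each DataProduct entity, page by page.

    `client` is assumed to behave like boto3.client("marketplace-catalog").
    """
    kwargs = {"Catalog": "AWSMarketplace", "EntityType": "DataProduct"}
    while True:
        page = client.list_entities(**kwargs)
        for summary in page.get("EntitySummaryList", []):
            yield summary["EntityId"], summary.get("Name")
        token = page.get("NextToken")
        if not token:
            return
        kwargs["NextToken"] = token  # ask for the next page
```

Each entity ID returned here can then be passed to DescribeEntity to inspect a single product.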
AddRevisions in AWS Data Exchange
Create a change set of type AddRevisions to publish new data set revisions to your AWS Data Exchange product. To do so, use the StartChangeSet API operation and supply the change type, product id, product type, and specifics such as data set and revision Amazon Resource Names (ARNs).
Multiple products can be updated in a single AddRevisions change set. Each change inside a product is limited to a single data set. If your product has many data sets that need to be updated, create a separate change for each data set.
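The rule of one change per data set can be sketched as a small helper that turns a mapping of data set ARNs to revision ARNs into a list of AddRevisions changes. The function name is hypothetical. The `DataProduct@1.0` type string and passing `Details` as a JSON-encoded string are assumptions about the request format, not something stated above.

```python
import json


def add_revisions_changes(product_id, revisions_by_data_set):
    """Build one AddRevisions change per data set for a single product."""
    return [
        {
            "ChangeType": "AddRevisions",
            # Assumed entity type string for AWS Data Exchange products.
            "Entity": {"Type": "DataProduct@1.0", "Identifier": product_id},
            # Assumed: Details is sent as a JSON string, not a nested object.
            "Details": json.dumps(
                {"DataSetArn": data_set_arn, "RevisionArns": list(revision_arns)}
            ),
        }
        for data_set_arn, revision_arns in revisions_by_data_set.items()
    ]
```

The resulting list would be supplied as the set of changes in a single StartChangeSet request.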
This section walks you through the steps needed to publish new AWS Data Exchange data set revisions to an existing product. The high-level steps of the tutorial are as follows.
Set up IAM Permissions
To use the AWS Marketplace Catalog API, you must first set up AWS Identity and Access Management (IAM) permissions. These permissions are in addition to the ones required to use AWS Data Exchange.
- Navigate to the IAM console in your browser and sign in using an AWS account that can manage IAM permissions.
- Select Policies from the left navigation pane.
- Select Create policy.
- Select the JSON tab and grant the following permissions. This grants complete access to the AWS Marketplace Catalog API. You can limit access as needed for your use case.
- Select Review policy.
- Give the policy a name (for example, CatalogAPIFullAccess) and click Create Policy.
- Select the users, groups, or roles to which you wish to apply the policy using the IAM console.
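The policy document itself is not reproduced in the steps above. As a rough sketch, a policy granting broad access to the Catalog API could be generated as follows. The action list reflects my understanding of the relevant IAM actions and is an assumption to verify against the IAM reference. The function and constant names are hypothetical.

```python
import json

# Assumed set of actions; trim this list to limit access for your use case.
CATALOG_API_ACTIONS = [
    "aws-marketplace:CancelChangeSet",
    "aws-marketplace:ListChangeSets",
    "aws-marketplace:DescribeEntity",
    "aws-marketplace:StartChangeSet",
    "aws-marketplace:ListEntities",
    "aws-marketplace:DescribeChangeSet",
    "dataexchange:PublishDataSet",
]


def catalog_api_policy(actions=CATALOG_API_ACTIONS, resource="*"):
    """Return an IAM policy document as a JSON string for the JSON tab."""
    document = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": resource}
        ],
    }
    return json.dumps(document, indent=2)
```

Printing `catalog_api_policy()` gives text that can be pasted into the JSON tab. Passing a shorter `actions` list is one way to narrow the grant.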
Access the AWS Marketplace Catalog API
Point your HTTP client at the following endpoint to access the AWS Marketplace Catalog API.
catalog.marketplace.us-east-1.amazonaws.com
Start a Change Request
Make a note of your product's entity ID. You can get this product ID from the AWS Data Exchange console.
Send a StartChangeSet request with an AddRevisions change type. In the request body, the details of the AddRevisions change object should include the following:
DataSetArn – The data set to which revisions should be added.
RevisionArns – The revisions that you want to publish to the product's data set. See AWS Data Exchange quotas for more information on the number of revisions that a single update can contain.
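Putting the two fields above together, a StartChangeSet call for AddRevisions might be sketched like this. It assumes a boto3-style AWS Marketplace Catalog client. The function name is hypothetical, and the `DataProduct@1.0` type string and the JSON-encoded `Details` value are assumptions about the request format.

```python
import json


def start_add_revisions(client, product_id, data_set_arn, revision_arns):
    """Start an AddRevisions change set and return its change set ID.

    `client` is assumed to behave like
    boto3.client("marketplace-catalog", region_name="us-east-1").
    """
    response = client.start_change_set(
        Catalog="AWSMarketplace",
        ChangeSet=[
            {
                "ChangeType": "AddRevisions",
                "Entity": {"Type": "DataProduct@1.0", "Identifier": product_id},
                # Assumed: Details is a JSON string holding the two fields.
                "Details": json.dumps(
                    {"DataSetArn": data_set_arn, "RevisionArns": list(revision_arns)}
                ),
            }
        ],
    )
    return response["ChangeSetId"]
```

Keep the returned ID, since it is what identifies the change set in later status checks.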
Check the status of your change set
After starting the change request with the StartChangeSet API operation, you can check its status with the DescribeChangeSet operation. Pass the change set ID that was returned in the StartChangeSet API response.
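Because change sets run asynchronously, checking the status usually means polling. The sketch below assumes a boto3-style AWS Marketplace Catalog client and treats PREPARING and APPLYING as the in-flight statuses. The function name, the interval, and the retry cap are hypothetical choices.

```python
import time

# Statuses that mean the change set is still being processed.
IN_FLIGHT = {"PREPARING", "APPLYING"}


def wait_for_change_set(client, change_set_id, poll_seconds=10.0, max_polls=60):
    """Poll DescribeChangeSet until the change set finishes; return its status."""
    for _ in range(max_polls):
        response = client.describe_change_set(
            Catalog="AWSMarketplace", ChangeSetId=change_set_id
        )
        if response["Status"] not in IN_FLIGHT:
            return response["Status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"change set {change_set_id} did not finish in time")
```

A final status other than SUCCEEDED is the signal to inspect the response for error details.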
AddRevisions Exceptions
When using the AWS Marketplace Catalog API with AWS Data Exchange, the following exceptions may occur:
- REVISION_NOT_FOUND
- REVISION_NOT_FINALIZED
- DATA_SET_NOT_FOUND
- INVALID_INPUT
- DATA_SET_NOT_PUBLISHED
- REVISION_DUPLICATE_PROVIDED
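One way to surface these error codes is to read them out of a DescribeChangeSet response. The sketch below assumes each change in the response carries an `ErrorDetailList` with `ErrorCode` and `ErrorMessage` entries. That response shape is my understanding of the API and should be verified. The function name is hypothetical.

```python
def change_set_errors(describe_response):
    """Return (change_type, error_code, error_message) for every reported error."""
    errors = []
    for change in describe_response.get("ChangeSet", []):
        for detail in change.get("ErrorDetailList", []):
            errors.append(
                (
                    change.get("ChangeType"),
                    detail.get("ErrorCode"),
                    detail.get("ErrorMessage"),
                )
            )
    return errors
```

An empty list means no errors were reported for any change in the change set.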
AddDataSets
Begin a change set of type AddDataSets to add data sets to your AWS Data Exchange product. To do so, use the StartChangeSet API operation and supply the change type, product identifier, product type, and specifics such as the data set Amazon Resource Name (ARN).
This section walks you through the process of adding new AWS Data Exchange data sets to a published product. The high-level steps of the tutorial are as follows.
Set up IAM Permissions
To use the AWS Marketplace Catalog API, you must first set up AWS Identity and Access Management (IAM) permissions. These permissions are in addition to the ones required to use AWS Data Exchange.
- Navigate to the IAM console in your browser and sign in using an AWS account that can manage IAM permissions.
- Select Policies from the left navigation pane.
- Select Create policy.
- Select the JSON tab and grant the following permissions. This grants complete access to the AWS Marketplace Catalog API. You can limit access as needed for your use case.
- Next, select Review.
- Give the policy a name (for example, CatalogAPIFullAccess) and click Create Policy.
- Select the users, groups, or roles to which you wish to apply the policy using the IAM console.
Access the AWS Marketplace Catalog API
Point your HTTP client at the following endpoint to access the AWS Marketplace Catalog API.
catalog.marketplace.us-east-1.amazonaws.com
Get your product ID from the AWS Data Exchange console.
Start a Change Request
Begin a change request to add a data set to your test product.
- Make a note of your product's entity ID. You can get this product ID from the AWS Data Exchange console.
- Send a StartChangeSet request with an AddDataSets change type.
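A change of type AddDataSets could be assembled as in the sketch below. The function name is hypothetical. The `DataProduct@1.0` type string and the shape of `Details` (a JSON string holding a `DataSets` list of `Arn` entries) are assumptions about the request format that should be checked against the Catalog API reference.

```python
import json


def add_data_sets_change(product_id, data_set_arns):
    """Build a single AddDataSets change for one product."""
    return {
        "ChangeType": "AddDataSets",
        "Entity": {"Type": "DataProduct@1.0", "Identifier": product_id},
        # Assumed shape: {"DataSets": [{"Arn": "..."}, ...]} as a JSON string.
        "Details": json.dumps({"DataSets": [{"Arn": arn} for arn in data_set_arns]}),
    }
```

The dictionary would go into the list of changes of a StartChangeSet request with the catalog set to AWSMarketplace.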
Check the status of your change set
After starting the change request with the StartChangeSet API operation, you can check its status with the DescribeChangeSet operation. Pass the change set ID that was returned in the StartChangeSet API response.
AddDataSets Exceptions
- DATA_SET_NOT_FOUND
- INVALID_INPUT
- DATA_SET_ALREADY_PUBLISHED
- DATA_SET_DUPLICATE_PROVIDED
I hope all the discussed sections are clearly understood.
Frequently Asked Questions
What is the connection between the Availability Zones and the Region?
AWS Regions are geographically separate areas, such as US West (N. California) and Asia Pacific (Mumbai). Availability Zones are isolated locations within a Region. Resources can be replicated across Availability Zones as needed.
What is the limit of the number of S3 buckets you can have?
The default limit has traditionally been 100 S3 buckets per AWS account. This is a soft limit that can be raised by requesting a service quota increase.
What should we do if we want to gain access to Amazon Simple Storage buckets and use the data for access audits?
You can use AWS CloudTrail in this scenario. It records and tracks API calls, and it also supports Amazon S3.
What are the native AWS security logging capabilities?
The native AWS security logging capabilities include AWS CloudTrail, AWS Config, AWS detailed billing reports, Amazon S3 access logs, Elastic Load Balancing access logs, Amazon CloudFront access logs, and Amazon VPC Flow Logs, among others.
Conclusion
In this article, we discussed AWS Data Exchange jobs, security in AWS Data Exchange, and how to update products through the AWS Marketplace Catalog API. I hope you now have a fair understanding of the topic.
Happy Learning!