Introduction
Document AI is a document understanding solution that takes unstructured data (documents, forms, and so on) and gives it structure through content classification, entity extraction, advanced search, and other methods. This makes the data easier to understand, analyze, and consume. Document AI uses machine learning and Google Cloud to help you build scalable, cloud-based document processing solutions. The Data Processing and Security Terms impose obligations on Google concerning Document AI.
You can use Document AI to:
Convert images to text
Classify documents
Extract and analyze entities
For more information about Document AI, let's dive into the article.
Document AI features and solutions
General Processors: A good starting point for processing documents.
OCR (Optical Character Recognition): Recognizes and extracts text from various documents.
Form Parser: Extracts form elements such as key-value pairs.
Intelligent Document Quality (Preview): Analyzes a document's readability to determine its quality.
Specialized Processors: Specific models for the most prevalent document kinds in use today.
Procurement DocAI - Automate procurement data capture at scale by converting unstructured documents like invoices and receipts into structured data to boost operational effectiveness, enhance customer experience, and assist decision-making.
Lending DocAI - Automating mortgage documents' processing will transform the home loan experience for both borrowers and lenders. Streamline data acquisition and cut processing times while maintaining regulatory and compliance standards.
Contract DocAI - Simplify and extract highly accurate data from contracts to digitize and speed up contract life cycle management.
Identity DocAI - Specialized models to accurately and automatically extract information from identity documents.
Enterprise Knowledge Graph: Add linkages and real-world elements to data.
Human-in-the-Loop (HITL) - Human verification and corrections to assure the accuracy of data retrieved by Document AI processors for usage in crucial business applications.
Document AI processors
Document AI offers a growing number of processors (also called parsers or splitters, depending on their capability) that extract data from particular document types. The following processor categories are currently available:
General processors
Contract processors
Identity processors
Lending processors
Procurement processors
Using Document AI processors
The main steps for using Document AI are as follows:
Choose a processor that suits your use case.
Create the processor in the Cloud console. Document AI creates a prediction endpoint to which you can send your documents. Refer to Building a processor for detailed instructions.
Send your document(s) to the processor for processing.
After Document AI processes the document(s), it returns one or more Document objects containing the extracted, structured information.
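The prediction endpoint mentioned in these steps is addressed by a processor resource name. The following is a minimal sketch of how that name and the regional host are put together, using the same formats that appear in the client library samples later in this article. The helper names processor_name and process_url are hypothetical, and the REST URL shape is an assumption based on the public v1 API rather than something this article specifies.

```python
def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name of a processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def process_url(project_id: str, location: str, processor_id: str) -> str:
    """Build the REST URL for synchronous processing (assumed v1 URL shape)."""
    host = f"{location}-documentai.googleapis.com"
    return f"https://{host}/v1/{processor_name(project_id, location, processor_id)}:process"


if __name__ == "__main__":
    # Placeholder values; replace them with your own project and processor.
    print(processor_name("my-project", "us", "abc123"))
    print(process_url("my-project", "us", "abc123"))
```

The client libraries build this name for you (for example, client.processor_path in Python), so a helper like this is mainly useful when calling the REST API directly.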
Human-in-the-Loop (HITL)
Human-in-the-Loop (HITL) AI enables human verification and correction to ensure the accuracy of data extracted by Document AI processors before it is used in critical business applications. It provides a workflow and user interface (UI) for people, called labelers, to review, validate, and correct the data that processors have extracted from documents. HITL is used in various sectors, including government, manufacturing, health, and financial services.
There are two options for human labelers:
Bring your own labelers, so that your staff or a partner organization can review the documents.
Use the Google HITL Workforce, which is in private preview. This option is limited to documents that contain no personally identifiable information (PII).
Features
HITL provides the following features:
Filters with confidence thresholds to control the volume of documents flowing through HITL.
Management of the labeler pool, including task assignment and efficiency data for each labeler and task.
UI cues and features that reduce the time a labeler spends on a document.
Analytics and metrics for tasks and labelers that help you automate HITL processes.
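To illustrate the idea behind confidence-threshold filters, here is a minimal sketch that sends a document to human review when any extracted entity falls below a threshold. It is not how HITL is configured in practice (filters are set up on the processor). The function names, the 0.8 threshold, and the sample data are hypothetical. The entity fields type, mentionText, and confidence follow the JSON form of the Document object.

```python
from __future__ import annotations


def lowest_confidence(entities: list[dict]) -> float:
    """Return the smallest entity confidence (1.0 when there are no entities)."""
    return min((e.get("confidence", 0.0) for e in entities), default=1.0)


def route_documents(
    documents: dict[str, list[dict]], threshold: float = 0.8
) -> tuple[list[str], list[str]]:
    """Split document IDs into (auto_approved, needs_review) lists."""
    auto_approved, needs_review = [], []
    for doc_id, entities in documents.items():
        if lowest_confidence(entities) < threshold:
            needs_review.append(doc_id)
        else:
            auto_approved.append(doc_id)
    return auto_approved, needs_review


if __name__ == "__main__":
    # Hypothetical extraction results keyed by document ID.
    sample = {
        "invoice-1": [{"type": "total_amount", "mentionText": "40.00", "confidence": 0.97}],
        "invoice-2": [{"type": "total_amount", "mentionText": "4O.00", "confidence": 0.55}],
    }
    print(route_documents(sample))
```

Raising the threshold sends more documents to labelers and lowering it sends fewer, which is how a filter controls review volume and cost.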
Benefits
HITL AI provides these advantages:
Risk mitigation: Reduce the likelihood that crucial data, such as invoice amounts, billing addresses, or loan amounts, is inaccurate and causes financial loss.
Simplified exception handling: Easily implement a system for handling exceptions and human review.
Workforce efficiency: Manage, monitor, and improve the productivity of the workforce that performs human review.
Cost control: Manage human review expenses using programmable filters.
Data completeness: Ensure that your extracted data is complete for your downstream business applications.
Processors supported
The HITL review workflow now supports the following processors:
General Processors: Form Parser
Procurement Processors: Expense Parser, Invoice Parser, Utility Parser
Lending Processors: 1003 Parser, 1040 Parser, 1040 Schedule C Parser, 1040 Schedule E Parser, 1099-DIV Parser, 1099-G Parser, 1099-INT Parser, 1099-MISC Parser, 1120S Parser, Bank Statement Parser, HOA Statement Parser, Mortgage Statement Parser, Pay Slip Parser, Retirement/Investment Statement Parser, SSA-89 Parser, W2 Parser, W9 Parser
Contract Processors: Contract Parser
Identity Processors: France Driver License Parser, France National ID Parser, France Passport Parser, US Driver License Parser, US Passport Parser
Language support
The text recognition feature (OCR) of the Document AI API can recognize a range of languages, including several languages inside a single document. The Document object's detectedLanguages field contains a BCP-47 identifier for each language identified by the Document AI API. See the Cloud Vision OCR Language Support documentation for a list of the languages and scripts that Document OCR (Optical Character Recognition) supports. Other processors might support a smaller number of languages.
General processors
Contract processors
Identity processors
Lending processors
Procurement processors
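As a rough illustration of the detectedLanguages field described in this section, the sketch below collects the distinct BCP-47 codes from a Document in its JSON (dictionary) form. The function name and the sample document are hypothetical. The pages, detectedLanguages, and languageCode keys are assumed to follow the JSON representation of the Document object.

```python
from __future__ import annotations


def detected_language_codes(document: dict) -> list[str]:
    """Return the distinct BCP-47 language codes found across all pages, in order."""
    codes: list[str] = []
    for page in document.get("pages", []):
        for language in page.get("detectedLanguages", []):
            code = language.get("languageCode")
            if code and code not in codes:
                codes.append(code)
    return codes


if __name__ == "__main__":
    # Hypothetical, heavily trimmed Document JSON.
    sample = {
        "pages": [
            {"detectedLanguages": [{"languageCode": "en", "confidence": 0.9}]},
            {"detectedLanguages": [{"languageCode": "fr"}, {"languageCode": "en"}]},
        ]
    }
    print(detected_language_codes(sample))
```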
Supported Files
Following are the image formats that Document AI supports.
Note: Some of these image formats are "lossy" (for example, JPEG). Reducing the file size of a lossy format can degrade image quality and, as a result, the accuracy of Document AI results.
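The client library samples later in this article pass a MIME type, such as application/pdf, together with the file content. One possible way to derive that value from a file name is Python's standard mimetypes module, sketched below. The helper name guess_mime_type is hypothetical, and the function does not check whether Document AI accepts the resulting type.

```python
import mimetypes


def guess_mime_type(file_path: str) -> str:
    """Guess a MIME type from the file extension, or raise if it is unknown."""
    mime_type, _encoding = mimetypes.guess_type(file_path)
    if mime_type is None:
        raise ValueError(f"Cannot determine a MIME type for {file_path!r}")
    return mime_type


if __name__ == "__main__":
    for path in ("scans/invoice.pdf", "scans/receipt.jpg", "scans/form.png"):
        print(path, guess_mime_type(path))
```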
Document scan resolution
For the most accurate OCR results, document scans should be at least 200 dpi (dots per inch). Scans at 300 dpi or higher generally give the best results.
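If you know a scan's pixel dimensions and the physical size of the page, you can estimate its resolution before sending it for OCR. The following sketch does that arithmetic. The function names are hypothetical, and the 200 dpi default comes from the guidance above.

```python
def scan_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Estimate horizontal resolution in dots per inch."""
    if page_width_inches <= 0:
        raise ValueError("page_width_inches must be positive")
    return pixel_width / page_width_inches


def meets_minimum_dpi(pixel_width: int, page_width_inches: float, minimum: float = 200) -> bool:
    """Check a scan against the recommended minimum resolution."""
    return scan_dpi(pixel_width, page_width_inches) >= minimum


if __name__ == "__main__":
    # A US Letter page is 8.5 inches wide.
    print(scan_dpi(1700, 8.5))           # 1700 px across 8.5 in
    print(meets_minimum_dpi(1275, 8.5))  # 150 dpi, below the 200 dpi minimum
```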
Document AI client libraries
This section demonstrates how to use the Cloud Client Libraries for the Document AI API. In Client Libraries Explained, you may learn more about client libraries for Cloud APIs, including the older Google API client libraries.
Installing the client library
This section explains how to start with the Document AI Cloud Client Libraries.
Java
Add the following to your pom.xml file if you're using Maven. The Google Cloud Platform Libraries BOM has more information about BOMs.
The following IDE plugins can be used to add client libraries to your project if you're working with Visual Studio Code, IntelliJ, or Eclipse:
Cloud Code for VS Code
Cloud Code for IntelliJ
Cloud Tools for Eclipse
The plugins offer extra features like key management for service accounts. Details can be found in the documentation for each plugin.
Node.js
npm install @google-cloud/documentai
Python
pip install --upgrade google-cloud-documentai
Setting up authentication
You must configure authentication before you can use the client library. One way to do so, shown in the following steps, is to create a service account and set an environment variable. See Authenticating as a service account for other methods of authentication.
Console
Create a service account:
In the console, go to the Create service account page.
Select a project.
Enter a name in the Service account name field. The console fills in the Service account ID field based on this name.
Enter a description in the Service account description field. For example, Service account for quickstart.
Click Create and continue.
Click Done to finish creating the service account.
Do not close your browser window. You will use it in the next step.
Create a service account key:
In the console, click the email address of the service account that you created.
Click Keys.
Click Add key, and then click Create new key.
Click Create. A JSON key file is downloaded to your computer.
Click Close.
gcloud
Set up authentication:
Create the service account:
gcloud iam service-accounts create NAME
Replace NAME with a name for the service account.
Create the key file:
gcloud iam service-accounts keys create FILE_NAME.json --iam-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
Replace the following:
FILE_NAME: the name of the key file
SERVICE_ACCOUNT_NAME: the name of the service account
PROJECT_ID: the ID of the project where the service account was created
To provide authentication credentials to your application code, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON key file. The variable applies only to the current shell session. To make it apply to future shell sessions, set it in your shell startup file, such as ~/.bashrc or ~/.profile.
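Before running the samples, it can help to confirm that the variable is set and points at a readable key file. The following is a minimal sketch of such a check using only the Python standard library. The function name service_account_email is hypothetical, and the check assumes the key file is a service account JSON key, which contains a client_email field.

```python
from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Mapping


def service_account_email(env: Mapping[str, str] | None = None) -> str:
    """Return the client_email from the key file named by GOOGLE_APPLICATION_CREDENTIALS."""
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"Key file not found: {key_path}")
    with path.open(encoding="utf-8") as key_file:
        key = json.load(key_file)
    return key.get("client_email", "")


if __name__ == "__main__":
    try:
        print("Authenticating as:", service_account_email())
    except (RuntimeError, FileNotFoundError) as err:
        print("Credentials problem:", err)
```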
Using the client library
The following examples show how to use the client library.
Node.js
// TODO(developer): Uncomment these variables before running the sample.
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1;

// Instantiate the client. For a location other than 'us', pass an endpoint,
// e.g. {apiEndpoint: 'eu-documentai.googleapis.com'}.
const client = new DocumentProcessorServiceClient();

async function quickstart() {
  // The full resource name of the processor.
  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  // Read the file into memory and convert it to base64.
  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);
  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  // Send the document to the processor and wait for the result.
  const [result] = await client.processDocument(request);
  const {document} = result;

  // The full text of the document. Layout elements refer to it by offset.
  const {text} = document;

  // Extract the text of the first segment that a text anchor points to.
  const getText = textAnchor => {
    if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
      return '';
    }
    // The first segment's start index is omitted when it is 0.
    const startIndex = textAnchor.textSegments[0].startIndex || 0;
    const endIndex = textAnchor.textSegments[0].endIndex;
    return text.substring(startIndex, endIndex);
  };

  // Print the paragraphs found on the first page.
  console.log('The document contains the following paragraphs:');
  const [page1] = document.pages;
  const {paragraphs} = page1;
  for (const paragraph of paragraphs) {
    const paragraphText = getText(paragraph.layout.textAnchor);
    console.log(`Paragraph text:\n${paragraphText}`);
  }
}

quickstart().catch(err => {
  console.error(err);
  process.exitCode = 1;
});
Python
from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1 as documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION'
# processor_id = 'YOUR_PROCESSOR_ID'
# file_path = '/path/to/local/pdf'
# mime_type = 'application/pdf' # Refer to https://cloud.google.com/document-ai/docs/processors-list for supported file types


def quickstart(
    project_id: str, location: str, processor_id: str, file_path: str, mime_type: str
):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor, e.g.:
    # projects/project_id/locations/location/processors/processor_id
    # You must create new processors in the Cloud Console first
    name = client.processor_path(project_id, location, processor_id)

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    # Load the binary data into a Document AI RawDocument object
    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    # Configure the process request
    request = documentai.ProcessRequest(name=name, raw_document=raw_document)

    result = client.process_document(request=request)

    # For a full list of Document object attributes, please reference this page:
    # https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document
    document = result.document

    # Read the text recognition output from the processor
    print("The document contains the following text:")
    print(document.text)
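The Python sample prints the full document.text, while the Node.js sample prints individual paragraphs by resolving text anchors. The sketch below shows the same anchor lookup in Python against a Document in its JSON (dictionary) form, for example one read from batch output. The function names and the sample data are hypothetical. Unlike the Node.js helper, this version joins all of an anchor's segments, and it converts the indices with int() on the assumption that the JSON encoding may carry them as strings.

```python
from __future__ import annotations


def anchor_text(full_text: str, text_anchor: dict) -> str:
    """Join the slices of full_text that a text anchor's segments point to."""
    parts = []
    for segment in text_anchor.get("textSegments", []):
        start = int(segment.get("startIndex", 0))  # omitted when the segment starts at 0
        end = int(segment["endIndex"])
        parts.append(full_text[start:end])
    return "".join(parts)


def page_paragraphs(document: dict, page_index: int = 0) -> list[str]:
    """Return the text of each paragraph on one page of the document."""
    page = document["pages"][page_index]
    full_text = document.get("text", "")
    return [
        anchor_text(full_text, paragraph["layout"]["textAnchor"])
        for paragraph in page.get("paragraphs", [])
    ]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed Document JSON.
    sample = {
        "text": "Hello world\nSecond paragraph\n",
        "pages": [{"paragraphs": [
            {"layout": {"textAnchor": {"textSegments": [{"endIndex": "12"}]}}},
            {"layout": {"textAnchor": {"textSegments": [{"startIndex": "12", "endIndex": "29"}]}}},
        ]}],
    }
    for text in page_paragraphs(sample):
        print(f"Paragraph text:\n{text}")
```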
Cloud Document AI API Connector Overview
The Workflows connector defines the built-in functions that can be used to access other Google Cloud products within a workflow.
This section gives an overview of the Cloud Document AI connector. Connectors work out of the box when used in a call step, so there is no need to import or load connector libraries in a workflow.
Cloud Document AI API
A service that uses Google AI, including natural language processing, computer vision, translation, and AutoML, to extract structured information from unstructured or semi-structured documents.
Cloud Document AI connector sample
# This workflow demonstrates how to use the process and batchProcess
# APIs in the Cloud Document AI connector.
# Expected successful output: the batch process response.
- process_document:
    call: googleapis.documentai.v1.projects.locations.processors.process
    args:
      name: "projects/placeholder/locations/us/processors/placeholder"
      location: "us"
      body:
        rawDocument:
          # Procedure to create some test raw content:
          # 1. Create a docx with some arbitrary texts in it. For example, "hello world".
          # 2. Export a pdf file from Microsoft Word.
          # 3. Use any online pdf-to-raw converter to convert the file to raw base64 texts. (https://pdfmall.com/pdf-to-raw).
          # 4. Copy and paste the content here.
          content: ""
          mimeType: "application/pdf"
    result: process_resp
- batch_process:
    call: googleapis.documentai.v1.projects.locations.processors.batchProcess
    args:
      name: "projects/cloudworkflows-test-dev/locations/us/processors/583f73e6003945cc"
      location: "us"
      body:
        inputDocuments:
          gcsDocuments:
            documents:
              - gcsUri: "gs://connector-demo/documents/helloworld1.pdf"
                mimeType: "application/pdf"
              - gcsUri: "gs://connector-demo/documents/helloworld2.pdf"
                mimeType: "application/pdf"
        documentOutputConfig:
          gcsOutputConfig:
            gcsUri: "gs://connector-demo/documents/"
    result: batch_process_resp
- return:
    return: ${batch_process_resp}
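The content field in the process step expects the document as base64 text. As an alternative to an online converter, the value can be produced locally. The sketch below uses Python's standard base64 module to build the rawDocument portion of the request body. The function name raw_document_body is hypothetical.

```python
import base64
import json


def raw_document_body(document_bytes: bytes, mime_type: str = "application/pdf") -> dict:
    """Build the rawDocument part of a process request body."""
    return {
        "rawDocument": {
            # The API expects the file content as base64-encoded text.
            "content": base64.b64encode(document_bytes).decode("ascii"),
            "mimeType": mime_type,
        }
    }


if __name__ == "__main__":
    # Stand-in bytes; in practice, read them with open(path, "rb").read().
    body = raw_document_body(b"hello world")
    print(json.dumps(body, indent=2))
```

The printed content string is what you would paste into the content field of the workflow step.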
The DocAI platform is a centralized document processing console that provides easy access to all parsers and tools. You can automate and validate documents from the platform to simplify operations, eliminate uncertainty, and maintain correct and compliant data.
How does Google's Document AI work?
Document AI is a document understanding solution that takes unstructured data (documents, forms, and so on) and gives it structure through content classification, entity extraction, advanced search, and other methods, which makes the data easier to understand, analyze, and consume.
How does intelligent document processing work?
Intelligent document processing (IDP) transforms unstructured and semi-structured data into structured, usable information, which enables end-to-end automation of document-centric business processes.
How does AI understand the text?
Automatic text classification uses machine learning models and algorithms to categorize a text according to predetermined criteria. With the bag-of-words (BOW) model, which represents a text by how often each word occurs in it, text classification can identify patterns and sentiment in a text.
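As a small illustration of the bag-of-words idea, the sketch below counts word frequencies and turns them into a fixed-length vector that a classifier could consume. It is a generic example, not a description of how Document AI works internally. The tokenization rule and the sample vocabulary are assumptions chosen for brevity.

```python
from __future__ import annotations

import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count how often each lowercase alphanumeric token occurs in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def to_vector(counts: Counter, vocabulary: list[str]) -> list[int]:
    """Represent the counts as a vector ordered by a fixed vocabulary."""
    return [counts[word] for word in vocabulary]


if __name__ == "__main__":
    counts = bag_of_words("The invoice total is due. The total is 40.")
    print(counts.most_common(3))
    print(to_vector(counts, ["invoice", "total", "refund"]))
```

Word order is discarded, so two texts with the same words in a different order produce the same vector.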
Conclusion
In this article, we discussed Document AI in detail, including its features, processors, language support, supported files, client libraries, and the Workflows connector.