Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024

Document AI

Introduction

Document AI is a document understanding solution that takes unstructured data (documents, forms, etc.) and structures it through content classification, entity extraction, advanced search, and other methods. This makes the data easier to comprehend, analyze, and consume. Document AI builds on machine learning and Google Cloud to help you develop a scalable, cloud-based document understanding solution. Google's obligations concerning Document AI are set out in the Data Processing and Security Terms.

You can use Document AI to:

  • Convert images to text
  • Classify documents
  • Extract and analyze entities

For more information about Document AI, let's dive into the article.

Document AI features and solutions

  • General Processors: A good starting point for processing documents.
    • OCR (Optical Character Recognition): Recognizes and extracts text from various documents.
    • Form Parser: Extracts form elements such as key-value pairs.
    • Intelligent Document Quality (Preview): Analyzes a document's readability to determine its quality.
  • Specialized Processors: Dedicated models for the most common document types in use today.
    • Procurement DocAI: Automates procurement data capture at scale by converting unstructured documents like invoices and receipts into structured data to boost operational effectiveness, enhance customer experience, and assist decision-making.
    • Lending DocAI: Transforms the home loan experience for borrowers and lenders by automating mortgage document processing. Streamlines data capture and cuts processing times while maintaining regulatory and compliance standards.
    • Contract DocAI: Extracts highly accurate data from contracts to digitize and speed up contract life cycle management.
    • Identity DocAI: Specialized models that accurately and automatically extract information from identity documents.
  • Enterprise Knowledge Graph: Enriches data with links and real-world entities.
  • Human-in-the-Loop (HITL): Human verification and correction to ensure the accuracy of data extracted by Document AI processors before it is used in critical business applications.

Document AI processors

Document AI offers a growing number of processors (also known as parsers or splitters, depending on their capability) that extract data from particular document types. The following categories of document processors are currently available:

  • General processors
  • Contract processors
  • Identity processors
  • Lending processors
  • Procurement processors

Using Document AI processors

The main steps for using Document AI are as follows:

  • Pick a processor that suits your use case.
  • Create the processor in the Cloud console. Document AI creates a prediction endpoint for it. Refer to Building a processor for comprehensive instructions.
  • Send your document(s) to the prediction endpoint for processing.
  • Once Document AI has processed the document(s), it returns one or more Document objects containing the extracted, structured information.
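
The prediction endpoint mentioned in the steps above follows a regular naming pattern, so the request can be sketched without any client library. The snippet below is a minimal sketch, assuming the v1 REST API's `:process` method and the regional host pattern `LOCATION-documentai.googleapis.com`. The helper names `processor_endpoint` and `build_process_body` are hypothetical, and the project and processor IDs are placeholders.

```python
import base64
import json


def processor_endpoint(project_id: str, location: str, processor_id: str) -> str:
    """Build the v1 REST URL used to send a document to a processor."""
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    return f"https://{location}-documentai.googleapis.com/v1/{name}:process"


def build_process_body(content: bytes, mime_type: str) -> str:
    """Wrap raw file bytes in the JSON body expected by the process method."""
    body = {
        "rawDocument": {
            # JSON cannot carry raw bytes, so the file is base64-encoded.
            "content": base64.b64encode(content).decode("ascii"),
            "mimeType": mime_type,
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(processor_endpoint("your-project-id", "us", "your-processor-id"))
    print(build_process_body(b"%PDF-1.4 placeholder", "application/pdf"))
```

Sending the body still requires an authenticated HTTP POST, which the client libraries described later in this article handle for you.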

Human in the Loop documentation 

HITL AI enables human verification and correction to ensure the accuracy of data extracted by Document AI processors before that data is used in critical business applications. It provides a workflow and user interface (UI) for people to review, validate, and correct the data that the processors have extracted from documents. It is used in various sectors, including government, manufacturing, health, and financial services.

Customers have two options for human labelers:

  • Bring your own labelers, so that your staff or a partner organization can review the documents.
  • Use the Google HITL Workforce (private preview) to review documents. This option is available only for documents that contain no personally identifiable information (PII).

Features

HITL supports the following features:

  • Filters with confidence thresholds to control the volume of documents flowing through HITL.
  • Management of the labeler pool, including task assignments and labeler-specific and task-specific efficiency data.
  • UI cues and features that reduce the time a labeler spends on a document.
  • Analytics and metrics for tasks and labelers, allowing you to automate HITL processes.
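
The confidence-threshold filter in the first feature can be pictured as a simple routing rule. The following is a minimal sketch of that idea only, not the HITL implementation: the `Entity` class, the function names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    """A hypothetical extracted field with the processor's confidence score."""
    type_: str
    value: str
    confidence: float  # between 0.0 and 1.0


def needs_human_review(entities: List[Entity], threshold: float) -> bool:
    """Return True if any extracted entity falls below the confidence threshold."""
    return any(entity.confidence < threshold for entity in entities)


def low_confidence_fields(entities: List[Entity], threshold: float) -> List[str]:
    """List the entity types a labeler would be asked to check."""
    return [e.type_ for e in entities if e.confidence < threshold]


invoice = [
    Entity("total_amount", "120.00", 0.97),
    Entity("billing_address", "1 Main St", 0.62),
]
print(needs_human_review(invoice, threshold=0.8))     # prints True
print(low_confidence_fields(invoice, threshold=0.8))  # prints ['billing_address']
```

Raising the threshold sends more documents to labelers, while lowering it reduces review volume and cost at the price of less checking.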

Benefits

HITL AI provides these advantages:

  • Risk mitigation: Reduce the likelihood that crucial data, such as invoice amounts, billing addresses, or loan amounts, is inaccurate and causes financial loss.
  • Simplified exception handling: Easily implement a system for handling exceptions and human review.
  • Workforce efficiency: Manage, oversee, and improve the productivity of the workforce that performs the human review.
  • Cost control: Manage human review expenses using programmable filters.
  • Data completeness: Ensure that your extracted data is complete for your downstream business applications.

Processors supported

The HITL review workflow now supports the following processors:

General Processors

  • Form Parser

 

Procurement Processors

  • Expense Parser
  • Invoice Parser
  • Utility Parser
     

Lending Processors

  • 1003 Parser
  • 1040 Parser
  • 1040 Schedule C Parser
  • 1040 Schedule E Parser
  • 1099-DIV Parser
  • 1099-G Parser
  • 1099-INT Parser
  • 1099-MISC Parser
  • 1120S Parser
  • Bank Statement Parser
  • HOA Statement Parser
  • Mortgage Statement Parser
  • Pay Slip Parser
  • Retirement/Investment Statement Parser
  • SSA-89 Parser
  • W2 Parser
  • W9 Parser

 

Contract Processors

  • Contract Parser

 

Identity Processors

  • France Driver License Parser
  • France National ID Parser
  • France Passport Parser
  • US Driver License Parser
  • US Passport Parser

Language support

The text recognition feature (OCR) of the Document AI API can recognize a range of languages, including several languages inside a single document. The Document object's detectedLanguages field contains a BCP-47 identifier for each language identified by the Document AI API. See the Cloud Vision OCR Language Support documentation for a list of the languages and scripts that Document OCR (Optical Character Recognition) supports. Other processors might support a smaller number of languages.
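
To make the language identifiers concrete, here is a minimal sketch that collects the detected BCP-47 codes from a processed document. It assumes the response has been converted to its JSON form, where each page carries a `detectedLanguages` list with `languageCode` and `confidence` entries. The function name and the sample data are made up for illustration.

```python
from typing import Dict, List


def detected_language_codes(document: Dict) -> List[str]:
    """Collect the distinct BCP-47 codes reported across all pages, in order."""
    codes: List[str] = []
    for page in document.get("pages", []):
        for language in page.get("detectedLanguages", []):
            code = language.get("languageCode")
            if code and code not in codes:
                codes.append(code)
    return codes


# Made-up sample: two pages, the first one mixing English and French.
sample_document = {
    "pages": [
        {"detectedLanguages": [{"languageCode": "en", "confidence": 0.9},
                               {"languageCode": "fr", "confidence": 0.1}]},
        {"detectedLanguages": [{"languageCode": "en", "confidence": 1.0}]},
    ]
}
print(detected_language_codes(sample_document))  # prints ['en', 'fr']
```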

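General processors

[Table: languages supported by general processors]

Contract processors

[Table: languages supported by contract processors]

Identity processors

[Table: languages supported by identity processors]

Lending processors

[Table: languages supported by lending processors]

Procurement processors

[Table: languages supported by procurement processors]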

Supported Files

Document AI supports the following image formats:

[Table: supported file formats]

Note: A few of these image formats are "lossy" (for example, JPEG). Reducing file sizes for lossy formats may degrade image quality and, in turn, the accuracy of Document AI results.

Document scan resolution

For accurate OCR results, document scans should be at least 200 dpi (dots per inch). The best results are generally obtained at 300 dpi or higher.
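
A scan's resolution can be estimated from its pixel dimensions and the physical page size. The snippet below is a small sketch of that arithmetic with hypothetical helper names. It only applies the 200 dpi and 300 dpi guidance from this section and does not call Document AI.

```python
def scan_dpi(width_px: int, height_px: int, width_in: float, height_in: float) -> float:
    """Return the lower of the horizontal and vertical resolutions in dots per inch."""
    return min(width_px / width_in, height_px / height_in)


def resolution_advice(dpi: float) -> str:
    """Map a resolution to the guidance above (200 dpi minimum, 300 dpi preferred)."""
    if dpi < 200:
        return "rescan: below 200 dpi"
    if dpi < 300:
        return "acceptable: 200 dpi or more"
    return "recommended: 300 dpi or more"


# A US Letter page (8.5 x 11 inches) scanned to 1700 x 2200 pixels.
dpi = scan_dpi(1700, 2200, 8.5, 11.0)
print(dpi, resolution_advice(dpi))  # prints 200.0 acceptable: 200 dpi or more
```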

Document AI client libraries 

This section demonstrates how to use the Cloud Client Libraries for the Document AI API. To learn more about client libraries for Cloud APIs, including the older Google API client libraries, see Client Libraries Explained.

Installing the client library

This section explains how to start with the Document AI Cloud Client Libraries.

Java

If you're using Maven, add the following to your pom.xml file. For more information about BOMs, see The Google Cloud Platform Libraries BOM.

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>libraries-bom</artifactId>
      <version>26.0.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-document-ai</artifactId>
    <version>2.6.0</version>
  </dependency>
</dependencies>

The following should be included in your dependencies if you are using Gradle:

implementation platform('com.google.cloud:libraries-bom:26.0.0')

implementation 'com.google.cloud:google-cloud-document-ai'

Add the following to your dependencies if you're using sbt:

libraryDependencies += "com.google.cloud" % "google-cloud-document-ai" % "2.6.0"

 

The following IDE plugins can be used to add client libraries to your project if you're working with Visual Studio Code, IntelliJ, or Eclipse:

  • Cloud Code for VS Code
  • Cloud Code for IntelliJ 
  • Cloud Tools for Eclipse

The plugins offer extra features like key management for service accounts. Details can be found in the documentation for each plugin.

Node.js

npm install @google-cloud/documentai

Python

pip install --upgrade google-cloud-documentai

Setting up authentication

You must configure authentication before you can use the client library. One way to do that is to create a service account and set an environment variable, as shown in the following steps. For other ways to authenticate, see Authenticating as a service account.

Console

Set up a service account:

  • Go to the Create service account page in the console.
  • Choose a project.
  • Enter a name in the Service account name field. Based on this name, the console populates the Service account ID field.
  • Enter a description in the Service account description field. For example, "Service account for quickstart".
  • Click Create and continue.
  • To finish creating the service account, click Done.
  • Do not close your browser window. You will use it in the next step.

Create a service account key:

  • In the console, click the email address of the service account that you created.
  • Click Keys.
  • Click Add key, then select Create new key.
  • Click Create. A JSON key file is downloaded to your computer.
  • Click Close.

gcloud

Configure authentication:

  1. Create the service account:
    gcloud iam service-accounts create NAME
    Replace NAME with a name for the service account.
  2. Create the key file:
    gcloud iam service-accounts keys create FILE_NAME.json --iam-account=SERVICE_ACCOUNT_NAME@PROJECT_ID.iam.gserviceaccount.com
    Replace the following:
    • FILE_NAME: The name of the key file
    • SERVICE_ACCOUNT_NAME: The service account's name
    • PROJECT_ID: The project ID where the service account was created

To provide authentication credentials to your application code, set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON key file, for example with export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH" on Linux or macOS, where KEY_PATH is the path of the key file you downloaded. The variable applies only to the current shell session. If you want it to apply to future shell sessions, set it in your shell startup file, such as ~/.bashrc or ~/.profile.
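
Before creating a client, it can help to confirm that the variable really points at a usable key file. The check below is a minimal sketch with a hypothetical function name. It assumes the usual layout of a service account key file, in which the JSON contains a `type` of `service_account` and a `client_email` field.

```python
import json
import os


def check_credentials_file(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account email from the key file the variable points to.

    Raises RuntimeError when the variable is unset, the file is missing,
    or the file does not look like a service account key.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set in this shell session")
    if not os.path.isfile(path):
        raise RuntimeError(f"{env_var} points to a missing file: {path}")
    with open(path, encoding="utf-8") as key_file:
        key = json.load(key_file)
    if key.get("type") != "service_account":
        raise RuntimeError("key file is not a service account key")
    return key.get("client_email", "")
```

Calling `check_credentials_file()` at startup turns a late authentication failure into an early, readable error.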

Using the client library

The following examples show how to use the client library.

Java

import com.google.cloud.documentai.v1.Document;
import com.google.cloud.documentai.v1.DocumentProcessorServiceClient;
import com.google.cloud.documentai.v1.ProcessRequest;
import com.google.cloud.documentai.v1.ProcessResponse;
import com.google.cloud.documentai.v1.RawDocument;
import com.google.protobuf.ByteString;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class QuickStart {
  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "your-project-id";
    String location = "your-project-location"; // Format is "us" or "eu".
    String processorId = "your-processor-id";
    String filePath = "path/to/input/file.pdf";
    quickStart(projectId, location, processorId, filePath);
  }

  public static void quickStart(
      String projectId, String location, String processorId, String filePath)
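      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // Create the client once and reuse it for all requests.
    // The try-with-resources block closes it when the work is done.
    try (DocumentProcessorServiceClient client = DocumentProcessorServiceClient.create()) {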

      String name =
          String.format("projects/%s/locations/%s/processors/%s", projectId, location, processorId);

      byte[] imageFileData = Files.readAllBytes(Paths.get(filePath));

      ByteString content = ByteString.copyFrom(imageFileData);

      RawDocument document =
          RawDocument.newBuilder().setContent(content).setMimeType("application/pdf").build();

      ProcessRequest request =
          ProcessRequest.newBuilder().setName(name).setRawDocument(document).build();

      ProcessResponse result = client.processDocument(request);
      Document documentResponse = result.getDocument();
      String text = documentResponse.getText();

      System.out.println("The document contains the following paragraphs:");
      Document.Page firstPage = documentResponse.getPages(0);
      List<Document.Page.Paragraph> paragraphs = firstPage.getParagraphsList();

      for (Document.Page.Paragraph paragraph : paragraphs) {
        String paragraphText = getText(paragraph.getLayout().getTextAnchor(), text);
        System.out.printf("Paragraph text:\n%s\n", paragraphText);
      }
    }
  }

  private static String getText(Document.TextAnchor textAnchor, String text) {
    if (textAnchor.getTextSegmentsList().size() > 0) {
      int startIdx = (int) textAnchor.getTextSegments(0).getStartIndex();
      int endIdx = (int) textAnchor.getTextSegments(0).getEndIndex();
      return text.substring(startIdx, endIdx);
    }
    return "[NO TEXT]";
  }
}

Node.js

// TODO(developer): Uncomment these variables before running the sample.
// const projectId = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION'; // Format is 'us' or 'eu'
// const processorId = 'YOUR_PROCESSOR_ID'; // Create processor in Cloud Console
// const filePath = '/path/to/local/pdf';

const {DocumentProcessorServiceClient} =
  require('@google-cloud/documentai').v1;

const client = new DocumentProcessorServiceClient();

async function quickstart() {

  // You must create new processors in the Cloud Console first
  const name = `projects/${projectId}/locations/${location}/processors/${processorId}`;

  const fs = require('fs').promises;
  const imageFile = await fs.readFile(filePath);

  const encodedImage = Buffer.from(imageFile).toString('base64');

  const request = {
    name,
    rawDocument: {
      content: encodedImage,
      mimeType: 'application/pdf',
    },
  };

  const [result] = await client.processDocument(request);
  const {document} = result;

  // Get all of the document text as one big string.
  const {text} = document;

  const getText = textAnchor => {
    if (!textAnchor.textSegments || textAnchor.textSegments.length === 0) {
      return '';
    }

    const startIndex = textAnchor.textSegments[0].startIndex || 0;
    const endIndex = textAnchor.textSegments[0].endIndex;

    return text.substring(startIndex, endIndex);
  };

  console.log('The document contains the following paragraphs:');
  const [page1] = document.pages;
  const {paragraphs} = page1;

  for (const paragraph of paragraphs) {
    const paragraphText = getText(paragraph.layout.textAnchor);
    console.log(`Paragraph text:\n${paragraphText}`);
  }
}

// Run the sample.
quickstart();

Python

from google.api_core.client_options import ClientOptions
from google.cloud import documentai_v1 as documentai

# TODO(developer): Uncomment these variables before running the sample.
# project_id = 'YOUR_PROJECT_ID'
# location = 'YOUR_PROCESSOR_LOCATION' 
# processor_id = 'YOUR_PROCESSOR_ID' 
# file_path = '/path/to/local/pdf'
# mime_type = 'application/pdf' # Refer to https://cloud.google.com/document-ai/docs/processors-list for supported file types


def quickstart(
    project_id: str, location: str, processor_id: str, file_path: str, mime_type: str
):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor, e.g.:
    # projects/project_id/locations/location/processors/processor_id
    # You must create new processors in the Cloud Console first
    name = client.processor_path(project_id, location, processor_id)

    with open(file_path, "rb") as image:
        image_content = image.read()

    raw_document = documentai.RawDocument(content=image_content, mime_type=mime_type)

    request = documentai.ProcessRequest(name=name, raw_document=raw_document)

    result = client.process_document(request=request)

    # For a full list of Document object attributes, please reference this page:
    # https://cloud.google.com/python/docs/reference/documentai/latest/google.cloud.documentai_v1.types.Document
    document = result.document

    # Read the text recognition output from the processor
    print("The document contains the following text:")
    print(document.text)
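
The Java and Node.js samples print paragraph text by resolving text anchors, while the Python sample prints the whole text. The same anchor logic can be sketched in Python as follows. This sketch works on a dict-shaped response (the JSON form of the Document object) rather than on the client library's types, and the function names and sample data are hypothetical. Unlike the samples above, it joins every segment of an anchor instead of reading only the first one.

```python
from typing import Dict, List


def anchor_text(text_anchor: Dict, full_text: str) -> str:
    """Join every segment of a text anchor into one string."""
    pieces = []
    for segment in text_anchor.get("textSegments", []):
        # The JSON form encodes 64-bit indexes as strings and omits a zero start index.
        start = int(segment.get("startIndex", 0))
        end = int(segment["endIndex"])
        pieces.append(full_text[start:end])
    return "".join(pieces)


def first_page_paragraphs(document: Dict) -> List[str]:
    """Return the text of each paragraph on the first page."""
    text = document.get("text", "")
    pages = document.get("pages", [])
    if not pages:
        return []
    return [
        anchor_text(paragraph["layout"]["textAnchor"], text)
        for paragraph in pages[0].get("paragraphs", [])
    ]


# Made-up sample document with two paragraphs.
sample = {
    "text": "Hello world\nSecond paragraph\n",
    "pages": [{"paragraphs": [
        {"layout": {"textAnchor": {"textSegments": [{"endIndex": "12"}]}}},
        {"layout": {"textAnchor": {"textSegments": [
            {"startIndex": "12", "endIndex": "29"}]}}},
    ]}],
}
for paragraph_text in first_page_paragraphs(sample):
    print(f"Paragraph text:\n{paragraph_text}")
```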

Cloud Document AI API Connector Overview 

Workflows connectors define the built-in functions that can be used to access other Google Cloud products within a workflow.

This section gives an overview of the connector. Connectors work out of the box when used in a call step, so there is no need to import or load connector libraries in a workflow.

Cloud Document AI API

A service that uses cutting-edge Google AI, including natural language processing, computer vision, translation, and AutoML, to extract structured information from unstructured or semi-structured documents.

Cloud Document AI connector sample

# This workflow demonstrates how to use the process and batchProcess
# APIs in the Cloud Document AI connector.
# Expected successful output: the batch process response.

- process_document:
    call: googleapis.documentai.v1.projects.locations.processors.process
    args:
      name: "projects/placeholder/locations/us/processors/placeholder"
      location: "us"
      body:
        rawDocument:
          # Procedure to create some test raw content:
          # 1. Create a docx with some arbitrary texts in it. For example, "hello world".
          # 2. Export a pdf file from Microsoft Word.
          # 3. Use any online pdf-to-raw converter to convert the file to raw base64 texts. (https://pdfmall.com/pdf-to-raw).
          # 4. Copy and paste the content here.
          content: ""
          mimeType: "application/pdf"
    result: process_resp
- batch_process:
    call: googleapis.documentai.v1.projects.locations.processors.batchProcess
    args:
      name: "projects/cloudworkflows-test-dev/locations/us/processors/583f73e6003945cc"
      location: "us"
      body:
        inputDocuments:
          gcsDocuments:
            documents:
              - gcsUri: "gs://connector-demo/documents/helloworld1.pdf"
                mimeType: "application/pdf"
              - gcsUri: "gs://connector-demo/documents/helloworld2.pdf"
                mimeType: "application/pdf"
        documentOutputConfig:
          gcsOutputConfig:
            gcsUri: "gs://connector-demo/documents/"
    result: batch_process_resp
- return:
    return: ${batch_process_resp}
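
The request body used in the batch_process step above has a fixed shape, so it can be generated rather than written by hand. The following is a minimal sketch with a hypothetical helper name. It uses only the fields that appear in the workflow above and the same example bucket paths.

```python
import json
from typing import Dict, List


def build_batch_process_body(gcs_uris: List[str], output_uri: str,
                             mime_type: str = "application/pdf") -> Dict:
    """Assemble a batchProcess body with the same fields as the workflow above."""
    return {
        "inputDocuments": {
            "gcsDocuments": {
                "documents": [
                    {"gcsUri": uri, "mimeType": mime_type} for uri in gcs_uris
                ]
            }
        },
        "documentOutputConfig": {"gcsOutputConfig": {"gcsUri": output_uri}},
    }


body = build_batch_process_body(
    ["gs://connector-demo/documents/helloworld1.pdf",
     "gs://connector-demo/documents/helloworld2.pdf"],
    "gs://connector-demo/documents/",
)
print(json.dumps(body, indent=2))
```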

The connector exposes the following modules:

  • googleapis.documentai.v1.operations
  • googleapis.documentai.v1.projects.locations
  • googleapis.documentai.v1.projects.locations.operations
  • googleapis.documentai.v1.projects.locations.processors
  • googleapis.documentai.v1.projects.locations.processors.humanReviewConfig
  • googleapis.documentai.v1.projects.locations.processors.processorVersions
  • googleapis.documentai.v1.projects.operations
  • googleapis.documentai.v1beta2.projects.documents
  • googleapis.documentai.v1beta2.projects.locations.documents
  • googleapis.documentai.v1beta2.projects.locations.operations
  • googleapis.documentai.v1beta2.projects.operations
  • googleapis.documentai.v1beta3.projects.locations
  • googleapis.documentai.v1beta3.projects.locations.operations
  • googleapis.documentai.v1beta3.projects.locations.processors
  • googleapis.documentai.v1beta3.projects.locations.processors.humanReviewConfig
  • googleapis.documentai.v1beta3.projects.locations.processors.processorVersions

Frequently Asked Questions

What does AI for documents do?

The DocAI platform is a centralized document processing console that provides easy access to all parsers and tools. You can automate and validate documents from the platform to simplify operations, eliminate uncertainty, and maintain correct and compliant data.

How does Google's AI for documents operate?

Document AI is a document understanding solution that takes unstructured data (documents, forms, etc.) and gives it structure through content classification, entity extraction, advanced searching, and other methods, which makes the data simpler to comprehend, analyze, and consume.

How does intelligent document processing work?

Intelligent document processing (IDP) transforms unstructured and semi-structured data into structured, usable information, giving document-centric business processes end-to-end automation.

How does AI understand the text?

Automatic text classification uses machine learning models and algorithms to assign a text to predetermined categories. For example, the bag-of-words (BOW) model represents a text by the frequency of its words, which lets text classification analytics identify patterns and sentiment in the text.
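
As a small illustration of the bag-of-words idea, the sketch below turns a sentence into word counts and then into a fixed-length vector. It illustrates the BOW model in general and says nothing about how Document AI works internally. The function names, the vocabulary, and the sample sentence are invented for the example.

```python
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Count how often each lowercase word occurs, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def to_vector(text: str, vocabulary: List[str]) -> List[int]:
    """Represent a text as word counts over a fixed vocabulary."""
    counts = bag_of_words(text)
    return [counts[word] for word in vocabulary]


vocabulary = ["invoice", "total", "passport"]
print(to_vector("Invoice total: the total is due.", vocabulary))  # prints [1, 2, 0]
```

A classifier would then be trained on such vectors, so that a text with many invoice-related words scores differently from one that mentions passports.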

Conclusion

In this article, we have extensively discussed Document AI. We have also explained Document AI features, language support in Document AI, supported files, client libraries, and more in detail.

We hope this blog has helped you enhance your Document AI knowledge. If you would like to learn more, check out our articles on introduction to cloud computing, cloud computing technologies, all about GCP, and AWS Vs. Azure Vs. Google Cloud. Practice makes perfect. To practice and prepare for interviews, you can check out Top 100 SQL problems, Interview experience, Coding interview questions, and the Ultimate guide path for interviews.

Do upvote our blog to help other ninjas grow. Happy Coding!
