Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Detecting labels in an image by using client libraries
  2.1. Install the client library
  2.2. Label detection
3. Detect text in images
  3.1. Optical Character Recognition (OCR)
  3.2. Text detection requests
    3.2.1. Detect text in a local image
    3.2.2. Detect text in a remote image
4. Detect Text in Files
  4.1. Limitations
  4.2. Document text detection requests
5. Detect Handwriting in Images
  5.1. Document text detection requests
    5.1.1. Detect document text in a local image
    5.1.2. Detect document text in a remote image
6. Detect faces in Cloud Vision
  6.1. Face detection requests
    6.1.1. Detect Faces in a local image
    6.1.2. Detect Faces in a remote image
7. Frequently Asked Questions
  7.1. What is OCR API?
  7.2. Which algorithm is used to detect the text in images?
  7.3. What is Google Vision?
8. Conclusion
Last Updated: Mar 27, 2024

Cloud Vision

Introduction

With the help of the Google Cloud Vision API, application developers can easily add vision detection features like image labelling, face and landmark identification, optical character recognition (OCR), and the tagging of explicit information to their product.
 

Google Cloud

 

Optical character recognition (OCR), the process of converting handwritten or printed text into machine-encoded text, has long been a key research topic in computer vision because of its wide range of applications: banks use OCR to process statements, and governments use it to collect survey feedback.

The Vision API offers powerful pre-trained machine learning models through REST and RPC APIs. It can label images, sorting photographs into millions of predefined categories; detect objects and faces; read printed and handwritten text; and attach useful metadata to your image database.

So, let's get started.

Detecting labels in an image by using client libraries

Before diving into the intricacies of the Cloud Vision API, one should become acquainted with how to use the Vision API in their chosen programming language:

  1. If you are new to Google Cloud, create an account to evaluate how Google products perform in real-world scenarios.
     
  2. Select or create a Google Cloud project through the Google Cloud console's project selector page.
     
  3. Check that billing for your Cloud project is enabled. Learn how to determine whether billing is enabled on a project.
     
  4. Enable the Vision API.
     
  5. Create a service account:

    1. Navigate to the Create service account page in the console.
       
    2. Select your project.
       
    3. Enter the name in the Service account name field. Based on this name, the console fills in the Service account ID field.
      Enter a description in the Service account description field, for example, "Service account for quickstart".
       
    4. Click Create and continue.
       
    5. Grant your service account access to your project: choose Project > Owner from the Select a role list.
       
    6. Click on Continue.
       
    7. To finish creating the service account, click Done.
      Keep your browser window open. It will be used in the next step.

       
  6. Create a service account key:

    1. Click the email address for the service account you created in the console.
       
    2. Click on Keys.
       
    3. Click Add key, followed by Create new key.
       
    4. Click the Create button. Your computer will download a JSON key file.
       
    5. Click Close.

       
  7. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the location of the JSON file containing your service account key. This variable only applies to the current shell session, so if you open a new one, you must set it again.
    KEY_PATH should be replaced with the path to the JSON file containing your service account key.

    1. For Linux or macOS
      export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
       
    2. For Windows
      For PowerShell:
      $env:GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"

      For Command Prompt:
      set GOOGLE_APPLICATION_CREDENTIALS=KEY_PATH

Install the client library

We will implement the code in Node.js. For more on setting up your Node.js development environment, refer to the Node.js Development Environment Setup Guide.

npm install --save @google-cloud/vision

Label detection

You may now use the Vision API to extract data from images, such as label detection. To perform your first image label detection request, run the code below.

Implementing in Node.js

async function quickstart() {
  // Import Google Cloud client library
  const vision = require('@google-cloud/vision');

  // Creates a client
  const client = new vision.ImageAnnotatorClient();

  // Performs label detection on the image file
  const [result] = await client.labelDetection('./resources/wakeupcat.jpg');
  const labels = result.labelAnnotations;
  console.log('Labels:');
  labels.forEach(label => console.log(label.description));
}
quickstart();

Detect text in images

The process of detecting text in a photograph and then enclosing it with a rectangular bounding box is known as text detection. Image-based or frequency-based techniques can be used to detect text.

Image-based techniques divide an image into segments, each made up of connected pixels with similar properties. Statistical attributes of these connected components are then used to classify each component as text or non-text, using machine learning techniques such as support vector machines and convolutional neural networks.
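The connected-component step described above can be sketched in plain JavaScript as a toy illustration (this is our own example, not part of the Vision API): label the 4-connected regions of a small binary mask, after which a real detector would compute statistics for each component and classify it as text or non-text.

```javascript
// Toy sketch: label 4-connected components in a binary grid.
// A real text detector would compute per-component statistics
// (size, aspect ratio, stroke width) and classify each as text/non-text.
function labelComponents(grid) {
  const rows = grid.length, cols = grid[0].length;
  const labels = grid.map(row => row.map(() => 0));
  let next = 0;
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (grid[r][c] === 1 && labels[r][c] === 0) {
        next += 1;
        // Flood-fill the new component with its label.
        const stack = [[r, c]];
        labels[r][c] = next;
        while (stack.length) {
          const [y, x] = stack.pop();
          for (const [dy, dx] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
            const ny = y + dy, nx = x + dx;
            if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                grid[ny][nx] === 1 && labels[ny][nx] === 0) {
              labels[ny][nx] = next;
              stack.push([ny, nx]);
            }
          }
        }
      }
    }
  }
  return {labels, count: next};
}

const {count} = labelComponents([
  [1, 1, 0, 0],
  [0, 1, 0, 1],
  [0, 0, 0, 1],
]);
console.log(count); // prints 2
```

Real systems operate on much larger binarized images, but the grouping principle is the same.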

Optical Character Recognition (OCR)

Text may be detected and extracted from images using the Vision API. Annotation features that facilitate optical character recognition (OCR) are as follows:

  • TEXT_DETECTION: Extracts text from any image and returns it (e.g., photos of street views or scenery). The model was originally designed for images with sparse text captured in a variety of lighting conditions, so it is fairly resilient to diverse fonts but works best when the text is not dense.
    The returned JSON includes the entire extracted string, as well as the individual words and their respective bounding boxes.

     

Example of OCR

  • DOCUMENT_TEXT_DETECTION: Although it extracts text from images, the response is optimized to work best with dense text and documents. The JSON contains information on pages, blocks, paragraphs, words, and breaks.
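As a rough sketch of this structure, the object below mimics the shape of a TEXT_DETECTION result (the field names follow the published response format, but all values here are invented): the full string comes first in textAnnotations, followed by one entry per word with its bounding box.

```javascript
// Illustrative shape of a TEXT_DETECTION result; all values are invented.
const result = {
  textAnnotations: [
    // Entry 0: the full detected string.
    {description: 'WAITING ON', boundingPoly: {vertices: [
      {x: 10, y: 5}, {x: 90, y: 5}, {x: 90, y: 40}, {x: 10, y: 40}]}},
    // Later entries: one per word, each with its own bounding box.
    {description: 'WAITING', boundingPoly: {vertices: [
      {x: 10, y: 5}, {x: 55, y: 5}, {x: 55, y: 40}, {x: 10, y: 40}]}},
    {description: 'ON', boundingPoly: {vertices: [
      {x: 60, y: 5}, {x: 90, y: 5}, {x: 90, y: 40}, {x: 60, y: 40}]}},
  ],
};

// The full string is always the first annotation; the words follow.
const fullText = result.textAnnotations[0].description;
const words = result.textAnnotations.slice(1).map(a => a.description);
console.log(fullText); // prints WAITING ON
console.log(words);    // prints [ 'WAITING', 'ON' ]
```

A DOCUMENT_TEXT_DETECTION response instead nests pages, blocks, paragraphs, words, and symbols, as the handwriting example later in this blog walks through.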

Text detection requests

When a text recognition request is made, the input image is searched for all possible glyphs or characters, and then each string is analyzed.

Detect text in a local image

By including the contents of the local image file as a base64-encoded string in the body of your request, you may use the Vision API to perform feature detection on the local image file.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectText() {
  /**
   * TODO(developer): Uncomment the following line before running the sample.
   */
  // const fileName = 'Local image file, e.g. /path/to/image.png';

  // Performs text detection on the local file
  const [result] = await client.textDetection(fileName);
  const detections = result.textAnnotations;
  console.log('Text:');
  detections.forEach(text => console.log(text));
}
detectText();


Detect text in a remote image

Without sending the contents of the image file in the body of your request, the Vision API can conduct feature detection directly on an image file that is stored in Google Cloud Storage or online.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectTextGCS() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const bucketName = 'Bucket where the file resides, e.g. my-bucket';
  // const fileName = 'Path to file within bucket, e.g. path/to/image.png';

  // Performs text detection on the GCS file
  const [result] = await client.textDetection(`gs://${bucketName}/${fileName}`);
  const detections = result.textAnnotations;
  console.log('Text:');
  detections.forEach(text => console.log(text));
}
detectTextGCS();

Detect Text in Files

The Vision API can detect and transcribe text from PDF and TIFF files stored in Cloud Storage.

Document text detection for PDF and TIFF files must be requested through the files:asyncBatchAnnotate method, which performs an offline (asynchronous) request and reports its status through the operations resource.

The output of a PDF/TIFF request is written as JSON files to the designated Cloud Storage bucket.

Limitations

The Vision API supports PDF/TIFF files of up to 2,000 pages. Larger files will produce an error.

Document text detection requests

PDF/TIFF document detection currently works only on files stored in Cloud Storage buckets. Responses are likewise saved as JSON files to a Cloud Storage bucket.

Example of text detection in a file

gs://cloud-samples-data/vision/pdf_tiff/census2010.pdf
Source: Census Of India 2011.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision').v1;

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectPdfText() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // Bucket where the file resides
  // const bucketName = 'my-bucket';
  // Path to PDF file within bucket
  // const fileName = 'path/to/document.pdf';
  // The folder to store the results
  // const outputPrefix = 'results';

  const gcsSourceUri = `gs://${bucketName}/${fileName}`;
  const gcsDestinationUri = `gs://${bucketName}/${outputPrefix}/`;

  const inputConfig = {
    // Supported mime_types are: 'application/pdf' and 'image/tiff'
    mimeType: 'application/pdf',
    gcsSource: {
      uri: gcsSourceUri,
    },
  };
  const outputConfig = {
    gcsDestination: {
      uri: gcsDestinationUri,
    },
  };
  const features = [{type: 'DOCUMENT_TEXT_DETECTION'}];
  const request = {
    requests: [
      {
        inputConfig: inputConfig,
        features: features,
        outputConfig: outputConfig,
      },
    ],
  };

  const [operation] = await client.asyncBatchAnnotateFiles(request);
  const [filesResponse] = await operation.promise();
  const destinationUri =
    filesResponse.responses[0].outputConfig.gcsDestination.uri;
  console.log('JSON saved to: ' + destinationUri);
}
detectPdfText();
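Each output JSON file written to the bucket contains a responses array, with one fullTextAnnotation per annotated page. The helper below (our own, operating on an invented miniature output object) shows how the per-page text could be pulled out once a file has been downloaded.

```javascript
// Our own helper: extract the recognized text from one downloaded
// output JSON object. Downloading the file itself would be done
// with a storage client such as @google-cloud/storage.
function extractPageTexts(outputJson) {
  return outputJson.responses.map(r => r.fullTextAnnotation.text);
}

// Invented miniature output object for illustration:
const sample = {
  responses: [
    {fullTextAnnotation: {text: 'Page one text.'}, context: {pageNumber: 1}},
    {fullTextAnnotation: {text: 'Page two text.'}, context: {pageNumber: 2}},
  ],
};
console.log(extractPageTexts(sample)); // prints [ 'Page one text.', 'Page two text.' ]
```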

Detect Handwriting in Images

The ability of a computer to read and interpret intelligible handwritten input from sources such as paper documents, pictures, touch screens, and other devices is known as handwriting recognition (HWR), sometimes known as handwritten text recognition (HTR).

The Vision API can detect and extract text from images:

  • DOCUMENT_TEXT_DETECTION: Although it extracts text from images, the response is optimized to work best with dense text and documents. The JSON contains information on pages, blocks, paragraphs, words, and breaks.

Example of handwriting detection

Document text detection requests

Detect document text in a local image

By including the contents of the local image file as a base64-encoded string in the body of your request, you may use the Vision API to perform feature detection on the local image file.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectDocumentText() {
  /**
   * TODO(developer): Uncomment the following line before running the sample.
   */
  // const fileName = 'Local image file, e.g. /path/to/image.png';

  // Read a local image as a text document
  const [result] = await client.documentTextDetection(fileName);
  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(`Full text: ${fullTextAnnotation.text}`);
  fullTextAnnotation.pages.forEach(page => {
    page.blocks.forEach(block => {
      console.log(`Block confidence: ${block.confidence}`);
      block.paragraphs.forEach(paragraph => {
        console.log(`Paragraph confidence: ${paragraph.confidence}`);
        paragraph.words.forEach(word => {
          const wordText = word.symbols.map(s => s.text).join('');
          console.log(`Word text: ${wordText}`);
          console.log(`Word confidence: ${word.confidence}`);
          word.symbols.forEach(symbol => {
            console.log(`Symbol text: ${symbol.text}`);
            console.log(`Symbol confidence: ${symbol.confidence}`);
          });
        });
      });
    });
  });
}
detectDocumentText();


Detect document text in a remote image

Without sending the contents of the image file in the body of your request, the Vision API can conduct feature detection directly on an image file that is stored in Google Cloud Storage or online.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectDocumentTextGCS() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const bucketName = 'Bucket where the file resides, e.g. my-bucket';
  // const fileName = 'Path to file within bucket, e.g. path/to/image.png';

  // Read a remote image as a text document
  const [result] = await client.documentTextDetection(
    `gs://${bucketName}/${fileName}`
  );
  const fullTextAnnotation = result.fullTextAnnotation;
  console.log(fullTextAnnotation.text);
}
detectDocumentTextGCS();

Detect faces in Cloud Vision

Face detection identifies multiple faces in an image, along with the key facial attributes that go with them, such as emotional state or whether the person is wearing headwear. Facial recognition of specific individuals is not supported.

Face detection example

Image credit: Himanshu Singh Gurjar on Unsplash (annotations added).

Face detection requests

Consult the API documentation before making a call to the Vision API. In this scenario, you will ask the images resource to annotate your image. A request to this API takes the form of an object with a requests list. Each item in that list holds two pieces of information:

  • The base64-encoded image data
     
  • A list of the features you want annotated for that image
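Put together, a face detection request body for the images:annotate endpoint would look roughly like this sketch, where 'BASE64_IMAGE_DATA' is a placeholder rather than real encoded bytes:

```javascript
// Sketch of an images:annotate request body for face detection.
// 'BASE64_IMAGE_DATA' stands in for the actual base64-encoded image bytes.
const request = {
  requests: [
    {
      image: {content: 'BASE64_IMAGE_DATA'},
      features: [{type: 'FACE_DETECTION', maxResults: 10}],
    },
  ],
};
console.log(JSON.stringify(request, null, 2));
```

The Node.js client's faceDetection() convenience method, used in the samples below, constructs a body like this for you.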
     

Detect Faces in a local image

By including the contents of the local image file as a base64-encoded string in the body of your request, you may use the Vision API to perform feature detection on the local image file.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectFaces() {
  /**
   * TODO(developer): Uncomment the following line before running the sample.
   */
  // const fileName = 'Local image file, e.g. /path/to/image.png';

  // Performs face detection on the local file
  const [result] = await client.faceDetection(fileName);
  const faces = result.faceAnnotations;
  console.log('Faces:');
  faces.forEach((face, i) => {
    console.log(`  Face #${i + 1}:`);
    console.log(`    Joy: ${face.joyLikelihood}`);
    console.log(`    Anger: ${face.angerLikelihood}`);
    console.log(`    Sorrow: ${face.sorrowLikelihood}`);
    console.log(`    Surprise: ${face.surpriseLikelihood}`);
  });
}
detectFaces();


Detect Faces in a remote image

Without sending the contents of the image file in the body of your request, the Vision API can conduct feature detection directly on an image file that is stored in Google Cloud Storage or online.

Implementing in Node.js

// Imports the Google Cloud client library
const vision = require('@google-cloud/vision');

// Creates a client
const client = new vision.ImageAnnotatorClient();

async function detectFacesGCS() {
  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const bucketName = 'Bucket where the file resides, e.g. my-bucket';
  // const fileName = 'Path to file within bucket, e.g. path/to/image.png';

  // Performs face detection on the GCS file
  const [result] = await client.faceDetection(`gs://${bucketName}/${fileName}`);
  const faces = result.faceAnnotations;
  console.log('Faces:');
  faces.forEach((face, i) => {
    console.log(`  Face #${i + 1}:`);
    console.log(`    Joy: ${face.joyLikelihood}`);
    console.log(`    Anger: ${face.angerLikelihood}`);
    console.log(`    Sorrow: ${face.sorrowLikelihood}`);
    console.log(`    Surprise: ${face.surpriseLikelihood}`);
  });
}
detectFacesGCS();
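The likelihood fields printed above arrive as enum names ranging from 'VERY_UNLIKELY' to 'VERY_LIKELY'. A small helper of our own can turn them into a simple threshold check:

```javascript
// Likelihood enum names returned by the Vision API, in increasing order.
const LIKELIHOODS = [
  'UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY',
  'POSSIBLE', 'LIKELY', 'VERY_LIKELY',
];

// Our own helper: treat 'LIKELY' or stronger as a positive detection.
function isLikely(likelihood) {
  return LIKELIHOODS.indexOf(likelihood) >= LIKELIHOODS.indexOf('LIKELY');
}

console.log(isLikely('VERY_LIKELY')); // prints true
console.log(isLikely('POSSIBLE'));    // prints false
```

Where you place the threshold depends on your application; a stricter filter would accept only 'VERY_LIKELY'.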

Frequently Asked Questions

What is OCR API?

An Optical Character Recognition (OCR) API lets you extract text from image files and PDF documents and save it in JSON, CSV, Excel, or other file formats.
 

Which algorithm is used to detect the text in images?

An optical character recognition (OCR) algorithm is used to convert images into text.
 

What is Google Vision?

With the help of the Google Cloud Vision API, application developers can easily add vision detection features like image labeling, face and landmark identification, optical character recognition (OCR), and the tagging of explicit information to their product.

Conclusion

In this blog, we explored the Cloud Vision API: optical character recognition (OCR), detecting text in files and images, detecting handwriting in images, and face detection.
If this blog helped you get an overview of the Cloud Vision API and you would like to learn more, check out our articles Cloud Computing, Cloud Computing Technologies, Cloud Computing Infrastructure, and Overview of a log-based metric.

Refer to our Coding Ninjas Studio Guided Path to learn Data Structures and Algorithms, Competitive Programming, JavaScript, System Design, and even more! You can also check out the mock test series and participate in the contests hosted by Coding Ninjas Studio! But say you're just starting out and want to learn about questions posed by tech titans like Amazon, Microsoft, Uber, and so on. In that case, for placement preparations, you can also look at the problems, interview experiences, and interview bundle.

Do upvote our blogs if you find them helpful and engaging!

Happy Coding!
