Last Updated: Mar 27, 2024

Hallucinations in Computer Vision

Author soham Medewar


You may have heard of hallucinations in humans: the apparent perception of something that is not actually present. But what are hallucinations in computer vision? Here, a hallucination occurs when a model's output describes or predicts something that is not present in the input image.

To understand hallucinations in computer vision, let us look at image captioning and how hallucination shows up in it.

Image Captioning And Hallucinations

Image captioning has been studied extensively, yet captioning models still make a characteristic mistake. Neural Baby Talk (NBT), an image captioning model, incorrectly outputs the object "bench" for many images. This type of error, where a caption mentions an object that is not in the image, is called a hallucination. One of the most important uses of image captioning is helping blind and visually impaired people understand images and the world around them. Studies have found that visually impaired users value the correctness of a caption over its coverage of the image. Object hallucination is therefore a serious concern, because it can mislead the users who rely on captions most.

Hallucination also points to a second problem: a model that hallucinates has likely built an inaccurate internal representation of the image. Researchers have investigated questions such as the following:

  • Which models are most likely to experience hallucinations?
  • What are some of the most common causes of hallucination?
  • How well do standard captioning metrics capture hallucination?


After analyzing a range of captioning models, the researchers addressed each of these questions. They introduced CHAIR (Caption Hallucination Assessment with Image Relevance), a new metric that measures the image relevance of generated captions. Because hallucination can have several causes, they also introduced image and language model consistency scores, which give deeper insight into errors that stem from the language model. Finally, they point out that many standard captioning metrics fail to capture hallucination.


Caption Hallucination Assessment

As noted above, the researchers proposed a new metric called CHAIR (Caption Hallucination Assessment with Image Relevance). It calculates what proportion of the object words a model generates actually appear in the image, according to the ground truth sentences and object segmentations. The metric has two variants: per-instance (CHAIRi), the fraction of mentioned object instances that are hallucinated, and per-sentence (CHAIRs), the fraction of sentences that contain at least one hallucinated object.
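The two variants can be illustrated with a minimal Python sketch. It assumes the objects mentioned in each caption and the ground truth objects for each image have already been extracted as sets; the function name `chair_scores` and the input layout are hypothetical and chosen only for illustration. Using a set per caption is a simplification, since it counts each object at most once per caption.

```python
def chair_scores(captions):
    """Compute simplified per-instance and per-sentence CHAIR scores.

    captions: list of (mentioned_objects, ground_truth_objects) pairs,
    where both elements are sets of object names (assumed input format).
    """
    total_mentioned = 0
    total_hallucinated = 0
    sentences_with_hallucination = 0

    for mentioned, truth in captions:
        # An object is hallucinated if the caption mentions it
        # but it is not among the ground truth objects for the image.
        hallucinated = mentioned - truth
        total_mentioned += len(mentioned)
        total_hallucinated += len(hallucinated)
        if hallucinated:
            sentences_with_hallucination += 1

    # Guard against empty input to avoid division by zero.
    chair_i = total_hallucinated / total_mentioned if total_mentioned else 0.0
    chair_s = sentences_with_hallucination / len(captions) if captions else 0.0
    return chair_i, chair_s


# Toy data: "bench" is hallucinated in the first and third captions.
example = [
    ({"dog", "bench"}, {"dog", "person"}),
    ({"cat"}, {"cat", "couch"}),
    ({"car", "bench"}, {"car"}),
]
chair_i, chair_s = chair_scores(example)
print(chair_i, chair_s)
```

In this toy example, 2 of the 5 mentioned objects are hallucinated, so the per-instance score is 2/5, and 2 of the 3 captions contain a hallucinated object, so the per-sentence score is 2/3. Lower values mean fewer hallucinations.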


The metric is restricted to the 80 MSCOCO object categories. Each generated sentence is tokenized and each word is singularized before it is matched against these categories. According to the researchers, a list of MSCOCO objects is determined for each ground truth sentence in the same way, while the MSCOCO segmentation annotations are used simply through their provided object labels. The researchers found that the choice of annotation source matters.
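The extraction step described above might look roughly like the following sketch. The vocabulary `COCO_OBJECTS` is a small hypothetical subset of the 80 categories, and `singularize` is a deliberately naive stand-in for a proper singularizer, so this is an illustration under those assumptions rather than the researchers' implementation. It handles only single-word category names.

```python
import re

# Hypothetical subset of the 80 MSCOCO categories, for illustration only.
COCO_OBJECTS = {"person", "dog", "cat", "bench", "car", "bus", "couch"}


def singularize(word):
    """Naive singularization (an assumption, not the original method):
    strip "es" or "s" only when the result is a known object name."""
    if word.endswith("es") and word[:-2] in COCO_OBJECTS:
        return word[:-2]
    if word.endswith("s") and word[:-1] in COCO_OBJECTS:
        return word[:-1]
    return word


def extract_objects(caption):
    """Tokenize a caption, singularize each token, and keep the tokens
    that name an object in the vocabulary."""
    tokens = re.findall(r"[a-z]+", caption.lower())
    return {singularize(token) for token in tokens} & COCO_OBJECTS


print(extract_objects("Two dogs sit on benches near a bus."))
```

Applying the same function to generated captions and to ground truth sentences yields object sets that can be compared directly, which is the input a CHAIR-style score needs.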


They also drew ground truth objects from the sentence annotations, which captures how human annotators tend to refer to objects. They found that using only segmentation labels or only reference captions leads to higher measured hallucination than using both. The researchers also define image consistency and language consistency. Image consistency measures how consistent a captioning model's errors are with a model that predicts objects from the image alone. Language consistency measures how consistent those errors are with a model that predicts each word purely from the previously generated words.


The researchers evaluated many baseline captioning models covering a wide range of architectures, including models with and without attention mechanisms. In these baselines, LSTM recurrent networks generate the text. The standard cross-entropy (CE) loss and the self-critical (SC) loss are the training objectives for most of the models. The captioning models are evaluated on two MSCOCO splits.
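To make the difference between the two training objectives concrete, here is a small framework-free sketch. It assumes the usual formulations: cross-entropy as the average negative log-probability of the reference tokens, and the self-critical loss as a policy-gradient objective that weights a sampled caption's log-probability by how much its reward exceeds the reward of the greedily decoded caption. The function names and the toy numbers are hypothetical.

```python
import math


def cross_entropy_loss(reference_token_probs):
    """Average negative log-probability the model assigns to each
    reference (ground truth) token of a caption."""
    return -sum(math.log(p) for p in reference_token_probs) / len(reference_token_probs)


def self_critical_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled caption.

    The greedy caption's reward serves as a baseline: sampled captions
    that score higher than the baseline are made more likely, and
    captions that score lower are made less likely.
    """
    advantage = sample_reward - greedy_reward
    return -advantage * sum(sample_log_probs)


# Toy values (assumed): two reference tokens, each with probability 0.5.
print(cross_entropy_loss([0.5, 0.5]))

# Toy values (assumed): the sampled caption earns reward 0.8 and the
# greedy caption earns 0.5 under some sentence-level metric.
print(self_critical_loss([-1.0, -2.0], sample_reward=0.8, greedy_reward=0.5))
```

The cross-entropy loss only rewards matching the reference tokens, while the self-critical loss optimizes whatever sentence-level reward is plugged in, so the choice of reward shapes what the model learns to say.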


Finally, the researchers identified several causes of hallucination and showed that hallucination does not always correlate with scores on standard captioning metrics. One notable finding was that models with attention hallucinate less, although much of the credit goes to their access to the underlying convolutional features rather than to the attention mechanism itself. They also found that a strong visual representation helps reduce hallucination. The researchers conclude that the design and training of captioning models should be guided not only by cross-entropy loss or standard sentence metrics but also by image relevance.



1. What is a hallucination in machine learning? 

In machine learning, a hallucination is an interpretation error in which an AI system, for example a machine vision model, reports something that is not actually present in its input. Hallucinations can arise from a variety of factors that cause the system to misclassify what it sees.


2. What is a hallucination in NLP?

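In NLP, hallucination refers to generated text that reads fluently but is not supported by the input, such as a caption or summary that mentions something absent from the source.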

3. What is the need for image captioning?

Image captioning is a key part of scene understanding. It combines computer vision and natural language processing to automatically generate natural language descriptions of the content of an image.

Key Takeaways

In this article, we discussed what hallucination is in image captioning models, why it occurs, and how the CHAIR metric measures it.


Happy Coding!
