Hallucinations in large language models: understanding and reducing the risk in medical translation
Artificial intelligence (AI) has become an essential part of multilingual communication, from automatic translation to content generation. Yet, one persistent challenge remains: AI hallucinations. These occur when a model generates information that is not supported by the input data or factual reality.

In the context of life sciences, hallucinations can have serious consequences. A translation error in a patient information leaflet or a dosage instruction could mislead users, affect safety, and damage trust. Understanding what causes hallucinations, how they can be reduced, and how companies like Novalins manage these risks is key to using AI responsibly.
What are AI hallucinations?
According to IBM, AI hallucinations occur when a model produces confident but incorrect or fabricated information. Unlike human mistakes, these errors often sound convincing because large language models (LLMs) are designed to predict the most probable next word, not to verify factual accuracy. When faced with incomplete or ambiguous data, the model fills gaps by inventing plausible-sounding details.
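To make this concrete, the toy Python sketch below (with invented probabilities rather than a real model) shows why pure next-word prediction produces fluent text without ever checking whether that text is true.

```python
# Toy sketch of next-word prediction. The probabilities are invented and do not
# come from any real model; the point is that decoding only asks "what is most
# likely to come next?", never "is this statement true?".
next_word_probs = {
    "twice": 0.46,  # statistically plausible continuation
    "once": 0.41,
    "as": 0.13,
}

prompt = "Take the tablet"

# Greedy decoding: pick the highest-probability word, regardless of accuracy.
chosen = max(next_word_probs, key=next_word_probs.get)
print(f"{prompt} {chosen} daily")  # fluent and confident, not necessarily correct
```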
In translation, hallucinations manifest as output that is inaccurate or bears little or no connection to the source. For example, an LLM might insert unrelated sentences, misinterpret idioms, or even repeat segments endlessly. In Apple’s recent research, these “hallucinated translations” were described as “pathological outputs” that undermine user trust and reliability in multilingual applications.
Why hallucinations occur
Hallucinations are not caused by a single factor but by how LLMs process and generate language.
 
Common causes include: 
• Predictive bias – the model chooses words that are statistically likely but semantically wrong.
• Data imbalance – limited or biased multilingual training data can lead to overgeneralisation or fabricated content.
• Lack of grounding – during generation, LLMs typically do not consult external sources to verify what they say, so they produce “best guesses” instead of verified statements.
• Prompt ambiguity – unclear input can confuse the model, especially when translating highly technical or domain-specific language.
In machine translation, hallucinations may appear as missing phrases, added content, or repeated patterns (known as “oscillatory hallucinations”).
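Oscillatory hallucinations are the easiest kind to flag automatically. The minimal Python heuristic below is purely illustrative (it is not taken from any of the research cited here): it measures how many n-grams repeat in a segment and flags suspiciously repetitive output for human review.

```python
from collections import Counter

def repetition_ratio(text: str, n: int = 3) -> float:
    """Share of n-grams that occur more than once; high values suggest oscillatory output."""
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)

normal = "Take one tablet twice daily with food."
oscillating = "take one tablet take one tablet take one tablet take one tablet"

print(repetition_ratio(normal))       # 0.0 – no repeated trigrams
print(repetition_ratio(oscillating))  # 1.0 – flag this segment for human review
```

Such a check is only a first filter: hallucinations that remain fluent and non-repetitive will slip past it, which is exactly why human review remains essential.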
Hallucinations in translation: real research insights
A 2025 study by Apple and Boston University demonstrated how persistent hallucinations can be when LLMs are used for translation tasks. The researchers found that:
• Even state-of-the-art translation models such as ALMA-7B-R can hallucinate up to 0.12% of the time on high-quality data, with some language pairs showing much higher rates.
• Hallucinations often appear in sentences with complex structures, special characters, or cultural expressions.
• Most hallucinations fall into repetitive or “oscillatory” categories, where words or phrases are repeated excessively.
To address this, Apple introduced a hallucination-focused fine-tuning framework. By training the model on pairs of correct and hallucinated translations, and rewarding it for preferring the correct one, the researchers cut hallucination rates by 96% without reducing overall translation quality.
This represents an important shift in how AI companies address translation reliability, moving from post-processing detection to intrinsic prevention during training.
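The study description above does not spell out the exact training objective, so the PyTorch sketch below is only a generic illustration of preference-based fine-tuning in the spirit of Direct Preference Optimization: a pairwise loss that rewards the model for assigning a higher relative likelihood to the correct translation than to the hallucinated one. The function, tensors, and beta value are illustrative assumptions, not Apple’s implementation.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_correct, logp_hallucinated,
                    ref_logp_correct, ref_logp_hallucinated, beta=0.1):
    """Pairwise preference loss: push the fine-tuned model to prefer the correct
    translation over the hallucinated one, relative to a frozen reference model."""
    policy_margin = logp_correct - logp_hallucinated
    reference_margin = ref_logp_correct - ref_logp_hallucinated
    return -F.logsigmoid(beta * (policy_margin - reference_margin)).mean()

# Stand-in sequence log-probabilities for a small batch of
# (correct, hallucinated) translation pairs.
loss = preference_loss(
    logp_correct=torch.tensor([-12.0, -9.5]),
    logp_hallucinated=torch.tensor([-11.0, -10.2]),
    ref_logp_correct=torch.tensor([-12.5, -9.8]),
    ref_logp_hallucinated=torch.tensor([-11.2, -10.0]),
)
print(loss.item())  # the loss falls as the model learns to prefer correct outputs
```

The design choice worth noting is that the hallucinated translations themselves become training signal, which is what makes this “intrinsic prevention during training” rather than post-processing detection.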
When AI gets it wrong: the Deloitte case
A recent case outside the translation industry illustrates why unchecked AI output can have real-world consequences. In October 2025, Fortune and Business Insider reported that Deloitte Australia partially reimbursed the Australian government after AI-generated hallucinations were discovered in a technology report. The firm had used an AI tool to assist in drafting sections of the report, some of which were later found to contain fabricated content and references.
While Deloitte maintained that human oversight had been applied, the partial refund underscored the growing expectation of accountability when using AI in professional contexts. The episode sparked an international debate about transparency, verification, and the limits of generative AI in public-sector work.
In regulated fields like the life sciences, such errors would be unacceptable. A hallucinated dosage instruction or mistranslated safety warning could have serious implications for patients and regulators alike. This is why AI must always work under human supervision, not as a replacement for expertise, but as a support tool guided by it.
At Novalins: balancing innovation with safety
At Novalins, we continuously test both machine translation (MT) engines and LLMs on real-life medical and scientific content to evaluate their accuracy and reliability. Before using a model, our team performs a detailed comparison on a sample of the client’s documents to determine which technology performs best for each language pair or document type.
All AI-assisted translations are then reviewed by certified medical translators to ensure that: 
• No information is added, omitted, or altered. 
• The meaning remains consistent across languages. 
• The translation complies with regulatory and scientific standards. 
By combining technological efficiency with human expertise, we minimise the risk of hallucination while keeping translation costs and timelines optimised.
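As a simplified illustration of how automated checks can support (but never replace) that human review, the hypothetical snippet below flags numbers, such as dosages, that appear in the source but are missing or altered in a draft translation. It is an illustrative sketch, not Novalins’ actual QA tooling.

```python
import re

def missing_numbers(source: str, translation: str) -> set[str]:
    """Return numbers present in the source but absent from the translation –
    a quick automated flag raised before the certified medical translator's review."""
    number_pattern = r"\d+(?:[.,]\d+)?"
    return set(re.findall(number_pattern, source)) - set(re.findall(number_pattern, translation))

source = "Take 2 tablets of 500 mg every 8 hours."
draft_translation = "Tomar 2 comprimidos de 50 mg cada 8 horas."  # dosage error

print(missing_numbers(source, draft_translation))  # {'500'} – escalate to the reviewer
```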
The future of AI translation
Hallucinations remind us that artificial intelligence is powerful but not infallible. As research by Apple and others continues to reduce hallucination rates, the goal is not to replace human translators but to empower them with better tools.
Cases like Deloitte’s show that even advanced AI must operate within a framework of human oversight. In life sciences, this partnership between technology and expertise ensures both innovation and safety.
Interested in testing how AI performs on your multilingual content?
Try a free pilot with Novalins to compare the quality of LLM-based and MT-based translations on your documents and see how our expert review process ensures accuracy and compliance.