What Is Data Inconsistency?
Data inconsistency occurs when data isn’t standardized or uniform across data sources, systems, or formats. It often arises when data is pulled from outdated or incorrect sources that contain conflicting, invalid, or partial information for the same attribute.
Inconsistent data can also arise from differences in formats, currencies, or units between sources, typically caused by typos, improper formatting, or invalid entries stemming from human error or lack of knowledge.
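As a concrete, hypothetical illustration, the same annual revenue might be recorded in three different representations across systems; all source names and values below are made up:

```python
# Hypothetical example: one annual revenue figure, three representations.
revenue_by_source = {
    "crm": "USD 1,200,000",   # string with a currency code and separators
    "billing": 1.2,           # numeric, expressed in millions
    "warehouse": "$1.2M",     # abbreviated string
}

# All three entries describe the same amount, but a naive comparison sees
# three distinct values, so downstream joins and aggregates silently break.
print(len({str(v) for v in revenue_by_source.values()}))  # prints 3
```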
What Is Unstructured Data?
Unstructured data is any data that lacks a predefined format or structure. It comes in a variety of forms, such as:
- Text data (emails, documents, chat messages, social media posts, customer reviews)
- Multimedia (images, videos, audio files)
- Logs (from servers or applications)
- Web pages (HTML content)
- IoT or sensor data
Unlike structured data in a database, which is organized in rows and columns, unstructured data lacks a predetermined model, making it harder to organize, process, and analyze. However, the latest innovations in artificial intelligence (AI), machine learning (ML), natural language processing (NLP), observability, and advanced analytics are enabling organizations to get more insight and value out of unstructured data, especially for building large language models (LLMs) and GenAI applications.
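To make the contrast concrete, here is a minimal, hypothetical sketch of the same fact held as structured versus unstructured data:

```python
# Structured: tabular and typed; the revenue is directly queryable by column.
structured = [{"fiscal_year": 2024, "revenue_usd": 1_000_000}]

# Unstructured: free text; the same fact must be extracted before it can be used.
unstructured = "Fict.ai reported a revenue of 1 million dollars in fiscal 2024."
```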
Data Inconsistency in Unstructured Data
Inconsistent information in unstructured data, such as documents, can significantly degrade the accuracy and reliability of LLM applications that rely on those documents for knowledge retrieval and processing. When documents contain typos or contradictory statements, the LLM can’t determine which information is correct, and those discrepancies lead it to produce invalid output or misleading responses.
Testing the Impact of Data Inconsistency
To test the impact of data inconsistency in LLM datasets, imagine a scenario where we need to determine a company’s total revenue from unstructured data in its financial reports.
For this test, we used a mock revenue statement from a fictitious company, Fict.ai:
Fict.ai, a rising player in the artificial intelligence sector, has demonstrated strong financial performance, reporting a revenue of 1.2 million dollars in the latest fiscal year. This achievement highlights the company’s ability to monetize its AI-driven solutions effectively, securing a solid position in a competitive market. With a revenue of 1 million, Fict.ai has managed to attract investors and expand its research and development efforts, ensuring continuous innovation in machine learning and automation.
Despite market fluctuations, Fict.ai has maintained a stable financial trajectory, consistently hitting the 1 million revenue mark, reinforcing its business model’s viability. The company’s sustained growth, reflected in its 1 million-dollar earnings, enables it to explore new markets and diversify its AI offerings. Looking ahead, Fict.ai aims to scale beyond its current 1 million revenue, leveraging its expertise to drive further profitability and technological advancements.
This sample financial report incorrectly discloses Fict.ai’s revenue as $1.2 million in the first sentence, but then states the correct figure of $1 million four times throughout the rest of the document.
Now, imagine this conflicting report is part of the unstructured knowledge base behind Fict.ai’s GenAI-powered financial assistant. How would you know whether you could trust the response?
Testing Four LLMs
We used Fict.ai’s report to test four state-of-the-art LLMs — GPT-4o, Claude Sonnet, o1-mini, and DeepSeek-R1 — and asked for the revenue amount using the following prompt:
Given the following details about Fict.ai, can you tell me their annual revenue? Just return a numeric value.
How did the LLMs respond?
All four LLMs got it wrong, returning $1,200,000 as Fict.ai’s revenue and anchoring on the value from the first mention of revenue in the document.
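For reference, here is a minimal sketch of how such a test could be scripted, assuming the OpenAI Python SDK with an API key in the environment; the file name is a placeholder for the mock report above, and models from other vendors (Claude Sonnet, DeepSeek-R1) would be called through their own SDKs:

```python
from openai import OpenAI  # assumes the openai package is installed

# Placeholder: a local file holding the full mock report text shown above.
REPORT = open("fict_ai_report.txt").read()
PROMPT = (
    "Given the following details about Fict.ai, can you tell me their "
    "annual revenue? Just return a numeric value.\n\n" + REPORT
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
for model in ["gpt-4o", "o1-mini"]:  # illustrative subset of the models tested
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(model, "->", response.choices[0].message.content)
```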
Test 2: Detecting Inconsistencies
We ran a second test, adjusting the prompt to check whether the LLMs could detect the inconsistency before responding.
Edited prompt:
Given the following details about Fict.ai, can you tell me what is their annual revenue? Just return a numeric value. If the revenue is inconsistent return all values in a list.
With the tightened prompt, the LLMs generated a new response that flagged the inconsistency:
The revenue values mentioned are inconsistent: [1.2, 1] (in millions)
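In the scripted sketch above, only the prompt changes; `client` and `REPORT` are reused from the earlier block:

```python
# Re-running the same harness with the adjusted prompt.
ADJUSTED_PROMPT = (
    "Given the following details about Fict.ai, can you tell me what is "
    "their annual revenue? Just return a numeric value. If the revenue is "
    "inconsistent return all values in a list.\n\n" + REPORT
)
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative; repeat for each model under test
    messages=[{"role": "user", "content": ADJUSTED_PROMPT}],
)
print(response.choices[0].message.content)
```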
The Impact of Data Inconsistency
As this test demonstrates, LLM-based knowledge retrieval can be very sensitive to underlying Data Quality issues, such as inconsistent values in documents. Why does that matter? The impacts of LLMs providing inaccurate answers and unreliable output include:
- Dissatisfied users
- Lower adoption of AI-driven applications
- Exposure of the business to financial losses, legal consequences, or reputational damage
This simplified test case underscores the importance of Data Quality for unstructured data to ensure that LLM applications run on accurate, consistent, and trustworthy information. Moreover, without a systematic way to monitor Data Quality for unstructured data, the effectiveness of AI-driven automation, decision-making, and knowledge retrieval can be severely compromised.
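One possible starting point for such monitoring, sketched here under simplified assumptions (a single metric, figures expressed in millions, English text), is a pre-ingestion check that flags documents whose figures disagree:

```python
import re

# Simplified Data Quality check: extract every revenue-style figure
# expressed in millions and flag the document if the values disagree.
MILLIONS = re.compile(r"\$?\s*(\d+(?:\.\d+)?)\s*(?:million|M)\b", re.IGNORECASE)

def distinct_figures(text: str) -> set[float]:
    """Return the distinct monetary figures (in millions) found in the text."""
    return {float(match.group(1)) for match in MILLIONS.finditer(text)}

figures = distinct_figures(open("fict_ai_report.txt").read())  # mock report above
if len(figures) > 1:
    # Flag the document for review before it reaches the LLM's knowledge base.
    print(f"Inconsistent figures detected: {sorted(figures)}")  # [1.0, 1.2]
```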