Artificial Intelligence’s Place in Records and Information Management
Introduction & Technology Overview
Large language models (LLMs) such as GPT-style transformers represent a significant shift in how organizations can interact with their information environments. Trained on vast corpora of text, these models generate language by predicting tokens over learned vector representations, and they excel at recognizing themes, linking concepts, and producing fluent, context-aware responses. When combined with retrieval-augmented generation (RAG) pipelines and vector databases, LLMs can ingest organizational content and offer summaries, thematic grouping, and conversational search. (Zhao et al., 2025) As a result, organizations are actively exploring LLMs for search augmentation, automated summarization, metadata extraction, and preliminary content classification.
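For illustration, the following sketch shows the basic shape of such a pipeline: a handful of records is indexed with a toy embedding, the closest matches to a query are retrieved by cosine similarity, and a grounded prompt is assembled. The embed() function and record texts are stand-ins invented for this example; a production system would use a trained embedding model, a vector database, and an actual LLM call.

```python
# Minimal sketch of a retrieval-augmented generation (RAG) loop.
# embed() is a toy hashed bag-of-words stand-in for a real embedding
# model; the final prompt would be sent to an LLM for generation.
import hashlib
import math

DIM = 256  # dimensionality of the toy embedding space

def embed(text: str) -> list[float]:
    """Toy embedding: hash each token into one of DIM buckets, then normalize."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

documents = {
    "rec-001": "Retention schedule for personnel files, seven year hold.",
    "rec-002": "Email migration plan for the shared drive cleanup.",
    "rec-003": "Personnel file transfer procedure and custody log.",
}
index = {doc_id: embed(text) for doc_id, text in documents.items()}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the IDs of the k records closest to the query."""
    q = embed(query)
    return sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground the model's answer in retrieved records, cited by ID."""
    context = "\n".join(f"[{d}] {documents[d]}" for d in retrieve(query))
    return f"Answer using only these records:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do we keep personnel records?"))
```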
Within Records and Information Management (RIM), these capabilities are especially appealing because they address some of the field’s most persistent weaknesses. Cumming (2005) emphasizes that metadata is essential for describing, contextualizing, and retrieving records, yet in practice it is often incomplete, inconsistent, or applied idiosyncratically by whoever happens to enter it. Human cataloging cannot keep up with the volume and diversity of electronic records. LLMs, by contrast, can derive thematic descriptions, extract entities, and generate contextual summaries directly from the text itself, offering a potential way to close the gap between metadata requirements and available people-hours.
RIM also struggles with fragmented information environments. Corrigan and Sprehe (2010) describe how the U.S. Air Force, despite an enterprise records program, accumulated “information landfills” of unstructured data spread across shared drives, local storage, and other systems. Similar fragmentation exists in many organizations, where records and related information are scattered across email, collaboration and instant-messaging platforms, and cloud repositories. LLMs can operate across these silos and “stitch together” meaning, identifying related content that was never harmonized under a single classification scheme.
The sheer volume of unstructured information magnifies these issues. Breuer et al. (2025) note that traditional retrieval systems are strained by the growing volume of text and by brittle keyword matching. They highlight how LLM-based semantic representations can improve discoverability by capturing conceptual relevance rather than relying on exact terms. For organizations overwhelmed by documents, email, and narrative records, LLMs promise a new layer of visibility into what those holdings actually contain.
At the same time, LLMs have important limitations. Because they generate content probabilistically, they cannot guarantee completeness or exhaustiveness in retrieval. Breuer et al. (2025) show that even RAG-based systems can hallucinate (the machine-learning term for an LLM generating a non-factual response). They also show that changes in indexing or prompt phrasing may alter which documents are returned. From a records perspective, this unpredictability conflicts with requirements for authoritative, defensible retrieval. Palmer (2000) and Cumming (2005) both stress that trustworthy records depend on authenticity, permanence, and verifiable provenance, traits that LLMs cannot ensure as currently deployed. LLMs produce interpretations of records, not the records themselves. This paper therefore evaluates LLMs not as replacements for traditional RIM tools, but as augmentation technologies. Their power lies in semantic insight, and their limitations require that they remain firmly embedded in information governance (IG) frameworks that distinguish meaning extraction from evidentiary recordkeeping.
Advantages of LLMs for RIM
LLMs offer their greatest value to RIM in large, complex, and poorly structured environments where traditional tools struggle. Their semantic capabilities, rooted in embeddings and contextual inference, allow organizations to discover patterns and relationships that are otherwise difficult to see. (Breuer et al., 2025, p. 71) One key advantage is semantic discovery across unstructured repositories. Rather than relying solely on precise keywords or rigid taxonomies, LLMs can interpret the intent behind a query and map it to conceptually relevant content. Breuer et al. (2025) describe how LLMs support semantic query expansion and conversational retrieval (p. 71), helping address “compromised information needs” when users lack the right terms or cannot fully articulate what they are seeking. (p. 74) This ability to operate in a rich semantic space also supports metadata enrichment: LLMs can infer topics, entities, and relationships across text that was never formally indexed, providing an interpretive layer over collections that are otherwise resistant to retrieval. (p. 73)
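As a rough illustration of semantic query expansion, the sketch below unions keyword hits across several phrasings of the same vague request. The expand_query() stub returns canned rewrites; in practice that step would be a single LLM call, and the record texts and phrasings here are invented for the example.

```python
# Sketch of query expansion for a "compromised information need":
# alternate phrasings recover records the original terms would miss.
def expand_query(query: str) -> list[str]:
    # Stub standing in for an LLM call such as:
    # "Rewrite this search three ways a records manager might phrase it."
    canned = {
        "old staff files": [
            "legacy personnel records",
            "inactive employee files",
            "terminated staff documentation",
        ]
    }
    return [query] + canned.get(query, [])

def keyword_search(term: str, corpus: dict) -> set:
    return {doc_id for doc_id, text in corpus.items() if term in text.lower()}

corpus = {
    "rec-101": "Inactive employee files slated for offsite storage.",
    "rec-102": "Legacy personnel records awaiting disposition review.",
    "rec-103": "Quarterly budget summary for facilities.",
}

hits = set()
for phrasing in expand_query("old staff files"):
    for word in phrasing.lower().split():
        hits |= keyword_search(word, corpus)

# rec-102 surfaces only through the expanded phrasings; the original
# query shares no terms with it.
print(sorted(hits))
```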
These capabilities directly support access and organizational memory. Cumming (2005) argues that metadata exists to describe, link, and contextualize records so they can be retrieved and understood. (p. 35) Where human-applied metadata is inconsistent or missing, LLMs can generate descriptive and contextual tags, summarize content, and propose linkages among related records. (Breuer et al., 2025, p. 72) This helps organizations rediscover information that had effectively disappeared into poorly indexed or overloaded systems and makes previously latent knowledge more readily available for current work. (Breuer et al., 2025, p. 74)
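A minimal sketch of that enrichment loop follows, under the assumption of a generic call_llm() endpoint (stubbed here with a canned JSON response so the example runs). The model drafts a title, topics, entities, and a summary, and the draft is flagged as pending human review before it can enter the catalog.

```python
# Illustrative metadata-enrichment step: the LLM drafts descriptive
# metadata that a records analyst must approve before it is committed.
import json

PROMPT = """Return JSON with keys title, topics, entities, summary
for the record below. Do not invent facts not present in the text.

{text}"""

def call_llm(prompt: str) -> str:
    # Placeholder for a model API call; canned output keeps this runnable.
    return json.dumps({
        "title": "Shared drive cleanup memo",
        "topics": ["records disposition", "storage"],
        "entities": ["Facilities Division"],
        "summary": "Memo directing cleanup of the shared drive.",
    })

def draft_metadata(text: str) -> dict:
    meta = json.loads(call_llm(PROMPT.format(text=text)))
    # Machine-drafted values stay provisional until a human approves them.
    meta["status"] = "pending_review"
    return meta

print(draft_metadata("Memo: Facilities Division shared drive cleanup..."))
```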
LLMs are also attractive in settings with large-scale information asset management challenges. Corrigan and Sprehe’s (2010) account of the Air Force’s “information landfills” illustrates how unmanaged repositories of unstructured files, duplication, and ROT (“redundant, obsolete, and trivial” information) quickly become unusable. (p. 26) Traditional, item-by-item classification cannot realistically rehabilitate such environments. LLMs can help by clustering related documents, identifying high-level themes, and highlighting areas of potential value for further human review. They do not solve lifecycle management on their own, but they can guide appraisal and cleanup efforts by focusing attention on content that appears semantically significant.
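In miniature, such a cleanup aid might look like the sketch below, which queues near-duplicate files for human appraisal. difflib string similarity stands in for embedding-based similarity, and the file paths and threshold are illustrative only.

```python
# Near-duplicate grouping to guide ROT cleanup: highly similar files are
# queued together for appraisal; the disposition decision stays human.
from difflib import SequenceMatcher
from itertools import combinations

files = {
    "F:/old/budget_2019.txt": "FY2019 budget narrative, draft 3.",
    "F:/dup/budget_2019 (copy).txt": "FY2019 budget narrative, draft 3.",
    "F:/old/picnic_flyer.txt": "Annual picnic flyer, bring a dish!",
}

def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    return SequenceMatcher(None, a, b).ratio() >= threshold

review_queue = [
    (p1, p2)
    for (p1, t1), (p2, t2) in combinations(files.items(), 2)
    if similar(t1, t2)
]
for pair in review_queue:
    print("possible duplicates:", pair)
```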
Finally, LLMs can support analytic functions such as risk and fraud detection. (Yan et al., 2024) Palmer (2000) notes that effective access to records is essential for auditing and accountability. (p. 65) Investigations often require tracing patterns and inconsistencies across many documents. LLMs can assist by summarizing material, extracting named entities, and highlighting unusual language or relationships at scale. While their outputs are not evidence, they serve as triage tools, pointing investigators toward the areas where closer evaluation should focus. (Yan et al., 2024, pp. 260-261)
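A deliberately simplified triage pass is sketched below, with a fixed list of risk phrases standing in for LLM-extracted entities and relationships; the record texts and term list are invented for illustration, and flagged items would go to investigators rather than into any automated decision.

```python
# Toy triage pass: flag records where risk language appears so that
# human investigators know where to look first. A fixed phrase list
# stands in for LLM-driven entity and relationship extraction.
RISK_TERMS = {"urgent wire", "offshore", "split invoice"}

records = {
    "inv-88": "Split invoice request from Acme Holdings, urgent wire today.",
    "inv-89": "Routine monthly invoice from Acme Holdings.",
}

def triage(text: str) -> list[str]:
    return [term for term in RISK_TERMS if term in text.lower()]

for rec_id, text in records.items():
    if flags := triage(text):
        print(f"{rec_id}: review first ({', '.join(flags)})")
```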
Key Drawbacks and RIM Challenges
The same characteristics that make LLMs powerful also limit their suitability for authoritative recordkeeping. These limitations arise from their probabilistic behavior, lack of transparency, and the strict evidentiary and governance requirements that define RIM.
First, without an intentional program designed to audit the intrinsic characteristics of a particular model, LLMs cannot guarantee completeness or exhaustiveness when used to retrieve records. Searches that reliably return the same output for the same input are central to records work, particularly in contexts such as audits, litigation, and FOIA requests, where organizations must demonstrate that all relevant records have been identified. (Stephens, 2010) Breuer et al. (2025) show that LLM-based retrieval, even when tightly coupled to document stores through RAG, cannot reliably reproduce ranked lists of results: minor changes in prompts, in the chunking of longer documents into information items, or in the underlying indices can produce different outputs. (p. 73) Such systems return what is likely relevant, not what is comprehensively relevant, and this variability undermines the use of LLMs as reliable managers of records without oversight.
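This sensitivity is easy to demonstrate. In the toy example below, the same document split at two different chunk sizes yields different best-matching chunks for the same query (scoring is plain term overlap for brevity), so the evidence surfaced to the model shifts with a mere indexing parameter.

```python
# Demonstration of chunking sensitivity: the "best" chunk for a fixed
# query changes with chunk size, because key terms are split apart.
def chunk(words: list[str], size: int) -> list[str]:
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

doc = ("the retention schedule requires legal hold review before any "
       "disposition of personnel records begins next quarter").split()
query = {"legal", "hold", "disposition"}

def best_chunk(size: int) -> str:
    return max(chunk(doc, size), key=lambda c: len(query & set(c.split())))

for size in (5, 8):
    print(f"chunk size {size}: {best_chunk(size)!r}")
# The two sizes return different passages for the identical query.
```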
Second, LLMs provide no assurance of authenticity, permanence of record, or chain of custody. Cumming (2005) emphasizes that authenticity depends on stable systems, reliable metadata, and clear provenance. LLM outputs, however, are synthetic interpretations generated by a model (Zhao et al., 2025); they are not fixed records and cannot substitute for systems that maintain evidence of creation, modification, and custody. LLMs create meaning about records but cannot preserve or certify the records themselves.
Third, LLMs can hallucinate and misrepresent. Breuer et al. (2025) note that even carefully designed systems may produce content that is plausible yet incorrect. (p. 77) In a records environment, such hallucinations can lead to misattributed facts, invented relationships, or misleading summaries that distort organizational understanding. Because RIM must uphold accuracy and traceability, any tool that can fabricate content must be used with great caution and cannot be relied upon for authoritative retrieval.
Fourth, LLMs raise ethical and bias concerns. Mökander et al. (2024) argue that foundation models exhibit governance gaps, lack transparency, and often embed the biases of their training data. (p. 1098) Without oversight, they may amplify inequities or misrepresent sensitive topics. Records management depends on neutrality, contextual accuracy, and objective handling of evidence. When LLMs reshape content through probabilistic inference, they can unintentionally undermine these principles, and their opacity complicates accountability and verification. (Breuer et al., 2025, p. 77)
Finally, reliance on semantic similarity can produce overclassification and false positives. Two documents that appear similar in language may serve different business functions or fall under different retention rules. If LLM-generated suggestions drive classification or disposition decisions without human review, lifecycle controls can be corrupted. Misclassification (either including irrelevant records or missing required ones) threatens compliance and weakens trust in the RIM environment.
Addressing Challenges: IG Frameworks for Safe Adoption
These limitations do not argue against using LLMs in RIM; rather, they argue for disciplined integration within mature information governance frameworks. The critical distinction is between using LLMs for meaning extraction and relying on them for management functions.
Any safe deployment must recognize that LLMs are tools for insight, not authoritative retention, retrieval, provision, or disposition. They are well suited to summarization, thematic analysis, and semantic search, but unsuited for roles that require completeness, permanence of record, or verified authenticity. IG frameworks should therefore treat LLM outputs as interpretive aids, always tethered to the underlying records and systems that provide evidentiary assurance. Within RIM’s aspirational function of unlocking the value of information assets, LLMs offer powerful new capabilities; but within the governing function of ensuring accountability, compliance, and risk mitigation, LLM outputs must be handled with caution to avoid introducing distortion, incompleteness, or false confidence.
Technical and policy controls are central to this approach. Technically, RAG systems should incorporate authenticated retrieval layers that expose source documents and record which materials informed a given response, mitigating some of the nondeterminism identified by Breuer et al. (2025). LLM-generated metadata or classifications should be subject to human review before they influence retention, disposition, or access controls. Policy updates should clarify when AI-generated descriptions and summaries are allowed, how they must be validated, and explicitly state that LLM outputs do not themselves constitute records or evidence. Cumming’s (2005) focus on metadata integrity and Palmer’s (2000) emphasis on trustworthy systems provide useful models for these boundaries.
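One possible shape for that authenticated retrieval layer is sketched below: every answer is logged with the IDs and content hashes of the records that informed it, together with an explicit note that the output is not itself a record. The generate() function is a placeholder for a model call, and the log format is an assumption for illustration, not a standard.

```python
# Provenance wrapper: record which sources informed each LLM response
# so reviewers can later verify them against the authoritative systems.
import hashlib
import json
from datetime import datetime, timezone

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()

def generate(query: str, sources: dict) -> str:
    # Placeholder for the actual model invocation.
    return f"[draft answer to {query!r} based on {len(sources)} records]"

def answer_with_provenance(query: str, sources: dict) -> dict:
    return {
        "query": query,
        "answer": generate(query, sources),
        "sources": [
            {"record_id": rid, "sha256": sha256(text)}
            for rid, text in sources.items()
        ],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "note": "LLM output; not itself a record",
    }

entry = answer_with_provenance(
    "When was the custody log last updated?",
    {"rec-003": "Personnel file transfer procedure and custody log."},
)
print(json.dumps(entry, indent=2))
```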
Risk and ethical oversight must also be built in. Drawing on Mökander et al. (2024), IG programs should adopt AI accountability practices that document model limitations, monitor outputs for bias and error, and periodically audit how LLMs are used in practice. (pp. 1094-1096) This should include establishing review cycles, impact assessments, and escalation paths when AI-generated content appears to conflict with organizational values or legal obligations.
Finally, organizational strategy matters. Corrigan and Sprehe’s (2010) work on Information Asset Management underscores that automation can greatly aid large-scale environments but must be aligned with lifecycle-based frameworks. (p. 27) LLMs can provide support for semantic understanding in IAM efforts by helping to surface value in unmanageable repositories, but they cannot replace disciplined RIM practices. Training, change management, and clear role definitions are needed to ensure that human oversight remains central where authenticity, retention, and compliance are at stake.
Recommendations for Incorporating LLMs into RIM Programs
Given these advantages and drawbacks, the most defensible strategy is to integrate LLMs into RIM programs as augmentation tools with clearly defined limits. Organizations should adopt LLMs where their strengths align with RIM needs: metadata enrichment, semantic search augmentation, topic modeling, summarization, and knowledge surfacing across legacy or poorly indexed systems. In those roles, LLMs can improve retrieval, restore visibility into institutional memory, and reduce the burden on human staff without displacing authoritative systems.
Conversely, LLMs should not be used for functions that require deterministic results or completeness. They should not independently drive discovery or FOIA responses, declare authoritative copies, enforce retention requirements, or make final classification decisions without human approval. Such guardrails reflect principles articulated across a wide array of RIM publications: authenticity and metadata integrity (Cumming, 2005), lifecycle alignment and defensible retention (Corrigan & Sprehe, 2010), trustworthy systems (Palmer, 2000), and responsible AI governance (Mökander et al., 2024).
LLMs are especially valuable in large organizations with extensive unstructured content, in environments where manual metadata is costly, and in contexts where rapid semantic insight is beneficial, such as early case assessment, investigative triage, or executive briefing. In these scenarios, they extend analytical capacity and improve visibility while preserving the distinction between insight and evidence.
LLMs represent the first major opportunity in decades to overcome the semantic blindness of many traditional RIM systems. By providing rich, contextual understanding of unstructured records, they can illuminate relationships and themes that have long been obscured by inconsistent metadata, fragmented repositories, and sheer volume. They offer a powerful means of enriching metadata, improving discoverability, and strengthening institutional memory. At the same time, their probabilistic nature, susceptibility to hallucinations, and lack of inherent authenticity or chain of custody prevent them from serving as authoritative recordkeeping tools. RIM's evidentiary and governance requirements demand determinism, completeness, and verifiable provenance.
Accordingly, the value of LLMs lies not in replacing established RIM frameworks but in augmenting them. When deployed within robust IG structures, supported by technical safeguards, policy boundaries, and human oversight, LLMs can significantly enhance organizational insight without compromising accountability or evidentiary integrity. They should be understood as instruments of meaning extraction rather than sources of record, helping organizations better navigate their information systems while leaving the preservation and management of evidence to systems designed for that purpose.
References
Breuer, T., Frihat, S., Fuhr, N., et al. (2025). Large language models for information retrieval: Challenges and chances. Datenbank-Spektrum, 25, 71-81. https://doi.org/10.1007/s13222-025-00503-x
Corrigan, M., & Sprehe, J. T. (2010, May-June). Cleaning up your information wasteland. Information Management Journal, 44(3), 26-30.
Cumming, K. (2005). Metadata matters. In J. McLeod & C. Hare (Eds.), Managing electronic records (pp. 34-49). Facet Publishing.
Mökander, J., Schuett, J., Kirk, H. R., et al. (2024). Auditing large language models: A three-layered approach. AI and Ethics, 4, 1085-1115. https://doi.org/10.1007/s43681-023-00289-2
Palmer, M. (2000). Records management and accountability versus corruption, fraud and maladministration. Records Management Journal, 10(2), 61-72.
Stephens, D. O. (2010). Records retention and the law. In Records management: Making the transition from paper to electronic (2nd ed., pp. 103-122). ARMA International.
Yan, Y., Hu, T., & Zhu, W. (2024). Leveraging large language models for enhancing financial compliance: A focus on anti-money laundering applications. 2024 4th International Conference on Robotics, Automation and Artificial Intelligence (RAAI), Singapore. https://doi.org/10.1109/RAAI64504.2024.10949516
Zhao, W. X., Zhou, K., Li, J., et al. (2025). A survey of large language models. arXiv preprint. https://doi.org/10.48550/arXiv.2303.18223