Collecting AI-Generated Content for Legal Review: What Chain of Custody Requires
When organizations use generative AI tools for business work, those tools produce electronically stored information (ESI): prompts, model outputs, conversation histories, and associated metadata. K&L Gates confirms that this content is discoverable under FRCP 26(b)(1) when relevant to a claim or defense. For legal operations leaders, compliance officers, and enterprise IT teams, collecting AI-generated content for legal review is no longer a future consideration. It is an active chain of custody requirement in any matter where generative AI has touched the relevant work.
What Is AI-Generated Content in a Legal Context?
Generative AI artifacts form a distinct category of potentially discoverable information, encompassing prompts (the inputs a user submits), model outputs (the content the AI tool generates), conversation histories across multi-turn sessions, versioned iterations of content refined through successive prompts, activity logs that record when and how tools were used, and the configuration or model settings in effect at the time of generation. Each of these elements may be independently relevant to a matter, and each exists as ESI within the meaning of FRCP 34.
The organizational footprint of this data is larger than most legal teams account for at the start of a matter. It spans enterprise-licensed platforms such as Microsoft Copilot, browser-based consumer tools accessed on corporate devices, productivity plug-ins embedded in standard office software, and standalone large language model interfaces that employees use through personal accounts. All of them produce data. Not all of them make that data easy to collect.
According to the Lighthouse Global 2025 AI in eDiscovery Report, enterprise AI adoption grew 95% year over year, with 35% of organizations currently using legal AI solutions and another 49% actively evaluating them. The volume of AI-generated ESI inside enterprise environments is growing faster than most collection workflows have been updated to address.
Why Chain of Custody Applies to AI-Generated Content
Chain of custody in eDiscovery refers to the documented, unbroken record of how data was identified, preserved, collected, handled, and produced. The NIST Generative AI Risk Management Framework (AI 600-1) defines high-integrity information as content that is accurate, reliable, verifiable, authenticatable, and carries a clear chain of custody. That definition applies directly to AI-generated content that enters a legal review workflow.
The chain of custody requirement serves three functions in legal review. First, it establishes that the content collected is the same content that existed at the source, verified through hash values and collection logs. Second, it authenticates that the content was produced by a specific custodian using a specific tool at a specific time, which is necessary for relevance and privilege determinations. Third, it demonstrates that the collection process was targeted, reasoned, and proportional, which courts have stated is the appropriate standard for AI content collection rather than blanket preservation of all AI interactions.
A recent K&L Gates article describes a case in which the court compelled production of millions of generative AI logs, including user prompts and model responses, subject to anonymization. The same court denied a separate motion to compel internal AI tool content, finding it irrelevant and disproportionate. That pairing illustrates what the current legal standard requires: not blanket collection, but a defensible scope and a documented process for what was collected and why.
What AI-Generated Content Actually Includes
Legal teams approaching their first matter involving AI-generated content frequently underestimate its scope. The content is not limited to final outputs. A single business task completed with a generative AI tool may produce all of the following, each of which may be independently discoverable:
- The prompt chain: Every input submitted during a session, including refinements and follow-up instructions that shaped the final output.
- Model outputs at each stage: The complete text, code, image, or document generated in response to each prompt, not only the final version used.
- Conversation history: The full sequence of turns in a multi-turn session, which provides context that individual prompts and outputs cannot convey alone.
- Activity logs: Platform-level records documenting when the tool was accessed, by which account, and for what duration.
- Embedded metadata: Tool name, version, model configuration, and session identifiers attached to generated content.
- Exported or saved content: Any output saved to a document repository, email, or collaboration platform, which may exist separately from the AI platform's own logs.
The challenge for collection is that these elements are distributed across different systems. The output that appears in a document repository, the prompt that produced it on the AI platform, and the activity log that records the session may sit in three separate data sources with different retention defaults and access configurations.
The Collection Gap: Where Legal Teams Lose Ground
Standard eDiscovery collection workflows were designed for structured sources: email archives, shared drives, messaging platforms, and collaboration tools. AI-generated content does not map cleanly to those workflows, and the gaps it creates are consequential.
Auto-Deletion Before Collection
Many AI platforms auto-delete conversation histories on cycles of 30 days or less. K&L Gates's 2026 guidance on GenAI data preservation makes clear that legal hold notices must specifically instruct custodians to disable auto-delete settings on any AI platform used for business work. A litigation hold that covers email and collaboration platforms but omits AI tools leaves a gap that auto-deletion will fill before collection can begin.
Consumer Tools Outside IT Visibility
Enterprise AI platforms with IT-managed logs represent the visible portion of the problem. Browser-based tools, personal accounts, and AI-enabled productivity features that employees access without IT oversight hold equally relevant content and are systematically excluded from standard custodian data maps. Custodian interviews and legal hold notices must specifically address these sources. Onna's guidance on auditing data from collaboration apps addresses the same class of problem: sources that are outside the standard inventory are the ones most likely to produce collection gaps.
Output Without Context
Collecting AI-generated outputs without the prompts that produced them, or without the surrounding conversation thread, creates an incomplete record that is difficult to authenticate. A document that originated in a generative AI session but is collected only from the repository where it was saved lacks the chain of evidence connecting it to a custodian, a tool, and a point in time. That connection is precisely what chain of custody requires.
Missing Hash Verification
Collection processes that rely on manual export, email forwarding, or copy-paste transfer of AI-generated content do not produce cryptographic hash values at the point of capture. Without hash values recorded at collection, there is no mechanism to demonstrate that the content has not been altered between collection and production. This is a defensibility gap that opposing counsel can exploit.
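To make the mechanics concrete, here is a minimal sketch of hashing at the point of capture, assuming Python and a file already exported from the source platform (the function name and log format are illustrative, not a reference to any specific collection platform):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def hash_at_capture(path: str, custodian: str, source: str,
                    log_path: str = "collection_log.jsonl") -> str:
    """Compute a SHA-256 hash of collected content and append a log entry.

    Recording the hash and timestamp at capture is what later allows the
    producing party to demonstrate the content was not altered.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    entry = {
        "file": path,
        "sha256": digest,
        "custodian": custodian,
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return digest
```

A purpose-built platform does this (and more) automatically at collection time; the point of the sketch is that the hash and the log entry must exist from the moment of capture, not be reconstructed afterward.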
What a Defensible Collection Record Requires
A collection record for AI-generated content meets chain of custody requirements for legal review only when it documents, at minimum: the data source and platform, the custodian and account, the collection tool and method, the date and time of collection, cryptographic hash values generated at the point of capture, and the native metadata preserved with the content.
How to Collect AI-Generated Content Without Breaking the Chain
A defensible collection of AI-generated content for legal review follows a defined sequence. Each step builds on the previous one, and skipping steps creates the gaps described above.
Map AI Data Sources Before the Hold
The legal hold process must begin with a current inventory of every AI tool in active use: enterprise-licensed platforms, browser-based tools, productivity plug-ins, and any standalone interfaces accessed on corporate devices. Without this inventory, the hold notice cannot cover the full custodian population, and collection will be structurally incomplete.
Issue a Hold That Specifically Addresses AI Content
The litigation hold notice must instruct custodians to identify all AI tools used in connection with the relevant matter and time period, disable auto-delete settings on those platforms, preserve prompt threads and outputs without editing or selectively copying content, and disclose any use of personal or browser-based tools so that use can be evaluated. A hold that covers only email and collaboration platforms is not sufficient when AI-generated content is implicated. See Onna's data collection platform requirements for cross-team matters for practical guidance on structuring holds that cover modern data source inventories.
Preserve in Place Before Collection Begins
Where the platform supports it, apply in-place preservation before initiating export or collection. This prevents auto-deletion from running during the collection window and ensures that the data preserved matches what is later collected. Preservation and collection are distinct steps; conflating them creates audit trail gaps.
Collect Directly from the Source Using a Purpose-Built Platform
Manual export or copy-paste collection of AI-generated content does not produce a defensible chain of custody record. A purpose-built data collection platform collects content directly from the source, preserves native metadata, generates cryptographic hash values at the point of capture, and maintains a complete, auditable collection log. These are the elements that make a collection record defensible.
Capture the Full Interaction Record, Not Just the Output
Collect the complete conversation thread, including all prompts, all model responses, timestamps, session identifiers, platform metadata, and any separately available activity logs. Proportionality does not require collecting every AI interaction in an organization. It requires collecting every AI interaction within the defined scope, completely.
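One way to see why completeness matters is to sketch the interaction record as a data structure. The fields below are illustrative only; no specific platform export format is implied, and the names are hypothetical:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Turn:
    timestamp: str   # ISO 8601, UTC
    role: str        # "user" (prompt) or "assistant" (model output)
    content: str

@dataclass
class InteractionRecord:
    session_id: str
    custodian: str
    platform: str            # tool name and version
    model_config: dict       # model settings in effect at generation time
    turns: list              # full prompt/response sequence, in order
    activity_log_ref: Optional[str] = None  # pointer to separately collected logs

# An output collected without the surrounding turns would lose the
# context that connects it to a custodian, a tool, and a point in time.
record = InteractionRecord(
    session_id="sess-001",
    custodian="jdoe",
    platform="ExampleAI 2.1",  # hypothetical tool
    model_config={"temperature": 0.2},
    turns=[
        Turn("2025-01-15T09:00:00Z", "user", "Draft a summary of the Q4 report."),
        Turn("2025-01-15T09:00:04Z", "assistant", "Q4 revenue grew ..."),
    ],
)
```

Any element missing from this structure, such as a prompt with no timestamp or an output with no session identifier, is a gap that opposing counsel can probe.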
Document the Methodology
Prepare a written collection methodology record identifying the data sources, collection tools and connectors, custodians, date range, and data integrity steps. This record supports meet-and-confer discussions on ESI protocols, serves as the basis for any court certification, and is the primary defense against challenges to collection completeness.
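A methodology record can be as simple as a structured document generated alongside the collection. The sketch below mirrors the elements named above; the structure and field names are an assumption for illustration, not a court-mandated format:

```python
import json

# Illustrative methodology record; all values are hypothetical.
methodology = {
    "matter": "Example v. Example",
    "data_sources": ["Microsoft Copilot", "browser-based AI tools"],
    "collection_tools": ["connector-based collection platform"],
    "custodians": ["jdoe", "asmith"],
    "date_range": {"start": "2024-01-01", "end": "2024-12-31"},
    "integrity_steps": [
        "in-place preservation applied before export",
        "SHA-256 hash generated at point of capture",
        "native metadata preserved",
    ],
}

print(json.dumps(methodology, indent=2))
```

Keeping the record in a structured, machine-readable form makes it straightforward to attach to an ESI protocol or a certification when one is required.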
Build a Collection Workflow That Covers AI Sources
If your organization needs to collect AI-generated content for litigation, an internal investigation, or a regulatory matter, Onna's team can help you build a workflow that meets chain of custody requirements across enterprise and collaboration data sources. The time to establish that workflow is before the next matter requires it.
Talk to the Onna team: Contact Us.

