Collecting AI-Generated Content for Legal Review: What Chain of Custody Requires
When organizations use generative AI tools for business work, those tools produce electronically stored information (ESI): prompts, model outputs, conversation histories, and associated metadata. K&L Gates confirms that this content is discoverable under FRCP 26(b)(1) when relevant to a claim or defense. For legal operations leaders, compliance officers, and enterprise IT teams, collecting AI-generated content for legal review is no longer a future consideration. It is an active chain of custody requirement in any matter where generative AI has touched the relevant work.
What Is AI-Generated Content in a Legal Context?
Generative AI artifacts form a distinct category of potentially discoverable information, encompassing prompts (the inputs a user submits), model outputs (the content the AI tool generates), conversation histories across multi-turn sessions, versioned iterations of content refined through successive prompts, activity logs that record when and how tools were used, and the configuration or model settings in effect at the time of generation. Each of these elements may be independently relevant to a matter, and each exists as ESI within the meaning of FRCP 34.
The organizational footprint of this data is larger than most legal teams account for at the start of a matter. It spans enterprise-licensed platforms such as Microsoft Copilot, browser-based consumer tools accessed on corporate devices, productivity plug-ins embedded in standard office software, and standalone large language model interfaces that employees use through personal accounts. All of them produce data. Not all of them make that data easy to collect.
According to the Lighthouse Global 2025 AI in eDiscovery Report, enterprise AI adoption grew 95% year over year, with 35% of organizations currently using legal AI solutions and another 49% actively evaluating them. The volume of AI-generated ESI inside enterprise environments is growing faster than most collection workflows have been updated to address.
Why Chain of Custody Applies to AI-Generated Content
Chain of custody in eDiscovery refers to the documented, unbroken record of how data was identified, preserved, collected, handled, and produced. The NIST Generative AI Risk Management Framework (AI 600-1) defines high-integrity information as content that is accurate, reliable, verifiable, authenticatable, and carries a clear chain of custody. That definition applies directly to AI-generated content that enters a legal review workflow.
The chain of custody requirement serves three functions in legal review. First, it establishes that the content collected is the same content that existed at the source, verified through hash values and collection logs. Second, it authenticates that the content was produced by a specific custodian using a specific tool at a specific time, which is necessary for relevance and privilege determinations. Third, it demonstrates that the collection process was targeted, reasoned, and proportional, which courts have stated is the appropriate standard for AI content collection rather than blanket preservation of all AI interactions.
A recent K&L Gates article describes a case in which the court compelled production of millions of generative AI logs, including user prompts and model responses, subject to anonymization. The same court denied a separate motion to compel internal AI tool content, finding it irrelevant and disproportionate. That pairing illustrates what the current legal standard requires: not blanket collection, but a defensible scope and a documented process for what was collected and why.
What AI-Generated Content Actually Includes
Legal teams approaching their first matter involving AI-generated content frequently underestimate its scope. The content is not limited to final outputs. A single business task completed with a generative AI tool may produce all of the following, each of which may be independently discoverable:
- The prompt chain: Every input submitted during a session, including refinements and follow-up instructions that shaped the final output.
- Model outputs at each stage: The complete text, code, image, or document generated in response to each prompt, not only the final version used.
- Conversation history: The full sequence of turns in a multi-turn session, which provides context that individual prompts and outputs cannot convey alone.
- Activity logs: Platform-level records documenting when the tool was accessed, by which account, and for what duration.
- Embedded metadata: Tool name, version, model configuration, and session identifiers attached to generated content.
- Exported or saved content: Any output saved to a document repository, email, or collaboration platform, which may exist separately from the AI platform's own logs.
The challenge for collection is that these elements are distributed across different systems. The output that appears in a document repository, the prompt that produced it on the AI platform, and the activity log that records the session may sit in three separate data sources with different retention defaults and access configurations.
The Collection Gap: Where Legal Teams Lose Ground
Standard eDiscovery collection workflows were designed for structured sources: email archives, shared drives, messaging platforms, and collaboration tools. AI-generated content does not map cleanly to those workflows, and the gaps it creates are consequential.
Auto-Deletion Before Collection
Many AI platforms auto-delete conversation histories on cycles of 30 days or less. K&L Gates's 2026 guidance on GenAI data preservation makes clear that legal hold notices must specifically instruct custodians to disable auto-delete settings on any AI platform used for business work. A litigation hold that covers email and collaboration platforms but omits AI tools leaves a gap that auto-deletion will fill before collection can begin.
Consumer Tools Outside IT Visibility
Enterprise AI platforms with IT-managed logs represent the visible portion of the problem. Browser-based tools, personal accounts, and AI-enabled productivity features that employees access without IT oversight hold equally relevant content and are systematically excluded from standard custodian data maps. Custodian interviews and legal hold notices must specifically address these sources. Onna's guidance on auditing data from collaboration apps addresses the same class of problem: sources that are outside the standard inventory are the ones most likely to produce collection gaps.
Output Without Context
Collecting AI-generated outputs without the prompts that produced them, or without the surrounding conversation thread, creates an incomplete record that is difficult to authenticate. A document that originated in a generative AI session but is collected only from the repository where it was saved lacks the chain of evidence connecting it to a custodian, a tool, and a point in time. That connection is precisely what chain of custody requires.
Missing Hash Verification
Collection processes that rely on manual export, email forwarding, or copy-paste transfer of AI-generated content do not produce cryptographic hash values at the point of capture. Without hash values recorded at collection, there is no mechanism to demonstrate that the content has not been altered between collection and production. This is a defensibility gap that opposing counsel can exploit.
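To make the mechanics concrete, here is a minimal sketch of hashing at the point of capture, assuming Python and a file already exported from the source platform (the function name and log format are illustrative, not a reference to any specific collection platform):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def hash_at_capture(path: str, custodian: str, source: str,
                    log_path: str = "collection_log.jsonl") -> str:
    """Compute a SHA-256 hash of collected content and append a log entry.

    Recording the hash and timestamp at capture is what later allows the
    producing party to demonstrate the content was not altered.
    """
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    entry = {
        "file": path,
        "sha256": digest,
        "custodian": custodian,
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(log_path, "a") as log:
        log.write(json.dumps(entry) + "\n")
    return digest
```

A purpose-built platform does this (and more) automatically at collection time; the point of the sketch is that the hash and the log entry must exist from the moment of capture, not be reconstructed afterward.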
What a Defensible Collection Record Requires
A collection record for AI-generated content meets chain of custody requirements for legal review only when it documents, at minimum: the data source and platform, the custodian and account, the collection tool and method, the date and time of collection, cryptographic hash values generated at the point of capture, and the native metadata preserved with the content.
How to Collect AI-Generated Content Without Breaking the Chain
A defensible collection of AI-generated content for legal review follows a defined sequence. Each step builds on the previous one, and skipping steps creates the gaps described above.
Map AI Data Sources Before the Hold
The legal hold process must begin with a current inventory of every AI tool in active use: enterprise-licensed platforms, browser-based tools, productivity plug-ins, and any standalone interfaces accessed on corporate devices. Without this inventory, the hold notice cannot cover the full custodian population, and collection will be structurally incomplete.
Issue a Hold That Specifically Addresses AI Content
The litigation hold notice must instruct custodians to identify all AI tools used in connection with the relevant matter and time period, disable auto-delete settings on those platforms, preserve prompt threads and outputs without editing or selectively copying content, and disclose any use of personal or browser-based tools so that use can be evaluated. A hold that covers only email and collaboration platforms is not sufficient when AI-generated content is implicated. See Onna's data collection platform requirements for cross-team matters for practical guidance on structuring holds that cover modern data source inventories.
Preserve in Place Before Collection Begins
Where the platform supports it, apply in-place preservation before initiating export or collection. This prevents auto-deletion from running during the collection window and ensures that the data preserved matches what is later collected. Preservation and collection are distinct steps; conflating them creates audit trail gaps.
Collect Directly from the Source Using a Purpose-Built Platform
Manual export or copy-paste collection of AI-generated content does not produce a defensible chain of custody record. A purpose-built data collection platform collects content directly from the source, preserves native metadata, generates cryptographic hash values at the point of capture, and maintains a complete, auditable collection log. These are the elements that make a collection record defensible.
Capture the Full Interaction Record, Not Just the Output
Collect the complete conversation thread, including all prompts, all model responses, timestamps, session identifiers, platform metadata, and any separately available activity logs. Proportionality does not require collecting every AI interaction in an organization. It requires collecting every AI interaction within the defined scope, completely.
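One way to see why completeness matters is to sketch the interaction record as a data structure. The fields below are illustrative only; no specific platform export format is implied, and the names are hypothetical:

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Turn:
    timestamp: str   # ISO 8601, UTC
    role: str        # "user" (prompt) or "assistant" (model output)
    content: str

@dataclass
class InteractionRecord:
    session_id: str
    custodian: str
    platform: str            # tool name and version
    model_config: dict       # model settings in effect at generation time
    turns: list              # full prompt/response sequence, in order
    activity_log_ref: Optional[str] = None  # pointer to separately collected logs

# An output collected without the surrounding turns would lose the
# context that connects it to a custodian, a tool, and a point in time.
record = InteractionRecord(
    session_id="sess-001",
    custodian="jdoe",
    platform="ExampleAI 2.1",  # hypothetical tool
    model_config={"temperature": 0.2},
    turns=[
        Turn("2025-01-15T09:00:00Z", "user", "Draft a summary of the Q4 report."),
        Turn("2025-01-15T09:00:04Z", "assistant", "Q4 revenue grew ..."),
    ],
)
```

Any element missing from this structure, such as a prompt with no timestamp or an output with no session identifier, is a gap that opposing counsel can probe.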
Document the Methodology
Prepare a written collection methodology record identifying the data sources, collection tools and connectors, custodians, date range, and data integrity steps. This record supports meet-and-confer discussions on ESI protocols, serves as the basis for any court certification, and is the primary defense against challenges to collection completeness.
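A methodology record can be as simple as a structured document generated alongside the collection. The sketch below mirrors the elements named above; the structure and field names are an assumption for illustration, not a court-mandated format:

```python
import json

# Illustrative methodology record; all values are hypothetical.
methodology = {
    "matter": "Example v. Example",
    "data_sources": ["Microsoft Copilot", "browser-based AI tools"],
    "collection_tools": ["connector-based collection platform"],
    "custodians": ["jdoe", "asmith"],
    "date_range": {"start": "2024-01-01", "end": "2024-12-31"},
    "integrity_steps": [
        "in-place preservation applied before export",
        "SHA-256 hash generated at point of capture",
        "native metadata preserved",
    ],
}

print(json.dumps(methodology, indent=2))
```

Keeping the record in a structured, machine-readable form makes it straightforward to attach to an ESI protocol or a certification when one is required.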
Build a Collection Workflow That Covers AI Sources
If your organization needs to collect AI-generated content for litigation, an internal investigation, or a regulatory matter, Onna's team can help you build a workflow that meets chain of custody requirements across enterprise and collaboration data sources. The time to establish that workflow is before the next matter requires it.
Talk to the Onna team: Contact Us.

