Identifying AI-Generated Content in Data

Flutura Ahmetxhekaj

Demand Generation Manager

June 1, 2026

How to Identify AI-Generated Content in Enterprise Collections Before Review Begins

Your review team is about to open a 200,000-document collection. Somewhere inside it are outputs from Microsoft Copilot, ChatGPT, and Gemini. Some were drafted by those tools, some edited by them; some are AI summaries of conversations that no longer exist in their original form. None are labeled. Getting ahead of this before review starts is a defensibility requirement, not a workflow nicety.

Why Standard Collection Methods Miss AI-Generated Content

Traditional enterprise collection workflows index on custodian, date range, file type, and keyword. None of these fields reliably surfaces AI-generated electronically stored information (ESI), because AI content shares the same file types, custodians, and date ranges as human-authored content. Three gaps explain why it escapes:

No native labeling. Most enterprise AI tools do not append persistent metadata to outputs. A Copilot-drafted email lands in Outlook as a standard message; a ChatGPT-revised contract sits in SharePoint as a standard document.

Prompt logs are siloed. User prompts and AI responses often live in a separate platform log with no connection to the document they produced. Collecting the output without the prompt strips context that may be legally material.

Collaboration tools compress the trail. In Slack and Microsoft Teams, AI-drafted messages are often sent directly under a human user’s name with no indication of AI authorship. Searching for AI-generated content in Slack and Teams requires knowing where to look before any keyword search runs.
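The second gap can be narrowed at collection time by pairing prompt-log records with the documents they may have produced. The following is a minimal Python sketch, not a platform integration: the field names (`user`, `timestamp`, `custodian`, `created`), the sample records, and the 30-minute window are all hypothetical, and real log exports will use different schemas.

```python
from datetime import datetime, timedelta

# Hypothetical records; real platform exports will have different field names.
prompt_log = [
    {"user": "alice", "timestamp": datetime(2026, 3, 2, 9, 58), "prompt_id": "p-1"},
    {"user": "bob", "timestamp": datetime(2026, 3, 2, 14, 0), "prompt_id": "p-2"},
]
documents = [
    {"doc_id": "d-1", "custodian": "alice", "created": datetime(2026, 3, 2, 10, 5)},
    {"doc_id": "d-2", "custodian": "alice", "created": datetime(2026, 3, 5, 10, 5)},
    {"doc_id": "d-3", "custodian": "bob", "created": datetime(2026, 3, 2, 13, 0)},
]


def link_prompts_to_documents(prompts, docs, window=timedelta(minutes=30)):
    """Pair each document with prompts the same user issued shortly before it."""
    links = {}
    for doc in docs:
        matches = [
            p["prompt_id"]
            for p in prompts
            if p["user"] == doc["custodian"]
            # Keep only prompts made before the document, within the window.
            and timedelta(0) <= doc["created"] - p["timestamp"] <= window
        ]
        if matches:
            links[doc["doc_id"]] = matches
    return links


candidate_links = link_prompts_to_documents(prompt_log, documents)
print(candidate_links)
```

A match here is only a candidate pairing for a reviewer to confirm. Proximity in time does not prove that a given prompt produced a given document.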

Onna’s guide to investigating Slack and other modern collaboration platforms outlines how data is structured across these environments and why conventional export methods miss critical content.

Where AI-Generated Content Hides in Enterprise Collections

Collaboration Platforms: Slack and Microsoft Teams

Both Slack and Teams have embedded AI assistants that generate summaries, draft replies, and surface content across the organization. The content these tools produce is often not stored in the thread or channel where it appears. Slack’s AI-generated channel summaries do not persist as standard messages and require specific API access to retrieve. There are also significant technical constraints on what Slack data is recoverable. Onna’s guide to legal and technical considerations before running a Slack export covers the access tier requirements and API limitations that determine what is and is not collectable.

Document Repositories, Email, and Shadow AI

Microsoft 365 Copilot and Google Gemini are embedded in Word, Outlook, Gmail, and Docs. Content they draft is stored as a standard file or message. Identifying it requires querying AI activity logs in admin consoles, such as Copilot usage events in Microsoft Purview. Those logs carry retention windows that may not align with a legal hold timeline.
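The retention mismatch can be estimated with simple date arithmetic. The sketch below assumes placeholder retention windows (180 days for an AI activity log, 2,555 days for email); these numbers are illustrative only and are not actual platform defaults, so confirm the real values in each admin console.

```python
from datetime import date, timedelta

# Placeholder retention windows in days; NOT actual platform defaults.
ASSUMED_RETENTION_DAYS = {
    "ai_activity_log": 180,
    "email": 2555,
}


def earliest_recoverable(source, as_of):
    """Oldest record date still retained for a source on the given day."""
    return as_of - timedelta(days=ASSUMED_RETENTION_DAYS[source])


def coverage_gap_days(source, relevant_start, hold_issued):
    """Days of the relevant period that had already aged out at hold issuance."""
    gap = (earliest_recoverable(source, hold_issued) - relevant_start).days
    return max(gap, 0)


hold_issued = date(2026, 6, 1)      # hypothetical hold date
relevant_start = date(2025, 9, 1)   # hypothetical start of the relevant period

gaps = {
    source: coverage_gap_days(source, relevant_start, hold_issued)
    for source in ASSUMED_RETENTION_DAYS
}
print(gaps)
```

A non-zero gap for a source means part of the relevant period was no longer recoverable from that log when the hold was issued, which is worth documenting for defensibility.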

A significant volume of AI-generated ESI originates from consumer tools used for work: ChatGPT output pasted into a work document, Gemini Advanced used to draft a client email. This “shadow AI” content has no corporate IT footprint. Identifying it requires behavioral signals: drafting patterns inconsistent with a custodian’s normal writing, or revision histories that jump from a one-line prompt to a complete draft. Onna’s risk guide on searching AI-generated content covers detection strategies for both provisioned and unprovisioned AI sources.
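The revision-history signal can be approximated by looking for large additions of text over a short interval. This is a minimal sketch under stated assumptions: the revision data is invented, the `(minutes, character count)` format is hypothetical, and the thresholds are arbitrary starting points to tune against each custodian's normal drafting behavior.

```python
# Hypothetical revision history: (minutes since creation, character count).
revisions = [(0, 0), (1, 85), (2, 4200), (30, 4350), (45, 4400)]


def find_draft_jumps(history, min_added_chars=2000, max_minutes=5):
    """Return consecutive revision pairs where a lot of text appeared quickly."""
    jumps = []
    for (t_prev, n_prev), (t_next, n_next) in zip(history, history[1:]):
        added = n_next - n_prev
        if added >= min_added_chars and (t_next - t_prev) <= max_minutes:
            jumps.append(
                {"from_minute": t_prev, "to_minute": t_next, "added_chars": added}
            )
    return jumps


jumps = find_draft_jumps(revisions)
print(jumps)
```

A flagged jump is a reason to look closer, not proof of AI authorship: pasting text from another human-written document produces the same pattern.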

A Pre-Review Identification Framework

Pre-review identification requires a structured approach applied at collection, not at review. Four steps organize that work:

Step 1: Map AI tool deployment. Legal operations and IT must produce a current inventory of AI tools in active use: enterprise-licensed tools, departmental tools procured outside central IT, and known patterns of personal AI tool use. This determines which sources require targeted collection protocols.

Step 2: Query AI activity logs at hold issuance. Microsoft Purview, Google Workspace Admin, and Slack’s audit logs contain records of AI tool interactions with retention windows that may be shorter than standard email retention. A legal hold that does not explicitly cover AI activity logs will lose this data as it ages out.

Step 3: Apply layered detection signals during processing. Flag documents that show AI-associated signals: Copilot activity event types in Microsoft 365 audit logs, watermarking where present, and low perplexity scores in the text. Microsoft Research confirmed in February 2026 that no single detection method is reliable in isolation, making layering necessary.

Step 4: Tag and segregate before the collection enters review. Placing AI-identified content in a designated review track allows review leadership to design appropriate privilege, authenticity, and materiality protocols for each content type before reviewers encounter it.
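Steps 3 and 4 can be sketched together: count how many independent signals a document carries, then route it to a review track. This is an illustrative Python sketch only. The signal fields, the track names, and the two-signal threshold are assumptions, and the signals themselves are presumed to have been computed upstream during processing.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    # Hypothetical boolean signals populated upstream during processing.
    has_ai_audit_event: bool = False
    has_watermark: bool = False
    low_perplexity: bool = False
    tags: list = field(default_factory=list)


def count_signals(doc):
    """Number of independent AI-associated signals present on a document."""
    return sum([doc.has_ai_audit_event, doc.has_watermark, doc.low_perplexity])


def assign_track(doc, min_signals=2):
    """Tag a document with a review track based on how many signals agree."""
    n = count_signals(doc)
    if n >= min_signals:
        track = "ai_review"      # corroborated by multiple signals
    elif n == 1:
        track = "ai_possible"    # single signal; needs a second look
    else:
        track = "standard"
    doc.tags.append(track)
    return track


collection = [
    Document("d-1", has_ai_audit_event=True, low_perplexity=True),
    Document("d-2", low_perplexity=True),
    Document("d-3"),
]
tracks = {doc.doc_id: assign_track(doc) for doc in collection}
print(tracks)
```

Keeping a separate track for single-signal documents reflects the point that no one method is reliable in isolation: one signal is treated as a prompt for further checking rather than a conclusion.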

Onna’s guide to eDiscovery processing best practices from data collection to review addresses how to connect these stages without gaps that allow AI content to pass through untagged.

The Regulatory and Legal Stakes

The 2025 EDI Leadership Summit, covered by Complete Discovery Source, surfaced a clear judicial concern: AI-generated content raises unresolved questions about authentication and admissibility. Judges flagged hallucinations embedded in produced documents; outside counsel reported professional consequences where unverified AI citations were submitted to courts without disclosure.

At the regulatory level, the EU AI Act requires marking AI-generated content and disclosing its artificial nature. For multinational organizations, AI-generated ESI collected for a US matter may carry parallel disclosure obligations under European law. These are present operational risks, not future ones.

Start Identification Before the Clock Starts Running

AI-generated content is in your enterprise collections right now. Whether it appears as a Slack channel summary, a Copilot-drafted contract, or a ChatGPT output pasted into a work email, it requires identification before reaching a reviewer who is not equipped to handle it.

If your organization is building a pre-review identification workflow or assessing whether your current collection process captures AI-generated ESI, speak with the Onna team. The conversation starts with where your AI content lives and how your collection architecture reaches it.
