eDiscovery Processing Best Practices: From Data Collection to Review

Brendan Locke

Marketing Manager

March 16, 2026

eDiscovery data processing is the set of technical steps that transform raw electronically stored information (ESI) into a structured, reviewable dataset. It encompasses data collection, culling, deduplication, format normalization, and ingestion into a data processing platform or review tool, with the goal of producing a defensible, cost-efficient document set for legal review or production.

What Is eDiscovery Data Processing?

In the context of litigation, regulatory response, or internal investigation, eDiscovery data processing refers to the workflow that takes collected ESI through a series of technical transformations before attorneys begin review. ESI, as described in Rule 34(a) of the Federal Rules of Civil Procedure (FRCP), includes information stored in any medium from which it can be obtained. Processing sits between data collection and document review in the Electronic Discovery Reference Model (EDRM) and is one of the highest-leverage points for controlling both cost and risk.

Processing is not a single action. It involves multiple coordinated steps, each of which affects the quality, completeness, and defensibility of the final production. Organizations that treat processing as an afterthought often face inflated review costs, missed documents, and challenges to their production methodology.

Why eDiscovery Data Processing Matters

According to EDRM and Gartner research on information governance, the volume of enterprise data subject to legal hold has grown dramatically with the adoption of cloud collaboration tools, messaging platforms, and distributed work environments. The implications are direct:

• Review is typically the most expensive phase of eDiscovery. Effective processing directly reduces the number of documents that reach reviewers.
• Improperly processed data can lead to sanctions, adverse inference instructions, or production failures.
• Modern data types, including Slack, Microsoft Teams, and other digital communications, require purpose-built handling to preserve structure and context.
• Regulatory frameworks such as the Federal Rules of Civil Procedure (FRCP) impose proportionality requirements that reward defensible, efficient workflows.

Processing decisions made early in a matter compound throughout the lifecycle. A well-configured data processing platform establishes the foundation for everything that follows.

How eDiscovery Data Processing Works: A Stage-by-Stage Overview

The table below maps the core stages of an eDiscovery workflow, the key activities at each stage, and the primary goal each stage serves.

| Stage | Key Activities | Primary Goal |
| --- | --- | --- |
| Data Collection | Identify custodians, collect from endpoints, cloud apps, email, digital communications | Preserve and capture all potentially relevant ESI |
| Processing | Deduplication, filtering, format normalization, culling by date/custodian | Reduce volume; prepare data for review |
| Ingestion | Load into data processing platform or review tool with metadata intact | Enable search, tagging, and linear/predictive review |
| Review | Attorney review, privilege log, responsiveness decisions, TAR/CAL workflows | Identify responsive, privileged, and producible documents |
| Production | Bates numbering, redactions, load file generation, format conversion | Deliver compliant production set to requesting party |

Stage 1: Data Collection

Data collection should be scoped precisely before any technical work begins. This requires identifying custodians, data sources (email servers, cloud storage, collaboration tools, endpoints), and relevant date ranges. Collection should be forensically sound where required, preserving metadata and documenting chain of custody.

A key consideration is the growing volume of digital communications data. Platforms such as Slack, Microsoft Teams, Google Chat, and enterprise social tools generate significant volumes of potentially relevant ESI that must be handled by eDiscovery processing tools capable of preserving threading, reactions, attachments, and user context.
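
To make "preserving threading" concrete, here is a minimal sketch that regroups exported chat messages into threads. It assumes Slack-style records in which each message carries a `ts` timestamp string and replies carry a `thread_ts` pointing at the parent message. The function name `group_threads` and the sample messages are hypothetical, and other platforms use different fields.

```python
from collections import defaultdict


def ts_key(ts):
    """Turn a 'seconds.microseconds' string into a sortable tuple of ints."""
    seconds, _, fraction = ts.partition(".")
    return (int(seconds), int(fraction or 0))


def group_threads(messages):
    """Group messages by thread root, ordering threads and replies by time.

    Assumption: a reply's 'thread_ts' equals the parent's 'ts'; a message
    without 'thread_ts' is treated as its own single-message thread.
    """
    threads = defaultdict(list)
    for msg in messages:
        root = msg.get("thread_ts", msg["ts"])
        threads[root].append(msg)
    return [
        sorted(threads[root], key=lambda m: ts_key(m["ts"]))
        for root in sorted(threads, key=ts_key)
    ]


# Hypothetical sample data, deliberately out of order.
messages = [
    {"ts": "1700000300.000300", "user": "U2", "text": "reply",
     "thread_ts": "1700000100.000100"},
    {"ts": "1700000100.000100", "user": "U1", "text": "parent",
     "thread_ts": "1700000100.000100"},
    {"ts": "1700000200.000200", "user": "U3", "text": "standalone"},
]
threads = group_threads(messages)
```

Keeping each thread together as a unit lets a reviewer read the reply next to the message it answers, instead of as an isolated record sorted only by time.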

Stage 2: Processing and Culling

Once data is collected, processing begins with volume reduction. Common techniques include:

• Deduplication: Removing exact duplicates (typically matched by hash value) across custodians, and identifying near-duplicates so they can be grouped, to reduce the review population.
• Date and domain filtering: Applying date range restrictions and excluding irrelevant sender/recipient domains.
• NIST filtering (de-NISTing): Removing known system and application files by matching file hashes against the National Software Reference Library (NSRL) maintained by the National Institute of Standards and Technology.
• File type filtering: Excluding non-reviewable or irrelevant file types (e.g., system executables).
• Exception handling: Identifying password-protected, corrupted, or unprocessable files for separate handling.
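
The culling techniques above can be sketched as a single filtering pass. This is an illustration under stated assumptions, not a description of any particular platform: the `Doc` class, the `cull` function, and the sample data are hypothetical, duplicates are detected by hashing raw content with SHA-256 (real tools often hash selected email metadata fields instead), and the "known system hashes" set stands in for an NSRL hash list.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    custodian: str
    name: str
    content: bytes
    sent: date


def sha256(data):
    return hashlib.sha256(data).hexdigest()


def cull(docs, known_system_hashes, start, end, excluded_exts):
    """Apply date, file type, known-file, and exact-duplicate filters.

    Returns the kept documents plus a count of drops per reason, so the
    volume reduction can be reported and defended later.
    """
    kept, seen = [], set()
    dropped = {"date": 0, "file_type": 0, "nist": 0, "duplicate": 0}
    for doc in docs:
        digest = sha256(doc.content)
        ext = doc.name.rsplit(".", 1)[-1].lower() if "." in doc.name else ""
        if not (start <= doc.sent <= end):
            dropped["date"] += 1
        elif ext in excluded_exts:
            dropped["file_type"] += 1
        elif digest in known_system_hashes:
            dropped["nist"] += 1
        elif digest in seen:
            dropped["duplicate"] += 1
        else:
            seen.add(digest)
            kept.append(doc)
    return kept, dropped


# Hypothetical sample population.
docs = [
    Doc("alice", "memo.docx", b"Q3 forecast", date(2024, 3, 1)),
    Doc("bob", "memo-copy.docx", b"Q3 forecast", date(2024, 3, 2)),
    Doc("alice", "old.docx", b"ancient", date(2019, 1, 1)),
    Doc("bob", "setup.exe", b"binary", date(2024, 4, 1)),
    Doc("bob", "kernel32.dll", b"system file", date(2024, 4, 1)),
]
known = {sha256(b"system file")}
kept, dropped = cull(docs, known, date(2024, 1, 1), date(2024, 12, 31), {"exe"})
```

Recording why each document was dropped, rather than only the final count, is what makes the reduction explainable if the methodology is challenged.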

Understanding how these decisions affect your final population is critical. Key eDiscovery data processing metrics such as native file count, processed file count, exception rates, and deduplication ratios provide transparency and support defensibility.
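
As a worked illustration of these metrics, the sketch below derives three ratios from raw counts. The definitions are assumptions chosen for the example (each ratio is expressed against the native file count), because vendors define deduplication ratio and exception rate differently. The function name and the counts are hypothetical.

```python
def processing_metrics(native_count, processed_count, exception_count, duplicate_count):
    """Compute simple processing ratios relative to the native file count."""
    if native_count <= 0:
        raise ValueError("native_count must be positive")
    return {
        "exception_rate": exception_count / native_count,
        "dedup_ratio": duplicate_count / native_count,
        "volume_reduction": 1 - processed_count / native_count,
    }


# Hypothetical matter: 10,000 native files, 4,000 promoted to review.
metrics = processing_metrics(
    native_count=10_000,
    processed_count=4_000,
    exception_count=150,
    duplicate_count=3_500,
)
```

With these counts the exception rate is 1.5 percent, duplicates account for 35 percent of the native population, and the overall volume reduction is 60 percent.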

Stage 3: Format Normalization and Ingestion

Processed data must be converted into formats compatible with the review platform. This typically involves generating extracted-text files, creating TIFF or PDF images where required, and building load files (DAT, OPT, or similar) that carry all associated metadata fields.

Metadata preservation is critical at this stage. Fields such as sent date, author, recipient, file path, and custodian assignment must be accurately mapped to review platform fields to support meaningful search and filtering.
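
A minimal sketch of field mapping and load file generation follows. It assumes the Concordance-style DAT convention of ASCII 20 as the field separator and þ (ASCII 254) as the text qualifier. The source keys and the review-platform field names in `FIELD_MAP` are hypothetical, and the sketch does not handle embedded newlines or multi-value fields.

```python
FIELD_SEP = "\x14"  # ASCII 20, the usual Concordance field separator
QUOTE = "\xfe"      # þ (ASCII 254), the usual Concordance text qualifier

# Hypothetical mapping from source metadata keys to review platform fields.
FIELD_MAP = {
    "sent": "DateSent",
    "from": "Author",
    "to": "Recipient",
    "path": "FilePath",
    "custodian": "Custodian",
}


def to_dat_line(values):
    """Wrap each value in the text qualifier and join with the separator."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)


def build_dat(records):
    """Build DAT text: one header row, then one row per record.

    Raises KeyError if a record lacks a mapped field, so gaps surface
    during processing rather than during review.
    """
    lines = [to_dat_line(FIELD_MAP.values())]
    for record in records:
        missing = [key for key in FIELD_MAP if key not in record]
        if missing:
            raise KeyError(f"record is missing fields: {missing}")
        lines.append(to_dat_line(str(record[key]) for key in FIELD_MAP))
    return "\n".join(lines)


# Hypothetical record.
dat_text = build_dat([
    {
        "sent": "2024-03-01",
        "from": "alice@example.com",
        "to": "bob@example.com",
        "path": "/mail/inbox/memo.msg",
        "custodian": "Alice",
    }
])
```

Failing loudly on a missing field is a deliberate choice here: a silently blank custodian or sent date is much harder to detect once documents are in review.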

Stage 4: Review and Production

With processed data loaded into the review environment, attorneys can apply filters, run keyword searches, apply technology-assisted review (TAR) or continuous active learning (CAL) workflows, and code documents for responsiveness and privilege. Production then involves applying Bates numbering, redacting privileged content, and delivering compliant load files to the requesting party.
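
Bates numbering assigns each produced page a unique sequential identifier. The sketch below computes the first and last Bates number for each document from its page count. The prefix `ABC`, the seven-digit padding, and the function name are hypothetical, since production specifications set their own formats.

```python
def bates_ranges(page_counts, prefix="ABC", start=1, width=7):
    """Return (first, last) Bates numbers for each document in order."""
    next_number = start
    ranges = []
    for pages in page_counts:
        if pages < 1:
            raise ValueError("each document needs at least one page")
        first = next_number
        last = next_number + pages - 1
        ranges.append((f"{prefix}{first:0{width}d}", f"{prefix}{last:0{width}d}"))
        next_number = last + 1
    return ranges


# Three documents with 3, 1, and 2 pages.
ranges = bates_ranges([3, 1, 2])
```

Because each range starts where the previous one ended, the numbering has no gaps or overlaps, which is what lets the receiving party confirm that a production is complete.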

Common eDiscovery Data Processing Challenges

• Volume and velocity of modern data: Enterprise organizations can collect terabytes of ESI in a single matter. Without scalable data processing software, processing timelines become a bottleneck.
• Digital communications complexity: Slack exports, Teams data, and similar sources require specialized parsers. Standard processing tools may flatten threading or lose contextual metadata.
• Cross-border data: International matters may involve data subject to GDPR or other jurisdictional restrictions that affect what can be collected, processed, and produced.
• Internal investigations: Unlike litigation, internal investigations often require faster turnaround with less formal structure, placing greater pressure on processing teams to configure workflows quickly and correctly.
• Chain of custody documentation: Processing logs, exception reports, and audit trails must be preserved to demonstrate that data was handled defensibly from collection through production.
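
One way to make a processing log tamper-evident, offered here as an illustrative technique rather than something the workflow above prescribes, is to chain entries by hash so that altering an earlier entry invalidates every later one. The function names and event fields below are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event, prev_hash):
    payload = json.dumps({"event": event, "prev_hash": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append an event whose hash covers the event and the previous hash."""
    prev_hash = log[-1]["entry_hash"] if log else GENESIS
    log.append({
        "event": event,
        "prev_hash": prev_hash,
        "entry_hash": _digest(event, prev_hash),
    })


def verify_log(log):
    """Recompute every hash in order; False means the log was altered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["entry_hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["entry_hash"]
    return True


# Hypothetical processing events.
log = []
append_entry(log, {"step": "collect", "items": 5})
append_entry(log, {"step": "deduplicate", "removed": 1})
```

Calling `verify_log(log)` on the untouched log returns `True`. Editing any recorded event afterward makes it return `False`, because the stored hash no longer matches the recomputed one.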

Practical Use Cases

Regulatory Response

A financial services firm receives a regulatory information request requiring production of all internal communications related to a trading desk over a 36-month period. The data processing team collects from email, Teams, and a legacy archiving system. After deduplication and date filtering, the review population is reduced by 60 percent before attorney review begins. Load files are generated to the regulator's specified format with required metadata fields.

Internal Investigation

A corporate compliance team conducts an internal investigation following an employee whistleblower complaint. Time is a factor. Using a configured data processing platform, the team collects from targeted custodians, applies keyword filters, and loads data for review within 48 hours. Processing logs are preserved as part of the investigation record.

Large-Scale Litigation with Digital Communications

A technology company faces class-action litigation. A significant portion of relevant ESI resides in Slack and Google Chat. The processing workflow must normalize these sources into reviewable format while preserving thread structure, user attribution, and timestamps. Digital communications data management capabilities within the processing platform ensure that reviewers see conversations in context rather than as fragmented individual messages.

Ready to Improve Your eDiscovery Processing Workflow?

Effective eDiscovery data processing requires the right combination of workflow design, technology, and governance discipline. Whether you are managing litigation response, regulatory inquiries, or internal investigations, the decisions made during processing determine the defensibility and efficiency of everything that follows.

If your organization is evaluating how to strengthen its processing workflows, connect with the Onna team to explore how Onna's platform supports end-to-end eDiscovery data processing, from data collections through production. You can also schedule a demo to see the platform in action.

