
Data Collection for Investigations: Reduce eDiscovery Overcollection

Brendan Locke
Marketing Manager
April 29, 2026

Overcollection in eDiscovery occurs when organizations gather significantly more data than is proportionate or relevant to the legal matter at hand. This inflates processing costs, extends review timelines, and increases exposure risk, without improving case outcomes. Targeted data collection for investigations addresses this by applying defined parameters, such as custodian scope, date ranges, and keywords, before data enters the processing pipeline.

Why Overcollection Is a Real Business Problem

The volume of enterprise data has grown sharply in recent years. According to IDC's Global DataSphere research, the amount of data created and replicated globally reached 120 zettabytes in 2023 and is forecast to grow significantly through 2027. For legal and compliance teams, this growth translates directly into more data to assess, process, and review during any investigation or litigation.

When collection is broader than it needs to be, the downstream impact is compounded at every stage of the eDiscovery workflow:

  • Processing costs scale with data volume, not relevance.
  • Review hours increase, raising outside counsel fees.
  • Sensitive data from non-relevant custodians is unnecessarily exposed.
  • Audit trails become harder to manage and defend.

A 2023 survey by the Association of Certified E-Discovery Specialists (ACEDS) found that data volume and cost management remain among the top challenges practitioners face. Proportionality, codified in Rule 26(b)(1) of the Federal Rules of Civil Procedure, requires that discovery be proportional to the needs of the case, which makes overcollection a compliance risk as well as a cost issue.

How Targeted Data Collection for Investigations Works

Reducing overcollection is not about collecting less. It is about collecting more precisely. A structured collection approach applies defined parameters before data enters the review queue, reducing noise while preserving defensibility.

1. Scope Definition Before Collection Begins

Legal and compliance teams define custodians, relevant data sources, date ranges, and keyword lists before any data is pulled. This upstream discipline shapes everything that follows. Skipping this step is the most common cause of overcollection.
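As a rough illustration, the scope agreed at this stage can be written down as a small structured record and checked for gaps before anyone runs a collection. The sketch below is hypothetical: the `CollectionScope` class, its field names, and the sample values are assumptions made for this example and do not represent any Onna API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectionScope:
    """Hypothetical record of the parameters agreed before collection starts."""
    matter_id: str
    custodians: frozenset[str]
    sources: frozenset[str]
    start: date
    end: date
    keywords: tuple[str, ...]

    def problems(self) -> list[str]:
        """Return the reasons this scope is not ready to drive a collection."""
        issues = []
        if not self.custodians:
            issues.append("no custodians named")
        if not self.sources:
            issues.append("no data sources named")
        if self.start > self.end:
            issues.append("date range is inverted")
        if not self.keywords:
            issues.append("no keywords listed")
        return issues


# Sample values are invented for illustration only.
scope = CollectionScope(
    matter_id="HR-EXAMPLE-001",
    custodians=frozenset({"a.lee@example.com", "j.ortiz@example.com"}),
    sources=frozenset({"email", "slack"}),
    start=date(2025, 1, 1),
    end=date(2025, 6, 30),
    keywords=("reorg", "complaint"),
)
print(scope.problems())  # an empty list means every parameter is set
```

Making the record immutable (`frozen=True`) is a design choice in this sketch: any later change of scope has to be expressed as a new record, which is easier to document than an in-place edit.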

2. Targeted Collection from Structured and Unstructured Sources

Modern organizations run on email, file shares, and, increasingly, collaboration platforms like Slack, Microsoft Teams, Google Workspace, and Zoom. These sources require collection tools that can filter at the source, not after the fact. Onna's eDiscovery collections capability enables teams to apply custodian, date, and keyword filters directly at the point of collection, before data is moved into the processing pipeline.
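The logic of combining the three filters can be sketched in a few lines. This is a simplified illustration, not how Onna or any specific connector works: the `in_scope` function and the item fields (`custodian`, `sent`, `text`) are assumptions, and a real tool would typically push these conditions into the source platform's own search or export query rather than loop over items locally. Keyword matching here is a plain case-insensitive substring test.

```python
from datetime import date


def in_scope(item, custodians, start, end, keywords):
    """Return True if one item passes the custodian, date, and keyword filters."""
    if item["custodian"] not in custodians:
        return False
    if not (start <= item["sent"] <= end):
        return False
    text = item["text"].lower()
    return any(k.lower() in text for k in keywords)


# Invented sample items standing in for messages at the source.
items = [
    {"custodian": "a.lee@example.com", "sent": date(2025, 3, 4), "text": "Notes on the reorg plan"},
    {"custodian": "a.lee@example.com", "sent": date(2024, 11, 2), "text": "Reorg kickoff"},
    {"custodian": "b.kim@example.com", "sent": date(2025, 3, 4), "text": "Reorg questions"},
    {"custodian": "a.lee@example.com", "sent": date(2025, 3, 5), "text": "Lunch on Friday?"},
]

collected = [
    i for i in items
    if in_scope(i, {"a.lee@example.com"}, date(2025, 1, 1), date(2025, 6, 30), ["reorg"])
]
print(len(collected), "of", len(items), "items collected")
```

Each of the three rejected items fails a different filter (date, custodian, keyword), which shows why all three parameters need to be defined before collection.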

3. eDiscovery Processing With Deduplication and Filtering

Once collected, eDiscovery processing applies deduplication, near-deduplication, NIST filtering (to eliminate known system files), and format normalization. These steps further reduce volume before the data reaches the review layer, without removing anything potentially relevant.
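A minimal sketch of the two hash-based steps, exact deduplication and known-file filtering, is shown below. It makes several assumptions: the `reduce_volume` function is hypothetical, SHA-256 is chosen only for the example, and the known-file list is a stand-in for a reference hash set such as NIST's National Software Reference Library. Near-deduplication and format normalization are not shown, and production tools usually record which custodians held each duplicate rather than simply discarding the copies.

```python
import hashlib


def reduce_volume(files, known_system_hashes):
    """Drop known system files and exact duplicates, keeping first occurrences."""
    seen = set()
    kept = []
    for name, content in files:
        digest = hashlib.sha256(content).hexdigest()
        if digest in known_system_hashes:
            continue  # matches the known-file list, so it is not user content
        if digest in seen:
            continue  # byte-identical copy of a file already kept
        seen.add(digest)
        kept.append((name, digest))
    return kept


# Placeholder bytes stand in for a real operating system file.
system_file = b"placeholder bytes standing in for an OS file"
known = {hashlib.sha256(system_file).hexdigest()}

files = [
    ("kernel32.dll", system_file),
    ("memo.txt", b"Q3 product memo"),
    ("memo (copy).txt", b"Q3 product memo"),
    ("notes.txt", b"Meeting notes"),
]
kept = reduce_volume(files, known)
print([name for name, _ in kept])  # ['memo.txt', 'notes.txt']
```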

4. Data Preservation That Is Defensible

Targeted collection does not mean skipping data preservation obligations. Legal holds must be issued, tracked, and documented before any filtering decisions are made. Preservation ensures that relevant data is protected from deletion or alteration, while targeted collection determines what actually moves forward into the review workflow.
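The ordering described here, holds first and filtering second, can be enforced with a simple gate. The sketch below is an assumption-laden illustration: the function name, the status values ("issued", "acknowledged"), and the rule that a hold must be acknowledged before filtering are all hypothetical, and the actual requirement depends on the organization's legal hold policy.

```python
def custodians_missing_holds(custodians, hold_status):
    """Return in-scope custodians whose hold is not recorded as acknowledged."""
    return sorted(c for c in custodians if hold_status.get(c) != "acknowledged")


# Invented hold records; a custodian absent from the map has no hold at all.
hold_status = {
    "a.lee@example.com": "acknowledged",
    "j.ortiz@example.com": "issued",
}
custodians = ["a.lee@example.com", "j.ortiz@example.com", "m.chen@example.com"]

missing = custodians_missing_holds(custodians, hold_status)
if missing:
    print("Resolve holds before filtering:", missing)
else:
    print("Holds confirmed; filtering can proceed")
```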

Common Challenges in eDiscovery Collection

Even teams with clear processes encounter predictable friction points during data collection for investigations:

  • Custodian sprawl: When the scope of relevant personnel is unclear, collection expands to cover everyone, including those with no material involvement.
  • Collaboration app fragmentation: Platforms like Slack and Microsoft Teams store data in formats that are difficult to collect and filter without purpose-built data collection software.
  • Late-stage filtering: Applying filters after collection rather than before moves the cost burden downstream without reducing the total data processed.
  • Defensibility concerns: Legal teams sometimes overcollect out of caution, fearing that narrowed collection will be challenged. Clear documentation of collection decisions addresses this concern more effectively than volume.
  • Lack of coordination between IT and legal: When IT executes collection without legal parameters, the result is often over-broad exports that legal teams must then sort through manually.

Practical Use Cases

HR Investigation at a Mid-Size Technology Company

An internal HR matter involves four employees across two departments over a six-month period. Rather than pulling all email and Slack data organization-wide, the legal team defines custodians, sets a date range, and applies keyword filters specific to the matter. Collection is limited to the four individuals across only the relevant data sources. eDiscovery processing then deduplicates thread content across custodians. The resulting review set is a fraction of what an unconstrained collection would have produced.

Regulatory Response for a Financial Services Firm

A regulator requests documentation related to a specific product line over a 12-month window. Using Onna's platform, the compliance team collects from targeted custodians across email, cloud storage, and Microsoft Teams. Filters are applied at the source. The legal team receives a proportionate, defensible dataset rather than a bulk export requiring weeks of manual triage.

eDiscovery Collection Checklist: Reducing Overcollection

Use this checklist to structure targeted data collection for investigations across the pre-collection, collection, processing, and quality assurance stages.

Action | Owner | Stage
Define custodians and data sources before collection begins | Legal / Legal Ops | Pre-collection
Issue legal holds through a trackable, auditable system | Legal / Compliance | Pre-collection
Apply date, keyword, and custodian filters at point of collection | Legal Ops / IT | Collection
Use targeted collection from collaboration apps (Slack, Teams, etc.) | eDiscovery Team | Collection
Confirm data preservation obligations are met before filtering | Legal / Compliance | Collection
Deduplicate and run near-deduplication during processing | eDiscovery / IT | Processing
Apply NIST filtering to remove known system files | IT / eDiscovery | Processing
Validate collection scope against legal hold parameters | Legal Ops | QA / Review
Log and document all collection decisions for defensibility | Legal / Compliance | QA / Review
Review collection metrics and refine scope if necessary | Legal Ops | QA / Review
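For the "log and document" item in the checklist, one lightweight approach is an append-only record of each decision with who made it, when, and why. The sketch below is only an illustration under stated assumptions: `log_decision`, its field names, and the sample entries are hypothetical, and the log lives in memory, whereas a defensible record would need durable, tamper-evident storage.

```python
import json
from datetime import datetime, timezone


def log_decision(log, actor, decision, rationale):
    """Append one timestamped collection decision to an in-memory log."""
    entry = {
        "at": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "decision": decision,
        "rationale": rationale,
    }
    log.append(entry)
    return entry


# Sample entries are invented for illustration.
log = []
log_decision(log, "legal-ops", "Limit custodians to four named employees",
             "Only these employees are named in the complaint")
log_decision(log, "legal-ops", "Restrict date range to a six-month window",
             "Conduct at issue falls within this period")
print(json.dumps(log, indent=2))
```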

Frequently Asked Questions

What is overcollection in eDiscovery?

Overcollection refers to gathering more data than is proportionate or relevant to a legal or compliance matter. It typically occurs when collection parameters, such as custodians, date ranges, and keywords, are not defined in advance, resulting in broad data pulls that inflate processing and review costs.

How does targeted collection reduce eDiscovery costs?

When filters are applied at the point of collection, only relevant data enters the processing and review pipeline. This reduces the volume of data that must be processed, deduplicated, and reviewed, directly lowering the costs associated with each of those stages.
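The relationship between volume and cost can be made concrete with a simple linear model. Everything in the sketch below is an assumption: the `pipeline_cost` function is hypothetical, and every rate is a made-up placeholder chosen for easy arithmetic, not a benchmark or a quoted price. Real costs also include fixed fees and vary with deduplication rates, which this model ignores.

```python
def pipeline_cost(gb, processing_per_gb, docs_per_gb, docs_per_hour, hourly_rate):
    """Estimate processing plus review cost for a collection of `gb` gigabytes."""
    processing = gb * processing_per_gb
    review_hours = gb * docs_per_gb / docs_per_hour
    return processing + review_hours * hourly_rate


# Every rate below is a made-up placeholder, not a benchmark.
rates = dict(processing_per_gb=20, docs_per_gb=5000, docs_per_hour=50, hourly_rate=60)

broad = pipeline_cost(500, **rates)     # hypothetical unfiltered collection
targeted = pipeline_cost(50, **rates)   # hypothetical filtered collection
print(f"targeted collection costs {targeted / broad:.0%} of the broad one")
```

Because both the processing and review terms are multiplied by the collected volume, any reduction at the point of collection carries through every later stage in the same proportion under this model.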

Can targeted collection put defensibility at risk?

Defensibility is a function of documented process, not data volume. When collection decisions (custodian scope, date parameters, and keyword logic) are clearly documented and aligned with legal hold requirements, a targeted collection is defensible. The Sedona Conference Commentary on Proportionality supports the use of proportionate collection methods in civil litigation.

How are collaboration apps handled in eDiscovery collection?

Collaboration platforms like Slack, Microsoft Teams, and Google Chat present unique challenges because their data structures differ significantly from traditional email. Purpose-built data collection software that integrates directly with these platforms enables keyword, custodian, and date filtering at the source, before data is exported or processed.

What is the difference between data preservation and data collection in eDiscovery?

Data preservation refers to the legal obligation to protect potentially relevant information from deletion or modification once litigation or investigation is reasonably anticipated. Data collection is the subsequent process of actually gathering that data for processing and review. Preservation is a legal duty; collection is an operational decision about what, specifically, to move forward in the workflow.

Ready to implement targeted eDiscovery collection? Explore how Onna supports defensible, proportionate data collection for investigations across email, cloud storage, and collaboration platforms. Contact the Onna team to learn more.
