
Technology-Assisted Review (TAR)

What is TAR?

Technology-Assisted Review (TAR) is a machine learning method that automatically classifies documents as relevant or irrelevant in eDiscovery. Key TAR components include:

  • Training Set: Lawyers manually label sample documents
  • Algorithm: Machine learning model learns from examples
  • Ranking: AI ranks all documents by relevance probability
  • Review: Lawyers review highest-ranked documents first

Definition & Core Concept

Technology-Assisted Review (TAR), also called predictive coding or computer-assisted review, uses machine learning algorithms to automatically classify documents as relevant or irrelevant in litigation discovery. Rather than manually reviewing every document, TAR learns from lawyer-reviewed training examples to predict document relevance across large datasets.

Why TAR Matters

TAR revolutionized eDiscovery economics by dramatically reducing manual document review costs. In complex litigation involving millions of documents, traditional manual review can cost $500,000-$2,000,000+ and take 6-12 months. TAR accomplishes comparable results in weeks at 40-60% lower cost.

Legal Acceptance

TAR is widely accepted in U.S. courts. Since 2012, federal courts have approved TAR methodology when properly implemented with documented process and quality controls. The Sedona Conference provides guidelines for TAR protocols, and major law firms now routinely use TAR instead of manual review for large document sets.

TAR Evolution

TAR 1.0 (circa 2012): Batch-based process where lawyers train the system once, then review the ranked results
TAR 2.0 (circa 2014 onward): Continuous active learning: the system learns iteratively as reviewers work, improving accuracy throughout the project

Understanding TAR Origins

Technology-Assisted Review emerged from information retrieval and machine learning research in the 2000s. The first court-approved TAR case was Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), where Magistrate Judge Andrew Peck approved TAR methodology for document review instead of requiring complete manual review. This landmark decision legitimized machine learning for legal discovery.

Technical Foundation

TAR is a supervised learning classification task:

Supervised Learning: Algorithm learns from labeled examples (documents marked “relevant” or “not relevant”)
Classification: Model assigns documents to categories (relevant vs. not relevant)
Probability Scores: Each document receives relevance probability (0-100%)
Ranking: Documents ranked by probability, highest-ranked reviewed first

Core TAR Process

  1. Seed Set Creation: Lawyers label 200-1,000 sample documents
  2. Feature Extraction: Algorithm identifies patterns in training documents
  3. Model Training: Machine learning algorithm learns the pattern
  4. Full Dataset Scoring: Model scores all documents
  5. Ranking: Documents ranked by predicted relevance
  6. Human Review: Lawyers review documents starting with highest-ranked
  7. Quality Assessment: Track precision/recall and yield curves
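The seven steps above can be sketched end to end in code. The following is a minimal, illustrative example using a pure-Python Naïve Bayes classifier (one of the algorithm families TAR platforms draw on); the document texts, labels, and function names are invented for illustration, not taken from any real platform:

```python
import math
from collections import Counter

def train_model(seed_set):
    """Steps 1-3: learn word statistics from a lawyer-labeled seed set.
    seed_set: list of (text, label) pairs, label is 'relevant' or 'not'."""
    counts = {"relevant": Counter(), "not": Counter()}
    docs_per_class = Counter()
    vocab = set()
    for text, label in seed_set:
        words = text.lower().split()
        counts[label].update(words)
        docs_per_class[label] += 1
        vocab.update(words)
    return counts, docs_per_class, vocab

def relevance_probability(text, model):
    """Step 4: score one document with P(relevant | text), Laplace-smoothed."""
    counts, docs_per_class, vocab = model
    total_docs = sum(docs_per_class.values())
    log_post = {}
    for label in ("relevant", "not"):
        lp = math.log(docs_per_class[label] / total_docs)   # class prior
        denom = sum(counts[label].values()) + len(vocab)    # smoothing denominator
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / denom)  # word likelihood
        log_post[label] = lp
    peak = max(log_post.values())                           # normalize in log space
    odds = {k: math.exp(v - peak) for k, v in log_post.items()}
    return odds["relevant"] / sum(odds.values())

def rank_documents(docs, model):
    """Steps 5-6: rank the full dataset so lawyers review top-scored first."""
    scored = [(relevance_probability(d, model), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In practice platforms use far richer features (TF-IDF, metadata, linguistic patterns) and stronger models (SVMs, gradient boosting), but the score-then-rank structure is the same.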

TAR vs. Keyword Search

Keyword Search: Find documents containing specific terms (“CEO” + “fraud”)
TAR: Find conceptually similar documents to training examples

Keyword search has high precision (few false positives) but low recall (misses conceptually relevant documents). TAR balances precision and recall, finding relevant documents that don’t contain specific keywords.

How TAR Works

Step-by-Step TAR Workflow

Phase 1: Preparation

Determine Scope: Define “relevant” for this specific case

  • What documents matter to key legal issues?
  • What date range is relevant?
  • What document types matter (emails, contracts, etc.)?

Identify Custodians: Who has documents related to key issues?

Define Success: What accuracy level is needed? (typically 75-95% recall)

Phase 2: Training Set Development

Random Selection: Randomly select 200-1,000 documents from full dataset

  • Random selection ensures representative sample
  • Too small (50 docs) = unreliable model
  • Too large (5,000 docs) = inefficient process

Lawyer Review: Experienced lawyers label each document:

  • Relevant: Responsive to discovery or important to case strategy
  • Not Relevant: Doesn’t matter to case
  • Privileged: Attorney-client communication or work product

Consistency: Ensure labeling is consistent

  • If two lawyers disagree significantly, refine definition
  • Goal: Clear relevance definition

Phase 3: Model Development

Feature Extraction: Algorithm analyzes each training document:

  • Word frequencies and patterns
  • Document structure and format
  • Metadata (author, date, subject line)
  • Linguistic patterns (tone, terminology)

Algorithm Selection: Choose machine learning algorithm

  • Support Vector Machines (SVM)
  • Naïve Bayes
  • Gradient Boosting
  • Neural Networks
  • Ensemble methods

Training: Algorithm learns pattern that distinguishes “relevant” from “not relevant”

Phase 4: Ranking & Prioritization

Full Dataset Scoring: Algorithm scores every document in dataset

  • Each document receives probability score (0-100%)
  • Highest-scoring documents most likely to be relevant

Ranking: Documents ranked by probability

  • Document A: 95% likely relevant (rank #1)
  • Document B: 87% likely relevant (rank #2)
  • Document C: 12% likely relevant (rank #10,000)

Batching: Group documents for lawyer review

  • Review top 100 documents first
  • Then next 500
  • Continue until sufficient recall achieved
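The batching scheme above (a small first batch, then progressively larger ones) can be expressed as a simple helper. A sketch; the batch sizes are illustrative defaults, not a platform standard:

```python
def make_batches(ranked_ids, sizes=(100, 500, 1000)):
    """Split a relevance-ranked document list into review batches:
    a small first batch, then progressively larger ones, then the rest."""
    batches, start = [], 0
    for n in sizes:
        batches.append(ranked_ids[start:start + n])
        start += n
    if start < len(ranked_ids):
        batches.append(ranked_ids[start:])  # remainder, reviewed only if recall demands it
    return [b for b in batches if b]        # drop empty batches for small datasets
```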

Phase 5: Human Review & Model Improvement

Batch 1 Review: Lawyers review top 100 AI-ranked documents

  • Mark as Relevant, Not Relevant, or Privileged
  • Typically 60-80% of top 100 are actually relevant

Model Retrain: Algorithm retrains on original training set + new labels

  • Model improves with each batch
  • Accuracy increases over time

Batch 2 Review: Lawyers review the next batch of documents

  • The retrained model re-ranks the remaining documents
  • Precision is typically higher than in Batch 1

Iterate: Continue until:

  • Sufficient recall achieved (e.g., 90% of relevant documents found)
  • Diminishing returns (finding fewer new relevant documents)
  • Budget exhausted

Phase 6: Quality Control & Certification

Validation Set: Create independent test set (100-500 documents)

  • Separate from training and review batches
  • Verify model accuracy on unseen data

Yield Curve: Graph showing recall vs. documents reviewed

  • Shows when 90% recall achieved
  • Demonstrates defensibility (shows effort was proportionate)
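A yield curve is straightforward to compute from the review log. A minimal sketch, assuming relevance labels are recorded in ranked review order:

```python
def yield_curve(labels_in_review_order):
    """Cumulative recall after each document reviewed.
    labels_in_review_order: 1 if the document was relevant, else 0,
    in the order documents were reviewed (highest-ranked first)."""
    total_relevant = sum(labels_in_review_order)
    if total_relevant == 0:
        return [0.0] * len(labels_in_review_order)
    found, curve = 0, []
    for label in labels_in_review_order:
        found += label
        curve.append(found / total_relevant)
    return curve

def docs_needed_for_recall(labels_in_review_order, target=0.90):
    """How many documents had to be reviewed to hit the recall target."""
    for reviewed, recall in enumerate(yield_curve(labels_in_review_order), start=1):
        if recall >= target:
            return reviewed
    return len(labels_in_review_order)
```

In a real project the denominator (total relevant documents) is an estimate from a random validation sample, since the true count across the unreviewed population is unknown.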

Certification: Document entire process for litigation record

Typical TAR Results

Dataset: 50,000 documents, 30-day deadline, $200,000 budget

Manual Review:

  • 100 lawyers × 500 documents each = 50,000 total
  • Cost: $200,000 at $4/document
  • Time: 6 weeks (barely meets deadline)
  • Result: 98% recall (highest accuracy but very expensive, slow)

TAR Approach:

  • Training: lawyers label a 500-document seed set
  • Targeted review: lawyers review the 15,000 highest-ranked documents
  • Remaining 34,500: AI-scored but not manually reviewed (low-probability documents)
  • Cost: $70,000 (65% savings)
  • Time: 3 weeks (completes ahead of deadline)
  • Result: 92% recall (comparable accuracy, much cheaper and faster)

TAR 1.0 vs. TAR 2.0

TAR 1.0 (Traditional Predictive Coding)

Process:

  1. Lawyers review sample documents and label them
  2. Algorithm trains on full training set
  3. Algorithm scores entire dataset once
  4. Lawyers review ranked results
  5. Process complete (no iteration)

Characteristics:

  • Static model: Single training event, then fixed application
  • Batch processing: All documents scored together
  • One cycle: Lawyers review results once
  • Simpler workflow: Easier to understand and implement
  • Less adaptable: Doesn’t adjust if relevance definition changes mid-project

Accuracy: 80-85% recall typical
Timeline: 4-6 weeks
Cost: 30-40% savings vs. manual

Advantages:

  • Simple to explain to courts
  • Well-established methodology
  • Lower platform costs

Disadvantages:

  • Doesn’t learn from reviewer feedback
  • Fixed accuracy regardless of effort
  • May miss pattern changes discovered during review

TAR 2.0 (Continuous Active Learning – CAL)

Process:

  1. Lawyers review sample documents and label them (same as TAR 1.0)
  2. Algorithm trains on training set
  3. Algorithm scores documents and ranks by uncertainty
  4. Lawyers review top-ranked documents
  5. Algorithm retrains on new labels
  6. Algorithm re-ranks remaining documents
  7. Repeat steps 4-6 until recall target achieved
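The iterate-retrain-rerank cycle (steps 4-6) can be sketched as a loop. This is an illustrative skeleton, not a real platform API: `score_fn` stands in for the retrainable model and `human_review_fn` for the lawyer's decision, and uncertainty sampling (picking documents scored nearest 0.5) is one common prioritization strategy:

```python
def cal_loop(docs, score_fn, human_review_fn, batch_size=100, max_rounds=50):
    """Continuous active learning skeleton.
    docs: {doc_id: text}
    score_fn(text, labeled): P(relevant) given all labels gathered so far
    human_review_fn(text): lawyer's label, True (relevant) or False."""
    labeled = {}
    for _ in range(max_rounds):
        unreviewed = {i: t for i, t in docs.items() if i not in labeled}
        if not unreviewed:
            break
        # Steps 3/6: (re)score with the current labels, i.e. the "retrained" model
        scores = {i: score_fn(t, labeled) for i, t in unreviewed.items()}
        # Uncertainty sampling: review the documents the model is least sure about
        batch = sorted(scores, key=lambda i: abs(scores[i] - 0.5))[:batch_size]
        # Steps 4-5: human labels feed the next training round
        for i in batch:
            labeled[i] = human_review_fn(docs[i])
    return labeled
```

A real CAL deployment would also track recall against a target and stop once it is met, rather than simply exhausting the document population.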

Characteristics:

  • Dynamic model: Continuously learns from reviewer decisions
  • Iterative process: Multiple training cycles
  • Active learning: Prioritizes documents system is uncertain about
  • Adaptive: Model improves throughout project
  • Intelligent prioritization: Focuses reviewer effort on most valuable documents

Accuracy: 85-95% recall typical (higher than TAR 1.0)
Timeline: 3-5 weeks (often faster despite iteration)
Cost: 40-60% savings vs. manual (higher than TAR 1.0)

Advantages:

  • Higher accuracy (learns from reviewer patterns)
  • Faster project completion (learns what matters)
  • More efficient (focuses on uncertain documents)
  • Adapts to changing relevance concepts

Disadvantages:

  • More complex to explain
  • Requires more sophisticated platform
  • Needs experienced legal team to avoid training bias


Recommendation: Use TAR 2.0 (CAL) for most projects. Superior accuracy and speed justify added complexity. TAR 1.0 remains useful for smaller projects or teams unfamiliar with AI.

TAR Methodology & Protocols

The Sedona Conference TAR Protocol

The Sedona Conference provides gold-standard TAR guidelines. Key requirements:

1. Clear Relevance Definition

  • Define what “relevant” means for THIS case
  • Avoid vague definitions (“relevant to claims or defenses”)
  • Specific definition: “Emails discussing Product X development 2018-2020”

2. Appropriate Training Set

  • Sufficient size: 200-1,000 documents minimum
  • Representative: Includes all document types, date ranges, custodians
  • Balanced: Adequate relevant documents (not 90% irrelevant)
  • Consistent labeling: Clear criteria applied consistently

3. Transparent Methodology

  • Document every step
  • Explain algorithm chosen and why
  • Show model performance metrics
  • Maintain audit trail

4. Quality Control

  • Validation set: Independent test documents
  • Yield curves: Show recall/document relationship
  • Precision/recall: Track both metrics
  • Regular monitoring: Don’t assume accuracy continues
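Precision and recall on a validation set reduce to simple set arithmetic. A sketch, with invented document IDs:

```python
def precision_recall(predicted_relevant, truly_relevant):
    """predicted_relevant / truly_relevant: sets of document IDs.
    Precision: of the documents the model flagged, how many really are relevant.
    Recall: of the truly relevant documents, how many the model found."""
    true_positives = len(predicted_relevant & truly_relevant)
    precision = true_positives / len(predicted_relevant) if predicted_relevant else 0.0
    recall = true_positives / len(truly_relevant) if truly_relevant else 0.0
    return precision, recall
```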

5. Reasonable Effort

  • Continue until sufficient recall (usually 75-90%)
  • Don’t review 100% (defeats TAR purpose)
  • Show diminishing returns (after X documents, minimal new relevant docs found)
  • Document stopping point rationale

Defensibility Principles

Courts require:

  1. Repeatability: Process could be repeated by independent party
  2. Transparency: Clear documentation of methodology
  3. Quality metrics: Proof of acceptable accuracy
  4. Reasonable effort: Not a shortcut, but efficient method

How TAR satisfies:

  • Machine learning models are reproducible
  • Yield curves prove proportionate effort
  • Precision/recall metrics show accuracy
  • Audit trails document process

Real-World TAR Applications

Example 1: M&A Due Diligence – Rapid Document Assessment

Scenario: Private equity firm acquiring software company. 2 million documents to review for legal risk. 30-day deadline, $100,000 budget.

TAR Application:

  • Lawyers label 1,000 documents for training (sensitive contracts, litigation history, IP issues)
  • TAR 2.0 trains model on training set
  • AI ranks all 2 million documents
  • Focus review on top 5% (100,000 highest-risk documents)
  • Achieve 90% recall of high-risk documents
  • 30-day timeline met with $85,000 cost

Result: Deal team identifies all material legal risks, proceeds with acquisition with confidence. Manual review would have taken 60+ days and $300,000+.

Example 2: Government Investigation – Privilege Protection

Scenario: DOJ investigates healthcare company. Company has 5 million emails, many involving counsel. Manual review would take 3+ months and risk inadvertent privilege waiver.

TAR Application:

  • Lawyers identify 500 privileged emails (attorney communications, legal advice)
  • Train TAR model specifically to identify privilege
  • AI flags 150,000 likely-privileged emails for manual verification
  • Manual review of 150,000 (not 5 million) confirms privilege
  • Remaining 4.85 million emails cleared for production
  • Demonstrate proportionality (reasonable effort balances privilege protection with disclosure)

Result: Efficient privilege protection, faster government response, reduced risk of privilege waiver.

Example 3: Class Action Litigation – Cost Management

Scenario: Consumer class action. 10 million emails/documents. Estimated manual review cost: $5 million, 5-month timeline.

TAR Application:

  • Initial training: Legal team labels 2,000 documents
  • TAR 2.0 deployed across 10 million documents
  • Phase 1 (Weeks 1-2): Review top 50,000 documents
  • Phase 2 (Weeks 3-4): Review next 50,000 documents
  • Achieve 85% recall after 100,000 documents reviewed
  • Low-probability documents set aside (represent only 15% of potentially relevant)
  • Total cost: $1.5 million (70% savings)
  • Timeline: 6 weeks vs. 20 weeks

Result: Significantly lower litigation costs, manageable timeline, defensible methodology.

TAR Best Practices

1. Define Relevance Clearly

Ambiguous definitions produce poor TAR results. Instead of “relevant to claims and defenses,” specify: “Emails discussing Product X performance, safety, or design between 2018-2020.”

Impact: Clear definition improves model accuracy 20-30%

2. Select Representative Training Set

Don’t cherry-pick training documents. Use random selection to ensure all document types, dates, custodians represented.

Impact: Representative training improves generalization 15-25%

3. Use Experienced Reviewers for Training

The lawyers who label training documents should be case experts. Their judgments teach the algorithm what matters.

Impact: Expert labeling improves accuracy 10-15%

4. Monitor Performance Throughout

Track precision, recall, and yield curves continuously. Don’t assume accuracy continues.

Impact: Continuous monitoring catches problems early, prevents wasted effort

5. Plan for Quality Control

Allocate 10-15% of review resources to quality checks (validation sets, spot checks).

Impact: QC catches issues, ensures defensibility

6. Document Everything

Maintain detailed records of:

  • Training set selection methodology
  • Algorithms tried and why chosen
  • Performance metrics at each phase
  • Quality control results

Impact: Documentation proves defensibility in litigation

7. Know When to Stop

Continue reviewing until:

  • Recall target achieved (e.g., 90% of relevant docs found)
  • Diminishing returns (finding <1% new relevant docs per 1,000 reviewed)
  • Budget or timeline exhausted

Impact: Stopping criteria balance thoroughness with efficiency
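The stopping criteria above can be combined into one explicit check for the review log. A hedged sketch: the thresholds (90% recall, under 1% marginal yield) follow the figures in this section, and the total-relevant estimate would come from a validation sample:

```python
def stopping_decision(relevant_per_batch, batch_size,
                      est_total_relevant, target_recall=0.90,
                      marginal_floor=0.01):
    """relevant_per_batch: relevant-document counts for each completed batch.
    est_total_relevant: estimated relevant documents in the full population.
    Returns a human-readable decision for the review log."""
    found = sum(relevant_per_batch)
    recall = found / est_total_relevant
    marginal_rate = relevant_per_batch[-1] / batch_size  # yield of latest batch
    if recall >= target_recall:
        return "stop: recall target reached"
    if marginal_rate < marginal_floor:
        return "stop: diminishing returns"
    return "continue review"
```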

8. Plan for Human Oversight

AI doesn’t replace human judgment. Lawyers must:

  • Define relevance
  • Label training documents
  • Review AI results
  • Make final decisions

Impact: Human oversight ensures accuracy and defensibility

Key Takeaways

Technology-Assisted Review revolutionized eDiscovery by combining machine learning with human judgment. TAR reduces document review costs by 40-60% while maintaining accuracy comparable to or exceeding manual review.

Key concepts:

  • TAR learns from lawyer-reviewed training examples
  • TAR ranks documents by relevance probability
  • Lawyers review highest-ranked documents first
  • TAR 2.0 (continuous active learning) improves as project progresses
  • Federal courts widely accept TAR methodology since 2012

TAR advantages:

  • 40-60% cost reduction vs. manual review
  • 3-5 week timeline (vs. 6-12 months manual)
  • Consistent, reproducible methodology
  • Scales to millions of documents
  • Legally defensible when properly implemented

Success factors:

  • Clear relevance definition
  • Appropriate training set size/quality
  • Experienced legal review
  • Quality control/monitoring
  • Documented process

TAR is no longer experimental—it’s the standard methodology for large document reviews in litigation worldwide.

FAQ Questions & Answers

Q1: What does TAR stand for?

TAR stands for Technology-Assisted Review. It’s also called predictive coding or computer-assisted review. TAR uses machine learning to automatically classify documents as relevant or irrelevant in eDiscovery, dramatically reducing manual review effort and costs.

Q2: How is TAR different from keyword search?

Keyword search requires legal teams to specify terms and returns all documents containing those terms, regardless of context. TAR learns from examples—lawyers review sample documents, and the algorithm learns patterns of “relevant” vs. “not relevant” documents. TAR finds conceptually similar documents even without specific keywords, understanding document meaning rather than just word presence.

Q3: What is the difference between TAR 1.0 and TAR 2.0?

TAR 1.0 trains the machine learning model once on a training set, then scores all documents without further learning. TAR 2.0 (Continuous Active Learning) continuously learns as lawyers review documents, retraining the model on new labels and improving accuracy throughout the project. TAR 2.0 typically achieves 85-95% accuracy vs. TAR 1.0’s 80-85%.

Q4: Do courts accept TAR?

Yes. Federal courts have accepted TAR since 2012 (the Da Silva Moore v. Publicis Groupe case). Courts worldwide now recognize TAR as valid for document review when properly implemented with clear relevance definition, appropriate training set, transparent methodology, and quality control. The Sedona Conference provides guidelines for TAR compliance.

Q5: How much does TAR cost compared to manual review?

TAR typically costs 40-60% less than manual review. For example, a 50,000-document project might cost $250,000 manually but only $100,000-$150,000 with TAR. Savings come from reviewing fewer documents (AI prioritizes highest-probability relevant documents) and a faster timeline (weeks vs. months).

Q6: How long does TAR training take?

Initial training typically takes 1-2 weeks. Lawyers manually label 200-1,000 sample documents, then the machine learning algorithm trains on these examples (usually hours to 1-2 days for model development). Quality of the training set matters more than quantity—a well-selected small set outperforms a poorly-selected large set.

Q7: How accurate is TAR?

TAR 1.0 typically achieves 80-85% recall (finding relevant documents); TAR 2.0 achieves 85-95% recall. Accuracy depends on training set quality, relevance definition clarity, and algorithm choice. Human reviewers also make errors—achieving 85-90% accuracy—so TAR performance is comparable to or exceeds human accuracy.

Q8: When should TAR be used?

TAR works best for large datasets (10,000+ documents). For smaller projects (under 5,000 documents), manual review is often more efficient. TAR is ideal for complex litigation, regulatory investigations, and M&A due diligence where large document volumes require rapid assessment. TAR can be applied to any eDiscovery project with a clear relevance definition.

Q9: What is Continuous Active Learning (CAL)?

Continuous Active Learning (CAL) is an enhancement to traditional predictive coding. Rather than batch processing, CAL learns iteratively—each time lawyers review documents, the system retrains and improves its ranking of remaining documents. CAL prioritizes uncertain documents (documents the system is least confident about), focusing lawyer effort on the highest-value reviews.

Q10: Is TAR legally defensible?

Yes, when properly implemented. TAR is defensible if: (1) relevance is clearly defined; (2) the training set is appropriately selected and labeled; (3) the methodology is transparent and documented; (4) quality metrics prove adequate accuracy; and (5) effort is proportionate (stopping when diminishing returns are reached). Courts require documentation of the process and proof of accuracy, not perfection.
