
Technology-Assisted Review (TAR)

What is TAR?

Technology-Assisted Review (TAR) is a machine learning method that automatically classifies documents as relevant or irrelevant in eDiscovery. Key TAR components include:

  • Training Set: Lawyers manually label sample documents
  • Algorithm: Machine learning model learns from examples
  • Ranking: AI ranks all documents by relevance probability
  • Review: Lawyers review highest-ranked documents first

Definition & Core Concept

Technology-Assisted Review (TAR), also called predictive coding or computer-assisted review, uses machine learning algorithms to automatically classify documents as relevant or irrelevant in litigation discovery. Rather than manually reviewing every document, TAR learns from lawyer-reviewed training examples to predict document relevance across large datasets.

Why TAR Matters

TAR revolutionized eDiscovery economics by dramatically reducing manual document review costs. In complex litigation involving millions of documents, traditional manual review can cost $500,000-$2,000,000+ and take 6-12 months. TAR accomplishes comparable results in weeks at 40-60% lower cost.

Legal Acceptance

TAR is widely accepted in U.S. courts. Since 2012, federal courts have approved TAR methodology when properly implemented with documented process and quality controls. The Sedona Conference provides guidelines for TAR protocols, and major law firms now routinely use TAR instead of manual review for large document sets.

TAR Evolution

TAR 1.0 (circa 2012): Batch-based process where lawyers train the system once, then review the ranked results
TAR 2.0 (circa 2014 onward): Continuous active learning: the system learns iteratively as reviewers work, improving accuracy throughout the project

Understanding TAR Origins

Technology-Assisted Review emerged from information retrieval and machine learning research in the 2000s. The first court-approved TAR case was Da Silva Moore v. Publicis Groupe (S.D.N.Y. 2012), where Magistrate Judge Andrew Peck approved TAR methodology for document review instead of requiring complete manual review. This landmark decision legitimized machine learning for legal discovery.

Technical Foundation

TAR is a supervised learning classification task:

Supervised Learning: Algorithm learns from labeled examples (documents marked “relevant” or “not relevant”)
Classification: Model assigns documents to categories (relevant vs. not relevant)
Probability Scores: Each document receives relevance probability (0-100%)
Ranking: Documents ranked by probability, highest-ranked reviewed first

Core TAR Process

  1. Seed Set Creation: Lawyers label 200-1,000 sample documents
  2. Feature Extraction: Algorithm identifies patterns in training documents
  3. Model Training: Machine learning algorithm learns the pattern
  4. Full Dataset Scoring: Model scores all documents
  5. Ranking: Documents ranked by predicted relevance
  6. Human Review: Lawyers review documents starting with highest-ranked
  7. Quality Assessment: Track precision/recall and yield curves
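The seven steps above can be sketched end to end in code. The following is a minimal, illustrative example using a pure-Python Naïve Bayes classifier (one of the algorithm families TAR platforms draw on); the document texts, labels, and function names are invented for illustration, not taken from any real platform:

```python
import math
from collections import Counter

def train_model(seed_set):
    """Steps 1-3: learn word statistics from a lawyer-labeled seed set.
    seed_set: list of (text, label) pairs, label is 'relevant' or 'not'."""
    counts = {"relevant": Counter(), "not": Counter()}
    docs_per_class = Counter()
    vocab = set()
    for text, label in seed_set:
        words = text.lower().split()
        counts[label].update(words)
        docs_per_class[label] += 1
        vocab.update(words)
    return counts, docs_per_class, vocab

def relevance_probability(text, model):
    """Step 4: score one document with P(relevant | text), Laplace-smoothed."""
    counts, docs_per_class, vocab = model
    total_docs = sum(docs_per_class.values())
    log_post = {}
    for label in ("relevant", "not"):
        lp = math.log(docs_per_class[label] / total_docs)   # class prior
        denom = sum(counts[label].values()) + len(vocab)    # smoothing denominator
        for w in text.lower().split():
            lp += math.log((counts[label][w] + 1) / denom)  # word likelihood
        log_post[label] = lp
    peak = max(log_post.values())                           # normalize in log space
    odds = {k: math.exp(v - peak) for k, v in log_post.items()}
    return odds["relevant"] / sum(odds.values())

def rank_documents(docs, model):
    """Steps 5-6: rank the full dataset so lawyers review top-scored first."""
    scored = [(relevance_probability(d, model), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In practice platforms use far richer features (TF-IDF, metadata, linguistic patterns) and stronger models (SVMs, gradient boosting), but the score-then-rank structure is the same.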

TAR vs. Keyword Search

Keyword Search: Find documents containing specific terms (“CEO” + “fraud”)
TAR: Find conceptually similar documents to training examples

Keyword search has high precision (few false positives) but low recall (misses conceptually relevant documents). TAR balances precision and recall, finding relevant documents that don’t contain specific keywords.

How TAR Works

Step-by-Step TAR Workflow

Phase 1: Preparation

Determine Scope: Define “relevant” for this specific case

  • What documents matter to key legal issues?
  • What date range is relevant?
  • What document types matter (emails, contracts, etc.)?

Identify Custodians: Who has documents related to key issues?

Define Success: What accuracy level is needed? (typically 75-95% recall)

Phase 2: Training Set Development

Random Selection: Randomly select 200-1,000 documents from full dataset

  • Random selection ensures representative sample
  • Too small (50 docs) = unreliable model
  • Too large (5,000 docs) = inefficient process

Lawyer Review: Experienced lawyers label each document:

  • Relevant: Responsive to discovery or important to case strategy
  • Not Relevant: Doesn’t matter to case
  • Privileged: Attorney-client communication or work product

Consistency: Ensure labeling is consistent

  • If two lawyers disagree significantly, refine definition
  • Goal: Clear relevance definition

Phase 3: Model Development

Feature Extraction: Algorithm analyzes each training document:

  • Word frequencies and patterns
  • Document structure and format
  • Metadata (author, date, subject line)
  • Linguistic patterns (tone, terminology)

Algorithm Selection: Choose machine learning algorithm

  • Support Vector Machines (SVM)
  • Naïve Bayes
  • Gradient Boosting
  • Neural Networks
  • Ensemble methods

Training: Algorithm learns pattern that distinguishes “relevant” from “not relevant”

Phase 4: Ranking & Prioritization

Full Dataset Scoring: Algorithm scores every document in dataset

  • Each document receives probability score (0-100%)
  • Highest-scoring documents most likely to be relevant

Ranking: Documents ranked by probability

  • Document A: 95% likely relevant (rank #1)
  • Document B: 87% likely relevant (rank #2)
  • Document C: 12% likely relevant (rank #10,000)

Batching: Group documents for lawyer review

  • Review top 100 documents first
  • Then next 500
  • Continue until sufficient recall achieved
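The batching scheme above (a small first batch, then progressively larger ones) can be expressed as a simple helper. A sketch; the batch sizes are illustrative defaults, not a platform standard:

```python
def make_batches(ranked_ids, sizes=(100, 500, 1000)):
    """Split a relevance-ranked document list into review batches:
    a small first batch, then progressively larger ones, then the rest."""
    batches, start = [], 0
    for n in sizes:
        batches.append(ranked_ids[start:start + n])
        start += n
    if start < len(ranked_ids):
        batches.append(ranked_ids[start:])  # remainder, reviewed only if recall demands it
    return [b for b in batches if b]        # drop empty batches for small datasets
```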

Phase 5: Human Review & Model Improvement

Batch 1 Review: Lawyers review top 100 AI-ranked documents

  • Mark as Relevant, Not Relevant, or Privileged
  • Typically 60-80% of top 100 are actually relevant

Model Retrain: Algorithm retrains on original training set + new labels

  • Model improves with each batch
  • Accuracy increases over time

Batch 2 Review: Lawyers review the next batch of documents

  • The retrained model re-ranks the remaining documents
  • Precision is typically higher than in Batch 1

Iterate: Continue until:

  • Sufficient recall achieved (e.g., 90% of relevant documents found)
  • Diminishing returns (finding fewer new relevant documents)
  • Budget exhausted

Phase 6: Quality Control & Certification

Validation Set: Create independent test set (100-500 documents)

  • Separate from training and review batches
  • Verify model accuracy on unseen data

Yield Curve: Graph showing recall vs. documents reviewed

  • Shows when 90% recall achieved
  • Demonstrates defensibility (shows effort was proportionate)
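A yield curve is straightforward to compute from the review log. A minimal sketch, assuming relevance labels are recorded in ranked review order:

```python
def yield_curve(labels_in_review_order):
    """Cumulative recall after each document reviewed.
    labels_in_review_order: 1 if the document was relevant, else 0,
    in the order documents were reviewed (highest-ranked first)."""
    total_relevant = sum(labels_in_review_order)
    if total_relevant == 0:
        return [0.0] * len(labels_in_review_order)
    found, curve = 0, []
    for label in labels_in_review_order:
        found += label
        curve.append(found / total_relevant)
    return curve

def docs_needed_for_recall(labels_in_review_order, target=0.90):
    """How many documents had to be reviewed to hit the recall target."""
    for reviewed, recall in enumerate(yield_curve(labels_in_review_order), start=1):
        if recall >= target:
            return reviewed
    return len(labels_in_review_order)
```

In a real project the denominator (total relevant documents) is an estimate from a random validation sample, since the true count across the unreviewed population is unknown.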

Certification: Document entire process for litigation record

Typical TAR Results

Dataset: 50,000 documents, 30-day deadline, $200,000 budget

Manual Review:

  • 100 lawyers × 500 documents each = 50,000 total
  • Cost: $200,000 at $4/document
  • Time: 6 weeks (barely meets deadline)
  • Result: 98% recall (highest accuracy but very expensive, slow)

TAR Approach:

  • Training: lawyers label a 500-document seed set
  • Targeted review: lawyers review the 15,000 highest-ranked documents
  • Remaining 34,500: AI-scored but not manually reviewed (low-probability documents)
  • Cost: $70,000 (65% savings)
  • Time: 3 weeks (completes ahead of deadline)
  • Result: 92% recall (comparable accuracy, much cheaper and faster)

TAR 1.0 vs. TAR 2.0

TAR 1.0 (Traditional Predictive Coding)

Process:

  1. Lawyers review sample documents and label them
  2. Algorithm trains on full training set
  3. Algorithm scores entire dataset once
  4. Lawyers review ranked results
  5. Process complete (no iteration)

Characteristics:

  • Static model: Single training event, then fixed application
  • Batch processing: All documents scored together
  • One cycle: Lawyers review results once
  • Simpler workflow: Easier to understand and implement
  • Less adaptable: Doesn’t adjust if relevance definition changes mid-project

Accuracy: 80-85% recall typical
Timeline: 4-6 weeks
Cost: 30-40% savings vs. manual

Advantages:

  • Simple to explain to courts
  • Well-established methodology
  • Lower platform costs

Disadvantages:

  • Doesn’t learn from reviewer feedback
  • Fixed accuracy regardless of effort
  • May miss pattern changes discovered during review

TAR 2.0 (Continuous Active Learning – CAL)

Process:

  1. Lawyers review sample documents and label them (same as TAR 1.0)
  2. Algorithm trains on training set
  3. Algorithm scores documents and ranks by uncertainty
  4. Lawyers review top-ranked documents
  5. Algorithm retrains on new labels
  6. Algorithm re-ranks remaining documents
  7. Repeat steps 4-6 until recall target achieved
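The iterate-retrain-rerank cycle (steps 4-6) can be sketched as a loop. This is an illustrative skeleton, not a real platform API: `score_fn` stands in for the retrainable model and `human_review_fn` for the lawyer's decision, and uncertainty sampling (picking documents scored nearest 0.5) is one common prioritization strategy:

```python
def cal_loop(docs, score_fn, human_review_fn, batch_size=100, max_rounds=50):
    """Continuous active learning skeleton.
    docs: {doc_id: text}
    score_fn(text, labeled): P(relevant) given all labels gathered so far
    human_review_fn(text): lawyer's label, True (relevant) or False."""
    labeled = {}
    for _ in range(max_rounds):
        unreviewed = {i: t for i, t in docs.items() if i not in labeled}
        if not unreviewed:
            break
        # Steps 3/6: (re)score with the current labels, i.e. the "retrained" model
        scores = {i: score_fn(t, labeled) for i, t in unreviewed.items()}
        # Uncertainty sampling: review the documents the model is least sure about
        batch = sorted(scores, key=lambda i: abs(scores[i] - 0.5))[:batch_size]
        # Steps 4-5: human labels feed the next training round
        for i in batch:
            labeled[i] = human_review_fn(docs[i])
    return labeled
```

A real CAL deployment would also track recall against a target and stop once it is met, rather than simply exhausting the document population.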

Characteristics:

  • Dynamic model: Continuously learns from reviewer decisions
  • Iterative process: Multiple training cycles
  • Active learning: Prioritizes documents system is uncertain about
  • Adaptive: Model improves throughout project
  • Intelligent prioritization: Focuses reviewer effort on most valuable documents

Accuracy: 85-95% recall typical (higher than TAR 1.0)
Timeline: 3-5 weeks (often faster despite iteration)
Cost: 40-60% savings vs. manual (higher than TAR 1.0)

Advantages:

  • Higher accuracy (learns from reviewer patterns)
  • Faster project completion (learns what matters)
  • More efficient (focuses on uncertain documents)
  • Adapts to changing relevance concepts

Disadvantages:

  • More complex to explain
  • Requires more sophisticated platform
  • Needs experienced legal team to avoid training bias


Recommendation: Use TAR 2.0 (CAL) for most projects. Superior accuracy and speed justify added complexity. TAR 1.0 remains useful for smaller projects or teams unfamiliar with AI.

TAR Methodology & Protocols

The Sedona Conference TAR Protocol

The Sedona Conference provides gold-standard TAR guidelines. Key requirements:

1. Clear Relevance Definition

  • Define what “relevant” means for THIS case
  • Avoid vague definitions (“relevant to claims or defenses”)
  • Specific definition: “Emails discussing Product X development 2018-2020”

2. Appropriate Training Set

  • Sufficient size: 200-1,000 documents minimum
  • Representative: Includes all document types, date ranges, custodians
  • Balanced: Adequate relevant documents (not 90% irrelevant)
  • Consistent labeling: Clear criteria applied consistently

3. Transparent Methodology

  • Document every step
  • Explain algorithm chosen and why
  • Show model performance metrics
  • Maintain audit trail

4. Quality Control

  • Validation set: Independent test documents
  • Yield curves: Show recall/document relationship
  • Precision/recall: Track both metrics
  • Regular monitoring: Don’t assume accuracy continues
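Precision and recall on a validation set reduce to simple set arithmetic. A sketch, with invented document IDs:

```python
def precision_recall(predicted_relevant, truly_relevant):
    """predicted_relevant / truly_relevant: sets of document IDs.
    Precision: of the documents the model flagged, how many really are relevant.
    Recall: of the truly relevant documents, how many the model found."""
    true_positives = len(predicted_relevant & truly_relevant)
    precision = true_positives / len(predicted_relevant) if predicted_relevant else 0.0
    recall = true_positives / len(truly_relevant) if truly_relevant else 0.0
    return precision, recall
```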

5. Reasonable Effort

  • Continue until sufficient recall (usually 75-90%)
  • Don’t review 100% (defeats TAR purpose)
  • Show diminishing returns (after X documents, minimal new relevant docs found)
  • Document stopping point rationale

Defensibility Principles

Courts require:

  1. Repeatability: Process could be repeated by independent party
  2. Transparency: Clear documentation of methodology
  3. Quality metrics: Proof of acceptable accuracy
  4. Reasonable effort: Not a shortcut, but efficient method

How TAR satisfies:

  • Machine learning models are reproducible
  • Yield curves prove proportionate effort
  • Precision/recall metrics show accuracy
  • Audit trails document process

Real-World TAR Applications

Example 1: M&A Due Diligence – Rapid Document Assessment

Scenario: Private equity firm acquiring software company. 2 million documents to review for legal risk. 30-day deadline, $100,000 budget.

TAR Application:

  • Lawyers label 1,000 documents for training (sensitive contracts, litigation history, IP issues)
  • TAR 2.0 trains model on training set
  • AI ranks all 2 million documents
  • Focus review on top 5% (100,000 highest-risk documents)
  • Achieve 90% recall of high-risk documents
  • 30-day timeline met with $85,000 cost

Result: Deal team identifies all material legal risks, proceeds with acquisition with confidence. Manual review would have taken 60+ days and $300,000+.

Example 2: Government Investigation – Privilege Protection

Scenario: DOJ investigates healthcare company. Company has 5 million emails, many involving counsel. Manual review would take 3+ months and risk inadvertent privilege waiver.

TAR Application:

  • Lawyers identify 500 privileged emails (attorney communications, legal advice)
  • Train TAR model specifically to identify privilege
  • AI flags 150,000 likely-privileged emails for manual verification
  • Manual review of 150,000 (not 5 million) confirms privilege
  • Remaining 4.85 million emails cleared for production
  • Demonstrate proportionality (reasonable effort balances privilege protection with disclosure)

Result: Efficient privilege protection, faster government response, reduced risk of privilege waiver.

Example 3: Class Action Litigation – Cost Management

Scenario: Consumer class action. 10 million emails/documents. Estimated manual review cost: $5 million, 5-month timeline.

TAR Application:

  • Initial training: Legal team labels 2,000 documents
  • TAR 2.0 deployed across 10 million documents
  • Phase 1 (Weeks 1-2): Review top 50,000 documents
  • Phase 2 (Weeks 3-4): Review next 50,000 documents
  • Achieve 85% recall after 100,000 documents reviewed
  • Low-probability documents set aside (represent only 15% of potentially relevant)
  • Total cost: $1.5 million (70% savings)
  • Timeline: 6 weeks vs. 20 weeks

Result: Significantly lower litigation costs, manageable timeline, defensible methodology.

TAR Best Practices

1. Define Relevance Clearly

Ambiguous definitions produce poor TAR results. Instead of “relevant to claims and defenses,” specify: “Emails discussing Product X performance, safety, or design between 2018-2020.”

Impact: Clear definition improves model accuracy 20-30%

2. Select Representative Training Set

Don’t cherry-pick training documents. Use random selection to ensure all document types, dates, custodians represented.

Impact: Representative training improves generalization 15-25%

3. Use Experienced Reviewers for Training

The lawyers who label training documents should be case experts. Their judgments teach the algorithm what matters.

Impact: Expert labeling improves accuracy 10-15%

4. Monitor Performance Throughout

Track precision, recall, and yield curves continuously. Don’t assume accuracy continues.

Impact: Continuous monitoring catches problems early, prevents wasted effort

5. Plan for Quality Control

Allocate 10-15% of review resources to quality checks (validation sets, spot checks).

Impact: QC catches issues, ensures defensibility

6. Document Everything

Maintain detailed records of:

  • Training set selection methodology
  • Algorithms tried and why chosen
  • Performance metrics at each phase
  • Quality control results

Impact: Documentation proves defensibility in litigation

7. Know When to Stop

Continue reviewing until:

  • Recall target achieved (e.g., 90% of relevant docs found)
  • Diminishing returns (finding <1% new relevant docs per 1,000 reviewed)
  • Budget or timeline exhausted

Impact: Stopping criteria balance thoroughness with efficiency
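The stopping criteria above can be combined into one explicit check for the review log. A hedged sketch: the thresholds (90% recall, under 1% marginal yield) follow the figures in this section, and the total-relevant estimate would come from a validation sample:

```python
def stopping_decision(relevant_per_batch, batch_size,
                      est_total_relevant, target_recall=0.90,
                      marginal_floor=0.01):
    """relevant_per_batch: relevant-document counts for each completed batch.
    est_total_relevant: estimated relevant documents in the full population.
    Returns a human-readable decision for the review log."""
    found = sum(relevant_per_batch)
    recall = found / est_total_relevant
    marginal_rate = relevant_per_batch[-1] / batch_size  # yield of latest batch
    if recall >= target_recall:
        return "stop: recall target reached"
    if marginal_rate < marginal_floor:
        return "stop: diminishing returns"
    return "continue review"
```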

8. Plan for Human Oversight

AI doesn’t replace human judgment. Lawyers must:

  • Define relevance
  • Label training documents
  • Review AI results
  • Make final decisions

Impact: Human oversight ensures accuracy and defensibility

Key Takeaways

Technology-Assisted Review revolutionized eDiscovery by combining machine learning with human judgment. TAR reduces document review costs by 40-60% while maintaining accuracy comparable to or exceeding manual review.

Key concepts:

  • TAR learns from lawyer-reviewed training examples
  • TAR ranks documents by relevance probability
  • Lawyers review highest-ranked documents first
  • TAR 2.0 (continuous active learning) improves as project progresses
  • Federal courts widely accept TAR methodology since 2012

TAR advantages:

  • 40-60% cost reduction vs. manual review
  • 3-5 week timeline (vs. 6-12 months manual)
  • Consistent, reproducible methodology
  • Scales to millions of documents
  • Legally defensible when properly implemented

Success factors:

  • Clear relevance definition
  • Appropriate training set size/quality
  • Experienced legal review
  • Quality control/monitoring
  • Documented process

TAR is no longer experimental—it’s the standard methodology for large document reviews in litigation worldwide.

FAQ Questions & Answers

Q1: What does TAR stand for?

TAR stands for Technology-Assisted Review. It’s also called predictive coding or computer-assisted review. TAR uses machine learning to automatically classify documents as relevant or irrelevant in eDiscovery, dramatically reducing manual review effort and costs.

Q2: How is TAR different from keyword search?

Keyword search requires legal teams to specify terms and returns all documents containing those terms, regardless of context. TAR learns from examples—lawyers review sample documents, and the algorithm learns patterns of “relevant” vs. “not relevant” documents. TAR finds conceptually similar documents even without specific keywords, understanding document meaning rather than just word presence.

Q3: What is the difference between TAR 1.0 and TAR 2.0?

TAR 1.0 trains the machine learning model once on a training set, then scores all documents without further learning. TAR 2.0 (Continuous Active Learning) continuously learns as lawyers review documents, retraining the model on new labels and improving accuracy throughout the project. TAR 2.0 typically achieves 85-95% accuracy vs. TAR 1.0’s 80-85%.

Q4: Do courts accept TAR?

Yes. Federal courts have accepted TAR since 2012 (the Da Silva Moore v. Publicis Groupe case). Courts worldwide now recognize TAR as valid for document review when properly implemented with clear relevance definition, appropriate training set, transparent methodology, and quality control. The Sedona Conference provides guidelines for TAR compliance.

Q5: How much does TAR cost compared to manual review?

TAR typically costs 40-60% less than manual review. For example, a 50,000-document project might cost $250,000 manually but only $100,000-$150,000 with TAR. Savings come from reviewing fewer documents (AI prioritizes highest-probability relevant documents) and a faster timeline (weeks vs. months).

Q6: How long does TAR training take?

Initial training typically takes 1-2 weeks. Lawyers manually label 200-1,000 sample documents, then the machine learning algorithm trains on these examples (usually hours to 1-2 days for model development). Quality of the training set matters more than quantity—a well-selected small set outperforms a poorly-selected large set.

Q7: How accurate is TAR?

TAR 1.0 typically achieves 80-85% recall (finding relevant documents); TAR 2.0 achieves 85-95% recall. Accuracy depends on training set quality, relevance definition clarity, and algorithm choice. Human reviewers also make errors—achieving 85-90% accuracy—so TAR performance is comparable to or exceeds human accuracy.

Q8: When should TAR be used?

TAR works best for large datasets (10,000+ documents). For smaller projects (under 5,000 documents), manual review is often more efficient. TAR is ideal for complex litigation, regulatory investigations, and M&A due diligence where large document volumes require rapid assessment. TAR can be applied to any eDiscovery project with a clear relevance definition.

Q9: What is Continuous Active Learning (CAL)?

Continuous Active Learning (CAL) is an enhancement to traditional predictive coding. Rather than batch processing, CAL learns iteratively—each time lawyers review documents, the system retrains and improves its ranking of remaining documents. CAL prioritizes uncertain documents (documents the system is least confident about), focusing lawyer effort on the highest-value reviews.

Q10: Is TAR legally defensible?

Yes, when properly implemented. TAR is defensible if: (1) relevance is clearly defined; (2) the training set is appropriately selected and labeled; (3) the methodology is transparent and documented; (4) quality metrics prove adequate accuracy; and (5) effort is proportionate (stopping when diminishing returns are reached). Courts require documentation of the process and proof of accuracy, not perfection.
