LLM-judge score: 3.8 / 5 · flawed
funnel_construction: 5 · eligibility_gating: 2 · numerical_consistency: 4 · assumptions_revenue: 4 · faithfulness: 4
The funnel construction is clean and correctly uses incidence as the basis for the queried US geography, matching the reference through step 6 (8,720). However, the agent double-counts the line gate by applying an extra 0.50 '2L line split' on top of the fit-for-2L fraction, halving the addressable pool to ~4,360 vs the correct 8,720 — exactly the double-counting error the rubric penalizes. The agent does flag this interpretation ambiguity and provides a sensitivity restoring ~7,900, but it presents the 4,360 figure as its central estimate, distorting the final answer. Pricing/persistence logic ($121k effective) is sound, and the agent respects the curated/teaching nature of the data.
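The size of the double-counting effect described above can be checked with a short Python sketch. The fractions are the ones returned by the tools in the trajectory; the function name `addressable` and its `apply_line_split` flag are hypothetical names introduced only for this illustration.

```python
# Tool-returned values from the teaching snapshot (US, ES-SCLC).
INCIDENCE = 33_000
# diagnosed, treated, extensive-stage, DLL3-expressing, fit for 2L
FRACTIONS = [0.95, 0.85, 0.70, 0.85, 0.55]
LINE_SPLIT_2L = 0.50  # the extra gate the agent stacked on top


def addressable(apply_line_split: bool) -> float:
    """Return the unrounded addressable pool, optionally stacking the 2L line split."""
    pool = float(INCIDENCE)
    for fraction in FRACTIONS:
        pool *= fraction
    if apply_line_split:
        pool *= LINE_SPLIT_2L
    return pool


reference_pool = addressable(apply_line_split=False)
agent_pool = addressable(apply_line_split=True)
print(f"reference: {reference_pool:,.0f}")   # 8,720
print(f"agent:     {agent_pool:,.0f}")       # 4,360
print(f"ratio:     {agent_pool / reference_pool:.2f}")  # 0.50
```

Because every downstream revenue figure scales linearly with the pool, the same factor of 0.50 carries through to the agent's peak-revenue estimates.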
Commercial / Forecasting · Commercial forecasting analyst

Epidemiology-Based Market Sizing — agent trajectory for expert review

claude-opus-4-8 · 2026-06-06T10:20:23.020726+00:00 · 4 tool calls · 22.4s · 6569 tokens
run id: market_sizing_20260606_031816_market-sizing__claude-opus-4-8

Query

Size the US addressable patient population and rough peak revenue for a DLL3 T-cell engager (tarlatamab-like) in second-line extensive-stage small cell lung cancer.

Trajectory

Agent · turn 1

I'll size this DLL3 T-cell engager opportunity in 2L ES-SCLC. Let me start by understanding the dataset and pulling the relevant data.

⚙ calls list_indications()
↳ list_indications result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indications": {
    "NSCLC": {
      "label": "Non-small cell lung cancer (EGFR-mutant subset of interest)",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "HER2+ Breast": {
      "label": "HER2-positive breast cancer",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "ES-SCLC": {
      "label": "Extensive-stage small cell lung cancer",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "Ulcerative Colitis": {
      "label": "Ulcerative colitis (moderate-to-severe)",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "prevalence"
    },
    "ATTR-CM": {
      "label": "Transthyretin amyloid cardiomyopathy",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "prevalence"
    }
  }
}
Agent · turn 2

ES-SCLC is available and uses an incidence-based funnel (appropriate for an aggressive cancer). Let me pull the epidemiology, subtype/line fractions, and pricing.

⚙ calls get_epidemiology(indication=ES-SCLC, geography=US)
⚙ calls get_subtype_prevalence(indication=ES-SCLC)
⚙ calls get_pricing(indication=ES-SCLC)
↳ get_epidemiology result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "ES-SCLC",
  "label": "Extensive-stage small cell lung cancer",
  "funnel_basis": "incidence",
  "diagnosis_rate": 0.95,
  "treatment_rate": 0.85,
  "epidemiology": {
    "US": {
      "incidence_per_year": 33000,
      "prevalence": 38000
    }
  },
  "note": "funnel_basis tells you whether to start the funnel from incidence (new patients/year) or prevalence (point count of living patients)."
}
↳ get_subtype_prevalence result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "ES-SCLC",
  "subtypes": {
    "extensive_stage_fraction": 0.7,
    "DLL3_expression_fraction": 0.85,
    "fit_for_2L_fraction": 0.55
  },
  "line_splits": {
    "1L": 0.95,
    "2L": 0.5,
    "3L+": 0.2
  },
  "notes": "SCLC is ~13% of lung cancer; counts here are SCLC totals. ~70% present extensive-stage. DLL3 is expressed in ~85% of SCLC. A DLL3 T-cell engager (e.g. tarlatamab) is used 2L+; gate ES-SCLC on DLL3 expression AND being fit for second-line therapy. Short treated duration reflects rapid progression and high mortality."
}
↳ get_pricing result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "ES-SCLC",
  "annual_price": 220000,
  "treated_months": 8,
  "persistence": 0.55,
  "note": "annual_price is the list price for a full year. persistence is the fraction of a year actually paid for (discontinuation/mortality). Effective revenue per addressable patient/year = annual_price x persistence."
}
Agent · turn 3

Now I have everything. Let me build the funnel from incidence.

Patient Funnel — US, 2L ES-SCLC, DLL3 TCE

The dataset gives SCLC total incidence; I gate down to the eligible 2L DLL3+ pool.

| Step | Fraction | Patients/yr |
| --- | --- | --- |
| SCLC incidence (US, new/yr) | | 33,000 |
| × Diagnosed | 0.95 | 31,350 |
| × Treated | 0.85 | 26,648 |
| × Extensive-stage | 0.70 | 18,653 |
| × DLL3-expressing | 0.85 | 15,855 |
| × Reach/fit for 2L therapy | 0.55 | 8,720 |
| × 2L line split | 0.50 | ~4,360 |

US addressable 2L ES-SCLC DLL3+ population ≈ 4,300–4,400 new patients/year.

A note on the two "2L-ish" gates: fit_for_2L_fraction (0.55) captures whether a patient is well enough to receive any second-line therapy, while the 2L line split (0.50) captures the share actually reaching/electing 2L treatment. I applied both since they describe distinct attrition (clinical fitness vs. line progression). If you view them as overlapping, the pool is larger — see sensitivity below.

Rough Peak Revenue

Effective revenue per addressable patient/year = price × persistence:
- $220,000 × 0.55 = $121,000 per patient-year

(The 8-month treated duration is already baked into the 0.55 persistence factor — ~8/12 ≈ 0.67, further haircut for early discontinuation/mortality, so I don't double-count it.)

Peak revenue, fully-captured addressable pool:
- 4,360 patients × $121,000 ≈ $528M

This represents 100% of the addressable pool — i.e., the theoretical ceiling, not a realistic share. Applying a more realistic peak share for a branded 2L agent (with competition):

| Peak market share | Peak US revenue |
| --- | --- |
| 100% (full addressable ceiling) | ~$530M |
| 60% | ~$315M |
| 40% | ~$210M |

Sensitivity on the funnel

Key caveats

Want me to run the EU5 geography, or model a 1L expansion scenario for upside?

Reference answer key (deterministic) — open after grading

Deterministic funnel for ES-SCLC in US built from the curated tool data (basis = incidence). Eligibility gating is the product of the subtype fractions that apply to this drug, then the line-of-therapy gate. The peak-revenue sketch uses an illustrative 30% peak share; a strong agent may choose a different, well-justified share. Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.

| Funnel step | Patients / value | Calculation |
| --- | --- | --- |
| 1. Incidence (new/yr) | 33,000 | US ES-SCLC; basis = incidence |
| 2. Diagnosed | 31,350 | x 95% diagnosis rate |
| 3. Drug-treated | 26,648 | x 85% treatment rate |
| 4. Extensive-stage | 18,653 | x 70% |
| 5. DLL3-expressing | 15,855 | x 85% |
| 6. Fit for 2L therapy | 8,720 | x 55% |
| 7. Line gate: 2L+ | 8,720 | x 100% (=1.0 if already encoded above) |
| = Addressable patients | 8,720 | eligibility = 33% of treated, x line |
| Peak capture @ 30% share | 2,616 | illustrative peak share assumption |
| Eff. revenue / patient-yr | $121,000 | $220,000 list x 55% persistence |
| ~ Peak revenue (USD/yr) | $316,550,316 | = captured patients x eff. revenue/pt |
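The answer key can be reproduced with a minimal Python sketch, assuming the tool outputs shown in the trajectory are the only inputs. The helper `run_funnel` and the constant names are hypothetical and exist only for this illustration. The sketch also suggests why the key's revenue figure is not a round number: it appears to be computed from unrounded patient counts, whereas multiplying the displayed, rounded counts gives a slightly lower value.

```python
# Inputs copied from the tool outputs (teaching snapshot, not live data).
STEPS = [
    ("Diagnosed", 0.95),
    ("Drug-treated", 0.85),
    ("Extensive-stage", 0.70),
    ("DLL3-expressing", 0.85),
    ("Fit for 2L therapy", 0.55),
    ("Line gate: 2L+", 1.00),  # already encoded by the fit-for-2L gate
]
INCIDENCE = 33_000
ANNUAL_PRICE = 220_000
PERSISTENCE = 0.55
PEAK_SHARE = 0.30  # illustrative share used by the answer key


def run_funnel(start, steps):
    """Yield (label, unrounded count) after applying each funnel fraction."""
    count = float(start)
    for label, fraction in steps:
        count *= fraction
        yield label, count


rows = list(run_funnel(INCIDENCE, STEPS))
for label, count in rows:
    # Displayed counts can differ by one at exact .5 boundaries (float rounding).
    print(f"{label:<22}{count:>10,.0f}")

addressable = rows[-1][1]
eff_revenue = ANNUAL_PRICE * PERSISTENCE  # effective revenue per patient-year

# Revenue from unrounded counts (matches the key) vs. from displayed counts.
revenue_unrounded = addressable * PEAK_SHARE * eff_revenue
revenue_rounded = round(round(addressable) * PEAK_SHARE) * eff_revenue
print(f"unrounded: ${revenue_unrounded:,.0f}")  # $316,550,316
print(f"rounded:   ${revenue_rounded:,.0f}")    # $316,536,000
```

The gap between the two revenue figures is about $14,000, which is immaterial for a rough peak-revenue sketch but worth knowing when checking an agent's arithmetic against the key.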

Expert grading

Epidemiology sourcing & funnel construction (1 = poor, 5 = excellent)
5: Pulls the right epidemiology via the tools, correctly chooses incidence vs prevalence as the funnel basis for the disease (incidence for acute/short-survival oncology, prevalence for chronic disease), and lays out a clean population -> diagnosed -> treated -> eligible -> addressable chain for the queried geography.
1: Builds the funnel from the wrong base (e.g. prevalence for an incidence-driven cancer, or vice versa), skips diagnosis/treatment steps, ignores the requested geography, or reasons from memory instead of the tool data.
Eligibility / biomarker gating correctness (1 = poor, 5 = excellent)
5: Applies the correct biomarker / subtype / stage gates for THIS drug and only those — e.g. advanced-stage AND activating EGFR mutation for a 1L EGFR TKI, DLL3-expressing AND fit-for-2L for a DLL3 engager, HER2+ AND metastatic for a 2L ADC — and applies the line-of-therapy gate without double-counting.
1: Omits a required gate (e.g. forgets the EGFR-mutant or DLL3 filter), applies an irrelevant or wrong-direction gate, double-counts a line split already implied by another gate, or multiplies fractions that should not stack.
Numerical correctness & internal consistency (1 = poor, 5 = excellent)
5: Every multiplication checks out against the returned tool numbers; intermediate counts are consistent and monotonically shrinking down the funnel; the final addressable number is reproducible from the stated inputs.
1: Arithmetic errors, mismatched units, numbers that don't follow from the cited fractions, or a funnel step larger than the one above it.
Assumptions & peak-share / revenue reasoning (1 = poor, 5 = excellent)
5: States and justifies the peak-share and pricing assumptions, applies persistence / treated duration sensibly, and produces a revenue sketch (addressable x share x effective price) whose magnitude is defensible; flags the key sensitivities.
1: Pulls a peak share or price out of thin air with no rationale, ignores persistence/duration, garbles the revenue formula, or presents a point estimate with no acknowledgement of uncertainty.
Evidence faithfulness (1 = poor, 5 = excellent)
5: Every number traces to a specific tool output (epidemiology counts, subtype fractions, price); no fabricated rates or invented epidemiology; the curated/teaching nature of the data is respected and not overstated as live truth.
1: Invents incidence/prevalence or subtype fractions not returned by the tools, contradicts the tool data, or presents the snapshot numbers as authoritative real-world figures.
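The numerical-consistency criterion above (each multiplication checks out, counts shrink monotonically) lends itself to a mechanical check. The following Python sketch is one possible way to do it, not part of the grading tooling; `check_funnel`, its `tolerance` parameter, and the `agent_table` literal are hypothetical names, with the table values transcribed from the agent's funnel in the trajectory.

```python
def check_funnel(start, steps, tolerance=1.0):
    """Return a list of problems found in a reported funnel.

    steps is a list of (label, fraction, reported_count) tuples. Each
    reported count is compared against the unrounded running product,
    and against the previous reported count to detect growth.
    """
    problems = []
    previous_reported = start
    expected = float(start)
    for label, fraction, reported in steps:
        expected *= fraction
        if abs(reported - expected) > tolerance:
            problems.append(f"{label}: reported {reported}, expected ~{expected:.0f}")
        if reported > previous_reported:
            problems.append(f"{label}: count grows from {previous_reported} to {reported}")
        previous_reported = reported
    return problems


# The agent's table, including the stacked 2L line split.
agent_table = [
    ("Diagnosed", 0.95, 31_350),
    ("Treated", 0.85, 26_648),
    ("Extensive-stage", 0.70, 18_653),
    ("DLL3-expressing", 0.85, 15_855),
    ("Fit for 2L therapy", 0.55, 8_720),
    ("2L line split", 0.50, 4_360),
]
print(check_funnel(33_000, agent_table))  # []
```

The empty result shows that the agent's table is arithmetically consistent even though it stacks a gate that should not be applied. A check like this covers only the numerical criterion; whether each gate belongs in the funnel is a separate judgment under the eligibility-gating criterion.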