Commercial / Forecasting · Commercial forecasting analyst

Epidemiology-Based Market Sizing — agent trajectory for expert review

claude-opus-4-8 · 2026-06-06T09:07:58.147303+00:00 · 4 tool calls · 24.8s · 6240 tokens
run id: market_sizing_20260606_020653_03_size-the-us-addressable-population-and-a-peak-re

Query

Size the US addressable population and a peak-revenue estimate for a HER2-directed ADC (trastuzumab deruxtecan-like) in second-line metastatic HER2-positive breast cancer.

Trajectory

Agent · turn 1

I'll size this opportunity step by step. Let me start by checking the dataset structure and pulling the relevant data.

⚙ calls list_indications()
↳ list_indications result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indications": {
    "NSCLC": {
      "label": "Non-small cell lung cancer (EGFR-mutant subset of interest)",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "HER2+ Breast": {
      "label": "HER2-positive breast cancer",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "ES-SCLC": {
      "label": "Extensive-stage small cell lung cancer",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "incidence"
    },
    "Ulcerative Colitis": {
      "label": "Ulcerative colitis (moderate-to-severe)",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "prevalence"
    },
    "ATTR-CM": {
      "label": "Transthyretin amyloid cardiomyopathy",
      "geographies": [
        "US",
        "EU5"
      ],
      "funnel_basis": "prevalence"
    }
  }
}
Agent · turn 2

The indication is "HER2+ Breast", built from incidence. Let me pull all the relevant data.

⚙ calls get_epidemiology(indication=HER2+ Breast, geography=US)
⚙ calls get_subtype_prevalence(indication=HER2+ Breast)
⚙ calls get_pricing(indication=HER2+ Breast)
↳ get_epidemiology result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "HER2+ Breast",
  "label": "HER2-positive breast cancer",
  "funnel_basis": "incidence",
  "diagnosis_rate": 0.95,
  "treatment_rate": 0.9,
  "epidemiology": {
    "US": {
      "incidence_per_year": 300000,
      "prevalence": 4000000
    }
  },
  "note": "funnel_basis tells you whether to start the funnel from incidence (new patients/year) or prevalence (point count of living patients)."
}
↳ get_subtype_prevalence result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "HER2+ Breast",
  "subtypes": {
    "HER2_positive_fraction": 0.15,
    "metastatic_fraction": 0.3
  },
  "line_splits": {
    "1L": 0.55,
    "2L": 0.3,
    "3L+": 0.15
  },
  "notes": "HER2+ ~15% of breast cancer. A 2L+ metastatic ADC (e.g. T-DXd) gates on HER2-positivity AND metastatic disease. For a 2L slice, apply the 2L line-split to the metastatic HER2+ population."
}
↳ get_pricing result
{
  "disclaimer": "Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.",
  "indication": "HER2+ Breast",
  "annual_price": 160000,
  "treated_months": 16,
  "persistence": 0.8,
  "note": "annual_price is the list price for a full year. persistence is the fraction of a year actually paid for (discontinuation/mortality). Effective revenue per addressable patient/year = annual_price x persistence."
}
Agent · turn 3

Now I have everything I need. Let me build the funnel.

Addressable Population Funnel (US, incidence-based)

| Step | Factor | Patients |
|---|---|---|
| New breast cancer cases/year | 300,000 | 300,000 |
| × Diagnosed | 0.95 | 285,000 |
| × Treated | 0.90 | 256,500 |
| × HER2-positive | 0.15 | 38,475 |
| × Metastatic | 0.30 | 11,543 |
| × 2L line split | 0.30 | 3,463 |

US 2L addressable population ≈ 3,500 patients/year (incident).
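
For reviewers who want to re-derive the funnel, the multiplication chain above can be sketched in a few lines of Python. The factors are copied from the `get_epidemiology` and `get_subtype_prevalence` results in this trajectory; the names `build_funnel`, `FUNNEL_STEPS`, and `INCIDENCE_US` are illustrative assumptions and not part of the tool API. Exact fractions are used so that no rounding creeps in between steps.

```python
from fractions import Fraction

# Factors copied from the tool outputs shown in this trajectory.
# All names here are hypothetical; this is a sketch, not the tools' own logic.
INCIDENCE_US = 300_000
FUNNEL_STEPS = [
    ("Diagnosed", Fraction(95, 100)),
    ("Treated", Fraction(90, 100)),
    ("HER2-positive", Fraction(15, 100)),
    ("Metastatic", Fraction(30, 100)),
    ("2L line split", Fraction(30, 100)),
]


def build_funnel(start, steps):
    """Return (label, factor, patients) rows, carrying unrounded counts between steps."""
    rows = [("New breast cancer cases/year", None, Fraction(start))]
    for label, factor in steps:
        rows.append((label, factor, rows[-1][2] * factor))
    return rows


funnel = build_funnel(INCIDENCE_US, FUNNEL_STEPS)
for label, factor, patients in funnel:
    shown = "" if factor is None else f"x {float(factor):.2f}"
    print(f"{label:<30}{shown:<10}{float(patients):>12,.2f}")

# Unrounded 2L addressable population (3462.75 patients/year).
addressable = funnel[-1][2]
```

The metastatic step comes out at exactly 11,542.5 patients, so whether it is displayed as 11,542 or 11,543 depends only on the rounding rule applied for display.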

Peak Revenue Estimate

Using the pricing inputs:
- Effective revenue per patient/year = $160,000 × 0.80 persistence = $128,000

Peak revenue ≈ 3,463 patients × $128,000 ≈ $443M

Refinement note on duration (important)

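The funnel above counts ~3,500 newly eligible patients each year, but the mean treated duration is 16 months (1.33 years), which is longer than a year. So at steady state, the prevalent on-therapy pool is larger than a single year's incident cohort:

On-therapy pool ≈ 3,463 patients/year × (16 / 12) years ≈ 4,617 patients
Peak revenue ≈ 4,617 patients × $128,000 ≈ $591M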

(Note: with a duration adjustment, persistence and treated-months partly overlap conceptually — the persistence factor already captures the fraction-of-year paid. A clean way to present this is the two bookends below.)

Summary

| Estimate | Addressable | Peak revenue |
|---|---|---|
| Incident-cohort basis (no duration uplift) | ~3,500/yr | ~$440M |
| Steady-state on-therapy basis (16-mo duration uplift) | ~4,600 | ~$590M |

Bottom line: A T-DXd-like HER2 ADC in 2L metastatic HER2+ breast cancer addresses roughly 3,500 US patients/year, supporting a peak US revenue of roughly $0.4–0.6B, depending on whether you credit treatment duration beyond one year.
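
The two revenue bookends can be reproduced with the sketch below. It rests on two assumptions that reviewers should weigh: the steady-state pool is approximated as annual incident patients times mean treated duration in years, and the persistence factor is still applied on top of that pool even though, as the trajectory itself notes, persistence and treated months partly overlap. Neither bookend applies a market-share assumption. Variable names are illustrative.

```python
# Inputs from the get_pricing result and the rounded funnel output above.
ANNUAL_PRICE = 160_000        # list price per full year (USD)
PERSISTENCE = 0.80            # fraction of a year actually paid for
TREATED_MONTHS = 16           # mean treated duration
ADDRESSABLE_INCIDENT = 3_463  # rounded 2L addressable patients per year

# Effective revenue per addressable patient-year (about $128,000).
effective_price = ANNUAL_PRICE * PERSISTENCE

# Bookend 1: each year's incident cohort contributes one persistence-adjusted year.
incident_revenue = ADDRESSABLE_INCIDENT * effective_price

# Bookend 2 (assumption): steady-state pool = incident patients x duration in years.
on_therapy_pool = ADDRESSABLE_INCIDENT * TREATED_MONTHS / 12
steady_state_revenue = on_therapy_pool * effective_price

print(f"Incident basis:     {ADDRESSABLE_INCIDENT:,} patients, ${incident_revenue / 1e6:,.0f}M")
print(f"Steady-state basis: {on_therapy_pool:,.0f} patients, ${steady_state_revenue / 1e6:,.0f}M")
```

Under these assumptions the two bookends land at roughly $443M and $591M, in line with the rounded ~$440M and ~$590M in the summary table.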

Key caveats

Want me to extend this to EU5, layer in 1L/HER2-low expansion, or apply a market-share assumption?

Reference answer key (deterministic) — open after grading

Deterministic funnel for HER2+ Breast in US built from the curated tool data (basis = incidence). Eligibility gating is the product of the subtype fractions that apply to this drug, then the line-of-therapy gate. The peak-revenue sketch uses an illustrative 30% peak share; a strong agent may choose a different, well-justified share. Teaching snapshot: curated, rounded epidemiology for funnel-building practice. Not a live data pull; do not cite as ground truth.

| Funnel step | Patients / value | Calculation |
|---|---|---|
| 1. Incidence (new/yr) | 300,000 | US HER2+ Breast; basis = incidence |
| 2. Diagnosed | 285,000 | x 95% diagnosis rate |
| 3. Drug-treated | 256,500 | x 90% treatment rate |
| 4. HER2-positive | 38,475 | x 15% |
| 5. Metastatic | 11,542 | x 30% |
| 6. Line gate: 2L (metastatic ADC) | 3,463 | x 30% (=1.0 if already encoded above) |
| = Addressable patients | 3,463 | eligibility = 4.5% of treated, x line |
| Peak capture @ 30% share | 1,039 | illustrative peak share assumption |
| Eff. revenue / patient-yr | $128,000 | $160,000 list x 80% persistence |
| ~ Peak revenue (USD/yr) | $132,969,600 | = captured patients x eff. revenue/pt |
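
A short sketch shows how the key's peak-revenue figure follows from the inputs. The 30% peak share is the illustrative assumption stated in the key; variable names are hypothetical. The point of the example is that the key appears to carry unrounded patient counts through to the end: multiplying the rounded 1,039 captured patients by $128,000 would give $132,992,000, not $132,969,600.

```python
from fractions import Fraction

# Exact rational arithmetic so the result can be compared to the key to the dollar.
incidence = 300_000
rates = [
    Fraction(95, 100),  # diagnosis rate
    Fraction(90, 100),  # treatment rate
    Fraction(15, 100),  # HER2-positive fraction
    Fraction(30, 100),  # metastatic fraction
    Fraction(30, 100),  # 2L line split
]
PEAK_SHARE = Fraction(30, 100)                   # illustrative share from the key
EFFECTIVE_PRICE = 160_000 * Fraction(80, 100)    # list price x persistence

addressable_exact = Fraction(incidence)
for rate in rates:
    addressable_exact *= rate

captured = addressable_exact * PEAK_SHARE
peak_revenue_usd = captured * EFFECTIVE_PRICE

print(f"Addressable:  {float(addressable_exact):,.2f}")
print(f"Captured:     {float(captured):,.3f}")
print(f"Peak revenue: ${int(peak_revenue_usd):,}")
```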

Expert grading

Epidemiology sourcing & funnel construction (scale: 1 = poor, 5 = excellent)
5: Pulls the right epidemiology via the tools, correctly chooses incidence vs prevalence as the funnel basis for the disease (incidence for acute/short-survival oncology, prevalence for chronic disease), and lays out a clean population -> diagnosed -> treated -> eligible -> addressable chain for the queried geography.
1: Builds the funnel from the wrong base (e.g. prevalence for an incidence-driven cancer, or vice versa), skips diagnosis/treatment steps, ignores the requested geography, or reasons from memory instead of the tool data.
Eligibility / biomarker gating correctness (scale: 1 = poor, 5 = excellent)
5: Applies the correct biomarker / subtype / stage gates for THIS drug and only those — e.g. advanced-stage AND activating EGFR mutation for a 1L EGFR TKI, DLL3-expressing AND fit-for-2L for a DLL3 engager, HER2+ AND metastatic for a 2L ADC — and applies the line-of-therapy gate without double-counting.
1: Omits a required gate (e.g. forgets the EGFR-mutant or DLL3 filter), applies an irrelevant or wrong-direction gate, double-counts a line split already implied by another gate, or multiplies fractions that should not stack.
Numerical correctness & internal consistency (scale: 1 = poor, 5 = excellent)
5: Every multiplication checks out against the returned tool numbers; intermediate counts are consistent and monotonically shrinking down the funnel; the final addressable number is reproducible from the stated inputs.
1: Arithmetic errors, mismatched units, numbers that don't follow from the cited fractions, or a funnel step larger than the one above it.
Assumptions & peak-share / revenue reasoning (scale: 1 = poor, 5 = excellent)
5: States and justifies the peak-share and pricing assumptions, applies persistence / treated duration sensibly, and produces a revenue sketch (addressable x share x effective price) whose magnitude is defensible; flags the key sensitivities.
1: Pulls a peak share or price out of thin air with no rationale, ignores persistence/duration, garbles the revenue formula, or presents a point estimate with no acknowledgement of uncertainty.
Evidence faithfulness (scale: 1 = poor, 5 = excellent)
5: Every number traces to a specific tool output (epidemiology counts, subtype fractions, price); no fabricated rates or invented epidemiology; the curated/teaching nature of the data is respected and not overstated as live truth.
1: Invents incidence/prevalence or subtype fractions not returned by the tools, contradicts the tool data, or presents the snapshot numbers as authoritative real-world figures.
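
As an optional aid for the numerical-consistency criterion, the sketch below checks that each reported funnel step follows from the previous step and its factor, and that counts never grow down the funnel. The function name `check_funnel` and the one-patient rounding tolerance are assumptions made for illustration; they are not part of the rubric.

```python
def check_funnel(rows, tolerance=1.0):
    """Check a reported funnel for internal consistency.

    rows: list of (label, factor, reported_count); the first row's factor is None.
    Returns a list of problem descriptions; an empty list means no issue was found.
    """
    problems = []
    for (prev_label, _, prev_count), (label, factor, count) in zip(rows, rows[1:]):
        # A funnel step should never be larger than the one above it.
        if count > prev_count:
            problems.append(f"{label}: {count} exceeds {prev_label} ({prev_count})")
        # Each step should equal the previous count times its factor, within rounding.
        expected = prev_count * factor
        if abs(expected - count) > tolerance:
            problems.append(f"{label}: reported {count}, expected about {expected:.1f}")
    return problems


# Counts as reported in the agent's funnel table.
reported = [
    ("New breast cancer cases/year", None, 300_000),
    ("Diagnosed", 0.95, 285_000),
    ("Treated", 0.90, 256_500),
    ("HER2-positive", 0.15, 38_475),
    ("Metastatic", 0.30, 11_543),
    ("2L line split", 0.30, 3_463),
]

issues = check_funnel(reported)
print(issues if issues else "Funnel is internally consistent within rounding.")
```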