Epidemiology-Based Market Sizing

Commercial / Forecasting
Commercial forecasting analyst

Given a query naming a disease, a drug or mechanism, and a geography, build the addressable-patient funnel: epidemiology (incidence or prevalence) -> diagnosed -> treated -> eligible (biomarker / subtype / line-of-therapy gating) -> addressable patients, then an optional peak-share and rough peak-revenue sketch. The agent has only read-only epidemiology / subtype / pricing tools that return raw numbers, and must do all gating, multiplication, and assumption-setting itself.
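The funnel chain described above can be sketched as a sequence of multiplications. This is a minimal illustration, not the task's reference implementation: the `Gate` and `build_funnel` names are hypothetical, and the base count and fractions are placeholders chosen for round arithmetic, not values returned by the epidemiology tools.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    name: str        # e.g. "diagnosed", "treated", "eligible"
    fraction: float  # share of the previous step that passes this gate


def build_funnel(base_patients, gates):
    """Multiply the epidemiology base through each gate in order.

    Returns a list of (step_name, patient_count) rows, base first.
    """
    rows = [("epidemiology base", float(base_patients))]
    count = float(base_patients)
    for gate in gates:
        if not 0.0 <= gate.fraction <= 1.0:
            raise ValueError(f"{gate.name}: fraction must be in [0, 1]")
        count *= gate.fraction
        rows.append((gate.name, count))
    return rows


# Placeholder inputs for illustration only; they are NOT real epidemiology.
example_funnel = build_funnel(
    100_000,
    [Gate("diagnosed", 0.9), Gate("treated", 0.8), Gate("eligible", 0.25)],
)
addressable = example_funnel[-1][1]

for name, count in example_funnel:
    print(f"{name:>18}: {count:,.0f}")
```

In a real run, the base count would come from the incidence or prevalence figure returned for the queried geography, and each fraction would trace to a specific tool output or a stated assumption. Keeping every gate as a named step makes the final addressable number reproducible from its inputs.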

Why this is fundable

Scarce expert who grades this: Commercial forecasting analyst / epidemiologist (~$150–300/hr loaded); senior diligence-grade forecasters command more
What one decision is worth: Drives go/no-go and deal valuation: peak-sales forecasts anchor $100M–$5B+ licensing and M&A prices. A funnel off by a biomarker fraction moves an NPV by hundreds of millions.
Real-world data sources: SEER / GLOBOCAN incidence & prevalence, biomarker-prevalence literature, IQVIA-style treatment rates, list pricing. Curated snapshot here; each input maps to a citable real source.

Agent tools

list_indications, get_epidemiology, get_subtype_prevalence, get_pricing

Expert grading rubric

Dimension	5 (excellent)	1 (poor)
Epidemiology sourcing & funnel construction	Pulls the right epidemiology via the tools, correctly chooses incidence vs prevalence as the funnel basis for the disease (incidence for acute/short-survival oncology, prevalence for chronic disease), and lays out a clean population -> diagnosed -> treated -> eligible -> addressable chain for the queried geography.	Builds the funnel from the wrong base (e.g. prevalence for an incidence-driven cancer, or vice versa), skips diagnosis/treatment steps, ignores the requested geography, or reasons from memory instead of the tool data.
Eligibility / biomarker gating correctness	Applies the correct biomarker / subtype / stage gates for THIS drug and only those — e.g. advanced-stage AND activating EGFR mutation for a 1L EGFR TKI, DLL3-expressing AND fit-for-2L for a DLL3 engager, HER2+ AND metastatic for a 2L ADC — and applies the line-of-therapy gate without double-counting.	Omits a required gate (e.g. forgets the EGFR-mutant or DLL3 filter), applies an irrelevant or wrong-direction gate, double-counts a line split already implied by another gate, or multiplies fractions that should not stack.
Numerical correctness & internal consistency	Every multiplication checks out against the returned tool numbers; intermediate counts are consistent and monotonically shrinking down the funnel; the final addressable number is reproducible from the stated inputs.	Arithmetic errors, mismatched units, numbers that don't follow from the cited fractions, or a funnel step larger than the one above it.
Assumptions & peak-share / revenue reasoning	States and justifies the peak-share and pricing assumptions, applies persistence / treated duration sensibly, and produces a revenue sketch (addressable x share x effective price) whose magnitude is defensible; flags the key sensitivities.	Pulls a peak share or price out of thin air with no rationale, ignores persistence/duration, garbles the revenue formula, or presents a point estimate with no acknowledgement of uncertainty.
Evidence faithfulness	Every number traces to a specific tool output (epidemiology counts, subtype fractions, price); no fabricated rates or invented epidemiology; the curated/teaching nature of the data is respected and not overstated as live truth.	Invents incidence/prevalence or subtype fractions not returned by the tools, contradicts the tool data, or presents the snapshot numbers as authoritative real-world figures.
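Two of the rubric checks lend themselves to a short sketch: the requirement that counts shrink monotonically down the funnel, and the revenue formula (addressable x share x effective price). The code below is a hedged illustration. The function names are hypothetical, `duration_factor` is one assumed way to represent persistence / treated duration, and every number is a placeholder rather than tool output or a real price.

```python
def funnel_violations(counts):
    """Return indices of steps that are larger than the step above them."""
    return [i for i in range(1, len(counts)) if counts[i] > counts[i - 1]]


def peak_revenue(addressable, peak_share, effective_price, duration_factor=1.0):
    """Revenue sketch: addressable x share x effective price.

    duration_factor is an assumed adjustment for persistence / treated
    duration (1.0 means one full priced period per treated patient).
    """
    if not 0.0 <= peak_share <= 1.0:
        raise ValueError("peak_share must be in [0, 1]")
    if duration_factor < 0.0:
        raise ValueError("duration_factor must be non-negative")
    return addressable * peak_share * effective_price * duration_factor


# Placeholder funnel counts (base -> diagnosed -> treated -> eligible).
counts = [100_000, 90_000, 72_000, 18_000]
violations = funnel_violations(counts)

# Simple sensitivity on the peak-share assumption; shares are illustrative.
scenarios = {"low": 0.10, "base": 0.20, "high": 0.30}
sketch = {
    label: peak_revenue(counts[-1], share, 100_000, duration_factor=0.5)
    for label, share in scenarios.items()
}

print("violations:", violations)
for label, revenue in sketch.items():
    print(f"{label:>4}: {revenue:,.0f}")
```

Reporting a low / base / high range instead of a single point estimate is one simple way to flag the key sensitivities the rubric asks for. The same pattern extends to price and duration.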

Example queries

Size the US addressable patient population and rough peak revenue for a DLL3 T-cell engager (tarlatamab-like) in second-line extensive-stage small cell lung cancer.
Build the EU5 addressable-patient funnel for a first-line EGFR TKI in EGFR-mutant NSCLC, and give a rough peak-revenue sketch.
Size the US addressable population and a peak-revenue estimate for a HER2-directed ADC (trastuzumab deruxtecan-like) in second-line metastatic HER2-positive breast cancer.
Estimate the US addressable patient funnel and rough peak revenue for an oral advanced therapy in moderate-to-severe ulcerative colitis.

Trajectories

Model panel (side-by-side comparison)

Model	Provider	Tier	Judge 1–5	Verdict
Claude Opus 4.8	anthropic	frontier	3.8	flawed
GPT (frontier)	openai	frontier	3.6	acceptable
Claude Haiku 4.5	anthropic	small	2.6	—
GPT-4o mini	openai	small	2.4	flawed

batch 20260606_021624

Query	Model	Tool calls	Time	Status
Size the US addressable patient population and rough peak revenue for a DLL3 T-cell enga	claude-haiku-4-5-20251001	3	7.4s	ok
Build the EU5 addressable-patient funnel for a first-line EGFR TKI in EGFR-mutant NSCLC,	claude-haiku-4-5-20251001	4	7.3s	ok

batch 20260606_020653

Query	Model	Tool calls	Time	Status
Size the US addressable patient population and rough peak revenue for a DLL3 T-cell enga	claude-opus-4-8	4	20.3s	ok
Build the EU5 addressable-patient funnel for a first-line EGFR TKI in EGFR-mutant NSCLC,	claude-opus-4-8	4	19.5s	ok
Size the US addressable population and a peak-revenue estimate for a HER2-directed ADC (	claude-opus-4-8	4	24.8s	ok
Estimate the US addressable patient funnel and rough peak revenue for an oral advanced t	claude-opus-4-8	4	20.6s	ok