Frontier labs and bio-AI companies are bottlenecked on expert data for high-value drug-development work. We turn real biopharma workflows into reusable RL environments, run frontier agents through them, and have domain experts grade the reasoning — producing the evals and labeled trajectories that make agents trustworthy on $100M decisions. Mercor for biopharma, productized as environments.
Each environment is an appreciating asset: it generates unlimited trajectories, and every expert grade compounds into a proprietary dataset and reward model.
5 of 35 catalogued environments are live across the value chain. Each maps to a scarce expert and the dollar value of the decision it informs.
Given an indication and optionally a target or modality, survey the interventional-trial landscape from an AACT-shaped database and produce a competitive read: who is running what, how designs differ (phase, line, control arm, endpoint), which sponsors lead, and where the modality × line-of-therapy whitespace is. The agent has read-only trial-query tools and must pull the data and do the reasoning itself.
4 trajectories · expert: Oncology competitive-intelligence analyst / MD or PharmD with trial-design fluency
Given a query naming a disease, a drug or mechanism, and a geography, build the addressable-patient funnel: epidemiology (incidence or prevalence) -> diagnosed -> treated -> eligible (biomarker / subtype / line-of-therapy gating) -> addressable patients, then an optional peak-share and rough peak-revenue sketch. The agent has only read-only epidemiology / subtype / pricing tools that return raw numbers, and must do all gating, multiplication, and assumption-setting itself.
6 trajectories · expert: Commercial forecasting analyst / epidemiologist
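The funnel arithmetic in the environment above can be sketched as a chain of multiplicative gates. This is a minimal illustration, not the environment's grader: the `FunnelStep` and `build_funnel` names, the gate rates, the peak share, and the price are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FunnelStep:
    name: str
    rate: float  # fraction of the previous stage that passes this gate


def build_funnel(base_population: float, steps: List[FunnelStep]) -> List[Tuple[str, float]]:
    """Apply each gate in order and return (stage, patient count) pairs."""
    stages = [("epidemiology", base_population)]
    patients = base_population
    for step in steps:
        if not 0.0 <= step.rate <= 1.0:
            raise ValueError(f"rate for {step.name!r} must be in [0, 1]")
        patients *= step.rate
        stages.append((step.name, patients))
    return stages


# Hypothetical inputs, for illustration only (not real epidemiology).
steps = [
    FunnelStep("diagnosed", 0.80),
    FunnelStep("treated", 0.75),
    FunnelStep("eligible", 0.25),  # biomarker / subtype / line-of-therapy gating
]
funnel = build_funnel(100_000, steps)
addressable = funnel[-1][1]

peak_share = 0.20        # assumed peak market share
annual_price = 50_000    # assumed net annual price per patient
peak_revenue = addressable * peak_share * annual_price
```

Keeping every gate as an explicit named rate makes each assumption visible to the grading expert, which is the part of the workflow that carries the judgment.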
Given a query describing a proposed product (a molecule/modality for a target+indication with defining features — payload, linker, format), assess freedom-to-operate against a curated CN/US/WO patent-family corpus: find the relevant/blocking families, analyse claim overlap (composition-of-matter vs. method-of-treatment, genus vs. species), account for legal status and expiry, and render a CLEAR/WATCH/BLOCKED verdict with specific blocking claims and credible design-arounds. The agent has read-only patent-search tools returning raw patent data and must do all FTO reasoning itself. Teaching snapshot — not legal advice.
4 trajectories · expert: Patent attorney / IP analyst with life-sciences prosecution experience
Given an investigational drug (mechanism/class, known/anticipated safety liabilities, indication, and phase), author a clinical-trial inclusion/exclusion criteria list that balances patient safety against enrollability: protect against the drug's specific risks (CRS/ICANS, ILD/LVEF decline, infection/VTE, hyperglycemia/hepatotoxicity) without being so restrictive that the trial cannot enroll. The agent has read-only reference tools (drug safety profiles, precedent eligibility criteria from analogous trials, and standard organ-function lab thresholds) and must pull the data and do the reasoning itself. This is a judgment/authoring workflow graded by rubric; there is no clean answer key.
4 trajectories · expert: Medical monitor / clinical-development physician
Given an individual case safety report (ICSR) narrative (a patient on a study drug who experienced one or more adverse events), extract every reportable event, code each to the correct MedDRA Preferred Term and System Organ Class using the provided dictionary, determine ICH E2A seriousness, and assess WHO-UMC drug causality (temporality, dechallenge/rechallenge, alternative etiology, label). The agent has read-only tools (the case narrative, a MedDRA lookup, and the drug-class label) and must do the reasoning itself. Uses a teaching MedDRA subset, not a licensed distribution.
6 trajectories · expert: Drug-safety physician / pharmacovigilance specialist
Decide go/no-go on whether a gene is a causal, druggable disease target. The agent has read-only tools over the standard validation stack — Open Targets association + evidence datatypes, DepMap (Project Achilles) CRISPR gene-effect with selective-vs-pan-essential, Open Targets/GSK tractability buckets, and human genetic evidence (GWAS/OMIM) + mouse-KO (IMPC) concordance. Tools return raw scores; the agent must integrate genetics, dependency interpretation, and tractability into a GO / CONDITIONAL / NO-GO call with calibrated confidence — the single highest-leverage decision in drug R&D, since most pipeline failures are wrong-target failures.
0 trajectories · expert: Target biologist / computational biologist with target-ID & validation experience
Given a candidate molecule's GLP toxicology NOAELs and, for high-risk biologics, its in vitro pharmacology, compute a safe first-in-human starting dose. The agent uses read-only nonclinical-input tools (species NOAELs, the FDA Km body-surface-area conversion table, and MABEL pharmacology) and must do the allometric scaling and dose judgment itself: convert each NOAEL to a human equivalent dose (HED) via Km scaling, pick the most sensitive or most appropriate species, apply a 10x safety factor to get the maximum recommended starting dose (MRSD), scale to a total dose for a 60 kg adult, and, for immunostimulatory or agonist biologics, compute a MABEL dose and recommend the LOWER of the two.
0 trajectories · expert: Clinical pharmacologist / translational PK-PD scientist
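The dose arithmetic in the environment above can be sketched as follows. The Km factors are the standard values from the FDA starting-dose guidance; the function names, the example NOAELs, and the MABEL figure are hypothetical. The sketch picks the species with the lowest HED as "most sensitive", which is a simplifying assumption, since the text also allows choosing the most appropriate species on scientific grounds.

```python
from typing import Dict, Optional

# Km = body weight (kg) / body surface area (m^2), standard FDA guidance values.
KM = {"mouse": 3, "rat": 6, "monkey": 12, "dog": 20, "human": 37}


def hed_mg_per_kg(noael_mg_per_kg: float, species: str) -> float:
    """Human equivalent dose: animal NOAEL x (animal Km / human Km)."""
    return noael_mg_per_kg * KM[species] / KM["human"]


def starting_dose_mg(
    noaels_mg_per_kg: Dict[str, float],
    safety_factor: float = 10.0,
    body_weight_kg: float = 60.0,
    mabel_dose_mg: Optional[float] = None,
) -> dict:
    """NOAEL -> HED -> MRSD -> total dose; take the lower of NOAEL-based and MABEL."""
    heds = {sp: hed_mg_per_kg(dose, sp) for sp, dose in noaels_mg_per_kg.items()}
    species = min(heds, key=heds.get)  # assumption: lowest HED = most sensitive species
    mrsd_mg_per_kg = heds[species] / safety_factor
    noael_based_mg = mrsd_mg_per_kg * body_weight_kg
    if mabel_dose_mg is None:
        recommended_mg = noael_based_mg
    else:
        recommended_mg = min(noael_based_mg, mabel_dose_mg)
    return {
        "species": species,
        "mrsd_mg_per_kg": mrsd_mg_per_kg,
        "noael_based_mg": noael_based_mg,
        "recommended_mg": recommended_mg,
    }


# Hypothetical NOAELs (mg/kg), for illustration only.
small_molecule = starting_dose_mg({"rat": 50.0, "dog": 10.0})
agonist_biologic = starting_dose_mg({"rat": 50.0, "dog": 10.0}, mabel_dose_mg=5.0)
```

With these inputs the dog gives the lower HED, so it drives the NOAEL-based dose; for the agonist biologic the assumed MABEL dose is lower still and becomes the recommendation.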
Given an asset's indication, seriousness, clinical evidence, endpoint type, modality, and prevalence, choose the optimal FDA expedited / special-designation strategy — which of Fast Track, Breakthrough Therapy, Accelerated Approval, Priority Review, RMAT, Orphan Drug, and Rare Pediatric Disease the asset qualifies for, and how to sequence them into one integrated pathway. The agent has read-only tools for the program rules, the scenario facts, per-program criteria, and real precedents, and must apply the criteria itself. Designations are not mutually exclusive.
0 trajectories · expert: Regulatory-affairs lead / ex-FDA reviewer with expedited-program experience
Given a registered PICO question, screen a pool of PubMed/Embase-shaped study abstracts per explicit inclusion/exclusion criteria (PRISMA discipline: RCT-only, right population/intervention/comparator, outcome reported), then pool the reported effect sizes of the included studies into a single meta-analytic estimate. Pooling is inverse-variance on the log scale with the DerSimonian-Laird random-effects model; heterogeneity is assessed via Cochran's Q and I^2 (I^2>50% => random-effects). The agent has read-only study-query tools that return raw records and must screen and pool itself.
0 trajectories · expert: Evidence scientist / systematic reviewer with biostatistics training
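The pooling step in the environment above can be sketched in plain Python. This is a minimal sketch under stated assumptions: inputs are log-scale effects (for example log hazard ratios) with their standard errors, the function name `pool_dl` is hypothetical, and the 50% switch between models follows the rule quoted in the description.

```python
import math
from typing import Dict, List


def pool_dl(log_effects: List[float], std_errors: List[float]) -> Dict[str, float]:
    """Inverse-variance pooling with a DerSimonian-Laird between-study variance."""
    if not log_effects or len(log_effects) != len(std_errors):
        raise ValueError("need matching, non-empty effect and SE lists")

    # Fixed-effect (inverse-variance) estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw

    # Cochran's Q and I^2.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    df = len(log_effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird method-of-moments tau^2, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate with tau^2 added to each study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random = sum(wi * yi for wi, yi in zip(w_re, log_effects)) / sum(w_re)

    use_random = i2 > 50.0
    pooled = random if use_random else fixed
    se_pooled = math.sqrt(1.0 / (sum(w_re) if use_random else sw))
    return {
        "q": q, "i2": i2, "tau2": tau2,
        "fixed": fixed, "random": random,
        "pooled_log": pooled, "pooled_se": se_pooled,
        "pooled_ratio": math.exp(pooled),
    }


# Hypothetical study data, for illustration only.
homogeneous = pool_dl([math.log(0.8)] * 3, [0.1, 0.2, 0.3])
heterogeneous = pool_dl([0.0, 1.0], [0.1, 0.1])
```

Returning both the fixed and random estimates alongside Q, I^2, and tau^2 lets a reviewer check the model choice rather than only the final number.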
Value a clinical-stage asset to size an in-licensing or acquisition deal. The agent has read-only tools returning raw inputs (asset parameters, a phase × therapeutic-area probability-of-success table from BIO/Informa/QLS & Wong-Siah-Lo, comparable deals, and discount/modelling conventions) and must itself build a risk-adjusted NPV (rNPV): cumulative PoS from the current phase, risk-adjusted and discounted commercial value less remaining R&D, then a recommended upfront + milestone (biobucks) structure benchmarked to comparables. The deterministic answer key is reference-only.
0 trajectories · expert: BD/licensing analyst or biotech equity analyst with valuation fluency
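The rNPV build in the environment above can be sketched as below. This is a simplified illustration, not the environment's answer key: the function names, phase labels, transition probabilities, and cash flows are hypothetical, and remaining R&D is treated as unrisked, whereas a fuller model would weight each phase's cost by the probability of reaching it.

```python
from typing import Dict, List, Sequence, Tuple

PHASE_ORDER = ("phase1", "phase2", "phase3", "filing")  # assumed phase labels


def cumulative_pos(phase_pos: Dict[str, float], current_phase: str,
                   order: Sequence[str] = PHASE_ORDER) -> float:
    """Multiply transition probabilities from the current phase through approval."""
    pos = 1.0
    for phase in order[order.index(current_phase):]:
        pos *= phase_pos[phase]
    return pos


def present_value(cash_flows: List[Tuple[int, float]], rate: float) -> float:
    """Discount (year, amount) pairs back to year 0."""
    return sum(amount / (1.0 + rate) ** year for year, amount in cash_flows)


def rnpv(commercial: List[Tuple[int, float]], rd_costs: List[Tuple[int, float]],
         pos: float, rate: float) -> float:
    """Risk-adjusted, discounted commercial value less discounted remaining R&D."""
    return pos * present_value(commercial, rate) - present_value(rd_costs, rate)


# Hypothetical inputs ($M), for illustration only.
phase_pos = {"phase1": 0.6, "phase2": 0.5, "phase3": 0.5, "filing": 0.8}
pos = cumulative_pos(phase_pos, "phase2")
commercial = [(year, 400.0) for year in range(6, 16)]  # ten years of net cash flow
rd_costs = [(1, 40.0), (2, 60.0), (3, 80.0)]
value = rnpv(commercial, rd_costs, pos, rate=0.10)
```

The resulting figure is the ceiling the deal terms are sized against; splitting it into upfront and milestones depends on the comparable deals, which this sketch leaves out.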
Prioritize tumor-associated antigens (TAAs) as cell-surface therapeutic targets for CAR-T, ADC, and bispecific / T-cell-engager modalities. The agent has read-only tools over the standard public stack — UniProt/Swiss-Prot localization & topology, TCGA tumor expression by cohort, and GTEx v8 normal-tissue expression. Tools return raw TPM and curated localization fields; the agent must apply the hard surface-accessibility gate (only an extracellular epitope is addressable), weigh tumor/normal specificity against vital-organ safety, and produce a prioritization (score / tier / ranked recommendation). The deterministic scoring model (base merit × safety multiplier, surface gate) is the reference answer key.
5 trajectories · expert: Antibody / cell-therapy discovery scientist or target biologist with surface-antigen selection experience
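The shape of the scoring described above (surface gate, then base merit × safety multiplier) can be illustrated with a toy model. Everything specific here is an assumption for illustration: the log-ratio merit, the vital-organ list, the 10 TPM scale constant, and the function name are hypothetical and are not the environment's reference answer key.

```python
import math
from typing import Dict

VITAL_TISSUES = {"heart", "lung", "brain", "liver", "kidney"}  # assumed list
SAFETY_SCALE_TPM = 10.0  # assumed: vital-organ TPM at which the score is halved


def score_antigen(has_extracellular_epitope: bool, tumor_tpm: float,
                  normal_tpm: Dict[str, float]) -> float:
    """Toy prioritization score: hard surface gate, then merit x safety."""
    # Hard gate: without an extracellular epitope the target is not addressable.
    if not has_extracellular_epitope:
        return 0.0

    # Base merit: log2 tumor-to-normal ratio against the highest normal tissue.
    max_normal = max(normal_tpm.values(), default=0.0)
    base_merit = max(0.0, math.log2((tumor_tpm + 1.0) / (max_normal + 1.0)))

    # Safety multiplier: shrinks toward 0 as vital-organ expression rises.
    max_vital = max(
        (tpm for tissue, tpm in normal_tpm.items() if tissue in VITAL_TISSUES),
        default=0.0,
    )
    safety = 1.0 / (1.0 + max_vital / SAFETY_SCALE_TPM)
    return base_merit * safety


# Hypothetical expression values (TPM), for illustration only.
clean = score_antigen(True, 255.0, {"skin": 0.0, "heart": 0.0})
cardiac = score_antigen(True, 255.0, {"skin": 0.0, "heart": 10.0})
intracellular = score_antigen(False, 255.0, {"skin": 0.0, "heart": 0.0})
```

Applying the gate first means an intracellular antigen scores zero no matter how tumor-specific its expression is, which mirrors the hard surface-accessibility requirement in the description.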