Systematic Review & Meta-Analysis

Evidence synthesis / HEOR | Evidence scientist / systematic reviewer

Given a registered PICO question, screen a pool of PubMed/Embase-style study abstracts against explicit inclusion/exclusion criteria (PRISMA discipline: RCTs only, correct population/intervention/comparator, outcome reported), then pool the reported effect sizes of the included studies into a single meta-analytic estimate. Pooling uses inverse-variance weights on the log scale. Heterogeneity is assessed with Cochran's Q and I^2, and when I^2 > 50% the DerSimonian-Laird random-effects model is used. The agent has read-only study-query tools that return raw records, so it must do the screening and pooling itself.
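A minimal Python sketch of the pooling step, assuming each included study supplies one ratio estimate (all ORs or all RRs, never mixed) with a 95% CI. The `Study` type and the `log_effect_and_se` and `pool` functions are hypothetical names for illustration, not part of the task's tools. The sketch recovers each log-scale SE from the CI as (ln(hi)-ln(lo))/(2*1.96), computes fixed-effect inverse-variance weights, Cochran's Q and I^2, and switches to DerSimonian-Laird weights when I^2 exceeds 50%.

```python
import math
from dataclasses import dataclass

Z = 1.96  # two-sided 95% normal quantile, as in the SE-from-CI rule


@dataclass
class Study:
    # Hypothetical shape: one reported ratio (OR or RR) with its 95% CI.
    name: str
    estimate: float  # natural-scale OR or RR
    lo: float        # lower 95% CI bound
    hi: float        # upper 95% CI bound


def log_effect_and_se(s):
    """Return the log-scale effect and the SE recovered from the 95% CI."""
    y = math.log(s.estimate)
    se = (math.log(s.hi) - math.log(s.lo)) / (2 * Z)
    return y, se


def pool(studies, i2_threshold=50.0):
    ys, ses = zip(*(log_effect_and_se(s) for s in studies))

    # Fixed-effect inverse-variance weights and pooled log effect.
    w = [1 / se**2 for se in ses]
    y_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

    # Cochran's Q and I^2 (percentage of variability beyond chance).
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    if i2 > i2_threshold:
        # DerSimonian-Laird moment estimate of between-study variance tau^2.
        c = sum(w) - sum(wi**2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        w = [1 / (se**2 + tau2) for se in ses]
        model = "random (DL)"
    else:
        tau2 = 0.0
        model = "fixed"

    # Pool on the log scale, then exponentiate estimate and CI bounds.
    y = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    se_y = math.sqrt(1 / sum(w))
    return {
        "model": model,
        "estimate": math.exp(y),
        "ci": (math.exp(y - Z * se_y), math.exp(y + Z * se_y)),
        "Q": q,
        "I2": i2,
        "tau2": tau2,
    }
```

Because the pooled log effect is a weighted mean of the study log effects with positive weights, the exponentiated estimate always lies within the range of the included study estimates, which makes a quick sanity check on any reported result.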

Why this is fundable

Scarce expert who grades this: Evidence scientist / systematic reviewer with biostatistics training (Cochrane-method meta-analyst, ~$150-300/hr loaded)
What one decision is worth: Meta-analyses are the top of the evidence pyramid: they drive clinical-practice guidelines, HTA/payer coverage and pricing decisions, and internal go/no-go on development programs. A wrong pooled estimate or a screening error can swing a guideline recommendation or a reimbursement decision worth hundreds of millions to billions, or greenlight a program the evidence doesn't support.
Real-world data sources: PubMed / Embase abstracts and trial reports (effect estimates + 95% CIs), Cochrane CENTRAL, ClinicalTrials.gov results, with Cochrane RoB 2 for risk of bias. This task uses a curated teaching snapshot that can be refreshed from live bibliographic APIs.

Agent tools

list_review_questions, get_inclusion_criteria, search_studies, get_study

Expert grading rubric

Dimension	5 (excellent)	1 (poor)
Screening accuracy & PRISMA discipline	Applies the explicit PICO + RCT-only criteria correctly: includes exactly the eligible RCTs and excludes the rest, each with the correct concrete reason (wrong design, population, comparator, or outcome not reported), and reports coherent PRISMA identification/screening/eligibility/included counts.	Includes ineligible studies (e.g. the observational cohort, the VTE/valve populations, the placebo/aspirin or DOAC-vs-DOAC comparators, or the study that doesn't report the outcome), drops eligible RCTs, or gives no/garbled PRISMA flow.
Effect-measure & log-scale handling	Uses the correct measure (OR for efficacy, RR for safety), pools on the LOG scale, and derives each study's SE from its CI as (ln(hi)-ln(lo))/(2*1.96) rather than treating the point estimate or CI on the natural scale.	Pools raw (non-log) ratios, mishandles or invents the standard errors, mixes OR and RR, or pulls the wrong outcome's effect for a study.
Pooling-model choice & I^2 interpretation	Computes Cochran's Q and I^2, and chooses fixed vs random-effects coherently with the heterogeneity (I^2>50% => random-effects), naming the DerSimonian-Laird estimator and interpreting I^2 correctly.	Ignores heterogeneity, picks a fixed-effect model despite high I^2 (or vice versa with no rationale), or misreads what I^2 means.
Numerical correctness of the pooled estimate & CI	The pooled point estimate, 95% CI, and I^2 match the deterministic inverse-variance / DerSimonian-Laird computation within rounding, and the pooled estimate sits within the range of the included study estimates.	Pooled estimate or CI is materially wrong, falls outside the plausible range of inputs, has an inverted/implausible CI, or the arithmetic is unjustified.
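The screening rules in the rubric can be sketched the same way. This assumes each raw record has already been reduced to hypothetical `design`, `population`, `intervention`, `comparator`, and `outcomes` fields; the real `get_study` output may be shaped differently. Each excluded study is tagged with the first criterion it fails as its concrete reason, and the counts follow a simplified PRISMA flow in which title/abstract screening and full-text eligibility are collapsed into a single pass.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record shape; the real tool output may differ.
    study_id: str
    design: str          # e.g. "RCT", "cohort"
    population: str      # e.g. "NVAF", "VTE", "valvular AF"
    intervention: str    # e.g. "DOAC"
    comparator: str      # e.g. "warfarin", "placebo", "aspirin", "DOAC"
    outcomes: set = field(default_factory=set)  # outcomes reported with effect + CI


@dataclass
class Criteria:
    design: str
    population: str
    intervention: str
    comparator: str
    outcome: str


def exclusion_reason(r, c):
    """Return the first failed criterion (fixed order), or None if eligible."""
    if r.design != c.design:
        return f"wrong design ({r.design})"
    if r.population != c.population:
        return f"wrong population ({r.population})"
    if r.intervention != c.intervention:
        return f"wrong intervention ({r.intervention})"
    if r.comparator != c.comparator:
        return f"wrong comparator ({r.comparator})"
    if c.outcome not in r.outcomes:
        return f"outcome not reported ({c.outcome})"
    return None


def screen(records, c):
    # Remove duplicate records by study_id before screening.
    unique = {r.study_id: r for r in records}
    included, excluded = [], {}
    for sid, r in unique.items():
        reason = exclusion_reason(r, c)
        if reason is None:
            included.append(sid)
        else:
            excluded[sid] = reason
    prisma = {
        "identified": len(records),
        "after_duplicates_removed": len(unique),
        "excluded": len(excluded),
        "included": len(included),
    }
    return included, excluded, prisma
```

Passing only the `included` IDs on to the pooling step is a simple guard against excluded studies' numbers leaking into the pool.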
Evidence faithfulness	Every study, effect estimate, and CI used traces to the actual tool outputs; no fabricated trials, effects, or CIs, and excluded studies' numbers are not smuggled into the pool.	Fabricates studies or effect sizes, alters the reported CIs, or pools effects from studies it claimed to exclude.

Example queries

Conduct a systematic review and meta-analysis: pool the effect of direct oral anticoagulants (DOACs) vs adjusted-dose warfarin on stroke or systemic embolism in adults with non-valvular atrial fibrillation, screening per the registered inclusion criteria. Report the included/excluded studies with reasons, the PRISMA counts, the pooled odds ratio with 95% CI, and the heterogeneity (I^2).
Run a meta-analysis of major bleeding for DOACs vs warfarin in non-valvular atrial fibrillation. Screen the study pool against the Q-SAFETY inclusion criteria, then pool the risk ratios. State which studies you included and excluded and why, your choice of fixed- vs random-effects given the heterogeneity, and the pooled RR with CI.
Systematically review the evidence on DOAC vs warfarin efficacy (stroke/systemic embolism) in atrial fibrillation. Apply PICO inclusion/exclusion criteria explicitly, show the PRISMA flow, compute Cochran's Q and I^2, and give the DerSimonian-Laird random-effects pooled OR with its 95% confidence interval.
We need a pooled safety estimate for DOACs vs warfarin in AF. Screen all available studies for the major-bleeding question, exclude the ineligible ones with explicit reasons, assess heterogeneity, choose an appropriate pooling model, and report the meta-analytic risk ratio with 95% CI and I^2.

Trajectories

Model panel

Model	Provider	Tier	Judge 1–5	Verdict
Claude Opus 4.8	anthropic	frontier	3.6	flawed
GPT (frontier)	openai	frontier	3.2	flawed
Claude Haiku 4.5	anthropic	small	3.0	flawed
GPT-4o mini	openai	small	1.2	unusable