| Event extraction completeness | Identifies every reportable adverse event in the narrative, including any serious event unrelated to the drug and secondary or laboratory events, without lumping distinct events together or inventing new ones. | Misses reportable events (e.g. overlooks the neutropenia behind a febrile-neutropenia admission, or drops the unrelated fracture), or merges separate events into one. |
| MedDRA coding accuracy | Maps each lay event to the correct Preferred Term (PT) and its System Organ Class (SOC) using the dictionary lookup (e.g. 'low white count' -> Neutropenia / Blood and lymphatic system disorders; 'shortness of breath with infiltrates' -> Pneumonitis / Respiratory, thoracic and mediastinal disorders). PT and SOC match what the dictionary returns rather than being guessed from memory. | Wrong PT or mismatched SOC, codes to a symptom when a diagnosis PT exists (or vice versa), or fabricates a code never returned by meddra_lookup. |
| Seriousness determination (ICH E2A) | Correctly classifies each event as serious or non-serious and names the applicable ICH E2A criterion (death, life-threatening, inpatient hospitalization or its prolongation, persistent or significant disability, congenital anomaly, medically important); e.g. flags the hospitalized CRS, pneumonitis, and febrile-neutropenia events as serious and the asymptomatic, resolved lab abnormality as non-serious. | Calls a clearly serious (hospitalized or life-threatening) event non-serious or vice versa, cites the wrong criterion or none, or conflates severity grade with seriousness. |
| Causality assessment quality (WHO-UMC) | Assigns a defensible WHO-UMC category (certain, probable/likely, possible, unlikely, conditional/unclassified, unassessable/unclassifiable) with sound reasoning: weighs temporality (onset relative to dosing), dechallenge/rechallenge, alternative etiologies (confounding medications, comorbidity, trauma), and whether the event is listed in the drug label; e.g. probable/certain for step-up-dose CRS, unlikely for the antibiotic-confounded transaminitis, and unlikely for the traumatic fracture (WHO-UMC has no "unrelated" category, so "unlikely" is the closest fit). | Reflexively blames or exonerates the drug, ignores a strong alternative etiology or a temporal mismatch, conflates seriousness with causality, or misuses the WHO-UMC categories. |
| Evidence faithfulness | Grounds every PT/SOC in an actual meddra_lookup result and every causality factor in the narrative or the drug label; no invented events, codes, lab values, or label claims; states uncertainty where the narrative is genuinely ambiguous. | Invents events or MedDRA codes, asserts label content not returned by get_drug_label, or contradicts the narrative (wrong timing, fabricated dechallenge). |
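
The evidence-faithfulness criterion can be partly checked mechanically. The following is a minimal sketch, not the grader's actual implementation: it assumes the coded events and the recorded `meddra_lookup` results are both available as lists of dicts with `pt` and `soc` keys (a hypothetical shape), and the helper name `ungrounded_codes` is illustrative only. It reports every PT/SOC pair in the answer that no lookup ever returned.

```python
from typing import Dict, List, Tuple


def ungrounded_codes(
    coded_events: List[Dict[str, str]],
    lookup_results: List[Dict[str, str]],
) -> List[Tuple[str, str]]:
    """Return (PT, SOC) pairs in the answer that no lookup result contains.

    Both inputs are assumed to be lists of {"pt": ..., "soc": ...} dicts;
    this shape is hypothetical and not taken from the real tool output.
    """
    # Compare case-insensitively so capitalization differences alone
    # are not reported as fabricated codes.
    seen = {(r["pt"].casefold(), r["soc"].casefold()) for r in lookup_results}
    return [
        (e["pt"], e["soc"])
        for e in coded_events
        if (e["pt"].casefold(), e["soc"].casefold()) not in seen
    ]


# Example: one grounded pair, one pair with a SOC the lookup never returned.
lookups = [
    {"pt": "Neutropenia", "soc": "Blood and lymphatic system disorders"},
    {"pt": "Pneumonitis", "soc": "Respiratory, thoracic and mediastinal disorders"},
]
answer = [
    {"pt": "neutropenia", "soc": "Blood and lymphatic system disorders"},
    {"pt": "Pneumonitis", "soc": "Cardiac disorders"},
]
flagged = ungrounded_codes(answer, lookups)
```

A check like this only covers fabricated or mismatched codes; whether causality factors are grounded in the narrative or the label still needs a reader's judgment.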