Factuality F1: 94.7% (↑ 1.2% from v2.4.1)
Citation Coverage: 89.3% (↑ 0.8% avg precision)
Inference Latency: 42ms (↓ 5ms optimized)
Hallucination Rate: 0.8% (↓ 0.3% suppressed)
Model Performance Comparison
Model              Version     Accuracy  Coverage  Latency  Status
Aevum-TextGen-v3   3.1.0-rc2   96.2%     91.4%     38ms     Production
Aevum-FactCheck    2.4.1       94.7%     89.3%     42ms     Stable
Aevum-CiteMatch    1.8.5       92.1%     87.6%     51ms     Review
Aevum-BiasDetect   2.0.0-beta  88.9%     93.2%     67ms     Testing
Legacy-WikiParser  0.9.4       84.3%     76.8%     89ms     Deprecated
Recent Evaluation Runs
2025-06-14 09:42:11 UTC  fact-check-pipeline    batch_4892  Passed
2025-06-14 08:15:33 UTC  citation-validation    batch_4891  Passed
2025-06-14 06:30:05 UTC  bias-detection-sweep   batch_4890  Review Req
2025-06-13 22:10:47 UTC  hallucination-filter   batch_4889  Failed
2025-06-13 19:05:22 UTC  cross-lingual-align    batch_4888  Passed
Evaluation Criteria
  • Factuality / Grounding: 96%
  • Citation Precision: 91%
  • Temporal Consistency: 84%
  • Neutrality / Bias: 93%
  • Structural Compliance: 98%
  • Edge Case Handling: 79%
Pipeline Health
Data Ingestion: 100%
Model Scoring: 94%
Validation Gates: 87%
Deployment Sync: 99%