ENMEval Dashboard
English NLP Model Evaluation & Quality Assurance Pipeline
- Factuality F1: 94.7% (↑ 1.2% from v2.4.1)
- Citation Coverage: 89.3% (↑ 0.8% avg precision)
- Inference Latency: 42ms (↓ 5ms optimized)
- Hallucination Rate: 0.8% (↓ 0.3% suppressed)
Model Performance Comparison
| Model | Version | Accuracy | Coverage | Latency | Status |
|---|---|---|---|---|---|
| Aevum-TextGen-v3 | 3.1.0-rc2 | 96.2% | 91.4% | 38ms | Production |
| Aevum-FactCheck | 2.4.1 | 94.7% | 89.3% | 42ms | Stable |
| Aevum-CiteMatch | 1.8.5 | 92.1% | 87.6% | 51ms | Review |
| Aevum-BiasDetect | 2.0.0-beta | 88.9% | 93.2% | 67ms | Testing |
| Legacy-WikiParser | 0.9.4 | 84.3% | 76.8% | 89ms | Deprecated |
Recent Evaluation Runs
| Timestamp (UTC) | Pipeline | Batch | Result |
|---|---|---|---|
| 2025-06-14 09:42:11 | fact-check-pipeline | batch_4892 | Passed |
| 2025-06-14 08:15:33 | citation-validation | batch_4891 | Passed |
| 2025-06-14 06:30:05 | bias-detection-sweep | batch_4890 | Review Req |
| 2025-06-13 22:10:47 | hallucination-filter | batch_4889 | Failed |
| 2025-06-13 19:05:22 | cross-lingual-align | batch_4888 | Passed |
Evaluation Criteria
- Factuality / Grounding: 96%
- Citation Precision: 91%
- Temporal Consistency: 84%
- Neutrality / Bias: 93%
- Structural Compliance: 98%
- Edge Case Handling: 79%
Pipeline Health
- Data Ingestion: 100%
- Model Scoring: 94%
- Validation Gates: 87%
- Deployment Sync: 99%