Ingest: Academic Feeds
v2.4
Pulls structured metadata & full text from 42 verified academic repositories via OAuth2.
source: arXiv | batch: 500 | schedule: 0 */4 * * *
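The schedule value 0 */4 * * * is a standard five-field cron expression: minute 0 of every hour divisible by four (00:00, 04:00, 08:00, and so on). As a minimal sketch of how that expression resolves to a concrete time, the helper below computes the next firing after a given timestamp. The function name next_run and its every_hours parameter are illustrative assumptions, not part of the pipeline.

```python
from datetime import datetime, timedelta


def next_run(after: datetime, every_hours: int = 4) -> datetime:
    """Return the next firing of '0 */N * * *' strictly after `after`.

    Hypothetical helper for illustration; handles only this one pattern.
    """
    # Move to the top of the following hour, then step hour by hour
    # until the hour is a multiple of N (which is what */N matches).
    candidate = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    while candidate.hour % every_hours != 0:
        candidate += timedelta(hours=1)
    return candidate
```

For a timestamp of 14:02:11, this returns 16:00 on the same day.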
Transform & Dedupe
Normalizes citations, removes duplicates via MinHash, extracts key entities.
dedupe_threshold: 0.92 | lang_filter: [en, es, zh]
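To make the dedupe_threshold setting concrete, here is a minimal sketch of MinHash deduplication in plain Python. It is not the pipeline's implementation: the three-word shingle size, the 128-hash signature length, the BLAKE2b-based hash family, and all function names are assumptions chosen for illustration. A record is dropped when its estimated Jaccard similarity to an already kept record reaches the threshold.

```python
import hashlib
import re

NUM_HASHES = 128  # assumed signature length; not taken from the pipeline


def shingles(text, k=3):
    """Return the set of k-word shingles of the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; each seed acts as one member of the hash family."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=NUM_HASHES):
    """For each seed, keep the minimum hash over all shingles."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def dedupe(records, threshold=0.92):
    """Keep the first record of each near-duplicate group."""
    kept, signatures = [], []
    for text in records:
        sig = minhash_signature(text)
        if any(estimated_jaccard(sig, s) >= threshold for s in signatures):
            continue  # near-duplicate of a record already kept
        kept.append(text)
        signatures.append(sig)
    return kept
```

This sketch compares every record against every kept record, which is quadratic in the batch size. At scale, MinHash is usually paired with locality-sensitive hashing so that only likely matches are compared.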
AI Verification Engine
Cross-references claims against a verified knowledge graph. Flags low-confidence entries for review.
model: aevum-verify-3 | confidence_min: 0.85
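The confidence_min setting implies a simple routing rule: entries scoring below 0.85 go to a review queue and the rest continue. The sketch below illustrates that rule only. The Entry class, its field names, and the route function are hypothetical, and the way aevum-verify-3 produces scores is not shown here.

```python
from dataclasses import dataclass

CONFIDENCE_MIN = 0.85  # mirrors confidence_min in the node config


@dataclass
class Entry:
    entry_id: str      # hypothetical identifier field
    confidence: float  # score assumed to lie in [0, 1]


def route(entries, confidence_min=CONFIDENCE_MIN):
    """Split entries into (passed, review) by confidence score.

    Entries strictly below the minimum are routed to review, matching
    the "<0.85" wording in the execution log.
    """
    passed, review = [], []
    for entry in entries:
        (review if entry.confidence < confidence_min else passed).append(entry)
    return passed, review
```

An entry scoring exactly 0.85 passes under this reading. Whether the real engine treats the boundary the same way is an assumption.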
Categorize & Tag
Assigns taxonomy paths, generates semantic tags, links related concepts.
Publish to Encyclopedia
Writes to primary DB, triggers CDN invalidation, notifies editorial queue.
target: prod-us-east | rollback: true
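The execution log shows the publish node retrying after a connection timeout, and the config enables rollback. One common way to combine the two is retry with exponential backoff, rolling back only when the final attempt fails. The sketch below assumes that behavior; the publish and rollback callables, the attempt limit, and the delays are all hypothetical, since the source specifies neither the retry policy nor what rollback does.

```python
import time


def publish_with_retry(publish, rollback, max_attempts=3, base_delay=1.0,
                       sleep=time.sleep):
    """Call publish(), retrying on TimeoutError with exponential backoff.

    If every attempt times out, call rollback() and re-raise the error.
    `sleep` is injectable so the backoff can be observed without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return publish()
        except TimeoutError:
            if attempt == max_attempts:
                rollback()  # assumed meaning of "rollback: true"
                raise
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

Only TimeoutError is retried here, so other failures surface immediately instead of being masked by repeated attempts.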
Execution Log — Live
14:02:11 [INFO] Initializing pipeline run #4892...
14:02:12 [INFO] Loading 42 academic feed configurations
14:02:14 [OK] Authenticated with arXiv API (scope: read)
14:02:15 [WARN] Rate limit approaching for PubMed endpoint (82%)
14:02:18 [INFO] Batch 1/12 received: 487 documents
14:02:21 [INFO] Running MinHash deduplication (threshold: 0.92)
14:02:24 [OK] Removed 112 duplicates. 375 unique records proceeding.
14:02:26 [INFO] AI Verification Engine loaded (aevum-verify-3)
14:02:31 [WARN] Low confidence detected on 14 entries (<0.85). Routing to review queue.
14:02:33 [INFO] Categorization complete. Taxonomy paths assigned.
14:02:35 [ERR] Publish node failed: Connection timeout to prod-us-east. Retrying...
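Each log line above follows the shape HH:MM:SS [LEVEL] message, which makes it straightforward to parse for alerting or summaries. The sketch below is one way to do that; the regular expression and the function names are assumptions based only on the lines shown, not a documented log format.

```python
import re
from collections import Counter

# Pattern inferred from the visible log lines: time, bracketed level, message.
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict with time, level, and message, or None if no match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_levels(lines):
    """Count log entries per level, skipping lines that do not parse."""
    parsed = (parse_line(line) for line in lines)
    return Counter(p["level"] for p in parsed if p)
```

Applied to the eleven lines of run #4892 shown above, count_levels gives 6 INFO, 2 OK, 2 WARN, and 1 ERR.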