A transparent look into the architecture, algorithms, and verification pipelines that power Aevum Encyclopedia's 2.4 million verified articles across 140+ languages.
Our models don't just summarize; they understand, cross-reference, and verify. We combine large language models with symbolic reasoning to ensure factual precision at scale.
We orchestrate a fleet of specialized models: one for entity extraction, one for temporal reasoning, and one for cross-lingual alignment. Results are fused via Bayesian consensus to minimize hallucination.
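As a rough illustration of how per-model confidences might be fused, the sketch below uses naive Bayes fusion in log-odds space. It assumes each specialized model reports a probability that a claim is true and that the models are conditionally independent. The function name `fuse_probabilities` and the uniform prior are assumptions for illustration, not Aevum's actual consensus method.

```python
import math


def fuse_probabilities(probabilities, prior=0.5, eps=1e-6):
    """Fuse independent per-model probabilities that a claim is true.

    Hypothetical helper: naive Bayes fusion in log-odds space, assuming
    the models are conditionally independent given the truth of the claim.
    """
    def logit(p):
        # Clamp to avoid division by zero or log(0) at exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        return math.log(p / (1.0 - p))

    prior_logit = logit(prior)
    # Each model adds its evidence relative to the shared prior.
    total = prior_logit + sum(logit(p) - prior_logit for p in probabilities)
    return 1.0 / (1.0 + math.exp(-total))


# Two models that agree reinforce each other; two that disagree cancel out.
agree = fuse_probabilities([0.8, 0.8])
disagree = fuse_probabilities([0.8, 0.2])
```

Under these assumptions, agreement between models pushes the fused probability above either individual estimate, while symmetric disagreement returns it to the prior.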
Raw neural outputs are constrained by ontological rules. If the model claims "Event X occurred in 1995" but our temporal graph shows conflicting peer-reviewed sources, the claim is flagged for human review.
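The temporal rule described above can be sketched as a simple check. The `SourcedDate` record and `needs_human_review` function are hypothetical names, and the rule is simplified to "any peer-reviewed source that gives a different year triggers review".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedDate:
    """A year asserted for an event by one source (hypothetical schema)."""
    source: str
    year: int
    peer_reviewed: bool


def needs_human_review(claimed_year, evidence):
    """Return True when a peer-reviewed source contradicts the claimed year."""
    return any(e.peer_reviewed and e.year != claimed_year for e in evidence)


# Illustrative data only: the model claims 1995, one journal says 1996.
evidence = [
    SourcedDate("journal-a", 1996, peer_reviewed=True),
    SourcedDate("blog-b", 1995, peer_reviewed=False),
]
flagged = needs_human_review(1995, evidence)
```

A production rule set would presumably weigh source authority as well, but the shape of the check stays the same: the neural output is compared against the graph, and conflicts are escalated rather than silently resolved.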
Using embeddings from multilingual models (mBERT, XLM-R), knowledge extracted in high-resource languages is mapped to low-resource languages, preserving nuance and cultural context.
Aevum doesn't store flat articles. We build a dynamic, queryable graph where every concept is a node and every relationship is a verified edge.
Raw text is parsed to identify entities. Disambiguation models resolve ambiguous mentions (e.g., distinguishing "Apple" the company from "apple" the fruit) using contextual embeddings and domain ontologies.
Dependency parsing and sequence labeling identify relationships (is_a, part_of, causes, located_in). Each edge is weighted by source authority and temporal validity.
Extracted triples are mapped to a unified schema inspired by Wikidata and Schema.org. This enables seamless cross-lingual querying and machine-readable exports.
Knowledge isn't static. Our graph maintains historical snapshots, allowing users to query "What was believed about climate science in 2005 vs 2025?"
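One way to picture historical snapshots is to attach a validity interval to every edge and filter by year. The sketch below is a minimal in-memory version. The `Edge` fields and the `snapshot` function are assumptions for illustration, not Aevum's storage model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """A graph edge with a validity interval (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    valid_from: int            # first year the edge was considered valid
    valid_to: Optional[int]    # last valid year, or None if still current


def snapshot(edges, year):
    """Return the edges that were considered valid in the given year."""
    return [
        e for e in edges
        if e.valid_from <= year and (e.valid_to is None or year <= e.valid_to)
    ]


# Illustrative data: Pluto was reclassified as a dwarf planet in 2006.
edges = [
    Edge("Pluto", "is_a", "planet", 1930, 2005),
    Edge("Pluto", "is_a", "dwarf planet", 2006, None),
]
then = [e.obj for e in snapshot(edges, 2005)]
now = [e.obj for e in snapshot(edges, 2025)]
```

Querying the same subject at two different years returns different edges, which is the behavior a "2005 vs 2025" comparison relies on.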
Accuracy is our north star. Every article passes through an automated and human-driven verification funnel before publication.
Every claim requires at least two independent, high-authority sources. Our crawler validates URLs, checks for paywall/retraction status, and assigns trust scores based on institutional reputation.
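The two-source requirement can be sketched as a filter over candidate sources. Here independence is approximated by distinct publishers, and the trust threshold of 0.8 is an arbitrary placeholder. The `Source` record and `is_corroborated` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A cited source with a trust score in [0, 1] (hypothetical schema)."""
    url: str
    publisher: str
    trust: float
    retracted: bool = False


def is_corroborated(sources, min_trust=0.8, min_independent=2):
    """Check that enough distinct, trusted, non-retracted publishers agree."""
    publishers = {
        s.publisher
        for s in sources
        if not s.retracted and s.trust >= min_trust
    }
    return len(publishers) >= min_independent


# Illustrative data: two articles from one publisher count only once.
sources = [
    Source("https://example.org/a", "Publisher A", 0.9),
    Source("https://example.org/b", "Publisher A", 0.95),
    Source("https://example.com/c", "Publisher B", 0.85, retracted=True),
]
ok = is_corroborated(sources)
```

In this example the claim is not corroborated: both usable sources share a publisher, and the third has been retracted.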
AI handles the routine 85% of verification work. The remaining 15% (novel claims, controversies, edge cases) is routed to domain-specific expert reviewers via our contributor network.
Published articles aren't final. Background agents monitor for retractions, updated research, and emerging consensus, automatically triggering revision workflows.
Keyword matching is obsolete. Our search engine understands intent, context, and linguistic nuance across 140+ languages.
We distinguish between definitional, comparative, procedural, and exploratory queries, dynamically adjusting retrieval depth and response format.
Shared vector spaces allow a query in Japanese to retrieve authoritative sources in English, French, or Arabic, seamlessly aligned by meaning rather than translation.
Initial dense retrieval casts a wide net. A lightweight cross-encoder then re-ranks the candidates for precision, so the top results match the nuance of the user's question.
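The retrieve-then-rerank pattern can be sketched with toy data. Stage one ranks documents by cosine similarity in a shared vector space, and stage two re-scores only the shortlisted candidates with a more expensive pairwise scorer. The vectors are made up, and the `token_overlap` scorer is a stand-in for a real cross-encoder. All names here are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_then_rerank(query_vec, query_text, docs, rerank_score,
                         k_retrieve=2, k_final=1):
    """Stage 1: dense retrieval by cosine. Stage 2: rerank the shortlist."""
    candidates = sorted(
        docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True
    )[:k_retrieve]
    reranked = sorted(
        candidates, key=lambda d: rerank_score(query_text, d["text"]),
        reverse=True,
    )
    return reranked[:k_final]


def token_overlap(query, text):
    """Toy stand-in for a cross-encoder: count of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


# Illustrative documents with made-up 2-dimensional embeddings.
docs = [
    {"id": "a", "text": "history of the apple fruit", "vec": [0.9, 0.1]},
    {"id": "b", "text": "apple company founding year", "vec": [0.8, 0.2]},
    {"id": "c", "text": "river deltas", "vec": [0.0, 1.0]},
]
top = retrieve_then_rerank([1.0, 0.0], "apple company founding",
                           docs, token_overlap)
```

Document "a" wins the dense stage, but the reranker promotes "b" because it matches the query more closely. The design point is that the expensive scorer only ever sees the shortlist, never the full corpus.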
Built for scale, resilience, and open standards. Our architecture handles petabytes of knowledge with sub-100ms query latency globally.
We believe knowledge should be accessible to developers, educators, and institutions. Aevum embraces open formats and provides robust API access.
Query the entire encyclopedia programmatically. Fetch articles, traverse knowledge graphs, or run semantic searches with authenticated endpoints and generous rate limits for educational use.
Our ontology maps directly to Wikidata entities and Schema.org markup, ensuring seamless integration with existing research tools, CMS platforms, and search engines.
Monthly snapshot exports in JSON-LD, RDF, and CSV formats are freely available for academic research, offline archiving, and AI training datasets.
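For a sense of what a JSON-LD record aligned with Schema.org and Wikidata could look like, the sketch below builds one entity as a plain dictionary and serializes it. The helper `to_json_ld` and the choice of fields are assumptions. The actual export schema is not specified here.

```python
import json


def to_json_ld(name, schema_type, wikidata_id, description):
    """Build a minimal Schema.org JSON-LD record (hypothetical export shape)."""
    return {
        "@context": "https://schema.org",
        "@type": schema_type,
        "name": name,
        "description": description,
        # sameAs links the record to the matching Wikidata item page.
        "sameAs": f"https://www.wikidata.org/wiki/{wikidata_id}",
    }


record = to_json_ld(
    name="Douglas Adams",
    schema_type="Person",
    wikidata_id="Q42",
    description="English author and humorist",
)
serialized = json.dumps(record, indent=2, ensure_ascii=False)
```

Because the record uses the shared Schema.org vocabulary and a `sameAs` link, downstream tools can join it against their own Wikidata-keyed data without a custom mapping.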
Access our APIs, join the contributor network, or explore the full technical documentation.