Aevum Encyclopedia — Technical Specification

Document Type: /type-spec-pdf | Classification: Internal / Engineering Reference

Document ID: AES-TS-2025-004
Version: 2.4.1-stable
Status: Approved
Last Updated: 2025-09-12
Author: Platform Architecture Team
Target Environment: Production / Edge CDN / AI Pipeline

1. Document Control & Revision History

Version | Date | Author | Change Description | Review Status
2.4.1 | 2025-09-12 | Architectural Ops | Updated KG graph schema v4; added RAG pipeline latency SLAs | Approved
2.4.0 | 2025-08-28 | AI Systems | Introduced vectorized semantic search endpoints; deprecated legacy keyword indexing | Approved
2.3.2 | 2025-07-15 | Security Team | Enforced OIDC SSO requirements; updated CORS policies for multi-region CDNs | Approved
2.3.0 | 2025-06-01 | Core Engineering | Initial publication of /type-spec-pdf format; API v3 stabilization | Approved

2. System Architecture Overview

Scope: Defines the reference architecture for the Aevum Encyclopedia platform, including ingestion, storage, indexing, AI processing, and public API delivery layers.

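The platform runs as a set of microservices deployed across multi-region AWS/GCP infrastructure. Its core components correspond to the layers named in the scope: ingestion, storage, indexing, AI processing, and public API delivery.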

3. Core Data Models

3.1 Article Entity

{
  "id": "ae:art:9f8e7d6c5b4a",
  "slug": "quantum-entanglement-fundamentals",
  "title": "Quantum Entanglement: Principles & Applications",
  "status": "published",
  "language": "en",
  "categories": ["physics", "quantum-mechanics", "theoretical-science"],
  "word_count": 8420,
  "last_reviewed": "2025-09-10T14:22:00Z",
  "verification_score": 0.987,
  "metadata": {
    "authors": ["dr.kim.r", "prof.tanaka.m"],
    "peer_review_cycle": 4,
    "citation_count": 312,
    "graph_nodes_linked": 47
  }
}
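
A consumer of this entity will usually want to validate it on receipt. The following is a minimal sketch, not a reference implementation: the `Article` class and `parse_article` function are hypothetical names, and the checks on the `ae:art:` prefix and on the score range are assumptions inferred from the sample above rather than rules stated in this specification.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Article:
    id: str
    slug: str
    title: str
    status: str
    language: str
    categories: tuple
    word_count: int
    last_reviewed: datetime
    verification_score: float
    metadata: dict


def parse_article(raw: str) -> Article:
    """Parse an Article Entity JSON document and apply basic sanity checks."""
    data = json.loads(raw)

    # Assumption: every article id carries the "ae:art:" prefix seen in the sample.
    if not data["id"].startswith("ae:art:"):
        raise ValueError(f"unexpected article id: {data['id']!r}")

    # Assumption: verification_score is a value in the closed range [0, 1].
    score = float(data["verification_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"verification_score out of range: {score}")

    # datetime.fromisoformat() only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    reviewed = datetime.fromisoformat(data["last_reviewed"].replace("Z", "+00:00"))

    return Article(
        id=data["id"],
        slug=data["slug"],
        title=data["title"],
        status=data["status"],
        language=data["language"],
        categories=tuple(data["categories"]),
        word_count=int(data["word_count"]),
        last_reviewed=reviewed,
        verification_score=score,
        metadata=dict(data.get("metadata", {})),
    )
```

Storing `last_reviewed` as a timezone-aware `datetime` keeps later comparisons against review deadlines unambiguous.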

3.2 Contributor Profile

Field | Type | Description | Constraints
contrib_id | UUID v7 | Unique identifier for verified contributors | Immutable
domain_certifications | Array[String] | Verified academic/professional credentials | Max 12, audit-tracked
edit_weight | Float (0.0-1.0) | Trust score based on historical accuracy | Decay: 0.02/year
last_active | ISO 8601 | Timestamp of last successful submission | Required
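
The `edit_weight` decay constraint can be illustrated with a short calculation. This sketch assumes the decay is linear, is measured from `last_active`, and is clamped to the 0.0-1.0 range; the specification states only the rate (0.02/year), so those three choices and the function name `decayed_edit_weight` are assumptions.

```python
from datetime import datetime, timedelta, timezone

DECAY_PER_YEAR = 0.02                 # rate from the Constraints column
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average Julian year


def decayed_edit_weight(weight: float, last_active: datetime, now: datetime) -> float:
    """Return edit_weight after linear decay since last_active, clamped to [0, 1]."""
    elapsed_s = max((now - last_active).total_seconds(), 0.0)
    years = elapsed_s / SECONDS_PER_YEAR
    return min(1.0, max(0.0, weight - DECAY_PER_YEAR * years))


if __name__ == "__main__":
    now = datetime(2025, 9, 12, tzinfo=timezone.utc)
    two_years_ago = now - timedelta(days=730.5)
    print(decayed_edit_weight(0.90, two_years_ago, now))
```

Under these assumptions a contributor with weight 0.90 who has been inactive for two years would drop to roughly 0.86.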

4. API Specification (v3 Stable)

4.1 Endpoint Routing

Method | Path | Description | Auth Required
GET | /v3/articles/{id} | Retrieve full article with metadata & citations | No (Public)
GET | /v3/search/semantic | Vector-based concept search with similarity ranking | API Key
POST | /v3/articles/draft | Submit new article for AI verification queue | OIDC + Contributor Token
GET | /v3/graph/relationships/{entity} | Fetch connected nodes & edge weights | API Key
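
For client or gateway code, the routing table can be expressed as data and matched against incoming requests. The sketch below is illustrative only: the `ROUTES` list mirrors the table above, while the auth labels (`none`, `api_key`, `oidc+contributor_token`) and the function `required_auth` are hypothetical names chosen for this example.

```python
import re

# (method, path template, auth label); labels are illustrative shorthand.
ROUTES = [
    ("GET", "/v3/articles/{id}", "none"),
    ("GET", "/v3/search/semantic", "api_key"),
    ("POST", "/v3/articles/draft", "oidc+contributor_token"),
    ("GET", "/v3/graph/relationships/{entity}", "api_key"),
]


def _compile(template: str) -> "re.Pattern[str]":
    # Replace each "{name}" placeholder with a named group matching one path
    # segment. The templates above contain no other regex metacharacters.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", template)
    return re.compile(f"^{pattern}$")


_COMPILED = [(method, _compile(path), auth) for method, path, auth in ROUTES]


def required_auth(method: str, path: str):
    """Return the auth label for a request, or None if no route matches."""
    for route_method, pattern, auth in _COMPILED:
        if route_method == method.upper() and pattern.match(path):
            return auth
    return None
```

Matching on method first keeps `POST /v3/articles/draft` distinct from `GET /v3/articles/{id}`, which would otherwise also match the path `/v3/articles/draft`.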

4.2 Request/Response Contract (Semantic Search)

GET /v3/search/semantic?q=CRISPR%20gene%20editing%20ethics&limit=5&rank_by=relevance

Headers:
  Authorization: Bearer <api_key>
  X-Request-ID: ae-req-88f2a1b9

Response (200 OK):
{
  "query_vector": [0.12, -0.45, 0.88, ...],
  "results": [
    {
      "id": "ae:art:c1d2e3f4",
      "title": "Ethical Frameworks in Genomic Editing",
      "similarity_score": 0.942,
      "match_dimensions": ["bioethics", "crispr-cas9", "regulatory-policy"],
      "snippet": "Current international guidelines emphasize transparent oversight..."
    }
  ],
  "metadata": {
    "latency_ms": 42,
    "vector_index_version": "v4.2.1",
    "cache_hit": false
  }
}
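
A client for this contract needs to percent-encode the query text and read the `results` array. The following is a sketch under stated assumptions: `BASE_URL` is a placeholder host (the specification does not name one), and `build_semantic_search_url` and `top_results` are hypothetical helper names. No network call is made.

```python
import json
from urllib.parse import quote, urlencode

BASE_URL = "https://api.example.invalid"  # placeholder, not a real host


def build_semantic_search_url(query: str, limit: int = 5, rank_by: str = "relevance") -> str:
    """Build the semantic search URL with a percent-encoded query string."""
    params = {"q": query, "limit": limit, "rank_by": rank_by}
    # quote_via=quote encodes spaces as %20 instead of "+".
    return f"{BASE_URL}/v3/search/semantic?{urlencode(params, quote_via=quote)}"


def top_results(raw_body: str, min_score: float = 0.9):
    """Return (id, similarity_score) pairs at or above min_score, best first."""
    body = json.loads(raw_body)
    hits = [
        (item["id"], item["similarity_score"])
        for item in body.get("results", [])
        if item["similarity_score"] >= min_score
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(build_semantic_search_url("CRISPR gene editing ethics"))
```

The request would still need the `Authorization` and `X-Request-ID` headers shown above; they are omitted here because the sketch stops short of sending anything.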

5. AI & Knowledge Graph Pipeline

The verification and enrichment pipeline operates in four sequential stages:

  1. Ingestion & Normalization: Raw markdown/HTML converted to standardized internal schema. Citations parsed via CrossRef/DOI lookup.
  2. Entity Resolution: Named Entity Recognition (NER) maps terms to Wikidata/Neo4j knowledge base. Disambiguation applied via context vectors.
  3. Fact-Checking Layer: Dual-model evaluation (LLM-A for claim extraction, LLM-B for source verification). Claims marked verified, pending, or disputed.
  4. Graph Propagation: Validated entities injected into relationship graph. Edge weights updated based on citation frequency and cross-disciplinary references.

SLA Target: 95th percentile pipeline latency ≤ 3.2 s per article (avg. 6k words). Batch processing throughput: 12,000 articles/hour.
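
The SLA figures can be checked against measured latencies with a few lines of arithmetic. This is a sketch, assuming the nearest-rank definition of a percentile (the specification does not say which definition applies) and using Little's law for a rough concurrency estimate; the function names are hypothetical.

```python
import math

P95_SLA_SECONDS = 3.2             # from the SLA target
BATCH_ARTICLES_PER_HOUR = 12_000  # from the SLA target


def percentile_nearest_rank(samples, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_sla(latencies_s) -> bool:
    """True if the 95th percentile latency is within the 3.2 s target."""
    return percentile_nearest_rank(latencies_s, 95) <= P95_SLA_SECONDS


def min_concurrency(articles_per_hour: float, latency_s: float) -> int:
    """Little's law: items in flight = arrival rate x time in system."""
    return math.ceil(articles_per_hour / 3600 * latency_s)


if __name__ == "__main__":
    print(min_concurrency(BATCH_ARTICLES_PER_HOUR, P95_SLA_SECONDS))
```

Under these assumptions, sustaining 12,000 articles/hour with every article taking the full 3.2 s would require about 11 articles in flight at once; real concurrency needs depend on the actual latency distribution.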

6. Security, Compliance & Data Governance

Domain | Standard/Policy | Implementation
Authentication | OAuth 2.1 / OpenID Connect | OIDC provider integration with hardware-backed MFA for editorial roles
Data Encryption | AES-256-GCM / TLS 1.3 | At-rest (S3/EBS), in-transit (mTLS between services), key rotation every 90 days
Privacy Compliance | GDPR, CCPA, FERPA | Pseudonymized contributor data, right-to-erasure hooks, regional data residency (EU/US/APAC)
Content Moderation | Automated + Human Review | Toxicity filtering, plagiarism detection (Turnitin API), dispute resolution queue
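
Two of the controls above lend themselves to a small illustration: the 90-day key rotation interval and pseudonymization of contributor identifiers. The sketch below is an assumption-laden example, not the platform's mechanism: the specification does not say how pseudonyms are derived, so the keyed HMAC-SHA256 approach and the names `rotation_due` and `pseudonymize` are hypothetical.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

KEY_ROTATION_PERIOD = timedelta(days=90)  # from the Data Encryption row


def rotation_due(key_created_at: datetime, now: datetime) -> bool:
    """True once a key has reached the 90-day rotation interval."""
    return now - key_created_at >= KEY_ROTATION_PERIOD


def pseudonymize(contrib_id: str, secret: bytes) -> str:
    """Derive a stable pseudonym from a contributor id with a keyed hash.

    Assumption: HMAC-SHA256 is used purely for illustration. The same id and
    secret always give the same pseudonym; without the secret the mapping
    cannot be recomputed.
    """
    return hmac.new(secret, contrib_id.encode("utf-8"), hashlib.sha256).hexdigest()
```

A keyed hash, unlike a plain hash, prevents anyone without the secret from confirming a guessed identifier, and discarding the secret is one possible way to support erasure requests.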

7. Deployment & Infrastructure

7.1 Environment Matrix

Environment | Region | Scaling Policy | DB Replica Factor
Production | us-east-1, eu-west-1, ap-southeast-1 | KEDA auto-scaling on CPU > 70% or queue depth > 500 | 3 (Active-Active)
Staging | us-west-2 | Manual scaling, nightly sync from prod snapshot | 1 (Read-only)
Dev/Sandbox | Local/Docker Compose | Fixed allocation | 1 (Ephemeral)
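
The production scaling condition reduces to a simple predicate. The sketch below shows only that trigger logic, reading the thresholds as CPU above 70% or queue depth above 500; it is not KEDA configuration, and the function name `should_scale_out` is hypothetical.

```python
CPU_THRESHOLD_PERCENT = 70.0  # from the Production scaling policy
QUEUE_DEPTH_THRESHOLD = 500   # from the Production scaling policy


def should_scale_out(cpu_percent: float, queue_depth: int) -> bool:
    """True when either production scaling trigger is exceeded.

    Both comparisons are strict, matching the ">" in the policy.
    """
    return cpu_percent > CPU_THRESHOLD_PERCENT or queue_depth > QUEUE_DEPTH_THRESHOLD
```

Because the triggers are combined with "or", a backlog alone is enough to scale out even when CPU usage is low.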

7.2 CI/CD Pipeline

pipeline:
  stages:
    - lint & unit tests (Go, Python, TypeScript)
    - schema migration validation (Prisma/Liquibase)
    - security scan (Trivy, Snyk)
    - canary deployment (10% traffic, 30m observation)
    - full rollout + rollback guardrails
    - post-deploy integration smoke tests
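
The canary stage sends 10% of traffic to the new release. One common way to split traffic is to hash a stable request key into buckets, sketched below. The specification does not describe how the split is made, so the hashing scheme, the choice of key, and the name `routes_to_canary` are all assumptions.

```python
import hashlib

CANARY_PERCENT = 10  # from the canary deployment stage


def routes_to_canary(key: str, percent: int = CANARY_PERCENT) -> bool:
    """Deterministically assign a request key to the canary or stable pool.

    The key (for example a user or session id) is hashed into one of 100
    buckets; buckets below `percent` go to the canary. The same key always
    lands in the same pool.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent
```

Hashing a stable key rather than picking at random keeps a given caller on one version for the whole 30-minute observation window, which makes canary metrics easier to interpret.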

8. Appendix: Error Codes & Rate Limits

Code | HTTP Status | Description | Retry Strategy
AES-4001 | 400 | Malformed JSON or missing required fields | Fix payload
AES-4010 | 401 | Expired or invalid API token | Re-authenticate
AES-4290 | 429 | Rate limit exceeded (1000 req/min default) | Exponential backoff
AES-5030 | 503 | AI verification pipeline saturated | Retry with jitter (max 5 attempts)
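
The retry strategies for AES-4290 and AES-5030 can be combined into one client-side helper. This is a sketch under assumptions: it applies the same policy to both codes (exponential backoff with full jitter, at most 5 attempts), whereas the table gives the attempt cap only for AES-5030. The base delay, the delay ceiling, and the `send` callable are hypothetical.

```python
import random
import time

RETRYABLE_CODES = {"AES-4290", "AES-5030"}
MAX_ATTEMPTS = 5      # stated for AES-5030; applied to AES-4290 by assumption
BASE_DELAY_S = 0.5    # assumption: not given in the specification
MAX_DELAY_S = 30.0    # assumption: not given in the specification


def backoff_delay(attempt: int, rng=random.random) -> float:
    """Full-jitter delay before the retry that follows attempt number `attempt`."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (attempt - 1))
    return rng() * ceiling


def call_with_retry(send, sleep=time.sleep, rng=random.random):
    """Call send() until the result is not retryable or attempts run out.

    `send` is a hypothetical callable returning (error_code_or_None, payload).
    Non-retryable codes such as AES-4001 and AES-4010 are returned at once.
    """
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code, payload = send()
        if code not in RETRYABLE_CODES or attempt == MAX_ATTEMPTS:
            return code, payload
        sleep(backoff_delay(attempt, rng))
```

Passing `sleep` and `rng` as parameters keeps the helper testable without real waiting; randomizing each delay spreads retries out so that many clients do not hit a saturated pipeline at the same instant.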