Implementation Notes

Technical reference, integration guidelines, and deployment documentation for the Aevum Encyclopedia platform. Updated for v3.2.0.

System Overview

Aevum Encyclopedia is a distributed knowledge platform built on a microservices architecture. It combines deterministic search indexing, dense vector retrieval, and a RAG (Retrieval-Augmented Generation) pipeline to deliver verified, multilingual encyclopedia content.

ℹ️ Platform Scope

This documentation covers backend integration, API usage, data pipeline configuration, and deployment requirements. Frontend SDK details are available in the companion Client Integration Guide.

Architecture

The system is organized into four primary layers:

  • Ingestion Layer: Handles content submission, OCR, multilingual NLP preprocessing, and metadata extraction.
  • Storage & Indexing: PostgreSQL (relational), Elasticsearch (lexical/BM25), and Weaviate/Qdrant (vector embeddings).
  • AI & Orchestration: LLM routing, fact-checking classifiers, citation validation, and response synthesis.
  • Delivery Layer: REST/GraphQL APIs, CDN caching, WebSocket event streams, and edge function routing.

⚠️ Data Consistency Note

Vector indexes are updated asynchronously and can lag behind writes by up to ~120 seconds. Critical production queries should go through the hybrid search endpoint, which combines BM25 and vector results at query time.

Quick Start

Initialize a local development environment using Docker Compose. Ensure Docker Engine v24+ and Docker Compose v2.20+ are installed.

# Clone and initialize
git clone https://github.com/aevum-encyclopedia/platform-core.git
cd platform-core

# Configure environment
cp .env.example .env
# Edit .env with your API keys and DB credentials

# Start services
docker compose up -d

# Verify health endpoints
curl -s http://localhost:8080/health | jq

Once running, the developer dashboard will be available at http://localhost:3000/dev. The API gateway listens on port 8080.
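
Services can take a few seconds to come up after docker compose up -d, so scripted setups may want to poll the health endpoint instead of calling it once. The following is a minimal sketch using only the Python standard library. The function names, the retry counts, and the choice to treat any non-JSON or failed response as "not ready" are assumptions for illustration; the shape of the health payload is not documented here, so the code returns it unchanged.

```python
import http.client
import json
import time
import urllib.request
from typing import Callable, Optional

HEALTH_URL = "http://localhost:8080/health"  # gateway port from the Quick Start


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> Optional[dict]:
    """Return the parsed health payload, or None if the gateway is not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return None


def wait_for_health(
    fetch: Callable[[], Optional[dict]] = fetch_health,
    attempts: int = 30,    # hypothetical default
    delay_s: float = 2.0,  # hypothetical default
) -> Optional[dict]:
    """Poll until fetch() returns a payload or the attempts run out."""
    for attempt in range(attempts):
        payload = fetch()
        if payload is not None:
            return payload
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return None
```

Calling wait_for_health() with no arguments polls the local gateway; passing a custom fetch callable makes the loop easy to exercise without a running stack.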

Authentication

Aevum uses JWT-based authentication with rotating refresh tokens. All API requests require an Authorization: Bearer <token> header.

Scope                Description                           Rate Limit
read:articles        Retrieve encyclopedia entries         1200 req/min
write:contributions  Submit drafts & edits                 150 req/min
admin:index          Trigger reindexing & pipeline jobs    50 req/min
ai:generate          Access RAG & summarization endpoints  300 req/min

# Token exchange example
POST /auth/token
{
  "grant_type": "client_credentials",
  "client_id": "your_client_id",
  "client_secret": "your_secret",
  "audience": "https://api.aevum-encyclopedia.com"
}
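
The token exchange above can be prepared from Python roughly as follows. This is a sketch under stated assumptions: it assumes the API host is the same URL as the audience value, and it stops at building the request because the field names in the token response are not documented here. build_token_request and bearer_headers are hypothetical helper names.

```python
import json
import urllib.request

AUDIENCE = "https://api.aevum-encyclopedia.com"
API_BASE = AUDIENCE  # assumption: the API is served from the audience URL


def build_token_request(
    client_id: str, client_secret: str, base_url: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) the POST /auth/token request."""
    body = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": AUDIENCE,
    }
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bearer_headers(token: str) -> dict:
    """Header required on every API request once a token has been issued."""
    return {"Authorization": f"Bearer {token}"}
```

The request object can be passed to urllib.request.urlopen to perform the exchange; the returned token then goes into bearer_headers for subsequent calls.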

Core Endpoints

Article Retrieval

GET /v3/articles/{id} returns a structured article object with embedded citations, multilingual variants, and revision metadata.

Hybrid Search

POST /v3/search/hybrid accepts a JSON body containing the query text, optional filters (such as language, domain, and a minimum accuracy score), the vector weight, and a result limit.

{
  "query": "quantum error correction mechanisms",
  "filters": { "domain": "physics", "min_accuracy_score": 0.85 },
  "vector_weight": 0.7,
  "max_results": 10
}
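
A small helper can assemble this request body and catch obviously invalid values before they reach the API. The sketch below is illustrative only: build_hybrid_query is a hypothetical name, the default values are copied from the sample request rather than from documented defaults, and the accepted ranges (0 to 1 for weights and scores) are assumptions.

```python
from typing import Any, Dict, Optional


def build_hybrid_query(
    query: str,
    domain: Optional[str] = None,
    min_accuracy_score: Optional[float] = None,
    vector_weight: float = 0.7,  # value from the sample request, not a documented default
    max_results: int = 10,       # likewise taken from the sample request
) -> Dict[str, Any]:
    """Build the JSON body for POST /v3/search/hybrid."""
    if not query.strip():
        raise ValueError("query must not be empty")
    # Assumed ranges: the documentation does not state the valid bounds.
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    if min_accuracy_score is not None and not 0.0 <= min_accuracy_score <= 1.0:
        raise ValueError("min_accuracy_score must be between 0 and 1")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")

    filters: Dict[str, Any] = {}
    if domain is not None:
        filters["domain"] = domain
    if min_accuracy_score is not None:
        filters["min_accuracy_score"] = min_accuracy_score

    body: Dict[str, Any] = {
        "query": query,
        "vector_weight": vector_weight,
        "max_results": max_results,
    }
    if filters:
        body["filters"] = filters  # omit the key entirely when no filter is set
    return body
```

Calling build_hybrid_query("quantum error correction mechanisms", domain="physics", min_accuracy_score=0.85) reproduces the sample body shown above.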

Ingestion Pipeline

Content flows through a deterministic state machine before becoming publicly indexable:

  1. Raw Submission: Markdown/PDF/HTML accepted via API or contributor dashboard.
  2. NLP Processing: Entity extraction, language detection, and section parsing.
  3. Fact-Verification: Cross-referencing against trusted corpora and primary sources.
  4. Vectorization: Chunking and embedding via text-embedding-3-large or custom fine-tuned models.
  5. Index Sync: BM25 and vector indexes updated; CDN cache invalidated.

✅ Pipeline Monitoring

Use the GET /v3/pipeline/status/{job_id} endpoint to track ingestion progress. Webhooks can be configured for article.published and verification.failed events.
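
The five ingestion stages listed above form a strictly linear state machine, which can be modelled in a few lines. This is a sketch of the idea, not the platform's implementation: the enum member names are hypothetical, and it assumes stages cannot be skipped or revisited, which the documentation implies but does not state outright.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Ingestion stages, numbered in the order the documentation lists them."""
    RAW_SUBMISSION = 1
    NLP_PROCESSING = 2
    FACT_VERIFICATION = 3
    VECTORIZATION = 4
    INDEX_SYNC = 5


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None after the final stage."""
    if stage is Stage.INDEX_SYNC:
        return None
    return Stage(stage.value + 1)


def advance(current: Stage, target: Stage) -> Stage:
    """Move to target only if it is the immediate successor of current."""
    if next_stage(current) is not target:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

Rejecting every transition except the immediate successor is what makes the flow deterministic: a job observed at VECTORIZATION must already have passed fact verification.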

Search & AI Configuration

The search engine uses a hybrid scoring formula: final_score = α * BM25 + (1-α) * VectorSimilarity. Default α is 0.4.
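
As a worked illustration of the formula, the sketch below computes the blended score and ranks two hypothetical documents. It assumes both inputs are already normalized to the range 0 to 1 (raw BM25 scores are unbounded, so some normalization step would be needed first); the document IDs and scores are invented for the example.

```python
from typing import List, Tuple


def hybrid_score(bm25: float, vector_similarity: float, alpha: float = 0.4) -> float:
    """final_score = alpha * BM25 + (1 - alpha) * VectorSimilarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * bm25 + (1.0 - alpha) * vector_similarity


def rank(candidates: List[Tuple[str, float, float]], alpha: float = 0.4) -> List[str]:
    """Sort (doc_id, bm25, vector_similarity) tuples by descending hybrid score."""
    scored = sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2], alpha),
        reverse=True,
    )
    return [doc_id for doc_id, _, _ in scored]


# Hypothetical candidates: "a" is a strong lexical match, "b" a strong semantic match.
candidates = [("a", 0.9, 0.2), ("b", 0.3, 0.8)]
ranking = rank(candidates)
```

With the default α of 0.4, document "a" scores 0.4 × 0.9 + 0.6 × 0.2 = 0.48 and document "b" scores 0.4 × 0.3 + 0.6 × 0.8 = 0.60, so "b" ranks first. Raising α toward 1 shifts weight to the lexical match and reverses the order.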

AI generation endpoints support structured output via JSON schema enforcement. Always specify response_format: "json_object" for programmatic consumption.

Parameter            Type   Default  Description
temperature          float  0.2      Controls generation randomness. Lower values give more deterministic output.
max_tokens           int    4096     Maximum response length in tokens.
enforce_citations    bool   true     Requires inline source references in output.
fallback_to_lexical  bool   true     Triggers BM25-only search if vector recall < 0.6.
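
The generation parameters and their defaults can be captured in a small typed object on the client side. The sketch below is an assumption-laden illustration: GenerationOptions and use_lexical_only are hypothetical names, and sending response_format as a top-level field next to the other parameters is a guess at the request shape, which is not specified here.

```python
from dataclasses import asdict, dataclass

RECALL_THRESHOLD = 0.6  # fallback threshold from the parameter table


@dataclass(frozen=True)
class GenerationOptions:
    """Client-side mirror of the documented parameters and defaults."""
    temperature: float = 0.2
    max_tokens: int = 4096
    enforce_citations: bool = True
    fallback_to_lexical: bool = True

    def to_payload(self) -> dict:
        """Serialize the options, always requesting structured JSON output."""
        payload = asdict(self)
        payload["response_format"] = "json_object"  # placement in the body is assumed
        return payload


def use_lexical_only(options: GenerationOptions, vector_recall: float) -> bool:
    """Mirror the documented rule: BM25-only when vector recall drops below 0.6."""
    return options.fallback_to_lexical and vector_recall < RECALL_THRESHOLD
```

For example, GenerationOptions(temperature=0.0).to_payload() yields a body with the documented defaults for every field except temperature.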

Article Schema

Standardized JSON structure for all encyclopedia entries. Versioned for backward compatibility.


{
  "id": "uuid-v4",
  "slug": "topic-name",
  "title": { "en": "...", "es": "...", "fr": "..." },
  "content": [ { "type": "paragraph", "text": "..." }, { "type": "citation", "source_id": "..." } ],
  "metadata": {
    "domain": "science|history|tech|...",
    "difficulty": "beginner|intermediate|advanced",
    "last_verified": "ISO-8601",
    "contribution_count": 42
  },
  "vector_id": "hex-string",
  "status": "draft|review|published|archived"
}
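
Consumers that ingest or cache article objects may want a lightweight sanity check against this structure. The following sketch validates only what the schema above makes explicit, namely the top-level keys and the enumerated status and difficulty values. Treating every listed top-level key except vector_id as required is an assumption, and validate_article is a hypothetical helper name.

```python
from typing import Any, Dict, List

STATUSES = {"draft", "review", "published", "archived"}
DIFFICULTIES = {"beginner", "intermediate", "advanced"}
# Assumption: vector_id may be absent (for example before vectorization), the rest are required.
REQUIRED_FIELDS = ("id", "slug", "title", "content", "metadata", "status")


def validate_article(article: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    errors = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in article]

    status = article.get("status")
    if status is not None and status not in STATUSES:
        errors.append(f"invalid status: {status!r}")

    metadata = article.get("metadata")
    if isinstance(metadata, dict):
        difficulty = metadata.get("difficulty")
        if difficulty is not None and difficulty not in DIFFICULTIES:
            errors.append(f"invalid difficulty: {difficulty!r}")

    return errors
```

The domain field is deliberately left unchecked because the schema lists it as an open-ended set.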

Environment Setup

Production deployments require the following environment variables. Secrets should be managed via HashiCorp Vault or AWS Secrets Manager.

Variable         Required  Description
AUV_DB_URI       Yes       PostgreSQL connection string
AUV_VECTOR_HOST  Yes       Qdrant/Weaviate endpoint
AUV_LLM_API_KEY  Yes       Primary AI provider credentials
AUV_CACHE_TTL    No        Redis TTL in seconds (default: 3600)
AUV_LOG_LEVEL    No        debug|info|warn|error

⚠️ Production Hardening

Set AUV_RATE_LIMIT=strict and configure WAF rules for all public endpoints. Vector databases must be deployed inside a VPC, in private subnets only.
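
A service can fail fast at startup when a required variable is missing, instead of failing later on first use. The sketch below loads the variables from the table above into a typed settings object. The Settings class and load_settings function are hypothetical, and the default log level of info is an assumption, since the documentation lists the allowed values but no default.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("AUV_DB_URI", "AUV_VECTOR_HOST", "AUV_LLM_API_KEY")
LOG_LEVELS = {"debug", "info", "warn", "error"}


@dataclass(frozen=True)
class Settings:
    db_uri: str
    vector_host: str
    llm_api_key: str
    cache_ttl: int = 3600     # documented default, in seconds
    log_level: str = "info"   # assumed default; not stated in the table


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and validate configuration, raising if anything required is absent."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))

    log_level = env.get("AUV_LOG_LEVEL", "info")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unsupported AUV_LOG_LEVEL: {log_level!r}")

    return Settings(
        db_uri=env["AUV_DB_URI"],
        vector_host=env["AUV_VECTOR_HOST"],
        llm_api_key=env["AUV_LLM_API_KEY"],
        cache_ttl=int(env.get("AUV_CACHE_TTL", "3600")),
        log_level=log_level,
    )
```

Accepting any mapping rather than reading os.environ directly keeps the loader testable and lets secrets fetched from Vault or AWS Secrets Manager be passed in as a plain dictionary.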

Security & Compliance

Aevum complies with GDPR, CCPA, and SOC 2 Type II requirements. Key implementation requirements:

  • All PII is pseudonymized at rest using AES-256-GCM.
  • Contribution history is immutable; each edit is recorded as a new entry in an append-only log.
  • API keys are scoped and rotatable. Revocation propagates within 5 seconds.
  • Automated DLP scanning runs on all ingested content to prevent copyright/regulatory violations.

Maintenance & Versioning

The platform follows Semantic Versioning (MAJOR.MINOR.PATCH). Breaking changes are only introduced in MAJOR releases, with a 6-month deprecation window for affected endpoints.
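
Under this policy, a client or deployment script can tell whether an upgrade may contain breaking changes by comparing MAJOR components. The helper names below are hypothetical, and the parser is a sketch that handles only plain MAJOR.MINOR.PATCH strings (optionally prefixed with "v"), not pre-release or build metadata suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' (with an optional leading 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_breaking_upgrade(current: str, target: str) -> bool:
    """True when the target has a higher MAJOR version than the current one."""
    return parse_version(target)[0] > parse_version(current)[0]
```

For instance, is_breaking_upgrade("2.9.4", "v3.2.0") is true, while moving from 3.1.0 to 3.2.0 is not flagged.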

Recommended Maintenance Schedule

  • Weekly: Vector index compaction and cache pruning
  • Monthly: Model fine-tuning updates and schema migrations
  • Quarterly: Security audits, penetration testing, and disaster recovery drills

🔄 Migration Guide

When upgrading from v2.x to v3.x, back up the database first, then run docker compose run --rm migrations upgrade before starting services.