Implementation Notes

Technical reference, integration guidelines, and deployment documentation for the Aevum Encyclopedia platform. Updated for v3.2.0.

System Overview

Aevum Encyclopedia is a distributed knowledge platform built on a microservices architecture. It combines deterministic search indexing, dense vector retrieval, and a RAG (Retrieval-Augmented Generation) pipeline to deliver verified, multi-lingual encyclopedia content.

ℹ️ Platform Scope

This documentation covers backend integration, API usage, data pipeline configuration, and deployment requirements. Frontend SDK details are available in the companion Client Integration Guide.

Architecture

The system is organized into four primary layers:

Ingestion Layer: Handles content submission, OCR, multilingual NLP preprocessing, and metadata extraction.
Storage & Indexing: PostgreSQL (relational), Elasticsearch (lexical/BM25), and Weaviate/Qdrant (vector embeddings).
AI & Orchestration: LLM routing, fact-checking classifiers, citation validation, and response synthesis.
Delivery Layer: REST/GraphQL APIs, CDN caching, WebSocket event streams, and edge function routing.

⚠️ Data Consistency Note

Vector indexes are updated asynchronously and can lag writes by up to roughly 120 seconds. Critical production queries should route through the hybrid search endpoint, which synchronizes BM25 and vector results in real time.

Quick Start

Initialize a local development environment using Docker Compose. Ensure Docker Engine v24+ and Docker Compose v2.20+ are installed.

# Clone and initialize
git clone https://github.com/aevum-encyclopedia/platform-core.git
cd platform-core

# Configure environment
cp .env.example .env
# Edit .env with your API keys and DB credentials

# Start services
docker compose up -d

# Verify health endpoints
curl -s http://localhost:8080/health | jq

Once running, the developer dashboard will be available at http://localhost:3000/dev. The API gateway listens on port 8080.

Authentication

Aevum uses JWT-based authentication with rotating refresh tokens. All API requests require an Authorization: Bearer <token> header.

Scope	Description	Rate Limit
`read:articles`	Retrieve encyclopedia entries	1200 req/min
`write:contributions`	Submit drafts & edits	150 req/min
`admin:index`	Trigger reindexing & pipeline jobs	50 req/min
`ai:generate`	Access RAG & summarization endpoints	300 req/min
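
A client can stay under these limits by throttling itself before the gateway does. The following is a minimal sketch of a sliding-window limiter. The class name and the injectable clock are illustrative and not part of any Aevum SDK, and the source does not say which windowing strategy the server applies.

```python
import time
from collections import deque

# Per-minute limits copied from the scope table above.
SCOPE_LIMITS = {
    "read:articles": 1200,
    "write:contributions": 150,
    "admin:index": 50,
    "ai:generate": 300,
}


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._calls = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            self._calls.append(now)
            return True
        return False
```

Call try_acquire() before each request and back off when it returns False.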

# Token exchange example
POST /auth/token
{
  "grant_type": "client_credentials",
  "client_id": "your_client_id",
  "client_secret": "your_secret",
  "audience": "https://api.aevum-encyclopedia.com"
}
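
The exchange above could be prepared from Python roughly as follows. This is a sketch under assumptions: the base URL is the local gateway from Quick Start, the JSON Content-Type header is assumed, and the helper names are hypothetical. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: the token endpoint is served by the local gateway from Quick Start.
BASE_URL = "http://localhost:8080"


def build_token_request(client_id, client_secret,
                        audience="https://api.aevum-encyclopedia.com"):
    """Build (but do not send) the POST /auth/token request shown above."""
    body = json.dumps({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/auth/token",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


def bearer_headers(token):
    """Header required on authenticated API requests."""
    return {"Authorization": f"Bearer {token}"}
```

Passing the request to urllib.request.urlopen would perform the exchange. The response field that carries the token is not specified here, so read it from the actual response.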

Core Endpoints

Article Retrieval

GET /v3/articles/{id} returns a structured article object with embedded citations, multilingual variants, and revision metadata.

Hybrid Search

POST /v3/search/hybrid accepts a JSON request body whose fields cover the query text, language and domain filtering, and confidence thresholds.

{
  "query": "quantum error correction mechanisms",
  "filters": { "domain": "physics", "min_accuracy_score": 0.85 },
  "vector_weight": 0.7,
  "max_results": 10
}
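
A small helper can assemble this body and catch obvious mistakes before the request is sent. This is a sketch only: the function name is hypothetical, the default values are taken from the example above rather than from documented defaults, and the 0 to 1 range for vector_weight is an assumption.

```python
def build_hybrid_query(query, domain=None, min_accuracy_score=None,
                       vector_weight=0.7, max_results=10):
    """Assemble a request body shaped like the example above, with basic range checks."""
    if not 0.0 <= vector_weight <= 1.0:  # assumed valid range
        raise ValueError("vector_weight must be between 0 and 1")
    if max_results < 1:
        raise ValueError("max_results must be positive")
    filters = {}
    if domain is not None:
        filters["domain"] = domain
    if min_accuracy_score is not None:
        filters["min_accuracy_score"] = min_accuracy_score
    return {
        "query": query,
        "filters": filters,
        "vector_weight": vector_weight,
        "max_results": max_results,
    }
```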

Ingestion Pipeline

Content flows through a deterministic state machine before becoming publicly indexable:

Raw Submission: Markdown/PDF/HTML accepted via API or contributor dashboard.
NLP Processing: Entity extraction, language detection, and section parsing.
Fact-Verification: Cross-referencing against trusted corpora and primary sources.
Vectorization: Chunking and embedding via text-embedding-3-large or custom fine-tuned models.
Index Sync: BM25 and vector indexes updated; CDN cache invalidated.
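
The five stages above can be modelled as a linear state machine. The sketch below uses hypothetical names and covers only the forward path; failure and retry states (for example, a rejected verification) are not described in enough detail here to model.

```python
from enum import Enum


class Stage(Enum):
    RAW_SUBMISSION = 1
    NLP_PROCESSING = 2
    FACT_VERIFICATION = 3
    VECTORIZATION = 4
    INDEX_SYNC = 5


# Each stage has exactly one successor; INDEX_SYNC is terminal.
NEXT_STAGE = {
    Stage.RAW_SUBMISSION: Stage.NLP_PROCESSING,
    Stage.NLP_PROCESSING: Stage.FACT_VERIFICATION,
    Stage.FACT_VERIFICATION: Stage.VECTORIZATION,
    Stage.VECTORIZATION: Stage.INDEX_SYNC,
}


def advance(stage):
    """Return the next stage, or raise if the pipeline is already complete."""
    try:
        return NEXT_STAGE[stage]
    except KeyError:
        raise ValueError(f"{stage.name} is the final stage") from None
```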

✅ Pipeline Monitoring

Use the GET /v3/pipeline/status/{job_id} endpoint to track ingestion progress. Webhooks can be configured for article.published and verification.failed events.
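
Where webhooks are not an option, the status endpoint can be polled. A minimal sketch follows, assuming a caller-supplied fetch_status function that performs the GET request and returns the job state as a string. The terminal state names are placeholders, since the response format is not documented here.

```python
import time


def wait_for_job(job_id, fetch_status, interval=2.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal state or the timeout elapses.

    `fetch_status(job_id)` is expected to call
    GET /v3/pipeline/status/{job_id} and return the state as a string.
    """
    terminal = {"published", "failed"}  # assumed state names
    deadline = clock() + timeout
    while True:
        state = fetch_status(job_id)
        if state in terminal:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {state!r} after {timeout}s")
        sleep(interval)
```

The sleep and clock parameters exist so the loop can be exercised in tests without real delays.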

Search & AI Configuration

The search engine uses a hybrid scoring formula: final_score = α * BM25 + (1-α) * VectorSimilarity. Default α is 0.4.
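
As a worked illustration of the formula, the sketch below computes the blended score. It assumes both inputs are already normalized to a comparable scale, which the formula itself does not guarantee; the function name is hypothetical.

```python
DEFAULT_ALPHA = 0.4  # default α stated above


def hybrid_score(bm25, vector_similarity, alpha=DEFAULT_ALPHA):
    """final_score = α * BM25 + (1 - α) * VectorSimilarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * bm25 + (1.0 - alpha) * vector_similarity
```

With the default α, a document with a BM25 score of 0.5 and a vector similarity of 0.9 scores 0.4 * 0.5 + 0.6 * 0.9 = 0.74.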

AI generation endpoints support structured output via JSON schema enforcement. Always specify response_format: "json_object" for programmatic consumption.

Parameter	Type	Default	Description
`temperature`	float	0.2	Controls generation randomness. Lower = more factual.
`max_tokens`	int	4096	Response length limit.
`enforce_citations`	bool	true	Requires inline source references in output.
`fallback_to_lexical`	bool	true	Falls back to BM25-only retrieval when vector recall is below 0.6.
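
The defaults in this table can be captured in a small settings object. This is a sketch with hypothetical names. Placing response_format alongside the other parameters in one payload is an assumption, and the fallback check only mirrors the documented rule rather than reproducing server behaviour.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationParams:
    # Defaults copied from the parameter table above.
    temperature: float = 0.2
    max_tokens: int = 4096
    enforce_citations: bool = True
    fallback_to_lexical: bool = True

    def to_payload(self):
        # response_format is set as recommended for programmatic consumption.
        return {**asdict(self), "response_format": "json_object"}


def use_lexical_only(params, vector_recall, threshold=0.6):
    """True when the documented rule would trigger BM25-only retrieval."""
    return params.fallback_to_lexical and vector_recall < threshold
```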

Article Schema

Standardized JSON structure for all encyclopedia entries. Versioned for backward compatibility.


{
  "id": "uuid-v4",
  "slug": "topic-name",
  "title": { "en": "...", "es": "...", "fr": "..." },
  "content": [ { "type": "paragraph", "text": "..." }, { "type": "citation", "source_id": "..." } ],
  "metadata": {
    "domain": "science|history|tech|...",
    "difficulty": "beginner|intermediate|advanced",
    "last_verified": "ISO-8601",
    "contribution_count": 42
  },
  "vector_id": "hex-string",
  "status": "draft|review|published|archived"
}
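
A consumer may want to check an entry's basic shape before using it. The sketch below validates only what the schema above makes explicit. Treating every top-level key as required is an assumption, and the domain field is not checked because its list of values is left open.

```python
ALLOWED_STATUS = {"draft", "review", "published", "archived"}
ALLOWED_DIFFICULTY = {"beginner", "intermediate", "advanced"}
# Assumption: all top-level keys shown in the schema are required.
REQUIRED_KEYS = {"id", "slug", "title", "content", "metadata", "vector_id", "status"}


def validate_article(article):
    """Return a list of problems; an empty list means the basic shape is valid."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(article))]
    if problems:
        return problems
    if article["status"] not in ALLOWED_STATUS:
        problems.append(f"unknown status: {article['status']}")
    if article["metadata"].get("difficulty") not in ALLOWED_DIFFICULTY:
        problems.append("unknown difficulty")
    if not isinstance(article["title"], dict) or not article["title"]:
        problems.append("title must map language codes to strings")
    return problems
```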

Environment Setup

Production deployments require the following environment variables. Secrets should be managed via HashiCorp Vault or AWS Secrets Manager.

Variable	Required	Description
`AUV_DB_URI`	Yes	PostgreSQL connection string
`AUV_VECTOR_HOST`	Yes	Qdrant/Weaviate endpoint
`AUV_LLM_API_KEY`	Yes	Primary AI provider credentials
`AUV_CACHE_TTL`	No	Redis TTL in seconds (default: 3600)
`AUV_LOG_LEVEL`	No	debug|info|warn|error
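
A service can read these variables once at startup and fail fast if a required one is missing. A minimal sketch with hypothetical names follows. The "info" default for AUV_LOG_LEVEL is an assumption, since the table gives none.

```python
import os
from dataclasses import dataclass

LOG_LEVELS = ("debug", "info", "warn", "error")
REQUIRED = ("AUV_DB_URI", "AUV_VECTOR_HOST", "AUV_LLM_API_KEY")


@dataclass(frozen=True)
class Settings:
    db_uri: str
    vector_host: str
    llm_api_key: str
    cache_ttl: int
    log_level: str


def load_settings(env=os.environ):
    """Read the variables from the table; raise if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    log_level = env.get("AUV_LOG_LEVEL", "info")  # assumed default
    if log_level not in LOG_LEVELS:
        raise ValueError(f"invalid AUV_LOG_LEVEL: {log_level}")
    return Settings(
        db_uri=env["AUV_DB_URI"],
        vector_host=env["AUV_VECTOR_HOST"],
        llm_api_key=env["AUV_LLM_API_KEY"],
        cache_ttl=int(env.get("AUV_CACHE_TTL", "3600")),  # documented default
        log_level=log_level,
    )
```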

⚠️ Production Hardening

Enable AUV_RATE_LIMIT=strict and configure WAF rules for all public endpoints. Vector databases must be deployed within VPCs with private subnets only.

Security & Compliance

Aevum complies with GDPR, CCPA, and SOC 2 Type II requirements. Key implementation requirements:

All PII is pseudonymized at rest using AES-256-GCM.
Contribution history is immutable; each edit is recorded as a new entry in an append-only log.
API keys are scoped and rotatable. Revocation propagates within 5 seconds.
Automated DLP scanning runs on all ingested content to prevent copyright/regulatory violations.
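
To illustrate the append-only requirement, the sketch below models contribution history as a log that exposes only append and read operations. It is an in-memory toy with hypothetical names, not the platform's storage design.

```python
class ContributionLog:
    """Hypothetical in-memory model of an append-only edit history."""

    def __init__(self):
        self._entries = []

    def append(self, article_id, author, diff):
        # Revision numbers are assigned in order across the whole log.
        entry = (len(self._entries) + 1, article_id, author, diff)
        self._entries.append(entry)
        return entry[0]

    def history(self, article_id):
        # Entries are returned as immutable tuples, so callers cannot
        # alter what has been recorded.
        return tuple(e for e in self._entries if e[1] == article_id)
```

There is deliberately no update or delete method: a correction is expressed as a further appended entry.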

Maintenance & Versioning

The platform follows Semantic Versioning (MAJOR.MINOR.PATCH). Breaking changes are introduced only in MAJOR releases, with a 6-month deprecation window for affected endpoints.
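
Deployment tooling can apply this rule to flag upgrades that need extra care. A minimal sketch with hypothetical helper names, assuming plain MAJOR.MINOR.PATCH strings without pre-release suffixes:

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into ints."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def is_breaking_upgrade(current, target):
    """Under the policy above, only a MAJOR bump may introduce breaking changes."""
    return parse_version(target)[0] > parse_version(current)[0]
```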

Recommended Maintenance Schedule

Weekly: Vector index compaction and cache pruning
Monthly: Model fine-tuning updates and schema migrations
Quarterly: Security audits, penetration testing, and disaster recovery drills

🔄 Migration Guide

When upgrading from v2.x to v3.x, run docker compose run --rm migrations upgrade before starting services. Database backups are strongly recommended.