High-performance, distributed architecture designed to serve 15M+ lexical entries and 5M+ daily queries with sub-50ms latency across 100+ languages.
Dictionary uses a microservices architecture deployed across multiple availability zones. Traffic is routed through a global CDN and API gateway, with stateless application layers backed by distributed caches and event-driven data pipelines.
Each service is independently deployable, horizontally scalable, and communicates via gRPC or async event buses.
Central entry point handling authentication, rate limiting, request routing, and protocol translation. Supports GraphQL and REST endpoints with automatic versioning.
Kong / Envoy
Powers full-text lexical search across 15M+ entries. Uses inverted indices, n-gram tokenization, and phonetic matching for typo tolerance and fuzzy search.
Elasticsearch / Meilisearch
Context-aware definition generation, synonym extraction, part-of-speech tagging, and real-time translation. Fine-tuned transformer models are served via optimized inference endpoints.
PyTorch / vLLM
Primary service for word metadata, etymology, usage examples, and audio pronunciations. Implements caching strategies and read replicas for high throughput.
Go / Rust
Async processing for indexing, audio generation, model inference, and analytics. Provides exactly-once delivery and dead-letter queue handling.
Kafka / Redpanda
OAuth 2.0 / OIDC compliance, JWT rotation, session management, and role-based access control for enterprise tenants and API keys.
Keycloak / Auth0
How a word query traverses the system from client to response.
User submits query via web app, mobile SDK, or REST/GraphQL API. Request includes language code, context flags, and authentication token.
CDN edge node validates the JWT, then checks the distributed Redis cache. 85% of hot queries are served directly from edge cache in <10ms.
Cache misses route to API Gateway → Search Service. Query is normalized, stemmed, and passed to the AI engine for contextual enrichment.
Lexical Service fetches metadata, audio URLs, and cross-references. Results are aggregated, serialized, cached, and returned to client.
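The cache-then-fetch flow above can be illustrated with a minimal Python sketch. It is not the production implementation: the `QueryPipeline` class, the `normalize` helper, and the in-memory dictionaries standing in for the Redis cache and the Lexical Service are all hypothetical, and JWT validation, stemming, and AI enrichment are left out for brevity.

```python
import unicodedata
from dataclasses import dataclass, field
from typing import Optional


def normalize(query: str) -> str:
    """Apply Unicode NFKC normalization, trim whitespace, and lowercase."""
    return unicodedata.normalize("NFKC", query).strip().lower()


@dataclass
class QueryPipeline:
    # Hypothetical stand-in for the Lexical Service: (lang, word) -> entry.
    lexicon: dict
    # Hypothetical stand-in for the distributed Redis cache.
    cache: dict = field(default_factory=dict)
    # Counts how often a request fell through to the backing service.
    backend_calls: int = 0

    def lookup(self, word: str, lang: str) -> Optional[dict]:
        key = (lang, normalize(word))

        # Cache hit: answer without touching the backing service.
        if key in self.cache:
            return self.cache[key]

        # Cache miss: fetch from the backing service.
        self.backend_calls += 1
        entry = self.lexicon.get(key)

        # Only successful lookups are cached in this sketch.
        if entry is not None:
            self.cache[key] = entry
        return entry


pipeline = QueryPipeline(
    lexicon={("en", "serendipity"): {"pos": "noun", "audio_url": None}}
)
first = pipeline.lookup("  Serendipity ", "en")   # miss, then cached
second = pipeline.lookup("serendipity", "en")     # served from cache
```

Normalizing before building the cache key is what lets differently cased or padded spellings of the same word share one cache entry. A real deployment would also need a TTL and an invalidation path for updated entries.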
Production-grade tools selected for performance, observability, and developer experience.
Built for resilience, compliance, and global scale.