Monitoring & Observability
Unified visibility across your entire CloudNexus infrastructure. Collect, query, and analyze metrics, logs, and traces in a single pane of glass.
Platform Overview
CloudNexus Observability provides a unified platform for infrastructure and application monitoring. Built on a columnar time-series database and distributed log storage, it delivers sub-second query performance at petabyte scale.
📊 High-Resolution Metrics
Collect custom and system metrics at intervals as short as 1s, with automatic downsampling and retention policies.
📜 Structured Logs
Parse, index, and query logs with full-text search. Correlate log events with metrics and traces automatically.
🔍 Distributed Tracing
End-to-end request tracing with OpenTelemetry native support, service maps, and span-level analysis.
🚨 Intelligent Alerting
Threshold, anomaly-detection, and composite alert rules, with notifications routed to Slack, PagerDuty, and webhooks.
Architecture
The observability pipeline operates on a distributed collector architecture. The CloudNexus Agent (CN-Agent) handles local metric scraping, log tailing, and trace aggregation before forwarding to the regional ingestion gateway. Data is sharded across availability zones for high availability and low-latency query resolution.
```yaml
# CloudNexus Agent Configuration
global:
  endpoint: https://ingest.cloudnexus.io/v2
  api_key: ${CN_API_KEY}
  region: us-east-1
  flush_interval: 10s

metrics:
  collection_interval: 15s
  scrape_targets:
    - localhost:9100  # node_exporter
    - localhost:9090  # cloudnexus_metrics

logs:
  sources:
    - type: file
      path: /var/log/app/*.log
      parser: json
      fields:
        service: payment-gateway
        environment: production
```
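To illustrate what a `flush_interval` implies, here is a minimal sketch of interval-based batching. It is not the CN-Agent implementation: `FlushBuffer`, its `send` callback (standing in for the HTTPS call to the ingestion gateway), and the injected clock are all hypothetical names chosen for this example.

```javascript
// Hypothetical sketch: buffer samples locally and forward them in batches.
// `send` stands in for the request to the regional ingestion gateway.
class FlushBuffer {
  constructor({ flushIntervalMs, send, now = Date.now }) {
    this.flushIntervalMs = flushIntervalMs;
    this.send = send;
    this.now = now;
    this.pending = [];
    this.lastFlush = now();
  }

  // Queue one sample; flush once the interval has elapsed since the last flush.
  add(sample) {
    this.pending.push(sample);
    if (this.now() - this.lastFlush >= this.flushIntervalMs) {
      this.flush();
    }
  }

  // Hand the whole batch to `send` and reset the buffer.
  flush() {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.lastFlush = this.now();
    this.send(batch);
  }
}

// Usage with a fake clock so the behaviour is deterministic.
let clock = 0;
const sent = [];
const buffer = new FlushBuffer({
  flushIntervalMs: 10_000, // mirrors flush_interval: 10s
  send: (batch) => sent.push(batch),
  now: () => clock,
});
buffer.add({ name: 'cpu_usage', value: 0.42 }); // t = 0s, buffered
clock = 5_000;
buffer.add({ name: 'cpu_usage', value: 0.40 }); // t = 5s, buffered
clock = 10_000;
buffer.add({ name: 'cpu_usage', value: 0.45 }); // t = 10s, triggers a flush
console.log(sent.length, sent[0].length);
```

Batching like this trades a few seconds of delivery latency for fewer, larger requests. A real agent would likely also flush on a timer and on shutdown, which this sketch leaves out.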
Metrics Collection & Querying
CloudNexus supports Prometheus exposition format, StatsD, and OpenMetrics. All metrics are automatically tagged with host, region, and instance metadata. Use our PromQL-compatible query language for complex aggregations.
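As a rough illustration of ingestion and tagging, the sketch below parses one sample line in the Prometheus text exposition format and merges in host metadata. The function names, the metadata values, and the rule that target-supplied labels win over metadata are assumptions made for this example, not documented CloudNexus behaviour.

```javascript
// Parse one sample line of the Prometheus text format, e.g.
//   http_requests_total{method="GET",status="200"} 1027
// Simplified: ignores timestamps, escaped quotes, and special values like +Inf.
function parseSampleLine(line) {
  const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line.trim());
  if (!match) return null; // comments (# HELP, # TYPE) and blank lines land here
  const [, name, rawLabels = '', rawValue] = match;
  const labels = {};
  for (const [, key, value] of rawLabels.matchAll(/([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g)) {
    labels[key] = value;
  }
  return { name, labels, value: Number(rawValue) };
}

// Attach host-level metadata without overwriting labels the target already set.
function tagSample(sample, metadata) {
  return { ...sample, labels: { ...metadata, ...sample.labels } };
}

const sample = parseSampleLine('http_requests_total{method="GET",status="200"} 1027');
const tagged = tagSample(sample, {
  host: 'web-01',      // hypothetical host name
  region: 'us-east-1',
  instance: 'i-0abc',  // hypothetical instance ID
});
console.log(tagged);
```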
Supported Metric Types
| Type | Description | Use Case |
|---|---|---|
| Gauge | Instantaneous value | CPU usage, memory, queue depth |
| Counter | Monotonically increasing | HTTP requests, error counts |
| Histogram | Distribution of observations across buckets | Request latency, payload size |
| Summary | Client-side quantiles | Service response times |
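The histogram type is the least intuitive of the four, so a small sketch may help. It assumes Prometheus-style cumulative buckets, where each bucket counts every observation less than or equal to its upper bound (`le`). The `Histogram` class is illustrative only and is not part of any CloudNexus SDK.

```javascript
// Minimal cumulative histogram: each bucket counts every observation
// that is less than or equal to its upper bound (`le`).
class Histogram {
  constructor(upperBounds) {
    this.upperBounds = [...upperBounds].sort((a, b) => a - b);
    this.upperBounds.push(Infinity); // the +Inf bucket catches everything
    this.counts = new Array(this.upperBounds.length).fill(0);
    this.sum = 0;
  }

  observe(value) {
    this.sum += value;
    this.upperBounds.forEach((le, i) => {
      if (le >= value) this.counts[i] += 1;
    });
  }
}

const latency = new Histogram([0.1, 0.5, 1]);
[0.05, 0.2, 0.3, 0.7, 2.5].forEach((seconds) => latency.observe(seconds));
console.log(latency.counts); // cumulative counts for le = 0.1, 0.5, 1, +Inf
```

Because the buckets are cumulative, the last count always equals the total number of observations, which is what makes server-side quantile estimation possible.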
Query Language
Write PromQL expressions to filter, group, and aggregate your metrics. Supported functions include `rate()`, `histogram_quantile()`, and `absent()`, along with custom window functions.
```promql
# 95th percentile HTTP response time per service
histogram_quantile(
  0.95,
  sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
)
```
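To show what a quantile query over buckets computes, the sketch below follows the linear-interpolation approach Prometheus documents for `histogram_quantile()`. It assumes CloudNexus behaves the same way, which the PromQL compatibility suggests but this page does not state. The bucket shape `{ le, count }` is a convention chosen for the example.

```javascript
// buckets: [{ le, count }] sorted by `le`, cumulative, ending with le = Infinity.
// Assumes 0 < q <= 1 and at least one finite bucket.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  const i = buckets.findIndex((b) => b.count >= rank);
  // Quantile falls in the +Inf bucket: report the highest finite bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const below = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - below;
  // Assume observations are spread evenly inside the bucket.
  return lower + (buckets[i].le - lower) * ((rank - below) / inBucket);
}

const buckets = [
  { le: 0.1, count: 1 },
  { le: 0.5, count: 3 },
  { le: 1, count: 4 },
  { le: Infinity, count: 5 },
];
console.log(histogramQuantile(0.5, buckets));  // about 0.4 seconds
console.log(histogramQuantile(0.95, buckets)); // capped at the highest finite bound
```

The result is an estimate whose accuracy depends on bucket boundaries: a quantile can never be resolved more finely than the bucket it falls in.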
Log Management & Pipelines
Ingest, parse, and query logs with full-text search and structured field extraction. Logs are automatically correlated with metrics and traces for root cause analysis.
Log Pipelines
Transform logs before storage using our declarative pipeline syntax. Drop noise, enrich with geo-IP or DNS data, redact PII, and route to different retention tiers.
```yaml
pipeline:
  name: prod-logs-processor
  input:
    type: cloudnexus_ingest
    match: "environment:production"
  stages:
    - type: parse
      format: grok
      pattern: "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
    - type: filter
      drop: "level:DEBUG"
    - type: redact
      fields: ["email", "credit_card"]
      replacement: "***REDACTED***"
  output:
    storage: hot_tier
    retention: 90d
```
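Conceptually, a pipeline is a chain of stages in which each stage either transforms an event or drops it. The sketch below mimics the parse, filter, and redact stages in plain JavaScript. The regular expression is only a rough stand-in for the grok pattern, and dropping lines that fail to parse is an assumption of this example rather than documented behaviour.

```javascript
// Rough stand-in for the grok pattern
// "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}".
const LINE = /^(\S+) ([A-Z]+) (.*)$/;

const stages = [
  // parse: turn the raw line into structured fields (null if it does not match)
  (event) => {
    const m = LINE.exec(event.raw);
    if (!m) return null;
    const [, timestamp, level, message] = m;
    return { ...event, timestamp, level, message };
  },
  // filter: drop DEBUG noise
  (event) => (event.level === 'DEBUG' ? null : event),
  // redact: mask sensitive fields when present
  (event) => {
    const out = { ...event };
    for (const field of ['email', 'credit_card']) {
      if (field in out) out[field] = '***REDACTED***';
    }
    return out;
  },
];

// Run an event through every stage; a stage returning null drops the event.
function runPipeline(event) {
  let current = event;
  for (const stage of stages) {
    current = stage(current);
    if (current === null) return null;
  }
  return current;
}

const kept = runPipeline({
  raw: '2024-05-01T12:00:00Z ERROR card declined',
  email: 'jane@example.com',
});
const dropped = runPipeline({ raw: '2024-05-01T12:00:01Z DEBUG cache warm' });
console.log(kept, dropped);
```

Placing the filter before the redact stage means dropped events never incur the redaction cost, which is one reason stage order matters.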
Search Syntax
- `level:error AND service:auth` - Boolean operators
- `message:*timeout*` - Wildcard search
- `trace_id:abc123` - Cross-pillar correlation
- `duration:[5s TO 10s]` - Range queries
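To make the semantics of these queries concrete, here is a toy matcher that evaluates `AND`-joined `field:value` terms against a log event. It is a sketch, not the CloudNexus query engine: it assumes range bounds are inclusive, that durations are stored as numbers of seconds, and it omits `OR`, `NOT`, and grouping.

```javascript
// Compile one `field:value` term into a predicate over a log event.
function compileTerm(term) {
  const sep = term.indexOf(':');
  const field = term.slice(0, sep);
  const value = term.slice(sep + 1);

  // Range such as [5s TO 10s]; event values are assumed to be seconds.
  const range = /^\[(\d+)s TO (\d+)s\]$/.exec(value);
  if (range) {
    const [min, max] = [Number(range[1]), Number(range[2])];
    return (event) => event[field] >= min && max >= event[field];
  }

  // Exact or wildcard match: escape regex metacharacters, then expand `*`.
  const pattern = value
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  const regex = new RegExp(`^${pattern}$`);
  return (event) => regex.test(String(event[field] ?? ''));
}

// Only AND is handled here.
function compileQuery(query) {
  const predicates = query.split(' AND ').map((t) => compileTerm(t.trim()));
  return (event) => predicates.every((p) => p(event));
}

const matches = compileQuery('level:error AND message:*timeout*');
console.log(matches({ level: 'error', message: 'upstream timeout after 5s' }));
console.log(matches({ level: 'info', message: 'upstream timeout after 5s' }));
```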
Distributed Tracing
Trace requests across microservices, serverless functions, and external dependencies. Native OpenTelemetry SDK integration with automatic instrumentation for 20+ runtimes.
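Tracing across services depends on each request carrying its trace context. OpenTelemetry's default propagator uses the W3C Trace Context `traceparent` header, and the sketch below parses and regenerates that header by hand to show what travels on the wire. In practice the SDK does this for you. The helper names are hypothetical, and the strict regex only covers version `00` headers.

```javascript
// W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const m = TRACEPARENT.exec(header);
  if (!m) return null;
  const [, version, traceId, parentId, flags] = m;
  // All-zero IDs are invalid per the specification.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Build the header for an outgoing request: keep the trace ID and
// substitute the current span's ID as the new parent.
function childTraceparent(parent, spanId) {
  return `00-${parent.traceId}-${spanId}-${parent.sampled ? '01' : '00'}`;
}

const incoming = parseTraceparent('00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01');
const outgoing = childTraceparent(incoming, 'b7ad6b7169203331');
console.log(outgoing);
```

The trace ID stays constant across every hop, which is what lets the backend stitch spans from different services into one trace.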
Span Attributes
Each span captures timing, status, and custom attributes. CloudNexus automatically extracts HTTP, gRPC, and database context attributes. Enrich spans with user IDs, session tokens, and deployment versions.
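```javascript
import Stripe from 'stripe';
import { trace } from '@cloudnexus/otel-sdk';

// Assumes the Stripe secret key is provided via the environment.
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY);
const tracer = trace.getTracer('payment-service');

async function processPayment(orderId) {
  return await tracer.startActiveSpan('payment.charge', async (span) => {
    span.setAttribute('order.id', orderId);
    try {
      // Illustrative call; a real charge also needs a currency and a payment source.
      const result = await stripe.charges.create({ amount: 5000 });
      span.setAttribute('stripe.id', result.id);
      return result;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: 2 }); // ERROR
      throw err;
    } finally {
      // Assuming the SDK mirrors the OpenTelemetry API, the span must be ended explicitly.
      span.end();
    }
  });
}
```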
Service Maps
Automatically generated topology graphs show service dependencies, error rates, and latency percentiles per edge. Click any node to drill into detailed span lists and error logs.
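A service map can be derived from parent-child relationships between spans. The sketch below shows one plausible way to do that. The span shape (`spanId`, `parentId`, `service`, `error`) is an assumption for this example, and latency percentiles are omitted to keep it short.

```javascript
// Spans are assumed to carry: spanId, parentId (or null), service, error (boolean).
function buildServiceMap(spans) {
  const byId = new Map(spans.map((s) => [s.spanId, s]));
  const edges = new Map();
  for (const span of spans) {
    const parent = byId.get(span.parentId);
    // Only cross-service calls become edges in the map.
    if (!parent || parent.service === span.service) continue;
    const key = `${parent.service}->${span.service}`;
    const edge = edges.get(key) ?? { calls: 0, errors: 0 };
    edge.calls += 1;
    if (span.error) edge.errors += 1;
    edges.set(key, edge);
  }
  return [...edges].map(([key, { calls, errors }]) => ({
    edge: key,
    calls,
    errorRate: errors / calls,
  }));
}

const map = buildServiceMap([
  { spanId: 'a', parentId: null, service: 'checkout', error: false },
  { spanId: 'b', parentId: 'a', service: 'payment', error: false },
  { spanId: 'c', parentId: 'a', service: 'payment', error: true },
  { spanId: 'd', parentId: 'b', service: 'payment', error: false }, // internal span, no edge
]);
console.log(map);
```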
Alerting & Notification Rules
Define alert conditions using PromQL, log queries, or trace error rates. Supports evaluation windows, grouping, inhibition rules, and multi-channel routing.
```yaml
apiVersion: cloudnexus.io/v1
kind: AlertRule
metadata:
  name: high-error-rate
  namespace: production
spec:
  # Aggregate per service so the 5xx rate is divided by the total request rate;
  # dividing the raw series would only match each 5xx series against itself.
  query: >-
    sum by (service) (rate(http_requests_total{status=~"5.."}[5m]))
    / sum by (service) (rate(http_requests_total[5m])) > 0.05
  for: 3m
  severity: critical
  labels:
    team: platform
    runbook_url: https://wiki.cloudnexus.io/alerts/5xx
  annotations:
    summary: "{{ $labels.service }} error rate exceeds 5%"
    description: "{{ $value | humanizePercentage }} of requests failing"
  routes:
    - channel: slack
      webhook: ${SLACK_WEBHOOK}
    - channel: pagerduty
      service_key: ${PD_KEY}
      escalate_after: 5m
```
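The `for` field is easy to misread, so the sketch below models it as a small state machine. It assumes Prometheus-style semantics, in which the condition must hold continuously for the whole window before the alert fires and any recovery resets the timer. The `AlertState` class is illustrative and not a CloudNexus API.

```javascript
// Track one alert through inactive -> pending -> firing.
class AlertState {
  constructor(forMs) {
    this.forMs = forMs;
    this.state = 'inactive';
    this.pendingSince = null;
  }

  // Call once per evaluation with the current time and whether the condition held.
  evaluate(nowMs, conditionMet) {
    if (!conditionMet) {
      this.state = 'inactive';
      this.pendingSince = null;
    } else if (this.state === 'inactive') {
      this.state = 'pending';
      this.pendingSince = nowMs;
    } else if (this.state === 'pending' && nowMs - this.pendingSince >= this.forMs) {
      this.state = 'firing';
    }
    return this.state;
  }
}

const MINUTE = 60_000;
const alert = new AlertState(3 * MINUTE); // mirrors `for: 3m`
const timeline = [
  alert.evaluate(0 * MINUTE, true),  // breach starts
  alert.evaluate(1 * MINUTE, true),
  alert.evaluate(2 * MINUTE, false), // recovery resets the timer
  alert.evaluate(3 * MINUTE, true),
  alert.evaluate(6 * MINUTE, true),  // three minutes of continuous breach
];
console.log(timeline.join(' -> '));
```

The pending phase is what keeps short spikes from paging anyone: only a breach that outlasts the window reaches the notification routes.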
Supported Integrations
Native exporters and SDKs for major ecosystems. Configure once, deploy across environments.
Kubernetes
Auto-discovery of pods, services, and deployments. Heapster & cAdvisor integration.
OpenTelemetry
SDKs for Go, Python, Java, Node.js, .NET. Context propagation & batching.
Cloud Providers
Metric forwarding from AWS CloudWatch, Google Cloud Monitoring (formerly Stackdriver), and Azure Monitor.
CI/CD
Terraform provider, Kubernetes Operator, GitHub Actions integration.
Quick Start
Deploy the CloudNexus Agent in under 60 seconds:
- Generate an API key under Console → Settings → Integrations
- Apply the Kubernetes manifest or run the Docker container
- Verify ingestion in the Metrics Explorer dashboard
- Configure your first alert rule via CLI or API