Auto-Scaling Guide

Guide | Updated: Oct 24, 2025 | Reading time: 8 min

CloudNexus Auto-Scaling automatically adjusts the number of running instances in your cluster based on real-time metrics. This ensures your application handles traffic spikes seamlessly while optimizing costs during low-demand periods.

ℹ️
Note: Auto-Scaling is available on Professional and Enterprise plans. The Starter plan supports manual scaling only.

Overview

Auto-Scaling monitors the metrics you define and executes a scaling policy when a threshold is breached. The system supports three policy types, each described under Scaling Policies: Target Tracking, Step Scaling, and Scheduled Scaling.

Configuration

You can configure auto-scaling via the CloudNexus Console, CLI, or Infrastructure-as-Code (IaC).

Using the Console

  1. Navigate to Compute > Clusters.
  2. Select your target cluster.
  3. Click the Scaling tab and toggle Enable Auto-Scaling.
  4. Define min/max instances and select your scaling policy.
  5. Click Save Configuration.

CLI Configuration

Use the cx scale command to configure auto-scaling from the terminal.

cx scale --cluster prod-web-cluster \
  --min-instances 2 \
  --max-instances 10 \
  --policy target-tracking \
  --metric cpu-utilization \
  --target-value 70

IaC Example (CloudNexus YAML)

version: v2
resources:
  cluster:
    name: production-app
    instances:
      min: 2
      max: 12
    autoScaling:
      enabled: true
      strategy: target-tracking
      cooldown: 300s
      metrics:
        - type: cpu
          threshold: 75
        - type: memory
          threshold: 85

Supported Metrics

CloudNexus supports a wide range of built-in and custom metrics for scaling decisions.

Metric Name       Description                               Granularity
CPU_Utilization   Average CPU usage across instances        1 minute
Memory_Usage      RAM consumption percentage                1 minute
Network_In        Inbound network traffic (Mbps)            30 seconds
Request_Count     HTTP requests per second (per instance)   15 seconds
Custom_Metric     User-defined metrics via Agent            Variable

Scaling Policies

Target Tracking

Maintains a specific average value for a metric, for example keeping CPU utilization at 60%. Instances are added when the metric rises above the target and removed when it falls below.

Recommendation: Target Tracking is the easiest to configure and works well for most steady-state workloads.
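
To make the idea concrete, here is a minimal sketch of a proportional target-tracking calculation. It is an illustration only: the function name and the formula (scale the instance count by the ratio of the observed metric to the target, then clamp to the min/max bounds) are assumptions, not CloudNexus's documented algorithm.

```python
import math


def target_tracking_desired(current_instances, metric_value, target_value,
                            min_instances, max_instances):
    """Estimate the instance count that brings the average metric back to target.

    Hypothetical helper: assumes the metric scales inversely with instance count.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    if current_instances <= 0:
        return min_instances
    # Proportional rule: more load per instance than the target means more instances.
    raw = math.ceil(current_instances * metric_value / target_value)
    # Never go outside the configured bounds.
    return max(min_instances, min(max_instances, raw))


print(target_tracking_desired(4, 90.0, 60.0, 2, 10))  # 6: scale out
print(target_tracking_desired(4, 30.0, 60.0, 2, 10))  # 2: scale in
print(target_tracking_desired(8, 95.0, 60.0, 2, 10))  # 10: capped at max
```

With 4 instances at 90% CPU and a 60% target, the sketch asks for 6 instances, which would bring the average back to roughly 60% if load stays constant.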

Step Scaling

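Defines scaling adjustments based on the magnitude of the breach: the further a metric exceeds its threshold, the larger the adjustment. Useful for workloads whose load varies sharply, where a small breach and a large breach call for different responses.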
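
The following sketch shows one way a step policy could map breach size to an adjustment. The step table, the function name, and the tuple layout are hypothetical and chosen for illustration; they do not reflect a documented CloudNexus configuration format.

```python
def step_adjustment(metric_value, threshold, steps):
    """Return the instance delta for the step whose range contains the breach.

    steps: list of (lower, upper, delta) tuples, where lower/upper are offsets
    above the threshold and upper=None means "no upper bound". Hypothetical layout.
    """
    breach = metric_value - threshold
    if breach < 0:
        return 0  # metric is below the threshold, nothing to do
    for lower, upper, delta in steps:
        if breach >= lower and (upper is None or breach < upper):
            return delta
    return 0


# Example table: small breach adds 1 instance, large breach adds 4.
STEPS = [(0, 10, 1), (10, 20, 2), (20, None, 4)]

print(step_adjustment(75, 70, STEPS))  # 1
print(step_adjustment(85, 70, STEPS))  # 2
print(step_adjustment(95, 70, STEPS))  # 4
print(step_adjustment(60, 70, STEPS))  # 0
```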

Scheduled Scaling

Scales instances at predefined times. Ideal for predictable traffic patterns like daily batch jobs or retail hours.

# Schedule scaling for business hours (9 AM - 6 PM)
cx scale schedule --cluster retail-api \
  --start "09:00 UTC" \
  --end "18:00 UTC" \
  --desired 5
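
The logic behind a daily schedule like the one above can be sketched as follows. This is an assumption-laden illustration: the function names are hypothetical, and the baseline count used outside the window is an assumption, since this guide does not state what capacity applies when no schedule is active.

```python
from datetime import datetime, time, timezone


def in_schedule(now, start, end):
    """True if `now` (a timezone-aware datetime) falls within [start, end) UTC.

    Also handles windows that cross midnight, such as 22:00 to 06:00.
    """
    t = now.astimezone(timezone.utc).time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def scheduled_desired(now, start, end, desired, baseline):
    """Return `desired` inside the window and `baseline` (assumed) outside it."""
    return desired if in_schedule(now, start, end) else baseline


noon = datetime(2025, 10, 24, 12, 0, tzinfo=timezone.utc)
evening = datetime(2025, 10, 24, 20, 0, tzinfo=timezone.utc)
print(scheduled_desired(noon, time(9, 0), time(18, 0), desired=5, baseline=2))     # 5
print(scheduled_desired(evening, time(9, 0), time(18, 0), desired=5, baseline=2))  # 2
```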

Best Practices

1. Set Realistic Minimums

Always set a minimum instance count that ensures high availability. For production workloads, we recommend a minimum of 2 instances so the service stays available if a single instance fails.

2. Configure Cool-Down Periods

Set an appropriate cool-down period (default: 300s) to prevent flapping, that is, rapid alternation between scale-out and scale-in. The cool-down gives new instances time to fully initialize and gives metrics time to stabilize before another scaling event.
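
As a rough illustration of how a cool-down suppresses flapping, the sketch below gates scaling actions on the time elapsed since the last one. The class and method names are hypothetical, and timestamps are plain seconds so the example stays deterministic.

```python
class CooldownGate:
    """Allow a scaling action only if the cool-down has elapsed since the last one.

    Hypothetical helper for illustration; 300 seconds mirrors the default above.
    """

    def __init__(self, cooldown_seconds=300.0):
        self.cooldown_seconds = cooldown_seconds
        self._last_action = None  # timestamp of the last allowed action

    def allow(self, now):
        """Return True and record the action if the cool-down has passed."""
        if (self._last_action is not None
                and now - self._last_action < self.cooldown_seconds):
            return False
        self._last_action = now
        return True


gate = CooldownGate()
print(gate.allow(0))    # True: first action is always allowed
print(gate.allow(120))  # False: only 120s since the last action
print(gate.allow(300))  # True: the full 300s has elapsed
```

A shorter cool-down reacts faster but risks acting on metrics that still reflect the previous scaling event.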

3. Monitor Health Checks

Ensure your load balancer health checks are properly configured. Auto-scaling replaces unhealthy instances, so it depends on health checks that report instance state accurately.

4. Use Predictive Scaling

Enterprise users can enable Predictive Scaling, which uses ML to analyze historical data and scale proactively before traffic spikes occur.
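
To illustrate what scaling proactively means, the sketch below forecasts the next hour's request rate from the same hour on previous days and sizes the cluster ahead of time. It deliberately uses a simple same-hour average as a stand-in, not an ML model, and it is not how Predictive Scaling is implemented; the function names, the data layout, and the per-instance capacity figure are all assumptions.

```python
import math
from statistics import mean


def forecast_same_hour(history, hour):
    """Average the request rate seen at `hour` on previous days.

    history: list of dicts mapping hour of day (0-23) to requests per second.
    """
    samples = [day[hour] for day in history if hour in day]
    if not samples:
        raise ValueError(f"no history for hour {hour}")
    return mean(samples)


def instances_for(rps, rps_per_instance, min_instances, max_instances):
    """Instances needed to serve `rps`, clamped to the configured bounds."""
    needed = math.ceil(rps / rps_per_instance)
    return max(min_instances, min(max_instances, needed))


# Three previous days of traffic at 09:00 (hypothetical numbers).
history = [{9: 900.0}, {9: 1100.0}, {9: 1000.0}]
expected = forecast_same_hour(history, 9)
print(expected)                               # 1000.0
print(instances_for(expected, 200.0, 2, 12))  # 5
```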

⚠️
Warning: Avoid setting scaling thresholds too low (e.g., scaling out at CPU > 30%). Low thresholds trigger unnecessary scaling events and increase costs. We recommend thresholds between 60% and 80%.
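
The numeric recommendations in this section can be checked mechanically before a configuration is applied. The sketch below is a hypothetical pre-flight check, not a CloudNexus feature; the function name and argument layout are assumptions, while the limits (minimum of 2 instances, thresholds of 60% to 80%) come from the guidance above.

```python
def lint_scaling_config(min_instances, max_instances, thresholds):
    """Return a list of warnings for settings that go against the best practices.

    thresholds: dict mapping a metric name to its threshold in percent.
    """
    warnings = []
    if min_instances < 2:
        warnings.append("min_instances is below 2: one instance failure causes downtime")
    if max_instances < min_instances:
        warnings.append("max_instances is lower than min_instances")
    for metric, value in thresholds.items():
        if not 60 <= value <= 80:
            warnings.append(
                f"{metric} threshold {value}% is outside the recommended 60-80% range"
            )
    return warnings


for message in lint_scaling_config(1, 10, {"cpu": 30}):
    print(message)
```

A configuration with min 2, max 10, and a CPU threshold of 70 produces no warnings.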

Troubleshooting

Auto-Scaling Not Triggering

Instances Terminating Unexpectedly

# Check scaling logs
cx logs scaling --cluster prod-web-cluster \
  --since 24h \
  --level error