Auto-Scaling Guide
CloudNexus Auto-Scaling automatically adjusts the number of running instances in your cluster based on real-time metrics. This lets your application absorb traffic spikes while reducing costs during low-demand periods.
Overview
Auto-Scaling works by monitoring defined metrics and executing scaling policies when thresholds are breached. The system supports three modes:
- Target Tracking: Maintains a specific metric value.
- Step Scaling: Adjusts capacity based on the size of the alarm breach.
- Scheduled Scaling: Scales according to a predictable schedule.
Configuration
You can configure auto-scaling via the CloudNexus Console, CLI, or Infrastructure-as-Code (IaC).
Using the Console
- Navigate to Compute > Clusters.
- Select your target cluster.
- Click the Scaling tab and toggle Enable Auto-Scaling.
- Define min/max instances and select your scaling policy.
- Click Save Configuration.
CLI Configuration
Use the cx scale command to configure auto-scaling via terminal.
cx scale --cluster prod-web-cluster \
--min-instances 2 \
--max-instances 10 \
--policy target-tracking \
--metric cpu-utilization \
--target-value 70
IaC Example (CloudNexus YAML)
version: v2
resources:
  cluster:
    name: production-app
    instances:
      min: 2
      max: 12
    autoScaling:
      enabled: true
      strategy: target-tracking
      cooldown: 300s
      metrics:
        - type: cpu
          threshold: 75
        - type: memory
          threshold: 85
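Before applying a configuration like this, it can help to check a few invariants locally. The following is a minimal sketch in Python, not a CloudNexus tool: `validate_cluster` is a hypothetical helper, the dictionary layout simply mirrors the fields in the example, and treating thresholds as percentages in the range (0, 100] is an assumption.

```python
def validate_cluster(cluster: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical helper: the key names mirror the YAML example, and the
    rules below are assumptions rather than documented CloudNexus checks.
    """
    problems = []
    instances = cluster.get("instances", {})
    lo, hi = instances.get("min"), instances.get("max")
    if not (isinstance(lo, int) and isinstance(hi, int)):
        problems.append("instances.min and instances.max must be integers")
    elif lo < 1 or lo > hi:
        problems.append("require 1 <= instances.min <= instances.max")
    for metric in cluster.get("autoScaling", {}).get("metrics", []):
        threshold = metric.get("threshold")
        # Assumption: thresholds are percentages.
        if not isinstance(threshold, (int, float)) or not 0 < threshold <= 100:
            problems.append(
                f"metric {metric.get('type')!r}: threshold must be in (0, 100]"
            )
    return problems


cluster = {
    "name": "production-app",
    "instances": {"min": 2, "max": 12},
    "autoScaling": {
        "enabled": True,
        "strategy": "target-tracking",
        "cooldown": "300s",
        "metrics": [
            {"type": "cpu", "threshold": 75},
            {"type": "memory", "threshold": 85},
        ],
    },
}
print(validate_cluster(cluster))  # prints []
```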
Supported Metrics
CloudNexus supports a wide range of built-in and custom metrics for scaling decisions.
| Metric Name | Description | Granularity |
|---|---|---|
| CPU_Utilization | Average CPU usage across instances | 1 minute |
| Memory_Usage | RAM consumption percentage | 1 minute |
| Network_In | Inbound network traffic (Mbps) | 30 seconds |
| Request_Count | HTTP requests per second (per instance) | 15 seconds |
| Custom_Metric | User-defined metrics via Agent | Variable |
Scaling Policies
Target Tracking
Maintains a specific average value for a metric by adding or removing instances as the metric drifts from the target, for example keeping average CPU utilization at 60%.
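To illustrate the idea, here is a minimal sketch of a proportional target-tracking rule in Python. This guide does not specify the algorithm CloudNexus uses internally, so treat the formula (scale capacity by the ratio of observed metric to target, then clamp to the configured bounds) as an assumption; `desired_instances` is a hypothetical name.

```python
import math


def desired_instances(current: int, metric_value: float, target_value: float,
                      min_instances: int, max_instances: int) -> int:
    """Proportional target tracking (assumed rule, not the documented one).

    If 4 instances average 90% CPU and the target is 60%, the same load
    spread over 6 instances would average roughly 60%.
    """
    if current <= 0 or target_value <= 0:
        raise ValueError("current and target_value must be positive")
    raw = math.ceil(current * metric_value / target_value)
    # Never leave the configured min/max range.
    return max(min_instances, min(max_instances, raw))


print(desired_instances(4, 90, 60, min_instances=2, max_instances=10))  # prints 6
```

Rounding up rather than down errs toward slightly more capacity, which is usually the safer direction when the metric is above target.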
Step Scaling
Defines scaling adjustments based on the magnitude of the breach: the further the metric exceeds its threshold, the larger the capacity change. Useful for workloads whose load varies widely.
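A step policy can be pictured as a lookup table from breach size to capacity change. The sketch below is illustrative only: the step boundaries, the `STEPS` table, and the function name `step_adjustment` are assumptions, not CloudNexus defaults.

```python
# Each entry: (minimum breach in percentage points, instances to add).
# These boundaries are made up for illustration.
STEPS = [(0, 1), (10, 2), (25, 4)]


def step_adjustment(metric_value: float, threshold: float, steps=STEPS) -> int:
    """Return how many instances to add for the observed breach size."""
    breach = metric_value - threshold
    if breach < 0:
        return 0  # threshold not breached, no scale-out
    adjustment = 0
    # Walk the steps in ascending order; the last matching step wins.
    for lower_bound, change in sorted(steps):
        if breach >= lower_bound:
            adjustment = change
    return adjustment


print(step_adjustment(72, 70))  # small breach, prints 1
print(step_adjustment(99, 70))  # large breach, prints 4
```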
Scheduled Scaling
Scales instances at predefined times. Ideal for predictable traffic patterns like daily batch jobs or retail hours.
# Schedule scaling for business hours (9 AM - 6 PM)
cx scale schedule --cluster retail-api \
--start 09:00 UTC \
--end 18:00 UTC \
--desired 5
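Conceptually, a schedule like this reduces to a time-window check. The following Python sketch shows that logic under stated assumptions: `scheduled_capacity` is a hypothetical helper, and falling back to a `default` capacity outside the window is an assumption, since this guide does not say what capacity applies after the end time.

```python
from datetime import datetime, time, timezone


def scheduled_capacity(now: datetime, start: time, end: time,
                       desired: int, default: int) -> int:
    """Return `desired` inside the [start, end) UTC window, else `default`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    current = now.astimezone(timezone.utc).time()
    if start <= end:
        in_window = start <= current < end
    else:
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        in_window = current >= start or current < end
    return desired if in_window else default


noon = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(scheduled_capacity(noon, time(9, 0), time(18, 0), desired=5, default=2))  # prints 5
```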
Best Practices
1. Set Realistic Minimums
Always set a minimum instance count that ensures high availability. For production workloads, a minimum of 2 instances is recommended so the cluster can tolerate the failure of a single instance.
2. Configure Cool-Down Periods
Set an appropriate cool-down period (default: 300s) to prevent flapping. This allows instances to fully initialize and stabilize metrics before another scaling event.
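The effect of a cool-down can be sketched as a simple gate in front of every scaling decision. This is an illustrative Python sketch, not CloudNexus's implementation; `can_scale` is a hypothetical name, and only the 300-second default comes from this guide.

```python
from typing import Optional


def can_scale(now_s: float, last_scale_s: Optional[float],
              cooldown_s: float = 300.0) -> bool:
    """Allow a scaling action only once the cool-down has elapsed.

    Times are seconds on any monotonic clock; `last_scale_s` is None
    when no scaling event has happened yet.
    """
    if last_scale_s is None:
        return True
    return now_s - last_scale_s >= cooldown_s


print(can_scale(1000.0, 800.0))  # 200s since last event, prints False
print(can_scale(1100.0, 800.0))  # 300s since last event, prints True
```

Without such a gate, a metric that oscillates around its threshold could trigger alternating scale-out and scale-in events before new instances have finished starting.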
3. Monitor Health Checks
Ensure your load balancer health checks are properly configured. Auto-scaling will replace unhealthy instances, but accurate health checks are crucial for this process.
4. Use Predictive Scaling
Enterprise users can enable Predictive Scaling, which uses ML to analyze historical data and scale proactively before traffic spikes occur.
Troubleshooting
Auto-Scaling Not Triggering
- Verify that the Auto-Scaling agent is installed and running on instances.
- Check Scaling Events logs in the console for errors.
- Ensure metrics are being reported (check Monitoring dashboard).
Instances Terminating Unexpectedly
- Check if instances are failing health checks.
- Review if a Scale In policy is active during low load.
- Ensure Protection Mode isn't disabled on critical instances.
# Check scaling logs
cx logs scaling --cluster prod-web \
--since 24h \
--level error