We've identified a memory leak occurring in the worker thread pool when handling async callbacks. The leak appears after ~4 hours of runtime under heavy load, eventually triggering the OOM killer.
Reproduction
Steps to reproduce the issue:
- Start worker with
--pool-size=16 - Run stress test suite for 4 hours
- Monitor memory usage via
/metricsendpoint
Logs
[INFO] Memory: 245MB | Threads: 16/16 [INFO] Memory: 312MB | Threads: 16/16 [WARN] Memory: 512MB | Threads: 16/16 [WARN] Memory: 890MB | Threads: 16/16 [CRIT] Memory: 1.2GB | Threads: 16/16 [CRIT] OOM Killer invoked
Initial investigation suggests closure retention in the callback queue is preventing garbage collection of completed request objects.
Thanks for the detailed report, alex.dev. I can reproduce this on v2.4.1.
Looking at the heap dump, it seems related to the reference cycle in the callback queue. The worker holds a strong reference to the context object even after the callback completes. I'll draft a PR to fix the reference cycle and add a memory cap to the queue.
U
Leave a comment