Description
When the GeoServer WMS endpoint handles more than 200 concurrent requests, GetMap operations begin to time out after approximately 30 seconds. The reverse proxy then returns 504 Gateway Timeout responses, even though the underlying PostgreSQL/PostGIS database connections remain healthy.
The issue appears to be related to the default thread pool configuration in the image rendering pipeline. Under normal load (below 100 concurrent requests), response times are within acceptable thresholds (<2s). However, once concurrency exceeds the maxThreads=200 threshold, the request queue begins to accumulate and threads become blocked waiting on image processing locks.
Steps to Reproduce
- Configure GeoServer 2.24.0+ with a PostGIS data store containing a large raster layer (>500MB)
- Set up load testing with ab or k6 targeting the WMS GetMap endpoint
- Send 250 concurrent requests with identical bounding box parameters
- Observe that requests begin timing out at approximately the 180th concurrent request
- Check gc.log and note the significant GC pauses (2-4 seconds) correlating with timeout events
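For anyone without ab or k6 at hand, the load step above can be roughly approximated from plain Java. This is only a sketch: the base URL, the layer name `ws:big_raster`, the bounding box, and the image size are placeholders that do not come from this report, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class GetMapLoadSketch {

    // Builds a WMS 1.1.1 GetMap URL. All argument values are supplied by the caller;
    // the ones used in main() are placeholders, not values from the report.
    static String buildGetMapUrl(String wmsBase, String layer, String bbox, int width, int height) {
        return wmsBase
                + "?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
                + "&LAYERS=" + layer
                + "&STYLES="
                + "&SRS=EPSG:4326"
                + "&BBOX=" + bbox
                + "&WIDTH=" + width + "&HEIGHT=" + height
                + "&FORMAT=image/png";
    }

    // Fires `concurrency` identical requests at once and returns how many did not end in HTTP 200.
    static long countFailures(String url, int concurrency, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(timeout)
                .GET()
                .build();
        List<CompletableFuture<Integer>> inFlight = new ArrayList<>();
        for (int i = 0; i < concurrency; i++) {
            inFlight.add(client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .thenApply(HttpResponse::statusCode)
                    .exceptionally(err -> -1)); // client-side timeouts and connection errors
        }
        return inFlight.stream()
                .map(CompletableFuture::join)
                .filter(code -> code != 200)
                .count();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and layer; pass a real base URL as the first argument to send traffic.
        String base = args.length > 0 ? args[0] : "http://localhost:8080/geoserver/wms";
        String url = buildGetMapUrl(base, "ws:big_raster", "-180,-90,180,90", 1024, 512);
        System.out.println(url);
        if (args.length > 0) {
            System.out.println("non-200 responses: " + countFailures(url, 250, Duration.ofSeconds(60)));
        }
    }
}
```

Run without arguments it only prints the URL; with a base URL it sends 250 requests and reports how many came back as something other than 200 (504s from the proxy and client-side timeouts both count).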
Expected Behavior
All GetMap requests should complete within the configured timeout window (60s) without gateway errors, and the thread pool should gracefully handle backpressure.
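To make "gracefully handle backpressure" concrete, here is a minimal sketch of one common approach: a fixed pool with a bounded queue that rejects excess work immediately instead of letting it pile up until the proxy times out. It is not GeoServer's or the servlet container's actual configuration; the class name, the queue size of 25, and the latch standing in for slow rendering are all assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackpressureSketch {

    // Submits `requests` slow tasks to a pool of `threads` workers backed by a queue of
    // `queueSize` slots, and returns how many submissions were rejected immediately.
    static int countRejected(int threads, int queueSize, int requests) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1); // stands in for slow image rendering
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // every worker stays busy until released
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // a server could answer 503 here instead of queueing
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        // 200 workers + 25 queue slots (assumed) against 250 requests
        System.out.println(countRejected(200, 25, 250)); // prints 25
    }
}
```

With all workers busy, 200 tasks run, 25 wait in the queue, and the remaining 25 fail fast. Whether fast rejection is preferable to a longer queue depends on how clients retry.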
Actual Behavior
Requests beyond the 200-concurrent threshold time out with 504 errors. Thread dumps show multiple threads blocked on ImageIO$ContainsFilter locks, and heap usage spikes to ~92% before full GC cycles trigger.
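The two symptoms above (blocked threads and heap percentage) can also be sampled with the standard java.lang.management API rather than read off thread dumps by hand. A minimal sketch, with made-up class and method names; it only observes the JVM it runs in, so it would need to be loaded inside the GeoServer JVM or adapted to a remote JMX connection, which is not shown.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class JvmPressureSketch {

    // Heap in use as a percentage of the maximum heap (-Xmx).
    // Falls back to the committed size if the maximum is undefined.
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long limit = heap.getMax() > 0 ? heap.getMax() : heap.getCommitted();
        return 100.0 * heap.getUsed() / limit;
    }

    // Number of live threads currently in the BLOCKED state (waiting to enter a monitor).
    static int blockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        int blocked = 0;
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            // Entries are null for threads that ended between the two calls.
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    public static void main(String[] args) {
        System.out.printf("heap used: %.1f%%, blocked threads: %d%n",
                heapUsedPercent(), blockedThreadCount());
    }
}
```

Logging these two numbers once per second during the load test would show whether the blocked-thread count climbs before or after heap usage approaches the reported ~92%.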
Environment
# Environment Details
GeoServer : 2.24.3
Java : OpenJDK 17.0.8
OS : Ubuntu 22.04 LTS
Memory : -Xmx8g -Xms4g -XX:MaxMetaspaceSize=512m
DB : PostgreSQL 15.2 / PostGIS 3.3
Proxy : NGINX 1.24.0 (upstream_timeout 60s)
Layer Size : 1,247 MB GeoTIFF (38720 × 25848)
Concurrency : 250 simultaneous requests (k6)
Stack Trace (Excerpt)
Thread "http-nio-8080-exec-47" #1892 daemon running
at java.desktop/com.sun.imageio.plugins.jpeg.JPEGImageReader.read(Unknown Source)
at java.desktop/javax.imageio.ImageIO.read(ImageIO.java:1422)
at org.geoserver.wms.map.StreamingImageResponse.encode(StreamingImageResponse.java:127)
at org.geoserver.wms.map.RenderingResult.write(RenderingResult.java:89)
at org.geoserver.wms.GetMap.execute(GetMap.java:312)
at org.geoserver.ows.AbstractDescribeRequest.handle(AbstractDescribeRequest.java:54)
Locked ownable synchronizers:
- locked <0x00000007231abcd0> (a org.geotools.image.RasterSymbolizer)
Activity & Comments (5)
We're seeing the same behavior in production on v2.24.2. Our monitoring shows the issue starts at around 150 concurrent requests, not 200 as described. Could this be related to JVM memory settings? We're running with -Xmx4g vs the -Xmx8g in the environment notes.
Also, has anyone tried the workaround of enabling UseJAI=false in the WMS config?
@James Rodriguez, yes, memory settings definitely play a role. With -Xmx4g, the full GC pauses will be more frequent. I've benchmarked both configs and confirmed that -Xmx8g delays the onset but doesn't eliminate the issue.
The real fix is Option B: implementing tile-based rendering with GridCoverage2D. I've got a PR branch with a prototype that reduces memory pressure by ~60% and pushes the concurrency threshold beyond 500 requests. Still need to add tests and clean up the code.
@Alex Chen, can you test the patch on your environment? I'll share the build artifact via Slack.
Thanks for the detailed report, Alex. I've reproduced this on our staging cluster. The root cause seems to be the default ImageIO cache settings. Under high concurrency, the cache locks up because all threads are trying to read from the same raster source simultaneously.
I'm investigating two potential fixes:
Option A: Increase the imageio.cache.limit property in global.xml and add connection pooling for raster readers.
Option B: Switch to a non-blocking image rendering pipeline using GeoTools GridCoverage2D with tile-based processing instead of full-raster loads.
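For a rough sense of why Option B should relieve memory pressure, here is a back-of-the-envelope sketch of the tile arithmetic for the 38720 × 25848 layer from the environment notes. The 512-pixel tile size and the 3-band, 8-bit pixel layout are assumptions, not values from the report or the prototype, and the sketch deliberately uses no GeoTools calls.

```java
public class TileGridSketch {

    // Number of tiles needed to cover `pixels` along one axis (ceiling division).
    static int tilesAlong(int pixels, int tileSize) {
        return (pixels + tileSize - 1) / tileSize;
    }

    // Total number of tiles covering a width x height raster.
    static long tileCount(int width, int height, int tileSize) {
        return (long) tilesAlong(width, tileSize) * tilesAlong(height, tileSize);
    }

    // Uncompressed size in bytes, assuming one byte per band per pixel.
    static long uncompressedBytes(long width, long height, int bands) {
        return width * height * bands;
    }

    public static void main(String[] args) {
        int width = 38720, height = 25848; // raster dimensions from the environment notes
        int tileSize = 512;                // assumed tile edge in pixels
        int bands = 3;                     // assumed 8-bit RGB

        System.out.println(tilesAlong(width, tileSize));               // prints 76
        System.out.println(tilesAlong(height, tileSize));              // prints 51
        System.out.println(tileCount(width, height, tileSize));        // prints 3876
        System.out.println(uncompressedBytes(width, height, bands));   // prints 3002503680
        System.out.println(uncompressedBytes(tileSize, tileSize, bands)); // prints 786432
    }
}
```

Under these assumptions a full decode is about 3.0 GB per request while a single tile is under 1 MB, so a request covering a small bounding box would only need to touch a handful of tiles.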