Overview
The Data Ingestion module provides a unified interface for connecting external data sources to the GeoServer spatial database. It handles format parsing, coordinate reference system (CRS) transformation, schema inference, and error routing automatically.
Rather than requiring custom ETL scripts, the ingestion engine normalizes incoming data streams, applies GeoPackage or PostGIS schemas, and routes payloads to your storage layer with exactly-once delivery guarantees.
Supported Data Sources
Ingestion connectors are available for batch files, database exports, cloud storage, and real-time streams.
| Source Type | Supported Sources / Formats | Delivery Mode | Typical Latency |
|---|---|---|---|
| Geospatial | GeoJSON, Shapefile, KML, GeoTIFF, LAS/LAZ | Batch / Streaming | ~200 ms to 5 s |
| Database | PostGIS, MySQL, SQLite, MongoDB | Sync / CDC (change data capture) | ~500 ms |
| Stream | Apache Kafka, AWS Kinesis, MQTT, WebSocket | Real-time | <100 ms |
| Cloud Storage | S3, GCS, Azure Blob (CSV, Parquet, NDJSON) | Event-triggered | ~1 to 3 s |
Pipeline Architecture
Data flows through a configurable four-stage pipeline. Each stage can be customized via YAML or the CLI.
1. Connect: Establish secure links to source endpoints. Supports OAuth2, API keys, and IAM roles.
2. Validate: Schema verification, CRS checking, topology validation, and duplicate detection.
3. Transform: Apply coordinate reprojection, attribute mapping, geometry simplification, and enrichment rules.
4. Route & Store: Write validated data to target storage (PostGIS, S3, GeoPackage) and publish layer endpoints.
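The four stages can be pictured as a chain of generators, each consuming the output of the previous one. The sketch below is illustrative only: the function names, the in-memory lists standing in for the source, the PostGIS sink, and the dead-letter queue, and the assumption that source coordinates arrive in Web Mercator (EPSG:3857) are all hypothetical, not the module's API.

```python
# Hypothetical sketch of the four pipeline stages as chained generators.
# None of these names are the module's real API; lists stand in for the
# source, the PostGIS sink, and the dead-letter queue.
import math
from typing import Iterable, Iterator

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def connect(source: list) -> Iterator[dict]:
    """Stage 1: a real connector would authenticate and read from Kafka, S3,
    etc.; this stand-in just iterates over an in-memory list."""
    yield from source


def validate(records: Iterable[dict], dead_letter: list) -> Iterator[dict]:
    """Stage 2: required-field check plus duplicate detection by id.
    Rejected records are routed to dead_letter instead of being dropped."""
    seen = set()
    for rec in records:
        if not {"id", "x", "y"}.issubset(rec):
            dead_letter.append({"record": rec, "error": "missing field"})
        elif rec["id"] in seen:
            dead_letter.append({"record": rec, "error": "duplicate id"})
        else:
            seen.add(rec["id"])
            yield rec


def web_mercator_to_wgs84(x: float, y: float) -> tuple:
    """Inverse spherical Mercator: EPSG:3857 metres -> EPSG:4326 degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Stage 3: reproject (assumes EPSG:3857 input) and map attributes
    onto the target schema's geometry field."""
    for rec in records:
        yield {"id": rec["id"], "location": web_mercator_to_wgs84(rec["x"], rec["y"])}


def route_and_store(records: Iterable[dict], sink: list) -> int:
    """Stage 4: append to the sink and return how many records were written."""
    written = 0
    for rec in records:
        sink.append(rec)
        written += 1
    return written


def run_pipeline(source: list, sink: list, dead_letter: list) -> int:
    # Generators are lazy, so each record flows through all stages in turn.
    return route_and_store(transform(validate(connect(source), dead_letter)), sink)


if __name__ == "__main__":
    sink, dead_letter = [], []
    source = [
        {"id": 1, "x": 0.0, "y": 0.0},
        {"id": 1, "x": 10.0, "y": 10.0},  # duplicate id -> dead letter
        {"id": 2, "x": 500.0},            # missing "y" -> dead letter
    ]
    written = run_pipeline(source, sink, dead_letter)
    print(written, sink, [d["error"] for d in dead_letter])
```

Routing rejects to a dead-letter list rather than raising keeps one bad record from halting the stream, which mirrors the error-routing behaviour the module describes.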
Configuration Example
Define ingestion pipelines using a declarative YAML manifest. The example below configures a pipeline that ingests real-time vehicle GPS telemetry points from a Kafka topic.
```yaml
pipeline: kafka-telemetry-ingest
source:
  type: kafka
  topic: vehicle_gps_v2
  consumer_group: geoserver-fleet
transform:
  crs_target: EPSG:4326        # WGS 84 longitude/latitude
  geometry_field: location
  simplify: true
  tolerance: 0.00001
sink:
  type: postgis
  schema: public
  table: telemetry_points
error_policy: route_to_deadletter  # failed records go to a dead-letter queue
```
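Because EPSG:4326 coordinates are expressed in degrees, a simplification tolerance of 0.00001 is easier to reason about as a ground distance. Assuming the tolerance is applied in target-CRS units (the manifest does not state this), the hypothetical helper below converts it: roughly 1.1 m north-south, with the east-west distance shrinking with the cosine of latitude.

```python
# Hypothetical helper: approximate ground distance of a tolerance given in
# degrees, ASSUMING the tolerance is applied in EPSG:4326 (target CRS) units.
import math

# Approximate metres per degree: one degree of longitude at the equator.
# A degree of latitude is within about 1% of this everywhere.
METRES_PER_DEGREE = 111_320.0


def tolerance_in_metres(tolerance_deg: float, latitude_deg: float) -> tuple:
    """Return (north_south_m, east_west_m) for tolerance_deg at latitude_deg.
    Degrees of longitude shrink with the cosine of latitude."""
    north_south = tolerance_deg * METRES_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


if __name__ == "__main__":
    for lat in (0.0, 45.0, 60.0):
        ns, ew = tolerance_in_metres(0.00001, lat)
        print(f"latitude {lat:4.0f}: ~{ns:.2f} m N-S, ~{ew:.2f} m E-W")
```

If your fleet operates at high latitudes, the effective east-west tolerance is noticeably smaller than the north-south one, which may be worth accounting for when tuning simplification.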
Best Practices
- Validate early: Enable strict schema enforcement at the ingestion boundary to prevent downstream corruption.
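- Partition by time and space: For high-velocity streams, use time-partitioned tables and spatial indexing schemes such as H3, S2, or quadtrees.
- Handle clock skew: Record an ingestion timestamp on every record, separate from its event timestamp, so that skew between device clocks and the server can be detected.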
- Monitor throughput: Set up alerts for queue depth, transformation errors, and sink latency spikes.
- Use idempotent writes: Enable upserts or unique constraints so that failed messages can be retried without creating duplicates.
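Three of the practices above (validating at the boundary, separating ingestion time from event time, and idempotent writes) can be combined in a single sink write. This sketch uses Python's built-in sqlite3 module as a stand-in for PostGIS; the table layout and the validate/upsert helpers are hypothetical. The ON CONFLICT upsert syntax requires SQLite 3.24 or newer, and PostgreSQL offers the equivalent INSERT ... ON CONFLICT.

```python
# Hypothetical sketch: sqlite3 stands in for PostGIS; table and helper names
# are illustrative. Requires SQLite >= 3.24 for the ON CONFLICT upsert.
import sqlite3
from datetime import datetime, timezone

# Strict schema enforced at the ingestion boundary: field -> required type.
REQUIRED_FIELDS = {"id": str, "lon": float, "lat": float, "event_ts": str}


def validate(rec: dict) -> None:
    """Reject records with missing or mistyped fields, or impossible coordinates."""
    for field, expected in REQUIRED_FIELDS.items():
        if not isinstance(rec.get(field), expected):
            raise ValueError(f"missing or mistyped field: {field}")
    if not (-180.0 <= rec["lon"] <= 180.0 and -90.0 <= rec["lat"] <= 90.0):
        raise ValueError("coordinates outside EPSG:4326 bounds")


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE telemetry_points (
               id          TEXT PRIMARY KEY,  -- unique key makes retries safe
               lon         REAL NOT NULL,
               lat         REAL NOT NULL,
               event_ts    TEXT NOT NULL,     -- device clock: when it happened
               ingested_at TEXT NOT NULL      -- server clock: when we received it
           )"""
    )
    return conn


def upsert(conn: sqlite3.Connection, rec: dict) -> None:
    """Validate, then write idempotently: replaying the same message updates
    the existing row instead of inserting a duplicate."""
    validate(rec)
    row = {field: rec[field] for field in REQUIRED_FIELDS}
    row["ingested_at"] = datetime.now(timezone.utc).isoformat()
    conn.execute(
        """INSERT INTO telemetry_points (id, lon, lat, event_ts, ingested_at)
           VALUES (:id, :lon, :lat, :event_ts, :ingested_at)
           ON CONFLICT(id) DO UPDATE SET
               lon = excluded.lon,
               lat = excluded.lat,
               event_ts = excluded.event_ts""",  # keeps the first ingested_at
        row,
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    msg = {"id": "veh-42:0001", "lon": 13.4, "lat": 52.5,
           "event_ts": "2024-05-01T12:00:00+00:00"}
    upsert(conn, msg)
    upsert(conn, msg)  # simulated retry of the same message
    print(conn.execute("SELECT COUNT(*) FROM telemetry_points").fetchone()[0])
```

Keeping the first ingested_at on conflict preserves when the message originally arrived, which is usually what clock-skew analysis needs; updating it on every retry is an equally valid choice if you care about the latest delivery instead.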
API & CLI Integration
Trigger and manage pipelines programmatically through the REST API or the geospatial CLI tool. Authenticated requests require a bearer token issued to a service account with the `ingestion:write` permission.
```bash
# Start a pipeline run
curl -X POST https://api.geoserver.io/v1/pipelines/run \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "pipeline_id": "kafka-telemetry-ingest", "mode": "streaming" }'

# Check run status (quote the URL so the shell does not treat "?" as a glob)
curl "https://api.geoserver.io/v1/pipelines/status?limit=5" \
  -H "Authorization: Bearer $TOKEN"
```
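For scripted use, the same two calls can be issued from Python with only the standard library. This is a sketch: the helper names are hypothetical, the URLs and JSON body mirror the curl commands, and sending the token on the status call assumes that endpoint is authenticated like the run endpoint.

```python
# Hypothetical stdlib-only client for the two calls in the curl example.
# Helper names are illustrative; URLs and JSON body mirror the curl commands.
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.geoserver.io/v1/pipelines"


def build_run_request(token: str, pipeline_id: str,
                      mode: str = "streaming") -> urllib.request.Request:
    """POST /run with a JSON body and bearer-token authentication."""
    body = json.dumps({"pipeline_id": pipeline_id, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/run",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


def build_status_request(token: str, limit: int = 5) -> urllib.request.Request:
    """GET /status?limit=N; assumes this endpoint also expects the token."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{BASE_URL}/status?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Perform the HTTP call; assumes the API answers with a JSON body.
    Non-2xx responses raise urllib.error.HTTPError."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, after `import os`, calling `send(build_run_request(os.environ["TOKEN"], "kafka-telemetry-ingest"))` starts the same run as the first curl command. Separating request construction from `send` keeps the request shape easy to inspect and test without network access.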