Flattening Deeply Nested GeoJSON Feature Collections Safely
Geospatial ETL pipelines frequently break when ingesting vendor-generated GeoJSON containing arbitrary nesting levels, mixed-type arrays, and unbounded metadata objects. Flattening deeply nested GeoJSON feature collections safely requires deterministic recursion boundaries, strict coordinate precision management, and explicit type coercion rules. Without these controls, downstream spatial databases encounter schema drift, geometry validation failures, and silent data truncation.
Core Failure Modes in Nested GeoJSON
Government and enterprise GIS systems expect rigid attribute schemas. When a feature collection contains deeply nested objects or coordinate-like strings, naive flattening produces unpredictable column names, type collisions, and geometry corruption. The most frequent pipeline failures include:
- ValueError: invalid literal for float() during coordinate parsing
- SchemaMismatch during batch inserts into PostGIS or GeoPackage
- Silent precision loss when floating-point coordinates exceed database decimal limits
- Unhandled None geometries triggering topology validation crashes
Resolving these requires a configuration-driven approach that separates spatial geometry from attribute flattening, enforces explicit recursion depth, and applies deterministic key generation. For foundational patterns on handling complex payloads, review established Nested JSON/GeoJSON Flattening methodologies that prioritize schema stability over raw data preservation.
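To make the key-collision failure mode concrete, the sketch below flattens the same properties object twice, once with a single underscore and once with a double underscore as the separator. The function name naive_flatten and the sample keys are illustrative assumptions, not part of any library.

```python
from typing import Any, Dict


def naive_flatten(obj: Dict[str, Any], sep: str, parent: str = "") -> Dict[str, Any]:
    """Flatten nested dicts by joining keys with sep; later keys overwrite earlier ones."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(naive_flatten(value, sep, name))
        else:
            flat[name] = value
    return flat


# Hypothetical vendor properties: a nested "owner" object next to a
# pre-flattened "owner_name" key
props = {"owner": {"name": "Parks Dept"}, "owner_name": "unknown"}

single = naive_flatten(props, "_")    # both values compete for "owner_name"
double = naive_flatten(props, "__")   # nested value lands in "owner__name"

print(single)
print(double)
```

With the single underscore, the nested value is silently overwritten and one column disappears. The double underscore keeps both values apart, although it only lowers the collision risk; it cannot rule out a vendor key that already contains the separator.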
Step 1: Define Strict Flattening Boundaries and Type Rules
Begin by isolating the flattening configuration from execution logic. A minimal, reproducible configuration establishes maximum recursion depth, coordinate precision thresholds, and explicit type mapping. This prevents unbounded traversal of vendor metadata and ensures consistent column naming.
FLATTEN_CONFIG = {
    "max_depth": 4,
    "separator": "__",
    "coordinate_precision": 6,
    "type_coercion": {
        "bool": ["true", "false", "1", "0"],
        "int": ["integer", "count", "total"],
        "float": ["decimal", "ratio", "coordinate", "lat", "lon"],
        "string": "default"
    },
    "preserve_keys": ["id", "geometry", "properties"],
    "drop_nulls": True,
    "default_crs": "EPSG:4326"
}
Apply the following thresholds and rules during pipeline initialization:
- Recursion Depth: Cap at 4 levels. Covers 99% of municipal and federal GeoJSON payloads without risking stack overflow.
- Coordinate Precision: Set to 6 decimal places. One unit in the sixth decimal place of a degree is roughly 0.11 m on the ground at the equator, which avoids floating-point bloat in spatial indexes.
- Type Coercion: Map string hints to explicit Python types before database insertion. Eliminates downstream casting errors.
- Null Handling: Drop None values by default to prevent sparse column generation in columnar stores.
- CRS Enforcement: Validate or default to EPSG:4326. RFC 7946 requires WGS84 coordinates in longitude, latitude order.
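The ~0.1 m figure for six decimal places follows from simple arithmetic. Below is a rough check, assuming a spherical Earth with a mean radius of 6,371,008.8 m (an approximation, not a geodetic calculation). The helper name ground_resolution_m is hypothetical.

```python
import math
from typing import Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def ground_resolution_m(decimal_places: int, latitude_deg: float = 0.0) -> Tuple[float, float]:
    """Return (north-south, east-west) metres covered by one unit in the last decimal place."""
    step_deg = 10.0 ** -decimal_places
    metres_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = step_deg * metres_per_degree
    # Meridians converge toward the poles, so east-west distance shrinks with latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for places in (5, 6, 7):
    ns, ew = ground_resolution_m(places, latitude_deg=45.0)
    print(f"{places} decimals: {ns:.4f} m north-south, {ew:.4f} m east-west at 45 deg latitude")
```

Under this approximation, six decimals resolve about 0.11 m north-south. Each extra decimal place cuts that by a factor of ten, while adding digits that most survey-grade sources cannot justify.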
Step 2: Single-Step Automation with Precision Management
Implement a deterministic flattening engine that separates geometry processing from attribute traversal. The following class handles recursion, type coercion, and precision clamping in a single pass.
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class GeoJSONFlattener:
    def __init__(self, config: Dict[str, Any]):
        self.max_depth = config.get("max_depth", 4)
        self.separator = config.get("separator", "__")
        self.precision = config.get("coordinate_precision", 6)
        self.type_coercion = config.get("type_coercion", {})
        self.drop_nulls = config.get("drop_nulls", True)
        self.default_crs = config.get("default_crs", "EPSG:4326")

    def _coerce_value(self, key: str, value: Any) -> Any:
        # None and native bool/int/float values pass through unchanged
        if value is None or isinstance(value, (bool, int, float)):
            return value
        key_lower = key.lower()
        # Key-name hints run before the boolean check so that "0" under a
        # "lat" or "count" key stays numeric instead of becoming False
        if any(k in key_lower for k in self.type_coercion.get("int", [])):
            try:
                return int(float(value))
            except (ValueError, TypeError, OverflowError):
                pass
        if any(k in key_lower for k in self.type_coercion.get("float", [])):
            try:
                return round(float(value), self.precision)
            except (ValueError, TypeError):
                pass
        val_str = str(value).strip().lower()
        if val_str in self.type_coercion.get("bool", []):
            return val_str in ("true", "1")
        return str(value)

    def _flatten_dict(self, obj: Dict, parent_key: str = "", depth: int = 0) -> Dict:
        items: Dict[str, Any] = {}
        if depth >= self.max_depth:
            # Serialize entire sub-object at depth limit
            return {parent_key: json.dumps(obj, ensure_ascii=False)} if parent_key else {}
        for k, v in obj.items():
            new_key = f"{parent_key}{self.separator}{k}" if parent_key else k
            if isinstance(v, dict):
                items.update(self._flatten_dict(v, new_key, depth + 1))
            elif isinstance(v, list):
                # Lists of scalars are coerced element-wise; mixed lists are serialized
                if all(isinstance(x, (str, int, float, bool, type(None))) for x in v):
                    items[new_key] = [self._coerce_value(new_key, x) for x in v]
                else:
                    items[new_key] = json.dumps(v, ensure_ascii=False)
            else:
                items[new_key] = self._coerce_value(new_key, v)
        return items

    def _clamp_precision(self, coords: Any) -> Any:
        # Recurse through nested coordinate arrays and round every number
        if isinstance(coords, list):
            return [self._clamp_precision(c) for c in coords]
        if isinstance(coords, (int, float)):
            return round(float(coords), self.precision)
        return coords

    def process_feature(self, feature: Dict) -> Dict:
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            raise ValueError("Invalid GeoJSON Feature structure")
        geom = feature.get("geometry")
        if not isinstance(geom, dict):
            # Preserve a null geometry marker rather than fabricating coordinates
            geom = None
        else:
            geom = dict(geom)
            # A GeometryCollection has no "coordinates" member, so clamp only when present
            if "coordinates" in geom:
                geom["coordinates"] = self._clamp_precision(geom["coordinates"])
        props = feature.get("properties") or {}
        flat_props = self._flatten_dict(props)
        if self.drop_nulls:
            flat_props = {k: v for k, v in flat_props.items() if v is not None}
        # Reserved columns come last so a property named "geometry" cannot overwrite them
        return {
            **flat_props,
            "feature_id": feature.get("id"),
            "geometry_type": geom.get("type") if geom else None,
            "geometry": geom,
        }
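One case is worth checking separately. Per RFC 7946, a GeometryCollection carries a geometries array instead of a coordinates member, so coordinate rounding has to recurse into its members. Below is a minimal standalone sketch of that handling, assuming the same rounding rule as the class above. The function names round_coords and clamp_geometry are hypothetical and not part of the class.

```python
from typing import Any, Dict, Optional


def round_coords(coords: Any, precision: int) -> Any:
    """Round every number in an arbitrarily nested coordinate list."""
    if isinstance(coords, list):
        return [round_coords(c, precision) for c in coords]
    if isinstance(coords, (int, float)) and not isinstance(coords, bool):
        return round(float(coords), precision)
    return coords


def clamp_geometry(geom: Optional[Dict[str, Any]], precision: int = 6) -> Optional[Dict[str, Any]]:
    """Return a rounded copy of a GeoJSON geometry, leaving the input untouched."""
    if not isinstance(geom, dict):
        return None
    out = dict(geom)
    if out.get("type") == "GeometryCollection":
        # Recurse into each member geometry instead of looking for "coordinates"
        out["geometries"] = [clamp_geometry(g, precision) for g in out.get("geometries") or []]
    elif "coordinates" in out:
        out["coordinates"] = round_coords(out["coordinates"], precision)
    return out


collection = {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Point", "coordinates": [-77.03653912, 38.89767612]},
        {"type": "LineString", "coordinates": [[-77.0, 38.5], [-77.123456789, 38.987654321]]},
    ],
}
clamped = clamp_geometry(collection)
print(clamped["geometries"][0]["coordinates"])
print(clamped["geometries"][1]["coordinates"])
```

Returning a copy rather than mutating the input keeps the original payload available for audit logging, at the cost of some extra memory on large geometries.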
Step 3: Edge Case Handling & CI Integration
Production pipelines must survive malformed inputs, CRS drift, and automated test environments. Implement the following safeguards before batch execution.
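- Missing Fields: The engine sets geometry to None when absent. Downstream tools should test for null geometries explicitly (geometry IS NULL in PostGIS, or a None check before calling shapely's is_valid), because ST_IsValid returns NULL for a NULL input and is_valid cannot be called on None.
- CRS Mismatches: Validate the crs member if present. RFC 7946 removed the crs member and fixes all coordinates to WGS84 longitude/latitude, identified as urn:ogc:def:crs:OGC::CRS84 (the 2008 GeoJSON specification named the same system urn:ogc:def:crs:OGC:1.3:CRS84). Any other declaration indicates a non-compliant payload. Log at WARNING level to trigger CI alerts.
- CI Failures: Wrap batch processing in structured try/except blocks. Return standardized error dictionaries instead of halting the pipeline.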
def process_batch(flattener: GeoJSONFlattener, features: List[Dict]) -> List[Dict]:
    # Both the RFC 7946 URN and the older GeoJSON 2008 URN identify WGS84 lon/lat
    wgs84_names = ("urn:ogc:def:crs:OGC::CRS84", "urn:ogc:def:crs:OGC:1.3:CRS84", "")
    results = []
    for idx, feat in enumerate(features):
        try:
            # Warn on non-standard CRS declarations (RFC 7946 removed the crs member)
            crs = feat.get("crs") or {}
            if crs:
                crs_name = (crs.get("properties") or {}).get("name", "")
                if crs_name not in wgs84_names:
                    logger.warning("Feature %d: Non-standard CRS '%s'. Expecting WGS84.", idx, crs_name)
            results.append(flattener.process_feature(feat))
        except Exception as e:
            # feat may not be a dict at all, so guard the id lookup
            feat_id = feat.get("id") if isinstance(feat, dict) else None
            results.append({
                "feature_id": feat_id if feat_id is not None else f"unknown_{idx}",
                "error": str(e),
                "status": "FAILED",
            })
    return results
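Because failures come back as dictionaries rather than exceptions, a CI job still needs an explicit gate. Here is one possible sketch, assuming the result shape shown above (a status key equal to FAILED on error rows). The summarize_batch name and the 5% threshold are illustrative assumptions.

```python
from typing import Any, Dict, List


def summarize_batch(results: List[Dict[str, Any]], max_failure_rate: float = 0.05) -> Dict[str, Any]:
    """Split batch results into passed and failed rows and decide whether CI should pass."""
    failed = [r for r in results if r.get("status") == "FAILED"]
    passed = [r for r in results if r.get("status") != "FAILED"]
    rate = len(failed) / len(results) if results else 0.0
    return {
        "passed": passed,
        "failed": failed,
        "failure_rate": rate,
        "ok": rate <= max_failure_rate,
    }


# Hypothetical output of a batch run: two flattened rows and one failure
sample = [
    {"feature_id": "a1", "geometry_type": "Point",
     "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
    {"feature_id": "a2", "geometry_type": None, "geometry": None},
    {"feature_id": "unknown_2", "error": "Invalid GeoJSON Feature structure", "status": "FAILED"},
]
summary = summarize_batch(sample)
print(f"{len(summary['failed'])} of {len(sample)} failed; ok={summary['ok']}")
```

In a CI script, a summary whose ok flag is False would typically be turned into a non-zero exit code so the job fails visibly while the error rows remain available for inspection.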
Step 4: Validation & Compliance Alignment
Flattened outputs must align with enterprise spatial standards. Enforce schema validation before database ingestion.
- Geometry Validation: Ensure geometry_type maps to a valid OGC Simple Features type (Point, MultiPoint, LineString, MultiLineString, Polygon, MultiPolygon, or GeometryCollection). Null geometries must be quarantined, not inserted.
- Column Naming: Flattened keys must match the regex [a-zA-Z_][a-zA-Z0-9_]* so that they are safe to use as SQL identifiers. The pattern alone does not catch reserved keywords, so check keys against the target database's reserved-word list as well.
- Precision Consistency: All coordinate arrays must be clamped to the configured decimal threshold before export.
- Auditability: Retain the original id (emitted as feature_id) and geometry_type columns for traceability during Automated Attribute Transformation & ETL Workflows.
Validate outputs against the OGC GeoPackage Specification or with the PostGIS ST_IsValid() function prior to deployment. This supports interoperability across municipal, state, and federal GIS platforms.
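The naming and geometry-type rules above can be checked mechanically before ingestion. The sketch below is a minimal example, assuming flattened rows with geometry and geometry_type keys as produced by the engine. The function name validate_row and the small reserved-word set are illustrative assumptions, not a complete keyword list for any database.

```python
import re
from typing import Any, Dict, List

IDENTIFIER_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

# The seven geometry types defined by RFC 7946
GEOMETRY_TYPES = {
    "Point", "MultiPoint", "LineString", "MultiLineString",
    "Polygon", "MultiPolygon", "GeometryCollection",
}

# Tiny illustrative subset; a real check would load the target database's full list
RESERVED_WORDS = {"select", "from", "where", "table", "order", "group", "user"}


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the row passed."""
    problems: List[str] = []
    for key in row:
        if not IDENTIFIER_RE.fullmatch(key):
            problems.append(f"invalid column name: {key!r}")
        elif key.lower() in RESERVED_WORDS:
            problems.append(f"reserved word used as column name: {key!r}")
    if row.get("geometry") is None:
        problems.append("null geometry (quarantine instead of inserting)")
    elif row.get("geometry_type") not in GEOMETRY_TYPES:
        problems.append(f"unknown geometry type: {row.get('geometry_type')!r}")
    return problems


good = {
    "feature_id": "a1",
    "geometry_type": "Point",
    "geometry": {"type": "Point", "coordinates": [-77.036539, 38.897676]},
    "owner__name": "Parks Dept",
}
bad = {
    "feature_id": "a2",
    "geometry_type": None,
    "geometry": None,
    "zoning code": "R1",
    "order": 3,
}

print(validate_row(good))
for problem in validate_row(bad):
    print(problem)
```

Collecting every problem per row, rather than stopping at the first, gives reviewers a complete picture of a bad payload in a single CI run.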
Conclusion
Flattening deeply nested GeoJSON feature collections safely requires strict configuration boundaries, deterministic type coercion, and explicit edge-case handling. By isolating geometry precision management from attribute traversal and enforcing CI-ready error capture, spatial data teams eliminate schema drift and geometry corruption. Deploy the provided engine as a standardized preprocessing step to guarantee downstream database compatibility and long-term pipeline resilience.