Overview

Spatial regression refers to a family of statistical models designed to account for spatial dependence and heterogeneity in geographic data. Traditional regression assumptions often fail when observations are geographically clustered or interact across space, leading to biased coefficients and invalid inference. Spatial regression remedies this by explicitly modeling how a variable at one location influences or correlates with observations at neighboring locations.

Central to these models is the spatial lag operator, typically represented by a weight matrix \( W \), which formalizes neighborhood relationships and enables the incorporation of spatial structure into linear and generalized linear frameworks. These techniques are foundational in econometrics, epidemiology, urban planning, environmental science, and quantitative geography.

The Spatial Lag Operator

In time-series analysis, the lag operator \( L \) or \( B \) shifts observations back in time. In spatial statistics, an analogous operator maps values from neighboring units to each location. It is defined through a spatial weights matrix \( W \), an \( n \times n \) matrix where \( n \) is the number of spatial units (e.g., regions, grid cells, points).

\( W = [w_{ij}] \), where \( w_{ij} \geq 0 \) represents the strength of connection from unit \( j \) to unit \( i \). By convention \( w_{ii} = 0 \), so that no unit counts as its own neighbor.

Common constructions of \( W \) include:

  • Contiguity weights: \( w_{ij} = 1 \) if regions \( i \) and \( j \) share a boundary, 0 otherwise.
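  • Distance-based weights: \( w_{ij} = d_{ij}^{-\alpha} \) for a decay exponent \( \alpha > 0 \) (not to be confused with the regression coefficients \( \beta \)), or threshold-based kernels.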
  • K-nearest neighbors: Each unit connects to its \( k \) closest units.

For computational and interpretive convenience, \( W \) is often row-standardized so that \( \sum_{j=1}^n w_{ij} = 1 \). The spatial lag of a variable vector \( y \) is then simply \( Wy \), representing the weighted average of neighbors' values at each location.
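
The construction can be illustrated with a minimal sketch, assuming a regular grid with rook contiguity (cells sharing an edge) and NumPy. The helper names `rook_weights` and `row_standardize` are hypothetical and chosen for this example only.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity matrix for a regular grid (cells sharing an edge)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:              # neighbor below
                j = (r + 1) * n_cols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < n_cols:              # neighbor to the right
                j = r * n_cols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def row_standardize(W):
    """Scale each row to sum to 1; rows without neighbors stay all-zero."""
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

W = row_standardize(rook_weights(3, 3))
y = np.arange(1.0, 10.0)    # values 1..9 laid out row by row on the 3x3 grid
Wy = W @ y                  # spatial lag: mean of each cell's neighbors
```

For the corner cell holding the value 1, the neighbors hold 2 and 4, so its spatial lag is 3. The center cell, with neighbors 2, 4, 6, and 8, has a spatial lag of 5.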

⚠️ Important Distinction

The spatial lag operator \( Wy \) captures spatial autocorrelation in the dependent variable (spillover effects), while a spatially lagged error term \( Wu \) captures unobserved spatially structured shocks. Confusing the two leads to misspecified models.

Model Specifications

Spatial regression models differ in how they incorporate \( W \). The four canonical forms are:

1. Spatial Autoregressive (SAR) / Spatial Lag Model (SLM)

\( y = \rho W y + X\beta + \varepsilon \),   \( \varepsilon \sim N(0, \sigma^2 I) \)

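Here, \( \rho \) measures the strength of spatial feedback. The model implies that \( y_i \) is directly influenced by neighboring \( y_j \) values. Provided \( I - \rho W \) is invertible (guaranteed for \( |\rho| < 1 \) when \( W \) is row-standardized), the model can be rearranged as \( y = (I - \rho W)^{-1}X\beta + (I - \rho W)^{-1}\varepsilon \). This reduced form reveals global spillovers: a change in \( X \) or a shock at one location propagates to every connected location through the multiplier \( (I - \rho W)^{-1} \).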
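
The multiplier can be explored numerically. The following is a minimal sketch, assuming NumPy and an illustrative ring layout in which each unit has exactly two neighbors; `ring_weights`, `sar_multiplier`, and `sar_multiplier_series` are hypothetical names. It compares the exact inverse with the truncated series \( I + \rho W + \rho^2 W^2 + \dots \), whose higher powers correspond to neighbors of neighbors.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors per unit, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def sar_multiplier(W, rho):
    """Exact spatial multiplier (I - rho W)^{-1}."""
    return np.linalg.inv(np.eye(W.shape[0]) - rho * W)

def sar_multiplier_series(W, rho, order):
    """Truncated series I + rho W + ... + rho^order W^order."""
    total = np.eye(W.shape[0])
    term = np.eye(W.shape[0])
    for _ in range(order):
        term = rho * (term @ W)     # next power: rho^k W^k
        total = total + term
    return total

n, rho, beta = 10, 0.4, 2.0         # illustrative values, not estimates
W = ring_weights(n)
dx = np.zeros(n)
dx[0] = 1.0                         # unit change in the covariate at location 0 only
effect = sar_multiplier(W, rho) @ (dx * beta)   # implied change in E[y] everywhere
```

In this sketch `effect[0]` exceeds \( \beta \) because of feedback through the neighbors, and the effect at other locations is positive but shrinks with distance from location 0.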

2. Spatial Error Model (SEM)

\( y = X\beta + u \),   \( u = \lambda W u + \varepsilon \)

Spatial dependence resides in the error term, typically capturing omitted spatially correlated variables or measurement error clustering. \( \lambda \) governs the error spillover intensity.
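
The implied error structure can be written out explicitly. The derivation below is a short sketch, assuming \( \varepsilon \) has mean zero and covariance \( \sigma^2 I \) (as in the SAR specification) and that \( I - \lambda W \) is invertible.

```latex
\begin{aligned}
u &= \lambda W u + \varepsilon
  \;\Longrightarrow\; u = (I - \lambda W)^{-1}\varepsilon, \\
\operatorname{Var}(u) &= \sigma^2 (I - \lambda W)^{-1}(I - \lambda W^\top)^{-1}
  = \sigma^2 \left[(I - \lambda W^\top)(I - \lambda W)\right]^{-1}, \\
\mathbb{E}[y \mid X] &= X\beta .
\end{aligned}
```

The conditional mean is the same as in a non-spatial regression; only the covariance of the disturbances changes.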

3. Spatial Durbin Model (SDM)

\( y = \rho W y + X\beta + WX\theta + \varepsilon \)

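Extends the SAR by including spatial lags of the independent variables. \( \theta \) captures how neighbors' covariates affect the focal unit's outcome. The SDM is often preferred empirically because it nests the SAR (\( \theta = 0 \)) and SLX (\( \rho = 0 \)) models, and reduces to the SEM under the common-factor restriction \( \theta = -\rho\beta \).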
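
Because of the feedback term, \( \beta \) and \( \theta \) are not themselves marginal effects in an SDM. A common summary, following LeSage & Pace (2009), averages the entries of \( S_k(W) = (I - \rho W)^{-1}(\beta_k I + \theta_k W) \). The sketch below assumes NumPy and an illustrative ring layout; `ring_weights` and `sdm_impacts` are hypothetical names, and the parameter values are made up for illustration.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors per unit, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def sdm_impacts(W, rho, beta_k, theta_k):
    """Average direct, indirect, and total impacts of covariate k in an SDM.

    S holds the partial derivatives of E[y_i] with respect to x_jk:
    S = (I - rho W)^{-1} (beta_k I + theta_k W).
    """
    n = W.shape[0]
    I = np.eye(n)
    S = np.linalg.solve(I - rho * W, beta_k * I + theta_k * W)
    direct = np.trace(S) / n        # mean own-location effect (diagonal of S)
    total = S.sum() / n             # mean row sum of S
    return direct, total - direct, total

W = ring_weights(6)
direct, indirect, total = sdm_impacts(W, rho=0.5, beta_k=1.0, theta_k=0.2)
```

With a row-standardized \( W \), the total impact reduces to \( (\beta_k + \theta_k)/(1 - \rho) \), which gives a quick check on the computation.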

4. Spatial Lag of X (SLX) Model

\( y = X\beta + WX\theta + \varepsilon \)

Lacks endogenous spatial feedback (\( \rho = 0 \)). Useful when covariates exhibit spatial structure but the dependent variable does not directly spill over. Because \( WX \) is exogenous, the model can be estimated by OLS; standard errors need a spatial correction only if the residuals remain spatially autocorrelated.
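
As an illustration, an SLX fit amounts to ordinary least squares on an augmented design matrix. This is a minimal sketch, assuming NumPy, an illustrative ring layout, and noise-free synthetic data so that the coefficients are recovered exactly up to rounding; `ring_weights` and `fit_slx` are hypothetical names.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors per unit, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def fit_slx(y, X, W):
    """OLS fit of y = a + X beta + W X theta + e; returns (a, beta, theta).

    X must not contain a constant column: with row-standardized W the lag
    of a constant is the constant itself, which would be collinear.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X, W @ X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:1 + k], coef[1 + k:]

rng = np.random.default_rng(1)
n = 50
W = ring_weights(n)
X = rng.normal(size=(n, 2))
a_true = 2.0
beta_true = np.array([1.0, -0.5])
theta_true = np.array([0.3, 0.0])
y = a_true + X @ beta_true + (W @ X) @ theta_true    # synthetic, no noise term

a_hat, beta_hat, theta_hat = fit_slx(y, X, W)
```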

Estimation & Inference

Unlike standard linear regression, spatial models require specialized estimators due to endogeneity in \( Wy \) and the presence of spatial parameters (\( \rho, \lambda \)):

  • Maximum Likelihood (ML): Provides consistent, efficient estimates under normality. Computationally intensive for large \( n \) because the log-likelihood contains the log-determinant \( \ln|I - \rho W| \), which is costly to evaluate for large matrices.
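  • Method of Moments (MM) & GMM: Instrumental-variable estimation was discussed by Anselin (1988) and developed into a generalized spatial two-stage least squares procedure by Kelejian & Prucha (1998). Instruments for \( Wy \) are typically \( WX \) and \( W^2X \). Robust to non-normality.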
  • Conditional Least Squares (CLS): Fast iterative approximation. Availability varies by software; the spatialreg package in R, for example, is organized mainly around ML, GMM, and spatial two-stage least squares routines.
  • Bayesian MCMC: Handles complex priors, missing data, and hierarchical spatial structures naturally.
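
Two of these estimators can be sketched for the SAR model in a few lines. The code below is an illustrative sketch, not a reference implementation: it assumes NumPy, a ring layout, synthetic data with made-up parameters, a plain grid search over \( \rho \) for ML, and spatial two-stage least squares with instruments \( WX \) and \( W^2X \). All function names are hypothetical.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors per unit, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def sar_concentrated_loglik(rho, y, X, W):
    """SAR log-likelihood concentrated in rho (additive constants dropped)."""
    n = len(y)
    sign, logdet = np.linalg.slogdet(np.eye(n) - rho * W)   # ln|I - rho W|
    if sign <= 0:
        return -np.inf
    filtered = y - rho * (W @ y)
    beta, *_ = np.linalg.lstsq(X, filtered, rcond=None)      # OLS for beta given rho
    resid = filtered - X @ beta
    return logdet - 0.5 * n * np.log(resid @ resid / n)

def fit_sar_ml(y, X, W, grid=None):
    """Grid-search ML: choose the rho that maximizes the concentrated likelihood."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    values = [sar_concentrated_loglik(r, y, X, W) for r in grid]
    rho = float(grid[int(np.argmax(values))])
    beta, *_ = np.linalg.lstsq(X, y - rho * (W @ y), rcond=None)
    return rho, beta

def fit_sar_s2sls(y, X, W):
    """Spatial 2SLS; assumes column 0 of X is the intercept."""
    Xv = X[:, 1:]                                   # lag only non-constant columns
    H = np.column_stack([X, W @ Xv, W @ (W @ Xv)])  # instruments: X, WX, W^2 X
    Z = np.column_stack([W @ y, X])                 # regressors: Wy and X
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]    # first stage: project Z on H
    delta = np.linalg.solve(Z_hat.T @ Z, Z_hat.T @ y)   # second stage
    return float(delta[0]), delta[1:]

# Synthetic SAR data with known parameters (illustrative values).
rng = np.random.default_rng(0)
n, rho_true = 100, 0.5
beta_true = np.array([1.0, 2.0])
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = 0.05 * rng.normal(size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + eps)

rho_ml, beta_ml = fit_sar_ml(y, X, W)
rho_iv, beta_iv = fit_sar_s2sls(y, X, W)
```

The dense `slogdet` call is what makes ML expensive as \( n \) grows; production software typically relies on sparse factorizations, precomputed eigenvalues, or approximations instead of recomputing a dense determinant at every trial value.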

Inference requires care: standard errors must account for spatial correlation, and hypothesis testing for \( \rho = 0 \) or \( \lambda = 0 \) typically uses Lagrange Multiplier (LM) tests and their robust counterparts.

Diagnostics & Validation

Proper model selection relies on rigorous diagnostics:

  • Moran’s I & Geary’s C: Global spatial autocorrelation measures for residuals and variables.
  • LM Tests: \( LM_{SAR} \) and \( LM_{SEM} \) (often reported as LM-lag and LM-error), together with their robust versions \( LM_{SAR-R} \) and \( LM_{SEM-R} \), guide initial specification.
  • Local Indicators of Spatial Association (LISA): Identify spatial clusters and outliers (e.g., high-high, low-low).
  • Wald/LR Tests: Compare nested models (e.g., SDM vs SAR, SDM vs SLX).
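
Two of these diagnostics are simple enough to compute directly. The sketch below assumes NumPy, an illustrative ring layout, and synthetic data generated from a spatial error process; `ring_weights`, `morans_i`, and `lm_error` are hypothetical names. The LM-error statistic is computed from OLS residuals \( e \) as \( (e^\top W e / \hat\sigma^2)^2 / \operatorname{tr}(W^\top W + W^2) \) and compared with a chi-square distribution with one degree of freedom.

```python
import numpy as np

def ring_weights(n):
    """Row-standardized weights on a ring: two neighbors per unit, weight 0.5 each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def morans_i(x, W):
    """Global Moran's I of x under weights W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def lm_error(y, X, W):
    """LM statistic for spatial error dependence, based on OLS residuals."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    return (e @ W @ e / sigma2) ** 2 / T

n = 200
W = ring_weights(n)

# Perfect negative autocorrelation: every neighbor has the opposite sign.
alternating = np.array([1.0, -1.0] * (n // 2))
i_alternating = morans_i(alternating, W)

# Synthetic data with spatially correlated errors (lambda = 0.8, illustrative).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.solve(np.eye(n) - 0.8 * W, rng.normal(size=n))
y = X @ np.array([1.0, 2.0]) + u
lm_stat = lm_error(y, X, W)     # compare with the 5% critical value 3.84
```

The alternating pattern yields a Moran's I of -1, the extreme case of dissimilar neighbors, while the strongly autocorrelated errors produce an LM statistic far above the 5% critical value.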

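Misspecification is a common pitfall. Omitting a spatial lag when \( \rho \neq 0 \) is an omitted-variable problem and generally biases the coefficient estimates, whereas ignoring spatial error dependence (\( \lambda \neq 0 \)) leaves OLS coefficients unbiased but inefficient, with invalid standard errors. Including unnecessary spatial terms inflates variance. Cross-validation and information criteria (AICc, BIC) adjusted for spatial degrees of freedom are recommended.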

Applications

Spatial regression and lag operators are deployed across disciplines:

  • Urban Economics: Housing price spillovers, neighborhood effects, gentrification diffusion.
  • Epidemiology: Disease transmission networks, health outcome clustering, vaccination spillovers.
  • Environmental Science: Air/water pollution dispersion, biodiversity gradients, climate impact modeling.
  • Political Geography: Voting behavior contagion, policy diffusion across jurisdictions.
  • Machine Learning: Graph Neural Networks (GNNs) and Spatial Attention mechanisms extend \( W \) to learned adjacency structures.

References

  [1] Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.
  [2] LeSage, J. P., & Pace, R. K. (2009). Introduction to Spatial Econometrics. CRC Press.
  [3] Kelejian, H. H., & Prucha, I. R. (1998). A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances. Journal of Real Estate Finance and Economics, 17(1), 99–121.
  [4] Elhorst, J. P. (2014). Spatial Econometrics: From Cross-Sectional Data to Spatial Panels. Springer.
  [5] Bivand, R., & Piras, G. (2021). spatialreg: Spatial Regression Analysis (R package). CRAN.