Global and Local Correlations under Spatial Autocorrelation
the paper proposes a method to test the correlation between two random fields when both are spatially autocorrelated
it uses Monte Carlo methods: one of the variables is permuted, then smoothed and scaled, which destroys its correlation with the other variable while approximately preserving its original autocorrelation.
Motivation
- Clifford, Richardson, and Hemon (1989): propose a method that estimates an effective sample size $M < N$ to be used in such tests, in an attempt to capture the real uncertainty. The correlation coefficient is thus evaluated against a Student’s $t$ distribution with $M$ degrees of freedom (a distribution with larger variance)
- but this applies only to Gaussian random fields
the paper proposes a Monte-Carlo method to test the correlation of two random fields that takes account of the spatial autocorrelation
- by randomly permuting the values of one of the fields across space, we eliminate the dependence between them
- by smoothing and scaling the permuted field, we approximately recover its spatial structure, with the help of the variogram
in the same spirit,
- Allard et al. (2001) propose a method that is based on random local rotations
- but applied to the characteristics of the spatial structure of point processes, where the intensity (rate parameter) is assumed to be constant at small scales and varies at large scales
- here, the method is applied to random fields, where the autocorrelation fades away as the distance increases
global correlation: a single Pearson’s correlation coefficient between the two random fields
other spatially local tests may be of interest
two criteria:
- biodiversity: amount of species richness
- remoteness: travel time in days needed to reach the nearest city
local correlation: the correlation between biodiversity and remoteness in a given neighborhood
2. Biodiversity Data
- Biodiversity ($X$) is the estimated number of species of plants, amphibians, birds and mammals in an area of 100 km x 100 km centered at location $s$
- Remoteness ($Y$) combines a number of data sets that influence speed of travel
the sample is denoted by
\[(X_s, Y_s) = [(X_{s_1}, Y_{s_1}), \ldots, (X_{s_N}, Y_{s_N})], \quad s = (s_1,\ldots, s_N), \; s_i \in \IR^2\]
2.1 The Empirical and Smoothed Variogram for Biodiversity
the theoretical variogram is a function describing the degree of spatial dependence of a random field $X_s$. It is defined as the variance of the difference between field values at two locations $s_i$ and $s_j$ across realizations of the field
\[\gamma(s_i, s_j) = \frac 12\text{Var}(X_{s_i} - X_{s_j})\]
the empirical variogram is the collection of pairs of distances $u_{ij} = \Vert s_i - s_j\Vert$ between $s_i$ and $s_j$ and their corresponding variogram ordinates $v_{ij} = \frac 12(X_{s_i} - X_{s_j})^2$
since $\gamma$ is expected to be a smooth function of distance, it is common to smooth the empirical variogram to improve its properties as an estimator for $\gamma$
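The empirical and smoothed variograms described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian kernel (Nadaraya-Watson) smoother, the `bandwidth` parameter, the function names and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def empirical_variogram(coords, x):
    """Return distances u_ij and ordinates v_ij = 0.5 * (x_i - x_j)^2 for all pairs i < j."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    u = np.linalg.norm(coords[i] - coords[j], axis=1)
    v = 0.5 * (x[i] - x[j]) ** 2
    return u, v

def smoothed_variogram(u, v, grid, bandwidth):
    """Kernel-weighted average of the ordinates v at each distance in `grid`.

    Assumption: a Gaussian kernel in distance; any scatterplot smoother could be used.
    """
    grid = np.asarray(grid, dtype=float)
    w = np.exp(-0.5 * ((grid[:, None] - u[None, :]) / bandwidth) ** 2)
    return (w @ v) / w.sum(axis=1)

# Synthetic field on 200 random locations (illustrative data only).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = np.sin(coords[:, 0]) + 0.1 * rng.standard_normal(200)

u, v = empirical_variogram(coords, x)
grid = np.linspace(0.5, 5.0, 10)
gamma_hat = smoothed_variogram(u, v, grid, bandwidth=0.5)
```

The smoothed curve `gamma_hat` plays the role of $\hat\gamma(X_s)$; the bandwidth controls how much the cloud of $(u_{ij}, v_{ij})$ pairs is averaged.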
2.2 Apply the Method to the Biodiversity Dataset
given $X_s$ and $Y_s$, test significance of
- the global correlation coefficient $H_0: \rho_{X_s, Y_s} = 0$
- the set of local correlations: $\hat\gamma_{X_s, Y_s}^\lambda(s_j), \forall j$
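The two statistics being tested can be sketched as below. The global coefficient is the ordinary Pearson correlation. For the local version, the code assumes a Gaussian-weighted Pearson correlation around each location $s_j$, with `lam` standing in for the neighborhood parameter $\lambda$; the exact weighting scheme is an assumption of this sketch, as are the function names.

```python
import numpy as np

def global_correlation(x, y):
    """Pearson correlation between the two fields over all locations."""
    return np.corrcoef(x, y)[0, 1]

def local_correlations(coords, x, y, lam):
    """Weighted Pearson correlation around every location s_j.

    Assumption: Gaussian weights exp(-d^2 / (2 * lam^2)) in the distance d to s_j.
    """
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / lam**2)
    w /= w.sum(axis=1, keepdims=True)      # each row of weights sums to 1
    mx, my = w @ x, w @ y                  # local weighted means
    cov = w @ (x * y) - mx * my            # local weighted covariance
    vx = w @ (x * x) - mx**2               # local weighted variances
    vy = w @ (y * y) - my**2
    return cov / np.sqrt(vx * vy)

# Illustrative data: y is an exact linear function of x.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(100, 2))
x = rng.standard_normal(100)
y = 2.0 * x + 1.0
r_global = global_correlation(x, y)
r_local = local_correlations(coords, x, y, lam=0.2)
```

With a perfectly linear relationship, both the global and every local coefficient should be 1 up to rounding, which is a convenient sanity check.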
3. Behavior of $r_{X_s, Y_s}$ Under Spatial Autocorrelation
4. Algorithm
the paper proposes a method that approximately recovers the null distribution of $r_{X_s, Y_s}$, or of any other statistic, under the independence of $X_s$ and $Y_s$
Let $X_s$ and $Y_s$ be a realization of two random fields. Repeat the following two steps $B$ times:
- Randomly permute the values of $X_s$ over $s$, and denote the result by $X_{\pi(s)}$; by construction, $X_{\pi(s)}$ and $Y_s$ are independent
- Smooth and scale $X_{\pi(s)}$ to produce $\hat X_s$, such that its smoothed variogram $\hat\gamma(\hat X_s)$ approximately matches $\hat\gamma(X_s)$; that is, the transformed variable $\hat X_s$ has approximately the same autocorrelation structure as $X_s$
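The outer loop of the two steps above might look like the following sketch. The callable `restore_structure` is a hypothetical placeholder for the smoothing-and-scaling step, and the two-sided p-value with the `+1` correction is a common Monte Carlo convention assumed here rather than taken from the notes.

```python
import numpy as np

def monte_carlo_pvalue(x, y, restore_structure, B=999, seed=0):
    """Approximate the null distribution of r by permuting x and restoring its structure.

    restore_structure(x_perm, rng) is a hypothetical user-supplied function that
    smooths and scales the permuted values so they regain x's autocorrelation.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_obs = np.corrcoef(x, y)[0, 1]
    r_null = np.empty(B)
    for b in range(B):
        x_perm = rng.permutation(x)               # step 1: break dependence with y
        x_hat = restore_structure(x_perm, rng)    # step 2: recover autocorrelation
        r_null[b] = np.corrcoef(x_hat, y)[0, 1]
    # Two-sided Monte Carlo p-value (assumed convention).
    p = (1 + np.sum(np.abs(r_null) >= abs(r_obs))) / (B + 1)
    return r_obs, r_null, p

# With the identity as `restore_structure`, this reduces to a plain permutation test.
rng = np.random.default_rng(2)
x = rng.standard_normal(100)
y = x + 0.1 * rng.standard_normal(100)
r_obs, r_null, p = monte_carlo_pvalue(x, y, lambda x_perm, rng: x_perm, B=199)
```

Passing the identity ignores autocorrelation entirely; the point of the method is to replace it with a step that matches the variogram of $X_s$.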
4.1 Step 2: Matching Variograms
this step focuses on recovering the intrinsic spatial structure of $X_s$ that was eliminated with the random permutation
since the null distribution of $r_{X_s, Y_s}$ is mainly determined by the amount of autocorrelation, the problem reduces to choosing the variogram from the family $\beta\hat\gamma(X_s^\delta) + \alpha$ that best approximates $\hat\gamma(X_s)$
Choose $\Delta$ to be a set of values for the proportion of neighbors to consider for the smoothing step
- Calculate the smoothed variogram $\hat\gamma(X_s)$ by smoothing the empirical variogram of $X_s$
- For each $\delta \in \Delta$ repeat:
  - a. construct the smoothed variable $X_s^\delta$ using a kernel smoother that fits a constant regression to $X_{\pi(s)}$ at each location $s_j$
  - b. calculate $\hat \gamma(X_s^\delta)$
  - c. fit a linear regression between $\hat\gamma(X_s^\delta)$ and $\hat\gamma(X_s)$, where $(\hat\alpha_\delta, \hat\beta_\delta)$ are the least-squares estimates
- choose $\delta^\star \in \Delta$ such that the sum of squares of the residuals of the fit is minimized
- transform: $\hat X_s = \vert \hat\beta_{\delta^\star}\vert^{1/2}X_s^{\delta^\star} + \vert\hat\alpha_{\delta^\star}\vert^{1/2}Z$, where $Z$ is a vector of mutually independent and identically distributed $Z_i$’s with zero mean and unit variance.
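The variogram-matching steps above can be sketched as follows. Several choices here are assumptions for illustration only: the smoother averages over the nearest $\delta N$ locations with equal weights (a uniform-kernel local-constant fit), the variogram is smoothed with a Gaussian kernel of bandwidth `h` on a fixed `grid` of distances, and $Z$ is standard normal. All function names are hypothetical.

```python
import numpy as np

def smooth_variogram(dist, x, grid, h):
    """Gaussian-kernel smooth of the variogram cloud of x, evaluated on `grid`."""
    i, j = np.triu_indices(len(x), k=1)
    u, v = dist[i, j], 0.5 * (x[i] - x[j]) ** 2
    w = np.exp(-0.5 * ((grid[:, None] - u[None, :]) / h) ** 2)
    return (w @ v) / w.sum(axis=1)

def knn_smooth(dist, x_perm, delta):
    """Local-constant fit: average x_perm over the nearest delta * N locations."""
    k = max(1, int(round(delta * len(x_perm))))
    nearest = np.argsort(dist, axis=1)[:, :k]
    return x_perm[nearest].mean(axis=1)

def match_variogram(coords, x, x_perm, deltas, grid, h, rng):
    """Pick delta*, then return X_hat = |beta|^(1/2) X^delta* + |alpha|^(1/2) Z."""
    coords = np.asarray(coords, dtype=float)
    grid = np.asarray(grid, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    target = smooth_variogram(dist, x, grid, h)          # gamma_hat(X_s)
    best = None
    for delta in deltas:
        x_delta = knn_smooth(dist, x_perm, delta)        # step a
        g_delta = smooth_variogram(dist, x_delta, grid, h)  # step b
        # step c: least squares for target ~ alpha + beta * g_delta
        A = np.column_stack([np.ones_like(g_delta), g_delta])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        rss = np.sum((target - A @ coef) ** 2)
        if best is None or rss < best[0]:
            best = (rss, delta, coef[0], coef[1], x_delta)
    _, delta_star, alpha, beta, x_delta = best
    z = rng.standard_normal(len(x))                      # iid, zero mean, unit variance
    x_hat = np.sqrt(abs(beta)) * x_delta + np.sqrt(abs(alpha)) * z
    return x_hat, delta_star

# Illustrative data only.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(150, 2))
x = np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + 0.1 * rng.standard_normal(150)
deltas = [0.05, 0.1, 0.2]
x_hat, delta_star = match_variogram(
    coords, x, rng.permutation(x), deltas, np.linspace(0.5, 5.0, 8), h=0.5, rng=rng
)
```

Scaling by $\vert\hat\beta\vert^{1/2}$ multiplies the variogram of the smoothed field by $\vert\hat\beta\vert$, and adding independent noise with variance $\vert\hat\alpha\vert$ shifts it up by $\vert\hat\alpha\vert$, which is why the transformed field has a variogram in the family $\beta\hat\gamma(X_s^\delta) + \alpha$.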
6. Discussion
the paper develops a nonparametric approach for sampling under the null hypothesis of independence, which involves three steps:
- pick one of the fields and estimate the spatial autocorrelation structure via its variogram
- randomly permute the values in this field
- apply a local smoothing to the permuted values, choosing the bandwidth and rescaling so that the resulting variogram matches the one estimated in the first step.
one of the important consequences of autocorrelation is that increasing the sample size does not necessarily increase the power to find significance
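A toy simulation can illustrate this last point. It is not from the paper: it assumes a one-dimensional domain $[0, 1)$ sampled on a regular grid, and builds each smooth field as a random combination of three low-frequency sine and cosine terms. Two such fields are drawn independently, and the spread of their sample correlation is compared for a coarse and a dense grid.

```python
import numpy as np

def sd_of_r(n, reps=500, n_freq=3, seed=0):
    """Standard deviation of r between two independent smooth fields sampled at n points."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / n                                  # regular grid on [0, 1)
    k = np.arange(1, n_freq + 1)
    phase = 2 * np.pi * np.outer(t, k)
    basis = np.hstack([np.cos(phase), np.sin(phase)])     # shape (n, 2 * n_freq)
    r = np.empty(reps)
    for b in range(reps):
        x = basis @ rng.standard_normal(2 * n_freq)       # smooth field X
        y = basis @ rng.standard_normal(2 * n_freq)       # independent smooth field Y
        r[b] = np.corrcoef(x, y)[0, 1]
    return r.std()

sd_small = sd_of_r(50, seed=1)    # coarse sampling of the domain
sd_large = sd_of_r(400, seed=2)   # eight times denser sampling of the same domain
```

In this setup the two standard deviations should be close to each other and well above the $1/\sqrt{N}$ that independent observations would give, since sampling the same smooth fields more densely adds little new information.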