李家翔 2019-01-26
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
How to create a spatial effect variable and then lag it?
In matrix notation, a spatial lag is just
X*y
where bigX
is your spatial weights matrix, and little y is your variable of interest. Stack Overflow
注意 X
是一个很大的矩阵,因此当样本量很大时,这种矩阵需要在 Impala 这种批量处理数据的环境进行。
Assuming you have spherical coordinates (i.e. lat & lon)
- see the
spDists
function in thesp
package.- If you have projected coordinates you can just use
dist()
.If you have made your spatial weights matrix that has
n*n
dimensions then yes. Other notes - you may want to normalize by the row sums of the matrix - so the resultant variable will simply be the inverse distance weight. Also if you have repeated measures over time you probably want your spatial weights matrix to account for that (e.g. only calculate spatial lags within the same year).
Let’s produce a minimal exmaple. We assume three locations and correspond latitude and longtitude.
参考在Stack Overflow写的一个答案
w %*% x
is smaller than x
.library(data.table)
sp <- data.table(
location = letters[1:3]
,lat = c(20,40,60)
,lon = c(50,70,90)
)
sp
## location lat lon
## 1: a 20 50
## 2: b 40 70
## 3: c 60 90
library(tidyverse)
w_raw <-
sp %>%
dist(.$lat,.$lon,method = "euclidean", upper=TRUE) %>%
as.matrix()
w_raw
## 1 2 3
## 1 0.00000 34.64102 69.28203
## 2 34.64102 0.00000 34.64102
## 3 69.28203 34.64102 0.00000
w <- w_raw * (1/rowSums(w_raw))
x <- c(5,10,5)
w %*% x
## [,1]
## 1 6.666667
## 2 5.000000
## 3 6.666667
library(broom)
lm(x ~ (w %*% x)) %>%
tidy
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 25.0 3.02e-14 8.29e14 7.68e-16
## 2 w %*% x -3.00 4.90e-15 -6.13e14 1.04e-15
尝试检验 spatial lag term 是显著的,也可以通过 Moran I 检验
w
## 1 2 3
## 1 0.0000000 0.3333333 0.6666667
## 2 0.5000000 0.0000000 0.5000000
## 3 0.6666667 0.3333333 0.0000000
x
## [1] 5 10 5
w %*% x
## [,1]
## 1 6.666667
## 2 5.000000
## 3 6.666667
分步解答
c(w[1,] %*% x,w[2,] %*% x,w[3,] %*% x)
## [1] 6.666667 5.000000 6.666667
矩阵相乘参考Blog
分析
w[1,] %*% x ==
w[1,1] * x[1] +
w[1,2] * x[2] +
w[1,3] * x[3]
## [,1]
## [1,] TRUE
w[1,i]
为样本1的和各个其他样本的距离,且进行了标准化。x
为变量的值。因此这个计算方式可以简化为
x_data <- data_frame(
id = 1:3
,x = x
)
w_long <-
w %>%
as.data.frame() %>%
rownames_to_column('id_left') %>%
gather(id_right,weight,-id_left)
w_long %>%
left_join(
x_data %>%
mutate(id = as.character(id))
,by = c('id_right' = 'id')
) %>%
group_by(id_left) %>%
summarise(
x_multipleby_w = sum(weight * x)
)
## # A tibble: 3 x 2
## id_left x_multipleby_w
## <chr> <dbl>
## 1 1 6.67
## 2 2 5.00
## 3 3 6.67
因此按照这个逻辑,翻译成 SQL,可以在 Impala 和 Hive 中进行 spatial lag term 的构造。