See Chollet and Allaire (2018, 184–87).
… built from past information and constantly updated as new information comes in. A recurrent neural network (RNN) adopts the same principle …
In other words, an RNN model has a form of memory.
Here is a simple example in pseudocode:
state_t <- 0                                  # initial state: all zeros
for (input_t in input_sequence) {
  # combine the current input with the previous state, then apply the activation
  output_t <- activation(dot(W, input_t) + dot(U, state_t) + b)
  state_t <- output_t                         # the output becomes the state for the next step
}
timesteps <- 100
input_features <- 32
output_features <- 64
random_array <- function(dim) {
  array(runif(prod(dim)), dim = dim)  # prod(dim) draws from U(0, 1), reshaped to dim
}
inputs <- random_array(dim = c(timesteps, input_features))
state_t <- rep_len(0, output_features)  # initial state: a zero vector of length 64
W <- random_array(dim = c(output_features, input_features))
dim(W)
## [1] 64 32
U <- random_array(dim = c(output_features, output_features))
b <- random_array(dim = c(output_features, 1))
output_sequence <- array(0, dim = c(timesteps, output_features))
for (i in 1:nrow(inputs)) {
  input_t <- inputs[i, ]                        # input at timestep i: a vector of length 32
  # W %*% input_t : (64,32) %*% (32,1) -> (64,1)
  # U %*% state_t : (64,64) %*% (64,1) -> (64,1)
  output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))
  output_sequence[i, ] <- as.numeric(output_t)  # store this timestep's output
  state_t <- output_t                           # carry the output forward as the state
}
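For readers following the book with Keras, this recurrence is essentially what a SimpleRNN layer computes. The sketch below is illustrative only (it assumes the keras R package is installed and configured); it builds one such layer with the sizes used above and keeps the output of every timestep.

library(keras)
model <- keras_model_sequential() %>%
  layer_simple_rnn(units = output_features,                     # 64 output features
                   input_shape = c(timesteps, input_features),  # (100, 32) per sample
                   activation = "tanh",
                   return_sequences = TRUE)                     # return the output at every timestep
summary(model)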
library(magrittr)  # provides the %>% pipe used in the checks below
W %>% dim
## [1] 64 32
(W %*% inputs[1,]) %>% dim
## [1] 64 1
(W %*% inputs[1,]) %>% tanh %>% dim
## [1] 64 1
dim(output_sequence)
## [1] 100 64
output_sequence[1,] %>% length # output at the first timestep
## [1] 64
So each timestep produces 64 values, which fill exactly one 64-column row of output_sequence; the shapes line up.
A few details worth noting: the bias built with random_array(dim = c(output_features, 1)) is a (64, 1) matrix, the same shape as the two matrix products, so the addition is element-wise (R would recycle a shorter operand if the shapes differed); prod(dim) inside random_array just computes how many random values to draw; and %*% is matrix multiplication, i.e. the dot() of the pseudocode.
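A quick sanity check on those shapes (a minimal sketch using throwaway matrices of ones):

prod(c(output_features, input_features))      # number of random values drawn for W
## [1] 2048
dim(matrix(1, 64, 32) %*% matrix(1, 32, 1))   # (64,32) %*% (32,1) -> (64,1)
## [1] 64 1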
output_sequence %>% dim
## [1] 100 64
output_sequence %>% .[1:6,1:6]
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 0.9999995 1 1 0.9999999 1
## [2,] 1 1.0000000 1 1 1.0000000 1
## [3,] 1 1.0000000 1 1 1.0000000 1
## [4,] 1 1.0000000 1 1 1.0000000 1
## [5,] 1 1.0000000 1 1 1.0000000 1
## [6,] 1 1.0000000 1 1 1.0000000 1
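Why is every entry essentially 1? The weights and inputs are all drawn from runif() on [0, 1], so each pre-activation is a sum of dozens of positive terms and tanh() saturates. On the first timestep the state is still zero, so only about 32 terms contribute (hence entries like 0.9999995 in row 1); from the second timestep on the state itself is close to 1, the sum grows to roughly 96 terms, and the outputs print as exactly 1. A minimal sketch of the fix, just to illustrate the point (the small_* names are mine, not from the book): scale the random weights down and the outputs stop saturating.

small_W <- random_array(c(output_features, input_features)) * 0.01
small_U <- random_array(c(output_features, output_features)) * 0.01
small_b <- random_array(c(output_features, 1)) * 0.01
first_out <- tanh(as.numeric((small_W %*% inputs[1, ]) +
                             (small_U %*% rep(0, output_features)) +  # zero initial state
                             small_b))
range(first_out)  # values now fall well inside (-1, 1) instead of being pinned at 1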
an RNN is a loop that reuses quantities computed during the previous iteration of the loop.
This explanation, which reduces an RNN to a simple for loop, is admirably concise.
Chollet, François, and J.J. Allaire. 2018. Deep Learning with R. Manning Publications.