1test数据是每5分钟测试一个点,2test是每15分钟测试一个点,现在需要将1test数据处理成和2test时间格式一样,再对比两组数据,挑选出2test数据和处理后的1test数据时间不对应的(即缺测或者重复测试)。
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(lubridate)
test_1 <- read_csv("./data/1test20180602.csv",col_names = FALSE, skip = 1)
test_2 <- read_csv("./data/2test20180602.csv",col_names = FALSE, skip = 1)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<c4><ea>'
skip = 1
说明: 表格变量名中,奇异字符,因此删除重命名,否则无法读取文件。test_1 %>%
transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>%
arrange(datetime) %>%
mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
duration_int = as.integer(duration)) %>%
filter(!duration_int %in% c(-300))
## # A tibble: 1 x 3
## datetime duration duration_int
## <dttm> <S4: Duration> <int>
## 1 2013-09-02 00:00:00 NA NA
test_2 %>%
transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>%
arrange(datetime) %>%
mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
duration_int = as.integer(duration)) %>%
filter(!duration_int %in% c(-60*15))
## # A tibble: 3 x 3
## datetime duration duration_int
## <dttm> <S4: Duration> <int>
## 1 2013-09-02 00:00:00 NA NA
## 2 2013-09-02 03:45:00 1800s (~30 minutes) -1800
## 3 2013-09-05 19:00:00 1800s (~30 minutes) -1800
这两条数据间隔不满足要求。
对test_2
进行合并。
我想把一小时内的数据点由12个降到4个,把三组合并为一组取平均值。
但是这里并未说明要取什么的平均? 这个只有你可以尝试group_by
和summarise
函数完成,这里就不具体展开了。 具体可以参考,Baert dplyr tutorial
要求很显然,分成,
test_1 %>%
select(X1:X5) %>%
mutate(datetime = make_datetime(X1,X2,X3,X4,X5)) %>%
mutate(
X5 =
case_when(
X5 %in% c(0,5,10) ~ 0,
X5 %in% c(15,20,25) ~ 15,
X5 %in% c(30,35,40 ) ~ 30,
X5 %in% c(45,50,55) ~ 45
)
)
## # A tibble: 1,152 x 6
## X1 X2 X3 X4 X5 datetime
## <int> <int> <int> <int> <dbl> <dttm>
## 1 2013 9 2 0 0. 2013-09-02 00:00:00
## 2 2013 9 2 0 0. 2013-09-02 00:05:00
## 3 2013 9 2 0 0. 2013-09-02 00:10:00
## 4 2013 9 2 0 15. 2013-09-02 00:15:00
## 5 2013 9 2 0 15. 2013-09-02 00:20:00
## 6 2013 9 2 0 15. 2013-09-02 00:25:00
## 7 2013 9 2 0 30. 2013-09-02 00:30:00
## 8 2013 9 2 0 30. 2013-09-02 00:35:00
## 9 2013 9 2 0 30. 2013-09-02 00:40:00
## 10 2013 9 2 0 45. 2013-09-02 00:45:00
## # ... with 1,142 more rows