1test数据是每5分钟测试一个点,2test是每15分钟测试一个点,现在需要将1test数据处理成和2test时间格式一样,再对比两组数据,挑选出2test数据和处理后的1test数据时间不对应的(即缺测或者重复测试)。

knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(lubridate)
test_1 <- read_csv("./data/1test20180602.csv",col_names = FALSE, skip = 1)
test_2 <- read_csv("./data/2test20180602.csv",col_names = FALSE, skip = 1)
Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<c4><ea>'
test_1 %>% 
    transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    arrange(datetime) %>% 
    mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
           duration_int = as.integer(duration)) %>% 
    filter(!duration_int %in% c(-300))
## # A tibble: 1 x 3
##   datetime            duration       duration_int
##   <dttm>              <S4: Duration>        <int>
## 1 2013-09-02 00:00:00 NA                       NA
test_2 %>% 
    transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    arrange(datetime) %>% 
    mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
           duration_int = as.integer(duration)) %>% 
    filter(!duration_int %in% c(-60*15))
## # A tibble: 3 x 3
##   datetime            duration            duration_int
##   <dttm>              <S4: Duration>             <int>
## 1 2013-09-02 00:00:00 NA                            NA
## 2 2013-09-02 03:45:00 1800s (~30 minutes)        -1800
## 3 2013-09-05 19:00:00 1800s (~30 minutes)        -1800

这两条数据间隔不满足要求。

test_2进行合并。

我想把一小时内的数据点由12个降到4个,把三组合并为一组取平均值。

但是这里并未说明要取什么的平均? 这个只有你可以尝试group_bysummarise函数完成,这里就不具体展开了。 具体可以参考,Baert dplyr tutorial

要求很显然,分成,

test_1 %>% 
    select(X1:X5) %>% 
    mutate(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    mutate(
        X5 = 
            case_when(
                X5 %in% c(0,5,10) ~ 0,
                X5 %in% c(15,20,25) ~ 15,
                X5 %in% c(30,35,40 ) ~ 30,
                X5 %in% c(45,50,55) ~ 45
            )
    )
## # A tibble: 1,152 x 6
##       X1    X2    X3    X4    X5 datetime           
##    <int> <int> <int> <int> <dbl> <dttm>             
##  1  2013     9     2     0    0. 2013-09-02 00:00:00
##  2  2013     9     2     0    0. 2013-09-02 00:05:00
##  3  2013     9     2     0    0. 2013-09-02 00:10:00
##  4  2013     9     2     0   15. 2013-09-02 00:15:00
##  5  2013     9     2     0   15. 2013-09-02 00:20:00
##  6  2013     9     2     0   15. 2013-09-02 00:25:00
##  7  2013     9     2     0   30. 2013-09-02 00:30:00
##  8  2013     9     2     0   30. 2013-09-02 00:35:00
##  9  2013     9     2     0   30. 2013-09-02 00:40:00
## 10  2013     9     2     0   45. 2013-09-02 00:45:00
## # ... with 1,142 more rows