1test数据是每5分钟测试一个点,2test是每15分钟测试一个点,现在需要将1test数据处理成和2test时间格式一样,再对比两组数据,挑选出2test数据和处理后的1test数据时间不对应的(即缺测或者重复测试)。
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(tidyverse)
library(lubridate)
test_1 <- read_csv("./data/1test20180602.csv",col_names = FALSE, skip = 1)
test_2 <- read_csv("./data/2test20180602.csv",col_names = FALSE, skip = 1)
Error in make.names(col.names, unique = TRUE) : 
  invalid multibyte string at '<c4><ea>'
skip = 1说明: 表格变量名中,奇异字符,因此删除重命名,否则无法读取文件。test_1 %>% 
    transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    arrange(datetime) %>% 
    mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
           duration_int = as.integer(duration)) %>% 
    filter(!duration_int %in% c(-300))
## # A tibble: 1 x 3
##   datetime            duration       duration_int
##   <dttm>              <S4: Duration>        <int>
## 1 2013-09-02 00:00:00 NA                       NA
test_2 %>% 
    transmute(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    arrange(datetime) %>% 
    mutate(duration = interval(datetime,lag(datetime)) %>% as.duration(),
           duration_int = as.integer(duration)) %>% 
    filter(!duration_int %in% c(-60*15))
## # A tibble: 3 x 3
##   datetime            duration            duration_int
##   <dttm>              <S4: Duration>             <int>
## 1 2013-09-02 00:00:00 NA                            NA
## 2 2013-09-02 03:45:00 1800s (~30 minutes)        -1800
## 3 2013-09-05 19:00:00 1800s (~30 minutes)        -1800
这两条数据间隔不满足要求。
对test_2进行合并。
我想把一小时内的数据点由12个降到4个,把三组合并为一组取平均值。
但是这里并未说明要取什么的平均? 这个只有你可以尝试group_by和summarise函数完成,这里就不具体展开了。 具体可以参考,Baert dplyr tutorial
要求很显然,分成,
test_1 %>% 
    select(X1:X5) %>% 
    mutate(datetime = make_datetime(X1,X2,X3,X4,X5)) %>% 
    mutate(
        X5 = 
            case_when(
                X5 %in% c(0,5,10) ~ 0,
                X5 %in% c(15,20,25) ~ 15,
                X5 %in% c(30,35,40 ) ~ 30,
                X5 %in% c(45,50,55) ~ 45
            )
    )
## # A tibble: 1,152 x 6
##       X1    X2    X3    X4    X5 datetime           
##    <int> <int> <int> <int> <dbl> <dttm>             
##  1  2013     9     2     0    0. 2013-09-02 00:00:00
##  2  2013     9     2     0    0. 2013-09-02 00:05:00
##  3  2013     9     2     0    0. 2013-09-02 00:10:00
##  4  2013     9     2     0   15. 2013-09-02 00:15:00
##  5  2013     9     2     0   15. 2013-09-02 00:20:00
##  6  2013     9     2     0   15. 2013-09-02 00:25:00
##  7  2013     9     2     0   30. 2013-09-02 00:30:00
##  8  2013     9     2     0   30. 2013-09-02 00:35:00
##  9  2013     9     2     0   30. 2013-09-02 00:40:00
## 10  2013     9     2     0   45. 2013-09-02 00:45:00
## # ... with 1,142 more rows