Get predictions for the test set

Assume we have

a test dataset, test
a model, xgb_base, which is an xgboost model
an output directory, data, where the predictions should be saved as my_pred.csv

phv_pred(test,xgb_base,dir='data',output_name='my_pred.csv')
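
To make the idea of the call above concrete, here is a minimal sketch of a predict-and-save helper. It is not the source of phv_pred: the name predict_and_save, the output columns row and pred, and the lm stand-in model are all assumptions made so the example runs without xgboost.

```r
# Hypothetical helper: NOT the implementation of phv_pred(), only a sketch.
predict_and_save <- function(test, model, dir = "data",
                             output_name = "my_pred.csv") {
  # Create the output directory if it does not exist yet.
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  # Generic predict(); an xgboost model would typically expect a numeric
  # matrix rather than a data frame at this step.
  pred <- predict(model, newdata = test)
  # Assumed output layout: one row per prediction.
  out <- data.frame(row = seq_along(pred), pred = as.numeric(pred))
  path <- file.path(dir, output_name)
  write.csv(out, path, row.names = FALSE)
  invisible(path)
}

# Stand-in demo with a linear model on a built-in dataset.
fit <- lm(mpg ~ wt, data = mtcars)
out_dir <- file.path(tempdir(), "data")
path <- predict_and_save(mtcars, fit, dir = out_dir)
head(read.csv(path))
```

Returning the file path invisibly lets the caller read the file back or log where it went without printing anything by default.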

The official target metric on the contest site is non-standard, so I have wrapped it in the function phv_metric.

Assume you have finished your model and have four columns in the dataset dataset:

id is the ID of the phv machine (stored as short_name in the example data).
t is the time at which each y and x variable is recorded.
p is the real power; in this contest, it is the target we want to predict.
phat is the predicted power, which should be as close as possible to the real one.
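
As an illustration of the expected shape, the following sketch builds a tiny data frame with the four columns just described. The object name toy and every value in it are invented for illustration and do not come from the contest data.

```r
# Hypothetical toy data with the four columns described above.
# All values are made up for illustration only.
toy <- data.frame(
  id   = c(1, 1, 2, 2),
  t    = as.POSIXct(
    c("2019-01-01 08:00:00", "2019-01-01 08:15:00",
      "2019-01-01 08:00:00", "2019-01-01 08:15:00"),
    tz = "UTC"
  ),
  p    = c(0.0, 1.2, 0.1, 2.4),  # real power
  phat = c(0.1, 1.0, 0.0, 2.5)   # predicted power
)

# Inspect the column types before passing the columns to a metric function.
str(toy)
```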

library(add2evaluation)
data(dataset)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
library(psych)
library(tidyverse)
#> ─ Attaching packages ────────────────────────────── tidyverse 1.2.1 ─
#> ✔ ggplot2 3.1.0       ✔ purrr   0.2.5  
#> ✔ tibble  2.1.1       ✔ dplyr   0.8.0.1
#> ✔ tidyr   0.8.2       ✔ stringr 1.4.0  
#> ✔ readr   1.1.1       ✔ forcats 0.3.0
#> ─ Conflicts ─────────────────────────────── tidyverse_conflicts() ─
#> ✖ ggplot2::%+%()           masks psych::%+%()
#> ✖ ggplot2::alpha()         masks psych::alpha()
#> ✖ lubridate::as.difftime() masks base::as.difftime()
#> ✖ lubridate::date()        masks base::date()
#> ✖ dplyr::filter()          masks stats::filter()
#> ✖ lubridate::intersect()   masks base::intersect()
#> ✖ dplyr::lag()             masks stats::lag()
#> ✖ lubridate::setdiff()     masks base::setdiff()
#> ✖ lubridate::union()       masks base::union()
dataset %>% 
    describe()
#> Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
#> Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
#>            vars      n mean   sd median trimmed  mad   min   max range
#> short_name    1 183093 2.26 1.18   2.00    2.20 1.48  1.00  4.00  3.00
#> t             2 183093  NaN   NA     NA     NaN   NA   Inf  -Inf  -Inf
#> p             3 183093 4.85 9.50   0.00    2.26 0.17 -0.40 48.83 49.23
#> phat          4 183093 4.84 8.88   0.17    2.47 0.47 -3.55 46.97 50.52
#>            skew kurtosis   se
#> short_name 0.33    -1.40 0.00
#> t            NA       NA   NA
#> p          2.60     6.38 0.02
#> phat       2.39     5.11 0.02
phv_metric(
    id = dataset$short_name
    ,t = dataset$t
    ,y = dataset$p
    ,yhat = dataset$phat
)
#> [1] 0.1192559

Jiaixang Li

2019-09-28