R Base Notes
2020-09-21
- 使用 RMarkdown 的
child
参数,进行文档拼接。 - 这样拼接以后的笔记方便复习。
- 相关问题提交到 Issue
包含 r-base 和 GitHub 上 r-lib 相关的包
1 unload package
detach(package:fortunes, unload=TRUE)
3 文本排序
mixedsort {gtools} R Documentation
Order or Sort strings with embedded numbers so that the numbers are in the correct order
## [1] "f9" "f10" "f1"
## [1] "f1" "f10" "f9"
## [1] "f1" "f9" "f10"
5 \(\hat y\)的各种统计指标
## $fit
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 23.00544 23.00544 25.14862 18.96635
## Hornet Sportabout Valiant Duster 360 Merc 240D
## 14.76241 20.32645 14.76241 23.55360
## Merc 230 Merc 280 Merc 280C Merc 450SE
## 23.79677 22.69220 22.69220 18.23272
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## 18.23272 18.23272 10.14632 10.64090
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
## 11.46520 26.35622 26.47987 26.66946
## Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## 24.64992 16.49345 17.07046 15.17456
## Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## 13.11381 26.34386 24.64168 25.68030
## Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
## 15.13335 23.62366 17.19410 24.61283
##
## $se.fit
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## 0.6643912 0.6643912 0.8153165 0.5889767
## Hornet Sportabout Valiant Duster 360 Merc 240D
## 0.8375091 0.5754133 0.8375091 0.6979313
## Merc 230 Merc 280 Merc 280C Merc 450SE
## 0.7140677 0.6471724 0.6471724 0.6127705
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## 0.6127705 0.6127705 1.2739033 1.2237098
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
## 1.1413740 0.9184018 0.9294688 0.9465968
## Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## 0.7759228 0.7067474 0.6705132 0.8038897
## Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## 0.9831357 0.9172997 0.7752901 0.8594940
## Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
## 0.8071908 0.7025060 0.6633450 0.7730805
##
## $df
## [1] 30
##
## $residual.scale
## [1] 3.251454
se.fit
: standard error of predicted means。
6 赋值符号箭头(<-)和等号(=)的区别
参考: 你知道R中的赋值符号箭头(<-)和等号(=)的区别吗?
总结下。
<-
相当于定义个一个全局变量,
=
相当于定义一个局部变量。
## [1] 5.5
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 5.5
因为=
相当于定义一个局部变量,没有定义一个全局变量,因此在全局我们调用x
的时候,x
是不存在的。
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
## [,1] [,2] [,3] [,4]
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
是因为
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
这里ncol <- 3
相当于nrow = ncol <- 3
,定义一个全局变量,ncol
,等于3,然后是的参数nrow
等于ncol
,在函数中。
因此总结一下,follow大神的code,在定义变量的时候,用<-
,在定义参数,局部变量的时候,用=
。
7 插入时间
{snippet} ts
: # Thu Apr 12 23:03:49 2018 ------------------------------
8 try & install packages
try(require("tidyquant") || install.packages("tidyquant"))
可以配合批量列表,完成包的预装。
## [1] TRUE
8.1 建立一个专门安装包的Rproj
建立一个专门安装包的Rproj
,因为有时候安装R包的时候,耽误时间,并且可能中断命令,R还要强制退出。
8.2 R包的结构
参考 Salmon and Csárdi (2020)
## [1] "D:/install/R-3.6.0/library/ggplot2"
## [1] "D:/install/R-3.6.0/library/ggplot2"
## [1] "D:/install/R-3.6.0/library"
## D:/install/R-3.6.0/library/ggplot2
## +-- CITATION
## +-- data
## | +-- Rdata.rdb
## | +-- Rdata.rds
## | \-- Rdata.rdx
## +-- DESCRIPTION
## +-- doc
## | +-- extending-ggplot2.html
## | +-- extending-ggplot2.R
## | +-- extending-ggplot2.Rmd
## | +-- ggplot2-in-packages.html
## | +-- ggplot2-in-packages.R
## | +-- ggplot2-in-packages.Rmd
## | +-- ggplot2-specs.html
## | +-- ggplot2-specs.R
## | +-- ggplot2-specs.Rmd
## | \-- index.html
## +-- help
## | +-- aliases.rds
## | +-- AnIndex
## | +-- figures
## | | +-- logo.png
## | | \-- README-example-1.png
## | +-- ggplot2.rdb
## | +-- ggplot2.rdx
## | \-- paths.rds
## +-- html
## | +-- 00Index.html
## | \-- R.css
## +-- INDEX
## +-- LICENSE
## +-- MD5
## +-- Meta
## | +-- data.rds
## | +-- features.rds
## | +-- hsearch.rds
## | +-- links.rds
## | +-- nsInfo.rds
## | +-- package.rds
## | +-- Rd.rds
## | \-- vignette.rds
## +-- NAMESPACE
## +-- NEWS.md
## \-- R
## +-- ggplot2
## +-- ggplot2.rdb
## \-- ggplot2.rdx
8.2.1 升级R,不需要重新下载包
如果你是在windows环境下,可以安装一个包:installr。不仅能帮你更新R,还能将原来安装的扩展包配置到新版本下,不用再重新安装。 (张杰 EasyCharts)
8.2.2 安装包报错的解决办法
.libPaths()
查看使用的R版本装包的路径,删除老的yaml
包,再重装。- 在RStudio Community
上,有人很多提问
yaml
包安装报错的办法。 .libPaths()
列出R包的安装路径,退出R和RStudio,删除yaml
文件夹,然后重新安装。 (Hallgrímsson 2018)- 另一种办法是查询包安装路径的方式(Dervieux 2018)
- 如
find.package("ggplot2")
- 如
- 在RStudio Community
上,有人很多提问
- 查看自己的R版本是不是用错了,
Tools
\(\to\)Global Option
,查看。
9 参考包的版本
10 查看特定包有哪些dataset
10.2 data()
但是对于tidyverse
和caret
这种集成包就不适用了。
因此直接使用原始函数。
(Scriven 2014)
然后我们发现存在
oilType (oil) Fatty acid composition of commercial oils
因此,我们可以调用oilType
这个dataset。
11 NA
相关问题
NA
不能用于比较(UCLA: Statistical Consulting Group 2012),因此用a == NA
、a = NA
、a -> NA
时,都是在做比较,因此系统是报错的。
这里的修改的方法是a %in% NA
(Mahto 2014),类似于子集的概念。
12 eval
和parse
## [1] "1+1"
## expression(1 + 1)
## [1] 2
parse
: 把string表达式化eval
: 实现表达式- 别忘记了
text
参数,这不是第一个参数,因此会报错。
13 中文单位替换函数
data_frame(price = c("300", "2.5k", "3.1万", "4.1k", "3.8万")) %>%
mutate(number = str_extract(price,"\\d+\\.\\d+|\\d+"),
number = as.double(number),
unit = str_extract(price,"k|万"),
prod = case_when(
unit == "k" ~ 10^3,
unit == "万" ~ 10^4,
is.na(unit) ~ 1
),
final = number*prod,
final = formattable::accounting(final)
) %>%
select(price, final)
14 处理json格式数据进dataframe(Rodriguez 2018)
your_list <- list(
name = "name",
age = 20,
sex = "M",
classification = list(
category = list(
id = 1001,
loc = c("A", "B", "C")
),
subcategory = list(
id = 2001,
loc = c("A", "B", "C")
),
type = list(
id = 3001,
loc = c("A", "B", "C")
)
)
)
library(tidyverse)
library(jsonlite)
your_list %>%
# make json, then make list
toJSON() %>%
fromJSON() %>%
# remove classification level
purrr::flatten() %>%
# turn nested lists into dataframes
map_if(is_list, as_tibble) %>%
# bind_cols needs tibbles to be in lists
map_if(is_tibble, list) %>%
# creates nested dataframe
bind_cols()
15 分离路径函数
这个函数可以分离一个路径的不同层级,并且方便复制粘贴、set
函数分析路径。
函数构建主要借鉴(Stack Overflow 2015)。
split_path2 <- function(path){
split_path <- function(path) {
if (dirname(path) %in% c(".", path)) return(basename(path))
return(c(basename(path), split_path(dirname(path))))
}
data_frame(path = split_path(path)) %>%
mutate(index = 1:length(path)) %>%
arrange(desc(index)) %>%
filter(!path == "") %>%
select(path)
}
假设一条路径a
这样方便复制粘贴。
或者可以进行set函数分析路径。
## [1] "imp_rmd"
可以查看路径b
少了imp_rmd
这一环节。
basename("C:/some_dir/a.ext")
# [1] "a.ext"
dirname("C:/some_dir/a.ext")
# [1] "C:/some_dir"
其实就是这两个函数的思想(Stack Overflow 2010)。
并且这两个函数还可以进行批量操作。
## [1] "file1" "file2"
## [1] "/p1/p2/p3"
16 file.exists
和download.file
搭配使用
if(!file.exists("2008.csv.bz2"))
{download.file("http://stat-computing.org/dataexpo/2009/2008.csv.bz2", "2008.csv.bz2")}
if(!file.exists("2007.csv.bz2"))
{download.file("http://stat-computing.org/dataexpo/2009/2007.csv.bz2", "2007.csv.bz2")}
17 会计格式的string保持计算功能
## [1] 152821
## [1] 152,821
## [1] 152821
18 letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
## [20] "t" "u" "v" "w" "x" "y" "z"
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
## [20] "T" "U" "V" "W" "X" "Y" "Z"
## [1] "A" "B"
19 当前Rmd
文件和Rproj
的绝对路径
我在RStudio Community上进行了回答。
rstudioapi::getActiveProject() # project path
rstudioapi::getActiveDocumentContext()$path # file path
同时也有可视化的选项进行设定,非常方便,如图。
20 提取R对象 (R语言中文社区 2018)
20.1 drop
参数保持矩阵特性
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [1] 3
## [,1]
## [1,] 3
20.2 [
, [[
, $
在list中
## $foo
## [1] 1 2 3 4
##
## $bar
## [1] 0.6
## $foo
## [1] 1 2 3 4
## [1] 1 2 3 4
## [1] 1 2 3 4
## [1] 1 2 3 4
## $foo
## [1] 1 2 3 4
##
## $bar
## [1] 0.6
20.3 [[]]
可以后续再次赋值变量
## [1] 1 2 3 4
## [1] 1 2 3 4
## [1] 1 2 3 4
## NULL
20.4 [[]]
可以有传导功能
## [1] 10
## [1] 10
20.6 drop = FALSE 保证输出为 data.frame
参考 Developed (2019)
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
21 R查找某一元素位置 (常玉俊 2017)
需求驱动,简单最好。(index = 1)
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 5
22 字符串中的引号处理 (Abel 2015)
a <- paste("modelCheck(var\"",i,"_d.bug\")",sep="")
a
str_length(a)
b <- paste('modelCheck(var"',i,'_d.bug")',sep="")
所以这么操作以后,虽然\
也在print
中,但是是不存入string
内的,str_length(a)
验证了字符串的长度为24
,因此不包含\
。
23 字符串中有会计符号的处理方法
## [1] "character"
## [1] "double"
24 cache文件报错
如果R某个cache文件报错了,那么把整个文件夹删除,重新all run。
25 error reading from connection
报错
Error in install.packages : error reading from connection
每次更新RStudio,需要更新下下载包的路径。
\[\text{Global Options}\to\text{Packages}\to\text{CRAN mirror}\]
可以选择上海同济大学。
26 查询文件夹内文件数量
27 read.table
用来读.txt
和.sql
(Shamma 2009)
28 本地安装R包
首先查看是否有Github版本,有的话,打开Shell
git clone ...
下载后,格式为文件夹,
如果Github上没有,去R CRAN上下载,本地解压。
在R console,输入
install.packages("文件路径", repos=NULL, type="source")
例如,
29 rproj 解读
Version: 1.0
RestoreWorkspace: Default
SaveWorkspace: Default
AlwaysSaveHistory: Default
EnableCodeIndexing: Yes
UseSpacesForTab: Yes
NumSpacesForTab: 4
Encoding: UTF-8
RnwWeave: Sweave
LaTeX: pdfLaTeX
NumSpacesForTab: 4
限定了缩进选择几个。
30 批量修改文件名
去除
33 Fatal Error: cannot open the base package RStudio
有一个包更新报错,不能删除旧版本,查询
就不小心删除了。
相当于把 base 包删除了。
如果还记得当时打开的默认 R 版本,就重新安装,相当于下载了 base 包,就可以打开了。
34 层层递进学习函数的结构
参考 https://github.com/r-lib/lobstr
This package is to visualize R object maded by Hadley Wickham.
## x
## 1. +-global::f(cst())
## 2. | \-global::g(x)
## 3. | \-global::h(x)
## 4. \-lobstr::cst()
## Error in loadNamespace(name): there is no package called 'add2text'
35 查看两个df是否相等
参考 https://sharla.party/post/comparing-two-dfs/
## - different number of columns: 11 vs 10
## Differences found between the objects!
##
## A summary is given below.
##
## There are columns in BASE that are not in COMPARE !!
## All rows are shown in table below
##
## =========
## COLUMNS
## ---------
## mpg
## ---------
36 If 条件安装包
if (!"pacman" %in% dir(.libPaths())) devtools::install_github("trinker/pacman")
这个命令直观,学习。
37 zip
- 可以在不解压的情况下,先查看其中文件的信息。
- 并且也可以知道 csv 文件在压缩前后的大小。
- 也可以知道文件最后被压缩的时间
38 函数中覆盖 global 变量
参考 http://adv-r.had.co.nz/Environments.html
To make assignments to global variables, superassignment operator,
<<-
, is used. https://www.datamentor.io/r-programming/environment-scope/
方便另外一个函数调用。
39 file name conversion problem
解决 Github Issue 486
analysis/漏斗图代码.Rmd
尽量不要用中文命名,会产生这样的报错。
It looks like you have a problem because the path name has non-English characters in it. (Artem 2018)
file name conversion problem -- name too long?
这是一个误导信息,不是文件名太长,而是有文件名中文字符,或者文档中调用的文件路径有中文导致的。
40 查看R和RStudio版本
41 argmax
参考 https://r.789695.n4.nabble.com/argmax-td822125.html
argmax in R 为 which.max
42 fs
## [1] "analysis/a20191126101202"
## [1] "Rmd"
## analysis/a20191126101202.Rmd
## FALSE
## analysis/a20191126101202.Rmd
## FALSE
## analysis/a20191126101202.Rmd
## FALSE
## analysis/a20191126101202.aaa
44 安装 Rtools
- 例如,roxygen2 等底层包是需要的。
- rtools 就是集合了 make 等命令行工具
参考 https://cran.r-project.org/bin/windows/Rtools/ 和 https://github.com/r-lib/roxygen2/issues/1128
配置环境变量,名字不要写错了,中间有一个 usr,C:\\rtools40\\usr\\bin\
45 element operation data.frames
参考 https://stackoverflow.com/questions/6408016/paste-together-two-data-frames-element-by-element-in-r
value_df <- mtcars %>% head() %>% select(1:4)
sig_df <-
matrix(
"**",
nrow = nrow(value_df),
ncol = ncol(value_df),
dimnames = dimnames(value_df)
)
matrix(
paste0(as.matrix(sig_df), as.matrix(value_df), as.matrix(sig_df)),
nrow = nrow(value_df),
ncol = ncol(value_df),
dimnames = dimnames(value_df)
)
## mpg cyl disp hp
## Mazda RX4 "**21**" "**6**" "**160**" "**110**"
## Mazda RX4 Wag "**21**" "**6**" "**160**" "**110**"
## Datsun 710 "**22.8**" "**4**" "**108**" "**93**"
## Hornet 4 Drive "**21.4**" "**6**" "**258**" "**110**"
## Hornet Sportabout "**18.7**" "**8**" "**360**" "**175**"
## Valiant "**18.1**" "**6**" "**225**" "**105**"
46 waldo 查询对象差异
参考 r-lib/waldo: Find differences between R objects
# devtools::install_github("r-lib/waldo")
library(waldo)
df1 <- data.frame(x = 1:3, y = 3:1)
df2 <- tibble::tibble(y = 3:1, x = 1:3)
compare(df1, df2)
## `class(x)`: "data.frame"
## `class(y)`: "tbl_df" "tbl" "data.frame"
##
## `names(x)`: "x" "y"
## `names(y)`: "y" "x"
library(waldo)
df1 <- data.frame(x = 1:3, y = 3:1) %>% matrix()
df2 <- tibble::tibble(y = 3:1, x = 1:3) %>% matrix()
compare(df1, df2)
## `x[[1]]`: 1 2 3
## `y[[1]]`: 3 2 1
##
## `x[[2]]`: 3 2 1
## `y[[2]]`: 1 2 3
47 drop column
参考 https://www.listendata.com/2015/06/r-keep-drop-columns-from-data-frame.html
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
## [1] "mpg" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
48 extract list from data.frame
参考 https://stackoverflow.com/questions/4605206/drop-data-frame-columns-by-name
## [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
包含了一些数据集的 input 条件,最好还是别 run ipynb,改成 py 调用吧。
49 安装包的老版本
参考 https://community.rstudio.com/t/trouble-installing-package-rodbc/66107
要么 R 更新到 4.0,要么用老版本。
50 设置 Python
当不在默认路径里面安装 Python 时,如果重新安装 Python 会导致平时用的 Python 和当前 Python 的库和配置不能共享,很不方便,因此建议如下设置。
51 rlang
!! rlang::parse_expr(filter)
52 Rtools
加速下载链接(使用清华镜像站): https://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/windows/Rtools/Rtools35.exe 下载镜像。
附录
参考文献
Abel, Guy. 2015. “Paste Quotation Marks into Character String, Within a Loop.” 2015. https://stackoverflow.com/questions/4453020/paste-quotation-marks-into-character-string-within-a-loop.
Artem. 2018. “Issue Loading Data of ‘File Name Conversion Problem - Name Too Long?’.” Stack Overflow. 2018. https://stackoverflow.com/a/52460959/8625228.
Dervieux, Christophe. 2018. “Are There Problems with the Readoffice Package?” 2018. https://community.rstudio.com/t/are-there-problems-with-the-readoffice-package/9494/2.
Developed, RStudio. 2019. “Tidy Evaluation with Rlang Cheat Sheet.” 2019. https://resources.rstudio.com/rstudio-developed/tidyeval-2.
Hallgrímsson, Hlynur. 2018. “Error When Starting Rstudio "There Is No Package ’Yaml’".” 2018. https://community.rstudio.com/t/error-when-starting-rstudio-there-is-no-package-yaml/4070.
Mahto, Ananda. 2014. “R: How to Include Na in Ifelse?” 2014. https://stackoverflow.com/questions/22076353/r-how-to-include-na-in-ifelse.
Rodriguez, Tony. 2018. “How to Read Multilevel Json Data and Convert to Data Frame in R.” 2018. https://community.rstudio.com/t/how-to-read-multilevel-json-data-and-convert-to-data-frame-in-r/7571/11.
R语言中文社区. 2018. “R语言笔记3:提取R对象的子集.” 2018. https://mp.weixin.qq.com/s/_ePkjK3cuoPgvCjnQP0o2A.
Salmon, Maëlle, and Gábor Csárdi. 2020. “State of R Packages in Your Library.” The R-hub blog. 2020. https://blog.r-hub.io/2020/09/03/keep.source/.
Scriven, Rich. 2014. “Get a List of the Data Sets in a Particular Package.” 2014. https://stackoverflow.com/questions/27709936/get-a-list-of-the-data-sets-in-a-particular-package.
Shamma, David Ayman. 2009. “How to Call the Output of a Function in Another Function?” Stack Overflow. 2009. https://stackoverflow.com/questions/1407647/reading-text-files-using-read-table.
Stack Overflow. 2010. “Find File Name from Full File Path.” 2010. https://stackoverflow.com/questions/2548815/find-file-name-from-full-file-path.
———. 2015. “Split a File Path into Folder Names Vector.” 2015. https://stackoverflow.com/questions/29214932/split-a-file-path-into-folder-names-vector.
UCLA: Statistical Consulting Group. 2012. “How Does R Handle Missing Values? | R Faq.” 2012. https://stats.idre.ucla.edu/r/faq/how-does-r-handle-missing-values/.
常玉俊. 2017. “R查找某一元素位置.” 2017. https://blog.csdn.net/chang349276/article/details/77299104.