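These notes are stitched together from several documents using the child option of RMarkdown.
Combined this way, the notes are easier to review.
Please submit related questions as an Issue.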

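1 `^` as negation in a character class

Inside square brackets, ^ negates the class: [^a-z] matches any single character that is not a lowercase letter.

library(stringr)

"30_CI.Rmd" %>% 
  str_subset("[^a-z]{5,}\\.Rmd")

## [1] "30_CI.Rmd"

  # [^a-z]{5,} requires five or more consecutive characters, none of them in [a-z]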
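The two roles of `^` are easy to confuse, so here is a minimal sketch that contrasts them. The file names in `files` are made up for illustration.

```r
library(stringr)

# Hypothetical file names, not from the original notes
files <- c("30_CI.Rmd", "intro.Rmd", "README.md")

# Inside brackets, ^ negates the class:
# five or more non-lowercase characters, then a literal ".Rmd"
negated <- str_subset(files, "[^a-z]{5,}\\.Rmd")
print(negated)
# [1] "30_CI.Rmd"

# Outside brackets, ^ anchors the match to the start of the string
anchored <- str_subset(files, "^[a-z]+\\.Rmd")
print(anchored)
# [1] "intro.Rmd"
```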

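2 Important character classes

. = any character except \n
\d = [[:digit:]] = [0-9]
\D = [^0-9]
\s = [ \t\n\r\f\v] = [[:space:]]
\S = anything that is not \s
\w = [a-zA-Z0-9_], i.e. word characters
\W = anything that is not \w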
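A short sketch of these shorthand classes in use, assuming stringr is loaded. The sample string `s` is invented for the example.

```r
library(stringr)

# Hypothetical sample: one space and one tab separate the parts
s <- "Room 42\tis_open"

digits <- str_extract_all(s, "\\d")[[1]]   # every single digit
spaces <- str_count(s, "\\s")              # the space and the tab
words <- str_extract_all(s, "\\w+")[[1]]   # runs of word characters
non_words <- str_count(s, "\\W")           # everything that is not \w

print(digits)     # [1] "4" "2"
print(spaces)     # [1] 2
print(words)      # [1] "Room"    "42"      "is_open"
print(non_words)  # [1] 2
```

Note that the underscore counts as a word character, so `is_open` stays in one piece.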

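For Chinese text, a pattern like the following can be used instead.

'【[^【]+消费[^】]+】|【[^【]?消费[^】]?】' The match must begin with 【 and end with 】. The pattern works by exclusion (negated character classes) and is worth studying further.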
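To see how the exclusion idea behaves, the sketch below applies the pattern to three made-up messages (the strings in `msgs` are assumptions, not data from the notes). Only a bracketed tag that contains 消费 between 【 and 】 is extracted.

```r
library(stringr)

tag_pattern <- "【[^【]+消费[^】]+】|【[^【]?消费[^】]?】"

# Hypothetical messages for illustration
msgs <- c("【双11消费提醒】您好", "【消费】", "【银行】消费100元")

tags <- str_extract(msgs, tag_pattern)
print(tags)
# [1] "【双11消费提醒】" "【消费】"         NA
```

In the third message 消费 appears after the closing 】, so no tag is returned.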

d1 <- 
c("Licence:yes","Licence:no")
d1 %>% 
  str_subset("Licence:(yes|no)")

## [1] "Licence:yes" "Licence:no"

d1 %>% 
  str_subset("Licence:yes|no")

## [1] "Licence:yes" "Licence:no"

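# Same result on this data, but the patterns are not equivalent:
# without parentheses the alternation is "Licence:yes" OR "no"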
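The two patterns return the same result on `d1`, but they are not equivalent in general, because `|` has the lowest precedence. A minimal sketch with one extra, made-up element (`"nothing here"`) shows the difference:

```r
library(stringr)

# Hypothetical data: the third element is added to expose the difference
d <- c("Licence:yes", "Licence:no", "nothing here")

# Grouped: "Licence:" followed by either "yes" or "no"
grouped <- str_subset(d, "Licence:(yes|no)")
print(grouped)
# [1] "Licence:yes" "Licence:no"

# Ungrouped: either "Licence:yes" or a bare "no" anywhere in the string
ungrouped <- str_subset(d, "Licence:yes|no")
print(ungrouped)
# [1] "Licence:yes"  "Licence:no"   "nothing here"
```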

c("car","carr","cas","cars") %>% 
  str_subset("cars?")

## [1] "car"  "carr" "cars"

? means 0 or 1 times: the preceding element is optional
* means 0 or more times
+ means 1 or more times
{n,m} means between n and m times
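The four quantifiers can be compared side by side. This is a small sketch on an invented vector `x`; the patterns are anchored with `^` and `$` so that the whole string has to fit.

```r
library(stringr)

# Hypothetical inputs with zero to three "a" characters
x <- c("ct", "cat", "caat", "caaat")

zero_or_one <- str_subset(x, "^ca?t$")
zero_or_more <- str_subset(x, "^ca*t$")
one_or_more <- str_subset(x, "^ca+t$")
two_to_three <- str_subset(x, "^ca{2,3}t$")

print(zero_or_one)   # [1] "ct"  "cat"
print(zero_or_more)  # [1] "ct"    "cat"   "caat"  "caaat"
print(one_or_more)   # [1] "cat"   "caat"  "caaat"
print(two_to_three)  # [1] "caat"  "caaat"
```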

3 Extracting phone numbers

d2 <- 
  c("555-555-555", "555 555 555", "555555555","555 555-555")
d2 %>% 
  str_subset("\\d+[-\\s]?\\d+[-\\s]?\\d+")

## [1] "555-555-555" "555 555 555" "555555555"   "555 555-555"

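Some ways to extract phone numbers

Note that this differs slightly from standard regular expression notation: inside an R string, \d has to be written as \\d.

To explain the pattern:

\\d matches any digit
+ lets the preceding digit match 1 or more times
[-\\s] says that the digits may be followed by either - or whitespace (\\s)
? makes the preceding [-\\s] optional: it may appear 0 or 1 times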
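Because `\\d+` accepts any run of digits, the pattern above also matches strings that are not phone numbers. A stricter variant, sketched here under the assumption that a number is always three groups of three digits, replaces `+` with `{3}`. The sentence in `text` is made up.

```r
library(stringr)

loose <- "\\d+[-\\s]?\\d+[-\\s]?\\d+"
strict <- "\\d{3}[-\\s]?\\d{3}[-\\s]?\\d{3}"

# The loose pattern accepts any string with at least three digits in a row
loose_hit <- str_detect("12345", loose)
strict_hit <- str_detect("12345", strict)
print(loose_hit)   # [1] TRUE
print(strict_hit)  # [1] FALSE

# Hypothetical sentence: pull the numbers out of running text
text <- "Call 555-555-555 or 555 555 555 today"
numbers <- str_extract_all(text, strict)[[1]]
print(numbers)
# [1] "555-555-555" "555 555 555"
```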

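4 Matching a whole word exactly

^ anchors the match at the start of the string
$ anchors the match at the end of the string
\b matches at a word boundary
\B matches anywhere that is not a word boundary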

c(" hello"," hello ","ahellob","ahello","hellob") %>% 
  str_subset("\\bhello\\b")

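## [1] " hello"  " hello "

"\\bhello\\b" is better than "hello" because it does not match hello embedded inside a longer word. I still do not fully understand it, though.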

(???) work with isolated words and we don’t want to create character sets with every single character that may divide our words (spaces, commas, colons, hyphens, and so on)
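The point of the quote is that `\b` handles every kind of separator at once. A minimal sketch, using invented strings, shows `\\b` accepting a comma and a hyphen as boundaries, and `\\B` doing the opposite:

```r
library(stringr)

# Hypothetical inputs: different separators around "hello"
x <- c("hello, world", "say-hello", "othello")

# \\b: "hello" must not be glued to other word characters
whole_word <- str_subset(x, "\\bhello\\b")
print(whole_word)
# [1] "hello, world" "say-hello"

# \\B: "hello" must be preceded by another word character
inside_word <- str_subset(x, "\\Bhello")
print(inside_word)
# [1] "othello"
```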

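5 Matching Chinese text

\p{Han} matches Han characters, i.e. Chinese script.
\p{Lo} (Letter, other) also matches Chinese characters, but it is not limited to Chinese.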

Letters that do not distinguish case. Includes Chinese, Japanese, Korean ideographs. ((???))

c("我","Li","我的") %>% 
  str_subset("\\p{Lo}{2,}")

## [1] "我的"

c("我","Li","我的") %>% 
  str_subset("\\p{Han}{2,}")

## [1] "我的"
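Both patterns give the same result on the data above, so the difference only shows up with other scripts. In this sketch a Japanese hiragana word is added as an assumption to illustrate it: hiragana letters belong to the general category Lo but not to the Han script.

```r
library(stringr)

# Hypothetical inputs: Chinese, Japanese hiragana, Latin
x <- c("我的", "こんにちは", "Li")

lo_hits <- str_subset(x, "\\p{Lo}{2,}")
han_hits <- str_subset(x, "\\p{Han}{2,}")

print(lo_hits)   # [1] "我的"       "こんにちは"
print(han_hits)  # [1] "我的"
```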

6 Learning `(x|y)`

Its usage is described in (??? Grouped matches).
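As a rough illustration of grouped matches, `str_match()` returns the full match in the first column and each parenthesised group in the following columns. The vector `d` below is made up; the third element is there to show what a non-match looks like.

```r
library(stringr)

# Hypothetical data
d <- c("Licence:yes", "Licence:no", "Licence:maybe")

m <- str_match(d, "Licence:(yes|no)")

full_match <- m[, 1]  # the whole matched text
answer <- m[, 2]      # only the part captured by (yes|no)

print(full_match)  # [1] "Licence:yes" "Licence:no"  NA
print(answer)      # [1] "yes" "no"  NA
```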

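7 Summary of the books

(???) presents a lot of code that works across platforms, but I did not find it practical. It is not as direct as (???), which explains the basics clearly in a little over 100 pages; that one is based on Python, but the material carries over to R. (???) is also good and worth borrowing from. This is a natural stopping point for now: regex, killed!!!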

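8 Business-oriented patterns

a?: 0 or 1
a*: 0 or more
a+: 1 or more
Digits only: ^[0-9]*$
- Tencent QQ number: [1-9][0-9]{4,}
  - Tencent QQ numbers start at 10000
Number with a non-zero first digit and at most two decimal places: ^([1-9][0-9]*)+(\.[0-9]{1,2})?$
- [0-9]* may match nothing, but the first digit must be in [1-9]
Positive or negative number with an optional 1 to 2 decimal places: ^(-)?\d+(\.\d{1,2})?$
Chinese characters: ^[一 - 龥]{0,}$ does not work here; use ^\\p{Han}+$ or ^[\u4e00-\u9fa5]{0,}$ instead

Reference: The most complete collection of common regular expressions, including validation of digits, characters, and some special requirements (zxin, 博客园)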
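The patterns above can be tried out as R strings, which means every regex backslash is doubled. This is a sketch only: the test values are invented, and the QQ pattern is anchored with `^` and `$` here as an assumption so that it validates the whole string.

```r
library(stringr)

digits_only <- "^[0-9]*$"
qq_number <- "^[1-9][0-9]{4,}$"   # anchors added for whole-string validation
two_decimals <- "^([1-9][0-9]*)+(\\.[0-9]{1,2})?$"
signed_decimal <- "^(-)?\\d+(\\.\\d{1,2})?$"
han_only <- "^\\p{Han}+$"

# Hypothetical inputs for each pattern
r_digits <- str_detect(c("12345", "12a45"), digits_only)
r_qq <- str_detect(c("10000", "01234", "9999"), qq_number)
r_money <- str_detect(c("12.34", "0.5", "12.345"), two_decimals)
r_signed <- str_detect(c("-3.14", "3.141", "42"), signed_decimal)
r_han <- str_detect(c("汉字", "汉字abc"), han_only)

print(r_digits)  # [1]  TRUE FALSE
print(r_qq)      # [1]  TRUE FALSE FALSE
print(r_money)   # [1]  TRUE FALSE FALSE
print(r_signed)  # [1]  TRUE FALSE  TRUE
print(r_han)     # [1]  TRUE FALSE
```

Since `^[0-9]*$` uses `*`, it also accepts an empty string; `+` would be the choice if at least one digit is required.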

9 Excluding matches with `str_subset` (???)

x <- c("hi", "bye", "hip")
x %>% 
    str_subset("^(?!.*(hip|bye))")

## [1] "hi"

x %>% 
    {.[!str_detect(., "hip|bye")]}

## [1] "hi"
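Two details are worth a sketch. First, inside the lookahead the alternation needs its own parentheses, otherwise `bye` is only rejected at the very start of the string. Second, assuming stringr 1.4.0 or later, `str_subset()` has a `negate` argument that avoids the lookahead altogether. The element `"goodbye"` is added here for illustration.

```r
library(stringr)

# Hypothetical data: "goodbye" exposes the precedence problem
x <- c("hi", "bye", "hip", "goodbye")

# Without inner parentheses: rejects ".*hip" or a leading "bye" only
leaky <- str_subset(x, "^(?!.*hip|bye)")
print(leaky)
# [1] "hi"      "goodbye"

# With inner parentheses: rejects "hip" or "bye" anywhere
strict <- str_subset(x, "^(?!.*(hip|bye))")
print(strict)
# [1] "hi"

# negate = TRUE keeps the elements that do NOT match (stringr >= 1.4.0)
negated <- str_subset(x, "hip|bye", negate = TRUE)
print(negated)
# [1] "hi"
```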

10 Escaping

library(stringr)

regexps use the backslash, \, to escape special behaviour. (Stringr vignette Regular expressions)

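That is what escape means.

In Linux, grep -E turns on extended regular expressions (the E stands for extended, not escape). In that mode, metacharacters such as +, ?, | and () do not need a preceding \.

Unfortunately this creates a problem. We use strings to represent regular expressions, and \ is also used as an escape symbol in strings. So to create the regular expression \. we need the string "\\.". (Stringr vignette Regular expressions)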

# To create the regular expression, we need \\
dot <- "\\."

# But the expression itself only contains one:
writeLines(dot)
## \.

# And this tells R to look for an explicit .
str_extract(c("abc", "a.c", "bef"), "a\\.c")
## [1] NA    "a.c" NA

If \ is used as an escape character in regular expressions, how do you match a literal \? Well you need to escape it, creating the regular expression \\. To create that regular expression, you need to use a string, which also needs to escape \. That means to match a literal \ you need to write "\\\\" — you need four backslashes to match one! (Stringr vignette Regular expressions)

x <- "a\\b"
writeLines(x)
## a\b

str_extract(x, "\\\\")
## [1] "\\"

In this vignette, I use \. to denote the regular expression, and "\\." to denote the string that represents the regular expression. (Stringr vignette Regular expressions)

The regular expression and the string that represents it are not the same thing.

Figure 10.1: Examples of the three escape cases
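The distinction between the string and the regular expression can be checked directly. The sketch below counts the characters in the string, compares an escaped and an unescaped dot, and uses stringr's `fixed()` as an alternative that skips regex escaping entirely.

```r
library(stringr)

# The R string "\\." holds two characters: a backslash and a dot
dot <- "\\."
n_chars <- nchar(dot)
print(n_chars)  # [1] 2

x <- c("abc", "a.c")

# Unescaped: . matches any character, so both elements match
any_char <- str_detect(x, "a.c")
print(any_char)  # [1] TRUE TRUE

# Escaped: only a literal dot matches
literal_dot <- str_detect(x, "a\\.c")
print(literal_dot)  # [1] FALSE  TRUE

# fixed() treats the pattern as plain text, so no regex escaping is needed
fixed_dot <- str_detect(x, fixed("a.c"))
print(fixed_dot)  # [1] FALSE  TRUE

# A literal backslash: two characters in the string instead of four
has_backslash <- str_detect("a\\b", fixed("\\"))
print(has_backslash)  # [1] TRUE
```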

Regex Cookbook
