Shell Cookbook
2020-10-09
- The documents are stitched together with RMarkdown's child option.
- Notes combined this way are easier to review.
- Report related problems via Issues.
1 Copying output
For example,
$ git status | clip
On branch master
Your branch is ahead of 'origin/master' by 4 commits.
(use "git push" to publish your local commits)
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: doing.Rmd
Untracked files:
(use "git add <file>..." to include in what will be committed)
a20191226172557.Rmd
a20191226173430.Rmd
a20191226220133.Rmd
a20191227005202.Rmd
a20191227102143.Rmd
a20191227120459.Rmd
a20191227145216.Rmd
a20191227163037.Rmd
a20191227180428.Rmd
a20191227193010.Rmd
a20191227233245.Rmd
a20191228114856.Rmd
a20191228220831.Rmd
a20191228231124.Rmd
no changes added to commit (use "git add" and/or "git commit -a")
See Forsyth (2006).
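3 Finding where a command comes from
4 Handling executables with the same name
Try moving C:\Program Files\Git\mingw64\bin\ later in the PATH search order, then check again after a restart.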
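The effect of search order can be sketched on any machine with a POSIX shell. The example below is only an illustration: the tool name mytool and the two temporary directories are made up, and the point is simply that the directory listed earlier in PATH wins the lookup.

```bash
# Two throwaway directories, each holding a script named "mytool" (a hypothetical name).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/first" "$demo_dir/second"
printf '#!/bin/sh\necho "from first"\n'  > "$demo_dir/first/mytool"
printf '#!/bin/sh\necho "from second"\n' > "$demo_dir/second/mytool"
chmod +x "$demo_dir/first/mytool" "$demo_dir/second/mytool"

# PATH is changed only inside each $(...) subshell, so the session is unaffected.
# command -v prints the path of the executable that would actually run.
first_wins=$(PATH="$demo_dir/first:$demo_dir/second:$PATH"; command -v mytool)
second_wins=$(PATH="$demo_dir/second:$demo_dir/first:$PATH"; command -v mytool)

echo "$first_wins"
echo "$second_wins"
```

Swapping the order of the two directories is enough to change which copy is found, which is why moving one entry later in PATH can resolve a name clash.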
Bash (Bourne Again Shell) scripting: a good way to learn the transition from one-line commands to a full script (Scriven 2020).
5 review command
In RMarkdown the chunk engine is bash, with engine.path = "D:\\install\\Git\\bin\\bash.exe".
## Seoul.csv
## Tallinn.csv
## arg.sh
## ch1-example.sh
## ch1.Rmd
## ch1.html
## ch1.ipynb
## ch2.Rmd
## ch3.Rmd
## ch4.Rmd
## hire_data.sh
- (e)grep: filters input based on regex pattern matching
- cat: concatenates file contents line-by-line
- tail / head: give only the last / first -n (a flag) lines
- wc: does a word or line count (with flags -w, -l)
- sed: does pattern-matched string replacement
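A minimal sketch of these commands on a tiny file. The rows are copied from the output shown in this section; the temporary file and the variable names are assumptions made for the example.

```bash
# Build a small sample file with a header and five records.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Country,City,Job Name,Salary
Afghanistan,Kabul,Javascript Developer,158003
Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
Albania,Tirana,Data Scientist,187506
Algeria,Algiers,Javascript Developer,165451
American Samoa,Pago Pago,Python Developer,175138
EOF

# grep: keep only the rows that match a pattern.
python_rows=$(grep 'Python Developer' "$sample")

# head: the first 2 lines (header plus the first record).
top=$(head -n 2 "$sample")

# tail: the last line only.
last=$(tail -n 1 "$sample")

# wc -l: number of lines; tr removes the padding some wc versions print.
n_lines=$(wc -l < "$sample" | tr -d ' ')

echo "$python_rows"
echo "$n_lines"
```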
## Country,City,Job Name,Salary
## Afghanistan,Kabul,Javascript Developer,158003
## Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## Albania,Tirana,Data Scientist,187506
## Algeria,Algiers,Javascript Developer,165451
## American Samoa,Pago Pago,Python Developer,175138
## Andorra,Andorra la Vella,Data Scientist,197452
## Angola,Luanda,Javascript Developer,144335
## Anguilla,The Valley,Python Developer,121100
## Antigua and Barbuda,St. John's,Data Scientist,108816
## Country,City,Job Name,Salary
## Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## Country,City,Job Name,Salary
## Afghanistan,Kabul,Javascript Developer,158003
## Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## Albania,Tirana,Data Scientist,187506
## Algeria,Algiers,Javascript Developer,165451
## American Samoa,Pago Pago,Python Developer,175138
## Andorra,Andorra la Vella,Data Scientist,197452
## Angola,Luanda,Javascript Developer,144335
## Anguilla,The Valley,Python Developer,121100
## Antigua and Barbuda,St. John's,Data Scientist,108816
## 1 Afghanistan,Kabul,Javascript Developer,158003
## 1 Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## 1 Albania,Tirana,Data Scientist,187506
## 1 Algeria,Algiers,Javascript Developer,165451
## 1 American Samoa,Pago Pago,Python Developer,175138
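uniq -c works like group by + count(*) in SQL. It only merges adjacent identical lines, so sort the input first to get one count per value.
For OR-style patterns (alternation), see egrep.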
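A short sketch of both points, using job titles taken from the sample data. The variable names are made up for the example, and grep -E is used as the equivalent modern spelling of egrep.

```bash
job_titles='Javascript Developer
Python Developer
Data Scientist
Javascript Developer
Python Developer'

# Without sorting, no two equal lines are adjacent, so every count is 1.
unsorted_counts=$(printf '%s\n' "$job_titles" | uniq -c)

# Sorting first puts equal lines next to each other, giving one count per value.
sorted_counts=$(printf '%s\n' "$job_titles" | sort | uniq -c)
echo "$sorted_counts"

# OR pattern: alternatives are separated by | in an extended regex.
either=$(printf '%s\n' "$job_titles" | grep -E 'Python|Data' | wc -l | tr -d ' ')
echo "$either"
```

This matches the output earlier in this section, where every line is prefixed with a count of 1 because the rows are all distinct.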
6 script
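Put #!/bin/bash on the first line and the script can be run directly.
Otherwise, it has to be run by passing the file to bash explicitly.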
## 1 Afghanistan,Kabul,Javascript Developer,158003
## 1 Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## 1 Albania,Tirana,Data Scientist,187506
## 1 Algeria,Algiers,Javascript Developer,165451
## 1 American Samoa,Pago Pago,Python Developer,175138
## 1 Afghanistan,Kabul,Javascript Developer,158003
## 1 Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## 1 Albania,Tirana,Data Scientist,187506
## 1 Algeria,Algiers,Javascript Developer,165451
## 1 American Samoa,Pago Pago,Python Developer,175138
Do not forget the leading ./ when running the script directly.
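The two ways of running a script can be sketched as follows. The file name hello.sh and the temporary directory are assumptions for the example, and the sketch assumes bash is installed at /bin/bash.

```bash
work_dir=$(mktemp -d)
cd "$work_dir" || exit 1

# A two-line script; the first line is the shebang.
printf '#!/bin/bash\necho "hello from script"\n' > hello.sh

# Naming the interpreter works even before the file is marked executable.
via_interpreter=$(bash hello.sh)

# Once the execute bit is set, the shebang lets the file run directly.
# The ./ prefix is needed because the current directory is normally not on PATH.
chmod +x hello.sh
direct=$(./hello.sh)

echo "$via_interpreter"
echo "$direct"
```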
7 STDIN, STDOUT, STDERR
- STDIN (standard input). A stream of data into the program
- STDOUT (standard output). A stream of data out of the program; redirect it with 1> ...
- STDERR (standard error). Errors in your program; redirect them with 2> ...
## ```bash
## 1 Afghanistan,Kabul,Javascript Developer,158003
## 1 Akrotiri and Dhekelia,Episkopi Cantonment,Python Developer,194640
## 1 Albania,Tirana,Data Scientist,187506
## 1 Algeria,Algiers,Javascript Developer,165451
## 1 American Samoa,Pago Pago,Python Developer,175138
## ```
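The three streams described above can be tried out with a small function that writes to both output streams. The function name report and the temporary files are made up for this sketch.

```bash
out_file=$(mktemp)
err_file=$(mktemp)

# A small function that writes one line to each output stream.
report() {
  echo "result line"          # goes to STDOUT
  echo "warning line" >&2     # goes to STDERR
}

# 1> captures STDOUT and 2> captures STDERR, each in its own file.
report 1> "$out_file" 2> "$err_file"

# A pipe connects the STDOUT of report to the STDIN of wc.
stdin_count=$(report 2>/dev/null | wc -l | tr -d ' ')

cat "$out_file"
cat "$err_file"
echo "$stdin_count"
```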
8 cut
## ../data/hire_data ../data/hire_data.zip
## ../data/inherited_folder.zip ../data/model_out.zip
## ../data/model_results.zip ../data/new_hires.csv
## ../data/robs_files.zip ../data/soccer_scores.csv
## ../data/soccer_scores_edited.csv
## Year,Winner,Winner Goals
## 1932,Arda,4
## 1933,Botev,1
## 1934,Cherno,5
## 1935,Dunav,2
## 1936,Cherno,4
## 1937,Dunav,4
## 1938,Beroe,5
## 1939,Botev,2
## 1940,Beroe,3
Select the second column and count its values.
## Usage: cut OPTION... [FILE]...
## Print selected parts of lines from each FILE to standard output.
##
## With no FILE, or when FILE is -, read standard input.
##
## Mandatory arguments to long options are mandatory for short options too.
## -b, --bytes=LIST select only these bytes
## -c, --characters=LIST select only these characters
## -d, --delimiter=DELIM use DELIM instead of TAB for field delimiter
## -f, --fields=LIST select only these fields; also print any line
## that contains no delimiter character, unless
## the -s option is specified
## -n (ignored)
## --complement complement the set of selected bytes, characters
## or fields
## -s, --only-delimited do not print lines not containing delimiters
## --output-delimiter=STRING use STRING as the output delimiter
## the default is to use the input delimiter
## -z, --zero-terminated line delimiter is NUL, not newline
## --help display this help and exit
## --version output version information and exit
##
## Use one, and only one of -b, -c or -f. Each LIST is made up of one
## range, or many ranges separated by commas. Selected input is written
## in the same order that it is read, and is written exactly once.
## Each range is one of:
##
## N N'th byte, character or field, counted from 1
## N- from N'th byte, character or field, to end of line
## N-M from N'th to M'th (included) byte, character or field
## -M from first to M'th (included) byte, character or field
##
## GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
## Report cut translation bugs to <https://translationproject.org/team/>
## Full documentation at: <https://www.gnu.org/software/coreutils/cut>
## or available locally via: info '(coreutils) cut invocation'
## 13 Arda
## 8 Beroe
## 9 Botev
## 8 Cherno
## 17 Dunav
## 15 Etar
## 4 Levski
## 1 Lokomotiv
## Usage: tail [OPTION]... [FILE]...
## Print the last 10 lines of each FILE to standard output.
## With more than one FILE, precede each with a header giving the file name.
##
## With no FILE, or when FILE is -, read standard input.
##
## Mandatory arguments to long options are mandatory for short options too.
## -c, --bytes=[+]NUM output the last NUM bytes; or use -c +NUM to
## output starting with byte NUM of each file
## -f, --follow[={name|descriptor}]
## output appended data as the file grows;
## an absent option argument means 'descriptor'
## -F same as --follow=name --retry
## -n, --lines=[+]NUM output the last NUM lines, instead of the last 10;
## or use -n +NUM to output starting with line NUM
## --max-unchanged-stats=N
## with --follow=name, reopen a FILE which has not
## changed size after N (default 5) iterations
## to see if it has been unlinked or renamed
## (this is the usual case of rotated log files);
## with inotify, this option is rarely useful
## --pid=PID with -f, terminate after process ID, PID dies
## -q, --quiet, --silent never output headers giving file names
## --retry keep trying to open a file if it is inaccessible
## -s, --sleep-interval=N with -f, sleep for approximately N seconds
## (default 1.0) between iterations;
## with inotify and --pid=P, check process P at
## least once every N seconds
## -v, --verbose always output headers giving file names
## -z, --zero-terminated line delimiter is NUL, not newline
## --help display this help and exit
## --version output version information and exit
##
## NUM may have a multiplier suffix:
## b 512, kB 1000, K 1024, MB 1000*1000, M 1024*1024,
## GB 1000*1000*1000, G 1024*1024*1024, and so on for T, P, E, Z, Y.
##
## With --follow (-f), tail defaults to following the file descriptor, which
## means that even if a tail'ed file is renamed, tail will continue to track
## its end. This default behavior is not desirable when you really want to
## track the actual name of the file, not the file descriptor (e.g., log
## rotation). Use --follow=name in that case. That causes tail to track the
## named file in a way that accommodates renaming, removal and creation.
##
## GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
## Report tail translation bugs to <https://translationproject.org/team/>
## Full documentation at: <https://www.gnu.org/software/coreutils/tail>
## or available locally via: info '(coreutils) tail invocation'
Use -n +NUM to output starting with line NUM, so the header row (the variable names) is left out of the count.
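Putting the pieces of this section together, the counting pipeline might look like the sketch below. It uses only the ten preview rows shown earlier, so the counts differ from the full-file result listed above; the temporary file and variable names are assumptions.

```bash
scores=$(mktemp)
cat > "$scores" <<'EOF'
Year,Winner,Winner Goals
1932,Arda,4
1933,Botev,1
1934,Cherno,5
1935,Dunav,2
1936,Cherno,4
1937,Dunav,4
1938,Beroe,5
1939,Botev,2
1940,Beroe,3
EOF

# cut: -d sets the delimiter and -f picks the field (the Winner column).
# tail -n +2 starts at line 2, so the header "Winner" is not counted as a team.
# sort groups equal names together so uniq -c can count them.
winner_counts=$(cut -d ',' -f 2 "$scores" | tail -n +2 | sort | uniq -c)
echo "$winner_counts"
```

Leaving out tail -n +2 would add a spurious line counting the header text once.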
9 sed
## Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...
##
## -n, --quiet, --silent
## suppress automatic printing of pattern space
## -e script, --expression=script
## add the script to the commands to be executed
## -f script-file, --file=script-file
## add the contents of script-file to the commands to be executed
## --follow-symlinks
## follow symlinks when processing in place
## -i[SUFFIX], --in-place[=SUFFIX]
## edit files in place (makes backup if SUFFIX supplied)
## -b, --binary
## open files in binary mode (CR+LFs are not processed specially)
## -l N, --line-length=N
## specify the desired line-wrap length for the `l' command
## --posix
## disable all GNU extensions.
## -E, -r, --regexp-extended
## use extended regular expressions in the script
## (for portability use POSIX -E).
## -s, --separate
## consider files as separate rather than as a single,
## continuous long stream.
## --sandbox
## operate in sandbox mode (disable e/r/w commands).
## -u, --unbuffered
## load minimal amounts of data from the input files and flush
## the output buffers more often
## -z, --null-data
## separate lines by NUL characters
## --help display this help and exit
## --version output version information and exit
##
## If no -e, --expression, -f, or --file option is given, then the first
## non-option argument is taken as the sed script to interpret. All
## remaining arguments are names of input files; if no input files are
## specified, then the standard input is read.
##
## GNU sed home page: <https://www.gnu.org/software/sed/>.
## General help using GNU software: <https://www.gnu.org/gethelp/>.
## E-mail bug reports to: <bug-sed@gnu.org>.
This is the equivalent of str_replace in R.
## Year,Winner,Winner Goals
## 1932,Arda United,4
## 1933,Botev,1
## 1934,Cherno City,5
## 1935,Dunav,2
## 1936,Cherno City,4
## 1937,Dunav,4
## 1938,Beroe,5
## 1939,Botev,2
## 1940,Beroe,3
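A sketch of substitutions that would produce output like the above. The exact expressions used originally are not shown, so the two s/// commands here are an assumption inferred from the renamed teams.

```bash
# Each -e adds one substitution; both are applied to every input line.
replaced=$(printf '%s\n' '1932,Arda,4' '1933,Botev,1' '1934,Cherno,5' |
  sed -e 's/Cherno/Cherno City/' -e 's/Arda/Arda United/')
echo "$replaced"

# Without the g flag only the first match on a line is replaced;
# with g every match on the line is replaced.
first_only=$(echo 'a-a' | sed 's/a/b/')
every_match=$(echo 'a-a' | sed 's/a/b/g')
echo "$first_only"
echo "$every_match"
```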
10 ARGV
- ARGV is the array of all the arguments given to the program.
- Each argument can be accessed via the $ notation. The first as $1, the second as $2, etc.
- $@ and $* give all the arguments in ARGV
- $# gives the length (number) of arguments
## one
## two
## three
## one two three four five
## There are 5 arguments
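The same notation works inside a shell function, which makes it easy to try without a separate script file. The function names below are made up for this sketch.

```bash
# Positional parameters inside a function refer to the function's own arguments.
show_args() {
  echo "first: $1"
  echo "second: $2"
  echo "all: $*"
  echo "There are $# arguments"
}
show_args one two three four five

# "$@" keeps each argument separate, while "$*" joins them into one string.
count() { echo "$#"; }
forward_at()   { count "$@"; }
forward_star() { count "$*"; }

forward_at "a b" c     # count receives 2 arguments
forward_star "a b" c   # count receives 1 argument
```

Quoting "$@" is usually the safer choice when passing arguments on, because arguments containing spaces stay intact.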
11 cat + grep: copying files
## ../data/hire_data ../data/hire_data.zip
## ../data/inherited_folder.zip ../data/model_out.zip
## ../data/model_results.zip ../data/new_hires.csv
## ../data/robs_files.zip ../data/soccer_scores.csv
## ../data/soccer_scores_edited.csv
# Echo the first ARGV argument
echo "$1"
# cat and grep the files using the first ARGV argument
# then write out to a named csv
cat ../data/hire_data/* | grep "$1" > "$1".csv
## Seoul
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## South Korea,Seoul,Javascript Developer,169193
## Tallinn
## Estonia,Tallinn,Javascript Developer,118286
## Estonia,Tallinn,Javascript Developer,118286
## Estonia,Tallinn,Javascript Developer,118286
## Estonia,Tallinn,Javascript Developer,118286
## Estonia,Tallinn,Javascript Developer,118286
## Estonia,Tallinn,Javascript Developer,118286
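One possible hardening of the script above, written as a function so it can be tried in isolation. The directories, the file names a.csv and b.csv, and the function name extract_city are all assumptions for this sketch; the rows come from the output shown above.

```bash
data_dir=$(mktemp -d)
out_dir=$(mktemp -d)
printf 'South Korea,Seoul,Javascript Developer,169193\n' > "$data_dir/a.csv"
printf 'Estonia,Tallinn,Javascript Developer,118286\nSouth Korea,Seoul,Javascript Developer,169193\n' > "$data_dir/b.csv"

extract_city() {
  # Refuse to run without a search term: an empty pattern would match every row.
  if [ -z "$1" ]; then
    echo "usage: extract_city CITY" >&2
    return 1
  fi
  # Concatenate all files, keep the matching rows, write them to CITY.csv.
  cat "$data_dir"/* | grep "$1" > "$out_dir/$1.csv"
}

extract_city Seoul
seoul_rows=$(wc -l < "$out_dir/Seoul.csv" | tr -d ' ')
echo "$seoul_rows"
```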
12 The du command
See https://www.cnblogs.com/conncui/p/shell.html
12.1 du help text
```bash
du --help
Usage: /Rtools/bin/du [OPTION]... [FILE]...
or: /Rtools/bin/du [OPTION]... --files0-from=F
Summarize disk usage of the set of FILEs, recursively for directories.
Mandatory arguments to long options are mandatory for short options too.
-0, --null end each output line with NUL, not newline
-a, --all write counts for all files, not just directories
--apparent-size print apparent sizes, rather than disk usage; although
the apparent size is usually smaller, it may be
larger due to holes in ('sparse') files, internal
fragmentation, indirect blocks, and the like
-B, --block-size=SIZE scale sizes by SIZE before printing them; e.g.,
'-BM' prints sizes in units of 1,048,576 bytes;
see SIZE format below
-b, --bytes equivalent to '--apparent-size --block-size=1'
-c, --total produce a grand total
-D, --dereference-args dereference only symlinks that are listed on the
command line
-d, --max-depth=N print the total for a directory (or file, with --all)
only if it is N or fewer levels below the command
line argument; --max-depth=0 is the same as
--summarize
--files0-from=F summarize disk usage of the
NUL-terminated file names specified in file F;
if F is -, then read names from standard input
-H equivalent to --dereference-args (-D)
-h, --human-readable print sizes in human readable format (e.g., 1K 234M 2G)
--inodes list inode usage information instead of block usage
-k like --block-size=1K
-L, --dereference dereference all symbolic links
-l, --count-links count sizes many times if hard linked
-m like --block-size=1M
-P, --no-dereference don't follow any symbolic links (this is the default)
-S, --separate-dirs for directories do not include size of subdirectories
--si like -h, but use powers of 1000 not 1024
-s, --summarize display only a total for each argument
-t, --threshold=SIZE exclude entries smaller than SIZE if positive,
or entries greater than SIZE if negative
--time show time of the last modification of any file in the
directory, or any of its subdirectories
--time=WORD show time as WORD instead of modification time:
atime, access, use, ctime or status
--time-style=STYLE show times using STYLE, which can be:
full-iso, long-iso, iso, or +FORMAT;
FORMAT is interpreted like in 'date'
-X, --exclude-from=FILE exclude files that match any pattern in FILE
--exclude=PATTERN exclude files that match PATTERN
-x, --one-file-system skip directories on different file systems
--help display this help and exit
--version output version information and exit
Display values are in units of the first available SIZE from --block-size,
and the DU_BLOCK_SIZE, BLOCK_SIZE and BLOCKSIZE environment variables.
Otherwise, units default to 1024 bytes (or 512 if POSIXLY_CORRECT is set).
The SIZE argument is an integer and optional unit (example: 10K is 10*1024).
Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (powers of 1000).
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report du translation bugs to <http://translationproject.org/team/>
Full documentation at: <http://www.gnu.org/software/coreutils/du>
or available locally via: info '(coreutils) du invocation'
```
12.2 Listing the size of every subdirectory under a path
Using the learn_rmd project as an example:
$ du -h --max-depth=1
6.7M ./.git
2.7M ./.Rproj.user
966K ./analysis
722K ./analysis-2
2.0K ./code
6.0K ./cookbook_cache
1.2M ./cookbook_files
1.0K ./data
5.1M ./docs
72K ./figure
3.0M ./libs
2.0K ./output
2.0K ./R
0 ./refs
6.0K ./template
0 ./tmp
1.4M ./xaringan
22M .
The last line, 22M ., gives the total size of the folder, and the ./.git line gives the size of the git folder.
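To find the largest subdirectory quickly, the listing can be sorted by size. This is a sketch on a throwaway directory tree; the directory and file names are made up, and -k is used instead of -h so that sort can compare the numbers directly.

```bash
root=$(mktemp -d)
mkdir "$root/big" "$root/small"
head -c 1048576 /dev/urandom > "$root/big/blob.bin"   # about 1 MiB of data
echo "tiny" > "$root/small/note.txt"

# -k reports sizes in KiB; -d 1 is the short form of --max-depth=1.
# sort -rn orders the lines by the leading number, largest first.
usage=$(du -k -d 1 "$root" | sort -rn)
echo "$usage"

# The first line is the total for the root; the second is the largest subdirectory.
# du separates size and path with a tab, which is cut's default delimiter.
largest_child=$(echo "$usage" | sed -n '2p' | cut -f 2)
echo "$largest_child"
```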
14 choco
14.1 Installation
See https://stackoverflow.com/questions/32127524/how-to-install-and-use-make-in-windows and https://chocolatey.org/install
Both Windows 8 and 10 offer a Power Users menu that you can access by pressing Windows+X or just right-clicking the Start button. On the Power Users menu, choose “Command Prompt (Admin).”
Open PowerShell with administrator privileges.
The steps are a little involved.
Run Get-ExecutionPolicy. If it returns Restricted, then run Set-ExecutionPolicy AllSigned or Set-ExecutionPolicy Bypass -Scope Process.
Get-ExecutionPolicy
If it returns Restricted, run
Set-ExecutionPolicy AllSigned
or
Set-ExecutionPolicy Bypass -Scope Process
then
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))
Add choco to the environment variables.
14.2 Usage
(base) PS C:\Users\lijiaxiang> choco install make
Chocolatey v0.10.15
Installing the following packages:
make
By installing you accept licenses for the packages.
Progress: Downloading make 4.3... 100%
make v4.3 [Approved]
make package files install completed. Performing other installation steps.
ShimGen has successfully created a shim for make.exe
The install of make was successful.
Software install location not explicitly set, could be in package or
default install location if installer.
Chocolatey installed 1/1 packages.
See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
(base) PS C:\Users\lijiaxiang>
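15 Looking up commands with which
16 Looking up commands with where
17 Do not add Rtools to the global environment variables
For example, make, rm, which and others would all be overridden, which is very inconvenient.
For example, C:\rtools40\usr\bin\make.exe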
18 append
See https://stackoverflow.com/questions/6207573/how-to-append-output-to-the-end-of-a-text-file
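A minimal sketch of the difference between overwriting and appending; the temporary log file is an assumption for the example.

```bash
log_file=$(mktemp)

echo "first run"  > "$log_file"    # > truncates the file, then writes
echo "second run" >> "$log_file"   # >> keeps existing content and adds to the end
echo "third run"  >> "$log_file"

line_count=$(wc -l < "$log_file" | tr -d ' ')
cat "$log_file"
echo "$line_count"
```

Using > for the later writes instead would leave only the most recent line in the file.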
Appendix
References
Forsyth, Scott. 2006. “Clip - Saving Command Line and Powershell Output Directly to the Clipboard.” Scott Forsyth’s Blog. 2006. https://weblogs.asp.net/owscott/clip-saving-command-line-and-powershell-output-directly-to-the-clipboard.
Scriven, Alex. 2020. “Introduction to Bash Scripting.” DataCamp. 2020. https://www.datacamp.com/courses/introduction-to-bash-scripting.