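These notes are stitched together from separate documents with the RMarkdown `child` chunk option.
Keeping the notes in one combined document makes them easier to review.
Please report any problems by opening an Issue.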

1 Displaying percentages

>>> '{:.1%}'.format(1/3.0)
'33.3%'

See the Stack Overflow discussion on a better way to display percentages in Python.

2 Checking for NoneType

See https://www.jianshu.com/p/a22ee281f05a

if df is None:
    print('It is Nonetype.')

3 Running Python from the command line

See https://mail.python.org/pipermail/python-list/2002-June/173109.html

python -c "print(5)"

4 TypeError: can’t pickle generator objects

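See https://stackoverflow.com/questions/28963354/typeerror-cant-pickle-generator-objects

A generator object cannot be pickled directly. Iterate over it with a for loop and pickle each item instead.

import pickle as pkl

# kf.split(df) returns a generator, so pickle its items one by one
fold_id = kf.split(df)
for idx, fold in enumerate(fold_id):
    print(idx)
    file_name = '../model/fold_id_{}.pkl'.format(idx)
    with open(file_name, 'wb') as f:
        pkl.dump(fold, f)
    # report success only after the file has been written
    print('{} saved.'.format(file_name))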

0
../model/fold_id_0.pkl saved.
1
../model/fold_id_1.pkl saved.
2
../model/fold_id_2.pkl saved.
3
../model/fold_id_3.pkl saved.
4
../model/fold_id_4.pkl saved.
5
../model/fold_id_5.pkl saved.
6
../model/fold_id_6.pkl saved.
7
../model/fold_id_7.pkl saved.
8
../model/fold_id_8.pkl saved.
9
../model/fold_id_9.pkl saved.
10
../model/fold_id_10.pkl saved.
11
../model/fold_id_11.pkl saved.
12
../model/fold_id_12.pkl saved.
13
../model/fold_id_13.pkl saved.
14
../model/fold_id_14.pkl saved.
15
../model/fold_id_15.pkl saved.
16
../model/fold_id_16.pkl saved.
17
../model/fold_id_17.pkl saved.
18
../model/fold_id_18.pkl saved.
19
../model/fold_id_19.pkl saved.

$ du -sh model/fold_id* | clip
18M model/fold_id_0.pkl
18M model/fold_id_1.pkl
18M model/fold_id_10.pkl
18M model/fold_id_11.pkl
18M model/fold_id_12.pkl
18M model/fold_id_13.pkl
18M model/fold_id_14.pkl
18M model/fold_id_15.pkl
18M model/fold_id_16.pkl
18M model/fold_id_17.pkl
18M model/fold_id_18.pkl
18M model/fold_id_19.pkl
18M model/fold_id_2.pkl
18M model/fold_id_3.pkl
18M model/fold_id_4.pkl
18M model/fold_id_5.pkl
18M model/fold_id_6.pkl
18M model/fold_id_7.pkl
18M model/fold_id_8.pkl
18M model/fold_id_9.pkl

The fold id files are usually large (about 18M each here), so it is best to ignore them, for example by listing them in .gitignore.
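
The error and the workaround in this section can be reproduced without a cross-validation splitter. The sketch below is only an illustration: `make_folds` is a hypothetical stand-in for `kf.split(df)`, and it pickles to bytes in memory rather than to files.

```python
import pickle

def make_folds(n_folds):
    """Hypothetical stand-in for kf.split(df): yields (train_idx, test_idx) pairs."""
    for k in range(n_folds):
        yield list(range(k)), [k]

# Pickling the generator object itself raises TypeError.
try:
    pickle.dumps(make_folds(3))
    generator_pickled = True
except TypeError:
    generator_pickled = False

# Pickling each yielded item works, because each item is a plain tuple of lists.
blobs = [pickle.dumps(fold) for fold in make_folds(3)]
restored = [pickle.loads(blob) for blob in blobs]

print(generator_pickled)
print(restored)
```

If the folds are small, another option is to materialize the generator with `list(...)` and pickle the resulting list in one file.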

5 Encoding problems

Reading data with pd.read_csv can raise errors such as:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 2: invalid start byte
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 0: illegal multibyte sequence

Open the file in Notepad and use Save As to re-save it as UTF-8.
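
The same re-save can be scripted. The following is a minimal sketch using only the standard library. The helper name `resave_as_utf8` and the candidate list `("utf-8", "gb18030")` are assumptions for illustration. Since both utf-8 and gbk failed above, the real file may need a different candidate. A successful decode also does not prove the guess is right, so inspect the result.

```python
import os
import tempfile

def resave_as_utf8(src, dst, candidates=("utf-8", "gb18030")):
    """Decode src with the first candidate encoding that works, write dst as UTF-8.

    The candidate list is an assumption; extend it for your own data.
    """
    with open(src, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue
        with open(dst, "wb") as f:
            f.write(text.encode("utf-8"))
        return enc
    raise ValueError("none of the candidate encodings could decode " + src)

# Demo on a throwaway file written in GB18030.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "raw.csv")
dst_path = os.path.join(tmp_dir, "utf8.csv")
with open(src_path, "wb") as f:
    f.write("名称,数量\n苹果,3\n".encode("gb18030"))

used = resave_as_utf8(src_path, dst_path)
with open(dst_path, encoding="utf-8") as f:
    content = f.read()
print(used)
```

When the encoding is already known, passing it through the `encoding` argument of pd.read_csv avoids the re-save altogether.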

6 Adding the save time to a file name

See https://blog.csdn.net/u010417185/article/details/52293376

from datetime import datetime
string_time = datetime.now().strftime("%Y%m%d-%H%M%S")

file_name = 'submission-' + string_time + '.csv'

submission_df.to_csv('../submission/{}'.format(file_name), index = False, encoding = "UTF-8")

7 tuple in enumerate

See 天池大数据科研平台 (2020).

letters = [('a', 'A'), ('b', 'B'), ('c', 'C')]
for i, (lowercase, uppercase) in enumerate(letters):
    print(f"Index '{i}' refers to the letters '{lowercase}' and '{uppercase}'")

## Index '0' refers to the letters 'a' and 'A'
## Index '1' refers to the letters 'b' and 'B'
## Index '2' refers to the letters 'c' and 'C'

8 Parsing config or YAML files

See nbdev.

read_lines("analysis/sample.ini")

[1] "[DEFAULT]"            "author = Jiaxiang Li"

from configparser import ConfigParser

# note: all settings are in analysis/sample.ini; edit there, not here
config = ConfigParser(delimiters=['='])
config.read('analysis/sample.ini')

## ['analysis/sample.ini']

cfg = config['DEFAULT']
cfg['author']

## 'Jiaxiang Li'

9 one-line for loop

See https://github.com/dipam7/Fancy-python/blob/master/fancy_python_github.ipynb

fruits = ['apple', 'mango', 'banana', 'watermelon', 'pineapple']
for fruit in fruits: print(fruit)

## apple
## mango
## banana
## watermelon
## pineapple

for fruit in fruits: 
    print(fruit)

## apple
## mango
## banana
## watermelon
## pineapple

fruits = ['apple', 'mango', 'banana', 'watermelon', 'pineapple']
fruit_ids = [1, 2, 3, 4, 5]
# avoid shadowing the built-in id() and the list being iterated
{fruit_id: fruit for fruit, fruit_id in zip(fruits, fruit_ids)}

## {1: 'apple', 2: 'mango', 3: 'banana', 4: 'watermelon', 5: 'pineapple'}

10 List

10.1 append

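The following is wrong. append modifies the list in place and returns None, so the += raises a TypeError:

df_list += df_list.append(output_df)

See https://www.cnblogs.com/qiu-1010/p/10622527.html. Write it directly instead: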

df_list.append(output_df)

10.2 sort

See https://www.runoob.com/python/att-list-sort.html

Like append, sort() works in place: it modifies the list itself and returns None.
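
A short sketch of the in-place behaviour, contrasted with the built-in `sorted()`, which returns a new list and leaves the original alone. The variable names are made up for illustration.

```python
scores = [3, 1, 2]
result = scores.sort()   # sorts in place; the return value is None
print(result, scores)

original = [3, 1, 2]
ordered = sorted(original, reverse=True)   # new list; original is unchanged
print(original, ordered)
```

This is why `scores = scores.sort()` loses the list: the name ends up bound to None.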

10.3 pop

# Python3 program for pop() method 
  
list1 = [ 1, 2, 3, 4, 5, 6 ] 
  
# Pops and removes the last element from the list 
print(list1.pop()) 
  
# Print list after removing last element 
print("New List after pop : ", list1, "\n") 
  
list2 = [1, 2, 3, ('cat', 'bat'), 4] 
  
# Pop the last three elements
print(list2.pop()) 
print(list2.pop()) 
print(list2.pop()) 
  
# Print list 
print("New List after pop : ", list2, "\n")

append adds an element to the end of the list; pop removes the last element and returns it.

>>> list1 = [1, 2, 3, 4, 5, 6]
>>> list1.pop() == 6
True
>>> list1
[1, 2, 3, 4, 5]

11 Padding

See https://thispointer.com/python-how-to-pad-strings-with-zero-space-or-some-other-character/

str(i).zfill(4)
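
A small sketch of what zfill does in practice. The file-name pattern is a made-up example. Note that zfill pads only up to the requested width and never truncates longer strings.

```python
# Zero-pad indices so that file names sort correctly as text.
names = ["fold_{}.pkl".format(str(i).zfill(4)) for i in (1, 42, 12345)]
print(names)

# A leading sign stays in front of the padding.
negative = "-7".zfill(4)

# For integers, a format spec gives the same padding.
formatted = "{:04d}".format(42)
print(negative, formatted)
```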

12 lambda

See https://mp.weixin.qq.com/s/yh-zGEtDgNo2bovg218xKA

library(reticulate)

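Syntax: lambda [arg1 [, arg2, ..., argn]]: expression. Square brackets mark optional parts, so the extra arguments [, arg2, ..., argn] are optional. Because arg1 is itself bracketed, a lambda may even take no arguments, as in lambda: 0.

Once a df has been converted into dicts, how can max be used?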

dict1 = {"a": 10, "b": 20}, {"a": 20, "b": 20}, {"a": 50, "b": 20}, {"a": 6, "b": 20}, {"a": 9, "b": 20}

from pprint import pprint

list1 = [dict1]
pprint(list1)

## [({'a': 10, 'b': 20},
##   {'a': 20, 'b': 20},
##   {'a': 50, 'b': 20},
##   {'a': 6, 'b': 20},
##   {'a': 9, 'b': 20})]

import pandas as pd

df1 = pd.DataFrame(dict1)
df1

##     a   b
## 0  10  20
## 1  20  20
## 2  50  20
## 3   6  20
## 4   9  20

df1['a'].max()

## 50
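
The question above can also be answered without going back to pandas, which is where lambda comes in. The sketch below reuses the same records under the assumed name `records` and passes a lambda as the `key` argument of the built-in `max`.

```python
records = (
    {"a": 10, "b": 20},
    {"a": 20, "b": 20},
    {"a": 50, "b": 20},
    {"a": 6, "b": 20},
    {"a": 9, "b": 20},
)

# key tells max which value to compare; the whole dict is returned.
largest = max(records, key=lambda d: d["a"])

# If only the value is needed, a generator expression is enough.
largest_a = max(d["a"] for d in records)

print(largest, largest_a)
```

The value 50 agrees with `df1['a'].max()` above.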

13 os

See https://mp.weixin.qq.com/s/IgTmt06flgfbttHv0HpnfA

library(reticulate)

import os
from pprint import pprint

if os.path.isdir("test-dir") is True:
    os.rmdir("test-dir")
os.mkdir("test-dir")

os.path.isdir("test-dir")

## True

os.rmdir("test-dir")

os.path.isdir("test-dir")

## False

if os.path.isdir("test-dir") is True:
    os.rmdir("test-dir")
os.mkdir("test-dir")

pprint(os.path.isdir("test-dir"))

## True

pprint(os.path.isdir("new-dir"))

## True

os.listdir("new-dir")

## []
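
The check-then-remove-then-create pattern above can be shortened. This is a sketch under two assumptions: it works inside a temporary directory rather than the project folder, and the file name `note.txt` is invented. `os.rmdir` only removes empty directories, so the sketch uses `shutil.rmtree` for one that has contents.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "test-dir")

# exist_ok=True makes creation safe to repeat, with no isdir check needed.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)
created = os.path.isdir(target)

# Put a file inside, so os.rmdir would fail on this directory.
with open(os.path.join(target, "note.txt"), "w") as f:
    f.write("temporary")
listing = os.listdir(target)

# rmtree removes the directory together with its contents.
shutil.rmtree(target)
removed = not os.path.isdir(target)

shutil.rmtree(base)
print(created, listing, removed)
```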

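14 Merging dicts

See https://thispointer.com/python-how-to-add-append-key-value-pairs-in-dictionary-using-dict-update/

dict.update(other) accepts another dict or an iterable of key/value pairs. It modifies the dict in place, overwrites existing keys, and returns None.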
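
A brief sketch of the accepted argument forms, with made-up keys and values. The last line shows dict unpacking as an alternative when the original dict should stay untouched.

```python
base = {"a": 1, "b": 2}

base.update({"b": 20, "c": 30})   # another dict; "b" is overwritten
base.update([("d", 4)])           # an iterable of (key, value) pairs
base.update(e=5)                  # keyword arguments
print(base)

# Build a new merged dict without modifying base.
merged = {**base, "a": 100}
print(merged)
```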

15 requirements.txt

See https://stackoverflow.com/a/29718371/8625228

pip install -r requirements.txt

pip freeze > requirements.txt

Appendix

References

天池大数据科研平台. 2020. “Python中enumerate函数的解释和可视化.” 天池大数据科研平台. 2020. https://mp.weixin.qq.com/s/T_QMWTRuhVfC3maqG3H8vQ.

Python base Cookbook