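1. Use the child parameter of RMarkdown to stitch the documents together.
2. Notes stitched together this way are easier to review.
3. Submit related questions to Issue.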

1 Displaying percentages

>>> '{:.1%}'.format(1/3.0)
'33.3%'

See the Stack Overflow discussion on better ways to display percentages in Python.
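
The same `.1%` format spec also works outside `str.format`. The sketch below shows two equivalent spellings; the variable names are only illustrative.

```python
ratio = 1 / 3.0

# f-string with the same "%" format spec used by str.format
as_fstring = f"{ratio:.1%}"

# the built-in format() accepts the spec directly
as_builtin = format(ratio, ".1%")

# zero decimal places rounds to a whole percentage
as_whole = f"{ratio:.0%}"

print(as_fstring, as_builtin, as_whole)
```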

2 Checking for NoneType

See https://www.jianshu.com/p/a22ee281f05a

if df is None:
    print('It is Nonetype.')
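
A short sketch of why `is None` is the usual test: it checks identity, so falsy values such as `0` or an empty list are not mistaken for None. It also avoids truth-testing the object itself, which pandas refuses to do for a DataFrame. The helper name `describe` is hypothetical.

```python
def describe(value):
    """Return a short label telling whether value is None."""
    # `is` compares identity; None is a singleton, so this test is reliable
    if value is None:
        return "None"
    return type(value).__name__


labels = [describe(None), describe(0), describe([])]
print(labels)
```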

3 Running Python from the command line

See https://mail.python.org/pipermail/python-list/2002-June/173109.html

python -c "print(5)"
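
The same kind of one-liner can be launched from inside a script. This is a minimal sketch using the standard `subprocess` module; it assumes Python 3.7 or later for `capture_output`.

```python
import subprocess
import sys

# sys.executable is the interpreter running this script, which avoids
# guessing whether the command is called "python" or "python3"
result = subprocess.run(
    [sys.executable, "-c", "print(5)"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
print(output)
```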

4 TypeError: can’t pickle generator objects

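See https://stackoverflow.com/questions/28963354/typeerror-cant-pickle-generator-objects

kf.split(df) returns a generator, which cannot be pickled directly, so iterate over it with a for loop and save each fold separately.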

fold_id = kf.split(df)
import pickle as pkl
for (idx,i) in enumerate(fold_id):
    print(idx)
    file_name = '../model/fold_id_{}.pkl'.format(idx)
    print('{} saved.'.format(file_name))
    with open(file_name, 'wb') as f:
        pkl.dump(i, f)
0
../model/fold_id_0.pkl saved.
1
../model/fold_id_1.pkl saved.
2
../model/fold_id_2.pkl saved.
3
../model/fold_id_3.pkl saved.
4
../model/fold_id_4.pkl saved.
5
../model/fold_id_5.pkl saved.
6
../model/fold_id_6.pkl saved.
7
../model/fold_id_7.pkl saved.
8
../model/fold_id_8.pkl saved.
9
../model/fold_id_9.pkl saved.
10
../model/fold_id_10.pkl saved.
11
../model/fold_id_11.pkl saved.
12
../model/fold_id_12.pkl saved.
13
../model/fold_id_13.pkl saved.
14
../model/fold_id_14.pkl saved.
15
../model/fold_id_15.pkl saved.
16
../model/fold_id_16.pkl saved.
17
../model/fold_id_17.pkl saved.
18
../model/fold_id_18.pkl saved.
19
../model/fold_id_19.pkl saved.
$ du -sh model/fold_id* | clip
18M model/fold_id_0.pkl
18M model/fold_id_1.pkl
18M model/fold_id_10.pkl
18M model/fold_id_11.pkl
18M model/fold_id_12.pkl
18M model/fold_id_13.pkl
18M model/fold_id_14.pkl
18M model/fold_id_15.pkl
18M model/fold_id_16.pkl
18M model/fold_id_17.pkl
18M model/fold_id_18.pkl
18M model/fold_id_19.pkl
18M model/fold_id_2.pkl
18M model/fold_id_3.pkl
18M model/fold_id_4.pkl
18M model/fold_id_5.pkl
18M model/fold_id_6.pkl
18M model/fold_id_7.pkl
18M model/fold_id_8.pkl
18M model/fold_id_9.pkl

The fold id files tend to be large (18M each here), so it is advisable to ignore them, for example by listing them in .gitignore.
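
The error can be reproduced without any real data. In the sketch below, `fake_split` is a hypothetical stand-in for `kf.split(df)`: pickling the generator object fails, while pickling the materialised list of folds works.

```python
import pickle


def fake_split(n_folds):
    """Hypothetical stand-in for kf.split(df): yields (train, test) index lists."""
    for k in range(n_folds):
        indices = list(range(n_folds))
        yield [i for i in indices if i != k], [k]


folds = fake_split(3)

try:
    pickle.dumps(folds)  # a generator object itself cannot be pickled
    pickled_generator = True
except TypeError as err:
    pickled_generator = False
    print("pickling the generator failed:", err)

# Materialising the generator into a list gives plain, picklable objects
fold_list = list(fake_split(3))
restored = pickle.loads(pickle.dumps(fold_list))
print(restored)
```

Saving one file per fold, as in the loop above, keeps each file smaller; saving the whole list in one file is the simpler option when size is not a concern.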

5 Encoding problems

Reading data with pd.read_csv can fail with errors such as:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 2: invalid start byte
UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 0: illegal multibyte sequence

Open the file in Notepad and use Save As to save it as UTF-8.
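
An alternative to re-saving by hand is to try a few candidate encodings in code. This is only a sketch: `read_text_with_fallback` and its list of encodings are assumptions, and a successful decode does not prove that the guessed encoding is the right one. The encoding that works can then be passed to `pd.read_csv` through its `encoding` argument.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "gbk", "gb18030")):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes."""
    with open(path, "rb") as f:
        raw = f.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the file")


# Build a small GBK-encoded CSV to demonstrate the fallback
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "wb") as f:
        f.write("name\n中文\n".encode("gbk"))
    text, used_encoding = read_text_with_fallback(sample_path)

print(used_encoding)
```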

6 Adding the save time to a file name

See https://blog.csdn.net/u010417185/article/details/52293376

from datetime import datetime
string_time = datetime.now().strftime("%Y%m%d-%H%M%S")
file_name = 'submission-' + string_time + '.csv'
submission_df.to_csv('../submission/{}'.format(file_name), index = False, encoding = "UTF-8")
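
The pattern can be wrapped in a small function so that the timestamp format lives in one place. `timestamped_name` is a hypothetical helper; the optional `now` argument exists only to make the result reproducible.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(prefix, suffix, now=None):
    """Hypothetical helper: build names like 'submission-20200102-030405.csv'."""
    now = now or datetime.now()
    return "{}-{}{}".format(prefix, now.strftime("%Y%m%d-%H%M%S"), suffix)


fixed = datetime(2020, 1, 2, 3, 4, 5)
example = timestamped_name("submission", ".csv", now=fixed)
print(example)

# Joining with a directory; '../submission' is the folder used in the notes
target = Path("../submission") / timestamped_name("submission", ".csv")
```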

7 tuple in enumerate

See 天池大数据科研平台 (2020)

letters = [('a', 'A'), ('b', 'B'), ('c', 'C')]
for i, (lowercase, uppercase) in enumerate(letters):
    print(f"Index '{i}' refers to the letters '{lowercase}' and '{uppercase}'")
## Index '0' refers to the letters 'a' and 'A'
## Index '1' refers to the letters 'b' and 'B'
## Index '2' refers to the letters 'c' and 'C'
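
`enumerate` also takes a `start` argument, which combines with tuple unpacking in the same way. A minimal sketch:

```python
letters = [('a', 'A'), ('b', 'B'), ('c', 'C')]

lines = []
# start=1 makes the counter begin at 1 instead of 0
for position, (lowercase, uppercase) in enumerate(letters, start=1):
    lines.append(f"{position}: {lowercase} -> {uppercase}")

print(lines)
```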

8 Parsing config or YAML files

See nbdev

read_lines("analysis/sample.ini")
[1] "[DEFAULT]"            "author = Jiaxiang Li"
from configparser import ConfigParser

# note: all settings are in settings.ini; edit there, not here
config = ConfigParser(delimiters=['='])
config.read('analysis/sample.ini')
## ['analysis/sample.ini']
cfg = config['DEFAULT']
cfg['author']
## 'Jiaxiang Li'
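
A self-contained variant that parses the configuration from a string instead of a file. The `[build]` section and its keys are made up for illustration; only the `[DEFAULT]` entry mirrors the sample file.

```python
from configparser import ConfigParser

# Hypothetical file content, mirroring the [DEFAULT] section of sample.ini
sample_ini = """
[DEFAULT]
author = Jiaxiang Li

[build]
output = docs
"""

config = ConfigParser(delimiters=["="])
config.read_string(sample_ini)

# Keys from [DEFAULT] are visible from every other section
build = config["build"]
author = build["author"]
output = build["output"]

# fallback avoids a KeyError when the key is missing
version = build.get("version", fallback="unknown")
print(author, output, version)
```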

9 one-line for loop

See https://github.com/dipam7/Fancy-python/blob/master/fancy_python_github.ipynb

fruits = ['apple', 'mango', 'banana', 'watermelon', 'pineapple']
for fruit in fruits: print(fruit)
## apple
## mango
## banana
## watermelon
## pineapple
for fruit in fruits: 
    print(fruit)
## apple
## mango
## banana
## watermelon
## pineapple
fruits = ['apple', 'mango', 'banana', 'watermelon', 'pineapple']
fruit_ids = [1, 2, 3, 4, 5]
# avoid shadowing the built-in id() and the list being iterated
{fruit_id: fruit for fruit, fruit_id in zip(fruits, fruit_ids)}
## {1: 'apple', 2: 'mango', 3: 'banana', 4: 'watermelon', 5: 'pineapple'}
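
Two related one-liners, sketched with illustrative names: `dict(zip(...))` builds the same mapping without a comprehension, and a list comprehension is the one-line form of a loop that collects results.

```python
fruits = ['apple', 'mango', 'banana', 'watermelon', 'pineapple']
fruit_ids = [1, 2, 3, 4, 5]

# dict() accepts the (key, value) pairs produced by zip directly
by_id = dict(zip(fruit_ids, fruits))

# a list comprehension is the one-line form of a loop that builds a list
lengths = [len(fruit) for fruit in fruits]

print(by_id)
print(lengths)
```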

10 List

10.1 append

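The following raises a TypeError, because append() modifies the list in place and returns None:

df_list += df_list.append(output_df)

See https://www.cnblogs.com/qiu-1010/p/10622527.html and write it directly: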

df_list.append(output_df) 

10.2 sort

See https://www.runoob.com/python/att-list-sort.html

Like append(), sort() works in place: it modifies the list and returns None.
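
A small sketch contrasting the in-place `sort()` with the built-in `sorted()`, which returns a new list and leaves its input alone:

```python
numbers = [3, 1, 2]

# sort() reorders the list itself and returns None
result = numbers.sort()
print(result, numbers)

# sorted() leaves the input untouched and returns a new list
original = [3, 1, 2]
descending = sorted(original, reverse=True)
print(original, descending)
```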

10.3 pop

# Python3 program for pop() method 
  
list1 = [ 1, 2, 3, 4, 5, 6 ] 
  
# Pops and removes the last element from the list 
print(list1.pop()) 
  
# Print list after removing last element 
print("New List after pop : ", list1, "\n") 
  
list2 = [1, 2, 3, ('cat', 'bat'), 4] 
  
# Pop the last three elements
print(list2.pop()) 
print(list2.pop()) 
print(list2.pop()) 
  
# Print list 
print("New List after pop : ", list2, "\n") 

append adds an element; pop removes the last element and returns it.

>>> list1 = [ 1, 2, 3, 4, 5, 6 ]
>>> list1.pop() == 6
True
>>> list1
[1, 2, 3, 4, 5]
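
`pop()` also accepts an index, and it raises `IndexError` on an empty list. A minimal sketch:

```python
list1 = [1, 2, 3, 4, 5, 6]

first = list1.pop(0)   # remove and return the element at index 0
last = list1.pop()     # no argument: remove and return the last element
print(first, last, list1)

# pop() on an empty list raises IndexError
empty = []
try:
    empty.pop()
    raised = False
except IndexError:
    raised = True
print(raised)
```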

12 lambda

See https://mp.weixin.qq.com/s/yh-zGEtDgNo2bovg218xKA

library(reticulate)

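lambda [arg1 [, arg2, ..., argn]]: expression

Square brackets mark optional parts, so the extra arguments [, arg2, ..., argn] can be left out. A lambda can also take no arguments at all.

After a df has been converted to dicts, how can max be used?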

dict1 = {"a": 10, "b": 20}, {"a": 20, "b": 20}, {"a": 50, "b": 20}, {"a": 6, "b": 20}, {"a": 9, "b": 20}
from pprint import pprint 
list1 = [dict1]
pprint(list1)
## [({'a': 10, 'b': 20},
##   {'a': 20, 'b': 20},
##   {'a': 50, 'b': 20},
##   {'a': 6, 'b': 20},
##   {'a': 9, 'b': 20})]
import pandas as pd
df1 = pd.DataFrame(dict1)
df1
##     a   b
## 0  10  20
## 1  20  20
## 2  50  20
## 3   6  20
## 4   9  20
df1['a'].max()
## 50
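
Without going back to a DataFrame, `max` can work on the dicts directly when given a lambda as its `key`. The sketch assumes the data is a list of row dicts, such as the one `df1.to_dict("records")` would produce; the name `records` is illustrative.

```python
records = [
    {"a": 10, "b": 20},
    {"a": 20, "b": 20},
    {"a": 50, "b": 20},
    {"a": 6, "b": 20},
    {"a": 9, "b": 20},
]

# key tells max() which value to compare for each dict
largest = max(records, key=lambda record: record["a"])

# a generator expression gives just the maximum value
largest_value = max(record["a"] for record in records)

print(largest, largest_value)
```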

13 os

See https://mp.weixin.qq.com/s/IgTmt06flgfbttHv0HpnfA

library(reticulate)
import os
from pprint import pprint
if os.path.isdir("test-dir"):
    os.rmdir("test-dir")
os.mkdir("test-dir")
os.path.isdir("test-dir")
## True
os.rmdir("test-dir")
os.path.isdir("test-dir")
## False
if os.path.isdir("test-dir"):
    os.rmdir("test-dir")
os.mkdir("test-dir")
pprint(os.path.isdir("test-dir"))
## True
pprint(os.path.isdir("new-dir"))
## True
os.listdir("new-dir")
## []
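
The check-then-create pattern can be shortened with `os.makedirs(..., exist_ok=True)`. The sketch below runs inside a temporary folder so that it leaves nothing behind; the directory names are illustrative.

```python
import os
import shutil
import tempfile

# Work inside a temporary folder so nothing in the project is touched
base = tempfile.mkdtemp()
target = os.path.join(base, "test-dir", "nested")

# exist_ok=True makes the call safe to repeat; parent folders are created too
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)
created = os.path.isdir(target)

# shutil.rmtree removes a directory even when it is not empty,
# whereas os.rmdir only works on empty directories
shutil.rmtree(base)
removed = not os.path.isdir(base)
print(created, removed)
```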

14 Merging dicts

See https://thispointer.com/python-how-to-add-append-key-value-pairs-in-dictionary-using-dict-update/

dict.update(Iterable_Sequence of key:value)
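
A concrete sketch of the call above, with made-up dict names. `update()` changes the dict in place, while unpacking with `**` builds a new dict; in both cases the later value wins when a key appears twice.

```python
defaults = {"a": 1, "b": 2}
overrides = {"b": 20, "c": 30}

# update() changes the dict in place and returns None
merged_in_place = dict(defaults)  # copy so defaults stays intact
merged_in_place.update(overrides)

# update() also accepts an iterable of (key, value) pairs
merged_in_place.update([("d", 40)])

# unpacking builds a new dict; later keys win on conflict
merged_new = {**defaults, **overrides}

print(merged_in_place)
print(merged_new)
```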

15 requirements.txt

See https://stackoverflow.com/a/29718371/8625228

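# install the packages listed in requirements.txt
pip install -r requirements.txt
# record the currently installed packages in requirements.txt
pip freeze > requirements.txt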

Appendix

References

天池大数据科研平台. 2020. “Python中enumerate函数的解释和可视化.” 天池大数据科研平台. 2020. https://mp.weixin.qq.com/s/T_QMWTRuhVfC3maqG3H8vQ.