Preprocessing

affirmative = make_df("data/affirmative.csv")

Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\LIJIAX~1\AppData\Local\Temp\jieba.cache
Loading model cost 1.083 seconds.
Prefix dict has been built succesfully.

affirmative.head()

negative = make_df("data/negative.csv")

negative.head()

LDA

stopwords = get_custom_stopwords("data/stopwords.txt", encoding='utf-8') # HIT stop-word list
max_df = 0.9 # drop keywords that appear in more than this proportion of documents (too common)
min_df = 5 # drop keywords that appear in fewer than this number of documents (too rare)
n_features = 1000 # maximum number of features to extract
n_top_words = 20 # number of keywords to display for each topic
col_content = "text" # name of the column that holds the text
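As a rough illustration of what max_df, min_df and n_features control, the sketch below applies the same kind of document-frequency filtering by hand in plain Python. The function name `filter_vocabulary` and the toy English corpus are hypothetical, and min_df is lowered to 2 only because the corpus has four documents. How lda_on_chinese_articles actually consumes these settings is not shown in this notebook, so treat this as an approximation of the idea rather than its implementation.

```python
from collections import Counter


def filter_vocabulary(docs, max_df=0.9, min_df=5, max_features=1000):
    """Keep terms whose document frequency lies within [min_df, max_df * n_docs].

    docs is a list of token lists. max_df is a proportion of documents and
    min_df an absolute document count, matching the settings above.
    """
    doc_freq = Counter()   # number of documents containing each term
    term_freq = Counter()  # total occurrences of each term
    for tokens in docs:
        term_freq.update(tokens)
        doc_freq.update(set(tokens))

    max_count = max_df * len(docs)
    kept = [t for t, df in doc_freq.items() if min_df <= df <= max_count]
    # If too many terms survive, keep the most frequent ones.
    kept.sort(key=lambda t: (-term_freq[t], t))
    return sorted(kept[:max_features])


# Hypothetical, already tokenized corpus.
corpus = [
    "online course video".split(),
    "online course teacher".split(),
    "online learning video".split(),
    "online classroom teacher".split(),
]
vocab = filter_vocabulary(corpus, max_df=0.9, min_df=2, max_features=1000)
print(vocab)
```

Here `online` is dropped because it occurs in every document (above the 0.9 proportion), and `learning` and `classroom` are dropped because each occurs in only one document.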

lda, tf, vect = lda_on_chinese_articles(df = affirmative, n_topics = 3)
pyLDAvis.sklearn.prepare(lda, tf, vect)

D:\install\miniconda\lib\site-packages\sklearn\feature_extraction\text.py:300: UserWarning: Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens ['lex', ...] not in stop_words.
  'stop_words.' % sorted(inconsistent))
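The UserWarning above is raised when the vectorizer runs its own preprocessing and tokenizer over the stop-word list and obtains tokens that are not themselves in the list. The sketch below imitates that check with the standard library only. `inconsistent_tokens` and the English stop words are hypothetical, and the regular expression is CountVectorizer's default token pattern, which may differ from the pattern configured in this notebook.

```python
import re

# Default token pattern of scikit-learn's CountVectorizer: runs of 2+ word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def inconsistent_tokens(stop_words):
    """Return tokens produced from the stop words that are not stop words themselves."""
    stop_set = set(stop_words)
    found = set()
    for word in stop_words:
        # Lowercase first, as the vectorizer does by default.
        for token in TOKEN_PATTERN.findall(word.lower()):
            if token not in stop_set:
                found.add(token)
    return sorted(found)


# Hypothetical stop-word list containing entries the tokenizer will split.
example_stop_words = ["the", "isn't", "e.g", "and/or"]
print(inconsistent_tokens(example_stop_words))
```

Entries containing punctuation or symbols are the usual cause: the tokenizer splits them into pieces that never appear in the list on their own.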

Topic #0:
ѧϰ ѧ�� ���� �γ� ѧϰ�� ��Ƶ ʱ�� ���� ���� ���� ��ʦ ���� ���� ֪ʶ ��ѧ ��Ȥ ʵ�� �ٽ� ���� �ߵȽ���
Topic #1:
���� ��չ ��ͳ ��ѧ �Ѿ� ȡ�� �ҹ� ���� ��� ���� ���� ���� ��Ϊ ʱ�� ���� Ŀǰ ���� ���� �ߵȽ��� ��У
Topic #2:
ѧ�� ���� ��ͳ ѧϰ ��ʦ ��ʦ ��ѧ ���� ��ʽ ���� ��Ƶ ֪ʶ �γ� ��Ҫ һ�� ���� ���� ʱ�� û�� ģʽ
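The "Topic #k:" listings above are the kind of output obtained by ranking each topic's word weights and printing the highest-weighted terms. A minimal sketch of that ranking follows, assuming a plain list-of-lists weight matrix (in scikit-learn the fitted model exposes these weights as `components_`). The function `top_words`, the vocabulary and the weights are all hypothetical; the notebook's own print_top_words is not shown here and may differ.

```python
def top_words(topic_word_weights, feature_names, n_top_words):
    """For each topic (a row of weights), return the n_top_words highest-weighted terms."""
    result = []
    for weights in topic_word_weights:
        # Indices of the vocabulary sorted by descending weight.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([feature_names[i] for i in ranked[:n_top_words]])
    return result


# Hypothetical 2-topic model over a 4-word vocabulary.
feature_names = ["course", "teacher", "video", "student"]
weights = [
    [0.1, 2.0, 0.3, 1.5],
    [3.0, 0.2, 2.5, 0.4],
]
for idx, words in enumerate(top_words(weights, feature_names, n_top_words=2)):
    print("Topic #%d:" % idx)
    print(" ".join(words))
```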

D:\install\miniconda\lib\site-packages\pyLDAvis\_prepare.py:257: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  return pd.concat([default_term_info] + list(topic_dfs))

TypeError: __init__() got an unexpected keyword argument 'n_topics'

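Errors like this are usually caused by a mistyped or missing letter in the keyword.

Copyright notice: this is an original article by CSDN blogger "zhuimengshaonian66", licensed under CC 4.0 BY-SA; reproductions must include the original source link and this notice. Original link: https://blog.csdn.net/zhuimengshaonian66/article/details/81700959

Changing the keyword to n_components fixed it.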
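For reference, a small sketch of the keyword change, assuming a scikit-learn release in which LatentDirichletAllocation takes the topic count as n_components. The count matrix is a hypothetical toy input, not the notebook's data.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy document-term count matrix: 4 documents, 4 terms.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
])

# The number of topics is passed as n_components.
lda_model = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topics = lda_model.fit_transform(counts)
print(doc_topics.shape)  # one row per document, one column per topic

# On releases that no longer accept the old keyword,
# n_topics raises the TypeError quoted above.
try:
    LatentDirichletAllocation(n_topics=3)
except TypeError as err:
    print("rejected:", err)
```

A wrapper such as lda_on_chinese_articles can keep its own n_topics parameter as long as it forwards the value to the estimator as n_components.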

lda, tf, vect = lda_on_chinese_articles(df = negative, n_topics = 3)
pyLDAvis.sklearn.prepare(lda, tf, vect)

Topic #0:
���� ѧ�� ��ͳ ѧϰ û�� ȡ�� ��ʦ ���� ��ʦ ���� ���� ��Ƶ ��Ϊ ֪ʶ ʦ�� ���� ����� ���� ���� ����
Topic #1:
ѧϰ �γ� ѧ�� ���� ��ѧ ѧϰ�� ��ʦ ��չ ֪ʶ ���� ��Ҫ �޷� �ҹ� ���� ���� ���� û�� ��Ϊ ƽ̨ ��ʽ
Topic #2:
��ͳ ���� ѧ�� ���� ���� ���� ��չ ��У Ч�� ����Ӣ ���� �޷� ȡ�� �й� ���� ʵ�� ���� ��ѧ�� ���� ʱ��

pyLDAvis.sklearn.prepare(lda, tf, vect)

D:\install\miniconda\lib\site-packages\pyLDAvis\_prepare.py:257: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  return pd.concat([default_term_info] + list(topic_dfs))

Reference: https://github.com/bmabey/pyLDAvis/issues/132

Reinstalling still did not fix it.
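Since the FutureWarning comes from inside the library and the sort argument cannot be passed from the notebook, one possible workaround is to hide the message around the call. The sketch below uses only the standard warnings module; `noisy_call` is a hypothetical stand-in for the library call and `call_quietly` is a hypothetical helper. This only silences the message and does not change what pandas does.

```python
import warnings


def noisy_call():
    """Stand-in for a library call that emits a FutureWarning."""
    warnings.warn("Sorting because non-concatenation axis is not aligned.", FutureWarning)
    return "prepared"


def call_quietly(func, *args, **kwargs):
    """Run func with FutureWarning messages suppressed, then restore the filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=FutureWarning)
        return func(*args, **kwargs)


result = call_quietly(noisy_call)
print(result)
```

In this notebook the call would look like call_quietly(pyLDAvis.sklearn.prepare, lda, tf, vect).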

pyLDAvis.__version__

'2.1.2'

pd.__version__

'0.24.2'

# !pip install pyldavis

import pickle as pkl

# Persist the fitted LDA model to disk.
with open("model/sklearn-lda.pkl", 'wb') as fp:
    pkl.dump(lda, fp)

# Reload it and confirm the class of the restored object.
with open("model/sklearn-lda.pkl", 'rb') as fp:
    model0 = pkl.load(fp)
    print(model0.__class__)

<class 'sklearn.decomposition.online_lda.LatentDirichletAllocation'>
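The save step above fails with FileNotFoundError if the model directory does not exist yet. A minimal sketch of a save/load pair that creates the directory first is given below. `save_pickle` and `load_pickle` are hypothetical helpers, and a plain dictionary stands in for the fitted model so the example runs without scikit-learn.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Pickle obj to path, creating the parent directory if it is missing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as fp:
        pickle.dump(obj, fp)


def load_pickle(path):
    """Load and return the object pickled at path."""
    with open(path, "rb") as fp:
        return pickle.load(fp)


# Stand-in for the fitted model; a temporary directory keeps the example self-contained.
stand_in_model = {"n_components": 3, "max_iter": 10}
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model", "sklearn-lda.pkl")
    save_pickle(stand_in_model, target)
    restored = load_pickle(target)
print(restored == stand_in_model)
```

A pickled scikit-learn model is generally only safe to reload with the same library version that wrote it.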

	Group	Students	Content	text
0	��1��	��	Ľ�ν��ֲ��ص��ʵĽ��Դ�ۼ��һ��κ��ѧϰԸ��ܹ��ͳɱ��ģ�ͨ��...	Ľ�� ֲ� �� Դ �ۼ� �� һ�� κ� �� ...
1	��1��	��	��Ľ�η�չ��е��ֽ׶Σ��й��Ľ��ƽ̨icourse163��û��ͻ��100��,��...	�� Ľ�� չ �� ֽ׶� �� й� �� Ľ�� ƽ̨ icourse163 ...
2	��1��	��һ	�о��,��Ľ��Ŀ��ѧϰ��,ѧϰ��ḻ,֪ʶ��Լ�Ԫ��֪��õ��,˼��...	�о� �� , �� Ľ�� ѧϰ �� , ѧϰ�� ḻ , ֪ʶ ...
3	��1��	��	Ľ��ڱ�֤��ͬʱ��ṩ��ĳɱ��㽡��κ��κ�ʱ��κεط��...	Ľ�� ֤ �� ͬʱ �� ṩ �� ɱ� �� ...
4	��1��	��һ	�Է��һ��Ҳ˵�ǿ��ܳ��ֵĻ��գ��ͳ��ü��ʦ��֪ʶ��ѵ��һ��...	�Է� ��һ �� Ҳ ˵ �� ͳ �� ʦ ...

	Group	Students	Content	text
0	��1��	��һ	ͨ�׵�˵��Ľ��Ǵ��ģ��翪�ſγ̡�1.��ͳ��п��ܳ��ֵĻ��գ�Ľ��û�У�2.��ͳ...	ͨ�� ˵ �� Ľ�� ģ �� γ� �� 1 . ��ͳ �� ...
1	��1��	��һ	ͨ�׵�˵��Ľ��Ǵ��ģ��翪�ſγ̡��ҷ��۵�Ϊ��Ľ�β��ܴ��洫ͳ��1.��ͳ��п��ܳ��...	ͨ�� ˵ �� Ľ�� ģ �� γ� �� ҷ� �۵� Ϊ �� Ľ�� ...
2	��1��	��һ	��һ�۵��һ�仰��߽��,ѧϰ��ƫ��Ľ��ۡ��ⲻ��Ǵ�ͳ��ܸ��	��һ �۵� �� һ�� , ѧϰ�� ƫ�� ...
3	��1��	��һ	��У�˵��Ľ�ε��ԣ��Ͼ��й�ѧ��Ӵ�Ļ��⡣	�� ˵�� Ľ�� Ͼ� �й� ѧ�� Ӵ� �� ...
4	��1��	��һ	��һ˵�Ķ��˼��⣬��ͳ��õĵ��Խϴ󣬿��ʦѧ��У��ʩ�̣��...	��һ˵ �� ˼�� ͳ �� ϴ� �� ʦ ...

Preprocessing

`make_df`

LDA

`chinese_word_cut`

`print_top_words`

`get_custom_stopwords`

`lda_on_chinese_articles_with_param`

`lda_on_chinese_articles`
