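Word clouds have been a popular thing to play with lately, and JavaScript, Python, R and other languages all have libraries for generating them.
This post is a brief introduction to the Python wordcloud package.
GitHub:
https://github.com/amueller/word_cloud
Example code location: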
1. Installation
pip install wordcloud
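To confirm that the install worked, a small check like the one below can help. It is only a sketch: the helper name missing_packages is made up for illustration, and the package list simply mirrors the imports used in this post (jieba, used in the last section, is installed separately with pip install jieba).

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]

# Packages imported by the examples in this post; an empty list means all are available
print(missing_packages(["wordcloud", "matplotlib", "jieba"]))
```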
2. A first example
- constitution.py
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Read the text
with open('../text/constitution.txt') as f:
    text = f.read()
# Generate the word cloud
wordcloud = WordCloud().generate(text)
# Preview it as a PIL image and save it to a file
image = wordcloud.to_image()
image.show()
wordcloud.to_file("../word_image/constitution.png")
# Display it with matplotlib
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
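Roughly speaking, generate(text) splits the text into words, counts them, and draws the more frequent words larger. The counting step can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the library's actual implementation: the function name word_frequencies and the regular expression are made up for this sketch.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset()):
    """Count lower-cased words in text, skipping anything in stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)

# Sample sentence chosen for illustration only
freqs = word_frequencies("to be or not to be that is the question",
                         stopwords={"the", "is"})
print(freqs.most_common(2))
```

A mapping like this can also be passed to WordCloud().generate_from_frequencies(...) when you want to control the counting yourself.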

# A problem you may hit with matplotlib (on macOS)
RuntimeError: Python is not installed as a framework
> Fix:
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
> For details, see:
https://stackoverflow.com/questions//installation-issue-with-matplotlib-python/#21789908
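The echo command fails if the ~/.matplotlib directory does not exist yet. A small Python sketch of the same fix that creates the directory first is shown below. The function name set_backend is hypothetical, and the config directory is passed in because it differs between platforms (~/.matplotlib is the one used in the command above).

```python
from pathlib import Path

def set_backend(config_dir, backend="TkAgg"):
    """Write a matplotlibrc that selects the given backend and return its path."""
    config_dir = Path(config_dir).expanduser()
    config_dir.mkdir(parents=True, exist_ok=True)
    rc_file = config_dir / "matplotlibrc"
    # Like the echo command, this overwrites any existing matplotlibrc
    rc_file.write_text(f"backend: {backend}\n", encoding="utf-8")
    return rc_file

# Example call (not executed here): set_backend("~/.matplotlib")
```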
3. Common parameters
- love.py
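from wordcloud import WordCloud, STOPWORDS
import numpy as np
from PIL import Image
# scipy.misc.imread was deprecated in SciPy 1.0 and removed in 1.2,
# so the mask image is loaded with Pillow and NumPy instead
# Read the text
with open('../text/love.txt') as f:
    text = f.read()
wc = WordCloud(
    font_path='../font/simhei.ttf',  # font file
    background_color="black",  # background color
    max_words=200,  # maximum number of words shown (value missing in the source; 200 is the library default)
    mask=np.array(Image.open("../background_image/love.jpg")),  # background image used as the mask
    stopwords=set(STOPWORDS),  # filter with the built-in stop-word set
    max_font_size=None,  # largest font size (value missing in the source; None is the library default)
    random_state=None,  # seed for random.Random, used to generate random colors (value missing in the source; None is the library default)
    # width=1000,
    # height=860,
)
wordcloud = wc.generate(text)
image = wordcloud.to_image()
image.show()
wordcloud.to_file("../word_image/love.png")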
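The random_state comment above is easier to see with a toy example. The sketch below is not the library's code; it only imitates the idea of drawing a hue from a seeded random.Random, and the function name pick_colors is hypothetical. The same seed gives the same sequence of colors, which is why fixing random_state makes a cloud's colors repeatable from run to run.

```python
import random

def pick_colors(n, seed=None):
    """Return n HSL color strings drawn from a random.Random seeded with seed."""
    rng = random.Random(seed)
    return ["hsl(%d, 80%%, 50%%)" % rng.randint(0, 255) for _ in range(n)]

# The same seed reproduces the same colors
print(pick_colors(3, seed=42) == pick_colors(3, seed=42))
```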
Full parameter reference
4. Chinese word segmentation
- talk.py
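from wordcloud import WordCloud
import numpy as np
from PIL import Image
import jieba
# Load a custom list of Chinese stop words, one per line
def stop_words():
    word_list = [' ']
    with open('../text/stopwords.txt', 'r', encoding='utf-8') as f:
        for line in f:
            word = line.rstrip()
            if word:  # skip blank lines instead of stopping at the first one
                word_list.append(word)
    return word_list

# Read the text (UTF-8 is assumed here, as for stopwords.txt)
with open('../text/talk.txt', encoding='utf-8') as f:
    text = f.read()
# Segment with jieba (which returns a generator) and join the tokens with spaces
text = ' '.join(jieba.cut(text))
# scipy.misc.imread was removed in SciPy 1.2, so the mask is loaded with Pillow and NumPy
wc = WordCloud(
    font_path='../font/simhei.ttf',  # font file (must contain Chinese glyphs)
    background_color="black",  # background color
    max_words=200,  # maximum number of words shown (value missing in the source; 200 is the library default)
    mask=np.array(Image.open("../background_image/love.jpg")),  # background image used as the mask
    stopwords=set(stop_words()),  # filter with the custom stop-word list
    max_font_size=None,  # largest font size (value missing in the source; None is the library default)
    random_state=None,  # seed for random.Random, used to generate random colors (value missing in the source; None is the library default)
    # width=1000,
    # height=860,
)
wc_text = wc.generate(text)
image = wc_text.to_image()
image.show()
wc_text.to_file("../word_image/talk.png")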
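The stop-word handling above can be tried without any files or extra packages. The sketch below assumes the text has already been segmented: the token list is made-up sample data standing in for the output of jieba.cut, and the helper names load_stop_words and filter_tokens are hypothetical. It reads one stop word per line, skips blank lines, and drops matching tokens before they are joined with spaces.

```python
import io

def load_stop_words(lines):
    """Collect one stop word per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}

def filter_tokens(tokens, stop_words):
    """Keep tokens that are neither whitespace-only nor stop words."""
    return [t for t in tokens if t.strip() and t not in stop_words]

# A stand-in for stopwords.txt; note the blank line in the middle
stop_file = io.StringIO("的\n了\n\n是\n")
stop_words = load_stop_words(stop_file)

# Made-up tokens, shaped like what a segmenter might return
tokens = ["今天", "的", "天氣", "是", "晴天", "了"]
text = " ".join(filter_tokens(tokens, stop_words))
print(text)
```

In the real script WordCloud applies the filter itself through its stopwords argument; doing it by hand is mainly useful for checking what a stop-word list actually removes.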