Python wordcloud: word clouds

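Word clouds have become a popular visualization recently, and libraries for generating them are available in JavaScript, Python, R, and other languages.

This post gives a brief introduction to the wordcloud package for Python.

GitHub: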

https://github.com/amueller/word_cloud

Sample code location:

1. Installation

pip install wordcloud
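
To confirm from Python that the install worked, one option is to check whether the package can be found on the import path. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of wordcloud.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Reports whether wordcloud is importable in the current environment
    print("wordcloud installed:", is_installed("wordcloud"))
```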
           

2. A getting-started example

  • constitution.py
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the text content
with open('../text/constitution.txt') as f:
    text = f.read()

# Generate the word cloud image data
wordcloud = WordCloud().generate(text)
image = wordcloud.to_image()
image.show()
wordcloud.to_file("../word_image/constitution.png")

# Display the image with matplotlib
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
           
# Problem encountered when using matplotlib (on macOS)
RuntimeError: Python is not installed as a framework
> Fix:
echo "backend: TkAgg" > ~/.matplotlib/matplotlibrc
> For details, see:
https://stackoverflow.com/questions//installation-issue-with-matplotlib-python/#21789908
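
The `echo` command above fails if the `~/.matplotlib` directory does not exist yet. As a sketch, the same one-line configuration can be written from Python, creating the directory first. The helper name `write_backend_config` is made up for illustration. Like the shell redirect, it overwrites any existing `matplotlibrc` in that directory.

```python
from pathlib import Path


def write_backend_config(config_dir, backend="TkAgg"):
    """Write a one-line matplotlibrc that selects the given backend."""
    config_dir = Path(config_dir).expanduser()
    # Create the directory (and any parents) if it is missing
    config_dir.mkdir(parents=True, exist_ok=True)
    rc_file = config_dir / "matplotlibrc"
    # Overwrites an existing file, just as `>` does in the shell
    rc_file.write_text("backend: {}\n".format(backend))
    return rc_file
```

Calling `write_backend_config("~/.matplotlib")` mirrors the shell command.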
           

3. Common parameter settings

  • love.py
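import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS

# Read the text content
with open('../text/love.txt') as f:
    text = f.read()

# scipy.misc.imread was removed in SciPy 1.2; load the mask image with Pillow instead
mask = np.array(Image.open("../background_image/love.jpg"))

wc = WordCloud(font_path='../font/simhei.ttf',  # font to use
               background_color="black",  # background color
               # max_words=,  # maximum number of words shown (value missing in the source; library default applies)
               mask=mask,  # background image used as the mask
               stopwords=set(STOPWORDS),  # filter with the built-in stop word set
               # max_font_size=,  # largest font size (value missing in the source; library default applies)
               # random_state=,  # seed for random.Random, used to generate random colors (value missing in the source)
               # width=1000,
               # height=860,
               )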
wordcloud = wc.generate(text)
image = wordcloud.to_image()
image.show()
wordcloud.to_file("../word_image/love.png")
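
The comment on `random_state` in the listing above says the value seeds a `random.Random` instance used for random colors. The sketch below shows, with the standard library only, why fixing a seed makes the output repeatable. The `random_hues` helper and the 0 to 255 hue range are illustrative assumptions, not wordcloud's actual implementation.

```python
import random


def random_hues(seed, count=5):
    """Draw `count` pseudo-random hue values from a generator seeded with `seed`."""
    # A separate Random instance leaves the global random state untouched
    rng = random.Random(seed)
    return [rng.randint(0, 255) for _ in range(count)]


# The same seed always reproduces the same sequence of hues
print(random_hues(42) == random_hues(42))  # prints True
# With seed None the generator is seeded unpredictably, so runs usually differ
print(random_hues(None))
```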
           

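For a description of all parameters, see the WordCloud class documentation in the GitHub repository linked above.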

4. Chinese word segmentation

  • talk.py
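import jieba
import numpy as np
from PIL import Image
from wordcloud import WordCloud


# Load a custom list of Chinese stop words, one per line
def stop_words():
    with open('../text/stopwords.txt', 'r', encoding='utf-8') as f:
        # Skip blank lines instead of stopping at the first one
        words = [line.strip() for line in f if line.strip()]
    return [' '] + words


# Read the text (assumed to be UTF-8, like stopwords.txt)
with open('../text/talk.txt', encoding='utf-8') as f:
    text = f.read()
word_generator = jieba.cut(text)  # segment with jieba; returns a generator
text = ' '.join(word_generator)

# scipy.misc.imread was removed in SciPy 1.2; load the mask image with Pillow instead
mask = np.array(Image.open("../background_image/love.jpg"))

wc = WordCloud(font_path='../font/simhei.ttf',  # font to use
               background_color="black",  # background color
               # max_words=,  # maximum number of words shown (value missing in the source; library default applies)
               mask=mask,  # background image used as the mask
               stopwords=set(stop_words()),  # filter with the custom stop word list
               # max_font_size=,  # largest font size (value missing in the source; library default applies)
               # random_state=,  # seed for random.Random, used to generate random colors (value missing in the source)
               # width=1000,
               # height=860,
               )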


wc_text = wc.generate(text)
image = wc_text.to_image()
image.show()
wc_text.to_file("../word_image/talk.png")
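
After segmentation, the word cloud is driven by word frequencies with the stop words removed. The following sketch shows that counting step using only the standard library. It assumes the text has already been segmented into space-separated tokens, as the `' '.join(...)` step above produces. The function name `count_words` and the sample string are made up for illustration.

```python
from collections import Counter


def count_words(segmented_text, stop_words):
    """Count space-separated tokens, ignoring any that appear in stop_words."""
    stop_set = set(stop_words)
    # split() with no arguments also discards empty tokens
    tokens = segmented_text.split()
    return Counter(token for token in tokens if token not in stop_set)


# Hypothetical sample: text already segmented, tokens joined by spaces
sample = "我 喜歡 詞雲 我 喜歡 python"
frequencies = count_words(sample, stop_words=["我"])
print(frequencies.most_common(2))
```

A mapping like this can be passed to `WordCloud.generate_from_frequencies` instead of raw text when you want full control over the counts.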
           