So Many Statistical Charts? This Visualization Tool Is Great

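I have recently been collecting ways to draw statistical charts, and found that besides the classic Seaborn library, Python has several excellent interactive third-party libraries that can draw the common statistical charts and also offer interactivity that Matplotlib and Seaborn do not provide.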

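They can also produce publication-quality figures. In addition, some charts that require custom functions in Matplotlib are built into these libraries, which greatly shortens plotting time.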

Today I will introduce one such library in detail, HoloViews. The main contents are:

Introduction to the Python HoloViews library
Examples with the Python HoloViews library

Introduction to the Python HoloViews library

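HoloViews is an open-source Python visualization library that aims to connect data analysis results seamlessly with visualization. Its default plot themes and color schemes, together with the small amount of plotting code it needs, let you focus on the data analysis itself, and its statistical plotting features are also very good. For more about HoloViews, see the Python-HoloViews official website [1].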

Examples with the Python HoloViews library

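This section focuses on statistical charts. The results can be explored interactively in a web page, and the default output also meets publication-quality requirements. The main examples are as follows (all of the charts below are interactive):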

「Density plot + box plot」

import pandas as pd
import holoviews as hv
from bokeh.sampledata import autompg

hv.extension('bokeh')
df = autompg.autompg_clean
bw = hv.BoxWhisker(df, kdims=["origin"], vdims=["mpg"])
dist = hv.NdOverlay(
    {origin: hv.Distribution(group, kdims=["mpg"]) 
         for origin, group in df.groupby("origin")}
)

bw + dist

Density plot + box plot

「Scatter plot + spike plot」

scatter = hv.Scatter(df, kdims=["origin"], vdims=["mpg"]).opts(jitter=0.3)

yticks = [(i + 0.25, origin) for i, origin in enumerate(df["origin"].unique())]
spikes = hv.NdOverlay(
    {
        origin: hv.Spikes(group["mpg"]).opts(position=i)
            for i, (origin, group) in enumerate(df.groupby("origin", sort=False))
    }
).opts(hv.opts.Spikes(spike_length=0.5, yticks=yticks, show_legend=False, alpha=0.3))

scatter + spikes

Scatter plot + spike plot

「Iris Splom」

from bokeh.sampledata.iris import flowers
from holoviews import opts
from holoviews.operation import gridmatrix

ds = hv.Dataset(flowers)

grouped_by_species = ds.groupby('species', container_type=hv.NdOverlay)
grid = gridmatrix(grouped_by_species, diagonal_type=hv.Scatter)
grid.opts(opts.Scatter(tools=['hover', 'box_select'], bgcolor='#efe8e2', fill_alpha=0.2, size=4))

Iris Splom

「Area chart」

import numpy as np
from holoviews import opts

# create some example data
python = np.array([2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111])
pypy = np.array([12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130])
jython = np.array([22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160])

dims = dict(kdims='time', vdims='memory')
python = hv.Area(python, label='python', **dims)
pypy   = hv.Area(pypy,   label='pypy',   **dims)
jython = hv.Area(jython, label='jython', **dims)

opts.defaults(opts.Area(fill_alpha=0.5))
overlay = (python * pypy * jython)
overlay.relabel("Area Chart") + hv.Area.stack(overlay).relabel("Stacked Area Chart")

Area chart
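To see what the stacked version of the area chart represents, here is a NumPy-only sketch. It assumes the usual meaning of a stacked area chart, where each layer's baseline is the running total of the layers beneath it; it reproduces the arithmetic only and does not call `hv.Area.stack` itself.

```python
import numpy as np

# Same example series as in the area chart code.
python = np.array([2, 3, 7, 5, 26, 221, 44, 233, 254, 265, 266, 267, 120, 111])
pypy = np.array([12, 33, 47, 15, 126, 121, 144, 233, 254, 225, 226, 267, 110, 130])
jython = np.array([22, 43, 10, 25, 26, 101, 114, 203, 194, 215, 201, 227, 139, 160])

layers = np.vstack([python, pypy, jython])
tops = np.cumsum(layers, axis=0)   # upper edge of each stacked band
baselines = tops - layers          # lower edge of each stacked band

for name, low, high in zip(["python", "pypy", "jython"], baselines, tops):
    print(f"{name:>6}: band at the first time step spans {low[0]} to {high[0]}")
```

The top edge of the last band equals the sum of all three series, which is why a stacked area chart is read as a total with its breakdown.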

「Histogram series」

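import numpy as np
import scipy.special

def get_overlay(hist, x, pdf, cdf, label):
    # Overlay the normalized histogram with its analytic PDF and CDF curves
    pdf = hv.Curve((x, pdf), label='PDF')
    cdf = hv.Curve((x, cdf), label='CDF')
    return (hv.Histogram(hist, vdims='P(r)') * pdf * cdf).relabel(label)

# Silence warnings from dividing by zero or taking log(0) at x = 0
np.seterr(divide='ignore', invalid='ignore')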

label = "Normal Distribution (μ=0, σ=0.5)"
mu, sigma = 0, 0.5

measured = np.random.normal(mu, sigma, 1000)
hist = np.histogram(measured, density=True, bins=50)

x = np.linspace(-2, 2, 1000)
pdf = 1/(sigma * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((x-mu)/np.sqrt(2*sigma**2)))/2
norm = get_overlay(hist, x, pdf, cdf, label)


label = "Log Normal Distribution (μ=0, σ=0.5)"
mu, sigma = 0, 0.5

measured = np.random.lognormal(mu, sigma, 1000)
hist = np.histogram(measured, density=True, bins=50)

x = np.linspace(0, 8.0, 1000)
pdf = 1/(x* sigma * np.sqrt(2*np.pi)) * np.exp(-(np.log(x)-mu)**2 / (2*sigma**2))
cdf = (1+scipy.special.erf((np.log(x)-mu)/(np.sqrt(2)*sigma)))/2
lognorm = get_overlay(hist, x, pdf, cdf, label)


label = "Gamma Distribution (k=1, θ=2)"
k, theta = 1.0, 2.0

measured = np.random.gamma(k, theta, 1000)
hist = np.histogram(measured, density=True, bins=50)

x = np.linspace(0, 20.0, 1000)
pdf = x**(k-1) * np.exp(-x/theta) / (theta**k * scipy.special.gamma(k))
cdf = scipy.special.gammainc(k, x/theta)  # gammainc is already regularized, so no division by gamma(k)
gamma = get_overlay(hist, x, pdf, cdf, label)


label = "Beta Distribution (α=2, β=2)"
alpha, beta = 2.0, 2.0

measured = np.random.beta(alpha, beta, 1000)
hist = np.histogram(measured, density=True, bins=50)

x = np.linspace(0, 1, 1000)
pdf = x**(alpha-1) * (1-x)**(beta-1) / scipy.special.beta(alpha, beta)
cdf = scipy.special.betainc(alpha, beta, x)  # regularized incomplete beta; equivalent to btdtr, which newer SciPy releases deprecate
beta = get_overlay(hist, x, pdf, cdf, label)


label = "Weibull Distribution (λ=1, k=1.25)"
lam, k = 1, 1.25

measured = lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k)
hist = np.histogram(measured, density=True, bins=50)

x = np.linspace(0, 8, 1000)
pdf = (k/lam)*(x/lam)**(k-1) * np.exp(-(x/lam)**k)
cdf = 1 - np.exp(-(x/lam)**k)
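weibull = get_overlay(hist, x, pdf, cdf, label)

# Arrange the five overlays in a two-column layout with independent axes
(norm + lognorm + gamma + beta + weibull).opts(shared_axes=False).cols(2)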

Histogram series
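The Weibull sample in the histogram code is drawn with `lam*(-np.log(np.random.uniform(0, 1, 1000)))**(1/k)`, which is inverse-transform sampling: inverting the CDF F(x) = 1 - exp(-(x/λ)^k) turns a uniform draw into a Weibull draw. The following NumPy-only sketch checks that idea by comparing the empirical CDF of such samples with the analytic one. The helper names `sample_weibull` and `weibull_cdf`, the seed, and the sample size are illustrative choices.

```python
import numpy as np

def sample_weibull(lam, k, size, rng):
    """Draw Weibull(lam, k) samples by inverse-transform sampling."""
    u = rng.uniform(0.0, 1.0, size)
    # rng.uniform samples from [0, 1), so 1 - u is never 0 and log is finite
    return lam * (-np.log(1.0 - u)) ** (1.0 / k)

def weibull_cdf(x, lam, k):
    """Analytic Weibull CDF, the same formula used in the histogram code."""
    return 1.0 - np.exp(-(x / lam) ** k)

rng = np.random.default_rng(0)
lam, k = 1.0, 1.25

samples = np.sort(sample_weibull(lam, k, 100_000, rng))
empirical_cdf = np.arange(1, samples.size + 1) / samples.size

# Largest vertical distance between the empirical and the analytic CDF
max_gap = np.max(np.abs(empirical_cdf - weibull_cdf(samples, lam, k)))
print(f"largest gap between empirical and analytic CDF: {max_gap:.4f}")
```

A small gap indicates that the transformed uniform draws follow the intended distribution, which is what the histogram and the CDF curve in the Weibull panel show visually.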

「Route Chord」

import holoviews as hv
from holoviews import opts, dim
from bokeh.sampledata.airport_routes import routes, airports

hv.extension('bokeh')

# Count the routes between Airports
route_counts = routes.groupby(['SourceID', 'DestinationID']).Stops.count().reset_index()
nodes = hv.Dataset(airports, 'AirportID', 'City')
chord = hv.Chord((route_counts, nodes), ['SourceID', 'DestinationID'], ['Stops'])

# Select the 20 busiest airports
busiest = list(routes.groupby('SourceID').count().sort_values('Stops').iloc[-20:].index.values)
busiest_airports = chord.select(AirportID=busiest, selection_mode='nodes')
busiest_airports.opts(
    opts.Chord(cmap='Category20', edge_color=dim('SourceID').str(), 
               height=800, labels='City', node_color=dim('AirportID').str(), width=800))

Route Chord

「Violin plot」

import holoviews as hv
from holoviews import dim

from bokeh.sampledata.autompg import autompg
hv.extension('bokeh')

violin = hv.Violin(autompg, ('yr', 'Year'), ('mpg', 'Miles per Gallon')).redim.range(mpg=(8, 45))
violin.opts(height=500, width=900, violin_fill_color=dim('Year').str(), cmap='Set1')

Violin plot

More examples are available in the Python-HoloViews gallery [2].

Summary

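Today's post introduced the Python visualization library HoloViews, with a focus on its statistical charts. This library also appears in the materials I am compiling, and it is quite friendly for common charts that are hard to draw with Matplotlib. If you are interested, give it a try.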

References

[1] Python-HoloViews official website: https://holoviews.org/

[2] Python-HoloViews gallery: https://holoviews.org/gallery/index.html
