
Bayesian Regression

  • Bayesian Ridge Regression
  • Automatic Relevance Determination - ARD

Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set by hand but is tuned to the data at hand.

This can be done by introducing uninformative priors over the hyperparameters of the model. The \ell_{2} regularization used in Ridge Regression is equivalent to finding a maximum a posteriori estimate under a Gaussian prior over the parameters w with precision \lambda^{-1}. Instead of setting lambda manually, we treat it as a random variable to be estimated from the data.
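For reference, the corresponding \ell_{2}-penalized objective can be written as:

\min_{w} \|Xw - y\|_{2}^{2} + \lambda \|w\|_{2}^{2}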

To obtain a fully probabilistic model, the output y is assumed to be Gaussian distributed around X w.
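Written out, with \alpha interpreted as the precision (inverse variance) of the noise, this is a sketch of the likelihood:

p(y|X,w,\alpha) = \mathcal{N}(y|Xw,\alpha^{-1})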

Alpha is again treated as a random variable that is to be estimated from the data.

The advantages of Bayesian Regression are:

It adapts to the data at hand.

It can be used to include regularization parameters in the estimation procedure.

The disadvantages of Bayesian Regression include:

Inference of the model can be time consuming.

Bayesian Ridge Regression

BayesianRidge estimates a probabilistic model of the regression problem described above. The prior for the parameter w is given by a spherical Gaussian:

p(w|\lambda) = \mathcal{N}(w|0,\lambda^{-1}\mathbf{I}_{p})

The priors over \alpha and \lambda are chosen to be gamma distributions, the conjugate prior for the precision of the Gaussian.

The resulting model is called Bayesian Ridge Regression and is similar to the classical Ridge. The parameters w, \alpha and \lambda are estimated jointly during the fit of the model. The remaining hyperparameters are the parameters of the gamma priors over \alpha and \lambda. These are usually chosen to be non-informative. The parameters are estimated by maximizing the marginal log likelihood.
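Sketched explicitly (assuming the shape/rate parameterization that corresponds to the alpha_1/alpha_2 and lambda_1/lambda_2 arguments of BayesianRidge):

p(\alpha) = \mathrm{Gamma}(\alpha \mid \alpha_1, \alpha_2), \qquad p(\lambda) = \mathrm{Gamma}(\lambda \mid \lambda_1, \lambda_2)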

By default \alpha_1 = \alpha_2 = \lambda_1 = \lambda_2 = 10^{-6}.
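If stronger (more informative) priors are desired, these hyperparameters can be passed directly to the estimator. A minimal sketch, with arbitrary illustrative values:

from sklearn.linear_model import BayesianRidge

# Hypothetical values, for illustration only: larger shape/rate parameters make
# the gamma priors on alpha and lambda more informative than the near-flat
# default of 1e-6.
reg = BayesianRidge(alpha_1=1e-3, alpha_2=1e-3, lambda_1=1e-3, lambda_2=1e-3)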

[Figure: ../_images/sphx_glr_plot_bayesian_ridge_0011.png]

Bayesian Ridge Regression is used for regression as follows:

>>> from sklearn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
>>> Y = [0., 1., 2., 3.]
>>> reg = linear_model.BayesianRidge()
>>> reg.fit(X, Y)
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, tol=0.001, verbose=False)

After being fitted, the model can be used to predict new values:

>>> reg.predict([[1, 0.]])
array([ 0.50000013])

The weights w of the model can be accessed as follows:

>>> reg.coef_
array([ 0.49999993, 0.49999993])

Due to the Bayesian framework, the weights found differ slightly from those found by Ordinary Least Squares. However, Bayesian Ridge Regression is more robust to ill-posed problems.
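A minimal sketch of this robustness claim, on hypothetical data with more features than samples (so that plain OLS is under-determined):

import numpy as np
from sklearn.linear_model import BayesianRidge, LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(10, 20)                       # 10 samples, 20 features: ill-posed for OLS
w_true = np.zeros(20)
w_true[:3] = [1., -2., 0.5]                 # only three features actually matter
y = X.dot(w_true) + 0.01 * rng.randn(10)

ols = LinearRegression().fit(X, y)
bayes = BayesianRidge().fit(X, y)

# The Bayesian weights are shrunk toward zero; their norm is typically much
# smaller than that of the unregularized OLS solution.
print(np.linalg.norm(ols.coef_), np.linalg.norm(bayes.coef_))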

Examples:

Bayesian Ridge Regression

Computes a Bayesian Ridge Regression on a synthetic dataset.

See Bayesian Ridge Regression for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zeros, which stabilises them.

As the prior on the weights is a Gaussian prior, the histogram of the estimated weights is Gaussian.

The estimation of the model is done by iteratively maximizing the marginal log-likelihood of the observations.

We also plot predictions and uncertainties for Bayesian Ridge Regression for one dimensional regression using polynomial feature expansion. Note the uncertainty starts going up on the right side of the plot. This is because these test samples are outside of the range of the training samples.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import BayesianRidge, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights
np.random.seed(0)
n_samples, n_features = 100, 100
X = np.random.randn(n_samples, n_features)  # Create Gaussian data
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the Bayesian Ridge Regression and an OLS for comparison
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot true weights, estimated weights, histogram of the weights, and
# predictions with standard deviations
lw = 2
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='lightgreen', linewidth=lw,
         label="Bayesian Ridge estimate")
plt.plot(w, color='gold', linewidth=lw, label="Ground truth")
plt.plot(ols.coef_, color='navy', linestyle='--', label="OLS estimate")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc="best", prop=dict(size=12))

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='gold', log=True,
         edgecolor='black')
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='navy', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc="upper left")

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=lw)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=0.1)
clf_poly = BayesianRidge()
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial Bayesian Ridge Regression", linewidth=lw)
plt.plot(X_plot, y_plot, color='gold', linewidth=lw,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()      

Automatic Relevance Determination - ARD

ARDRegression (Automatic Relevance Determination) is very similar to Bayesian Ridge Regression, but leads to sparser weights w [1] [2].

ARDRegression poses a different prior over w: it drops the assumption that the Gaussian is spherical.

Instead, the distribution over w is assumed to be an axis-parallel, elliptical Gaussian distribution.

That is, each weight w_{i} is drawn from a Gaussian distribution centered at zero with precision \lambda_{i}:

p(w|\lambda) = \mathcal{N}(w|0,A^{-1})

with \mathrm{diag}(A) = \lambda = \{\lambda_{1},...,\lambda_{p}\}.

In contrast to Bayesian Ridge Regression, each coordinate w_{i} has its own standard deviation \lambda_i. The prior over all \lambda_i is the same gamma distribution, determined by the hyperparameters \lambda_1 and \lambda_2.

[Figure: ../_images/sphx_glr_plot_ard_0011.png]

ARD is also known in the literature as Sparse Bayesian Learning and Relevance Vector Machine [3] [4].
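A minimal usage sketch, reusing the small toy problem from the Bayesian Ridge doctest above (coef_ and lambda_ are attributes of the fitted ARDRegression estimator):

from sklearn.linear_model import ARDRegression

X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
y = [0., 1., 2., 3.]
reg = ARDRegression()
reg.fit(X, y)

print(reg.coef_)      # estimated weights; irrelevant features are driven toward zero
print(reg.lambda_)    # one estimated precision per weight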

Examples:

Automatic Relevance Determination Regression (ARD)

Fit regression model with Bayesian Ridge Regression.

See Bayesian Ridge Regression for more information on the regressor.

Compared to the OLS (ordinary least squares) estimator, the coefficient weights are slightly shifted toward zeros, which stabilises them.

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

from sklearn.linear_model import ARDRegression, LinearRegression

# #############################################################################
# Generating simulated data with Gaussian weights

# Parameters of the example
np.random.seed(0)
n_samples, n_features = 100, 100
# Create Gaussian data
X = np.random.randn(n_samples, n_features)
# Create weights with a precision lambda_ of 4.
lambda_ = 4.
w = np.zeros(n_features)
# Only keep 10 weights of interest
relevant_features = np.random.randint(0, n_features, 10)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(lambda_))
# Create noise with a precision alpha of 50.
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
# Create the target
y = np.dot(X, w) + noise

# #############################################################################
# Fit the ARD Regression
clf = ARDRegression(compute_score=True)
clf.fit(X, y)

ols = LinearRegression()
ols.fit(X, y)

# #############################################################################
# Plot the true weights, the estimated weights, the histogram of the
# weights, and predictions with standard deviations
plt.figure(figsize=(6, 5))
plt.title("Weights of the model")
plt.plot(clf.coef_, color='darkblue', linestyle='-', linewidth=2,
         label="ARD estimate")
plt.plot(ols.coef_, color='yellowgreen', linestyle=':', linewidth=2,
         label="OLS estimate")
plt.plot(w, color='orange', linestyle='-', linewidth=2, label="Ground truth")
plt.xlabel("Features")
plt.ylabel("Values of the weights")
plt.legend(loc=1)

plt.figure(figsize=(6, 5))
plt.title("Histogram of the weights")
plt.hist(clf.coef_, bins=n_features, color='navy', log=True)
plt.scatter(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
            color='gold', marker='o', label="Relevant features")
plt.ylabel("Features")
plt.xlabel("Values of the weights")
plt.legend(loc=1)

plt.figure(figsize=(6, 5))
plt.title("Marginal log-likelihood")
plt.plot(clf.scores_, color='navy', linewidth=2)
plt.ylabel("Score")
plt.xlabel("Iterations")


# Plotting some predictions for polynomial regression
def f(x, noise_amount):
    y = np.sqrt(x) * np.sin(x)
    noise = np.random.normal(0, 1, len(x))
    return y + noise_amount * noise


degree = 10
X = np.linspace(0, 10, 100)
y = f(X, noise_amount=1)
clf_poly = ARDRegression(threshold_lambda=1e5)
clf_poly.fit(np.vander(X, degree), y)

X_plot = np.linspace(0, 11, 25)
y_plot = f(X_plot, noise_amount=0)
y_mean, y_std = clf_poly.predict(np.vander(X_plot, degree), return_std=True)
plt.figure(figsize=(6, 5))
plt.errorbar(X_plot, y_mean, y_std, color='navy',
             label="Polynomial ARD", linewidth=2)
plt.plot(X_plot, y_plot, color='gold', linewidth=2,
         label="Ground Truth")
plt.ylabel("Output y")
plt.xlabel("Feature X")
plt.legend(loc="lower left")
plt.show()