The likelihood function, as described by Wikipedia:
<a href="https://en.wikipedia.org/wiki/likelihood_function">https://en.wikipedia.org/wiki/likelihood_function</a>
plays one of the key roles in statistical inference, especially in methods that estimate parameters from observed data. In this article, we will make full use of it.
Pattern recognition works by learning the posterior probability $p(y\mid x)$ of a pattern $x$ belonging to class $y$. Given a pattern $x$, when the posterior probability of one of the classes $y$ attains the maximum, we can assign $x$ to that class, i.e.

$$\hat{y}=\mathop{\arg\max}_{y=1,\dots,c}p(y\mid x)$$

The posterior probability can be seen as the confidence that pattern $x$ belongs to class $y$.
In the logistic regression algorithm, we use a log-linear model to express the posterior probability:

$$q(y\mid x,\theta)=\frac{\exp\left(\sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x)\right)}{\sum_{y'=1}^{c}\exp\left(\sum_{j=1}^{b}\theta_j^{(y')}\phi_j(x)\right)}$$
Note that the denominator is a normalization term, which makes the probabilities sum to one over the classes. Logistic regression is then defined by the following optimization problem:

$$\max_{\theta}\sum_{i=1}^{m}\log q(y_i\mid x_i,\theta)$$
We can solve it by stochastic gradient ascent:
1. Initialize $\theta$.
2. Pick a training sample $(x_i,y_i)$ at random.
3. Update $\theta=(\theta^{(1)T},\dots,\theta^{(c)T})^T$ along the gradient-ascent direction:

$$\theta^{(y)}\leftarrow\theta^{(y)}+\epsilon\nabla_y J_i(\theta),\quad y=1,\dots,c$$

where

$$\nabla_y J_i(\theta)=-\frac{\exp\left(\theta^{(y)T}\phi(x_i)\right)\phi(x_i)}{\sum_{y'=1}^{c}\exp\left(\theta^{(y')T}\phi(x_i)\right)}+\begin{cases}\phi(x_i)&(y=y_i)\\0&(y\neq y_i)\end{cases}$$

4. Repeat steps 2 and 3 until $\theta$ reaches the desired precision.
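As a sanity check on the update rule, the per-sample gradient $\nabla_y J_i(\theta)$ can be compared against a numerical derivative of $\log q(y_i\mid x_i,\theta)$. A small Python sketch (the dimensions and the random sample below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

# check the per-sample gradient formula against a numerical derivative
# (dimensions b, c and the random sample are arbitrary illustrations)
b, c = 4, 3
phi = rng.normal(size=b)          # phi(x_i)
yi = 1                            # true label of the sample
theta = rng.normal(size=(c, b))

def log_q(t):
    s = t @ phi
    return s[yi] - np.log(np.sum(np.exp(s)))   # log q(y_i | x_i, theta)

# analytic gradient from the update rule
s = theta @ phi
post = np.exp(s) / np.sum(np.exp(s))
grad = -np.outer(post, phi)       # -exp(theta^(y)T phi) phi / sum_y' exp(theta^(y')T phi)
grad[yi] += phi                   # + phi(x_i) when y = y_i

# numerical gradient via central differences
num = np.zeros_like(theta)
eps = 1e-6
for j in range(c):
    for k in range(b):
        tp = theta.copy(); tp[j, k] += eps
        tm = theta.copy(); tm[j, k] -= eps
        num[j, k] = (log_q(tp) - log_q(tm)) / (2 * eps)

max_err = np.abs(grad - num).max()
```

The two gradients agree to numerical precision, confirming the formula above.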
Take the Gaussian kernel model as an example:

$$q(y\mid x,\theta)\propto\exp\left(\sum_{j=1}^{n}\theta_j K(x,x_j)\right)$$
Not familiar with the Gaussian kernel model? Refer to this article:
<a href="http://blog.csdn.net/philthinker/article/details/65628280">http://blog.csdn.net/philthinker/article/details/65628280</a>
Here is a sketch of the corresponding code:
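A minimal Python sketch of the Gaussian kernel logistic model trained with the stochastic updates above (the toy 1-D data, kernel width `h`, step size and iteration count are illustrative choices, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy 1-D data, three classes (sizes and kernel width h are illustrative)
n, c, h = 90, 3, 1.0
x = np.concatenate([rng.normal(m, 0.5, n // c) for m in (-3.0, 0.0, 3.0)])
y = np.repeat(np.arange(c), n // c)

# Gaussian kernel basis: phi_j(x) = K(x, x_j) = exp(-(x - x_j)^2 / (2 h^2))
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * h ** 2))  # n x n design

theta = np.zeros((c, n))   # theta^(y) has one weight per training point
eps = 0.05                 # step size

for _ in range(3000):
    i = rng.integers(n)                     # pick a sample at random
    phi = K[i]                              # phi(x_i) = (K(x_i, x_1), ..., K(x_i, x_n))
    s = theta @ phi
    post = np.exp(s - s.max())
    post /= post.sum()                      # q(y | x_i, theta), softmax
    grad = -np.outer(post, phi)             # -q(y | x_i) phi(x_i) for every class
    grad[y[i]] += phi                       # + phi(x_i) for the true class
    theta += eps * grad                     # gradient-ascent update

pred = np.argmax(K @ theta.T, axis=1)
accuracy = (pred == y).mean()
```

On this well-separated toy set the training accuracy approaches 1; in practice the kernel width and step size need tuning.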
In least squares (LS) probability classifiers, a linearly parameterized model is used to express the posterior probability:

$$q(y\mid x,\theta^{(y)})=\sum_{j=1}^{b}\theta_j^{(y)}\phi_j(x)=\theta^{(y)T}\phi(x),\quad y=1,\dots,c$$
These models depend on class-wise parameters $\theta^{(y)}=(\theta_1^{(y)},\dots,\theta_b^{(y)})^T$ for each class $y$, which differ from the parameters used by logistic classifiers. Learning these models means minimizing the following squared error:

$$\begin{aligned}J_y(\theta^{(y)})&=\frac{1}{2}\int\left(q(y\mid x,\theta^{(y)})-p(y\mid x)\right)^2p(x)\,dx\\&=\frac{1}{2}\int q(y\mid x,\theta^{(y)})^2p(x)\,dx-\int q(y\mid x,\theta^{(y)})p(y\mid x)p(x)\,dx+\frac{1}{2}\int p(y\mid x)^2p(x)\,dx\end{aligned}$$
where $p(x)$ represents the probability density of the training samples $\{x_i\}_{i=1}^{n}$.
By Bayes' formula,

$$p(y\mid x)p(x)=p(x,y)=p(x\mid y)p(y)$$
Hence $J_y$ can be reformulated as

$$J_y(\theta^{(y)})=\frac{1}{2}\int q(y\mid x,\theta^{(y)})^2p(x)\,dx-\int q(y\mid x,\theta^{(y)})p(x\mid y)p(y)\,dx+\frac{1}{2}\int p(y\mid x)^2p(x)\,dx$$
Note that the first and second terms in the equation above are expectations with respect to $p(x)$ and $p(x\mid y)$ respectively, which are often impossible to compute directly. The last term is independent of $\theta$ and can therefore be omitted.
Since $p(x\mid y)$ is the probability density of samples $x$ belonging to class $y$, we can estimate the first and second terms by the following sample averages, where $n_y$ denotes the number of training samples of class $y$:

$$\frac{1}{n}\sum_{i=1}^{n}q(y\mid x_i,\theta^{(y)})^2,\qquad\frac{p(y)}{n_y}\sum_{i:y_i=y}q(y\mid x_i,\theta^{(y)})$$
Next, approximating the class prior by $p(y)\approx n_y/n$ and introducing a regularization term, we obtain the following training criterion:

$$\hat{J}_y(\theta^{(y)})=\frac{1}{2n}\sum_{i=1}^{n}q(y\mid x_i,\theta^{(y)})^2-\frac{1}{n}\sum_{i:y_i=y}q(y\mid x_i,\theta^{(y)})+\frac{\lambda}{2n}\|\theta^{(y)}\|^2$$
Let $\pi^{(y)}=(\pi_1^{(y)},\dots,\pi_n^{(y)})^T$ with $\pi_i^{(y)}=\begin{cases}1&(y_i=y)\\0&(y_i\neq y)\end{cases}$, then
$$\hat{J}_y(\theta^{(y)})=\frac{1}{2n}\theta^{(y)T}\Phi^T\Phi\theta^{(y)}-\frac{1}{n}\theta^{(y)T}\Phi^T\pi^{(y)}+\frac{\lambda}{2n}\|\theta^{(y)}\|^2$$

where $\Phi$ is the $n\times b$ design matrix with entries $\Phi_{ij}=\phi_j(x_i)$.
This is evidently a convex optimization problem, and we can obtain the analytic solution by setting the derivative with respect to $\theta^{(y)}$ to zero:
$$\hat{\theta}^{(y)}=(\Phi^T\Phi+\lambda I)^{-1}\Phi^T\pi^{(y)}$$
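A quick numerical check that this closed form indeed minimizes the regularized criterion: the gradient of $\hat{J}_y$ should vanish at $\hat{\theta}^{(y)}$. The sizes, random design matrix and $\lambda$ below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(2)

# verify that (Phi^T Phi + lam I)^{-1} Phi^T pi zeroes the gradient of J_hat
# (n, b, lam and the random design are illustrative)
n, b, lam = 50, 5, 0.1
Phi = rng.normal(size=(n, b))
pi = (rng.random(n) < 0.3).astype(float)   # class-indicator vector pi^(y)

theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(b), Phi.T @ pi)

# gradient of J_hat = (1/2n) theta^T Phi^T Phi theta - (1/n) theta^T Phi^T pi + (lam/2n)||theta||^2
grad = (Phi.T @ (Phi @ theta) - Phi.T @ pi + lam * theta) / n
grad_norm = np.abs(grad).max()
```

The gradient norm is zero up to floating-point error, as expected for the minimizer of a convex quadratic.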
To avoid negative estimates of the posterior probability, we clip negative outputs at zero and renormalize:

$$\hat{p}(y\mid x)=\frac{\max(0,\hat{\theta}^{(y)T}\phi(x))}{\sum_{y'=1}^{c}\max(0,\hat{\theta}^{(y')T}\phi(x))}$$
We again take the Gaussian kernel model as an example:
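A Python sketch of the least squares probability classifier with Gaussian kernel basis functions $\phi_j(x)=K(x,x_j)$ (the toy data, kernel width `h` and regularization $\lambda$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# least squares probability classifier with Gaussian kernel basis
# (toy data, kernel width h and regularization lam are illustrative)
n, c, h, lam = 90, 3, 1.0, 0.1
x = np.concatenate([rng.normal(m, 0.5, n // c) for m in (-3.0, 0.0, 3.0)])
y = np.repeat(np.arange(c), n // c)

# design matrix: Phi_ij = phi_j(x_i) = K(x_i, x_j)
Phi = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * h ** 2))

# analytic solution theta^(y) = (Phi^T Phi + lam I)^{-1} Phi^T pi^(y), per class
A = Phi.T @ Phi + lam * np.eye(n)
Theta = np.column_stack([np.linalg.solve(A, Phi.T @ (y == k).astype(float))
                         for k in range(c)])

# posterior estimate: clip negative outputs at zero, then normalize over classes
raw = np.maximum(0.0, Phi @ Theta)    # max(0, theta^(y)T phi(x)) per class
post = raw / np.maximum(raw.sum(axis=1, keepdims=True), 1e-12)

pred = post.argmax(axis=1)
accuracy = (pred == y).mean()
```

Unlike the logistic model, no iterative optimization is needed: each class is solved by one linear system.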
Logistic regression is good at dealing with small sample sets since it works in a simple way. However, when the number of samples grows fairly large, it is better to turn to the least squares probability classifier, whose solution is obtained analytically.