Higher dimensionality generally makes learning tasks more difficult. Here we introduce a supervised dimension reduction method built on the linear dimension reduction framework introduced in
<a href="http://blog.csdn.net/philthinker/article/details/70212147">http://blog.csdn.net/philthinker/article/details/70212147</a>
which maps each sample $x$ to a lower-dimensional representation $z$ by a linear transform:
$$ z = Tx, \quad x \in \mathbb{R}^{d}, \; z \in \mathbb{R}^{m}, \; m < d $$
Of course, centering the samples first is necessary:
$$ x_i \leftarrow x_i - \frac{1}{n} \sum_{i'=1}^{n} x_{i'} $$
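As a minimal sketch (the matrix `X`, holding one sample per row, is an illustrative assumption, not notation from the text), centering is a one-liner in NumPy:

```python
import numpy as np

def center(X):
    """Subtract the overall sample mean from every row of X (shape (n, d))."""
    return X - X.mean(axis=0)
```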
Fisher discriminant analysis is one of the most basic supervised linear dimension reduction methods: we seek a $T$ that brings samples with the same label as close together as possible while pushing samples with different labels as far apart as possible. To begin with, define the within-class scatter matrix $S^{(w)}$ and the between-class scatter matrix $S^{(b)}$ as:
$$ S^{(w)} = \sum_{y=1}^{c} \sum_{i: y_i = y} (x_i - \mu_y)(x_i - \mu_y)^{T} \in \mathbb{R}^{d \times d}, \qquad S^{(b)} = \sum_{y=1}^{c} n_y \mu_y \mu_y^{T} \in \mathbb{R}^{d \times d} $$
where
$$ \mu_y = \frac{1}{n_y} \sum_{i: y_i = y} x_i $$
$\sum_{i: y_i = y}$ denotes the sum over all samples $i$ whose label satisfies $y_i = y$, and $n_y$ is the number of samples belonging to class $y$.
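The two scatter matrices translate directly into code. Here is a possible NumPy sketch (the names `scatter_matrices`, `X`, and `y` are illustrative; `X` is assumed to be the centered `(n, d)` sample matrix and `y` a vector of integer class labels):

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class scatter S^(w) and between-class scatter S^(b)."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for label in np.unique(y):
        X_c = X[y == label]                  # all samples i with y_i = label
        mu = X_c.mean(axis=0)                # class mean mu_y
        diff = X_c - mu
        S_w += diff.T @ diff                 # sum of (x_i - mu_y)(x_i - mu_y)^T
        S_b += len(X_c) * np.outer(mu, mu)   # n_y * mu_y mu_y^T
    return S_w, S_b
```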
The projection matrix $T$ is then defined as the solution of:
$$ \max_{T \in \mathbb{R}^{m \times d}} \operatorname{tr}\!\left( \left( T S^{(w)} T^{T} \right)^{-1} T S^{(b)} T^{T} \right) $$
Intuitively, this objective tries to minimize the projected within-class scatter $T S^{(w)} T^{T}$ while maximizing the projected between-class scatter $T S^{(b)} T^{T}$. The maximizer is characterized by the generalized eigenvalue problem:
$$ S^{(b)} \xi = \lambda S^{(w)} \xi $$
where the generalized eigenvalues are sorted as $\lambda_1 \ge \cdots \ge \lambda_d \ge 0$ and the corresponding eigenvectors are $\xi_1, \cdots, \xi_d$. Taking the eigenvectors of the $m$ largest eigenvalues gives the solution for $T$:
$$ \hat{T} = (\xi_1, \cdots, \xi_m)^{T} $$
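Putting the pieces together, one way to obtain $\hat{T}$ is to solve the generalized eigenvalue problem with `scipy.linalg.eigh`. This is only a sketch: the function name `fisher_projection`, the small ridge term added to $S^{(w)}$ for numerical stability, and the reuse of `scatter_matrices` from above are implementation choices, not part of the original derivation.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_projection(X, y, m):
    """Return the (m, d) Fisher projection matrix T."""
    X = X - X.mean(axis=0)                    # centering, as above
    S_w, S_b = scatter_matrices(X, y)         # scatter matrices, as above
    S_w += 1e-8 * np.eye(X.shape[1])          # tiny ridge so S_w is invertible
    # eigh solves S_b xi = lambda S_w xi; eigenvalues are returned in ascending order
    eigvals, eigvecs = eigh(S_b, S_w)
    top = np.argsort(eigvals)[::-1][:m]       # indices of the m largest eigenvalues
    return eigvecs[:, top].T                  # rows are xi_1, ..., xi_m

# Usage: T = fisher_projection(X, y, m)
#        Z = (X - X.mean(axis=0)) @ T.T       # z = Tx for every centered sample
```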
Note: when the samples within a class form several peaks (i.e., the class distributions are multimodal), the result can be far from ideal. Local Fisher discriminant analysis may still work in that case.