
CalTech machine learning, video 14 note(Support Vector Machine)

14:32 2014-10-09 Thursday

start CalTech machine learning, video 14

Support Vector Machine

14:33 2014-10-09

SVM is arguably the most successful classification method

in machine learning

14:42 2014-10-09

it's a very very neat piece of work for machine learning

14:43 2014-10-09

outline:

* maximizing the margin

* the solution

* nonlinear transforms

14:43 2014-10-09

it will be a constrained optimization method

14:44 2014-10-09

linear separation

14:44 2014-10-09

linearly separable data

14:44 2014-10-09

I can get different lines. Is there an advantage to

choosing one of the lines over another?

14:45 2014-10-09

that's the margin you have

14:46 2014-10-09

margin of error

14:46 2014-10-09

if you look at this line, it does seem to have

a better margin.

14:46 2014-10-09

let me get the biggest possible margin

14:47 2014-10-09

which is the best line for classification?

14:47 2014-10-09

you will choose the fat margin

14:48 2014-10-09

let us ask 2 questions:

* why is a bigger margin better?

* can I solve for the w that maximizes the margin?

14:48 2014-10-09

all possible dichotomies

14:50 2014-10-09

growth function

14:50 2014-10-09

we know the growth function being big 

is bad news for generalization

14:50 2014-10-09

dichotomies with fat margin

14:51 2014-10-09

I'm requiring that the margin be at least something

14:53 2014-10-09

effectively, by requiring the margin to be at least something,

I'm putting a restriction on the growth function

14:54 2014-10-09

fat margins imply fewer dichotomies

14:54 2014-10-09

fat-margin dichotomies have a smaller growth function, a smaller

VC dimension, than if I did not restrict them at all

14:55 2014-10-09

we will find out indeed when you have a bigger margin, 

you'll achieve better out-of-sample performance

14:57 2014-10-09

fat margins are good

14:57 2014-10-09

find the w that not only classifies the points correctly,

but does so with the biggest possible margin.

14:57 2014-10-09

margin is just the distance from the plane to the point

14:58 2014-10-09

hyperplane that separates the points

14:58 2014-10-09

if I give you w & xn, can you get a formula & plug in to

give me the distance between the plane described by w

& the point xn?

15:00 2014-10-09

normalize w

15:01 2014-10-09
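
(the normalization: rescale w so that for the nearest point |w'xn + b| = 1, in the b notation introduced below; the margin then comes out to 1/||w||)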

I have the plane, so on the plane the signal == 0, and

it doesn't touch any points since the points are linearly separable.

now when you get the signal to be positive, you're moving in

one direction; you hit the closest point, then you hit more points,

the interior points so to speak. and when you go in the other direction,

it's negative, and you hit the other points

15:05 2014-10-09

pull out w0

15:07 2014-10-09

the plane is now : 

w'x + b = 0 (no x0)

15:08 2014-10-09

computing the distance

15:08 2014-10-09

this is the geometry, I have a plane, and I have a point,

I want to estimate the distance.

15:09 2014-10-09

the vector w is orthogonal to the plane // normal vector

15:10 2014-10-09

now you can see the good old b dropped out

15:10 2014-10-09

x' - x'' // this must be orthogonal to the vector w

15:11 2014-10-09

this could be any 2 points on the plane, and the conclusion is that: 

w is orthogonal to any vector on the plane

15:12 2014-10-09
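
(both points satisfy the plane equation, w'x + b = 0 at x = x' and at x = x''; subtracting the two gives w'(x' - x'') = 0, which is the orthogonality)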

now w has an explanation // w is always orthogonal to the plane

15:13 2014-10-09

if you project this fellow onto this direction

15:14 2014-10-09

project vector a to vector b

15:14 2014-10-09

unit vector

15:14 2014-10-09
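
to make the resulting distance formula, |w'xn + b| / ||w||, concrete, here is a minimal numpy sketch (my own code, not from the lecture; the function name is made up):

import numpy as np

def distance_to_plane(w, b, xn):
    # distance from the point xn to the hyperplane w'x + b = 0
    w = np.asarray(w, dtype=float)
    xn = np.asarray(xn, dtype=float)
    return abs(np.dot(w, xn) + b) / np.linalg.norm(w)

# e.g. the plane x1 + x2 - 1 = 0 and the point (1, 1): distance is 1/sqrt(2)
print(distance_to_plane([1.0, 1.0], -1.0, [1.0, 1.0]))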

we're going to find an equivalent problem that is

more friendly

15:17 2014-10-09

the optimization problem: maximize ... subject to ...

15:17 2014-10-09
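
spelled out, the friendly equivalent problem is: minimize (1/2) w'w over (w, b), subject to yn(w'xn + b) >= 1 for n = 1, ..., N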

does anybody see quadratic programming coming on 

the horizon?

15:19 2014-10-09

when you solve it, you'll get the separating plane

with the best margin.

15:23 2014-10-09

constrained optimization  // Lagrange multiplier

15:24 2014-10-09

but here the constraints are "inequality constraints"

instead of "equality constraints"

15:25 2014-10-09

inequality constraints => KKT

15:26 2014-10-09

it is good to look at that picture because 

it will put the analysis here in perspective

15:27 2014-10-09

what are we optimizing? and what is the constraint?

15:29 2014-10-09

objective function, constraints

15:30 2014-10-09

pass it to a package of quadratic programming

to get the solution

15:30 2014-10-09

and they become part of the objective function // constraints

15:32 2014-10-09

we're minimizing this with respect to what?

with respect to w & b

15:34 2014-10-09
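
spelled out, the Lagrangian is L(w, b, α) = (1/2) w'w - Σn αn [ yn(w'xn + b) - 1 ], with αn >= 0;

setting the gradient w.r.t. w to zero gives w = Σn αn yn xn, and the derivative w.r.t. b gives Σn αn yn = 0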

classification, segmentation

15:38 2014-10-09

this is a "KKT condition"

15:42 2014-10-09

the solution - quadratic programming

15:43 2014-10-09
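
the dual that goes to the QP package: maximize Σn αn - (1/2) Σn Σm yn ym αn αm xn'xm over α,

subject to αn >= 0 for all n and Σn yn αn = 0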

convex optimization

15:48 2014-10-09

convex function

15:49 2014-10-09

just a word of warning before we go ...

15:49 2014-10-09

it's a practical consideration, but it's an 

important consideration.

15:50 2014-10-09

flirting with danger

15:51 2014-10-09

QP hands us α // QP == Quadratic Programming

15:52 2014-10-09

so the solution is a vector of αs

15:53 2014-10-09

you get the αs, you plug them in, so you get the w

15:53 2014-10-09
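
a minimal sketch of that pipeline (my own code, not from the lecture; it assumes the cvxopt QP solver, linearly separable data, and the made-up name hard_margin_svm): solve the dual for the αs, then plug them in to get w and b:

import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    # X: (N, d) array of points, y: (N,) array of +1/-1 labels
    N = X.shape[0]
    Yx = y[:, None] * X.astype(float)
    P = matrix(Yx @ Yx.T)                        # P[n, m] = yn ym xn'xm
    q = matrix(-np.ones(N))                      # minimize (1/2) α'Pα - 1'α
    G = matrix(-np.eye(N))                       # -αn <= 0, i.e. αn >= 0
    h = matrix(np.zeros(N))
    A = matrix(y.reshape(1, -1).astype(float))   # Σn yn αn = 0
    sol = solvers.qp(P, q, G, h, A, matrix(0.0))
    alpha = np.array(sol['x']).ravel()
    sv = alpha > 1e-5                            # support vectors: αn > 0
    w = ((alpha * y)[:, None] * X).sum(axis=0)   # w = Σn αn yn xn
    b = y[sv][0] - np.dot(w, X[sv][0])           # b from any support vector
    return w, b, alpha

most entries of alpha come back (numerically) zero; the support vectors are X[sv]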

so I would like to tell you a condition which is

very important, and it will be the key to finding the

support vectors: the KKT condition

15:55 2014-10-09

when you look at the vector, you just see a whole bunch

of αs that are 0

15:55 2014-10-09

for all the interior points, you're guaranteed that the 

multiplier is zero

15:59 2014-10-09

we end up with no need for regularization

15:59 2014-10-09

αn > 0    =>    xn is a support vector

16:00 2014-10-09

now we come to the interesting definition: the αs are largely

zeros, for the interior points; the most important points in the game

are the points that actually define the plane & the margin,

and these are the ones for which α is positive

16:03 2014-10-09

these are called support vectors, so I have N points,

I classify them & I get the maximum margin; because

it's a maximum margin, it touches on some of the +1 &

some of the -1 points, those points support the plane

so to speak, and are called the "support vectors", and

the other guys are interior points

16:06 2014-10-09

α greater than zero identifies a support vector

16:07 2014-10-09

where are the support vectors in this picture?

they're the closest ones to the plane

16:08 2014-10-09

all of these guys here & here contribute nothing,

α == 0 in this case

16:08 2014-10-09

and the support vectors achieve the margin exactly

they're the critical points

16:09 2014-10-09

this is our way to get the αs back; this is the currency

we get back from the quadratic programming, and we plug it

in to get the w

16:11 2014-10-09

now I notice that many of the αs are zero, and α is only

positive for support vectors

16:12 2014-10-09

then I can say that I can sum this up over only the support 

vectors(SV)

16:13 2014-10-09
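
i.e. w = Σ over the support vectors of αn yn xn; the zero-α terms contribute nothing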

seems only a minor technicality, but there is a very important

point in here

16:13 2014-10-09

the αs are the parameters of your model; when they're 0,

they don't count.

16:14 2014-10-09

w // weight vector

16:14 2014-10-09

now you realize why there might be a

generalization dividend here, because I end up

with fewer parameters

16:16 2014-10-09

solve for b using any SV(Support Vector)

16:16 2014-10-09
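
for any support vector xn, yn(w'xn + b) = 1 exactly, so b = yn - w'xn (using yn = ±1)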

now you have w & b, and you're ready with the classification

line or hyperplane

16:17 2014-10-09

nonlinear transformation

16:17 2014-10-09

now let me close with the nonlinear transformation,

which will be a very short presentation that has

an enormous impact; so far we're talking about a linear boundary

and we're talking about the linearly separable case

16:19 2014-10-09

instead of working in the x space, we work in the z space

16:19 2014-10-09

z instead of x

16:20 2014-10-09

you don't just get a generic separator, you are getting

the best separator according to SVM

16:23 2014-10-09

I'm getting the separating line or plane in the z space

16:24 2014-10-09
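
a minimal sketch of the same idea (my own code, not from the lecture; it uses scikit-learn's SVC with a large C as a stand-in for the hard-margin QP, and phi is a made-up 2nd-order transform): work with z = phi(x) instead of x and count the support vectors:

import numpy as np
from sklearn.svm import SVC

def phi(X):
    # a hypothetical nonlinear transform to z space
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# toy data: a circular pattern, not linearly separable in x space
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)

clf = SVC(kernel='linear', C=1e6)   # linear SVM, but in the z space
clf.fit(phi(X), y)
print('number of support vectors:', clf.support_.size)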

If I have a nonlinear transformation, do I still have SVs?

sure, you have 'support vectors' in the z space:

you get the plane there, you get the margin, the margin

will touch some points, and these are your support vectors

by definition.

16:28 2014-10-09

you can identify them even without looking geometrically

at the z space, because what are the support vectors?

I look at the αs I get, and the ones where α is positive

correspond to support vectors

16:31 2014-10-09

this is what the boundaries look like

16:32 2014-10-09

now we have alarm bells, overfitting, overfitting....

16:32 2014-10-09

so if I look at what the support vectors are in the z space,

16:33 2014-10-09

"pre-images" of support vectors

16:33 2014-10-09

but you need to be careful, because the formal definition

is in the z space

16:34 2014-10-09

now the interesting aspect here is that, if this is true,

16:34 2014-10-09

that's remarkable, because I just went to a million-dimensional

space, w is a million-dimensional vector, and when I did the solution,

if I only have 4 support vectors,

the generalization behavior will go with 4 parameters

16:36 2014-10-09

so this looks like a sophisticated surface, but it's a

sophisticated surface in disguise: it's very carefully chosen.

there are lots of snakes that go around and mess up the

generalization; this one will be the best of them, and you have

a handle on how good the generalization is just by counting

the number of support vectors

16:38 2014-10-09

the distances between the support vectors & the surface here

are not the margin; the margins are in the linear (z) space, etc.

16:39 2014-10-09

these guys are likely to be close to the surface,

but the distances wouldn't be the same, and there are

perhaps other vectors that look like they should be support vectors,

but they're not! what makes them support vectors is that

they're achieving the margin in the z space;

this is just an illustrative version of it.

16:40 2014-10-09

now we come to the generalization result that

makes this fly!

16:41 2014-10-09

generalization result: Eout <= #of SV's / (N-1)

16:42 2014-10-09

that will give you an upper bound on Eout

16:42 2014-10-09

the result is very close to this.

16:42 2014-10-09

in order to get the correct result

16:44 2014-10-09

how many support vectors do you get?

16:44 2014-10-09

you ask me after you're done with this...,

how many support vectors do you get? if you have

a thousand data points, and you get 10 support vectors,

you're in pretty good shape, regardless of the dimensionality

of the z space you visited

16:46 2014-10-09
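
(for that example the bound is roughly 10 / (1000 - 1) ≈ 1% on Eout, in expectation)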

now I can go to any dimensional space, and it will be fine?

16:46 2014-10-09

but this is the main theoretical result that makes

people use support vectors

16:47 2014-10-09

kernel method, which is a modification of SVM

--------------------------------------------------------

16:53 2014-10-09

nonlinear transformation => kernel method ???

16:53 2014-10-09

in almost all the cases, the vast majority of terms 

will be zero

16:54 2014-10-09

this is not an accurate statement, but it's a reasonable

statement

16:55 2014-10-09

the number of nonzero parameters, which corresponds to

the VC dimension, happens to be the # of SVs by definition

16:56 2014-10-09

so the number of SVs gives you a bound on Eout

16:56 2014-10-09

some of the aggregation methods like boosting also have a

margin, and then you can compare that

16:59 2014-10-09

basically the support vectors are there to "support

the separating plane"

16:59 2014-10-09

they're the ones that dictate the margin
