
CalTech machine learning, video 14 note(Support Vector Machine)

14:32 2014-10-09 Thursday

start CalTech machine learning, video 14

Support Vector Machine

14:33 2014-10-09

SVM is arguably the most successful classification method

in machine learning

14:42 2014-10-09

it's a very very neat piece of work for machine learning

14:43 2014-10-09

outline:

* maximizing the margin

* the solution

* nonlinear transforms

14:43 2014-10-09

it will be a constrained optimization method

14:44 2014-10-09

linear separation

14:44 2014-10-09

linearly separable data

14:44 2014-10-09

I can get different lines. Is there an advantage to

choosing one of the lines over another?

14:45 2014-10-09

that's the margin you have

14:46 2014-10-09

margin of error

14:46 2014-10-09

if you look at this line, it does seem to have

a better margin.

14:46 2014-10-09

let me get the biggest possible margin

14:47 2014-10-09

which is the best line for classification?

14:47 2014-10-09

you will choose the fat margin

14:48 2014-10-09

let us ask 2 questions:

* why is a bigger margin better?

* can I solve for the w that maximizes the margin?

14:48 2014-10-09

all possible dichotomies

14:50 2014-10-09

growth function

14:50 2014-10-09

we know the growth function being big 

is bad news for generalization

14:50 2014-10-09

dichotomies with fat margin

14:51 2014-10-09

I'm requiring that the margin be at least something

14:53 2014-10-09

effectively, by requiring the margin to be at least something,

I'm putting a restriction on the growth function

14:54 2014-10-09

fat margins imply fewer dichotomies

14:54 2014-10-09

fat-margin dichotomies have a smaller growth function, a smaller

VC dimension, than if I did not restrict them at all

14:55 2014-10-09

we will find out indeed when you have a bigger margin, 

you'll achieve better out-of-sample performance

14:57 2014-10-09

fat margins are good

14:57 2014-10-09

find the w that not only classifies the points correctly,

but does so with the biggest possible margin.

14:57 2014-10-09

margin is just the distance from the plane to the point

14:58 2014-10-09

hyperplane that separates the points

14:58 2014-10-09

if I give you w & xn, can you get a formula & plug in to

give me the distance between the plane described by w

& the point xn?

15:00 2014-10-09

normalize w

15:01 2014-10-09
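
(the normalization: rescale w so that for the nearest point |w'xn + b| = 1, in the b notation introduced below; the margin then comes out to 1/||w||)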

I have the plane, so on the plane the signal == 0, and

it doesn't touch any points since the points are linearly separable.

now when you get the signal to be positive, you're moving in

one direction; you hit the closest point, then you hit more points,

the interior points so to speak. and when you go in the other direction,

it's negative, and you hit the other points

15:05 2014-10-09

pull out w0

15:07 2014-10-09

the plane is now : 

w'x + b = 0 (no x0)

15:08 2014-10-09

computing the distance

15:08 2014-10-09

this is the geometry, I have a plane, and I have a point,

I want to estimate the distance.

15:09 2014-10-09

the vector w is orthogonal to the plane // normal vector

15:10 2014-10-09

now you can see the good old b dropped out

15:10 2014-10-09

x' - x'' // this must be orthogonal to the vector w

15:11 2014-10-09

this could be any 2 points on the plane, and the conclusion is that: 

w is orthogonal to any vector on the plane

15:12 2014-10-09
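
(both points satisfy the plane equation, w'x + b = 0 at x = x' and at x = x''; subtracting the two gives w'(x' - x'') = 0, which is the orthogonality)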

now w has an explanation // w is always orthogonal to the plane

15:13 2014-10-09

if you project this fellow onto this direction

15:14 2014-10-09

project vector a to vector b

15:14 2014-10-09

unit vector

15:14 2014-10-09
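
to make the resulting distance formula, |w'xn + b| / ||w||, concrete, here is a minimal numpy sketch (my own code, not from the lecture; the function name is made up):

import numpy as np

def distance_to_plane(w, b, xn):
    # distance from the point xn to the hyperplane w'x + b = 0
    w = np.asarray(w, dtype=float)
    xn = np.asarray(xn, dtype=float)
    return abs(np.dot(w, xn) + b) / np.linalg.norm(w)

# e.g. the plane x1 + x2 - 1 = 0 and the point (1, 1): distance is 1/sqrt(2)
print(distance_to_plane([1.0, 1.0], -1.0, [1.0, 1.0]))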

we're going to find an equivalent problem that is

more friendly

15:17 2014-10-09

the optimization problem: maximize ... subject to ...

15:17 2014-10-09
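
spelled out, the friendly equivalent problem is: minimize (1/2) w'w over (w, b), subject to yn(w'xn + b) >= 1 for n = 1, ..., N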

does anybody see quadratic programming coming on 

the horizon?

15:19 2014-10-09

when you solve it, you'll get the separating plane

with the best margin.

15:23 2014-10-09

constrained optimization  // Lagrange multiplier

15:24 2014-10-09

but here the constraints are "inequality constraints"

instead of "equality constraints"

15:25 2014-10-09

inequality constraints => KKT

15:26 2014-10-09

it is good to look at that picture because 

it will put the analysis here in perspective

15:27 2014-10-09

what are we optimizing? and what is the constraint?

15:29 2014-10-09

objective function, constraints

15:30 2014-10-09

pass it to a package of quadratic programming

to get the solution

15:30 2014-10-09

and they become part of the objective function // constraints

15:32 2014-10-09

we're minimizing this with respect to what?

with respect to w & b

15:34 2014-10-09
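
spelled out, the Lagrangian is L(w, b, α) = (1/2) w'w - Σn αn [ yn(w'xn + b) - 1 ], with αn >= 0;

setting the gradient w.r.t. w to zero gives w = Σn αn yn xn, and the derivative w.r.t. b gives Σn αn yn = 0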

classification, segmentation

15:38 2014-10-09

this is a "KKT condition"

15:42 2014-10-09

the solution - quadratic programming

15:43 2014-10-09
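
the dual that goes to the QP package: maximize Σn αn - (1/2) Σn Σm yn ym αn αm xn'xm over α,

subject to αn >= 0 for all n and Σn yn αn = 0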

convex optimization

15:48 2014-10-09

convex function

15:49 2014-10-09

just a word of warning before we go ...

15:49 2014-10-09

it's a practical consideration, but it's an 

important consideration.

15:50 2014-10-09

flirting with danger

15:51 2014-10-09

QP hands us α // QP == Quadratic Programming

15:52 2014-10-09

so the solution is a vector of αs

15:53 2014-10-09

you get the αs, you plug them in, so you get the w

15:53 2014-10-09
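
a minimal sketch of that pipeline (my own code, not from the lecture; it assumes the cvxopt QP solver, linearly separable data, and the made-up name hard_margin_svm): solve the dual for the αs, then plug them in to get w and b:

import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    # X: (N, d) array of points, y: (N,) array of +1/-1 labels
    N = X.shape[0]
    Yx = y[:, None] * X.astype(float)
    P = matrix(Yx @ Yx.T)                        # P[n, m] = yn ym xn'xm
    q = matrix(-np.ones(N))                      # minimize (1/2) α'Pα - 1'α
    G = matrix(-np.eye(N))                       # -αn <= 0, i.e. αn >= 0
    h = matrix(np.zeros(N))
    A = matrix(y.reshape(1, -1).astype(float))   # Σn yn αn = 0
    sol = solvers.qp(P, q, G, h, A, matrix(0.0))
    alpha = np.array(sol['x']).ravel()
    sv = alpha > 1e-5                            # support vectors: αn > 0
    w = ((alpha * y)[:, None] * X).sum(axis=0)   # w = Σn αn yn xn
    b = y[sv][0] - np.dot(w, X[sv][0])           # b from any support vector
    return w, b, alpha

most entries of alpha come back (numerically) zero; the support vectors are X[sv]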

so I would like to tell you a condition which is

very important, and it will be the key to finding the

support vectors: the KKT condition

15:55 2014-10-09

when you look at the vector, you just see a whole bunch

of αs that are 0

15:55 2014-10-09

for all the interior points, you're guaranteed that the 

multiplier is zero

15:59 2014-10-09

we end up with no need for regularization

15:59 2014-10-09

αn > 0    =>    xn is a support vector

16:00 2014-10-09

now we come to the interesting definition: the αs are largely

zeros, for the interior points; the most important points in the game

are the points that actually define the plane & the margin,

and these are the ones for which α is positive

16:03 2014-10-09

these are called support vectors, so I have N points,

I classify them & I get the maximum margin; because

it's a maximum margin, it touches on some of the +1 &

some of the -1 points, those points support the plane

so to speak, and are called the "support vectors", and

the other guys are interior points

16:06 2014-10-09

α greater than zero identifies a support vector

16:07 2014-10-09

where are the support vectors in this picture?

they're the closest ones to the plane

16:08 2014-10-09

all of these guys here & here contribute nothing,

α == 0 in this case

16:08 2014-10-09

and the support vectors achieve the margin exactly

they're the critical points

16:09 2014-10-09

this is our way to get the αs back; this is the currency

we get back from the quadratic programming, and we plug it

in to get the w

16:11 2014-10-09

now I notice that many of the αs are zero, and α is only

positive for support vectors

16:12 2014-10-09

then I can say that I can sum this up over only the support 

vectors(SV)

16:13 2014-10-09
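
i.e. w = Σ over the support vectors of αn yn xn; the zero-α terms contribute nothing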

seems only a minor technicality, but there is a very important

point in here

16:13 2014-10-09

the αs are the parameters of your model; when they're 0,

they don't count.

16:14 2014-10-09

w // weight vector

16:14 2014-10-09

now you realize why there might be a

generalization dividend here, because I end up

with fewer parameters

16:16 2014-10-09

solve for b using any SV(Support Vector)

16:16 2014-10-09
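
for any support vector xn, yn(w'xn + b) = 1 exactly, so b = yn - w'xn (using yn = ±1)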

now you have w & b, and you're ready with the classification

line or hyperplane

16:17 2014-10-09

nonlinear transformation

16:17 2014-10-09

now let me close with the nonlinear transformation,

which will be a very short presentation that has

an enormous impact; so far we're talking about a linear boundary

and we're talking about the linearly separable case

16:19 2014-10-09

instead of working in the x space, we work in the z space

16:19 2014-10-09

z instead of x

16:20 2014-10-09

you don't just get a generic separator, you are getting

the best separator according to SVM

16:23 2014-10-09

I'm getting the separating line or plane in the z space

16:24 2014-10-09
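
a minimal sketch of the same idea (my own code, not from the lecture; it uses scikit-learn's SVC with a large C as a stand-in for the hard-margin QP, and phi is a made-up 2nd-order transform): work with z = phi(x) instead of x and count the support vectors:

import numpy as np
from sklearn.svm import SVC

def phi(X):
    # a hypothetical nonlinear transform to z space
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# toy data: a circular pattern, not linearly separable in x space
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 0.5, 1, -1)

clf = SVC(kernel='linear', C=1e6)   # linear SVM, but in the z space
clf.fit(phi(X), y)
print('number of support vectors:', clf.support_.size)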

If I have a nonlinear transformation, do I still have SVs?

sure, you have 'support vectors' in the z space:

you get the plane there, you get the margin, the margin

will touch some points, and these are your support vectors

by definition.

16:28 2014-10-09

you can identify them even without looking geometrically

at the z space, because what are the support vectors?

I look at the αs I get, and the ones where α is positive

correspond to support vectors

16:31 2014-10-09

this is what the boundaries look like

16:32 2014-10-09

now we have alarm bells, overfitting, overfitting....

16:32 2014-10-09

so if I look at what the support vectors are in the z space,

16:33 2014-10-09

"pre-images" of support vectors

16:33 2014-10-09

but you need to be careful, because the formal definition

is in the z space

16:34 2014-10-09

now the interesting aspect here is that, if this is true,

16:34 2014-10-09

that's remarkable, because I just went to a million-dimensional

space, w is a million-dimensional vector, and when I did the solution,

if I only have 4 support vectors,

the generalization behavior will go with 4 parameters

16:36 2014-10-09

so this looks like a sophisticated surface, but it's a

sophisticated surface in disguise: it's very carefully chosen.

there are lots of snakes that go around and mess up the

generalization; this one will be the best of them, and you have

a handle on how good the generalization is just by counting

the number of support vectors

16:38 2014-10-09

the distances between the support vectors & the surface here

are not the margin; the margins are in the linear (z) space, etc.

16:39 2014-10-09

these guys are likely to be close to the surface,

but the distances wouldn't be the same, and there are

perhaps other vectors that look like they should be support vectors,

but they're not! what makes them support vectors is that

they're achieving the margin in the z space;

this is just an illustrative version of it.

16:40 2014-10-09

now we come to the generalization result that

makes this fly!

16:41 2014-10-09

generalization result: Eout <= #of SV's / (N-1)

16:42 2014-10-09

that will give you an upper bound on Eout

16:42 2014-10-09

the result is very close to this.

16:42 2014-10-09

in order to get the correct result

16:44 2014-10-09

how many support vectors do you get?

16:44 2014-10-09

you ask me after you're done with this...,

how many support vectors do you get? if you have

a thousand data points, and you get 10 support vectors,

you're in pretty good shape, regardless of the dimensionality

of the z space you visited

16:46 2014-10-09
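
(for that example the bound is roughly 10 / (1000 - 1) ≈ 1% on Eout, in expectation)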

now I can go to any dimensional space, and it will be fine?

16:46 2014-10-09

but this is the main theoretical result that makes

people use support vectors

16:47 2014-10-09

kernel method, which is a modification of SVM

--------------------------------------------------------

16:53 2014-10-09

nonlinear transformation => kernel method ???

16:53 2014-10-09

in almost all the cases, the vast majority of terms 

will be zero

16:54 2014-10-09

this is not an accurate statement, but it's a reasonable

statement

16:55 2014-10-09

the number of nonzero parameters, which corresponds to

the VC dimension, happens to be the # of SVs by definition

16:56 2014-10-09

so the number of SVs gives you a bound on Eout

16:56 2014-10-09

some of the aggregation methods like boosting also have a

margin, and then you can compare that

16:59 2014-10-09

basically the support vectors are there to "support

the separating plane"

16:59 2014-10-09

they're the ones that dictate the margin
