
CalTech machine learning, video 13 note (validation)

8:58 2014-10-09

start CalTech machine learning, video 13, 

validation

9:57 2014-10-09

outline:

* validation set

* model selection

* cross validation

10:03 2014-10-09

Validation vs. regularization

Eout(h) = Ein(h) + overfit penalty

regularization estimates "overfit penalty"

validation estimates "Eout(h)"

10:08 2014-10-09

Eval(h) // validation error

// this will be a good estimate of the out-of-sample performance

10:13 2014-10-09

K is taken out of N 

// the validation set is different from the training set

10:18 2014-10-09

K points => validation

N-K points => training

10:18 2014-10-09

Dval, Dtrain
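
A minimal sketch of this split (Dval of size K, Dtrain of size N-K) and of the validation error Eval; the numpy-based helpers and the assumption that X, y are arrays are mine, not the lecture's:

import numpy as np

def split_data(X, y, K, rng=None):
    """Randomly hold out K of the N points as Dval; the remaining N-K are Dtrain."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))                  # X, y assumed to be numpy arrays
    val_idx, train_idx = idx[:K], idx[K:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

def validation_error(h, X_val, y_val):
    """Eval(h): average pointwise (0/1) error of hypothesis h on the K held-out points."""
    return np.mean(h(X_val) != y_val)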

10:22 2014-10-09

small K => g- is close to g (trained on almost all N points), but Eval is a noisy estimate

large K => Eval(g-) is a reliable estimate, but g- is trained on only N-K points

10:26 2014-10-09

why not put K back into the original N?

10:26 2014-10-09

we call it validation because we use it to make choices

10:34 2014-10-09

Dval is used to make learning choices

If an estimate of Eout affects learning, we call the set a validation set (not a test set)

10:36 2014-10-09

early stopping

10:36 2014-10-09

this is going up, I'd better stop here

10:37 2014-10-09

What is the difference?

* Test set is unbiased;

* validation set has optimistic bias

10:39 2014-10-09

e1 is an unbiased estimate of out-of-sample error

10:42 2014-10-09

unbiased means the expected value is what it should be

10:42 2014-10-09

Error estimates e1 & e2

Pick h ∈ {h1, h2} with e = min(e1, e2)

what is the expectation of e: E(e)?

10:45 2014-10-09

now we realize that this is an optimistic bias
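
A toy simulation of this optimistic bias (the uniform errors on [0, 1] and the sample size are made-up numbers for illustration): e1 and e2 are each unbiased with expected value 0.5, yet e = min(e1, e2) has expectation about 1/3.

import numpy as np

rng = np.random.default_rng(0)
e1 = rng.uniform(0.0, 1.0, size=100_000)   # unbiased estimate: E[e1] = 0.5
e2 = rng.uniform(0.0, 1.0, size=100_000)   # unbiased estimate: E[e2] = 0.5
e = np.minimum(e1, e2)                     # pick the smaller of the two
print(e1.mean(), e2.mean(), e.mean())      # ~0.5, ~0.5, ~0.33 => optimistic bias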

10:46 2014-10-09

fortunately for us, the utility of validation in 

machine learning is so great that we're going to

swallow the bias

10:47 2014-10-09

so with this understanding, let's use validation for 

model selection, which is what validation sets are mainly used for

10:48 2014-10-09

the choice of λ happens to be a manifestation of this

10:48 2014-10-09

Using Dval more than once

10:49 2014-10-09

that's a choice between models

10:50 2014-10-09

they have a small minus (the g- superscript) because I'm training on Dtrain

10:53 2014-10-09

so these are done without any validation, just train

on a reduced set.

10:53 2014-10-09

once I get them, I'm going to evaluate the performance

10:54 2014-10-09

these are "validation errors"

10:54 2014-10-09

your model selection is to look at these errors, which

are supposed to reflect the out-of-sample performance if you

use this as your final product

10:57 2014-10-09

you pick the smallest of them, now you have a bias

10:57 2014-10-09

now we realize it has an optimistic bias

10:58 2014-10-09

we're now going back to our full data set

10:58 2014-10-09

restore your D as we did before

10:59 2014-10-09

so this is the algorithm for model selection
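
A minimal sketch of that model selection algorithm (sklearn-style fit/predict models and the variable names are hypothetical placeholders): train each candidate on Dtrain, evaluate on Dval, pick the smallest validation error, then restore the full D and retrain the winner on all N points.

import numpy as np

def select_model(models, X_tr, y_tr, X_val, y_val, X_full, y_full):
    finalists = [m.fit(X_tr, y_tr) for m in models]                   # g_1^-, ..., g_M^-
    e_vals = [np.mean(g.predict(X_val) != y_val) for g in finalists]  # validation errors E_1..E_M
    m_star = int(np.argmin(e_vals))            # pick the smallest (this is where the bias comes in)
    return models[m_star].fit(X_full, y_full)  # restore D: retrain the chosen model on all N points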

10:59 2014-10-09

so I'm going to run an experiment to show you the bias

11:00 2014-10-09

not because it has an inherent good performance, but 

because you look for the one with a good performance

11:01 2014-10-09

validation set size

11:02 2014-10-09

and after that, I look at the actual out-of-sample error

11:03 2014-10-09 

I'd like to ask you 2 questions:

* why do the curves go up?

* why are the 2 curves getting closer together?

11:06 2014-10-09

because when I use more for validation, I use less

for training, 

11:07 2014-10-09

how much bias depends on the factors, but the bias is there

11:11 2014-10-09

I'm using the validation set to estimate the Eout

11:12 2014-10-09

the validation set (Dval) is used for "training" on the 

"finalist" models

11:16 2014-10-09

if you have a decent validation set (of size K), then your estimate 

will not be that far from Eout (the out-of-sample error)

11:25 2014-10-09

so I'm choosing when to stop

11:25 2014-10-09

the training of the network tries to choose the 

weights of the network

11:27 2014-10-09

validation error is a reasonable estimate of the 

out-of-sample error that we can rely on
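
A minimal early-stopping sketch built on this idea; the caller-supplied train_one_epoch and evaluate callables and the Keras-style get_weights/set_weights methods are assumptions, not the lecture's code:

def train_with_early_stopping(model, train_one_epoch, evaluate, D_train, D_val,
                              max_epochs=1000, patience=10):
    """Stop training once the validation error stops improving; keep the best weights."""
    best_val, best_weights, since_best = float("inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model, D_train)        # one pass of weight updates (caller-supplied)
        e_val = evaluate(model, D_val)         # Eval after this epoch (caller-supplied)
        if e_val < best_val:                   # still improving: remember these weights
            best_val, best_weights, since_best = e_val, model.get_weights(), 0
        else:                                  # "this is going up, I'd better stop here"
            since_best += 1
            if since_best >= patience:
                break
    if best_weights is not None:
        model.set_weights(best_weights)        # restore the weights with the lowest Eval
    return model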

11:28 2014-10-09

data contamination:

if you use the data for making choices, you're 

contaminating it as far as its ability to estimate the 

real performance

11:31 2014-10-09

contamination: optimistic (deceptive) bias

11:32 2014-10-09

you're trying to measure what the level of contamination is

11:33 2014-10-09

we have a great Ein, and we know Ein is no indication

of Eout, this has been contaminated to death

11:34 2014-10-09

when you go to the 'test set', this is totally clean,

there is no bias here

11:35 2014-10-09

Ein   // in-sample error (totally contaminated)

Etest // test error (totally clean)

Eval  // validation error (slightly contaminated)

11:36 2014-10-09

the validation set is in between, it's slightly 

contaminated.

11:36 2014-10-09

now we go to 'cross validation', a very sweet regime

11:38 2014-10-09

the dilemma about K

11:40 2014-10-09

the fluctuation around the estimate we want

11:39 2014-10-09

Eout(g) // g is the hypothesis we're going to report

11:42 2014-10-09

Eout(g-) 

// this is the proper out-of-sample error, but of the hypothesis

// trained on a reduced set

11:42 2014-10-09

Eout(g) ≈ Eout(g-) ≈ Eval(g-)

Eout(g)  // this is what we want

Eout(g-) // this is unknown to me

Eval(g-) // this is what I'm working with

11:43 2014-10-09

I want K to be small so that: Eout(g) ≈ Eout(g-)

11:45 2014-10-09

but also I want K to be large, because  Eout(g-) ≈ Eval(g-)

11:45 2014-10-09

can we have K both small & large?

11:46 2014-10-09

leave one out, leave more out

11:46 2014-10-09

I'm going to use N-1 points for training,

and 1 point for validation

11:47 2014-10-09

I'm going to create a reduced set from D, called Dn

11:48 2014-10-09

this one(the taken out) will be the one I use for validation

11:48 2014-10-09

let's look at the validation error

11:49 2014-10-09

in this case, the validation error is just 1 point

11:49 2014-10-09

what happens if I repeat this exercise for different

small n?

11:50 2014-10-09

so in spite of the fact that these are different hypotheses, 

each of them comes from training on N-1 points, 

11:53 2014-10-09

I'm going to define the cross validation error: Ecv

11:53 2014-10-09

the catch is that these are not independent,

each of them is affected by the other

11:55 2014-10-09

It's remarkably good at getting it

11:56 2014-10-09

let's just estimate the out-of-sample error 

using the cross validation method

11:57 2014-10-09

and we take an average performance of these

as an indication of what will happen out of sample
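
A minimal leave-one-out sketch of this (a sklearn-style model with fit/predict is a hypothetical placeholder): for each n, train g_n^- on the N-1 points with (x_n, y_n) removed, score it on the single held-out point, and average the N single-point errors.

import numpy as np

def cross_validation_error(model, X, y):
    """Ecv = (1/N) * sum_n e_n, with e_n the error of g_n^- on the left-out point n."""
    N = len(X)
    errors = []
    for n in range(N):
        mask = np.arange(N) != n                  # D_n: leave point n out
        g_minus = model.fit(X[mask], y[mask])     # g_n^- trained on N-1 points
        e_n = float(g_minus.predict(X[n:n+1])[0] != y[n])   # 0/1 error on the held-out point
        errors.append(e_n)
    return float(np.mean(errors))                 # Ecv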

12:01 2014-10-09

we're using only 2 (of the 3) points for training here; when we're done,

we're using all 3 points 

12:02 2014-10-09

but think of training on 99 out of 100 points, who cares?

12:02 2014-10-09

so let's use this for model selection

12:02 2014-10-09

model selection using CV // CV == Cross Validation
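
A sketch of model selection using CV, reusing the cross_validation_error helper sketched above (the model names and sklearn-style fit are hypothetical): compute Ecv for each candidate model, pick the one with the smallest Ecv, then retrain the winner on the full data set.

import numpy as np

def select_model_by_cv(models, X, y):
    e_cv = [cross_validation_error(m, X, y) for m in models]
    m_star = int(np.argmin(e_cv))        # model with the smallest cross validation error
    return models[m_star].fit(X, y)      # final hypothesis g trained on all N points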

12:03 2014-10-09

we'd like to find a separating surface

12:07 2014-10-09

Ecv tracks Eout very nicely

12:09 2014-10-09

if I use it as a criterion for model choice

12:10 2014-10-09

let me cut off at six, and see what the performance is like

// early stop

12:10 2014-10-09

without validation, I'm using the full model

12:11 2014-10-09

with validation, you stop at 6, because the 

cross validation tells you to do so; it's a nice 

smooth surface

12:12 2014-10-09

I don't care about driving the in-sample error to zero, 

that's harmful in some cases

12:12 2014-10-09

so now you can see why validation is seen in this

context as similar to regularization, it does the 

same thing, it prevents overfitting, but it prevents

overfitting by estimating the out-of-sample error (Eout)

rather than estimating something else

12:16 2014-10-09

we seldom use leave-one-out in real problems,

12:18 2014-10-09

take more points for validation

12:18 2014-10-09

Leave more than one out

12:18 2014-10-09

what you do is you take your data set,

you just break it into several folds

12:18 2014-10-09

exactly the same, except that here,

I'm taking out a chunk

12:20 2014-10-09

this is what I recommend to you:

10-fold cross validation
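
A minimal sketch of that 10-fold scheme (a sklearn-style model with fit/predict is a hypothetical placeholder): break the N points into 10 folds; each fold takes a turn as the validation chunk (K = N/10) while the other 9 folds are used for training, and the 10 chunk errors are averaged.

import numpy as np

def k_fold_cv_error(model, X, y, n_folds=10, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, n_folds)               # 10 roughly equal chunks
    chunk_errors = []
    for val_idx in folds:
        train_idx = np.setdiff1d(idx, val_idx)         # the other 9 folds
        g_minus = model.fit(X[train_idx], y[train_idx])
        chunk_errors.append(np.mean(g_minus.predict(X[val_idx]) != y[val_idx]))
    return float(np.mean(chunk_errors))                # the cross validation estimate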

-----------------------------------------------

13:29 2014-10-09

both validation & cross validation have bias

for the same reason
