8:58 2014-10-09
start Caltech machine learning, video 13:
validation
9:57 2014-10-09
outline:
* validation set
* model selection
* cross validation
10:03 2014-10-09
Validation vs. regularization
Eout(h) = Ein(h) + overfit penalty
regularization estimates "overfit penalty"
validation estimates "Eout(h)"
10:08 2014-10-09
Eval(h) // validation error
// this will be a good estimate of the out-of-sample performance
10:13 2014-10-09
K points are taken out of N
// the validation set is disjoint from the training set
10:18 2014-10-09
K points => validation
N-K points => training
10:18 2014-10-09
Dval, Dtrain
10:22 2014-10-09
small K => g- is trained on almost all N points, so Eout(g) ≈ Eout(g-)
large K => Eval(g-) is a reliable estimate of Eout(g-)
10:26 2014-10-09
why not put the K points back into the original N?
10:26 2014-10-09
we call it validation because we use it to make choices
10:34 2014-10-09
Dval is used to make learning choices
if an estimate of Eout affects learning, the set providing it
is no longer a test set; it is a validation set
10:36 2014-10-09
early stopping
10:36 2014-10-09
this is going up; I'd better stop here
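my own sketch of the idea (not from the lecture); train_step and
eval_on_val are hypothetical callables standing in for one pass of
training on Dtrain and for computing Eval:

def train_with_early_stopping(train_step, eval_on_val, max_epochs=1000, patience=5):
    # keep the epoch with the best validation error; stop once Eval
    # has been going up for `patience` epochs
    best_val, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_step()               # one pass of training on Dtrain
        e_val = eval_on_val()      # validation error Eval after this pass
        if e_val < best_val:
            best_val, best_epoch = e_val, epoch
        elif epoch - best_epoch >= patience:
            break                  # "this is going up, I'd better stop here"
    return best_epoch, best_val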
10:37 2014-10-09
What is the difference?
* Test set is unbiased;
* validation set has optimistic bias
10:39 2014-10-09
e1 is an unbiased estimate of the out-of-sample error
10:42 2014-10-09
unbiased means the expected value is what it should be
10:42 2014-10-09
Error estimates e1 & e2
Pick h ∈ {h1, h2} with e = min(e1, e2)
what is the expectation of e: E(e)?
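a quick simulation of the answer (my sketch; taking e1, e2 as
independent Uniform[0,1] estimates is my toy assumption, so each
has expectation 0.5):

import random

trials = 100_000
e = [min(random.random(), random.random()) for _ in range(trials)]
print(sum(e) / trials)   # ≈ 1/3 < 0.5, so E(e) < min(E(e1), E(e2))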
10:45 2014-10-09
now we realize that this is an optimistic bias
10:46 2014-10-09
fortunately for us, the bias is slight enough that we're
going to swallow it, given how useful validation is in
machine learning
10:47 2014-10-09
so with this understanding, let's use validation for
model selection, which is what validation sets do
10:48 2014-10-09
the choice of the regularization parameter λ is one manifestation of this
10:48 2014-10-09
Using Dval more than once
10:49 2014-10-09
that's a choice between models
10:50 2014-10-09
they carry a minus (gm-) because I'm training on Dtrain only
10:53 2014-10-09
so these are obtained without any validation, just training
on a reduced set.
10:53 2014-10-09
once I get them, I'm going to evaluate their performance on Dval
10:54 2014-10-09
these are "validation errors"
10:54 2014-10-09
model selection means looking at these errors, which are
supposed to reflect the out-of-sample performance if you
use that model as your final product
10:57 2014-10-09
you pick the smallest of them; now you have a bias
10:57 2014-10-09
now we realize it has an optimistic bias
10:58 2014-10-09
we're now going back to our full data set
10:58 2014-10-09
restore your D as we did before
10:59 2014-10-09
so this is the algorithm for model selection
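the whole procedure as a sketch (my paraphrase; the fit/error
interface is a hypothetical stand-in, not from the lecture):

def select_model(models, X, y, K):
    # split D: N-K points => Dtrain, K points => Dval
    X_train, y_train = X[:-K], y[:-K]
    X_val,   y_val   = X[-K:], y[-K:]
    # train every candidate on Dtrain only, giving the finalists gm-
    finalists = [m.fit(X_train, y_train) for m in models]
    # validation errors Em = Eval(gm-): the model-selection criterion
    e_val = [g.error(X_val, y_val) for g in finalists]
    best = min(range(len(models)), key=lambda i: e_val[i])
    # restore the full D and retrain the winning model on all N points
    return models[best].fit(X, y)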
10:59 2014-10-09
so I'm going to run an experiment to show you the bias
11:00 2014-10-09
not because it has inherently good performance, but
because you looked for the one with good performance
11:01 2014-10-09
validation set size
11:02 2014-10-09
and after that, I look at the actual out-of-sample error
11:03 2014-10-09
I'd like to ask you 2 questions:
* why do the curves go up?
* why do the 2 curves get closer together?
11:06 2014-10-09
because when I use more points for validation, I use fewer
for training, so the trained hypothesis gets worse
11:07 2014-10-09
how much bias there is depends on several factors, but the bias is there
11:11 2014-10-09
I'm using the validation set to estimate Eout
11:12 2014-10-09
the validation set (Dval) is used for "training" the
"finalist" models
11:16 2014-10-09
if you have a decent validation set (size K), then your estimate
will not be that far from Eout (the out-of-sample error);
the deviation shrinks like O(1/√K)
11:25 2014-10-09
so I'm choosing when to stop
11:25 2014-10-09
the training of the network tries to choose the
weights of the network
11:27 2014-10-09
validation error is a reasonable estimate of the
out-of-sample error that we can rely on
11:28 2014-10-09
data contamination:
if you use the data for making choices, you're
contaminating it as far as its ability to estimate
the real performance goes
11:31 2014-10-09
contamination: optimistic (deceptive) bias
11:32 2014-10-09
you're trying to measure the level of contamination
11:33 2014-10-09
we have a great Ein, and we know Ein is no indication
of Eout; the training set has been contaminated to death
11:34 2014-10-09
when you go to the 'test set', this is totally clean,
there is no bias here
11:35 2014-10-09
Ein   // in-sample error: totally contaminated
Etest // test error: totally clean
Eval  // validation error: slightly contaminated
11:36 2014-10-09
the validation set is in between, it's slightly
contaminated.
11:36 2014-10-09
now we go to 'cross validation', a very sweet regime
11:38 2014-10-09
the dilemma about K
11:40 2014-10-09
the fluctuation of the estimate around the quantity we want
11:39 2014-10-09
Eout(g) // g is the hypothesis we're going to report
11:42 2014-10-09
Eout(g-)
// this is the proper out-of-sample error, but for the
// hypothesis trained on a reduced set
11:42 2014-10-09
Eout(g) ≈ Eout(g-) ≈ Eval(g-)
Eout(g) // this is what we want
Eout(g-) // this is unknown to me
Eval(g-) // this is what I'm working with
11:43 2014-10-09
I want K to be small so that: Eout(g) ≈ Eout(g-)
11:45 2014-10-09
but I also want K to be large, so that Eout(g-) ≈ Eval(g-)
11:45 2014-10-09
can we have K both small & large?
11:46 2014-10-09
leave one out, leave more out
11:46 2014-10-09
I'm going to use N-1 points for training,
and 1 point for validation
11:47 2014-10-09
I'm going to create a reduced set from D, called Dn
11:48 2014-10-09
this one (the taken-out point) will be the one I use for validation
11:48 2014-10-09
let's look at the validation error
11:49 2014-10-09
in this case, the validation error is based on just 1 point: en = e(gn-(xn), yn)
11:49 2014-10-09
what happens if I repeat this exercise for each
point n?
11:50 2014-10-09
so even though these are different hypotheses, each of
them comes from the same number of points (N-1), so they
behave similarly
11:53 2014-10-09
I'm going to define the cross-validation error: Ecv = (1/N) Σn en
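a leave-one-out sketch of this (the fit/predict interface and the
squared-error measure are my illustrative assumptions):

def e_cv(model, X, y):
    # Ecv = (1/N) * sum of the en, each measured on the left-out point
    N = len(X)
    total = 0.0
    for n in range(N):
        X_minus = X[:n] + X[n+1:]           # Dn: D with point n taken out
        y_minus = y[:n] + y[n+1:]
        g_n = model.fit(X_minus, y_minus)   # gn- trained on N-1 points
        total += (g_n.predict(X[n]) - y[n]) ** 2   # en = e(gn-(xn), yn)
    return total / N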
11:53 2014-10-09
the catch is that these are not independent;
each of them is affected by the others
11:55 2014-10-09
it's remarkably good at getting it, nonetheless
11:56 2014-10-09
let's just estimate the out-of-sample error
using the cross validation method
11:57 2014-10-09
and we take an average performance of these
as an indication of what will happen out of sample
12:01 2014-10-09
we're training on only 2 points here; when we're done,
we use all 3 points
12:02 2014-10-09
but think of 99 out of 100 points: who cares about the one left out?
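a toy usage of the e_cv sketch above (the 3 data points and the two
models are my own made-up illustration, echoing the lecture's
constant-vs-line comparison):

class ConstantModel:                        # h(x) = b
    def fit(self, X, y):
        m = ConstantModel()
        m.b = sum(y) / len(y)
        return m
    def predict(self, x):
        return self.b

class LinearModel:                          # h(x) = a*x + b (least squares)
    def fit(self, X, y):
        m = LinearModel()
        n = len(X)
        xbar, ybar = sum(X) / n, sum(y) / n
        num = sum((X[i] - xbar) * (y[i] - ybar) for i in range(n))
        den = sum((x - xbar) ** 2 for x in X)
        m.a = num / den
        m.b = ybar - m.a * xbar
        return m
    def predict(self, x):
        return self.a * x + self.b

X, y = [0.0, 1.0, 2.0], [0.5, 1.8, 1.2]     # 3 made-up points
print(e_cv(ConstantModel(), X, y))          # Ecv for the constant model
print(e_cv(LinearModel(), X, y))            # Ecv for the line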
12:02 2014-10-09
so let's use this for model selection
12:02 2014-10-09
model selection using CV // CV == Cross Validation
12:03 2014-10-09
we'd like to find a separating surface
12:07 2014-10-09
Ecv tracks Eout very nicely
12:09 2014-10-09
if I use it as a criterion for model choice
12:10 2014-10-09
let me cut off at 6, and see what the performance is like
// early stopping
12:10 2014-10-09
without validation, I'm using the full model
12:11 2014-10-09
with validation, you stop at 6, because the
cross validation tells you to do so; you get a nice
smooth surface
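so the "cut off at 6" choice is itself CV-based model selection; a
sketch using the e_cv function above (make_model(d) is a hypothetical
constructor for the model with d features/terms):

def choose_complexity(make_model, X, y, d_max=20):
    errors = {d: e_cv(make_model(d), X, y) for d in range(1, d_max + 1)}
    return min(errors, key=errors.get)      # e.g. 6, where Ecv bottoms out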
12:12 2014-10-09
I don't care about the in-sample error going to zero;
that's harmful in some cases
12:12 2014-10-09
so now you can see why validation is seen in this
context as similar to regularization: it does the
same thing, preventing overfitting, but it prevents
overfitting by estimating the out-of-sample error (Eout)
rather than estimating something else
12:16 2014-10-09
we seldom use leave-one-out in real problems
12:18 2014-10-09
take more points for validation
12:18 2014-10-09
Leave more than one out
12:18 2014-10-09
what you do is take your data set
and break it into several folds
12:18 2014-10-09
exactly the same as leave-one-out, except that here
I'm taking out a chunk at a time
12:20 2014-10-09
this is what I recommend to you:
10-fold cross validation
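a 10-fold sketch along the same lines (same hypothetical fit/error
interface as before; leftover points when N isn't divisible by the
fold count are ignored for simplicity):

def e_cv_10fold(model, X, y, folds=10):
    # break D into `folds` chunks; validate on each chunk, train on the rest
    size = len(X) // folds
    total = 0.0
    for i in range(folds):
        lo, hi = i * size, (i + 1) * size
        X_tr, y_tr = X[:lo] + X[hi:], y[:lo] + y[hi:]
        g = model.fit(X_tr, y_tr)
        total += g.error(X[lo:hi], y[lo:hi])    # Eval on the held-out chunk
    return total / folds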
-----------------------------------------------
13:29 2014-10-09
both validation & cross validation have bias
for the same reason