10:05 2014-10-08
start Caltech machine learning, video 11
regularization
10:05 2014-10-08
overfitting: we're fitting the data all too well,
at the expense of the out-of-sample error
10:06 2014-10-08
if you think of what the VC analysis told us:
given the data resources & the complexity of the
hypothesis set, with nothing said about the target,
we can predict the level of generalization as a bound.
10:08 2014-10-08
data resource + VC dimension => level of generalization
10:09 2014-10-08
the source of overfitting is fitting the noise
11:15 2014-10-08
stochastic noise/deterministic noise
11:16 2014-10-08
deterministic noise is a function of the limitations of your model
11:17 2014-10-08
1st cure for overfitting
11:18 2014-10-08
outline:
* Regularization - informal
* Regularization - formal
* Weight decay
* Choosing a regularizer
11:19 2014-10-08
unconstrained solution: minimize Ein //in-sample error
11:37 2014-10-08
let's look at the constrained version: what happens
if we constrain the weights
11:40 2014-10-08
so here is the constraint we're going to work with
11:41 2014-10-08
I have a smaller hypothesis set, so the VC dimension is going
in the direction of being smaller, so I'm in a better position
for generalization.
11:44 2014-10-08
constraining the weights:
* Hard constraint
* Soft-order constraint
11:45 2014-10-08
Wreg instead of Wlin
11:46 2014-10-08
you minimize this subject to the constraint
11:47 2014-10-08
KKT (Karush-Kuhn-Tucker)
11:47 2014-10-08
I have 2 things here: the error surface I'm trying
to minimize, and the constraint.
11:49 2014-10-08
I'm going to put contours where the in-sample error is constant
11:50 2014-10-08
let's take a point on the surface
11:52 2014-10-08
let's look at the gradient of the objective function
11:53 2014-10-08
gradient of the objective function will give me a good
idea about the direction to move in order to minimize the
objective function
11:54 2014-10-08
moving along the circle will change the value of Ein
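in equation form (a sketch of the standard Lagrange/KKT conclusion, not the exact notation on the slides): at the constrained minimum, the gradient of Ein must be normal to the constraint surface and point opposite to w, which is the same as the stationarity condition of an unconstrained "augmented" error:

```latex
% at the constrained minimum (w^T w <= C, constraint active):
\nabla E_{in}(w_{reg}) \propto -\, w_{reg}
\;\;\Longleftrightarrow\;\;
\nabla E_{in}(w_{reg}) + \tfrac{2\lambda}{N}\, w_{reg} = 0
% which is \nabla E_{aug}(w_{reg}) = 0 for the augmented error
E_{aug}(w) = E_{in}(w) + \tfrac{\lambda}{N}\, w^{\mathsf T} w
```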
11:58 2014-10-08
Augmented error: Eaug(w)
12:05 2014-10-08
regularization term
12:06 2014-10-08
I use a subset of the hypothesis set and I expect good
generalization.
12:07 2014-10-08
one-step learning including regularization
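a minimal numpy sketch of that one-step solution with weight decay, assuming linear regression on a feature matrix Z with targets y (function name and setup are illustrative, not from the slides):

```python
import numpy as np

def solve_weight_decay(Z, y, lam):
    """One-step linear regression with weight decay:
    minimizes Ein(w) + (lam/N) * w^T w, which gives
    w_reg = (Z^T Z + lam * I)^(-1) Z^T y."""
    N, d = Z.shape
    A = Z.T @ Z + lam * np.eye(d)
    return np.linalg.solve(A, Z.T @ y)

# lam = 0 recovers the unconstrained solution; a small lam is the "small dose"
# Z = np.random.randn(20, 5); y = np.random.randn(20)
# w_reg = solve_weight_decay(Z, y, lam=0.1)
```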
12:13 2014-10-08
let's apply it and see the results in the real case.
12:14 2014-10-08
so the medicine is working, a small dose of medicine
did the job
12:16 2014-10-08
I think we're overdosing here.
12:16 2014-10-08
if you keep increasing λ => overdose !!!
12:17 2014-10-08
the choice of λ is extremely critical
12:18 2014-10-08
the good news is that this will not be a heuristic choice
12:18 2014-10-08
the choice of λ will be extremely principled, based on validation
12:19 2014-10-08
we went to another extreme: now we're "underfitting"
12:20 2014-10-08
overfitting => underfitting
12:20 2014-10-08
the proper choice of λ is important
12:21 2014-10-08
the most famous regularizer is "weight decay"
12:21 2014-10-08
we know that in neural networks you don't have a neat
closed-form solution; you use gradient descent
12:22 2014-10-08
batch gradient descent => stochastic gradient descent(SGD)
12:23 2014-10-08
I'm in the weight space & this is my weight, and
here is the direction that backpropagation suggests moving in.
12:27 2014-10-08
it used to be that, without regularization, I move from here to here
12:27 2014-10-08
shrinking & moving
12:28 2014-10-08
the weights decay from one step to the next (hence "weight decay")
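a sketch of the corresponding "shrink then move" gradient step (grad_Ein, eta, lam, N are assumed to come from the surrounding training loop):

```python
def weight_decay_step(w, grad_Ein, eta, lam, N):
    """One gradient step with weight decay:
    first shrink the weights, then move along the negative gradient.
    w(t+1) = (1 - 2*eta*lam/N) * w(t) - eta * grad_Ein(w(t))"""
    return (1 - 2 * eta * lam / N) * w - eta * grad_Ein
```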
12:29 2014-10-08
weight space
12:31 2014-10-08
some weights are more important than others
12:31 2014-10-08
low-order fit
12:33 2014-10-08
Tikhonov regularizer
12:34 2014-10-08
regularization parameter λ
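for reference, the general Tikhonov form, of which plain weight decay is the special case Γ = I (a standard result written here from memory, with Z and y as in the linear-regression sketch above):

```latex
\Omega(w) = w^{\mathsf T}\, \Gamma^{\mathsf T}\Gamma\, w
\qquad\Longrightarrow\qquad
w_{reg} = \bigl(Z^{\mathsf T} Z + \lambda\, \Gamma^{\mathsf T}\Gamma\bigr)^{-1} Z^{\mathsf T} y
```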
12:36 2014-10-08
you have to use the regularizer, because without
the regularizer, you're going to get overfitting
12:38 2014-10-08
but there are guidelines to choose the regularizer
12:38 2014-10-08
after you choose the regularizer, there is still the check
provided by the choice of λ
12:39 2014-10-08
practical rule:
stochastic noise is 'high-frequency'
deterministic noise is also non-smooth
12:41 2014-10-08
because of this, here is the guideline for
choosing a regularizer:
=> constrain learning towards smoother hypotheses
12:42 2014-10-08
regularization is a cure, and the cure has a side-effect
12:42 2014-10-08
it's a cure for fitting the noise
12:43 2014-10-08
punishing the noise more than you punish the signal
12:43 2014-10-08
in most parameterizations, small weights correspond
to smoother hypotheses; that's why small weights, i.e. "weight decay",
work well in those cases.
12:45 2014-10-08
general form of augmented error
calling the regularizer Ω = Ω(h)
12:46 2014-10-08
we minimize
Eaug(h) = Ein(h) + (λ/N) * Ω(h)  // this is what we minimize
12:47 2014-10-08
Eaug is better than Ein as a proxy for Eout
12:50 2014-10-08
augmented error(Eaug) is better than Ein for approximating Eout
12:51 2014-10-08
we found a better proxy for the out-of-sample error (Eout)
12:51 2014-10-08
how do we choose the regularizer?
mainly a heuristic choice
12:52 2014-10-08
perfect hypothesis set
12:52 2014-10-08
the perfect regularizer Ω:
constraint in the 'direction' of the target function
12:52 2014-10-08
regularization is an attempt to reduce overfitting
12:55 2014-10-08
harms the overfitting (the noise) more than the fitting (the signal)
12:56 2014-10-08
guidelines:
move in the direction of smoother
12:56 2014-10-08
we have the error function for the movie-rating example
12:57 2014-10-08
the notion of simple here is very interesting
12:59 2014-10-08
now you're regularizing toward the simpler solution
13:04 2014-10-08
what happens if you choose a bad Ω? // Ω = regularizer
we don't worry too much, because we have the saving grace
of λ; we're going to use validation
13:06 2014-10-08
validation will tell us it's harmful, and we'll factor
the regularizer out of the game altogether.
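a sketch of what "factoring the regularizer out via validation" could look like: try several λ candidates (including 0) and keep the one with the lowest validation error; the split fraction and grid are illustrative assumptions, and solve_weight_decay is the earlier sketch:

```python
import numpy as np

def choose_lambda(Z, y, lambdas, val_fraction=0.3, seed=0):
    """Pick lambda by validation error on a held-out split.
    Including 0.0 in the candidates lets validation turn the
    regularizer off entirely if it turns out to be harmful."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_val = int(val_fraction * len(y))
    val, train = idx[:n_val], idx[n_val:]

    best_lam, best_err = None, np.inf
    for lam in lambdas:
        w = solve_weight_decay(Z[train], y[train], lam)  # earlier sketch
        err = np.mean((Z[val] @ w - y[val]) ** 2)        # validation error
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# e.g. choose_lambda(Z, y, lambdas=[0.0, 0.01, 0.1, 1.0])
```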
13:08 2014-10-08
neural network regularizer
13:09 2014-10-08
weight decay
13:09 2014-10-08
so we have this big network, layer upon layer upon layer...
13:11 2014-10-08
I'm looking at the functionalities that I'm implementing
13:12 2014-10-08
as you increase the weights, you're going to enter the more
interesting nonlinearity here.
13:12 2014-10-08
you're going from the most simple to the most complex
13:13 2014-10-08
weight decay: from linear (small weights) to logical (large weights)
13:13 2014-10-08
weight elimination:
fewer weights => smaller VC dimension
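a small sketch of the weight-elimination regularizer as it is usually written, Ω(w) = Σ wᵢ²/(β² + wᵢ²); β is an assumed scale parameter:

```python
import numpy as np

def weight_elimination(w, beta=1.0):
    """Weight-elimination regularizer: sum_i w_i^2 / (beta^2 + w_i^2).
    Acts like weight decay for |w_i| << beta (pushes them toward 0,
    i.e. eliminates them) and like a roughly constant penalty for
    |w_i| >> beta (large weights are barely penalized further)."""
    w = np.asarray(w)
    return np.sum(w**2 / (beta**2 + w**2))
```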
13:15 2014-10-08
early stopping as a regularizer
13:17 2014-10-08
regularization through the optimizer
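a minimal sketch of early stopping: track a held-out validation error during training and keep the best weights, instead of letting the optimizer minimize Ein all the way; update_step and val_error are placeholder names:

```python
import copy

def train_with_early_stopping(w, update_step, val_error, n_epochs):
    """Return the weights at the epoch with the lowest validation
    error ("stop early"), rather than the fully trained weights."""
    best_w, best_err = copy.deepcopy(w), val_error(w)
    for _ in range(n_epochs):
        w = update_step(w)    # e.g. one pass of SGD / backpropagation
        err = val_error(w)    # error on a held-out validation set
        if err < best_err:
            best_w, best_err = copy.deepcopy(w), err
    return best_w
```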
13:18 2014-10-08
the optimal λ:
as you increase the noise, you need more regularization
-----------------------------------------------------
13:38 2014-10-08
there are regularizers that have stood the test of time
13:38 2014-10-08
machine learning is somewhere between theory & practice
13:39 2014-10-08