machine learning error correction

1. Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.

Which of the following are true? Check all that apply.

J(θ) will be a convex function, so gradient descent should converge to the global minimum.

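A convex function is bowl-shaped like a quadratic, though it need not be quadratic. The key property is that any local minimum is also the global minimum.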

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.

The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.

[EX] The positive and negative examples cannot be separated by a straight line, but that does not stop gradient descent from converging, because J(θ) is still convex. Adding polynomial features lets the model fit a non-linear decision boundary.

Because the positive and negative examples cannot be separated using a straight line, linear regression will perform as well as logistic regression on this data.

[EX] Linear regression often does not work well for classification problems.
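The effect of polynomial features can be sketched with plain NumPy gradient descent. This is a minimal sketch, not the course's code. The circular data set, the learning rate α = 0.5 and the iteration count are assumptions chosen for illustration. Both feature sets train without trouble because the cost is convex, but only the quadratic features can represent a circular decision boundary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_features(x1, x2, degree):
    """Degree 1: [1, x1, x2]; degree 2 adds x1^2, x1*x2, x2^2 (the six-term model above)."""
    cols = [np.ones_like(x1), x1, x2]
    if degree == 2:
        cols += [x1 ** 2, x1 * x2, x2 ** 2]
    return np.column_stack(cols)

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def gradient_descent(X, y, alpha=0.5, iters=5000):
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= alpha * X.T @ (h - y) / len(y)  # simultaneous update of all theta_j
        history.append(cost(theta, X, y))
    return theta, history

# Hypothetical data: y = 1 inside the unit circle, y = 0 outside (not linearly separable)
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-2.0, 2.0, size=(2, 400))
y = (x1 ** 2 + x2 ** 2 < 1.0).astype(float)

results = {}
for degree in (1, 2):
    X = map_features(x1, x2, degree)
    theta, history = gradient_descent(X, y)
    accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
    results[degree] = (history, accuracy)
    print(f"degree {degree}: cost {history[0]:.3f} -> {history[-1]:.3f}, training accuracy {accuracy:.2f}")
```

Expect the degree-2 model to reach a clearly higher training accuracy than the degree-1 model. The best linear boundary can do little better than predicting the majority class.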

2. Which of the following statements are true? Check all that apply.

The cost function J(θ) for logistic regression trained with m≥1 examples is always greater than or equal to zero.

For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).

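[] Not for this reason. The logistic regression cost J(θ) is convex, so gradient descent does not get stuck in a local minimum. These three algorithms are preferred because they are often faster than gradient descent, and you don't need to manually pick the learning rate α.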

The one-vs-all technique allows you to use logistic regression for problems in which each y(i) comes from a fixed, discrete set of values.

Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).

[] We train one classifier per class, so three classes need three classifiers.
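Here is a minimal one-vs-all sketch in NumPy. It assumes plain batch gradient descent as the binary learner, instead of an advanced optimizer such as fminunc. The data are three hypothetical Gaussian clusters. The code trains exactly one classifier per class (class k vs. everything else) and predicts the class whose classifier outputs the highest probability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, alpha=0.1, iters=2000):
    """Plain batch gradient descent on the (unregularized) logistic cost."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def one_vs_all(X, y, num_classes):
    """Train one binary classifier per class: class k (y = 1) vs. all other classes (y = 0)."""
    return np.array([train_logistic(X, (y == k).astype(float)) for k in range(num_classes)])

def predict(all_theta, X):
    # Pick the class whose classifier gives the highest h_theta(x).
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)

# Hypothetical 3-class data: three Gaussian clusters in 2-D
rng = np.random.default_rng(1)
centers = np.array([[0.0, 3.0], [3.0, -2.0], [-3.0, -2.0]])
y = np.repeat(np.arange(3), 50)
points = centers[y] + rng.normal(scale=0.7, size=(150, 2))
X = np.column_stack([np.ones(len(points)), points])  # add bias column x0 = 1

all_theta = one_vs_all(X, y, num_classes=3)
accuracy = np.mean(predict(all_theta, X) == y)
print(all_theta.shape)  # (3, 3): one parameter vector (bias + 2 weights) per class
print(accuracy)
```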

3. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.

Adding a new feature to the model always results in equal or better performance on examples not in the training set.

Introducing regularization to the model always results in equal or better performance on the training set.

[Explanation] Adding more features might result in a model that overfits the training set, and thus can lead to worse performance on examples which are not in the training set.

Adding many new features to the model makes it more likely to overfit the training set.

Introducing regularization to the model always results in equal or better performance on examples not in the training set.

[Explanation] If we introduce too much regularization, we can underfit the training set and get worse performance on the training set.
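For reference, the regularized logistic regression cost from the course is written out below. The notation is assumed from the lectures: m examples, n features, and the bias θ_0 left unpenalized. Raising λ pulls θ_1 through θ_n toward zero, and with λ too large the hypothesis becomes too simple, which is the underfitting case described above. Both terms are non-negative: since 0 < h_θ(x) < 1, every log term is at most 0. This is also why J(θ) ≥ 0.

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \Big[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^{2}
```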

2. Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.

Which are the two?

A human expert on the application domain can confidently predict y when given only the features x (or more generally, if we have some way to be confident that x contains sufficient information to predict y accurately).

[Explanation]: Correct, x contains sufficient information!

Our learning algorithm is able to represent fairly complex functions (for example, if we train a neural network or other model with a large number of parameters).

When we are willing to include high-order polynomial features of x (such as $x_1^2$, $x_2^2$, $x_1 x_2$, etc.).

[Explanation] Wrong: high-order polynomial features may not bring sufficient information.

The classes are not too skewed.

We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).

Wrong!! With few parameters the model is likely to underfit, so more data will not help much.

We train a model that does not use regularization.

Wrong: irrelevant.

We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).

Right.

[Reminder]: Note the concept of sufficient information! The key is whether the parameters or features we collected contain sufficient information. You can review the concept of "sufficiency" from a signal estimation course.

point 3.

Suppose you have trained a logistic regression classifier which is outputting $h_\theta(x)$.

Currently, you predict 1 if $h_\theta(x) \ge \text{threshold}$, and predict 0 if $h_\theta(x) < \text{threshold}$, where the threshold is currently set to 0.5.

Suppose you increase the threshold to 0.7. Which of the following are true? Check all that apply.

The classifier is likely to have unchanged precision and recall, and thus the same F1 score.

The classifier is likely to now have higher recall.

The classifier is likely to have unchanged precision and recall, but higher accuracy.

The classifier is likely to now have higher precision.
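The threshold question can be checked on a toy example. The scores below are made up for illustration. Raising the threshold from 0.5 to 0.7 keeps only the most confident positive predictions, so precision goes up and recall goes down. F1 = 2PR/(P + R) changes too.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels and classifier outputs h_theta(x)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
h = np.array([0.95, 0.85, 0.65, 0.55, 0.65, 0.60, 0.30, 0.20, 0.40, 0.10])

results = {}
for threshold in (0.5, 0.7):
    y_pred = (h >= threshold).astype(int)
    results[threshold] = precision_recall_f1(y_true, y_pred)
    p, r, f1 = results[threshold]
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}, F1={f1:.2f}")
# threshold=0.5: precision=0.67, recall=0.80, F1=0.73
# threshold=0.7: precision=1.00, recall=0.40, F1=0.57
```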

point 4.

Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.

If you always predict spam (output y=1), your classifier will have a recall of 0% and precision of 99%.

If you always predict spam (output y=1), your classifier will have a recall of 100% and precision of 1%.

If you always predict non-spam (output y=0), your classifier will have a recall of 0%.

If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
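The four statements above can be checked by counting. This sketch assumes a hypothetical set of 1,000 emails with exactly 1% spam. For "always predict non-spam", precision is 0/0, i.e. undefined, and is returned as NaN here.

```python
import numpy as np

def metrics(y_true, y_pred):
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    accuracy = float(np.mean(y_pred == y_true))
    precision = tp / (tp + fp) if tp + fp else float("nan")  # undefined: no positive predictions
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, precision, recall

# Hypothetical training set matching the question: 1,000 emails, 1% spam (y = 1)
y = np.array([1] * 10 + [0] * 990)
always_spam = np.ones_like(y)
always_non_spam = np.zeros_like(y)

print(metrics(y, always_spam))      # (0.01, 0.01, 1.0)   -> recall 100%, precision 1%
print(metrics(y, always_non_spam))  # (0.99, nan, 0.0)    -> accuracy 99%, recall 0%
```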

point 5.

Which of the following statements are true? Check all that apply.

Using a very large training set makes it unlikely for the model to overfit the training data.

It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.

If your model is underfitting the training set, then obtaining more data is likely to help.

The "error analysis" process of manually examining the examples which your algorithm got wrong can help suggest good steps to take (e.g., developing new features) to improve your algorithm's performance.

After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.
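The last statement is false, because the threshold is a free parameter. One common approach is to pick the threshold that maximizes F1 on a validation set rather than on the training set. The sketch below makes two assumptions: the validation scores are made up, and the list of candidate thresholds is hand-picked.

```python
import numpy as np

def f1_score(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0, so F1 is 0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pick_threshold(y_val, h_val, candidates):
    """Return the candidate threshold with the highest F1 on the validation set."""
    scores = [f1_score(y_val, (h_val >= t).astype(int)) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Hypothetical validation labels and classifier outputs h_theta(x)
y_val = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 0])
h_val = np.array([0.95, 0.85, 0.65, 0.55, 0.65, 0.60, 0.30, 0.20, 0.40, 0.10])

best_t, best_f1 = pick_threshold(y_val, h_val, candidates=[0.25, 0.35, 0.5, 0.7, 0.9])
print(f"best threshold {best_t}, F1 {best_f1:.3f}")  # best threshold 0.35, F1 0.833
```

On this toy data the best threshold is 0.35, not 0.5. On real data the candidate list would typically be a finer grid.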
