天天看點

Machine Learning Error Correction

1. Suppose you have the following training set, and fit a logistic regression classifier $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$.

Which of the following are true? Check all that apply.

J(θ)  will be a convex function, so gradient descent should converge to the global minimum.

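A convex function has no local minimum other than its global minimum. A quadratic is one example of a convex function, but convex functions need not be quadratic, and the logistic regression cost is not.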

Adding polynomial features (e.g., instead using $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_1 x_2 + \theta_5 x_2^2)$) could increase how well we can fit the training data.

The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.

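[EX] The positive and negative examples cannot be separated using a straight line, but that does not stop gradient descent from converging, because J(θ) is still convex. Polynomial features additionally let the model learn a non-linear decision boundary that fits this data better.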

Because the positive and negative examples cannot be separated using a straight line, linear regression will perform as well as logistic regression on this data.

[EX] Linear regression often does not work well for classification problems.
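The first two options of question 1 can be checked numerically. The quiz's own training set is not reproduced here, so this is a minimal pure-Python sketch on a hypothetical stand-in: positives clustered near the origin and negatives on a surrounding ring, so no straight line separates them. It runs batch gradient descent on J(θ) with a hand-picked learning rate of 0.1, once with the raw features [1, x1, x2] and once with the degree-2 features from the polynomial option. All function names are illustrative, not from the course.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def linear_features(x1, x2):
    return [1.0, x1, x2]

def poly_features(x1, x2):
    # Degree-2 mapping matching theta_0 ... theta_5 in the polynomial option
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[y log h + (1 - y) log(1 - h)], in a numerically stable form."""
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * f for t, f in zip(theta, xi))
        # -log(sigmoid(z)) = log(1 + e^-z);  -log(1 - sigmoid(z)) = log(1 + e^z)
        total += math.log1p(math.exp(-z)) if yi == 1 else math.log1p(math.exp(z))
    return total / len(y)

def gradient_descent(X, y, alpha=0.1, iters=3000):
    """Batch gradient descent; returns final theta and J(theta) after every step."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    history = []
    for _ in range(iters):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(sum(t * f for t, f in zip(theta, xi))) - yi
            for j in range(n):
                grad[j] += err * xi[j] / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history

# Hypothetical toy data: positives near the origin, negatives on a ring around them
positives = [(0, 0.5), (0.5, 0), (0, -0.5), (-0.5, 0), (0.3, 0.3), (-0.3, -0.3)]
negatives = [(2, 0), (0, 2), (-2, 0), (0, -2),
             (1.5, 1.5), (-1.5, 1.5), (1.5, -1.5), (-1.5, -1.5)]
points = positives + negatives
y = [1] * len(positives) + [0] * len(negatives)

_, linear_hist = gradient_descent([linear_features(*p) for p in points], y)
_, poly_hist = gradient_descent([poly_features(*p) for p in points], y)
print(f"linear features: J = {linear_hist[-1]:.3f}")
print(f"poly features:   J = {poly_hist[-1]:.3f}")
```

With this step size the cost decreases at every iteration for both feature sets, so non-separability does not break gradient descent. Because this toy data is symmetric about the origin, the best linear model can do no better than a constant prediction, while the polynomial features drive J(θ) much lower. These particular points happen to be separable by a circle, so the polynomial cost keeps creeping toward zero rather than settling at a finite minimizer.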

2. Which of the following statements are true? Check all that apply.

The cost function  J(θ)  for logistic regression trained with  m≥1  examples is always greater than or equal to zero.

For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).

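[] Not for this reason. J(θ) for logistic regression is convex, so gradient descent does not get stuck in a local minimum. These algorithms are preferred because they are often faster than gradient descent and you don't need to manually pick the learning rate α.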

The one-vs-all technique allows you to use logistic regression for problems in which each  y(i)  comes from a fixed, discrete set of values.

Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).

[] We train one classifier for each class, so three classes require three classifiers.
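The one-vs-all technique from the last two statements can be sketched in plain Python. This is a minimal illustration under stated assumptions: a hypothetical three-class toy dataset in three well-separated clusters, and a bare-bones gradient-descent trainer. The names `train_logistic`, `one_vs_all` and `predict` are made up for this sketch. One binary classifier is trained per class (that class relabeled as 1, every other class as 0), and a new example goes to the class whose classifier outputs the highest $h_\theta(x)$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, alpha=0.1, iters=2000):
    """Plain batch gradient descent for binary logistic regression (labels 0/1)."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for xi, yi in zip(X, y):
            err = sigmoid(sum(t * f for t, f in zip(theta, xi))) - yi
            for j in range(n):
                grad[j] += err * xi[j] / m
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def one_vs_all(X, labels, classes):
    """Train one binary classifier per class: class c versus all the others."""
    return {c: train_logistic(X, [1 if lab == c else 0 for lab in labels])
            for c in classes}

def predict(classifiers, x):
    """Pick the class whose classifier gives the highest h_theta(x)."""
    return max(classifiers,
               key=lambda c: sigmoid(sum(t * f for t, f in zip(classifiers[c], x))))

# Hypothetical toy data: three clusters, labels drawn from the fixed set {1, 2, 3}
points = [(0, 3), (0.5, 3.5), (-0.5, 2.5),      # class 1
          (-3, -2), (-3.5, -2.5), (-2.5, -1.5),  # class 2
          (3, -2), (3.5, -2.5), (2.5, -1.5)]     # class 3
labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
X = [[1.0, x1, x2] for x1, x2 in points]  # prepend the bias feature x0 = 1

classifiers = one_vs_all(X, labels, classes=[1, 2, 3])
print("classifiers trained:", len(classifiers))
print("prediction for (0, 3):", predict(classifiers, [1.0, 0.0, 3.0]))
```

Taking the argmax over the per-class outputs, rather than thresholding each one at 0.5, guarantees exactly one predicted class even when several classifiers are unsure.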

3. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.

Adding a new feature to the model always results in equal or better performance on examples not in the training set.

Introducing regularization to the model always results in equal or better performance on the training set.

[Explanation] Adding more features might result in a model that overfits the training set, and thus can lead to worse performance on examples that are not in the training set.

Adding many new features to the model makes it more likely to overfit the training set.

Introducing regularization to the model always results in equal or better performance on examples not in the training set.

[Explanation] If we introduce too much regularization, we can underfit the training set and get worse performance on the training set, and we can also do worse on examples not in the training set.
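The standard regularized cost for logistic regression makes the two explanations above concrete. Here $\lambda$ is the regularization parameter, and by convention $\theta_0$ is not penalized:

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta\big(x^{(i)}\big) + \big(1-y^{(i)}\big)\log\Big(1-h_\theta\big(x^{(i)}\big)\Big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
```

As $\lambda$ grows, the penalty pushes $\theta_1$ through $\theta_n$ toward 0, so $h_\theta(x)$ approaches the constant $g(\theta_0)$ and the model underfits. Training error rises, and error on new examples can rise too, so neither "always equal or better" statement about regularization holds.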

2. Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.

Which are the two?

A human expert on the application domain can confidently predict y when given only the features x (or more generally, if we have some way to be confident that x contains sufficient information to predict y accurately).

[Explanation] Correct: x contains sufficient information.

Our learning algorithm is able to represent fairly complex functions (for example, if we train a neural network or other model with a large number of parameters).

When we are willing to include high order polynomial features of x (such as $x_1^2$, $x_2^2$, $x_1 x_2$, etc.).

[Explanation] Wrong: high-order polynomial features are computed from x itself, so they cannot add information that x does not already contain.

The classes are not too skewed.

We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).

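Wrong: a model with few parameters is likely to underfit (high bias), and more data does not help an underfitting model.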

We train a model that does not use regularization.

Wrong: whether or not we use regularization is irrelevant to this question.

We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).

Correct.

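[Reminder] Note the concept of sufficient information! The key question is whether the features we collect contain sufficient information to predict y. Given that, a model with enough parameters can exploit a large training set. You can review the concept of sufficiency (sufficient statistics) from a signal estimation course.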

3. Suppose you have trained a logistic regression classifier that outputs $h_\theta(x)$.

Currently, you predict 1 if $h_\theta(x) \ge \text{threshold}$, and predict 0 if $h_\theta(x) < \text{threshold}$, where the threshold is set to 0.5.

Suppose you increase the threshold to 0.7. Which of the following are true? Check all that apply.

The classifier is likely to have unchanged precision and recall, and thus the same $F_1$ score.

The classifier is likely to now have higher recall.

The classifier is likely to have unchanged precision and recall, but higher accuracy.

The classifier is likely to now have higher precision.
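A quick way to check question 3 is to compute precision and recall on a handful of predictions at both thresholds. The sketch below assumes ten made-up values of $h_\theta(x)$ with their true labels, and the helper names are illustrative. Raising the threshold from 0.5 to 0.7 means the classifier predicts 1 only when it is more confident.

```python
def precision_recall(scores, labels, threshold):
    """Predict 1 when h_theta(x) >= threshold, then compare against the true labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical classifier outputs h_theta(x) for ten examples, with true labels
scores = [0.95, 0.80, 0.72, 0.65, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]

for t in (0.5, 0.7):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f} F1={f1(p, r):.2f}")
```

On this toy data precision rises from 0.67 to 1.00 and recall falls from 0.80 to 0.60. That is the typical effect of a higher threshold: fewer false positives, more missed positives. This is consistent with the option about higher precision. How $F_1$ moves depends on the data.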

4. Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.

If you always predict spam (output y=1), your classifier will have a recall of 0% and precision of 99%.

If you always predict spam (output y=1), your classifier will have a recall of 100% and precision of 1%.

If you always predict non-spam (output y=0), your classifier will have a recall of 0%.

If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
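The four spam statements in question 4 can be checked by direct counting. This sketch assumes a hypothetical set of 1,000 emails with exactly 1% spam (10 spam, 990 non-spam). Precision is reported as `None` when the classifier never predicts spam, since it is then 0/0.

```python
def evaluate(preds, labels):
    """Return (precision, recall, accuracy); precision is None if nothing was predicted positive."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    correct = sum(1 for p, y in zip(preds, labels) if p == y)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall, correct / len(labels)

# Hypothetical training set: 1% spam (y=1), 99% non-spam (y=0)
labels = [1] * 10 + [0] * 990

always_spam = evaluate([1] * len(labels), labels)
always_ham = evaluate([0] * len(labels), labels)
print("always spam:     precision, recall, accuracy =", always_spam)
print("always non-spam: precision, recall, accuracy =", always_ham)
```

Always predicting spam gives recall 100% and precision 1%. Always predicting non-spam gives recall 0% and accuracy 99%. So the second, third and fourth statements hold and the first does not, which shows why accuracy alone is misleading on skewed classes.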

5. Which of the following statements are true? Check all that apply.

Using a very large training set makes it unlikely for the model to overfit the training data.

It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.

If your model is underfitting the training set, then obtaining more data is likely to help.

The "error analysis" process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's performance.

After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.
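The last statement is false because the threshold is a free choice, commonly tuned on held-out data. Here is a minimal sketch that picks the threshold with the highest $F_1$ score, assuming hypothetical cross-validation scores and a short list of candidate thresholds. The function name `f1_at` is illustrative.

```python
def f1_at(scores, labels, threshold):
    """F1 = 2PR / (P + R) when predicting 1 for h_theta(x) >= threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical cross-validation outputs h_theta(x) and true labels
cv_scores = [0.9, 0.8, 0.7, 0.45, 0.4, 0.2, 0.1]
cv_labels = [1, 1, 1, 1, 1, 0, 0]
candidates = [0.3, 0.5, 0.7]

best = max(candidates, key=lambda t: f1_at(cv_scores, cv_labels, t))
print("best threshold:", best)
```

On this toy data 0.3 wins, which illustrates that 0.5 is only a default. Choosing the threshold on a cross-validation set rather than the training set avoids tuning it to the same data the model was fit on.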
