1. Suppose you have the following training set, and fit a logistic regression classifier hθ(x)=g(θ0+θ1x1+θ2x2).
Which of the following are true? Check all that apply.
J(θ) will be a convex function, so gradient descent should converge to the global minimum.
[EX] J(θ) is convex (a single bowl shape, like a quadratic), so gradient descent cannot get stuck in a local minimum.
Adding polynomial features (e.g., using hθ(x)=g(θ0+θ1x1+θ2x2+θ3x1^2+θ4x1x2+θ5x2^2) instead) could increase how well we can fit the training data.
The positive and negative examples cannot be separated using a straight line. So, gradient descent will fail to converge.
[EX] Even though the positive and negative examples cannot be separated using a straight line, gradient descent will still converge, because J(θ) is convex; with polynomial features the decision boundary can become nonlinear and fit the data better.
Because the positive and negative examples cannot be separated using a straight line, linear regression will perform as well as logistic regression on this data.
[EX] Linear regression often does not work well on classification problems.
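The claims about J(θ) in question 1 can be checked numerically. A minimal NumPy sketch of the logistic regression cost (the training set and θ below are made-up toy values, not the question's data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Logistic regression cost J(theta): convex in theta, and always >= 0."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

# Toy training set: an intercept column plus two features x1, x2.
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 2.0, 0.3],
              [1.0, 1.5, 2.5],
              [1.0, 0.2, 0.1]])
y = np.array([1.0, 0.0, 1.0, 0.0])

J0 = cost(np.zeros(3), X, y)   # at theta = 0, h = 0.5 everywhere, so J0 = ln 2
```

Because J(θ) is convex, gradient descent from any starting θ moves toward the same global minimum, whether or not the data are linearly separable.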
2. Which of the following statements are true? Check all that apply.
The cost function J(θ) for logistic regression trained with m≥1 examples is always greater than or equal to zero.
For logistic regression, sometimes gradient descent will converge to a local minimum (and fail to find the global minimum). This is the reason we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc).
[EX] Not for this reason: J(θ) for logistic regression is convex, so gradient descent cannot get trapped in a local minimum. We prefer those three algorithms because they are often faster than gradient descent and do not require manually picking the learning rate α.
The one-vs-all technique allows you to use logistic regression for problems in which each y(i) comes from a fixed, discrete set of values.
Since we train one classifier when there are two classes, we train two classifiers when there are three classes (and we do one-vs-all classification).
[EX] We train one classifier for each class, so three classes require three classifiers.
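The "one classifier per class" point can be sketched in NumPy. This is an illustrative implementation, not the course's code: `train_binary`, the step size, and the three made-up clusters are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary(X, y, lr=0.3, steps=3000):
    # Plain batch gradient descent for a single binary logistic classifier.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        theta -= lr * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

def one_vs_all(X, y, num_classes):
    # One classifier per class: classifier k is trained on labels (y == k).
    return [train_binary(X, (y == k).astype(float)) for k in range(num_classes)]

def predict(classifiers, X):
    # Pick the class whose classifier outputs the highest probability.
    scores = np.column_stack([sigmoid(X @ th) for th in classifiers])
    return scores.argmax(axis=1)

# Three made-up, well-separated clusters -> three classifiers, not two.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in ([0, 0], [4, 0], [0, 4])])
X = np.column_stack([np.ones(len(X)), X])      # add the intercept term
y = np.repeat([0, 1, 2], 20)

classifiers = one_vs_all(X, y, num_classes=3)  # three classes -> three classifiers
accuracy = (predict(classifiers, X) == y).mean()
```

At prediction time, each of the K classifiers scores the example and the highest-scoring class wins.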
3. You are training a classification model with logistic regression. Which of the following statements are true? Check all that apply.
Adding a new feature to the model always results in equal or better performance on examples not in the training set.
Introducing regularization to the model always results in equal or better performance on the training set.
[EX] Adding more features might result in a model that overfits the training set, and thus can lead to worse performance on examples which are not in the training set.
Adding many new features to the model makes it more likely to overfit the training set.
Introducing regularization to the model always results in equal or better performance on examples not in the training set.
[EX] If we introduce too much regularization, we can underfit the training set and have worse performance on the training set.
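The regularization trade-off in the note above can be made concrete with the regularized cost. A minimal NumPy sketch with toy values (the intercept θ0 is conventionally not penalized):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg(theta, X, y, lam):
    # Regularized logistic cost: data-fit term plus (lam / 2m) * sum(theta_j^2),
    # summing over j >= 1 so the intercept theta[0] is not penalized.
    h = sigmoid(X @ theta)
    fit = -np.mean(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))
    penalty = lam / (2.0 * len(y)) * np.sum(theta[1:] ** 2)
    return fit + penalty

# Made-up data and a fixed nonzero theta.
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 1.5], [1.0, 0.2]])
y = np.array([1.0, 0.0, 1.0, 0.0])
theta = np.array([0.1, 1.0])

J_plain = cost_reg(theta, X, y, lam=0.0)
J_heavy = cost_reg(theta, X, y, lam=10.0)   # the penalty only adds cost
```

Since the penalty pulls θ toward zero, the minimizer of the λ>0 objective can never fit the training set better than the unregularized minimizer, which is why "always equal or better performance on the training set" is false.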
2. Suppose a massive dataset is available for training a learning algorithm. Training on a lot of data is likely to give good performance when two of the following conditions hold true.
Which are the two?
A human expert on the application domain can confidently predict y when given only the features x (or more generally, if we have some way to be confident that x contains sufficient information to predict y accurately).
[EX] Correct: x contains sufficient information to predict y.
Our learning algorithm is able to represent fairly complex functions (for example, if we train a neural network or other model with a large number of parameters).
When we are willing to include high-order polynomial features of x (such as x1^2, x2^2, x1x2, etc.).
[EX] Wrong: high-order polynomial features do not add information that is not already in x, and they are not one of the two conditions that make a large dataset helpful.
The classes are not too skewed.
We train a learning algorithm with a small number of parameters (that is thus unlikely to overfit).
[EX] Wrong: a model with few parameters has high bias and will underfit, no matter how much data we train on.
We train a model that does not use regularization.
[EX] Wrong: whether we use regularization is irrelevant to the two conditions.
We train a learning algorithm with a large number of parameters (that is able to learn/represent fairly complex functions).
[EX] Right: a low-bias model with many parameters can take advantage of a very large training set.
[Reminder] Note the concept of sufficient information: the key is whether the parameters or features we collect contain sufficient information to predict y. You can compare this with the concept of "sufficiency" from a signal estimation course.
3.
Suppose you have trained a logistic regression classifier which is outputting hθ(x).
Currently, you predict 1 if hθ(x)≥threshold, and predict 0 if hθ(x)<threshold, where the threshold is currently set to 0.5.
Suppose you increase the threshold to 0.7. Which of the following are true? Check all that apply.
The classifier is likely to have unchanged precision and recall, and thus the same F1 score.
The classifier is likely to now have higher recall.
The classifier is likely to have unchanged precision and recall, but higher accuracy.
The classifier is likely to now have higher precision.
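The effect of raising the threshold can be checked directly. A small sketch with made-up classifier scores, using the standard definitions precision = TP/(TP+FP) and recall = TP/(TP+FN):

```python
import numpy as np

def precision_recall(y_true, scores, threshold):
    # Predict 1 when h(x) >= threshold, 0 otherwise.
    pred = (scores >= threshold).astype(int)
    tp = int(np.sum((pred == 1) & (y_true == 1)))
    fp = int(np.sum((pred == 1) & (y_true == 0)))
    fn = int(np.sum((pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = np.array([1, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.6, 0.55, 0.2])   # made-up h(x) outputs

p_low, r_low = precision_recall(y_true, scores, 0.5)    # P = 0.75, R = 1.0
p_high, r_high = precision_recall(y_true, scores, 0.7)  # P = 1.0,  R = 2/3
```

Raising the threshold makes positive predictions more conservative: precision typically goes up and recall goes down, so the F1 score and accuracy generally change as well.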
4.
Suppose you are working on a spam classifier, where spam emails are positive examples (y=1) and non-spam emails are negative examples (y=0). You have a training set of emails in which 99% of the emails are non-spam and the other 1% is spam. Which of the following statements are true? Check all that apply.
If you always predict spam (output y=1), your classifier will have a recall of 0% and precision of 99%.
If you always predict spam (output y=1), your classifier will have a recall of 100% and precision of 1%.
If you always predict non-spam (output y=0), your classifier will have a recall of 0%.
If you always predict non-spam (output y=0), your classifier will have an accuracy of 99%.
5.
Which of the following statements are true? Check all that apply.
Using a very large training set makes it unlikely for the model to overfit the training data.
It is a good idea to spend a lot of time collecting a large amount of data before building your first version of a learning algorithm.
If your model is underfitting the training set, then obtaining more data is likely to help.
The "error analysis" process of manually examining the examples which your algorithm got wrong can help suggest what are good steps to take (e.g., developing new features) to improve your algorithm's performance.
After training a logistic regression classifier, you must use 0.5 as your threshold for predicting whether an example is positive or negative.