How should features be selected for logistic regression?

Logistic regression

1. What is the intuition behind gradient descent? Why is it particularly advantageous for logistic regression?

2. Complete the code to implement the softmax function.

3. Complete the code to calculate cross-entropy loss.

4. Assume b = -3, w = [1, -4, -2, 5, -3, 1, -1, 3]. Given the following feature set f = ["Alex", "are", "best", ",", "sorry", "can", "meet", "we"], what is the predicted classification of "Hi, Dan. Are we still on for Tuesday?"? With what probability?

5. How should features be selected for logistic regression?

6. How does logistic regression learn its weights $w$ and bias $b$?

7. When training a logistic regression model, a training example yields $\sigma(w \cdot x + b) = 0.73$, though the actual value is $y = 0$. Using the cross-entropy loss function, what is the loss?

8. What happens if the learning rate $\eta$ is too high in gradient descent?

9. Classify the following sentence using the described classifier: "good good great bad terrible"

10. What is the sigmoid function? Describe how it can be used for classification.

11. What is the definition of precision?

12. Complete the code to calculate precision.

13. Give an example of a case where recall is 1 (i.e., perfect recall), but a classifier still performs poorly.

Instructor solution

Gus Hahn-Powell, Mar 27, 2022, 4:58:29 PM

Features for logistic regression are often set by hand, with the implementer relying on linguistic intuition to select features that could be significant in determining a class. For instance, in sentiment analysis, it seems logical to include the count of words from a positive lexicon as one feature and the count of words from a negative lexicon as another. Features can be adjusted while constructing a model: an implementer might treat features with weights close to 0 as potential noise to remove, and add or test other ideas for features.

Unsupervised learning could also be used to cluster examples in the training data and surface features.
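As a rough illustration of the hand-crafted approach described above, the following sketch counts lexicon words as features and flags features whose weights are close to 0. The lexicons, the function names (`extract_features`, `near_zero_features`), the weights, and the threshold are all hypothetical and chosen only for this example. Note that whether a weight counts as "close to 0" depends on the scale of its feature, so the threshold is a judgment call.

```python
# Hypothetical sentiment lexicons (assumed for illustration only);
# a real system would likely draw on a curated lexicon.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "awful"}


def extract_features(text):
    """Map a text to two hand-crafted features:
    [count of positive-lexicon words, count of negative-lexicon words]."""
    tokens = text.lower().split()
    positive_count = sum(token in POSITIVE_WORDS for token in tokens)
    negative_count = sum(token in NEGATIVE_WORDS for token in tokens)
    return [positive_count, negative_count]


def near_zero_features(names, weights, threshold=0.05):
    """Return the names of features whose weight magnitude is below
    the (assumed) threshold, as candidates for removal."""
    return [name for name, w in zip(names, weights) if abs(w) < threshold]


features = extract_features("good good great bad terrible")

# Made-up weights standing in for those of a trained model.
candidates = near_zero_features(
    ["positive_count", "negative_count", "token_count"],
    [1.2, -0.9, 0.01],
)
```

Here `features` holds the two counts for the example sentence, and `candidates` lists only the feature whose made-up weight falls under the threshold. A flagged feature is a candidate for removal, not proof of irrelevance; retraining without it and comparing held-out performance is one way to check.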

Think you've got it?

What does a weight of $w_{i} = 0$ imply about its associated feature in a logistic regression model? Assume a bias of 0.

Select one of the following options:

A. The feature may be irrelevant for determining class.
B. The feature does not exist in any training examples.
C. The feature may or may not have a strong positive or negative impact on determining the class.
D. Both A and B.
