Classification problems, sigmoid function, decision boundaries, cross-entropy loss, multi-class
| | Regression | Classification |
|---|---|---|
| Output | Continuous number (any value) | Discrete class label |
| Examples | House price, temperature, revenue | Spam/ham, cancer/benign, cat/dog |
| Loss function | Mean Squared Error (MSE) | Cross-Entropy Loss (Log Loss) |
| Output range | Any real number | Probability 0 to 1 (via sigmoid) |
| Algorithm | Linear regression | Logistic regression |
Linear regression output is unbounded — it can produce values below 0 or above 1, which don't make sense as probabilities. Also, the decision threshold (0.5) may not work correctly when extreme outliers are added to the dataset — the regression line shifts and the threshold breaks down.
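Both problems can be checked numerically. The sketch below is a minimal illustration that fits a one-feature least-squares line to 0/1 labels; the toy data and the helper names `fit_line` and `threshold_x` are assumptions made for this example, not part of the original notes.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def threshold_x(intercept, slope, level=0.5):
    """Feature value where the fitted line crosses `level`."""
    return (level - intercept) / slope


# Hypothetical toy data: one feature, labels 0 or 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]

b0, b1 = fit_line(xs, ys)
cut_before = threshold_x(b0, b1)   # 3.5, midway between the classes
pred_far = b0 + b1 * 10            # about 2.17, not a valid probability

# Add one extreme, correctly labeled class-1 point at x = 50.
b0_out, b1_out = fit_line(xs + [50], ys + [1])
cut_after = threshold_x(b0_out, b1_out)  # roughly 4.7

# The class-1 point at x = 4 now scores below 0.5 and is misclassified.
score_at_4 = b0_out + b1_out * 4
```

The outlier is labeled correctly, yet it flattens the line enough to move the 0.5 crossing past a genuine class-1 example.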
Solution: Use the sigmoid function to convert any real-valued score into a probability between 0 and 1.
The sigmoid function, f(x) = 1/(1 + e^(-x)), maps any real number to a value between 0 and 1:
| Input x | Output f(x) | Interpretation |
|---|---|---|
| x = 0 | 0.5 | Decision boundary — equal probability of either class |
| x → +∞ | → 1 | Almost certain it's class 1 |
| x → -∞ | → 0 | Almost certain it's class 0 |
The sigmoid function creates an S-shaped curve that smoothly transitions from 0 to 1.
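A minimal sketch of the sigmoid, f(x) = 1/(1 + e^(-x)), using only the standard library. The two-branch form is an implementation choice made here to avoid overflow for large negative inputs; it computes the same function.

```python
import math


def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # safe: x < 0, so exp(x) is in (0, 1)
    return z / (1.0 + z)


at_zero = sigmoid(0)          # 0.5, the decision boundary
large_pos = sigmoid(1000)     # 1.0 in floating point
large_neg = sigmoid(-1000)    # 0.0 in floating point
```

The curve is symmetric around 0.5, so `sigmoid(x) + sigmoid(-x)` equals 1 for any `x`.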
Step 1: Fit a linear function to the features (same as linear regression): h(x,w) = w0 + w1*x1 + ... + wn*xn
Step 2: Pass the linear score through the sigmoid to get a probability: f(x) = 1/(1 + e^(-h(x,w)))
If f(x) ≥ 0.5, predict class 1. If f(x) < 0.5, predict class 0. (Threshold can be adjusted depending on the problem.)
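The two steps and the threshold rule can be sketched as follows. The weights and feature values are hypothetical numbers chosen for illustration; in practice the weights come from training.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(weights, features):
    """Step 1: h(x, w) = w0 + w1*x1 + ... + wn*xn (weights[0] is w0)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))


def predict_proba(weights, features):
    """Step 2: convert the linear score into a probability of class 1."""
    return sigmoid(linear_score(weights, features))


def predict(weights, features, threshold=0.5):
    """Predict class 1 when the probability reaches the threshold."""
    return 1 if predict_proba(weights, features) >= threshold else 0


weights = [-4.0, 1.0, 0.5]             # hypothetical trained weights
low = predict(weights, [2.0, 2.0])     # score -1, probability about 0.27, class 0
high = predict(weights, [4.0, 2.0])    # score +1, probability about 0.73, class 1
strict = predict(weights, [4.0, 2.0], threshold=0.8)  # class 0 under a stricter cut
```

Raising the threshold trades missed positives for fewer false alarms, which is why it is left as a parameter.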
Logistic regression finds the best separating line (or hyperplane in multiple dimensions) between two classes in feature space. Points on one side are predicted class 0; points on the other side are class 1.
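With two features, the separating line is the set of points where the linear score w0 + w1*x1 + w2*x2 equals 0. The sketch below solves that equation for x2; the weights are hypothetical and the helper names are assumptions for this example.

```python
def boundary_x2(weights, x1):
    """x2 on the line w0 + w1*x1 + w2*x2 = 0 for a given x1 (needs w2 != 0)."""
    w0, w1, w2 = weights
    return -(w0 + w1 * x1) / w2


def side(weights, x1, x2):
    """Class predicted for a point: 1 if the score is >= 0, else 0."""
    w0, w1, w2 = weights
    return 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0


weights = [-4.0, 1.0, 0.5]               # hypothetical trained weights
on_line = boundary_x2(weights, 2.0)      # 4.0: the point (2, 4) lies on the boundary
above = side(weights, 2.0, 5.0)          # 1
below = side(weights, 2.0, 3.0)          # 0
```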
Logistic regression uses cross-entropy loss, NOT mean squared error. Why? Because MSE creates a non-convex cost function for logistic regression, making gradient descent unreliable. Cross-entropy creates a convex cost function.
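For a single example with label y (0 or 1) and predicted probability p, the cross-entropy loss is -[y*log(p) + (1-y)*log(1-p)]. The sketch below averages it over a dataset; the clipping constant `eps` is an assumption added here so that log(0) is never evaluated.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean cross-entropy loss for binary labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


confident_right = log_loss([1], [0.9])   # about 0.105
unsure = log_loss([1], [0.5])            # about 0.693
confident_wrong = log_loss([1], [0.1])   # about 2.303

# Squared error for the same confident mistake stays below 1.
squared_error_wrong = (1 - 0.1) ** 2     # 0.81
```

The loss grows without bound as a wrong prediction becomes more confident, whereas squared error on a probability can never exceed 1.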
When training with 10 positive examples and 1,000,000 negative examples, the best-scoring decision boundary is pushed far away from the massive negative cluster, toward or even past the positive examples, rather than being placed midway between the classes. As a result, many of the positive (minority) examples can end up on the wrong side.
Ways to balance: find more minority examples, discard majority examples, weight minority class more heavily, replicate minority examples with random perturbation.
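As one illustration of weighting the minority class more heavily, the sketch below computes per-class weights inversely proportional to class frequency. This is a common heuristic, not something the notes prescribe, and the class counts are scaled down from the example above to keep the demo small.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n / (k * count) so every class has equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical imbalanced labels: 10 positives, 1,000 negatives.
labels = [1] * 10 + [0] * 1000
weights = balanced_class_weights(labels)

positive_mass = weights[1] * 10     # total weight carried by the positives
negative_mass = weights[0] * 1000   # total weight carried by the negatives
```

Each example's loss term would then be multiplied by the weight of its class, so the 10 positives count as much in total as the 1,000 negatives.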
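Logistic regression is binary by default. For more than 2 classes, use one-vs-all (one-vs-rest): train one binary classifier per class and predict the class whose classifier returns the highest probability. Don't encode nominal (unordered) classes as integers, because that implies an ordering that doesn't exist.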
Exception: Ordinal multi-class targets (like star ratings 1-5) have a genuine order, so they can be encoded numerically and modeled with ordinal logistic regression.
Classification predicts discrete labels (vs regression which predicts continuous values). Sigmoid: f(0)=0.5, output always 0-1, converts score to probability. Loss function = Cross-Entropy (Log Loss) — NOT MSE. Log loss is convex → gradient descent finds global minimum. For imbalanced data: balance the training classes. Multi-class: use one-vs-all. Don't encode nominal multi-class as integers. Complex decision boundaries = usually overfitting.
Q1. The primary loss function used in logistic regression for binary classification is:
Answer: C
Logistic regression uses Cross-Entropy Loss (Log Loss), not MSE. MSE creates a non-convex optimization problem for logistic regression, meaning gradient descent could get stuck in local minima. Cross-entropy creates a convex cost function, so gradient descent is not trapped by local minima. The per-example loss, -[y*log(p) + (1-y)*log(1-p)], penalizes confident wrong predictions most severely.
Q2. In logistic regression, what is the sigmoid function used for?
Answer: B
The sigmoid function converts any real-valued score (the linear combination of features) into a probability between 0 and 1. This probability represents the likelihood that the input belongs to class 1 (or the positive class). For example, P(email is spam | its features) = sigmoid(linear score).
Q3. What does the sigmoid function output when its input is exactly 0?
Answer: D
f(0) = 1/(1+e^0) = 1/(1+1) = 1/2 = 0.5. The sigmoid always equals 0.5 at x=0. This is the decision boundary: if the linear score h(x,w) = 0, the model is exactly 50-50 between the two classes. Positive scores give probability above 0.5 (predict class 1), negative scores give below 0.5 (predict class 0).
Q4. Logistic regression can be extended to multi-class problems using:
Answer: B
One-vs-all (one-vs-rest): for k classes, train k separate binary classifiers. Each classifier answers "Is this class i, or is it something else?" For a new input, run all k classifiers and predict the class whose classifier returns the highest probability. This correctly handles multi-class without creating false orderings.
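The prediction step of one-vs-all can be sketched as below. The per-class weights are hypothetical placeholders standing in for k separately trained binary models, and `predict_one_vs_all` is a name chosen for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_one_vs_all(classifiers, features):
    """Run every binary classifier and return (best label, all probabilities).

    `classifiers` maps a class label to its weight list [w0, w1, ..., wn].
    """
    probs = {}
    for label, w in classifiers.items():
        score = w[0] + sum(wi * xi for wi, xi in zip(w[1:], features))
        probs[label] = sigmoid(score)
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical weights for three "class i vs. everything else" models.
classifiers = {
    "blond": [1.0, -2.0, 0.0],
    "brown": [-1.0, 1.0, 1.0],
    "red": [-2.0, 0.0, 2.0],
}

label_a, probs_a = predict_one_vs_all(classifiers, [0.5, 2.0])  # "red"
label_b, probs_b = predict_one_vs_all(classifiers, [0.0, 0.0])  # "blond"
```

The k probabilities come from independent models, so they need not sum to 1; only their ranking is used.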
Q5. Why is it problematic to encode nominal multi-class categories as integers (blond=0, brown=1, red=2)?
Answer: B
Encoding nominal (unordered) categories as integers implies mathematical relationships that don't exist. The model would treat "red" (2) as twice "brown" (1), or assume that "brown" (1) lies between "blond" (0) and "red" (2). For nominal categories, use one-hot encoding (a separate binary feature for each category). Ordinal encoding is only appropriate when the categories have a genuine meaningful order.
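A minimal sketch of one-hot encoding for the hair-color example, using plain lists. The `one_hot` helper is written for this illustration and is not a library function.

```python
def one_hot(value, categories):
    """Return a list with a 1 in the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


categories = ["blond", "brown", "red"]

blond = one_hot("blond", categories)  # [1, 0, 0]
brown = one_hot("brown", categories)  # [0, 1, 0]
red = one_hot("red", categories)      # [0, 0, 1]
```

Each category gets its own binary feature, so no category is numerically larger than or "between" the others.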
Q6. True or False: Logistic regression can be used for both binary and multi-class classification problems.
Answer: A — True
While logistic regression is inherently binary (outputs a probability between 0 and 1 for two classes), it can handle multi-class problems using the one-vs-all strategy. For k classes, you build k binary logistic regression models. At prediction time, each model gives a probability and you select the class with the highest probability.
Q7. The cross-entropy cost function for logistic regression is convex. Why does this matter?
Answer: B
In a convex function, every local minimum is also a global minimum. When the cost function is convex, gradient descent with a suitable learning rate converges toward the global minimum and cannot get stuck in a worse local minimum. This is why cross-entropy is used instead of MSE for logistic regression; MSE creates a non-convex surface with potential local minima.
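Because the cross-entropy cost is convex, gradient descent started from different initial weights should settle on essentially the same solution. The sketch below checks this on a tiny one-feature dataset; the data, learning rate, step count, and starting points are all assumptions chosen for the demonstration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_log_loss(w0, w1, xs, ys):
    """Mean cross-entropy of the model sigmoid(w0 + w1*x) on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w0 + w1 * x), 1e-15), 1.0 - 1e-15)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def train(xs, ys, start=(0.0, 0.0), lr=0.1, steps=10000):
    """Batch gradient descent; the gradient of the mean loss is mean((p - y) * x)."""
    w0, w1 = start
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        w0 -= lr * sum(errors) / n
        w1 -= lr * sum(e * x for e, x in zip(errors, xs)) / n
    return w0, w1


# Hypothetical data with overlapping classes, so a finite minimum exists.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

run_a = train(xs, ys, start=(0.0, 0.0))
run_b = train(xs, ys, start=(3.0, -1.0))
loss_start = mean_log_loss(0.0, 0.0, xs, ys)  # log(2), about 0.693
loss_a = mean_log_loss(*run_a, xs, ys)
loss_b = mean_log_loss(*run_b, xs, ys)
```

The classes overlap on purpose: with perfectly separable data the loss keeps decreasing as the weights grow, so no finite minimizer exists.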
Q8. When training logistic regression with severely imbalanced classes (10 positive vs 1,000,000 negative), without fixing the imbalance the model will tend to:
Answer: B
With 10 positive vs 1,000,000 negative examples, the boundary that minimizes total error is pushed far away from the massive negative cluster, rather than sitting between the two groups. This means many positive (minority) examples will be on the wrong side. Fix: rebalance the training classes, for example by finding more minority examples, discarding majority examples, or weighting the minority class more heavily.
Q9. The key difference between logistic regression and linear regression is:
Answer: B
The core difference: linear regression predicts continuous values (no bound on output). Logistic regression passes the linear score through a sigmoid function to produce a probability (0 to 1) for classification. Both use gradient descent. Linear regression uses MSE loss. Logistic regression uses cross-entropy loss. Use logistic regression when your output is a discrete class label.
Q10. A logistic regression model achieves a perfect decision boundary that separates all training examples with zero error. This most likely indicates:
Answer: C
Perfect separation on training data is a red flag for overfitting. Complex decision boundaries that perfectly classify all training examples usually memorize the specific noise and quirks of the training set. On new, unseen data, such models typically perform poorly. A simpler decision boundary that makes a few training errors often generalizes much better.
Q11. Non-linear decision boundaries in logistic regression can be achieved by:
Answer: C
Logistic regression by itself produces linear decision boundaries. To get non-linear boundaries (like circles, curves, or complex shapes), you explicitly add non-linear features to the input — for example, x1^2, x2^2, or x1*x2. The model is then "linear in the features" but the boundary is non-linear in the original input space. However, too many non-linear features risks overfitting.
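As a sketch of this idea, the code below expands two inputs into squared and interaction terms and uses hypothetical weights that make the boundary a circle of radius 2. The feature set and weights are assumptions for illustration only.

```python
def expand(x1, x2):
    """Add non-linear features: x1^2, x2^2, and x1*x2."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]


def score(weights, x1, x2):
    """Linear score over the expanded features (weights[0] is the intercept)."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], expand(x1, x2)))


def classify(weights, x1, x2):
    """Class 1 when the score is >= 0, which matches probability >= 0.5."""
    return 1 if score(weights, x1, x2) >= 0 else 0


# Hypothetical weights encoding x1^2 + x2^2 - 4 = 0, a circle of radius 2.
weights = [-4.0, 0.0, 0.0, 1.0, 1.0, 0.0]

inside = classify(weights, 0.0, 0.0)    # 0: inside the circle
outside = classify(weights, 3.0, 0.0)   # 1: outside the circle
on_circle = score(weights, 2.0, 0.0)    # 0.0: exactly on the boundary
```

The model is still linear in its weights; only the inputs were transformed, so training proceeds exactly as before.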
Q12. For binary classification (class 0 and class 1), the target variable must be encoded as:
Answer: B
Logistic regression requires the target variable to be 0 (negative class) and 1 (positive class). The log loss formula uses these values directly as indicator variables — the y_i terms in the formula switch between the two cost terms. Examples: spam=1, ham=0; cancer=1, benign=0; male=0, female=1.
Q13. The decision boundary of a logistic regression model is located where:
Answer: B
The decision boundary is the set of points where the model is exactly 50/50 (probability = 0.5) between the two classes. Since sigmoid(0) = 0.5, the decision boundary is where h(x,w) = w0 + w1*x1 + ... = 0. Points where h(x,w) > 0 have probability > 0.5 (predict class 1). Points where h(x,w) < 0 have probability < 0.5 (predict class 0).