Modeling philosophy, Occam's Razor, bias-variance tradeoff, baselines, evaluation
The key steps in modeling are: building, fitting, and validating the model.
"The simplest explanation is best."
In model building, Occam's Razor means: prefer the simpler model when two models perform similarly. A simpler model with fewer parameters tends to be more interpretable, more robust, and better at generalizing to new data.
Every model makes errors from two sources that the modeler can control: bias and variance. The goal is to find the balance between them.
Bias is error from wrong assumptions in the model. Example: assuming the data is linear when it is actually curved. The model is too simple, so it misses real patterns in BOTH training and test data.
Symptoms: High error on training data AND high error on test data.
Variance is error from too much sensitivity to the training data. The model memorizes noise and specific training examples rather than learning the true pattern.
Symptoms: Very low error on training data BUT high error on new test data.
The tradeoff: increasing model complexity typically reduces bias but increases variance, so the two cannot both be pushed to zero at once. The goal is to find the sweet spot where both are low enough that the error on unseen data is smallest.
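The tradeoff can be seen on a small synthetic example. The sketch below is an illustration only: it uses k-nearest-neighbour regression, which these notes do not cover, because its complexity is controlled by a single number. With k = 1 the model is as flexible as possible and reproduces every training point; with k equal to the training set size it always predicts the training mean. The data generator, the noise level, and the chosen values of k are all assumptions.

```python
import math
import random


def make_data(n, rng, noise_sd=0.3):
    """Sample n points from a curved signal, sin(x), plus Gaussian noise."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(y - knn_predict(x, train_x, train_y, k)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


rng = random.Random(0)
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(60, rng)

# k = 1 is the most flexible model; k = 60 always predicts the training mean.
results = {}
for k in (1, 7, 60):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(test_x, test_y, train_x, train_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The k = 1 model has zero training error by construction but a clearly non-zero test error (high variance), while the k = 60 model has a large error on both sets (high bias). An intermediate k is where one would look for the sweet spot.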
Good forecasting models produce a probability distribution over possible outcomes, not a single deterministic prediction. Properties of valid probabilities: they sum to 1, they are never negative, and rare events get small but non-zero probabilities.
A good forecaster: thinks probabilistically, updates forecasts with new information, and looks for consensus across multiple models.
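As a minimal sketch of these ideas, the code below checks that a forecast is a valid probability distribution (no negative values, total of 1) and forms a simple consensus by averaging two forecasts. The weather outcomes, the numbers, and the function names are hypothetical, and plain averaging is only one of several ways to combine models.

```python
import math


def is_valid_distribution(probs, tol=1e-9):
    """Return True if all probabilities are non-negative and sum to 1."""
    values = list(probs.values())
    non_negative = all(p >= 0.0 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


def average_forecasts(forecasts):
    """Average several forecasts over the same outcomes, outcome by outcome."""
    outcomes = forecasts[0].keys()
    return {o: sum(f[o] for f in forecasts) / len(forecasts) for o in outcomes}


# Hypothetical forecasts from two models over the same three outcomes.
model_a = {"sunny": 0.70, "rain": 0.25, "snow": 0.05}
model_b = {"sunny": 0.50, "rain": 0.40, "snow": 0.10}

consensus = average_forecasts([model_a, model_b])
print(consensus, is_valid_distribution(consensus))
```

Averaging valid distributions over the same outcomes always yields another valid distribution, so the consensus can be checked with the same function.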
| Type | Description | Example |
|---|---|---|
| First Principle Models | Based on theoretical understanding of how the system works | Physics simulation, scientific formula |
| Data-Driven Models | Based on observed correlations between input and output from data | Linear regression trained on historical data |
Good models are typically a mixture of both.
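One way to picture such a mixture: theory supplies the form of the model and data supply its parameters. The sketch below assumes a free-fall setting (distance d = 0.5 * g * t^2), which is an illustrative choice and not an example from these notes. The measurements are synthetic and noise-free, and the function name is hypothetical.

```python
def fit_quadratic_coefficient(times, distances):
    """Least-squares estimate of c in d = c * t**2 (no intercept)."""
    numerator = sum(d * t ** 2 for t, d in zip(times, distances))
    denominator = sum(t ** 4 for t in times)
    return numerator / denominator


# Theory supplies the functional form (free fall: d = 0.5 * g * t**2);
# the data supply the coefficient.
G = 9.81  # m/s^2, used here only to generate illustrative measurements
times = [0.5, 1.0, 1.5, 2.0]
distances = [0.5 * G * t ** 2 for t in times]  # synthetic, noise-free

c_hat = fit_quadratic_coefficient(times, distances)
g_hat = 2.0 * c_hat
print(f"estimated g = {g_hat:.2f} m/s^2")
```

A purely data-driven model would have to discover the quadratic shape on its own; a purely first-principle model would not use the measurements at all.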
Before declaring your sophisticated model is good, you must first compare it to the simplest reasonable alternatives — baselines. If your model only barely beats a baseline, it is not very impressive.
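Two common baselines can be sketched in a few lines: always predict the most frequent class (classification) or always predict the training mean (regression). The function names and the toy data are hypothetical.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels):
    """Classification baseline: always predict the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features=None: most_common_label


def mean_baseline(train_targets):
    """Regression baseline: always predict the mean of the training targets."""
    average = mean(train_targets)
    return lambda _features=None: average


# Hypothetical toy data, for illustration only.
predict_label = majority_class_baseline(["spam", "ham", "ham", "ham"])
predict_price = mean_baseline([100.0, 120.0, 140.0])
print(predict_label(), predict_price())
```

Both baselines ignore the input features entirely, so any useful model should beat them on the same test data.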
Always evaluate on out-of-sample (test) data that was NOT used for training. A model that performs well on its own training data may simply have memorized it (overfitting).
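A minimal sketch of holding out test data, assuming a simple random split is appropriate (time-ordered data, for example, would need a different scheme). The function name, the 25% test fraction, and the seed are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and hold out a fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(20))  # stand-in for 20 labelled examples
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

The model is fitted on `train_rows` only; `test_rows` is touched once, at evaluation time.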
| Model type | Output | Key metrics |
|---|---|---|
| Classification | Discrete labels (spam/not spam, cat/dog) | Accuracy, Precision, Recall, F1-score, Confusion Matrix |
| Regression | Continuous numerical values (price, temperature) | MSE (mean squared error), RMSE, MAE, R-squared |
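The regression metrics in the table can be computed directly from paired true and predicted values. The sketch below uses the usual definitions; the function name and the four data points are hypothetical, and it assumes the true values are not all identical (otherwise R-squared is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r ** 2 for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    total_ss = sum((t - mean_true) ** 2 for t in y_true)
    # R-squared compares the model with the "always predict the mean" baseline.
    r2 = 1.0 - (mse * n) / total_ss
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```

An R-squared of 0 means the model is no better than predicting the mean; values close to 1 mean most of the variation is explained.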
Note: Accuracy can be misleading for imbalanced datasets — a model that always predicts the majority class achieves high accuracy but is useless (see Lecture 4).
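The accuracy trap is easy to reproduce. The sketch below computes accuracy, precision, recall, and F1 for a predictor that always outputs the majority class on a hypothetical 95/5 dataset. The labels, the class ratio, and the convention of reporting 0 when a denominator is zero are assumptions.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Convention assumed here: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced data: 95 legitimate emails, 5 spam.
y_true = ["ham"] * 95 + ["spam"] * 5
always_ham = ["ham"] * 100  # majority-class predictor

scores = classification_metrics(y_true, always_ham, positive="spam")
print(scores)
```

Accuracy comes out at 95% even though the predictor never catches a single spam email, which recall and F1 (both 0) make obvious.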
| Property | Underfitting (High Bias) | Good Fit | Overfitting (High Variance) |
|---|---|---|---|
| Training error | High | Low | Very low |
| Test error | High | Low | High |
| Model complexity | Too simple | Appropriate | Too complex |
| Problem | Misses real patterns | Generalizes well | Memorizes noise |
| Solution | More features, more complexity | Keep as-is | Regularization, simpler model, more data |
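The table can be turned into a rough rule of thumb. The sketch below is a simplification: what counts as "high" error or a "large" gap depends on the problem, so both thresholds are parameters the analyst has to choose. The function name and the example numbers are hypothetical.

```python
def diagnose_fit(train_error, test_error, acceptable_error, max_gap):
    """Rough label for a model based on its training and test error.

    acceptable_error and max_gap are problem-specific thresholds chosen by
    the analyst; they are assumptions of this sketch, not standard values.
    """
    if train_error > acceptable_error:
        return "underfitting (high bias)"
    if test_error - train_error > max_gap:
        return "overfitting (high variance)"
    return "good fit"


# Hypothetical error values, for illustration only.
print(diagnose_fit(0.40, 0.42, acceptable_error=0.15, max_gap=0.05))
print(diagnose_fit(0.01, 0.35, acceptable_error=0.15, max_gap=0.05))
print(diagnose_fit(0.10, 0.12, acceptable_error=0.15, max_gap=0.05))
```

A sensible choice for `acceptable_error` is something clearly better than the baseline's error on the same data.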
Modeling pipeline: Ask → Get → Explore → Model → Communicate. Occam's Razor: simplest model that fits is preferable. Bias = underfitting (too simple, misses patterns; high error everywhere). Variance = overfitting (too complex, memorizes noise; low training error, high test error). Always build baseline models first. Test on out-of-sample data. Primary goal: generalize well to unseen data, not just minimize training error. Accuracy is not the best metric for imbalanced data.
Q1. The primary goal of model building is:
Answer: C
The primary goal is generalization — the model must work well on NEW data it has never seen before. A model that perfectly fits training data but fails on new data (overfitting) is useless in practice. The whole point of modeling is to make predictions on future/unseen data.
Q2. What does Occam's Razor suggest in model building?
Answer: A
Occam's Razor states "the simplest explanation is best." In modeling: when two models perform similarly, choose the simpler one. Simpler models tend to be more interpretable, more robust, and better at generalizing. More complex models often appear to perform better only because they overfit the training data, not because they capture genuine structure.
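One way to make "perform similarly" concrete is to fix a tolerance and pick the model with the fewest parameters among those within that tolerance of the best validation error. This is a sketch of that rule, not a standard procedure; the candidate models, their errors, and the tolerance are hypothetical.

```python
def pick_simplest_adequate(candidates, tolerance):
    """Pick the model with the fewest parameters among near-best performers.

    candidates: list of (name, n_parameters, validation_error) tuples.
    tolerance: how much worse than the best error still counts as "similar".
    """
    best_error = min(error for _, _, error in candidates)
    similar = [c for c in candidates if c[2] - best_error <= tolerance]
    return min(similar, key=lambda c: c[1])


# Hypothetical validation results, for illustration only.
candidates = [
    ("linear", 3, 0.210),
    ("quadratic", 6, 0.152),
    ("degree-10 polynomial", 11, 0.149),
]
chosen = pick_simplest_adequate(candidates, tolerance=0.01)
print(chosen[0])
```

Here the degree-10 polynomial has the lowest error, but the quadratic is within the tolerance and has about half as many parameters, so it is preferred. The linear model is simpler still but clearly worse, so it is not chosen.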
Q3. What is the Bias-Variance Tradeoff?
Answer: A
The Bias-Variance Tradeoff describes the tension between two sources of model error. Bias (underfitting) = error from wrong assumptions; model too simple. Variance (overfitting) = error from over-sensitivity to training data; model too complex. The goal is to find the balance where both are low — good fit on training data AND on new data.
Q4. A model has very low training error but very high error on test data. This is called:
Answer: B
Low training error + high test error = overfitting (high variance). The model has memorized the training data, including its noise, but fails to generalize. It has learned the specific quirks of the training set rather than the true underlying pattern. The solution: simpler model, regularization, or more training data.
Q5. Which of the following is NOT a common step in building a machine learning model?
Answer: D
Database indexing is a database engineering task, not a machine learning step. Common ML model building steps include: data preprocessing (cleaning, scaling), feature selection (choosing which variables to include), model selection, training, validation on test data, and model interpretation/evaluation.
Q6. Why must you evaluate a model on test data rather than training data?
Answer: C
Evaluating on training data tells you how well the model fits the data it was built on — not how well it will perform on new data. An overfit model can achieve near-perfect training accuracy while being useless on new examples. Test data (held out from training) measures genuine generalization ability.
Q7. A baseline model is:
Answer: B
A baseline is the simplest reasonable model you can build — like always predicting the most common class, or using only one variable. Your main model must decisively beat the baseline to be considered genuinely useful. If your complex model only barely beats a naive baseline, it is not adding real value.
Q8. High bias (underfitting) means:
Answer: B
High bias (underfitting) means the model is too simple or makes incorrect assumptions. Example: fitting a straight line to data that is actually curved. Result: high error on BOTH training and test data — the model misses the real pattern everywhere. Solution: increase model complexity, add better features.
Q9. The data science pipeline in correct order is:
Answer: B
The data science pipeline: (1) Ask an interesting question, (2) Get the data, (3) Explore the data (EDA — visualize, clean), (4) Model the data (build, fit, validate), (5) Communicate and visualize the results. Exploration (EDA) always comes BEFORE modeling — you must understand your data before modeling it.
Q10. LASSO and Ridge regression are examples of techniques that apply:
Answer: B
LASSO (L1 regularization) and Ridge (L2 regularization) regression add a penalty term to the cost function that discourages large coefficients. LASSO can shrink some coefficients exactly to zero, effectively removing those features from the model; Ridge shrinks coefficients toward zero but generally keeps all of them. Both are a mathematical implementation of Occam's Razor: they encourage the model to rely on the most important features and remain simple.
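The effect of the two penalties can be seen in the simplest possible case: one feature and no intercept, where both problems have a closed-form solution. This is a sketch under that assumption; the cost functions are written in the docstrings, and the toy data and penalty strengths are hypothetical. Real libraries scale the penalty differently, so the values of `lam` are not comparable to their settings.

```python
def ridge_weight(xs, ys, lam):
    """Minimise sum((y - w*x)**2) + lam * w**2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_weight(xs, ys, lam):
    """Minimise sum((y - w*x)**2) + lam * abs(w) via soft-thresholding."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data with true weight 2
for lam in (0.0, 14.0, 56.0):
    print(lam, ridge_weight(xs, ys, lam), lasso_weight(xs, ys, lam))
```

With no penalty both weights equal the least-squares value of 2. As the penalty grows, the Ridge weight keeps shrinking but stays non-zero, while the LASSO weight reaches exactly zero once the penalty is large enough.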
Q11. First principle models differ from data-driven models in that:
Answer: B
First principle models are based on a theoretical explanation of how the system works (like physical simulations or scientific formulas). Data-driven models are based on observed correlations in data (like regression models trained on historical data). Good models are typically a mixture of both — you use domain theory to design the model structure and data to fit the parameters.
Q12. Good forecasting models should produce:
Answer: B
Demanding a single deterministic prediction from a model is a "fool's errand." Good forecasting models produce a probability distribution over all possible events. Properties of valid probabilities: they sum to 1, are never negative, and rare events get small but non-zero probabilities. This captures uncertainty honestly rather than pretending certainty.