All formulas taught across ICT583 lectures. Closed-book exam: memorize these.
Memory tip: "B for Binary trials — only needs n (trials) and p (probability)."
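A minimal sketch of the binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k). It uses only n and p, which is the point of the tip. The function name `binomial_pmf` is illustrative, not from the lecture notes.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: exactly 3 heads in 5 fair coin flips = C(5, 3) * 0.5^5.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```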
Memory tip: "Normal = mu and sigma (mean and SD). Bell curve."
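A minimal sketch of the normal density, f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), showing that mu and sigma fully determine the bell curve. The name `normal_pdf` is hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x: mu sets the centre, sigma the spread."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# The curve peaks at x = mu with height 1 / (sigma * sqrt(2 * pi)).
print(normal_pdf(0.0, 0.0, 1.0))  # about 0.3989
```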
Memory tip: "68-95-99.7 — memorize as three numbers."
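The three numbers can be checked from the normal distribution itself: the probability of landing within k standard deviations of the mean is erf(k / sqrt(2)). A minimal sketch (the helper name `within_k_sd` is illustrative):

```python
from math import erf, sqrt

def within_k_sd(k: float) -> float:
    """Probability that a normal value lies within k SDs of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The exact values (68.27, 95.45, 99.73 %) are rounded to 68 / 95 / 99.7 for the rule.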
Memory tip: "Subtract the mean, divide by SD."
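The tip describes the z-score, z = (x - mean) / SD. A minimal sketch, assuming the mean and SD are already known (the example marks are made up for illustration):

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Standardise x: how many standard deviations it sits from the mean."""
    return (x - mean) / sd

# Hypothetical example: a mark of 85 when the class mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5
```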
Memory tip: "Residual = actual MINUS predicted. Always vertical."
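A minimal sketch of residuals for a straight-line fit, e_i = y_i - y_hat_i. The fitted line and data points below are hypothetical, chosen so the arithmetic is easy to follow.

```python
def residuals(xs, ys, slope: float, intercept: float):
    """Residual e_i = y_i - (slope * x_i + intercept): actual minus predicted."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical fitted line y_hat = 2x + 1; predictions are 3, 5, 7.
print(residuals([1, 2, 3], [3.5, 4.0, 7.0], slope=2.0, intercept=1.0))
# [0.5, -1.0, 0.0]
```

A positive residual means the point lies above the line, a negative one below it; either way it is measured vertically, along the y-axis.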
| Learning rate alpha | Effect |
|---|---|
| Too small | Slow convergence |
| Too large | Overshoots, fails to converge |
Memory tip: "Walk downhill: step in the direction of steepest descent (the negative gradient)."
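A minimal sketch of gradient descent on a toy loss, f(w) = w^2 (gradient 2w), run with three learning rates to reproduce the alpha table. The toy loss and function names are assumptions for illustration, not a lecture example.

```python
def gradient_descent(grad, w0: float, alpha: float, steps: int) -> float:
    """Step against the gradient each iteration: w <- w - alpha * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - alpha * grad(w)
    return w

def square_grad(w: float) -> float:
    """Gradient of the toy loss f(w) = w^2, whose minimum is at w = 0."""
    return 2 * w

# Same start and step count, three learning rates.
for alpha in (0.001, 0.1, 1.1):
    print(alpha, gradient_descent(square_grad, w0=5.0, alpha=alpha, steps=50))
```

With alpha = 0.001, w is still about 4.5 after 50 steps (slow convergence); 0.1 lands very close to 0; 1.1 flips sign and grows every step (overshoots). For this toy loss each step multiplies w by (1 - 2 alpha), so it converges only when 0 < alpha < 1.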
| Lambda | Effect | Risk |
|---|---|---|
| Large | Pushes parameters toward zero, simpler model | Underfitting |
| Small | Minimal penalty, uses all parameters | Overfitting |
Memory tip: "Lambda = how much do I punish large parameters?"
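A minimal sketch of how lambda shrinks a parameter, assuming an L2 (ridge) penalty on a one-feature, no-intercept model. Minimising SUM (y - w x)^2 + lambda w^2 gives w = SUM(x y) / (SUM(x^2) + lambda). The sheet does not say which penalty the course uses, so treat the L2 choice as an assumption.

```python
def ridge_slope(xs, ys, lam: float) -> float:
    """Slope w minimising SUM (y - w*x)^2 + lam * w^2 (one feature, no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # fitted exactly by w = 2
for lam in (0.0, 1.0, 100.0):
    print(lam, ridge_slope(xs, ys, lam))
```

Here SUM(x y) = 28 and SUM(x^2) = 14, so lambda = 0 recovers w = 2.0, lambda = 1 gives about 1.87, and lambda = 100 shrinks w to about 0.25, even though the data are fitted perfectly by w = 2 (underfitting).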
| x | f(x) |
|---|---|
| 0 | 0.5 (decision boundary) |
| +infinity | approaches 1 |
| -infinity | approaches 0 |
Memory tip: "S-curve squashes everything into (0, 1). At zero input, output is 0.5."
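A minimal sketch of the sigmoid, f(x) = 1 / (1 + e^(-x)). The branch on the sign of x is an implementation choice (not from the lectures) so that `exp` is only called on non-positive arguments and never overflows for large |x|.

```python
from math import exp

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^(-x)), mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    # For negative x use the equivalent form e^x / (1 + e^x) to avoid overflow.
    z = exp(x)
    return z / (1.0 + z)

print(sigmoid(0.0))    # 0.5, the decision boundary
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```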
Memory tip: "Log loss penalizes confident wrong predictions most heavily."
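A minimal sketch of binary log loss (cross-entropy), -[y ln p + (1 - y) ln(1 - p)], averaged over samples. The natural log and the clipping constant `eps` are assumptions; `eps` only keeps log(0) from occurring.

```python
from math import log

def log_loss(y_true, p_pred, eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; p is clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * log(p) + (1 - y) * log(1 - p))
    return total / len(y_true)

print(log_loss([1], [0.9]))  # confident and right: about 0.105
print(log_loss([1], [0.1]))  # confident and wrong: about 2.303
```

The same distance from the truth costs over 20 times more when the model is confidently wrong.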
Memory tip: "Entropy = messiness. Lower is cleaner = better split."
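A minimal sketch of node entropy, H = -SUM p_i log2(p_i), written as SUM p_i log2(1 / p_i). Base 2 (bits) is assumed here, as is common for decision trees; another base only rescales the values.

```python
from collections import Counter
from math import log2

def entropy(labels) -> float:
    """Shannon entropy in bits of the class labels at a node."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

print(entropy(["yes"] * 4))               # 0.0 (pure node)
print(entropy(["yes"] * 2 + ["no"] * 2))  # 1.0 (50/50, messiest for two classes)
```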
| k | Distance |
|---|---|
| 1 | Manhattan: d = SUM \|xi - yi\| |
| 2 | Euclidean: d = SQRT( SUM (xi - yi)^2 ) |
Memory tip: "k=1 is Manhattan (grid), k=2 is Euclidean (straight line, most common)."
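Both rows are special cases of the Minkowski distance, d = ( SUM |xi - yi|^k )^(1/k). A minimal sketch (the function name is illustrative):

```python
def minkowski(x, y, k: float) -> float:
    """Minkowski distance (SUM |xi - yi|^k)^(1/k): k = 1 Manhattan, k = 2 Euclidean."""
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1 / k)

p, q = (0, 0), (3, 4)
print(minkowski(p, q, 1))  # 7.0 (3 + 4 along the grid)
print(minkowski(p, q, 2))  # 5.0 (straight line)
```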
| Concept | Key numbers / facts |
|---|---|
| Binomial parameters | p and n (NOT mean/SD) |
| Normal parameters | mu and sigma |
| Empirical Rule | 68 / 95 / 99.7 % |
| Significance level alpha | 0.05 (common default) |
| Reject H0 when | p-value < alpha |
| H0 definition | No difference / no relationship |
| Sigmoid at x = 0 | f(0) = 0.5 |
| Logistic regression loss | Cross-Entropy (Log Loss) |
| Euclidean distance k | k = 2 |
| Manhattan distance k | k = 1 |
| Large lambda effect | Simpler model → underfitting risk |
| High learning rate | Overshoots, fails to converge |
| High entropy node | Impure — bad split |
| Residual interpretation | Vertical distance from point to line |
| Support vectors | Points closest to decision boundary |
| AdaBoost sample weight increases | When the sample is misclassified by the weak classifier in round t |
| Occam's Razor | Simplest model that fits is preferable |
| Primary goal of modeling | Generalize well to unseen data |
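A minimal sketch of one round of AdaBoost sample reweighting, assuming the standard discrete AdaBoost rule: alpha_t = 0.5 ln((1 - err) / err), misclassified weights multiplied by e^(alpha_t), correct ones by e^(-alpha_t), then renormalised. The course may present an equivalent variant; the function name is hypothetical.

```python
from math import exp, log

def adaboost_reweight(weights, misclassified):
    """One discrete-AdaBoost round.

    weights: current sample weights, summing to 1.
    misclassified: booleans, True where the weak classifier got sample i wrong.
    Returns (classifier weight alpha_t, new normalised sample weights).
    """
    # Weighted error of the weak classifier; assumed to satisfy 0 < err < 0.5.
    err = sum(w for w, m in zip(weights, misclassified) if m)
    alpha = 0.5 * log((1 - err) / err)
    # Increase weights of mistakes, decrease weights of correct predictions.
    new = [w * exp(alpha if m else -alpha) for w, m in zip(weights, misclassified)]
    total = sum(new)
    return alpha, [w / total for w in new]

alpha, w = adaboost_reweight([0.25] * 4, [False, False, False, True])
print(round(alpha, 3), [round(v, 3) for v in w])
```

The misclassified sample's weight rises from 0.25 to 0.5, so the next weak classifier concentrates on it.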