Exam format: 62 multiple-choice questions. Closed book. No notes. Review the lecture notes and tutorial questions for all ten topics (L1–L10).

Lectures

L1 — Intro to Data Science
Venn diagram, data types, EDA, ML definitions
L2 — Types of Data
Structured/unstructured, qualitative/quantitative
L3 — Statistics
Distributions, hypothesis testing, p-values, CLT
L4 — Data Wrangling
Cleaning, imputation, outliers, missing data
L5 — Data Visualization
Chart selection, best practices, what to avoid
L6 — Building Models
Occam's Razor, bias-variance, baselines
L7 — Linear Regression
Residuals, gradient descent, regularization
L8 — Logistic Regression
Sigmoid, decision boundaries, cross-entropy
L9 — Topics in ML
Naive Bayes, decision trees, SVMs, AdaBoost
L10 — Distance & Clustering
KNN, K-means, distance metrics, hierarchical
📋 Formula Sheet — All Key Formulas
Binomial, Normal, z-score, regression, gradient descent, sigmoid, entropy, distance metrics
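
Three of the formulas named above (z-score, sigmoid, entropy) can be checked numerically with a short Python sketch. The formula sheet itself is not reproduced here, so the definitions below are the standard textbook forms, and the function names and sample numbers are illustrative assumptions rather than course material.

```python
import math

def z_score(x, mean, std):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative values only (not taken from the exam or slides).
print(z_score(85, 70, 10))   # 1.5 -> 85 is 1.5 standard deviations above 70
print(sigmoid(0))            # 0.5 -> the decision threshold in logistic regression
print(entropy([0.5, 0.5]))   # 1.0 -> a fair coin carries one bit of uncertainty
```

Working a few values by hand and comparing them with the output is a quick way to confirm the formulas are memorized correctly for a closed-book exam.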

Key Distinctions to Know

Concept A | Concept B | Key Distinction
Population | Sample | Entire group vs subset
Parameter | Statistic | Describes population vs describes sample
Regression | Classification | Continuous output vs discrete label
Supervised | Unsupervised | Labeled training data vs find structure with no labels
Bias | Variance | Underfitting (wrong assumptions) vs overfitting (too sensitive)
Structured | Unstructured | Rows/columns table vs free-form text/audio/signals
Nominal | Ordinal | Categories with no order vs categories with order
Discrete | Continuous | Counted integers vs measurable with decimals
Error | Artifact | Data fundamentally lost vs systematic processing problem
Manhattan (L1) | Euclidean (L2) | Sum of absolute differences vs straight-line distance
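
The Manhattan vs Euclidean distinction is easiest to remember with a worked pair of points. The following is a minimal Python sketch using the standard definitions; the helper names and the example points (0, 0) and (3, 4) are assumptions chosen for illustration, not taken from the review slides.

```python
import math

def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical example points forming a 3-4-5 right triangle.
p, q = (0, 0), (3, 4)
print(manhattan(p, q))  # 7   (3 + 4, walking along the grid)
print(euclidean(p, q))  # 5.0 (the hypotenuse)
```

For any pair of points the Manhattan distance is at least as large as the Euclidean distance, with equality only when the points differ in at most one coordinate.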

Confirmed Exam Sample Answers

Directly from the official review slides: