Structured vs unstructured, quantitative vs qualitative, discrete, continuous, nominal, ordinal
Why unstructured data matters: Most data in the real world IS unstructured — tweets, emails, literature, server logs. We must apply pre-processing techniques to extract structured features from it before using standard models.
Example: To analyze emails, you might count word frequencies and convert each email into a row with columns like "count_of_word_free", "count_of_word_money", etc.
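The email example can be sketched in a few lines of Python. This is a minimal illustration, not a full text-processing pipeline: the tracked words, the sample emails, and the helper name `email_to_row` are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical vocabulary; a real project would choose its own tracked words.
TRACKED_WORDS = ["free", "money"]

def email_to_row(text, tracked_words=TRACKED_WORDS):
    """Turn one free-form email into a fixed set of count columns."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # Counter returns 0 for words that never appear, so every row has every column.
    return {f"count_of_word_{w}": counts[w] for w in tracked_words}

# Made-up sample emails (unstructured text).
emails = [
    "Free money! Claim your free prize now.",
    "Meeting moved to 3pm, see agenda attached.",
]

# Each email becomes one row with the same columns (structured data).
rows = [email_to_row(e) for e in emails]
for row in rows:
    print(row)
```

Every email, whatever its length, ends up as a row with identical columns, which is the row/column shape standard models expect.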
Discrete: Values are distinct and separate, typically counted. Always integers; fractions and decimals are not possible. You cannot have 2.5 customers or a dice roll of 3.7.
Examples: Number of customers in a shop, dice roll (1-6), number of children in a family, number of cars in a parking lot.
Continuous: Values are measured and can take any value within a range, including fractions and decimals. Theoretically infinite precision.
Examples: Height (170.2cm or 175.85cm), weight (68.5kg or 89.66kg), temperature, revenue ($12,345.67).
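The "would a fractional value make sense?" idea behind the discrete/continuous split can be roughly approximated in code. The sketch below uses a hypothetical helper, `looks_discrete`, and made-up sample values. It is only a hint: measured data can happen to contain whole numbers, so the real decision still rests on how the values were obtained (counted or measured).

```python
def looks_discrete(values):
    """Return True when every value is a whole number (a hint, not proof, of count data)."""
    return all(float(v).is_integer() for v in values)

daily_customers = [237, 198, 251]      # counted: whole people only
heights_cm = [170.2, 175.85, 168.0]    # measured: any decimal is possible

print(looks_discrete(daily_customers))  # True
print(looks_discrete(heights_cm))       # False
```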
Nominal: Categories with no natural order or ranking. You cannot say one category is "greater than" another. Comparison between categories is meaningless.
Examples: Hair color (red, brown, blonde), blood type (A, B, AB, O), country name, eye color, zip code.
Ordinal: Categories with a meaningful order or ranking, BUT the gaps between ranks are NOT equal or measurable. You know which is "more" but not by how much.
Examples: Competition placing (1st, 2nd, 3rd), satisfaction survey (Poor, Fair, Good, Excellent), star ratings (1-5 stars), Likert scales (Strongly Agree to Strongly Disagree).
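A short sketch of what "ordered, but with unmeasurable gaps" allows in practice, using the satisfaction scale above. The responses are made up, and the rank mapping is an illustrative assumption used only for comparing labels, never for arithmetic. Sorting, finding the best response, and taking a median all rely on order alone, so they remain meaningful.

```python
from statistics import median_low

SATISFACTION_ORDER = ["Poor", "Fair", "Good", "Excellent"]
# Position in the list encodes order only; the numbers carry no distance information.
RANK = {label: i for i, label in enumerate(SATISFACTION_ORDER)}

responses = ["Good", "Poor", "Excellent", "Good", "Fair"]  # hypothetical survey answers

# Ordering questions are valid for ordinal data.
sorted_responses = sorted(responses, key=RANK.__getitem__)
best = max(responses, key=RANK.__getitem__)

# The median needs only order, and median_low always returns an existing rank,
# so the answer maps back to a real label.
median_label = SATISFACTION_ORDER[median_low([RANK[r] for r in responses])]

print(sorted_responses)
print(best, median_label)
```

Averaging the ranks, by contrast, would silently assume that the Poor-to-Fair step equals the Good-to-Excellent step.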
| Category | Sub-type | Arithmetic OK? | Order? | Examples |
|---|---|---|---|---|
| Quantitative | Discrete | Yes (integers only) | Yes | Customer count, dice roll, no. of children |
| Quantitative | Continuous | Yes (any decimal) | Yes | Height, weight, temperature, revenue |
| Qualitative | Nominal | No | No | Hair color, country, blood type, zip code |
| Qualitative | Ordinal | No | Yes (unequal gaps) | Survey ratings, competition rank, star ratings |
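A zip code like 6000 or 90210 looks like a number, but it is qualitative nominal. Why? Because adding or averaging zip codes produces nothing meaningful, and one zip code is not "greater than" another.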
General rule: Whenever a word/label could substitute for a number without losing meaning, it is qualitative. The test is always: "Can I meaningfully add these? Can I meaningfully average these?"
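The averaging test can be tried directly on zip codes. The sample below is made up. Python will compute the average without complaint, which is the point: nothing in the code stops the arithmetic, so the analyst has to know it is meaningless.

```python
from collections import Counter
from statistics import mean

zip_codes = ["90210", "10001", "90210", "60601"]  # hypothetical sample, stored as text

# The arithmetic runs, but the result describes no real place:
# this is the "can I meaningfully average these?" test failing.
nonsense_average = mean([int(z) for z in zip_codes])

# Counting occurrences treats zip codes as labels, which is meaningful.
most_common_zip, occurrences = Counter(zip_codes).most_common(1)[0]

print(nonsense_average)
print(most_common_zip, occurrences)
```

Storing zip codes as strings, as done here, is one simple way to make accidental arithmetic harder.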
| Quantitative questions | Qualitative questions |
|---|---|
| What is the average value? | Which value occurs most/least? |
| Does this increase or decrease over time? | How many unique values are there? |
| Is there a dangerous threshold? | What are all the unique values? |
| What is the standard deviation? | What proportion belongs to each category? |
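
The two columns of questions map onto two different kinds of summary. The sketch below uses hypothetical helper names and made-up data. The quantitative summary relies on arithmetic, and the qualitative one relies only on counting.

```python
from collections import Counter
from statistics import mean, stdev

def summarize_quantitative(values):
    """Answer 'what is the average?' and 'what is the standard deviation?'."""
    return {"mean": mean(values), "stdev": stdev(values)}

def summarize_qualitative(values):
    """Answer 'which value occurs most?', 'what are the unique values?', 'what proportions?'."""
    counts = Counter(values)
    total = len(values)
    return {
        "most_common": counts.most_common(1)[0][0],
        "unique_values": sorted(counts),
        "proportions": {label: n / total for label, n in counts.items()},
    }

revenue_thousands = [12.0, 15.5, 11.5, 13.0]          # quantitative (made-up values)
hair_colors = ["brown", "red", "brown", "blonde"]      # qualitative (made-up values)

quantitative_summary = summarize_quantitative(revenue_thousands)
qualitative_summary = summarize_qualitative(hair_colors)
print(quantitative_summary)
print(qualitative_summary)
```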
| Field | Type | Reasoning |
|---|---|---|
| Name of coffee shop | Qualitative Nominal | No arithmetic meaning; no ordering between names |
| Revenue ($thousands) | Quantitative Continuous | Can add/average; decimals are valid ($12,345.67) |
| Zip code | Qualitative Nominal | Numbers but no meaningful arithmetic or ordering |
| Monthly customers | Quantitative Discrete | Counted (whole people only, no 2.5 customers) |
| Country of coffee origin | Qualitative Nominal | No order between countries (Ethiopia not > Colombia) |
| Star rating (1-5) | Qualitative Ordinal | Has order, but 5-4 gap may not equal 2-1 gap |
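
One way to carry the classification above into code is to let each field's Python type reflect its data type. The sketch below is an illustration under assumptions: the class name, field names, and shop records are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CoffeeShop:
    name: str                  # qualitative nominal
    revenue_thousands: float   # quantitative continuous
    zip_code: str              # qualitative nominal: kept as text so it is not averaged by accident
    monthly_customers: int     # quantitative discrete
    origin_country: str        # qualitative nominal
    star_rating: int           # qualitative ordinal: safe to compare, risky to average

# Made-up records for illustration.
shops = [
    CoffeeShop("Bean There", 12.3, "90210", 1450, "Ethiopia", 4),
    CoffeeShop("Daily Grind", 8.75, "10001", 980, "Colombia", 5),
]

# Arithmetic is meaningful for the quantitative fields.
total_revenue = sum(s.revenue_thousands for s in shops)
total_customers = sum(s.monthly_customers for s in shops)

# Only order is used for the ordinal field.
top_rated = max(shops, key=lambda s: s.star_rating)

print(total_revenue, total_customers, top_rated.name)
```
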
Structured = rows/columns (ML needs this). Unstructured = free-form (needs pre-processing). Quantitative = arithmetic is meaningful. Qualitative = categories, arithmetic is nonsensical. Quantitative splits into discrete (counted, integers only) and continuous (measured, decimals OK). Qualitative splits into nominal (no order, e.g. hair color) and ordinal (ordered but unequal gaps, e.g. rankings, Likert scales). Zip codes = qualitative nominal despite looking like numbers. Key test: can you meaningfully compute an average?
Q1. Customer satisfaction measured as "Poor, Fair, Good, Excellent" is what type of data?
Answer: C
There is a clear order (Poor < Fair < Good < Excellent) but the gaps between levels are not equal — the improvement from "Poor" to "Fair" may not be the same as from "Good" to "Excellent." Arithmetic is not meaningful (you can't compute "average satisfaction"), making it qualitative. Order present + unequal gaps = ordinal.
Q2. A zip code is classified as:
Answer: C
Despite being represented as numbers, zip codes are qualitative nominal. Computing "the average zip code" or asking which zip code is "greater than" another is meaningless. No arithmetic and no natural ordering = nominal qualitative.
Q3. Which data type consists of distinct values that are always integers, typically obtained by counting?
Answer: D
Discrete data consists of distinct, separate values that are counted (not measured). They are always integers — you cannot have 2.5 customers, a dice roll of 3.7, or 1.5 children. The key test: would a fractional value make sense? If no, it is discrete.
Q4. Revenue in thousands of dollars is:
Answer: D
Revenue can take any value including decimals ($12,345.67) and arithmetic is fully meaningful (you can add revenues, find averages, compare differences). This makes it quantitative continuous. It is not discrete because it can have fractional values.
Q5. The key difference between nominal and ordinal data is:
Answer: B
Ordinal data has a meaningful rank order (1st > 2nd > 3rd, Excellent > Good > Poor) but the intervals between ranks are not equal. Nominal data has no ordering whatsoever — hair color "red" is not greater or less than "brown." Neither allows meaningful arithmetic.
Q6. Most statistical and ML models require which type of data?
Answer: C
Most statistical and ML models were built with structured data in mind. They expect a row/column format where each row is one observation and each column is one feature. Unstructured data (text, audio, logs) must be pre-processed into structured form first.
Q7. A student rates a movie 3.5 out of 5 stars. This fractional rating suggests the rating scale is:
Answer: B
Traditional star ratings (1-5 whole stars) are treated as qualitative ordinal — ordered but unequal gaps. However, if decimals/fractions are allowed (3.5 stars), it starts to behave more like quantitative data. In this course, star ratings are typically treated as ordinal. The key is context and whether arithmetic operations produce meaningful results.
Q8. Blood type (A, B, AB, O) is an example of:
Answer: D
Blood type is qualitative nominal. There is no ordering between blood types (type A is not "more than" type O), and arithmetic is meaningless (the "average blood type" makes no sense). It is simply a categorical label with no mathematical properties.
Q9. Which of the following questions can ONLY be asked about quantitative data?
Answer: C
Average and standard deviation require meaningful arithmetic, which only applies to quantitative data. Questions A, B, and D can be asked about both quantitative and qualitative data. Asking for the "average hair color" or "standard deviation of zip codes" is nonsensical.
Q10. The number of customers visiting a coffee shop each day is:
Answer: B
Customer count is quantitative (arithmetic is meaningful: you can average, compare, and sum) and discrete (you count whole people, so you cannot have 2.5 customers). A daily figure such as 237 customers is always a whole number.
Q11. Competition placing (1st, 2nd, 3rd) is what type of data?
Answer: D
Competition placings are ordinal — 1st is better than 2nd is better than 3rd (there is a meaningful order). However, the gap between 1st and 2nd place may not be the same as between 2nd and 3rd (one winner might dominate while the others are close). Arithmetic is not meaningful (you cannot average 1st and 3rd place to get "2nd place equivalent performance").
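The unequal-gap point can be made concrete with a small sketch. The runner names and finishing times are invented for illustration. The placings are one step apart, while the underlying times are not.

```python
# Hypothetical finishing times in seconds; names and values are made up.
finish_times = {"Ana": 9.58, "Ben": 9.89, "Cy": 9.91}

# Placings keep only the order of the times.
ranked = sorted(finish_times, key=finish_times.get)
placings = {runner: place for place, runner in enumerate(ranked, start=1)}

# Gaps between consecutive finishers, rounded to hundredths of a second.
time_gaps = [
    round(finish_times[b] - finish_times[a], 2)
    for a, b in zip(ranked, ranked[1:])
]

print(placings)   # each placing differs by exactly one rank
print(time_gaps)  # the real gaps differ widely
```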
Q12. An important reason why understanding data types matters for model selection is:
Answer: B
If you encode nominal categories as integers (blond=0, brown=1, red=2) and use a model that treats them as quantitative, the model will incorrectly assume that red (2) is twice brown (1), or that brown lies between blond and red. This creates false mathematical relationships. Wrong data type → wrong model → wrong results.
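The false relationships created by integer codes can be shown with a distance calculation. One common remedy is one-hot encoding. The text above does not name it, so treat it here as an illustrative assumption rather than the course's prescribed method. The helper names are hypothetical.

```python
HAIR_COLORS = ["blond", "brown", "red"]

def integer_encode(color):
    """Map each category to its list position: blond=0, brown=1, red=2."""
    return HAIR_COLORS.index(color)

def one_hot_encode(color):
    """Give each category its own 0/1 column, so no category outranks another."""
    return [1 if color == c else 0 for c in HAIR_COLORS]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Integer codes imply blond is farther from red than from brown.
int_blond_brown = abs(integer_encode("blond") - integer_encode("brown"))
int_blond_red = abs(integer_encode("blond") - integer_encode("red"))

# One-hot vectors place every pair of distinct categories at the same distance.
hot_blond_brown = squared_distance(one_hot_encode("blond"), one_hot_encode("brown"))
hot_blond_red = squared_distance(one_hot_encode("blond"), one_hot_encode("red"))

print(int_blond_brown, int_blond_red)
print(hot_blond_brown, hot_blond_red)
```

With integer codes the two distances differ. With one-hot vectors they match, which is consistent with nominal data having no order.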