Descriptive Statistics

Summarizing Data in Meaningful Ways

Descriptive statistics condense data into understandable numbers. They answer the question: "What does this data look like?"

Measures of Central Tendency

Mean (Average)

Add all values, divide by count.

When to use: When values are roughly symmetric and outliers aren't extreme.

Example: Average order value, mean temperature.

Limitation: Sensitive to outliers. A few extreme values can skew it dramatically.
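
To see that sensitivity in numbers, here is a minimal Python sketch using the standard library's statistics module. The order values are made up for illustration.

```python
from statistics import mean

# Hypothetical order values in dollars (illustrative, not real data).
orders = [20, 22, 25, 27, 30]
orders_with_outlier = orders + [500]  # one unusually large order

mean_typical = mean(orders)                    # 124 / 5
mean_with_outlier = mean(orders_with_outlier)  # 624 / 6

print(f"Mean without the outlier: {mean_typical}")
print(f"Mean with the outlier:    {mean_with_outlier}")
```

A single $500 order moves the mean from about $25 to over $100, even though five of the six orders are $30 or less.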

Median (Middle Value)

Sort values, take the one in the middle. With an even number of values, average the two middle ones.

When to use: When distribution is skewed or outliers exist.

Example: Median home price, median income.

Why it matters: If average income is $80,000 but median is $50,000, a few high earners are pulling the average up. Median better represents "typical."
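
The income comparison can be reproduced with a small sketch. The five incomes below are hypothetical, chosen only so that the mean comes out at $80,000 and the median at $50,000.

```python
from statistics import mean, median

# Hypothetical annual incomes (assumed values for illustration).
incomes = [30_000, 40_000, 50_000, 60_000, 220_000]

mean_income = mean(incomes)      # pulled upward by the single high earner
median_income = median(incomes)  # middle value of the sorted list

# With an even number of values, median() averages the two middle ones.
even_median = median([1, 3, 5, 7])  # (3 + 5) / 2

print(mean_income, median_income, even_median)
```

Four of the five people earn less than the mean, which is why the median is the better description of a typical income here.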

Mode (Most Common)

The most frequently occurring value.

When to use: Categorical data, or when the most common value is what matters.

Example: Most popular product, most common complaint category.
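
A short sketch of finding the mode for categorical data, assuming a made-up list of complaint categories. It also shows that a dataset can have more than one mode.

```python
from collections import Counter
from statistics import multimode

# Hypothetical complaint categories from a support log.
complaints = ["shipping", "billing", "shipping", "quality", "shipping", "billing"]

counts = Counter(complaints)
top_category, top_count = counts.most_common(1)[0]

# multimode() returns every value tied for most frequent, in first-seen order.
tied_modes = multimode(["S", "M", "M", "L", "L"])

print(top_category, top_count)
print(tied_modes)
```

When two or more values tie, reporting all of them is usually more honest than picking one arbitrarily.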

Measures of Spread

Range

Maximum minus minimum.

Limitation: Only considers two values. Very sensitive to outliers.

Variance

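Average of squared differences from the mean. Measures how spread out values are. When the data is a sample from a larger population, the sum is usually divided by n − 1 rather than n.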

Note: Squared units make it hard to interpret directly.

Standard Deviation

Square root of variance. Same units as original data.

Interpretation: Roughly, the typical distance from the mean.
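
The relationship between variance and standard deviation can be sketched as follows. The dataset is illustrative. The distinction between population statistics (divide by n) and sample statistics (divide by n − 1) is standard practice rather than something defined in this chapter.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

# Small illustrative dataset (assumed values).
values = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(values)  # 40 / 8 = 5
squared_diffs = [(v - center) ** 2 for v in values]
manual_variance = sum(squared_diffs) / len(values)  # average squared difference

population_variance = pvariance(values)  # divides by n
population_std = pstdev(values)          # square root of population variance
sample_variance = variance(values)       # divides by n - 1
sample_std = stdev(values)               # square root of sample variance

print(manual_variance, population_variance, population_std)
print(round(sample_variance, 3), round(sample_std, 3))
```

The sample versions come out slightly larger because dividing by n − 1 compensates for estimating the mean from the same data.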

Rule of thumb: For normal distributions:

  • ~68% of values within 1 standard deviation
  • ~95% within 2 standard deviations
  • ~99.7% within 3 standard deviations
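
These shares can be checked against the normal distribution itself. A minimal sketch using NormalDist from Python's standard library (the helper name share_within is just an illustrative choice):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")
```

The computed shares are about 0.683, 0.954, and 0.997. The rule applies only when the data is close to normal; skewed data can deviate from it substantially.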

Interquartile Range (IQR)

Distance between the 25th and 75th percentiles (Q3 minus Q1). Covers the middle 50% of values.

When to use: With skewed data or outliers.
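
A small sketch of why IQR holds up when outliers are present, compared with the range. The data is made up, and the "inclusive" quantile method is one of several conventions; other tools may give slightly different quartiles.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range: Q3 minus Q1, using the 'inclusive' quantile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]  # last value replaced by an outlier

clean_range = max(clean) - min(clean)
outlier_range = max(with_outlier) - min(with_outlier)

print(clean_range, iqr(clean))
print(outlier_range, iqr(with_outlier))
```

The outlier inflates the range from 8 to 899, while the IQR stays the same in both cases.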

Percentiles and Quartiles

Percentiles

The value below which a given percentage of the data falls.

Example: 90th percentile = 90% of values are below this.

Quartiles

Three cut points that split sorted data into four equal-sized parts:

  • Q1 (25th percentile): Lower quartile
  • Q2 (50th percentile): Median
  • Q3 (75th percentile): Upper quartile
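
As a rough illustration, percentiles and quartiles can be computed with the same function. The scores below are hypothetical (every integer from 0 to 100) so the results are easy to verify by eye.

```python
from statistics import median, quantiles

# Hypothetical scores: 0, 1, 2, ..., 100.
scores = list(range(101))

# n=100 returns the 99 cut points between percentiles 1 and 99.
percentiles = quantiles(scores, n=100, method="inclusive")
p90 = percentiles[89]  # 90th percentile (index is percentile - 1)

# n=4 returns the three quartiles.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

print(p90)
print(q1, q2, q3, median(scores))
```

Q2 and the median are the same number, which is why the median is also called the 50th percentile.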

Box Plot Elements

Box plots visualize these:

  • Box: Q1 to Q3 (middle 50%)
  • Line in box: Median
  • Whiskers: Extend to the most extreme values not treated as outliers (commonly, those within 1.5 × IQR of the box)
  • Points beyond the whiskers: Outliers
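
The elements of a box plot can be computed without drawing anything. This sketch assumes the widely used 1.5 × IQR convention for whiskers; that multiplier, the function name box_plot_summary, and the data are assumptions for illustration, and plotting tools may use different defaults.

```python
from statistics import quantiles

def box_plot_summary(values, k=1.5):
    """Return quartiles, whisker ends, and outliers under the k * IQR convention."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread
    # Whiskers stop at the most extreme values still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

Here the box spans 3 to 7, the whiskers reach 1 and 8, and the value 30 is drawn as a separate point.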

Distribution Shapes

Symmetric (Normal)

Values balanced around the center. Mean ≈ Median. The bell-shaped normal distribution is the best-known example.

Right-Skewed (Positive Skew)

Long tail to the right. Mean > Median. Common in income, prices, counts.

Left-Skewed (Negative Skew)

Long tail to the left. Mean < Median. Less common; think test scores bunched near the maximum possible score.
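
The mean-versus-median comparison can serve as a quick first check on skew. The sketch below is a crude hint under made-up data, not a formal skewness test, and the function name skew_hint is hypothetical.

```python
from statistics import mean, median

def skew_hint(values):
    """Crude hint only: compares mean and median, not a formal skewness test."""
    m, med = mean(values), median(values)
    if m > med:
        return "mean > median (suggests right skew)"
    if m < med:
        return "mean < median (suggests left skew)"
    return "mean == median (consistent with symmetry)"

right_tail = [1, 2, 2, 3, 3, 3, 4, 20]             # one large value on the right
left_tail = [40, 85, 90, 92, 95, 97, 98, 100]      # scores bunched near a ceiling
balanced = [1, 2, 3, 4, 5]

for data in (right_tail, left_tail, balanced):
    print(skew_hint(data))
```

A histogram remains the more reliable way to judge shape; this comparison only flags cases worth a closer look.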

Uniform

All values equally likely. Flat distribution.

Bimodal/Multimodal

Multiple peaks. May indicate distinct groups mixed together.

Counts and Proportions

Counts

Simple tallies. How many in each category?

Proportions/Percentages

Part relative to whole. Often more meaningful than raw counts.

Example: "3,000 customers churned" vs. "5% of customers churned"

Rates

Counts or amounts per unit of time, size, or exposure. Enable fair comparison between groups of different sizes.

Example: Revenue per employee, defects per 1,000 units
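
The arithmetic behind proportions and rates is simple, as this sketch shows. Apart from the 3,000 churned customers, every figure is an assumption chosen for illustration.

```python
# Proportion: part relative to whole.
churned = 3_000
total_customers = 60_000  # assumed customer base
churn_pct = 100 * churned / total_customers

# Rate: count scaled to a common unit (here, per 1,000 units produced).
defects = 18              # assumed defect count
units_produced = 12_000   # assumed production volume
defects_per_1000 = 1000 * defects / units_produced

print(f"Churn: {churn_pct}% of customers")
print(f"Defects: {defects_per_1000} per 1,000 units")
```

Scaling to a common base is what lets a small factory and a large one, or a small and a large customer base, be compared fairly.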

Comparing Groups

Comparison of Means/Medians

Does one group have higher typical values?

Comparison of Spread

Does one group vary more than another?

Overlap

Do distributions overlap significantly, or are groups distinct?
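
One way to put the three comparisons side by side is sketched below. The groups are invented, the helper names are hypothetical, and the overlap check only compares minimum-to-maximum ranges, which is a deliberately crude stand-in for looking at the full distributions.

```python
from statistics import median, pstdev

def summarize(values):
    """Typical value, spread, and extremes for one group."""
    return {
        "median": median(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }

def ranges_overlap(a, b):
    """True if the min-max intervals of the two groups intersect (crude check)."""
    return max(min(a), min(b)) <= min(max(a), max(b))

group_a = [10, 12, 14, 16, 18]
group_b = [15, 17, 19, 21, 23]
group_c = [30, 31, 32, 33, 34]

for name, group in (("A", group_a), ("B", group_b), ("C", group_c)):
    print(name, summarize(group))

print(ranges_overlap(group_a, group_b))  # A and B share the interval 15 to 18
print(ranges_overlap(group_a, group_c))  # A and C do not touch
```

Group B has a higher typical value than A with the same spread, while C is both higher and more tightly clustered.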

Which Statistic to Use?

For "Typical" Value

  • Symmetric data: Mean
  • Skewed data: Median
  • Categorical data: Mode

For "How Spread Out"

  • Symmetric data: Standard deviation
  • Skewed data: IQR
  • Simple overview: Range (with caution)

For Comparison

  • Show both mean and median: A gap between them reveals skew
  • Show percentiles: For fuller picture

AI Prompt: Summary Statistics

Calculate and interpret summary statistics for my data.

Here's my data:
[Paste data or describe]

Please provide:
1. Appropriate measures of center (mean and/or median)
2. Measures of spread
3. Distribution shape assessment
4. Notable percentiles
5. Plain-language interpretation of what these statistics tell us

AI Prompt: Choosing Statistics

Help me choose the right statistics for my data.

My data: [Describe your variable(s)]
My purpose: [What you're trying to show]
Distribution: [If known — skewed, symmetric, etc.]

Recommend:
1. Which central tendency measure to use and why
2. Which spread measure to use and why
3. Additional statistics that would be informative
4. How to present these to my audience

What's Next

Numbers inform. Visuals communicate.

Next chapter: Data visualization — turning numbers into understanding.