Descriptive Statistics
Summarizing Data in Meaningful Ways
Descriptive statistics condense data into understandable numbers. They answer the question: "What does this data look like?"
Measures of Central Tendency
Mean (Average)
Add all values, divide by count.
When to use: When values are roughly symmetric and outliers aren't extreme.
Example: Average order value, mean temperature.
Limitation: Sensitive to outliers. A few extreme values can skew it dramatically.
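To see how strongly one outlier can move the mean, here is a minimal Python sketch. The order values are invented for illustration, and the standard-library statistics module is just one convenient way to do the arithmetic.

```python
from statistics import mean

# Hypothetical order values in dollars (illustrative, not real data).
orders = [20, 22, 25, 23, 21]

# The same orders plus one unusually large purchase.
orders_with_outlier = orders + [500]

mean_without = mean(orders)               # sum 111 / count 5
mean_with = mean(orders_with_outlier)     # sum 611 / count 6

print(f"Mean without outlier: {mean_without:.2f}")
print(f"Mean with outlier:    {mean_with:.2f}")
```

A single extreme order lifts the mean from about 22 to about 102, well above what any typical customer in this toy data spent.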
Median (Middle Value)
Sort values, take the one in the middle. With an even number of values, average the two middle ones.
When to use: When distribution is skewed or outliers exist.
Example: Median home price, median income.
Why it matters: If average income is $80,000 but median is $50,000, a few high earners are pulling the average up. Median better represents "typical."
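The income example can be reproduced with a small sketch. The six incomes below are hypothetical, chosen only so that the mean and median match the figures in the text.

```python
from statistics import mean, median

# Hypothetical annual incomes, picked to reproduce the $80,000 / $50,000 example.
incomes = [30_000, 40_000, 50_000, 50_000, 60_000, 250_000]

avg_income = mean(incomes)      # pulled upward by the single high earner
mid_income = median(incomes)    # even count, so the two middle values are averaged

print(f"Mean:   {avg_income:,.0f}")
print(f"Median: {mid_income:,.0f}")
```

Five of the six people earn $60,000 or less, so the median of $50,000 describes the group far better than the mean of $80,000.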
Mode (Most Common)
The most frequently occurring value.
When to use: Categorical data, or when the most common value is what matters.
Example: Most popular product, most common complaint category.
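For categorical data, the mode can be found by tallying. The sketch below uses made-up complaint categories; Counter, mode, and multimode all come from the Python standard library.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical complaint categories (illustrative only).
complaints = ["shipping", "billing", "shipping", "quality", "shipping", "billing"]

most_common = mode(complaints)      # the single most frequent category
tallies = Counter(complaints)       # full count per category

# When two values tie for most frequent, multimode returns all of them.
tied = multimode(["red", "blue", "red", "blue", "green"])

print(most_common)
print(tallies.most_common())
print(tied)
```

Looking at the full tally alongside the mode is often worthwhile, since the mode alone hides how close the runner-up is.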
Measures of Spread
Range
Maximum minus minimum.
Limitation: Only considers two values. Very sensitive to outliers.
Variance
Average of squared differences from the mean. Measures how spread out values are. When estimating from a sample, the sum of squares is usually divided by n − 1 rather than n.
Note: Squared units make it hard to interpret directly.
Standard Deviation
Square root of variance. Same units as original data.
Interpretation: Roughly, the typical distance from the mean.
Rule of thumb: For normal distributions:
- ~68% of values within 1 standard deviation
- ~95% within 2 standard deviations
- ~99.7% within 3 standard deviations
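Variance, standard deviation, and the rule of thumb can all be checked with the standard library. This is a sketch on a small invented data set; NormalDist is used here only to compute the theoretical coverage of a normal distribution, not to fit the data.

```python
from statistics import NormalDist, mean, pstdev, pvariance, variance

# Small illustrative data set (hypothetical values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

center = mean(data)             # 5
pop_var = pvariance(data)       # average squared distance from the mean (divide by n)
pop_sd = pstdev(data)           # square root of pop_var, same units as the data
sample_var = variance(data)     # sample version, divides by n - 1

# Share of a normal distribution lying within k standard deviations of the mean.
standard_normal = NormalDist(mu=0, sigma=1)
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

print(center, pop_var, pop_sd)
for k, share in coverage.items():
    print(f"within {k} sd: {share:.1%}")
```

For this data the variance is 4 (in squared units) and the standard deviation is 2 (in the original units). The coverage figures come out to roughly 68.3%, 95.4%, and 99.7%, and they apply only when the data is approximately normal.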
Interquartile Range (IQR)
Distance between the 25th and 75th percentiles: the width of the middle 50% of values.
When to use: With skewed data or outliers.
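The contrast between range and IQR shows up clearly on data with one extreme value. The sketch below uses invented numbers and assumes the "inclusive" quartile method of statistics.quantiles; other quartile conventions exist and can give slightly different results.

```python
from statistics import quantiles

# Illustrative data with one extreme value (hypothetical).
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]

value_range = max(data) - min(data)

# Three cut points split the data into four parts; "inclusive" is one of
# several quartile conventions and is an assumption of this sketch.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"Range: {value_range}")
print(f"IQR:   {iqr}")
```

The range of 99 is driven almost entirely by the single value 100, while the IQR of 4 reflects how tightly the bulk of the data is packed.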
Percentiles and Quartiles
Percentiles
The value below which a given percentage of the data falls.
Example: 90th percentile = 90% of values are below this.
Quartiles
Split data into four parts:
- Q1 (25th percentile): Lower quartile
- Q2 (50th percentile): Median
- Q3 (75th percentile): Upper quartile
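Quartiles and other percentiles can be computed the same way. In this sketch the response times are hypothetical, and the "inclusive" interpolation method is an assumption; spreadsheet and statistics packages differ in the exact convention they use.

```python
from statistics import quantiles

# Hypothetical page response times in milliseconds.
response_ms = [120, 130, 135, 140, 150, 155, 160, 170, 180, 250, 900]

# n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(response_ms, n=4, method="inclusive")

# n=100 returns 99 cut points; index 89 holds the 90th percentile.
percentile_cuts = quantiles(response_ms, n=100, method="inclusive")
p90 = percentile_cuts[89]

print(f"Q1={q1}, median={q2}, Q3={q3}, 90th percentile={p90}")
```

Here the quartiles are 137.5, 155, and 175 ms, and the 90th percentile is 250 ms. The slowest request (900 ms) sits above the 90th percentile and has no effect on the quartiles.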
Box Plot Elements
Box plots visualize these:
- Box: Q1 to Q3 (middle 50%)
- Line in box: Median
- Whiskers: Extend to the most extreme values not treated as outliers (a common convention is within 1.5 × IQR of the box)
- Points beyond: Outliers
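The numbers behind a box plot can be computed without drawing anything. The sketch below assumes the common 1.5 × IQR convention for whiskers and outliers, which is a choice rather than a universal rule, and uses hypothetical delivery times.

```python
from statistics import quantiles

# Hypothetical delivery times in minutes.
data = [12, 14, 15, 15, 16, 17, 18, 19, 45]

q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences under the assumed 1.5 x IQR convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme values still inside the fences;
# anything outside the fences is drawn as an individual outlier point.
inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Box: {q1} to {q3}, median {med}")
print(f"Whiskers: {whisker_low} to {whisker_high}")
print(f"Outliers: {outliers}")
```

For this data the box runs from 15 to 18, the whiskers from 12 to 19, and the 45-minute delivery is flagged as an outlier.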
Distribution Shapes
Symmetric (Normal)
Bell-shaped. Mean ≈ Median. Values balanced around center.
Right-Skewed (Positive Skew)
Long tail to the right. Mean usually > Median. Common in income, prices, counts.
Left-Skewed (Negative Skew)
Long tail to the left. Mean usually < Median. Less common; think test scores on an easy exam, where many scores bunch up near the maximum.
Uniform
All values equally likely. Flat distribution.
Bimodal/Multimodal
Multiple peaks. May indicate distinct groups mixed together.
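The mean-versus-median relationships described for skewed shapes suggest a quick, rough check. The function name skew_hint and its tolerance of 0.1 are hypothetical choices for this sketch; it is a heuristic, not a formal skewness test, and it cannot detect bimodal data.

```python
from statistics import mean, median, pstdev

def skew_hint(values, tolerance=0.1):
    """Rough heuristic: compare mean and median, scaled by the standard deviation."""
    spread = pstdev(values)
    if spread == 0:
        return "no variation"
    gap = (mean(values) - median(values)) / spread
    if gap > tolerance:
        return "right-skewed"      # mean pulled above the median
    if gap < -tolerance:
        return "left-skewed"       # mean pulled below the median
    return "roughly symmetric"

# Hypothetical examples of each shape.
print(skew_hint([1, 2, 2, 3, 3, 3, 4, 20]))
print(skew_hint([60, 85, 90, 92, 95, 96, 98, 100]))
print(skew_hint([1, 2, 3, 4, 5]))
```

A histogram remains the more reliable way to judge shape; a check like this is only a first signal.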
Counts and Proportions
Counts
Simple tallies. How many in each category?
Proportions/Percentages
Part relative to whole. Often more meaningful than raw counts.
Example: "3,000 customers churned" vs. "5% of customers churned"
Rates
Quantities expressed per unit of time or per unit of size. Enable fair comparison between groups of different sizes.
Example: Revenue per employee, defects per 1,000 units
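A short sketch shows why proportions and rates allow fairer comparisons than raw counts. The customer base of 60,000 and the two plants' figures are hypothetical; the text only gives the 3,000 and 5% values.

```python
# Proportion: part relative to whole.
churned = 3_000
total_customers = 60_000        # hypothetical base, chosen so churn works out to 5%
churn_proportion = churned / total_customers

def defects_per_thousand(defects, units):
    """Rate: defects scaled to a common base of 1,000 units."""
    return defects / units * 1_000

# Hypothetical plants: A has more defects in total, but also far more output.
rate_a = defects_per_thousand(defects=12, units=8_000)
rate_b = defects_per_thousand(defects=9, units=3_000)

print(f"Churn: {churn_proportion:.1%}")
print(f"Plant A: {rate_a:.1f} per 1,000 units")
print(f"Plant B: {rate_b:.1f} per 1,000 units")
```

Plant A has more defects in raw count (12 versus 9), yet its rate of 1.5 per 1,000 units is half of Plant B's 3.0.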
Comparing Groups
Comparison of Means/Medians
Does one group have higher typical values?
Comparison of Spread
Does one group vary more than another?
Overlap
Do distributions overlap significantly, or are groups distinct?
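These three comparison questions can be answered with a few summary numbers per group. The sketch below uses hypothetical groups and a helper named quartile_summary that is not from any library; checking whether the middle-50% boxes overlap is a crude stand-in for a proper look at the distributions.

```python
from statistics import quantiles

def quartile_summary(values):
    """Return quartiles and IQR for one group (inclusive method assumed)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": q2, "q3": q3, "iqr": q3 - q1}

# Hypothetical scores for two groups.
group_a = [48, 50, 51, 52, 49, 50, 53]
group_b = [30, 45, 60, 75, 90, 55, 65]

a = quartile_summary(group_a)
b = quartile_summary(group_b)

higher_typical = "B" if b["median"] > a["median"] else "A"
more_variable = "B" if b["iqr"] > a["iqr"] else "A"

# Two intervals overlap when each one starts before the other ends.
boxes_overlap = a["q1"] <= b["q3"] and b["q1"] <= a["q3"]

print(f"Higher typical value: group {higher_typical}")
print(f"More variable:        group {more_variable}")
print(f"Middle 50% overlap:   {boxes_overlap}")
```

Group B has the higher median (60 versus 50) and a much wider IQR (20 versus 2), and the two boxes still overlap, so the groups are not cleanly separated.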
Which Statistic to Use?
For "Typical" Value
- Symmetric data: Mean
- Skewed data: Median
- Categorical data: Mode
For "How Spread Out"
- Symmetric data: Standard deviation
- Skewed data: IQR
- Simple overview: Range (with caution)
For Comparison
- Show both mean and median: A gap between them suggests skew
- Show percentiles: For fuller picture
AI Prompt: Summary Statistics
Calculate and interpret summary statistics for my data.
Here's my data:
[Paste data or describe]
Please provide:
1. Appropriate measures of center (mean and/or median)
2. Measures of spread
3. Distribution shape assessment
4. Notable percentiles
5. Plain-language interpretation of what these statistics tell us
AI Prompt: Choosing Statistics
Help me choose the right statistics for my data.
My data: [Describe your variable(s)]
My purpose: [What you're trying to show]
Distribution: [If known — skewed, symmetric, etc.]
Recommend:
1. Which central tendency measure to use and why
2. Which spread measure to use and why
3. Additional statistics that would be informative
4. How to present these to my audience
What's Next
Numbers inform. Visuals communicate.
Next chapter: Data visualization — turning numbers into understanding.