Finding Relationships
Correlation, Comparison, and Connections
Understanding how variables relate to each other is often the heart of analysis. Does X affect Y? Do these groups differ? What moves together?
Correlation
What Correlation Measures
Correlation measures the strength and direction of a linear relationship between two variables.
Correlation coefficient (r): Ranges from -1 to +1
- +1: Perfect positive linear relationship (as X increases, Y increases)
- 0: No linear relationship
- -1: Perfect negative linear relationship (as X increases, Y decreases)
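To make the coefficient concrete, here is a minimal sketch that computes Pearson's r from its definition using only the standard library. The variable names and numbers (`ad_spend`, `sales`, `returns`) are invented for illustration and do not come from any real dataset.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations, and sums of squared deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data, for illustration only
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]     # rises in lockstep with ad_spend
returns = [10, 8, 6, 4, 2]   # falls in lockstep with ad_spend

r_up = pearson_r(ad_spend, sales)
r_down = pearson_r(ad_spend, returns)
print(f"r (ad_spend vs sales):   {r_up:+.2f}")
print(f"r (ad_spend vs returns): {r_down:+.2f}")
```

Real data rarely sits exactly on a line, so values strictly between -1 and +1 are the norm.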
Interpreting Correlation
| Correlation | Interpretation |
|---|---|
| 0.9 to 1.0 | Very strong positive |
| 0.7 to 0.9 | Strong positive |
| 0.5 to 0.7 | Moderate positive |
| 0.3 to 0.5 | Weak positive |
| 0 to 0.3 | Little to none |
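The same thresholds apply to negative values: for example, -0.7 to -0.9 is a strong negative relationship. These cutoffs are rules of thumb, not fixed standards, and what counts as "strong" varies by field.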
Correlation Warnings
Correlation ≠ Causation: Two things moving together doesn't mean one causes the other.
Linear only: Correlation measures linear relationships. Strong non-linear relationships can have low correlation.
Outliers affect it: A few extreme points can dramatically change correlation.
Spurious correlations: With enough variables, some will correlate by chance.
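Two of these warnings are easy to see in a few lines of code. The sketch below uses invented numbers: a symmetric curve where Y is fully determined by X yet r comes out as zero, and a flat cloud of points where one extreme value pushes r close to 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation (assumes equal lengths and non-constant inputs)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# 1) Linear only: y = x^2 is a perfect relationship, but not a linear one
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = pearson_r(xs_curve, ys_curve)

# 2) Outliers: hypothetical points with no linear trend, then one extreme point
xs_base = [1, 2, 3, 4, 5]
ys_base = [2, 1, 3, 1, 2]
r_base = pearson_r(xs_base, ys_base)
r_with_outlier = pearson_r(xs_base + [20], ys_base + [20])

print(f"curve:           r = {r_curve:.2f}")
print(f"without outlier: r = {r_base:.2f}")
print(f"with outlier:    r = {r_with_outlier:.2f}")
```

Plotting the data before trusting a single coefficient is usually the quickest way to catch both problems.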
Comparing Groups
Are Groups Different?
Common question: Does one group perform differently than another?
Examples:
- New customers vs. existing customers
- Treatment vs. control
- Region A vs. Region B
- Before vs. after
What to Compare
Central tendency: Is the average/median different?
Distribution: Are the shapes different?
Variability: Is one group more variable?
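A minimal sketch of such a comparison, using Python's `statistics` module. The two groups and their order values are hypothetical, and the interquartile range is used here as one simple stand-in for distribution shape.

```python
from statistics import mean, median, stdev, quantiles

def summarize(values):
    """Central tendency and spread for one group."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),   # sample standard deviation
        "iqr": q3 - q1,           # spread of the middle half
    }

# Hypothetical order values per customer, for illustration only
new_customers = [20, 22, 25, 21, 90]        # one very large order
existing_customers = [30, 32, 31, 29, 33]

summary_new = summarize(new_customers)
summary_existing = summarize(existing_customers)

for name, summary in [("new", summary_new), ("existing", summary_existing)]:
    stats = ", ".join(f"{key}={value:.1f}" for key, value in summary.items())
    print(f"{name:>8}: {stats}")
```

In this made-up data the new customers have the higher mean but the lower median, which is why it helps to look at more than one summary before declaring a winner.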
Statistical Significance
A difference in means does not by itself show that the groups truly differ; the gap could be the result of random variation.
Statistical tests help determine if differences are likely real or just noise:
- t-test: Compare two group means
- Chi-square: Compare categorical distributions
- ANOVA: Compare multiple group means
Practical significance: Even statistically significant differences may be too small to matter practically.
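As an illustration of the "real or just noise" question, the sketch below runs a permutation test on the difference in means. This is not one of the tests listed above; it is swapped in because it needs only the standard library and makes the logic visible: shuffle the group labels many times and count how often chance alone produces a gap at least as large as the observed one. The data, the function name `permutation_p_value`, and the 0.05 cutoff mentioned afterward are assumptions for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        shuffled_gap = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_gap >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical measurements, for illustration only
control = [12, 15, 11, 14, 13, 12, 16, 13]
treatment = [18, 17, 19, 16, 20, 18, 17, 19]
p_clear = permutation_p_value(control, treatment)

# Two groups that barely differ
batch_1 = [12, 15, 11, 14]
batch_2 = [13, 12, 16, 13]
p_noise = permutation_p_value(batch_1, batch_2)

print(f"treatment vs control: gap = {mean(treatment) - mean(control):.2f}, p = {p_clear:.4f}")
print(f"batch_2 vs batch_1:   gap = {mean(batch_2) - mean(batch_1):.2f}, p = {p_noise:.4f}")
```

A small p-value (a common but arbitrary cutoff is 0.05) suggests the gap is unlikely to be noise. The size of the gap, printed next to it, is what speaks to practical significance.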
Segmentation Analysis
Splitting Data
Segmentation divides data into meaningful groups that can then be compared, for example:
- By customer type
- By time period
- By geography
- By product category
Finding Segments
Look for natural groupings where behavior differs.
Approach:
- Hypothesize segments
- Split data
- Compare metrics
- Validate differences
RFM Analysis (Example)
Classic customer segmentation:
- Recency: How recently did they buy?
- Frequency: How often do they buy?
- Monetary: How much do they spend?
Score customers on each dimension, then combine the scores into segments.
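One possible way to do the scoring is sketched below. It ranks customers on each dimension and maps the ranks to scores from 1 to 3, with 3 as best. The customer records, the `as_of` date, the three-bin scale, and the helper `rank_scores` are all assumptions for this example; real RFM schemes often use five bins and business-specific cutoffs.

```python
from datetime import date

# Hypothetical customers: (id, last purchase date, order count, total spend)
customers = [
    ("C1", date(2024, 6, 25), 12, 950.0),
    ("C2", date(2024, 1, 10), 2, 80.0),
    ("C3", date(2024, 5, 30), 5, 400.0),
    ("C4", date(2023, 11, 2), 1, 35.0),
    ("C5", date(2024, 6, 1), 8, 1200.0),
    ("C6", date(2024, 3, 15), 3, 150.0),
]
as_of = date(2024, 7, 1)  # assumed analysis date

def rank_scores(values, higher_is_better=True, n_bins=3):
    """Score each value from 1 (worst) to n_bins (best) by rank.

    Ties are broken by input order, which is a simplification.
    """
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, i in enumerate(order):
        scores[i] = 1 + rank * n_bins // len(values)
    return scores

days_since = [(as_of - last).days for _, last, _, _ in customers]
r_scores = rank_scores(days_since, higher_is_better=False)  # recent is better
f_scores = rank_scores([orders for _, _, orders, _ in customers])
m_scores = rank_scores([spend for _, _, _, spend in customers])

rfm = {
    cid: (r, f, m)
    for (cid, _, _, _), r, f, m in zip(customers, r_scores, f_scores, m_scores)
}
for cid, scores in rfm.items():
    print(cid, scores)
```

Customers scoring high on all three dimensions are candidates for a "best customers" segment, and those scoring low on all three for a "lapsed" segment. Those segment names are illustrative.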
Cross-Tabulation
What It Is
A table showing the relationship between two categorical variables.
Example
| | Product A | Product B | Product C |
|---|---|---|---|
| Region 1 | 100 | 50 | 30 |
| Region 2 | 80 | 120 | 60 |
| Region 3 | 150 | 90 | 45 |
What to Look For
- Are patterns consistent across categories?
- Do certain combinations stand out?
- Are differences meaningful?
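Raw counts are hard to compare when row totals differ, so a common first step is to convert each row to shares. The sketch below does this for the example table above, assuming the cells are unit counts (the table does not say what they measure).

```python
products = ["Product A", "Product B", "Product C"]
counts = {
    "Region 1": [100, 50, 30],
    "Region 2": [80, 120, 60],
    "Region 3": [150, 90, 45],
}

def row_shares(row):
    """Convert one row of counts to fractions of the row total."""
    total = sum(row)
    return [value / total for value in row]

shares = {region: row_shares(row) for region, row in counts.items()}
top_product = {
    region: products[row.index(max(row))] for region, row in counts.items()
}

for region, row in shares.items():
    cells = ", ".join(f"{p}: {s:.1%}" for p, s in zip(products, row))
    print(f"{region} -> {cells} (top: {top_product[region]})")
```

Seen as shares, Region 2 is the combination that stands out: it is the only region where Product B outsells Product A.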
Time-Based Comparisons
Period-over-Period
Compare the same metric across different time periods:
- Month over month
- Year over year
- This week vs. last week
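A period-over-period comparison usually comes down to a percentage change. Here is a minimal sketch, with hypothetical monthly revenue figures keyed by `(year, month)`.

```python
def pct_change(current, previous):
    """Percentage change from previous to current."""
    if previous == 0:
        raise ValueError("percentage change is undefined when the previous value is 0")
    return (current - previous) / previous * 100

# Hypothetical monthly revenue, for illustration only
revenue = {
    (2023, 6): 100.0,
    (2024, 5): 120.0,
    (2024, 6): 126.0,
}

month_over_month = pct_change(revenue[(2024, 6)], revenue[(2024, 5)])
year_over_year = pct_change(revenue[(2024, 6)], revenue[(2023, 6)])

print(f"Month over month: {month_over_month:+.1f}%")
print(f"Year over year:   {year_over_year:+.1f}%")
```

Year-over-year comparisons set a month against the same month a year earlier, which keeps seasonal effects from distorting the result.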
Seasonality
Patterns that repeat regularly:
- Holiday spikes
- Summer slowdowns
- Day-of-week patterns
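A quick check for a day-of-week pattern is to average the metric by weekday. The sketch below assumes two weeks of invented daily order counts starting on Monday, January 1, 2024.

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

start = date(2024, 1, 1)  # a Monday
# Hypothetical daily order counts for 14 consecutive days
daily_orders = [100, 102, 98, 101, 105, 160, 150,
                104, 99, 103, 100, 108, 170, 155]

by_weekday = defaultdict(list)
for offset, orders in enumerate(daily_orders):
    day = start + timedelta(days=offset)
    by_weekday[WEEKDAYS[day.weekday()]].append(orders)  # weekday(): Monday is 0

weekday_avg = {name: sum(vals) / len(vals) for name, vals in by_weekday.items()}

for name in WEEKDAYS:
    print(f"{name:<9} {weekday_avg[name]:6.1f}")
```

Two weeks is far too little data to confirm a seasonal pattern. The same grouping works with month or week-of-year keys once more history is available.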
Trend Analysis
Long-term direction:
- Is this metric growing, shrinking, or flat?
- What's the rate of change?
- Are there inflection points?
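One simple way to answer the first two questions is to fit a straight line through the series and read off its slope. The sketch below uses ordinary least squares with the period index as X. The monthly values and the `flat_tolerance` threshold are assumptions for illustration, and a straight-line fit will not reveal inflection points.

```python
def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 points to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var_x = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var_x

def describe_trend(values, flat_tolerance=0.5):
    """Label a series as growing, shrinking, or flat (tolerance is arbitrary)."""
    slope = trend_slope(values)
    if slope > flat_tolerance:
        return "growing"
    if slope < -flat_tolerance:
        return "shrinking"
    return "flat"

# Hypothetical monthly values, for illustration only
monthly_users = [100, 110, 120, 130, 140]
slope = trend_slope(monthly_users)
label = describe_trend(monthly_users)

print(f"Average change per month: {slope:+.1f} ({label})")
```

The slope is in units of the metric per period, so what counts as "flat" depends on the scale of the data.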
AI Prompt: Relationship Analysis
Help me analyze the relationship between these variables.
Variable 1: [Describe it]
Variable 2: [Describe it]
Sample data: [Paste if available]
Context: [What these represent]
Please help me:
1. Quantify the relationship
2. Visualize it appropriately
3. Interpret what this means
4. Identify any caveats
5. Suggest what this implies for my question
AI Prompt: Group Comparison
Help me compare these groups.
Group A: [Description and data]
Group B: [Description and data]
Metric I'm comparing: [What you're measuring]
Question: [What you want to know]
Please:
1. Calculate appropriate comparison statistics
2. Assess whether differences are meaningful
3. Visualize the comparison
4. Interpret the findings
5. Note any limitations
What's Next
Can we predict what will happen? Let's explore.
Next chapter: Basic predictive analysis.