Data Management
Working With Data
Real-world data is messy. It has inconsistencies, duplicates, gaps, and formatting problems. Good data management skills are what separate spreadsheet users from spreadsheet masters.
This chapter covers how to organize, clean, and maintain your data.
Sorting
Basic Sort
Select your data range (including headers) → Data → Sort
Options:
- Sort by one column
- Sort by multiple columns (first by Region, then by Date)
- Ascending (A-Z, smallest to largest, oldest to newest)
- Descending (Z-A, largest to smallest, newest to oldest)
Quick Sort
Click any cell in a column → Data → Sort A-Z or Sort Z-A
Warning: Make sure your entire data table is selected, or you'll sort only one column and misalign your data.
Custom Sort Order
For non-alphabetical ordering (like days of the week or priority levels): Sort → Order → Custom List
Filtering
AutoFilter
Select your data → Data → Filter
Dropdown arrows appear on headers. Click to filter:
- Check/uncheck specific values
- Filter by condition (greater than, contains, etc.)
- Search within the filter
Multiple Filters
Filters work together. Filter Column A to "Sales" AND Column B to ">1000"
Clearing Filters
Data → Clear to remove all filters, or clear individual column filters from the dropdown.
Advanced Filter
For complex criteria: Data → Advanced → Set criteria range
Allows OR conditions and more complex logic.
Removing Duplicates
Built-in Tool
Select your data → Data → Remove Duplicates
Choose which columns to consider when identifying duplicates.
Note: This permanently deletes duplicate rows. Consider working on a copy first.
Finding Duplicates (Without Removing)
Use conditional formatting: Home → Conditional Formatting → Highlight Cell Rules → Duplicate Values
Or use COUNTIF to flag duplicates:
=COUNTIF(A:A, A1)>1 returns TRUE for duplicates
Data Cleaning
Common Problems
Extra spaces: Use =TRIM(A1) to remove leading, trailing, and extra internal spaces.
Inconsistent capitalization: Use =PROPER(A1), =UPPER(A1), or =LOWER(A1).
Numbers as text: Look for left-aligned "numbers" or green triangles. Multiply by 1 or use Value().
Mixed formats: Dates entered as text, inconsistent date formats.
Typos and variations: "New York", "new york", "NY", "N.Y."
Find and Replace
Ctrl+H opens Find and Replace
Useful for:
- Fixing common typos
- Standardizing text
- Removing unwanted characters
Tip: Leave "Replace with" empty to delete the found text.
Text to Columns
Split a single column into multiple columns: Data → Text to Columns
Delimited: Split by comma, tab, space, or custom character. Fixed width: Split at specific positions.
Common use: Splitting "Last, First" into separate columns, or splitting addresses.
Flash Fill
Excel can recognize patterns and fill automatically (Excel 2013+):
- In a new column, type the pattern you want for the first 1-2 rows
- Excel may suggest the pattern automatically (press Enter to accept)
- Or: Data → Flash Fill (Ctrl+E)
Examples:
- Extract first name from full name
- Reformat phone numbers
- Combine or split fields
Data Validation
What It Does
Restricts what can be entered in a cell:
- Dropdown lists
- Number ranges
- Dates
- Text length
- Custom formulas
Setting Up Validation
Select cells → Data → Data Validation
Types:
- List: Create a dropdown from a list of values
- Whole number/Decimal: Set allowed range
- Date: Restrict to date range
- Text length: Min/max characters
- Custom: Use a formula to define valid inputs
Dropdown Lists
Data Validation → List → Enter values separated by commas, or reference a range.
Example: "Yes,No,Maybe" or "=$A$1:$A$5"
Input Messages and Alerts
Add helpful messages:
- Input Message: Shows when cell is selected
- Error Alert: Shows when invalid data is entered
Tables (Structured References)
Converting to Table
Select data → Insert → Table (Ctrl+T)
Table Benefits
- Automatic formatting and banding
- Filter buttons built-in
- Formulas auto-extend to new rows
- Structured references (use column names in formulas)
- Named automatically for easy reference
Structured References
Inside a table, use column names:
=SUM(Table1[Sales]) instead of =SUM(C:C)
=[@Revenue]-[@Cost] refers to same-row values
Expanding Tables
Just type below the last row — the table expands automatically.
Named Ranges
Creating Named Ranges
Select a range → Name Box (left of formula bar) → Type a name → Enter
Or: Formulas → Define Name
Naming rules:
- No spaces (use underscores)
- Start with letter or underscore
- No special characters
Using Named Ranges
Instead of: =SUM(B2:B100)
Use: =SUM(Sales_2024)
Makes formulas readable and easy to update.
Managing Names
Formulas → Name Manager — view, edit, delete named ranges
Data Organization Best Practices
One Dataset Per Sheet
Keep each distinct dataset on its own sheet. Don't mix multiple tables on one sheet.
Headers in Row 1
Column headers should be in the first row, clearly labeled.
No Blank Rows or Columns
Keep data contiguous. Blank rows break sorting, filtering, and pivot tables.
Consistent Data Types
Each column should have one data type. Don't mix dates and text in the same column.
Avoid Merged Cells
They break functionality. Use Center Across Selection instead if needed for appearance.
AI Prompt: Data Cleaning
I have messy data in Excel that needs cleaning.
The problems I see:
[Describe the issues — duplicates, inconsistent formats, etc.]
My data looks like:
[Describe or paste a few rows]
Help me:
1. Identify all the cleaning steps needed
2. Provide formulas or techniques for each step
3. Suggest the best order to clean
What's Next
Your data is organized. Now let's connect data across different locations.
Next chapter: Lookups and references — VLOOKUP, XLOOKUP, INDEX-MATCH, and connecting data.