Data Management

Working With Data

Real-world data is messy. It has inconsistencies, duplicates, gaps, and formatting problems. Good data management skills are what separate spreadsheet users from spreadsheet masters.

This chapter covers how to organize, clean, and maintain your data.

Sorting

Basic Sort

Select your data range (including headers) → Data → Sort

Options:

  • Sort by one column
  • Sort by multiple columns (first by Region, then by Date)
  • Ascending (A-Z, smallest to largest, oldest to newest)
  • Descending (Z-A, largest to smallest, newest to oldest)

Quick Sort

Click any cell in a column → Data → Sort A-Z or Sort Z-A

Warning: Make sure your entire data table is selected, or you'll sort only one column and misalign your data.

Custom Sort Order

For non-alphabetical ordering (like days of the week or priority levels): Sort → Order → Custom List

Filtering

AutoFilter

Select your data → Data → Filter

Dropdown arrows appear on headers. Click to filter:

  • Check/uncheck specific values
  • Filter by condition (greater than, contains, etc.)
  • Search within the filter

Multiple Filters

Filters work together. Filter Column A to "Sales" AND Column B to ">1000"

Clearing Filters

Data → Clear to remove all filters, or clear individual column filters from the dropdown.

Advanced Filter

For complex criteria: Data → Advanced → Set criteria range

Allows OR conditions and more complex logic.

Removing Duplicates

Built-in Tool

Select your data → Data → Remove Duplicates

Choose which columns to consider when identifying duplicates.

Note: This permanently deletes duplicate rows. Consider working on a copy first.

Finding Duplicates (Without Removing)

Use conditional formatting: Home → Conditional Formatting → Highlight Cell Rules → Duplicate Values

Or use COUNTIF to flag duplicates: =COUNTIF(A:A, A1)>1 returns TRUE for duplicates

Data Cleaning

Common Problems

Extra spaces: Use =TRIM(A1) to remove leading, trailing, and extra internal spaces.

Inconsistent capitalization: Use =PROPER(A1), =UPPER(A1), or =LOWER(A1).

Numbers as text: Look for left-aligned "numbers" or green triangles. Multiply by 1 or use Value().

Mixed formats: Dates entered as text, inconsistent date formats.

Typos and variations: "New York", "new york", "NY", "N.Y."

Find and Replace

Ctrl+H opens Find and Replace

Useful for:

  • Fixing common typos
  • Standardizing text
  • Removing unwanted characters

Tip: Leave "Replace with" empty to delete the found text.

Text to Columns

Split a single column into multiple columns: Data → Text to Columns

Delimited: Split by comma, tab, space, or custom character. Fixed width: Split at specific positions.

Common use: Splitting "Last, First" into separate columns, or splitting addresses.

Flash Fill

Excel can recognize patterns and fill automatically (Excel 2013+):

  1. In a new column, type the pattern you want for the first 1-2 rows
  2. Excel may suggest the pattern automatically (press Enter to accept)
  3. Or: Data → Flash Fill (Ctrl+E)

Examples:

  • Extract first name from full name
  • Reformat phone numbers
  • Combine or split fields

Data Validation

What It Does

Restricts what can be entered in a cell:

  • Dropdown lists
  • Number ranges
  • Dates
  • Text length
  • Custom formulas

Setting Up Validation

Select cells → Data → Data Validation

Types:

  • List: Create a dropdown from a list of values
  • Whole number/Decimal: Set allowed range
  • Date: Restrict to date range
  • Text length: Min/max characters
  • Custom: Use a formula to define valid inputs

Dropdown Lists

Data Validation → List → Enter values separated by commas, or reference a range.

Example: "Yes,No,Maybe" or "=$A$1:$A$5"

Input Messages and Alerts

Add helpful messages:

  • Input Message: Shows when cell is selected
  • Error Alert: Shows when invalid data is entered

Tables (Structured References)

Converting to Table

Select data → Insert → Table (Ctrl+T)

Table Benefits

  • Automatic formatting and banding
  • Filter buttons built-in
  • Formulas auto-extend to new rows
  • Structured references (use column names in formulas)
  • Named automatically for easy reference

Structured References

Inside a table, use column names: =SUM(Table1[Sales]) instead of =SUM(C:C)

=[@Revenue]-[@Cost] refers to same-row values

Expanding Tables

Just type below the last row — the table expands automatically.

Named Ranges

Creating Named Ranges

Select a range → Name Box (left of formula bar) → Type a name → Enter

Or: Formulas → Define Name

Naming rules:

  • No spaces (use underscores)
  • Start with letter or underscore
  • No special characters

Using Named Ranges

Instead of: =SUM(B2:B100) Use: =SUM(Sales_2024)

Makes formulas readable and easy to update.

Managing Names

Formulas → Name Manager — view, edit, delete named ranges

Data Organization Best Practices

One Dataset Per Sheet

Keep each distinct dataset on its own sheet. Don't mix multiple tables on one sheet.

Headers in Row 1

Column headers should be in the first row, clearly labeled.

No Blank Rows or Columns

Keep data contiguous. Blank rows break sorting, filtering, and pivot tables.

Consistent Data Types

Each column should have one data type. Don't mix dates and text in the same column.

Avoid Merged Cells

They break functionality. Use Center Across Selection instead if needed for appearance.

AI Prompt: Data Cleaning

I have messy data in Excel that needs cleaning.

The problems I see:
[Describe the issues — duplicates, inconsistent formats, etc.]

My data looks like:
[Describe or paste a few rows]

Help me:
1. Identify all the cleaning steps needed
2. Provide formulas or techniques for each step
3. Suggest the best order to clean

What's Next

Your data is organized. Now let's connect data across different locations.

Next chapter: Lookups and references — VLOOKUP, XLOOKUP, INDEX-MATCH, and connecting data.