Aquifer Intelligence Playbook

Multi-Source Data Fusion for Groundwater Understanding

Author: Talking Aquifer

Affiliation: Cardiff University & Prairie Research Institute, University of Illinois Urbana-Champaign

Published: March 13, 2026

Stylized illustration of a landscape with wells, a stream, and an underlying aquifer.

0.1 Welcome

This playbook demonstrates how to extract knowledge from multiple data sources to understand underground water systems. It combines:

  • HTEM Geophysical Data (4.8 GB) - Subsurface structure and geology
  • Groundwater Monitoring (114 MB) - 356 wells documented, 18 with time series data
  • Weather/Climate Data (6 GB) - High-resolution meteorological records
  • USGS Stream Gauges - Surface water flow from 9 active gauges (3 within study area)

The power comes from fusion - combining these sources reveals insights impossible to obtain from any single dataset alone.

This playbook is grounded in the Mahomet aquifer system in central Illinois, but the patterns and workflows are intended to be reusable for other groundwater systems facing similar monitoring and decision challenges.

Note: Who This Playbook Is For

This playbook is designed for anyone curious about underground water systems:

  • Complete beginners who have never studied groundwater before
  • Students at any level wanting to explore environmental data science
  • Data enthusiasts who want hands-on examples with real environmental data
  • Water professionals who want to see what modern data tools can reveal
  • Decision makers who need to understand aquifer behavior and risks

No prerequisites required. We build everything from scratch—explaining concepts as we go. You do not need a water science background. Basic familiarity with Python is helpful but not required—all code is provided and explained, and you can skip it entirely and focus on the explanations and visualizations.

Tip: What This Playbook Does (and Does Not) Teach
  • This playbook does not teach Python or programming from scratch.
  • We provide complete, working code so you can:
    • See exactly how each analysis was run.
    • Understand which techniques are being used (trends, fusion, optimization, causal graphs, etc.).
    • Focus on what the results say about the aquifer and its behavior, not on learning syntax.

The primary goal is knowledge and insight extraction from four datasets—how the underground system works, how sources relate, and what patterns emerge. Decision-making examples appear mainly as illustrations of how such insights might be used in practice.

If you are not a coder, you can skim the code blocks and focus on the figures, explanations, and key takeaways.

0.2 Learning Objectives

By the end of this chapter, you will be able to:

  1. Identify the four core data sources (HTEM, wells, weather, streams) and explain what each contributes to understanding the aquifer system.
  2. Select a learning pathway that matches your background and time commitment (see Learning Pathways for detailed guidance).
  3. Articulate the types of questions this playbook answers about groundwater systems and data fusion.
  4. Navigate interactive figures and code examples effectively throughout the book.

Estimated Time: 30-45 minutes (reading); 60-90 minutes (with code exploration)

0.3 How to Use This Playbook

0.3.1 If You Are Completely New to Groundwater

  • Start with:
    • Learning Pathways: “No Water Background” track.
    • Part 1 overview and HTEM Survey + Subsurface 3D Model chapters.
  • As you read:

0.3.2 If You Are a Data / ML Person

  • Start with:
  • While reading:
    • Treat chapters as case studies of techniques on environmental data.
    • Use the FAQ and Terminology chapters to fill water-domain gaps.

0.3.3 If You Are a Water / Hydro Expert

  • Start with:
    • Part 1 foundations (HTEM, wells, streams, weather).
    • Then jump to fusion and operations chapters that match your interests.
  • Focus on:
    • How multi-source data changes interpretation.
    • How quantitative tools support decisions (e.g., well siting, early warning).

0.3.4 If You Are a Decision Maker

  • Start with:
  • Focus on:
    • The big-picture behavior of the aquifer and the main uncertainties.
    • How data-driven insights can inform policy, planning, and communication.

0.4 Book Structure

flowchart LR
    subgraph P1["Part 1: Foundations"]
        A1[HTEM Survey]
        A2[Well Network]
        A3[Weather Data]
        A4[Stream Gauges]
    end

    subgraph P2["Part 2: Spatial"]
        B1[Material Maps]
        B2[Coverage Analysis]
        B3[Gap Identification]
    end

    subgraph P3["Part 3: Temporal"]
        C1[Trends]
        C2[Seasonality]
        C3[Extremes]
    end

    subgraph P4["Part 4: Fusion"]
        D1[Water Balance]
        D2[Causal Networks]
        D3[Predictions]
    end

    subgraph P5["Part 5: Operations"]
        E1[Forecasting]
        E2[Optimization]
        E3[Dashboards]
    end

    P1 --> P2
    P1 --> P3
    P2 --> P4
    P3 --> P4
    P4 --> P5

| Part | Focus | Chapters | Key Question |
|---|---|---|---|
| 1. Data Foundations | Individual sources | 7 | What data do we have? |
| 2. Spatial Patterns | Geographic analysis | 9 | Where are things? |
| 3. Temporal Dynamics | Time series | 11 | How do things change? |
| 4. Data Fusion | Integration | 13 | What emerges from combination? |
| 5. Operations | Applications | 10 | How do we use this? |
| Reference | Support | 4 | How do I understand terms/data? |

Total: 54 chapters (consolidated from 111 in the previous version)

0.5 The Story of the Aquifer

At the heart of this playbook is a hidden underground system—a buried valley filled with sand and gravel that stores and moves water like a giant sponge beneath the landscape.

Note: Key Terms in This Chapter
  • Aquifer: A body of rock or sediment that can store and transmit usable quantities of groundwater.
  • Recharge: Water that moves from the land surface down into the aquifer, often following rainfall or snowmelt.
  • Baseflow: The portion of streamflow that comes from groundwater slowly seeping into rivers and streams.
  • Confined aquifer: An aquifer that is overlain by low-permeability material, so water is under pressure and water levels can respond with a delay.

For a broader glossary, see the Terminology Translation chapter.

We combine four kinds of observations to understand this system:

  • HTEM: Like an MRI of the ground, showing where different underground materials are.
  • Wells: Direct measurements of how water levels change over time at specific points.
  • Weather: How much water falls from the sky and how conditions change day to day.
  • Streams: How water moves on the surface and interacts with the aquifer.

Figure: Conceptual cross-section of the Mahomet buried-valley aquifer system, showing confining layers, wells, and a connected stream.

Throughout the book, you will see how these pieces fit together to answer questions like:

  • Where is the main aquifer located and how thick is it?
  • Are water levels going up, down, or staying stable?
  • How quickly does the aquifer respond to droughts and heavy rains?
  • Where should we place new wells to balance yield, cost, and sustainability?
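One simple way to approach the response-time question is a lagged correlation between a driver (such as rainfall) and a response (such as water level). The sketch below is a minimal illustration on synthetic numbers, not the playbook's own method; the function names `pearson` and `best_lag` are hypothetical helpers introduced here.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def best_lag(driver, response, max_lag):
    """Return (lag, r): the delay, in time steps, at which the response
    correlates most strongly with the earlier driver values."""
    best = (0, pearson(driver, response))
    for lag in range(1, max_lag + 1):
        # Pair driver[t] with response[t + lag]
        r = pearson(driver[:-lag], response[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Synthetic data only: the "level" series is the "rain" series delayed by 3 steps.
rain = [0, 5, 1, 0, 8, 2, 0, 0, 6, 1, 0, 3, 0, 7, 0, 1]
level = [0, 0, 0] + rain[:-3]

lag, r = best_lag(rain, level, max_lag=6)
print(f"strongest response at lag {lag} (r = {r:.2f})")
```

Real groundwater records are noisier and often need detrending first, so a peak in lagged correlation is a starting hint about response time rather than proof of a causal delay.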

0.6 Quick Start Paths

This section summarizes the recommendations above in three concrete example personas.

0.6.1 For Computer Scientists

Start with Data Quality Audit, then explore Causal Discovery and ML Classification.

0.6.2 For Hydrogeologists

Start with HTEM Survey and 3D Model, then Water Balance.

0.6.3 For Decision Makers

Jump to Synthesis Narrative for key findings, then Value of Information for ROI analysis.

0.7 Key Findings Preview

Important: Critical Discovery: Monitoring Gap

Wells with water level data are 3-25 km away from stream gauges. No collocated sensors exist for stream-aquifer interaction studies. This spatial mismatch limits fusion opportunities. Methods and maps for this finding are developed in Well Network Analysis and Stream Gauge Network, and summarized in the Data Quality Audit. The exact distances and thresholds are discussed, with assumptions and uncertainties, in those chapters.
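A well-to-gauge separation like this can be estimated with a great-circle (haversine) distance from each well to its nearest gauge. The following is a minimal sketch, assuming coordinates in decimal degrees; the coordinates and the names `well`, `gauges`, and `nearest_gauge` are made up for illustration and are not taken from the playbook's datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_gauge(well, gauges):
    """Return (gauge_id, distance_km) for the gauge closest to the well."""
    return min(
        ((gid, haversine_km(well[0], well[1], lat, lon)) for gid, (lat, lon) in gauges.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates, for illustration only.
well = (40.12, -88.25)
gauges = {"gauge_A": (40.10, -88.20), "gauge_B": (40.30, -88.60)}

gauge_id, dist_km = nearest_gauge(well, gauges)
print(f"nearest gauge: {gauge_id}, {dist_km:.1f} km away")
```

Repeating this for every well with time series data gives the distribution of well-to-gauge distances that a collocation threshold can then be compared against.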

Tip: Fusion Value: $2.4M NPV

Combining all 4 data sources with ML models yields approximately $2.4M net present value over 5 years through better well siting, reduced drilling failures, and optimized pumping schedules. See Value of Information for the economic analysis details (including assumptions and sensitivity tests) and Well Placement Optimizer for the siting scenarios behind this estimate.
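For readers new to the term, net present value discounts each future cash flow back to today and sums the results. The sketch below shows the standard formula only; the discount rate and cash flows are placeholder numbers chosen for illustration and are not the inputs behind the $2.4M estimate.

```python
def npv(rate, cashflows):
    """Net present value, where cashflows[0] occurs now (t = 0)
    and cashflows[t] occurs at the end of year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))


# Placeholder inputs (NOT from the playbook's economic analysis):
# an upfront cost followed by five years of equal benefits.
illustrative_rate = 0.05
illustrative_cashflows = [-100.0, 50.0, 50.0, 50.0, 50.0, 50.0]

value = npv(illustrative_rate, illustrative_cashflows)
print(f"illustrative NPV: {value:.1f}")
```

Because the result depends strongly on the discount rate and the assumed benefits, an NPV figure is best read alongside the sensitivity tests that accompany it.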

Note: Aquifer Behavior: Confined System

The Mahomet Aquifer (Unit D) shows confined aquifer signatures: minimal seasonal amplitude (0.03-0.11 ft), 12-18 month memory, and 18-24 month drought recovery time. These behaviors are quantified in Water Level Trends, Seasonal Decomposition, and Memory Persistence Study, where you will also see how robust these ranges are under different modeling choices.
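To make these two quantities concrete, the sketch below computes a seasonal amplitude and a simple autocorrelation-based memory length from a monthly series. The definitions used here (half the range of the mean monthly cycle, and the first lag at which autocorrelation drops below 1/e) are assumptions chosen for illustration; the chapters cited above may define them differently. The input is a synthetic sine wave, not aquifer data.

```python
import math
from statistics import mean


def seasonal_amplitude(monthly_values, period=12):
    """Half the range of the mean seasonal cycle (assumed definition)."""
    overall = mean(monthly_values)
    # Average all Januaries, all Februaries, ... relative to the overall mean
    cycle = [mean(monthly_values[m::period]) - overall for m in range(period)]
    return (max(cycle) - min(cycle)) / 2


def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag (biased estimator)."""
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - mu) * (values[i + lag] - mu) for i in range(len(values) - lag))
    return num / denom


def memory_length(values, max_lag, threshold=1 / math.e):
    """First lag at which autocorrelation falls below the threshold, or None."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(values, lag) < threshold:
            return lag
    return None


# Synthetic 4-year monthly series: a pure seasonal cycle with 0.05 amplitude.
# It exercises the functions; it does not represent real aquifer behavior.
levels = [0.05 * math.sin(2 * math.pi * m / 12) for m in range(48)]

amp = seasonal_amplitude(levels)
mem = memory_length(levels, max_lag=24)
print(f"seasonal amplitude: {amp:.3f}, memory length: {mem} months")
```

On real records, a long-term trend should usually be removed first, since an unremoved trend inflates autocorrelation and can make the apparent memory look longer than it is.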

0.8 Interactive Visualizations

Most analytical figures in this playbook are interactive Plotly charts:

  • Hover for detailed data points
  • Zoom to explore regions of interest
  • Pan to navigate large datasets
  • Download data as CSV or images as PNG

Some conceptual illustrations (like the aquifer cross-section above) are static schematics, but analytical plots are designed to be explored interactively. For accessibility, key findings are always described in the surrounding text, and figures can be exported as static images or CSV files. Keyboard users can reach interactive charts via normal browser focus and use built-in Plotly controls (such as zoom and reset buttons) without relying on a mouse. For a concrete interactive example and a short walkthrough of what to look for, see Water Level Trends. For tips on using figures with screen readers and keyboard navigation, see the accessibility notes in the FAQ.

0.9 Getting Started

Before you begin, you will need:

  • Python 3.11+
  • Quarto installed (for rendering the book)
  • Basic familiarity with Git and the command line
# Clone the repository
git clone https://github.com/ngcharithperera/aquifer-data.git
cd aquifer-data

# Install dependencies
pip install -r requirements.txt

# Preview the book
cd aquifer-book
quarto preview

Make sure your config/data_config.yaml is configured to point to local copies of the HTEM, well, weather, and stream datasets; see the repository quickstart guide and the config/ documentation for details. The full datasets are large (approximately 11 GB total) and may not be bundled by default; follow the instructions there to obtain or mirror them locally.

Important: Data Requirements

Total Storage Required: ~11 GB

  • HTEM data: 4.8 GB
  • Weather/Climate: 6 GB
  • Groundwater DB: 114 MB
  • USGS Stream: ~50 MB

Setup Time: 30-60 minutes (first time only)

See QUICKSTART.md for detailed download and configuration instructions.
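Before starting the downloads, it can help to check free disk space and confirm that the configured data locations exist. The sketch below uses only the Python standard library; the example paths and the helper names `free_space_gb` and `missing_paths` are hypothetical and are not part of the repository.

```python
import shutil
from pathlib import Path

REQUIRED_GB = 11  # approximate total from the storage list above


def free_space_gb(path="."):
    """Free space, in gigabytes, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def missing_paths(paths):
    """Return the names whose configured location does not exist."""
    return [name for name, location in paths.items() if not Path(location).exists()]


# Hypothetical locations; substitute the paths from your own configuration.
example_paths = {
    "htem": "data/htem",
    "groundwater": "data/groundwater.db",
    "weather": "data/weather.db",
    "streams": "data/usgs_stream",
}

available = free_space_gb(".")
if available < REQUIRED_GB:
    print(f"Only {available:.1f} GB free; about {REQUIRED_GB} GB is needed.")
for name in missing_paths(example_paths):
    print(f"Not found yet: {name} -> {example_paths[name]}")
```

Running a check like this once before setup avoids discovering a full disk or a mistyped path partway through a multi-gigabyte download.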

0.10 Data Access

from src.utils import get_data_path
from src.data_loaders import IntegratedDataLoader

htem_root = get_data_path("htem_root")
aquifer_db_path = get_data_path("aquifer_db")
weather_db_path = get_data_path("warm_db")
usgs_stream_root = get_data_path("usgs_stream")

with IntegratedDataLoader(
    htem_path=htem_root,
    aquifer_db_path=aquifer_db_path,
    weather_db_path=weather_db_path,
    usgs_stream_path=usgs_stream_root,
) as loader:
    # Load all 4 data sources using the configured paths
    htem = loader.htem.load_material_type_grid("D", "Preferred")
    wells = loader.groundwater.load_well_locations()
    weather = loader.weather.load_hourly_data(station_code=101)
    streams = loader.usgs_stream.load_daily_discharge("03337000")

    # Get overview of available data coverage
    overview = loader.get_data_overview()
    print(overview)

This example assumes your data paths are configured in config/data_config.yaml and that the aquifer, weather, and stream databases are available under the default data/ locations (or symlinked equivalents) referenced there. The station_code (101) and USGS site number (03337000) are representative examples; in your own environment, use station and site identifiers that exist in your local data. You can find lists of available IDs and fields in the Complete Data Dictionary.

0.11 Reflection Questions

  • After skimming this introduction, which learning pathway and first chapter will you start with?
  • Which of the four core data sources (HTEM, wells, weather, streams) is most familiar to you, and which is most new?
  • What kinds of questions about the aquifer (trends, risks, decisions) are you most interested in answering as you work through the playbook?

0.12 Contributing

This is a living document. Contributions are welcome:

  1. Report issues on GitHub
  2. Submit pull requests for improvements
  3. Add new analyses following the patterns established

See FAQ for common questions.


Last updated: March 13, 2026


0.13 Playbook Philosophy: Insight First, Applications Second

This playbook is, above all, about understanding:

  • How a buried aquifer system is structured.
  • How it responds to weather, pumping, and time.
  • How four complementary datasets can be combined to reveal patterns we could not see with any one alone.

Chapters in Parts 4 and 5 sometimes show how these insights might be applied (e.g., comparing well locations, sketching a dashboard, estimating value of information). These are meant as illustrative examples of what becomes possible once we understand the system—not as prescriptive decision tools.

0.13.1 While You Read

As you read, you can always ask:

  • What new thing did we learn about the aquifer from this chapter?
  • How did combining datasets change or sharpen that understanding?

If those questions are answered clearly, the playbook is doing its job.

0.13.2 Quick Self-Check

After finishing this chapter, you should be able to:

  • Name the four core data sources and say, in one sentence each, what they contribute to understanding the aquifer.
  • Identify which part of the book you are most interested in (Foundations, Spatial, Temporal, Fusion, or Operations) and a reasonable starting chapter for your background.
  • Explain, in 1–2 sentences, why data fusion (combining multiple sources) reveals insights that a single dataset cannot.