30  Data Fusion Overview

The CORE Value - Combining Multiple Data Sources

Tip: For Newcomers

You will get:

  • A clear sense of why combining data sources matters more than looking at any one alone.
  • Examples of how structure, weather, wells, and streams fit into a single story.
  • A first look at concepts like water balance, causal networks, and information value.

Read these first if you are new: the Part 1 overview and at least one chapter on each core dataset (HTEM, wells, weather, streams).

You can treat the more mathematical chapters as optional deep dives on first reading and focus on the narrative and key findings.

30.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

  • Explain the role of Part 4 in the book and why fusing HTEM, wells, weather, and streams reveals insights that no single source can provide on its own.
  • Describe the main categories of fusion analyses in this part (water balance, structural–temporal coupling, causal/network, and decision‑support) and how they build on earlier parts.
  • Identify which prerequisite chapters you should read first based on your background (manager, practitioner, researcher) and what you want to achieve.
  • Articulate at least one concrete management or research question that motivates using data fusion rather than single‑source analyses.

30.2 The Fusion Paradigm

Part 4 is the heart of this playbook: it combines multiple data sources to extract insights that no single source can provide on its own.

30.2.1 Why Data Fusion Matters

Important: 🎯 The Central Insight

1 + 1 > 2: The value of combining data sources exceeds the sum of their individual values.

  • HTEM alone tells us about aquifer structure
  • Weather alone tells us about forcing
  • Groundwater wells alone show response
  • Stream gauges alone measure discharge

But combined, they reveal:

  • How structure controls response
  • Where recharge occurs
  • When water moves through the system
  • Why the system behaves as it does

30.2.2 The 12 Fusion Analyses

Each chapter in this part combines 2+ data sources:

| Chapter | Sources Fused | Key Question |
|---|---|---|
| 1. Water Balance Closure | Weather + GW + Stream | Does the water budget close? |
| 2. Recharge Rate Estimation | HTEM + Weather + GW | Where does precipitation become recharge? |
| 3. Stream-Aquifer Exchange | USGS Stream + GW | Are streams gaining or losing? |
| 4. HTEM-Groundwater Fusion | HTEM + GW | Does structure control response? |
| 5. Weather-Response Fusion | HTEM + Weather + GW | How does geology moderate climate impact? |
| 6. Temporal Fusion Engine | All 4 sources | Can we predict using all data? |
| 7. Causal Discovery Network | All 4 sources | What causes what? |
| 8. Information Flow Analysis | All 4 sources | Which sensors provide most value? |
| 9. Network Connectivity Map | GW + Stream + HTEM | How is water connected? |
| 10. Scenario Impact Analysis | All 4 sources | What happens under change? |
| 11. Bayesian Uncertainty Model | All 4 sources | How uncertain are we? |
| 12. Value of Information | All 4 sources | Which data should we collect? |

30.2.3 Methodological Progression

Chapters 1-3: Physical water balance

  • Mass balance equations
  • Conservation of water
  • Quantitative flux estimation

Chapters 4-6: Structural-temporal coupling

  • Geology controls dynamics
  • Multi-source prediction
  • Feature engineering from fusion

Chapters 7-9: Causal and network analysis

  • Directional relationships
  • Information theory
  • System connectivity

Chapters 10-12: Decision support

  • Scenario simulation
  • Uncertainty quantification
  • Optimal monitoring design
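To make the idea of "closure" in the physical water balance group (Chapters 1-3) concrete, here is a minimal sketch of a budget residual. The budget form (P - ET - Q - ΔS), the function names, and all numbers are illustrative assumptions, not the formulation used in those chapters.

```python
def water_balance_residual(precip_mm, et_mm, streamflow_mm, storage_change_mm):
    """Residual of P - ET - Q - dS, all in mm over the same area and period.

    A residual near zero means the budget closes; a large residual points to
    measurement error or to a flux that is missing from the budget.
    """
    return precip_mm - et_mm - streamflow_mm - storage_change_mm


def closure_error_pct(residual_mm, precip_mm):
    """Residual expressed as a percentage of precipitation."""
    return 100.0 * residual_mm / precip_mm


# Hypothetical annual values (mm), invented for illustration only
residual = water_balance_residual(
    precip_mm=1000.0, et_mm=650.0, streamflow_mm=300.0, storage_change_mm=20.0
)
error_pct = closure_error_pct(residual, precip_mm=1000.0)
print(f"residual = {residual} mm ({error_pct}% of precipitation)")
```

Each term here would come from a different source (weather for P and ET, stream gauges for Q, wells for ΔS), which is why closure is a natural first test of whether the sources agree.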

30.2.4 Key Principles

Note: 💻 For Computer Scientists

Data fusion is not simple concatenation:

# ❌ Wrong: Simple merge
df_merged = df_htem.merge(df_gw).merge(df_weather)
model.fit(df_merged, target)

# ✅ Right: Physics-informed fusion
# 1. Align spatially (nearest neighbor, kriging)
# 2. Align temporally (common timebase, lags)
# 3. Engineer interaction features (K × P, VCI moderation)
# 4. Validate with physical constraints

Challenges addressed:

  • Spatial misalignment: HTEM grid ≠ well locations
  • Temporal misalignment: Hourly weather → monthly GW response
  • Scale mismatch: Point measurements vs grid averages
  • Missing data: Not all sources overlap in time/space
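The four physics-informed steps named above can be sketched with pandas and NumPy as follows. This is a minimal illustration under stated assumptions: every table, column name (`k_proxy`, `p_mm`, `k_x_p`), and value is hypothetical, nearest-neighbor matching stands in for whatever spatial method a given chapter uses, and a one-month lag is chosen arbitrarily.

```python
import numpy as np
import pandas as pd

# --- Hypothetical inputs (all values invented for illustration) ---
htem = pd.DataFrame({
    "x": [0.0, 100.0, 200.0],
    "y": [0.0, 0.0, 0.0],
    "k_proxy": [5.0, 20.0, 1.0],  # stand-in for an HTEM-derived conductivity proxy
})
wells = pd.DataFrame({
    "well_id": ["W1", "W2"],
    "x": [90.0, 210.0],
    "y": [10.0, -5.0],
})
# 90 days of hourly timestamps: January through March 2023
hours = pd.date_range("2023-01-01", periods=90 * 24, freq=pd.Timedelta(hours=1))
# 1 mm of rain in the first hour of every day, zero otherwise
precip_hourly = pd.Series(np.where(hours.hour == 0, 1.0, 0.0), index=hours)

# 1. Align spatially: nearest HTEM cell for each well (brute-force distances)
dx = wells["x"].to_numpy()[:, None] - htem["x"].to_numpy()[None, :]
dy = wells["y"].to_numpy()[:, None] - htem["y"].to_numpy()[None, :]
nearest_cell = np.hypot(dx, dy).argmin(axis=1)
wells["k_proxy"] = htem["k_proxy"].to_numpy()[nearest_cell]

# 2. Align temporally: hourly precipitation -> monthly totals, plus a 1-month lag
precip_monthly = precip_hourly.resample("MS").sum().rename("p_mm")
monthly = precip_monthly.to_frame()
monthly["p_mm_lag1"] = monthly["p_mm"].shift(1)
monthly.index.name = "month"
monthly = monthly.reset_index()

# 3. Engineer an interaction feature: K x P for every well-month pair
fused = wells.merge(monthly, how="cross")
fused["k_x_p"] = fused["k_proxy"] * fused["p_mm"]

# 4. Validate with simple physical constraints before any modeling
assert (fused["p_mm"] >= 0).all(), "precipitation cannot be negative"
assert (fused["k_proxy"] > 0).all(), "conductivity proxy must be positive"

print(fused[["well_id", "month", "k_proxy", "p_mm", "p_mm_lag1", "k_x_p"]])
```

Brute-force distances are fine for a handful of wells; for large grids a spatial index or kriging would be the more usual choice. The first month has no lagged value, which is a small instance of the missing-data challenge listed above.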

Tip: 🌍 For Hydrogeologists

Why fusion matters for groundwater:

  1. Aquifer characterization: HTEM provides K, wells provide heads → solve for T
  2. Recharge estimation: Weather provides P, HTEM provides infiltration capacity → R
  3. Baseflow separation: GW levels + stream discharge → groundwater contribution
  4. Drought prediction: Weather forecast + aquifer storage + pumping → future levels

Historical context: Before geophysics, we relied on sparse wells. Before continuous monitoring, we relied on snapshots. Fusing these sources gives a far more complete picture than any one of them provides alone.
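Two of the relationships in the numbered list above can be written down directly. The sketch below assumes an unconfined aquifer where saturated thickness is head minus aquifer base elevation (so T = K × b), and it assumes a K estimate and a baseflow volume are already available from earlier analyses. Function names and numbers are hypothetical.

```python
def transmissivity(k_m_per_day: float, head_m: float, aquifer_base_m: float) -> float:
    """T = K * b for an unconfined aquifer, with b = head - aquifer base (m)."""
    saturated_thickness_m = head_m - aquifer_base_m
    if saturated_thickness_m < 0:
        raise ValueError("head is below the aquifer base")
    return k_m_per_day * saturated_thickness_m


def baseflow_index(baseflow_volume: float, total_streamflow_volume: float) -> float:
    """Fraction of streamflow attributed to groundwater (baseflow / total)."""
    if total_streamflow_volume <= 0:
        raise ValueError("total streamflow must be positive")
    return baseflow_volume / total_streamflow_volume


# Hypothetical values, for illustration only
t_m2_per_day = transmissivity(k_m_per_day=10.0, head_m=205.0, aquifer_base_m=180.0)
bfi = baseflow_index(baseflow_volume=12.0, total_streamflow_volume=30.0)
print(t_m2_per_day)  # 250.0 (m^2/day)
print(round(bfi, 2))
```

In both cases the inputs come from different sources (structure plus heads, or wells plus stream gauges), so neither quantity can be computed from a single dataset.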

30.2.5 Cross-References

Prerequisites (read these first):

  • Part 1: Individual data source characteristics
  • Part 2: Data loading and quality control
  • Part 3: Single-source analyses (establish baselines)

Builds toward:

  • Part 5: Predictive models, optimization, and operational dashboards

30.2.6 Visualization Philosophy

Every fusion chapter includes:

  • Sankey diagrams: Show water/information flow between sources
  • Dual-axis time series: Overlay sources on common timeline
  • Correlation matrices: Quantify inter-source relationships
  • Spatial overlays: Show where sources agree/disagree
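As one small example of the correlation matrices mentioned above, the sketch below computes pairwise Pearson correlations between three already-aligned monthly series using pandas. The column names and values are invented for illustration, and the series are assumed to share a common timebase.

```python
import pandas as pd

# Hypothetical monthly values on a shared timebase (illustration only)
sources = pd.DataFrame({
    "precip_mm": [80.0, 60.0, 40.0, 20.0, 10.0, 30.0],
    "gw_level_m": [10.2, 10.5, 10.4, 10.1, 9.8, 9.7],
    "discharge_m3s": [5.0, 4.2, 3.1, 2.0, 1.5, 2.2],
})

# Pairwise Pearson correlation between sources
corr = sources.corr()
print(corr.round(2))
```

A same-month correlation can understate a real relationship when one source responds with a delay, so in practice it is worth repeating the calculation on lagged copies of the forcing series.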

30.2.7 Reading Guide

For managers (30 min):

  • Read Chapter 1 (Water Balance) to understand water accounting
  • Read Chapter 10 (Scenario Analysis) to see decision support in action
  • Skim Chapter 12 (Value of Information) to prioritize monitoring

For practitioners (3 hours):

  • Work through Chapters 1-6 sequentially
  • Focus on methods applicable to your aquifer
  • Adapt code to your data sources

For researchers (1 week):

  • Deep dive into all 12 chapters
  • Reproduce analyses with provided data
  • Extend methods to novel fusion combinations

30.3 Reflection Questions

  • Think about a groundwater management decision in your region (for example, drought planning, new well siting, or MAR design). Which data sources from this playbook would you need to combine to support that decision, and why is a single dataset insufficient?
  • Looking at the 12 fusion analyses, which ones feel directly useful for your work today, and which ones seem like something “future you” or a partner might care about? How does that influence how you will read this part?
  • Where do you expect the biggest mismatches in space, time, or scale between the data sources (HTEM, wells, weather, streams), and what risks do those mismatches pose if you ignore them in fusion analyses?
  • If you could add one new monitoring asset (for example, an extra well, a stream gauge, or a weather station), where would you place it to maximize the value of information for fusion‑based decision support?

Next: Chapter 1 - Water Balance Closure (The foundation of all fusion analyses)