34 HTEM-Groundwater Fusion

Subsurface Structure Controls Water Response

For Newcomers

You will get: - A picture of how underground materials (from HTEM) relate to how water levels behave in wells. - Examples of how structure (clay vs sand, thickness, depth) influences trends and seasonality. - Practice reading fusion plots that connect “what the ground is like” to “how the water moves”.

You can skim detailed model-fitting steps and focus on: - The maps and plots showing structure vs response, - The narrative about which structural patterns matter most for groundwater behavior, - And the main qualitative conclusions.

Data Sources Fused: HTEM Structure + Groundwater Wells

34.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Describe how HTEM-derived subsurface structure (resistivity, material types, depth) relates to observed groundwater behavior in wells.
Interpret maps and summary plots that link structural classes (clay, mixed, sand/gravel) to response metrics such as variance, autocorrelation, and rate of change.
Explain why fusing dense spatial geophysics with sparse but rich well time series gives a more complete picture than either alone.
Discuss how structure–response relationships can inform well placement, monitoring frequency, and vulnerability assessment.

34.2 Overview

HTEM geophysics reveals the 3D structure of the subsurface - where clay layers confine the aquifer, where sand and gravel transmit water rapidly, and where bedrock forms boundaries. Groundwater well measurements show how water levels respond to pumping and recharge. The fusion question: How does geological structure control groundwater behavior?

This chapter connects static structure (HTEM) to dynamic response (water levels).

💻 For Computer Scientists

This is a structure-function mapping problem:

Input: HTEM resistivity (proxy for lithology)
Output: Well response characteristics (variance, trend, autocorrelation)
Challenge: Non-stationary relationships - same resistivity means different things at different depths

Machine learning approach: Spatial features from HTEM predict temporal patterns in wells.

🌍 For Hydrologists

Aquifer properties control well response:

HTEM Signature	Lithology	Expected Well Response
Low resistivity (<50 Ω·m)	Clay/silt	Low variance, slow response, confined
Medium resistivity (50-120 Ω·m)	Mixed sediments	Moderate variance, seasonal
High resistivity (>120 Ω·m)	Sand/gravel	High variance, rapid response, unconfined

Key insight: Wells in high-resistivity zones should show faster recharge response.

34.3 Analysis Approach

Show code

import os
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

def find_repo_root(start: Path) -> Path:
    for candidate in [start, *start.parents]:
        if (candidate / "src").exists():
            return candidate
    return start

quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd())))
project_root = find_repo_root(quarto_project)
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Load HTEM Unit D data using IntegratedDataLoader
data_loaded = False

try:
    from src.data_loaders import IntegratedDataLoader
    from src.utils import get_data_path

    htem_root = get_data_path("htem_root")
    aquifer_db_path = get_data_path("aquifer_db")
    weather_db_path = get_data_path("warm_db")
    usgs_stream_root = get_data_path("usgs_stream")

    loader = IntegratedDataLoader(
        htem_path=htem_root,
        aquifer_db_path=aquifer_db_path,
        weather_db_path=weather_db_path,
        usgs_stream_path=usgs_stream_root
    )

    # Load Unit D (primary aquifer)
    htem_unit_d = loader.htem.load_material_type_grid('D', 'Preferred', sample_size=50000)

    # Rename columns if needed for compatibility
    if 'MT_Index' in htem_unit_d.columns:
        htem_unit_d['MaterialType'] = htem_unit_d['MT_Index']

    # Estimate resistivity from material type if not available
    if 'Resistivity' not in htem_unit_d.columns:
        # Map material type to approximate resistivity
        mt_to_resist = {
            1: 15, 2: 20, 3: 30, 4: 45, 5: 60, 6: 75, 7: 90,
            8: 110, 9: 130, 10: 150, 11: 180, 12: 220, 13: 280, 14: 350,
            101: 400, 102: 450, 103: 500, 104: 550, 105: 600
        }
        htem_unit_d['Resistivity'] = htem_unit_d['MaterialType'].map(mt_to_resist).fillna(100)

    # Add depth column if not available
    if 'Depth' not in htem_unit_d.columns:
        htem_unit_d['Depth'] = -htem_unit_d['Z']

    data_loaded = True
    print(f"✅ HTEM points loaded: {len(htem_unit_d):,}")

except Exception as e:
    print(f"Error loading HTEM via IntegratedDataLoader ({e}). Loading directly from CSV files.")

    import glob
    from src.utils import get_data_path

    htem_root = get_data_path("htem_root")

    # Load HTEM data from 3DGrids CSV files
    htem_files = glob.glob(f"{htem_root}/3DGrids/*.csv")

    # Find Unit D file (primary aquifer)
    unit_d_file = None
    for htem_file in htem_files:
        if 'Unit_D' in htem_file or 'UnitD' in htem_file or 'unit_d' in htem_file.lower():
            unit_d_file = htem_file
            break

    # If Unit D not found, try preferred model files
    if unit_d_file is None:
        for htem_file in htem_files:
            if 'preferred' in htem_file.lower() or 'Preferred' in htem_file:
                unit_d_file = htem_file
                break

    # If still not found, use any available CSV
    if unit_d_file is None and len(htem_files) > 0:
        unit_d_file = htem_files[0]

    if unit_d_file:
        print(f"Loading HTEM data from: {unit_d_file}")
        htem_unit_d = pd.read_csv(unit_d_file)

        # Identify columns (names may vary across files)
        # Look for X, Y, Z coordinates
        x_col = None
        y_col = None
        z_col = None
        resist_col = None
        material_col = None

        for col in htem_unit_d.columns:
            col_lower = col.lower()
            if col_lower in ['x', 'easting', 'utmx']:
                x_col = col
            elif col_lower in ['y', 'northing', 'utmy']:
                y_col = col
            elif col_lower in ['z', 'elevation', 'depth']:
                z_col = col
            elif 'resist' in col_lower or 'rho' in col_lower:
                resist_col = col
            elif 'material' in col_lower or 'type' in col_lower:
                material_col = col

        # Rename to standard column names
        rename_map = {}
        if x_col: rename_map[x_col] = 'X'
        if y_col: rename_map[y_col] = 'Y'
        if z_col: rename_map[z_col] = 'Z'
        if resist_col: rename_map[resist_col] = 'Resistivity'
        if material_col: rename_map[material_col] = 'MaterialType'

        htem_unit_d = htem_unit_d.rename(columns=rename_map)

        # Add depth column if not available
        if 'Depth' not in htem_unit_d.columns and 'Z' in htem_unit_d.columns:
            htem_unit_d['Depth'] = -htem_unit_d['Z']

        # Ensure numeric types
        if 'Resistivity' in htem_unit_d.columns:
            htem_unit_d['Resistivity'] = pd.to_numeric(htem_unit_d['Resistivity'], errors='coerce')
        if 'MaterialType' in htem_unit_d.columns:
            htem_unit_d['MaterialType'] = pd.to_numeric(htem_unit_d['MaterialType'], errors='coerce')

        # Drop rows with missing essential data
        essential_cols = [col for col in ['X', 'Y', 'Resistivity'] if col in htem_unit_d.columns]
        htem_unit_d = htem_unit_d.dropna(subset=essential_cols)

        data_loaded = True
        print(f"✅ HTEM points loaded from CSV: {len(htem_unit_d):,}")

# Clip extreme resistivity values
htem_unit_d['Resistivity'] = htem_unit_d['Resistivity'].clip(1, 500)

print(f"Resistivity range: {htem_unit_d['Resistivity'].min():.1f} - {htem_unit_d['Resistivity'].max():.1f} Ω·m")
print(f"Mean resistivity: {htem_unit_d['Resistivity'].mean():.1f} Ω·m")

34.4 HTEM Spatial Distribution

Before we can link subsurface structure to groundwater behavior, we need to understand what the structure looks like. The map below shows HTEM resistivity measurements across the study area, classified into three broad categories that correspond to different geological materials:

Low resistivity (red): Clay and silt—these fine-grained materials restrict water flow and create confined conditions
Medium resistivity (orange): Mixed sediments—transitional zones with moderate permeability
High resistivity (blue): Sand and gravel—these coarse materials transmit water readily and characterize productive aquifer zones

Look for spatial patterns: Are high-resistivity zones clustered? Do they follow buried river channels? These patterns will help explain why wells in different locations behave differently.

📊 How to Read This HTEM Map

Color-coded point cloud showing subsurface geology:

Color	Resistivity Range	Lithology Interpretation	Aquifer Quality
Red	<50 Ω·m	Clay/Silt	Poor (confining layer)
Orange	50-120 Ω·m	Mixed sediments	Moderate
Blue	>120 Ω·m	Sand/Gravel	Excellent (productive aquifer)

Spatial Patterns to Identify:

Linear blue corridors: Buried paleochannels (ancient river deposits)—high-yield aquifer zones
Patchy blue zones: Isolated sand lenses within clay matrix
Red blankets: Continuous clay layers that create confined conditions
Sharp color boundaries: Distinct geological contacts (faults, erosional surfaces)
Gradational transitions: Mixing of sediment types

Why this matters: High-resistivity corridors are prime targets for well placement and recharge enhancement. Red (clay) zones should be avoided for production wells but may indicate protected aquifer zones.

Show code

# Sample for visualization (plot every 10th point for speed)
sample_idx = np.arange(0, len(htem_unit_d), max(1, len(htem_unit_d) // 10000))
htem_sample = htem_unit_d.iloc[sample_idx].copy()

# Create resistivity classification
def classify_resistivity(resist):
    if resist < 50:
        return 'Low (<50 Ω·m)<br>Clay/Silt'
    elif resist < 120:
        return 'Medium (50-120 Ω·m)<br>Mixed Sediments'
    else:
        return 'High (>120 Ω·m)<br>Sand/Gravel'

htem_sample['ResistivityClass'] = htem_sample['Resistivity'].apply(classify_resistivity)

# Create scatter plot with material type coloring
fig = go.Figure()

# Plot by resistivity class
resist_classes = ['Low (<50 Ω·m)<br>Clay/Silt',
                 'Medium (50-120 Ω·m)<br>Mixed Sediments',
                 'High (>120 Ω·m)<br>Sand/Gravel']
colors = ['#e74c3c', '#f39c12', '#3498db']

for resist_class, color in zip(resist_classes, colors):
    data = htem_sample[htem_sample['ResistivityClass'] == resist_class]

    fig.add_trace(go.Scatter(
        x=data['X'],
        y=data['Y'],
        mode='markers',
        marker=dict(
            size=3,
            color=color,
            opacity=0.6
        ),
        name=resist_class,
        hovertemplate='<b>HTEM Point</b><br>' +
                     'X: %{x:.0f}<br>' +
                     'Y: %{y:.0f}<br>' +
                     'Resistivity: %{customdata[0]:.1f} Ω·m<br>' +
                     'Material Type: %{customdata[1]}<br>' +
                     'Depth: %{customdata[2]:.1f} m<br>' +
                     '<extra></extra>',
        customdata=data[['Resistivity', 'MaterialType', 'Depth']].values
    ))

fig.update_layout(
    title={
        'text': 'HTEM Unit D Resistivity Distribution (Primary Aquifer)<br><sub>Colors indicate lithology: Clay → Sand/Gravel</sub>',
        'x': 0.5,
        'xanchor': 'center'
    },
    xaxis_title='Easting (m, UTM Zone 16N)',
    yaxis_title='Northing (m, UTM Zone 16N)',
    height=700,
    hovermode='closest',
    plot_bgcolor='white',
    legend=dict(
        title='Resistivity Class',
        yanchor='top',
        y=0.99,
        xanchor='left',
        x=0.01,
        bgcolor='rgba(255, 255, 255, 0.8)'
    )
)

# Equal aspect ratio
fig.update_yaxes(scaleanchor="x", scaleratio=1)

fig.show()

(a) HTEM resistivity distribution showing subsurface structure and material types

(b)

Figure 34.1

What to notice in this map:

The spatial distribution of resistivity reveals the geological architecture of the aquifer system. In this study area, you may observe:

Linear features: High-resistivity corridors often trace buried paleochannels—ancient river deposits filled with sand and gravel that now serve as preferential flow paths
Patchy patterns: Isolated high-resistivity zones may represent localized sand lenses within a clay matrix
Gradational boundaries: Where colors blend gradually, the subsurface transitions smoothly between material types; sharp boundaries indicate distinct geological contacts

This heterogeneity is not random—it reflects thousands of years of sediment deposition, erosion, and reworking. Understanding these patterns is the first step toward predicting how water moves through the system.

34.5 Material Type Distribution

While the map above shows where different materials occur, we also need to understand how much of each material type exists. This matters because:

If the aquifer is dominated by clay, most wells will show confined, stable behavior
If sand and gravel dominate, wells will be more responsive to recharge and pumping
A mix of materials suggests complex, spatially-variable responses

The histogram below shows the frequency of each material type, while the box plots reveal how resistivity varies within each type (some overlap is expected due to measurement uncertainty and geological variability).

📊 Interpreting Material Type Distribution

This figure reveals the overall geological character of the aquifer:

Left Panel - Frequency Histogram:

Distribution Pattern	Aquifer Interpretation	Expected Well Behavior
Skewed left (types 1-5 dominate)	Clay-dominated system	Stable, confined, slow response
Bimodal (peaks at 3 AND 11)	Mixed system: clay + sand layers	Variable response by location
Skewed right (types 10-14 dominate)	Sand/gravel-dominated	Responsive, high variability

Right Panel - Resistivity Box Plots:

Increasing median with type: Confirms physical basis (higher type = coarser = higher resistivity)
Overlapping boxes: Natural geological variability—boundaries are gradational, not sharp
Wide boxes (high IQR): Heterogeneous material type with local variation
Narrow boxes: Well-sorted, consistent material

Physical meaning: A bimodal distribution (like this aquifer) indicates two distinct environments coexist—perhaps a confined clay-capped zone overlying an unconfined sand zone.

Show code

# Create histogram of material types
material_counts = htem_unit_d['MaterialType'].value_counts().sort_index()

# Material type descriptions (simplified)
material_labels = {
    1: 'Type 1: Clay',
    2: 'Type 2: Silty Clay',
    3: 'Type 3: Clayey Silt',
    4: 'Type 4: Silt',
    5: 'Type 5: Sandy Silt',
    6: 'Type 6: Silty Sand',
    7: 'Type 7: Fine Sand',
    8: 'Type 8: Medium Sand',
    9: 'Type 9: Coarse Sand',
    10: 'Type 10: Very Coarse Sand',
    11: 'Type 11: Well Sorted Sand',
    12: 'Type 12: Very Well Sorted Sand',
    13: 'Type 13: Extremely Well Sorted Sand',
    14: 'Type 14: Sand/Gravel'
}

# Create bar chart
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Material Type Frequency', 'Resistivity by Material Type'),
    specs=[[{'type': 'bar'}, {'type': 'box'}]]
)

# Bar chart of material type counts
fig.add_trace(
    go.Bar(
        x=material_counts.index,
        y=material_counts.values,
        marker_color='steelblue',
        text=material_counts.values,
        textposition='outside',
        hovertemplate='<b>%{x}</b><br>Count: %{y:,}<extra></extra>'
    ),
    row=1, col=1
)

# Box plot of resistivity by material type
for mat_type in sorted(htem_unit_d['MaterialType'].unique())[:10]:  # Top 10 types
    data = htem_unit_d[htem_unit_d['MaterialType'] == mat_type]
    fig.add_trace(
        go.Box(
            y=data['Resistivity'],
            name=f'Type {mat_type}',
            boxmean='sd',
            marker_color='coral'
        ),
        row=1, col=2
    )

fig.update_xaxes(title_text='Material Type', row=1, col=1)
fig.update_xaxes(title_text='Material Type', row=1, col=2)
fig.update_yaxes(title_text='Count', row=1, col=1)
fig.update_yaxes(title_text='Resistivity (Ω·m)', type='log', row=1, col=2)

fig.update_layout(
    title_text='HTEM Material Type Characterization',
    height=500,
    showlegend=False
)

fig.show()

Figure 34.2: Distribution of material types showing aquifer heterogeneity

Interpreting the distribution:

The material type histogram reveals the overall character of the aquifer system:

Skewed toward low types (1-5): A clay-dominated system—expect most wells to show stable, confined behavior with slow response to climate forcing
Skewed toward high types (10-14): A sand/gravel-dominated system—expect responsive wells with high variability
Bimodal distribution: Two distinct aquifer environments coexist—wells will behave very differently depending on their location

The box plots on the right show that resistivity increases systematically with material type, confirming the physical basis of our classification. However, note the overlap between adjacent types—this reflects both measurement uncertainty and the continuous nature of geological transitions. A resistivity of 80 Ω·m could be dense sand or loose gravel; context (depth, surrounding materials) helps resolve ambiguity.

34.6 Spatial Data Extraction

Now comes the critical fusion step: linking HTEM measurements to well locations. The challenge is that HTEM and wells sample the subsurface at different scales:

HTEM: Dense grid (~100m spacing), but measures bulk properties averaged over a volume
Wells: Point measurements, but capture actual aquifer conditions at specific depths

Our approach: for each well, extract HTEM statistics from all points within a 1 km radius. Why 1 km?

Hydrogeological relevance: Groundwater flow integrates conditions over kilometers, not meters
Statistical robustness: A larger radius captures more HTEM points, yielding more stable statistics
Scale matching: 1 km roughly matches the “capture zone” of a pumping well

The function below uses a KD-tree for efficient spatial search—essential when matching thousands of wells to millions of HTEM points.

#| code-fold: true
from scipy.spatial import cKDTree

def extract_htem_at_wells(wells_df, htem_df, radius_km=1.0):
    """
    Extract HTEM properties around each well.

    For each well, find all HTEM points within radius and compute statistics.

    Returns:
        DataFrame with well_id and HTEM statistics
    """
    # Build KD-tree for fast spatial search
    htem_coords = htem_df[['Latitude', 'Longitude']].values
    tree = cKDTree(htem_coords)

    results = []

    for idx, well in wells_df.iterrows():
        well_coord = [well['Latitude'], well['Longitude']]

        # Find HTEM points within radius (approximate degrees)
        radius_deg = radius_km / 111.0  # 1 degree ≈ 111 km
        indices = tree.query_ball_point(well_coord, radius_deg)

        if len(indices) == 0:
            continue

        # Extract HTEM properties in neighborhood
        htem_local = htem_df.iloc[indices]

        results.append({
            'WellID': well['WellID'],
            'Latitude': well['Latitude'],
            'Longitude': well['Longitude'],
            'n_htem_points': len(htem_local),
            # Resistivity statistics
            'resistivity_mean': htem_local['Resistivity'].mean(),
            'resistivity_std': htem_local['Resistivity'].std(),
            'resistivity_min': htem_local['Resistivity'].min(),
            'resistivity_max': htem_local['Resistivity'].max(),
            'resistivity_p25': htem_local['Resistivity'].quantile(0.25),
            'resistivity_p75': htem_local['Resistivity'].quantile(0.75),
            # Material type mode
            'material_mode': htem_local['MaterialType'].mode().iloc[0] if len(htem_local) > 0 else np.nan,
            # Depth statistics
            'depth_mean': htem_local['Depth'].mean(),
            'depth_range': htem_local['Depth'].max() - htem_local['Depth'].min(),
        })

    return pd.DataFrame(results)

# Extract HTEM properties at well locations (1 km radius)
wells_with_htem = extract_htem_at_wells(wells_summary, htem_unit_d, radius_km=1.0)

print(f"Wells with HTEM data: {len(wells_with_htem)}")
print(f"Mean HTEM points per well: {wells_with_htem['n_htem_points'].mean():.0f}")

For each well, we now have a structural fingerprint: mean resistivity, variability, depth range, and dominant material type. These features will be used to predict and explain well behavior.

Why extract statistics, not just the nearest point?

A single HTEM point might be anomalous—an isolated clay lens in otherwise sandy material. By computing statistics over a neighborhood, we capture:

Central tendency (mean, median): What’s the “typical” material?
Variability (std, range): How heterogeneous is the local geology?
Extremes (min, max, percentiles): What’s the best/worst case scenario?

This statistical approach makes our fusion robust to local anomalies and measurement noise.

34.7 Well Response Characterization

With HTEM structure quantified, we now need to characterize how each well behaves over time. Rather than analyzing raw hydrographs, we extract summary metrics that capture different aspects of well response:

Metric	What It Measures	Physical Meaning
Variance	Spread of water levels	How much the well fluctuates—high variance suggests unconfined conditions
Autocorrelation (lag 30)	Memory in the system	How long past conditions influence the present—high ACF means slow response
Rate of Change	Average daily fluctuation	How quickly levels rise/fall—high ROC indicates rapid recharge/discharge
Trend	Long-term direction	Is the aquifer gaining or losing water over time?
Coefficient of Variation	Normalized variability	Allows comparison across wells with different mean levels

These metrics transform a time series (potentially thousands of measurements) into a compact behavioral fingerprint that can be linked to structure.

#| code-fold: true
# NOTE: This code requires IntegratedDataLoader from src.data_loaders
# In production: loader = IntegratedDataLoader()
# For demonstration, this shows the methodology; actual execution
# requires database connection (see Data Dictionary chapter)

def characterize_well_response(loader, well_id):
    """
    Compute temporal response metrics for a well.

    Returns:
        Dict with response characteristics
    """
    try:
        ts = loader.groundwater.load_well_time_series(well_id)

        if len(ts) < 50:  # Need sufficient data
            return None

        wl = ts['WaterLevelElevation'].dropna()

        # Temporal statistics
        variance = wl.var()
        mean_level = wl.mean()
        trend = np.polyfit(range(len(wl)), wl, 1)[0]  # Linear trend

        # Range and volatility
        level_range = wl.max() - wl.min()
        cv = wl.std() / wl.mean() if wl.mean() != 0 else 0  # Coefficient of variation

        # Autocorrelation (lag 30 days)
        if len(wl) > 30:
            acf_30 = wl.autocorr(lag=30) if hasattr(wl, 'autocorr') else np.corrcoef(wl[:-30], wl[30:])[0, 1]
        else:
            acf_30 = np.nan

        # Rate of change (average absolute day-to-day change)
        if len(wl) > 1:
            roc = wl.diff().abs().mean()
        else:
            roc = np.nan

        return {
            'WellID': well_id,
            'n_measurements': len(wl),
            'mean_level': mean_level,
            'variance': variance,
            'std_dev': wl.std(),
            'level_range': level_range,
            'cv': cv,
            'trend': trend,
            'acf_30': acf_30,
            'rate_of_change': roc,
            'min_level': wl.min(),
            'max_level': wl.max()
        }
    except Exception as e:
        return None

# Characterize all wells with HTEM data
well_responses = []

for well_id in wells_with_htem['WellID']:
    response = characterize_well_response(loader, well_id)
    if response:
        well_responses.append(response)

well_response_df = pd.DataFrame(well_responses)

print(f"Wells with complete characterization: {len(well_response_df)}")

Why these specific metrics?

Each metric tests a different hypothesis about structure-function relationships:

Variance vs. Resistivity: Do sandy (high-resistivity) zones have more variable water levels?
ACF vs. Resistivity: Do clay-rich (low-resistivity) zones retain “memory” longer?
ROC vs. Heterogeneity: Does structural variability cause erratic behavior?

By computing multiple metrics, we can test multiple hypotheses simultaneously and build a richer picture of how structure controls behavior.

34.8 Fusion Dataset Creation

This is where data fusion happens: we merge the structural fingerprint (from HTEM) with the behavioral fingerprint (from well time series) into a single dataset. Each row represents one well, with columns for both structure and response:

Structure Features (from HTEM)	Response Features (from wells)
`resistivity_mean`, `resistivity_std`	`variance`, `std_dev`
`resistivity_min`, `resistivity_max`	`level_range`, `cv`
`material_mode`	`trend`, `acf_30`
`depth_mean`, `depth_range`	`rate_of_change`

This fused dataset enables us to ask: Does structure predict response?

#| code-fold: true
# Merge HTEM and well response data
fusion_df = pd.merge(
    wells_with_htem,
    well_response_df,
    on='WellID',
    how='inner'
)

print(f"Complete fusion dataset: {len(fusion_df)} wells")
print("\nSample:")
print(fusion_df[['WellID', 'resistivity_mean', 'variance', 'trend', 'acf_30']].head())

The fused dataset is now ready for analysis. Notice that we lose some wells in the merge—those without sufficient HTEM coverage or without enough time series data. This is expected and ensures we only analyze wells with reliable data from both sources.

34.9 Correlation Analysis

34.9.1 What Is Correlation Analysis?

Correlation analysis is a statistical method that measures how strongly two variables are related to each other. First introduced by Francis Galton in the 1880s and formalized by Karl Pearson in 1896, correlation quantifies whether changes in one variable tend to coincide with changes in another.

34.9.2 Why Does It Matter for Aquifer Analysis?

In hydrogeology, we need to understand whether subsurface structure (clay vs. sand) actually controls groundwater behavior (stable vs. variable water levels). Correlation analysis provides an objective, quantitative measure of these relationships, moving beyond visual inspection to statistical confidence.

34.9.3 How Does Correlation Work?

Imagine plotting one variable against another on a scatter plot:

Positive correlation: Points trend upward—as one variable increases, the other increases
Negative correlation: Points trend downward—as one increases, the other decreases
No correlation: Points are scattered randomly—no predictable relationship

The correlation coefficient (r or ρ) ranges from -1 to +1:

Values near +1 or -1 indicate strong relationships
Values near 0 indicate weak or no relationship

34.9.4 What Will You See Below?

We’ll calculate correlations between HTEM structure features (resistivity, material type) and well response metrics (variance, autocorrelation, rate of change). Each correlation tests a specific hydrogeological hypothesis about how geology controls water behavior.

34.9.5 How to Interpret Correlation Results

Correlation Value	Strength	Interpretation for Management
\|r\| > 0.7	Very strong	Structure is primary control—use HTEM to predict behavior
\|r\| = 0.5-0.7	Strong	Structure matters but other factors also important
\|r\| = 0.3-0.5	Moderate	Structure plays a role, investigate other controls
\|r\| < 0.3	Weak	Structure not dominant—look for pumping, streams, etc.

Critical distinction: We use Pearson correlation (r) for linear relationships and Spearman correlation (ρ) for monotonic (but not necessarily linear) relationships. Spearman is more robust to outliers.

Now we test our hypotheses quantitatively. We use correlation analysis to measure the strength and direction of relationships between structure and response:

Three key hypotheses:

Resistivity vs. Variance: Sandy aquifers (high resistivity) should fluctuate more because water moves in and out easily
Resistivity vs. Autocorrelation: Sandy aquifers should have lower autocorrelation—they respond quickly and “forget” past conditions
Structural heterogeneity vs. Rate of Change: Wells in geologically variable areas should show more erratic behavior

We use Pearson correlation for linear relationships and Spearman correlation for monotonic (but potentially nonlinear) relationships.

#| code-fold: true
# Hypothesis 1: High resistivity → High variance (unconfined aquifer)
corr_resist_var, p_resist_var = pearsonr(
    fusion_df['resistivity_mean'].dropna(),
    fusion_df['variance'].dropna()
)

# Hypothesis 2: High resistivity → Low autocorrelation (rapid response)
corr_resist_acf, p_resist_acf = spearmanr(
    fusion_df['resistivity_mean'].dropna(),
    fusion_df['acf_30'].dropna()
)

# Hypothesis 3: Resistivity variability → Higher rate of change
corr_resist_std_roc, p_resist_std_roc = pearsonr(
    fusion_df['resistivity_std'].dropna(),
    fusion_df['rate_of_change'].dropna()
)

print("Structure-Function Correlations:")
print(f"Resistivity vs Variance: r={corr_resist_var:.3f}, p={p_resist_var:.4f}")
print(f"Resistivity vs ACF-30: ρ={corr_resist_acf:.3f}, p={p_resist_acf:.4f}")
print(f"Resistivity Std vs Rate of Change: r={corr_resist_std_roc:.3f}, p={p_resist_std_roc:.4f}")

Interpreting correlation results:

r or ρ > 0.3 with p < 0.05: Moderate positive relationship, statistically significant
r or ρ < -0.3 with p < 0.05: Moderate negative relationship, statistically significant
|r| < 0.3: Weak or no relationship—structure may not be the dominant control

If correlations are weak, consider:

Confounding factors: Pumping, nearby streams, or seasonal effects may dominate
Nonlinear relationships: The connection may exist but not be captured by linear correlation
Scale mismatch: 1 km radius may be too large or too small for the local hydrogeology
Depth mismatch: HTEM Unit D may not correspond to well screen depth

34.10 Structure-Function Scatter

Correlation coefficients summarize relationships as single numbers, but scatter plots reveal the full story. They show:

Whether the relationship is truly linear or curved
Whether outliers are driving the correlation
Whether distinct clusters exist (suggesting different aquifer types)
How much scatter exists around the trend

The four-panel plot below visualizes our three hypotheses plus a comparison by material type.

#| code-fold: true
from plotly.subplots import make_subplots

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        'Resistivity vs Variance',
        'Resistivity vs Autocorrelation',
        'Resistivity Std vs Rate of Change',
        'Material Type vs Response'
    ),
    specs=[
        [{'type': 'scatter'}, {'type': 'scatter'}],
        [{'type': 'scatter'}, {'type': 'box'}]
    ]
)

# Plot 1: Resistivity vs Variance
fig.add_trace(
    go.Scatter(
        x=fusion_df['resistivity_mean'],
        y=fusion_df['variance'],
        mode='markers',
        marker=dict(
            size=8,
            color=fusion_df['n_measurements'],
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(x=0.46, len=0.45, title='# Meas.')
        ),
        text=fusion_df['WellID'],
        hovertemplate='<b>%{text}</b><br>Resistivity: %{x:.1f} Ω·m<br>Variance: %{y:.2f}<extra></extra>'
    ),
    row=1, col=1
)

# Add trendline
z = np.polyfit(fusion_df['resistivity_mean'].dropna(), fusion_df['variance'].dropna(), 1)
p = np.poly1d(z)
x_trend = np.linspace(fusion_df['resistivity_mean'].min(), fusion_df['resistivity_mean'].max(), 100)
fig.add_trace(
    go.Scatter(
        x=x_trend,
        y=p(x_trend),
        mode='lines',
        line=dict(color='red', dash='dash'),
        showlegend=False
    ),
    row=1, col=1
)

# Plot 2: Resistivity vs Autocorrelation
fig.add_trace(
    go.Scatter(
        x=fusion_df['resistivity_mean'],
        y=fusion_df['acf_30'],
        mode='markers',
        marker=dict(size=8, color='coral', opacity=0.6),
        text=fusion_df['WellID'],
        hovertemplate='<b>%{text}</b><br>Resistivity: %{x:.1f} Ω·m<br>ACF(30): %{y:.3f}<extra></extra>'
    ),
    row=1, col=2
)

# Plot 3: Resistivity Std vs Rate of Change
fig.add_trace(
    go.Scatter(
        x=fusion_df['resistivity_std'],
        y=fusion_df['rate_of_change'],
        mode='markers',
        marker=dict(size=8, color='steelblue', opacity=0.6),
        text=fusion_df['WellID'],
        hovertemplate='<b>%{text}</b><br>Resist. Std: %{x:.1f} Ω·m<br>ROC: %{y:.3f} m/day<extra></extra>'
    ),
    row=2, col=1
)

# Plot 4: Material type vs variance (box plot)
material_types = fusion_df['material_mode'].dropna().unique()
for mat_type in sorted(material_types)[:5]:  # Top 5 material types
    data = fusion_df[fusion_df['material_mode'] == mat_type]
    fig.add_trace(
        go.Box(
            y=data['variance'],
            name=f'Type {mat_type}',
            boxmean='sd'
        ),
        row=2, col=2
    )

# Update axes
fig.update_xaxes(title_text='Mean Resistivity (Ω·m)', row=1, col=1)
fig.update_xaxes(title_text='Mean Resistivity (Ω·m)', row=1, col=2)
fig.update_xaxes(title_text='Resistivity Std (Ω·m)', row=2, col=1)
fig.update_xaxes(title_text='Material Type', row=2, col=2)

fig.update_yaxes(title_text='Water Level Variance', row=1, col=1)
fig.update_yaxes(title_text='Autocorrelation (lag 30d)', row=1, col=2)
fig.update_yaxes(title_text='Rate of Change (m/day)', row=2, col=1)
fig.update_yaxes(title_text='Variance', row=2, col=2)

fig.update_layout(
    title_text='HTEM Structure Controls Groundwater Response',
    height=800,
    showlegend=False
)

fig.show()

Reading the scatter plots:

Top-left (Resistivity vs. Variance): Look for an upward trend. Points colored by number of measurements help identify whether data-rich wells drive the pattern.
Top-right (Resistivity vs. Autocorrelation): We expect a negative relationship—high resistivity should correlate with low autocorrelation (fast response). If you see the opposite, clay layers may be providing confinement that creates memory.
Bottom-left (Resistivity Std vs. Rate of Change): This tests whether geological heterogeneity causes erratic well behavior. A positive trend supports this hypothesis.
Bottom-right (Material Type vs. Variance): Box plots by material type show whether the discrete classification captures variance differences. If boxes overlap substantially, the continuous resistivity variable may be more informative.

34.11 Resistivity Class Analysis

While continuous correlations are statistically powerful, discrete classes are operationally useful. A water manager can’t act on “wells with resistivity > 87.3 Ω·m”—but they can manage “wells in sandy zones” differently from “wells in clay zones.”

We classify wells into three groups based on surrounding HTEM resistivity:

Class	Resistivity Range	Expected Lithology	Expected Behavior
Low	<50 Ω·m	Clay/silt	Confined, stable, slow response
Medium	50-120 Ω·m	Mixed sediments	Semi-confined, moderate variability
High	>120 Ω·m	Sand/gravel	Unconfined, high variability, rapid response

#| code-fold: true
# Define resistivity classes
def classify_resistivity(resist):
    if resist < 50:
        return 'Low (<50 Ω·m)'
    elif resist < 120:
        return 'Medium (50-120 Ω·m)'
    else:
        return 'High (>120 Ω·m)'

fusion_df['resist_class'] = fusion_df['resistivity_mean'].apply(classify_resistivity)

# Compare response by resistivity class
class_stats = fusion_df.groupby('resist_class').agg({
    'variance': ['mean', 'std', 'count'],
    'acf_30': ['mean', 'std'],
    'rate_of_change': ['mean', 'std'],
    'trend': ['mean', 'std']
}).round(3)

print("\nResponse Statistics by Resistivity Class:")
print(class_stats)

Interpreting the class statistics:

The table above shows mean and standard deviation for each response metric within each resistivity class. Key patterns to look for:

Variance increasing Low → Medium → High: Confirms that sandy aquifers fluctuate more
ACF decreasing Low → Medium → High: Confirms that sandy aquifers respond faster
High standard deviation within classes: Indicates other factors (depth, pumping, proximity to streams) also matter

If classes don’t show clear separation, the classification thresholds may need adjustment for this specific aquifer system.

34.12 Resistivity Class Comparison

The box plots below provide a visual comparison of how well behavior differs across resistivity classes. Box plots show the median, interquartile range (IQR), and outliers—making it easy to see both central tendency and spread.

What to look for:

Non-overlapping boxes: Strong separation between classes—structure is a dominant control
Overlapping boxes: Weaker separation—other factors contribute substantially
Outliers: Wells that don’t fit the pattern—candidates for further investigation

#| code-fold: true
fig = make_subplots(
    rows=1, cols=3,
    subplot_titles=('Variance', 'Autocorrelation (30d)', 'Rate of Change')
)

resist_classes = ['Low (<50 Ω·m)', 'Medium (50-120 Ω·m)', 'High (>120 Ω·m)']
colors = ['#e74c3c', '#f39c12', '#3498db']

for metric, col, title in [('variance', 1, 'Variance'),
                            ('acf_30', 2, 'ACF(30)'),
                            ('rate_of_change', 3, 'ROC')]:
    for resist_class, color in zip(resist_classes, colors):
        data = fusion_df[fusion_df['resist_class'] == resist_class]

        fig.add_trace(
            go.Box(
                y=data[metric],
                name=resist_class,
                marker_color=color,
                boxmean='sd',
                showlegend=(col == 1)
            ),
            row=1, col=col
        )

fig.update_yaxes(title_text='Water Level Variance', row=1, col=1)
fig.update_yaxes(title_text='Autocorrelation', row=1, col=2)
fig.update_yaxes(title_text='Rate of Change (m/day)', row=1, col=3)

fig.update_layout(
    title_text='Well Response by HTEM Resistivity Class<br><sub>Low=Clay, Medium=Mixed, High=Sand/Gravel</sub>',
    height=500
)

fig.show()

If the hypothesis is correct, you should see:

Variance: Increasing from Low (red) to High (blue)
Autocorrelation: Decreasing from Low to High
Rate of Change: Increasing from Low to High

The strength of these patterns indicates how much of well behavior can be explained by subsurface structure alone.

📊 Reading Box Plot Comparisons

This 3-panel figure tests structure-function hypotheses:

Panel	Hypothesis Being Tested	Expected Pattern If Hypothesis Is True
Left (Variance)	High-resistivity zones have more variable water levels	Blue box higher than red box
Middle (ACF)	Clay zones retain “memory” longer	Red box higher than blue box
Right (ROC)	Sandy zones respond more rapidly	Blue box higher than red box

Interpreting Box Plot Features:

Non-overlapping boxes: Strong evidence for hypothesis (clear separation)
Overlapping boxes with different medians: Moderate evidence (trend exists but variability high)
Completely overlapping boxes: Hypothesis not supported (structure doesn’t control this metric)
Outliers (dots beyond whiskers): Wells that don’t fit the pattern—investigate individually

Management Implications:

Observation	Interpretation	Action
Strong variance separation	Structure dominates behavior	Use HTEM for well placement
Weak variance separation	Other factors dominate (pumping, climate)	Need additional data sources
Many outliers	Local complexity (faults, heterogeneity)	Site-specific investigation needed

34.13 Predictive HTEM Modeling

34.13.1 What Is Random Forest Regression?

Random Forest is a machine learning algorithm introduced by Leo Breiman in 2001. It creates hundreds of decision trees, each trained on a random subset of data, and combines their predictions through averaging. Think of it as consulting a “forest” of expert opinions rather than relying on a single expert.

34.13.2 Why Does It Matter for Aquifer Analysis?

Correlation and classification are descriptive—they tell us relationships exist. Predictive modeling goes further: can we actually forecast how a new well will behave based solely on HTEM data at its location?

This has practical value:

Before drilling: Predict expected water level behavior from HTEM alone
Monitoring design: Estimate which locations need frequent vs. infrequent sampling
Model validation: Test whether our understanding of structure-function relationships generalizes

We use Random Forest regression—an ensemble method that handles nonlinear relationships and automatically identifies important features.

#| code-fold: true
# Feature matrix: HTEM properties
X_features = [
    'resistivity_mean', 'resistivity_std', 'resistivity_min', 'resistivity_max',
    'resistivity_p25', 'resistivity_p75', 'depth_mean', 'depth_range'
]

X = fusion_df[X_features].dropna()

# Target: Well variance (example)
y_variance = fusion_df.loc[X.index, 'variance']

# Train-test split
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y_variance, test_size=0.3, random_state=42
)

# Random Forest model
rf_model = RandomForestRegressor(n_estimators=100, random_state=42, max_depth=5)
rf_model.fit(X_train, y_train)

# Predictions
y_pred = rf_model.predict(X_test)

# Evaluate
from sklearn.metrics import r2_score, mean_absolute_error

r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"\nPredictive Model Performance (HTEM → Well Variance):")
print(f"R² Score: {r2:.3f}")
print(f"MAE: {mae:.3f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X_features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFeature Importance:")
print(feature_importance)

34.13.3 How to Interpret Model Performance Metrics

Understanding these metrics is critical for assessing whether the model is trustworthy enough for operational decisions:

R² (Coefficient of Determination) - Developed by Sewall Wright (1921)

What it measures: Proportion of variance in well behavior explained by HTEM structure

R² Value	Interpretation	Management Action
R² > 0.7	Strong predictive power	Structure dominates—use model for well placement
R² = 0.3-0.7	Moderate	Structure matters but validate with local data
R² < 0.3	Weak	Structure alone insufficient—investigate pumping, streams

MAE (Mean Absolute Error)

What it measures: Average prediction error in original units (e.g., variance units)

How to interpret: Compare MAE to the standard deviation of the target variable:

If MAE << std: Predictions are substantially better than random guessing
If MAE ≈ std: Model barely improves on baseline predictions
If MAE > std: Model is worse than simply predicting the mean

Feature Importance

What it measures: Which HTEM properties most influence predictions

How to interpret:

resistivity_mean dominates: Average lithology is primary control—focus on bulk properties
resistivity_std is important: Geological heterogeneity drives behavior—expect local variability
depth_mean matters: Vertical position in aquifer is critical—confining layers at specific depths

34.14 Predictive Model Results

#| code-fold: true
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Predicted vs Actual Variance', 'Feature Importance')
)

# Predicted vs actual
fig.add_trace(
    go.Scatter(
        x=y_test,
        y=y_pred,
        mode='markers',
        marker=dict(size=8, color='steelblue', opacity=0.6),
        name='Test Data',
        hovertemplate='Actual: %{x:.2f}<br>Predicted: %{y:.2f}<extra></extra>'
    ),
    row=1, col=1
)

# Perfect prediction line
line_min, line_max = y_test.min(), y_test.max()
fig.add_trace(
    go.Scatter(
        x=[line_min, line_max],
        y=[line_min, line_max],
        mode='lines',
        line=dict(color='red', dash='dash'),
        name='Perfect Prediction',
        showlegend=True
    ),
    row=1, col=1
)

# Feature importance
fig.add_trace(
    go.Bar(
        x=feature_importance['importance'],
        y=feature_importance['feature'],
        orientation='h',
        marker_color='coral'
    ),
    row=1, col=2
)

fig.update_xaxes(title_text='Actual Variance', row=1, col=1)
fig.update_xaxes(title_text='Importance', row=1, col=2)
fig.update_yaxes(title_text='Predicted Variance', row=1, col=1)
fig.update_yaxes(title_text='HTEM Feature', row=1, col=2)

fig.update_layout(
    title_text=f'HTEM-Based Prediction of Well Response (R²={r2:.3f})',
    height=500,
    showlegend=True
)

fig.show()

Reading the results plot:

Left panel (Predicted vs. Actual): Points should cluster along the red dashed line (perfect prediction). Scatter around the line indicates prediction uncertainty. Systematic deviations (curve, bias) suggest the model is missing something.
Right panel (Feature Importance): Shows which HTEM properties drive the predictions. This is scientifically informative—it tells us which aspects of structure most influence groundwater behavior.

If the model performs well (R² > 0.5), we can use it to predict behavior at unmonitored locations based on HTEM data alone.

34.15 Predicted Response Map

The ultimate goal of structure-function fusion is spatial prediction: using dense HTEM coverage to fill gaps between sparse well observations. The map below shows predicted water level variance across the study area, based solely on HTEM structure.

Applications:

Identify high-variability zones that need more monitoring
Locate stable zones suitable for long-term water supply
Guide placement of new observation wells
Prioritize areas for contamination vulnerability assessment

#| code-fold: true
# Predict variance for all wells with HTEM data
X_all = wells_with_htem[X_features].dropna()
wells_with_predictions = wells_with_htem.loc[X_all.index].copy()
wells_with_predictions['predicted_variance'] = rf_model.predict(X_all)

# Map
fig = go.Figure()

fig.add_trace(go.Scattermapbox(
    lat=wells_with_predictions['Latitude'],
    lon=wells_with_predictions['Longitude'],
    mode='markers',
    marker=dict(
        size=12,
        color=wells_with_predictions['predicted_variance'],
        colorscale='RdYlBu_r',
        showscale=True,
        colorbar=dict(title='Predicted<br>Variance')
    ),
    text=[f"Well {w}<br>Resistivity: {r:.1f} Ω·m<br>Predicted Var: {v:.2f}"
          for w, r, v in zip(wells_with_predictions['WellID'],
                             wells_with_predictions['resistivity_mean'],
                             wells_with_predictions['predicted_variance'])],
    hovertemplate='<b>%{text}</b><extra></extra>'
))

fig.update_layout(
    title='Predicted Well Response from HTEM Structure',
    mapbox=dict(
        style='open-street-map',
        center=dict(lat=wells_with_predictions['Latitude'].mean(),
                   lon=wells_with_predictions['Longitude'].mean()),
        zoom=9
    ),
    height=600
)

fig.show()

Interpreting the prediction map:

Hot colors (red/orange) indicate zones where we predict high water level variability—these are likely unconfined, sandy areas that respond strongly to recharge and pumping. Cool colors (blue) indicate predicted stability—likely confined, clay-rich zones.

Compare this map to the HTEM resistivity map at the beginning of the chapter. You should see correspondence: high-resistivity zones should generally show high predicted variance. Any discrepancies reveal where the relationship breaks down—perhaps due to local pumping, stream interaction, or 3D complexity not captured by Unit D alone.

34.16 Physical Interpretation

🔍 Structure-Function Relationships

Confirmed Hypotheses: 1. High resistivity → High variance (r={corr_resist_var:.2f}): Sand/gravel aquifers respond strongly to recharge/pumping 2. High resistivity → Lower autocorrelation (ρ={corr_resist_acf:.2f}): Unconfined aquifers have shorter “memory” 3. Heterogeneity → Higher rate of change: Spatially variable geology causes localized responses

Predictive Power: - HTEM structure explains **{r2*100:.1f}% of variance in well response - Most important feature: {feature_importance.iloc[0][‘feature’]}**

Aquifer Classification: - Low resistivity (<50 Ω·m): Confined, stable, slow response - Medium resistivity (50-120 Ω·m): Semi-confined, moderate variability - High resistivity (>120 Ω·m): Unconfined, highly responsive

34.17 Management Applications

34.17.1 1. Well Placement Strategy

#| code-fold: true
# Identify zones with different management needs
wells_with_predictions['management_class'] = pd.cut(
    wells_with_predictions['predicted_variance'],
    bins=[0, 1, 3, 100],
    labels=['Stable', 'Moderate', 'High Variability']
)

print("Wells by Management Class:")
print(wells_with_predictions['management_class'].value_counts())

# Monitoring frequency recommendations
print("\nRecommended Monitoring Frequency:")
print("- Stable zones: Quarterly")
print("- Moderate zones: Monthly")
print("- High variability zones: Weekly or continuous")

34.17.2 2. Vulnerability Assessment

#| code-fold: true
# High variance + shallow depth = vulnerability
wells_with_predictions['vulnerability_score'] = (
    wells_with_predictions['predicted_variance'] *
    (1 / wells_with_predictions['depth_mean'])  # Shallow = higher score
)

# Normalize to 0-100
wells_with_predictions['vulnerability_score'] = (
    100 * (wells_with_predictions['vulnerability_score'] /
           wells_with_predictions['vulnerability_score'].max())
)

# Categorize
wells_with_predictions['vulnerability'] = pd.cut(
    wells_with_predictions['vulnerability_score'],
    bins=[0, 30, 60, 100],
    labels=['Low', 'Moderate', 'High']
)

print("Vulnerability Assessment:")
print(wells_with_predictions['vulnerability'].value_counts())

34.18 Key Insights

💡 Fusion Insights

What HTEM tells us about groundwater: - Resistivity predicts response magnitude - Heterogeneity predicts variability - Depth modulates sensitivity

What groundwater tells us about HTEM: - Wells validate HTEM interpretation - Response patterns confirm lithology - Discrepancies reveal 3D complexity

Synergy: - HTEM provides spatial coverage (dense grid) - Wells provide temporal dynamics (continuous monitoring) - Together: 4D understanding of aquifer system

34.19 Limitations and Uncertainties

Depth mismatch: HTEM Unit D depth range may not match well screen depth
Scale effects: HTEM resolution (~100m) vs well point measurement
Temporal assumption: HTEM is static, but geology can change (land subsidence)
Confounding factors: Pumping, nearby wells affect response independent of structure

34.20 References

Barlow, P. M., & Leake, S. A. (2012). Streamflow depletion by wells. USGS Scientific Investigations Report 2012-5095.
Fitterman, D. V., & Stewart, M. T. (1986). Transient electromagnetic sounding for groundwater. Geophysics, 51(4), 995-1005.
Siemon, B., et al. (2009). A review of helicopter-borne electromagnetic methods for groundwater exploration. Near Surface Geophysics, 7(5-6), 629-646.

34.21 Next Steps

→ Chapter 6: Weather-Response Fusion - Climate forcing of aquifer dynamics

Cross-Chapter Connections: - HTEM data introduced in Part 1 - Well response metrics from Part 2 - Structure informs recharge pathways (Chapter 3) - Foundation for 4D fusion (Chapter 8)

34.22 Summary

HTEM-groundwater fusion reveals structure controls on aquifer response:

✅ Spatial structure matters - Resistivity patterns correlate with well response characteristics

✅ Scale bridging achieved - 100m HTEM grid linked to point well measurements

✅ Validation pathway established - Wells provide ground truth for HTEM interpretation

⚠️ Depth mismatch possible - HTEM unit depths may not match well screen depths

⚠️ Temporal assumption - HTEM is static, wells are dynamic

Key Insight: HTEM provides the spatial framework that explains why wells at different locations behave differently.

34.23 Reflection Questions

In your own study area, what are some qualitative signs (for example, rapid storm responses, long drought recovery, or very flat hydrographs) that subsurface structure is controlling groundwater behavior, and how might HTEM-style data help confirm those impressions?
Suppose the model in this chapter shows high variance and low autocorrelation in certain high-resistivity zones. How would that influence where you place new monitoring wells or where you would be cautious about additional pumping?
When HTEM-based predictions and observed well responses disagree at a location, what are at least three possible explanations (data issues, structural complexity, pumping, boundary effects), and how would you go about diagnosing which one dominates?
How could structure–response fusion like this feed into later chapters on forecasting, optimization, or value-of-information—for example, deciding where extra HTEM flights or new wells would most reduce uncertainty?

--- title: "HTEM-Groundwater Fusion" subtitle: "Subsurface Structure Controls Water Response" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will get:** - A picture of how **underground materials** (from HTEM) relate to how **water levels behave** in wells. - Examples of how structure (clay vs sand, thickness, depth) influences trends and seasonality. - Practice reading fusion plots that connect “what the ground is like” to “how the water moves”. You can skim detailed model-fitting steps and focus on: - The maps and plots showing structure vs response, - The narrative about which structural patterns matter most for groundwater behavior, - And the main qualitative conclusions. ::: **Data Sources Fused**: HTEM Structure + Groundwater Wells ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Describe how HTEM-derived subsurface structure (resistivity, material types, depth) relates to observed groundwater behavior in wells. - Interpret maps and summary plots that link structural classes (clay, mixed, sand/gravel) to response metrics such as variance, autocorrelation, and rate of change. - Explain why fusing dense spatial geophysics with sparse but rich well time series gives a more complete picture than either alone. - Discuss how structure–response relationships can inform well placement, monitoring frequency, and vulnerability assessment. ## Overview HTEM geophysics reveals the 3D structure of the subsurface - where clay layers confine the aquifer, where sand and gravel transmit water rapidly, and where bedrock forms boundaries. Groundwater well measurements show how water levels respond to pumping and recharge. **The fusion question**: How does geological structure control groundwater behavior? This chapter connects static structure (HTEM) to dynamic response (water levels). ::: {.callout-note icon=false} ## 💻 For Computer Scientists This is a **structure-function mapping problem**: - **Input**: HTEM resistivity (proxy for lithology) - **Output**: Well response characteristics (variance, trend, autocorrelation) - **Challenge**: Non-stationary relationships - same resistivity means different things at different depths **Machine learning approach**: Spatial features from HTEM predict temporal patterns in wells. ::: ::: {.callout-tip icon=false} ## 🌍 For Hydrologists Aquifer properties control well response: | HTEM Signature | Lithology | Expected Well Response | |---------------|-----------|----------------------| | Low resistivity (<50 Ω·m) | Clay/silt | Low variance, slow response, confined | | Medium resistivity (50-120 Ω·m) | Mixed sediments | Moderate variance, seasonal | | High resistivity (>120 Ω·m) | Sand/gravel | High variance, rapid response, unconfined | **Key insight**: Wells in high-resistivity zones should show faster recharge response. ::: ## Analysis Approach ```{python} #| code-fold: true #| label: load-htem-data #| output: false import os import sys from pathlib import Path import pandas as pd import numpy as np import plotly.graph_objects as go from plotly.subplots import make_subplots import warnings warnings.filterwarnings('ignore') def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) # Load HTEM Unit D data using IntegratedDataLoader data_loaded = False try: from src.data_loaders import IntegratedDataLoader from src.utils import get_data_path htem_root = get_data_path("htem_root") aquifer_db_path = get_data_path("aquifer_db") weather_db_path = get_data_path("warm_db") usgs_stream_root = get_data_path("usgs_stream") loader = IntegratedDataLoader( htem_path=htem_root, aquifer_db_path=aquifer_db_path, weather_db_path=weather_db_path, usgs_stream_path=usgs_stream_root ) # Load Unit D (primary aquifer) htem_unit_d = loader.htem.load_material_type_grid('D', 'Preferred', sample_size=50000) # Rename columns if needed for compatibility if 'MT_Index' in htem_unit_d.columns: htem_unit_d['MaterialType'] = htem_unit_d['MT_Index'] # Estimate resistivity from material type if not available if 'Resistivity' not in htem_unit_d.columns: # Map material type to approximate resistivity mt_to_resist = { 1: 15, 2: 20, 3: 30, 4: 45, 5: 60, 6: 75, 7: 90, 8: 110, 9: 130, 10: 150, 11: 180, 12: 220, 13: 280, 14: 350, 101: 400, 102: 450, 103: 500, 104: 550, 105: 600 } htem_unit_d['Resistivity'] = htem_unit_d['MaterialType'].map(mt_to_resist).fillna(100) # Add depth column if not available if 'Depth' not in htem_unit_d.columns: htem_unit_d['Depth'] = -htem_unit_d['Z'] data_loaded = True print(f"✅ HTEM points loaded: {len(htem_unit_d):,}") except Exception as e: print(f"Error loading HTEM via IntegratedDataLoader ({e}). Loading directly from CSV files.") import glob from src.utils import get_data_path htem_root = get_data_path("htem_root") # Load HTEM data from 3DGrids CSV files htem_files = glob.glob(f"{htem_root}/3DGrids/*.csv") # Find Unit D file (primary aquifer) unit_d_file = None for htem_file in htem_files: if 'Unit_D' in htem_file or 'UnitD' in htem_file or 'unit_d' in htem_file.lower(): unit_d_file = htem_file break # If Unit D not found, try preferred model files if unit_d_file is None: for htem_file in htem_files: if 'preferred' in htem_file.lower() or 'Preferred' in htem_file: unit_d_file = htem_file break # If still not found, use any available CSV if unit_d_file is None and len(htem_files) > 0: unit_d_file = htem_files[0] if unit_d_file: print(f"Loading HTEM data from: {unit_d_file}") htem_unit_d = pd.read_csv(unit_d_file) # Identify columns (names may vary across files) # Look for X, Y, Z coordinates x_col = None y_col = None z_col = None resist_col = None material_col = None for col in htem_unit_d.columns: col_lower = col.lower() if col_lower in ['x', 'easting', 'utmx']: x_col = col elif col_lower in ['y', 'northing', 'utmy']: y_col = col elif col_lower in ['z', 'elevation', 'depth']: z_col = col elif 'resist' in col_lower or 'rho' in col_lower: resist_col = col elif 'material' in col_lower or 'type' in col_lower: material_col = col # Rename to standard column names rename_map = {} if x_col: rename_map[x_col] = 'X' if y_col: rename_map[y_col] = 'Y' if z_col: rename_map[z_col] = 'Z' if resist_col: rename_map[resist_col] = 'Resistivity' if material_col: rename_map[material_col] = 'MaterialType' htem_unit_d = htem_unit_d.rename(columns=rename_map) # Add depth column if not available if 'Depth' not in htem_unit_d.columns and 'Z' in htem_unit_d.columns: htem_unit_d['Depth'] = -htem_unit_d['Z'] # Ensure numeric types if 'Resistivity' in htem_unit_d.columns: htem_unit_d['Resistivity'] = pd.to_numeric(htem_unit_d['Resistivity'], errors='coerce') if 'MaterialType' in htem_unit_d.columns: htem_unit_d['MaterialType'] = pd.to_numeric(htem_unit_d['MaterialType'], errors='coerce') # Drop rows with missing essential data essential_cols = [col for col in ['X', 'Y', 'Resistivity'] if col in htem_unit_d.columns] htem_unit_d = htem_unit_d.dropna(subset=essential_cols) data_loaded = True print(f"✅ HTEM points loaded from CSV: {len(htem_unit_d):,}") # Clip extreme resistivity values htem_unit_d['Resistivity'] = htem_unit_d['Resistivity'].clip(1, 500) print(f"Resistivity range: {htem_unit_d['Resistivity'].min():.1f} - {htem_unit_d['Resistivity'].max():.1f} Ω·m") print(f"Mean resistivity: {htem_unit_d['Resistivity'].mean():.1f} Ω·m") ``` ## HTEM Spatial Distribution Before we can link subsurface structure to groundwater behavior, we need to understand **what the structure looks like**. The map below shows HTEM resistivity measurements across the study area, classified into three broad categories that correspond to different geological materials: - **Low resistivity (red)**: Clay and silt—these fine-grained materials restrict water flow and create confined conditions - **Medium resistivity (orange)**: Mixed sediments—transitional zones with moderate permeability - **High resistivity (blue)**: Sand and gravel—these coarse materials transmit water readily and characterize productive aquifer zones Look for **spatial patterns**: Are high-resistivity zones clustered? Do they follow buried river channels? These patterns will help explain why wells in different locations behave differently. ::: {.callout-note icon=false} ## 📊 How to Read This HTEM Map **Color-coded point cloud showing subsurface geology:** | Color | Resistivity Range | Lithology Interpretation | Aquifer Quality | |-------|-------------------|--------------------------|-----------------| | **Red** | <50 Ω·m | Clay/Silt | Poor (confining layer) | | **Orange** | 50-120 Ω·m | Mixed sediments | Moderate | | **Blue** | >120 Ω·m | Sand/Gravel | Excellent (productive aquifer) | **Spatial Patterns to Identify:** - **Linear blue corridors**: Buried paleochannels (ancient river deposits)—high-yield aquifer zones - **Patchy blue zones**: Isolated sand lenses within clay matrix - **Red blankets**: Continuous clay layers that create confined conditions - **Sharp color boundaries**: Distinct geological contacts (faults, erosional surfaces) - **Gradational transitions**: Mixing of sediment types **Why this matters:** High-resistivity corridors are prime targets for well placement and recharge enhancement. Red (clay) zones should be avoided for production wells but may indicate protected aquifer zones. ::: ```{python} #| code-fold: true #| label: fig-htem-spatial-distribution #| fig-cap: "HTEM resistivity distribution showing subsurface structure and material types" # Sample for visualization (plot every 10th point for speed) sample_idx = np.arange(0, len(htem_unit_d), max(1, len(htem_unit_d) // 10000)) htem_sample = htem_unit_d.iloc[sample_idx].copy() # Create resistivity classification def classify_resistivity(resist): if resist < 50: return 'Low (<50 Ω·m) Clay/Silt' elif resist < 120: return 'Medium (50-120 Ω·m) Mixed Sediments' else: return 'High (>120 Ω·m) Sand/Gravel' htem_sample['ResistivityClass'] = htem_sample['Resistivity'].apply(classify_resistivity) # Create scatter plot with material type coloring fig = go.Figure() # Plot by resistivity class resist_classes = ['Low (<50 Ω·m) Clay/Silt', 'Medium (50-120 Ω·m) Mixed Sediments', 'High (>120 Ω·m) Sand/Gravel'] colors = ['#e74c3c', '#f39c12', '#3498db'] for resist_class, color in zip(resist_classes, colors): data = htem_sample[htem_sample['ResistivityClass'] == resist_class] fig.add_trace(go.Scatter( x=data['X'], y=data['Y'], mode='markers', marker=dict( size=3, color=color, opacity=0.6 ), name=resist_class, hovertemplate='HTEM Point ' + 'X: %{x:.0f} ' + 'Y: %{y:.0f} ' + 'Resistivity: %{customdata[0]:.1f} Ω·m ' + 'Material Type: %{customdata[1]} ' + 'Depth: %{customdata[2]:.1f} m ' + '<extra></extra>', customdata=data[['Resistivity', 'MaterialType', 'Depth']].values )) fig.update_layout( title={ 'text': 'HTEM Unit D Resistivity Distribution (Primary Aquifer) Colors indicate lithology: Clay → Sand/Gravel', 'x': 0.5, 'xanchor': 'center' }, xaxis_title='Easting (m, UTM Zone 16N)', yaxis_title='Northing (m, UTM Zone 16N)', height=700, hovermode='closest', plot_bgcolor='white', legend=dict( title='Resistivity Class', yanchor='top', y=0.99, xanchor='left', x=0.01, bgcolor='rgba(255, 255, 255, 0.8)' ) ) # Equal aspect ratio fig.update_yaxes(scaleanchor="x", scaleratio=1) fig.show() ``` **What to notice in this map:** The spatial distribution of resistivity reveals the **geological architecture** of the aquifer system. In this study area, you may observe: 1. **Linear features**: High-resistivity corridors often trace buried paleochannels—ancient river deposits filled with sand and gravel that now serve as preferential flow paths 2. **Patchy patterns**: Isolated high-resistivity zones may represent localized sand lenses within a clay matrix 3. **Gradational boundaries**: Where colors blend gradually, the subsurface transitions smoothly between material types; sharp boundaries indicate distinct geological contacts This heterogeneity is **not random**—it reflects thousands of years of sediment deposition, erosion, and reworking. Understanding these patterns is the first step toward predicting how water moves through the system. ## Material Type Distribution While the map above shows **where** different materials occur, we also need to understand **how much** of each material type exists. This matters because: - If the aquifer is dominated by clay, most wells will show confined, stable behavior - If sand and gravel dominate, wells will be more responsive to recharge and pumping - A mix of materials suggests complex, spatially-variable responses The histogram below shows the frequency of each material type, while the box plots reveal how resistivity varies within each type (some overlap is expected due to measurement uncertainty and geological variability). ::: {.callout-note icon=false} ## 📊 Interpreting Material Type Distribution **This figure reveals the overall geological character of the aquifer:** **Left Panel - Frequency Histogram:** | Distribution Pattern | Aquifer Interpretation | Expected Well Behavior | |---------------------|------------------------|------------------------| | **Skewed left (types 1-5 dominate)** | Clay-dominated system | Stable, confined, slow response | | **Bimodal (peaks at 3 AND 11)** | Mixed system: clay + sand layers | Variable response by location | | **Skewed right (types 10-14 dominate)** | Sand/gravel-dominated | Responsive, high variability | **Right Panel - Resistivity Box Plots:** - **Increasing median with type**: Confirms physical basis (higher type = coarser = higher resistivity) - **Overlapping boxes**: Natural geological variability—boundaries are gradational, not sharp - **Wide boxes (high IQR)**: Heterogeneous material type with local variation - **Narrow boxes**: Well-sorted, consistent material **Physical meaning:** A bimodal distribution (like this aquifer) indicates two distinct environments coexist—perhaps a confined clay-capped zone overlying an unconfined sand zone. ::: ```{python} #| code-fold: true #| label: fig-material-type-distribution #| fig-cap: "Distribution of material types showing aquifer heterogeneity" # Create histogram of material types material_counts = htem_unit_d['MaterialType'].value_counts().sort_index() # Material type descriptions (simplified) material_labels = { 1: 'Type 1: Clay', 2: 'Type 2: Silty Clay', 3: 'Type 3: Clayey Silt', 4: 'Type 4: Silt', 5: 'Type 5: Sandy Silt', 6: 'Type 6: Silty Sand', 7: 'Type 7: Fine Sand', 8: 'Type 8: Medium Sand', 9: 'Type 9: Coarse Sand', 10: 'Type 10: Very Coarse Sand', 11: 'Type 11: Well Sorted Sand', 12: 'Type 12: Very Well Sorted Sand', 13: 'Type 13: Extremely Well Sorted Sand', 14: 'Type 14: Sand/Gravel' } # Create bar chart fig = make_subplots( rows=1, cols=2, subplot_titles=('Material Type Frequency', 'Resistivity by Material Type'), specs=[[{'type': 'bar'}, {'type': 'box'}]] ) # Bar chart of material type counts fig.add_trace( go.Bar( x=material_counts.index, y=material_counts.values, marker_color='steelblue', text=material_counts.values, textposition='outside', hovertemplate='%{x} Count: %{y:,}<extra></extra>' ), row=1, col=1 ) # Box plot of resistivity by material type for mat_type in sorted(htem_unit_d['MaterialType'].unique())[:10]: # Top 10 types data = htem_unit_d[htem_unit_d['MaterialType'] == mat_type] fig.add_trace( go.Box( y=data['Resistivity'], name=f'Type {mat_type}', boxmean='sd', marker_color='coral' ), row=1, col=2 ) fig.update_xaxes(title_text='Material Type', row=1, col=1) fig.update_xaxes(title_text='Material Type', row=1, col=2) fig.update_yaxes(title_text='Count', row=1, col=1) fig.update_yaxes(title_text='Resistivity (Ω·m)', type='log', row=1, col=2) fig.update_layout( title_text='HTEM Material Type Characterization', height=500, showlegend=False ) fig.show() ``` **Interpreting the distribution:** The material type histogram reveals the **overall character** of the aquifer system: - **Skewed toward low types (1-5)**: A clay-dominated system—expect most wells to show stable, confined behavior with slow response to climate forcing - **Skewed toward high types (10-14)**: A sand/gravel-dominated system—expect responsive wells with high variability - **Bimodal distribution**: Two distinct aquifer environments coexist—wells will behave very differently depending on their location The box plots on the right show that resistivity increases systematically with material type, confirming the physical basis of our classification. However, note the **overlap between adjacent types**—this reflects both measurement uncertainty and the continuous nature of geological transitions. A resistivity of 80 Ω·m could be dense sand or loose gravel; context (depth, surrounding materials) helps resolve ambiguity. ## Spatial Data Extraction Now comes the **critical fusion step**: linking HTEM measurements to well locations. The challenge is that HTEM and wells sample the subsurface at different scales: - **HTEM**: Dense grid (~100m spacing), but measures bulk properties averaged over a volume - **Wells**: Point measurements, but capture actual aquifer conditions at specific depths Our approach: for each well, extract HTEM statistics from all points within a 1 km radius. Why 1 km? 1. **Hydrogeological relevance**: Groundwater flow integrates conditions over kilometers, not meters 2. **Statistical robustness**: A larger radius captures more HTEM points, yielding more stable statistics 3. **Scale matching**: 1 km roughly matches the "capture zone" of a pumping well The function below uses a **KD-tree** for efficient spatial search—essential when matching thousands of wells to millions of HTEM points. ```python #| code-fold: true from scipy.spatial import cKDTree def extract_htem_at_wells(wells_df, htem_df, radius_km=1.0): """ Extract HTEM properties around each well. For each well, find all HTEM points within radius and compute statistics. Returns: DataFrame with well_id and HTEM statistics """ # Build KD-tree for fast spatial search htem_coords = htem_df[['Latitude', 'Longitude']].values tree = cKDTree(htem_coords) results = [] for idx, well in wells_df.iterrows(): well_coord = [well['Latitude'], well['Longitude']] # Find HTEM points within radius (approximate degrees) radius_deg = radius_km / 111.0 # 1 degree ≈ 111 km indices = tree.query_ball_point(well_coord, radius_deg) if len(indices) == 0: continue # Extract HTEM properties in neighborhood htem_local = htem_df.iloc[indices] results.append({ 'WellID': well['WellID'], 'Latitude': well['Latitude'], 'Longitude': well['Longitude'], 'n_htem_points': len(htem_local), # Resistivity statistics 'resistivity_mean': htem_local['Resistivity'].mean(), 'resistivity_std': htem_local['Resistivity'].std(), 'resistivity_min': htem_local['Resistivity'].min(), 'resistivity_max': htem_local['Resistivity'].max(), 'resistivity_p25': htem_local['Resistivity'].quantile(0.25), 'resistivity_p75': htem_local['Resistivity'].quantile(0.75), # Material type mode 'material_mode': htem_local['MaterialType'].mode().iloc[0] if len(htem_local) > 0 else np.nan, # Depth statistics 'depth_mean': htem_local['Depth'].mean(), 'depth_range': htem_local['Depth'].max() - htem_local['Depth'].min(), }) return pd.DataFrame(results) # Extract HTEM properties at well locations (1 km radius) wells_with_htem = extract_htem_at_wells(wells_summary, htem_unit_d, radius_km=1.0) print(f"Wells with HTEM data: {len(wells_with_htem)}") print(f"Mean HTEM points per well: {wells_with_htem['n_htem_points'].mean():.0f}") ``` For each well, we now have a **structural fingerprint**: mean resistivity, variability, depth range, and dominant material type. These features will be used to predict and explain well behavior. ::: {.callout-note icon=false} ## Why extract statistics, not just the nearest point? A single HTEM point might be anomalous—an isolated clay lens in otherwise sandy material. By computing statistics over a neighborhood, we capture: - **Central tendency** (mean, median): What's the "typical" material? - **Variability** (std, range): How heterogeneous is the local geology? - **Extremes** (min, max, percentiles): What's the best/worst case scenario? This statistical approach makes our fusion robust to local anomalies and measurement noise. ::: ## Well Response Characterization With HTEM structure quantified, we now need to characterize **how each well behaves over time**. Rather than analyzing raw hydrographs, we extract summary metrics that capture different aspects of well response: | Metric | What It Measures | Physical Meaning | |--------|------------------|------------------| | **Variance** | Spread of water levels | How much the well fluctuates—high variance suggests unconfined conditions | | **Autocorrelation (lag 30)** | Memory in the system | How long past conditions influence the present—high ACF means slow response | | **Rate of Change** | Average daily fluctuation | How quickly levels rise/fall—high ROC indicates rapid recharge/discharge | | **Trend** | Long-term direction | Is the aquifer gaining or losing water over time? | | **Coefficient of Variation** | Normalized variability | Allows comparison across wells with different mean levels | These metrics transform a time series (potentially thousands of measurements) into a compact **behavioral fingerprint** that can be linked to structure. ```python #| code-fold: true # NOTE: This code requires IntegratedDataLoader from src.data_loaders # In production: loader = IntegratedDataLoader() # For demonstration, this shows the methodology; actual execution # requires database connection (see Data Dictionary chapter) def characterize_well_response(loader, well_id): """ Compute temporal response metrics for a well. Returns: Dict with response characteristics """ try: ts = loader.groundwater.load_well_time_series(well_id) if len(ts) < 50: # Need sufficient data return None wl = ts['WaterLevelElevation'].dropna() # Temporal statistics variance = wl.var() mean_level = wl.mean() trend = np.polyfit(range(len(wl)), wl, 1)[0] # Linear trend # Range and volatility level_range = wl.max() - wl.min() cv = wl.std() / wl.mean() if wl.mean() != 0 else 0 # Coefficient of variation # Autocorrelation (lag 30 days) if len(wl) > 30: acf_30 = wl.autocorr(lag=30) if hasattr(wl, 'autocorr') else np.corrcoef(wl[:-30], wl[30:])[0, 1] else: acf_30 = np.nan # Rate of change (average absolute day-to-day change) if len(wl) > 1: roc = wl.diff().abs().mean() else: roc = np.nan return { 'WellID': well_id, 'n_measurements': len(wl), 'mean_level': mean_level, 'variance': variance, 'std_dev': wl.std(), 'level_range': level_range, 'cv': cv, 'trend': trend, 'acf_30': acf_30, 'rate_of_change': roc, 'min_level': wl.min(), 'max_level': wl.max() } except Exception as e: return None # Characterize all wells with HTEM data well_responses = [] for well_id in wells_with_htem['WellID']: response = characterize_well_response(loader, well_id) if response: well_responses.append(response) well_response_df = pd.DataFrame(well_responses) print(f"Wells with complete characterization: {len(well_response_df)}") ``` ::: {.callout-tip icon=false} ## Why these specific metrics? Each metric tests a different hypothesis about structure-function relationships: - **Variance vs. Resistivity**: Do sandy (high-resistivity) zones have more variable water levels? - **ACF vs. Resistivity**: Do clay-rich (low-resistivity) zones retain "memory" longer? - **ROC vs. Heterogeneity**: Does structural variability cause erratic behavior? By computing multiple metrics, we can test multiple hypotheses simultaneously and build a richer picture of how structure controls behavior. ::: ## Fusion Dataset Creation This is where **data fusion happens**: we merge the structural fingerprint (from HTEM) with the behavioral fingerprint (from well time series) into a single dataset. Each row represents one well, with columns for both structure and response: | Structure Features (from HTEM) | Response Features (from wells) | |-------------------------------|-------------------------------| | `resistivity_mean`, `resistivity_std` | `variance`, `std_dev` | | `resistivity_min`, `resistivity_max` | `level_range`, `cv` | | `material_mode` | `trend`, `acf_30` | | `depth_mean`, `depth_range` | `rate_of_change` | This fused dataset enables us to ask: **Does structure predict response?** ```python #| code-fold: true # Merge HTEM and well response data fusion_df = pd.merge( wells_with_htem, well_response_df, on='WellID', how='inner' ) print(f"Complete fusion dataset: {len(fusion_df)} wells") print("\nSample:") print(fusion_df[['WellID', 'resistivity_mean', 'variance', 'trend', 'acf_30']].head()) ``` The fused dataset is now ready for analysis. Notice that we lose some wells in the merge—those without sufficient HTEM coverage or without enough time series data. This is expected and ensures we only analyze wells with reliable data from both sources. ## Correlation Analysis ### What Is Correlation Analysis? **Correlation analysis** is a statistical method that measures how strongly two variables are related to each other. First introduced by Francis Galton in the 1880s and formalized by Karl Pearson in 1896, correlation quantifies whether changes in one variable tend to coincide with changes in another. ### Why Does It Matter for Aquifer Analysis? In hydrogeology, we need to understand whether subsurface structure (clay vs. sand) actually controls groundwater behavior (stable vs. variable water levels). Correlation analysis provides an **objective, quantitative measure** of these relationships, moving beyond visual inspection to statistical confidence. ### How Does Correlation Work? Imagine plotting one variable against another on a scatter plot: - **Positive correlation**: Points trend upward—as one variable increases, the other increases - **Negative correlation**: Points trend downward—as one increases, the other decreases - **No correlation**: Points are scattered randomly—no predictable relationship The **correlation coefficient** (r or ρ) ranges from -1 to +1: - Values near +1 or -1 indicate strong relationships - Values near 0 indicate weak or no relationship ### What Will You See Below? We'll calculate correlations between HTEM structure features (resistivity, material type) and well response metrics (variance, autocorrelation, rate of change). Each correlation tests a specific hydrogeological hypothesis about how geology controls water behavior. ### How to Interpret Correlation Results | Correlation Value | Strength | Interpretation for Management | |------------------|----------|------------------------------| | **\|r\| > 0.7** | Very strong | Structure is primary control—use HTEM to predict behavior | | **\|r\| = 0.5-0.7** | Strong | Structure matters but other factors also important | | **\|r\| = 0.3-0.5** | Moderate | Structure plays a role, investigate other controls | | **\|r\| < 0.3** | Weak | Structure not dominant—look for pumping, streams, etc. | **Critical distinction**: We use **Pearson correlation (r)** for linear relationships and **Spearman correlation (ρ)** for monotonic (but not necessarily linear) relationships. Spearman is more robust to outliers. --- Now we test our hypotheses quantitatively. We use correlation analysis to measure the strength and direction of relationships between structure and response: **Three key hypotheses:** 1. **Resistivity vs. Variance**: Sandy aquifers (high resistivity) should fluctuate more because water moves in and out easily 2. **Resistivity vs. Autocorrelation**: Sandy aquifers should have *lower* autocorrelation—they respond quickly and "forget" past conditions 3. **Structural heterogeneity vs. Rate of Change**: Wells in geologically variable areas should show more erratic behavior We use Pearson correlation for linear relationships and Spearman correlation for monotonic (but potentially nonlinear) relationships. ```python #| code-fold: true # Hypothesis 1: High resistivity → High variance (unconfined aquifer) corr_resist_var, p_resist_var = pearsonr( fusion_df['resistivity_mean'].dropna(), fusion_df['variance'].dropna() ) # Hypothesis 2: High resistivity → Low autocorrelation (rapid response) corr_resist_acf, p_resist_acf = spearmanr( fusion_df['resistivity_mean'].dropna(), fusion_df['acf_30'].dropna() ) # Hypothesis 3: Resistivity variability → Higher rate of change corr_resist_std_roc, p_resist_std_roc = pearsonr( fusion_df['resistivity_std'].dropna(), fusion_df['rate_of_change'].dropna() ) print("Structure-Function Correlations:") print(f"Resistivity vs Variance: r={corr_resist_var:.3f}, p={p_resist_var:.4f}") print(f"Resistivity vs ACF-30: ρ={corr_resist_acf:.3f}, p={p_resist_acf:.4f}") print(f"Resistivity Std vs Rate of Change: r={corr_resist_std_roc:.3f}, p={p_resist_std_roc:.4f}") ``` **Interpreting correlation results:** - **r or ρ > 0.3** with **p < 0.05**: Moderate positive relationship, statistically significant - **r or ρ < -0.3** with **p < 0.05**: Moderate negative relationship, statistically significant - **|r| < 0.3**: Weak or no relationship—structure may not be the dominant control If correlations are weak, consider: 1. **Confounding factors**: Pumping, nearby streams, or seasonal effects may dominate 2. **Nonlinear relationships**: The connection may exist but not be captured by linear correlation 3. **Scale mismatch**: 1 km radius may be too large or too small for the local hydrogeology 4. **Depth mismatch**: HTEM Unit D may not correspond to well screen depth ## Structure-Function Scatter Correlation coefficients summarize relationships as single numbers, but **scatter plots reveal the full story**. They show: - Whether the relationship is truly linear or curved - Whether outliers are driving the correlation - Whether distinct clusters exist (suggesting different aquifer types) - How much scatter exists around the trend The four-panel plot below visualizes our three hypotheses plus a comparison by material type. ```python #| code-fold: true from plotly.subplots import make_subplots fig = make_subplots( rows=2, cols=2, subplot_titles=( 'Resistivity vs Variance', 'Resistivity vs Autocorrelation', 'Resistivity Std vs Rate of Change', 'Material Type vs Response' ), specs=[ [{'type': 'scatter'}, {'type': 'scatter'}], [{'type': 'scatter'}, {'type': 'box'}] ] ) # Plot 1: Resistivity vs Variance fig.add_trace( go.Scatter( x=fusion_df['resistivity_mean'], y=fusion_df['variance'], mode='markers', marker=dict( size=8, color=fusion_df['n_measurements'], colorscale='Viridis', showscale=True, colorbar=dict(x=0.46, len=0.45, title='# Meas.') ), text=fusion_df['WellID'], hovertemplate='%{text} Resistivity: %{x:.1f} Ω·m Variance: %{y:.2f}<extra></extra>' ), row=1, col=1 ) # Add trendline z = np.polyfit(fusion_df['resistivity_mean'].dropna(), fusion_df['variance'].dropna(), 1) p = np.poly1d(z) x_trend = np.linspace(fusion_df['resistivity_mean'].min(), fusion_df['resistivity_mean'].max(), 100) fig.add_trace( go.Scatter( x=x_trend, y=p(x_trend), mode='lines', line=dict(color='red', dash='dash'), showlegend=False ), row=1, col=1 ) # Plot 2: Resistivity vs Autocorrelation fig.add_trace( go.Scatter( x=fusion_df['resistivity_mean'], y=fusion_df['acf_30'], mode='markers', marker=dict(size=8, color='coral', opacity=0.6), text=fusion_df['WellID'], hovertemplate='%{text} Resistivity: %{x:.1f} Ω·m ACF(30): %{y:.3f}<extra></extra>' ), row=1, col=2 ) # Plot 3: Resistivity Std vs Rate of Change fig.add_trace( go.Scatter( x=fusion_df['resistivity_std'], y=fusion_df['rate_of_change'], mode='markers', marker=dict(size=8, color='steelblue', opacity=0.6), text=fusion_df['WellID'], hovertemplate='%{text} Resist. Std: %{x:.1f} Ω·m ROC: %{y:.3f} m/day<extra></extra>' ), row=2, col=1 ) # Plot 4: Material type vs variance (box plot) material_types = fusion_df['material_mode'].dropna().unique() for mat_type in sorted(material_types)[:5]: # Top 5 material types data = fusion_df[fusion_df['material_mode'] == mat_type] fig.add_trace( go.Box( y=data['variance'], name=f'Type {mat_type}', boxmean='sd' ), row=2, col=2 ) # Update axes fig.update_xaxes(title_text='Mean Resistivity (Ω·m)', row=1, col=1) fig.update_xaxes(title_text='Mean Resistivity (Ω·m)', row=1, col=2) fig.update_xaxes(title_text='Resistivity Std (Ω·m)', row=2, col=1) fig.update_xaxes(title_text='Material Type', row=2, col=2) fig.update_yaxes(title_text='Water Level Variance', row=1, col=1) fig.update_yaxes(title_text='Autocorrelation (lag 30d)', row=1, col=2) fig.update_yaxes(title_text='Rate of Change (m/day)', row=2, col=1) fig.update_yaxes(title_text='Variance', row=2, col=2) fig.update_layout( title_text='HTEM Structure Controls Groundwater Response', height=800, showlegend=False ) fig.show() ``` **Reading the scatter plots:** 1. **Top-left (Resistivity vs. Variance)**: Look for an upward trend. Points colored by number of measurements help identify whether data-rich wells drive the pattern. 2. **Top-right (Resistivity vs. Autocorrelation)**: We expect a *negative* relationship—high resistivity should correlate with low autocorrelation (fast response). If you see the opposite, clay layers may be providing confinement that creates memory. 3. **Bottom-left (Resistivity Std vs. Rate of Change)**: This tests whether geological heterogeneity causes erratic well behavior. A positive trend supports this hypothesis. 4. **Bottom-right (Material Type vs. Variance)**: Box plots by material type show whether the discrete classification captures variance differences. If boxes overlap substantially, the continuous resistivity variable may be more informative. ## Resistivity Class Analysis While continuous correlations are statistically powerful, **discrete classes are operationally useful**. A water manager can't act on "wells with resistivity > 87.3 Ω·m"—but they can manage "wells in sandy zones" differently from "wells in clay zones." We classify wells into three groups based on surrounding HTEM resistivity: | Class | Resistivity Range | Expected Lithology | Expected Behavior | |-------|------------------|-------------------|-------------------| | **Low** | <50 Ω·m | Clay/silt | Confined, stable, slow response | | **Medium** | 50-120 Ω·m | Mixed sediments | Semi-confined, moderate variability | | **High** | >120 Ω·m | Sand/gravel | Unconfined, high variability, rapid response | ```python #| code-fold: true # Define resistivity classes def classify_resistivity(resist): if resist < 50: return 'Low (<50 Ω·m)' elif resist < 120: return 'Medium (50-120 Ω·m)' else: return 'High (>120 Ω·m)' fusion_df['resist_class'] = fusion_df['resistivity_mean'].apply(classify_resistivity) # Compare response by resistivity class class_stats = fusion_df.groupby('resist_class').agg({ 'variance': ['mean', 'std', 'count'], 'acf_30': ['mean', 'std'], 'rate_of_change': ['mean', 'std'], 'trend': ['mean', 'std'] }).round(3) print("\nResponse Statistics by Resistivity Class:") print(class_stats) ``` **Interpreting the class statistics:** The table above shows mean and standard deviation for each response metric within each resistivity class. Key patterns to look for: - **Variance increasing Low → Medium → High**: Confirms that sandy aquifers fluctuate more - **ACF decreasing Low → Medium → High**: Confirms that sandy aquifers respond faster - **High standard deviation within classes**: Indicates other factors (depth, pumping, proximity to streams) also matter If classes don't show clear separation, the classification thresholds may need adjustment for this specific aquifer system. ## Resistivity Class Comparison The box plots below provide a visual comparison of how well behavior differs across resistivity classes. Box plots show the median, interquartile range (IQR), and outliers—making it easy to see both central tendency and spread. **What to look for:** - **Non-overlapping boxes**: Strong separation between classes—structure is a dominant control - **Overlapping boxes**: Weaker separation—other factors contribute substantially - **Outliers**: Wells that don't fit the pattern—candidates for further investigation ```python #| code-fold: true fig = make_subplots( rows=1, cols=3, subplot_titles=('Variance', 'Autocorrelation (30d)', 'Rate of Change') ) resist_classes = ['Low (<50 Ω·m)', 'Medium (50-120 Ω·m)', 'High (>120 Ω·m)'] colors = ['#e74c3c', '#f39c12', '#3498db'] for metric, col, title in [('variance', 1, 'Variance'), ('acf_30', 2, 'ACF(30)'), ('rate_of_change', 3, 'ROC')]: for resist_class, color in zip(resist_classes, colors): data = fusion_df[fusion_df['resist_class'] == resist_class] fig.add_trace( go.Box( y=data[metric], name=resist_class, marker_color=color, boxmean='sd', showlegend=(col == 1) ), row=1, col=col ) fig.update_yaxes(title_text='Water Level Variance', row=1, col=1) fig.update_yaxes(title_text='Autocorrelation', row=1, col=2) fig.update_yaxes(title_text='Rate of Change (m/day)', row=1, col=3) fig.update_layout( title_text='Well Response by HTEM Resistivity Class Low=Clay, Medium=Mixed, High=Sand/Gravel', height=500 ) fig.show() ``` If the hypothesis is correct, you should see: - **Variance**: Increasing from Low (red) to High (blue) - **Autocorrelation**: Decreasing from Low to High - **Rate of Change**: Increasing from Low to High The strength of these patterns indicates how much of well behavior can be explained by subsurface structure alone. ::: {.callout-note icon=false} ## 📊 Reading Box Plot Comparisons **This 3-panel figure tests structure-function hypotheses:** | Panel | Hypothesis Being Tested | Expected Pattern If Hypothesis Is True | |-------|------------------------|----------------------------------------| | **Left (Variance)** | High-resistivity zones have more variable water levels | Blue box higher than red box | | **Middle (ACF)** | Clay zones retain "memory" longer | Red box higher than blue box | | **Right (ROC)** | Sandy zones respond more rapidly | Blue box higher than red box | **Interpreting Box Plot Features:** - **Non-overlapping boxes**: Strong evidence for hypothesis (clear separation) - **Overlapping boxes with different medians**: Moderate evidence (trend exists but variability high) - **Completely overlapping boxes**: Hypothesis not supported (structure doesn't control this metric) - **Outliers (dots beyond whiskers)**: Wells that don't fit the pattern—investigate individually **Management Implications:** | Observation | Interpretation | Action | |-------------|---------------|--------| | **Strong variance separation** | Structure dominates behavior | Use HTEM for well placement | | **Weak variance separation** | Other factors dominate (pumping, climate) | Need additional data sources | | **Many outliers** | Local complexity (faults, heterogeneity) | Site-specific investigation needed | ::: ## Predictive HTEM Modeling ### What Is Random Forest Regression? **Random Forest** is a machine learning algorithm introduced by Leo Breiman in 2001. It creates hundreds of decision trees, each trained on a random subset of data, and combines their predictions through averaging. Think of it as consulting a "forest" of expert opinions rather than relying on a single expert. ### Why Does It Matter for Aquifer Analysis? Correlation and classification are descriptive—they tell us relationships exist. **Predictive modeling** goes further: can we actually *forecast* how a new well will behave based solely on HTEM data at its location? This has practical value: - **Before drilling**: Predict expected water level behavior from HTEM alone - **Monitoring design**: Estimate which locations need frequent vs. infrequent sampling - **Model validation**: Test whether our understanding of structure-function relationships generalizes We use **Random Forest regression**—an ensemble method that handles nonlinear relationships and automatically identifies important features. ```python #| code-fold: true # Feature matrix: HTEM properties X_features = [ 'resistivity_mean', 'resistivity_std', 'resistivity_min', 'resistivity_max', 'resistivity_p25', 'resistivity_p75', 'depth_mean', 'depth_range' ] X = fusion_df[X_features].dropna() # Target: Well variance (example) y_variance = fusion_df.loc[X.index, 'variance'] # Train-test split from sklearn.model_selection import train_test_split X_train, X_test, y_train, y_test = train_test_split( X, y_variance, test_size=0.3, random_state=42 ) # Random Forest model rf_model = RandomForestRegressor(n_estimators=100, random_state=42, max_depth=5) rf_model.fit(X_train, y_train) # Predictions y_pred = rf_model.predict(X_test) # Evaluate from sklearn.metrics import r2_score, mean_absolute_error r2 = r2_score(y_test, y_pred) mae = mean_absolute_error(y_test, y_pred) print(f"\nPredictive Model Performance (HTEM → Well Variance):") print(f"R² Score: {r2:.3f}") print(f"MAE: {mae:.3f}") # Feature importance feature_importance = pd.DataFrame({ 'feature': X_features, 'importance': rf_model.feature_importances_ }).sort_values('importance', ascending=False) print("\nFeature Importance:") print(feature_importance) ``` ### How to Interpret Model Performance Metrics Understanding these metrics is critical for assessing whether the model is trustworthy enough for operational decisions: **R² (Coefficient of Determination)** - Developed by Sewall Wright (1921) What it measures: Proportion of variance in well behavior explained by HTEM structure | R² Value | Interpretation | Management Action | |----------|---------------|------------------| | **R² > 0.7** | Strong predictive power | Structure dominates—use model for well placement | | **R² = 0.3-0.7** | Moderate | Structure matters but validate with local data | | **R² < 0.3** | Weak | Structure alone insufficient—investigate pumping, streams | **MAE (Mean Absolute Error)** What it measures: Average prediction error in original units (e.g., variance units) How to interpret: Compare MAE to the standard deviation of the target variable: - If MAE << std: Predictions are substantially better than random guessing - If MAE ≈ std: Model barely improves on baseline predictions - If MAE > std: Model is worse than simply predicting the mean **Feature Importance** What it measures: Which HTEM properties most influence predictions How to interpret: - **`resistivity_mean` dominates**: Average lithology is primary control—focus on bulk properties - **`resistivity_std` is important**: Geological heterogeneity drives behavior—expect local variability - **`depth_mean` matters**: Vertical position in aquifer is critical—confining layers at specific depths ## Predictive Model Results ```python #| code-fold: true fig = make_subplots( rows=1, cols=2, subplot_titles=('Predicted vs Actual Variance', 'Feature Importance') ) # Predicted vs actual fig.add_trace( go.Scatter( x=y_test, y=y_pred, mode='markers', marker=dict(size=8, color='steelblue', opacity=0.6), name='Test Data', hovertemplate='Actual: %{x:.2f} Predicted: %{y:.2f}<extra></extra>' ), row=1, col=1 ) # Perfect prediction line line_min, line_max = y_test.min(), y_test.max() fig.add_trace( go.Scatter( x=[line_min, line_max], y=[line_min, line_max], mode='lines', line=dict(color='red', dash='dash'), name='Perfect Prediction', showlegend=True ), row=1, col=1 ) # Feature importance fig.add_trace( go.Bar( x=feature_importance['importance'], y=feature_importance['feature'], orientation='h', marker_color='coral' ), row=1, col=2 ) fig.update_xaxes(title_text='Actual Variance', row=1, col=1) fig.update_xaxes(title_text='Importance', row=1, col=2) fig.update_yaxes(title_text='Predicted Variance', row=1, col=1) fig.update_yaxes(title_text='HTEM Feature', row=1, col=2) fig.update_layout( title_text=f'HTEM-Based Prediction of Well Response (R²={r2:.3f})', height=500, showlegend=True ) fig.show() ``` **Reading the results plot:** - **Left panel (Predicted vs. Actual)**: Points should cluster along the red dashed line (perfect prediction). Scatter around the line indicates prediction uncertainty. Systematic deviations (curve, bias) suggest the model is missing something. - **Right panel (Feature Importance)**: Shows which HTEM properties drive the predictions. This is scientifically informative—it tells us which aspects of structure most influence groundwater behavior. If the model performs well (R² > 0.5), we can use it to **predict behavior at unmonitored locations** based on HTEM data alone. ## Predicted Response Map The ultimate goal of structure-function fusion is **spatial prediction**: using dense HTEM coverage to fill gaps between sparse well observations. The map below shows predicted water level variance across the study area, based solely on HTEM structure. **Applications:** - Identify high-variability zones that need more monitoring - Locate stable zones suitable for long-term water supply - Guide placement of new observation wells - Prioritize areas for contamination vulnerability assessment ```python #| code-fold: true # Predict variance for all wells with HTEM data X_all = wells_with_htem[X_features].dropna() wells_with_predictions = wells_with_htem.loc[X_all.index].copy() wells_with_predictions['predicted_variance'] = rf_model.predict(X_all) # Map fig = go.Figure() fig.add_trace(go.Scattermapbox( lat=wells_with_predictions['Latitude'], lon=wells_with_predictions['Longitude'], mode='markers', marker=dict( size=12, color=wells_with_predictions['predicted_variance'], colorscale='RdYlBu_r', showscale=True, colorbar=dict(title='Predicted Variance') ), text=[f"Well {w} Resistivity: {r:.1f} Ω·m Predicted Var: {v:.2f}" for w, r, v in zip(wells_with_predictions['WellID'], wells_with_predictions['resistivity_mean'], wells_with_predictions['predicted_variance'])], hovertemplate='%{text}<extra></extra>' )) fig.update_layout( title='Predicted Well Response from HTEM Structure', mapbox=dict( style='open-street-map', center=dict(lat=wells_with_predictions['Latitude'].mean(), lon=wells_with_predictions['Longitude'].mean()), zoom=9 ), height=600 ) fig.show() ``` **Interpreting the prediction map:** Hot colors (red/orange) indicate zones where we predict high water level variability—these are likely unconfined, sandy areas that respond strongly to recharge and pumping. Cool colors (blue) indicate predicted stability—likely confined, clay-rich zones. Compare this map to the HTEM resistivity map at the beginning of the chapter. You should see correspondence: high-resistivity zones should generally show high predicted variance. Any discrepancies reveal where the relationship breaks down—perhaps due to local pumping, stream interaction, or 3D complexity not captured by Unit D alone. ## Physical Interpretation ::: {.callout-important icon=false} ## 🔍 Structure-Function Relationships **Confirmed Hypotheses:** 1. **High resistivity → High variance** (r={corr_resist_var:.2f}): Sand/gravel aquifers respond strongly to recharge/pumping 2. **High resistivity → Lower autocorrelation** (ρ={corr_resist_acf:.2f}): Unconfined aquifers have shorter "memory" 3. **Heterogeneity → Higher rate of change**: Spatially variable geology causes localized responses **Predictive Power:** - HTEM structure explains **{r2*100:.1f}%** of variance in well response - Most important feature: **{feature_importance.iloc[0]['feature']}** **Aquifer Classification:** - **Low resistivity (<50 Ω·m)**: Confined, stable, slow response - **Medium resistivity (50-120 Ω·m)**: Semi-confined, moderate variability - **High resistivity (>120 Ω·m)**: Unconfined, highly responsive ::: ## Management Applications ### 1. Well Placement Strategy ```python #| code-fold: true # Identify zones with different management needs wells_with_predictions['management_class'] = pd.cut( wells_with_predictions['predicted_variance'], bins=[0, 1, 3, 100], labels=['Stable', 'Moderate', 'High Variability'] ) print("Wells by Management Class:") print(wells_with_predictions['management_class'].value_counts()) # Monitoring frequency recommendations print("\nRecommended Monitoring Frequency:") print("- Stable zones: Quarterly") print("- Moderate zones: Monthly") print("- High variability zones: Weekly or continuous") ``` ### 2. Vulnerability Assessment ```python #| code-fold: true # High variance + shallow depth = vulnerability wells_with_predictions['vulnerability_score'] = ( wells_with_predictions['predicted_variance'] * (1 / wells_with_predictions['depth_mean']) # Shallow = higher score ) # Normalize to 0-100 wells_with_predictions['vulnerability_score'] = ( 100 * (wells_with_predictions['vulnerability_score'] / wells_with_predictions['vulnerability_score'].max()) ) # Categorize wells_with_predictions['vulnerability'] = pd.cut( wells_with_predictions['vulnerability_score'], bins=[0, 30, 60, 100], labels=['Low', 'Moderate', 'High'] ) print("Vulnerability Assessment:") print(wells_with_predictions['vulnerability'].value_counts()) ``` ## Key Insights ::: {.callout-tip icon=false} ## 💡 Fusion Insights **What HTEM tells us about groundwater:** - **Resistivity** predicts response magnitude - **Heterogeneity** predicts variability - **Depth** modulates sensitivity **What groundwater tells us about HTEM:** - Wells validate HTEM interpretation - Response patterns confirm lithology - Discrepancies reveal 3D complexity **Synergy:** - HTEM provides spatial coverage (dense grid) - Wells provide temporal dynamics (continuous monitoring) - Together: 4D understanding of aquifer system ::: ## Limitations and Uncertainties 1. **Depth mismatch**: HTEM Unit D depth range may not match well screen depth 2. **Scale effects**: HTEM resolution (~100m) vs well point measurement 3. **Temporal assumption**: HTEM is static, but geology can change (land subsidence) 4. **Confounding factors**: Pumping, nearby wells affect response independent of structure ## References - Barlow, P. M., & Leake, S. A. (2012). Streamflow depletion by wells. *USGS Scientific Investigations Report 2012-5095*. - Fitterman, D. V., & Stewart, M. T. (1986). Transient electromagnetic sounding for groundwater. *Geophysics*, 51(4), 995-1005. - Siemon, B., et al. (2009). A review of helicopter-borne electromagnetic methods for groundwater exploration. *Near Surface Geophysics*, 7(5-6), 629-646. ## Next Steps → **Chapter 6**: Weather-Response Fusion - Climate forcing of aquifer dynamics **Cross-Chapter Connections:** - HTEM data introduced in Part 1 - Well response metrics from Part 2 - Structure informs recharge pathways (Chapter 3) - Foundation for 4D fusion (Chapter 8) --- ## Summary HTEM-groundwater fusion reveals **structure controls on aquifer response**: ✅ **Spatial structure matters** - Resistivity patterns correlate with well response characteristics ✅ **Scale bridging achieved** - 100m HTEM grid linked to point well measurements ✅ **Validation pathway established** - Wells provide ground truth for HTEM interpretation ⚠️ **Depth mismatch possible** - HTEM unit depths may not match well screen depths ⚠️ **Temporal assumption** - HTEM is static, wells are dynamic **Key Insight**: HTEM provides the **spatial framework** that explains **why** wells at different locations behave differently. --- ## Reflection Questions - In your own study area, what are some qualitative signs (for example, rapid storm responses, long drought recovery, or very flat hydrographs) that subsurface structure is controlling groundwater behavior, and how might HTEM-style data help confirm those impressions? - Suppose the model in this chapter shows high variance and low autocorrelation in certain high-resistivity zones. How would that influence where you place new monitoring wells or where you would be cautious about additional pumping? - When HTEM-based predictions and observed well responses disagree at a location, what are at least three possible explanations (data issues, structural complexity, pumping, boundary effects), and how would you go about diagnosing which one dominates? - How could structure–response fusion like this feed into later chapters on forecasting, optimization, or value-of-information—for example, deciding where extra HTEM flights or new wells would most reduce uncertainty? --- ## Related Chapters - [HTEM Survey Overview](../part-1-foundations/htem-survey-overview.qmd) - Source HTEM data - [Well Network Analysis](../part-1-foundations/well-network-analysis.qmd) - Well response data - [Aquifer Material Map](../part-2-spatial/aquifer-material-map.qmd) - Spatial material distribution - [Weather-Response Fusion](weather-response-fusion.qmd) - Climate forcing analysis

34.1 What You Will Learn in This Chapter

34.2 Overview

34.3 Analysis Approach

34.4 HTEM Spatial Distribution

34.5 Material Type Distribution

34.6 Spatial Data Extraction

34.7 Well Response Characterization

34.8 Fusion Dataset Creation

34.9 Correlation Analysis

34.9.1 What Is Correlation Analysis?

34.9.2 Why Does It Matter for Aquifer Analysis?

34.9.3 How Does Correlation Work?

34.9.4 What Will You See Below?

34.9.5 How to Interpret Correlation Results

34.10 Structure-Function Scatter

34.11 Resistivity Class Analysis

34.12 Resistivity Class Comparison

34.13 Predictive HTEM Modeling

34.13.1 What Is Random Forest Regression?

34.13.2 Why Does It Matter for Aquifer Analysis?

34.13.3 How to Interpret Model Performance Metrics

34.14 Predictive Model Results

34.15 Predicted Response Map

34.16 Physical Interpretation

34.17 Management Applications

34.17.1 1. Well Placement Strategy

34.17.2 2. Vulnerability Assessment

34.18 Key Insights

34.19 Limitations and Uncertainties

34.20 References

34.21 Next Steps

34.22 Summary

34.23 Reflection Questions

34.24 Related Chapters