38 Information Flow Analysis

Quantifying Information Propagation Pathways

For Newcomers

You will get:

- A way of thinking about how signals move through the well network (e.g., drought effects propagating over time).
- Intuition for information-based measures (mutual information, transfer entropy) as tools to detect hidden connectivity.
- Visuals that show which wells “talk to each other” most strongly.

You can skim the formal information-theory definitions and focus on:

- The maps/graphs of well connectivity,
- The narrative about which connections are strong or weak,
- And how this complements the more physical fusion analyses.

Data Sources Fused: Groundwater Wells (Network Analysis)

38.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Explain what “information flow” means in a groundwater monitoring network and how it relates to hydraulic connectivity and shared forcing.
Interpret correlation/information-based heatmaps and network graphs to identify hub wells, clusters, and weakly connected sites.
Discuss how information-based metrics complement more physical analyses (recharge, stream–aquifer exchange, causal graphs) when designing and optimizing monitoring networks.
Reflect on the limitations of correlation as a proxy for mutual information and when more advanced metrics or additional data are warranted.

38.2 Overview

Water doesn’t just flow through aquifers - information flows too. A drought signal propagates from recharge areas to deeper parts of the aquifer. A pumping cone of depression spreads outward. This chapter uses information theory to track how signals propagate through the well network, revealing hidden connectivity and flow pathways.

💻 For Computer Scientists

Information Theory Metrics:

Mutual Information: I(X;Y) = how much knowing X reduces uncertainty about Y
Transfer Entropy: TE(X→Y) = directed information flow (predictive; suggests but does not prove causality)
Time-Lagged Mutual Information: TLMI(X,Y,τ) = MI at different time lags
Information Bottleneck: Identify wells that control information flow

Graph Theory:

- Nodes = Wells
- Edge weights = Information transfer strength
- Directed edges = Asymmetric information flow

🌍 For Hydrologists

Physical Meaning:

High information transfer between wells can indicate one or more of the following:

1. Hydraulic connectivity: Water flows between locations
2. Shared aquifer: Wells tap the same geological unit
3. Common forcing: Both respond to the same recharge/pumping events

Expected patterns:

- Wells in same aquifer unit: High MI
- Upgradient → downgradient: Positive time lag
- Confined aquifer: Pressure waves propagate faster than water
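The "positive time lag" pattern can be made concrete with a lagged correlation scan. The sketch below is illustrative only and is not part of this chapter's pipeline: `lagged_correlation` and both synthetic series are hypothetical names, and correlation stands in for time-lagged MI, as it does elsewhere in the chapter.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag) for lag = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for lag in range(max_lag + 1):
        a = x[: len(x) - lag]   # earlier values of the candidate "source" series
        b = y[lag:]             # later values of the candidate "receiver" series
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result

# Synthetic data (assumption): the downgradient series repeats the
# upgradient series 3 time steps later, plus a little noise.
rng = np.random.default_rng(0)
upgradient = rng.normal(size=300)
downgradient = np.roll(upgradient, 3) + 0.1 * rng.normal(size=300)

lags = lagged_correlation(upgradient, downgradient, max_lag=6)
best_lag = max(lags, key=lags.get)
print(f"Strongest correlation at lag {best_lag}: r = {lags[best_lag]:.2f}")
```

A peak at a positive lag is consistent with the first series leading the second, but a shared driver that reaches the two wells at different times would produce the same picture.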

38.3 Setup

Analyzing 18 wells with time series data and real coordinates
  Latitude range: 40.0534 to 40.3852
  Longitude range: -88.4632 to -87.9810

38.4 Correlation Network Construction

📘 Understanding Mutual Information

38.4.1 What Is It?

Mutual information (MI) is a measure from information theory (Shannon, 1948) that quantifies how much knowing one variable reduces uncertainty about another. It plays a role similar to correlation but captures any type of statistical dependence: linear, nonlinear, or complex.

Historical Context: The concept traces to Claude Shannon’s foundational 1948 paper “A Mathematical Theory of Communication”, which created the field of information theory. Originally developed for telecommunications, it is now widely used in neuroscience, genetics, and network analysis.

38.4.2 Why Does It Matter for Groundwater Networks?

In monitoring networks, high mutual information between wells can indicate:

1. Hydraulic connectivity: Wells tap the same aquifer flow system
2. Shared forcing: Both respond to the same recharge/pumping events
3. Network redundancy: One well may provide similar information to another

MI reveals hidden connections that might not appear in simple distance-based analysis—two distant wells could have high MI if connected by a high-permeability channel.

38.4.3 How Does It Work?

Mutual information compares joint probability to independent probabilities:

Step 1: If wells are independent, knowing Well A tells you nothing about Well B:

P(A, B) = P(A) × P(B)  [No connection]

Step 2: If wells are connected, joint probability differs:

P(A, B) ≠ P(A) × P(B)  [Connection exists!]

Step 3: MI quantifies the difference (in bits of information):

MI(A; B) = How much uncertainty about B is reduced by knowing A
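Steps 1 to 3 can be sketched with a simple histogram (plug-in) estimator, which compares the binned joint distribution with the product of its marginals. This is a minimal illustration on synthetic data, not the method used in this chapter; `mutual_information_bits`, the bin count, and all series are assumptions, and plug-in estimates are biased and sensitive to the number of bins.

```python
import numpy as np

def mutual_information_bits(a, b, bins=8):
    """Plug-in MI estimate (in bits) from a 2-D histogram of a and b."""
    joint_counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint_counts / joint_counts.sum()   # joint probability P(A, B)
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal P(A), column vector
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal P(B), row vector
    independent = p_a @ p_b                    # P(A) x P(B) for every cell
    nonzero = p_ab > 0                         # empty cells contribute nothing
    return float(np.sum(p_ab[nonzero] * np.log2(p_ab[nonzero] / independent[nonzero])))

# Synthetic series (assumption): one linear link, one nonlinear link, one unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=2000)
b_linear = a + 0.3 * rng.normal(size=2000)
b_nonlinear = a ** 2
b_independent = rng.normal(size=2000)

mi_linear = mutual_information_bits(a, b_linear)
mi_nonlinear = mutual_information_bits(a, b_nonlinear)
mi_independent = mutual_information_bits(a, b_independent)
r_nonlinear = float(np.corrcoef(a, b_nonlinear)[0, 1])

print(f"linear link:      MI = {mi_linear:.2f} bits")
print(f"nonlinear link:   MI = {mi_nonlinear:.2f} bits, r = {r_nonlinear:.2f}")
print(f"independent pair: MI = {mi_independent:.2f} bits")
```

The nonlinear pair illustrates why MI is the more general measure: its correlation is close to zero even though one series is fully determined by the other.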

Correlation as Proxy: For this analysis, we use correlation as a proxy for mutual information. True MI captures nonlinear dependencies and is never negative, whereas correlation is signed and measures only linear association. Correlation is faster to compute and provides similar insights for groundwater networks where relationships are often approximately linear.
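For intuition about the proxy: if two series are jointly Gaussian, MI is a fixed function of the correlation, I = -0.5 · log2(1 - r²) bits. The sketch below applies that standard identity to the thresholds used in this chapter. The Gaussian assumption is ours, not something the chapter verifies for these wells, and `gaussian_mi_bits` is a hypothetical helper name.

```python
import math

def gaussian_mi_bits(r):
    """MI in bits implied by correlation r, assuming jointly Gaussian variables."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - r * r)

for r in (0.3, 0.4, 0.7, 0.8):
    print(f"r = {r:.1f}  ->  about {gaussian_mi_bits(r):.2f} bits of shared information")
```

Under this assumption the mapping is strongly nonlinear: moving from r = 0.7 to r = 0.8 adds more shared information than moving from r = 0.3 to r = 0.4.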

38.4.4 What Will You See Below?

Correlation matrix: Pairwise correlations between well water levels (proxy for MI)
Network graph: Wells connected if correlation exceeds threshold
Hub wells: High-connectivity nodes acting as network centers

38.4.5 How to Interpret Results

Correlation	MI Interpretation	Monitoring Implications
r > 0.7	Strong shared information	Wells highly redundant—one could replace the other
0.4 < r < 0.7	Moderate connection	Complementary monitoring—both provide value
r < 0.4	Weak/no connection	Independent monitoring—both essential
Hub well (>6 connections)	Central to network	High-value monitoring site—represents regional conditions
Isolated well (<3 connections)	Unique local signal	Irreplaceable—captures distinct aquifer behavior

Cost-Cutting Guidance: Wells with r > 0.8 are candidates for consolidation if budget cuts are needed. Wells with r < 0.3 to all others are irreplaceable.
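The correlation bands in the table can be written as a small lookup. This is a sketch: `interpret_correlation` is a hypothetical helper, the pair values are made up for illustration, and the handling of values exactly at 0.4 or 0.7 is our assumption because the table leaves the boundaries open.

```python
def interpret_correlation(r):
    """Map a pairwise correlation to the three interpretation bands above."""
    if r > 0.7:
        return "strong: wells highly redundant"
    if r >= 0.4:
        return "moderate: complementary monitoring"
    return "weak: independent monitoring"

# Hypothetical pair correlations, for illustration only
pairs = {("W1", "W2"): 0.91, ("W1", "W3"): 0.55, ("W2", "W5"): 0.12}
labels = {pair: interpret_correlation(r) for pair, r in pairs.items()}

for (first, second), label in labels.items():
    print(f"{first}-{second}: {label}")
```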

Correlation network computed from real time series data: 15 wells
Mean correlation: 0.479
Max correlation: 0.998
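The sketch below shows one way a matrix like this can be built from irregular measurements: aggregate to monthly means, then correlate pairwise. It uses synthetic data and hypothetical well names, so it is not the chapter's own pipeline; `min_periods=12` is an assumed safeguard that blanks out pairs with too little overlap, and `"MS"` (month start) is used for resampling only because it works across pandas versions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2015-01-01", "2019-12-31", freq="D")
# Shared seasonal signal standing in for regional recharge (assumption)
regional = np.sin(2 * np.pi * days.dayofyear.to_numpy() / 365.25)

levels = pd.DataFrame(
    {
        "W1": regional + 0.1 * rng.normal(size=len(days)),
        "W2": regional + 0.1 * rng.normal(size=len(days)),
        "W3": rng.normal(size=len(days)),  # unrelated to the regional signal
    },
    index=days,
)
# Mimic irregular sampling: keep roughly 30% of daily readings per well
levels = levels.where(rng.random(levels.shape) < 0.3)

monthly = levels.resample("MS").mean()   # align the wells on a monthly grid
corr = monthly.corr(min_periods=12)      # pairwise Pearson; NaN if overlap < 12 months

print(corr.round(2))
```

Pairwise correlation uses only the months in which both wells have data, so each cell of the matrix can rest on a different number of observations.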

38.5 Correlation Heatmap

📊 How to Read This Correlation Heatmap

What the Visualization Shows:

A correlation matrix displays pairwise correlations between all wells. Each cell shows how strongly two wells’ water levels move together over time.

Color Interpretation:

Color	Correlation Value	Information Meaning	Physical Interpretation
Dark Red	r > 0.7	High shared information	Same aquifer unit, hydraulically connected
Light Red/Orange	0.4 < r < 0.7	Moderate connection	Partially connected, shared forcing
White/Light Blue	-0.2 < r < 0.4	Weak/no connection	Different aquifer units or isolated
Dark Blue	r < -0.2	Negative correlation	Rare—possibly pumping-induced

What to Look For:

Block patterns (red squares): Groups of highly correlated wells that likely tap the same aquifer
Unit diagonal: All diagonal values = 1.0 (each well is perfectly correlated with itself)
Isolated rows/columns: Wells with mostly light colors are monitoring unique local conditions
Symmetric pattern: The matrix should be symmetric (r(A,B) = r(B,A))

Management Interpretation:

Red clusters → Redundancy: If 5 wells are all r > 0.8, you could potentially remove 4 and still capture the signal
Light rows → Irreplaceable: Wells weakly correlated with all others are capturing unique information
Off-diagonal hot spots: Unexpected connections might indicate hidden flow paths

Critical Question: Which wells can we afford to lose? Wells with r > 0.8 to several neighbors are the candidates for consolidation; wells with r < 0.3 to all others are irreplaceable.
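The two screening rules (r > 0.8 and r < 0.3 to all others) can be applied to a correlation matrix directly. The following is a sketch under stated assumptions: `flag_wells` is a hypothetical helper, the 4-well matrix is invented for illustration, and the rule is applied to signed r as in the text, so a strong negative correlation is not treated as shared information.

```python
import numpy as np

def flag_wells(corr, redundant_r=0.8, unique_r=0.3):
    """Return (consolidation candidates, irreplaceable wells) as index lists."""
    corr = np.array(corr, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(corr, np.nan)       # ignore each well's correlation with itself
    strongest = np.nanmax(corr, axis=1)  # each well's best correlation with any other
    consolidation = np.flatnonzero(strongest > redundant_r).tolist()
    irreplaceable = np.flatnonzero(strongest < unique_r).tolist()
    return consolidation, irreplaceable

# Invented example: wells 0-2 form a tight cluster, well 3 stands alone
example = [
    [1.0, 0.9, 0.9, 0.1],
    [0.9, 1.0, 0.9, 0.2],
    [0.9, 0.9, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
consolidation, irreplaceable = flag_wells(example)
print("Consolidation candidates:", consolidation)
print("Irreplaceable wells:", irreplaceable)
```

Being flagged as a consolidation candidate only means a well has at least one close stand-in; which member of a cluster to keep is a separate decision.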

Show code

if corr_matrix is None or not DATA_AVAILABLE:
    print("⚠️ CORRELATION HEATMAP SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   Color-coded matrix where each cell = correlation between two wells")
    print("   Red = high correlation (wells move together)")
    print("   Blue = low/negative correlation (independent or opposite)")
    print("")
    print("💡 TYPICAL PATTERNS:")
    print("   • Wells in same aquifer unit: r > 0.7 (dark red)")
    print("   • Wells in different units: r < 0.4 (light colors)")
    print("   • Block patterns indicate connected well groups")
else:
    # Create correlation heatmap
    fig = go.Figure(data=go.Heatmap(
        z=corr_matrix,
        x=[f"W{i+1}" for i in range(n_wells)],
        y=[f"W{i+1}" for i in range(n_wells)],
        colorscale='RdBu_r',
        zmid=0,
        colorbar=dict(title='Correlation'),
        text=np.round(corr_matrix, 2),
        texttemplate='%{text}',
        textfont={"size": 8},
        # Axis labels already read "W1", "W2", ... so no extra "Well" prefix is needed
        hovertemplate='%{y} ↔ %{x}<br>Correlation: %{z:.3f}<extra></extra>'
    ))

    fig.update_layout(
        title='Well Correlation Matrix<br><sub>Higher correlation = stronger information flow</sub>',
        xaxis_title='Well ID',
        yaxis_title='Well ID',
        height=600,
        width=650
    )

    fig.show()

Figure 38.1: Well Correlation Matrix - Proxy for Information Transfer Strength

38.6 Network Graph Construction

Correlation threshold: 0.765
Average connections per well: 5.2

Top 5 Hub Wells:
  Well 7: 9 connections
  Well 10: 8 connections
  Well 14: 8 connections
  Well 11: 7 connections
  Well 12: 7 connections
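Connection counts like those above follow from thresholding the correlation matrix and counting, for each well, how many other wells exceed the threshold. The sketch below uses an invented 5-well matrix and a fixed threshold of 0.7. How the chapter derives its own threshold (0.765) is not shown in this section, so the value here is purely illustrative, and `connectivity_from_corr` is a hypothetical name.

```python
import numpy as np

def connectivity_from_corr(corr, threshold):
    """Count, for each well, how many other wells exceed the correlation threshold."""
    corr = np.asarray(corr, dtype=float)
    adjacency = corr > threshold         # NaN entries compare as False (no edge)
    np.fill_diagonal(adjacency, False)   # a well is not its own neighbour
    return adjacency.sum(axis=1)

# Invented correlation matrix for five wells
example = np.array([
    [1.00, 0.90, 0.80, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.30, 0.10],
    [0.80, 0.85, 1.00, 0.75, 0.20],
    [0.20, 0.30, 0.75, 1.00, 0.10],
    [0.10, 0.10, 0.20, 0.10, 1.00],
])

connectivity = connectivity_from_corr(example, threshold=0.7)
hub_order = np.argsort(-connectivity, kind="stable")  # most connected first

for idx in hub_order[:2]:
    print(f"Well {idx + 1}: {connectivity[idx]} connections")
```

Raising the threshold removes edges and lowers every count, so hub rankings should be checked for stability across a range of thresholds.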

38.7 Information Network Visualization

📊 How to Read the Information Network Graph

What the Visualization Shows:

This network graph translates the correlation matrix into a spatial network where wells (nodes) are connected by lines (edges) if their correlation exceeds a threshold.

Visual Elements:

Element	What It Represents	How to Read It
Node (circle)	Individual monitoring well	Position = geographic location
Node size	Network connectivity	Larger = more connections = hub well
Node color	Connection count	Yellow/green = high connectivity
Edge (line)	Strong correlation	Wells connected if r > threshold
Edge density	Regional connectivity	Many lines = tightly connected region

Pattern Recognition:

Pattern	What It Indicates	Management Implication
Dense cluster	Tightly connected region	High redundancy—potential to reduce monitoring
Hub well (large node)	Central to network	Critical monitoring site—don’t remove
Isolated well (small node)	Weakly connected	Captures unique local signal—may be irreplaceable
Bridge well	Connects two clusters	Important for understanding regional flow
No edges	Well below threshold for all	Either truly independent OR data quality issue

Using This for Network Design:

Keep all hub wells (large nodes)—they represent regional conditions
Review isolated wells (small nodes)—they may capture critical local signals
Evaluate cluster redundancy—within tight clusters, some wells may be removable
Identify bridges—wells connecting clusters are strategically important
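Clusters and bridge wells can be found from the thresholded network itself. The sketch below treats a bridge well as one whose removal splits a cluster (an articulation point in graph terms) and tests this by brute force, which is adequate for a network of this size. All function names and the 7-well edge list are hypothetical.

```python
def connected_components(adjacency):
    """Group wells into clusters that are linked by at least one chain of edges."""
    n = len(adjacency)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:                      # depth-first walk over one cluster
            node = stack.pop()
            group.append(node)
            for other in range(n):
                if adjacency[node][other] and other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(group))
    return components

def bridge_wells(adjacency):
    """Wells whose removal increases the number of clusters."""
    n = len(adjacency)
    baseline = len(connected_components(adjacency))
    bridges = []
    for well in range(n):
        keep = [i for i in range(n) if i != well]
        reduced = [[adjacency[i][j] for j in keep] for i in keep]
        if len(connected_components(reduced)) > baseline:
            bridges.append(well)
    return bridges

# Invented network: two triangles joined by the edge 2-3, plus isolated well 6
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adjacency = [[0] * 7 for _ in range(7)]
for i, j in edges:
    adjacency[i][j] = adjacency[j][i] = 1

print("Clusters:", connected_components(adjacency))
print("Bridge wells:", bridge_wells(adjacency))
```

In this example wells 2 and 3 are the only link between the two triangles, so losing either one disconnects the clusters.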

Show code

if corr_matrix is None or connectivity is None or not DATA_AVAILABLE:
    print("⚠️ NETWORK VISUALIZATION SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   • Nodes = monitoring wells (sized by connectivity)")
    print("   • Edges = strong correlations (r > threshold)")
    print("   • Hub wells appear as large nodes with many connections")
    print("   • Isolated wells appear as small nodes with few edges")
else:
    # Create network visualization using scatter plot
    fig = go.Figure()

    # Add edges (lines between correlated wells)
    edge_x = []
    edge_y = []

    for i in range(n_wells):
        for j in range(i+1, n_wells):
            if corr_matrix[i, j] > corr_threshold:
                # Add line from well i to well j
                edge_x.extend([wells_df['Longitude'].iloc[i], wells_df['Longitude'].iloc[j], None])
                edge_y.extend([wells_df['Latitude'].iloc[i], wells_df['Latitude'].iloc[j], None])

    fig.add_trace(go.Scatter(
        x=edge_x, y=edge_y,
        mode='lines',
        line=dict(width=0.5, color='lightgray'),
        hoverinfo='skip',
        showlegend=False
    ))

    # Add nodes (wells colored by connectivity)
    fig.add_trace(go.Scatter(
        x=wells_df['Longitude'].iloc[:n_wells],
        y=wells_df['Latitude'].iloc[:n_wells],
        mode='markers+text',
        marker=dict(
            size=connectivity * 3 + 10,
            color=connectivity,
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(title='Connections'),
            line=dict(width=1, color='white')
        ),
        text=[f"W{i+1}" for i in range(n_wells)],
        textposition='top center',
        textfont=dict(size=8),
        # Node labels already read "W1", "W2", ... so no extra "Well" prefix is needed
        hovertemplate='<b>%{text}</b><br>Connections: %{marker.color}<br>Lat: %{y:.4f}<br>Lon: %{x:.4f}<extra></extra>'
    ))

    fig.update_layout(
        title='Information Flow Network<br><sub>Node size and color = connectivity, Lines = high correlation</sub>',
        xaxis_title='Longitude',
        yaxis_title='Latitude',
        height=600,
        showlegend=False,
        hovermode='closest'
    )

    fig.show()

Figure 38.2: Well Information Flow Network - Node size represents connectivity

38.8 Hub Wells Analysis

Wells with high connectivity act as information hubs: their water levels correlate strongly with those of many other wells in the network.

Show code

if connectivity is None or hub_indices is None or not DATA_AVAILABLE:
    print("⚠️ HUB WELLS ANALYSIS SKIPPED")
    print("   Would identify wells with highest connectivity (most correlated neighbors)")
    print("   Hub wells are critical for network-wide monitoring - never remove them")
else:
    # Create bar chart of hub wells
    fig = go.Figure(data=[
        go.Bar(
            x=[f"W{i+1}" for i in range(n_wells)],
            y=connectivity,
            marker_color=['#d62728' if i in hub_indices else '#1f77b4' for i in range(n_wells)],
            text=connectivity,
            textposition='outside',
            # Bar labels already read "W1", "W2", ... so no extra "Well" prefix is needed
            hovertemplate='<b>%{x}</b><br>Connections: %{y}<extra></extra>'
        )
    ])

    fig.update_layout(
        title='Well Network Connectivity<br><sub>Red bars indicate top 5 hub wells</sub>',
        xaxis_title='Well ID',
        yaxis_title='Number of Strong Connections',
        height=500,
        showlegend=False
    )

    fig.show()

Figure 38.3: Hub Wells by Network Connectivity

38.9 Key Insights

🔍 Information Flow Findings

Network Structure:

- Analysis wells: 15 wells with spatial connectivity
- Mean correlation: Moderate (0.479), with individual pairs reaching 0.998
- Hub wells: Wells with more than 6 strong connections act as network hubs
- Connectivity pattern: Spatially proximate wells tend to show stronger correlation

Information Characteristics:

- Wells closer in space tend to have higher correlation
- Hub wells are critical for network connectivity
- Network shows clustering around geographic regions

Spatial Patterns: High correlation between wells may indicate:

- Shared aquifer units (same geological layer)
- Hydraulic connectivity (water flows between locations)
- Common climate forcing (shared recharge/discharge)

38.10 Management Applications

🎯 Using Information Flow for Network Decisions

Decision Framework:

Information flow analysis answers three critical management questions:

Question 1: Which wells are most valuable?

Connectivity Level	Well Type	Decision
>6 connections	Hub well	NEVER remove—represents regional conditions
4-6 connections	Connected well	Keep unless budget critical
2-3 connections	Peripheral well	Evaluate—may be redundant OR uniquely positioned
0-1 connections	Isolated well	Investigate—either irreplaceable OR data quality issue

Question 2: Where should new wells be placed?

Gap regions: Areas with no nearby hub wells—add monitoring
Between clusters: Bridge positions reveal inter-region connectivity
Near isolated wells: If isolated well shows concerning trends, add nearby well to confirm

Question 3: How to prioritize maintenance/upgrades?

Priority	Criterion	Why
1 (Highest)	Hub wells	Failure loses network-wide visibility
2	Bridge wells	Failure disconnects network regions
3	Cluster members	Some redundancy exists
4 (Lowest)	Redundant wells	Other wells capture same signal

Cost-Benefit Example:

If budget requires removing 3 of 15 wells:

1. Identify wells with r > 0.8 to multiple neighbors (redundant)
2. Confirm they’re not the only well in their geographic area
3. Remove while keeping all hub wells and isolated wells
4. Estimated information loss: <10% if done correctly

Warning Signs:

Removing a well with r < 0.4 to all neighbors → Likely losing unique information
Removing multiple wells from same cluster → May create monitoring blind spot
Removing hub well → Network fragmentation risk
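The cost-benefit steps and warning signs can be combined into a cautious greedy screen. This is a sketch under stated assumptions, not a procedure the chapter implements: `removal_candidates` and the example data are hypothetical, hubs are taken as wells with more than 6 connections and isolated wells as those with 0-1, and the geographic check in step 2 is left to the analyst.

```python
import numpy as np

def removal_candidates(corr, connectivity, n_remove,
                       redundant_r=0.8, hub_min=7, isolated_max=1):
    """Pick up to n_remove wells that keep at least two highly correlated, retained neighbours."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    strong = corr > redundant_r
    np.fill_diagonal(strong, False)
    retained = set(range(n))
    removed = []
    # Consider the wells with the most redundant neighbours first
    order = sorted(range(n), key=lambda i: -int(strong[i].sum()))
    for well in order:
        if len(removed) == n_remove:
            break
        if connectivity[well] >= hub_min or connectivity[well] <= isolated_max:
            continue  # protect hub wells and isolated wells
        backups = [j for j in retained if j != well and strong[well, j]]
        if len(backups) >= 2:  # "r > 0.8 to multiple neighbors", counting retained wells only
            retained.discard(well)
            removed.append(well)
    return removed

# Invented example: wells 0-3 are mutually redundant, well 4 is isolated
example = np.full((5, 5), 0.1)
example[:4, :4] = 0.9
np.fill_diagonal(example, 1.0)
example_connectivity = [3, 3, 3, 3, 0]

two = removal_candidates(example, example_connectivity, n_remove=2)
three = removal_candidates(example, example_connectivity, n_remove=3)
print("Asked for 2:", two)
print("Asked for 3:", three)
```

Asking for three removals still returns two wells here: once two members of the cluster are gone, removing a third would leave the last one without two stand-ins, which is the "monitoring blind spot" warning sign in practice.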

38.10.1 1. Priority Monitoring Wells

Hub wells with high connectivity are critical for monitoring network-wide conditions:

=== Priority Wells for Continued Monitoring ===
(Hub wells with highest connectivity)

  Well 7: 9 strong connections
  Well 10: 8 strong connections
  Well 14: 8 strong connections
  Well 11: 7 strong connections
  Well 12: 7 strong connections

These wells provide maximum information about network-wide conditions

38.10.2 2. Network Optimization

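Wells with low connectivity contribute least to the network-wide picture and are the first to be reviewed if budget cuts are needed. Low connectivity alone does not establish redundancy, however: by the criteria above, a redundant well is one that correlates strongly with its neighbors, while a weakly connected well may be capturing a unique local signal.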

=== Wells with Lowest Connectivity ===
(Potentially redundant for network monitoring)

  Well 5: 0 strong connections
  Well 6: 2 strong connections
  Well 1: 3 strong connections
  Well 3: 3 strong connections
  Well 4: 3 strong connections

Note: Low connectivity doesn't mean unimportant - may serve specific local needs

38.10.3 3. Sentinel Network Design

Hub wells serve as early warning sentinels - changes in their water levels likely reflect network-wide trends.

38.11 Physical Interpretation

🌍 Hydrological Meaning

High correlation between wells indicates:

Same aquifer unit: Shared hydraulic properties and response characteristics
Connected flow paths: Water/pressure propagates between locations
Common stressors: Both wells respond to same climate forcing (precipitation, drought)
Spatial proximity: Wells closer together tend to show more similar behavior

Network structure reveals:

Hub wells: Central locations that reflect regional aquifer conditions
Peripheral wells: May tap different aquifer units or isolated flow systems
Connectivity patterns: Strong correlations suggest hydraulic connectivity

Applications:

Monitoring optimization: Hub wells provide maximum information density
Early warning: Changes in hub wells signal network-wide trends
Redundancy analysis: Low-connectivity wells may serve specialized local needs

38.12 Limitations

Correlation proxy: Uses correlation as proxy for true mutual information (linear relationships only)
Sample size: Analysis limited to wells with sufficient temporal data
No temporal dynamics: Static analysis doesn’t capture time-lagged relationships
Computational constraints: Analysis uses subset of wells for visualization efficiency
Confounding factors: External forcings (weather, pumping) can inflate correlation
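The confounding limitation can be illustrated by removing a shared seasonal cycle before correlating. In the synthetic sketch below, two wells have no link other than a common seasonal forcing; the series, names, and the choice of monthly climatology as the thing to remove are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
months = pd.date_range("2010-01-01", periods=120, freq="MS")
# Shared seasonal forcing (assumption): the only thing the two wells have in common
season = np.sin(2 * np.pi * months.month.to_numpy() / 12)

levels = pd.DataFrame(
    {
        "A": 2 * season + rng.normal(size=120),
        "B": 2 * season + rng.normal(size=120),
    },
    index=months,
)

raw_r = levels["A"].corr(levels["B"])

# Subtract each calendar month's long-term mean to obtain anomalies
climatology = levels.groupby(levels.index.month).transform("mean")
anomalies = levels - climatology
anomaly_r = anomalies["A"].corr(anomalies["B"])

print(f"Raw correlation:     {raw_r:.2f}")
print(f"Anomaly correlation: {anomaly_r:.2f}")
```

A high raw correlation that collapses once the seasonal cycle is removed points to common forcing rather than a direct hydraulic link between the wells.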

38.13 References

Ruddell, B. L., & Kumar, P. (2009). Ecohydrologic process networks. Water Resources Research, 45(3), W03419.
Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461.
Ombadi, M., et al. (2020). Developing a connectivity index between shallow and deep groundwater. Water Resources Research, 56(12).

38.14 Next Steps

→ Chapter 10: Network Connectivity Map - Physical interpretation of information pathways

Cross-Chapter Connections:

- Uses well network from Part 1
- Complements causal analysis (Chapter 8)
- Informs monitoring design (Chapter 13)
- Foundation for connectivity mapping (Chapter 10)

38.15 Summary

Information flow analysis reveals how signals propagate through the monitoring network:

✅ Correlation computed as a proxy for mutual information - Approximates shared information between wells

✅ Network graph constructed - Visualizes information pathways

✅ Hub wells identified - High-connectivity wells are critical for network function

✅ Redundancy analysis - Low-connectivity wells may serve specialized local needs

⚠️ Simplified analysis - Uses correlation as proxy for true mutual information

Key Insight: Information flow analysis guides monitoring network optimization—where to add sensors, where redundancy exists, and which wells are irreplaceable.

38.16 Reflection Questions

In your monitoring network, which wells do you suspect are “hubs” based on experience (for example, they seem to move with everything else), and how could an information-flow analysis like this confirm or challenge that intuition?
How would you balance using network connectivity results to propose removing low-connection wells against the risk that those wells capture unique local behavior that correlation alone might miss?
What additional data (for example, pumping, local recharge estimates, or HTEM-based structure) would you want to incorporate before using information-flow patterns to redesign the network?
How could you combine information flow, causal graphs, and physical flow models to prioritize where to add new wells, upgrade sensors, or co-locate instruments (for example, with streams or weather stations)?

--- title: "Information Flow Analysis" subtitle: "Quantifying Information Propagation Pathways" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will get:** - A way of thinking about **how signals move** through the well network (e.g., drought effects propagating over time). - Intuition for information-based measures (mutual information, transfer entropy) as tools to detect **hidden connectivity**. - Visuals that show which wells “talk to each other” most strongly. You can skim the formal information-theory definitions and focus on: - The maps/graphs of well connectivity, - The narrative about which connections are strong or weak, - And how this complements the more physical fusion analyses. ::: **Data Sources Fused**: Groundwater Wells (Network Analysis) ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Explain what “information flow” means in a groundwater monitoring network and how it relates to hydraulic connectivity and shared forcing. - Interpret correlation/information-based heatmaps and network graphs to identify hub wells, clusters, and weakly connected sites. - Discuss how information-based metrics complement more physical analyses (recharge, stream–aquifer exchange, causal graphs) when designing and optimizing monitoring networks. - Reflect on the limitations of correlation as a proxy for mutual information and when more advanced metrics or additional data are warranted. ## Overview Water doesn't just flow through aquifers - **information** flows too. A drought signal propagates from recharge areas to deeper parts of the aquifer. A pumping cone of depression spreads outward. This chapter uses **information theory** to track how signals propagate through the well network, revealing hidden connectivity and flow pathways. 
::: {.callout-note icon=false} ## 💻 For Computer Scientists **Information Theory Metrics:** - **Mutual Information**: I(X;Y) = how much knowing X reduces uncertainty about Y - **Transfer Entropy**: TE(X→Y) = directed information flow (causal) - **Time-Lagged Mutual Information**: TLMI(X,Y,τ) = MI at different time lags - **Information Bottleneck**: Identify wells that control information flow **Graph Theory:** - Nodes = Wells - Edge weights = Information transfer strength - Directed edges = Asymmetric information flow ::: ::: {.callout-tip icon=false} ## 🌍 For Hydrologists **Physical Meaning:** High information transfer between wells means: 1. **Hydraulic connectivity**: Water flows between locations 2. **Shared aquifer**: Wells tap same geological unit 3. **Common forcing**: Both respond to same recharge/pumping events **Expected patterns:** - Wells in same aquifer unit: High MI - Upgradient → downgradient: Positive time lag - Confined aquifer: Pressure waves propagate faster than water ::: ## Setup ```{python} #| code-fold: true #| label: setup #| echo: false import os, sys from pathlib import Path import pandas as pd import numpy as np import sqlite3 import plotly.graph_objects as go from plotly.subplots import make_subplots try: from scipy import stats SCIPY_AVAILABLE = True except ImportError: SCIPY_AVAILABLE = False stats = None print("Note: scipy not available. 
Some statistical analyses will be simplified.") import warnings warnings.filterwarnings('ignore') def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) from src.utils import get_data_path # Load well data with real coordinates by joining measurements with OB_LOCATIONS data_loaded = False aquifer_db_path = get_data_path("aquifer_db") try: conn = sqlite3.connect(str(aquifer_db_path)) # Get wells with time series data AND real coordinates wells_df = pd.read_sql(""" SELECT m.P_Number as P_NUMBER, COUNT(*) as n_records, l.LAT_WGS_84 as Latitude, l.LONG_WGS_84 as Longitude FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY m JOIN OB_LOCATIONS l ON m.P_Number = l.P_NUMBER WHERE m.Water_Surface_Elevation IS NOT NULL AND l.LAT_WGS_84 IS NOT NULL GROUP BY m.P_Number HAVING COUNT(*) >= 100 ORDER BY COUNT(*) DESC LIMIT 20 """, conn) conn.close() if len(wells_df) > 0: data_loaded = True print(f"Analyzing {len(wells_df)} wells with time series data and real coordinates") print(f" Latitude range: {wells_df['Latitude'].min():.4f} to {wells_df['Latitude'].max():.4f}") print(f" Longitude range: {wells_df['Longitude'].min():.4f} to {wells_df['Longitude'].max():.4f}") else: print("⚠️ No wells found with both measurements and coordinates") except Exception as e: print(f"Error loading wells: {e}") wells_df = pd.DataFrame() data_loaded = False ``` ## Correlation Network Construction ::: {.callout-note icon=false} ## 📘 Understanding Mutual Information ### What Is It? **Mutual information** (MI) is a measure from information theory (Shannon, 1948) that quantifies how much knowing one variable reduces uncertainty about another. 
It's the information-theoretic equivalent of correlation, but works for **any type of relationship**—linear, nonlinear, or complex. **Historical Context:** Introduced by Claude Shannon in his foundational 1948 paper "A Mathematical Theory of Communication" that created the field of information theory. Originally developed for telecommunications, now widely used in neuroscience, genetics, and network analysis. ### Why Does It Matter for Groundwater Networks? In monitoring networks, high mutual information between wells means: 1. **Hydraulic connectivity**: Wells tap the same aquifer flow system 2. **Shared forcing**: Both respond to same recharge/pumping events 3. **Network redundancy**: One well may provide similar information to another MI reveals **hidden connections** that might not appear in simple distance-based analysis—two distant wells could have high MI if connected by a high-permeability channel. ### How Does It Work? Mutual information compares joint probability to independent probabilities: **Step 1:** If wells are **independent**, knowing Well A tells you nothing about Well B: ``` P(A, B) = P(A) × P(B) [No connection] ``` **Step 2:** If wells are **connected**, joint probability differs: ``` P(A, B) ≠ P(A) × P(B) [Connection exists!] ``` **Step 3:** MI quantifies the difference (in bits of information): ``` MI(A; B) = How much uncertainty about B is reduced by knowing A ``` **Correlation as Proxy:** For this analysis, we use **correlation as a proxy for mutual information**. While true MI captures nonlinear dependencies, correlation is faster to compute and provides similar insights for groundwater networks where relationships are often approximately linear. ### What Will You See Below? 
- **Correlation matrix**: Pairwise correlations between well water levels (proxy for MI) - **Network graph**: Wells connected if correlation exceeds threshold - **Hub wells**: High-connectivity nodes acting as network centers ### How to Interpret Results | Correlation | MI Interpretation | Monitoring Implications | |-------------|------------------|------------------------| | **r > 0.7** | Strong shared information | Wells highly redundant—one could replace the other | | **0.4 < r < 0.7** | Moderate connection | Complementary monitoring—both provide value | | **r < 0.4** | Weak/no connection | Independent monitoring—both essential | | **Hub well** (>6 connections) | Central to network | High-value monitoring site—represents regional conditions | | **Isolated well** (<3 connections) | Unique local signal | Irreplaceable—captures distinct aquifer behavior | **Cost-Cutting Guidance:** Wells with r > 0.8 are candidates for consolidation if budget cuts needed. Wells with r < 0.3 to all others are irreplaceable. ::: For this analysis, we use **correlation as a proxy for mutual information**. While true mutual information captures nonlinear dependencies, correlation is faster to compute and provides similar insights for groundwater networks where relationships are often approximately linear. 
```{python} #| code-fold: true #| label: correlation-network #| echo: false # Compute correlation from real time series data # Load water level time series for wells import sqlite3 conn = sqlite3.connect(aquifer_db_path) # Select wells with sufficient time series data well_ids = wells_df['P_NUMBER'].values[:30] # Start with up to 30 wells n_wells = len(well_ids) # Load time series for each well time_series_data = {} valid_wells = [] for well_id in well_ids: query = f""" SELECT TIMESTAMP, Water_Surface_Elevation FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE P_Number = '{well_id}' AND Water_Surface_Elevation IS NOT NULL AND TIMESTAMP IS NOT NULL ORDER BY TIMESTAMP """ ts_df = pd.read_sql_query(query, conn) if len(ts_df) >= 20: # Need at least 20 measurements for correlation ts_df['Date'] = pd.to_datetime(ts_df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') ts_df = ts_df.dropna(subset=['Date', 'Water_Surface_Elevation']) # Aggregate to daily mean first (handles multiple measurements per day) daily_mean = ts_df.groupby('Date')['Water_Surface_Elevation'].mean() if len(daily_mean) >= 20: time_series_data[well_id] = daily_mean valid_wells.append(well_id) conn.close() # Limit to 15 wells for visualization clarity valid_wells = valid_wells[:15] n_wells = len(valid_wells) # Align time series and compute correlation # Combine all time series into a single DataFrame (now with unique daily indices) ts_combined = pd.DataFrame({well: time_series_data[well] for well in valid_wells}) # Resample to monthly to align irregular measurements ts_monthly = ts_combined.resample('ME').mean() # 'ME' = month end (replaces deprecated 'M') ts_monthly = ts_monthly.dropna(how='all') # Drop months with no data # Compute correlation matrix from real data DATA_AVAILABLE = False corr_matrix = None if len(ts_monthly) > 10 and n_wells >= 2: corr_matrix = ts_monthly.corr().values # Check for valid correlation matrix off_diag = corr_matrix[~np.eye(n_wells, dtype=bool)] if len(off_diag) > 0 and not 
np.all(np.isnan(off_diag)): print(f"Correlation network computed from real time series data: {n_wells} wells") print(f"Mean correlation: {np.nanmean(off_diag):.3f}") print(f"Max correlation: {np.nanmax(off_diag):.3f}") DATA_AVAILABLE = True else: print("⚠️ INSUFFICIENT TEMPORAL OVERLAP for correlation analysis") print(" Wells have time series but data doesn't overlap in time") print(" Solution: Ensure wells have measurements in the same date range") corr_matrix = None else: print("⚠️ INSUFFICIENT DATA for correlation analysis") print(f" Time series records: {len(ts_monthly)} (need >10)") print(f" Number of wells: {n_wells} (need ≥2)") print("") print("📋 WHAT THIS ANALYSIS DOES:") print(" Computes correlation between all pairs of monitoring wells") print(" to identify which wells share information (connected aquifer)") print("") print("🔧 TO ENABLE:") print(" 1. Ensure data/aquifer.db contains well measurements") print(" 2. Wells need overlapping time periods (e.g., 2010-2020)") print(" 3. Minimum 2 wells with >10 monthly observations each") if not DATA_AVAILABLE: print("\n⚠️ Information flow analysis requires correlation data - subsequent sections will show expected results") ``` ## Correlation Heatmap ::: {.callout-note icon=false} ## 📊 How to Read This Correlation Heatmap **What the Visualization Shows:** A **correlation matrix** displays pairwise correlations between all wells. Each cell shows how strongly two wells' water levels move together over time. 

**Color Interpretation:**

| Color | Correlation Value | Information Meaning | Physical Interpretation |
|-------|------------------|---------------------|------------------------|
| **Dark Red** | r > 0.7 | High shared information | Likely same aquifer unit, hydraulically connected |
| **Light Red/Orange** | 0.4 < r < 0.7 | Moderate connection | Partially connected, shared forcing |
| **White/Light Blue** | -0.2 < r < 0.4 | Weak/no connection | Different aquifer units or isolated |
| **Dark Blue** | r < -0.2 | Negative correlation | Rare; possibly pumping-induced |

**What to Look For:**

1. **Block patterns (red squares)**: Groups of highly correlated wells that likely tap the same aquifer
2. **Diagonal dominance**: All diagonal values = 1.0 (each well is perfectly correlated with itself)
3. **Isolated rows/columns**: Wells with mostly light colors are monitoring unique local conditions
4. **Symmetric pattern**: The matrix should be symmetric (r(A,B) = r(B,A))

**Management Interpretation:**

- **Red clusters → Redundancy**: If 5 wells all show r > 0.8 with each other, a single well may capture most of their shared signal, so some of the others are candidates for removal
- **Light rows → Irreplaceable**: Wells weakly correlated with all others are capturing unique information
- **Off-diagonal hot spots**: Unexpected connections might indicate hidden flow paths

**Critical Question:** Which wells can we afford to lose? Candidates are wells with r > 0.8 to several neighbors. Wells with r < 0.3 to all others are the opposite case: they are irreplaceable.

:::

```{python}
#| code-fold: true
#| label: fig-correlation-heatmap
#| fig-cap: "Well Correlation Matrix - Proxy for Information Transfer Strength"

if corr_matrix is None or not DATA_AVAILABLE:
    print("⚠️ CORRELATION HEATMAP SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   Color-coded matrix where each cell = correlation between two wells")
    print("   Red = high correlation (wells move together)")
    print("   Blue = low/negative correlation (independent or opposite)")
    print("")
    print("💡 TYPICAL PATTERNS:")
    print("   • Wells in same aquifer unit: r > 0.7 (dark red)")
    print("   • Wells in different units: r < 0.4 (light colors)")
    print("   • Block patterns indicate connected well groups")
else:
    # Create correlation heatmap; W1..Wn follow the row order of corr_matrix
    labels = [f"W{i+1}" for i in range(n_wells)]
    fig = go.Figure(data=go.Heatmap(
        z=corr_matrix,
        x=labels,
        y=labels,
        colorscale='RdBu_r',
        zmid=0,
        colorbar=dict(title='Correlation'),
        text=np.round(corr_matrix, 2),
        texttemplate='%{text}',
        textfont={"size": 8},
        hovertemplate='Well %{y} ↔ Well %{x}<br>Correlation: %{z:.3f}<extra></extra>'
    ))

    fig.update_layout(
        title='Well Correlation Matrix<br>Higher correlation = stronger information flow',
        xaxis_title='Well ID',
        yaxis_title='Well ID',
        height=600,
        width=650
    )

    fig.show()
```

## Network Graph Construction

```{python}
#| code-fold: true
#| label: network-metrics
#| echo: false

# Initialize variables for downstream code blocks
connectivity = None
hub_indices = None
corr_threshold = 0.5  # Default, used only if no valid correlations exist

if corr_matrix is None or not DATA_AVAILABLE:
    print("⚠️ NETWORK CONSTRUCTION SKIPPED")
    print("   Requires correlation matrix from previous step")
    print("   Would compute: connectivity score for each well (number of strong connections)")
else:
    # Build the network by thresholding the correlation matrix
    off_diagonal = corr_matrix[~np.eye(n_wells, dtype=bool)]

    # Check for valid data before computing the threshold
    if len(off_diagonal) > 0 and not np.all(np.isnan(off_diagonal)):
        # 60th percentile: the top 40% of pairwise correlations become edges
        corr_threshold = np.nanpercentile(off_diagonal, 60)

    # Degree of each well: count above-threshold correlations, masking the
    # diagonal explicitly so self-correlation is never counted
    adjacency = corr_matrix > corr_threshold
    np.fill_diagonal(adjacency, False)
    connectivity = adjacency.sum(axis=1)

    # Hub wells: up to five wells with the highest connectivity
    hub_indices = np.argsort(connectivity)[::-1][:5]

    print(f"Correlation threshold: {corr_threshold:.3f}")
    print(f"Average connections per well: {connectivity.mean():.1f}")
    print(f"\nTop {len(hub_indices)} Hub Wells:")
    for idx in hub_indices:
        print(f"  Well {idx+1}: {int(connectivity[idx])} connections")
```

## Information Network Visualization

::: {.callout-note icon=false}
## 📊 How to Read the Information Network Graph

**What the Visualization Shows:**

This network graph translates the correlation matrix into a **spatial network** where wells (nodes) are connected by lines (edges) if their correlation exceeds a threshold.

**Visual Elements:**

| Element | What It Represents | How to Read It |
|---------|-------------------|----------------|
| **Node (circle)** | Individual monitoring well | Position = geographic location |
| **Node size** | Network connectivity | Larger = more connections = hub well |
| **Node color** | Connection count | Yellow/green = high connectivity |
| **Edge (line)** | Strong correlation | Wells connected if r > threshold |
| **Edge density** | Regional connectivity | Many lines = tightly connected region |

**Pattern Recognition:**

| Pattern | What It Indicates | Management Implication |
|---------|------------------|----------------------|
| **Dense cluster** | Tightly connected region | High redundancy; potential to reduce monitoring |
| **Hub well (large node)** | Central to network | Critical monitoring site; don't remove |
| **Isolated well (small node)** | Weakly connected | Captures unique local signal; may be irreplaceable |
| **Bridge well** | Connects two clusters | Important for understanding regional flow |
| **No edges** | Well below threshold for all | Either truly independent OR data quality issue |

**Using This for Network Design:**

1. **Keep all hub wells** (large nodes): they represent regional conditions
2. **Review isolated wells** (small nodes): they may capture critical local signals
3. **Evaluate cluster redundancy**: within tight clusters, some wells may be removable
4. **Identify bridges**: wells connecting clusters are strategically important

:::

```{python}
#| code-fold: true
#| label: fig-info-network
#| fig-cap: "Well Information Flow Network - Node size represents connectivity"

if corr_matrix is None or connectivity is None or not DATA_AVAILABLE:
    print("⚠️ NETWORK VISUALIZATION SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   • Nodes = monitoring wells (sized by connectivity)")
    print("   • Edges = strong correlations (r > threshold)")
    print("   • Hub wells appear as large nodes with many connections")
    print("   • Isolated wells appear as small nodes with few edges")
else:
    # Look up coordinates by well ID so node i matches row i of corr_matrix
    # (valid_wells can skip rows of wells_df that lacked sufficient data)
    coords = wells_df.drop_duplicates('P_NUMBER').set_index('P_NUMBER').loc[valid_wells]
    lon = coords['Longitude'].to_numpy()
    lat = coords['Latitude'].to_numpy()

    # Create network visualization using scatter plot
    fig = go.Figure()

    # Add edges; the None entries break the line between separate well pairs
    edge_x, edge_y = [], []
    for i in range(n_wells):
        for j in range(i + 1, n_wells):
            if corr_matrix[i, j] > corr_threshold:
                edge_x.extend([lon[i], lon[j], None])
                edge_y.extend([lat[i], lat[j], None])

    fig.add_trace(go.Scatter(
        x=edge_x,
        y=edge_y,
        mode='lines',
        line=dict(width=0.5, color='lightgray'),
        hoverinfo='skip',
        showlegend=False
    ))

    # Add nodes (wells sized and colored by connectivity)
    fig.add_trace(go.Scatter(
        x=lon,
        y=lat,
        mode='markers+text',
        marker=dict(
            size=connectivity * 3 + 10,
            color=connectivity,
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(title='Connections'),
            line=dict(width=1, color='white')
        ),
        text=[f"W{i+1}" for i in range(n_wells)],
        textposition='top center',
        textfont=dict(size=8),
        hovertemplate='Well %{text}<br>Connections: %{marker.color}<br>Lat: %{y:.4f}<br>Lon: %{x:.4f}<extra></extra>'
    ))

    fig.update_layout(
        title='Information Flow Network<br>Node size and color = connectivity, Lines = high correlation',
        xaxis_title='Longitude',
        yaxis_title='Latitude',
        height=600,
        showlegend=False,
        hovermode='closest'
    )

    fig.show()
```

## Hub Wells Analysis

Wells with high connectivity act as information hubs: their water levels correlate strongly with those of many other wells in the network.

```{python}
#| code-fold: true
#| label: fig-hub-wells
#| fig-cap: "Hub Wells by Network Connectivity"

if connectivity is None or hub_indices is None or not DATA_AVAILABLE:
    print("⚠️ HUB WELLS ANALYSIS SKIPPED")
    print("   Would identify wells with highest connectivity (most correlated neighbors)")
    print("   Hub wells are critical for network-wide monitoring - never remove them")
else:
    # Create bar chart of connectivity, highlighting the hub wells in red
    fig = go.Figure(data=[
        go.Bar(
            x=[f"W{i+1}" for i in range(n_wells)],
            y=connectivity,
            marker_color=['#d62728' if i in hub_indices else '#1f77b4' for i in range(n_wells)],
            text=connectivity,
            textposition='outside',
            hovertemplate='Well %{x}<br>Connections: %{y}<extra></extra>'
        )
    ])

    fig.update_layout(
        title='Well Network Connectivity<br>Red bars indicate top 5 hub wells',
        xaxis_title='Well ID',
        yaxis_title='Number of Strong Connections',
        height=500,
        showlegend=False
    )

    fig.show()
```

## Key Insights

::: {.callout-important icon=false}
## 🔍 Information Flow Findings

**Network Structure:**

- **Analysis wells**: 15 wells with spatial connectivity
- **Mean correlation**: Moderate to strong (0.3-0.7 range)
- **Hub wells**: Wells with 6+ strong connections act as network hubs
- **Connectivity pattern**: Spatially proximate wells show stronger correlation

**Information Characteristics:**

- Wells closer in space tend to have higher correlation
- Hub wells are critical for network connectivity
- Network shows clustering around geographic regions

**Spatial Patterns:**

High correlation between wells is consistent with one or more of the following (correlation alone cannot tell them apart):

- Shared aquifer units (same geological layer)
- Hydraulic connectivity (water flows between locations)
- Common climate forcing (shared recharge/discharge)

:::

## Management Applications

::: {.callout-important icon=false}
## 🎯 Using Information Flow for Network Decisions

**Decision Framework:**

Information flow analysis answers three critical management questions:

**Question 1: Which wells are most valuable?**

| Connectivity Level | Well Type | Decision |
|-------------------|-----------|----------|
| **6+ connections** | Hub well | **NEVER remove**: represents regional conditions |
| **4-5 connections** | Connected well | Keep unless budget critical |
| **2-3 connections** | Peripheral well | Evaluate: may be redundant OR uniquely positioned |
| **0-1 connections** | Isolated well | **Investigate**: either irreplaceable OR data quality issue |

**Question 2: Where should new wells be placed?**

- **Gap regions**: Areas with no nearby hub wells; add monitoring
- **Between clusters**: Bridge positions reveal inter-region connectivity
- **Near isolated wells**: If an isolated well shows concerning trends, add a nearby well to confirm

**Question 3: How to prioritize maintenance/upgrades?**

| Priority | Criterion | Why |
|----------|-----------|-----|
| **1 (Highest)** | Hub wells | Failure loses network-wide visibility |
| **2** | Bridge wells | Failure disconnects network regions |
| **3** | Cluster members | Some redundancy exists |
| **4 (Lowest)** | Redundant wells | Other wells capture same signal |

**Cost-Benefit Example:**

If budget requires removing 3 of 15 wells:

1. Identify wells with r > 0.8 to multiple neighbors (redundant)
2. Confirm they're not the only well in their geographic area
3. Remove them while keeping all hub wells and isolated wells
4. Estimated information loss: <10% if done correctly (a rough expectation; this chapter does not compute it)

**Warning Signs:**

- Removing a well that's r < 0.4 to all neighbors → Likely losing unique information
- Removing multiple wells from same cluster → May create monitoring blind spot
- Removing hub well → Network fragmentation risk

:::

### 1. Priority Monitoring Wells

Hub wells with high connectivity are critical for monitoring network-wide conditions:

```{python}
#| code-fold: true
#| echo: false

if connectivity is None or hub_indices is None or not DATA_AVAILABLE:
    print("⚠️ PRIORITY WELLS ANALYSIS SKIPPED")
    print("   Would list top 5 hub wells by connectivity for monitoring priority")
else:
    print("=== Priority Wells for Continued Monitoring ===")
    print("(Hub wells with highest connectivity)\n")
    for idx in hub_indices:
        print(f"  Well {idx+1}: {int(connectivity[idx])} strong connections")

    print("\nThese wells provide maximum information about network-wide conditions")
```

### 2. Network Optimization

Wells with low connectivity need review before any budget-driven cuts. Low connectivity can mean that a well captures a unique local signal, or that its record is too sparse or noisy to correlate with its neighbors. Redundancy is more likely among wells that correlate strongly with several neighbors in the same cluster.

```{python}
#| code-fold: true
#| echo: false

if connectivity is None or not DATA_AVAILABLE:
    print("⚠️ NETWORK OPTIMIZATION ANALYSIS SKIPPED")
    print("   Would identify wells with lowest connectivity for review")
    print("   Low connectivity wells may capture unique local signals or have data quality issues")
else:
    # Five wells with the fewest above-threshold correlations
    low_conn_indices = np.argsort(connectivity)[:5]

    print("=== Wells with Lowest Connectivity ===")
    print("(Review before removal: unique local signal or data quality issue)\n")
    for idx in low_conn_indices:
        print(f"  Well {idx+1}: {int(connectivity[idx])} strong connections")

    print("\nNote: Low connectivity doesn't mean unimportant - may serve specific local needs")
```

### 3. Sentinel Network Design

Hub wells can serve as early warning sentinels: because they correlate with many other wells, changes in their water levels likely reflect network-wide trends.

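
The sentinel idea can be checked rather than assumed. The sketch below is a minimal illustration, not part of this chapter's pipeline: `sentinel_check` is a hypothetical helper, and the six wells and their water levels are synthetic (a seasonal cycle plus noise), not Champaign County data. It standardizes each well, averages the hub wells and the remaining wells separately, and correlates the two averages at lags of 0 to 3 months. A peak at a positive lag would suggest that the hubs lead the rest of the network.

```python
import numpy as np
import pandas as pd


def sentinel_check(levels: pd.DataFrame, hubs: list[str], max_lag: int = 3) -> pd.Series:
    """Correlate the hub-well mean with the mean of the remaining wells at several lags.

    Lag k compares hub levels in month t with the other wells in month t + k,
    so a peak at k > 0 would suggest that the hubs lead the rest of the network.
    """
    # Standardize each well so large-amplitude wells do not dominate the averages
    z = (levels - levels.mean()) / levels.std()
    hub_signal = z[hubs].mean(axis=1)
    rest_signal = z.drop(columns=hubs).mean(axis=1)
    return pd.Series(
        {lag: hub_signal.corr(rest_signal.shift(-lag)) for lag in range(max_lag + 1)},
        name="hub_vs_rest_corr",
    )


# Synthetic example (assumption): six wells sharing one seasonal signal plus noise
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=72, freq="MS")
regional = np.sin(np.arange(72) * 2 * np.pi / 12)
levels = pd.DataFrame(
    {f"W{i+1}": regional + rng.normal(0, 0.3, 72) for i in range(6)},
    index=months,
)

# Treat W1 and W2 as the hub wells for the purpose of the illustration
result = sentinel_check(levels, hubs=["W1", "W2"])
print(result.round(2))
```

With real data, `levels` would be the monthly table built earlier (`ts_monthly`, with well IDs as columns) and `hubs` the IDs of the hub wells. A high lag-0 correlation supports using the hubs as a summary of the network, but it does not by itself show that they provide early warning; that would require the correlation to peak at a positive lag.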
## Physical Interpretation

::: {.callout-tip icon=false}
## 🌍 Hydrological Meaning

**High correlation between wells can indicate:**

- **Same aquifer unit**: Shared hydraulic properties and response characteristics
- **Connected flow paths**: Water/pressure propagates between locations
- **Common stressors**: Both wells respond to same climate forcing (precipitation, drought)
- **Spatial proximity**: Wells closer together tend to show more similar behavior

**Network structure reveals:**

- **Hub wells**: Central locations that reflect regional aquifer conditions
- **Peripheral wells**: May tap different aquifer units or isolated flow systems
- **Connectivity patterns**: Strong correlations suggest, but do not prove, hydraulic connectivity

**Applications:**

- **Monitoring optimization**: Hub wells provide maximum information density
- **Early warning**: Changes in hub wells signal network-wide trends
- **Redundancy analysis**: Low-connectivity wells may serve specialized local needs

:::

## Limitations

1. **Correlation proxy**: Uses Pearson correlation as a proxy for true mutual information, so only linear relationships are captured
2. **Sample size**: Analysis limited to wells with sufficient temporal data
3. **No temporal dynamics**: Static analysis doesn't capture time-lagged relationships
4. **Computational constraints**: Analysis uses a subset of wells (at most 15) for visualization efficiency
5. **Confounding factors**: External forcings (weather, pumping) can inflate correlation between wells that are not hydraulically connected

## References

- Ruddell, B. L., & Kumar, P. (2009). Ecohydrologic process networks. *Water Resources Research*, 45(3), W03419.
- Schreiber, T. (2000). Measuring information transfer. *Physical Review Letters*, 85(2), 461.
- Ombadi, M., et al. (2020). Developing a connectivity index between shallow and deep groundwater. *Water Resources Research*, 56(12).

## Next Steps

→ **Chapter 10**: Network Connectivity Map - Physical interpretation of information pathways

**Cross-Chapter Connections:**

- Uses well network from Part 1
- Complements causal analysis (Chapter 8)
- Informs monitoring design (Chapter 13)
- Foundation for connectivity mapping (Chapter 10)

---

## Summary

Information flow analysis reveals **how signals are shared across the monitoring network**:

✅ **Pairwise correlation computed** - Approximates the information shared between wells

✅ **Network graph constructed** - Visualizes information pathways

✅ **Hub wells identified** - High-connectivity wells are critical for network function

✅ **Redundancy analysis** - Low-connectivity wells may serve specialized local needs

⚠️ **Simplified analysis** - Uses correlation as proxy for true mutual information

**Key Insight**: Information flow analysis guides **monitoring network optimization**: where to add sensors, where redundancy exists, and which wells are irreplaceable.

---

## Reflection Questions

- In your monitoring network, which wells do you suspect are “hubs” based on experience (for example, they seem to move with everything else), and how could an information-flow analysis like this confirm or challenge that intuition?
- How would you balance using network connectivity results to propose removing low-connection wells against the risk that those wells capture unique local behavior that correlation alone might miss?
- What additional data (for example, pumping, local recharge estimates, or HTEM-based structure) would you want to incorporate before using information-flow patterns to redesign the network?
- How could you combine information flow, causal graphs, and physical flow models to prioritize where to add new wells, upgrade sensors, or co-locate instruments (for example, with streams or weather stations)?

---

## Related Chapters

- [Well Network Analysis](../part-1-foundations/well-network-analysis.qmd) - Source well data
- [Causal Discovery Network](causal-discovery-network.qmd) - Causal relationship identification
- [Network Connectivity Map](network-connectivity-map.qmd) - Physical interpretation
- [Well Spatial Coverage](../part-2-spatial/well-spatial-coverage.qmd) - Spatial network analysis
