38  Information Flow Analysis

Quantifying Information Propagation Pathways

Tip: For Newcomers

You will get:

  • A way of thinking about how signals move through the well network (e.g., drought effects propagating over time).
  • Intuition for information-based measures (mutual information, transfer entropy) as tools to detect hidden connectivity.
  • Visuals that show which wells “talk to each other” most strongly.

You can skim the formal information-theory definitions and focus on:

  • The maps/graphs of well connectivity,
  • The narrative about which connections are strong or weak,
  • How this complements the more physical fusion analyses.

Data Sources Fused: Groundwater Wells (Network Analysis)

38.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

  • Explain what “information flow” means in a groundwater monitoring network and how it relates to hydraulic connectivity and shared forcing.
  • Interpret correlation/information-based heatmaps and network graphs to identify hub wells, clusters, and weakly connected sites.
  • Discuss how information-based metrics complement more physical analyses (recharge, stream–aquifer exchange, causal graphs) when designing and optimizing monitoring networks.
  • Reflect on the limitations of correlation as a proxy for mutual information and when more advanced metrics or additional data are warranted.

38.2 Overview

Water doesn’t just flow through aquifers - information flows too. A drought signal propagates from recharge areas to deeper parts of the aquifer. A pumping cone of depression spreads outward. This chapter uses information theory to track how signals propagate through the well network, revealing hidden connectivity and flow pathways.

Note: 💻 For Computer Scientists

Information Theory Metrics:

  • Mutual Information: I(X;Y) = how much knowing X reduces uncertainty about Y
  • Transfer Entropy: TE(X→Y) = directed information flow from X to Y (predictive; it suggests but does not prove causation)
  • Time-Lagged Mutual Information: TLMI(X,Y,τ) = MI between X and Y shifted by time lag τ
  • Information Bottleneck: Identify wells that control information flow

Graph Theory:

  • Nodes = Wells
  • Edge weights = Information transfer strength
  • Directed edges = Asymmetric information flow
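For readers who want the formal versions of the metrics listed above, the standard definitions can be written as follows. The notation is an assumption made for illustration: x_t and y_t are the water-level series at two wells, τ is a lag in time steps, and the transfer entropy is shown with a history length of one step.

```latex
\begin{aligned}
I(X;Y) &= \sum_{x,y} p(x,y)\,\log_2 \frac{p(x,y)}{p(x)\,p(y)} \\[4pt]
\mathrm{TLMI}(X,Y,\tau) &= I\left(X_t;\,Y_{t+\tau}\right) \\[4pt]
\mathrm{TE}(X \to Y) &= \sum_{y_{t+1},\,y_t,\,x_t} p(y_{t+1}, y_t, x_t)\,
  \log_2 \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}
\end{aligned}
```

Mutual information is symmetric in X and Y, while transfer entropy is not, which is why only the latter can give directed edges.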

Tip: 🌍 For Hydrologists

Physical Meaning:

High information transfer between wells can indicate one or more of the following:

  1. Hydraulic connectivity: Water flows between locations
  2. Shared aquifer: Wells tap the same geological unit
  3. Common forcing: Both respond to the same recharge/pumping events

Expected patterns:

  • Wells in same aquifer unit: High MI
  • Upgradient → downgradient: Positive time lag
  • Confined aquifer: Pressure waves propagate faster than water
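The "positive time lag" pattern can be checked with a lagged correlation scan. The sketch below is not part of the chapter's analysis: the function name `lagged_correlation`, the synthetic series, and the 5-step delay are all assumptions chosen for illustration.

```python
import numpy as np

def lagged_correlation(upgradient, downgradient, max_lag):
    """Correlate upgradient[t] with downgradient[t + lag] for lag = 0..max_lag."""
    up = np.asarray(upgradient, dtype=float)
    down = np.asarray(downgradient, dtype=float)
    n = min(len(up), len(down))
    scores = {}
    for lag in range(max_lag + 1):
        # Pair each upgradient value with the downgradient value `lag` steps later
        scores[lag] = np.corrcoef(up[: n - lag], down[lag:n])[0, 1]
    return scores

# Synthetic check (not chapter data): the downgradient well repeats the
# upgradient signal 5 time steps later, plus a little measurement noise.
rng = np.random.default_rng(42)
white = rng.normal(size=514)
base = np.convolve(white, np.ones(10) / 10, mode="valid")  # smoothed signal, length 505
true_lag = 5
up_series = base[true_lag:]
down_series = base[:-true_lag] + rng.normal(scale=0.05, size=base.size - true_lag)

scores = lagged_correlation(up_series, down_series, max_lag=15)
best_lag = max(scores, key=scores.get)
print(f"Best lag: {best_lag} steps (r = {scores[best_lag]:.2f})")
```

A peak at a positive lag is consistent with a signal arriving at the first well before the second, but it does not by itself separate flow from a shared forcing that reaches the two wells at different times.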

38.3 Setup

Analyzing 18 wells with time series data and real coordinates
  Latitude range: 40.0534 to 40.3852
  Longitude range: -88.4632 to -87.9810

38.4 Correlation Network Construction

Note: 📘 Understanding Mutual Information

38.4.1 What Is It?

Mutual information (MI) is a measure from information theory (Shannon, 1948) that quantifies how much knowing one variable reduces uncertainty about another. It is the information-theoretic analogue of correlation, but it captures any kind of statistical dependence (linear or nonlinear), not only linear relationships.

Historical Context: Introduced by Claude Shannon in his foundational 1948 paper “A Mathematical Theory of Communication” that created the field of information theory. Originally developed for telecommunications, now widely used in neuroscience, genetics, and network analysis.

38.4.2 Why Does It Matter for Groundwater Networks?

In monitoring networks, high mutual information between wells can indicate:

  1. Hydraulic connectivity: Wells tap the same aquifer flow system
  2. Shared forcing: Both respond to the same recharge/pumping events
  3. Network redundancy: One well may provide similar information to another

MI reveals hidden connections that might not appear in simple distance-based analysis—two distant wells could have high MI if connected by a high-permeability channel.

38.4.3 How Does It Work?

Mutual information compares joint probability to independent probabilities:

Step 1: If wells are independent, knowing Well A tells you nothing about Well B:

P(A, B) = P(A) × P(B)  [No connection]

Step 2: If wells are connected, joint probability differs:

P(A, B) ≠ P(A) × P(B)  [Connection exists!]

Step 3: MI quantifies the difference (in bits of information):

MI(A; B) = How much uncertainty about B is reduced by knowing A
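Steps 1 to 3 can be sketched as a simple histogram ("plug-in") estimator that compares the joint distribution with the product of the marginals. This is an illustrative sketch, not the chapter's method: the function name `mutual_information_bits`, the bin count, and the synthetic series are assumptions, and histogram estimates are biased for short records.

```python
import numpy as np

def mutual_information_bits(a, b, bins=8):
    """Histogram (plug-in) estimate of MI(A; B) in bits."""
    joint_counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint_counts / joint_counts.sum()   # joint probability P(A, B)
    p_a = p_ab.sum(axis=1, keepdims=True)      # marginal P(A), shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)      # marginal P(B), shape (1, bins)
    independent = p_a * p_b                    # P(A) x P(B): the "no connection" case
    nonzero = p_ab > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(p_ab[nonzero] * np.log2(p_ab[nonzero] / independent[nonzero])))

# Synthetic series (not chapter data)
rng = np.random.default_rng(0)
well_a = rng.normal(size=5000)
linked = well_a + 0.3 * rng.normal(size=5000)        # linear dependence
nonlinear = well_a**2 + 0.1 * rng.normal(size=5000)  # dependence with near-zero correlation
unrelated = rng.normal(size=5000)                    # independent of well_a

for name, series in [("linked", linked), ("nonlinear", nonlinear), ("unrelated", unrelated)]:
    r = np.corrcoef(well_a, series)[0, 1]
    mi = mutual_information_bits(well_a, series)
    print(f"{name:10s} r = {r:+.2f}   MI = {mi:.2f} bits")
```

The nonlinear case is the one to watch: its correlation is close to zero even though the two series clearly share information, which is the situation where a correlation proxy would miss a connection.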

Correlation as Proxy: For this analysis, we use correlation as a proxy for mutual information. While true MI captures nonlinear dependencies, correlation is faster to compute and provides similar insights for groundwater networks where relationships are often approximately linear.
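One way to see what the proxy implies: for two jointly Gaussian variables, mutual information is a fixed function of the Pearson correlation, MI = -0.5 · log2(1 - r²). The sketch below applies that identity to a few correlation values. Treating well water levels as jointly Gaussian is an assumption made here for illustration, and the function name `gaussian_mi_bits` is hypothetical.

```python
import math

def gaussian_mi_bits(r):
    """MI in bits between two jointly Gaussian variables with Pearson correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return -0.5 * math.log2(1.0 - r * r)

# Translate a few correlation values into bits of shared information
for r in (0.3, 0.4, 0.7, 0.8):
    print(f"r = {r:.1f} -> {gaussian_mi_bits(r):.3f} bits")
```

Because the mapping is monotonic in |r|, ranking well pairs by |r| gives the same order as ranking them by Gaussian MI. That equivalence breaks down when the dependence is nonlinear.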

38.4.4 What Will You See Below?

  • Correlation matrix: Pairwise correlations between well water levels (proxy for MI)
  • Network graph: Wells connected if correlation exceeds threshold
  • Hub wells: High-connectivity nodes acting as network centers

38.4.5 How to Interpret Results

| Correlation | MI Interpretation | Monitoring Implications |
|---|---|---|
| r > 0.7 | Strong shared information | Wells highly redundant; one could replace the other |
| 0.4 < r < 0.7 | Moderate connection | Complementary monitoring; both provide value |
| r < 0.4 | Weak/no connection | Independent monitoring; both essential |
| Hub well (>6 connections) | Central to network | High-value monitoring site; represents regional conditions |
| Isolated well (<3 connections) | Unique local signal | Irreplaceable; captures distinct aquifer behavior |

Cost-Cutting Guidance: Wells with r > 0.8 are candidates for consolidation if budget cuts are needed. Wells with r < 0.3 to all others are irreplaceable.
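The guidance above can be turned into a simple screening pass over a correlation matrix. This is a sketch under stated assumptions: the function name `screen_wells` and the 4-well example matrix are hypothetical, and only the 0.8 and 0.3 cutoffs come from the text.

```python
import numpy as np

def screen_wells(corr, redundant_r=0.8, unique_r=0.3):
    """Return (pairs with r > redundant_r, wells with r < unique_r to all others)."""
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # Pairs above the consolidation cutoff (upper triangle only, so each pair appears once)
    redundant_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
                       if corr[i, j] > redundant_r]
    # Mask the diagonal so a well's self-correlation of 1.0 is ignored
    off_diag = corr.copy()
    np.fill_diagonal(off_diag, np.nan)
    # Uses signed r, as in the guidance; switch to np.abs(off_diag) if strong
    # negative correlations should also count as shared information.
    strongest_link = np.nanmax(off_diag, axis=1)
    irreplaceable = [i for i in range(n) if strongest_link[i] < unique_r]
    return redundant_pairs, irreplaceable

# Hypothetical 4-well matrix (not chapter data)
example = np.array([
    [1.00, 0.90, 0.50, 0.10],
    [0.90, 1.00, 0.55, 0.20],
    [0.50, 0.55, 1.00, 0.05],
    [0.10, 0.20, 0.05, 1.00],
])
pairs, unique = screen_wells(example)
print("Consolidation candidates (index pairs):", pairs)
print("Irreplaceable wells (indices):", unique)
```

Indices are zero-based here, so index 0 corresponds to the label W1 used in the figures.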

Correlation network computed from real time series data: 15 wells
Mean correlation: 0.479
Max correlation: 0.998
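The chapter does not show how the matrix behind these numbers was computed. A minimal sketch of one way to build a pairwise correlation matrix from gappy water-level records follows. The function name `pairwise_correlation`, the `min_overlap` cutoff, and the synthetic data are assumptions, and the summary is taken over off-diagonal pairs only, which may differ from the convention used for the values printed above.

```python
import numpy as np

def pairwise_correlation(levels, min_overlap=30):
    """Pearson correlation for every well pair, using only time steps where both have data.

    levels: array of shape (n_times, n_wells) with NaN marking missing observations.
    Pairs with fewer than min_overlap shared time steps are left as NaN.
    """
    levels = np.asarray(levels, dtype=float)
    n_wells = levels.shape[1]
    corr = np.full((n_wells, n_wells), np.nan)
    for i in range(n_wells):
        for j in range(i, n_wells):
            both = ~np.isnan(levels[:, i]) & ~np.isnan(levels[:, j])
            if both.sum() >= min_overlap:
                r = np.corrcoef(levels[both, i], levels[both, j])[0, 1]
                corr[i, j] = corr[j, i] = r   # fill both halves to keep the matrix symmetric
    return corr

# Synthetic records (not chapter data): three wells sharing one regional signal
rng = np.random.default_rng(1)
regional = rng.normal(size=200)
levels = regional[:, None] + 0.5 * rng.normal(size=(200, 3))
levels[:40, 2] = np.nan   # the third well starts recording later

corr = pairwise_correlation(levels)
upper = corr[np.triu_indices_from(corr, k=1)]   # each pair once, diagonal excluded
print(f"Mean correlation: {np.nanmean(upper):.3f}")
print(f"Max correlation: {np.nanmax(upper):.3f}")
```

Requiring a minimum overlap keeps pairs with very short shared records from producing spuriously high correlations.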

38.5 Correlation Heatmap

Note: 📊 How to Read This Correlation Heatmap

What the Visualization Shows:

A correlation matrix displays pairwise correlations between all wells. Each cell shows how strongly two wells’ water levels move together over time.

Color Interpretation:

| Color | Correlation Value | Information Meaning | Physical Interpretation |
|---|---|---|---|
| Dark Red | r > 0.7 | High shared information | Same aquifer unit, hydraulically connected |
| Light Red/Orange | 0.4 < r < 0.7 | Moderate connection | Partially connected, shared forcing |
| White/Light Blue | -0.2 < r < 0.4 | Weak/no connection | Different aquifer units or isolated |
| Dark Blue | r < -0.2 | Negative correlation | Rare; possibly pumping-induced |

What to Look For:

  1. Block patterns (red squares): Groups of wells that are highly correlated—likely tap same aquifer
  2. Unit diagonal: All diagonal values = 1.0 (each well is perfectly correlated with itself)
  3. Isolated rows/columns: Wells with mostly light colors are monitoring unique local conditions
  4. Symmetric pattern: Matrix should be symmetric (r(A,B) = r(B,A))

Management Interpretation:

  • Red clusters → Redundancy: If 5 wells are all r > 0.8, you could potentially remove 4 and still capture the signal
  • Light rows → Irreplaceable: Wells weakly correlated with all others are capturing unique information
  • Off-diagonal hot spots: Unexpected connections might indicate hidden flow paths

Critical Question: Which wells can we afford to lose? Look for wells with r < 0.3 to all others—those are irreplaceable.

```python
import numpy as np
import plotly.graph_objects as go

# corr_matrix, n_wells, and DATA_AVAILABLE are assumed to be defined by the
# setup and correlation steps earlier in the chapter.
if corr_matrix is None or not DATA_AVAILABLE:
    print("⚠️ CORRELATION HEATMAP SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   Color-coded matrix where each cell = correlation between two wells")
    print("   Red = high correlation (wells move together)")
    print("   Blue = low/negative correlation (independent or opposite)")
    print("")
    print("💡 TYPICAL PATTERNS:")
    print("   • Wells in same aquifer unit: r > 0.7 (dark red)")
    print("   • Wells in different units: r < 0.4 (light colors)")
    print("   • Block patterns indicate connected well groups")
else:
    # Short labels (W1, W2, ...) shared by both heatmap axes
    well_labels = [f"W{i+1}" for i in range(n_wells)]

    # Diverging color scale centered on zero: red = positive, blue = negative
    fig = go.Figure(data=go.Heatmap(
        z=corr_matrix,
        x=well_labels,
        y=well_labels,
        colorscale='RdBu_r',
        zmid=0,
        colorbar=dict(title='Correlation'),
        text=np.round(corr_matrix, 2),   # print the rounded value in each cell
        texttemplate='%{text}',
        textfont={"size": 8},
        hovertemplate='%{y} ↔ %{x}<br>Correlation: %{z:.3f}<extra></extra>'
    ))

    fig.update_layout(
        title='Well Correlation Matrix<br><sub>Higher correlation = stronger information flow</sub>',
        xaxis_title='Well ID',
        yaxis_title='Well ID',
        height=600,
        width=650
    )

    fig.show()
```
Figure 38.1: Well Correlation Matrix - Proxy for Information Transfer Strength

38.6 Network Graph Construction

Correlation threshold: 0.765
Average connections per well: 5.2

Top 5 Hub Wells:
  Well 7: 9 connections
  Well 10: 8 connections
  Well 14: 8 connections
  Well 11: 7 connections
  Well 12: 7 connections
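The connection counts above come from thresholding the correlation matrix. A minimal sketch of that step is shown below. The names `connectivity` and `hub_indices` mirror variables used in the plotting code, but the construction itself, the function name `build_network`, and the example matrix are assumptions. The threshold is passed in as a parameter because the chapter does not state how 0.765 was derived.

```python
import numpy as np

def build_network(corr, threshold, n_hubs=5):
    """Threshold a correlation matrix into an undirected network."""
    corr = np.asarray(corr, dtype=float)
    adjacency = corr > threshold            # True where two wells are strongly correlated
    np.fill_diagonal(adjacency, False)      # a well is not its own neighbor
    connectivity = adjacency.sum(axis=1)    # number of strong connections per well
    # Indices of the most connected wells; a stable sort keeps ties in index order
    hub_indices = np.argsort(-connectivity, kind="stable")[:n_hubs]
    return adjacency, connectivity, hub_indices

# Hypothetical 4-well matrix (not chapter data)
example = np.array([
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.20],
    [0.80, 0.85, 1.00, 0.30],
    [0.10, 0.20, 0.30, 1.00],
])
adjacency, connectivity, hub_indices = build_network(example, threshold=0.765, n_hubs=2)
print(f"Average connections per well: {connectivity.mean():.1f}")
for idx in hub_indices:
    print(f"  Well {idx + 1}: {connectivity[idx]} connections")
```

Connection counts depend strongly on the threshold, so it is worth rerunning the count at a few nearby values before reading much into the hub ranking.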

38.7 Information Network Visualization

Note: 📊 How to Read the Information Network Graph

What the Visualization Shows:

This network graph translates the correlation matrix into a spatial network where wells (nodes) are connected by lines (edges) if their correlation exceeds a threshold.

Visual Elements:

| Element | What It Represents | How to Read It |
|---|---|---|
| Node (circle) | Individual monitoring well | Position = geographic location |
| Node size | Network connectivity | Larger = more connections = hub well |
| Node color | Connection count | Yellow/green = high connectivity |
| Edge (line) | Strong correlation | Wells connected if r > threshold |
| Edge density | Regional connectivity | Many lines = tightly connected region |

Pattern Recognition:

| Pattern | What It Indicates | Management Implication |
|---|---|---|
| Dense cluster | Tightly connected region | High redundancy; potential to reduce monitoring |
| Hub well (large node) | Central to network | Critical monitoring site; don’t remove |
| Isolated well (small node) | Weakly connected | Captures unique local signal; may be irreplaceable |
| Bridge well | Connects two clusters | Important for understanding regional flow |
| No edges | Well below threshold for all | Either truly independent OR data quality issue |

Using This for Network Design:

  1. Keep all hub wells (large nodes)—they represent regional conditions
  2. Review isolated wells (small nodes)—they may capture critical local signals
  3. Evaluate cluster redundancy—within tight clusters, some wells may be removable
  4. Identify bridges—wells connecting clusters are strategically important
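Bridge wells (step 4) are not identified by the chapter's code. One way to find them, sketched here as an assumption rather than the author's method, is to treat a bridge as a well whose removal splits the thresholded network into more connected pieces. The helper names and the 6-well example are hypothetical.

```python
def count_components(adjacency, skip=None):
    """Count connected components of an undirected network, optionally ignoring one well."""
    n = len(adjacency)
    seen = set() if skip is None else {skip}
    components = 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth-first search from `start`
            node = stack.pop()
            for neighbor in range(n):
                if adjacency[node][neighbor] and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components

def bridge_wells(adjacency):
    """Wells whose removal increases the number of connected components."""
    baseline = count_components(adjacency)
    return [w for w in range(len(adjacency))
            if count_components(adjacency, skip=w) > baseline]

# Hypothetical network (not chapter data): cluster {0, 1, 2} and cluster {4, 5},
# joined only through well 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 0), (3, 1), (3, 4), (3, 5), (4, 5)]
n_example = 6
adjacency = [[False] * n_example for _ in range(n_example)]
for i, j in edges:
    adjacency[i][j] = adjacency[j][i] = True

print("Bridge wells (indices):", bridge_wells(adjacency))
```

The same functions accept a boolean NumPy adjacency matrix, since they only use `len()` and `adjacency[i][j]` indexing.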
```python
import plotly.graph_objects as go

# corr_matrix, connectivity, corr_threshold, wells_df, n_wells, and DATA_AVAILABLE
# are assumed to be defined by earlier steps in the chapter.
if corr_matrix is None or connectivity is None or not DATA_AVAILABLE:
    print("⚠️ NETWORK VISUALIZATION SKIPPED")
    print("")
    print("📊 WHAT THIS WOULD SHOW:")
    print("   • Nodes = monitoring wells (sized by connectivity)")
    print("   • Edges = strong correlations (r > threshold)")
    print("   • Hub wells appear as large nodes with many connections")
    print("   • Isolated wells appear as small nodes with few edges")
else:
    # Create network visualization using scatter plot
    fig = go.Figure()

    # Add edges (lines between correlated wells)
    edge_x = []
    edge_y = []

    # Visit each pair once (upper triangle); None breaks the line between segments.
    # Assumes the first n_wells rows of wells_df are in the same order as corr_matrix.
    for i in range(n_wells):
        for j in range(i+1, n_wells):
            if corr_matrix[i, j] > corr_threshold:
                # Add line from well i to well j
                edge_x.extend([wells_df['Longitude'].iloc[i], wells_df['Longitude'].iloc[j], None])
                edge_y.extend([wells_df['Latitude'].iloc[i], wells_df['Latitude'].iloc[j], None])

    fig.add_trace(go.Scatter(
        x=edge_x, y=edge_y,
        mode='lines',
        line=dict(width=0.5, color='lightgray'),
        hoverinfo='skip',
        showlegend=False
    ))

    # Add nodes (wells sized and colored by connectivity)
    fig.add_trace(go.Scatter(
        x=wells_df['Longitude'].iloc[:n_wells],
        y=wells_df['Latitude'].iloc[:n_wells],
        mode='markers+text',
        marker=dict(
            size=connectivity * 3 + 10,   # minimum size 10 so isolated wells stay visible
            color=connectivity,
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(title='Connections'),
            line=dict(width=1, color='white')
        ),
        text=[f"W{i+1}" for i in range(n_wells)],
        textposition='top center',
        textfont=dict(size=8),
        hovertemplate='<b>%{text}</b><br>Connections: %{marker.color}<br>Lat: %{y:.4f}<br>Lon: %{x:.4f}<extra></extra>'
    ))

    fig.update_layout(
        title='Information Flow Network<br><sub>Node size and color = connectivity, Lines = high correlation</sub>',
        xaxis_title='Longitude',
        yaxis_title='Latitude',
        height=600,
        showlegend=False,
        hovermode='closest'
    )

    fig.show()
```
Figure 38.2: Well Information Flow Network - Node size represents connectivity

38.8 Hub Wells Analysis

Wells with high connectivity act as information hubs - they’re well-connected to many other wells in the network.

```python
import plotly.graph_objects as go

# connectivity, hub_indices, n_wells, and DATA_AVAILABLE are assumed to be
# defined by earlier steps in the chapter.
if connectivity is None or hub_indices is None or not DATA_AVAILABLE:
    print("⚠️ HUB WELLS ANALYSIS SKIPPED")
    print("   Would identify wells with highest connectivity (most correlated neighbors)")
    print("   Hub wells are critical for network-wide monitoring - never remove them")
else:
    # Create bar chart of connections per well; hub wells are drawn in red
    fig = go.Figure(data=[
        go.Bar(
            x=[f"W{i+1}" for i in range(n_wells)],
            y=connectivity,
            marker_color=['#d62728' if i in hub_indices else '#1f77b4' for i in range(n_wells)],
            text=connectivity,
            textposition='outside',
            hovertemplate='<b>Well %{x}</b><br>Connections: %{y}<extra></extra>'
        )
    ])

    fig.update_layout(
        title='Well Network Connectivity<br><sub>Red bars indicate top 5 hub wells</sub>',
        xaxis_title='Well ID',
        yaxis_title='Number of Strong Connections',
        height=500,
        showlegend=False
    )

    fig.show()
```
Figure 38.3: Hub Wells by Network Connectivity

38.9 Key Insights

Important: 🔍 Information Flow Findings

Network Structure:

  • Analysis wells: 15 wells with spatial connectivity
  • Mean correlation: Moderate (0.479, within the 0.4-0.7 band)
  • Hub wells: Wells with more than 6 strong connections act as network hubs
  • Connectivity pattern: Spatially proximate wells show stronger correlation

Information Characteristics:

  • Wells closer in space tend to have higher correlation
  • Hub wells are critical for network connectivity
  • Network shows clustering around geographic regions

Spatial Patterns:

High correlation between wells can indicate:

  • Shared aquifer units (same geological layer)
  • Hydraulic connectivity (water flows between locations)
  • Common climate forcing (shared recharge/discharge)

38.10 Management Applications

Important: 🎯 Using Information Flow for Network Decisions

Decision Framework:

Information flow analysis answers three critical management questions:

Question 1: Which wells are most valuable?

| Connectivity Level | Well Type | Decision |
|---|---|---|
| >6 connections | Hub well | NEVER remove; represents regional conditions |
| 4-6 connections | Connected well | Keep unless budget critical |
| 2-3 connections | Peripheral well | Evaluate; may be redundant OR uniquely positioned |
| 0-1 connections | Isolated well | Investigate; either irreplaceable OR data quality issue |

Question 2: Where should new wells be placed?

  • Gap regions: Areas with no nearby hub wells—add monitoring
  • Between clusters: Bridge positions reveal inter-region connectivity
  • Near isolated wells: If isolated well shows concerning trends, add nearby well to confirm

Question 3: How to prioritize maintenance/upgrades?

| Priority | Criterion | Why |
|---|---|---|
| 1 (Highest) | Hub wells | Failure loses network-wide visibility |
| 2 | Bridge wells | Failure disconnects network regions |
| 3 | Cluster members | Some redundancy exists |
| 4 (Lowest) | Redundant wells | Other wells capture same signal |

Cost-Benefit Example:

If budget requires removing 3 of 15 wells:

  1. Identify wells with r > 0.8 to multiple neighbors (redundant)
  2. Confirm they’re not the only well in their geographic area
  3. Remove them while keeping all hub wells and isolated wells
  4. Estimated information loss: <10% if done correctly (a rough estimate; verify against the data before acting)
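The information-loss estimate in step 4 can be checked before a well is removed by asking how well the retained wells reconstruct its record. The sketch below uses 1 - R² from an ordinary least-squares fit as a variance-based stand-in for information loss. That stand-in, the function name `reconstruction_r2`, and the synthetic data are assumptions, not the chapter's method.

```python
import numpy as np

def reconstruction_r2(levels, removed, retained):
    """In-sample R² when one well's record is rebuilt from the retained wells by least squares."""
    levels = np.asarray(levels, dtype=float)
    y = levels[:, removed]
    # Design matrix: intercept column plus the retained wells' records
    X = np.column_stack([np.ones(len(y)), levels[:, retained]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    return 1.0 - residual.var() / y.var()

# Synthetic records (not chapter data): wells 0-2 follow one regional signal,
# well 3 behaves independently.
rng = np.random.default_rng(7)
regional = rng.normal(size=300)
levels = np.column_stack([
    regional + 0.1 * rng.normal(size=300),
    regional + 0.1 * rng.normal(size=300),
    regional + 0.1 * rng.normal(size=300),
    rng.normal(size=300),
])

r2_redundant = reconstruction_r2(levels, removed=1, retained=[0, 2])
r2_unique = reconstruction_r2(levels, removed=3, retained=[0, 1, 2])
print(f"Removing a clustered well loses about {100 * (1 - r2_redundant):.0f}% of its variance")
print(f"Removing the independent well loses about {100 * (1 - r2_unique):.0f}% of its variance")
```

The R² here is computed in-sample, so it is optimistic. Fitting on one period and scoring on another would give a more honest estimate.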

Warning Signs:

  • Removing a well with r < 0.4 to all neighbors → Likely losing unique information
  • Removing multiple wells from same cluster → May create monitoring blind spot
  • Removing hub well → Network fragmentation risk

38.10.1 1. Priority Monitoring Wells

Hub wells with high connectivity are critical for monitoring network-wide conditions:

=== Priority Wells for Continued Monitoring ===
(Hub wells with highest connectivity)

  Well 7: 9 strong connections
  Well 10: 8 strong connections
  Well 14: 8 strong connections
  Well 11: 7 strong connections
  Well 12: 7 strong connections

These wells provide maximum information about network-wide conditions

38.10.2 2. Network Optimization

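Wells with low connectivity contribute least to the network-wide signal, so they are the first to be reviewed if budget cuts are needed. By the criteria above, however, weak correlation points to unique information rather than redundancy, so treat this list as wells to investigate, not as automatic removal candidates: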

=== Wells with Lowest Connectivity ===
(Potentially redundant for network monitoring)

  Well 5: 0 strong connections
  Well 6: 2 strong connections
  Well 1: 3 strong connections
  Well 3: 3 strong connections
  Well 4: 3 strong connections

Note: Low connectivity doesn't mean unimportant - may serve specific local needs

38.10.3 3. Sentinel Network Design

Hub wells serve as early warning sentinels - changes in their water levels likely reflect network-wide trends.

38.11 Physical Interpretation

Tip: 🌍 Hydrological Meaning

High correlation between wells indicates:

  • Same aquifer unit: Shared hydraulic properties and response characteristics
  • Connected flow paths: Water/pressure propagates between locations
  • Common stressors: Both wells respond to same climate forcing (precipitation, drought)
  • Spatial proximity: Wells closer together tend to show more similar behavior

Network structure reveals:

  • Hub wells: Central locations that reflect regional aquifer conditions
  • Peripheral wells: May tap different aquifer units or isolated flow systems
  • Connectivity patterns: Strong correlations suggest hydraulic connectivity

Applications:

  • Monitoring optimization: Hub wells provide maximum information density
  • Early warning: Changes in hub wells signal network-wide trends
  • Redundancy analysis: Low-connectivity wells may serve specialized local needs

38.12 Limitations

  1. Correlation proxy: Uses correlation as proxy for true mutual information (linear relationships only)
  2. Sample size: Analysis limited to wells with sufficient temporal data
  3. No temporal dynamics: Static analysis doesn’t capture time-lagged relationships
  4. Computational constraints: Analysis uses subset of wells for visualization efficiency
  5. Confounding factors: External forcings (weather, pumping) can inflate correlation
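Limitation 5 can be probed with partial correlation, which measures how much two wells still move together once a shared forcing series is accounted for. The sketch below derives it from the inverse of the correlation matrix. It assumes a forcing record (for example a precipitation index) is available as an extra variable, which this chapter's dataset does not include, and the function name `partial_correlation` is hypothetical.

```python
import numpy as np

def partial_correlation(corr):
    """Partial correlation of each pair given all other variables, from the precision matrix."""
    precision = np.linalg.inv(np.asarray(corr, dtype=float))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)   # rho_ij|rest = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(partial, 1.0)
    return partial

# Synthetic example (not chapter data): two wells that are linked only
# through a common forcing series.
rng = np.random.default_rng(3)
forcing = rng.normal(size=2000)
well_a = forcing + 0.5 * rng.normal(size=2000)
well_b = forcing + 0.5 * rng.normal(size=2000)

corr = np.corrcoef([well_a, well_b, forcing])   # variable order: well_a, well_b, forcing
partial = partial_correlation(corr)
print(f"Raw correlation between wells:         {corr[0, 1]:.2f}")
print(f"Partial correlation given the forcing: {partial[0, 1]:.2f}")
```

With near-duplicate wells (the chapter reports a maximum correlation of 0.998) the matrix is close to singular, so a pseudo-inverse or a shrinkage estimate may be needed in place of `np.linalg.inv`.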

38.13 References

  • Ruddell, B. L., & Kumar, P. (2009). Ecohydrologic process networks. Water Resources Research, 45(3), W03419.
  • Schreiber, T. (2000). Measuring information transfer. Physical Review Letters, 85(2), 461.
  • Ombadi, M., et al. (2020). Developing a connectivity index between shallow and deep groundwater. Water Resources Research, 56(12).

38.14 Next Steps

Chapter 10: Network Connectivity Map - Physical interpretation of information pathways

Cross-Chapter Connections:

  • Uses well network from Part 1
  • Complements causal analysis (Chapter 8)
  • Informs monitoring design (Chapter 13)
  • Foundation for connectivity mapping (Chapter 10)


38.15 Summary

Information flow analysis reveals how signals propagate through the monitoring network:

  • Shared information estimated - Pairwise correlation, used as a proxy for mutual information, quantifies shared information between wells
  • Network graph constructed - Visualizes information pathways
  • Hub wells identified - High-connectivity wells are critical for network function
  • Redundancy analysis - Low-connectivity wells may serve specialized local needs
  • ⚠️ Simplified analysis - Uses correlation as proxy for true mutual information

Key Insight: Information flow analysis guides monitoring network optimization—where to add sensors, where redundancy exists, and which wells are irreplaceable.


38.16 Reflection Questions

  • In your monitoring network, which wells do you suspect are “hubs” based on experience (for example, they seem to move with everything else), and how could an information-flow analysis like this confirm or challenge that intuition?
  • How would you balance using network connectivity results to propose removing low-connection wells against the risk that those wells capture unique local behavior that correlation alone might miss?
  • What additional data (for example, pumping, local recharge estimates, or HTEM-based structure) would you want to incorporate before using information-flow patterns to redesign the network?
  • How could you combine information flow, causal graphs, and physical flow models to prioritize where to add new wells, upgrade sensors, or co-locate instruments (for example, with streams or weather stations)?