24 Recharge Lag Analysis

Cross-correlation of precipitation versus groundwater response

For Newcomers

You will learn:

How long it takes for rain to reach the aquifer (the “lag time”)
Why some rain events cause immediate water level rise while others don’t
How to measure the connection strength between precipitation and groundwater
What lag times reveal about aquifer depth and recharge pathways

When rain falls, it doesn’t instantly appear in the aquifer. This chapter measures the delay—sometimes days, sometimes months—revealing how deeply buried and how connected the aquifer really is.

24.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Explain what “recharge lag” means and how cross-correlation is used to estimate the delay between precipitation and groundwater response.
Interpret a cross-correlation curve and dual time series plot to distinguish immediate (barometric or shallow) responses from true recharge-driven changes.
Identify methodological pitfalls (short records, detrending, barometric effects, compromised wells) that can create misleading lag estimates.
Decide what additional data and analyses are needed before drawing firm conclusions about recharge timing in a confined aquifer.

✅ Loaded 100,000 groundwater and 150,000 weather records

24.2 Introduction

How long does it take for precipitation to reach the water table? This chapter uses cross-correlation analysis to quantify the time delay between precipitation events and groundwater level response.

Analysis Period: 2010-07-16 to 2012-06-05 (682 days)

Well Analyzed: 434983

Source: Analysis adapted from precipitation-groundwater-lag.qmd

24.3 Key Findings

24.3.1 Cross-Correlation Analysis

What Is Cross-Correlation?

Cross-correlation is a statistical technique that measures the similarity between two time series as a function of the time lag between them. Developed in the 1950s-1960s for signal processing, it became a standard tool in hydrology for identifying time delays between climate forcing (precipitation) and aquifer response (water level changes).

Historical context: Box & Jenkins (1970) popularized cross-correlation for time series analysis, and hydrologists quickly adopted it to study rainfall-runoff relationships and precipitation-groundwater lags.

Why Does It Matter?

The lag time between precipitation and groundwater response reveals:
- Aquifer type: Unconfined aquifers respond quickly (days); confined aquifers slowly (months)
- Vadose zone thickness: Deeper unsaturated zones → longer lags
- Recharge pathways: Direct infiltration vs. lateral flow from distant recharge areas
- Connection strength: Strong correlation = direct hydraulic connection; weak = indirect or no connection

How Does It Work?

Cross-correlation tests the relationship between two time series at different time offsets:

Mathematical definition: \[ \rho(\tau) = \frac{\sum_{t} (P_t - \bar{P})(h_{t+\tau} - \bar{h})}{\sqrt{\sum_t (P_t - \bar{P})^2} \sqrt{\sum_t (h_{t+\tau} - \bar{h})^2}} \]

Where:
- $\rho(\tau)$ = correlation coefficient at lag τ
- $P_t$ = precipitation at time t
- $h_{t+\tau}$ = water level at time t + τ (lag)
- $\bar{P}$, $\bar{h}$ = means

Step-by-step process:
1. Detrend both time series: Remove long-term trends to isolate short-term relationships
2. Test multiple lags: Shift precipitation relative to water level (τ = 0, 1, 2, … 90 days; negative lags are also computed as an unphysical control)
3. Calculate correlation: At each lag, compute how well precipitation predicts future water levels
4. Identify peak: The lag with maximum correlation is the estimated recharge time delay, provided the peak exceeds the significance threshold
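
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the chapter's analysis code: the helper names `linear_detrend` and `lagged_correlation`, the 20-day lag, and the noise levels are all assumptions chosen so the expected peak is known in advance.

```python
import numpy as np

def linear_detrend(x):
    """Remove a least-squares straight line from a 1-D series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def lagged_correlation(precip, level, max_lag):
    """Correlate precip[t] with level[t + lag] for lag = 0..max_lag."""
    p = linear_detrend(precip)          # step 1: detrend both series
    h = linear_detrend(level)
    lags = np.arange(max_lag + 1)       # step 2: candidate lags in days
    corrs = np.array([
        np.corrcoef(p[: len(p) - lag], h[lag:])[0, 1]   # step 3
        for lag in lags
    ])
    return lags, corrs

# Synthetic data (hypothetical, not the chapter's well): the level
# responds to rain exactly 20 days later, plus measurement noise.
rng = np.random.default_rng(0)
n, true_lag = 600, 20
precip = rng.gamma(shape=0.5, scale=4.0, size=n)
level = rng.normal(0.0, 0.02, size=n)
level[true_lag:] += 0.05 * precip[:-true_lag]

lags, corrs = lagged_correlation(precip, level, max_lag=90)
peak_lag = int(lags[np.argmax(corrs)])  # step 4: lag of maximum correlation
print(f"peak lag = {peak_lag} days, r = {corrs.max():.2f}")
```

Because the synthetic response is a clean delayed copy of the rainfall, the peak is sharp; real well records are far noisier and the peak should always be compared against a significance threshold.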

What Will You See (Interpretation Guide)?

For precipitation and groundwater, cross-correlation tests different time relationships:

Lag (τ)	What It Tests	Physical Meaning
τ = 0 days	Today’s water level vs. today’s rain	Immediate response (barometric effect or shallow connection)
τ = +15 days	Today’s water level vs. rain 15 days ago	15-day recharge lag (precipitation takes 15 days to reach aquifer)
τ = +60 days	Today’s water level vs. rain 60 days ago	Long-memory system (confined aquifer or regional flow)
τ < 0 (negative)	Future rain vs. today’s water level	Unphysical—water levels can’t predict future rain (should be near zero)

Expected patterns by aquifer type:

Aquifer Type	Expected Peak Lag	Peak Correlation	Physical Reason
Shallow unconfined	1-14 days	Moderate (r = 0.3-0.6)	Direct infiltration through thin vadose zone
Deep unconfined	14-60 days	Weak-moderate (r = 0.2-0.4)	Thick vadose zone, slow percolation
Confined	30-180 days	Weak (r = 0.1-0.3)	Pressure wave propagation from distant recharge area
Regional confined	180+ days or no signal	Very weak (r < 0.1)	Recharge area far away, local precipitation irrelevant

How to read the cross-correlation plot:

X-axis: Lag in days (positive = precipitation leads groundwater response)
Y-axis: Correlation coefficient (-1 to +1)
- +1 = perfect positive correlation
- 0 = no correlation
- -1 = perfect negative correlation (rare in hydrology)
Red dashed lines: 95% significance threshold
- Correlations beyond these lines are statistically significant
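- Calculated as ±1.96/√n (where n = number of observations), which assumes the observations are serially independent; when both series are autocorrelated, the true threshold is wider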
Red diamond: Peak correlation at optimal lag time
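
The threshold described above is easy to reproduce. The sketch below computes ±1.96/√n and then, as an assumption that is not part of the chapter's analysis, applies a commonly used lag-1 autocorrelation adjustment to the sample size to show how the bound widens when both series are persistent. The function names and the AR(1) test series are hypothetical.

```python
import numpy as np

def white_noise_threshold(n, z=1.96):
    """95% bound on r for two independent white-noise series of length n."""
    return z / np.sqrt(n)

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

def effective_n(x, y):
    """Approximate sample size after allowing for lag-1 autocorrelation
    in both series (an assumed, widely used approximation)."""
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    return len(x) * (1.0 - r1 * r2) / (1.0 + r1 * r2)

def ar1(phi, n, rng):
    """Generate a persistent AR(1) test series."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

n = 682                                  # record length used in this chapter
basic = white_noise_threshold(n)

rng = np.random.default_rng(3)
x, y = ar1(0.9, n, rng), ar1(0.9, n, rng)  # two independent, persistent series
n_eff = effective_n(x, y)
adjusted = white_noise_threshold(n_eff)

print(f"white-noise threshold: ±{basic:.4f}")
print(f"effective n: {n_eff:.0f}, adjusted threshold: ±{adjusted:.4f}")
```

If one of the two series is close to white noise, the adjustment is small, so the simple ±1.96/√n bound is most misleading when precipitation totals and water levels are both smoothed or aggregated.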

Physical interpretation of results:
- Peak at lag = 0-7 days: Suggests immediate response → likely barometric pressure artifact or shallow leakage (NOT true recharge for confined aquifer)
- Peak at lag = 15-30 days: Moderate vadose zone thickness, direct infiltration pathway
- Peak at lag = 60-180 days: Deep confined system, pressure wave propagation
- No significant peak: Local precipitation may not control this well (regional recharge or no connection)

Show cross-correlation visualization code

# Create cross-correlation plot
fig = go.Figure()

# Add correlation line
fig.add_trace(go.Scatter(
    x=lags,
    y=correlations,
    mode='lines',
    name='Cross-correlation',
    line=dict(color='#2e8bcc', width=2)
))

# Add significance thresholds
fig.add_hline(y=sig_threshold, line_dash="dash", line_color="red",
              annotation_text="95% significance", annotation_position="right")
fig.add_hline(y=-sig_threshold, line_dash="dash", line_color="red")

# Mark peak correlation
fig.add_trace(go.Scatter(
    x=[peak_lag],
    y=[peak_corr],
    mode='markers',
    name=f'Peak: {peak_lag} days (r={peak_corr:.3f})',
    marker=dict(size=12, color='red', symbol='diamond')
))

fig.update_layout(
    title='Precipitation-Groundwater Cross-Correlation',
    xaxis_title='Lag (days, positive = precip leads)',
    yaxis_title='Correlation Coefficient',
    hovermode='x unified',
    showlegend=True,
    height=400
)

fig.show()

Figure 24.1: Cross-correlation function showing lag between precipitation and groundwater response

Show code

# Create dual-axis time series plot
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Convert dates back to datetime for plotting
dates_dt = pd.to_datetime(common_dates)

# Add precipitation bars
fig.add_trace(
    go.Bar(
        x=dates_dt,
        y=precip_aligned,
        name='Daily Precipitation',
        marker_color='rgba(46, 139, 204, 0.5)',
        yaxis='y2'
    ),
    secondary_y=True
)

# Add groundwater levels
fig.add_trace(
    go.Scatter(
        x=dates_dt,
        y=gw_aligned,
        name='Static Water Level',
        line=dict(color='#18b8c9', width=2),
        yaxis='y'
    ),
    secondary_y=False
)

# Update axes
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Water Level (ft)", secondary_y=False)
fig.update_yaxes(title_text="Precipitation (mm)", secondary_y=True, range=[precip_aligned.max()*3, 0])

fig.update_layout(
    title='Precipitation vs Groundwater Response',
    hovermode='x unified',
    height=400,
    showlegend=True
)

fig.show()

Figure 24.2: Time series comparison of precipitation and groundwater levels

24.3.2 Unexpected Immediate Response

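Analysis Results:
- Peak lag: 0-7 days (immediate response!)
- Peak correlation: r ≈ 0.15-0.25 (weak but significant)
- Significance threshold: ±0.05 (95% confidence)

These headline values do not match the summary statistics computed for the 682-day window analyzed here (peak lag −40 days, r = 0.0622, threshold ±0.0751, no significant lags), so they should be checked against the current run before being interpreted.

Paradox: An immediate response contradicts the confined aquifer hypothesis, which predicts a months-long lag.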

Show code

# Find all significant lags
sig_lags = lags[np.abs(correlations) > sig_threshold]
sig_corrs = correlations[np.abs(correlations) > sig_threshold]

# Create bar chart of significant correlations
fig = go.Figure()

fig.add_trace(go.Bar(
    x=sig_lags,
    y=sig_corrs,
    marker_color=['red' if x == peak_lag else '#2e8bcc' for x in sig_lags],
    name='Significant correlations'
))

fig.add_hline(y=0, line_color='black', line_width=1)

fig.update_layout(
    title='Significant Lag Periods',
    xaxis_title='Lag (days)',
    yaxis_title='Correlation Coefficient',
    height=350,
    showlegend=False
)

fig.show()

# Create summary statistics table
summary_stats = pd.DataFrame({
    'Metric': [
        'Analysis Period',
        'Days Analyzed',
        'Peak Lag',
        'Peak Correlation',
        'R² (explained variance)',
        'Significant Lags',
        'Significance Threshold',
        'Mean Water Level',
        'Mean Daily Precip'
    ],
    'Value': [
        analysis_period,
        f"{days_analyzed} days",
        f"{peak_lag} days",
        f"{peak_corr:.4f}",
        f"{peak_corr**2:.4f} ({peak_corr**2*100:.2f}%)",
        f"{len(sig_lags)} of {len(lags)} tested",
        f"±{sig_threshold:.4f}",
        f"{gw_aligned.mean():.2f} ft",
        f"{precip_aligned.mean():.2f} mm"
    ]
})

Figure 24.3: Distribution of significant correlations across lag periods

Summary Statistics

Metric	Value
Analysis Period	2010-07-16 to 2012-06-05
Days Analyzed	682 days
Peak Lag	-40 days
Peak Correlation	0.0622
R² (explained variance)	0.0039 (0.39%)
Significant Lags	0 of 181 tested
Significance Threshold	±0.0751
Mean Water Level	701.57 ft
Mean Daily Precip	14.81 mm
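
As a quick arithmetic check on the table, the explained variance and the significance threshold both follow directly from the reported peak correlation and record length (682 days):

\[ R^2 = r^2 = 0.0622^2 \approx 0.0039 \;(0.39\%), \qquad \frac{1.96}{\sqrt{682}} \approx 0.0751 \]

Since $|r| = 0.0622 < 0.0751$, the peak correlation falls inside the 95% band, which is consistent with the table reporting 0 of 181 lags as significant.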

24.3.3 Possible Explanations

📊 Understanding Barometric Efficiency

Barometric efficiency (BE) measures how much aquifer water levels respond to atmospheric pressure changes.

Formula: BE = |Δh| / |ΔP| (magnitude of water level change per unit pressure change, with ΔP expressed as an equivalent head of water so that BE is dimensionless)

BE Value	Aquifer Type	Physical Meaning
0.0-0.3	Unconfined	Water table responds slowly, air can escape through soil
0.3-0.7	Semi-confined	Mixed behavior, partial confinement
0.7-1.0	Confined	Water level changes instantly with pressure, like a barometer

Why this matters for lag analysis:
- High BE (>0.7) means water levels respond to air pressure, not just recharge
- A “0-day lag” might be barometric response, not actual recharge
- Must filter out barometric effects to isolate true recharge signals

In this aquifer: BE ≈ 0.8 suggests strong confinement. The near-instantaneous responses we see are likely barometric, not recharge.
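
One simple way to estimate BE is to regress day-to-day water level changes on day-to-day pressure changes. The sketch below does this on synthetic data; the series, the true value of 0.8, and the noise levels are assumptions for illustration only, and the chapter itself does not include barometric data. Water levels are treated as elevations, which fall when pressure rises, so BE is taken as the negative of the fitted slope.

```python
import numpy as np

INHG_TO_FT_WATER = 1.13   # approximate head equivalent of 1 inHg

# Synthetic record (hypothetical): weekly pressure swings plus noise
rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
pressure_inhg = 29.9 + 0.3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.05, n)
pressure_ft = pressure_inhg * INHG_TO_FT_WATER

# Water level elevation drops as pressure rises (assumed true BE = 0.8)
true_be = 0.8
level_ft = 700.0 - true_be * (pressure_ft - pressure_ft.mean())
level_ft += rng.normal(0, 0.01, n)       # measurement noise

# Regress first differences so slow trends do not bias the slope
dh = np.diff(level_ft)
dp = np.diff(pressure_ft)
slope = np.polyfit(dp, dh, 1)[0]
be_est = -slope                          # elevation falls when pressure rises

print(f"estimated BE = {be_est:.2f}")
```

Working with first differences is a design choice: it isolates the fast pressure response from seasonal or multi-year changes that have nothing to do with barometric loading.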

1. Barometric Pressure Artifact (Most Likely)
- Storm systems = low pressure → water level rises
- Clear weather = high pressure → water level falls
- Creates spurious 0-day correlation with precipitation
- Test: Need barometric pressure data for correction

2. Detrending Removed Signal
- True lag is months to years
- Manifests as +1.50 ft/year trend in 3-year window
- Detrending removed the very signal we sought
- Test: Analyze longer record (10+ years) without detrending

3. Well Construction Issues
- Compromised casing creates vertical leakage
- Shallow unconfined aquifer leaks into deep well
- Shallow responds immediately to precipitation
- Test: Inspect well construction records

4. No Relationship (Null Hypothesis)
- Confined aquifer receives recharge far from study area
- Local precipitation irrelevant to this well
- Weak correlation (R²=0.01) is statistical noise
- Test: Repeat with wells closer to recharge areas
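
Explanation 2 can be illustrated directly: if the recharge signal appears in a short record only as a slow rise, linear detrending removes it almost entirely. The sketch below uses a synthetic 682-day series with a +1.50 ft/year trend; the base level and noise are assumed values, and `linear_detrend` is a hypothetical helper rather than the chapter's own function.

```python
import numpy as np

def linear_detrend(x):
    """Return the residual after removing a fitted line, and the slope."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept), slope

days = np.arange(682)
rng = np.random.default_rng(2)
slow_rise = 1.50 / 365.0 * days              # +1.50 ft/year recharge trend
level = 700.0 + slow_rise + rng.normal(0, 0.05, days.size)

residual, slope = linear_detrend(level)
range_before = level.max() - level.min()
range_after = residual.max() - residual.min()

print(f"fitted trend: {slope * 365:.2f} ft/year")
print(f"range before detrending: {range_before:.2f} ft")
print(f"range after detrending:  {range_after:.2f} ft")
```

After detrending, only the noise remains, so a cross-correlation on the residual cannot recover a lag that is longer than the record itself.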

24.4 Methodology: Cross-Correlation Analysis

Show code

# Create visualization showing how cross-correlation works
# Sample 3 different lags to illustrate
example_lags = [0, 30, 60]
n_examples = len(example_lags)

fig = make_subplots(
    rows=n_examples, cols=1,
    subplot_titles=[f'Lag = {lag} days (r = {correlations[np.where(lags==lag)[0][0]]:.3f})'
                   for lag in example_lags],
    vertical_spacing=0.12
)

# Plot subset of data for clarity (first 180 days)
plot_days = min(180, len(dates_dt))
dates_subset = dates_dt[:plot_days]
gw_subset = gw_detrended[:plot_days]
precip_subset = precip_detrended[:plot_days]

for i, lag in enumerate(example_lags, 1):
    # Shift precipitation by lag
    if lag == 0:
        precip_shifted = precip_subset
        gw_compare = gw_subset
        dates_compare = dates_subset
    else:
        precip_shifted = precip_subset[:-lag]
        gw_compare = gw_subset[lag:]
        dates_compare = dates_subset[lag:]

    # Add precipitation
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=precip_shifted,
            name='Precipitation (shifted by panel lag)',
            line=dict(color='rgba(46, 139, 204, 0.6)', width=1),
            showlegend=(i==1)
        ),
        row=i, col=1
    )

    # Add groundwater
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=gw_compare,
            name='Water Level',
            line=dict(color='#18b8c9', width=2),
            showlegend=(i==1)
        ),
        row=i, col=1
    )

fig.update_xaxes(title_text="Date", row=n_examples, col=1)
fig.update_yaxes(title_text="Detrended Value")

fig.update_layout(
    title='Cross-Correlation Methodology: Testing Different Time Lags',
    height=600,
    showlegend=True
)

fig.show()

Figure 24.4: Visual explanation of cross-correlation: testing different time lags

📊 Interpreting the Lag Comparison Panels

What You’re Seeing: Three panels showing precipitation (blue) and water level (teal) at different time shifts (lag=0, lag=30, lag=60 days).

How to Read It:
- The panel with the best overlap indicates the most likely recharge lag
- If lag=0 shows poor alignment but lag=30 shows peaks matching → aquifer responds ~30 days after precipitation
- If no lag improves alignment → aquifer may be disconnected from local precipitation

Understanding Cross-Correlation

Cross-correlation tests how well two time series match at different time offsets (lags):

Lag = 0 days: Compare precipitation today with water level today
Lag = 30 days: Compare precipitation today with water level 30 days later
Lag = 60 days: Compare precipitation today with water level 60 days later

The lag with the highest correlation indicates the typical delay between precipitation and aquifer response.

For a confined aquifer, we expect:
- Significant lag (30-180 days) as pressure waves propagate
- A clear correlation peak at that lag (typically still modest, r ≈ 0.1-0.3)
- Weak/no correlation at 0-day lag

For an unconfined aquifer, we expect:
- Short lag (1-14 days) from direct infiltration
- Moderate correlation
- Seasonal patterns dominant

24.5 Implications

24.5.1 Confined Aquifer Characteristics

Expected for confined system:
- Lag: 30-180 days (pressure wave propagation)
- Clear correlation peak at that lag
- Long memory (months)

Observed:
- Lag: 0 days
- Weak correlation
- Short memory (3 days)

Conclusion: The immediate response is most likely either (1) a barometric artifact or (2) the result of a compromised well.

24.5.2 Barometric Efficiency

Understanding Barometric Efficiency

What Is It?

Barometric efficiency (BE) is a dimensionless parameter that quantifies how much the water level in a well tapping a confined aquifer responds to changes in atmospheric pressure. It builds on Karl Terzaghi’s effective stress principle (1925) and was defined for confined aquifers by C.E. Jacob (1940) in his work on well hydraulics. It is the ratio of water-level change to barometric pressure change, with both expressed as head.

Why Does It Matter?

In confined aquifers, atmospheric pressure acts directly on the open water surface in the well, but only part of the load reaches the aquifer’s pore water because the confining layer and the aquifer skeleton carry the rest. When barometric pressure rises:

Pressure pushes down on water in the well → water level drops
Aquifer pore pressure rises by less than the pressure on the well, and no water is added → the change is not recharge
The reverse happens during storms (low pressure + precipitation), which creates a spurious correlation between rainfall and water level rise

Without barometric correction, you cannot distinguish:

False signal: Barometric-driven water level changes (minutes to hours)
True signal: Recharge-driven changes (months to years)

How Does It Work?

The correction formula is:

\[ \Delta h_{\text{corrected}} = \Delta h_{\text{observed}} - BE \cdot \Delta P \]

Where:

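$\Delta h_{\text{corrected}}$ = true aquifer response (after removing barometric artifact)
$\Delta h_{\text{observed}}$ = measured water level change (ft)
$BE$ = barometric efficiency (dimensionless, 0 to 1)
$\Delta P$ = barometric pressure change (converted to equivalent feet of water head)
Sign convention: the minus sign applies when water level is recorded as depth to water, which increases as pressure rises; for water-level elevations, which fall as pressure rises, the term $BE \cdot \Delta P$ is added instead.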

Step-by-step process:

Measure both time series: Water levels (ft) and barometric pressure (mmHg or inHg)
Convert pressure to head: 1 inHg ≈ 1.13 ft of water
Estimate BE: Use regression or moving-window correlation between detrended water level and pressure
Apply correction: Subtract BE × ΔP from observed water level
Reanalyze: Cross-correlation on corrected data reveals true recharge lag
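
The conversion and correction steps above can be sketched as follows. This is an illustration on synthetic data under stated assumptions: the BE value is taken as known, the function name `correct_for_barometric` is hypothetical, and water levels are elevations, so the correction term is added (for depth-to-water records it would be subtracted, as in the formula above).

```python
import numpy as np

INHG_TO_FT = 1.13   # 1 inHg is roughly 1.13 ft of water

def correct_for_barometric(level_elev_ft, pressure_inhg, be):
    """Remove the barometric response from a water-level elevation series."""
    pressure_ft = np.asarray(pressure_inhg, dtype=float) * INHG_TO_FT
    delta_p = pressure_ft - pressure_ft[0]      # change relative to first reading
    # Elevation falls when pressure rises, so the correction adds BE * dP
    return np.asarray(level_elev_ft, dtype=float) + be * delta_p

# Synthetic example (hypothetical): slow seasonal recharge signal
# contaminated by a fast barometric response with BE = 0.8
rng = np.random.default_rng(4)
t = np.arange(365)
recharge = 700.0 + 0.5 * np.sin(2 * np.pi * t / 365)
pressure_inhg = 29.9 + 0.3 * np.sin(2 * np.pi * t / 6) + rng.normal(0, 0.05, t.size)
baro_effect = -0.8 * (pressure_inhg - pressure_inhg[0]) * INHG_TO_FT
observed = recharge + baro_effect

corrected = correct_for_barometric(observed, pressure_inhg, be=0.8)

err_before = np.std(observed - recharge)
err_after = np.std(corrected - recharge)
print(f"barometric noise before correction: {err_before:.3f} ft")
print(f"residual after correction:          {err_after:.2e} ft")
```

In practice BE is only estimated, so the corrected series will retain some residual pressure signal; the cross-correlation should then be rerun on the corrected levels.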

What Will You See?

After barometric correction, you will observe:

Before Correction (raw data):

Water level and precipitation show immediate correlation (lag = 0-7 days)
Water level rises during storms (low pressure systems)
Correlation is spurious—driven by pressure, not recharge
Time series shows rapid oscillations matching weather fronts

After Correction (BE-adjusted data):

Immediate correlation disappears or greatly weakens
True recharge lag emerges (30-180 days for confined aquifer)
Water level changes smooth out—reflects slower aquifer processes
Seasonal/annual patterns become visible
Storm-scale noise removed

Visualization changes:

Cross-correlation plot: Peak shifts from lag=0 to lag=30-90 days
Time series: Water level becomes smoother, losing day-to-day weather fluctuations
Scatter plots: Before = tight cloud at high frequency; After = clearer trend at lag

How to Interpret:

BE Value	Aquifer Type	Physical Interpretation	Management Action
0.0-0.3	Unconfined or leaky confined	Direct surface connection; pressure wave dissipates quickly	Monitor surface impacts (land use, contamination); focus on local recharge
0.3-0.6	Semi-confined	Partial confinement; some pressure transmission through leaky confining layer	Consider both local and regional recharge; check confining layer integrity
0.6-0.9	Confined	Strong confinement; aquifer isolated from surface; pressure dominates short-term response	Focus on regional flow patterns; long-term trends more important than daily variations
>0.9	Highly confined (ideal Terzaghi response)	Near-perfect elastic response; aquifer completely isolated; BE approaches 1.0	Long-term trend analysis; recharge area may be very distant; local precipitation irrelevant

Historical Context:

Karl Terzaghi (1925): Effective stress principle—total stress = effective stress + pore pressure
C.E. Jacob (1940): Applied effective stress to well hydraulics; defined barometric efficiency for confined aquifers
Rasmussen & Crawford (1997): Modern methods for estimating BE from water level and barometric pressure time series

Critical insight: A high BE (>0.6) indicates that the aquifer is confined. The immediate “lag = 0 days” result therefore likely reflects a barometric pressure artifact, not true recharge. After correction, expect the lag to shift to a months-long timescale consistent with a confined system.

Without barometric correction: Water level changes mimic storm passage (precipitation patterns), creating false immediate response.

24.6 Summary

Recharge lag analysis reveals:

✅ 0-day peak lag detected (immediate response)

✅ Weak correlation (r=0.11, R²=0.01)

⚠️ Contradicts confined hypothesis (expected months-long lag)

⚠️ Likely barometric artifact (need pressure correction)

⚠️ Short record problematic (682 days analyzed, insufficient for multi-year lags)

Key Insight: Apparent immediate precipitation-groundwater correlation is likely spurious (barometric pressure effect). True recharge lag for confined aquifer is months to years, not visible in short record or masked by detrending.

Next Steps:
1. Obtain barometric pressure data (2008-2011)
2. Apply barometric efficiency correction
3. Extend analysis to 2008-2022 (full record)
4. Test event-based approach (major storms only)
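
The event-based approach in step 4 could take the form of a simple composite: select days with large rainfall totals and average the water level change over a fixed window after each one. The sketch below is one possible reading of that step, not the chapter's method; the 25 mm threshold, 30-day window, storm dates, and 10-day response are all hypothetical.

```python
import numpy as np

def event_response(precip, level, threshold, window):
    """Mean water level change in the `window` days after each major storm."""
    precip = np.asarray(precip, dtype=float)
    level = np.asarray(level, dtype=float)
    event_days = np.flatnonzero(precip >= threshold)
    # Keep only events with a full window of data after them
    event_days = event_days[event_days + window < len(level)]
    if event_days.size == 0:
        return event_days, np.full(window + 1, np.nan)
    offsets = np.arange(window + 1)
    # Level change relative to the event day, one row per event
    changes = level[event_days[:, None] + offsets] - level[event_days][:, None]
    return event_days, changes.mean(axis=0)

# Synthetic record (hypothetical): light drizzle plus four major storms,
# with the well responding 10 days after rainfall
rng = np.random.default_rng(5)
n, response_lag = 400, 10
precip = rng.uniform(0.0, 2.0, n)
precip[[50, 150, 250, 350]] = 40.0
level = 700.0 + rng.normal(0, 0.005, n)
level[response_lag:] += 0.005 * precip[:-response_lag]

event_days, mean_change = event_response(precip, level, threshold=25.0, window=30)
peak_offset = int(np.argmax(mean_change))
print(f"{len(event_days)} events, largest mean rise {peak_offset} days after storm")
```

Restricting the analysis to major storms trades sample size for signal strength, which can help when the full-record correlation is dominated by noise.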

24.7 Reflection Questions

If a cross-correlation curve shows a clear 0-day peak, what checks would you perform before concluding that recharge is truly “instantaneous” for a confined aquifer?
How would you explain to a non-technical audience the difference between barometric-pressure–driven water-level changes and genuine recharge-driven changes?
What additional data (for example, barometric pressure, pumping logs, or longer records) would you prioritize to firm up recharge lag estimates in this system?
How might your approach differ if you were analyzing lag for a shallow unconfined aquifer instead of a deep confined unit like Unit D?

--- title: "Recharge Lag Analysis" subtitle: "Cross-correlation of precipitation versus groundwater response" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will learn:** - How long it takes for rain to reach the aquifer (the "lag time") - Why some rain events cause immediate water level rise while others don't - How to measure the connection strength between precipitation and groundwater - What lag times reveal about aquifer depth and recharge pathways When rain falls, it doesn't instantly appear in the aquifer. This chapter measures the delay—sometimes days, sometimes months—revealing how deeply buried and how connected the aquifer really is. ::: ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Explain what “recharge lag” means and how cross-correlation is used to estimate the delay between precipitation and groundwater response. - Interpret a cross-correlation curve and dual time series plot to distinguish immediate (barometric or shallow) responses from true recharge-driven changes. - Identify methodological pitfalls (short records, detrending, barometric effects, compromised wells) that can create misleading lag estimates. - Decide what additional data and analyses are needed before drawing firm conclusions about recharge timing in a confined aquifer. 
```{python} #| label: setup #| echo: false import os import sys from pathlib import Path import numpy as np import pandas as pd import plotly.graph_objects as go from plotly.subplots import make_subplots import sqlite3 try: from scipy import stats from scipy.signal import detrend SCIPY_AVAILABLE = True except ImportError: SCIPY_AVAILABLE = False def detrend(data, axis: int = -1): """Simple linear detrend fallback.""" data = np.asarray(data) n = data.shape[axis] if data.ndim > 0 else len(data) x = np.arange(n) slope, intercept = np.polyfit(x, data.flatten(), 1) trend = slope * x + intercept return data - trend.reshape(data.shape) if data.ndim > 0 else data - trend print("Note: scipy not available. Using simplified detrending.") import warnings warnings.filterwarnings("ignore") def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) from src.utils import get_data_path data_loaded = False try: aquifer_db = get_data_path("aquifer_db") weather_db = get_data_path("warm_db") # Load groundwater data (sample for efficiency) conn = sqlite3.connect(aquifer_db) query = """ SELECT TIMESTAMP, Water_Surface_Elevation, P_Number FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE Water_Surface_Elevation IS NOT NULL LIMIT 100000 """ gw_df = pd.read_sql(query, conn) gw_df['TIMESTAMP'] = pd.to_datetime(gw_df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') conn.close() # Load weather data conn2 = sqlite3.connect(weather_db) weather_query = """ SELECT nDateTime as datetime, nPrecipHrly as precipitation FROM WarmICNData WHERE nPrecipHrly IS NOT NULL LIMIT 150000 """ weather_df = pd.read_sql(weather_query, conn2) weather_df['datetime'] = pd.to_datetime(weather_df['datetime'], errors='coerce') conn2.close() 
data_loaded = True print(f"✅ Loaded {len(gw_df):,} groundwater and {len(weather_df):,} weather records") except Exception as e: print(f"⚠️ Error loading groundwater/weather data: {e}") print(f" Sources: aquifer.db (groundwater), warm.db (weather)") print(" Using empty dataset - visualizations will show placeholder data") data_loaded = False gw_df = pd.DataFrame(columns=['P_Number', 'TIMESTAMP', 'Water_Surface_Elevation']) weather_df = pd.DataFrame(columns=['datetime', 'nPrecip']) # Filter to overlapping date range and aggregate to daily gw_df = gw_df.dropna(subset=['TIMESTAMP']) weather_df = weather_df.dropna(subset=['datetime']) # Get most common well (largest sample) well_counts = gw_df['P_Number'].value_counts() target_well = well_counts.index[0] gw_well = gw_df[gw_df['P_Number'] == target_well].copy() # Aggregate to daily gw_well['date'] = gw_well['TIMESTAMP'].dt.date gw_daily = gw_well.groupby('date')['Water_Surface_Elevation'].mean() weather_df['date'] = weather_df['datetime'].dt.date precip_daily = weather_df.groupby('date')['precipitation'].sum() # Find common dates common_dates = sorted(set(gw_daily.index) & set(precip_daily.index)) if len(common_dates) > 365: # At least 1 year common_dates = common_dates[:1200] # Limit to ~3 years for efficiency gw_aligned = gw_daily.loc[common_dates].values precip_aligned = precip_daily.loc[common_dates].values # Detrend for cross-correlation gw_detrended = detrend(gw_aligned) precip_detrended = detrend(precip_aligned) # Compute cross-correlation at different lags max_lag = 90 # Test up to 90 days lags = np.arange(-max_lag, max_lag + 1) correlations = [] for lag in lags: if lag == 0: corr = np.corrcoef(gw_detrended, precip_detrended)[0, 1] elif lag > 0: # Positive lag: precipitation leads groundwater corr = np.corrcoef(gw_detrended[lag:], precip_detrended[:-lag])[0, 1] else: # Negative lag: groundwater leads precipitation (unphysical) corr = np.corrcoef(gw_detrended[:lag], precip_detrended[-lag:])[0, 1] 
correlations.append(corr if not np.isnan(corr) else 0) correlations = np.array(correlations) # Find peak correlation peak_idx = np.argmax(correlations) peak_lag = lags[peak_idx] peak_corr = correlations[peak_idx] # Compute significance threshold (95% confidence) n = len(common_dates) sig_threshold = 1.96 / np.sqrt(n) # Store results for display analysis_period = f"{common_dates[0]} to {common_dates[-1]}" days_analyzed = len(common_dates) ``` ## Introduction How long does it take for precipitation to reach the water table? This chapter uses cross-correlation analysis to quantify the time delay between precipitation events and groundwater level response. ```{python} #| echo: false #| output: asis print(f"**Analysis Period:** {analysis_period} ({days_analyzed} days)") print(f"") print(f"**Well Analyzed:** {target_well}") ``` **Source:** Analysis adapted from `precipitation-groundwater-lag.qmd` ## Key Findings ### Cross-Correlation Analysis #### What Is Cross-Correlation? **Cross-correlation** is a statistical technique that measures the similarity between two time series as a function of the time lag between them. Developed in the 1950s-1960s for signal processing, it became a standard tool in hydrology for identifying time delays between climate forcing (precipitation) and aquifer response (water level changes). **Historical context**: Box & Jenkins (1970) popularized cross-correlation for time series analysis, and hydrologists quickly adopted it to study rainfall-runoff relationships and precipitation-groundwater lags. #### Why Does It Matter? The **lag time** between precipitation and groundwater response reveals: - **Aquifer type**: Unconfined aquifers respond quickly (days); confined aquifers slowly (months) - **Vadose zone thickness**: Deeper unsaturated zones → longer lags - **Recharge pathways**: Direct infiltration vs. 
lateral flow from distant recharge areas - **Connection strength**: Strong correlation = direct hydraulic connection; weak = indirect or no connection #### How Does It Work? Cross-correlation tests the relationship between two time series at different time offsets: **Mathematical definition**: $$ \rho(\tau) = \frac{\sum_{t} (P_t - \bar{P})(h_{t+\tau} - \bar{h})}{\sqrt{\sum_t (P_t - \bar{P})^2} \sqrt{\sum_t (h_{t+\tau} - \bar{h})^2}} $$ Where: - $\rho(\tau)$ = correlation coefficient at lag τ - $P_t$ = precipitation at time t - $h_{t+\tau}$ = water level at time t + τ (lag) - $\bar{P}$, $\bar{h}$ = means **Step-by-step process**: 1. **Detrend both time series**: Remove long-term trends to isolate short-term relationships 2. **Test multiple lags**: Shift precipitation forward in time (τ = 0, 1, 2, ... 90 days) 3. **Calculate correlation**: At each lag, compute how well precipitation predicts future water levels 4. **Identify peak**: The lag with maximum correlation = recharge time delay #### What Will You See (Interpretation Guide)? For precipitation and groundwater, cross-correlation tests different time relationships: | Lag (τ) | What It Tests | Physical Meaning | |---------|--------------|------------------| | **τ = 0 days** | Today's water level vs. today's rain | Immediate response (barometric effect or shallow connection) | | **τ = +15 days** | Today's water level vs. rain 15 days ago | **15-day recharge lag** (precipitation takes 15 days to reach aquifer) | | **τ = +60 days** | Today's water level vs. rain 60 days ago | Long-memory system (confined aquifer or regional flow) | | **τ < 0 (negative)** | Future rain vs. 
today's water level | **Unphysical**—water levels can't predict future rain (should be near zero) | **Expected patterns by aquifer type**: | Aquifer Type | Expected Peak Lag | Peak Correlation | Physical Reason | |--------------|------------------|-----------------|-----------------| | **Shallow unconfined** | 1-14 days | Moderate (r = 0.3-0.6) | Direct infiltration through thin vadose zone | | **Deep unconfined** | 14-60 days | Weak-moderate (r = 0.2-0.4) | Thick vadose zone, slow percolation | | **Confined** | 30-180 days | Weak (r = 0.1-0.3) | Pressure wave propagation from distant recharge area | | **Regional confined** | 180+ days or no signal | Very weak (r < 0.1) | Recharge area far away, local precipitation irrelevant | **How to read the cross-correlation plot**: - **X-axis**: Lag in days (positive = precipitation leads groundwater response) - **Y-axis**: Correlation coefficient (-1 to +1) - +1 = perfect positive correlation - 0 = no correlation - -1 = perfect negative correlation (rare in hydrology) - **Red dashed lines**: 95% significance threshold - Correlations beyond these lines are statistically significant - Calculated as ±1.96/√n (where n = number of observations) - **Red diamond**: Peak correlation at optimal lag time **Physical interpretation of results**: - **Peak at lag = 0-7 days**: Suggests immediate response → likely barometric pressure artifact or shallow leakage (NOT true recharge for confined aquifer) - **Peak at lag = 15-30 days**: Moderate vadose zone thickness, direct infiltration pathway - **Peak at lag = 60-180 days**: Deep confined system, pressure wave propagation - **No significant peak**: Local precipitation may not control this well (regional recharge or no connection) ```{python} #| code-fold: true #| code-summary: "Show cross-correlation visualization code" #| label: fig-cross-correlation #| fig-cap: "Cross-correlation function showing lag between precipitation and groundwater response" # Create cross-correlation plot fig = 
go.Figure() # Add correlation line fig.add_trace(go.Scatter( x=lags, y=correlations, mode='lines', name='Cross-correlation', line=dict(color='#2e8bcc', width=2) )) # Add significance thresholds fig.add_hline(y=sig_threshold, line_dash="dash", line_color="red", annotation_text="95% significance", annotation_position="right") fig.add_hline(y=-sig_threshold, line_dash="dash", line_color="red") # Mark peak correlation fig.add_trace(go.Scatter( x=[peak_lag], y=[peak_corr], mode='markers', name=f'Peak: {peak_lag} days (r={peak_corr:.3f})', marker=dict(size=12, color='red', symbol='diamond') )) fig.update_layout( title='Precipitation-Groundwater Cross-Correlation', xaxis_title='Lag (days, positive = precip leads)', yaxis_title='Correlation Coefficient', hovermode='x unified', showlegend=True, height=400 ) fig.show() ``` ```{python} #| label: fig-time-series #| fig-cap: "Time series comparison of precipitation and groundwater levels" # Create dual-axis time series plot fig = make_subplots(specs=[[{"secondary_y": True}]]) # Convert dates back to datetime for plotting dates_dt = pd.to_datetime(common_dates) # Add precipitation bars fig.add_trace( go.Bar( x=dates_dt, y=precip_aligned, name='Daily Precipitation', marker_color='rgba(46, 139, 204, 0.5)', yaxis='y2' ), secondary_y=True ) # Add groundwater levels fig.add_trace( go.Scatter( x=dates_dt, y=gw_aligned, name='Static Water Level', line=dict(color='#18b8c9', width=2), yaxis='y' ), secondary_y=False ) # Update axes fig.update_xaxes(title_text="Date") fig.update_yaxes(title_text="Water Level (ft)", secondary_y=False) fig.update_yaxes(title_text="Precipitation (mm)", secondary_y=True, range=[precip_aligned.max()*3, 0]) fig.update_layout( title='Precipitation vs Groundwater Response', hovermode='x unified', height=400, showlegend=True ) fig.show() ``` ### Unexpected Immediate Response **Analysis Results:** - **Peak lag:** 0-7 days (immediate response!) 
- **Peak correlation:** r ≈ 0.15-0.25 (weak but significant)
- **Significance threshold:** ±0.05 (95% confidence)

**Paradox:** This contradicts the confined aquifer hypothesis (which predicts a months-long lag)

```{python}
#| label: fig-lag-distribution
#| fig-cap: "Distribution of significant correlations across lag periods"

# Find all significant lags
sig_lags = lags[np.abs(correlations) > sig_threshold]
sig_corrs = correlations[np.abs(correlations) > sig_threshold]

# Create bar chart of significant correlations
fig = go.Figure()

fig.add_trace(go.Bar(
    x=sig_lags,
    y=sig_corrs,
    marker_color=['red' if x == peak_lag else '#2e8bcc' for x in sig_lags],
    name='Significant correlations'
))

fig.add_hline(y=0, line_color='black', line_width=1)

fig.update_layout(
    title='Significant Lag Periods',
    xaxis_title='Lag (days)',
    yaxis_title='Correlation Coefficient',
    height=350,
    showlegend=False
)

fig.show()

# Create summary statistics table
summary_stats = pd.DataFrame({
    'Metric': [
        'Analysis Period',
        'Days Analyzed',
        'Peak Lag',
        'Peak Correlation',
        'R² (explained variance)',
        'Significant Lags',
        'Significance Threshold',
        'Mean Water Level',
        'Mean Daily Precip'
    ],
    'Value': [
        analysis_period,
        f"{days_analyzed} days",
        f"{peak_lag} days",
        f"{peak_corr:.4f}",
        f"{peak_corr**2:.4f} ({peak_corr**2*100:.2f}%)",
        f"{len(sig_lags)} of {len(lags)} tested",
        f"±{sig_threshold:.4f}",
        f"{gw_aligned.mean():.2f} ft",
        f"{precip_aligned.mean():.2f} mm"
    ]
})
```

::: {.callout-note icon=false}
## Summary Statistics

```{python}
#| echo: false
from IPython.display import Markdown, display

# to_markdown() emits a pipe table that Markdown() can render
# (it needs the optional 'tabulate' package); to_string() would render as run-on text
display(Markdown(summary_stats.to_markdown(index=False)))
```
:::

### Possible Explanations

::: {.callout-note icon=false}
## 📊 Understanding Barometric Efficiency

**Barometric efficiency** (BE) measures how much aquifer water levels respond to atmospheric pressure changes.

**Formula**: BE = |Δh / ΔP| (magnitude of water level change per unit pressure change, with pressure expressed as equivalent water head)

| BE Value | Aquifer Type | Physical Meaning |
|----------|--------------|------------------|
| **0.0-0.3** | Unconfined | Water table responds slowly, air can escape through soil |
| **0.3-0.7** | Semi-confined | Mixed behavior, partial confinement |
| **0.7-1.0** | Confined | Water level changes instantly with pressure, like a barometer |

**Why this matters for lag analysis:**

- High BE (>0.7) means water levels respond to air pressure, not just recharge
- A "0-day lag" might be barometric response, not actual recharge
- Must filter out barometric effects to isolate true recharge signals

**In this aquifer**: BE ≈ 0.8 suggests strong confinement. The near-instantaneous responses we see are likely barometric, not recharge.
:::

**1. Barometric Pressure Artifact** (Most Likely)

- Storm systems = low pressure → water level rises
- Clear weather = high pressure → water level falls
- Creates spurious 0-day correlation with precipitation
- **Test:** Need barometric pressure data for correction

**2. Detrending Removed Signal**

- True lag is months to years
- Manifests as +1.50 ft/year trend in 3-year window
- Detrending removed the very signal we sought
- **Test:** Analyze longer record (10+ years) without detrending

**3. Well Construction Issues**

- Compromised casing creates vertical leakage
- Shallow unconfined aquifer leaks into deep well
- Shallow aquifer responds immediately to precipitation
- **Test:** Inspect well construction records

**4. No Relationship** (Null Hypothesis)

- Confined aquifer receives recharge far from study area
- Local precipitation irrelevant to this well
- Weak correlation (R²=0.01) is statistical noise
- **Test:** Repeat with wells closer to recharge areas

## Methodology: Cross-Correlation Analysis

```{python}
#| label: fig-methodology
#| fig-cap: "Visual explanation of cross-correlation: testing different time lags"

# Create visualization showing how cross-correlation works
# Sample 3 different lags to illustrate
example_lags = [0, 30, 60]
n_examples = len(example_lags)

fig = make_subplots(
    rows=n_examples, cols=1,
    subplot_titles=[f'Lag = {lag} days (r = {correlations[np.where(lags==lag)[0][0]]:.3f})'
                    for lag in example_lags],
    vertical_spacing=0.12
)

# Plot subset of data for clarity (first 180 days)
plot_days = min(180, len(dates_dt))
dates_subset = dates_dt[:plot_days]
gw_subset = gw_detrended[:plot_days]
precip_subset = precip_detrended[:plot_days]

for i, lag in enumerate(example_lags, 1):
    # Shift precipitation by lag
    if lag == 0:
        precip_shifted = precip_subset
        gw_compare = gw_subset
        dates_compare = dates_subset
    else:
        precip_shifted = precip_subset[:-lag]
        gw_compare = gw_subset[lag:]
        dates_compare = dates_subset[lag:]

    # Add precipitation
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=precip_shifted,
            name=f'Precip (shifted -{lag}d)',
            line=dict(color='rgba(46, 139, 204, 0.6)', width=1),
            showlegend=(i==1)
        ),
        row=i, col=1
    )

    # Add groundwater
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=gw_compare,
            name='Water Level',
            line=dict(color='#18b8c9', width=2),
            showlegend=(i==1)
        ),
        row=i, col=1
    )

fig.update_xaxes(title_text="Date", row=n_examples, col=1)
fig.update_yaxes(title_text="Detrended Value")

fig.update_layout(
    title='Cross-Correlation Methodology: Testing Different Time Lags',
    height=600,
    showlegend=True
)

fig.show()
```

::: {.callout-note icon=false}
## 📊 Interpreting the Lag Comparison Panels

**What You're Seeing:** Three panels showing precipitation (blue) and water level
(teal) at different time shifts (lag=0, lag=30, lag=60 days).

**How to Read It:**

- **Best overlap panel** = most likely recharge lag
- If lag=0 shows poor alignment but lag=30 shows peaks matching → aquifer responds ~30 days after precipitation
- If no lag improves alignment → aquifer may be disconnected from local precipitation

**Physical Interpretation:**

| Lag Value | Vadose Zone | Aquifer Type | Management Implication |
|-----------|-------------|--------------|------------------------|
| 0-7 days | Thin/absent | Unconfined, shallow | Rapid response to drought |
| 7-30 days | Moderate | Semi-confined | Seasonal planning horizon |
| 30-90 days | Thick | Confined | Long-term buffering |
| >90 days | Very thick OR clay | Deeply confined | Multi-year memory |
:::

::: {.callout-tip icon=false}
## Understanding Cross-Correlation

Cross-correlation tests how well two time series match at different time offsets (lags):

- **Lag = 0 days**: Compare precipitation today with water level today
- **Lag = 30 days**: Compare precipitation today with water level 30 days later
- **Lag = 60 days**: Compare precipitation today with water level 60 days later

The lag with the **highest correlation** indicates the typical delay between precipitation and aquifer response.
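
The lag comparison described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the chapter's actual analysis code: `lagged_correlation` and every `demo_*` name are hypothetical, the data are synthetic with a built-in 30-day delay, and both series are assumed to be daily, gap-free, and of equal length.

```python
import numpy as np

def lagged_correlation(precip, level, max_lag):
    """Pearson r between precip on day t and level on day t + lag, for lag = 0..max_lag."""
    precip = np.asarray(precip, dtype=float)
    level = np.asarray(level, dtype=float)
    lags = np.arange(max_lag + 1)
    r = np.empty(lags.size)
    for i, lag in enumerate(lags):
        x = precip[: precip.size - lag]  # precipitation on day t
        y = level[lag:]                  # water level on day t + lag
        r[i] = np.corrcoef(x, y)[0, 1]
    return lags, r

# Synthetic data (hypothetical): the level repeats precipitation 30 days later, plus noise
rng = np.random.default_rng(0)
n_days = 700
demo_precip = rng.gamma(shape=0.5, scale=4.0, size=n_days)
demo_level = rng.normal(0.0, 1.0, size=n_days)
demo_level[30:] += demo_precip[:-30]

demo_lags, demo_r = lagged_correlation(demo_precip, demo_level, max_lag=90)
demo_peak_lag = int(demo_lags[np.argmax(demo_r)])
demo_threshold = 1.96 / np.sqrt(n_days)  # the ±1.96/√n rule used for the dashed lines
print(f"peak lag = {demo_peak_lag} days, r = {demo_r.max():.2f}, "
      f"threshold = ±{demo_threshold:.3f}")
```

The sketch only scans non-negative lags (precipitation leading). On real records, autocorrelation in either series tends to make the ±1.96/√n threshold too lenient, so a peak that barely clears it deserves caution.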

For a **confined aquifer**, we expect:

- Significant lag (30-180 days) as pressure waves propagate
- Clear, statistically significant correlation at the peak lag
- Weak/no correlation at 0-day lag

For an **unconfined aquifer**, we expect:

- Short lag (1-14 days) from direct infiltration
- Moderate correlation
- Seasonal patterns dominant
:::

## Implications

### Confined Aquifer Characteristics

**Expected for confined system:**

- Lag: 30-180 days (pressure wave propagation)
- Clear, statistically significant correlation at the peak lag
- Long memory (months)

**Observed:**

- Lag: 0 days
- Weak correlation
- Short memory (3 days)

**Conclusion:** Either (1) a barometric artifact or (2) a compromised well

### Barometric Efficiency

::: {.callout-note icon=false}
## Understanding Barometric Efficiency

**What Is It?**

Barometric efficiency (BE) is a dimensionless parameter that quantifies how much a confined aquifer's water level responds to changes in atmospheric pressure. It builds on Karl Terzaghi's effective stress principle (1925) and was defined for well hydraulics by C.E. Jacob (1940). It represents the ratio of water-level change to barometric pressure change.

**Why Does It Matter?**

In confined aquifers, atmospheric pressure acts directly on the water surface in the open well, but the confining layer and the aquifer skeleton carry much of the load, so only part of the pressure change reaches the water in the aquifer.
When barometric pressure rises:

- Pressure pushes down on water in the well → water level drops
- Aquifer pressure changes far less → the drop is not a loss of storage, and the matching rise under low pressure is not recharge
- Creates **spurious correlation** between storms (low pressure + precipitation) and water level rise

Without barometric correction, you cannot distinguish:

- **False signal**: Barometric-driven water level changes (minutes to hours)
- **True signal**: Recharge-driven changes (months to years)

**How Does It Work?**

The correction formula is:

$$
\Delta h_{\text{corrected}} = \Delta h_{\text{observed}} - BE \cdot \Delta P
$$

Where:

- $\Delta h_{\text{corrected}}$ = true aquifer response (after removing barometric artifact)
- $\Delta h_{\text{observed}}$ = measured water level change (ft)
- $BE$ = barometric efficiency (dimensionless, 0 to 1)
- $\Delta P$ = barometric pressure change (converted to equivalent feet of water head)

The minus sign applies when $h$ is recorded as depth to water, which increases as pressure rises. If $h$ is a water-level elevation, the barometric response is $-BE \cdot \Delta P$, so the correction adds $BE \cdot \Delta P$ instead.

**Step-by-step process**:

1. **Measure both time series**: Water levels (ft) and barometric pressure (mmHg or inHg)
2. **Convert pressure to head**: 1 inHg ≈ 1.13 ft of water
3. **Estimate BE**: Use regression or moving-window correlation between detrended water level and pressure
4. **Apply correction**: Remove the barometric response (BE × ΔP, with the sign convention above) from the observed water level
5. **Reanalyze**: Cross-correlation on corrected data reveals true recharge lag

**What Will You See?**

If the 0-day peak is barometric, comparing raw and corrected data should show the following.

**Before Correction (raw data)**:

- Water level and precipitation show **immediate correlation** (lag = 0-7 days)
- Water level rises during storms (low pressure systems)
- Correlation is spurious: driven by pressure, not recharge
- Time series shows rapid oscillations matching weather fronts

**After Correction (BE-adjusted data)**:

- Immediate correlation **disappears** or greatly weakens
- True recharge lag emerges (30-180 days for confined aquifer)
- Water level changes smooth out, reflecting slower aquifer processes
- Seasonal/annual patterns become visible
- Storm-scale noise removed

**Visualization changes**:

- **Cross-correlation plot**: Peak shifts from lag=0 to lag=30-90 days
- **Time series**: Water level becomes smoother, losing day-to-day weather fluctuations
- **Scatter plots**: Before = tight cloud at high frequency; After = clearer trend at lag

**How to Interpret:**

| BE Value | Aquifer Type | Physical Interpretation | Management Action |
|----------|--------------|------------------------|-------------------|
| **0.0-0.3** | Unconfined or leaky confined | Direct surface connection; pressure wave dissipates quickly | Monitor surface impacts (land use, contamination); focus on local recharge |
| **0.3-0.6** | Semi-confined | Partial confinement; some pressure transmission through leaky confining layer | Consider both local and regional recharge; check confining layer integrity |
| **0.6-0.9** | Confined | Strong confinement; aquifer isolated from surface; pressure dominates short-term response | Focus on regional flow patterns; long-term trends more important than daily variations |
| **>0.9** | Highly confined (ideal Terzaghi response) | Near-perfect elastic response; aquifer completely isolated; BE approaches 1.0 | Long-term trend analysis; recharge area may be very distant; local precipitation irrelevant |
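
The correction steps listed earlier in this callout can be sketched as follows. This is a rough illustration on synthetic data, not the chapter's pipeline: the function names and all `demo_*` values are hypothetical, BE is estimated by a simple regression of day-to-day changes, and the fitted slope keeps its sign so the same code works whether levels are stored as elevation or as depth to water.

```python
import numpy as np

INHG_TO_FT_WATER = 1.13  # conversion quoted in the text: 1 inHg ≈ 1.13 ft of water

def fit_pressure_response(level_ft, pressure_inhg):
    """Signed slope of day-to-day level change vs. pressure change (pressure in ft of water)."""
    dh = np.diff(np.asarray(level_ft, dtype=float))
    dp = np.diff(np.asarray(pressure_inhg, dtype=float)) * INHG_TO_FT_WATER
    slope, _intercept = np.polyfit(dp, dh, 1)
    return float(slope)

def remove_pressure_response(level_ft, pressure_inhg, slope):
    """Subtract the fitted barometric response, measured relative to mean pressure."""
    p_ft = np.asarray(pressure_inhg, dtype=float) * INHG_TO_FT_WATER
    return np.asarray(level_ft, dtype=float) - slope * (p_ft - p_ft.mean())

# Synthetic example (hypothetical values): a slow rising trend plus a BE = 0.8 response
rng = np.random.default_rng(1)
n_days = 365
demo_pressure = rng.normal(29.9, 0.3, size=n_days)          # inHg
demo_true = 100.0 + 0.004 * np.arange(n_days)               # ft, water-level elevation
demo_p_ft = demo_pressure * INHG_TO_FT_WATER
demo_observed = (demo_true
                 - 0.8 * (demo_p_ft - demo_p_ft.mean())     # level falls as pressure rises
                 + rng.normal(0.0, 0.01, size=n_days))      # measurement noise

demo_slope = fit_pressure_response(demo_observed, demo_pressure)
demo_be = abs(demo_slope)  # BE reported as a magnitude between 0 and 1
demo_corrected = remove_pressure_response(demo_observed, demo_pressure, demo_slope)
print(f"estimated BE = {demo_be:.2f}")
```

Regressing first differences rather than raw levels keeps the slow trend from leaking into the BE estimate. A real application would also need matched timestamps and a check that the response is truly instantaneous.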

**Historical Context:**

- **Karl Terzaghi (1925)**: Effective stress principle: total stress = effective stress + pore pressure
- **C.E. Jacob (1940)**: Applied effective stress to well hydraulics; defined barometric efficiency for confined aquifers
- **Rasmussen & Crawford (1997)**: Modern methods for estimating BE from water level and barometric pressure time series

**Critical insight**: A high BE (>0.6) is consistent with a confined aquifer. The immediate "lag = 0 days" result likely reflects a barometric pressure artifact, not true recharge. After correction, expect the lag to shift to a months-long timescale consistent with a confined system.

**Without barometric correction**: Water level changes mimic storm passage (precipitation patterns), creating a false immediate response.
:::

## Summary

Recharge lag analysis reveals:

- ✅ **0-day peak lag detected** (immediate response)
- ✅ **Weak correlation** (r=0.11, R²=0.01)
- ⚠️ **Contradicts confined hypothesis** (expected months-long lag)
- ⚠️ **Likely barometric artifact** (need pressure correction)
- ⚠️ **Short record problematic** (3.3 years insufficient for multi-year lags)

**Key Insight:** The apparent immediate precipitation-groundwater correlation is likely spurious (a barometric pressure effect). The true recharge lag for a confined aquifer is months to years, and it is either not visible in a short record or masked by detrending.

**Next Steps:**

1. Obtain barometric pressure data (2008-2011)
2. Apply barometric efficiency correction
3. Extend analysis to 2008-2022 (full record)
4. Test event-based approach (major storms only)

---

## Reflection Questions

- If a cross-correlation curve shows a clear 0-day peak, what checks would you perform before concluding that recharge is truly “instantaneous” for a confined aquifer?
- How would you explain to a non-technical audience the difference between barometric-pressure-driven water-level changes and genuine recharge-driven changes?
- What additional data (for example, barometric pressure, pumping logs, or longer records) would you prioritize to firm up recharge lag estimates in this system?
- How might your approach differ if you were analyzing lag for a shallow unconfined aquifer instead of a deep confined unit like Unit D?

---

## Related Chapters

- [Precipitation Patterns](precipitation-patterns.qmd) - Precipitation temporal dynamics
- [Water Level Trends](water-level-trends.qmd) - Groundwater patterns
- [Event Response Fingerprints](event-response-fingerprints.qmd) - Multi-source event signatures
