✅ Loaded 100,000 groundwater and 150,000 weather records
24 Recharge Lag Analysis
Cross-correlation of precipitation versus groundwater response
24.1 What You Will Learn in This Chapter
By the end of this chapter, you will be able to:
- Explain what “recharge lag” means and how cross-correlation is used to estimate the delay between precipitation and groundwater response.
- Interpret a cross-correlation curve and dual time series plot to distinguish immediate (barometric or shallow) responses from true recharge-driven changes.
- Identify methodological pitfalls (short records, detrending, barometric effects, compromised wells) that can create misleading lag estimates.
- Decide what additional data and analyses are needed before drawing firm conclusions about recharge timing in a confined aquifer.
24.2 Introduction
How long does it take for precipitation to reach the water table? This chapter uses cross-correlation analysis to quantify the time delay between precipitation events and groundwater level response.
Analysis Period: 2010-07-16 to 2012-06-05 (682 days)
Well Analyzed: 434983
Source: Analysis adapted from precipitation-groundwater-lag.qmd
24.3 Key Findings
24.3.1 Cross-Correlation Analysis
What Is Cross-Correlation?
Cross-correlation is a statistical technique that measures the similarity between two time series as a function of the time lag between them. Developed in the 1950s-1960s for signal processing, it became a standard tool in hydrology for identifying time delays between climate forcing (precipitation) and aquifer response (water level changes).
Historical context: Box & Jenkins (1970) popularized cross-correlation for time series analysis, and hydrologists quickly adopted it to study rainfall-runoff relationships and precipitation-groundwater lags.
Why Does It Matter?
The lag time between precipitation and groundwater response reveals:
- Aquifer type: Unconfined aquifers respond quickly (days); confined aquifers slowly (months)
- Vadose zone thickness: Deeper unsaturated zones → longer lags
- Recharge pathways: Direct infiltration vs. lateral flow from distant recharge areas
- Connection strength: Strong correlation = direct hydraulic connection; weak = indirect or no connection
How Does It Work?
Cross-correlation tests the relationship between two time series at different time offsets:
Mathematical definition: \[ \rho(\tau) = \frac{\sum_{t} (P_t - \bar{P})(h_{t+\tau} - \bar{h})}{\sqrt{\sum_t (P_t - \bar{P})^2} \sqrt{\sum_t (h_{t+\tau} - \bar{h})^2}} \]
Where:
- \(\rho(\tau)\) = correlation coefficient at lag τ
- \(P_t\) = precipitation at time t
- \(h_{t+\tau}\) = water level at time t + τ (lag)
- \(\bar{P}\), \(\bar{h}\) = means

Step-by-step process:
1. Detrend both time series: Remove long-term trends to isolate short-term relationships
2. Test multiple lags: Shift precipitation forward in time (τ = 0, 1, 2, … 90 days)
3. Calculate correlation: At each lag, compute how well precipitation predicts future water levels
4. Identify peak: The lag with maximum correlation = recharge time delay
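The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the chapter's actual implementation: the function names `detrend_linear` and `lagged_cross_correlation`, the straight-line detrending via `np.polyfit`, and the synthetic test series are all assumptions made for the example.

```python
import numpy as np

def detrend_linear(x):
    """Remove a least-squares straight line from a 1-D series (assumed detrending method)."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def lagged_cross_correlation(precip, level, max_lag=90):
    """Pearson correlation between precip[t] and level[t + lag] for lag = 0..max_lag."""
    p = detrend_linear(np.asarray(precip, dtype=float))
    h = detrend_linear(np.asarray(level, dtype=float))
    lags = np.arange(max_lag + 1)
    corrs = np.empty(len(lags))
    for i, lag in enumerate(lags):
        p_lead = p[: len(p) - lag]   # precipitation at time t
        h_resp = h[lag:]             # water level at time t + lag
        corrs[i] = np.corrcoef(p_lead, h_resp)[0, 1]
    peak = int(np.argmax(corrs))
    return lags, corrs, int(lags[peak]), float(corrs[peak])

# Synthetic check: the water level echoes precipitation 20 days later, plus noise.
rng = np.random.default_rng(0)
n = 700
precip = rng.gamma(shape=0.5, scale=4.0, size=n)
level = np.zeros(n)
level[20:] = 0.05 * precip[:-20]
level += rng.normal(0.0, 0.05, size=n)

lags, corrs, peak_lag, peak_corr = lagged_cross_correlation(precip, level)
print(f"peak lag = {peak_lag} days, r = {peak_corr:.2f}")
```

Because the synthetic response is built with a known 20-day delay, the recovered peak lag gives a quick check that the sign convention (precipitation leads, water level follows) is implemented the right way round.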
What Will You See (Interpretation Guide)?
For precipitation and groundwater, cross-correlation tests different time relationships:
| Lag (τ) | What It Tests | Physical Meaning |
|---|---|---|
| τ = 0 days | Today’s water level vs. today’s rain | Immediate response (barometric effect or shallow connection) |
| τ = +15 days | Today’s water level vs. rain 15 days ago | 15-day recharge lag (precipitation takes 15 days to reach aquifer) |
| τ = +60 days | Today’s water level vs. rain 60 days ago | Long-memory system (confined aquifer or regional flow) |
| τ < 0 (negative) | Future rain vs. today’s water level | Unphysical—water levels can’t predict future rain (should be near zero) |
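The sign convention in this table can be checked directly with pandas `Series.shift`. The sketch below uses a synthetic daily record (an assumption for illustration; the series names `precip` and `level` and the helper `corr_at_lag` are hypothetical) in which the water level responds to rain that fell 15 days earlier.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
dates = pd.date_range("2011-01-01", periods=400, freq="D")
precip = pd.Series(rng.gamma(0.5, 4.0, size=len(dates)), index=dates, name="precip")

# Hypothetical response: today's level depends on rain 15 days ago, plus noise.
level = 0.1 * precip.shift(15) + rng.normal(0.0, 0.1, size=len(dates))
level.name = "level"

def corr_at_lag(precip, level, lag):
    """Correlate today's level with rain `lag` days ago (negative lag = future rain)."""
    return level.corr(precip.shift(lag))

r_zero = corr_at_lag(precip, level, 0)
r_plus15 = corr_at_lag(precip, level, 15)
r_minus15 = corr_at_lag(precip, level, -15)
print(f"lag 0: {r_zero:.2f}, lag +15: {r_plus15:.2f}, lag -15: {r_minus15:.2f}")
```

Only the +15 day lag should stand out here; the negative lag acts as a sanity check, since a clearly non-zero value there would point to a shared seasonal cycle or an alignment error and not to recharge.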
Expected patterns by aquifer type:
| Aquifer Type | Expected Peak Lag | Peak Correlation | Physical Reason |
|---|---|---|---|
| Shallow unconfined | 1-14 days | Moderate (r = 0.3-0.6) | Direct infiltration through thin vadose zone |
| Deep unconfined | 14-60 days | Weak-moderate (r = 0.2-0.4) | Thick vadose zone, slow percolation |
| Confined | 30-180 days | Weak (r = 0.1-0.3) | Pressure wave propagation from distant recharge area |
| Regional confined | 180+ days or no signal | Very weak (r < 0.1) | Recharge area far away, local precipitation irrelevant |
How to read the cross-correlation plot:
- X-axis: Lag in days (positive = precipitation leads groundwater response)
- Y-axis: Correlation coefficient (-1 to +1)
- +1 = perfect positive correlation
- 0 = no correlation
- -1 = perfect negative correlation (rare in hydrology)
- Red dashed lines: 95% significance threshold
- Correlations beyond these lines are statistically significant
- Calculated as ±1.96/√n (where n = number of observations)
- Red diamond: Peak correlation at optimal lag time
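The ±1.96/√n threshold assumes the observations are serially independent. Daily water levels are usually autocorrelated, which can shrink the effective sample size and make the nominal threshold too lenient. The sketch below computes the nominal threshold and, as an assumption not taken from the chapter, a commonly used effective-sample-size approximation based on the lag-1 autocorrelations of the two series. All function names are hypothetical.

```python
import numpy as np

def significance_threshold(n, z=1.96):
    """Nominal 95% threshold for the correlation of two independent white-noise series."""
    return z / np.sqrt(n)

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1-D series."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

def effective_sample_size(x, y):
    """Approximate n_eff = n * (1 - r1*r2) / (1 + r1*r2) for two AR(1)-like series."""
    r1, r2 = lag1_autocorr(x), lag1_autocorr(y)
    return len(x) * (1 - r1 * r2) / (1 + r1 * r2)

nominal = significance_threshold(400)      # 1.96 / 20 = 0.098

# A random walk is strongly autocorrelated, so its effective sample size collapses.
rng = np.random.default_rng(2)
smooth = np.cumsum(rng.normal(size=400))
n_eff = effective_sample_size(smooth, smooth)
adjusted = significance_threshold(n_eff)
print(f"nominal ±{nominal:.3f}, n_eff = {n_eff:.0f}, adjusted ±{adjusted:.3f}")
```

Since daily precipitation is often only weakly autocorrelated, the adjustment for a precipitation versus water-level pair may be modest; the point of the sketch is that the red dashed lines are a lower bound on the true threshold, not an exact one.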
Physical interpretation of results:
- Peak at lag = 0-7 days: Suggests immediate response → likely barometric pressure artifact or shallow leakage (NOT true recharge for confined aquifer)
- Peak at lag = 15-30 days: Moderate vadose zone thickness, direct infiltration pathway
- Peak at lag = 60-180 days: Deep confined system, pressure wave propagation
- No significant peak: Local precipitation may not control this well (regional recharge or no connection)
Cross-correlation visualization code:
# Imports used by the plotting code in this chapter
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Assumes lags, correlations, sig_threshold, peak_lag and peak_corr
# were computed in an earlier (hidden) analysis cell
# Create cross-correlation plot
fig = go.Figure()
# Add correlation line
fig.add_trace(go.Scatter(
    x=lags,
    y=correlations,
    mode='lines',
    name='Cross-correlation',
    line=dict(color='#2e8bcc', width=2)
))
# Add significance thresholds
fig.add_hline(y=sig_threshold, line_dash="dash", line_color="red",
              annotation_text="95% significance", annotation_position="right")
fig.add_hline(y=-sig_threshold, line_dash="dash", line_color="red")
# Mark peak correlation
fig.add_trace(go.Scatter(
    x=[peak_lag],
    y=[peak_corr],
    mode='markers',
    name=f'Peak: {peak_lag} days (r={peak_corr:.3f})',
    marker=dict(size=12, color='red', symbol='diamond')
))
fig.update_layout(
    title='Precipitation-Groundwater Cross-Correlation',
    xaxis_title='Lag (days, positive = precip leads)',
    yaxis_title='Correlation Coefficient',
    hovermode='x unified',
    showlegend=True,
    height=400
)
fig.show()

Dual time series plot code:
# Create dual-axis time series plot
fig = make_subplots(specs=[[{"secondary_y": True}]])
# Convert dates back to datetime for plotting
dates_dt = pd.to_datetime(common_dates)
# Add precipitation bars
fig.add_trace(
    go.Bar(
        x=dates_dt,
        y=precip_aligned,
        name='Daily Precipitation',
        marker_color='rgba(46, 139, 204, 0.5)',
        yaxis='y2'
    ),
    secondary_y=True
)
# Add groundwater levels
fig.add_trace(
    go.Scatter(
        x=dates_dt,
        y=gw_aligned,
        name='Static Water Level',
        line=dict(color='#18b8c9', width=2),
        yaxis='y'
    ),
    secondary_y=False
)
# Update axes (the reversed range makes precipitation bars hang from the top)
fig.update_xaxes(title_text="Date")
fig.update_yaxes(title_text="Water Level (ft)", secondary_y=False)
fig.update_yaxes(title_text="Precipitation (mm)", secondary_y=True, range=[precip_aligned.max()*3, 0])
fig.update_layout(
    title='Precipitation vs Groundwater Response',
    hovermode='x unified',
    height=400,
    showlegend=True
)
fig.show()

24.3.2 Unexpected Immediate Response
Analysis Results:
- Peak lag: 0-7 days (immediate response!)
- Peak correlation: r ≈ 0.15-0.25 (weak but significant)
- Significance threshold: ±0.05 (95% confidence)

Paradox: This contradicts the confined-aquifer hypothesis, which predicts a lag of months.
Significant-lag chart and summary table code:
# Find all significant lags
sig_lags = lags[np.abs(correlations) > sig_threshold]
sig_corrs = correlations[np.abs(correlations) > sig_threshold]
# Create bar chart of significant correlations
fig = go.Figure()
fig.add_trace(go.Bar(
    x=sig_lags,
    y=sig_corrs,
    marker_color=['red' if x == peak_lag else '#2e8bcc' for x in sig_lags],
    name='Significant correlations'
))
fig.add_hline(y=0, line_color='black', line_width=1)
fig.update_layout(
    title='Significant Lag Periods',
    xaxis_title='Lag (days)',
    yaxis_title='Correlation Coefficient',
    height=350,
    showlegend=False
)
fig.show()
# Create summary statistics table
summary_stats = pd.DataFrame({
    'Metric': [
        'Analysis Period',
        'Days Analyzed',
        'Peak Lag',
        'Peak Correlation',
        'R² (explained variance)',
        'Significant Lags',
        'Significance Threshold',
        'Mean Water Level',
        'Mean Daily Precip'
    ],
    'Value': [
        analysis_period,
        f"{days_analyzed} days",
        f"{peak_lag} days",
        f"{peak_corr:.4f}",
        f"{peak_corr**2:.4f} ({peak_corr**2*100:.2f}%)",
        f"{len(sig_lags)} of {len(lags)} tested",
        f"±{sig_threshold:.4f}",
        f"{gw_aligned.mean():.2f} ft",
        f"{precip_aligned.mean():.2f} mm"
    ]
})

24.3.3 Possible Explanations
1. Barometric Pressure Artifact (Most Likely)
   - Storm systems = low pressure → water level rises
   - Clear weather = high pressure → water level falls
   - Creates spurious 0-day correlation with precipitation
   - Test: Need barometric pressure data for correction
2. Detrending Removed Signal
   - True lag is months to years
   - Manifests as +1.50 ft/year trend in 3-year window
   - Detrending removed the very signal we sought
   - Test: Analyze longer record (10+ years) without detrending
3. Well Construction Issues
   - Compromised casing creates vertical leakage
   - Shallow unconfined aquifer leaks into deep well
   - Shallow responds immediately to precipitation
   - Test: Inspect well construction records
4. No Relationship (Null Hypothesis)
   - Confined aquifer receives recharge far from study area
   - Local precipitation irrelevant to this well
   - Weak correlation (R²=0.01) is statistical noise
   - Test: Repeat with wells closer to recharge areas
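Explanation 2 can be probed by looking at what detrending actually removes. The sketch below works on synthetic data (an assumption; the 1.5 ft/year slope is borrowed from the text only to make the example concrete, and `linear_trend_per_year` is a hypothetical helper). It fits a straight line to a daily water-level series, reports the slope in ft/year, and confirms that none of that slope survives in the detrended residual.

```python
import numpy as np

def linear_trend_per_year(level, days_per_year=365.25):
    """Return (slope in units per year, detrended residual) for a daily series."""
    level = np.asarray(level, dtype=float)
    t = np.arange(len(level))
    slope_per_day, intercept = np.polyfit(t, level, 1)
    residual = level - (slope_per_day * t + intercept)
    return slope_per_day * days_per_year, residual

rng = np.random.default_rng(3)
n_days = 3 * 365
t = np.arange(n_days)
# Synthetic level: slow rise of 1.5 ft/year plus small daily noise.
level = 100.0 + 1.5 * t / 365.25 + rng.normal(0.0, 0.05, size=n_days)

slope, residual = linear_trend_per_year(level)
resid_slope, _ = linear_trend_per_year(residual)
print(f"fitted trend: {slope:.2f} ft/year; trend left after detrending: {resid_slope:.4f} ft/year")
```

If a multi-month recharge signal looks like a slow rise within a short window, it ends up in `slope` and not in `residual`, and the residual is the only series the cross-correlation sees.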
24.4 Methodology: Cross-Correlation Analysis
Lag illustration code:
# Create visualization showing how cross-correlation works
# Sample 3 different lags to illustrate
example_lags = [0, 30, 60]
n_examples = len(example_lags)
fig = make_subplots(
    rows=n_examples, cols=1,
    subplot_titles=[f'Lag = {lag} days (r = {correlations[np.where(lags==lag)[0][0]]:.3f})'
                    for lag in example_lags],
    vertical_spacing=0.12
)
# Plot subset of data for clarity (first 180 days)
plot_days = min(180, len(dates_dt))
dates_subset = dates_dt[:plot_days]
gw_subset = gw_detrended[:plot_days]
precip_subset = precip_detrended[:plot_days]
for i, lag in enumerate(example_lags, 1):
    # Shift precipitation by lag so rain at day t lines up with the level at day t + lag
    if lag == 0:
        precip_shifted = precip_subset
        gw_compare = gw_subset
        dates_compare = dates_subset
    else:
        precip_shifted = precip_subset[:-lag]
        gw_compare = gw_subset[lag:]
        dates_compare = dates_subset[lag:]
    # Add precipitation
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=precip_shifted,
            name=f'Precip (shifted -{lag}d)',
            line=dict(color='rgba(46, 139, 204, 0.6)', width=1),
            showlegend=(i==1)
        ),
        row=i, col=1
    )
    # Add groundwater
    fig.add_trace(
        go.Scatter(
            x=dates_compare,
            y=gw_compare,
            name='Water Level',
            line=dict(color='#18b8c9', width=2),
            showlegend=(i==1)
        ),
        row=i, col=1
    )
fig.update_xaxes(title_text="Date", row=n_examples, col=1)
fig.update_yaxes(title_text="Detrended Value")
fig.update_layout(
    title='Cross-Correlation Methodology: Testing Different Time Lags',
    height=600,
    showlegend=True
)
fig.show()

24.5 Implications
24.5.1 Confined Aquifer Characteristics
Expected for confined system:
- Lag: 30-180 days (pressure wave propagation)
- Strong correlation at the peak lag
- Long memory (months)

Observed:
- Lag: 0 days
- Weak correlation
- Short memory (3 days)

Conclusion: The observed response points to either (1) a barometric artifact or (2) a compromised well.
24.5.2 Barometric Efficiency
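The next steps in the summary call for a barometric efficiency (BE) correction once pressure data are available. The chapter does not specify a method, so the following is only a minimal sketch of one simple approach. Everything in it is an assumption for illustration: the function names, the first-difference regression used to estimate BE, the sign convention (water-level elevation falls when barometric pressure rises), the requirement that pressure be expressed as an equivalent water column in the same units as the level, and the synthetic data.

```python
import numpy as np

def barometric_efficiency(level, baro_head):
    """Estimate BE as minus the regression slope of daily level changes on pressure changes.

    Assumes `level` is a water-level elevation and `baro_head` is barometric
    pressure expressed as an equivalent water column in the same units.
    """
    d_level = np.diff(np.asarray(level, dtype=float))
    d_baro = np.diff(np.asarray(baro_head, dtype=float))
    slope = np.polyfit(d_baro, d_level, 1)[0]
    return -slope

def remove_barometric_effect(level, baro_head, be):
    """Add back the pressure-driven part so the corrected series tracks aquifer head."""
    baro_head = np.asarray(baro_head, dtype=float)
    return np.asarray(level, dtype=float) + be * (baro_head - baro_head.mean())

# Synthetic check with a known BE of 0.6 (hypothetical value).
rng = np.random.default_rng(4)
n = 500
baro = 33.9 + np.cumsum(rng.normal(0.0, 0.05, size=n))   # ft of water, near 1 atm
true_level = np.full(n, 50.0)
observed = true_level - 0.6 * (baro - baro.mean()) + rng.normal(0.0, 0.005, size=n)

be = barometric_efficiency(observed, baro)
corrected = remove_barometric_effect(observed, baro, be)
print(f"estimated BE = {be:.2f}; std before = {observed.std():.3f}, after = {corrected.std():.3f}")
```

Re-running the cross-correlation on the corrected series would show whether the 0-day peak survives; if it disappears, the barometric explanation gains support.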
24.6 Summary
Recharge lag analysis reveals:
✅ 0-day peak lag detected (immediate response)
✅ Weak correlation (r=0.11, R²=0.01)
⚠️ Contradicts confined hypothesis (expected months-long lag)
⚠️ Likely barometric artifact (need pressure correction)
⚠️ Short record problematic (3.3 years insufficient for multi-year lags)
Key Insight: The apparent immediate precipitation-groundwater correlation is likely spurious (a barometric pressure effect). The true recharge lag for a confined aquifer is probably months to years, so it is either not visible in a record this short or was removed by detrending.
Next Steps:
1. Obtain barometric pressure data (2008-2011)
2. Apply barometric efficiency correction
3. Extend analysis to 2008-2022 (full record)
4. Test event-based approach (major storms only)
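The event-based approach in step 4 could be prototyped as follows. This is a sketch under stated assumptions: the 25 mm storm threshold, the 30-day response window, the function name `event_responses`, and the toy data are illustrative choices, not values from the analysis.

```python
import numpy as np
import pandas as pd

def event_responses(precip, level, threshold_mm=25.0, window_days=30):
    """For each day with precip >= threshold, find the day of maximum
    water-level rise within the following window. One row per storm."""
    rows = []
    storm_days = precip.index[(precip >= threshold_mm).to_numpy()]
    for date in storm_days:
        end = date + pd.Timedelta(days=window_days)
        window = level.loc[date:end]
        if len(window) < 2:
            continue  # storm too close to the end of the record
        rise = window - window.iloc[0]
        rows.append({
            "storm_date": date,
            "precip_mm": precip.loc[date],
            "days_to_peak": (rise.idxmax() - date).days,
            "max_rise": rise.max(),
        })
    return pd.DataFrame(rows)

# Toy record: two hypothetical storms, each followed by a level bump 10 days later.
dates = pd.date_range("2011-01-01", periods=120, freq="D")
precip = pd.Series(0.0, index=dates)
precip.iloc[[10, 60]] = [40.0, 30.0]
level = pd.Series(50.0, index=dates)
level.iloc[20] += 0.5
level.iloc[70] += 0.3

events = event_responses(precip, level)
print(events)
```

Restricting the analysis to large storms trades sample size for signal strength: there are far fewer data points, but each one is a clear forcing event, which can make a delayed response easier to see than in a correlation over every dry day.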
24.7 Reflection Questions
- If a cross-correlation curve shows a clear 0-day peak, what checks would you perform before concluding that recharge is truly “instantaneous” for a confined aquifer?
- How would you explain to a non-technical audience the difference between barometric-pressure–driven water-level changes and genuine recharge-driven changes?
- What additional data (for example, barometric pressure, pumping logs, or longer records) would you prioritize to firm up recharge lag estimates in this system?
- How might your approach differ if you were analyzing lag for a shallow unconfined aquifer instead of a deep confined unit like Unit D?