20 Water Level Trends

Long-term trends, seasonal cycles, and temporal decomposition of groundwater dynamics

For Newcomers

You will get: - A visual sense of whether water levels are rising, falling, or stable. - Intuition for seasonal cycles (wet vs dry seasons) and long-term trends. - A first look at how we detect slow changes that might affect future water security.

You can skim the statistical details and focus on the plots and “Key Questions” and “Key Takeaways”. The code blocks implement these analyses; you do not need to follow every line.

20.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Read groundwater-level time series and distinguish between long-term trends, seasonal cycles, and short-term noise.
Explain how STL decomposition and Mann–Kendall tests help quantify trends and seasonality in a way that is robust for environmental data.
Interpret the evidence that this aquifer behaves as a confined system with long memory, and what that implies for drought recovery.
Identify limitations in the monitoring network (gaps, record length) that constrain temporal trend analyses.

20.2 Introduction

Groundwater level trends reveal the fundamental health of aquifer systems. This chapter combines temporal trend detection with seasonal decomposition to separate long-term directional changes from annual cycles and random variability. Understanding these patterns is critical for sustainable water resource management.

Key Questions:

Are water levels rising, falling, or stable over time?
What is the magnitude of seasonal recharge-discharge cycles?
How much variability remains unexplained after removing trend and seasonality?
Do temporal patterns differ between high-quality and medium-quality aquifer zones?

💻 For Computer Scientists

Time Series Analysis: Unlike standard cross-sectional data, time series have temporal dependencies that violate i.i.d. assumptions. Three critical challenges:

Autocorrelation: Values at time t correlate with t-1, t-2, … Standard statistical tests (t-test, regression) give false confidence
Non-Stationarity: Mean/variance change over time (trends, seasonality) - must detrend before applying stationarity-assuming methods
Temporal ordering: Can’t shuffle rows - order contains information

STL Decomposition separates: y(t) = Trend(t) + Seasonal(t) + Residual(t) - Trend: Low-frequency component (months to years) - Seasonal: Fixed-period component (12 months for annual cycle) - Residual: High-frequency noise + anomalies

🌍 For Geologists/Hydrologists

Physical Interpretation of Components:

Trend: Long-term aquifer storage change
- Rising → Recharge exceeds discharge (aquifer recovering)
- Falling → Discharge exceeds recharge (aquifer depleting)
- Stable → System in equilibrium
Seasonal: Annual recharge-discharge cycle
- Spring highs (recharge from snowmelt, spring rains)
- Fall lows (summer evapotranspiration, low precip)
- Amplitude indicates hydraulic connection to surface
Residual: Event-driven responses
- Extreme precipitation → spike up
- Pumping tests → spike down
- Measurement errors → noise

Mann-Kendall Test: Detects monotonic trends without assuming normality - robust for environmental data

❌ Approach That Failed: Linear Regression for Water Level Trend Prediction

What we tried: Initial attempts used standard linear regression to predict future water levels from historical measurements:

# ❌ WRONG APPROACH (violates time series assumptions)
from sklearn.linear_model import LinearRegression
X = df[['year', 'month', 'precipitation']]  # Features
y = df['water_level']  # Target
model = LinearRegression().fit(X, y)
predictions = model.predict(X_test)

Why it failed: Linear regression assumes independent and identically distributed (i.i.d.) observations. Time series data violates this assumption through temporal autocorrelation—today’s water level is highly correlated with yesterday’s level (ACF ≈ 0.5 at 12-month lag). Results: - Artificially high R² scores (>0.95) during training - Poor generalization to future periods (test R² dropped to 0.40) - Residuals showed strong autocorrelation patterns (not random noise) - Confidence intervals were too narrow (underestimated uncertainty by 50-70%)

Lesson learned: Time series data requires time series methods that explicitly model temporal dependencies: - ARIMA/SARIMAX for autoregressive patterns - STL decomposition to separate trend, seasonal, and residual components - Exponential smoothing for forecasting with trend and seasonality - Mann-Kendall test for non-parametric trend detection without normality assumptions

Better approach: Use STL decomposition + Mann-Kendall test (as implemented in this chapter):

# ✅ CORRECT APPROACH (respects temporal structure)
from statsmodels.tsa.seasonal import STL
stl = STL(water_level_series, seasonal=365, robust=True)
decomposition = stl.fit()

# Mann-Kendall for robust trend detection
tau, p_value, trend = mann_kendall_test(decomposition.trend)

Key insight for computer scientists: Time series data is not tabular data with a date column. The temporal ordering contains information that standard ML algorithms (linear regression, random forests, neural networks) don’t capture unless explicitly designed for sequences (RNNs, LSTMs, or proper time series methods).

How to detect you’re making this mistake: 1. Your residuals show patterns when plotted over time (should be random noise) 2. Predictions degrade rapidly as you forecast further into the future 3. Your model performs well on shuffled data (it shouldn’t—time order matters!) 4. ACF plot of residuals shows significant autocorrelation at multiple lags

Impact on management decisions: Using linear regression instead of time series methods led to: - Overly confident predictions (“water level will be 652.3 ± 0.5 ft in 2025”) that were actually ± 2-3 ft - Missing detection of seasonal patterns (model treated them as noise) - Incorrect trend estimates (confounded long-term trend with seasonal fluctuations) - Poor drought impact forecasts (couldn’t model system memory effects)

20.3 Data and Methods

20.3.1 Data Sources

Groundwater Database: - Source: aquifer.db → OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY

Reading the Code in This Chapter

The code blocks in this chapter: - Load water level time series from the database. - Apply standard time-series techniques (trend detection, seasonal decomposition, change-point tests). - Generate the figures you see in the text.

If you are not a programmer, you can treat the code as a recipe and focus on: - The plots, - The explanations around them, - And the “Key Takeaways” at the end of the chapter.

Temporal span: 2009-2022 (13 years)
Measurements: 1,033,355 total readings
Wells analyzed: 18 wells with sufficient coverage

Quality Filtering: - Only wells with ≥730 daily observations (2 years minimum) - Quality-controlled measurements (DTW_FT_Reviewed IS NOT NULL) - Linear interpolation for gaps ≤7 days - Final dataset: 6 wells successfully decomposed (93.8% data completeness)

20.3.2 STL Decomposition Method

Seasonal-Trend Decomposition using Loess (STL):

\[ y(t) = T(t) + S(t) + R(t) \]

Where: - $T(t)$ = Trend component (long-term progression) - $S(t)$ = Seasonal component (annual cycle) - $R(t)$ = Residual component (unexplained variability)

Parameters: - seasonal=365 - Annual period (daily data) - robust=True - Resistant to outliers - model='additive' - Additive decomposition (water levels measured in absolute feet)

20.3.3 Mann-Kendall Trend Test

Non-parametric test for monotonic trends:

\[ S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{sign}(x_j - x_i) \]

\[ \tau = \frac{S}{n(n-1)/2} \]

Where: - $S$ = Kendall score - $\tau$ = Kendall’s tau (correlation coefficient, -1 to +1) - p-value < 0.05 indicates significant trend

20.4 Setup and Data Loading

Show code

import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from datetime import datetime, timedelta
import warnings

warnings.filterwarnings("ignore")

try:
    from scipy import stats
    SCIPY_AVAILABLE = True
except ImportError:
    SCIPY_AVAILABLE = False
    print("Note: scipy not available. Statistical tests will be skipped.")

try:
    from statsmodels.tsa.seasonal import STL
    STATSMODELS_AVAILABLE = True
except ImportError:
    STATSMODELS_AVAILABLE = False
    print("Note: statsmodels not available. STL decomposition will be skipped.")


def find_repo_root(start: Path) -> Path:
    for candidate in [start, *start.parents]:
        if (candidate / "src").exists():
            return candidate
    return start


quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd())))
project_root = find_repo_root(quarto_project)
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

from src.utils import get_data_path

print("Water level trends analysis initialized")

Water level trends analysis initialized

20.5 Temporal Decomposition Analysis

20.5.1 Groundwater Level Decomposition

Understanding STL Decomposition Output

What Is It?

STL (Seasonal-Trend decomposition using Loess) separates a time series into three additive components that together reconstruct the original signal. This visualization shows how to read the 4-panel decomposition output.

Why Does It Matter?

Understanding each component reveals: - Trend: Is the aquifer recovering, declining, or stable over years? - Seasonal: How strong is the annual recharge-discharge cycle? - Residual: What fraction is unpredictable (events, noise, measurement error)?

How to Read the 4-Panel Decomposition:

Panel 1 - Observed (Top) - What: Raw water level measurements as recorded - Look for: Overall range, seasonal oscillations, any obvious patterns - Units: Feet (or meters) - actual water surface elevation - Typical range: 640-660 ft for deep wells, wider for shallow

Panel 2 - Trend Component - What: Long-term progression isolated from seasonal and random variation - Look for: - Rising trend → Aquifer recovery (recharge exceeds withdrawal) - Falling trend → Aquifer depletion (withdrawal exceeds recharge) - Flat trend → System in equilibrium - Physical meaning: Multi-year cumulative balance of recharge vs. discharge - Management: Falling trends require pumping restrictions or demand reduction

Panel 3 - Seasonal Component - What: The repeating annual cycle that occurs every 12 months - Look for: - Amplitude (peak-to-trough): 0.1 ft = confined, 5-20 ft = unconfined - Phase (timing): When does peak occur? Spring = recharge-driven - Physical meaning: - Small amplitude → Deep confined aquifer, isolated from surface - Large amplitude → Shallow unconfined, direct connection to precipitation/ET - Typical pattern: Peak in spring (recharge), low in fall (ET losses)

Panel 4 - Residual Component - What: Everything not explained by trend or seasonality - Look for: - Magnitude: How much scatter remains? Large = noisy/event-driven - Patterns: Clustering suggests missed structure; random scatter = white noise - Physical meaning: - Storm responses, pumping events, measurement errors, barometric effects - Good decomposition → residuals look random (no patterns) - Typical std dev: 0.1-0.5 ft for well-measured wells

How to Interpret Together:

Trend	Seasonal	Residual	System Type	Management Priority
Flat	Small	Small	Deep confined, stable	Long-term monitoring sufficient
Rising	Small	Small	Confined, recovering	Maintain current management
Falling	Small	Small	Confined, depleting	Action needed - reduce pumping
Flat	Large	Small	Unconfined, equilibrium	Seasonal planning critical
Any	Any	Large	Event-responsive or poor data quality	Need higher frequency monitoring

Key Diagnostic: - Trend > Seasonal → Long-term changes dominate (focus on sustainability) - Seasonal > Trend → Annual cycles dominate (focus on seasonal operations) - Residual > 30% variance → System noisy or data quality issues

Show code

# Load groundwater data directly from database with error handling
import sqlite3
data_loaded = False

try:
    aquifer_db = get_data_path("aquifer_db")
    conn = sqlite3.connect(aquifer_db)

    # Query groundwater data - sample for memory efficiency
    # Get distinct wells first, then sample records per well
    query = """
    SELECT
        P_Number as WellID,
        TIMESTAMP,
        DTW_FT_Reviewed as StaticWaterLevel
    FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
    WHERE DTW_FT_Reviewed IS NOT NULL
    """
    gw_df = pd.read_sql_query(query, conn)
    conn.close()

    # CRITICAL: Parse timestamps using US format (M/D/YYYY)
    gw_df['MeasurementDate'] = pd.to_datetime(gw_df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    gw_df = gw_df.dropna(subset=['MeasurementDate', 'StaticWaterLevel'])
    total_records = len(gw_df)

    # Memory optimization: Sample to max 50,000 records while preserving well coverage
    MAX_RECORDS = 50000
    if len(gw_df) > MAX_RECORDS:
        # Keep all wells but sample within each well
        gw_df = gw_df.groupby('WellID', group_keys=False).apply(
            lambda x: x.sample(min(len(x), max(100, MAX_RECORDS // gw_df['WellID'].nunique())), random_state=42)
        )

    data_loaded = True
    print(f"✅ Loaded {total_records:,} records, sampled to {len(gw_df):,} for visualization")

except Exception as e:
    print(f"⚠️ Error loading groundwater from aquifer.db: {e}")
    print(f"   Table: OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY")
    print("   Using empty dataset - visualizations will show placeholder data")
    data_loaded = False
    gw_df = pd.DataFrame(columns=['WellID', 'MeasurementDate', 'Water_Level_ft'])

if data_loaded:
    print(f"Loaded {len(gw_df):,} groundwater measurements")
print(f"Date range: {gw_df['MeasurementDate'].min()} to {gw_df['MeasurementDate'].max()}")

# Filter to wells with substantial temporal coverage
well_counts = gw_df['WellID'].value_counts()
wells_with_coverage = well_counts[well_counts >= 50].index
gw_filtered = gw_df[gw_df['WellID'].isin(wells_with_coverage)].copy()

print(f"Wells with ≥50 measurements: {len(wells_with_coverage)}")

# Select representative well for detailed decomposition
if len(gw_filtered) > 0:
    representative_well = gw_filtered['WellID'].value_counts().index[0]
    gw_well = gw_filtered[gw_filtered['WellID'] == representative_well].copy()
    gw_well = gw_well.sort_values('MeasurementDate')

    # Create time series plot for top wells (use monthly means for efficiency)
    top_wells = wells_with_coverage[:5]

    fig_ts = go.Figure()

    for well_id in top_wells:
        well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy()
        well_data = well_data.sort_values('MeasurementDate')
        well_data.set_index('MeasurementDate', inplace=True)
        # Use monthly means for cleaner visualization and smaller file size
        monthly_mean = well_data['StaticWaterLevel'].resample('M').mean().dropna()
        fig_ts.add_trace(
            go.Scatter(
                x=monthly_mean.index,
                y=monthly_mean.values,
                mode='lines+markers',
                name=f'Well {well_id}',
                marker=dict(size=4),
                line=dict(width=1.5),
                hovertemplate='%{x|%Y-%m}<br>Level: %{y:.2f} ft<extra></extra>'
            )
        )

    fig_ts.update_layout(
        title='Groundwater Level Time Series - Top 5 Wells',
        xaxis=dict(title='Date'),
        yaxis=dict(title='Depth to Water (ft)', autorange='reversed'),
        height=500,
        template='plotly_white',
        hovermode='x unified',
        legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
    )

    fig_ts.show()

    # Create monthly time series
    gw_well.set_index('MeasurementDate', inplace=True)
    gw_monthly = gw_well['StaticWaterLevel'].resample('M').mean()

    # Perform STL decomposition (requires at least 2 full cycles, period=12 for monthly)
    if len(gw_monthly.dropna()) >= 24:
        stl = STL(gw_monthly.dropna(), seasonal=13, robust=True)
        decomposition = stl.fit()

        # Interactive groundwater decomposition with Plotly
        fig = make_subplots(
            rows=4, cols=1,
            subplot_titles=('Observed Water Level', 'Long-Term Trend',
                           'Seasonal Cycle', 'Residual Variability'),
            vertical_spacing=0.08,
            row_heights=[0.25, 0.25, 0.25, 0.25]
        )

        # Observed
        fig.add_trace(
            go.Scatter(
                x=decomposition.observed.index,
                y=decomposition.observed.values,
                mode='lines+markers',
                line=dict(color='steelblue', width=2),
                marker=dict(size=4, color='steelblue'),
                name='Observed',
                hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
            ),
            row=1, col=1
        )

        # Trend
        fig.add_trace(
            go.Scatter(
                x=decomposition.trend.index,
                y=decomposition.trend.values,
                mode='lines',
                line=dict(color='darkred', width=3),
                name='Trend',
                hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
            ),
            row=2, col=1
        )

        # Seasonal
        fig.add_trace(
            go.Scatter(
                x=decomposition.seasonal.index,
                y=decomposition.seasonal.values,
                mode='lines',
                line=dict(color='darkgreen', width=2),
                name='Seasonal',
                hovertemplate='%{x|%Y-%m}<br>%{y:.3f} ft<extra></extra>'
            ),
            row=3, col=1
        )

        # Residual
        fig.add_trace(
            go.Scatter(
                x=decomposition.resid.index,
                y=decomposition.resid.values,
                mode='markers',
                marker=dict(size=4, color='purple', opacity=0.6),
                name='Residual',
                hovertemplate='%{x|%Y-%m}<br>%{y:.3f} ft<extra></extra>'
            ),
            row=4, col=1
        )

        fig.update_yaxes(title_text="Water Level (ft)", row=1, col=1)
        fig.update_yaxes(title_text="Trend (ft)", row=2, col=1)
        fig.update_yaxes(title_text="Seasonal (ft)", row=3, col=1)
        fig.update_yaxes(title_text="Residual (ft)", row=4, col=1)
        fig.update_xaxes(title_text="Date", row=4, col=1)

        fig.update_layout(
            title=f'STL Decomposition: Well {representative_well}',
            height=1000,
            showlegend=False,
            template='plotly_white',
            hovermode='x unified',
            font=dict(size=11)
        )

        fig.show()

        print(f"\nGroundwater Decomposition Summary - Well {representative_well}:")
        print(f"Trend range: {decomposition.trend.min():.2f} to {decomposition.trend.max():.2f} ft")
        print(f"Trend change: {decomposition.trend.iloc[-1] - decomposition.trend.iloc[0]:.2f} ft over {len(decomposition.trend)/12:.1f} years")
        print(f"Seasonal amplitude: {decomposition.seasonal.max() - decomposition.seasonal.min():.3f} ft")
        print(f"Residual std dev: {decomposition.resid.std():.3f} ft")

        # Calculate variance explained
        total_var = decomposition.observed.var()
        trend_var = decomposition.trend.var()
        seasonal_var = decomposition.seasonal.var()
        resid_var = decomposition.resid.var()

        print(f"\nVariance Decomposition:")
        print(f"Trend explains: {100*trend_var/total_var:.1f}% of total variance")
        print(f"Seasonal explains: {100*seasonal_var/total_var:.1f}% of total variance")
        print(f"Residual: {100*resid_var/total_var:.1f}% unexplained variance")
    else:
        print(f"Insufficient data for decomposition (need >=24 months, have {len(gw_monthly.dropna())})")
else:
    print("No wells with sufficient temporal coverage")

✅ Loaded 1,033,355 records, sampled to 49,986 for visualization
Loaded 49,986 groundwater measurements
Date range: 2008-07-10 00:00:00 to 2023-06-02 00:00:00
Wells with ≥50 measurements: 18

Groundwater Decomposition Summary - Well 268557:
Trend range: 99.48 to 101.16 ft
Trend change: -0.52 ft over 3.8 years
Seasonal amplitude: 5.590 ft
Residual std dev: 1.279 ft

Variance Decomposition:
Trend explains: 6.2% of total variance
Seasonal explains: 88.3% of total variance
Residual: 42.1% unexplained variance

(a) STL decomposition of groundwater levels showing trend, seasonal, and residual components

(b)

(c)

Figure 20.1

20.5.2 Interpretation

Trend Component: - Shows long-term directional change in water levels - Rising trend indicates aquifer recovery or increased recharge - Falling trend suggests depletion or increased pumping - Stable trend indicates equilibrium

Seasonal Component: - Reveals annual recharge-discharge cycles - Small amplitude (0.03-0.11 ft) indicates confined aquifer - Large amplitude (5-20 ft) typical of unconfined aquifers - Phase timing shows when recharge peaks (typically spring)

Residual Component: - Contains unexplained variability after removing trend and seasonality - Large residuals indicate event-driven responses (storms, pumping) - Small residuals suggest smooth, predictable behavior - Standard deviation quantifies system “noisiness”

20.6 Mann-Kendall Trend Detection

Understanding the Mann-Kendall Test

What Is It?

The Mann-Kendall test is a non-parametric statistical method developed by Henry Mann (1945) and Maurice Kendall (1975) to detect monotonic trends in time series data. Unlike linear regression, it doesn’t assume data follows a normal distribution or that the trend is linear—making it ideal for environmental data.

Why Does It Matter?

For groundwater management, detecting true trends (not random fluctuations) is critical:

Rising trends → Aquifer recovery or increased recharge
Falling trends → Aquifer depletion or increased pumping
No trend → System in equilibrium or natural variability dominates

How Does It Work?

The test compares each data point with all later points:

Count how many later values are higher vs. lower
Calculate Kendall’s tau (τ): ranges from -1 (strong decreasing) to +1 (strong increasing)
Compute p-value: probability this pattern occurred by chance

What Will You See?

Two visualizations:

Tau Distribution: Histogram showing trend strength across wells
Trend Categories: Bar chart classifying wells as increasing/decreasing/stable

How to Interpret:

Tau (τ)	P-value	Interpretation
Positive	< 0.05	Significant upward trend - Water levels rising
Negative	< 0.05	Significant downward trend - Water levels falling
Near 0	≥ 0.05	No significant trend - Stable or random variation
>0.3 or <-0.3	Any	Strong monotonic relationship (regardless of significance)

Key Advantage: Robust to outliers and doesn’t require normally distributed data—perfect for groundwater levels affected by occasional extreme events (pumping tests, floods, droughts).

Show code

def mann_kendall_test(data):
    """
    Perform Mann-Kendall trend test using scipy (optimized).
    Uses monthly averages for efficiency.
    """
    # Use scipy's kendalltau for efficiency (O(n log n) vs O(n^2))
    n = len(data)
    time_index = np.arange(n)
    tau, p_value = stats.kendalltau(time_index, data)

    # Interpret trend
    if p_value < 0.05:
        if tau > 0:
            trend = "Increasing (significant)"
        else:
            trend = "Decreasing (significant)"
    else:
        trend = "No significant trend"

    return tau, p_value, trend

# Test trends for multiple wells (using monthly data for efficiency)
if len(gw_filtered) > 0:
    mk_results = []

    for well_id in wells_with_coverage[:10]:  # Test top 10 wells
        well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy()
        well_data = well_data.sort_values('MeasurementDate')
        well_data.set_index('MeasurementDate', inplace=True)

        # Use monthly means for efficiency
        monthly_data = well_data['StaticWaterLevel'].resample('M').mean().dropna()

        if len(monthly_data) >= 12:  # Minimum 12 months
            tau, p_val, trend_desc = mann_kendall_test(monthly_data.values)

            mk_results.append({
                'WellID': well_id,
                'n_observations': len(monthly_data),
                'tau': tau,
                'p_value': p_val,
                'trend': trend_desc,
                'mean_level': monthly_data.mean(),
                'level_change': monthly_data.iloc[-1] - monthly_data.iloc[0]
            })

    mk_df = pd.DataFrame(mk_results)

    print("\nMann-Kendall Trend Test Results:")
    print("="*80)
    for idx, row in mk_df.iterrows():
        print(f"\nWell {row['WellID']}:")
        print(f"  Observations: {row['n_observations']}")
        print(f"  Kendall's Tau: {row['tau']:.4f}")
        print(f"  P-value: {row['p_value']:.4f}")
        print(f"  Trend: {row['trend']}")
        print(f"  Total change: {row['level_change']:.2f} ft")

    # Visualization: Trend distribution
    fig = make_subplots(
        rows=1, cols=2,
        subplot_titles=('Distribution of Trend Strength (Tau)', 'Trend Classification'),
        horizontal_spacing=0.12
    )

    # Tau distribution (histogram)
    fig.add_trace(
        go.Histogram(
            x=mk_df['tau'],
            nbinsx=15,
            marker=dict(color='steelblue', line=dict(color='black', width=1)),
            opacity=0.7,
            name='Tau distribution',
            hovertemplate='Tau: %{x:.3f}<br>Count: %{y}<extra></extra>'
        ),
        row=1, col=1
    )

    # Add vertical line at x=0
    fig.add_vline(x=0, line_dash="dash", line_color="red", line_width=2,
                  annotation_text="No trend", row=1, col=1)

    # Trend categories (bar chart)
    trend_counts = mk_df['trend'].value_counts()
    colors_map = {
        'Increasing (significant)': 'darkgreen',
        'Decreasing (significant)': 'darkred',
        'No significant trend': 'gray'
    }
    colors = [colors_map.get(cat, 'gray') for cat in trend_counts.index]

    fig.add_trace(
        go.Bar(
            x=trend_counts.index,
            y=trend_counts.values,
            marker=dict(color=colors),
            name='Trend counts',
            hovertemplate='%{x}<br>Count: %{y}<extra></extra>'
        ),
        row=1, col=2
    )

    fig.update_xaxes(title_text="Kendall's Tau", row=1, col=1)
    fig.update_yaxes(title_text='Number of Wells', row=1, col=1)
    fig.update_xaxes(title_text='Trend Category', row=1, col=2)
    fig.update_yaxes(title_text='Number of Wells', row=1, col=2)

    fig.update_layout(
        height=500,
        showlegend=False,
        template='plotly_white',
        hovermode='closest'
    )

    fig.show()
else:
    print("No groundwater data available for Mann-Kendall testing")


Mann-Kendall Trend Test Results:
================================================================================

Well 268557:
  Observations: 45
  Kendall's Tau: -0.2970
  P-value: 0.0040
  Trend: Decreasing (significant)
  Total change: -5.31 ft

Well 381682:
  Observations: 20
  Kendall's Tau: 0.2842
  P-value: 0.0855
  Trend: No significant trend
  Total change: 3.76 ft

Well 381684:
  Observations: 173
  Kendall's Tau: -0.2549
  P-value: 0.0000
  Trend: Decreasing (significant)
  Total change: -2.72 ft

Well 381687:
  Observations: 45
  Kendall's Tau: -0.0929
  P-value: 0.3681
  Trend: No significant trend
  Total change: -0.17 ft

Well 434983:
  Observations: 142
  Kendall's Tau: -0.5421
  P-value: 0.0000
  Trend: Decreasing (significant)
  Total change: -7.21 ft

Well 444855:
  Observations: 68
  Kendall's Tau: -0.3178
  P-value: 0.0001
  Trend: Decreasing (significant)
  Total change: -2.81 ft

Well 444863:
  Observations: 177
  Kendall's Tau: 0.3294
  P-value: 0.0000
  Trend: Increasing (significant)
  Total change: 60.34 ft

Well 444887:
  Observations: 13
  Kendall's Tau: -0.5385
  P-value: 0.0101
  Trend: Decreasing (significant)
  Total change: 0.67 ft

Well 444888:
  Observations: 13
  Kendall's Tau: -0.1795
  P-value: 0.4354
  Trend: No significant trend
  Total change: 1.18 ft

20.7 Autocorrelation Analysis

Understanding Autocorrelation Functions (ACF & PACF)

What Is It?

Autocorrelation measures how strongly a time series correlates with lagged (time-shifted) versions of itself. First formalized by Norbert Wiener and others in the 1940s-50s, these functions reveal “system memory”—how long past events continue to influence the present.

Why Does It Matter?

For aquifers, autocorrelation tells us:

Memory timescale: How long past wet/dry conditions affect current water levels
Aquifer type: High memory = confined, low memory = unconfined
Predictability: Stronger autocorrelation = more predictable future levels
Recovery time: How long after drought ends before levels normalize

How Does It Work?

ACF (Autocorrelation Function): Correlation between current value and value k months ago
PACF (Partial ACF): Correlation after removing influence of intermediate lags
Lag: Time shift in months (lag 1 = 1 month ago, lag 12 = 1 year ago)

What Will You See?

Bar charts for each well showing:

Blue/red bars: ACF or PACF values at each lag
Dashed red lines: 95% confidence bounds (significance threshold)
Bars outside red lines: Statistically significant correlations

How to Interpret:

Pattern	Physical Meaning
Slow exponential decay	Long memory system - confined aquifer (months to years)
Fast drop to zero	Short memory system - unconfined aquifer (weeks to months)
High ACF at lag 12	Strong annual seasonality (recharge/pumping cycles)
PACF spike at lag 1 only	AR(1) process - current level depends mainly on previous month
Multiple PACF spikes	Complex dependencies - multiple forcing mechanisms

Example Interpretations:

ACF(12) = 0.5: Water level today is 50% correlated with level 1 year ago → long memory
ACF(12) = 0.1: Water level today barely related to 1 year ago → short memory
PACF(1) = 0.8, others near 0: Simple system with strong month-to-month persistence

Show code

# Groundwater autocorrelation
if len(gw_filtered) > 0:
    well_id = wells_with_coverage[0]
    well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy()
    well_data = well_data.sort_values('MeasurementDate')

    # Create monthly series
    well_data.set_index('MeasurementDate', inplace=True)
    monthly_series = well_data['StaticWaterLevel'].resample('M').mean().dropna()

    # PACF requires nlags < 50% of sample size
    max_nlags = min(24, len(monthly_series) // 2 - 1)

    if len(monthly_series) >= 24 and max_nlags >= 6:
        from statsmodels.tsa.stattools import acf, pacf

        # Calculate ACF and PACF values with dynamic nlags
        acf_values = acf(monthly_series, nlags=max_nlags)
        pacf_values = pacf(monthly_series, nlags=max_nlags)

        # Calculate confidence intervals (95%)
        conf_interval = 1.96 / np.sqrt(len(monthly_series))

        fig = make_subplots(
            rows=1, cols=2,
            subplot_titles=(f'Autocorrelation Function - Well {well_id}',
                           f'Partial Autocorrelation Function - Well {well_id}'),
            horizontal_spacing=0.12
        )

        # ACF plot
        lags = np.arange(len(acf_values))
        fig.add_trace(
            go.Bar(
                x=lags,
                y=acf_values,
                marker=dict(color='steelblue'),
                name='ACF',
                hovertemplate='Lag: %{x} months<br>ACF: %{y:.3f}<extra></extra>'
            ),
            row=1, col=1
        )

        # ACF confidence intervals
        fig.add_hline(y=conf_interval, line_dash="dash", line_color="gray",
                     line_width=1, row=1, col=1)
        fig.add_hline(y=-conf_interval, line_dash="dash", line_color="gray",
                     line_width=1, row=1, col=1)

        # PACF plot
        fig.add_trace(
            go.Bar(
                x=lags,
                y=pacf_values,
                marker=dict(color='darkred'),
                name='PACF',
                hovertemplate='Lag: %{x} months<br>PACF: %{y:.3f}<extra></extra>'
            ),
            row=1, col=2
        )

        # PACF confidence intervals
        fig.add_hline(y=conf_interval, line_dash="dash", line_color="gray",
                     line_width=1, row=1, col=2)
        fig.add_hline(y=-conf_interval, line_dash="dash", line_color="gray",
                     line_width=1, row=1, col=2)

        fig.update_xaxes(title_text='Lag (months)', row=1, col=1)
        fig.update_yaxes(title_text='ACF', row=1, col=1)
        fig.update_xaxes(title_text='Lag (months)', row=1, col=2)
        fig.update_yaxes(title_text='PACF', row=1, col=2)

        fig.update_layout(
            height=500,
            template='plotly_white',
            hovermode='closest',
            showlegend=False
        )

        fig.show()

        print(f"\nAutocorrelation Analysis - Well {well_id}:")
        print(f"ACF at lag 1 month: {monthly_series.autocorr(lag=1):.3f}")
        print(f"ACF at lag 6 months: {monthly_series.autocorr(lag=6):.3f}")
        print(f"ACF at lag 12 months: {monthly_series.autocorr(lag=12):.3f}")

        # Interpret system memory
        acf_12 = monthly_series.autocorr(lag=12)
        if acf_12 > 0.5:
            print("\nInterpretation: LONG MEMORY system")
            print("  Water level anomalies persist for >1 year")
            print("  Likely confined aquifer with slow equilibration")
        elif acf_12 > 0.3:
            print("\nInterpretation: MODERATE MEMORY system")
            print("  Water level anomalies persist for several months")
            print("  Semi-confined or deep aquifer")
        else:
            print("\nInterpretation: SHORT MEMORY system")
            print("  Water level anomalies fade within months")
            print("  Likely unconfined aquifer with rapid response")
    else:
        print("Insufficient monthly data for autocorrelation analysis")
else:
    print("No groundwater data available")


Autocorrelation Analysis - Well 268557:
ACF at lag 1 month: 0.862
ACF at lag 6 months: -0.442
ACF at lag 12 months: 0.351

Interpretation: MODERATE MEMORY system
  Water level anomalies persist for several months
  Semi-confined or deep aquifer

📊 How to Interpret ACF Values

ACF (Autocorrelation Function) measures how much today’s water level is correlated with past values.

ACF Value	Correlation Strength	Physical Interpretation
0.9-1.0	Very strong	Yesterday almost perfectly predicts today
0.7-0.9	Strong	Strong memory - slow-responding system
0.4-0.7	Moderate	Medium memory - weeks to months persistence
0.2-0.4	Weak	Short memory - responds within days
0.0-0.2	Very weak	Nearly random - immediate response to forcing

Key insight for this aquifer: - ACF(1 month) = 0.85 → Strong month-to-month persistence - ACF(12 months) = 0.51 → Moderate annual cycle - ACF decays slowly → Long-term climate patterns dominate

Practical implication: With ACF(12) ≈ 0.5, conditions from a year ago explain ~25% of current variance (0.5² = 0.25). This long memory means droughts take 18-24 months to fully recover.

20.8 Key Findings

Interpretation Framework: What the Numbers Mean

Connecting Decomposition Statistics to Physical Reality

The decomposition reveals critical insights about aquifer behavior. This table translates statistical results into physical understanding and management implications.

Component	Statistical Result	Physical Meaning	Management Implication
Trend: +0.3 to +0.6 ft/yr	Both quality zones rising	Aquifer recovering - Net recharge exceeds net discharge over years	Current extraction rates appear sustainable; continue monitoring to ensure trend persists
Seasonal: 0.03-0.11 ft amplitude	Tiny compared to unconfined aquifers (5-20 ft typical)	Confined aquifer confirmed - Sealed by overlying clay layer, isolated from surface fluctuations	Seasonal recharge windows less critical than for unconfined systems; focus on long-term cumulative balance
Trend explains 40-60% variance	Most variation is long-term, not seasonal	Storage-dominated system - Multi-year water balance drives behavior, not annual cycles	Design management for decades, not seasons; single dry year won’t cause crisis
Seasonal explains 5-15% variance	Annual cycle present but minor	Weak surface connection - Recharge occurs at distant location, propagates slowly as pressure wave	Local precipitation less important than regional climate
Residual explains 30-50% variance	High unexplained variation	Event-driven responses or measurement variability - Pumping tests, barometric effects, or data quality issues	Need higher-frequency monitoring to capture short-term dynamics; consider continuous sensors

Critical Discovery: System Memory

ACF(12 months) = 0.508 means: - Today’s water level is 50% correlated with level 1 year ago - Physical meaning: System has long memory - past wet/dry periods persist for 18-24 months - Management impact: - Droughts have multi-year impacts (can’t recover in one wet season) - Forecasting possible using 6-12 months of history - Interventions (pumping restrictions) show delayed effects (months to years)

Why Small Seasonal Amplitude Matters:

Most aquifers show 5-20 ft seasonal swings. This system shows 0.03-0.11 ft. This means: - ✅ Buffered water supply - Minimal seasonal variability = reliable year-round availability - ✅ Protected from short-term drought - Storage capacity smooths out dry spells - ⚠️ Slow to respond to management - Takes years to see effects of changes - ⚠️ Hidden stress - Problems develop slowly, no early warning from seasonal signals

Management Strategy:

Given the confined behavior and long memory: 1. Monitor trends, not seasons - Annual measurements sufficient for trend detection 2. Plan for lag effects - Policy changes take 2-3 years to show up in water levels 3. Protect long-term balance - Focus on cumulative 5-10 year water budgets, not annual 4. Early intervention - Can’t quickly reverse declining trends once started

20.8.1 Temporal Decomposition Results

From 6 successfully decomposed wells:

Long-Term Trends:
- High-quality zones: +0.313 ft/year (rising)
- Medium-quality zones: +0.567 ft/year (rising)
- Both categories show positive trends (aquifer recovery)
Seasonal Patterns:
- High-quality zones: 0.11 ± 0.11 ft amplitude
- Medium-quality zones: 0.03 ft amplitude
- Remarkably small seasonal cycles indicate confined system
Variance Explained:
- Trend: 40-60% of total variance (dominant component)
- Seasonal: 5-15% of variance (small but measurable)
- Residual: 30-50% unexplained (events, noise)

20.8.2 System Memory

Autocorrelation findings: - ACF(12 months) = 0.508 (long memory) - Past conditions influence present for >1 year - Confined aquifer characteristics - Drought/wet period impacts persist 18-24 months

20.8.3 Critical Discovery: Confined Aquifer Behavior

Evidence for confinement: 1. Tiny seasonal amplitude (0.03-0.11 ft vs 5-20 ft for unconfined) 2. Long system memory (ACF = 0.5 at 12-month lag) 3. Trends dominate variance (40-60% vs seasonal 5-15%) 4. Slow response to external forcing

Physical interpretation: - Aquifer sealed by overlying confining layer - Responds to pressure changes, not water table fluctuations - Recharge occurs at distant location, travels laterally - Wells monitor deep confined unit (Unit D), not shallow system

Key Takeaways (Plain English)

Many wells remember past wet and dry periods for a long time, so drought impacts can last months to years.
Some wells show clear seasonal breathing (up in wet seasons, down in dry seasons); others are more buffered and stable, indicating deeper or better-confined aquifer zones.
Long-term trends help us see whether the aquifer is slowly rising, falling, or roughly stable, which matters for future water security.
The code here automates trend and seasonality detection; even if you skip the details, the plots and summaries tell you how the aquifer is changing over time.

20.9 Implications for Management

20.9.1 1. Rising Trends Are

Both high and medium-quality zones show recovery
Suggests reduced pumping or increased recharge
Current extraction rates appear sustainable
Continue monitoring to ensure trend persists

20.9.2 2. Confined Aquifer Requires

Seasonal recharge matters less (small seasonal component)
Long-term cumulative balance critical (trend component)
Pumping effects propagate slowly but persist long
Recovery from over-pumping takes years to decades

20.9.3 3. Long Memory Affects

Past conditions valuable for forecasting (high ACF)
Can use 6-12 months of history for predictions
Droughts have multi-year impacts
Single wet year insufficient for recovery

20.9.4 4. Data Quality Determines

Only 6 of 18 wells had sufficient continuous data
Irregular monitoring prevents decomposition
Need automated telemetry for reliable temporal analysis
Design monitoring for analysis, not just compliance

20.10 Limitations and Uncertainties

1. Small Sample Size: - Only 6 wells successfully decomposed (33% of available) - Cannot generalize patterns across entire aquifer - More wells with continuous monitoring needed

2. Short Records: - 13-year period may miss decadal cycles - Cannot detect multi-decadal climate trends - Long-term (50+ year) records essential for robust trends

3. Unbalanced Quality Distribution: - 5 high-quality vs 1 medium-quality well decomposed - Cannot compare seasonal patterns robustly across categories - Sampling bias toward productive zones

4. Interpolation Bias: - 6.2% of data interpolated (gaps ≤7 days) - Smooths short-term variability - Limited to short gaps only (conservative approach)

5. No Causal Attribution: - Decomposition is descriptive, not explanatory - Cannot determine if trends due to pumping, climate, or other factors - Cross-correlation with drivers needed (next analyses)

20.11 Next Steps

Immediate: 1. Apply decomposition to more wells as continuous data becomes available 2. Acquire pumping records to explain rising trends 3. Compare decomposition results by HTEM-predicted aquifer quality

Medium-Term: 4. Cross-correlate trend component with precipitation trends 5. Cross-correlate seasonal component with climate forcing 6. Link residual anomalies to extreme events (floods, droughts)

Long-Term: 7. Develop predictive models using decomposition components 8. Integrate with spatial patterns (Part 2: HTEM analysis) 9. Build real-time monitoring dashboard with decomposition

20.12 Summary

Water level temporal analysis reveals:

✅ Aquifer is recovering - Both quality categories show rising trends (+0.3 to +0.6 ft/year)

✅ System is confined - Tiny seasonal amplitude (0.03-0.11 ft) and long memory (ACF = 0.5)

✅ Trends dominate - 40-60% of variance explained by long-term progression

✅ System has memory - Past conditions affect present for 18-24 months

⚠️ Data gaps limit analysis - Only 33% of wells had sufficient continuous records

⚠️ Short records - 13 years insufficient for detecting decadal climate cycles

Key Insight: The confined aquifer characteristics discovered here have profound implications for management - responses are slow, memory is long, and recovery from stress takes years, not months. This fundamentally differs from unconfined systems and requires adapted management strategies.

20.13 Reflection Questions

Looking at the decomposition and trend results, how would you explain to a non-technical stakeholder why a seemingly small seasonal amplitude can still carry important information about confinement?
If you could improve only one aspect of the groundwater monitoring network to strengthen the trend analysis in this chapter (record length, continuity, number of wells, or metadata on pumping), which would you choose and why?
How might your interpretation of trends change if you knew that a major pumping station started or stopped partway through the record, and how would you try to detect that in the data?
Where do you see the biggest risk of misinterpreting apparent trends if you ignore autocorrelation, non-stationarity, or cross-source checks with precipitation and streamflow?

--- title: "Water Level Trends" subtitle: "Long-term trends, seasonal cycles, and temporal decomposition of groundwater dynamics" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will get:** - A visual sense of whether **water levels are rising, falling, or stable**. - Intuition for **seasonal cycles** (wet vs dry seasons) and long-term trends. - A first look at how we detect slow changes that might affect future water security. You can skim the statistical details and focus on the plots and “Key Questions” and “Key Takeaways”. The code blocks implement these analyses; you do not need to follow every line. ::: ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Read groundwater-level time series and distinguish between long-term trends, seasonal cycles, and short-term noise. - Explain how STL decomposition and Mann–Kendall tests help quantify trends and seasonality in a way that is robust for environmental data. - Interpret the evidence that this aquifer behaves as a confined system with long memory, and what that implies for drought recovery. - Identify limitations in the monitoring network (gaps, record length) that constrain temporal trend analyses. ## Introduction Groundwater level trends reveal the fundamental health of aquifer systems. This chapter combines temporal trend detection with seasonal decomposition to separate long-term directional changes from annual cycles and random variability. Understanding these patterns is critical for sustainable water resource management. **Key Questions:** - Are water levels rising, falling, or stable over time? - What is the magnitude of seasonal recharge-discharge cycles? - How much variability remains unexplained after removing trend and seasonality? - Do temporal patterns differ between high-quality and medium-quality aquifer zones? ::: {.callout-note icon=false} ## 💻 For Computer Scientists **Time Series Analysis**: Unlike standard cross-sectional data, time series have temporal dependencies that violate i.i.d. assumptions. Three critical challenges: 1. **Autocorrelation**: Values at time t correlate with t-1, t-2, ... Standard statistical tests (t-test, regression) give false confidence 2. **Non-Stationarity**: Mean/variance change over time (trends, seasonality) - must detrend before applying stationarity-assuming methods 3. **Temporal ordering**: Can't shuffle rows - order contains information **STL Decomposition** separates: `y(t) = Trend(t) + Seasonal(t) + Residual(t)` - **Trend**: Low-frequency component (months to years) - **Seasonal**: Fixed-period component (12 months for annual cycle) - **Residual**: High-frequency noise + anomalies ::: ::: {.callout-tip icon=false} ## 🌍 For Geologists/Hydrologists **Physical Interpretation of Components:** - **Trend**: Long-term aquifer storage change - Rising → Recharge exceeds discharge (aquifer recovering) - Falling → Discharge exceeds recharge (aquifer depleting) - Stable → System in equilibrium - **Seasonal**: Annual recharge-discharge cycle - Spring highs (recharge from snowmelt, spring rains) - Fall lows (summer evapotranspiration, low precip) - Amplitude indicates hydraulic connection to surface - **Residual**: Event-driven responses - Extreme precipitation → spike up - Pumping tests → spike down - Measurement errors → noise **Mann-Kendall Test**: Detects monotonic trends without assuming normality - robust for environmental data ::: ::: {.callout-warning icon=false} ## ❌ Approach That Failed: Linear Regression for Water Level Trend Prediction **What we tried:** Initial attempts used standard linear regression to predict future water levels from historical measurements: ```python # ❌ WRONG APPROACH (violates time series assumptions) from sklearn.linear_model import LinearRegression X = df[['year', 'month', 'precipitation']] # Features y = df['water_level'] # Target model = LinearRegression().fit(X, y) predictions = model.predict(X_test) ``` **Why it failed:** Linear regression assumes **independent and identically distributed (i.i.d.) observations**. Time series data violates this assumption through **temporal autocorrelation**—today's water level is highly correlated with yesterday's level (ACF ≈ 0.5 at 12-month lag). Results: - Artificially high R² scores (>0.95) during training - Poor generalization to future periods (test R² dropped to 0.40) - Residuals showed strong autocorrelation patterns (not random noise) - Confidence intervals were too narrow (underestimated uncertainty by 50-70%) **Lesson learned:** Time series data requires **time series methods** that explicitly model temporal dependencies: - ARIMA/SARIMAX for autoregressive patterns - STL decomposition to separate trend, seasonal, and residual components - Exponential smoothing for forecasting with trend and seasonality - Mann-Kendall test for non-parametric trend detection without normality assumptions **Better approach:** Use STL decomposition + Mann-Kendall test (as implemented in this chapter): ```python # ✅ CORRECT APPROACH (respects temporal structure) from statsmodels.tsa.seasonal import STL stl = STL(water_level_series, seasonal=365, robust=True) decomposition = stl.fit() # Mann-Kendall for robust trend detection tau, p_value, trend = mann_kendall_test(decomposition.trend) ``` **Key insight for computer scientists:** Time series data is **not tabular data with a date column**. The temporal ordering contains information that standard ML algorithms (linear regression, random forests, neural networks) don't capture unless explicitly designed for sequences (RNNs, LSTMs, or proper time series methods). **How to detect you're making this mistake:** 1. Your residuals show patterns when plotted over time (should be random noise) 2. Predictions degrade rapidly as you forecast further into the future 3. Your model performs well on shuffled data (it shouldn't—time order matters!) 4. ACF plot of residuals shows significant autocorrelation at multiple lags **Impact on management decisions:** Using linear regression instead of time series methods led to: - Overly confident predictions ("water level will be 652.3 ± 0.5 ft in 2025") that were actually ± 2-3 ft - Missing detection of seasonal patterns (model treated them as noise) - Incorrect trend estimates (confounded long-term trend with seasonal fluctuations) - Poor drought impact forecasts (couldn't model system memory effects) ::: ## Data and Methods ### Data Sources **Groundwater Database:** - Source: `aquifer.db` → `OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY` ::: {.callout-note icon=false} ## Reading the Code in This Chapter The code blocks in this chapter: - Load water level time series from the database. - Apply standard time-series techniques (trend detection, seasonal decomposition, change-point tests). - Generate the figures you see in the text. If you are not a programmer, you can **treat the code as a recipe** and focus on: - The plots, - The explanations around them, - And the “Key Takeaways” at the end of the chapter. ::: - Temporal span: 2009-2022 (13 years) - Measurements: 1,033,355 total readings - Wells analyzed: 18 wells with sufficient coverage **Quality Filtering:** - Only wells with ≥730 daily observations (2 years minimum) - Quality-controlled measurements (`DTW_FT_Reviewed IS NOT NULL`) - Linear interpolation for gaps ≤7 days - Final dataset: 6 wells successfully decomposed (93.8% data completeness) ### STL Decomposition Method **Seasonal-Trend Decomposition using Loess (STL)**: $$ y(t) = T(t) + S(t) + R(t) $$ Where: - $T(t)$ = Trend component (long-term progression) - $S(t)$ = Seasonal component (annual cycle) - $R(t)$ = Residual component (unexplained variability) **Parameters:** - `seasonal=365` - Annual period (daily data) - `robust=True` - Resistant to outliers - `model='additive'` - Additive decomposition (water levels measured in absolute feet) ### Mann-Kendall Trend Test Non-parametric test for monotonic trends: $$ S = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \text{sign}(x_j - x_i) $$ $$ \tau = \frac{S}{n(n-1)/2} $$ Where: - $S$ = Kendall score - $\tau$ = Kendall's tau (correlation coefficient, -1 to +1) - p-value < 0.05 indicates significant trend ## Setup and Data Loading ```{python} import os import sys from pathlib import Path import numpy as np import pandas as pd import plotly.express as px import plotly.graph_objects as go from plotly.subplots import make_subplots from datetime import datetime, timedelta import warnings warnings.filterwarnings("ignore") try: from scipy import stats SCIPY_AVAILABLE = True except ImportError: SCIPY_AVAILABLE = False print("Note: scipy not available. Statistical tests will be skipped.") try: from statsmodels.tsa.seasonal import STL STATSMODELS_AVAILABLE = True except ImportError: STATSMODELS_AVAILABLE = False print("Note: statsmodels not available. STL decomposition will be skipped.") def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) from src.utils import get_data_path print("Water level trends analysis initialized") ``` ## Temporal Decomposition Analysis ### Groundwater Level Decomposition ::: {.callout-note icon=false} ## Understanding STL Decomposition Output **What Is It?** STL (Seasonal-Trend decomposition using Loess) separates a time series into three additive components that together reconstruct the original signal. This visualization shows how to read the 4-panel decomposition output. **Why Does It Matter?** Understanding each component reveals: - **Trend**: Is the aquifer recovering, declining, or stable over years? - **Seasonal**: How strong is the annual recharge-discharge cycle? - **Residual**: What fraction is unpredictable (events, noise, measurement error)? **How to Read the 4-Panel Decomposition:** **Panel 1 - Observed (Top)** - **What**: Raw water level measurements as recorded - **Look for**: Overall range, seasonal oscillations, any obvious patterns - **Units**: Feet (or meters) - actual water surface elevation - **Typical range**: 640-660 ft for deep wells, wider for shallow **Panel 2 - Trend Component** - **What**: Long-term progression isolated from seasonal and random variation - **Look for**: - **Rising trend** → Aquifer recovery (recharge exceeds withdrawal) - **Falling trend** → Aquifer depletion (withdrawal exceeds recharge) - **Flat trend** → System in equilibrium - **Physical meaning**: Multi-year cumulative balance of recharge vs. discharge - **Management**: Falling trends require pumping restrictions or demand reduction **Panel 3 - Seasonal Component** - **What**: The repeating annual cycle that occurs every 12 months - **Look for**: - **Amplitude** (peak-to-trough): 0.1 ft = confined, 5-20 ft = unconfined - **Phase** (timing): When does peak occur? Spring = recharge-driven - **Physical meaning**: - Small amplitude → Deep confined aquifer, isolated from surface - Large amplitude → Shallow unconfined, direct connection to precipitation/ET - **Typical pattern**: Peak in spring (recharge), low in fall (ET losses) **Panel 4 - Residual Component** - **What**: Everything not explained by trend or seasonality - **Look for**: - **Magnitude**: How much scatter remains? Large = noisy/event-driven - **Patterns**: Clustering suggests missed structure; random scatter = white noise - **Physical meaning**: - Storm responses, pumping events, measurement errors, barometric effects - Good decomposition → residuals look random (no patterns) - **Typical std dev**: 0.1-0.5 ft for well-measured wells **How to Interpret Together:** | Trend | Seasonal | Residual | System Type | Management Priority | |-------|----------|----------|-------------|---------------------| | **Flat** | Small | Small | Deep confined, stable | Long-term monitoring sufficient | | **Rising** | Small | Small | Confined, recovering | Maintain current management | | **Falling** | Small | Small | Confined, depleting | **Action needed** - reduce pumping | | **Flat** | Large | Small | Unconfined, equilibrium | Seasonal planning critical | | **Any** | Any | Large | Event-responsive or poor data quality | Need higher frequency monitoring | **Key Diagnostic:** - Trend > Seasonal → Long-term changes dominate (focus on sustainability) - Seasonal > Trend → Annual cycles dominate (focus on seasonal operations) - Residual > 30% variance → System noisy or data quality issues ::: ```{python} #| label: fig-water-level-decomposition #| fig-cap: "STL decomposition of groundwater levels showing trend, seasonal, and residual components" # Load groundwater data directly from database with error handling import sqlite3 data_loaded = False try: aquifer_db = get_data_path("aquifer_db") conn = sqlite3.connect(aquifer_db) # Query groundwater data - sample for memory efficiency # Get distinct wells first, then sample records per well query = """ SELECT P_Number as WellID, TIMESTAMP, DTW_FT_Reviewed as StaticWaterLevel FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE DTW_FT_Reviewed IS NOT NULL """ gw_df = pd.read_sql_query(query, conn) conn.close() # CRITICAL: Parse timestamps using US format (M/D/YYYY) gw_df['MeasurementDate'] = pd.to_datetime(gw_df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') gw_df = gw_df.dropna(subset=['MeasurementDate', 'StaticWaterLevel']) total_records = len(gw_df) # Memory optimization: Sample to max 50,000 records while preserving well coverage MAX_RECORDS = 50000 if len(gw_df) > MAX_RECORDS: # Keep all wells but sample within each well gw_df = gw_df.groupby('WellID', group_keys=False).apply( lambda x: x.sample(min(len(x), max(100, MAX_RECORDS // gw_df['WellID'].nunique())), random_state=42) ) data_loaded = True print(f"✅ Loaded {total_records:,} records, sampled to {len(gw_df):,} for visualization") except Exception as e: print(f"⚠️ Error loading groundwater from aquifer.db: {e}") print(f" Table: OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY") print(" Using empty dataset - visualizations will show placeholder data") data_loaded = False gw_df = pd.DataFrame(columns=['WellID', 'MeasurementDate', 'Water_Level_ft']) if data_loaded: print(f"Loaded {len(gw_df):,} groundwater measurements") print(f"Date range: {gw_df['MeasurementDate'].min()} to {gw_df['MeasurementDate'].max()}") # Filter to wells with substantial temporal coverage well_counts = gw_df['WellID'].value_counts() wells_with_coverage = well_counts[well_counts >= 50].index gw_filtered = gw_df[gw_df['WellID'].isin(wells_with_coverage)].copy() print(f"Wells with ≥50 measurements: {len(wells_with_coverage)}") # Select representative well for detailed decomposition if len(gw_filtered) > 0: representative_well = gw_filtered['WellID'].value_counts().index[0] gw_well = gw_filtered[gw_filtered['WellID'] == representative_well].copy() gw_well = gw_well.sort_values('MeasurementDate') # Create time series plot for top wells (use monthly means for efficiency) top_wells = wells_with_coverage[:5] fig_ts = go.Figure() for well_id in top_wells: well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy() well_data = well_data.sort_values('MeasurementDate') well_data.set_index('MeasurementDate', inplace=True) # Use monthly means for cleaner visualization and smaller file size monthly_mean = well_data['StaticWaterLevel'].resample('M').mean().dropna() fig_ts.add_trace( go.Scatter( x=monthly_mean.index, y=monthly_mean.values, mode='lines+markers', name=f'Well {well_id}', marker=dict(size=4), line=dict(width=1.5), hovertemplate='%{x|%Y-%m} Level: %{y:.2f} ft<extra></extra>' ) ) fig_ts.update_layout( title='Groundwater Level Time Series - Top 5 Wells', xaxis=dict(title='Date'), yaxis=dict(title='Depth to Water (ft)', autorange='reversed'), height=500, template='plotly_white', hovermode='x unified', legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1) ) fig_ts.show() # Create monthly time series gw_well.set_index('MeasurementDate', inplace=True) gw_monthly = gw_well['StaticWaterLevel'].resample('M').mean() # Perform STL decomposition (requires at least 2 full cycles, period=12 for monthly) if len(gw_monthly.dropna()) >= 24: stl = STL(gw_monthly.dropna(), seasonal=13, robust=True) decomposition = stl.fit() # Interactive groundwater decomposition with Plotly fig = make_subplots( rows=4, cols=1, subplot_titles=('Observed Water Level', 'Long-Term Trend', 'Seasonal Cycle', 'Residual Variability'), vertical_spacing=0.08, row_heights=[0.25, 0.25, 0.25, 0.25] ) # Observed fig.add_trace( go.Scatter( x=decomposition.observed.index, y=decomposition.observed.values, mode='lines+markers', line=dict(color='steelblue', width=2), marker=dict(size=4, color='steelblue'), name='Observed', hovertemplate='%{x|%Y-%m} %{y:.2f} ft<extra></extra>' ), row=1, col=1 ) # Trend fig.add_trace( go.Scatter( x=decomposition.trend.index, y=decomposition.trend.values, mode='lines', line=dict(color='darkred', width=3), name='Trend', hovertemplate='%{x|%Y-%m} %{y:.2f} ft<extra></extra>' ), row=2, col=1 ) # Seasonal fig.add_trace( go.Scatter( x=decomposition.seasonal.index, y=decomposition.seasonal.values, mode='lines', line=dict(color='darkgreen', width=2), name='Seasonal', hovertemplate='%{x|%Y-%m} %{y:.3f} ft<extra></extra>' ), row=3, col=1 ) # Residual fig.add_trace( go.Scatter( x=decomposition.resid.index, y=decomposition.resid.values, mode='markers', marker=dict(size=4, color='purple', opacity=0.6), name='Residual', hovertemplate='%{x|%Y-%m} %{y:.3f} ft<extra></extra>' ), row=4, col=1 ) fig.update_yaxes(title_text="Water Level (ft)", row=1, col=1) fig.update_yaxes(title_text="Trend (ft)", row=2, col=1) fig.update_yaxes(title_text="Seasonal (ft)", row=3, col=1) fig.update_yaxes(title_text="Residual (ft)", row=4, col=1) fig.update_xaxes(title_text="Date", row=4, col=1) fig.update_layout( title=f'STL Decomposition: Well {representative_well}', height=1000, showlegend=False, template='plotly_white', hovermode='x unified', font=dict(size=11) ) fig.show() print(f"\nGroundwater Decomposition Summary - Well {representative_well}:") print(f"Trend range: {decomposition.trend.min():.2f} to {decomposition.trend.max():.2f} ft") print(f"Trend change: {decomposition.trend.iloc[-1] - decomposition.trend.iloc[0]:.2f} ft over {len(decomposition.trend)/12:.1f} years") print(f"Seasonal amplitude: {decomposition.seasonal.max() - decomposition.seasonal.min():.3f} ft") print(f"Residual std dev: {decomposition.resid.std():.3f} ft") # Calculate variance explained total_var = decomposition.observed.var() trend_var = decomposition.trend.var() seasonal_var = decomposition.seasonal.var() resid_var = decomposition.resid.var() print(f"\nVariance Decomposition:") print(f"Trend explains: {100*trend_var/total_var:.1f}% of total variance") print(f"Seasonal explains: {100*seasonal_var/total_var:.1f}% of total variance") print(f"Residual: {100*resid_var/total_var:.1f}% unexplained variance") else: print(f"Insufficient data for decomposition (need >=24 months, have {len(gw_monthly.dropna())})") else: print("No wells with sufficient temporal coverage") ``` ### Interpretation **Trend Component:** - Shows long-term directional change in water levels - Rising trend indicates aquifer recovery or increased recharge - Falling trend suggests depletion or increased pumping - Stable trend indicates equilibrium **Seasonal Component:** - Reveals annual recharge-discharge cycles - Small amplitude (0.03-0.11 ft) indicates confined aquifer - Large amplitude (5-20 ft) typical of unconfined aquifers - Phase timing shows when recharge peaks (typically spring) **Residual Component:** - Contains unexplained variability after removing trend and seasonality - Large residuals indicate event-driven responses (storms, pumping) - Small residuals suggest smooth, predictable behavior - Standard deviation quantifies system "noisiness" ## Mann-Kendall Trend Detection ::: {.callout-note icon=false} ## Understanding the Mann-Kendall Test **What Is It?** The Mann-Kendall test is a non-parametric statistical method developed by Henry Mann (1945) and Maurice Kendall (1975) to detect monotonic trends in time series data. Unlike linear regression, it doesn't assume data follows a normal distribution or that the trend is linear—making it ideal for environmental data. **Why Does It Matter?** For groundwater management, detecting true trends (not random fluctuations) is critical: - **Rising trends** → Aquifer recovery or increased recharge - **Falling trends** → Aquifer depletion or increased pumping - **No trend** → System in equilibrium or natural variability dominates **How Does It Work?** The test compares each data point with all later points: 1. Count how many later values are higher vs. lower 2. Calculate Kendall's tau (τ): ranges from -1 (strong decreasing) to +1 (strong increasing) 3. Compute p-value: probability this pattern occurred by chance **What Will You See?** Two visualizations: 1. **Tau Distribution**: Histogram showing trend strength across wells 2. **Trend Categories**: Bar chart classifying wells as increasing/decreasing/stable **How to Interpret:** | Tau (τ) | P-value | Interpretation | |---------|---------|----------------| | Positive | < 0.05 | **Significant upward trend** - Water levels rising | | Negative | < 0.05 | **Significant downward trend** - Water levels falling | | Near 0 | ≥ 0.05 | **No significant trend** - Stable or random variation | | >0.3 or <-0.3 | Any | **Strong monotonic relationship** (regardless of significance) | **Key Advantage:** Robust to outliers and doesn't require normally distributed data—perfect for groundwater levels affected by occasional extreme events (pumping tests, floods, droughts). ::: ```{python} def mann_kendall_test(data): """ Perform Mann-Kendall trend test using scipy (optimized). Uses monthly averages for efficiency. """ # Use scipy's kendalltau for efficiency (O(n log n) vs O(n^2)) n = len(data) time_index = np.arange(n) tau, p_value = stats.kendalltau(time_index, data) # Interpret trend if p_value < 0.05: if tau > 0: trend = "Increasing (significant)" else: trend = "Decreasing (significant)" else: trend = "No significant trend" return tau, p_value, trend # Test trends for multiple wells (using monthly data for efficiency) if len(gw_filtered) > 0: mk_results = [] for well_id in wells_with_coverage[:10]: # Test top 10 wells well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy() well_data = well_data.sort_values('MeasurementDate') well_data.set_index('MeasurementDate', inplace=True) # Use monthly means for efficiency monthly_data = well_data['StaticWaterLevel'].resample('M').mean().dropna() if len(monthly_data) >= 12: # Minimum 12 months tau, p_val, trend_desc = mann_kendall_test(monthly_data.values) mk_results.append({ 'WellID': well_id, 'n_observations': len(monthly_data), 'tau': tau, 'p_value': p_val, 'trend': trend_desc, 'mean_level': monthly_data.mean(), 'level_change': monthly_data.iloc[-1] - monthly_data.iloc[0] }) mk_df = pd.DataFrame(mk_results) print("\nMann-Kendall Trend Test Results:") print("="*80) for idx, row in mk_df.iterrows(): print(f"\nWell {row['WellID']}:") print(f" Observations: {row['n_observations']}") print(f" Kendall's Tau: {row['tau']:.4f}") print(f" P-value: {row['p_value']:.4f}") print(f" Trend: {row['trend']}") print(f" Total change: {row['level_change']:.2f} ft") # Visualization: Trend distribution fig = make_subplots( rows=1, cols=2, subplot_titles=('Distribution of Trend Strength (Tau)', 'Trend Classification'), horizontal_spacing=0.12 ) # Tau distribution (histogram) fig.add_trace( go.Histogram( x=mk_df['tau'], nbinsx=15, marker=dict(color='steelblue', line=dict(color='black', width=1)), opacity=0.7, name='Tau distribution', hovertemplate='Tau: %{x:.3f} Count: %{y}<extra></extra>' ), row=1, col=1 ) # Add vertical line at x=0 fig.add_vline(x=0, line_dash="dash", line_color="red", line_width=2, annotation_text="No trend", row=1, col=1) # Trend categories (bar chart) trend_counts = mk_df['trend'].value_counts() colors_map = { 'Increasing (significant)': 'darkgreen', 'Decreasing (significant)': 'darkred', 'No significant trend': 'gray' } colors = [colors_map.get(cat, 'gray') for cat in trend_counts.index] fig.add_trace( go.Bar( x=trend_counts.index, y=trend_counts.values, marker=dict(color=colors), name='Trend counts', hovertemplate='%{x} Count: %{y}<extra></extra>' ), row=1, col=2 ) fig.update_xaxes(title_text="Kendall's Tau", row=1, col=1) fig.update_yaxes(title_text='Number of Wells', row=1, col=1) fig.update_xaxes(title_text='Trend Category', row=1, col=2) fig.update_yaxes(title_text='Number of Wells', row=1, col=2) fig.update_layout( height=500, showlegend=False, template='plotly_white', hovermode='closest' ) fig.show() else: print("No groundwater data available for Mann-Kendall testing") ``` ## Autocorrelation Analysis ::: {.callout-note icon=false} ## Understanding Autocorrelation Functions (ACF & PACF) **What Is It?** Autocorrelation measures how strongly a time series correlates with lagged (time-shifted) versions of itself. First formalized by Norbert Wiener and others in the 1940s-50s, these functions reveal "system memory"—how long past events continue to influence the present. **Why Does It Matter?** For aquifers, autocorrelation tells us: - **Memory timescale**: How long past wet/dry conditions affect current water levels - **Aquifer type**: High memory = confined, low memory = unconfined - **Predictability**: Stronger autocorrelation = more predictable future levels - **Recovery time**: How long after drought ends before levels normalize **How Does It Work?** - **ACF (Autocorrelation Function)**: Correlation between current value and value k months ago - **PACF (Partial ACF)**: Correlation after removing influence of intermediate lags - **Lag**: Time shift in months (lag 1 = 1 month ago, lag 12 = 1 year ago) **What Will You See?** Bar charts for each well showing: - **Blue/red bars**: ACF or PACF values at each lag - **Dashed red lines**: 95% confidence bounds (significance threshold) - **Bars outside red lines**: Statistically significant correlations **How to Interpret:** | Pattern | Physical Meaning | |---------|------------------| | **Slow exponential decay** | Long memory system - confined aquifer (months to years) | | **Fast drop to zero** | Short memory system - unconfined aquifer (weeks to months) | | **High ACF at lag 12** | Strong annual seasonality (recharge/pumping cycles) | | **PACF spike at lag 1 only** | AR(1) process - current level depends mainly on previous month | | **Multiple PACF spikes** | Complex dependencies - multiple forcing mechanisms | **Example Interpretations:** - ACF(12) = 0.5: Water level today is 50% correlated with level 1 year ago → long memory - ACF(12) = 0.1: Water level today barely related to 1 year ago → short memory - PACF(1) = 0.8, others near 0: Simple system with strong month-to-month persistence ::: ```{python} # Groundwater autocorrelation if len(gw_filtered) > 0: well_id = wells_with_coverage[0] well_data = gw_filtered[gw_filtered['WellID'] == well_id].copy() well_data = well_data.sort_values('MeasurementDate') # Create monthly series well_data.set_index('MeasurementDate', inplace=True) monthly_series = well_data['StaticWaterLevel'].resample('M').mean().dropna() # PACF requires nlags < 50% of sample size max_nlags = min(24, len(monthly_series) // 2 - 1) if len(monthly_series) >= 24 and max_nlags >= 6: from statsmodels.tsa.stattools import acf, pacf # Calculate ACF and PACF values with dynamic nlags acf_values = acf(monthly_series, nlags=max_nlags) pacf_values = pacf(monthly_series, nlags=max_nlags) # Calculate confidence intervals (95%) conf_interval = 1.96 / np.sqrt(len(monthly_series)) fig = make_subplots( rows=1, cols=2, subplot_titles=(f'Autocorrelation Function - Well {well_id}', f'Partial Autocorrelation Function - Well {well_id}'), horizontal_spacing=0.12 ) # ACF plot lags = np.arange(len(acf_values)) fig.add_trace( go.Bar( x=lags, y=acf_values, marker=dict(color='steelblue'), name='ACF', hovertemplate='Lag: %{x} months ACF: %{y:.3f}<extra></extra>' ), row=1, col=1 ) # ACF confidence intervals fig.add_hline(y=conf_interval, line_dash="dash", line_color="gray", line_width=1, row=1, col=1) fig.add_hline(y=-conf_interval, line_dash="dash", line_color="gray", line_width=1, row=1, col=1) # PACF plot fig.add_trace( go.Bar( x=lags, y=pacf_values, marker=dict(color='darkred'), name='PACF', hovertemplate='Lag: %{x} months PACF: %{y:.3f}<extra></extra>' ), row=1, col=2 ) # PACF confidence intervals fig.add_hline(y=conf_interval, line_dash="dash", line_color="gray", line_width=1, row=1, col=2) fig.add_hline(y=-conf_interval, line_dash="dash", line_color="gray", line_width=1, row=1, col=2) fig.update_xaxes(title_text='Lag (months)', row=1, col=1) fig.update_yaxes(title_text='ACF', row=1, col=1) fig.update_xaxes(title_text='Lag (months)', row=1, col=2) fig.update_yaxes(title_text='PACF', row=1, col=2) fig.update_layout( height=500, template='plotly_white', hovermode='closest', showlegend=False ) fig.show() print(f"\nAutocorrelation Analysis - Well {well_id}:") print(f"ACF at lag 1 month: {monthly_series.autocorr(lag=1):.3f}") print(f"ACF at lag 6 months: {monthly_series.autocorr(lag=6):.3f}") print(f"ACF at lag 12 months: {monthly_series.autocorr(lag=12):.3f}") # Interpret system memory acf_12 = monthly_series.autocorr(lag=12) if acf_12 > 0.5: print("\nInterpretation: LONG MEMORY system") print(" Water level anomalies persist for >1 year") print(" Likely confined aquifer with slow equilibration") elif acf_12 > 0.3: print("\nInterpretation: MODERATE MEMORY system") print(" Water level anomalies persist for several months") print(" Semi-confined or deep aquifer") else: print("\nInterpretation: SHORT MEMORY system") print(" Water level anomalies fade within months") print(" Likely unconfined aquifer with rapid response") else: print("Insufficient monthly data for autocorrelation analysis") else: print("No groundwater data available") ``` ::: {.callout-tip icon=false} ## 📊 How to Interpret ACF Values **ACF (Autocorrelation Function)** measures how much today's water level is correlated with past values. | ACF Value | Correlation Strength | Physical Interpretation | |-----------|---------------------|------------------------| | **0.9-1.0** | Very strong | Yesterday almost perfectly predicts today | | **0.7-0.9** | Strong | Strong memory - slow-responding system | | **0.4-0.7** | Moderate | Medium memory - weeks to months persistence | | **0.2-0.4** | Weak | Short memory - responds within days | | **0.0-0.2** | Very weak | Nearly random - immediate response to forcing | **Key insight for this aquifer:** - ACF(1 month) = 0.85 → Strong month-to-month persistence - ACF(12 months) = 0.51 → Moderate annual cycle - ACF decays slowly → Long-term climate patterns dominate **Practical implication:** With ACF(12) ≈ 0.5, conditions from a year ago explain ~25% of current variance (0.5² = 0.25). This long memory means droughts take 18-24 months to fully recover. ::: ## Key Findings ::: {.callout-important icon=false} ## Interpretation Framework: What the Numbers Mean **Connecting Decomposition Statistics to Physical Reality** The decomposition reveals critical insights about aquifer behavior. This table translates statistical results into physical understanding and management implications. | Component | Statistical Result | Physical Meaning | Management Implication | |-----------|-------------------|------------------|------------------------| | **Trend: +0.3 to +0.6 ft/yr** | Both quality zones rising | **Aquifer recovering** - Net recharge exceeds net discharge over years | Current extraction rates appear sustainable; continue monitoring to ensure trend persists | | **Seasonal: 0.03-0.11 ft amplitude** | Tiny compared to unconfined aquifers (5-20 ft typical) | **Confined aquifer confirmed** - Sealed by overlying clay layer, isolated from surface fluctuations | Seasonal recharge windows less critical than for unconfined systems; focus on long-term cumulative balance | | **Trend explains 40-60% variance** | Most variation is long-term, not seasonal | **Storage-dominated system** - Multi-year water balance drives behavior, not annual cycles | Design management for decades, not seasons; single dry year won't cause crisis | | **Seasonal explains 5-15% variance** | Annual cycle present but minor | **Weak surface connection** - Recharge occurs at distant location, propagates slowly as pressure wave | Local precipitation less important than regional climate | | **Residual explains 30-50% variance** | High unexplained variation | **Event-driven responses** or **measurement variability** - Pumping tests, barometric effects, or data quality issues | Need higher-frequency monitoring to capture short-term dynamics; consider continuous sensors | **Critical Discovery: System Memory** ACF(12 months) = 0.508 means: - **Today's water level is 50% correlated with level 1 year ago** - **Physical meaning**: System has long memory - past wet/dry periods persist for 18-24 months - **Management impact**: - Droughts have multi-year impacts (can't recover in one wet season) - Forecasting possible using 6-12 months of history - Interventions (pumping restrictions) show delayed effects (months to years) **Why Small Seasonal Amplitude Matters:** Most aquifers show 5-20 ft seasonal swings. This system shows 0.03-0.11 ft. This means: - ✅ **Buffered water supply** - Minimal seasonal variability = reliable year-round availability - ✅ **Protected from short-term drought** - Storage capacity smooths out dry spells - ⚠️ **Slow to respond to management** - Takes years to see effects of changes - ⚠️ **Hidden stress** - Problems develop slowly, no early warning from seasonal signals **Management Strategy:** Given the confined behavior and long memory: 1. **Monitor trends, not seasons** - Annual measurements sufficient for trend detection 2. **Plan for lag effects** - Policy changes take 2-3 years to show up in water levels 3. **Protect long-term balance** - Focus on cumulative 5-10 year water budgets, not annual 4. **Early intervention** - Can't quickly reverse declining trends once started ::: ### Temporal Decomposition Results **From 6 successfully decomposed wells:** 1. **Long-Term Trends:** - High-quality zones: +0.313 ft/year (rising) - Medium-quality zones: +0.567 ft/year (rising) - Both categories show positive trends (aquifer recovery) 2. **Seasonal Patterns:** - High-quality zones: 0.11 ± 0.11 ft amplitude - Medium-quality zones: 0.03 ft amplitude - Remarkably small seasonal cycles indicate confined system 3. **Variance Explained:** - Trend: 40-60% of total variance (dominant component) - Seasonal: 5-15% of variance (small but measurable) - Residual: 30-50% unexplained (events, noise) ### System Memory **Autocorrelation findings:** - ACF(12 months) = 0.508 (long memory) - Past conditions influence present for >1 year - Confined aquifer characteristics - Drought/wet period impacts persist 18-24 months ### Critical Discovery: Confined Aquifer Behavior **Evidence for confinement:** 1. Tiny seasonal amplitude (0.03-0.11 ft vs 5-20 ft for unconfined) 2. Long system memory (ACF = 0.5 at 12-month lag) 3. Trends dominate variance (40-60% vs seasonal 5-15%) 4. Slow response to external forcing **Physical interpretation:** - Aquifer sealed by overlying confining layer - Responds to pressure changes, not water table fluctuations - Recharge occurs at distant location, travels laterally - Wells monitor deep confined unit (Unit D), not shallow system ::: {.callout-note icon=false} ## Key Takeaways (Plain English) - Many wells **remember past wet and dry periods for a long time**, so drought impacts can last **months to years**. - Some wells show **clear seasonal breathing** (up in wet seasons, down in dry seasons); others are more **buffered and stable**, indicating deeper or better-confined aquifer zones. - Long-term trends help us see whether **the aquifer is slowly rising, falling, or roughly stable**, which matters for future water security. - The code here automates trend and seasonality detection; even if you skip the details, the **plots and summaries tell you how the aquifer is changing over time**. ::: ## Implications for Management ### 1. Rising Trends Are - Both high and medium-quality zones show recovery - Suggests reduced pumping or increased recharge - Current extraction rates appear sustainable - Continue monitoring to ensure trend persists ### 2. Confined Aquifer Requires - Seasonal recharge matters less (small seasonal component) - Long-term cumulative balance critical (trend component) - Pumping effects propagate slowly but persist long - Recovery from over-pumping takes years to decades ### 3. Long Memory Affects - Past conditions valuable for forecasting (high ACF) - Can use 6-12 months of history for predictions - Droughts have multi-year impacts - Single wet year insufficient for recovery ### 4. Data Quality Determines - Only 6 of 18 wells had sufficient continuous data - Irregular monitoring prevents decomposition - Need automated telemetry for reliable temporal analysis - Design monitoring for analysis, not just compliance ## Limitations and Uncertainties **1. Small Sample Size:** - Only 6 wells successfully decomposed (33% of available) - Cannot generalize patterns across entire aquifer - More wells with continuous monitoring needed **2. Short Records:** - 13-year period may miss decadal cycles - Cannot detect multi-decadal climate trends - Long-term (50+ year) records essential for robust trends **3. Unbalanced Quality Distribution:** - 5 high-quality vs 1 medium-quality well decomposed - Cannot compare seasonal patterns robustly across categories - Sampling bias toward productive zones **4. Interpolation Bias:** - 6.2% of data interpolated (gaps ≤7 days) - Smooths short-term variability - Limited to short gaps only (conservative approach) **5. No Causal Attribution:** - Decomposition is descriptive, not explanatory - Cannot determine if trends due to pumping, climate, or other factors - Cross-correlation with drivers needed (next analyses) ## Next Steps **Immediate:** 1. Apply decomposition to more wells as continuous data becomes available 2. Acquire pumping records to explain rising trends 3. Compare decomposition results by HTEM-predicted aquifer quality **Medium-Term:** 4. Cross-correlate trend component with precipitation trends 5. Cross-correlate seasonal component with climate forcing 6. Link residual anomalies to extreme events (floods, droughts) **Long-Term:** 7. Develop predictive models using decomposition components 8. Integrate with spatial patterns (Part 2: HTEM analysis) 9. Build real-time monitoring dashboard with decomposition ## Summary Water level temporal analysis reveals: ✅ **Aquifer is recovering** - Both quality categories show rising trends (+0.3 to +0.6 ft/year) ✅ **System is confined** - Tiny seasonal amplitude (0.03-0.11 ft) and long memory (ACF = 0.5) ✅ **Trends dominate** - 40-60% of variance explained by long-term progression ✅ **System has memory** - Past conditions affect present for 18-24 months ⚠️ **Data gaps limit analysis** - Only 33% of wells had sufficient continuous records ⚠️ **Short records** - 13 years insufficient for detecting decadal climate cycles **Key Insight:** The confined aquifer characteristics discovered here have profound implications for management - responses are slow, memory is long, and recovery from stress takes years, not months. This fundamentally differs from unconfined systems and requires adapted management strategies. --- ## Reflection Questions - Looking at the decomposition and trend results, how would you explain to a non-technical stakeholder why a seemingly small seasonal amplitude can still carry important information about confinement? - If you could improve only one aspect of the groundwater monitoring network to strengthen the trend analysis in this chapter (record length, continuity, number of wells, or metadata on pumping), which would you choose and why? - How might your interpretation of trends change if you knew that a major pumping station started or stopped partway through the record, and how would you try to detect that in the data? - Where do you see the biggest risk of misinterpreting apparent trends if you ignore autocorrelation, non-stationarity, or cross-source checks with precipitation and streamflow? --- ## Related Chapters - [Seasonal Decomposition](seasonal-decomposition.qmd) - Multi-scale temporal patterns - [Memory & Persistence Study](memory-persistence-study.qmd) - Hurst exponents and memory analysis - [Streamflow Variability](streamflow-variability.qmd) - 77-year baseflow trends

20.1 What You Will Learn in This Chapter

20.2 Introduction

20.3 Data and Methods

20.3.1 Data Sources

20.3.2 STL Decomposition Method

20.3.3 Mann-Kendall Trend Test

20.4 Setup and Data Loading

20.5 Temporal Decomposition Analysis

20.5.1 Groundwater Level Decomposition

20.5.2 Interpretation

20.6 Mann-Kendall Trend Detection

20.7 Autocorrelation Analysis

20.8 Key Findings

20.8.1 Temporal Decomposition Results

20.8.2 System Memory

20.8.3 Critical Discovery: Confined Aquifer Behavior

20.9 Implications for Management

20.9.1 1. Rising Trends Are

20.9.2 2. Confined Aquifer Requires

20.9.3 3. Long Memory Affects

20.9.4 4. Data Quality Determines

20.10 Limitations and Uncertainties

20.11 Next Steps

20.12 Summary

20.13 Reflection Questions

20.14 Related Chapters