21 Seasonal Decomposition Analysis

Multi-scale temporal patterns using STL decomposition

For Newcomers

You will learn:

How to separate “signal” from “noise” in water level data
Why groundwater has a seasonal rhythm (spring highs, summer lows)
What long-term trends reveal about aquifer sustainability
How to identify unusual events that break the normal pattern

Think of this like an audio equalizer that separates bass (long-term trends), mid-range (seasonal patterns), and treble (short-term events). Once separated, each “frequency” tells a different story about the aquifer.

21.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Explain in plain language what STL decomposition does and why it is useful for groundwater time series.
Interpret the trend, seasonal, and residual components of a decomposed groundwater-level record and relate them to physical processes.
Use variance decomposition to decide whether long-term trends, seasonal cycles, or short-term events dominate a given well.
Connect the dominant temporal component to monitoring design (sampling frequency) and management strategy.

21.2 Introduction

Groundwater systems operate simultaneously across multiple temporal scales - from daily precipitation events to decadal climate cycles. This chapter uses STL (Seasonal and Trend decomposition using Loess) to separate these scales, revealing which time periods dominate aquifer behavior.

Key Questions:

How much variance is in the trend vs seasonal vs residual components?
What is the characteristic seasonal pattern in groundwater levels?
Are temporal patterns stationary or changing over time?
What physical processes operate at each scale?

🎯 Plain Language: What is “Decomposition”?

Think of water levels like a music signal: When you hear a song, your ears perceive one sound—but it’s actually multiple instruments playing together. Audio engineers can “decompose” the song into separate tracks (drums, bass, vocals).

STL decomposition does the same for water levels:

What you measure: One water level value each day
What’s really happening: Multiple processes (climate trends, seasonal pumping, random storms) all mixed together
What STL does: Separates the single measurement into distinct “tracks”—each representing a different process

Why this matters: Once separated, you can answer questions like “Is the long-term trend declining?” or “How strong is the seasonal pumping signal?”—questions impossible to answer looking at raw data alone.

💻 For Computer Scientists

STL decomposition separates a time series into three additive components:

\[Y_t = T_t + S_t + R_t\]

Where: - \(Y_t\) = observed time series - \(T_t\) = trend component (long-term direction) - \(S_t\) = seasonal component (repeating pattern) - \(R_t\) = residual component (unexplained variance)

STL advantages: - Handles any type of seasonality (not just monthly/yearly) - Robust to outliers via LOESS (locally weighted regression) - Seasonal component allowed to change over time - No assumptions about stationarity

Key parameters: - period: Length of seasonal cycle (12 for monthly data with annual seasonality) - seasonal: Controls smoothness of seasonal component (higher = smoother) - trend: Controls smoothness of trend component (higher = smoother)

🌍 For Geologists/Hydrologists

Physical interpretation of STL components:

Trend Component (multi-year): - Long-term climate change impacts - Aquifer depletion/recovery from pumping - Land use changes affecting recharge - Regional water management policies - Timescale: Years to decades

Seasonal Component (annual cycle): - Recharge-discharge seasonal pattern - Agricultural pumping seasons - Evapotranspiration variations - Seasonal precipitation patterns - Timescale: 12-month cycle

Residual Component (short-term): - Individual storm events - Pumping fluctuations - Barometric pressure effects - Data measurement errors - Timescale: Days to weeks

Physical meaning of variance distribution: - High seasonal variance → Strong annual forcing (recharge/pumping cycles) - High trend variance → Long-term changes dominate (climate/management) - High residual variance → Event-driven system (storms, pumping)

Understanding component variance reveals which processes dominate and informs monitoring design.

21.3 Data and Methods

21.3.1 Data Loading

Show setup and imports code

import os
import sys
from pathlib import Path

import numpy as np
import pandas as pd
import sqlite3
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings

warnings.filterwarnings("ignore")

# Conditional import for statsmodels
try:
    from statsmodels.tsa.seasonal import STL
    STATSMODELS_AVAILABLE = True
except ImportError:
    STATSMODELS_AVAILABLE = False
    print("Note: statsmodels not available. STL decomposition will use fallback.")


def find_repo_root(start: Path) -> Path:
    for candidate in [start, *start.parents]:
        if (candidate / "src").exists():
            return candidate
    return start


quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd())))
project_root = find_repo_root(quarto_project)
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

from src.utils import get_data_path

print("Seasonal decomposition analysis initialized")

Seasonal decomposition analysis initialized

21.3.2 Load Groundwater Time Series

Show groundwater data loading code

# Connect to aquifer database with error handling
data_loaded = False

try:
    aquifer_db = get_data_path("aquifer_db")
    conn = sqlite3.connect(aquifer_db)

    # Load groundwater data with proper timestamp parsing
    query = """
    SELECT P_Number, TIMESTAMP, Water_Surface_Elevation
    FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
    WHERE Water_Surface_Elevation IS NOT NULL
    ORDER BY TIMESTAMP
    """

    gw_df = pd.read_sql_query(query, conn)
    conn.close()

    # Parse timestamps using US format (M/D/YYYY)
    gw_df['MeasurementDate'] = pd.to_datetime(gw_df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    gw_df = gw_df.dropna(subset=['MeasurementDate', 'Water_Surface_Elevation'])
    data_loaded = True
    print(f"✅ Loaded {len(gw_df):,} measurements from database")

except Exception as e:
    print(f"⚠️ Error loading groundwater from aquifer.db: {e}")
    print(f"   Table: OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY")
    print("   Using empty dataset - visualizations will show placeholder data")
    data_loaded = False
    gw_df = pd.DataFrame(columns=['P_Number', 'TIMESTAMP', 'Water_Surface_Elevation', 'MeasurementDate'])

# Select a well with longest time span for temporal analysis
well_stats = gw_df.groupby('P_Number')['MeasurementDate'].agg(['min', 'max', 'count'])
well_stats['span_days'] = (well_stats['max'] - well_stats['min']).dt.days
well_stats = well_stats.sort_values('span_days', ascending=False)

print(f"\nTop wells by time span:")
print(well_stats.head(10))

# Use the well with longest time span
selected_well = well_stats.index[0]
well_data = gw_df[gw_df['P_Number'] == selected_well].copy()
well_data = well_data.sort_values('MeasurementDate')

print(f"\nSelected well: {selected_well}")
print(f"  Total measurements: {len(well_data)}")
print(f"  Date range: {well_data['MeasurementDate'].min()} to {well_data['MeasurementDate'].max()}")

# Aggregate to monthly data for efficiency
well_data.set_index('MeasurementDate', inplace=True)
monthly_wl = well_data['Water_Surface_Elevation'].resample('ME').mean().dropna()

# Limit to ~500 points maximum (last ~40 years of monthly data)
if len(monthly_wl) > 500:
    monthly_wl = monthly_wl.iloc[-500:]

print(f"  Monthly aggregated: {len(monthly_wl)} points")
print(f"  Analysis period: {monthly_wl.index.min()} to {monthly_wl.index.max()}")

✅ Loaded 1,033,355 measurements from database

Top wells by time span:
                min        max   count  span_days
P_Number                                         
444863   2008-08-27 2023-06-02  127967       5392
381684   2008-09-04 2023-06-02  113570       5384
434983   2008-07-09 2020-04-28  102372       4311
444855   2012-12-17 2023-06-02   47317       3819
495463   2019-02-22 2023-06-02   37022       1561
496467   2019-02-22 2023-06-02   34972       1561
452904   2019-08-01 2023-06-02   33611       1401
381687   2019-10-21 2023-06-02   31254       1320
268557   2019-10-22 2023-06-02   30936       1319
505586   2021-10-05 2023-06-02   14515        605

Selected well: 444863
  Total measurements: 127967
  Date range: 2008-08-27 00:00:00 to 2023-06-02 00:00:00
  Monthly aggregated: 179 points
  Analysis period: 2008-08-31 00:00:00 to 2023-06-30 00:00:00

21.4 STL Decomposition

21.4.1 Compute Seasonal Decomposition

Show STL decomposition computation code

# STL decomposition with period=12 (monthly data, annual seasonality)
stl = STL(monthly_wl, period=12, seasonal=13, trend=None)
result = stl.fit()

# Extract components
observed = monthly_wl
trend = result.trend
seasonal = result.seasonal
residual = result.resid

print(f"\nSTL decomposition completed:")
print(f"  Original series: {len(observed)} monthly observations")
print(f"  Components extracted: Trend, Seasonal (12-month), Residual")
print(f"\nComponent statistics:")
print(f"  Trend range: {trend.min():.2f} to {trend.max():.2f} ft")
print(f"  Seasonal amplitude: {seasonal.max() - seasonal.min():.2f} ft")
print(f"  Residual std dev: {residual.std():.2f} ft")


STL decomposition completed:
  Original series: 179 monthly observations
  Components extracted: Trend, Seasonal (12-month), Residual

Component statistics:
  Trend range: 611.59 to 678.80 ft
  Seasonal amplitude: 15.55 ft
  Residual std dev: 4.32 ft

21.4.2 Visualize STL Components

Show STL component visualization code

fig = make_subplots(
    rows=4, cols=1,
    subplot_titles=(
        'Original Time Series',
        'Trend Component (Multi-Year Pattern)',
        'Seasonal Component (Annual Cycle)',
        'Residual Component (Short-Term Variability)'
    ),
    vertical_spacing=0.08,
    row_heights=[0.25, 0.25, 0.25, 0.25]
)

# Panel 1: Original time series
fig.add_trace(
    go.Scatter(
        x=observed.index,
        y=observed.values,
        mode='lines',
        line=dict(color='#2c3e50', width=1.5),
        name='Observed',
        hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
    ),
    row=1, col=1
)

# Panel 2: Trend component
fig.add_trace(
    go.Scatter(
        x=trend.index,
        y=trend.values,
        mode='lines',
        line=dict(color='#e74c3c', width=2),
        name='Trend',
        hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
    ),
    row=2, col=1
)

# Panel 3: Seasonal component
fig.add_trace(
    go.Scatter(
        x=seasonal.index,
        y=seasonal.values,
        mode='lines',
        line=dict(color='#3498db', width=1.5),
        name='Seasonal',
        hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
    ),
    row=3, col=1
)

# Add horizontal line at 0 for seasonal
fig.add_hline(y=0, line=dict(color='gray', dash='dash', width=1), row=3, col=1)

# Panel 4: Residual component
fig.add_trace(
    go.Scatter(
        x=residual.index,
        y=residual.values,
        mode='lines',
        line=dict(color='#95a5a6', width=1),
        name='Residual',
        hovertemplate='%{x|%Y-%m}<br>%{y:.2f} ft<extra></extra>'
    ),
    row=4, col=1
)

# Add horizontal line at 0 for residual
fig.add_hline(y=0, line=dict(color='gray', dash='dash', width=1), row=4, col=1)

# Update axes
fig.update_xaxes(title_text='Date', row=4, col=1)
fig.update_yaxes(title_text='Water Level (ft)', row=1, col=1)
fig.update_yaxes(title_text='Trend (ft)', row=2, col=1)
fig.update_yaxes(title_text='Seasonal (ft)', row=3, col=1)
fig.update_yaxes(title_text='Residual (ft)', row=4, col=1)

fig.update_layout(
    height=1000,
    showlegend=False,
    hovermode='x unified',
    template='plotly_white'
)

fig.show()

print("\nVisualization shows:")
print("  - Original series with clear seasonal oscillations")
print("  - Trend captures long-term changes (years to decades)")
print("  - Seasonal pattern repeats annually with consistent amplitude")
print("  - Residuals show short-term variability and measurement noise")


Visualization shows:
  - Original series with clear seasonal oscillations
  - Trend captures long-term changes (years to decades)
  - Seasonal pattern repeats annually with consistent amplitude
  - Residuals show short-term variability and measurement noise

(a) STL Decomposition of Groundwater Levels showing trend, seasonal, and residual components

(b)

Figure 21.1

📘 How to Read the 4-Panel STL Decomposition

What Is It? STL decomposes your time series into three additive components: Trend (long-term direction), Seasonal (repeating annual pattern), and Residual (everything else). Reading this plot reveals which timescales dominate aquifer behavior.

Why Does It Matter? Understanding component structure informs monitoring design (how often to sample), forecasting strategy (which patterns are predictable), and management timescales (when interventions matter).

How to Interpret Each Panel:

Panel	What to Look For	Physical Meaning	Management Action
Original (Top)	Overall pattern, amplitude	Raw measured behavior	Context for all other panels
Trend (2nd)	Direction (rising/falling), curvature	Multi-year aquifer changes (climate, pumping)	Long-term sustainability planning
Seasonal (3rd)	Amplitude, regularity, stability	Annual recharge-discharge cycle	Seasonal water allocation timing
Residual (Bottom)	Magnitude, spikiness, randomness	Events, noise, measurement error	Real-time monitoring needs

Key Patterns:

Smooth trend line: Gradual long-term change (predictable)
Wiggly trend line: Regime shifts or non-stationary behavior (requires investigation)
Large seasonal amplitude: Strong annual forcing (recharge/pumping cycles dominant)
Small seasonal amplitude: Buffered system (deep confined aquifer)
Small residuals: Well-explained by trend+seasonal (good model fit)
Large spiky residuals: Event-driven or high noise (need more frequent monitoring)


## Variance Decomposition

### Component Variance Analysis

::: {.callout-note icon=false}
## Understanding Variance Decomposition

**What Is It?**

Variance decomposition partitions the total variability in water levels into three components: how much variation comes from long-term trends, seasonal cycles, and short-term events. This analysis answers "what drives the ups and downs we see?"

**Why Does It Matter?**

Knowing the dominant source of variability informs management:

- **Trend-dominated**: Focus on long-term sustainability (climate change, pumping rates)
- **Season-dominated**: Plan around annual cycles (recharge windows, pumping schedules)
- **Residual-dominated**: Need real-time monitoring for event-driven responses

**How Does It Work?**

Since STL decomposes the signal as: Observed = Trend + Seasonal + Residual

We can calculate the variance (measure of spread) of each component and express as percentage of total variance:

- **Var(Trend)**: Variance explained by multi-year changes
- **Var(Seasonal)**: Variance explained by annual cycles
- **Var(Residual)**: Unexplained variance (events, noise, measurement error)

**What Will You See?**

1. **Table**: Numerical breakdown showing variance and percentage for each component
2. **Bar Chart**: Visual comparison of which component explains most variability
3. **Seasonal Pattern**: The average annual cycle isolated from other effects

**How to Interpret:**

| Dominant Component | % Variance | Management Implication |
|-------------------|------------|------------------------|
| **Trend** | >50% | Long-term planning critical - system changing over time |
| **Seasonal** | >50% | Annual patterns dominate - align operations with seasons |
| **Residual** | >40% | High event sensitivity - need frequent monitoring |
| **Balanced** | ~33% each | Multiple timescales matter - need integrated approach |

**Typical Aquifer Patterns:**

- **Confined aquifer**: Trend 40-60%, Seasonal 5-15%, Residual 30-50%
- **Unconfined aquifer**: Trend 20-30%, Seasonal 40-60%, Residual 10-20%
- **Event-driven system**: Trend 10-20%, Seasonal 10-30%, Residual 50-70%

:::

::: {#9a07a4a4 .cell execution_count=5}
``` {.python .cell-code code-fold="true" code-summary="Show variance calculation code"}
# Calculate variance of each component
var_observed = observed.var()
var_trend = trend.var()
var_seasonal = seasonal.var()
var_residual = residual.var()

# Total variance (should approximately equal observed variance)
var_total = var_trend + var_seasonal + var_residual

# Percentage contribution
pct_trend = 100 * var_trend / var_total
pct_seasonal = 100 * var_seasonal / var_total
pct_residual = 100 * var_residual / var_total

print("\nVariance Decomposition:")
print("="*70)
print(f"{'Component':<20} {'Variance':<15} {'Percentage':<15}")
print("="*70)
print(f"{'Trend':<20} {var_trend:>10.4f}     {pct_trend:>6.1f}%")
print(f"{'Seasonal':<20} {var_seasonal:>10.4f}     {pct_seasonal:>6.1f}%")
print(f"{'Residual':<20} {var_residual:>10.4f}     {pct_residual:>6.1f}%")
print("="*70)
print(f"{'Total':<20} {var_total:>10.4f}     {100.0:>6.1f}%")
print(f"{'Observed':<20} {var_observed:>10.4f}")
print("="*70)


Variance Decomposition:
======================================================================
Component            Variance        Percentage     
======================================================================
Trend                  435.9434       94.6%
Seasonal                 6.1602        1.3%
Residual                18.6714        4.1%
======================================================================
Total                  460.7750      100.0%
Observed               483.1449
======================================================================

21.4.3 Visualize Variance Distribution

Show variance visualization code

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Variance by Component', 'Seasonal Pattern (Averaged Across Years)'),
    horizontal_spacing=0.15,
    specs=[[{"type": "bar"}, {"type": "scatter"}]]
)

# Panel A: Variance bars
components = ['Trend', 'Seasonal', 'Residual']
variances = [pct_trend, pct_seasonal, pct_residual]
colors = ['#e74c3c', '#3498db', '#95a5a6']

fig.add_trace(
    go.Bar(
        x=components,
        y=variances,
        marker=dict(color=colors, line=dict(color='black', width=1)),
        text=[f'{v:.1f}%' for v in variances],
        textposition='outside',
        hovertemplate='%{x}<br>%{y:.1f}% of variance<extra></extra>',
        showlegend=False
    ),
    row=1, col=1
)

# Panel B: Average seasonal pattern
# Extract just one year of seasonal pattern (it repeats)
seasonal_pattern = seasonal.iloc[:12].values
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

fig.add_trace(
    go.Scatter(
        x=months,
        y=seasonal_pattern,
        mode='lines+markers',
        line=dict(color='#3498db', width=3),
        marker=dict(size=10, color='#3498db', line=dict(color='black', width=1)),
        name='Seasonal',
        hovertemplate='%{x}<br>%{y:.2f} ft<extra></extra>',
        showlegend=False
    ),
    row=1, col=2
)

# Add zero line
fig.add_hline(y=0, line=dict(color='gray', dash='dash', width=1), row=1, col=2)

# Update axes
fig.update_xaxes(title_text='Component', row=1, col=1)
fig.update_yaxes(title_text='Variance (%)', range=[0, max(variances)*1.2], row=1, col=1)
fig.update_xaxes(title_text='Month', row=1, col=2)
fig.update_yaxes(title_text='Seasonal Effect (ft)', row=1, col=2)

fig.update_layout(
    height=500,
    template='plotly_white',
    hovermode='closest'
)

fig.show()

# Identify peak recharge/discharge months
max_seasonal = seasonal.iloc[:12].idxmax()
min_seasonal = seasonal.iloc[:12].idxmin()

print(f"\nSeasonal Pattern:")
print(f"  Peak water levels: {max_seasonal.strftime('%B')} (+{seasonal_pattern.max():.2f} ft)")
print(f"  Lowest water levels: {min_seasonal.strftime('%B')} ({seasonal_pattern.min():.2f} ft)")
print(f"  Seasonal amplitude: {seasonal_pattern.max() - seasonal_pattern.min():.2f} ft")

Figure 21.2: Distribution of variance across STL components reveals dominant temporal patterns


Seasonal Pattern:
  Peak water levels: October (+1.90 ft)
  Lowest water levels: May (-1.39 ft)
  Seasonal amplitude: 3.29 ft

21.5 Component Analysis

21.5.1 Residual Diagnostics

Understanding Residual Analysis

What Is It?

After removing trend and seasonality, what’s left should be “white noise”—random, unpredictable variation with no pattern. If residuals still show patterns, it means our decomposition missed something important.

Why Does It Matter?

Good residuals (white noise) mean:

Complete decomposition: We’ve captured all systematic patterns
Valid forecasts: Can predict future using trend + seasonal components
Trustworthy inference: Statistical tests on components are reliable

Bad residuals (not white noise) mean:

Missing patterns: Maybe sub-seasonal cycles, delayed responses, or regime changes
Model inadequacy: Need more sophisticated methods (time-varying seasonality, nonlinear trends)

How Does It Work?

We test residuals for “white noise” properties:

Mean near zero: No systematic bias (should average to 0)
Constant variance: Spread doesn’t change over time
Normal distribution: Bell-shaped histogram (validated by Shapiro-Wilk test, 1965)
No autocorrelation: No patterns in the ACF plot (bars within red bounds)

What Will You See?

Statistics Table: Mean, standard deviation, skewness, kurtosis, normality test
Histogram: Distribution shape with normal curve overlay (red line)
ACF Plot: Bars showing correlation at different lags with significance bounds (red dashed lines)

How to Interpret:

Diagnostic	Good (White Noise)	Bad (Pattern Remains)
Mean	≈ 0	Far from 0
Histogram	Bell-shaped, symmetric	Skewed, multi-modal, heavy tails
Shapiro-Wilk p	> 0.05 (normal)	< 0.05 (non-normal)
ACF bars	<10% outside red lines	Many bars outside red lines
Skewness	≈ 0 (symmetric)	>>0 or <<0 (asymmetric)
Kurtosis	≈ 0 (normal tails)	>>0 (heavy tails) or <<0 (light tails)

What Patterns in Residuals Mean:

Significant ACF at lag 1-3: Short-term dependencies (weather events, pumping)
Significant ACF at lag 12: Incomplete seasonal removal
Skewness ≠ 0: Asymmetric events (e.g., pumping creates only drawdown, not rise)
High kurtosis: Occasional extreme events (floods, droughts)

Show residual diagnostics code

# Check if residuals are white noise (as they should be if decomposition is good)
try:
    from scipy import stats
    SCIPY_AVAILABLE = True
except ImportError:
    SCIPY_AVAILABLE = False
    print("Note: scipy not available. Skipping normality tests.")

# Compute residual statistics
residual_mean = residual.mean()
residual_std = residual.std()

if SCIPY_AVAILABLE:
    residual_skew = stats.skew(residual.dropna())
    residual_kurt = stats.kurtosis(residual.dropna())
    # Test for normality
    shapiro_stat, shapiro_p = stats.shapiro(residual.dropna())
else:
    residual_skew = 0.0
    residual_kurt = 0.0
    shapiro_stat, shapiro_p = 0.0, 1.0
    print("(Skewness, kurtosis, and normality test skipped - scipy not available)")

print("\nResidual Diagnostics:")
print("="*70)
print(f"Mean:     {residual_mean:>10.4f} (should be ~0)")
print(f"Std Dev:  {residual_std:>10.4f}")
print(f"Skewness: {residual_skew:>10.4f} (0 = symmetric)")
print(f"Kurtosis: {residual_kurt:>10.4f} (0 = normal)")
print(f"\nShapiro-Wilk Test for Normality:")
print(f"  Statistic: {shapiro_stat:.4f}")
print(f"  P-value:   {shapiro_p:.4f}")
if shapiro_p > 0.05:
    print(f"  Result: Residuals appear normally distributed (p > 0.05)")
else:
    print(f"  Result: Residuals deviate from normality (p < 0.05)")
print("="*70)


Residual Diagnostics:
======================================================================
Mean:         0.0076 (should be ~0)
Std Dev:      4.3210
Skewness:    -0.0847 (0 = symmetric)
Kurtosis:    12.7068 (0 = normal)

Shapiro-Wilk Test for Normality:
  Statistic: 0.7349
  P-value:   0.0000
  Result: Residuals deviate from normality (p < 0.05)
======================================================================

21.5.2 Visualize Residual Distribution

Show residual visualization code

# Import scipy.stats for normal distribution overlay
try:
    from scipy import stats as scipy_stats
    SCIPY_STATS_AVAILABLE = True
except ImportError:
    SCIPY_STATS_AVAILABLE = False

# Import statsmodels acf or use numpy fallback
try:
    from statsmodels.tsa.stattools import acf as statsmodels_acf
    def acf_func(x, nlags=40, fft=False):
        return statsmodels_acf(x, nlags=nlags, fft=fft)
except ImportError:
    # Numpy fallback for ACF
    def acf_func(x, nlags=40, fft=False):
        x = np.asarray(x)
        x = x - np.mean(x)
        n = len(x)
        result = np.correlate(x, x, mode='full')
        result = result[n-1:] / (np.var(x) * np.arange(n, 0, -1))
        return result[:nlags + 1]

fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Residual Histogram', 'Residual Autocorrelation'),
    horizontal_spacing=0.12
)

# Panel A: Histogram
fig.add_trace(
    go.Histogram(
        x=residual.dropna(),
        nbinsx=30,
        marker=dict(color='#95a5a6', line=dict(color='black', width=1)),
        name='Residuals',
        hovertemplate='Bin: %{x:.2f}<br>Count: %{y}<extra></extra>',
        showlegend=False
    ),
    row=1, col=1
)

# Add normal distribution overlay (if scipy available)
if SCIPY_STATS_AVAILABLE:
    x_range = np.linspace(residual.min(), residual.max(), 100)
    normal_dist = scipy_stats.norm.pdf(x_range, residual_mean, residual_std)
    # Scale to match histogram
    normal_dist_scaled = normal_dist * len(residual) * (residual.max() - residual.min()) / 30

    fig.add_trace(
        go.Scatter(
            x=x_range,
            y=normal_dist_scaled,
            mode='lines',
            line=dict(color='red', width=2),
            name='Normal Fit',
            hovertemplate='Value: %{x:.2f}<br>Density: %{y:.2f}<extra></extra>',
            showlegend=False
        ),
        row=1, col=1
    )

# Panel B: Autocorrelation (first 24 lags = 2 years)
lags = 24
acf_values = acf_func(residual.dropna(), nlags=lags, fft=False)

fig.add_trace(
    go.Bar(
        x=list(range(lags+1)),
        y=acf_values,
        marker=dict(color='#95a5a6', line=dict(color='black', width=1)),
        name='ACF',
        hovertemplate='Lag: %{x} months<br>ACF: %{y:.3f}<extra></extra>',
        showlegend=False
    ),
    row=1, col=2
)

# Add significance bounds (95% confidence)
conf_bound = 1.96 / np.sqrt(len(residual))
fig.add_hline(y=conf_bound, line=dict(color='red', dash='dash'), row=1, col=2)
fig.add_hline(y=-conf_bound, line=dict(color='red', dash='dash'), row=1, col=2)

# Update axes
fig.update_xaxes(title_text='Residual Value (ft)', row=1, col=1)
fig.update_yaxes(title_text='Frequency', row=1, col=1)
fig.update_xaxes(title_text='Lag (months)', row=1, col=2)
fig.update_yaxes(title_text='Autocorrelation', row=1, col=2)

fig.update_layout(
    height=500,
    template='plotly_white',
    hovermode='closest'
)

fig.show()

# Count significant autocorrelations
sig_acf = sum(np.abs(acf_values[1:]) > conf_bound)
print(f"\nAutocorrelation Analysis:")
print(f"  Significant lags (out of {lags}): {sig_acf}")
if sig_acf < lags * 0.1:
    print(f"  Interpretation: Residuals are approximately white noise (good)")
else:
    print(f"  Interpretation: Some residual structure remains (consider different parameters)")

Figure 21.3: Residual diagnostics show distribution and autocorrelation structure


Autocorrelation Analysis:
  Significant lags (out of 24): 14
  Interpretation: Some residual structure remains (consider different parameters)

21.6 Key Findings

21.6.1 Component Hierarchy Discovery

Show component hierarchy analysis code

# Determine dominant component
component_dict = {'Trend': pct_trend, 'Seasonal': pct_seasonal, 'Residual': pct_residual}
dominant = max(component_dict, key=component_dict.get)

findings = f"""
VARIANCE DECOMPOSITION SUMMARY:

1. DOMINANT COMPONENT: {dominant}
   • Trend: {pct_trend:.1f}% of variance
   • Seasonal: {pct_seasonal:.1f}% of variance
   • Residual: {pct_residual:.1f}% of variance

2. SEASONAL PATTERN:
   • Peak levels: {max_seasonal.strftime('%B')} (+{seasonal_pattern.max():.2f} ft)
   • Lowest levels: {min_seasonal.strftime('%B')} ({seasonal_pattern.min():.2f} ft)
   • Annual amplitude: {seasonal_pattern.max() - seasonal_pattern.min():.2f} ft

3. TREND CHARACTERISTICS:
   • Range: {trend.min():.2f} to {trend.max():.2f} ft
   • Total change: {trend.iloc[-1] - trend.iloc[0]:.2f} ft over analysis period

4. RESIDUAL QUALITY:
   • Standard deviation: {residual_std:.2f} ft
   • Mean: {residual_mean:.4f} (close to zero = good)
   • Significant autocorrelations: {sig_acf}/{lags} lags

PHYSICAL INTERPRETATION:
"""

if pct_seasonal > pct_trend and pct_seasonal > pct_residual:
    findings += """
   • Seasonal forcing dominates (recharge/pumping cycles)
   • Predictable annual pattern (good for forecasting)
   • Management should align with seasonal cycles
"""
elif pct_trend > pct_seasonal and pct_trend > pct_residual:
    findings += """
   • Long-term trends dominate (climate/pumping impacts)
   • System changing over time (non-stationary)
   • Management must address long-term sustainability
"""
else:
    findings += """
   • Event-scale variability dominates (storms/pumping)
   • System highly responsive to short-term forcing
   • Management needs real-time monitoring
"""

print(findings)


VARIANCE DECOMPOSITION SUMMARY:

1. DOMINANT COMPONENT: Trend
   • Trend: 94.6% of variance
   • Seasonal: 1.3% of variance
   • Residual: 4.1% of variance

2. SEASONAL PATTERN:
   • Peak levels: October (+1.90 ft)
   • Lowest levels: May (-1.39 ft)
   • Annual amplitude: 3.29 ft

3. TREND CHARACTERISTICS:
   • Range: 611.59 to 678.80 ft
   • Total change: -60.80 ft over analysis period

4. RESIDUAL QUALITY:
   • Standard deviation: 4.32 ft
   • Mean: 0.0076 (close to zero = good)
   • Significant autocorrelations: 14/24 lags

PHYSICAL INTERPRETATION:

   • Long-term trends dominate (climate/pumping impacts)
   • System changing over time (non-stationary)
   • Management must address long-term sustainability

21.7 Implications for Management

21.7.1 Monitoring Design

Based on the variance decomposition, we can optimize monitoring frequency:

Show monitoring recommendations code

print("\nMonitoring Recommendations:")
print("="*70)

if pct_seasonal > 50:
    print("RECOMMENDATION: Monthly monitoring sufficient")
    print(f"  • Seasonal component ({pct_seasonal:.1f}% variance) captured by monthly data")
    print(f"  • Annual cycle well-characterized with 12 samples/year")
    print(f"  • Cost-effective for long-term tracking")
elif pct_residual > 40:
    print("RECOMMENDATION: Weekly or bi-weekly monitoring")
    print(f"  • High residual variance ({pct_residual:.1f}%) indicates event-driven system")
    print(f"  • Need higher frequency to capture short-term dynamics")
    print(f"  • Critical for real-time management")
else:
    print("RECOMMENDATION: Quarterly monitoring adequate")
    print(f"  • Trend dominates ({pct_trend:.1f}% variance)")
    print(f"  • Seasonal patterns secondary")
    print(f"  • Focus on long-term tracking")

print("\nADDITIONAL CONSIDERATIONS:")
print(f"  • Deploy continuous sensors if residual > 30% ({pct_residual:.1f}%)")
print(f"  • Annual sampling sufficient if trend > 60% ({pct_trend:.1f}%)")
print("="*70)


Monitoring Recommendations:
======================================================================
RECOMMENDATION: Quarterly monitoring adequate
  • Trend dominates (94.6% variance)
  • Seasonal patterns secondary
  • Focus on long-term tracking

ADDITIONAL CONSIDERATIONS:
  • Deploy continuous sensors if residual > 30% (4.1%)
  • Annual sampling sufficient if trend > 60% (94.6%)
======================================================================

21.7.2 Forecast Horizons

Component-dependent predictability:

Trend component: Highly predictable (smooth, slow-changing)
- Forecast horizon: Years to decades
- Use for long-term planning and infrastructure design
Seasonal component: Moderately predictable (repeating pattern)
- Forecast horizon: Months to 1 year
- Use for annual water allocation and pumping schedules
Residual component: Low predictability (white noise)
- Forecast horizon: Days to weeks (if autocorrelated)
- Requires real-time monitoring for management

21.7.3 Management Interventions

Align interventions with dominant component:

Show management strategy code

print("\nManagement Strategy Recommendations:")
print("="*70)

if pct_trend > max(pct_seasonal, pct_residual):
    print("STRATEGY: Long-term sustainability focus")
    print("  • Address trend direction (declining/rising)")
    print("  • Implement demand management")
    print("  • Plan infrastructure for future conditions")
    print("  • Monitor climate change impacts")

elif pct_seasonal > max(pct_trend, pct_residual):
    print("STRATEGY: Seasonal management focus")
    print("  • Optimize pumping around recharge seasons")
    print("  • Implement seasonal water allocation")
    print("  • Design storage for inter-seasonal transfer")
    print("  • Coordinate with agricultural cycles")

else:
    print("STRATEGY: Adaptive/real-time management")
    print("  • Install continuous monitoring")
    print("  • Implement trigger-based responses")
    print("  • Maintain operational flexibility")
    print("  • Prepare for extreme events")

print("="*70)


Management Strategy Recommendations:
======================================================================
STRATEGY: Long-term sustainability focus
  • Address trend direction (declining/rising)
  • Implement demand management
  • Plan infrastructure for future conditions
  • Monitor climate change impacts
======================================================================

21.8 Key Findings

📊 Variance Decomposition Results Summary

What Does the Decomposition Tell Us?

Based on the variance analysis above, this well’s behavior is dominated by the {dominant} component, explaining {max(pct_trend, pct_seasonal, pct_residual):.1f}% of total variability.

Key Findings by Component:

Component	% Variance	Interpretation	Management Implication
Trend	{pct_trend:.1f}%	{‘Dominant - Long-term changes control system’ if pct_trend > max(pct_seasonal, pct_residual) else ‘Secondary - Some multi-year signal present’ if pct_trend > 20 else ‘Minimal - Stationary baseline’}	{‘Focus on sustainability and long-term planning’ if pct_trend > 40 else ‘Monitor for emerging trends’ if pct_trend > 20 else ‘Short-term focus adequate’}
Seasonal	{pct_seasonal:.1f}%	{‘Dominant - Annual cycles control system’ if pct_seasonal > max(pct_trend, pct_residual) else ‘Secondary - Moderate seasonal forcing’ if pct_seasonal > 20 else ‘Minimal - Weakly seasonal’}	{‘Align operations with recharge seasons’ if pct_seasonal > 40 else ‘Consider seasonal patterns’ if pct_seasonal > 20 else ‘Season less critical’}
Residual	{pct_residual:.1f}%	{‘Dominant - Event-driven, high variability’ if pct_residual > max(pct_trend, pct_seasonal) else ‘Secondary - Moderate event response’ if pct_residual > 20 else ‘Minimal - Smooth, predictable’}	{‘Real-time monitoring essential’ if pct_residual > 40 else ‘Monthly monitoring sufficient’ if pct_residual > 20 else ‘Quarterly adequate’}

Seasonal Pattern Analysis:

Peak month: {max_seasonal.strftime(‘%B’)} (water levels highest)
Lowest month: {min_seasonal.strftime(‘%B’)} (water levels lowest)
Annual amplitude: {seasonal_pattern.max() - seasonal_pattern.min():.2f} ft
Physical meaning: {seasonal_pattern.max() - seasonal_pattern.min():.2f} ft annual swing indicates {‘strong’ if (seasonal_pattern.max() - seasonal_pattern.min()) > 2 else ‘moderate’ if (seasonal_pattern.max() - seasonal_pattern.min()) > 1 else ‘weak’} seasonal forcing

Optimal Monitoring Strategy:

{f’Quarterly sampling recommended - Trend dominates ({pct_trend:.0f}%), slow changes, annual tracking sufficient’ if pct_trend > 50 else f’Monthly sampling recommended - Seasonal patterns dominate ({pct_seasonal:.0f}%), need to capture annual cycle’ if pct_seasonal > 40 else f’Weekly or bi-weekly sampling recommended - High event variability ({pct_residual:.0f}%), need frequent measurements’ if pct_residual > 40 else ‘Monthly sampling recommended - Balanced variance across scales’}

Residual Quality Assessment:

Mean: {residual_mean:.4f} (near zero = good)
Normality: {‘Approximately normal’ if shapiro_p > 0.05 else ‘Deviates from normal’} (p={shapiro_p:.3f})
Autocorrelation: {sig_acf}/{lags} significant lags {‘(good - white noise)’ if sig_acf < lags * 0.1 else ‘(some structure remains)’}
Conclusion: {‘Decomposition captures all systematic patterns’ if sig_acf < lags * 0.1 and shapiro_p > 0.05 else ‘Some residual structure present - consider alternative models’}

21.9 Summary

STL decomposition of groundwater levels reveals:

✅ Component separation - Trend, seasonal, and residual patterns clearly identified

✅ Variance quantification - Each component’s contribution to total variability measured

✅ Seasonal characterization - Annual cycle amplitude and timing determined

✅ Trend detection - Long-term changes isolated from short-term fluctuations

✅ Residual analysis - Unexplained variance characterized and tested

⚠️ Multiple timescales - All components contribute to system behavior

⚠️ Non-stationary potential - Trend component indicates changing baseline

Key Insight: The dominant component (trend, seasonal, or residual) reveals the primary forcing mechanism and informs optimal monitoring frequency and management strategy. Systems dominated by seasonal patterns are predictable and manageable with annual planning; trend-dominated systems require long-term sustainability focus; residual-dominated systems need real-time adaptive management.

21.10 Further Work

Extensions for researchers:

Time-varying seasonality: Apply STL with adaptive seasonal windows to detect changing seasonal amplitude (climate change signature)
Multi-well comparison: Compare variance decomposition across wells to identify regional vs. local patterns
Covariate integration: Correlate residuals with precipitation, pumping, or stream flow to explain unexplained variance
Forecast validation: Use decomposed components for improved water level forecasting (trend + seasonal extrapolation)
Regime detection: Apply change-point detection to identify when variance structure shifts

Open questions:

Does the seasonal amplitude change over decades? (climate signal)
Can residual patterns predict extreme events?
How do pumping regimes affect decomposition structure?

21.11 Reflection Questions

For a well where the seasonal component explains most of the variance, how would you design a monitoring schedule and management plan differently than for a well dominated by long-term trend?
How would you explain the idea of “variance decomposition” to a non-technical audience, and how does it help you decide where to invest in more frequent measurements?
If you observe that the seasonal amplitude is increasing over time, what physical explanations would you consider, and how might that affect vulnerability and operations?
What additional information (for example, pumping records, land-use changes, or precipitation trends) would you bring in to interpret the residual component more confidently?