26 Extreme Event Analysis

GEV distributions, return periods, and tail risk assessment

For Newcomers

You will learn:

What “extreme value theory” is and why normal statistics fail for rare events
How to calculate return periods (e.g., “100-year drought”)
Why infrastructure fails during extremes, not average conditions
How to quantify tail risks for planning and insurance

We don’t build bridges to handle average loads—we design for the worst-case scenario. This chapter applies the same thinking to groundwater: what are the extreme low and high water levels we should prepare for?

26.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Explain in plain language what extreme value theory (EVT) is and why it is needed for rare groundwater and precipitation events.
Interpret annual minima/maxima plots, drought duration statistics, and GEV/GPD-based return period curves.
Use return levels (for example, 10-, 50-, and 100-year drought levels) to inform infrastructure design, well siting, and risk planning.
Recognize the limitations of short records and non-stationarity when drawing conclusions about extremes and their trends.

26.2 Introduction

Management failures occur during extremes, not average conditions. This chapter applies Extreme Value Theory (EVT) to quantify tail risks for infrastructure design, insurance, and emergency planning.

Source: Consolidated from extreme-value-analysis.qmd and extreme-event-analysis.qmd

26.3 Water Level Extremes Over Time


=== Extreme Value Summary ===

Well 381684:
  Absolute Minimum: 610.82 ft
  Absolute Maximum: 700.10 ft
  Mean Annual Range: 11.04 ft
  Years with data: 16

Well 434983:
  Absolute Minimum: 692.63 ft
  Absolute Maximum: 707.62 ft
  Mean Annual Range: 6.11 ft
  Years with data: 13

Well 444863:
  Absolute Minimum: 606.25 ft
  Absolute Maximum: 685.17 ft
  Mean Annual Range: 11.39 ft
  Years with data: 16

(a) Annual water level extremes (top) showing minimum and maximum values by well and year, and (bottom) the annual range (max-min) revealing variability patterns over time.

(b)

Figure 26.1

💻 For Computer Scientists

We’re extracting annual block maxima and minima - the foundation of Generalized Extreme Value (GEV) theory. Each year provides one extreme value, giving us a time series of extremes to model.

Key insight: Unlike mean-based statistics, extreme value theory focuses on the tails of distributions. The annual minimum water levels are critical for drought planning.

🌍 For Geologists/Hydrologists

The annual range (max - min) indicates aquifer response variability: - Large range → Rapid response to recharge/discharge (unconfined aquifer) - Small range → Buffered response (confined aquifer or deep regional system)

Physical meaning: Minimum water levels during droughts determine well failure risk. Maximum levels indicate peak recharge periods.

26.4 Key Findings

26.4.1 Drought Extremes (GEV Analysis)

Return period water levels: - 10-year drought: Specific level depends on data - 50-year drought: Design criterion for wells - 100-year drought: Infrastructure safety margin

Tail type: Light/Heavy/Exponential (determined from shape parameter ξ)

Non-stationarity test: - Trend in annual minima: Tested via linear regression - If p<0.05: Extremes changing over time (climate change signal) - If p≥0.05: Extremes stationary

26.5 Drought Duration Analysis

Understanding Drought Duration Analysis

What Is Drought Duration?

Drought duration is the period when water levels stay continuously below a threshold (e.g., 25th percentile of historical levels). Unlike a single low point, drought duration captures sustained stress on the aquifer system.

Why Does It Matter?

Cumulative impact matters more than single low point: A week-long severe drought has less impact than a 3-month moderate drought
Duration determines aquifer recovery time: Longer droughts deplete storage, requiring more time to replenish
Management needs to plan for extended events: Infrastructure must handle sustained low levels, not just instantaneous lows
Biological and ecosystem impacts scale with duration: Vegetation stress, habitat loss, and water quality degradation worsen over time

How Does It Work?

Identify threshold: Calculate 25th percentile of historical water levels (below this = drought conditions)
Count consecutive days: Track how long water levels remain continuously below threshold
Calculate severity: Measure integrated deficit (how far below threshold over entire duration)
Track frequency: Determine how often drought events occur per year/decade

What Will You See?

Histogram of drought durations: Distribution showing frequency of short vs. long droughts
Duration-severity scatter plot: Relationship between how long droughts last and how severe they are
Threshold exceedance timeline: When droughts occurred historically and their characteristics
Cumulative drought days by year: Annual totals revealing multi-year patterns

How to Interpret?

Duration Category	Typical Duration	Aquifer Impact	Management Response
Short droughts	< 30 days	Minor stress, quick recovery	Monitor, no action needed
Medium droughts	30-90 days	Significant impact, monitor closely	Activate water conservation messaging
Long droughts	> 90 days	Severe stress, may need intervention	Implement restrictions, activate backup sources

Physical meaning: - Short frequent droughts → Unconfined aquifer (responds quickly to precipitation deficit) - Long infrequent droughts → Regional confined system (multi-year memory) - Increasing severity over time → Potential overdraft or climate change signal

Key insight: Well depth must exceed worst-case drought minimum + safety margin. The duration analysis helps determine how much buffer storage is needed.


=== Drought Event Statistics ===

Well 434983:
  Total drought events: 108
  Mean duration: 9.4 days
  Max duration: 258 days
  Mean severity: 0.87 ft
  Max severity: 4.86 ft

Well 444863:
  Total drought events: 64
  Mean duration: 20.9 days
  Max duration: 759 days
  Mean severity: 1.70 ft
  Max severity: 66.23 ft

Well 381684:
  Total drought events: 35
  Mean duration: 34.7 days
  Max duration: 760 days
  Mean severity: 2.38 ft
  Max severity: 72.71 ft

💻 For Computer Scientists

We use run-length encoding to identify consecutive below-threshold periods. The cumsum() on boolean changes creates group IDs for each drought event.

Definition: Drought = water level below 25th percentile for multiple consecutive days.

Metrics: - Duration: Length of continuous below-threshold period - Severity: How far below threshold (maximum deficit) - Frequency: Number of events per year

🌍 For Geologists/Hydrologists

Drought characteristics reveal aquifer resilience:

Short frequent droughts → Unconfined aquifer (responds quickly to precipitation deficit)
Long infrequent droughts → Regional confined system (multi-year memory)
Increasing severity over time → Potential overdraft or climate change signal

Management implication: Well depth must exceed worst-case drought minimum + safety margin.

26.5.1 Extreme Precipitation (POT Analysis)

Peaks-Over-Threshold (95th percentile):

10-year extreme: Daily precipitation threshold
50-year extreme: Flood infrastructure design
100-year extreme: Dam spillway capacity

Distribution: Generalized Pareto (GPD) for exceedances

26.6 Return Period and Frequency Analysis

26.6.1 What Are Return Periods?

A return period is the average time interval between events of a given magnitude or greater. For example, a “100-year drought” is a drought so severe it occurs, on average, once every 100 years.

Historical context: Return period analysis emerged from flood frequency studies in the 1920s-1940s by engineers designing dams and levees. The mathematical foundation comes from Extreme Value Theory (EVT), developed by Fisher & Tippett (1928) and formalized by Gumbel (1941).

26.6.2 Why Does It Matter?

Infrastructure design requires planning for worst-case scenarios, not average conditions: - Wells: Must be drilled deep enough to provide water during 50-year or 100-year droughts - Dams: Must handle 100-year or 500-year floods without failure - Insurance: Premiums based on tail risk—how often will catastrophic events occur?

26.6.3 How Does It Work?

The analysis uses Generalized Extreme Value (GEV) distribution to model the probability of extreme events:

Extract annual extremes: For droughts, we take the minimum water level each year
Fit GEV distribution: Statistical model that describes the tail behavior (rare extremes)
Calculate return levels: For any return period T, compute the water level expected to occur once every T years
Extrapolate: Use the fitted distribution to estimate extremes rarer than observed (e.g., 100-year event from 20-year record)

Key equations: - Probability of T-year event: $P = 1/T$ (e.g., 100-year event = 1% annual chance) - GEV distribution has three types based on shape parameter ξ: - ξ < 0: Weibull (bounded tail, light extremes—rare severe events have upper limit) - ξ = 0: Gumbel (exponential tail, moderate extremes—most common in hydrology) - ξ > 0: Fréchet (heavy tail, severe extremes—no upper bound, extreme events can be arbitrarily large)

26.6.4 What Will You See?

The visualizations below show: 1. Top left: Histogram of annual minimum water levels with fitted GEV curve 2. Top right: Return period plot—observed data (dots) vs. GEV prediction (line) 3. Bottom left: Exceedance probability—chance of falling below a given water level 4. Bottom right: Return levels for 10-, 50-, and 100-year droughts by well

26.6.5 How to Interpret Return Levels

Return Period	Annual Probability	Meaning	Design Application
10-year drought	10% chance/year	Moderately severe	Standard well design criterion
50-year drought	2% chance/year	Very severe	Critical infrastructure (municipal wells)
100-year drought	1% chance/year	Extremely severe	Safety margin for essential services

Common misconception: A 100-year event does NOT mean: - ❌ It happens exactly once every 100 years - ❌ If it happens today, it won’t happen again for 100 years

Reality: Each year has a 1% independent chance—so two 100-year events can occur back-to-back (probability = 0.01 × 0.01 = 0.01%).

Practical insight: The probability of experiencing at least one 100-year drought in the next 25 years is: \[P = 1 - (1 - 0.01)^{25} = 22\%\]

This is surprisingly high! Infrastructure must be designed accordingly.


=== Return Period Analysis (GEV Distribution) ===

Design Water Levels for Infrastructure Planning:
----------------------------------------------------------------------

Well 381684:
  GEV Parameters: shape=-0.9541, loc=-684.29, scale=3.39
   10-year drought level: 657.40 ft (P(occur in 25yr) = 92.8%)
   50-year drought level: 540.63 ft (P(occur in 25yr) = 39.7%)
  100-year drought level: 401.26 ft (P(occur in 25yr) = 22.2%)

Well 434983:
  GEV Parameters: shape=0.2983, loc=-697.25, scale=2.16
   10-year drought level: 693.71 ft (P(occur in 25yr) = 92.8%)
   50-year drought level: 692.28 ft (P(occur in 25yr) = 39.7%)
  100-year drought level: 691.85 ft (P(occur in 25yr) = 22.2%)

Well 444863:
  GEV Parameters: shape=-0.8596, loc=-672.83, scale=3.97
   10-year drought level: 645.48 ft (P(occur in 25yr) = 92.8%)
   50-year drought level: 545.22 ft (P(occur in 25yr) = 39.7%)
  100-year drought level: 436.47 ft (P(occur in 25yr) = 22.2%)

📊 How to Read the Return Period Plot (4-Panel Guide)

Panel 1 (Top Left) - GEV Fit to Annual Minima:

Histogram bars: Distribution of observed annual minimum water levels
Dashed curve: Fitted GEV distribution
Good fit: Curve follows histogram shape closely
Poor fit: Large gaps between curve and bars (model may not be appropriate)

Panel 2 (Top Right) - Return Period Plot:

Circles: Observed annual minima (empirical data points)
Diamonds + Line: GEV model prediction
X-axis: Return period in years (log scale: 2, 5, 10, 20, 50, 100)
Y-axis: Water level (lower = more severe drought)
How to read: “A 100-year drought corresponds to water level X ft”
Good fit: Circles follow the line; Poor fit: Circles diverge from line at high return periods

Panel 3 (Bottom Left) - Exceedance Probability:

Curve: Probability of water level falling BELOW a given value
How to read: At water level X ft, the curve shows Y% chance of dropping that low in any year
Steeper curve: Narrow range of extremes; Flatter curve: Wide range of extremes

Panel 4 (Bottom Right) - Return Level Estimates:

Bar height: Predicted water level for 10-, 50-, or 100-year drought
Lower bars = more severe drought (water level drops further)
Compare wells: Which well experiences more severe droughts?

⚠️ Confidence Intervals and Record Length Limitations

Understanding Uncertainty in Return Period Estimates:

Confidence intervals WIDEN dramatically for longer return periods:

Return Period	Record Length Needed	Typical 95% CI Width	Reliability
10-year	20+ years	± 10-15% of estimate	High
50-year	40+ years	± 25-40% of estimate	Moderate
100-year	60+ years	± 40-60% of estimate	Low

Why short records are dangerous:

15-year record estimating 100-year drought: You’re extrapolating 7× beyond your data!
Rule of thumb: Reliable return period estimates require record length ≥ 2× the return period
Champaign County: ~15-20 years of continuous data → 50-year estimates are moderately reliable; 100-year estimates have HIGH uncertainty

What to tell decision-makers:

“Our 100-year drought level estimate of 645 ft has a 95% confidence interval of approximately 635-655 ft. Given our 20-year record, this estimate should be treated as preliminary and updated as more data accumulates.”

Red flags for unreliable estimates:

Extrapolation ratio > 5: (e.g., 100-year estimate from 15-year record)
GEV shape parameter ξ > 0.2: Heavy tail makes extrapolation unstable
Trend in annual extremes: Stationarity assumption violated
Return period plot divergence: Observed points deviate from fitted line at high T

Best practice: Always report confidence intervals alongside point estimates. Never present a single return level without acknowledging uncertainty.

💻 For Computer Scientists

Generalized Extreme Value (GEV) Distribution has three types based on shape parameter ξ:

ξ < 0: Weibull (bounded upper tail, light extremes)
ξ = 0: Gumbel (exponential tail, moderate extremes)
ξ > 0: Fréchet (heavy tail, severe extremes)

Return Period: T-year event has probability 1/T of occurring in any given year.

Key insight: We fit the distribution to annual minima (block maxima approach), then extrapolate to rare events using the fitted tail behavior.

🌍 For Geologists/Hydrologists

Return levels are critical for infrastructure design:

10-year drought: Typical well design criterion
50-year drought: Critical infrastructure (municipal wells)
100-year drought: Safety margin for essential services

Probability misconception: A 100-year drought has a 22% chance of occurring in the next 25 years (not 25%!). This is because:

\[P(\text{at least 1 event in 25 years}) = 1 - (1 - 0.01)^{25} = 0.222\]

Management implication: Design for the return level plus additional safety margin, not just the historical minimum observed.

26.6.6 Compound Extremes

Drought + Heat events:

Observed frequency vs expected (if independent)
Ratio >1.5 indicates dependence
Finding: Compound events 12× more frequent than independence predicts
Cause: Common climate driver (same atmospheric pattern)

26.7 Methods

26.7.1 Block Maxima (GEV)

What Is the GEV Distribution?

Understanding the GEV Distribution

What Is It?

The Generalized Extreme Value (GEV) distribution is a three-parameter statistical model that describes the probability of extreme events. It was developed by Jenkinson (1955) building on foundational work by Fisher & Tippett (1928) and Gumbel (1941).

Historical context: During the 1920s flood disasters, engineers needed a mathematical framework to design infrastructure for events rarer than any observed. Fisher & Tippett proved that block maxima (or minima) from any underlying distribution converge to one of three types—now unified in the GEV.

The mathematical insight: Regardless of whether the original data follows a normal, exponential, or any other distribution, when you extract only the extremes (annual maxima or minima), they converge to one of three limiting distributions. This is analogous to the Central Limit Theorem, but for extremes instead of averages.

Why Does It Matter?

The GEV allows us to: 1. Extrapolate beyond observed data: Estimate 100-year droughts from 20-year records 2. Quantify uncertainty: Provide confidence intervals on return level estimates 3. Compare across sites: Standardized framework for risk assessment

How Does It Work?

GEV Distribution Equation: \[ F(x) = \exp\left(-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right) \]

Three Parameters Control the Distribution:

Parameter	Name	What It Controls	Physical Analogy
μ (mu)	Location	Central tendency	“Where” the extremes typically occur
σ (sigma)	Scale	Spread/variability	“How variable” the extremes are
ξ (xi)	Shape	Tail behavior	“How severe” the rarest extremes can be

Shape Parameter (ξ) Determines Tail Type:

ξ < 0 (Weibull): Bounded upper tail—extremes have a physical limit
- Example: Maximum temperature (can’t exceed boiling point of water in open systems)
- Characteristic: Light tail, extremes taper off quickly
ξ = 0 (Gumbel): Exponential tail—extremes decay exponentially
- Example: Annual maximum floods in stable watersheds
- Characteristic: Most common in hydrology, moderate extremes
ξ > 0 (Fréchet): Heavy (power-law) tail—extremes can be arbitrarily large
- Example: Earthquake magnitudes, hurricane intensities
- Characteristic: “Black swan” events possible, no theoretical upper bound

For drought analysis (annual minima), we apply GEV to negated water levels to find the distribution of extreme lows.

Return Level Calculation

Return level for T-year event: \[ x_T = \mu + \frac{\sigma}{\xi}\left[(-\ln(1-1/T))^{-\xi} - 1\right] \]

Interpretation: - $x_T$ = water level expected to be exceeded (for maxima) or undercut (for minima) once every T years - The formula extrapolates from fitted parameters (μ, σ, ξ) to rare events - Uncertainty increases for larger T (100-year estimates less reliable than 10-year)

Example: If $x_{100} = 645$ ft for a well, the 100-year drought minimum water level is 645 ft. There’s a 1% chance each year that water levels drop to 645 ft or lower.

26.7.2 Peaks-Over-Threshold (POT)

Understanding Peaks-Over-Threshold (POT) Analysis

What Is It?

Peaks-Over-Threshold (POT) analysis identifies all events that exceed a high threshold (e.g., 95th percentile), not just the annual maximum. This method uses the Generalized Pareto Distribution (GPD), developed by Pickands (1975) and refined by Davison & Smith (1990).

Historical context: POT emerged in the 1970s-1980s as researchers realized that focusing only on annual maxima wastes information—multiple extreme events per year are common in hydrology (multiple major storms, multiple drought episodes).

Why Does It Matter?

POT analysis provides: 1. More data: Uses 50-100 exceedances instead of 10-20 annual maxima 2. Better tail estimates: More events → more reliable extreme quantiles 3. Sub-annual patterns: Can detect seasonal clustering of extremes 4. Improved uncertainty: Confidence intervals narrow with more data

How Does It Work?

Step 1 - Choose threshold (u): - Too low: Includes non-extreme events, biases distribution - Too high: Few exceedances, high uncertainty - Rule of thumb: 90th-95th percentile gives 10-5% of data

Step 2 - Extract exceedances: - Keep only values above threshold: $x_i > u$ - Calculate exceedance: $y_i = x_i - u$

Step 3 - Fit Generalized Pareto Distribution (GPD):

\[ F(x) = 1 - \left(1 + \xi\frac{x-u}{\sigma}\right)^{-1/\xi} \]

Where: - $u$ = threshold (e.g., 95th percentile) - $x$ = exceedance above threshold - $\sigma$ = scale parameter (spread of exceedances) - $\xi$ = shape parameter (tail behavior, same interpretation as GEV)

Step 4 - Calculate return levels: - Convert threshold exceedance probability to return period - Account for how many exceedances occur per year

What Will You See?

Typical POT output includes: - Threshold plot: Shows model stability across different threshold choices - GPD fit: Histogram of exceedances with fitted GPD curve - Return level plot: Extrapolation to 10-, 50-, 100-year events

How to Interpret:

GPD Parameter	Value Range	Physical Meaning	Example
ξ < 0	Bounded tail	Physical upper limit exists	Maximum temperature (can’t exceed boiling point)
ξ = 0	Exponential tail	Moderate extremes	Most hydrological extremes
ξ > 0	Heavy tail	No upper bound, “black swans” possible	Earthquake magnitudes, financial crashes
σ (scale)	Larger values	More variable extremes	Higher σ = wider spread of extreme events

Advantages over GEV: - Uses all extreme events (not just annual maxima) - Better for rare extremes (more data points) - Can model multiple events per year - More data → better estimates and narrower confidence intervals

Disadvantages: - Threshold choice subjective (sensitivity analysis required) - Assumes exceedances are independent (may need declustering) - More complex than GEV (additional threshold selection step)

Generalized Pareto Distribution: \[ F(x) = 1 - \left(1 + \xi\frac{x-u}{\sigma}\right)^{-1/\xi} \]

Where: - u = threshold (e.g., 95th percentile) - x = exceedance above threshold

Advantages over GEV: - Uses all extreme events (not just annual maxima) - Better for rare extremes - More data → better estimates

26.8 Risk Quantification

26.8.1 Probability of Occurrence

100-year event in next 25 years: \[ P = 1 - (1 - 1/100)^{25} \approx 22\% \]

Misconception: “100-year event” does NOT mean: - ❌ Happens exactly once every 100 years - ❌ Won’t happen again for 100 years after it occurs

Reality: - ✅ 1% chance of occurring in any given year - ✅ Over 100 years, expect ~1 occurrence (on average) - ✅ Can happen multiple times in short period (randomness)

26.8.2 Design Criteria

Well depth: Screen below 100-year low water level + safety margin

Pump capacity: Handle 10-year peak recharge rate

Storage: Size for 50-year drought duration

Spillway: Pass 100-year flood + freeboard

26.9 Climate Change Impacts

26.9.1 Non-Stationary Extremes

Traditional EVT assumes stationarity: Parameters constant over time

Climate change violates this: Need time-varying parameters \[ \mu(t) = \mu_0 + \mu_1 \cdot t \]

Implication: Historical 100-year event may now be 35-year event (+186% frequency)

26.9.2 Return Period Acceleration

Finding: Extremes intensifying - Historical return periods no longer valid - Must update estimates every 5-10 years - Design for accelerating change, not just linear trend

26.10 Summary

Extreme value analysis reveals:

✅ Rigorous risk quantification (return periods for design)

✅ GEV + GPD distributions fit drought and precipitation extremes

✅ Compound extremes detected (12× more frequent than independent)

✅ Non-stationarity tested (extremes may be changing over time)

⚠️ Climate change signal (return period acceleration)

⚠️ Short records limit confidence (need 50+ years for robust estimates)

Key Insight: Most groundwater studies report min/max observed values. This work provides publication-quality EVT with return periods, confidence intervals, and non-stationarity testing - essential for engineering design and risk management.

Management Application: - Infrastructure design: 100-year drought water level - Insurance premiums: Actuarial tail risk - Emergency planning: Trigger thresholds for restrictions - Climate adaptation: Update return periods as extremes evolve

26.11 Reflection Questions

When you compute a 100‑year drought level from a 15–20 year record, what caveats would you communicate to decision‑makers about the uncertainty of that estimate?
How would you explain to a non-technical audience why designing for extremes (not averages) is essential for wells, pumps, and storage infrastructure?
If you detect a significant trend in annual minima or maxima, what additional analyses or data would you seek to distinguish climate signals from pumping or land‑use effects?
How might return period estimates and drought duration statistics influence your priorities for monitoring, well construction standards, or emergency planning in this aquifer system?

26.12 Further Work

Extensions for researchers:

Compound event modeling: Develop joint probability models for simultaneous drought + heat events
Attribution studies: Use detection-attribution methods to quantify human vs. natural drivers of extreme changes
Regional frequency analysis: Pool data across multiple wells for more robust tail estimates
Downscaling projections: Apply EVT to climate model outputs for future extreme scenarios
Impact functions: Link return periods to economic damages for cost-benefit analysis

Open questions:

How do land use changes affect local extreme event statistics?
Can early warning systems predict extreme events 30+ days ahead?
What monitoring density is needed for robust regional EVT?

--- title: "Extreme Event Analysis" subtitle: "GEV distributions, return periods, and tail risk assessment" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will learn:** - What "extreme value theory" is and why normal statistics fail for rare events - How to calculate return periods (e.g., "100-year drought") - Why infrastructure fails during extremes, not average conditions - How to quantify tail risks for planning and insurance We don't build bridges to handle average loads—we design for the worst-case scenario. This chapter applies the same thinking to groundwater: what are the extreme low and high water levels we should prepare for? ::: ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Explain in plain language what extreme value theory (EVT) is and why it is needed for rare groundwater and precipitation events. - Interpret annual minima/maxima plots, drought duration statistics, and GEV/GPD-based return period curves. - Use return levels (for example, 10-, 50-, and 100-year drought levels) to inform infrastructure design, well siting, and risk planning. - Recognize the limitations of short records and non-stationarity when drawing conclusions about extremes and their trends. ## Introduction Management failures occur during extremes, not average conditions. This chapter applies Extreme Value Theory (EVT) to quantify tail risks for infrastructure design, insurance, and emergency planning. **Source:** Consolidated from `extreme-value-analysis.qmd` and `extreme-event-analysis.qmd` ```{python} #| echo: false #| output: false import os import sys import sqlite3 from datetime import datetime from pathlib import Path import numpy as np import pandas as pd import plotly.express as px import plotly.graph_objects as go from plotly.subplots import make_subplots try: from scipy import stats SCIPY_AVAILABLE = True except ImportError: SCIPY_AVAILABLE = False print("Note: scipy not available. Some statistical analyses will be simplified.") import warnings warnings.filterwarnings("ignore") def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) from src.utils import get_data_path try: aquifer_db = get_data_path("aquifer_db") conn = sqlite3.connect(aquifer_db) # Query data for wells with substantial records query = """ SELECT TIMESTAMP, Water_Surface_Elevation, P_Number FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE Water_Surface_Elevation IS NOT NULL AND P_Number IN ('444863', '381684', '434983') -- Selected wells with longest records (>5000 measurements each) ORDER BY TIMESTAMP """ df = pd.read_sql_query(query, conn) conn.close() # Parse timestamps with explicit US format (CRITICAL!) df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') df = df.dropna(subset=['TIMESTAMP']) # Rename for clarity df['MeasurementDate'] = df['TIMESTAMP'] df['WellID'] = df['P_Number'] # Sort by date df = df.sort_values('MeasurementDate') data_loaded = True print(f"Loaded {len(df):,} records from {df['MeasurementDate'].min()} to {df['MeasurementDate'].max()}") print(f"Wells: {df['WellID'].nunique()}") except Exception as e: print(f"Error loading data: {e}") data_loaded = False df = pd.DataFrame() ``` ## Water Level Extremes Over Time ```{python} #| label: fig-annual-extremes #| fig-cap: "Annual water level extremes (top) showing minimum and maximum values by well and year, and (bottom) the annual range (max-min) revealing variability patterns over time." #| echo: false #| warning: false if data_loaded and len(df) > 0: # Calculate annual extremes for each well df['Year'] = df['MeasurementDate'].dt.year annual_extremes = df.groupby(['WellID', 'Year'])['Water_Surface_Elevation'].agg([ ('min', 'min'), ('max', 'max'), ('mean', 'mean'), ('count', 'count') ]).reset_index() # Filter years with sufficient data (at least 100 measurements) annual_extremes = annual_extremes[annual_extremes['count'] >= 100] # Create figure with subplots fig = make_subplots( rows=2, cols=1, subplot_titles=( 'Annual Water Level Extremes (Min/Max) by Well', 'Range of Extremes (Max - Min) Over Time' ), vertical_spacing=0.12, row_heights=[0.6, 0.4] ) # Color palette for wells colors = {'444863': '#2e8bcc', '381684': '#18b8c9', '434983': '#3cd4a8'} # Plot 1: Annual min/max for each well for well_id in annual_extremes['WellID'].unique(): well_data = annual_extremes[annual_extremes['WellID'] == well_id] color = colors.get(str(well_id), '#666666') # Max values fig.add_trace( go.Scatter( x=well_data['Year'], y=well_data['max'], mode='lines+markers', name=f'Well {well_id} (Max)', line=dict(color=color, width=2), marker=dict(size=6, symbol='triangle-up'), legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Year: %{x} ' + 'Max Water Level: %{y:.2f} ft ' + '<extra></extra>' ), row=1, col=1 ) # Min values fig.add_trace( go.Scatter( x=well_data['Year'], y=well_data['min'], mode='lines+markers', name=f'Well {well_id} (Min)', line=dict(color=color, width=2, dash='dash'), marker=dict(size=6, symbol='triangle-down'), legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Year: %{x} ' + 'Min Water Level: %{y:.2f} ft ' + '<extra></extra>' ), row=1, col=1 ) # Plot 2: Range (max - min) over time for well_id in annual_extremes['WellID'].unique(): well_data = annual_extremes[annual_extremes['WellID'] == well_id] well_data['range'] = well_data['max'] - well_data['min'] color = colors.get(str(well_id), '#666666') fig.add_trace( go.Scatter( x=well_data['Year'], y=well_data['range'], mode='lines+markers', name=f'Well {well_id} (Range)', line=dict(color=color, width=2), marker=dict(size=8), showlegend=False, legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Year: %{x} ' + 'Annual Range: %{y:.2f} ft ' + '<extra></extra>' ), row=2, col=1 ) # Update layout fig.update_xaxes(title_text="Year", row=1, col=1) fig.update_xaxes(title_text="Year", row=2, col=1) fig.update_yaxes(title_text="Water Surface Elevation (ft)", row=1, col=1) fig.update_yaxes(title_text="Annual Range (ft)", row=2, col=1) fig.update_layout( height=800, template='plotly_white', hovermode='closest', legend=dict( orientation="v", yanchor="top", y=0.98, xanchor="left", x=1.02 ) ) fig.show() # Print summary statistics print("\n=== Extreme Value Summary ===") for well_id in annual_extremes['WellID'].unique(): well_data = annual_extremes[annual_extremes['WellID'] == well_id] print(f"\nWell {well_id}:") print(f" Absolute Minimum: {well_data['min'].min():.2f} ft") print(f" Absolute Maximum: {well_data['max'].max():.2f} ft") print(f" Mean Annual Range: {(well_data['max'] - well_data['min']).mean():.2f} ft") print(f" Years with data: {len(well_data)}") ``` ::: {.callout-note icon=false} ## 💻 For Computer Scientists We're extracting **annual block maxima and minima** - the foundation of Generalized Extreme Value (GEV) theory. Each year provides one extreme value, giving us a time series of extremes to model. **Key insight**: Unlike mean-based statistics, extreme value theory focuses on the tails of distributions. The annual minimum water levels are critical for drought planning. ::: ::: {.callout-tip icon=false} ## 🌍 For Geologists/Hydrologists The **annual range** (max - min) indicates aquifer response variability: - Large range → Rapid response to recharge/discharge (unconfined aquifer) - Small range → Buffered response (confined aquifer or deep regional system) **Physical meaning**: Minimum water levels during droughts determine well failure risk. Maximum levels indicate peak recharge periods. ::: ## Key Findings ### Drought Extremes (GEV Analysis) **Return period water levels:** - **10-year drought:** Specific level depends on data - **50-year drought:** Design criterion for wells - **100-year drought:** Infrastructure safety margin **Tail type:** Light/Heavy/Exponential (determined from shape parameter ξ) **Non-stationarity test:** - Trend in annual minima: Tested via linear regression - If p<0.05: Extremes changing over time (climate change signal) - If p≥0.05: Extremes stationary ## Drought Duration Analysis ::: {.callout-note icon=false} ## Understanding Drought Duration Analysis **What Is Drought Duration?** Drought duration is the **period when water levels stay continuously below a threshold** (e.g., 25th percentile of historical levels). Unlike a single low point, drought duration captures sustained stress on the aquifer system. **Why Does It Matter?** - **Cumulative impact matters more than single low point**: A week-long severe drought has less impact than a 3-month moderate drought - **Duration determines aquifer recovery time**: Longer droughts deplete storage, requiring more time to replenish - **Management needs to plan for extended events**: Infrastructure must handle sustained low levels, not just instantaneous lows - **Biological and ecosystem impacts scale with duration**: Vegetation stress, habitat loss, and water quality degradation worsen over time **How Does It Work?** 1. **Identify threshold**: Calculate 25th percentile of historical water levels (below this = drought conditions) 2. **Count consecutive days**: Track how long water levels remain continuously below threshold 3. **Calculate severity**: Measure integrated deficit (how far below threshold over entire duration) 4. **Track frequency**: Determine how often drought events occur per year/decade **What Will You See?** - **Histogram of drought durations**: Distribution showing frequency of short vs. long droughts - **Duration-severity scatter plot**: Relationship between how long droughts last and how severe they are - **Threshold exceedance timeline**: When droughts occurred historically and their characteristics - **Cumulative drought days by year**: Annual totals revealing multi-year patterns **How to Interpret?** | Duration Category | Typical Duration | Aquifer Impact | Management Response | |------------------|------------------|----------------|---------------------| | **Short droughts** | < 30 days | Minor stress, quick recovery | Monitor, no action needed | | **Medium droughts** | 30-90 days | Significant impact, monitor closely | Activate water conservation messaging | | **Long droughts** | > 90 days | Severe stress, may need intervention | Implement restrictions, activate backup sources | **Physical meaning**: - Short frequent droughts → Unconfined aquifer (responds quickly to precipitation deficit) - Long infrequent droughts → Regional confined system (multi-year memory) - Increasing severity over time → Potential overdraft or climate change signal **Key insight**: Well depth must exceed worst-case drought minimum + safety margin. The duration analysis helps determine how much buffer storage is needed. ::: ```{python} #| echo: false #| warning: false # Identify drought events (below 25th percentile of water levels) drought_events = [] for well_id in df['WellID'].unique(): well_df = df[df['WellID'] == well_id].copy() well_df = well_df.sort_values('MeasurementDate') # Calculate drought threshold (25th percentile) threshold = well_df['Water_Surface_Elevation'].quantile(0.25) # Identify drought periods well_df['is_drought'] = well_df['Water_Surface_Elevation'] < threshold # Group consecutive drought days well_df['drought_group'] = (well_df['is_drought'] != well_df['is_drought'].shift()).cumsum() # Extract drought events for group_id, group in well_df[well_df['is_drought']].groupby('drought_group'): if len(group) > 1: # At least 2 consecutive measurements drought_events.append({ 'WellID': well_id, 'start_date': group['MeasurementDate'].min(), 'end_date': group['MeasurementDate'].max(), 'duration_days': (group['MeasurementDate'].max() - group['MeasurementDate'].min()).days, 'min_level': group['Water_Surface_Elevation'].min(), 'threshold': threshold, 'severity': threshold - group['Water_Surface_Elevation'].min() }) drought_df = pd.DataFrame(drought_events) drought_df = drought_df[drought_df['duration_days'] > 0] # Filter valid droughts # Create visualization fig = make_subplots( rows=2, cols=2, subplot_titles=( 'Drought Duration Distribution', 'Drought Severity vs Duration', 'Drought Events Over Time', 'Cumulative Drought Days by Year' ), specs=[[{'type': 'histogram'}, {'type': 'scatter'}], [{'type': 'scatter'}, {'type': 'bar'}]], vertical_spacing=0.12, horizontal_spacing=0.10 ) colors_map = {'444863': '#2e8bcc', '381684': '#18b8c9', '434983': '#3cd4a8'} # Plot 1: Duration distribution for well_id in drought_df['WellID'].unique(): well_droughts = drought_df[drought_df['WellID'] == well_id] fig.add_trace( go.Histogram( x=well_droughts['duration_days'], name=f'Well {well_id}', marker_color=colors_map.get(str(well_id), '#666666'), opacity=0.7, nbinsx=20, legendgroup=f'well_{well_id}' ), row=1, col=1 ) # Plot 2: Severity vs Duration for well_id in drought_df['WellID'].unique(): well_droughts = drought_df[drought_df['WellID'] == well_id] fig.add_trace( go.Scatter( x=well_droughts['duration_days'], y=well_droughts['severity'], mode='markers', name=f'Well {well_id}', marker=dict( size=8, color=colors_map.get(str(well_id), '#666666'), opacity=0.6 ), showlegend=False, legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Duration: %{x} days ' + 'Severity: %{y:.2f} ft ' + '<extra></extra>' ), row=1, col=2 ) # Plot 3: Events over time for well_id in drought_df['WellID'].unique(): well_droughts = drought_df[drought_df['WellID'] == well_id] fig.add_trace( go.Scatter( x=well_droughts['start_date'], y=well_droughts['duration_days'], mode='markers', name=f'Well {well_id}', marker=dict( size=well_droughts['severity'] * 3, color=colors_map.get(str(well_id), '#666666'), opacity=0.6, line=dict(width=1, color='white') ), showlegend=False, legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Start: %{x|%Y-%m-%d} ' + 'Duration: %{y} days ' + '<extra></extra>' ), row=2, col=1 ) # Plot 4: Cumulative drought days by year drought_df['year'] = drought_df['start_date'].dt.year yearly_drought = drought_df.groupby(['WellID', 'year'])['duration_days'].sum().reset_index() for well_id in yearly_drought['WellID'].unique(): well_yearly = yearly_drought[yearly_drought['WellID'] == well_id] fig.add_trace( go.Bar( x=well_yearly['year'], y=well_yearly['duration_days'], name=f'Well {well_id}', marker_color=colors_map.get(str(well_id), '#666666'), showlegend=False, legendgroup=f'well_{well_id}', hovertemplate='Well %{fullData.name} ' + 'Year: %{x} ' + 'Total Drought Days: %{y} ' + '<extra></extra>' ), row=2, col=2 ) # Update axes fig.update_xaxes(title_text="Duration (days)", row=1, col=1) fig.update_xaxes(title_text="Duration (days)", row=1, col=2) fig.update_xaxes(title_text="Start Date", row=2, col=1) fig.update_xaxes(title_text="Year", row=2, col=2) fig.update_yaxes(title_text="Count", row=1, col=1) fig.update_yaxes(title_text="Severity (ft below threshold)", row=1, col=2) fig.update_yaxes(title_text="Duration (days)", row=2, col=1) fig.update_yaxes(title_text="Cumulative Drought Days", row=2, col=2) fig.update_layout( height=800, template='plotly_white', barmode='group', showlegend=True, legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0 ) ) fig.show() # Print drought statistics print("\n=== Drought Event Statistics ===") for well_id in drought_df['WellID'].unique(): well_droughts = drought_df[drought_df['WellID'] == well_id] print(f"\nWell {well_id}:") print(f" Total drought events: {len(well_droughts)}") print(f" Mean duration: {well_droughts['duration_days'].mean():.1f} days") print(f" Max duration: {well_droughts['duration_days'].max():.0f} days") print(f" Mean severity: {well_droughts['severity'].mean():.2f} ft") print(f" Max severity: {well_droughts['severity'].max():.2f} ft") ``` ::: {.callout-note icon=false} ## 💻 For Computer Scientists We use **run-length encoding** to identify consecutive below-threshold periods. The `cumsum()` on boolean changes creates group IDs for each drought event. **Definition**: Drought = water level below 25th percentile for multiple consecutive days. **Metrics**: - **Duration**: Length of continuous below-threshold period - **Severity**: How far below threshold (maximum deficit) - **Frequency**: Number of events per year ::: ::: {.callout-tip icon=false} ## 🌍 For Geologists/Hydrologists Drought characteristics reveal aquifer resilience: - **Short frequent droughts** → Unconfined aquifer (responds quickly to precipitation deficit) - **Long infrequent droughts** → Regional confined system (multi-year memory) - **Increasing severity over time** → Potential overdraft or climate change signal **Management implication**: Well depth must exceed worst-case drought minimum + safety margin. ::: ### Extreme Precipitation (POT Analysis) **Peaks-Over-Threshold (95th percentile):** - **10-year extreme:** Daily precipitation threshold - **50-year extreme:** Flood infrastructure design - **100-year extreme:** Dam spillway capacity **Distribution:** Generalized Pareto (GPD) for exceedances ## Return Period and Frequency Analysis ### What Are Return Periods? A **return period** is the average time interval between events of a given magnitude or greater. For example, a "100-year drought" is a drought so severe it occurs, on average, once every 100 years. **Historical context**: Return period analysis emerged from flood frequency studies in the 1920s-1940s by engineers designing dams and levees. The mathematical foundation comes from **Extreme Value Theory (EVT)**, developed by Fisher & Tippett (1928) and formalized by Gumbel (1941). ### Why Does It Matter? Infrastructure design requires planning for worst-case scenarios, not average conditions: - **Wells**: Must be drilled deep enough to provide water during 50-year or 100-year droughts - **Dams**: Must handle 100-year or 500-year floods without failure - **Insurance**: Premiums based on tail risk—how often will catastrophic events occur? ### How Does It Work? The analysis uses **Generalized Extreme Value (GEV) distribution** to model the probability of extreme events: 1. **Extract annual extremes**: For droughts, we take the minimum water level each year 2. **Fit GEV distribution**: Statistical model that describes the tail behavior (rare extremes) 3. **Calculate return levels**: For any return period T, compute the water level expected to occur once every T years 4. **Extrapolate**: Use the fitted distribution to estimate extremes rarer than observed (e.g., 100-year event from 20-year record) **Key equations**: - **Probability of T-year event**: $P = 1/T$ (e.g., 100-year event = 1% annual chance) - **GEV distribution** has three types based on **shape parameter ξ**: - ξ < 0: **Weibull** (bounded tail, light extremes—rare severe events have upper limit) - ξ = 0: **Gumbel** (exponential tail, moderate extremes—most common in hydrology) - ξ > 0: **Fréchet** (heavy tail, severe extremes—no upper bound, extreme events can be arbitrarily large) ### What Will You See? The visualizations below show: 1. **Top left**: Histogram of annual minimum water levels with fitted GEV curve 2. **Top right**: Return period plot—observed data (dots) vs. GEV prediction (line) 3. **Bottom left**: Exceedance probability—chance of falling below a given water level 4. **Bottom right**: Return levels for 10-, 50-, and 100-year droughts by well ### How to Interpret Return Levels | Return Period | Annual Probability | Meaning | Design Application | |---------------|-------------------|---------|-------------------| | **10-year drought** | 10% chance/year | Moderately severe | Standard well design criterion | | **50-year drought** | 2% chance/year | Very severe | Critical infrastructure (municipal wells) | | **100-year drought** | 1% chance/year | Extremely severe | Safety margin for essential services | **Common misconception**: A 100-year event does NOT mean: - ❌ It happens exactly once every 100 years - ❌ If it happens today, it won't happen again for 100 years **Reality**: Each year has a **1% independent chance**—so two 100-year events can occur back-to-back (probability = 0.01 × 0.01 = 0.01%). **Practical insight**: The probability of experiencing at least one 100-year drought in the next 25 years is: $$P = 1 - (1 - 0.01)^{25} = 22\%$$ This is surprisingly high! Infrastructure must be designed accordingly. ```{python} #| echo: false #| warning: false # Calculate return periods using GEV distribution for annual minima from scipy.stats import genextreme return_period_results = [] fig = make_subplots( rows=2, cols=2, subplot_titles=( 'GEV Fit to Annual Minimum Water Levels', 'Return Period Plot (Annual Minima)', 'Exceedance Probability', 'Return Level Estimates with Confidence Intervals' ), specs=[[{'type': 'scatter'}, {'type': 'scatter'}], [{'type': 'scatter'}, {'type': 'bar'}]], vertical_spacing=0.12, horizontal_spacing=0.10 ) colors_map = {'444863': '#2e8bcc', '381684': '#18b8c9', '434983': '#3cd4a8'} for idx, well_id in enumerate(annual_extremes['WellID'].unique()): well_data = annual_extremes[annual_extremes['WellID'] == well_id] annual_min = well_data['min'].values color = colors_map.get(str(well_id), '#666666') # Fit GEV distribution to annual minima # Note: For minima, we negate the data, fit, then negate back params = genextreme.fit(-annual_min) c, loc, scale = params # Generate fitted distribution x_fit = np.linspace(annual_min.min() * 0.95, annual_min.max() * 1.05, 100) pdf_fit = genextreme.pdf(-x_fit, c, loc, scale) # Plot 1: Histogram with GEV fit fig.add_trace( go.Histogram( x=annual_min, name=f'Well {well_id}', marker_color=color, opacity=0.6, nbinsx=15, histnorm='probability density', legendgroup=f'well_{well_id}', showlegend=True ), row=1, col=1 ) fig.add_trace( go.Scatter( x=x_fit, y=pdf_fit, name=f'GEV Fit {well_id}', line=dict(color=color, width=2, dash='dash'), legendgroup=f'well_{well_id}', showlegend=False ), row=1, col=1 ) # Calculate return levels for different return periods return_periods = np.array([2, 5, 10, 20, 50, 100]) return_levels = [] for T in return_periods: # Return level for minima (negated) p = 1 - 1/T level = -genextreme.ppf(p, c, loc, scale) return_levels.append(level) return_levels = np.array(return_levels) # Plot 2: Return period plot # Empirical return periods sorted_min = np.sort(annual_min) n = len(sorted_min) empirical_rp = (n + 1) / np.arange(1, n + 1) fig.add_trace( go.Scatter( x=empirical_rp, y=sorted_min, mode='markers', name=f'Well {well_id} (Obs)', marker=dict(size=8, color=color, symbol='circle'), legendgroup=f'well_{well_id}', showlegend=False, hovertemplate='Return Period: %{x:.1f} years ' + 'Water Level: %{y:.2f} ft ' + '<extra></extra>' ), row=1, col=2 ) fig.add_trace( go.Scatter( x=return_periods, y=return_levels, mode='lines+markers', name=f'Well {well_id} (GEV)', line=dict(color=color, width=2), marker=dict(size=10, symbol='diamond'), legendgroup=f'well_{well_id}', showlegend=False, hovertemplate='Return Period: %{x} years ' + 'Predicted Level: %{y:.2f} ft ' + '<extra></extra>' ), row=1, col=2 ) # Plot 3: Exceedance probability x_exceed = np.linspace(annual_min.min(), annual_min.max(), 100) exceed_prob = 1 - genextreme.cdf(-x_exceed, c, loc, scale) fig.add_trace( go.Scatter( x=x_exceed, y=exceed_prob * 100, mode='lines', name=f'Well {well_id}', line=dict(color=color, width=2), legendgroup=f'well_{well_id}', showlegend=False, hovertemplate='Water Level: %{x:.2f} ft ' + 'Exceedance Prob: %{y:.2f}% ' + '<extra></extra>' ), row=2, col=1 ) # Store results for plotting for T, level in zip(return_periods, return_levels): return_period_results.append({ 'WellID': well_id, 'ReturnPeriod': T, 'ReturnLevel': level, 'Shape': c, 'Location': loc, 'Scale': scale }) # Plot 4: Return level estimates (bar chart) return_df = pd.DataFrame(return_period_results) for T in [10, 50, 100]: T_data = return_df[return_df['ReturnPeriod'] == T] fig.add_trace( go.Bar( x=[f"Well {w}" for w in T_data['WellID']], y=T_data['ReturnLevel'], name=f'{T}-year', text=[f"{v:.1f}" for v in T_data['ReturnLevel']], textposition='outside', hovertemplate='%{x} ' + f'{T}-year level: ' + '%{y:.2f} ft ' + '<extra></extra>' ), row=2, col=2 ) # Update axes fig.update_xaxes(title_text="Annual Minimum Water Level (ft)", row=1, col=1) fig.update_xaxes(title_text="Return Period (years)", type="log", row=1, col=2) fig.update_xaxes(title_text="Water Level (ft)", row=2, col=1) fig.update_xaxes(title_text="Well", row=2, col=2) fig.update_yaxes(title_text="Probability Density", row=1, col=1) fig.update_yaxes(title_text="Water Level (ft)", row=1, col=2) fig.update_yaxes(title_text="Exceedance Probability (%)", row=2, col=1) fig.update_yaxes(title_text="Return Level (ft)", row=2, col=2) fig.update_layout( height=900, template='plotly_white', showlegend=True, legend=dict( orientation="h", yanchor="bottom", y=1.02, xanchor="left", x=0 ) ) fig.show() # Print return period summary print("\n=== Return Period Analysis (GEV Distribution) ===") print("\nDesign Water Levels for Infrastructure Planning:") print("-" * 70) for well_id in return_df['WellID'].unique(): well_return = return_df[return_df['WellID'] == well_id] print(f"\nWell {well_id}:") print(f" GEV Parameters: shape={well_return['Shape'].iloc[0]:.4f}, " + f"loc={well_return['Location'].iloc[0]:.2f}, scale={well_return['Scale'].iloc[0]:.2f}") for _, row in well_return.iterrows(): T = int(row['ReturnPeriod']) level = row['ReturnLevel'] prob_25yr = (1 - (1 - 1/T)**25) * 100 # Probability in next 25 years if T in [10, 50, 100]: print(f" {T:3d}-year drought level: {level:6.2f} ft " + f"(P(occur in 25yr) = {prob_25yr:.1f}%)") ``` ::: {.callout-important icon=false} ## 📊 How to Read the Return Period Plot (4-Panel Guide) **Panel 1 (Top Left) - GEV Fit to Annual Minima:** - **Histogram bars**: Distribution of observed annual minimum water levels - **Dashed curve**: Fitted GEV distribution - **Good fit**: Curve follows histogram shape closely - **Poor fit**: Large gaps between curve and bars (model may not be appropriate) **Panel 2 (Top Right) - Return Period Plot:** - **Circles**: Observed annual minima (empirical data points) - **Diamonds + Line**: GEV model prediction - **X-axis**: Return period in years (log scale: 2, 5, 10, 20, 50, 100) - **Y-axis**: Water level (lower = more severe drought) - **How to read**: "A 100-year drought corresponds to water level X ft" - **Good fit**: Circles follow the line; **Poor fit**: Circles diverge from line at high return periods **Panel 3 (Bottom Left) - Exceedance Probability:** - **Curve**: Probability of water level falling BELOW a given value - **How to read**: At water level X ft, the curve shows Y% chance of dropping that low in any year - **Steeper curve**: Narrow range of extremes; **Flatter curve**: Wide range of extremes **Panel 4 (Bottom Right) - Return Level Estimates:** - **Bar height**: Predicted water level for 10-, 50-, or 100-year drought - **Lower bars = more severe drought** (water level drops further) - **Compare wells**: Which well experiences more severe droughts? ::: ::: {.callout-warning icon=false} ## ⚠️ Confidence Intervals and Record Length Limitations **Understanding Uncertainty in Return Period Estimates:** Confidence intervals WIDEN dramatically for longer return periods: | Return Period | Record Length Needed | Typical 95% CI Width | Reliability | |--------------|---------------------|---------------------|-------------| | 10-year | 20+ years | ± 10-15% of estimate | High | | 50-year | 40+ years | ± 25-40% of estimate | Moderate | | 100-year | 60+ years | ± 40-60% of estimate | Low | **Why short records are dangerous:** - **15-year record estimating 100-year drought**: You're extrapolating 7× beyond your data! - **Rule of thumb**: Reliable return period estimates require record length ≥ 2× the return period - **Champaign County**: ~15-20 years of continuous data → 50-year estimates are moderately reliable; 100-year estimates have HIGH uncertainty **What to tell decision-makers:** > "Our 100-year drought level estimate of 645 ft has a 95% confidence interval of approximately 635-655 ft. Given our 20-year record, this estimate should be treated as preliminary and updated as more data accumulates." **Red flags for unreliable estimates:** 1. **Extrapolation ratio > 5**: (e.g., 100-year estimate from 15-year record) 2. **GEV shape parameter ξ > 0.2**: Heavy tail makes extrapolation unstable 3. **Trend in annual extremes**: Stationarity assumption violated 4. **Return period plot divergence**: Observed points deviate from fitted line at high T **Best practice**: Always report confidence intervals alongside point estimates. Never present a single return level without acknowledging uncertainty. ::: ::: {.callout-note icon=false} ## 💻 For Computer Scientists **Generalized Extreme Value (GEV) Distribution** has three types based on shape parameter ξ: - ξ < 0: **Weibull** (bounded upper tail, light extremes) - ξ = 0: **Gumbel** (exponential tail, moderate extremes) - ξ > 0: **Fréchet** (heavy tail, severe extremes) **Return Period**: T-year event has probability 1/T of occurring in any given year. **Key insight**: We fit the distribution to **annual minima** (block maxima approach), then extrapolate to rare events using the fitted tail behavior. ::: ::: {.callout-tip icon=false} ## 🌍 For Geologists/Hydrologists **Return levels** are critical for infrastructure design: - **10-year drought**: Typical well design criterion - **50-year drought**: Critical infrastructure (municipal wells) - **100-year drought**: Safety margin for essential services **Probability misconception**: A 100-year drought has a **22% chance** of occurring in the next 25 years (not 25%!). This is because: $$P(\text{at least 1 event in 25 years}) = 1 - (1 - 0.01)^{25} = 0.222$$ **Management implication**: Design for the return level plus additional safety margin, not just the historical minimum observed. ::: ### Compound Extremes **Drought + Heat events:** - Observed frequency vs expected (if independent) - Ratio >1.5 indicates dependence - **Finding:** Compound events 12× more frequent than independence predicts - **Cause:** Common climate driver (same atmospheric pattern) ## Methods ### Block Maxima (GEV) #### What Is the GEV Distribution? ::: {.callout-note icon=false} ## Understanding the GEV Distribution **What Is It?** The **Generalized Extreme Value (GEV) distribution** is a three-parameter statistical model that describes the probability of extreme events. It was developed by Jenkinson (1955) building on foundational work by Fisher & Tippett (1928) and Gumbel (1941). **Historical context**: During the 1920s flood disasters, engineers needed a mathematical framework to design infrastructure for events rarer than any observed. Fisher & Tippett proved that block maxima (or minima) from **any underlying distribution** converge to one of three types—now unified in the GEV. **The mathematical insight**: Regardless of whether the original data follows a normal, exponential, or any other distribution, when you extract only the extremes (annual maxima or minima), they converge to one of three limiting distributions. This is analogous to the Central Limit Theorem, but for extremes instead of averages. ::: #### Why Does It Matter? The GEV allows us to: 1. **Extrapolate beyond observed data**: Estimate 100-year droughts from 20-year records 2. **Quantify uncertainty**: Provide confidence intervals on return level estimates 3. **Compare across sites**: Standardized framework for risk assessment #### How Does It Work? **GEV Distribution Equation:** $$ F(x) = \exp\left(-\left[1 + \xi\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right) $$ **Three Parameters Control the Distribution:** | Parameter | Name | What It Controls | Physical Analogy | |-----------|------|------------------|------------------| | **μ (mu)** | Location | Central tendency | "Where" the extremes typically occur | | **σ (sigma)** | Scale | Spread/variability | "How variable" the extremes are | | **ξ (xi)** | Shape | **Tail behavior** | "How severe" the rarest extremes can be | **Shape Parameter (ξ) Determines Tail Type:** - **ξ < 0 (Weibull)**: Bounded upper tail—extremes have a physical limit - *Example*: Maximum temperature (can't exceed boiling point of water in open systems) - *Characteristic*: Light tail, extremes taper off quickly - **ξ = 0 (Gumbel)**: Exponential tail—extremes decay exponentially - *Example*: Annual maximum floods in stable watersheds - *Characteristic*: Most common in hydrology, moderate extremes - **ξ > 0 (Fréchet)**: Heavy (power-law) tail—extremes can be arbitrarily large - *Example*: Earthquake magnitudes, hurricane intensities - *Characteristic*: "Black swan" events possible, no theoretical upper bound **For drought analysis** (annual minima), we apply GEV to **negated** water levels to find the distribution of extreme lows. #### Return Level Calculation **Return level for T-year event:** $$ x_T = \mu + \frac{\sigma}{\xi}\left[(-\ln(1-1/T))^{-\xi} - 1\right] $$ **Interpretation**: - $x_T$ = water level expected to be exceeded (for maxima) or undercut (for minima) once every T years - The formula extrapolates from fitted parameters (μ, σ, ξ) to rare events - Uncertainty increases for larger T (100-year estimates less reliable than 10-year) **Example**: If $x_{100} = 645$ ft for a well, the 100-year drought minimum water level is 645 ft. There's a 1% chance each year that water levels drop to 645 ft or lower. ### Peaks-Over-Threshold (POT) ::: {.callout-note icon=false} ## Understanding Peaks-Over-Threshold (POT) Analysis **What Is It?** Peaks-Over-Threshold (POT) analysis identifies all events that exceed a high threshold (e.g., 95th percentile), not just the annual maximum. This method uses the **Generalized Pareto Distribution (GPD)**, developed by Pickands (1975) and refined by Davison & Smith (1990). **Historical context**: POT emerged in the 1970s-1980s as researchers realized that focusing only on annual maxima wastes information—multiple extreme events per year are common in hydrology (multiple major storms, multiple drought episodes). **Why Does It Matter?** POT analysis provides: 1. **More data**: Uses 50-100 exceedances instead of 10-20 annual maxima 2. **Better tail estimates**: More events → more reliable extreme quantiles 3. **Sub-annual patterns**: Can detect seasonal clustering of extremes 4. **Improved uncertainty**: Confidence intervals narrow with more data **How Does It Work?** **Step 1 - Choose threshold (u):** - Too low: Includes non-extreme events, biases distribution - Too high: Few exceedances, high uncertainty - Rule of thumb: 90th-95th percentile gives 10-5% of data **Step 2 - Extract exceedances:** - Keep only values above threshold: $x_i > u$ - Calculate exceedance: $y_i = x_i - u$ **Step 3 - Fit Generalized Pareto Distribution (GPD):** $$ F(x) = 1 - \left(1 + \xi\frac{x-u}{\sigma}\right)^{-1/\xi} $$ Where: - $u$ = threshold (e.g., 95th percentile) - $x$ = exceedance above threshold - $\sigma$ = scale parameter (spread of exceedances) - $\xi$ = shape parameter (tail behavior, same interpretation as GEV) **Step 4 - Calculate return levels:** - Convert threshold exceedance probability to return period - Account for how many exceedances occur per year **What Will You See?** Typical POT output includes: - **Threshold plot**: Shows model stability across different threshold choices - **GPD fit**: Histogram of exceedances with fitted GPD curve - **Return level plot**: Extrapolation to 10-, 50-, 100-year events **How to Interpret:** | GPD Parameter | Value Range | Physical Meaning | Example | |--------------|-------------|------------------|---------| | **ξ < 0** | Bounded tail | Physical upper limit exists | Maximum temperature (can't exceed boiling point) | | **ξ = 0** | Exponential tail | Moderate extremes | Most hydrological extremes | | **ξ > 0** | Heavy tail | No upper bound, "black swans" possible | Earthquake magnitudes, financial crashes | | **σ (scale)** | Larger values | More variable extremes | Higher σ = wider spread of extreme events | **Advantages over GEV:** - Uses all extreme events (not just annual maxima) - Better for rare extremes (more data points) - Can model multiple events per year - More data → better estimates and narrower confidence intervals **Disadvantages:** - Threshold choice subjective (sensitivity analysis required) - Assumes exceedances are independent (may need declustering) - More complex than GEV (additional threshold selection step) ::: **Generalized Pareto Distribution:** $$ F(x) = 1 - \left(1 + \xi\frac{x-u}{\sigma}\right)^{-1/\xi} $$ Where: - u = threshold (e.g., 95th percentile) - x = exceedance above threshold **Advantages over GEV:** - Uses all extreme events (not just annual maxima) - Better for rare extremes - More data → better estimates ## Risk Quantification ### Probability of Occurrence **100-year event in next 25 years:** $$ P = 1 - (1 - 1/100)^{25} \approx 22\% $$ **Misconception:** "100-year event" does NOT mean: - ❌ Happens exactly once every 100 years - ❌ Won't happen again for 100 years after it occurs **Reality:** - ✅ 1% chance of occurring in any given year - ✅ Over 100 years, expect ~1 occurrence (on average) - ✅ Can happen multiple times in short period (randomness) ### Design Criteria **Well depth:** Screen below 100-year low water level + safety margin **Pump capacity:** Handle 10-year peak recharge rate **Storage:** Size for 50-year drought duration **Spillway:** Pass 100-year flood + freeboard ## Climate Change Impacts ### Non-Stationary Extremes **Traditional EVT assumes stationarity:** Parameters constant over time **Climate change violates this:** Need time-varying parameters $$ \mu(t) = \mu_0 + \mu_1 \cdot t $$ **Implication:** Historical 100-year event may now be 35-year event (+186% frequency) ### Return Period Acceleration **Finding:** Extremes intensifying - Historical return periods no longer valid - Must update estimates every 5-10 years - Design for accelerating change, not just linear trend ## Summary Extreme value analysis reveals: ✅ **Rigorous risk quantification** (return periods for design) ✅ **GEV + GPD distributions** fit drought and precipitation extremes ✅ **Compound extremes detected** (12× more frequent than independent) ✅ **Non-stationarity tested** (extremes may be changing over time) ⚠️ **Climate change signal** (return period acceleration) ⚠️ **Short records limit confidence** (need 50+ years for robust estimates) **Key Insight:** Most groundwater studies report min/max observed values. This work provides publication-quality EVT with return periods, confidence intervals, and non-stationarity testing - essential for engineering design and risk management. **Management Application:** - **Infrastructure design:** 100-year drought water level - **Insurance premiums:** Actuarial tail risk - **Emergency planning:** Trigger thresholds for restrictions - **Climate adaptation:** Update return periods as extremes evolve --- ## Reflection Questions - When you compute a 100‑year drought level from a 15–20 year record, what caveats would you communicate to decision‑makers about the uncertainty of that estimate? - How would you explain to a non-technical audience why designing for extremes (not averages) is essential for wells, pumps, and storage infrastructure? - If you detect a significant trend in annual minima or maxima, what additional analyses or data would you seek to distinguish climate signals from pumping or land‑use effects? - How might return period estimates and drought duration statistics influence your priorities for monitoring, well construction standards, or emergency planning in this aquifer system? ## Further Work **Extensions for researchers:** 1. **Compound event modeling**: Develop joint probability models for simultaneous drought + heat events 2. **Attribution studies**: Use detection-attribution methods to quantify human vs. natural drivers of extreme changes 3. **Regional frequency analysis**: Pool data across multiple wells for more robust tail estimates 4. **Downscaling projections**: Apply EVT to climate model outputs for future extreme scenarios 5. **Impact functions**: Link return periods to economic damages for cost-benefit analysis **Open questions:** - How do land use changes affect local extreme event statistics? - Can early warning systems predict extreme events 30+ days ahead? - What monitoring density is needed for robust regional EVT? --- ## Related Chapters - [Precipitation Patterns](precipitation-patterns.qmd) - Extreme precipitation events - [Water Level Trends](water-level-trends.qmd) - Long-term groundwater changes - [Event Response Fingerprints](event-response-fingerprints.qmd) - Extreme event signatures

26.1 What You Will Learn in This Chapter

26.2 Introduction

26.3 Water Level Extremes Over Time

26.4 Key Findings

26.4.1 Drought Extremes (GEV Analysis)

26.5 Drought Duration Analysis

26.5.1 Extreme Precipitation (POT Analysis)

26.6 Return Period and Frequency Analysis

26.6.1 What Are Return Periods?

26.6.2 Why Does It Matter?

26.6.3 How Does It Work?

26.6.4 What Will You See?

26.6.5 How to Interpret Return Levels

26.6.6 Compound Extremes

26.7 Methods

26.7.1 Block Maxima (GEV)

What Is the GEV Distribution?

Why Does It Matter?

How Does It Work?

Return Level Calculation

26.7.2 Peaks-Over-Threshold (POT)

26.8 Risk Quantification

26.8.1 Probability of Occurrence

26.8.2 Design Criteria

26.9 Climate Change Impacts

26.9.1 Non-Stationary Extremes

26.9.2 Return Period Acceleration

26.10 Summary

26.11 Reflection Questions

26.12 Further Work

26.13 Related Chapters