---
title: "Well Network Analysis"
description: "Monitoring network data quality, measurement frequency, and temporal patterns in groundwater levels"
code-fold: true
---
::: {.callout-tip icon=false}
## For Newcomers
**You will learn:**
- What monitoring wells are and why they're essential for tracking groundwater
- How water level measurements reveal aquifer "health" over time
- Why data gaps and measurement frequency matter for analysis
- How to interpret water level trends (rising, falling, seasonal patterns)
Think of monitoring wells as **thermometers for the aquifer**—they tell us if the underground water supply is stable, stressed, or recovering. This chapter explores what the monitoring network reveals (and what critical gaps exist).
:::
## What You Will Learn in This Chapter
By the end of this chapter, you will be able to:
- Describe the monitoring well network in Champaign County and explain why wells are essential for tracking aquifer “health” over time.
- Summarize the actual data availability (how many wells really have measurements, and over what periods).
- Interpret key well-network diagnostics: measurement frequency, temporal coverage, hydrographs, seasonal patterns, and spatial distribution.
- Explain the main limitations of the current network and how these constraints affect regional analyses and data fusion with HTEM and climate data.
## Direct Measurement of the Aquifer
While HTEM reveals the aquifer's **structure**, monitoring wells measure its **dynamic behavior**—water levels rising and falling in response to precipitation, pumping, and seasonal cycles. Wells are our **direct sensors** of aquifer health.
This chapter explores Champaign County's groundwater monitoring network: its coverage, data quality, temporal patterns, and critical limitations.
::: {.callout-warning icon=false}
## ⚠️ Critical Finding: Data Distribution Reality
The database contains well location metadata and measurement records, but data availability varies significantly across wells. This analysis reveals which wells have substantial monitoring records versus those with sparse or no recent data.
**Critical discovery**: While the database contains **18 wells with measurement records**, only **3 wells have substantial operational data** suitable for robust temporal analysis (>50 measurements). This represents a **17% operational rate**—the vast majority of wells in metadata are non-functional or have minimal data.
:::
::: {.callout-warning icon=false}
## ❌ Initial Expectation vs Reality: The 18→3 Well Data Gap
**What we expected:** Database metadata lists 18 wells with measurement records. If roughly 70-80% of documented wells were operational (a common planning assumption), 13-14 wells would provide usable time series data.
**What we found:** Only **3 of 18 wells** (17%) have substantial measurement data suitable for temporal analysis. The other 15 wells exist in metadata but lack the continuous, long-term records needed for trend detection, seasonal decomposition, or spatial gradient mapping.
**Why this happened:** Multiple factors contribute to metadata-reality gaps:
- Wells under construction or planned (metadata created before installation)
- Decommissioned wells (metadata not updated after removal)
- Data stored in separate archives (historical data not migrated to current database)
- Equipment failures without metadata updates (sensors failed, metadata not flagged)
- Different data quality standards (some "measurements" are sparse site visits, not continuous monitoring)
**Lesson learned:** **Never trust metadata counts without validating actual data availability**. During project planning, always:
1. Query actual measurement records, not just location tables
2. Define minimum data requirements upfront (e.g., ≥365 daily measurements)
3. Calculate measurements per well and the resulting operational rate (share of wells meeting the minimum) before designing analyses
4. Contact data providers to clarify status of "ghost wells" (metadata without data)
**Impact on analysis:** This gap severely constrains regional spatial analysis. With only 3 operational wells, we cannot:
- Map regional water table gradients (need ≥10 wells)
- Assess aquifer heterogeneity spatially (inadequate coverage)
- Validate HTEM predictions across the study area (3 points insufficient)
- Detect localized pumping impacts (no redundancy)
**Better approach:** Document data availability **upfront** in Chapter 1 (Data Quality Audit) to set realistic expectations. Researchers can then design analyses around actual data (3 excellent time series) rather than expected data (18 potential wells), avoiding mid-project surprises.
**Key insight for interdisciplinary teams:** Computer scientists assume "18 rows in database = 18 usable data points." Hydrologists know "wells in database ≠ wells with data." **Communicate data availability explicitly** to prevent ML engineers from designing spatial models that require 18 points when only 3 exist.
:::
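The validation steps in the lesson above can be sketched against a toy in-memory database. The table names `locations` and `measurements`, their columns, and the sample rows are hypothetical stand-ins, not the project's schema; the 50-record cutoff mirrors the ">50 measurements" threshold quoted earlier.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (well_id TEXT)")
conn.execute("CREATE TABLE measurements (well_id TEXT, depth_ft REAL)")
conn.executemany("INSERT INTO locations VALUES (?)", [("A",), ("B",), ("C",), ("D",)])
rows = [("A", 10.0)] * 120 + [("B", 12.0)] * 3  # C and D have no data at all
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

MIN_RECORDS = 50  # assumed minimum for "substantial" data

# LEFT JOIN keeps wells that exist only in metadata (count = 0)
counts = conn.execute("""
    SELECT l.well_id, COUNT(m.well_id) AS n
    FROM locations AS l
    LEFT JOIN measurements AS m ON m.well_id = l.well_id
    GROUP BY l.well_id
    ORDER BY l.well_id
""").fetchall()

operational = [well for well, n in counts if n > MIN_RECORDS]
ghost_wells = [well for well, n in counts if n == 0]
operational_rate = len(operational) / len(counts)

print(counts)
print(f"Operational rate: {operational_rate:.0%}; ghost wells: {ghost_wells}")
```

Joining from the location table rather than the measurement table is the design choice that matters here: wells with zero records stay visible instead of silently dropping out of the count.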
---
## Part 1: The Monitoring Network
```{python}
#| label: setup
#| echo: false
import os
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import sqlite3
# Setup
def find_repo_root(start: Path) -> Path:
    # Walk upward until a directory containing "src" is found
    for candidate in [start, *start.parents]:
        if (candidate / "src").exists():
            return candidate
    return start

quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd())))
project_root = find_repo_root(quarto_project)
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))
from src.utils import get_data_path
from src.data_loaders.groundwater_loader import GroundwaterLoader
# Initialize database connection
db_path = get_data_path('aquifer_db')
loader = GroundwaterLoader(db_path)
conn = sqlite3.connect(db_path)
print("✓ Groundwater monitoring loader initialized")
```
### Well Metadata Inventory
::: {.callout-note icon=false}
## 📘 Understanding Well Metadata
**What Is Metadata?**
Metadata is "data about data"—descriptive information about wells (location, ID, construction details) separate from actual measurements. The term emerged in information science in the 1960s.
**Why Does Metadata Matter?**
Metadata enables:
- **Discovery**: Which wells exist and where are they?
- **Selection**: Which wells are suitable for specific analyses?
- **Context**: Understanding well construction affects interpretation
**What This Inventory Shows:**
The table below lists all wells documented in the database with their coordinates. This is the "advertised" network—what exists on paper.
**Critical Question:** How many of these wells actually have measurement data? The answer (revealed next) exposes a severe data availability crisis.
**How to Interpret:**
| Metadata Completeness | Network Status | Management Implication |
|----------------------|---------------|----------------------|
| 100% wells have coordinates | Good metadata | Can plan spatial analyses |
| <80% wells have coordinates | Poor metadata | Cannot map network |
| Metadata ≫ operational | Inflated expectations | Analysts misled about availability |
:::
```{python}
# Get all wells in metadata from real database
wells_meta = pd.read_sql("""
SELECT
P_NUMBER as well_id,
LAT_WGS_84 as latitude,
LONG_WGS_84 as longitude,
X_LAMBERT as easting,
Y_LAMBERT as northing
FROM OB_LOCATIONS
WHERE LAT_WGS_84 IS NOT NULL
AND LONG_WGS_84 IS NOT NULL
""", conn)
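# The query above already excludes wells without coordinates,
# so this count covers only wells that have both latitude and longitude
print(f"📍 Wells in Metadata (with coordinates): {len(wells_meta)}")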
print(f" • Lat range: {wells_meta['latitude'].min():.4f}° to {wells_meta['latitude'].max():.4f}°")
print(f" • Lon range: {wells_meta['longitude'].min():.4f}° to {wells_meta['longitude'].max():.4f}°")
wells_meta.head()
```
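Because the inventory query filters out rows with missing coordinates, the completeness percentages in the interpretation table cannot be read off its result. A minimal sketch of that calculation on a made-up metadata frame follows; the rows are hypothetical, and the "Partial metadata" label for the 80-100% band is an assumption, since the table does not name that case.

```python
import pandas as pd

# Hypothetical metadata rows; None marks a missing coordinate
meta = pd.DataFrame({
    "well_id": ["A", "B", "C", "D", "E"],
    "latitude": [40.1, 40.2, None, 40.3, 40.0],
    "longitude": [-88.2, -88.3, -88.1, None, -88.4],
})

# A well counts as located only if both coordinates are present
has_coords = meta[["latitude", "longitude"]].notna().all(axis=1)
completeness = has_coords.mean() * 100  # percent of wells with both coordinates

if completeness == 100:
    status = "Good metadata"
elif completeness < 80:
    status = "Poor metadata"
else:
    status = "Partial metadata"  # assumed label for the band the table leaves open

print(f"{completeness:.0f}% of wells have coordinates -> {status}")
```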
### Actual Measurement Availability
```{python}
# Check which wells actually have measurements from real database
measurements = pd.read_sql("""
SELECT
P_NUMBER as well_id,
COUNT(*) as measurement_count,
MIN(TIMESTAMP) as first_measurement,
MAX(TIMESTAMP) as last_measurement
FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
WHERE TIMESTAMP IS NOT NULL
GROUP BY P_NUMBER
ORDER BY measurement_count DESC
""", conn)
# Parse timestamps with explicit US format (CRITICAL!)
measurements['first_measurement'] = pd.to_datetime(
measurements['first_measurement'],
format='%m/%d/%Y',
errors='coerce'
)
measurements['last_measurement'] = pd.to_datetime(
measurements['last_measurement'],
format='%m/%d/%Y',
errors='coerce'
)
measurements['record_length_years'] = (
(measurements['last_measurement'] - measurements['first_measurement']).dt.days / 365.25
)
print(f"\n📊 Wells with Actual Data: {len(measurements)}")
print(f" • Total measurements: {measurements['measurement_count'].sum():,}")
print(f" • Data availability: {len(measurements)/len(wells_meta)*100:.1f}% of metadata wells")
measurements.sort_values('measurement_count', ascending=False)
```
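One caveat is worth checking here. If `TIMESTAMP` is stored as `MM/DD/YYYY` text, SQL `MIN` and `MAX` compare strings character by character rather than chronologically. The sketch below uses made-up dates to show the difference and a parse-then-aggregate alternative. Whether this affects the real table depends on its storage format, which is not confirmed here.

```python
import pandas as pd

# Made-up readings stored as US-format text dates
raw = pd.DataFrame({
    "well_id": ["A", "A", "A"],
    "timestamp": ["12/31/2009", "01/05/2010", "06/15/2010"],
})

# String comparison: "01/05/2010" sorts before "12/31/2009"
text_first = raw["timestamp"].min()

# Parse first, then aggregate chronologically
raw["parsed"] = pd.to_datetime(raw["timestamp"], format="%m/%d/%Y", errors="coerce")
span = raw.groupby("well_id")["parsed"].agg(first_ts="min", last_ts="max")
span["record_length_years"] = (span["last_ts"] - span["first_ts"]).dt.days / 365.25

print(f"Earliest by text comparison: {text_first}")
print(span)
```

If the two answers disagree on real data, the record lengths computed from SQL `MIN`/`MAX` should be recomputed from parsed timestamps.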
::: {.callout-note icon=false}
## Understanding the Data Availability Table
**What Does This Table Show?**
This table lists every well that has at least one measurement record, sorted by record count. Per the audit above, only 3 of the 18 wells have records substantial enough for temporal analysis. Each row summarizes one well's data history.
**Brief Context**: In the 1990s-2000s, state agencies installed extensive monitoring networks during periods of federal funding. When funding decreased, many wells became "orphaned"—installed but not maintained. This table reveals that reality.
**Why Does This Matter?**
The gap between metadata (18 wells) and reality (3 wells) has severe consequences:
1. **Analysis planning**: Researchers design studies expecting 18 data points, discover mid-project only 3 exist
2. **Spatial coverage**: Cannot map regional water table gradients with 3 points
3. **Redundancy**: No backup if the primary well fails
4. **Resource allocation**: Money may be better spent activating dormant wells than installing new ones
**How to Read This Table:**
Each column tells a different story:
| Column | What It Means | Why It Matters |
|--------|---------------|----------------|
| **Well ID** (`well_id`) | Unique identifier | Tracks specific location |
| **Measurements** (`measurement_count`) | Total data points | More = better statistical power |
| **Start Date** (`first_measurement`) | First measurement | Earlier = longer climate history |
| **End Date** (`last_measurement`) | Last measurement | Recent = currently operational? |
| **Duration** (`record_length_years`) | Record length (years) | Longer = trend detection possible |
**Interpreting Measurement Counts:**
| Measurement Count | If Hourly Data | Quality | What You Can Analyze |
|-------------------|----------------|---------|----------------------|
| **>100,000** | >11 years | Excellent | Long-term trends, climate cycles, extreme events |
| **50,000-100,000** | 5-11 years | Good | Seasonal patterns, multi-year trends |
| **10,000-50,000** | 1-5 years | Fair | Basic seasonality, limited trends |
| **<10,000** | <1 year | Poor | Snapshot only, no trends |
**What Will You See:**
The table shows dramatic inequality:
- **Well 444863**: Carries entire monitoring burden (74% of all measurements, 14.8-year record)
- **Well 268557**: Moderate contributor (18% of measurements, 3.6-year record)
- **Well 505586**: Recent addition (8% of measurements, 1.7-year record)
**Critical Risk**: If Well 444863 fails, we lose:
- 74% of our data volume
- Our only long-term trend capability (14.8 years)
- Ability to validate seasonal patterns across years
This is a **single point of failure** scenario—catastrophic for regional monitoring.
:::
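The measurement-count table above reduces to simple arithmetic: an hourly logger produces 24 × 365.25 = 8,766 readings per year. The helper below is an illustrative sketch of that conversion and of the table's quality bands; `rate_record` is a hypothetical name, not part of the project code.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8,766 hourly readings per year

def rate_record(measurement_count: int) -> tuple[float, str]:
    """Return (equivalent years of hourly data, quality label) per the table above."""
    years = measurement_count / HOURS_PER_YEAR
    if measurement_count > 100_000:
        label = "Excellent"
    elif measurement_count >= 50_000:
        label = "Good"
    elif measurement_count >= 10_000:
        label = "Fair"
    else:
        label = "Poor"
    return years, label

for count in (129_082, 60_000, 5_000):
    years, label = rate_record(count)
    print(f"{count:>7,} readings = {years:4.1f} years if hourly -> {label}")
```

The conversion only holds for hourly data. A well sampled every minute reaches 100,000 readings in about ten weeks, so counts should always be read alongside the record dates.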
::: {.callout-important icon=true}
## 🎯 Data Availability Reality
**The database contains 18 wells with measurement records**, but data volume varies dramatically:
**Top 5 Wells by Data Volume:**
| Well ID | Measurements | Start Date | End Date | Record Years |
|---------|--------------|------------|----------|--------------|
| **444890** | 196,941 | 2023-01-10 | 2023-06-02 | 0.4 years |
| **444889** | 196,941 | 2023-01-10 | 2023-06-02 | 0.4 years |
| **444863** | 129,082 | 2009-01-01 | 2022-09-09 | 13.7 years |
| **381684** | 120,585 | 2009-01-01 | 2022-09-09 | 13.7 years |
| **434983** | 102,547 | 2009-01-01 | 2019-09-09 | 10.7 years |
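**Total**: 1.1+ million measurements across the 18 wells with measurement records.
**Key Patterns**:
- Two wells (444890, 444889) have very high-frequency recent data: about 197,000 readings in under five months works out to roughly one reading per minute, far denser than hourly sampling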
- Several wells have excellent long-term records (10-13+ years)
- Data volume varies by measurement frequency and record duration
:::
---
## Part 2: Data Quality Assessment
### Measurement Frequency Analysis
#### What Is Measurement Frequency?
Measurement frequency refers to how often water levels are recorded at a well—hourly, daily, weekly, or monthly. This temporal resolution determines what aquifer processes you can observe, similar to how a video frame rate determines what motion you can see.
**Historical Context**: Early groundwater monitoring (1950s-1980s) relied on monthly manual measurements. Modern automated dataloggers (1990s-present) enable continuous hourly monitoring, revolutionizing our ability to observe aquifer dynamics.
#### Why Does Measurement Frequency Matter?
**Data quality isn't just about having measurements—it's about having them frequently enough to capture the dynamics you care about.**
Different aquifer processes operate at different timescales:
- **Hourly**: Captures storm response, pumping cycles, tidal effects
- **Daily**: Captures seasonal trends, weekly patterns, weather events
- **Monthly**: Misses most dynamics, only good for long-term trends (years to decades)
#### How to Interpret Measurement Intervals
| Mean Interval | Quality Rating | What You Can Analyze | What You'll Miss |
|---------------|----------------|----------------------|------------------|
| ≤1 hour | Excellent | Storm response, pump cycles, all temporal patterns | Nothing significant |
| >1 to 24 hours | Good | Seasonal patterns, weather response | Sub-daily pumping effects |
| >1 to 7 days | Fair | Long-term trends, seasonal cycles | Storm responses, weekly patterns |
| >7 days | Poor | Decadal trends only | Most aquifer dynamics |
#### What Will You See?
The analysis below calculates the measurement interval for each operational well. Look for:
- **Mean interval**: Average time between measurements
- **Median interval**: Typical spacing (less affected by gaps)
- **Max gap**: Longest period without data (indicates outages)
- **Gaps >7 days**: Count of significant data interruptions
```{python}
# Analyze measurement intervals for wells with data
well_ids = measurements['well_id'].tolist()
freq_stats = []
for well_id in well_ids:
    # Parameterized query: the driver binds the well ID, preventing SQL injection
    df = pd.read_sql(
        """
        SELECT
            TIMESTAMP,
            DTW_FT_Reviewed
        FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
        WHERE P_NUMBER = ?
          AND TIMESTAMP IS NOT NULL
        ORDER BY TIMESTAMP
        """,
        conn,
        params=[well_id]
    )
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    df = df.dropna(subset=['TIMESTAMP']).sort_values('TIMESTAMP')
    if len(df) > 1:
        # Spacing between consecutive readings, in days
        df['interval_days'] = df['TIMESTAMP'].diff().dt.total_seconds() / 86400
        freq_stats.append({
            'well_id': well_id,
            'count': len(df),
            'mean_interval_days': df['interval_days'].mean(),
            'median_interval_days': df['interval_days'].median(),
            'max_gap_days': df['interval_days'].max(),
            'gaps_over_7_days': (df['interval_days'] > 7).sum()
        })
freq_df = pd.DataFrame(freq_stats)
print("📈 Measurement Frequency (Wells with Data):")
freq_df
```
**Key insight**: All 3 operational wells have **hourly measurements**—these are automated dataloggers, not manual readings!
- **Mean interval**: ~0.042 days (≈1 hour)
- **Gaps >7 days**: **ZERO** for all wells (continuous monitoring)
- **Quality rating**: **Excellent** for all 3 wells
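The interval ratings used in this section can be written as a small lookup. This is a sketch: `rate_interval` is a hypothetical helper, and the 1% tolerance on the hourly boundary is an assumption, added so that a rounded mean of 0.042 days (1.008 hours) still counts as hourly.

```python
def rate_interval(mean_interval_days: float) -> str:
    """Map a mean measurement interval (in days) to a quality rating."""
    hours = mean_interval_days * 24
    if hours <= 1.01:  # assumed tolerance so rounded hourly data counts as hourly
        return "Excellent"
    if hours <= 24:
        return "Good"
    if mean_interval_days <= 7:
        return "Fair"
    return "Poor"

for interval in (0.042, 1.0, 3.5, 30.0):
    print(f"{interval:>6} days -> {rate_interval(interval)}")
```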
---
## Part 3: Temporal Coverage
### Measurement Timeline
::: {.callout-note icon=false}
## 📘 What Will You See in the Timeline
**Before Viewing:**
This Gantt-style chart shows when each well was operational.
**What to Look For:**
| Visual Pattern | Meaning | Management Implication |
|---------------|---------|----------------------|
| **Long bars** | Lengthy monitoring records | Enables trend analysis |
| **Short bars** | Brief monitoring periods | Limited to snapshots |
| **Overlapping bars** | Simultaneous monitoring | Can assess spatial patterns |
| **Gaps between bars** | No temporal overlap | Cannot cross-validate |
| **Recent end dates** | Currently operational | Real-time monitoring possible |
| **Old end dates** | Decommissioned | Historical archive only |
**Expected Pattern:** Ideally, you'd see 10+ overlapping bars spanning 10+ years. Reality check coming...
:::
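Whether two bars overlap, and for how long, can also be computed directly. The sketch below uses start and end dates for three wells from the data availability table earlier in this chapter; `overlap_days` is a hypothetical helper.

```python
from datetime import date

# Start/end dates taken from the data availability table in this chapter
periods = {
    "444863": (date(2009, 1, 1), date(2022, 9, 9)),
    "434983": (date(2009, 1, 1), date(2019, 9, 9)),
    "444890": (date(2023, 1, 10), date(2023, 6, 2)),
}

def overlap_days(a: tuple[date, date], b: tuple[date, date]) -> int:
    """Days during which both wells were recording (0 if the bars do not overlap)."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max((end - start).days, 0)

long_pair = overlap_days(periods["444863"], periods["434983"])
disjoint_pair = overlap_days(periods["444863"], periods["444890"])

print(f"444863 vs 434983: {long_pair} days of simultaneous monitoring")
print(f"444863 vs 444890: {disjoint_pair} days of simultaneous monitoring")
```

A zero overlap means the two records cannot be cross-validated against each other, no matter how many measurements each contains.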
```{python}
#| label: fig-well-timeline
#| fig-cap: "Groundwater monitoring timeline showing data availability for each well with measurement records. Only 3 of the 18 wells have records substantial enough for temporal analysis, revealing a critical data gap."
# Create Gantt-style timeline
timeline_data = []
for _, row in measurements.iterrows():
    timeline_data.append({
        'Well': f"Well {row['well_id']}",
        'Start': row['first_measurement'],
        'Finish': row['last_measurement'],
        'Measurements': row['measurement_count']
    })
timeline_df = pd.DataFrame(timeline_data)
fig = px.timeline(
timeline_df,
x_start='Start',
x_end='Finish',
y='Well',
color='Measurements',
title='Groundwater Monitoring Timeline',
labels={'Measurements': 'Total Measurements'},
color_continuous_scale='Viridis',
height=400
)
fig.update_yaxes(categoryorder='total ascending')
fig.update_layout(template='plotly_white')
fig.show()
```
### Coverage Heatmap
::: {.callout-note icon=false}
## 📘 Interpreting Monthly Coverage Heatmaps
**What Is a Coverage Heatmap?**
A heatmap showing measurement counts per month across years. Color intensity indicates data density—dark blue = many measurements, white/light = few or none.
**Why Does It Matter?**
Coverage heatmaps reveal:
- **Seasonal gaps**: Do sensors fail in winter (frozen, power outages)?
- **Maintenance periods**: Gaps during servicing
- **Data quality**: Consistent color = reliable monitoring
- **Long-term continuity**: No multi-month gaps = good
**How to Read the Heatmap:**
| Color Pattern | Interpretation | Quality Assessment |
|--------------|---------------|-------------------|
| **Uniform dark blue** | Consistent hourly monitoring | Excellent—use for all analyses |
| **Lighter patches** | Reduced measurement frequency | Good—check for bias |
| **White gaps** | Missing data periods | Poor—exclude from analysis |
| **Seasonal patterns** | Weather-related failures | Fair—document limitations |
**Expected for This Well:** Solid dark blue across all months/years = gold standard automated monitoring.
:::
```{python}
#| label: fig-coverage-heatmap
#| fig-cap: "Monthly measurement coverage heatmap for the well with the longest record. Consistent blue coloring indicates reliable hourly automated monitoring with no significant gaps."
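# Create monthly coverage heatmap for the well with the longest record
# (select by record length, not by count: the highest-count wells may span only a few months)
longest_well = measurements.loc[measurements['record_length_years'].idxmax(), 'well_id']
well_data = pd.read_sql("""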
SELECT
TIMESTAMP,
DTW_FT_Reviewed
FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
WHERE P_NUMBER = ?
AND TIMESTAMP IS NOT NULL
""", conn, params=[longest_well])
well_data['TIMESTAMP'] = pd.to_datetime(well_data['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
well_data = well_data.dropna(subset=['TIMESTAMP'])
well_data['year'] = well_data['TIMESTAMP'].dt.year
well_data['month'] = well_data['TIMESTAMP'].dt.month
coverage = well_data.groupby(['year', 'month']).size().reset_index(name='measurements')
# Pivot for heatmap; reindex to all 12 months so rows align with the month labels below
coverage_pivot = coverage.pivot(index='month', columns='year', values='measurements').reindex(range(1, 13))
fig = go.Figure(data=go.Heatmap(
z=coverage_pivot.values,
x=coverage_pivot.columns,
y=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
colorscale='Blues',
hovertemplate='Year: %{x}<br>Month: %{y}<br>Measurements: %{z}<extra></extra>'
))
fig.update_layout(
title=f'Monthly Measurement Coverage - Well {longest_well}<br><sub>Consistent hourly monitoring across 14+ years</sub>',
xaxis_title='Year',
yaxis_title='Month',
height=500,
template='plotly_white'
)
fig.show()
```
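Raw monthly counts are easier to judge against the number of readings an hourly logger should produce. A minimal sketch of that normalization follows; `monthly_completeness` is a hypothetical helper and the example counts are made up.

```python
import calendar

def monthly_completeness(year: int, month: int, n_measurements: int) -> float:
    """Fraction of expected hourly readings actually present in a month (capped at 1)."""
    days_in_month = calendar.monthrange(year, month)[1]
    expected = days_in_month * 24
    return min(n_measurements / expected, 1.0)

# Hypothetical counts: a full January, and a February with a one-week outage
january = monthly_completeness(2015, 1, 744)
february = monthly_completeness(2015, 2, 504)

print(f"January 2015: {january:.0%}")
print(f"February 2015: {february:.0%}")
```

Normalizing this way keeps short months from looking like partial outages on the heatmap.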
---
## Part 4: Water Level Dynamics
### Long-Term Hydrograph
::: {.callout-note icon=false}
## 📘 How Aquifer Dynamics Work
**What Do Monitoring Wells Measure?**
Monitoring wells measure the **depth to water** below the land surface—essentially tracking the elevation of the water table in an unconfined aquifer or the potentiometric surface in a confined aquifer. Each measurement represents the balance between water entering the aquifer (recharge) and water leaving it (discharge + pumping).
**Why Does This Matter?**
The water level in a monitoring well is like a **bank account balance**—it reflects the cumulative effect of all deposits (recharge) and withdrawals (natural discharge + human pumping). When the balance is rising, the aquifer is "saving water." When it's falling, the aquifer is "spending down reserves."
**How to Read Water Level Changes:**
| Water Level Trend | Physical Meaning | Aquifer Status | What's Happening |
|-------------------|------------------|----------------|------------------|
| **Rising (shallower)** | Recharge > Discharge + Pumping | Healthy recovery | Precipitation infiltrating faster than water draining/being pumped |
| **Stable (flat)** | Recharge = Discharge + Pumping | Sustainable equilibrium | Water budget balanced—inflows match outflows |
| **Gradually falling (deeper)** | Recharge < Discharge + Pumping | Mild stress | Extraction or natural discharge slightly exceeds recharge |
| **Rapidly falling (steep decline)** | Recharge ≪ Discharge + Pumping | Critical stress | Severe drought or excessive pumping—unsustainable |
**Seasonal Patterns in the Midwest Aquifer System:**
The Champaign County aquifer exhibits a **predictable annual cycle** driven by the region's continental climate:
- **Spring (March-May)**: Water levels **rise** sharply
- Snowmelt + spring rains provide peak recharge
- Low evapotranspiration (ET)—crops not yet actively growing
- Frozen ground thaws, allowing infiltration
- **Peak aquifer "charging" season**
- **Summer (June-August)**: Water levels **decline**
- High ET from mature crops (corn/soybeans consume 5-7 inches/month)
- Irrigation pumping peaks
- Precipitation often < ET (water deficit)
- **Peak aquifer stress season**
- **Fall (September-November)**: Water levels **stabilize or begin recovery**
- Crop harvest → reduced ET
- Fall precipitation can exceed ET
- Pumping decreases
- **Early recovery begins**
- **Winter (December-February)**: Water levels **stable or slow rise**
- Minimal ET (dormant vegetation)
- Frozen ground limits new recharge
- Minimal pumping
- **Aquifer resting period**
**The Key Insight**: Hydrographs translate abstract concepts (water balance, recharge rates, seasonal cycles) into **visible, measurable patterns**. A rising line in spring literally shows you water entering the aquifer faster than it's leaving. A falling line in summer shows the aquifer being "drawn down" by plants and pumps.
:::
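The water-balance table above can be expressed as a small classifier. This is a sketch under stated assumptions: `aquifer_status` is a hypothetical helper, and the `critical_deficit` cutoff separating "gradual" from "rapid" decline is an illustrative choice, since the table gives no numeric threshold.

```python
def aquifer_status(recharge: float, discharge: float, pumping: float,
                   critical_deficit: float = 0.5) -> str:
    """Classify the water balance; all terms share one unit (e.g., inches/year).

    critical_deficit is an assumed cutoff: a shortfall larger than this
    fraction of recharge is labeled critical.
    """
    net = recharge - (discharge + pumping)
    if net > 0:
        return "Rising: healthy recovery"
    if net == 0:
        return "Stable: sustainable equilibrium"
    if -net <= critical_deficit * recharge:
        return "Gradually falling: mild stress"
    return "Rapidly falling: critical stress"

# Made-up budgets (recharge, discharge, pumping)
for budget in [(10, 6, 3), (10, 6, 4), (10, 7, 5), (4, 6, 5)]:
    print(budget, "->", aquifer_status(*budget))
```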
::: {.callout-note icon=false}
## Understanding Hydrographs
**What Is a Hydrograph?**
A hydrograph is a graph showing water level changes over time—essentially the aquifer's "pulse" or "heartbeat." It reveals how the underground water table responds to precipitation, pumping, and seasonal cycles.
**Brief History**: The term combines the Greek roots *hydro* (water) and *graph* (to write). Early hydrographs were hand-drawn from monthly manual measurements. Modern automated dataloggers (1990s-present) produce continuous digital records.
**Why Does a Hydrograph Matter?**
Hydrographs reveal:
- **Aquifer health**: Rising levels = recharge exceeding extraction; falling = overdraft
- **Response time**: How quickly aquifer responds to rainfall (hours, days, months?)
- **Seasonal patterns**: Spring recharge vs. summer drawdown
- **Long-term trends**: Climate change impacts, pumping stress
- **Extreme events**: Drought impacts, flood responses
**How Does It Work?**
The plot shows:
- **X-axis**: Time (date)
- **Y-axis**: Depth to water (feet below land surface)—**REVERSED** so rising water levels go "up"
- **Line patterns**: Smooth = gradual changes; jagged = rapid fluctuations
**Important Convention**: Y-axis is reversed (inverted) so that:
- **Higher on plot** = Shallower water (good—aquifer full)
- **Lower on plot** = Deeper water (concerning—aquifer depleted)
**What Will You See?**
The hydrograph below shows 14+ years of continuous monitoring. Look for:
1. **Long-term trend**: Is the baseline rising, falling, or stable?
2. **Seasonal oscillations**: Regular up-and-down patterns (annual cycle)
3. **Extreme events**: Sharp rises (floods) or prolonged declines (droughts)
4. **Recovery patterns**: How quickly does the aquifer rebound after stress?
**How to Interpret Hydrograph Patterns:**
| Pattern | What It Means | Aquifer Condition | Management Action |
|---------|---------------|-------------------|-------------------|
| **Rising trend** | Recharge > extraction | Healthy, recovering | Maintain current use |
| **Stable trend** | Balanced water budget | Sustainable equilibrium | Monitor for changes |
| **Gradual decline** | Extraction > recharge | Early stress | Reduce pumping, enhance recharge |
| **Steep decline** | Severe overdraft | Critical stress | Immediate pumping reduction |
| **High seasonality** | Strong recharge/ET cycle | Unconfined aquifer | Plan for seasonal variability |
| **Low seasonality** | Weak surface connection | Confined/deep aquifer | Less weather-dependent |
**Typical Midwest Pattern:**
Expect to see:
- **Spring peaks** (March-May): High water levels from snowmelt + rain
- **Summer decline** (June-August): Drawdown from high ET + pumping
- **Fall recovery start** (September-November): Decreasing ET, some recharge
- **Winter stability** (December-February): Frozen ground, minimal change
:::
```{python}
#| label: fig-well-hydrograph
#| fig-cap: "Complete hydrograph showing 14+ years of continuous hourly water level monitoring. Reversed y-axis means deeper water levels appear lower. Clear seasonal patterns and long-term trends are visible."
# Plot time series for longest well - use daily means for efficiency
fig = go.Figure()
# Aggregate to daily means to reduce data points (from ~100k+ to ~5k)
daily_data = well_data.copy()
daily_data.set_index('TIMESTAMP', inplace=True)
daily_mean = daily_data['DTW_FT_Reviewed'].resample('D').mean().dropna().reset_index()
fig.add_trace(go.Scatter(
x=daily_mean['TIMESTAMP'],
y=daily_mean['DTW_FT_Reviewed'],
mode='lines',
line=dict(color='steelblue', width=1),
name=f'Well {longest_well}',
hovertemplate='Date: %{x|%Y-%m-%d}<br>Depth to water: %{y:.1f} ft<extra></extra>'
))
fig.update_layout(
title=f'Complete Hydrograph - Well {longest_well} (2009-2022)<br><sub>~14 years of continuous hourly monitoring</sub>',
xaxis_title='Date',
yaxis_title='Depth to Water (ft below surface)',
yaxis_autorange='reversed', # Deeper = lower on chart
height=500,
template='plotly_white',
hovermode='x unified'
)
fig.show()
```
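The long-term trend in a hydrograph can be quantified with a least-squares line. The sketch below uses a synthetic series (not the well data) with a known deepening rate of 0.001 ft/day, so the fitted slope can be checked against it.

```python
import numpy as np

# Synthetic daily depth-to-water series (ft): annual cycle plus a slow deepening trend
days = np.arange(5 * 365)
depth_ft = 20 + 0.001 * days + 2.0 * np.cos(2 * np.pi * days / 365.25)

# Least-squares line; slope is in ft/day, so scale to ft/year
slope_per_day, intercept = np.polyfit(days, depth_ft, 1)
slope_ft_per_year = slope_per_day * 365.25

# Depth to water increasing means the water level is falling
direction = "falling" if slope_ft_per_year > 0 else "rising or stable"
print(f"Trend: {slope_ft_per_year:+.2f} ft/year in depth -> water level {direction}")
```

Note the sign convention: because the variable is depth to water, a positive slope means a declining water level. On real records, fitting over whole years limits the bias that a partial seasonal cycle can introduce into a straight-line fit.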
### Seasonal Patterns
::: {.callout-note icon=false}
## 📘 Understanding Seasonal Water Level Patterns
**What Will You See?**
A line chart showing average water depth by month, with error bars (±1 standard deviation) and shaded min-max range.
**Why Seasonal Patterns Matter:**
Seasonality reveals aquifer behavior:
- **Predictability**: Regular patterns = reliable recharge cycle
- **Amplitude**: Large swings = vulnerable to drought
- **Timing**: When do levels peak/trough?
**How to Interpret the Seasonal Chart:**
| Pattern | Physical Meaning | Management Strategy |
|---------|-----------------|-------------------|
| **Spring peak (Mar-May)** | Recharge exceeds use | Plan for wet conditions |
| **Summer decline (Jun-Aug)** | ET + pumping exceed recharge | Peak demand period—monitor closely |
| **Fall recovery** | Recharge resumes | Assess drought recovery |
| **Winter stable** | Frozen ground, minimal change | Off-season for recharge |
| **±2-5 ft variation** | Typical Midwest aquifer | Normal seasonal range |
| **±10+ ft variation** | High stress or unconfined | Vulnerable to drought |
**Expected Midwest Pattern:** Shallowest in spring (April-May), deepest in fall (September-October).
:::
```{python}
#| label: fig-seasonal-patterns
#| fig-cap: "Monthly water level statistics showing seasonal variation. Error bars show ±1 standard deviation, shaded region shows min-max range. Water levels typically shallowest in spring (recharge) and deepest in fall (drawdown)."
# Calculate monthly statistics
well_data['month'] = well_data['TIMESTAMP'].dt.month
monthly_stats = (
    well_data.groupby('month')['DTW_FT_Reviewed']
    .agg(['mean', 'std', 'min', 'max'])
    .reindex(range(1, 13))   # keep all 12 months so values align with the month labels
    .rename_axis('month')
    .reset_index()
)
months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
fig = go.Figure()
# Mean with error bars
fig.add_trace(go.Scatter(
x=months,
y=monthly_stats['mean'],
error_y=dict(
type='data',
array=monthly_stats['std'],
visible=True
),
mode='lines+markers',
line=dict(color='steelblue', width=3),
marker=dict(size=8),
name='Mean ± Std',
hovertemplate='Month: %{x}<br>Mean depth: %{y:.1f} ft<br>Std dev: %{error_y.array:.1f} ft<extra></extra>'
))
# Min-max range
fig.add_trace(go.Scatter(
x=months,
y=monthly_stats['min'],
mode='lines',
line=dict(width=0),
showlegend=False,
hoverinfo='skip'
))
fig.add_trace(go.Scatter(
x=months,
y=monthly_stats['max'],
mode='lines',
line=dict(width=0),
fill='tonexty',
fillcolor='rgba(70, 130, 180, 0.2)',
name='Min-Max range',
hovertemplate='Month: %{x}<br>Max depth: %{y:.1f} ft<extra></extra>'
))
fig.update_layout(
title=f'Seasonal Water Level Pattern - Well {longest_well}<br><sub>Spring highs, late-summer to fall lows: typical Midwest aquifer response</sub>',
xaxis_title='Month',
yaxis_title='Depth to Water (ft)',
yaxis_autorange='reversed',
height=500,
template='plotly_white'
)
fig.show()
```
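The timing and size of the seasonal cycle can be read off the monthly means programmatically. A sketch on synthetic data follows; the series is constructed to be shallowest in late April and deepest in late October, matching the expected Midwest pattern rather than any measured well.

```python
import numpy as np
import pandas as pd

# Synthetic daily depths (ft): shallowest near late April, deepest near late October
dates = pd.date_range("2015-01-01", "2018-12-31", freq="D")
day_of_year = dates.dayofyear.to_numpy()
depth_ft = 20 - 3 * np.cos(2 * np.pi * (day_of_year - 115) / 365.25)

monthly_mean = pd.Series(depth_ft, index=dates).groupby(dates.month).mean()

shallowest_month = int(monthly_mean.idxmin())  # smallest depth = highest water level
deepest_month = int(monthly_mean.idxmax())
amplitude_ft = float(monthly_mean.max() - monthly_mean.min())

print(f"Shallowest month: {shallowest_month}, deepest month: {deepest_month}")
print(f"Seasonal amplitude of monthly means: {amplitude_ft:.1f} ft")
```

Monthly averaging slightly understates the true peak-to-trough range, so the amplitude reported here is a lower bound on the daily swing.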
::: {.callout-note icon=false}
## 💻 For Computer Scientists
**Time Series Concepts in Groundwater Data:**
**Autocorrelation (ACF)**: Water levels are highly autocorrelated - today's level predicts tomorrow's. This violates i.i.d. assumptions in standard ML.
- High ACF at lag 1 = smooth, slowly-changing signal
- ACF decay rate indicates system "memory" (confined aquifers have longer memory)
**Seasonality Detection**: Time series decomposition (e.g., statsmodels `STL` or `seasonal_decompose`) separates:
- **Trend**: Long-term direction (climate change, pumping effects)
- **Seasonal**: Repeating annual pattern (recharge/discharge cycle)
- **Residual**: What's left (anomalies, events, noise)
**Stationarity**: Many time series methods assume stationarity (constant mean/variance). Groundwater data is often **non-stationary**:
- Use differencing or detrending before analysis
- Test with Augmented Dickey-Fuller (ADF) test
**Resampling Choices**: Raw data is hourly (100k+ points). For different analyses:
- **Daily means**: Smooth patterns, reduce noise
- **Monthly**: Seasonal analysis
- **Hourly**: Event detection (storm response)
:::
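The autocorrelation and differencing points above can be checked with pandas alone. This sketch uses a synthetic daily series (trend, annual cycle, and small noise with a fixed seed), not the well data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(730)

# Synthetic daily depths: slow trend + annual cycle + small sensor noise
depth = pd.Series(
    10 + 0.001 * t + 2 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.05, t.size)
)

acf_raw = depth.autocorr(lag=1)                    # near 1: today predicts tomorrow
acf_diff = depth.diff().dropna().autocorr(lag=1)   # differencing removes most of that memory

print(f"Lag-1 autocorrelation, raw series: {acf_raw:.3f}")
print(f"Lag-1 autocorrelation, differenced: {acf_diff:.3f}")
```

A practical consequence for model evaluation is that random train/test splits leak information between neighboring timestamps, so splits on groundwater series should be made by time.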
::: {.callout-tip icon=false}
## 🌍 For Hydrologists
**Reading the Seasonal Pattern:**
**Spring (Mar-May)**: **Shallowest water levels**
- High precipitation
- Low evapotranspiration
- Snowmelt contribution
- **Peak recharge season**
**Summer (Jun-Aug)**: **Declining water levels**
- High ET exceeds precipitation
- Crop water use
- Pumping for irrigation
- **Aquifer stress season**
**Fall (Sep-Nov)**: **Continued decline or stabilization**
- Decreasing ET
- Moderate precipitation
- Post-growing season recovery begins
**Winter (Dec-Feb)**: **Slow recovery**
- Minimal ET
- Frozen ground limits recharge
- Aquifer "resting"
**Annual cycle amplitude**: ~5-10 ft typical for unconfined Midwest aquifers
:::
---
## Part 5: Small Multiples Comparison
::: {.callout-note icon=false}
## 📘 What to Look For in Small Multiples
**What Are Small Multiples?**
"Small multiples" (coined by Edward Tufte) show the same type of chart repeated for different categories—here, one hydrograph per well.
**Why Use Small Multiples?**
Enables visual comparison:
- **Synchrony**: Do wells respond simultaneously to climate?
- **Amplitude**: Do some wells show larger fluctuations?
- **Trends**: Do all wells show same long-term direction?
- **Anomalies**: Does one well behave differently (sensor issue? local pumping)?
**What to Look For:**
| Observation | Interpretation | Action |
|------------|---------------|--------|
| **Similar patterns** | Wells measure same aquifer | Good—regionally representative |
| **Synchronized peaks** | Respond to same climate events | Validates climate-aquifer connection |
| **Different amplitudes** | Varying aquifer properties | Expected—local heterogeneity |
| **Opposite trends** | Wells in different aquifers or one faulty | Investigate anomaly |
**Expected:** All 3 wells should show similar seasonal patterns (confirming they measure same aquifer) but may differ in amplitude (local properties).
:::
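Synchrony and amplitude can also be quantified rather than judged by eye. The sketch below assumes the wells have already been aligned on a common daily index. The three wells and their values are synthetic placeholders, not the operational wells in this chapter.

```python
import numpy as np
import pandas as pd

# Three hypothetical wells sharing one seasonal signal, with different
# amplitudes and independent noise.
dates = pd.date_range("2018-01-01", periods=730, freq="D")
phase = 2 * np.pi * dates.dayofyear.to_numpy() / 365.25
rng = np.random.default_rng(3)
wells = pd.DataFrame({
    "well_a": 20 + 3.0 * np.sin(phase) + rng.normal(0, 0.2, len(dates)),
    "well_b": 35 + 1.5 * np.sin(phase) + rng.normal(0, 0.2, len(dates)),
    "well_c": 12 + 4.0 * np.sin(phase) + rng.normal(0, 0.2, len(dates)),
}, index=dates)

# Synchrony: pairwise Pearson correlation between wells.
synchrony = wells.corr()
# Amplitude: 5th-95th percentile range, which is robust to single spikes.
amplitude = wells.quantile(0.95) - wells.quantile(0.05)

print(synchrony.round(2))
print(amplitude.round(1))
```

High pairwise correlation with differing amplitudes matches the "same aquifer, local heterogeneity" rows of the table above.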
```{python}
#| label: fig-well-comparison
#| fig-cap: "Small multiples comparison of all operational wells. Each panel shows one well's complete time series. Despite different start dates, all wells show similar seasonal patterns, suggesting regional aquifer response."
# Create small multiples for all wells with data (only 3)
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=[f"Well {w}" for w in well_ids],
    shared_xaxes=True,
    vertical_spacing=0.08
)
colors = ['steelblue', 'coral', 'mediumseagreen']
for idx, (well_id, color) in enumerate(zip(well_ids, colors), 1):
    df = pd.read_sql(
        """
        SELECT
            TIMESTAMP,
            DTW_FT_Reviewed
        FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
        WHERE P_NUMBER = ?
        AND TIMESTAMP IS NOT NULL
        """,
        conn,
        params=[well_id]
    )
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    df = df.dropna().sort_values('TIMESTAMP')
    # Subsample for performance
    if len(df) > 5000:
        df = df.iloc[::len(df)//5000]
    fig.add_trace(
        go.Scatter(
            x=df['TIMESTAMP'],
            y=df['DTW_FT_Reviewed'],
            mode='lines',
            line=dict(color=color, width=1),
            name=f'Well {well_id}',
            hovertemplate='%{x|%Y-%m-%d}<br>%{y:.1f} ft<extra></extra>'
        ),
        row=idx, col=1
    )
    # Reverse y-axis for each subplot
    fig.update_yaxes(autorange='reversed', row=idx, col=1)
fig.update_layout(
    title='Multi-Well Comparison - All Operational Wells<br><sub>Different start dates but similar seasonal patterns</sub>',
    height=900,
    showlegend=False,
    template='plotly_white',
    hovermode='x unified'
)
fig.update_xaxes(title_text='Date', row=3, col=1)
fig.show()
# Close database connection if it was opened
if conn is not None:
    conn.close()
```
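A note on the subsampling step in the chunk above: a stride computed with floor division (`len(df)//5000`) equals 1 for any length between 5,001 and 9,999, so those series are not reduced at all. The sketch below shows a ceiling-based alternative. `stride_subsample` is a hypothetical helper, shown on a plain list; with a DataFrame the same stride would be applied through `df.iloc[::step]`.

```python
import math


def stride_subsample(values, max_points: int = 5000):
    """Keep every k-th item so that at most `max_points` remain."""
    step = max(1, math.ceil(len(values) / max_points))
    return values[::step]


floor_step = 9999 // 5000                    # floor-based stride for 9,999 rows
kept = stride_subsample(list(range(9999)))   # ceiling-based stride
print(floor_step, len(kept))
```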
---
## Part 6: Network Coverage Assessment
### Spatial Distribution
::: {.callout-note icon=false}
## 📘 Interpreting Spatial Coverage Maps
**What Is This Map Showing?**
Well locations plotted on X-Y coordinates (easting/northing in UTM meters). Color distinguishes operational wells (red) from metadata-only wells (blue).
**Why Does Spatial Distribution Matter?**
Well spacing determines:
- **Spatial resolution**: Can we map regional water table gradients?
- **Redundancy**: If one fails, can others compensate?
- **Representativeness**: Do wells sample different geological zones?
**How to Interpret Spatial Patterns:**
| Pattern | Assessment | Capability | Management Action |
|---------|-----------|-----------|------------------|
| **10+ wells, evenly spaced** | Excellent | Regional mapping | Maintain network |
| **5-10 wells, moderate spacing** | Good | Limited regional analysis | Acceptable |
| **3-5 wells, clustered** | Poor | Point observations only | Expand network |
| **<3 wells** | Critical failure | Cannot map regionally | Urgent expansion |
**This Dataset Reality:** Only 3 operational wells (red dots) = cannot map regional water table. Blue dots represent "ghost wells"—documented but non-operational.
**Optimal Spacing:** Wells should be spaced at less than half the variogram range (~5 km for this aquifer) to ensure adequate spatial coverage.
:::
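Well spacing can be checked directly from UTM coordinates. This is a minimal sketch: the coordinates are made up for illustration, `nearest_neighbor_spacing_m` is a hypothetical helper, and `max_spacing_m` is an assumed threshold that reuses the ~5 km figure quoted above.

```python
import numpy as np


def nearest_neighbor_spacing_m(easting, northing) -> np.ndarray:
    """Distance (m) from each well to its closest neighbor, from UTM coordinates."""
    xy = np.column_stack([easting, northing]).astype(float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)       # ignore the zero distance to self
    return dist.min(axis=1)


# Hypothetical UTM coordinates for three wells (illustrative only).
easting = [400_000, 403_000, 400_000]
northing = [4_440_000, 4_444_000, 4_452_000]
spacing = nearest_neighbor_spacing_m(easting, northing)

max_spacing_m = 5_000                    # assumed threshold (see lead-in)
too_sparse = spacing > max_spacing_m
print(spacing.round(0), too_sparse)
```

Wells flagged as too sparse have no neighbor close enough to support interpolation between them.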
```{python}
#| label: fig-well-spatial-coverage
#| fig-cap: "Spatial distribution of monitoring wells. Red markers indicate wells with measurement data, blue markers indicate wells in metadata only. Note the severe coverage gap with only 3 operational wells."
# Map wells with vs. without data
wells_meta['has_data'] = wells_meta['well_id'].isin(measurements['well_id'])
fig = px.scatter(
    wells_meta.head(50),  # First 50 to avoid overcrowding
    x='easting',
    y='northing',
    color='has_data',
    size=[10 if x else 5 for x in wells_meta.head(50)['has_data']],
    hover_data=['well_id'],
    title='Well Network Spatial Coverage<br><sub>Red = Data available | Blue = No data (metadata only)</sub>',
    labels={'easting': 'Easting (m, UTM)', 'northing': 'Northing (m, UTM)'},
    color_discrete_map={True: 'red', False: 'lightblue'},
    height=600
)
fig.show()
```
### Coverage Statistics
::: {.callout-note icon=false}
## Understanding Well Network Metrics
**What Are Network Metrics?**
Network metrics quantify the **quality and extent** of groundwater monitoring infrastructure. Think of them as "vital signs" for the monitoring system itself (not the aquifer).
**Brief History**: Systematic monitoring network design emerged in the 1970s with pioneering work by the U.S. Geological Survey. Modern guidelines recommend 1 well per 100-250 km² for regional aquifer monitoring.
**Why Do These Metrics Matter?**
Network quality determines:
- **Data reliability**: Can we trust regional conclusions from sparse data?
- **Spatial coverage**: Can we map water table gradients and flow directions?
- **Temporal coverage**: Can we detect long-term trends vs. short-term noise?
- **Redundancy**: If one well fails, can others compensate?
**How to Interpret Each Metric:**
| Metric | What It Measures | Interpretation Guide |
|--------|------------------|---------------------|
| **Wells in Metadata** | Advertised network size | Compare to operational count |
| **Wells with Measurements** | Actually operational | <30% = critical failure; >70% = good |
| **Data Availability Rate** | Metadata accuracy | <50% = metadata unreliable |
| **Total Measurements** | Data volume | Millions = excellent; thousands = limited |
| **Longest Record** | Historical depth | >10 years = trend detection possible |
| **Measurement Interval** | Temporal resolution | <1 day = excellent; >1 week = poor |
| **Continuous Data** | Gap-free monitoring | Count with <5% missing data |
| **Spatial Coverage** | Geographic extent | Points per 100 km² (more = better) |
**What Will You See?**
The table below summarizes 8 key network metrics. Look for:
1. **Metadata vs. reality gaps**: Are advertised and operational counts similar?
2. **Availability rate**: Is most of the network actually working?
3. **Record length**: Can we analyze long-term trends or just recent snapshots?
4. **Spatial coverage**: Are we monitoring points or regional patterns?
**Quality Thresholds for Regional Aquifer Monitoring:**
| Aspect | Excellent | Good | Fair | This Study |
|--------|-----------|------|------|-------------------|
| **Availability Rate** | >90% | 70-90% | 40-70% | **17%** |
| **Spatial Density** | 1 per 50 km² | 1 per 100 km² | 1 per 250 km² | **1 per 295 km²** |
| **Measurement Interval** | Hourly | Daily | Weekly | **Hourly** ✓ |
| **Record Length** | >15 years | 10-15 years | 5-10 years | **14.8 years** ✓ |
| **Continuous Monitoring** | >90% | >70% | >50% | **100%** ✓ |
**Mixed Performance**: This network has **excellent temporal data quality** (hourly, continuous, long records) but **critical spatial coverage failure** (only 3 operational wells, 17% availability).
:::
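The two headline numbers in the threshold table follow from simple arithmetic. The sketch below reproduces them, assuming the study area equals the 884 km² HTEM coverage figure used in this chapter. The function names are hypothetical and the rating bands are copied from the availability row of the table.

```python
def availability_rate(wells_with_data: int, wells_in_metadata: int) -> float:
    """Percentage of documented wells that actually have measurements."""
    return 100.0 * wells_with_data / wells_in_metadata


def area_per_well_km2(area_km2: float, operational_wells: int) -> float:
    """Average area represented by each operational well."""
    return area_km2 / operational_wells


def rate_availability(pct: float) -> str:
    """Classify an availability percentage using the table's bands."""
    if pct > 90:
        return "Excellent"
    if pct >= 70:
        return "Good"
    if pct >= 40:
        return "Fair"
    return "Poor"


availability = availability_rate(3, 18)   # 3 operational of 18 documented
density = area_per_well_km2(884, 3)       # assumed study area: 884 km²
rating = rate_availability(availability)
print(f"{availability:.1f}% ({rating}), 1 well per {density:.0f} km²")
```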
```{python}
# Assumes `measurements` holds one row per well that has data
n_meta = len(wells_meta)            # wells listed in metadata
n_data = len(measurements)          # wells with at least one measurement
availability = n_data / n_meta * 100

coverage_stats = pd.DataFrame({
    'Metric': [
        'Wells in Metadata',
        'Wells with Measurements',
        'Data Availability Rate',
        'Total Measurements',
        'Longest Record',
        'Mean Measurement Interval',
        'Wells with Continuous Data',
        'Spatial Coverage'
    ],
    'Value': [
        f"{n_meta}",
        f"{n_data} ({availability:.0f}%)",
        f"{availability:.1f}%",
        f"{measurements['measurement_count'].sum():,}",
        f"{measurements['record_length_years'].max():.1f} years",
        "~1 hour (automated loggers)",
        f"{n_data} (all wells with data)",
        f"{n_data} points (inadequate)"
    ]
})
coverage_stats
```
**Interpreting These Results:**
- **Metadata vs. operational (18 vs. 3)**: **Critical discrepancy**—83% of advertised wells non-functional
- **Data availability (17%)**: **Network failure**—insufficient for regional analysis
- **Total measurements (173K+)**: **Good volume**—but concentrated in only 3 locations
- **Longest record (14.8 years)**: **Good**, just under the 15-year threshold for excellent; enables trend and seasonality analysis
- **Measurement interval (hourly)**: **Excellent**—captures storm response and diurnal cycles
- **Continuous data (100%)**: **Excellent**—no significant gaps in operational wells
- **Spatial coverage (3 points)**: **Critical failure**—cannot map regional patterns
**Bottom Line**: We have **high-quality time series data from 3 locations** but **no spatial coverage**. This is like having detailed weather records from 3 thermometers in a large county—excellent temporal detail, but you can't map temperature patterns across the region.
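The measurement-interval and continuity metrics can be derived from the timestamps alone. A minimal sketch, assuming hourly logging is the expected cadence. `interval_and_gaps` is a hypothetical helper and the timestamps are synthetic, with a simulated 10-hour logger outage.

```python
import pandas as pd


def interval_and_gaps(timestamps: pd.DatetimeIndex,
                      expected: pd.Timedelta = pd.Timedelta(hours=1)) -> tuple:
    """Return (median sampling interval, fraction of expected readings missing)."""
    ts = timestamps.sort_values()
    median_interval = ts.to_series().diff().median()
    # Number of readings a gap-free logger would have produced over the span.
    n_expected = int((ts[-1] - ts[0]) / expected) + 1
    missing_fraction = 1.0 - len(ts.unique()) / n_expected
    return median_interval, missing_fraction


full = pd.date_range("2021-01-01", periods=1000, freq=pd.Timedelta(hours=1))
observed = full.delete(list(range(500, 510)))   # drop 10 consecutive readings
median_interval, missing = interval_and_gaps(observed)
print(median_interval, f"{missing:.1%} missing")
```

A well would count as "continuous" under the metric table if its missing fraction stays below 5%.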
---
## Part 7: Key Findings and Recommendations
::: {.callout-important icon=true}
## 🎯 Critical Findings
### 1. Limited Network Coverage with High Data Volume
**Reality**: 18 wells are documented in the database metadata
**Data Volume**:
- Total: 1.1+ million measurements across all wells
- Multiple long-term records (10-13+ years from several wells)
- High-frequency data from recent installations (hourly/sub-hourly)
- Geographic distribution across study area
**However**: Of the 18 documented wells, **only 3 have substantial operational records** suitable for robust trend analysis and seasonal decomposition.
**Capability**: Limited spatial coverage but excellent temporal depth from key wells
### 2. Excellent Data Quality from Operational Wells
**Achievement**: The 3 primary operational wells demonstrate:
- Automated dataloggers (hourly measurements)
- Continuous monitoring (minimal gaps)
- Long records enabling trend analysis (14.8 years for primary well)
**Value**: High temporal resolution from these wells enables:
- Storm response analysis
- Seasonal decomposition
- Long-term trend detection
- Climate-aquifer correlation studies
**Limitation**: Concentrated at only 3 locations—cannot map regional gradients
### 3. Data Distribution Patterns
**Observation**: Data volume varies dramatically across wells
**Reality**:
- Most measurements concentrated in 3 primary wells
- Other wells have sparse or short records
- Mix of temporal coverage but limited spatial coverage
**Constraint**: Different record lengths limit regional spatial analyses
### 4. Spatial Coverage Constraints
**Challenge**: Only 3 wells with substantial operational data
**Limitations**:
- Regional water table mapping **not feasible** with 3 points
- Spatial gradient analysis severely limited
- Cannot assess aquifer heterogeneity regionally
- Insufficient validation points for comprehensive HTEM calibration
**Critical Need**: Network expansion required for regional analysis
:::
---
## Comparison to HTEM Coverage
| Data Source | Coverage Type | Quality |
|-------------|---------------|---------|
| **HTEM** | 884 km² continuous | Excellent spatial |
| **Wells** | 3 operational monitoring points (18 in metadata) | Excellent temporal, poor spatial |
**Integration synergy**: HTEM provides comprehensive **spatial** coverage (single time snapshot). Wells provide excellent **temporal** dynamics (continuous monitoring over years).
**Fusion strategy**: Use HTEM to map aquifer structure everywhere, calibrate and validate against the 3 operational well time series, and build toward an integrated 4D understanding (space + time) as the network expands.
---
## Recommendations
### Immediate (0-3 months)
1. Verify operational status of all 18 wells with data providers
2. Ensure datalogger maintenance and backup procedures
3. Document measurement frequency and data quality for each well
### Short-term Actions
4. Assess spatial distribution gaps in coverage
5. Consider strategic placement of additional wells in undersampled areas
6. Implement real-time telemetry for drought monitoring
### Long-term Actions
7. Maintain and expand network to 20-25 wells for enhanced spatial resolution
8. Install nested well pairs (shallow + deep) to assess vertical gradients
9. Co-locate additional wells with stream gauges for integrated surface-groundwater analysis
---
## Dependencies & Outputs
- **Data source**: `aquifer_db` (config key) → `OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY` table
- **Loader**: `src.data_loaders.GroundwaterLoader`
- **Critical**: US timestamp format (`%m/%d/%Y`) must be used
- **Outputs**: Hydrographs, coverage heatmaps, quality statistics
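The timestamp requirement can be illustrated with a short parsing example. The strings below are made up. The call mirrors the `pd.to_datetime` usage in the plotting chunk earlier in this chapter, with `errors="coerce"` turning unparseable values into `NaT`.

```python
import pandas as pd

raw = pd.Series(["03/04/2015", "12/31/2019", "not a date"])

# Explicit US format: "03/04/2015" is read as March 4, not April 3.
parsed = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

print(parsed)
print("Unparseable rows:", int(parsed.isna().sum()))
```

Rows that come back as `NaT` should be counted before they are dropped, since silent loss here would distort the coverage statistics.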
To access well data:
```python
from src.data_loaders import GroundwaterLoader
# db_path: path to the database configured under the `aquifer_db` key (must be defined first)
loader = GroundwaterLoader(db_path)
# Load well time series
data = loader.load_well_time_series(well_id=444863)
```
---
## Summary
Well network analysis reveals a **strong temporal record but critically limited spatial coverage**:
✅ **18 wells in metadata** - The network is documented, but only 3 wells (17%) have measurements
✅ **High measurement volume** - Rich hourly records, concentrated in the 3 operational wells
✅ **Clear seasonal patterns** - Spring highs, summer lows track Midwest recharge cycle
⚠️ **3 operational wells** - Too few points to map the regional water table
✅ **Long-term records** - Up to 14.8 years of continuous data at the primary well
✅ **High-frequency monitoring** - Automated dataloggers providing hourly measurements
**Key Insight**: Well data provides **ground truth** for HTEM interpretations, but only at 3 locations. The current network supports long-term trend detection, seasonal decomposition, and storm-response analysis at those points. Regional analysis and spatial gradient mapping require network expansion, and HTEM calibration and validation are limited to a small number of control points.
---
## Related Chapters
- [Well Spatial Coverage](../part-2-spatial/well-spatial-coverage.qmd) - Mapping coverage gaps
- [Water Level Trends](../part-3-temporal/water-level-trends.qmd) - Long-term trend analysis
- [HTEM-Groundwater Fusion](../part-4-fusion/htem-groundwater-fusion.qmd) - Validating HTEM with wells
- [Well Placement Optimizer](../part-5-operations/well-placement-optimizer.qmd) - Optimal new well locations
## Reflection Questions
- Given the current network (3 wells with long records), which types of analyses are still robust, and which would you treat as exploratory or highly uncertain?
- If you could add only 3–5 new wells in the next phase, where would you place them geographically to reduce uncertainty the most, and why?
- How does the mismatch between metadata (18 wells) and actual data availability change the way you would design future monitoring or modeling studies in this region?