6 Well Network Analysis

For Newcomers

You will learn:

What monitoring wells are and why they’re essential for tracking groundwater
How water level measurements reveal aquifer “health” over time
Why data gaps and measurement frequency matter for analysis
How to interpret water level trends (rising, falling, seasonal patterns)

Think of monitoring wells as thermometers for the aquifer—they tell us if the underground water supply is stable, stressed, or recovering. This chapter explores what the monitoring network reveals (and what critical gaps exist).

6.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Describe the monitoring well network in Champaign County and explain why wells are essential for tracking aquifer “health” over time.
Summarize the actual data availability (how many wells really have measurements, and over what periods).
Interpret key well-network diagnostics: measurement frequency, temporal coverage, hydrographs, seasonal patterns, and spatial distribution.
Explain the main limitations of the current network and how these constraints affect regional analyses and data fusion with HTEM and climate data.

6.2 Direct Measurement of the Aquifer

While HTEM reveals the aquifer’s structure, monitoring wells measure its dynamic behavior—water levels rising and falling in response to precipitation, pumping, and seasonal cycles. Wells are our direct sensors of aquifer health.

This chapter explores Champaign County’s groundwater monitoring network: its coverage, data quality, temporal patterns, and critical limitations.

⚠️ Critical Finding: Data Distribution Reality

The database contains well location metadata and measurement records, but data availability varies significantly across wells. This analysis reveals which wells have substantial monitoring records versus those with sparse or no recent data.

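Critical discovery: While the database contains 18 wells with measurement records, only 3 wells have substantial operational data suitable for robust long-term temporal analysis (more than a decade of near-continuous record; see the availability and frequency tables below). This represents a 17% operational rate (3 of 18); the remaining wells have shorter records or, in one case, a multi-year gap. Of the 356 wells listed in metadata, only 18 (5.1%) have any measurements at all.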

❌ Initial Expectation vs Reality: The 18→3 Well Data Gap

What we expected: Database metadata lists 18 wells with measurement records. Standard practice assumes ~70-80% of documented wells are operational, suggesting 13-14 wells would provide usable time series data.

What we found: Only 3 of 18 wells (17%) have substantial measurement data suitable for temporal analysis. The other 15 wells exist in metadata but lack the continuous, long-term records needed for trend detection, seasonal decomposition, or spatial gradient mapping.

Why this happened: Multiple factors contribute to metadata-reality gaps:
- Wells under construction or planned (metadata created before installation)
- Decommissioned wells (metadata not updated after removal)
- Data stored in separate archives (historical data not migrated to current database)
- Equipment failures without metadata updates (sensors failed, metadata not flagged)
- Different data quality standards (some “measurements” are sparse site visits, not continuous monitoring)

Lesson learned: Never trust metadata counts without validating actual data availability. During project planning, always:
1. Query actual measurement records, not just location tables
2. Define minimum data requirements upfront (e.g., ≥365 daily measurements)
3. Calculate operational rates (measurements per well) before designing analyses
4. Contact data providers to clarify status of “ghost wells” (metadata without data)
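
The checklist above can be sketched in a few lines. This is a minimal illustration, not the project's actual audit code: the `MIN_MEASUREMENTS` and `MIN_YEARS` thresholds and the `screen_wells` helper are assumptions chosen for the example, and the three sample rows are rounded from the availability table in Section 6.3.2.

```python
# Hypothetical screening helper; the thresholds are illustrative assumptions.
MIN_MEASUREMENTS = 50_000   # assumed minimum number of readings
MIN_YEARS = 10.0            # assumed minimum record length for trend analysis

def screen_wells(summary, min_count=MIN_MEASUREMENTS, min_years=MIN_YEARS):
    """Split wells into usable and rejected lists based on record size and length.

    `summary` maps well_id -> (measurement_count, record_length_years).
    """
    usable, rejected = [], []
    for well_id, (count, years) in summary.items():
        if count >= min_count and years >= min_years:
            usable.append(well_id)
        else:
            rejected.append(well_id)
    return usable, rejected

# Three rows rounded from the availability table in Section 6.3.2
summary = {
    444890: (196_941, 0.39),   # many readings, short record
    444863: (129_082, 13.69),  # many readings, long record
    505586: (14_515, 0.69),    # few readings, short record
}

usable, rejected = screen_wells(summary)
operational_rate = len(usable) / len(summary)
print(f"Usable wells: {usable} ({operational_rate:.0%} of those checked)")
```

Defining the thresholds as named constants before looking at the data keeps the screening rule explicit, so the operational rate cannot drift with whatever the analysis happens to need.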

Impact on analysis: This gap severely constrains regional spatial analysis. With only 3 operational wells, we cannot:
- Map regional water table gradients (need ≥10 wells)
- Assess aquifer heterogeneity spatially (inadequate coverage)
- Validate HTEM predictions across the study area (3 points insufficient)
- Detect localized pumping impacts (no redundancy)

Better approach: Document data availability upfront in Chapter 1 (Data Quality Audit) to set realistic expectations. Researchers can then design analyses around actual data (3 excellent time series) rather than expected data (18 potential wells), avoiding mid-project surprises.

Key insight for interdisciplinary teams: Computer scientists assume “18 rows in database = 18 usable data points.” Hydrologists know “wells in database ≠ wells with data.” Communicate data availability explicitly to prevent ML engineers from designing spatial models that require 18 points when only 3 exist.

6.3 Part 1: The Monitoring Network

✓ Groundwater monitoring loader initialized

6.3.1 Well Metadata Inventory

📘 Understanding Well Metadata

What Is Metadata? Metadata is “data about data”—descriptive information about wells (location, ID, construction details) separate from actual measurements. The term emerged in information science in the 1960s.

Why Does Metadata Matter? Metadata enables:

Discovery: Which wells exist and where are they?
Selection: Which wells are suitable for specific analyses?
Context: Understanding well construction affects interpretation

What This Inventory Shows: The table below lists all wells documented in the database with their coordinates. This is the “advertised” network—what exists on paper.

Critical Question: How many of these wells actually have measurement data? The answer (revealed next) exposes a severe data availability crisis.

How to Interpret:

Metadata Completeness	Network Status	Management Implication
100% wells have coordinates	Good metadata	Can plan spatial analyses
<80% wells have coordinates	Poor metadata	Cannot map network
Metadata ≫ operational	Inflated expectations	Analysts misled about availability

# Get all wells in metadata from real database
wells_meta = pd.read_sql("""
    SELECT
        P_NUMBER as well_id,
        LAT_WGS_84 as latitude,
        LONG_WGS_84 as longitude,
        X_LAMBERT as easting,
        Y_LAMBERT as northing
    FROM OB_LOCATIONS
    WHERE LAT_WGS_84 IS NOT NULL
    AND LONG_WGS_84 IS NOT NULL
""", conn)

print(f"📍 Wells in Metadata: {len(wells_meta)}")
print(f"  • With coordinates: {len(wells_meta)}")
print(f"  • Lat range: {wells_meta['latitude'].min():.4f}° to {wells_meta['latitude'].max():.4f}°")
print(f"  • Lon range: {wells_meta['longitude'].min():.4f}° to {wells_meta['longitude'].max():.4f}°")

wells_meta.head()

📍 Wells in Metadata: 356
  • With coordinates: 356
  • Lat range: 37.4368° to 42.4682°
  • Lon range: -91.0587° to -87.5264°

	well_id	latitude	longitude	easting	northing
0	471925	40.651469	-89.056796	3122375.16	2774952.16
1	471926	40.649428	-89.038123	3127534.02	2774237.95
2	471927	40.635263	-89.018688	3132926.73	2769131.55
3	441926	38.702549	-90.112495	2826171.81	2069126.90
4	441865	38.659879	-90.023125	2851449.05	2053513.14

6.3.2 Actual Measurement Availability

# Check which wells actually have measurements from real database.
# NOTE (assumes TIMESTAMP is stored as M/D/YYYY text): SQL MIN()/MAX() on text
# compares strings character by character ("9/9/2022" sorts after "1/1/2023"),
# which can return the wrong first/last dates and even negative record lengths.
# Parse the timestamps first, then aggregate in pandas.
raw = pd.read_sql("""
    SELECT
        P_NUMBER as well_id,
        TIMESTAMP
    FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
    WHERE TIMESTAMP IS NOT NULL
""", conn)

# Parse timestamps with explicit US format (CRITICAL!)
raw['TIMESTAMP'] = pd.to_datetime(raw['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')

# Per-well count plus chronological first and last measurement
measurements = (
    raw.groupby('well_id')['TIMESTAMP']
    .agg(measurement_count='size', first_measurement='min', last_measurement='max')
    .sort_values('measurement_count', ascending=False)
    .reset_index()
)

measurements['record_length_years'] = (
    (measurements['last_measurement'] - measurements['first_measurement']).dt.days / 365.25
)

print(f"\n📊 Wells with Actual Data: {len(measurements)}")
print(f"  • Total measurements: {measurements['measurement_count'].sum():,}")
print(f"  • Data availability: {len(measurements)/len(wells_meta)*100:.1f}% of metadata wells")

measurements.sort_values('measurement_count', ascending=False)


📊 Wells with Actual Data: 18
  • Total measurements: 1,048,575
  • Data availability: 5.1% of metadata wells

	well_id	measurement_count	first_measurement	last_measurement	record_length_years
0	444890	196941	2023-01-10	2023-06-02	0.391513
1	444889	196941	2023-01-10	2023-06-02	0.391513
2	444863	129082	2009-01-01	2022-09-09	13.686516
3	381684	120585	2009-01-01	2022-09-09	13.686516
4	434983	102547	2009-01-01	2019-09-09	10.685832
5	444855	47872	2013-01-01	2022-09-09	9.686516
6	496467	37024	2020-01-01	2022-09-09	2.688569
7	495463	37024	2020-01-01	2022-09-09	2.688569
8	452904	33611	2020-01-01	2022-09-09	2.688569
9	381687	31256	2020-01-01	2022-09-09	2.688569
10	268557	30937	2020-01-01	2022-09-09	2.688569
11	505586	14515	2022-01-01	2022-09-09	0.687201
12	444893	13772	2022-01-01	2022-09-09	0.687201
13	381682	13772	2022-01-01	2022-09-09	0.687201
14	444919	12858	2022-01-01	2022-09-09	0.687201
15	444917	12858	2022-01-01	2022-09-09	0.687201
16	444888	8490	2023-01-01	2022-09-09	-0.312115
17	444887	8490	2023-01-01	2022-09-09	-0.312115
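
Rows 16 and 17 report a negative record length, and many wells share the same last date of 2022-09-09. One plausible cause, assuming `TIMESTAMP` is stored as `M/D/YYYY` text, is that SQL `MIN` and `MAX` compare the strings character by character rather than chronologically. The sketch below reproduces the effect with made-up timestamps; the sample values are illustrative, not rows from the database.

```python
from datetime import datetime

# Illustrative text timestamps in M/D/YYYY form (not real database rows)
raw = ["9/9/2022", "12/31/2022", "1/1/2023", "1/15/2023"]

# Comparing as text mimics SQL MIN/MAX on a text column
text_first, text_last = min(raw), max(raw)

# Parsing first gives the chronological answer
parsed = [datetime.strptime(s, "%m/%d/%Y") for s in raw]
true_first, true_last = min(parsed), max(parsed)

text_span_days = (datetime.strptime(text_last, "%m/%d/%Y")
                  - datetime.strptime(text_first, "%m/%d/%Y")).days
true_span_days = (true_last - true_first).days

print(f"Text order: {text_first} to {text_last} ({text_span_days} days)")
print(f"Date order: {true_first:%Y-%m-%d} to {true_last:%Y-%m-%d} ({true_span_days} days)")
```

In text order the "first" date is 1/1/2023 and the "last" is 9/9/2022, a span of -114 days, which is the same -0.312 years shown for wells 444888 and 444887. If this is the cause, the first and last dates in the table should be recomputed from parsed timestamps before they are used for record-length comparisons.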

Understanding the Data Availability Table

What Does This Table Show?

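This table lists the 18 wells (out of 356 in metadata) that actually have measurement data. Each row summarizes one monitoring well: its measurement count, the first and last dates on record, and the resulting record length. Note that the metadata coordinates span 37.4° to 42.5° latitude, a much larger area than Champaign County, so the 5.1% figure compares county measurements against a wider inventory.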

Brief Context: In the 1990s-2000s, state agencies installed extensive monitoring networks during periods of federal funding. When funding decreased, many wells became “orphaned”—installed but not maintained. This table reveals that reality.

Why Does This Matter?

The gap between what is documented (356 wells in metadata, 18 with measurements) and what supports long-term analysis (3 wells) has severe consequences:

Analysis planning: Researchers design studies expecting 18 data points, discover mid-project only 3 exist
Spatial coverage: Cannot map regional water table gradients with 3 points
Redundancy: No backup if the primary well fails
Resource allocation: Money may be better spent activating dormant wells than installing new ones

How to Read This Table:

Each column tells a different story:

Column	What It Means	Why It Matters
Well ID	Unique identifier	Tracks specific location
Measurements	Total data points	More = better statistical power
Start Date	First measurement	Earlier = longer climate history
End Date	Last measurement	Recent = currently operational?
Duration	Record length (years)	Longer = trend detection possible

Interpreting Measurement Counts:

Measurement Count	If Hourly Data	Quality	What You Can Analyze
>100,000	>11 years	Excellent	Long-term trends, climate cycles, extreme events
50,000-100,000	5-11 years	Good	Seasonal patterns, multi-year trends
10,000-50,000	1-5 years	Fair	Basic seasonality, limited trends
<10,000	<1 year	Poor	Snapshot only, no trends

What Will You See:

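The table shows dramatic inequality:
- Wells 444890 and 444889: Highest data volume (196,941 measurements each, about 19% of the total apiece) but a record of under five months
- Wells 444863 and 381684: Longest records (13.7 years in this table, roughly 12% of measurements each)
- Well 434983: Third long record (10.7 years, about 10% of measurements)
- Remaining 13 wells: Under 50,000 measurements each, and all but one with records shorter than 3 years in this table

Critical Risk: Long-term trend capability rests on just three wells (444863, 381684, 434983). If they fail, we lose:
- About a third of our data volume (34%)
- Our only long-term records (10+ years)
- Ability to validate seasonal patterns across years

With only three long records, the network has very little redundancy, which is a serious weakness for regional monitoring.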

🎯 Data Availability Reality

The database contains 18 wells with measurement records, but data volume varies dramatically:

Top 5 Wells by Data Volume:

Well ID	Measurements	Start Date	End Date	Record Years
444890	196,941	2023-01-10	2023-06-02	0.4 years
444889	196,941	2023-01-10	2023-06-02	0.4 years
444863	129,082	2009-01-01	2022-09-09	13.7 years
381684	120,585	2009-01-01	2022-09-09	13.7 years
434983	102,547	2009-01-01	2019-09-09	10.7 years

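Total: about 1.05 million measurements (1,048,575) across 18 wells. This count equals 2^20 − 1, exactly the number of data rows that fit in one Excel worksheet below a header row, so it is worth confirming that the source export was not truncated.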

Key Patterns:
- Two wells (444890, 444889) have very high-frequency recent data (about 1,400 readings per day, or roughly one per minute, rather than hourly)
- Three of the top five wells have long records (10.7-13.7 years)
- Data volume varies by measurement frequency and record duration

6.4 Part 2: Data Quality Assessment

6.4.1 Measurement Frequency Analysis

What Is Measurement Frequency?

Measurement frequency refers to how often water levels are recorded at a well—hourly, daily, weekly, or monthly. This temporal resolution determines what aquifer processes you can observe, similar to how a video frame rate determines what motion you can see.

Historical Context: Early groundwater monitoring (1950s-1980s) relied on monthly manual measurements. Modern automated dataloggers (1990s-present) enable continuous hourly monitoring, revolutionizing our ability to observe aquifer dynamics.

Why Does Measurement Frequency Matter?

Data quality isn’t just about having measurements—it’s about having them frequently enough to capture the dynamics you care about.

Different aquifer processes operate at different timescales:

Hourly: Captures storm response, pumping cycles, tidal effects
Daily: Captures seasonal trends, weekly patterns, weather events
Monthly: Misses most dynamics, only good for long-term trends (years to decades)

How to Interpret Measurement Intervals

Mean Interval	Quality Rating	What You Can Analyze	What You’ll Miss
<1 hour	Excellent	Storm response, pump cycles, all temporal patterns	Nothing significant
1-24 hours	Good	Seasonal patterns, weather response	Sub-daily pumping effects
1-7 days	Fair	Long-term trends, seasonal cycles	Storm responses, weekly patterns
>7 days	Poor	Decadal trends only	Most aquifer dynamics

What Will You See?

The analysis below calculates the measurement interval for each operational well. Look for:

Mean interval: Average time between measurements
Median interval: Typical spacing (less affected by gaps)
Max gap: Longest period without data (indicates outages)
Gaps >7 days: Count of significant data interruptions
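
To see why the median is the better guide to typical spacing, consider a synthetic logger that records hourly, goes silent for 30 days, then resumes. This is an illustrative sketch with made-up timestamps, using only the Python standard library rather than the pandas workflow used for the real wells.

```python
from datetime import datetime, timedelta
from statistics import mean, median

# Synthetic record: 48 hourly readings, a 30-day outage, then 48 more hourly readings
start = datetime(2020, 1, 1)
first_block = [start + timedelta(hours=h) for h in range(48)]
resume = first_block[-1] + timedelta(days=30)
second_block = [resume + timedelta(hours=h) for h in range(48)]
timestamps = first_block + second_block

# Interval between consecutive readings, in days
intervals = [(b - a).total_seconds() / 86400 for a, b in zip(timestamps, timestamps[1:])]

mean_interval = mean(intervals)
median_interval = median(intervals)
max_gap = max(intervals)
gaps_over_7 = sum(1 for d in intervals if d > 7)

print(f"mean={mean_interval:.3f} d, median={median_interval:.3f} d, "
      f"max gap={max_gap:.0f} d, gaps >7 d={gaps_over_7}")
```

A single outage pulls the mean interval to more than eight times the true hourly spacing, while the median stays at one hour. Reading the mean, median, and maximum gap together separates "sampled slowly" from "sampled hourly with an outage".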

# Analyze measurement intervals for wells with data
well_ids = measurements['well_id'].tolist()

freq_stats = []

for well_id in well_ids:
    # Use parameterized query to prevent SQL injection
    df = pd.read_sql(
        """
        SELECT
            TIMESTAMP,
            DTW_FT_Reviewed
        FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
        WHERE P_NUMBER = ?
        AND TIMESTAMP IS NOT NULL
        ORDER BY TIMESTAMP
        """,
        conn,
        params=[well_id]
    )

    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    df = df.dropna(subset=['TIMESTAMP']).sort_values('TIMESTAMP')

    if len(df) > 1:
        df['interval_days'] = df['TIMESTAMP'].diff().dt.total_seconds() / 86400

        freq_stats.append({
            'well_id': well_id,
            'count': len(df),
            'mean_interval_days': df['interval_days'].mean(),
            'median_interval_days': df['interval_days'].median(),
            'max_gap_days': df['interval_days'].max(),
            'gaps_over_7_days': (df['interval_days'] > 7).sum()
        })

freq_df = pd.DataFrame(freq_stats)

print("📈 Measurement Frequency (Wells with Data):")
freq_df

📈 Measurement Frequency (Wells with Data):

	well_id	count	mean_interval_days	max_gap_days	gaps_over_7_days
0	444890	196941	0.000746	3.0	0
1	444889	196941	0.000746	3.0	0
2	444863	129082	0.041772	1.0	0
3	381684	120585	0.044649	11.0	4
4	434983	102547	0.042040	2.0	0
5	444855	47872	0.079777	1704.0	3
6	496467	37024	0.042163	2.0	0
7	495463	37024	0.042163	2.0	0
8	452904	33611	0.041684	1.0	0
9	381687	31256	0.042233	3.0	0
10	268557	30937	0.042636	5.0	0
11	505586	14515	0.041684	1.0	0
12	444893	13772	0.041972	4.0	0
13	381682	13772	0.041972	4.0	0
14	444919	12858	0.041689	1.0	0
15	444917	12858	0.041689	1.0	0
16	444888	8490	0.041701	1.0	0
17	444887	8490	0.041701	1.0	0

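Key insight: Most wells record about once per hour, which indicates automated dataloggers rather than manual readings.

Mean interval: ~0.042 days (≈1 hour) for 15 of 18 wells; wells 444890 and 444889 average 0.000746 days (≈64 seconds)
Gaps >7 days: None for 16 of 18 wells; well 381684 has 4 (longest 11 days) and well 444855 has 3 (longest 1,704 days, about 4.7 years)
Quality rating: Excellent for wells without long gaps; well 444855 needs gap-aware handling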

6.5 Part 3: Temporal Coverage

6.5.1 Measurement Timeline

📘 What Will You See in the Timeline

Before Viewing: This Gantt-style chart shows when each well was operational.

What to Look For:

Visual Pattern	Meaning	Management Implication
Long bars	Lengthy monitoring records	Enables trend analysis
Short bars	Brief monitoring periods	Limited to snapshots
Overlapping bars	Simultaneous monitoring	Can assess spatial patterns
Gaps between bars	No temporal overlap	Cannot cross-validate
Recent end dates	Currently operational	Real-time monitoring possible
Old end dates	Decommissioned	Historical archive only

Expected Pattern: Ideally, you’d see 10+ overlapping bars spanning 10+ years. Reality check coming…

# Create Gantt-style timeline
timeline_data = []

for _, row in measurements.iterrows():
    timeline_data.append({
        'Well': f"Well {row['well_id']}",
        'Start': row['first_measurement'],
        'Finish': row['last_measurement'],
        'Measurements': row['measurement_count']
    })

timeline_df = pd.DataFrame(timeline_data)

fig = px.timeline(
    timeline_df,
    x_start='Start',
    x_end='Finish',
    y='Well',
    color='Measurements',
    title='Groundwater Monitoring Timeline',
    labels={'Measurements': 'Total Measurements'},
    color_continuous_scale='Viridis',
    height=400
)

fig.update_yaxes(categoryorder='total ascending')
fig.update_layout(template='plotly_white')

fig.show()

Figure 6.1: Groundwater monitoring timeline showing data availability for each well. Only 18 of the 356 wells in the metadata have actual measurements, revealing a critical data gap.

6.5.2 Coverage Heatmap

📘 Interpreting Monthly Coverage Heatmaps

What Is a Coverage Heatmap? A heatmap showing measurement counts per month across years. Color intensity indicates data density—dark blue = many measurements, white/light = few or none.

Why Does It Matter? Coverage heatmaps reveal:

Seasonal gaps: Do sensors fail in winter (frozen, power outages)?
Maintenance periods: Gaps during servicing
Data quality: Consistent color = reliable monitoring
Long-term continuity: No multi-month gaps = good

How to Read the Heatmap:

Color Pattern	Interpretation	Quality Assessment
Uniform dark blue	Consistent hourly monitoring	Excellent—use for all analyses
Lighter patches	Reduced measurement frequency	Good—check for bias
White gaps	Missing data periods	Poor—exclude from analysis
Seasonal patterns	Weather-related failures	Fair—document limitations

Expected for This Well: Solid dark blue across all months/years = gold standard automated monitoring.
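
Raw monthly counts are easier to judge when compared with the count a gap-free hourly logger would produce. The sketch below shows one way to compute that ratio; the `monthly_completeness` helper and the synthetic timestamps are assumptions for illustration, and the one-reading-per-hour expectation should be adjusted for wells that sample at a different rate.

```python
import calendar
from collections import Counter
from datetime import datetime, timedelta

def monthly_completeness(timestamps):
    """Return {(year, month): fraction of expected hourly readings present}."""
    counts = Counter((t.year, t.month) for t in timestamps)
    result = {}
    for (year, month), n in counts.items():
        expected = calendar.monthrange(year, month)[1] * 24  # days in month x 24 readings
        result[(year, month)] = n / expected
    return result

# Synthetic example: January 2021 complete, February 2021 missing its last 7 days
jan = [datetime(2021, 1, 1) + timedelta(hours=h) for h in range(31 * 24)]
feb = [datetime(2021, 2, 1) + timedelta(hours=h) for h in range(21 * 24)]
completeness = monthly_completeness(jan + feb)

for (year, month), frac in sorted(completeness.items()):
    print(f"{year}-{month:02d}: {frac:.0%} of expected hourly readings")
```

Months with no readings at all do not appear in the result, so a fuller check would also enumerate every month between the first and last timestamp and report the missing ones as 0%.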

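# Create monthly coverage heatmap for the well with the longest record.
# Select by record length, not measurement count: the highest-count well
# in the availability table has a record of less than one year.
longest_well = measurements.loc[measurements['record_length_years'].idxmax(), 'well_id']

well_data = pd.read_sql("""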
    SELECT
        TIMESTAMP,
        DTW_FT_Reviewed
    FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
    WHERE P_NUMBER = ?
    AND TIMESTAMP IS NOT NULL
""", conn, params=[longest_well])

well_data['TIMESTAMP'] = pd.to_datetime(well_data['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
well_data = well_data.dropna(subset=['TIMESTAMP'])
well_data['year'] = well_data['TIMESTAMP'].dt.year
well_data['month'] = well_data['TIMESTAMP'].dt.month

coverage = well_data.groupby(['year', 'month']).size().reset_index(name='measurements')

# Pivot for heatmap; reindex so all 12 months are present and rows align with the month labels
coverage_pivot = coverage.pivot(index='month', columns='year', values='measurements').reindex(range(1, 13))

fig = go.Figure(data=go.Heatmap(
    z=coverage_pivot.values,
    x=coverage_pivot.columns,
    y=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'],
    colorscale='Blues',
    hovertemplate='Year: %{x}<br>Month: %{y}<br>Measurements: %{z}<extra></extra>'
))

fig.update_layout(
    title=f'Monthly Measurement Coverage - Well {longest_well}<br><sub>Consistent hourly monitoring across 14+ years</sub>',
    xaxis_title='Year',
    yaxis_title='Month',
    height=500,
    template='plotly_white'
)

fig.show()

Figure 6.2: Monthly measurement coverage heatmap for the well with the longest record. Consistent blue coloring indicates reliable hourly automated monitoring with no significant gaps.

6.6 Part 4: Water Level Dynamics

6.6.1 Long-Term Hydrograph

📘 How Aquifer Dynamics Work

What Do Monitoring Wells Measure?

Monitoring wells measure the depth to water below the land surface—essentially tracking the elevation of the water table in an unconfined aquifer or the potentiometric surface in a confined aquifer. Each measurement represents the balance between water entering the aquifer (recharge) and water leaving it (discharge + pumping).

Why Does This Matter?

The water level in a monitoring well is like a bank account balance—it reflects the cumulative effect of all deposits (recharge) and withdrawals (natural discharge + human pumping). When the balance is rising, the aquifer is “saving water.” When it’s falling, the aquifer is “spending down reserves.”

How to Read Water Level Changes:

Water Level Trend	Physical Meaning	Aquifer Status	What’s Happening
Rising (shallower)	Recharge > Discharge + Pumping	Healthy recovery	Precipitation infiltrating faster than water draining/being pumped
Stable (flat)	Recharge = Discharge + Pumping	Sustainable equilibrium	Water budget balanced—inflows match outflows
Gradually falling (deeper)	Recharge < Discharge + Pumping	Mild stress	Extraction or natural discharge slightly exceeds recharge
Rapidly falling (steep decline)	Recharge ≪ Discharge + Pumping	Critical stress	Severe drought or excessive pumping—unsustainable

Seasonal Patterns in the Midwest Aquifer System:

The Champaign County aquifer exhibits a predictable annual cycle driven by the region’s continental climate:

Spring (March-May): Water levels rise sharply
- Snowmelt + spring rains provide peak recharge
- Low evapotranspiration (ET)—crops not yet actively growing
- Frozen ground thaws, allowing infiltration
- Peak aquifer “charging” season
Summer (June-August): Water levels decline
- High ET from mature crops (corn/soybeans consume 5-7 inches/month)
- Irrigation pumping peaks
- Precipitation often < ET (water deficit)
- Peak aquifer stress season
Fall (September-November): Water levels stabilize or begin recovery
- Crop harvest → reduced ET
- Fall precipitation can exceed ET
- Pumping decreases
- Early recovery begins
Winter (December-February): Water levels stable or slow rise
- Minimal ET (dormant vegetation)
- Frozen ground limits new recharge
- Minimal pumping
- Aquifer resting period

The Key Insight: Hydrographs translate abstract concepts (water balance, recharge rates, seasonal cycles) into visible, measurable patterns. A rising line in spring literally shows you water entering the aquifer faster than it’s leaving. A falling line in summer shows the aquifer being “drawn down” by plants and pumps.

Understanding Hydrographs

What Is a Hydrograph?

A hydrograph is a graph showing water level changes over time—essentially the aquifer’s “pulse” or “heartbeat.” It reveals how the underground water table responds to precipitation, pumping, and seasonal cycles.

Brief History: The term “hydrograph” was coined in the 1930s from Greek “hydro” (water) + “graph” (to write). Early hydrographs were hand-drawn from monthly manual measurements. Modern automated dataloggers (1990s-present) produce continuous digital records.

Why Does a Hydrograph Matter?

Hydrographs reveal:
- Aquifer health: Rising levels = recharge exceeding extraction; falling = overdraft
- Response time: How quickly aquifer responds to rainfall (hours, days, months?)
- Seasonal patterns: Spring recharge vs. summer drawdown
- Long-term trends: Climate change impacts, pumping stress
- Extreme events: Drought impacts, flood responses

How Does It Work?

The plot shows:
- X-axis: Time (date)
- Y-axis: Depth to water (feet below land surface), REVERSED so rising water levels go “up”
- Line patterns: Smooth = gradual changes; jagged = rapid fluctuations

Important Convention: Y-axis is reversed (inverted) so that:
- Higher on plot = Shallower water (good: aquifer full)
- Lower on plot = Deeper water (concerning: aquifer depleted)

What Will You See?

The hydrograph below shows 14+ years of continuous monitoring. Look for:

Long-term trend: Is the baseline rising, falling, or stable?
Seasonal oscillations: Regular up-and-down patterns (annual cycle)
Extreme events: Sharp rises (floods) or prolonged declines (droughts)
Recovery patterns: How quickly does the aquifer rebound after stress?

How to Interpret Hydrograph Patterns:

Pattern	What It Means	Aquifer Condition	Management Action
Rising trend	Recharge > extraction	Healthy, recovering	Maintain current use
Stable trend	Balanced water budget	Sustainable equilibrium	Monitor for changes
Gradual decline	Extraction > recharge	Early stress	Reduce pumping, enhance recharge
Steep decline	Severe overdraft	Critical stress	Immediate pumping reduction
High seasonality	Strong recharge/ET cycle	Unconfined aquifer	Plan for seasonal variability
Low seasonality	Weak surface connection	Confined/deep aquifer	Less weather-dependent
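
One way to turn the trend rows of this table into a number is a least-squares slope through the depth-to-water series. The sketch below uses synthetic data and plain Python; the `linear_trend` and `classify_trend` helpers and the 0.1 ft/yr tolerance are assumptions for illustration, not thresholds used elsewhere in this book. Because the measured quantity is depth to water, a positive slope means the water level is falling.

```python
def linear_trend(days, depths):
    """Ordinary least-squares slope of depth-to-water (ft) against time (days)."""
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(depths) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, depths))
    sxx = sum((x - mean_x) ** 2 for x in days)
    return sxy / sxx  # ft per day; positive = water getting deeper

def classify_trend(slope_ft_per_year, tolerance=0.1):
    """Label a trend; `tolerance` (ft/yr) is an assumed threshold for 'stable'."""
    if slope_ft_per_year > tolerance:
        return "declining (water getting deeper)"
    if slope_ft_per_year < -tolerance:
        return "rising (water getting shallower)"
    return "stable"

# Synthetic daily depths: starts at 20 ft and deepens 0.5 ft per year over 5 years
days = list(range(5 * 365))
depths = [20.0 + 0.5 * d / 365 for d in days]

slope_per_year = linear_trend(days, depths) * 365
print(f"Trend: {slope_per_year:+.2f} ft/yr -> {classify_trend(slope_per_year)}")
```

On real hydrographs the seasonal cycle can bias the slope when the record starts and ends in different seasons, so fitting over whole years or on annual means is a reasonable precaution.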

Typical Midwest Pattern:

Expect to see:
- Spring peaks (March-May): High water levels from snowmelt + rain
- Summer decline (June-August): Drawdown from high ET + pumping
- Fall recovery start (September-November): Decreasing ET, some recharge
- Winter stability (December-February): Frozen ground, minimal change

# Plot time series for longest well - use daily means for efficiency
fig = go.Figure()

# Aggregate to daily means to reduce data points (from ~100k+ to ~5k)
daily_data = well_data.copy()
daily_data.set_index('TIMESTAMP', inplace=True)
daily_mean = daily_data['DTW_FT_Reviewed'].resample('D').mean().dropna().reset_index()

fig.add_trace(go.Scatter(
    x=daily_mean['TIMESTAMP'],
    y=daily_mean['DTW_FT_Reviewed'],
    mode='lines',
    line=dict(color='steelblue', width=1),
    name=f'Well {longest_well}',
    hovertemplate='Date: %{x|%Y-%m-%d}<br>Depth to water: %{y:.1f} ft<extra></extra>'
))

fig.update_layout(
    title=f'Complete Hydrograph - Well {longest_well} (2009-2022)<br><sub>~14 years of continuous hourly monitoring</sub>',
    xaxis_title='Date',
    yaxis_title='Depth to Water (ft below surface)',
    yaxis_autorange='reversed',  # Deeper = lower on chart
    height=500,
    template='plotly_white',
    hovermode='x unified'
)

fig.show()

Figure 6.3: Complete hydrograph showing 14+ years of continuous hourly water level monitoring. Reversed y-axis means deeper water levels appear lower. Clear seasonal patterns and long-term trends are visible.

6.6.2 Seasonal Patterns

📘 Understanding Seasonal Water Level Patterns

What Will You See? A line chart showing average water depth by month, with error bars (±1 standard deviation) and shaded min-max range.

Why Seasonal Patterns Matter: Seasonality reveals aquifer behavior:

Predictability: Regular patterns = reliable recharge cycle
Amplitude: Large swings = vulnerable to drought
Timing: When do levels peak/trough?

How to Interpret the Seasonal Chart:

Pattern	Physical Meaning	Management Strategy
Spring peak (Mar-May)	Recharge exceeds use	Plan for wet conditions
Summer decline (Jun-Aug)	ET + pumping exceed recharge	Peak demand period—monitor closely
Fall recovery	Recharge resumes	Assess drought recovery
Winter stable	Frozen ground, minimal change	Off-season for recharge
±2-5 ft variation	Typical Midwest aquifer	Normal seasonal range
±10+ ft variation	High stress or unconfined	Vulnerable to drought

Expected Midwest Pattern: Shallowest in spring (April-May), deepest in fall (September-October).

# Calculate monthly statistics
well_data['month'] = well_data['TIMESTAMP'].dt.month
monthly_stats = well_data.groupby('month')['DTW_FT_Reviewed'].agg([
    ('mean', 'mean'),
    ('std', 'std'),
    ('min', 'min'),
    ('max', 'max')
]).reset_index()

months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

fig = go.Figure()

# Mean with error bars
fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['mean'],
    error_y=dict(
        type='data',
        array=monthly_stats['std'],
        visible=True
    ),
    mode='lines+markers',
    line=dict(color='steelblue', width=3),
    marker=dict(size=8),
    name='Mean ± Std',
    hovertemplate='Month: %{x}<br>Mean depth: %{y:.1f} ft<br>Std dev: %{error_y.array:.1f} ft<extra></extra>'
))

# Min-max range
fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['min'],
    mode='lines',
    line=dict(width=0),
    showlegend=False,
    hoverinfo='skip'
))

fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['max'],
    mode='lines',
    line=dict(width=0),
    fill='tonexty',
    fillcolor='rgba(70, 130, 180, 0.2)',
    name='Min-Max range',
    hovertemplate='Month: %{x}<br>Max depth: %{y:.1f} ft<extra></extra>'
))

fig.update_layout(
    title=f'Seasonal Water Level Pattern - Well {longest_well}<br><sub>Spring highs, summer lows—typical Midwest aquifer response</sub>',
    xaxis_title='Month',
    yaxis_title='Depth to Water (ft)',
    yaxis_autorange='reversed',
    height=500,
    template='plotly_white'
)

fig.show()

Figure 6.4: Monthly water level statistics showing seasonal variation. Error bars show ±1 standard deviation, shaded region shows min-max range. Water levels typically shallowest in spring (recharge) and deepest in fall (drawdown).

💻 For Computer Scientists

Time Series Concepts in Groundwater Data:

Autocorrelation (ACF): Water levels are highly autocorrelated: today’s level predicts tomorrow’s. This violates i.i.d. assumptions in standard ML.
- High ACF at lag 1 = smooth, slowly-changing signal
- ACF decay rate indicates system “memory” (confined aquifers have longer memory)

Seasonality Detection: Classical decomposition (STL, seasonal_decompose) separates:
- Trend: Long-term direction (climate change, pumping effects)
- Seasonal: Repeating annual pattern (recharge/discharge cycle)
- Residual: What’s left (anomalies, events, noise)

Stationarity: Many time series methods assume stationarity (constant mean/variance). Groundwater data is often non-stationary:
- Use differencing or detrending before analysis
- Test with Augmented Dickey-Fuller (ADF) test

Resampling Choices: Raw data is hourly (100k+ points). For different analyses:
- Daily means: Smooth patterns, reduce noise
- Monthly: Seasonal analysis
- Hourly: Event detection (storm response)
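
The autocorrelation and differencing ideas above can be tried on a synthetic series before touching real data. This is a pure-Python sketch under stated assumptions: the `autocorr` helper is hypothetical, and the drift and seasonal amplitude are invented. A production analysis would more likely use the `acf`, `adfuller`, and `STL` tools in statsmodels.

```python
import math

def autocorr(series, lag=1):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var

# Synthetic daily depth series: slow deepening drift plus an annual cycle (3 years)
days = range(3 * 365)
depth = [20 + 0.002 * d + 3 * math.sin(2 * math.pi * d / 365) for d in days]

# First differences remove the level and leave the day-to-day change
diffed = [b - a for a, b in zip(depth, depth[1:])]

acf_raw = autocorr(depth, lag=1)
acf_half_year = autocorr(depth, lag=182)
mean_diff = sum(diffed) / len(diffed)

print(f"Lag-1 ACF (raw):     {acf_raw:.3f}")
print(f"Lag-182 ACF (raw):   {acf_half_year:.3f}")
print(f"Mean of differences: {mean_diff:.4f} ft/day")
```

The lag-1 value sits very close to 1, which is the "today predicts tomorrow" behaviour, while the half-year lag is negative because the annual cycle is then in opposite phase. The mean of the differenced series recovers approximately the drift rate that was built into the synthetic data.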

🌍 For Hydrologists

Reading the Seasonal Pattern:

Spring (Mar-May): Shallowest water levels
- High precipitation
- Low evapotranspiration
- Snowmelt contribution
- Peak recharge season

Summer (Jun-Aug): Declining water levels
- High ET exceeds precipitation
- Crop water use
- Pumping for irrigation
- Aquifer stress season

Fall (Sep-Nov): Continued decline or stabilization
- Decreasing ET
- Moderate precipitation
- Post-growing season recovery begins

Winter (Dec-Feb): Slow recovery
- Minimal ET
- Frozen ground limits recharge
- Aquifer “resting”

Annual cycle amplitude: ~5-10 ft typical for unconfined Midwest aquifers

6.7 Part 5: Small Multiples Comparison

📘 What to Look For in Small Multiples

What Are Small Multiples? “Small multiples” (coined by Edward Tufte) show the same type of chart repeated for different categories—here, one hydrograph per well.

Why Use Small Multiples? Enables visual comparison:

Synchrony: Do wells respond simultaneously to climate?
Amplitude: Do some wells show larger fluctuations?
Trends: Do all wells show same long-term direction?
Anomalies: Does one well behave differently (sensor issue? local pumping)?

What to Look For:

Observation	Interpretation	Action
Similar patterns	Wells measure same aquifer	Good—regionally representative
Synchronized peaks	Respond to same climate events	Validates climate-aquifer connection
Different amplitudes	Varying aquifer properties	Expected—local heterogeneity
Opposite trends	Wells in different aquifers or one faulty	Investigate anomaly

Expected: All 3 wells should show similar seasonal patterns (confirming they measure same aquifer) but may differ in amplitude (local properties).
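
Visual synchrony can be backed up with a correlation coefficient between pairs of wells. The sketch below is illustrative only: the three synthetic series and the `pearson` helper are assumptions, and real wells would first need to be aligned on common dates (for example, daily means over the overlapping period).

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic daily depths for two wells sharing one annual cycle:
# well B has half the amplitude and a deeper baseline than well A
days = range(2 * 365)
well_a = [15 + 4 * math.sin(2 * math.pi * d / 365) for d in days]
well_b = [40 + 2 * math.sin(2 * math.pi * d / 365) for d in days]
# Well C has the same cycle shifted by half a year (opposite phase)
well_c = [25 + 3 * math.sin(2 * math.pi * (d + 182.5) / 365) for d in days]

r_ab = pearson(well_a, well_b)
r_ac = pearson(well_a, well_c)
print(f"A vs B: r = {r_ab:+.2f}  (synchronized, different amplitude)")
print(f"A vs C: r = {r_ac:+.2f}  (opposite phase: investigate)")
```

Correlation ignores amplitude and baseline depth, so it isolates the synchrony question from the amplitude question in the table above. A strongly negative value corresponds to the "opposite trends" row and is a prompt to check for a faulty sensor or a different aquifer unit.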


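# Create small multiples, one panel per well.
# well_ids is ordered by measurement count and may list more wells than
# there are panels, so plot only as many wells as there are colors.
colors = ['steelblue', 'coral', 'mediumseagreen']
plot_wells = well_ids[:len(colors)]

fig = make_subplots(
    rows=len(plot_wells), cols=1,
    subplot_titles=[f"Well {w}" for w in plot_wells],
    shared_xaxes=True,
    vertical_spacing=0.08
)

for idx, (well_id, color) in enumerate(zip(plot_wells, colors), 1):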
    df = pd.read_sql(
        """
        SELECT
            TIMESTAMP,
            DTW_FT_Reviewed
        FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
        WHERE P_NUMBER = ?
        AND TIMESTAMP IS NOT NULL
        """,
        conn,
        params=[well_id]
    )

    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    df = df.dropna().sort_values('TIMESTAMP')

    # Subsample for performance
    if len(df) > 5000:
        df = df.iloc[::len(df)//5000]

    fig.add_trace(
        go.Scatter(
            x=df['TIMESTAMP'],
            y=df['DTW_FT_Reviewed'],
            mode='lines',
            line=dict(color=color, width=1),
            name=f'Well {well_id}',
            hovertemplate='%{x|%Y-%m-%d}<br>%{y:.1f} ft<extra></extra>'
        ),
        row=idx, col=1
    )

    # Reverse y-axis for each subplot
    fig.update_yaxes(autorange='reversed', row=idx, col=1)

fig.update_layout(
    title='Multi-Well Comparison - All Operational Wells<br><sub>Different start dates but similar seasonal patterns</sub>',
    height=900,
    showlegend=False,
    template='plotly_white',
    hovermode='x unified'
)

fig.update_xaxes(title_text='Date', row=3, col=1)

fig.show()

# Close database connection if it was opened
if conn is not None:
    conn.close()

Figure 6.5: Small multiples comparison of all operational wells. Each panel shows one well’s complete time series. Despite different start dates, all wells show similar seasonal patterns, suggesting regional aquifer response.

6.8 Part 6: Network Coverage Assessment

6.8.1 Spatial Distribution

📘 Interpreting Spatial Coverage Maps

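What Is This Map Showing? Well locations plotted on X-Y coordinates (easting/northing taken from the X_LAMBERT and Y_LAMBERT columns of the location table, so a Lambert projection rather than UTM). Color distinguishes wells with measurement data (red) from metadata-only wells (blue).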

Why Does Spatial Distribution Matter? Well spacing determines:

Spatial resolution: Can we map regional water table gradients?
Redundancy: If one fails, can others compensate?
Representativeness: Do wells sample different geological zones?

How to Interpret Spatial Patterns:

Pattern	Assessment	Capability	Management Action
10+ wells, evenly spaced	Excellent	Regional mapping	Maintain network
5-10 wells, moderate spacing	Good	Limited regional analysis	Acceptable
3-5 wells, clustered	Poor	Point observations only	Expand network
<3 wells	Critical failure	Cannot map regionally	Urgent expansion

This Dataset Reality: Only 3 operational wells (red dots) = cannot map regional water table. Blue dots represent “ghost wells”—documented but non-operational.

Optimal Spacing: Wells should be spaced at < half the variogram range (~5km for this aquifer) to ensure adequate spatial coverage.
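The spacing rule can be checked with a nearest-neighbor distance calculation. The sketch below is illustrative only: the coordinates are hypothetical, and it assumes one reading of the rule above, namely a variogram range of 10 km so that the half-range threshold is 5 km. Substitute the fitted range and real projected coordinates before drawing conclusions.

```python
import math

# Hypothetical projected coordinates in metres for three wells; not real locations.
wells = {
    "A": (0.0, 0.0),
    "B": (12_000.0, 0.0),
    "C": (0.0, 9_000.0),
}

VARIOGRAM_RANGE_M = 10_000.0           # assumed range; replace with the fitted value
MAX_SPACING_M = VARIOGRAM_RANGE_M / 2  # spacing rule: less than half the range


def nearest_neighbor_distance(name):
    """Distance from one well to its closest neighbour, in metres."""
    x0, y0 = wells[name]
    return min(
        math.hypot(x - x0, y - y0)
        for other, (x, y) in wells.items()
        if other != name
    )


spacing = {name: nearest_neighbor_distance(name) for name in wells}
too_sparse = [name for name, dist in spacing.items() if dist > MAX_SPACING_M]

for name, dist in spacing.items():
    print(f"well {name}: nearest neighbour {dist / 1000:.1f} km")
print("wells exceeding the spacing threshold:", too_sparse)
```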


# Map wells with vs. without data
wells_meta['has_data'] = wells_meta['well_id'].isin(measurements['well_id'])

# Plot at most 50 wells to avoid overcrowding, listing wells with data
# first so that none of them is cut off by the limit
plot_df = wells_meta.sort_values('has_data', ascending=False).head(50)

fig = px.scatter(
    plot_df,
    x='easting',
    y='northing',
    color='has_data',
    size=[10 if has_data else 5 for has_data in plot_df['has_data']],
    hover_data=['well_id'],
    title='Well Network Spatial Coverage<br><sub>Red = Data available | Blue = No data (metadata only)</sub>',
    # Coordinates come from the X_LAMBERT / Y_LAMBERT columns, not UTM
    labels={'easting': 'Easting (X_LAMBERT)', 'northing': 'Northing (Y_LAMBERT)'},
    color_discrete_map={True: 'red', False: 'lightblue'},
    height=600
)

fig.show()

Figure 6.6: Spatial distribution of monitoring wells. Red markers indicate wells with measurement data, blue markers indicate wells in metadata only. Note the severe coverage gap with only 3 operational wells.

6.8.2 Coverage Statistics

Understanding Well Network Metrics

What Are Network Metrics?

Network metrics quantify the quality and extent of groundwater monitoring infrastructure. Think of them as “vital signs” for the monitoring system itself (not the aquifer).

Brief History: Systematic monitoring network design emerged in the 1970s with pioneering work by the U.S. Geological Survey. Modern guidelines recommend 1 well per 100-250 km² for regional aquifer monitoring.

Why Do These Metrics Matter?

Network quality determines:

Data reliability: Can we trust regional conclusions from sparse data?
Spatial coverage: Can we map water table gradients and flow directions?
Temporal coverage: Can we detect long-term trends vs. short-term noise?
Redundancy: If one well fails, can others compensate?

How to Interpret Each Metric:

Metric	What It Measures	Interpretation Guide
Wells in Metadata	Advertised network size	Compare to operational count
Wells with Measurements	Actually operational	<30% = critical failure; >70% = good
Data Availability Rate	Metadata accuracy	<50% = metadata unreliable
Total Measurements	Data volume	Millions = excellent; thousands = limited
Longest Record	Historical depth	>10 years = trend detection possible
Measurement Interval	Temporal resolution	<1 day = excellent; >1 week = poor
Continuous Data	Gap-free monitoring	Count with <5% missing data
Spatial Coverage	Geographic extent	Points per 100 km² (more = better)
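The "Continuous Data" criterion (less than 5% missing data) can be made concrete by comparing the number of readings received with the number expected at the logging interval. The function below is a minimal sketch with a hypothetical name and a synthetic logger record; it assumes a fixed hourly interval.

```python
from datetime import datetime, timedelta


def missing_fraction(timestamps, interval=timedelta(hours=1)):
    """Share of expected readings absent between the first and last timestamp."""
    unique = sorted(set(timestamps))
    expected = int((unique[-1] - unique[0]) / interval) + 1
    return 1 - len(unique) / expected


# Hypothetical logger: 10 days of hourly readings with a 6-hour outage.
start = datetime(2021, 3, 1)
readings = [
    start + timedelta(hours=h)
    for h in range(240)
    if not (100 <= h < 106)
]

frac = missing_fraction(readings)
is_continuous = frac < 0.05
print(f"missing: {frac:.1%}, counts as continuous: {is_continuous}")
```

Applying the same function per well and counting the wells that pass gives the "Continuous Data" metric in the table above.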

What Will You See?

The table below summarizes 8 key network metrics. Look for:

Metadata vs. reality gaps: Are advertised and operational counts similar?
Availability rate: Is most of the network actually working?
Record length: Can we analyze long-term trends or just recent snapshots?
Spatial coverage: Are we monitoring points or regional patterns?

Quality Thresholds for Regional Aquifer Monitoring:

Aspect	Excellent	Good	Fair	This Study
Availability Rate	>90%	70-90%	40-70%	17% (3 of 18 measured wells)
Spatial Density	1 per 50 km²	1 per 100 km²	1 per 250 km²	1 per 295 km²
Measurement Interval	Hourly	Daily	Weekly	Hourly ✓
Record Length	>15 years	10-15 years	5-10 years	13.7 years ✓
Continuous Monitoring	>90%	>70%	>50%	100% ✓

Mixed Performance: This network has excellent temporal data quality (hourly, continuous, long records) but critical spatial coverage failure (only 3 operational wells, 17% availability).


coverage_stats = pd.DataFrame({
    'Metric': [
        'Wells in Metadata',
        'Wells with Measurements',
        'Data Availability Rate',
        'Total Measurements',
        'Longest Record',
        'Mean Measurement Interval',
        'Wells with Continuous Data',
        'Spatial Coverage'
    ],
    'Value': [
        f"{len(wells_meta)}",
        f"{len(measurements)}",  # wells with at least one measurement
        f"{len(measurements)/len(wells_meta)*100:.1f}%",
        f"{measurements['measurement_count'].sum():,}",
        f"{measurements['record_length_years'].max():.1f} years",
        "~1 hour (automated loggers)",
        "3 (primary wells)",  # fixed label, not computed from the data
        "3 points (inadequate)"
    ]
})

coverage_stats

	Metric	Value
0	Wells in Metadata	356
1	Wells with Measurements	18
2	Data Availability Rate	5.1%
3	Total Measurements	1,048,575
4	Longest Record	13.7 years
5	Mean Measurement Interval	~1 hour (automated loggers)
6	Wells with Continuous Data	3 (primary wells)
7	Spatial Coverage	3 points (inadequate)

Interpreting These Results:

Metadata vs. measured (356 vs. 18): Critical discrepancy. Only 5.1% of wells in the location table have any measurement records
Substantial records (3 of 18, 17%): Network failure. Too few long-record wells for regional analysis
Total measurements (1,048,575): Good volume, but concentrated in a few locations
Longest record (13.7 years): Excellent. Enables trend and seasonality analysis
Measurement interval (hourly): Excellent. Captures storm response and diurnal cycles
Continuous data (3 primary wells): Excellent. No significant gaps in operational wells
Spatial coverage (3 points): Critical failure. Cannot map regional patterns

Bottom Line: We have high-quality time series data from 3 locations but no spatial coverage. This is like having detailed weather records from 3 thermometers in a large county—excellent temporal detail, but you can’t map temperature patterns across the region.
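The headline rates in this section follow from a few counts, and the arithmetic is worth making explicit. The sketch below uses the counts reported in this chapter (356 wells in metadata, 18 with measurements, 3 with substantial records) and assumes the 884 km² HTEM footprint as the study area. The function name and tier boundaries are a hypothetical encoding of the quality-threshold table, with anything below the "Fair" tier labelled "Poor".

```python
def rate_availability(percent):
    """Map an availability rate (%) onto the tiers of the quality-threshold table."""
    if percent > 90:
        return "Excellent"
    if percent >= 70:
        return "Good"
    if percent >= 40:
        return "Fair"
    return "Poor"  # assumed label for anything below the "Fair" tier


# Counts reported in this chapter.
wells_in_metadata = 356
wells_with_measurements = 18
wells_with_substantial_records = 3
study_area_km2 = 884  # assumption: HTEM survey footprint used as the study area

measured_share = 100 * wells_with_measurements / wells_in_metadata
substantial_share = 100 * wells_with_substantial_records / wells_with_measurements
km2_per_well = study_area_km2 / wells_with_substantial_records

print(f"wells with any measurements: {measured_share:.1f}% of metadata wells")
print(f"wells with substantial records: {substantial_share:.0f}% of measured wells "
      f"({rate_availability(substantial_share)})")
print(f"spatial density: 1 well per {km2_per_well:.0f} km²")
```

Keeping the two percentages separate avoids confusing the 5.1% metadata-to-measurement rate with the 17% share of measured wells that have substantial records.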

6.9 Part 7: Key Findings and Recommendations

🎯 Critical Findings

6.9.1 1. Limited Network Coverage with High Data Volume

Reality: 18 wells documented in database with measurement records

Data Volume:

Total: about 1.05 million measurements (1,048,575) across all wells
Multiple long-term records (10-13+ years from several wells)
High-frequency data from recent installations (hourly/sub-hourly)
Geographic distribution across study area

However: While 18 wells have some measurement data, only 3 wells have substantial operational records suitable for robust trend analysis and seasonal decomposition.

Capability: Limited spatial coverage but excellent temporal depth from key wells

6.9.2 2. Excellent Data Quality from Operational Wells

Achievement: The 3 primary operational wells demonstrate:

Automated dataloggers (hourly measurements)
Continuous monitoring (minimal gaps)
Long records enabling trend analysis (13.7 years for the longest record)

Value: High temporal resolution from these wells enables:

Storm response analysis
Seasonal decomposition
Long-term trend detection
Climate-aquifer correlation studies

Limitation: Concentrated at only 3 locations—cannot map regional gradients

6.9.3 3. Data Distribution Patterns

Observation: Data volume varies dramatically across wells

Reality:

Most measurements concentrated in 3 primary wells
Other wells have sparse or short records
Mix of temporal coverage but limited spatial coverage

Constraint: Different record lengths limit regional spatial analyses

6.9.4 4. Spatial Coverage Constraints

Challenge: Only 3 wells with substantial operational data

Limitations:

Regional water table mapping not feasible with 3 points
Spatial gradient analysis severely limited
Cannot assess aquifer heterogeneity regionally
Insufficient validation points for comprehensive HTEM calibration

Critical Need: Network expansion required for regional analysis

6.10 Comparison to HTEM Coverage

Data Source	Coverage Type	Quality
HTEM	884 km² continuous	Excellent spatial
Wells	18 monitoring points (3 with substantial records)	Excellent temporal, limited spatial

Integration synergy: HTEM provides comprehensive spatial coverage (single time snapshot). Wells provide excellent temporal dynamics (continuous monitoring over years).

Fusion strategy: Use HTEM to map aquifer structure everywhere, calibrate and validate against the available well time series (3 substantial records among 18 wells), and build toward an integrated 4D understanding (space + time).

6.11 Recommendations

6.11.1 Immediate (0-3 months)

Verify operational status of all 18 wells with data providers
Ensure datalogger maintenance and backup procedures
Document measurement frequency and data quality for each well

6.11.2 Short-term Actions

Assess spatial distribution gaps in coverage
Consider strategic placement of additional wells in undersampled areas
Implement real-time telemetry for drought monitoring

6.11.3 Long-term Actions

Maintain and expand network to 20-25 wells for enhanced spatial resolution
Install nested well pairs (shallow + deep) to assess vertical gradients
Co-locate additional wells with stream gauges for integrated surface-groundwater analysis

6.12 Dependencies & Outputs

Data source: aquifer_db (config key) → OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY table
Loader: src.data_loaders.GroundwaterLoader
Critical: US timestamp format (%m/%d/%Y) must be used
Outputs: Hydrographs, coverage heatmaps, quality statistics
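The timestamp requirement matters because an ambiguous date parses without error under the wrong format and silently shifts the record. The sketch below illustrates this with the standard library; the date strings are made-up examples in the month/day/year layout, not rows from the database.

```python
from datetime import datetime

raw = "03/04/2015"  # illustrative value in month/day/year layout

as_us = datetime.strptime(raw, "%m/%d/%Y")         # March 4, 2015 (intended)
as_day_first = datetime.strptime(raw, "%d/%m/%Y")  # April 3, 2015 (silently wrong)

print("US format:       ", as_us.date())
print("day-first format:", as_day_first.date())

# Dates whose day exceeds 12 cannot be misread: the wrong format fails loudly.
try:
    datetime.strptime("12/25/2015", "%d/%m/%Y")
    rejected = False
except ValueError:
    rejected = True
print("12/25/2015 rejected under day-first format:", rejected)
```

The chapter's own code passes `format='%m/%d/%Y'` together with `errors='coerce'` to `pd.to_datetime`, so unparseable values become missing timestamps rather than raising an error.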

To access well data:

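from src.utils import get_data_path
from src.data_loaders import GroundwaterLoader

# Resolve the database path from the 'aquifer_db' config key
db_path = get_data_path('aquifer_db')
loader = GroundwaterLoader(db_path)

# Load well time series
data = loader.load_well_time_series(well_id=444863)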

6.13 Summary

Well network analysis reveals a strong temporal data foundation but weak spatial coverage:

✅ 1.05M measurements - Rich temporal records, concentrated in a few wells

✅ Clear seasonal patterns - Spring highs, summer lows track Midwest recharge cycle

✅ Long-term records - Multiple wells with 10-13+ years of continuous data

✅ High-frequency monitoring - Automated dataloggers providing hourly measurements

⚠️ 18 wells with measurement records - Only 3 have substantial records, out of 356 wells listed in metadata

⚠️ Spatial coverage - 3 points cannot support regional water table mapping

Key Insight: Well data provides ground truth for HTEM interpretations, but only at a few locations. The current network supports seasonal analysis and long-term trend detection at the primary wells. Regional analysis and spatial gradient mapping require network expansion before geophysical models can be calibrated and validated across the study area.

6.15 Reflection Questions

Given the current network (3 wells with long records), which types of analyses are still robust, and which would you treat as exploratory or highly uncertain?
If you could add only 3–5 new wells in the next phase, where would you place them geographically to reduce uncertainty the most, and why?
How does the mismatch between metadata (18 wells) and actual data availability change the way you would design future monitoring or modeling studies in this region?

--- title: "Well Network Analysis" description: "Monitoring network data quality, measurement frequency, and temporal patterns in groundwater levels" code-fold: true --- ::: {.callout-tip icon=false} ## For Newcomers **You will learn:** - What monitoring wells are and why they're essential for tracking groundwater - How water level measurements reveal aquifer "health" over time - Why data gaps and measurement frequency matter for analysis - How to interpret water level trends (rising, falling, seasonal patterns) Think of monitoring wells as **thermometers for the aquifer**—they tell us if the underground water supply is stable, stressed, or recovering. This chapter explores what the monitoring network reveals (and what critical gaps exist). ::: ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Describe the monitoring well network in Champaign County and explain why wells are essential for tracking aquifer “health” over time. - Summarize the actual data availability (how many wells really have measurements, and over what periods). - Interpret key well-network diagnostics: measurement frequency, temporal coverage, hydrographs, seasonal patterns, and spatial distribution. - Explain the main limitations of the current network and how these constraints affect regional analyses and data fusion with HTEM and climate data. ## Direct Measurement of the Aquifer While HTEM reveals the aquifer's **structure**, monitoring wells measure its **dynamic behavior**—water levels rising and falling in response to precipitation, pumping, and seasonal cycles. Wells are our **direct sensors** of aquifer health. This chapter explores Champaign County's groundwater monitoring network: its coverage, data quality, temporal patterns, and critical limitations. 
::: {.callout-warning icon=false} ## ⚠️ Critical Finding: Data Distribution Reality The database contains well location metadata and measurement records, but data availability varies significantly across wells. This analysis reveals which wells have substantial monitoring records versus those with sparse or no recent data. **Critical discovery**: While the database contains **18 wells with measurement records**, only **3 wells have substantial operational data** suitable for robust temporal analysis (>50 measurements). This represents a **17% operational rate**—the vast majority of wells in metadata are non-functional or have minimal data. ::: ::: {.callout-warning icon=false} ## ❌ Initial Expectation vs Reality: The 18→3 Well Data Gap **What we expected:** Database metadata lists 18 wells with measurement records. Standard practice assumes ~70-80% of documented wells are operational, suggesting 14-15 wells would provide usable time series data. **What we found:** Only **3 of 18 wells** (17%) have substantial measurement data suitable for temporal analysis. The other 15 wells exist in metadata but lack the continuous, long-term records needed for trend detection, seasonal decomposition, or spatial gradient mapping. **Why this happened:** Multiple factors contribute to metadata-reality gaps: - Wells under construction or planned (metadata created before installation) - Decommissioned wells (metadata not updated after removal) - Data stored in separate archives (historical data not migrated to current database) - Equipment failures without metadata updates (sensors failed, metadata not flagged) - Different data quality standards (some "measurements" are sparse site visits, not continuous monitoring) **Lesson learned:** **Never trust metadata counts without validating actual data availability**. During project planning, always: 1. Query actual measurement records, not just location tables 2. Define minimum data requirements upfront (e.g., ≥365 daily measurements) 3. 
Calculate operational rates (measurements per well) before designing analyses 4. Contact data providers to clarify status of "ghost wells" (metadata without data) **Impact on analysis:** This gap severely constrains regional spatial analysis. With only 3 operational wells, we cannot: - Map regional water table gradients (need ≥10 wells) - Assess aquifer heterogeneity spatially (inadequate coverage) - Validate HTEM predictions across the study area (3 points insufficient) - Detect localized pumping impacts (no redundancy) **Better approach:** Document data availability **upfront** in Chapter 1 (Data Quality Audit) to set realistic expectations. Researchers can then design analyses around actual data (3 excellent time series) rather than expected data (18 potential wells), avoiding mid-project surprises. **Key insight for interdisciplinary teams:** Computer scientists assume "18 rows in database = 18 usable data points." Hydrologists know "wells in database ≠ wells with data." **Communicate data availability explicitly** to prevent ML engineers from designing spatial models that require 18 points when only 3 exist. 
::: --- ## Part 1: The Monitoring Network ```{python} #| label: setup #| echo: false import os import sys from pathlib import Path import pandas as pd import numpy as np import plotly.express as px import plotly.graph_objects as go from plotly.subplots import make_subplots import sqlite3 # Setup def find_repo_root(start: Path) -> Path: for candidate in [start, *start.parents]: if (candidate / "src").exists(): return candidate return start quarto_project = Path(os.environ.get("QUARTO_PROJECT_DIR", str(Path.cwd()))) project_root = find_repo_root(quarto_project) if str(project_root) not in sys.path: sys.path.append(str(project_root)) from src.utils import get_data_path from src.data_loaders.groundwater_loader import GroundwaterLoader # Initialize database connection db_path = get_data_path('aquifer_db') loader = GroundwaterLoader(db_path) conn = sqlite3.connect(db_path) print("✓ Groundwater monitoring loader initialized") ``` ### Well Metadata Inventory ::: {.callout-note icon=false} ## 📘 Understanding Well Metadata **What Is Metadata?** Metadata is "data about data"—descriptive information about wells (location, ID, construction details) separate from actual measurements. The term emerged in information science in the 1960s. **Why Does Metadata Matter?** Metadata enables: - **Discovery**: Which wells exist and where are they? - **Selection**: Which wells are suitable for specific analyses? - **Context**: Understanding well construction affects interpretation **What This Inventory Shows:** The table below lists all wells documented in the database with their coordinates. This is the "advertised" network—what exists on paper. **Critical Question:** How many of these wells actually have measurement data? The answer (revealed next) exposes a severe data availability crisis. 
**How to Interpret:** | Metadata Completeness | Network Status | Management Implication | |----------------------|---------------|----------------------| | 100% wells have coordinates | Good metadata | Can plan spatial analyses | | <80% wells have coordinates | Poor metadata | Cannot map network | | Metadata ≫ operational | Inflated expectations | Analysts misled about availability | ::: ```{python} # Get all wells in metadata from real database wells_meta = pd.read_sql(""" SELECT P_NUMBER as well_id, LAT_WGS_84 as latitude, LONG_WGS_84 as longitude, X_LAMBERT as easting, Y_LAMBERT as northing FROM OB_LOCATIONS WHERE LAT_WGS_84 IS NOT NULL AND LONG_WGS_84 IS NOT NULL """, conn) print(f"📍 Wells in Metadata: {len(wells_meta)}") print(f" • With coordinates: {len(wells_meta)}") print(f" • Lat range: {wells_meta['latitude'].min():.4f}° to {wells_meta['latitude'].max():.4f}°") print(f" • Lon range: {wells_meta['longitude'].min():.4f}° to {wells_meta['longitude'].max():.4f}°") wells_meta.head() ``` ### Actual Measurement Availability ```{python} # Check which wells actually have measurements from real database measurements = pd.read_sql(""" SELECT P_NUMBER as well_id, COUNT(*) as measurement_count, MIN(TIMESTAMP) as first_measurement, MAX(TIMESTAMP) as last_measurement FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE TIMESTAMP IS NOT NULL GROUP BY P_NUMBER ORDER BY measurement_count DESC """, conn) # Parse timestamps with explicit US format (CRITICAL!) 
measurements['first_measurement'] = pd.to_datetime( measurements['first_measurement'], format='%m/%d/%Y', errors='coerce' ) measurements['last_measurement'] = pd.to_datetime( measurements['last_measurement'], format='%m/%d/%Y', errors='coerce' ) measurements['record_length_years'] = ( (measurements['last_measurement'] - measurements['first_measurement']).dt.days / 365.25 ) print(f"\n📊 Wells with Actual Data: {len(measurements)}") print(f" • Total measurements: {measurements['measurement_count'].sum():,}") print(f" • Data availability: {len(measurements)/len(wells_meta)*100:.1f}% of metadata wells") measurements.sort_values('measurement_count', ascending=False) ``` ::: {.callout-note icon=false} ## Understanding the Data Availability Table **What Does This Table Show?** This table lists the **only 3 wells** (out of 18 in metadata) that actually have measurement data. Each row represents one operational monitoring well with its complete data history. **Brief Context**: In the 1990s-2000s, state agencies installed extensive monitoring networks during periods of federal funding. When funding decreased, many wells became "orphaned"—installed but not maintained. This table reveals that reality. **Why Does This Matter?** The gap between metadata (18 wells) and reality (3 wells) has severe consequences: 1. **Analysis planning**: Researchers design studies expecting 18 data points, discover mid-project only 3 exist 2. **Spatial coverage**: Cannot map regional water table gradients with 3 points 3. **Redundancy**: No backup if the primary well fails 4. 
**Resource allocation**: Money may be better spent activating dormant wells than installing new ones **How to Read This Table:** Each column tells a different story: | Column | What It Means | Why It Matters | |--------|---------------|----------------| | **Well ID** | Unique identifier | Tracks specific location | | **Measurements** | Total data points | More = better statistical power | | **Start Date** | First measurement | Earlier = longer climate history | | **End Date** | Last measurement | Recent = currently operational? | | **Duration** | Record length (years) | Longer = trend detection possible | **Interpreting Measurement Counts:** | Measurement Count | If Hourly Data | Quality | What You Can Analyze | |-------------------|----------------|---------|----------------------| | **>100,000** | >11 years | Excellent | Long-term trends, climate cycles, extreme events | | **50,000-100,000** | 5-11 years | Good | Seasonal patterns, multi-year trends | | **10,000-50,000** | 1-5 years | Fair | Basic seasonality, limited trends | | **<10,000** | <1 year | Poor | Snapshot only, no trends | **What Will You See:** The table shows dramatic inequality: - **Well 444863**: Carries entire monitoring burden (74% of all measurements, 14.8-year record) - **Well 268557**: Moderate contributor (18% of measurements, 3.6-year record) - **Well 505586**: Recent addition (8% of measurements, 1.7-year record) **Critical Risk**: If Well 444863 fails, we lose: - 74% of our data volume - Our only long-term trend capability (14.8 years) - Ability to validate seasonal patterns across years This is a **single point of failure** scenario—catastrophic for regional monitoring. 
::: ::: {.callout-important icon=true} ## 🎯 Data Availability Reality **The database contains 18 wells with measurement records**, but data volume varies dramatically: **Top 5 Wells by Data Volume:** | Well ID | Measurements | Start Date | End Date | Record Years | |---------|--------------|------------|----------|--------------| | **444890** | 196,941 | 2023-01-10 | 2023-06-02 | 0.4 years | | **444889** | 196,941 | 2023-01-10 | 2023-06-02 | 0.4 years | | **444863** | 129,082 | 2009-01-01 | 2022-09-09 | 13.7 years | | **381684** | 120,585 | 2009-01-01 | 2022-09-09 | 13.7 years | | **434983** | 102,547 | 2009-01-01 | 2019-09-09 | 10.7 years | **Total**: 1.1+ million measurements across 18 operational wells. **Key Patterns**: - Two wells (444890, 444889) have very high-frequency recent data (hourly sampling) - Several wells have excellent long-term records (10-13+ years) - Data volume varies by measurement frequency and record duration ::: --- ## Part 2: Data Quality Assessment ### Measurement Frequency Analysis #### What Is Measurement Frequency? Measurement frequency refers to how often water levels are recorded at a well—hourly, daily, weekly, or monthly. This temporal resolution determines what aquifer processes you can observe, similar to how a video frame rate determines what motion you can see. **Historical Context**: Early groundwater monitoring (1950s-1980s) relied on monthly manual measurements. Modern automated dataloggers (1990s-present) enable continuous hourly monitoring, revolutionizing our ability to observe aquifer dynamics. #### Why Does Measurement Frequency Matter? 
**Data quality isn't just about having measurements—it's about having them frequently enough to capture the dynamics you care about.** Different aquifer processes operate at different timescales: - **Hourly**: Captures storm response, pumping cycles, tidal effects - **Daily**: Captures seasonal trends, weekly patterns, weather events - **Monthly**: Misses most dynamics, only good for long-term trends (years to decades) #### How to Interpret Measurement Intervals | Mean Interval | Quality Rating | What You Can Analyze | What You'll Miss | |---------------|----------------|----------------------|------------------| | <1 hour | Excellent | Storm response, pump cycles, all temporal patterns | Nothing significant | | 1-24 hours | Good | Seasonal patterns, weather response | Sub-daily pumping effects | | 1-7 days | Fair | Long-term trends, seasonal cycles | Storm responses, weekly patterns | | >7 days | Poor | Decadal trends only | Most aquifer dynamics | #### What Will You See? The analysis below calculates the measurement interval for each operational well. Look for: - **Mean interval**: Average time between measurements - **Median interval**: Typical spacing (less affected by gaps) - **Max gap**: Longest period without data (indicates outages) - **Gaps >7 days**: Count of significant data interruptions ```{python} # Analyze measurement intervals for wells with data well_ids = measurements['well_id'].tolist() freq_stats = [] for well_id in well_ids: # Use parameterized query to prevent SQL injection df = pd.read_sql( """ SELECT TIMESTAMP, DTW_FT_Reviewed FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE P_NUMBER = ? 
AND TIMESTAMP IS NOT NULL ORDER BY TIMESTAMP """, conn, params=[well_id] ) df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') df = df.dropna(subset=['TIMESTAMP']).sort_values('TIMESTAMP') if len(df) > 1: df['interval_days'] = df['TIMESTAMP'].diff().dt.total_seconds() / 86400 freq_stats.append({ 'well_id': well_id, 'count': len(df), 'mean_interval_days': df['interval_days'].mean(), 'median_interval_days': df['interval_days'].median(), 'max_gap_days': df['interval_days'].max(), 'gaps_over_7_days': (df['interval_days'] > 7).sum() }) freq_df = pd.DataFrame(freq_stats) print("📈 Measurement Frequency (Wells with Data):") freq_df ``` **Key insight**: All 3 operational wells have **hourly measurements**—these are automated dataloggers, not manual readings! - **Mean interval**: ~0.042 days (≈1 hour) - **Gaps >7 days**: **ZERO** for all wells (continuous monitoring) - **Quality rating**: **Excellent** for all 3 wells --- ## Part 3: Temporal Coverage ### Measurement Timeline ::: {.callout-note icon=false} ## 📘 What Will You See in the Timeline **Before Viewing:** This Gantt-style chart shows when each well was operational. **What to Look For:** | Visual Pattern | Meaning | Management Implication | |---------------|---------|----------------------| | **Long bars** | Lengthy monitoring records | Enables trend analysis | | **Short bars** | Brief monitoring periods | Limited to snapshots | | **Overlapping bars** | Simultaneous monitoring | Can assess spatial patterns | | **Gaps between bars** | No temporal overlap | Cannot cross-validate | | **Recent end dates** | Currently operational | Real-time monitoring possible | | **Old end dates** | Decommissioned | Historical archive only | **Expected Pattern:** Ideally, you'd see 10+ overlapping bars spanning 10+ years. Reality check coming... ::: ```{python} #| label: fig-well-timeline #| fig-cap: "Groundwater monitoring timeline showing data availability for each well. 
Only 3 of 18 wells in the metadata have actual measurements, revealing a critical data gap." # Create Gantt-style timeline timeline_data = [] for _, row in measurements.iterrows(): timeline_data.append({ 'Well': f"Well {row['well_id']}", 'Start': row['first_measurement'], 'Finish': row['last_measurement'], 'Measurements': row['measurement_count'] }) timeline_df = pd.DataFrame(timeline_data) fig = px.timeline( timeline_df, x_start='Start', x_end='Finish', y='Well', color='Measurements', title='Groundwater Monitoring Timeline', labels={'Measurements': 'Total Measurements'}, color_continuous_scale='Viridis', height=400 ) fig.update_yaxes(categoryorder='total ascending') fig.update_layout(template='plotly_white') fig.show() ``` ### Coverage Heatmap ::: {.callout-note icon=false} ## 📘 Interpreting Monthly Coverage Heatmaps **What Is a Coverage Heatmap?** A heatmap showing measurement counts per month across years. Color intensity indicates data density—dark blue = many measurements, white/light = few or none. **Why Does It Matter?** Coverage heatmaps reveal: - **Seasonal gaps**: Do sensors fail in winter (frozen, power outages)? - **Maintenance periods**: Gaps during servicing - **Data quality**: Consistent color = reliable monitoring - **Long-term continuity**: No multi-month gaps = good **How to Read the Heatmap:** | Color Pattern | Interpretation | Quality Assessment | |--------------|---------------|-------------------| | **Uniform dark blue** | Consistent hourly monitoring | Excellent—use for all analyses | | **Lighter patches** | Reduced measurement frequency | Good—check for bias | | **White gaps** | Missing data periods | Poor—exclude from analysis | | **Seasonal patterns** | Weather-related failures | Fair—document limitations | **Expected for This Well:** Solid dark blue across all months/years = gold standard automated monitoring. 
::: ```{python} #| label: fig-coverage-heatmap #| fig-cap: "Monthly measurement coverage heatmap for the well with the longest record. Consistent blue coloring indicates reliable hourly automated monitoring with no significant gaps." # Create monthly coverage heatmap for longest well longest_well = measurements.loc[measurements['measurement_count'].idxmax(), 'well_id'] well_data = pd.read_sql(f""" SELECT TIMESTAMP, DTW_FT_Reviewed FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY WHERE P_NUMBER = ? AND TIMESTAMP IS NOT NULL """, conn, params=[longest_well]) well_data['TIMESTAMP'] = pd.to_datetime(well_data['TIMESTAMP'], format='%m/%d/%Y', errors='coerce') well_data = well_data.dropna(subset=['TIMESTAMP']) well_data['year'] = well_data['TIMESTAMP'].dt.year well_data['month'] = well_data['TIMESTAMP'].dt.month coverage = well_data.groupby(['year', 'month']).size().reset_index(name='measurements') # Pivot for heatmap coverage_pivot = coverage.pivot(index='month', columns='year', values='measurements') fig = go.Figure(data=go.Heatmap( z=coverage_pivot.values, x=coverage_pivot.columns, y=['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'], colorscale='Blues', hovertemplate='Year: %{x} Month: %{y} Measurements: %{z}<extra></extra>' )) fig.update_layout( title=f'Monthly Measurement Coverage - Well {longest_well} Consistent hourly monitoring across 14+ years', xaxis_title='Year', yaxis_title='Month', height=500, template='plotly_white' ) fig.show() ``` --- ## Part 4: Water Level Dynamics ### Long-Term Hydrograph ::: {.callout-note icon=false} ## 📘 How Aquifer Dynamics Work **What Do Monitoring Wells Measure?** Monitoring wells measure the **depth to water** below the land surface—essentially tracking the elevation of the water table in an unconfined aquifer or the potentiometric surface in a confined aquifer. Each measurement represents the balance between water entering the aquifer (recharge) and water leaving it (discharge + pumping). 

**Why Does This Matter?**

The water level in a monitoring well is like a **bank account balance**: it reflects the cumulative effect of all deposits (recharge) and withdrawals (natural discharge + human pumping). When the balance is rising, the aquifer is "saving water." When it's falling, the aquifer is "spending down reserves."

**How to Read Water Level Changes:**

| Water Level Trend | Physical Meaning | Aquifer Status | What's Happening |
|-------------------|------------------|----------------|------------------|
| **Rising (shallower)** | Recharge > Discharge + Pumping | Healthy recovery | Precipitation infiltrating faster than water draining/being pumped |
| **Stable (flat)** | Recharge = Discharge + Pumping | Sustainable equilibrium | Water budget balanced: inflows match outflows |
| **Gradually falling (deeper)** | Recharge < Discharge + Pumping | Mild stress | Extraction or natural discharge slightly exceeds recharge |
| **Rapidly falling (steep decline)** | Recharge ≪ Discharge + Pumping | Critical stress | Severe drought or excessive pumping (unsustainable) |

**Seasonal Patterns in the Midwest Aquifer System:**

The Champaign County aquifer exhibits a **predictable annual cycle** driven by the region's continental climate:

- **Spring (March-May)**: Water levels **rise** sharply
  - Snowmelt + spring rains provide peak recharge
  - Low evapotranspiration (ET): crops not yet actively growing
  - Frozen ground thaws, allowing infiltration
  - **Peak aquifer "charging" season**
- **Summer (June-August)**: Water levels **decline**
  - High ET from mature crops (corn/soybeans consume 5-7 inches/month)
  - Irrigation pumping peaks
  - Precipitation often < ET (water deficit)
  - **Peak aquifer stress season**
- **Fall (September-November)**: Water levels **stabilize or begin recovery**
  - Crop harvest → reduced ET
  - Fall precipitation can exceed ET
  - Pumping decreases
  - **Early recovery begins**
- **Winter (December-February)**: Water levels **stable or slow rise**
  - Minimal ET (dormant vegetation)
  - Frozen ground limits new recharge
  - Minimal pumping
  - **Aquifer resting period**

**The Key Insight**: Hydrographs translate abstract concepts (water balance, recharge rates, seasonal cycles) into **visible, measurable patterns**. A rising line in spring literally shows you water entering the aquifer faster than it's leaving. A falling line in summer shows the aquifer being "drawn down" by plants and pumps.
:::

::: {.callout-note icon=false}
## Understanding Hydrographs

**What Is a Hydrograph?**

A hydrograph is a graph showing water level changes over time, essentially the aquifer's "pulse" or "heartbeat." It reveals how the underground water table responds to precipitation, pumping, and seasonal cycles.

**Brief History**: The term "hydrograph" combines Greek "hydro" (water) and "graph" (to write). Early hydrographs were hand-drawn from monthly manual measurements. Modern automated dataloggers (1990s-present) produce continuous digital records.

**Why Does a Hydrograph Matter?**

Hydrographs reveal:

- **Aquifer health**: Rising levels = recharge exceeding extraction; falling = overdraft
- **Response time**: How quickly the aquifer responds to rainfall (hours, days, months?)
- **Seasonal patterns**: Spring recharge vs. summer drawdown
- **Long-term trends**: Climate change impacts, pumping stress
- **Extreme events**: Drought impacts, flood responses

**How Does It Work?**

The plot shows:

- **X-axis**: Time (date)
- **Y-axis**: Depth to water (feet below land surface), **REVERSED** so rising water levels go "up"
- **Line patterns**: Smooth = gradual changes; jagged = rapid fluctuations

**Important Convention**: The y-axis is reversed (inverted) so that:

- **Higher on plot** = Shallower water (good: aquifer full)
- **Lower on plot** = Deeper water (concerning: aquifer depleted)

**What Will You See?**

The hydrograph below shows 14+ years of continuous monitoring. Look for:

1. **Long-term trend**: Is the baseline rising, falling, or stable?
2. **Seasonal oscillations**: Regular up-and-down patterns (annual cycle)
3. **Extreme events**: Sharp rises (floods) or prolonged declines (droughts)
4. **Recovery patterns**: How quickly does the aquifer rebound after stress?

**How to Interpret Hydrograph Patterns:**

| Pattern | What It Means | Aquifer Condition | Management Action |
|---------|---------------|-------------------|-------------------|
| **Rising trend** | Recharge > extraction | Healthy, recovering | Maintain current use |
| **Stable trend** | Balanced water budget | Sustainable equilibrium | Monitor for changes |
| **Gradual decline** | Extraction > recharge | Early stress | Reduce pumping, enhance recharge |
| **Steep decline** | Severe overdraft | Critical stress | Immediate pumping reduction |
| **High seasonality** | Strong recharge/ET cycle | Unconfined aquifer | Plan for seasonal variability |
| **Low seasonality** | Weak surface connection | Confined/deep aquifer | Less weather-dependent |

**Typical Midwest Pattern:**

Expect to see:

- **Spring peaks** (March-May): High water levels from snowmelt + rain
- **Summer decline** (June-August): Drawdown from high ET + pumping
- **Fall recovery start** (September-November): Decreasing ET, some recharge
- **Winter stability** (December-February): Frozen ground, minimal change
:::

```{python}
#| label: fig-well-hydrograph
#| fig-cap: "Complete hydrograph showing 14+ years of continuous hourly water level monitoring. Reversed y-axis means deeper water levels appear lower. Clear seasonal patterns and long-term trends are visible."
# Plot time series for longest well - use daily means for efficiency
fig = go.Figure()

# Aggregate to daily means to reduce data points (from ~100k+ to ~5k)
daily_data = well_data.copy()
daily_data.set_index('TIMESTAMP', inplace=True)
daily_mean = daily_data['DTW_FT_Reviewed'].resample('D').mean().dropna().reset_index()

fig.add_trace(go.Scatter(
    x=daily_mean['TIMESTAMP'],
    y=daily_mean['DTW_FT_Reviewed'],
    mode='lines',
    line=dict(color='steelblue', width=1),
    name=f'Well {longest_well}',
    hovertemplate='Date: %{x|%Y-%m-%d}<br>Depth to water: %{y:.1f} ft<extra></extra>'
))

fig.update_layout(
    title=f'Complete Hydrograph - Well {longest_well} (2009-2022)<br><sub>~14 years of continuous hourly monitoring</sub>',
    xaxis_title='Date',
    yaxis_title='Depth to Water (ft below surface)',
    yaxis_autorange='reversed',  # Deeper = lower on chart
    height=500,
    template='plotly_white',
    hovermode='x unified'
)
fig.show()
```

### Seasonal Patterns

::: {.callout-note icon=false}
## 📘 Understanding Seasonal Water Level Patterns

**What Will You See?**

A line chart showing average water depth by month, with error bars (±1 standard deviation) and shaded min-max range.

**Why Seasonal Patterns Matter:**

Seasonality reveals aquifer behavior:

- **Predictability**: Regular patterns = reliable recharge cycle
- **Amplitude**: Large swings = vulnerable to drought
- **Timing**: When do levels peak/trough?

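
Amplitude and timing can be read off a monthly climatology directly. The sketch below is illustrative only: it uses a synthetic record (hypothetical values, not the county's data) and hypothetical function names to show how the shallowest month, deepest month, and seasonal amplitude might be computed from `(date, depth)` pairs.

```python
import math
from collections import defaultdict
from datetime import date, timedelta


def monthly_mean_depth(records):
    """Average depth to water (ft) for each calendar month (1-12)."""
    by_month = defaultdict(list)
    for day, depth_ft in records:
        by_month[day.month].append(depth_ft)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


def seasonal_summary(records):
    """Return (shallowest month, deepest month, amplitude in ft)."""
    means = monthly_mean_depth(records)
    shallowest = min(means, key=means.get)
    deepest = max(means, key=means.get)
    return shallowest, deepest, means[deepest] - means[shallowest]


# Synthetic daily record (hypothetical): 20 ft mean depth with a 3 ft annual
# cycle, shallowest around mid-April and deepest around mid-October.
start = date(2015, 1, 1)
records = []
for i in range(365 * 4):
    day = start + timedelta(days=i)
    phase = 2 * math.pi * (day.timetuple().tm_yday - 105) / 365.25
    records.append((day, 20.0 - 3.0 * math.cos(phase)))

shallowest, deepest, amplitude = seasonal_summary(records)
print(f"Shallowest month: {shallowest}, deepest month: {deepest}, amplitude: {amplitude:.1f} ft")
```

Because depth to water is measured downward, the "peak" water level is the month with the smallest mean depth.
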

**How to Interpret the Seasonal Chart:**

| Pattern | Physical Meaning | Management Strategy |
|---------|-----------------|-------------------|
| **Spring peak (Mar-May)** | Recharge exceeds use | Plan for wet conditions |
| **Summer decline (Jun-Aug)** | ET + pumping exceed recharge | Peak demand period: monitor closely |
| **Fall recovery** | Recharge resumes | Assess drought recovery |
| **Winter stable** | Frozen ground, minimal change | Off-season for recharge |
| **±2-5 ft variation** | Typical Midwest aquifer | Normal seasonal range |
| **±10+ ft variation** | High stress or unconfined | Vulnerable to drought |

**Expected Midwest Pattern:** Shallowest in spring (April-May), deepest in fall (September-October).
:::

```{python}
#| label: fig-seasonal-patterns
#| fig-cap: "Monthly water level statistics showing seasonal variation. Error bars show ±1 standard deviation, shaded region shows min-max range. Water levels typically shallowest in spring (recharge) and deepest in fall (drawdown)."
# Calculate monthly statistics
well_data['month'] = well_data['TIMESTAMP'].dt.month
monthly_stats = (
    well_data.groupby('month')['DTW_FT_Reviewed']
    .agg(['mean', 'std', 'min', 'max'])
    .reindex(range(1, 13))  # keep all 12 months so rows align with the labels
    .reset_index()
)

months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']

fig = go.Figure()

# Mean with error bars
fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['mean'],
    error_y=dict(
        type='data',
        array=monthly_stats['std'],
        visible=True
    ),
    mode='lines+markers',
    line=dict(color='steelblue', width=3),
    marker=dict(size=8),
    name='Mean ± Std',
    hovertemplate='Month: %{x}<br>Mean depth: %{y:.1f} ft<br>Std dev: %{error_y.array:.1f} ft<extra></extra>'
))

# Min-max range: invisible lower bound, then upper bound filled down to it
fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['min'],
    mode='lines',
    line=dict(width=0),
    showlegend=False,
    hoverinfo='skip'
))

fig.add_trace(go.Scatter(
    x=months,
    y=monthly_stats['max'],
    mode='lines',
    line=dict(width=0),
    fill='tonexty',
    fillcolor='rgba(70, 130, 180, 0.2)',
    name='Min-Max range',
    hovertemplate='Month: %{x}<br>Max depth: %{y:.1f} ft<extra></extra>'
))

fig.update_layout(
    title=f'Seasonal Water Level Pattern - Well {longest_well}<br><sub>Spring highs, summer lows: typical Midwest aquifer response</sub>',
    xaxis_title='Month',
    yaxis_title='Depth to Water (ft)',
    yaxis_autorange='reversed',
    height=500,
    template='plotly_white'
)
fig.show()
```

::: {.callout-note icon=false}
## 💻 For Computer Scientists

**Time Series Concepts in Groundwater Data:**

**Autocorrelation (ACF)**: Water levels are highly autocorrelated: today's level predicts tomorrow's. This violates i.i.d. assumptions in standard ML.

- High ACF at lag 1 = smooth, slowly-changing signal
- ACF decay rate indicates system "memory" (confined aquifers have longer memory)

**Seasonality Detection**: Classical decomposition (STL, `seasonal_decompose`) separates:

- **Trend**: Long-term direction (climate change, pumping effects)
- **Seasonal**: Repeating annual pattern (recharge/discharge cycle)
- **Residual**: What's left (anomalies, events, noise)

**Stationarity**: Many time series methods assume stationarity (constant mean/variance). Groundwater data is often **non-stationary**:

- Use differencing or detrending before analysis
- Test with the Augmented Dickey-Fuller (ADF) test

**Resampling Choices**: Raw data is hourly (100k+ points). For different analyses:

- **Daily means**: Smooth patterns, reduce noise
- **Monthly**: Seasonal analysis
- **Hourly**: Event detection (storm response)
:::

::: {.callout-tip icon=false}
## 🌍 For Hydrologists

**Reading the Seasonal Pattern:**

**Spring (Mar-May)**: **Shallowest water levels**

- High precipitation
- Low evapotranspiration
- Snowmelt contribution
- **Peak recharge season**

**Summer (Jun-Aug)**: **Declining water levels**

- High ET exceeds precipitation
- Crop water use
- Pumping for irrigation
- **Aquifer stress season**

**Fall (Sep-Nov)**: **Continued decline or stabilization**

- Decreasing ET
- Moderate precipitation
- Post-growing season recovery begins

**Winter (Dec-Feb)**: **Slow recovery**

- Minimal ET
- Frozen ground limits recharge
- Aquifer "resting"

**Annual cycle amplitude**: ~5-10 ft typical for unconfined Midwest aquifers
:::

---

## Part 5: Small Multiples Comparison

::: {.callout-note icon=false}
## 📘 What to Look For in Small Multiples

**What Are Small Multiples?**

"Small multiples" (coined by Edward Tufte) show the same type of chart repeated for different categories; here, one hydrograph per well.

**Why Use Small Multiples?**

Enables visual comparison:

- **Synchrony**: Do wells respond simultaneously to climate?
- **Amplitude**: Do some wells show larger fluctuations?
- **Trends**: Do all wells show the same long-term direction?
- **Anomalies**: Does one well behave differently (sensor issue? local pumping)?

**What to Look For:**

| Observation | Interpretation | Action |
|------------|---------------|--------|
| **Similar patterns** | Wells measure same aquifer | Good: regionally representative |
| **Synchronized peaks** | Respond to same climate events | Validates climate-aquifer connection |
| **Different amplitudes** | Varying aquifer properties | Expected: local heterogeneity |
| **Opposite trends** | Wells in different aquifers or one faulty | Investigate anomaly |

**Expected:** All 3 wells should show similar seasonal patterns (confirming they measure the same aquifer) but may differ in amplitude (local properties).
:::

```{python}
#| label: fig-well-comparison
#| fig-cap: "Small multiples comparison of all operational wells. Each panel shows one well's complete time series. Despite different start dates, all wells show similar seasonal patterns, suggesting regional aquifer response."

# Create small multiples for all wells with data (only 3)
fig = make_subplots(
    rows=3, cols=1,
    subplot_titles=[f"Well {w}" for w in well_ids],
    shared_xaxes=True,
    vertical_spacing=0.08
)

colors = ['steelblue', 'coral', 'mediumseagreen']

for idx, (well_id, color) in enumerate(zip(well_ids, colors), 1):
    df = pd.read_sql(
        """
        SELECT TIMESTAMP, DTW_FT_Reviewed
        FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
        WHERE P_NUMBER = ?
        AND TIMESTAMP IS NOT NULL
        """,
        conn,
        params=[well_id]
    )
    df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')
    df = df.dropna().sort_values('TIMESTAMP')

    # Subsample for performance: keep roughly 5,000 evenly spaced rows
    if len(df) > 5000:
        df = df.iloc[::len(df)//5000]

    fig.add_trace(
        go.Scatter(
            x=df['TIMESTAMP'],
            y=df['DTW_FT_Reviewed'],
            mode='lines',
            line=dict(color=color, width=1),
            name=f'Well {well_id}',
            hovertemplate='%{x|%Y-%m-%d}<br>%{y:.1f} ft<extra></extra>'
        ),
        row=idx, col=1
    )

    # Reverse y-axis for each subplot
    fig.update_yaxes(autorange='reversed', row=idx, col=1)

fig.update_layout(
    title='Multi-Well Comparison - All Operational Wells<br><sub>Different start dates but similar seasonal patterns</sub>',
    height=900,
    showlegend=False,
    template='plotly_white',
    hovermode='x unified'
)
fig.update_xaxes(title_text='Date', row=3, col=1)
fig.show()

# Close database connection if it was opened
if conn is not None:
    conn.close()
```

---

## Part 6: Network Coverage Assessment

### Spatial Distribution

::: {.callout-note icon=false}
## 📘 Interpreting Spatial Coverage Maps

**What Is This Map Showing?**

Well locations plotted on X-Y coordinates (easting/northing in UTM meters). Color distinguishes operational wells (red) from metadata-only wells (blue).

**Why Does Spatial Distribution Matter?**

Well spacing determines:

- **Spatial resolution**: Can we map regional water table gradients?
- **Redundancy**: If one fails, can others compensate?
- **Representativeness**: Do wells sample different geological zones?

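
One simple way to put a number on spacing and redundancy is the distance from each well to its nearest neighbor. The sketch below uses made-up UTM coordinates and a hypothetical function name; it is an illustration of the idea, not the chapter's method.

```python
import math


def nearest_neighbor_distances(wells):
    """Map each well ID to the distance (m) to its closest neighboring well."""
    result = {}
    for well_id, xy in wells.items():
        result[well_id] = min(
            math.dist(xy, other_xy)
            for other_id, other_xy in wells.items()
            if other_id != well_id
        )
    return result


# Hypothetical UTM coordinates (easting, northing) in meters
wells = {
    'A': (400_000.0, 4_440_000.0),
    'B': (403_000.0, 4_444_000.0),  # 5 km from A (3-4-5 triangle)
    'C': (415_000.0, 4_444_000.0),  # 12 km from B
}

spacing = nearest_neighbor_distances(wells)
for well_id, dist_m in spacing.items():
    print(f"Well {well_id}: nearest neighbor {dist_m / 1000:.1f} km away")
```

A well whose nearest neighbor is far away has no backup if it fails, which is the redundancy concern raised in the list above.
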

**How to Interpret Spatial Patterns:**

| Pattern | Assessment | Capability | Management Action |
|---------|-----------|-----------|------------------|
| **10+ wells, evenly spaced** | Excellent | Regional mapping | Maintain network |
| **5-10 wells, moderate spacing** | Good | Limited regional analysis | Acceptable |
| **3-5 wells, clustered** | Poor | Point observations only | Expand network |
| **<3 wells** | Critical failure | Cannot map regionally | Urgent expansion |

**This Dataset Reality:** Only 3 operational wells (red dots) = cannot map regional water table. Blue dots represent "ghost wells": documented but non-operational.

**Optimal Spacing:** Wells should be spaced at less than half the variogram range (~5 km for this aquifer) to ensure adequate spatial coverage.
:::

```{python}
#| label: fig-well-spatial-coverage
#| fig-cap: "Spatial distribution of monitoring wells. Red markers indicate wells with measurement data, blue markers indicate wells in metadata only. Note the severe coverage gap with only 3 operational wells."

# Map wells with vs. without data
wells_meta['has_data'] = wells_meta['well_id'].isin(measurements['well_id'])
wells_subset = wells_meta.head(50)  # First 50 to avoid overcrowding

fig = px.scatter(
    wells_subset,
    x='easting',
    y='northing',
    color='has_data',
    size=[10 if has_data else 5 for has_data in wells_subset['has_data']],
    hover_data=['well_id'],
    title='Well Network Spatial Coverage<br><sub>Red = Data available | Blue = No data (metadata only)</sub>',
    labels={'easting': 'Easting (m, UTM)', 'northing': 'Northing (m, UTM)'},
    color_discrete_map={True: 'red', False: 'lightblue'},
    height=600
)
fig.show()
```

### Coverage Statistics

::: {.callout-note icon=false}
## Understanding Well Network Metrics

**What Are Network Metrics?**

Network metrics quantify the **quality and extent** of groundwater monitoring infrastructure. Think of them as "vital signs" for the monitoring system itself (not the aquifer).


**Brief History**: Systematic monitoring network design emerged in the 1970s with pioneering work by the U.S. Geological Survey. Modern guidelines recommend 1 well per 100-250 km² for regional aquifer monitoring.

**Why Do These Metrics Matter?**

Network quality determines:

- **Data reliability**: Can we trust regional conclusions from sparse data?
- **Spatial coverage**: Can we map water table gradients and flow directions?
- **Temporal coverage**: Can we detect long-term trends vs. short-term noise?
- **Redundancy**: If one well fails, can others compensate?

**How to Interpret Each Metric:**

| Metric | What It Measures | Interpretation Guide |
|--------|------------------|---------------------|
| **Wells in Metadata** | Advertised network size | Compare to operational count |
| **Wells with Measurements** | Actually operational | <30% = critical failure; >70% = good |
| **Data Availability Rate** | Metadata accuracy | <50% = metadata unreliable |
| **Total Measurements** | Data volume | Millions = excellent; thousands = limited |
| **Longest Record** | Historical depth | >10 years = trend detection possible |
| **Measurement Interval** | Temporal resolution | <1 day = excellent; >1 week = poor |
| **Continuous Data** | Gap-free monitoring | Count with <5% missing data |
| **Spatial Coverage** | Geographic extent | Points per 100 km² (more = better) |

**What Will You See?**

The table below summarizes 8 key network metrics. Look for:

1. **Metadata vs. reality gaps**: Are advertised and operational counts similar?
2. **Availability rate**: Is most of the network actually working?
3. **Record length**: Can we analyze long-term trends or just recent snapshots?
4. **Spatial coverage**: Are we monitoring points or regional patterns?

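
The two headline ratios used in this chapter follow from simple arithmetic. The sketch below reproduces them from the counts quoted in the text; the function names are hypothetical, and treating the 884 km² HTEM footprint as the reference area is an assumption (it is consistent with the "1 per 295 km²" density figure).

```python
def availability_rate(operational_wells, metadata_wells):
    """Percent of wells listed in metadata that have usable records."""
    return 100.0 * operational_wells / metadata_wells


def area_per_well(area_km2, operational_wells):
    """Average area (km^2) represented by each operational well."""
    return area_km2 / operational_wells


# Counts quoted in this chapter: 3 operational wells out of 18 in metadata.
# Assumption: the 884 km^2 HTEM coverage area is used as the reference area.
rate = availability_rate(3, 18)
density = area_per_well(884.0, 3)
print(f"{rate:.0f}% available, 1 well per {density:.0f} km^2")
# prints "17% available, 1 well per 295 km^2"
```
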

**Quality Thresholds for Regional Aquifer Monitoring:**

| Aspect | Excellent | Good | Fair | This Study |
|--------|-----------|------|------|------------|
| **Availability Rate** | >90% | 70-90% | 40-70% | **17%** (poor) |
| **Spatial Density** | 1 per 50 km² | 1 per 100 km² | 1 per 250 km² | **1 per 295 km²** (poor) |
| **Measurement Interval** | Hourly | Daily | Weekly | **Hourly** ✓ |
| **Record Length** | >15 years | 10-15 years | 5-10 years | **14.8 years** ✓ |
| **Continuous Monitoring** | >90% | >70% | >50% | **100%** ✓ |

**Mixed Performance**: This network has **excellent temporal data quality** (hourly, continuous, long records) but **critical spatial coverage failure** (only 3 operational wells, 17% availability).
:::

```{python}
# Share of metadata wells that actually have measurement records
availability_pct = len(measurements) / len(wells_meta) * 100

coverage_stats = pd.DataFrame({
    'Metric': [
        'Wells in Metadata',
        'Wells with Measurements',
        'Data Availability Rate',
        'Total Measurements',
        'Longest Record',
        'Mean Measurement Interval',
        'Wells with Continuous Data',
        'Spatial Coverage'
    ],
    'Value': [
        f"{len(wells_meta)}",
        f"{len(measurements)} ({availability_pct:.0f}%)",
        f"{availability_pct:.1f}%",
        f"{measurements['measurement_count'].sum():,}",
        f"{measurements['record_length_years'].max():.1f} years",
        "~1 hour (automated loggers)",
        f"{len(measurements)} (all operational wells)",
        f"{len(measurements)} points (inadequate)"
    ]
})

coverage_stats
```

**Interpreting These Results:**

- **Metadata vs. operational (18 vs.
3)**: **Critical discrepancy**: 83% of advertised wells non-functional
- **Data availability (17%)**: **Network failure**: insufficient for regional analysis
- **Total measurements (173K+)**: **Good volume**, but concentrated in only 3 locations
- **Longest record (14.8 years)**: **Good** (just under the 15-year "excellent" threshold): enables trend and seasonality analysis
- **Measurement interval (hourly)**: **Excellent**: captures storm response and diurnal cycles
- **Continuous data (100%)**: **Excellent**: no significant gaps in operational wells
- **Spatial coverage (3 points)**: **Critical failure**: cannot map regional patterns

**Bottom Line**: We have **high-quality time series data from 3 locations** but **no regional spatial coverage**. This is like having detailed weather records from 3 thermometers in a large county: excellent temporal detail, but you can't map temperature patterns across the region.

---

## Part 7: Key Findings and Recommendations

::: {.callout-important icon=true}
## 🎯 Critical Findings

### 1. Limited Network Coverage with High Data Volume

**Reality**: 18 wells documented in database with measurement records

**Data Volume**:

- Total: 1.1+ million measurements across all wells
- Multiple long-term records (10-13+ years from several wells)
- High-frequency data from recent installations (hourly/sub-hourly)
- Geographic distribution across study area

**However**: While 18 wells have some measurement data, **only 3 wells have substantial operational records** suitable for robust trend analysis and seasonal decomposition.

**Capability**: Limited spatial coverage but excellent temporal depth from key wells

### 2. Excellent Data Quality from Operational Wells

**Achievement**: The 3 primary operational wells demonstrate:

- Automated dataloggers (hourly measurements)
- Continuous monitoring (minimal gaps)
- Long records enabling trend analysis (14.8 years for primary well)

**Value**: High temporal resolution from these wells enables:

- Storm response analysis
- Seasonal decomposition
- Long-term trend detection
- Climate-aquifer correlation studies

**Limitation**: Concentrated at only 3 locations, so regional gradients cannot be mapped

### 3. Data Distribution Patterns

**Observation**: Data volume varies dramatically across wells

**Reality**:

- Most measurements concentrated in 3 primary wells
- Other wells have sparse or short records
- Mix of temporal coverage but limited spatial coverage

**Constraint**: Different record lengths limit regional spatial analyses

### 4. Spatial Coverage Constraints

**Challenge**: Only 3 wells with substantial operational data

**Limitations**:

- Regional water table mapping **not feasible** with 3 points
- Spatial gradient analysis severely limited
- Cannot assess aquifer heterogeneity regionally
- Insufficient validation points for comprehensive HTEM calibration

**Critical Need**: Network expansion required for regional analysis
:::

---

## Comparison to HTEM Coverage

| Data Source | Coverage Type | Quality |
|-------------|---------------|---------|
| **HTEM** | 884 km² continuous | Excellent spatial |
| **Wells** | 3 operational monitoring points (18 in metadata) | Excellent temporal, poor spatial |

**Integration synergy**: HTEM provides comprehensive **spatial** coverage (single time snapshot). Wells provide excellent **temporal** dynamics (continuous monitoring over years).

**Fusion strategy**: Use HTEM to map aquifer structure everywhere, calibrate and validate against the available well time series (3 with substantial records), and work toward an integrated 4D understanding (space + time).

---

## Recommendations

### Immediate (0-3 months)

1. Verify operational status of all 18 wells with data providers
2. Ensure datalogger maintenance and backup procedures
3. Document measurement frequency and data quality for each well

### Short-term Actions

4. Assess spatial distribution gaps in coverage
5. Consider strategic placement of additional wells in undersampled areas
6. Implement real-time telemetry for drought monitoring

### Long-term Actions

7. Maintain and expand network to 20-25 wells for enhanced spatial resolution
8. Install nested well pairs (shallow + deep) to assess vertical gradients
9. Co-locate additional wells with stream gauges for integrated surface-groundwater analysis

---

## Dependencies & Outputs

- **Data source**: `aquifer_db` (config key) → `OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY` table
- **Loader**: `src.data_loaders.GroundwaterLoader`
- **Critical**: US timestamp format (`%m/%d/%Y`) must be used
- **Outputs**: Hydrographs, coverage heatmaps, quality statistics

To access well data:

```python
from src.data_loaders import GroundwaterLoader

# db_path: path to the database referenced by the `aquifer_db` config key
loader = GroundwaterLoader(db_path)

# Load well time series
data = loader.load_well_time_series(well_id=444863)
```

---

## Summary

Well network analysis reveals a **strong temporal but weak spatial data foundation**:

✅ **18 wells in metadata** - Network is documented, though most wells have sparse or short records

✅ **1.1M+ measurements** - Large data volume in the database

✅ **Clear seasonal patterns** - Spring highs, summer lows track Midwest recharge cycle

⚠️ **Only 3 operational wells** - Too few points for regional water table mapping

✅ **Long-term records** - Up to 14.8 years of continuous data from the primary well

✅ **High-frequency monitoring** - Automated dataloggers providing hourly measurements

**Key Insight**: Well data provides **ground truth** for HTEM interpretations at the monitored locations. The current network supports long-term trend detection and seasonal analysis at 3 points, but regional analysis and spatial gradient mapping require network expansion. Until then, calibration and validation of geophysical models against well data should be treated as point checks rather than regional validation.

---

## Related Chapters

- [Well Spatial Coverage](../part-2-spatial/well-spatial-coverage.qmd) - Mapping coverage gaps
- [Water Level Trends](../part-3-temporal/water-level-trends.qmd) - Long-term trend analysis
- [HTEM-Groundwater Fusion](../part-4-fusion/htem-groundwater-fusion.qmd) - Validating HTEM with wells
- [Well Placement Optimizer](../part-5-operations/well-placement-optimizer.qmd) - Optimal new well locations

## Reflection Questions

- Given the current network (3 wells with long records), which types of analyses are still robust, and which would you treat as exploratory or highly uncertain?
- If you could add only 3–5 new wells in the next phase, where would you place them geographically to reduce uncertainty the most, and why?
- How does the mismatch between metadata (18 wells) and actual data availability change the way you would design future monitoring or modeling studies in this region?
