10 Spatial Patterns Overview

Mapping Where Things Are

For Newcomers

You will get:

- Maps that show where the aquifer is thick, thin, vulnerable, or well monitored.
- Intuition for how distance and coverage affect what we can learn.
- Examples of how to spot gaps in monitoring networks.

Read these first if you are new:

- Part 1 overview and HTEM Survey + Subsurface 3D Model chapters.

You can skim spatial statistics details and focus on the story the maps tell about where the aquifer is strong or at risk.

🔗 Connection to Part 1

Part 1 showed WHAT data we have and its quality limitations (356 wells in metadata, but only 18 with measurements, only 3 with long records).

Part 2 answers WHERE: Where is the aquifer most productive? Where are the monitoring gaps? Where should we focus resources?

The spatial analyses in this part build directly on the data quality findings from Part 1. When you see coverage gaps in the maps, remember: they reflect the 17% operational well network documented in Data Foundations Overview.

10.1 What You Will Learn in This Chapter

By the end of this chapter, you will be able to:

Describe the eight core spatial analyses (materials, resistivity, well coverage, stream proximity, weather density, monitoring gaps, vulnerability, cross-sections) and what question each one answers.
Explain why spatial autocorrelation, anisotropy, and scale matter for interpreting aquifer maps and monitoring networks.
Understand how the spatial chapters in Part 2 fit together to tell a coherent “where” story for the aquifer system.
Identify which downstream chapters to consult for detailed methods or results on a particular spatial question (e.g., vulnerability vs coverage vs architecture).

11 Where is What?

Spatial analysis reveals the underground architecture of our aquifer system. While temporal patterns tell us when things happen, spatial patterns show us where they happen - and why location matters for water resource management.

11.1 Why Spatial Analysis Matters

💻 For Computer Scientists

Spatial data violates the i.i.d. assumption central to machine learning:

- Spatial autocorrelation: Nearby points are more similar than distant points
- Non-stationarity: Statistical properties change across space
- Edge effects: Boundaries create artifacts
- Scale dependence: Patterns emerge differently at local vs regional scales

Key Challenge: Standard (random) cross-validation gives over-optimistic scores, because test points sit next to correlated training points. We need spatial CV with geographic blocking.

Core Methods:

- Clustering: K-means, DBSCAN for zone identification
- Interpolation: Kriging for optimal spatial prediction
- Hot spot analysis: Getis-Ord Gi* for statistical significance
- Variograms: Quantify spatial correlation structure
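The geographic blocking idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the procedure used in later chapters: the function names, the synthetic coordinates, and the 10 km block size are assumptions chosen for the example. A block at least as wide as the spatial correlation range keeps held-out points away from correlated training points.

```python
import numpy as np

def block_ids(x, y, block_size):
    """Label each point with the ID of the square grid block that contains it."""
    col = np.floor((x - x.min()) / block_size).astype(int)
    row = np.floor((y - y.min()) / block_size).astype(int)
    return row * (col.max() + 1) + col

def spatial_block_splits(x, y, block_size):
    """Yield (train_idx, test_idx) pairs, holding out one occupied block at a time."""
    ids = block_ids(x, y, block_size)
    for block in np.unique(ids):
        test = np.flatnonzero(ids == block)
        train = np.flatnonzero(ids != block)
        yield train, test

# Hypothetical coordinates in metres (e.g. UTM), not real survey data
rng = np.random.default_rng(0)
x = rng.uniform(0, 20_000, 200)
y = rng.uniform(0, 20_000, 200)
splits = list(spatial_block_splits(x, y, block_size=10_000))
```

Each fold tests on one block and trains on all the others, so the score reflects prediction into unsampled territory rather than interpolation between immediate neighbours.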

🌍 For Hydrologists

Spatial patterns reveal geological controls on groundwater systems:

Aquifer heterogeneity: Sand channels vs clay-rich zones
Confining layers: Where do they protect the aquifer?
Recharge areas: Where does water enter the system?
Discharge zones: Where do streams interact with groundwater?
Vulnerability: Which areas need protection most?

Physical Processes:

- Paleochannels: Ancient river systems create high-permeability corridors
- Glacial till: Clay-rich deposits confine the aquifer
- Bedrock topography: Structural controls on flow patterns
- Stream networks: Surface-groundwater interaction zones

11.2 The Eight Spatial Analyses

11.2.1 Aquifer Material Map

Question: Where are the best aquifer materials?

Method: Map HTEM material types across Unit D (primary aquifer)

Key Finding: A large fraction of Unit D consists of high-quality sand and gravel, forming distinct NE–SW corridors (paleochannels; see the detailed statistics and maps in the Aquifer Material Map chapter).

Impact: Identifies drilling targets and validates historical well placement patterns

11.2.2 Resistivity Distribution

Question: How does aquifer quality vary spatially?

Method: Statistical analysis + mapping of resistivity patterns

Key Finding: Mean resistivity values consistent with fine-to-medium sand aquifer materials, with substantial spatial variation (see the Resistivity Distribution chapter for exact ranges and calibration details).

Impact: Calibrates geophysical signatures to hydraulic properties

11.2.3 Well Spatial Coverage

Question: Is our monitoring network adequate?

Method: Point pattern analysis, nearest neighbor statistics, geostatistics

Key Finding: Wells cluster (NN ratio < 1.0), leaving monitoring gaps in some areas

Impact: Guides expansion of monitoring network to under-sampled regions
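One common form of the nearest-neighbor (NN) ratio is the Clark-Evans statistic: the observed mean nearest-neighbor distance divided by the value expected for a random pattern of the same density. The sketch below assumes that definition and projected coordinates; the function name and the study-area argument are illustrative, and the Well Spatial Coverage chapter may compute the ratio differently (for example with an edge correction).

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_ratio(xy, area):
    """Clark-Evans ratio: observed mean NN distance / expected under randomness.

    xy   : (n, 2) array of projected coordinates (same length unit as area)
    area : study-area size in squared coordinate units
    """
    tree = cKDTree(xy)
    # k=2 because the closest hit for each point is the point itself (distance 0)
    dist, _ = tree.query(xy, k=2)
    observed = dist[:, 1].mean()
    expected = 0.5 / np.sqrt(len(xy) / area)
    return observed / expected
```

A ratio below 1 indicates clustering, a ratio near 1 a random pattern, and a ratio above 1 a regular (dispersed) layout.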

11.2.4 Stream Proximity

Question: Can we study stream-groundwater interaction?

Method: Spatial proximity analysis between wells and USGS stream gauges

Key Finding: Critical gap - wells with data are 3-25 km from streams (too far for interaction studies)

Impact: Explains why we can’t directly test the two-aquifer hypothesis; guides future monitoring investments
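A proximity analysis of this kind reduces to a nearest-neighbor query. The sketch below uses `scipy.spatial.cKDTree`, which this part lists among its tools; the coordinates are made up, and the 1 km cut-off is a placeholder for illustration only, not a threshold taken from the Stream Proximity chapter.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_to_nearest_gauge(wells_xy, gauges_xy):
    """Return (distance, gauge index) of the closest gauge for every well."""
    tree = cKDTree(gauges_xy)
    dist, idx = tree.query(wells_xy, k=1)
    return dist, idx

# Hypothetical UTM coordinates in metres, not real well or gauge locations
wells = np.array([[0.0, 0.0], [10_000.0, 5_000.0]])
gauges = np.array([[3_000.0, 4_000.0], [30_000.0, 5_000.0]])

dist_m, nearest = distance_to_nearest_gauge(wells, gauges)
too_far = dist_m > 1_000  # illustrative cut-off only
```

Distances are only meaningful in a projected coordinate system such as UTM; running the same query on raw latitude/longitude would mix degrees of different ground length.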

11.2.5 Weather Station Density

Question: Does weather station coverage support spatial precipitation analysis?

Method: Thiessen polygons, coverage analysis, representativeness assessment

Key Finding: Regional network of stations provides adequate coverage for county-scale analyses but is sparse for capturing small-scale convective storm variability (see Weather Station Density for quantitative coverage metrics).

Impact: Defines spatial scale for precipitation-recharge analysis
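Thiessen polygons assign every location to its nearest station, so each station's polygon area can be approximated by counting grid cells. The sketch below takes that raster shortcut rather than building exact polygons; the function name, bounding box, and grid resolution are assumptions for illustration.

```python
import numpy as np

def thiessen_area_fractions(stations_xy, xmin, xmax, ymin, ymax, n=200):
    """Approximate each station's share of the study area on an n x n grid."""
    gx, gy = np.meshgrid(np.linspace(xmin, xmax, n), np.linspace(ymin, ymax, n))
    cells = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every grid cell to every station
    d2 = ((cells[:, None, :] - stations_xy[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)  # index of the nearest station per cell
    counts = np.bincount(owner, minlength=len(stations_xy))
    return counts / counts.sum()
```

A station whose fraction is much larger than the others is representing a disproportionately large area, which is one simple way to flag sparse coverage.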

11.2.6 Monitoring Gap

Question: Where are the critical gaps across all 4 data sources?

Method: Multi-source spatial overlay to identify under-monitored zones

Key Finding: High-quality aquifer zones lack both groundwater AND weather monitoring

Impact: Prioritizes locations for targeted monitoring investments

11.2.7 Vulnerability Map

Question: Which areas are most vulnerable to contamination?

Method: DRASTIC index integrating depth, recharge, aquifer properties, vadose zone

Key Finding: 66% moderate vulnerability (confining layers provide protection), 28% high vulnerability in shallow zones

Impact: Guides land-use planning and wellhead protection zones
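DRASTIC is a weighted sum of factor ratings. The sketch below uses the standard seven-factor weights from the original DRASTIC method (Aller et al., 1987), with each factor rated from 1 to 10. The summary above names only four inputs, so the Vulnerability Map chapter may use a reduced or re-weighted variant; treat the weights and the function name here as assumptions.

```python
import numpy as np

# Standard DRASTIC weights; the chapter's own weighting may differ (assumption)
WEIGHTS = {
    "D": 5,  # Depth to water
    "R": 4,  # net Recharge
    "A": 3,  # Aquifer media
    "S": 2,  # Soil media
    "T": 1,  # Topography (slope)
    "I": 5,  # Impact of the vadose zone
    "C": 3,  # hydraulic Conductivity
}

def drastic_index(ratings):
    """Weighted sum of 1-10 ratings; values may be scalars or equally shaped grids."""
    return sum(WEIGHTS[k] * np.asarray(ratings[k], dtype=float) for k in WEIGHTS)
```

With these weights the index ranges from 23 (all ratings 1) to 230 (all ratings 10); higher values indicate greater vulnerability.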

11.2.8 Cross-Section Visualization

Question: What is the vertical aquifer architecture?

Method: N-S and E-W transects through 3D HTEM data (6 stratigraphic units)

Key Finding: Unit D sandwiched between clay-rich units above and bedrock below - confined aquifer confirmed

Impact: Explains temporal patterns (low seasonality, long memory, barometric response)

11.3 Integration: The Spatial Story

These eight analyses weave together to tell a coherent spatial story:

The Physical Structure (Ch 8: Cross-sections)

- 6 stratigraphic units stacked vertically
- Unit D (primary aquifer) confined by clay cap and bedrock

The Aquifer Quality (Ch 1-2: Materials & Resistivity)

- 42.7% high-quality sand in NE-SW paleochannels
- Heterogeneous distribution requires spatial mapping

The Monitoring Network (Ch 3-6: Wells, Streams, Weather, Gaps)

- Well network clusters, leaving gaps
- Stream-groundwater studies impossible with current network
- Weather stations adequate for regional but not local analysis
- Targeted investments needed in high-priority zones

The Vulnerability Assessment (Ch 7: DRASTIC)

- Integrates structure + materials + recharge
- Quantifies protection from confining layers
- Identifies 28% high-vulnerability zones for management

11.4 Key Spatial Concepts

Understanding Key Spatial Analysis Methods

Before diving into the spatial chapters, it’s helpful to understand the core statistical methods we use throughout Part 2. These methods were developed over the past century to solve specific problems in geography, geology, and environmental science.

11.4.1 Variograms (Matheron, 1960s)

What Is It?

A variogram quantifies how dissimilar two measurements are, on average, as a function of the distance between them. Developed by French mathematician Georges Matheron in the 1960s (building on work by South African mining engineer Danie Krige in 1951), variograms are the foundation of geostatistics.

Why Does It Matter?

Variograms answer three critical questions for aquifer management:

1. How far can we trust interpolation? (the range parameter)
2. How variable is the aquifer? (the sill parameter)
3. How good are our measurements? (the nugget parameter)

How Does It Work?

Calculate differences between all pairs of measurements
Group pairs by distance (lag)
Plot half the average squared difference (the semivariance) vs. distance
Fit a mathematical model (spherical, exponential, Gaussian)
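The first three steps can be sketched directly in NumPy. This is a minimal illustration using the classical estimator, where the semivariance is half the mean squared difference in each lag bin; the function name and binning scheme are assumptions, and a library such as skgstat would normally handle this together with model fitting.

```python
import numpy as np

def empirical_variogram(xy, z, lag_width, n_lags):
    """Return (mean lag distance, semivariance) for each non-empty lag bin."""
    # Step 1: differences between all unique pairs (i < j)
    i, j = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    sqdiff = (z[i] - z[j]) ** 2

    lags, gamma = [], []
    for k in range(n_lags):
        # Step 2: group pairs by distance
        in_bin = (dist >= k * lag_width) & (dist < (k + 1) * lag_width)
        if in_bin.any():
            lags.append(dist[in_bin].mean())
            # Step 3: semivariance = half the mean squared difference
            gamma.append(0.5 * sqdiff[in_bin].mean())
    return np.array(lags), np.array(gamma)
```

Plotting `gamma` against `lags` gives the experimental variogram to which a spherical, exponential, or Gaussian model is then fitted.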

What Will You See?

The variogram plot shows lag distance (x-axis, in km) versus semivariance (y-axis, a measure of dissimilarity). You’ll see:

Near origin: Low semivariance (nearby points are similar)
Rising curve: Semivariance increases with distance (correlation decreases)
Plateau (sill): Maximum variance reached (~0.245 for our aquifer)
Range: Distance where curve flattens (~8.5 km - beyond this, no spatial correlation)
Nugget: Y-intercept jump (~0.037 - represents measurement error + micro-scale variation)

Interpretation Guide:

Parameter	Typical Value	What It Means for Management
Range	8.5 km	Wells >8.5 km apart provide independent information
Sill	0.245	Total aquifer variability - larger = more heterogeneous
Nugget	0.037 (15% of sill)	Measurement error plus micro-scale variation - below 20% of the sill is excellent quality
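To see how the three parameters shape the curve, the sketch below evaluates a spherical model with the values from the table. The spherical form is an assumption made for illustration (this overview does not say which model was fitted), and the sill is treated as the total sill, nugget included.

```python
import numpy as np

def spherical(h, nugget, sill, range_km):
    """Spherical semivariogram; sill is the total sill (nugget included)."""
    h = np.asarray(h, dtype=float)
    partial_sill = sill - nugget
    rising = nugget + partial_sill * (1.5 * h / range_km - 0.5 * (h / range_km) ** 3)
    gamma = np.where(h < range_km, rising, sill)  # flat at the sill beyond the range
    return np.where(h == 0, 0.0, gamma)           # semivariance is 0 at zero lag

NUGGET, SILL, RANGE_KM = 0.037, 0.245, 8.5
nugget_ratio = NUGGET / SILL  # about 0.15, the 15% quoted in the table
```

Evaluating `spherical(8.5, NUGGET, SILL, RANGE_KM)` returns the sill, which is the sense in which wells farther apart than the range carry independent information.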

11.4.2 Kriging (Krige, 1951; Matheron, 1960s)

What Is It?

Kriging is a spatial interpolation method that uses the variogram to predict values at unmeasured locations while minimizing the prediction error variance; under the fitted variogram model it is the best linear unbiased predictor. It is named after Danie Krige, who pioneered these methods for estimating gold ore reserves in South African mines.

Why Does It Matter?

Unbiased predictions: Best linear unbiased predictor (BLUP)
Uncertainty quantification: Provides prediction variance at every location
Accounts for spatial correlation: Nearby wells get more weight, but redundant wells get less

How Does It Work?

Build variogram from measured well data
For each prediction location, calculate weights for nearby wells
Weights depend on: distance to target AND correlation between wells
Combine weighted values to get prediction + uncertainty
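These steps correspond to solving one small linear system per prediction location. The sketch below implements ordinary kriging in plain NumPy. It assumes a spherical variogram with the parameter values quoted in this chapter and coordinates in kilometres; the function names are illustrative, and a production workflow would more likely rely on a geostatistics library.

```python
import numpy as np

def spherical(h, nugget=0.037, sill=0.245, range_km=8.5):
    """Assumed spherical variogram (total sill); returns 0 at zero lag."""
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / range_km - 0.5 * (h / range_km) ** 3)
    return np.where(h == 0, 0.0, np.where(h < range_km, rising, sill))

def ordinary_kriging(xy, z, target, variogram=spherical):
    """Predict z at one target location; returns (prediction, kriging variance)."""
    n = len(z)
    d_pairs = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    d_target = np.linalg.norm(xy - target, axis=1)

    # Kriging system: well-to-well semivariances plus the unbiasedness constraint
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d_pairs)
    A[n, n] = 0.0
    b = np.append(variogram(d_target), 1.0)

    solution = np.linalg.solve(A, b)
    weights, lagrange = solution[:n], solution[n]
    prediction = weights @ z
    variance = weights @ b[:n] + lagrange
    return prediction, variance
```

The matrix `A` carries the correlation between wells (which down-weights redundant, clustered wells), while `b` carries the distance from each well to the target. The returned variance is the quantity mapped in the prediction variance map.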

What Will You See?

Kriging produces two complementary maps:

Interpolated surface map: Smooth continuous surface showing predicted values at every location. Colors typically range from blue (low values, e.g., shallow water table) to red (high values, e.g., deep water table).
Prediction variance map: Shows uncertainty in predictions. Dark areas = low uncertainty (near wells), bright areas = high uncertainty (far from wells). This map guides where to add new monitoring wells.

How to Interpret:

Kriging Variance	Meaning	Management Action
< 0.1	Low uncertainty (near wells)	Confident in predictions, no new wells needed
0.1 - 0.3	Moderate uncertainty	Adequate for regional planning
0.3 - 0.5	High uncertainty (data gaps)	Priority for new monitoring wells
> 0.5	Very high uncertainty	Avoid making management decisions without more data

When to Use:

Mapping water levels between wells
Estimating aquifer properties at unsampled locations
Network optimization (target high-uncertainty areas)

11.4.3 Getis-Ord Gi* Hot Spot Analysis (1992)

What Is It?

A statistical test that identifies spatial clusters that are significant (not random). Developed by Arthur Getis and J. Keith Ord in 1992 to detect local pockets of spatial association in geographic data.

Why Does It Matter?

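Visual inspection can be misleading - our brains see patterns even in random data. Gi* provides statistical rigor:

p < 0.05 = a cluster this strong would arise by chance less than 5% of the time (commonly reported as 95% confidence)
p < 0.01 = a cluster this strong would arise by chance less than 1% of the time (commonly reported as 99% confidence)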

How Does It Work?

For each location, define neighbors (typically 5 km radius)
Calculate weighted average of neighbors’ values
Compare to global average using z-score
High z-score = hot spot, Low z-score = cold spot
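The steps above can be written as a short NumPy function. This sketch uses binary distance-band weights, with each site counted as its own neighbor (the "star" in Gi*), and the standard z-score form of the statistic; the function name and the radius argument are illustrative assumptions.

```python
import numpy as np

def getis_ord_gi_star(xy, values, radius):
    """Gi* z-score for every location using a fixed distance band."""
    n = len(values)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = (d <= radius).astype(float)  # binary weights; each site includes itself

    mean = values.mean()
    s = np.sqrt((values ** 2).mean() - mean ** 2)  # global standard deviation
    w_sum = w.sum(axis=1)

    # Local weighted sum compared with what the global mean would predict
    numerator = w @ values - mean * w_sum
    denominator = s * np.sqrt((n * (w ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator
```

The denominator becomes zero if the radius is so large that every site is a neighbor of every other, so the distance band should stay well below the extent of the study area.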

What Will You See?

A hot spot map displays spatial clusters colored by statistical significance:

Red zones: Hot spots (clusters of high values) - e.g., deep water table, high resistivity
Blue zones: Cold spots (clusters of low values) - e.g., shallow water table, low resistivity
White/gray zones: Not statistically significant (random variation)
Color intensity: Darker = stronger statistical significance (higher z-score)

The map includes z-score values or p-values to show confidence level. Only colored areas with p < 0.05 represent statistically significant clusters (unlikely to arise by chance).

Interpretation:

Gi* z-score	Meaning	Management Action
> 2.58	Hot spot (99% confidence)	Priority for well development
1.96 to 2.58	Hot spot (95% confidence)	Good drilling target
-1.96 to 1.96	Not significant	Random variation
< -1.96	Cold spot (95% confidence)	Avoid for development

11.4.4 Moran’s I (Moran, 1950)

What Is It?

A global measure of spatial autocorrelation that tests whether the entire dataset shows clustering, randomness, or dispersion. Developed by statistician Patrick Moran in 1950 as the spatial equivalent of Pearson correlation.

Why Does It Matter?

Moran’s I tells us if spatial analysis is even justified:

Positive autocorrelation (I > 0): Spatial patterns exist → use spatial methods
No autocorrelation (I ≈ 0): Randomness → spatial analysis won’t help
Negative autocorrelation (I < 0): Checkerboard pattern (rare in nature)

How Does It Work?

Define neighbors: Create spatial weight matrix (e.g., wells within 5 km are neighbors)
Calculate deviations: For each location, find difference from global mean
Compute products: Multiply each location’s deviation by its neighbors’ deviations
Weighted average: Sum all products, weighted by neighbor relationships
Normalize: Scale so that values fall roughly in the range [-1, 1] for interpretation

Intuitive explanation: Imagine comparing each well to its neighbors. If high-value wells tend to be near other high-value wells (and low near low), that’s positive autocorrelation. If high wells are randomly scattered among low wells, that’s zero autocorrelation.
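The five steps collapse into a few lines of NumPy. The sketch below uses binary distance-band weights that exclude each site from its own neighborhood; the function name and the radius are assumptions, and a dedicated spatial statistics library would also supply a significance test.

```python
import numpy as np

def morans_i(xy, values, radius):
    """Global Moran's I with binary distance-band weights."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = ((d <= radius) & (d > 0)).astype(float)  # neighbours within radius, not self
    dev = values - values.mean()                 # deviations from the global mean
    n = len(values)
    # Weighted cross-products of deviations, normalised by total weight and variance
    return (n / w.sum()) * (dev @ w @ dev) / (dev @ dev)
```

A smooth gradient along a line of wells gives a clearly positive value, while strictly alternating high and low values give a value near -1, matching the checkerboard case described above.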

What Will You See?

The Moran scatter plot displays:

X-axis: Standardized values at each location (z-scores)
Y-axis: Average of neighbors’ standardized values (spatial lag)
Four quadrants:
- Top-right (High-High): High values surrounded by high values - clustering
- Bottom-left (Low-Low): Low values surrounded by low values - clustering
- Top-left (Low-High): Low values surrounded by high values - spatial outliers
- Bottom-right (High-Low): High values surrounded by low values - spatial outliers

The slope of the best-fit line through all points equals Moran’s I (when the spatial weights are row-standardized). A steep positive slope indicates strong clustering.

Interpretation Guide:

Moran’s I	Pattern	Implication
0.60 to 1.0	Strong clustering	Kriging highly effective
0.30 to 0.60	Moderate clustering	Spatial analysis useful
-0.30 to 0.30	Random	Use simpler methods
-1.0 to -0.30	Dispersion	Check data quality

Our aquifer: Moran’s I = 0.68 (strong positive autocorrelation) confirms spatial methods are appropriate.

11.4.5 Tobler’s First Law

“Everything is related to everything else, but near things are more related than distant things.”

Implication: We can interpolate between measurements, but uncertainty increases with distance.

Quantified: Spatial correlation range = 8.5 km (from variogram analysis in Ch 3)

11.4.6 Spatial Scale Hierarchy

Scale	Distance	Features	Data Sources
Local	10-100 m	Individual wells, HTEM grid cells	HTEM 3D grids
Intermediate	100-1000 m	Paleochannels, well clusters	HTEM 2D aggregates, well networks
Regional	1-10 km	Basin-scale formations, watersheds	All integrated

Analysis must match scale to question: Local vulnerability needs 100 m resolution; regional trends work at 1 km.

11.4.7 Anisotropy

Definition: Spatial correlation stronger in some directions than others

Evidence: NE-SW paleochannels create directional grain

Impact: Kriging interpolation should account for anisotropy (an elliptical search neighborhood aligned with the paleochannels, not a circular one)
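A common way to handle geometric anisotropy is to stretch distances measured across the direction of greatest continuity before evaluating the variogram. The sketch below shows that transformation. The 45-degree azimuth matches the NE-SW grain described above, while the anisotropy ratio of 2 used in the usage note is purely hypothetical.

```python
import numpy as np

def anisotropic_distance(dx, dy, azimuth_deg, ratio):
    """Effective lag distance under geometric anisotropy.

    dx, dy      : east and north components of the separation
    azimuth_deg : direction of the longest range, clockwise from north
    ratio       : range along the major axis / range along the minor axis (>= 1)
    """
    theta = np.radians(azimuth_deg)
    # Components of the separation along and across the major axis
    along = dx * np.sin(theta) + dy * np.cos(theta)
    across = dx * np.cos(theta) - dy * np.sin(theta)
    # Stretch the across-axis component so one isotropic variogram can be used
    return np.hypot(along, across * ratio)
```

With `azimuth_deg=45` and `ratio=2`, two wells 1 km apart along a paleochannel have an effective lag of 1 km, while two wells 1 km apart across it have an effective lag of 2 km and therefore receive less weight in interpolation.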

11.5 Interdisciplinary Bridges

11.5.1 Computer Science ↔︎ Hydrogeology

Clustering Algorithms (K-means, DBSCAN) automate what geologists call lithofacies classification

CS: “Cluster analysis identifies 4 groups in feature space”
Hydro: “We mapped 4 distinct hydrogeological zones”
Same result, different language

Hot Spot Analysis (Getis-Ord Gi*) provides statistical rigor for what geologists call anomalies

CS: “Statistically significant spatial clusters with p < 0.01”
Hydro: “High-permeability zones requiring management attention”
Statistics validates geological intuition

11.5.2 Spatial Autocorrelation

What CS sees: Moran’s I = 0.68 (positive autocorrelation)

What Hydro sees: Continuous aquifer with gradual lithologic transitions (not random patches)

Combined: Both agree - spatial interpolation is justified, but account for correlation in uncertainty estimates

11.6 Tools & Technologies

11.6.1 Visualization

Plotly: Interactive maps with zoom, pan, click for details
3D scatter plots: Subsurface visualization (X, Y, Z coordinates)
Contour plots: Smooth continuous surfaces from point data

11.6.2 Spatial Analysis

scipy.spatial.cKDTree: Fast nearest-neighbor search for well-station matching
skgstat: Variogram computation and model fitting
sklearn.cluster: K-means, DBSCAN for zone identification

11.6.3 Coordinate Systems

WGS84 (EPSG:4326): Lat/lon for GPS coordinates (wells, stations)
UTM Zone 16N (EPSG:32616): Meters for HTEM grids and distance calculations
pyproj: Accurate coordinate transformation between systems
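A minimal sketch of the WGS84 to UTM Zone 16N conversion with pyproj follows. The helper name and the sample point are illustrative; the point is not a real well location and was chosen on the zone's central meridian (87°W), where the easting is 500 000 m by definition.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order: (longitude, latitude) in, (easting, northing) out
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32616", always_xy=True)

def lonlat_to_utm16n(lon, lat):
    """Convert WGS84 longitude/latitude in degrees to UTM Zone 16N metres."""
    return to_utm.transform(lon, lat)

easting, northing = lonlat_to_utm16n(-87.0, 40.0)
```

Passing coordinates in (latitude, longitude) order is the most common mistake with this conversion, which is why the axis order is pinned explicitly here.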

11.7 Navigation

The chapters are ordered from data characterization → network assessment → integrated risk mapping:

Characterization (Ch 1-2): What do we have? - Aquifer materials and resistivity patterns

Network Assessment (Ch 3-6): Can we measure what we need? - Well coverage, stream proximity, weather density, monitoring gaps

Risk & Architecture (Ch 7-8): Where should we focus management? - Vulnerability mapping, vertical structure

Read sequentially for the full spatial story, or jump to individual chapters for specific questions.

11.8 Reflection Questions

After reading this overview, which of the eight spatial analyses do you think is most critical for your own interests (materials, resistivity, well coverage, stream proximity, weather density, monitoring gaps, vulnerability, or cross-sections), and why?
Looking at the roles of HTEM, wells, weather stations, and stream gauges, where do you see the biggest spatial blind spots, and how might those influence conclusions about the aquifer?
When you interpret a spatial map in later chapters, how will you check that the question, data resolution, and analysis scale are appropriately matched?

11.9 Expected Outcomes

By the end of Part 2, you will understand:

✅ Where high-quality aquifer zones are located (paleochannel corridors)
✅ Why monitoring networks have gaps (independent design for different purposes)
✅ Which areas need protection most (shallow zones with thin confining layers)
✅ How vertical confinement works (clay cap creates sealed system)
✅ What spatial scale is appropriate for different analyses (local vs regional)

Most importantly: You’ll see how spatial patterns explain temporal behavior - the confined structure (Ch 8) determines the system response (Part 3: Temporal Patterns).

Let’s map the aquifer. 🗺️

--- title: "Spatial Patterns Overview" subtitle: "Mapping Where Things Are" --- ::: {.callout-tip icon=false} ## For Newcomers **You will get:** - Maps that show **where** the aquifer is thick, thin, vulnerable, or well monitored. - Intuition for how distance and coverage affect what we can learn. - Examples of how to spot **gaps** in monitoring networks. **Read these first if you are new:** - Part 1 overview and **HTEM Survey** + **Subsurface 3D Model** chapters. You can skim spatial statistics details and focus on the **story the maps tell** about where the aquifer is strong or at risk. ::: ::: {.callout-note icon=false} ## 🔗 Connection to Part 1 **Part 1 showed WHAT data we have** and its quality limitations (356 wells in metadata, but only 18 with measurements, only 3 with long records). **Part 2 answers WHERE**: Where is the aquifer most productive? Where are the monitoring gaps? Where should we focus resources? The spatial analyses in this part build directly on the data quality findings from Part 1. When you see coverage gaps in the maps, remember: they reflect the 17% operational well network documented in [Data Foundations Overview](../part-1-foundations/overview.qmd). ::: ## What You Will Learn in This Chapter By the end of this chapter, you will be able to: - Describe the eight core spatial analyses (materials, resistivity, well coverage, stream proximity, weather density, monitoring gaps, vulnerability, cross-sections) and what question each one answers. - Explain why spatial autocorrelation, anisotropy, and scale matter for interpreting aquifer maps and monitoring networks. - Understand how the spatial chapters in Part 2 fit together to tell a coherent “where” story for the aquifer system. - Identify which downstream chapters to consult for detailed methods or results on a particular spatial question (e.g., vulnerability vs coverage vs architecture). # Where is What? Spatial analysis reveals the underground architecture of our aquifer system. 
While temporal patterns tell us **when** things happen, spatial patterns show us **where** they happen - and why location matters for water resource management. ## Why Spatial Analysis Matters ::: {.callout-note icon=false} ## 💻 For Computer Scientists Spatial data violates the **i.i.d. assumption** central to machine learning: - **Spatial autocorrelation**: Nearby points are more similar than distant points - **Non-stationarity**: Statistical properties change across space - **Edge effects**: Boundaries create artifacts - **Scale dependence**: Patterns emerge differently at local vs regional scales **Key Challenge**: Standard cross-validation fails. We need spatial CV with geographic blocking. **Core Methods**: - **Clustering**: K-means, DBSCAN for zone identification - **Interpolation**: Kriging for optimal spatial prediction - **Hot spot analysis**: Getis-Ord Gi* for statistical significance - **Variograms**: Quantify spatial correlation structure ::: ::: {.callout-tip icon=false} ## 🌍 For Hydrologists Spatial patterns reveal **geological controls** on groundwater systems: - **Aquifer heterogeneity**: Sand channels vs clay-rich zones - **Confining layers**: Where do they protect the aquifer? - **Recharge areas**: Where does water enter the system? - **Discharge zones**: Where do streams interact with groundwater? - **Vulnerability**: Which areas need protection most? **Physical Processes**: - **Paleochannels**: Ancient river systems create high-permeability corridors - **Glacial till**: Clay-rich deposits confine the aquifer - **Bedrock topography**: Structural controls on flow patterns - **Stream networks**: Surface-groundwater interaction zones ::: ## The Eight Spatial Analyses ### Aquifer Material Map {#sec-spatial-overview-material} **Question**: Where are the best aquifer materials? 
**Method**: Map HTEM material types across Unit D (primary aquifer) **Key Finding**: A large fraction of Unit D consists of high-quality sand and gravel, forming distinct NE–SW corridors (paleochannels; see the detailed statistics and maps in the Aquifer Material Map chapter). **Impact**: Identifies drilling targets and validates historical well placement patterns --- ### Resistivity Distribution {#sec-spatial-overview-resistivity} **Question**: How does aquifer quality vary spatially? **Method**: Statistical analysis + mapping of resistivity patterns **Key Finding**: Mean resistivity values consistent with fine-to-medium sand aquifer materials, with substantial spatial variation (see the Resistivity Distribution chapter for exact ranges and calibration details). **Impact**: Calibrates geophysical signatures to hydraulic properties --- ### Well Spatial Coverage **Question**: Is our monitoring network adequate? **Method**: Point pattern analysis, nearest neighbor statistics, geostatistics **Key Finding**: Wells cluster (NN ratio < 1.0), leaving monitoring gaps in some areas **Impact**: Guides expansion of monitoring network to under-sampled regions --- ### Stream Proximity {#sec-spatial-overview-streams} **Question**: Can we study stream-groundwater interaction? **Method**: Spatial proximity analysis between wells and USGS stream gauges **Key Finding**: **Critical gap** - wells with data are 3-25 km from streams (too far for interaction studies) **Impact**: Explains why we can't directly test two-aquifer hypothesis; guides future monitoring investments --- ### Weather Station Density **Question**: Does weather station coverage support spatial precipitation analysis? 
**Method**: Thiessen polygons, coverage analysis, representativeness assessment **Key Finding**: Regional network of stations provides adequate coverage for county-scale analyses but is sparse for capturing small-scale convective storm variability (see Weather Station Density for quantitative coverage metrics). **Impact**: Defines spatial scale for precipitation-recharge analysis --- ### Monitoring Gap {#sec-spatial-overview-gaps} **Question**: Where are the critical gaps across all 4 data sources? **Method**: Multi-source spatial overlay to identify under-monitored zones **Key Finding**: High-quality aquifer zones lack both groundwater AND weather monitoring **Impact**: Prioritizes locations for targeted monitoring investments --- ### Vulnerability Map {#sec-spatial-overview-vulnerability} **Question**: Which areas are most vulnerable to contamination? **Method**: DRASTIC index integrating depth, recharge, aquifer properties, vadose zone **Key Finding**: 66% moderate vulnerability (confining layers provide protection), 28% high vulnerability in shallow zones **Impact**: Guides land-use planning and wellhead protection zones --- ### Cross-Section Visualization {#sec-spatial-overview-cross-section} **Question**: What is the vertical aquifer architecture? 
**Method**: N-S and E-W transects through 3D HTEM data (6 stratigraphic units) **Key Finding**: Unit D sandwiched between clay-rich units above and bedrock below - **confined aquifer confirmed** **Impact**: Explains temporal patterns (low seasonality, long memory, barometric response) --- ## Integration: The Spatial Story These 8 analyses weave together to tell a coherent spatial story: **The Physical Structure** (Ch 8: Cross-sections) - 6 stratigraphic units stacked vertically - Unit D (primary aquifer) confined by clay cap and bedrock **The Aquifer Quality** (Ch 2-3: Materials & Resistivity) - 42.7% high-quality sand in NE-SW paleochannels - Heterogeneous distribution requires spatial mapping **The Monitoring Network** (Ch 3-6: Wells, Streams, Weather, Gaps) - Well network clusters, leaving gaps - Stream-groundwater studies impossible with current network - Weather stations adequate for regional but not local analysis - **Targeted investments needed in high-priority zones** **The Vulnerability Assessment** (Ch 7: DRASTIC) - Integrates structure + materials + recharge - Quantifies protection from confining layers - Identifies 28% high-vulnerability zones for management ## Key Spatial Concepts ::: {.callout-note icon=false} ## Understanding Key Spatial Analysis Methods Before diving into the spatial chapters, it's helpful to understand the core statistical methods we use throughout Part 2. These methods were developed over the past century to solve specific problems in geography, geology, and environmental science. ### Variograms (Matheron, 1960s) **What Is It?** A variogram quantifies how similar two measurements are as a function of the distance between them. Developed by French mathematician Georges Matheron in the 1960s (building on work by South African mining engineer Danie Krige in 1951), variograms are the foundation of geostatistics. **Why Does It Matter?** Variograms answer three critical questions for aquifer management: 1. 
**How far can we trust interpolation?** (the range parameter) 2. **How variable is the aquifer?** (the sill parameter) 3. **How good are our measurements?** (the nugget parameter) **How Does It Work?** 1. Calculate differences between all pairs of measurements 2. Group pairs by distance (lag) 3. Plot average squared difference vs. distance 4. Fit a mathematical model (spherical, exponential, Gaussian) **What Will You See?** The variogram plot shows lag distance (x-axis, in km) versus semivariance (y-axis, a measure of dissimilarity). You'll see: - **Near origin**: Low semivariance (nearby points are similar) - **Rising curve**: Semivariance increases with distance (correlation decreases) - **Plateau (sill)**: Maximum variance reached (~0.245 for our aquifer) - **Range**: Distance where curve flattens (~8.5 km - beyond this, no spatial correlation) - **Nugget**: Y-intercept jump (~0.037 - represents measurement error + micro-scale variation) **Interpretation Guide:** | Parameter | Typical Value | What It Means for Management | |-----------|---------------|------------------------------| | **Range** | 8.5 km | Wells >8.5 km apart provide independent information | | **Sill** | 0.245 | Total aquifer variability - larger = more heterogeneous | | **Nugget** | 0.037 (15%) | Measurement error - <20% is excellent quality | ### Kriging (Krige, 1951; Matheron, 1960s) **What Is It?** Kriging is the optimal spatial interpolation method that uses the variogram to predict values at unmeasured locations while minimizing prediction error. Named after Danie Krige who pioneered these methods for estimating gold ore reserves in South African mines. **Why Does It Matter?** - **Unbiased predictions**: Best linear unbiased predictor (BLUP) - **Uncertainty quantification**: Provides prediction variance at every location - **Accounts for spatial correlation**: Nearby wells get more weight, but redundant wells get less **How Does It Work?** 1. Build variogram from measured well data 2. 
For each prediction location, calculate weights for nearby wells
3. Weights depend on: distance to target AND correlation between wells
4. Combine weighted values to get prediction + uncertainty

**What Will You See?**

Kriging produces two complementary maps:

1. **Interpolated surface map**: Smooth continuous surface showing predicted values at every location. Colors typically range from blue (low values, e.g., shallow depth to water) to red (high values, e.g., deep water table).
2. **Prediction variance map**: Shows uncertainty in predictions. Dark areas = low uncertainty (near wells), bright areas = high uncertainty (far from wells). This map guides where to add new monitoring wells.

**How to Interpret:**

| Kriging Variance | Meaning | Management Action |
|-----------------|---------|-------------------|
| **< 0.1** | Low uncertainty (near wells) | Confident in predictions, no new wells needed |
| **0.1 - 0.3** | Moderate uncertainty | Adequate for regional planning |
| **0.3 - 0.5** | High uncertainty (data gaps) | Priority for new monitoring wells |
| **> 0.5** | Very high uncertainty | Avoid making management decisions without more data |

**When to Use:**

- Mapping water levels between wells
- Estimating aquifer properties at unsampled locations
- Network optimization (target high-uncertainty areas)

### Getis-Ord Gi* Hot Spot Analysis (1992)

**What Is It?**

A statistical test that identifies statistically significant spatial clusters (clusters unlikely to arise by chance). Developed by Arthur Getis and J. Keith Ord in 1992 to detect local pockets of spatial association in geographic data.

**Why Does It Matter?**

Visual inspection can be misleading - our brains see patterns even in random data. Gi* provides statistical rigor:

- p < 0.05 = cluster is significant at the 95% confidence level
- p < 0.01 = cluster is significant at the 99% confidence level

**How Does It Work?**

1. For each location, define neighbors (typically 5 km radius)
2. Calculate weighted average of neighbors' values
3. Compare to global average using z-score
4. High z-score = hot spot, Low z-score = cold spot

**What Will You See?**

A hot spot map displays spatial clusters colored by statistical significance:

- **Red zones**: Hot spots (clusters of high values) - e.g., deep water table, high resistivity
- **Blue zones**: Cold spots (clusters of low values) - e.g., shallow water table, low resistivity
- **White/gray zones**: Not statistically significant (random variation)
- **Color intensity**: Darker = stronger statistical significance (larger absolute z-score)

The map includes z-score values or p-values to show confidence level. Only colored areas (p < 0.05) should be read as real clusters; everything else is consistent with random variation.

**Interpretation:**

| Gi* z-score | Meaning | Management Action |
|-------------|---------|------------------|
| **> 2.58** | Hot spot (99% confidence) | Priority for well development |
| **1.96 to 2.58** | Hot spot (95% confidence) | Good drilling target |
| **-1.96 to 1.96** | Not significant | Random variation |
| **< -1.96** | Cold spot (95% confidence) | Avoid for development |

### Moran's I (Moran, 1950)

**What Is It?**

A global measure of spatial autocorrelation that tests whether the entire dataset shows clustering, randomness, or dispersion. Developed by statistician Patrick Moran in 1950 as the spatial equivalent of Pearson correlation.

**Why Does It Matter?**

Moran's I tells us if spatial analysis is even justified:

- Positive autocorrelation (I > 0): Spatial patterns exist → use spatial methods
- No autocorrelation (I ≈ 0): Randomness → spatial analysis won't help
- Negative autocorrelation (I < 0): Checkerboard pattern (rare in nature)

**How Does It Work?**

1. **Define neighbors**: Create spatial weight matrix (e.g., wells within 5 km are neighbors)
2. **Calculate deviations**: For each location, find difference from global mean
3. **Compute products**: Multiply each location's deviation by its neighbors' deviations
4. **Weighted average**: Sum all products, weighted by neighbor relationships
5. **Normalize**: Divide by the total weight and the variance so values typically fall in the range [-1, 1] for interpretation

**Intuitive explanation**: Imagine comparing each well to its neighbors. If high-value wells tend to be near other high-value wells (and low near low), that's positive autocorrelation. If high wells are randomly scattered among low wells, that's zero autocorrelation.

**What Will You See?**

The Moran scatter plot displays:

- **X-axis**: Standardized values at each location (z-scores)
- **Y-axis**: Average of neighbors' standardized values (spatial lag)
- **Four quadrants**:
  - Top-right (High-High): High values surrounded by high values - **clustering**
  - Bottom-left (Low-Low): Low values surrounded by low values - **clustering**
  - Top-left (Low-High): Low values surrounded by high values - spatial outliers
  - Bottom-right (High-Low): High values surrounded by low values - spatial outliers

When the weights are row-standardized, the slope of the best-fit line through all points equals Moran's I. A steep positive slope indicates strong clustering.

**Interpretation Guide:**

| Moran's I | Pattern | Implication |
|-----------|---------|-------------|
| **0.60 to 1.0** | Strong clustering | Kriging highly effective |
| **0.30 to 0.60** | Moderate clustering | Spatial analysis useful |
| **-0.30 to 0.30** | Random | Use simpler methods |
| **-1.0 to -0.30** | Dispersion | Check data quality |

**Our aquifer**: Moran's I = 0.68 (strong positive autocorrelation) confirms spatial methods are appropriate.

:::

### Tobler's First Law

> "Everything is related to everything else, but near things are more related than distant things."

**Implication**: We can interpolate between measurements, but uncertainty increases with distance.
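Tobler's law is the property that Moran's I measures. As a rough illustration, the five Moran's I steps described earlier in this section can be sketched with NumPy. This is a minimal sketch, not the analysis code behind the reported value: the function name `morans_i`, the binary distance-band weights, and the toy coordinates and water levels are all assumptions made for the example, and coordinates are taken to be in meters (for example UTM).

```python
import numpy as np


def morans_i(coords_m, values, radius_m=5_000.0):
    """Global Moran's I with binary distance-band weights (illustrative only)."""
    coords = np.asarray(coords_m, dtype=float)
    z = np.asarray(values, dtype=float)
    n = z.size

    # Step 1: define neighbors - locations within radius_m of each other
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= radius_m).astype(float)
    np.fill_diagonal(w, 0.0)  # a location is not its own neighbor
    if w.sum() == 0:
        raise ValueError("no neighbor pairs within radius_m")

    # Step 2: deviations from the global mean
    dev = z - z.mean()

    # Steps 3-4: cross-products of deviations, summed over neighbor pairs
    cross = (w * np.outer(dev, dev)).sum()

    # Step 5: normalize by total weight and total squared deviation
    return (n / w.sum()) * cross / (dev ** 2).sum()


# Hypothetical wells spaced 1 km apart along a line (x, y in meters)
coords = [(0.0, 0.0), (1_000.0, 0.0), (2_000.0, 0.0), (3_000.0, 0.0)]

# Smooth gradient: neighbors resemble each other, so I is positive
smooth = morans_i(coords, [1.0, 2.0, 3.0, 4.0], radius_m=1_500.0)

# Alternating values: neighbors differ, so I is negative
checker = morans_i(coords, [1.0, -1.0, 1.0, -1.0], radius_m=1_500.0)

print(f"smooth gradient: {smooth:.3f}, alternating: {checker:.3f}")
```

A real analysis would normally add a permutation or analytical significance test and may use row-standardized or distance-decay weights instead of the binary band shown here; dedicated spatial statistics libraries provide both.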
**Quantified**: Spatial correlation range = 8.5 km (from variogram analysis in Ch 3)

### Spatial Scale Hierarchy

| Scale | Distance | Features | Data Sources |
|-------|----------|----------|--------------|
| **Local** | 10-100 m | Individual wells, HTEM grid cells | HTEM 3D grids |
| **Intermediate** | 100-1000 m | Paleochannels, well clusters | HTEM 2D aggregates, well networks |
| **Regional** | 1-10 km | Basin-scale formations, watersheds | All integrated |

**Analysis must match scale to question**: Local vulnerability needs 100 m resolution; regional trends work at 1 km.

### Anisotropy

**Definition**: Spatial correlation is stronger in some directions than in others

**Evidence**: NE-SW paleochannels create directional grain

**Impact**: Kriging interpolation should account for anisotropy (elliptical search neighborhood, not circular)

## Interdisciplinary Bridges

### Computer Science ↔ Hydrogeology

**Clustering Algorithms** (K-means, DBSCAN) automate what geologists call **lithofacies classification**

- CS: "Cluster analysis identifies 4 groups in feature space"
- Hydro: "We mapped 4 distinct hydrogeological zones"
- **Same result, different language**

**Hot Spot Analysis** (Getis-Ord Gi*) provides statistical rigor for what geologists call **anomalies**

- CS: "Statistically significant spatial clusters with p < 0.01"
- Hydro: "High-permeability zones requiring management attention"
- **Statistics validates geological intuition**

### Spatial Autocorrelation

**What CS sees**: Moran's I = 0.68 (positive autocorrelation)

**What Hydro sees**: Continuous aquifer with gradual lithologic transitions (not random patches)

**Combined**: Both agree - spatial interpolation is justified, but uncertainty estimates must account for the correlation between nearby measurements

## Tools & Technologies

### Visualization

- **Plotly**: Interactive maps with zoom, pan, click for details
- **3D scatter plots**: Subsurface visualization (X, Y, Z coordinates)
- **Contour plots**: Smooth continuous surfaces from point data

### Spatial Analysis

- **scipy.spatial.cKDTree**: Fast nearest-neighbor search for well-station matching
- **skgstat**: Variogram computation and model fitting
- **sklearn.cluster**: K-means, DBSCAN for zone identification

### Coordinate Systems

- **WGS84 (EPSG:4326)**: Lat/lon for GPS coordinates (wells, stations)
- **UTM Zone 16N (EPSG:32616)**: Meters for HTEM grids and distance calculations
- **pyproj**: Accurate coordinate transformation between systems

## Navigation

The chapters are ordered from **data characterization** → **network assessment** → **integrated risk mapping**:

**Characterization** (Ch 1-2): What do we have?

- Aquifer materials and resistivity patterns

**Network Assessment** (Ch 3-6): Can we measure what we need?

- Well coverage, stream proximity, weather density, monitoring gaps

**Risk & Architecture** (Ch 7-8): Where should we focus management?

- Vulnerability mapping, vertical structure

**Read sequentially** for the full spatial story, or jump to individual chapters for specific questions.

---

## Reflection Questions

- After reading this overview, which of the eight spatial analyses do you think is most critical for your own interests (materials, resistivity, well coverage, stream proximity, weather density, monitoring gaps, vulnerability, or cross-sections), and why?
- Looking at the roles of HTEM, wells, weather stations, and stream gauges, where do you see the biggest spatial blind spots, and how might those influence conclusions about the aquifer?
- When you interpret a spatial map in later chapters, how will you check that the question, data resolution, and analysis scale are appropriately matched?
## Expected Outcomes

By the end of Part 2, you will understand:

✅ **Where high-quality aquifer zones are located** (paleochannel corridors)

✅ **Why monitoring networks have gaps** (independent design for different purposes)

✅ **Which areas need protection most** (shallow zones with thin confining layers)

✅ **How vertical confinement works** (clay cap creates sealed system)

✅ **What spatial scale is appropriate** for different analyses (local vs regional)

**Most importantly**: You'll see how **spatial patterns explain temporal behavior** - the confined structure (Ch 8) determines the system response (Part 3: Temporal Patterns).

Let's map the aquifer. 🗺️