54 Terminology Translation Guide

The Rosetta Stone for interdisciplinary aquifer data science

55 Why Translation Matters

Environmental data science requires collaboration across disciplines that often use different words for the same concepts, or worse, the same words for different concepts. This chapter serves as a living translation guide to bridge these gaps.

Who this helps:

Computer scientists learning hydrogeology terminology
Hydrogeologists learning data science methods
Statisticians understanding domain context
Geophysicists connecting EM theory to analysis
Students navigating multiple disciplines

How to use this guide:

Search: Use Ctrl+F / Cmd+F to find any term quickly.
Read across: Check equivalent concepts in other disciplines.
Check confusions: See “Common Confusion” sections for pitfalls.
See examples: “In This Project” shows concrete applications.

If you are completely new to groundwater, start with the Plain-Language Basics below. You do not need to memorize definitions; come back here whenever a chapter uses a term you do not recognize.

56 Plain-Language Basics

These are the core ideas that appear throughout the playbook, written for readers with no water background.

Term	Plain Description	Why It Matters in This Playbook	Real Example
Groundwater	Water stored in pores and cracks of rocks and sediments underground.	It is the main water source we are trying to understand and manage.	When you drill a well 50 meters deep and water fills it to 10 meters below surface, that’s groundwater from the aquifer.
Aquifer	A body of rock or sediment that can store and transmit usable amounts of groundwater—like a buried sponge.	Most of the analyses ask: Where is the aquifer? How full is it? How does it respond to weather and pumping?	Unit D (Mahomet Aquifer) is a buried sand/gravel valley 12-96m deep that stores billions of gallons of water.
Confining layer	A layer of clay or rock that does not let much water pass through.	Protects deeper aquifers from quick changes at the land surface, making them respond more slowly.	Unit E (clay layer above Unit D) prevents surface spills from quickly reaching the drinking water aquifer below.
Recharge	Water that soaks down from the surface (rain, snowmelt, irrigation) to refill the aquifer.	Links weather and land-surface processes to long-term groundwater levels.	Spring rains in Illinois soak through soil → percolate down through sand → raise water levels in Unit D over weeks to months.
Well	A hole drilled into the ground to reach groundwater, often with sensors that measure water level.	Provides direct observations of how the aquifer is behaving at specific locations.	Our 356 observation wells measure water levels every 15 minutes, creating a 1-million-record time series.
Water level	The height of groundwater in a well, usually measured relative to a reference point.	Rising or falling water levels tell us if the aquifer is gaining or losing storage.	If water level rises 2 meters in spring, the aquifer gained storage (recharged). If it drops 1 meter in summer, it lost storage.
HTEM	Helicopter-borne geophysical survey that measures how the ground resists electrical currents.	Gives us a 3D picture of underground materials without drilling, which we link to aquifer properties.	2008 helicopter survey mapped 2,361 km² in weeks—would take decades and millions of dollars to drill that many wells.
Resistivity	A measure of how strongly a material resists electric current; clays are low, sands and gravels are high.	Used as a proxy for material type and aquifer quality in HTEM maps.	Clay: 5-30 Ω·m (low), Sand: 100-200 Ω·m (high). High resistivity = good aquifer material.
Confined aquifer	An aquifer trapped between confining layers, reacting mainly to pressure changes, not directly to surface water table.	Explains why some wells show tiny seasonal swings but long-term memory of past conditions.	When Unit D is sealed by clay above/bedrock below, water levels change slowly (±0.5m) but track multi-year climate patterns.
Unconfined aquifer	An aquifer whose top surface is the water table, directly connected to the surface.	Responds quickly to rain and drought with larger seasonal swings.	Shallow aquifers near streams can swing 3-5 meters seasonally, rising quickly after rain, dropping in summer.
Hydraulic head	The potential energy of water at a point (combination of elevation and pressure).	Water flows from high head to low head—this determines groundwater flow direction.	If Well A has head of 200m and Well B has 195m (5km away), water flows from A→B at ~1 meter drop per kilometer.
Transmissivity	How easily water flows horizontally through the full thickness of an aquifer.	High transmissivity = wells produce more water, aquifer recovers faster from pumping.	Good sand aquifer: T = 1000 m²/day (productive). Clay layer: T = 1 m²/day (poor, can’t supply wells).
Storativity	The volume of water an aquifer releases (or stores) per unit area per unit head change.	Determines how much water level drops when you pump, or rises when it rains.	Unconfined: S = 0.15 (15% of aquifer volume drainable). Confined: S = 0.0001 (only 0.01% released by pressure).
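
To make the storativity row concrete, the volume released from storage can be approximated as S × area × head change. The sketch below reuses the two S values from the table; the 1 km² area and the 1 m head drop are assumptions chosen only for illustration.

```python
def released_volume_m3(storativity, area_m2, head_drop_m):
    """Volume of water released from storage: S * A * delta_h."""
    return storativity * area_m2 * head_drop_m

AREA_M2 = 1_000_000   # assumed: 1 km^2 of aquifer
HEAD_DROP_M = 1.0     # assumed: 1 m decline in head

unconfined = released_volume_m3(0.15, AREA_M2, HEAD_DROP_M)
confined = released_volume_m3(0.0001, AREA_M2, HEAD_DROP_M)

print(f"Unconfined (S=0.15):   {unconfined:,.0f} m^3")
print(f"Confined   (S=0.0001): {confined:,.0f} m^3")
```

For the same head drop, the unconfined case releases roughly three orders of magnitude more water, which is why confined wells can show large head changes for small withdrawals.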

57 Core Concept Translations

57.1 Master Translation Table

Computer Science	Hydrogeology	Statistics	Geophysics	Unified Meaning
Outlier detection	Anomalous water levels	Statistical anomaly	Measurement error	Identifying observations that deviate from expected patterns - requires domain context to interpret
Feature engineering	Aquifer properties	Predictor variables	Material parameters	Transforming raw observations into model inputs that capture relevant physics
Clustering	Aquifer compartments	Spatial grouping	Material zones	Identifying natural groupings where similar properties occur together
Classification	Lithology mapping	Categorical prediction	Material identification	Assigning observations to discrete categories (e.g., sand vs clay)
Regression	Empirical relationships	Continuous prediction	Forward modeling	Predicting continuous values (e.g., water level, resistivity)
Time series forecasting	Water level prediction	ARIMA/Prophet	-	Extrapolating temporal patterns into the future
Dimensionality reduction	Stratigraphic simplification	PCA/Factor analysis	Layer averaging	Reducing complexity while preserving essential information
Interpolation	Spatial estimation	Kriging/IDW	Grid generation	Estimating values at unobserved locations from nearby measurements
Cross-validation	Independent validation	Model assessment	Test-train split	Evaluating model performance on data not used for training
Hyperparameter tuning	Model calibration	Parameter optimization	Inversion tuning	Finding optimal configuration for model performance
Supervised learning	Training on known lithology	Labeled data modeling	Constrained inversion	Learning from observations with known outcomes
Unsupervised learning	Exploratory analysis	Pattern discovery	Data-driven zonation	Finding structure in data without predefined labels
Ensemble methods	Multi-model prediction	Bagging/Boosting	Combined inversions	Combining multiple models to improve predictions
Neural networks	Non-linear modeling	Deep learning	Complex mapping	Flexible models that learn hierarchical patterns
Gradient descent	Optimization	Iterative minimization	Inversion algorithm	Iteratively improving model by following error gradient
Loss function	Misfit function	Error metric	Data residual	Quantifies difference between model predictions and observations
Overfitting	Over-parameterization	Poor generalization	Non-unique solution	Model fits training data perfectly but fails on new data
Regularization	Parsimony constraint	Penalized regression	Damping/Smoothing	Constraining model complexity to prevent overfitting
Batch processing	Bulk analysis	-	Survey-wide processing	Processing multiple records simultaneously for efficiency
Pipeline	Workflow	Processing chain	Analysis sequence	Series of automated steps from raw data to results

58 Spatial Analysis Translations

Computer Science	Hydrogeology	Statistics	Unified Meaning
Spatial autocorrelation	Aquifer continuity	Tobler’s First Law	Nearby locations are more similar than distant ones
Variogram	Spatial structure	Covariance function	How similarity decreases with distance
Kriging	Optimal interpolation	BLUE estimation	Best Linear Unbiased Estimator for spatial data
Neighborhood search	Zone of influence	Local estimation	Determining which nearby points affect prediction
Anisotropy	Directional permeability	Directional correlation	Properties vary differently in different directions
Range	Correlation distance	Spatial dependence limit	Maximum distance where spatial correlation exists
Sill	Total variance	Asymptotic variance	Variance at distances beyond correlation
Nugget	Measurement error	Small-scale variance	Discontinuity at zero distance
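
The variogram terms in this table (range, sill, nugget) are easier to read once you have seen how an empirical variogram is computed. The following is a minimal sketch using only the Python standard library; the synthetic coordinates, the smooth east-west trend, and the 100 m bin width are all assumptions for illustration, not project data.

```python
import math
import random

def empirical_variogram(points, values, bin_width, n_bins):
    """Return mean semivariance per distance bin (None for empty bins)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.dist(points[i], points[j])
            b = int(dist // bin_width)
            if b < n_bins:
                # Semivariance contribution of one pair: half the squared difference
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Synthetic data (assumed): a smooth east-west trend plus small noise
rng = random.Random(42)
points = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(200)]
values = [math.sin(x / 300.0) + rng.gauss(0, 0.05) for x, _ in points]

gamma = empirical_variogram(points, values, bin_width=100.0, n_bins=8)
for k, g in enumerate(gamma):
    if g is not None:
        print(f"lag {k * 100:4d}-{(k + 1) * 100:4d} m: semivariance = {g:.4f}")
```

Semivariance at the shortest lags approximates the nugget, the level at which the curve flattens is the sill, and the distance at which it flattens is the range. A field with a strong trend, like this synthetic one, may keep rising instead of flattening.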

59 Temporal Analysis Translations

Computer Science	Hydrogeology	Statistics	Unified Meaning
Autocorrelation	System memory	Temporal dependence	Current values depend on past values
Lag	Response time	Time shift	Delay between cause and effect
Trend	Long-term change	Systematic component	Non-stationary mean over time
Seasonality	Annual cycle	Periodic component	Repeating patterns at fixed intervals
Stationarity	Equilibrium	Constant statistics	Statistical properties don’t change over time
Differencing	Change analysis	Detrending	Removing non-stationarity by subtracting previous values
Decomposition	Component separation	STL/Seasonal	Breaking time series into trend, seasonal, residual
Change point	Regime shift	Structural break	Time when system behavior fundamentally changes
Wavelet analysis	Multi-scale patterns	Time-frequency	Identifying patterns at multiple timescales

60 Data Quality Translations

Computer Science	Hydrogeology	Statistics	Unified Meaning
Missing data	Measurement gaps	NA/NaN values	Observations not recorded or lost
Imputation	Gap-filling	Missing value estimation	Estimating missing values from available data
Normalization	Unit conversion	Standardization	Scaling variables to common range
Filtering	Data cleaning	Outlier removal	Removing erroneous or irrelevant observations
Resampling	Time aggregation	Temporal binning	Changing temporal resolution (hourly → daily)
Data fusion	Multi-source integration	Data combination	Merging different data types for joint analysis
Quality flags	Data codes	Data qualifiers	Indicators of reliability or issues

61 Model Performance Translations

Computer Science	Statistics	Hydrogeology	Unified Meaning
Accuracy	Correct classification rate	Prediction success	Fraction of predictions that are correct
Precision	Positive predictive value	-	Of predicted positives, how many are correct
Recall	Sensitivity / TPR	-	Of actual positives, how many were found
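F1 score	Harmonic mean of precision and recall	-	Single score that balances precision and recall
RMSE	Root mean squared error	Prediction error	Typical magnitude of prediction errors, in the units of the target (large errors are penalized more heavily)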
R²	Coefficient of determination	Variance explained	Proportion of variance captured by model
AIC/BIC	Information criterion	Model parsimony	Balances model fit with complexity
Confusion matrix	Classification table	Contingency table	Cross-tabulation of predicted vs actual classes
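
As a rough illustration of how the classification metrics in this table relate to one another, the sketch below computes them from paired labels. The labels are hypothetical, and `classification_metrics` is an illustrative helper, not a project function.

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1 for one 'positive' class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical drill-log labels vs. model predictions
y_true = ["sand", "sand", "sand", "sand", "clay", "clay", "clay", "clay", "clay", "clay"]
y_pred = ["sand", "sand", "sand", "clay", "sand", "clay", "clay", "clay", "clay", "clay"]

metrics = classification_metrics(y_true, y_pred, positive="sand")
print(metrics)
```

Precision and recall depend on which class is treated as positive, so it helps to state that choice whenever these numbers are reported.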

62 Common Confusion Points

62.1 1. Spatial Autocorrelation

62.1.1 What Each Discipline Says

Computer Science: “Data points that are close together have similar values. This violates the i.i.d. assumption of most ML algorithms.”

Hydrogeology: “Aquifer properties vary smoothly across space due to depositional processes. Tobler’s First Law: Everything is related to everything else, but near things are more related than distant things.”

Statistics: “The covariance structure depends on distance. We model this with variograms and use spatial cross-validation instead of random splits.”

62.1.2 Why It Matters

Standard train/test splits fail (nearby points in train and test leak information)
Must use spatial CV or block CV
Predictions inherit spatial structure from training data
Uncertainty quantification requires spatial correlation modeling
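
One way to act on the points above is to hold out whole spatial blocks rather than random rows. The sketch below is a minimal leave-one-block-out splitter in plain Python; the synthetic coordinates and the 2000 m block size are assumptions, and `spatial_block_folds` is an illustrative helper rather than a project or library function.

```python
import random

def block_id(x, y, block_size):
    """Map a coordinate to the grid block that contains it."""
    return (int(x // block_size), int(y // block_size))

def spatial_block_folds(coords, block_size):
    """Yield (train_idx, test_idx), holding out one spatial block at a time."""
    blocks = {}
    for idx, (x, y) in enumerate(coords):
        blocks.setdefault(block_id(x, y, block_size), []).append(idx)
    for held_out, test_idx in sorted(blocks.items()):
        train_idx = [i for b, members in blocks.items()
                     if b != held_out for i in members]
        yield train_idx, test_idx

# Synthetic sample locations in metres (assumed for illustration)
rng = random.Random(0)
coords = [(rng.uniform(0, 4000), rng.uniform(0, 4000)) for _ in range(100)]

folds = list(spatial_block_folds(coords, block_size=2000.0))
for train_idx, test_idx in folds:
    print(f"train={len(train_idx):3d}  test={len(test_idx):3d}")
```

The block size should generally exceed the correlation range of the variable being modeled. Points on either side of a block edge can still be close neighbors, so a buffer zone between training and test blocks is a common refinement.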

62.1.3 In This Project

HTEM resistivity shows strong spatial autocorrelation (range ~500-1000m)
Well measurements are spatially correlated (aquifer continuity)
See: Part 2 - Spatial Patterns for variogram analysis

62.2 2. Feature vs Property

62.2.1 What Each Discipline Says

Computer Science: “Features are input variables to a model. We engineer features by transforming raw data (e.g., log transform, polynomial features, interactions).”

Hydrogeology: “Aquifer properties are physical characteristics: transmissivity (T), storativity (S), hydraulic conductivity (K). These come from pumping tests and geological analysis.”

62.2.2 How They Connect

CS features ← Derived from → Hydro properties
feature_depth = Z-coordinate → Physical: confining_pressure
feature_resistivity_log → Physical: clay_content

62.2.3 In This Project

HTEM resistivity → Feature for predicting material type
Depth, elevation, neighboring values → Features
True aquifer properties (K, T, S) are target variables OR constraints

62.3 3. Outlier vs Anomaly

62.3.1 What Each Discipline Says

Computer Science: Outlier = Statistical anomaly (>3σ from mean)

Hydrogeology: Anomaly = Unexpected measurement (could be real or error)

62.3.2 What It Could Be

Measurement error (sensor malfunction) → Remove
Pump test (intentional drawdown) → Flag, don’t remove
Natural event (earthquake, flood) → Keep, study
Contamination plume (localized change) → Key finding!

62.3.3 Decision Rule

Don’t auto-remove outliers! Investigate with domain knowledge.
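
In code, this rule usually means flagging suspect values for review instead of dropping them. The following sketch applies a simple 3σ z-score flag to a hypothetical water-level series; the values and the `needs_review` field name are assumptions for illustration.

```python
from statistics import mean, stdev

def flag_outliers(values, threshold=3.0):
    """Return a parallel list of booleans: True where |z-score| > threshold."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold for v in values]

# Hypothetical water levels (m); the 12.1 mimics a pump-test drawdown
levels = [15.2, 15.3, 15.1, 15.2, 15.4, 15.3, 15.2, 15.1, 15.3, 15.2,
          15.2, 15.3, 12.1, 15.2, 15.1, 15.3, 15.2, 15.4, 15.3, 15.2]

flags = flag_outliers(levels)
# Keep every record; the flag only marks what a domain expert should inspect
records = [{"level_m": v, "needs_review": f} for v, f in zip(levels, flags)]
flagged = [r for r in records if r["needs_review"]]
print(flagged)
```

Because nothing is deleted, a later reviewer can still decide whether the flagged value was a sensor fault, a pump test, or a real event.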

62.3.4 In This Project

Water level “outliers” often = pump tests (intentional)
Resistivity “outliers” often = geological contacts (real)
See: Part 1 - Data Quality for outlier handling

62.4 4. Training vs Calibration

62.4.1 What Each Discipline Says

Computer Science: Training = Fitting model parameters to minimize loss function on labeled data

Hydrogeology: Calibration = Adjusting model parameters until model matches field observations

Geophysics: Inversion = Estimating subsurface structure from surface measurements (ill-posed problem)

62.4.2 Similarities

All optimize parameters to match observations
All risk overfitting

62.4.3 Differences

Training: Large labeled dataset, many samples
Calibration: Few observations, physics-based model
Inversion: Underdetermined (infinite solutions), requires regularization

62.4.4 In This Project

HTEM interpretation = Inversion (measured EM response → resistivity model, which is then interpreted as lithology)
Material type classifier = Training (supervised learning)
Groundwater model = Calibration (match observed heads)

62.5 5. Prediction vs Forecast

62.5.1 Statistics Perspective

Prediction: Estimating unknown values (spatial or temporal)
Forecast: Predicting future values (temporal only)
Projection: Conditional “what-if” scenarios

62.5.2 Hydrogeology Perspective

Prediction: Where to drill for water (spatial)
Forecast: Water levels next month (temporal)
Projection: Impact of climate change (scenario)

62.5.3 Key Distinction

Forecasts assume trends continue. Projections explore alternatives.

62.5.4 In This Project

Well productivity prediction (spatial)
30-day water level forecast (temporal)
Climate change projections (scenarios)

62.6 6. Clustering Purposes

62.6.1 Different Goals

Computer Science: Find groups that minimize within-cluster variance

Hydrogeology: Delineate aquifer compartments with similar flow properties

Statistics: Identify distinct statistical populations

62.6.2 Critical Difference

CS: k chosen by elbow plot or silhouette score
Hydro: k should match expected geological units
Stats: k validated by mixture model BIC

62.6.3 In This Project

We constrain k=6 for stratigraphic units (domain knowledge) rather than optimize k statistically.

62.7 7. Depth vs Elevation

62.7.1 Common Confusion

These are NOT interchangeable!

⚠️ Critical Beginner Mistake

Many newcomers confuse depth and elevation. This causes serious analysis errors!

The key difference:

Depth goes DOWN from where you stand (depth = 0 at surface, increases downward)
Elevation goes UP from sea level (elevation increases upward, like a mountain)

They move in opposite directions!

62.7.2 Depth to Water (DTW)

What it is: How far down to reach water (feet or meters)
Direction: Increases when water level drops (more depth to reach water)
Reference: Measured from land surface (where you stand)
Used for: Well drilling depth, pumping lift
Example: “Water is 10 meters deep” = 10 meters below your feet

62.7.3 Water Surface Elevation (WSE)

What it is: Height of water surface above sea level
Direction: Decreases when water level drops (surface is lower)
Reference: Measured from mean sea level (like mountain heights)
Used for: Hydraulic gradients, flow direction
Comparable: Can compare between wells at different surface elevations
Example: “Water surface elevation is 195 meters” = 195 meters above sea level

62.7.4 Formula

WSE = Land_Surface_Elevation - Depth_To_Water

Concrete Example

Well A:

Land surface elevation: 210 meters above sea level
Depth to water: 15 meters below surface
Water surface elevation: 210 - 15 = 195 meters above sea level

Well B (1 km away):

Land surface elevation: 205 meters above sea level
Depth to water: 8 meters below surface
Water surface elevation: 205 - 8 = 197 meters above sea level

What this tells us:

Water flows from B (197m) to A (195m) because the water surface elevation is higher at B
Well A has both the greater depth to water (15m vs 8m) and the lower water surface, but the 7m difference in depth shrinks to a 2m difference in elevation because the land surface at A is 5m higher
You cannot compare depths directly between wells at different elevations!

Why WSE matters: It tells you which direction groundwater flows (from high to low water surface elevation), regardless of surface topography.
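
The two-well example can be reproduced in a few lines. This sketch applies the WSE formula to Wells A and B and derives the flow direction and gradient; the dictionary layout and field names are assumptions, not the project's database schema.

```python
def water_surface_elevation(land_surface_m, depth_to_water_m):
    """WSE = land surface elevation - depth to water (same units, same datum)."""
    return land_surface_m - depth_to_water_m

wells = {
    "A": {"land_surface_m": 210.0, "depth_to_water_m": 15.0},
    "B": {"land_surface_m": 205.0, "depth_to_water_m": 8.0},
}
wse = {name: water_surface_elevation(**w) for name, w in wells.items()}

# Groundwater moves from the higher water surface elevation toward the lower one
source = max(wse, key=wse.get)
sink = min(wse, key=wse.get)
gradient = (wse[source] - wse[sink]) / 1000.0  # wells are 1 km apart

print(wse)
print(f"Flow direction: {source} -> {sink}, gradient = {gradient:.4f} m/m")
```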

62.7.5 In This Project

Database stores: DTW (measured directly)
Analysis uses: WSE (calculated from formula)
Flow direction: Determined by WSE gradients, not DTW
See: Data Dictionary for database schema and column definitions

62.8 8. Resistivity vs Conductivity

62.8.1 Geophysics Clarification

Resistance (Ω):

Property of a specific object
Depends on geometry

Resistivity (Ω·m):

Material property (independent of geometry)
What HTEM measures
Inverse of electrical conductivity

Electrical Conductivity (S/m or mS/m):

Inverse of resistivity: σ = 1/ρ
Higher in saline water, lower in fresh water

Hydraulic Conductivity (m/day):

Completely different! (flow property, not electrical)
Can correlate with resistivity (sand = high K, high ρ)

62.8.2 In This Project

HTEM measures resistivity (ρ in Ω·m)
Clay: 1-10 Ω·m (low resistivity, high electrical conductivity)
Sand: 100-1000 Ω·m (high resistivity, high hydraulic conductivity)
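
Since resistivity and electrical conductivity are reciprocals, converting between them is a one-liner. The sketch below reports conductivity in mS/m; the function name is illustrative and the two resistivity values are round numbers picked from the ranges above.

```python
def resistivity_to_conductivity_ms_per_m(rho_ohm_m):
    """Convert resistivity (ohm-m) to electrical conductivity (mS/m): sigma = 1/rho."""
    if rho_ohm_m <= 0:
        raise ValueError("resistivity must be positive")
    return 1000.0 / rho_ohm_m  # 1 S/m = 1000 mS/m

for label, rho in [("clay (low rho)", 10.0), ("sand (high rho)", 100.0)]:
    sigma = resistivity_to_conductivity_ms_per_m(rho)
    print(f"{label}: rho = {rho:g} ohm-m -> sigma = {sigma:g} mS/m")
```

Note that this conversion concerns electrical conductivity only; it says nothing directly about hydraulic conductivity.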

63 Discipline-Specific Glossaries

63.1 Computer Science → Hydrogeology

When you say… → Hydrogeologists mean…

“Training data” → Wells with known lithology
“Test data” → New wells or blind validation set
“Features” → Geophysical measurements + spatial coordinates
“Labels” → Material types from drill logs
“Model prediction” → Lithology interpretation
“Model uncertainty” → Geological uncertainty / non-uniqueness
“Hyperparameter tuning” → Model calibration
“Feature importance” → Sensitivity analysis
“Ensemble model” → Multiple scenarios / realizations
“Cross-validation” → Independent validation wells

63.2 Hydrogeology → Computer Science

When you say… → Computer scientists mean…

“Hydraulic head” → Target variable (regression)
“Transmissivity” → Derived feature (from multiple sources)
“Aquifer heterogeneity” → High data variance / noise
“Anisotropy” → Directional features matter
“Boundary conditions” → Model constraints
“Calibration” → Training / fitting
“Validation” → Test set evaluation
“Conceptual model” → Model architecture choice
“Uncertainty” → Prediction confidence intervals
“Sensitivity analysis” → Feature importance / ablation study

63.3 Statistics → Hydrogeology

When you say… → Hydrogeologists mean…

“Random variable” → Measured quantity with uncertainty
“Probability distribution” → Range of plausible values
“Spatial process” → Geological property field
“Stochastic simulation” → Multiple equally likely realizations
“Bayesian inference” → Updating understanding with new data
“Prior distribution” → Geological expectation before data
“Likelihood” → Consistency with observations
“Posterior distribution” → Updated geological understanding

64 Quick Reference Cards

64.1 For Computer Scientists

64.1.1 Key Concepts

Hydraulic head = Potential energy of water (drives flow)
Darcy’s Law = Q = -K·A·(dh/dl) (groundwater’s Ohm’s Law)
Aquifer types: Confined (pressurized) vs Unconfined (water table)
Transmissivity = How easily water flows horizontally (T = K × b)
Storativity = How much water is stored/released
Recharge = Water entering aquifer (from precipitation)
Discharge = Water leaving aquifer (to wells, streams)
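
For readers who think in code, Darcy's Law and the transmissivity relation above can be sketched as two small functions. The hydraulic conductivity, thickness, and cross-section width below are assumed values for illustration; the 5 m head drop over 5 km echoes the hydraulic head example in the Plain-Language Basics table.

```python
def darcy_discharge(k_m_per_day, area_m2, dh_m, dl_m):
    """Darcy's Law: Q = -K * A * (dh/dl), in m^3/day.

    dh is the head change along a flow path of length dl, so head falling
    in the direction of travel (negative dh) gives a positive discharge.
    """
    return -k_m_per_day * area_m2 * (dh_m / dl_m)

def transmissivity(k_m_per_day, thickness_m):
    """T = K * b, in m^2/day."""
    return k_m_per_day * thickness_m

K = 50.0       # assumed hydraulic conductivity of a sand, m/day
b = 20.0       # assumed saturated thickness, m
width = 100.0  # assumed width of the flow cross-section, m

T = transmissivity(K, b)
Q = darcy_discharge(K, area_m2=b * width, dh_m=-5.0, dl_m=5000.0)
print(f"T = {T:.0f} m^2/day, Q = {Q:.1f} m^3/day")
```

The minus sign in the formula is what makes flow positive in the direction of decreasing head.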

64.1.2 Physics Constraints Required

Water flows from high head to low head (down the hydraulic gradient), not necessarily downhill at the land surface
Conservation of mass (water balance)
Properties vary smoothly (geological continuity)
Anisotropic (horizontal K ≠ vertical K)

64.2 For Hydrogeologists

64.2.1 Key Concepts

Supervised learning = You provide examples (wells + lithology)
Features = Variables input to model (depth, resistivity, etc.)
Overfitting = Model memorizes training data, fails on new data
Cross-validation = Test on data not used for training
Regularization = Penalizing overly complex models
Ensemble methods = Combining multiple models (like multiple realizations)
Neural networks = Flexible non-linear models (like complex transfer functions)

64.2.2 Common Pitfalls

Don’t trust models on data outside training range (extrapolation)
Spatial autocorrelation violates independence assumptions
More features ≠ better (curse of dimensionality)
Correlation ≠ causation (even strong correlations)

64.3 For Statisticians

64.3.1 Key Concepts

Physical constraints limit model flexibility (water flows from high head to low head)
Geological processes create spatial structure (not random)
Measurement errors are often systematic (sensor drift, calibration)
Missing data is rarely random (wells where water is needed)
Outliers often = interesting phenomena (not errors)
Time scales matter (recharge takes weeks, regional flow takes years)

64.3.2 Statistical Challenges

Small sample sizes (expensive to drill wells)
High-dimensional but sparse (many features, few samples)
Non-stationary processes (climate change, land use)
Censored data (detection limits, regulatory thresholds)

65 Project-Specific Examples

65.1 Example 1: K-means HTEM

65.1.1 Computer Science Perspective

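from sklearn.cluster import KMeans

# resistivity_features: array of shape (n_samples, n_features) prepared upstream
# k=6 matches the expected stratigraphic units; the seed is fixed for reproducibility
kmeans = KMeans(n_clusters=6, n_init=10, random_state=42)
clusters = kmeans.fit_predict(resistivity_features)  # minimizes within-cluster variance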

65.1.2 Hydrogeology Perspective

“We’re grouping similar resistivity values to delineate geological units (A-F). The k=6 is chosen because we expect 6 stratigraphic layers, not from elbow plot optimization.”

65.1.3 Statistics Perspective

“K-means behaves like a mixture model with hard assignments: in effect we assume 6 Gaussian components with equal, spherical covariances. Could validate with BIC, but domain knowledge constrains k.”

65.2 Example 2: ARIMA Forecasting

65.2.1 Statistics Perspective

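from statsmodels.tsa.arima.model import ARIMA

# water_levels: monthly series (the seasonal period of 12 assumes monthly data)
model = ARIMA(water_levels, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()  # the model must be fitted before forecasting
forecast = results.forecast(steps=30)  # next 30 time steps (months here)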

65.2.2 Hydrogeology Perspective

“We’re predicting future water levels accounting for seasonal recharge cycles (12-month period) and short-term trends. The AR(1) component captures aquifer memory.”

65.2.3 Computer Science Perspective

“Time series model that uses past values to predict future. The (p,d,q) and seasonal orders are hyperparameters chosen by AIC/BIC or domain knowledge (12-month cycle).”

65.3 Example 3: Interpolation Choice

65.3.1 When to Use Kriging

Need uncertainty estimates (kriging variance)
Data follows Gaussian assumptions
Spatial autocorrelation is primary pattern
Interpretability matters

65.3.2 When to Use ML

Non-stationary processes
Multiple covariates available
Non-linear relationships
Large datasets (>100k points)

65.3.3 In This Project

We use both, compare results, and choose based on validation metrics.

66 Visual Concept Map

graph TD
    A[Environmental Data Science] --> B[Computer Science]
    A --> C[Hydrogeology]
    A --> D[Statistics]
    A --> E[Geophysics]

    B --> B1[Algorithms]
    B --> B2[Data Structures]
    B --> B3[Software Engineering]

    C --> C1[Aquifer Properties]
    C --> C2[Flow Systems]
    C --> C3[Water Quality]

    D --> D1[Spatial Statistics]
    D --> D2[Time Series]
    D --> D3[Uncertainty]

    E --> E1[EM Theory]
    E --> E2[Inversion]
    E --> E3[Material Properties]

    B1 -.-> C2
    C1 -.-> D1
    E3 -.-> C1
    D2 -.-> C2

67 Operations & Decision Support Terminology

This section covers terms commonly used in Part 5 (Predictive Operations) that bridge machine learning, optimization, and water management.

67.1 Model Performance Metrics

Term	What It Means	When to Use	Example
R² (R-squared)	Proportion of variance explained by model (0-1). Higher = better fit.	Comparing models, regression tasks	R² = 0.85 means model explains 85% of water level variation
RMSE	Root Mean Square Error - average prediction error in original units	Understanding “how wrong” predictions are	RMSE = 0.3 m means predictions typically off by 0.3 meters
MAE	Mean Absolute Error - average absolute prediction error	When large errors should not dominate the score (less sensitive to outliers than RMSE)	MAE = 0.2 m means average absolute error is 0.2 meters
Accuracy	Percentage of correct classifications	Classification tasks (sand vs clay)	86% accuracy = 86 of 100 predictions correct
Precision	Of predictions labeled “positive”, how many were correct	When false positives are costly	90% precision = 9 of 10 “sand” predictions were actually sand
Recall	Of actual positives, how many did model find	When false negatives are costly	80% recall = found 8 of 10 actual sand locations
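
The regression metrics in this table can be computed directly from paired observations and predictions. The sketch below is a plain-Python illustration; the water-level values are hypothetical and `regression_metrics` is an illustrative helper.

```python
import math

def regression_metrics(observed, predicted):
    """Return RMSE, MAE and R^2 for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical water levels in metres
observed = [195.0, 195.4, 196.1, 196.0, 195.2]
predicted = [195.2, 195.3, 195.8, 196.3, 195.2]

scores = regression_metrics(observed, predicted)
print(scores)
```

RMSE comes out larger than MAE here because squaring gives the two 0.3 m misses more weight than the smaller errors.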

67.2 Optimization Terminology

Term	What It Means	Water Management Context
Pareto frontier	Set of solutions where improving one objective worsens another	Trade-off between well yield (want high) and drilling cost (want low)
Multi-objective optimization	Finding best trade-offs across competing goals	Balancing yield, cost, uncertainty, and sustainability simultaneously
Constraint	Hard limit that cannot be violated	“Well must be >500m from contamination source”
Objective function	Mathematical formula being optimized	Maximize: 0.35×Yield + 0.25×(1-Cost) + 0.25×Confidence + 0.15×Sustainability
Risk-adjusted NPV	Net Present Value accounting for uncertainty	Expected value × probability of success
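
The weighted objective function in the table can be evaluated as follows. This is a sketch that assumes every input has already been scaled to the range 0 to 1; the candidate sites and their values are hypothetical.

```python
WEIGHTS = {"yield": 0.35, "cost": 0.25, "confidence": 0.25, "sustainability": 0.15}

def site_score(yield_, cost, confidence, sustainability):
    """Weighted objective from the table; all inputs assumed scaled to [0, 1]."""
    for name, value in [("yield", yield_), ("cost", cost),
                        ("confidence", confidence),
                        ("sustainability", sustainability)]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be scaled to [0, 1], got {value}")
    return (WEIGHTS["yield"] * yield_
            + WEIGHTS["cost"] * (1.0 - cost)   # cheaper sites score higher
            + WEIGHTS["confidence"] * confidence
            + WEIGHTS["sustainability"] * sustainability)

# Hypothetical candidate drilling sites
sites = {
    "site_1": dict(yield_=0.8, cost=0.4, confidence=0.6, sustainability=0.5),
    "site_2": dict(yield_=0.6, cost=0.2, confidence=0.9, sustainability=0.7),
}
scores = {name: site_score(**s) for name, s in sites.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

Collapsing several objectives into one weighted score picks a single point on the Pareto frontier, so the weights themselves are a decision that stakeholders should review.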

67.3 Explainability & Trust

Term	What It Means	Why It Matters
SHAP values	Feature contribution to individual predictions	“This well predicted as sand because: 40% from resistivity, 30% from depth, 20% from location”
Feature importance	Global ranking of which inputs matter most	“Across all predictions, resistivity is most important (35%), then depth (25%)”
Black box model	Model where internal logic is hidden	Neural networks - accurate but hard to explain to stakeholders
Interpretable model	Model with transparent logic	Decision trees - can show exact rules: “If resistivity > 100 AND depth < 50m → Sand”
Confidence interval	Range where true value likely falls	“Yield = 135 GPM ± 15 GPM (95% CI)” means the interval 120-150 GPM comes from a method that captures the true yield 95% of the time
Prediction interval	Range where future observations likely fall	Wider than CI because includes both model and data uncertainty

67.4 Common Confusion Pairs

Term 1	Term 2	The Difference
Parameter (ML)	Parameter (Hydro)	ML: Weights learned during training. Hydro: Physical properties (transmissivity, storativity)
Hyperparameter	Parameter	Hyperparameter: Set before training (e.g., tree depth). Parameter: Learned during training
Training	Calibration	Training = ML term. Calibration = Hydro term. Both mean fitting model to data
Validation	Verification	Validation: Does model perform well? Verification: Is model coded correctly?
Uncertainty	Error	Uncertainty: Range of possible values. Error: Difference between prediction and actual
Forecast	Prediction	Forecast: Future values (time-dependent). Prediction: Any estimated value
Overfitting	Over-parameterization	Both mean: Model too complex for available data, won’t generalize

67.5 Autocorrelation Interpretation Guide

When analyzing water level time series, autocorrelation (ACF) values tell you about aquifer “memory”:

ACF at Lag	Physical Meaning	Aquifer Type Indication
ACF(1 month) = 0.95	Very high memory - this month almost completely predicts next month	Confined aquifer, slow response
ACF(1 month) = 0.50	Moderate memory - this month explains ~25% of the variance in next month’s level (0.5²)	Semi-confined or deep unconfined
ACF(1 month) = 0.20	Low memory - rapid response, levels change quickly	Shallow unconfined, stream-connected
ACF(12 months) = 0.50	Annual cycle explains ~25% of variance	Strong seasonal forcing (annual recharge pattern)
ACF decays slowly	Long-term persistence, multi-year droughts/wet periods	Climate-dominated system, slow recovery
ACF decays quickly	Short-term memory only	Weather-dominated system, fast recovery

Rule of thumb: Confined aquifers typically show ACF(12 months) > 0.3; unconfined aquifers show ACF(12 months) < 0.2.
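
To see where the ACF values in this table come from, the sketch below computes the sample autocorrelation at a chosen lag. The input is a synthetic monthly series containing only a 12-month cycle, which is an assumption for illustration; real water levels would also carry trend and noise.

```python
import math

def acf(series, lag):
    """Sample autocorrelation at a given lag (standard biased estimator)."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom

# Synthetic monthly series (assumed): a pure 12-month cycle over 10 years
monthly_levels = [math.sin(2 * math.pi * t / 12) for t in range(120)]

for lag in (1, 6, 12):
    print(f"ACF({lag:2d} months) = {acf(monthly_levels, lag):+.2f}")
```

A purely seasonal signal gives a strongly negative ACF at 6 months and a strongly positive ACF at 12 months, so a high 12-month value on its own can reflect seasonality rather than long memory.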

68 Contributing to This Guide

68.1 How to Contribute

68.1.1 Found a Term That Needs Translation?

Submit a PR or issue with:

The term in your discipline
How it’s used in context
Potential equivalents in other disciplines
Example from this project

68.1.2 Disagree with a translation?

Translations are nuanced! Start a discussion:

Explain your perspective
Provide references if available
Suggest alternative phrasing

68.1.3 Want to Add a Discipline?

We welcome additional perspectives:

Climate science
Ecology
Economics
Policy/regulation
Engineering

69 Further Resources

69.1 Books Bridging Disciplines

CS ↔︎ Hydrogeology: “Hydrogeological Data Analysis” by Kitanidis
Stats ↔︎ Spatial: “Statistics for Spatial Data” by Cressie
ML ↔︎ Hydrology: “Data-Driven Modeling of Environmental Systems” by Reichstein et al.

69.2 Online Glossaries

USGS Water Science Glossary: water.usgs.gov/edu/dictionary.html
Machine Learning Glossary: ml-cheatsheet.readthedocs.io
Geostatistics Glossary: geostatisticslessons.com

69.3 Community Forums

Hydrogeology: eng-tips.com
Data Science: stats.stackexchange.com
Geophysics: seg.org/connect

70 Summary

This translation guide serves as a living bridge between disciplines. As the project evolves, so will this guide.

Goal: When a computer scientist, hydrogeologist, statistician, or geophysicist reads the same analysis, they should each understand it in their own terms while appreciating what the other disciplines contribute.

Next Steps:

1. Bookmark this page for quick reference
2. Use Ctrl+F to search for terms as needed
3. Suggest additions via issues/PRs
4. Share with colleagues from other disciplines

Questions? Open an issue with the terminology label.

Last Updated: November 26, 2025
Contributors: Open to all
License: CC-BY-4.0 (attribution required)

---
title: "Terminology Translation Guide"
subtitle: "The Rosetta Stone for interdisciplinary aquifer data science"
description: "Cross-disciplinary translation bridging computer science, hydrogeology, statistics, and geophysics"
---

# Why Translation Matters

Environmental data science requires collaboration across disciplines that often use **different words for the same concepts**, or worse, **the same words for different concepts**. This chapter serves as a living translation guide to bridge these gaps.

**Who this helps:**

- Computer scientists learning hydrogeology terminology
- Hydrogeologists learning data science methods
- Statisticians understanding domain context
- Geophysicists connecting EM theory to analysis
- Students navigating multiple disciplines

**How to use this guide:**

- **Search:** Use Ctrl+F / Cmd+F to find any term quickly.
- **Read across:** Check equivalent concepts in other disciplines.
- **Check confusions:** See "Common Confusion" sections for pitfalls.
- **See examples:** "In This Project" shows concrete applications.

If you are completely new to groundwater, start with the **Plain-Language Basics** below. You do not need to memorize definitions; come back here whenever a chapter uses a term you do not recognize.

---

# Plain-Language Basics

These are the core ideas that appear throughout the playbook, written for readers with **no water background**.

| Term | Plain Description | Why It Matters in This Playbook | Real Example |
|------|-------------------|----------------------------------|--------------|
| **Groundwater** | Water stored in pores and cracks of rocks and sediments underground. | It is the main water source we are trying to understand and manage. | When you drill a well 50 meters deep and water fills it to 10 meters below surface, that's groundwater from the aquifer. |
| **Aquifer** | A body of rock or sediment that can store and transmit usable amounts of groundwater, like a buried sponge. | Most of the analyses ask: *Where is the aquifer? How full is it? How does it respond to weather and pumping?* | Unit D (Mahomet Aquifer) is a buried sand/gravel valley 12-96 m deep that stores billions of gallons of water. |
| **Confining layer** | A layer of clay or rock that does not let much water pass through. | Protects deeper aquifers from quick changes at the land surface, making them respond more slowly. | Unit E (clay layer above Unit D) prevents surface spills from quickly reaching the drinking water aquifer below. |
| **Recharge** | Water that soaks down from the surface (rain, snowmelt, irrigation) to refill the aquifer. | Links weather and land-surface processes to long-term groundwater levels. | Spring rains in Illinois soak through soil → percolate down through sand → raise water levels in Unit D over weeks to months. |
| **Well** | A hole drilled into the ground to reach groundwater, often with sensors that measure water level. | Provides direct observations of how the aquifer is behaving at specific locations. | Our 356 observation wells measure water levels every 15 minutes, creating a 1-million-record time series. |
| **Water level** | The height of groundwater in a well, usually measured relative to a reference point. | Rising or falling water levels tell us if the aquifer is gaining or losing storage. | If water level rises 2 meters in spring, the aquifer gained storage (recharged). If it drops 1 meter in summer, it lost storage. |
| **HTEM** | Helicopter-borne geophysical survey that measures how the ground resists electrical currents. | Gives us a 3D picture of underground materials without drilling, which we link to aquifer properties. | The 2008 helicopter survey mapped 2,361 km² in weeks; drilling that many wells would take decades and millions of dollars. |
| **Resistivity** | A measure of how strongly a material resists electric current; clays are low, sands and gravels are high. | Used as a proxy for material type and aquifer quality in HTEM maps. | Clay: 5-30 Ω·m (low), Sand: 100-200 Ω·m (high). High resistivity = good aquifer material. |
| **Confined aquifer** | An aquifer trapped between confining layers, reacting mainly to pressure changes rather than directly to the surface water table. | Explains why some wells show tiny seasonal swings but long-term memory of past conditions. | When Unit D is sealed by clay above and bedrock below, water levels change slowly (±0.5 m) but track multi-year climate patterns. |
| **Unconfined aquifer** | An aquifer whose top surface is the water table, directly connected to the surface. | Responds quickly to rain and drought with larger seasonal swings. | Shallow aquifers near streams can swing 3-5 meters seasonally, rising quickly after rain, dropping in summer. |
| **Hydraulic head** | The potential energy of water at a point (combination of elevation and pressure). | Water flows from high head to low head; this determines groundwater flow direction. | If Well A has a head of 200 m and Well B (5 km away) has 195 m, water flows from A→B with ~1 meter of head drop per kilometer. |
| **Transmissivity** | How easily water flows horizontally through the full thickness of an aquifer. | High transmissivity = wells produce more water, aquifer recovers faster from pumping. | Good sand aquifer: T = 1000 m²/day (productive). Clay layer: T = 1 m²/day (poor, can't supply wells). |
| **Storativity** | The volume of water an aquifer releases (or stores) per unit area per unit head change. | Determines how much water level drops when you pump, or rises when it rains. | Unconfined: S = 0.15 (15% of aquifer volume drainable). Confined: S = 0.0001 (only 0.01% released by pressure). |

---

# Core Concept Translations

## Master Translation Table

| Computer Science | Hydrogeology | Statistics | Geophysics | Unified Meaning |
|------------------|--------------|------------|------------|-----------------|
| **Outlier detection** | Anomalous water levels | Statistical anomaly | Measurement error | Identifying observations that deviate from expected patterns; requires domain context to interpret |
| **Feature engineering** | Aquifer properties | Predictor variables | Material parameters | Transforming raw observations into model inputs that capture relevant physics |
| **Clustering** | Aquifer compartments | Spatial grouping | Material zones | Identifying natural groupings where similar properties occur together |
| **Classification** | Lithology mapping | Categorical prediction | Material identification | Assigning observations to discrete categories (e.g., sand vs clay) |
| **Regression** | Empirical relationships | Continuous prediction | Forward modeling | Predicting continuous values (e.g., water level, resistivity) |
| **Time series forecasting** | Water level prediction | ARIMA/Prophet | - | Extrapolating temporal patterns into the future |
| **Dimensionality reduction** | Stratigraphic simplification | PCA/Factor analysis | Layer averaging | Reducing complexity while preserving essential information |
| **Interpolation** | Spatial estimation | Kriging/IDW | Grid generation | Estimating values at unobserved locations from nearby measurements |
| **Cross-validation** | Independent validation | Model assessment | Test-train split | Evaluating model performance on data not used for training |
| **Hyperparameter tuning** | Model calibration | Parameter optimization | Inversion tuning | Finding optimal configuration for model performance |
| **Supervised learning** | Training on known lithology | Labeled data modeling | Constrained inversion | Learning from observations with known outcomes |
| **Unsupervised learning** | Exploratory analysis | Pattern discovery | Data-driven zonation | Finding structure in data without predefined labels |
| **Ensemble methods** | Multi-model prediction | Bagging/Boosting | Combined inversions | Combining multiple models to improve predictions |
| **Neural networks** | Non-linear modeling | Deep learning | Complex mapping | Flexible models that learn hierarchical patterns |
| **Gradient descent** | Optimization | Iterative minimization | Inversion algorithm | Iteratively improving model by following error gradient |
| **Loss function** | Misfit function | Error metric | Data residual | Quantifies difference between model predictions and observations |
| **Overfitting** | Over-parameterization | Poor generalization | Non-unique solution | Model fits training data perfectly but fails on new data |
| **Regularization** | Parsimony constraint | Penalized regression | Damping/Smoothing | Constraining model complexity to prevent overfitting |
| **Batch processing** | Bulk analysis | - | Survey-wide processing | Processing multiple records simultaneously for efficiency |
| **Pipeline** | Workflow | Processing chain | Analysis sequence | Series of automated steps from raw data to results |

---

# Spatial Analysis Translations

| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|------------------|--------------|------------|-----------------|
| **Spatial autocorrelation** | Aquifer continuity | Tobler's First Law | Nearby locations are more similar than distant ones |
| **Variogram** | Spatial structure | Covariance function | How similarity decreases with distance |
| **Kriging** | Optimal interpolation | BLUE estimation | Best Linear Unbiased Estimator for spatial data |
| **Neighborhood search** | Zone of influence | Local estimation | Determining which nearby points affect prediction |
| **Anisotropy** | Directional permeability | Directional correlation | Properties vary differently in different directions |
| **Range** | Correlation distance | Spatial dependence limit | Maximum distance where spatial correlation exists |
| **Sill** | Total variance | Asymptotic variance | Variance at distances beyond correlation |
| **Nugget** | Measurement error | Small-scale variance | Discontinuity at zero distance |

---

# Temporal Analysis Translations

| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|------------------|--------------|------------|-----------------|
| **Autocorrelation** | System memory | Temporal dependence | Current values depend on past values |
| **Lag** | Response time | Time shift | Delay between cause and effect |
| **Trend** | Long-term change | Systematic component | Non-stationary mean over time |
| **Seasonality** | Annual cycle | Periodic component | Repeating patterns at fixed intervals |
| **Stationarity** | Equilibrium | Constant statistics | Statistical properties don't change over time |
| **Differencing** | Change analysis | Detrending | Removing non-stationarity by subtracting previous values |
| **Decomposition** | Component separation | STL/Seasonal | Breaking time series into trend, seasonal, residual |
| **Change point** | Regime shift | Structural break | Time when system behavior fundamentally changes |
| **Wavelet analysis** | Multi-scale patterns | Time-frequency | Identifying patterns at multiple timescales |

---

# Data Quality Translations

| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|------------------|--------------|------------|-----------------|
| **Missing data** | Measurement gaps | NA/NaN values | Observations not recorded or lost |
| **Imputation** | Gap-filling | Missing value estimation | Estimating missing values from available data |
| **Normalization** | Unit conversion | Standardization | Scaling variables to common range |
| **Filtering** | Data cleaning | Outlier removal | Removing erroneous or irrelevant observations |
| **Resampling** | Time aggregation | Temporal binning | Changing temporal resolution (hourly → daily) |
| **Data fusion** | Multi-source integration | Data combination | Merging different data types for joint analysis |
| **Quality flags** | Data codes | Data qualifiers | Indicators of reliability or issues |

---

# Model Performance Translations

| Computer Science | Statistics | Hydrogeology | Unified Meaning |
|------------------|------------|--------------|-----------------|
| **Accuracy** | Correct classification rate | Prediction success | Fraction of predictions that are correct |
| **Precision** | Positive predictive value | - | Of predicted positives, how many are correct |
| **Recall** | Sensitivity / TPR | - | Of actual positives, how many were found |
| **F1 score** | Harmonic mean of precision and recall | - | Single score that balances precision and recall |
| **RMSE** | Root mean squared error | Prediction error | Typical magnitude of prediction errors, in the units of the target |
| **R²** | Coefficient of determination | Variance explained | Proportion of variance captured by model |
| **AIC/BIC** | Information criterion | Model parsimony | Balances model fit with complexity |
| **Confusion matrix** | Classification table | Contingency table | Cross-tabulation of predicted vs actual classes |

---

# Common Confusion Points

## 1. Spatial Autocorrelation

### What Each Discipline Says

**Computer Science:** "Data points that are close together have similar values. This violates the i.i.d. assumption of most ML algorithms."

**Hydrogeology:** "Aquifer properties vary smoothly across space due to depositional processes. Tobler's First Law: Everything is related to everything else, but near things are more related than distant things."

**Statistics:** "The covariance structure depends on distance. We model this with variograms and use spatial cross-validation instead of random splits."
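The spatial cross-validation mentioned in the statistics view can be sketched in a few lines. This is a minimal illustration, not the project's actual splitter: the function name `spatial_block_folds`, the coordinates, and the 1000 m block size are all assumptions made for the example. Points are binned into square grid blocks, and whole blocks are held out together so that immediate neighbours do not end up on both sides of a split.

```python
import math
from collections import defaultdict

def spatial_block_folds(coords, block_size, n_folds):
    """Yield (train_idx, test_idx) pairs where whole grid blocks are held out together."""
    # Bin every point into a square block of side `block_size`.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        key = (math.floor(x / block_size), math.floor(y / block_size))
        blocks[key].append(i)

    # Deal blocks out to folds round-robin, in sorted order for reproducibility.
    fold_members = [[] for _ in range(n_folds)]
    for j, key in enumerate(sorted(blocks)):
        fold_members[j % n_folds].extend(blocks[key])

    for test_idx in fold_members:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(coords)) if i not in held_out]
        yield train_idx, sorted(test_idx)

# Hypothetical sounding locations (easting, northing) in metres.
coords = [(120.0, 80.0), (180.0, 950.0), (1450.0, 300.0), (1520.0, 1800.0),
          (2900.0, 2100.0), (3050.0, 400.0), (700.0, 2600.0), (2300.0, 1200.0)]

folds = list(spatial_block_folds(coords, block_size=1000.0, n_folds=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A block size at least as large as the variogram range is a reasonable starting point. Points near a block edge can still be close to points in a neighbouring block, so a buffer between training and test blocks may be needed in practice.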
### Why It Matters

- Standard train/test splits fail (nearby points in train and test leak information)
- Must use spatial CV or block CV
- Predictions inherit spatial structure from training data
- Uncertainty quantification requires spatial correlation modeling

### In This Project

- HTEM resistivity shows strong spatial autocorrelation (range ~500-1000 m)
- Well measurements are spatially correlated (aquifer continuity)
- **See:** Part 2 - Spatial Patterns for variogram analysis

---

## 2. Feature vs Property

### What Each Discipline Says

**Computer Science:** "Features are input variables to a model. We engineer features by transforming raw data (e.g., log transform, polynomial features, interactions)."

**Hydrogeology:** "Aquifer properties are physical characteristics: transmissivity (T), storativity (S), hydraulic conductivity (K). These come from pumping tests and geological analysis."

### How They Connect

- CS features ← Derived from → Hydro properties
- `feature_depth` = Z-coordinate → Physical: `confining_pressure`
- `feature_resistivity_log` → Physical: `clay_content`

### In This Project

- HTEM resistivity → Feature for predicting material type
- Depth, elevation, neighboring values → Features
- True aquifer properties (K, T, S) are target variables OR constraints

---

## 3. Outlier vs Anomaly

### What Each Discipline Says

**Computer Science:** Outlier = Statistical anomaly (>3σ from mean)

**Hydrogeology:** Anomaly = Unexpected measurement (could be real or error)

### What It Could Be

1. **Measurement error** (sensor malfunction) → Remove
2. **Pump test** (intentional drawdown) → Flag, don't remove
3. **Natural event** (earthquake, flood) → Keep, study
4. **Contamination plume** (localized change) → Key finding!

### Decision Rule

**Don't auto-remove outliers!** Investigate with domain knowledge.
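The decision rule above can be sketched as a "flag, don't delete" step. This is a minimal illustration under stated assumptions: the function `flag_outliers`, the `known_events` field log, and the water levels are hypothetical, and the 3σ threshold is simply the computer-science rule quoted earlier. Every reading is kept; readings beyond the threshold are labelled with a logged cause when one exists and marked for human review otherwise.

```python
from statistics import mean, stdev

def flag_outliers(values, known_events=None, z_threshold=3.0):
    """Label each reading instead of deleting it."""
    known_events = known_events or {}
    mu, sigma = mean(values), stdev(values)
    records = []
    for i, value in enumerate(values):
        z = (value - mu) / sigma if sigma > 0 else 0.0
        if abs(z) <= z_threshold:
            flag = "ok"
        elif i in known_events:
            flag = known_events[i]   # explained by the field log: keep and label
        else:
            flag = "needs_review"    # unexplained: a person decides, not the script
        records.append({"index": i, "value": value, "z": round(z, 2), "flag": flag})
    return records

# Hypothetical water levels (m); the last reading was taken during a logged pump test.
levels = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1,
          9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0, 6.0]

records = flag_outliers(levels, known_events={19: "pump_test"})
flagged = [r for r in records if r["flag"] != "ok"]
print(flagged)
```

Keeping the flag as a column means later analyses can choose to include or exclude pump-test periods without losing the original measurements.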
### In This Project

- Water level "outliers" often = pump tests (intentional)
- Resistivity "outliers" often = geological contacts (real)
- **See:** Part 1 - Data Quality for outlier handling

---

## 4. Training vs Calibration

### What Each Discipline Says

**Computer Science:** Training = Fitting model parameters to minimize loss function on labeled data

**Hydrogeology:** Calibration = Adjusting model parameters until model matches field observations

**Geophysics:** Inversion = Estimating subsurface structure from surface measurements (ill-posed problem)

### Similarities

- All optimize parameters to match observations
- All risk overfitting

### Differences

- **Training:** Large labeled dataset, many samples
- **Calibration:** Few observations, physics-based model
- **Inversion:** Underdetermined (infinite solutions), requires regularization

### In This Project

- HTEM interpretation = Inversion (measured EM response → resistivity model, later interpreted as lithology)
- Material type classifier = Training (supervised learning)
- Groundwater model = Calibration (match observed heads)

---

## 5. Prediction vs Forecast

### Statistics Perspective

- **Prediction:** Estimating unknown values (spatial or temporal)
- **Forecast:** Predicting future values (temporal only)
- **Projection:** Conditional "what-if" scenarios

### Hydrogeology Perspective

- **Prediction:** Where to drill for water (spatial)
- **Forecast:** Water levels next month (temporal)
- **Projection:** Impact of climate change (scenario)

### Key Distinction

**Forecasts assume trends continue. Projections explore alternatives.**

### In This Project

- Well productivity prediction (spatial)
- 30-day water level forecast (temporal)
- Climate change projections (scenarios)

---

## 6. Clustering Purposes

### Different Goals

**Computer Science:** Find groups that minimize within-cluster variance

**Hydrogeology:** Delineate aquifer compartments with similar flow properties

**Statistics:** Identify distinct statistical populations

### Critical Difference

- CS: k chosen by elbow plot or silhouette score
- Hydro: k should match expected geological units
- Stats: k validated by mixture model BIC

### In This Project

We constrain k=6 for stratigraphic units (domain knowledge) rather than optimize k statistically.

---

## 7. Depth vs Elevation

### Common Confusion

**These are NOT interchangeable!**

::: {.callout-warning icon=false}
## ⚠️ Critical Beginner Mistake

Many newcomers confuse depth and elevation. This causes serious analysis errors!

**The key difference:**

- **Depth** goes **DOWN** from where you stand (depth = 0 at surface, increases downward)
- **Elevation** goes **UP** from sea level (elevation increases upward, like a mountain)

**They move in opposite directions!**
:::

### Depth to Water (DTW)

- **What it is:** How far down to reach water (feet or meters)
- **Direction:** Increases when water level drops (more depth to reach water)
- **Reference:** Measured from land surface (where you stand)
- **Used for:** Well drilling depth, pumping lift
- **Example:** "Water is 10 meters deep" = 10 meters below your feet

### Water Surface Elevation (WSE)

- **What it is:** Height of water surface above sea level
- **Direction:** Decreases when water level drops (surface is lower)
- **Reference:** Measured from mean sea level (like mountain heights)
- **Used for:** Hydraulic gradients, flow direction
- **Comparable:** Can compare between wells at different surface elevations
- **Example:** "Water surface elevation is 195 meters" = 195 meters above sea level

### Formula

```text
WSE = Land_Surface_Elevation - Depth_To_Water
```

::: {.callout-note icon=false}
## Concrete Example

**Well A:**

- Land surface elevation: 210 meters above sea level
- Depth to water: 15 meters below surface
- Water surface elevation: 210 - 15 = **195 meters** above sea level

**Well B (1 km away):**

- Land surface elevation: 205 meters above sea level
- Depth to water: 8 meters below surface
- Water surface elevation: 205 - 8 = **197 meters** above sea level

**What this tells us:**

- Water flows from B (197 m) to A (195 m) because the water surface elevation is higher at B
- The depths to water (15 m vs 8 m) differ by 7 m, but the water surfaces differ by only 2 m, because the land surfaces differ by 5 m
- **You cannot compare depths directly between wells at different elevations!**

**Why WSE matters:** It tells you which direction groundwater flows (high → low elevation), regardless of surface topography.
:::

### In This Project

- **Database stores:** DTW (measured directly)
- **Analysis uses:** WSE (calculated from formula)
- **Flow direction:** Determined by WSE gradients, not DTW
- **See:** [Data Dictionary](data-dictionary.qmd) for database schema and column definitions

---

## 8. Resistivity vs Conductivity

### Geophysics Clarification

**Resistance (Ω):**

- Property of a specific object
- Depends on geometry

**Resistivity (Ω·m):**

- Material property (independent of geometry)
- What HTEM measures
- Inverse of electrical conductivity

**Electrical Conductivity (S/m or mS/m):**

- Inverse of resistivity: σ = 1/ρ
- Higher in saline water, lower in fresh water

**Hydraulic Conductivity (m/day):**

- Completely different! (flow property, not electrical)
- Can correlate with resistivity (sand = high K, high ρ)

### In This Project

- HTEM measures resistivity (ρ in Ω·m)
- Clay: 1-10 Ω·m (low resistivity, i.e. high electrical conductivity; low hydraulic conductivity)
- Sand: 100-1000 Ω·m (high resistivity, i.e. low electrical conductivity; high hydraulic conductivity)

---

# Discipline-Specific Glossaries

## Computer Science → Hydrogeology

**When you say...** → **Hydrogeologists mean...**

- "Training data" → Wells with known lithology
- "Test data" → New wells or blind validation set
- "Features" → Geophysical measurements + spatial coordinates
- "Labels" → Material types from drill logs
- "Model prediction" → Lithology interpretation
- "Model uncertainty" → Geological uncertainty / non-uniqueness
- "Hyperparameter tuning" → Model calibration
- "Feature importance" → Sensitivity analysis
- "Ensemble model" → Multiple scenarios / realizations
- "Cross-validation" → Independent validation wells

---

## Hydrogeology → Computer Science

**When you say...** → **Computer scientists mean...**

- "Hydraulic head" → Target variable (regression)
- "Transmissivity" → Derived feature (from multiple sources)
- "Aquifer heterogeneity" → High data variance / noise
- "Anisotropy" → Directional features matter
- "Boundary conditions" → Model constraints
- "Calibration" → Training / fitting
- "Validation" → Test set evaluation
- "Conceptual model" → Model architecture choice
- "Uncertainty" → Prediction confidence intervals
- "Sensitivity analysis" → Feature importance / ablation study

---

## Statistics → Hydrogeology

**When you say...** → **Hydrogeologists mean...**

- "Random variable" → Measured quantity with uncertainty
- "Probability distribution" → Range of plausible values
- "Spatial process" → Geological property field
- "Stochastic simulation" → Multiple equally likely realizations
- "Bayesian inference" → Updating understanding with new data
- "Prior distribution" → Geological expectation before data
- "Likelihood" → Consistency with observations
- "Posterior distribution" → Updated geological understanding

---

# Quick Reference Cards

## For Computer Scientists

### Key Concepts

1. **Hydraulic head** = Potential energy of water (drives flow)
2. **Darcy's Law** = Q = -K·A·(dh/dl) (groundwater's Ohm's Law)
3. **Aquifer types:** Confined (pressurized) vs Unconfined (water table)
4. **Transmissivity** = How easily water flows horizontally (T = K × b, with b the saturated thickness)
5. **Storativity** = How much water is stored/released per unit head change
6. **Recharge** = Water entering aquifer (from precipitation)
7. **Discharge** = Water leaving aquifer (to wells, streams)

### Physics Constraints Required

- Water flows down the hydraulic gradient (from high head to low head)
- Conservation of mass (water balance)
- Properties vary smoothly (geological continuity)
- Anisotropic (horizontal K ≠ vertical K)

---

## For Hydrogeologists

### Key Concepts

1. **Supervised learning** = You provide examples (wells + lithology)
2. **Features** = Variables input to model (depth, resistivity, etc.)
3. **Overfitting** = Model memorizes training data, fails on new data
4. **Cross-validation** = Test on data not used for training
5. **Regularization** = Penalizing overly complex models
6. **Ensemble methods** = Combining multiple models (like multiple realizations)
7. **Neural networks** = Flexible non-linear models (like complex transfer functions)

### Common Pitfalls

- Don't trust models on data outside training range (extrapolation)
- Spatial autocorrelation violates independence assumptions
- More features ≠ better (curse of dimensionality)
- Correlation ≠ causation (even strong correlations)

---

## For Statisticians

### Key Concepts

1. **Physical constraints** limit model flexibility (water flows from high head to low head)
2. **Geological processes** create spatial structure (not random)
3. **Measurement errors** are often systematic (sensor drift, calibration)
4. **Missing data** is rarely random (wells are drilled where water is needed)
5. **Outliers** often = interesting phenomena (not errors)
6. **Time scales** matter (recharge takes weeks, regional flow takes years)

### Statistical Challenges

- Small sample sizes (expensive to drill wells)
- High-dimensional but sparse (many features, few samples)
- Non-stationary processes (climate change, land use)
- Censored data (detection limits, regulatory thresholds)

---

# Project-Specific Examples

## Example 1: K-means HTEM

### Computer Science Perspective

```python
from sklearn.cluster import KMeans

# resistivity_features: array of shape (n_samples, n_features)
kmeans = KMeans(n_clusters=6, random_state=0)  # minimize within-cluster variance
clusters = kmeans.fit_predict(resistivity_features)  # one cluster label per sample
```

### Hydrogeology Perspective

"We're grouping similar resistivity values to delineate geological units (A-F). The k=6 is chosen because we expect 6 stratigraphic layers, not from elbow plot optimization."

### Statistics Perspective

"This resembles Gaussian mixture modeling with hard assignments and equal, spherical covariances. We assume 6 components. Could validate with BIC, but domain knowledge constrains k."

---

## Example 2: ARIMA Forecasting

### Statistics Perspective

```python
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(water_levels, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()                  # parameters must be estimated before forecasting
forecast = results.forecast(steps=30)  # steps are in the series' own time step
```

### Hydrogeology Perspective

"We're predicting future water levels accounting for seasonal recharge cycles (12-month period) and short-term trends. The AR(1) component captures aquifer memory."

### Computer Science Perspective

"Time series model that uses past values to predict the future. The (p,d,q) and seasonal orders are hyperparameters chosen by AIC/BIC or domain knowledge (12-month cycle)."
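To make the differencing terms in the orders above more concrete, here is a small pure-Python sketch of what d=1 and seasonal D=1 with a 12-step period do to a series. The data are invented for illustration (a hypothetical monthly series in centimetres with a linear trend plus a fixed annual cycle), and `difference` is a helper defined here, not a statsmodels function.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical monthly water levels (cm above a datum):
# a base level, a steady rise of 2 cm per month, and a repeating annual cycle.
seasonal_pattern = [0, 1, 3, 5, 6, 5, 3, 1, -1, -2, -2, -1]
levels = [200 + 2 * t + seasonal_pattern[t % 12] for t in range(36)]

# Seasonal differencing (D=1, period 12) removes the annual cycle
# and leaves the year-over-year change.
deseasonalized = difference(levels, lag=12)

# Ordinary differencing (d=1) then removes what is left of the trend.
stationary = difference(deseasonalized, lag=1)

print(deseasonalized[:3])
print(stationary[:3])
```

With real data the result is not exactly constant; what remains after differencing is the part that the AR and MA terms are fitted to.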
---

## Example 3: Interpolation Choice

### When to Use Kriging

- Need uncertainty estimates (kriging variance)
- Data follows Gaussian assumptions
- Spatial autocorrelation is primary pattern
- Interpretability matters

### When to Use ML

- Non-stationary processes
- Multiple covariates available
- Non-linear relationships
- Large datasets (>100k points)

### In This Project

We use **both**, compare results, and choose based on validation metrics.

---

# Visual Concept Map

```mermaid
graph TD
    A[Environmental Data Science] --> B[Computer Science]
    A --> C[Hydrogeology]
    A --> D[Statistics]
    A --> E[Geophysics]
    B --> B1[Algorithms]
    B --> B2[Data Structures]
    B --> B3[Software Engineering]
    C --> C1[Aquifer Properties]
    C --> C2[Flow Systems]
    C --> C3[Water Quality]
    D --> D1[Spatial Statistics]
    D --> D2[Time Series]
    D --> D3[Uncertainty]
    E --> E1[EM Theory]
    E --> E2[Inversion]
    E --> E3[Material Properties]
    B1 -.-> C2
    C1 -.-> D1
    E3 -.-> C1
    D2 -.-> C2
```

---

# Operations & Decision Support Terminology

This section covers terms commonly used in Part 5 (Predictive Operations) that bridge machine learning, optimization, and water management.

## Model Performance Metrics

| Term | What It Means | When to Use | Example |
|------|---------------|-------------|---------|
| **R² (R-squared)** | Proportion of variance explained by model (0-1). Higher = better fit. | Comparing models, regression tasks | R² = 0.85 means model explains 85% of water level variation |
| **RMSE** | Root Mean Square Error: typical prediction error in original units, weighted toward large errors | Understanding "how wrong" predictions are | RMSE = 0.3 m means predictions are typically off by about 0.3 meters |
| **MAE** | Mean Absolute Error: average absolute prediction error | When a metric less sensitive to outliers than RMSE is needed | MAE = 0.2 m means average absolute error is 0.2 meters |
| **Accuracy** | Percentage of correct classifications | Classification tasks (sand vs clay) | 86% accuracy = 86 of 100 predictions correct |
| **Precision** | Of predictions labeled "positive", how many were correct | When false positives are costly | 90% precision = 9 of 10 "sand" predictions were actually sand |
| **Recall** | Of actual positives, how many the model found | When false negatives are costly | 80% recall = found 8 of 10 actual sand locations |

## Optimization Terminology

| Term | What It Means | Water Management Context |
|------|---------------|-------------------------|
| **Pareto frontier** | Set of solutions where improving one objective worsens another | Trade-off between well yield (want high) and drilling cost (want low) |
| **Multi-objective optimization** | Finding best trade-offs across competing goals | Balancing yield, cost, uncertainty, and sustainability simultaneously |
| **Constraint** | Hard limit that cannot be violated | "Well must be >500m from contamination source" |
| **Objective function** | Mathematical formula being optimized | Maximize: 0.35×Yield + 0.25×(1-Cost) + 0.25×Confidence + 0.15×Sustainability |
| **Risk-adjusted NPV** | Net Present Value accounting for uncertainty | Expected value × probability of success |

## Explainability & Trust

| Term | What It Means | Why It Matters |
|------|---------------|----------------|
| **SHAP values** | Feature contribution to individual predictions | "This well predicted as sand because: 40% from resistivity, 30% from depth, 20% from location" |
| **Feature importance** | Global ranking of which inputs matter most | "Across all predictions, resistivity is most important (35%), then depth (25%)" |
| **Black box model** | Model where internal logic is hidden | Neural networks: accurate but hard to explain to stakeholders |
| **Interpretable model** | Model with transparent logic | Decision trees: can show exact rules, e.g. "If resistivity > 100 AND depth < 50m → Sand" |
| **Confidence interval** | Range that would contain the true value in a stated fraction of repeated analyses | "Yield = 135 GPM ± 15 GPM (95% CI)" gives the range 120-150 GPM, built by a procedure that captures the true yield 95% of the time |
| **Prediction interval** | Range where future observations likely fall | Wider than a CI because it includes both model and data uncertainty |

## Common Confusion Pairs

| Term 1 | Term 2 | The Difference |
|--------|--------|----------------|
| **Parameter** (ML) | **Parameter** (Hydro) | ML: Weights learned during training. Hydro: Physical properties (transmissivity, storativity) |
| **Hyperparameter** | **Parameter** | Hyperparameter: Set before training (e.g., tree depth). Parameter: Learned during training |
| **Training** | **Calibration** | Training = ML term. Calibration = Hydro term. Both mean fitting a model to data |
| **Validation** | **Verification** | Validation: Does model perform well? Verification: Is model coded correctly? |
| **Uncertainty** | **Error** | Uncertainty: Range of possible values. Error: Difference between prediction and actual |
| **Forecast** | **Prediction** | Forecast: Future values (time-dependent). Prediction: Any estimated value |
| **Overfitting** | **Over-parameterization** | Both mean: Model too complex for available data, won't generalize |

## Autocorrelation Interpretation Guide

When analyzing water level time series, autocorrelation (ACF) values tell you about aquifer "memory":

| ACF at Lag | Physical Meaning | Aquifer Type Indication |
|------------|------------------|------------------------|
| **ACF(1 month) = 0.95** | Very high memory: this month almost completely predicts next month | Confined aquifer, slow response |
| **ACF(1 month) = 0.50** | Moderate memory: this month explains ~25% of next month's variance (0.5²) | Semi-confined or deep unconfined |
| **ACF(1 month) = 0.20** | Low memory: rapid response, levels change quickly | Shallow unconfined, stream-connected |
| **ACF(12 months) = 0.50** | Annual cycle explains ~25% of variance | Strong seasonal forcing (annual recharge pattern) |
| **ACF decays slowly** | Long-term persistence, multi-year droughts/wet periods | Climate-dominated system, slow recovery |
| **ACF decays quickly** | Short-term memory only | Weather-dominated system, fast recovery |

**Rule of thumb:** Confined aquifers typically show ACF(12 months) > 0.3; unconfined aquifers show ACF(12 months) < 0.2.

---

# Contributing to This Guide

## How to Contribute

### Found a Term That Needs Translation?

Submit a PR or issue with:

- The term in your discipline
- How it's used in context
- Potential equivalents in other disciplines
- Example from this project

### Disagree with a Translation?

Translations are nuanced! Start a discussion:

- Explain your perspective
- Provide references if available
- Suggest alternative phrasing

### Want to Add a Discipline?
We welcome additional perspectives:

- Climate science
- Ecology
- Economics
- Policy/regulation
- Engineering

---

# Further Resources

## Books Bridging Disciplines

- **CS ↔ Hydrogeology:** "Hydrogeological Data Analysis" by Kitanidis
- **Stats ↔ Spatial:** "Statistics for Spatial Data" by Cressie
- **ML ↔ Hydrology:** "Data-Driven Modeling of Environmental Systems" by Reichstein et al.

## Online Glossaries

- USGS Water Science Glossary: [water.usgs.gov/edu/dictionary.html](https://water.usgs.gov/edu/dictionary.html)
- Machine Learning Glossary: [ml-cheatsheet.readthedocs.io](https://ml-cheatsheet.readthedocs.io)
- Geostatistics Glossary: [geostatisticslessons.com](http://www.geostatisticslessons.com)

## Community Forums

- Hydrogeology: [eng-tips.com](https://www.eng-tips.com/threadarea.cfm?fid=305)
- Data Science: [stats.stackexchange.com](https://stats.stackexchange.com)
- Geophysics: [seg.org/connect](https://seg.org/connect)

---

# Summary

This translation guide serves as a **living bridge** between disciplines. As the project evolves, so will this guide.

**Goal:** When a computer scientist, hydrogeologist, statistician, or geophysicist reads the same analysis, they should each understand it in their own terms while appreciating what the other disciplines contribute.

**Next Steps:**

1. Bookmark this page for quick reference
2. Use Ctrl+F to search for terms as needed
3. Suggest additions via issues/PRs
4. Share with colleagues from other disciplines

**Questions?** Open an issue with the `terminology` label.