54 Terminology Translation Guide
The Rosetta Stone for interdisciplinary aquifer data science
55 Why Translation Matters
Environmental data science requires collaboration across disciplines that often use different words for the same concepts, or worse, the same words for different concepts. This chapter serves as a living translation guide to bridge these gaps.
Who this helps:
- Computer scientists learning hydrogeology terminology
- Hydrogeologists learning data science methods
- Statisticians understanding domain context
- Geophysicists connecting EM theory to analysis
- Students navigating multiple disciplines
How to use this guide:
- Search: Use Ctrl+F / Cmd+F to find any term quickly.
- Read across: Check equivalent concepts in other disciplines.
- Check confusions: See “Common Confusion” sections for pitfalls.
- See examples: “In This Project” shows concrete applications.
If you are completely new to groundwater, start with the Plain-Language Basics below. You do not need to memorize definitions; come back here whenever a chapter uses a term you do not recognize.
56 Plain-Language Basics
These are the core ideas that appear throughout the playbook, written for readers with no water background.
| Term | Plain Description | Why It Matters in This Playbook | Real Example |
|---|---|---|---|
| Groundwater | Water stored in pores and cracks of rocks and sediments underground. | It is the main water source we are trying to understand and manage. | When you drill a well 50 meters deep and water fills it to 10 meters below surface, that’s groundwater from the aquifer. |
| Aquifer | A body of rock or sediment that can store and transmit usable amounts of groundwater—like a buried sponge. | Most of the analyses ask: Where is the aquifer? How full is it? How does it respond to weather and pumping? | Unit D (Mahomet Aquifer) is a buried sand/gravel valley 12-96m deep that stores billions of gallons of water. |
| Confining layer | A layer of clay or rock that does not let much water pass through. | Protects deeper aquifers from quick changes at the land surface, making them respond more slowly. | Unit E (clay layer above Unit D) prevents surface spills from quickly reaching the drinking water aquifer below. |
| Recharge | Water that soaks down from the surface (rain, snowmelt, irrigation) to refill the aquifer. | Links weather and land-surface processes to long-term groundwater levels. | Spring rains in Illinois soak through soil → percolate down through sand → raise water levels in Unit D over weeks to months. |
| Well | A hole drilled into the ground to reach groundwater, often with sensors that measure water level. | Provides direct observations of how the aquifer is behaving at specific locations. | Our 356 observation wells measure water levels every 15 minutes, creating a 1-million-record time series. |
| Water level | The height of groundwater in a well, usually measured relative to a reference point. | Rising or falling water levels tell us if the aquifer is gaining or losing storage. | If water level rises 2 meters in spring, the aquifer gained storage (recharged). If it drops 1 meter in summer, it lost storage. |
| HTEM | Helicopter-borne geophysical survey that measures how the ground resists electrical currents. | Gives us a 3D picture of underground materials without drilling, which we link to aquifer properties. | 2008 helicopter survey mapped 2,361 km² in weeks—would take decades and millions of dollars to drill that many wells. |
| Resistivity | A measure of how strongly a material resists electric current; clays are low, sands and gravels are high. | Used as a proxy for material type and aquifer quality in HTEM maps. | Clay: 5-30 Ω·m (low), Sand: 100-200 Ω·m (high). High resistivity = good aquifer material. |
| Confined aquifer | An aquifer trapped between confining layers, reacting mainly to pressure changes, not directly to surface water table. | Explains why some wells show tiny seasonal swings but long-term memory of past conditions. | When Unit D is sealed by clay above/bedrock below, water levels change slowly (±0.5m) but track multi-year climate patterns. |
| Unconfined aquifer | An aquifer whose top surface is the water table, directly connected to the surface. | Responds quickly to rain and drought with larger seasonal swings. | Shallow aquifers near streams can swing 3-5 meters seasonally, rising quickly after rain, dropping in summer. |
| Hydraulic head | The potential energy of water at a point (combination of elevation and pressure). | Water flows from high head to low head—this determines groundwater flow direction. | If Well A has head of 200m and Well B has 195m (5km away), water flows from A→B at ~1 meter drop per kilometer. |
| Transmissivity | How easily water flows horizontally through the full thickness of an aquifer. | High transmissivity = wells produce more water, aquifer recovers faster from pumping. | Good sand aquifer: T = 1000 m²/day (productive). Clay layer: T = 1 m²/day (poor, can’t supply wells). |
| Storativity | The volume of water an aquifer releases (or stores) per unit area per unit head change. | Determines how much water level drops when you pump, or rises when it rains. | Unconfined: S = 0.15 (15% of aquifer volume drainable). Confined: S = 0.0001 (only 0.01% released by pressure). |
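The storativity row can be made concrete with a short calculation. The sketch below applies the definition directly (volume released = S × area × head change) using the two S values from the table; the function name, the 1 km² area, and the 1 m head decline are illustrative assumptions, not project values.

```python
def released_volume_m3(storativity: float, area_m2: float, head_drop_m: float) -> float:
    """Volume released from storage, V = S * A * dh (definition of storativity)."""
    return storativity * area_m2 * head_drop_m


# Illustrative assumptions: 1 km^2 of aquifer and a 1 m decline in head.
AREA_M2 = 1_000_000.0
HEAD_DROP_M = 1.0

unconfined_m3 = released_volume_m3(0.15, AREA_M2, HEAD_DROP_M)   # S from the table
confined_m3 = released_volume_m3(0.0001, AREA_M2, HEAD_DROP_M)   # S from the table

print(f"Unconfined: {unconfined_m3:,.0f} m^3, confined: {confined_m3:,.0f} m^3")
```

For the same head decline, the unconfined case releases roughly 1,500 times more water, which is why a small head change in a confined aquifer does not imply a small pressure response.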
57 Core Concept Translations
57.1 Master Translation Table
| Computer Science | Hydrogeology | Statistics | Geophysics | Unified Meaning |
|---|---|---|---|---|
| Outlier detection | Anomalous water levels | Statistical anomaly | Measurement error | Identifying observations that deviate from expected patterns - requires domain context to interpret |
| Feature engineering | Aquifer properties | Predictor variables | Material parameters | Transforming raw observations into model inputs that capture relevant physics |
| Clustering | Aquifer compartments | Spatial grouping | Material zones | Identifying natural groupings where similar properties occur together |
| Classification | Lithology mapping | Categorical prediction | Material identification | Assigning observations to discrete categories (e.g., sand vs clay) |
| Regression | Empirical relationships | Continuous prediction | Forward modeling | Predicting continuous values (e.g., water level, resistivity) |
| Time series forecasting | Water level prediction | ARIMA/Prophet | - | Extrapolating temporal patterns into the future |
| Dimensionality reduction | Stratigraphic simplification | PCA/Factor analysis | Layer averaging | Reducing complexity while preserving essential information |
| Interpolation | Spatial estimation | Kriging/IDW | Grid generation | Estimating values at unobserved locations from nearby measurements |
| Cross-validation | Independent validation | Model assessment | Test-train split | Evaluating model performance on data not used for training |
| Hyperparameter tuning | Model calibration | Parameter optimization | Inversion tuning | Finding optimal configuration for model performance |
| Supervised learning | Training on known lithology | Labeled data modeling | Constrained inversion | Learning from observations with known outcomes |
| Unsupervised learning | Exploratory analysis | Pattern discovery | Data-driven zonation | Finding structure in data without predefined labels |
| Ensemble methods | Multi-model prediction | Bagging/Boosting | Combined inversions | Combining multiple models to improve predictions |
| Neural networks | Non-linear modeling | Deep learning | Complex mapping | Flexible models that learn hierarchical patterns |
| Gradient descent | Optimization | Iterative minimization | Inversion algorithm | Iteratively improving model by following error gradient |
| Loss function | Misfit function | Error metric | Data residual | Quantifies difference between model predictions and observations |
| Overfitting | Over-parameterization | Poor generalization | Non-unique solution | Model fits training data perfectly but fails on new data |
| Regularization | Parsimony constraint | Penalized regression | Damping/Smoothing | Constraining model complexity to prevent overfitting |
| Batch processing | Bulk analysis | - | Survey-wide processing | Processing multiple records simultaneously for efficiency |
| Pipeline | Workflow | Processing chain | Analysis sequence | Series of automated steps from raw data to results |
58 Spatial Analysis Translations
| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|---|---|---|---|
| Spatial autocorrelation | Aquifer continuity | Tobler’s First Law | Nearby locations are more similar than distant ones |
| Variogram | Spatial structure | Covariance function | How similarity decreases with distance |
| Kriging | Optimal interpolation | BLUE estimation | Best Linear Unbiased Estimator for spatial data |
| Neighborhood search | Zone of influence | Local estimation | Determining which nearby points affect prediction |
| Anisotropy | Directional permeability | Directional correlation | Properties vary differently in different directions |
| Range | Correlation distance | Spatial dependence limit | Maximum distance where spatial correlation exists |
| Sill | Total variance | Asymptotic variance | Variance at distances beyond correlation |
| Nugget | Measurement error | Small-scale variance | Discontinuity at zero distance |
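To show what a variogram actually computes, here is a minimal sketch of an empirical semivariogram in plain Python. It bins point pairs by separation distance and averages half the squared value difference in each bin. The function name, coordinates, and values are hypothetical; a real analysis would go on to fit a model to these points to read off the range, sill, and nugget, which this sketch does not do.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Return {lag_bin_index: semivariance}, where for each distance bin
    gamma = sum((z_i - z_j)^2) / (2 * number_of_pairs)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(points[i], points[j])
            bin_index = int(dist // lag_width)
            sums[bin_index] += (values[i] - values[j]) ** 2
            counts[bin_index] += 1
    return {b: sums[b] / (2 * counts[b]) for b in sorted(sums)}


# Hypothetical transect: four soundings 100 m apart with made-up values.
points = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0), (300.0, 0.0)]
values = [10.0, 12.0, 14.0, 16.0]

gamma = empirical_semivariogram(points, values, lag_width=150.0)
print(gamma)
```

Semivariance rising with bin index is the numerical form of "similarity decreases with distance".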
59 Temporal Analysis Translations
| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|---|---|---|---|
| Autocorrelation | System memory | Temporal dependence | Current values depend on past values |
| Lag | Response time | Time shift | Delay between cause and effect |
| Trend | Long-term change | Systematic component | Non-stationary mean over time |
| Seasonality | Annual cycle | Periodic component | Repeating patterns at fixed intervals |
| Stationarity | Equilibrium | Constant statistics | Statistical properties don’t change over time |
| Differencing | Change analysis | Detrending | Removing non-stationarity by subtracting previous values |
| Decomposition | Component separation | STL/Seasonal | Breaking time series into trend, seasonal, residual |
| Change point | Regime shift | Structural break | Time when system behavior fundamentally changes |
| Wavelet analysis | Multi-scale patterns | Time-frequency | Identifying patterns at multiple timescales |
60 Data Quality Translations
| Computer Science | Hydrogeology | Statistics | Unified Meaning |
|---|---|---|---|
| Missing data | Measurement gaps | NA/NaN values | Observations not recorded or lost |
| Imputation | Gap-filling | Missing value estimation | Estimating missing values from available data |
| Normalization | Unit conversion | Standardization | Scaling variables to common range |
| Filtering | Data cleaning | Outlier removal | Removing erroneous or irrelevant observations |
| Resampling | Time aggregation | Temporal binning | Changing temporal resolution (hourly → daily) |
| Data fusion | Multi-source integration | Data combination | Merging different data types for joint analysis |
| Quality flags | Data codes | Data qualifiers | Indicators of reliability or issues |
61 Model Performance Translations
| Computer Science | Statistics | Hydrogeology | Unified Meaning |
|---|---|---|---|
| Accuracy | Correct classification rate | Prediction success | Fraction of predictions that are correct |
| Precision | Positive predictive value | - | Of predicted positives, how many are correct |
| Recall | Sensitivity / TPR | - | Of actual positives, how many were found |
| F1 score | Harmonic mean of precision and recall | - | Single score that balances precision and recall |
| RMSE | Root mean squared error | Prediction error | Typical magnitude of prediction errors, with large errors weighted more heavily |
| R² | Coefficient of determination | Variance explained | Proportion of variance captured by model |
| AIC/BIC | Information criterion | Model parsimony | Balances model fit with complexity |
| Confusion matrix | Classification table | Contingency table | Cross-tabulation of predicted vs actual classes |
62 Common Confusion Points
62.1 1. Spatial Autocorrelation
62.1.1 What Each Discipline Says
Computer Science: “Data points that are close together have similar values. This violates the i.i.d. assumption of most ML algorithms.”
Hydrogeology: “Aquifer properties vary smoothly across space due to depositional processes. Tobler’s First Law: Everything is related to everything else, but near things are more related than distant things.”
Statistics: “The covariance structure depends on distance. We model this with variograms and use spatial cross-validation instead of random splits.”
62.1.2 Why It Matters
- Random train/test splits give overly optimistic scores (nearby points landing in both train and test leak information)
- Must use spatial CV or block CV
- Predictions inherit spatial structure from training data
- Uncertainty quantification requires spatial correlation modeling
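One way to picture block CV is the following sketch, which assigns points to square grid blocks and holds out one whole block per fold. The function name, coordinates, and the 1000 m block size are assumptions for illustration; in practice the block size would be chosen at least as large as the variogram range, and points near block edges can still sit close to training points in a neighboring block.

```python
from collections import defaultdict


def spatial_block_folds(coords, block_size_m):
    """Group point indices by grid block, then yield (train, test) index lists
    with one entire block held out per fold."""
    blocks = defaultdict(list)
    for idx, (x, y) in enumerate(coords):
        key = (int(x // block_size_m), int(y // block_size_m))
        blocks[key].append(idx)
    for held_out in sorted(blocks):
        test = blocks[held_out]
        train = [i for key in sorted(blocks) if key != held_out for i in blocks[key]]
        yield train, test


# Hypothetical well coordinates in meters (three clusters of two wells each).
coords = [(10, 10), (40, 30), (1020, 15), (1100, 90), (30, 1500), (80, 1480)]

folds = list(spatial_block_folds(coords, block_size_m=1000))
for train, test in folds:
    print("train:", train, "test:", test)
```

Unlike a random split, neighboring wells always end up on the same side of the train/test boundary.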
62.1.3 In This Project
- HTEM resistivity shows strong spatial autocorrelation (range ~500-1000m)
- Well measurements are spatially correlated (aquifer continuity)
- See: Part 2 - Spatial Patterns for variogram analysis
62.2 2. Feature vs Property
62.2.1 What Each Discipline Says
Computer Science: “Features are input variables to a model. We engineer features by transforming raw data (e.g., log transform, polynomial features, interactions).”
Hydrogeology: “Aquifer properties are physical characteristics: transmissivity (T), storativity (S), hydraulic conductivity (K). These come from pumping tests and geological analysis.”
62.2.2 How They Connect
- CS features ← Derived from → Hydro properties
- feature_depth = Z-coordinate → Physical: confining_pressure
- feature_resistivity_log → Physical: clay_content
62.2.3 In This Project
- HTEM resistivity → Feature for predicting material type
- Depth, elevation, neighboring values → Features
- True aquifer properties (K, T, S) are target variables OR constraints
62.3 3. Outlier vs Anomaly
62.3.1 What Each Discipline Says
Computer Science: Outlier = Statistical anomaly (>3σ from mean)
Hydrogeology: Anomaly = Unexpected measurement (could be real or error)
62.3.2 What It Could Be
- Measurement error (sensor malfunction) → Remove
- Pump test (intentional drawdown) → Flag, don’t remove
- Natural event (earthquake, flood) → Keep, study
- Contamination plume (localized change) → Key finding!
62.3.3 Decision Rule
Don’t auto-remove outliers! Investigate with domain knowledge.
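The rule above can be sketched as "flag, then investigate" rather than "detect, then delete". The example below marks values more than 3σ from the mean but leaves the data untouched. The function name and the water-level numbers are made up for illustration, and the 3σ threshold is simply the convention quoted earlier in this section.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold_sigma=3.0):
    """Return one boolean per value: True if it lies more than
    threshold_sigma sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) / sigma > threshold_sigma for v in values]


# Hypothetical record: 19 steady readings and one abrupt excursion (meters).
levels_m = [10.0] * 19 + [20.0]

flags = flag_outliers(levels_m)
flagged = [(i, v) for i, (v, is_flagged) in enumerate(zip(levels_m, flags)) if is_flagged]

# Nothing is removed; flagged rows go to a person who checks field notes,
# pump-test logs, and neighboring wells before deciding what they mean.
print(flagged)
```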
62.3.4 In This Project
- Water level “outliers” often = pump tests (intentional)
- Resistivity “outliers” often = geological contacts (real)
- See: Part 1 - Data Quality for outlier handling
62.4 4. Training vs Calibration
62.4.1 What Each Discipline Says
Computer Science: Training = Fitting model parameters to minimize loss function on labeled data
Hydrogeology: Calibration = Adjusting model parameters until model matches field observations
Geophysics: Inversion = Estimating subsurface structure from surface measurements (ill-posed problem)
62.4.2 Similarities
- All optimize parameters to match observations
- All risk overfitting
62.4.3 Differences
- Training: Large labeled dataset, many samples
- Calibration: Few observations, physics-based model
- Inversion: Underdetermined (infinitely many models fit the data), requires regularization
62.4.4 In This Project
- HTEM interpretation = Inversion (airborne EM measurements → subsurface resistivity model)
- Material type classifier = Training (supervised learning)
- Groundwater model = Calibration (match observed heads)
62.5 5. Prediction vs Forecast
62.5.1 Statistics Perspective
- Prediction: Estimating unknown values (spatial or temporal)
- Forecast: Predicting future values (temporal only)
- Projection: Conditional “what-if” scenarios
62.5.2 Hydrogeology Perspective
- Prediction: Where to drill for water (spatial)
- Forecast: Water levels next month (temporal)
- Projection: Impact of climate change (scenario)
62.5.3 Key Distinction
Forecasts assume trends continue. Projections explore alternatives.
62.5.4 In This Project
- Well productivity prediction (spatial)
- 30-day water level forecast (temporal)
- Climate change projections (scenarios)
62.6 6. Clustering Purposes
62.6.1 Different Goals
Computer Science: Find groups that minimize within-cluster variance
Hydrogeology: Delineate aquifer compartments with similar flow properties
Statistics: Identify distinct statistical populations
62.6.2 Critical Difference
- CS: k chosen by elbow plot or silhouette score
- Hydro: k should match expected geological units
- Stats: k validated by mixture model BIC
62.6.3 In This Project
We constrain k=6 for stratigraphic units (domain knowledge) rather than optimize k statistically.
62.7 7. Depth vs Elevation
62.7.1 Common Confusion
These are NOT interchangeable!
62.7.2 Depth to Water (DTW)
- What it is: How far down to reach water (feet or meters)
- Direction: Increases when water level drops (more depth to reach water)
- Reference: Measured from land surface (where you stand)
- Used for: Well drilling depth, pumping lift
- Example: “Water is 10 meters deep” = 10 meters below your feet
62.7.3 Water Surface Elevation (WSE)
- What it is: Height of water surface above sea level
- Direction: Decreases when water level drops (surface is lower)
- Reference: Measured from mean sea level (like mountain heights)
- Used for: Hydraulic gradients, flow direction
- Comparable: Can compare between wells at different surface elevations
- Example: “Water surface elevation is 195 meters” = 195 meters above sea level
62.7.4 Formula
WSE = Land_Surface_Elevation - Depth_To_Water
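A small worked example shows why the conversion matters. In this sketch the well elevations and depths are hypothetical, and the function names are illustrative rather than taken from the project code. Well B has the shallower depth to water, yet its water surface is lower, so flow is from A toward B.

```python
def water_surface_elevation(land_surface_elev_m: float, depth_to_water_m: float) -> float:
    """WSE = land surface elevation - depth to water (same length unit for both)."""
    return land_surface_elev_m - depth_to_water_m


def hydraulic_gradient(wse_upgradient_m: float, wse_downgradient_m: float, distance_m: float) -> float:
    """Head drop per unit horizontal distance between two wells (dimensionless)."""
    return (wse_upgradient_m - wse_downgradient_m) / distance_m


# Hypothetical wells, 5 km apart.
wse_a = water_surface_elevation(land_surface_elev_m=210.0, depth_to_water_m=10.0)
wse_b = water_surface_elevation(land_surface_elev_m=201.0, depth_to_water_m=6.0)

gradient = hydraulic_gradient(wse_a, wse_b, distance_m=5000.0)
print(f"WSE A = {wse_a} m, WSE B = {wse_b} m, gradient = {gradient * 1000:.1f} m per km")
```

Comparing DTW alone (10 m vs 6 m) would suggest the opposite ordering, which is the confusion this section warns about.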
62.7.5 In This Project
- Database stores: DTW (measured directly)
- Analysis uses: WSE (calculated from formula)
- Flow direction: Determined by WSE gradients, not DTW
- See: Data Dictionary for database schema and column definitions
62.8 8. Resistivity vs Conductivity
62.8.1 Geophysics Clarification
Resistance (Ω):
- Property of a specific object
- Depends on geometry
Resistivity (Ω·m):
- Material property (independent of geometry)
- What HTEM measures
- Inverse of electrical conductivity
Electrical Conductivity (S/m or mS/m):
- Inverse of resistivity: σ = 1/ρ
- Higher in saline water, lower in fresh water
Hydraulic Conductivity (m/day):
- Completely different! (flow property, not electrical)
- Can correlate with resistivity (sand = high K, high ρ)
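The relation σ = 1/ρ is easy to get wrong by a factor of 1000 when switching between S/m and mS/m. The helper below is a minimal sketch of the conversion; the function name is illustrative, and the two input values are round numbers chosen for the example.

```python
def resistivity_to_conductivity_ms_per_m(resistivity_ohm_m: float) -> float:
    """Electrical conductivity in mS/m from resistivity in ohm-meters:
    sigma = 1 / rho in S/m, times 1000 to convert to mS/m."""
    if resistivity_ohm_m <= 0:
        raise ValueError("resistivity must be positive")
    return 1000.0 / resistivity_ohm_m


clay_like = resistivity_to_conductivity_ms_per_m(10.0)    # low resistivity
sand_like = resistivity_to_conductivity_ms_per_m(100.0)   # high resistivity

print(f"10 ohm-m -> {clay_like} mS/m, 100 ohm-m -> {sand_like} mS/m")
```

The low-resistivity material comes out as the better electrical conductor, independent of how well it transmits water.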
62.8.2 In This Project
- HTEM measures resistivity (ρ in Ω·m)
- Clay: 1-10 Ω·m (low resistivity, high electrical conductivity)
- Sand: 100-1000 Ω·m (high resistivity, high hydraulic conductivity)
63 Discipline-Specific Glossaries
63.1 Computer Science → Hydrogeology
When you say… → Hydrogeologists mean…
- “Training data” → Wells with known lithology
- “Test data” → New wells or blind validation set
- “Features” → Geophysical measurements + spatial coordinates
- “Labels” → Material types from drill logs
- “Model prediction” → Lithology interpretation
- “Model uncertainty” → Geological uncertainty / non-uniqueness
- “Hyperparameter tuning” → Model calibration
- “Feature importance” → Sensitivity analysis
- “Ensemble model” → Multiple scenarios / realizations
- “Cross-validation” → Independent validation wells
63.2 Hydrogeology → Computer Science
When you say… → Computer scientists mean…
- “Hydraulic head” → Target variable (regression)
- “Transmissivity” → Derived feature (from multiple sources)
- “Aquifer heterogeneity” → High data variance / noise
- “Anisotropy” → Directional features matter
- “Boundary conditions” → Model constraints
- “Calibration” → Training / fitting
- “Validation” → Test set evaluation
- “Conceptual model” → Model architecture choice
- “Uncertainty” → Prediction confidence intervals
- “Sensitivity analysis” → Feature importance / ablation study
63.3 Statistics → Hydrogeology
When you say… → Hydrogeologists mean…
- “Random variable” → Measured quantity with uncertainty
- “Probability distribution” → Range of plausible values
- “Spatial process” → Geological property field
- “Stochastic simulation” → Multiple equally likely realizations
- “Bayesian inference” → Updating understanding with new data
- “Prior distribution” → Geological expectation before data
- “Likelihood” → Consistency with observations
- “Posterior distribution” → Updated geological understanding
64 Quick Reference Cards
64.1 For Computer Scientists
64.1.1 Key Concepts
- Hydraulic head = Potential energy of water (drives flow)
- Darcy’s Law = Q = -K·A·(dh/dl) (groundwater’s Ohm’s Law)
- Aquifer types: Confined (pressurized) vs Unconfined (water table)
- Transmissivity = How easily water flows horizontally (T = K × b)
- Storativity = How much water is stored/released
- Recharge = Water entering aquifer (from precipitation)
- Discharge = Water leaving aquifer (to wells, streams)
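For readers who think in code, the two formulas in this list can be written as plain functions. This is a sketch under stated assumptions: K = 50 m/day, a 20 m thick aquifer, and a 1 km wide flow section are hypothetical values, and the function names are illustrative.

```python
def transmissivity(k_m_per_day: float, thickness_m: float) -> float:
    """T = K * b, in m^2/day."""
    return k_m_per_day * thickness_m


def darcy_discharge(k_m_per_day: float, area_m2: float, dh_m: float, dl_m: float) -> float:
    """Darcy's Law, Q = -K * A * (dh/dl), in m^3/day.
    dh is the change in head along a flow path of length dl, so a falling
    head (negative dh) gives a positive discharge in the flow direction."""
    return -k_m_per_day * area_m2 * (dh_m / dl_m)


# Hypothetical sand aquifer.
K = 50.0          # hydraulic conductivity, m/day
b = 20.0          # saturated thickness, m
width_m = 1000.0  # width of the flow section, m

T = transmissivity(K, b)
Q = darcy_discharge(K, area_m2=b * width_m, dh_m=-5.0, dl_m=5000.0)

print(f"T = {T} m^2/day, Q is about {Q:.0f} m^3/day")
```

The minus sign in Darcy's Law is what encodes "water flows from high head to low head".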
64.1.2 Physics Constraints Required
- Water flows downhill (hydraulic gradient)
- Conservation of mass (water balance)
- Properties vary smoothly (geological continuity)
- Anisotropic (horizontal K ≠ vertical K)
64.2 For Hydrogeologists
64.2.1 Key Concepts
- Supervised learning = You provide examples (wells + lithology)
- Features = Variables input to model (depth, resistivity, etc.)
- Overfitting = Model memorizes training data, fails on new data
- Cross-validation = Test on data not used for training
- Regularization = Penalizing overly complex models
- Ensemble methods = Combining multiple models (like multiple realizations)
- Neural networks = Flexible non-linear models (like complex transfer functions)
64.2.2 Common Pitfalls
- Don’t trust models on data outside training range (extrapolation)
- Spatial autocorrelation violates independence assumptions
- More features ≠ better (curse of dimensionality)
- Correlation ≠ causation (even strong correlations)
64.3 For Statisticians
64.3.1 Key Concepts
- Physical constraints limit model flexibility (water flows downhill)
- Geological processes create spatial structure (not random)
- Measurement errors are often systematic (sensor drift, calibration)
- Missing data is rarely random (wells where water is needed)
- Outliers often = interesting phenomena (not errors)
- Time scales matter (recharge takes weeks, regional flow takes years)
64.3.2 Statistical Challenges
- Small sample sizes (expensive to drill wells)
- High-dimensional but sparse (many features, few samples)
- Non-stationary processes (climate change, land use)
- Censored data (detection limits, regulatory thresholds)
65 Project-Specific Examples
65.1 Example 1: K-means HTEM
65.1.1 Computer Science Perspective
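```python
from sklearn.cluster import KMeans

# k-means minimizes within-cluster variance; k=6 matches the expected stratigraphic units
kmeans = KMeans(n_clusters=6)
# resistivity_features: array of shape (n_samples, n_features) prepared upstream
clusters = kmeans.fit_predict(resistivity_features)
```
65.1.2 Hydrogeology Perspective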
“We’re grouping similar resistivity values to delineate geological units (A-F). The k=6 is chosen because we expect 6 stratigraphic layers, not from elbow plot optimization.”
65.1.3 Statistics Perspective
“This is equivalent to a Gaussian mixture model with equal spherical covariances and hard assignments, here with 6 components. We could validate k with BIC, but domain knowledge constrains it.”
65.2 Example 2: ARIMA Forecasting
65.2.1 Statistics Perspective
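```python
from statsmodels.tsa.arima.model import ARIMA

# water_levels: monthly series, so the seasonal period is 12
model = ARIMA(water_levels, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
results = model.fit()  # parameters must be estimated before forecasting
forecast = results.forecast(steps=30)  # 30 steps ahead, in the series' own time step
```
65.2.2 Hydrogeology Perspective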
“We’re predicting future water levels accounting for seasonal recharge cycles (12-month period) and short-term trends. The AR(1) component captures aquifer memory.”
65.2.3 Computer Science Perspective
“Time series model that uses past values to predict future. The (p,d,q) and seasonal orders are hyperparameters chosen by AIC/BIC or domain knowledge (12-month cycle).”
65.3 Example 3: Interpolation Choice
65.3.1 When to Use Kriging
- Need uncertainty estimates (kriging variance)
- Data follows Gaussian assumptions
- Spatial autocorrelation is primary pattern
- Interpretability matters
65.3.2 When to Use ML
- Non-stationary processes
- Multiple covariates available
- Non-linear relationships
- Large datasets (>100k points)
65.3.3 In This Project
We use both, compare results, and choose based on validation metrics.
66 Visual Concept Map
```mermaid
graph TD
    A[Environmental Data Science] --> B[Computer Science]
    A --> C[Hydrogeology]
    A --> D[Statistics]
    A --> E[Geophysics]
    B --> B1[Algorithms]
    B --> B2[Data Structures]
    B --> B3[Software Engineering]
    C --> C1[Aquifer Properties]
    C --> C2[Flow Systems]
    C --> C3[Water Quality]
    D --> D1[Spatial Statistics]
    D --> D2[Time Series]
    D --> D3[Uncertainty]
    E --> E1[EM Theory]
    E --> E2[Inversion]
    E --> E3[Material Properties]
    B1 -.-> C2
    C1 -.-> D1
    E3 -.-> C1
    D2 -.-> C2
```
67 Operations & Decision Support Terminology
This section covers terms commonly used in Part 5 (Predictive Operations) that bridge machine learning, optimization, and water management.
67.1 Model Performance Metrics
| Term | What It Means | When to Use | Example |
|---|---|---|---|
| R² (R-squared) | Proportion of variance explained by model (typically 0-1; can be negative on held-out data when the model does worse than predicting the mean). Higher = better fit. | Comparing models, regression tasks | R² = 0.85 means model explains 85% of water level variation |
| RMSE | Root Mean Square Error - average prediction error in original units | Understanding “how wrong” predictions are | RMSE = 0.3 m means predictions typically off by 0.3 meters |
| MAE | Mean Absolute Error - average absolute prediction error in original units | When a few large errors should not dominate the metric (less sensitive to outliers than RMSE) | MAE = 0.2 m means average absolute error is 0.2 meters |
| Accuracy | Percentage of correct classifications | Classification tasks (sand vs clay) | 86% accuracy = 86 of 100 predictions correct |
| Precision | Of predictions labeled “positive”, how many were correct | When false positives are costly | 90% precision = 9 of 10 “sand” predictions were actually sand |
| Recall | Of actual positives, how many did model find | When false negatives are costly | 80% recall = found 8 of 10 actual sand locations |
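The metrics in this table can be computed in a few lines, which also shows how RMSE and MAE differ on the same errors. The sketch below uses only the standard library; the function names and all observed and predicted values are made up for illustration.

```python
import math


def regression_metrics(observed, predicted):
    """Return (RMSE, MAE) in the units of the inputs."""
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


def classification_metrics(actual, predicted, positive="sand"):
    """Return (accuracy, precision, recall) for one class treated as positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    accuracy = sum(a == p for a, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical water levels (m): three perfect predictions and one 2 m miss.
rmse, mae = regression_metrics([10.0, 10.0, 10.0, 10.0], [10.0, 10.0, 10.0, 12.0])

# Hypothetical lithology labels.
actual = ["sand", "sand", "sand", "clay", "clay"]
predicted = ["sand", "sand", "clay", "sand", "clay"]
accuracy, precision, recall = classification_metrics(actual, predicted)

print(f"RMSE={rmse:.2f} m, MAE={mae:.2f} m")
print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
```

The single 2 m miss pulls RMSE to twice the MAE, which is the sense in which RMSE weights large errors more heavily.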
67.2 Optimization Terminology
| Term | What It Means | Water Management Context |
|---|---|---|
| Pareto frontier | Set of solutions where improving one objective worsens another | Trade-off between well yield (want high) and drilling cost (want low) |
| Multi-objective optimization | Finding best trade-offs across competing goals | Balancing yield, cost, uncertainty, and sustainability simultaneously |
| Constraint | Hard limit that cannot be violated | “Well must be >500m from contamination source” |
| Objective function | Mathematical formula being optimized | Maximize: 0.35×Yield + 0.25×(1-Cost) + 0.25×Confidence + 0.15×Sustainability |
| Risk-adjusted NPV | Net Present Value accounting for uncertainty | Expected value × probability of success |
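The weighted objective in the table can be sketched as a scoring function. The weights are the ones quoted above; everything else is an assumption for illustration, including the function name, the requirement that each input is already scaled to the range 0 to 1, and the two candidate sites.

```python
WEIGHTS = {"yield": 0.35, "cost": 0.25, "confidence": 0.25, "sustainability": 0.15}


def site_score(yield_norm: float, cost_norm: float, confidence: float, sustainability: float) -> float:
    """Weighted sum from the table; cost enters as (1 - cost) so cheaper sites score higher.
    All inputs are assumed to be pre-scaled to the range 0..1."""
    for value in (yield_norm, cost_norm, confidence, sustainability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be scaled to the range 0..1")
    return (
        WEIGHTS["yield"] * yield_norm
        + WEIGHTS["cost"] * (1.0 - cost_norm)
        + WEIGHTS["confidence"] * confidence
        + WEIGHTS["sustainability"] * sustainability
    )


# Hypothetical candidates: (yield, cost, confidence, sustainability), all 0..1.
candidates = {
    "site_a": (0.9, 0.8, 0.7, 0.5),  # high yield but expensive
    "site_b": (0.6, 0.2, 0.8, 0.9),  # moderate yield, cheap, sustainable
}

scores = {name: site_score(*attrs) for name, attrs in candidates.items()}
best_site = max(scores, key=scores.get)
print(scores, best_site)
```

Collapsing several objectives into one weighted score picks a single point on the Pareto frontier; changing the weights moves that point.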
67.3 Explainability & Trust
| Term | What It Means | Why It Matters |
|---|---|---|
| SHAP values | Feature contribution to individual predictions | “This well predicted as sand because: 40% from resistivity, 30% from depth, 20% from location” |
| Feature importance | Global ranking of which inputs matter most | “Across all predictions, resistivity is most important (35%), then depth (25%)” |
| Black box model | Model where internal logic is hidden | Neural networks - accurate but hard to explain to stakeholders |
| Interpretable model | Model with transparent logic | Decision trees - can show exact rules: “If resistivity > 100 AND depth < 50m → Sand” |
| Confidence interval | Range that likely contains the true value, given the data and model | “Yield = 135 GPM ± 15 GPM (95% CI)” gives the interval 120-150 GPM, built by a procedure that captures the true yield in 95% of repeated samples |
| Prediction interval | Range where future observations likely fall | Wider than a CI because it includes both model uncertainty and observation noise |
67.4 Common Confusion Pairs
| Term 1 | Term 2 | The Difference |
|---|---|---|
| Parameter (ML) | Parameter (Hydro) | ML: Weights learned during training. Hydro: Physical properties (transmissivity, storativity) |
| Hyperparameter | Parameter | Hyperparameter: Set before training (e.g., tree depth). Parameter: Learned during training |
| Training | Calibration | Training = ML term. Calibration = Hydro term. Both mean fitting model to data |
| Validation | Verification | Validation: Does model perform well? Verification: Is model coded correctly? |
| Uncertainty | Error | Uncertainty: Range of possible values. Error: Difference between prediction and actual |
| Forecast | Prediction | Forecast: Future values (time-dependent). Prediction: Any estimated value |
| Overfitting | Over-parameterization | Both mean: Model too complex for available data, won’t generalize |
67.5 Autocorrelation Interpretation Guide
When analyzing water level time series, autocorrelation (ACF) values tell you about aquifer “memory”:
| ACF at Lag | Physical Meaning | Aquifer Type Indication |
|---|---|---|
| ACF(1 month) = 0.95 | Very high memory - this month almost completely predicts next month | Confined aquifer, slow response |
| ACF(1 month) = 0.50 | Moderate memory - this month’s level explains ~25% of the variance in next month’s level (0.5² = 0.25) | Semi-confined or deep unconfined |
| ACF(1 month) = 0.20 | Low memory - rapid response, levels change quickly | Shallow unconfined, stream-connected |
| ACF(12 months) = 0.50 | Annual cycle explains ~25% of variance | Strong seasonal forcing (annual recharge pattern) |
| ACF decays slowly | Long-term persistence, multi-year droughts/wet periods | Climate-dominated system, slow recovery |
| ACF decays quickly | Short-term memory only | Weather-dominated system, fast recovery |
Rule of thumb: Confined aquifers typically show ACF(12 months) > 0.3; unconfined aquifers show ACF(12 months) < 0.2.
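The ACF values in the table come from a simple calculation, sketched below in plain Python. The function name and the synthetic series (a pure 12-month sine cycle) are assumptions for illustration; real well records would add trend and noise, and library implementations may normalize slightly differently.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation at a given lag (biased estimator: the lagged
    cross-product sum is divided by the full-series sum of squares)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numer = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return numer / denom


# Synthetic monthly series: ten years of a pure annual cycle (illustrative only).
monthly_levels = [math.sin(2 * math.pi * t / 12) for t in range(120)]

acf_12 = autocorrelation(monthly_levels, 12)  # same season, one year later
acf_6 = autocorrelation(monthly_levels, 6)    # opposite season

print(f"ACF(12) = {acf_12:.2f}, ACF(6) = {acf_6:.2f}")
```

A strongly seasonal series shows high positive ACF at 12 months and negative ACF at 6 months; squaring an ACF value gives the share of variance explained by a linear prediction from that lag.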
68 Contributing to This Guide
68.1 How to Contribute
68.1.1 Found a Term That Needs Translation?
Submit a PR or issue with:
- The term in your discipline
- How it’s used in context
- Potential equivalents in other disciplines
- Example from this project
68.1.2 Disagree with a Translation?
Translations are nuanced! Start a discussion:
- Explain your perspective
- Provide references if available
- Suggest alternative phrasing
68.1.3 Want to Add a Discipline?
We welcome additional perspectives:
- Climate science
- Ecology
- Economics
- Policy/regulation
- Engineering
69 Further Resources
69.1 Books Bridging Disciplines
- CS ↔︎ Hydrogeology: “Hydrogeological Data Analysis” by Kitanidis
- Stats ↔︎ Spatial: “Statistics for Spatial Data” by Cressie
- ML ↔︎ Hydrology: “Data-Driven Modeling of Environmental Systems” by Reichstein et al.
69.2 Online Glossaries
- USGS Water Science Glossary: water.usgs.gov/edu/dictionary.html
- Machine Learning Glossary: ml-cheatsheet.readthedocs.io
- Geostatistics Glossary: geostatisticslessons.com
69.3 Community Forums
- Hydrogeology: eng-tips.com
- Data Science: stats.stackexchange.com
- Geophysics: seg.org/connect
70 Summary
This translation guide serves as a living bridge between disciplines. As the project evolves, so will this guide.
Goal: When a computer scientist, hydrogeologist, statistician, or geophysicist reads the same analysis, they should each understand it in their own terms while appreciating what the other disciplines contribute.
Next Steps:
1. Bookmark this page for quick reference
2. Use Ctrl+F to search for terms as needed
3. Suggest additions via issues/PRs
4. Share with colleagues from other disciplines
Questions? Open an issue with the terminology label.
Last Updated: November 26, 2025
Contributors: Open to all
License: CC-BY-4.0 (attribution required)