55 Complete Data Dictionary
Reference for all four data sources
56 Overview
This data dictionary documents all four primary data sources in the Aquifer Intelligence System. The repository contains 10+ GB of integrated hydrogeological data from Champaign County, Illinois.
Data Sources: 1. HTEM geophysical data (electromagnetic surveys) 2. Groundwater monitoring database (well measurements) 3. Weather/climate database (meteorological observations) 4. USGS stream gauge data (surface water discharge)
Total Storage: ~10.44 GB across all sources
Why Four Data Sources?
Each data source sees the aquifer differently: - HTEM: Where is the aquifer? (structure, materials) - Groundwater wells: How full is it? (water levels over time) - Weather: What drives changes? (rain, temperature) - Stream gauges: Where does water go? (surface-groundwater connections)
Together, they tell the complete story of how the aquifer behaves.
56.1 Database Schemas
56.1.1 Groundwater Database (aquifer.db)
Table: OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
-- Primary table for groundwater level analysis
-- ~1 million records from 18 wells (2009-2023)
Column Name | Type | Description
-----------------------------|---------|--------------------------------------------------
P_Number | TEXT | Well identifier (e.g., "444863", "410614")
TIMESTAMP | TEXT | Measurement date (M/D/YYYY format - US style!)
Water_Surface_Elevation | REAL | Elevation of water surface (ft above datum)
DTW_FT_Reviewed | REAL | Depth to water (ft below ground surface)
DateTimeReceived | TEXT | When data was uploaded (not measurement time)
DateTimeChanged | TEXT | Last modification timestampCRITICAL: TIMESTAMP Parsing
# CORRECT - Explicit US format
df['Date'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')
# WRONG - Auto-detection may fail
df['Date'] = pd.to_datetime(df['TIMESTAMP']) # DON'T DO THISTable: OB_LOCATIONS
-- Well metadata (30 wells, different from measurements table!)
-- WARNING: Only ~1 well overlaps with measurements table
Column Name | Type | Description
--------------|---------|------------------------------------------
P_NUMBER | TEXT | Well identifier
LAT_NAD_83 | REAL | Latitude (NAD83 datum)
LONG_NAD_83 | REAL | Longitude (NAD83 datum)
ELEV_FT | REAL | Ground surface elevation56.1.2 Weather Database (warm.db)
Table: WarmICNData
-- Hourly weather observations from Illinois Climate Network
-- ~2.3 million records from 20+ stations
Column Name | Type | Description
--------------|---------|------------------------------------------
station_code | INTEGER | Station identifier (e.g., 101, 102)
datetime | TEXT | Observation timestamp
nPrecip | REAL | Precipitation (mm)
nAirTemp | REAL | Air temperature (°C)
nRelHumid | REAL | Relative humidity (%)
nWindSpeed | REAL | Wind speed (m/s)
nSolarRad | REAL | Solar radiation (W/m²)56.1.3 USGS Stream Data (usgs_stream/)
Files: Parquet format, one per site
Pattern: usgs_stream/{site_id}_daily.parquet
Column Name | Type | Description
--------------|----------|------------------------------------------
datetime | DATETIME | Observation date
discharge_cfs | FLOAT | Daily mean discharge (cubic feet/second)
site_no | STRING | USGS site number (e.g., "03337000")
57 Data Sources Summary
| Source | Location | Size | Records | Description |
|---|---|---|---|---|
| HTEM Geophysical | data/htem/ |
4.74 GB | 600K+ points | Helicopter-borne EM resistivity models |
| Groundwater DB | data/aquifer.db |
113.6 MB | 1.05M | 356 observation wells, time series |
| Weather DB | data/warm.db |
5.99 GB | ~2.3M (hourly) | 20+ stations, hourly to daily data |
| USGS Stream | data/usgs_stream/ |
5-10 MB | 160K+ | 9 active gauge sites, daily discharge |
| HTEM Archive | data/htem.zip |
4.8 GB | - | Compressed backup of HTEM data |
58 1. HTEM Geophysical Data
58.1 1.1 Overview
Location: data/htem/ Format: Binary grids (.grd, .grd3), CSV, XYZ ASCII Coordinate System: UTM Zone 16N (EPSG:32616) Survey Date: 2008 (original), processed 2024-2025 Coverage: 2,361 km² (Champaign County)
58.2 1.2 Directory Structure
data/htem/
├── 2DGrids/ # 2D resistivity grids
│ ├── Unit_A_Original.grd
│ ├── Unit_A_Adjusted.grd
│ └── ... (94 grid files)
├── 3DGrids/ # 3D models and classifications
│ ├── SCI11_40L_NPSmooth_TOP.grd3
│ ├── SCI11Smooth_MaterialType_Grids/
│ └── SCI11Smooth_ResistClass_Grids/
├── Geophysics/ # Raw survey data
├── InterpretationPoints/ # Manual interpretations
├── StratigraphicSystems/ # Geological framework
└── figures_tables/ # Reference lookups
58.3 1.3 Grid Files
Location: data/htem/2DGrids/ Format: Surfer .grd binary grid files Count: 94 grid files + 7 XML metadata files
58.3.1 File Naming Convention
Unit_[A-F]_[Original|Adjusted].grd
Units: - A, B, C, D, E, F (6 stratigraphic units)
Variants: - Original: Direct inversion results - Adjusted: Calibrated to well logs
58.3.2 Grid Specifications
| Property | Value |
|---|---|
| Horizontal Resolution | 100m × 100m |
| Extent | ~2,361 km² |
| Coordinate System | UTM Zone 16N |
| Data Type | Float32 |
| No Data Value | 1.70141e+38 |
58.4 1.4 Inversion Models
Location: data/htem/3DGrids/ Format: .grd3 (3D grid format)
58.4.1 Available Models
| Model | File | Size | Description |
|---|---|---|---|
| Smooth | SCI11_40L_NPSmooth_TOP.grd3 |
114 MB | Smooth regularization, gradual transitions |
| Sharp | SCI12_40L_NPSharp_TOP.grd3 |
114 MB | Sharp contrasts, layer boundaries |
| Blocky | SCI_40L_NPBlocky_TOP2.grd3 |
297 MB | Blocky inversion, distinct zones |
Parameters: - Layers: 40 layers - Processing: Non-Parametric (NP) inversion - TOP: Topography-constrained
58.5 1.5 Material Classifications
Location: data/htem/3DGrids/SCI11Smooth_MaterialType_Grids/ Format: CSV
58.5.1 File Structure
Each unit has 3 scenario files:
Unit_A_Preferred_MT.csv
Unit_A_LoFreqSand_MT.csv
Unit_A_HiFreqSand_MT.csv
58.5.2 CSV Columns
| Column | Type | Description |
|---|---|---|
| X | Float | UTM Easting (m) |
| Y | Float | UTM Northing (m) |
| Z | Float | Elevation (m above sea level) |
| MT_Index | Integer | Material type index (1-105) |
58.5.3 Scenarios
- Preferred: Best-estimate classification
- LoFreqSand: Conservative (less sand)
- HiFreqSand: Optimistic (more sand)
58.6 1.6 Resistivity Classifications
Location: data/htem/3DGrids/SCI11Smooth_ResistClass_Grids/ Format: CSV
58.6.1 CSV Columns
| Column | Type | Description |
|---|---|---|
| X | Float | UTM Easting (m) |
| Y | Float | UTM Northing (m) |
| Z | Float | Elevation (m above sea level) |
| Resist_Class | Integer | Resistivity class (1-15) |
58.6.2 Resistivity Classes
| Class | Range (Ω·m) | Typical Material |
|---|---|---|
| 1 | 1-5 | Clay, saturated silt |
| 2 | 5-10 | Silty clay |
| 3 | 10-15 | Clayey silt |
| 4 | 15-20 | Fine sand, silt |
| 5 | 20-30 | Fine to medium sand |
| 6 | 30-40 | Medium sand |
| 7 | 40-55 | Medium to coarse sand |
| 8 | 55-75 | Coarse sand |
| 9 | 75-100 | Very coarse sand |
| 10 | 100-120 | Gravelly sand |
| 11 | 120-150 | Sandy gravel |
| 12 | 150-200 | Gravel |
| 13 | 200-500 | Coarse gravel |
| 14 | 500-1000 | Bedrock (weathered) |
| 15 | 1000-10000 | Bedrock (competent) |
58.7 1.7 Material Lookup
58.7.1 Sedimentary Materials (1-14)
| Index | Material Type | Resistivity (Ω·m) | Aquifer Quality |
|---|---|---|---|
| 1 | Clay | 1-15 | Poor (confining layer) |
| 2 | Very poorly sorted sands | 15-20 | Poor |
| 3 | Fine diamicton | 20-25 | Poor to fair |
| 4 | Coarse diamicton | 25-30 | Fair |
| 5 | Poorly sorted sands | 30-35 | Fair |
| 6 | Silty sands | 35-40 | Fair to moderate |
| 7 | Slightly poorly sorted sands | 40-55 | Moderate |
| 8 | Moderately sorted sands | 55-75 | Moderate to good |
| 9 | Slightly well sorted sands | 75-100 | Good |
| 10 | Moderately well sorted sands | 100-120 | Good |
| 11 | Well sorted sands | 120-150 | Very good |
| 12 | Very moderately sorted sands | 150-170 | Very good |
| 13 | Very well sorted sands | 170-200 | Excellent |
| 14 | Extremely well sorted sands | 200-600 | Excellent |
58.7.2 Bedrock Materials (101-105)
| Index | Material Type | Description |
|---|---|---|
| 101 | Carboniferous shale mix | Shale-dominated bedrock |
| 102 | Carboniferous sandstone mix | Sandstone-dominated bedrock |
| 103 | Carboniferous margins | Transition zones |
| 104 | Carbonate margins | Limestone/dolomite margins |
| 105 | Carbonate | Pure limestone/dolomite |
58.8 1.8 Stratigraphic Units
| Unit | Name | Depth Range (m) | Geological Period | Primary Lithology |
|---|---|---|---|---|
| A | Deep Bedrock | 180-194 | Carboniferous | Competent bedrock (~48.7 Ω·m) |
| B | Transition Zone | 108-168 | Carboniferous | Weathered bedrock transition |
| C | Upper Bedrock | 124-166 | Carboniferous | Upper bedrock interface |
| D | Primary Aquifer (Mahomet) | 12-96 | Quaternary | Sand/gravel (~128 Ω·m) |
| E | Clay-rich Quaternary | Near surface | Quaternary | Clay confining layer |
| F | Mixed Surface | 0-30 | Quaternary | Mixed surface materials |
Note: Unit D is the Mahomet Aquifer—the primary water resource target. Units A-C are bedrock (too deep for extraction). Units E-F are clay-rich confining layers that protect Unit D from surface contamination.
59 2. Groundwater Database
59.1 2.1 Overview
Location: data/aquifer.db Format: SQLite3 database Size: 113.6 MB Records: 1,048,575+ measurements Wells: 356 observation wells Temporal Coverage: Continuous monitoring (varies by well)
59.2 2.2 Database Schema
59.2.1 Tables
| Table | Rows | Description |
|---|---|---|
OB_LOCATIONS |
356 | Well locations and metadata |
OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY |
1,048,575 | Time series measurements |
OB_LOCATIONS_FIELD_DESCRIPTIONS |
8 | Metadata for OB_LOCATIONS |
OB_WELL_MEASUREMENTS_FIELD_DESCRIPTIONS |
11 | Metadata for measurements |
59.3 2.3 OB_LOCATIONS Table
Primary observation point locations with multiple coordinate systems.
| Column | Type | Unit | Description |
|---|---|---|---|
P_Number |
TEXT | - | Well identifier (unique key) |
X_LAMBERT |
REAL | meters | Lambert Conformal Conic X |
Y_LAMBERT |
REAL | meters | Lambert Conformal Conic Y |
LONG_NAD_83 |
REAL | degrees | NAD83 longitude |
LAT_NAD_83 |
REAL | degrees | NAD83 latitude |
LONG_WGS_84 |
REAL | degrees | WGS84 longitude |
LAT_WGS_84 |
REAL | degrees | WGS84 latitude |
LS_ELEV_FT |
REAL | feet | Land surface elevation |
TOC_ELEV_FT |
REAL | feet | Top of casing elevation |
Primary Key: P_Number Spatial Reference: NAD83 and WGS84 for geographic, Lambert for projected
59.4 2.4 OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY Table
Time-series water level and temperature measurements.
| Column | Type | Unit | Description |
|---|---|---|---|
P_Number |
TEXT | - | Well identifier (foreign key) |
TIMESTAMP |
TEXT | M/D/YYYY | Measurement date (US format) |
DTW_FT_RAW |
REAL | feet | Raw depth to water |
DTW_FT_Reviewed |
REAL | feet | Quality-controlled depth to water |
Water_Temp_F_Raw |
REAL | °F | Raw water temperature |
Water_Temp_F_Reviewed |
REAL | °F | QC’d water temperature |
Water_Surface_Elevation |
REAL | feet | Calculated water surface elevation |
DTW_FT_Flag |
INTEGER | - | Quality flag for depth (0=good) |
Water_Temp_F_Flag |
INTEGER | - | Quality flag for temperature |
Finalized_Flag |
INTEGER | - | Data finalization status |
DateTimeReceived |
TEXT | - | When data received by system |
DateTimeChanged |
TEXT | - | Last modification timestamp |
Foreign Key: P_Number → OB_LOCATIONS.P_Number Primary Timestamp: TIMESTAMP (measurement time, NOT received time)
59.4.1 CRITICAL: TIMESTAMP Format
Format: M/D/YYYY (United States non-ISO format)
Examples:
7/9/2008= July 9, 2008 (NOT September 7, 2008)12/25/2020= December 25, 20201/1/2000= January 1, 2000
Correct parsing:
# ✅ ALWAYS use explicit format specification
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')Wrong (ambiguous):
# ❌ DON'T DO THIS - lets pandas guess (locale-dependent)
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP']) # WRONG!59.4.2 Quality Flags
| Flag Value | Meaning |
|---|---|
| 0 | Good data (passed QC) |
| 1 | Questionable (minor issues) |
| 2 | Bad data (failed QC) |
| 9 | Missing data |
Usage: Filter to *_Flag = 0 for analysis.
59.5 2.5 Column Aliases
For backward compatibility, the IntegratedDataLoader provides aliases:
| Database Column | Alias | Recommended Use |
|---|---|---|
TIMESTAMP |
MeasurementDate |
Use TIMESTAMP (primary) |
P_Number |
WellID |
Use P_Number (primary) |
60 3. Weather Database
60.1 3.1 Overview
Location: data/warm.db Format: SQLite3 database Size: 5.99 GB Records: 20,301,881+ observations Stations: 20+ weather stations Temporal Resolution: 5-minute to daily
60.2 3.2 Database Schema
60.2.1 Tables
| Table | Rows | Description |
|---|---|---|
WarmICNFiveMin |
20,301,881 | 5-minute observations |
WarmStationLookup |
20 | Station metadata |
WarmICNDaily |
~500,000 | Daily aggregations |
WarmICNHourly |
~5,000,000 | Hourly aggregations |
60.3 3.3 WarmICNFiveMin Table
High-frequency weather observations at 5-minute intervals.
| Column | Type | Unit | Description |
|---|---|---|---|
nStationCode |
INTEGER | - | Station identifier (foreign key) |
nDateTime |
INTEGER | seconds | Unix timestamp (seconds since 1970) |
nAirTemp |
REAL | °F | Air temperature |
nRelHumid |
REAL | % | Relative humidity |
nWindSpeed |
REAL | mph | Wind speed |
nWindDirectionDeg |
REAL | degrees | Wind direction (0-360°, 0=North) |
nSolarRad |
REAL | W/m² | Solar radiation |
nRainfall |
REAL | inches | Rainfall (5-min accumulation) |
nBaroPressure |
REAL | inHg | Barometric pressure |
nDewPoint |
REAL | °F | Dew point temperature |
Primary Key: (nStationCode, nDateTime) Temporal Coverage: Varies by station (2000s-present)
60.4 3.4 WarmStationLookup Table
Weather station metadata.
| Column | Type | Description |
|---|---|---|
StationCode |
INTEGER | Unique station identifier |
StationName |
TEXT | Station name |
Latitude |
REAL | Latitude (WGS84) |
Longitude |
REAL | Longitude (WGS84) |
Elevation |
REAL | Elevation (feet) |
Status |
TEXT | Active/Inactive |
StartDate |
TEXT | Station start date |
EndDate |
TEXT | Station end date (if inactive) |
Primary Key: StationCode
60.5 3.5 Aggregated Tables
60.5.1 WarmICNHourly
Hourly aggregations from 5-minute data.
Aggregation Methods: - Temperature: Mean - Rainfall: Sum - Wind speed: Mean - Wind direction: Vector average - Solar radiation: Mean
60.5.2 WarmICNDaily
Daily aggregations from hourly data.
Additional Columns: - MaxTemp, MinTemp (daily extremes) - TotalRainfall (daily accumulation) - AvgWindSpeed (daily average)
61 4. USGS Stream Gauge Data
61.1 4.1 Overview
Location: data/usgs_stream/ Format: CSV files (one per gauge) Size: 5-10 MB total Records: 160,000+ daily measurements Gauges: 9 active sites Temporal Coverage: 1948-2025 (75+ years, varies by site)
61.2 4.2 File Structure
data/usgs_stream/
├── 03337000_daily.csv # Embarrass River
├── 03339000_daily.csv # Kaskaskia River
├── 03340000_daily.csv # Kaskaskia River (continued)
├── 03341500_daily.csv # Lake Fork
├── 03342000_daily.csv # Sangamon River
├── 03343000_daily.csv # Sangamon River (continued)
├── 03344000_daily.csv # Sangamon River (downstream)
├── 03345500_daily.csv # Salt Fork
└── 03346000_daily.csv # Vermilion River
Naming Convention: {site_no}_daily.csv
61.3 4.3 CSV Schema
Standard USGS format with agency-specific conventions.
| Column | Type | Description |
|---|---|---|
agency_cd |
TEXT | Agency code (always “USGS”) |
site_no |
TEXT | USGS site number (8 digits) |
datetime |
TEXT | Date (YYYY-MM-DD format) |
discharge_cfs |
REAL | Mean daily discharge (cubic feet/second) |
discharge_cd |
TEXT | Qualification code |
61.3.1 Discharge Qualification Codes
| Code | Meaning |
|---|---|
A |
Approved for publication |
P |
Provisional (subject to revision) |
e |
Estimated |
Ice |
Ice-affected |
| (blank) | Not qualified |
Usage: Filter to discharge_cd == 'A' for finalized data.
61.4 4.4 Site Metadata
| Site No | Name | Drainage Area (mi²) | Period of Record |
|---|---|---|---|
| 03337000 | Embarrass River near Camargo | 207 | 1957-present |
| 03339000 | Kaskaskia River at Shelbyville | 1,030 | 1970-present |
| 03340000 | Kaskaskia River at Newman | 320 | 1976-present |
| 03341500 | Lake Fork near Catlin | 205 | 1975-present |
| 03342000 | Sangamon River at Mahomet | 362 | 1970-present |
| 03343000 | Sangamon River at Fisher | 549 | 1970-present |
| 03344000 | Sangamon River at Monticello | 550 | 1970-present |
| 03345500 | Salt Fork at St. Joseph | 335 | 1972-present |
| 03346000 | Vermilion River near Danville | 1,280 | 1970-present |
Source: USGS National Water Information System (NWIS)
62 5. Coordinate Systems
Multiple coordinate reference systems are used across datasets:
| System | EPSG | Usage | Notes |
|---|---|---|---|
| UTM Zone 16N | 32616 | HTEM primary grid | NAD83 datum |
| Illinois State Plane | - | Lambert coordinates (wells) | Local projection |
| NAD83 | 4269 | Geographic coordinates | North American Datum 1983 |
| WGS84 | 4326 | GPS/modern mapping | World Geodetic System 1984 |
| Web Mercator | 3857 | Web visualization | For Leaflet/Mapbox |
62.1 Coordinate Transformations
Example: Convert WGS84 to UTM:
from pyproj import Transformer
transformer = Transformer.from_crs("EPSG:4326", "EPSG:32616", always_xy=True)
utm_x, utm_y = transformer.transform(lon, lat)63 6. Data Quality Notes
63.1 Spatial Coverage
- HTEM: Complete coverage of Champaign County (2,361 km²)
- Wells: 356 wells, spatially distributed but clustered in populated areas
- Weather: 20+ stations, uneven spatial distribution
- Streams: 9 gauges on major rivers, limited small stream coverage
63.2 Temporal Coverage
| Dataset | Earliest | Latest | Frequency |
|---|---|---|---|
| HTEM | 2008 (survey) | 2024-2025 (processing) | One-time |
| Groundwater | Varies (1950s+) | Present | 15-min to daily |
| Weather | ~2000 | Present | 5-min to daily |
| USGS Stream | 1957 (oldest site) | Present | Daily |
Note: Temporal coverage varies by specific well/station/gauge.
63.3 Data Quality Flags
63.3.1 Groundwater
*_Flag = 0: Good data (use for analysis)*_Flag = 1: Questionable (use with caution)*_Flag = 2: Bad data (exclude)
63.3.2 USGS Stream
discharge_cd = 'A': Approved (finalized)discharge_cd = 'P': Provisional (subject to revision)discharge_cd = 'e': Estimated
64 7. File Naming Conventions
64.1 HTEM Files
2D Grids:
Unit_[A-F]_[Original|Adjusted].grd
3D Models:
SCI[11-13]_[Parameters]_[Model].grd3
Material Types:
Unit_[A-F]_[Preferred|LoFreqSand|HiFreqSand]_MT.csv
Resistivity Classes:
Unit_[A-F]_ResistClass.csv
64.2 Database Tables
Groundwater: - OB_*: Observation/monitoring data - *_FIELD_DESCRIPTIONS: Metadata tables
Weather: - Warm*: Weather/climate data - *Lookup: Reference/metadata tables
65 8. Usage Examples
65.1 Reading HTEM Material Types
import pandas as pd
# Load Unit D (primary aquifer) material types
df = pd.read_csv('data/htem/3DGrids/SCI11Smooth_MaterialType_Grids/Unit_D_Preferred_MT.csv')
# Columns: X, Y, Z, MT_Index
print(df.head())65.2 Querying Groundwater Database
import sqlite3
conn = sqlite3.connect('data/aquifer.db')
query = """
SELECT P_Number, TIMESTAMP, DTW_FT_Reviewed, Water_Surface_Elevation
FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
WHERE DTW_FT_Flag = 0 -- Good data only
AND TIMESTAMP > '1/1/2020'
ORDER BY TIMESTAMP DESC
LIMIT 1000
"""
df = pd.read_sql_query(query, conn)
# CRITICAL: Parse timestamp with explicit format
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')
conn.close()65.3 Loading Weather Data
import sqlite3
conn = sqlite3.connect('data/warm.db')
# Get station info
stations = pd.read_sql_query("SELECT * FROM WarmStationLookup WHERE Status='Active'", conn)
# Load hourly data for station 101
query = """
SELECT nDateTime, nAirTemp, nRainfall
FROM WarmICNHourly
WHERE nStationCode = 101
AND nDateTime > strftime('%s', '2020-01-01')
"""
weather_df = pd.read_sql_query(query, conn)
# Convert Unix timestamp to datetime
weather_df['datetime'] = pd.to_datetime(weather_df['nDateTime'], unit='s')
conn.close()65.4 Using IntegratedDataLoader (Recommended)
from src.data_loaders import IntegratedDataLoader
with IntegratedDataLoader() as loader:
# Load HTEM data
htem_data = loader.htem.load_material_type_grid('D', 'Preferred')
# Load groundwater data (timestamps pre-parsed correctly)
gw_data = loader.groundwater.load_well_time_series('P123456')
# Load weather data
weather = loader.weather.load_hourly_data(station_code=101, start_date='2020-01-01')
# Load USGS stream data
discharge = loader.usgs_stream.load_daily_discharge('03337000')
# Get overview of all data
overview = loader.get_data_overview()
print(overview)Advantages: - Handles timestamp parsing correctly - Provides consistent interface - Manages database connections - Includes data validation
66 9. Common Data Issues
66.1 Timestamp Parsing
Issue: Ambiguous date format in groundwater database
Solution: Always use explicit format:
pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')66.2 Missing Data
HTEM: No data value = 1.70141e+38 (filter out)
Groundwater: Check quality flags, missing values as NULL
Weather: Missing values as NULL or -999
USGS Stream: Missing values as blank or “Ice”
66.3 Coordinate Mismatches
Issue: Different datasets use different coordinate systems
Solution: Always transform to common CRS (e.g., UTM Zone 16N) before spatial operations
67 10. Data Access Configuration
All data paths should be managed through config/data_config.yaml:
data_paths:
# HTEM data
grids_2d: "data/htem/2DGrids"
grids_3d: "data/htem/3DGrids"
material_types: "data/htem/3DGrids/SCI11Smooth_MaterialType_Grids"
resistivity_classes: "data/htem/3DGrids/SCI11Smooth_ResistClass_Grids"
# Databases
aquifer_db: "data/aquifer.db"
weather_db: "data/warm.db"
# USGS stream data
usgs_stream: "data/usgs_stream"Usage:
from src.utils.config_loader import get_data_path
htem_path = get_data_path("grids_2d")
db_path = get_data_path("aquifer_db")68 11. Update History
| Date | Version | Description |
|---|---|---|
| 2024-10-29 | 1.0 | Initial data dictionary |
| 2024-11-15 | 1.1 | Added USGS stream data |
| 2025-01-10 | 1.2 | Updated TIMESTAMP documentation |
| 2025-11-26 | 2.0 | Consolidated for Playbook |
69 12. Contact and Support
Data Questions: - Check field description tables in databases - Consult XML metadata files for HTEM grids - Review TIMESTAMP_AUDIT_AND_FIXES.md for timestamp issues
Issues: - Open GitHub issue with data label - Include dataset, field, and specific question
Contributions: - Submit PR to update this dictionary - Document new fields or data sources
Last Updated: November 26, 2025 Data Version: 2024-2025 Survey Campaign Processing Software: Aarhus Workbench 2024.1.1.0 License: Data usage subject to provider terms (ISWS, USGS, etc.)