55  Complete Data Dictionary

Reference for all four data sources

56 Overview

This data dictionary documents all four primary data sources in the Aquifer Intelligence System. The repository contains 10+ GB of integrated hydrogeological data from Champaign County, Illinois.

NoteFor Newcomers: What This Dictionary Does

Think of this as a technical reference manual for the data. You don’t need to memorize it—bookmark this page and return when: - A chapter mentions a field name you don’t recognize - You need to understand what a measurement means - You want to know where data comes from or how it’s formatted

Skim now, reference later.

Data Sources: 1. HTEM geophysical data (electromagnetic surveys) 2. Groundwater monitoring database (well measurements) 3. Weather/climate database (meteorological observations) 4. USGS stream gauge data (surface water discharge)

Total Storage: ~10.44 GB across all sources

Why Four Data Sources?

Each data source sees the aquifer differently: - HTEM: Where is the aquifer? (structure, materials) - Groundwater wells: How full is it? (water levels over time) - Weather: What drives changes? (rain, temperature) - Stream gauges: Where does water go? (surface-groundwater connections)

Together, they tell the complete story of how the aquifer behaves.


56.1 Database Schemas

56.1.1 Groundwater Database (aquifer.db)

Table: OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY

-- Primary table for groundwater level analysis
-- ~1 million records from 18 wells (2009-2023)

Column Name                  | Type    | Description
-----------------------------|---------|--------------------------------------------------
P_Number                     | TEXT    | Well identifier (e.g., "444863", "410614")
TIMESTAMP                    | TEXT    | Measurement date (M/D/YYYY format - US style!)
Water_Surface_Elevation      | REAL    | Elevation of water surface (ft above datum)
DTW_FT_Reviewed              | REAL    | Depth to water (ft below ground surface)
DateTimeReceived             | TEXT    | When data was uploaded (not measurement time)
DateTimeChanged              | TEXT    | Last modification timestamp

CRITICAL: TIMESTAMP Parsing

# CORRECT - Explicit US format
df['Date'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')

# WRONG - Auto-detection may fail
df['Date'] = pd.to_datetime(df['TIMESTAMP'])  # DON'T DO THIS

Table: OB_LOCATIONS

-- Well metadata (30 wells, different from measurements table!)
-- WARNING: Only ~1 well overlaps with measurements table

Column Name   | Type    | Description
--------------|---------|------------------------------------------
P_NUMBER      | TEXT    | Well identifier
LAT_NAD_83    | REAL    | Latitude (NAD83 datum)
LONG_NAD_83   | REAL    | Longitude (NAD83 datum)
ELEV_FT       | REAL    | Ground surface elevation

56.1.2 Weather Database (warm.db)

Table: WarmICNData

-- Hourly weather observations from Illinois Climate Network
-- ~2.3 million records from 20+ stations

Column Name   | Type    | Description
--------------|---------|------------------------------------------
station_code  | INTEGER | Station identifier (e.g., 101, 102)
datetime      | TEXT    | Observation timestamp
nPrecip       | REAL    | Precipitation (mm)
nAirTemp      | REAL    | Air temperature (°C)
nRelHumid     | REAL    | Relative humidity (%)
nWindSpeed    | REAL    | Wind speed (m/s)
nSolarRad     | REAL    | Solar radiation (W/m²)

56.1.3 USGS Stream Data (usgs_stream/)

Files: Parquet format, one per site

Pattern: usgs_stream/{site_id}_daily.parquet

Column Name   | Type     | Description
--------------|----------|------------------------------------------
datetime      | DATETIME | Observation date
discharge_cfs | FLOAT    | Daily mean discharge (cubic feet/second)
site_no       | STRING   | USGS site number (e.g., "03337000")

57 Data Sources Summary

Source Location Size Records Description
HTEM Geophysical data/htem/ 4.74 GB 600K+ points Helicopter-borne EM resistivity models
Groundwater DB data/aquifer.db 113.6 MB 1.05M 356 observation wells, time series
Weather DB data/warm.db 5.99 GB ~2.3M (hourly) 20+ stations, hourly to daily data
USGS Stream data/usgs_stream/ 5-10 MB 160K+ 9 active gauge sites, daily discharge
HTEM Archive data/htem.zip 4.8 GB - Compressed backup of HTEM data

Weather Data Note: The WarmICNData table contains ~2.3 million hourly records. Higher-frequency 5-minute data (WarmICNFiveMin, ~20M records) may be available but requires additional storage.


58 1. HTEM Geophysical Data

58.1 1.1 Overview

Location: data/htem/ Format: Binary grids (.grd, .grd3), CSV, XYZ ASCII Coordinate System: UTM Zone 16N (EPSG:32616) Survey Date: 2008 (original), processed 2024-2025 Coverage: 2,361 km² (Champaign County)

TipWhy HTEM Matters

HTEM = Helicopter Time-domain ElectroMagnetic survey

Imagine trying to understand what’s underground without drilling thousands of expensive holes. HTEM does exactly that:

  1. A helicopter flies over the area carrying a large electromagnetic transmitter coil
  2. Electrical pulses are sent into the ground
  3. Sensors measure how quickly the ground’s response decays
  4. Different materials respond differently (clay = slow decay, sand = fast decay)
  5. Computer processing converts these signals into 3D maps of subsurface materials

Result: We can “see” underground layers, aquifer boundaries, and material types across the entire county from a single survey, instead of guessing between sparse well locations.

NoteTechnical Terms Explained
  • UTM Zone 16N: A coordinate system for mapping (like latitude/longitude but in meters). “16N” means the zone covering Illinois.
  • EPSG:32616: The official code for UTM Zone 16N that software uses
  • Binary grid (.grd): A file format storing 2D or 3D data efficiently (like an image, but for numbers)
  • Resistivity (Ω·m): Measures how much a material resists electrical current. High = sand/gravel, Low = clay.

58.2 1.2 Directory Structure

data/htem/
├── 2DGrids/                    # 2D resistivity grids
│   ├── Unit_A_Original.grd
│   ├── Unit_A_Adjusted.grd
│   └── ... (94 grid files)
├── 3DGrids/                    # 3D models and classifications
│   ├── SCI11_40L_NPSmooth_TOP.grd3
│   ├── SCI11Smooth_MaterialType_Grids/
│   └── SCI11Smooth_ResistClass_Grids/
├── Geophysics/                 # Raw survey data
├── InterpretationPoints/       # Manual interpretations
├── StratigraphicSystems/       # Geological framework
└── figures_tables/             # Reference lookups

58.3 1.3 Grid Files

Location: data/htem/2DGrids/ Format: Surfer .grd binary grid files Count: 94 grid files + 7 XML metadata files

58.3.1 File Naming Convention

Unit_[A-F]_[Original|Adjusted].grd

Units: - A, B, C, D, E, F (6 stratigraphic units)

Variants: - Original: Direct inversion results - Adjusted: Calibrated to well logs

58.3.2 Grid Specifications

Property Value
Horizontal Resolution 100m × 100m
Extent ~2,361 km²
Coordinate System UTM Zone 16N
Data Type Float32
No Data Value 1.70141e+38

58.4 1.4 Inversion Models

Location: data/htem/3DGrids/ Format: .grd3 (3D grid format)

58.4.1 Available Models

Model File Size Description
Smooth SCI11_40L_NPSmooth_TOP.grd3 114 MB Smooth regularization, gradual transitions
Sharp SCI12_40L_NPSharp_TOP.grd3 114 MB Sharp contrasts, layer boundaries
Blocky SCI_40L_NPBlocky_TOP2.grd3 297 MB Blocky inversion, distinct zones

Parameters: - Layers: 40 layers - Processing: Non-Parametric (NP) inversion - TOP: Topography-constrained


58.5 1.5 Material Classifications

Location: data/htem/3DGrids/SCI11Smooth_MaterialType_Grids/ Format: CSV

58.5.1 File Structure

Each unit has 3 scenario files:

Unit_A_Preferred_MT.csv
Unit_A_LoFreqSand_MT.csv
Unit_A_HiFreqSand_MT.csv

58.5.2 CSV Columns

Column Type Description
X Float UTM Easting (m)
Y Float UTM Northing (m)
Z Float Elevation (m above sea level)
MT_Index Integer Material type index (1-105)

58.5.3 Scenarios

  • Preferred: Best-estimate classification
  • LoFreqSand: Conservative (less sand)
  • HiFreqSand: Optimistic (more sand)

58.6 1.6 Resistivity Classifications

Location: data/htem/3DGrids/SCI11Smooth_ResistClass_Grids/ Format: CSV

58.6.1 CSV Columns

Column Type Description
X Float UTM Easting (m)
Y Float UTM Northing (m)
Z Float Elevation (m above sea level)
Resist_Class Integer Resistivity class (1-15)

58.6.2 Resistivity Classes

Class Range (Ω·m) Typical Material
1 1-5 Clay, saturated silt
2 5-10 Silty clay
3 10-15 Clayey silt
4 15-20 Fine sand, silt
5 20-30 Fine to medium sand
6 30-40 Medium sand
7 40-55 Medium to coarse sand
8 55-75 Coarse sand
9 75-100 Very coarse sand
10 100-120 Gravelly sand
11 120-150 Sandy gravel
12 150-200 Gravel
13 200-500 Coarse gravel
14 500-1000 Bedrock (weathered)
15 1000-10000 Bedrock (competent)

58.7 1.7 Material Lookup

58.7.1 Sedimentary Materials (1-14)

Index Material Type Resistivity (Ω·m) Aquifer Quality
1 Clay 1-15 Poor (confining layer)
2 Very poorly sorted sands 15-20 Poor
3 Fine diamicton 20-25 Poor to fair
4 Coarse diamicton 25-30 Fair
5 Poorly sorted sands 30-35 Fair
6 Silty sands 35-40 Fair to moderate
7 Slightly poorly sorted sands 40-55 Moderate
8 Moderately sorted sands 55-75 Moderate to good
9 Slightly well sorted sands 75-100 Good
10 Moderately well sorted sands 100-120 Good
11 Well sorted sands 120-150 Very good
12 Very moderately sorted sands 150-170 Very good
13 Very well sorted sands 170-200 Excellent
14 Extremely well sorted sands 200-600 Excellent

58.7.2 Bedrock Materials (101-105)

Index Material Type Description
101 Carboniferous shale mix Shale-dominated bedrock
102 Carboniferous sandstone mix Sandstone-dominated bedrock
103 Carboniferous margins Transition zones
104 Carbonate margins Limestone/dolomite margins
105 Carbonate Pure limestone/dolomite

58.8 1.8 Stratigraphic Units

ImportantFor Newcomers: Understanding Layers

Think of the ground beneath your feet like a layer cake. Each layer formed at a different time and has different properties. These six “units” (A through F) represent those layers from deepest (A) to shallowest (F).

The key player: Unit D is the main aquifer—a buried sand and gravel valley that holds most of the groundwater we use.

Unit Name Depth Range (m) Geological Period Primary Lithology
A Deep Bedrock 180-194 Carboniferous Competent bedrock (~48.7 Ω·m)
B Transition Zone 108-168 Carboniferous Weathered bedrock transition
C Upper Bedrock 124-166 Carboniferous Upper bedrock interface
D Primary Aquifer (Mahomet) 12-96 Quaternary Sand/gravel (~128 Ω·m)
E Clay-rich Quaternary Near surface Quaternary Clay confining layer
F Mixed Surface 0-30 Quaternary Mixed surface materials

Note: Unit D is the Mahomet Aquifer—the primary water resource target. Units A-C are bedrock (too deep for extraction). Units E-F are clay-rich confining layers that protect Unit D from surface contamination.

TipWhy These Layers Matter

Unit D (Primary Aquifer):

  • What it is: Ancient river valley filled with sand and gravel, now buried
  • Why it matters: High-quality water storage and transmission
  • Depth: 12-96 meters (40-315 feet) below surface
  • Resistivity: ~128 Ω·m (high = good aquifer)

Unit E (Confining Layer Above):

  • What it is: Clay-rich layer sitting on top of Unit D
  • Why it matters: Protects aquifer from surface contamination (acts as a seal)
  • Depth: Near surface to ~30m
  • Resistivity: Low (<30 Ω·m, clay conducts electricity better than sand)

Units A-C (Bedrock Below):

  • What they are: Ancient bedrock from 300+ million years ago
  • Why they matter: Form the “floor” beneath the aquifer
  • Depth: 100-200 meters deep (too deep and too hard for water wells)

Units F (Surface):

  • What it is: Mixed glacial deposits and soil
  • Why it matters: Less important for aquifer, but affects how rain soaks into ground
NoteDepth Convention

All depths are measured downward from the land surface. So:

  • 0 meters = ground surface (where you stand)
  • 50 meters = 50 meters below the surface
  • Unit D at “12-96m” means it starts 12 meters down and extends to 96 meters down

This is the opposite of elevation, which increases upward!


59 2. Groundwater Database

59.1 2.1 Overview

Location: data/aquifer.db Format: SQLite3 database Size: 113.6 MB Records: 1,048,575+ measurements Wells: 356 observation wells Temporal Coverage: Continuous monitoring (varies by well)

TipWhy Well Measurements Matter

While HTEM shows us where the aquifer is, well measurements tell us how it behaves over time:

  • Is the aquifer getting fuller or emptier? (rising or falling water levels)
  • How does it respond to rain? (quick rise = good connection, slow rise = confined)
  • Does it recover from drought? (resilience)
  • Are there seasonal patterns? (summer lows, spring highs)

Think of HTEM as a snapshot photograph of the aquifer structure. Wells are like time-lapse video showing how the system responds to weather, pumping, and seasons.

356 wells × 1 million measurements = A detailed record of aquifer behavior over years

NoteWhat’s in a Well Measurement?

Each record captures a moment in time at a specific well:

  • Which well? (P_Number = well identifier)
  • When? (TIMESTAMP = date of measurement)
  • How deep is the water? (DTW_FT_Reviewed = Depth To Water in feet)
  • How high is the water surface? (Water_Surface_Elevation = calculated from land surface - depth)
  • How good is this data? (Quality flags: 0=good, 1=questionable, 2=bad)

Pro tip: Always filter to quality flag = 0 before analysis!


59.2 2.2 Database Schema

59.2.1 Tables

Table Rows Description
OB_LOCATIONS 356 Well locations and metadata
OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY 1,048,575 Time series measurements
OB_LOCATIONS_FIELD_DESCRIPTIONS 8 Metadata for OB_LOCATIONS
OB_WELL_MEASUREMENTS_FIELD_DESCRIPTIONS 11 Metadata for measurements

59.3 2.3 OB_LOCATIONS Table

Primary observation point locations with multiple coordinate systems.

Column Type Unit Description
P_Number TEXT - Well identifier (unique key)
X_LAMBERT REAL meters Lambert Conformal Conic X
Y_LAMBERT REAL meters Lambert Conformal Conic Y
LONG_NAD_83 REAL degrees NAD83 longitude
LAT_NAD_83 REAL degrees NAD83 latitude
LONG_WGS_84 REAL degrees WGS84 longitude
LAT_WGS_84 REAL degrees WGS84 latitude
LS_ELEV_FT REAL feet Land surface elevation
TOC_ELEV_FT REAL feet Top of casing elevation

Primary Key: P_Number Spatial Reference: NAD83 and WGS84 for geographic, Lambert for projected


59.4 2.4 OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY Table

Time-series water level and temperature measurements.

Column Type Unit Description
P_Number TEXT - Well identifier (foreign key)
TIMESTAMP TEXT M/D/YYYY Measurement date (US format)
DTW_FT_RAW REAL feet Raw depth to water
DTW_FT_Reviewed REAL feet Quality-controlled depth to water
Water_Temp_F_Raw REAL °F Raw water temperature
Water_Temp_F_Reviewed REAL °F QC’d water temperature
Water_Surface_Elevation REAL feet Calculated water surface elevation
DTW_FT_Flag INTEGER - Quality flag for depth (0=good)
Water_Temp_F_Flag INTEGER - Quality flag for temperature
Finalized_Flag INTEGER - Data finalization status
DateTimeReceived TEXT - When data received by system
DateTimeChanged TEXT - Last modification timestamp

Foreign Key: P_NumberOB_LOCATIONS.P_Number Primary Timestamp: TIMESTAMP (measurement time, NOT received time)

59.4.1 CRITICAL: TIMESTAMP Format

Warning⚠️ CRITICAL DATA FORMAT ISSUE

This is the #1 data error that breaks analyses!

The database stores dates in US format: Month/Day/Year (e.g., 7/9/2008 means July 9, NOT September 7).

Why this matters:

If you let software auto-detect the date format, it might guess wrong based on your computer’s locale settings. This silently corrupts all temporal analysis:

  • Seasonal patterns shift by months
  • Trend calculations become meaningless
  • Drought/wet periods get misidentified
  • Correlations with weather data fail

The fix is simple but MANDATORY:

Format: M/D/YYYY (United States non-ISO format)

Examples:

  • 7/9/2008 = July 9, 2008 (NOT September 7, 2008)
  • 12/25/2020 = December 25, 2020
  • 1/1/2000 = January 1, 2000

Correct parsing:

# ✅ ALWAYS use explicit format specification
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y', errors='coerce')

Wrong (ambiguous):

# ❌ DON'T DO THIS - lets pandas guess (locale-dependent)
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'])  # WRONG!
NoteFor Newcomers: Why Explicit Format Matters

Think of it like spelling out “December” vs. using “12”:

  • Explicit format (format='%m/%d/%Y'): “I know this means Month/Day/Year”
  • Auto-detect: “Is 7/9 July 9th or September 7th? I’ll guess based on your computer’s settings”

The database was created in the US, so it uses US date format. Your computer might use European format (day/month/year). Without explicit format, pandas guesses wrong and you get data from the wrong months!

See: TIMESTAMP_AUDIT_AND_FIXES.md for complete documentation and examples of what goes wrong.


59.4.2 Quality Flags

Flag Value Meaning
0 Good data (passed QC)
1 Questionable (minor issues)
2 Bad data (failed QC)
9 Missing data

Usage: Filter to *_Flag = 0 for analysis.


59.5 2.5 Column Aliases

For backward compatibility, the IntegratedDataLoader provides aliases:

Database Column Alias Recommended Use
TIMESTAMP MeasurementDate Use TIMESTAMP (primary)
P_Number WellID Use P_Number (primary)

60 3. Weather Database

60.1 3.1 Overview

Location: data/warm.db Format: SQLite3 database Size: 5.99 GB Records: 20,301,881+ observations Stations: 20+ weather stations Temporal Resolution: 5-minute to daily


60.2 3.2 Database Schema

60.2.1 Tables

Table Rows Description
WarmICNFiveMin 20,301,881 5-minute observations
WarmStationLookup 20 Station metadata
WarmICNDaily ~500,000 Daily aggregations
WarmICNHourly ~5,000,000 Hourly aggregations

60.3 3.3 WarmICNFiveMin Table

High-frequency weather observations at 5-minute intervals.

Column Type Unit Description
nStationCode INTEGER - Station identifier (foreign key)
nDateTime INTEGER seconds Unix timestamp (seconds since 1970)
nAirTemp REAL °F Air temperature
nRelHumid REAL % Relative humidity
nWindSpeed REAL mph Wind speed
nWindDirectionDeg REAL degrees Wind direction (0-360°, 0=North)
nSolarRad REAL W/m² Solar radiation
nRainfall REAL inches Rainfall (5-min accumulation)
nBaroPressure REAL inHg Barometric pressure
nDewPoint REAL °F Dew point temperature

Primary Key: (nStationCode, nDateTime) Temporal Coverage: Varies by station (2000s-present)


60.4 3.4 WarmStationLookup Table

Weather station metadata.

Column Type Description
StationCode INTEGER Unique station identifier
StationName TEXT Station name
Latitude REAL Latitude (WGS84)
Longitude REAL Longitude (WGS84)
Elevation REAL Elevation (feet)
Status TEXT Active/Inactive
StartDate TEXT Station start date
EndDate TEXT Station end date (if inactive)

Primary Key: StationCode


60.5 3.5 Aggregated Tables

60.5.1 WarmICNHourly

Hourly aggregations from 5-minute data.

Aggregation Methods: - Temperature: Mean - Rainfall: Sum - Wind speed: Mean - Wind direction: Vector average - Solar radiation: Mean

60.5.2 WarmICNDaily

Daily aggregations from hourly data.

Additional Columns: - MaxTemp, MinTemp (daily extremes) - TotalRainfall (daily accumulation) - AvgWindSpeed (daily average)


61 4. USGS Stream Gauge Data

61.1 4.1 Overview

Location: data/usgs_stream/ Format: CSV files (one per gauge) Size: 5-10 MB total Records: 160,000+ daily measurements Gauges: 9 active sites Temporal Coverage: 1948-2025 (75+ years, varies by site)


61.2 4.2 File Structure

data/usgs_stream/
├── 03337000_daily.csv    # Embarrass River
├── 03339000_daily.csv    # Kaskaskia River
├── 03340000_daily.csv    # Kaskaskia River (continued)
├── 03341500_daily.csv    # Lake Fork
├── 03342000_daily.csv    # Sangamon River
├── 03343000_daily.csv    # Sangamon River (continued)
├── 03344000_daily.csv    # Sangamon River (downstream)
├── 03345500_daily.csv    # Salt Fork
└── 03346000_daily.csv    # Vermilion River

Naming Convention: {site_no}_daily.csv


61.3 4.3 CSV Schema

Standard USGS format with agency-specific conventions.

Column Type Description
agency_cd TEXT Agency code (always “USGS”)
site_no TEXT USGS site number (8 digits)
datetime TEXT Date (YYYY-MM-DD format)
discharge_cfs REAL Mean daily discharge (cubic feet/second)
discharge_cd TEXT Qualification code

61.3.1 Discharge Qualification Codes

Code Meaning
A Approved for publication
P Provisional (subject to revision)
e Estimated
Ice Ice-affected
(blank) Not qualified

Usage: Filter to discharge_cd == 'A' for finalized data.


61.4 4.4 Site Metadata

Site No Name Drainage Area (mi²) Period of Record
03337000 Embarrass River near Camargo 207 1957-present
03339000 Kaskaskia River at Shelbyville 1,030 1970-present
03340000 Kaskaskia River at Newman 320 1976-present
03341500 Lake Fork near Catlin 205 1975-present
03342000 Sangamon River at Mahomet 362 1970-present
03343000 Sangamon River at Fisher 549 1970-present
03344000 Sangamon River at Monticello 550 1970-present
03345500 Salt Fork at St. Joseph 335 1972-present
03346000 Vermilion River near Danville 1,280 1970-present

Source: USGS National Water Information System (NWIS)


62 5. Coordinate Systems

Multiple coordinate reference systems are used across datasets:

System EPSG Usage Notes
UTM Zone 16N 32616 HTEM primary grid NAD83 datum
Illinois State Plane - Lambert coordinates (wells) Local projection
NAD83 4269 Geographic coordinates North American Datum 1983
WGS84 4326 GPS/modern mapping World Geodetic System 1984
Web Mercator 3857 Web visualization For Leaflet/Mapbox

62.1 Coordinate Transformations

Example: Convert WGS84 to UTM:

from pyproj import Transformer

transformer = Transformer.from_crs("EPSG:4326", "EPSG:32616", always_xy=True)
utm_x, utm_y = transformer.transform(lon, lat)

63 6. Data Quality Notes

63.1 Spatial Coverage

  • HTEM: Complete coverage of Champaign County (2,361 km²)
  • Wells: 356 wells, spatially distributed but clustered in populated areas
  • Weather: 20+ stations, uneven spatial distribution
  • Streams: 9 gauges on major rivers, limited small stream coverage

63.2 Temporal Coverage

Dataset Earliest Latest Frequency
HTEM 2008 (survey) 2024-2025 (processing) One-time
Groundwater Varies (1950s+) Present 15-min to daily
Weather ~2000 Present 5-min to daily
USGS Stream 1957 (oldest site) Present Daily

Note: Temporal coverage varies by specific well/station/gauge.


63.3 Data Quality Flags

63.3.1 Groundwater

  • *_Flag = 0: Good data (use for analysis)
  • *_Flag = 1: Questionable (use with caution)
  • *_Flag = 2: Bad data (exclude)

63.3.2 USGS Stream

  • discharge_cd = 'A': Approved (finalized)
  • discharge_cd = 'P': Provisional (subject to revision)
  • discharge_cd = 'e': Estimated

64 7. File Naming Conventions

64.1 HTEM Files

2D Grids:

Unit_[A-F]_[Original|Adjusted].grd

3D Models:

SCI[11-13]_[Parameters]_[Model].grd3

Material Types:

Unit_[A-F]_[Preferred|LoFreqSand|HiFreqSand]_MT.csv

Resistivity Classes:

Unit_[A-F]_ResistClass.csv

64.2 Database Tables

Groundwater: - OB_*: Observation/monitoring data - *_FIELD_DESCRIPTIONS: Metadata tables

Weather: - Warm*: Weather/climate data - *Lookup: Reference/metadata tables


65 8. Usage Examples

65.1 Reading HTEM Material Types

import pandas as pd

# Load Unit D (primary aquifer) material types
df = pd.read_csv('data/htem/3DGrids/SCI11Smooth_MaterialType_Grids/Unit_D_Preferred_MT.csv')

# Columns: X, Y, Z, MT_Index
print(df.head())

65.2 Querying Groundwater Database

import sqlite3

conn = sqlite3.connect('data/aquifer.db')

query = """
SELECT P_Number, TIMESTAMP, DTW_FT_Reviewed, Water_Surface_Elevation
FROM OB_WELL_MEASUREMENTS_CHAMPAIGN_COUNTY
WHERE DTW_FT_Flag = 0  -- Good data only
  AND TIMESTAMP > '1/1/2020'
ORDER BY TIMESTAMP DESC
LIMIT 1000
"""

df = pd.read_sql_query(query, conn)

# CRITICAL: Parse timestamp with explicit format
df['TIMESTAMP'] = pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')

conn.close()

65.3 Loading Weather Data

import sqlite3

conn = sqlite3.connect('data/warm.db')

# Get station info
stations = pd.read_sql_query("SELECT * FROM WarmStationLookup WHERE Status='Active'", conn)

# Load hourly data for station 101
query = """
SELECT nDateTime, nAirTemp, nRainfall
FROM WarmICNHourly
WHERE nStationCode = 101
  AND nDateTime > strftime('%s', '2020-01-01')
"""

weather_df = pd.read_sql_query(query, conn)

# Convert Unix timestamp to datetime
weather_df['datetime'] = pd.to_datetime(weather_df['nDateTime'], unit='s')

conn.close()

66 9. Common Data Issues

66.1 Timestamp Parsing

Issue: Ambiguous date format in groundwater database

Solution: Always use explicit format:

pd.to_datetime(df['TIMESTAMP'], format='%m/%d/%Y')

66.2 Missing Data

HTEM: No data value = 1.70141e+38 (filter out)

Groundwater: Check quality flags, missing values as NULL

Weather: Missing values as NULL or -999

USGS Stream: Missing values as blank or “Ice”


66.3 Coordinate Mismatches

Issue: Different datasets use different coordinate systems

Solution: Always transform to common CRS (e.g., UTM Zone 16N) before spatial operations


67 10. Data Access Configuration

All data paths should be managed through config/data_config.yaml:

data_paths:
  # HTEM data
  grids_2d: "data/htem/2DGrids"
  grids_3d: "data/htem/3DGrids"
  material_types: "data/htem/3DGrids/SCI11Smooth_MaterialType_Grids"
  resistivity_classes: "data/htem/3DGrids/SCI11Smooth_ResistClass_Grids"

  # Databases
  aquifer_db: "data/aquifer.db"
  weather_db: "data/warm.db"

  # USGS stream data
  usgs_stream: "data/usgs_stream"

Usage:

from src.utils.config_loader import get_data_path

htem_path = get_data_path("grids_2d")
db_path = get_data_path("aquifer_db")

68 11. Update History

Date Version Description
2024-10-29 1.0 Initial data dictionary
2024-11-15 1.1 Added USGS stream data
2025-01-10 1.2 Updated TIMESTAMP documentation
2025-11-26 2.0 Consolidated for Playbook

69 12. Contact and Support

Data Questions: - Check field description tables in databases - Consult XML metadata files for HTEM grids - Review TIMESTAMP_AUDIT_AND_FIXES.md for timestamp issues

Issues: - Open GitHub issue with data label - Include dataset, field, and specific question

Contributions: - Submit PR to update this dictionary - Document new fields or data sources


Last Updated: November 26, 2025 Data Version: 2024-2025 Survey Campaign Processing Software: Aarhus Workbench 2024.1.1.0 License: Data usage subject to provider terms (ISWS, USGS, etc.)