Energy System Specification
This is the formal specification for Joule's compile-time energy verification system.
Overview
The energy system consists of:
- Energy budget attributes -- Programmer-declared constraints on function energy consumption
- Energy estimator -- Static analysis that estimates energy from HIR
- Energy cost model -- Calibrated per-instruction energy costs
- Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
- Accelerator energy -- Runtime measurement for GPUs and other accelerators
- Diagnostics -- Error messages when budgets are violated
Attribute Syntax
```
#[energy_budget( budget_param { , budget_param } )]
```
Where budget_param is one of:
| Parameter | Type | Unit | Description |
|---|---|---|---|
| max_joules | f64 | joules | Maximum total energy |
| max_watts | f64 | watts | Maximum average power |
| max_temp_delta | f64 | degrees Celsius | Maximum temperature rise |
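As a rough illustration, each parameter acts as an optional upper bound that is checked against the matching estimate. The sketch below is hypothetical: EnergyBudget, Estimate, and violations are illustrative names, not types from the Joule compiler, and it assumes that an omitted parameter is simply not checked.

```rust
/// Hypothetical in-memory form of #[energy_budget(...)].
/// Every parameter is optional; an omitted parameter is not checked.
#[derive(Debug, Default, Clone, Copy)]
struct EnergyBudget {
    max_joules: Option<f64>,     // joules
    max_watts: Option<f64>,      // watts
    max_temp_delta: Option<f64>, // degrees Celsius
}

/// Hypothetical static estimate for one function.
#[derive(Debug, Clone, Copy)]
struct Estimate {
    joules: f64,
    watts: f64,
    temp_delta: f64,
}

/// Returns the name of every budget parameter that the estimate exceeds.
fn violations(budget: &EnergyBudget, est: &Estimate) -> Vec<&'static str> {
    let checks = [
        ("max_joules", budget.max_joules, est.joules),
        ("max_watts", budget.max_watts, est.watts),
        ("max_temp_delta", budget.max_temp_delta, est.temp_delta),
    ];
    let mut exceeded = Vec::new();
    for (name, limit, value) in checks {
        // Only parameters that were actually declared are compared.
        if let Some(limit) = limit {
            if value > limit {
                exceeded.push(name);
            }
        }
    }
    exceeded
}

fn main() {
    let budget = EnergyBudget { max_joules: Some(0.0001), ..Default::default() };
    let est = Estimate { joules: 0.00035, watts: 12.5, temp_delta: 5.0 };
    println!("{:?}", violations(&budget, &est)); // ["max_joules"]
}
```

Keeping every field optional mirrors the attribute grammar, where any non-empty subset of the parameters may appear.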
Estimation Model
Instruction Costs
The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:
| Instruction | Base Cost (pJ) | Thermal Scaling |
|---|---|---|
| IntAdd | 0.05 | Linear |
| IntSub | 0.05 | Linear |
| IntMul | 0.35 | Linear |
| IntDiv | 3.5 | Linear |
| IntRem | 3.5 | Linear |
| FloatAdd | 0.35 | Quadratic |
| FloatSub | 0.35 | Quadratic |
| FloatMul | 0.35 | Quadratic |
| FloatDiv | 3.5 | Quadratic |
| FloatSqrt | 5.25 | Quadratic |
| MemLoadL1 | 0.5 | Linear |
| MemLoadL2 | 3.0 | Linear |
| MemLoadL3 | 10.0 | Linear |
| MemLoadDram | 200.0 | Linear |
| MemStoreDram | 200.0 | Linear |
| BranchTaken | 0.1 | None |
| BranchNotTaken | 0.1 | None |
| BranchMispredicted | 1.5 | None |
| SimdF32x8Add | 1.5 | Quadratic |
| SimdF32x8Mul | 1.5 | Quadratic |
| SimdF32x8Div | 7.0 | Quadratic |
| SimdF32x8Fma | 2.0 | Quadratic |
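The table translates naturally into a lookup from instruction kind to base cost. The following is a minimal sketch, assuming a hypothetical Instr enum that covers only a subset of the rows and ignores thermal scaling. It also makes the relative magnitudes concrete: one MemLoadDram (200 pJ) costs as much as 4,000 IntAdd operations (0.05 pJ each).

```rust
/// Hypothetical subset of the instruction kinds in the cost table.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Instr {
    IntAdd,
    IntMul,
    IntDiv,
    FloatAdd,
    FloatSqrt,
    MemLoadL1,
    MemLoadDram,
    BranchMispredicted,
    SimdF32x8Fma,
}

impl Instr {
    /// Base cost in picojoules, copied from the table (no thermal scaling).
    fn base_cost_pj(self) -> f64 {
        match self {
            Instr::IntAdd => 0.05,
            Instr::IntMul => 0.35,
            Instr::IntDiv => 3.5,
            Instr::FloatAdd => 0.35,
            Instr::FloatSqrt => 5.25,
            Instr::MemLoadL1 => 0.5,
            Instr::MemLoadDram => 200.0,
            Instr::BranchMispredicted => 1.5,
            Instr::SimdF32x8Fma => 2.0,
        }
    }
}

/// Sum of base costs for a straight-line instruction sequence.
fn sequence_cost_pj(seq: &[Instr]) -> f64 {
    seq.iter().map(|i| i.base_cost_pj()).sum()
}

fn main() {
    // Two L1 loads, a multiply and an add: 0.5 + 0.5 + 0.35 + 0.05 pJ
    let seq = [Instr::MemLoadL1, Instr::MemLoadL1, Instr::IntMul, Instr::IntAdd];
    println!("{:.2} pJ", sequence_cost_pj(&seq)); // 1.40 pJ
}
```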
Thermal Scaling
Actual cost = base_cost * thermal_factor, where thermal_factor depends on the thermal model:
- None: actual = base_cost (constant regardless of temperature)
- Linear: actual = base_cost * (1.0 + 0.3 * thermal_state)
- Quadratic: actual = base_cost * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)
Default thermal state: 0.3 (nominal operating temperature).
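The three thermal models can be written as one scaling function. This sketch uses hypothetical names and assumes thermal_state is a plain f64 (the spec gives only its default, 0.3). At the default state, the linear factor is 1.0 + 0.3 * 0.3 = 1.09 and the quadratic factor is 1.0 + 0.09 + 0.1 * 0.09 = 1.099, so a FloatAdd costs about 0.35 * 1.099 ≈ 0.385 pJ.

```rust
/// Thermal models from the cost table.
#[derive(Debug, Clone, Copy)]
enum ThermalScaling {
    None,
    Linear,
    Quadratic,
}

const DEFAULT_THERMAL_STATE: f64 = 0.3;

/// Multiplier applied to a base cost at the given thermal state.
fn thermal_factor(model: ThermalScaling, thermal_state: f64) -> f64 {
    let t = thermal_state;
    match model {
        ThermalScaling::None => 1.0,
        ThermalScaling::Linear => 1.0 + 0.3 * t,
        ThermalScaling::Quadratic => 1.0 + 0.3 * t + 0.1 * t * t,
    }
}

/// actual = base_cost * thermal_factor
fn actual_cost_pj(base_pj: f64, model: ThermalScaling, thermal_state: f64) -> f64 {
    base_pj * thermal_factor(model, thermal_state)
}

fn main() {
    for (name, base, model) in [
        ("IntAdd", 0.05, ThermalScaling::Linear),
        ("FloatAdd", 0.35, ThermalScaling::Quadratic),
        ("BranchTaken", 0.1, ThermalScaling::None),
    ] {
        let cost = actual_cost_pj(base, model, DEFAULT_THERMAL_STATE);
        println!("{name}: {cost:.4} pJ");
    }
}
```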
Expression Costs
| Expression | Cost |
|---|---|
| Literal | 0.01 pJ |
| Variable access | L1 load |
| Binary operation | left + right + op_cost |
| Unary operation | inner + op_cost |
| Function call | args + branch + 2x L1 (stack) |
| Method call | receiver + args + branch + 3x L1 |
| Field access | inner + IntAdd + L1 |
| Index access | array + index + IntMul + IntAdd + branch (bounds) + L1 |
| Struct construction | fields + (field_count x L1) |
| Array construction | elements + (element_count x L1) |
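The table defines a recursive cost function over expression trees. Below is a sketch over a simplified, hypothetical Expr type (method calls and array literals are omitted). Two choices are assumptions not fixed by the table: "branch" is costed as BranchTaken (0.1 pJ), and thermal scaling is not applied. Under those assumptions, a[i] + 1 costs 2.0 pJ for the index expression, plus 0.01 pJ for the literal and 0.05 pJ for the IntAdd, or 2.06 pJ in total.

```rust
const LITERAL: f64 = 0.01;
const L1: f64 = 0.5;       // MemLoadL1
const INT_ADD: f64 = 0.05; // IntAdd
const INT_MUL: f64 = 0.35; // IntMul
const BRANCH: f64 = 0.1;   // assumption: BranchTaken

/// Hypothetical, simplified expression tree.
#[allow(dead_code)]
enum Expr {
    Literal,
    Var,
    Binary { op_cost: f64, lhs: Box<Expr>, rhs: Box<Expr> },
    Unary { op_cost: f64, inner: Box<Expr> },
    Call { args: Vec<Expr> },
    Field { inner: Box<Expr> },
    Index { array: Box<Expr>, index: Box<Expr> },
    Struct { fields: Vec<Expr> },
}

/// Cost in pJ, following the expression cost table row by row.
fn cost_pj(e: &Expr) -> f64 {
    match e {
        Expr::Literal => LITERAL,
        Expr::Var => L1,
        Expr::Binary { op_cost, lhs, rhs } => cost_pj(lhs) + cost_pj(rhs) + op_cost,
        Expr::Unary { op_cost, inner } => cost_pj(inner) + op_cost,
        // args + branch + 2x L1 (stack)
        Expr::Call { args } => args.iter().map(cost_pj).sum::<f64>() + BRANCH + 2.0 * L1,
        Expr::Field { inner } => cost_pj(inner) + INT_ADD + L1,
        // array + index + IntMul + IntAdd + branch (bounds check) + L1
        Expr::Index { array, index } => {
            cost_pj(array) + cost_pj(index) + INT_MUL + INT_ADD + BRANCH + L1
        }
        Expr::Struct { fields } => fields.iter().map(cost_pj).sum::<f64>() + fields.len() as f64 * L1,
    }
}

fn main() {
    // a[i] + 1, with the addition costed as IntAdd
    let e = Expr::Binary {
        op_cost: INT_ADD,
        lhs: Box::new(Expr::Index { array: Box::new(Expr::Var), index: Box::new(Expr::Var) }),
        rhs: Box::new(Expr::Literal),
    };
    println!("{:.2} pJ", cost_pj(&e)); // 2.06 pJ
}
```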
Loop Estimation
- Known bounds: body_cost * iteration_count
- Unknown bounds: body_cost * default_iterations (100)
- Max iterations cap: 10,000
- PGO-refined: body_cost * actual_trip_count (from profile data)
Each unknown-bound loop multiplies confidence by 0.7. When PGO data supplies the trip count, the loop's factor is 0.95 instead.
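The loop rules can be sketched as follows. TripCount, LoopEstimate, and estimate_loop are hypothetical names. Two details are assumptions: known bounds leave confidence unchanged, and the 10,000-iteration cap is applied to every case, including profiled trip counts, since the spec does not say which cases the cap covers.

```rust
const DEFAULT_ITERATIONS: u64 = 100;
const MAX_ITERATIONS: u64 = 10_000;

/// What the estimator knows about a loop's trip count.
enum TripCount {
    Known(u64),    // statically known bound
    Unknown,       // no bound available
    Profiled(u64), // trip count from PGO data
}

struct LoopEstimate {
    cost_pj: f64,
    confidence_factor: f64,
}

fn estimate_loop(body_cost_pj: f64, trips: TripCount) -> LoopEstimate {
    let (iterations, confidence_factor) = match trips {
        TripCount::Known(n) => (n, 1.0),
        TripCount::Unknown => (DEFAULT_ITERATIONS, 0.7),
        TripCount::Profiled(n) => (n, 0.95),
    };
    // Assumption: the iteration cap applies to every case.
    let iterations = iterations.min(MAX_ITERATIONS);
    LoopEstimate { cost_pj: body_cost_pj * iterations as f64, confidence_factor }
}

fn main() {
    let est = estimate_loop(2.0, TripCount::Unknown);
    println!("{} pJ, confidence x{}", est.cost_pj, est.confidence_factor); // 200 pJ, confidence x0.7
}
```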
Branch Estimation
- if/else: condition + avg(then_cost, else_cost) + branch_cost
- match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)
Each if/else multiplies confidence by 0.9; each match multiplies it by 0.85.
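A sketch of the two branch formulas, each returning the cost together with its confidence multiplier. It assumes branch_cost is BranchTaken (0.1 pJ) and that a match with no arms contributes no arm cost; both are assumptions. For example, an if/else with a 0.5 pJ condition and arms of 1.0 and 3.0 pJ costs 0.5 + 2.0 + 0.1 = 2.6 pJ, and a match over three arms of 1.0, 2.0, and 3.0 pJ with a 0.5 pJ scrutinee costs 0.5 + 2.0 + 3 * 0.1 = 2.8 pJ.

```rust
/// Assumption: branch_cost is the BranchTaken cost from the table.
const BRANCH_PJ: f64 = 0.1;

/// (cost in pJ, confidence multiplier) for an if/else.
fn if_else_cost(cond_pj: f64, then_pj: f64, else_pj: f64) -> (f64, f64) {
    (cond_pj + (then_pj + else_pj) / 2.0 + BRANCH_PJ, 0.9)
}

/// (cost in pJ, confidence multiplier) for a match.
fn match_cost(scrutinee_pj: f64, arm_pj: &[f64]) -> (f64, f64) {
    // Assumption: a match with no arms contributes no arm cost.
    let avg = if arm_pj.is_empty() {
        0.0
    } else {
        arm_pj.iter().sum::<f64>() / arm_pj.len() as f64
    };
    (scrutinee_pj + avg + arm_pj.len() as f64 * BRANCH_PJ, 0.85)
}

fn main() {
    let (cost, factor) = if_else_cost(0.5, 1.0, 3.0);
    println!("if/else: {cost:.2} pJ, x{factor}");
    let (cost, factor) = match_cost(0.5, &[1.0, 2.0, 3.0]);
    println!("match: {cost:.2} pJ, x{factor}");
}
```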
Confidence Score
Range: 0.0 to 1.0
- Straight-line code: 1.0
- Each if/else: multiply by 0.9
- Each match: multiply by 0.85
- Each unbounded loop: multiply by 0.7
- PGO-refined loop: multiply by 0.95
The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
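Because each construct contributes an independent factor, a function's score is the product of the factors of all its constructs, and straight-line code (no factors) scores 1.0. A minimal sketch with a hypothetical Construct enum: a function with two if/else expressions and one unbounded loop scores 0.9 * 0.9 * 0.7 = 0.567, or about 57%.

```rust
/// Hypothetical classification of the constructs that affect confidence.
#[derive(Clone, Copy)]
enum Construct {
    IfElse,
    Match,
    UnboundedLoop,
    PgoLoop,
}

fn factor(c: Construct) -> f64 {
    match c {
        Construct::IfElse => 0.9,
        Construct::Match => 0.85,
        Construct::UnboundedLoop => 0.7,
        Construct::PgoLoop => 0.95,
    }
}

/// Product of all factors; an empty slice (straight-line code) gives 1.0.
fn confidence(constructs: &[Construct]) -> f64 {
    constructs.iter().map(|&c| factor(c)).product()
}

fn main() {
    let body = [Construct::IfElse, Construct::IfElse, Construct::UnboundedLoop];
    println!("confidence: {:.0}%", confidence(&body) * 100.0);
}
```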
Energy IR (EIR)
The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:
```
HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR
```
EIR nodes include:
- EirExpr -- Expressions with energy costs
- EirStmt -- Statements with energy costs
- EirBody -- Function bodies with total energy and effect sets
Effect Sets
EIR tracks side effects using EffectSet:
- Pure (no effects)
- IO (reads/writes)
- Alloc (heap allocation)
- Panic (may abort)
The e-graph optimizer uses effect information to determine which rewrites are safe.
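One compact way to represent an EffectSet is as bit flags, with an EirBody combining the costs and effects of its statements. The sketch below is hypothetical, and so is the policy in may_eliminate_if_unused (an unused expression may be deleted only if it is Pure); the spec states only that the optimizer consults effect information.

```rust
/// Hypothetical bit-flag encoding of an effect set.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct EffectSet(u8);

#[allow(dead_code)]
impl EffectSet {
    const PURE: EffectSet = EffectSet(0);
    const IO: EffectSet = EffectSet(1 << 0);
    const ALLOC: EffectSet = EffectSet(1 << 1);
    const PANIC: EffectSet = EffectSet(1 << 2);

    fn union(self, other: EffectSet) -> EffectSet {
        EffectSet(self.0 | other.0)
    }
    fn contains(self, other: EffectSet) -> bool {
        (self.0 & other.0) == other.0
    }
    fn is_pure(self) -> bool {
        self.0 == 0
    }
}

/// Hypothetical body summary: total energy plus the union of all effects.
struct EirBody {
    total_pj: f64,
    effects: EffectSet,
}

fn body_summary(stmts: &[(f64, EffectSet)]) -> EirBody {
    stmts.iter().fold(
        EirBody { total_pj: 0.0, effects: EffectSet::PURE },
        |acc, &(pj, eff)| EirBody { total_pj: acc.total_pj + pj, effects: acc.effects.union(eff) },
    )
}

/// Hypothetical policy: removing an unused expression is safe only if it is pure.
fn may_eliminate_if_unused(effects: EffectSet) -> bool {
    effects.is_pure()
}

fn main() {
    let body = body_summary(&[(1.0, EffectSet::PURE), (2.0, EffectSet::IO)]);
    println!("{} pJ, effects {:?}, removable: {}", body.total_pj, body.effects,
             may_eliminate_if_unused(body.effects));
}
```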
E-Graph Optimization
When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:
- Arithmetic simplification (x + 0 -> x, x * 1 -> x)
- Constant folding
- Dead code elimination
- Common subexpression elimination
- Strength reduction (x * 2 -> x << 1)
- Energy-aware rewrites (prefer lower-energy equivalent operations)
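A few of the listed rules can be illustrated on a small integer expression type. Note that this sketch is a greedy, bottom-up term rewriter, not an e-graph: an e-graph would keep every equivalent form in shared equivalence classes and then extract the cheapest one under the cost model. The E type and simplify function are hypothetical and cover only x + 0, x * 1, constant folding, and x * 2 -> x << 1.

```rust
/// Hypothetical integer expression type.
#[derive(Debug, Clone, PartialEq)]
enum E {
    Const(i64),
    Var(&'static str),
    Add(Box<E>, Box<E>),
    Mul(Box<E>, Box<E>),
    Shl(Box<E>, u32),
}

/// Simplifies children first, then applies the first matching rule.
fn simplify(e: E) -> E {
    match e {
        E::Add(a, b) => match (simplify(*a), simplify(*b)) {
            // Constant folding (skipped if it would overflow)
            (E::Const(x), E::Const(y)) if x.checked_add(y).is_some() => E::Const(x + y),
            // x + 0 -> x
            (x, E::Const(0)) | (E::Const(0), x) => x,
            (x, y) => E::Add(Box::new(x), Box::new(y)),
        },
        E::Mul(a, b) => match (simplify(*a), simplify(*b)) {
            (E::Const(x), E::Const(y)) if x.checked_mul(y).is_some() => E::Const(x * y),
            // x * 1 -> x
            (x, E::Const(1)) | (E::Const(1), x) => x,
            // Strength reduction: x * 2 -> x << 1
            (x, E::Const(2)) | (E::Const(2), x) => E::Shl(Box::new(x), 1),
            (x, y) => E::Mul(Box::new(x), Box::new(y)),
        },
        E::Shl(a, n) => E::Shl(Box::new(simplify(*a)), n),
        other => other,
    }
}

fn main() {
    // (x * 2) + 0
    let e = E::Add(Box::new(E::Mul(Box::new(E::Var("x")), Box::new(E::Const(2)))), Box::new(E::Const(0)));
    println!("{:?}", simplify(e)); // Shl(Var("x"), 1)
}
```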
Three-Tier Measurement
Tier 1: Static Estimation
Compile-time energy estimation using the instruction cost model. Available for all programs, no hardware access required.
Tier 2: CPU Performance Counters
Runtime measurement using hardware performance counters:
- Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
- Apple Silicon: powermetrics integration
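On Linux, RAPL counters are also exposed through the powercap sysfs interface, a third access path besides perf_event and MSR reads. The sketch below uses that interface (the intel-rapl:0 zone, typically the first CPU package), which is not necessarily the mechanism Joule uses; read_uj and energy_delta_uj are hypothetical helpers. The counter is cumulative in microjoules and wraps at max_energy_range_uj, so a measurement is the difference of two samples. On many recent kernels, reading energy_uj requires root privileges.

```rust
use std::fs;
use std::thread;
use std::time::Duration;

/// Reads a microjoule counter file such as .../intel-rapl:0/energy_uj.
fn read_uj(path: &str) -> Option<u64> {
    fs::read_to_string(path).ok()?.trim().parse().ok()
}

/// Energy between two samples, allowing for at most one counter wraparound.
fn energy_delta_uj(before: u64, after: u64, max_range_uj: u64) -> u64 {
    if after >= before {
        after - before
    } else {
        max_range_uj.saturating_sub(before) + after
    }
}

fn main() {
    let zone = "/sys/class/powercap/intel-rapl:0";
    let energy_path = format!("{zone}/energy_uj");
    let range_path = format!("{zone}/max_energy_range_uj");
    match (read_uj(&energy_path), read_uj(&range_path)) {
        (Some(before), Some(max_range)) => {
            thread::sleep(Duration::from_millis(100));
            if let Some(after) = read_uj(&energy_path) {
                let uj = energy_delta_uj(before, after, max_range);
                println!("package energy over ~100 ms: {uj} uJ");
            }
        }
        _ => println!("RAPL powercap interface not available (or not readable)"),
    }
}
```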
Tier 3: Accelerator Energy
Runtime measurement using vendor-specific APIs:
| Vendor | API | Measurement |
|---|---|---|
| NVIDIA | NVML (nvmlDeviceGetTotalEnergyConsumption) | Board power, per-GPU |
| AMD | ROCm SMI (rsmi_dev_power_ave_get) | Average power, per-GPU |
| Intel | Level Zero (zesDeviceGetProperties + power domains) | Per-device power |
| Google | TPU Runtime | Per-chip power |
| AWS | Neuron SDK | Per-core power |
| Intel Gaudi (Habana) | HLML (hlmlDeviceGetTotalEnergyConsumption) | Board power |
| Cerebras | CS SDK | Wafer-scale power |
| SambaNova | DataScale API | Per-RDU power |
See Accelerator Energy Measurement for details.
Power Estimation
```
avg_pj_per_cycle = 0.15   (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time   = estimated_cycles / reference_frequency   (3.0 GHz)
power_watts      = energy_joules / execution_time
```
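The four formulas can be evaluated directly. A minimal sketch with hypothetical names, assuming total_pJ > 0 (zero would make execution_time zero and the division undefined):

```rust
const AVG_PJ_PER_CYCLE: f64 = 0.15;
const REFERENCE_HZ: f64 = 3.0e9;

/// Returns (energy_joules, execution_time_s, power_watts) for a static
/// estimate of `total_pj` picojoules. Assumes total_pj > 0.
fn power_estimate(total_pj: f64) -> (f64, f64, f64) {
    let energy_joules = total_pj * 1e-12;
    let estimated_cycles = total_pj / AVG_PJ_PER_CYCLE;
    let execution_time = estimated_cycles / REFERENCE_HZ;
    let power_watts = energy_joules / execution_time;
    (energy_joules, execution_time, power_watts)
}

fn main() {
    let (joules, seconds, watts) = power_estimate(3.5e8); // 0.00035 J
    println!("{joules:.5} J over {seconds:.3} s = {watts:.2e} W");
}
```

Worked through, total_pJ cancels: power_watts = (total_pJ * 1e-12) / (total_pJ / 0.15 / 3.0e9) = 0.15e-12 * 3.0e9 = 4.5e-4 W. With these constants, the estimated average power is therefore the same 0.45 mW for every function; only the energy and execution time scale with total_pJ.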
Thermal Estimation
```
thermal_resistance = 0.4 K/W   (typical CPU with standard cooling)
temp_delta         = power_watts * thermal_resistance
```
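A corresponding sketch for the thermal estimate, with hypothetical names. Because a temperature difference of 1 K equals 1 °C, the result can be compared directly against max_temp_delta. For example, 12.5 W gives 12.5 * 0.4 = 5.0 K of estimated temperature rise.

```rust
/// Typical CPU with standard cooling, in kelvin per watt.
const THERMAL_RESISTANCE_K_PER_W: f64 = 0.4;

/// Estimated temperature rise in K (equivalently, degrees Celsius).
fn temp_delta(power_watts: f64) -> f64 {
    power_watts * THERMAL_RESISTANCE_K_PER_W
}

/// Checks a max_temp_delta budget given in degrees Celsius.
fn within_temp_budget(power_watts: f64, max_temp_delta: f64) -> bool {
    temp_delta(power_watts) <= max_temp_delta
}

fn main() {
    let watts = 12.5;
    println!("temp rise: {:.1} K, within 10 C budget: {}",
             temp_delta(watts), within_temp_budget(watts, 10.0));
}
```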
Transitive Energy Budgets
Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:
```
#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }

#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
    // Total includes 2x helper's energy + caller's own instructions
    helper() + helper()
}
```
The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.
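Transitive totals can be computed with a memoized walk over the call graph: a function's total is its own cost plus, for each callee, the number of call sites times the callee's total. This is a hypothetical sketch, not joule-callgraph's implementation, and it assumes the call graph is acyclic (recursive functions would need a separate bound). With helper at 10 pJ and caller at 5 pJ of its own instructions plus two calls to helper, caller's total is 5 + 2 * 10 = 25 pJ.

```rust
use std::collections::HashMap;

/// Hypothetical per-function summary: own cost plus outgoing call sites.
struct Func {
    self_pj: f64,
    calls: Vec<(&'static str, u32)>, // (callee, number of call sites)
}

/// Transitive energy of `name`, memoized. Assumes an acyclic call graph
/// in which every callee is present (a missing callee panics).
fn total_pj(name: &str, graph: &HashMap<&str, Func>, memo: &mut HashMap<String, f64>) -> f64 {
    if let Some(&cached) = memo.get(name) {
        return cached;
    }
    let func = &graph[name];
    let mut total = func.self_pj;
    for &(callee, sites) in &func.calls {
        total += sites as f64 * total_pj(callee, graph, memo);
    }
    memo.insert(name.to_string(), total);
    total
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert("helper", Func { self_pj: 10.0, calls: vec![] });
    graph.insert("caller", Func { self_pj: 5.0, calls: vec![("helper", 2)] });
    let mut memo = HashMap::new();
    println!("caller: {} pJ", total_pj("caller", &graph, &mut memo)); // caller: 25 pJ
}
```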
JSON Output
When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:
```json
{
  "functions": [
    {
      "name": "process_data",
      "file": "program.joule",
      "line": 15,
      "energy_joules": 0.00035,
      "power_watts": 12.5,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded",
      "breakdown": {
        "compute_pj": 280000,
        "memory_pj": 70000,
        "branch_pj": 500
      }
    }
  ],
  "total_energy_joules": 0.00042
}
```
Violation Diagnostics
When a budget is exceeded, the compiler emits an error:
```
error: energy budget exceeded in function 'name'
  --> file.joule:line:col
   |
   | fn name(...) {
   | ^^^^^^^^^^^^^^
   |
   = estimated: X.XXXXX J (confidence: NN%)
   = budget: X.XXXXX J
   = exceeded by NNN%
```
For power and thermal budgets, similar diagnostics are produced with the appropriate units.
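The numeric fields of the diagnostic follow from the estimate and the budget. In this sketch, format_violation is a hypothetical helper, the source snippet lines are omitted, and computing "exceeded by" as the overshoot relative to the budget, (estimated - budget) / budget, is an assumption. With the values from the JSON example (0.00035 J estimated against a 0.0001 J budget at 0.85 confidence), it reports an overshoot of 250%.

```rust
/// Hypothetical formatter for the energy-budget diagnostic (snippet lines omitted).
fn format_violation(name: &str, location: &str, est_j: f64, confidence: f64, budget_j: f64) -> String {
    // Assumption: "exceeded by" is the overshoot relative to the budget.
    let exceeded_pct = (est_j - budget_j) / budget_j * 100.0;
    let conf_pct = confidence * 100.0;
    format!(
        "error: energy budget exceeded in function '{name}'\n  --> {location}\n   = estimated: {est_j:.5} J (confidence: {conf_pct:.0}%)\n   = budget: {budget_j:.5} J\n   = exceeded by {exceeded_pct:.0}%"
    )
}

fn main() {
    println!("{}", format_violation("process_data", "program.joule:15:1", 0.00035, 0.85, 0.0001));
}
```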