Energy System Specification

This is the formal specification for Joule's compile-time energy verification system.

Overview

The energy system consists of:

Energy budget attributes -- Programmer-declared constraints on function energy consumption
Energy estimator -- Static analysis that estimates energy from HIR
Energy cost model -- Calibrated per-instruction energy costs
Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
Accelerator energy -- Runtime measurement for GPUs and other accelerators
Diagnostics -- Error messages when budgets are violated

Attribute Syntax

#[energy_budget( budget_param { , budget_param } )]

Where budget_param is one of:

Parameter	Type	Unit	Description
`max_joules`	`f64`	joules	Maximum total energy
`max_watts`	`f64`	watts	Maximum average power
`max_temp_delta`	`f64`	celsius	Maximum temperature rise

Estimation Model

Instruction Costs

The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:

Instruction	Base Cost (pJ)	Thermal Scaling
`IntAdd`	0.05	Linear
`IntSub`	0.05	Linear
`IntMul`	0.35	Linear
`IntDiv`	3.5	Linear
`IntRem`	3.5	Linear
`FloatAdd`	0.35	Quadratic
`FloatSub`	0.35	Quadratic
`FloatMul`	0.35	Quadratic
`FloatDiv`	3.5	Quadratic
`FloatSqrt`	5.25	Quadratic
`MemLoadL1`	0.5	Linear
`MemLoadL2`	3.0	Linear
`MemLoadL3`	10.0	Linear
`MemLoadDram`	200.0	Linear
`MemStoreDram`	200.0	Linear
`BranchTaken`	0.1	None
`BranchNotTaken`	0.1	None
`BranchMispredicted`	1.5	None
`SimdF32x8Add`	1.5	Quadratic
`SimdF32x8Mul`	1.5	Quadratic
`SimdF32x8Div`	7.0	Quadratic
`SimdF32x8Fma`	2.0	Quadratic

Thermal Scaling

Actual cost = base_cost * thermal_factor, where thermal_factor depends on the instruction's thermal scaling class from the table above:

None: cost is constant regardless of temperature
Linear: actual = base * (1.0 + 0.3 * thermal_state)
Quadratic: actual = base * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)

Default thermal state: 0.3 (nominal operating temperature).
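The scaling rules above can be sketched as a small function. This is an illustrative Python model, not the compiler's implementation; the names `Scaling`, `thermal_factor`, and `actual_cost_pj` are hypothetical.

```python
from enum import Enum


class Scaling(Enum):
    """Thermal scaling classes from the instruction cost table."""
    NONE = "none"
    LINEAR = "linear"
    QUADRATIC = "quadratic"


DEFAULT_THERMAL_STATE = 0.3  # nominal operating temperature, per the spec


def thermal_factor(scaling, thermal_state=DEFAULT_THERMAL_STATE):
    """Multiplier applied to an instruction's base cost."""
    if scaling is Scaling.NONE:
        return 1.0
    if scaling is Scaling.LINEAR:
        return 1.0 + 0.3 * thermal_state
    if scaling is Scaling.QUADRATIC:
        return 1.0 + 0.3 * thermal_state + 0.1 * thermal_state ** 2
    raise ValueError(f"unknown scaling class: {scaling!r}")


def actual_cost_pj(base_pj, scaling, thermal_state=DEFAULT_THERMAL_STATE):
    """Base cost in picojoules scaled by the thermal factor."""
    return base_pj * thermal_factor(scaling, thermal_state)
```

At the default thermal state the linear factor is 1.09 and the quadratic factor is 1.099, so `FloatDiv` (3.5 pJ base) comes out at roughly 3.85 pJ.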

Expression Costs

Expression	Cost
Literal	0.01 pJ
Variable access	L1 load
Binary operation	left + right + op_cost
Unary operation	inner + op_cost
Function call	args + branch + 2x L1 (stack)
Method call	receiver + args + branch + 3x L1
Field access	inner + IntAdd + L1
Index access	array + index + IntMul + IntAdd + branch (bounds) + L1
Struct construction	fields + (field_count x L1)
Array construction	elements + (element_count x L1)
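As a worked illustration of the table above, the following Python sketch walks a tiny expression tree and sums base costs. The tuple encoding and the `cost_pj` name are assumptions made for this example, thermal scaling is ignored, and the branch cost is taken to be 0.1 pJ (the `BranchTaken`/`BranchNotTaken` value).

```python
# Base costs in picojoules, copied from the instruction cost table.
L1_LOAD = 0.5
BRANCH = 0.1
LITERAL = 0.01
OP_COST = {"IntAdd": 0.05, "IntMul": 0.35, "IntDiv": 3.5, "FloatAdd": 0.35}


def cost_pj(expr):
    """Estimate the base cost of a tuple-encoded expression (hypothetical encoding)."""
    kind = expr[0]
    if kind == "lit":            # ("lit",)
        return LITERAL
    if kind == "var":            # ("var",) costs one L1 load
        return L1_LOAD
    if kind == "binary":         # ("binary", op, left, right)
        _, op, left, right = expr
        return cost_pj(left) + cost_pj(right) + OP_COST[op]
    if kind == "field":          # ("field", inner)
        return cost_pj(expr[1]) + OP_COST["IntAdd"] + L1_LOAD
    if kind == "index":          # ("index", array, index), includes the bounds check
        _, array, index = expr
        return (cost_pj(array) + cost_pj(index) + OP_COST["IntMul"]
                + OP_COST["IntAdd"] + BRANCH + L1_LOAD)
    if kind == "call":           # ("call", [args]), 2x L1 for the stack
        return sum(cost_pj(arg) for arg in expr[1]) + BRANCH + 2 * L1_LOAD
    raise ValueError(f"unsupported expression kind: {kind!r}")


# a[i] + 1, with a and i as plain variables
example = ("binary", "IntAdd", ("index", ("var",), ("var",)), ("lit",))
```

For `a[i] + 1` the index access contributes 2.0 pJ (two variable loads, `IntMul`, `IntAdd`, the bounds-check branch, and the element load), giving about 2.06 pJ in total.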

Loop Estimation

Known bounds: body_cost * iteration_count
Unknown bounds: body_cost * default_iterations (100)
Max iterations cap: 10,000
PGO-refined: body_cost * actual_trip_count (from profile data)

Unknown-bound loops multiply the confidence score by 0.7. A loop refined with PGO data is instead scored at 0.95.
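A minimal Python sketch of the loop rules follows. The function name `loop_estimate` is hypothetical, and two points are assumptions rather than statements from this specification: that profile data takes priority over a static bound, and that the 10,000-iteration cap applies to every source of iteration counts.

```python
DEFAULT_ITERATIONS = 100
MAX_ITERATIONS = 10_000


def loop_estimate(body_cost_pj, known_bound=None, pgo_trip_count=None):
    """Return (estimated loop cost in pJ, confidence multiplier)."""
    if pgo_trip_count is not None:
        # Assumption: profile data wins over a statically known bound.
        iterations, confidence = pgo_trip_count, 0.95
    elif known_bound is not None:
        iterations, confidence = known_bound, 1.0
    else:
        iterations, confidence = DEFAULT_ITERATIONS, 0.7
    # Assumption: the cap applies regardless of where the count came from.
    iterations = min(iterations, MAX_ITERATIONS)
    return body_cost_pj * iterations, confidence
```

With a 2.0 pJ body, an unknown-bound loop is estimated at 200 pJ with a 0.7 multiplier, while a profiled trip count of one million is capped and estimated at 20,000 pJ.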

Branch Estimation

if/else: condition + avg(then_cost, else_cost) + branch_cost
match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)

Each branch multiplies the confidence score by 0.9 (if/else) or 0.85 (match).
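The two branch formulas can be written out directly. In this Python sketch the function names are hypothetical and `branch_cost` is assumed to be the 0.1 pJ `BranchTaken`/`BranchNotTaken` base cost.

```python
BRANCH_COST_PJ = 0.1  # assumed: BranchTaken / BranchNotTaken base cost


def if_else_cost_pj(condition_pj, then_pj, else_pj):
    """condition + avg(then_cost, else_cost) + branch_cost"""
    return condition_pj + (then_pj + else_pj) / 2 + BRANCH_COST_PJ


def match_cost_pj(scrutinee_pj, arm_costs_pj):
    """scrutinee + avg(arm_costs) + (arm_count x branch_cost)"""
    if not arm_costs_pj:
        raise ValueError("a match needs at least one arm")
    average_arm = sum(arm_costs_pj) / len(arm_costs_pj)
    return scrutinee_pj + average_arm + len(arm_costs_pj) * BRANCH_COST_PJ
```

For example, a 0.5 pJ condition guarding arms of 2.0 pJ and 4.0 pJ is estimated at 3.6 pJ, since the model averages the arms rather than taking the worst case.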

Confidence Score

Range: 0.0 to 1.0

Straight-line code: 1.0
Each if/else: multiply by 0.9
Each match: multiply by 0.85
Each unbounded loop: multiply by 0.7
PGO-refined loop: multiply by 0.95

The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
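Because every construct contributes a multiplicative factor, the score for a function can be sketched as a product of powers. The `confidence` function and its parameter names below are hypothetical.

```python
def confidence(if_else=0, match=0, unbounded_loops=0, pgo_loops=0):
    """Confidence score from counts of each construct in a function body."""
    return (0.9 ** if_else
            * 0.85 ** match
            * 0.7 ** unbounded_loops
            * 0.95 ** pgo_loops)
```

A function with two if/else branches and one unbounded loop scores 0.9 x 0.9 x 0.7, or about 0.57, and straight-line code stays at 1.0.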

Energy IR (EIR)

The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:

HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR

EIR nodes include:

EirExpr -- Expressions with energy costs
EirStmt -- Statements with energy costs
EirBody -- Function bodies with total energy and effect sets

Effect Sets

EIR tracks side effects using EffectSet:

Pure (no effects)
IO (reads/writes)
Alloc (heap allocation)
Panic (may abort)

The e-graph optimizer uses effect information to determine which rewrites are safe.
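One way to picture how effect information gates a rewrite is with a flag set. The Python sketch below is purely illustrative: the four flags come from the list above, but the `combine` helper and the rule that only Pure expressions may be dropped when unused are assumptions, not documented optimizer behavior.

```python
from enum import Flag, auto
from functools import reduce
from operator import or_


class EffectSet(Flag):
    PURE = 0        # no effects
    IO = auto()     # reads/writes
    ALLOC = auto()  # heap allocation
    PANIC = auto()  # may abort


def combine(*effects):
    """Effect set of a compound node: the union of its children's effects."""
    return reduce(or_, effects, EffectSet.PURE)


def can_drop_if_unused(effects):
    """Assumed conservative rule: only effect-free code may be eliminated."""
    return effects == EffectSet.PURE
```

Under this assumed rule, an unused expression that may panic or perform IO is kept, while an unused pure arithmetic expression is a candidate for dead code elimination.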

E-Graph Optimization

When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:

Arithmetic simplification (x + 0 -> x, x * 1 -> x)
Constant folding
Dead code elimination
Common subexpression elimination
Strength reduction (x * 2 -> x << 1)
Energy-aware rewrites (prefer lower-energy equivalent operations)
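To make a few of these rules concrete, here is a small Python sketch that applies them with a single bottom-up pass over tuple-encoded expressions. It is a plain term rewriter, not an e-graph (it commits to each rewrite immediately instead of keeping equivalent forms side by side), and the encoding and `simplify` name are hypothetical.

```python
def simplify(expr):
    """Apply x + 0 -> x, x * 1 -> x, x * 2 -> x << 1, and constant folding.

    Expressions are ints (constants), strings (variables), or
    (op, lhs, rhs) tuples with op in {"add", "mul", "shl"}.
    """
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr[0], simplify(expr[1]), simplify(expr[2])

    # Constant folding
    if isinstance(lhs, int) and isinstance(rhs, int):
        if op == "add":
            return lhs + rhs
        if op == "mul":
            return lhs * rhs

    # Arithmetic simplification (right-hand identities only, for brevity)
    if op == "add" and rhs == 0:
        return lhs
    if op == "mul" and rhs == 1:
        return lhs

    # Strength reduction
    if op == "mul" and rhs == 2:
        return ("shl", lhs, 1)

    return (op, lhs, rhs)
```

For instance, `simplify(("add", ("mul", "x", 1), 0))` reduces to `"x"`, and `("mul", "x", 2)` becomes `("shl", "x", 1)`.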

Three-Tier Measurement

Tier 1: Static Estimation

Compile-time energy estimation using the instruction cost model. Available for all programs; no hardware access is required.

Tier 2: CPU Performance Counters

Runtime measurement using hardware performance counters:

Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
Apple Silicon: powermetrics integration
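For readers who want to try a RAPL reading by hand, the Python sketch below uses the Linux powercap sysfs interface, which is a different access path from the perf_event and MSR routes named above. The zone path is an assumption that varies by machine, reading `energy_uj` typically requires elevated privileges, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumed package-0 zone; check /sys/class/powercap on the target machine.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(zone=RAPL_ZONE, name="energy_uj"):
    """Read a microjoule counter file from a powercap zone."""
    return int((zone / name).read_text().strip())


def energy_delta_uj(before_uj, after_uj, max_range_uj):
    """Energy between two readings, allowing for a single counter wrap."""
    if after_uj >= before_uj:
        return after_uj - before_uj
    return (max_range_uj - before_uj) + after_uj


def measure_joules(workload, zone=RAPL_ZONE):
    """Run workload() and return the package energy it consumed, in joules."""
    max_range = read_counter_uj(zone, "max_energy_range_uj")
    before = read_counter_uj(zone)
    workload()
    after = read_counter_uj(zone)
    return energy_delta_uj(before, after, max_range) * 1e-6
```

The counter covers the whole package, so the result includes energy from everything else running during the measurement window.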

Tier 3: Accelerator Energy

Runtime measurement using vendor-specific APIs:

Vendor	API	Measurement
NVIDIA	NVML (`nvmlDeviceGetTotalEnergyConsumption`)	Board power, per-GPU
AMD	ROCm SMI (`rsmi_dev_power_ave_get`)	Average power, per-GPU
Intel	Level Zero (`zesDeviceGetProperties` + power domains)	Per-device power
Google	TPU Runtime	Per-chip power
AWS	Neuron SDK	Per-core power
Groq	HLML (`hlmlDeviceGetTotalEnergyConsumption`)	Board power
Cerebras	CS SDK	Wafer-scale power
SambaNova	DataScale API	Per-RDU power

See Accelerator Energy Measurement for details.

Power Estimation

avg_pj_per_cycle = 0.15  (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time = estimated_cycles / reference_frequency  (3.0 GHz)
power_watts = energy_joules / execution_time
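The four steps above, written out as a Python sketch with explicit unit conversion. The function name and the 350,000 pJ input are illustrative only.

```python
AVG_PJ_PER_CYCLE = 0.15         # weighted average for mixed workloads
REFERENCE_FREQUENCY_HZ = 3.0e9  # 3.0 GHz
PJ_PER_JOULE = 1e12


def estimate_power(total_pj):
    """Return (estimated_cycles, execution_time_s, power_watts)."""
    if total_pj <= 0:
        raise ValueError("total_pj must be positive")
    estimated_cycles = total_pj / AVG_PJ_PER_CYCLE
    execution_time_s = estimated_cycles / REFERENCE_FREQUENCY_HZ
    energy_joules = total_pj / PJ_PER_JOULE
    power_watts = energy_joules / execution_time_s
    return estimated_cycles, execution_time_s, power_watts


cycles, seconds, watts = estimate_power(350_000.0)
```

Note that with these four lines alone, total_pJ cancels out: power always equals avg_pj_per_cycle x reference_frequency, or 0.45 mW. A workload-dependent power figure would therefore need a cycle or time estimate that is not derived from the energy total.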

Thermal Estimation

thermal_resistance = 0.4 K/W  (typical CPU with standard cooling)
temp_delta = power_watts * thermal_resistance
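As a quick worked example in Python, with hypothetical function names and an illustrative 12.5 W input: the comparison against `max_temp_delta` is an assumption about how the budget check would use this value.

```python
THERMAL_RESISTANCE_K_PER_W = 0.4  # typical CPU with standard cooling


def temp_delta(power_watts):
    """Estimated temperature rise in kelvin (equivalently, degrees Celsius)."""
    return power_watts * THERMAL_RESISTANCE_K_PER_W


def within_thermal_budget(power_watts, max_temp_delta):
    """Assumed check: the estimated rise must not exceed max_temp_delta."""
    return temp_delta(power_watts) <= max_temp_delta
```

At 12.5 W the estimated rise is 5.0 K, which would satisfy a `max_temp_delta` of 10.0 but not one of 2.0.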

Transitive Energy Budgets

Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:

#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }

#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
    helper() + helper()
    // Total includes 2x helper's energy + caller's own instructions
}

The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.
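The transitive rule can be sketched as a recursive sum over a call graph. This Python model is not how joule-callgraph is implemented; the data layout, the function names, and the picojoule figures are invented for illustration, and recursion is simply rejected because this section does not say how it is handled.

```python
def total_energy_pj(name, own_pj, calls, _stack=()):
    """Own cost plus the total cost of every call site, recursively."""
    if name in _stack:
        raise ValueError(f"recursive call cycle through {name!r}")
    callee_total = sum(
        total_energy_pj(callee, own_pj, calls, _stack + (name,))
        for callee in calls.get(name, [])
    )
    return own_pj[name] + callee_total


def within_budget(name, own_pj, calls, max_joules):
    """Compare the transitive total (converted to joules) with the budget."""
    return total_energy_pj(name, own_pj, calls) * 1e-12 <= max_joules


# Mirrors the caller/helper example above with made-up costs.
own_pj = {"helper": 1.0, "caller": 2.5}
calls = {"caller": ["helper", "helper"]}  # one entry per call site
```

Here `caller` totals 4.5 pJ: its own 2.5 pJ plus 1.0 pJ for each of the two `helper` calls.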

JSON Output

When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:

{
  "functions": [
    {
      "name": "process_data",
      "file": "program.joule",
      "line": 15,
      "energy_joules": 0.00035,
      "power_watts": 12.5,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded",
      "breakdown": {
        "compute_pj": 280000,
        "memory_pj": 70000,
        "branch_pj": 500
      }
    }
  ],
  "total_energy_joules": 0.00042
}
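A report in this shape is straightforward to consume from tooling. The Python sketch below lists the functions whose status is "exceeded" together with their overshoot. The field names come from the sample above; the helper name and the overshoot formula, (estimated - budget) / budget, are assumptions.

```python
import json

SAMPLE_REPORT = """
{
  "functions": [
    {
      "name": "process_data",
      "energy_joules": 0.00035,
      "budget_joules": 0.0001,
      "status": "exceeded"
    }
  ],
  "total_energy_joules": 0.00042
}
"""


def exceeded_functions(report_json):
    """Return (name, overshoot_percent) for each function over budget."""
    report = json.loads(report_json)
    results = []
    for fn in report["functions"]:
        if fn.get("status") != "exceeded":
            continue
        budget = fn["budget_joules"]
        # Assumed formula for the overshoot percentage.
        overshoot = (fn["energy_joules"] - budget) / budget * 100.0
        results.append((fn["name"], overshoot))
    return results


violations = exceeded_functions(SAMPLE_REPORT)
```

For the sample report this yields `process_data` with an overshoot of about 250%.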

Violation Diagnostics

When a budget is exceeded, the compiler emits an error:

error: energy budget exceeded in function 'name'
  --> file.joule:line:col
   |
   | fn name(...) {
   | ^^^^^^^^^^^^^^
   |
   = estimated: X.XXXXX J (confidence: NN%)
   = budget:    X.XXXXX J
   = exceeded by NNN%

For power and thermal budgets, similar diagnostics are produced with the appropriate units.
