Energy System Specification

This is the formal specification for Joule's compile-time energy verification system.

Overview

The energy system consists of:

  1. Energy budget attributes -- Programmer-declared constraints on function energy consumption
  2. Energy estimator -- Static analysis that estimates energy from HIR
  3. Energy cost model -- Calibrated per-instruction energy costs
  4. Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
  5. Accelerator energy -- Runtime measurement for GPUs and other accelerators
  6. Diagnostics -- Error messages when budgets are violated

Attribute Syntax

#[energy_budget( budget_param { , budget_param } )]

Where budget_param is one of:

Parameter        Type   Unit      Description
max_joules       f64    joules    Maximum total energy
max_watts        f64    watts     Maximum average power
max_temp_delta   f64    celsius   Maximum temperature rise

Estimation Model

Instruction Costs

The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:

Instruction         Base Cost (pJ)   Thermal Scaling
IntAdd              0.05             Linear
IntSub              0.05             Linear
IntMul              0.35             Linear
IntDiv              3.5              Linear
IntRem              3.5              Linear
FloatAdd            0.35             Quadratic
FloatSub            0.35             Quadratic
FloatMul            0.35             Quadratic
FloatDiv            3.5              Quadratic
FloatSqrt           5.25             Quadratic
MemLoadL1           0.5              Linear
MemLoadL2           3.0              Linear
MemLoadL3           10.0             Linear
MemLoadDram         200.0            Linear
MemStoreDram        200.0            Linear
BranchTaken         0.1              None
BranchNotTaken      0.1              None
BranchMispredicted  1.5              None
SimdF32x8Add        1.5              Quadratic
SimdF32x8Mul        1.5              Quadratic
SimdF32x8Div        7.0              Quadratic
SimdF32x8Fma        2.0              Quadratic

Thermal Scaling

actual_cost = base_cost * thermal_factor, where thermal_factor depends on the instruction's thermal scaling class (None, Linear, or Quadratic) and on the current thermal_state:

  • None: cost is constant regardless of temperature
  • Linear: actual = base * (1.0 + 0.3 * thermal_state)
  • Quadratic: actual = base * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)

Default thermal state: 0.3 (nominal operating temperature).
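
The scaling rules above can be sketched in a few lines of Python. This is an illustrative model of the formulas only, not the compiler's implementation; the names thermal_factor and actual_cost are hypothetical.

```python
# Sketch of the thermal scaling formulas from this section.
# Function names are illustrative and not part of Joule itself.
DEFAULT_THERMAL_STATE = 0.3  # nominal operating temperature (from the spec)


def thermal_factor(scaling: str, thermal_state: float = DEFAULT_THERMAL_STATE) -> float:
    """Return the multiplier applied to an instruction's base cost."""
    if scaling == "None":
        return 1.0
    if scaling == "Linear":
        return 1.0 + 0.3 * thermal_state
    if scaling == "Quadratic":
        return 1.0 + 0.3 * thermal_state + 0.1 * thermal_state ** 2
    raise ValueError(f"unknown thermal scaling: {scaling!r}")


def actual_cost(base_cost_pj: float, scaling: str,
                thermal_state: float = DEFAULT_THERMAL_STATE) -> float:
    """Base cost in picojoules scaled by the thermal factor."""
    return base_cost_pj * thermal_factor(scaling, thermal_state)


# At the default thermal state the factors are 1.09 (Linear) and 1.099
# (Quadratic), so FloatDiv (3.5 pJ base, Quadratic) costs about 3.85 pJ.
print(actual_cost(3.5, "Quadratic"))
```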

Expression Costs

Expression            Cost
Literal               0.01 pJ
Variable access       L1 load
Binary operation      left + right + op_cost
Unary operation       inner + op_cost
Function call         args + branch + 2x L1 (stack)
Method call           receiver + args + branch + 3x L1
Field access          inner + IntAdd + L1
Index access          array + index + IntMul + IntAdd + branch (bounds) + L1
Struct construction   fields + (field_count x L1)
Array construction    elements + (element_count x L1)
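
As a worked example, the rules above can be composed to cost the expression a[i] + 1. The sketch below is a Python model with hypothetical helper names. It assumes base costs without thermal scaling, and it assumes that "branch" means the 0.1 pJ base cost shared by BranchTaken and BranchNotTaken.

```python
# Base costs in pJ, copied from the instruction cost table. Thermal scaling is
# ignored here (an assumption) to keep the arithmetic easy to follow.
LITERAL = 0.01
L1_LOAD = 0.5
INT_ADD = 0.05
INT_MUL = 0.35
BRANCH = 0.1  # assumed: BranchTaken and BranchNotTaken share this base cost


def variable() -> float:
    """Variable access costs one L1 load."""
    return L1_LOAD


def binary_op(left: float, right: float, op_cost: float) -> float:
    """left + right + op_cost"""
    return left + right + op_cost


def index_access(array: float, index: float) -> float:
    """array + index + IntMul + IntAdd + branch (bounds) + L1"""
    return array + index + INT_MUL + INT_ADD + BRANCH + L1_LOAD


def function_call(arg_costs: list[float]) -> float:
    """args + branch + 2x L1 (stack)"""
    return sum(arg_costs) + BRANCH + 2 * L1_LOAD


# Cost of `a[i] + 1`: two variable loads feed the index access (2.0 pJ),
# then the literal (0.01 pJ) and the IntAdd (0.05 pJ) are added on top.
indexed = index_access(variable(), variable())
total = binary_op(indexed, LITERAL, INT_ADD)
print(f"{total:.2f} pJ")  # prints "2.06 pJ"
```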

Loop Estimation

  • Known bounds: body_cost * iteration_count
  • Unknown bounds: body_cost * default_iterations (100)
  • Max iterations cap: 10,000
  • PGO-refined: body_cost * actual_trip_count (from profile data)

Each loop with unknown bounds multiplies the confidence score by 0.7. When PGO data supplies the trip count, the loop's factor is 0.95 instead.
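
The loop rules can be modeled as below. The spec does not say how the 10,000-iteration cap interacts with the other cases; this sketch assumes it clamps whichever iteration count is selected, and that PGO data takes precedence over a static bound. The function name and parameters are hypothetical.

```python
from typing import Optional

DEFAULT_ITERATIONS = 100   # used when the bound is unknown
MAX_ITERATIONS = 10_000    # cap from the spec


def loop_cost(body_cost_pj: float,
              known_count: Optional[int] = None,
              pgo_trip_count: Optional[int] = None) -> float:
    """Estimate loop energy in pJ from the body cost and an iteration count."""
    if pgo_trip_count is not None:      # assumed: profile data wins
        iterations = pgo_trip_count
    elif known_count is not None:       # statically known bound
        iterations = known_count
    else:                               # unknown bound
        iterations = DEFAULT_ITERATIONS
    # Assumed: the cap clamps the selected count in every case.
    return body_cost_pj * min(iterations, MAX_ITERATIONS)


print(loop_cost(2.0))                      # unknown bound: 100 iterations
print(loop_cost(2.0, known_count=50_000))  # clamped to 10,000 iterations
print(loop_cost(2.0, pgo_trip_count=7))    # profile-derived trip count
```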

Branch Estimation

  • if/else: condition + avg(then_cost, else_cost) + branch_cost
  • match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)

Each branch multiplies the confidence score by 0.9 (if/else) or 0.85 (match).
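
A minimal Python sketch of the two branch formulas follows. The helper names are hypothetical, and branch_cost is assumed to be the 0.1 pJ base cost of BranchTaken/BranchNotTaken.

```python
BRANCH_COST_PJ = 0.1  # assumed: base cost of a taken or not-taken branch


def if_else_cost(condition: float, then_cost: float, else_cost: float) -> float:
    """condition + avg(then_cost, else_cost) + branch_cost"""
    return condition + (then_cost + else_cost) / 2 + BRANCH_COST_PJ


def match_cost(scrutinee: float, arm_costs: list[float]) -> float:
    """scrutinee + avg(arm_costs) + (arm_count x branch_cost)"""
    if not arm_costs:
        raise ValueError("a match needs at least one arm")
    average = sum(arm_costs) / len(arm_costs)
    return scrutinee + average + len(arm_costs) * BRANCH_COST_PJ


# An if/else with a 0.5 pJ condition and 2.0 / 4.0 pJ arms: 0.5 + 3.0 + 0.1
print(f"{if_else_cost(0.5, 2.0, 4.0):.2f} pJ")
# A three-arm match on a 0.5 pJ scrutinee: 0.5 + 2.0 + 3 * 0.1
print(f"{match_cost(0.5, [1.0, 2.0, 3.0]):.2f} pJ")
```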

Confidence Score

Range: 0.0 to 1.0

  • Straight-line code: 1.0
  • Each if/else: multiply by 0.9
  • Each match: multiply by 0.85
  • Each loop with unknown bounds: multiply by 0.7
  • Each PGO-refined loop: multiply by 0.95 (in place of 0.7)

The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
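
Because the factors are multiplicative, the score for a function can be computed from a count of each construct. The following Python sketch assumes that reading; the function and parameter names are hypothetical.

```python
def confidence(if_else: int = 0, match: int = 0,
               unknown_loops: int = 0, pgo_loops: int = 0) -> float:
    """Multiply the per-construct factors listed above; 1.0 for straight-line code."""
    return (0.9 ** if_else
            * 0.85 ** match
            * 0.7 ** unknown_loops
            * 0.95 ** pgo_loops)


# Two if/else branches inside one loop with unknown bounds:
# 0.9 * 0.9 * 0.7 = 0.567, reported as roughly 57%.
print(f"{confidence(if_else=2, unknown_loops=1):.0%}")
```

Confidence drops quickly with nesting depth and branch count, which is why profile data on hot loops has a noticeable effect on the reported figure.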

Energy IR (EIR)

The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:

HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR

EIR nodes include:

  • EirExpr -- Expressions with energy costs
  • EirStmt -- Statements with energy costs
  • EirBody -- Function bodies with total energy and effect sets

Effect Sets

EIR tracks side effects using EffectSet:

  • Pure (no effects)
  • IO (reads/writes)
  • Alloc (heap allocation)
  • Panic (may abort)

The e-graph optimizer uses effect information to determine which rewrites are safe.
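
One way to picture an effect set is as a bit set that is unioned over sub-expressions. The Python sketch below is only a model: the spec does not define the safety rules, so the policy shown (only Pure expressions may be dropped when unused) is a conservative assumption, and all names are hypothetical.

```python
from enum import Flag


class Effect(Flag):
    """Effect kinds from the list above, modeled as combinable flags."""
    PURE = 0
    IO = 1
    ALLOC = 2
    PANIC = 4


def combine(*effects: Effect) -> Effect:
    """Effect set of a compound node: the union of its children's effects."""
    result = Effect.PURE
    for effect in effects:
        result |= effect
    return result


def can_eliminate_if_unused(effects: Effect) -> bool:
    """Assumed conservative rule: only effect-free code may be removed."""
    return effects == Effect.PURE


body = combine(Effect.ALLOC, Effect.PANIC)
print(can_eliminate_if_unused(body))           # False: may allocate or abort
print(can_eliminate_if_unused(combine()))      # True: no effects at all
```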

E-Graph Optimization

When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:

  • Arithmetic simplification (x + 0 -> x, x * 1 -> x)
  • Constant folding
  • Dead code elimination
  • Common subexpression elimination
  • Strength reduction (x * 2 -> x << 1)
  • Energy-aware rewrites (prefer lower-energy equivalent operations)
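
To make a few of these rules concrete, here is a small Python sketch that applies them to expressions written as nested tuples. It is a plain bottom-up term rewriter, not an e-graph (there is no equality saturation or cost-based extraction), and the tuple encoding is an assumption made for the example.

```python
# Expressions are ints, variable names (str), or (op, lhs, rhs) tuples.
def simplify(expr):
    """Apply constant folding, identity removal, and strength reduction once, bottom-up."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs), simplify(rhs)

    # Constant folding for integer operands.
    if isinstance(lhs, int) and isinstance(rhs, int):
        if op == "+":
            return lhs + rhs
        if op == "*":
            return lhs * rhs

    if op == "+" and rhs == 0:      # x + 0 -> x
        return lhs
    if op == "*" and rhs == 1:      # x * 1 -> x
        return lhs
    if op == "*" and rhs == 2:      # x * 2 -> x << 1 (strength reduction)
        return ("<<", lhs, 1)
    return (op, lhs, rhs)


print(simplify(("*", ("+", "x", 0), 2)))  # prints ('<<', 'x', 1)
```

With the cost table above, the last rewrite replaces an IntMul (0.35 pJ base) with a shift. The table lists no cost for shifts, so whether this saves energy under the real model is not something this sketch can show.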

Three-Tier Measurement

Tier 1: Static Estimation

Compile-time energy estimation using the instruction cost model. Available for all programs, no hardware access required.

Tier 2: CPU Performance Counters

Runtime measurement using hardware performance counters:

  • Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
  • Apple Silicon: powermetrics integration

Tier 3: Accelerator Energy

Runtime measurement using vendor-specific APIs:

Vendor      API                                                   Measurement
NVIDIA      NVML (nvmlDeviceGetTotalEnergyConsumption)            Board power, per-GPU
AMD         ROCm SMI (rsmi_dev_power_ave_get)                     Average power, per-GPU
Intel       Level Zero (zesDeviceGetProperties + power domains)   Per-device power
Google      TPU Runtime                                           Per-chip power
AWS         Neuron SDK                                            Per-core power
Groq        HLML (hlmlDeviceGetTotalEnergyConsumption)            Board power
Cerebras    CS SDK                                                Wafer-scale power
SambaNova   DataScale API                                         Per-RDU power

See Accelerator Energy Measurement for details.

Power Estimation

avg_pj_per_cycle = 0.15  (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time = estimated_cycles / reference_frequency  (3.0 GHz)
power_watts = energy_joules / execution_time
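
The four steps above translate directly into Python. This sketch assumes energy_joules is total_pJ converted with 1 pJ = 1e-12 J; the function name is hypothetical.

```python
AVG_PJ_PER_CYCLE = 0.15         # weighted average for mixed workloads
REFERENCE_FREQUENCY_HZ = 3.0e9  # 3.0 GHz


def estimate_power(total_pj: float) -> tuple[float, float, float]:
    """Return (estimated_cycles, execution_time_s, power_watts)."""
    if total_pj <= 0:
        raise ValueError("total_pj must be positive")
    energy_joules = total_pj * 1e-12                     # pJ -> J
    estimated_cycles = total_pj / AVG_PJ_PER_CYCLE
    execution_time = estimated_cycles / REFERENCE_FREQUENCY_HZ
    power_watts = energy_joules / execution_time
    return estimated_cycles, execution_time, power_watts


cycles, seconds, watts = estimate_power(350_500)
print(f"{cycles:.0f} cycles, {seconds * 1e3:.3f} ms, {watts * 1e3:.2f} mW")
```

Taken literally, total_pJ cancels out of the last step, so these formulas yield avg_pj_per_cycle * reference_frequency (0.45 mW with the constants shown) for any input.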

Thermal Estimation

thermal_resistance = 0.4 K/W  (typical CPU with standard cooling)
temp_delta = power_watts * thermal_resistance
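
For example, the thermal check for a max_temp_delta budget could be modeled as follows. The function names are hypothetical; a temperature difference in kelvin is numerically equal to the same difference in degrees Celsius.

```python
THERMAL_RESISTANCE_K_PER_W = 0.4  # typical CPU with standard cooling


def temp_delta(power_watts: float) -> float:
    """Estimated temperature rise (K, equivalently degrees C) for a given average power."""
    return power_watts * THERMAL_RESISTANCE_K_PER_W


def within_thermal_budget(power_watts: float, max_temp_delta: float) -> bool:
    """True when the estimated rise does not exceed the declared budget."""
    return temp_delta(power_watts) <= max_temp_delta


# 12.5 W through 0.4 K/W gives a rise of about 5 degrees.
print(f"{temp_delta(12.5):.1f}")  # prints "5.0"
```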

Transitive Energy Budgets

Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:

#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }

#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
    helper() + helper()
    // Total includes 2x helper's energy + caller's own instructions
}

The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.
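
The transitive rule can be illustrated with a small Python model of a call graph. This is not how joule-callgraph is implemented; the data layout, the per-function energy figures, and the decision to reject recursion are all assumptions made for the example.

```python
def total_energy(fn: str, own_cost: dict, calls: dict, _stack: tuple = ()) -> float:
    """Own energy of `fn` plus the energy of everything it calls, in joules."""
    if fn in _stack:
        # The spec does not describe recursion; this sketch simply rejects it.
        raise ValueError(f"recursive call cycle through {fn!r}")
    total = own_cost[fn]
    for callee, call_count in calls.get(fn, {}).items():
        total += call_count * total_energy(callee, own_cost, calls, _stack + (fn,))
    return total


# Hypothetical per-function estimates (joules) for the example above.
own_cost = {"helper": 1.0e-5, "caller": 2.0e-5}
calls = {"caller": {"helper": 2}}          # caller invokes helper twice
budgets = {"helper": 0.0001, "caller": 0.0005}

for name in own_cost:
    energy = total_energy(name, own_cost, calls)
    status = "ok" if energy <= budgets[name] else "exceeded"
    print(f"{name}: {energy:.6f} J ({status})")
```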

JSON Output

When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:

{
  "functions": [
    {
      "name": "process_data",
      "file": "program.joule",
      "line": 15,
      "energy_joules": 0.00035,
      "power_watts": 12.5,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded",
      "breakdown": {
        "compute_pj": 280000,
        "memory_pj": 70000,
        "branch_pj": 500
      }
    }
  ],
  "total_energy_joules": 0.00042
}
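
A tool consuming this report might filter for violated budgets. The Python sketch below uses only the field names shown in the sample; the overshoot formula, (estimated - budget) / budget, is an assumption, since the spec does not define how the percentage is computed.

```python
import json

# Trimmed copy of the sample report above.
report_text = """
{
  "functions": [
    {
      "name": "process_data",
      "energy_joules": 0.00035,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded"
    }
  ],
  "total_energy_joules": 0.00042
}
"""

report = json.loads(report_text)

# Collect (name, overshoot percentage) for every function over budget.
violations = []
for fn in report["functions"]:
    if fn["status"] == "exceeded":
        overshoot = (fn["energy_joules"] - fn["budget_joules"]) / fn["budget_joules"] * 100
        violations.append((fn["name"], overshoot))

for name, overshoot in violations:
    print(f"{name}: over budget by {overshoot:.0f}%")
```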

Violation Diagnostics

When a budget is exceeded, the compiler emits an error:

error: energy budget exceeded in function 'name'
  --> file.joule:line:col
   |
   | fn name(...) {
   | ^^^^^^^^^^^^^^
   |
   = estimated: X.XXXXX J (confidence: NN%)
   = budget:    X.XXXXX J
   = exceeded by NNN%

For power and thermal budgets, similar diagnostics are produced with the appropriate units.