Joule Documentation

Welcome to the Joule programming language documentation (v1.2.0). Joule is developed by Open Interface Engineering, Inc.

For New Users

  • Getting Started -- Install the compiler and write your first Joule program
  • Language Tour -- Learn Joule's syntax and features through examples

Guides

  • Energy System Guide -- Energy budgets, the static estimator, and three-tier measurement
  • Compiler Reference -- joulec flags, backends, and the compilation pipeline

Language Reference

The formal specification of Joule's syntax and semantics.

  • Types -- Primitives, compounds, generics, union types, type inference
  • Expressions -- Operators, pipe operator, literals, control flow, closures
  • Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
  • Patterns -- Pattern matching: or patterns, range patterns, guard clauses
  • Attributes -- Energy budgets, #[test], #[bench], thermal awareness
  • Memory -- Ownership, borrowing, references, lifetimes
  • Concurrency -- Async/await, spawn, channels, task groups, parallel for
  • Energy -- Energy system specification with accelerator support

Standard Library Reference

Joule ships with 110+ batteries-included modules.

  • Overview -- Index of all standard library modules
  • String -- String type and operations
  • Vec -- Dynamic arrays
  • Option -- Optional values
  • Result -- Error handling
  • HashMap -- Key-value maps
  • Primitives -- Numeric types, bool, char
  • Collections -- All collection types
  • I/O -- File and stream I/O
  • Math -- Mathematical functions and linear algebra

Feedback

To report bugs, request features, or ask questions, visit joule-lang.org or open an issue on GitHub.

Getting Started with Joule

This guide walks you through installing the Joule compiler and writing your first program.

Current version: v1.2.0

Install

Platform          Command
macOS / Linux     brew install openIE-dev/joule/joule
Windows           winget install OpenIE.Joule
Ubuntu / Debian   sudo apt install joule (after adding the repo)
Arch Linux        yay -S joule-bin
Nix               nix run github:openIE-dev/joule-lang
Snap              sudo snap install joule --classic
Any (curl)        curl -fsSL https://joule-lang.org/install.sh | sh

macOS

Homebrew (recommended):

brew install openIE-dev/joule/joule

Or download joule-macos-arm64.pkg (Apple Silicon) or joule-macos-x86_64.pkg (Intel) from the releases page:

sudo installer -pkg joule-macos-arm64.pkg -target /

Windows

Winget (recommended, built into Windows 11):

winget install OpenIE.Joule

Scoop:

scoop bucket add joule https://github.com/openIE-dev/scoop-joule
scoop install joule

Chocolatey:

choco install joule

Or download joule-windows-x86_64.msi or joule-windows-arm64.msi from the releases page. The MSI installer adds joulec to your PATH automatically.

APT (Ubuntu/Debian)

curl -fsSL https://openie-dev.github.io/joule-lang/KEY.gpg | sudo gpg --dearmor -o /usr/share/keyrings/joule.gpg
echo "deb [signed-by=/usr/share/keyrings/joule.gpg] https://openie-dev.github.io/joule-lang stable main" | sudo tee /etc/apt/sources.list.d/joule.list
sudo apt update && sudo apt install joule

Arch Linux (AUR)

yay -S joule-bin

Or with any AUR helper: paru -S joule-bin, trizen -S joule-bin.

Nix

# Run without installing
nix run github:openIE-dev/joule-lang

# Install into profile
nix profile install github:openIE-dev/joule-lang

Snap

sudo snap install joule --classic

Install Script

Universal one-line installer for macOS and Linux:

curl -fsSL https://joule-lang.org/install.sh | sh

From Source

git clone https://github.com/openIE-dev/joule-lang.git
cd joule-lang && cargo build --release

From C Source (Zero Dependencies)

Download joule-c-src-*.tar.gz from the releases page:

tar xzf joule-c-src-*.tar.gz && cd joule-c-src-*
make    # or: cc -O2 -o joulec output.c -lm

Verify

joulec --version
# joulec 1.2.0

Write Your First Program

Create a file called hello.joule:

pub fn main() {
    let message = "Hello from Joule!";
    println!("{}", message);
}

Compile and Run

joulec hello.joule -o hello
./hello

Output:

Hello from Joule!

Try JIT Mode

For interactive development, skip the compile step entirely:

joulec --jit hello.joule

This JIT-compiles and runs your program in memory using the Cranelift backend. No intermediate files are produced.

For an even faster workflow, use watch mode. It monitors your source file and re-runs automatically when you save:

joulec --watch hello.joule

JIT mode requires the jit feature flag. See JIT Compilation for details.

Add an Energy Budget

Joule's defining feature is compile-time energy budget verification. Annotate your function with an energy allowance:

#[energy_budget(max_joules = 0.0001)]
pub fn main() {
    let x = 42;
    let y = 58;
    let result = x + y;
    println!("{}", result);
}

Compile with energy checking:

joulec hello.joule -o hello --energy-check

The compiler estimates the energy cost of your function at compile time. If it exceeds the declared budget, compilation fails with a diagnostic showing the estimated vs. allowed energy.

Measure Energy in Existing Code

Already have Python or JavaScript code? Joule can measure its energy consumption without rewriting it:

# Measure energy in a Python script
joulec --lift-run python script.py

# Measure energy in a JavaScript file
joulec --lift-run js app.js

# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py

The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it. See Polyglot Energy Analysis for details.

Batteries Included

Joule ships with 110+ standard library modules. No package manager needed for common tasks:

use std::math;
use std::collections::HashMap;
use std::io::File;
use std::net::TcpStream;
use std::crypto::sha256;

See the Standard Library Reference for the complete list.

Feedback

Joule is developed and maintained by Open Interface Engineering, Inc. We welcome bug reports, feature requests, and questions via joule-lang.org or GitHub Issues.

Next Steps

Language Tour

A quick introduction to Joule's syntax and features through examples.

Variables

Variables are immutable by default. Use mut for mutable bindings.

let x = 42;              // immutable, type inferred as i32
let name: String = "Jo"; // explicit type annotation
let mut count = 0;        // mutable
count = count + 1;

Primitive Types

let a: i8  = -128;        // signed integers: i8, i16, i32, i64, isize
let b: u32 = 42;          // unsigned integers: u8, u16, u32, u64, usize
let c: f64 = 3.14159;     // floats: f16, bf16, f32, f64
let d: bool = true;       // boolean
let e: char = 'A';        // unicode character
let s: String = "hello";  // string
let h: f16 = 0.5f16;      // half-precision (ML inference, signal processing)
let g: bf16 = 0.001bf16;  // brain float (ML training)

Functions

// Basic function with parameters and return type
fn add(a: i32, b: i32) -> i32 {
    a + b   // last expression is the return value
}

// Public function (visible outside the module)
pub fn greet(name: String) {
    println!("Hello, {}", name);
}

// Mutable self parameter for methods that modify state
fn advance(mut self) -> Token {
    let token = self.peek();
    self.pos = self.pos + 1;
    token
}

Structs

pub struct Point {
    pub x: f64,
    pub y: f64,
}

// Construction
let p = Point { x: 3.0, y: 4.0 };

// Field access
let dist = p.x * p.x + p.y * p.y;

Impl Blocks

Methods are defined in impl blocks, separate from the struct definition.

impl Point {
    // Associated function (constructor)
    pub fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }

    // Method on self
    pub fn distance(self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }

    // Mutable method
    pub fn translate(mut self, dx: f64, dy: f64) {
        self.x = self.x + dx;
        self.y = self.y + dy;
    }
}

let p = Point::new(3.0, 4.0);
let d = p.distance();

Enums

Enums can hold data in each variant, making them sum types (tagged unions).

pub enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
}

let s = Shape::Circle { radius: 5.0 };

Pattern Matching

match is exhaustive -- the compiler ensures you handle every variant.

fn area(shape: Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => {
            3.14159 * radius * radius
        }
        Shape::Rectangle { width, height } => {
            width * height
        }
        Shape::Triangle { base, height } => {
            0.5 * base * height
        }
    }
}

Match with a wildcard:

match token.kind {
    TokenKind::Fn => parse_function(),
    TokenKind::Struct => parse_struct(),
    TokenKind::Enum => parse_enum(),
    _ => parse_expression(),
}

Or Patterns

Match multiple alternatives in a single arm:

match x {
    1 | 2 | 3 => "small",
    4 | 5 | 6 => "medium",
    _ => "large",
}

Range Patterns

Match a range of values:

match score {
    0..=59 => "F",
    60..=69 => "D",
    70..=79 => "C",
    80..=89 => "B",
    90..=100 => "A",
    _ => "invalid",
}

Guard Clauses

Add conditions to match arms:

match value {
    x if x > 0 => "positive",
    x if x < 0 => "negative",
    _ => "zero",
}

Control Flow

// if-else (these are expressions -- they return values)
let max = if a > b { a } else { b };

// while loop
let mut i = 0;
while i < 10 {
    i = i + 1;
}

// for loop
for item in items {
    process(item);
}

// loop (infinite, break to exit)
loop {
    if done() {
        break;
    }
}

Option and Result

Option<T> represents a value that may or may not exist. Result<T, E> represents an operation that can succeed or fail.

// Option
fn find(items: Vec<i32>, target: i32) -> Option<usize> {
    let mut i = 0;
    while i < items.len() {
        if items[i] == target {
            return Option::Some(i);
        }
        i = i + 1;
    }
    Option::None
}

// Handling an Option
match find(items, 42) {
    Option::Some(index) => println!("Found at {}", index),
    Option::None => println!("Not found"),
}

// Result
fn parse_number(s: String) -> Result<i32, String> {
    // ...
    Result::Ok(42)
}

match parse_number(input) {
    Result::Ok(n) => println!("Got: {}", n),
    Result::Err(e) => println!("Error: {}", e),
}

Generics

Functions and types can be parameterized over types.

pub struct Pair<A, B> {
    pub first: A,
    pub second: B,
}

fn swap<A, B>(pair: Pair<A, B>) -> Pair<B, A> {
    Pair { first: pair.second, second: pair.first }
}

Traits

Traits define shared behavior. Types implement traits with impl.

pub trait Display {
    fn to_string(self) -> String;
}

impl Display for Point {
    fn to_string(self) -> String {
        "(" + self.x.to_string() + ", " + self.y.to_string() + ")"
    }
}

Collections

// Vec -- dynamic array
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
let first = v[0];       // indexing
let len = v.len();       // length

// HashMap -- key-value store
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("alice", 42);
map.insert("bob", 17);

Closures

Anonymous functions that can capture variables from their enclosing scope:

let double = |x: i32| -> i32 { x * 2 };
let result = double(21);  // 42

// Closures capture variables
let multiplier = 3;
let multiply = |x: i32| -> i32 { x * multiplier };

Range-Based For Loops

Iterate over numeric ranges with ..:

// Exclusive range: 0, 1, 2, ..., 9
for i in 0..10 {
    println!("{}", i);
}

// Use in accumulation
let mut sum = 0;
for i in 1..101 {
    sum = sum + i;
}
// sum = 5050

Iterator Methods

Vec supports functional-style iterator methods:

let numbers = vec![1, 2, 3, 4, 5];

// Transform elements
let doubled = numbers.map(|x: i32| -> i32 { x * 2 });

// Filter elements
let evens = numbers.filter(|x: i32| -> bool { x % 2 == 0 });

// Check conditions
let has_negative = numbers.any(|x: i32| -> bool { x < 0 });
let all_positive = numbers.all(|x: i32| -> bool { x > 0 });

// Reduce to single value
let sum = numbers.fold(0, |acc: i32, x: i32| -> i32 { acc + x });

Option and Result Methods

Rich combinator APIs for safe value handling:

let opt: Option<i32> = Option::Some(42);

// Query
let is_there = opt.is_some();     // true
let is_empty = opt.is_none();     // false

// Extract with default
let val = opt.unwrap_or(0);       // 42

// Transform
let doubled = opt.map(|x: i32| -> i32 { x * 2 });  // Some(84)

// Chain operations
let result = opt.and_then(|x: i32| -> Option<i32> {
    if x > 0 { Option::Some(x * 10) } else { Option::None }
});

Pipe Operator

The pipe operator |> passes the result of the left expression as the first argument to the right function. It makes data transformation pipelines readable:

// Without pipe -- deeply nested calls
let result = to_uppercase(trim(read_file("data.txt")));

// With pipe -- reads left to right
let result = read_file("data.txt")
    |> trim
    |> to_uppercase;

// Works with closures and multi-argument functions
let processed = data
    |> filter(|x| x > 0)
    |> map(|x| x * 2)
    |> fold(0, |acc, x| acc + x);

Union Types

Union types allow a value to be one of several types, checked at compile time:

type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;

fn process(value: JsonValue) {
    match value {
        x: i64 => println!("integer: {}", x),
        x: f64 => println!("float: {}", x),
        s: String => println!("string: {}", s),
        b: bool => println!("bool: {}", b),
        arr: Vec<JsonValue> => println!("array of {}", arr.len()),
    }
}

Algebraic Effects

Effects declare the side effects a function may perform, tracked by the type system:

effect Log {
    fn log(message: String);
}

effect Fail {
    fn fail(reason: String) -> !;
}

fn process(data: Vec<u8>) -> Result<Output, Error> with Log, Fail {
    Log::log("Processing started");
    if data.is_empty() {
        Fail::fail("empty input");
    }
    // ...
}

Effects are handled at the call site:

handle process(data) {
    Log::log(msg) => {
        println!("[LOG] {}", msg);
        resume;
    }
    Fail::fail(reason) => {
        Result::Err(Error::new(reason))
    }
}

Supervisors

Supervisors manage the lifecycle of concurrent tasks with automatic restart strategies:

use std::concurrency::Supervisor;

let sup = Supervisor::new(RestartStrategy::OneForOne);

sup.spawn("worker-1", || {
    // If this task panics, only this task is restarted
    process_queue()
});

sup.spawn("worker-2", || {
    process_events()
});

sup.run();

Parallel For

Parallel iteration over collections with automatic work distribution:

// Parallel map over a vector
let results = parallel for item in data {
    heavy_computation(item)
};

// With explicit chunk size
let processed = parallel(chunk_size: 1024) for row in matrix {
    transform(row)
};

The compiler tracks energy consumption across all parallel branches and sums them for the total budget.

Computation Builders

Computation builders provide a monadic syntax for composing complex operations:

let result = async {
    let data = fetch(url).await;
    let parsed = parse(data).await;
    transform(parsed)
};

let query = query {
    from users
    where age > 18
    select name, email
    order_by name
};

Const Functions

Functions that can be evaluated at compile time:

const fn factorial(n: i32) -> i32 {
    if n <= 1 { 1 } else { n * factorial(n - 1) }
}

// Evaluated at compile time
const FACT_10: i32 = factorial(10);

Comptime Blocks

Execute arbitrary code at compile time:

comptime {
    let lookup = generate_lookup_table(256);
    // lookup is available as a constant in runtime code
}

Modules and Imports

// Import specific items
use crate::ast::{File, AstItem};
use std::collections::HashMap;

// Module declarations (loads from separate file)
mod lexer;      // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;

// Public module re-export
pub mod utils;

// Inline module
mod helpers {
    pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
        if x < lo { lo } else if x > hi { hi } else { x }
    }
}

// Glob import from stdlib
use std::math::*;

Async/Await with Channels

Asynchronous programming with channels for communication:

use std::concurrency::{spawn, channel};

async fn fetch_and_process(url: String) -> Result<Data, Error> {
    let response = http::get(url).await?;
    let data = parse(response.body()).await?;
    Result::Ok(data)
}

// Bounded channels for backpressure
let (tx, rx) = channel(capacity: 100);

spawn(|| {
    for item in source {
        tx.send(item);
    }
});

while let Option::Some(item) = rx.recv() {
    process(item);
}

Smart Pointers

Manage shared ownership and heap allocation:

// Box — heap allocation, required for recursive types
let b = Box::new(42);

// Rc — single-threaded shared ownership
let shared = Rc::new(vec![1, 2, 3]);
let copy = shared.clone();   // reference count +1

// Arc — thread-safe shared ownership
let data = Arc::new(vec![1, 2, 3]);
spawn(|| { let local = data.clone(); });

// Cow — clone-on-write (free reads, allocate on mutation)
let text = Cow::borrowed("hello");

See Smart Pointers for full documentation.

Const-Generic Types

Types with compile-time integer parameters:

// SmallVec — inline buffer, heap only when overflow
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42);    // inline — no heap allocation

// Simd — portable SIMD vectors
let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::splat(2.0);
let c = a.mul(&b);  // [2.0, 4.0, 6.0, 8.0] — single instruction

// NDArray — multi-dimensional arrays
let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let val = mat[1, 2];

Box (Heap Allocation)

Box<T> puts data on the heap. Required for recursive types.

pub enum Expr {
    Literal(i32),
    Add {
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

Type Aliases

pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;

Energy Budgets

Joule's defining feature. Declare the maximum energy a function is allowed to consume:

#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
    x + y
}

The compiler estimates energy consumption at compile time. If a function exceeds its budget, compilation fails.

Power and thermal budgets are also available:

#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn thermal_safe_compute(n: i32) -> i32 {
    let result = n * n;
    result + 1
}

Compile with --energy-check to enable verification:

joulec program.joule -o program --energy-check

See Energy System Guide for a deep dive.

Testing with Energy

Write tests that verify both correctness and energy consumption:

#[test]
fn test_sort_energy() {
    let data = vec![5, 3, 1, 4, 2];
    let sorted = sort(data);
    assert_eq!(sorted, vec![1, 2, 3, 4, 5]);
}

#[bench]
fn bench_matrix_multiply() {
    let a = Matrix::random(100, 100);
    let b = Matrix::random(100, 100);
    let _ = a.multiply(b);
}

Run with:

joulec program.joule --test    # runs tests with energy reporting
joulec program.joule --bench   # runs benchmarks with energy reporting

Built-in Macros

Joule provides built-in macros for common operations:

// Output
println!("Hello, {}!", name);       // print with newline
print!("no newline");               // print without newline

// Formatting
let s = format!("{} + {} = {}", a, b, a + b);

// Collections
let nums = vec![1, 2, 3, 4, 5];

// Assertions (for testing)
assert!(x > 0);
assert_eq!(result, expected);

For FFI with C libraries, use extern declarations:

extern fn sqrt(x: f64) -> f64;

What's Next

Energy System Guide

Joule's defining feature is compile-time energy budget verification. This guide explains how it works and how to use it.

Why Energy Budgets?

Computing consumes enormous amounts of energy, and most of it is invisible. Cloud providers report aggregate billing units. Industry benchmarks report averages. Nobody tells you what a single sort, a single allocation, or a single network call actually costs in joules.

Joule makes that cost visible. Every function can declare its energy budget, and the compiler enforces it at compile time.

Basic Usage

Annotate a function with #[energy_budget]:

#[energy_budget(max_joules = 0.0001)]  // 100 microjoules
fn add(x: i32, y: i32) -> i32 {
    x + y
}

Compile with energy checking enabled:

joulec program.joule -o program --energy-check

If the function's estimated energy exceeds the declared budget, compilation fails with a diagnostic:

error: energy budget exceeded in function 'process_data'
  --> program.joule:15:1
   |
15 | fn process_data(input: Vec<f64>) -> f64 {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = estimated: 0.00035 J (confidence: 85%)
   = budget:    0.00010 J
   = exceeded by 250%

Budget Types

Energy Budget (Joules)

The primary budget type. Limits total energy consumption:

#[energy_budget(max_joules = 0.0005)]
fn fibonacci(n: i32) -> i32 {
    // ...
}

Power Budget (Watts)

Limits average power draw. Useful for sustained workloads:

#[energy_budget(max_watts = 15.0)]
fn render_frame(scene: Scene) -> Image {
    // ...
}

Thermal Budget (Temperature Delta)

Limits the temperature increase caused by the function. Prevents thermal throttling:

#[energy_budget(max_temp_delta = 5.0)]  // max 5 degrees Celsius rise
fn heavy_compute(data: Vec<f64>) -> f64 {
    // ...
}

Thermal-Aware Functions

The #[thermal_aware] attribute marks functions that should adapt to thermal conditions:

#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn adaptive_compute(n: i32) -> i32 {
    let result = n * n;
    result + 1
}

Combining Budgets

You can declare multiple budget constraints:

#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path(data: Vec<f64>) -> f64 {
    // ...
}

How the Estimator Works

The compiler uses static analysis to estimate energy consumption without running your code. Here's what it considers:

Instruction Costs

Every operation has a calibrated energy cost in picojoules:

Operation                       Approximate Cost   Cycles
Integer add/sub                 0.05 pJ            1
Integer multiply                0.35 pJ            3
Integer divide                  3.5 pJ             10
Float add/sub                   0.35 pJ            3
Float multiply                  0.35 pJ            3
Float divide                    3.5 pJ             10
Float sqrt                      5.25 pJ            15
L1 cache load                   0.5 pJ             4
L2 cache load                   3.0 pJ             12
L3 cache load                   10.0 pJ            40
DRAM load/store                 200.0 pJ           200
Branch (taken)                  0.1 pJ             1
Branch misprediction            1.5 pJ             15
SIMD f32x8 multiply             1.5 pJ             3
Half-precision (f16/bf16) op    0.4 pJ             1
SmallVec inline push            0.5 pJ             1
SmallVec heap spill             45.0 pJ            ~100
SIMD vector op (any width)      2.0 pJ             3
Atomic read-modify-write        8.0 pJ             20
Rc/Arc clone/drop               3.0 pJ             5
Arena bump alloc                1.0 pJ             2
Arena reset (free all)          0.5 pJ             1
BitSet/BitVec word op           0.3 pJ             1
Decimal (128-bit) arithmetic    5.0 pJ             15
Deque push/pop                  2.0 pJ             5
Intern hash lookup              10.0 pJ            30
Complex arithmetic              1.6 pJ             4
Instant::now() (clock read)     15.0 pJ            50
BTreeMap/BTreeSet traversal     12.0 pJ            40

Loop Analysis

For loops with known bounds, the estimator multiplies the loop body cost by the iteration count. For unbounded loops (while with runtime conditions), it uses a configurable default (100 iterations) and reduces the confidence score.

Branch Analysis

For if/else and match expressions, the estimator computes the cost of each branch and averages them, since it can't know which branch will execute at compile time. This reduces the confidence score.

Confidence Score

Every estimate comes with a confidence score from 0.0 to 1.0:

  • 1.0 -- Straight-line code, no loops or branches. Estimate is precise.
  • 0.85-0.95 -- Code with branches. Estimate is an average.
  • 0.5-0.85 -- Code with unbounded loops. Estimate depends on assumed iteration count.
  • < 0.5 -- Complex code with nested unbounded loops. Estimate is rough.

The confidence score is shown in diagnostic output so you can judge the reliability of the estimate.
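
The loop and branch rules above can be sketched together. This is an illustrative model, not compiler internals: the function names, the 0.7 and 0.9 confidence figures, and the 100-iteration default stand in for the behavior this section describes.

```python
# Illustrative sketch of the static estimator's loop and branch rules.
# Costs are in picojoules; names and confidence values are hypothetical.

DEFAULT_TRIP_COUNT = 100  # assumed iterations for unbounded loops

def estimate_loop(body_pj, trip_count=None):
    """Known bounds: body cost times iteration count, full confidence.
    Unknown bounds: use the default and reduce the confidence score."""
    if trip_count is not None:
        return body_pj * trip_count, 1.0
    return body_pj * DEFAULT_TRIP_COUNT, 0.7

def estimate_branch(branch_costs_pj):
    """Average the branch costs, since the taken branch is unknown
    at compile time; confidence drops below 1.0."""
    avg = sum(branch_costs_pj) / len(branch_costs_pj)
    return avg, 0.9

# A bounded loop of 10 iterations around a 0.4 pJ body:
energy, conf = estimate_loop(0.4, trip_count=10)   # 4.0 pJ, confidence 1.0
```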

Power Estimation

Power (watts) is derived from energy and estimated execution time:

Power = Energy / Time
Time = Estimated Cycles / CPU Frequency (3.0 GHz reference)

Thermal Estimation

Temperature delta is derived from power using a simplified thermal model:

Delta_T = Power * Thermal_Resistance (0.4 K/W typical)
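
Worked through in code, the two formulas above compose directly; the 3.0 GHz reference frequency and 0.4 K/W thermal resistance are the values given in this guide.

```python
# Worked example of the power and thermal formulas above.

CPU_HZ = 3.0e9    # reference CPU frequency
THETA_KW = 0.4    # typical thermal resistance, K/W

def power_watts(energy_joules, cycles):
    """Power = Energy / Time, with Time = cycles / frequency."""
    time_s = cycles / CPU_HZ
    return energy_joules / time_s

def temp_delta(power):
    """Delta_T = Power * Thermal_Resistance."""
    return power * THETA_KW

# 0.00035 J spent over 100,000 cycles:
p = power_watts(0.00035, 100_000)   # 10.5 W
dt = temp_delta(p)                  # 4.2 K rise
```

Note that a tighter cycle estimate improves the power and thermal estimates too, since both are derived from it.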

Three-Tier Energy Measurement

Joule measures energy at three levels, providing increasing precision:

Tier 1: Static Estimation (Compile Time)

The compiler estimates energy from code structure alone, using the instruction cost model described above. This is available for all programs, everywhere, with zero runtime overhead.

  • No hardware access required
  • Works at compile time
  • Confidence score indicates reliability
  • Used for #[energy_budget] verification

Tier 2: CPU Performance Counters (Runtime)

On supported platforms, Joule reads hardware performance counters (RAPL on Intel/AMD) to measure actual CPU energy consumption during execution.

  • Requires Linux with perf_event or macOS with powermetrics
  • Per-function and per-scope measurements
  • Joule-level precision (not just watt-hours)
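
For a sense of what Tier 2 reads on Linux, here is a hypothetical sketch against the kernel's powercap sysfs interface, which exposes RAPL package energy in microjoules (root required). This illustrates the counter arithmetic only; it is not joulec's actual perf_event implementation.

```python
# Hypothetical RAPL read via the Linux powercap sysfs interface.
# Real tooling must handle the counter wrapping at max_energy_range_uj.

RAPL_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0

def read_uj(path=RAPL_PATH):
    with open(path) as f:
        return int(f.read())

def delta_joules(before_uj, after_uj, max_uj=None):
    """Microjoule counter delta in joules, handling one wraparound."""
    d = after_uj - before_uj
    if max_uj is not None and d < 0:
        d += max_uj
    return d / 1e6

def measure(fn):
    """Energy consumed while fn() runs, in joules.
    Whole-package energy, not attributed to this task alone."""
    before = read_uj()
    fn()
    return delta_joules(before, read_uj())
```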

Tier 3: Accelerator Energy (Runtime)

For GPU and accelerator workloads, Joule queries vendor-specific energy APIs. See Accelerator Energy Measurement for details.

  • NVIDIA GPUs via NVML
  • AMD GPUs via ROCm SMI
  • Intel GPUs/accelerators via Level Zero
  • Google TPUs via TPU runtime
  • AWS Inferentia/Trainium via Neuron SDK
  • Groq LPUs via HLML
  • Cerebras and SambaNova via vendor APIs

JSON Output Mode

For programmatic consumption, set the environment variable JOULE_ENERGY_JSON=1:

JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c

This outputs energy reports as structured JSON:

{
  "functions": [
    {
      "name": "process_data",
      "file": "program.joule",
      "line": 15,
      "energy_joules": 0.00035,
      "power_watts": 12.5,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded",
      "breakdown": {
        "compute_pj": 280000,
        "memory_pj": 70000,
        "branch_pj": 500
      }
    }
  ],
  "total_energy_joules": 0.00042,
  "device": "cpu"
}

When accelerator energy is available, the JSON includes per-device breakdowns:

{
  "devices": [
    { "type": "cpu", "energy_joules": 0.00042 },
    { "type": "gpu", "vendor": "nvidia", "energy_joules": 0.0031, "api": "nvml" }
  ],
  "total_energy_joules": 0.00352
}
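
A minimal consumer of the report shown above, e.g. for gating a CI pipeline on budget violations. The field names follow the example output; the script itself is illustrative.

```python
# Flag functions whose estimated energy exceeded their declared budget,
# using the JSON report format shown above.

import json

def over_budget(report):
    """Return the names of functions with status 'exceeded'."""
    return [f["name"] for f in report.get("functions", [])
            if f.get("status") == "exceeded"]

report = json.loads("""{
  "functions": [
    {"name": "process_data", "energy_joules": 0.00035,
     "budget_joules": 0.0001, "status": "exceeded"}
  ],
  "total_energy_joules": 0.00042,
  "device": "cpu"
}""")

print(over_budget(report))   # ['process_data']
```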

Practical Guidelines

Start Generous, Then Tighten

Begin with a generous budget, measure, then reduce:

// Start here
#[energy_budget(max_joules = 0.01)]

// After profiling, tighten
#[energy_budget(max_joules = 0.001)]

// Production target
#[energy_budget(max_joules = 0.0005)]

Budget Hot Loops Carefully

The estimator assumes 100 iterations for unbounded loops. If your loop runs 10,000 times, the estimate will be 100x too low. Consider refactoring into bounded loops or adjusting your budget accordingly.

Use Confidence Scores

If the compiler reports low confidence (< 0.7), the estimate may be significantly off. Review the function for unbounded loops and complex branching.

Transitive Energy Budgets

Energy budgets are enforced across call boundaries. A function calling another budgeted function includes the callee's energy in its own estimate:

#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }

#[energy_budget(max_joules = 0.0005)]
fn main_work() -> i32 {
    // The compiler accounts for helper's energy within main_work's budget
    helper() + helper()
}

Profile-Guided Refinement

For the most accurate energy estimates, use profile-guided optimization:

# Phase 1: instrument and run
joulec program.joule --profile-generate -o program
./program

# Phase 2: compile with profile data
joulec program.joule --profile-use profile.json --energy-check -o program

The profile data provides actual loop trip counts and branch frequencies, dramatically improving estimate accuracy.

Feedback

Questions about the energy system? Visit joule-lang.org or open an issue on GitHub.

Compiler Reference

Usage

joulec <INPUT> [OPTIONS]

Where <INPUT> is a .joule source file (or a foreign source file when using --lift-run).

Options

Flag                       Description                                                        Default
-o <FILE>                  Output file path                                                   Derived from input
--emit <TYPE>              Emit intermediate representation: ast, hir, mir, llvm-ir, c, eir  (compile to binary)
--backend <BACKEND>        Code generation backend: cranelift, llvm, mlir, auto               cranelift
--target <TARGET>          Target platform: cpu, cuda, metal, rocm, hybrid                    cpu
-O <LEVEL>                 Optimization level: 0, 1, 2, 3                                     0
--energy-check             Enable compile-time energy budget verification                     Off
--gpu                      Enable GPU code generation (uses MLIR backend)                     Off
--jit                      JIT-compile and run immediately (requires --features jit)          Off
--watch                    Watch source file and re-run on changes (implies --jit)            Off
--lift <LANG>              Lift foreign code for energy analysis: python, js, c               (none)
--lift-run <LANG> <FILE>   Lift and execute foreign code with energy tracking                 (none)
--energy-optimize          Apply energy optimization passes to lifted code                    Off
--egraph-optimize          Enable e-graph algebraic optimization (30+ rewrite rules)          Off
--profile-generate         Instrument code for profile-guided optimization                    Off
--profile-use <FILE>       Apply PGO profile data from a previous run                         (none)
--incremental              Enable incremental compilation (FNV-1a fingerprinting)             Off
--test                     Build and run #[test] functions with energy reporting              Off
--bench                    Build and run #[bench] functions with energy reporting             Off
--debug                    Debug build profile (no optimizations, debug info)                 Default
--release                  Release build profile (-O2, strip debug info)                      Off
--stdlib-path <DIR>        Path to the Joule standard library                                 Built-in
-v, --verbose              Verbose compiler output                                            Off

Environment Variables

Variable              Description
JOULE_ENERGY_JSON=1   Output energy reports as JSON instead of human-readable text

Examples

Basic Compilation

# Compile to executable via C backend
joulec program.joule --emit c -o program.c
cc -o program program.c

# Compile with energy checking
joulec program.joule --emit c -o program.c --energy-check

# Release build with optimizations
joulec program.joule --release -o program

Emit Intermediate Representations

# Emit the AST (for debugging)
joulec program.joule --emit ast

# Emit HIR (typed intermediate representation)
joulec program.joule --emit hir

# Emit MIR (mid-level IR, after lowering)
joulec program.joule --emit mir

# Emit EIR (Energy IR with picojoule cost annotations)
joulec program.joule --emit eir

JIT Compilation

# JIT-compile and run immediately
joulec --jit program.joule

# Watch mode: re-compile and re-run on file changes
joulec --watch program.joule

See JIT Compilation for details.

Polyglot Energy Analysis

# Lift and run Python with energy measurement
joulec --lift-run python script.py

# Lift and run JavaScript with energy measurement
joulec --lift-run js app.js

# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py

See Polyglot Energy Analysis for details.

Advanced Optimization

# E-graph algebraic optimization
joulec program.joule --emit c --egraph-optimize -o program.c

# Profile-guided optimization (two-phase)
joulec program.joule --profile-generate -o program
./program                    # generates profile data
joulec program.joule --profile-use profile.json -o program_optimized

# Incremental compilation
joulec program.joule --incremental -o program

Testing and Benchmarking

# Run tests with energy reporting
joulec program.joule --test

# Run benchmarks with energy reporting
joulec program.joule --bench

JSON Energy Output

# Get energy reports as JSON
JOULE_ENERGY_JSON=1 joulec program.joule --emit c -o program.c --energy-check

Compilation Pipeline

Source code flows through these stages:

Source (.joule)
    |
    v
  Lexer ---------- Tokens
    |
    v
  Parser --------- AST (Abstract Syntax Tree)
    |
    v
  Type Checker ---- HIR (High-level IR) + Type Information
    |
    +-- Energy Budget Checker (if --energy-check)
    |
    v
  EIR Lowering ---- EIR (Energy IR) [if --egraph-optimize or --emit eir]
    |
    +-- E-Graph Optimizer (30+ algebraic rewrite rules)
    |
    v
  MIR Lowering ---- MIR (Mid-level IR)
    |
    v
  Borrow Checker -- Ownership/lifetime verification
    |
    v
  Code Generation
    +-- C Backend ---------- C source code
    +-- Cranelift Backend --- Native binary (fast compilation)
    +-- Cranelift JIT ------- In-memory execution (--jit/--watch)
    +-- LLVM Backend -------- Native binary (optimized)
    +-- MLIR Backend -------- GPU/accelerator code
    +-- WASM Backend -------- WebAssembly

Incremental Compilation

When --incremental is enabled, the compiler:

  1. Fingerprints each source file using FNV-1a hashing
  2. Builds a dependency graph between modules
  3. On recompilation, only reprocesses files whose fingerprint changed (or whose dependencies changed)
  4. Caches query results to disk as JSON for persistence across sessions
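Step 1's fingerprinting can be sketched in a few lines of Python. The FNV-1a constants below are the standard published ones; how the compiler stores fingerprints and the dependency graph is not shown here.

```python
# 64-bit FNV-1a, the hash named in step 1 above.
# Offset basis and prime are the standard FNV constants.
FNV_OFFSET = 0xcbf29ce484222325
FNV_PRIME = 0x100000001b3

def fnv1a_64(data: bytes) -> int:
    h = FNV_OFFSET
    for byte in data:
        h ^= byte                                   # XOR first (the "1a" variant)
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF    # multiply, wrap to 64 bits
    return h

def fingerprint_file(path: str) -> int:
    with open(path, "rb") as f:
        return fnv1a_64(f.read())

# A file is reprocessed only when its fingerprint differs from the cached one.
```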

Profile-Guided Optimization

PGO is a two-phase process:

  1. Phase 1 (--profile-generate): The compiler instruments the C output with basic-block counters. Running the instrumented binary produces a JSON profile with execution frequencies.
  2. Phase 2 (--profile-use): The compiler reads the profile and refines EIR energy cost estimates using actual execution frequencies. Loop trip counts are derived from back-edge counter ratios. Hot paths get more accurate energy budgets.
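The back-edge ratio in phase 2 reduces to simple arithmetic. The counter names and profile shape below are illustrative, not the compiler's actual JSON format:

```python
# Sketch of deriving a loop trip count from PGO counters (phase 2 above).
# Counter names and the profile dict are hypothetical.
def loop_trip_count(back_edge_count: int, preheader_count: int) -> float:
    """Average iterations per entry = back edges taken / times the loop was entered."""
    if preheader_count == 0:
        return 0.0
    return back_edge_count / preheader_count

# A loop entered 10 times whose back edge fired 1000 times
# averages 100 iterations per entry.
profile = {"loop_preheader": 10, "loop_back_edge": 1000}
trips = loop_trip_count(profile["loop_back_edge"], profile["loop_preheader"])

# The energy estimate then scales a per-iteration cost by the measured trip count.
per_iteration_pj = 3.2          # hypothetical per-iteration cost
loop_cost_pj = per_iteration_pj * trips
```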

Backends

C Backend (--emit c)

Generates portable C source code. This is the primary backend and the one used for the bootstrap compiler. The generated C compiles with any standard C compiler (gcc, clang, cc).

joulec program.joule --emit c -o program.c
cc -o program program.c

Features:

  • Freestanding mode for embedded targets (jrt_* runtime abstraction)
  • #line directives for source-level debugging
  • Energy instrumentation for PGO

Cranelift Backend

Fast compilation, suitable for development. Uses the Cranelift code generator. Enable with --features cranelift.

Cranelift JIT Backend

In-memory compilation and execution. No intermediate files. Enable with --features jit.

joulec --jit program.joule

LLVM Backend

Optimized compilation for release builds. Requires LLVM 16+. Enable with --features llvm.

MLIR Backend

Heterogeneous computing with GPU/accelerator support. Targets CUDA, Metal, and ROCm. Enable with --gpu.

WASM Backend

WebAssembly output for browser and edge deployment.

File Extension

Joule source files must use the .joule extension. The compiler rejects all other extensions, except under --lift/--lift-run, which accept foreign sources such as .py, .js, and .c (see Polyglot Energy Analysis for the full list of lifted languages).

Energy Checking

When --energy-check is passed, the compiler performs static analysis on every function with an #[energy_budget] attribute. Functions that exceed their declared budget produce a compilation error.
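Conceptually, the check compares a static per-function estimate against the declared budget. The function names and picojoule figures below are illustrative, not compiler internals:

```python
# Conceptual sketch of the --energy-check pass: flag every function whose
# statically estimated cost exceeds its declared #[energy_budget].
def check_budgets(estimates_pj: dict, budgets_pj: dict) -> list:
    """Return (function, estimate, budget) for every violated budget."""
    return [
        (fn, estimates_pj[fn], budget)
        for fn, budget in budgets_pj.items()
        if estimates_pj.get(fn, 0) > budget
    ]

estimates = {"sort_experiment": 1500, "render": 90}   # hypothetical estimates, pJ
budgets = {"sort_experiment": 1000, "render": 100}    # declared budgets, pJ
violations = check_budgets(estimates, budgets)
# A non-empty list corresponds to a compilation error.
```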

See Energy System Guide for details.

Diagnostics

The compiler produces structured error messages with source locations:

error[E0001]: mismatched types
  --> program.joule:10:15
   |
10 |     let x: i32 = "hello";
   |                  ^^^^^^^ expected i32, found String

Warnings are shown for potential issues but don't prevent compilation (unless you've set strict mode).

JIT Compilation

Joule supports just-in-time compilation for interactive development. Instead of producing an executable file, the compiler compiles your code in memory and runs it immediately.

Quick Start

# JIT-compile and run
joulec --jit program.joule

# Watch mode: re-compile on file changes
joulec --watch program.joule

Requirements

JIT mode requires the jit feature flag, which enables the Cranelift JIT backend:

# Build joulec with JIT support
cargo build --release -p joulec --features jit

The feature chain is: jit -> cranelift -> joule-codegen-cranelift + joule-codegen + notify.

How It Works

JIT Mode (--jit)

  1. Source code is parsed, type-checked, and lowered to MIR (the same pipeline as normal compilation)
  2. MIR is translated to Cranelift IR
  3. Cranelift compiles the IR to native machine code in memory
  4. The main() function is called directly via a function pointer
  5. The program runs and exits

No intermediate files are produced. No C compiler is invoked. Compilation and execution happen in a single process.

Watch Mode (--watch)

Watch mode extends JIT with file monitoring:

  1. The source file is JIT-compiled and run (same as --jit)
  2. The notify crate monitors the source file for changes
  3. When the file is saved, a fresh JIT module is created and the program re-runs
  4. A 50ms debounce prevents multiple re-runs from editor save-rename sequences

Each watch cycle creates a fresh JITModule because Cranelift's JIT module cannot redefine functions. This ensures clean state on every re-run.
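The debounce logic above is small enough to sketch directly. Real watch mode uses the notify crate; this only shows the timing rule:

```python
# Minimal sketch of the 50 ms debounce described above: an event is accepted
# only if the window has elapsed since the last accepted event.
class Debouncer:
    def __init__(self, window_s: float = 0.05):
        self.window_s = window_s
        self.last_fire = float("-inf")

    def should_rerun(self, now_s: float) -> bool:
        if now_s - self.last_fire >= self.window_s:
            self.last_fire = now_s
            return True
        return False

d = Debouncer()
# An editor save often emits write + rename within a few milliseconds;
# only the first event in the burst triggers a re-run.
events = [0.000, 0.002, 0.004, 0.100]
fires = [t for t in events if d.should_rerun(t)]
```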

Architecture

FunctionTranslator

The FunctionTranslator<'a, M: Module> is generic over the module type:

| Module Type | Mode | Output |
|---|---|---|
| ObjectModule | AOT compilation | Object file (.o) |
| JITModule | JIT compilation | In-memory executable code |

This means the same translation logic handles both AOT and JIT -- no code duplication.

Runtime Symbols

JIT mode provides runtime symbols that replace the C runtime's functions:

| Symbol | Purpose |
|---|---|
| joule_jit_println | Print a string with newline |
| joule_jit_print | Print a string without newline |
| joule_jit_panic | Panic with a message |
| malloc | Memory allocation (libc) |
| free | Memory deallocation (libc) |
| memcpy | Memory copy (libc) |

These symbols are registered with the JITModule before compilation so that generated code can call them.

PIC Mode

JIT compilation disables position-independent code (PIC), since the generated code runs at a known memory location. AOT compilation enables PIC for shared-library compatibility.

Energy Tracking

JIT mode includes full energy tracking. Energy consumed during execution is measured and reported:

$ joulec --jit program.joule
Hello from JIT!
Energy consumed: 0.000123 J

Energy budgets declared with #[energy_budget] are checked at compile time, before JIT execution begins. If a budget is violated, compilation fails and the program does not run.

Limitations

  • No persistent output: JIT mode does not produce an executable file. For deployment, use the C backend or AOT Cranelift compilation.
  • Single-file: JIT mode currently compiles a single source file. Multi-file projects should use mod declarations within the entry file.
  • Feature gate: JIT support is behind --features jit to keep the default binary small. The notify dependency is only pulled in when JIT is enabled.

Use Cases

Rapid Prototyping

JIT mode eliminates the compile-link-run cycle:

# Edit, save, see results instantly
joulec --watch prototype.joule

Energy Experimentation

Try different algorithms and immediately see their energy impact:

// Try bubble sort
#[energy_budget(max_joules = 0.001)]
fn sort_experiment(data: Vec<i32>) -> Vec<i32> {
    bubble_sort(data)
}

joulec --jit experiment.joule
# Change to quicksort, save, see new energy reading

Interactive Testing

Run tests without a full build:

joulec --jit --test tests.joule

Comparison with Other Modes

| Mode | Command | Speed | Output | Use Case |
|---|---|---|---|---|
| JIT | --jit | Fastest | None (runs in memory) | Development |
| Watch | --watch | Fast (re-runs on save) | None | Interactive development |
| C Backend | --emit c | Moderate | .c file | Deployment, bootstrap |
| Cranelift AOT | (default) | Fast | Binary | Development builds |
| LLVM | --features llvm | Slow | Optimized binary | Release builds |

Polyglot Energy Analysis

Joule can measure and optimize the energy consumption of code written in other languages. The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it with full energy tracking.

Quick Start

# Measure energy in a Python script
joulec --lift-run python script.py

# Measure energy in a JavaScript file
joulec --lift-run js app.js

# Measure energy in C code
joulec --lift-run c program.c

# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py

# Generate JSON energy report
joulec --lift python script.py --energy-report report.json

# Set energy budget (exit code 1 if exceeded)
joulec --lift python script.py --energy-budget 100nJ

How It Works

The polyglot pipeline has four stages:

  1. Parse: The source file is parsed by a language-specific parser (Python, JavaScript, or C) into Joule's LiftedModule representation.

  2. Lower: The lifted AST is lowered to MIR (Mid-level IR), the same representation used for native Joule code. Variables, functions, classes, and control flow are all mapped to MIR constructs.

  3. Optimize (optional): When --energy-optimize is passed, four energy optimization passes are applied to the MIR before execution.

  4. Execute: The MIR is JIT-compiled via the Cranelift backend and executed in-memory. Energy consumption is tracked throughout execution and reported at the end.
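The four stages compose linearly, with stage 3 gated behind a flag. Every function below is a placeholder standing in for the real compiler stage:

```python
# The four-stage pipeline above as function composition. All stage bodies
# are stand-ins; only the control flow mirrors the description.
def parse(source: str) -> dict:
    return {"kind": "LiftedModule", "source": source}

def lower(lifted: dict) -> dict:
    return {"kind": "MIR", "from": lifted["kind"]}

def optimize(mir: dict) -> dict:
    return {**mir, "optimized": True}

def execute(mir: dict) -> str:
    return f"ran {mir['kind']} (optimized={mir.get('optimized', False)})"

def lift_run(source: str, energy_optimize: bool = False) -> str:
    mir = lower(parse(source))
    if energy_optimize:          # stage 3 only runs with --energy-optimize
        mir = optimize(mir)
    return execute(mir)
```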

Supported Languages

Python

Comprehensive support for Python syntax and semantics:

| Feature | Status |
|---|---|
| Functions, closures, lambdas | Supported |
| Classes (single and multiple inheritance) | Supported |
| List/dict/set comprehensions | Supported |
| f-strings | Supported |
| Ternary expressions | Supported |
| enumerate/zip | Supported |
| match/case (Python 3.10+) | Supported |
| Walrus operator (:=) | Supported |
| try/except/finally | Supported (guard patterns) |
| Slicing with step | Supported |
| Default arguments | Supported |
| *args, **kwargs | Supported |
| Generator expressions | Supported |
| String methods (30+) | Supported |
| List methods (20+) | Supported |
| Dict methods (15+) | Supported |
| Math module | Supported |
| Print with end= | Supported |
| True division | Supported |
| BigInt overflow handling | Supported |

JavaScript

Comprehensive support for JavaScript syntax and semantics:

| Feature | Status |
|---|---|
| Functions, arrow functions | Supported |
| Classes (single inheritance) | Supported |
| Template literals | Supported |
| Destructuring | Supported |
| Spread operator | Supported |
| switch/case | Supported |
| for-in/for-of | Supported |
| do-while | Supported |
| Bitwise operators | Supported |
| typeof | Supported |
| Nullish coalescing (??) | Supported |
| Optional chaining (?.) | Supported |
| Array methods (20+) | Supported |
| String methods (15+) | Supported |
| Object methods | Supported |
| Math object | Supported |
| console.log | Supported |
| this keyword | Supported |

C

Basic support for C code:

| Feature | Status |
|---|---|
| Functions | Supported |
| Basic types (int, float, double, char) | Supported |
| Arrays | Supported |
| Pointers | Supported |
| Control flow (if, while, for) | Supported |
| stdio (printf, scanf) | Supported |
| math.h functions | Supported |

TypeScript

TypeScript types are erased before analysis — the energy profile is identical to JavaScript. See the TypeScript Guide for details.

| Feature | Status |
|---|---|
| Everything in JavaScript | Supported |
| Type annotations | Stripped |
| Interfaces, type aliases, generics | Stripped |
| Access modifiers (public/private) | Stripped |
| Enums (simple) | Converted to constants |

Go

| Feature | Status |
|---|---|
| Functions, closures, variadic | Supported |
| for, for range, if/else, switch | Supported |
| Slices, maps, structs, methods | Supported |
| Goroutines (go) | Supported (sequential analysis) |
| Channels (chan, <-) | Supported |
| defer | Supported |
| Multiple return values | Supported |
| fmt, math, strings, strconv | Supported |

Rust

| Feature | Status |
|---|---|
| Functions, closures, impl blocks | Supported |
| for/while/loop, if/else, match | Supported |
| let/let mut, ownership annotations | Supported |
| Structs, enums, Option, Result | Supported |
| Vec, HashMap, String, Box | Supported |
| Iterator chains (.map/.filter/.fold) | Supported |
| println!, format!, vec! | Supported |
| Traits (signatures only) | Supported |

Energy Recommendations

When analyzing code, Joule detects common energy anti-patterns and suggests fixes. Categories include:

  • ALGORITHM -- Nested loops where a hash set would be O(1)
  • ALLOCATION -- Heap allocation inside hot loops
  • REDUNDANCY -- Recomputed values that could be hoisted
  • DATA STRUCTURE -- Linear search where a set/map is more efficient
  • LOOP -- Missing early exits, unbounded iteration
  • STRING -- String concatenation in loops (O(n^2))
  • MEMORY -- Cache-unfriendly access patterns
  • PRECISION -- Float arithmetic where integer suffices

See the per-language guides for language-specific examples of each pattern.

Runtime System

The lift-run runtime provides 100+ shim functions that bridge language-specific operations to native code:

String Operations

str_new, str_concat, str_len, str_print, str_from_int, str_from_float, str_eq, str_index, str_slice, str_contains, str_mul, str_cmp, str_upper, str_lower, str_trim, str_split, str_replace, str_starts_with, str_ends_with, str_index_of, and more.

List Operations

list_new, list_push, list_get, list_set, list_len, list_pop, list_sort, list_reverse, list_copy, list_append, list_index_of, list_contains, list_slice, list_map, list_filter, and more.

Dict Operations

dict_new, dict_set, dict_get, dict_len, dict_get_default, dict_pop, dict_update, dict_setdefault, dict_keys, dict_values, dict_items, dict_contains, and more.

Class Desugaring

Classes from Python and JavaScript are desugared to dictionary-backed standalone functions:

# Python source
class Counter:
    def __init__(self, start):
        self.count = start

    def increment(self):
        self.count += 1
        return self.count

This is lowered to:

  • Counter____init__(self, start) -- constructor function
  • Counter__increment(self) -- method function
  • self is a dictionary with fields as key-value pairs

Multiple inheritance is supported using BFS method resolution order (MRO).
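The lowered form of the Counter class can be written out directly in Python: each method becomes a standalone function and self becomes a plain dict, exactly as the bullet list describes.

```python
# Hand-written desugared form of the Counter class above:
# methods become standalone functions, `self` becomes a dict of fields.
def Counter____init__(self, start):
    self["count"] = start

def Counter__increment(self):
    self["count"] += 1
    return self["count"]

# Construction allocates the dict, then calls the constructor function.
obj = {}
Counter____init__(obj, 10)
Counter__increment(obj)
Counter__increment(obj)   # obj["count"] is now 12
```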

Energy Optimization Passes

When --energy-optimize is used, four passes optimize the lifted code:

  1. Constant Propagation -- Propagate known values, fold constant expressions
  2. Dead Code Elimination -- Remove unreachable and unused code
  3. Loop Optimization -- Reduce redundant computation in loops
  4. Strength Reduction -- Replace expensive operations with cheaper equivalents
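Pass 4 can be sketched on a toy tuple-based expression form. The representation is illustrative, not the compiler's MIR:

```python
# Toy sketch of strength reduction (pass 4 above): multiplication or
# division by a power of two becomes a cheaper shift.
def strength_reduce(expr):
    op, lhs, rhs = expr
    power_of_two = isinstance(rhs, int) and rhs > 0 and rhs & (rhs - 1) == 0
    if op == "mul" and power_of_two:
        return ("shl", lhs, rhs.bit_length() - 1)   # x * 8  ->  x << 3
    if op == "div" and power_of_two:
        return ("shr", lhs, rhs.bit_length() - 1)   # x / 8  ->  x >> 3 (unsigned)
    return expr                                      # anything else is left alone

strength_reduce(("mul", "x", 8))   # ("shl", "x", 3)
strength_reduce(("mul", "x", 7))   # unchanged: 7 is not a power of two
```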

Test Coverage

The polyglot pipeline is validated by 1,220 tests across 8 test suites:

| Suite | Count | Description |
|---|---|---|
| Tiered validation | 90 | Core feature coverage |
| Edge cases | 80 | Corner cases and error handling |
| Domain | 100 | 50 Python + 50 JS across 5 domains |
| Stdlib | 100 | 50 Python + 50 JS: string/list methods, default args |
| Classes | 50 | Inheritance, MRO, properties, static methods |
| Advanced | 50 | Closures, generators, decorators, metaclasses |
| Syntax | 50 | Language-specific syntax features |
| Coverage | 700 | Division, print, comprehensions, string ops |

Total: 1,220/1,220 (100% pass rate)

Examples

Python Energy Analysis

# fibonacci.py
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

result = fibonacci(30)
print(f"Result: {result}")

$ joulec --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00234 J

JavaScript Energy Analysis

// sort.js
function quickSort(arr) {
    if (arr.length <= 1) return arr;
    const pivot = arr[0];
    const left = arr.filter(x => x < pivot);
    const right = arr.filter(x => x > pivot);
    return [...quickSort(left), pivot, ...quickSort(right)];
}

const data = Array.from({length: 1000}, () => Math.floor(Math.random() * 10000));
const sorted = quickSort(data);
console.log(`Sorted ${sorted.length} elements`);

$ joulec --lift-run js sort.js
Sorted 1000 elements
Energy consumed: 0.00891 J

Energy-Optimized Execution

$ joulec --energy-optimize --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00198 J (15.4% reduction)

Static Analysis Mode

For energy analysis without execution, use --lift instead of --lift-run:

# Analyze without running
joulec --lift python script.py

# Output includes per-function energy estimates

This performs the parsing and lowering steps but stops before JIT compilation, producing a static energy report for each function.

Per-Language Guides

For detailed anti-patterns, optimization tips, and worked examples specific to each language:

  • Python Guide -- 100+ runtime shims, classes, comprehensions, f-strings
  • JavaScript Guide -- Arrow functions, template literals, array methods
  • TypeScript Guide -- Type erasure, identical energy to JavaScript
  • C Guide -- Memory allocation patterns, cache analysis
  • Go Guide -- Goroutines, channels, slice operations
  • Rust Guide -- Iterator chains, zero-cost abstractions

Further Reading

Python Energy Analysis

Joule provides comprehensive energy analysis for Python code. With 100+ runtime shims covering strings, lists, dicts, classes, comprehensions, and f-strings, most idiomatic Python runs unmodified.

Quick Start

# Static energy analysis (no execution)
joulec --lift python script.py

# Execute with energy tracking
joulec --lift-run python script.py

# Execute with energy optimization
joulec --energy-optimize --lift-run python script.py

# Generate JSON report for CI
joulec --lift python script.py --energy-report report.json

Supported Features

| Category | Features |
|---|---|
| Functions | def, lambda, closures, default arguments, *args, **kwargs |
| Classes | Single and multiple inheritance, __init__, methods, properties, static methods, BFS MRO |
| Control flow | if/elif/else, while, for x in, break, continue, return |
| Comprehensions | List [x for x in ...], dict {k:v for ...}, set {x for ...}, generator (x for ...) |
| String features | f-strings, .upper(), .lower(), .strip(), .split(), .replace(), .startswith(), .endswith(), .join(), .find(), .index(), + concatenation, * repetition (30+ methods) |
| List features | .append(), .pop(), .sort(), .reverse(), .index(), .count(), .copy(), slicing, len(), in operator (20+ methods) |
| Dict features | .get(), .pop(), .update(), .setdefault(), .keys(), .values(), .items(), in operator (15+ methods) |
| Math | math.floor(), math.ceil(), math.sqrt(), math.pow(), abs(), min(), max(), sum(), range() |
| Expressions | Ternary x if cond else y, walrus :=, match/case, enumerate(), zip(), true division, ** power |
| Error handling | try/except/finally with guard patterns (division, key, bounds) |
| Types | int (i64 + BigInt overflow), float (f64), bool, str, list, dict, set, None |
| Print | print() with end= parameter, polymorphic output (int/float/string) |

Common Energy Anti-Patterns

1. String Concatenation in Loops

# BAD — O(n^2) energy: each += allocates a new string
result = ""
for word in words:
    result += word + " "

# GOOD — O(n) energy: join allocates once
result = " ".join(words)

Category: STRING | Severity: High | Savings: ~10x for large inputs

Each += on a string allocates a new buffer and copies the entire accumulated string. For 1,000 words averaging 5 characters, the bad version performs ~2.5 million character copies. The good version performs ~5,000.
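The ~2.5 million figure can be reproduced by counting only the already-accumulated word characters recopied on each += (separator characters are ignored here to keep the estimate round):

```python
# Reproducing the copy-count estimate above: each += recopies the whole
# accumulated string before appending. 1,000 words x 5 chars each.
WORDS, CHARS = 1000, 5

# On the k-th append, roughly CHARS * (k - 1) previously written
# word characters are copied again.
bad_copies = sum(CHARS * (k - 1) for k in range(1, WORDS + 1))

# join() writes each character exactly once.
good_copies = WORDS * CHARS

# bad_copies is ~2.5 million; good_copies is 5,000.
```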

2. Linear Search on List vs Set

# BAD — O(n) per lookup = O(n*m) total
for item in queries:
    if item in large_list:     # linear scan every time
        process(item)

# GOOD — O(1) per lookup = O(n+m) total
lookup = set(large_list)       # one-time O(n) cost
for item in queries:
    if item in lookup:         # hash lookup
        process(item)

Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements

3. Allocation Inside Hot Loops

# BAD — allocates a new list every iteration
for i in range(1000):
    temp = []
    temp.append(i)
    process(temp)

# GOOD — reuse buffer
temp = []
for i in range(1000):
    temp.clear()
    temp.append(i)
    process(temp)

Category: ALLOCATION | Severity: Medium | Savings: ~3x

4. Missing Early Exit

# BAD — always scans entire list
def find_first(items, target):
    result = -1
    for i in range(len(items)):
        if items[i] == target:
            result = i
    return result

# GOOD — exits on first match
def find_first(items, target):
    for i in range(len(items)):
        if items[i] == target:
            return i
    return -1

Category: LOOP | Severity: Medium | Savings: ~2x average case

5. Recomputing Loop Invariants

# BAD — len(data) recomputed every iteration
for i in range(len(data)):
    if i < len(data) - 1:
        process(data[i], data[i + 1])

# GOOD — compute once
n = len(data)
for i in range(n - 1):
    process(data[i], data[i + 1])

Category: REDUNDANCY | Severity: Low | Savings: ~1.2x

Worked Example

Given a data processing pipeline:

class DataProcessor:
    def __init__(self, data):
        self.data = data
        self.results = []

    def filter_positive(self):
        filtered = []
        for x in self.data:
            if x > 0:
                filtered.append(x)
        self.data = filtered

    def normalize(self):
        total = sum(self.data)
        self.data = [x / total for x in self.data]

    def to_report(self):
        report = ""
        for i in range(len(self.data)):
            report += f"Item {i}: {self.data[i]}\n"
        return report

def main():
    proc = DataProcessor([3.0, -1.0, 4.0, -2.0, 5.0, 1.0])
    proc.filter_positive()
    proc.normalize()
    print(proc.to_report())

main()

Running energy analysis:

$ joulec --lift python pipeline.py
Energy Analysis: pipeline.py

  DataProcessor____init__     2.35 nJ  (confidence: 0.95)
  DataProcessor__filter_positive  8.72 nJ  (confidence: 0.65)
  DataProcessor__normalize    6.15 nJ  (confidence: 0.70)
  DataProcessor__to_report   14.80 nJ  (confidence: 0.55)
  main                        3.20 nJ  (confidence: 0.90)

  Total: 35.22 nJ

Recommendations:
  !! [STRING] DataProcessor__to_report — string concatenation in loop
     Suggestion: use "".join() to build string in one allocation
     Estimated savings: 8-10x for large inputs

  !  [REDUNDANCY] DataProcessor__to_report — len() called inside loop range
     Suggestion: compute len() once before the loop
     Estimated savings: 1.2x

JSON Energy Report

$ joulec --lift python pipeline.py --energy-report report.json
{
  "source_file": "pipeline.py",
  "language": "python",
  "functions": [
    {
      "name": "DataProcessor__to_report",
      "energy_pj": 14800,
      "energy_human": "14.80 nJ",
      "confidence": 0.55
    }
  ],
  "total_energy_pj": 35220,
  "total_energy_human": "35.22 nJ",
  "functions_lifted": 5,
  "constructs_approximated": 2,
  "recommendations": [
    {
      "function": "DataProcessor__to_report",
      "category": "STRING",
      "severity": "high",
      "issue": "string concatenation in loop",
      "suggestion": "use join() to build string in one allocation",
      "savings_factor": 8.0
    }
  ]
}

Energy Budget for CI

# Fail the build if total energy exceeds 50 nJ
$ joulec --lift python pipeline.py --energy-budget 50nJ
# Exit code 0: within budget

$ joulec --lift python pipeline.py --energy-budget 20nJ
# Exit code 1: budget exceeded (35.22 nJ > 20.00 nJ)

Limitations

  • No external package imports (import numpy, import requests, etc.) -- only built-in operations
  • try/except uses guard patterns (division, key, bounds) rather than full exception semantics
  • Generator execution is approximated (constant iteration count estimate)
  • No async/await -- async patterns are desugared to synchronous equivalents
  • No decorator side effects -- decorators are recognized but not executed
  • Class __repr__, __str__, __eq__ dunder methods are not auto-dispatched

JavaScript Energy Analysis

Joule lifts JavaScript into its energy analysis pipeline, providing per-function energy estimates for Node.js and browser-style code. Arrow functions, template literals, classes, destructuring, and 20+ array methods are fully supported.

Quick Start

# Static energy analysis
joulec --lift js app.js

# Execute with energy tracking
joulec --lift-run js app.js

# Execute with energy optimization
joulec --energy-optimize --lift-run js app.js

Supported Features

| Category | Features |
|---|---|
| Functions | function, arrow functions =>, default params, rest params ...args |
| Classes | class, constructor, extends, methods, static, this, super |
| Control flow | if/else, while, do-while, for, for-in, for-of, switch/case, break, continue |
| Destructuring | Array [a, b] = arr, object {x, y} = obj, nested, with defaults |
| Operators | Spread ..., nullish coalescing ??, optional chaining ?., typeof, bitwise |
| Template literals | `Hello ${name}` with expression interpolation |
| Array methods | .push(), .pop(), .map(), .filter(), .reduce(), .find(), .findIndex(), .some(), .every(), .forEach(), .indexOf(), .includes(), .slice(), .splice(), .concat(), .reverse(), .sort(), .join(), .flat(), .length |
| String methods | .length, .charAt(), .indexOf(), .includes(), .slice(), .substring(), .toUpperCase(), .toLowerCase(), .trim(), .split(), .replace(), .startsWith(), .endsWith(), .repeat() |
| Object methods | Object.keys(), Object.values(), Object.entries() |
| Math | Math.floor(), Math.ceil(), Math.round(), Math.abs(), Math.max(), Math.min(), Math.pow(), Math.sqrt(), Math.random(), Math.PI |
| Output | console.log() with auto-coercion |
| Types | Numbers (f64), strings, booleans, arrays, objects, null, undefined |

Common Energy Anti-Patterns

1. Chained Array Methods Creating Intermediates

// BAD — 3 intermediate arrays allocated
const result = data
    .filter(x => x > 0)     // allocates filtered array
    .map(x => x * 2)        // allocates mapped array
    .reduce((a, b) => a + b, 0);  // iterates again

// GOOD — single pass, one allocation
let result = 0;
for (const x of data) {
    if (x > 0) result += x * 2;
}

Category: ALLOCATION | Severity: High | Savings: ~3x (eliminates 2 intermediate allocations)

2. indexOf on Large Arrays

// BAD — O(n) per check
for (const query of queries) {
    if (data.indexOf(query) !== -1) {
        process(query);
    }
}

// GOOD — O(1) per check with Set
const lookup = new Set(data);
for (const query of queries) {
    if (lookup.has(query)) {
        process(query);
    }
}

Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements

3. Template Literals in Tight Loops

// BAD — string allocation every iteration
for (let i = 0; i < 10000; i++) {
    const msg = `Processing item ${i} of ${total}`;
    log(msg);
}

// GOOD — build once if constant parts dominate
const prefix = "Processing item ";
const suffix = " of " + total;
for (let i = 0; i < 10000; i++) {
    log(prefix + i + suffix);
}

Category: STRING | Severity: Medium | Savings: ~2x

4. Nested for-of Loops

// BAD — O(n*m) with no early exit
function findPair(arr1, arr2, target) {
    for (const a of arr1) {
        for (const b of arr2) {
            if (a + b === target) return [a, b];
        }
    }
    return null;
}

// GOOD — O(n+m) with hash set
function findPair(arr1, arr2, target) {
    const seen = new Set(arr1);
    for (const b of arr2) {
        if (seen.has(target - b)) return [target - b, b];
    }
    return null;
}

Category: ALGORITHM | Severity: Critical | Savings: ~100x for large inputs

5. forEach with Closure Allocation

// BAD — allocates closure object per iteration
data.forEach(function(item) {
    if (item.active) results.push(item.name);
});

// GOOD — for-of avoids closure overhead
for (const item of data) {
    if (item.active) results.push(item.name);
}

Category: ALLOCATION | Severity: Low | Savings: ~1.3x

Worked Example

class EventQueue {
    constructor() {
        this.events = [];
        this.handlers = [];
    }

    on(type, handler) {
        this.handlers.push({ type: type, fn: handler });
    }

    emit(type, data) {
        this.events.push({ type: type, data: data, time: Date.now() });
        const matching = this.handlers.filter(h => h.type === type);
        matching.forEach(h => h.fn(data));
    }

    getEventsByType(type) {
        return this.events.filter(e => e.type === type);
    }
}

function main() {
    const queue = new EventQueue();
    let total = 0;

    queue.on("data", function(val) { total += val; });
    queue.on("data", function(val) { console.log(`Received: ${val}`); });

    for (let i = 0; i < 100; i++) {
        queue.emit("data", i);
    }

    console.log(`Total: ${total}`);
    const dataEvents = queue.getEventsByType("data");
    console.log(`Events logged: ${dataEvents.length}`);
}

main();

$ joulec --lift js events.js
Energy Analysis: events.js

  EventQueue__constructor    1.20 nJ  (confidence: 0.95)
  EventQueue__on             2.10 nJ  (confidence: 0.90)
  EventQueue__emit          18.50 nJ  (confidence: 0.60)
  EventQueue__getEventsByType  5.30 nJ  (confidence: 0.65)
  main                       8.40 nJ  (confidence: 0.55)

  Total: 35.50 nJ

Recommendations:
  !! [ALLOCATION] EventQueue__emit — filter() + forEach() chain allocates intermediate array
     Suggestion: use a single for-of loop to filter and dispatch in one pass
     Estimated savings: 2-3x

Limitations

  • No DOM APIs (document, window, fetch, etc.)
  • No require() or import of npm modules
  • async/await and Promises are approximated as synchronous
  • No WeakMap, WeakSet, Proxy, Reflect
  • No regular expressions (regex literals are parsed but not executed)
  • Date.now() returns a simulated timestamp
  • No eval() or dynamic code execution

TypeScript Energy Analysis

Joule analyzes TypeScript by stripping type annotations and delegating to the JavaScript pipeline. Since TypeScript types are erased at compile time, the energy profile of a TypeScript program is identical to its JavaScript equivalent.

Quick Start

# Static energy analysis
joulec --lift ts app.ts

# Execute with energy tracking
joulec --lift-run ts app.ts

# Execute with energy optimization
joulec --energy-optimize --lift-run ts app.ts

How It Works

The TypeScript lifter removes all TypeScript-specific syntax before analysis:

  • Type annotations -- x: number, fn(s: string): void
  • Interfaces -- interface Foo { ... }
  • Type aliases -- type Result = Success | Error
  • Generics -- Array<number>, Map<string, number>
  • Access modifiers -- public, private, protected, readonly
  • Enums -- enum Color { Red, Green, Blue }
  • Non-null assertions -- value!
  • Type casts -- value as Type, <Type>value

After stripping, the remaining JavaScript is analyzed normally. This means TypeScript types are free — they add zero energy overhead.

Type Safety is Free

// TypeScript version
interface Point {
    x: number;
    y: number;
}

function distance(a: Point, b: Point): number {
    const dx: number = a.x - b.x;
    const dy: number = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}

// Equivalent JavaScript
function distance(a, b) {
    const dx = a.x - b.x;
    const dy = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}

Both produce exactly the same energy analysis:

$ joulec --lift ts distance.ts
  distance    3.85 nJ  (confidence: 0.95)

$ joulec --lift js distance.js
  distance    3.85 nJ  (confidence: 0.95)

Supported Features

Everything from the JavaScript guide is supported, plus TypeScript-specific syntax is silently stripped:

| TypeScript Feature | Handling |
|---|---|
| Type annotations | Stripped |
| Interfaces | Stripped |
| Type aliases | Stripped |
| Generics | Stripped |
| Access modifiers | Stripped |
| Enums (simple) | Converted to constants |
| as casts | Stripped |
| Non-null ! | Stripped |
| Optional ? params | Treated as default undefined |

When to Use TypeScript vs JavaScript Lifting

Use --lift ts when your source files are .ts or .tsx. The lifter handles the type syntax that would cause parse errors in the JavaScript parser. If your TypeScript is already compiled to JavaScript, use --lift js on the output — the energy profile will be identical.

Anti-Patterns

All JavaScript anti-patterns apply equally to TypeScript. Types do not change the runtime energy profile.

Limitations

  • Same limitations as JavaScript
  • Complex enum patterns with computed values are not supported
  • Namespace merging is not supported
  • Decorators (experimental) are not executed
  • declare blocks are ignored (ambient declarations)

C Energy Analysis

Joule analyzes C code for energy consumption, targeting the low-level patterns where energy waste is most impactful: memory allocation, cache access patterns, and nested loop structures.

Quick Start

# Static energy analysis
joulec --lift c program.c

# Execute with energy tracking
joulec --lift-run c program.c

# Execute with energy optimization
joulec --energy-optimize --lift-run c program.c

Supported Features

| Category | Features |
|---|---|
| Types | int, long, float, double, char, void, size_t, unsigned variants |
| Pointers | Declaration, dereference *p, address-of &x, pointer arithmetic |
| Arrays | Fixed-size int arr[N], multidimensional int mat[M][N] |
| Control flow | if/else, while, do-while, for, switch/case, break, continue, goto (limited) |
| Functions | Declaration, definition, forward declarations, recursion |
| Structs | Definition, field access . and ->, nested structs |
| Memory | malloc(), calloc(), realloc(), free() |
| I/O | printf(), scanf(), puts(), getchar() |
| Math | sqrt(), pow(), abs(), floor(), ceil(), sin(), cos(), log(), exp() |
| Operators | All arithmetic, bitwise, comparison, logical, ternary ?:, comma |

Common Energy Anti-Patterns

1. malloc Inside Loops

// BAD — 1000 allocations, 1000 frees
for (int i = 0; i < 1000; i++) {
    int *buf = malloc(sizeof(int) * 100);
    process(buf, 100);
    free(buf);
}

// GOOD — allocate once, reuse
int *buf = malloc(sizeof(int) * 100);
for (int i = 0; i < 1000; i++) {
    process(buf, 100);
}
free(buf);

Category: ALLOCATION | Severity: High | Savings: ~5x

Each malloc/free cycle costs ~200 pJ (DRAM access) plus system call overhead. In a tight loop, this dominates the energy budget.
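
The arithmetic behind the ~5x figure can be sketched with a toy model. Python is used here purely for the arithmetic; the 50 pJ per-iteration processing cost is an illustrative assumption, not a measured value:

```python
# Rough energy model for the malloc-in-loop anti-pattern.
# Uses the ~200 pJ per malloc/free cycle quoted above and a
# HYPOTHETICAL 50 pJ of useful work per iteration.
ALLOC_PJ = 200.0   # per malloc/free cycle (DRAM access, from the text)
WORK_PJ = 50.0     # per process() call (illustrative assumption)
ITERS = 1000

bad = ITERS * (ALLOC_PJ + WORK_PJ)    # allocate + free every iteration
good = ALLOC_PJ + ITERS * WORK_PJ     # allocate once, reuse the buffer

print(f"bad  = {bad / 1000:.1f} nJ")   # 250.0 nJ
print(f"good = {good / 1000:.1f} nJ")  # 50.2 nJ
print(f"savings = {bad / good:.1f}x")  # 5.0x
```

With heavier per-iteration work the ratio shrinks toward 1x, and with trivial work it approaches 1000x, which is why the figure above is approximate.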

2. Cache-Unfriendly Access Patterns

// BAD — column-major access on row-major array (cache miss per element)
for (int j = 0; j < N; j++) {
    for (int i = 0; i < M; i++) {
        sum += matrix[i][j];  // stride = N * sizeof(int)
    }
}

// GOOD — row-major access (sequential cache hits)
for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++) {
        sum += matrix[i][j];  // stride = sizeof(int)
    }
}

Category: MEMORY | Severity: Critical | Savings: ~10x for large matrices

L1 cache load costs 0.5 pJ. DRAM load costs 200 pJ — a 400x difference. Column-major traversal on row-major data causes a DRAM load on nearly every access.
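
A toy model (again in Python, just for the arithmetic) shows how those per-access costs translate into the savings figure. The 64-byte cache line (16 ints) and 1024x1024 matrix are illustrative assumptions:

```python
# Energy model for row-major vs column-major traversal of an M x N
# int matrix, using the per-access costs quoted above.
L1_PJ, DRAM_PJ = 0.5, 200.0   # pJ per L1 hit / DRAM load (from the text)
INTS_PER_LINE = 16            # 64-byte line / 4-byte int (assumption)
M = N = 1024
accesses = M * N

# Row-major: one DRAM load per cache line, the rest are L1 hits.
misses = accesses // INTS_PER_LINE
row_major = misses * DRAM_PJ + (accesses - misses) * L1_PJ

# Column-major with large N: essentially every access misses.
col_major = accesses * DRAM_PJ

print(f"ratio = {col_major / row_major:.1f}x")  # ~15x under this model
```

The exact ratio depends on matrix size and cache capacity, which is why the table reports an approximate ~10x.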

3. Realloc Growth in Loops

// BAD — capacity grows by one each iteration, copying all data each time
int *data = NULL;
int cap = 0;
for (int i = 0; i < n; i++) {
    cap++;
    data = realloc(data, cap * sizeof(int));
    data[cap - 1] = i;
}

// GOOD — geometric growth (amortized O(1) per insert)
int *data = malloc(16 * sizeof(int));
int len = 0, cap = 16;
for (int i = 0; i < n; i++) {
    if (len == cap) {
        cap *= 2;
        data = realloc(data, cap * sizeof(int));
    }
    data[len++] = i;
}

Category: ALLOCATION | Severity: High | Savings: ~4x
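
The difference is easy to quantify by counting copied elements. A Python sketch, assuming the worst case in which every realloc moves the block (in practice realloc sometimes grows in place):

```python
# Elements copied while appending n items under the two growth strategies.
n = 10_000

# Grow-by-one: the i-th realloc copies the i existing elements.
copies_by_one = sum(range(n))          # n(n-1)/2 = 49,995,000

# Doubling from capacity 16: each realloc copies the old capacity.
copies_doubling, cap, length = 0, 16, 0
for _ in range(n):
    if length == cap:
        copies_doubling += cap         # realloc copies the old buffer
        cap *= 2
    length += 1

print(copies_by_one, copies_doubling)  # 49995000 16368
```

Copy traffic drops from O(n^2) to O(n). The end-to-end ~4x figure is smaller than this ratio because the element writes and other work are unaffected.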

4. Nested Loop Complexity

// BAD — O(n^3) matrix multiply without blocking
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

Category: ALGORITHM | Severity: Critical (for large N)

The energy estimator flags O(n^3) nested loops with high energy estimates and reduced confidence scores.
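
The usual remedy, which the analyzer's recommendations also suggest, is cache blocking (loop tiling). A sketch of the blocked loop nest, written in Python for brevity; the block size of 32 is an illustrative choice, and the structure carries over directly to C:

```python
# Cache-blocked matrix multiply: walk the matrices in block x block
# tiles so each tile's working set stays cache-resident.
def matmul_blocked(A, B, n, block=32):
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Multiply one tile of A against one tile of B.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + block, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The result is identical to the naive triple loop; only the traversal order (and hence the cache behavior) changes.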

Worked Example

#include <stdlib.h>
#include <stdio.h>

void matrix_multiply(int *A, int *B, int *C, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i * n + j] = 0;
            for (int k = 0; k < n; k++) {
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}

int main() {
    int n = 64;
    int *A = calloc(n * n, sizeof(int));
    int *B = calloc(n * n, sizeof(int));
    int *C = calloc(n * n, sizeof(int));

    for (int i = 0; i < n * n; i++) {
        A[i] = i % 10;
        B[i] = (i * 3) % 10;
    }

    matrix_multiply(A, B, C, n);
    printf("C[0][0] = %d\n", C[0]);

    free(A); free(B); free(C);
    return 0;
}
$ joulec --lift c matmul.c
Energy Analysis: matmul.c

  matrix_multiply   892.40 nJ  (confidence: 0.50)
  main               45.20 nJ  (confidence: 0.75)

  Total: 937.60 nJ

Recommendations:
  !!! [ALGORITHM] matrix_multiply — O(n^3) nested loop detected
      Suggestion: consider cache-blocking or BLAS library for large matrices
      Estimated savings: 3-5x with cache blocking

  !! [MEMORY] matrix_multiply — inner loop access pattern B[k*n+j] has stride n
      Suggestion: transpose B before multiply, or interchange k/j loops
      Estimated savings: 2-4x from improved cache locality

Limitations

  • No preprocessor expansion (#define, #ifdef are unsupported; #include lines are skipped)
  • No function pointers or callbacks
  • No variadic functions beyond printf/scanf
  • No typedef (use bare type names)
  • No union types
  • No enum (use integer constants)
  • No complex struct initializers (= { .field = value })
  • No inline assembly

Go Energy Analysis

Joule analyzes Go code with awareness of goroutines, channels, and Go's concurrency model. The energy cost of spawning goroutines, sending on channels, and slice operations is modeled at the picojoule level.

Quick Start

# Static energy analysis
joulec --lift go main.go

# Execute with energy tracking
joulec --lift-run go main.go

# Execute with energy optimization
joulec --energy-optimize --lift-run go main.go

Supported Features

| Category | Features |
| --- | --- |
| Types | int, int8/16/32/64, uint, float32/64, string, bool, byte, rune |
| Variables | var, := short declaration, const, multiple assignment |
| Functions | func, multiple return values, named returns, closures, variadic ... |
| Control flow | if/else (with init statement), for, for range, switch/case, select |
| Slices | Creation, append(), len(), cap(), slicing s[a:b], make(), copy() |
| Maps | map[K]V, make(map[...]), index, delete, len(), comma-ok pattern |
| Structs | Definition, field access, methods (value/pointer receiver), embedding |
| Concurrency | go (goroutine spawn), chan, <- send/receive, make(chan T, N), close() |
| Defer | defer statement (LIFO cleanup) |
| Error handling | Multiple return (result, error), if err != nil pattern |
| Packages | fmt.Println, fmt.Sprintf, math.Sqrt, strings.*, strconv.* |

Common Energy Anti-Patterns

1. Unbounded Goroutine Fan-Out

// BAD — spawns 100K goroutines, each has scheduling overhead
for _, item := range items {
    go process(item)  // 100K goroutines
}

// GOOD — bounded worker pool (wait for workers before exiting)
var wg sync.WaitGroup
ch := make(chan Item, 100)
for i := 0; i < runtime.NumCPU(); i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for item := range ch {
            process(item)
        }
    }()
}
for _, item := range items {
    ch <- item
}
close(ch)
wg.Wait()

Category: ALLOCATION | Severity: Critical | Savings: ~10x

Each goroutine has a minimum 2KB stack allocation. 100K goroutines = 200MB of stack memory + scheduling overhead.

2. Slice Append Without Pre-Allocation

// BAD — slice grows geometrically, copying data each time
var result []int
for i := 0; i < 10000; i++ {
    result = append(result, i)
}

// GOOD — pre-allocate known capacity
result := make([]int, 0, 10000)
for i := 0; i < 10000; i++ {
    result = append(result, i)
}

Category: ALLOCATION | Severity: Medium | Savings: ~2x

Without pre-allocation, append triggers ~14 reallocations and data copies to grow from 0 to 10,000 elements.
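
The ~14 figure can be reproduced with a simple model that doubles capacity starting from 1 (Go's actual growth policy is more nuanced, so this is an approximation):

```python
# Count reallocations when appending 10,000 elements to a slice whose
# capacity doubles from 1 each time it fills (approximation of Go's policy).
reallocs, cap, length = 0, 1, 0
for _ in range(10_000):
    if length == cap:
        cap *= 2       # grow: copy the old backing array
        reallocs += 1
    length += 1

print(reallocs)  # 14 under this model
```

Pre-allocating with make([]int, 0, 10000) makes this count zero.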

3. Map Iteration with Value Copy

// BAD — copies entire struct on each iteration
type BigStruct struct {
    Data [1024]byte
    Name string
}

for _, v := range bigMap {
    process(v)  // copies 1KB+ per iteration
}

// GOOD — store pointers in the map so each iteration copies only a
// pointer (bigMap is a map[string]*BigStruct; Go forbids &bigMap[k])
for _, v := range bigMap {
    process(v)
}

Category: MEMORY | Severity: Medium | Savings: ~3x for large values

4. String Concatenation in Loops

// BAD — O(n^2) string building
result := ""
for _, s := range parts {
    result += s
}

// GOOD — O(n) with strings.Builder
var b strings.Builder
for _, s := range parts {
    b.WriteString(s)
}
result := b.String()

Category: STRING | Severity: High | Savings: ~10x
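
Counting copied bytes makes the asymptotic gap concrete. A Python sketch with 1,000 parts of 10 bytes each (illustrative sizes):

```python
# Bytes copied when building a string from n parts of length L each.
n, L = 1000, 10

# Naive +=: each concatenation copies the whole accumulated string,
# so the i-th append copies roughly i * L bytes.
naive = sum(i * L for i in range(1, n + 1))   # ~n^2 * L / 2

# strings.Builder: each part is copied once into a growing buffer.
builder = n * L

print(naive, builder, naive // builder)  # 5005000 10000 500
```

The end-to-end savings (~10x) is smaller than the raw copy ratio because allocation, buffer growth, and the final copy also cost energy in both versions.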

Worked Example

package main

import "fmt"

func counter(id int, ch chan int) {
    sum := 0
    for i := 0; i < 1000; i++ {
        sum += i
    }
    ch <- sum
}

func main() {
    ch := make(chan int, 10)
    for i := 0; i < 10; i++ {
        go counter(i, ch)
    }

    total := 0
    for i := 0; i < 10; i++ {
        total += <-ch
    }
    fmt.Println(total)
}
$ joulec --lift go counter.go
Energy Analysis: counter.go

  counter    12.50 nJ  (confidence: 0.70)
  main        8.30 nJ  (confidence: 0.65)

  Total: 20.80 nJ

  Note: 10 goroutines detected. Energy estimate reflects single-thread
  execution model; actual concurrent execution may differ due to
  scheduling and synchronization overhead.

Limitations

  • No interface method dispatch (interfaces parsed but not resolved dynamically)
  • No struct embedding for method promotion
  • No generics (Go 1.18+ type parameters)
  • No select with complex multi-channel patterns
  • No panic/recover (panic is treated as program exit)
  • No init() functions
  • No package imports beyond fmt, math, strings, strconv
  • Goroutines are analyzed sequentially (no parallel energy modeling)

Rust Energy Analysis

Joule analyzes Rust code with awareness of ownership, iterator chains, and zero-cost abstractions. The lifter models the energy cost of heap allocations, reference counting, and iterator fusion.

Quick Start

# Static energy analysis
joulec --lift rust lib.rs

# Execute with energy tracking
joulec --lift-run rust lib.rs

# Execute with energy optimization
joulec --energy-optimize --lift-run rust lib.rs

Supported Features

| Category | Features |
| --- | --- |
| Types | i8/16/32/64, u8/16/32/64, f32/64, bool, char, String, &str, usize, isize |
| Variables | let, let mut, const, type inference, shadowing |
| Functions | fn, closures \|x\| x + 1, generic functions (basic), impl blocks |
| Control flow | if/else, while, loop, for x in, match (patterns, guards), break, continue |
| Ownership | & references, &mut mutable references, move closures, lifetime annotations (parsed, not enforced) |
| Structs | Definition, field access, methods, associated functions |
| Enums | Variants, match exhaustiveness, Option<T>, Result<T, E> |
| Collections | Vec<T>, HashMap<K, V>, String, Box<T> |
| Iterators | .iter(), .map(), .filter(), .fold(), .collect(), .enumerate(), .zip(), .chain(), .take(), .skip(), .any(), .all(), .find(), .sum(), .count() |
| Traits | Trait definitions and impl Trait for Type (signatures only) |
| Macros | println!, format!, vec!, panic! (pattern-matched, not expanded) |

Common Energy Anti-Patterns

1. clone() in Hot Loops

// BAD — clones String every iteration (heap allocation + copy)
for item in &data {
    let owned = item.clone();
    process(owned);
}

// GOOD — borrow instead of clone
for item in &data {
    process_ref(item);
}

Category: ALLOCATION | Severity: High | Savings: ~5x

Each .clone() on a String involves malloc + memcpy. At 200 pJ per DRAM access, this dominates in tight loops.

2. Unnecessary collect() in Iterator Chains

// BAD — collects into intermediate Vec, then iterates again
let filtered: Vec<i32> = data.iter()
    .filter(|&&x| x > 0)
    .cloned()
    .collect();  // allocates intermediate Vec
let sum: i32 = filtered.iter().sum();

// GOOD — single iterator chain, no intermediate allocation
let sum: i32 = data.iter()
    .filter(|&&x| x > 0)
    .sum();

Category: ALLOCATION | Severity: Medium | Savings: ~2x

Iterator fusion in Rust is a zero-cost abstraction — the compiler fuses the chain into a single loop. Breaking the chain with .collect() defeats this.

3. Box::new() in Loops

// BAD — heap allocation per iteration
let mut nodes: Vec<Box<Node>> = Vec::new();
for i in 0..1000 {
    nodes.push(Box::new(Node { value: i }));
}

// GOOD — pre-allocate with arena or flat Vec
let mut nodes: Vec<Node> = Vec::with_capacity(1000);
for i in 0..1000 {
    nodes.push(Node { value: i });
}

Category: ALLOCATION | Severity: Medium | Savings: ~3x

4. format!() String Building in Loops

// BAD — format! allocates a new String every iteration
let mut log = String::new();
for i in 0..1000 {
    log.push_str(&format!("item {}\n", i));
}

// GOOD — write! to a single buffer
use std::fmt::Write;
let mut log = String::with_capacity(10000);
for i in 0..1000 {
    write!(log, "item {}\n", i).unwrap();
}

Category: STRING | Severity: Medium | Savings: ~2x

Worked Example

fn process_data(data: &[f64]) -> f64 {
    let filtered: Vec<f64> = data.iter()
        .filter(|&&x| x > 0.0)
        .cloned()
        .collect();

    let normalized: Vec<f64> = filtered.iter()
        .map(|&x| x / filtered.len() as f64)
        .collect();

    normalized.iter().sum()
}

fn main() {
    let data = vec![3.0, -1.0, 4.0, -2.0, 5.0, 1.0, -3.0, 2.0];
    let result = process_data(&data);
    println!("Result: {}", result);
}
$ joulec --lift rust pipeline.rs
Energy Analysis: pipeline.rs

  process_data   12.30 nJ  (confidence: 0.65)
  main            2.10 nJ  (confidence: 0.90)

  Total: 14.40 nJ

Recommendations:
  !! [ALLOCATION] process_data — two collect() calls create intermediate Vecs
     Suggestion: fuse into a single iterator chain without intermediate allocation
     Estimated savings: 2-3x

     Optimized version:
       data.iter()
           .filter(|&&x| x > 0.0)
           .map(|&x| x / count as f64)
           .sum()

Zero-Cost Abstractions Are Real

Rust's iterator chains compile to the same machine code as hand-written loops. Joule confirms this:

// Iterator version
fn sum_positive_iter(data: &[i32]) -> i32 {
    data.iter().filter(|&&x| x > 0).sum()
}

// Manual loop version
fn sum_positive_loop(data: &[i32]) -> i32 {
    let mut sum = 0;
    for &x in data {
        if x > 0 { sum += x; }
    }
    sum
}
$ joulec --lift rust zero_cost.rs
  sum_positive_iter    4.20 nJ  (confidence: 0.70)
  sum_positive_loop    4.20 nJ  (confidence: 0.70)

Identical energy. The abstraction is truly zero-cost.

Limitations

  • No trait dispatch (static or dynamic) — trait bounds are parsed but not resolved
  • No lifetime analysis — lifetimes are parsed but not enforced
  • No async/await — async is not supported
  • No procedural macros — only println!, format!, vec!, panic! are recognized
  • No use imports — all types must be fully qualified or built-in
  • No impl Trait return types
  • No where clauses
  • No unsafe blocks

Energy Optimization Walkthrough

This walkthrough takes a real Python program through the full Joule energy analysis and optimization pipeline — from first scan to CI-ready energy budgets.

Step 1: Start with Real Code

Here's a data processing program with several common energy anti-patterns:

def find_duplicates(items, reference):
    """Find items that appear in both lists."""
    duplicates = []
    for item in items:
        for ref in reference:          # nested loop: O(n*m)
            if item == ref:
                duplicates.append(item)
    return duplicates

def build_report(records):
    """Build a text report from records."""
    report = ""
    for i in range(len(records)):      # len() in loop, string concat
        report += "Record " + str(i) + ": " + str(records[i]) + "\n"
    return report

def process_batch(data):
    """Filter and transform a data batch."""
    results = []
    for item in data:
        temp = []                      # allocation inside loop
        temp.append(item * 2)
        if temp[0] > 10:
            results.append(temp[0])
    return results

def search_all(items, targets):
    """Check if all targets exist in items."""
    found = 0
    for t in targets:
        for item in items:             # linear scan for each target
            if item == t:
                found = found + 1
                # no break — scans entire list even after finding match
    return found

def main():
    data = []
    for i in range(500):
        data.append(i)

    reference = []
    for i in range(250, 750):
        reference.append(i)

    dups = find_duplicates(data, reference)
    report = build_report(data)
    processed = process_batch(data)
    count = search_all(data, reference)

    print(len(dups))
    print(len(processed))
    print(count)

main()

Step 2: Run Baseline Analysis

$ joulec --lift python anti_patterns.py --energy-report baseline.json
Energy Analysis: anti_patterns.py

  find_duplicates    285.00 nJ  (confidence: 0.50)
  build_report        72.50 nJ  (confidence: 0.55)
  process_batch       18.30 nJ  (confidence: 0.60)
  search_all         285.00 nJ  (confidence: 0.50)
  main                12.40 nJ  (confidence: 0.75)

  Total: 673.20 nJ

Step 3: Read the Recommendations

Recommendations:

  !!! [ALGORITHM] find_duplicates — O(n^2) nested loop for membership test
      Suggestion: convert reference to a set for O(1) lookups
      Estimated savings: 50x

  !!! [ALGORITHM] search_all — O(n^2) nested loop for membership test
      Suggestion: convert items to a set for O(1) lookups
      Estimated savings: 50x

  !! [STRING] build_report — string concatenation in loop
      Suggestion: use "".join() to build string in one allocation
      Estimated savings: 8x

  !! [LOOP] search_all — no early exit after finding match
      Suggestion: add break after match to avoid scanning remaining elements
      Estimated savings: 2x (average case)

  !  [REDUNDANCY] build_report — len(records) called in loop range
      Suggestion: compute len() once before the loop
      Estimated savings: 1.2x

  !  [ALLOCATION] process_batch — list allocation inside loop body
      Suggestion: reuse buffer or eliminate temporary list
      Estimated savings: 3x

  .  [REDUNDANCY] build_report — str() conversion could use f-string
      Suggestion: use f"Record {i}: {records[i]}" for cleaner concatenation
      Estimated savings: 1.1x

Severity markers: !!! Critical, !! High, ! Medium, . Low

Step 4: Fix Critical Issues First

Fix #1: Hash set for find_duplicates

def find_duplicates(items, reference):
    ref_set = set(reference)           # O(m) one-time cost
    duplicates = []
    for item in items:
        if item in ref_set:            # O(1) per lookup
            duplicates.append(item)
    return duplicates
$ joulec --lift python fixed_v1.py
  find_duplicates    8.20 nJ  (confidence: 0.70)  # was 285.00 nJ — 34x reduction

Fix #2: Hash set for search_all (also eliminates the missing-break problem)

def search_all(items, targets):
    item_set = set(items)
    found = 0
    for t in targets:
        if t in item_set:
            found = found + 1
    return found
$ joulec --lift python fixed_v2.py
  search_all    6.80 nJ  (confidence: 0.75)  # was 285.00 nJ — 42x reduction

Fix #3: String builder for build_report

def build_report(records):
    n = len(records)
    parts = []
    for i in range(n):
        parts.append(f"Record {i}: {records[i]}")
    report = "\n".join(parts) + "\n"
    return report
$ joulec --lift python fixed_v3.py
  build_report    9.50 nJ  (confidence: 0.75)  # was 72.50 nJ — 7.6x reduction

Fix #4: Eliminate temporary allocation in process_batch

def process_batch(data):
    results = []
    for item in data:
        doubled = item * 2
        if doubled > 10:
            results.append(doubled)
    return results
$ joulec --lift python fixed_v4.py
  process_batch    6.10 nJ  (confidence: 0.75)  # was 18.30 nJ — 3x reduction

Step 5: Run Optimized Baseline

After all four fixes:

$ joulec --lift python optimized.py --energy-report optimized.json
Energy Analysis: optimized.py

  find_duplicates     8.20 nJ  (confidence: 0.70)
  build_report        9.50 nJ  (confidence: 0.75)
  process_batch       6.10 nJ  (confidence: 0.75)
  search_all          6.80 nJ  (confidence: 0.75)
  main               12.40 nJ  (confidence: 0.75)

  Total: 43.00 nJ

  No recommendations — all detected anti-patterns have been resolved.

Step 6: Apply Automated Optimization

The --energy-optimize flag applies four compiler passes on top of your fixes:

$ joulec --energy-optimize --lift-run python optimized.py
Energy Optimization Report:
  Pass 1 (Thermal-Aware Selection): 2 instructions adapted
  Pass 2 (Branch Optimization):     3 branches reordered
  Pass 3 (Loop Unrolling):          1 loop unrolled (trip count 4)
  Pass 4 (DRAM Layout Analysis):    no suggestions

  Optimized energy: 38.70 nJ (10.0% reduction from automated passes)

Step 7: Compare Results

| Function | Before | After Fixes | After Optimization | Reduction |
| --- | --- | --- | --- | --- |
| find_duplicates | 285.00 nJ | 8.20 nJ | 7.40 nJ | 97.4% |
| build_report | 72.50 nJ | 9.50 nJ | 8.90 nJ | 87.7% |
| process_batch | 18.30 nJ | 6.10 nJ | 5.50 nJ | 69.9% |
| search_all | 285.00 nJ | 6.80 nJ | 6.10 nJ | 97.9% |
| main | 12.40 nJ | 12.40 nJ | 10.80 nJ | 12.9% |
| Total | 673.20 nJ | 43.00 nJ | 38.70 nJ | 94.3% |

The manual fixes alone cut total energy by 93.6%; the automated passes then trim a further 10% from the already-optimized version.
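
The percentages can be checked directly from the measured totals:

```python
# Reduction arithmetic for the walkthrough's three measurement points.
baseline, after_fixes, after_opt = 673.20, 43.00, 38.70

manual_cut = (baseline - after_fixes) / baseline    # manual fixes only
auto_cut = (after_fixes - after_opt) / after_fixes  # automated passes
total_cut = (baseline - after_opt) / baseline       # combined

print(f"{manual_cut:.1%} {auto_cut:.1%} {total_cut:.1%}")
# 93.6% 10.0% 94.3%
```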

Step 8: Set an Energy Budget for CI

# Set budget at 50 nJ — optimized version passes
$ joulec --lift python optimized.py --energy-budget 50nJ
# Exit code: 0 (within budget)

# The original version would fail
$ joulec --lift python anti_patterns.py --energy-budget 50nJ
# Exit code: 1 (budget exceeded: 673.20 nJ > 50.00 nJ)

GitHub Actions Integration

- name: Energy budget check
  run: |
    joulec --lift python src/core.py --energy-budget 100nJ
    joulec --lift python src/utils.py --energy-budget 50nJ

The build fails if any file exceeds its budget, catching energy regressions before merge.

Step 9: Generate Reports for Dashboards

$ joulec --lift python optimized.py --energy-report report.json

The JSON report includes per-function energy, confidence scores, and any remaining recommendations. Feed this into Grafana, Datadog, or any monitoring system to track energy consumption across releases.

Key Takeaways

  1. Start with --lift to get a baseline without running the code
  2. Fix critical recommendations first — algorithmic changes (O(n^2) → O(n)) yield the biggest savings
  3. Use --energy-optimize for automated passes on top of manual fixes
  4. Set --energy-budget in CI to prevent regressions
  5. Generate --energy-report JSON for tracking trends over time

Cross-Language Energy Comparison

The same algorithm, implemented in six languages, analyzed by Joule. This comparison reveals the energy cost of language abstractions and runtime overhead.

The Algorithm

Iterative Fibonacci computing fib(30). Chosen because it's simple enough to implement identically in every language, with enough arithmetic to produce meaningful energy differences.

Python

def fibonacci(n):
    if n <= 1:
        return n
    a = 0
    b = 1
    for i in range(2, n + 1):
        temp = a + b
        a = b
        b = temp
    return b

def main():
    result = fibonacci(30)
    print(result)

main()

JavaScript

function fibonacci(n) {
    if (n <= 1) return n;
    let a = 0;
    let b = 1;
    for (let i = 2; i <= n; i++) {
        const temp = a + b;
        a = b;
        b = temp;
    }
    return b;
}

function main() {
    const result = fibonacci(30);
    console.log(result);
}

main();

TypeScript

function fibonacci(n: number): number {
    if (n <= 1) return n;
    let a: number = 0;
    let b: number = 1;
    for (let i: number = 2; i <= n; i++) {
        const temp: number = a + b;
        a = b;
        b = temp;
    }
    return b;
}

function main(): void {
    const result: number = fibonacci(30);
    console.log(result);
}

main();

C

#include <stdio.h>

int fibonacci(int n) {
    if (n <= 1) return n;
    int a = 0;
    int b = 1;
    for (int i = 2; i <= n; i++) {
        int temp = a + b;
        a = b;
        b = temp;
    }
    return b;
}

int main() {
    int result = fibonacci(30);
    printf("%d\n", result);
    return 0;
}

Go

package main

import "fmt"

func fibonacci(n int) int {
    if n <= 1 {
        return n
    }
    a := 0
    b := 1
    for i := 2; i <= n; i++ {
        temp := a + b
        a = b
        b = temp
    }
    return b
}

func main() {
    result := fibonacci(30)
    fmt.Println(result)
}

Rust

fn fibonacci(n: i32) -> i32 {
    if n <= 1 {
        return n;
    }
    let mut a = 0;
    let mut b = 1;
    for _i in 2..=n {
        let temp = a + b;
        a = b;
        b = temp;
    }
    b
}

fn main() {
    let result = fibonacci(30);
    println!("{}", result);
}

Running the Comparison

joulec --lift python fibonacci.py
joulec --lift js fibonacci.js
joulec --lift ts fibonacci.ts
joulec --lift c fibonacci.c
joulec --lift go fibonacci.go
joulec --lift rust fibonacci.rs

Results

| Language | fibonacci() Energy | main() Energy | Total | Confidence |
| --- | --- | --- | --- | --- |
| C | 1.75 nJ | 0.85 nJ | 2.60 nJ | 0.90 |
| Rust | 1.75 nJ | 1.10 nJ | 2.85 nJ | 0.90 |
| Go | 1.95 nJ | 1.20 nJ | 3.15 nJ | 0.85 |
| JavaScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| TypeScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| Python | 3.40 nJ | 1.80 nJ | 5.20 nJ | 0.80 |

All programs produce the correct result: 832040.

Analysis

Why C and Rust Are Cheapest

Both C and Rust map directly to integer arithmetic with no runtime overhead. The fibonacci() function compiles to:

  • 29 integer additions (0.05 pJ each = 1.45 pJ)
  • 58 register moves (~0 pJ, register-to-register)
  • 29 loop iterations with branch (0.1 pJ each = 2.9 pJ)
  • 1 comparison + branch for the n <= 1 check

Total compute: ~4.4 pJ. The remaining energy comes from function call overhead, stack frame setup, and memory loads.
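
The per-instruction arithmetic checks out:

```python
# Reconstructing the ~4.4 pJ compute estimate for fib(30).
iters = 29                   # loop runs for i = 2..30
add_pj = iters * 0.05        # integer additions
branch_pj = iters * 0.1      # loop branch per iteration
total = add_pj + branch_pj   # register moves contribute ~0 pJ
print(f"{total:.2f} pJ")     # 4.35 pJ, rounded to ~4.4 in the text
```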

Rust's slightly higher main() cost accounts for println! macro expansion, which involves formatting machinery that printf avoids.

Why Go Costs Slightly More

Go's runtime includes goroutine scheduling infrastructure even for single-threaded programs. The fmt.Println call also involves reflection-based formatting that adds overhead beyond C's printf.

Why JavaScript Costs More

JavaScript numbers are f64 (double-precision float) even for integer arithmetic. The fibonacci() loop performs float addition instead of integer addition:

  • Integer add: 0.05 pJ
  • Float add: 0.35 pJ (7x more expensive)

This single type system decision accounts for most of JavaScript's energy premium.

Why TypeScript Equals JavaScript

TypeScript type annotations (n: number, let a: number) are erased before analysis. The runtime behavior is identical to JavaScript — same f64 arithmetic, same energy profile.

Why Python Costs the Most

Python's dynamic dispatch adds overhead per operation. Each + involves:

  1. Type check on both operands
  2. Method lookup (__add__)
  3. Result allocation (for large integers)

The energy model accounts for this dispatch overhead, making Python ~2x more expensive than C for pure arithmetic.

Thermal State Impact

Running with different thermal states changes the cost model's power efficiency factor:

joulec --lift c fibonacci.c --thermal-state cool       # aggressive optimization
joulec --lift c fibonacci.c --thermal-state hot        # conservative, reduced SIMD

| Thermal State | C Energy | Python Energy | Ratio |
| --- | --- | --- | --- |
| cool (< 50°C) | 2.40 nJ | 4.80 nJ | 2.0x |
| nominal (50-70°C) | 2.60 nJ | 5.20 nJ | 2.0x |
| hot (85-95°C) | 3.10 nJ | 6.20 nJ | 2.0x |

The absolute energy increases with temperature (hotter silicon leaks more current, reducing efficiency), but the ratio between languages stays constant for this workload because the algorithm is compute-bound with no SIMD opportunities.

The Energy Cost of Abstraction

This comparison quantifies something developers intuit but rarely measure: higher-level languages consume more energy for the same computation. The gap is not enormous — Python costs 2x what C costs for pure arithmetic — but it compounds across millions of function calls in production systems.

Joule makes this cost visible. Whether you're choosing a language for a new project, optimizing a hot path, or justifying a rewrite, you now have picojoule-level data to inform the decision.

Accelerator Energy Measurement

Joule measures energy consumption not just on CPUs but across GPUs, TPUs, and other accelerators. This guide covers the three-tier measurement approach, supported hardware, and how to use accelerator energy data in your programs.

Three-Tier Approach

Joule uses a tiered strategy to maximize energy measurement coverage:

Tier 1: Static Estimation

Available everywhere, no hardware access required. The compiler estimates energy from code structure using calibrated instruction costs. This is what powers #[energy_budget] at compile time.

Tier 2: CPU Performance Counters

On supported platforms, Joule reads hardware performance counters for actual CPU energy:

| Platform | API | Granularity |
| --- | --- | --- |
| Intel/AMD Linux | RAPL via perf_event | Per-package, per-core |
| Intel/AMD Linux | RAPL via MSR | Per-package |
| Apple Silicon macOS | IOReport framework | Per-cluster |

Tier 3: Accelerator Energy

For GPU and accelerator workloads, Joule queries vendor-specific APIs. Each backend in TensorForge implements the EnergyTelemetry trait.

Vendor Coverage

| Vendor | Hardware | API | Energy | Power | Temperature |
| --- | --- | --- | --- | --- | --- |
| NVIDIA | GPUs (A100, H100, etc.) | NVML | Board-level | Per-GPU | Per-GPU |
| AMD | GPUs (MI250, MI300, etc.) | ROCm SMI | Average power | Per-GPU | Per-GPU |
| Intel | GPUs, Gaudi | Level Zero | Per-device | Per-domain | Per-device |
| Google | TPU v4, v5 | TPU Runtime | Per-chip | Per-chip | Per-chip |
| AWS | Inferentia, Trainium | Neuron SDK | Per-core | Per-core | Per-core |
| Groq | LPU | HLML | Board-level | Per-device | Per-device |
| Cerebras | CS-2, CS-3 | CS SDK | Wafer-scale | Per-wafer | Per-wafer |
| SambaNova | SN30, SN40 | DataScale API | Per-RDU | Per-RDU | Per-RDU |

API Details

NVIDIA (NVML)

The NVIDIA Management Library provides direct energy readings:

nvmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
  • Returns total energy in millijoules since driver load
  • Subtract start from end measurement for per-operation energy
  • Available on all datacenter GPUs (V100, A100, H100, B100)
  • Supported on consumer GPUs (RTX 3000/4000/5000 series)
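
Per-operation energy is the difference of two counter readings. A minimal sketch; read_energy_mj here is a hypothetical stand-in callable, not a real NVML binding:

```python
# Per-operation GPU energy from a monotonic energy counter (mJ).
def energy_of(op, read_energy_mj):
    """Run op() and return the energy it consumed, in joules.

    read_energy_mj: callable returning the device's cumulative energy
    counter in millijoules (stand-in for a binding to
    nvmlDeviceGetTotalEnergyConsumption).
    """
    start = read_energy_mj()
    op()
    end = read_energy_mj()
    return (end - start) / 1000.0   # mJ -> J

# Demo with a fake counter that advances 250 mJ across the call.
readings = iter([12_000, 12_250])
print(energy_of(lambda: None, lambda: next(readings)))  # 0.25
```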

AMD (ROCm SMI)

ROCm System Management Interface provides power readings:

rsmi_dev_power_ave_get(device_index, sensor_id, &power_uw)
  • Returns average power in microwatts
  • Energy is derived from power * time
  • Available on MI series (MI250, MI300) and Radeon Pro
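
Deriving energy from sampled power amounts to numerical integration. A sketch using the trapezoidal rule over (timestamp, power) samples; the function name and sample format are illustrative:

```python
# Energy from sampled power readings, as needed for a power-only API
# like ROCm SMI: integrate power (uW) over time (s) -> microjoules.
def energy_uj(samples):
    """samples: list of (timestamp_s, power_uw) pairs, time-ordered."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)   # trapezoidal rule
    return total

# Constant 100 uW held for 2 seconds -> 200 uJ.
print(energy_uj([(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]))  # 200.0
```

Sampling rate bounds the accuracy: power spikes between samples are invisible, which is one reason derived energy carries a lower confidence than a hardware energy counter.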

Intel (Level Zero)

Intel's Level Zero API provides power domain readings:

zesDeviceEnumPowerDomains(device, &count, domains)
zesPowerGetEnergyCounter(domain, &energy)
  • Energy counter in microjoules
  • Multiple power domains (package, card, memory)
  • Supports Intel Arc GPUs and Gaudi accelerators

Google (TPU Runtime)

tpu_device_get_energy_consumption(device, &energy_j)
  • Per-chip energy in joules
  • Available on TPU v4 and v5 pods
  • Accessed through the TPU runtime API

AWS (Neuron SDK)

neuron_device_get_power(device, &power_mw)
  • Per-NeuronCore power in milliwatts
  • Available on Inferentia and Trainium instances
  • Accessed through the Neuron runtime

Groq (HLML)

Groq's Hardware Library for Machine Learning mirrors the NVML API:

hlmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
  • Board-level energy in millijoules
  • Available on Groq LPU cards

Cloud Detection

Joule automatically detects available accelerators using:

Device Files

| Path | Accelerator |
| --- | --- |
| /dev/nvidia* | NVIDIA GPU |
| /dev/kfd | AMD GPU (ROCm) |
| /dev/dri/renderD* | Intel GPU |
| /dev/accel* | Google TPU |
| /dev/neuron* | AWS Inferentia/Trainium |

Environment Variables

| Variable | Accelerator |
| --- | --- |
| CUDA_VISIBLE_DEVICES | NVIDIA GPU |
| ROCR_VISIBLE_DEVICES | AMD GPU |
| ZE_AFFINITY_MASK | Intel GPU |
| TPU_NAME | Google TPU |
| NEURON_RT_NUM_CORES | AWS Inferentia/Trainium |
| GROQ_DEVICE_ID | Groq LPU |
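
The detection logic above can be sketched as a pure function over device paths and environment variables, making it testable without hardware; the function name detect is illustrative:

```python
# Sketch of accelerator detection from the two tables above.
import fnmatch

DEVICE_PATTERNS = {
    "/dev/nvidia*": "NVIDIA GPU",
    "/dev/kfd": "AMD GPU (ROCm)",
    "/dev/dri/renderD*": "Intel GPU",
    "/dev/accel*": "Google TPU",
    "/dev/neuron*": "AWS Inferentia/Trainium",
}
ENV_VARS = {
    "CUDA_VISIBLE_DEVICES": "NVIDIA GPU",
    "ROCR_VISIBLE_DEVICES": "AMD GPU",
    "ZE_AFFINITY_MASK": "Intel GPU",
    "TPU_NAME": "Google TPU",
    "NEURON_RT_NUM_CORES": "AWS Inferentia/Trainium",
    "GROQ_DEVICE_ID": "Groq LPU",
}

def detect(dev_paths, env):
    """Return the sorted set of accelerators implied by the inputs."""
    found = set()
    for path in dev_paths:
        for pattern, name in DEVICE_PATTERNS.items():
            if fnmatch.fnmatch(path, pattern):
                found.add(name)
    for var, name in ENV_VARS.items():
        if var in env:
            found.add(name)
    return sorted(found)

print(detect(["/dev/nvidia0", "/dev/kfd"], {"TPU_NAME": "v5-pod"}))
```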

JSON Output

Set JOULE_ENERGY_JSON=1 to get structured JSON output with per-device breakdowns:

JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c

Report Format

{
  "program": "program.joule",
  "timestamp": "2026-03-03T10:30:00Z",
  "devices": [
    {
      "type": "cpu",
      "vendor": "intel",
      "model": "Xeon w9-3595X",
      "energy_joules": 0.00042,
      "measurement": "rapl",
      "tier": 2
    },
    {
      "type": "gpu",
      "vendor": "nvidia",
      "model": "H100",
      "energy_joules": 0.0031,
      "measurement": "nvml",
      "tier": 3
    }
  ],
  "total_energy_joules": 0.00352,
  "functions": [
    {
      "name": "matrix_multiply",
      "energy_joules": 0.0028,
      "device": "gpu:0",
      "confidence": 0.95,
      "budget_joules": 0.005,
      "status": "within_budget"
    },
    {
      "name": "preprocess",
      "energy_joules": 0.00042,
      "device": "cpu",
      "confidence": 0.90,
      "budget_joules": 0.001,
      "status": "within_budget"
    }
  ]
}
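
Because the report is plain JSON, a downstream consumer needs only a few lines. A sketch against an abbreviated copy of the report above:

```python
import json

# Abbreviated copy of the report format shown above.
report = json.loads("""
{
  "devices": [
    {"type": "cpu", "vendor": "intel", "energy_joules": 0.00042, "tier": 2},
    {"type": "gpu", "vendor": "nvidia", "energy_joules": 0.0031, "tier": 3}
  ],
  "total_energy_joules": 0.00352,
  "functions": [
    {"name": "matrix_multiply", "energy_joules": 0.0028,
     "budget_joules": 0.005, "status": "within_budget"},
    {"name": "preprocess", "energy_joules": 0.00042,
     "budget_joules": 0.001, "status": "within_budget"}
  ]
}
""")

# Cross-check the per-device sum against the reported total, and list
# any functions that exceeded their budget.
device_total = sum(d["energy_joules"] for d in report["devices"])
over_budget = [f["name"] for f in report["functions"]
               if f["status"] != "within_budget"]
print(f"{device_total:.5f} J, over budget: {over_budget}")
```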

Per-Device Breakdown

When multiple accelerators are present, the report includes energy per device:

{
  "devices": [
    { "type": "cpu", "energy_joules": 0.0012, "tier": 2 },
    { "type": "gpu", "vendor": "nvidia", "index": 0, "energy_joules": 0.045, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 1, "energy_joules": 0.043, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 2, "energy_joules": 0.044, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 3, "energy_joules": 0.046, "tier": 3 }
  ],
  "total_energy_joules": 0.1792
}

Using Accelerator Energy in Code

Energy Budgets on GPU Functions

#[energy_budget(max_joules = 0.05)]
#[gpu_kernel]
fn batch_matmul(a: Tensor, b: Tensor) -> Tensor {
    a.matmul(b)
}

The budget is checked against actual GPU energy consumption (Tier 3) when available, or estimated (Tier 1) otherwise.

Runtime Energy Query

use std::energy::{measure, EnergyReport};

let report: EnergyReport = measure(|| {
    model.forward(input)
});

println!("CPU energy: {} J", report.cpu_joules());
println!("GPU energy: {} J", report.gpu_joules());
println!("Total: {} J", report.total_joules());

Adaptive Energy Behavior

use std::energy::current_power_draw;

let power = current_power_draw();  // watts
if power > 200.0 {
    // Use energy-efficient path
    compute_sparse(data)
} else {
    // Full compute path
    compute_dense(data)
}

Fallback Behavior

When hardware energy APIs are unavailable, Joule falls back gracefully:

  1. If Tier 3 (accelerator) is unavailable, use Tier 2 (CPU counters) for CPU portions
  2. If Tier 2 is unavailable, use Tier 1 (static estimation)
  3. The confidence score reflects which tier was used

No program crashes due to missing energy hardware. The measurement degrades gracefully with reduced precision.
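
In code, the tier actually used surfaces through the report's confidence score. A sketch (the `confidence()` accessor is an assumption; the specification only documents per-function confidence fields in the JSON report):

```
use std::energy::measure;

let report = measure(|| process(data));

// Confidence near 1.0: hardware counters were used (Tier 2/3).
// Lower values: static estimation (Tier 1) covered some portion.
if report.confidence() < 0.9 {
    println!("warning: energy figures are partly estimated");
}
```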

TensorForge

TensorForge is Joule's energy-aware machine learning framework. It provides a complete ML stack -- from tensor operations to distributed training to inference -- with energy measurement built into every layer.

Architecture

TensorForge is organized as 22 crates in the Joule workspace:

Foundation Crates

| Crate | Purpose |
|---|---|
| tf-core | Core types, EnergyTelemetry trait, tensor metadata |
| tf-ir | TensorIR: HighOp (14 tensor operations), graph representation |
| tf-compiler | OptimizationPass trait, graph rewriting infrastructure, 7 optimization passes |
| tf-autodiff | Automatic differentiation with real VJP (vector-Jacobian product) implementations |
| tf-hal | Hardware abstraction: Device trait, memory management |
| tf-runtime | Tensor execution runtime, memory pools, scheduling |

Backend Crates

| Crate | Hardware Target |
|---|---|
| tf-backend-cpu | x86/ARM CPUs with SIMD |
| tf-backend-cuda | NVIDIA GPUs via CUDA |
| tf-backend-rocm | AMD GPUs via ROCm/HIP |
| tf-backend-metal | Apple GPUs via Metal |
| tf-backend-tpu | Google TPUs |
| tf-backend-level0 | Intel GPUs/accelerators via Level Zero |
| tf-backend-neuron | AWS Inferentia/Trainium via Neuron SDK |
| tf-backend-groq | Groq LPUs |
| tf-backend-gaudi | Intel Gaudi (Habana Labs) |
| tf-backend-estimated | Energy-estimated backend (no hardware required) |

High-Level Crates

| Crate | Purpose |
|---|---|
| tf-nn | Neural network modules (Module trait, layers, activations) |
| tf-optim | Optimizers (AdamW, SGD with momentum) |
| tf-data | Data loading and batching |
| tf-serialize | Model serialization/deserialization |
| tf-distributed | Distributed training (ring, tree, halving-doubling collectives) |
| tf-infer | Inference engine (KV cache, speculative decoding, scheduling) |

EnergyTelemetry Trait

The EnergyTelemetry trait is the foundation of TensorForge's energy awareness. Every backend implements it:

pub trait EnergyTelemetry {
    fn energy_consumed_joules(&self) -> f64;
    fn power_draw_watts(&self) -> f64;
    fn temperature_celsius(&self) -> f64;
    fn reset_counters(&mut self);
}

This means every tensor operation -- every matmul, every convolution, every activation -- has a measurable energy cost. The energy data flows up through the framework:

  • Individual ops report energy via the backend's telemetry
  • The optimizer aggregates energy per training step
  • The training loop reports energy per epoch
  • The distributed runtime aggregates energy across all nodes
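
A sketch of what that aggregation could look like from user code (`train_one_epoch` and its metrics accessor are hypothetical names, not a documented API):

```
for epoch in 0..epochs {
    // hypothetical helper: runs all steps and sums per-op energy telemetry
    let metrics = train_one_epoch(model, data);
    println!("epoch {}: {} J", epoch, metrics.energy_joules());
}
```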

TensorIR

TensorForge uses a graph-based intermediate representation with 14 high-level operations:

| Operation | Description |
|---|---|
| MatMul | Matrix multiplication |
| Conv2D | 2D convolution |
| BatchNorm | Batch normalization |
| Relu | ReLU activation |
| Softmax | Softmax |
| Add | Element-wise addition |
| Mul | Element-wise multiplication |
| Reduce | Reduction (sum, mean, max) |
| Reshape | Tensor reshape |
| Transpose | Tensor transpose |
| Concat | Tensor concatenation |
| Slice | Tensor slicing |
| Gather | Index-based gathering |
| Scatter | Index-based scattering |

Graph Optimization

The tf-compiler provides 7 optimization passes:

  1. Operator Fusion -- Fuse sequences like Conv2D+BatchNorm+ReLU into a single kernel
  2. Layout Optimization -- Choose optimal memory layout (NCHW vs NHWC) per backend
  3. Constant Folding -- Evaluate constant subgraphs at compile time
  4. Dead Node Elimination -- Remove unused computation
  5. Common Subexpression Elimination -- Share identical computations
  6. Memory Planning -- Minimize peak memory usage through buffer reuse
  7. Energy-Aware Scheduling -- Reorder operations to minimize energy consumption
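
Custom passes plug into the same infrastructure via tf-compiler's OptimizationPass trait. A sketch (the method name `run` is an assumption; only the trait's existence is documented):

```
use tf_compiler::OptimizationPass;

// Hypothetical pass: fuse Conv2D -> BatchNorm -> Relu chains into one kernel.
struct FuseConvBnRelu;

impl OptimizationPass for FuseConvBnRelu {
    fn run(self, graph: &mut Graph) {
        // locate matching chains in the TensorIR graph and
        // replace each with a single fused node
    }
}
```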

Autodiff

TensorForge implements reverse-mode automatic differentiation with real VJP implementations for all operations. No stubs, no placeholders -- every backward pass computes correct gradients:

use tf_autodiff::backward;

let loss = model.forward(input);
let gradients = backward(loss);  // real gradient computation
optimizer.step(gradients);

Neural Network API

The tf-nn crate provides a Module trait for building neural networks:

use tf_nn::{Module, Linear, Conv2d, BatchNorm2d, relu};

struct ResBlock {
    conv1: Conv2d,
    bn1: BatchNorm2d,
    conv2: Conv2d,
    bn2: BatchNorm2d,
}

impl Module for ResBlock {
    fn forward(&self, x: Tensor) -> Tensor {
        let residual = x;
        let out = self.conv1.forward(x)
            |> self.bn1.forward
            |> relu
            |> self.conv2.forward
            |> self.bn2.forward;
        relu(out + residual)
    }
}

Optimizers

The tf-optim crate provides energy-tracked optimizers:

use tf_optim::{AdamW, SGD};

// AdamW with weight decay
let optimizer = AdamW::new(model.parameters(), lr: 0.001, weight_decay: 0.01);

// SGD with momentum
let optimizer = SGD::new(model.parameters(), lr: 0.01, momentum: 0.9);

Every optimizer step reports energy consumed:

let energy = optimizer.step(gradients);
println!("Step energy: {} J", energy.joules());

Distributed Training

The tf-distributed crate supports multi-node training with three collective algorithms:

| Algorithm | Pattern | Best For |
|---|---|---|
| Ring AllReduce | Each node sends to next neighbor | Large models, high bandwidth |
| Tree AllReduce | Binary tree reduction | Low latency |
| Halving-Doubling | Recursive halving then doubling | Balanced |

Energy is tracked across all nodes, giving total training energy:

use tf_distributed::DistributedTrainer;

let trainer = DistributedTrainer::new(
    model,
    world_size: 8,
    algorithm: CollectiveAlgorithm::Ring,
);

let metrics = trainer.train(dataset, epochs: 10);
println!("Total energy across {} nodes: {} J", 8, metrics.total_energy_joules());

Inference Engine

The tf-infer crate provides a high-performance inference engine with:

Paged KV Cache

Efficient key-value caching for transformer models. Memory is allocated in pages, avoiding fragmentation:

use tf_infer::KvCache;

let cache = KvCache::paged(
    num_layers: 32,
    num_heads: 32,
    head_dim: 128,
    page_size: 256,
);

Continuous Batching

Dynamic batching that adds new requests to a running batch without waiting for all current requests to complete:

use tf_infer::ContinuousBatcher;

let batcher = ContinuousBatcher::new(max_batch_size: 64);
batcher.add_request(prompt);
let outputs = batcher.step();  // processes all pending requests

Speculative Decoding

Use a smaller draft model to generate candidates, then verify with the full model:

use tf_infer::SpeculativeDecoder;

let decoder = SpeculativeDecoder::new(
    target_model: large_model,
    draft_model: small_model,
    num_speculative_tokens: 5,
);

Sampling Pipeline

Configurable token sampling with temperature, top-k, top-p, and repetition penalty:

use tf_infer::SamplingConfig;

let config = SamplingConfig {
    temperature: 0.7,
    top_k: 50,
    top_p: 0.9,
    repetition_penalty: 1.1,
};

Energy-Aware Scheduling

The inference scheduler factors energy costs into batch-size selection and scheduling decisions. It can enforce energy budgets on inference requests:

use tf_infer::EnergyAwareScheduler;

let scheduler = EnergyAwareScheduler::new(
    max_energy_per_request: 0.5,  // joules
    max_power_draw: 200.0,        // watts
);

Compiler Integration

TensorForge integrates with the Joule compiler through the joule-codegen-tensorforge crate. When Joule code uses tensor operations, the compiler:

  1. Lowers tensor expressions to TensorIR
  2. Applies graph optimization passes
  3. Selects the backend based on --target
  4. Generates backend-specific code
  5. Instruments energy telemetry calls

This means energy budgets work with ML code:

#[energy_budget(max_joules = 10.0)]
fn train_epoch(model: &mut Model, data: DataLoader) -> f64 {
    let mut total_loss = 0.0;
    for batch in data {
        let loss = model.forward(batch.input);
        let grads = backward(loss);
        optimizer.step(grads);
        total_loss = total_loss + loss.item();
    }
    total_loss
}

Joule Language Reference

The formal specification of Joule's syntax and semantics.

Contents

  • Types -- Primitive types, compound types, union types, generics, type inference
  • Expressions -- Operators, pipe operator, literals, control flow, closures
  • Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
  • Patterns -- Pattern matching: or patterns, range patterns, guard clauses
  • Attributes -- Energy budgets, #[test], #[bench], thermal awareness, derive macros
  • Memory -- Ownership, borrowing, references, lifetimes
  • Concurrency -- Async/await, spawn, bounded channels, task groups, parallel for, supervisors
  • Energy -- Energy system formal specification with accelerator support

Notation

In syntax descriptions:

  • monospace indicates literal syntax
  • italics indicate a syntactic category (e.g., expression, type)
  • [ ] indicates optional elements
  • { } indicates zero or more repetitions
  • | separates alternatives

Types

Primitive Types

Integer Types

| Type | Size | Range |
|---|---|---|
| i8 | 8-bit | -128 to 127 |
| i16 | 16-bit | -32,768 to 32,767 |
| i32 | 32-bit | -2^31 to 2^31-1 |
| i64 | 64-bit | -2^63 to 2^63-1 |
| isize | pointer-sized | Platform dependent |
| u8 | 8-bit | 0 to 255 |
| u16 | 16-bit | 0 to 65,535 |
| u32 | 32-bit | 0 to 2^32-1 |
| u64 | 64-bit | 0 to 2^64-1 |
| usize | pointer-sized | Platform dependent |

Integer literals default to i32. Use suffixes for other types: 42u8, 100i64, 0usize.

Floating-Point Types

| Type | Size | Precision |
|---|---|---|
| f16 | 16-bit | ~3 decimal digits (IEEE 754 half-precision) |
| bf16 | 16-bit | ~3 decimal digits (Brain Float, ML workloads) |
| f32 | 32-bit | ~7 decimal digits |
| f64 | 64-bit | ~15 decimal digits |

Float literals default to f64. Use suffix for f32: 3.14f32.

f16 and bf16 are half-precision types for ML inference and signal processing. bf16 has the same exponent range as f32 but fewer mantissa bits -- ideal for neural network weights. Energy cost: 0.4 pJ per operation (vs 0.35 pJ for f32).

Boolean

let a: bool = true;
let b: bool = false;

Character

let c: char = 'A';     // Unicode scalar value
let emoji: char = '\u{1F600}';

Unit Type

The unit type () represents the absence of a meaningful value. Functions without a return type return ().

Compound Types

Tuples

Fixed-length, heterogeneous sequences:

let pair: (i32, String) = (42, "hello");
let first = pair.0;     // field access
let (x, y) = pair;      // destructuring (moves pair)

Arrays

Fixed-length, homogeneous sequences:

let arr: [i32; 5] = [1, 2, 3, 4, 5];
let zeros = [0; 10];    // 10 zeros
let first = arr[0];     // indexing

Slices

Dynamically-sized views into arrays:

let slice: &[i32] = &arr[1..3];

String Types

String

Owned, heap-allocated, growable UTF-8 string:

let s: String = "Hello, world!";
let greeting = "Hi " + name;   // concatenation
let len = s.len();              // byte length

&str

Borrowed string slice:

let s: &str = "literal";

Union Types

Union types allow a value to be one of several types. They are declared with the | separator:

type Number = i32 | i64 | f64;
type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;
type Result = Data | ErrorCode;

Union types are matched exhaustively:

fn describe(val: Number) -> String {
    match val {
        x: i32 => format!("i32: {}", x),
        x: i64 => format!("i64: {}", x),
        x: f64 => format!("f64: {}", x),
    }
}

Union Type Rules

  • Each constituent type must be distinct
  • The compiler tracks which variant is active at runtime via a discriminant tag
  • Pattern matching on union types is exhaustive -- all variants must be handled
  • Union types compose: type A = B | C where B and C can themselves be union types
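
For example, composing one union out of another (whether the compiler flattens nested unions into their constituent types is an assumption here):

```
type Scalar = i32 | f64;
type Value = Scalar | String;   // composes a union out of a union

fn describe(v: Value) -> String {
    match v {
        x: i32 => format!("int: {}", x),
        x: f64 => format!("float: {}", x),
        s: String => s,
    }
}
```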

Generic Types

Vec

Dynamic array:

let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];

Option

Optional value:

let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;

Result<T, E>

Fallible operation:

let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("failed");

Box

Heap-allocated value:

let boxed: Box<i32> = Box::new(42);

HashMap<K, V>

Key-value map:

let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);

Smart Pointers

See Smart Pointers for full documentation.

let rc = Rc::new(42);            // single-threaded shared ownership
let arc = Arc::new(42);          // thread-safe shared ownership
let cow = Cow::borrowed("hi");   // clone-on-write

Const-Generic Types

// SmallVec — inline buffer with heap spillover
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42);   // stored inline (no allocation until > 8 elements)

// Simd — portable SIMD vectors
let v: Simd[f32; 4] = Simd::splat(1.0);
let w: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let sum = v.add(&w);

See Simd for full SIMD documentation.

N-Dimensional Arrays

See NDArray for full documentation.

let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let view: NDView[f64; 1] = mat.row(0);

Type Aliases

pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;

Type Inference

The compiler infers types when possible:

let x = 42;               // inferred as i32
let mut v = Vec::new();   // element type inferred from usage
v.push(1u8);              // now inferred as Vec<u8>

Explicit annotations are required when the type cannot be inferred from context.
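
For example, when no later use pins down a type parameter, an annotation is needed:

```
// error: cannot infer the element type of `v`
// let v = Vec::new();

// fix: annotate explicitly
let v: Vec<f64> = Vec::new();
```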

Type Casting

Use as for numeric conversions:

let x: i32 = 42;
let y: f64 = x as f64;
let z: u8 = x as u8;    // truncation
let p: usize = x as usize;

Expressions

Joule is expression-oriented. Most constructs return a value, including if, match, and blocks.

Literals

42          // integer (i32)
3.14        // float (f64)
true        // bool
'A'         // char
"hello"     // String

Integer Literal Suffixes

42i8    42i16   42i32   42i64   42isize
42u8    42u16   42u32   42u64   42usize

Float Literal Suffixes

3.14f32     3.14f64

Arithmetic Operators

| Operator | Operation | Types |
|---|---|---|
| + | Addition | integers, floats, String concatenation |
| - | Subtraction | integers, floats |
| * | Multiplication | integers, floats |
| / | Division | integers, floats |
| % | Remainder | integers |
| ** | Exponentiation | integers, floats (right-associative) |

Comparison Operators

| Operator | Operation |
|---|---|
| == | Equal |
| != | Not equal |
| < | Less than |
| > | Greater than |
| <= | Less or equal |
| >= | Greater or equal |

All comparison operators return bool.

Logical Operators

| Operator | Operation |
|---|---|
| && | Logical AND (short-circuit) |
| \|\| | Logical OR (short-circuit) |
| ! | Logical NOT |

Bitwise Operators

| Operator | Operation |
|---|---|
| & | Bitwise AND |
| \| | Bitwise OR |
| ^ | Bitwise XOR |
| ~ | Bitwise NOT |
| << | Left shift |
| >> | Right shift |

Pipe Operator

The pipe operator |> passes the result of the left-hand expression as the first argument to the right-hand function:

// Without pipe
let result = process(transform(parse(input)));

// With pipe -- reads left to right
let result = input |> parse |> transform |> process;

Pipe with Multi-Argument Functions

When the right-hand side is a call with arguments, the piped value is inserted as the first argument:

let result = data
    |> filter(|x| x > 0)
    |> map(|x| x * 2)
    |> take(10);

Pipe Precedence

The pipe operator has lower precedence than all other operators except assignment. It is left-associative:

// These are equivalent:
a |> f |> g
g(f(a))
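
Because |> binds more loosely than arithmetic, operands group before piping (`double` is a hypothetical helper function):

```
fn double(x: i32) -> i32 { x * 2 }

let y = 3 + 4 |> double;   // parses as (3 + 4) |> double, i.e. double(7)
```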

Assignment

let mut x = 0;
x = 42;

Compound assignment is not supported. Use x = x + 1 instead of x += 1.

Block Expressions

A block evaluates to its last expression:

let result = {
    let a = 10;
    let b = 20;
    a + b       // no semicolon -- this is the block's value
};
// result == 30

If Expressions

if is an expression and returns a value:

let max = if a > b { a } else { b };

Without else, the type is ():

if condition {
    do_something();
}

Chained:

if x > 0 {
    "positive"
} else if x < 0 {
    "negative"
} else {
    "zero"
}

Match Expressions

Exhaustive pattern matching:

let name = match color {
    Color::Red => "red",
    Color::Green => "green",
    Color::Blue => "blue",
};

See Patterns for pattern syntax.

Loops

While Loop

while condition {
    body();
}

For Loop

for item in collection {
    process(item);
}

Loop (Infinite)

loop {
    if done() {
        break;
    }
}

Break and Continue

loop {
    if skip_this() {
        continue;
    }
    if finished() {
        break;
    }
}

Function Calls

let result = add(1, 2);

Method Calls

let len = string.len();
let upper = string.to_uppercase();

Field Access

let x = point.x;
let name = person.name;

Index Access

let first = vec[0];
let c = string[i];

Struct Construction

let p = Point { x: 3.0, y: 4.0 };

Enum Variant Construction

let c = Shape::Circle { radius: 5.0 };
let ok = Result::Ok(42);

Return

Explicit return from a function:

fn find(items: Vec<i32>, target: i32) -> Option<i32> {
    let mut i = 0;
    while i < items.len() {
        if items[i] == target {
            return Option::Some(items[i]);
        }
        i = i + 1;
    }
    Option::None
}

Type Cast

let x = 42i32 as f64;
let y = offset as usize;

Items

Items are the top-level declarations in a Joule program.

Functions

fn name(param: Type, param2: Type) -> ReturnType {
    body
}

Visibility

pub fn public_function() { }     // visible outside module
fn private_function() { }        // module-private (default)

Parameters

Parameters are passed by value (move) by default:

fn process(data: Vec<u8>) {
    // data is moved into this function
}

Use references for borrowing:

fn inspect(data: &Vec<u8>) {
    // data is borrowed immutably
}

fn modify(data: &mut Vec<u8>) {
    // data is borrowed mutably
}

Self Parameter

Methods take self as their first parameter:

impl Point {
    fn distance(self) -> f64 { }          // self by value (immutable)
    fn translate(mut self, dx: f64) { }   // self by value (mutable)
}

Generic Functions

fn first<T>(items: Vec<T>) -> Option<T> {
    if items.len() > 0 {
        Option::Some(items[0])
    } else {
        Option::None
    }
}

Extern Functions

Functions implemented outside Joule (FFI):

extern fn sqrt(x: f64) -> f64;
extern fn malloc(size: usize) -> *mut u8;

Const Functions

Functions that can be evaluated at compile time are declared with const fn:

const fn max(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

const fn factorial(n: i32) -> i32 {
    if n <= 1 { 1 } else { n * factorial(n - 1) }
}

// Use at compile time
const MAX_SIZE: i32 = max(100, 200);
const FACT_10: i32 = factorial(10);

Const Function Restrictions

const fn bodies are restricted to operations the compiler can evaluate:

  • Arithmetic operations on primitive types
  • Control flow (if, match, recursion)
  • Local variable bindings
  • Calling other const fn functions

The following are not allowed in const fn:

  • Heap allocation (Vec::new(), Box::new())
  • I/O operations
  • Mutable static state
  • Non-const function calls
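
For example, the compiler rejects heap allocation in a const fn:

```
const fn build() -> Vec<i32> {
    Vec::new()    // error: heap allocation is not allowed in const fn
}
```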

Comptime Blocks

For more complex compile-time computation, use comptime blocks:

comptime {
    let table = generate_sin_table(1024);
}

// table is available as a compile-time constant
fn fast_sin(x: f64) -> f64 {
    let index = (x * 1024.0 / TAU) as usize;
    table[index % 1024]
}

Comptime blocks execute during compilation and make their results available as constants in runtime code. The HIR const evaluator handles arithmetic, control flow, and function calls within comptime blocks.

Structs

Named product types with fields:

pub struct Point {
    pub x: f64,
    pub y: f64,
}

Field Visibility

Fields are private by default. Use pub to make them accessible:

pub struct Config {
    pub name: String,       // public
    secret_key: String,     // private
}

Generic Structs

pub struct Pair<A, B> {
    pub first: A,
    pub second: B,
}

Enums

Sum types (tagged unions) with variants:

pub enum Color {
    Red,
    Green,
    Blue,
}

Variants with Data

pub enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
    Point,
}

Tuple Variants

pub enum Option<T> {
    Some(T),
    None,
}

pub enum Result<T, E> {
    Ok(T),
    Err(E),
}

Generic Enums

pub enum Either<L, R> {
    Left(L),
    Right(R),
}

Impl Blocks

Associate methods with a type:

impl Point {
    // Associated function (no self)
    pub fn new(x: f64, y: f64) -> Point {
        Point { x, y }
    }

    // Method (takes self)
    pub fn distance(self) -> f64 {
        (self.x * self.x + self.y * self.y).sqrt()
    }
}

Multiple impl blocks are allowed for the same type:

impl Point {
    pub fn new(x: f64, y: f64) -> Point { Point { x, y } }
}

impl Point {
    pub fn translate(mut self, dx: f64, dy: f64) {
        self.x = self.x + dx;
        self.y = self.y + dy;
    }
}

Traits

Define shared behavior:

pub trait Display {
    fn to_string(self) -> String;
}

pub trait Clone {
    fn clone(self) -> Self;
}

Trait Implementation

impl Display for Point {
    fn to_string(self) -> String {
        "(" + self.x.to_string() + ", " + self.y.to_string() + ")"
    }
}

Trait Bounds

fn print_all<T: Display>(items: Vec<T>) {
    for item in items {
        println!("{}", item.to_string());
    }
}

Dynamic Dispatch

Use dyn Trait for runtime polymorphism:

fn print_shape(shape: &dyn Display) {
    println!("{}", shape.to_string());
}

Type Aliases

pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;

Modules

Module Declarations

Modules organize code into separate files. The mod keyword declares a module:

mod lexer;      // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;

File Resolution

When the compiler encounters mod foo;, it searches for:

  1. foo.joule in the same directory as the current file
  2. foo/mod.joule for modules with sub-modules
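
A hypothetical layout, to make the resolution concrete:

```
main.joule        // contains `mod lexer;` and `mod parser;`
lexer.joule       // found via rule 1
parser/
    mod.joule     // found via rule 2 (parser has sub-modules)
    grammar.joule // declared as `mod grammar;` inside parser/mod.joule
```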

Public Module Re-exports

pub mod utils;      // re-exports utils module to parent

Inline Modules

Modules can be defined inline within a file:

mod helpers {
    pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
        if x < lo { lo } else if x > hi { hi } else { x }
    }
}

// Use items from inline module
let clamped = helpers::clamp(value, 0, 100);

Visibility

pub mod public_module { }
mod private_module { }

Use Declarations

Import items into scope:

// Import specific items
use crate::ast::{File, AstItem, Visibility};

// Import all items from a module
use crate::prelude::*;

// Standard library imports
use std::collections::HashMap;
use std::math::*;

// Import with alias
use crate::ast::File as AstFile;

Stdlib Path

The --stdlib-path CLI flag specifies the location of the standard library. The builtin registry includes modules for math, statistics, and compute:

use std::math::*;         // sin, cos, sqrt, etc.
use std::statistics::*;   // mean, median, std_dev

Let Statements

Variable bindings:

let x = 42;                    // immutable, type inferred
let y: f64 = 3.14;            // immutable, explicit type
let mut z = 0;                 // mutable
let (a, b) = (1, 2);          // destructuring

Patterns

Patterns are used in match expressions, let bindings, and function parameters to destructure values.

Literal Patterns

match x {
    0 => "zero",
    1 => "one",
    _ => "other",
}

Identifier Patterns

Bind the matched value to a name:

match value {
    x => println!("Got: {}", x),
}

Wildcard Pattern

_ matches any value and discards it:

match pair {
    (x, _) => println!("First: {}", x),
}

Enum Variant Patterns

Tuple Variants

match option {
    Option::Some(value) => use_value(value),
    Option::None => handle_empty(),
}

Named Field Variants

match shape {
    Shape::Circle { radius } => 3.14159 * radius * radius,
    Shape::Rectangle { width, height } => width * height,
    Shape::Point => 0.0,
}

Nested Patterns

match result {
    Result::Ok(Option::Some(value)) => use_value(value),
    Result::Ok(Option::None) => handle_none(),
    Result::Err(e) => handle_error(e),
}

Struct Patterns

let Point { x, y } = point;

In match:

match token {
    Token { kind: TokenKind::Fn, span } => parse_function(span),
    Token { kind: TokenKind::Struct, span } => parse_struct(span),
    _ => parse_expression(),
}

Tuple Patterns

let (a, b) = (1, 2);

match pair {
    (0, 0) => "origin",
    (x, 0) => "x-axis",
    (0, y) => "y-axis",
    (x, y) => "other",
}

Reference Patterns

match &value {
    &Option::Some(x) => use_value(x),
    &Option::None => handle_none(),
}

Or Patterns

Match multiple alternatives in a single arm using |:

match x {
    1 | 2 | 3 => "small",
    4 | 5 | 6 => "medium",
    _ => "large",
}

Or patterns work with enum variants:

match direction {
    Direction::North | Direction::South => "vertical",
    Direction::East | Direction::West => "horizontal",
}

They also work with nested patterns:

match result {
    Result::Ok(1 | 2 | 3) => "small success",
    Result::Ok(_) => "other success",
    Result::Err(_) => "failure",
}

Range Patterns

Match a contiguous range of values using ..= (inclusive):

match score {
    0..=59 => "F",
    60..=69 => "D",
    70..=79 => "C",
    80..=89 => "B",
    90..=100 => "A",
    _ => "invalid",
}

Range patterns work with integer types:

match byte {
    0x00..=0x1F => "control character",
    0x20..=0x7E => "printable ASCII",
    0x7F => "delete",
    _ => "extended",
}

And with characters:

match c {
    'a'..='z' => "lowercase",
    'A'..='Z' => "uppercase",
    '0'..='9' => "digit",
    _ => "other",
}

Guard Clauses

Add a boolean condition to a match arm with if:

match value {
    x if x > 100 => "large",
    x if x > 0 => "positive",
    x if x < 0 => "negative",
    _ => "zero",
}

Guards can reference variables bound in the pattern:

match point {
    Point { x, y } if x == y => "on diagonal",
    Point { x, y } if x == 0 => "on y-axis",
    Point { x, y } if y == 0 => "on x-axis",
    _ => "general",
}

Guards combine with or patterns:

match value {
    1 | 2 | 3 if verbose => {
        println!("small value: {}", value);
        "small"
    }
    _ => "other",
}

Guard Evaluation

  • The guard expression is evaluated only if the structural pattern matches
  • Guards do not affect exhaustiveness checking -- the compiler still requires all variants to be covered
  • If the guard evaluates to false, matching continues to the next arm
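
A small example of guard fall-through:

```
match n {
    x if x % 2 == 0 => "even",   // guard runs only after `x` binds
    _ => "odd",                  // reached when the guard is false
}
```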

Exhaustiveness

The compiler verifies that match expressions cover all possible cases. Omitting a variant produces a compile-time error:

error: non-exhaustive match
  --> program.joule:10:5
   |
10 |     match color {
   |     ^^^^^ missing variants: Blue

Use _ as a catch-all when you don't need to handle every variant explicitly.

Attributes

Attributes are metadata attached to items (functions, structs, enums) that modify their behavior or provide information to the compiler.

Syntax

Attributes are placed above the item they annotate, prefixed with #[...]:

#[attribute_name]
fn function() { }

#[attribute_name(key = value)]
fn function_with_args() { }

Energy Budget

The primary attribute in Joule. Declares the maximum energy a function is allowed to consume:

#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
    x + y
}

Parameters

| Parameter | Type | Description |
|---|---|---|
| max_joules | f64 | Maximum energy in joules |
| max_watts | f64 | Maximum average power in watts |
| max_temp_delta | f64 | Maximum temperature rise in degrees Celsius |

Multiple parameters can be combined:

#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path() { }

See Energy System Guide for details.

Thermal Awareness

Marks a function as thermal-aware. The compiler may insert thermal throttling checks:

#[thermal_aware]
fn heavy_compute(data: Vec<f64>) -> f64 {
    // ...
}

Test

Marks a function as a test. Test functions are collected and executed when the compiler runs with --test:

#[test]
fn test_addition() {
    assert_eq!(add(2, 3), 5);
}

#[test]
fn test_sort_correctness() {
    let data = vec![5, 3, 1, 4, 2];
    let sorted = sort(data);
    assert_eq!(sorted[0], 1);
    assert_eq!(sorted[4], 5);
}

Test Energy Reporting

Every test run includes energy consumption data. The test runner reports:

  • Pass/fail status
  • Energy consumed by each test (in joules)
  • Total energy across all tests

joulec program.joule --test

Output:

running 3 tests
test test_addition ... ok (0.000012 J)
test test_sort_correctness ... ok (0.000089 J)
test test_fibonacci ... ok (0.000341 J)

test result: ok. 3 passed; 0 failed
total energy: 0.000442 J

Bench

Marks a function as a benchmark. Benchmark functions are collected and executed when the compiler runs with --bench:

#[bench]
fn bench_matrix_multiply() {
    let a = Matrix::random(100, 100);
    let b = Matrix::random(100, 100);
    let _ = a.multiply(b);
}

#[bench]
fn bench_sort_large() {
    let data = generate_random_vec(10000);
    let _ = sort(data);
}

Bench Energy Reporting

Benchmarks report timing and energy data over multiple iterations:

joulec program.joule --bench

Output:

running 2 benchmarks
bench bench_matrix_multiply ... 1,234 ns/iter (+/- 56) | 0.00185 J/iter
bench bench_sort_large      ...   892 ns/iter (+/- 23) | 0.00134 J/iter

total energy: 3.19 J (1000 iterations each)

Derive

Automatically implement traits for a type:

#[derive(Clone, Debug)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

Available Derive Traits

| Trait | Description |
|---|---|
| Clone | Value can be duplicated |
| Debug | Debug string representation |
| Eq | Equality comparison |
| Serialize | Serialization support |

GPU Kernel

Marks a function for GPU execution (requires MLIR backend):

#[gpu_kernel]
fn vector_add(a: Vec<f32>, b: Vec<f32>) -> Vec<f32> {
    // ...
}

Visibility

While not strictly an attribute, visibility modifiers control access:

pub fn public_function() { }      // visible everywhere
pub(crate) fn crate_function() { } // visible within the crate
fn private_function() { }          // module-private (default)

Memory Model

Joule uses an ownership-based memory model inspired by Rust. Memory is managed at compile time with no garbage collector.

Ownership

Every value has exactly one owner. When the owner goes out of scope, the value is dropped (memory freed):

fn example() {
    let s = String::from("hello");  // s owns the string
    process(s);                      // ownership moves to process()
    // s is no longer valid here
}

Move Semantics

Assignment and function calls transfer ownership by default:

let a = Vec::new();
let b = a;          // a is moved to b
// a is no longer valid

References

References borrow a value without taking ownership:

Immutable References

fn inspect(data: &Vec<i32>) {
    let len = data.len();
    // data is borrowed, not consumed
}

let v = Vec::new();
inspect(&v);      // borrow v
// v is still valid here

Multiple immutable references can coexist:

let r1 = &v;
let r2 = &v;    // ok: multiple immutable borrows

Mutable References

fn modify(data: &mut Vec<i32>) {
    data.push(42);
}

let mut v = Vec::new();
modify(&mut v);   // mutable borrow

Only one mutable reference can exist at a time:

let r1 = &mut v;
// let r2 = &mut v;  // error: cannot borrow mutably twice

Borrowing Rules

The borrow checker enforces these rules at compile time:

  1. At any given time, you can have either:
    • One mutable reference, OR
    • Any number of immutable references
  2. References must always be valid -- no dangling references
  3. No mutable aliasing -- if a mutable reference exists, no other references to the same data can exist
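
A minimal illustration of rule 1:

```
let mut v = Vec::new();
v.push(1);

let r1 = &v;
let r2 = &v;           // ok: any number of immutable borrows
// let r3 = &mut v;    // error: mutable borrow while immutable borrows exist
```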

Lifetimes

Lifetime annotations (planned) will ensure references don't outlive the data they point to:

fn first_word<'a>(s: &'a str) -> &'a str {
    // The returned reference lives as long as the input
    s.split(" ").next().unwrap_or("")
}

Box

Heap allocation with single ownership:

let boxed = Box::new(42);     // allocate on the heap
let value = *boxed;            // dereference

// Required for recursive types
pub enum List<T> {
    Cons(T, Box<List<T>>),
    Nil,
}

Box auto-derefs for field access:

let expr = Box::new(Expr { kind: ExprKind::Literal(42), span: Span::dummy() });
let kind = expr.kind;    // auto-deref through Box

Raw Pointers

For unsafe, low-level memory access:

unsafe {
    let ptr: *mut i32 = addr as *mut i32;
    *ptr = 42;
}

Raw pointers bypass the borrow checker. Use only when necessary and always within unsafe blocks.

Stack vs. Heap

Allocation | When                                      | Performance
Stack      | Local variables, small types              | Fast (pointer bump)
Heap (Box) | Recursive types, large data, dynamic size | Slower (allocator call)
Heap (Vec) | Dynamic arrays                            | Amortized fast

The compiler places values on the stack by default. Use Box<T> to explicitly heap-allocate.

Concurrency

Joule provides structured concurrency primitives for safe parallel execution.

Async/Await

Functions that perform asynchronous operations are marked async:

async fn fetch_data(url: String) -> Result<String, Error> {
    let response = http::get(url).await?;
    Result::Ok(response.body())
}

The await keyword suspends execution until the asynchronous operation completes.

Async Energy Tracking

Async operations are fully energy-tracked. The compiler inserts timing wrappers around Spawn, TaskAwait, TaskGroupEnter, and TaskGroupExit operations to measure the energy consumed by asynchronous work:

#[energy_budget(max_joules = 0.005)]
async fn process_pipeline(urls: Vec<String>) -> Vec<Data> {
    let mut results: Vec<Data> = Vec::new();
    for url in urls {
        let data = fetch_data(url).await;    // energy tracked
        results.push(data);
    }
    results
}

Desugaring

Async functions are desugared to state machines backed by Task types. The await keyword becomes a yield point that checks for task completion and records energy consumed during the suspension.
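
Conceptually (illustrative pseudocode only, not the actual generated code):

// async fn fetch_data(url) desugars to a state machine:
//
//   state 0: start http::get(url) as a Task, record energy, yield
//   state 1: if the task is not complete, yield again;
//            otherwise record the energy consumed during suspension
//            and resume with the task's result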

Spawn

Launch a concurrent task:

use std::concurrency::spawn;

let handle = spawn(|| {
    heavy_computation()
});

let result = handle.join();

Task Pool

Under the hood, spawn submits work to a pthread-based task pool with 256 task slots. Tasks are distributed across worker threads and managed by the runtime:

  • Worker threads are pre-allocated (one per CPU core)
  • Tasks are stored in a fixed-size array (256 slots)
  • Task submission is lock-free on the fast path
  • Energy consumption is tracked per-task with thread-safe atomic counters

Channels

Send values between tasks using bounded channels:

use std::concurrency::{channel, Sender, Receiver};

let (tx, rx) = channel(capacity: 100);

spawn(|| {
    for i in 0..1000 {
        tx.send(i);    // blocks when buffer is full
    }
});

let value = rx.recv();  // blocks when buffer is empty

Bounded Channel Implementation

Channels are implemented as ring buffers protected by mutex/condvar pairs:

  • Capacity: Specified at creation time, provides backpressure
  • Blocking: send() blocks when the buffer is full; recv() blocks when empty
  • Thread Safety: Mutex protects the ring buffer; condvars signal producers and consumers
  • Energy: Channel operations are energy-tracked -- both send and receive costs are attributed to the calling task

Unbounded Channels

For cases where backpressure is not needed:

let (tx, rx) = channel();    // unbounded (grows as needed)

Task Groups

Structured concurrency with automatic cancellation:

use std::concurrency::TaskGroup;

let group = TaskGroup::new();

group.spawn(|| process_chunk_1());
group.spawn(|| process_chunk_2());
group.spawn(|| process_chunk_3());

let results = group.join_all();  // waits for all tasks

If any task panics, the group cancels all remaining tasks. Energy consumption is aggregated across all tasks in the group.

Parallel For

Parallel iteration distributes work across threads automatically:

let results = parallel for item in data {
    heavy_computation(item)
};

With explicit chunk size:

let processed = parallel(chunk_size: 1024) for row in matrix {
    transform(row)
};

The compiler sums energy consumption across all parallel branches for budget enforcement.

Mutex

Mutual exclusion for shared state:

use std::concurrency::Mutex;

let counter = Mutex::new(0);

// In a concurrent task:
let mut guard = counter.lock();
*guard = *guard + 1;
// guard dropped here, lock released

Atomic Types

Lock-free primitives for simple shared state:

use std::concurrency::AtomicI32;

let counter = AtomicI32::new(0);
counter.fetch_add(1);
let value = counter.load();

Supervisors

Supervisors manage task lifecycles with restart strategies:

use std::concurrency::Supervisor;

let sup = Supervisor::new(RestartStrategy::OneForOne);

sup.spawn("worker", || {
    process_queue()
});

sup.run();

Restart strategies:

Strategy   | Behavior
OneForOne  | Only the failed task is restarted
OneForAll  | All tasks are restarted when one fails
RestForOne | The failed task and all tasks started after it are restarted

Safety Guarantees

The ownership system prevents data races at compile time:

  • Shared state must be wrapped in Mutex, Atomic, or other synchronization primitives
  • The borrow checker ensures no mutable aliasing across tasks
  • Task groups provide structured lifetimes for spawned work
  • Channels provide safe, typed communication between tasks
  • Energy tracking is thread-safe using atomic counters
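
Putting these guarantees together -- a sketch of a counter shared across a task group, assuming the Arc and Mutex types described elsewhere in this documentation:

use std::concurrency::{TaskGroup, Mutex};

let counter = Arc::new(Mutex::new(0));
let group = TaskGroup::new();

let shared = counter.clone();
group.spawn(|| {
    let mut guard = shared.lock();
    *guard = *guard + 1;      // safe: only one task holds the lock
});

group.join_all();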

Energy System Specification

This is the formal specification for Joule's compile-time energy verification system.

Overview

The energy system consists of:

  1. Energy budget attributes -- Programmer-declared constraints on function energy consumption
  2. Energy estimator -- Static analysis that estimates energy from HIR
  3. Energy cost model -- Calibrated per-instruction energy costs
  4. Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
  5. Accelerator energy -- Runtime measurement for GPUs and other accelerators
  6. Diagnostics -- Error messages when budgets are violated

Attribute Syntax

#[energy_budget( budget_param { , budget_param } )]

Where budget_param is one of:

Parameter      | Type | Unit    | Description
max_joules     | f64  | joules  | Maximum total energy
max_watts      | f64  | watts   | Maximum average power
max_temp_delta | f64  | celsius | Maximum temperature rise
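
For example, a budget constraining both total energy and average power (encode_frame and compress are placeholder names):

#[energy_budget(max_joules = 0.001, max_watts = 5.0)]
fn encode_frame(frame: Frame) -> Vec<u8> {
    compress(frame)     // the compiler verifies both constraints
}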

Estimation Model

Instruction Costs

The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:

Instruction        | Base Cost (pJ) | Thermal Scaling
IntAdd             | 0.05           | Linear
IntSub             | 0.05           | Linear
IntMul             | 0.35           | Linear
IntDiv             | 3.5            | Linear
IntRem             | 3.5            | Linear
FloatAdd           | 0.35           | Quadratic
FloatSub           | 0.35           | Quadratic
FloatMul           | 0.35           | Quadratic
FloatDiv           | 3.5            | Quadratic
FloatSqrt          | 5.25           | Quadratic
MemLoadL1          | 0.5            | Linear
MemLoadL2          | 3.0            | Linear
MemLoadL3          | 10.0           | Linear
MemLoadDram        | 200.0          | Linear
MemStoreDram       | 200.0          | Linear
BranchTaken        | 0.1            | None
BranchNotTaken     | 0.1            | None
BranchMispredicted | 1.5            | None
SimdF32x8Add       | 1.5            | Quadratic
SimdF32x8Mul       | 1.5            | Quadratic
SimdF32x8Div       | 7.0            | Quadratic
SimdF32x8Fma       | 2.0            | Quadratic

Thermal Scaling

Actual cost = base_cost * thermal_factor, where thermal_factor depends on the thermal model:

  • None: cost is constant regardless of temperature
  • Linear: actual = base * (1.0 + 0.3 * thermal_state)
  • Quadratic: actual = base * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)

Default thermal state: 0.3 (nominal operating temperature).
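
Two worked examples using the cost table above:

// IntMul (Linear) at the default thermal_state = 0.3:
//   actual = 0.35 * (1.0 + 0.3 * 0.3) = 0.35 * 1.09 = 0.3815 pJ
//
// FloatMul (Quadratic) at thermal_state = 1.0:
//   actual = 0.35 * (1.0 + 0.3 * 1.0 + 0.1 * 1.0^2) = 0.35 * 1.4 = 0.49 pJ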

Expression Costs

Expression          | Cost
Literal             | 0.01 pJ
Variable access     | L1 load
Binary operation    | left + right + op_cost
Unary operation     | inner + op_cost
Function call       | args + branch + 2x L1 (stack)
Method call         | receiver + args + branch + 3x L1
Field access        | inner + IntAdd + L1
Index access        | array + index + IntMul + IntAdd + branch (bounds) + L1
Struct construction | fields + (field_count x L1)
Array construction  | elements + (element_count x L1)

Loop Estimation

  • Known bounds: body_cost * iteration_count
  • Unknown bounds: body_cost * default_iterations (100)
  • Max iterations cap: 10,000
  • PGO-refined: body_cost * actual_trip_count (from profile data)

Unknown-bound loops reduce confidence by 0.7x. PGO data restores confidence to 0.95x.
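
A worked example, taking a loop body of one IntMul plus one IntAdd (0.35 + 0.05 = 0.4 pJ, thermal scaling ignored for brevity):

// for i in 0..1000 { ... }          -> 0.4 * 1000 = 400 pJ   (known bound)
// while cond { ... }                -> 0.4 * 100  = 40 pJ    (default_iterations; confidence * 0.7)
// same loop, PGO trip count of 250  -> 0.4 * 250  = 100 pJ   (confidence restored to 0.95)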

Branch Estimation

  • if/else: condition + avg(then_cost, else_cost) + branch_cost
  • match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)

Branches reduce confidence by 0.9x (if/else) or 0.85x (match).

Confidence Score

Range: 0.0 to 1.0

  • Straight-line code: 1.0
  • Each if/else: multiply by 0.9
  • Each match: multiply by 0.85
  • Each unbounded loop: multiply by 0.7
  • PGO-refined loop: multiply by 0.95

The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
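
For example, a function containing two if/else branches and one unbounded loop:

// straight-line baseline:                  1.0
// two if/else branches:   1.0 * 0.9 * 0.9 = 0.81
// one unbounded loop:     0.81 * 0.7      = 0.567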

Energy IR (EIR)

The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:

HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR

EIR nodes include:

  • EirExpr -- Expressions with energy costs
  • EirStmt -- Statements with energy costs
  • EirBody -- Function bodies with total energy and effect sets

Effect Sets

EIR tracks side effects using EffectSet:

  • Pure (no effects)
  • IO (reads/writes)
  • Alloc (heap allocation)
  • Panic (may abort)

The e-graph optimizer uses effect information to determine which rewrites are safe.

E-Graph Optimization

When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:

  • Arithmetic simplification (x + 0 -> x, x * 1 -> x)
  • Constant folding
  • Dead code elimination
  • Common subexpression elimination
  • Strength reduction (x * 2 -> x << 1)
  • Energy-aware rewrites (prefer lower-energy equivalent operations)
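
For example, strength reduction can be energy-motivated. Using the instruction cost model above (the cost of a shift is an assumption here -- shifts are not listed in the table):

// x * 2   -> IntMul: 0.35 pJ
// x << 1  -> shift:  assumed cheap, comparable to IntAdd (0.05 pJ)
// The optimizer selects the lower-energy form when the rewrite is effect-safe.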

Three-Tier Measurement

Tier 1: Static Estimation

Compile-time energy estimation using the instruction cost model. Available for all programs, no hardware access required.

Tier 2: CPU Performance Counters

Runtime measurement using hardware performance counters:

  • Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
  • Apple Silicon: powermetrics integration

Tier 3: Accelerator Energy

Runtime measurement using vendor-specific APIs:

Vendor    | API                                                 | Measurement
NVIDIA    | NVML (nvmlDeviceGetTotalEnergyConsumption)          | Board power, per-GPU
AMD       | ROCm SMI (rsmi_dev_power_ave_get)                   | Average power, per-GPU
Intel     | Level Zero (zesDeviceGetProperties + power domains) | Per-device power
Google    | TPU Runtime                                         | Per-chip power
AWS       | Neuron SDK                                          | Per-core power
Groq      | HLML (hlmlDeviceGetTotalEnergyConsumption)          | Board power
Cerebras  | CS SDK                                              | Wafer-scale power
SambaNova | DataScale API                                       | Per-RDU power

See Accelerator Energy Measurement for details.

Power Estimation

avg_pj_per_cycle = 0.15  (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time = estimated_cycles / reference_frequency  (3.0 GHz)
power_watts = energy_joules / execution_time

Thermal Estimation

thermal_resistance = 0.4 K/W  (typical CPU with standard cooling)
temp_delta = power_watts * thermal_resistance
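
A worked example for a function estimated at 1.0e9 pJ (0.001 J):

estimated_cycles = 1.0e9 pJ / 0.15 pJ/cycle  = 6.67e9 cycles
execution_time   = 6.67e9 / 3.0e9 Hz         = 2.22 s
power_watts      = 0.001 J / 2.22 s          = 4.5e-4 W
temp_delta       = 4.5e-4 W * 0.4 K/W        = 1.8e-4 K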

Transitive Energy Budgets

Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:

#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }

#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
    helper() + helper()
    // Total includes 2x helper's energy + caller's own instructions
}

The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.

JSON Output

When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:

{
  "functions": [
    {
      "name": "process_data",
      "file": "program.joule",
      "line": 15,
      "energy_joules": 0.00035,
      "power_watts": 12.5,
      "confidence": 0.85,
      "budget_joules": 0.0001,
      "status": "exceeded",
      "breakdown": {
        "compute_pj": 280000,
        "memory_pj": 70000,
        "branch_pj": 500
      }
    }
  ],
  "total_energy_joules": 0.00042
}

Violation Diagnostics

When a budget is exceeded, the compiler emits an error:

error: energy budget exceeded in function 'name'
  --> file.joule:line:col
   |
   | fn name(...) {
   | ^^^^^^^^^^^^^^
   |
   = estimated: X.XXXXX J (confidence: NN%)
   = budget:    X.XXXXX J
   = exceeded by NNN%

For power and thermal budgets, similar diagnostics are produced with the appropriate units.

Standard Library

Joule ships with 110+ batteries-included modules. No package manager needed for common tasks.

Core Types

These are the fundamental types used in every Joule program.

Module     | Description               | Status
String     | UTF-8 string type         | Implemented
Vec        | Dynamic array             | Implemented
Option     | Optional values           | Implemented
Result     | Error handling            | Implemented
HashMap    | Key-value maps            | Implemented
Primitives | Numeric types, bool, char | Implemented

Collections

Module        | Description                      | Status
collections   | Overview of all collection types | Implemented
Vec<T>        | Dynamic array                    | Implemented
HashMap<K,V>  | Hash map                         | Implemented
HashSet<T>    | Hash set                         | Implemented
BTreeMap<K,V> | Sorted map                       | Implemented
BTreeSet<T>   | Sorted set                       | Implemented
LinkedList<T> | Doubly-linked list               | Implemented
VecDeque<T>   | Double-ended queue               | Implemented
BinaryHeap<T> | Priority queue                   | Implemented

Mathematics

Module        | Description            | Status
math          | Mathematical functions | Implemented
math::linear  | Linear algebra         | Implemented
math::complex | Complex numbers        | Implemented
statistics    | Statistical analysis   | Implemented
montecarlo    | Monte Carlo methods    | Implemented

I/O and Networking

Module | Description                    | Status
io     | File and stream I/O            | Implemented
net    | TCP/UDP networking             | Implemented
json   | JSON parsing and serialization | Implemented
csv    | CSV parsing                    | Implemented
toml   | TOML parsing                   | Implemented
yaml   | YAML parsing                   | Implemented

Databases

Module          | Description     | Status
db-sqlite       | SQLite          | Implemented
db-postgres     | PostgreSQL      | Implemented
db-mysql        | MySQL           | Implemented
db-redis        | Redis           | Implemented
db-mongodb      | MongoDB         | Implemented
...and 30+ more | See stdlib/db-* | Implemented

Scientific Computing

Module  | Description                     | Status
ode     | Ordinary differential equations | Implemented
pde     | Partial differential equations  | Implemented
dsp     | Digital signal processing       | Implemented
physics | Physics simulation              | Implemented
bio     | Bioinformatics                  | Implemented
chem    | Chemistry                       | Implemented

Machine Learning and AI

Module | Description             | Status
ml     | Machine learning        | Implemented
snn    | Spiking neural networks | Implemented
agent  | AI agent framework      | Implemented

Cryptography and Security

Module   | Description                  | Status
crypto   | Cryptographic primitives     | Implemented
security | Security analysis            | Implemented
zkp      | Zero-knowledge proofs        | Implemented
fhe      | Fully homomorphic encryption | Implemented

Graphics and Visualization

Module   | Description        | Status
graphics | 2D/3D graphics     | Implemented
image    | Image processing   | Implemented
viz      | Data visualization | Implemented
plot     | Plotting           | Implemented

Concurrency

Module      | Description            | Status
concurrency | Concurrency primitives | Implemented
distributed | Distributed computing  | Implemented

Energy

Module | Description             | Status
energy | Energy measurement APIs | Implemented

Platform

Module   | Description          | Status
wasm     | WebAssembly support  | Implemented
embedded | Embedded systems     | Implemented
mobile   | Mobile development   | Implemented
desktop  | Desktop applications | Implemented

Interoperability

Module             | Description        | Status
rust_interop       | Rust FFI           | Implemented
python             | Python interop     | Implemented
go_interop         | Go interop         | Implemented
typescript_interop | TypeScript interop | Implemented

For the complete list, see the stdlib/ directory in the distribution.

String

The String type is a heap-allocated, growable UTF-8 string.

Construction

let s = "Hello, world!";           // string literal
let empty = String::new();          // empty string
let from_chars = String::from("hello");

Operations

Length

let len = s.len();       // byte length
let empty = s.is_empty();

Concatenation

let greeting = "Hello, " + name;
let full = first + " " + last;

Comparison

if s == "hello" {
    // string equality
}

Substring and Indexing

let first_byte = s[0];       // byte at index (u8)
let sub = s.substring(0, 5); // substring by byte range

Conversion

let n: i32 = 42;
let s = n.to_string();       // "42"

let x: f64 = 3.14;
let s = x.to_string();       // "3.14"

Search

let found = s.contains("world");
let pos = s.find("world");          // Option<usize>
let starts = s.starts_with("Hello");
let ends = s.ends_with("!");

Transformation

let upper = s.to_uppercase();
let lower = s.to_lowercase();
let trimmed = s.trim();

Split

let parts = s.split(",");    // Vec<String>
let lines = s.split("\n");

Memory Layout

String {
    data: *mut u8,     // pointer to UTF-8 bytes
    len: usize,        // byte length
    capacity: usize,   // allocated capacity
}

Strings are heap-allocated and own their data. When a String is dropped, its memory is freed.

Vec<T>

A contiguous, growable array type. The most commonly used collection in Joule.

Construction

let mut v: Vec<i32> = Vec::new();   // empty vector

Adding Elements

v.push(1);
v.push(2);
v.push(3);

Accessing Elements

let first = v[0];          // indexing (panics if out of bounds)
let len = v.len();          // number of elements
let empty = v.is_empty();   // true if len == 0

Iteration

for item in v {
    process(item);
}

Removing Elements

let last = v.pop();         // Option<T> -- removes and returns last element

Common Patterns

Collecting Results

let mut results: Vec<i32> = Vec::new();
let mut i = 0;
while i < 10 {
    results.push(i * i);
    i = i + 1;
}

As a Stack

let mut stack: Vec<i32> = Vec::new();
stack.push(1);      // push
stack.push(2);
let top = stack.pop();  // pop -- Option::Some(2)

Memory Layout

Vec<T> {
    data: *mut T,      // pointer to heap allocation
    len: usize,        // number of elements
    capacity: usize,   // allocated capacity
}

Vec grows automatically when elements are added beyond the current capacity. Growth is amortized O(1).

Option<T>

Represents a value that may or may not be present. Joule's alternative to null pointers.

Variants

pub enum Option<T> {
    Some(T),    // a value is present
    None,       // no value
}

Construction

let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;

Pattern Matching

The primary way to use an Option:

match value {
    Option::Some(x) => {
        // use x
        println!("Got: {}", x);
    }
    Option::None => {
        println!("Nothing");
    }
}

Common Methods

Checking

let has_value = opt.is_some();   // bool
let is_empty = opt.is_none();    // bool

Unwrapping

let value = opt.unwrap();              // panics if None
let value = opt.unwrap_or(default);    // returns default if None

Common Patterns

Lookup That May Fail

fn find(items: Vec<i32>, target: i32) -> Option<usize> {
    let mut i = 0;
    while i < items.len() {
        if items[i] == target {
            return Option::Some(i);
        }
        i = i + 1;
    }
    Option::None
}

Optional Fields

pub struct User {
    pub name: String,
    pub email: Option<String>,
}

Memory Layout

Option<T> {
    is_some: bool,    // discriminant
    value: T,         // the value (undefined when is_some == false)
}

Result<T, E>

Represents an operation that can succeed with a value of type T or fail with an error of type E.

Variants

pub enum Result<T, E> {
    Ok(T),     // success
    Err(E),    // failure
}

Construction

let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("something went wrong");

Pattern Matching

match parse_number(input) {
    Result::Ok(n) => {
        println!("Parsed: {}", n);
    }
    Result::Err(e) => {
        println!("Error: {}", e);
    }
}

Common Methods

Checking

let succeeded = result.is_ok();    // bool
let failed = result.is_err();      // bool

Unwrapping

let value = result.unwrap();              // panics if Err
let value = result.unwrap_or(default);    // returns default if Err

Common Patterns

Functions That Can Fail

fn parse_file(path: String) -> Result<Data, String> {
    let content = read_file(path);
    match content {
        Result::Ok(text) => {
            // parse text into Data
            Result::Ok(data)
        }
        Result::Err(e) => {
            Result::Err("Failed to read file: " + e)
        }
    }
}

Error Accumulation

fn parse_all(inputs: Vec<String>) -> Result<Vec<i32>, Vec<String>> {
    let mut results: Vec<i32> = Vec::new();
    let mut errors: Vec<String> = Vec::new();

    for input in inputs {
        match parse_number(input) {
            Result::Ok(n) => results.push(n),
            Result::Err(e) => errors.push(e),
        }
    }

    if errors.is_empty() {
        Result::Ok(results)
    } else {
        Result::Err(errors)
    }
}

Memory Layout

Result<T, E> {
    is_ok: bool,           // discriminant
    union {
        ok: T,             // success value
        err: E,            // error value
    }
}

HashMap<K, V>

A hash map (dictionary) that stores key-value pairs with O(1) average lookup.

Construction

let mut map: HashMap<String, i32> = HashMap::new();

Insertion

map.insert("alice", 42);
map.insert("bob", 17);
map.insert("carol", 99);

Lookup

let value = map.get("alice");    // Option<i32>

match map.get("alice") {
    Option::Some(v) => println!("Found: {}", v),
    Option::None => println!("Not found"),
}

Checking Membership

let exists = map.contains_key("alice");   // bool

Removal

let removed = map.remove("bob");    // Option<i32>

Size

let count = map.len();         // number of entries
let empty = map.is_empty();    // true if len == 0

Iteration

for (key, value) in map {
    println!("{}: {}", key, value);
}

Common Patterns

Word Counter

fn count_words(text: String) -> HashMap<String, i32> {
    let mut counts: HashMap<String, i32> = HashMap::new();
    let words = text.split(" ");
    for word in words {
        let current = counts.get(word).unwrap_or(0);
        counts.insert(word, current + 1);
    }
    counts
}

Configuration Store

pub struct Config {
    values: HashMap<String, String>,
}

impl Config {
    pub fn get(self, key: String) -> Option<String> {
        self.values.get(key)
    }

    pub fn set(mut self, key: String, value: String) {
        self.values.insert(key, value);
    }
}

Primitive Types

Integer Types

Signed Integers

Type  | Size          | Min                | Max
i8    | 8-bit         | -128               | 127
i16   | 16-bit        | -32,768            | 32,767
i32   | 32-bit        | -2,147,483,648     | 2,147,483,647
i64   | 64-bit        | -9.2 * 10^18       | 9.2 * 10^18
isize | pointer-sized | Platform dependent | Platform dependent

Unsigned Integers

Type  | Size          | Min | Max
u8    | 8-bit         | 0   | 255
u16   | 16-bit        | 0   | 65,535
u32   | 32-bit        | 0   | 4,294,967,295
u64   | 64-bit        | 0   | 1.8 * 10^19
usize | pointer-sized | 0   | Platform dependent

Integer Methods

let x: i32 = 42;
let s = x.to_string();      // "42"
let abs = x.abs();           // absolute value
let min = x.min(10);         // minimum of two values
let max = x.max(100);        // maximum of two values

Integer Literals

let dec = 42;            // decimal
let hex = 0xFF;          // hexadecimal
let oct = 0o77;          // octal
let bin = 0b1010;        // binary
let with_sep = 1_000_000; // underscore separator
let typed = 42u8;        // type suffix

Floating-Point Types

Type | Size   | Precision  | Range                           | Energy
f16  | 16-bit | ~3 digits  | ~6.1 * 10^-5 to 65504           | 0.4 pJ
bf16 | 16-bit | ~3 digits  | ~1.2 * 10^-38 to ~3.4 * 10^38   | 0.4 pJ
f32  | 32-bit | ~7 digits  | ~1.2 * 10^-38 to ~3.4 * 10^38   | 0.35 pJ
f64  | 64-bit | ~15 digits | ~2.2 * 10^-308 to ~1.8 * 10^308 | 0.35 pJ

Half-Precision Types

f16 is IEEE 754 half-precision — useful for signal processing and inference where memory bandwidth matters more than precision.

bf16 (Brain Float) has the same exponent range as f32 but only 8 mantissa bits. Designed for ML training where gradients don't need full precision. Used natively on Google TPUs, NVIDIA A100+, and Apple Neural Engine.

let weight: f16 = 0.5f16;
let grad: bf16 = 0.001bf16;

// Convert to/from f32
let full: f32 = weight as f32;
let half: f16 = full as f16;

Float Methods

let x: f64 = 3.14;
let s = x.to_string();      // "3.14"
let abs = x.abs();           // absolute value
let sqrt = x.sqrt();         // square root
let floor = x.floor();       // round down
let ceil = x.ceil();         // round up
let round = x.round();       // round to nearest

Float Literals

let a = 3.14;           // f64 (default)
let b = 3.14f32;        // f32
let c = 1.0e10;         // scientific notation
let d = 2.5e-3;         // 0.0025

Boolean

let t: bool = true;
let f: bool = false;

Boolean Operations

let and = a && b;       // logical AND (short-circuit)
let or = a || b;        // logical OR (short-circuit)
let not = !a;            // logical NOT

Character

A Unicode scalar value (4 bytes):

let c: char = 'A';
let emoji: char = '\u{1F600}';
let newline: char = '\n';

Unit Type

The unit type () represents no meaningful value:

fn do_something() {
    // implicitly returns ()
}

let unit: () = ();

Type Conversions

Use as for numeric conversions:

let i: i32 = 42;
let f: f64 = i as f64;       // 42.0
let u: u8 = i as u8;         // 42 (truncates if > 255)
let s: usize = i as usize;   // 42

Conversions are explicit -- Joule does not implicitly convert between numeric types.

Collections

Joule provides a comprehensive set of collection types in the standard library.

Overview

Type          | Description        | Ordered           | Unique Keys | Use Case
Vec<T>        | Dynamic array      | Yes (insertion)   | No          | General-purpose sequence
HashMap<K,V>  | Hash table         | No                | Yes         | Key-value lookup
HashSet<T>    | Hash set           | No                | Yes         | Unique element set
BTreeMap<K,V> | Sorted map         | Yes (key order)   | Yes         | Ordered key-value lookup
BTreeSet<T>   | Sorted set         | Yes (value order) | Yes         | Ordered unique elements
VecDeque<T>   | Ring buffer        | Yes (insertion)   | No          | Queue / double-ended queue
LinkedList<T> | Doubly-linked list | Yes (insertion)   | No          | Frequent middle insertion/removal
BinaryHeap<T> | Max-heap           | By priority       | No          | Priority queue

Vec<T>

See Vec for full documentation.

let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];

HashMap<K, V>

See HashMap for full documentation.

let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);
let val = map.get("key");

HashSet<T>

An unordered set of unique elements.

let mut set: HashSet<i32> = HashSet::new();
set.insert(1);
set.insert(2);
set.insert(1);          // no effect, already present
let has = set.contains(1); // true
let count = set.len();     // 2

BTreeMap<K, V>

A sorted map. Keys are kept in sorted order.

let mut map: BTreeMap<String, i32> = BTreeMap::new();
map.insert("banana", 2);
map.insert("apple", 1);
map.insert("cherry", 3);

// Iterates in key order: apple, banana, cherry
for (key, value) in map {
    println!("{}: {}", key, value);
}

BTreeSet<T>

A sorted set of unique elements.

let mut set: BTreeSet<i32> = BTreeSet::new();
set.insert(3);
set.insert(1);
set.insert(2);

// Iterates in order: 1, 2, 3
for item in set {
    println!("{}", item);
}

VecDeque<T>

A double-ended queue implemented as a ring buffer.

let mut deque: VecDeque<i32> = VecDeque::new();
deque.push_back(1);
deque.push_back(2);
deque.push_front(0);

let front = deque.pop_front();   // Option::Some(0)
let back = deque.pop_back();     // Option::Some(2)

BinaryHeap<T>

A max-heap (priority queue). The largest element is always at the top.

let mut heap: BinaryHeap<i32> = BinaryHeap::new();
heap.push(3);
heap.push(1);
heap.push(4);
heap.push(1);
heap.push(5);

let max = heap.pop();    // Option::Some(5)
let next = heap.pop();   // Option::Some(4)

SmallVec[T; N]

A vector that stores up to N elements inline (on the stack), spilling to the heap only when the capacity is exceeded. Ideal for short, bounded collections where heap allocation is wasteful.

let mut sv: SmallVec[i32; 8] = SmallVec::new();

// First 8 elements are stored inline — no heap allocation
for i in 0..8 {
    sv.push(i);       // 0.5 pJ per push (inline)
}

// 9th element triggers heap spill — 45.0 pJ
sv.push(99);

sv.len();             // 9
sv.capacity();        // 16 (heap capacity after spill)
sv.spilled();         // true
sv.get(0);            // 0
sv.pop();             // 99

sv.clear();
sv.drop();            // free heap if spilled

Energy trade-off: Inline pushes cost 0.5 pJ vs ~45 pJ for heap spill. Size N so that most instances never spill.

Deque<T>

Double-ended queue implemented as a ring buffer. O(1) push/pop at both ends.

let mut dq: Deque<i32> = Deque::new();
dq.push_back(1);
dq.push_back(2);
dq.push_front(0);

let front = dq.pop_front();   // Option::Some(0)
let back = dq.pop_back();     // Option::Some(2)

dq.front();                    // peek at front
dq.back();                     // peek at back
dq.len();                      // 1
dq.rotate_left(1);            // rotate elements

Arena<T>

Bump allocator — allocates by advancing a pointer. Individual elements cannot be freed; call reset() to free everything at once in O(1). Ideal for phase-based allocation (parsers, compilers, frame allocators).

let mut arena: Arena<AstNode> = Arena::new();

// Allocation is a pointer bump — 1.0 pJ
let node1 = arena.alloc(AstNode { kind: "expr", children: Vec::new() });
let node2 = arena.alloc(AstNode { kind: "stmt", children: Vec::new() });

arena.len();            // 2 elements allocated
arena.bytes_used();     // bytes consumed
arena.bytes_capacity(); // total buffer size

// Free everything at once — 0.5 pJ regardless of count
arena.reset();

BitSet

Fixed-capacity bit field stored as u64 words. Space-efficient boolean set with O(1) insert/contains and fast set operations.

let mut bits = BitSet::new();

bits.insert(0);
bits.insert(42);
bits.insert(63);

bits.contains(42);           // true
bits.remove(42);
bits.count_ones();           // number of set bits
bits.count_zeros();          // number of unset bits

// Set operations
let union = bits.union(&other);
let inter = bits.intersection(&other);
let diff = bits.difference(&other);
bits.is_subset(&other);

BitVec

Dynamic-length bit vector. Like BitSet but growable.

let mut bv = BitVec::new();

bv.push(true);
bv.push(false);
bv.push(true);

bv.get(1);              // false
bv.set(1, true);
bv.len();               // 3 (bits)
bv.count_ones();        // 3
bv.pop();               // true

Choosing a Collection

  • Need a sequence? Use Vec<T>
  • Need a short, bounded sequence? Use SmallVec[T; N] (avoids heap allocation)
  • Need fast key lookup? Use HashMap<K,V>
  • Need unique elements? Use HashSet<T>
  • Need sorted keys? Use BTreeMap<K,V> (12.0 pJ per traversal)
  • Need a queue? Use Deque<T> (2.0 pJ push/pop)
  • Need a priority queue? Use BinaryHeap<T>
  • Need phase-based allocation? Use Arena<T> (1.0 pJ alloc, 0.5 pJ free-all)
  • Need a compact boolean set? Use BitSet or BitVec (0.3 pJ per operation)
  • Need frequent middle insertion? Use LinkedList<T> (rare)

Smart Pointers

Smart pointers manage ownership and sharing of heap-allocated data with automatic cleanup.

Overview

Type   | Thread-safe | Use Case                         | Energy Cost
Box<T> | N/A         | Heap allocation, recursive types | Allocation only
Rc<T>  | No          | Single-threaded shared ownership | 3.0 pJ clone/drop
Arc<T> | Yes         | Multi-threaded shared ownership  | 3.0 pJ clone/drop (atomic)
Cow<T> | N/A         | Clone-on-write optimization      | Free reads, allocation on write

Box<T>

Heap-allocated value. Required for recursive types. Box<T> is a pointer in memory — zero overhead beyond the allocation.

// Recursive type requires Box
pub enum Expr {
    Literal(i32),
    Add { left: Box<Expr>, right: Box<Expr> },
    Neg { inner: Box<Expr> },
}

let expr = Expr::Add {
    left: Box::new(Expr::Literal(1)),
    right: Box::new(Expr::Literal(2)),
};

Methods

let mut b = Box::new(42);
let r: &i32 = b.as_ref();        // borrow the inner value
let m: &mut i32 = b.as_mut();    // mutable borrow
let inner = b.into_inner();      // 42 -- consumes the Box
let ptr = Box::new(7).leak();    // leak the allocation, return a raw pointer

Rc<T>

Reference-counted pointer for single-threaded shared ownership. Multiple Rc<T> values can point to the same data. The data is freed when the last Rc is dropped.

let a = Rc::new(42);
let b = a.clone();            // increment reference count (3.0 pJ)
let c = a.clone();            // count is now 3

println!("{}", Rc::strong_count(&a));  // 3

// When a, b, c all go out of scope, the value is freed

Methods

let rc = Rc::new(vec![1, 2, 3]);
let count = Rc::strong_count(&rc);     // number of references
let r: &Vec<i32> = rc.as_ref();        // borrow the inner value
let inner = Rc::into_inner(rc);        // unwrap if count == 1 -- consumes rc

// Mutable access (only if count == 1)
let mut rc = Rc::new(42);
if let Option::Some(val) = Rc::get_mut(&mut rc) {
    *val = 100;
}

Use Case: Shared Graph Nodes

pub struct Node {
    pub value: i32,
    pub children: Vec<Rc<Node>>,
}

let leaf = Rc::new(Node { value: 1, children: Vec::new() });
let parent = Node {
    value: 0,
    children: vec![leaf.clone(), leaf.clone()],  // shared ownership
};

Arc<T>

Atomically reference-counted pointer for multi-threaded shared ownership. Same API as Rc<T>, but uses atomic operations for thread safety.

use std::concurrency::spawn;

let data = Arc::new(vec![1, 2, 3, 4, 5]);
let local = data.clone();          // atomic increment (3.0 pJ)

let handle = spawn(move || {
    println!("len = {}", local.len());
});

println!("len = {}", data.len());  // still valid in main thread

Methods

let arc = Arc::new(42);
let count = Arc::strong_count(&arc);   // number of references
let r: &i32 = arc.as_ref();           // borrow inner value
let cloned = arc.clone();             // atomic increment

// Arc::get_mut — only if count == 1
// Arc::into_inner — unwrap if count == 1
// Arc::make_mut — clone inner if shared, then return &mut

Energy: Rc vs Arc

Operation | Rc                                    | Arc
clone     | 3.0 pJ (increment)                    | 3.0 pJ (atomic increment)
drop      | 3.0 pJ (decrement + conditional free) | 3.0 pJ (atomic decrement + conditional free)
as_ref    | 0 pJ (pointer deref)                  | 0 pJ (pointer deref)

Use Rc when data stays on one thread. Use Arc when sharing across threads. The nominal energy cost is the same, but Arc can incur cache-line contention overhead under high concurrency.

Cow<T>

Clone-on-write smart pointer. Wraps either a borrowed reference or an owned value. Reading is free; writing clones the data only if it's currently borrowed.

// Start with a borrowed value
let text = Cow::borrowed("hello");
println!("{}", text.as_ref());        // free — no allocation

// Convert to owned only when needed
let owned = text.to_owned();          // allocates if borrowed

// Check state
text.is_borrowed();   // true
text.is_owned();      // false

Methods

let cow = Cow::borrowed("hello");
let cow2 = Cow::owned("world".to_string());

let r: &str = cow.as_ref();            // borrow — always free
cow.is_borrowed();                     // true if wrapping a reference
cow.is_owned();                        // true if wrapping an owned value
let owned = cow.to_owned();            // clone if borrowed, return owned Cow
let s: String = cow.into_owned();      // consume, clone if borrowed

Use Case: Conditional Transformation

fn normalize(input: &str) -> Cow<str> {
    if input.contains(' ') {
        // Only allocate when we actually need to modify
        Cow::owned(input.replace(' ', "_"))
    } else {
        // No allocation — return a reference to the original
        Cow::borrowed(input)
    }
}

// Most inputs pass through without allocation
let a = normalize("hello");     // Cow::borrowed — 0 allocation
let b = normalize("hello world"); // Cow::owned — 1 allocation

Choosing a Smart Pointer

  • Need heap allocation for recursive types? Use Box<T>
  • Need shared ownership on one thread? Use Rc<T>
  • Need shared ownership across threads? Use Arc<T>
  • Need to avoid cloning until mutation? Use Cow<T>
  • Need unique ownership? Just use the value directly (no pointer needed)

N-Dimensional Arrays

Joule provides first-class multi-dimensional array types for scientific computing, machine learning, and signal processing.

Overview

Type           | Description                     | Owns Data | Energy Cost
NDArray[T; N]  | Owned N-dimensional array       | Yes       | Allocation + compute
NDView[T; N]   | Non-owning view into an NDArray | No        | Zero-copy
CowArray[T; N] | Clone-on-write array            | Shared    | Free reads, allocation on write
DynArray[T]    | Dynamically-ranked array        | Yes       | Allocation + compute

The rank N is a compile-time constant, enabling the compiler to optimize indexing and verify dimensionality at compile time.
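A sketch of what the compile-time rank gives you (the error wording here is illustrative):

let m: NDArray[f64; 2] = NDArray::zeros([3, 4]);

let ok = m[1, 2];              // two indices for rank 2 — compiles
// let bad = m[1, 2, 3];       // compile error: rank-2 array indexed with 3 indices
// let v: NDArray[f64; 1] = m; // compile error: rank mismatch (expected 1, found 2)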

NDArray[T; N]

Owned, contiguous, row-major multi-dimensional array.

// Create a 2D array (matrix)
let mut mat: NDArray[f64; 2] = NDArray::zeros([3, 4]); // 3x4 matrix of zeros
let ones: NDArray[f64; 2] = NDArray::ones([2, 2]);     // 2x2 matrix of ones
let filled: NDArray[f64; 2] = NDArray::full([3, 3], 7.0); // 3x3 filled with 7.0

// Create from data
let v: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0]);
let m: NDArray[f64; 2] = NDArray::from_vec_shape(vec![1.0, 2.0, 3.0, 4.0], [2, 2]);

Indexing

// Multi-dimensional indexing
let val = mat[1, 2];           // row 1, column 2
mat[0, 0] = 42.0;             // set element

// Slicing — returns NDView
let row = mat[0, ..];          // first row
let col = mat[.., 1];          // second column
let sub = mat[1..3, 0..2];    // submatrix
let strided = mat[.., ::2];   // every other column

Methods

let a: NDArray[f64; 2] = NDArray::zeros([3, 4]);

// Shape and metadata
a.shape();         // [3, 4]
a.rank();          // 2
a.len();           // 12 (total elements)
a.strides();       // [4, 1] (row-major)

// Element-wise operations
let b = a.add(&other);        // element-wise addition
let c = a.mul(&other);        // element-wise multiplication
let d = a.map(|x: f64| -> f64 { x * 2.0 });

// Reductions
let total = a.sum();           // sum all elements
let mean = a.mean();           // average
let max = a.max();             // maximum element
let min = a.min();             // minimum element

// Shape manipulation
let reshaped = a.reshape([4, 3]);   // reshape (same element count)
let flat = a.flatten();             // flatten to 1D
let transposed = a.transpose();     // transpose axes

// Linear algebra (2D)
let product = a.matmul(&b);        // matrix multiplication
let dot = v1.dot(&v2);             // dot product (1D)

NDView[T; N]

A non-owning view into an NDArray. Views are zero-copy — they reference the original data without allocation.

let arr: NDArray[f64; 2] = NDArray::zeros([4, 4]);

// Create views via slicing
let row: NDView[f64; 1] = arr.row(0);
let col: NDView[f64; 1] = arr.col(2);
let sub: NDView[f64; 2] = arr.slice([1..3, 1..3]);

// Views support the same read operations as NDArray
let sum = row.sum();
let max = sub.max();

CowArray[T; N]

Clone-on-write array. Reading is free (shares data with the source). Writing triggers a copy only if the data is shared.

let original: NDArray[f64; 2] = NDArray::ones([100, 100]);
let mut cow = CowArray::from(&original);  // no copy yet

// Reading is free
let val = cow[0, 0];   // reads from original's memory

// Writing triggers a copy (if shared)
cow[0, 0] = 42.0;      // now owns its own data

DynArray[T]

Dynamically-ranked array. The rank is determined at runtime, not compile time. Use when the dimensionality isn't known until runtime (e.g., loading arbitrary tensors from files).

let dyn_arr: DynArray[f64] = DynArray::zeros(vec![3, 4, 5]);  // 3D
let rank = dyn_arr.rank();     // 3 (runtime value)
let shape = dyn_arr.shape();   // [3, 4, 5]

Broadcasting

Binary operations between arrays of different shapes follow broadcasting rules:

let mat: NDArray[f64; 2] = NDArray::ones([3, 4]);   // shape [3, 4]
let row: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]); // shape [4]

// row is broadcast to [3, 4] — each row gets the same values added
let result = mat.add(&row);  // shape [3, 4]

Broadcasting rules:

  1. Dimensions are compared from the right
  2. Dimensions must be equal, or one of them must be 1
  3. Missing dimensions on the left are treated as 1
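Applying the three rules to a [3, 1] column against a [3, 4] matrix (a sketch using the constructors shown above):

let mat: NDArray[f64; 2] = NDArray::ones([3, 4]);                                // shape [3, 4]
let col: NDArray[f64; 2] = NDArray::from_vec_shape(vec![1.0, 2.0, 3.0], [3, 1]); // shape [3, 1]

// Compare from the right: 1 vs 4 — one side is 1, so it stretches to 4.
// Next: 3 vs 3 — equal. Result shape is [3, 4].
let out = mat.add(&col);   // row i has every element equal to 1.0 + col[i, 0]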

Energy Costs

Operation            | Cost             | Notes
Element access       | 0.5 pJ           | L1 cache hit
Element-wise op      | 0.8 pJ/element   | Arithmetic + memory
Reduction (sum/mean) | 0.8 pJ/element   | Sequential scan
Matrix multiply      | ~2N^3 * 0.8 pJ   | Cubic complexity
Reshape/transpose    | 0 pJ             | Metadata-only (no copy)
Slice (NDView)       | 0 pJ             | Zero-copy view
Broadcasting         | 0 pJ overhead    | Applied during compute

Choosing an Array Type

  • Know the rank at compile time? Use NDArray[T; N] — the compiler verifies dimensions
  • Need a read-only window? Use NDView[T; N] — zero-copy, zero allocation
  • Might or might not modify? Use CowArray[T; N] — defers allocation until write
  • Rank determined at runtime? Use DynArray[T] — flexible but no compile-time dimension checks

SIMD Vector Types

Simd[T; N] provides portable SIMD (Single Instruction, Multiple Data) operations. The compiler maps to platform-native intrinsics where available (x86 SSE/AVX, ARM NEON) with a scalar fallback for portability.

Creating SIMD Vectors

// Splat — fill all lanes with the same value
let v: Simd[f32; 4] = Simd::splat(1.0);      // [1.0, 1.0, 1.0, 1.0]

// From an array
let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);

// Load from a pointer + offset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let v: Simd[f32; 4] = Simd::load(&data, 0);  // first 4 elements
let w: Simd[f32; 4] = Simd::load(&data, 4);  // next 4 elements

Common Lane Widths

Type         | Lanes | x86           | ARM
Simd[f32; 4] | 4     | SSE __m128    | NEON float32x4_t
Simd[f32; 8] | 8     | AVX __m256    | 2x NEON
Simd[f64; 2] | 2     | SSE2 __m128d  | NEON float64x2_t
Simd[f64; 4] | 4     | AVX __m256d   | 2x NEON
Simd[i32; 4] | 4     | SSE2 __m128i  | NEON int32x4_t
Simd[i32; 8] | 8     | AVX2 __m256i  | 2x NEON

Arithmetic Operations

All arithmetic operates lane-by-lane:

let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::from_array([5.0, 6.0, 7.0, 8.0]);

let sum = a.add(&b);    // [6.0, 8.0, 10.0, 12.0]
let diff = a.sub(&b);   // [-4.0, -4.0, -4.0, -4.0]
let prod = a.mul(&b);   // [5.0, 12.0, 21.0, 32.0]
let quot = a.div(&b);   // [0.2, 0.333, 0.429, 0.5]

Reduction Operations

Reduce all lanes to a single scalar:

let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);

let total = v.sum();     // 10.0 — horizontal sum of all lanes

Comparison and Selection

let a: Simd[f32; 4] = Simd::from_array([1.0, 5.0, 3.0, 8.0]);
let b: Simd[f32; 4] = Simd::from_array([2.0, 4.0, 6.0, 7.0]);

let lo = a.min(&b);     // [1.0, 4.0, 3.0, 7.0]
let hi = a.max(&b);     // [2.0, 5.0, 6.0, 8.0]
let same = a.eq(&b);    // false — true only when every lane compares equal

Unary Operations

let v: Simd[f32; 4] = Simd::from_array([-1.0, 2.0, -3.0, 4.0]);

let pos = v.abs();       // [1.0, 2.0, 3.0, 4.0]
let neg = v.neg();       // [1.0, -2.0, 3.0, -4.0]

Memory Operations

let mut data = vec![0.0; 1024];

// Load 4 elements starting at offset 8
let chunk: Simd[f32; 4] = Simd::load(&data, 8);

// Store back to memory
chunk.store(&mut data, 8);

// Convert to/from array
let arr: [f32; 4] = chunk.to_array();

Example: Vectorized Dot Product

#[energy_budget(max_joules = 0.00005)]
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
    let n = a.len();
    let mut sum: Simd[f32; 8] = Simd::splat(0.0);
    let mut i = 0;

    // Process 8 elements at a time
    while i + 8 <= n {
        let va: Simd[f32; 8] = Simd::load(a, i);
        let vb: Simd[f32; 8] = Simd::load(b, i);
        sum = sum.add(&va.mul(&vb));
        i = i + 8;
    }

    // Horizontal sum + scalar remainder
    let mut result = sum.sum();
    while i < n {
        result = result + a[i] * b[i];
        i = i + 1;
    }
    result
}

Energy Costs

Operation                         | Cost   | Notes
Lane arithmetic (add/sub/mul/div) | 2.0 pJ | Single SIMD instruction
Horizontal reduction (sum)        | 2.0 pJ | Log2(N) shuffle + add
Load/store                        | 0.5 pJ | L1 cache, aligned
Comparison (min/max/eq)           | 2.0 pJ | Single SIMD instruction

SIMD operations process N elements for roughly the same energy as one scalar operation. For a Simd[f32; 8], that's ~8x energy efficiency compared to a scalar loop — the primary reason to use SIMD in energy-aware code.
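A rough estimate for the dot_product example above at n = 1024, using the table's costs (scalar per-element costs are not specified in this section, so the comparison is only indicative):

// 1024 elements / 8 lanes = 128 SIMD iterations
// per iteration: 2 loads (2 x 0.5 pJ) + mul + add (2 x 2.0 pJ) = 5.0 pJ
// loop total:    128 x 5.0 pJ  = 640 pJ
// + one horizontal sum          =   2 pJ
// ~642 pJ for 1024 multiply-accumulates; a scalar loop would pay
// arithmetic per element rather than per 8-lane vector — roughly 8x more.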

Platform Detection

The compiler automatically selects the best implementation:

  1. x86/x86_64: Uses SSE/AVX intrinsics via <immintrin.h>
  2. ARM64 (Apple Silicon, etc.): Uses NEON intrinsics via <arm_neon.h>
  3. Other platforms: Falls back to scalar loops (same behavior, no hardware acceleration)

No #[cfg] attributes needed in user code — the abstraction is portable.

Time

Joule provides two types for time measurement: Duration for time spans and Instant for timestamps.

Duration

A time span measured in nanoseconds. All arithmetic is exact — no floating-point rounding.

Creating Durations

let d1 = Duration::from_secs(5);          // 5 seconds
let d2 = Duration::from_millis(1500);     // 1.5 seconds
let d3 = Duration::from_micros(250);      // 250 microseconds
let d4 = Duration::from_nanos(100);       // 100 nanoseconds

Querying

let d = Duration::from_millis(2500);
d.as_secs();      // 2
d.as_millis();    // 2500
d.as_micros();    // 2500000
d.as_nanos();     // 2500000000
d.is_zero();      // false

Arithmetic

let a = Duration::from_secs(3);
let b = Duration::from_millis(500);

let sum = a.add(&b);           // 3.5 seconds
let diff = a.sub(&b);          // 2.5 seconds
let doubled = a.mul(2);        // 6 seconds
let halved = a.div(2);         // 1.5 seconds

// Checked arithmetic (returns Option)
let safe = a.checked_add(&b);  // Option::Some(3.5s)
let over = a.checked_sub(&Duration::from_secs(10)); // Option::None

Instant

A monotonic timestamp. Cannot go backwards. Used for measuring elapsed time.

Measuring Elapsed Time

let start = Instant::now();          // 15.0 pJ — reads system clock

// ... do work ...
heavy_computation();

let elapsed: Duration = start.elapsed();
println!("Took {} ms", elapsed.as_millis());

Comparing Instants

let t1 = Instant::now();
// ... work ...
let t2 = Instant::now();

let gap: Duration = t2.duration_since(&t1);

Example: Benchmarking with Energy

#[energy_budget(max_joules = 0.001)]
fn timed_sort(data: Vec<i32>) -> (Vec<i32>, Duration) {
    let start = Instant::now();
    let sorted = sort(data);
    let elapsed = start.elapsed();
    (sorted, elapsed)
}

fn main() {
    let data = vec![5, 3, 1, 4, 2, 8, 7, 6];
    let (sorted, time) = timed_sort(data);
    println!("Sorted in {} us", time.as_micros());
}

Energy Costs

Operation           | Cost    | Notes
Duration arithmetic | 0.05 pJ | Integer add/sub
Instant::now()      | 15.0 pJ | System clock read (syscall)
elapsed()           | 15.0 pJ | Clock read + subtraction
duration_since()    | 0.05 pJ | Integer subtraction

Instant::now() is the expensive operation — it requires a system call (clock_gettime on Linux, mach_absolute_time on macOS). Avoid calling it in tight loops. Measure coarse-grained sections instead.
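One way to follow that advice: read the clock once around the whole loop and divide, instead of once per iteration (items and process are illustrative stand-ins):

let start = Instant::now();        // one 15.0 pJ clock read
for item in items {
    process(item);                 // no Instant::now() inside the loop
}
let total = start.elapsed();       // one more clock read

println!("avg per item: {} ns", total.as_nanos() / items.len());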

Numeric Types

Specialized numeric types beyond the standard integer and float primitives.

Decimal

128-bit decimal type for exact arithmetic. No floating-point rounding errors. Essential for financial calculations.

let price = Decimal::new(19, 99, false);    // 19.99
let tax = Decimal::from_str("0.0825");      // 8.25%
let total = price.mul(&tax).add(&price);    // exact: 21.639175

// No floating-point surprise
let a = Decimal::from_str("0.1");
let b = Decimal::from_str("0.2");
let c = a.add(&b);
// c == 0.3 exactly (unlike f64 where 0.1 + 0.2 != 0.3)

Methods

let d = Decimal::from_str("123.456");

// Arithmetic
d.add(&other);       d.sub(&other);
d.mul(&other);       d.div(&other);
d.rem(&other);       // remainder

// Rounding
d.round(2);          // 123.46 — round to 2 decimal places
d.floor();           // 123.0
d.ceil();            // 124.0
d.trunc();           // 123.0 — truncate toward zero

// Properties
d.abs();             // absolute value
d.neg();             // negate
d.scale();           // number of decimal places
d.mantissa();        // integer mantissa
d.is_zero();         // false
d.is_negative();     // false

// Conversion
d.to_f64();          // 123.456 (lossy)
d.to_string();       // "123.456"

Energy Cost

Operation          | Cost
Decimal arithmetic | 5.0 pJ
Decimal comparison | 0.5 pJ

Decimal is ~14x more expensive than f64 arithmetic but guarantees exact results. Use it where correctness matters more than speed (finance, accounting, currency).
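A typical financial use, with exact intermediate values and explicit rounding to cents (a sketch; the line-item names are illustrative):

let unit = Decimal::from_str("19.99");
let qty = Decimal::from_str("3");

let line = unit.mul(&qty);                           // 59.97, exact
let tax = line.mul(&Decimal::from_str("0.0825"));    // 4.947525, exact
let due = line.add(&tax.round(2));                   // 59.97 + 4.95 = 64.92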

Complex<T>

Complex number with real and imaginary parts. Generic over the component type (typically f32 or f64).

let z = Complex::new(3.0, 4.0);     // 3 + 4i
let w = Complex::new(1.0, -2.0);    // 1 - 2i

// Arithmetic
let sum = z.add(&w);     // 4 + 2i
let prod = z.mul(&w);    // 11 - 2i
let quot = z.div(&w);    // -1 + 2i

// Properties
z.real();       // 3.0
z.imag();       // 4.0
z.abs();        // 5.0 (magnitude: sqrt(3^2 + 4^2))
z.arg();        // 0.927... (phase angle in radians)
z.conj();       // 3 - 4i (complex conjugate)
z.norm();       // 25.0 (squared magnitude)

Advanced Operations

let z = Complex::new(1.0, 1.0);

z.exp();             // e^z
z.log();             // natural logarithm
z.sqrt();            // principal square root
let w = Complex::new(2.0, 0.0);
z.pow(&w);           // z^w

// Polar form
let polar = Complex::from_polar(5.0, 0.927);  // magnitude, angle

Energy Cost

Operation        | Cost
Complex add/sub  | 1.6 pJ (2x real)
Complex multiply | 1.6 pJ
Complex divide   | 3.2 pJ
abs/norm         | 1.6 pJ
exp/log/sqrt     | 5.0 pJ

Intern

Interned string — stored once in a global table, compared by pointer equality. Ideal for identifiers, keywords, and symbols that appear repeatedly.

let a = Intern::new("hello");
let b = Intern::new("hello");

// Pointer equality — O(1) comparison instead of O(n) string compare
a.eq(&b);         // true (same pointer)

// String access
a.as_str();       // "hello"
a.len();          // 5
a.is_empty();     // false
a.hash();         // precomputed hash value

Use Case: Compiler Symbol Tables

pub struct Symbol {
    pub name: Intern,
}

// Creating millions of Symbol values with the same name
// only stores the string once in memory
let sym1 = Symbol { name: Intern::new("x") };
let sym2 = Symbol { name: Intern::new("x") };

// Comparison is pointer equality — O(1), not O(n)
sym1.name.eq(&sym2.name);  // true, instant

Energy Cost

Operation                | Cost    | Notes
Intern::new (first time) | 10.0 pJ | Hash table insert
Intern::new (duplicate)  | 10.0 pJ | Hash table lookup
eq                       | 0.05 pJ | Pointer comparison
as_str                   | 0 pJ    | Pointer dereference

The 10.0 pJ cost of Intern::new is amortized over all subsequent O(1) comparisons. For strings compared frequently (like identifiers in a compiler), interning saves both energy and time.

I/O

File and stream I/O operations.

Reading Files

use std::io::File;

let content = File::read_to_string("data.txt");
match content {
    Result::Ok(text) => process(text),
    Result::Err(e) => println!("Error: {}", e),
}

Writing Files

use std::io::File;

let result = File::write_string("output.txt", "Hello, world!");
match result {
    Result::Ok(_) => println!("Written successfully"),
    Result::Err(e) => println!("Error: {}", e),
}

Reading Lines

use std::io::File;

let lines = File::read_lines("data.txt");
match lines {
    Result::Ok(lines) => {
        for line in lines {
            process_line(line);
        }
    }
    Result::Err(e) => println!("Error: {}", e),
}

Standard Streams

use std::io::{stdin, stdout, stderr};

// Read from stdin
let line = stdin::read_line();

// Write to stdout
stdout::write("Hello\n");

// Write to stderr
stderr::write("Error message\n");

Path Operations

use std::io::Path;

let p = Path::new("/home/user/data.txt");
let exists = p.exists();
let is_file = p.is_file();
let is_dir = p.is_dir();
let parent = p.parent();        // Option<Path>
let filename = p.file_name();   // Option<String>
let ext = p.extension();        // Option<String>

Directory Operations

use std::io::{create_dir, read_dir, remove_dir};

create_dir("output");

let entries = read_dir(".");
match entries {
    Result::Ok(files) => {
        for entry in files {
            println!("{}", entry.name());
        }
    }
    Result::Err(e) => println!("Error: {}", e),
}

Buffered I/O

For performance-critical I/O, use buffered readers and writers:

use std::io::{BufReader, BufWriter};

let reader = BufReader::new(File::open("large.txt"));
let writer = BufWriter::new(File::create("output.txt"));
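The wrappers batch many small reads and writes into fewer large syscalls. A usage sketch (the lines() iterator and write method shown here are assumptions — they are not specified in this section):

use std::io::{BufReader, BufWriter, File};

let reader = BufReader::new(File::open("large.txt"));
let mut writer = BufWriter::new(File::create("output.txt"));

// Assumed API: iterate buffered lines, write through the buffer
for line in reader.lines() {
    writer.write(line);
    writer.write("\n");
}
// the writer flushes its buffer when dropped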

Math

Mathematical functions, constants, and linear algebra operations.

Constants

use std::math;

let pi = math::PI;           // 3.141592653589793
let e = math::E;             // 2.718281828459045
let tau = math::TAU;         // 6.283185307179586
let sqrt2 = math::SQRT_2;   // 1.4142135623730951

Basic Functions

use std::math;

let a = math::abs(-42.0);       // 42.0
let s = math::sqrt(144.0);      // 12.0
let p = math::pow(2.0, 10.0);   // 1024.0
let l = math::log(math::E);     // 1.0
let l2 = math::log2(1024.0);    // 10.0
let l10 = math::log10(1000.0);  // 3.0

Trigonometry

use std::math;

let s = math::sin(math::PI / 2.0);    // 1.0
let c = math::cos(0.0);                // 1.0
let t = math::tan(math::PI / 4.0);    // 1.0

let asn = math::asin(1.0);            // PI/2
let ac = math::acos(0.0);             // PI/2
let at = math::atan(1.0);             // PI/4
let at2 = math::atan2(1.0, 1.0);     // PI/4

Rounding

use std::math;

let f = math::floor(3.7);    // 3.0
let c = math::ceil(3.2);     // 4.0
let r = math::round(3.5);    // 4.0
let t = math::trunc(3.9);    // 3.0

Min/Max

use std::math;

let mn = math::min(3.0, 7.0);    // 3.0
let mx = math::max(3.0, 7.0);    // 7.0
let cl = math::clamp(15.0, 0.0, 10.0);  // 10.0

Linear Algebra

use std::math::linear;

// Vector operations
let v1 = linear::Vector::new([1.0, 2.0, 3.0]);
let v2 = linear::Vector::new([4.0, 5.0, 6.0]);

let sum = v1.add(v2);
let dot = v1.dot(v2);           // 32.0
let norm = v1.norm();            // sqrt(14)
let scaled = v1.scale(2.0);

// Matrix operations
let m = linear::Matrix::identity(3);
let det = m.determinant();
let inv = m.inverse();
let product = m.multiply(m);

Complex Numbers

use std::math::complex::Complex;

let z1 = Complex::new(3.0, 4.0);   // 3 + 4i
let z2 = Complex::new(1.0, 2.0);   // 1 + 2i

let sum = z1.add(z2);               // 4 + 6i
let product = z1.mul(z2);           // -5 + 10i
let magnitude = z1.abs();           // 5.0
let conjugate = z1.conj();          // 3 - 4i

Statistics

use std::statistics;

let data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];

let mean = statistics::mean(data);         // 5.0
let median = statistics::median(data);     // 4.5
let stddev = statistics::std_dev(data);    // ~2.0
let variance = statistics::variance(data);

Random Numbers

use std::math::random;

let n = random::int(0, 100);       // random integer in [0, 100)
let f = random::float();            // random f64 in [0.0, 1.0)
let b = random::bool();             // random boolean