Joule Documentation
Welcome to the Joule programming language documentation (v1.2.0). Joule is developed by Open Interface Engineering, Inc.
For New Users
- Getting Started -- Install the compiler and write your first Joule program
- Language Tour -- Learn Joule's syntax and features through examples
Guides
- Energy System Guide -- Compile-time energy budgets: Joule's defining feature
- Compiler Reference -- CLI usage, flags, backends, and the compilation pipeline
- JIT Compilation -- Interactive development with --jit and --watch
- Polyglot Energy Analysis -- Measure energy in Python, JavaScript, and C code
- Accelerator Energy -- GPU and accelerator energy measurement across vendors
- TensorForge -- Energy-aware ML framework built on Joule
Language Reference
The formal specification of Joule's syntax and semantics.
- Types -- Primitives, compounds, generics, union types, type inference
- Expressions -- Operators, pipe operator, literals, control flow, closures
- Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
- Patterns -- Pattern matching: or patterns, range patterns, guard clauses
- Attributes -- Energy budgets, #[test], #[bench], thermal awareness
- Memory -- Ownership, borrowing, references, lifetimes
- Concurrency -- Async/await, spawn, channels, task groups, parallel for
- Energy -- Energy system specification with accelerator support
Standard Library Reference
Joule ships with 110+ batteries-included modules.
- Overview -- Index of all standard library modules
- String -- String type and operations
- Vec -- Dynamic arrays
- Option -- Optional values
- Result -- Error handling
- HashMap -- Key-value maps
- Primitives -- Numeric types, bool, char
- Collections -- All collection types
- I/O -- File and stream I/O
- Math -- Mathematical functions and linear algebra
Feedback
To report bugs, request features, or ask questions, visit joule-lang.org or open an issue on GitHub.
Getting Started with Joule
This guide walks you through installing the Joule compiler and writing your first program.
Current version: v1.2.0
Install
Quick Install (Recommended)
| Platform | Command |
|---|---|
| macOS / Linux | brew install openIE-dev/joule/joule |
| Windows | winget install OpenIE.Joule |
| Ubuntu / Debian | sudo apt install joule (after adding the repo) |
| Arch Linux | yay -S joule-bin |
| Nix | nix run github:openIE-dev/joule-lang |
| Snap | sudo snap install joule --classic |
| Any (curl) | curl -fsSL https://joule-lang.org/install.sh \| sh |
macOS
Homebrew (recommended):
brew install openIE-dev/joule/joule
Or download joule-macos-arm64.pkg (Apple Silicon) or joule-macos-x86_64.pkg (Intel) from the releases page:
sudo installer -pkg joule-macos-arm64.pkg -target /
Windows
Winget (recommended, built into Windows 11):
winget install OpenIE.Joule
Scoop:
scoop bucket add joule https://github.com/openIE-dev/scoop-joule
scoop install joule
Chocolatey:
choco install joule
Or download joule-windows-x86_64.msi or joule-windows-arm64.msi from the releases page. The MSI installer adds joulec to your PATH automatically.
APT (Ubuntu/Debian)
curl -fsSL https://openie-dev.github.io/joule-lang/KEY.gpg | sudo gpg --dearmor -o /usr/share/keyrings/joule.gpg
echo "deb [signed-by=/usr/share/keyrings/joule.gpg] https://openie-dev.github.io/joule-lang stable main" | sudo tee /etc/apt/sources.list.d/joule.list
sudo apt update && sudo apt install joule
Arch Linux (AUR)
yay -S joule-bin
Or with any AUR helper: paru -S joule-bin, trizen -S joule-bin.
Nix
# Run without installing
nix run github:openIE-dev/joule-lang
# Install into profile
nix profile install github:openIE-dev/joule-lang
Snap
sudo snap install joule --classic
Install Script
Universal one-line installer for macOS and Linux:
curl -fsSL https://joule-lang.org/install.sh | sh
From Source
git clone https://github.com/openIE-dev/joule-lang.git
cd joule-lang && cargo build --release
From C Source (Zero Dependencies)
Download joule-c-src-*.tar.gz from the releases page:
tar xzf joule-c-src-*.tar.gz && cd joule-c-src-*
make # or: cc -O2 -o joulec output.c -lm
Verify
joulec --version
# joulec 1.2.0
Write Your First Program
Create a file called hello.joule:
pub fn main() {
let message = "Hello from Joule!";
println!("{}", message);
}
Compile and Run
joulec hello.joule -o hello
./hello
Output:
Hello from Joule!
Try JIT Mode
For interactive development, skip the compile step entirely:
joulec --jit hello.joule
This JIT-compiles and runs your program in memory using the Cranelift backend. No intermediate files are produced.
For an even faster workflow, use watch mode. It monitors your source file and re-runs automatically when you save:
joulec --watch hello.joule
JIT mode requires the jit feature flag. See JIT Compilation for details.
Add an Energy Budget
Joule's defining feature is compile-time energy budget verification. Annotate your function with an energy allowance:
#[energy_budget(max_joules = 0.0001)]
pub fn main() {
let x = 42;
let y = 58;
let result = x + y;
println!("{}", result);
}
Compile with energy checking:
joulec hello.joule -o hello --energy-check
The compiler estimates the energy cost of your function at compile time. If it exceeds the declared budget, compilation fails with a diagnostic showing the estimated vs. allowed energy.
Measure Energy in Existing Code
Already have Python or JavaScript code? Joule can measure its energy consumption without rewriting it:
# Measure energy in a Python script
joulec --lift-run python script.py
# Measure energy in a JavaScript file
joulec --lift-run js app.js
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it. See Polyglot Energy Analysis for details.
Batteries Included
Joule ships with 110+ standard library modules. No package manager needed for common tasks:
use std::math;
use std::collections::HashMap;
use std::io::File;
use std::net::TcpStream;
use std::crypto::sha256;
See the Standard Library Reference for the complete list.
Feedback
Joule is developed and maintained by Open Interface Engineering, Inc. We welcome bug reports, feature requests, and questions via joule-lang.org or GitHub Issues.
Next Steps
- Language Tour -- Learn Joule's syntax and features through examples
- Energy System Guide -- Deep dive into energy budgets
- Compiler Reference -- All CLI flags and options
- JIT Compilation -- Interactive development workflow
- Standard Library -- Available types and modules
Language Tour
A quick introduction to Joule's syntax and features through examples.
Variables
Variables are immutable by default. Use mut for mutable bindings.
let x = 42; // immutable, type inferred as i32
let name: String = "Jo"; // explicit type annotation
let mut count = 0; // mutable
count = count + 1;
Primitive Types
let a: i8 = -128; // signed integers: i8, i16, i32, i64, isize
let b: u32 = 42; // unsigned integers: u8, u16, u32, u64, usize
let c: f64 = 3.14159; // floats: f16, bf16, f32, f64
let d: bool = true; // boolean
let e: char = 'A'; // unicode character
let s: String = "hello"; // string
let h: f16 = 0.5f16; // half-precision (ML inference, signal processing)
let g: bf16 = 0.001bf16; // brain float (ML training)
Functions
// Basic function with parameters and return type
fn add(a: i32, b: i32) -> i32 {
a + b // last expression is the return value
}
// Public function (visible outside the module)
pub fn greet(name: String) {
println!("Hello, {}", name);
}
// Mutable self parameter for methods that modify state
fn advance(mut self) -> Token {
let token = self.peek();
self.pos = self.pos + 1;
token
}
Structs
pub struct Point {
pub x: f64,
pub y: f64,
}
// Construction
let p = Point { x: 3.0, y: 4.0 };
// Field access
let dist = p.x * p.x + p.y * p.y;
Impl Blocks
Methods are defined in impl blocks, separate from the struct definition.
impl Point {
// Associated function (constructor)
pub fn new(x: f64, y: f64) -> Point {
Point { x, y }
}
// Method on self
pub fn distance(self) -> f64 {
(self.x * self.x + self.y * self.y).sqrt()
}
// Mutable method
pub fn translate(mut self, dx: f64, dy: f64) {
self.x = self.x + dx;
self.y = self.y + dy;
}
}
let p = Point::new(3.0, 4.0);
let d = p.distance();
Enums
Enums can hold data in each variant, making them sum types (tagged unions).
pub enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Triangle { base: f64, height: f64 },
}
let s = Shape::Circle { radius: 5.0 };
Pattern Matching
match is exhaustive -- the compiler ensures you handle every variant.
fn area(shape: Shape) -> f64 {
match shape {
Shape::Circle { radius } => {
3.14159 * radius * radius
}
Shape::Rectangle { width, height } => {
width * height
}
Shape::Triangle { base, height } => {
0.5 * base * height
}
}
}
Match with a wildcard:
match token.kind {
TokenKind::Fn => parse_function(),
TokenKind::Struct => parse_struct(),
TokenKind::Enum => parse_enum(),
_ => parse_expression(),
}
Or Patterns
Match multiple alternatives in a single arm:
match x {
1 | 2 | 3 => "small",
4 | 5 | 6 => "medium",
_ => "large",
}
Range Patterns
Match a range of values:
match score {
0..=59 => "F",
60..=69 => "D",
70..=79 => "C",
80..=89 => "B",
90..=100 => "A",
_ => "invalid",
}
Guard Clauses
Add conditions to match arms:
match value {
x if x > 0 => "positive",
x if x < 0 => "negative",
_ => "zero",
}
Control Flow
// if-else (these are expressions -- they return values)
let max = if a > b { a } else { b };
// while loop
let mut i = 0;
while i < 10 {
i = i + 1;
}
// for loop
for item in items {
process(item);
}
// loop (infinite, break to exit)
loop {
if done() {
break;
}
}
Option and Result
Option<T> represents a value that may or may not exist. Result<T, E> represents an operation that can succeed or fail.
// Option
fn find(items: Vec<i32>, target: i32) -> Option<usize> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(i);
}
i = i + 1;
}
Option::None
}
// Handling an Option
match find(items, 42) {
Option::Some(index) => println!("Found at {}", index),
Option::None => println!("Not found"),
}
// Result
fn parse_number(s: String) -> Result<i32, String> {
// ...
Result::Ok(42)
}
match parse_number(input) {
Result::Ok(n) => println!("Got: {}", n),
Result::Err(e) => println!("Error: {}", e),
}
Generics
Functions and types can be parameterized over types.
pub struct Pair<A, B> {
pub first: A,
pub second: B,
}
fn swap<A, B>(pair: Pair<A, B>) -> Pair<B, A> {
Pair { first: pair.second, second: pair.first }
}
Traits
Traits define shared behavior. Types implement traits with impl.
pub trait Display {
fn to_string(self) -> String;
}
impl Display for Point {
fn to_string(self) -> String {
"(" + self.x.to_string() + ", " + self.y.to_string() + ")"
}
}
Collections
// Vec -- dynamic array
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
let first = v[0]; // indexing
let len = v.len(); // length
// HashMap -- key-value store
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("alice", 42);
map.insert("bob", 17);
Closures
Anonymous functions that can capture variables from their enclosing scope:
let double = |x: i32| -> i32 { x * 2 };
let result = double(21); // 42
// Closures capture variables
let multiplier = 3;
let multiply = |x: i32| -> i32 { x * multiplier };
Range-Based For Loops
Iterate over numeric ranges with ..:
// Exclusive range: 0, 1, 2, ..., 9
for i in 0..10 {
println!("{}", i);
}
// Use in accumulation
let mut sum = 0;
for i in 1..101 {
sum = sum + i;
}
// sum = 5050
Iterator Methods
Vec supports functional-style iterator methods:
let numbers = vec![1, 2, 3, 4, 5];
// Transform elements
let doubled = numbers.map(|x: i32| -> i32 { x * 2 });
// Filter elements
let evens = numbers.filter(|x: i32| -> bool { x % 2 == 0 });
// Check conditions
let has_negative = numbers.any(|x: i32| -> bool { x < 0 });
let all_positive = numbers.all(|x: i32| -> bool { x > 0 });
// Reduce to single value
let sum = numbers.fold(0, |acc: i32, x: i32| -> i32 { acc + x });
Option and Result Methods
Rich combinator APIs for safe value handling:
let opt: Option<i32> = Option::Some(42);
// Query
let is_there = opt.is_some(); // true
let is_empty = opt.is_none(); // false
// Extract with default
let val = opt.unwrap_or(0); // 42
// Transform
let doubled = opt.map(|x: i32| -> i32 { x * 2 }); // Some(84)
// Chain operations
let result = opt.and_then(|x: i32| -> Option<i32> {
if x > 0 { Option::Some(x * 10) } else { Option::None }
});
Pipe Operator
The pipe operator |> passes the result of the left expression as the first argument to the right function. It makes data transformation pipelines readable:
// Without pipe -- deeply nested calls
let result = to_uppercase(trim(read_file("data.txt")));
// With pipe -- reads left to right
let result = read_file("data.txt")
|> trim
|> to_uppercase;
// Works with closures and multi-argument functions
let processed = data
|> filter(|x| x > 0)
|> map(|x| x * 2)
|> fold(0, |acc, x| acc + x);
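The same left-to-right chaining can be emulated in any language with first-class functions. As an illustration of the semantics only (this is Python, not Joule), the single-argument form of |> behaves like a fold over a list of stages:

```python
from functools import reduce

# Python analogy for Joule's |> operator: feed a value through a chain
# of single-argument stages, left to right. Illustration only, not Joule.
def pipe(value, *stages):
    return reduce(lambda acc, stage: stage(acc), stages, value)

# read_file("data.txt") |> trim |> to_uppercase, approximated with strings:
result = pipe("  data.txt  ", str.strip, str.upper)
print(result)  # DATA.TXT
```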
Union Types
Union types allow a value to be one of several types, checked at compile time:
type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;
fn process(value: JsonValue) {
match value {
x: i64 => println!("integer: {}", x),
x: f64 => println!("float: {}", x),
s: String => println!("string: {}", s),
b: bool => println!("bool: {}", b),
arr: Vec<JsonValue> => println!("array of {}", arr.len()),
}
}
Algebraic Effects
Effects declare the side effects a function may perform, tracked by the type system:
effect Log {
fn log(message: String);
}
effect Fail {
fn fail(reason: String) -> !;
}
fn process(data: Vec<u8>) -> Result<Output, Error> with Log, Fail {
Log::log("Processing started");
if data.is_empty() {
Fail::fail("empty input");
}
// ...
}
Effects are handled at the call site:
handle process(data) {
Log::log(msg) => {
println!("[LOG] {}", msg);
resume;
}
Fail::fail(reason) => {
Result::Err(Error::new(reason))
}
}
Supervisors
Supervisors manage the lifecycle of concurrent tasks with automatic restart strategies:
use std::concurrency::Supervisor;
let sup = Supervisor::new(RestartStrategy::OneForOne);
sup.spawn("worker-1", || {
// If this task panics, only this task is restarted
process_queue()
});
sup.spawn("worker-2", || {
process_events()
});
sup.run();
Parallel For
Parallel iteration over collections with automatic work distribution:
// Parallel map over a vector
let results = parallel for item in data {
heavy_computation(item)
};
// With explicit chunk size
let processed = parallel(chunk_size: 1024) for row in matrix {
transform(row)
};
The compiler tracks energy consumption across all parallel branches and sums them for the total budget.
Computation Builders
Computation builders provide a monadic syntax for composing complex operations:
let result = async {
let data = fetch(url).await;
let parsed = parse(data).await;
transform(parsed)
};
let query = query {
from users
where age > 18
select name, email
order_by name
};
Const Functions
Functions that can be evaluated at compile time:
const fn factorial(n: i32) -> i32 {
if n <= 1 { 1 } else { n * factorial(n - 1) }
}
// Evaluated at compile time
const FACT_10: i32 = factorial(10);
Comptime Blocks
Execute arbitrary code at compile time:
comptime {
let lookup = generate_lookup_table(256);
// lookup is available as a constant in runtime code
}
Modules and Imports
// Import specific items
use crate::ast::{File, AstItem};
use std::collections::HashMap;
// Module declarations (loads from separate file)
mod lexer; // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;
// Public module re-export
pub mod utils;
// Inline module
mod helpers {
pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
if x < lo { lo } else if x > hi { hi } else { x }
}
}
// Glob import from stdlib
use std::math::*;
Async/Await with Channels
Asynchronous programming with channels for communication:
use std::concurrency::{spawn, channel};
async fn fetch_and_process(url: String) -> Result<Data, Error> {
let response = http::get(url).await?;
let data = parse(response.body()).await?;
Result::Ok(data)
}
// Bounded channels for backpressure
let (tx, rx) = channel(capacity: 100);
spawn(|| {
for item in source {
tx.send(item);
}
});
while let Option::Some(item) = rx.recv() {
process(item);
}
Smart Pointers
Manage shared ownership and heap allocation:
// Box — heap allocation, required for recursive types
let b = Box::new(42);
// Rc — single-threaded shared ownership
let shared = Rc::new(vec![1, 2, 3]);
let copy = shared.clone(); // reference count +1
// Arc — thread-safe shared ownership
let data = Arc::new(vec![1, 2, 3]);
spawn(|| { let local = data.clone(); });
// Cow — clone-on-write (free reads, allocate on mutation)
let text = Cow::borrowed("hello");
See Smart Pointers for full documentation.
Const-Generic Types
Types with compile-time integer parameters:
// SmallVec — inline buffer, heap only when overflow
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42); // inline — no heap allocation
// Simd — portable SIMD vectors
let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::splat(2.0);
let c = a.mul(&b); // [2.0, 4.0, 6.0, 8.0] — single instruction
// NDArray — multi-dimensional arrays
let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let val = mat[1, 2];
Box (Heap Allocation)
Box<T> puts data on the heap. Required for recursive types.
pub enum Expr {
Literal(i32),
Add {
left: Box<Expr>,
right: Box<Expr>,
},
}
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Energy Budgets
Joule's defining feature. Declare the maximum energy a function is allowed to consume:
#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
x + y
}
The compiler estimates energy consumption at compile time. If a function exceeds its budget, compilation fails.
Power and thermal budgets are also available:
#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn thermal_safe_compute(n: i32) -> i32 {
let result = n * n;
result + 1
}
Compile with --energy-check to enable verification:
joulec program.joule -o program --energy-check
See Energy System Guide for a deep dive.
Testing with Energy
Write tests that verify both correctness and energy consumption:
#[test]
fn test_sort_energy() {
let data = vec![5, 3, 1, 4, 2];
let sorted = sort(data);
assert_eq!(sorted, vec![1, 2, 3, 4, 5]);
}
#[bench]
fn bench_matrix_multiply() {
let a = Matrix::random(100, 100);
let b = Matrix::random(100, 100);
let _ = a.multiply(b);
}
Run with:
joulec program.joule --test # runs tests with energy reporting
joulec program.joule --bench # runs benchmarks with energy reporting
Built-in Macros
Joule provides built-in macros for common operations:
// Output
println!("Hello, {}!", name); // print with newline
print!("no newline"); // print without newline
// Formatting
let s = format!("{} + {} = {}", a, b, a + b);
// Collections
let nums = vec![1, 2, 3, 4, 5];
// Assertions (for testing)
assert!(x > 0);
assert_eq!(result, expected);
For FFI with C libraries, use extern declarations:
extern fn sqrt(x: f64) -> f64;
What's Next
- Energy System Guide -- Deep dive into energy budgets
- Compiler Reference -- CLI flags and options
- JIT Compilation -- Interactive development
- Polyglot Energy Analysis -- Measure energy in Python/JS/C
- Standard Library -- All 110+ modules
- Language Reference -- Formal specification
Energy System Guide
Joule's defining feature is compile-time energy budget verification. This guide explains how it works and how to use it.
Why Energy Budgets?
Computing consumes enormous amounts of energy, and most of it is invisible. Cloud providers report aggregate billing units. Industry benchmarks report averages. Nobody tells you what a single sort, a single allocation, or a single network call actually costs in joules.
Joule makes that cost visible. Every function can declare its energy budget, and the compiler enforces it at compile time.
Basic Usage
Annotate a function with #[energy_budget]:
#[energy_budget(max_joules = 0.0001)] // 100 microjoules
fn add(x: i32, y: i32) -> i32 {
x + y
}
Compile with energy checking enabled:
joulec program.joule -o program --energy-check
If the function's estimated energy exceeds the declared budget, compilation fails with a diagnostic:
error: energy budget exceeded in function 'process_data'
--> program.joule:15:1
|
15 | fn process_data(input: Vec<f64>) -> f64 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= estimated: 0.00035 J (confidence: 85%)
= budget: 0.00010 J
= exceeded by 250%
Budget Types
Energy Budget (Joules)
The primary budget type. Limits total energy consumption:
#[energy_budget(max_joules = 0.0005)]
fn fibonacci(n: i32) -> i32 {
// ...
}
Power Budget (Watts)
Limits average power draw. Useful for sustained workloads:
#[energy_budget(max_watts = 15.0)]
fn render_frame(scene: Scene) -> Image {
// ...
}
Thermal Budget (Temperature Delta)
Limits the temperature increase caused by the function. Prevents thermal throttling:
#[energy_budget(max_temp_delta = 5.0)] // max 5 degrees Celsius rise
fn heavy_compute(data: Vec<f64>) -> f64 {
// ...
}
Thermal-Aware Functions
The #[thermal_aware] attribute marks functions that should adapt to thermal conditions:
#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn adaptive_compute(n: i32) -> i32 {
let result = n * n;
result + 1
}
Combining Budgets
You can declare multiple budget constraints:
#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path(data: Vec<f64>) -> f64 {
// ...
}
How the Estimator Works
The compiler uses static analysis to estimate energy consumption without running your code. Here's what it considers:
Instruction Costs
Every operation has a calibrated energy cost in picojoules:
| Operation | Approximate Cost | Cycles |
|---|---|---|
| Integer add/sub | 0.05 pJ | 1 |
| Integer multiply | 0.35 pJ | 3 |
| Integer divide | 3.5 pJ | 10 |
| Float add/sub | 0.35 pJ | 3 |
| Float multiply | 0.35 pJ | 3 |
| Float divide | 3.5 pJ | 10 |
| Float sqrt | 5.25 pJ | 15 |
| L1 cache load | 0.5 pJ | 4 |
| L2 cache load | 3.0 pJ | 12 |
| L3 cache load | 10.0 pJ | 40 |
| DRAM load/store | 200.0 pJ | 200 |
| Branch (taken) | 0.1 pJ | 1 |
| Branch misprediction | 1.5 pJ | 15 |
| SIMD f32x8 multiply | 1.5 pJ | 3 |
| Half-precision (f16/bf16) op | 0.4 pJ | 1 |
| SmallVec inline push | 0.5 pJ | 1 |
| SmallVec heap spill | 45.0 pJ | ~100 |
| SIMD vector op (any width) | 2.0 pJ | 3 |
| Atomic read-modify-write | 8.0 pJ | 20 |
| Rc/Arc clone/drop | 3.0 pJ | 5 |
| Arena bump alloc | 1.0 pJ | 2 |
| Arena reset (free all) | 0.5 pJ | 1 |
| BitSet/BitVec word op | 0.3 pJ | 1 |
| Decimal (128-bit) arithmetic | 5.0 pJ | 15 |
| Deque push/pop | 2.0 pJ | 5 |
| Intern hash lookup | 10.0 pJ | 30 |
| Complex arithmetic | 1.6 pJ | 4 |
| Instant::now() (clock read) | 15.0 pJ | 50 |
| BTreeMap/BTreeSet traversal | 12.0 pJ | 40 |
Loop Analysis
For loops with known bounds, the estimator multiplies the loop body cost by the iteration count. For unbounded loops (while with runtime conditions), it uses a configurable default (100 iterations) and reduces the confidence score.
Branch Analysis
For if/else and match expressions, the estimator computes the cost of each branch and averages them, since it can't know which branch will execute at compile time. This reduces the confidence score.
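The two rules above can be sketched as a toy estimator. This is an illustrative model, not the compiler's implementation: the costs come from the instruction cost table above, while the tree shape is hypothetical.

```python
# Illustrative model of the static estimator's loop and branch rules.
# Costs (in picojoules) come from the instruction cost table above; the
# tree shape here is hypothetical, not Joule's real IR.
COST_PJ = {"int_add": 0.05, "branch_taken": 0.1}
DEFAULT_TRIP_COUNT = 100   # assumed iterations for unbounded loops

def estimate(node):
    """Return (energy_pj, confidence) for a tiny expression tree."""
    kind = node["kind"]
    if kind == "op":                           # straight-line instruction
        return COST_PJ[node["name"]], 1.0
    if kind == "loop":                         # body cost x trip count
        body_e, body_c = estimate(node["body"])
        trips = node.get("trips")              # None => unbounded
        if trips is None:
            return body_e * DEFAULT_TRIP_COUNT, body_c * 0.7
        return body_e * trips, body_c
    if kind == "branch":                       # average the two arms
        then_e, then_c = estimate(node["then"])
        else_e, else_c = estimate(node["else"])
        return (then_e + else_e) / 2, min(then_c, else_c) * 0.9
    raise ValueError(kind)

# Bounded loop of 10 integer adds: precise, full confidence
bounded = {"kind": "loop", "trips": 10,
           "body": {"kind": "op", "name": "int_add"}}
print(estimate(bounded))     # about 0.5 pJ at confidence 1.0

# Unbounded loop: falls back to 100 iterations, confidence drops
unbounded = {"kind": "loop", "trips": None,
             "body": {"kind": "op", "name": "int_add"}}
print(estimate(unbounded))   # about 5.0 pJ at confidence 0.7
```

Note how branching averages the arm costs while loops multiply the body cost, which is why unbounded loops dominate both the estimate and the confidence penalty.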
Confidence Score
Every estimate comes with a confidence score from 0.0 to 1.0:
- 1.0 -- Straight-line code, no loops or branches. Estimate is precise.
- 0.85-0.95 -- Code with branches. Estimate is an average.
- 0.5-0.85 -- Code with unbounded loops. Estimate depends on assumed iteration count.
- < 0.5 -- Complex code with nested unbounded loops. Estimate is rough.
The confidence score is shown in diagnostic output so you can judge the reliability of the estimate.
Power Estimation
Power (watts) is derived from energy and estimated execution time:
Power = Energy / Time
Time = Estimated Cycles / CPU Frequency (3.0 GHz reference)
Thermal Estimation
Temperature delta is derived from power using a simplified thermal model:
Delta_T = Power * Thermal_Resistance (0.4 K/W typical)
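Plugging concrete numbers into the two formulas makes the scale tangible. A quick worked example (Python, using the reference constants quoted above):

```python
# Worked example of the power and thermal formulas above, using the
# reference constants from the text (3.0 GHz CPU, 0.4 K/W).
CPU_FREQ_HZ = 3.0e9
THERMAL_RESISTANCE_K_PER_W = 0.4

def power_watts(energy_joules, estimated_cycles):
    time_s = estimated_cycles / CPU_FREQ_HZ   # Time = cycles / frequency
    return energy_joules / time_s             # Power = energy / time

def temp_delta_kelvin(power_w):
    return power_w * THERMAL_RESISTANCE_K_PER_W   # Delta_T = P * R_th

# A function estimated at 0.00035 J over 100,000 cycles:
p = power_watts(3.5e-4, 100_000)
dt = temp_delta_kelvin(p)
print(p, dt)   # roughly 10.5 W average draw and a 4.2 K rise
```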
Three-Tier Energy Measurement
Joule measures energy at three levels, providing increasing precision:
Tier 1: Static Estimation (Compile Time)
The compiler estimates energy from code structure alone, using the instruction cost model described above. This is available for all programs, everywhere, with zero runtime overhead.
- No hardware access required
- Works at compile time
- Confidence score indicates reliability
- Used for #[energy_budget] verification
Tier 2: CPU Performance Counters (Runtime)
On supported platforms, Joule reads hardware performance counters (RAPL on Intel/AMD) to measure actual CPU energy consumption during execution.
- Requires Linux with perf_event or macOS with powermetrics
- Per-function and per-scope measurements
- Joule-level precision (not just watt-hours)
Tier 3: Accelerator Energy (Runtime)
For GPU and accelerator workloads, Joule queries vendor-specific energy APIs. See Accelerator Energy Measurement for details.
- NVIDIA GPUs via NVML
- AMD GPUs via ROCm SMI
- Intel GPUs/accelerators via Level Zero
- Google TPUs via TPU runtime
- AWS Inferentia/Trainium via Neuron SDK
- Groq LPUs via HLML
- Cerebras and SambaNova via vendor APIs
JSON Output Mode
For programmatic consumption, set the environment variable JOULE_ENERGY_JSON=1:
JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c
This outputs energy reports as structured JSON:
{
"functions": [
{
"name": "process_data",
"file": "program.joule",
"line": 15,
"energy_joules": 0.00035,
"power_watts": 12.5,
"confidence": 0.85,
"budget_joules": 0.0001,
"status": "exceeded",
"breakdown": {
"compute_pj": 280000,
"memory_pj": 70000,
"branch_pj": 500
}
}
],
"total_energy_joules": 0.00042,
"device": "cpu"
}
When accelerator energy is available, the JSON includes per-device breakdowns:
{
"devices": [
{ "type": "cpu", "energy_joules": 0.00042 },
{ "type": "gpu", "vendor": "nvidia", "energy_joules": 0.0031, "api": "nvml" }
],
"total_energy_joules": 0.00352
}
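A typical consumer is a CI gate that fails the build when any function exceeds its budget. A minimal sketch, assuming only the fields shown in the sample report above:

```python
import json

# Sketch of a CI-side consumer for JOULE_ENERGY_JSON=1 reports.
# Only fields shown in the sample report above are assumed to exist.
report = json.loads("""
{
  "functions": [
    {"name": "process_data", "file": "program.joule", "line": 15,
     "energy_joules": 0.00035, "confidence": 0.85,
     "budget_joules": 0.0001, "status": "exceeded"}
  ],
  "total_energy_joules": 0.00042,
  "device": "cpu"
}
""")

over_budget = [f for f in report["functions"] if f["status"] == "exceeded"]
for f in over_budget:
    ratio = f["energy_joules"] / f["budget_joules"]
    print(f"{f['name']} ({f['file']}:{f['line']}): "
          f"{ratio:.1f}x over budget (confidence {f['confidence']:.0%})")

exit_code = 1 if over_budget else 0   # nonzero fails the CI job
```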
Practical Guidelines
Start Generous, Then Tighten
Begin with a generous budget, measure, then reduce:
// Start here
#[energy_budget(max_joules = 0.01)]
// After profiling, tighten
#[energy_budget(max_joules = 0.001)]
// Production target
#[energy_budget(max_joules = 0.0005)]
Budget Hot Loops Carefully
The estimator assumes 100 iterations for unbounded loops. If your loop runs 10,000 times, the estimate will be 100x too low. Consider refactoring into bounded loops or adjusting your budget accordingly.
Use Confidence Scores
If the compiler reports low confidence (< 0.7), the estimate may be significantly off. Review the function for unbounded loops and complex branching.
Transitive Energy Budgets
Energy budgets are enforced across call boundaries. A function calling another budgeted function includes the callee's energy in its own estimate:
#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }
#[energy_budget(max_joules = 0.0005)]
fn main_work() -> i32 {
// The compiler accounts for helper's energy within main_work's budget
helper() + helper()
}
Profile-Guided Refinement
For the most accurate energy estimates, use profile-guided optimization:
# Phase 1: instrument and run
joulec program.joule --profile-generate -o program
./program
# Phase 2: compile with profile data
joulec program.joule --profile-use profile.json --energy-check -o program
The profile data provides actual loop trip counts and branch frequencies, dramatically improving estimate accuracy.
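The trip-count refinement can be sketched as follows. The profile schema here (block name to execution count) is hypothetical; the reference only states that trip counts are derived from back-edge counter ratios.

```python
# Sketch of profile-guided refinement of a loop's trip count. The
# profile schema (block name -> execution count) is hypothetical; the
# text says only that trip counts come from back-edge counter ratios.
DEFAULT_TRIP_COUNT = 100   # static default for unbounded loops

def refined_trips(profile, header_block, backedge_block):
    """Average trips per entry = back-edge count / header entry count."""
    entries = profile.get(header_block, 0)
    if entries == 0:
        return DEFAULT_TRIP_COUNT          # no data: keep static default
    return profile[backedge_block] / entries

# Loop entered 3 times, back edge taken 30,000 times:
profile = {"loop_header": 3, "loop_backedge": 30_000}
trips = refined_trips(profile, "loop_header", "loop_backedge")
print(trips)   # 10000.0 -- versus the static default of 100
```

This is exactly the case the "Budget Hot Loops Carefully" guideline warns about: without the profile, the static estimate would be off by two orders of magnitude.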
Feedback
Questions about the energy system? Visit joule-lang.org or open an issue on GitHub.
Compiler Reference
Usage
joulec <INPUT> [OPTIONS]
Where <INPUT> is a .joule source file (or a foreign source file when using --lift-run).
Options
| Flag | Description | Default |
|---|---|---|
| -o <FILE> | Output file path | Derived from input |
| --emit <TYPE> | Emit intermediate representation: ast, hir, mir, llvm-ir, c, eir | (compile to binary) |
| --backend <BACKEND> | Code generation backend: cranelift, llvm, mlir, auto | cranelift |
| --target <TARGET> | Target platform: cpu, cuda, metal, rocm, hybrid | cpu |
| -O <LEVEL> | Optimization level: 0, 1, 2, 3 | 0 |
| --energy-check | Enable compile-time energy budget verification | Off |
| --gpu | Enable GPU code generation (uses MLIR backend) | Off |
| --jit | JIT-compile and run immediately (requires --features jit) | Off |
| --watch | Watch source file and re-run on changes (implies --jit) | Off |
| --lift <LANG> | Lift foreign code for energy analysis: python, js, c | (none) |
| --lift-run <LANG> <FILE> | Lift and execute foreign code with energy tracking | (none) |
| --energy-optimize | Apply energy optimization passes to lifted code | Off |
| --egraph-optimize | Enable e-graph algebraic optimization (30+ rewrite rules) | Off |
| --profile-generate | Instrument code for profile-guided optimization | Off |
| --profile-use <FILE> | Apply PGO profile data from a previous run | (none) |
| --incremental | Enable incremental compilation (FNV-1a fingerprinting) | Off |
| --test | Build and run #[test] functions with energy reporting | Off |
| --bench | Build and run #[bench] functions with energy reporting | Off |
| --debug | Debug build profile (no optimizations, debug info) | Default |
| --release | Release build profile (-O2, strip debug info) | Off |
| --stdlib-path <DIR> | Path to the Joule standard library | Built-in |
| -v, --verbose | Verbose compiler output | Off |
Environment Variables
| Variable | Description |
|---|---|
| JOULE_ENERGY_JSON=1 | Output energy reports as JSON instead of human-readable text |
Examples
Basic Compilation
# Compile to executable via C backend
joulec program.joule --emit c -o program.c
cc -o program program.c
# Compile with energy checking
joulec program.joule --emit c -o program.c --energy-check
# Release build with optimizations
joulec program.joule --release -o program
Emit Intermediate Representations
# Emit the AST (for debugging)
joulec program.joule --emit ast
# Emit HIR (typed intermediate representation)
joulec program.joule --emit hir
# Emit MIR (mid-level IR, after lowering)
joulec program.joule --emit mir
# Emit EIR (Energy IR with picojoule cost annotations)
joulec program.joule --emit eir
JIT Compilation
# JIT-compile and run immediately
joulec --jit program.joule
# Watch mode: re-compile and re-run on file changes
joulec --watch program.joule
See JIT Compilation for details.
Polyglot Energy Analysis
# Lift and run Python with energy measurement
joulec --lift-run python script.py
# Lift and run JavaScript with energy measurement
joulec --lift-run js app.js
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
See Polyglot Energy Analysis for details.
Advanced Optimization
# E-graph algebraic optimization
joulec program.joule --emit c --egraph-optimize -o program.c
# Profile-guided optimization (two-phase)
joulec program.joule --profile-generate -o program
./program # generates profile data
joulec program.joule --profile-use profile.json -o program_optimized
# Incremental compilation
joulec program.joule --incremental -o program
Testing and Benchmarking
# Run tests with energy reporting
joulec program.joule --test
# Run benchmarks with energy reporting
joulec program.joule --bench
JSON Energy Output
# Get energy reports as JSON
JOULE_ENERGY_JSON=1 joulec program.joule --emit c -o program.c --energy-check
Compilation Pipeline
Source code flows through these stages:
Source (.joule)
|
v
Lexer ---------- Tokens
|
v
Parser --------- AST (Abstract Syntax Tree)
|
v
Type Checker ---- HIR (High-level IR) + Type Information
|
+-- Energy Budget Checker (if --energy-check)
|
v
EIR Lowering ---- EIR (Energy IR) [if --egraph-optimize or --emit eir]
|
+-- E-Graph Optimizer (30+ algebraic rewrite rules)
|
v
MIR Lowering ---- MIR (Mid-level IR)
|
v
Borrow Checker -- Ownership/lifetime verification
|
v
Code Generation
+-- C Backend ---------- C source code
+-- Cranelift Backend --- Native binary (fast compilation)
+-- Cranelift JIT ------- In-memory execution (--jit/--watch)
+-- LLVM Backend -------- Native binary (optimized)
+-- MLIR Backend -------- GPU/accelerator code
+-- WASM Backend -------- WebAssembly
Incremental Compilation
When --incremental is enabled, the compiler:
- Fingerprints each source file using FNV-1a hashing
- Builds a dependency graph between modules
- On recompilation, only reprocesses files whose fingerprint changed (or whose dependencies changed)
- Caches query results to disk as JSON for persistence across sessions
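The fingerprint check is plain FNV-1a. A minimal Python sketch of the hashing step (the constants are the standard 64-bit FNV parameters; the dependency graph and JSON cache are not shown):

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = 0xcbf29ce484222325                            # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, wrapped to 64 bits
    return h

# A one-character edit changes the fingerprint, so the module
# (and anything that depends on it) is marked for recompilation.
before = fnv1a_64(b"fn main() {}")
after = fnv1a_64(b"fn main() { }")
```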
Profile-Guided Optimization
PGO is a two-phase process:
- Phase 1 (--profile-generate): The compiler instruments the C output with basic-block counters. Running the instrumented binary produces a JSON profile with execution frequencies.
- Phase 2 (--profile-use): The compiler reads the profile and refines EIR energy cost estimates using actual execution frequencies. Loop trip counts are derived from back-edge counter ratios. Hot paths get more accurate energy budgets.
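The back-edge ratio mentioned in Phase 2 reduces to a division; a hedged sketch (the counter names are illustrative, not joulec's actual profile schema):

```python
def estimated_trip_count(back_edge_count: int, entry_count: int) -> float:
    """Average iterations per loop entry: back-edge executions divided
    by the number of times control entered the loop."""
    if entry_count == 0:
        return 0.0
    return back_edge_count / entry_count

# A loop whose back edge ran 10,000 times across 100 entries
# averages ~100 iterations per entry, refining its energy estimate.
trips = estimated_trip_count(10_000, 100)
```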
Backends
C Backend (--emit c)
Generates portable C source code. This is the primary backend and the one used for the bootstrap compiler. The generated C compiles with any standard C compiler (gcc, clang, cc).
joulec program.joule --emit c -o program.c
cc -o program program.c
Features:
- Freestanding mode for embedded targets (jrt_* runtime abstraction)
- #line directives for source-level debugging
- Energy instrumentation for PGO
Cranelift Backend
Fast compilation, suitable for development. Uses the Cranelift code generator. Enable with --features cranelift.
Cranelift JIT Backend
In-memory compilation and execution. No intermediate files. Enable with --features jit.
joulec --jit program.joule
LLVM Backend
Optimized compilation for release builds. Requires LLVM 16+. Enable with --features llvm.
MLIR Backend
Heterogeneous computing with GPU/accelerator support. Targets CUDA, Metal, and ROCm. Enable with --gpu.
WASM Backend
WebAssembly output for browser and edge deployment.
File Extension
Joule source files must use the .joule extension. The compiler rejects all other extensions (except when using --lift-run, which accepts .py, .js, and .c).
Energy Checking
When --energy-check is passed, the compiler performs static analysis on every function with an #[energy_budget] attribute. Functions that exceed their declared budget produce a compilation error.
See Energy System Guide for details.
Diagnostics
The compiler produces structured error messages with source locations:
error[E0001]: mismatched types
--> program.joule:10:15
|
10 | let x: i32 = "hello";
| ^^^^^^^ expected i32, found String
Warnings are shown for potential issues but don't prevent compilation (unless you've set strict mode).
JIT Compilation
Joule supports just-in-time compilation for interactive development. Instead of producing an executable file, the compiler compiles your code in memory and runs it immediately.
Quick Start
# JIT-compile and run
joulec --jit program.joule
# Watch mode: re-compile on file changes
joulec --watch program.joule
Requirements
JIT mode requires the jit feature flag, which enables the Cranelift JIT backend:
# Build joulec with JIT support
cargo build --release -p joulec --features jit
The feature chain is: jit -> cranelift -> joule-codegen-cranelift + joule-codegen + notify.
How It Works
JIT Mode (--jit)
- Source code is parsed, type-checked, and lowered to MIR (the same pipeline as normal compilation)
- MIR is translated to Cranelift IR
- Cranelift compiles the IR to native machine code in memory
- The main() function is called directly via a function pointer
- The program runs and exits
No intermediate files are produced. No C compiler is invoked. Compilation and execution happen in a single process.
Watch Mode (--watch)
Watch mode extends JIT with file monitoring:
- The source file is JIT-compiled and run (same as --jit)
- The notify crate monitors the source file for changes
- When the file is saved, a fresh JIT module is created and the program re-runs
- A 50ms debounce prevents multiple re-runs from editor save-rename sequences
Each watch cycle creates a fresh JITModule because Cranelift's JIT module cannot redefine functions. This ensures clean state on every re-run.
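The debounce is simple to picture: events closer together than the window collapse into one rebuild. An illustrative Python sketch (the real implementation sits on the notify crate's event stream):

```python
def debounce(event_times_ms, window_ms=50):
    """Keep an event only if at least window_ms elapsed since the
    last kept event; a burst of saves triggers a single rebuild."""
    kept, last = [], None
    for t in sorted(event_times_ms):
        if last is None or t - last >= window_ms:
            kept.append(t)
            last = t
    return kept

# An editor save-rename sequence fires three events within 12 ms;
# they collapse to one rebuild, plus one for the later save.
runs = debounce([1000, 1005, 1012, 1200])
```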
Architecture
FunctionTranslator
The FunctionTranslator<'a, M: Module> is generic over the module type:
| Module Type | Mode | Output |
|---|---|---|
| ObjectModule | AOT compilation | Object file (.o) |
| JITModule | JIT compilation | In-memory executable code |
This means the same translation logic handles both AOT and JIT -- no code duplication.
Runtime Symbols
JIT mode provides runtime symbols that replace the C runtime's functions:
| Symbol | Purpose |
|---|---|
| joule_jit_println | Print a string with newline |
| joule_jit_print | Print a string without newline |
| joule_jit_panic | Panic with a message |
| malloc | Memory allocation (libc) |
| free | Memory deallocation (libc) |
| memcpy | Memory copy (libc) |
These symbols are registered with the JITModule before compilation so that generated code can call them.
PIC Mode
JIT compilation disables position-independent code (PIC = false), since the code runs at a known memory location. AOT compilation enables PIC (PIC = true) for shared-library compatibility.
Energy Tracking
JIT mode includes full energy tracking. Energy consumed during execution is measured and reported:
$ joulec --jit program.joule
Hello from JIT!
Energy consumed: 0.000123 J
Energy budgets declared with #[energy_budget] are checked at compile time, before JIT execution begins. If a budget is violated, compilation fails and the program does not run.
Limitations
- No persistent output: JIT mode does not produce an executable file. For deployment, use the C backend or AOT Cranelift compilation.
- Single-file: JIT mode currently compiles a single source file. Multi-file projects should use mod declarations within the entry file.
- Feature gate: JIT support is behind --features jit to keep the default binary small. The notify dependency is only pulled in when JIT is enabled.
Use Cases
Rapid Prototyping
JIT mode eliminates the compile-link-run cycle:
# Edit, save, see results instantly
joulec --watch prototype.joule
Energy Experimentation
Try different algorithms and immediately see their energy impact:
// Try bubble sort
#[energy_budget(max_joules = 0.001)]
fn sort_experiment(data: Vec<i32>) -> Vec<i32> {
    bubble_sort(data)
}
joulec --jit experiment.joule
# Change to quicksort, save, see new energy reading
Interactive Testing
Run tests without a full build:
joulec --jit --test tests.joule
Comparison with Other Modes
| Mode | Command | Speed | Output | Use Case |
|---|---|---|---|---|
| JIT | --jit | Fastest | None (runs in memory) | Development |
| Watch | --watch | Fast (re-runs on save) | None | Interactive development |
| C Backend | --emit c | Moderate | .c file | Deployment, bootstrap |
| Cranelift AOT | (default) | Fast | Binary | Development builds |
| LLVM | --features llvm | Slow | Optimized binary | Release builds |
Polyglot Energy Analysis
Joule can measure and optimize the energy consumption of code written in other languages. The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it with full energy tracking.
Quick Start
# Measure energy in a Python script
joulec --lift-run python script.py
# Measure energy in a JavaScript file
joulec --lift-run js app.js
# Measure energy in C code
joulec --lift-run c program.c
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
# Generate JSON energy report
joulec --lift python script.py --energy-report report.json
# Set energy budget (exit code 1 if exceeded)
joulec --lift python script.py --energy-budget 100nJ
How It Works
The polyglot pipeline has four stages:
- Parse: The source file is parsed by a language-specific parser (Python, JavaScript, or C) into Joule's LiftedModule representation.
- Lower: The lifted AST is lowered to MIR (Mid-level IR), the same representation used for native Joule code. Variables, functions, classes, and control flow are all mapped to MIR constructs.
- Optimize (optional): When --energy-optimize is passed, four energy optimization passes are applied to the MIR before execution.
- Execute: The MIR is JIT-compiled via the Cranelift backend and executed in memory. Energy consumption is tracked throughout execution and reported at the end.
Supported Languages
Python
Comprehensive support for Python syntax and semantics:
| Feature | Status |
|---|---|
| Functions, closures, lambdas | Supported |
| Classes (single and multiple inheritance) | Supported |
| List/dict/set comprehensions | Supported |
| f-strings | Supported |
| Ternary expressions | Supported |
| enumerate/zip | Supported |
| match/case (Python 3.10+) | Supported |
| Walrus operator (:=) | Supported |
| try/except/finally | Supported (guard patterns) |
| Slicing with step | Supported |
| Default arguments | Supported |
| *args, **kwargs | Supported |
| Generator expressions | Supported |
| String methods (30+) | Supported |
| List methods (20+) | Supported |
| Dict methods (15+) | Supported |
| Math module | Supported |
| Print with end= | Supported |
| True division | Supported |
| BigInt overflow handling | Supported |
JavaScript
Comprehensive support for JavaScript syntax and semantics:
| Feature | Status |
|---|---|
| Functions, arrow functions | Supported |
| Classes (single inheritance) | Supported |
| Template literals | Supported |
| Destructuring | Supported |
| Spread operator | Supported |
| switch/case | Supported |
| for-in/for-of | Supported |
| do-while | Supported |
| Bitwise operators | Supported |
| typeof | Supported |
| Nullish coalescing (??) | Supported |
| Optional chaining (?.) | Supported |
| Array methods (20+) | Supported |
| String methods (15+) | Supported |
| Object methods | Supported |
| Math object | Supported |
| console.log | Supported |
| this keyword | Supported |
C
Basic support for C code:
| Feature | Status |
|---|---|
| Functions | Supported |
| Basic types (int, float, double, char) | Supported |
| Arrays | Supported |
| Pointers | Supported |
| Control flow (if, while, for) | Supported |
| stdio (printf, scanf) | Supported |
| math.h functions | Supported |
TypeScript
TypeScript types are erased before analysis — the energy profile is identical to JavaScript. See the TypeScript Guide for details.
| Feature | Status |
|---|---|
| Everything in JavaScript | Supported |
| Type annotations | Stripped |
| Interfaces, type aliases, generics | Stripped |
| Access modifiers (public/private) | Stripped |
| Enums (simple) | Converted to constants |
Go
| Feature | Status |
|---|---|
| Functions, closures, variadic | Supported |
| for, for range, if/else, switch | Supported |
| Slices, maps, structs, methods | Supported |
| Goroutines (go) | Supported (sequential analysis) |
| Channels (chan, <-) | Supported |
| defer | Supported |
| Multiple return values | Supported |
| fmt, math, strings, strconv | Supported |
Rust
| Feature | Status |
|---|---|
| Functions, closures, impl blocks | Supported |
| for/while/loop, if/else, match | Supported |
| let/let mut, ownership annotations | Supported |
| Structs, enums, Option, Result | Supported |
| Vec, HashMap, String, Box | Supported |
| Iterator chains (.map/.filter/.fold) | Supported |
| println!, format!, vec! | Supported |
| Traits (signatures only) | Supported |
Energy Recommendations
When analyzing code, Joule detects common energy anti-patterns and suggests fixes. Categories include:
- ALGORITHM -- Nested loops where a hash set would be O(1)
- ALLOCATION -- Heap allocation inside hot loops
- REDUNDANCY -- Recomputed values that could be hoisted
- DATA STRUCTURE -- Linear search where a set/map is more efficient
- LOOP -- Missing early exits, unbounded iteration
- STRING -- String concatenation in loops (O(n^2))
- MEMORY -- Cache-unfriendly access patterns
- PRECISION -- Float arithmetic where integer suffices
See the per-language guides for language-specific examples of each pattern.
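A toy version of one such detector, using Python's ast module to flag the STRING pattern (augmented assignment inside a loop). Joule's real analyzer is type-aware and works on the lifted representation; this sketch is purely syntactic:

```python
import ast

def find_augassign_in_loops(source: str):
    """Flag `x += ...` inside a loop as a potential O(n^2)
    concatenation site; returns the offending line numbers."""
    tree = ast.parse(source)
    hits = []
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Add):
                    hits.append(node.lineno)
    return hits

code = '''
result = ""
for word in words:
    result += word + " "
'''
flagged = find_augassign_in_loops(code)  # line of the += inside the loop
```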
Runtime System
The lift-run runtime provides 100+ shim functions that bridge language-specific operations to native code:
String Operations
str_new, str_concat, str_len, str_print, str_from_int, str_from_float, str_eq, str_index, str_slice, str_contains, str_mul, str_cmp, str_upper, str_lower, str_trim, str_split, str_replace, str_starts_with, str_ends_with, str_index_of, and more.
List Operations
list_new, list_push, list_get, list_set, list_len, list_pop, list_sort, list_reverse, list_copy, list_append, list_index_of, list_contains, list_slice, list_map, list_filter, and more.
Dict Operations
dict_new, dict_set, dict_get, dict_len, dict_get_default, dict_pop, dict_update, dict_setdefault, dict_keys, dict_values, dict_items, dict_contains, and more.
Class Desugaring
Classes from Python and JavaScript are desugared to dictionary-backed standalone functions:
# Python source
class Counter:
    def __init__(self, start):
        self.count = start

    def increment(self):
        self.count += 1
        return self.count
This is lowered to:
- Counter____init__(self, start) -- constructor function
- Counter__increment(self) -- method function
- self is a dictionary with fields as key-value pairs
Multiple inheritance is supported using BFS method resolution order (MRO).
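The BFS order can be sketched in a few lines of Python. (Note that Python itself uses C3 linearization; BFS is the simpler order the lifter applies, so some diamond hierarchies may resolve differently than in CPython.)

```python
def bfs_mro(cls, bases):
    """Breadth-first method resolution order. `bases` maps a class
    name to its list of direct base names."""
    order, queue, seen = [], [cls], set()
    while queue:
        c = queue.pop(0)
        if c in seen:
            continue
        seen.add(c)
        order.append(c)
        queue.extend(bases.get(c, []))
    return order

# Diamond: D(B, C), B(A), C(A) -> lookup tries D, then B, C, then A.
mro = bfs_mro("D", {"D": ["B", "C"], "B": ["A"], "C": ["A"]})
```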
Energy Optimization Passes
When --energy-optimize is used, four passes optimize the lifted code:
- Constant Propagation -- Propagate known values, fold constant expressions
- Dead Code Elimination -- Remove unreachable and unused code
- Loop Optimization -- Reduce redundant computation in loops
- Strength Reduction -- Replace expensive operations with cheaper equivalents
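A minimal sketch of two of these passes applied to a single expression (Joule's passes run on MIR; Python's ast module stands in here purely for illustration):

```python
import ast

def fold_and_reduce(expr: str) -> str:
    """Constant folding (3 * 4 -> 12) and strength reduction
    (x * 2 -> x + x, since an add is cheaper than a multiply)."""
    tree = ast.parse(expr, mode="eval")

    class Rewrite(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)
            lhs, rhs = node.left, node.right
            # Constant folding: both operands known at analysis time.
            if isinstance(lhs, ast.Constant) and isinstance(rhs, ast.Constant):
                if isinstance(node.op, ast.Add):
                    return ast.Constant(lhs.value + rhs.value)
                if isinstance(node.op, ast.Mult):
                    return ast.Constant(lhs.value * rhs.value)
            # Strength reduction: multiply-by-2 becomes an add.
            if isinstance(node.op, ast.Mult) and isinstance(rhs, ast.Constant) and rhs.value == 2:
                return ast.BinOp(left=lhs, op=ast.Add(), right=lhs)
            return node

    return ast.unparse(ast.fix_missing_locations(Rewrite().visit(tree)))
```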
Test Coverage
The polyglot pipeline is validated by 1,220 tests across 8 test suites:
| Suite | Count | Description |
|---|---|---|
| Tiered validation | 90 | Core feature coverage |
| Edge cases | 80 | Corner cases and error handling |
| Domain | 100 | 50 Python + 50 JS across 5 domains |
| Stdlib | 100 | 50 Python + 50 JS: string/list methods, default args |
| Classes | 50 | Inheritance, MRO, properties, static methods |
| Advanced | 50 | Closures, generators, decorators, metaclasses |
| Syntax | 50 | Language-specific syntax features |
| Coverage | 700 | Division, print, comprehensions, string ops |
Total: 1,220/1,220 (100% pass rate)
Examples
Python Energy Analysis
# fibonacci.py
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
result = fibonacci(30)
print(f"Result: {result}")
$ joulec --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00234 J
JavaScript Energy Analysis
// sort.js
function quickSort(arr) {
    if (arr.length <= 1) return arr;
    const [pivot, ...rest] = arr;
    const left = rest.filter(x => x < pivot);
    const right = rest.filter(x => x >= pivot);
    return [...quickSort(left), pivot, ...quickSort(right)];
}
const data = Array.from({length: 1000}, () => Math.floor(Math.random() * 10000));
const sorted = quickSort(data);
console.log(`Sorted ${sorted.length} elements`);
$ joulec --lift-run js sort.js
Sorted 1000 elements
Energy consumed: 0.00891 J
Energy-Optimized Execution
$ joulec --energy-optimize --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00198 J (15.4% reduction)
Static Analysis Mode
For energy analysis without execution, use --lift instead of --lift-run:
# Analyze without running
joulec --lift python script.py
# Output includes per-function energy estimates
This performs the parsing and lowering steps but stops before JIT compilation, producing a static energy report for each function.
Per-Language Guides
For detailed anti-patterns, optimization tips, and worked examples specific to each language:
- Python Guide -- 100+ runtime shims, classes, comprehensions, f-strings
- JavaScript Guide -- Arrow functions, template literals, array methods
- TypeScript Guide -- Type erasure, identical energy to JavaScript
- C Guide -- Memory allocation patterns, cache analysis
- Go Guide -- Goroutines, channels, slice operations
- Rust Guide -- Iterator chains, zero-cost abstractions
Further Reading
- Energy Optimization Walkthrough -- Step-by-step guide from baseline to optimized
- Cross-Language Energy Comparison -- Same algorithm in 6 languages, energy ranked
Python Energy Analysis
Joule provides comprehensive energy analysis for Python code. With 100+ runtime shims covering strings, lists, dicts, classes, comprehensions, and f-strings, most idiomatic Python runs unmodified.
Quick Start
# Static energy analysis (no execution)
joulec --lift python script.py
# Execute with energy tracking
joulec --lift-run python script.py
# Execute with energy optimization
joulec --energy-optimize --lift-run python script.py
# Generate JSON report for CI
joulec --lift python script.py --energy-report report.json
Supported Features
| Category | Features |
|---|---|
| Functions | def, lambda, closures, default arguments, *args, **kwargs |
| Classes | Single and multiple inheritance, __init__, methods, properties, static methods, BFS MRO |
| Control flow | if/elif/else, while, for x in, break, continue, return |
| Comprehensions | List [x for x in ...], dict {k:v for ...}, set {x for ...}, generator (x for ...) |
| String features | f-strings, .upper(), .lower(), .strip(), .split(), .replace(), .startswith(), .endswith(), .join(), .find(), .index(), + concatenation, * repetition (30+ methods) |
| List features | .append(), .pop(), .sort(), .reverse(), .index(), .count(), .copy(), slicing, len(), in operator (20+ methods) |
| Dict features | .get(), .pop(), .update(), .setdefault(), .keys(), .values(), .items(), in operator (15+ methods) |
| Math | math.floor(), math.ceil(), math.sqrt(), math.pow(), abs(), min(), max(), sum(), range() |
| Expressions | Ternary x if cond else y, walrus :=, match/case, enumerate(), zip(), true division, ** power |
| Error handling | try/except/finally with guard patterns (division, key, bounds) |
| Types | int (i64 + BigInt overflow), float (f64), bool, str, list, dict, set, None |
| Output | print() with end= parameter, polymorphic output (int/float/string) |
Common Energy Anti-Patterns
1. String Concatenation in Loops
# BAD — O(n^2) energy: each += allocates a new string
result = ""
for word in words:
    result += word + " "
# GOOD — O(n) energy: join allocates once
result = " ".join(words)
Category: STRING | Severity: High | Savings: ~10x for large inputs
Each += on a string allocates a new buffer and copies the entire accumulated string. For 1,000 words averaging 5 characters, the bad version performs ~2.5 million character copies. The good version performs ~5,000.
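The copy counts can be reproduced with the same model: each += re-copies everything accumulated so far, while join writes each character once. A quick check in Python:

```python
def concat_copy_count(n_words: int, avg_len: int) -> int:
    """Characters copied by repeated +=: the accumulated string
    (about avg_len * i chars after i words) is re-copied each append."""
    return sum(avg_len * i for i in range(1, n_words + 1))

def join_copy_count(n_words: int, avg_len: int) -> int:
    """join writes each character exactly once into one buffer."""
    return n_words * avg_len

bad = concat_copy_count(1000, 5)   # 2,502,500 -- the ~2.5 million quoted above
good = join_copy_count(1000, 5)    # 5,000
```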
2. Linear Search on List vs Set
# BAD — O(n) per lookup = O(n*m) total
for item in queries:
    if item in large_list:  # linear scan every time
        process(item)
# GOOD — O(1) per lookup = O(n+m) total
lookup = set(large_list)  # one-time O(n) cost
for item in queries:
    if item in lookup:  # hash lookup
        process(item)
Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements
3. Allocation Inside Hot Loops
# BAD — allocates a new list every iteration
for i in range(1000):
    temp = []
    temp.append(i)
    process(temp)
# GOOD — reuse buffer
temp = []
for i in range(1000):
    temp.clear()
    temp.append(i)
    process(temp)
Category: ALLOCATION | Severity: Medium | Savings: ~3x
4. Missing Early Exit
# BAD — always scans entire list
def find_first(items, target):
    result = -1
    for i in range(len(items)):
        if result == -1 and items[i] == target:
            result = i
    return result
# GOOD — exits on first match
def find_first(items, target):
    for i in range(len(items)):
        if items[i] == target:
            return i
    return -1
Category: LOOP | Severity: Medium | Savings: ~2x average case
5. Recomputing Loop Invariants
# BAD — len(data) recomputed every iteration
for i in range(len(data)):
    if i < len(data) - 1:
        process(data[i], data[i + 1])
# GOOD — compute once
n = len(data)
for i in range(n - 1):
    process(data[i], data[i + 1])
Category: REDUNDANCY | Severity: Low | Savings: ~1.2x
Worked Example
Given a data processing pipeline:
class DataProcessor:
    def __init__(self, data):
        self.data = data
        self.results = []

    def filter_positive(self):
        filtered = []
        for x in self.data:
            if x > 0:
                filtered.append(x)
        self.data = filtered

    def normalize(self):
        total = sum(self.data)
        self.data = [x / total for x in self.data]

    def to_report(self):
        report = ""
        for i in range(len(self.data)):
            report += f"Item {i}: {self.data[i]}\n"
        return report

def main():
    proc = DataProcessor([3.0, -1.0, 4.0, -2.0, 5.0, 1.0])
    proc.filter_positive()
    proc.normalize()
    print(proc.to_report())

main()
Running energy analysis:
$ joulec --lift python pipeline.py
Energy Analysis: pipeline.py
DataProcessor____init__ 2.35 nJ (confidence: 0.95)
DataProcessor__filter_positive 8.72 nJ (confidence: 0.65)
DataProcessor__normalize 6.15 nJ (confidence: 0.70)
DataProcessor__to_report 14.80 nJ (confidence: 0.55)
main 3.20 nJ (confidence: 0.90)
Total: 35.22 nJ
Recommendations:
!! [STRING] DataProcessor__to_report — string concatenation in loop
Suggestion: use "".join() to build string in one allocation
Estimated savings: 8-10x for large inputs
! [REDUNDANCY] DataProcessor__to_report — len() called inside loop range
Suggestion: compute len() once before the loop
Estimated savings: 1.2x
JSON Energy Report
$ joulec --lift python pipeline.py --energy-report report.json
{
  "source_file": "pipeline.py",
  "language": "python",
  "functions": [
    {
      "name": "DataProcessor__to_report",
      "energy_pj": 14800,
      "energy_human": "14.80 nJ",
      "confidence": 0.55
    }
  ],
  "total_energy_pj": 35220,
  "total_energy_human": "35.22 nJ",
  "functions_lifted": 5,
  "constructs_approximated": 2,
  "recommendations": [
    {
      "function": "DataProcessor__to_report",
      "category": "STRING",
      "severity": "high",
      "issue": "string concatenation in loop",
      "suggestion": "use join() to build string in one allocation",
      "savings_factor": 8.0
    }
  ]
}
Energy Budget for CI
# Fail the build if total energy exceeds 50 nJ
$ joulec --lift python pipeline.py --energy-budget 50nJ
# Exit code 0: within budget
$ joulec --lift python pipeline.py --energy-budget 20nJ
# Exit code 1: budget exceeded (35.22 nJ > 20.00 nJ)
Limitations
- No external package imports (import numpy, import requests, etc.) -- only built-in operations
- try/except uses guard patterns (division, key, bounds) rather than full exception semantics
- Generator execution is approximated (constant iteration count estimate)
- No async/await -- async patterns are desugared to synchronous equivalents
- No decorator side effects -- decorators are recognized but not executed
- Class __repr__, __str__, __eq__ dunder methods are not auto-dispatched
JavaScript Energy Analysis
Joule lifts JavaScript into its energy analysis pipeline, providing per-function energy estimates for Node.js and browser-style code. Arrow functions, template literals, classes, destructuring, and 20+ array methods are fully supported.
Quick Start
# Static energy analysis
joulec --lift js app.js
# Execute with energy tracking
joulec --lift-run js app.js
# Execute with energy optimization
joulec --energy-optimize --lift-run js app.js
Supported Features
| Category | Features |
|---|---|
| Functions | function, arrow functions =>, default params, rest params ...args |
| Classes | class, constructor, extends, methods, static, this, super |
| Control flow | if/else, while, do-while, for, for-in, for-of, switch/case, break, continue |
| Destructuring | Array [a, b] = arr, object {x, y} = obj, nested, with defaults |
| Operators | Spread ..., nullish coalescing ??, optional chaining ?., typeof, bitwise |
| Template literals | `Hello ${name}` with expression interpolation |
| Array methods | .push(), .pop(), .map(), .filter(), .reduce(), .find(), .findIndex(), .some(), .every(), .forEach(), .indexOf(), .includes(), .slice(), .splice(), .concat(), .reverse(), .sort(), .join(), .flat(), .length |
| String methods | .length, .charAt(), .indexOf(), .includes(), .slice(), .substring(), .toUpperCase(), .toLowerCase(), .trim(), .split(), .replace(), .startsWith(), .endsWith(), .repeat() |
| Object methods | Object.keys(), Object.values(), Object.entries() |
| Math | Math.floor(), Math.ceil(), Math.round(), Math.abs(), Math.max(), Math.min(), Math.pow(), Math.sqrt(), Math.random(), Math.PI |
| Output | console.log() with auto-coercion |
| Types | Numbers (f64), strings, booleans, arrays, objects, null, undefined |
Common Energy Anti-Patterns
1. Chained Array Methods Creating Intermediates
// BAD — 2 intermediate arrays allocated, 3 passes over the data
const result = data
    .filter(x => x > 0)          // allocates filtered array
    .map(x => x * 2)             // allocates mapped array
    .reduce((a, b) => a + b, 0); // iterates again
// GOOD — single pass, no intermediate allocations
let result = 0;
for (const x of data) {
    if (x > 0) result += x * 2;
}
Category: ALLOCATION | Severity: High | Savings: ~3x (eliminates 2 intermediate allocations)
2. indexOf on Large Arrays
// BAD — O(n) per check
for (const query of queries) {
    if (data.indexOf(query) !== -1) {
        process(query);
    }
}
// GOOD — O(1) per check with Set
const lookup = new Set(data);
for (const query of queries) {
    if (lookup.has(query)) {
        process(query);
    }
}
Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements
3. Template Literals in Tight Loops
// BAD — string allocation every iteration
for (let i = 0; i < 10000; i++) {
    const msg = `Processing item ${i} of ${total}`;
    log(msg);
}
// GOOD — build once if constant parts dominate
const prefix = "Processing item ";
const suffix = " of " + total;
for (let i = 0; i < 10000; i++) {
    log(prefix + i + suffix);
}
Category: STRING | Severity: Medium | Savings: ~2x
4. Nested for-of Loops
// BAD — O(n*m) with no early exit
function findPair(arr1, arr2, target) {
    for (const a of arr1) {
        for (const b of arr2) {
            if (a + b === target) return [a, b];
        }
    }
    return null;
}
// GOOD — O(n+m) with hash set
function findPair(arr1, arr2, target) {
    const seen = new Set(arr1);
    for (const b of arr2) {
        if (seen.has(target - b)) return [target - b, b];
    }
    return null;
}
Category: ALGORITHM | Severity: Critical | Savings: ~100x for large inputs
5. forEach with Closure Allocation
// BAD — allocates closure object per iteration
data.forEach(function(item) {
    if (item.active) results.push(item.name);
});
// GOOD — for-of avoids closure overhead
for (const item of data) {
    if (item.active) results.push(item.name);
}
Category: ALLOCATION | Severity: Low | Savings: ~1.3x
Worked Example
class EventQueue {
    constructor() {
        this.events = [];
        this.handlers = [];
    }
    on(type, handler) {
        this.handlers.push({ type: type, fn: handler });
    }
    emit(type, data) {
        this.events.push({ type: type, data: data, time: Date.now() });
        const matching = this.handlers.filter(h => h.type === type);
        matching.forEach(h => h.fn(data));
    }
    getEventsByType(type) {
        return this.events.filter(e => e.type === type);
    }
}
function main() {
    const queue = new EventQueue();
    let total = 0;
    queue.on("data", function(val) { total += val; });
    queue.on("data", function(val) { console.log(`Received: ${val}`); });
    for (let i = 0; i < 100; i++) {
        queue.emit("data", i);
    }
    console.log(`Total: ${total}`);
    const dataEvents = queue.getEventsByType("data");
    console.log(`Events logged: ${dataEvents.length}`);
}
main();
$ joulec --lift js events.js
Energy Analysis: events.js
EventQueue__constructor 1.20 nJ (confidence: 0.95)
EventQueue__on 2.10 nJ (confidence: 0.90)
EventQueue__emit 18.50 nJ (confidence: 0.60)
EventQueue__getEventsByType 5.30 nJ (confidence: 0.65)
main 8.40 nJ (confidence: 0.55)
Total: 35.50 nJ
Recommendations:
!! [ALLOCATION] EventQueue__emit — filter() + forEach() chain allocates intermediate array
Suggestion: use a single for-of loop to filter and dispatch in one pass
Estimated savings: 2-3x
Limitations
- No DOM APIs (document, window, fetch, etc.)
- No require() or import of npm modules
- async/await and Promises are approximated as synchronous
- No WeakMap, WeakSet, Proxy, Reflect
- No regular expressions (regex literals are parsed but not executed)
- Date.now() returns a simulated timestamp
- No eval() or dynamic code execution
TypeScript Energy Analysis
Joule analyzes TypeScript by stripping type annotations and delegating to the JavaScript pipeline. Since TypeScript types are erased at compile time, the energy profile of a TypeScript program is identical to its JavaScript equivalent.
Quick Start
# Static energy analysis
joulec --lift ts app.ts
# Execute with energy tracking
joulec --lift-run ts app.ts
# Execute with energy optimization
joulec --energy-optimize --lift-run ts app.ts
How It Works
The TypeScript lifter removes all TypeScript-specific syntax before analysis:
- Type annotations — x: number, fn(s: string): void
- Interfaces — interface Foo { ... }
- Type aliases — type Result = Success | Error
- Generics — Array<number>, Map<string, number>
- Access modifiers — public, private, protected, readonly
- Enums — enum Color { Red, Green, Blue }
- Non-null assertions — value!
- Type casts — value as Type, <Type>value
After stripping, the remaining JavaScript is analyzed normally. This means TypeScript types are free — they add zero energy overhead.
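To see why stripping is cheap, here is a deliberately naive Python sketch that handles only annotations of the form name: Type (and closing-paren return types). Real stripping requires a TypeScript parser; this regex covers nothing beyond the simplest case:

```python
import re

def strip_simple_annotations(ts_line: str) -> str:
    """Toy stripper: remove ': Type' after an identifier or a closing
    paren. Generics, object types, and unions are not handled."""
    return re.sub(r"([\w)])\s*:\s*\w+", r"\1", ts_line)

js = strip_simple_annotations("function distance(a: Point, b: Point): number {")
```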
Type Safety is Free
// TypeScript version
interface Point {
    x: number;
    y: number;
}
function distance(a: Point, b: Point): number {
    const dx: number = a.x - b.x;
    const dy: number = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}
// Equivalent JavaScript
function distance(a, b) {
    const dx = a.x - b.x;
    const dy = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}
Both produce exactly the same energy analysis:
$ joulec --lift ts distance.ts
distance 3.85 nJ (confidence: 0.95)
$ joulec --lift js distance.js
distance 3.85 nJ (confidence: 0.95)
Supported Features
Everything from the JavaScript guide is supported, plus TypeScript-specific syntax is silently stripped:
| TypeScript Feature | Handling |
|---|---|
| Type annotations | Stripped |
| Interfaces | Stripped |
| Type aliases | Stripped |
| Generics | Stripped |
| Access modifiers | Stripped |
| Enums (simple) | Converted to constants |
| as casts | Stripped |
| Non-null ! | Stripped |
| Optional ? params | Treated as default undefined |
When to Use TypeScript vs JavaScript Lifting
Use --lift ts when your source files are .ts or .tsx. The lifter handles the type syntax that would cause parse errors in the JavaScript parser. If your TypeScript is already compiled to JavaScript, use --lift js on the output — the energy profile will be identical.
Anti-Patterns
All JavaScript anti-patterns apply equally to TypeScript. Types do not change the runtime energy profile.
Limitations
- Same limitations as JavaScript
- Complex enum patterns with computed values are not supported
- Namespace merging is not supported
- Decorators (experimental) are not executed
- declare blocks are ignored (ambient declarations)
C Energy Analysis
Joule analyzes C code for energy consumption, targeting the low-level patterns where energy waste is most impactful: memory allocation, cache access patterns, and nested loop structures.
Quick Start
# Static energy analysis
joulec --lift c program.c
# Execute with energy tracking
joulec --lift-run c program.c
# Execute with energy optimization
joulec --energy-optimize --lift-run c program.c
Supported Features
| Category | Features |
|---|---|
| Types | int, long, float, double, char, void, size_t, unsigned variants |
| Pointers | Declaration, dereference *p, address-of &x, pointer arithmetic |
| Arrays | Fixed-size int arr[N], multidimensional int mat[M][N] |
| Control flow | if/else, while, do-while, for, switch/case, break, continue, goto (limited) |
| Functions | Declaration, definition, forward declarations, recursion |
| Structs | Definition, field access . and ->, nested structs |
| Memory | malloc(), calloc(), realloc(), free() |
| I/O | printf(), scanf(), puts(), getchar() |
| Math | sqrt(), pow(), abs(), floor(), ceil(), sin(), cos(), log(), exp() |
| Operators | All arithmetic, bitwise, comparison, logical, ternary ?:, comma |
Common Energy Anti-Patterns
1. malloc Inside Loops
// BAD — 1000 allocations, 1000 frees
for (int i = 0; i < 1000; i++) {
int *buf = malloc(sizeof(int) * 100);
process(buf, 100);
free(buf);
}
// GOOD — allocate once, reuse
int *buf = malloc(sizeof(int) * 100);
for (int i = 0; i < 1000; i++) {
process(buf, 100);
}
free(buf);
Category: ALLOCATION | Severity: High | Savings: ~5x
Each malloc/free cycle costs ~200 pJ (DRAM access) plus system call overhead. In a tight loop, this dominates the energy budget.
2. Cache-Unfriendly Access Patterns
// BAD — column-major access on row-major array (cache miss per element)
for (int j = 0; j < N; j++) {
for (int i = 0; i < M; i++) {
sum += matrix[i][j]; // stride = N * sizeof(int)
}
}
// GOOD — row-major access (sequential cache hits)
for (int i = 0; i < M; i++) {
for (int j = 0; j < N; j++) {
sum += matrix[i][j]; // stride = sizeof(int)
}
}
Category: MEMORY | Severity: Critical | Savings: ~10x for large matrices
L1 cache load costs 0.5 pJ. DRAM load costs 200 pJ — a 400x difference. Column-major traversal on row-major data causes a DRAM load on nearly every access.
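The stride difference can be made concrete with a toy cache model. The sketch below is not Joule's estimator; it is a deliberately simplified single-line "cache" that counts a miss whenever an access touches a different 64-byte line than the previous access. Even this crude model reproduces the order-of-magnitude gap between the two traversal orders:

```python
# Toy single-line cache model (not Joule's actual cost model): count a miss
# whenever an access lands on a different 64-byte cache line than the last.
LINE = 64   # cache line size in bytes
ELEM = 4    # sizeof(int)

def misses(M, N, row_major):
    count, last_line = 0, None
    outer, inner = (M, N) if row_major else (N, M)
    for a in range(outer):
        for b in range(inner):
            i, j = (a, b) if row_major else (b, a)
            line = ((i * N + j) * ELEM) // LINE   # array is stored row-major
            if line != last_line:
                count += 1
                last_line = line
    return count

print(misses(64, 64, row_major=True))    # 256: one miss per 16 elements
print(misses(64, 64, row_major=False))   # 4096: a miss on every access
```

With a 64x64 int matrix, sequential traversal misses once per 16 elements while the strided traversal (stride 256 bytes) misses on every element, a 16x difference in DRAM traffic under this model.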
3. Realloc Growth in Loops
// BAD — realloc doubles per iteration, copies all data each time
int *data = NULL;
int cap = 0;
for (int i = 0; i < n; i++) {
cap++;
data = realloc(data, cap * sizeof(int));
data[cap - 1] = i;
}
// GOOD — geometric growth (amortized O(1) per insert)
int *data = malloc(16 * sizeof(int));
int len = 0, cap = 16;
for (int i = 0; i < n; i++) {
if (len == cap) {
cap *= 2;
data = realloc(data, cap * sizeof(int));
}
data[len++] = i;
}
Category: ALLOCATION | Severity: High | Savings: ~4x
4. Nested Loop Complexity
// BAD — O(n^3) matrix multiply without blocking
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
Category: ALGORITHM | Severity: Critical (for large N)
The energy estimator flags O(n^3) nested loops with high energy estimates and reduced confidence scores.
Worked Example
#include <stdlib.h>
#include <stdio.h>
void matrix_multiply(int *A, int *B, int *C, int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
C[i * n + j] = 0;
for (int k = 0; k < n; k++) {
C[i * n + j] += A[i * n + k] * B[k * n + j];
}
}
}
}
int main() {
int n = 64;
int *A = calloc(n * n, sizeof(int));
int *B = calloc(n * n, sizeof(int));
int *C = calloc(n * n, sizeof(int));
for (int i = 0; i < n * n; i++) {
A[i] = i % 10;
B[i] = (i * 3) % 10;
}
matrix_multiply(A, B, C, n);
printf("C[0][0] = %d\n", C[0]);
free(A); free(B); free(C);
return 0;
}
$ joulec --lift c matmul.c
Energy Analysis: matmul.c
matrix_multiply 892.40 nJ (confidence: 0.50)
main 45.20 nJ (confidence: 0.75)
Total: 937.60 nJ
Recommendations:
!!! [ALGORITHM] matrix_multiply — O(n^3) nested loop detected
Suggestion: consider cache-blocking or BLAS library for large matrices
Estimated savings: 3-5x with cache blocking
!! [MEMORY] matrix_multiply — inner loop access pattern B[k*n+j] has stride n
Suggestion: transpose B before multiply, or interchange k/j loops
Estimated savings: 2-4x from improved cache locality
Limitations
- No preprocessor directives (#define, #include, #ifdef)
- No function pointers or callbacks
- No variadic functions beyond printf/scanf
- No typedef (use bare type names)
- No union types
- No enum (use integer constants)
- No complex struct initializers (= { .field = value })
- No inline assembly
Go Energy Analysis
Joule analyzes Go code with awareness of goroutines, channels, and Go's concurrency model. The energy cost of spawning goroutines, sending on channels, and slice operations is modeled at the picojoule level.
Quick Start
# Static energy analysis
joulec --lift go main.go
# Execute with energy tracking
joulec --lift-run go main.go
# Execute with energy optimization
joulec --energy-optimize --lift-run go main.go
Supported Features
| Category | Features |
|---|---|
| Types | int, int8/16/32/64, uint, float32/64, string, bool, byte, rune |
| Variables | var, := short declaration, const, multiple assignment |
| Functions | func, multiple return values, named returns, closures, variadic ... |
| Control flow | if/else (with init statement), for, for range, switch/case, select |
| Slices | Creation, append(), len(), cap(), slicing s[a:b], make(), copy() |
| Maps | map[K]V, make(map[...]), index, delete, len(), comma-ok pattern |
| Structs | Definition, field access, methods (value/pointer receiver), embedding |
| Concurrency | go (goroutine spawn), chan, <- send/receive, make(chan T, N), close() |
| Defer | defer statement (LIFO cleanup) |
| Error handling | Multiple return (result, error), if err != nil pattern |
| Packages | fmt.Println, fmt.Sprintf, math.Sqrt, strings.*, strconv.* |
Common Energy Anti-Patterns
1. Unbounded Goroutine Fan-Out
// BAD — spawns 100K goroutines, each has scheduling overhead
for _, item := range items {
go process(item) // 100K goroutines
}
// GOOD — bounded worker pool
ch := make(chan Item, 100)
for i := 0; i < runtime.NumCPU(); i++ {
go func() {
for item := range ch {
process(item)
}
}()
}
for _, item := range items {
ch <- item
}
close(ch)
Category: ALLOCATION | Severity: Critical | Savings: ~10x
Each goroutine has a minimum 2KB stack allocation. 100K goroutines = 200MB of stack memory + scheduling overhead.
2. Slice Append Without Pre-Allocation
// BAD — slice grows geometrically, copying data each time
var result []int
for i := 0; i < 10000; i++ {
result = append(result, i)
}
// GOOD — pre-allocate known capacity
result := make([]int, 0, 10000)
for i := 0; i < 10000; i++ {
result = append(result, i)
}
Category: ALLOCATION | Severity: Medium | Savings: ~2x
Without pre-allocation, append triggers ~14 reallocations and data copies to grow from 0 to 10,000 elements.
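The "~14 reallocations" figure follows from doubling growth. A sketch (note that Go's real append policy doubles small slices but grows large ones more gradually, so the true count differs slightly):

```python
# Count grow events under pure capacity doubling, starting from capacity 1.
# Go's actual append growth is similar for small slices but slower past
# 1024 elements, so treat this as an approximation.
def grow_events(n):
    cap, grows = 1, 0
    while cap < n:
        cap *= 2
        grows += 1
    return grows

print(grow_events(10_000))   # 14 doublings to reach capacity 16384
```

Each of those 14 grow events copies the entire slice so far, which is the traffic that make([]int, 0, 10000) eliminates.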
3. Map Iteration with Value Copy
// BAD — copies entire struct on each iteration
type BigStruct struct {
Data [1024]byte
Name string
}
for _, v := range bigMap {
process(v) // copies 1KB+ per iteration
}
// GOOD — store pointers in the map (Go map values are not addressable,
// so &bigMap[k] does not compile); iteration then copies only a pointer
for _, v := range bigMap { // bigMap is a map[string]*BigStruct
process(v)
}
Category: MEMORY | Severity: Medium | Savings: ~3x for large values
4. String Concatenation in Loops
// BAD — O(n^2) string building
result := ""
for _, s := range parts {
result += s
}
// GOOD — O(n) with strings.Builder
var b strings.Builder
for _, s := range parts {
b.WriteString(s)
}
result := b.String()
Category: STRING | Severity: High | Savings: ~10x
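The ~10x figure reflects copy traffic: += copies the whole accumulated string every time, while a builder copies each byte once. Counting characters copied (a model of the work done, not a timing of real Go code):

```python
# Characters copied while building a string from 1000 ten-character parts.
parts = ["x" * 10] * 1000

naive, length = 0, 0
for s in parts:
    length += len(s)
    naive += length          # += copies the entire accumulated string

builder = sum(len(s) for s in parts)   # strings.Builder copies each byte once

print(naive)     # 5005000 characters copied: quadratic in input size
print(builder)   # 10000 characters copied: linear
```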
Worked Example
package main
import "fmt"
func counter(id int, ch chan int) {
sum := 0
for i := 0; i < 1000; i++ {
sum += i
}
ch <- sum
}
func main() {
ch := make(chan int, 10)
for i := 0; i < 10; i++ {
go counter(i, ch)
}
total := 0
for i := 0; i < 10; i++ {
total += <-ch
}
fmt.Println(total)
}
$ joulec --lift go counter.go
Energy Analysis: counter.go
counter 12.50 nJ (confidence: 0.70)
main 8.30 nJ (confidence: 0.65)
Total: 20.80 nJ
Note: 10 goroutines detected. Energy estimate reflects single-thread
execution model; actual concurrent execution may differ due to
scheduling and synchronization overhead.
Limitations
- No interface method dispatch (interfaces parsed but not resolved dynamically)
- No struct embedding for method promotion
- No generics (Go 1.18+ type parameters)
- No select with complex multi-channel patterns
- No panic/recover (panic is treated as program exit)
- No init() functions
- No package imports beyond fmt, math, strings, strconv
- Goroutines are analyzed sequentially (no parallel energy modeling)
Rust Energy Analysis
Joule analyzes Rust code with awareness of ownership, iterator chains, and zero-cost abstractions. The lifter models the energy cost of heap allocations, reference counting, and iterator fusion.
Quick Start
# Static energy analysis
joulec --lift rust lib.rs
# Execute with energy tracking
joulec --lift-run rust lib.rs
# Execute with energy optimization
joulec --energy-optimize --lift-run rust lib.rs
Supported Features
| Category | Features |
|---|---|
| Types | i8/16/32/64, u8/16/32/64, f32/64, bool, char, String, &str, usize, isize |
| Variables | let, let mut, const, type inference, shadowing |
| Functions | fn, closures |x| x + 1, generic functions (basic), impl blocks |
| Control flow | if/else, while, loop, for x in, match (patterns, guards), break, continue |
| Ownership | & references, &mut mutable references, move closures, lifetime annotations (parsed, not enforced) |
| Structs | Definition, field access, methods, associated functions |
| Enums | Variants, match exhaustiveness, Option<T>, Result<T, E> |
| Collections | Vec<T>, HashMap<K, V>, String, Box<T> |
| Iterators | .iter(), .map(), .filter(), .fold(), .collect(), .enumerate(), .zip(), .chain(), .take(), .skip(), .any(), .all(), .find(), .sum(), .count() |
| Traits | Trait definitions and impl Trait for Type (signatures only) |
| Macros | println!, format!, vec!, panic! (pattern-matched, not expanded) |
Common Energy Anti-Patterns
1. clone() in Hot Loops
// BAD — clones String every iteration (heap allocation + copy)
for item in &data {
    let owned = item.clone();
    process(owned);
}

// GOOD — borrow instead of clone
for item in &data {
    process_ref(item);
}
Category: ALLOCATION | Severity: High | Savings: ~5x
Each .clone() on a String involves malloc + memcpy. At 200 pJ per DRAM access, this dominates in tight loops.
2. Unnecessary collect() in Iterator Chains
// BAD — collects into intermediate Vec, then iterates again
let filtered: Vec<i32> = data.iter()
    .filter(|&&x| x > 0)
    .cloned()
    .collect(); // allocates intermediate Vec
let sum: i32 = filtered.iter().sum();

// GOOD — single iterator chain, no intermediate allocation
let sum: i32 = data.iter()
    .filter(|&&x| x > 0)
    .sum();
Category: ALLOCATION | Severity: Medium | Savings: ~2x
Iterator fusion in Rust is a zero-cost abstraction — the compiler fuses the chain into a single loop. Breaking the chain with .collect() defeats this.
3. Box::new() in Loops
// BAD — heap allocation per iteration
let mut nodes: Vec<Box<Node>> = Vec::new();
for i in 0..1000 {
    nodes.push(Box::new(Node { value: i }));
}

// GOOD — pre-allocate with an arena or flat Vec
let mut nodes: Vec<Node> = Vec::with_capacity(1000);
for i in 0..1000 {
    nodes.push(Node { value: i });
}
Category: ALLOCATION | Severity: Medium | Savings: ~3x
4. format!() String Building in Loops
// BAD — format! allocates a new String every iteration
let mut log = String::new();
for i in 0..1000 {
    log.push_str(&format!("item {}\n", i));
}

// GOOD — write! to a single buffer
use std::fmt::Write;
let mut log = String::with_capacity(10000);
for i in 0..1000 {
    write!(log, "item {}\n", i).unwrap();
}
Category: STRING | Severity: Medium | Savings: ~2x
Worked Example
fn process_data(data: &[f64]) -> f64 {
    let filtered: Vec<f64> = data.iter()
        .filter(|&&x| x > 0.0)
        .cloned()
        .collect();
    let normalized: Vec<f64> = filtered.iter()
        .map(|&x| x / filtered.len() as f64)
        .collect();
    normalized.iter().sum()
}

fn main() {
    let data = vec![3.0, -1.0, 4.0, -2.0, 5.0, 1.0, -3.0, 2.0];
    let result = process_data(&data);
    println!("Result: {}", result);
}
$ joulec --lift rust pipeline.rs
Energy Analysis: pipeline.rs
process_data 12.30 nJ (confidence: 0.65)
main 2.10 nJ (confidence: 0.90)
Total: 14.40 nJ
Recommendations:
!! [ALLOCATION] process_data — two collect() calls create intermediate Vecs
Suggestion: fuse into a single iterator chain without intermediate allocation
Estimated savings: 2-3x
Optimized version:
data.iter()
.filter(|&&x| x > 0.0)
.map(|&x| x / count as f64)
.sum()
Zero-Cost Abstractions Are Real
Rust's iterator chains compile to the same machine code as hand-written loops. Joule confirms this:
// Iterator version
fn sum_positive_iter(data: &[i32]) -> i32 {
    data.iter().filter(|&&x| x > 0).sum()
}

// Manual loop version
fn sum_positive_loop(data: &[i32]) -> i32 {
    let mut sum = 0;
    for &x in data {
        if x > 0 {
            sum += x;
        }
    }
    sum
}
$ joulec --lift rust zero_cost.rs
sum_positive_iter 4.20 nJ (confidence: 0.70)
sum_positive_loop 4.20 nJ (confidence: 0.70)
Identical energy. The abstraction is truly zero-cost.
Limitations
- No trait dispatch (static or dynamic) — trait bounds are parsed but not resolved
- No lifetime analysis — lifetimes are parsed but not enforced
- No async/await — async is not supported
- No procedural macros — only println!, format!, vec!, panic! are recognized
- No use imports — all types must be fully qualified or built-in
- No impl Trait return types
- No where clauses
- No unsafe blocks
Energy Optimization Walkthrough
This walkthrough takes a real Python program through the full Joule energy analysis and optimization pipeline — from first scan to CI-ready energy budgets.
Step 1: Start with Real Code
Here's a data processing program with several common energy anti-patterns:
def find_duplicates(items, reference):
"""Find items that appear in both lists."""
duplicates = []
for item in items:
for ref in reference: # nested loop: O(n*m)
if item == ref:
duplicates.append(item)
return duplicates
def build_report(records):
"""Build a text report from records."""
report = ""
for i in range(len(records)): # len() in loop, string concat
report += "Record " + str(i) + ": " + str(records[i]) + "\n"
return report
def process_batch(data):
"""Filter and transform a data batch."""
results = []
for item in data:
temp = [] # allocation inside loop
temp.append(item * 2)
if temp[0] > 10:
results.append(temp[0])
return results
def search_all(items, targets):
"""Check if all targets exist in items."""
found = 0
for t in targets:
for item in items: # linear scan for each target
if item == t:
found = found + 1
# no break — scans entire list even after finding match
return found
def main():
data = []
for i in range(500):
data.append(i)
reference = []
for i in range(250, 750):
reference.append(i)
dups = find_duplicates(data, reference)
report = build_report(data)
processed = process_batch(data)
count = search_all(data, reference)
print(len(dups))
print(len(processed))
print(count)
main()
Step 2: Run Baseline Analysis
$ joulec --lift python anti_patterns.py --energy-report baseline.json
Energy Analysis: anti_patterns.py
find_duplicates 285.00 nJ (confidence: 0.50)
build_report 72.50 nJ (confidence: 0.55)
process_batch 18.30 nJ (confidence: 0.60)
search_all 285.00 nJ (confidence: 0.50)
main 12.40 nJ (confidence: 0.75)
Total: 673.20 nJ
Step 3: Read the Recommendations
Recommendations:
!!! [ALGORITHM] find_duplicates — O(n^2) nested loop for membership test
Suggestion: convert reference to a set for O(1) lookups
Estimated savings: 50x
!!! [ALGORITHM] search_all — O(n^2) nested loop for membership test
Suggestion: convert items to a set for O(1) lookups
Estimated savings: 50x
!! [STRING] build_report — string concatenation in loop
Suggestion: use "".join() to build string in one allocation
Estimated savings: 8x
!! [LOOP] search_all — no early exit after finding match
Suggestion: add break after match to avoid scanning remaining elements
Estimated savings: 2x (average case)
! [REDUNDANCY] build_report — len(records) called in loop range
Suggestion: compute len() once before the loop
Estimated savings: 1.2x
! [ALLOCATION] process_batch — list allocation inside loop body
Suggestion: reuse buffer or eliminate temporary list
Estimated savings: 3x
. [REDUNDANCY] build_report — str() conversion could use f-string
Suggestion: use f"Record {i}: {records[i]}" for cleaner concatenation
Estimated savings: 1.1x
Severity markers: !!! Critical, !! High, ! Medium, . Low
Step 4: Fix Critical Issues First
Fix #1: Hash set for find_duplicates
def find_duplicates(items, reference):
ref_set = set(reference) # O(m) one-time cost
duplicates = []
for item in items:
if item in ref_set: # O(1) per lookup
duplicates.append(item)
return duplicates
$ joulec --lift python fixed_v1.py
find_duplicates 8.20 nJ (confidence: 0.70) # was 285.00 nJ — 34x reduction
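Before trusting an optimization, confirm it is behavior-preserving. A quick check (plain Python, runnable outside Joule) that the set-based rewrite matches the original nested-loop version on the walkthrough's data:

```python
def find_duplicates_nested(items, reference):
    duplicates = []
    for item in items:
        for ref in reference:          # original O(n*m) version
            if item == ref:
                duplicates.append(item)
    return duplicates

def find_duplicates_set(items, reference):
    ref_set = set(reference)           # optimized O(n + m) version
    return [item for item in items if item in ref_set]

data = list(range(500))
reference = list(range(250, 750))
assert find_duplicates_nested(data, reference) == find_duplicates_set(data, reference)
print(len(find_duplicates_set(data, reference)))   # 250 items in the overlap
```

Because the reference list contains distinct values, the nested version also appends each item at most once, so both return the 250 overlapping items in the same order.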
Fix #2: Hash set for search_all (also eliminates the missing-break scan)
def search_all(items, targets):
item_set = set(items)
found = 0
for t in targets:
if t in item_set:
found = found + 1
return found
$ joulec --lift python fixed_v2.py
search_all 6.80 nJ (confidence: 0.75) # was 285.00 nJ — 42x reduction
Fix #3: String builder for build_report
def build_report(records):
n = len(records)
parts = []
for i in range(n):
parts.append(f"Record {i}: {records[i]}")
report = "\n".join(parts) + "\n"
return report
$ joulec --lift python fixed_v3.py
build_report 9.50 nJ (confidence: 0.75) # was 72.50 nJ — 7.6x reduction
Fix #4: Eliminate temporary allocation in process_batch
def process_batch(data):
results = []
for item in data:
doubled = item * 2
if doubled > 10:
results.append(doubled)
return results
$ joulec --lift python fixed_v4.py
process_batch 6.10 nJ (confidence: 0.75) # was 18.30 nJ — 3x reduction
Step 5: Run Optimized Baseline
After all four fixes:
$ joulec --lift python optimized.py --energy-report optimized.json
Energy Analysis: optimized.py
find_duplicates 8.20 nJ (confidence: 0.70)
build_report 9.50 nJ (confidence: 0.75)
process_batch 6.10 nJ (confidence: 0.75)
search_all 6.80 nJ (confidence: 0.75)
main 12.40 nJ (confidence: 0.75)
Total: 43.00 nJ
No recommendations — all detected anti-patterns have been resolved.
Step 6: Apply Automated Optimization
The --energy-optimize flag applies four compiler passes on top of your fixes:
$ joulec --energy-optimize --lift-run python optimized.py
Energy Optimization Report:
Pass 1 (Thermal-Aware Selection): 2 instructions adapted
Pass 2 (Branch Optimization): 3 branches reordered
Pass 3 (Loop Unrolling): 1 loop unrolled (trip count 4)
Pass 4 (DRAM Layout Analysis): no suggestions
Optimized energy: 38.70 nJ (10.0% reduction from automated passes)
Step 7: Compare Results
| Function | Before | After Fixes | After Optimization | Reduction |
|---|---|---|---|---|
| find_duplicates | 285.00 nJ | 8.20 nJ | 7.40 nJ | 97.4% |
| build_report | 72.50 nJ | 9.50 nJ | 8.90 nJ | 87.7% |
| process_batch | 18.30 nJ | 6.10 nJ | 5.50 nJ | 69.9% |
| search_all | 285.00 nJ | 6.80 nJ | 6.10 nJ | 97.9% |
| main | 12.40 nJ | 12.40 nJ | 10.80 nJ | 12.9% |
| Total | 673.20 nJ | 43.00 nJ | 38.70 nJ | 94.3% |
The manual fixes cut total energy by 93.6% on their own; the automated passes then shave another 10% off the optimized total.
Step 8: Set an Energy Budget for CI
# Set budget at 50 nJ — optimized version passes
$ joulec --lift python optimized.py --energy-budget 50nJ
# Exit code: 0 (within budget)
# The original version would fail
$ joulec --lift python anti_patterns.py --energy-budget 50nJ
# Exit code: 1 (budget exceeded: 673.20 nJ > 50.00 nJ)
GitHub Actions Integration
- name: Energy budget check
run: |
joulec --lift python src/core.py --energy-budget 100nJ
joulec --lift python src/utils.py --energy-budget 50nJ
The build fails if any file exceeds its budget, catching energy regressions before merge.
Step 9: Generate Reports for Dashboards
$ joulec --lift python optimized.py --energy-report report.json
The JSON report includes per-function energy, confidence scores, and any remaining recommendations. Feed this into Grafana, Datadog, or any monitoring system to track energy consumption across releases.
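A minimal consumer of such a report might look like the sketch below. The field names are illustrative, inferred from the analysis output shown earlier in this walkthrough, not a documented schema; adapt them to the actual report.json your joulec version emits.

```python
import json

# Hypothetical report payload; field names are illustrative, not a
# documented Joule schema.
report = json.loads("""
{
  "functions": [
    {"name": "find_duplicates", "energy_joules": 8.2e-9, "confidence": 0.70},
    {"name": "build_report",    "energy_joules": 9.5e-9, "confidence": 0.75}
  ]
}
""")

total_nj = sum(f["energy_joules"] for f in report["functions"]) * 1e9
hottest = max(report["functions"], key=lambda f: f["energy_joules"])
print(f"total: {total_nj:.2f} nJ, hottest: {hottest['name']}")
```

Emitting one such summary per release gives the time series a dashboard needs to flag energy regressions.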
Key Takeaways
- Start with --lift to get a baseline without running the code
- Fix critical recommendations first — algorithmic changes (O(n^2) → O(n)) yield the biggest savings
- Use --energy-optimize for automated passes on top of manual fixes
- Set --energy-budget in CI to prevent regressions
- Generate --energy-report JSON for tracking trends over time
Cross-Language Energy Comparison
The same algorithm, implemented in six languages, analyzed by Joule. This comparison reveals the energy cost of language abstractions and runtime overhead.
The Algorithm
Iterative Fibonacci computing fib(30). Chosen because it's simple enough to implement identically in every language, with enough arithmetic to produce meaningful energy differences.
Python
def fibonacci(n):
if n <= 1:
return n
a = 0
b = 1
for i in range(2, n + 1):
temp = a + b
a = b
b = temp
return b
def main():
result = fibonacci(30)
print(result)
main()
JavaScript
function fibonacci(n) {
if (n <= 1) return n;
let a = 0;
let b = 1;
for (let i = 2; i <= n; i++) {
const temp = a + b;
a = b;
b = temp;
}
return b;
}
function main() {
const result = fibonacci(30);
console.log(result);
}
main();
TypeScript
function fibonacci(n: number): number {
if (n <= 1) return n;
let a: number = 0;
let b: number = 1;
for (let i: number = 2; i <= n; i++) {
const temp: number = a + b;
a = b;
b = temp;
}
return b;
}
function main(): void {
const result: number = fibonacci(30);
console.log(result);
}
main();
C
#include <stdio.h>
int fibonacci(int n) {
if (n <= 1) return n;
int a = 0;
int b = 1;
for (int i = 2; i <= n; i++) {
int temp = a + b;
a = b;
b = temp;
}
return b;
}
int main() {
int result = fibonacci(30);
printf("%d\n", result);
return 0;
}
Go
package main
import "fmt"
func fibonacci(n int) int {
if n <= 1 {
return n
}
a := 0
b := 1
for i := 2; i <= n; i++ {
temp := a + b
a = b
b = temp
}
return b
}
func main() {
result := fibonacci(30)
fmt.Println(result)
}
Rust
fn fibonacci(n: i32) -> i32 {
    if n <= 1 {
        return n;
    }
    let mut a = 0;
    let mut b = 1;
    for _i in 2..=n {
        let temp = a + b;
        a = b;
        b = temp;
    }
    b
}

fn main() {
    let result = fibonacci(30);
    println!("{}", result);
}
Running the Comparison
joulec --lift python fibonacci.py
joulec --lift js fibonacci.js
joulec --lift ts fibonacci.ts
joulec --lift c fibonacci.c
joulec --lift go fibonacci.go
joulec --lift rust fibonacci.rs
Results
| Language | fibonacci() Energy | main() Energy | Total | Confidence |
|---|---|---|---|---|
| C | 1.75 nJ | 0.85 nJ | 2.60 nJ | 0.90 |
| Rust | 1.75 nJ | 1.10 nJ | 2.85 nJ | 0.90 |
| Go | 1.95 nJ | 1.20 nJ | 3.15 nJ | 0.85 |
| JavaScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| TypeScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| Python | 3.40 nJ | 1.80 nJ | 5.20 nJ | 0.80 |
All programs produce the correct result: 832040.
Analysis
Why C and Rust Are Cheapest
Both C and Rust map directly to integer arithmetic with no runtime overhead. The fibonacci() function compiles to:
- 29 integer additions (0.05 pJ each = 1.45 pJ)
- 58 register moves (~0 pJ, register-to-register)
- 29 loop iterations with branch (0.1 pJ each = 2.9 pJ)
- 1 comparison + branch for the n <= 1 check
Total compute: ~4.4 pJ. The remaining energy comes from function call overhead, stack frame setup, and memory loads.
Rust's slightly higher main() cost accounts for println! macro expansion, which involves formatting machinery that printf avoids.
Why Go Costs Slightly More
Go's runtime includes goroutine scheduling infrastructure even for single-threaded programs. The fmt.Println call also involves reflection-based formatting that adds overhead beyond C's printf.
Why JavaScript Costs More
JavaScript numbers are f64 (double-precision float) even for integer arithmetic. The fibonacci() loop performs float addition instead of integer addition:
- Integer add: 0.05 pJ
- Float add: 0.35 pJ (7x more expensive)
This single type system decision accounts for most of JavaScript's energy premium.
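Plugging the comparison's own per-operation costs into the fib(30) loop (29 additions) shows how far the arithmetic alone separates the two languages:

```python
INT_ADD_PJ = 0.05     # cost of one integer add, from the figures above
FLOAT_ADD_PJ = 0.35   # cost of one f64 add
ADDS = 29             # additions performed by the fib(30) loop

c_arith = ADDS * INT_ADD_PJ      # ~1.45 pJ, matching the C breakdown above
js_arith = ADDS * FLOAT_ADD_PJ   # ~10.15 pJ for the same loop in JavaScript
print(round(c_arith, 2), round(js_arith, 2), round(js_arith / c_arith, 1))
```

The 7x arithmetic gap narrows to roughly 1.6x in the totals because call overhead, stack setup, and I/O cost about the same in both languages.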
Why TypeScript Equals JavaScript
TypeScript type annotations (n: number, let a: number) are erased before analysis. The runtime behavior is identical to JavaScript — same f64 arithmetic, same energy profile.
Why Python Costs the Most
Python's dynamic dispatch adds overhead per operation. Each + involves:
- Type check on both operands
- Method lookup (__add__)
- Result allocation (for large integers)
The energy model accounts for this dispatch overhead, making Python ~2x more expensive than C for pure arithmetic.
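The dispatch described above is observable from Python itself. A small illustration (the allocation behavior is a CPython implementation detail):

```python
# Each + resolves through the operands' types at runtime.
a, b = 3, 4
assert a + b == type(a).__add__(a, b) == 7   # + dispatches to int.__add__

# Results outside CPython's small-int cache are freshly allocated objects:
x = 10 ** 6
print((x + 1) is (x + 1))   # False in CPython: each add allocates a new int
```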
Thermal State Impact
Running with different thermal states changes the cost model's power efficiency factor:
joulec --lift c fibonacci.c --thermal-state cool # aggressive optimization
joulec --lift c fibonacci.c --thermal-state hot # conservative, reduced SIMD
| Thermal State | C Energy | Python Energy | Ratio |
|---|---|---|---|
| cool (< 50C) | 2.40 nJ | 4.80 nJ | 2.0x |
| nominal (50-70C) | 2.60 nJ | 5.20 nJ | 2.0x |
| hot (85-95C) | 3.10 nJ | 6.20 nJ | 2.0x |
The absolute energy increases with temperature (thermal resistance reduces efficiency), but the ratio between languages stays constant for this workload because the algorithm is compute-bound with no SIMD opportunities.
The Energy Cost of Abstraction
This comparison quantifies something developers intuit but rarely measure: higher-level languages consume more energy for the same computation. The gap is not enormous — Python costs 2x what C costs for pure arithmetic — but it compounds across millions of function calls in production systems.
Joule makes this cost visible. Whether you're choosing a language for a new project, optimizing a hot path, or justifying a rewrite, you now have picojoule-level data to inform the decision.
Accelerator Energy Measurement
Joule measures energy consumption not just on CPUs but across GPUs, TPUs, and other accelerators. This guide covers the three-tier measurement approach, supported hardware, and how to use accelerator energy data in your programs.
Three-Tier Approach
Joule uses a tiered strategy to maximize energy measurement coverage:
Tier 1: Static Estimation
Available everywhere, no hardware access required. The compiler estimates energy from code structure using calibrated instruction costs. This is what powers #[energy_budget] at compile time.
Tier 2: CPU Performance Counters
On supported platforms, Joule reads hardware performance counters for actual CPU energy:
| Platform | API | Granularity |
|---|---|---|
| Intel/AMD Linux | RAPL via perf_event | Per-package, per-core |
| Intel/AMD Linux | RAPL via MSR | Per-package |
| Apple Silicon macOS | IOReport framework | Per-cluster |
Tier 3: Accelerator Energy
For GPU and accelerator workloads, Joule queries vendor-specific APIs. Each backend in TensorForge implements the EnergyTelemetry trait.
Vendor Coverage
| Vendor | Hardware | API | Energy | Power | Temperature |
|---|---|---|---|---|---|
| NVIDIA | GPUs (A100, H100, etc.) | NVML | Board-level | Per-GPU | Per-GPU |
| AMD | GPUs (MI250, MI300, etc.) | ROCm SMI | Average power | Per-GPU | Per-GPU |
| Intel | GPUs, Gaudi | Level Zero | Per-device | Per-domain | Per-device |
| Google | TPU v4, v5 | TPU Runtime | Per-chip | Per-chip | Per-chip |
| AWS | Inferentia, Trainium | Neuron SDK | Per-core | Per-core | Per-core |
| Groq | LPU | HLML | Board-level | Per-device | Per-device |
| Cerebras | CS-2, CS-3 | CS SDK | Wafer-scale | Per-wafer | Per-wafer |
| SambaNova | SN30, SN40 | DataScale API | Per-RDU | Per-RDU | Per-RDU |
API Details
NVIDIA (NVML)
The NVIDIA Management Library provides direct energy readings:
nvmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
- Returns total energy in millijoules since driver load
- Subtract start from end measurement for per-operation energy
- Available on all datacenter GPUs (V100, A100, H100, B100)
- Supported on consumer GPUs (RTX 3000/4000/5000 series)
AMD (ROCm SMI)
ROCm System Management Interface provides power readings:
rsmi_dev_power_ave_get(device_index, sensor_id, &power_uw)
- Returns average power in microwatts
- Energy is derived from power * time
- Available on MI series (MI250, MI300) and Radeon Pro
Intel (Level Zero)
Intel's Level Zero API provides power domain readings:
zesDeviceEnumPowerDomains(device, &count, domains)
zesPowerGetEnergyCounter(domain, &energy)
- Energy counter in microjoules
- Multiple power domains (package, card, memory)
- Supports Intel Arc GPUs and Gaudi accelerators
Google (TPU Runtime)
tpu_device_get_energy_consumption(device, &energy_j)
- Per-chip energy in joules
- Available on TPU v4 and v5 pods
- Accessed through the TPU runtime API
AWS (Neuron SDK)
neuron_device_get_power(device, &power_mw)
- Per-NeuronCore power in milliwatts
- Available on Inferentia and Trainium instances
- Accessed through the Neuron runtime
Groq (HLML)
Groq's Hardware Library for Machine Learning mirrors the NVML API:
hlmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
- Board-level energy in millijoules
- Available on Groq LPU cards
Cloud Detection
Joule automatically detects available accelerators using:
Device Files
| Path | Accelerator |
|---|---|
/dev/nvidia* | NVIDIA GPU |
/dev/kfd | AMD GPU (ROCm) |
/dev/dri/renderD* | Intel GPU |
/dev/accel* | Google TPU |
/dev/neuron* | AWS Inferentia/Trainium |
Environment Variables
| Variable | Accelerator |
|---|---|
CUDA_VISIBLE_DEVICES | NVIDIA GPU |
ROCR_VISIBLE_DEVICES | AMD GPU |
ZE_AFFINITY_MASK | Intel GPU |
TPU_NAME | Google TPU |
NEURON_RT_NUM_CORES | AWS Inferentia/Trainium |
GROQ_DEVICE_ID | Groq LPU |
JSON Output
Set JOULE_ENERGY_JSON=1 to get structured JSON output with per-device breakdowns:
JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c
Report Format
{
"program": "program.joule",
"timestamp": "2026-03-03T10:30:00Z",
"devices": [
{
"type": "cpu",
"vendor": "intel",
"model": "Xeon w9-3595X",
"energy_joules": 0.00042,
"measurement": "rapl",
"tier": 2
},
{
"type": "gpu",
"vendor": "nvidia",
"model": "H100",
"energy_joules": 0.0031,
"measurement": "nvml",
"tier": 3
}
],
"total_energy_joules": 0.00352,
"functions": [
{
"name": "matrix_multiply",
"energy_joules": 0.0028,
"device": "gpu:0",
"confidence": 0.95,
"budget_joules": 0.005,
"status": "within_budget"
},
{
"name": "preprocess",
"energy_joules": 0.00042,
"device": "cpu",
"confidence": 0.90,
"budget_joules": 0.001,
"status": "within_budget"
}
]
}
Per-Device Breakdown
When multiple accelerators are present, the report includes energy per device:
{
"devices": [
{ "type": "cpu", "energy_joules": 0.0012, "tier": 2 },
{ "type": "gpu", "vendor": "nvidia", "index": 0, "energy_joules": 0.045, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 1, "energy_joules": 0.043, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 2, "energy_joules": 0.044, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 3, "energy_joules": 0.046, "tier": 3 }
],
"total_energy_joules": 0.1792
}
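The totals are additive across devices. Summing the breakdown above reproduces total_energy_joules and gives the GPU share of the run:

```python
import json

# The per-device breakdown shown above, parsed and aggregated.
report = json.loads("""
{
  "devices": [
    { "type": "cpu", "energy_joules": 0.0012, "tier": 2 },
    { "type": "gpu", "vendor": "nvidia", "index": 0, "energy_joules": 0.045, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 1, "energy_joules": 0.043, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 2, "energy_joules": 0.044, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 3, "energy_joules": 0.046, "tier": 3 }
  ]
}
""")

devices = report["devices"]
total = sum(d["energy_joules"] for d in devices)
gpu = sum(d["energy_joules"] for d in devices if d["type"] == "gpu")
print(round(total, 4))         # 0.1792, matching total_energy_joules
print(round(gpu / total, 3))   # GPUs account for ~99% of the energy
```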
Using Accelerator Energy in Code
Energy Budgets on GPU Functions
#[energy_budget(max_joules = 0.05)]
#[gpu_kernel]
fn batch_matmul(a: Tensor, b: Tensor) -> Tensor {
a.matmul(b)
}
The budget is checked against actual GPU energy consumption (Tier 3) when available, or estimated (Tier 1) otherwise.
Runtime Energy Query
use std::energy::{measure, EnergyReport};
let report: EnergyReport = measure(|| {
model.forward(input)
});
println!("CPU energy: {} J", report.cpu_joules());
println!("GPU energy: {} J", report.gpu_joules());
println!("Total: {} J", report.total_joules());
Adaptive Energy Behavior
use std::energy::current_power_draw;
let power = current_power_draw(); // watts
if power > 200.0 {
// Use energy-efficient path
compute_sparse(data)
} else {
// Full compute path
compute_dense(data)
}
Fallback Behavior
When hardware energy APIs are unavailable, Joule falls back gracefully:
- If Tier 3 (accelerator) is unavailable, use Tier 2 (CPU counters) for CPU portions
- If Tier 2 is unavailable, use Tier 1 (static estimation)
- The confidence score reflects which tier was used
No program crashes due to missing energy hardware. The measurement degrades gracefully with reduced precision.
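The fallback order can be pictured as a selection function. The sketch below uses illustrative names and confidence values, not Joule's internal API:

```python
def select_tier(accel_api_available, cpu_counters_available):
    """Pick the best available measurement tier, degrading gracefully."""
    if accel_api_available:
        return 3, 0.95   # vendor accelerator telemetry (illustrative confidence)
    if cpu_counters_available:
        return 2, 0.90   # RAPL / IOReport CPU counters
    return 1, 0.70       # static estimation, always available

print(select_tier(False, True))    # (2, 0.9)
print(select_tier(False, False))   # (1, 0.7)
```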
TensorForge
TensorForge is Joule's energy-aware machine learning framework. It provides a complete ML stack -- from tensor operations to distributed training to inference -- with energy measurement built into every layer.
Architecture
TensorForge is organized as 22 crates in the Joule workspace:
Foundation Crates
| Crate | Purpose |
|---|---|
tf-core | Core types, EnergyTelemetry trait, tensor metadata |
tf-ir | TensorIR: HighOp (14 tensor operations), graph representation |
tf-compiler | OptimizationPass trait, graph rewriting infrastructure, 7 optimization passes |
tf-autodiff | Automatic differentiation with real VJP (vector-Jacobian product) implementations |
tf-hal | Hardware abstraction: Device trait, memory management |
tf-runtime | Tensor execution runtime, memory pools, scheduling |
Backend Crates
| Crate | Hardware Target |
|---|---|
| tf-backend-cpu | x86/ARM CPUs with SIMD |
| tf-backend-cuda | NVIDIA GPUs via CUDA |
| tf-backend-rocm | AMD GPUs via ROCm/HIP |
| tf-backend-metal | Apple GPUs via Metal |
| tf-backend-tpu | Google TPUs |
| tf-backend-level0 | Intel GPUs/accelerators via Level Zero |
| tf-backend-neuron | AWS Inferentia/Trainium via Neuron SDK |
| tf-backend-groq | Groq LPUs |
| tf-backend-gaudi | Intel Gaudi (Habana Labs) |
| tf-backend-estimated | Energy-estimated backend (no hardware required) |
High-Level Crates
| Crate | Purpose |
|---|---|
| tf-nn | Neural network modules (Module trait, layers, activations) |
| tf-optim | Optimizers (AdamW, SGD with momentum) |
| tf-data | Data loading and batching |
| tf-serialize | Model serialization/deserialization |
| tf-distributed | Distributed training (ring, tree, halving-doubling collectives) |
| tf-infer | Inference engine (KV cache, speculative decoding, scheduling) |
EnergyTelemetry Trait
The EnergyTelemetry trait is the foundation of TensorForge's energy awareness. Every backend implements it:
pub trait EnergyTelemetry {
fn energy_consumed_joules(&self) -> f64;
fn power_draw_watts(&self) -> f64;
fn temperature_celsius(&self) -> f64;
fn reset_counters(&mut self);
}
This means every tensor operation -- every matmul, every convolution, every activation -- has a measurable energy cost. The energy data flows up through the framework:
- Individual ops report energy via the backend's telemetry
- The optimizer aggregates energy per training step
- The training loop reports energy per epoch
- The distributed runtime aggregates energy across all nodes
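A minimal sketch of a backend implementing the trait. `SimBackend` and its fields are hypothetical; only the trait signatures come from the definition above:

```joule
struct SimBackend {
    joules: f64,
    watts: f64,
}

impl EnergyTelemetry for SimBackend {
    fn energy_consumed_joules(&self) -> f64 { self.joules }
    fn power_draw_watts(&self) -> f64 { self.watts }
    fn temperature_celsius(&self) -> f64 { 45.0 } // simulated constant
    fn reset_counters(&mut self) { self.joules = 0.0; }
}
```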
TensorIR
TensorForge uses a graph-based intermediate representation with 14 high-level operations:
| Operation | Description |
|---|---|
| MatMul | Matrix multiplication |
| Conv2D | 2D convolution |
| BatchNorm | Batch normalization |
| Relu | ReLU activation |
| Softmax | Softmax |
| Add | Element-wise addition |
| Mul | Element-wise multiplication |
| Reduce | Reduction (sum, mean, max) |
| Reshape | Tensor reshape |
| Transpose | Tensor transpose |
| Concat | Tensor concatenation |
| Slice | Tensor slicing |
| Gather | Index-based gathering |
| Scatter | Index-based scattering |
Graph Optimization
The tf-compiler provides 7 optimization passes:
- Operator Fusion -- Fuse sequences like Conv2D+BatchNorm+ReLU into a single kernel
- Layout Optimization -- Choose optimal memory layout (NCHW vs NHWC) per backend
- Constant Folding -- Evaluate constant subgraphs at compile time
- Dead Node Elimination -- Remove unused computation
- Common Subexpression Elimination -- Share identical computations
- Memory Planning -- Minimize peak memory usage through buffer reuse
- Energy-Aware Scheduling -- Reorder operations to minimize energy consumption
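As a sketch of how these passes might be applied programmatically -- the OptimizationPass trait comes from tf-compiler, but the pass constructors, `run` method, and `build_tensor_ir` helper shown here are illustrative names, not confirmed API:

```joule
use tf_compiler::OptimizationPass;

// Hypothetical pass objects mirroring the list above
let passes: Vec<Box<dyn OptimizationPass>> = vec![
    Box::new(OperatorFusion::new()),
    Box::new(ConstantFolding::new()),
    Box::new(EnergyAwareScheduling::new()),
];

let mut graph = build_tensor_ir(model); // hypothetical helper
for pass in passes {
    graph = pass.run(graph); // each pass rewrites the TensorIR graph
}
```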
Autodiff
TensorForge implements reverse-mode automatic differentiation with real VJP implementations for all operations. No stubs, no placeholders -- every backward pass computes correct gradients:
use tf_autodiff::backward;
let loss = model.forward(input);
let gradients = backward(loss); // real gradient computation
optimizer.step(gradients);
Neural Network API
The tf-nn crate provides a Module trait for building neural networks:
use tf_nn::{Module, Linear, Conv2d, BatchNorm2d, relu};
struct ResBlock {
conv1: Conv2d,
bn1: BatchNorm2d,
conv2: Conv2d,
bn2: BatchNorm2d,
}
impl Module for ResBlock {
fn forward(&self, x: Tensor) -> Tensor {
let residual = x;
let out = self.conv1.forward(x)
|> self.bn1.forward
|> relu
|> self.conv2.forward
|> self.bn2.forward;
relu(out + residual)
}
}
Optimizers
The tf-optim crate provides energy-tracked optimizers:
use tf_optim::{AdamW, SGD};
// AdamW with weight decay
let optimizer = AdamW::new(model.parameters(), lr: 0.001, weight_decay: 0.01);
// SGD with momentum
let optimizer = SGD::new(model.parameters(), lr: 0.01, momentum: 0.9);
Every optimizer step reports energy consumed:
let energy = optimizer.step(gradients);
println!("Step energy: {} J", energy.joules());
Distributed Training
The tf-distributed crate supports multi-node training with three collective algorithms:
| Algorithm | Pattern | Best For |
|---|---|---|
| Ring AllReduce | Each node sends to next neighbor | Large models, high bandwidth |
| Tree AllReduce | Binary tree reduction | Low latency |
| Halving-Doubling | Recursive halving then doubling | Balanced |
Energy is tracked across all nodes, giving total training energy:
use tf_distributed::DistributedTrainer;
let trainer = DistributedTrainer::new(
model,
world_size: 8,
algorithm: CollectiveAlgorithm::Ring,
);
let metrics = trainer.train(dataset, epochs: 10);
println!("Total energy across {} nodes: {} J", 8, metrics.total_energy_joules());
Inference Engine
The tf-infer crate provides a high-performance inference engine with:
Paged KV Cache
Efficient key-value caching for transformer models. Memory is allocated in pages, avoiding fragmentation:
use tf_infer::KvCache;
let cache = KvCache::paged(
num_layers: 32,
num_heads: 32,
head_dim: 128,
page_size: 256,
);
Continuous Batching
Dynamic batching that adds new requests to a running batch without waiting for all current requests to complete:
use tf_infer::ContinuousBatcher;
let batcher = ContinuousBatcher::new(max_batch_size: 64);
batcher.add_request(prompt);
let outputs = batcher.step(); // processes all pending requests
Speculative Decoding
Use a smaller draft model to generate candidates, then verify with the full model:
use tf_infer::SpeculativeDecoder;
let decoder = SpeculativeDecoder::new(
target_model: large_model,
draft_model: small_model,
num_speculative_tokens: 5,
);
Sampling Pipeline
Configurable token sampling with temperature, top-k, top-p, and repetition penalty:
use tf_infer::SamplingConfig;
let config = SamplingConfig {
temperature: 0.7,
top_k: 50,
top_p: 0.9,
repetition_penalty: 1.1,
};
Energy-Aware Scheduling
The inference scheduler considers energy costs when choosing batch sizes and making scheduling decisions. It can enforce energy budgets on individual inference requests:
use tf_infer::EnergyAwareScheduler;
let scheduler = EnergyAwareScheduler::new(
max_energy_per_request: 0.5, // joules
max_power_draw: 200.0, // watts
);
Compiler Integration
TensorForge integrates with the Joule compiler through the joule-codegen-tensorforge crate. When Joule code uses tensor operations, the compiler:
- Lowers tensor expressions to TensorIR
- Applies graph optimization passes
- Selects the backend based on --target
- Generates backend-specific code
- Instruments energy telemetry calls
This means energy budgets work with ML code:
#[energy_budget(max_joules = 10.0)]
fn train_epoch(model: &mut Model, data: DataLoader) -> f64 {
let mut total_loss = 0.0;
for batch in data {
let loss = model.forward(batch.input);
let grads = backward(loss);
optimizer.step(grads);
total_loss = total_loss + loss.item();
}
total_loss
}
Joule Language Reference
The formal specification of Joule's syntax and semantics.
Contents
- Types -- Primitive types, compound types, union types, generics, type inference
- Expressions -- Operators, pipe operator, literals, control flow, closures
- Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
- Patterns -- Pattern matching: or patterns, range patterns, guard clauses
- Attributes -- Energy budgets, #[test], #[bench], thermal awareness, derive macros
- Memory -- Ownership, borrowing, references, lifetimes
- Concurrency -- Async/await, spawn, bounded channels, task groups, parallel for, supervisors
- Energy -- Energy system formal specification with accelerator support
Notation
In syntax descriptions:
- monospace indicates literal syntax
- italics indicate a syntactic category (e.g., expression, type)
- [ ] indicates optional elements
- { } indicates zero or more repetitions
- | separates alternatives
Types
Primitive Types
Integer Types
| Type | Size | Range |
|---|---|---|
| i8 | 8-bit | -128 to 127 |
| i16 | 16-bit | -32,768 to 32,767 |
| i32 | 32-bit | -2^31 to 2^31-1 |
| i64 | 64-bit | -2^63 to 2^63-1 |
| isize | pointer-sized | Platform dependent |
| u8 | 8-bit | 0 to 255 |
| u16 | 16-bit | 0 to 65,535 |
| u32 | 32-bit | 0 to 2^32-1 |
| u64 | 64-bit | 0 to 2^64-1 |
| usize | pointer-sized | Platform dependent |
Integer literals default to i32. Use suffixes for other types: 42u8, 100i64, 0usize.
Floating-Point Types
| Type | Size | Precision |
|---|---|---|
| f16 | 16-bit | ~3 decimal digits (IEEE 754 half-precision) |
| bf16 | 16-bit | ~3 decimal digits (Brain Float, ML workloads) |
| f32 | 32-bit | ~7 decimal digits |
| f64 | 64-bit | ~15 decimal digits |
Float literals default to f64. Use suffix for f32: 3.14f32.
f16 and bf16 are half-precision types for ML inference and signal processing. bf16 has the same exponent range as f32 but fewer mantissa bits — ideal for neural network weights. Energy cost: 0.4 pJ per operation (vs 0.35 pJ for f32).
Boolean
let a: bool = true;
let b: bool = false;
Character
let c: char = 'A'; // Unicode scalar value
let emoji: char = '\u{1F600}';
Unit Type
The unit type () represents the absence of a meaningful value. Functions without a return type return ().
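For example, a function declared without a return type yields ():

```joule
fn log_status(msg: &str) {
    println!("status: {}", msg);
}

let unit: () = log_status("ready"); // functions without a return type return ()
```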
Compound Types
Tuples
Fixed-length, heterogeneous sequences:
let pair: (i32, String) = (42, "hello");
let (x, y) = pair; // destructuring
let first = pair.0; // field access
Arrays
Fixed-length, homogeneous sequences:
let arr: [i32; 5] = [1, 2, 3, 4, 5];
let zeros = [0; 10]; // 10 zeros
let first = arr[0]; // indexing
Slices
Dynamically-sized views into arrays:
let slice: &[i32] = &arr[1..3];
String Types
String
Owned, heap-allocated, growable UTF-8 string:
let s: String = String::from("Hello, world!");
let greeting = "Hi " + name; // concatenation
let len = s.len(); // byte length
&str
Borrowed string slice:
let s: &str = "literal";
Union Types
Union types allow a value to be one of several types. They are declared with the | separator:
type Number = i32 | i64 | f64;
type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;
type Outcome = Data | ErrorCode;
Union types are matched exhaustively:
fn describe(val: Number) -> String {
match val {
x: i32 => format!("i32: {}", x),
x: i64 => format!("i64: {}", x),
x: f64 => format!("f64: {}", x),
}
}
Union Type Rules
- Each constituent type must be distinct
- The compiler tracks which variant is active at runtime via a discriminant tag
- Pattern matching on union types is exhaustive -- all variants must be handled
- Union types compose: type A = B | C, where B and C can themselves be union types
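For example -- assuming, as the composition rule implies, that a composed union can be matched directly on its underlying constituent types:

```joule
type Scalar = i32 | f64;
type Value = Scalar | String; // Scalar is itself a union type

fn show(v: Value) -> String {
    match v {
        x: i32 => format!("int: {}", x),
        x: f64 => format!("float: {}", x),
        s: String => s,
    }
}
```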
Generic Types
Vec
Dynamic array:
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];
Option
Optional value:
let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;
Result<T, E>
Fallible operation:
let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("failed");
Box
Heap-allocated value:
let boxed: Box<i32> = Box::new(42);
HashMap<K, V>
Key-value map:
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);
Smart Pointers
See Smart Pointers for full documentation.
let rc = Rc::new(42); // single-threaded shared ownership
let arc = Arc::new(42); // thread-safe shared ownership
let cow = Cow::borrowed("hi"); // clone-on-write
Const-Generic Types
// SmallVec — inline buffer with heap spillover
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42); // stored inline (no allocation until > 8 elements)
// Simd — portable SIMD vectors
let v: Simd[f32; 4] = Simd::splat(1.0);
let w: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let sum = v.add(&w);
See Simd for full SIMD documentation.
N-Dimensional Arrays
See NDArray for full documentation.
let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let view: NDView[f64; 1] = mat.row(0);
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Type Inference
The compiler infers types when possible:
let x = 42; // inferred as i32
let mut v = Vec::new(); // type inferred from usage
v.push(1u8); // now inferred as Vec<u8>
Explicit annotations are required when the type cannot be inferred from context.
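For example, a collection that is never used afterward gives the compiler nothing to infer from:

```joule
// error: cannot infer element type -- v is never used afterward
// let v = Vec::new();

// fix: annotate the element type explicitly
let v: Vec<f64> = Vec::new();
```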
Type Casting
Use as for numeric conversions:
let x: i32 = 42;
let y: f64 = x as f64;
let z: u8 = x as u8; // truncation
let p: usize = x as usize;
Expressions
Joule is expression-oriented. Most constructs return a value, including if, match, and blocks.
Literals
42 // integer (i32)
3.14 // float (f64)
true // bool
'A' // char
"hello" // String
Integer Literal Suffixes
42i8 42i16 42i32 42i64 42isize
42u8 42u16 42u32 42u64 42usize
Float Literal Suffixes
3.14f32 3.14f64
Arithmetic Operators
| Operator | Operation | Types |
|---|---|---|
| + | Addition | integers, floats, String concatenation |
| - | Subtraction | integers, floats |
| * | Multiplication | integers, floats |
| / | Division | integers, floats |
| % | Remainder | integers |
| ** | Exponentiation | integers, floats (right-associative) |
Comparison Operators
| Operator | Operation |
|---|---|
| == | Equal |
| != | Not equal |
| < | Less than |
| > | Greater than |
| <= | Less or equal |
| >= | Greater or equal |
All comparison operators return bool.
Logical Operators
| Operator | Operation |
|---|---|
| && | Logical AND (short-circuit) |
| \|\| | Logical OR (short-circuit) |
| ! | Logical NOT |
Bitwise Operators
| Operator | Operation |
|---|---|
| & | Bitwise AND |
| \| | Bitwise OR |
| ^ | Bitwise XOR |
| ~ | Bitwise NOT |
| << | Left shift |
| >> | Right shift |
Pipe Operator
The pipe operator |> passes the result of the left-hand expression as the first argument to the right-hand function:
// Without pipe
let result = process(transform(parse(input)));
// With pipe -- reads left to right
let result = input |> parse |> transform |> process;
Pipe with Multi-Argument Functions
When the right-hand side is a call with arguments, the piped value is inserted as the first argument:
let result = data
|> filter(|x| x > 0)
|> map(|x| x * 2)
|> take(10);
Pipe Precedence
The pipe operator has lower precedence than all other operators except assignment. It is left-associative:
// These are equivalent:
a |> f |> g
g(f(a))
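Because |> binds more loosely than arithmetic, the left operand is fully evaluated before piping (double is a hypothetical function):

```joule
let y = x + 1 |> double;  // parsed as (x + 1) |> double, not x + (1 |> double)
```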
Assignment
let mut x = 0;
x = 42;
Compound assignment is not supported. Use x = x + 1 instead of x += 1.
Block Expressions
A block evaluates to its last expression:
let result = {
let a = 10;
let b = 20;
a + b // no semicolon -- this is the block's value
};
// result == 30
If Expressions
if is an expression and returns a value:
let max = if a > b { a } else { b };
Without else, the type is ():
if condition {
do_something();
}
Chained:
if x > 0 {
"positive"
} else if x < 0 {
"negative"
} else {
"zero"
}
Match Expressions
Exhaustive pattern matching:
let name = match color {
Color::Red => "red",
Color::Green => "green",
Color::Blue => "blue",
};
See Patterns for pattern syntax.
Loops
While Loop
while condition {
body();
}
For Loop
for item in collection {
process(item);
}
Loop (Infinite)
loop {
if done() {
break;
}
}
Break and Continue
loop {
if skip_this() {
continue;
}
if finished() {
break;
}
}
Function Calls
let result = add(1, 2);
Method Calls
let len = string.len();
let upper = string.to_uppercase();
Field Access
let x = point.x;
let name = person.name;
Index Access
let first = vec[0];
let ch = string[i];
Struct Construction
let p = Point { x: 3.0, y: 4.0 };
Enum Variant Construction
let c = Shape::Circle { radius: 5.0 };
let ok = Result::Ok(42);
Return
Explicit return from a function:
fn find(items: Vec<i32>, target: i32) -> Option<i32> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(items[i]);
}
i = i + 1;
}
Option::None
}
Type Cast
let x = 42i32 as f64;
let y = offset as usize;
Items
Items are the top-level declarations in a Joule program.
Functions
fn name(param: Type, param2: Type) -> ReturnType {
body
}
Visibility
pub fn public_function() { } // visible outside module
fn private_function() { } // module-private (default)
Parameters
Parameters are passed by value (move) by default:
fn process(data: Vec<u8>) {
// data is moved into this function
}
Use references for borrowing:
fn inspect(data: &Vec<u8>) {
// data is borrowed immutably
}
fn modify(data: &mut Vec<u8>) {
// data is borrowed mutably
}
Self Parameter
Methods take self as their first parameter:
impl Point {
fn distance(self) -> f64 { } // takes ownership
fn inspect(self) -> f64 { } // immutable self
fn translate(mut self, dx: f64) { } // mutable self
}
Generic Functions
fn first<T>(items: Vec<T>) -> Option<T> {
if items.len() > 0 {
Option::Some(items[0])
} else {
Option::None
}
}
Extern Functions
Functions implemented outside Joule (FFI):
extern fn sqrt(x: f64) -> f64;
extern fn malloc(size: usize) -> *mut u8;
Const Functions
Functions that can be evaluated at compile time are declared with const fn:
const fn max(a: i32, b: i32) -> i32 {
if a > b { a } else { b }
}
const fn factorial(n: i32) -> i32 {
if n <= 1 { 1 } else { n * factorial(n - 1) }
}
// Use at compile time
const MAX_SIZE: i32 = max(100, 200);
const FACT_10: i32 = factorial(10);
Const Function Restrictions
const fn bodies are restricted to operations the compiler can evaluate:
- Arithmetic operations on primitive types
- Control flow (if, match, recursion)
- Local variable bindings
- Calling other const fn functions
The following are not allowed in const fn:
- Heap allocation (Vec::new(), Box::new())
- I/O operations
- Mutable static state
- Non-const function calls
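For example, the compiler rejects a const fn that allocates:

```joule
const fn build() -> Vec<i32> {
    Vec::new() // error: heap allocation is not allowed in const fn
}
```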
Comptime Blocks
For more complex compile-time computation, use comptime blocks:
comptime {
let table = generate_sin_table(1024);
}
// table is available as a compile-time constant
fn fast_sin(x: f64) -> f64 {
let index = (x * 1024.0 / TAU) as usize;
table[index % 1024]
}
Comptime blocks execute during compilation and make their results available as constants in runtime code. The HIR const evaluator handles arithmetic, control flow, and function calls within comptime blocks.
Structs
Named product types with fields:
pub struct Point {
pub x: f64,
pub y: f64,
}
Field Visibility
Fields are private by default. Use pub to make them accessible:
pub struct Config {
pub name: String, // public
secret_key: String, // private
}
Generic Structs
pub struct Pair<A, B> {
pub first: A,
pub second: B,
}
Enums
Sum types (tagged unions) with variants:
pub enum Color {
Red,
Green,
Blue,
}
Variants with Data
pub enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Point,
}
Tuple Variants
pub enum Option<T> {
Some(T),
None,
}
pub enum Result<T, E> {
Ok(T),
Err(E),
}
Generic Enums
pub enum Either<L, R> {
Left(L),
Right(R),
}
Impl Blocks
Associate methods with a type:
impl Point {
// Associated function (no self)
pub fn new(x: f64, y: f64) -> Point {
Point { x, y }
}
// Method (takes self)
pub fn distance(self) -> f64 {
(self.x * self.x + self.y * self.y).sqrt()
}
}
Multiple impl blocks are allowed for the same type:
impl Point {
pub fn new(x: f64, y: f64) -> Point { Point { x, y } }
}
impl Point {
pub fn translate(mut self, dx: f64, dy: f64) {
self.x = self.x + dx;
self.y = self.y + dy;
}
}
Traits
Define shared behavior:
pub trait Display {
fn to_string(self) -> String;
}
pub trait Clone {
fn clone(self) -> Self;
}
Trait Implementation
impl Display for Point {
fn to_string(self) -> String {
"(" + self.x.to_string() + ", " + self.y.to_string() + ")"
}
}
Trait Bounds
fn print_all<T: Display>(items: Vec<T>) {
for item in items {
println!("{}", item.to_string());
}
}
Dynamic Dispatch
Use dyn Trait for runtime polymorphism:
fn print_shape(shape: &dyn Display) {
println!("{}", shape.to_string());
}
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Modules
Module Declarations
Modules organize code into separate files. The mod keyword declares a module:
mod lexer; // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;
File Resolution
When the compiler encounters mod foo;, it searches for:
- foo.joule in the same directory as the current file
- foo/mod.joule for modules with sub-modules
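A typical layout for the declarations above:

```
src/
  main.joule     // contains: mod lexer; mod parser;
  lexer.joule    // resolved for `mod lexer;`
  parser/
    mod.joule    // resolved for `mod parser;` (parser has sub-modules)
```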
Public Module Re-exports
pub mod utils; // re-exports utils module to parent
Inline Modules
Modules can be defined inline within a file:
mod helpers {
pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
if x < lo { lo } else if x > hi { hi } else { x }
}
}
// Use items from inline module
let clamped = helpers::clamp(value, 0, 100);
Visibility
pub mod public_module { }
mod private_module { }
Use Declarations
Import items into scope:
// Import specific items
use crate::ast::{File, AstItem, Visibility};
// Import all items from a module
use crate::prelude::*;
// Standard library imports
use std::collections::HashMap;
use std::math::*;
// Import with alias
use crate::ast::File as AstFile;
Stdlib Path
The --stdlib-path CLI flag specifies the location of the standard library. The builtin registry includes modules for math, statistics, and compute:
use std::math::*; // sin, cos, sqrt, etc.
use std::statistics::*; // mean, median, std_dev
Let Statements
Variable bindings:
let x = 42; // immutable, type inferred
let y: f64 = 3.14; // immutable, explicit type
let mut z = 0; // mutable
let (a, b) = (1, 2); // destructuring
Patterns
Patterns are used in match expressions, let bindings, and function parameters to destructure values.
Literal Patterns
match x {
0 => "zero",
1 => "one",
_ => "other",
}
Identifier Patterns
Bind the matched value to a name:
match value {
x => println!("Got: {}", x),
}
Wildcard Pattern
_ matches any value and discards it:
match pair {
(x, _) => println!("First: {}", x),
}
Enum Variant Patterns
Tuple Variants
match option {
Option::Some(value) => use_value(value),
Option::None => handle_empty(),
}
Named Field Variants
match shape {
Shape::Circle { radius } => 3.14159 * radius * radius,
Shape::Rectangle { width, height } => width * height,
Shape::Point => 0.0,
}
Nested Patterns
match result {
Result::Ok(Option::Some(value)) => use_value(value),
Result::Ok(Option::None) => handle_none(),
Result::Err(e) => handle_error(e),
}
Struct Patterns
let Point { x, y } = point;
In match:
match token {
Token { kind: TokenKind::Fn, span } => parse_function(span),
Token { kind: TokenKind::Struct, span } => parse_struct(span),
_ => parse_expression(),
}
Tuple Patterns
let (a, b) = (1, 2);
match pair {
(0, 0) => "origin",
(x, 0) => "x-axis",
(0, y) => "y-axis",
(x, y) => "other",
}
Reference Patterns
match &value {
&Option::Some(x) => use_value(x),
&Option::None => handle_none(),
}
Or Patterns
Match multiple alternatives in a single arm using |:
match x {
1 | 2 | 3 => "small",
4 | 5 | 6 => "medium",
_ => "large",
}
Or patterns work with enum variants:
match direction {
Direction::North | Direction::South => "vertical",
Direction::East | Direction::West => "horizontal",
}
They also work with nested patterns:
match result {
Result::Ok(1 | 2 | 3) => "small success",
Result::Ok(_) => "other success",
Result::Err(_) => "failure",
}
Range Patterns
Match a contiguous range of values using ..= (inclusive):
match score {
0..=59 => "F",
60..=69 => "D",
70..=79 => "C",
80..=89 => "B",
90..=100 => "A",
_ => "invalid",
}
Range patterns work with integer types:
match byte {
0x00..=0x1F => "control character",
0x20..=0x7E => "printable ASCII",
0x7F => "delete",
_ => "extended",
}
And with characters:
match c {
'a'..='z' => "lowercase",
'A'..='Z' => "uppercase",
'0'..='9' => "digit",
_ => "other",
}
Guard Clauses
Add a boolean condition to a match arm with if:
match value {
x if x > 100 => "large",
x if x > 0 => "positive",
x if x < 0 => "negative",
_ => "zero",
}
Guards can reference variables bound in the pattern:
match point {
Point { x, y } if x == y => "on diagonal",
Point { x, y } if x == 0 => "on y-axis",
Point { x, y } if y == 0 => "on x-axis",
_ => "general",
}
Guards combine with or patterns:
match value {
1 | 2 | 3 if verbose => {
println!("small value: {}", value);
"small"
}
_ => "other",
}
Guard Evaluation
- The guard expression is evaluated only if the structural pattern matches
- Guards do not affect exhaustiveness checking -- the compiler still requires all variants to be covered
- If the guard evaluates to false, matching continues to the next arm
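For example, a value that fails a guard falls through to the next arm:

```joule
match n {
    x if x > 10 => "large",
    x => "ten or less", // reached when the guard above evaluates to false
}
```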
Exhaustiveness
The compiler verifies that match expressions cover all possible cases. Omitting a variant produces a compile-time error:
error: non-exhaustive match
--> program.joule:10:5
|
10 | match color {
| ^^^^^ missing variants: Blue
Use _ as a catch-all when you don't need to handle every variant explicitly.
Attributes
Attributes are metadata attached to items (functions, structs, enums) that modify their behavior or provide information to the compiler.
Syntax
Attributes are placed above the item they annotate, prefixed with #[...]:
#[attribute_name]
fn function() { }
#[attribute_name(key = value)]
fn function_with_args() { }
Energy Budget
The primary attribute in Joule. Declares the maximum energy a function is allowed to consume:
#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
x + y
}
Parameters
| Parameter | Type | Description |
|---|---|---|
| max_joules | f64 | Maximum energy in joules |
| max_watts | f64 | Maximum average power in watts |
| max_temp_delta | f64 | Maximum temperature rise in degrees Celsius |
Multiple parameters can be combined:
#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path() { }
See Energy System Guide for details.
Thermal Awareness
Marks a function as thermal-aware. The compiler may insert thermal throttling checks:
#[thermal_aware]
fn heavy_compute(data: Vec<f64>) -> f64 {
// ...
}
Test
Marks a function as a test. Test functions are collected and executed when the compiler runs with --test:
#[test]
fn test_addition() {
assert_eq!(add(2, 3), 5);
}
#[test]
fn test_sort_correctness() {
let data = vec![5, 3, 1, 4, 2];
let sorted = sort(data);
assert_eq!(sorted[0], 1);
assert_eq!(sorted[4], 5);
}
Test Energy Reporting
Every test run includes energy consumption data. The test runner reports:
- Pass/fail status
- Energy consumed by each test (in joules)
- Total energy across all tests
joulec program.joule --test
Output:
running 3 tests
test test_addition ... ok (0.000012 J)
test test_sort_correctness ... ok (0.000089 J)
test test_fibonacci ... ok (0.000341 J)
test result: ok. 3 passed; 0 failed
total energy: 0.000442 J
Bench
Marks a function as a benchmark. Benchmark functions are collected and executed when the compiler runs with --bench:
#[bench]
fn bench_matrix_multiply() {
let a = Matrix::random(100, 100);
let b = Matrix::random(100, 100);
let _ = a.multiply(b);
}
#[bench]
fn bench_sort_large() {
let data = generate_random_vec(10000);
let _ = sort(data);
}
Bench Energy Reporting
Benchmarks report timing and energy data over multiple iterations:
joulec program.joule --bench
Output:
running 2 benchmarks
bench bench_matrix_multiply ... 1,234 ns/iter (+/- 56) | 0.00185 J/iter
bench bench_sort_large ... 892 ns/iter (+/- 23) | 0.00134 J/iter
total energy: 3.19 J (1000 iterations each)
Derive
Automatically implement traits for a type:
#[derive(Clone, Debug)]
pub struct Point {
pub x: f64,
pub y: f64,
}
Available Derive Traits
| Trait | Description |
|---|---|
| Clone | Value can be duplicated |
| Debug | Debug string representation |
| Eq | Equality comparison |
| Serialize | Serialization support |
GPU Kernel
Marks a function for GPU execution (requires MLIR backend):
#[gpu_kernel]
fn vector_add(a: Vec<f32>, b: Vec<f32>) -> Vec<f32> {
// ...
}
Visibility
While not strictly an attribute, visibility modifiers control access:
pub fn public_function() { } // visible everywhere
pub(crate) fn crate_function() { } // visible within the crate
fn private_function() { } // module-private (default)
Memory Model
Joule uses an ownership-based memory model inspired by Rust. Memory is managed at compile time with no garbage collector.
Ownership
Every value has exactly one owner. When the owner goes out of scope, the value is dropped (memory freed):
fn example() {
let s = String::from("hello"); // s owns the string
process(s); // ownership moves to process()
// s is no longer valid here
}
Move Semantics
Assignment and function calls transfer ownership by default:
let a = Vec::new();
let b = a; // a is moved to b
// a is no longer valid
References
References borrow a value without taking ownership:
Immutable References
fn inspect(data: &Vec<i32>) {
let len = data.len();
// data is borrowed, not consumed
}
let v = Vec::new();
inspect(&v); // borrow v
// v is still valid here
Multiple immutable references can coexist:
let r1 = &v;
let r2 = &v; // ok: multiple immutable borrows
Mutable References
fn modify(data: &mut Vec<i32>) {
data.push(42);
}
let mut v = Vec::new();
modify(&mut v); // mutable borrow
Only one mutable reference can exist at a time:
let r1 = &mut v;
// let r2 = &mut v; // error: cannot borrow mutably twice
Borrowing Rules
The borrow checker enforces these rules at compile time:
- At any given time, you can have either:
- One mutable reference, OR
- Any number of immutable references
- References must always be valid -- no dangling references
- No mutable aliasing -- if a mutable reference exists, no other references to the same data can exist
Lifetimes
Lifetimes ensure references don't outlive the data they point to (planned):
fn first_word<'a>(s: &'a str) -> &'a str {
// The returned reference lives as long as the input
s.split(" ").next().unwrap_or("")
}
Box
Heap allocation with single ownership:
let boxed = Box::new(42); // allocate on the heap
let value = *boxed; // dereference
// Required for recursive types
pub enum List<T> {
Cons(T, Box<List<T>>),
Nil,
}
Box auto-derefs for field access:
let expr = Box::new(Expr { kind: ExprKind::Literal(42), span: Span::dummy() });
let kind = expr.kind; // auto-deref through Box
Raw Pointers
For unsafe, low-level memory access:
unsafe {
let ptr: *mut i32 = addr as *mut i32;
*ptr = 42;
}
Raw pointers bypass the borrow checker. Use only when necessary and always within unsafe blocks.
Stack vs. Heap
| Allocation | When | Performance |
|---|---|---|
| Stack | Local variables, small types | Fast (pointer bump) |
| Heap (Box) | Recursive types, large data, dynamic size | Slower (allocator call) |
| Heap (Vec) | Dynamic arrays | Amortized fast |
The compiler places values on the stack by default. Use Box<T> to explicitly heap-allocate.
Concurrency
Joule provides structured concurrency primitives for safe parallel execution.
Async/Await
Functions that perform asynchronous operations are marked async:
async fn fetch_data(url: String) -> Result<String, Error> {
let response = http::get(url).await?;
Result::Ok(response.body())
}
The await keyword suspends execution until the asynchronous operation completes.
Async Energy Tracking
Async operations are fully energy-tracked. The compiler inserts timing wrappers around Spawn, TaskAwait, TaskGroupEnter, and TaskGroupExit operations to measure the energy consumed by asynchronous work:
#[energy_budget(max_joules = 0.005)]
async fn process_pipeline(urls: Vec<String>) -> Vec<Data> {
let mut results: Vec<Data> = Vec::new();
for url in urls {
let data = fetch_data(url).await; // energy tracked
results.push(data);
}
results
}
Desugaring
Async functions are desugared to state machines backed by Task types. The await keyword becomes a yield point that checks for task completion and records energy consumed during the suspension.
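Conceptually, the desugaring produces something like the following. This is an illustrative sketch only: the state enum and its names are internal compiler artifacts, not surface syntax, and are invented here for exposition:

```joule
// async fn fetch_data(url: String) -> Result<String, Error>
// desugars to roughly:
enum FetchDataState {
    Start(String),                // entry state, holding `url`
    AwaitingGet(Task<Response>),  // suspended at the .await yield point
    Done,
}

// Each resume checks whether the awaited task has completed and
// records the energy consumed during the suspension.
```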
Spawn
Launch a concurrent task:
use std::concurrency::spawn;
let handle = spawn(|| {
heavy_computation()
});
let result = handle.join();
Task Pool
Under the hood, spawn submits work to a pthread-based task pool with 256 task slots. Tasks are distributed across worker threads and managed by the runtime:
- Worker threads are pre-allocated (one per CPU core)
- Tasks are stored in a fixed-size array (256 slots)
- Task submission is lock-free on the fast path
- Energy consumption is tracked per-task with thread-safe atomic counters
Channels
Send values between tasks using bounded channels:
use std::concurrency::{channel, Sender, Receiver};
let (tx, rx) = channel(capacity: 100);
spawn(|| {
for i in 0..1000 {
tx.send(i); // blocks when buffer is full
}
});
let value = rx.recv(); // blocks when buffer is empty
Bounded Channel Implementation
Channels are implemented as ring buffers protected by mutex/condvar pairs:
- Capacity: Specified at creation time, provides backpressure
- Blocking: send() blocks when the buffer is full; recv() blocks when empty
- Thread Safety: Mutex protects the ring buffer; condvars signal producers and consumers
- Energy: Channel operations are energy-tracked -- both send and receive costs are attributed to the calling task
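The ring-buffer-plus-condvar design described above can be modeled in a short Python sketch. This is an illustrative analogue of the documented behavior, not Joule's actual runtime; class and method names mirror the channel API for clarity:

```python
import threading

class BoundedChannel:
    """Ring buffer guarded by a mutex, with condvars for backpressure."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0      # next slot to read
        self.count = 0     # items currently buffered
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def send(self, value):
        with self.not_full:
            while self.count == self.capacity:   # block when full
                self.not_full.wait()
            self.buf[(self.head + self.count) % self.capacity] = value
            self.count += 1
            self.not_empty.notify()              # wake one consumer

    def recv(self):
        with self.not_empty:
            while self.count == 0:               # block when empty
                self.not_empty.wait()
            value = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            self.not_full.notify()               # wake one producer
            return value
```

A producer that outruns a capacity-1 channel blocks in send() until the consumer drains a slot, which is exactly the backpressure the bounded channel provides.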
Unbounded Channels
For cases where backpressure is not needed:
let (tx, rx) = channel(); // unbounded (grows as needed)
Task Groups
Structured concurrency with automatic cancellation:
use std::concurrency::TaskGroup;
let group = TaskGroup::new();
group.spawn(|| process_chunk_1());
group.spawn(|| process_chunk_2());
group.spawn(|| process_chunk_3());
let results = group.join_all(); // waits for all tasks
If any task panics, the group cancels all remaining tasks. Energy consumption is aggregated across all tasks in the group.
Parallel For
Parallel iteration distributes work across threads automatically:
let results = parallel for item in data {
heavy_computation(item)
};
With explicit chunk size:
let processed = parallel(chunk_size: 1024) for row in matrix {
transform(row)
};
The compiler sums energy consumption across all parallel branches for budget enforcement.
Mutex
Mutual exclusion for shared state:
use std::concurrency::Mutex;
let counter = Mutex::new(0);
// In a concurrent task:
let mut guard = counter.lock();
*guard = *guard + 1;
// guard dropped here, lock released
Atomic Types
Lock-free primitives for simple shared state:
use std::concurrency::AtomicI32;
let counter = AtomicI32::new(0);
counter.fetch_add(1);
let value = counter.load();
Supervisors
Supervisors manage task lifecycles with restart strategies:
use std::concurrency::Supervisor;
let sup = Supervisor::new(RestartStrategy::OneForOne);
sup.spawn("worker", || {
process_queue()
});
sup.run();
Restart strategies:
| Strategy | Behavior |
|---|---|
| OneForOne | Only the failed task is restarted |
| OneForAll | All tasks are restarted when one fails |
| RestForOne | The failed task and all tasks started after it are restarted |
Safety Guarantees
The ownership system prevents data races at compile time:
- Shared state must be wrapped in Mutex, Atomic, or other synchronization primitives
- The borrow checker ensures no mutable aliasing across tasks
- Task groups provide structured lifetimes for spawned work
- Channels provide safe, typed communication between tasks
- Energy tracking is thread-safe using atomic counters
Energy System Specification
This is the formal specification for Joule's compile-time energy verification system.
Overview
The energy system consists of:
- Energy budget attributes -- Programmer-declared constraints on function energy consumption
- Energy estimator -- Static analysis that estimates energy from HIR
- Energy cost model -- Calibrated per-instruction energy costs
- Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
- Accelerator energy -- Runtime measurement for GPUs and other accelerators
- Diagnostics -- Error messages when budgets are violated
Attribute Syntax
#[energy_budget( budget_param { , budget_param } )]
Where budget_param is one of:
| Parameter | Type | Unit | Description |
|---|---|---|---|
| max_joules | f64 | joules | Maximum total energy |
| max_watts | f64 | watts | Maximum average power |
| max_temp_delta | f64 | celsius | Maximum temperature rise |
Estimation Model
Instruction Costs
The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:
| Instruction | Base Cost (pJ) | Thermal Scaling |
|---|---|---|
| IntAdd | 0.05 | Linear |
| IntSub | 0.05 | Linear |
| IntMul | 0.35 | Linear |
| IntDiv | 3.5 | Linear |
| IntRem | 3.5 | Linear |
| FloatAdd | 0.35 | Quadratic |
| FloatSub | 0.35 | Quadratic |
| FloatMul | 0.35 | Quadratic |
| FloatDiv | 3.5 | Quadratic |
| FloatSqrt | 5.25 | Quadratic |
| MemLoadL1 | 0.5 | Linear |
| MemLoadL2 | 3.0 | Linear |
| MemLoadL3 | 10.0 | Linear |
| MemLoadDram | 200.0 | Linear |
| MemStoreDram | 200.0 | Linear |
| BranchTaken | 0.1 | None |
| BranchNotTaken | 0.1 | None |
| BranchMispredicted | 1.5 | None |
| SimdF32x8Add | 1.5 | Quadratic |
| SimdF32x8Mul | 1.5 | Quadratic |
| SimdF32x8Div | 7.0 | Quadratic |
| SimdF32x8Fma | 2.0 | Quadratic |
Thermal Scaling
Actual cost = base_cost * thermal_factor, where thermal_factor depends on the thermal model:
- None: cost is constant regardless of temperature
- Linear: actual = base * (1.0 + 0.3 * thermal_state)
- Quadratic: actual = base * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)
Default thermal state: 0.3 (nominal operating temperature).
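The scaling rules above can be checked numerically with a minimal Python sketch. The constants come directly from the formulas and the cost table; the function name is illustrative:

```python
def scaled_cost(base_pj, model, thermal_state=0.3):
    """Apply the thermal scaling model to a base instruction cost (pJ)."""
    if model == "None":
        return base_pj
    if model == "Linear":
        return base_pj * (1.0 + 0.3 * thermal_state)
    if model == "Quadratic":
        return base_pj * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state ** 2)
    raise ValueError(f"unknown thermal model: {model}")

# IntAdd (Linear, 0.05 pJ) at the default thermal state of 0.3:
#   0.05 * (1.0 + 0.09) = 0.0545 pJ
int_add = scaled_cost(0.05, "Linear")

# FloatAdd (Quadratic, 0.35 pJ):
#   0.35 * (1.0 + 0.09 + 0.009) = 0.38465 pJ
float_add = scaled_cost(0.35, "Quadratic")
```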
Expression Costs
| Expression | Cost |
|---|---|
| Literal | 0.01 pJ |
| Variable access | L1 load |
| Binary operation | left + right + op_cost |
| Unary operation | inner + op_cost |
| Function call | args + branch + 2x L1 (stack) |
| Method call | receiver + args + branch + 3x L1 |
| Field access | inner + IntAdd + L1 |
| Index access | array + index + IntMul + IntAdd + branch (bounds) + L1 |
| Struct construction | fields + (field_count x L1) |
| Array construction | elements + (element_count x L1) |
Loop Estimation
- Known bounds: body_cost * iteration_count
- Unknown bounds: body_cost * default_iterations (100)
- Max iterations cap: 10,000
- PGO-refined: body_cost * actual_trip_count (from profile data)
Unknown-bound loops reduce confidence by 0.7x. PGO data restores confidence to 0.95x.
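The loop rules above amount to a small piece of arithmetic, sketched here in Python (function and parameter names are illustrative, not part of the compiler's API):

```python
def loop_cost(body_cost_pj, trip_count=None, pgo_count=None,
              default_iterations=100, max_iterations=10_000):
    """Estimate loop energy; returns (cost_pj, confidence_factor)."""
    if pgo_count is not None:
        # PGO-refined trip count restores confidence to 0.95x
        return body_cost_pj * min(pgo_count, max_iterations), 0.95
    if trip_count is not None:
        # Statically known bounds, capped at the max-iterations limit
        return body_cost_pj * min(trip_count, max_iterations), 1.0
    # Unknown bounds: assume the default count, reduce confidence by 0.7x
    return body_cost_pj * default_iterations, 0.7
```

For example, a 2 pJ body with a known trip count of 50 estimates to 100 pJ at full confidence, while the same body with unknown bounds estimates to 200 pJ at a 0.7 confidence factor.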
Branch Estimation
- if/else: condition + avg(then_cost, else_cost) + branch_cost
- match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)
Branches reduce confidence by 0.9x (if/else) or 0.85x (match).
Confidence Score
Range: 0.0 to 1.0
- Straight-line code: 1.0
- Each if/else: multiply by 0.9
- Each match: multiply by 0.85
- Each unbounded loop: multiply by 0.7
- PGO-refined loop: multiply by 0.95
The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
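Since the factors multiply, the overall score for a function body can be sketched directly from the counts of each construct (a simplified model; parameter names are illustrative):

```python
def confidence(n_if=0, n_match=0, n_unbounded_loops=0, n_pgo_loops=0):
    """Combine the per-construct confidence factors listed above.

    Straight-line code scores 1.0; a PGO-refined loop contributes
    0.95 instead of the 0.7 an unbounded loop would.
    """
    c = 1.0
    c *= 0.9 ** n_if
    c *= 0.85 ** n_match
    c *= 0.7 ** n_unbounded_loops
    c *= 0.95 ** n_pgo_loops
    return c

# One if/else plus one unbounded loop: 0.9 * 0.7 = 0.63
score = confidence(n_if=1, n_unbounded_loops=1)
```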
Energy IR (EIR)
The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:
HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR
EIR nodes include:
- EirExpr -- Expressions with energy costs
- EirStmt -- Statements with energy costs
- EirBody -- Function bodies with total energy and effect sets
Effect Sets
EIR tracks side effects using EffectSet:
- Pure (no effects)
- IO (reads/writes)
- Alloc (heap allocation)
- Panic (may abort)
The e-graph optimizer uses effect information to determine which rewrites are safe.
E-Graph Optimization
When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:
- Arithmetic simplification (x + 0 -> x, x * 1 -> x)
- Constant folding
- Dead code elimination
- Common subexpression elimination
- Strength reduction (x * 2 -> x << 1)
- Energy-aware rewrites (prefer lower-energy equivalent operations)
Three-Tier Measurement
Tier 1: Static Estimation
Compile-time energy estimation using the instruction cost model. Available for all programs, no hardware access required.
Tier 2: CPU Performance Counters
Runtime measurement using hardware performance counters:
- Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
- Apple Silicon: powermetrics integration
Tier 3: Accelerator Energy
Runtime measurement using vendor-specific APIs:
| Vendor | API | Measurement |
|---|---|---|
| NVIDIA | NVML (nvmlDeviceGetTotalEnergyConsumption) | Board power, per-GPU |
| AMD | ROCm SMI (rsmi_dev_power_ave_get) | Average power, per-GPU |
| Intel | Level Zero (zesDeviceGetProperties + power domains) | Per-device power |
| Google | TPU Runtime | Per-chip power |
| AWS | Neuron SDK | Per-core power |
| Groq | HLML (hlmlDeviceGetTotalEnergyConsumption) | Board power |
| Cerebras | CS SDK | Wafer-scale power |
| SambaNova | DataScale API | Per-RDU power |
See Accelerator Energy Measurement for details.
Power Estimation
avg_pj_per_cycle = 0.15 (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time = estimated_cycles / reference_frequency (3.0 GHz)
power_watts = energy_joules / execution_time
Thermal Estimation
thermal_resistance = 0.4 K/W (typical CPU with standard cooling)
temp_delta = power_watts * thermal_resistance
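The power and thermal formulas above compose as follows; this Python sketch just restates the documented constants and arithmetic:

```python
AVG_PJ_PER_CYCLE = 0.15       # weighted average for mixed workloads
REF_FREQ_HZ = 3.0e9           # reference frequency (3.0 GHz)
THERMAL_RESISTANCE = 0.4      # K/W, typical CPU with standard cooling

def power_and_temp(total_pj):
    """Derive average power and temperature rise from a total energy estimate."""
    cycles = total_pj / AVG_PJ_PER_CYCLE
    exec_time_s = cycles / REF_FREQ_HZ
    energy_j = total_pj * 1e-12
    power_w = energy_j / exec_time_s
    temp_delta_c = power_w * THERMAL_RESISTANCE
    return power_w, temp_delta_c
```

Note that because the execution time is itself derived from the energy total, these fixed constants always yield the same average power (0.45 mW); the runtime measurement tiers refine this with real timings.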
Transitive Energy Budgets
Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:
#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }
#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
helper() + helper()
// Total includes 2x helper's energy + caller's own instructions
}
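The transitive accounting can be modeled as a walk over the call graph, summing each callee's total once per call site. This is a simplified, acyclic sketch with illustrative names, not the joule-callgraph implementation:

```python
def transitive_energy(fn, call_graph, own_cost):
    """Total energy of fn: its own instruction cost plus every callee's
    transitive total, counted once per call site (assumes no recursion)."""
    total = own_cost[fn]
    for callee in call_graph.get(fn, []):
        total += transitive_energy(callee, call_graph, own_cost)
    return total

# caller() invokes helper() twice; each call contributes helper's full total
call_graph = {"caller": ["helper", "helper"]}
own_cost = {"caller": 0.0001, "helper": 0.00005}
total = transitive_energy("caller", call_graph, own_cost)  # 0.0002 J
```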
The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.
JSON Output
When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:
{
"functions": [
{
"name": "process_data",
"file": "program.joule",
"line": 15,
"energy_joules": 0.00035,
"power_watts": 12.5,
"confidence": 0.85,
"budget_joules": 0.0001,
"status": "exceeded",
"breakdown": {
"compute_pj": 280000,
"memory_pj": 70000,
"branch_pj": 500
}
}
],
"total_energy_joules": 0.00042
}
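Since the report is plain JSON, downstream tooling can consume it with any JSON parser. A minimal Python sketch, using only field names shown in the sample above:

```python
import json

# A trimmed-down report in the format shown above
report = """{
  "functions": [
    {"name": "process_data", "energy_joules": 0.00035,
     "budget_joules": 0.0001, "status": "exceeded", "confidence": 0.85}
  ],
  "total_energy_joules": 0.00042
}"""

data = json.loads(report)
# Collect every function whose budget was exceeded
violations = [f["name"] for f in data["functions"] if f["status"] == "exceeded"]
```

This is how a CI step might fail a build on budget violations, for instance by exiting nonzero when the violations list is non-empty.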
Violation Diagnostics
When a budget is exceeded, the compiler emits an error:
error: energy budget exceeded in function 'name'
--> file.joule:line:col
|
| fn name(...) {
| ^^^^^^^^^^^^^^
|
= estimated: X.XXXXX J (confidence: NN%)
= budget: X.XXXXX J
= exceeded by NNN%
For power and thermal budgets, similar diagnostics are produced with the appropriate units.
Standard Library
Joule ships with 110+ batteries-included modules. No package manager needed for common tasks.
Core Types
These are the fundamental types used in every Joule program.
| Module | Description | Status |
|---|---|---|
| String | UTF-8 string type | Implemented |
| Vec | Dynamic array | Implemented |
| Option | Optional values | Implemented |
| Result | Error handling | Implemented |
| HashMap | Key-value maps | Implemented |
| Primitives | Numeric types, bool, char | Implemented |
Collections
| Module | Description | Status |
|---|---|---|
| collections | Overview of all collection types | Implemented |
| Vec<T> | Dynamic array | Implemented |
| HashMap<K,V> | Hash map | Implemented |
| HashSet<T> | Hash set | Implemented |
| BTreeMap<K,V> | Sorted map | Implemented |
| BTreeSet<T> | Sorted set | Implemented |
| LinkedList<T> | Doubly-linked list | Implemented |
| VecDeque<T> | Double-ended queue | Implemented |
| BinaryHeap<T> | Priority queue | Implemented |
Mathematics
| Module | Description | Status |
|---|---|---|
| math | Mathematical functions | Implemented |
| math::linear | Linear algebra | Implemented |
| math::complex | Complex numbers | Implemented |
| statistics | Statistical analysis | Implemented |
| montecarlo | Monte Carlo methods | Implemented |
I/O and Networking
| Module | Description | Status |
|---|---|---|
| io | File and stream I/O | Implemented |
| net | TCP/UDP networking | Implemented |
| json | JSON parsing and serialization | Implemented |
| csv | CSV parsing | Implemented |
| toml | TOML parsing | Implemented |
| yaml | YAML parsing | Implemented |
Databases
| Module | Description | Status |
|---|---|---|
| db-sqlite | SQLite | Implemented |
| db-postgres | PostgreSQL | Implemented |
| db-mysql | MySQL | Implemented |
| db-redis | Redis | Implemented |
| db-mongodb | MongoDB | Implemented |
| ...and 30+ more | See stdlib/db-* | Implemented |
Scientific Computing
| Module | Description | Status |
|---|---|---|
| ode | Ordinary differential equations | Implemented |
| pde | Partial differential equations | Implemented |
| dsp | Digital signal processing | Implemented |
| physics | Physics simulation | Implemented |
| bio | Bioinformatics | Implemented |
| chem | Chemistry | Implemented |
Machine Learning and AI
| Module | Description | Status |
|---|---|---|
| ml | Machine learning | Implemented |
| snn | Spiking neural networks | Implemented |
| agent | AI agent framework | Implemented |
Cryptography and Security
| Module | Description | Status |
|---|---|---|
| crypto | Cryptographic primitives | Implemented |
| security | Security analysis | Implemented |
| zkp | Zero-knowledge proofs | Implemented |
| fhe | Fully homomorphic encryption | Implemented |
Graphics and Visualization
| Module | Description | Status |
|---|---|---|
| graphics | 2D/3D graphics | Implemented |
| image | Image processing | Implemented |
| viz | Data visualization | Implemented |
| plot | Plotting | Implemented |
Concurrency
| Module | Description | Status |
|---|---|---|
| concurrency | Concurrency primitives | Implemented |
| distributed | Distributed computing | Implemented |
Energy
| Module | Description | Status |
|---|---|---|
| energy | Energy measurement APIs | Implemented |
Platform
| Module | Description | Status |
|---|---|---|
| wasm | WebAssembly support | Implemented |
| embedded | Embedded systems | Implemented |
| mobile | Mobile development | Implemented |
| desktop | Desktop applications | Implemented |
Interoperability
| Module | Description | Status |
|---|---|---|
| rust_interop | Rust FFI | Implemented |
| python | Python interop | Implemented |
| go_interop | Go interop | Implemented |
| typescript_interop | TypeScript interop | Implemented |
For the complete list, see the stdlib/ directory in the distribution.
String
The String type is a heap-allocated, growable UTF-8 string.
Construction
let s = "Hello, world!"; // string literal
let empty = String::new(); // empty string
let from_chars = String::from("hello");
Operations
Length
let len = s.len(); // byte length
let empty = s.is_empty();
Concatenation
let greeting = "Hello, " + name;
let full = first + " " + last;
Comparison
if s == "hello" {
// string equality
}
Substring and Indexing
let first_byte = s[0]; // byte at index (u8)
let sub = s.substring(0, 5); // substring by byte range
Conversion
let n: i32 = 42;
let s = n.to_string(); // "42"
let x: f64 = 3.14;
let s = x.to_string(); // "3.14"
Search
let found = s.contains("world");
let pos = s.find("world"); // Option<usize>
let starts = s.starts_with("Hello");
let ends = s.ends_with("!");
Transformation
let upper = s.to_uppercase();
let lower = s.to_lowercase();
let trimmed = s.trim();
Split
let parts = s.split(","); // Vec<String>
let lines = s.split("\n");
Memory Layout
String {
data: *mut u8, // pointer to UTF-8 bytes
len: usize, // byte length
capacity: usize, // allocated capacity
}
Strings are heap-allocated and own their data. When a String is dropped, its memory is freed.
Vec<T>
A contiguous, growable array type. The most commonly used collection in Joule.
Construction
let mut v: Vec<i32> = Vec::new(); // empty vector
Adding Elements
v.push(1);
v.push(2);
v.push(3);
Accessing Elements
let first = v[0]; // indexing (panics if out of bounds)
let len = v.len(); // number of elements
let empty = v.is_empty(); // true if len == 0
Iteration
for item in v {
process(item);
}
Removing Elements
let last = v.pop(); // Option<T> -- removes and returns last element
Common Patterns
Collecting Results
let mut results: Vec<i32> = Vec::new();
let mut i = 0;
while i < 10 {
results.push(i * i);
i = i + 1;
}
As a Stack
let mut stack: Vec<i32> = Vec::new();
stack.push(1); // push
stack.push(2);
let top = stack.pop(); // pop -- Option::Some(2)
Memory Layout
Vec<T> {
data: *mut T, // pointer to heap allocation
len: usize, // number of elements
capacity: usize, // allocated capacity
}
Vec grows automatically when elements are added beyond the current capacity. Growth is amortized O(1).
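The amortized-O(1) claim can be illustrated with a sketch of a doubling growth policy. Doubling is a common scheme; the exact growth factor and initial capacity Joule uses are not specified here, so these are assumptions:

```python
def grow_trace(n_pushes, initial_capacity=4):
    """Simulate pushes under a capacity-doubling policy.

    Returns (final_capacity, total_elements_copied): each reallocation
    copies every existing element into the new, larger buffer.
    """
    capacity, length, copies = initial_capacity, 0, 0
    for _ in range(n_pushes):
        if length == capacity:
            copies += length        # reallocation copies all elements
            capacity *= 2
        length += 1
    return capacity, copies

capacity, copies = grow_trace(100)
# Total copies stay below 2 * n_pushes, so the per-push cost is amortized O(1)
```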
Option<T>
Represents a value that may or may not be present. Joule's alternative to null pointers.
Variants
pub enum Option<T> {
Some(T), // a value is present
None, // no value
}
Construction
let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;
Pattern Matching
The primary way to use an Option:
match value {
Option::Some(x) => {
// use x
println!("Got: {}", x);
}
Option::None => {
println!("Nothing");
}
}
Common Methods
Checking
let has_value = opt.is_some(); // bool
let is_empty = opt.is_none(); // bool
Unwrapping
let value = opt.unwrap(); // panics if None
let value = opt.unwrap_or(default); // returns default if None
Common Patterns
Lookup That May Fail
fn find(items: Vec<i32>, target: i32) -> Option<usize> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(i);
}
i = i + 1;
}
Option::None
}
Optional Fields
pub struct User {
pub name: String,
pub email: Option<String>,
}
Memory Layout
Option<T> {
is_some: bool, // discriminant
value: T, // the value (undefined when is_some == false)
}
Result<T, E>
Represents an operation that can succeed with a value of type T or fail with an error of type E.
Variants
pub enum Result<T, E> {
Ok(T), // success
Err(E), // failure
}
Construction
let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("something went wrong");
Pattern Matching
match parse_number(input) {
Result::Ok(n) => {
println!("Parsed: {}", n);
}
Result::Err(e) => {
println!("Error: {}", e);
}
}
Common Methods
Checking
let succeeded = result.is_ok(); // bool
let failed = result.is_err(); // bool
Unwrapping
let value = result.unwrap(); // panics if Err
let value = result.unwrap_or(default); // returns default if Err
Common Patterns
Functions That Can Fail
fn parse_file(path: String) -> Result<Data, String> {
let content = read_file(path);
match content {
Result::Ok(text) => {
// parse text into Data
Result::Ok(data)
}
Result::Err(e) => {
Result::Err("Failed to read file: " + e)
}
}
}
Error Accumulation
fn parse_all(inputs: Vec<String>) -> Result<Vec<i32>, Vec<String>> {
let mut results: Vec<i32> = Vec::new();
let mut errors: Vec<String> = Vec::new();
for input in inputs {
match parse_number(input) {
Result::Ok(n) => results.push(n),
Result::Err(e) => errors.push(e),
}
}
if errors.is_empty() {
Result::Ok(results)
} else {
Result::Err(errors)
}
}
Memory Layout
Result<T, E> {
is_ok: bool, // discriminant
union {
ok: T, // success value
err: E, // error value
}
}
HashMap<K, V>
A hash map (dictionary) that stores key-value pairs with O(1) average lookup.
Construction
let mut map: HashMap<String, i32> = HashMap::new();
Insertion
map.insert("alice", 42);
map.insert("bob", 17);
map.insert("carol", 99);
Lookup
let value = map.get("alice"); // Option<i32>
match map.get("alice") {
Option::Some(v) => println!("Found: {}", v),
Option::None => println!("Not found"),
}
Checking Membership
let exists = map.contains_key("alice"); // bool
Removal
let removed = map.remove("bob"); // Option<i32>
Size
let count = map.len(); // number of entries
let empty = map.is_empty(); // true if len == 0
Iteration
for (key, value) in map {
println!("{}: {}", key, value);
}
Common Patterns
Word Counter
fn count_words(text: String) -> HashMap<String, i32> {
let mut counts: HashMap<String, i32> = HashMap::new();
let words = text.split(" ");
for word in words {
let current = counts.get(word).unwrap_or(0);
counts.insert(word, current + 1);
}
counts
}
Configuration Store
pub struct Config {
values: HashMap<String, String>,
}
impl Config {
pub fn get(self, key: String) -> Option<String> {
self.values.get(key)
}
pub fn set(mut self, key: String, value: String) {
self.values.insert(key, value);
}
}
Primitive Types
Integer Types
Signed Integers
| Type | Size | Min | Max |
|---|---|---|---|
| i8 | 8-bit | -128 | 127 |
| i16 | 16-bit | -32,768 | 32,767 |
| i32 | 32-bit | -2,147,483,648 | 2,147,483,647 |
| i64 | 64-bit | -9.2 * 10^18 | 9.2 * 10^18 |
| isize | pointer-sized | Platform dependent | Platform dependent |
Unsigned Integers
| Type | Size | Min | Max |
|---|---|---|---|
| u8 | 8-bit | 0 | 255 |
| u16 | 16-bit | 0 | 65,535 |
| u32 | 32-bit | 0 | 4,294,967,295 |
| u64 | 64-bit | 0 | 1.8 * 10^19 |
| usize | pointer-sized | 0 | Platform dependent |
Integer Methods
let x: i32 = 42;
let s = x.to_string(); // "42"
let abs = x.abs(); // absolute value
let min = x.min(10); // minimum of two values
let max = x.max(100); // maximum of two values
Integer Literals
let dec = 42; // decimal
let hex = 0xFF; // hexadecimal
let oct = 0o77; // octal
let bin = 0b1010; // binary
let with_sep = 1_000_000; // underscore separator
let typed = 42u8; // type suffix
Floating-Point Types
| Type | Size | Precision | Range | Energy |
|---|---|---|---|---|
| f16 | 16-bit | ~3 digits | ~6.1 * 10^-5 to 65504 | 0.4 pJ |
| bf16 | 16-bit | ~3 digits | ~1.2 * 10^-38 to ~3.4 * 10^38 | 0.4 pJ |
| f32 | 32-bit | ~7 digits | ~1.2 * 10^-38 to ~3.4 * 10^38 | 0.35 pJ |
| f64 | 64-bit | ~15 digits | ~2.2 * 10^-308 to ~1.8 * 10^308 | 0.35 pJ |
Half-Precision Types
f16 is IEEE 754 half-precision — useful for signal processing and inference where memory bandwidth matters more than precision.
bf16 (Brain Float) has the same exponent range as f32 but only 8 mantissa bits. Designed for ML training where gradients don't need full precision. Used natively on Google TPUs, NVIDIA A100+, and Apple Neural Engine.
let weight: f16 = 0.5f16;
let grad: bf16 = 0.001bf16;
// Convert to/from f32
let full: f32 = weight as f32;
let half: f16 = full as f16;
Float Methods
let x: f64 = 3.14;
let s = x.to_string(); // "3.14"
let abs = x.abs(); // absolute value
let sqrt = x.sqrt(); // square root
let floor = x.floor(); // round down
let ceil = x.ceil(); // round up
let round = x.round(); // round to nearest
Float Literals
let a = 3.14; // f64 (default)
let b = 3.14f32; // f32
let c = 1.0e10; // scientific notation
let d = 2.5e-3; // 0.0025
Boolean
let t: bool = true;
let f: bool = false;
Boolean Operations
let and = a && b; // logical AND (short-circuit)
let or = a || b; // logical OR (short-circuit)
let not = !a; // logical NOT
Character
A Unicode scalar value (4 bytes):
let c: char = 'A';
let emoji: char = '\u{1F600}';
let newline: char = '\n';
Unit Type
The unit type () represents no meaningful value:
fn do_something() {
// implicitly returns ()
}
let unit: () = ();
Type Conversions
Use as for numeric conversions:
let i: i32 = 42;
let f: f64 = i as f64; // 42.0
let u: u8 = i as u8; // 42 (truncates if > 255)
let s: usize = i as usize; // 42
Conversions are explicit -- Joule does not implicitly convert between numeric types.
Collections
Joule provides a comprehensive set of collection types in the standard library.
Overview
| Type | Description | Ordered | Unique Keys | Use Case |
|---|---|---|---|---|
| Vec<T> | Dynamic array | Yes (insertion) | No | General-purpose sequence |
| HashMap<K,V> | Hash table | No | Yes | Key-value lookup |
| HashSet<T> | Hash set | No | Yes | Unique element set |
| BTreeMap<K,V> | Sorted map | Yes (key order) | Yes | Ordered key-value lookup |
| BTreeSet<T> | Sorted set | Yes (value order) | Yes | Ordered unique elements |
| VecDeque<T> | Ring buffer | Yes (insertion) | No | Queue / double-ended queue |
| LinkedList<T> | Doubly-linked list | Yes (insertion) | No | Frequent middle insertion/removal |
| BinaryHeap<T> | Max-heap | By priority | No | Priority queue |
Vec<T>
See Vec for full documentation.
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];
HashMap<K, V>
See HashMap for full documentation.
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);
let val = map.get("key");
HashSet<T>
An unordered set of unique elements.
let mut set: HashSet<i32> = HashSet::new();
set.insert(1);
set.insert(2);
set.insert(1); // no effect, already present
let has = set.contains(1); // true
let count = set.len(); // 2
BTreeMap<K, V>
A sorted map. Keys are kept in sorted order.
let mut map: BTreeMap<String, i32> = BTreeMap::new();
map.insert("banana", 2);
map.insert("apple", 1);
map.insert("cherry", 3);
// Iterates in key order: apple, banana, cherry
for (key, value) in map {
println!("{}: {}", key, value);
}
BTreeSet<T>
A sorted set of unique elements.
let mut set: BTreeSet<i32> = BTreeSet::new();
set.insert(3);
set.insert(1);
set.insert(2);
// Iterates in order: 1, 2, 3
for item in set {
println!("{}", item);
}
VecDeque<T>
A double-ended queue implemented as a ring buffer.
let mut deque: VecDeque<i32> = VecDeque::new();
deque.push_back(1);
deque.push_back(2);
deque.push_front(0);
let front = deque.pop_front(); // Option::Some(0)
let back = deque.pop_back(); // Option::Some(2)
BinaryHeap<T>
A max-heap (priority queue). The largest element is always at the top.
let mut heap: BinaryHeap<i32> = BinaryHeap::new();
heap.push(3);
heap.push(1);
heap.push(4);
heap.push(1);
heap.push(5);
let max = heap.pop(); // Option::Some(5)
let next = heap.pop(); // Option::Some(4)
SmallVec[T; N]
A vector that stores up to N elements inline (on the stack), spilling to the heap only when the capacity is exceeded. Ideal for short, bounded collections where heap allocation is wasteful.
let mut sv: SmallVec[i32; 8] = SmallVec::new();
// First 8 elements are stored inline — no heap allocation
for i in 0..8 {
sv.push(i); // 0.5 pJ per push (inline)
}
// 9th element triggers heap spill — 45.0 pJ
sv.push(99);
sv.len(); // 9
sv.capacity(); // 16 (heap capacity after spill)
sv.spilled(); // true
sv.get(0); // 0
sv.pop(); // 99
sv.clear();
sv.drop(); // free heap if spilled
Energy trade-off: Inline pushes cost 0.5 pJ vs ~45 pJ for heap spill. Size N so that most instances never spill.
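That trade-off can be made concrete with a rough per-instance model in Python. This sketch uses only the two figures quoted above and, as a simplifying assumption, charges post-spill pushes at the inline rate:

```python
INLINE_PUSH_PJ = 0.5   # per-push cost while stored inline
SPILL_PJ = 45.0        # one-time cost of spilling to the heap

def push_energy(n_pushes, inline_capacity):
    """Model total push energy for a SmallVec-style container:
    every push costs the inline rate, plus one spill charge if the
    inline capacity is ever exceeded."""
    energy = n_pushes * INLINE_PUSH_PJ
    if n_pushes > inline_capacity:
        energy += SPILL_PJ
    return energy

# 8 pushes into an 8-slot inline buffer: 4.0 pJ, no spill
# 9 pushes: 4.5 pJ of pushes + 45.0 pJ spill = 49.5 pJ
```

A single spill costs roughly as much as 90 inline pushes under this model, which is why sizing N so most instances never spill pays off.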
Deque<T>
Double-ended queue implemented as a ring buffer. O(1) push/pop at both ends.
let mut dq: Deque<i32> = Deque::new();
dq.push_back(1);
dq.push_back(2);
dq.push_front(0);
let front = dq.pop_front(); // Option::Some(0)
let back = dq.pop_back(); // Option::Some(2)
dq.front(); // peek at front
dq.back(); // peek at back
dq.len(); // 1
dq.rotate_left(1); // rotate elements
Arena<T>
Bump allocator — allocates by advancing a pointer. Individual elements cannot be freed; call reset() to free everything at once in O(1). Ideal for phase-based allocation (parsers, compilers, frame allocators).
let mut arena: Arena<AstNode> = Arena::new();
// Allocation is a pointer bump — 1.0 pJ
let node1 = arena.alloc(AstNode { kind: "expr", children: Vec::new() });
let node2 = arena.alloc(AstNode { kind: "stmt", children: Vec::new() });
arena.len(); // 2 elements allocated
arena.bytes_used(); // bytes consumed
arena.bytes_capacity(); // total buffer size
// Free everything at once — 0.5 pJ regardless of count
arena.reset();
BitSet
Fixed-capacity bit field stored as u64 words. Space-efficient boolean set with O(1) insert/contains and fast set operations.
let mut bits = BitSet::new();
bits.insert(0);
bits.insert(42);
bits.insert(63);
bits.contains(42); // true
bits.remove(42);
bits.count_ones(); // number of set bits
bits.count_zeros(); // number of unset bits
// Set operations
let union = bits.union(&other);
let inter = bits.intersection(&other);
let diff = bits.difference(&other);
bits.is_subset(&other);
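The word-based representation behind this API is simple enough to model directly. A Python sketch (Python integers stand in for the u64 words; the real implementation uses fixed-width words and hardware popcount):

```python
class WordBitSet:
    """Fixed-capacity bit field stored as 64-bit words."""
    def __init__(self, capacity=256):
        self.words = [0] * ((capacity + 63) // 64)

    def insert(self, i):
        self.words[i // 64] |= 1 << (i % 64)     # set one bit, O(1)

    def remove(self, i):
        self.words[i // 64] &= ~(1 << (i % 64))  # clear one bit, O(1)

    def contains(self, i):
        return (self.words[i // 64] >> (i % 64)) & 1 == 1

    def count_ones(self):
        return sum(bin(w).count("1") for w in self.words)  # popcount per word

    def union(self, other):
        out = WordBitSet(len(self.words) * 64)
        # Set operations are word-at-a-time bitwise ops
        out.words = [a | b for a, b in zip(self.words, other.words)]
        return out
```

Because union, intersection, and difference reduce to one bitwise instruction per 64 elements, these sets are far cheaper than hash sets for dense small-integer domains.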
BitVec
Dynamic-length bit vector. Like BitSet but growable.
let mut bv = BitVec::new();
bv.push(true);
bv.push(false);
bv.push(true);
bv.get(1); // false
bv.set(1, true);
bv.len(); // 3 (bits)
bv.count_ones(); // 3
bv.pop(); // true
Choosing a Collection
- Need a sequence? Use Vec<T>
- Need a short, bounded sequence? Use SmallVec[T; N] (avoids heap allocation)
- Need fast key lookup? Use HashMap<K,V>
- Need unique elements? Use HashSet<T>
- Need sorted keys? Use BTreeMap<K,V> (12.0 pJ per traversal)
- Need a queue? Use Deque<T> (2.0 pJ push/pop)
- Need a priority queue? Use BinaryHeap<T>
- Need phase-based allocation? Use Arena<T> (1.0 pJ alloc, 0.5 pJ free-all)
- Need a compact boolean set? Use BitSet or BitVec (0.3 pJ per operation)
- Need frequent middle insertion? Use LinkedList<T> (rare)
Smart Pointers
Smart pointers manage ownership and sharing of heap-allocated data with automatic cleanup.
Overview
| Type | Thread-safe | Use Case | Energy Cost |
|---|---|---|---|
| Box<T> | N/A | Heap allocation, recursive types | Allocation only |
| Rc<T> | No | Single-threaded shared ownership | 3.0 pJ clone/drop |
| Arc<T> | Yes | Multi-threaded shared ownership | 3.0 pJ clone/drop (atomic) |
| Cow<T> | N/A | Clone-on-write optimization | Free reads, allocation on write |
Box<T>
Heap-allocated value. Required for recursive types. Box<T> is a pointer in memory — zero overhead beyond the allocation.
// Recursive type requires Box
pub enum Expr {
Literal(i32),
Add { left: Box<Expr>, right: Box<Expr> },
Neg { inner: Box<Expr> },
}
let expr = Expr::Add {
left: Box::new(Expr::Literal(1)),
right: Box::new(Expr::Literal(2)),
};
Methods
let b = Box::new(42);
let inner = b.into_inner(); // 42 — consumes the Box
let r: &i32 = b.as_ref(); // borrow the inner value
let r: &mut i32 = b.as_mut(); // mutable borrow
let ptr = b.leak(); // leak memory, return raw pointer
Rc<T>
Reference-counted pointer for single-threaded shared ownership. Multiple Rc<T> values can point to the same data. The data is freed when the last Rc is dropped.
let a = Rc::new(42);
let b = a.clone(); // increment reference count (3.0 pJ)
let c = a.clone(); // count is now 3
println!("{}", Rc::strong_count(&a)); // 3
// When a, b, c all go out of scope, the value is freed
Methods
let rc = Rc::new(vec![1, 2, 3]);
let count = Rc::strong_count(&rc); // number of references
let inner = Rc::into_inner(rc); // unwrap if count == 1
let r: &Vec<i32> = rc.as_ref(); // borrow inner value
// Mutable access (only if count == 1)
let mut rc = Rc::new(42);
if let Option::Some(val) = Rc::get_mut(&mut rc) {
*val = 100;
}
Use Case: Shared Graph Nodes
pub struct Node {
pub value: i32,
pub children: Vec<Rc<Node>>,
}
let leaf = Rc::new(Node { value: 1, children: Vec::new() });
let parent = Node {
value: 0,
children: vec![leaf.clone(), leaf.clone()], // shared ownership
};
Arc<T>
Atomically reference-counted pointer for multi-threaded shared ownership. Same API as Rc<T>, but uses atomic operations for thread safety.
use std::concurrency::spawn;
let data = Arc::new(vec![1, 2, 3, 4, 5]);
let handle = spawn(|| {
let local = data.clone(); // atomic increment (3.0 pJ)
println!("len = {}", local.len());
});
println!("len = {}", data.len()); // still valid in main thread
Methods
let arc = Arc::new(42);
let count = Arc::strong_count(&arc); // number of references
let r: &i32 = arc.as_ref(); // borrow inner value
let cloned = arc.clone(); // atomic increment
// Arc::get_mut — only if count == 1
// Arc::into_inner — unwrap if count == 1
// Arc::make_mut — clone inner if shared, then return &mut
Energy: Rc vs Arc
| Operation | Rc | Arc |
|---|---|---|
| clone | 3.0 pJ (increment) | 3.0 pJ (atomic increment) |
| drop | 3.0 pJ (decrement + conditional free) | 3.0 pJ (atomic decrement + conditional free) |
| as_ref | 0 pJ (pointer deref) | 0 pJ (pointer deref) |
Use Rc when data stays on one thread. Use Arc when sharing across threads. The energy cost is similar, but Arc incurs cache-line contention overhead under high concurrency.
Cow<T>
Clone-on-write smart pointer. Wraps either a borrowed reference or an owned value. Reading is free; writing clones the data only if it's currently borrowed.
// Start with a borrowed value
let text = Cow::borrowed("hello");
println!("{}", text.as_ref()); // free — no allocation
// Convert to owned only when needed
let owned = text.to_owned(); // allocates if borrowed
// Check state
text.is_borrowed(); // true
text.is_owned(); // false
Methods
let cow = Cow::borrowed("hello");
let cow2 = Cow::owned("world".to_string());
let r: &str = cow.as_ref(); // borrow — always free
let s: String = cow.into_owned(); // consume, clone if borrowed
let owned = cow.to_owned(); // clone if borrowed, return owned Cow
cow.is_borrowed(); // true if wrapping a reference
cow.is_owned(); // true if wrapping an owned value
Use Case: Conditional Transformation
fn normalize(input: &str) -> Cow<str> {
if input.contains(' ') {
// Only allocate when we actually need to modify
Cow::owned(input.replace(' ', "_"))
} else {
// No allocation — return a reference to the original
Cow::borrowed(input)
}
}
// Most inputs pass through without allocation
let a = normalize("hello"); // Cow::borrowed — 0 allocation
let b = normalize("hello world"); // Cow::owned — 1 allocation
Choosing a Smart Pointer
- Need heap allocation for recursive types? Use Box<T>
- Need shared ownership on one thread? Use Rc<T>
- Need shared ownership across threads? Use Arc<T>
- Need to avoid cloning until mutation? Use Cow<T>
- Need unique ownership? Just use the value directly (no pointer needed)
N-Dimensional Arrays
Joule provides first-class multi-dimensional array types for scientific computing, machine learning, and signal processing.
Overview
| Type | Description | Owns Data | Energy Cost |
|---|---|---|---|
| NDArray[T; N] | Owned N-dimensional array | Yes | Allocation + compute |
| NDView[T; N] | Non-owning view into an NDArray | No | Zero-copy |
| CowArray[T; N] | Clone-on-write array | Shared | Free reads, allocation on write |
| DynArray[T] | Dynamically-ranked array | Yes | Allocation + compute |
The rank N is a compile-time constant, enabling the compiler to optimize indexing and verify dimensionality at compile time.
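Because the rank is part of the type, a dimensionality mismatch is a compile error rather than a runtime failure. A sketch of the kind of mistake this catches (the `trace` helper is hypothetical):

```
fn trace(m: &NDArray[f64; 2]) -> f64 {
    // Sum of diagonal elements: only meaningful for a matrix
    let mut t = 0.0;
    for i in 0..m.shape()[0] {
        t = t + m[i, i];
    }
    t
}

let v: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0]);
// trace(&v); // compile error: expected NDArray[f64; 2], found NDArray[f64; 1]
```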
NDArray[T; N]
Owned, contiguous, row-major multi-dimensional array.
// Create a 2D array (matrix)
let mut mat: NDArray[f64; 2] = NDArray::zeros([3, 4]); // 3x4 matrix of zeros
let ones: NDArray[f64; 2] = NDArray::ones([2, 2]); // 2x2 matrix of ones
let filled: NDArray[f64; 2] = NDArray::full([3, 3], 7.0); // 3x3 filled with 7.0
// Create from data
let v: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0]);
let m: NDArray[f64; 2] = NDArray::from_vec_shape(vec![1.0, 2.0, 3.0, 4.0], [2, 2]);
Indexing
// Multi-dimensional indexing
let val = mat[1, 2]; // row 1, column 2
mat[0, 0] = 42.0; // set element
// Slicing — returns NDView
let row = mat[0, ..]; // first row
let col = mat[.., 1]; // second column
let sub = mat[1..3, 0..2]; // submatrix
let strided = mat[.., ::2]; // every other column
Methods
let a: NDArray[f64; 2] = NDArray::zeros([3, 4]);
// Shape and metadata
a.shape(); // [3, 4]
a.rank(); // 2
a.len(); // 12 (total elements)
a.strides(); // [4, 1] (row-major)
// Element-wise operations
let b = a.add(&other); // element-wise addition
let c = a.mul(&other); // element-wise multiplication
let d = a.map(|x: f64| -> f64 { x * 2.0 });
// Reductions
let total = a.sum(); // sum all elements
let mean = a.mean(); // average
let max = a.max(); // maximum element
let min = a.min(); // minimum element
// Shape manipulation
let reshaped = a.reshape([4, 3]); // reshape (same element count)
let flat = a.flatten(); // flatten to 1D
let transposed = a.transpose(); // transpose axes
// Linear algebra (2D)
let product = a.matmul(&b); // matrix multiplication
let dot = v1.dot(&v2); // dot product (1D)
NDView[T; N]
A non-owning view into an NDArray. Views are zero-copy — they reference the original data without allocation.
let arr: NDArray[f64; 2] = NDArray::zeros([4, 4]);
// Create views via slicing
let row: NDView[f64; 1] = arr.row(0);
let col: NDView[f64; 1] = arr.col(2);
let sub: NDView[f64; 2] = arr.slice([1..3, 1..3]);
// Views support the same read operations as NDArray
let sum = row.sum();
let max = sub.max();
CowArray[T; N]
Clone-on-write array. Reading is free (shares data with the source). Writing triggers a copy only if the data is shared.
let original: NDArray[f64; 2] = NDArray::ones([100, 100]);
let mut cow = CowArray::from(&original); // no copy yet
// Reading is free
let val = cow[0, 0]; // reads from original's memory
// Writing triggers a copy (if shared)
cow[0, 0] = 42.0; // now owns its own data
DynArray[T]
Dynamically-ranked array. The rank is determined at runtime, not compile time. Use when the dimensionality isn't known until runtime (e.g., loading arbitrary tensors from files).
let dyn_arr: DynArray[f64] = DynArray::zeros(vec![3, 4, 5]); // 3D
let rank = dyn_arr.rank(); // 3 (runtime value)
let shape = dyn_arr.shape(); // [3, 4, 5]
Broadcasting
Binary operations between arrays of different shapes follow broadcasting rules:
let mat: NDArray[f64; 2] = NDArray::ones([3, 4]); // shape [3, 4]
let row: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]); // shape [4]
// row is broadcast to [3, 4] — each row gets the same values added
let result = mat.add(&row); // shape [3, 4]
Broadcasting rules:
- Dimensions are compared from the right
- Dimensions must be equal, or one of them must be 1
- Missing dimensions on the left are treated as 1
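Worked through the rules: combining a [3, 1] column with a [4] row yields a [3, 4] outer sum. The shapes in the comments are what the rules predict:

```
let col: NDArray[f64; 2] = NDArray::from_vec_shape(vec![0.0, 10.0, 20.0], [3, 1]); // shape [3, 1]
let row: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]);            // shape [4]

// Rightmost dims: 1 vs 4, one is 1, so broadcast to 4.
// Next dim: 3 vs (missing, treated as 1), so broadcast to 3.
let grid = col.add(&row); // shape [3, 4]
// row 0: [1, 2, 3, 4]; row 1: [11, 12, 13, 14]; row 2: [21, 22, 23, 24]
```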
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Element access | 0.5 pJ | L1 cache hit |
| Element-wise op | 0.8 pJ/element | Arithmetic + memory |
| Reduction (sum/mean) | 0.8 pJ/element | Sequential scan |
| Matrix multiply | ~2N^3 * 0.8 pJ | Cubic complexity |
| Reshape/transpose | 0 pJ | Metadata-only (no copy) |
| Slice (NDView) | 0 pJ | Zero-copy view |
| Broadcasting | 0 pJ overhead | Applied during compute |
Choosing an Array Type
- Know the rank at compile time? Use NDArray[T; N] — the compiler verifies dimensions
- Need a read-only window? Use NDView[T; N] — zero-copy, zero allocation
- Might or might not modify? Use CowArray[T; N] — defers allocation until write
- Rank determined at runtime? Use DynArray[T] — flexible but no compile-time dimension checks
SIMD Vector Types
Simd[T; N] provides portable SIMD (Single Instruction, Multiple Data) operations. The compiler maps to platform-native intrinsics where available (x86 SSE/AVX, ARM NEON) with a scalar fallback for portability.
Creating SIMD Vectors
// Splat — fill all lanes with the same value
let v: Simd[f32; 4] = Simd::splat(1.0); // [1.0, 1.0, 1.0, 1.0]
// From an array
let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
// Load from a pointer + offset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let v: Simd[f32; 4] = Simd::load(&data, 0); // first 4 elements
let w: Simd[f32; 4] = Simd::load(&data, 4); // next 4 elements
Common Lane Widths
| Type | Lanes | x86 | ARM |
|---|---|---|---|
| Simd[f32; 4] | 4 | SSE __m128 | NEON float32x4_t |
| Simd[f32; 8] | 8 | AVX __m256 | 2x NEON |
| Simd[f64; 2] | 2 | SSE2 __m128d | NEON float64x2_t |
| Simd[f64; 4] | 4 | AVX __m256d | 2x NEON |
| Simd[i32; 4] | 4 | SSE2 __m128i | NEON int32x4_t |
| Simd[i32; 8] | 8 | AVX2 __m256i | 2x NEON |
Arithmetic Operations
All arithmetic operates lane-by-lane:
let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::from_array([5.0, 6.0, 7.0, 8.0]);
let sum = a.add(&b); // [6.0, 8.0, 10.0, 12.0]
let diff = a.sub(&b); // [-4.0, -4.0, -4.0, -4.0]
let prod = a.mul(&b); // [5.0, 12.0, 21.0, 32.0]
let quot = a.div(&b); // [0.2, 0.333, 0.429, 0.5]
Reduction Operations
Reduce all lanes to a single scalar:
let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let total = v.sum(); // 10.0 — horizontal sum of all lanes
Comparison and Selection
let a: Simd[f32; 4] = Simd::from_array([1.0, 5.0, 3.0, 8.0]);
let b: Simd[f32; 4] = Simd::from_array([2.0, 4.0, 6.0, 7.0]);
let lo = a.min(&b); // [1.0, 4.0, 3.0, 7.0]
let hi = a.max(&b); // [2.0, 5.0, 6.0, 8.0]
let same = a.eq(&b); // false — true only if every lane compares equal
Unary Operations
let v: Simd[f32; 4] = Simd::from_array([-1.0, 2.0, -3.0, 4.0]);
let pos = v.abs(); // [1.0, 2.0, 3.0, 4.0]
let neg = v.neg(); // [1.0, -2.0, 3.0, -4.0]
Memory Operations
let mut data: Vec<f32> = vec![0.0; 1024];
// Load 4 elements starting at offset 8
let chunk: Simd[f32; 4] = Simd::load(&data, 8);
// Store back to memory
chunk.store(&mut data, 8);
// Convert to/from array
let arr: [f32; 4] = chunk.to_array();
Example: Vectorized Dot Product
#[energy_budget(max_joules = 0.00005)]
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
let n = a.len();
let mut sum: Simd[f32; 8] = Simd::splat(0.0);
let mut i = 0;
// Process 8 elements at a time
while i + 8 <= n {
let va: Simd[f32; 8] = Simd::load(a, i);
let vb: Simd[f32; 8] = Simd::load(b, i);
sum = sum.add(&va.mul(&vb));
i = i + 8;
}
// Horizontal sum + scalar remainder
let mut result = sum.sum();
while i < n {
result = result + a[i] * b[i];
i = i + 1;
}
result
}
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Lane arithmetic (add/sub/mul/div) | 2.0 pJ | Single SIMD instruction |
| Horizontal reduction (sum) | 2.0 pJ | Log2(N) shuffle + add |
| Load/store | 0.5 pJ | L1 cache, aligned |
| Comparison (min/max/eq) | 2.0 pJ | Single SIMD instruction |
SIMD operations process N elements for roughly the same energy as one scalar operation. For a Simd[f32; 8], that's ~8x energy efficiency compared to a scalar loop — the primary reason to use SIMD in energy-aware code.
Platform Detection
The compiler automatically selects the best implementation:
- x86/x86_64: Uses SSE/AVX intrinsics via <immintrin.h>
- ARM64 (Apple Silicon, etc.): Uses NEON intrinsics via <arm_neon.h>
- Other platforms: Falls back to scalar loops (same behavior, no hardware acceleration)
No #[cfg] attributes needed in user code — the abstraction is portable.
Time
Joule provides two types for time measurement: Duration for time spans and Instant for timestamps.
Duration
A time span measured in nanoseconds. All arithmetic is exact — no floating-point rounding.
Creating Durations
let d1 = Duration::from_secs(5); // 5 seconds
let d2 = Duration::from_millis(1500); // 1.5 seconds
let d3 = Duration::from_micros(250); // 250 microseconds
let d4 = Duration::from_nanos(100); // 100 nanoseconds
Querying
let d = Duration::from_millis(2500);
d.as_secs(); // 2
d.as_millis(); // 2500
d.as_micros(); // 2500000
d.as_nanos(); // 2500000000
d.is_zero(); // false
Arithmetic
let a = Duration::from_secs(3);
let b = Duration::from_millis(500);
let sum = a.add(&b); // 3.5 seconds
let diff = a.sub(&b); // 2.5 seconds
let doubled = a.mul(2); // 6 seconds
let halved = a.div(2); // 1.5 seconds
// Checked arithmetic (returns Option)
let safe = a.checked_add(&b); // Option::Some(3.5s)
let over = a.checked_sub(&Duration::from_secs(10)); // Option::None
Instant
A monotonic timestamp. Cannot go backwards. Used for measuring elapsed time.
Measuring Elapsed Time
let start = Instant::now(); // 15.0 pJ — reads system clock
// ... do work ...
heavy_computation();
let elapsed: Duration = start.elapsed();
println!("Took {} ms", elapsed.as_millis());
Comparing Instants
let t1 = Instant::now();
// ... work ...
let t2 = Instant::now();
let gap: Duration = t2.duration_since(&t1);
Example: Benchmarking with Energy
#[energy_budget(max_joules = 0.001)]
fn timed_sort(data: Vec<i32>) -> (Vec<i32>, Duration) {
let start = Instant::now();
let sorted = sort(data);
let elapsed = start.elapsed();
(sorted, elapsed)
}
fn main() {
let data = vec![5, 3, 1, 4, 2, 8, 7, 6];
let (sorted, time) = timed_sort(data);
println!("Sorted in {} us", time.as_micros());
}
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Duration arithmetic | 0.05 pJ | Integer add/sub |
| Instant::now() | 15.0 pJ | System clock read (syscall) |
| elapsed() | 15.0 pJ | Clock read + subtraction |
| duration_since() | 0.05 pJ | Integer subtraction |
Instant::now() is the expensive operation — it requires a system call (clock_gettime on Linux, mach_absolute_time on macOS). Avoid calling it in tight loops. Measure coarse-grained sections instead.
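A sketch of the difference; timing the whole loop once instead of each iteration avoids two syscalls per element (`items` and `process` are hypothetical):

```
// Anti-pattern: two clock reads (30.0 pJ) on every iteration
let mut total = Duration::from_nanos(0);
for item in items.iter() {
    let t = Instant::now();
    process(item);
    total = total.add(&t.elapsed());
}

// Better: one coarse measurement around the whole section
let start = Instant::now();
for item in items.iter() {
    process(item);
}
let elapsed = start.elapsed(); // 15.0 pJ, regardless of loop length
```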
Numeric Types
Specialized numeric types beyond the standard integer and float primitives.
Decimal
128-bit decimal type for exact arithmetic. No floating-point rounding errors. Essential for financial calculations.
let price = Decimal::new(19, 99, false); // 19.99
let tax = Decimal::from_str("0.0825"); // 8.25%
let total = price.mul(&tax).add(&price); // exact: 21.639175
// No floating-point surprise
let a = Decimal::from_str("0.1");
let b = Decimal::from_str("0.2");
let c = a.add(&b);
// c == 0.3 exactly (unlike f64 where 0.1 + 0.2 != 0.3)
Methods
let d = Decimal::from_str("123.456");
// Arithmetic
d.add(&other); d.sub(&other);
d.mul(&other); d.div(&other);
d.rem(&other); // remainder
// Rounding
d.round(2); // 123.46 — round to 2 decimal places
d.floor(); // 123.0
d.ceil(); // 124.0
d.trunc(); // 123.0 — truncate toward zero
// Properties
d.abs(); // absolute value
d.neg(); // negate
d.scale(); // number of decimal places
d.mantissa(); // integer mantissa
d.is_zero(); // false
d.is_negative(); // false
// Conversion
d.to_f64(); // 123.456 (lossy)
d.to_string(); // "123.456"
Energy Cost
| Operation | Cost |
|---|---|
| Decimal arithmetic | 5.0 pJ |
| Decimal comparison | 0.5 pJ |
Decimal is ~14x more expensive than f64 arithmetic but guarantees exact results. Use it where correctness matters more than speed (finance, accounting, currency).
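A typical pattern from those use cases: computing a line-item total and rounding to cents. The prices and rate here are illustrative:

```
let unit_price = Decimal::from_str("4.35");
let quantity = Decimal::from_str("3");
let tax_rate = Decimal::from_str("0.08");

let subtotal = unit_price.mul(&quantity);           // 13.05 — exact
let total = subtotal.add(&subtotal.mul(&tax_rate)); // 14.094 — exact
let billed = total.round(2);                        // 14.09 — rounded to cents
```

With f64, the same chain can drift by an ulp and round the wrong way at the cent boundary; Decimal guarantees the rounding step sees the exact value.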
Complex<T>
Complex number with real and imaginary parts. Generic over the component type (typically f32 or f64).
let z = Complex::new(3.0, 4.0); // 3 + 4i
let w = Complex::new(1.0, -2.0); // 1 - 2i
// Arithmetic
let sum = z.add(&w); // 4 + 2i
let prod = z.mul(&w); // 11 - 2i
let quot = z.div(&w); // -1 + 2i
// Properties
z.real(); // 3.0
z.imag(); // 4.0
z.abs(); // 5.0 (magnitude: sqrt(3^2 + 4^2))
z.arg(); // 0.927... (phase angle in radians)
z.conj(); // 3 - 4i (complex conjugate)
z.norm(); // 25.0 (squared magnitude)
Advanced Operations
let z = Complex::new(1.0, 1.0);
z.exp(); // e^z
z.log(); // natural logarithm
z.sqrt(); // principal square root
z.pow(&w); // z^w
// Polar form
let polar = Complex::from_polar(5.0, 0.927); // magnitude, angle
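The polar constructor composes with abs and arg: converting to polar form and back recovers the original value, up to floating-point rounding in the components.

```
let z = Complex::new(3.0, 4.0);
let mag = z.abs(); // 5.0
let ang = z.arg(); // 0.927...
let back = Complex::from_polar(mag, ang); // approximately 3 + 4i
```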
Energy Cost
| Operation | Cost |
|---|---|
| Complex add/sub | 1.6 pJ (2x real) |
| Complex multiply | 1.6 pJ |
| Complex divide | 3.2 pJ |
| abs/norm | 1.6 pJ |
| exp/log/sqrt | 5.0 pJ |
Intern
Interned string — stored once in a global table, compared by pointer equality. Ideal for identifiers, keywords, and symbols that appear repeatedly.
let a = Intern::new("hello");
let b = Intern::new("hello");
// Pointer equality — O(1) comparison instead of O(n) string compare
a.eq(&b); // true (same pointer)
// String access
a.as_str(); // "hello"
a.len(); // 5
a.is_empty(); // false
a.hash(); // precomputed hash value
Use Case: Compiler Symbol Tables
pub struct Symbol {
pub name: Intern,
}
// Creating millions of Symbol values with the same name
// only stores the string once in memory
let sym1 = Symbol { name: Intern::new("x") };
let sym2 = Symbol { name: Intern::new("x") };
// Comparison is pointer equality — O(1), not O(n)
sym1.name.eq(&sym2.name); // true, instant
Energy Cost
| Operation | Cost | Notes |
|---|---|---|
| Intern::new (first time) | 10.0 pJ | Hash table insert |
| Intern::new (duplicate) | 10.0 pJ | Hash table lookup |
| eq | 0.05 pJ | Pointer comparison |
| as_str | 0 pJ | Pointer dereference |
The 10.0 pJ cost of Intern::new is amortized over all subsequent O(1) comparisons. For strings compared frequently (like identifiers in a compiler), interning saves both energy and time.
I/O
File and stream I/O operations.
Reading Files
use std::io::File;
let content = File::read_to_string("data.txt");
match content {
Result::Ok(text) => process(text),
Result::Err(e) => println!("Error: {}", e),
}
Writing Files
use std::io::File;
let result = File::write_string("output.txt", "Hello, world!");
match result {
Result::Ok(_) => println!("Written successfully"),
Result::Err(e) => println!("Error: {}", e),
}
Reading Lines
use std::io::File;
let lines = File::read_lines("data.txt");
match lines {
Result::Ok(lines) => {
for line in lines {
process_line(line);
}
}
Result::Err(e) => println!("Error: {}", e),
}
Standard Streams
use std::io::{stdin, stdout, stderr};
// Read from stdin
let line = stdin::read_line();
// Write to stdout
stdout::write("Hello\n");
// Write to stderr
stderr::write("Error message\n");
Path Operations
use std::io::Path;
let p = Path::new("/home/user/data.txt");
let exists = p.exists();
let is_file = p.is_file();
let is_dir = p.is_dir();
let parent = p.parent(); // Option<Path>
let filename = p.file_name(); // Option<String>
let ext = p.extension(); // Option<String>
Directory Operations
use std::io::{create_dir, read_dir, remove_dir};
create_dir("output");
let entries = read_dir(".");
match entries {
Result::Ok(files) => {
for entry in files {
println!("{}", entry.name());
}
}
Result::Err(e) => println!("Error: {}", e),
}
Buffered I/O
For performance-critical I/O, use buffered readers and writers:
use std::io::{BufReader, BufWriter};
let reader = BufReader::new(File::open("large.txt"));
let writer = BufWriter::new(File::create("output.txt"));
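A sketch of a buffered line-copy. The `lines()` iterator and `write` method on the buffered wrappers are assumptions (mirroring `File::read_lines` and `stdout::write` above); treat the exact signatures as illustrative:

```
use std::io::{File, BufReader, BufWriter};

let reader = BufReader::new(File::open("large.txt"));
let mut writer = BufWriter::new(File::create("copy.txt"));

// Assumed API: line iteration on the reader, buffered write on the writer
for line in reader.lines() {
    writer.write(line);
    writer.write("\n");
}
// Buffered bytes are flushed when the writer is dropped (assumed)
```

Buffering batches many small reads and writes into few syscalls, which matters because each syscall carries a fixed time and energy cost.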
Math
Mathematical functions, constants, and linear algebra operations.
Constants
use std::math;
let pi = math::PI; // 3.141592653589793
let e = math::E; // 2.718281828459045
let tau = math::TAU; // 6.283185307179586
let sqrt2 = math::SQRT_2; // 1.4142135623730951
Basic Functions
use std::math;
let a = math::abs(-42.0); // 42.0
let s = math::sqrt(144.0); // 12.0
let p = math::pow(2.0, 10.0); // 1024.0
let l = math::log(math::E); // 1.0
let l2 = math::log2(1024.0); // 10.0
let l10 = math::log10(1000.0); // 3.0
Trigonometry
use std::math;
let s = math::sin(math::PI / 2.0); // 1.0
let c = math::cos(0.0); // 1.0
let t = math::tan(math::PI / 4.0); // 1.0
let arcsin = math::asin(1.0); // PI/2
let ac = math::acos(0.0); // PI/2
let at = math::atan(1.0); // PI/4
let at2 = math::atan2(1.0, 1.0); // PI/4
Rounding
use std::math;
let f = math::floor(3.7); // 3.0
let c = math::ceil(3.2); // 4.0
let r = math::round(3.5); // 4.0
let t = math::trunc(3.9); // 3.0
Min/Max
use std::math;
let mn = math::min(3.0, 7.0); // 3.0
let mx = math::max(3.0, 7.0); // 7.0
let cl = math::clamp(15.0, 0.0, 10.0); // 10.0
Linear Algebra
use std::math::linear;
// Vector operations
let v1 = linear::Vector::new([1.0, 2.0, 3.0]);
let v2 = linear::Vector::new([4.0, 5.0, 6.0]);
let sum = v1.add(v2);
let dot = v1.dot(v2); // 32.0
let norm = v1.norm(); // sqrt(14)
let scaled = v1.scale(2.0);
// Matrix operations
let m = linear::Matrix::identity(3);
let det = m.determinant();
let inv = m.inverse();
let product = m.multiply(m);
Complex Numbers
use std::math::complex::Complex;
let z1 = Complex::new(3.0, 4.0); // 3 + 4i
let z2 = Complex::new(1.0, 2.0); // 1 + 2i
let sum = z1.add(z2); // 4 + 6i
let product = z1.mul(z2); // -5 + 10i
let magnitude = z1.abs(); // 5.0
let conjugate = z1.conj(); // 3 - 4i
Statistics
use std::statistics;
let data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
let mean = statistics::mean(data); // 5.0
let median = statistics::median(data); // 4.5
let stddev = statistics::std_dev(data); // ~2.0
let variance = statistics::variance(data);
Random Numbers
use std::math::random;
let n = random::int(0, 100); // random integer in [0, 100)
let f = random::float(); // random f64 in [0.0, 1.0)
let b = random::bool(); // random boolean
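Putting random and math together: a Monte Carlo estimate of pi. The sample count, helper name, and `as f64` cast syntax are illustrative:

```
use std::math::random;

fn estimate_pi(samples: i64) -> f64 {
    let mut inside = 0;
    for _ in 0..samples {
        let x = random::float(); // [0.0, 1.0)
        let y = random::float();
        if x * x + y * y <= 1.0 {
            inside = inside + 1;
        }
    }
    // quarter-circle area / unit-square area = pi / 4
    4.0 * (inside as f64) / (samples as f64)
}

let pi_est = estimate_pi(1_000_000); // roughly 3.14
```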