Joule Documentation
Welcome to the Joule programming language documentation (v1.2.0). Joule is developed by Open Interface Engineering, Inc.
For New Users
- Getting Started -- Install the compiler and write your first Joule program
- Language Tour -- Learn Joule's syntax and features through examples
Guides
- Energy System Guide -- Compile-time energy budgets: Joule's defining feature
- Compiler Reference -- CLI usage, flags, backends, and the compilation pipeline
- JIT Compilation -- Interactive development with --jit and --watch
- Polyglot Energy Analysis -- Measure energy in Python, JavaScript, and C code
- Accelerator Energy -- GPU and accelerator energy measurement across vendors
- TensorForge -- Energy-aware ML framework built on Joule
Language Reference
The formal specification of Joule's syntax and semantics.
- Types -- Primitives, compounds, generics, union types, type inference
- Expressions -- Operators, pipe operator, literals, control flow, closures
- Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
- Patterns -- Pattern matching: or patterns, range patterns, guard clauses
- Attributes -- Energy budgets, #[test], #[bench], thermal awareness
- Memory -- Ownership, borrowing, references, lifetimes
- Concurrency -- Async/await, spawn, channels, task groups, parallel for
- Energy -- Energy system specification with accelerator support
Standard Library Reference
Joule ships with 110+ batteries-included modules.
- Overview -- Index of all standard library modules
- String -- String type and operations
- Vec -- Dynamic arrays
- Option -- Optional values
- Result -- Error handling
- HashMap -- Key-value maps
- Primitives -- Numeric types, bool, char
- Collections -- All collection types
- I/O -- File and stream I/O
- Math -- Mathematical functions and linear algebra
Feedback
To report bugs, request features, or ask questions, visit joule-lang.org or open an issue on GitHub.
Getting Started with Joule
This guide walks you through installing the Joule compiler and writing your first program.
Current version: v1.2.0
Install
Quick Install (Recommended)
| Platform | Command |
|---|---|
| macOS / Linux | brew install openIE-dev/joule/joule |
| Windows | winget install OpenIE.Joule |
| Ubuntu / Debian | sudo apt install joule (after adding the repo) |
| Arch Linux | yay -S joule-bin |
| Nix | nix run github:openIE-dev/joule-lang |
| Snap | sudo snap install joule --classic |
| Any (curl) | curl -fsSL https://joule-lang.org/install.sh \| sh |
macOS
Homebrew (recommended):
brew install openIE-dev/joule/joule
Or download joule-macos-arm64.pkg (Apple Silicon) or joule-macos-x86_64.pkg (Intel) from the releases page:
sudo installer -pkg joule-macos-arm64.pkg -target /
Windows
Winget (recommended, built into Windows 11):
winget install OpenIE.Joule
Scoop:
scoop bucket add joule https://github.com/openIE-dev/scoop-joule
scoop install joule
Chocolatey:
choco install joule
Or download joule-windows-x86_64.msi or joule-windows-arm64.msi from the releases page. The MSI installer adds joulec to your PATH automatically.
APT (Ubuntu/Debian)
curl -fsSL https://openie-dev.github.io/joule-lang/KEY.gpg | sudo gpg --dearmor -o /usr/share/keyrings/joule.gpg
echo "deb [signed-by=/usr/share/keyrings/joule.gpg] https://openie-dev.github.io/joule-lang stable main" | sudo tee /etc/apt/sources.list.d/joule.list
sudo apt update && sudo apt install joule
Arch Linux (AUR)
yay -S joule-bin
Or with any AUR helper: paru -S joule-bin, trizen -S joule-bin.
Nix
# Run without installing
nix run github:openIE-dev/joule-lang
# Install into profile
nix profile install github:openIE-dev/joule-lang
Snap
sudo snap install joule --classic
Install Script
Universal one-line installer for macOS and Linux:
curl -fsSL https://joule-lang.org/install.sh | sh
From Source
git clone https://github.com/openIE-dev/joule-lang.git
cd joule-lang && cargo build --release
From C Source (Zero Dependencies)
Download joule-c-src-*.tar.gz from the releases page:
tar xzf joule-c-src-*.tar.gz && cd joule-c-src-*
make # or: cc -O2 -o joulec output.c -lm
Verify
joulec --version
# joulec 1.2.0
Write Your First Program
Create a file called hello.joule:
pub fn main() {
let message = "Hello from Joule!";
println!("{}", message);
}
Compile and Run
joulec hello.joule -o hello
./hello
Output:
Hello from Joule!
Try JIT Mode
For interactive development, skip the compile step entirely:
joulec --jit hello.joule
This JIT-compiles and runs your program in memory using the Cranelift backend. No intermediate files are produced.
For an even faster workflow, use watch mode. It monitors your source file and re-runs automatically when you save:
joulec --watch hello.joule
JIT mode requires the jit feature flag. See JIT Compilation for details.
Add an Energy Budget
Joule's defining feature is compile-time energy budget verification. Annotate your function with an energy allowance:
#[energy_budget(max_joules = 0.0001)]
pub fn main() {
let x = 42;
let y = 58;
let result = x + y;
println!("{}", result);
}
Compile with energy checking:
joulec hello.joule -o hello --energy-check
The compiler estimates the energy cost of your function at compile time. If it exceeds the declared budget, compilation fails with a diagnostic showing the estimated vs. allowed energy.
Measure Energy in Existing Code
Already have Python or JavaScript code? Joule can measure its energy consumption without rewriting it:
# Measure energy in a Python script
joulec --lift-run python script.py
# Measure energy in a JavaScript file
joulec --lift-run js app.js
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it. See Polyglot Energy Analysis for details.
Batteries Included
Joule ships with 110+ standard library modules. No package manager needed for common tasks:
use std::math;
use std::collections::HashMap;
use std::io::File;
use std::net::TcpStream;
use std::crypto::sha256;
See the Standard Library Reference for the complete list.
Feedback
Joule is developed and maintained by Open Interface Engineering, Inc. We welcome bug reports, feature requests, and questions via joule-lang.org or GitHub Issues.
Next Steps
- Language Tour -- Learn Joule's syntax and features through examples
- Energy System Guide -- Deep dive into energy budgets
- Compiler Reference -- All CLI flags and options
- JIT Compilation -- Interactive development workflow
- Standard Library -- Available types and modules
Language Tour
A quick introduction to Joule's syntax and features through examples.
Variables
Variables are immutable by default. Use mut for mutable bindings.
let x = 42; // immutable, type inferred as i32
let name: String = "Jo"; // explicit type annotation
let mut count = 0; // mutable
count = count + 1;
Primitive Types
let a: i8 = -128; // signed integers: i8, i16, i32, i64, isize
let b: u32 = 42; // unsigned integers: u8, u16, u32, u64, usize
let c: f64 = 3.14159; // floats: f16, bf16, f32, f64
let d: bool = true; // boolean
let e: char = 'A'; // unicode character
let s: String = "hello"; // string
let h: f16 = 0.5f16; // half-precision (ML inference, signal processing)
let g: bf16 = 0.001bf16; // brain float (ML training)
Functions
// Basic function with parameters and return type
fn add(a: i32, b: i32) -> i32 {
a + b // last expression is the return value
}
// Public function (visible outside the module)
pub fn greet(name: String) {
println!("Hello, {}", name);
}
// Mutable self parameter for methods that modify state
fn advance(mut self) -> Token {
let token = self.peek();
self.pos = self.pos + 1;
token
}
Structs
pub struct Point {
pub x: f64,
pub y: f64,
}
// Construction
let p = Point { x: 3.0, y: 4.0 };
// Field access
let dist = p.x * p.x + p.y * p.y;
Impl Blocks
Methods are defined in impl blocks, separate from the struct definition.
impl Point {
// Associated function (constructor)
pub fn new(x: f64, y: f64) -> Point {
Point { x, y }
}
// Method on self
pub fn distance(self) -> f64 {
(self.x * self.x + self.y * self.y).sqrt()
}
// Mutable method
pub fn translate(mut self, dx: f64, dy: f64) {
self.x = self.x + dx;
self.y = self.y + dy;
}
}
let p = Point::new(3.0, 4.0);
let d = p.distance();
Enums
Enums can hold data in each variant, making them sum types (tagged unions).
pub enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Triangle { base: f64, height: f64 },
}
let s = Shape::Circle { radius: 5.0 };
Pattern Matching
match is exhaustive -- the compiler ensures you handle every variant.
fn area(shape: Shape) -> f64 {
match shape {
Shape::Circle { radius } => {
3.14159 * radius * radius
}
Shape::Rectangle { width, height } => {
width * height
}
Shape::Triangle { base, height } => {
0.5 * base * height
}
}
}
Match with a wildcard:
match token.kind {
TokenKind::Fn => parse_function(),
TokenKind::Struct => parse_struct(),
TokenKind::Enum => parse_enum(),
_ => parse_expression(),
}
Or Patterns
Match multiple alternatives in a single arm:
match x {
1 | 2 | 3 => "small",
4 | 5 | 6 => "medium",
_ => "large",
}
Range Patterns
Match a range of values:
match score {
0..=59 => "F",
60..=69 => "D",
70..=79 => "C",
80..=89 => "B",
90..=100 => "A",
_ => "invalid",
}
Guard Clauses
Add conditions to match arms:
match value {
x if x > 0 => "positive",
x if x < 0 => "negative",
_ => "zero",
}
Control Flow
// if-else (these are expressions -- they return values)
let max = if a > b { a } else { b };
// while loop
let mut i = 0;
while i < 10 {
i = i + 1;
}
// for loop
for item in items {
process(item);
}
// loop (infinite, break to exit)
loop {
if done() {
break;
}
}
Option and Result
Option<T> represents a value that may or may not exist. Result<T, E> represents an operation that can succeed or fail.
// Option
fn find(items: Vec<i32>, target: i32) -> Option<usize> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(i);
}
i = i + 1;
}
Option::None
}
// Handling an Option
match find(items, 42) {
Option::Some(index) => println!("Found at {}", index),
Option::None => println!("Not found"),
}
// Result
fn parse_number(s: String) -> Result<i32, String> {
// ...
Result::Ok(42)
}
match parse_number(input) {
Result::Ok(n) => println!("Got: {}", n),
Result::Err(e) => println!("Error: {}", e),
}
Generics
Functions and types can be parameterized over types.
pub struct Pair<A, B> {
pub first: A,
pub second: B,
}
fn swap<A, B>(pair: Pair<A, B>) -> Pair<B, A> {
Pair { first: pair.second, second: pair.first }
}
Traits
Traits define shared behavior. Types implement traits with impl.
pub trait Display {
fn to_string(self) -> String;
}
impl Display for Point {
fn to_string(self) -> String {
"(" + self.x.to_string() + ", " + self.y.to_string() + ")"
}
}
Collections
// Vec -- dynamic array
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
let first = v[0]; // indexing
let len = v.len(); // length
// HashMap -- key-value store
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("alice", 42);
map.insert("bob", 17);
Closures
Anonymous functions that can capture variables from their enclosing scope:
let double = |x: i32| -> i32 { x * 2 };
let result = double(21); // 42
// Closures capture variables
let multiplier = 3;
let multiply = |x: i32| -> i32 { x * multiplier };
Range-Based For Loops
Iterate over numeric ranges with ..:
// Exclusive range: 0, 1, 2, ..., 9
for i in 0..10 {
println!("{}", i);
}
// Use in accumulation
let mut sum = 0;
for i in 1..101 {
sum = sum + i;
}
// sum = 5050
Iterator Methods
Vec supports functional-style iterator methods:
let numbers = vec![1, 2, 3, 4, 5];
// Transform elements
let doubled = numbers.map(|x: i32| -> i32 { x * 2 });
// Filter elements
let evens = numbers.filter(|x: i32| -> bool { x % 2 == 0 });
// Check conditions
let has_negative = numbers.any(|x: i32| -> bool { x < 0 });
let all_positive = numbers.all(|x: i32| -> bool { x > 0 });
// Reduce to single value
let sum = numbers.fold(0, |acc: i32, x: i32| -> i32 { acc + x });
Option and Result Methods
Rich combinator APIs for safe value handling:
let opt: Option<i32> = Option::Some(42);
// Query
let is_there = opt.is_some(); // true
let is_empty = opt.is_none(); // false
// Extract with default
let val = opt.unwrap_or(0); // 42
// Transform
let doubled = opt.map(|x: i32| -> i32 { x * 2 }); // Some(84)
// Chain operations
let result = opt.and_then(|x: i32| -> Option<i32> {
if x > 0 { Option::Some(x * 10) } else { Option::None }
});
Pipe Operator
The pipe operator |> passes the result of the left expression as the first argument to the right function. It makes data transformation pipelines readable:
// Without pipe -- deeply nested calls
let result = to_uppercase(trim(read_file("data.txt")));
// With pipe -- reads left to right
let result = read_file("data.txt")
|> trim
|> to_uppercase;
// Works with closures and multi-argument functions
let processed = data
|> filter(|x| x > 0)
|> map(|x| x * 2)
|> fold(0, |acc, x| acc + x);
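The same left-to-right chaining can be emulated in any language with first-class functions. As an illustration of the semantics only (this is Python, not Joule), the single-argument form of |> behaves like a fold over a list of stages:

```python
from functools import reduce

# Python analogy for Joule's |> operator: feed a value through a chain
# of single-argument stages, left to right. Illustration only, not Joule.
def pipe(value, *stages):
    return reduce(lambda acc, stage: stage(acc), stages, value)

# read_file("data.txt") |> trim |> to_uppercase, approximated with strings:
result = pipe("  data.txt  ", str.strip, str.upper)
print(result)  # DATA.TXT
```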
Union Types
Union types allow a value to be one of several types, checked at compile time:
type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;
fn process(value: JsonValue) {
match value {
x: i64 => println!("integer: {}", x),
x: f64 => println!("float: {}", x),
s: String => println!("string: {}", s),
b: bool => println!("bool: {}", b),
arr: Vec<JsonValue> => println!("array of {}", arr.len()),
}
}
Algebraic Effects
Effects declare the side effects a function may perform, tracked by the type system:
effect Log {
fn log(message: String);
}
effect Fail {
fn fail(reason: String) -> !;
}
fn process(data: Vec<u8>) -> Result<Output, Error> with Log, Fail {
Log::log("Processing started");
if data.is_empty() {
Fail::fail("empty input");
}
// ...
}
Effects are handled at the call site:
handle process(data) {
Log::log(msg) => {
println!("[LOG] {}", msg);
resume;
}
Fail::fail(reason) => {
Result::Err(Error::new(reason))
}
}
Supervisors
Supervisors manage the lifecycle of concurrent tasks with automatic restart strategies:
use std::concurrency::Supervisor;
let sup = Supervisor::new(RestartStrategy::OneForOne);
sup.spawn("worker-1", || {
// If this task panics, only this task is restarted
process_queue()
});
sup.spawn("worker-2", || {
process_events()
});
sup.run();
Parallel For
Parallel iteration over collections with automatic work distribution:
// Parallel map over a vector
let results = parallel for item in data {
heavy_computation(item)
};
// With explicit chunk size
let processed = parallel(chunk_size: 1024) for row in matrix {
transform(row)
};
The compiler tracks energy consumption across all parallel branches and sums them for the total budget.
Computation Builders
Computation builders provide a monadic syntax for composing complex operations:
let result = async {
let data = fetch(url).await;
let parsed = parse(data).await;
transform(parsed)
};
let query = query {
from users
where age > 18
select name, email
order_by name
};
Const Functions
Functions that can be evaluated at compile time:
const fn factorial(n: i32) -> i32 {
if n <= 1 { 1 } else { n * factorial(n - 1) }
}
// Evaluated at compile time
const FACT_10: i32 = factorial(10);
Comptime Blocks
Execute arbitrary code at compile time:
comptime {
let lookup = generate_lookup_table(256);
// lookup is available as a constant in runtime code
}
Modules and Imports
// Import specific items
use crate::ast::{File, AstItem};
use std::collections::HashMap;
// Module declarations (loads from separate file)
mod lexer; // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;
// Public module re-export
pub mod utils;
// Inline module
mod helpers {
pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
if x < lo { lo } else if x > hi { hi } else { x }
}
}
// Glob import from stdlib
use std::math::*;
Async/Await with Channels
Asynchronous programming with channels for communication:
use std::concurrency::{spawn, channel};
async fn fetch_and_process(url: String) -> Result<Data, Error> {
let response = http::get(url).await?;
let data = parse(response.body()).await?;
Result::Ok(data)
}
// Bounded channels for backpressure
let (tx, rx) = channel(capacity: 100);
spawn(|| {
for item in source {
tx.send(item);
}
});
while let Option::Some(item) = rx.recv() {
process(item);
}
Smart Pointers
Manage shared ownership and heap allocation:
// Box — heap allocation, required for recursive types
let b = Box::new(42);
// Rc — single-threaded shared ownership
let shared = Rc::new(vec![1, 2, 3]);
let copy = shared.clone(); // reference count +1
// Arc — thread-safe shared ownership
let data = Arc::new(vec![1, 2, 3]);
spawn(|| { let local = data.clone(); });
// Cow — clone-on-write (free reads, allocate on mutation)
let text = Cow::borrowed("hello");
See Smart Pointers for full documentation.
Const-Generic Types
Types with compile-time integer parameters:
// SmallVec — inline buffer, heap only when overflow
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42); // inline — no heap allocation
// Simd — portable SIMD vectors
let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::splat(2.0);
let c = a.mul(&b); // [2.0, 4.0, 6.0, 8.0] — single instruction
// NDArray — multi-dimensional arrays
let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let val = mat[1, 2];
Box (Heap Allocation)
Box<T> puts data on the heap. Required for recursive types.
pub enum Expr {
Literal(i32),
Add {
left: Box<Expr>,
right: Box<Expr>,
},
}
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Energy Budgets
Joule's defining feature. Declare the maximum energy a function is allowed to consume:
#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
x + y
}
The compiler estimates energy consumption at compile time. If a function exceeds its budget, compilation fails.
Power and thermal budgets are also available:
#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn thermal_safe_compute(n: i32) -> i32 {
let result = n * n;
result + 1
}
Compile with --energy-check to enable verification:
joulec program.joule -o program --energy-check
See Energy System Guide for a deep dive.
Testing with Energy
Write tests that verify both correctness and energy consumption:
#[test]
fn test_sort_energy() {
let data = vec![5, 3, 1, 4, 2];
let sorted = sort(data);
assert_eq!(sorted, vec![1, 2, 3, 4, 5]);
}
#[bench]
fn bench_matrix_multiply() {
let a = Matrix::random(100, 100);
let b = Matrix::random(100, 100);
let _ = a.multiply(b);
}
Run with:
joulec program.joule --test # runs tests with energy reporting
joulec program.joule --bench # runs benchmarks with energy reporting
Built-in Macros
Joule provides built-in macros for common operations:
// Output
println!("Hello, {}!", name); // print with newline
print!("no newline"); // print without newline
// Formatting
let s = format!("{} + {} = {}", a, b, a + b);
// Collections
let nums = vec![1, 2, 3, 4, 5];
// Assertions (for testing)
assert!(x > 0);
assert_eq!(result, expected);
For FFI with C libraries, use extern declarations:
extern fn sqrt(x: f64) -> f64;
What's Next
- Energy System Guide -- Deep dive into energy budgets
- Compiler Reference -- CLI flags and options
- JIT Compilation -- Interactive development
- Polyglot Energy Analysis -- Measure energy in Python/JS/C
- Standard Library -- All 110+ modules
- Language Reference -- Formal specification
Energy System Guide
Joule's defining feature is compile-time energy budget verification. This guide explains how it works and how to use it.
Why Energy Budgets?
Computing consumes enormous amounts of energy, and most of it is invisible. Cloud providers report aggregate billing units. Industry benchmarks report averages. Nobody tells you what a single sort, a single allocation, or a single network call actually costs in joules.
Joule makes that cost visible. Every function can declare its energy budget, and the compiler enforces it at compile time.
Basic Usage
Annotate a function with #[energy_budget]:
#[energy_budget(max_joules = 0.0001)] // 100 microjoules
fn add(x: i32, y: i32) -> i32 {
x + y
}
Compile with energy checking enabled:
joulec program.joule -o program --energy-check
If the function's estimated energy exceeds the declared budget, compilation fails with a diagnostic:
error: energy budget exceeded in function 'process_data'
--> program.joule:15:1
|
15 | fn process_data(input: Vec<f64>) -> f64 {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= estimated: 0.00035 J (confidence: 85%)
= budget: 0.00010 J
= exceeded by 250%
Budget Types
Energy Budget (Joules)
The primary budget type. Limits total energy consumption:
#[energy_budget(max_joules = 0.0005)]
fn fibonacci(n: i32) -> i32 {
// ...
}
Power Budget (Watts)
Limits average power draw. Useful for sustained workloads:
#[energy_budget(max_watts = 15.0)]
fn render_frame(scene: Scene) -> Image {
// ...
}
Thermal Budget (Temperature Delta)
Limits the temperature increase caused by the function. Prevents thermal throttling:
#[energy_budget(max_temp_delta = 5.0)] // max 5 degrees Celsius rise
fn heavy_compute(data: Vec<f64>) -> f64 {
// ...
}
Thermal-Aware Functions
The #[thermal_aware] attribute marks functions that should adapt to thermal conditions:
#[energy_budget(max_joules = 0.0002)]
#[thermal_aware]
fn adaptive_compute(n: i32) -> i32 {
let result = n * n;
result + 1
}
Combining Budgets
You can declare multiple budget constraints:
#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path(data: Vec<f64>) -> f64 {
// ...
}
How the Estimator Works
The compiler uses static analysis to estimate energy consumption without running your code. Here's what it considers:
Instruction Costs
Every operation has a calibrated energy cost in picojoules:
| Operation | Approximate Cost | Cycles |
|---|---|---|
| Integer add/sub | 0.05 pJ | 1 |
| Integer multiply | 0.35 pJ | 3 |
| Integer divide | 3.5 pJ | 10 |
| Float add/sub | 0.35 pJ | 3 |
| Float multiply | 0.35 pJ | 3 |
| Float divide | 3.5 pJ | 10 |
| Float sqrt | 5.25 pJ | 15 |
| L1 cache load | 0.5 pJ | 4 |
| L2 cache load | 3.0 pJ | 12 |
| L3 cache load | 10.0 pJ | 40 |
| DRAM load/store | 200.0 pJ | 200 |
| Branch (taken) | 0.1 pJ | 1 |
| Branch misprediction | 1.5 pJ | 15 |
| SIMD f32x8 multiply | 1.5 pJ | 3 |
| Half-precision (f16/bf16) op | 0.4 pJ | 1 |
| SmallVec inline push | 0.5 pJ | 1 |
| SmallVec heap spill | 45.0 pJ | ~100 |
| SIMD vector op (any width) | 2.0 pJ | 3 |
| Atomic read-modify-write | 8.0 pJ | 20 |
| Rc/Arc clone/drop | 3.0 pJ | 5 |
| Arena bump alloc | 1.0 pJ | 2 |
| Arena reset (free all) | 0.5 pJ | 1 |
| BitSet/BitVec word op | 0.3 pJ | 1 |
| Decimal (128-bit) arithmetic | 5.0 pJ | 15 |
| Deque push/pop | 2.0 pJ | 5 |
| Intern hash lookup | 10.0 pJ | 30 |
| Complex arithmetic | 1.6 pJ | 4 |
| Instant::now() (clock read) | 15.0 pJ | 50 |
| BTreeMap/BTreeSet traversal | 12.0 pJ | 40 |
Loop Analysis
For loops with known bounds, the estimator multiplies the loop body cost by the iteration count. For unbounded loops (while with runtime conditions), it uses a configurable default (100 iterations) and reduces the confidence score.
Branch Analysis
For if/else and match expressions, the estimator computes the cost of each branch and averages them, since it can't know which branch will execute at compile time. This reduces the confidence score.
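The two rules above can be sketched as a toy estimator. This is an illustrative model, not the compiler's implementation: the costs come from the instruction cost table above, while the tree shape is hypothetical.

```python
# Illustrative model of the static estimator's loop and branch rules.
# Costs (in picojoules) come from the instruction cost table above; the
# tree shape here is hypothetical, not Joule's real IR.
COST_PJ = {"int_add": 0.05, "branch_taken": 0.1}
DEFAULT_TRIP_COUNT = 100   # assumed iterations for unbounded loops

def estimate(node):
    """Return (energy_pj, confidence) for a tiny expression tree."""
    kind = node["kind"]
    if kind == "op":                           # straight-line instruction
        return COST_PJ[node["name"]], 1.0
    if kind == "loop":                         # body cost x trip count
        body_e, body_c = estimate(node["body"])
        trips = node.get("trips")              # None => unbounded
        if trips is None:
            return body_e * DEFAULT_TRIP_COUNT, body_c * 0.7
        return body_e * trips, body_c
    if kind == "branch":                       # average the two arms
        then_e, then_c = estimate(node["then"])
        else_e, else_c = estimate(node["else"])
        return (then_e + else_e) / 2, min(then_c, else_c) * 0.9
    raise ValueError(kind)

# Bounded loop of 10 integer adds: precise, full confidence
bounded = {"kind": "loop", "trips": 10,
           "body": {"kind": "op", "name": "int_add"}}
print(estimate(bounded))     # about 0.5 pJ at confidence 1.0

# Unbounded loop: falls back to 100 iterations, confidence drops
unbounded = {"kind": "loop", "trips": None,
             "body": {"kind": "op", "name": "int_add"}}
print(estimate(unbounded))   # about 5.0 pJ at confidence 0.7
```

Note how branching averages the arm costs while loops multiply the body cost, which is why unbounded loops dominate both the estimate and the confidence penalty.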
Confidence Score
Every estimate comes with a confidence score from 0.0 to 1.0:
- 1.0 -- Straight-line code, no loops or branches. Estimate is precise.
- 0.85-0.95 -- Code with branches. Estimate is an average.
- 0.5-0.85 -- Code with unbounded loops. Estimate depends on assumed iteration count.
- < 0.5 -- Complex code with nested unbounded loops. Estimate is rough.
The confidence score is shown in diagnostic output so you can judge the reliability of the estimate.
Power Estimation
Power (watts) is derived from energy and estimated execution time:
Power = Energy / Time
Time = Estimated Cycles / CPU Frequency (3.0 GHz reference)
Thermal Estimation
Temperature delta is derived from power using a simplified thermal model:
Delta_T = Power * Thermal_Resistance (0.4 K/W typical)
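Plugging concrete numbers into the two formulas makes the scale tangible. A quick worked example (Python, using the reference constants quoted above):

```python
# Worked example of the power and thermal formulas above, using the
# reference constants from the text (3.0 GHz CPU, 0.4 K/W).
CPU_FREQ_HZ = 3.0e9
THERMAL_RESISTANCE_K_PER_W = 0.4

def power_watts(energy_joules, estimated_cycles):
    time_s = estimated_cycles / CPU_FREQ_HZ   # Time = cycles / frequency
    return energy_joules / time_s             # Power = energy / time

def temp_delta_kelvin(power_w):
    return power_w * THERMAL_RESISTANCE_K_PER_W   # Delta_T = P * R_th

# A function estimated at 0.00035 J over 100,000 cycles:
p = power_watts(3.5e-4, 100_000)
dt = temp_delta_kelvin(p)
print(p, dt)   # roughly 10.5 W average draw and a 4.2 K rise
```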
Three-Tier Energy Measurement
Joule measures energy at three levels, providing increasing precision:
Tier 1: Static Estimation (Compile Time)
The compiler estimates energy from code structure alone, using the instruction cost model described above. This is available for all programs, everywhere, with zero runtime overhead.
- No hardware access required
- Works at compile time
- Confidence score indicates reliability
- Used for #[energy_budget] verification
Tier 2: CPU Performance Counters (Runtime)
On supported platforms, Joule reads hardware performance counters (RAPL on Intel/AMD) to measure actual CPU energy consumption during execution.
- Requires Linux with perf_event or macOS with powermetrics
- Per-function and per-scope measurements
- Joule-level precision (not just watt-hours)
Tier 3: Accelerator Energy (Runtime)
For GPU and accelerator workloads, Joule queries vendor-specific energy APIs. See Accelerator Energy Measurement for details.
- NVIDIA GPUs via NVML
- AMD GPUs via ROCm SMI
- Intel GPUs/accelerators via Level Zero
- Google TPUs via TPU runtime
- AWS Inferentia/Trainium via Neuron SDK
- Groq LPUs via HLML
- Cerebras and SambaNova via vendor APIs
JSON Output Mode
For programmatic consumption, set the environment variable JOULE_ENERGY_JSON=1:
JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c
This outputs energy reports as structured JSON:
{
"functions": [
{
"name": "process_data",
"file": "program.joule",
"line": 15,
"energy_joules": 0.00035,
"power_watts": 12.5,
"confidence": 0.85,
"budget_joules": 0.0001,
"status": "exceeded",
"breakdown": {
"compute_pj": 280000,
"memory_pj": 70000,
"branch_pj": 500
}
}
],
"total_energy_joules": 0.00042,
"device": "cpu"
}
When accelerator energy is available, the JSON includes per-device breakdowns:
{
"devices": [
{ "type": "cpu", "energy_joules": 0.00042 },
{ "type": "gpu", "vendor": "nvidia", "energy_joules": 0.0031, "api": "nvml" }
],
"total_energy_joules": 0.00352
}
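A typical consumer is a CI gate that fails the build when any function exceeds its budget. A minimal sketch, assuming only the fields shown in the sample report above:

```python
import json

# Sketch of a CI-side consumer for JOULE_ENERGY_JSON=1 reports.
# Only fields shown in the sample report above are assumed to exist.
report = json.loads("""
{
  "functions": [
    {"name": "process_data", "file": "program.joule", "line": 15,
     "energy_joules": 0.00035, "confidence": 0.85,
     "budget_joules": 0.0001, "status": "exceeded"}
  ],
  "total_energy_joules": 0.00042,
  "device": "cpu"
}
""")

over_budget = [f for f in report["functions"] if f["status"] == "exceeded"]
for f in over_budget:
    ratio = f["energy_joules"] / f["budget_joules"]
    print(f"{f['name']} ({f['file']}:{f['line']}): "
          f"{ratio:.1f}x over budget (confidence {f['confidence']:.0%})")

exit_code = 1 if over_budget else 0   # nonzero fails the CI job
```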
Practical Guidelines
Start Generous, Then Tighten
Begin with a generous budget, measure, then reduce:
// Start here
#[energy_budget(max_joules = 0.01)]
// After profiling, tighten
#[energy_budget(max_joules = 0.001)]
// Production target
#[energy_budget(max_joules = 0.0005)]
Budget Hot Loops Carefully
The estimator assumes 100 iterations for unbounded loops. If your loop runs 10,000 times, the estimate will be 100x too low. Consider refactoring into bounded loops or adjusting your budget accordingly.
Use Confidence Scores
If the compiler reports low confidence (< 0.7), the estimate may be significantly off. Review the function for unbounded loops and complex branching.
Transitive Energy Budgets
Energy budgets are enforced across call boundaries. A function calling another budgeted function includes the callee's energy in its own estimate:
#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }
#[energy_budget(max_joules = 0.0005)]
fn main_work() -> i32 {
// The compiler accounts for helper's energy within main_work's budget
helper() + helper()
}
Profile-Guided Refinement
For the most accurate energy estimates, use profile-guided optimization:
# Phase 1: instrument and run
joulec program.joule --profile-generate -o program
./program
# Phase 2: compile with profile data
joulec program.joule --profile-use profile.json --energy-check -o program
The profile data provides actual loop trip counts and branch frequencies, dramatically improving estimate accuracy.
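The trip-count refinement can be sketched as follows. The profile schema here (block name to execution count) is hypothetical; the reference only states that trip counts are derived from back-edge counter ratios.

```python
# Sketch of profile-guided refinement of a loop's trip count. The
# profile schema (block name -> execution count) is hypothetical; the
# text says only that trip counts come from back-edge counter ratios.
DEFAULT_TRIP_COUNT = 100   # static default for unbounded loops

def refined_trips(profile, header_block, backedge_block):
    """Average trips per entry = back-edge count / header entry count."""
    entries = profile.get(header_block, 0)
    if entries == 0:
        return DEFAULT_TRIP_COUNT          # no data: keep static default
    return profile[backedge_block] / entries

# Loop entered 3 times, back edge taken 30,000 times:
profile = {"loop_header": 3, "loop_backedge": 30_000}
trips = refined_trips(profile, "loop_header", "loop_backedge")
print(trips)   # 10000.0 -- versus the static default of 100
```

This is exactly the case the "Budget Hot Loops Carefully" guideline warns about: without the profile, the static estimate would be off by two orders of magnitude.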
Feedback
Questions about the energy system? Visit joule-lang.org or open an issue on GitHub.
Compiler Reference
Usage
joulec <INPUT> [OPTIONS]
Where <INPUT> is a .joule source file (or a foreign source file when using --lift-run).
Options
| Flag | Description | Default |
|---|---|---|
| -o <FILE> | Output file path | Derived from input |
| --emit <TYPE> | Emit intermediate representation: ast, hir, mir, llvm-ir, c, eir | (compile to binary) |
| --backend <BACKEND> | Code generation backend: cranelift, llvm, mlir, auto | cranelift |
| --target <TARGET> | Target platform: cpu, cuda, metal, rocm, hybrid | cpu |
| -O <LEVEL> | Optimization level: 0, 1, 2, 3 | 0 |
| --energy-check | Enable compile-time energy budget verification | Off |
| --gpu | Enable GPU code generation (uses MLIR backend) | Off |
| --jit | JIT-compile and run immediately (requires --features jit) | Off |
| --watch | Watch source file and re-run on changes (implies --jit) | Off |
| --lift <LANG> | Lift foreign code for energy analysis: python, js, c | (none) |
| --lift-run <LANG> <FILE> | Lift and execute foreign code with energy tracking | (none) |
| --energy-optimize | Apply energy optimization passes to lifted code | Off |
| --egraph-optimize | Enable e-graph algebraic optimization (30+ rewrite rules) | Off |
| --profile-generate | Instrument code for profile-guided optimization | Off |
| --profile-use <FILE> | Apply PGO profile data from a previous run | (none) |
| --incremental | Enable incremental compilation (FNV-1a fingerprinting) | Off |
| --test | Build and run #[test] functions with energy reporting | Off |
| --bench | Build and run #[bench] functions with energy reporting | Off |
| --debug | Debug build profile (no optimizations, debug info) | Default |
| --release | Release build profile (-O2, strip debug info) | Off |
| --stdlib-path <DIR> | Path to the Joule standard library | Built-in |
| -v, --verbose | Verbose compiler output | Off |
Environment Variables
| Variable | Description |
|---|---|
| JOULE_ENERGY_JSON=1 | Output energy reports as JSON instead of human-readable text |
Examples
Basic Compilation
# Compile to executable via C backend
joulec program.joule --emit c -o program.c
cc -o program program.c
# Compile with energy checking
joulec program.joule --emit c -o program.c --energy-check
# Release build with optimizations
joulec program.joule --release -o program
Emit Intermediate Representations
# Emit the AST (for debugging)
joulec program.joule --emit ast
# Emit HIR (typed intermediate representation)
joulec program.joule --emit hir
# Emit MIR (mid-level IR, after lowering)
joulec program.joule --emit mir
# Emit EIR (Energy IR with picojoule cost annotations)
joulec program.joule --emit eir
JIT Compilation
# JIT-compile and run immediately
joulec --jit program.joule
# Watch mode: re-compile and re-run on file changes
joulec --watch program.joule
See JIT Compilation for details.
Polyglot Energy Analysis
# Lift and run Python with energy measurement
joulec --lift-run python script.py
# Lift and run JavaScript with energy measurement
joulec --lift-run js app.js
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
See Polyglot Energy Analysis for details.
Advanced Optimization
# E-graph algebraic optimization
joulec program.joule --emit c --egraph-optimize -o program.c
# Profile-guided optimization (two-phase)
joulec program.joule --profile-generate -o program
./program # generates profile data
joulec program.joule --profile-use profile.json -o program_optimized
# Incremental compilation
joulec program.joule --incremental -o program
Testing and Benchmarking
# Run tests with energy reporting
joulec program.joule --test
# Run benchmarks with energy reporting
joulec program.joule --bench
JSON Energy Output
# Get energy reports as JSON
JOULE_ENERGY_JSON=1 joulec program.joule --emit c -o program.c --energy-check
Compilation Pipeline
Source code flows through these stages:
Source (.joule)
|
v
Lexer ---------- Tokens
|
v
Parser --------- AST (Abstract Syntax Tree)
|
v
Type Checker ---- HIR (High-level IR) + Type Information
|
+-- Energy Budget Checker (if --energy-check)
|
v
EIR Lowering ---- EIR (Energy IR) [if --egraph-optimize or --emit eir]
|
+-- E-Graph Optimizer (30+ algebraic rewrite rules)
|
v
MIR Lowering ---- MIR (Mid-level IR)
|
v
Borrow Checker -- Ownership/lifetime verification
|
v
Code Generation
+-- C Backend ---------- C source code
+-- Cranelift Backend --- Native binary (fast compilation)
+-- Cranelift JIT ------- In-memory execution (--jit/--watch)
+-- LLVM Backend -------- Native binary (optimized)
+-- MLIR Backend -------- GPU/accelerator code
+-- WASM Backend -------- WebAssembly
Incremental Compilation
When --incremental is enabled, the compiler:
- Fingerprints each source file using FNV-1a hashing
- Builds a dependency graph between modules
- On recompilation, only reprocesses files whose fingerprint changed (or whose dependencies changed)
- Caches query results to disk as JSON for persistence across sessions
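The fingerprint check is plain FNV-1a. A minimal Python sketch of the hashing step (the constants are the standard 64-bit FNV parameters; the dependency graph and JSON cache are not shown):

```python
def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime."""
    h = 0xcbf29ce484222325                            # FNV-1a 64-bit offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF  # FNV prime, wrapped to 64 bits
    return h

# A one-character edit changes the fingerprint, so the module
# (and anything that depends on it) is marked for recompilation.
before = fnv1a_64(b"fn main() {}")
after = fnv1a_64(b"fn main() { }")
```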
Profile-Guided Optimization
PGO is a two-phase process:
- Phase 1 (--profile-generate): The compiler instruments the C output with basic-block counters. Running the instrumented binary produces a JSON profile with execution frequencies.
- Phase 2 (--profile-use): The compiler reads the profile and refines EIR energy cost estimates using actual execution frequencies. Loop trip counts are derived from back-edge counter ratios. Hot paths get more accurate energy budgets.
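The back-edge ratio mentioned in Phase 2 reduces to a division; a hedged sketch (the counter names are illustrative, not joulec's actual profile schema):

```python
def estimated_trip_count(back_edge_count: int, entry_count: int) -> float:
    """Average iterations per loop entry: back-edge executions divided
    by the number of times control entered the loop."""
    if entry_count == 0:
        return 0.0
    return back_edge_count / entry_count

# A loop whose back edge ran 10,000 times across 100 entries
# averages ~100 iterations per entry, refining its energy estimate.
trips = estimated_trip_count(10_000, 100)
```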
Backends
C Backend (--emit c)
Generates portable C source code. This is the primary backend and the one used for the bootstrap compiler. The generated C compiles with any standard C compiler (gcc, clang, cc).
joulec program.joule --emit c -o program.c
cc -o program program.c
Features:
- Freestanding mode for embedded targets (jrt_* runtime abstraction)
- #line directives for source-level debugging
- Energy instrumentation for PGO
Cranelift Backend
Fast compilation, suitable for development. Uses the Cranelift code generator. Enable with --features cranelift.
Cranelift JIT Backend
In-memory compilation and execution. No intermediate files. Enable with --features jit.
joulec --jit program.joule
LLVM Backend
Optimized compilation for release builds. Requires LLVM 16+. Enable with --features llvm.
MLIR Backend
Heterogeneous computing with GPU/accelerator support. Targets CUDA, Metal, and ROCm. Enable with --gpu.
WASM Backend
WebAssembly output for browser and edge deployment.
File Extension
Joule source files must use the .joule extension. The compiler rejects all other extensions (except when using --lift-run, which accepts .py, .js, and .c).
Energy Checking
When --energy-check is passed, the compiler performs static analysis on every function with an #[energy_budget] attribute. Functions that exceed their declared budget produce a compilation error.
See Energy System Guide for details.
Diagnostics
The compiler produces structured error messages with source locations:
error[E0001]: mismatched types
--> program.joule:10:15
|
10 | let x: i32 = "hello";
| ^^^^^^^ expected i32, found String
Warnings are shown for potential issues but don't prevent compilation (unless you've set strict mode).
JIT Compilation
Joule supports just-in-time compilation for interactive development. Instead of producing an executable file, the compiler compiles your code in memory and runs it immediately.
Quick Start
# JIT-compile and run
joulec --jit program.joule
# Watch mode: re-compile on file changes
joulec --watch program.joule
Requirements
JIT mode requires the jit feature flag, which enables the Cranelift JIT backend:
# Build joulec with JIT support
cargo build --release -p joulec --features jit
The feature chain is: jit -> cranelift -> joule-codegen-cranelift + joule-codegen + notify.
How It Works
JIT Mode (--jit)
- Source code is parsed, type-checked, and lowered to MIR (the same pipeline as normal compilation)
- MIR is translated to Cranelift IR
- Cranelift compiles the IR to native machine code in memory
- The main() function is called directly via a function pointer
- The program runs and exits
No intermediate files are produced. No C compiler is invoked. Compilation and execution happen in a single process.
Watch Mode (--watch)
Watch mode extends JIT with file monitoring:
- The source file is JIT-compiled and run (same as --jit)
- The notify crate monitors the source file for changes
- When the file is saved, a fresh JIT module is created and the program re-runs
- A 50ms debounce prevents multiple re-runs from editor save-rename sequences
Each watch cycle creates a fresh JITModule because Cranelift's JIT module cannot redefine functions. This ensures clean state on every re-run.
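The debounce is simple to picture: events closer together than the window collapse into one rebuild. An illustrative Python sketch (the real implementation sits on the notify crate's event stream):

```python
def debounce(event_times_ms, window_ms=50):
    """Keep an event only if at least window_ms elapsed since the
    last kept event; a burst of saves triggers a single rebuild."""
    kept, last = [], None
    for t in sorted(event_times_ms):
        if last is None or t - last >= window_ms:
            kept.append(t)
            last = t
    return kept

# An editor save-rename sequence fires three events within 12 ms;
# they collapse to one rebuild, plus one for the later save.
runs = debounce([1000, 1005, 1012, 1200])
```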
Architecture
FunctionTranslator
The FunctionTranslator<'a, M: Module> is generic over the module type:
| Module Type | Mode | Output |
|---|---|---|
| ObjectModule | AOT compilation | Object file (.o) |
| JITModule | JIT compilation | In-memory executable code |
This means the same translation logic handles both AOT and JIT -- no code duplication.
Runtime Symbols
JIT mode provides runtime symbols that replace the C runtime's functions:
| Symbol | Purpose |
|---|---|
| joule_jit_println | Print a string with newline |
| joule_jit_print | Print a string without newline |
| joule_jit_panic | Panic with a message |
| malloc | Memory allocation (libc) |
| free | Memory deallocation (libc) |
| memcpy | Memory copy (libc) |
These symbols are registered with the JITModule before compilation so that generated code can call them.
PIC Mode
JIT compilation disables position-independent code (PIC = false), since the code runs at a known memory location. AOT compilation enables PIC (PIC = true) for shared-library compatibility.
Energy Tracking
JIT mode includes full energy tracking. Energy consumed during execution is measured and reported:
$ joulec --jit program.joule
Hello from JIT!
Energy consumed: 0.000123 J
Energy budgets declared with #[energy_budget] are checked at compile time, before JIT execution begins. If a budget is violated, compilation fails and the program does not run.
Limitations
- No persistent output: JIT mode does not produce an executable file. For deployment, use the C backend or AOT Cranelift compilation.
- Single-file: JIT mode currently compiles a single source file. Multi-file projects should use mod declarations within the entry file.
- Feature gate: JIT support is behind --features jit to keep the default binary small. The notify dependency is only pulled in when JIT is enabled.
Use Cases
Rapid Prototyping
JIT mode eliminates the compile-link-run cycle:
# Edit, save, see results instantly
joulec --watch prototype.joule
Energy Experimentation
Try different algorithms and immediately see their energy impact:
// Try bubble sort
#[energy_budget(max_joules = 0.001)]
fn sort_experiment(data: Vec<i32>) -> Vec<i32> {
    bubble_sort(data)
}
joulec --jit experiment.joule
# Change to quicksort, save, see new energy reading
Interactive Testing
Run tests without a full build:
joulec --jit --test tests.joule
Comparison with Other Modes
| Mode | Command | Speed | Output | Use Case |
|---|---|---|---|---|
| JIT | --jit | Fastest | None (runs in memory) | Development |
| Watch | --watch | Fast (re-runs on save) | None | Interactive development |
| C Backend | --emit c | Moderate | .c file | Deployment, bootstrap |
| Cranelift AOT | (default) | Fast | Binary | Development builds |
| LLVM | --features llvm | Slow | Optimized binary | Release builds |
Polyglot Energy Analysis
Joule can measure and optimize the energy consumption of code written in other languages. The --lift-run flag lifts foreign code into Joule's intermediate representation, applies energy analysis, and executes it with full energy tracking.
Quick Start
# Measure energy in a Python script
joulec --lift-run python script.py
# Measure energy in a JavaScript file
joulec --lift-run js app.js
# Measure energy in C code
joulec --lift-run c program.c
# Apply energy optimization before running
joulec --energy-optimize --lift-run python script.py
# Generate JSON energy report
joulec --lift python script.py --energy-report report.json
# Set energy budget (exit code 1 if exceeded)
joulec --lift python script.py --energy-budget 100nJ
How It Works
The polyglot pipeline has four stages:
- Parse: The source file is parsed by a language-specific parser (Python, JavaScript, or C) into Joule's LiftedModule representation.
- Lower: The lifted AST is lowered to MIR (Mid-level IR), the same representation used for native Joule code. Variables, functions, classes, and control flow are all mapped to MIR constructs.
- Optimize (optional): When --energy-optimize is passed, four energy optimization passes are applied to the MIR before execution.
- Execute: The MIR is JIT-compiled via the Cranelift backend and executed in memory. Energy consumption is tracked throughout execution and reported at the end.
Supported Languages
Python
Comprehensive support for Python syntax and semantics:
| Feature | Status |
|---|---|
| Functions, closures, lambdas | Supported |
| Classes (single and multiple inheritance) | Supported |
| List/dict/set comprehensions | Supported |
| f-strings | Supported |
| Ternary expressions | Supported |
| enumerate/zip | Supported |
| match/case (Python 3.10+) | Supported |
| Walrus operator (:=) | Supported |
| try/except/finally | Supported (guard patterns) |
| Slicing with step | Supported |
| Default arguments | Supported |
| *args, **kwargs | Supported |
| Generator expressions | Supported |
| String methods (30+) | Supported |
| List methods (20+) | Supported |
| Dict methods (15+) | Supported |
| Math module | Supported |
| Print with end= | Supported |
| True division | Supported |
| BigInt overflow handling | Supported |
JavaScript
Comprehensive support for JavaScript syntax and semantics:
| Feature | Status |
|---|---|
| Functions, arrow functions | Supported |
| Classes (single inheritance) | Supported |
| Template literals | Supported |
| Destructuring | Supported |
| Spread operator | Supported |
| switch/case | Supported |
| for-in/for-of | Supported |
| do-while | Supported |
| Bitwise operators | Supported |
| typeof | Supported |
| Nullish coalescing (??) | Supported |
| Optional chaining (?.) | Supported |
| Array methods (20+) | Supported |
| String methods (15+) | Supported |
| Object methods | Supported |
| Math object | Supported |
| console.log | Supported |
| this keyword | Supported |
C
Basic support for C code:
| Feature | Status |
|---|---|
| Functions | Supported |
| Basic types (int, float, double, char) | Supported |
| Arrays | Supported |
| Pointers | Supported |
| Control flow (if, while, for) | Supported |
| stdio (printf, scanf) | Supported |
| math.h functions | Supported |
TypeScript
TypeScript types are erased before analysis — the energy profile is identical to JavaScript. See the TypeScript Guide for details.
| Feature | Status |
|---|---|
| Everything in JavaScript | Supported |
| Type annotations | Stripped |
| Interfaces, type aliases, generics | Stripped |
| Access modifiers (public/private) | Stripped |
| Enums (simple) | Converted to constants |
Go
| Feature | Status |
|---|---|
| Functions, closures, variadic | Supported |
| for, for range, if/else, switch | Supported |
| Slices, maps, structs, methods | Supported |
| Goroutines (go) | Supported (sequential analysis) |
| Channels (chan, <-) | Supported |
| defer | Supported |
| Multiple return values | Supported |
| fmt, math, strings, strconv | Supported |
Rust
| Feature | Status |
|---|---|
| Functions, closures, impl blocks | Supported |
| for/while/loop, if/else, match | Supported |
| let/let mut, ownership annotations | Supported |
| Structs, enums, Option, Result | Supported |
| Vec, HashMap, String, Box | Supported |
| Iterator chains (.map/.filter/.fold) | Supported |
| println!, format!, vec! | Supported |
| Traits (signatures only) | Supported |
Energy Recommendations
When analyzing code, Joule detects common energy anti-patterns and suggests fixes. Categories include:
- ALGORITHM -- Nested loops where a hash set would be O(1)
- ALLOCATION -- Heap allocation inside hot loops
- REDUNDANCY -- Recomputed values that could be hoisted
- DATA STRUCTURE -- Linear search where a set/map is more efficient
- LOOP -- Missing early exits, unbounded iteration
- STRING -- String concatenation in loops (O(n^2))
- MEMORY -- Cache-unfriendly access patterns
- PRECISION -- Float arithmetic where integer suffices
See the per-language guides for language-specific examples of each pattern.
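A toy version of one such detector, using Python's ast module to flag the STRING pattern (augmented assignment inside a loop). Joule's real analyzer is type-aware and works on the lifted representation; this sketch is purely syntactic:

```python
import ast

def find_augassign_in_loops(source: str):
    """Flag `x += ...` inside a loop as a potential O(n^2)
    concatenation site; returns the offending line numbers."""
    tree = ast.parse(source)
    hits = []
    for loop in ast.walk(tree):
        if isinstance(loop, (ast.For, ast.While)):
            for node in ast.walk(loop):
                if isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Add):
                    hits.append(node.lineno)
    return hits

code = '''
result = ""
for word in words:
    result += word + " "
'''
flagged = find_augassign_in_loops(code)  # line of the += inside the loop
```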
Runtime System
The lift-run runtime provides 100+ shim functions that bridge language-specific operations to native code:
String Operations
str_new, str_concat, str_len, str_print, str_from_int, str_from_float, str_eq, str_index, str_slice, str_contains, str_mul, str_cmp, str_upper, str_lower, str_trim, str_split, str_replace, str_starts_with, str_ends_with, str_index_of, and more.
List Operations
list_new, list_push, list_get, list_set, list_len, list_pop, list_sort, list_reverse, list_copy, list_append, list_index_of, list_contains, list_slice, list_map, list_filter, and more.
Dict Operations
dict_new, dict_set, dict_get, dict_len, dict_get_default, dict_pop, dict_update, dict_setdefault, dict_keys, dict_values, dict_items, dict_contains, and more.
Class Desugaring
Classes from Python and JavaScript are desugared to dictionary-backed standalone functions:
# Python source
class Counter:
    def __init__(self, start):
        self.count = start

    def increment(self):
        self.count += 1
        return self.count
This is lowered to:
- Counter____init__(self, start) -- constructor function
- Counter__increment(self) -- method function
- self is a dictionary with fields as key-value pairs
Multiple inheritance is supported using BFS method resolution order (MRO).
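The BFS order can be sketched in a few lines of Python. (Note that Python itself uses C3 linearization; BFS is the simpler order the lifter applies, so some diamond hierarchies may resolve differently than in CPython.)

```python
def bfs_mro(cls, bases):
    """Breadth-first method resolution order. `bases` maps a class
    name to its list of direct base names."""
    order, queue, seen = [], [cls], set()
    while queue:
        c = queue.pop(0)
        if c in seen:
            continue
        seen.add(c)
        order.append(c)
        queue.extend(bases.get(c, []))
    return order

# Diamond: D(B, C), B(A), C(A) -> lookup tries D, then B, C, then A.
mro = bfs_mro("D", {"D": ["B", "C"], "B": ["A"], "C": ["A"]})
```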
Energy Optimization Passes
When --energy-optimize is used, four passes optimize the lifted code:
- Constant Propagation -- Propagate known values, fold constant expressions
- Dead Code Elimination -- Remove unreachable and unused code
- Loop Optimization -- Reduce redundant computation in loops
- Strength Reduction -- Replace expensive operations with cheaper equivalents
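A minimal sketch of two of these passes applied to a single expression (Joule's passes run on MIR; Python's ast module stands in here purely for illustration):

```python
import ast

def fold_and_reduce(expr: str) -> str:
    """Constant folding (3 * 4 -> 12) and strength reduction
    (x * 2 -> x + x, since an add is cheaper than a multiply)."""
    tree = ast.parse(expr, mode="eval")

    class Rewrite(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)
            lhs, rhs = node.left, node.right
            # Constant folding: both operands known at analysis time.
            if isinstance(lhs, ast.Constant) and isinstance(rhs, ast.Constant):
                if isinstance(node.op, ast.Add):
                    return ast.Constant(lhs.value + rhs.value)
                if isinstance(node.op, ast.Mult):
                    return ast.Constant(lhs.value * rhs.value)
            # Strength reduction: multiply-by-2 becomes an add.
            if isinstance(node.op, ast.Mult) and isinstance(rhs, ast.Constant) and rhs.value == 2:
                return ast.BinOp(left=lhs, op=ast.Add(), right=lhs)
            return node

    return ast.unparse(ast.fix_missing_locations(Rewrite().visit(tree)))
```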
Test Coverage
The polyglot pipeline is validated by 1,220 tests across 8 test suites:
| Suite | Count | Description |
|---|---|---|
| Tiered validation | 90 | Core feature coverage |
| Edge cases | 80 | Corner cases and error handling |
| Domain | 100 | 50 Python + 50 JS across 5 domains |
| Stdlib | 100 | 50 Python + 50 JS: string/list methods, default args |
| Classes | 50 | Inheritance, MRO, properties, static methods |
| Advanced | 50 | Closures, generators, decorators, metaclasses |
| Syntax | 50 | Language-specific syntax features |
| Coverage | 700 | Division, print, comprehensions, string ops |
Total: 1,220/1,220 (100% pass rate)
Examples
Python Energy Analysis
# fibonacci.py
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)
result = fibonacci(30)
print(f"Result: {result}")
$ joulec --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00234 J
JavaScript Energy Analysis
// sort.js
function quickSort(arr) {
    if (arr.length <= 1) return arr;
    const [pivot, ...rest] = arr;
    const left = rest.filter(x => x < pivot);
    const right = rest.filter(x => x >= pivot);
    return [...quickSort(left), pivot, ...quickSort(right)];
}
const data = Array.from({length: 1000}, () => Math.floor(Math.random() * 10000));
const sorted = quickSort(data);
console.log(`Sorted ${sorted.length} elements`);
$ joulec --lift-run js sort.js
Sorted 1000 elements
Energy consumed: 0.00891 J
Energy-Optimized Execution
$ joulec --energy-optimize --lift-run python fibonacci.py
Result: 832040
Energy consumed: 0.00198 J (15.4% reduction)
Static Analysis Mode
For energy analysis without execution, use --lift instead of --lift-run:
# Analyze without running
joulec --lift python script.py
# Output includes per-function energy estimates
This performs the parsing and lowering steps but stops before JIT compilation, producing a static energy report for each function.
Per-Language Guides
For detailed anti-patterns, optimization tips, and worked examples specific to each language:
- Python Guide -- 100+ runtime shims, classes, comprehensions, f-strings
- JavaScript Guide -- Arrow functions, template literals, array methods
- TypeScript Guide -- Type erasure, identical energy to JavaScript
- C Guide -- Memory allocation patterns, cache analysis
- Go Guide -- Goroutines, channels, slice operations
- Rust Guide -- Iterator chains, zero-cost abstractions
Further Reading
- Energy Optimization Walkthrough -- Step-by-step guide from baseline to optimized
- Cross-Language Energy Comparison -- Same algorithm in 6 languages, energy ranked
Python Energy Analysis
Joule provides comprehensive energy analysis for Python code. With 100+ runtime shims covering strings, lists, dicts, classes, comprehensions, and f-strings, most idiomatic Python runs unmodified.
Quick Start
# Static energy analysis (no execution)
joulec --lift python script.py
# Execute with energy tracking
joulec --lift-run python script.py
# Execute with energy optimization
joulec --energy-optimize --lift-run python script.py
# Generate JSON report for CI
joulec --lift python script.py --energy-report report.json
Supported Features
| Category | Features |
|---|---|
| Functions | def, lambda, closures, default arguments, *args, **kwargs |
| Classes | Single and multiple inheritance, __init__, methods, properties, static methods, BFS MRO |
| Control flow | if/elif/else, while, for x in, break, continue, return |
| Comprehensions | List [x for x in ...], dict {k:v for ...}, set {x for ...}, generator (x for ...) |
| String features | f-strings, .upper(), .lower(), .strip(), .split(), .replace(), .startswith(), .endswith(), .join(), .find(), .index(), + concatenation, * repetition (30+ methods) |
| List features | .append(), .pop(), .sort(), .reverse(), .index(), .count(), .copy(), slicing, len(), in operator (20+ methods) |
| Dict features | .get(), .pop(), .update(), .setdefault(), .keys(), .values(), .items(), in operator (15+ methods) |
| Math | math.floor(), math.ceil(), math.sqrt(), math.pow(), abs(), min(), max(), sum(), range() |
| Expressions | Ternary x if cond else y, walrus :=, match/case, enumerate(), zip(), true division, ** power |
| Error handling | try/except/finally with guard patterns (division, key, bounds) |
| Types | int (i64 + BigInt overflow), float (f64), bool, str, list, dict, set, None |
| Output | print() with end= parameter, polymorphic output (int/float/string) |
Common Energy Anti-Patterns
1. String Concatenation in Loops
# BAD — O(n^2) energy: each += allocates a new string
result = ""
for word in words:
    result += word + " "
# GOOD — O(n) energy: join allocates once
result = " ".join(words)
Category: STRING | Severity: High | Savings: ~10x for large inputs
Each += on a string allocates a new buffer and copies the entire accumulated string. For 1,000 words averaging 5 characters, the bad version performs ~2.5 million character copies. The good version performs ~5,000.
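The copy counts can be reproduced with the same model: each += re-copies everything accumulated so far, while join writes each character once. A quick check in Python:

```python
def concat_copy_count(n_words: int, avg_len: int) -> int:
    """Characters copied by repeated +=: the accumulated string
    (about avg_len * i chars after i words) is re-copied each append."""
    return sum(avg_len * i for i in range(1, n_words + 1))

def join_copy_count(n_words: int, avg_len: int) -> int:
    """join writes each character exactly once into one buffer."""
    return n_words * avg_len

bad = concat_copy_count(1000, 5)   # 2,502,500 -- the ~2.5 million quoted above
good = join_copy_count(1000, 5)    # 5,000
```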
2. Linear Search on List vs Set
# BAD — O(n) per lookup = O(n*m) total
for item in queries:
    if item in large_list:  # linear scan every time
        process(item)
# GOOD — O(1) per lookup = O(n+m) total
lookup = set(large_list)  # one-time O(n) cost
for item in queries:
    if item in lookup:  # hash lookup
        process(item)
Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements
3. Allocation Inside Hot Loops
# BAD — allocates a new list every iteration
for i in range(1000):
    temp = []
    temp.append(i)
    process(temp)
# GOOD — reuse buffer
temp = []
for i in range(1000):
    temp.clear()
    temp.append(i)
    process(temp)
Category: ALLOCATION | Severity: Medium | Savings: ~3x
4. Missing Early Exit
# BAD — always scans entire list
def find_first(items, target):
    result = -1
    for i in range(len(items)):
        if result == -1 and items[i] == target:
            result = i
    return result
# GOOD — exits on first match
def find_first(items, target):
    for i in range(len(items)):
        if items[i] == target:
            return i
    return -1
Category: LOOP | Severity: Medium | Savings: ~2x average case
5. Recomputing Loop Invariants
# BAD — len(data) recomputed every iteration
for i in range(len(data)):
    if i < len(data) - 1:
        process(data[i], data[i + 1])
# GOOD — compute once
n = len(data)
for i in range(n - 1):
    process(data[i], data[i + 1])
Category: REDUNDANCY | Severity: Low | Savings: ~1.2x
Worked Example
Given a data processing pipeline:
class DataProcessor:
    def __init__(self, data):
        self.data = data
        self.results = []

    def filter_positive(self):
        filtered = []
        for x in self.data:
            if x > 0:
                filtered.append(x)
        self.data = filtered

    def normalize(self):
        total = sum(self.data)
        self.data = [x / total for x in self.data]

    def to_report(self):
        report = ""
        for i in range(len(self.data)):
            report += f"Item {i}: {self.data[i]}\n"
        return report

def main():
    proc = DataProcessor([3.0, -1.0, 4.0, -2.0, 5.0, 1.0])
    proc.filter_positive()
    proc.normalize()
    print(proc.to_report())

main()
Running energy analysis:
$ joulec --lift python pipeline.py
Energy Analysis: pipeline.py
DataProcessor____init__ 2.35 nJ (confidence: 0.95)
DataProcessor__filter_positive 8.72 nJ (confidence: 0.65)
DataProcessor__normalize 6.15 nJ (confidence: 0.70)
DataProcessor__to_report 14.80 nJ (confidence: 0.55)
main 3.20 nJ (confidence: 0.90)
Total: 35.22 nJ
Recommendations:
!! [STRING] DataProcessor__to_report — string concatenation in loop
Suggestion: use "".join() to build string in one allocation
Estimated savings: 8-10x for large inputs
! [REDUNDANCY] DataProcessor__to_report — len() called inside loop range
Suggestion: compute len() once before the loop
Estimated savings: 1.2x
JSON Energy Report
$ joulec --lift python pipeline.py --energy-report report.json
{
  "source_file": "pipeline.py",
  "language": "python",
  "functions": [
    {
      "name": "DataProcessor__to_report",
      "energy_pj": 14800,
      "energy_human": "14.80 nJ",
      "confidence": 0.55
    }
  ],
  "total_energy_pj": 35220,
  "total_energy_human": "35.22 nJ",
  "functions_lifted": 5,
  "constructs_approximated": 2,
  "recommendations": [
    {
      "function": "DataProcessor__to_report",
      "category": "STRING",
      "severity": "high",
      "issue": "string concatenation in loop",
      "suggestion": "use join() to build string in one allocation",
      "savings_factor": 8.0
    }
  ]
}
Energy Budget for CI
# Fail the build if total energy exceeds 50 nJ
$ joulec --lift python pipeline.py --energy-budget 50nJ
# Exit code 0: within budget
$ joulec --lift python pipeline.py --energy-budget 20nJ
# Exit code 1: budget exceeded (35.22 nJ > 20.00 nJ)
Limitations
- No external package imports (import numpy, import requests, etc.) -- only built-in operations
- try/except uses guard patterns (division, key, bounds) rather than full exception semantics
- Generator execution is approximated (constant iteration count estimate)
- No async/await -- async patterns are desugared to synchronous equivalents
- No decorator side effects -- decorators are recognized but not executed
- Class __repr__, __str__, __eq__ dunder methods are not auto-dispatched
JavaScript Energy Analysis
Joule lifts JavaScript into its energy analysis pipeline, providing per-function energy estimates for Node.js and browser-style code. Arrow functions, template literals, classes, destructuring, and 20+ array methods are fully supported.
Quick Start
# Static energy analysis
joulec --lift js app.js
# Execute with energy tracking
joulec --lift-run js app.js
# Execute with energy optimization
joulec --energy-optimize --lift-run js app.js
Supported Features
| Category | Features |
|---|---|
| Functions | function, arrow functions =>, default params, rest params ...args |
| Classes | class, constructor, extends, methods, static, this, super |
| Control flow | if/else, while, do-while, for, for-in, for-of, switch/case, break, continue |
| Destructuring | Array [a, b] = arr, object {x, y} = obj, nested, with defaults |
| Operators | Spread ..., nullish coalescing ??, optional chaining ?., typeof, bitwise |
| Template literals | `Hello ${name}` with expression interpolation |
| Array methods | .push(), .pop(), .map(), .filter(), .reduce(), .find(), .findIndex(), .some(), .every(), .forEach(), .indexOf(), .includes(), .slice(), .splice(), .concat(), .reverse(), .sort(), .join(), .flat(), .length |
| String methods | .length, .charAt(), .indexOf(), .includes(), .slice(), .substring(), .toUpperCase(), .toLowerCase(), .trim(), .split(), .replace(), .startsWith(), .endsWith(), .repeat() |
| Object methods | Object.keys(), Object.values(), Object.entries() |
| Math | Math.floor(), Math.ceil(), Math.round(), Math.abs(), Math.max(), Math.min(), Math.pow(), Math.sqrt(), Math.random(), Math.PI |
| Output | console.log() with auto-coercion |
| Types | Numbers (f64), strings, booleans, arrays, objects, null, undefined |
Common Energy Anti-Patterns
1. Chained Array Methods Creating Intermediates
// BAD — 2 intermediate arrays allocated, 3 passes over the data
const result = data
    .filter(x => x > 0)          // allocates filtered array
    .map(x => x * 2)             // allocates mapped array
    .reduce((a, b) => a + b, 0); // iterates again
// GOOD — single pass, no intermediate allocations
let result = 0;
for (const x of data) {
    if (x > 0) result += x * 2;
}
Category: ALLOCATION | Severity: High | Savings: ~3x (eliminates 2 intermediate allocations)
2. indexOf on Large Arrays
// BAD — O(n) per check
for (const query of queries) {
    if (data.indexOf(query) !== -1) {
        process(query);
    }
}
// GOOD — O(1) per check with Set
const lookup = new Set(data);
for (const query of queries) {
    if (lookup.has(query)) {
        process(query);
    }
}
Category: DATA STRUCTURE | Severity: High | Savings: ~50x for 10K elements
3. Template Literals in Tight Loops
// BAD — string allocation every iteration
for (let i = 0; i < 10000; i++) {
    const msg = `Processing item ${i} of ${total}`;
    log(msg);
}
// GOOD — build once if constant parts dominate
const prefix = "Processing item ";
const suffix = " of " + total;
for (let i = 0; i < 10000; i++) {
    log(prefix + i + suffix);
}
Category: STRING | Severity: Medium | Savings: ~2x
4. Nested for-of Loops
// BAD — O(n*m) with no early exit
function findPair(arr1, arr2, target) {
    for (const a of arr1) {
        for (const b of arr2) {
            if (a + b === target) return [a, b];
        }
    }
    return null;
}
// GOOD — O(n+m) with hash set
function findPair(arr1, arr2, target) {
    const seen = new Set(arr1);
    for (const b of arr2) {
        if (seen.has(target - b)) return [target - b, b];
    }
    return null;
}
Category: ALGORITHM | Severity: Critical | Savings: ~100x for large inputs
5. forEach with Closure Allocation
// BAD — allocates closure object per iteration
data.forEach(function(item) {
    if (item.active) results.push(item.name);
});
// GOOD — for-of avoids closure overhead
for (const item of data) {
    if (item.active) results.push(item.name);
}
Category: ALLOCATION | Severity: Low | Savings: ~1.3x
Worked Example
class EventQueue {
    constructor() {
        this.events = [];
        this.handlers = [];
    }
    on(type, handler) {
        this.handlers.push({ type: type, fn: handler });
    }
    emit(type, data) {
        this.events.push({ type: type, data: data, time: Date.now() });
        const matching = this.handlers.filter(h => h.type === type);
        matching.forEach(h => h.fn(data));
    }
    getEventsByType(type) {
        return this.events.filter(e => e.type === type);
    }
}
function main() {
    const queue = new EventQueue();
    let total = 0;
    queue.on("data", function(val) { total += val; });
    queue.on("data", function(val) { console.log(`Received: ${val}`); });
    for (let i = 0; i < 100; i++) {
        queue.emit("data", i);
    }
    console.log(`Total: ${total}`);
    const dataEvents = queue.getEventsByType("data");
    console.log(`Events logged: ${dataEvents.length}`);
}
main();
$ joulec --lift js events.js
Energy Analysis: events.js
EventQueue__constructor 1.20 nJ (confidence: 0.95)
EventQueue__on 2.10 nJ (confidence: 0.90)
EventQueue__emit 18.50 nJ (confidence: 0.60)
EventQueue__getEventsByType 5.30 nJ (confidence: 0.65)
main 8.40 nJ (confidence: 0.55)
Total: 35.50 nJ
Recommendations:
!! [ALLOCATION] EventQueue__emit — filter() + forEach() chain allocates intermediate array
Suggestion: use a single for-of loop to filter and dispatch in one pass
Estimated savings: 2-3x
Limitations
- No DOM APIs (document, window, fetch, etc.)
- No require() or import of npm modules
- async/await and Promises are approximated as synchronous
- No WeakMap, WeakSet, Proxy, Reflect
- No regular expressions (regex literals are parsed but not executed)
- Date.now() returns a simulated timestamp
- No eval() or dynamic code execution
TypeScript Energy Analysis
Joule analyzes TypeScript by stripping type annotations and delegating to the JavaScript pipeline. Since TypeScript types are erased at compile time, the energy profile of a TypeScript program is identical to its JavaScript equivalent.
Quick Start
# Static energy analysis
joulec --lift ts app.ts
# Execute with energy tracking
joulec --lift-run ts app.ts
# Execute with energy optimization
joulec --energy-optimize --lift-run ts app.ts
How It Works
The TypeScript lifter removes all TypeScript-specific syntax before analysis:
- Type annotations — x: number, fn(s: string): void
- Interfaces — interface Foo { ... }
- Type aliases — type Result = Success | Error
- Generics — Array<number>, Map<string, number>
- Access modifiers — public, private, protected, readonly
- Enums — enum Color { Red, Green, Blue }
- Non-null assertions — value!
- Type casts — value as Type, <Type>value
After stripping, the remaining JavaScript is analyzed normally. This means TypeScript types are free — they add zero energy overhead.
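To see why stripping is cheap, here is a deliberately naive Python sketch that handles only annotations of the form name: Type (and closing-paren return types). Real stripping requires a TypeScript parser; this regex covers nothing beyond the simplest case:

```python
import re

def strip_simple_annotations(ts_line: str) -> str:
    """Toy stripper: remove ': Type' after an identifier or a closing
    paren. Generics, object types, and unions are not handled."""
    return re.sub(r"([\w)])\s*:\s*\w+", r"\1", ts_line)

js = strip_simple_annotations("function distance(a: Point, b: Point): number {")
```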
Type Safety is Free
// TypeScript version
interface Point {
    x: number;
    y: number;
}
function distance(a: Point, b: Point): number {
    const dx: number = a.x - b.x;
    const dy: number = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}
// Equivalent JavaScript
function distance(a, b) {
    const dx = a.x - b.x;
    const dy = a.y - b.y;
    return Math.sqrt(dx * dx + dy * dy);
}
Both produce exactly the same energy analysis:
$ joulec --lift ts distance.ts
distance 3.85 nJ (confidence: 0.95)
$ joulec --lift js distance.js
distance 3.85 nJ (confidence: 0.95)
Supported Features
Everything from the JavaScript guide is supported, plus TypeScript-specific syntax is silently stripped:
| TypeScript Feature | Handling |
|---|---|
| Type annotations | Stripped |
| Interfaces | Stripped |
| Type aliases | Stripped |
| Generics | Stripped |
| Access modifiers | Stripped |
| Enums (simple) | Converted to constants |
| as casts | Stripped |
| Non-null ! | Stripped |
| Optional ? params | Treated as default undefined |
When to Use TypeScript vs JavaScript Lifting
Use --lift ts when your source files are .ts or .tsx. The lifter handles the type syntax that would cause parse errors in the JavaScript parser. If your TypeScript is already compiled to JavaScript, use --lift js on the output — the energy profile will be identical.
Anti-Patterns
All JavaScript anti-patterns apply equally to TypeScript. Types do not change the runtime energy profile.
Limitations
- Same limitations as JavaScript
- Complex enum patterns with computed values are not supported
- Namespace merging is not supported
- Decorators (experimental) are not executed
- declare blocks are ignored (ambient declarations)
C Energy Analysis
Joule analyzes C code for energy consumption, targeting the low-level patterns where energy waste is most impactful: memory allocation, cache access patterns, and nested loop structures.
Quick Start
# Static energy analysis
joulec --lift c program.c
# Execute with energy tracking
joulec --lift-run c program.c
# Execute with energy optimization
joulec --energy-optimize --lift-run c program.c
Supported Features
| Category | Features |
|---|---|
| Types | int, long, float, double, char, void, size_t, unsigned variants |
| Pointers | Declaration, dereference *p, address-of &x, pointer arithmetic |
| Arrays | Fixed-size int arr[N], multidimensional int mat[M][N] |
| Control flow | if/else, while, do-while, for, switch/case, break, continue, goto (limited) |
| Functions | Declaration, definition, forward declarations, recursion |
| Structs | Definition, field access . and ->, nested structs |
| Memory | malloc(), calloc(), realloc(), free() |
| I/O | printf(), scanf(), puts(), getchar() |
| Math | sqrt(), pow(), abs(), floor(), ceil(), sin(), cos(), log(), exp() |
| Operators | All arithmetic, bitwise, comparison, logical, ternary ?:, comma |
Common Energy Anti-Patterns
1. malloc Inside Loops
// BAD — 1000 allocations, 1000 frees
for (int i = 0; i < 1000; i++) {
int *buf = malloc(sizeof(int) * 100);
process(buf, 100);
free(buf);
}
// GOOD — allocate once, reuse
int *buf = malloc(sizeof(int) * 100);
for (int i = 0; i < 1000; i++) {
process(buf, 100);
}
free(buf);
Category: ALLOCATION | Severity: High | Savings: ~5x
Each malloc/free cycle costs ~200 pJ (DRAM access) plus system call overhead. In a tight loop, this dominates the energy budget.
2. Cache-Unfriendly Access Patterns
// BAD — column-major access on row-major array (cache miss per element)
for (int j = 0; j < N; j++) {
for (int i = 0; i < M; i++) {
sum += matrix[i][j]; // stride = N * sizeof(int)
}
}
// GOOD — row-major access (sequential cache hits)
for (int i = 0; i < M; i++) {
for (int j = 0; j < N; j++) {
sum += matrix[i][j]; // stride = sizeof(int)
}
}
Category: MEMORY | Severity: Critical | Savings: ~10x for large matrices
L1 cache load costs 0.5 pJ. DRAM load costs 200 pJ — a 400x difference. Column-major traversal on row-major data causes a DRAM load on nearly every access.
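The stride difference can be made concrete with a toy cache model. The sketch below is not Joule's estimator; it is a deliberately simplified single-line "cache" that counts a miss whenever an access touches a different 64-byte line than the previous access. Even this crude model reproduces the order-of-magnitude gap between the two traversal orders:

```python
# Toy single-line cache model (not Joule's actual cost model): count a miss
# whenever an access lands on a different 64-byte cache line than the last.
LINE = 64   # cache line size in bytes
ELEM = 4    # sizeof(int)

def misses(M, N, row_major):
    count, last_line = 0, None
    outer, inner = (M, N) if row_major else (N, M)
    for a in range(outer):
        for b in range(inner):
            i, j = (a, b) if row_major else (b, a)
            line = ((i * N + j) * ELEM) // LINE   # array is stored row-major
            if line != last_line:
                count += 1
                last_line = line
    return count

print(misses(64, 64, row_major=True))    # 256: one miss per 16 elements
print(misses(64, 64, row_major=False))   # 4096: a miss on every access
```

With a 64x64 int matrix, sequential traversal misses once per 16 elements while the strided traversal (stride 256 bytes) misses on every element, a 16x difference in DRAM traffic under this model.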
3. Realloc Growth in Loops
// BAD — realloc doubles per iteration, copies all data each time
int *data = NULL;
int cap = 0;
for (int i = 0; i < n; i++) {
cap++;
data = realloc(data, cap * sizeof(int));
data[cap - 1] = i;
}
// GOOD — geometric growth (amortized O(1) per insert)
int *data = malloc(16 * sizeof(int));
int len = 0, cap = 16;
for (int i = 0; i < n; i++) {
if (len == cap) {
cap *= 2;
data = realloc(data, cap * sizeof(int));
}
data[len++] = i;
}
Category: ALLOCATION | Severity: High | Savings: ~4x
4. Nested Loop Complexity
// BAD — O(n^3) matrix multiply without blocking
for (int i = 0; i < N; i++)
for (int j = 0; j < N; j++)
for (int k = 0; k < N; k++)
C[i][j] += A[i][k] * B[k][j];
Category: ALGORITHM | Severity: Critical (for large N)
The energy estimator flags O(n^3) nested loops with high energy estimates and reduced confidence scores.
Worked Example
#include <stdlib.h>
#include <stdio.h>
void matrix_multiply(int *A, int *B, int *C, int n) {
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
C[i * n + j] = 0;
for (int k = 0; k < n; k++) {
C[i * n + j] += A[i * n + k] * B[k * n + j];
}
}
}
}
int main() {
int n = 64;
int *A = calloc(n * n, sizeof(int));
int *B = calloc(n * n, sizeof(int));
int *C = calloc(n * n, sizeof(int));
for (int i = 0; i < n * n; i++) {
A[i] = i % 10;
B[i] = (i * 3) % 10;
}
matrix_multiply(A, B, C, n);
printf("C[0][0] = %d\n", C[0]);
free(A); free(B); free(C);
return 0;
}
$ joulec --lift c matmul.c
Energy Analysis: matmul.c
matrix_multiply 892.40 nJ (confidence: 0.50)
main 45.20 nJ (confidence: 0.75)
Total: 937.60 nJ
Recommendations:
!!! [ALGORITHM] matrix_multiply — O(n^3) nested loop detected
Suggestion: consider cache-blocking or BLAS library for large matrices
Estimated savings: 3-5x with cache blocking
!! [MEMORY] matrix_multiply — inner loop access pattern B[k*n+j] has stride n
Suggestion: transpose B before multiply, or interchange k/j loops
Estimated savings: 2-4x from improved cache locality
Limitations
- No preprocessor directives (#define, #include, #ifdef)
- No function pointers or callbacks
- No variadic functions beyond printf/scanf
- No typedef (use bare type names)
- No union types
- No enum (use integer constants)
- No complex struct initializers (= { .field = value })
- No inline assembly
Go Energy Analysis
Joule analyzes Go code with awareness of goroutines, channels, and Go's concurrency model. The energy cost of spawning goroutines, sending on channels, and slice operations is modeled at the picojoule level.
Quick Start
# Static energy analysis
joulec --lift go main.go
# Execute with energy tracking
joulec --lift-run go main.go
# Execute with energy optimization
joulec --energy-optimize --lift-run go main.go
Supported Features
| Category | Features |
|---|---|
| Types | int, int8/16/32/64, uint, float32/64, string, bool, byte, rune |
| Variables | var, := short declaration, const, multiple assignment |
| Functions | func, multiple return values, named returns, closures, variadic ... |
| Control flow | if/else (with init statement), for, for range, switch/case, select |
| Slices | Creation, append(), len(), cap(), slicing s[a:b], make(), copy() |
| Maps | map[K]V, make(map[...]), index, delete, len(), comma-ok pattern |
| Structs | Definition, field access, methods (value/pointer receiver), embedding |
| Concurrency | go (goroutine spawn), chan, <- send/receive, make(chan T, N), close() |
| Defer | defer statement (LIFO cleanup) |
| Error handling | Multiple return (result, error), if err != nil pattern |
| Packages | fmt.Println, fmt.Sprintf, math.Sqrt, strings.*, strconv.* |
Common Energy Anti-Patterns
1. Unbounded Goroutine Fan-Out
// BAD — spawns 100K goroutines, each has scheduling overhead
for _, item := range items {
go process(item) // 100K goroutines
}
// GOOD — bounded worker pool
ch := make(chan Item, 100)
for i := 0; i < runtime.NumCPU(); i++ {
go func() {
for item := range ch {
process(item)
}
}()
}
for _, item := range items {
ch <- item
}
close(ch)
Category: ALLOCATION | Severity: Critical | Savings: ~10x
Each goroutine has a minimum 2KB stack allocation. 100K goroutines = 200MB of stack memory + scheduling overhead.
2. Slice Append Without Pre-Allocation
// BAD — slice grows geometrically, copying data each time
var result []int
for i := 0; i < 10000; i++ {
result = append(result, i)
}
// GOOD — pre-allocate known capacity
result := make([]int, 0, 10000)
for i := 0; i < 10000; i++ {
result = append(result, i)
}
Category: ALLOCATION | Severity: Medium | Savings: ~2x
Without pre-allocation, append triggers ~14 reallocations and data copies to grow from 0 to 10,000 elements.
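The "~14 reallocations" figure follows from doubling growth. A sketch (note that Go's real append policy doubles small slices but grows large ones more gradually, so the true count differs slightly):

```python
# Count grow events under pure capacity doubling, starting from capacity 1.
# Go's actual append growth is similar for small slices but slower past
# 1024 elements, so treat this as an approximation.
def grow_events(n):
    cap, grows = 1, 0
    while cap < n:
        cap *= 2
        grows += 1
    return grows

print(grow_events(10_000))   # 14 doublings to reach capacity 16384
```

Each of those 14 grow events copies the entire slice so far, which is the traffic that make([]int, 0, 10000) eliminates.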
3. Map Iteration with Value Copy
// BAD — copies entire struct on each iteration
type BigStruct struct {
Data [1024]byte
Name string
}
for _, v := range bigMap {
process(v) // copies 1KB+ per iteration
}
// GOOD — store pointers in the map (Go map values are not addressable,
// so &bigMap[k] does not compile); iteration then copies only a pointer
for _, v := range bigMap { // bigMap is a map[string]*BigStruct
process(v)
}
Category: MEMORY | Severity: Medium | Savings: ~3x for large values
4. String Concatenation in Loops
// BAD — O(n^2) string building
result := ""
for _, s := range parts {
result += s
}
// GOOD — O(n) with strings.Builder
var b strings.Builder
for _, s := range parts {
b.WriteString(s)
}
result := b.String()
Category: STRING | Severity: High | Savings: ~10x
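The ~10x figure reflects copy traffic: += copies the whole accumulated string every time, while a builder copies each byte once. Counting characters copied (a model of the work done, not a timing of real Go code):

```python
# Characters copied while building a string from 1000 ten-character parts.
parts = ["x" * 10] * 1000

naive, length = 0, 0
for s in parts:
    length += len(s)
    naive += length          # += copies the entire accumulated string

builder = sum(len(s) for s in parts)   # strings.Builder copies each byte once

print(naive)     # 5005000 characters copied: quadratic in input size
print(builder)   # 10000 characters copied: linear
```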
Worked Example
package main
import "fmt"
func counter(id int, ch chan int) {
sum := 0
for i := 0; i < 1000; i++ {
sum += i
}
ch <- sum
}
func main() {
ch := make(chan int, 10)
for i := 0; i < 10; i++ {
go counter(i, ch)
}
total := 0
for i := 0; i < 10; i++ {
total += <-ch
}
fmt.Println(total)
}
$ joulec --lift go counter.go
Energy Analysis: counter.go
counter 12.50 nJ (confidence: 0.70)
main 8.30 nJ (confidence: 0.65)
Total: 20.80 nJ
Note: 10 goroutines detected. Energy estimate reflects single-thread
execution model; actual concurrent execution may differ due to
scheduling and synchronization overhead.
Limitations
- No interface method dispatch (interfaces parsed but not resolved dynamically)
- No struct embedding for method promotion
- No generics (Go 1.18+ type parameters)
- No select with complex multi-channel patterns
- No panic/recover (panic is treated as program exit)
- No init() functions
- No package imports beyond fmt, math, strings, strconv
- Goroutines are analyzed sequentially (no parallel energy modeling)
Rust Energy Analysis
Joule analyzes Rust code with awareness of ownership, iterator chains, and zero-cost abstractions. The lifter models the energy cost of heap allocations, reference counting, and iterator fusion.
Quick Start
# Static energy analysis
joulec --lift rust lib.rs
# Execute with energy tracking
joulec --lift-run rust lib.rs
# Execute with energy optimization
joulec --energy-optimize --lift-run rust lib.rs
Supported Features
| Category | Features |
|---|---|
| Types | i8/16/32/64, u8/16/32/64, f32/64, bool, char, String, &str, usize, isize |
| Variables | let, let mut, const, type inference, shadowing |
| Functions | fn, closures |x| x + 1, generic functions (basic), impl blocks |
| Control flow | if/else, while, loop, for x in, match (patterns, guards), break, continue |
| Ownership | & references, &mut mutable references, move closures, lifetime annotations (parsed, not enforced) |
| Structs | Definition, field access, methods, associated functions |
| Enums | Variants, match exhaustiveness, Option<T>, Result<T, E> |
| Collections | Vec<T>, HashMap<K, V>, String, Box<T> |
| Iterators | .iter(), .map(), .filter(), .fold(), .collect(), .enumerate(), .zip(), .chain(), .take(), .skip(), .any(), .all(), .find(), .sum(), .count() |
| Traits | Trait definitions and impl Trait for Type (signatures only) |
| Macros | println!, format!, vec!, panic! (pattern-matched, not expanded) |
Common Energy Anti-Patterns
1. clone() in Hot Loops
// BAD — clones String every iteration (heap allocation + copy)
for item in &data {
    let owned = item.clone();
    process(owned);
}

// GOOD — borrow instead of clone
for item in &data {
    process_ref(item);
}
Category: ALLOCATION | Severity: High | Savings: ~5x
Each .clone() on a String involves malloc + memcpy. At 200 pJ per DRAM access, this dominates in tight loops.
2. Unnecessary collect() in Iterator Chains
// BAD — collects into intermediate Vec, then iterates again
let filtered: Vec<i32> = data.iter()
    .filter(|&&x| x > 0)
    .cloned()
    .collect(); // allocates intermediate Vec
let sum: i32 = filtered.iter().sum();

// GOOD — single iterator chain, no intermediate allocation
let sum: i32 = data.iter()
    .filter(|&&x| x > 0)
    .sum();
Category: ALLOCATION | Severity: Medium | Savings: ~2x
Iterator fusion in Rust is a zero-cost abstraction — the compiler fuses the chain into a single loop. Breaking the chain with .collect() defeats this.
3. Box::new() in Loops
// BAD — heap allocation per iteration
let mut nodes: Vec<Box<Node>> = Vec::new();
for i in 0..1000 {
    nodes.push(Box::new(Node { value: i }));
}

// GOOD — pre-allocate with an arena or flat Vec
let mut nodes: Vec<Node> = Vec::with_capacity(1000);
for i in 0..1000 {
    nodes.push(Node { value: i });
}
Category: ALLOCATION | Severity: Medium | Savings: ~3x
4. format!() String Building in Loops
// BAD — format! allocates a new String every iteration
let mut log = String::new();
for i in 0..1000 {
    log.push_str(&format!("item {}\n", i));
}

// GOOD — write! to a single buffer
use std::fmt::Write;
let mut log = String::with_capacity(10000);
for i in 0..1000 {
    write!(log, "item {}\n", i).unwrap();
}
Category: STRING | Severity: Medium | Savings: ~2x
Worked Example
fn process_data(data: &[f64]) -> f64 {
    let filtered: Vec<f64> = data.iter()
        .filter(|&&x| x > 0.0)
        .cloned()
        .collect();
    let normalized: Vec<f64> = filtered.iter()
        .map(|&x| x / filtered.len() as f64)
        .collect();
    normalized.iter().sum()
}

fn main() {
    let data = vec![3.0, -1.0, 4.0, -2.0, 5.0, 1.0, -3.0, 2.0];
    let result = process_data(&data);
    println!("Result: {}", result);
}
$ joulec --lift rust pipeline.rs
Energy Analysis: pipeline.rs
process_data 12.30 nJ (confidence: 0.65)
main 2.10 nJ (confidence: 0.90)
Total: 14.40 nJ
Recommendations:
!! [ALLOCATION] process_data — two collect() calls create intermediate Vecs
Suggestion: fuse into a single iterator chain without intermediate allocation
Estimated savings: 2-3x
Optimized version:
data.iter()
.filter(|&&x| x > 0.0)
.map(|&x| x / count as f64)
.sum()
Zero-Cost Abstractions Are Real
Rust's iterator chains compile to the same machine code as hand-written loops. Joule confirms this:
// Iterator version
fn sum_positive_iter(data: &[i32]) -> i32 {
    data.iter().filter(|&&x| x > 0).sum()
}

// Manual loop version
fn sum_positive_loop(data: &[i32]) -> i32 {
    let mut sum = 0;
    for &x in data {
        if x > 0 {
            sum += x;
        }
    }
    sum
}
$ joulec --lift rust zero_cost.rs
sum_positive_iter 4.20 nJ (confidence: 0.70)
sum_positive_loop 4.20 nJ (confidence: 0.70)
Identical energy. The abstraction is truly zero-cost.
Limitations
- No trait dispatch (static or dynamic) — trait bounds are parsed but not resolved
- No lifetime analysis — lifetimes are parsed but not enforced
- No async/await — async is not supported
- No procedural macros — only println!, format!, vec!, panic! are recognized
- No use imports — all types must be fully qualified or built-in
- No impl Trait return types
- No where clauses
- No unsafe blocks
Energy Optimization Walkthrough
This walkthrough takes a real Python program through the full Joule energy analysis and optimization pipeline — from first scan to CI-ready energy budgets.
Step 1: Start with Real Code
Here's a data processing program with several common energy anti-patterns:
def find_duplicates(items, reference):
"""Find items that appear in both lists."""
duplicates = []
for item in items:
for ref in reference: # nested loop: O(n*m)
if item == ref:
duplicates.append(item)
return duplicates
def build_report(records):
"""Build a text report from records."""
report = ""
for i in range(len(records)): # len() in loop, string concat
report += "Record " + str(i) + ": " + str(records[i]) + "\n"
return report
def process_batch(data):
"""Filter and transform a data batch."""
results = []
for item in data:
temp = [] # allocation inside loop
temp.append(item * 2)
if temp[0] > 10:
results.append(temp[0])
return results
def search_all(items, targets):
"""Check if all targets exist in items."""
found = 0
for t in targets:
for item in items: # linear scan for each target
if item == t:
found = found + 1
# no break — scans entire list even after finding match
return found
def main():
data = []
for i in range(500):
data.append(i)
reference = []
for i in range(250, 750):
reference.append(i)
dups = find_duplicates(data, reference)
report = build_report(data)
processed = process_batch(data)
count = search_all(data, reference)
print(len(dups))
print(len(processed))
print(count)
main()
Step 2: Run Baseline Analysis
$ joulec --lift python anti_patterns.py --energy-report baseline.json
Energy Analysis: anti_patterns.py
find_duplicates 285.00 nJ (confidence: 0.50)
build_report 72.50 nJ (confidence: 0.55)
process_batch 18.30 nJ (confidence: 0.60)
search_all 285.00 nJ (confidence: 0.50)
main 12.40 nJ (confidence: 0.75)
Total: 673.20 nJ
Step 3: Read the Recommendations
Recommendations:
!!! [ALGORITHM] find_duplicates — O(n^2) nested loop for membership test
Suggestion: convert reference to a set for O(1) lookups
Estimated savings: 50x
!!! [ALGORITHM] search_all — O(n^2) nested loop for membership test
Suggestion: convert items to a set for O(1) lookups
Estimated savings: 50x
!! [STRING] build_report — string concatenation in loop
Suggestion: use "".join() to build string in one allocation
Estimated savings: 8x
!! [LOOP] search_all — no early exit after finding match
Suggestion: add break after match to avoid scanning remaining elements
Estimated savings: 2x (average case)
! [REDUNDANCY] build_report — len(records) called in loop range
Suggestion: compute len() once before the loop
Estimated savings: 1.2x
! [ALLOCATION] process_batch — list allocation inside loop body
Suggestion: reuse buffer or eliminate temporary list
Estimated savings: 3x
. [REDUNDANCY] build_report — str() conversion could use f-string
Suggestion: use f"Record {i}: {records[i]}" for cleaner concatenation
Estimated savings: 1.1x
Severity markers: !!! Critical, !! High, ! Medium, . Low
Step 4: Fix Critical Issues First
Fix #1: Hash set for find_duplicates
def find_duplicates(items, reference):
ref_set = set(reference) # O(m) one-time cost
duplicates = []
for item in items:
if item in ref_set: # O(1) per lookup
duplicates.append(item)
return duplicates
$ joulec --lift python fixed_v1.py
find_duplicates 8.20 nJ (confidence: 0.70) # was 285.00 nJ — 34x reduction
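Before trusting an optimization, confirm it is behavior-preserving. A quick check (plain Python, runnable outside Joule) that the set-based rewrite matches the original nested-loop version on the walkthrough's data:

```python
def find_duplicates_nested(items, reference):
    duplicates = []
    for item in items:
        for ref in reference:          # original O(n*m) version
            if item == ref:
                duplicates.append(item)
    return duplicates

def find_duplicates_set(items, reference):
    ref_set = set(reference)           # optimized O(n + m) version
    return [item for item in items if item in ref_set]

data = list(range(500))
reference = list(range(250, 750))
assert find_duplicates_nested(data, reference) == find_duplicates_set(data, reference)
print(len(find_duplicates_set(data, reference)))   # 250 items in the overlap
```

Because the reference list contains distinct values, the nested version also appends each item at most once, so both return the 250 overlapping items in the same order.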
Fix #2: Hash set for search_all (also eliminates the missing-break scan)
def search_all(items, targets):
item_set = set(items)
found = 0
for t in targets:
if t in item_set:
found = found + 1
return found
$ joulec --lift python fixed_v2.py
search_all 6.80 nJ (confidence: 0.75) # was 285.00 nJ — 42x reduction
Fix #3: String builder for build_report
def build_report(records):
n = len(records)
parts = []
for i in range(n):
parts.append(f"Record {i}: {records[i]}")
report = "\n".join(parts) + "\n"
return report
$ joulec --lift python fixed_v3.py
build_report 9.50 nJ (confidence: 0.75) # was 72.50 nJ — 7.6x reduction
Fix #4: Eliminate temporary allocation in process_batch
def process_batch(data):
results = []
for item in data:
doubled = item * 2
if doubled > 10:
results.append(doubled)
return results
$ joulec --lift python fixed_v4.py
process_batch 6.10 nJ (confidence: 0.75) # was 18.30 nJ — 3x reduction
Step 5: Run Optimized Baseline
After all four fixes:
$ joulec --lift python optimized.py --energy-report optimized.json
Energy Analysis: optimized.py
find_duplicates 8.20 nJ (confidence: 0.70)
build_report 9.50 nJ (confidence: 0.75)
process_batch 6.10 nJ (confidence: 0.75)
search_all 6.80 nJ (confidence: 0.75)
main 12.40 nJ (confidence: 0.75)
Total: 43.00 nJ
No recommendations — all detected anti-patterns have been resolved.
Step 6: Apply Automated Optimization
The --energy-optimize flag applies four compiler passes on top of your fixes:
$ joulec --energy-optimize --lift-run python optimized.py
Energy Optimization Report:
Pass 1 (Thermal-Aware Selection): 2 instructions adapted
Pass 2 (Branch Optimization): 3 branches reordered
Pass 3 (Loop Unrolling): 1 loop unrolled (trip count 4)
Pass 4 (DRAM Layout Analysis): no suggestions
Optimized energy: 38.70 nJ (10.0% reduction from automated passes)
Step 7: Compare Results
| Function | Before | After Fixes | After Optimization | Reduction |
|---|---|---|---|---|
| find_duplicates | 285.00 nJ | 8.20 nJ | 7.40 nJ | 97.4% |
| build_report | 72.50 nJ | 9.50 nJ | 8.90 nJ | 87.7% |
| process_batch | 18.30 nJ | 6.10 nJ | 5.50 nJ | 69.9% |
| search_all | 285.00 nJ | 6.80 nJ | 6.10 nJ | 97.9% |
| main | 12.40 nJ | 12.40 nJ | 10.80 nJ | 12.9% |
| Total | 673.20 nJ | 43.00 nJ | 38.70 nJ | 94.3% |
The manual fixes cut total energy by 93.6% on their own; the automated passes then shave another 10% off the optimized total.
Step 8: Set an Energy Budget for CI
# Set budget at 50 nJ — optimized version passes
$ joulec --lift python optimized.py --energy-budget 50nJ
# Exit code: 0 (within budget)
# The original version would fail
$ joulec --lift python anti_patterns.py --energy-budget 50nJ
# Exit code: 1 (budget exceeded: 673.20 nJ > 50.00 nJ)
GitHub Actions Integration
- name: Energy budget check
run: |
joulec --lift python src/core.py --energy-budget 100nJ
joulec --lift python src/utils.py --energy-budget 50nJ
The build fails if any file exceeds its budget, catching energy regressions before merge.
Step 9: Generate Reports for Dashboards
$ joulec --lift python optimized.py --energy-report report.json
The JSON report includes per-function energy, confidence scores, and any remaining recommendations. Feed this into Grafana, Datadog, or any monitoring system to track energy consumption across releases.
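A minimal consumer of such a report might look like the sketch below. The field names are illustrative, inferred from the analysis output shown earlier in this walkthrough, not a documented schema; adapt them to the actual report.json your joulec version emits.

```python
import json

# Hypothetical report payload; field names are illustrative, not a
# documented Joule schema.
report = json.loads("""
{
  "functions": [
    {"name": "find_duplicates", "energy_joules": 8.2e-9, "confidence": 0.70},
    {"name": "build_report",    "energy_joules": 9.5e-9, "confidence": 0.75}
  ]
}
""")

total_nj = sum(f["energy_joules"] for f in report["functions"]) * 1e9
hottest = max(report["functions"], key=lambda f: f["energy_joules"])
print(f"total: {total_nj:.2f} nJ, hottest: {hottest['name']}")
```

Emitting one such summary per release gives the time series a dashboard needs to flag energy regressions.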
Key Takeaways
- Start with --lift to get a baseline without running the code
- Fix critical recommendations first — algorithmic changes (O(n^2) → O(n)) yield the biggest savings
- Use --energy-optimize for automated passes on top of manual fixes
- Set --energy-budget in CI to prevent regressions
- Generate --energy-report JSON for tracking trends over time
Cross-Language Energy Comparison
The same algorithm, implemented in six languages, analyzed by Joule. This comparison reveals the energy cost of language abstractions and runtime overhead.
The Algorithm
Iterative Fibonacci computing fib(30). Chosen because it's simple enough to implement identically in every language, with enough arithmetic to produce meaningful energy differences.
Python
def fibonacci(n):
if n <= 1:
return n
a = 0
b = 1
for i in range(2, n + 1):
temp = a + b
a = b
b = temp
return b
def main():
result = fibonacci(30)
print(result)
main()
JavaScript
function fibonacci(n) {
if (n <= 1) return n;
let a = 0;
let b = 1;
for (let i = 2; i <= n; i++) {
const temp = a + b;
a = b;
b = temp;
}
return b;
}
function main() {
const result = fibonacci(30);
console.log(result);
}
main();
TypeScript
function fibonacci(n: number): number {
if (n <= 1) return n;
let a: number = 0;
let b: number = 1;
for (let i: number = 2; i <= n; i++) {
const temp: number = a + b;
a = b;
b = temp;
}
return b;
}
function main(): void {
const result: number = fibonacci(30);
console.log(result);
}
main();
C
#include <stdio.h>
int fibonacci(int n) {
if (n <= 1) return n;
int a = 0;
int b = 1;
for (int i = 2; i <= n; i++) {
int temp = a + b;
a = b;
b = temp;
}
return b;
}
int main() {
int result = fibonacci(30);
printf("%d\n", result);
return 0;
}
Go
package main
import "fmt"
func fibonacci(n int) int {
if n <= 1 {
return n
}
a := 0
b := 1
for i := 2; i <= n; i++ {
temp := a + b
a = b
b = temp
}
return b
}
func main() {
result := fibonacci(30)
fmt.Println(result)
}
Rust
fn fibonacci(n: i32) -> i32 {
    if n <= 1 {
        return n;
    }
    let mut a = 0;
    let mut b = 1;
    for _i in 2..=n {
        let temp = a + b;
        a = b;
        b = temp;
    }
    b
}

fn main() {
    let result = fibonacci(30);
    println!("{}", result);
}
Running the Comparison
joulec --lift python fibonacci.py
joulec --lift js fibonacci.js
joulec --lift ts fibonacci.ts
joulec --lift c fibonacci.c
joulec --lift go fibonacci.go
joulec --lift rust fibonacci.rs
Results
| Language | fibonacci() Energy | main() Energy | Total | Confidence |
|---|---|---|---|---|
| C | 1.75 nJ | 0.85 nJ | 2.60 nJ | 0.90 |
| Rust | 1.75 nJ | 1.10 nJ | 2.85 nJ | 0.90 |
| Go | 1.95 nJ | 1.20 nJ | 3.15 nJ | 0.85 |
| JavaScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| TypeScript | 2.80 nJ | 1.50 nJ | 4.30 nJ | 0.85 |
| Python | 3.40 nJ | 1.80 nJ | 5.20 nJ | 0.80 |
All programs produce the correct result: 832040.
Analysis
Why C and Rust Are Cheapest
Both C and Rust map directly to integer arithmetic with no runtime overhead. The fibonacci() function compiles to:
- 29 integer additions (0.05 pJ each = 1.45 pJ)
- 58 register moves (~0 pJ, register-to-register)
- 29 loop iterations with branch (0.1 pJ each = 2.9 pJ)
- 1 comparison + branch for the n <= 1 check
Total compute: ~4.4 pJ. The remaining energy comes from function call overhead, stack frame setup, and memory loads.
Rust's slightly higher main() cost accounts for println! macro expansion, which involves formatting machinery that printf avoids.
Why Go Costs Slightly More
Go's runtime includes goroutine scheduling infrastructure even for single-threaded programs. The fmt.Println call also involves reflection-based formatting that adds overhead beyond C's printf.
Why JavaScript Costs More
JavaScript numbers are f64 (double-precision float) even for integer arithmetic. The fibonacci() loop performs float addition instead of integer addition:
- Integer add: 0.05 pJ
- Float add: 0.35 pJ (7x more expensive)
This single type system decision accounts for most of JavaScript's energy premium.
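Plugging the comparison's own per-operation costs into the fib(30) loop (29 additions) shows how far the arithmetic alone separates the two languages:

```python
INT_ADD_PJ = 0.05     # cost of one integer add, from the figures above
FLOAT_ADD_PJ = 0.35   # cost of one f64 add
ADDS = 29             # additions performed by the fib(30) loop

c_arith = ADDS * INT_ADD_PJ      # ~1.45 pJ, matching the C breakdown above
js_arith = ADDS * FLOAT_ADD_PJ   # ~10.15 pJ for the same loop in JavaScript
print(round(c_arith, 2), round(js_arith, 2), round(js_arith / c_arith, 1))
```

The 7x arithmetic gap narrows to roughly 1.6x in the totals because call overhead, stack setup, and I/O cost about the same in both languages.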
Why TypeScript Equals JavaScript
TypeScript type annotations (n: number, let a: number) are erased before analysis. The runtime behavior is identical to JavaScript — same f64 arithmetic, same energy profile.
Why Python Costs the Most
Python's dynamic dispatch adds overhead per operation. Each + involves:
- Type check on both operands
- Method lookup (__add__)
- Result allocation (for large integers)
The energy model accounts for this dispatch overhead, making Python ~2x more expensive than C for pure arithmetic.
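The dispatch described above is observable from Python itself. A small illustration (the allocation behavior is a CPython implementation detail):

```python
# Each + resolves through the operands' types at runtime.
a, b = 3, 4
assert a + b == type(a).__add__(a, b) == 7   # + dispatches to int.__add__

# Results outside CPython's small-int cache are freshly allocated objects:
x = 10 ** 6
print((x + 1) is (x + 1))   # False in CPython: each add allocates a new int
```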
Thermal State Impact
Running with different thermal states changes the cost model's power efficiency factor:
joulec --lift c fibonacci.c --thermal-state cool # aggressive optimization
joulec --lift c fibonacci.c --thermal-state hot # conservative, reduced SIMD
| Thermal State | C Energy | Python Energy | Ratio |
|---|---|---|---|
| cool (< 50C) | 2.40 nJ | 4.80 nJ | 2.0x |
| nominal (50-70C) | 2.60 nJ | 5.20 nJ | 2.0x |
| hot (85-95C) | 3.10 nJ | 6.20 nJ | 2.0x |
The absolute energy increases with temperature (thermal resistance reduces efficiency), but the ratio between languages stays constant for this workload because the algorithm is compute-bound with no SIMD opportunities.
The Energy Cost of Abstraction
This comparison quantifies something developers intuit but rarely measure: higher-level languages consume more energy for the same computation. The gap is not enormous — Python costs 2x what C costs for pure arithmetic — but it compounds across millions of function calls in production systems.
Joule makes this cost visible. Whether you're choosing a language for a new project, optimizing a hot path, or justifying a rewrite, you now have picojoule-level data to inform the decision.
Accelerator Energy Measurement
Joule measures energy consumption not just on CPUs but across GPUs, TPUs, and other accelerators. This guide covers the three-tier measurement approach, supported hardware, and how to use accelerator energy data in your programs.
Three-Tier Approach
Joule uses a tiered strategy to maximize energy measurement coverage:
Tier 1: Static Estimation
Available everywhere, no hardware access required. The compiler estimates energy from code structure using calibrated instruction costs. This is what powers #[energy_budget] at compile time.
Tier 2: CPU Performance Counters
On supported platforms, Joule reads hardware performance counters for actual CPU energy:
| Platform | API | Granularity |
|---|---|---|
| Intel/AMD Linux | RAPL via perf_event | Per-package, per-core |
| Intel/AMD Linux | RAPL via MSR | Per-package |
| Apple Silicon macOS | IOReport framework | Per-cluster |
Tier 3: Accelerator Energy
For GPU and accelerator workloads, Joule queries vendor-specific APIs. Each backend in TensorForge implements the EnergyTelemetry trait.
Vendor Coverage
| Vendor | Hardware | API | Energy | Power | Temperature |
|---|---|---|---|---|---|
| NVIDIA | GPUs (A100, H100, etc.) | NVML | Board-level | Per-GPU | Per-GPU |
| AMD | GPUs (MI250, MI300, etc.) | ROCm SMI | Average power | Per-GPU | Per-GPU |
| Intel | GPUs, Gaudi | Level Zero | Per-device | Per-domain | Per-device |
| Google | TPU v4, v5 | TPU Runtime | Per-chip | Per-chip | Per-chip |
| AWS | Inferentia, Trainium | Neuron SDK | Per-core | Per-core | Per-core |
| Groq | LPU | HLML | Board-level | Per-device | Per-device |
| Cerebras | CS-2, CS-3 | CS SDK | Wafer-scale | Per-wafer | Per-wafer |
| SambaNova | SN30, SN40 | DataScale API | Per-RDU | Per-RDU | Per-RDU |
API Details
NVIDIA (NVML)
The NVIDIA Management Library provides direct energy readings:
nvmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
- Returns total energy in millijoules since driver load
- Subtract start from end measurement for per-operation energy
- Available on all datacenter GPUs (V100, A100, H100, B100)
- Supported on consumer GPUs (RTX 3000/4000/5000 series)
AMD (ROCm SMI)
ROCm System Management Interface provides power readings:
rsmi_dev_power_ave_get(device_index, sensor_id, &power_uw)
- Returns average power in microwatts
- Energy is derived from power * time
- Available on MI series (MI250, MI300) and Radeon Pro
Intel (Level Zero)
Intel's Level Zero API provides power domain readings:
zesDeviceEnumPowerDomains(device, &count, domains)
zesPowerGetEnergyCounter(domain, &energy)
- Energy counter in microjoules
- Multiple power domains (package, card, memory)
- Supports Intel Arc GPUs and Gaudi accelerators
Google (TPU Runtime)
tpu_device_get_energy_consumption(device, &energy_j)
- Per-chip energy in joules
- Available on TPU v4 and v5 pods
- Accessed through the TPU runtime API
AWS (Neuron SDK)
neuron_device_get_power(device, &power_mw)
- Per-NeuronCore power in milliwatts
- Available on Inferentia and Trainium instances
- Accessed through the Neuron runtime
Groq (HLML)
Groq's Hardware Library for Machine Learning mirrors the NVML API:
hlmlDeviceGetTotalEnergyConsumption(device, &energy_mj)
- Board-level energy in millijoules
- Available on Groq LPU cards
Cloud Detection
Joule automatically detects available accelerators using:
Device Files
| Path | Accelerator |
|---|---|
/dev/nvidia* | NVIDIA GPU |
/dev/kfd | AMD GPU (ROCm) |
/dev/dri/renderD* | Intel GPU |
/dev/accel* | Google TPU |
/dev/neuron* | AWS Inferentia/Trainium |
Environment Variables
| Variable | Accelerator |
|---|---|
CUDA_VISIBLE_DEVICES | NVIDIA GPU |
ROCR_VISIBLE_DEVICES | AMD GPU |
ZE_AFFINITY_MASK | Intel GPU |
TPU_NAME | Google TPU |
NEURON_RT_NUM_CORES | AWS Inferentia/Trainium |
GROQ_DEVICE_ID | Groq LPU |
JSON Output
Set JOULE_ENERGY_JSON=1 to get structured JSON output with per-device breakdowns:
JOULE_ENERGY_JSON=1 joulec program.joule --emit c --energy-check -o program.c
Report Format
{
"program": "program.joule",
"timestamp": "2026-03-03T10:30:00Z",
"devices": [
{
"type": "cpu",
"vendor": "intel",
"model": "Xeon w9-3595X",
"energy_joules": 0.00042,
"measurement": "rapl",
"tier": 2
},
{
"type": "gpu",
"vendor": "nvidia",
"model": "H100",
"energy_joules": 0.0031,
"measurement": "nvml",
"tier": 3
}
],
"total_energy_joules": 0.00352,
"functions": [
{
"name": "matrix_multiply",
"energy_joules": 0.0028,
"device": "gpu:0",
"confidence": 0.95,
"budget_joules": 0.005,
"status": "within_budget"
},
{
"name": "preprocess",
"energy_joules": 0.00042,
"device": "cpu",
"confidence": 0.90,
"budget_joules": 0.001,
"status": "within_budget"
}
]
}
Per-Device Breakdown
When multiple accelerators are present, the report includes energy per device:
{
"devices": [
{ "type": "cpu", "energy_joules": 0.0012, "tier": 2 },
{ "type": "gpu", "vendor": "nvidia", "index": 0, "energy_joules": 0.045, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 1, "energy_joules": 0.043, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 2, "energy_joules": 0.044, "tier": 3 },
{ "type": "gpu", "vendor": "nvidia", "index": 3, "energy_joules": 0.046, "tier": 3 }
],
"total_energy_joules": 0.1792
}
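The totals are additive across devices. Summing the breakdown above reproduces total_energy_joules and gives the GPU share of the run:

```python
import json

# The per-device breakdown shown above, parsed and aggregated.
report = json.loads("""
{
  "devices": [
    { "type": "cpu", "energy_joules": 0.0012, "tier": 2 },
    { "type": "gpu", "vendor": "nvidia", "index": 0, "energy_joules": 0.045, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 1, "energy_joules": 0.043, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 2, "energy_joules": 0.044, "tier": 3 },
    { "type": "gpu", "vendor": "nvidia", "index": 3, "energy_joules": 0.046, "tier": 3 }
  ]
}
""")

devices = report["devices"]
total = sum(d["energy_joules"] for d in devices)
gpu = sum(d["energy_joules"] for d in devices if d["type"] == "gpu")
print(round(total, 4))         # 0.1792, matching total_energy_joules
print(round(gpu / total, 3))   # GPUs account for ~99% of the energy
```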
Using Accelerator Energy in Code
Energy Budgets on GPU Functions
#[energy_budget(max_joules = 0.05)]
#[gpu_kernel]
fn batch_matmul(a: Tensor, b: Tensor) -> Tensor {
a.matmul(b)
}
The budget is checked against actual GPU energy consumption (Tier 3) when available, or estimated (Tier 1) otherwise.
Runtime Energy Query
use std::energy::{measure, EnergyReport};
let report: EnergyReport = measure(|| {
model.forward(input)
});
println!("CPU energy: {} J", report.cpu_joules());
println!("GPU energy: {} J", report.gpu_joules());
println!("Total: {} J", report.total_joules());
Adaptive Energy Behavior
use std::energy::current_power_draw;
let power = current_power_draw(); // watts
if power > 200.0 {
// Use energy-efficient path
compute_sparse(data)
} else {
// Full compute path
compute_dense(data)
}
Fallback Behavior
When hardware energy APIs are unavailable, Joule falls back gracefully:
- If Tier 3 (accelerator) is unavailable, use Tier 2 (CPU counters) for CPU portions
- If Tier 2 is unavailable, use Tier 1 (static estimation)
- The confidence score reflects which tier was used
No program crashes due to missing energy hardware. The measurement degrades gracefully with reduced precision.
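The fallback order can be pictured as a selection function. The sketch below uses illustrative names and confidence values, not Joule's internal API:

```python
def select_tier(accel_api_available, cpu_counters_available):
    """Pick the best available measurement tier, degrading gracefully."""
    if accel_api_available:
        return 3, 0.95   # vendor accelerator telemetry (illustrative confidence)
    if cpu_counters_available:
        return 2, 0.90   # RAPL / IOReport CPU counters
    return 1, 0.70       # static estimation, always available

print(select_tier(False, True))    # (2, 0.9)
print(select_tier(False, False))   # (1, 0.7)
```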
TensorForge
TensorForge is Joule's energy-aware machine learning framework. It provides a complete ML stack -- from tensor operations to distributed training to inference -- with energy measurement built into every layer.
Architecture
TensorForge is organized as 22 crates in the Joule workspace:
Foundation Crates
| Crate | Purpose |
|---|---|
tf-core | Core types, EnergyTelemetry trait, tensor metadata |
tf-ir | TensorIR: HighOp (14 tensor operations), graph representation |
tf-compiler | OptimizationPass trait, graph rewriting infrastructure, 7 optimization passes |
tf-autodiff | Automatic differentiation with real VJP (vector-Jacobian product) implementations |
tf-hal | Hardware abstraction: Device trait, memory management |
tf-runtime | Tensor execution runtime, memory pools, scheduling |
Backend Crates
| Crate | Hardware Target |
|---|---|
| tf-backend-cpu | x86/ARM CPUs with SIMD |
| tf-backend-cuda | NVIDIA GPUs via CUDA |
| tf-backend-rocm | AMD GPUs via ROCm/HIP |
| tf-backend-metal | Apple GPUs via Metal |
| tf-backend-tpu | Google TPUs |
| tf-backend-level0 | Intel GPUs/accelerators via Level Zero |
| tf-backend-neuron | AWS Inferentia/Trainium via Neuron SDK |
| tf-backend-groq | Groq LPUs |
| tf-backend-gaudi | Intel Gaudi (Habana Labs) |
| tf-backend-estimated | Energy-estimated backend (no hardware required) |
High-Level Crates
| Crate | Purpose |
|---|---|
| tf-nn | Neural network modules (Module trait, layers, activations) |
| tf-optim | Optimizers (AdamW, SGD with momentum) |
| tf-data | Data loading and batching |
| tf-serialize | Model serialization/deserialization |
| tf-distributed | Distributed training (ring, tree, halving-doubling collectives) |
| tf-infer | Inference engine (KV cache, speculative decoding, scheduling) |
EnergyTelemetry Trait
The EnergyTelemetry trait is the foundation of TensorForge's energy awareness. Every backend implements it:
pub trait EnergyTelemetry {
fn energy_consumed_joules(&self) -> f64;
fn power_draw_watts(&self) -> f64;
fn temperature_celsius(&self) -> f64;
fn reset_counters(&mut self);
}
This means every tensor operation -- every matmul, every convolution, every activation -- has a measurable energy cost. The energy data flows up through the framework:
- Individual ops report energy via the backend's telemetry
- The optimizer aggregates energy per training step
- The training loop reports energy per epoch
- The distributed runtime aggregates energy across all nodes
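A minimal sketch of a backend implementing the trait. `SimBackend` and its fields are hypothetical; only the trait signatures come from the definition above:

```joule
struct SimBackend {
    joules: f64,
    watts: f64,
}

impl EnergyTelemetry for SimBackend {
    fn energy_consumed_joules(&self) -> f64 { self.joules }
    fn power_draw_watts(&self) -> f64 { self.watts }
    fn temperature_celsius(&self) -> f64 { 45.0 } // simulated constant
    fn reset_counters(&mut self) { self.joules = 0.0; }
}
```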
TensorIR
TensorForge uses a graph-based intermediate representation with 14 high-level operations:
| Operation | Description |
|---|---|
| MatMul | Matrix multiplication |
| Conv2D | 2D convolution |
| BatchNorm | Batch normalization |
| Relu | ReLU activation |
| Softmax | Softmax |
| Add | Element-wise addition |
| Mul | Element-wise multiplication |
| Reduce | Reduction (sum, mean, max) |
| Reshape | Tensor reshape |
| Transpose | Tensor transpose |
| Concat | Tensor concatenation |
| Slice | Tensor slicing |
| Gather | Index-based gathering |
| Scatter | Index-based scattering |
Graph Optimization
The tf-compiler provides 7 optimization passes:
- Operator Fusion -- Fuse sequences like Conv2D+BatchNorm+ReLU into a single kernel
- Layout Optimization -- Choose optimal memory layout (NCHW vs NHWC) per backend
- Constant Folding -- Evaluate constant subgraphs at compile time
- Dead Node Elimination -- Remove unused computation
- Common Subexpression Elimination -- Share identical computations
- Memory Planning -- Minimize peak memory usage through buffer reuse
- Energy-Aware Scheduling -- Reorder operations to minimize energy consumption
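As a sketch of how these passes might be applied programmatically -- the OptimizationPass trait comes from tf-compiler, but the pass constructors, `run` method, and `build_tensor_ir` helper shown here are illustrative names, not confirmed API:

```joule
use tf_compiler::OptimizationPass;

// Hypothetical pass objects mirroring the list above
let passes: Vec<Box<dyn OptimizationPass>> = vec![
    Box::new(OperatorFusion::new()),
    Box::new(ConstantFolding::new()),
    Box::new(EnergyAwareScheduling::new()),
];

let mut graph = build_tensor_ir(model); // hypothetical helper
for pass in passes {
    graph = pass.run(graph); // each pass rewrites the TensorIR graph
}
```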
Autodiff
TensorForge implements reverse-mode automatic differentiation with real VJP implementations for all operations. No stubs, no placeholders -- every backward pass computes correct gradients:
use tf_autodiff::backward;
let loss = model.forward(input);
let gradients = backward(loss); // real gradient computation
optimizer.step(gradients);
Neural Network API
The tf-nn crate provides a Module trait for building neural networks:
use tf_nn::{Module, Linear, Conv2d, BatchNorm2d, relu};
struct ResBlock {
conv1: Conv2d,
bn1: BatchNorm2d,
conv2: Conv2d,
bn2: BatchNorm2d,
}
impl Module for ResBlock {
fn forward(&self, x: Tensor) -> Tensor {
let residual = x;
let out = self.conv1.forward(x)
|> self.bn1.forward
|> relu
|> self.conv2.forward
|> self.bn2.forward;
relu(out + residual)
}
}
Optimizers
The tf-optim crate provides energy-tracked optimizers:
use tf_optim::{AdamW, SGD};
// AdamW with weight decay
let optimizer = AdamW::new(model.parameters(), lr: 0.001, weight_decay: 0.01);
// SGD with momentum
let optimizer = SGD::new(model.parameters(), lr: 0.01, momentum: 0.9);
Every optimizer step reports energy consumed:
let energy = optimizer.step(gradients);
println!("Step energy: {} J", energy.joules());
Distributed Training
The tf-distributed crate supports multi-node training with three collective algorithms:
| Algorithm | Pattern | Best For |
|---|---|---|
| Ring AllReduce | Each node sends to next neighbor | Large models, high bandwidth |
| Tree AllReduce | Binary tree reduction | Low latency |
| Halving-Doubling | Recursive halving then doubling | Balanced |
Energy is tracked across all nodes, giving total training energy:
use tf_distributed::DistributedTrainer;
let trainer = DistributedTrainer::new(
model,
world_size: 8,
algorithm: CollectiveAlgorithm::Ring,
);
let metrics = trainer.train(dataset, epochs: 10);
println!("Total energy across {} nodes: {} J", 8, metrics.total_energy_joules());
Inference Engine
The tf-infer crate provides a high-performance inference engine with:
Paged KV Cache
Efficient key-value caching for transformer models. Memory is allocated in pages, avoiding fragmentation:
use tf_infer::KvCache;
let cache = KvCache::paged(
num_layers: 32,
num_heads: 32,
head_dim: 128,
page_size: 256,
);
Continuous Batching
Dynamic batching that adds new requests to a running batch without waiting for all current requests to complete:
use tf_infer::ContinuousBatcher;
let batcher = ContinuousBatcher::new(max_batch_size: 64);
batcher.add_request(prompt);
let outputs = batcher.step(); // processes all pending requests
Speculative Decoding
Use a smaller draft model to generate candidates, then verify with the full model:
use tf_infer::SpeculativeDecoder;
let decoder = SpeculativeDecoder::new(
target_model: large_model,
draft_model: small_model,
num_speculative_tokens: 5,
);
Sampling Pipeline
Configurable token sampling with temperature, top-k, top-p, and repetition penalty:
use tf_infer::SamplingConfig;
let config = SamplingConfig {
temperature: 0.7,
top_k: 50,
top_p: 0.9,
repetition_penalty: 1.1,
};
Energy-Aware Scheduling
The inference scheduler considers energy costs when choosing batch sizes and making scheduling decisions. It can enforce energy budgets on individual inference requests:
use tf_infer::EnergyAwareScheduler;
let scheduler = EnergyAwareScheduler::new(
max_energy_per_request: 0.5, // joules
max_power_draw: 200.0, // watts
);
Compiler Integration
TensorForge integrates with the Joule compiler through the joule-codegen-tensorforge crate. When Joule code uses tensor operations, the compiler:
- Lowers tensor expressions to TensorIR
- Applies graph optimization passes
- Selects the backend based on --target
- Generates backend-specific code
- Instruments energy telemetry calls
This means energy budgets work with ML code:
#[energy_budget(max_joules = 10.0)]
fn train_epoch(model: &mut Model, data: DataLoader) -> f64 {
let mut total_loss = 0.0;
for batch in data {
let loss = model.forward(batch.input);
let grads = backward(loss);
optimizer.step(grads);
total_loss = total_loss + loss.item();
}
total_loss
}
Joule Language Reference
The formal specification of Joule's syntax and semantics.
Contents
- Types -- Primitive types, compound types, union types, generics, type inference
- Expressions -- Operators, pipe operator, literals, control flow, closures
- Items -- Functions, structs, enums, traits, impls, modules, const fn, comptime
- Patterns -- Pattern matching: or patterns, range patterns, guard clauses
- Attributes -- Energy budgets, #[test], #[bench], thermal awareness, derive macros
- Memory -- Ownership, borrowing, references, lifetimes
- Concurrency -- Async/await, spawn, bounded channels, task groups, parallel for, supervisors
- Energy -- Energy system formal specification with accelerator support
Notation
In syntax descriptions:
- monospace indicates literal syntax
- italics indicate a syntactic category (e.g., expression, type)
- [ ] indicates optional elements
- { } indicates zero or more repetitions
- | separates alternatives
Types
Primitive Types
Integer Types
| Type | Size | Range |
|---|---|---|
| i8 | 8-bit | -128 to 127 |
| i16 | 16-bit | -32,768 to 32,767 |
| i32 | 32-bit | -2^31 to 2^31-1 |
| i64 | 64-bit | -2^63 to 2^63-1 |
| isize | pointer-sized | Platform dependent |
| u8 | 8-bit | 0 to 255 |
| u16 | 16-bit | 0 to 65,535 |
| u32 | 32-bit | 0 to 2^32-1 |
| u64 | 64-bit | 0 to 2^64-1 |
| usize | pointer-sized | Platform dependent |
Integer literals default to i32. Use suffixes for other types: 42u8, 100i64, 0usize.
Floating-Point Types
| Type | Size | Precision |
|---|---|---|
| f16 | 16-bit | ~3 decimal digits (IEEE 754 half-precision) |
| bf16 | 16-bit | ~3 decimal digits (Brain Float, ML workloads) |
| f32 | 32-bit | ~7 decimal digits |
| f64 | 64-bit | ~15 decimal digits |
Float literals default to f64. Use suffix for f32: 3.14f32.
f16 and bf16 are half-precision types for ML inference and signal processing. bf16 has the same exponent range as f32 but fewer mantissa bits — ideal for neural network weights. Energy cost: 0.4 pJ per operation (vs 0.35 pJ for f32).
Boolean
let a: bool = true;
let b: bool = false;
Character
let c: char = 'A'; // Unicode scalar value
let emoji: char = '\u{1F600}';
Unit Type
The unit type () represents the absence of a meaningful value. Functions without a return type return ().
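For example, a function declared without a return type yields ():

```joule
fn log_status(msg: &str) {
    println!("status: {}", msg);
}

let unit: () = log_status("ready"); // functions without a return type return ()
```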
Compound Types
Tuples
Fixed-length, heterogeneous sequences:
let pair: (i32, String) = (42, "hello");
let (x, y) = pair; // destructuring
let first = pair.0; // field access
Arrays
Fixed-length, homogeneous sequences:
let arr: [i32; 5] = [1, 2, 3, 4, 5];
let zeros = [0; 10]; // 10 zeros
let first = arr[0]; // indexing
Slices
Dynamically-sized views into arrays:
let slice: &[i32] = &arr[1..3];
String Types
String
Owned, heap-allocated, growable UTF-8 string:
let s: String = String::from("Hello, world!");
let greeting = "Hi " + name; // concatenation
let len = s.len(); // byte length
&str
Borrowed string slice:
let s: &str = "literal";
Union Types
Union types allow a value to be one of several types. They are declared with the | separator:
type Number = i32 | i64 | f64;
type JsonValue = i64 | f64 | String | bool | Vec<JsonValue>;
type Outcome = Data | ErrorCode;
Union types are matched exhaustively:
fn describe(val: Number) -> String {
match val {
x: i32 => format!("i32: {}", x),
x: i64 => format!("i64: {}", x),
x: f64 => format!("f64: {}", x),
}
}
Union Type Rules
- Each constituent type must be distinct
- The compiler tracks which variant is active at runtime via a discriminant tag
- Pattern matching on union types is exhaustive -- all variants must be handled
- Union types compose: type A = B | C, where B and C can themselves be union types
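For example -- assuming, as the composition rule implies, that a composed union can be matched directly on its underlying constituent types:

```joule
type Scalar = i32 | f64;
type Value = Scalar | String; // Scalar is itself a union type

fn show(v: Value) -> String {
    match v {
        x: i32 => format!("int: {}", x),
        x: f64 => format!("float: {}", x),
        s: String => s,
    }
}
```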
Generic Types
Vec
Dynamic array:
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];
Option
Optional value:
let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;
Result<T, E>
Fallible operation:
let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("failed");
Box
Heap-allocated value:
let boxed: Box<i32> = Box::new(42);
HashMap<K, V>
Key-value map:
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);
Smart Pointers
See Smart Pointers for full documentation.
let rc = Rc::new(42); // single-threaded shared ownership
let arc = Arc::new(42); // thread-safe shared ownership
let cow = Cow::borrowed("hi"); // clone-on-write
Const-Generic Types
// SmallVec — inline buffer with heap spillover
let mut sv: SmallVec[i32; 8] = SmallVec::new();
sv.push(42); // stored inline (no allocation until > 8 elements)
// Simd — portable SIMD vectors
let v: Simd[f32; 4] = Simd::splat(1.0);
let w: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let sum = v.add(&w);
See Simd for full SIMD documentation.
N-Dimensional Arrays
See NDArray for full documentation.
let mat: NDArray[f64; 2] = NDArray::zeros([3, 4]);
let view: NDView[f64; 1] = mat.row(0);
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Type Inference
The compiler infers types when possible:
let x = 42; // inferred as i32
let mut v = Vec::new(); // type inferred from usage
v.push(1u8); // now inferred as Vec<u8>
Explicit annotations are required when the type cannot be inferred from context.
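For example, a collection that is never used afterward gives the compiler nothing to infer from:

```joule
// error: cannot infer element type -- v is never used afterward
// let v = Vec::new();

// fix: annotate the element type explicitly
let v: Vec<f64> = Vec::new();
```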
Type Casting
Use as for numeric conversions:
let x: i32 = 42;
let y: f64 = x as f64;
let z: u8 = x as u8; // truncation
let p: usize = x as usize;
Expressions
Joule is expression-oriented. Most constructs return a value, including if, match, and blocks.
Literals
42 // integer (i32)
3.14 // float (f64)
true // bool
'A' // char
"hello" // String
Integer Literal Suffixes
42i8 42i16 42i32 42i64 42isize
42u8 42u16 42u32 42u64 42usize
Float Literal Suffixes
3.14f32 3.14f64
Arithmetic Operators
| Operator | Operation | Types |
|---|---|---|
| + | Addition | integers, floats, String concatenation |
| - | Subtraction | integers, floats |
| * | Multiplication | integers, floats |
| / | Division | integers, floats |
| % | Remainder | integers |
| ** | Exponentiation | integers, floats (right-associative) |
Comparison Operators
| Operator | Operation |
|---|---|
| == | Equal |
| != | Not equal |
| < | Less than |
| > | Greater than |
| <= | Less or equal |
| >= | Greater or equal |
All comparison operators return bool.
Logical Operators
| Operator | Operation |
|---|---|
| && | Logical AND (short-circuit) |
| \|\| | Logical OR (short-circuit) |
| ! | Logical NOT |
Bitwise Operators
| Operator | Operation |
|---|---|
| & | Bitwise AND |
| \| | Bitwise OR |
| ^ | Bitwise XOR |
| ~ | Bitwise NOT |
| << | Left shift |
| >> | Right shift |
Pipe Operator
The pipe operator |> passes the result of the left-hand expression as the first argument to the right-hand function:
// Without pipe
let result = process(transform(parse(input)));
// With pipe -- reads left to right
let result = input |> parse |> transform |> process;
Pipe with Multi-Argument Functions
When the right-hand side is a call with arguments, the piped value is inserted as the first argument:
let result = data
|> filter(|x| x > 0)
|> map(|x| x * 2)
|> take(10);
Pipe Precedence
The pipe operator has lower precedence than all other operators except assignment. It is left-associative:
// These are equivalent:
a |> f |> g
g(f(a))
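Because |> binds more loosely than arithmetic, the left operand is fully evaluated before piping (double is a hypothetical function):

```joule
let y = x + 1 |> double;  // parsed as (x + 1) |> double, not x + (1 |> double)
```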
Assignment
let mut x = 0;
x = 42;
Compound assignment is not supported. Use x = x + 1 instead of x += 1.
Block Expressions
A block evaluates to its last expression:
let result = {
let a = 10;
let b = 20;
a + b // no semicolon -- this is the block's value
};
// result == 30
If Expressions
if is an expression and returns a value:
let max = if a > b { a } else { b };
Without else, the type is ():
if condition {
do_something();
}
Chained:
if x > 0 {
"positive"
} else if x < 0 {
"negative"
} else {
"zero"
}
Match Expressions
Exhaustive pattern matching:
let name = match color {
Color::Red => "red",
Color::Green => "green",
Color::Blue => "blue",
};
See Patterns for pattern syntax.
Loops
While Loop
while condition {
body();
}
For Loop
for item in collection {
process(item);
}
Loop (Infinite)
loop {
if done() {
break;
}
}
Break and Continue
loop {
if skip_this() {
continue;
}
if finished() {
break;
}
}
Function Calls
let result = add(1, 2);
Method Calls
let len = string.len();
let upper = string.to_uppercase();
Field Access
let x = point.x;
let name = person.name;
Index Access
let first = vec[0];
let ch = string[i];
Struct Construction
let p = Point { x: 3.0, y: 4.0 };
Enum Variant Construction
let c = Shape::Circle { radius: 5.0 };
let ok = Result::Ok(42);
Return
Explicit return from a function:
fn find(items: Vec<i32>, target: i32) -> Option<i32> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(items[i]);
}
i = i + 1;
}
Option::None
}
Type Cast
let x = 42i32 as f64;
let y = offset as usize;
Items
Items are the top-level declarations in a Joule program.
Functions
fn name(param: Type, param2: Type) -> ReturnType {
body
}
Visibility
pub fn public_function() { } // visible outside module
fn private_function() { } // module-private (default)
Parameters
Parameters are passed by value (move) by default:
fn process(data: Vec<u8>) {
// data is moved into this function
}
Use references for borrowing:
fn inspect(data: &Vec<u8>) {
// data is borrowed immutably
}
fn modify(data: &mut Vec<u8>) {
// data is borrowed mutably
}
Self Parameter
Methods take self as their first parameter:
impl Point {
fn distance(self) -> f64 { } // takes ownership
fn inspect(self) -> f64 { } // immutable self
fn translate(mut self, dx: f64) { } // mutable self
}
Generic Functions
fn first<T>(items: Vec<T>) -> Option<T> {
if items.len() > 0 {
Option::Some(items[0])
} else {
Option::None
}
}
Extern Functions
Functions implemented outside Joule (FFI):
extern fn sqrt(x: f64) -> f64;
extern fn malloc(size: usize) -> *mut u8;
Const Functions
Functions that can be evaluated at compile time are declared with const fn:
const fn max(a: i32, b: i32) -> i32 {
if a > b { a } else { b }
}
const fn factorial(n: i32) -> i32 {
if n <= 1 { 1 } else { n * factorial(n - 1) }
}
// Use at compile time
const MAX_SIZE: i32 = max(100, 200);
const FACT_10: i32 = factorial(10);
Const Function Restrictions
const fn bodies are restricted to operations the compiler can evaluate:
- Arithmetic operations on primitive types
- Control flow (if, match, recursion)
- Local variable bindings
- Calling other const fn functions
The following are not allowed in const fn:
- Heap allocation (Vec::new(), Box::new())
- I/O operations
- Mutable static state
- Non-const function calls
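For example, the compiler rejects a const fn that allocates:

```joule
const fn build() -> Vec<i32> {
    Vec::new() // error: heap allocation is not allowed in const fn
}
```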
Comptime Blocks
For more complex compile-time computation, use comptime blocks:
comptime {
let table = generate_sin_table(1024);
}
// table is available as a compile-time constant
fn fast_sin(x: f64) -> f64 {
let index = (x * 1024.0 / TAU) as usize;
table[index % 1024]
}
Comptime blocks execute during compilation and make their results available as constants in runtime code. The HIR const evaluator handles arithmetic, control flow, and function calls within comptime blocks.
Structs
Named product types with fields:
pub struct Point {
pub x: f64,
pub y: f64,
}
Field Visibility
Fields are private by default. Use pub to make them accessible:
pub struct Config {
pub name: String, // public
secret_key: String, // private
}
Generic Structs
pub struct Pair<A, B> {
pub first: A,
pub second: B,
}
Enums
Sum types (tagged unions) with variants:
pub enum Color {
Red,
Green,
Blue,
}
Variants with Data
pub enum Shape {
Circle { radius: f64 },
Rectangle { width: f64, height: f64 },
Point,
}
Tuple Variants
pub enum Option<T> {
Some(T),
None,
}
pub enum Result<T, E> {
Ok(T),
Err(E),
}
Generic Enums
pub enum Either<L, R> {
Left(L),
Right(R),
}
Impl Blocks
Associate methods with a type:
impl Point {
// Associated function (no self)
pub fn new(x: f64, y: f64) -> Point {
Point { x, y }
}
// Method (takes self)
pub fn distance(self) -> f64 {
(self.x * self.x + self.y * self.y).sqrt()
}
}
Multiple impl blocks are allowed for the same type:
impl Point {
pub fn new(x: f64, y: f64) -> Point { Point { x, y } }
}
impl Point {
pub fn translate(mut self, dx: f64, dy: f64) {
self.x = self.x + dx;
self.y = self.y + dy;
}
}
Traits
Define shared behavior:
pub trait Display {
fn to_string(self) -> String;
}
pub trait Clone {
fn clone(self) -> Self;
}
Trait Implementation
impl Display for Point {
fn to_string(self) -> String {
"(" + self.x.to_string() + ", " + self.y.to_string() + ")"
}
}
Trait Bounds
fn print_all<T: Display>(items: Vec<T>) {
for item in items {
println!("{}", item.to_string());
}
}
Dynamic Dispatch
Use dyn Trait for runtime polymorphism:
fn print_shape(shape: &dyn Display) {
println!("{}", shape.to_string());
}
Type Aliases
pub type Token = Spanned<TokenKind>;
pub type ParseResult<T> = Result<T, ParseError>;
Modules
Module Declarations
Modules organize code into separate files. The mod keyword declares a module:
mod lexer; // loads from lexer.joule or lexer/mod.joule
mod parser;
mod typeck;
File Resolution
When the compiler encounters mod foo;, it searches for:
- foo.joule in the same directory as the current file
- foo/mod.joule for modules with sub-modules
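A typical layout for the declarations above:

```
src/
  main.joule     // contains: mod lexer; mod parser;
  lexer.joule    // resolved for `mod lexer;`
  parser/
    mod.joule    // resolved for `mod parser;` (parser has sub-modules)
```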
Public Module Re-exports
pub mod utils; // re-exports utils module to parent
Inline Modules
Modules can be defined inline within a file:
mod helpers {
pub fn clamp(x: i32, lo: i32, hi: i32) -> i32 {
if x < lo { lo } else if x > hi { hi } else { x }
}
}
// Use items from inline module
let clamped = helpers::clamp(value, 0, 100);
Visibility
pub mod public_module { }
mod private_module { }
Use Declarations
Import items into scope:
// Import specific items
use crate::ast::{File, AstItem, Visibility};
// Import all items from a module
use crate::prelude::*;
// Standard library imports
use std::collections::HashMap;
use std::math::*;
// Import with alias
use crate::ast::File as AstFile;
Stdlib Path
The --stdlib-path CLI flag specifies the location of the standard library. The builtin registry includes modules for math, statistics, and compute:
use std::math::*; // sin, cos, sqrt, etc.
use std::statistics::*; // mean, median, std_dev
Let Statements
Variable bindings:
let x = 42; // immutable, type inferred
let y: f64 = 3.14; // immutable, explicit type
let mut z = 0; // mutable
let (a, b) = (1, 2); // destructuring
Patterns
Patterns are used in match expressions, let bindings, and function parameters to destructure values.
Literal Patterns
match x {
0 => "zero",
1 => "one",
_ => "other",
}
Identifier Patterns
Bind the matched value to a name:
match value {
x => println!("Got: {}", x),
}
Wildcard Pattern
_ matches any value and discards it:
match pair {
(x, _) => println!("First: {}", x),
}
Enum Variant Patterns
Tuple Variants
match option {
Option::Some(value) => use_value(value),
Option::None => handle_empty(),
}
Named Field Variants
match shape {
Shape::Circle { radius } => 3.14159 * radius * radius,
Shape::Rectangle { width, height } => width * height,
Shape::Point => 0.0,
}
Nested Patterns
match result {
Result::Ok(Option::Some(value)) => use_value(value),
Result::Ok(Option::None) => handle_none(),
Result::Err(e) => handle_error(e),
}
Struct Patterns
let Point { x, y } = point;
In match:
match token {
Token { kind: TokenKind::Fn, span } => parse_function(span),
Token { kind: TokenKind::Struct, span } => parse_struct(span),
_ => parse_expression(),
}
Tuple Patterns
let (a, b) = (1, 2);
match pair {
(0, 0) => "origin",
(x, 0) => "x-axis",
(0, y) => "y-axis",
(x, y) => "other",
}
Reference Patterns
match &value {
&Option::Some(x) => use_value(x),
&Option::None => handle_none(),
}
Or Patterns
Match multiple alternatives in a single arm using |:
match x {
1 | 2 | 3 => "small",
4 | 5 | 6 => "medium",
_ => "large",
}
Or patterns work with enum variants:
match direction {
Direction::North | Direction::South => "vertical",
Direction::East | Direction::West => "horizontal",
}
They also work with nested patterns:
match result {
Result::Ok(1 | 2 | 3) => "small success",
Result::Ok(_) => "other success",
Result::Err(_) => "failure",
}
Range Patterns
Match a contiguous range of values using ..= (inclusive):
match score {
0..=59 => "F",
60..=69 => "D",
70..=79 => "C",
80..=89 => "B",
90..=100 => "A",
_ => "invalid",
}
Range patterns work with integer types:
match byte {
0x00..=0x1F => "control character",
0x20..=0x7E => "printable ASCII",
0x7F => "delete",
_ => "extended",
}
And with characters:
match c {
'a'..='z' => "lowercase",
'A'..='Z' => "uppercase",
'0'..='9' => "digit",
_ => "other",
}
Guard Clauses
Add a boolean condition to a match arm with if:
match value {
x if x > 100 => "large",
x if x > 0 => "positive",
x if x < 0 => "negative",
_ => "zero",
}
Guards can reference variables bound in the pattern:
match point {
Point { x, y } if x == y => "on diagonal",
Point { x, y } if x == 0 => "on y-axis",
Point { x, y } if y == 0 => "on x-axis",
_ => "general",
}
Guards combine with or patterns:
match value {
1 | 2 | 3 if verbose => {
println!("small value: {}", value);
"small"
}
_ => "other",
}
Guard Evaluation
- The guard expression is evaluated only if the structural pattern matches
- Guards do not affect exhaustiveness checking -- the compiler still requires all variants to be covered
- If the guard evaluates to false, matching continues to the next arm
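For example, a value that fails a guard falls through to the next arm:

```joule
match n {
    x if x > 10 => "large",
    x => "ten or less", // reached when the guard above evaluates to false
}
```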
Exhaustiveness
The compiler verifies that match expressions cover all possible cases. Omitting a variant produces a compile-time error:
error: non-exhaustive match
--> program.joule:10:5
|
10 | match color {
| ^^^^^ missing variants: Blue
Use _ as a catch-all when you don't need to handle every variant explicitly.
Attributes
Attributes are metadata attached to items (functions, structs, enums) that modify their behavior or provide information to the compiler.
Syntax
Attributes are placed above the item they annotate, prefixed with #[...]:
#[attribute_name]
fn function() { }
#[attribute_name(key = value)]
fn function_with_args() { }
Energy Budget
The primary attribute in Joule. Declares the maximum energy a function is allowed to consume:
#[energy_budget(max_joules = 0.0001)]
fn efficient_add(x: i32, y: i32) -> i32 {
x + y
}
Parameters
| Parameter | Type | Description |
|---|---|---|
| max_joules | f64 | Maximum energy in joules |
| max_watts | f64 | Maximum average power in watts |
| max_temp_delta | f64 | Maximum temperature rise in degrees Celsius |
Multiple parameters can be combined:
#[energy_budget(max_joules = 0.001, max_watts = 20.0, max_temp_delta = 3.0)]
fn critical_path() { }
See Energy System Guide for details.
Thermal Awareness
Marks a function as thermal-aware. The compiler may insert thermal throttling checks:
#[thermal_aware]
fn heavy_compute(data: Vec<f64>) -> f64 {
// ...
}
Test
Marks a function as a test. Test functions are collected and executed when the compiler runs with --test:
#[test]
fn test_addition() {
assert_eq!(add(2, 3), 5);
}
#[test]
fn test_sort_correctness() {
let data = vec![5, 3, 1, 4, 2];
let sorted = sort(data);
assert_eq!(sorted[0], 1);
assert_eq!(sorted[4], 5);
}
Test Energy Reporting
Every test run includes energy consumption data. The test runner reports:
- Pass/fail status
- Energy consumed by each test (in joules)
- Total energy across all tests
joulec program.joule --test
Output:
running 3 tests
test test_addition ... ok (0.000012 J)
test test_sort_correctness ... ok (0.000089 J)
test test_fibonacci ... ok (0.000341 J)
test result: ok. 3 passed; 0 failed
total energy: 0.000442 J
Bench
Marks a function as a benchmark. Benchmark functions are collected and executed when the compiler runs with --bench:
#[bench]
fn bench_matrix_multiply() {
let a = Matrix::random(100, 100);
let b = Matrix::random(100, 100);
let _ = a.multiply(b);
}
#[bench]
fn bench_sort_large() {
let data = generate_random_vec(10000);
let _ = sort(data);
}
Bench Energy Reporting
Benchmarks report timing and energy data over multiple iterations:
joulec program.joule --bench
Output:
running 2 benchmarks
bench bench_matrix_multiply ... 1,234 ns/iter (+/- 56) | 0.00185 J/iter
bench bench_sort_large ... 892 ns/iter (+/- 23) | 0.00134 J/iter
total energy: 3.19 J (1000 iterations each)
Derive
Automatically implement traits for a type:
#[derive(Clone, Debug)]
pub struct Point {
pub x: f64,
pub y: f64,
}
Available Derive Traits
| Trait | Description |
|---|---|
| Clone | Value can be duplicated |
| Debug | Debug string representation |
| Eq | Equality comparison |
| Serialize | Serialization support |
GPU Kernel
Marks a function for GPU execution (requires MLIR backend):
#[gpu_kernel]
fn vector_add(a: Vec<f32>, b: Vec<f32>) -> Vec<f32> {
// ...
}
Visibility
While not strictly an attribute, visibility modifiers control access:
pub fn public_function() { } // visible everywhere
pub(crate) fn crate_function() { } // visible within the crate
fn private_function() { } // module-private (default)
Memory Model
Joule uses an ownership-based memory model inspired by Rust. Memory is managed at compile time with no garbage collector.
Ownership
Every value has exactly one owner. When the owner goes out of scope, the value is dropped (memory freed):
fn example() {
let s = String::from("hello"); // s owns the string
process(s); // ownership moves to process()
// s is no longer valid here
}
Move Semantics
Assignment and function calls transfer ownership by default:
let a = Vec::new();
let b = a; // a is moved to b
// a is no longer valid
References
References borrow a value without taking ownership:
Immutable References
fn inspect(data: &Vec<i32>) {
let len = data.len();
// data is borrowed, not consumed
}
let v = Vec::new();
inspect(&v); // borrow v
// v is still valid here
Multiple immutable references can coexist:
let r1 = &v;
let r2 = &v; // ok: multiple immutable borrows
Mutable References
fn modify(data: &mut Vec<i32>) {
data.push(42);
}
let mut v = Vec::new();
modify(&mut v); // mutable borrow
Only one mutable reference can exist at a time:
let r1 = &mut v;
// let r2 = &mut v; // error: cannot borrow mutably twice
Borrowing Rules
The borrow checker enforces these rules at compile time:
- At any given time, you can have either:
- One mutable reference, OR
- Any number of immutable references
- References must always be valid -- no dangling references
- No mutable aliasing -- if a mutable reference exists, no other references to the same data can exist
Lifetimes
Lifetimes ensure references don't outlive the data they point to (planned):
fn first_word<'a>(s: &'a str) -> &'a str {
// The returned reference lives as long as the input
s.split(" ").next().unwrap_or("")
}
Box
Heap allocation with single ownership:
let boxed = Box::new(42); // allocate on the heap
let value = *boxed; // dereference
// Required for recursive types
pub enum List<T> {
Cons(T, Box<List<T>>),
Nil,
}
Box auto-derefs for field access:
let expr = Box::new(Expr { kind: ExprKind::Literal(42), span: Span::dummy() });
let kind = expr.kind; // auto-deref through Box
Raw Pointers
For unsafe, low-level memory access:
unsafe {
let ptr: *mut i32 = addr as *mut i32;
*ptr = 42;
}
Raw pointers bypass the borrow checker. Use only when necessary and always within unsafe blocks.
Stack vs. Heap
| Allocation | When | Performance |
|---|---|---|
| Stack | Local variables, small types | Fast (pointer bump) |
| Heap (Box) | Recursive types, large data, dynamic size | Slower (allocator call) |
| Heap (Vec) | Dynamic arrays | Amortized fast |
The compiler places values on the stack by default. Use Box<T> to explicitly heap-allocate.
Concurrency
Joule provides structured concurrency primitives for safe parallel execution.
Async/Await
Functions that perform asynchronous operations are marked async:
async fn fetch_data(url: String) -> Result<String, Error> {
let response = http::get(url).await?;
Result::Ok(response.body())
}
The await keyword suspends execution until the asynchronous operation completes.
Async Energy Tracking
Async operations are fully energy-tracked. The compiler inserts timing wrappers around Spawn, TaskAwait, TaskGroupEnter, and TaskGroupExit operations to measure the energy consumed by asynchronous work:
#[energy_budget(max_joules = 0.005)]
async fn process_pipeline(urls: Vec<String>) -> Vec<Data> {
let mut results: Vec<Data> = Vec::new();
for url in urls {
let data = fetch_data(url).await; // energy tracked
results.push(data);
}
results
}
Desugaring
Async functions are desugared to state machines backed by Task types. The await keyword becomes a yield point that checks for task completion and records energy consumed during the suspension.
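Conceptually, the desugaring produces something like the following. This is an illustrative sketch only: the state enum and its names are internal compiler artifacts, not surface syntax, and are invented here for exposition:

```joule
// async fn fetch_data(url: String) -> Result<String, Error>
// desugars to roughly:
enum FetchDataState {
    Start(String),                // entry state, holding `url`
    AwaitingGet(Task<Response>),  // suspended at the .await yield point
    Done,
}

// Each resume checks whether the awaited task has completed and
// records the energy consumed during the suspension.
```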
Spawn
Launch a concurrent task:
use std::concurrency::spawn;
let handle = spawn(|| {
heavy_computation()
});
let result = handle.join();
Task Pool
Under the hood, spawn submits work to a pthread-based task pool with 256 task slots. Tasks are distributed across worker threads and managed by the runtime:
- Worker threads are pre-allocated (one per CPU core)
- Tasks are stored in a fixed-size array (256 slots)
- Task submission is lock-free on the fast path
- Energy consumption is tracked per-task with thread-safe atomic counters
Channels
Send values between tasks using bounded channels:
use std::concurrency::{channel, Sender, Receiver};
let (tx, rx) = channel(capacity: 100);
spawn(|| {
for i in 0..1000 {
tx.send(i); // blocks when buffer is full
}
});
let value = rx.recv(); // blocks when buffer is empty
Bounded Channel Implementation
Channels are implemented as ring buffers protected by mutex/condvar pairs:
- Capacity: Specified at creation time, provides backpressure
- Blocking: send() blocks when the buffer is full; recv() blocks when empty
- Thread Safety: Mutex protects the ring buffer; condvars signal producers and consumers
- Energy: Channel operations are energy-tracked -- both send and receive costs are attributed to the calling task
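The ring-buffer-plus-condvar design described above can be modeled in a short Python sketch. This is an illustrative analogue of the documented behavior, not Joule's actual runtime; class and method names mirror the channel API for clarity:

```python
import threading

class BoundedChannel:
    """Ring buffer guarded by a mutex, with condvars for backpressure."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0      # next slot to read
        self.count = 0     # items currently buffered
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def send(self, value):
        with self.not_full:
            while self.count == self.capacity:   # block when full
                self.not_full.wait()
            self.buf[(self.head + self.count) % self.capacity] = value
            self.count += 1
            self.not_empty.notify()              # wake one consumer

    def recv(self):
        with self.not_empty:
            while self.count == 0:               # block when empty
                self.not_empty.wait()
            value = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            self.not_full.notify()               # wake one producer
            return value
```

A producer that outruns a capacity-1 channel blocks in send() until the consumer drains a slot, which is exactly the backpressure the bounded channel provides.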
Unbounded Channels
For cases where backpressure is not needed:
let (tx, rx) = channel(); // unbounded (grows as needed)
Task Groups
Structured concurrency with automatic cancellation:
use std::concurrency::TaskGroup;
let group = TaskGroup::new();
group.spawn(|| process_chunk_1());
group.spawn(|| process_chunk_2());
group.spawn(|| process_chunk_3());
let results = group.join_all(); // waits for all tasks
If any task panics, the group cancels all remaining tasks. Energy consumption is aggregated across all tasks in the group.
Parallel For
Parallel iteration distributes work across threads automatically:
let results = parallel for item in data {
heavy_computation(item)
};
With explicit chunk size:
let processed = parallel(chunk_size: 1024) for row in matrix {
transform(row)
};
The compiler sums energy consumption across all parallel branches for budget enforcement.
Mutex
Mutual exclusion for shared state:
use std::concurrency::Mutex;
let counter = Mutex::new(0);
// In a concurrent task:
let mut guard = counter.lock();
*guard = *guard + 1;
// guard dropped here, lock released
Atomic Types
Lock-free primitives for simple shared state:
use std::concurrency::AtomicI32;
let counter = AtomicI32::new(0);
counter.fetch_add(1);
let value = counter.load();
Supervisors
Supervisors manage task lifecycles with restart strategies:
use std::concurrency::Supervisor;
let sup = Supervisor::new(RestartStrategy::OneForOne);
sup.spawn("worker", || {
process_queue()
});
sup.run();
Restart strategies:
| Strategy | Behavior |
|---|---|
| OneForOne | Only the failed task is restarted |
| OneForAll | All tasks are restarted when one fails |
| RestForOne | The failed task and all tasks started after it are restarted |
Safety Guarantees
The ownership system prevents data races at compile time:
- Shared state must be wrapped in Mutex, Atomic, or other synchronization primitives
- The borrow checker ensures no mutable aliasing across tasks
- Task groups provide structured lifetimes for spawned work
- Channels provide safe, typed communication between tasks
- Energy tracking is thread-safe using atomic counters
Energy System Specification
This is the formal specification for Joule's compile-time energy verification system.
Overview
The energy system consists of:
- Energy budget attributes -- Programmer-declared constraints on function energy consumption
- Energy estimator -- Static analysis that estimates energy from HIR
- Energy cost model -- Calibrated per-instruction energy costs
- Energy IR (EIR) -- Intermediate representation with picojoule cost annotations
- Accelerator energy -- Runtime measurement for GPUs and other accelerators
- Diagnostics -- Error messages when budgets are violated
Attribute Syntax
#[energy_budget( budget_param { , budget_param } )]
Where budget_param is one of:
| Parameter | Type | Unit | Description |
|---|---|---|---|
| max_joules | f64 | joules | Maximum total energy |
| max_watts | f64 | watts | Maximum average power |
| max_temp_delta | f64 | celsius | Maximum temperature rise |
Estimation Model
Instruction Costs
The cost model assigns picojoule costs to each instruction type. Costs are calibrated against real hardware measurements:
| Instruction | Base Cost (pJ) | Thermal Scaling |
|---|---|---|
| IntAdd | 0.05 | Linear |
| IntSub | 0.05 | Linear |
| IntMul | 0.35 | Linear |
| IntDiv | 3.5 | Linear |
| IntRem | 3.5 | Linear |
| FloatAdd | 0.35 | Quadratic |
| FloatSub | 0.35 | Quadratic |
| FloatMul | 0.35 | Quadratic |
| FloatDiv | 3.5 | Quadratic |
| FloatSqrt | 5.25 | Quadratic |
| MemLoadL1 | 0.5 | Linear |
| MemLoadL2 | 3.0 | Linear |
| MemLoadL3 | 10.0 | Linear |
| MemLoadDram | 200.0 | Linear |
| MemStoreDram | 200.0 | Linear |
| BranchTaken | 0.1 | None |
| BranchNotTaken | 0.1 | None |
| BranchMispredicted | 1.5 | None |
| SimdF32x8Add | 1.5 | Quadratic |
| SimdF32x8Mul | 1.5 | Quadratic |
| SimdF32x8Div | 7.0 | Quadratic |
| SimdF32x8Fma | 2.0 | Quadratic |
Thermal Scaling
Actual cost = base_cost * thermal_factor, where thermal_factor depends on the thermal model:
- None: cost is constant regardless of temperature
- Linear: actual = base * (1.0 + 0.3 * thermal_state)
- Quadratic: actual = base * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state^2)
Default thermal state: 0.3 (nominal operating temperature).
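The scaling rules above can be checked numerically with a minimal Python sketch. The constants come directly from the formulas and the cost table; the function name is illustrative:

```python
def scaled_cost(base_pj, model, thermal_state=0.3):
    """Apply the thermal scaling model to a base instruction cost (pJ)."""
    if model == "None":
        return base_pj
    if model == "Linear":
        return base_pj * (1.0 + 0.3 * thermal_state)
    if model == "Quadratic":
        return base_pj * (1.0 + 0.3 * thermal_state + 0.1 * thermal_state ** 2)
    raise ValueError(f"unknown thermal model: {model}")

# IntAdd (Linear, 0.05 pJ) at the default thermal state of 0.3:
#   0.05 * (1.0 + 0.09) = 0.0545 pJ
int_add = scaled_cost(0.05, "Linear")

# FloatAdd (Quadratic, 0.35 pJ):
#   0.35 * (1.0 + 0.09 + 0.009) = 0.38465 pJ
float_add = scaled_cost(0.35, "Quadratic")
```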
Expression Costs
| Expression | Cost |
|---|---|
| Literal | 0.01 pJ |
| Variable access | L1 load |
| Binary operation | left + right + op_cost |
| Unary operation | inner + op_cost |
| Function call | args + branch + 2x L1 (stack) |
| Method call | receiver + args + branch + 3x L1 |
| Field access | inner + IntAdd + L1 |
| Index access | array + index + IntMul + IntAdd + branch (bounds) + L1 |
| Struct construction | fields + (field_count x L1) |
| Array construction | elements + (element_count x L1) |
Loop Estimation
- Known bounds: body_cost * iteration_count
- Unknown bounds: body_cost * default_iterations (100)
- Max iterations cap: 10,000
- PGO-refined: body_cost * actual_trip_count (from profile data)
Unknown-bound loops reduce confidence by 0.7x. PGO data restores confidence to 0.95x.
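The loop rules above amount to a small piece of arithmetic, sketched here in Python (function and parameter names are illustrative, not part of the compiler's API):

```python
def loop_cost(body_cost_pj, trip_count=None, pgo_count=None,
              default_iterations=100, max_iterations=10_000):
    """Estimate loop energy; returns (cost_pj, confidence_factor)."""
    if pgo_count is not None:
        # PGO-refined trip count restores confidence to 0.95x
        return body_cost_pj * min(pgo_count, max_iterations), 0.95
    if trip_count is not None:
        # Statically known bounds, capped at the max-iterations limit
        return body_cost_pj * min(trip_count, max_iterations), 1.0
    # Unknown bounds: assume the default count, reduce confidence by 0.7x
    return body_cost_pj * default_iterations, 0.7
```

For example, a 2 pJ body with a known trip count of 50 estimates to 100 pJ at full confidence, while the same body with unknown bounds estimates to 200 pJ at a 0.7 confidence factor.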
Branch Estimation
- if/else: condition + avg(then_cost, else_cost) + branch_cost
- match: scrutinee + avg(arm_costs) + (arm_count x branch_cost)
Branches reduce confidence by 0.9x (if/else) or 0.85x (match).
Confidence Score
Range: 0.0 to 1.0
- Straight-line code: 1.0
- Each if/else: multiply by 0.9
- Each match: multiply by 0.85
- Each unbounded loop: multiply by 0.7
- PGO-refined loop: multiply by 0.95
The confidence score is reported in diagnostics to help the programmer assess estimate reliability.
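Since the factors multiply, the overall score for a function body can be sketched directly from the counts of each construct (a simplified model; parameter names are illustrative):

```python
def confidence(n_if=0, n_match=0, n_unbounded_loops=0, n_pgo_loops=0):
    """Combine the per-construct confidence factors listed above.

    Straight-line code scores 1.0; a PGO-refined loop contributes
    0.95 instead of the 0.7 an unbounded loop would.
    """
    c = 1.0
    c *= 0.9 ** n_if
    c *= 0.85 ** n_match
    c *= 0.7 ** n_unbounded_loops
    c *= 0.95 ** n_pgo_loops
    return c

# One if/else plus one unbounded loop: 0.9 * 0.7 = 0.63
score = confidence(n_if=1, n_unbounded_loops=1)
```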
Energy IR (EIR)
The Energy IR is an intermediate representation where every node carries a picojoule cost annotation. It sits between HIR and MIR in the pipeline:
HIR -> EIR (with picojoule costs) -> E-Graph Optimizer -> MIR
EIR nodes include:
- EirExpr -- Expressions with energy costs
- EirStmt -- Statements with energy costs
- EirBody -- Function bodies with total energy and effect sets
Effect Sets
EIR tracks side effects using EffectSet:
- Pure (no effects)
- IO (reads/writes)
- Alloc (heap allocation)
- Panic (may abort)
The e-graph optimizer uses effect information to determine which rewrites are safe.
E-Graph Optimization
When --egraph-optimize is enabled, the EIR passes through an e-graph optimizer with 30+ algebraic rewrite rules:
- Arithmetic simplification (x + 0 -> x, x * 1 -> x)
- Constant folding
- Dead code elimination
- Common subexpression elimination
- Strength reduction (x * 2 -> x << 1)
- Energy-aware rewrites (prefer lower-energy equivalent operations)
Three-Tier Measurement
Tier 1: Static Estimation
Compile-time energy estimation using the instruction cost model. Available for all programs, no hardware access required.
Tier 2: CPU Performance Counters
Runtime measurement using hardware performance counters:
- Intel/AMD: RAPL (Running Average Power Limit) via perf_event or MSR
- Apple Silicon: powermetrics integration
Tier 3: Accelerator Energy
Runtime measurement using vendor-specific APIs:
| Vendor | API | Measurement |
|---|---|---|
| NVIDIA | NVML (nvmlDeviceGetTotalEnergyConsumption) | Board power, per-GPU |
| AMD | ROCm SMI (rsmi_dev_power_ave_get) | Average power, per-GPU |
| Intel | Level Zero (zesDeviceGetProperties + power domains) | Per-device power |
| Google | TPU Runtime | Per-chip power |
| AWS | Neuron SDK | Per-core power |
| Groq | HLML (hlmlDeviceGetTotalEnergyConsumption) | Board power |
| Cerebras | CS SDK | Wafer-scale power |
| SambaNova | DataScale API | Per-RDU power |
See Accelerator Energy Measurement for details.
Power Estimation
avg_pj_per_cycle = 0.15 (weighted average for mixed workloads)
estimated_cycles = total_pJ / avg_pj_per_cycle
execution_time = estimated_cycles / reference_frequency (3.0 GHz)
power_watts = energy_joules / execution_time
Thermal Estimation
thermal_resistance = 0.4 K/W (typical CPU with standard cooling)
temp_delta = power_watts * thermal_resistance
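The power and thermal formulas above compose as follows; this Python sketch just restates the documented constants and arithmetic:

```python
AVG_PJ_PER_CYCLE = 0.15       # weighted average for mixed workloads
REF_FREQ_HZ = 3.0e9           # reference frequency (3.0 GHz)
THERMAL_RESISTANCE = 0.4      # K/W, typical CPU with standard cooling

def power_and_temp(total_pj):
    """Derive average power and temperature rise from a total energy estimate."""
    cycles = total_pj / AVG_PJ_PER_CYCLE
    exec_time_s = cycles / REF_FREQ_HZ
    energy_j = total_pj * 1e-12
    power_w = energy_j / exec_time_s
    temp_delta_c = power_w * THERMAL_RESISTANCE
    return power_w, temp_delta_c
```

Note that because the execution time is itself derived from the energy total, these fixed constants always yield the same average power (0.45 mW); the runtime measurement tiers refine this with real timings.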
Transitive Energy Budgets
Energy budgets are enforced across call boundaries. When function A calls function B, the energy cost of B is included in A's total:
#[energy_budget(max_joules = 0.0001)]
fn helper() -> i32 { 42 }
#[energy_budget(max_joules = 0.0005)]
fn caller() -> i32 {
helper() + helper()
// Total includes 2x helper's energy + caller's own instructions
}
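The transitive accounting can be modeled as a walk over the call graph, summing each callee's total once per call site. This is a simplified, acyclic sketch with illustrative names, not the joule-callgraph implementation:

```python
def transitive_energy(fn, call_graph, own_cost):
    """Total energy of fn: its own instruction cost plus every callee's
    transitive total, counted once per call site (assumes no recursion)."""
    total = own_cost[fn]
    for callee in call_graph.get(fn, []):
        total += transitive_energy(callee, call_graph, own_cost)
    return total

# caller() invokes helper() twice; each call contributes helper's full total
call_graph = {"caller": ["helper", "helper"]}
own_cost = {"caller": 0.0001, "helper": 0.00005}
total = transitive_energy("caller", call_graph, own_cost)  # 0.0002 J
```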
The call graph analyzer (joule-callgraph) builds a complete energy call graph and identifies hotspots.
JSON Output
When JOULE_ENERGY_JSON=1 is set, energy reports are emitted as structured JSON:
{
"functions": [
{
"name": "process_data",
"file": "program.joule",
"line": 15,
"energy_joules": 0.00035,
"power_watts": 12.5,
"confidence": 0.85,
"budget_joules": 0.0001,
"status": "exceeded",
"breakdown": {
"compute_pj": 280000,
"memory_pj": 70000,
"branch_pj": 500
}
}
],
"total_energy_joules": 0.00042
}
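Since the report is plain JSON, downstream tooling can consume it with any JSON parser. A minimal Python sketch, using only field names shown in the sample above:

```python
import json

# A trimmed-down report in the format shown above
report = """{
  "functions": [
    {"name": "process_data", "energy_joules": 0.00035,
     "budget_joules": 0.0001, "status": "exceeded", "confidence": 0.85}
  ],
  "total_energy_joules": 0.00042
}"""

data = json.loads(report)
# Collect every function whose budget was exceeded
violations = [f["name"] for f in data["functions"] if f["status"] == "exceeded"]
```

This is how a CI step might fail a build on budget violations, for instance by exiting nonzero when the violations list is non-empty.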
Violation Diagnostics
When a budget is exceeded, the compiler emits an error:
error: energy budget exceeded in function 'name'
--> file.joule:line:col
|
| fn name(...) {
| ^^^^^^^^^^^^^^
|
= estimated: X.XXXXX J (confidence: NN%)
= budget: X.XXXXX J
= exceeded by NNN%
For power and thermal budgets, similar diagnostics are produced with the appropriate units.
Standard Library
Joule ships with 110+ batteries-included modules. No package manager needed for common tasks.
Core Types
These are the fundamental types used in every Joule program.
| Module | Description | Status |
|---|---|---|
| String | UTF-8 string type | Implemented |
| Vec | Dynamic array | Implemented |
| Option | Optional values | Implemented |
| Result | Error handling | Implemented |
| HashMap | Key-value maps | Implemented |
| Primitives | Numeric types, bool, char | Implemented |
Collections
| Module | Description | Status |
|---|---|---|
| collections | Overview of all collection types | Implemented |
| Vec<T> | Dynamic array | Implemented |
| HashMap<K,V> | Hash map | Implemented |
| HashSet<T> | Hash set | Implemented |
| BTreeMap<K,V> | Sorted map | Implemented |
| BTreeSet<T> | Sorted set | Implemented |
| LinkedList<T> | Doubly-linked list | Implemented |
| VecDeque<T> | Double-ended queue | Implemented |
| BinaryHeap<T> | Priority queue | Implemented |
Mathematics
| Module | Description | Status |
|---|---|---|
| math | Mathematical functions | Implemented |
| math::linear | Linear algebra | Implemented |
| math::complex | Complex numbers | Implemented |
| statistics | Statistical analysis | Implemented |
| montecarlo | Monte Carlo methods | Implemented |
I/O and Networking
| Module | Description | Status |
|---|---|---|
| io | File and stream I/O | Implemented |
| net | TCP/UDP networking | Implemented |
| json | JSON parsing and serialization | Implemented |
| csv | CSV parsing | Implemented |
| toml | TOML parsing | Implemented |
| yaml | YAML parsing | Implemented |
Databases
| Module | Description | Status |
|---|---|---|
| db-sqlite | SQLite | Implemented |
| db-postgres | PostgreSQL | Implemented |
| db-mysql | MySQL | Implemented |
| db-redis | Redis | Implemented |
| db-mongodb | MongoDB | Implemented |
| ...and 30+ more | See stdlib/db-* | Implemented |
Scientific Computing
| Module | Description | Status |
|---|---|---|
| ode | Ordinary differential equations | Implemented |
| pde | Partial differential equations | Implemented |
| dsp | Digital signal processing | Implemented |
| physics | Physics simulation | Implemented |
| bio | Bioinformatics | Implemented |
| chem | Chemistry | Implemented |
Machine Learning and AI
| Module | Description | Status |
|---|---|---|
| ml | Machine learning | Implemented |
| snn | Spiking neural networks | Implemented |
| agent | AI agent framework | Implemented |
Cryptography and Security
| Module | Description | Status |
|---|---|---|
| crypto | Cryptographic primitives | Implemented |
| security | Security analysis | Implemented |
| zkp | Zero-knowledge proofs | Implemented |
| fhe | Fully homomorphic encryption | Implemented |
Graphics and Visualization
| Module | Description | Status |
|---|---|---|
| graphics | 2D/3D graphics | Implemented |
| image | Image processing | Implemented |
| viz | Data visualization | Implemented |
| plot | Plotting | Implemented |
Concurrency
| Module | Description | Status |
|---|---|---|
| concurrency | Concurrency primitives | Implemented |
| distributed | Distributed computing | Implemented |
Energy
| Module | Description | Status |
|---|---|---|
| energy | Energy measurement APIs | Implemented |
Platform
| Module | Description | Status |
|---|---|---|
| wasm | WebAssembly support | Implemented |
| embedded | Embedded systems | Implemented |
| mobile | Mobile development | Implemented |
| desktop | Desktop applications | Implemented |
Interoperability
| Module | Description | Status |
|---|---|---|
| rust_interop | Rust FFI | Implemented |
| python | Python interop | Implemented |
| go_interop | Go interop | Implemented |
| typescript_interop | TypeScript interop | Implemented |
For the complete list, see the stdlib/ directory in the distribution.
String
The String type is a heap-allocated, growable UTF-8 string.
Construction
let s = "Hello, world!"; // string literal
let empty = String::new(); // empty string
let from_chars = String::from("hello");
Operations
Length
let len = s.len(); // byte length
let empty = s.is_empty();
Concatenation
let greeting = "Hello, " + name;
let full = first + " " + last;
Comparison
if s == "hello" {
// string equality
}
Substring and Indexing
let first_byte = s[0]; // byte at index (u8)
let sub = s.substring(0, 5); // substring by byte range
Conversion
let n: i32 = 42;
let s = n.to_string(); // "42"
let x: f64 = 3.14;
let s = x.to_string(); // "3.14"
Search
let found = s.contains("world");
let pos = s.find("world"); // Option<usize>
let starts = s.starts_with("Hello");
let ends = s.ends_with("!");
Transformation
let upper = s.to_uppercase();
let lower = s.to_lowercase();
let trimmed = s.trim();
Split
let parts = s.split(","); // Vec<String>
let lines = s.split("\n");
Memory Layout
String {
data: *mut u8, // pointer to UTF-8 bytes
len: usize, // byte length
capacity: usize, // allocated capacity
}
Strings are heap-allocated and own their data. When a String is dropped, its memory is freed.
Vec<T>
A contiguous, growable array type. The most commonly used collection in Joule.
Construction
let mut v: Vec<i32> = Vec::new(); // empty vector
Adding Elements
v.push(1);
v.push(2);
v.push(3);
Accessing Elements
let first = v[0]; // indexing (panics if out of bounds)
let len = v.len(); // number of elements
let empty = v.is_empty(); // true if len == 0
Iteration
for item in v {
process(item);
}
Removing Elements
let last = v.pop(); // Option<T> -- removes and returns last element
Common Patterns
Collecting Results
let mut results: Vec<i32> = Vec::new();
let mut i = 0;
while i < 10 {
results.push(i * i);
i = i + 1;
}
As a Stack
let mut stack: Vec<i32> = Vec::new();
stack.push(1); // push
stack.push(2);
let top = stack.pop(); // pop -- Option::Some(2)
Memory Layout
Vec<T> {
data: *mut T, // pointer to heap allocation
len: usize, // number of elements
capacity: usize, // allocated capacity
}
Vec grows automatically when elements are added beyond the current capacity. Growth is amortized O(1).
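The amortized-O(1) claim can be illustrated with a sketch of a doubling growth policy. Doubling is a common scheme; the exact growth factor and initial capacity Joule uses are not specified here, so these are assumptions:

```python
def grow_trace(n_pushes, initial_capacity=4):
    """Simulate pushes under a capacity-doubling policy.

    Returns (final_capacity, total_elements_copied): each reallocation
    copies every existing element into the new, larger buffer.
    """
    capacity, length, copies = initial_capacity, 0, 0
    for _ in range(n_pushes):
        if length == capacity:
            copies += length        # reallocation copies all elements
            capacity *= 2
        length += 1
    return capacity, copies

capacity, copies = grow_trace(100)
# Total copies stay below 2 * n_pushes, so the per-push cost is amortized O(1)
```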
Option<T>
Represents a value that may or may not be present. Joule's alternative to null pointers.
Variants
pub enum Option<T> {
Some(T), // a value is present
None, // no value
}
Construction
let some: Option<i32> = Option::Some(42);
let none: Option<i32> = Option::None;
Pattern Matching
The primary way to use an Option:
match value {
Option::Some(x) => {
// use x
println!("Got: {}", x);
}
Option::None => {
println!("Nothing");
}
}
Common Methods
Checking
let has_value = opt.is_some(); // bool
let is_empty = opt.is_none(); // bool
Unwrapping
let value = opt.unwrap(); // panics if None
let value = opt.unwrap_or(default); // returns default if None
Common Patterns
Lookup That May Fail
fn find(items: Vec<i32>, target: i32) -> Option<usize> {
let mut i = 0;
while i < items.len() {
if items[i] == target {
return Option::Some(i);
}
i = i + 1;
}
Option::None
}
Optional Fields
pub struct User {
pub name: String,
pub email: Option<String>,
}
Memory Layout
Option<T> {
is_some: bool, // discriminant
value: T, // the value (undefined when is_some == false)
}
Result<T, E>
Represents an operation that can succeed with a value of type T or fail with an error of type E.
Variants
pub enum Result<T, E> {
Ok(T), // success
Err(E), // failure
}
Construction
let ok: Result<i32, String> = Result::Ok(42);
let err: Result<i32, String> = Result::Err("something went wrong");
Pattern Matching
match parse_number(input) {
Result::Ok(n) => {
println!("Parsed: {}", n);
}
Result::Err(e) => {
println!("Error: {}", e);
}
}
Common Methods
Checking
let succeeded = result.is_ok(); // bool
let failed = result.is_err(); // bool
Unwrapping
let value = result.unwrap(); // panics if Err
let value = result.unwrap_or(default); // returns default if Err
Common Patterns
Functions That Can Fail
fn parse_file(path: String) -> Result<Data, String> {
let content = read_file(path);
match content {
Result::Ok(text) => {
// parse text into Data
Result::Ok(data)
}
Result::Err(e) => {
Result::Err("Failed to read file: " + e)
}
}
}
Error Accumulation
fn parse_all(inputs: Vec<String>) -> Result<Vec<i32>, Vec<String>> {
let mut results: Vec<i32> = Vec::new();
let mut errors: Vec<String> = Vec::new();
for input in inputs {
match parse_number(input) {
Result::Ok(n) => results.push(n),
Result::Err(e) => errors.push(e),
}
}
if errors.is_empty() {
Result::Ok(results)
} else {
Result::Err(errors)
}
}
Memory Layout
Result<T, E> {
is_ok: bool, // discriminant
union {
ok: T, // success value
err: E, // error value
}
}
HashMap<K, V>
A hash map (dictionary) that stores key-value pairs with O(1) average lookup.
Construction
let mut map: HashMap<String, i32> = HashMap::new();
Insertion
map.insert("alice", 42);
map.insert("bob", 17);
map.insert("carol", 99);
Lookup
let value = map.get("alice"); // Option<i32>
match map.get("alice") {
Option::Some(v) => println!("Found: {}", v),
Option::None => println!("Not found"),
}
Checking Membership
let exists = map.contains_key("alice"); // bool
Removal
let removed = map.remove("bob"); // Option<i32>
Size
let count = map.len(); // number of entries
let empty = map.is_empty(); // true if len == 0
Iteration
for (key, value) in map {
println!("{}: {}", key, value);
}
Common Patterns
Word Counter
fn count_words(text: String) -> HashMap<String, i32> {
let mut counts: HashMap<String, i32> = HashMap::new();
let words = text.split(" ");
for word in words {
let current = counts.get(word).unwrap_or(0);
counts.insert(word, current + 1);
}
counts
}
Configuration Store
pub struct Config {
values: HashMap<String, String>,
}
impl Config {
pub fn get(self, key: String) -> Option<String> {
self.values.get(key)
}
pub fn set(mut self, key: String, value: String) {
self.values.insert(key, value);
}
}
Primitive Types
Integer Types
Signed Integers
| Type | Size | Min | Max |
|---|---|---|---|
| i8 | 8-bit | -128 | 127 |
| i16 | 16-bit | -32,768 | 32,767 |
| i32 | 32-bit | -2,147,483,648 | 2,147,483,647 |
| i64 | 64-bit | -9.2 * 10^18 | 9.2 * 10^18 |
| isize | pointer-sized | Platform dependent | Platform dependent |
Unsigned Integers
| Type | Size | Min | Max |
|---|---|---|---|
| u8 | 8-bit | 0 | 255 |
| u16 | 16-bit | 0 | 65,535 |
| u32 | 32-bit | 0 | 4,294,967,295 |
| u64 | 64-bit | 0 | 1.8 * 10^19 |
| usize | pointer-sized | 0 | Platform dependent |
Integer Methods
let x: i32 = 42;
let s = x.to_string(); // "42"
let abs = x.abs(); // absolute value
let min = x.min(10); // minimum of two values
let max = x.max(100); // maximum of two values
Integer Literals
let dec = 42; // decimal
let hex = 0xFF; // hexadecimal
let oct = 0o77; // octal
let bin = 0b1010; // binary
let with_sep = 1_000_000; // underscore separator
let typed = 42u8; // type suffix
Floating-Point Types
| Type | Size | Precision | Range | Energy |
|---|---|---|---|---|
| f16 | 16-bit | ~3 digits | ~6.1 * 10^-5 to 65504 | 0.4 pJ |
| bf16 | 16-bit | ~3 digits | ~1.2 * 10^-38 to ~3.4 * 10^38 | 0.4 pJ |
| f32 | 32-bit | ~7 digits | ~1.2 * 10^-38 to ~3.4 * 10^38 | 0.35 pJ |
| f64 | 64-bit | ~15 digits | ~2.2 * 10^-308 to ~1.8 * 10^308 | 0.35 pJ |
Half-Precision Types
f16 is IEEE 754 half-precision — useful for signal processing and inference where memory bandwidth matters more than precision.
bf16 (Brain Float) has the same exponent range as f32 but only 8 mantissa bits. Designed for ML training where gradients don't need full precision. Used natively on Google TPUs, NVIDIA A100+, and Apple Neural Engine.
let weight: f16 = 0.5f16;
let grad: bf16 = 0.001bf16;
// Convert to/from f32
let full: f32 = weight as f32;
let half: f16 = full as f16;
Float Methods
let x: f64 = 3.14;
let s = x.to_string(); // "3.14"
let abs = x.abs(); // absolute value
let sqrt = x.sqrt(); // square root
let floor = x.floor(); // round down
let ceil = x.ceil(); // round up
let round = x.round(); // round to nearest
Float Literals
let a = 3.14; // f64 (default)
let b = 3.14f32; // f32
let c = 1.0e10; // scientific notation
let d = 2.5e-3; // 0.0025
Boolean
let t: bool = true;
let f: bool = false;
Boolean Operations
let and = a && b; // logical AND (short-circuit)
let or = a || b; // logical OR (short-circuit)
let not = !a; // logical NOT
Character
A Unicode scalar value (4 bytes):
let c: char = 'A';
let emoji: char = '\u{1F600}';
let newline: char = '\n';
Unit Type
The unit type () represents no meaningful value:
fn do_something() {
// implicitly returns ()
}
let unit: () = ();
Type Conversions
Use as for numeric conversions:
let i: i32 = 42;
let f: f64 = i as f64; // 42.0
let u: u8 = i as u8; // 42 (truncates if > 255)
let s: usize = i as usize; // 42
Conversions are explicit -- Joule does not implicitly convert between numeric types.
Collections
Joule provides a comprehensive set of collection types in the standard library.
Overview
| Type | Description | Ordered | Unique Keys | Use Case |
|---|---|---|---|---|
| Vec<T> | Dynamic array | Yes (insertion) | No | General-purpose sequence |
| HashMap<K,V> | Hash table | No | Yes | Key-value lookup |
| HashSet<T> | Hash set | No | Yes | Unique element set |
| BTreeMap<K,V> | Sorted map | Yes (key order) | Yes | Ordered key-value lookup |
| BTreeSet<T> | Sorted set | Yes (value order) | Yes | Ordered unique elements |
| VecDeque<T> | Ring buffer | Yes (insertion) | No | Queue / double-ended queue |
| LinkedList<T> | Doubly-linked list | Yes (insertion) | No | Frequent middle insertion/removal |
| BinaryHeap<T> | Max-heap | By priority | No | Priority queue |
Vec<T>
See Vec for full documentation.
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
let first = v[0];
HashMap<K, V>
See HashMap for full documentation.
let mut map: HashMap<String, i32> = HashMap::new();
map.insert("key", 42);
let val = map.get("key");
HashSet<T>
An unordered set of unique elements.
let mut set: HashSet<i32> = HashSet::new();
set.insert(1);
set.insert(2);
set.insert(1); // no effect, already present
let has = set.contains(1); // true
let count = set.len(); // 2
BTreeMap<K, V>
A sorted map. Keys are kept in sorted order.
let mut map: BTreeMap<String, i32> = BTreeMap::new();
map.insert("banana", 2);
map.insert("apple", 1);
map.insert("cherry", 3);
// Iterates in key order: apple, banana, cherry
for (key, value) in map {
println!("{}: {}", key, value);
}
BTreeSet<T>
A sorted set of unique elements.
let mut set: BTreeSet<i32> = BTreeSet::new();
set.insert(3);
set.insert(1);
set.insert(2);
// Iterates in order: 1, 2, 3
for item in set {
println!("{}", item);
}
VecDeque<T>
A double-ended queue implemented as a ring buffer.
let mut deque: VecDeque<i32> = VecDeque::new();
deque.push_back(1);
deque.push_back(2);
deque.push_front(0);
let front = deque.pop_front(); // Option::Some(0)
let back = deque.pop_back(); // Option::Some(2)
BinaryHeap<T>
A max-heap (priority queue). The largest element is always at the top.
let mut heap: BinaryHeap<i32> = BinaryHeap::new();
heap.push(3);
heap.push(1);
heap.push(4);
heap.push(1);
heap.push(5);
let max = heap.pop(); // Option::Some(5)
let next = heap.pop(); // Option::Some(4)
SmallVec[T; N]
A vector that stores up to N elements inline (on the stack), spilling to the heap only when the capacity is exceeded. Ideal for short, bounded collections where heap allocation is wasteful.
let mut sv: SmallVec[i32; 8] = SmallVec::new();
// First 8 elements are stored inline — no heap allocation
for i in 0..8 {
sv.push(i); // 0.5 pJ per push (inline)
}
// 9th element triggers heap spill — 45.0 pJ
sv.push(99);
sv.len(); // 9
sv.capacity(); // 16 (heap capacity after spill)
sv.spilled(); // true
sv.get(0); // 0
sv.pop(); // 99
sv.clear();
sv.drop(); // free heap if spilled
Energy trade-off: Inline pushes cost 0.5 pJ vs ~45 pJ for heap spill. Size N so that most instances never spill.
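That trade-off can be made concrete with a rough per-instance model in Python. This sketch uses only the two figures quoted above and, as a simplifying assumption, charges post-spill pushes at the inline rate:

```python
INLINE_PUSH_PJ = 0.5   # per-push cost while stored inline
SPILL_PJ = 45.0        # one-time cost of spilling to the heap

def push_energy(n_pushes, inline_capacity):
    """Model total push energy for a SmallVec-style container:
    every push costs the inline rate, plus one spill charge if the
    inline capacity is ever exceeded."""
    energy = n_pushes * INLINE_PUSH_PJ
    if n_pushes > inline_capacity:
        energy += SPILL_PJ
    return energy

# 8 pushes into an 8-slot inline buffer: 4.0 pJ, no spill
# 9 pushes: 4.5 pJ of pushes + 45.0 pJ spill = 49.5 pJ
```

A single spill costs roughly as much as 90 inline pushes under this model, which is why sizing N so most instances never spill pays off.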
Deque<T>
Double-ended queue implemented as a ring buffer. O(1) push/pop at both ends.
let mut dq: Deque<i32> = Deque::new();
dq.push_back(1);
dq.push_back(2);
dq.push_front(0);
let front = dq.pop_front(); // Option::Some(0)
let back = dq.pop_back(); // Option::Some(2)
dq.front(); // peek at front
dq.back(); // peek at back
dq.len(); // 1
dq.rotate_left(1); // rotate elements
Arena<T>
Bump allocator — allocates by advancing a pointer. Individual elements cannot be freed; call reset() to free everything at once in O(1). Ideal for phase-based allocation (parsers, compilers, frame allocators).
let mut arena: Arena<AstNode> = Arena::new();
// Allocation is a pointer bump — 1.0 pJ
let node1 = arena.alloc(AstNode { kind: "expr", children: Vec::new() });
let node2 = arena.alloc(AstNode { kind: "stmt", children: Vec::new() });
arena.len(); // 2 elements allocated
arena.bytes_used(); // bytes consumed
arena.bytes_capacity(); // total buffer size
// Free everything at once — 0.5 pJ regardless of count
arena.reset();
BitSet
Fixed-capacity bit field stored as u64 words. Space-efficient boolean set with O(1) insert/contains and fast set operations.
let mut bits = BitSet::new();
bits.insert(0);
bits.insert(42);
bits.insert(63);
bits.contains(42); // true
bits.remove(42);
bits.count_ones(); // number of set bits
bits.count_zeros(); // number of unset bits
// Set operations
let union = bits.union(&other);
let inter = bits.intersection(&other);
let diff = bits.difference(&other);
bits.is_subset(&other);
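The word-based representation behind this API is simple enough to model directly. A Python sketch (Python integers stand in for the u64 words; the real implementation uses fixed-width words and hardware popcount):

```python
class WordBitSet:
    """Fixed-capacity bit field stored as 64-bit words."""
    def __init__(self, capacity=256):
        self.words = [0] * ((capacity + 63) // 64)

    def insert(self, i):
        self.words[i // 64] |= 1 << (i % 64)     # set one bit, O(1)

    def remove(self, i):
        self.words[i // 64] &= ~(1 << (i % 64))  # clear one bit, O(1)

    def contains(self, i):
        return (self.words[i // 64] >> (i % 64)) & 1 == 1

    def count_ones(self):
        return sum(bin(w).count("1") for w in self.words)  # popcount per word

    def union(self, other):
        out = WordBitSet(len(self.words) * 64)
        # Set operations are word-at-a-time bitwise ops
        out.words = [a | b for a, b in zip(self.words, other.words)]
        return out
```

Because union, intersection, and difference reduce to one bitwise instruction per 64 elements, these sets are far cheaper than hash sets for dense small-integer domains.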
BitVec
Dynamic-length bit vector. Like BitSet but growable.
let mut bv = BitVec::new();
bv.push(true);
bv.push(false);
bv.push(true);
bv.get(1); // false
bv.set(1, true);
bv.len(); // 3 (bits)
bv.count_ones(); // 3
bv.pop(); // true
Choosing a Collection
- Need a sequence? Use Vec<T>
- Need a short, bounded sequence? Use SmallVec[T; N] (avoids heap allocation)
- Need fast key lookup? Use HashMap<K,V>
- Need unique elements? Use HashSet<T>
- Need sorted keys? Use BTreeMap<K,V> (12.0 pJ per traversal)
- Need a queue? Use Deque<T> (2.0 pJ push/pop)
- Need a priority queue? Use BinaryHeap<T>
- Need phase-based allocation? Use Arena<T> (1.0 pJ alloc, 0.5 pJ free-all)
- Need a compact boolean set? Use BitSet or BitVec (0.3 pJ per operation)
- Need frequent middle insertion? Use LinkedList<T> (rare)
Smart Pointers
Smart pointers manage ownership and sharing of heap-allocated data with automatic cleanup.
Overview
| Type | Thread-safe | Use Case | Energy Cost |
|---|---|---|---|
| Box<T> | N/A | Heap allocation, recursive types | Allocation only |
| Rc<T> | No | Single-threaded shared ownership | 3.0 pJ clone/drop |
| Arc<T> | Yes | Multi-threaded shared ownership | 3.0 pJ clone/drop (atomic) |
| Cow<T> | N/A | Clone-on-write optimization | Free reads, allocation on write |
Box<T>
Heap-allocated value. Required for recursive types. Box<T> is a pointer in memory — zero overhead beyond the allocation.
// Recursive type requires Box
pub enum Expr {
Literal(i32),
Add { left: Box<Expr>, right: Box<Expr> },
Neg { inner: Box<Expr> },
}
let expr = Expr::Add {
left: Box::new(Expr::Literal(1)),
right: Box::new(Expr::Literal(2)),
};
Methods
let b = Box::new(42);
let inner = b.into_inner(); // 42 — consumes the Box
let r: &i32 = b.as_ref(); // borrow the inner value
let r: &mut i32 = b.as_mut(); // mutable borrow
let ptr = b.leak(); // leak memory, return raw pointer
Rc<T>
Reference-counted pointer for single-threaded shared ownership. Multiple Rc<T> values can point to the same data. The data is freed when the last Rc is dropped.
let a = Rc::new(42);
let b = a.clone(); // increment reference count (3.0 pJ)
let c = a.clone(); // count is now 3
println!("{}", Rc::strong_count(&a)); // 3
// When a, b, c all go out of scope, the value is freed
Methods
let rc = Rc::new(vec![1, 2, 3]);
let count = Rc::strong_count(&rc); // number of references
let inner = Rc::into_inner(rc); // unwrap if count == 1
let r: &Vec<i32> = rc.as_ref(); // borrow inner value
// Mutable access (only if count == 1)
let mut rc = Rc::new(42);
if let Option::Some(val) = Rc::get_mut(&mut rc) {
*val = 100;
}
Use Case: Shared Graph Nodes
pub struct Node {
pub value: i32,
pub children: Vec<Rc<Node>>,
}
let leaf = Rc::new(Node { value: 1, children: Vec::new() });
let parent = Node {
value: 0,
children: vec![leaf.clone(), leaf.clone()], // shared ownership
};
Arc<T>
Atomically reference-counted pointer for multi-threaded shared ownership. Same API as Rc<T>, but uses atomic operations for thread safety.
use std::concurrency::spawn;
let data = Arc::new(vec![1, 2, 3, 4, 5]);
let handle = spawn(|| {
let local = data.clone(); // atomic increment (3.0 pJ)
println!("len = {}", local.len());
});
println!("len = {}", data.len()); // still valid in main thread
Methods
let arc = Arc::new(42);
let count = Arc::strong_count(&arc); // number of references
let r: &i32 = arc.as_ref(); // borrow inner value
let cloned = arc.clone(); // atomic increment
// Arc::get_mut — only if count == 1
// Arc::into_inner — unwrap if count == 1
// Arc::make_mut — clone inner if shared, then return &mut
Energy: Rc vs Arc
| Operation | Rc | Arc |
|---|---|---|
| clone | 3.0 pJ (increment) | 3.0 pJ (atomic increment) |
| drop | 3.0 pJ (decrement + conditional free) | 3.0 pJ (atomic decrement + conditional free) |
| as_ref | 0 pJ (pointer deref) | 0 pJ (pointer deref) |
Use Rc when data stays on one thread. Use Arc when sharing across threads. The energy cost is similar, but Arc incurs cache-line contention overhead under high concurrency.
Cow<T>
Clone-on-write smart pointer. Wraps either a borrowed reference or an owned value. Reading is free; writing clones the data only if it's currently borrowed.
// Start with a borrowed value
let text = Cow::borrowed("hello");
println!("{}", text.as_ref()); // free — no allocation
// Convert to owned only when needed
let owned = text.to_owned(); // allocates if borrowed
// Check state
text.is_borrowed(); // true
text.is_owned(); // false
Methods
let cow = Cow::borrowed("hello");
let cow2 = Cow::owned("world".to_string());
let r: &str = cow.as_ref(); // borrow — always free
let s: String = cow.into_owned(); // consume, clone if borrowed
let owned = cow.to_owned(); // clone if borrowed, return owned Cow
cow.is_borrowed(); // true if wrapping a reference
cow.is_owned(); // true if wrapping an owned value
Use Case: Conditional Transformation
fn normalize(input: &str) -> Cow<str> {
if input.contains(' ') {
// Only allocate when we actually need to modify
Cow::owned(input.replace(' ', "_"))
} else {
// No allocation — return a reference to the original
Cow::borrowed(input)
}
}
// Most inputs pass through without allocation
let a = normalize("hello"); // Cow::borrowed — 0 allocation
let b = normalize("hello world"); // Cow::owned — 1 allocation
Choosing a Smart Pointer
- Need heap allocation for recursive types? Use Box<T>
- Need shared ownership on one thread? Use Rc<T>
- Need shared ownership across threads? Use Arc<T>
- Need to avoid cloning until mutation? Use Cow<T>
- Need unique ownership? Just use the value directly (no pointer needed)
N-Dimensional Arrays
Joule provides first-class multi-dimensional array types for scientific computing, machine learning, and signal processing.
Overview
| Type | Description | Owns Data | Energy Cost |
|---|---|---|---|
| NDArray[T; N] | Owned N-dimensional array | Yes | Allocation + compute |
| NDView[T; N] | Non-owning view into an NDArray | No | Zero-copy |
| CowArray[T; N] | Clone-on-write array | Shared | Free reads, allocation on write |
| DynArray[T] | Dynamically-ranked array | Yes | Allocation + compute |
The rank N is a compile-time constant, enabling the compiler to optimize indexing and verify dimensionality at compile time.
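Because the rank is part of the type, a dimensionality mismatch is a compile error rather than a runtime failure. A sketch of the kind of mistake this catches (the `trace` helper is hypothetical):

```
fn trace(m: &NDArray[f64; 2]) -> f64 {
    // Sum of diagonal elements: only meaningful for a matrix
    let mut t = 0.0;
    for i in 0..m.shape()[0] {
        t = t + m[i, i];
    }
    t
}

let v: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0]);
// trace(&v); // compile error: expected NDArray[f64; 2], found NDArray[f64; 1]
```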
NDArray[T; N]
Owned, contiguous, row-major multi-dimensional array.
// Create a 2D array (matrix)
let mut mat: NDArray[f64; 2] = NDArray::zeros([3, 4]); // 3x4 matrix of zeros
let ones: NDArray[f64; 2] = NDArray::ones([2, 2]); // 2x2 matrix of ones
let filled: NDArray[f64; 2] = NDArray::full([3, 3], 7.0); // 3x3 filled with 7.0
// Create from data
let v: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0]);
let m: NDArray[f64; 2] = NDArray::from_vec_shape(vec![1.0, 2.0, 3.0, 4.0], [2, 2]);
Indexing
// Multi-dimensional indexing
let val = mat[1, 2]; // row 1, column 2
mat[0, 0] = 42.0; // set element
// Slicing — returns NDView
let row = mat[0, ..]; // first row
let col = mat[.., 1]; // second column
let sub = mat[1..3, 0..2]; // submatrix
let strided = mat[.., ::2]; // every other column
Methods
let a: NDArray[f64; 2] = NDArray::zeros([3, 4]);
// Shape and metadata
a.shape(); // [3, 4]
a.rank(); // 2
a.len(); // 12 (total elements)
a.strides(); // [4, 1] (row-major)
// Element-wise operations
let b = a.add(&other); // element-wise addition
let c = a.mul(&other); // element-wise multiplication
let d = a.map(|x: f64| -> f64 { x * 2.0 });
// Reductions
let total = a.sum(); // sum all elements
let mean = a.mean(); // average
let max = a.max(); // maximum element
let min = a.min(); // minimum element
// Shape manipulation
let reshaped = a.reshape([4, 3]); // reshape (same element count)
let flat = a.flatten(); // flatten to 1D
let transposed = a.transpose(); // transpose axes
// Linear algebra (2D)
let product = a.matmul(&b); // matrix multiplication
let dot = v1.dot(&v2); // dot product (1D)
NDView[T; N]
A non-owning view into an NDArray. Views are zero-copy — they reference the original data without allocation.
let arr: NDArray[f64; 2] = NDArray::zeros([4, 4]);
// Create views via slicing
let row: NDView[f64; 1] = arr.row(0);
let col: NDView[f64; 1] = arr.col(2);
let sub: NDView[f64; 2] = arr.slice([1..3, 1..3]);
// Views support the same read operations as NDArray
let sum = row.sum();
let max = sub.max();
CowArray[T; N]
Clone-on-write array. Reading is free (shares data with the source). Writing triggers a copy only if the data is shared.
let original: NDArray[f64; 2] = NDArray::ones([100, 100]);
let mut cow = CowArray::from(&original); // no copy yet
// Reading is free
let val = cow[0, 0]; // reads from original's memory
// Writing triggers a copy (if shared)
cow[0, 0] = 42.0; // now owns its own data
DynArray[T]
Dynamically-ranked array. The rank is determined at runtime, not compile time. Use when the dimensionality isn't known until runtime (e.g., loading arbitrary tensors from files).
let dyn_arr: DynArray[f64] = DynArray::zeros(vec![3, 4, 5]); // 3D
let rank = dyn_arr.rank(); // 3 (runtime value)
let shape = dyn_arr.shape(); // [3, 4, 5]
Broadcasting
Binary operations between arrays of different shapes follow broadcasting rules:
let mat: NDArray[f64; 2] = NDArray::ones([3, 4]); // shape [3, 4]
let row: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]); // shape [4]
// row is broadcast to [3, 4] — each row gets the same values added
let result = mat.add(&row); // shape [3, 4]
Broadcasting rules:
- Dimensions are compared from the right
- Dimensions must be equal, or one of them must be 1
- Missing dimensions on the left are treated as 1
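Worked through the rules: combining a [3, 1] column with a [4] row yields a [3, 4] outer sum. The shapes in the comments are what the rules predict:

```
let col: NDArray[f64; 2] = NDArray::from_vec_shape(vec![0.0, 10.0, 20.0], [3, 1]); // shape [3, 1]
let row: NDArray[f64; 1] = NDArray::from_vec(vec![1.0, 2.0, 3.0, 4.0]);            // shape [4]

// Rightmost dims: 1 vs 4, one is 1, so broadcast to 4.
// Next dim: 3 vs (missing, treated as 1), so broadcast to 3.
let grid = col.add(&row); // shape [3, 4]
// row 0: [1, 2, 3, 4]; row 1: [11, 12, 13, 14]; row 2: [21, 22, 23, 24]
```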
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Element access | 0.5 pJ | L1 cache hit |
| Element-wise op | 0.8 pJ/element | Arithmetic + memory |
| Reduction (sum/mean) | 0.8 pJ/element | Sequential scan |
| Matrix multiply | ~2N^3 * 0.8 pJ | Cubic complexity |
| Reshape/transpose | 0 pJ | Metadata-only (no copy) |
| Slice (NDView) | 0 pJ | Zero-copy view |
| Broadcasting | 0 pJ overhead | Applied during compute |
Choosing an Array Type
- Know the rank at compile time? Use NDArray[T; N] — the compiler verifies dimensions
- Need a read-only window? Use NDView[T; N] — zero-copy, zero allocation
- Might or might not modify? Use CowArray[T; N] — defers allocation until write
- Rank determined at runtime? Use DynArray[T] — flexible but no compile-time dimension checks
SIMD Vector Types
Simd[T; N] provides portable SIMD (Single Instruction, Multiple Data) operations. The compiler maps to platform-native intrinsics where available (x86 SSE/AVX, ARM NEON) with a scalar fallback for portability.
Creating SIMD Vectors
// Splat — fill all lanes with the same value
let v: Simd[f32; 4] = Simd::splat(1.0); // [1.0, 1.0, 1.0, 1.0]
// From an array
let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
// Load from a pointer + offset
let data = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
let v: Simd[f32; 4] = Simd::load(&data, 0); // first 4 elements
let w: Simd[f32; 4] = Simd::load(&data, 4); // next 4 elements
Common Lane Widths
| Type | Lanes | x86 | ARM |
|---|---|---|---|
| Simd[f32; 4] | 4 | SSE __m128 | NEON float32x4_t |
| Simd[f32; 8] | 8 | AVX __m256 | 2x NEON |
| Simd[f64; 2] | 2 | SSE2 __m128d | NEON float64x2_t |
| Simd[f64; 4] | 4 | AVX __m256d | 2x NEON |
| Simd[i32; 4] | 4 | SSE2 __m128i | NEON int32x4_t |
| Simd[i32; 8] | 8 | AVX2 __m256i | 2x NEON |
Arithmetic Operations
All arithmetic operates lane-by-lane:
let a: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let b: Simd[f32; 4] = Simd::from_array([5.0, 6.0, 7.0, 8.0]);
let sum = a.add(&b); // [6.0, 8.0, 10.0, 12.0]
let diff = a.sub(&b); // [-4.0, -4.0, -4.0, -4.0]
let prod = a.mul(&b); // [5.0, 12.0, 21.0, 32.0]
let quot = a.div(&b); // [0.2, 0.333, 0.429, 0.5]
Reduction Operations
Reduce all lanes to a single scalar:
let v: Simd[f32; 4] = Simd::from_array([1.0, 2.0, 3.0, 4.0]);
let total = v.sum(); // 10.0 — horizontal sum of all lanes
Comparison and Selection
let a: Simd[f32; 4] = Simd::from_array([1.0, 5.0, 3.0, 8.0]);
let b: Simd[f32; 4] = Simd::from_array([2.0, 4.0, 6.0, 7.0]);
let lo = a.min(&b); // [1.0, 4.0, 3.0, 7.0]
let hi = a.max(&b); // [2.0, 5.0, 6.0, 8.0]
let same = a.eq(&b); // false — true only if every lane compares equal
Unary Operations
let v: Simd[f32; 4] = Simd::from_array([-1.0, 2.0, -3.0, 4.0]);
let pos = v.abs(); // [1.0, 2.0, 3.0, 4.0]
let neg = v.neg(); // [1.0, -2.0, 3.0, -4.0]
Memory Operations
let mut data: Vec<f32> = vec![0.0; 1024];
// Load 4 elements starting at offset 8
let chunk: Simd[f32; 4] = Simd::load(&data, 8);
// Store back to memory
chunk.store(&mut data, 8);
// Convert to/from array
let arr: [f32; 4] = chunk.to_array();
Example: Vectorized Dot Product
#[energy_budget(max_joules = 0.00005)]
fn dot_product(a: &[f32], b: &[f32]) -> f32 {
let n = a.len();
let mut sum: Simd[f32; 8] = Simd::splat(0.0);
let mut i = 0;
// Process 8 elements at a time
while i + 8 <= n {
let va: Simd[f32; 8] = Simd::load(a, i);
let vb: Simd[f32; 8] = Simd::load(b, i);
sum = sum.add(&va.mul(&vb));
i = i + 8;
}
// Horizontal sum + scalar remainder
let mut result = sum.sum();
while i < n {
result = result + a[i] * b[i];
i = i + 1;
}
result
}
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Lane arithmetic (add/sub/mul/div) | 2.0 pJ | Single SIMD instruction |
| Horizontal reduction (sum) | 2.0 pJ | Log2(N) shuffle + add |
| Load/store | 0.5 pJ | L1 cache, aligned |
| Comparison (min/max/eq) | 2.0 pJ | Single SIMD instruction |
SIMD operations process N elements for roughly the same energy as one scalar operation. For a Simd[f32; 8], that's ~8x energy efficiency compared to a scalar loop — the primary reason to use SIMD in energy-aware code.
Platform Detection
The compiler automatically selects the best implementation:
- x86/x86_64: Uses SSE/AVX intrinsics via <immintrin.h>
- ARM64 (Apple Silicon, etc.): Uses NEON intrinsics via <arm_neon.h>
- Other platforms: Falls back to scalar loops (same behavior, no hardware acceleration)
No #[cfg] attributes needed in user code — the abstraction is portable.
Time
Joule provides two types for time measurement: Duration for time spans and Instant for timestamps.
Duration
A time span measured in nanoseconds. All arithmetic is exact — no floating-point rounding.
Creating Durations
let d1 = Duration::from_secs(5); // 5 seconds
let d2 = Duration::from_millis(1500); // 1.5 seconds
let d3 = Duration::from_micros(250); // 250 microseconds
let d4 = Duration::from_nanos(100); // 100 nanoseconds
Querying
let d = Duration::from_millis(2500);
d.as_secs(); // 2
d.as_millis(); // 2500
d.as_micros(); // 2500000
d.as_nanos(); // 2500000000
d.is_zero(); // false
Arithmetic
let a = Duration::from_secs(3);
let b = Duration::from_millis(500);
let sum = a.add(&b); // 3.5 seconds
let diff = a.sub(&b); // 2.5 seconds
let doubled = a.mul(2); // 6 seconds
let halved = a.div(2); // 1.5 seconds
// Checked arithmetic (returns Option)
let safe = a.checked_add(&b); // Option::Some(3.5s)
let over = a.checked_sub(&Duration::from_secs(10)); // Option::None
Instant
A monotonic timestamp. Cannot go backwards. Used for measuring elapsed time.
Measuring Elapsed Time
let start = Instant::now(); // 15.0 pJ — reads system clock
// ... do work ...
heavy_computation();
let elapsed: Duration = start.elapsed();
println!("Took {} ms", elapsed.as_millis());
Comparing Instants
let t1 = Instant::now();
// ... work ...
let t2 = Instant::now();
let gap: Duration = t2.duration_since(&t1);
Example: Benchmarking with Energy
#[energy_budget(max_joules = 0.001)]
fn timed_sort(data: Vec<i32>) -> (Vec<i32>, Duration) {
let start = Instant::now();
let sorted = sort(data);
let elapsed = start.elapsed();
(sorted, elapsed)
}
fn main() {
let data = vec![5, 3, 1, 4, 2, 8, 7, 6];
let (sorted, time) = timed_sort(data);
println!("Sorted in {} us", time.as_micros());
}
Energy Costs
| Operation | Cost | Notes |
|---|---|---|
| Duration arithmetic | 0.05 pJ | Integer add/sub |
| Instant::now() | 15.0 pJ | System clock read (syscall) |
| elapsed() | 15.0 pJ | Clock read + subtraction |
| duration_since() | 0.05 pJ | Integer subtraction |
Instant::now() is the expensive operation — it requires a system call (clock_gettime on Linux, mach_absolute_time on macOS). Avoid calling it in tight loops. Measure coarse-grained sections instead.
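A sketch of the difference; timing the whole loop once instead of each iteration avoids two syscalls per element (`items` and `process` are hypothetical):

```
// Anti-pattern: two clock reads (30.0 pJ) on every iteration
let mut total = Duration::from_nanos(0);
for item in items.iter() {
    let t = Instant::now();
    process(item);
    total = total.add(&t.elapsed());
}

// Better: one coarse measurement around the whole section
let start = Instant::now();
for item in items.iter() {
    process(item);
}
let elapsed = start.elapsed(); // 15.0 pJ, regardless of loop length
```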
Numeric Types
Specialized numeric types beyond the standard integer and float primitives.
Decimal
128-bit decimal type for exact arithmetic. No floating-point rounding errors. Essential for financial calculations.
let price = Decimal::new(19, 99, false); // 19.99
let tax = Decimal::from_str("0.0825"); // 8.25%
let total = price.mul(&tax).add(&price); // exact: 21.639175
// No floating-point surprise
let a = Decimal::from_str("0.1");
let b = Decimal::from_str("0.2");
let c = a.add(&b);
// c == 0.3 exactly (unlike f64 where 0.1 + 0.2 != 0.3)
Methods
let d = Decimal::from_str("123.456");
// Arithmetic
d.add(&other); d.sub(&other);
d.mul(&other); d.div(&other);
d.rem(&other); // remainder
// Rounding
d.round(2); // 123.46 — round to 2 decimal places
d.floor(); // 123.0
d.ceil(); // 124.0
d.trunc(); // 123.0 — truncate toward zero
// Properties
d.abs(); // absolute value
d.neg(); // negate
d.scale(); // number of decimal places
d.mantissa(); // integer mantissa
d.is_zero(); // false
d.is_negative(); // false
// Conversion
d.to_f64(); // 123.456 (lossy)
d.to_string(); // "123.456"
Energy Cost
| Operation | Cost |
|---|---|
| Decimal arithmetic | 5.0 pJ |
| Decimal comparison | 0.5 pJ |
Decimal is ~14x more expensive than f64 arithmetic but guarantees exact results. Use it where correctness matters more than speed (finance, accounting, currency).
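A typical pattern from those use cases: computing a line-item total and rounding to cents. The prices and rate here are illustrative:

```
let unit_price = Decimal::from_str("4.35");
let quantity = Decimal::from_str("3");
let tax_rate = Decimal::from_str("0.08");

let subtotal = unit_price.mul(&quantity);           // 13.05 — exact
let total = subtotal.add(&subtotal.mul(&tax_rate)); // 14.094 — exact
let billed = total.round(2);                        // 14.09 — rounded to cents
```

With f64, the same chain can drift by an ulp and round the wrong way at the cent boundary; Decimal guarantees the rounding step sees the exact value.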
Complex<T>
Complex number with real and imaginary parts. Generic over the component type (typically f32 or f64).
let z = Complex::new(3.0, 4.0); // 3 + 4i
let w = Complex::new(1.0, -2.0); // 1 - 2i
// Arithmetic
let sum = z.add(&w); // 4 + 2i
let prod = z.mul(&w); // 11 - 2i
let quot = z.div(&w); // -1 + 2i
// Properties
z.real(); // 3.0
z.imag(); // 4.0
z.abs(); // 5.0 (magnitude: sqrt(3^2 + 4^2))
z.arg(); // 0.927... (phase angle in radians)
z.conj(); // 3 - 4i (complex conjugate)
z.norm(); // 25.0 (squared magnitude)
Advanced Operations
let z = Complex::new(1.0, 1.0);
z.exp(); // e^z
z.log(); // natural logarithm
z.sqrt(); // principal square root
z.pow(&w); // z^w
// Polar form
let polar = Complex::from_polar(5.0, 0.927); // magnitude, angle
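The polar constructor composes with abs and arg: converting to polar form and back recovers the original value, up to floating-point rounding in the components.

```
let z = Complex::new(3.0, 4.0);
let mag = z.abs(); // 5.0
let ang = z.arg(); // 0.927...
let back = Complex::from_polar(mag, ang); // approximately 3 + 4i
```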
Energy Cost
| Operation | Cost |
|---|---|
| Complex add/sub | 1.6 pJ (2x real) |
| Complex multiply | 1.6 pJ |
| Complex divide | 3.2 pJ |
| abs/norm | 1.6 pJ |
| exp/log/sqrt | 5.0 pJ |
Intern
Interned string — stored once in a global table, compared by pointer equality. Ideal for identifiers, keywords, and symbols that appear repeatedly.
let a = Intern::new("hello");
let b = Intern::new("hello");
// Pointer equality — O(1) comparison instead of O(n) string compare
a.eq(&b); // true (same pointer)
// String access
a.as_str(); // "hello"
a.len(); // 5
a.is_empty(); // false
a.hash(); // precomputed hash value
Use Case: Compiler Symbol Tables
pub struct Symbol {
pub name: Intern,
}
// Creating millions of Symbol values with the same name
// only stores the string once in memory
let sym1 = Symbol { name: Intern::new("x") };
let sym2 = Symbol { name: Intern::new("x") };
// Comparison is pointer equality — O(1), not O(n)
sym1.name.eq(&sym2.name); // true, instant
Energy Cost
| Operation | Cost | Notes |
|---|---|---|
| Intern::new (first time) | 10.0 pJ | Hash table insert |
| Intern::new (duplicate) | 10.0 pJ | Hash table lookup |
| eq | 0.05 pJ | Pointer comparison |
| as_str | 0 pJ | Pointer dereference |
The 10.0 pJ cost of Intern::new is amortized over all subsequent O(1) comparisons. For strings compared frequently (like identifiers in a compiler), interning saves both energy and time.
I/O
File and stream I/O operations.
Reading Files
use std::io::File;
let content = File::read_to_string("data.txt");
match content {
Result::Ok(text) => process(text),
Result::Err(e) => println!("Error: {}", e),
}
Writing Files
use std::io::File;
let result = File::write_string("output.txt", "Hello, world!");
match result {
Result::Ok(_) => println!("Written successfully"),
Result::Err(e) => println!("Error: {}", e),
}
Reading Lines
use std::io::File;
let lines = File::read_lines("data.txt");
match lines {
Result::Ok(lines) => {
for line in lines {
process_line(line);
}
}
Result::Err(e) => println!("Error: {}", e),
}
Standard Streams
use std::io::{stdin, stdout, stderr};
// Read from stdin
let line = stdin::read_line();
// Write to stdout
stdout::write("Hello\n");
// Write to stderr
stderr::write("Error message\n");
Path Operations
use std::io::Path;
let p = Path::new("/home/user/data.txt");
let exists = p.exists();
let is_file = p.is_file();
let is_dir = p.is_dir();
let parent = p.parent(); // Option<Path>
let filename = p.file_name(); // Option<String>
let ext = p.extension(); // Option<String>
Directory Operations
use std::io::{create_dir, read_dir, remove_dir};
create_dir("output");
let entries = read_dir(".");
match entries {
Result::Ok(files) => {
for entry in files {
println!("{}", entry.name());
}
}
Result::Err(e) => println!("Error: {}", e),
}
Buffered I/O
For performance-critical I/O, use buffered readers and writers:
use std::io::{BufReader, BufWriter};
let reader = BufReader::new(File::open("large.txt"));
let writer = BufWriter::new(File::create("output.txt"));
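A sketch of a buffered line-copy. The `lines()` iterator and `write` method on the buffered wrappers are assumptions (mirroring `File::read_lines` and `stdout::write` above); treat the exact signatures as illustrative:

```
use std::io::{File, BufReader, BufWriter};

let reader = BufReader::new(File::open("large.txt"));
let mut writer = BufWriter::new(File::create("copy.txt"));

// Assumed API: line iteration on the reader, buffered write on the writer
for line in reader.lines() {
    writer.write(line);
    writer.write("\n");
}
// Buffered bytes are flushed when the writer is dropped (assumed)
```

Buffering batches many small reads and writes into few syscalls, which matters because each syscall carries a fixed time and energy cost.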
Math
Mathematical functions, constants, and linear algebra operations.
Constants
use std::math;
let pi = math::PI; // 3.141592653589793
let e = math::E; // 2.718281828459045
let tau = math::TAU; // 6.283185307179586
let sqrt2 = math::SQRT_2; // 1.4142135623730951
Basic Functions
use std::math;
let a = math::abs(-42.0); // 42.0
let s = math::sqrt(144.0); // 12.0
let p = math::pow(2.0, 10.0); // 1024.0
let l = math::log(math::E); // 1.0
let l2 = math::log2(1024.0); // 10.0
let l10 = math::log10(1000.0); // 3.0
Trigonometry
use std::math;
let s = math::sin(math::PI / 2.0); // 1.0
let c = math::cos(0.0); // 1.0
let t = math::tan(math::PI / 4.0); // 1.0
let arcsin = math::asin(1.0); // PI/2
let ac = math::acos(0.0); // PI/2
let at = math::atan(1.0); // PI/4
let at2 = math::atan2(1.0, 1.0); // PI/4
Rounding
use std::math;
let f = math::floor(3.7); // 3.0
let c = math::ceil(3.2); // 4.0
let r = math::round(3.5); // 4.0
let t = math::trunc(3.9); // 3.0
Min/Max
use std::math;
let mn = math::min(3.0, 7.0); // 3.0
let mx = math::max(3.0, 7.0); // 7.0
let cl = math::clamp(15.0, 0.0, 10.0); // 10.0
Linear Algebra
use std::math::linear;
// Vector operations
let v1 = linear::Vector::new([1.0, 2.0, 3.0]);
let v2 = linear::Vector::new([4.0, 5.0, 6.0]);
let sum = v1.add(v2);
let dot = v1.dot(v2); // 32.0
let norm = v1.norm(); // sqrt(14)
let scaled = v1.scale(2.0);
// Matrix operations
let m = linear::Matrix::identity(3);
let det = m.determinant();
let inv = m.inverse();
let product = m.multiply(m);
Complex Numbers
use std::math::complex::Complex;
let z1 = Complex::new(3.0, 4.0); // 3 + 4i
let z2 = Complex::new(1.0, 2.0); // 1 + 2i
let sum = z1.add(z2); // 4 + 6i
let product = z1.mul(z2); // -5 + 10i
let magnitude = z1.abs(); // 5.0
let conjugate = z1.conj(); // 3 - 4i
Statistics
use std::statistics;
let data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0];
let mean = statistics::mean(data); // 5.0
let median = statistics::median(data); // 4.5
let stddev = statistics::std_dev(data); // ~2.0
let variance = statistics::variance(data);
Random Numbers
use std::math::random;
let n = random::int(0, 100); // random integer in [0, 100)
let f = random::float(); // random f64 in [0.0, 1.0)
let b = random::bool(); // random boolean
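Putting random and math together: a Monte Carlo estimate of pi. The sample count, helper name, and `as f64` cast syntax are illustrative:

```
use std::math::random;

fn estimate_pi(samples: i64) -> f64 {
    let mut inside = 0;
    for _ in 0..samples {
        let x = random::float(); // [0.0, 1.0)
        let y = random::float();
        if x * x + y * y <= 1.0 {
            inside = inside + 1;
        }
    }
    // quarter-circle area / unit-square area = pi / 4
    4.0 * (inside as f64) / (samples as f64)
}

let pi_est = estimate_pi(1_000_000); // roughly 3.14
```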