C Energy Analysis
Joule analyzes C code for energy consumption, targeting the low-level patterns where energy waste is most impactful: memory allocation, cache access patterns, and nested loop structures.
Quick Start
# Static energy analysis
joulec --lift c program.c
# Execute with energy tracking
joulec --lift-run c program.c
# Execute with energy optimization
joulec --energy-optimize --lift-run c program.c
Supported Features
| Category | Features |
|---|---|
| Types | int, long, float, double, char, void, size_t, unsigned variants |
| Pointers | Declaration, dereference *p, address-of &x, pointer arithmetic |
| Arrays | Fixed-size int arr[N], multidimensional int mat[M][N] |
| Control flow | if/else, while, do-while, for, switch/case, break, continue, goto (limited) |
| Functions | Declaration, definition, forward declarations, recursion |
| Structs | Definition, field access . and ->, nested structs |
| Memory | malloc(), calloc(), realloc(), free() |
| I/O | printf(), scanf(), puts(), getchar() |
| Math | sqrt(), pow(), abs(), floor(), ceil(), sin(), cos(), log(), exp() |
| Operators | All arithmetic, bitwise, comparison, logical, ternary ?:, comma |
Common Energy Anti-Patterns
1. malloc Inside Loops
// BAD: 1000 allocations, 1000 frees
for (int i = 0; i < 1000; i++) {
    int *buf = malloc(sizeof(int) * 100);
    process(buf, 100);
    free(buf);
}
// GOOD: allocate once, reuse
int *buf = malloc(sizeof(int) * 100);
for (int i = 0; i < 1000; i++) {
    process(buf, 100);
}
free(buf);
Category: ALLOCATION | Severity: High | Savings: ~5x
Each malloc/free cycle costs ~200 pJ (DRAM access) plus allocator bookkeeping and, whenever the heap has to grow, system call overhead. In a tight loop, this can dominate the energy budget.
2. Cache-Unfriendly Access Patterns
// BAD: column-major access on row-major array (cache miss per element)
for (int j = 0; j < N; j++) {
    for (int i = 0; i < M; i++) {
        sum += matrix[i][j]; // stride = N * sizeof(int)
    }
}
// GOOD: row-major access (sequential cache hits)
for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++) {
        sum += matrix[i][j]; // stride = sizeof(int)
    }
}
Category: MEMORY | Severity: Critical | Savings: ~10x for large matrices
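An L1 cache load costs about 0.5 pJ; a DRAM load costs about 200 pJ, a 400x difference. When the matrix is too large to stay in cache, column-major traversal of row-major data touches a different cache line on nearly every access, so most loads end up going to DRAM.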
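The two per-load figures can be combined into a rough estimate for a whole traversal. The sketch below is a back-of-the-envelope model, not Joule's estimator. It assumes 64-byte cache lines, 4-byte `int` elements, no hardware prefetching, and a matrix too large to stay in cache. The function and constant names are illustrative; only the 0.5 pJ and 200 pJ values come from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Per-load costs quoted above, in picojoules. */
static const double L1_LOAD_PJ = 0.5;
static const double DRAM_LOAD_PJ = 200.0;

/* Assumption: 64-byte cache lines holding 4-byte ints (16 elements per line). */
static const size_t ELEMS_PER_LINE = 16;

/* Row-major traversal: the first element of each cache line is modeled as a
   DRAM load, the remaining elements of that line as L1 hits. */
double row_major_energy_pj(size_t rows, size_t cols) {
    assert(cols % ELEMS_PER_LINE == 0); /* keeps the model simple */
    size_t total = rows * cols;
    size_t misses = total / ELEMS_PER_LINE;
    size_t hits = total - misses;
    return (double)misses * DRAM_LOAD_PJ + (double)hits * L1_LOAD_PJ;
}

/* Column-major traversal of the same data: every load is modeled as a
   DRAM load, matching the worst case described above. */
double col_major_energy_pj(size_t rows, size_t cols) {
    return (double)(rows * cols) * DRAM_LOAD_PJ;
}
```

Under these assumptions a 1024 x 1024 traversal comes out at about 13.6 µJ in row-major order and about 209.7 µJ in column-major order, a ratio of roughly 15x. That is the same order of magnitude as the ~10x savings quoted above; real hardware with prefetching and multiple cache levels will differ.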
3. Realloc Growth in Loops
// BAD: capacity grows by one element per iteration, so realloc may copy all data each time
int *data = NULL;
int cap = 0;
for (int i = 0; i < n; i++) {
    cap++;
    data = realloc(data, cap * sizeof(int));
    data[cap - 1] = i;
}
// GOOD: geometric growth (amortized O(1) per insert)
int *data = malloc(16 * sizeof(int));
int len = 0, cap = 16;
for (int i = 0; i < n; i++) {
    if (len == cap) {
        cap *= 2;
        data = realloc(data, cap * sizeof(int));
    }
    data[len++] = i;
}
Category: ALLOCATION | Severity: High | Savings: ~4x
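The geometric-growth pattern can also be packaged as a small helper that checks the result of `realloc` before overwriting the old pointer, so a failed allocation does not leak the existing buffer. This is a minimal sketch: `struct int_vec`, `int_vec_push`, and `int_vec_free` are hypothetical names, and the `reallocs` counter exists only to make the number of calls visible.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable int buffer; the names are illustrative. */
struct int_vec {
    int *data;
    size_t len;
    size_t cap;
    size_t reallocs; /* counts realloc calls, for illustration only */
};

/* Appends value, doubling the capacity when the buffer is full.
   Returns 0 on success and -1 on allocation failure; on failure the
   existing buffer is left valid and unchanged. */
int int_vec_push(struct int_vec *v, int value) {
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 16;
        int *tmp = realloc(v->data, new_cap * sizeof(int));
        if (tmp == NULL) {
            return -1; /* v->data is untouched, so nothing leaks */
        }
        v->data = tmp;
        v->cap = new_cap;
        v->reallocs++;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Releases the buffer and resets the bookkeeping fields. */
void int_vec_free(struct int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

Starting from a zero-initialized `struct int_vec`, pushing 1000 elements calls `realloc` 7 times (capacities 16, 32, 64, 128, 256, 512, 1024), compared with 1000 calls when the capacity grows by one element per iteration.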
4. Nested Loop Complexity
// BAD: O(n^3) matrix multiply without blocking
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];
Category: ALGORITHM | Severity: Critical (for large N)
The energy estimator flags O(n^3) nested loops with high energy estimates and reduced confidence scores.
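The snippet above shows only the unblocked form. One common remedy is cache blocking (tiling), sketched below for flat row-major arrays. The operation count stays O(n^3); the point is that each tile of `A`, `B`, and `C` is reused while it is still in cache. The name `matmul_blocked` and the tile size of 32 are assumptions chosen for illustration, and how Joule scores the blocked version has not been checked here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a 32 x 32 tile of ints (4 KiB) fits in L1; tune per target. */
static const int TILE = 32;

static int min_int(int a, int b) { return a < b ? a : b; }

/* Computes C += A * B for n x n row-major matrices, one tile at a time.
   The caller must zero C first if a plain product is wanted. */
void matmul_blocked(const int *A, const int *B, int *C, int n) {
    assert(n > 0);
    for (int ii = 0; ii < n; ii += TILE) {
        for (int kk = 0; kk < n; kk += TILE) {
            for (int jj = 0; jj < n; jj += TILE) {
                /* Clamp tile bounds so n need not be a multiple of TILE. */
                int i_end = min_int(ii + TILE, n);
                int k_end = min_int(kk + TILE, n);
                int j_end = min_int(jj + TILE, n);
                for (int i = ii; i < i_end; i++) {
                    for (int k = kk; k < k_end; k++) {
                        int a = A[i * n + k];
                        /* Innermost loop walks B and C with stride 1. */
                        for (int j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

For matrices that already fit in cache, tiling adds loop overhead without improving locality, so it is worth measuring before and after.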
Worked Example
#include <stdlib.h>
#include <stdio.h>
void matrix_multiply(int *A, int *B, int *C, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i * n + j] = 0;
            for (int k = 0; k < n; k++) {
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}
int main() {
    int n = 64;
    int *A = calloc(n * n, sizeof(int));
    int *B = calloc(n * n, sizeof(int));
    int *C = calloc(n * n, sizeof(int));
    for (int i = 0; i < n * n; i++) {
        A[i] = i % 10;
        B[i] = (i * 3) % 10;
    }
    matrix_multiply(A, B, C, n);
    printf("C[0][0] = %d\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}
$ joulec --lift c matmul.c
Energy Analysis: matmul.c
matrix_multiply 892.40 nJ (confidence: 0.50)
main 45.20 nJ (confidence: 0.75)
Total: 937.60 nJ
Recommendations:
!!! [ALGORITHM] matrix_multiply — O(n^3) nested loop detected
Suggestion: consider cache-blocking or BLAS library for large matrices
Estimated savings: 3-5x with cache blocking
!! [MEMORY] matrix_multiply — inner loop access pattern B[k*n+j] has stride n
Suggestion: transpose B before multiply, or interchange k/j loops
Estimated savings: 2-4x from improved cache locality
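The second recommendation (interchange the k and j loops) could be applied roughly as follows. This is a sketch rather than output produced by Joule: `matrix_multiply_ikj` is a hypothetical name, and the result matrix is zeroed up front because the interchanged loops accumulate into each row of `C` across several passes.

```c
#include <assert.h>
#include <string.h>

/* Same product as matrix_multiply, with the j and k loops interchanged so
   that the innermost loop reads B and updates C with stride 1. */
void matrix_multiply_ikj(const int *A, const int *B, int *C, int n) {
    assert(n > 0);
    memset(C, 0, (size_t)n * (size_t)n * sizeof(int));
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < n; k++) {
            int a = A[i * n + k]; /* loaded once per pass of the inner loop */
            for (int j = 0; j < n; j++) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```

The function produces the same values as `matrix_multiply` for the same inputs. Whether it reaches the estimated 2-4x savings depends on the target hardware and has not been measured here.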
Limitations
- No preprocessor directives (#define, #include, #ifdef)
- No function pointers or callbacks
- No variadic functions beyond printf/scanf
- No typedef (use bare type names)
- No union types
- No enum (use integer constants)
- No complex struct initializers (= { .field = value })
- No inline assembly