C Energy Analysis

Joule analyzes C code for energy consumption, targeting the low-level patterns where energy waste is most impactful: memory allocation, cache access patterns, and nested loop structures.

Quick Start

# Static energy analysis
joulec --lift c program.c

# Execute with energy tracking
joulec --lift-run c program.c

# Execute with energy optimization
joulec --energy-optimize --lift-run c program.c

Supported Features

Category      Features
Types         int, long, float, double, char, void, size_t, unsigned variants
Pointers      Declaration, dereference *p, address-of &x, pointer arithmetic
Arrays        Fixed-size int arr[N], multidimensional int mat[M][N]
Control flow  if/else, while, do-while, for, switch/case, break, continue, goto (limited)
Functions     Declaration, definition, forward declarations, recursion
Structs       Definition, field access . and ->, nested structs
Memory        malloc(), calloc(), realloc(), free()
I/O           printf(), scanf(), puts(), getchar()
Math          sqrt(), pow(), abs(), floor(), ceil(), sin(), cos(), log(), exp()
Operators     All arithmetic, bitwise, comparison, logical, ternary ?:, comma

Common Energy Anti-Patterns

1. malloc Inside Loops

// BAD — 1000 allocations, 1000 frees
for (int i = 0; i < 1000; i++) {
    int *buf = malloc(sizeof(int) * 100);
    process(buf, 100);
    free(buf);
}

// GOOD — allocate once, reuse
int *buf = malloc(sizeof(int) * 100);
for (int i = 0; i < 1000; i++) {
    process(buf, 100);
}
free(buf);

Category: ALLOCATION | Severity: High | Savings: ~5x

Each malloc/free cycle costs ~200 pJ (DRAM access) plus allocator bookkeeping and, when the heap has to grow, system call overhead. In a tight loop, this can dominate the energy budget.
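When the buffer size differs between iterations, the "allocate once, reuse" idea still applies: keep one scratch buffer and grow it only when a request exceeds its current capacity. The sketch below is illustrative only. `process` is a hypothetical stand-in for the function of the same name in the snippet above, and `run_with_reuse` is a name made up for this example, not part of Joule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for process(): fills buf with 0..len-1 and sums it. */
long process(int *buf, size_t len) {
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        buf[i] = (int)i;
        sum += buf[i];
    }
    return sum;
}

/* Calls process() once per requested size, reusing a single scratch buffer.
 * The buffer is reallocated only when a request exceeds its capacity.
 * Returns the number of allocator calls made, or -1 if allocation fails. */
int run_with_reuse(const size_t *sizes, size_t count, long *total) {
    assert(total != NULL);
    assert(sizes != NULL || count == 0);

    int *buf = NULL;
    size_t cap = 0;
    int alloc_calls = 0;
    *total = 0;

    for (size_t i = 0; i < count; i++) {
        if (sizes[i] > cap) {
            /* Use a temporary so the old buffer is not leaked on failure. */
            int *grown = realloc(buf, sizes[i] * sizeof *grown);
            if (grown == NULL) {
                free(buf);
                return -1;
            }
            buf = grown;
            cap = sizes[i];
            alloc_calls++;
        }
        *total += process(buf, sizes[i]);
    }

    free(buf);
    return alloc_calls;
}
```

For the request sizes 10, 50, 20, 100, 30 this makes three allocator calls instead of five, because the 20- and 30-element requests fit in a buffer that already exists.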

2. Cache-Unfriendly Access Patterns

// BAD — column-major access on row-major array (cache miss per element)
for (int j = 0; j < N; j++) {
    for (int i = 0; i < M; i++) {
        sum += matrix[i][j];  // stride = N * sizeof(int)
    }
}

// GOOD — row-major access (sequential cache hits)
for (int i = 0; i < M; i++) {
    for (int j = 0; j < N; j++) {
        sum += matrix[i][j];  // stride = sizeof(int)
    }
}

Category: MEMORY | Severity: Critical | Savings: ~10x for large matrices

An L1 cache load costs 0.5 pJ; a DRAM load costs 200 pJ, a 400x difference. Once the matrix no longer fits in cache, column-major traversal of row-major data causes a DRAM load on nearly every access.
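The per-load figures above can be turned into a back-of-the-envelope estimate. The sketch below is a deliberately simplified model, not Joule's estimator: it assumes 64-byte cache lines and 4-byte `int` (so 16 elements per line), ignores L2/L3 caches and hardware prefetching, and treats every load as either an L1 hit or a DRAM access. `traversal_energy_pj` is a hypothetical helper name.

```c
#include <assert.h>

/* Per-load costs quoted in the text, in picojoules. */
static const double L1_LOAD_PJ = 0.5;
static const double DRAM_LOAD_PJ = 200.0;

/* Estimated energy in pJ for `loads` element reads, assuming one load in
 * every `loads_per_miss` goes to DRAM and all others hit L1.
 * loads_per_miss = 16 models sequential access to 4-byte ints with 64-byte
 * lines (an assumption); loads_per_miss = 1 models a miss on every access. */
double traversal_energy_pj(long loads, long loads_per_miss) {
    assert(loads >= 0 && loads_per_miss >= 1);
    long misses = (loads + loads_per_miss - 1) / loads_per_miss; /* round up */
    long hits = loads - misses;
    return misses * DRAM_LOAD_PJ + hits * L1_LOAD_PJ;
}
```

For a 1024 x 1024 matrix (1,048,576 loads), this model gives about 13.6 µJ for row-major traversal (`loads_per_miss = 16`) and about 209.7 µJ for column-major traversal (`loads_per_miss = 1`), a ratio of roughly 15x. That is in the same range as the ~10x savings figure above; real hardware sits lower because intermediate caches absorb some of the misses.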

3. Realloc Growth in Loops

// BAD — realloc grows by one element per iteration, may copy all data each time
int *data = NULL;
int cap = 0;
for (int i = 0; i < n; i++) {
    cap++;
    data = realloc(data, cap * sizeof(int));
    data[cap - 1] = i;
}

// GOOD — geometric growth (amortized O(1) per insert)
int *data = malloc(16 * sizeof(int));
int len = 0, cap = 16;
for (int i = 0; i < n; i++) {
    if (len == cap) {
        cap *= 2;
        data = realloc(data, cap * sizeof(int));
    }
    data[len++] = i;
}
free(data);

Category: ALLOCATION | Severity: High | Savings: ~4x
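To make the difference between the two growth strategies concrete, the sketch below counts how many times each one calls `realloc` while appending `n` integers. The function names are invented for this illustration, and the starting capacity of 16 is taken from the GOOD snippet above. Each `realloc` may have to move the whole array, so fewer calls means fewer potential copies.

```c
#include <assert.h>
#include <stdlib.h>

/* Appends 0..n-1, growing the array by one slot per insert.
 * Returns the number of realloc calls, or -1 if allocation fails. */
long fill_grow_by_one(int n) {
    assert(n >= 0);
    int *data = NULL;
    long reallocs = 0;
    for (int i = 0; i < n; i++) {
        int *grown = realloc(data, (size_t)(i + 1) * sizeof *grown);
        if (grown == NULL) {
            free(data);
            return -1;
        }
        data = grown;
        reallocs++;
        data[i] = i;
    }
    free(data);
    return reallocs;
}

/* Appends 0..n-1, doubling capacity (starting at 16) when the array is full.
 * Returns the number of realloc calls, or -1 if allocation fails. */
long fill_doubling(int n) {
    assert(n >= 0);
    size_t cap = 16, len = 0;
    long reallocs = 0;
    int *data = malloc(cap * sizeof *data);
    if (data == NULL) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        if (len == cap) {
            int *grown = realloc(data, cap * 2 * sizeof *grown);
            if (grown == NULL) {
                free(data);
                return -1;
            }
            data = grown;
            cap *= 2;
            reallocs++;
        }
        data[len++] = i;
    }
    free(data);
    return reallocs;
}
```

For `n = 1000`, growing by one calls `realloc` 1000 times, while doubling calls it 6 times (capacity 16 → 32 → 64 → 128 → 256 → 512 → 1024).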

4. Nested Loop Complexity

// BAD — O(n^3) matrix multiply without blocking
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

Category: ALGORITHM | Severity: Critical (for large N)

The energy estimator flags O(n^3) nested loops with high energy estimates and reduced confidence scores.
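The BAD snippet above has no GOOD counterpart, so here is one possible shape for a cache-blocked version. This is a sketch under assumptions, not Joule's recommended rewrite: it uses flat row-major arrays rather than `C[N][N]`, and `tile` is a hypothetical tuning parameter whose best value depends on the cache size of the target machine. The arithmetic is still O(n^3); blocking only changes the order of the work so that each tile of `B` is reused while it is still cached.

```c
#include <assert.h>

/* Smaller of two ints; clips a tile at the matrix edge. */
static int min_int(int a, int b) {
    return a < b ? a : b;
}

/* Computes C = A * B for n x n row-major matrices, visiting the operands
 * in tile x tile blocks. C is overwritten. */
void matmul_blocked(const int *A, const int *B, int *C, int n, int tile) {
    assert(n >= 0 && tile > 0);

    for (int i = 0; i < n * n; i++) {
        C[i] = 0;
    }

    for (int ii = 0; ii < n; ii += tile) {
        int i_end = min_int(ii + tile, n);
        for (int kk = 0; kk < n; kk += tile) {
            int k_end = min_int(kk + tile, n);
            for (int jj = 0; jj < n; jj += tile) {
                int j_end = min_int(jj + tile, n);
                /* Multiply one tile of A by one tile of B. */
                for (int i = ii; i < i_end; i++) {
                    for (int k = kk; k < k_end; k++) {
                        int a = A[i * n + k];
                        for (int j = jj; j < j_end; j++) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

A call such as `matmul_blocked(A, B, C, n, 32)` is a reasonable starting point for experiments; the value 32 is a guess, not a measured optimum.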

Worked Example

#include <stdlib.h>
#include <stdio.h>

void matrix_multiply(int *A, int *B, int *C, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            C[i * n + j] = 0;
            for (int k = 0; k < n; k++) {
                C[i * n + j] += A[i * n + k] * B[k * n + j];
            }
        }
    }
}

int main() {
    int n = 64;
    int *A = calloc(n * n, sizeof(int));
    int *B = calloc(n * n, sizeof(int));
    int *C = calloc(n * n, sizeof(int));

    for (int i = 0; i < n * n; i++) {
        A[i] = i % 10;
        B[i] = (i * 3) % 10;
    }

    matrix_multiply(A, B, C, n);
    printf("C[0][0] = %d\n", C[0]);

    free(A); free(B); free(C);
    return 0;
}

$ joulec --lift c matmul.c
Energy Analysis: matmul.c

  matrix_multiply   892.40 nJ  (confidence: 0.50)
  main               45.20 nJ  (confidence: 0.75)

  Total: 937.60 nJ

Recommendations:
  !!! [ALGORITHM] matrix_multiply — O(n^3) nested loop detected
      Suggestion: consider cache-blocking or BLAS library for large matrices
      Estimated savings: 3-5x with cache blocking

  !! [MEMORY] matrix_multiply — inner loop access pattern B[k*n+j] has stride n
      Suggestion: transpose B before multiply, or interchange k/j loops
      Estimated savings: 2-4x from improved cache locality
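The second recommendation (interchange the k/j loops) could look like the sketch below. `matrix_multiply_ikj` is a hypothetical name for a variant of the `matrix_multiply` function in the worked example. With the `j` loop innermost, both `C[i*n+j]` and `B[k*n+j]` are read with stride 1 instead of stride `n`. The source does not show Joule's output for this version, so the savings figures above should be treated as the tool's estimate rather than a measured result for this code.

```c
#include <assert.h>

/* Computes C = A * B for n x n row-major matrices using i-k-j loop order,
 * so the innermost loop walks B and C sequentially. C is overwritten. */
void matrix_multiply_ikj(const int *A, const int *B, int *C, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* Clear row i of C before accumulating into it. */
        for (int j = 0; j < n; j++) {
            C[i * n + j] = 0;
        }
        for (int k = 0; k < n; k++) {
            int a = A[i * n + k]; /* loaded once, reused across the row */
            for (int j = 0; j < n; j++) {
                C[i * n + j] += a * B[k * n + j];
            }
        }
    }
}
```

It takes the same arguments as the original, so `matrix_multiply_ikj(A, B, C, n)` can replace the call in `main` without other changes.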

Limitations

  • No preprocessor directives (#define, #include, #ifdef)
  • No function pointers or callbacks
  • No variadic functions beyond printf/scanf
  • No typedef (use bare type names)
  • No union types
  • No enum (use integer constants)
  • No complex struct initializers (= { .field = value })
  • No inline assembly