@Akeboshiwind
Last active December 25, 2025 12:15
Unfinished Squint-like Clojure to C syntax transpiler design doc

An unfinished design document I worked on with Claude. I didn't complete it because I don't really have a use case for it, and C and Clojure have more incompatibilities than I'd assumed. Note that this doesn't include the full "conversation", as I got Claude to remove comments once they were resolved.


Things outside of quotes are the design elements.

Quotes like this

Note

Or notes like this are from me

[!NOTE] 🤖 Notes like this are from Claude

This allowed me to have a "conversation" with the design doc over several sessions.

Clojure-to-C Compiler Design Document

A compiler that transpiles Clojure syntax to C code. The goal is to provide Clojure's expressive syntax while generating idiomatic, zero-overhead C.

Philosophy

Core Principles

  1. Thin layer over C: the language is C with Clojure syntax, not a new language
  2. No magic: what you write is what you get; no hidden allocations, no implicit runtime
  3. Explicit control: the user manages memory, types, and file organization
  4. Transpile only: the compiler outputs C text; the user invokes their own C compiler
  5. No runtime required: generated code is standalone C (optional runtime library for convenience)

Comparison to Squint (Clojure-to-JS)

| Aspect | Squint (JS target) | This Project (C target) |
|---|---|---|
| Runtime types | JS provides them | None; explicit types required |
| GC | JS provides it | None; user manages memory |
| Collections | JS arrays/objects | C arrays/structs |
| Polymorphism | JS dynamic dispatch | Monomorphization or explicit enums |
| Closures | JS closures | Inline only, or explicit env structs |
| Eval/REPL | Possible at runtime | Compile-time only (dev REPL via dlopen trick) |
| Macros | SCI at compile time | Clojure JVM at compile time |

Type System

Explicit Type Annotations

All types must be specified. The compiler does not infer types.

(defn add ^int [^int a ^int b]
  (+ a b))

Primitive Types

| Clojure Annotation | C Type |
|---|---|
| ^int | int |
| ^int8 | int8_t |
| ^int16 | int16_t |
| ^int32 | int32_t |
| ^int64 | int64_t |
| ^uint8 | uint8_t |
| ^uint16 | uint16_t |
| ^uint32 | uint32_t |
| ^uint64 | uint64_t |
| ^float | float |
| ^double | double |
| ^char | char |
| ^bool | bool |
| ^void | void |
| ^size_t | size_t |
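The fixed-width and bool annotations map onto types from C's standard headers, so a generated file needs the matching includes. Who emits them isn't specified above; this sketch assumes they appear at the top of the output, and clamp_byte / fits_in_byte are made-up functions used only to show the mappings in a signature:

```c
#include <stdbool.h>  /* bool (a macro for _Bool before C23) */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int8_t ... uint64_t, UINT8_MAX */

/* Roughly what (defn clamp-byte ^uint8 [^int32 x] ...) might emit.
   clamp-byte is an illustrative function, not part of the design. */
uint8_t clamp_byte(int32_t x) {
    if (x < 0) return 0;
    if (x > UINT8_MAX) return UINT8_MAX;
    return (uint8_t)x;
}

/* ^bool and ^int64 map the same way in return and parameter position */
bool fits_in_byte(int64_t x) {
    return x >= 0 && x <= UINT8_MAX;
}
```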

Pointer Types

^int*           ;; int*
^char*          ;; char*
^void*          ;; void*
^int**          ;; int**
^Node*          ;; Node* (struct pointer)

Const, Volatile, Restrict

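Use metadata maps for type qualifiers. The standard Clojure ^:const shortcut works for simple cases. As with a leading const in C, :const on a pointer type qualifies the pointed-to value (const char*), not the pointer itself (char* const).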

^{:type char* :const true}                    ;; const char*
^{:type int :volatile true}                   ;; volatile int
^{:type int* :restrict true}                  ;; int* restrict
^{:type char* :const true :restrict true}     ;; const char* restrict

;; Shorthand for common cases
^:const ^char* msg                            ;; const char* msg

Function Pointer Types

Similar to Carp's (Fn [Int Int] Int) and Typed Racket's (-> Integer Integer Integer).

^(fn int int -> int)           ;; int (*)(int, int)
^(fn void* size_t -> void*)    ;; void* (*)(void*, size_t)
^(fn -> void)                  ;; void (*)(void)
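In generated C these function pointer types usually get a typedef to stay readable. A sketch, assuming the compiler (or the user) names them; BinOp, Resize, Thunk and the helper functions are illustrative, not part of the design:

```c
#include <stddef.h>
#include <stdlib.h>

/* ^(fn int int -> int)        =>  int (*)(int, int) */
typedef int (*BinOp)(int, int);
/* ^(fn void* size_t -> void*) =>  void* (*)(void*, size_t) */
typedef void *(*Resize)(void *, size_t);
/* ^(fn -> void)               =>  void (*)(void) */
typedef void (*Thunk)(void);

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* A parameter annotated ^(fn int int -> int) */
int apply_binop(BinOp op, int a, int b) {
    return op(a, b);
}

/* realloc already has the ^(fn void* size_t -> void*) shape */
Resize default_resize(void) {
    return realloc;
}

static int thunk_calls = 0;
void count_call(void) { thunk_calls++; }

/* A ^(fn -> void) value, called through the pointer */
int run_thunk(Thunk t) {
    t();
    return thunk_calls;
}
```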

Array Types

Similar to Carp's (Array Int) syntax.

^(array int 10)     ;; int[10] (fixed size)
^int*               ;; int* (pointer, for dynamic arrays)

Functions

Definition (defn)

(defn function-name ^return-type [^type1 arg1 ^type2 arg2]
  body)

Example:

(defn max ^int [^int a ^int b]
  (if (> a b) a b))

Outputs:

int max(int a, int b) {
    return (a > b) ? a : b;
}

Declaration (declare)

For header files or forward declarations:

(declare max ^int [^int a ^int b])

Outputs:

int max(int a, int b);

Function Modifiers

(defn ^:static helper ^void []    ;; static void helper(void)
  ...)

(defn ^:inline fast-add ^int [^int a ^int b]   ;; static inline int fast_add(...)
  (+ a b))

(defn ^:export api-func ^void []   ;; with visibility attribute for shared libs
  ...)
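The design lists the C each modifier should produce but not the full output. One plausible rendering, assuming a GCC/Clang toolchain for the visibility attribute; helper_calls exists only so the example can observe that helper ran:

```c
/* (defn ^:static helper ^void [] ...) => internal linkage: invisible outside this .c file */
static int helper_calls = 0;   /* illustration only */
static void helper(void) {
    helper_calls++;
}

/* (defn ^:inline fast-add ^int [^int a ^int b] (+ a b)) */
static inline int fast_add(int a, int b) {
    return a + b;
}

/* (defn ^:export api-func ^void [] ...): GCC/Clang syntax. Keeps the symbol
   exported even if the library is built with -fvisibility=hidden; MSVC would
   need __declspec(dllexport) instead. */
__attribute__((visibility("default"))) void api_func(void) {
    helper();
}
```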

Variadic Functions

Note

We should stick to the Clojure norms here. What would the typing be?

[!NOTE] 🤖 Tricky because C's ... is untyped. Options:

  1. Untyped varargs (match C): & args compiles to ..., no type annotation possible
  2. va_list passthrough: ^va_list args for functions that just forward
  3. Typed array alternative: For type safety, encourage ^int* args ^size_t n instead

The & args syntax is Clojure-idiomatic. Just document that it maps to C's ... and is inherently untyped. The type annotation would go on the non-variadic params only.

Yeah, varargs seem a bit weird in C. The typing seems fine (as untyped). I've updated the code; what would this compile to?

[!NOTE] 🤖 This is tricky. The issue is that C's ... doesn't give you a handle to the args; you need a va_list to access them. The vprintf approach only works if you have a va_list. Three options:

Option A: Function can't access args directly

(defn my-printf ^int [^char* fmt & ...]  ;; & ... means "passthrough only"
  (printf fmt ...))  ;; special syntax to forward varargs

But C can't forward ... to another variadic function directly.

Option B: Use va_list explicitly

(defn my-vprintf ^int [^char* fmt ^va_list args]
  (vprintf fmt args))

This is honest about what C can actually do.

Option C: Macro that wraps

(defmacro my-printf [fmt & args]
  `(printf ~fmt ~@args))

Expands at compile time, so varargs work naturally.

I'd suggest: & args compiles to ... but is only usable for calling other variadic functions via a compiler-generated wrapper that uses va_start/va_end. Or just document that variadics are best handled via macros.

Hmm, I think ... is out because it doesn't match any existing Clojure syntax. How does Squint handle this? Is there other prior art for Clojure-like languages? What's the minimum we can support here that's strictly Clojure? Maybe just not supporting varargs at all to start with?

[!NOTE] 🤖 How Squint handles varargs: Squint's & args compiles to JS rest parameters (...args), which are proper arrays. JS handles this natively, so no special treatment is needed.

Prior art:

  • Carp: Has (defn foo [x & rest] ...), but rest becomes an Array, with no direct C varargs mapping
  • ClojureScript: Same as Squint, relies on JS
  • C itself: Varargs are fundamentally different from collected args; you can't iterate them without macros

Minimum viable approach: Not supporting & args for C varargs initially is reasonable. Options:

  1. Defer: Document that C variadics are out of scope for v1. Users use macros or explicit arrays.
  2. Macro-only: & args only works in defmacro, expands at compile time
  3. va_list explicit: Support ^va_list as a type for wrapping existing variadic C functions

;; Option 3: Explicit va_list for wrapping
(defn my-vprintf ^int [^char* fmt ^va_list args]
  (vprintf fmt args))

;; Calling variadic C functions works naturally
(printf "hello %s %d" name age)  ;; just emits printf("hello %s %d", name, age)

I'd suggest: defer defining variadic functions but allow calling C variadic functions freely. Most use cases are calling printf/sprintf, not defining new variadics.

;; Calling variadic C functions works
(printf "Value: %d\n" x)

;; Defining variadic functions: use macros instead
(defmacro log [fmt & args]
  `(printf ~fmt ~@args))

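If defining variadic functions were supported later, the compiler-generated va_start/va_end wrapper suggested above would output something like:

#include <stdarg.h>
#include <stdio.h>

int my_printf(char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int result = vprintf(fmt, args);
    va_end(args);
    return result;
}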
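Option 3 (explicit va_list) is the piece C supports directly: a v-function takes an already-started va_list, and only a thin variadic wrapper deals with va_start/va_end. A sketch of that split using the standard vsnprintf; format_into and format_into_v are hypothetical names:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* (defn format-into-v ^int [^char* buf ^size_t n ^char* fmt ^va_list args] ...)
   Receives a va_list someone else started and forwards it, like vprintf. */
int format_into_v(char *buf, size_t n, const char *fmt, va_list args) {
    return vsnprintf(buf, n, fmt, args);
}

/* The variadic front end the design currently defers: only this function
   touches va_start/va_end. Returns the length the output needs (excluding
   the terminating NUL), even when buf is too small or NULL with n == 0. */
int format_into(char *buf, size_t n, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = format_into_v(buf, n, fmt, args);
    va_end(args);
    return written;
}
```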

Structs

Definition (defstruct)

(defstruct Point [^int x ^int y])

Outputs:

typedef struct { int x; int y; } Point;

Forward Declaration

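Uses declare with metadata, consistent with how declare works in Clojure. For this declaration to name the same type that defstruct defines, defstruct would have to emit a tagged struct (typedef struct Node { ... } Node;) rather than the anonymous typedef shown above.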

(declare ^:struct Node)

Outputs:

struct Node;
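A tagged forward declaration is what makes self-referential and mutually recursive structs possible. A sketch of how declare ^:struct and a tagged defstruct could fit together, assuming a Node with val and next fields (the document doesn't define Node's fields) and an extra typedef so later code can write plain Node:

```c
#include <stddef.h>

/* (declare ^:struct Node) */
struct Node;
typedef struct Node Node;   /* assumption: lets later code write Node */

/* A hypothetical (defstruct Node [^int val ^Node* next]) must use the same
   tag so it completes the struct Node declared above. */
struct Node {
    int val;
    Node *next;
};

size_t list_length(const Node *head) {
    size_t n = 0;
    for (const Node *p = head; p != NULL; p = p->next) n++;
    return n;
}
```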

Field Access

(:x point)              ;; point.x (for value)
(:x point-ptr)          ;; point_ptr->x (for pointer)

Field Mutation

Two approaches: assoc for copy semantics (stack structs), assoc! for in-place mutation (pointers).

;; Copy semantics (returns new struct, original unchanged)
(let [p2 (assoc p1 :x 10)]    ;; p1 unchanged, p2 is copy with new x
  ...)

;; In-place mutation (modifies pointer target)
(assoc! ptr :x 10)            ;; modifies *ptr in place, returns ptr

;; Multiple fields
(assoc point :x 10 :y 20)     ;; copy with both fields changed
(assoc! ptr :x 10 :y 20)      ;; mutate both fields

Outputs:

// assoc (copy)
Point p2 = p1;
p2.x = 10;

// assoc! (mutate)
ptr->x = 10;
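The copy/mutate split falls straight out of C's value semantics for structs. The design emits assoc inline; the functions below wrap each form only so the behaviour can be checked, and their names are illustrative:

```c
#include <stdbool.h>

typedef struct { int x; int y; } Point;

/* (assoc p :x 10): C passes and returns structs by value, so assigning to
   the parameter changes only this function's copy. */
Point point_with_x(Point p, int x) {
    p.x = x;
    return p;
}

/* (assoc! ptr :x 10): writes through the pointer and returns ptr. */
Point *point_set_x(Point *ptr, int x) {
    ptr->x = x;
    return ptr;
}

/* Usage: p1 keeps its value after assoc, but not after assoc!. */
bool assoc_semantics_hold(void) {
    Point p1 = {1, 2};
    Point p2 = point_with_x(p1, 10);
    bool copy_ok = p1.x == 1 && p2.x == 10 && p2.y == 2;
    point_set_x(&p1, 10);
    return copy_ok && p1.x == 10;
}
```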

Struct Modifiers

(defstruct ^:packed NetHeader [^uint16 len ^uint8 type])
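The section doesn't show the output for ^:packed. Assuming GCC/Clang, it could map to __attribute__((packed)) (MSVC would need #pragma pack instead); comparing sizes against the unpacked layout shows what the modifier buys:

```c
#include <stdint.h>

/* (defstruct ^:packed NetHeader [^uint16 len ^uint8 type]), GCC/Clang form */
typedef struct __attribute__((packed)) {
    uint16_t len;
    uint8_t type;
} NetHeader;

/* The same fields without the attribute, for comparison: the compiler adds
   a padding byte after type so arrays of these keep len 2-byte aligned on
   common ABIs. */
typedef struct {
    uint16_t len;
    uint8_t type;
} NetHeaderUnpacked;
```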

Bit Fields

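Use metadata maps with the :bits key, consistent with the const/volatile pattern. Note that C only guarantees bit-fields of type _Bool (bool), int, signed int, and unsigned int; uint8_t bit-fields are implementation-defined, though GCC and Clang accept them.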

(defstruct Flags
  [^{:type uint8 :bits 1} read
   ^{:type uint8 :bits 1} write
   ^{:type uint8 :bits 1} exec
   ^{:type uint8 :bits 5} _reserved])

Outputs:

typedef struct {
    uint8_t read : 1;
    uint8_t write : 1;
    uint8_t exec : 1;
    uint8_t _reserved : 5;
} Flags;

Enums and Unions

Simple Enum

(defenum Color
  RED
  GREEN
  BLUE)

Outputs:

typedef enum { RED, GREEN, BLUE } Color;

Tagged Union (Sum Type)

Note

Hmm, this is interesting. It's wrapping up a common C pattern in a nice syntax. I wonder if we could instead have the struct syntax support unions, and have a special macro for tagged unions?

[!NOTE] 🤖 Good instinct. You could decompose it:

;; Low-level: raw union (no tag)
(defunion IntCollData
  [^int* array-data
   ^IntNode* list-head])

;; Low-level: manual tagged struct
(defstruct IntColl
  [^IntCollTag tag
   ^IntCollData data])

;; High-level: macro that generates both
(deftagged IntColl
  (Array ^int* data ^size_t len)
  (List ^IntNode* head))

This way defunion and defstruct are primitive, and deftagged (or a user macro) builds on them. More composable, less magic in the core.

I was thinking more along the lines of struct and union functions, like:

(def s
  (struct
    [^int* tag
     (union [^(struct [^int* data ^size_t len]) array])
     (union [^(struct [^IntNode* head]) list])]))

Or something like that anyway, I'm not sure the above would work. What would?

[!NOTE] 🤖 The idea of composable struct and union forms is good. Here's a syntax that would work:

;; Named union (like defstruct but for unions)
(defunion Value
  [^int i
   ^float f
   ^char* s])

;; Anonymous union inline in a struct
(defstruct Event
  [^EventType type
   ^(union [^KeyData key]
           [^MouseData mouse]) data])

;; Anonymous struct inline (less common but possible)
(defunion Packet
  [^(struct [^uint16 len ^uint8 flags]) header
   ^uint32 raw])

The key insight: ^(union ...) and ^(struct ...) as type annotations for anonymous inline types. This parallels ^(array int 10) and ^(fn int -> int).

For a tagged union specifically:

(defstruct IntColl
  [^IntCollTag tag
   ^(union [^(struct [^int* data ^size_t len]) array]
           [^(struct [^IntNode* head]) list]) data])

This compiles to:

typedef struct {
    IntCollTag tag;
    union {
        struct { int* data; size_t len; } array;
        struct { IntNode* head; } list;
    } data;
} IntColl;
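Building a value of this type means setting the tag and the matching union member together. A sketch using C99 designated initializers; IntNode's fields and the int_coll_from_* constructor names are assumptions, since the document only refers to IntNode*:

```c
#include <stddef.h>

/* Assumed shape for IntNode */
typedef struct IntNode { int val; struct IntNode *next; } IntNode;
typedef enum { IntColl_Array, IntColl_List } IntCollTag;

/* The struct from above, with the union field named data */
typedef struct {
    IntCollTag tag;
    union {
        struct { int *data; size_t len; } array;
        struct { IntNode *head; } list;
    } data;
} IntColl;

/* Hypothetical constructors: each sets the tag and exactly one variant,
   which keeps the "tag says which member is live" contract. */
IntColl int_coll_from_array(int *data, size_t len) {
    return (IntColl){ .tag = IntColl_Array, .data.array = { data, len } };
}

IntColl int_coll_from_list(IntNode *head) {
    return (IntColl){ .tag = IntColl_List, .data.list = { head } };
}
```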

The deftagged macro could then be sugar that generates both the enum and this struct pattern.

I was thinking that, rather than it being a type function (which, on further inspection, doesn't seem very Clojure-y), union would be a function that wraps the union {...} syntax:

(defstruct IntColl
  [^IntCollTag tag
   (union [^(struct [^int* data ^size_t len]) array]
          [^(struct [^IntNode* head]) list]) data])

Possibly the same would be true for struct? Although I'm not sure how that would work syntactically.

Could you consider the above, and also list any prior art there is for function-call-like type hints in Clojure itself (or its libraries)?

[!NOTE] 🤖 I like the (union ...) form approach; it's more Clojure-y than type annotations. Here's how it could work:

(defstruct IntColl
  [^IntCollTag tag
   (union
     [^int* data ^size_t len]    ;; variant 1: array
     [^IntNode* head])           ;; variant 2: list
   ])

The challenge is naming the variants and the union field itself. Options:

;; Option A: Named variants inline
(defstruct IntColl
  [^IntCollTag tag
   (union :data                           ;; field name
     [:array ^int* data ^size_t len]      ;; variant name + fields
     [:list ^IntNode* head])])

;; Option B: Separate field name
(defstruct IntColl
  [^IntCollTag tag
   ^(union [:array [^int* data ^size_t len]]
           [:list [^IntNode* head]]) data])

Prior art for function-like type hints in Clojure:

  • core.typed: Uses (HMap :mandatory {:x Int}), (Vec Int), (Fn [Int -> Int])
  • Malli: Uses vectors like [:map [:x :int]], [:vector :int]
  • Spec: Uses (s/keys :req [:x]), (s/coll-of int?)

These all describe complex types with nested forms (call-like lists, or data vectors in Malli's case) rather than new reader syntax. So (union ...) and (struct ...) as forms (not type annotations) fit the pattern. The key question is: are they forms that emit code, or type annotations?

I'd suggest: (union ...) is a form within defstruct that describes an anonymous union field:

(defstruct IntColl
  [^IntCollTag tag
   (union data                    ;; field name first
     (variant :array [^int* ptr ^size_t len])
     (variant :list [^IntNode* head]))])

For runtime type choice:

(defenum IntColl
  (Array ^int* data ^size_t len)
  (List ^IntNode* head))

Outputs:

typedef enum { IntColl_Array, IntColl_List } IntCollTag;
typedef struct {
    IntCollTag tag;
    union {
        struct { int* data; size_t len; } array;
        struct { IntNode* head; } list;
    };
} IntColl;

Pattern Matching on Enums

Note

This is definitely a step too far, I think; this could be defined as a macro in user code, IMO.

[!NOTE] 🤖 Agreed. If deftagged is a user-space macro, then match should be too. The core compiler only needs to handle:

  • switch on an enum value
  • Field access on structs/unions

A user can build match from those primitives. Keeps the core minimal.

Great, before we remove this: How would a user handle this in user code with the primitives we provide?

[!NOTE] 🤖 With primitives only, the user would write:

(case (:tag coll)
  IntColl_Array (let [data (:data (:array coll))
                      len (:len (:array coll))]
                  (sum-array data len))
  IntColl_List (let [head (:head (:list coll))]
                 (sum-list head)))

Then a user-defined match macro could expand to this. The macro would:

  1. Extract the tag field name from the type
  2. Generate case with enum values
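  3. Bind destructured fields via let

;; Helper used by the macro (runs at expansion time): binds each name to the
;; same-named field of the variant, e.g.
;; [data len] => [data (:data (:array coll)) len (:len (:array coll))]
(defn- destructure-bindings [val variant bindings]
  (mapcat (fn [b] [b `(~(keyword (name b)) (~variant ~val))]) bindings))

(defmacro match [val & clauses]
  (let [tag-sym (gensym "tag")]
    `(let [~tag-sym (:tag ~val)]
       (case ~tag-sym
         ~@(mapcat (fn [[pattern body]]
                     (let [[type-name & bindings] pattern
                           ;; union members are lowercase: IntColl/Array => :array
                           variant (keyword (.toLowerCase (name type-name)))]
                       ;; IntColl/Array => IntColl_Array (the C enum constant)
                       [(symbol (str (namespace type-name) "_" (name type-name)))
                        `(let [~@(destructure-bindings val variant bindings)]
                           ~body)]))
                   (partition 2 clauses))))))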

This keeps the compiler simple (just case, field access, and let) while allowing rich pattern matching via macros.

(match coll
  (IntColl/Array data len) (sum-array data len)
  (IntColl/List head) (sum-list head))

Outputs:

switch (coll.tag) {
    case IntColl_Array: sum_array(coll.array.data, coll.array.len); break;
    case IntColl_List: sum_list(coll.list.head); break;
}
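Filled out into a complete translation unit, the switch above looks like the following. sum_array and sum_list are the functions named in the example; their bodies and IntNode's fields are assumptions, and sum_coll returns the result instead of discarding it so it can be tested:

```c
#include <stddef.h>

typedef struct IntNode { int val; struct IntNode *next; } IntNode;
typedef enum { IntColl_Array, IntColl_List } IntCollTag;

/* Output of (defenum IntColl ...): the union is anonymous (C11), so its
   members are reached as coll.array / coll.list directly. */
typedef struct {
    IntCollTag tag;
    union {
        struct { int *data; size_t len; } array;
        struct { IntNode *head; } list;
    };
} IntColl;

static int sum_array(const int *data, size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; i++) total += data[i];
    return total;
}

static int sum_list(const IntNode *head) {
    int total = 0;
    for (; head != NULL; head = head->next) total += head->val;
    return total;
}

/* What the case form (or the match macro) lowers to */
int sum_coll(IntColl coll) {
    switch (coll.tag) {
    case IntColl_Array: return sum_array(coll.array.data, coll.array.len);
    case IntColl_List:  return sum_list(coll.list.head);
    }
    return 0; /* only reached if tag holds an invalid value */
}
```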

Standalone Union

Note

I suppose this would be something like union under the hood, but would compile to (def Value (union ...)).

[!NOTE] 🤖 Yes, defunion would be analogous to defstruct but for C unions. It could also support anonymous unions inline in structs:

(defstruct Event
  [^EventType type
   ^(union [^KeyEvent key]
           [^MouseEvent mouse]) data])

This mirrors how C allows anonymous unions inside structs. The (union ...) form creates a union type inline.

If we went with union as a function, this could be the def from above. Does this make more sense now?

[!NOTE] 🤖 Yes! With union as a form (not a type annotation), defunion becomes consistent:

;; defunion is sugar for a union with named fields
(defunion Value
  [^int i
   ^float f
   ^char* s])

;; Equivalent to defining a named union type

And inline in structs:

(defstruct Event
  [^EventType type
   (union data                    ;; field name
     [^KeyEvent key]              ;; variant: key
     [^MouseEvent mouse])])       ;; variant: mouse

The pattern is consistent: defstruct/defunion for top-level definitions, (union ...) as a form within struct definitions for anonymous inline unions.

(defunion Value
  [^int i
   ^float f
   ^char* s])
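defunion Value would presumably compile to a plain C union, mirroring the typedef style defstruct uses. A sketch of the output and the property worth remembering about it; value_roundtrip_float is an illustrative helper:

```c
/* (defunion Value [^int i ^float f ^char* s]) => */
typedef union {
    int i;
    float f;
    char *s;
} Value;

/* Every member starts at the same address, so the union is as large as its
   largest member (plus any alignment padding). Nothing inside it records
   which member was last written, which is why the tagged-struct pattern
   above keeps a separate tag. */
float value_roundtrip_float(float x) {
    Value v;
    v.f = x;
    return v.f;
}
```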

Memory Management

No Garbage Collection

User is responsible for all memory management. The language provides no automatic memory management.

Standard C Functions

(malloc (sizeof Point))
(calloc n (sizeof int))
(realloc ptr new-size)
(free ptr)

Stack vs Heap

;; Stack allocation
(let [point (Point 10 20)]     ;; Point point = {10, 20};
  ...)

;; Stack array
(let [arr (stack-array int 100)]   ;; int arr[100];
  ...)

;; Heap allocation
(let [^Point* ptr (malloc (sizeof Point))]   ;; Point* ptr = malloc(sizeof(Point));
  ...
  (free ptr))
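In C terms the difference between the two let forms is storage duration: the stack struct disappears at the end of its block, while the heap one lives until free. A sketch of both, plus calloc's zero-initialization; the function names are illustrative:

```c
#include <stdlib.h>

typedef struct { int x; int y; } Point;

/* (let [point (Point 10 20)] ...): automatic storage, gone when the block ends */
int stack_point_sum(void) {
    Point point = {10, 20};
    return point.x + point.y;
}

/* (let [ptr (malloc (sizeof Point))] ... (free ptr)) */
int heap_point_sum(void) {
    Point *ptr = malloc(sizeof(Point));
    if (ptr == NULL) return -1;   /* malloc can fail; nothing checks this for you */
    ptr->x = 10;
    ptr->y = 20;
    int sum = ptr->x + ptr->y;
    free(ptr);                    /* the user's job: there is no GC */
    return sum;
}

/* calloc zero-initializes; malloc leaves the contents indeterminate */
int calloc_is_zeroed(size_t n) {
    int *arr = calloc(n, sizeof(int));
    if (arr == NULL) return 0;
    int ok = 1;
    for (size_t i = 0; i < n; i++) ok = ok && arr[i] == 0;
    free(arr);
    return ok;
}
```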

Sizeof

(sizeof int)
(sizeof Point)
(sizeof (deref ptr))

Control Flow

Conditionals

;; if (two branches)
(if (> x 0)
  (positive x)
  (non-positive x))

;; when (single branch)
(when (nil? ptr)
  nil)

;; cond (multiple conditions)
(cond
  (< x 0) (puts "negative")
  (> x 0) (puts "positive")
  :else   (puts "zero"))
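The output of cond isn't shown above. One plausible lowering in statement position is an if / else if / else chain with :else as the final else; this sketch returns the string instead of calling puts so the result can be checked, and sign_name is an illustrative name:

```c
/* (cond (< x 0) ... (> x 0) ... :else ...) as an if / else if / else chain */
const char *sign_name(int x) {
    if (x < 0) {
        return "negative";
    } else if (x > 0) {
        return "positive";
    } else {          /* :else */
        return "zero";
    }
}
```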

Loops

;; loop/recur (compiles to for/while)
(loop [i 0
       sum 0]
  (if (>= i n)
    sum
    (recur (inc i) (+ sum (aget arr i)))))

;; dotimes
(dotimes [i 10]
  (printf "%d\n" i))

;; doseq (over array with known length)
(doseq [x xs]
  (printf "%d\n" x))

;; while
(while (not (nil? node))
  (printf "%d\n" (:val node))
  (set! node (:next node)))
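One detail the loop/recur lowering has to get right: recur rebinds all loop variables at once, so each new value must be computed from the old bindings before any of them is overwritten. A sketch of the sum loop above written that way, assuming it is the whole body of a function named sum_loop (an illustrative name):

```c
#include <stddef.h>

/* (loop [i 0 sum 0]
     (if (>= i n) sum (recur (inc i) (+ sum (aget arr i))))) */
int sum_loop(const int *arr, size_t n) {
    size_t i = 0;
    int sum = 0;
    for (;;) {
        if (i >= n) {
            return sum;
        }
        /* recur: compute every new binding from the old ones first... */
        size_t next_i = i + 1;
        int next_sum = sum + arr[i];   /* still reads the old i */
        /* ...then assign them all */
        i = next_i;
        sum = next_sum;
    }
}
```

A compiler could skip the temporaries when no new value reads a variable that is assigned before it, which is how the tidier for loops elsewhere in this document come out.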

Early Return

No explicit return form. Use loop/recur with conditional logic instead; it compiles to the same C and is more functional:

(defn find ^int* [^int* arr ^size_t len ^int target]
  (loop [i 0]
    (cond
      (>= i len) NULL
      (= (aget arr i) target) (+ arr i)
      :else (recur (inc i)))))

Outputs:

int* find(int* arr, size_t len, int target) {
    for (size_t i = 0; i < len; i++) {
        if (arr[i] == target) return arr + i;
    }
    return NULL;
}

Break and Continue

No explicit break or continue. Use loop/recur with conditional logic:

;; Print even numbers less than 50
(loop [i 0]
  (when (< i 50)
    (when (even? i)
      (printf "%d\n" i))
    (recur (inc i))))

Outputs:

for (int i = 0; i < 50; i++) {
    if (i % 2 == 0) {
        printf("%d\n", i);
    }
}

Arrays and Pointers

Array Access

(aget arr i)           ;; arr[i]
(aset arr i val)       ;; arr[i] = val
(aget matrix r c)      ;; matrix[r][c]

Pointer Operations

Note

Would deref work better for getting the address of a pointer? Possibly not. I don't want to introduce much, if any, new syntax to C. Is pointer arithmetic compiled to something different in C? Or can it use the same arithmetic operators as in Clojure (+, -, etc.)?

Casting is interesting; there's likely some better syntax, though. Possibly:

  • (my-func ^void* ptr) - cast when tagged at usage site (not definition)

[!NOTE] 🤖 Pointer arithmetic: In C, ptr + n is type-aware (adds n * sizeof(*ptr)). Using (+ ptr 5) would work if the compiler knows ptr is a pointer type; it can emit ptr + 5 directly. No need for ptr+ if we have type info.

Casting at call site: I like (my-func ^void* ptr); the metadata says "cast this before passing". Clean and consistent with type annotations. Could work for any expression:

(some-func ^int (+ x y))  ;; (int)(x + y)
(other-func ^void* ptr)   ;; (void*)ptr

deref vs addr: deref for *ptr is Clojure-ish (atoms use deref). For address-of, an & reader macro would be nice but might conflict. (addr x) is fine, or even just a prefix &x if the reader supports it.

Great. For arithmetic, just use the existing operators (+, -, etc.). For casting, add a note and examples.

I definitely don't want to add any reader syntax, as Clojure has barely any of that. The problem with implementing deref for a value to get its pointer is that we can no longer use deref for other types of values. What other options do we have that are nice and ergonomic, but still fit all my requirements?

[!NOTE] 🤖 You're right that deref is overloaded in Clojure (atoms, refs, agents, futures). Options for pointer dereference:

  1. deref with type dispatch: If compiler knows it's a pointer type, deref means *ptr. Risk: confusion.
  2. * as a function: (* ptr), but it conflicts with multiplication. Bad.
  3. ptr-get/ptr-set!: Explicit, clear, ugly.
  4. @ reader macro: @ptr. Clojure's reader already expands @x to (deref x), so this comes for free with options 1 or 6 rather than being new reader syntax.
  5. .- field access style: (.- ptr), too cryptic.
  6. Just use deref: It's the closest Clojure concept. The compiler knows types, so (deref atom-like-thing) vs (deref ^int* ptr) are distinguishable.

Recommendation: Keep deref for pointer dereference. In a C-targeting language, you won't have Clojure atoms/refs anyway. The mental model is "deref gets the value at a reference", which is exactly what *ptr does.

For address-of, addr is fine and readable.

;; Pointer dereference
(deref ptr)            ;; *ptr

;; Address-of
(addr x)               ;; &x

;; Pointer arithmetic (just use regular operators)
(+ arr 5)              ;; arr + 5 (type-aware)
(- p q)                ;; p - q

;; Casting at call site
(func ^void* ptr)      ;; func((void*)ptr)
(func ^int (+ x y))    ;; func((int)(x + y))
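A compilable sketch of these mappings. The point to notice is that + and - on pointers are scaled by the element size, and subtracting two pointers gives an element count (ptrdiff_t). Function names are illustrative:

```c
#include <stddef.h>

/* (addr x) => &x, (deref ptr) => *ptr */
int deref_of_addr(int x) {
    int *ptr = &x;
    return *ptr;
}

/* (+ arr 5) => arr + 5: scaled by sizeof(*arr), so this reads arr[5],
   not whatever sits 5 bytes past arr */
int read_fifth(const int *arr) {
    return *(arr + 5);
}

/* (- p q) with two pointers into the same array => element count */
ptrdiff_t pointer_distance(const int *p, const int *q) {
    return p - q;
}

ptrdiff_t pointer_distance_demo(void) {
    int arr[10] = {0};
    return pointer_distance(&arr[7], &arr[2]);
}

/* (func ^int (+ x y)) => func((int)(x + y)): the cast truncates toward zero */
int truncated_sum(double x, double y) {
    return (int)(x + y);
}
```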

Array Literals

Note

Feels like the syntax for a stack array can be worked on a bit more

[!NOTE] 🤖 Options:

;; Current
(stack-array int 100)

;; More Clojure-ish: make-array with type
(make-array int 100)        ;; Clojure has this!

;; Using type annotation on literal
^(array int 100) []         ;; empty fixed array

;; Just use let binding with type
(let [^(array int 100) arr]  ;; declares int arr[100]
  ...)

make-array exists in Clojure and is idiomatic. The distinction between stack/heap could be context-dependent (inside let = stack, returned = must be heap).

I like make-array. Would metadata be the wrong way to mark whether it's on the stack or not? Hmm, how is heap allocation done in C? Isn't it done explicitly with malloc/calloc? I don't want things to magically end up on the heap unless the user explicitly does that, or calls a function that is documented to do it.

[!NOTE] 🤖 You're exactly right. In C:

  • Stack: int arr[100]; (automatic storage, no explicit allocation)
  • Heap: int* arr = malloc(100 * sizeof(int)); (explicit allocation)

The philosophy of "no magic" means heap allocation should always be explicit via malloc/calloc. So:

;; Stack array (automatic storage)
(make-array int 100)         ;; int arr[100]; on stack

;; Heap array (explicit)
(malloc (* 100 (sizeof int)))  ;; malloc(100 * sizeof(int))
(calloc 100 (sizeof int))      ;; calloc(100, sizeof(int))

make-array is always stack. No metadata needed because heap allocation is a separate, explicit operation. This matches C's mental model exactly.

For initialized arrays:

(int-array [1 2 3 4 5])      ;; (int[]){1, 2, 3, 4, 5}
(make-array int 100)          ;; int arr[100]
;; Stack arrays
(make-array int 100)         ;; int arr[100]
(int-array [1 2 3 4 5])      ;; (int[]){1, 2, 3, 4, 5}

;; Heap arrays (explicit allocation)
(malloc (* n (sizeof int)))  ;; malloc(n * sizeof(int))
(calloc n (sizeof int))      ;; calloc(n, sizeof(int))
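The three forms side by side in C, as a sketch. Because make-array and int-array both produce automatic storage, they stop existing when the enclosing function returns, exactly as in hand-written C; only the calloc'd buffer outlives it, and only until free. array_forms_demo is an illustrative name:

```c
#include <stdlib.h>

int array_forms_demo(size_t n) {
    /* (make-array int 100) => int arr[100]; automatic storage, uninitialized,
       valid only until this function returns (returning it would dangle) */
    int arr[100];
    /* (int-array [1 2 3 4 5]) => (int[]){1, 2, 3, 4, 5}; a C99 compound
       literal, also automatic storage when it appears inside a function */
    int *lit = (int[]){1, 2, 3, 4, 5};
    /* (calloc n (sizeof int)) => calloc(n, sizeof(int)); heap, zeroed,
       and freeing it is the caller's job */
    int *heap = calloc(n, sizeof(int));
    if (heap == NULL) return -1;

    arr[99] = lit[4];                /* 5 */
    heap[n - 1] = arr[99] + lit[0];  /* 5 + 1 = 6 */
    int result = heap[n - 1];
    free(heap);
    return result;
}
```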

Operators

Arithmetic

(+ a b)    (- a b)    (* a b)    (/ a b)
(mod a b)  (inc x)    (dec x)
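One semantic gap to decide on: Clojure's mod is floored, so its result takes the sign of the divisor, while C's % truncates toward zero and takes the sign of the dividend (Clojure's rem matches C's %). Likewise (/ 7 2) is a ratio in Clojure but 3 for C ints. If (mod a b) should keep Clojure's meaning, it can't be emitted as a bare a % b; a sketch of a floored version, where clj_mod and clj_rem are hypothetical helper names:

```c
/* Floored modulo, matching Clojure's (mod a b): -7 mod 3 => 2 */
int clj_mod(int a, int b) {
    int r = a % b;   /* truncating: -7 % 3 == -1 */
    /* shift into the divisor's sign when the remainder is nonzero and its
       sign differs from b's */
    if (r != 0 && ((r < 0) != (b < 0))) {
        r += b;
    }
    return r;
}

/* Clojure's (rem a b) already matches C's % */
int clj_rem(int a, int b) {
    return a % b;
}
```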

Comparison

(= a b)    (not= a b)
(< a b)    (> a b)    (<= a b)    (>= a b)
(zero? x)  (nil? ptr)

Logical

(and a b)  (or a b)  (not x)

Bitwise

(bit-and x y)    ;; x & y
(bit-or x y)     ;; x | y
(bit-xor x y)    ;; x ^ y
(bit-not x)      ;; ~x
(bit-shift-left x n)     ;; x << n
(bit-shift-right x n)    ;; x >> n

Macros

Compile-Time Expansion

Macros run on the JVM at compile time (like Squint uses SCI). They expand to language constructs, which then compile to C.

(defmacro unless [test & body]
  `(when (not ~test) ~@body))

(unless (nil? ptr)
  (process ptr))

seq Operations as Macros

Functions like map, filter, reduce are macros that expand to loops:

Note

I think it's worth noting the reason for this. AFAIK it's because of typing: otherwise you would only be able to define reduce for one type signature, which is limiting.

One thing that might need to be expanded on is how the types do get inferred by macros and here specifically What does the macro look like to do that?

[!NOTE] 🤖 Yes, the core reason is that C has no generics (C11's _Generic only selects among expressions written in advance for each type), so you can't write one reduce function that works on all types.

How macros get type info: The macro would need access to compiler metadata. Options:

  1. Explicit type args: (reduce ^int + 0 xs) β€” user specifies accumulator type
  2. Infer from initial value: 0 is int, 0.0 is double β€” macro deduces type
  3. Infer from collection type: If xs is annotated ^int*, macro knows element type

The macro might look like:

(defmacro reduce [f init coll]
  (let [acc-type (or (type-meta init) (element-type coll))
        elem-type (element-type coll)]
    `(loop [acc# ~init, i# 0]
       (if (>= i# (len ~coll))
         acc#
         (recur (~f acc# (aget ~coll i#)) (inc i#))))))

The type-meta and element-type would be compiler-provided functions available at macro-expansion time.

Neither of those types is actually used. Are they even needed if it basically compiles to:

(let [f +, init 0, coll xs]
  (loop [acc init, i 0]
    (if (>= i (len coll))
      acc
      (recur (f acc (aget coll i)) (inc i)))))

Basically, types aren't checked during macro expansion.

[!NOTE] 🤖 You're right: the macro doesn't need type info. It just expands to code, and the C compiler handles types. The expanded code uses the same variables, so types flow through naturally:

(reduce + 0 xs)  ;; xs is ^int*, 0 is int

Expands to code where acc inherits type from 0 (int), and (aget xs i) inherits from xs (int). The C compiler sees:

int acc = 0;
for (...) { acc = acc + xs[i]; }

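No type tracking is needed in the macro; it's just textual expansion, and the C type system does the rest. One caveat: a C declaration needs a written type, so to emit int acc = 0; the transpiler itself must still pick int (from the literal, or from xs's ^int* annotation), unless it emits C23's auto or GCC's __auto_type instead.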

(reduce + 0 xs)

Expands to (assuming the macro can see the length of xs, written here as xs_len):

int acc = 0;
for (size_t i = 0; i < xs_len; i++) {
    acc = acc + xs[i];
}

Closures

Note

This is an area I'm still unsure about. It feels like the language should pick up Clojure's semantics here, but I don't like the idea of this language being too "thick" a layer over C. This feels like something to consider more broadly across the whole language. Should most of the language be 1:1 transpilation, with some features as improvements? Should we make everything fully representable? Perhaps let's come back to this after considering more of the rest of the design. We should also consider prior art from other transpiled languages: what behaviour would be more or less surprising to a user?

[!NOTE] 🤖 Prior art from other transpilers:

  • Carp: Closures that capture must be explicit with copy or reference. No implicit heap allocation.
  • Nim: Has closures but they're heap-allocated with GC. Explicit {.closure.} pragma.
  • Zig: No closures at all. You pass context explicitly.
  • Rust: Closures work, but ownership/borrowing makes captures explicit.

Spectrum of options:

  1. Zig-style (thin): No closures. Always pass context explicitly. Most C-like.
  2. Carp-style (medium): Inline closures work, escaping closures require explicit handling.
  3. Full closures (thick): Compiler generates env structs automatically, heap allocates. Magic.

I'd suggest Carp-style: inline closures "just work" (no allocation), escaping closures are a compiler error that tells you to restructure. This is honest about C's limitations while still being ergonomic for common cases.

Could you show me various examples of the medium approach? What would it look like to handle it "manually" in that case?

[!NOTE] 🤖 Medium approach examples:

1. Inline closure (works automatically):

(let [multiplier 10]
  (map (fn [x] (* x multiplier)) xs))

Compiler inlines multiplier since it's in scope. No allocation.

2. Callback to same-scope function (works):

(defn process-items ^void [^int* items ^size_t len]
  (let [threshold 50]
    (filter (fn [x] (> x threshold)) items len)))

threshold is visible where the closure is used. Fine.

3. Escaping closure (compiler error):

(defn make-adder ^??? [^int n]
  (fn [x] (+ x n)))  ;; ERROR: closure escapes, captures `n`

Compiler says: "Closure captures n but escapes scope. Use explicit env struct."

4. Manual handling of escaping case:

;; Define environment struct
(defstruct AdderEnv [^int n])

;; Function that takes env explicitly
(defn adder-fn ^int [^AdderEnv* env ^int x]
  (+ x (:n env)))

;; "Constructor" returns env, not closure
(defn make-adder ^AdderEnv [^int n]
  (AdderEnv n))

;; Usage: pass function + env separately
(let [env (make-adder 10)]
  (adder-fn (addr env) 5))  ;; => 15

This is explicit, zero-magic, and maps directly to C patterns.
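The same example as plain C, to show there is nothing hidden: the env is an ordinary struct returned by value, and the caller passes its address. A sketch with the document's names converted to snake_case; adder_demo is an illustrative wrapper for the usage form:

```c
/* (defstruct AdderEnv [^int n]) */
typedef struct { int n; } AdderEnv;

/* (defn adder-fn ^int [^AdderEnv* env ^int x] (+ x (:n env))) */
int adder_fn(const AdderEnv *env, int x) {
    return x + env->n;
}

/* (defn make-adder ^AdderEnv [^int n] ...): returned by value,
   so no heap allocation is involved */
AdderEnv make_adder(int n) {
    return (AdderEnv){ n };
}

/* (let [env (make-adder 10)] (adder-fn (addr env) 5)) */
int adder_demo(void) {
    AdderEnv env = make_adder(10);
    return adder_fn(&env, 5);
}
```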

Inline Closures (Work)

Closures that don't escape their scope compile to inline code:

(let [offset 10]
  (reduce (fn [acc x] (+ acc x offset)) 0 xs))

offset is in scope, so this just inlines to a loop with offset visible.

Returned Closures (Require Explicit Env)

Closures that escape require user-defined environment structs:

(defstruct OffsetEnv [^int n])

(defn make-offset-env ^OffsetEnv [^int n]
  (OffsetEnv n))

(defn apply-offset ^int [^OffsetEnv* env ^int acc ^int x]
  (+ acc x (:n env)))

;; Usage
(let [env (make-offset-env 10)]
  (reduce-with-env apply-offset (addr env) 0 xs))
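reduce-with-env isn't defined above. A minimal sketch of what it could expand to, assuming it passes the env pointer as the first argument of f on every step; this is the same function-pointer-plus-context shape many C APIs use for callbacks. OffsetStep is a hypothetical typedef name:

```c
#include <stddef.h>

typedef struct { int n; } OffsetEnv;

/* (defn apply-offset ^int [^OffsetEnv* env ^int acc ^int x] (+ acc x (:n env))) */
int apply_offset(const OffsetEnv *env, int acc, int x) {
    return acc + x + env->n;
}

typedef int (*OffsetStep)(const OffsetEnv *env, int acc, int x);

/* The "closure" travels as a plain function pointer plus its environment */
int reduce_with_env(OffsetStep f, const OffsetEnv *env, int init,
                    const int *xs, size_t len) {
    int acc = init;
    for (size_t i = 0; i < len; i++) {
        acc = f(env, acc, xs[i]);
    }
    return acc;
}
```

With n = 10 and xs = {1, 2, 3}, each step adds x + 10, so the result is 36.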

Polymorphism

Note

In this area I'm of two minds. I definitely want it to be possible to write code that compiles transparently to C without much ceremony, but it also seems very useful to provide some level of abstraction for areas like this. Maybe there's a "core" set of functions that have zero abstraction, but certain features have more abstraction built in? Maybe functions work on structs and arrays normally, but we provide a Vector (dynamic array) and maps with more features that work with more functions, i.e. you can only use reduce, map, filter, assoc etc. when using types with a specific interface? It would also be nice to support as much of the Clojure standard library as possible with zero abstraction.

[!NOTE] 🤖 I think a layered approach works:

Layer 0 (Core): Pure transpilation, zero magic

  • defn, defstruct, let, if, loop, arithmetic, aget/aset
  • Works on raw C types

Layer 1 (Macros): Zero-cost abstractions via macros

  • reduce, map, filter, doseq: expand to loops
  • Work on anything with known length (arrays, structs with len field)
  • Monomorphize at compile time

Layer 2 (Runtime library, opt-in): Provided data structures

  • Vec, HashMap, String: dynamic, heap-allocated
  • Full assoc, get, conj support
  • Could use Boehm GC or manual memory

This way: simple C code uses Layer 0, functional style uses Layer 1 (still zero-cost), and people who want richer collections opt into Layer 2.

Hmm, is there a concept similar to interfaces we can use? In Clojure there are protocols, which we could consider as something similar.

[!NOTE] 🤖 Protocol-like patterns in C:

1. Compile-time protocols (via macros): A "protocol" could be a contract that macros check at expansion time:

(defprotocol Countable
  (len [coll]))

(extend-protocol Countable
  Vec  (len [v] (:len v))
  Slice (len [s] (:len s))
  (array _ n) (len [a] n))  ;; fixed arrays know their length

When a macro like reduce sees a Countable, it knows how to get the length. This is purely compile-time; there's no runtime vtable.

2. Runtime protocols (function pointers): For actual runtime dispatch, you need explicit vtables:

(defstruct CountableVtable
  [^(fn void* -> size_t) len])

(defstruct Countable
  [^CountableVtable* vtable
   ^void* data])

(defn len ^size_t [^Countable* c]
  ((-> c :vtable :len) (:data c)))

This is explicit, user-defined, no magic.
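For reference, a minimal C sketch of what the runtime-protocol structs above might lower to. The Slice type, slice_len and SLICE_COUNTABLE are hypothetical names added only to show one implementing type; the real code generator's output may differ.

```c
#include <assert.h>
#include <stddef.h>

/* CountableVtable: one function pointer per protocol method. */
typedef struct {
    size_t (*len)(void *data);
} CountableVtable;

/* Countable: a vtable plus an untyped pointer to the concrete value. */
typedef struct {
    CountableVtable *vtable;
    void *data;
} Countable;

/* (defn len ^size_t [^Countable* c] ((-> c :vtable :len) (:data c))) */
static size_t len(Countable *c) {
    return c->vtable->len(c->data);
}

/* Hypothetical concrete type that implements the protocol. */
typedef struct {
    int *data;
    size_t len;
} Slice;

static size_t slice_len(void *data) {
    return ((Slice *)data)->len;
}

/* One shared vtable instance per implementing type. */
static CountableVtable SLICE_COUNTABLE = { slice_len };
```

Every len call goes through a function pointer, so this is the one place where the "no hidden cost" principle is knowingly traded for runtime dispatch.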

3. Monomorphizing macro approach (most practical): reduce, map, etc. are macros that generate type-specific code. They don't need protocols if they can see the type at macro-expansion time.

I'd suggest: start without protocols, see if macros + type annotations are sufficient. Add compile-time protocols later if there's demand.

No Runtime Polymorphism by Default

C has no virtual dispatch. Options:

  1. Monomorphization: generate type-specific functions
  2. Enums: explicit tagged unions with switch
  3. Function pointers: manual vtables (user-implemented)

Monomorphization via Macros

The same macro generates different code based on types:

;; Macro knows xs is an int array of known length and expands accordingly
(reduce + 0 xs)  ;; generates int loop

;; If xs were a double array
(reduce + 0.0 xs)  ;; generates double loop
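A sketch of what the two reduce calls might expand to, assuming xs is a fixed-size array of 4 elements so the macro can see its length. The names sum_ints and sum_doubles are hypothetical; a real expansion would more likely be an inline loop at the call site than a separate function.

```c
#include <assert.h>
#include <stddef.h>

/* (reduce + 0 xs) with xs an int array of length 4 */
static int sum_ints(const int xs[4]) {
    int acc = 0;                      /* init value from the call */
    for (size_t i = 0; i < 4; i++) {  /* length known at compile time */
        acc = acc + xs[i];            /* + lowered to the C operator */
    }
    return acc;
}

/* (reduce + 0.0 xs) with xs a double array of length 4: same shape, different type */
static double sum_doubles(const double xs[4]) {
    double acc = 0.0;
    for (size_t i = 0; i < 4; i++) {
        acc = acc + xs[i];
    }
    return acc;
}
```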

Explicit Enum for Runtime Type Choice

When types aren't known at compile time, the user must use an enum (a tagged union):

(defenum Number
  (Int ^int val)
  (Float ^double val))

(defn add-numbers ^Number [^Number a ^Number b]
  (match [a b]
    [(Number/Int x) (Number/Int y)] (Number/Int (+ x y))
    [(Number/Float x) (Number/Float y)] (Number/Float (+ x y))
    ;; ... mixed cases
    ))
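One plausible C lowering of defenum Number and add-numbers is a tagged union plus a switch on the tag. The tag names, the field layout and the choice to promote mixed Int/Float cases to double are assumptions; the design above leaves the mixed cases open.

```c
#include <assert.h>

typedef enum { NUMBER_INT, NUMBER_FLOAT } NumberTag;

/* (defenum Number (Int ^int val) (Float ^double val)) */
typedef struct {
    NumberTag tag;         /* which variant is active */
    union {
        int int_val;       /* Number/Int */
        double float_val;  /* Number/Float */
    } as;
} Number;

static Number number_int(int v) {
    Number n = { .tag = NUMBER_INT, .as.int_val = v };
    return n;
}

static Number number_float(double v) {
    Number n = { .tag = NUMBER_FLOAT, .as.float_val = v };
    return n;
}

/* Widen either variant to double (used for the mixed cases). */
static double number_as_double(Number n) {
    switch (n.tag) {
    case NUMBER_INT:   return (double)n.as.int_val;
    case NUMBER_FLOAT: return n.as.float_val;
    }
    return 0.0;  /* not reached for valid tags */
}

/* (defn add-numbers ^Number [^Number a ^Number b] (match [a b] ...)) */
static Number add_numbers(Number a, Number b) {
    if (a.tag == NUMBER_INT && b.tag == NUMBER_INT) {
        return number_int(a.as.int_val + b.as.int_val);
    }
    /* Float/Float, plus the assumed Int/Float and Float/Int promotions */
    return number_float(number_as_double(a) + number_as_double(b));
}
```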

C Interop

Note

I don't like any of this, tbh. Is there prior art in Clojure itself for this kind of syntax?

[!NOTE] 🤖 Clojure prior art for native interop:

  • ClojureScript: (ns foo (:require ["react" :as r])), using string requires for JS modules
  • Clojure JNI/JNA: Uses Java interop, not directly comparable
  • Carp: (relative-include "header.h") and (system-include "stdio.h")

Using ns with :require for includes is Clojure-idiomatic. The string-vs-angle-bracket distinction could be:

(ns mylib
  (:require ["stdio.h"]      ;; system: #include <stdio.h>
            ["myheader.h" :local]))  ;; local: #include "myheader.h"

Or simpler: <stdio.h> vs "myheader.h" as literal strings.

I think using the C syntax for requires makes sense:

(ns mylib
  (:require ["<stdio.h>"]
            ["myheader.h"]))

What about :as? Is that a feature that makes sense? Or should we require :refer :all or something, or maybe just support use, which I guess does the same thing?

[!NOTE] 🤖 :as for C headers: Doesn't map well to C. C #include is textual inclusion; there's no namespacing. Everything becomes globally visible, so :as would be misleading.

What makes sense:

  • (:require ["<stdio.h>"]): just includes, all symbols visible
  • No :as, no :refer, since they aren't applicable to C includes

For this project's own modules (other .cljc files), you could potentially have:

(ns mylib
  (:require ["<stdio.h>"]           ;; C header, just include
            [myutils :as u]))        ;; our .cljc module, could prefix

But that requires the compiler to track symbols and generate prefixed names. Low priority: start with just includes, no namespacing. C devs expect global symbols anyway.

Include

Use :require in the ns form with strings. Angle brackets vs quotes detected from string content:

  • "<stdio.h>" → #include <stdio.h> (starts with <)
  • "myheader.h" → #include "myheader.h" (no angle brackets)

(ns mylib
  (:require ["<stdio.h>"]
            ["<stdlib.h>"]
            ["myheader.h"]))

Outputs:

#include <stdio.h>
#include <stdlib.h>
#include "myheader.h"
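The bracket-detection rule is small enough to sketch directly. This is an illustrative C version only (the document has the compiler itself running on the JVM), and emit_include is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Render one :require string as an #include line.
   A spec starting with '<' is emitted verbatim ("<stdio.h>" -> #include <stdio.h>);
   anything else is wrapped in quotes ("myheader.h" -> #include "myheader.h").
   Returns snprintf's result: the length the full line needs. */
static int emit_include(const char *spec, char *out, size_t out_size) {
    assert(strlen(spec) > 0);  /* an empty spec names no header */
    if (spec[0] == '<') {
        return snprintf(out, out_size, "#include %s\n", spec);
    }
    return snprintf(out, out_size, "#include \"%s\"\n", spec);
}
```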

Raw C Escape Hatch

Note

I'm hoping we can avoid this with more thought on the design. I can see there's a use for these when interacting with other C files, but maybe we can have better first-class support for that? What might it look like? Maybe something in the ns form? That might be too specific, though.

[!NOTE] 🤖 Common c-raw use cases and how to replace them:

Use case           Instead of c-raw
Include guards     (ns foo (:guard :once)) or just document "use pragma once"
Pragma once        (pragma :once), a first-class form
Define constants   (def ^:macro NAME value)
Define macros      Use Clojure macros instead
Platform ifdefs    (when-defined _WIN32 ...)
Extern C           (ns foo (:extern-c true)) for C++ compat

You could probably eliminate 90% of c-raw needs. Keep it as an escape hatch but discourage use. If people need it a lot, that's a signal to add first-class support.

I think a pragma form makes sense, which we can include in ns similar to require. There are surely other pragmas that we might want to support in the future too. I think it makes sense for #define to be supported in def; I'm not sure about :macro. What other syntax could we use? Are there libraries that require us to define C macros to use them? I don't like when-defined; maybe we can have a defined? form and pattern match (when/if (defined? ...)). Are there cases of that in Squint? I think we need to support extern C too, but I'm not sure it makes sense in the ns form. What would it look like as a separate form or as metadata?

[!NOTE] 🤖 Pragma in ns: Yes, (:pragma :once) or (:pragma ["pack" "push" 1]) for more complex ones.

#define via def: ^:define or ^:c-macro could work:

(def ^:define BUFFER_SIZE 4096)        ;; #define BUFFER_SIZE 4096
(def ^:define MAX (fn [a b] ...))      ;; #define MAX(a, b) ...

Libraries requiring #defines: Yes, common patterns:

  • #define STB_IMAGE_IMPLEMENTATION before including stb headers
  • #define GLAD_GL_IMPLEMENTATION
  • Feature toggles like #define USE_OPENGL

defined? form: I like (if (defined? SYMBOL) ...) better than when-defined. It composes:

(if (defined? _WIN32)
  (windows-impl)
  (posix-impl))

Squint doesn't have this, since JS doesn't have preprocessor conditionals.

extern "C": For C++ interop, options:

;; Option A: Block form
(extern-c
  (defn foo ^void [] ...))

;; Option B: Metadata
(defn ^:extern-c foo ^void [] ...)

;; Option C: ns-level
(ns mylib
  (:extern-c true))  ;; wraps whole file

Option A is explicit for specific functions. Option C is convenient for whole files.

(c-raw "#ifndef MY_HEADER_H")
(c-raw "#define MY_HEADER_H")
(c-raw "#endif")

(c-raw "#pragma once")

(c-raw "#define MAX(a, b) ((a) > (b) ? (a) : (b))")

Extern Declarations

Note

This seems right

(declare ^:extern printf ^int [^char* fmt ^...])
(declare ^:extern malloc ^void* [^size_t size])
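A sketch of what those declarations might emit. One detail worth flagging: the standard printf prototype takes const char *, so a plain ^char* fmt would produce a conflicting declaration whenever <stdio.h> is also included. The output below assumes the compiler has some way to express const, which the design doesn't pin down yet. free and copy_string are added only so the example can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* (declare ^:extern printf ^int [^char* fmt ^...]), with fmt emitted as
   const char * so it matches the standard prototype */
extern int printf(const char *fmt, ...);

/* (declare ^:extern malloc ^void* [^size_t size]) */
extern void *malloc(size_t size);
extern void free(void *ptr);  /* added for the example */

/* The declarations act like any other prototype: calls are type-checked. */
static char *copy_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL) {
        memcpy(p, s, n);
    }
    return p;
}
```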

Platform-Specific Code

Note

This feels a little off to me; it's very similar to #?(:clj ...). I think it's the ifdef form. Certainly platform-case is too far.

[!NOTE] 🤖 You're right that platform-case is too magical. Let's simplify:

Using if/when with defined? is more composable and Clojure-like:

(if (defined? _WIN32)
  (defn get-time ^int64 [] (GetTickCount64))
  (defn get-time ^int64 [] (clock-gettime ...)))

For complex multi-platform, just use nested conditions or multiple when:

(when (defined? _WIN32)
  (defn sleep-ms ^void [^int ms] (Sleep ms)))

(when (defined? __linux__)
  (defn sleep-ms ^void [^int ms] (usleep (* ms 1000))))

This is less magical and composes with regular Clojure control flow.

;; Simple platform check
(when (defined? _WIN32)
  (defn get-time ^int64 []
    (GetTickCount64)))

;; Platform-specific implementations
;; (named sleep-ms: a C function called sleep would clash with POSIX sleep() from <unistd.h>)
(if (defined? _WIN32)
  (defn sleep-ms ^void [^int ms] (Sleep ms))
  (defn sleep-ms ^void [^int ms] (usleep (* ms 1000))))

Outputs:

#ifdef _WIN32
int64_t get_time(void) {
    return GetTickCount64();
}
#endif

#ifdef _WIN32
void sleep_ms(int ms) { Sleep(ms); }
#else
void sleep_ms(int ms) { usleep(ms * 1000); }
#endif

File Output and Organization

Note

I like this design; it leaves room for an ns too:

(ns vec ; Just a name, doesn't mean anything
  (:require ["<stdio.h>"] ; => #include <stdio.h>
            ["other.h"]   ; => #include "other.h"
            [something :as s])) ; Maybe has some other features or something

What does this translate to in Squint for comparison?

An interesting inspiration might be :gen-class. We could take inspiration from it for prefixing functions like some libraries do, and maybe cram some other features into it.

[!NOTE] 🤖 Squint's ns form:

(ns my.app
  (:require ["react" :as r]           ;; JS import
            ["./utils.js" :refer [helper]]
            [squint.core :refer [println]]))

Compiles to ES6 imports:

import * as r from 'react';
import { helper } from './utils.js';

For this project, the ns could have:

  • :require for includes (as you showed)
  • :pragma for things like once
  • :prefix for function name prefixing (like :gen-class)
  • :export to list public symbols
(ns my.vec
  (:require ["<stddef.h>"])
  (:pragma :once)
  (:prefix "vec_")           ;; vec_new, vec_push, etc.
  (:export [new push pop]))  ;; only these are non-static

I'm not sure about :prefix, but it seems fine for now. I think :export makes more sense as metadata on a def/defn/declare, though. Remove export and remove this note.

User Controls Output Type

The compiler only transpiles. User decides the output filename:

mycompiler vec.cljc -o vec.h      # output as header
mycompiler vec.cljc -o vec.c      # output as implementation
mycompiler main.cljc -o main.c

No Automatic Include Guards

Use (:pragma :once) in the ns form:

(ns vec
  (:pragma :once))

Outputs:

#pragma once

Header vs Implementation Pattern

Note

The declarations don't feel Clojure-y to me, but maybe they don't need to be? Maybe these could be:

  • (defstruct ^:declare Vec)
  • (declare ^:struct Vec)

[!NOTE] 🤖 I prefer (declare ^:struct Vec): it extends existing declare semantics with metadata. Consistent pattern:

  • (declare foo): forward-declare a function (existing Clojure)
  • (declare ^:struct Vec): forward-declare a struct
  • (declare ^:extern printf ...): extern declaration

All use declare with metadata to specify what kind of declaration. One form to learn.

vec.h.cljc (declarations):

(ns vec
  (:require ["<stddef.h>"])
  (:pragma :once))

(declare ^:struct Vec)
(declare vec-new ^Vec* [])
(declare vec-push ^void [^Vec* v ^int val])
(declare vec-get ^int [^Vec* v ^size_t i])
(declare vec-free ^void [^Vec* v])

Clojure's declare doesn't support specifying argument and return types. How else could this be specified? More metadata?

[!NOTE] 🤖 You're right that standard Clojure declare is just (declare foo). Options:

Option A: Extended declare (what we have)

(declare vec-new ^Vec* [])

Not standard Clojure, but clear and mirrors defn structure.

Option B: defn with :declare metadata

(defn ^:declare vec-new ^Vec* [] ...)

Problem: ... or empty body is weird.

Option C: Metadata map on declare

(declare ^{:returns Vec* :args []} vec-new)

More verbose but uses standard metadata.

Option D: defn without body = declaration

(defn vec-new ^Vec* [])  ;; no body = declaration only

This overloads defn meaning based on body presence.

I'd suggest keeping the extended declare syntax: it's intuitive and parallels defn. It's a small deviation from Clojure, but this is a C-targeting language anyway.

vec.c.cljc (definitions):

(ns vec
  (:require ["vec.h"]))

(defstruct Vec [^int* data ^size_t len ^size_t cap])

(defn vec-new ^Vec* []
  ...)

(defn vec-push ^void [^Vec* v ^int val]
  ...)
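The ... bodies are left open in the design, so here is one hypothetical way the generated vec.h/vec.c pair could look, collapsed into one listing. It assumes defstruct emits a tagged struct (struct Vec { ... }) so that (declare ^:struct Vec) can lower to an opaque forward declaration; the anonymous typedef struct { ... } style used for Vec2 elsewhere could not be forward-declared this way. The growth policy (start at 4, then double) and the handling of allocation failure are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* vec.h: (declare ^:struct Vec) as an opaque forward declaration */
typedef struct Vec Vec;

/* vec.c: (defstruct Vec [^int* data ^size_t len ^size_t cap]) */
struct Vec {
    int *data;
    size_t len;
    size_t cap;
};

Vec *vec_new(void) {
    Vec *v = malloc(sizeof *v);
    if (v != NULL) {
        v->data = NULL;
        v->len = 0;
        v->cap = 0;
    }
    return v;
}

void vec_push(Vec *v, int val) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap == 0 ? 4 : v->cap * 2;  /* assumed growth policy */
        int *grown = realloc(v->data, new_cap * sizeof *grown);
        if (grown == NULL) {
            return;  /* sketch only: the value is dropped if allocation fails */
        }
        v->data = grown;
        v->cap = new_cap;
    }
    v->data[v->len++] = val;
}

int vec_get(Vec *v, size_t i) {
    assert(i < v->len);  /* bounds check, active in debug builds */
    return v->data[i];
}

void vec_free(Vec *v) {
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
```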

Header-Only Library Pattern

Use ifdef form for conditional compilation blocks:

(ns vec2
  (:pragma :once))

;; Declarations (always visible)
(defstruct Vec2 [^float x ^float y])
(declare vec2-add ^Vec2 [^Vec2 a ^Vec2 b])

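;; Implementation (guarded; define VEC2_IMPL in exactly one .c file before including)
(ifdef VEC2_IMPL
  (defn vec2-add ^Vec2 [^Vec2 a ^Vec2 b]
    (Vec2. (+ (:x a) (:x b))
           (+ (:y a) (:y b)))))

Outputs:

#pragma once

typedef struct { float x; float y; } Vec2;
Vec2 vec2_add(Vec2 a, Vec2 b);

#ifdef VEC2_IMPL
Vec2 vec2_add(Vec2 a, Vec2 b) {
    return (Vec2){a.x + b.x, a.y + b.y};
}
#endif

The implementation is deliberately not static: a static definition following the non-static prototype above is a conflicting-linkage error in C.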

Constants and Preprocessor

Constants

Note

I'm not sure about defconst; it seems fine. Maybe (def ^:const BUFFER_SIZE 4096) would be better? It feels like it would be difficult to tell from Clojure whether something is a C macro or a runtime constant?

[!NOTE] 🤖 Good point about the distinction. Options:

;; C macro (#define)
(def ^:macro BUFFER_SIZE 4096)     ;; or (defmacro-c ...)

;; C const variable (static const int)
(def ^:const ^int BUFFER_SIZE 4096)

The difference matters:

  • #define has no type, evaluated at preprocessing
  • static const has a type, exists at runtime (but often optimized away)

Using metadata makes the distinction explicit. ^:macro for #define, ^:const for typed constant.

Ok, I'm coming around to :macro now. I think I would like some more examples of how :macro could be used throughout the language as well as in practice, though. Can you give me some more of those in a note below?

[!NOTE] 🤖 Examples of ^:define (C preprocessor macros):

;; Simple constants
(def ^:define BUFFER_SIZE 4096)
(def ^:define PI 3.14159)
(def ^:define NULL 0)

;; Feature flags (common pattern with stb-style headers)
(def ^:define STB_IMAGE_IMPLEMENTATION)  ;; no value = just defined

;; Macro functions
(def ^:define MAX (fn [a b] (if (> a b) a b)))
(def ^:define MIN (fn [a b] (if (< a b) a b)))
(def ^:define ARRAY_LEN (fn [arr] (/ (sizeof arr) (sizeof (aget arr 0)))))

;; Debug helpers using __LINE__, __FILE__
(def ^:define DEBUG_LOG (fn [msg] (printf "[%s:%d] %s\n" __FILE__ __LINE__ msg)))

;; Stringification (#) and token pasting (##) might need special syntax
(def ^:define STRINGIFY (fn [x] #x))  ;; needs thought

Outputs:

#define BUFFER_SIZE 4096
#define PI 3.14159
#define NULL 0
#define STB_IMAGE_IMPLEMENTATION
#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))
#define DEBUG_LOG(msg) printf("[%s:%d] %s\n", __FILE__, __LINE__, msg)

The fn form in a ^:define context generates a macro function. Token pasting (##) and stringification (#) might need special forms.
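One consequence worth keeping in mind when choosing between a ^:define fn and an ordinary defn: the generated macro substitutes its arguments textually, so an argument with side effects can be evaluated twice. A small C illustration; next_value and max_int are hypothetical helpers.

```c
#include <assert.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))  /* what (def ^:define MAX ...) emits */

static int calls = 0;

/* Side effect: counts how many times it is evaluated. */
static int next_value(void) {
    calls++;
    return 10;
}

/* What an ordinary (defn max-int ...) would emit: each argument evaluated once. */
static int max_int(int a, int b) {
    return a > b ? a : b;
}
```

With MAX(next_value(), 5), the winning argument is evaluated once for the comparison and again for the result; max_int evaluates it exactly once.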

(def ^:define BUFFER_SIZE 4096)
(def ^:define PI 3.14159)

Outputs:

#define BUFFER_SIZE 4096
#define PI 3.14159

Or for typed constants:

(def ^:const ^int MAX_SIZE 1024)

Outputs:

static const int MAX_SIZE = 1024;
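A C detail that bears on the ^:define vs ^:const choice: in C (unlike C++), a static const int is not an integer constant expression, so it can't size a file-scope array, while a #define can. On the other hand, only the typed constant is a real object with an address. A short illustration; buffer and max_size_ptr are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 4096           /* (def ^:define BUFFER_SIZE 4096) */
static const int MAX_SIZE = 1024;  /* (def ^:const ^int MAX_SIZE 1024) */

/* OK: BUFFER_SIZE is replaced by the literal 4096 before compilation. */
static char buffer[BUFFER_SIZE];

/* Not valid C at file scope, because MAX_SIZE isn't a constant expression:
   static char other[MAX_SIZE]; */

/* The typed constant, however, has a type and an address. */
static const int *max_size_ptr(void) {
    return &MAX_SIZE;
}
```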

C Preprocessor Macros

Note

I wonder if we even need this? Maybe Lisp macros are sufficient? I suppose users may want to define C macros for use in other libraries?

[!NOTE] 🤖 Lisp macros are sufficient for your code. But C macros are needed when:

  1. You're writing a header that C users will consume
  2. You need preprocessor-specific features (__LINE__, __FILE__, token pasting)
  3. You're wrapping existing C macro patterns

For pure Clojure-to-C projects, Lisp macros are better. But for interop with the C ecosystem, having c-define as an escape hatch makes sense. Low priority though; it can be added later if needed.

Could this instead be done using :define in def?

[!NOTE] 🤖 Yes! Using ^:define on def is cleaner:

;; Macro function via def
(def ^:define MIN (fn [a b] (if (< a b) a b)))

This is consistent with how we handle constants vs macros. No need for a separate c-define form.

(def ^:define MIN (fn [a b] (if (< a b) a b)))

Outputs:

#define MIN(a, b) ((a) < (b) ? (a) : (b))

Static and Linkage

File-Local (static)

(def ^:static counter 0)           ;; static int counter = 0;
(defn ^:static helper ^void [] ...) ;; static void helper(void) {...}

Static Local Variables

(defn next-id ^int []
  (let [^{:static true :type int} id 0]
    (set! id (inc id))
    id))

Outputs:

int next_id(void) {
    static int id = 0;
    id = id + 1;
    return id;
}

Compiler Architecture

Compilation Pipeline

Note

As a side note, it would be cool if the language added an easy (but compatible with existing tools) way to configure builds. Maybe it's just a library that can be run or something? Not important to start with, though.

[!NOTE] 🤖 A build.clj that runs with Babashka would be nice:

(ns build
  (:require [cljc.build :as b]))

(b/compile-dir "src" "build"
  {:cc "gcc"
   :cflags ["-O2" "-Wall"]
   :output "myapp"})

Future consideration: A build.clj script running via Babashka could provide a Clojure-native build system. Low priority; users can use Make/CMake initially.

Clojure Source (.cljc)
        ↓
    [Reader]          ← Clojure/JVM reads the source
        ↓
    [Macro Expansion] ← Macros run on JVM
        ↓
    [Code Generation] ← Walk AST, emit C
        ↓
    C Source (.c/.h)
        ↓
    [User's Build System] ← gcc, clang, make, cmake, etc.
        ↓
    Binary

Compiler Does

Note

Does it need to do the type checking? It could leave that to the C compiler completely maybe? What benefits would there be to doing it ourselves?

Also possibly the build could work on a given folder like squint and be configured to compile files in place or to a different folder?

Broadly this is right though

[!NOTE] 🤖 Benefits of doing some type checking:

  1. Better error messages: "expected ^int, got ^float at line 42" vs. a cryptic C compiler error
  2. Catch errors earlier: before generating invalid C
  3. Enable type-based code gen: macros like reduce need to know types to expand correctly
  4. IDE support: can provide autocomplete and error highlighting

What to check:

  • Minimal: Just track types for macro expansion, let C compiler do the rest
  • Medium: Check function call arities, basic type mismatches
  • Full: Replicate C's type system (probably overkill)

Type checking approach: Minimal. Catch obvious errors (arity, missing annotations, struct field access on non-structs) but delegate deep type checking to the C compiler. Generate #line directives to map C compiler errors back to source.

  • Parse Clojure syntax
  • Expand macros (on JVM at compile time)
  • Type check (based on annotations)
  • Emit C code
  • Output to specified file

Compiler Does Not

  • Invoke C compiler
  • Generate build files
  • Add include guards automatically
  • Manage header/implementation split
  • Provide a runtime

Optional Runtime Library

Note

As mentioned above, maybe some functions require these types or specific interfaces? Possibly they get included and assume the C compiler will tree-shake unused code if they don't get used?

[!NOTE] 🤖 C compilers don't tree-shake across files: they compile what you give them (unused static functions within a file are usually dropped). Linkers can do dead code elimination with -flto (link-time optimization), or with -ffunction-sections plus -Wl,--gc-sections.

Better approach: header-only library with guards:

// Only compile what's used
#ifdef CLJC_USE_VEC
// Vec implementation
#endif

#ifdef CLJC_USE_HASHMAP
// HashMap implementation
#endif

Or generate only the runtime functions that are actually used in the user's code. The compiler knows what's needed.

An optional library providing common utilities. User explicitly opts in.

Core Data Structures (if desired)

(require '[cljc.runtime :as rt])

;; Dynamic vector
(let [v (rt/vec-new)]
  (rt/vec-push v 1)
  (rt/vec-push v 2)
  (rt/vec-free v))

Value Type (for dynamic typing)

(require '[cljc.dynamic :as d])

;; Boxed values with runtime type tags
(let [x (d/int 42)
      y (d/string "hello")]
  (d/print x)
  (d/print y))

This would use Boehm GC or similar. Completely optional.


Open Questions and Gaps

Unresolved Design Decisions

  1. Namespace handling β€” Do namespaces map to anything in C? Probably just for macro organization, no C output.

Possibly the name can be like a file path relative to the root? foo.bar.baz, or maybe it means nothing. Unsure. I think the ns form is definitely useful, as seen above, for includes and pragmas.

[!NOTE] 🤖 The namespace name could be used for:

  • Function name prefixing (if :prefix not set): my.vec/new → my_vec_new
  • Organizing macros at compile time
  • Documentation / tooling

But emitting nothing to C is also valid; it's just organizational. I'd say: optional prefixing, otherwise just for tooling.

Decision: Namespace names are organizational only, for macro scoping and tooling. No C output from namespace names unless the explicit :prefix option is used.

Symbol Naming Convention

Follow Squint/ClojureScript's battle-tested munging:

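Clojure     C
my-func     my_func
foo?        foo_QMARK_
foo!        foo_BANG_
str->int    str__GT_int
<           _LT_
=           _EQ_
*x*         _STAR_x_STAR_
+           _PLUS_

Consistent and already proven in existing tools. It is not strictly collision-free, though: my-func and my_func both munge to my_func, so the compiler may want to warn when two names collide after munging.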
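A sketch of the mapping in the table as a C function, for illustration only (the real compiler would run on the JVM). It covers just the characters listed above and copies anything else through unchanged; munge_char and munge are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Replacement text for one character, or NULL to copy it through unchanged. */
static const char *munge_char(char c) {
    switch (c) {
    case '-': return "_";
    case '?': return "_QMARK_";
    case '!': return "_BANG_";
    case '>': return "_GT_";
    case '<': return "_LT_";
    case '=': return "_EQ_";
    case '*': return "_STAR_";
    case '+': return "_PLUS_";
    default:  return NULL;
    }
}

/* Write the C identifier for a Clojure symbol name into out.
   Returns 0 on success, -1 if out is too small. */
static int munge(const char *name, char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) {
        return -1;
    }
    for (const char *p = name; *p != '\0'; p++) {
        char single[2] = { *p, '\0' };
        const char *piece = munge_char(*p);
        if (piece == NULL) {
            piece = single;
        }
        size_t n = strlen(piece);
        if (used + n + 1 > out_size) {  /* +1 for the terminating NUL */
            return -1;
        }
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return 0;
}
```

Note that munge("my-func", ...) and munge("my_func", ...) produce the same identifier, which is why a collision check is worth having.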

Error Messages

Compiler errors include file, line, and column from reader metadata (like Squint/edamame):

Error at src/foo.cljc:42:3
  Unresolved symbol: undefined-fn

For C compiler errors, generate #line directives so gcc/clang report the original Clojure source line. Runtime errors are the user's responsibility.
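For reference, #line is standard C: it resets the line number (and optionally the file name) that the compiler reports for the lines that follow, and with them what __LINE__ and __FILE__ expand to. A minimal sketch of the kind of output the compiler might emit; the source path, line number and function names are made up.

```c
#include <assert.h>
#include <string.h>

/* Emitted before the code generated from line 42 of src/foo.cljc.
   From here on, diagnostics, __LINE__ and __FILE__ refer to the .cljc file. */
#line 42 "src/foo.cljc"
static int generated_line(void) { return __LINE__; }
static const char *generated_file(void) { return __FILE__; }
```

A real generator would emit a fresh #line before each form, so the mapping stays accurate as the C output and the source drift apart.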

Type Checking Depth

Minimal type checking, catching obvious errors early:

  • Function call arity mismatches
  • Missing type annotations on defn
  • Struct field access on non-structs
  • Macro expansion requirements

Deeper checks (int vs long, pointer compatibility) delegated to C compiler.

Generic/Templated Code

Out of scope for v1. Clojure macros can generate type-specific code if needed:

;; User-defined macro for type-specific structs
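(defmacro defvec [T]
  (let [struct-name (symbol (str "Vec_" (name T)))
        ptr-type    (symbol (str (name T) "*"))]
    ;; with-meta instead of ^~ inside the syntax-quote, which would
    ;; namespace-qualify the field names and the size_t tags
    `(defstruct ~struct-name
       [~(with-meta 'data {:tag ptr-type})
        ~(with-meta 'len {:tag 'size_t})
        ~(with-meta 'cap {:tag 'size_t})])))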

;; Usage
(defvec int)   ;; generates Vec_int struct
(defvec float) ;; generates Vec_float struct

Could be formalized later if there's demand.
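Roughly what the two defvec calls would expand to in C, assuming defstruct emits a typedef'd struct like the Vec2 output elsewhere in this document:

```c
#include <assert.h>
#include <stddef.h>

/* (defvec int) */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} Vec_int;

/* (defvec float) */
typedef struct {
    float *data;
    size_t len;
    size_t cap;
} Vec_float;
```

Each expansion is a distinct, fully concrete type, so code that uses them stays monomorphic.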

Struct Initialization Syntax

Follow Clojure's record syntax:

(Point. 10 20)              ;; positional: (Point){10, 20}
(map->Point {:x 10 :y 20})  ;; named: (Point){.x = 10, .y = 20}

Map literals {:x 10 :y 20} require type context to work:

(let [^Point p {:x 10 :y 20}] ...)  ;; type annotation tells compiler it's Point
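For reference, the two C99 initializer forms, using a Point with int x and y fields. Positional initialization depends on field order while designated initializers do not, which is one argument for preferring map->Point when fields might be reordered. The helper function names are hypothetical.

```c
#include <assert.h>

typedef struct {
    int x;
    int y;
} Point;

/* (Point. 10 20): positional compound literal, follows field order */
static Point point_positional(void) {
    return (Point){10, 20};
}

/* (map->Point {:x 10 :y 20}) or (let [^Point p {:x 10 :y 20}] ...):
   designated initializer, independent of field order; omitted fields become 0 */
static Point point_named(void) {
    return (Point){.y = 20, .x = 10};
}
```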

Multi-dimensional Arrays

Use multi-arity aget/aset like standard Clojure:

(aget matrix i j)         ;; matrix[i][j]
(aset matrix i j val)     ;; matrix[i][j] = val
(aget cube x y z)         ;; cube[x][y][z]

Collection Length for Seq Operations

Unsure here, needs more thought? Maybe these functions require a function set that provides length?

[!NOTE] 🤖 Options for knowing collection length:

  1. Convention: Arrays always come with length: (reduce + 0 xs xs-len)
  2. Struct wrapper: (defstruct Slice [^int* data ^size_t len]); seq ops work on Slice
  3. Macro inspects type: If xs is ^(array int 10), length is known. If ^int*, require explicit length.
  4. Protocol-like pattern: Types that support seq ops must have a len field or function

Option 3 is cleanest: fixed arrays have compile-time length, pointers require explicit length parameter.

I really don't like the idea of requiring the length; it means changing the contract of these functions. What would a protocol-like pattern look like for this? It seems like the only option that doesn't change the syntax.

[!NOTE] 🤖 Protocol-like pattern for seq operations:

The compiler could recognize types that are "seqable":

;; Fixed-size arrays: length known at compile time
(let [^(array int 10) arr (make-array int 10)]
  (reduce + 0 arr))  ;; compiler knows len=10

;; Slice struct: has :data and :len fields
(defstruct Slice [^int* data ^size_t len])

(let [^Slice s ...]
  (reduce + 0 s))  ;; compiler extracts (:data s) and (:len s)

;; Vec struct: same pattern
(defstruct Vec [^int* data ^size_t len ^size_t cap])

(reduce + 0 my-vec)  ;; works because Vec has :data and :len

The "protocol" is implicit: any struct with ^T* data and ^size_t len fields is seqable. The macro expands differently based on the type:

;; For fixed array
(reduce + 0 arr) → loop over i = 0..9

;; For Slice/Vec
(reduce + 0 s) → loop over 0..(:len s) using (:data s)

;; For bare pointer: compiler error
(reduce + 0 ^int* ptr) → "Cannot iterate ^int*: unknown length. Use Slice or pass length."

This keeps the syntax clean while being explicit about what's supported. Bare pointers must be wrapped in a Slice or use explicit length.
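A sketch of the two expansions in C, assuming the Slice struct above and that + on ints lowers to the C operator. The function names are hypothetical; a real expansion would likely be an inline loop at the call site. The only difference between the two is where the loop bound and base pointer come from.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int *data;
    size_t len;
} Slice;

/* (reduce + 0 arr) with ^(array int 10): the bound is the literal 10 */
static int reduce_plus_array10(const int arr[10]) {
    int acc = 0;
    for (size_t i = 0; i < 10; i++) {
        acc = acc + arr[i];
    }
    return acc;
}

/* (reduce + 0 s) with ^Slice: bound and base come from (:len s) and (:data s) */
static int reduce_plus_slice(Slice s) {
    int acc = 0;
    for (size_t i = 0; i < s.len; i++) {
        acc = acc + s.data[i];
    }
    return acc;
}
```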

REPL Implementation

Future consideration: Development REPL via dlopen trick:

  • Compile expression to .so
  • dlopen with RTLD_GLOBAL
  • Call eval function
  • Keep library loaded

Needs more design work. Low priority for v1.

C++ Interop

Use extern "C" blocks for C++ compatibility:

(extern-c
  (defn my-func ^void [] ...))

;; Or at ns level
(ns mylib
  (:extern-c true))
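Since the output is C, one natural way to emit ns-level :extern-c is the conventional __cplusplus guard: it is a no-op for a C compiler and wraps the declarations in extern "C" for a C++ one. A sketch with a hypothetical mylib_add function:

```c
#include <assert.h>

#ifdef __cplusplus
extern "C" {
#endif

/* (defn mylib-add ^int [^int a ^int b] (+ a b)) */
int mylib_add(int a, int b);

#ifdef __cplusplus
}  /* extern "C" */
#endif

int mylib_add(int a, int b) {
    return a + b;
}
```

Emitting the guard rather than a bare extern "C" block keeps the same generated header usable from both C and C++.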

Features Explicitly Not Included

  • Garbage collection (use runtime library if needed)
  • Runtime reflection
  • Runtime eval
  • Persistent data structures (use runtime library if needed)
  • Automatic memory management
  • Exceptions (use return codes or setjmp/longjmp via c-raw)

Comparison to Similar Projects

Project        Language        Target        GC              Types
This project   Clojure syntax  C             No              Explicit
Squint         ClojureScript   JavaScript    JS GC           Dynamic
Carp           Lisp            C             Borrow checker  Inferred
Zig            Custom          Machine code  No              Explicit
Nim            Python-like     C             Optional        Inferred

Example: Complete Small Program

point.h.cljc:

(ns point
  (:pragma :once))

(defstruct Point [^int x ^int y])

(declare point-new ^Point [^int x ^int y])
(declare point-add ^Point [^Point a ^Point b])
(declare point-print ^void [^Point* p])

point.c.cljc:

(ns point
  (:require ["<stdio.h>"]
            ["point.h"]))

(defn point-new ^Point [^int x ^int y]
  (Point. x y))

(defn point-add ^Point [^Point a ^Point b]
  (Point. (+ (:x a) (:x b))
          (+ (:y a) (:y b))))

(defn point-print ^void [^Point* p]
  (printf "Point(%d, %d)\n" (:x p) (:y p)))

main.c.cljc:

(ns main
  (:require ["point.h"]))

(defn main ^int []
  (let [a (point-new 10 20)
        b (point-new 5 5)
        c (point-add a b)]
    (point-print (addr c))
    0))

Build:

mycompiler point.h.cljc -o build/point.h
mycompiler point.c.cljc -o build/point.c
mycompiler main.c.cljc -o build/main.c
gcc build/point.c build/main.c -o program
./program
# Output: Point(15, 25)
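Roughly what build/point.h and build/point.c might contain, collapsed into one listing (main.c is omitted; its main just calls these functions). The struct layout and lowering (:x p) on a Point* to p->x are assumptions about the code generator.

```c
#include <assert.h>
#include <stdio.h>

/* build/point.h */
typedef struct {
    int x;
    int y;
} Point;

Point point_new(int x, int y);
Point point_add(Point a, Point b);
void point_print(Point *p);

/* build/point.c */
Point point_new(int x, int y) {
    return (Point){x, y};
}

Point point_add(Point a, Point b) {
    return (Point){a.x + b.x, a.y + b.y};
}

/* (:x p) on a Point* lowers to p->x */
void point_print(Point *p) {
    printf("Point(%d, %d)\n", p->x, p->y);
}
```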

Next Steps

  1. Implement minimal compiler (defn, defstruct, let, if, basic types)
  2. Add macro support (leverage Clojure's macro system)
  3. Implement seq operations as macros (reduce, map, filter)
  4. Add enum/tagged union support
  5. Build example programs to validate design
  6. Document symbol naming conventions
  7. Consider REPL implementation for dev experience