@xeioex
Created February 11, 2026 05:51
njs Source Code Internals

Detailed notes on the njs JavaScript engine internals gathered during the optional chaining (?.) implementation. Intended as a reference for future feature work.


1. Parser State Machine

Core Mechanism

The parser is a stack-based state machine, not a recursive-descent parser. Each state is a function with the signature:

static njs_int_t
njs_parser_<state>(njs_parser_t *parser, njs_lexer_token_t *token,
    njs_queue_link_t *current);

State transitions use three primitives:

| Primitive | Effect |
| --- | --- |
| njs_parser_next(parser, state) | Set the immediate next state (replaces current) |
| njs_parser_after(parser, current, target, deferred, state) | Push a continuation onto the stack (runs after the immediate state completes) |
| njs_parser_stack_pop(parser) | Pop to the next continuation on the stack |
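To make the three primitives concrete, here is a minimal standalone sketch of the same control flow. It is not njs code: every toy_* name is hypothetical, the continuation stack is a plain array rather than the njs queue, and the "grammar" is just single digits joined by +. It only shows how a state sets the immediate next state, pushes a continuation, and later pops back to it.

```c
#include <stddef.h>

#define TOY_OK         0
#define TOY_DONE       1
#define TOY_ERROR      (-1)
#define TOY_STACK_MAX  16

typedef struct toy_parser  toy_parser_t;
typedef int (*toy_state_t)(toy_parser_t *p);

struct toy_parser {
    const char   *pos;                    /* current input position */
    toy_state_t  state;                   /* immediate next state */
    toy_state_t  stack[TOY_STACK_MAX];    /* pending continuations */
    size_t       depth;
    int          node;                    /* last result, like parser->node */
    int          sum;
};

/* Push a continuation (the role njs_parser_after plays). */
static int
toy_after(toy_parser_t *p, toy_state_t state)
{
    if (p->depth == TOY_STACK_MAX) {
        return TOY_ERROR;
    }

    p->stack[p->depth++] = state;
    return TOY_OK;
}

/* Resume the most recent continuation (the role njs_parser_stack_pop plays). */
static int
toy_stack_pop(toy_parser_t *p)
{
    if (p->depth == 0) {
        return TOY_ERROR;
    }

    p->state = p->stack[--p->depth];
    return TOY_OK;
}

static int toy_sum_after(toy_parser_t *p);

/* Leaf state: parse one digit, publish it in p->node, return to the caller. */
static int
toy_number(toy_parser_t *p)
{
    if (*p->pos < '0' || *p->pos > '9') {
        return TOY_ERROR;
    }

    p->node = *p->pos++ - '0';
    return toy_stack_pop(p);
}

/* Run toy_number next, then continue in toy_sum_after. */
static int
toy_sum(toy_parser_t *p)
{
    p->state = toy_number;                /* the role njs_parser_next plays */
    return toy_after(p, toy_sum_after);
}

static int
toy_sum_after(toy_parser_t *p)
{
    p->sum += p->node;

    if (*p->pos == '+') {
        p->pos++;
        p->state = toy_sum;
        return TOY_OK;
    }

    return (*p->pos == '\0') ? TOY_DONE : TOY_ERROR;
}

/* Driver loop: keep calling the current state until it finishes or fails. */
static int
toy_parse(const char *src, int *result)
{
    int           ret;
    toy_parser_t  p = { .pos = src, .state = toy_sum };

    do {
        ret = p.state(&p);
    } while (ret == TOY_OK);

    if (ret != TOY_DONE) {
        return TOY_ERROR;
    }

    *result = p.sum;
    return TOY_OK;
}
```

Calling toy_parse("1+2+3", &r) leaves 6 in r. No state function ever calls another one directly, so nesting depth lives in the explicit stack rather than on the C call stack.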

Continuation Insertion Order

njs_parser_after inserts before the current link in the queue. When multiple njs_parser_after calls are made in sequence, the last push runs first after the immediate state completes.

Example — pushing A then B:

njs_parser_after(parser, current, ..., stateA);  // push A
njs_parser_after(parser, current, ..., stateB);  // push B before A

Execution order: immediate_state -> B -> A -> (rest of stack)

Key Fields

  • parser->node — the current AST node being built. States read and write this to pass results between states.
  • parser->target — set by the target parameter of njs_parser_after. Used to pass a "parent" node into a continuation state (e.g., the function node for argument parsing, or a wrapper node for optional chain).

Token Consumption

njs_lexer_consume_token(parser->lexer, N);  // consume N tokens
njs_lexer_token(parser->lexer, 0);          // get current token (after consume)
njs_lexer_peek_token(parser->lexer, token, 0);  // peek next token without consuming

The token parameter passed to each state function is the current token. After consuming tokens, call njs_lexer_token() to get the new current token.

Creating AST Nodes

njs_parser_node_t *node = njs_parser_node_new(parser, NJS_TOKEN_TYPE);
node->token_line = token->line;
node->u.operation = NJS_VMCODE_XXX;  // for operation nodes
node->left = ...;
node->right = ...;

String/identifier nodes:

njs_parser_node_t *str = njs_parser_node_string(parser->vm, token, parser);

Call Expression Creation

njs_parser_create_call(parser, node, ctor) inspects node->token_type:

| Node Type | Result |
| --- | --- |
| NJS_TOKEN_NAME | Reuses the node, changes type to FUNCTION_CALL |
| NJS_TOKEN_PROPERTY | Creates METHOD_CALL wrapping the property |
| NJS_TOKEN_OPTIONAL_CHAIN (with PROPERTY right) | Unwraps to METHOD_CALL with plain PROPERTY |
| Everything else | Creates FUNCTION_CALL wrapping the node |

The distinction between METHOD_CALL and FUNCTION_CALL is critical for the binding of this: METHOD_CALL uses METHOD_FRAME (which sets this to the object), while FUNCTION_CALL uses FUNCTION_FRAME (which sets this to undefined).

Delete Handling

In njs_parser_unary_expression_after, when type == NJS_TOKEN_DELETE:

  • NJS_TOKEN_PROPERTY -> converted to NJS_TOKEN_PROPERTY_DELETE in-place, returned without a DELETE wrapper.
  • NJS_TOKEN_OPTIONAL_CHAIN -> inner PROPERTY converted to PROPERTY_DELETE, then falls through to wrap in a DELETE unary node (so the short-circuit undefined is coerced to true by NJS_VMCODE_DELETE).
  • NJS_TOKEN_NAME -> syntax error.
  • Default -> wrapped in DELETE unary node (evaluates expression, returns true).
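The four cases above can be sketched as a small standalone rewrite function. This is an illustration only: the toy_* types are hypothetical stand-ins for njs_parser_node_t and the NJS_TOKEN_* values, the "inner PROPERTY" of an optional chain is assumed to be its right child, and the DELETE wrapper is caller-provided storage instead of a pool allocation.

```c
#include <stddef.h>

typedef enum {
    TOY_NAME,
    TOY_PROPERTY,
    TOY_PROPERTY_DELETE,
    TOY_OPTIONAL_CHAIN,
    TOY_DELETE,
    TOY_NUMBER
} toy_type_t;

typedef struct toy_node {
    toy_type_t       type;
    struct toy_node  *left;
    struct toy_node  *right;
} toy_node_t;

/*
 * Rewrite the operand of a delete expression.  Returns the node that
 * represents the whole expression, or NULL for a syntax error.
 */
static toy_node_t *
toy_delete(toy_node_t *operand, toy_node_t *wrapper)
{
    switch (operand->type) {

    case TOY_PROPERTY:
        /* Converted in place; no DELETE wrapper. */
        operand->type = TOY_PROPERTY_DELETE;
        return operand;

    case TOY_NAME:
        /* Deleting a bare name is rejected. */
        return NULL;

    case TOY_OPTIONAL_CHAIN:
        /* Assumption: the chain body hangs off the right child. */
        if (operand->right != NULL && operand->right->type == TOY_PROPERTY) {
            operand->right->type = TOY_PROPERTY_DELETE;
        }

        break;

    default:
        break;
    }

    /* Optional chains and all remaining operands get a DELETE unary node. */
    wrapper->type = TOY_DELETE;
    wrapper->left = operand;
    wrapper->right = NULL;

    return wrapper;
}
```

The optional-chain case is the only one that both rewrites the inner node and keeps the wrapper. In the real engine, that wrapper is what turns a short-circuited undefined into true.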

2. Code Generator

Core Mechanism

The generator is also a stack-based state machine, mirroring the parser's design. Each state is:

static njs_int_t
njs_generate_<state>(njs_vm_t *vm, njs_generator_t *generator,
    njs_parser_node_t *node);

State transitions:

| Primitive | Effect |
| --- | --- |
| njs_generator_next(generator, state, node) | Set immediate next state and node |
| njs_generator_after(vm, generator, link, node, state, ctx, ctx_size) | Push continuation |
| njs_generator_stack_pop(vm, generator, ctx) | Pop to next continuation |

Insertion Order

Same as the parser: njs_generator_after inserts before the given link. The last push runs first.

Context Passing

njs_generator_after accepts a ctx pointer and ctx_size. If ctx_size > 0, the context is copied to a new allocation. The continuation state accesses it via generator->context. This is used to pass data like jump offsets between phases.
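The reason for copying is that the data often lives in a local variable of the state that schedules the continuation, and that variable is gone by the time the continuation runs. A minimal sketch of the copy-on-push idea, using malloc instead of the njs memory pool; the toy_* names are hypothetical, and storing the pointer as-is when ctx_size is 0 is an assumption, not something these notes confirm:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void    *context;    /* owned copy when size > 0 */
    size_t  size;
} toy_entry_t;

/* Record a continuation's context, copying it when a size is given. */
static int
toy_after(toy_entry_t *entry, const void *ctx, size_t ctx_size)
{
    entry->size = ctx_size;

    if (ctx_size > 0) {
        entry->context = malloc(ctx_size);
        if (entry->context == NULL) {
            return -1;
        }

        memcpy(entry->context, ctx, ctx_size);

    } else {
        /* Assumption: with no size, the pointer is kept without copying. */
        entry->context = (void *) ctx;
    }

    return 0;
}

/* Schedule with a local jump offset, clobber the local, read the copy. */
static long
toy_demo(void)
{
    long         jump_offset, saved;
    toy_entry_t  entry;

    jump_offset = 24;

    if (toy_after(&entry, &jump_offset, sizeof(jump_offset)) != 0) {
        return -1;
    }

    jump_offset = 0;    /* the scheduling state's local is no longer valid */

    saved = *(long *) entry.context;
    free(entry.context);

    return saved;
}
```

toy_demo() returns 24: the continuation sees the value captured at push time, not the later state of the caller's variable.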

Main Dispatch

njs_generate() in njs_generator.c has a large switch on node->token_type that dispatches to specialized generators:

case NJS_TOKEN_PROPERTY:
case NJS_TOKEN_PROPERTY_DELETE:
    return njs_generate_3addr_operation(vm, generator, node, 0);

case NJS_TOKEN_METHOD_CALL:
    return njs_generate_method_call(vm, generator, node);

case NJS_TOKEN_FUNCTION_CALL:
    return njs_generate_function_call(vm, generator, node);

Common Patterns

3-address operation (njs_generate_3addr_operation):

  1. Generate left child
  2. Generate right child
  3. Emit 3-addr instruction: dst = op(src1, src2)

Used for PROPERTY_GET, PROPERTY_ATOM_GET, PROPERTY_DELETE, arithmetic, comparisons, etc. The swap parameter (the last argument, 0 in the dispatch above) controls operand order.
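The left, right, emit order can be illustrated with a toy emitter. For brevity this sketch recurses directly instead of using the continuation machinery, all toy_* names are hypothetical, and the reading of swap as "exchange src1 and src2" is an assumption based on the description above.

```c
#include <stddef.h>

typedef struct toy_node {
    char             op;       /* '+', '*', ... or 0 for a leaf */
    int              index;    /* slot that holds the node's result */
    struct toy_node  *left;
    struct toy_node  *right;
} toy_node_t;

typedef struct {
    char  op;
    int   dst;
    int   src1;
    int   src2;
} toy_3addr_t;

typedef struct {
    toy_3addr_t  code[16];
    size_t       count;
    int          next_temp;    /* next free temporary slot */
} toy_gen_t;

/* Emit code for node: children first, then dst = op(src1, src2). */
static int
toy_generate(toy_gen_t *gen, toy_node_t *node, int swap)
{
    toy_3addr_t  *ins;

    if (node->op == 0) {
        return 0;              /* leaf: its index is already assigned */
    }

    if (toy_generate(gen, node->left, 0) != 0
        || toy_generate(gen, node->right, 0) != 0)
    {
        return -1;
    }

    if (gen->count == sizeof(gen->code) / sizeof(gen->code[0])) {
        return -1;
    }

    node->index = gen->next_temp++;

    ins = &gen->code[gen->count++];
    ins->op = node->op;
    ins->dst = node->index;
    ins->src1 = swap ? node->right->index : node->left->index;
    ins->src2 = swap ? node->left->index : node->right->index;

    return 0;
}
```

For (a + b) * c with a, b, c in slots 0, 1, 2 and temporaries starting at 3, this emits "3 = +(0, 1)" followed by "4 = *(3, 2)".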

Test-jump pattern (used by ??, ||, &&, ?.):

  1. Phase 1: Generate the left/base expression
  2. Phase 2: Emit test-jump opcode (conditional jump), save jump offset
  3. Generate the right/body expression
  4. Phase 3: MOVE result if needed, patch the jump offset

// Phase 2: emit test-jump
njs_generate_code(generator, njs_vmcode_test_jump_t, test_jump,
                  NJS_VMCODE_XXX, node);
jump_offset = njs_code_offset(generator, test_jump);
test_jump->value = node->left->index;
test_jump->retval = node->index;

// Phase 3: patch jump
njs_code_set_jump_offset(generator, njs_vmcode_test_jump_t,
                         *((njs_jump_off_t *) generator->context));

Method call (njs_generate_method_call):

  1. Generate prop->left (object expression)
  2. Generate prop->right (method name expression)
  3. Emit METHOD_FRAME with object = prop->left->index, method = prop->right->index
  4. Generate arguments
  5. Emit FUNCTION_CALL

Function call (njs_generate_function_call):

  1. Generate function expression
  2. Emit FUNCTION_FRAME (this = undefined)
  3. Generate arguments
  4. Emit FUNCTION_CALL

Code Emission

njs_generate_code(generator, type_t, var, NJS_VMCODE_XXX, node);

This macro allocates space in the code buffer, sets the opcode, and assigns the pointer to var. The node is used for debug info (line numbers).

Index Management

// Allocate a temporary index for a node
node->index = njs_generate_node_temp_index_get(vm, generator, node);

// Get destination index (reuses dest hint or allocates temp)
node->index = njs_generate_dest_index(vm, generator, node);

// Release children's temp indexes
njs_generate_children_indexes_release(vm, generator, node);

Jump Offset Patching

// Save current code position
njs_jump_off_t offset = njs_code_offset(generator, instruction);

// Later, patch the jump to point to current position
njs_code_set_jump_offset(generator, instruction_type, offset);
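The save-then-patch idea can be shown on a toy byte buffer. This is a sketch under stated assumptions, not the njs macros: the toy_* names are hypothetical, and the choice to store the jump distance relative to the start of the jump instruction is a design choice of this example.

```c
#include <stddef.h>
#include <string.h>

typedef long  toy_jump_off_t;

typedef struct {
    unsigned char   opcode;
    toy_jump_off_t  offset;    /* bytes from this instruction to the target */
} toy_jump_t;

typedef struct {
    unsigned char  buf[256];
    size_t         end;        /* bytes emitted so far */
} toy_gen_t;

/* Reserve size zeroed bytes; return their offset, or -1 if the buffer is full. */
static toy_jump_off_t
toy_emit(toy_gen_t *gen, size_t size)
{
    toy_jump_off_t  at;

    if (size > sizeof(gen->buf) - gen->end) {
        return -1;
    }

    at = (toy_jump_off_t) gen->end;
    memset(gen->buf + gen->end, 0, size);
    gen->end += size;

    return at;
}

/* Patch the jump emitted at `at` so that it lands on the current end. */
static void
toy_set_jump_offset(toy_gen_t *gen, toy_jump_off_t at)
{
    toy_jump_t  jump;

    memcpy(&jump, gen->buf + at, sizeof(jump));
    jump.offset = (toy_jump_off_t) gen->end - at;
    memcpy(gen->buf + at, &jump, sizeof(jump));
}
```

Usage follows the phases described earlier: call toy_emit() for the jump and keep the returned offset, emit the body, then call toy_set_jump_offset() with the saved offset once the code position after the body is known.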

3. VM Opcodes

Dispatch Table

njs_vmcode.c uses a computed-goto dispatch table (NJS_GOTO_ROW(opcode) entries). Each opcode has a CASE handler.
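For readers unfamiliar with the technique, here is a minimal computed-goto interpreter. It relies on the "labels as values" extension, so it needs GCC or Clang. The opcodes and toy_* names are invented for this sketch and have nothing to do with the real njs instruction set.

```c
enum { TOY_INC, TOY_DOUBLE, TOY_STOP };

/* Run a byte-coded program over a single accumulator. */
static int
toy_run(const unsigned char *pc)
{
    int  acc = 0;

    /* One row per opcode, indexed by the opcode value. */
    static void  *table[] = { &&op_inc, &&op_double, &&op_stop };

#define TOY_NEXT()  goto *table[*pc++]

    TOY_NEXT();

op_inc:
    acc += 1;
    TOY_NEXT();

op_double:
    acc *= 2;
    TOY_NEXT();

op_stop:
    return acc;

#undef TOY_NEXT
}
```

Each handler jumps straight to the next handler through the table rather than returning to a central switch, which is the usual motivation for this layout in interpreters.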

Key Opcodes

| Opcode | Structure | Description |
| --- | --- | --- |
| FUNCTION_FRAME | njs_vmcode_function_frame_t | Create call frame, this = undefined |
| METHOD_FRAME | njs_vmcode_method_frame_t | Property lookup + call frame, this = object |
| FUNCTION_CALL | njs_vmcode_function_call_t | Invoke the prepared frame |
| PROPERTY_GET | njs_vmcode_3addr_t | dst = obj[key] (computed key) |
| PROPERTY_ATOM_GET | njs_vmcode_3addr_t | dst = obj.name (string/number key, optimized) |
| PROPERTY_DELETE | njs_vmcode_3addr_t | delete obj[key], returns boolean |
| MOVE | njs_vmcode_move_t | dst = src |
| COALESCE | njs_vmcode_test_jump_t | ?? operator: if value is not null/undefined, keep it and jump past the right operand |
| OPTIONAL_CHAIN | njs_vmcode_test_jump_t | ?. operator: if null/undefined, set undefined + jump |
| TEST_IF_TRUE | njs_vmcode_test_jump_t | \|\| operator |
| TEST_IF_FALSE | njs_vmcode_test_jump_t | && operator |
| DELETE | njs_vmcode_2addr_t | Always returns true (non-reference delete) |

METHOD_FRAME Details

CASE (NJS_VMCODE_METHOD_FRAME):
    // operand2 = object, operand3 = method name/key
    // 1. Convert key to string if needed
    // 2. njs_value_property_val(vm, object, key, &method) — property lookup
    // 3. Check method is a function
    // 4. njs_function_frame_create(vm, &method, object, nargs, ctor)
    //    — creates frame with this = object

Note: METHOD_FRAME performs its own property lookup. For optional chaining method calls (o.m?.()), the property is therefore read twice: once for the null check and once by METHOD_FRAME. This is acceptable for njs (no Proxy support).

FUNCTION_FRAME Details

CASE (NJS_VMCODE_FUNCTION_FRAME):
    // operand2 = function value
    // njs_function_frame_create(vm, function, &njs_value_undefined, nargs, ctor)
    // — creates frame with this = undefined

4. Token Types and Lexer

Token Type Enum

Defined in src/njs_lexer.h. Token types serve a dual purpose:

  • Lexer tokens (produced by the lexer during tokenization)
  • AST node types (used in njs_parser_node_t.token_type)

Some token types are AST-only (never produced by the lexer):

NJS_TOKEN_OPTIONAL_CHAIN,       // wrapper for ?. chain
NJS_TOKEN_OPTIONAL_CHAIN_REF,   // placeholder for base index
NJS_TOKEN_PROPERTY_DELETE,      // delete obj.prop
NJS_TOKEN_METHOD_CALL,          // obj.method()
NJS_TOKEN_FUNCTION_CALL,        // func()

Token Serialization

Each token type needs a serialization entry in njs_parser_serialize_node() (at the end of njs_parser.c) for error messages and debugging:

njs_token_serialize(NJS_TOKEN_OPTIONAL_CHAIN);
njs_token_serialize(NJS_TOKEN_OPTIONAL_CHAIN_REF);
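The definition of njs_token_serialize is not reproduced in these notes. A common way to build such a helper is a case label combined with the preprocessor's stringify operator, sketched below with hypothetical toy_* names. Treat it as an illustration of the general technique rather than the actual njs macro.

```c
typedef enum {
    TOY_TOKEN_NAME,
    TOY_TOKEN_OPTIONAL_CHAIN,
    TOY_TOKEN_OPTIONAL_CHAIN_REF
} toy_token_type_t;

/* Expands to one switch case that yields the enumerator's own spelling. */
#define toy_token_serialize(token)                                          \
    case token:                                                             \
        return #token

static const char *
toy_token_name(toy_token_type_t type)
{
    switch (type) {
    toy_token_serialize(TOY_TOKEN_NAME);
    toy_token_serialize(TOY_TOKEN_OPTIONAL_CHAIN);
    toy_token_serialize(TOY_TOKEN_OPTIONAL_CHAIN_REF);
    }

    /* A token type without a serialization entry ends up here. */
    return "unknown";
}
```

With this shape, forgetting the entry for a new token type does not break the build; the node just prints as "unknown". That is why the entry is easy to miss.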

5. Adding a New VM Opcode (Checklist)

  1. src/njs_vmcode.h: Add to the njs_vmcode_t enum.
  2. src/njs_vmcode.c: Add NJS_GOTO_ROW() entry in the dispatch table. Add CASE handler.
  3. src/njs_disassembler.c: Add disassembly output block.
  4. src/njs_generator.c: Add dispatch in njs_generate() and implement the generation function(s).
  5. src/njs_lexer.h: If new AST-only token types are needed, add before NJS_TOKEN_RESERVED.
  6. src/njs_parser.c: Add token serialization at the end.

6. Parser Node Structure

struct njs_parser_node_s {
    njs_token_type_t    token_type:16;
    uint8_t             ctor:1;       // constructor call flag
    uint8_t             hoist:1;      // statement hoisting (imports)
    uint8_t             temporary;    // temp index flag
    uint32_t            token_line;

    union {
        uint32_t                    length;
        njs_variable_reference_t    reference;
        njs_value_t                 value;     // literal values
        njs_vmcode_t                operation; // opcode for operations
        njs_parser_node_t           *object;
        njs_mod_t                   *module;
    } u;

    njs_str_t           name;
    njs_index_t         index;        // result index after generation
    njs_parser_node_t   *left;        // first child / object
    njs_parser_node_t   *right;       // second child / property name
    njs_parser_node_t   *dest;        // destination hint
    // ... scope, etc.
};

Index Field

node->index is set during code generation. It encodes:

index | level_type (4-bit) | var_type (4-bit)

Level types: LOCAL=0, CLOSURE=1, GLOBAL=2, STATIC=3.

After generating a node, its index field contains the register/slot where the result is stored at runtime.
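A packed index of this shape can be built and taken apart with shifts and masks. The exact bit positions used by njs are not given in these notes, so the layout below (var_type in the low 4 bits, level type in the next 4, slot number above them) and all toy_* names are assumptions made for illustration.

```c
#include <stdint.h>

#define TOY_VAR_BITS    4
#define TOY_LEVEL_BITS  4
#define TOY_FIELD_MASK  0xFu

enum { TOY_LOCAL = 0, TOY_CLOSURE = 1, TOY_GLOBAL = 2, TOY_STATIC = 3 };

/* Pack slot number, level type and variable type into one 32-bit index. */
static uint32_t
toy_index_make(uint32_t slot, uint32_t level, uint32_t var_type)
{
    return (slot << (TOY_LEVEL_BITS + TOY_VAR_BITS))
           | ((level & TOY_FIELD_MASK) << TOY_VAR_BITS)
           | (var_type & TOY_FIELD_MASK);
}

static uint32_t
toy_index_slot(uint32_t index)
{
    return index >> (TOY_LEVEL_BITS + TOY_VAR_BITS);
}

static uint32_t
toy_index_level(uint32_t index)
{
    return (index >> TOY_VAR_BITS) & TOY_FIELD_MASK;
}

static uint32_t
toy_index_var_type(uint32_t index)
{
    return index & TOY_FIELD_MASK;
}
```

Under this assumed layout, slot 5 at CLOSURE level with var_type 2 packs to 0x512.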


7. Optional Chaining Implementation Details

AST Structure

For obj?.prop:

OPTIONAL_CHAIN (op: NJS_VMCODE_OPTIONAL_CHAIN)
+-- left: NAME(obj)              <- base, null-checked
+-- right: PROPERTY(ref, "prop") <- chain body

OPTIONAL_CHAIN_REF (shown as ref in the tree above) is a placeholder node whose index is set by the generator to match the base's index, so the chain body can reference the base's computed value without re-evaluating it.

Method Call This Binding

For obj.method?.() — the ?.() pattern on a property base:

OPTIONAL_CHAIN
+-- left: PROPERTY(NAME(obj), STRING("method"))
+-- right: METHOD_CALL
    +-- left: PROPERTY(ref_obj, ref_method)  <- two refs
    +-- right: args

The generator walks the right subtree's left spine to find a METHOD_CALL whose PROPERTY has an OPTIONAL_CHAIN_REF as its right child (distinguishing from normal METHOD_CALL where right is a STRING). It then extracts object/method-name indices from the base PROPERTY:

  • ref_obj->index = base_prop->left->index (the object)
  • ref_method->index = base_prop->right->index (the method name)

The METHOD_FRAME re-evaluates the property access but correctly sets this to the object.

Generator Left-Spine Walk

The njs_generate_optional_chain_ref() helper walks node->left repeatedly to find the first NJS_TOKEN_OPTIONAL_CHAIN_REF in a subtree. This works because AST nodes chain through left pointers:

PROPERTY -> left -> METHOD_CALL -> left -> PROPERTY -> left -> ref

For the two-ref method call case, the generator walks the left spine looking for a METHOD_CALL with the special PROPERTY(ref, ref) pattern instead of using the simple ref-finder.
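The simple ref-finder amounts to a loop over left pointers. A standalone sketch with hypothetical toy_* types (the real helper operates on njs_parser_node_t and may differ in detail); it covers only the single-ref case, not the PROPERTY(ref, ref) pattern:

```c
#include <stddef.h>

typedef enum {
    TOY_NAME,
    TOY_PROPERTY,
    TOY_METHOD_CALL,
    TOY_OPTIONAL_CHAIN_REF
} toy_type_t;

typedef struct toy_node {
    toy_type_t       type;
    struct toy_node  *left;
    struct toy_node  *right;
} toy_node_t;

/* Follow left pointers until a placeholder ref is found or the spine ends. */
static toy_node_t *
toy_find_ref(toy_node_t *node)
{
    while (node != NULL) {
        if (node->type == TOY_OPTIONAL_CHAIN_REF) {
            return node;
        }

        node = node->left;
    }

    return NULL;
}
```

Only the left spine is inspected, so a ref that appears solely as a right child is not found by this loop. That is the reason the two-ref method call case needs its own walk.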


8. Useful Debugging Techniques

Bytecode Disassembly

./build/njs -d -c "expression"

Opcode Tracing (requires --debug-opcode=YES build)

./configure --debug-opcode=YES && make njs
./build/njs -o script.js

Generator Debug (requires --debug-generator=YES build)

./configure --debug-generator=YES && make njs

Address Sanitizer Build

./configure --address-sanitizer=YES && make njs

GDB with debugger Statement

// In JS code:
debugger;

# In the shell:
gdb ./build/njs
(gdb) break njs_vmcode_debugger
(gdb) run script.js
# Then inspect: p *njs_scope_value_get(vm, 0x0123)

9. Key Source Files

| File | Purpose |
| --- | --- |
| src/njs_lexer.h | Token type enum, lexer structures |
| src/njs_lexer.c | Tokenizer implementation |
| src/njs_parser.h | Parser node structure, parser state |
| src/njs_parser.c | Parser state machine (~9600 lines) |
| src/njs_generator.c | Code generator state machine (~5200 lines) |
| src/njs_vmcode.h | VM opcode enum, instruction structures |
| src/njs_vmcode.c | VM interpreter loop (~1800 lines) |
| src/njs_disassembler.c | Bytecode disassembly output |
| src/njs_scope.h | Scope/index value access (njs_scope_value) |
| src/njs_variable.c | Variable resolution |
| src/njs_function.c | Function/frame management |
| src/test/njs_unit_test.c | Unit test cases (~23000 lines) |

10. Coding Conventions

  • 4 spaces indentation, no tabs
  • 80 character line limit
  • C declarations at the top of function bodies
  • njs_slow_path() / njs_fast_path() for branch prediction hints
  • Pool-based memory allocation (njs_mp_alloc), no individual frees for parser nodes
  • Error returns: NJS_ERROR for fatal, NJS_DONE for parse errors with message, NJS_DECLINED for "not handled"