Bytecode VM
For contributors working on the bytecode backend.
Executive Summary
- Two execution modes: tree-walk interpreter (default) and bytecode VM (`--mode=bytecode`), sharing the same frontend, runtime objects, and GC
- Executor abstraction: `TGocciaBytecodeExecutor` implements `TGocciaExecutor` with no dependency on the interpreter or evaluator
- Goccia-owned VM: executes directly on `TGocciaValue` with tagged `TGocciaRegister` values; not a generic VM layer
- Opcode space: core instructions (0-127) for hot paths, non-core generic ops (128-166), and semantic/helper instructions (167-255) for colder operations like imports/exports
- Binary format: `.gbc` files with little-endian encoding, `GBC\0` magic, and a version constant
Overview
GocciaScript has two execution modes:
- Interpreter mode: tree-walk execution over the AST via `TGocciaInterpreterExecutor`
- Bytecode mode: AST compilation to Goccia bytecode, then execution on `TGocciaVM` via `TGocciaBytecodeExecutor`
Both modes are implementations of TGocciaExecutor (see Architecture). The single TGocciaEngine class bootstraps the core language environment (global scope, core built-ins, shims) and delegates execution to whichever executor is configured. Optional host/runtime globals are attached through runtime extensions. The bytecode executor has no dependency on the interpreter or evaluator — it only uses the compiler and VM.
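The executor split can be pictured with a minimal sketch. It is written in Python for brevity, and every class and method name in it (`Executor`, `execute`, `Engine.run`) is a hypothetical stand-in for the Pascal types named above, not their real signatures. The point it illustrates is that the engine owns bootstrapping while each executor owns only execution.

```python
from abc import ABC, abstractmethod


class Executor(ABC):
    """Hypothetical stand-in for TGocciaExecutor."""

    @abstractmethod
    def execute(self, source: str, global_scope: dict) -> tuple:
        """Run `source` against an already-bootstrapped global scope."""


class InterpreterExecutor(Executor):
    """Stand-in for TGocciaInterpreterExecutor (tree-walk)."""

    def execute(self, source, global_scope):
        return ("interpreter", source)  # a real executor walks the AST here


class BytecodeExecutor(Executor):
    """Stand-in for TGocciaBytecodeExecutor: uses only a compiler and a VM."""

    def execute(self, source, global_scope):
        return ("bytecode", source)  # a real executor compiles, then runs the VM


class Engine:
    """Stand-in for TGocciaEngine: bootstraps once, then delegates."""

    def __init__(self, executor: Executor):
        self.executor = executor
        self.global_scope = {"globalThis": {}}  # shared core environment

    def run(self, source: str) -> tuple:
        return self.executor.execute(source, self.global_scope)
```

Swapping `InterpreterExecutor()` for `BytecodeExecutor()` changes how code runs without touching the engine's bootstrapping.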
Pipeline
Source -> JSX Transformer (optional) -> Lexer -> Parser -> Compiler -> Goccia Bytecode -> TGocciaVM -> TGocciaValue

Public bytecode artifacts use the `.gbc` extension.
Main Units
| Area | Units |
|---|---|
| Opcode definitions | Goccia.Bytecode.pas |
| Function templates / constants | Goccia.Bytecode.Chunk.pas |
| Module format | Goccia.Bytecode.Module.pas |
| Binary I/O | Goccia.Bytecode.Binary.pas |
| Debug metadata | Goccia.Bytecode.Debug.pas |
| VM execution | Goccia.VM.pas |
| Frames / closures / upvalues | Goccia.VM.CallFrame.pas, Goccia.VM.Closure.pas, Goccia.VM.Upvalue.pas |
| Bytecode executor | Goccia.Engine.Backend.pas (TGocciaBytecodeExecutor) |
| Opcode name lookup | Goccia.Bytecode.OpCodeNames.pas |
| Profiler | Goccia.Profiler.pas, Goccia.Profiler.Report.pas |
Core Design
- The register file uses tagged values that keep scalars unboxed until they cross a runtime boundary (see Why Shared Runtime Values?).
- The VM uses the same value classes as the interpreter: arrays, objects, classes, promises, functions, symbols, enums, and built-ins.
- `undefined`, `null`, booleans, and hole values use shared singleton objects.
- Sparse arrays use `TGocciaHoleValue.HoleValue`, not raw `nil`.
- The VM is integrated with the shared garbage collector and shared call stack.
- Call stack depth is tracked per frame (`FFrameDepth`) and enforced against a configurable limit (default 3,500 frames, `--stack-size=N`). Exceeding the limit throws a `RangeError: Maximum call stack size exceeded`. Pass `--stack-size=0` to disable the limit. Bytecode-to-bytecode calls use a trampoline (`FFrameStack`) so the Pascal call stack stays flat regardless of JS call depth.
- Bytecode mode enables strict type enforcement through compiler-emitted checks and typed opcodes.
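The trampoline and depth limit described above can be illustrated with a minimal Python sketch. The `CALL`/`RETURN` instruction pairs and the `Frame` class are hypothetical, not Goccia opcodes. `frame_stack` and `depth` only mirror the roles of `FFrameStack` and `FFrameDepth`. A call pushes a frame onto a list instead of recursing in the host language, so host stack usage stays constant while the JS-level depth is still counted and checked.

```python
class RangeError(Exception):
    """Stand-in for the JS RangeError the VM throws."""


class Frame:
    def __init__(self, code, depth):
        self.code = code    # list of (opcode, operand) pairs
        self.ip = 0
        self.depth = depth  # mirrors the role of FFrameDepth


def run(entry, stack_size=3500):
    """Execute on a trampoline: CALL pushes a frame instead of recursing.
    stack_size=0 disables the depth check."""
    frame_stack = [Frame(entry, 1)]  # mirrors the role of FFrameStack
    result = None
    while frame_stack:
        frame = frame_stack[-1]
        if frame.ip >= len(frame.code):
            frame_stack.pop()  # fell off the end: implicit return
            continue
        op, operand = frame.code[frame.ip]
        frame.ip += 1
        if op == "CALL":
            depth = frame.depth + 1
            if stack_size and depth > stack_size:
                raise RangeError("Maximum call stack size exceeded")
            frame_stack.append(Frame(operand, depth))
        elif op == "RETURN":
            result = operand  # simplified: a real VM writes into the caller's register
            frame_stack.pop()
    return result


def chain(n):
    """Build n nested functions; the innermost returns 42."""
    code = [("RETURN", 42)]
    for _ in range(n - 1):
        code = [("CALL", code)]
    return code
```

`run(chain(3500))` returns 42. `run(chain(3501))` raises the `RangeError`. With `stack_size=0`, much deeper chains run without touching the Python recursion limit.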
Opcode Layout
The opcode space is split into three tiers:
- `0..127`: core VM instructions
- `128..166`: non-core generic arithmetic/bitwise operations
- `167..255`: semantic helper/orchestration operations
In the current VM:
- core instructions cover hot execution paths such as locals, arithmetic, comparisons, property/index access, calls, construction, iteration, and class/object setup
- generic arithmetic and bitwise operations currently occupy `128..140` of the non-core generic range
- module and async orchestration currently starts at `167` (`IMPORT`, `EXPORT`, `AWAIT`, `IMPORT_META`)
The current encoding helpers are defined in `Goccia.Bytecode.pas`: `EncodeABC`, `EncodeABx`, `EncodeAsBx`, `EncodeAx`, `DecodeOp`, `DecodeA`, `DecodeB`, `DecodeC`, `DecodeBx`, `DecodesBx`, and `DecodeAx`.
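The helper names suggest Lua-style instruction formats. The following Python sketch shows how such encoders and decoders typically pair up. The bit layout here is an assumption made for illustration: a 32-bit word with an 8-bit opcode in the low byte, 8-bit `A`/`B`/`C`, a 16-bit `Bx`, a 24-bit `Ax`, and a biased signed `sBx`. The real field widths are whatever `Goccia.Bytecode.pas` defines.

```python
# Hypothetical 32-bit layout: op[0..7] A[8..15] B[16..23] C[24..31],
# Bx = bits 16..31, Ax = bits 8..31. Not the confirmed Goccia layout.
MASK8, MASK16, MASK24 = 0xFF, 0xFFFF, 0xFFFFFF
SBIAS_16 = MASK16 >> 1  # 32767: Lua-style bias, half the field maximum


def encode_abc(op, a, b, c):
    return (op & MASK8) | (a & MASK8) << 8 | (b & MASK8) << 16 | (c & MASK8) << 24


def encode_abx(op, a, bx):
    return (op & MASK8) | (a & MASK8) << 8 | (bx & MASK16) << 16


def encode_asbx(op, a, sbx):
    if not -SBIAS_16 <= sbx <= MASK16 - SBIAS_16:
        raise ValueError(f"sBx {sbx} out of range")
    return encode_abx(op, a, sbx + SBIAS_16)


def encode_ax(op, ax):
    return (op & MASK8) | (ax & MASK24) << 8


def decode_op(i):
    return i & MASK8


def decode_a(i):
    return (i >> 8) & MASK8


def decode_b(i):
    return (i >> 16) & MASK8


def decode_c(i):
    return (i >> 24) & MASK8


def decode_bx(i):
    return (i >> 16) & MASK16


def decode_sbx(i):
    return decode_bx(i) - SBIAS_16


def decode_ax(i):
    return (i >> 8) & MASK24
```

Under this layout, `decode_sbx(encode_asbx(op, a, -100))` round-trips to `-100`. A signed jump offset is stored as an unsigned field plus a fixed bias.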
Current instruction families:
- load and move: constants, literals, locals, upvalues
- control flow: jumps, handlers, throw, return
- closures: function templates and captured upvalues
- typed arithmetic and comparison
- object and array operations
- class construction and member definition
- calls, construction, iteration, globals, and string coercion
- semantic-only imports/exports, dynamic import, import.meta, await, and yield
Some opcode families intentionally use flags or mode operands instead of one opcode per syntax form. For example:
- accessor definition uses constant-key and dynamic-key instructions plus getter/setter and static/instance flags
- collection helpers use a shared opcode (`OP_COLLECTION_OP`) for object spread, object rest, and iterable-to-array spread
- validation uses a shared opcode for require-object and require-iterable checks
- generator metadata is serialized, and a single `OP_YIELD` drives generator suspension; `yield*` uses the same opcode with delegation bookkeeping rather than a second opcode
- bytecode generators suspend by snapshotting the VM continuation at `OP_YIELD` (instruction pointer, live registers, local cells, handler state, and GC-visible references) and resume from that snapshot instead of replaying from function entry
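Resuming from a snapshot rather than replaying can be illustrated with a toy continuation. This is a Python sketch with hypothetical `LOADK`/`ADD`/`YIELD`/`RETURN` instructions. The continuation keeps only the instruction pointer and registers; local cells, handler state, and GC references from the real snapshot are deliberately omitted. The `steps` counter shows that each instruction runs exactly once across all resumes.

```python
from dataclasses import dataclass, field


@dataclass
class Continuation:
    """Saved generator state (simplified: no cells, handlers, or GC roots)."""
    ip: int = 0
    registers: list = field(default_factory=lambda: [None] * 4)
    steps: int = 0  # total instructions executed across all resumes
    done: bool = False


def resume(code, cont):
    """Continue from cont.ip until YIELD or RETURN; never replays from entry."""
    regs = cont.registers
    while cont.ip < len(code):
        op, a, b = code[cont.ip]
        cont.ip += 1
        cont.steps += 1
        if op == "LOADK":
            regs[a] = b
        elif op == "ADD":
            regs[a] = regs[a] + regs[b]
        elif op == "YIELD":
            return regs[a]  # ip and registers stay in cont for the next resume
        elif op == "RETURN":
            cont.done = True
            return regs[a]
    cont.done = True
    return None


PROGRAM = [
    ("LOADK", 0, 1), ("YIELD", 0, None),
    ("LOADK", 1, 10), ("ADD", 0, 1), ("YIELD", 0, None),
    ("ADD", 0, 1), ("RETURN", 0, None),
]
```

Calling `resume(PROGRAM, c)` three times on one `Continuation` produces 1, 11, and 21. Afterwards `c.steps` equals `len(PROGRAM)`, so no instruction was re-executed.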
Current opcode design rules:
- add explicit Goccia opcodes for stable, hot, language-owned behaviour
- do not introduce generic VM naming into new instructions
- do not add an opcode for something already reachable through existing call dispatch: if a built-in (e.g. `Object.freeze`) already goes through `OP_CALL_METHOD`, emit that call sequence from the compiler rather than adding a new opcode or sub-mode
- prefer mode operands on shared opcodes over proliferating single-purpose opcodes for cold or infrequent operations
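A mode operand on a shared opcode amounts to one handler branching on a small enum. The Python sketch below is loosely modelled on `OP_COLLECTION_OP`. The `CollectionMode` names and values are hypothetical, and plain dicts and lists stand in for Goccia objects and arrays. In particular, `dict.update` ignores JS rules about own enumerable properties.

```python
from enum import IntEnum


class CollectionMode(IntEnum):
    """Hypothetical mode operand values for a shared collection opcode."""
    OBJECT_SPREAD = 0
    OBJECT_REST = 1
    ARRAY_SPREAD = 2


def op_collection(mode, target, source, excluded=()):
    """One handler, three behaviours selected by the mode operand."""
    if mode == CollectionMode.OBJECT_SPREAD:   # {...target, ...source}
        target.update(source)
        return target
    if mode == CollectionMode.OBJECT_REST:     # const {a, ...rest} = source
        return {k: v for k, v in source.items() if k not in excluded}
    if mode == CollectionMode.ARRAY_SPREAD:    # [...target, ...iterable]
        target.extend(source)
        return target
    raise ValueError(f"unknown collection mode: {mode}")
```

Adding a fourth cold collection operation would mean a new enum member rather than a new opcode number.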
Performance Direction
Recent VM cleanup and optimization work has focused on reducing per-instruction overhead without reintroducing old abstraction layers:
- cache and reuse shared primitive values directly in registers
- avoid eager allocation of closure cells for uncaptured locals
- pre-size argument collections for calls and construction
- use unchecked template access in the dispatch loop where bounds are already guaranteed
- keep fast register access limited to proven hot/simple paths; local-slot and complex property paths should only move to fast access when they stay correct and measurably improve throughput
The current optimization target is reducing bytecode-mode suite time further without diverging interpreter and bytecode semantics.
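Avoiding eager closure cells requires a compile-time decision about which locals are actually captured. The Python sketch below shows that decision. The function name and the shape of its input (a map from each nested function to the outer names it references) are hypothetical, not the Goccia compiler's data structures.

```python
def plan_local_storage(local_names, closure_free_vars):
    """Decide storage per local: a heap cell only if a nested closure captures it.

    `closure_free_vars` maps each nested function to the set of outer
    names it references (hypothetical compiler analysis output).
    """
    captured = set()
    for free_vars in closure_free_vars.values():
        captured |= free_vars
    return {
        name: "cell" if name in captured else "register"
        for name in local_names
    }
```

For locals `i`, `total`, and `tmp`, where only a nested `increment` closure reads `total`, only `total` gets a cell. The other two stay as plain register slots.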
Profiling
The --profile flag on GocciaScriptLoader enables language-level profiling of the bytecode VM. See profiling.md for the full guide.
- `--profile=opcodes`: opcode frequency histogram, opcode pair frequency (superinstruction candidates), and scalar fast-path hit rate for generic arithmetic/comparison opcodes
- `--profile=functions`: per-function self-time, total-time, call count, and heap allocation count
- `--profile=all`: both
- `--profile-output=path.json`: JSON export
The profiler follows the same singleton-tracker pattern as coverage (Goccia.Coverage.pas). Zero overhead when disabled. Opcode counting adds ~1% overhead; function timing adds ~3%.
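Opcode pair frequency is what identifies superinstruction candidates: a pair that is dispatched back-to-back very often could be fused into one instruction. The Python sketch below shows one plausible way to compute both histograms from a dispatch trace. The real profiler counts during dispatch in Pascal rather than from a recorded trace, and the opcode names here are made up.

```python
from collections import Counter


def profile_opcodes(trace):
    """Return (opcode histogram, adjacent-pair histogram) for a dispatch trace."""
    singles = Counter(trace)
    pairs = Counter(zip(trace, trace[1:]))
    return singles, pairs


trace = ["LOADK", "ADD", "LOADK", "ADD", "RETURN"]
singles, pairs = profile_opcodes(trace)
top_pair = pairs.most_common(1)  # the best fusion candidate in this trace
```

Here `("LOADK", "ADD")` appears twice and tops the pair table. That makes it the first candidate a fused instruction would target.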
Instruction Limit
The dispatch loop supports an optional instruction counter (Goccia.InstructionLimit.pas). When armed, the counter increments on every dispatched instruction and the limit is checked at the top of each iteration. When disabled, only the guard read of the limit threadvar remains on the hot path. See Embedding — Execution Limits for the full API and interpreter-mode behavior.
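The armed/disabled behaviour can be sketched as a toy Python dispatch loop. Here the counter is a local variable rather than a threadvar. The `limit=0` means disabled convention, the exception name, and the `ADD`/`JUMP` instructions are all hypothetical. When disabled, each iteration pays only for the `if limit` guard. When armed, the check runs before every instruction, so an infinite loop is stopped.

```python
class InstructionLimitExceeded(Exception):
    """Hypothetical error raised when the instruction budget runs out."""


def dispatch(code, limit=0):
    """Tiny dispatch loop with an optional instruction budget (0 = disabled)."""
    executed = 0
    acc = 0
    ip = 0
    while ip < len(code):
        if limit:  # the only hot-path cost when the limit is disabled
            if executed >= limit:
                raise InstructionLimitExceeded(f"limit of {limit} instructions reached")
            executed += 1
        op, operand = code[ip]
        ip += 1
        if op == "ADD":
            acc += operand
        elif op == "JUMP":
            ip = operand
    return acc
```

`dispatch([("ADD", 1), ("JUMP", 0)], limit=100)` raises after 100 instructions. Without a limit, the same program would never terminate.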
Binary Format
- Magic: `GBC\0`
- Version constant: `GOCCIA_FORMAT_VERSION`
- Endianness: little-endian
- File extension: `.gbc`
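The Python sketch below reads and writes a header using the listed magic and endianness. Two details are assumptions, not the documented `.gbc` layout. The version is stored as an unsigned 32-bit field straight after the magic. `FORMAT_VERSION = 1` is a placeholder for `GOCCIA_FORMAT_VERSION`.

```python
import struct

MAGIC = b"GBC\x00"
FORMAT_VERSION = 1  # placeholder; the real value is GOCCIA_FORMAT_VERSION


def write_header(version=FORMAT_VERSION):
    # "<I": little-endian unsigned 32-bit (the field width is an assumption)
    return MAGIC + struct.pack("<I", version)


def read_header(data):
    if data[:4] != MAGIC:
        raise ValueError("not a Goccia bytecode (.gbc) file")
    (version,) = struct.unpack_from("<I", data, 4)
    return version
```

With little-endian encoding, version 3 is written as bytes `03 00 00 00`. The file therefore reads the same regardless of host byte order.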
Current Status
- `--mode=bytecode` runs the Goccia VM directly.
- The full JavaScript suite passes in bytecode mode.
- The old generic VM/runtime bridge has been removed from the active build.
Design Rationale
GocciaScript includes a bytecode execution backend built specifically for GocciaScript. The current VM is not a language-agnostic subsystem: it executes directly on TGocciaValue, shares the same runtime objects as the interpreter, and uses a Goccia-owned opcode surface.
Why a Bytecode VM?
The tree-walk interpreter directly evaluates AST nodes via recursive function calls. This is simple and debuggable, but carries overhead from VMT dispatch on every AST node, deep call stacks for nested expressions, and no opportunity for instruction-level optimization. A bytecode VM trades compilation cost for faster execution: flat instruction dispatch, register-based operands, and a compact in-memory representation.
Why Register-Based?
Stack-based VMs (like the JVM and WASM) are simpler to compile to and have smaller instruction encoding. Register-based VMs (like Lua 5, LuaJIT, and Dalvik) need fewer instructions per operation and avoid redundant stack manipulations. Register-based was chosen for execution performance.
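The instruction-count difference is easiest to see on a single statement. The Python sketch below evaluates `a = b + c` on a toy stack machine and on a toy register machine. Both instruction sets are illustrative and are not Goccia opcodes. The stack version needs four instructions to shuttle operands. The register version names its operands directly, because locals already live in registers.

```python
def run_stack(code, env):
    """Stack machine: operands travel through an explicit value stack."""
    stack = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            rhs = stack.pop()
            stack.append(stack.pop() + rhs)
        elif op == "STORE":
            env[arg] = stack.pop()
    return env


def run_register(code, regs):
    """Register machine: each instruction names its operands directly."""
    for op, a, b, c in code:
        if op == "ADD":
            regs[a] = regs[b] + regs[c]
    return regs


# Both programs compute `a = b + c`; registers 0, 1, 2 hold a, b, c.
STACK_CODE = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
REGISTER_CODE = [("ADD", 0, 1, 2)]
```

Each eliminated instruction is one fewer trip through the dispatch loop. That per-instruction saving is what the register design is trading against its wider encoding.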
Why Three Tiers?
Rather than one flat opcode range, the opcode space is split into three tiers:
- Core range (0–127): register, control-flow, closure, literal, and other hot/stable VM operations.
- Non-core generic range (128–166): generic arithmetic and bitwise operations that are still explicit bytecode but handle mixed or untyped operands.
- Semantic helper range (167–255): colder language-level orchestration operations such as imports/exports, dynamic import, `import.meta`, await, and resource disposal.
This split keeps the dispatch surface organized while still allowing the backend to be explicitly Goccia-specific.
Why Shared Runtime Values?
The VM shares the TGocciaValue object model with the interpreter rather than maintaining a second value representation. Registers use TGocciaRegister — a tagged variant record (Goccia.VM.Registers.pas) that keeps booleans, integers, and floats unboxed as scalars. Values only cross into TGocciaValue when they leave the register file (e.g., property access, function calls, GC marking).
That choice removes:
- conversion layers between interpreter values and VM values
- duplicate object models for arrays, objects, classes, and promises
- bridge-only GC root management
- bytecode/runtime disagreement over `undefined`, `null`, and sparse array holes
The trade-off is that arithmetic fast paths are split between scalar register operations (typed opcodes like OP_ADD_INT / OP_ADD_FLOAT) and generic TGocciaValue fallbacks (like OP_ADD).
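The split between typed and generic paths can be made concrete with a small Python model of a tagged register. `Register`, `Boxed`, `op_add_int`, and `op_add` are hypothetical stand-ins for `TGocciaRegister`, `TGocciaValue`, `OP_ADD_INT`, and `OP_ADD`. Two simplifications apply. Python `str()` stands in for JS `ToString`, and only string concatenation is modelled on the boxed fallback. Scalars stay unboxed, and a heap-style object appears only in `to_value` or on the fallback path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Boxed:
    """Stand-in for a heap-allocated TGocciaValue."""
    kind: str
    payload: object


@dataclass(frozen=True)
class Register:
    """Stand-in for TGocciaRegister: scalars unboxed, everything else a reference."""
    tag: str  # "int", "float", "bool", or "ref"
    value: object

    def to_value(self):
        # Boxing happens only when a value leaves the register file.
        return self.value if self.tag == "ref" else Boxed(self.tag, self.value)


def op_add_int(a, b):
    """Typed add: operands are proven ints (overflow handling omitted)."""
    return Register("int", a.value + b.value)


def op_add(a, b):
    """Generic add: scalar fast path first, boxed fallback otherwise."""
    numeric = ("int", "float")
    if a.tag in numeric and b.tag in numeric:
        tag = "int" if a.tag == b.tag == "int" else "float"
        return Register(tag, a.value + b.value)
    lhs, rhs = a.to_value(), b.to_value()
    if "string" in (lhs.kind, rhs.kind):
        # Python str() stands in for JS ToString, so e.g. 1.0 prints as "1.0".
        return Register("ref", Boxed("string", f"{lhs.payload}{rhs.payload}"))
    raise NotImplementedError("full JS `+` coercion is outside this sketch")
```

The typed opcode skips every tag check because the compiler has already proven the operand types. The generic opcode has to test tags on every execution before it can take its fast path.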
Compiler-Side Desugaring
Language features are compiled into compact bytecode instruction sequences rather than expanding the opcode surface unnecessarily:
- Nullish coalescing (`??`) and nullish coalescing assignment (`??=`): the compiler emits `OP_JUMP_IF_NOT_NULLISH` in its nullish-match mode, so `undefined`, `null`, and internal hole values all follow the same short-circuit path without extra comparison instructions.
- Template literals: the compiler parses interpolations at compile time, emits string constants and `OP_TO_STRING` for expression parts, then chains `OP_CONCAT` instructions.
- Object spread: the compiler emits dedicated Goccia bytecode rather than routing through a generic extension dispatcher.
This keeps the emitted bytecode compact and makes opcode additions deliberate instead of reactive.
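The `??` lowering can be sketched as a forward jump that is patched after the right-hand side has been emitted. In this Python sketch only the name `OP_JUMP_IF_NOT_NULLISH` comes from the text above. `LOADK`, the list-based instruction format, and the `object()` singletons for `undefined`, `null`, and holes are hypothetical. The single jump treats all three nullish singletons identically.

```python
UNDEFINED, NULL, HOLE = object(), object(), object()  # singleton stand-ins
NULLISH = (UNDEFINED, NULL, HOLE)


def compile_nullish_coalesce(dest, left, right):
    """Lower `left ?? right`; both sub-sequences leave their result in `dest`."""
    code = [list(ins) for ins in left]
    jump_index = len(code)
    code.append(["OP_JUMP_IF_NOT_NULLISH", dest, None])  # target patched below
    code.extend(list(ins) for ins in right)
    code[jump_index][2] = len(code)  # skip `right` when `left` is not nullish
    return code


def run(code):
    regs, ip = {}, 0
    while ip < len(code):
        op, a, b = code[ip]
        ip += 1
        if op == "LOADK":
            regs[a] = b
        elif op == "OP_JUMP_IF_NOT_NULLISH":
            if not any(regs[a] is v for v in NULLISH):
                ip = b
    return regs
```

`0 ?? "fallback"` keeps `0`, because only the nullish singletons fall through to the right-hand side. This matches JS semantics, where falsy-but-present values are not replaced.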
Compiler Optimizer
Bytecode compilation includes a small compile-time value optimizer. It folds pure primitive constant expressions, propagates immutable local const bindings initialized from compile-time constants, and omits branches or statement tails that are provably unreachable.
The optimizer is intentionally compiler-side only:
- it does not add opcodes or change the `.gbc` format
- it does not track mutable bindings, imports, destructuring, function/class declarations, or global-backed top-level bindings
- it only uses `--strict-types` for conservative algebraic simplifications where the strict type alone preserves JavaScript semantics
When coverage is enabled, PreserveCoverageShape keeps constant branch structure in the emitted bytecode so coverage can report the non-hit branch instead of erasing it from the report.
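The three optimizer behaviours (folding, `const` propagation, and dead-branch removal gated by coverage) fit in a small AST-level Python sketch. The tuple node shapes, `fold`, `fold_if`, and the `preserve_coverage_shape` parameter are hypothetical. The parameter only mirrors the role of `PreserveCoverageShape`. `consts` is assumed to contain only immutable `const` bindings with compile-time-constant initializers, and Python ints stand in for JS doubles.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node, consts):
    """Fold pure numeric expressions and propagate known `const` bindings.

    Nodes: ("num", v) | ("name", n) | ("bin", op, lhs, rhs).
    Python ints stand in for JS numbers, which are IEEE doubles.
    """
    if node[0] == "name" and node[1] in consts:
        return ("num", consts[node[1]])
    if node[0] == "bin":
        lhs, rhs = fold(node[2], consts), fold(node[3], consts)
        if lhs[0] == rhs[0] == "num" and node[1] in FOLDABLE:
            return ("num", FOLDABLE[node[1]](lhs[1], rhs[1]))
        return ("bin", node[1], lhs, rhs)
    return node


def fold_if(cond, then_branch, else_branch, consts, preserve_coverage_shape=False):
    """Drop a provably dead branch unless coverage needs the branch kept."""
    folded = fold(cond, consts)
    if folded[0] == "num" and not preserve_coverage_shape:
        return then_branch if folded[1] else else_branch
    return ("if", folded, then_branch, else_branch)
```

`const K = 41; K + 1` folds to `42`. An unknown name such as `x` leaves the expression intact. With coverage on, `if (0)` keeps both branches so the report can show the untaken one.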
How Opcode Additions Work
New opcodes should be added only when an operation is both common enough and semantically stable enough to justify a dedicated instruction.
Prefer:
- explicit Goccia opcodes for core language/runtime behaviour
- compiler lowering to existing instructions for syntactic sugar
- flags or operands when an operation is a mode of an existing instruction rather than a new concept
Tier 1 Property Flags vs Tier 2 Visibility
Property mutability (writable/configurable) is still a VM concern. Bulk operations like freeze and seal remain derived from the lower-level property-flag operations:
- `SetEntryFlags(key, flags)`: modify flags on a single property
- `PutWithFlags(key, value, flags)`: create a property with specific flags
- `PreventExtensions`: stop new properties from being added
- `Freeze` = iterate all entries, set flags to 0, prevent extensions (a convenience, not a primitive)
Property visibility and accessor semantics remain part of the higher-level object/class model rather than low-level property-flag storage.
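The following Python sketch shows how `Freeze` falls out of the lower-level operations. The `PropertyTable` class, its snake_case methods, and the flag bit values are hypothetical stand-ins for the operations listed above. Only writable and configurable are modelled as flags, so clearing them to 0 leaves enumerability untouched. Accessor properties and redefinition rules are omitted.

```python
WRITABLE = 0x1      # hypothetical bit values
CONFIGURABLE = 0x2


class PropertyTable:
    def __init__(self):
        self.entries = {}  # key -> [value, flags]
        self.extensible = True

    def put_with_flags(self, key, value, flags):
        if key not in self.entries and not self.extensible:
            raise TypeError(f"cannot add property {key!r}")
        self.entries[key] = [value, flags]

    def set_entry_flags(self, key, flags):
        self.entries[key][1] = flags

    def prevent_extensions(self):
        self.extensible = False

    def freeze(self):
        # Convenience built from the primitives above, not a primitive itself.
        for key in self.entries:
            self.set_entry_flags(key, 0)
        self.prevent_extensions()

    def set(self, key, value):
        entry = self.entries.get(key)
        if entry is None:
            self.put_with_flags(key, value, WRITABLE | CONFIGURABLE)
        elif entry[1] & WRITABLE:
            entry[0] = value
        else:
            raise TypeError(f"{key!r} is read-only")
```

`freeze` has no special-case logic of its own, so `seal` could be written the same way by clearing only `CONFIGURABLE`.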
Spread Calling Consolidation
Spread-based calls use the flags byte on OP_CALL and OP_CALL_METHOD. Spread is treated as a mode of the call instruction rather than as a separate opcode family.
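Treating spread as a mode means one call handler tests a flag bit. The Python sketch below shows the shape of that handler. The bit value `FLAG_SPREAD = 0x01` is hypothetical. So is the assumption that, for spread calls, the compiler has already packed all arguments into a single array operand. The point is that spread adds a branch inside `OP_CALL` instead of a parallel opcode family.

```python
FLAG_SPREAD = 0x01  # hypothetical bit within the call instruction's flags byte


def op_call(fn, args, flags):
    """Single call handler; spread is a mode bit, not a separate opcode.

    Assumption: for spread calls the compiler has already materialised all
    arguments (spread and plain) into one array, passed as the only operand.
    """
    if flags & FLAG_SPREAD:
        (arg_array,) = args
        return fn(*arg_array)
    return fn(*args)
```

`f(1, 5, 3)` and `f(...[1, 5, 3])` reach the same handler and differ only in the flag bit.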
Rejected Findings
During code review, the following findings were investigated and determined to be non-issues:
- `SBIAS_24` (`Goccia.Bytecode.pas`): the 24-bit signed bias constant 8388607 is correct. Subtracting it from the 24-bit unsigned range 0..16777215 gives a signed range of −8388607..+8388608. This is standard Lua-style bias encoding.
- Token list leak in `Goccia.Compiler.Test.pas`: `Lexer.ScanTokens` returns the lexer's own `FTokens` list (freed in the lexer's destructor). Adding a manual `Tokens.Free` causes a double-free crash.
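The `SBIAS_24` range claim can be checked mechanically. The Python sketch below uses the constant quoted above. The helper names `encode_s24` and `decode_s24` are hypothetical. The bias equals the field maximum shifted right by one, the same pattern Lua uses for its `sBx` offset.

```python
SBIAS_24 = 8388607          # value quoted above from Goccia.Bytecode.pas
FIELD_MAX = (1 << 24) - 1   # largest unsigned 24-bit value, 16777215


def encode_s24(value):
    raw = value + SBIAS_24
    if not 0 <= raw <= FIELD_MAX:
        raise ValueError(f"{value} does not fit in a biased 24-bit field")
    return raw


def decode_s24(raw):
    return raw - SBIAS_24
```

Raw 0 decodes to −8388607 and raw 16777215 to +8388608. −8388608 is rejected. The range is therefore asymmetric by one, as the finding states.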
Related documents
- Architecture: shared frontend and both backends at a glance
- Interpreter: tree-walk backend (`Goccia.Interpreter`, `Goccia.Evaluator.*`)
- Core patterns: recurring Pascal conventions and terminology
Contributor Notes
- Do not add new bytecode/runtime concepts under old generic naming.
- Prefer `TGoccia*` bytecode and VM types in new code.
- Keep interpreter and bytecode semantics aligned through shared runtime objects, not conversion layers.