Bytecode VM
For contributors working on the bytecode executor and VM.
Executive Summary#
- Two execution modes: tree-walk interpreter (default) and bytecode VM (`--mode=bytecode`), sharing the same source pipeline, runtime objects, and GC
- Executor abstraction: `TGocciaBytecodeExecutor` implements `TGocciaExecutor` with no dependency on the interpreter or evaluator
- Goccia-owned VM: executes directly on `TGocciaValue` with tagged `TGocciaRegister` values; not a generic VM layer
- Opcode space: core instructions (0-127) for hot paths, non-core generic ops (128-166), and semantic/helper instructions (167-255) for colder operations like imports/exports
- Binary format: `.gbc` files with little-endian encoding, `GBC\0` magic, and version constant
Overview#
GocciaScript has two execution modes:
- Interpreter mode: tree-walk execution over the AST via `TGocciaInterpreterExecutor`
- Bytecode mode: AST compilation to Goccia bytecode, then execution on `TGocciaVM` via `TGocciaBytecodeExecutor`
Both execution modes are implementations of TGocciaExecutor (see Architecture). The single TGocciaEngine class bootstraps the core language environment (global scope, core built-ins, shims) and delegates execution to whichever executor is configured. Optional runtime globals are attached through runtime extensions. The bytecode executor has no dependency on the interpreter or evaluator — it only uses the compiler and VM.
Pipeline#
Source -> JSX Transformer (optional) -> Lexer -> Parser -> Compiler -> Goccia Bytecode -> TGocciaVM -> TGocciaValue

Public bytecode artifacts use the `.gbc` extension.
Main Units#
| Area | Units |
|---|---|
| Opcode definitions | Goccia.Bytecode.pas |
| Function templates / constants | Goccia.Bytecode.Chunk.pas |
| Module format | Goccia.Bytecode.Module.pas |
| Binary I/O | Goccia.Bytecode.Binary.pas |
| Debug metadata | Goccia.Bytecode.Debug.pas |
| VM execution | Goccia.VM.pas |
| Frames / closures / upvalues | Goccia.VM.CallFrame.pas, Goccia.VM.Closure.pas, Goccia.VM.Upvalue.pas |
| Bytecode executor | Goccia.Executor.Bytecode.pas (TGocciaBytecodeExecutor) |
| Opcode name lookup | Goccia.Bytecode.OpCodeNames.pas |
| Profiler | Goccia.Profiler.pas, Goccia.Profiler.Report.pas |
Core Design#
- The register file uses tagged values that keep scalars unboxed until they cross a runtime boundary (see Design Direction).
- The VM uses the same value classes as the interpreter: arrays, objects, classes, promises, functions, symbols, enums, and built-ins.
- `undefined`, `null`, booleans, and hole values use shared singleton objects.
- Sparse arrays use `TGocciaHoleValue.HoleValue`, not raw `nil`.
- The VM is integrated with the shared garbage collector and shared call stack.
- Call stack depth is tracked per frame (`FFrameDepth`) and enforced against a configurable limit (default 2900 frames, `--stack-size=N`). Exceeding the limit throws a `RangeError: Maximum call stack size exceeded`. Pass `--stack-size=0` to disable the limit. Bytecode-to-bytecode calls use a trampoline (`FFrameStack`) so the Pascal call stack stays flat regardless of JS call depth.
- Bytecode mode enables strict type enforcement through compiler-emitted checks and typed opcodes.
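The trampoline and depth limit described in the list above can be modelled in a few lines. This is a Python sketch for illustration only, not the Pascal implementation: `Frame`, `run`, and the toy recursive function are hypothetical, and the exact comparison used by the real depth check is an assumption. Only the 2900 default, the `0` = disabled convention, and the error message come from the text.

```python
from dataclasses import dataclass

DEFAULT_STACK_SIZE = 2900  # default frame limit quoted in the text


class RangeError(Exception):
    """Stand-in for the JS RangeError raised by the VM."""


@dataclass
class Frame:
    n: int      # argument of the toy function f(n) = 0 if n == 0 else 1 + f(n - 1)
    depth: int  # per-frame depth counter (plays the role of FFrameDepth)


def run(n, stack_size=DEFAULT_STACK_SIZE):
    """Evaluate f(n) with an explicit frame stack instead of host recursion."""
    frame_stack = [Frame(n, 1)]  # plays the role of FFrameStack
    # Call phase: each "call" pushes a frame; the host stack never grows.
    while frame_stack[-1].n > 0:
        caller = frame_stack[-1]
        depth = caller.depth + 1
        if stack_size and depth > stack_size:  # stack_size == 0 disables the check
            raise RangeError("Maximum call stack size exceeded")
        frame_stack.append(Frame(caller.n - 1, depth))
    # Return phase: pop frames, each pending caller adds 1 to the result.
    result = 0
    while len(frame_stack) > 1:
        frame_stack.pop()
        result += 1
    return result
```

With the limit disabled, `run(5000, stack_size=0)` completes even though 5000 nested host-level calls would exceed Python's default recursion limit, which is the point of keeping the host stack flat.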
Opcode Layout#
The opcode space is split into three tiers:
- `0..127`: core VM instructions
- `128..166`: non-core generic arithmetic/bitwise operations
- `167..255`: semantic helper/orchestration operations
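As a quick illustration of the three ranges, the following Python sketch classifies an opcode number by tier. The function name `opcode_tier` and the tier labels are hypothetical; only the numeric boundaries come from the list above.

```python
def opcode_tier(op: int) -> str:
    """Map an opcode number to the tier it falls in."""
    if not 0 <= op <= 255:
        raise ValueError(f"opcode out of range: {op}")
    if op <= 127:
        return "core"
    if op <= 166:
        return "non-core generic"
    return "semantic"
```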
In the current VM:
- core instructions cover hot execution paths such as locals, arithmetic, comparisons, property/index access, calls, construction, iteration, and class/object setup
- generic arithmetic and bitwise operations currently occupy `128..140` of the non-core range
- module and async orchestration currently starts at `167` (`IMPORT`, `EXPORT`, `AWAIT`, `IMPORT_META`)
The current encoding helpers are defined in Goccia.Bytecode.pas:
- `EncodeABC`
- `EncodeABx`
- `EncodeAsBx`
- `EncodeAx`
- `DecodeOp`
- `DecodeA`
- `DecodeB`
- `DecodeC`
- `DecodeBx`
- `DecodesBx`
- `DecodeAx`
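To give a feel for what helpers of this kind do, here is a Python sketch of packing and unpacking register operands. The field layout is an assumption, not taken from `Goccia.Bytecode.pas`: it supposes a 32-bit instruction word with the opcode in the low 8 bits (consistent with the 0..255 opcode space) followed by 8-bit `A`, `B`, and `C` fields, with `Bx` spanning the 16 bits of `B` and `C`. The lowercase function names are hypothetical counterparts of the Pascal helpers.

```python
def _check(name, value, bits):
    """Reject operands that do not fit in the given field width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits: {value}")


def encode_abc(op, a, b, c):
    """Pack an opcode and three 8-bit operands (assumed layout)."""
    for name, value in (("op", op), ("a", a), ("b", b), ("c", c)):
        _check(name, value, 8)
    return op | (a << 8) | (b << 16) | (c << 24)


def encode_abx(op, a, bx):
    """Pack an opcode, an 8-bit operand, and a 16-bit operand (assumed layout)."""
    _check("op", op, 8)
    _check("a", a, 8)
    _check("bx", bx, 16)
    return op | (a << 8) | (bx << 16)


def decode_op(word):
    return word & 0xFF


def decode_a(word):
    return (word >> 8) & 0xFF


def decode_b(word):
    return (word >> 16) & 0xFF


def decode_c(word):
    return (word >> 24) & 0xFF


def decode_bx(word):
    return (word >> 16) & 0xFFFF
```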
Current instruction families:
- load and move: constants, literals, locals, upvalues
- control flow: jumps, handlers, throw, return
- closures: function templates and captured upvalues
- typed arithmetic and comparison
- object and array operations
- class construction and member definition
- calls, construction, iteration, globals, and string coercion
- opt-in compatibility scope helpers: unmapped `arguments` object creation and `with` object binding probes
- semantic-only imports/exports, dynamic import, `import.meta`, await, and yield
Some opcode families intentionally use flags or mode operands instead of one opcode per syntax form. For example:
- accessor definition uses constant-key and dynamic-key instructions plus getter/setter and static/instance flags
- collection helpers use a shared opcode (`OP_COLLECTION_OP`) for object spread, object rest, and iterable-to-array spread
- validation uses a shared opcode for require-object and require-iterable checks
- generator metadata is serialized and a single `OP_YIELD` drives generator suspension; `yield*` uses the same opcode with delegation bookkeeping rather than a second opcode
- bytecode generators suspend by snapshotting the VM continuation at `OP_YIELD` (instruction pointer, live registers, local cells, handler state, and GC-visible references) and resume from that snapshot instead of replaying from function entry
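The snapshot-and-resume idea behind generator suspension can be sketched with a toy VM. This Python model is illustrative only: the three-instruction program format, `Continuation`, and `resume` are hypothetical, and the snapshot here holds just an instruction pointer and registers, whereas the real continuation also carries local cells, handler state, and GC-visible references.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction set: (opcode, operand) pairs.
LOAD, YIELD, RETURN = "LOAD", "YIELD", "RETURN"


@dataclass
class Continuation:
    ip: int = 0                                            # next instruction to run
    registers: list = field(default_factory=lambda: [None])


def resume(code, cont):
    """Run from the saved instruction pointer to the next YIELD or RETURN.

    Returns (value, done) and stores the new snapshot back into `cont`.
    """
    ip, regs = cont.ip, list(cont.registers)
    while True:
        op, arg = code[ip]
        ip += 1
        if op == LOAD:
            regs[0] = arg
        elif op == YIELD:
            cont.ip, cont.registers = ip, regs  # snapshot; nothing is replayed later
            return regs[0], False
        elif op == RETURN:
            cont.ip, cont.registers = ip, regs
            return regs[0], True
        else:
            raise ValueError(f"unknown opcode: {op}")


program = [(LOAD, 1), (YIELD, None), (LOAD, 2), (YIELD, None), (LOAD, 3), (RETURN, None)]
```

Each call to `resume(program, cont)` continues from where the previous one stopped, so earlier instructions run exactly once.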
Current opcode design rules:
- add explicit Goccia opcodes for stable, hot, language-owned behaviour
- do not introduce generic VM naming into new instructions
- do not add an opcode for something already reachable through existing call dispatch: if a built-in (e.g. `Object.freeze`) already goes through `OP_CALL_METHOD`, emit that call sequence from the compiler rather than adding a new opcode or sub-mode
- prefer mode operands on shared opcodes over proliferating single-purpose opcodes for cold or infrequent operations
Performance Direction#
Recent VM cleanup and optimization work has focused on reducing per-instruction overhead without reintroducing old abstraction layers:
- cache and reuse shared primitive values directly in registers
- avoid eager allocation of closure cells for uncaptured locals
- pre-size argument collections for calls and construction
- use unchecked template access in the dispatch loop where bounds are already guaranteed
- keep fast register access limited to proven hot/simple paths; local-slot and complex property paths should only move to fast access when they stay correct and measurably improve throughput
Inline Caches#
Three per-site inline caches live on TGocciaFunctionTemplate, all indexed by the instruction's name-constant index, all runtime-only (never serialised to .gbc):
- Global reads (`OP_GET_GLOBAL`): `TGocciaGlobalReadCacheEntry` validates `(scope identity, binding-map entry version)` and re-reads the binding by entry index, skipping the name hash.
- Own property reads (`OP_GET_PROP_CONST`): `TGocciaPropertyReadCacheEntry` validates the receiver's interned shape (`Goccia.Values.Shape`): same shape implies the same key at the cached entry index, so one site hits across many same-layout receivers. The descriptor kind is re-checked on every hit because data-to-accessor redefinition keeps the entry index.
- Prototype-resolved reads (`OP_GET_PROP_CONST`, after an own miss): `TGocciaProtoReadCacheEntry` proves continued absence of the name on the receiver and intermediate levels and presence at the holder, all by fresh shape identity per level, then re-reads the holder descriptor by entry index. The live chain is re-walked per hit, so `setPrototypeOf` is followed inherently; chain levels must be exact `TGocciaObjectValue`; chains deeper than two levels and accessor holders stay generic. Class instance methods (data properties on the class prototype object) are the dominant beneficiary.
Hits and fills serve only exact-class TGocciaObjectValue / TGocciaVMLiteralObjectValue / TGocciaInstanceValue receivers, so overridden lookup semantics (proxies, exotic objects, private names) always take the generic path. Shapes are computed lazily at fill time (EnsureShape), not eagerly at property-append time: a stale shape is a true prefix description of an append-only layout, so the hit path may read it raw and at worst misses. Delete/clear flip a map to dictionary mode (a sentinel shape that never matches a cache entry). A map also flips to dictionary mode when EnsureShape runs from a non-owner realm, so cross-realm property reads never intern one realm's layout into another realm's shape table. After PROPERTY_READ_CACHE_POLYMORPHIC_LIMIT consecutive misses-with-refill or fill declines a site is megamorphic: it stops probing and serves gated receivers through the uncached own-data fast path.
Cached pointers (scope, shape) are compared for identity only and never dereferenced. Scope cache entries carry an entry-version stamp against allocator address reuse; shape entries need none, because shapes are never freed within an engine's lifetime, function templates never outlive their engine, and cross-realm maps stop shape tracking before a foreign realm can cache their owner layout.
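The own-property read cache can be illustrated with a small model of shape interning. This is a simplified Python sketch, not the Pascal implementation: `Shape`, `Obj`, and `PropertyReadSite` are hypothetical names, the shape is looked up on every read rather than stored lazily on the property map, and descriptor kinds, dictionary mode, realms, and the megamorphic limit are not modelled.

```python
class Shape:
    """Interned layout: the same ordered key tuple always yields the same Shape."""

    _table = {}  # never pruned, mirroring "shapes are never freed"

    def __init__(self, keys):
        self.keys = keys

    @classmethod
    def intern(cls, keys):
        keys = tuple(keys)
        shape = cls._table.get(keys)
        if shape is None:
            shape = cls._table[keys] = cls(keys)
        return shape


class Obj:
    """Append-only property storage: parallel key and value lists."""

    def __init__(self, **props):
        self.keys = list(props)
        self.values = list(props.values())

    def shape(self):
        return Shape.intern(self.keys)


class PropertyReadSite:
    """One cache entry per read site: (shape identity, entry index)."""

    def __init__(self, name):
        self.name = name
        self.shape = None
        self.index = -1
        self.hits = 0
        self.misses = 0

    def read(self, obj):
        # Hit path: compare shapes by identity only, then read by index.
        if self.shape is not None and obj.shape() is self.shape:
            self.hits += 1
            return obj.values[self.index]
        # Miss path: generic lookup by name, then refill the cache.
        self.misses += 1
        if self.name not in obj.keys:
            return None  # stands in for `undefined`
        self.shape = obj.shape()
        self.index = obj.keys.index(self.name)
        return obj.values[self.index]
```

Two objects created with the same keys in the same order share a shape, so a site filled by the first one hits on the second without hashing the property name.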
Computed property access (OP_ARRAY_GET/OP_ARRAY_SET, OP_GET_INDEX/OP_SET_INDEX, OP_DEL_INDEX) shares one key-classification and receiver-dispatch implementation (ClassifyPropertyKey plus the ExecGet/ExecSet/ExecDeleteComputedProperty cores in Goccia.VM.pas); per-opcode semantic differences are explicit TGocciaComputedAccessOptions, not divergent copies.
The current optimization target is reducing bytecode-mode suite time further without diverging interpreter and bytecode semantics.
Profiling#
The --profile option on GocciaScriptLoader enables language-level profiling of the bytecode VM. See profiling.md for the full guide.
- `--profile=opcodes`: opcode frequency histogram, opcode pair frequency (superinstruction candidates), and scalar fast-path hit rate for generic arithmetic/comparison opcodes
- `--profile=functions`: per-function self-time, total-time, call count, and heap allocation count
- `--profile=all`: both
- `--profile-output=path.json`: JSON export
The profiler follows the same singleton-tracker pattern as coverage (Goccia.Coverage.pas). It adds zero overhead when disabled; when enabled, opcode counting adds ~1% overhead and function timing adds ~3%.
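The two opcode statistics mentioned above (a frequency histogram and adjacent-pair counts) are simple to compute from an execution trace. The Python sketch below is illustrative: `opcode_profile` is a hypothetical helper and the trace is made up, using opcode names that appear elsewhere in this document. It does not reflect how `Goccia.Profiler.pas` stores its counters.

```python
from collections import Counter


def opcode_profile(trace):
    """Return (opcode histogram, adjacent opcode pair histogram)."""
    singles = Counter(trace)
    pairs = Counter(zip(trace, trace[1:]))  # frequent pairs are superinstruction candidates
    return singles, pairs


trace = [
    "OP_GET_GLOBAL", "OP_GET_PROP_CONST", "OP_CALL_METHOD",
    "OP_GET_GLOBAL", "OP_GET_PROP_CONST", "OP_CALL_METHOD",
]
singles, pairs = opcode_profile(trace)
```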
Instruction Limit#
The dispatch loop supports an optional instruction counter (Goccia.InstructionLimit.pas). When armed, the counter increments on every dispatched instruction and the limit is checked at the top of each iteration. When disabled, only the guard read of the limit threadvar remains on the hot path. See Embedding — Execution Limits for the full API and interpreter-mode behavior.
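A minimal model of such a guarded dispatch loop is shown below. It is a Python sketch under stated assumptions: `dispatch`, `InstructionLimitExceeded`, and the `NOP`/`LOOP` toy instructions are hypothetical, and a limit of `0` is used here to mean "not armed", which may differ from the real API.

```python
class InstructionLimitExceeded(Exception):
    """Hypothetical error type for this sketch."""


def dispatch(code, limit=0):
    """Run a toy program; return how many instructions were counted."""
    executed = 0
    ip = 0
    while ip < len(code):
        if limit:  # when disabled, this guard read is the only hot-path cost
            if executed >= limit:
                raise InstructionLimitExceeded(f"limit of {limit} instructions reached")
            executed += 1
        op = code[ip]
        ip += 1
        if op == "LOOP":  # jump back to the start, forming an infinite loop
            ip = 0
    return executed
```

With the counter armed, the otherwise endless program `["NOP", "LOOP"]` is stopped once the limit is reached.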
Binary Format#
- Magic: `GBC\0`
- Version constant: `GOCCIA_FORMAT_VERSION`
- Endianness: little-endian
- File extension: `.gbc`
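The header fields listed above could be written and validated roughly as follows. This Python sketch makes two assumptions that the text does not confirm: that the version is stored as an unsigned 32-bit integer immediately after the magic, and the placeholder value `1` for the version (the real value is whatever `GOCCIA_FORMAT_VERSION` is). `write_header` and `read_header` are hypothetical names.

```python
import struct

MAGIC = b"GBC\x00"
FORMAT_VERSION = 1  # placeholder, not the real GOCCIA_FORMAT_VERSION


def write_header(version=FORMAT_VERSION):
    """Magic followed by a little-endian u32 version (assumed layout)."""
    return MAGIC + struct.pack("<I", version)


def read_header(data):
    """Validate the magic and return the version number."""
    if len(data) < 8:
        raise ValueError("truncated header")
    if data[:4] != MAGIC:
        raise ValueError("not a .gbc file: bad magic")
    (version,) = struct.unpack_from("<I", data, 4)
    return version
```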
Current Status#
- `--mode=bytecode` runs the Goccia VM directly.
- The full JavaScript suite passes in bytecode mode.
- The old generic VM/runtime bridge has been removed from the active build.
Design Rationale#
GocciaScript includes a bytecode executor built specifically for GocciaScript. The current VM is not a language-agnostic subsystem: it executes directly on TGocciaValue, shares the same runtime objects as the interpreter, and uses a Goccia-owned opcode surface.
Why a Bytecode VM?#
The tree-walk interpreter directly evaluates AST nodes via recursive function calls. This is simple and debuggable, but carries overhead from VMT dispatch on every AST node, deep call stacks for nested expressions, and no opportunity for instruction-level optimization. A bytecode VM trades compilation cost for faster execution: flat instruction dispatch, register-based operands, and a compact in-memory representation.
Why Register-Based?#
Stack-based VMs (like the JVM and WASM) are simpler to compile to and have smaller instruction encoding. Register-based VMs (like Lua 5, LuaJIT, and Dalvik) need fewer instructions per operation and avoid redundant stack manipulations. Register-based was chosen for execution performance.
Why Three Tiers?#
The opcode space is split into three tiers so that hot, generic, and cold operations each get their own range:
- Core range (0–127): register, control-flow, closure, literal, and other hot/stable VM operations.
- Non-core generic range (128–166): generic arithmetic and bitwise operations that are still explicit bytecode but handle mixed or untyped operands.
- Semantic helper range (167–255): colder language-level orchestration operations such as imports/exports, dynamic import, `import.meta`, await, and resource disposal.
This split keeps the dispatch surface organized while still allowing the bytecode executor to be explicitly Goccia-specific.
Why Shared Runtime Values?#
The VM shares the TGocciaValue object model with the interpreter rather than maintaining a second value representation. Registers use TGocciaRegister — a tagged variant record (Goccia.VM.Registers.pas) that keeps booleans, integers, and floats unboxed as scalars. Values only cross into TGocciaValue when they leave the register file (e.g., property access, function calls, GC marking).
That choice removes:
- conversion layers between interpreter values and VM values
- duplicate object models for arrays, objects, classes, and promises
- bridge-only GC root management
- bytecode/runtime disagreement over `undefined`, `null`, and sparse array holes
The trade-off is that arithmetic fast paths are split between scalar register operations (typed opcodes like OP_ADD_INT / OP_ADD_FLOAT) and generic TGocciaValue fallbacks (like OP_ADD).
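The split between typed fast paths and a generic fallback can be modelled with a small tagged record. This is a Python sketch, not the Pascal variant record: `Tag`, `Register`, and the `op_*` functions are hypothetical, integer overflow is ignored, and the boxed fallback only models string concatenation rather than full JavaScript `+` semantics.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Tag(Enum):
    INT = auto()
    FLOAT = auto()
    BOXED = auto()  # payload is a heap value (stands in for TGocciaValue)


@dataclass(frozen=True)
class Register:
    tag: Tag
    payload: object


def op_add_int(a, b):
    """Typed opcode: the compiler has proved both operands are integers."""
    return Register(Tag.INT, a.payload + b.payload)


def op_add_float(a, b):
    """Typed opcode for float operands."""
    return Register(Tag.FLOAT, float(a.payload) + float(b.payload))


def op_add(a, b):
    """Generic opcode: inspect tags at run time, then fall back to boxed values."""
    if a.tag is Tag.INT and b.tag is Tag.INT:
        return op_add_int(a, b)          # scalar fast path
    if a.tag is not Tag.BOXED and b.tag is not Tag.BOXED:
        return op_add_float(a, b)        # mixed int/float
    if isinstance(a.payload, str) and isinstance(b.payload, str):
        return Register(Tag.BOXED, a.payload + b.payload)
    raise NotImplementedError("remaining generic cases are not modelled")
```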
Compiler-Side Desugaring#
Language features are compiled into compact bytecode instruction sequences rather than expanding the opcode surface unnecessarily:
- Nullish coalescing (`??`) and nullish coalescing assignment (`??=`): The compiler emits `OP_JUMP_IF_NOT_NULLISH` in its nullish-match mode, so `undefined`, `null`, and internal hole values all follow the same short-circuit path without extra comparison instructions.
- Template literals: The compiler parses interpolations at compile time, emits string constants and `OP_TO_STRING` for expression parts, then chains `OP_CONCAT` instructions.
- Object literals: Data properties compile to `OP_DEFINE_DATA_PROP` so object initializers create or overwrite own enumerable data properties. Concise methods use `OP_DEFINE_METHOD_PROP` to attach `[[HomeObject]]` without changing plain data-property function or arrow values. Ordinary property assignment still uses the `OP_SET_*` family and keeps `[[Set]]` prototype-chain semantics.
- Object spread: The compiler emits dedicated Goccia bytecode rather than routing through a generic extension dispatcher.
- Increment/decrement (`++`/`--`): The compiler emits fused numeric update opcodes for increment/decrement sites. Prefix or discarded-result sites use `OP_INC_NUMERIC`/`OP_DEC_NUMERIC`; postfix sites with distinct result and storage registers use `OP_POST_INC_NUMERIC`/`OP_POST_DEC_NUMERIC` so the old numeric value is produced while the binding/property value is updated. All variants preserve BigInt, convert other inputs through `ToNumeric`, and keep the read/convert/write side effects required by ES2026 §13.4.4.1.
- Traditional `for` lexical bindings: `let`/`const` loop initializers keep the full per-iteration environment path whenever closures, direct eval, suspension, destructuring, pattern matching, `with`, `using`, or nested declaration boundaries can observe binding identity. Plain generated/counting loops share the loop lexical scope and avoid the otherwise redundant copy-in/copy-out sequence.
This keeps the emitted bytecode compact and makes opcode additions deliberate instead of reactive.
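As one concrete case, the template-literal lowering described in the list above can be sketched as a function that turns parsed parts into an instruction list. This Python model is hypothetical in its details: `lower_template`, the tuple instruction format, the operand order, and the `LOAD_CONST` name are invented for illustration; only `OP_TO_STRING` and `OP_CONCAT` are opcode names from the text.

```python
def lower_template(parts, dest=0, temp=1):
    """Lower template parts to a flat instruction list.

    `parts` is a list of ("str", text) or ("expr", source_register) pairs.
    The result is accumulated in register `dest`, using `temp` as scratch.
    """
    if not parts:
        return [("LOAD_CONST", dest, "")]  # empty template yields the empty string
    code = []
    for i, (kind, value) in enumerate(parts):
        target = dest if i == 0 else temp
        if kind == "str":
            code.append(("LOAD_CONST", target, value))    # hypothetical opcode name
        elif kind == "expr":
            code.append(("OP_TO_STRING", target, value))  # coerce the expression result
        else:
            raise ValueError(f"unknown part kind: {kind}")
        if i > 0:
            code.append(("OP_CONCAT", dest, dest, temp))  # dest = dest + temp
    return code


# `a${x}b`, assuming x currently lives in register 5
lowered = lower_template([("str", "a"), ("expr", 5), ("str", "b")])
```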
Compatibility Scope Helpers#
Compatibility features that alter identifier lookup still compile to explicit VM state instead of falling back to interpreter behavior.
- `arguments` object: With `--compat-arguments-object` enabled, function templates snapshot the current call arguments in the frame. `--compat-non-strict-mode` does not enable this helper by itself. `OP_CREATE_ARGUMENTS` materializes the object into the declared local slot before parameter defaults and body execution, so default initializers can observe `arguments.length` and generators see the original call list after suspension/resume. Operand `B` selects mapped semantics for sloppy simple parameter lists and operand `C` carries the formal parameter count; the VM forces those parameter locals into cells so indexed properties alias parameter bindings even if the object escapes. Strict functions, modules, and non-simple parameter lists use unmapped arguments objects.
- Non-strict `this` binding: Function templates serialize their strict-this mode. With `--compat-non-strict-mode` enabled for script source, ordinary function templates clear it so VM call paths coerce nullish `this` to `globalThis`; arrows and class methods keep their existing lexical or strict receiver behavior. Module source ignores the compatibility flag for this decision.
- Non-strict assignment: Failed object/global writes throw by default. In script source non-strict compatibility mode, the compiler emits `OP_SET_PROP_CONST_LOOSE`, `OP_SET_INDEX_LOOSE`, and `OP_SET_GLOBAL_LOOSE` for ordinary writes so failed `[[Set]]` results are ignored while null/undefined property access and throwing setters still raise errors.
- `with` statement: With `--compat-non-strict-mode` enabled for script source, the compiler lowers `with (expr) body` to `OP_TO_OBJECT`, stores the object in a hidden local, and records that hidden binding in the compiler scope. Identifier reads, writes, updates, and identifier calls inside the dynamic extent emit `OP_HAS_WITH_BINDING` probes from innermost to outermost hidden object before falling back to normal local/upvalue/global resolution. Writes that resolve to a with object use the loose set opcodes in non-strict mode. Nested functions inherit the hidden binding as an upvalue when captured, preserving closures created inside `with`.
- Non-strict `delete`: With `--compat-non-strict-mode` enabled for script source, member deletes emit `OP_DELETE_PROP_CONST_LOOSE` or `OP_DEL_INDEX_LOOSE`, which preserve strict null/undefined errors but return `false` for non-configurable properties. Identifier deletes compile to local/upvalue false results, `OP_DELETE_GLOBAL` for global object property semantics, or `with` binding probes as needed.
Compiler Optimizer#
Bytecode compilation includes a small compile-time value optimizer. It folds pure primitive constant expressions, propagates immutable local const bindings initialized from compile-time constants, and omits branches or statement tails that are provably unreachable.
The optimizer is intentionally compiler-side only:
- it does not add opcodes or change the `.gbc` format
- it does not track mutable bindings, imports, destructuring, function/class declarations, or global-backed top-level bindings
- it only uses `--strict-types` for conservative algebraic simplifications where the strict type alone preserves JavaScript semantics
When coverage is enabled, PreserveCoverageShape keeps constant branch structure in the emitted bytecode so coverage can report the non-hit branch instead of erasing it from the report.
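Constant folding, unreachable-branch removal, and the coverage exception can be shown on a toy expression tree. This Python sketch is illustrative only: the tuple node format and the `fold` function are hypothetical, and the `preserve_coverage_shape` parameter merely mimics the role the text gives to `PreserveCoverageShape`. Const propagation and statement-tail elimination are not modelled.

```python
def fold(node, preserve_coverage_shape=False):
    """Fold constant sub-expressions in a toy AST made of tuples.

    Node kinds: ("num", n), ("bool", b), ("var", name),
    ("add", l, r), ("mul", l, r), ("if", cond, then, otherwise).
    """
    kind = node[0]
    if kind in ("num", "bool", "var"):
        return node
    if kind in ("add", "mul"):
        left = fold(node[1], preserve_coverage_shape)
        right = fold(node[2], preserve_coverage_shape)
        if left[0] == "num" and right[0] == "num":
            value = left[1] + right[1] if kind == "add" else left[1] * right[1]
            return ("num", value)          # both operands known at compile time
        return (kind, left, right)
    if kind == "if":
        cond = fold(node[1], preserve_coverage_shape)
        then = fold(node[2], preserve_coverage_shape)
        otherwise = fold(node[3], preserve_coverage_shape)
        if cond[0] == "bool" and not preserve_coverage_shape:
            return then if cond[1] else otherwise  # drop the unreachable branch
        return ("if", cond, then, otherwise)       # keep both arms for coverage
    raise ValueError(f"unknown node kind: {kind}")
```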
How Opcode Additions Work#
New opcodes should be added only when an operation is both common enough and semantically stable enough to justify a dedicated instruction.
Prefer:
- explicit Goccia opcodes for core language/runtime behaviour
- compiler lowering to existing instructions for syntactic sugar
- flags or operands when an operation is a mode of an existing instruction rather than a new concept
Tier 1 Property Flags vs Tier 2 Visibility#
Property mutability (writable/configurable) is still a VM concern. Bulk operations like freeze and seal remain derived from the lower-level property-flag operations:
- `SetEntryFlags(key, flags)`: modify flags on a single property
- `PutWithFlags(key, value, flags)`: create a property with specific flags
- `PreventExtensions`: stop new properties from being added
- `Freeze` = iterate all entries, set flags to 0, prevent extensions (a convenience, not a primitive)
Property visibility and accessor semantics remain part of the higher-level object/class model rather than low-level property-flag storage.
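The way `Freeze` is derived from the lower-level flag operations can be sketched as follows. This Python model is hypothetical: the `PropertyMap` class, the snake_case method names, and the bit values chosen for the flags are assumptions, and `put_with_flags` here overwrites an existing entry without checking its flags, which the real operation may not do.

```python
WRITABLE = 1      # hypothetical bit values
CONFIGURABLE = 2


class PropertyMap:
    def __init__(self):
        self.entries = {}       # key -> [value, flags]
        self.extensible = True

    def put_with_flags(self, key, value, flags):
        """Create a property with specific flags."""
        if key not in self.entries and not self.extensible:
            raise TypeError("object is not extensible")
        self.entries[key] = [value, flags]

    def set_entry_flags(self, key, flags):
        """Modify the flags on a single existing property."""
        self.entries[key][1] = flags

    def prevent_extensions(self):
        """Stop new properties from being added."""
        self.extensible = False

    def freeze(self):
        """Convenience built only from the primitives above."""
        for key in self.entries:
            self.set_entry_flags(key, 0)
        self.prevent_extensions()
```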
Spread Calling Consolidation#
Spread-based calls use the flags byte on OP_CALL and OP_CALL_METHOD. Spread is treated as a mode of the call instruction rather than as a separate opcode family.
Rejected Findings#
During code review, the following findings were investigated and determined to be non-issues:
- `SBIAS_24` (`Goccia.Bytecode.pas`) — The 24-bit signed bias constant 8388607 is correct. The 24-bit unsigned range 0..16777215 centered at 8388607 gives a signed range of −8388607..+8388608. This is standard Lua-style bias encoding.
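The bias arithmetic in the finding above can be checked directly. In this Python sketch the constant value and the ranges come from the text, while the function names `encode_signed24` and `decode_signed24` are hypothetical.

```python
SBIAS_24 = 8388607   # bias constant quoted in the text
MAX_U24 = 16777215   # largest 24-bit unsigned value


def encode_signed24(value):
    """Map a signed value into the unsigned 24-bit field by adding the bias."""
    biased = value + SBIAS_24
    if not 0 <= biased <= MAX_U24:
        raise ValueError(f"value out of 24-bit biased range: {value}")
    return biased


def decode_signed24(field):
    """Recover the signed value by subtracting the bias."""
    return field - SBIAS_24
```

The smallest field value `0` decodes to −8388607 and the largest, `16777215`, decodes to +8388608, matching the range stated in the finding.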
Related documents#
- Architecture — Shared source pipeline and both execution modes at a glance
- Interpreter: Tree-walk execution (`Goccia.Interpreter`, `Goccia.Evaluator.*`)
- Core patterns: Recurring Pascal conventions
- GocciaScript Context — Canonical project terminology
Contributor Notes#
- Do not add new bytecode/runtime concepts under old generic naming.
- Prefer `TGoccia*` bytecode and VM types in new code.
- Keep interpreter and bytecode semantics aligned through shared runtime objects, not conversion layers.