uxntal syntax

lab brief: what is a generative grammar for uxntal?
keywords
  • uxntal
  • Backus–Naur form
  • formal methods
conducted: JL 2025/05/07–2025/05/09

purpose

uxntal is the primary programming language of the uxn virtual machine. While the interface of the underlying virtual machine has been frozen against further change, the language it hosts has been developed not towards a specification but as a means to an end.

In practice, uxntal newcomers are active participants in the language's clarification and refinement, wanting (and often needing) guidance from the active ecosystem of users. While most of this clarification concerns the language's semantics, the surface-level syntax is itself a nuanced object that is difficult to communicate.

The purpose of this experiment is to notate a generative grammar that shows a valid construction for every uxntal syntactic form. Notating these rules is a much weaker problem than notating a specification of the language: the set of generated programs need only be a conservative subset of all valid uxntal programs.

method

Generative grammars were sketched, with feedback collected on the project's active public channels:

instrumentation

uxntal versioning
uxntal assembler: drifblim rev. 9 May 2025
uxn emulator: uxncli rev. 19 Oct 2024

results

uxntal syntax is described here using the rules of a generative grammar, encoding a set of program texts that:

program

The generative grammar is given in Backus–Naur form (BNF) with two extensions.

First, as notational shorthands, take ␣ to be any amount of whitespace separation, <hex1> to be any hexadecimal character 0–f, and <hexn> to be a string of n hexadecimal characters (e.g., <hex2> encoding 00–ff, a byte).
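As a minimal sketch of these shorthands, assuming lowercase digits only (following the 0–f range) and reading ␣ as any run of whitespace, the hypothetical helpers below test a word against <hexn> and split a text on ␣:

```python
from __future__ import annotations

HEX_DIGITS = frozenset("0123456789abcdef")  # lowercase only, following the 0–f range

def is_hexn(s: str, n: int) -> bool:
    """<hexn>: exactly n hexadecimal characters."""
    return len(s) == n and all(c in HEX_DIGITS for c in s)

def split_on_ws(text: str) -> list[str]:
    """Treat every run of whitespace (␣) as a single separator."""
    return text.split()

print(is_hexn("ff", 2), is_hexn("f", 2), is_hexn("FF", 2))  # True False False
print(split_on_ws("#01  #02\n\tADD"))  # ['#01', '#02', 'ADD']
```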

Second, some uxntal forms (strings, comments and identifiers) are defined by what they are not, which resists BNF expression; instead, take as given:

  1. a <STRING> is anything but its closing ␣
  2. a <COMMENT> is anything but its closing ␣)
  3. an <ID> is anything but a member of the finite set <opcode> or one of the reserved hexadecimal strings <hex1>, <hex2>, <hex3> and <hex4>.
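These negative definitions can be read as predicates over whitespace-separated words. In the sketch below, `is_id`, `string_body` and `comment_end` are hypothetical helpers; it assumes the closing `)` stands as its own word and, following rule 2 literally, does not attempt nested comments. The opcode set is passed in rather than hard-coded.

```python
from __future__ import annotations

HEX_DIGITS = frozenset("0123456789abcdef")

def is_reserved_hex(s: str) -> bool:
    """True for <hex1>, <hex2>, <hex3> or <hex4> strings."""
    return 1 <= len(s) <= 4 and all(c in HEX_DIGITS for c in s)

def is_id(word: str, opcodes: set[str]) -> bool:
    """Rule 3: any non-empty word that is neither an opcode nor a reserved hex string."""
    return bool(word) and word not in opcodes and not is_reserved_hex(word)

def string_body(word: str) -> str:
    """Rule 1: a "<STRING> runs until whitespace, so its body is the rest of the word."""
    if not word.startswith('"'):
        raise ValueError("not a string word")
    return word[1:]

def comment_end(words: list[str], start: int) -> int:
    """Rule 2: given words[start] == "(", return the index just past the first ")" word."""
    for i in range(start + 1, len(words)):
        if words[i] == ")":
            return i + 1
    raise ValueError("unterminated comment")

ops = {"ADD", "ADD2", "BRK"}  # a small stand-in for the full <opcode> set
print(is_id("on-reset", ops), is_id("ADD2", ops), is_id("cafe", ops))  # True False False
words = '( draw "hello ) #01'.split()
print(comment_end(words, 0), string_body('"hello'))  # 4 hello
```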
<program> =
concatenating
| <program><program>
operation
| <opcode>
assembler control
| $<ID>
| $<hex1>
| $<hex2>
| $<hex3>
| $<hex4>
| |<ID>
| |<hex1>
| |<hex2>
| |<hex3>
| |<hex4>
literals
| <ID>
| <hex2>
| "<STRING>
| #<hex2>
| #<hex4>
referencing
| ,<refer>
| _<refer>
| .<refer>
| ~<refer>
| ;<refer>
| =<refer>
defining IDs
| %<ID>␣{␣<program>␣}
| @<ID><program>
| &<ID><program>
bracketing
| [␣<program>␣]
| (␣<COMMENT>␣)
A reference is either to a concrete identifier or to an anonymous label delimited by curly brackets (a lambda):
<refer> = <ID>
| {␣<program>␣}
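Because every alternative of <program> and <refer> opens with a distinctive rune or a reserved word, a single whitespace-separated word can be matched to the group of rules that could have produced it. The sketch below does this with a hypothetical `classify` function. It takes the opcode set as a parameter, covers only the runes listed in the grammar above, looks at one word in isolation (so words inside a comment are not treated specially), and returns None for words the grammar does not generate.

```python
from __future__ import annotations
from typing import Optional

HEX_DIGITS = frozenset("0123456789abcdef")

def is_hexn(s: str, n: int) -> bool:
    return len(s) == n and all(c in HEX_DIGITS for c in s)

def is_reserved_hex(s: str) -> bool:
    return any(is_hexn(s, n) for n in (1, 2, 3, 4))

def classify(word: str, opcodes: set[str]) -> Optional[str]:
    """Name the rule group that can produce `word`, or None if no rule does."""
    def is_id(s: str) -> bool:
        return bool(s) and s not in opcodes and not is_reserved_hex(s)

    if not word:
        return None
    if word in opcodes:
        return "operation"
    if word in ("[", "]", "(", ")"):
        return "bracketing"
    if word in ("{", "}"):
        return "curly bracket"  # macro bodies and lambdas
    rune, rest = word[0], word[1:]
    if rune in "$|":
        return "assembler control" if is_reserved_hex(rest) or is_id(rest) else None
    if rune == "#":
        return "literals" if is_hexn(rest, 2) or is_hexn(rest, 4) else None
    if rune == '"':
        return "literals"
    if rune in ",_.~;=":
        # <refer> is an <ID>, or a lambda whose first word is the rune plus "{"
        return "referencing" if rest == "{" or is_id(rest) else None
    if rune in "%@&":
        return "defining IDs" if is_id(rest) else None
    if is_hexn(word, 2) or is_id(word):
        return "literals"  # a raw byte or a bare <ID>
    return None

ops = {"ADD", "ADD2k", "BRK"}  # stand-in for the full <opcode> set
for w in ["|0100", "@on-reset", ";buf", ",{", "#12", "ADD2k", "ff", "abc"]:
    print(w, classify(w, ops))
```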

opcode

<core-opcode> =
arithmetic
| INC | ADD | SUB
| MUL | DIV | SFT
| AND | ORA | EOR
test conditionals
| EQU | NEQ
| GTH | LTH
stack combinators
| POP | NIP | SWP
| DUP | OVR
| ROT | STH
memory & I/O
| LIT | DEI | DEO
| STZ | STR | STA
| LDZ | LDR | LDA
jumps & control flow
| JMP | JCN | JSR
The <core-opcode> forms are extended with four additional opcodes, and by the mode suffixes 2, k and r:
<opcode> = BRK
| JCI | JMI | JSI
| <core-opcode>
| <core-opcode>2
| <core-opcode>k
| <core-opcode>r
| <core-opcode>2k
| <core-opcode>2r
| <core-opcode>kr
| <core-opcode>2kr
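The <opcode> rule is finite and small enough to enumerate. A sketch, assuming the suffix orders listed above are the only ones accepted: the mode suffixes are then exactly the ordered subsets of 2, k and r, and the core list follows the uxn instruction set's 32 core opcodes (including SWP, the swap combinator). The names `CORE_OPCODES`, `MODE_SUFFIXES` and `all_opcodes` are illustrative.

```python
from itertools import combinations

CORE_OPCODES = [
    "INC", "ADD", "SUB", "MUL", "DIV", "SFT", "AND", "ORA", "EOR",  # arithmetic
    "EQU", "NEQ", "GTH", "LTH",                                     # test conditionals
    "POP", "NIP", "SWP", "ROT", "DUP", "OVR", "STH",                # stack combinators
    "LIT", "DEI", "DEO", "STZ", "STR", "STA", "LDZ", "LDR", "LDA",  # memory & I/O
    "JMP", "JCN", "JSR",                                            # jumps & control flow
]

# Ordered subsets of the mode flags: "", "2", "k", "r", "2k", "2r", "kr", "2kr".
MODE_SUFFIXES = ["".join(c) for n in range(4) for c in combinations("2kr", n)]

def all_opcodes() -> set:
    """Enumerate <opcode>: four extra forms plus each core opcode with each mode suffix."""
    ops = {"BRK", "JCI", "JMI", "JSI"}
    ops.update(core + suffix for core in CORE_OPCODES for suffix in MODE_SUFFIXES)
    return ops

print(len(all_opcodes()))  # 260
```

The result has 4 + 32 × 8 = 260 members, which is one way to materialize the finite set <opcode> that rule 3 excludes from <ID>.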
uxntal syntax is effectively communicated using a generative grammar.
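Because the rules are generative, they can also be run directly to produce sample program texts. The sketch below performs random derivations of <program> and <refer>. It assumes hypothetical pools of opcodes, identifiers and string contents, a depth bound to guarantee termination, and a single space wherever the grammar concatenates forms (reading concatenation as whitespace-separated, so adjacent words stay distinct).

```python
import random

HEX = "0123456789abcdef"
# Hypothetical pools, kept small for brevity; a fuller sketch would draw from every <opcode>.
OPCODES = ["BRK", "ADD", "ADD2", "DUP2k", "LITr", "JSI"]
IDS = ["on-reset", "loop", "buf"]  # none is an opcode or a <hex1>..<hex4> string
WORDS = ["hello", "uxn"]           # sample <STRING>/<COMMENT> contents without whitespace

def hexn(rng: random.Random, n: int) -> str:
    return "".join(rng.choice(HEX) for _ in range(n))

def refer(rng: random.Random, depth: int) -> str:
    # <refer> = <ID> | {␣<program>␣}
    if depth > 0 and rng.random() < 0.25:
        return "{ " + program(rng, depth - 1) + " }"
    return rng.choice(IDS)

def program(rng: random.Random, depth: int) -> str:
    """Expand <program> at random; depth bounds the recursive alternatives."""
    leaves = [
        lambda: rng.choice(OPCODES),                                # operation
        lambda: rng.choice("$|") + hexn(rng, rng.randint(1, 4)),    # assembler control
        lambda: rng.choice("$|") + rng.choice(IDS),
        lambda: rng.choice(IDS),                                    # literals
        lambda: hexn(rng, 2),
        lambda: '"' + rng.choice(WORDS),
        lambda: "#" + hexn(rng, rng.choice((2, 4))),
        lambda: rng.choice(",_.~;=") + rng.choice(IDS),             # referencing
        lambda: "( " + rng.choice(WORDS) + " )",                    # comment
    ]
    if depth == 0:
        return rng.choice(leaves)()
    d = depth - 1
    nested = [
        lambda: program(rng, d) + " " + program(rng, d),            # concatenating
        lambda: rng.choice(",_.~;=") + refer(rng, d),               # referencing, lambdas allowed
        lambda: "%" + rng.choice(IDS) + " { " + program(rng, d) + " }",  # macro
        lambda: rng.choice("@&") + rng.choice(IDS) + " " + program(rng, d),  # label
        lambda: "[ " + program(rng, d) + " ]",                      # bracketing
    ]
    return rng.choice(leaves + nested)()

print(program(random.Random(7), 3))
```

Fixing the seed reproduces the same text, so a generated sample can be kept and inspected, or assembled by hand, later.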

further

references

https://git.sr.ht/~rabbits/drifblim/tree/main/item/examples/acid.tal
https://git.sr.ht/~rabbits/drifblim/tree/main/item/tests.sh