uxntal syntax

lab brief	what is a generative grammar for uxntal?
keywords	uxntal Backus–Naur form formal methods
conducted	JL 2025/05/07–2025/05/09

purpose

uxntal is the primary programming language of the uxn virtual machine. While the interface of the underlying virtual machine has been frozen against further change, the language it hosts has been developed not towards a specification, but as a means to an end.

In practice, uxntal newcomers are active participants in the language's clarification and refinement, wanting (and often needing) guidance from the active ecosystem of users. While a majority of this clarification relates to the language semantics, the surface-level syntax is itself also a nuanced object which is difficult to communicate.

The purpose of this experiment is to notate a generative grammar which shows a valid construction for every uxntal syntactic form. Writing these rules is a much weaker problem than notating a specification of the language: the set of generated programs is only a conservative subset of all valid uxntal programs.

method

Generative grammars were sketched, with feedback collected on the project's active public channels:

#uxn on the libera.chat IRC server.
#uxn on the concatenative Discord server.

instrumentation

uxntal versioning

uxntal assembler	`drifblim` rev. 9 May 2025
uxn emulator	`uxncli` rev. 19 Oct 2024

results

uxntal syntax is described here using the rules of a generative grammar. The set of program texts it encodes:

is not to be confused with a uxntal specification,
contains only valid syntactic constructions,
is a conservative subset of all valid syntactic constructions,
includes programs that are not always meaningful,
and rarely follows idioms of the language as used.

program

The generative grammar is given in Backus–Naur form with two extensions.

First, as notation shorthands, take ␣ to be any amount of whitespace separation, <hex1> to be any hexadecimal character 0–f, and <hexn> a string of n hexadecimal characters (e.g., <hex2> encoding 00–ff, a byte).

Second, uxntal naming forms are defined by exclusion (as being not something), which resists BNF expression. Instead, take as given:

a <STRING> is anything but its closing ␣
a <COMMENT> is anything but its closing ␣)
an <ID> is anything but the finite sets <opcode> and reserved hexadecimal strings <hex1> <hex2> <hex3> <hex4>.
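
The exclusion-based definitions above can be sketched as predicates. This is a minimal illustration, not part of any assembler: the names `is_hexn`, `is_reserved_hex` and `is_id` are hypothetical, the opcode set is supplied by the caller, and two assumptions are made that the text does not state outright: hexadecimal characters are lowercase (as the range 0–f suggests), and an <ID> contains no whitespace.

```python
import re


def is_hexn(token, n):
    """True if token is exactly n hexadecimal characters (<hexn>).

    Assumption: digits are lowercase, 0-9 and a-f.
    """
    return re.fullmatch(r"[0-9a-f]{%d}" % n, token) is not None


def is_reserved_hex(token):
    """True if token is one of <hex1>, <hex2>, <hex3>, <hex4>."""
    return any(is_hexn(token, n) for n in (1, 2, 3, 4))


def is_id(token, opcodes):
    """<ID>: anything that is neither an opcode nor a reserved hex string.

    Assumption: an empty token, or one containing whitespace, is not an <ID>.
    """
    if not token or any(ch.isspace() for ch in token):
        return False
    return token not in opcodes and not is_reserved_hex(token)


# A stand-in opcode set for the demonstration only; the full <opcode> set
# is defined by the grammar itself.
SAMPLE_OPCODES = {"BRK", "ADD", "ADD2", "JMP2r"}

print(is_id("loop", SAMPLE_OPCODES))  # True
print(is_id("ADD2", SAMPLE_OPCODES))  # False: an opcode
print(is_id("beef", SAMPLE_OPCODES))  # False: reads as <hex4>
```

Passing the opcode set as an argument keeps the predicate independent of how <opcode> is enumerated.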

<program>	=	concatenating
	\|	<program>␣<program>
		operation
	\|	<opcode>
		assembler control
	\|	$<ID>
	\|	$<hex1>
	\|	$<hex2>
	\|	$<hex3>
	\|	$<hex4>
	\|	\|<ID>
	\|	\|<hex1>
	\|	\|<hex2>
	\|	\|<hex3>
	\|	\|<hex4>
		literals
	\|	<ID>
	\|	<hex2>
	\|	"<STRING>
	\|	#<hex2>
	\|	#<hex4>
		referencing
	\|	,<refer>
	\|	_<refer>
	\|	.<refer>
	\|	~<refer>
	\|	;<refer>
	\|	=<refer>
		defining IDs
	\|	%<ID>␣{␣<program>␣}
	\|	@<ID>␣<program>
	\|	&<ID>␣<program>
		bracketing
	\|	[␣<program>␣]
	\|	(␣<COMMENT>␣)

A reference can be to a concrete identifier, or to an anonymous label denoted by curly bracketing (a lambda):

<refer>	=	<ID>
	\|	{␣<program>␣}
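
Because the grammar is generative, its rules can be run forwards to produce program texts. The following is a rough sketch of such a generator, transcribing the <program> and <refer> rules as given. The terminal pools (`IDS`, `OPCODES`, `STRINGS`, `COMMENTS`), the function names, the depth limit and the probabilities are all hypothetical choices made for illustration. The output is only syntactically well-formed under this grammar; it is not expected to be meaningful or even to assemble.

```python
import random

# Hypothetical terminal pools, chosen only for illustration.
IDS = ["on-reset", "loop", "buf", "x"]
OPCODES = ["BRK", "ADD", "DUP2", "POPk", "JMP2r"]
STRINGS = ["hello", "uxn"]
COMMENTS = ["a note", "todo"]
REFERENCE_RUNES = [",", "_", ".", "~", ";", "="]
HEX_DIGITS = "0123456789abcdef"


def hexn(rng, n):
    """<hexn>: a string of n hexadecimal characters."""
    return "".join(rng.choice(HEX_DIGITS) for _ in range(n))


def pad_operand(rng):
    """Operand of `$` or `|`: an <ID> or one of <hex1>..<hex4>."""
    if rng.random() < 0.5:
        return rng.choice(IDS)
    return hexn(rng, rng.randint(1, 4))


def refer(rng, depth):
    """<refer> = <ID> | { <program> }"""
    if depth > 0 and rng.random() < 0.3:
        return "{ " + program(rng, depth - 1) + " }"
    return rng.choice(IDS)


def program(rng, depth):
    """Expand <program> by one random rule, nesting at most `depth` deep."""
    rules = [
        lambda: rng.choice(OPCODES),                     # operation
        lambda: "$" + pad_operand(rng),                  # assembler control
        lambda: "|" + pad_operand(rng),
        lambda: rng.choice(IDS),                         # literals
        lambda: hexn(rng, 2),
        lambda: '"' + rng.choice(STRINGS),
        lambda: "#" + hexn(rng, rng.choice([2, 4])),
        lambda: rng.choice(REFERENCE_RUNES) + refer(rng, depth),  # referencing
        lambda: "( " + rng.choice(COMMENTS) + " )",      # bracketing (comment)
    ]
    if depth > 0:  # recursive rules are only offered while depth remains
        def sub():
            return program(rng, depth - 1)
        rules += [
            lambda: sub() + " " + sub(),                 # concatenating
            lambda: "%" + rng.choice(IDS) + " { " + sub() + " }",  # defining IDs
            lambda: "@" + rng.choice(IDS) + " " + sub(),
            lambda: "&" + rng.choice(IDS) + " " + sub(),
            lambda: "[ " + sub() + " ]",                 # bracketing
        ]
    return rng.choice(rules)()


def generate(seed, depth=4):
    """Generate one program text; the same seed gives the same text."""
    return program(random.Random(seed), depth)


print(generate(seed=1))
```

The depth limit guarantees termination: once it reaches zero, only non-recursive rules remain available.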

opcode

<core-opcode>	=	arithmetic
	\|	INC \| ADD \| SUB
	\|	MUL \| DIV \| SFT
	\|	AND \| ORA \| EOR
		test conditionals
	\|	EQU \| NEQ
	\|	GTH \| LTH
		stack combinators
	\|	POP \| NIP \| SWP
	\|	DUP \| OVR
	\|	ROT \| STH
		memory & I/O
	\|	LIT \| DEI \| DEO
	\|	STZ \| STR \| STA
	\|	LDZ \| LDR \| LDA
		jumps & control flow
	\|	JMP \| JCN \| JSR

The <core-opcode> forms are extended with four additional opcodes, and by mode suffixes:

<opcode>	=	BRK
	\|	JCI \| JMI \| JSI
	\|	<core-opcode>
	\|	<core-opcode>2
	\|	<core-opcode>k
	\|	<core-opcode>r
	\|	<core-opcode>2k
	\|	<core-opcode>2r
	\|	<core-opcode>kr
	\|	<core-opcode>2kr
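
Since <opcode> is a finite set, it can be enumerated directly from the two rules. The sketch below is one way to do so, with hypothetical names (`CORE`, `EXTRA`, `SUFFIXES`, `all_opcodes`). It includes SWP, uxn's swap opcode, among the stack combinators on the assumption that it belongs to <core-opcode>.

```python
# <core-opcode>, grouped as in the grammar.
# Assumption: SWP (swap) is included with the stack combinators.
CORE = """
INC ADD SUB MUL DIV SFT AND ORA EOR
EQU NEQ GTH LTH
POP NIP SWP DUP OVR ROT STH
LIT DEI DEO STZ STR STA LDZ LDR LDA
JMP JCN JSR
""".split()

# The four additional forms, which take no mode suffix in the grammar.
EXTRA = ["BRK", "JCI", "JMI", "JSI"]

# The eight suffix forms of <opcode>: none, 2, k, r, and their combinations.
SUFFIXES = ["", "2", "k", "r", "2k", "2r", "kr", "2kr"]


def all_opcodes():
    """Return the full <opcode> set generated by the grammar."""
    return set(EXTRA) | {op + suffix for op in CORE for suffix in SUFFIXES}


opcodes = all_opcodes()
print(len(opcodes))         # 260 = 32 core opcodes * 8 suffix forms + 4
print("ADD2kr" in opcodes)  # True
print("BRK2" in opcodes)    # False: BRK takes no suffix in this grammar
```

A set built this way is also what the <ID> definition excludes, so it can serve as the opcode argument of an identifier check.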

uxntal syntax is effectively communicated using a generative grammar.

further

references

https://git.sr.ht/~rabbits/drifblim/tree/main/item/examples/acid.tal
https://git.sr.ht/~rabbits/drifblim/tree/main/item/tests.sh
