Specification
Introduction
hex is a minimalist, concatenative, stack-based programming language designed for experimenting with the concatenative programming paradigm. It is inspired by the min programming language and aims to provide a small yet powerful language for creating short scripts and automating common tasks.
hex supports 32-bit integers (written only in hexadecimal format), strings, and quotations (lists). It features a set of built-in symbols that implement arithmetic operations, boolean logic, bitwise operations, comparison of integers, I/O operations, file manipulation, external process execution, and stack manipulation. The language is fully homoiconic, meaning that everything in hex is data.
hex was created with simplicity in mind, both in its implementation and usage. The language's design encourages a minimalist approach, focusing on essential features and avoiding unnecessary complexity.
Syntax
The syntax of hex is designed to be simple and intuitive, following the principles of concatenative programming. In hex, programs are composed of sequences of literals and symbols, which are evaluated from left to right.
Literals push values onto the stack, while symbols manipulate the stack or perform operations. There are no explicit control structures; instead, hex relies on stack manipulation and quotations to achieve flow control and data management. Symbols in hex can be used to store values globally, providing a way to manage state across different parts of a program.
hex programs are written as sequences of whitespace-separated tokens. Tokens can be literals, symbols, or comments.
This is an example of a simple hex program:
; Filters a quotation to keep only the even numbers
(0x2 0x3 0x4 0x5 0x6) (0x2 % 0x0 ==) filter
This example includes:
- One single-line comment:
; Filters a quotation to keep only the even numbers
- Two quotations: (0x2 0x3 0x4 0x5 0x6) and (0x2 % 0x0 ==)
- Three symbols: %, ==, and filter
Comments
Comments in hex are used to annotate code and are ignored during execution. There are two types of comments: single-line comments and multi-line comments.
Single-line Comments
Single-line comments start with a semicolon (;) and continue until the end of the line. Everything after the semicolon is ignored.
Example:
; This is a single-line comment
0x2 0x3 + ; This adds 0x2 and 0x3
Multi-line Comments
Multi-line comments start with #| and end with |#. Everything between these markers is ignored, allowing comments to span multiple lines.
Example:
#|
This is a multi-line comment
It can span multiple lines
|#
0x2 0x3 + #| This adds 0x2 and 0x3 |#
Integer Literals
Integer literals in hex are always written in hexadecimal form, prefixed with 0x. They can contain up to 8 hexadecimal digits, representing 32-bit integers. Hexadecimal digits include the numbers 0-9 and the letters a-f (or A-F), which correspond to the decimal values 10-15.
Integers in hex can be positive or negative, and are implemented using two's complement representation. For more information on two's complement, see .
Examples:
- 0x1 represents the decimal value 1.
- 0xa represents the decimal value 10.
- 0x1f represents the decimal value 31.
- 0xffffffff represents the decimal value -1 (in two's complement).
Integers are case-insensitive; typically, lowercase letters are preferred but not mandatory.
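To make the two's complement rule concrete, here is a minimal Python sketch of how a tool might convert an integer literal into the signed value it denotes. It assumes the rules stated in this section (a 0x prefix, which the Integers section below says may also be written 0X, followed by 1 to 8 hexadecimal digits); the function name parse_int_literal is hypothetical.

```python
import re

# "0x" or "0X" followed by 1 to 8 hexadecimal digits, in either case.
HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]{1,8}")


def parse_int_literal(token: str) -> int:
    """Return the signed 32-bit value denoted by a hex integer literal."""
    if not HEX_LITERAL.fullmatch(token):
        raise ValueError(f"invalid integer literal: {token!r}")
    raw = int(token[2:], 16)  # unsigned 32-bit bit pattern
    # Two's complement: when bit 31 (the sign bit) is set, subtract 2**32.
    return raw - (1 << 32) if raw & 0x80000000 else raw
```

With this sketch, parse_int_literal("0x1f") gives 31 and parse_int_literal("0xffffffff") gives -1, matching the examples above.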
String Literals
String literals in hex are delimited by double quotes ("). They can contain any character except a newline, meaning that strings must be on a single line. To include special characters within a string, hex supports the following escape codes:
- \n: Newline
- \t: Tab
- \r: Carriage return
- \b: Backspace
- \f: Form feed
- \v: Vertical tab
- \\: Backslash
- \": Double quote
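The escape codes above can be decoded with a single left-to-right scan. The following Python sketch takes the text between the opening and closing double quotes and returns the resulting string. The function name is hypothetical, and treating an unknown escape or a trailing backslash as an error is an assumption, since this section does not say how such cases are handled.

```python
# Escape codes listed above, mapped to the characters they denote.
ESCAPES = {
    "n": "\n", "t": "\t", "r": "\r", "b": "\b",
    "f": "\f", "v": "\v", "\\": "\\", '"': '"',
}


def decode_string_literal(body: str) -> str:
    """Decode the characters between the quotes of a hex string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\n":
            raise ValueError("strings must be on a single line")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # Assumption: unknown escapes and a trailing backslash are errors.
        if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
            raise ValueError(f"invalid escape at position {i}")
        out.append(ESCAPES[body[i + 1]])
        i += 2
    return "".join(out)
```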
Example:
"Hello, World!\nThis is a new line."
Quotation Literals
Quotations in hex are delimited by parentheses: they must start with ( and end with ). They can contain integers, strings, symbols, and even other quotations, allowing for nested structures.
Examples:
- (0x1 0x2 0x3): a quotation containing three integer literals.
- (0x1 "hello" (0x2 0x3)): a nested quotation containing an integer, a string, and another quotation.
Unlike string literals, quotations can span multiple lines, making them suitable for representing complex data structures and control flow mechanisms.
Symbol Identifiers
Symbol identifiers in hex are used to represent built-in native symbols and user-defined symbols.
There are 0x40 (64) native symbols in hex, and some of them consist of special characters, such as == or . (a single period).
User-defined symbols, by contrast:
- must start with a letter (a-z or A-Z) or an underscore (_)
- can contain additional letters (a-z or A-Z), digits (0-9), dashes (-), and underscores (_)
Symbols are case-sensitive.
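The two rules for user-defined identifiers translate directly into a regular expression. This Python sketch, with a hypothetical function name, checks only the character rules listed above; it does not check whether a name clashes with a native symbol such as dup.

```python
import re

# A letter or underscore, followed by letters, digits, dashes, or underscores.
USER_SYMBOL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def is_valid_user_symbol(name: str) -> bool:
    """Check a user-defined identifier against the character rules only."""
    return USER_SYMBOL.fullmatch(name) is not None
```

For example, t-count and _n (both used later in this specification) are accepted, while 1st and == are rejected.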
Data Types
hex supports the following data types:
- Integers — 32-bit signed integers represented in hexadecimal form.
- Strings — Sequences of characters delimited by double quotes.
- Quotations — Lists of literals, symbols, and other quotations delimited by parentheses.
- Symbols — Identifiers representing native or user-defined symbols.
Integers
Integers in hex are 32-bit signed values represented in hexadecimal form. They can be positive or negative (using two's complement), and range from -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1).
Integers are written using the prefix 0x followed by up to 8 hexadecimal digits.
hex integers are case-insensitive, meaning that 0x1f and 0X1F are equivalent (however, lowercase letters are preferred).
Because hex has no boolean data type, 0x0 is assumed to be false, and any other integer value is assumed to be true.
Examples:
- 0x1 represents the decimal value 1.
- 0xffffffff represents the decimal value -1.
- 0x10 represents the decimal value 16.
Strings
Strings in hex are sequences of characters delimited by double quotes ("). They can contain any character except a newline, and special characters can be escaped using backslashes.
Strings are used to represent textual data and can be manipulated using various string manipulation symbols in hex.
Examples:
- "Hello, World!" represents the string Hello, World!.
- "This is a string with a newline:\nSecond line." represents a string with a newline character.
Quotations
Quotations in hex are lists of literals (including other quotations) and symbols delimited by parentheses (( and )). They are used to represent structured data and are a fundamental part of the language's syntax.
An important property of quotations is that any symbol they contain is not executed. This is fundamental to hex and to other concatenative programming languages, because it means that quotations effectively act as code blocks, holding code that can be executed later using the appropriate dequoting symbols.
Consider the following example:
0x0 "t-count" :
(t-count 0xa <)
(
t-count puts
t-count 0x1 + "t-count" :
)
while
"t-count" #
This example defines a symbol t-count that is used as a counter: the program counts from 0 to 9 and prints each number to standard output. The quotation (t-count 0xa <) checks whether the count is less than 10, and the while symbol repeats the process until the condition is no longer met.
In this case, the two quotations are first pushed on the stack, and then the while symbol performs the dequoting necessary to implement the expected control flow. Finally, "t-count" # frees the symbol from the registry.
Symbols
In hex there are native symbols and user-defined symbols. Native symbols are built-in functions that perform specific operations, while user-defined symbols are created by the user to store values or define custom behavior.
hex provides 64 (0x40) native symbols that cover a wide range of functionality, including arithmetic operations, control flow, I/O operations, file manipulation, and stack manipulation.
You can think of symbols either as functions that manipulate the stack or as variables that store literal values.
While some native symbol identifiers consist of special characters, such as ==, user-defined symbol identifiers must adhere to specific rules.
All symbols are stored in a single registry, implemented as a simple dictionary. Therefore, all symbols in hex are global, and not lexically scoped. The main driver for this is to keep the language as simple as possible.
You can define your own symbols and delete them using the memory management symbols provided natively. However, native symbols cannot be deleted.
Stack
The stack is a fundamental data structure in hex that holds values and controls the flow of execution. hex is a stack-based language, meaning that all operations are performed on a stack of values. Items are added to (pushed) and removed from (popped) the stack in last-in, first-out (LIFO) order.
In the canonical implementation, the hex stack can contain up to 256 items. If you try to push more items onto the stack, a stack overflow error is raised and the program terminates. While this may seem a relatively low limit, there are typically no more than 5-10 items on the stack at any time, because symbols frequently pop items off the stack.
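The LIFO discipline and the 256-item limit can be modelled with a small bounded stack. This Python sketch is illustrative only: the class and exception names are hypothetical, and raising an error when popping from an empty stack is an assumption not stated in this section.

```python
class StackOverflowError(Exception):
    pass


class StackUnderflowError(Exception):
    pass


class HexStack:
    """Bounded LIFO stack (256 items, as in the canonical implementation)."""

    def __init__(self, capacity: int = 256) -> None:
        self.capacity = capacity
        self._items: list = []

    def push(self, item) -> None:
        if len(self._items) >= self.capacity:
            raise StackOverflowError(f"stack overflow (capacity {self.capacity})")
        self._items.append(item)

    def pop(self):
        # Assumption: popping an empty stack is an error.
        if not self._items:
            raise StackUnderflowError("stack underflow")
        return self._items.pop()  # the most recently pushed item comes out first

    def __len__(self) -> int:
        return len(self._items)
```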
Pushing Literals
Literals are values that are directly pushed onto the stack. In hex, literals can be integers, strings, or quotations. When a literal is encountered in a hex program, it is pushed onto the stack for further processing.
Examples:
- 0x1 pushes the integer 1 onto the stack.
- "Hello, World!" pushes the string Hello, World! onto the stack.
- (0x1 0x2 0x3) pushes the quotation (0x1 0x2 0x3) onto the stack.
Pushing Symbols
Symbols in hex are used to represent native or user-defined functions and values. When a symbol is encountered in a hex program, it is looked up in the registry, and its associated value or function is pushed onto the stack.
Native symbols can perform manipulations on the stack; they can pop values from the stack and push values back in.
By contrast, user-defined symbols can only hold literals. However, a user-defined symbol can hold a quotation, which can then be dequoted using symbols like ., which pushes all the items in a quotation on the stack, one by one.
Consider the following example hex program:
(dup *) "square" :
0x3 square . puts ; prints 9
This program uses the symbol : to define a symbol square that can be used to calculate the square of an integer. From then on, wherever square appears in the same hex program, it will be substituted with (dup *). However, this is not enough to calculate the square, because the logic to do so is in a quotation. To "execute" (dequote) a quotation, you must use the . symbol, which pushes all the items in the quotation on the stack, so the program above is equivalent to the following:
0x3 dup * puts ; prints 9
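The difference between pushing a quotation and dequoting it can be seen in a tiny evaluator. The following Python sketch is not the canonical implementation: it works on already-parsed items using a hypothetical representation (Python int for integers, str for string literals, list for quotations, and a Sym class for symbols), and it models only dup, *, :, and . (without 32-bit wrap-around).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """A symbol token (string literals are plain str values)."""
    name: str


def evaluate(program, stack, registry):
    """Evaluate parsed hex items; symbols inside pushed quotations stay unexecuted."""
    for item in program:
        if not isinstance(item, Sym):
            stack.append(item)                 # literals (including quotations) are pushed
        elif item.name == "dup":               # a -> a a
            stack.append(stack[-1])
        elif item.name == "*":                 # i1 i2 -> i
            i2, i1 = stack.pop(), stack.pop()
            stack.append(i1 * i2)
        elif item.name == ":":                 # a s ->  (store a under the name s)
            name, value = stack.pop(), stack.pop()
            registry[name] = value
        elif item.name == ".":                 # q -> *  (dequote: evaluate q's items)
            evaluate(stack.pop(), stack, registry)
        elif item.name in registry:            # user symbol: push its stored value
            stack.append(registry[item.name])
        else:
            raise NameError(f"undefined symbol: {item.name}")
    return stack
```

Encoding the square program (without puts) as [[Sym("dup"), Sym("*")], "square", Sym(":"), 3, Sym("square"), Sym(".")] leaves [9] on the stack; dropping the final Sym(".") leaves 3 and the unexecuted quotation instead.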
Registry
The registry in hex is a simple dictionary that stores symbols and their associated values or functions. The registry is used to look up symbols when they are encountered in a hex program and to store user-defined symbols and their values.
When a symbol is pushed onto the stack, hex looks up the symbol in the registry and pushes its associated value or function onto the stack. If the symbol is not found in the registry, an error is raised.
The registry is implemented as a simple key-value store, where the keys are symbol identifiers and the values are the associated values or functions. The registry is global and shared across the entire hex program.
hex provides a set of native symbols that are pre-defined in the registry and cannot be deleted or modified. These symbols provide basic functionality for arithmetic operations, control flow, I/O operations, file manipulation, and stack manipulation.
hex also allows users to define their own symbols and store values in the registry. User-defined symbols can be created, modified, and deleted using the memory management symbols provided natively.
It is important to note that the registry is a global store, meaning that symbols are not lexically scoped and can be accessed from anywhere in the program. This design choice was made to keep the language simple and straightforward.
In the canonical hex implementation, the registry can hold up to 1024 symbols (960 of which can be user-defined symbols).
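The registry rules above (a single global dictionary, native symbols that cannot be modified or deleted, and 1024 slots of which 960 are available to user code) can be sketched in Python as follows. The class, its method names, and the way natives are represented are hypothetical; they only mirror what the memory management symbols : and # are described as doing.

```python
class Registry:
    """Global symbol registry: 1024 slots, 64 of them reserved for natives."""

    CAPACITY = 1024
    NATIVE_COUNT = 64

    def __init__(self, natives):
        self._natives = frozenset(natives)
        self._user = {}

    def define(self, name, value):
        """What ':' does: create or modify a user-defined symbol."""
        if name in self._natives:
            raise PermissionError(f"cannot modify native symbol {name!r}")
        max_user = self.CAPACITY - self.NATIVE_COUNT   # 960
        if name not in self._user and len(self._user) >= max_user:
            raise MemoryError("registry full")
        self._user[name] = value

    def free(self, name):
        """What '#' does: delete a user-defined symbol."""
        if name in self._natives:
            raise PermissionError(f"cannot delete native symbol {name!r}")
        del self._user[name]            # KeyError if the symbol is undefined

    def lookup(self, name):
        if name in self._natives:
            return ("native", name)     # hypothetical marker for a native symbol
        if name in self._user:
            return self._user[name]
        raise KeyError(f"undefined symbol {name!r}")
```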
Hex Bytecode eXecutable (HBX) Format
hex programs can be compiled to a binary format called Hex Bytecode eXecutable (HBX). HBX is a compact binary representation of hex programs that can be executed by the hex interpreter. HBX files are typically smaller and faster to load than hex source files, making them ideal for distribution and execution.
HBX files are structured as follows:
- Bytecode Header (8 bytes)
- Bytecode Symbol Table — containing the list of all symbols that have been defined by the user in the compiled program.
- Bytecode Program — containing the compiled hex program as a sequence of opcodes and payload.
Bytecode Header
The header of an HBX file consists of 8 bytes:
- 01: Header start
- 68: The letter 'h'
- 65: The letter 'e'
- 78: The letter 'x'
- 01: Version
- 00: First byte of the symbol table size (little-endian)
- 00: Second byte of the symbol table size (little-endian)
- 02: Header end
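The following Python sketch builds and parses the 8-byte header. It assumes a version byte of 01 as listed above; the function names are hypothetical. The symbol table size is packed as an unsigned 16-bit little-endian value with the standard struct format "<H".

```python
import struct


def build_header(symbol_count: int, version: int = 1) -> bytes:
    """01 'h' 'e' 'x' <version> <size low byte> <size high byte> 02"""
    return (bytes([0x01]) + b"hex" + bytes([version])
            + struct.pack("<H", symbol_count) + bytes([0x02]))


def parse_header(data: bytes):
    """Return (version, symbol_count) from the first 8 bytes of an HBX file."""
    if len(data) < 8 or data[0] != 0x01 or data[1:4] != b"hex" or data[7] != 0x02:
        raise ValueError("not an HBX header")
    (symbol_count,) = struct.unpack("<H", data[5:7])
    return data[4], symbol_count
```

For a program that defines one symbol, build_header(1) produces 01 68 65 78 01 01 00 02.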
Bytecode Symbol Table
The symbol table in an HBX file contains the list of all symbols that have been defined by the user in the compiled program. Symbols are stored sequentially using the following format:
- Symbol Length (1 byte) — The length of the symbol identifier (Can be up to 255 characters long).
- Symbol Identifier (variable length) — The symbol identifier as a sequence of ASCII characters (not null-terminated).
The symbol table can theoretically contain up to 65535 entries (the largest value representable in two bytes); however, the maximum number of user-defined symbols is currently limited to 960, since the registry has a maximum size of 1024 items and 64 are reserved for native symbols.
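Each symbol table entry is a one-byte length followed by the ASCII identifier. A minimal Python sketch of encoding and decoding the table follows; the function names are hypothetical, and the caller is assumed to take the entry count from the header.

```python
def encode_symbol_table(symbols) -> bytes:
    out = bytearray()
    for name in symbols:
        raw = name.encode("ascii")          # identifiers are ASCII
        if not 1 <= len(raw) <= 255:
            raise ValueError(f"symbol length out of range: {name!r}")
        out.append(len(raw))                # 1-byte length prefix
        out += raw                          # identifier, not null-terminated
    return bytes(out)


def decode_symbol_table(data: bytes, count: int, offset: int = 0):
    """Read `count` entries starting at `offset`; return (symbols, next offset)."""
    symbols = []
    for _ in range(count):
        length = data[offset]
        symbols.append(data[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    return symbols, offset
```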
Bytecode Program
The bytecode program in an HBX file contains the compiled hex program as a sequence of opcodes and payload. Each opcode is represented by a single byte, and some opcodes may have an associated payload.
The following opcodes are defined for pushing different types of values on the stack:
- 00: (LOOKUP) Look up user symbol
- 01: (PUSHIN) Push integer
- 02: (PUSHST) Push string
- 03: (PUSHQT) Push quotation
Other opcodes are assigned to each native symbol, and range from 10 to 4f.
Each of the four opcodes for pushing data has an associated payload, which is used to provide additional information to the opcode. The payload is represented as a sequence of bytes following the opcode byte.
Opcodes for native symbols, by contrast, have no associated payload.
00 - LOOKUP
The 00 (LOOKUP) opcode is used to look up a user-defined symbol in the symbol table and push its associated value onto the stack. The 00 opcode is followed by two bytes representing the index of the symbol in the symbol table, in little-endian format.
For example, the sequence 00 03 00 instructs the interpreter to perform a lookup in the symbol table and retrieve the 4th symbol (index 3).
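A LOOKUP instruction is always three bytes long. This Python sketch (hypothetical function names) encodes and decodes it, with the index packed as an unsigned 16-bit little-endian value.

```python
import struct


def encode_lookup(index: int) -> bytes:
    # Opcode 00, then the symbol-table index as 2 little-endian bytes.
    return bytes([0x00]) + struct.pack("<H", index)


def decode_lookup(data: bytes, offset: int = 0):
    """Return (symbol index, offset of the next instruction)."""
    if data[offset] != 0x00:
        raise ValueError("not a LOOKUP opcode")
    (index,) = struct.unpack_from("<H", data, offset + 1)
    return index, offset + 3
```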
01 - PUSHIN
The 01 (PUSHIN) opcode is used to push an integer value onto the stack. The 01 opcode is followed by:
- One byte representing the number of following bytes used to represent the integer (1 to 4).
- That many bytes (1 to 4) representing the signed integer value using two's complement, in little-endian format.
For example, the sequence 01 04 fe ff ff ff represents the integer -2 (0xfffffffe), and the sequence 01 01 10 represents the integer 16 (0x10).
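This section does not spell out how the byte count is chosen. The Python sketch below assumes the encoder uses the fewest bytes that hold the value's unsigned 32-bit (two's complement) bit pattern and that the decoder zero-extends that pattern before reading it as a signed 32-bit value. Both examples above are consistent with this reading, but it remains an assumption, and the function names are hypothetical.

```python
def encode_pushin(value: int) -> bytes:
    """Encode a signed 32-bit integer as a PUSHIN instruction."""
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("value does not fit in 32 bits")
    pattern = value & 0xFFFFFFFF                      # two's complement bit pattern
    size = max(1, (pattern.bit_length() + 7) // 8)    # assumed: fewest bytes, 1 to 4
    return bytes([0x01, size]) + pattern.to_bytes(size, "little")


def decode_pushin(data: bytes, offset: int = 0):
    """Return (value, offset of the next instruction)."""
    if data[offset] != 0x01:
        raise ValueError("not a PUSHIN opcode")
    size = data[offset + 1]
    if not 1 <= size <= 4:
        raise ValueError("invalid integer size")
    pattern = int.from_bytes(data[offset + 2:offset + 2 + size], "little")
    # Read the zero-extended 32-bit pattern as a signed value.
    value = pattern - (1 << 32) if pattern & 0x80000000 else pattern
    return value, offset + 2 + size
```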
02 - PUSHST
The 02 (PUSHST) opcode is used to push a string value onto the stack. The 02 opcode is followed by:
- A variable number of bytes representing the length of the string, encoded using the Little Endian Base 128 (LEB128) algorithm.
- A variable-length sequence of bytes representing the ASCII characters of the string, without the null terminator. Note that only ASCII characters are supported by the HBX format right now; attempting to encode non-ASCII characters will result in a compiler error.
The following sequence:
02 16 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 73 74 72 69 6e 67 21
represents the string "This is a test string!"
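Because a string length is never negative, the unsigned form of LEB128 applies: each byte carries 7 bits of the value, least significant group first, and the high bit signals that another byte follows. The Python sketch below (hypothetical function names) uses it to encode and decode PUSHST instructions, rejecting non-ASCII text as the format requires.

```python
def uleb128_encode(n: int) -> bytes:
    if n < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def uleb128_decode(data: bytes, offset: int = 0):
    """Return (value, offset just past the LEB128 bytes)."""
    result = shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, offset


def encode_pushst(text: str) -> bytes:
    raw = text.encode("ascii")        # raises UnicodeEncodeError for non-ASCII text
    return bytes([0x02]) + uleb128_encode(len(raw)) + raw


def decode_pushst(data: bytes, offset: int = 0):
    if data[offset] != 0x02:
        raise ValueError("not a PUSHST opcode")
    length, offset = uleb128_decode(data, offset + 1)
    return data[offset:offset + length].decode("ascii"), offset + length
```

The 22-character string above needs a single length byte (16), while a 300-character string would need two (ac 02).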
03 - PUSHQT
The 03 (PUSHQT) opcode is used to push a quotation value onto the stack. The 03 opcode is followed by:
- A variable number of bytes representing the number of items in the quotation, encoded using the Little Endian Base 128 (LEB128) algorithm.
- The opcode sequences for each item of the quotation.
The following sequence:
03 05 02 04 74 65 73 74 01 01 01 36 3b 45
represents the quotation ("test" 0x1 dec cat puts)
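Encoding a quotation is recursive: write 03, the item count in LEB128, and then each item's own instruction. The Python sketch below reproduces the example above. Its item representation (Python int, str, list, and a Sym class for symbols) and function names are hypothetical. NATIVE_OPCODES deliberately contains only the three native symbols that appear in this example, and integers use the fewest bytes that hold their 32-bit pattern, which is an assumption.

```python
from dataclasses import dataclass

# Only the native opcodes that appear in this section's example.
NATIVE_OPCODES = {"dec": 0x36, "cat": 0x3B, "puts": 0x45}


@dataclass(frozen=True)
class Sym:
    name: str


def uleb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return bytes(out)


def encode_item(item, symbol_table) -> bytes:
    if isinstance(item, int):                          # PUSHIN
        pattern = item & 0xFFFFFFFF
        size = max(1, (pattern.bit_length() + 7) // 8)
        return bytes([0x01, size]) + pattern.to_bytes(size, "little")
    if isinstance(item, str):                          # PUSHST
        raw = item.encode("ascii")
        return b"\x02" + uleb128(len(raw)) + raw
    if isinstance(item, list):                         # PUSHQT: count, then items
        body = b"".join(encode_item(x, symbol_table) for x in item)
        return b"\x03" + uleb128(len(item)) + body
    if isinstance(item, Sym):
        if item.name in NATIVE_OPCODES:                 # native: opcode only
            return bytes([NATIVE_OPCODES[item.name]])
        index = symbol_table.index(item.name)          # LOOKUP by table index
        return b"\x00" + index.to_bytes(2, "little")
    raise TypeError(f"cannot encode {item!r}")
```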
Full Bytecode Example
Consider the following hex program:
(0x1 0x2 0x3 0x4)
(
"_n" :
(_n 0x2 % 0x0 ==)
(_n dec " is divisible by two." cat puts)
when
)
each
This gets compiled to the following bytecode:
01 68 65 78 01 01 00 02
02 5f 6e 03 04 01 01 01
01 01 02 01 01 03 01 01
04 03 05 02 02 5f 6e 10
03 05 00 00 00 01 01 02
23 01 01 00 2a 03 05 00
00 00 36 02 15 20 69 73
20 64 69 76 69 73 69 62
6c 65 20 62 79 20 74 77
6f 2e 3b 45 13 42
And here is an annotated breakdown:
; Header with symbol table of size 1
01 68 65 78 01 01 00 02
; Symbol table with one symbol: _n
02 5f 6e
; Push quotation of four items
03 04
; Push integer 1
01 01 01
; Push integer 2
01 01 02
; Push integer 3
01 01 03
; Push integer 4
01 01 04
; Push quotation of five items
03 05
; Push string "_n"
02 02 5f 6e
10 ; Symbol :
; Push quotation of five items
03 05
; Lookup first symbol (_n)
00 00 00
; Push integer 2
01 01 02
23 ; Symbol %
; Push integer 0
01 01 00
2a ; Symbol ==
; Push quotation of five items
03 05
; Lookup first symbol (_n)
00 00 00
36 ; Symbol dec
; Push string " is divisible by two."
02 15 20 69 73 20 64 69 76 69 73
69 62 6c 65 20 62 79 20 74 77 6f 2e
3b ; Symbol cat
45 ; Symbol puts
13 ; Symbol when
42 ; Symbol each
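Reading the annotated breakdown in reverse gives a disassembler. In the Python sketch below, the native symbol table is an inference, not something stated in this specification: it assumes opcodes are assigned from 10 in the order the Native Symbol Reference lists the symbols. That assumption agrees with every native opcode in this example (10, 13, 23, 2a, 36, 3b, 42, 45). Function names are hypothetical, and only backslashes and double quotes are re-escaped in strings.

```python
# Assumed: opcode = 0x10 + position in the Native Symbol Reference.
NATIVES = (
    ": # if when while error try "          # memory management, control flow
    "dup stack clear pop swap . ! ' "        # stack management, evaluation
    "+ - * / % & | ^ ~ << >> "               # arithmetic, bitwise
    "== != > < >= <= and or not xor "        # comparison, boolean logic
    "int str dec hex ord chr type "          # type checking and conversion
    "cat len get index join split replace "  # list and string
    "each map filter puts warn print gets "  # quotation, I/O
    "read write append args exit exec run"   # file, shell
).split()

PROGRAM = bytes.fromhex("""
01 68 65 78 01 01 00 02 02 5f 6e 03 04 01 01 01 01 01 02 01 01 03 01 01
04 03 05 02 02 5f 6e 10 03 05 00 00 00 01 01 02 23 01 01 00 2a 03 05 00
00 00 36 02 15 20 69 73 20 64 69 76 69 73 69 62 6c 65 20 62 79 20 74 77
6f 2e 3b 45 13 42
""")


def uleb128(data, pos):
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return value, pos


def read_item(data, pos, symbols):
    """Decode the instruction at pos; return (hex source text, next position)."""
    op = data[pos]
    pos += 1
    if op == 0x00:                                     # LOOKUP
        return symbols[int.from_bytes(data[pos:pos + 2], "little")], pos + 2
    if op == 0x01:                                     # PUSHIN
        size = data[pos]
        return hex(int.from_bytes(data[pos + 1:pos + 1 + size], "little")), pos + 1 + size
    if op == 0x02:                                     # PUSHST
        length, pos = uleb128(data, pos)
        text = data[pos:pos + length].decode("ascii")
        text = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{text}"', pos + length
    if op == 0x03:                                     # PUSHQT
        count, pos = uleb128(data, pos)
        items = []
        for _ in range(count):
            text, pos = read_item(data, pos, symbols)
            items.append(text)
        return "(" + " ".join(items) + ")", pos
    if 0x10 <= op <= 0x4F:                             # native symbol, no payload
        return NATIVES[op - 0x10], pos
    raise ValueError(f"unknown opcode {op:#04x}")


def disassemble(data: bytes) -> str:
    if data[:4] != b"\x01hex" or data[7] != 0x02:
        raise ValueError("missing HBX header")
    count = int.from_bytes(data[5:7], "little")
    pos, symbols = 8, []
    for _ in range(count):                             # symbol table
        length = data[pos]
        symbols.append(data[pos + 1:pos + 1 + length].decode("ascii"))
        pos += 1 + length
    items = []
    while pos < len(data):                             # bytecode program
        text, pos = read_item(data, pos, symbols)
        items.append(text)
    return " ".join(items)
```

Applied to PROGRAM, this sketch yields the original program on a single line.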
Native Symbol Reference
hex provides a set of 64 (0x40) native symbols that are built-in and pre-defined in the registry. The following section provides details on each of these symbols, including a signature illustrating how each symbol manipulates the stack.
The notation used to specify the signature of a symbol is as follows:
in1 in2 ... inN → out1 out2 ... outM
Where in1, in2, ..., inN are the items consumed from the stack, and out1, out2, ..., outM are the items pushed back onto the stack.
Note that the → character represents the symbol being described, and:
- inN is the item on top of the stack before the symbol is pushed on the stack.
- outM is the item on top of the stack after the symbol is pushed on the stack.
The following abbreviations are used to represent different types of literals (and each can have a numerical suffix for differentiation within the signature):
- a: Any literal value
- s: String
- q: Quotation
- i: Integer
Additionally, * is used to represent zero or more literals of any type.
Consider, for example, the following signature for the swap symbol:
a1 a2 → a2 a1
This signature indicates that the symbol swap pops two items from the stack (a1 and a2), and then pushes them back onto the stack in reverse order (a2 and a1).
Memory Management Symbols
:
Symbol
a s →
Stores the literal a in the registry as the symbol s.
#
Symbol
s →
Frees the symbol s from the registry.
Control Flow Symbols
if
Symbol
q1 q2 q3 → *
Dequotes quotation q1; if it pushes a positive integer on the stack, dequotes q2, otherwise dequotes q3.
when
Symbol
q1 q2 → *
Dequotes quotation q1; if it pushes a positive integer on the stack, dequotes q2.
while
Symbol
q1 q2 → *
Dequotes quotation q1; if it pushes a positive integer on the stack, dequotes q2 and repeats the process.
error
Symbol
→ s
Pushes the last error message to the stack.
try
Symbol
q1 q2 → *
Dequotes quotation q1; if it raises an error, dequotes q2.
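The control flow symbols above can be modelled in a few lines of Python. This sketch is illustrative only: it represents each quotation as a Python callable that manipulates a list used as the stack. It assumes that the integer pushed by a condition quotation is consumed and, following the wording above, that the condition is met when that integer is positive. The errors list passed to hex_try is a hypothetical stand-in for the last error message that error pushes.

```python
def dequote(quotation, stack):
    # In this sketch a quotation is a Python callable acting on the stack.
    quotation(stack)


def condition_met(stack):
    # Assumption: the condition's integer result is consumed.
    return stack.pop() > 0


def hex_if(stack):        # q1 q2 q3 -> *
    q3, q2, q1 = stack.pop(), stack.pop(), stack.pop()
    dequote(q1, stack)
    dequote(q2 if condition_met(stack) else q3, stack)


def hex_when(stack):      # q1 q2 -> *
    q2, q1 = stack.pop(), stack.pop()
    dequote(q1, stack)
    if condition_met(stack):
        dequote(q2, stack)


def hex_while(stack):     # q1 q2 -> *
    q2, q1 = stack.pop(), stack.pop()
    while True:
        dequote(q1, stack)
        if not condition_met(stack):
            break
        dequote(q2, stack)


def hex_try(stack, errors):   # q1 q2 -> *
    q2, q1 = stack.pop(), stack.pop()
    try:
        dequote(q1, stack)
    except Exception as exc:
        errors.append(str(exc))   # what `error` would later push
        dequote(q2, stack)
```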
Stack Management Symbols
dup
Symbol
a → a a
Duplicates the literal a and pushes it on the stack.
stack
Symbol
→ q
Pushes the items currently on the stack as a quotation on the stack.
clear
Symbol
→
Clears the stack.
pop
Symbol
a →
Removes the top item from the stack.
swap
Symbol
a1 a2 → a2 a1
Swaps the top two items on the stack.
Evaluation Symbols
.
Symbol
q → *
Dequotes quotation q.
!
Symbol
(s|q) → *
Evaluates the string s as a hex program, or the quotation q of integers as hex bytecode (HBX format).
'
Symbol
a → q
Pushes the literal a wrapped in a quotation on the stack.
Arithmetic Symbols
+
Symbol
i1 i2 → i
Pushes the result of the sum of i1 and i2 on the stack.
-
Symbol
i1 i2 → i
Pushes the result of the subtraction of i2 from i1 on the stack.
*
Symbol
i1 i2 → i
Pushes the result of the multiplication of i1 and i2 on the stack.
/
Symbol
i1 i2 → i
Pushes the result of the division of i1 by i2 on the stack.
%
Symbol
i1 i2 → i
Pushes the result of the modulo of i1 by i2 on the stack.
Bitwise Operations Symbols
&
Symbol
i1 i2 → i
Pushes the result of a bitwise and of i1 and i2 on the stack.
|
Symbol
i1 i2 → i
Pushes the result of a bitwise or of i1 and i2 on the stack.
^
Symbol
i1 i2 → i
Pushes the result of a bitwise xor of i1 and i2 on the stack.
~
Symbol
i → i
Pushes the result of a bitwise not of i on the stack.
<<
Symbol
i1 i2 → i
Pushes the result of shifting i1 by i2 bits to the left.
>>
Symbol
i1 i2 → i
Pushes the result of shifting i1 by i2 bits to the right.
Comparison Symbols
==
Symbol
a1 a2 → i
Pushes 0x1 on the stack if a1 and a2 are equal, or 0x0 otherwise.
!=
Symbol
a1 a2 → i
Pushes 0x1 on the stack if a1 and a2 are not equal, or 0x0 otherwise.
>
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 is greater than i2, or 0x0 otherwise.
<
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 is less than i2, or 0x0 otherwise.
>=
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 is greater than or equal to i2, or 0x0 otherwise.
<=
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 is less than or equal to i2, or 0x0 otherwise.
Boolean Logic Symbols
and
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 and i2 are both non-zero integers, or 0x0 otherwise.
or
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 or i2 is a non-zero integer, or 0x0 otherwise.
not
Symbol
i → i
Pushes 0x1 on the stack if i is zero, or 0x0 otherwise.
xor
Symbol
i1 i2 → i
Pushes 0x1 on the stack if i1 and i2 are different, or 0x0 otherwise.
Type Checking and Conversion Symbols
int
Symbol
s → i
Converts the string s representing a hexadecimal integer to an integer value and pushes it on the stack.
str
Symbol
i → s
Converts the integer i to a string representing a hexadecimal integer and pushes it on the stack.
dec
Symbol
i → s
Converts the integer i to a string representing a decimal integer and pushes it on the stack.
hex
Symbol
s → i
Converts the string s representing a decimal integer to an integer value and pushes it on the stack.
ord
Symbol
s → i
Pushes the ASCII value of the string s on the stack.
If s is longer than 1 character or if it is not representable using an ASCII code between 0x0 and 0x7f, 0xffffffff is pushed on the stack.
chr
Symbol
i → s
Pushes the ASCII character represented by the integer i on the stack.
If i is not between 0x0 and 0x7f, an empty string is pushed on the stack.
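The fallback values of ord and chr are easy to misremember, so here is a Python sketch of both rules. The function names are hypothetical, and treating the empty string like an over-long one in hex_ord is an assumption, since the reference only mentions strings longer than 1 character.

```python
NOT_FOUND = 0xFFFFFFFF   # 32-bit pattern of -1


def hex_ord(s: str) -> int:
    # Only a single character with an ASCII code from 0x0 to 0x7f is accepted.
    # Assumption: the empty string also yields 0xffffffff.
    if len(s) == 1 and ord(s) <= 0x7F:
        return ord(s)
    return NOT_FOUND


def hex_chr(i: int) -> str:
    # Codes outside 0x0-0x7f produce an empty string.
    return chr(i) if 0x0 <= i <= 0x7F else ""
```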
type
Symbol
a → s
Pushes the type of the literal a on the stack (integer, string, quotation, native-symbol, user-symbol, invalid, or unknown).
List (Strings and Quotations) Symbols
cat
Symbol
(s1 s2|q1 q2) → (s|q)
Pushes the result of the concatenation of two strings or two quotations on the stack.
len
Symbol
(s|q) → i
Pushes the length of a string or a quotation on the stack.
get
Symbol
(s|q) i → a
Pushes the i-th item of a string or a quotation on the stack.
index
Symbol
(s a|q a) → i
Pushes the index of the first occurrence of the literal a in a string or a quotation on the stack.
If a is not found, 0xffffffff is pushed on the stack.
join
Symbol
q s1 → s2
Assuming that q is a quotation containing only strings, pushes the string s2 obtained by joining each element of q together using s1 as a delimiter.
String Symbols
split
Symbol
s1 s2 → q
Pushes a quotation q containing the strings obtained by splitting s1 using s2 as a delimiter.
replace
Symbol
s1 s2 s3 → s4
Pushes the string s4 obtained by replacing the first occurrence of s2 in s1 by s3.
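The main subtlety among split, replace, and join is that replace affects only the first occurrence. In the Python sketch below (hypothetical function names), argument order follows the signatures above, and Python's count argument to str.replace enforces the first-occurrence rule. How hex handles an empty delimiter is not specified here; Python's str.split raises ValueError in that case.

```python
def hex_split(s1: str, s2: str) -> list:
    # s1 s2 -> q : split s1 using s2 as the delimiter
    return s1.split(s2)


def hex_replace(s1: str, s2: str, s3: str) -> str:
    # s1 s2 s3 -> s4 : replace only the FIRST occurrence of s2 in s1
    return s1.replace(s2, s3, 1)


def hex_join(q: list, s1: str) -> str:
    # q s1 -> s2 : join the strings in q using s1 as the delimiter
    return s1.join(q)
```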
Quotation Symbols
each
Symbol
q1 q2 → *
Applies quotation q2 to each item of quotation q1: each item is pushed on the stack and q2 is then dequoted.
map
Symbol
q1 q2 → q3
Applies quotation q2 to each item of quotation q1 to obtain a new quotation q3.
filter
Symbol
q1 q2 → q
Applies quotation q2 to each item of quotation q1 to obtain a new quotation q containing only the items for which q2 pushed a positive integer.
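The following Python sketch models each, map, and filter with q1 as the data quotation and q2 as the code, as in the introductory filter example. As before, quotations that hold code are modelled as Python callables acting on a list used as the stack. The function names are hypothetical, and the sketch assumes that map's q2 leaves exactly one result per item and that filter's q2 leaves one integer, which is consumed.

```python
def hex_each(stack):      # q1 q2 -> *
    q2, q1 = stack.pop(), stack.pop()
    for item in q1:
        stack.append(item)        # push the item, then dequote q2
        q2(stack)


def hex_map(stack):       # q1 q2 -> q3
    q2, q1 = stack.pop(), stack.pop()
    results = []
    for item in q1:
        stack.append(item)
        q2(stack)
        results.append(stack.pop())   # assumed: one result per item
    stack.append(results)


def hex_filter(stack):    # q1 q2 -> q
    q2, q1 = stack.pop(), stack.pop()
    kept = []
    for item in q1:
        stack.append(item)
        q2(stack)
        if stack.pop() > 0:       # keep items for which q2 pushed a positive integer
            kept.append(item)
    stack.append(kept)
```

Mirroring (0x2 0x3 0x4 0x5 0x6) (0x2 % 0x0 ==) filter, a predicate that pushes 1 for even numbers leaves [2, 4, 6] on the stack.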
Input/Output Symbols
puts
Symbol
a →
Prints a to standard output, followed by a new line.
warn
Symbol
a →
Prints a to standard error, followed by a new line.
print
Symbol
a →
Prints a to standard output.
gets
Symbol
→ s
Reads a line from standard input and pushes it on the stack as a string.
File Symbols
read
Symbol
s1 → (s2|q)
Reads the content of the file s1 and pushes it on the stack as a string, if the file is in textual format, or as a quotation of integers representing bytes, if the file is in binary format.
write
Symbol
(s1|q) s2 →
Writes the string s1 or the quotation of integers representing bytes q to the file s2.
append
Symbol
(s1|q) s2 →
Appends the string s1 or the quotation of integers representing bytes q to the file s2.
Shell Symbols
args
Symbol
→ q
Pushes the command line arguments as a quotation on the stack.
exit
Symbol
i →
Exits the program with the exit code i.
exec
Symbol
s → i
Executes the string s as a shell command, and pushes the command's return code on the stack.
run
Symbol
s → q
Executes the string s as a shell command, capturing its output and errors. It pushes a quotation on the stack containing the following items:
- the exit code of the command as an integer
- the standard output of the command as a string
- the standard error of the command as a string