hex programming language

Specification

Introduction

hex is a minimalist, concatenative, stack-based programming language designed for experimenting with the concatenative programming paradigm. It is inspired by the min programming language and aims to provide a small yet powerful language for creating short scripts and automating common tasks.

hex supports 32-bit integers (written only in hexadecimal format), strings, and quotations (lists). It features a set of built-in symbols that implement arithmetic operations, boolean logic, bitwise operations, comparison of integers, I/O operations, file manipulation, external process execution, and stack manipulation. The language is fully homoiconic, meaning that everything in hex is data.

hex was created with simplicity in mind, both in its implementation and usage. The language's design encourages a minimalist approach, focusing on essential features and avoiding unnecessary complexity.

Syntax

The syntax of hex is designed to be simple and intuitive, following the principles of concatenative programming. In hex, programs are composed of sequences of literals and symbols, which are evaluated from left to right.

Literals push values onto the stack, while symbols manipulate the stack or perform operations. There are no explicit control structures; instead, hex relies on stack manipulation and quotations to achieve flow control and data management. Symbols in hex can be used to store values globally, providing a way to manage state across different parts of a program.

hex programs are written as sequences of whitespace-separated tokens. Tokens can be literals, symbols, or comments.

This is an example of a simple hex program:

    ; Filters a quotation to keep only the even numbers
    (0x2 0x3 0x4 0x5 0x6) (0x2 % 0x0 ==) filter

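This example includes a single-line comment, two quotations (one holding integer literals, the other holding the filtering condition), and the symbols %, ==, and filter.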

Comments

Comments in hex are used to annotate code and are ignored during execution. There are two types of comments: single-line comments and multi-line comments.

Single-line Comments

Single-line comments start with a semicolon (;) and continue until the end of the line. Everything after the semicolon is ignored.

Example:

    ; This is a single-line comment
    0x2 0x3 + ; This adds 0x2 and 0x3

Multi-line Comments

Multi-line comments start with #| and end with |#. Everything between these markers is ignored, allowing comments to span multiple lines.

Example:

    #|
      This is a multi-line comment
      It can span multiple lines
    |#
    0x2 0x3 + #| This adds 0x2 and 0x3 |#
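The two comment forms can be illustrated with a small Python sketch that removes them from a source string. This is not the canonical implementation: the function name strip_comments is hypothetical, and the sketch assumes that comment markers appearing inside a double-quoted string are part of the string and must be preserved.

```python
def strip_comments(source: str) -> str:
    """Remove ; line comments and #| ... |# block comments from hex source."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy a string literal verbatim, skipping over backslash escapes.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            out.append(source[i:j + 1])
            i = j + 1
        elif source.startswith("#|", i):
            # Skip to the closing |# marker (or to the end if it is missing).
            end = source.find("|#", i + 2)
            i = n if end == -1 else end + 2
        elif ch == ";":
            # Skip to the end of the line, keeping the newline itself.
            end = source.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

The remaining text can then be split into whitespace-separated tokens.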

Integer Literals

Integer literals in hex are always written in hexadecimal form, prefixed with 0x. They can contain up to 8 hexadecimal digits, representing 32-bit integers. Hexadecimal digits include the numbers 0-9 and the letters a-f (or A-F), which correspond to the decimal values 10-15.

Integers in hex can be positive or negative, and are implemented using two's complement representation. For more information on two's complement, see .

Examples:

Integers are case-insensitive; typically, lowercase letters are preferred but not mandatory.
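As an illustration of how a literal maps to a signed value, the following Python sketch validates a token and interprets its bits as a 32-bit two's complement integer. The name parse_hex_literal is hypothetical and the sketch is not taken from the hex source code.

```python
import re

# 0x (or 0X) followed by one to eight hexadecimal digits.
HEX_LITERAL = re.compile(r"0[xX][0-9a-fA-F]{1,8}\Z")


def parse_hex_literal(token: str) -> int:
    """Return the signed 32-bit value written as a hex integer literal."""
    if not HEX_LITERAL.match(token):
        raise ValueError(f"not a valid integer literal: {token!r}")
    raw = int(token, 16)
    # Bit patterns with the top bit set represent negative numbers.
    return raw - 0x1_0000_0000 if raw >= 0x8000_0000 else raw
```

Under this reading, 0xffffffff is -1 and 0x7fffffff is the largest positive value.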

String Literals

String literals in hex are delimited by double quotes ("). They can contain any character except for a newline, meaning that strings must be on a single line. To include special characters within a string, hex supports the following escape codes:

Example:

"Hello, World!\nThis is a new line."

Quotation Literals

Quotations in hex are delimited by parentheses (they must start with ( and end with )). They can contain integers, strings, symbols, and even other quotations, allowing for nested structures.

Examples:

Unlike string literals, quotations can span multiple lines, making them suitable for representing complex data structures and control flow mechanisms.
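The nesting of quotations can be sketched in Python by turning comment-free source text into nested lists, one list per quotation. The regular expression and the parse function are illustrative assumptions, not the tokenizer used by hex; literals and symbols are kept as raw token strings.

```python
import re

# A string literal, a parenthesis, or any other run of non-space characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\\n])*"|[()]|[^\s()"]+')


def parse(source: str) -> list:
    """Parse hex source into nested lists; each quotation becomes a list."""
    stack = [[]]  # stack[0] collects the top-level program
    for token in TOKEN.findall(source):
        if token == "(":
            stack.append([])          # start a new quotation
        elif token == ")":
            if len(stack) == 1:
                raise SyntaxError("unexpected )")
            finished = stack.pop()
            stack[-1].append(finished)
        else:
            stack[-1].append(token)   # literal or symbol
    if len(stack) != 1:
        raise SyntaxError("missing )")
    return stack[0]
```

Because newlines count as ordinary whitespace here, a quotation spread over several lines parses the same way as one written on a single line.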

Symbol Identifiers

Symbol identifiers in hex are used to represent built-in native symbols and user-defined symbols.

There are 0x40 (64) native symbols in hex, and some of them contain special characters like == or .

Instead, user-defined symbols:

Symbols are case-sensitive.

Data Types

hex supports the following data types:

Integers

Integers in hex are 32-bit signed values represented in hexadecimal form. They can be positive or negative (using two's complement), and range from -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1).

Integers are written using the prefix 0x followed by up to 8 hexadecimal digits.

hex integers are case-insensitive, meaning that 0x1f and 0X1F are equivalent (however, lowercase letters are preferred).

computations.

Because hex has no boolean data type, 0x0 is assumed to be false, and any other integer value is assumed to be true.

Examples:

Strings

Strings in hex are sequences of characters delimited by double quotes ("). They can contain any character except for a newline character, and special characters can be escaped using backslashes.

Strings are used to represent textual data and can be manipulated using various string manipulation symbols in hex.

Examples:

Quotations

Quotations in hex are lists of literals (including other quotations) and symbols delimited by parentheses (( and )). They are used to represent structured data and are a fundamental part of the language's syntax.

An important thing to remember about quotations is that the symbols they contain are not executed when the quotation is pushed on the stack. This is a fundamental property of hex and other concatenative programming languages: quotations effectively act as code blocks, holding code that can be executed later using the appropriate dequoting symbols.

Consider the following example:

    0x0 "t-count" :
    (t-count 0xa <)
        (
            t-count puts
            t-count 0x1 + "t-count" :
        )
    while
    "t-count" #

This example defines a symbol t-count that counts from 0 to 9 and prints each number to the standard output. The quotation (t-count 0xa <) is used to check if the count is less than 10, and the while symbol repeats the process until the condition is no longer met.

In this case, the two quotations are first pushed on the stack, and then the while symbol performs the dequoting necessary to implement the expected control flow.
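The deferred execution described above can be modeled in Python by treating each quotation as a function that receives the stack. This is only a sketch: hex_while, the registry dictionary, and the printed list (used here in place of standard output) are hypothetical names, and the "positive integer" test follows the description of the while symbol in the native symbol reference.

```python
def hex_while(stack, condition, body):
    """Run body for as long as condition leaves a positive integer on the stack."""
    while True:
        condition(stack)        # dequote the condition quotation
        if stack.pop() <= 0:    # consume its result
            break
        body(stack)             # dequote the body quotation


registry = {"t-count": 0}       # models: 0x0 "t-count" :
printed = []                    # stands in for standard output


def condition(stack):
    # models: (t-count 0xa <)
    stack.append(1 if registry["t-count"] < 0xA else 0)


def body(stack):
    # models: (t-count puts t-count 0x1 + "t-count" :)
    printed.append(registry["t-count"])
    registry["t-count"] += 1


hex_while([], condition, body)
```

Neither function runs when it is defined or passed around; only hex_while decides when, and how many times, each one is called.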

Symbols

In hex there are native symbols and user-defined symbols. Native symbols are built-in functions that perform specific operations, while user-defined symbols are created by the user to store values or define custom behavior.

hex provides 64 (0x40) native symbols that cover a wide range of functionality, including arithmetic operations, control flow, I/O operations, file manipulation, and stack manipulation.

You can think of symbols either as functions that manipulate the stack or as variables that store literal values.

While native symbol identifiers sometimes contain special characters, like ==, user-defined symbol identifiers must adhere to specific rules.

All symbols are stored in a single registry, implemented as a simple dictionary. Therefore, all symbols in hex are global, and not lexically scoped. The main driver for this is to keep the language as simple as possible.

You can define your own symbols and delete them using the memory management symbols provided natively. However, native symbols cannot be deleted.

Stack

The stack is a fundamental data structure in hex that holds values and controls the flow of execution. hex is a stack-based language, meaning that all operations are performed on a stack of values. Items are added to (pushed) and removed from (popped) the stack in LIFO (last in, first out) order.

In the canonical implementation, the hex stack can contain up to 256 items. If you try to push more items on the stack, a stack overflow error is raised and the program terminates. While this may seem a relatively low number, there are rarely more than 5-10 items on the stack at any time, because symbols frequently pop items off the stack.
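A bounded LIFO stack of this kind can be sketched in Python as follows. The class and exception names are hypothetical, and raising an exception stands in for the error and program termination described above.

```python
class HexStackOverflow(Exception):
    """Raised when pushing onto a full stack (hypothetical name)."""


class HexStack:
    LIMIT = 256  # capacity of the canonical implementation

    def __init__(self):
        self._items = []

    def push(self, item):
        if len(self._items) >= self.LIMIT:
            raise HexStackOverflow("stack overflow")
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("stack underflow")
        return self._items.pop()  # last in, first out

    def __len__(self):
        return len(self._items)
```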

Pushing Literals

Literals are values that are directly pushed onto the stack. In hex, literals can be integers, strings, or quotations. When a literal is encountered in a hex program, it is pushed onto the stack for further processing.

Examples:

Pushing Symbols

Symbols in hex are used to represent native or user-defined functions and values. When a symbol is encountered in a hex program, it is looked up in the registry, and its associated value or function is pushed onto the stack.

Native symbols can perform manipulations on the stack; they can pop values from the stack and push values back in.

By contrast, a user-defined symbol can only store a literal. That literal can be a quotation, however, which can then be dequoted through symbols like ., which pushes all the items in the quotation on the stack, one by one.

Consider the following example hex program:

    (dup *) "square" :
    0x3 square . puts ; prints 9

This program defines a symbol square that can be used to calculate the square value of an integer, using the symbol :. From then on, if square is found anywhere in the same hex program, it will be substituted with (dup *). However, this is not enough to calculate the square value, because the logic to do so is in a quotation. To "execute" (dequote) a quotation, you must use the . symbol, which pushes all the items in the quotation on the stack. The program above is therefore equivalent to the following:

    0x3 dup * puts ; prints 9
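The interplay between storing a quotation and dequoting it can be sketched with a very small Python evaluator. It is an illustration under stated assumptions, not the hex interpreter: only the symbols :, ., dup, and * are modeled, the Sym class is a hypothetical way to tell symbols apart from string literals, and the stored quotation squares its argument with dup followed by a single multiplication.

```python
class Sym(str):
    """A symbol token; a plain str models a string literal instead."""


def run(program, stack, registry):
    """Evaluate a list of items from left to right."""
    for item in program:
        if not isinstance(item, Sym):
            stack.append(item)                  # literals are pushed as-is
        elif item == ":":
            name = stack.pop()
            value = stack.pop()
            registry[name] = value              # store a literal under a name
        elif item == ".":
            run(stack.pop(), stack, registry)   # dequote: evaluate items one by one
        elif item == "dup":
            stack.append(stack[-1])
        elif item == "*":
            b = stack.pop()
            a = stack.pop()
            stack.append(a * b)
        else:
            stack.append(registry[item])        # user symbol: push the stored literal
    return stack


program = [
    [Sym("dup"), Sym("*")], "square", Sym(":"),  # store the quotation as square
    3, Sym("square"), Sym("."),                  # push 3, look up square, dequote it
]
result = run(program, [], {})
```

Without the final dequoting step, the stack would hold the integer 3 and the unevaluated quotation rather than the squared value.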

Registry

The registry in hex is a simple dictionary that stores symbols and their associated values or functions. The registry is used to look up symbols when they are encountered in a hex program and to store user-defined symbols and their values.

When a symbol is pushed onto the stack, hex looks up the symbol in the registry and pushes its associated value or function onto the stack. If the symbol is not found in the registry, an error is raised.

The registry is implemented as a simple key-value store, where the keys are symbol identifiers and the values are the associated values or functions. The registry is global and shared across the entire hex program.

hex provides a set of native symbols that are pre-defined in the registry and cannot be deleted or modified. These symbols provide basic functionality for arithmetic operations, control flow, I/O operations, file manipulation, and stack manipulation.

hex also allows users to define their own symbols and store values in the registry. User-defined symbols can be created, modified, and deleted using the memory management symbols provided natively.

It is important to note that the registry is a global store, meaning that symbols are not lexically scoped and can be accessed from anywhere in the program. This design choice was made to keep the language simple and straightforward.

In the canonical hex implementation, the registry can hold up to 1024 symbols (960 of which can be user-defined symbols).
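The rules above (a single global dictionary, protected native symbols, and a capacity of 1024 entries) can be sketched in Python as follows. The class, method, and exception names are hypothetical and are not taken from the canonical implementation.

```python
class RegistryError(Exception):
    """Raised on an invalid registry operation (hypothetical name)."""


class Registry:
    MAX_SYMBOLS = 1024  # native and user-defined symbols combined

    def __init__(self, native_names):
        self._native = frozenset(native_names)
        self._user = {}

    def store(self, name, value):
        if name in self._native:
            raise RegistryError(f"cannot overwrite native symbol {name}")
        is_new = name not in self._user
        if is_new and len(self._native) + len(self._user) >= self.MAX_SYMBOLS:
            raise RegistryError("registry is full")
        self._user[name] = value

    def free(self, name):
        if name in self._native:
            raise RegistryError(f"cannot free native symbol {name}")
        if name not in self._user:
            raise RegistryError(f"symbol {name} not found")
        del self._user[name]

    def lookup(self, name):
        if name not in self._user:
            raise RegistryError(f"symbol {name} not found")
        return self._user[name]
```

With 64 native names reserved, such a registry accepts 960 user-defined symbols before reporting that it is full.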

Hex Bytecode eXecutable (HBX) Format

hex programs can be compiled to a binary format called Hex Bytecode eXecutable (HBX). HBX is a compact binary representation of hex programs that can be executed by the hex interpreter. HBX files are typically smaller and faster to load than hex source files, making them ideal for distribution and execution.

HBX files are structured as follows:

Bytecode Header

The header of an HBX file consists of 8 bytes:

Bytecode Symbol Table

The symbol table in an HBX file contains the list of all symbols that have been defined by the user in the compiled program. Symbols are stored sequentially using the following format:

The symbol table can theoretically contain up to 65536 entries (the maximum size representable in two bytes); however, the maximum number of user-defined symbols is currently limited to 960, since the registry has a maximum size of 1024 items and 64 are reserved for native symbols.

Bytecode Program

The bytecode program in an HBX file contains the compiled hex program as a sequence of opcodes and payload. Each opcode is represented by a single byte, and some opcodes may have an associated payload.

The following opcodes are defined for pushing different types of values on the stack: 00 (LOOKUP), 01 (PUSHIN), 02 (PUSHST), and 03 (PUSHQT).

Other opcodes are assigned to each native symbol, and range from 10 to 4f.

Each of the four opcodes for pushing data has an associated payload, which is used to provide additional information to the opcode. The payload is represented as a sequence of bytes following the opcode byte.

Opcodes for native symbols, instead, do not have any associated payload.

00 - LOOKUP

The 00 (LOOKUP) opcode is used to look up a user-defined symbol in the symbol table and push its associated value onto the stack. The 00 opcode is followed by two bytes representing the index of the symbol in the symbol table, in little-endian format.

For example, the sequence 00 03 00 instructs the interpreter to perform a lookup in the symbol table and retrieve the 4th symbol (index 3).
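Reading the two-byte little-endian index can be sketched in Python as follows; the function name decode_lookup is hypothetical.

```python
def decode_lookup(code: bytes, offset: int = 0):
    """Return (symbol table index, offset of the next opcode)."""
    if code[offset] != 0x00:
        raise ValueError("not a LOOKUP opcode")
    # The least significant byte of the index comes first.
    index = int.from_bytes(code[offset + 1:offset + 3], "little")
    return index, offset + 3
```

For the sequence 00 03 00 this yields index 3, while 00 00 01 would refer to index 256.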

01 - PUSHIN

The 01 (PUSHIN) opcode is used to push an integer value onto the stack. The 01 opcode is followed by:

For example, the sequence 01 04 fe ff ff ff represents the integer -2 (0xfffffffe), and the sequence 01 01 10 represents the integer 16 (0x10).
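The two example sequences can be decoded with the Python sketch below. It rests on assumptions inferred from those examples only: the byte after the opcode is taken to be a single-byte payload length, the payload is read as little-endian, and only a full 4-byte payload is treated as a signed two's complement value. The actual format may encode the length differently.

```python
def decode_pushin(code: bytes, offset: int = 0):
    """Return (integer value, offset of the next opcode)."""
    if code[offset] != 0x01:
        raise ValueError("not a PUSHIN opcode")
    length = code[offset + 1]  # ASSUMPTION: length fits in a single byte
    start = offset + 2
    raw = int.from_bytes(code[start:start + length], "little")
    if length == 4 and raw >= 0x8000_0000:
        raw -= 1 << 32         # interpret as 32-bit two's complement
    return raw, start + length
```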

02 - PUSHST

The 02 (PUSHST) opcode is used to push a string value onto the stack. The 02 opcode is followed by:

The following sequence:

02 16 54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 73 74 72 69 6e 67 21

represents the string "This is a test string!"
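A matching Python sketch for string payloads follows. As with integers, it assumes a single-byte length prefix, which is consistent with the example (0x16 is 22, the number of characters in the string) but may not cover longer strings; decoding the payload as UTF-8 is also an assumption.

```python
def decode_pushst(code: bytes, offset: int = 0):
    """Return (string value, offset of the next opcode)."""
    if code[offset] != 0x02:
        raise ValueError("not a PUSHST opcode")
    length = code[offset + 1]  # ASSUMPTION: length fits in a single byte
    start = offset + 2
    text = code[start:start + length].decode("utf-8")  # ASSUMPTION: UTF-8 payload
    return text, start + length
```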

03 - PUSHQT

The 03 (PUSHQT) opcode is used to push a quotation value onto the stack. The 03 opcode is followed by:

The following sequence:

03 05 02 04 74 65 73 74 01 01 01 36 3b 45

represents the quotation ("test" 0x1 dec cat puts)

Full Bytecode Example

Consider the following hex program:

(0x1 0x2 0x3 0x4)
    (
        "_n" :
        (_n 0x2 % 0x0 ==)
          (_n dec " is divisible by two." cat puts)
        when
    )
each

This gets compiled to the following bytecode:

01 68 65 78 01 01 00 02 
02 5f 6e 03 04 01 01 01 
01 01 02 01 01 03 01 01 
04 03 05 02 02 5f 6e 10 
03 05 00 00 00 01 01 02 
23 01 01 00 2a 03 05 00 
00 00 36 02 15 20 69 73 
20 64 69 76 69 73 69 62 
6c 65 20 62 79 20 74 77 
6f 2e 3b 45 13 42

And here is an annotated breakdown:

; Header with symbol table of size 1
01 68 65 78 01 01 00 02
; Symbol table with one symbol: _n
02 5f 6e
; Push quotation of four items
03 04 
   ; Push integer 1
   01 01 01 
   ; Push integer 2
   01 01 02 
   ; Push integer 3
   01 01 03 
   ; Push integer 4
   01 01 04 
; Push quotation of five items
03 05 
   ; Push string "_n"
   02 02 5f 6e 
   10 ; Symbol :
   ; Push quotation of five items
   03 05
      ; Lookup first symbol (_n)
      00 00 00
      ; Push integer 2
      01 01 02
      23 ; Symbol %
      ; Push integer 0
      01 01 00
      2a ; Symbol ==
      ; Push quotation of five items
      03 05 
         ; Lookup first symbol (_n) 
         00 00 00 
         36 ; Symbol dec
         ; Push string " is divisible by two."
         02 15 20 69 73 20 64 69 76 69 73 
         69 62 6c 65 20 62 79 20 74 77 6f 2e 
         3b ; Symbol cat
         45 ; Symbol puts
   13 ; Symbol when
42 ; Symbol each
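The breakdown above can be reproduced mechanically with a short recursive decoder, sketched here in Python. It handles only the program portion (the bytes after the header and the symbol table), assumes single-byte lengths and item counts as in the examples, and names only the native opcodes that appear in this document; every function and variable name is hypothetical.

```python
# Native opcodes confirmed by the annotated example; the full table has 64 entries.
NATIVE = {0x10: ":", 0x13: "when", 0x23: "%", 0x2A: "==",
          0x36: "dec", 0x3B: "cat", 0x42: "each", 0x45: "puts"}


def decode_item(code, pos, symbols):
    """Decode one item starting at pos; return (item, next position)."""
    op = code[pos]
    if op == 0x00:  # LOOKUP: two-byte little-endian symbol table index
        index = int.from_bytes(code[pos + 1:pos + 3], "little")
        return symbols[index], pos + 3
    if op == 0x01:  # PUSHIN: length byte, then little-endian integer bytes
        n = code[pos + 1]
        raw = int.from_bytes(code[pos + 2:pos + 2 + n], "little")
        if n == 4 and raw >= 0x8000_0000:
            raw -= 1 << 32
        return raw, pos + 2 + n
    if op == 0x02:  # PUSHST: length byte, then the characters
        n = code[pos + 1]
        return '"' + code[pos + 2:pos + 2 + n].decode("utf-8") + '"', pos + 2 + n
    if op == 0x03:  # PUSHQT: item count, then that many items
        count = code[pos + 1]
        pos += 2
        items = []
        for _ in range(count):
            item, pos = decode_item(code, pos, symbols)
            items.append(item)
        return items, pos
    if 0x10 <= op <= 0x4F:  # native symbol, no payload
        return NATIVE.get(op, f"<native {op:#04x}>"), pos + 1
    raise ValueError(f"unknown opcode {op:#04x}")


def decode_program(code, symbols):
    """Decode a whole program into a list of items."""
    pos, items = 0, []
    while pos < len(code):
        item, pos = decode_item(code, pos, symbols)
        items.append(item)
    return items


quotation = bytes.fromhex("03 05 02 04 74 65 73 74 01 01 01 36 3b 45")
decoded = decode_program(quotation, symbols=[])
```

A nested quotation counts as one item of its parent, which is why the outer quotation in the example declares five items even though it spans most of the file.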

Native Symbol Reference

hex provides a set of 64 (0x40) native symbols that are built-in and pre-defined in the registry. The following section provides details on each of these symbols, including a signature illustrating how each symbol manipulates the stack.

The notation used to specify the signature of a symbol is as follows:

    in1 in2 ... inN → out1 out2 ... outM

Where in1, in2, ..., inN are the items consumed from the stack, and out1, out2, ..., outM are the items pushed back onto the stack.

Note that the character represents the symbol being described, and:

The following abbreviations are used to represent different types of literals (and each can have a numerical suffix for differentiation within the signature):

Additionally, * is used to represent zero or more literals of any type.

Consider, for example, the following signature for the swap symbol:

a1 a2 → a2 a1

This signature indicates that the symbol swap pops two items from the stack (a1 and a2), and then pushes them back onto the stack in reverse order (a2 and a1).
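To make the reading order of a signature concrete, here is a Python sketch in which the stack is a list whose last element is the top. The rightmost input of a signature is the item on top of the stack, so it is popped first. The function names are hypothetical.

```python
def swap(stack):
    """a1 a2 → a2 a1"""
    a2 = stack.pop()   # the rightmost input is on top of the stack
    a1 = stack.pop()
    stack.append(a2)
    stack.append(a1)   # a1 is now on top


def dup(stack):
    """a → a a"""
    a = stack.pop()
    stack.append(a)
    stack.append(a)


stack = [1, 2]
swap(stack)
```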

Memory Management Symbols

: Symbol

a s →

Stores the literal a in the registry as the symbol s.

# Symbol

s →

Frees the symbol s from the registry.

Control Flow Symbols

if Symbol

q1 q2 q3 → *

Dequotes quotation q1. If q1 pushes a positive integer on the stack, dequotes q2; otherwise, dequotes q3.

when Symbol

q1 q2 → *

Dequotes quotation q1. If q1 pushes a positive integer on the stack, dequotes q2.

while Symbol

q1 q2 → *

Dequotes quotation q1. If q1 pushes a positive integer on the stack, dequotes q2 and repeats the process.

error Symbol

→ s

Pushes the last error message to the stack.

try Symbol

q1 q2 → *

Dequotes quotation q1. If q1 throws an error, dequotes q2.
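The relationship between try and error can be sketched in Python, with quotations modeled as functions and a dictionary holding the last error message. This is a loose model: HexError, hex_try, hex_error, and the state dictionary are hypothetical, and the sketch does not address what happens to items left on the stack by a failing quotation, which this section does not specify.

```python
class HexError(Exception):
    """Stands in for a hex runtime error (hypothetical name)."""


def hex_try(stack, state, attempt, handler):
    """Models: q1 q2 try"""
    try:
        attempt(stack, state)
    except HexError as exc:
        state["last_error"] = str(exc)   # remember the message for error
        handler(stack, state)


def hex_error(stack, state):
    """Models: error (pushes the last error message)"""
    stack.append(state.get("last_error", ""))


def divide_by_zero(stack, state):
    raise HexError("Division by zero")


stack, state = [], {}
hex_try(stack, state, divide_by_zero, hex_error)
```

Here the handler is the error symbol itself, so the stack ends up holding the message produced by the failed quotation.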

Stack Management Symbols

dup Symbol

a → a a

Duplicates literal a and pushes it on the stack.

stack Symbol

→ q

Pushes the items currently on the stack as a quotation on the stack.

clear Symbol

Clears the stack.

pop Symbol

a →

Removes the top item from the stack.

swap Symbol

a1 a2 → a2 a1

Swaps the top two items on the stack.

Evaluation Symbols

. Symbol

q → *

Dequotes quotation q.

! Symbol

(s|q) → *

Evaluates the string s as a hex program, or the quotation q of integers as hex bytecode (HBX format).

' Symbol

a → q

Pushes the literal a wrapped in a quotation on the stack.

Arithmetic Symbols

+ Symbol

i1 i2 → i

Pushes the result of the sum of i1 and i2 on the stack.

- Symbol

i1 i2 → i

Pushes the result of the subtraction of i2 from i1 on the stack.

* Symbol

i1 i2 → i

Pushes the result of the multiplication of i1 and i2 on the stack.

/ Symbol

i1 i2 → i

Pushes the result of the division of i1 by i2 on the stack.

% Symbol

i1 i2 → i

Pushes the result of the modulo of i1 by i2 on the stack.

Bitwise Operations Symbols

& Symbol

i1 i2 → i

Pushes the result of a bitwise and of i1 and i2 on the stack.

| Symbol

i1 i2 → i

Pushes the result of a bitwise or of i1 and i2 on the stack.

^ Symbol

i1 i2 → i

Pushes the result of a bitwise xor of i1 and i2 on the stack.

~ Symbol

i → i

Pushes the result of a bitwise not of i on the stack.

<< Symbol

i1 i2 → i

Pushes the result of shifting i1 by i2 bits to the left.

>> Symbol

i1 i2 → i

Pushes the result of shifting i1 by i2 bits to the right.
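Since hex integers are 32-bit two's complement values, the results of &, |, ^, and ~ can be modeled in Python by masking to 32 bits and reinterpreting the top bit as the sign. The helper names are hypothetical. The shift symbols are left out on purpose, because this section does not state how bits shifted past either end, or the sign bit, are treated.

```python
MASK = 0xFFFF_FFFF


def to_signed(value: int) -> int:
    """Reduce an integer to 32 bits and read it as two's complement."""
    value &= MASK
    return value - (1 << 32) if value & 0x8000_0000 else value


def bit_and(i1, i2):
    return to_signed(i1 & i2)


def bit_or(i1, i2):
    return to_signed(i1 | i2)


def bit_xor(i1, i2):
    return to_signed(i1 ^ i2)


def bit_not(i):
    return to_signed(~i)
```

For instance, the bitwise not of 0x0 is 0xffffffff, which reads as -1 in two's complement.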

Comparisons Symbols

== Symbol

a1 a2 → i

Pushes 0x1 on the stack if a1 and a2 are equal, or 0x0 otherwise.

!= Symbol

a1 a2 → i

Pushes 0x1 on the stack if a1 and a2 are not equal, or 0x0 otherwise.

> Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 is greater than i2, or 0x0 otherwise.

< Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 is less than i2, or 0x0 otherwise.

>= Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 is greater than or equal to i2, or 0x0 otherwise.

<= Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 is less than or equal to i2, or 0x0 otherwise.

Boolean Logic Symbols

and Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 and i2 are non-zero integers, or 0x0 otherwise.

or Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 or i2 are non-zero integers, or 0x0 otherwise.

not Symbol

i → i

Pushes 0x1 on the stack if i is zero, or 0x0 otherwise.

xor Symbol

i1 i2 → i

Pushes 0x1 on the stack if i1 and i2 are different, or 0x0 otherwise.

Type Checking and Conversion Symbols

int Symbol

s → i

Converts the string s representing a hexadecimal integer to an integer value and pushes it on the stack.

str Symbol

i → s

Converts the integer i to a string representing a hexadecimal integer and pushes it on the stack.

dec Symbol

i → s

Converts the integer i to a string representing a decimal integer and pushes it on the stack.

hex Symbol

s → i

Converts the string s representing a decimal integer to an integer value and pushes it on the stack.

ord Symbol

s → i

Pushes the ASCII value of the string s on the stack.

If s is longer than 1 character or if it is not representable using an ASCII code between 0x0 and 0x7f, 0xffffffff is pushed on the stack.

chr Symbol

i → s

Pushes the ASCII character represented by the integer i on the stack.

If i is not between 0x0 and 0x7f, an empty string is pushed on the stack.
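The edge cases of ord and chr can be summarized in a Python sketch. The function names are hypothetical; the failure value 0xffffffff is returned as -1, its signed 32-bit reading, and treating the empty string like any other invalid input is an assumption.

```python
def hex_ord(s: str) -> int:
    """Model of ord: ASCII code of a one-character string, or -1 (0xffffffff)."""
    if len(s) != 1 or ord(s) > 0x7F:
        return -1  # 0xffffffff in 32-bit two's complement
    return ord(s)


def hex_chr(i: int) -> str:
    """Model of chr: the ASCII character for i, or an empty string."""
    return chr(i) if 0x0 <= i <= 0x7F else ""
```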

type Symbol

a → s

Pushes the type of the literal a on the stack (integer, string, quotation, native-symbol, user-symbol, invalid, or unknown).

List (Strings and Quotations) Symbols

cat Symbol

(s1 s2|q1 q2) → (s|q)

Pushes the result of the concatenation of two strings or two quotations on the stack.

len Symbol

(s|q) → i

Pushes the length of a string or a quotation on the stack.

get Symbol

(s|q) i → a

Pushes the ith item of a string or a quotation on the stack.

index Symbol

(s a|q a) → i

Pushes the index of the first occurrence of the literal a in a string or a quotation on the stack. If a is not found, 0xffffffff is pushed on the stack.

join Symbol

q s1 → s2

Assuming that q is a quotation containing only strings, pushes the string s2 obtained by joining each element of q together using s1 as a delimiter.

String Symbols

split Symbol

s1 s2 → q

Pushes a quotation q containing the strings obtained by splitting s1 using s2 as a delimiter.

replace Symbol

s1 s2 s3 → s4

Pushes the string s4 obtained by replacing the first occurrence of s2 in s1 by s3.
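The behavior of split and replace, together with join from the previous section, corresponds closely to Python's own string methods, as the sketch below shows. The function names are hypothetical, and the argument order mirrors the signatures. How hex handles an empty delimiter is not specified here; Python raises ValueError for it in split.

```python
def hex_split(s1: str, s2: str) -> list:
    """s1 s2 → q: split s1 using s2 as a delimiter."""
    return s1.split(s2)


def hex_replace(s1: str, s2: str, s3: str) -> str:
    """s1 s2 s3 → s4: replace only the first occurrence of s2 in s1 by s3."""
    return s1.replace(s2, s3, 1)


def hex_join(q: list, s1: str) -> str:
    """q s1 → s2: join the strings in q using s1 as a delimiter."""
    return s1.join(q)
```

Splitting and joining with the same delimiter are inverse operations in this model.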

Quotation Symbols

each Symbol

q1 q2 → *

Dequotes quotation q2 and applies it to each item of quotation q1.

map Symbol

q1 q2 → q3

Dequotes quotation q2 and applies it to each item of quotation q1 to obtain a new quotation q3.

filter Symbol

q1 q2 → q

Dequotes quotation q2 and applies it to each item of quotation q1 to obtain a new quotation q containing only the items that returned a positive integer.
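The example in the Syntax section pushes the quotation of items first and the quotation to apply second. Following that order, map and filter can be modeled in Python as shown below. Quotations to apply are modeled as functions operating on a stack, and each item is evaluated on a private one-item stack for simplicity, which is an assumption rather than a description of the canonical implementation. All names are hypothetical.

```python
def hex_map(items, quotation):
    """Apply quotation to each item and collect the results."""
    result = []
    for item in items:
        stack = [item]
        quotation(stack)
        result.append(stack.pop())
    return result


def hex_filter(items, quotation):
    """Keep the items for which quotation leaves a positive integer."""
    kept = []
    for item in items:
        stack = [item]
        quotation(stack)
        if stack.pop() > 0:
            kept.append(item)
    return kept


def is_even(stack):
    # models: (0x2 % 0x0 ==)
    n = stack.pop()
    stack.append(1 if n % 2 == 0 else 0)


def square(stack):
    # models: (dup *)
    n = stack.pop()
    stack.append(n * n)


evens = hex_filter([0x2, 0x3, 0x4, 0x5, 0x6], is_even)
squares = hex_map([0x1, 0x2, 0x3], square)
```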

Input/Output Symbols

puts Symbol

a →

Prints a to standard output, followed by a new line.

warn Symbol

a →

Prints a to standard error, followed by a new line.

print Symbol

a →

Prints a to standard output.

gets Symbol

→ s

Reads a line from standard input and pushes it on the stack as a string.

File Symbols

read Symbol

s1 → (s2|q)

Reads the content of the file s1 and pushes it on the stack as a string, if the file is in textual format, or as a quotation of integers representing bytes, if the file is in binary format.

write Symbol

(s1|q) s2 →

Writes the string s1 or the array of integers representing bytes q to the file s2.

append Symbol

(s1|q) s2 →

Appends the string s1 or the array of integers representing bytes q to the file s2.

Shell Symbols

args Symbol

→ q

Pushes the command line arguments as a quotation on the stack.

exit Symbol

i →

Exits the program with the exit code i.

exec Symbol

s → i

Executes the string s as a shell command, and pushes the command return code on the stack.

run Symbol

s → q

Executes the string s as a shell command, capturing its output and errors. It pushes a quotation on the stack containing the following items: