Overview of the GP1 Programming Language

GP1 is a statically typed, multi-paradigm programming language with an emphasis on brevity and explicitness. It provides both value and reference types, as well as higher-order functions and first-class support for many common programming patterns.

This document serves as a quick, informal reference for developers of GP1 (or anyone who's curious).

Variables and Constants

A variable is defined with either the var or con keyword, for mutable and immutable bindings respectively, alongside the assignment operator, <-. An uninitialized variable MUST have an explicit type, and cannot be accessed until it is assigned. A variable that is initialized in its declaration may have an explicit type; if the type is omitted, it is inferred when possible. Normal type-coercion rules apply in assignments, as described in the Coercion and Casting section.

Non-ASCII Unicode characters are allowed in variable names as long as the character doesn't cause a parsing issue. For example, whitespace characters are not allowed in variable names.

Some examples of assigning variables:

var x: i32;  // x is an uninitialized 32-bit signed integer
var y <- x;  // this won't work, because x has no value
x <- 7;
var y <- x;  // this time it works, because x is now 7

con a: f64 <- 99.8;  // a is immutable
a <- 44.12;          // this doesn't work, because con variables cannot be reassigned

The following lines are equivalent,

con a <- f64(7.2);
con a: f64 <- 7.2;
con a <- 7.2;        // 7.2 is implicitly of type f64
con a <- 7.2D;       // With an explicit type suffix

as are these.

var c: f32 <- 9;
var c <- f32(9);
var c: f32 <- f32(9);
var c <- 9F;

Variable assignments are expressions in GP1, which enables some useful code patterns. For example, multiple assignments can be made on one line: con a <- var b <- "death and taxes" assigns the string "death and taxes" to both a and b, leaving you with one constant and one variable containing separate instances of identical data. This is equivalent to writing con a <- "death and taxes" and var b <- "death and taxes" each on their own line. Assignment as an expression also eliminates much of the need to define variables immediately before the control structure in which they're used, which improves readability.

Intrinsic Types

Numeric Types

u8 u16 u32 u64 u128 u256 usize byte

i8 i16 i32 i64 i128 i256 isize

f16 f32 f64 f128 f256

GP1 has signed integer, unsigned integer, and floating point numeric types. Numeric types take the form of a single-letter indicator followed by the type's size in bits. The indicators are i (signed integer), u (unsigned integer), and f (floating point). usize and isize are pointer-width types. For example, on a 64-bit system, usize is a 64-bit unsigned integer. However, it must be cast to u64 when assigning to a u64 variable. The type byte is an alias for u8. Numeric operators are as one expects from C, with the addition of ** as a power operator.

Numeric literals have an implicit type, or the type can be specified by a case-insensitive suffix. For example:

var i1 <- 1234;    // implicitly i32
var f1 <- 1234.5;  // implicitly f64

var i3 <- 1234L;   // i64
var u3 <- 1234ui;  // u32
var f2 <- 1234.6F; // f32

The complete set of suffixes is given below.

suffix	corresponding type
s	i16
i	i32
l	i64
p	isize
b	byte
us	u16
ui	u32
ul	u64
up	usize
f	f32
d	f64
q	f128
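
The suffix rules above can be modelled as a small lookup. The following is a hypothetical Python sketch, not taken from any GP1 implementation: the name literal_type is made up for illustration, and it assumes plain decimal literals only (no exponents or other notations).

```python
import string

# Suffix table copied from the document; keys are lowercase because
# suffixes are case-insensitive.
SUFFIX_TYPES = {
    "s": "i16", "i": "i32", "l": "i64", "p": "isize", "b": "byte",
    "us": "u16", "ui": "u32", "ul": "u64", "up": "usize",
    "f": "f32", "d": "f64", "q": "f128",
}


def literal_type(text: str) -> str:
    """Return the GP1 type name of a plain decimal numeric literal."""
    digits = text.rstrip(string.ascii_letters)  # drop any trailing suffix
    suffix = text[len(digits):].lower()
    if suffix:
        return SUFFIX_TYPES[suffix]  # raises KeyError for an unknown suffix
    # No suffix: fall back to the implicit types described above.
    return "f64" if "." in digits else "i32"
```

With this model, literal_type("1234ui") gives "u32" and literal_type("1234.5") gives "f64", matching the examples above.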

Booleans

bool is the standard boolean type with support for all the usual operations. The boolean literals are true and false. Bool operators are as one expects from C, with the exception that NOT is !! instead of !.

Bitwise Operators

Bitwise operators can be applied only to integers and booleans. They are single counterparts of the doubled boolean operators, e.g. boolean negation is !!, so bitwise negation is !.

Strings and Characters

char is a Unicode character of variable size. Char literals are single-quoted, e.g. 'c'. Any single valid char value can be used as a literal in this fashion.

string is a Unicode string. String literals are double-quoted, e.g. "Hello, World.".

Arrays

GP1 supports typical array operations.

var tuples : (i32 i32)[];  // declare array of tuples
var strings : string[];    // declare array of strings

var array <- i32[n];       // declare and allocate array of n elements
                           // n is any number that can be coerced to usize

con nums <- {1, 2, 3};     // immutable array of i32

Use the length property to access the number of elements in an allocated array. Attempting to access the length of an unallocated array raises an exception.


var colors <- {"Red", "White", "Blue"};  // allocate array

var count <- colors.length; // count is usize(3)

Arrays can be indexed with any integer type (signed or unsigned). Negative values wrap from the end (-1 is the last element). An exception occurs if the index is out of range in either direction, i.e. no modulo operation is performed.

var w <- {1, 2, 3, 4, 5, 6, 7};

w[0]  // first element, 1
w[-1] // last element, 7

var x <- isize(-5);
w[x]  // 5th to last element, 3
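
The indexing rule can be written out explicitly. This is a minimal Python sketch of the rule as described above, not GP1's actual implementation; resolve_index is a hypothetical helper name.

```python
def resolve_index(index: int, length: int) -> int:
    """Map a GP1-style index to a zero-based offset, or raise IndexError."""
    # Negative indices count back from the end exactly once; there is no modulo.
    offset = index + length if index < 0 else index
    if not 0 <= offset < length:
        raise IndexError(f"index {index} out of range for length {length}")
    return offset


w = [1, 2, 3, 4, 5, 6, 7]
first = w[resolve_index(0, len(w))]             # 1
last = w[resolve_index(-1, len(w))]             # 7
fifth_from_end = w[resolve_index(-5, len(w))]   # 3
```

Under this reading, both 7 and -8 are out of range for a seven-element array.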

Tuples

Tuples group multiple values into a single value with anonymous, ordered fields. () is an empty tuple. ("hello", i32(17)) is a tuple of type (string i32). Tuple fields are named like indices, i.e. (u128(4), "2").1 would be "2".

The unit type, represented as a 0-tuple, is written ().

Regex

regex is a regular expression. GP1 regex format is identical to that of .NET 5 and very similar to that of gawk.

Named Functions

Some examples of defining named functions:

fn sum(a: f32, b: f32): f32 { a + b }        // takes parameters and returns an f32

fn twice_println(s: string) {                // takes parameters and implicitly returns ()
    println("${s}\n${s}");
}

fn join_println(a: string, b: string): () {  // takes parameters and explicitly returns ()
    println("${a} ${b}");
}

fn seven(): u32 { 7 }                        // takes no parameters and returns the u32 value of 7

A function can be called in several ways, because the caller may assign zero or more of the function's parameters by name. Parameters assigned by name may appear in any order, while each argument passed positionally binds to the first still-unassigned parameter, reading left to right in the function definition. For the join_println function defined above, this means that all of the following are valid and behave identically.

join_println(a <- "Hello,", b <- "World.");
join_println(b <- "World.", a <- "Hello,");
join_println(b <- "World.", "Hello,");
join_println("Hello,", "World.");
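
The binding rule can be modelled as follows. This is a hypothetical Python sketch: Named and bind_arguments are illustrative names, and it assumes that all by-name arguments are bound before any positional ones, which the description above does not spell out.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Named:
    """A by-name argument, i.e. `name <- value` at the call site."""
    name: str
    value: Any


def bind_arguments(params, args):
    """Return a dict mapping each parameter name to its argument value."""
    bound = {}
    # By-name arguments are freely ordered, so bind them first (assumption).
    for arg in args:
        if isinstance(arg, Named):
            if arg.name not in params or arg.name in bound:
                raise TypeError(f"bad named argument: {arg.name}")
            bound[arg.name] = arg.value
    # Each positional argument takes the first still-unassigned parameter.
    for arg in args:
        if not isinstance(arg, Named):
            free = [p for p in params if p not in bound]
            if not free:
                raise TypeError("too many arguments")
            bound[free[0]] = arg
    if len(bound) != len(params):
        raise TypeError("missing arguments")
    return bound
```

For example, bind_arguments(["a", "b"], [Named("b", "World."), "Hello,"]) binds a to "Hello," and b to "World.", the same result as the fully positional call.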

Function names may be overloaded. For example, join_println could be additionally defined as

fn join_println(a: string, b: string, sep: string) {    
    println("${a}${sep}${b}");
}

and then both join_println("Hello,", "World.", " ") and join_println("Hello,", "World.") would be valid calls.

Functions may be defined and called within other functions. You may be familiar with this pattern from functional languages like F#, wherein a wrapper function is often used to guard an inner recursive function (GP1 permits both single and mutual recursion in functions). For example:

fn factorial(n: u256): u256 {
    fn aux(n: u256, accumulator: u256): u256 {
        match n > 1 {
            true => aux(n - 1, accumulator * n),
            _ => accumulator,
        }
    }
    aux(n, 1)
}

Arguments are passed by value by default. For information on the syntax used in this example, refer to Control Flow.

Anonymous Functions

Closures

Closures behave as one would expect in GP1, exactly like they do in most other programming languages that feature them. Closures look like this:

var x: u32 <- 8;

var foo <- { y, z => x * y * z};     // foo is a closure; its type is fn<u32 u32 | u32>
assert(foo(3, 11) == (8 * 3 * 11));  // true

x <- 5;
assert(foo(3, 11) == (8 * 3 * 11));  // true

con bar <- { => x * x };    // bar is a closure of type `fn<u32>`

assert(bar() == 25);        // true because closure references already-defined x

They are surrounded by curly braces. Within the curly braces goes an optional, comma-separated parameter list, followed by a required => symbol, followed by an optional expression. If no expression is included, the closure implicitly returns ().

The match expression uses the same => symbol because the when section of a match arm is an implicit closure. The => symbol was chosen for closures for two reasons: arrows are conventional for expressing anonymous functions, and the two lines of an equals sign enclose the space between them.

Lambdas

Lambdas are nearly identical to closures, but they don't close over their environment, and they use the -> symbol in place of =>. A few examples of lambdas:

con x: u32 <- 4;  // this line is totally irrelevant

con square <- { x -> x * x };                 // this is not valid, because the type of the function is not known
con square <- { x: u32 -> x * x };            // this is fine, because the type is specified in the lambda
con square: fn<u32 | u32> <- { x -> x * x };  // also fine, because the type is specified in the declaration

Function Types

Functions are first-class citizens in GP1, so you can assign them to variables, pass them as arguments, etc. However, using the function definition syntax is suboptimal when using function types. Instead, there is a separate syntax for function types. Given the function fn sum(a: f64, b: f64): f64 { a + b } the function type is expressed fn<f64 f64 | f64>, meaning a function that accepts two f64 values and returns an f64. Therefore,

fn sum(a: f64, b: f64): f64 { a + b }

con sum: fn<f64 f64 | f64> <- { a, b -> a + b };

con sum <- { a: f64, b: f64 -> a + b };

are all equivalent ways of binding a function of type fn<f64 f64 | f64> to the constant sum. Here's an example of how to express a function type for a function argument.

fn apply_op(a: i32, b: i32, op: fn<i32 i32 | i32>): i32 {
    op(a, b)
}

Function Type Inference

The above example provides an explicit type for the argument op. You could safely rewrite this as

fn apply_op(a: i32, b: i32, op: fn): i32 {
    op(a, b)
}

because the compiler can safely infer the function type of op. Type inference only works to figure out the function signature, so fn apply_op(a:i32, b:i32, op):i32 { . . . } is not allowed.

Coercion and Casting

Refer to Variables and Constants for information on the syntax used in this section.

Numeric types are automatically coerced into other numeric types as long as that coercion is not lossy. For example,

var x: i32 <- 10;
var y: i64 <- x;

is perfectly legal (the 32-bit value fits nicely in the 64-bit variable). However, automatic coercion doesn't work if it would be lossy, so

var x: i64 <- 10;
var y: i32 <- x;

doesn't work. This holds for numeric literals as well. Unsurprisingly, var x: i32 <- 3.14 wouldn't compile. The floating point value can't be automatically coerced to an integer type. So what does work? Casting via the target type's pseudo-constructor works.

con x: f64 <- 1234.5;        // okay because the literal can represent any floating point type
con y: f64 <- f16(1234.5);   // also okay, because any f16 can be losslessly coerced to an f64
con z: i32 <- i32(x);        // also okay; uses the i32 pseudo-constructor to 'cast' x to a 32-bit integer

assert(z == 1234);

con a: f64 <- 4 * 10 ** 38;  // this value is greater than the greatest f32
con b: f32 <- f32(a);        // the value of b is the maximum value of f32

This approach is valid for all intrinsic types. For example, var flag: bool <- bool(0) sets flag to false and var txt: string <- string(83.2) sets txt to the string value "83.2". Such behavior can be implemented by a programmer on their own types via a system we'll discuss in the Interfaces section.
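
For the fixed-width integer types, one way to read the "not lossy" rule for automatic coercion is that every value of the source type must be representable in the target type. The Python sketch below models that reading. It is an interpretation, not GP1's specification: int_bounds and coerces_implicitly are hypothetical names, the treatment of mixed signed and unsigned types is an assumption, and usize and isize are left out because they require an explicit cast.

```python
import re


def int_bounds(type_name: str) -> tuple[int, int]:
    """Return (min, max) for a fixed-width integer type such as i32 or u64."""
    match = re.fullmatch(r"([iu])(\d+)", type_name)
    if match is None:
        raise ValueError(f"not a fixed-width integer type: {type_name}")
    signed, bits = match.group(1) == "i", int(match.group(2))
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def coerces_implicitly(source: str, target: str) -> bool:
    """True when every value of `source` is representable in `target`."""
    src_lo, src_hi = int_bounds(source)
    dst_lo, dst_hi = int_bounds(target)
    return dst_lo <= src_lo and src_hi <= dst_hi
```

Under this model, coerces_implicitly("i32", "i64") is True and coerces_implicitly("i64", "i32") is False, matching the two examples at the start of this section.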

Program Structure

Every GP1 program has an entry-point function. Within that function, statements are executed from top to bottom and left to right. The entry-point function can be declared with the entry keyword in place of fn and returns an integer, which is provided to the host operating system as an exit code. Naturally, this means that the handling of that code is platform-dependent once it passes the program boundary, so keep in mind that a system may implicitly downcast or otherwise modify it before it is made available to the user. If no exit code is specified, or if the return type of the function is not an integer, GP1 assumes an exit code of usize(0) and returns that to the operating system.

The following program prints Hello, World. and exits with an error code.

entry main(): usize {
    hello_world();
    1
}

fn hello_world() {
    println("Hello, World.");
}

The entry function may have any name; it's the entry keyword that makes it the entry point. The entry function may also be implicit. If one is not defined explicitly, the entire file is treated as being inside an entry function. Therefore,

println("Hello, World.");

is a valid and complete program identical to

entry main(): usize {
    println("Hello, World.");
}

This behavior can lend GP1 a very flexible feeling akin to many scripting languages.

In a program where there is an entry-point specified, only expressions made within that function will be evaluated. This means that the following program does NOT print anything to the console.

entry main(): usize {
    con x: usize <- 7;
}

println("This text will not be printed.");

In fact, this program is invalid. Whenever there is an explicit entry point, no statements may be made in the global scope.

Control Flow

Conditionals

At this time, GP1 has only one non-looping conditional control structure, in two variants: match and match all. The syntax is as follows, where expr and arm_expr* are expressions and pattern* are patterns (refer to Pattern Matching for more info).

match expr {
    pattern1 => arm_expr1,
    pattern2 => arm_expr2,
    _ => arm_expr3,
}

The match expression executes the first arm whose pattern matches the value of expr. The match all expression executes every arm whose pattern matches. Both flavors return the value of the last arm expression they execute.

The when keyword may be used in a given match arm to further restrict the conditions of execution, e.g.

con fs <- 43;

con is_even <- match fs {
    n when n % 2 == 0 => " is ",
    _ => " is not "
};

print(string(fs) + is_even + "even.");
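
The difference between match and match all can be modelled with arms represented as (pattern, expression) pairs. This is a rough Python sketch with hypothetical names; what GP1 does when no arm matches is not stated above, so raising an error in match_first and returning None from match_all are assumptions.

```python
def match_first(value, arms):
    """Run the first arm whose pattern accepts `value` and return its result."""
    for pattern, arm in arms:
        if pattern(value):
            return arm(value)
    raise ValueError("no arm matched")  # assumption: unspecified in the text


def match_all(value, arms):
    """Run every arm whose pattern accepts `value`; return the last result."""
    result = None
    for pattern, arm in arms:
        if pattern(value):
            result = arm(value)
    return result


arms = [
    (lambda n: n % 2 == 0, lambda n: "even"),   # like `n when n % 2 == 0`
    (lambda n: n > 10, lambda n: "big"),        # like `n when n > 10`
    (lambda n: True, lambda n: "anything"),     # plays the role of `_`
]
```

For the value 12, match_first(12, arms) stops at the first arm and returns "even", while match_all(12, arms) runs all three arms and returns "anything", the result of the last one executed.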

Loops

Several looping structures are supported in GP1:

loop
for
while
do/while

along with continue and break to help control program flow. All of these are statements.

loop { . . . }  // an unconditional loop -- runs forever or until broken

for i in some_iterable { . . . }  // loop over anything that is iterable

while some_bool { . . . }  // classic conditional loop that executes until the predicate is false

do { . . .
} while some_bool  // traditional do/while loop that ensures body executes at least once

Pattern Matching

Pattern matching behaves essentially as it does in SML, with support for various sorts of destructuring. It works in normal assignment and in match arms. It will eventually work in function parameter assignment, but perhaps not at first.

For now, some examples.

a <- ("hello", "world");  // a is a tuple of strings
(b, c) <- a;

assert(b == "hello" && c == "world");

fn u32_list_to_string(l: List<u32>): string {  // this is assuming that square brackets are used for linked lists
    con elements <- match l {
        [] => "",
        [e] => string(e),
        h::t => string(h) + ", " + u32_list_to_string(t),  // the bit before the arrow in each arm is a pattern
    };                                                     // h::t matches the head and tail of the list to h and t, respectively
    "[" + elements + "]"                                   // [e] matches any single-element list
}                                                          // [] matches any empty list

Interfaces

Interfaces are in Version 2 on the roadmap.

User-Defined Types

Enums

Enums are pretty powerful in GP1. They can be the typical enumerated type you'd expect, like

enum Coin { penny, nickle, dime, quarter }  // 'vanilla' enum

var a <- Coin.nickle;
assert(a == Coin.nickle);

Or an enum can have an implicit field named value

enum Coin: u16 { penny(1), nickle(5), dime(10), quarter(25) }

var a <- Coin.nickle;
assert(a == Coin.nickle);
assert(a.value == 5);

Or an enum can be complex with a user-defined set of fields, like

enum CarModel(make: string, mass: f32, wheelbase: f32) {  // enum with multiple fields
   gt          ( "ford",  1581, 2.71018 ),
   c8_corvette ( "chevy", 1527, 2.72288 )
}

A field can also have a function type. For example

enum CarModel(make: string, mass: f32, wheelbase: f32, gasUsage: fn<f32 | f32>) {
   gt          ( "ford",  1581, 2.71018, { miles_traveled -> miles_traveled / 14 } ),
   c8_corvette ( "chevy", 1527, 2.72288, { miles_traveled -> miles_traveled / 19 } )
}

var my_car <- CarModel.c8_corvette;
var gas_used <- my_car.gasUsage(200);  // estimate how much gas I'd use on a 200 mile trip

Equivalence of enums is not influenced by case values, e.g.

enum OneOrAnother: u16 { one(0), another(0) }

con a <- OneOrAnother.one;
con b <- OneOrAnother.another;

assert(a != b);
assert(a.value == b.value);

It's important to remember that enums are 100% always totally in every conceivable fashion immutable. To make this easier to enforce, only value types are allowed for enum fields.

Records

Records are record types, defined with the record keyword. Fields are defined in the record block and behavior is defined in the optional impl block.

For example,

record Something {
    label: i32    // field label followed by some type
} impl { . . . }  // associated functions. This is different than having functions in the fields section because impl functions are not assignable.

If the record implements some interface, SomeInterface, the impl would be replaced with impl SomeInterface, and the functions of SomeInterface would be defined alongside any other functions of the Something record.

Unions

Unions are the classic discriminated sum type.

union BinaryTree {
    Empty,
    Leaf: i32,
    Node: (BinaryTree BinaryTree),
}
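
For readers less familiar with sum types, the BinaryTree union above can be approximated in Python with one class per variant. This is only an illustrative sketch of how such a type is built and consumed. The class layout and the leaf_sum function are hypothetical and say nothing about how GP1 represents unions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Leaf:
    value: int


@dataclass(frozen=True)
class Node:
    left: "BinaryTree"
    right: "BinaryTree"


# A BinaryTree is exactly one of the three variants.
BinaryTree = Union[Empty, Leaf, Node]


def leaf_sum(tree: BinaryTree) -> int:
    """Add up every Leaf value by dispatching on the variant."""
    if isinstance(tree, Empty):
        return 0
    if isinstance(tree, Leaf):
        return tree.value
    return leaf_sum(tree.left) + leaf_sum(tree.right)


tree = Node(Node(Leaf(1), Empty()), Leaf(41))
total = leaf_sum(tree)  # 1 + 0 + 41
```

In GP1 the dispatch in leaf_sum would presumably be written as a match over the union's variants.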

Type Aliases

Refer to Generics for info on the syntax used in this section.

Type aliasing is provided with the type keyword, e.g.

type TokenStream Sequence<Token>
type Ast Tree<AbstractNode>

fn parse(ts: TokenStream): Ast { . . . }

Notice how much cleaner the function definition looks with the aliased types. This keyword is useful mainly for readability and domain modeling.

Generics

Generics are in Version 2 on the official GP1 roadmap. They roughly use C++ template syntax or Rust generic syntax.

References and Reference Types

GP1 has three operators involved in handling references, #, &, and @. These are immutable reference, mutable reference, and dereference, respectively. Some examples of referencing/dereferencing values:

var a <- "core dumped";
var b <- &a;                                       // b is a mutable reference to a
                                                 
assert(a == @b);                                  
assert(a != b);                                   

@b <- "missing ; at line 69, column 420";
assert(a == "missing ; at line 69, column 420");

b <- &"missing ; at line 420, column 69";
assert(a != "missing ; at line 420, column 69");

var c <- #b;                                       // c is an immutable reference to b
assert(@c == b);
assert(@@c == a);

@c <- &"kablooey";                                 // this does not work. `c` is an immutable reference and cannot be used to assign its referent.

Naturally, only var values can be mutated through references.

The reference operators may be prepended to any type, T, to describe the type of a reference to a value of type T, e.g.

fn set_through(ref: &string) {  // this function takes a mutable reference to a string and returns `()`
    @ref <- "goodbye";
}

var a <- "hello";
set_through(&a);

assert(a == "goodbye");
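
The behavior of & and # can be approximated with a small model in which a variable is a storage cell and a reference is a handle to that cell. This Python sketch is hypothetical throughout (Cell, MutRef, and ConstRef are made-up names), and it reports a forbidden write at run time, whereas GP1 may well reject it earlier.

```python
class Cell:
    """A mutable storage slot, standing in for a GP1 `var`."""

    def __init__(self, value):
        self.value = value


class MutRef:
    """Model of `&x`: can read and assign the referent."""

    def __init__(self, cell):
        self._cell = cell

    def get(self):           # models `@ref`
        return self._cell.value

    def set(self, value):    # models `@ref <- value`
        self._cell.value = value


class ConstRef:
    """Model of `#x`: can read the referent but not assign it."""

    def __init__(self, cell):
        self._cell = cell

    def get(self):
        return self._cell.value

    def set(self, value):
        raise TypeError("cannot assign through an immutable reference")


def set_through(ref):
    """Counterpart of the GP1 set_through function above."""
    ref.set("goodbye")


a = Cell("hello")
set_through(MutRef(a))
```

After the call, a.value is "goodbye", mirroring the final assertion of the GP1 example, while set_through(ConstRef(a)) raises TypeError.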
