correctly handle unknown syntaxes when highlighting

Alexander 2025-06-16 19:27:42 -04:00
parent 18b0b14e25
commit 64acf09214
6 changed files with 56 additions and 60 deletions

View file

@ -17,7 +17,7 @@ section.</p>
the character doesn't cause a parsing issue. For example, whitespace
tokens are not allowed in variable names.</p>
<p>Some examples of assigning variables:</p>
<pre class="gp1"><code>var x: i32; // x is an uninitialized 32-bit signed integer
<pre class="language-gp1"><code>var x: i32; // x is an uninitialized 32-bit signed integer
var y &lt;- x; // this won&#39;t work, because x has no value
x &lt;- 7;
var y &lt;- x; // this time it works, because x is now 7
@ -25,12 +25,12 @@ var y &lt;- x; // this time it works, because x is now 7
con a: f64 &lt;- 99.8; // a is immutable
a &lt;- 44.12; // this doesn&#39;t work, because con variables cannot be reassigned</code></pre>
<p>The following lines are equivalent,</p>
<pre class="gp1"><code>con a &lt;- f64(7.2);
<pre class="language-gp1"><code>con a &lt;- f64(7.2);
con a: f64 &lt;- 7.2;
con a &lt;- 7.2; // 7.2 is implicitly of type f64
con a &lt;- 7.2D; // With an explicit type suffix</code></pre>
<p>as are these.</p>
<pre class="gp1"><code>var c: f32 &lt;- 9;
<pre class="language-gp1"><code>var c: f32 &lt;- 9;
var c &lt;- f32(9);
var c: f32 &lt;- f32(9);
var c &lt;- 9F;</code></pre>
@ -68,7 +68,7 @@ Numeric operators are as one expects from C, with the addition of
<code>**</code> as a power operator.</p>
<p>Numeric literals have an implicit type, or the type can be specified
by a case-insensitive suffix. For example:</p>
<pre class="gp1"><code>var i1 &lt;- 1234; // implicitly i32
<pre class="language-gp1"><code>var i1 &lt;- 1234; // implicitly i32
var f1 &lt;- 1234.5; // implicitly f64
var i3 &lt;- 1234L; // i64
@ -150,7 +150,7 @@ value can be used as a literal in this fasion.</p>
double-quoted, e.g. <code>"Hello, World."</code>.</p>
<h3 id="arrays">Arrays</h3>
<p>GP1 supports typical array operations.</p>
<pre class="gp1"><code>var tuples : (int, int)[]; // declare array of tuples
<pre class="language-gp1"><code>var tuples : (int, int)[]; // declare array of tuples
var strings : string[]; // declare array of strings
var array &lt;- i32[n]; // declare and allocate array of n elements
@ -161,7 +161,7 @@ con nums &lt;- {1, 2, 3}; // immutable array of i32
<p>Use the <code>length</code> property to access the number of elements
in an allocated array. Attempting to access <code>length</code> of an
unallocated array is an exception.</p>
<pre class="gp1"><code>
<pre class="language-gp1"><code>
var colors &lt;- {&quot;Red&quot;, &quot;White&quot;, &quot;Blue&quot;}; // allocate array
var count &lt;- colors.length; // count is usize(3)
@ -170,7 +170,7 @@ var count &lt;- colors.length; // count is usize(3)
Negative values wrap from the end (-1 is the last element). An exception
occurs if the value is too big, i.e.no modulo operation is
performed.</p>
<pre class="gp1"><code>var w &lt;- {1, 2, 3, 4, 5, 6, 7};
<pre class="language-gp1"><code>var w &lt;- {1, 2, 3, 4, 5, 6, 7};
w[0] // first element, 1
w[-1] // last element, 7
@ -191,7 +191,7 @@ i.e.<code>(u128(4), "2").1</code> would be <code>"2"</code>.</p>
identical to that of .NET 5 and very similar to that of gawk.</p>
<h2 id="named-functions">Named Functions</h2>
<p>Some examples of defining named functions:</p>
<pre class="gp1"><code>fn sum(a: f32, b: f32): f32 { a + b } // takes parameters and returns an f32
<pre class="language-gp1"><code>fn sum(a: f32, b: f32): f32 { a + b } // takes parameters and returns an f32
fn twice_println(s: string) { // takes parameters and implicitly returns ()
println(&quot;${s}\n${s}&quot;);
@ -210,13 +210,13 @@ ordered from left to right in the function definition that is
unassigned. With regard to the <code>join_println</code> function
defined above, this means that all of the following are valid and behave
identically.</p>
<pre class="gp1"><code>join_println(a &lt;- &quot;Hello,&quot;, b &lt;- &quot;World.&quot;);
<pre class="language-gp1"><code>join_println(a &lt;- &quot;Hello,&quot;, b &lt;- &quot;World.&quot;);
join_println(b &lt;- &quot;World.&quot;, a &lt;- &quot;Hello,&quot;);
join_println(b &lt;- &quot;World.&quot;, &quot;Hello,&quot;);
join_println(&quot;Hello,&quot;, &quot;World.&quot;);</code></pre>
<p>Function names may be overloaded. For example,
<code>join_println</code> could be additionally defined as</p>
<pre class="gp1"><code>fn join_println(a: string, b: string, sep: string) {
<pre class="language-gp1"><code>fn join_println(a: string, b: string, sep: string) {
println(&quot;${a}${sep}${b}&quot;);
}</code></pre>
<p>and then both <code>join_println("Hello,", "World.", " ")</code> and
@ -226,7 +226,7 @@ be familar with this pattern from functional languages like F#, wherein
a wrapper function is often used to guard an inner recursive function
(GP1 permits both single and mutual recursion in functions). For
example:</p>
<pre class="gp1"><code>fn factorial(n: u256): u256 {
<pre class="language-gp1"><code>fn factorial(n: u256): u256 {
fn aux(n: u256, accumulator: u256): u256 {
match n &gt; 1 {
true =&gt; aux(n - 1, accumulator * n),
@ -242,7 +242,7 @@ syntax used in this example, refer to <em>Control Flow</em>.</p>
<p>Closures behave as one would expect in GP1, exactly like they do in
most other programming languages that feature them. Closures look like
this:</p>
<pre class="gp1"><code>var x: u32 &lt;- 8;
<pre class="language-gp1"><code>var x: u32 &lt;- 8;
var foo &lt;- { y, z =&gt; x * y * z}; // foo is a closure; its type is fn&lt;u32 | u32&gt;
assert(foo(3, 11) == (8 * 3 * 11)); // true
@ -268,7 +268,7 @@ sign is enclosed by them.</p>
<p>Lambdas are nearly identical to closures, but they don't close over
their environment, and they use the <code>-&gt;</code> symbol in place
of <code>=&gt;</code>. A few examples of lambdas:</p>
<pre class="gp1"><code>con x: u32 &lt;- 4; // this line is totally irrelevant
<pre class="language-gp1"><code>con x: u32 &lt;- 4; // this line is totally irrelevant
con square &lt;- { x -&gt; x * x }; // this in not valid, because the type of the function is not known
con square &lt;- { x: u32 -&gt; x * x }; // this if fine, because the type is specified in the lambda
@ -281,20 +281,20 @@ there is a separate syntax for function types. Given the function
<code>fn sum(a: f64, b: f64): f64 { a + b }</code> the function type is
expressed <code>fn&lt;f64 f64 | f64&gt;</code>, meaning a function that
accepts two f64 values and returns an f64. Therefore,</p>
<pre class="gp1"><code>fn sum(a: f64, b: f64): f64 { a + b } </code></pre>
<pre class="gp1"><code>con sum: fn&lt;f64 f64 | f64&gt; &lt;- { a, b -&gt; a + b };</code></pre>
<pre class="gp1"><code>con sum &lt;- { a: f64, b: f64 -&gt; a + b };</code></pre>
<pre class="language-gp1"><code>fn sum(a: f64, b: f64): f64 { a + b } </code></pre>
<pre class="language-gp1"><code>con sum: fn&lt;f64 f64 | f64&gt; &lt;- { a, b -&gt; a + b };</code></pre>
<pre class="language-gp1"><code>con sum &lt;- { a: f64, b: f64 -&gt; a + b };</code></pre>
<p>are all equivalent ways of binding a function of type
<code>fn&lt;f64 f64 | f64&gt;</code> to the constant <code>sum</code>.
Here's an example of how to express a function type for a function
argument.</p>
<pre class="gp1"><code>fn apply_op(a: i32, b: i32, op: fn&lt;i32 i32 | i32&gt;): i32 {
<pre class="language-gp1"><code>fn apply_op(a: i32, b: i32, op: fn&lt;i32 i32 | i32&gt;): i32 {
op(a, b)
}</code></pre>
<h3 id="function-type-inference">Function Type Inference</h3>
<p>The above example provides an explicit type for the argument
<code>op</code>. You could safely rewrite this as</p>
<pre class="gp1"><code>fn apply_op(a: i32, b: i32, op: fn): i32 {
<pre class="language-gp1"><code>fn apply_op(a: i32, b: i32, op: fn): i32 {
op(a, b)
}</code></pre>
<p>because the compiler can safely infer the function type of
@ -306,19 +306,19 @@ is not allowed.</p>
syntax used in this section.</p>
<p>Numeric types are automatically coerced into other numeric types as
long as that coercion is not lossy. For example,</p>
<pre class="gp1"><code>var x: i32 &lt;- 10;
<pre class="language-gp1"><code>var x: i32 &lt;- 10;
var y: i64 &lt;- x;</code></pre>
<p>is perfectly legal (the 32-bit value fits nicely in the 64-bit
variable). However, automatic coercion doesn't work if it would be
lossy, so</p>
<pre class="gp1"><code>var x: i64 &lt;- 10;
<pre class="language-gp1"><code>var x: i64 &lt;- 10;
var y: i32 &lt;- x;</code></pre>
<p>doesn't work. This holds for numeric literals as well.
Unsurprisingly, <code>var x: i32 &lt;- 3.14</code> wouldn't compile. The
floating point value can't be automatically coerced to an integer type.
So what does work? Casting via the target type's pseudo-constructor
works.</p>
<pre class="gp1"><code>con x: f64 &lt;- 1234.5; // okay because the literal can represent any floating point type
<pre class="language-gp1"><code>con x: f64 &lt;- 1234.5; // okay because the literal can represent any floating point type
con y: f64 &lt;- f16(1234.5); // also okay, because any f16 can be losslessly coerced to an f64
con z: i32 &lt;- i32(x); // also okay; uses the i32 pseudo-constructor to &#39;cast&#39; x to a 32-bit integer
@ -346,7 +346,7 @@ type of the function is not an integer, GP1 assumes an exit code of
<code>usize(0)</code> and returns that to the operating system.</p>
<p>The following program prints Hello, World. and exits with an error
code.</p>
<pre class="gp1"><code>entry main(): usize {
<pre class="language-gp1"><code>entry main(): usize {
hello_world();
1
}
@ -358,9 +358,9 @@ fn hello_world() {
keyword that makes it the entry point. The entry function may also be
implicit. If one is not defined explicitly, the entire file is treated
as being inside an entry function. Therefore,</p>
<pre class="gp1"><code>println(&quot;Hello, World.&quot;);</code></pre>
<pre class="language-gp1"><code>println(&quot;Hello, World.&quot;);</code></pre>
<p>is a valid and complete program identical to</p>
<pre class="gp1"><code>entry main(): usize {
<pre class="language-gp1"><code>entry main(): usize {
println(&quot;Hello, World.&quot;);
}</code></pre>
<p>This behavior can lend GP1 a very flexible feeling akin to many
@ -368,7 +368,7 @@ scripting languages.</p>
<p>In a program where there is an entry-point specified, only
expressions made within that function will be evaluated. This means that
the following program does NOT print anything to the console.</p>
<pre class="gp1"><code>entry main(): usize {
<pre class="language-gp1"><code>entry main(): usize {
con x: usize &lt;- 7;
}
@ -383,7 +383,7 @@ structure, in two variants: <code>match</code> and
<code>*expr*</code> are expressions and <code>pattern*</code> are
pattern matching options (refer to <em>Pattern Matching</em> for more
info).</p>
<pre class="gp1"><code>match expr {
<pre class="language-gp1"><code>match expr {
pattern1 =&gt; arm_expr1,
pattern2 =&gt; arm_expr2,
_ =&gt; arm_expr3,
@ -394,7 +394,7 @@ expression executes all arms that match the pattern. Both flavors return
their last executed expression.</p>
<p>The <code>when</code> keyword may be used in a given match arm to
further restrict the conditions of execution, e.g.</p>
<pre class="gp1"><code>con fs &lt;- 43;
<pre class="language-gp1"><code>con fs &lt;- 43;
con is_even &lt;- match fs {
n when n % 2 == 0 =&gt; &quot; is &quot;
@ -412,10 +412,10 @@ print(fs + is_even + &quot;even.&quot;)</code></pre>
</ul>
<p>along with <code>continue</code> and <code>break</code> to help
control program flow. All of these are statements.</p>
<pre class="gp1"><code>loop { . . . } // an unconditional loop -- runs forever or until broken</code></pre>
<pre class="gp1"><code>for i in some_iterable { . . . } // loop over anything that is iterable</code></pre>
<pre class="gp1"><code>while some_bool { . . . } // classic conditional loop that executes until the predicate is false</code></pre>
<pre class="gp1"><code>do { . . .
<pre class="language-gp1"><code>loop { . . . } // an unconditional loop -- runs forever or until broken</code></pre>
<pre class="language-gp1"><code>for i in some_iterable { . . . } // loop over anything that is iterable</code></pre>
<pre class="language-gp1"><code>while some_bool { . . . } // classic conditional loop that executes until the predicate is false</code></pre>
<pre class="language-gp1"><code>do { . . .
} while some_bool // traditional do/while loop that ensures body executes at least once</code></pre>
<h2 id="pattern-matching">Pattern Matching</h2>
<p>Pattern matching behaves essentially as it does in SML, with support
@ -423,7 +423,7 @@ for various sorts of destructuring. It works in normal assignment and in
<code>match</code> arms. It will eventually work in function parameter
assignment, but perhaps not at first.</p>
<p>For now, some examples.</p>
<pre class="gp1"><code>a &lt;- (&quot;hello&quot;, &quot;world&quot;); // a is a tuple of strings
<pre class="language-gp1"><code>a &lt;- (&quot;hello&quot;, &quot;world&quot;); // a is a tuple of strings
(b, c) &lt;- a;
assert(b == &quot;hello&quot; &amp;&amp; c == &quot;world&quot;)
@ -442,24 +442,24 @@ fn u32_list_to_string(l: List&lt;u32&gt;): string { // this is assuming that sq
<h3 id="enums">Enums</h3>
<p>Enums are pretty powerful in GP1. They can be the typical enumerated
type you'd expect, like</p>
<pre class="gp1"><code>enum Coin { penny, nickle, dime, quarter } // &#39;vanilla&#39; enum
<pre class="language-gp1"><code>enum Coin { penny, nickle, dime, quarter } // &#39;vanilla&#39; enum
var a &lt;- Coin.nickle
assert a == Coin.nickle
</code></pre>
<p>Or an enum can have an implicit field named <code>value</code></p>
<pre class="gp1"><code>enum Coin: u16 { penny(1), nickle(5), dime(10), quarter(25) }
<pre class="language-gp1"><code>enum Coin: u16 { penny(1), nickle(5), dime(10), quarter(25) }
var a &lt;- Coin.nickle;
assert(a == Coin.nickle);
assert(a.value == 5);</code></pre>
<p>Or an enum can be complex with a user-defined set of fields, like</p>
<pre class="gp1"><code>enum CarModel(make: string, mass: f32, wheelbase: f32) { // enum with multiple fields
<pre class="language-gp1"><code>enum CarModel(make: string, mass: f32, wheelbase: f32) { // enum with multiple fields
gt ( &quot;ford&quot;, 1581, 2.71018 ),
c8_corvette ( &quot;chevy&quot;, 1527, 2.72288 )
}</code></pre>
<p>A field can also have a function type. For example</p>
<pre class="gp1"><code>enum CarModel(make: string, mass: f32, wheelbase: f32, gasUsage: fn&lt;f32 | f32&gt;) {
<pre class="language-gp1"><code>enum CarModel(make: string, mass: f32, wheelbase: f32, gasUsage: fn&lt;f32 | f32&gt;) {
gt ( &quot;ford&quot;, 1581, 2.71018, { miles_traveled -&gt; miles_traveled / 14 } ),
c8_corvette ( &quot;chevy&quot;, 1527, 2.72288, { miles_traveled -&gt; miles_traveled / 19 } )
}
@ -467,7 +467,7 @@ assert(a.value == 5);</code></pre>
var my_car &lt;- CarModel.c8_corvette;
var gas_used &lt;- my_car.gasUsage(200); // estimate how much gas I&#39;d use on a 200 mile trip</code></pre>
<p>Equivalence of enums is not influenced by case values, e.g.</p>
<pre class="gp1"><code>enum OneOrAnother: u16 { one(0), another(0) }
<pre class="language-gp1"><code>enum OneOrAnother: u16 { one(0), another(0) }
con a &lt;- OneOrAnother.one;
con b &lt;- OneOrAnother.another;
@ -482,7 +482,7 @@ only value types are allowed for enum fields.</p>
keyword. Fields are defined in the <code>record</code> block and
behavior is defined in the optional <code>impl</code> block.</p>
<p>For example,</p>
<pre class="gp1"><code>record Something {
<pre class="language-gp1"><code>record Something {
label: i32 // field label followed by some type
} impl { . . . } // associated functions. This is different than having functions in the fields section because impl functions are not assignable.</code></pre>
<p>If the record implements some interface, <code>SomeInterface</code>,
@ -492,7 +492,7 @@ the <code>impl</code> would be replaced with
functions of the <code>Something</code> record.</p>
<h3 id="unions">Unions</h3>
<p>Unions are the classic discriminated sum type.</p>
<pre class="gp1"><code>union BinaryTree {
<pre class="language-gp1"><code>union BinaryTree {
Empty,
Leaf: i32,
Node: (BinaryTree BinaryTree),
@ -502,7 +502,7 @@ functions of the <code>Something</code> record.</p>
section.</p>
<p>Type aliasing is provided with the <code>type</code> keyword,
e.g.</p>
<pre class="gp1"><code>type TokenStream Sequence&lt;Token&gt;
<pre class="language-gp1"><code>type TokenStream Sequence&lt;Token&gt;
type Ast Tree&lt;AbstractNode&gt;
fn parse(ts: TokenStream): Ast { . . . }</code></pre>
@ -518,7 +518,7 @@ Types</h2>
<code>#</code>, <code>&amp;</code>, and <code>@</code>. These are
immutable reference, mutable reference, and dereference, respectively.
Some examples of referencing/dereferencing values:</p>
<pre class="gp1"><code>var a &lt;- &quot;core dumped&quot;;
<pre class="language-gp1"><code>var a &lt;- &quot;core dumped&quot;;
var b &lt;- &amp;a; // b is a mutable reference to a
assert(a == @b);
@ -539,7 +539,7 @@ assert(@@c == a);
references.</p>
<p>The reference operators may be prepended to any type, T, to describe
the type of a reference to a value of type T, e.g.</p>
<pre class="gp1"><code>fn set_through(ref: &amp;string) { // this function takes a mutable reference to a string and returns `()`
<pre class="language-gp1"><code>fn set_through(ref: &amp;string) { // this function takes a mutable reference to a string and returns `()`
@ref &lt;- &quot;goodbye&quot;;
}

acl.cool/syntax_wrapper.sh (symbolic link, 1 line added)

@ -0,0 +1 @@
../syntax_wrapper.sh

View file

@ -21,23 +21,8 @@ source ./pgvv/bin/activate
find acl.cool/site/ ytheleus.org/site/ -type f \( -name '*.dj' -o -name '*.html' \) -exec cat {} + >all_chars.txt
cat common_chars.txt >>all_chars.txt
for font in fonts/LiterataTT/LiterataTT-Subhead{Regular,Italic,Bold,BoldItalic}.woff2; do
woff2_decompress "$font"
ttf_font="${font%.woff2}.ttf"
subset_ttf="${ttf_font%.ttf}-Subset.ttf"
hb-subset "$ttf_font" \
--output-file="$subset_ttf" \
--text-file=all_chars.txt \
--layout-features='*' \
--passthrough-tables
woff2_compress "$subset_ttf"
rm "$subset_ttf" "$ttf_font"
done
for font in fonts/JuliaMono/*{-Light,-Regular,-SemiBold}{,Italic}.woff2; do
for font in fonts/LiterataTT/LiterataTT-Subhead{Regular,Italic,Bold,BoldItalic}.woff2 \
fonts/JuliaMono/*{-Light,-Regular,-SemiBold}{,Italic}.woff2; do
woff2_decompress "$font"
ttf_font="${font%.woff2}.ttf"

View file

@ -139,4 +139,4 @@ delete_all = true
[widgets.syntax]
widget = "preprocess_element"
selector = 'pre code'
command = "pygmentize -l ${ATTR_CLASS##*-} -f html | head -c -13 | awk -F '<pre>' '{print $NF}'"
command = "./syntax_wrapper.sh ${ATTR_CLASS##*-}"
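
For reference, the `${ATTR_CLASS##*-}` expansion in the new command strips the longest prefix ending in a hyphen, so the old `gp1` class and the new `language-gp1` class both yield `gp1` as the wrapper's argument. A minimal sketch in bash; `lexer_from_class` is a hypothetical name used only for illustration and is not part of this commit:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): mirrors the ${ATTR_CLASS##*-}
# expansion used in the widget command.
lexer_from_class() {
    local class="$1"
    # "##*-" removes the longest leading match of "*-", i.e. everything up to
    # and including the last hyphen. A value without a hyphen is unchanged.
    printf '%s\n' "${class##*-}"
}

lexer_from_class "language-gp1"   # prints gp1
lexer_from_class "gp1"            # prints gp1
```

One consequence of this expansion is that a class whose language name itself contains a hyphen, such as `language-objective-c`, is reduced to its last segment (`c`).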

syntax_wrapper.sh (executable file, 9 lines added)

@ -0,0 +1,9 @@
#! /usr/bin/env bash
if [[ $# -lt 1 ]] || ! { pygmentize -L lexers | grep -qw "$1"; }; then
    printf "<code>"
    cat
    printf "</code>"
else
    pygmentize -l "$1" -f html | head -c -13 | awk -F '<pre>' '{print $NF}'
fi
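
The `grep -qw` check above accepts the name as a whole word anywhere in the `pygmentize -L lexers` listing, description lines included, so a word that only appears in a description would presumably pass even though it is not a lexer alias. A stricter variant might look like the sketch below. It assumes alias lines in the listing have the form `* alias1, alias2:` followed by an indented description line; `has_lexer` is a hypothetical helper, not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the commit): reads a lexer listing on stdin and
# succeeds only if $1 exactly equals one alias on a "* alias1, alias2:" line.
# The listing format is an assumption about `pygmentize -L lexers` output.
has_lexer() {
    awk -v want="$1" '
        /^\* / {
            line = substr($0, 3)               # drop the leading "* "
            sub(/:[[:space:]]*$/, "", line)    # drop the trailing ":"
            n = split(line, aliases, /, */)    # aliases are comma-separated
            for (i = 1; i <= n; i++)
                if (aliases[i] == want) found = 1
        }
        END { exit found ? 0 : 1 }             # exit status 0 only on a match
    '
}
```

In the wrapper this would be called as `pygmentize -L lexers | has_lexer "$1"` in place of the `grep -qw` pipeline. The whole listing is read before exiting, so the upstream command is not cut off mid-write.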

View file

@ -0,0 +1 @@
../syntax_wrapper.sh