Internals¶
This section of the document provides a series of disconnected topics about the compiler internals that affects semantics.
Tuple operations¶
There are 3 basic operations/checks with tuples that affect many other
operations: a in b
, a does b
, and lambda call rules.
-
a in b
allows to work whenb
is a name/unnamed tuple even whena
is named. -
a does b
requiresb
to be named consistent with names ina
. -
lambda call matches the arguments with the definition in a third different set of rules.
cassert (a=1) in (1,a=1,3)
cassert (a=1) !does (1,a=1,3)
let f = fun(a) { puts "{}",a }
let g = fun(long, short) { puts "{}",long }
f(a=1) // OK
f(1) // OK
g(long=1, short=1) // OK
g(1,1) // compile error
let long=1
g(long, short=1) // OK
let short=1
g(long, short) // OK
Operators like a == b
, a case b
, a equals b
, ... built on top of the previous functionality.
Determinism¶
Pyrope is designed to be deterministic. This means that the result is always
the same for synthesis. Simulation is also deterministic unless the random seed
is changed or non-Pyrope (C++) calls to procedures
add non-determinism.
Expressions are deterministic because procedures
have an explicit order in
Pyrope. Only functions
have a non-explicit order, but functions
are pure
without side-effects. puts
can be called non-deterministically but the result
is buffered and ordered at the end of the cycle to be deterministic.
The only source of non-determinism is non-Pyrope (C++) calls from procedures
executed at different pipeline stages. The pipeline stages could be executed in
any order, it is just that the same order must be repeated deterministically
during simulation. The non-Pyrope calls must be ::[comptime]
to affect
synthesis. So the synthesis is deterministic, but the testing like cosimulation
may not.
The same non-Pyrope calls also represent a problem for the compiler optimizations. During the setup phase, several non-Pyrope can exist like reading the configuration file. If the non-Pyrope calls are not deterministic, the result could be a non-deterministic setup phase.
The idea is that the non-Pyrope API is also divided in 2 categories:
functions
and procedures
. A function
can be called many times without
non-Pyrope side-effects. Pyrope guarantees that the procedures
are called in
the same order given a source code, but does not guarantee the call order. This
guarantee order slowdowns simulation and elaboration. Whenever possible, use
functions
instead of procedures
for compilation speed reasons.
Import¶
import
statement allows for circular dependencies of files, but not of
variables. This means that if there is no dependency (a imports b
), just
running a
before b
is enough. If there is a dependency (a imports b
and b
imports a
) a multiple compiler pass is proposed, but other solutions are
allowed as long as it can handle not true circular dependences.
The solution to this problem is to pick an order, and import at least three
times the files involved in the cyclic dependency. The files involved in the
cylic dependency are alphabetically sorted and called three times: (1) a
import b
, then b import a
; (2) a import b
and b import a
; (3) a import
b
and b import a
. Only the last import chain can perform procedure proc
calls (Pyrope and non-Pyrope) and puts/debug statements.
If the result of the last two imports has the same variables, the import has "converged", otherwise a compile error is generated. This multi-pass solution does not address all the false paths, but the common case of having two sets of independent variables. This should address most of the Pyrope cases because there is no concept of "reference/pointer" which is a common source of dependences.
Register reference¶
Register reference (regfef
) can create a dependence update between files, but this is
not a source of non-determinism because only one file can perform updates for
the register din
pin, and all the updated register can only read the register
q
pin.
Dealing with unknowns¶
Pyrope tries to be compatible with synthesizable Verilog but not equivalent. As
such it must handle/understand unknowns. Compatible does not mean that it will
generate the same ?
bits as Verilog, but that it will not generate an unknown
when Verilog has known. It is allowed to generate a 0
or a 1
when the
Verilog logical equivalence check generates an ?
.
An example of different behavior is that Verilog semantics state 0 * 0sb?
is
0sb?
while most programmers would expect a zero.
The previous definition of compatibility could allow the Pyrope compiler to
randomly replace all the unknowns by 0
or 1
when doing internal compiler
passes. This is not done at compile time to keep determinism, but simulation
time should randomly pick 0/1 for unknown bits.
The issue is that the most likely source of having unknowns in operations is either during reset or due to a bug on how to handle non initialized structures like memories.
The compiler internal transformations use a 3-state logic that includes 0
,
1
, and ?
for each bit. Any register or memory initialized with unknowns
will generate a Verilog with the same initialization.
The compiler internals only needs to deal with unknowns during the copy propagation or peephole optimizations. The compile goes through 2 phases: LNAST and Lgraph.
In the compiler passes, we have the following semantics:
-
In Pyrope, there are 3 array-like structures: non-persistent arrays, register arrays, and custom RTL memories. Verilog and CHISEL memories get translated to custom RTL memories. Non-persistent Verilog/CHISEL get translated to arrays. In Verilog, the semantics is that an out of bounds access generates unknowns. In CHISEL, the
Vec
is that an out of bound access uses the first index of the array. A CHISEL out of bound memory is an unknown like in Verilog. These are the semantics applied by the compiler optimization/transformations:-
Custom RTL memories do not allow value propagation across the array, only across non-persistent arrays, or register arrays explicitly marked with
retime=true
. -
An out of bound RTL address drops the unused index bits. For non-power of two arrays, out of bounds access triggers a compile error. The code must be fixed to avoid access. An
if addr < mem_size { ... mem[addr] ... }
tends to be enough. This is to guarantee that passes like Verilog and CHISEL have the same semantics, and trigger likely bugs in Pyrope code. -
An index with unknowns does not perform value propagation.
-
-
Shifts, additions and substractions propagate unknowns at computation. E.g:
0b11?0 + 0b1
is0b11?1
,0b1?0 >> 1
is0b1?
. -
Other arithmetic are more conservative. When an input is unknown, the result is unknown only respecting the sign when possible. E.g:
0b1?0? * -1
is0sb1?
. -
Logic operations behave like Verilog.
0b000111??? | 0b01?01?01?
is0b01?111?1?
. -
Equality comparisons (
==
and!=
) use unknowns, this means that at compile time0b1? != 0b10
. Comparisons is consistent with the equivalent logic operationsa == b
is the same as(a ^ b) == -1
. -
Other comparisons (
<=
,<
,>
,>=
) return true if the comparison is true for each possible unknown bit. -
match
statement andunique if
will trigger a compile error if the unknown semantics during compiler passes can trigger 2 options simultaneously. The solution is to change to a sequence ofifs
or change the code to guarantee no unknowns. -
if
statement withoutunique
logical expressions that have an unknown (single bit) are a source of confusion. In Verilog, it depends on the compiler options. A classic compiler will generate?
in all the updated variables. A Tmerge option will propagate?
only if both sides can generate a different value. The LNAST optimization pass will behave like the Tmerge when the if/mux control has unknowns:-
If all the paths have the same constant value, the
if
is useless and the correct value will be used. -
If any path has a different constant value, the generated result bits will have unknowns if the source bits are different or unknown.
-
If any paths are not constant, there is no LNAST optimization. Further Lgraph optimizations could optimize if all the mux generated values are proved to be the same.
-
-
The
for
loops are expanded, if the expression in thefor
is unknown, a compile error is generated. -
The
while
loops are also expanded, if the condition on thewhile
loop has unknowns a compile error is generated.
At the end of the LNAST generation, a Lgraph is created. Only the registers and
memory initialization are allowed to have unknowns in Lgraph. Any invalid
(nil
) assigned to an output or register triggers a compile error. Any unknown constant
bit is translated preserved (0b10?
).
The semantics on the generated simulator are similar to CHISEL, any unknowns are randomly translated to 0 or 1 at initialization.
Optimize directive¶
The optimize
directive is like an assert
but it also allows compiler
optimizations. In a way, it is a safer version of Verilog ?
. Unlike other
languages like C++23, Pyrope optimize
verifies at simulation time that the
optimize
is correct. This means that the optimize
is checked like an
assert
but it allows the compiler to optimize based on the condition.
asserts
do not trigger optimizations because their check can be disabled at
simulation time, and hence create mismatches between simulation and synthesis
if the compiler optimized over assertions.
always_comb begin // one hot mux
case (sel)
3’b001 : f=i0;
3’b010 : f=i1;
3’b100 : f=i2;
default: f=2’b??;
endcase
end
optimize sel==1 or sel==2 or sel==4 // not needed. match sets it
match sel {
== 0b001 { f = i0 }
== 0b010 { f = i2 }
== 0b100 { f = i3 }
}
f = (sel[0] & i0)
| (sel[1] & i1)
| (sel[2] & i2)
Optimize allows more freedom, without dangerous Verilog x-optimizations:
if (a == 0) begin
assert(false);
out = '?;
end else if (1 + a) == 1 begin // always false
out = 1;
end else begin
out = 3;
end
array[3] = '?; // entry 3 will not be used
// array = (1,2,3,'?,5,6,7,8)
res = array[b]
optimize a != 0
if (1 + a) != 1 { // always false
out = 1
}else{
out = 3
}
optimize b != 3
// array = (1,2,3,4,5,6,7,8)
res = array[b]
Unknown no optimization¶
In Verilog, unknowns can trigger synthesis optimizations. This is not the case
in Pyrope. Each unknown bit (?
) can result in random 0/1 at simulation time, but it will
not trigger optimizations. The optimize
statement should be use for such behavior.
assert cond==3 // Not cassert or optimize, so no optimized
var x1 = 0sb?
if cond == 3 {
x1 = 1
}
assert x1==1 // still not optimized (cassert fails)
cassert !x1.[comptime]
var x2 = 0sb?
optimize cond==3
if cond == 3 {
x2 = 1
}
cassert x2==1
cassert x2.[comptime]
LNAST optimization¶
The compiler has three IR levels: The high level is the parse AST, the mid-level is the LNAST, and the low level is the Lgraph. This section explains the main steps in the LNAST optimizations/transformations before performing type synthesis and generating the lower level Lgraph. This is a minimum of optimizations without them several type conflicts would be affected.
Unlike the parse AST, the LNAST nodes are guaranteed to be in topological order. This means that a single pass that visits the children first (deep first) is sufficient.
The work can be performed as a single "global" topographical pass starting from the root/top, where each LNAST node performs these operations during traversal depending on the LNAST node:
-
If the node allows, perform these node input optimization first:
-
constant folding for existing node, also be performed as instruction combining proceeds
-
instruction combining from sources only for same type but not beyond 128 n-ary nodes. This step subsumes constant propagation and copy propagation. E.g:
a+(x-3)+1
becomesa+x-3+1
-
create a canonical order by sorting the inputs by name/constant. E.g:
+ 2 a b
. This simplifies the following steps but it is not needed for semantics. Most commutative gates (add/sub/and/or/...
) will have a single constant as a result. -
trivial simplification with constants for existing node, also performed as instruction combining proceeds. E.g.:
a+0 == a
,a or true == true
... -
trivial identity simplification for existing node, also performed as instruction combining proceeds. E.g:
a^a == a
,a-a=0
...
-
-
If the node is a
::[comptime]
trigger a compile error unless all the inputs are constantcassert
should satisfy the condition or a compile error is generated
-
If the node is a loop (
for
/while
) that has at least one iteration expand the loop. This is an iterative process because the loop exit condition may depend on the loop body or functions called inside. After the loop expansions, nofor
,while
,break
,last
,continue
statement exists. -
If the node is a function call, iterate over the possible polymorphic calls. Call the first call that is valid (input types). Call the function and pass all the input constants needed. This requires specializing the function by input constants and types. If no call matches a valid type trigger a compile error
-
Delete unreachable statements (
if false { delete his }
,delete this when false
, ...) -
Compute these steps that may be needed in future steps:
-
Perform the "Mark" phase typical in dead-code-elimination (DCE) so that dead nodes are not generated when creating the Lgraph.
-
Update the tuple field in the Symbol Table
-
Track the array accesses for memory/array Lgraph generation
-
Type synthesis¶
The type synthesis and check are performed during the LNAST pass. Pyrope uses a structural type system with global type inference.
The type inference should be performed as the same time as the LNAST optimization traverses the tree. It can not be a separate pass because there can be interactions between the LNAST optimization and the Type synthesis. These are the additional checks performed for type synthesis:
-
If the node does type checks (
equals
,does
) compute the outcome and perform copy propagation. The result of this step is that the compiler is effectively doing flow-type inference. All the types must be resolved before. If theequals
/does
was in aif
condition, the control is decided at compile time. -
If the node reads bitwidth, replace the node with the computer Bitwidth value (max, min, ubits, and/or sbits)
- Compute the max/min for the output[s] using the bitwidth algorithm. Update the symbol table with the range. This is only needed because some code like polymorphism functions can read the bits.
-
If the node is a conditional (
if
/match
), the pass performs narrowing1.-
When the expression has these possible syntax
v >= y
,v > y
or the reciprocals, restrict the Bitwidth. E.g: in thev < y
restricts thev.max = y.min-1 ; y.min = v.min + 1
-
When the expression is an equality format
eq [and eq]*
oreq [or eq]*
likev1 == z1 and v2 != z2
, create av1=z1
andv2=z2
in the corresponding path. This will help bitwidth and copy propagation. Complicated mixes of and/or have no path optimization -
When the expression is a single variable
a
or!a
, set the variabletrue
andfalse
in both paths
-
No previous transformation could break the type checks. This means that the copy propagation, and final lgraph translation the type checks are respected.
-
All the entries on the comparator have the same type (
LHS equals RHS
) -
Left side assignments respect the assigned type (
LHS does RHS
) -
Any explicit type on any expression should respect the type (
var does type
)
The previous algorithm describes the semantics, the implementation may be different. For example, to parallelize the algorithm, each LNAST tree can be processed locally, and then a global top pass is performed.
Programming warts¶
In programming languages, warts are small code fragments that have unexpected or not great behavior. Every language has its warts. This section tries to list the Pyrope main ones to address and learn more about the language.
Shadowing¶
Pyrope does not allow shadowing, but you can still have it with tuples. To
access the tuple field, the self.field
is always required. This avoid the
problem of true shadowing.
let f1 = fun() { 1 }
let tup = (
,f1 = fun() { 2 }
,code = fun() {
assert self.f1() == 2
assert f1() == 1
}
)
Closures¶
Closures capture extra state or inputs at definition. The capture variables are
always immutable let
no matter the outter scope definition. Therefore,
capture variables behave like passed by value, not reference.
One important thing is 'when' does the capture happens. Pyrope follows the model of most languages like C++ that captures at lambda definition, not lambda execution.
var x_s = 10
let call_captured = fun[x_s]() {
return fun[x_s]() {
assert x_s == 10
return x_s
}
}
test "capture test" {
let tst = fun() {
var x_s = 20 // not variable shadowing because fun scope
let x1 = call_captured()
assert x1 == 10
x_s = 30;
let x2 = call_captured()
assert x2 == 10
}
tst // call the test
}
#include <iostream>
int main() {
int x_s{ 10 };
auto call_captured{
[x_s]() {
assert(x_s == 10);
return x_s;
}
};
}
x_s = 20;
auto x1 = call_captured();
assert(x1==10);
x_s = 30;
auto x2 = call_captured();
assert(x2==10);
}
Some languages like ZIG do not allow closures, but they allow structs with a lambda to implement an equivalent functionality. It is possible in Pyrope to also create a tuple and populate the getter. This effectively behaves as the closures. Internally, Pyrope may do this implementation.
let j = 1
let b = fun[j](x:i32)-> :i32 {
return x+j
}
assert b(1) == 2
test "closure with tuple" {
var a: i32 = 1
a += 1
var addX = (
,a: i32 = a // copy value, runtime or comptime
,getter = fun(self, x: i32) {
return x + self.a
}
)
a += 100;
assert addX(2) == 4
}
test "plain closure" {
var a:i32 = 1
a += 1
let addX = fun[a](x: i32) { // Same behaviour as closure with tuple
return x + a
}
a += 100;
assert addX(2) == 4
}
pub fn main() void {
const j = 1;
var b = struct{
fn function(x: i32) i32 {
return x+j;
}
}.function;
@import("std").debug.assert(b(1) == 2);
}
test "closure with runtime" {
var a: i32 = 1;
a += 1;
const addX = (struct {
a: i32,
fn call(self: @This(), x: i32) i32 {
return x + self.a;
}
} { .a = a }).call;
a += 100;
@import("std").debug.assert(addX(2) == 4);
}
Capture values must be explicit, or no capture happens. This means that
...fun[](...)...
is the same as ...fun(...)...
.
var x = 3
let f1 = fun[x]()->(_:int){
assert x == 3
var x = _ // compile error. Shadow captured x
return 200
}
let f2 = fun()->(_:int){
var x = _ // OK, no captures 'x' variable
x = 100
return x
}
Capture variables pass the value at capture time:
var x = 3
var y = 10
let fun2 = fun[y]()->(_:int){
y = 100 // compile error, y is immutable when captured
var x = 200
return y + x
}
x = 1000
assert fun2() == 203
Lambda arguments¶
Lambda calls happen whenever an identifer is followed by a list of expressions. If the first expression in the list has parenthesis, it can lead to unexpected behavior:
assert 0 == (0) // OK, same as assert( 0 == (0) )
assert (0) == 0 // compile error: (assert(0)) == 0 is an expression
assert(0 == 0) // OK
It is also easy to forget that parenthesis can be ommited in simple expressions, not when ranges or tuples are involed.
asssert 2 in (1,2) // compile error, not allowed to drop parenthesis
asssert(2 in (1,2)) // OK
Multiple tuples¶
The evaluation order is always the same program order starting from the top module. Remember that the setter method is the constructor called even when there is no initial value set.
let X_t = (
,i1 = (
,i1_field:u32 = 1
,i2_field:u32 = 2
,setter = proc(ref self, a) {
self.i1_field = a
}
)
,i2 = (
,i1_field:i32 = 11
,setter = proc(ref self, a) {
self.i1_field = a
}
)
)
var top = (
,setter = proc(ref self) {
var x:X_t = _
assert x.i1.i1_field == 1
assert x.i1.i2_field == 2
assert x.i2.i1_field == 11
x.i1 = 400
assert x.i1.i1_field == 400
assert x.i1.i2_field == 2
assert x.i2.i1_field == 11
x.i2 = 1000
assert x.i1.i1_field == 400
assert x.i1.i2_field == 2
assert x.i2.i1_field == 1000
}
)
If a lambda in the hierarchy does not have a setter/constructor, the program order follows the tuple scope which is in tuple ordered asignment.
Unknowns¶
Pyrope respects the same semantics as Verilog with unknowns. As such, there can
be many unexpected behaviors in these cases. The difference is that in Pyrope
everything is initialized and unknowns (0sb?
) can happen only when explicitly
enabled.
The compare respects Verilog semantics. This means that it is true if and only if all the possible values are true, which is quite counter-intuitive behavior for programmers not used to 4 value logic.
assert !(0sb? == 0)
assert !(0sb? != 0)
assert !(0sb? == 0sb?)
assert !(0sb? != 0sb?)
There is no way to know at run-time if a value is unknown, but a compile trick can work. The reason is that integers can be converted to strings in a C++ API
var x = 0sb10?
let str = __to_string(x) // only works for compile time constants
assert x == "0sb10?"
for loop¶
The for
expects a tuple, and iterates over the tuple. This can lead to some
unexpected behaviour. The most strange is that ranges are always from smallest
to largest. It is not legal to do a 5..<0
range, the solution is to use a
to
which creates a tuple not a range.
let s:string="hell"
for (idx,i) in s.enumerate() {
let v = match idx {
== 0 { "h" }
== 1 { "e" }
== 2 { "l" }
== 3 { "l" }
}
assert v == i
}
let t = (1,2,3)
for (idx,i) in t.enumerate() {
let v = match idx {
== 0 { 1 }
== 1 { 2 }
== 2 { 3 }
}
assert v == i
}
let r=2..<5
for (idx,i) in r.enumerate() {
let v = match idx {
== 0 { 2 }
== 1 { 3 }
== 2 { 4 }
}
assert v == i
}
let r2=4..=2 step -1
assert r2 == (4,3,2)
for (idx,i) in r2.enumerate() {
let v = match idx {
== 0 { 4 }
== 1 { 3 }
== 2 { 2 }
}
assert v == r2[i]
}
for i in 2..<5 {
let ri = 2+(4-i) // reverse index
// 2 == (2..<5).trailing_one
// 4 == (2..<5).leading_one
let v = match idx {
== 0 { 4 }
== 1 { 3 }
== 2 { 2 }
}
assert v == ri
}
for (idx,i) in enumerate(123) {
assert i == 123 and idx==0
}
Multiple bit selection¶
Ranges are sets, this creates potentially unexpected results in reverse for
iterators, but also in bit section:
let v = 0xF0
assert v@[0] == 0
assert v@[4] == 1 // unsigned output
assert v@sext[4] == -1 // signed output
assert v@[3..=4] == 0b010 == v@[3,4]
assert v@[4..=3 step -1] == 0b010
assert v@[4,3] == v@[3,4] == 0b010
let tmp1 = (v@[4], v@[3])@[..] // typecast from
let tmp2 = (v@[3], v@[4])@[..]
let tmp3 = v@[3,4]
assert tmp1 == 0b01
assert tmp2 == 0b100
assert tmp3 == 0b10
let tmp1s = (v@sext[4], v@sext[3])@[..] // typecast from
let tmp2s = (v@sext[3], v@sext[4])@[..]
let tmp3s = v@[4,3]
assert tmp1s == 0b01
assert tmp2s == 0b10
assert tmp3s == 0b10
let tmp1ss = (v@sext[4], v@sext[3])@sext[..] // typecast from
let tmp2ss = (v@sext[3], v@sext[4])@sext[..]
let tmp3ss = v@sext[3,4]
assert tmp1ss == 0b01 == 1
assert tmp2ss == 0sb10 == -2
assert tmp3ss == 0sb10 == -2 == v@sext[4,3]
The reason is that for multiple bit selection assumes a smaller to larger bits. If the opposite order is needed, support functions/code must explicitly do it.
In Pyrope, there is no order in bit selection (xx@[0,1,2,3] == xx@[3,2,1,0]
).
This is done to avoid mistakes. If a bit swap is wanted, it must be explicit.
let reverse = fun(x:uint)->(total:uint) {
for i in 0..<x.__bits {
total <<= 1
total |= x@[i]
}
}
assert reverse(0b10110) == 0b01101
Unexpected calls¶
Passing a lambda argument with a ref
does not have any side effect because
lambdas without arguments need to be explicitly called or just passed as
reference.
let args = fun(x) { puts "args:{}", b ; 1}
let here = fun() { puts "here" ; 3}
let call_now = fun(f:fun){ return f() }
let call_defer = fun(f:fun){ return f }
let x0 = call_now(here) // prints "here"
let e1 = call_now(args) // compile error, args needs arguments
let x1 = call_defer(here) // nothing printed
let e2 = call_defer(args) // compile error, args needs arguments
assert x0 == 3 // nothing printed
assert x1 == 3 // nothing printed
let x2 = call_now(ref here) // prints "here"
let e3 = call_now(ref args) // compile error, args needs arguments
let x3 = call_defer(ref here) // nothing printed
let x4 = call_defer(ref args) // nothing printed
assert x2 == 3 // nothing printed
assert x3() == 3 // prints "here"
assert x3 == 3 // compile error, explicit call needed
assert x4 == 1 // compile error, args needs arguments
assert x4("xx") == 1 // prints "args:xx"
if
is an expression¶
Since if
, for
, match
are expressions, you can build some strange code:
if if x == 3 { true }else{ false } {
puts "x is 3"
}
Legal but weird¶
There is no --
operator in Pyrope, but there is a -
which can
be followed by a negative number -3
.
let v = (3)--3
assert v == 6
-
Narrowing is based on "ABCD: eliminating array bounds checks on-demand" by Ras Bodik et al. ↩