ctlox
is Lox from Crafting Interpreters in C++, at compile-time.
Implemented using constexpr
everywhere.
The advanced techniques employed here are heavily based upon the work by Jules Pénochot in Advanced techniques for high performance code generation and its reference repository poacher.
A Lox program is compiled using the ctlox::v2::compile<source>()
entrypoint,
or by using the _lox
user-defined literal suffix. This program can then be invoked
as-is to execute the program, and may be invoked any number of times.
using namespace ctlox::v2::literals;
constexpr auto program = R"(
print "Hello, world!";
)"_lox;
program(); // writes "Hello, world!\n" to stdout
Arbitrary native functions may be defined by passing in a setup_fn
callable
which accepts a ctlox::v2::environment*
. In particular, this can be used to
override println
, the native function used to implement print
statement:
constexpr int count_prints() {
int i = 0;
auto println = [&i](const ctlox::v2::value_t&) { ++i; };
auto setup = [=](ctlox::v2::environment* env) {
// println takes 1 argument
env->define_native<1>("println", println);
};
program(setup);
return i;
}
static_assert(count_prints() == 6);
The scanner and the parser are implemented mostly as-is from JLox, with some adjustments where necessary to be constexpr-compatible.
Following that, as described in "Advanced techniques for high performance code generation", the AST is serialized into a fixed-size flat tree, allowing it to be used as a non-type template parameter. This flat AST is first passed to a resolver, and both the AST and the resolved bindings are passed to the code generator, which traverses the AST and generates a tree of embedded lambdas.
For closures, ctlox implements upvalue semantics, and uses simple ref-counting to clean up memory. However, there is no garbage collection or cycle detection, so even as much as a local function referencing itself will cause a (lox) memory leak. No memory is actually leaked in the C++ program, since the "heap" is fully destroyed when the lox program exits.
The goal is to implement all of Lox in ctlox::v2. As of the time of writing, chapters 1-11 are implemented.
MSVC fails to compile some of the constexpr/template patterns employed here. I've managed to find workarounds for some of the valid code, but abandoned and switch entirely to LLVM/MSVC after one such error would require a massive reimagining of the code to correct. See constexpr_limitations for more details.
Implemented using a pipeline-style alias template syntax, partially inspired by projects like tmp, with a sprinkling of concepts from Rappel (currently closed-source).
A Lox program is parsed into an AST type using ctlox::v1::compile<source>
.
This program may then be passed to ctlox::v1::execute<program>
, which will
return a type-list containing the printed values.
using namespace ctlox::v1::literals;
using program = ctlox::v1::compile<R"(
print "Hello, world!";
)"_ct>;
using output = ctlox::v1::execute<program>;
static_assert(std::is_same_v<
output,
ctlox::v1::list<ctlox::v1::value_t<"Hello, world!"_ct>>>);
At the core, ctlox/v1/ct.hpp
contains a lot of scaffolding and a few generic functions.
The main user-facing syntaxes are to define pipelines:
// when invoked, my_pipeline replaces the first type in the list with int
using my_pipeline = compose<
drop<1>,
prepend<int>>;
and to invoke pipelines:
using my_list = run<
given_pack<char, double, int>,
my_pipeline,
listed>;
static_assert(std::is_same_v<my_list, list<int, double, int>>);
Like regular functions, alias functions are primarily defined by their inputs and their outputs. Unlike regular functions, their outputs aren't returned to the caller, but rather directly passed as the inputs to the next function in the pipeline.
This library distinguishes between the two main types of inputs/outputs: exactly one type,
and a pack of types. The concepts accepts_one
and accepts_pack
represent
this distinction, and it is a compile-time error to chain a function which returns
a singular type into a function which accepts a pack, or vice versa. Internally, this
is checked with the above concepts and the usage of specific call1
and calln
templates, so a function which "returns" a pack which happens to have exactly 1 type in it
will still properly be treated as a pack.
There is a special third kind of input: exactly zero types, modeled by initiator
.
This kind of function may be used as the first item in a pipeline to provide types for
the next function in the pipeline. Specifically, given_one<T>
and given_pack<Ts...>
are initiators which provide a singular type and a pack respectively.
There is also a special kind of output: functions which model terminates_one
or terminates_pack
resolve to an actual type rather than passing arguments to another
function in the pipeline. Specifically, returned
resolves to its singular argument T
,
and listed
resolves to list<Ts...>
.
If a pipeline is executed and ends without having an explicit terminator, a special
function deferred
is injected. This function models both terminates_one
and terminates_pack
,
and resolves to given_one<T>
and given_pack<Ts...>
respectively. This allows
the result of an unterminated pipeline to be used as the initiator to another pipeline:
using my_tuple_types = run<given_one<int>, repeat<2>, prepend<char>, append<double>>;
using my_tuple = run<my_tuple_types, applied<std::tuple>, returned>;
static_assert(std::is_same_v<my_tuple, std::tuple<char, int, int, double>>);
A subset of Lox (chapters 1-8) is implemented in ctlox::v1
as the three alias functions
scan_ct
, parse_ct
and interpret_ct
.
scan_ct<s>
iterates over s and produces a pack oftoken_t
specializations,parse_ct
takes those tokens and produces a pack of*_stmt
specializations, where their template arguments are those statements' substatements, and subexpressions,interpret_ct
takes those statements and produces a pack ofvalue_t
specializations, each corresponding to an executedprint
statement and containing the resulting value, as-is (non-stringified).
Any error in this process, be it an invalid token, an invalid expression, or a "runtime" error, results in a compile-time error.
The main limitation with this approach is that compilers begin to reject programs past a
relatively small level of complexity. For instance, adding a single subexpression to
the program in src/v1.cpp
causes MSVC to fail with error C1202, "recursive
type or function dependency context too complex". Hence, ctlox::v2.