Project 5: Code Generation
Optional checkpoint due Monday, Apr 15, 2024 at 8pm ET
Final deadline Monday, Apr 22, 2024 at 8pm ET
A PDF version of this document is located here.In this project, you will implement a code generator for uC, a small language in the C family. The main purpose of this project is to gain an understanding of metaprogramming, how one language can be translated into another, and to get practice with using macros and templates. A secondary goal is to gain experience working with an existing code base, with complex code written by someone else.
This project does not rely on Project 4 in any way. We will only be testing this project with uC code that is semantically correct, so semantic analysis will be disabled. You may start with either your Project 4 implementation or a fresh copy of the Project 4 starter files as a basis, and we will only be providing distribution code for files that are different from or in addition to Project 4.
Since this project allows reuse of code from Project 4, if you choose to work in a different partnership than for Project 4, you must start from a fresh copy of the Project 4 starter files. You will not need to reimplement any part of Project 4 before proceeding with this project.
Please see the syllabus for partnership rules. As a reminder, you may not share any part of your solution outside of your partnership. This includes both code and test cases.
The project is divided into multiple phases that are described below.
Reported Time to Complete the Project
The following is the time students reported they spent on the project in Winter 2023.
These data are for planning purposes only. We do not consider the exact time spent on the project to be reflective of anyone’s learning or ability. Rather, completing the project regardless of how much time it takes is what is important to achieve the learning goals of the project.
Optional Checkpoint
The checkpoint consists of achieving at least 30% of the points on the public and private test cases before the checkpoint deadline. Your grade for the checkpoint will be computed as the better of:
\(min(0.3, score) / 0.3\), where \(score\) is the fraction of autograded points earned by your best submission before the checkpoint deadline.
\(finalScore\), where \(finalScore\) is the fraction of autograded points earned by your best submission before the final deadline.
Thus, completing the checkpoint is optional. However, doing it will work to your benefit, since you can guarantee full credit on the 20% of the project points dedicated to the checkpoint.
For this project, passing the public test cases for Phases 1-3 plus the style checks is enough to guarantee full credit on the checkpoint.
Overview
In Project 4, you implemented a semantic analyzer for uC, which performed multiple analysis phases over a source program. Your main task in Project 5 is to implement the final code-generation phase, which will generate C++ code from an abstract syntax tree (AST). Since the compiler is generating C++ code rather than machine code, it is a source-to-source compiler.
As an example, consider the following uC program:
void main(string[] args) {
println("Hello world!");
}
This can be translated into the following C++ code (which, modulo some minor spacing, is the actual code produced by the staff solution to this project):
#include "defs.hpp"
#include "ref.hpp"
#include "array.hpp"
#include "library.hpp"
#include "expr.hpp"
namespace uc {
// Forward type declarations
// Forward function declarations
UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args));
// Full type definitions
// Full function definitions
UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args)) {
UC_FUNCTION(println)("Hello world!"s);
}
} // namespace uc
int main(int argc, char **argv) {
uc::UC_ARRAY(uc::UC_PRIMITIVE(string)) args =
uc::uc_make_array_of<uc::UC_PRIMITIVE(string)>();
for (int i = 1; i < argc; i++) {
uc::uc_array_push(args, uc::UC_PRIMITIVE(string)(argv[i]), -1);
}
uc::UC_FUNCTION(main)(args);
return 0;
}
First, there are some includes of library files written in C++, which
define built-in uC functions and types as well as macros and
templates to be used by the generated code. The generated code itself
is placed in the uc
namespace to avoid clashing with other code.
Forward declarations of types and functions come first, allowing them
to be used before their definition as required by uC. Then there
are full type and function definitions. The generated code uses macros
to mangle names into identifiers that will not clash with C++ names
or with each other, in the case of names belonging to different
categories. Finally, a C++ main()
function converts command-line
arguments into the format expected by uC and calls the uC
main()
function.
The compiler is run as follows to generate C++ code:
$ python3 ucc.py -C <source file>
This parses the source file and generates C++ code, saving it to a file. You can then compile the resulting C++ code as follows:
$ g++ --std=c++20 -I. -o <executable> <generated file>
The -I.
argument tells g++
to search the current directory for
header files and is necessary if the generated file is not in the
current directory.
For the example above, we can run:
$ python3 ucc.py -C tests/hello.uc
$ g++ --std=c++20 -I. -o hello.exe tests/hello.cpp
$ ./hello.exe
Hello world!
The compiler implementation is divided into several Python and C++ files, described below.
Distribution Code
Use the following commands to download and unpack the distribution code:
$ wget https://eecs390.github.io/project-uc/backend/starter-files.tar.gz
$ tar xzf starter-files.tar.gz
Start by looking over the distribution code, which consists of the following files:
File or directory |
Purpose |
What you need to do with it |
---|---|---|
|
Library |
Read and use it |
|
Library |
Read and use it |
|
Library |
Read and use it |
|
Library |
Read it |
|
Library |
Read, use, and modify it |
|
Testing |
Run it with |
|
Testing |
Read, use, and modify it |
|
Testing |
Read, use, and modify it |
You will also need to use and modify code from Project 4. You will
specifically need to modify ucbase.py
, ucstmt.py
, and
ucexpr.py
. You will also need to modify the ucbackend.py
file
included in the Project 4 distribution.
Driver Files
The top-level entry point of the compiler is ucc.py
. It opens the
given uC source file, invokes the parser to produce an AST, and
then invokes each of the compiler phases on the AST. If the -S
command-line argument is present, it only performs semantic analysis
on the AST without any code generation. Otherwise, if the -C
argument is present, it disables semantic analysis and invokes the
code-generation phases on the AST to generate C++ code to a file.
If the -A
command-line argument is present, it performs both
semantic analysis and code generation.
The interface for the backend compiler phases is defined in
ucbackend.py
. If all backend phases are enabled, the distribution
code calls gen_header()
to generate the header for a generated
program, which includes library files and opens the uc
namespace.
Then each backend phase is invoked in turn:
gen_type_decls()
, which is responsible for generating forward declarations for each user-defined typegen_function_decls()
, which generates forward declarations for each user-defined functiongen_type_defs()
, which produces definitions for each user-defined typegen_function_defs()
, which is responsible for producing definitions for each user-defined function
Then the distribution code calls gen_footer()
, which closes the
uc
namespace and generates a C++ main()
function that calls
the uC main()
. You will need to place your code for invoking
code generation on the AST within the top-level functions for each
phase.
The ucc.py
driver also provides a --backend-phase
command-line
argument to limit which backend phases run. For example, invoking
ucc.py
with --backend-phase=3
runs only Phases 1 through 3 of
the backend, without calling gen_header()
or gen_footer()
. Our
testing framework makes use of this to test the early phases of
code generation.
Unlike in Project 4, it is up to you how to structure your code for
code generation. You may add whatever member functions you need to
ASTNode
, and then call them from the top-level functions for each
phase.
A PhaseContext
object, defined in uccontext.py
, provides a
print()
method to print to an output file. The distribution code
sets the output file appropriately for code generation, and you should
use a context’s print method when generating code. If the optional
argument indent=True
is given, it prepends the output with the
context’s current indentation string. At the beginning of code
generation, indentation is set to two spaces. You may wish to change
the indentation level at various points to produce more readable
output code.
All arguments to a context’s print()
method other than indent
are passed on to the standard, top-level print()
function. The
following are some combinations of indentation and line endings that
can be accomplished with a call to print()
on a context ctx
:
ctx.print(args)
: print arguments to the output file, with a trailing newlinectx.print(args, end='')
: print arguments to the output file, but without a trailing newlinectx.print(args, indent=True)
: print the current indentation followed by the arguments to the output file, with a trailing newlinectx.print(args, indent=True, end='')
: print the current indentation followed by the arguments to the output file, but without a trailing newline
We recommend cloning a context when increasing the indentation level. The following creates a new context with two additional spaces of indentation:
new_ctx = ctx.clone()
new_ctx.indent += ' '
# alternatively, do the above in one step
new_ctx = ctx.clone(indent=' ')
We recommend using Python 3 string formatting or Python 3.6 f-strings to construct output text.
Library Files
The header files provided in the distribution are C++ library files
that are included in a generated uC program. They implement useful
macros and templates that can be used by the code generator, as well
as definitions for built-in uC types and functions. The preamble
code generated by ucbackend.code_gen()
includes the library
headers from the output C++ file.
The most basic macro definitions are in defs.hpp
. It defines
name-mangling macros that convert uC names to distinct C++
names. These are:
UC_PRIMITIVE
: mangles the name of a primitive typeUC_TYPEDEF
: mangles the name of a user-defined type, as used when generating the struct definition of the typeUC_REFERENCE
: mangles the name of a user-defined type and wraps it in auc_reference
, providing the reference semantics required for user-defined typesUC_ARRAY
: denotes an array of the given element type, which must itself already be mangledUC_FUNCTION
: mangles the name of a functionUC_VAR
: mangles the name of a variable
Make sure to apply the appropriate macro when generating code for one
of the entities above. Within the uC compiler,
BaseTypeNameNode
objects define a mangle()
method that you can
use to mangle type names.
The library file ref.hpp
defines a uc_reference
template that
implements reference semantics, as well as memory management using
reference counting. It uses std::shared_ptr
in order to perform
the latter. Except for constructing a null reference, you will not
have to create a uc_reference
object directly in the compiler.
Instead, use the uc_construct()
function template to allocate an
object and wrap it in a reference. You will need to explicitly
instantiate the template, such as
uc_construct<UC_REFERENCE(foo)>(...)
. To construct a null
reference of a specific type, use the default constructor for a
uc_reference
of the appropriate type.
The file array.hpp
implements operations on arrays. An array can be
constructed with the uc_construct()
function template defined in
ref.hpp
, explicitly instantiated with the array’s type. The
templates uc_array_length()
, uc_array_push()
,
uc_array_pop()
, and uc_array_index()
in array.hpp
implement
the corresponding array operations.
The file library.hpp
defines the appropriate aliases for primitive
types as well as the built-in library functions. You will not have to
use any of these directly.
The file expr.hpp
provides a definition for the uc_id()
function template, and it is also intended to implement function
overloads for the polymorphic operations in Phase 5. You will need to
fill in the code for expr.hpp
and use it when generating code for
the polymorphic operations. The uc_add()
overloads should add two
items together. You will need to provide overloads for the
combinations of types that may be added in uC. Do not repeat code;
use function templates where possible.
Testing Framework
A basic testing framework is implemented in the Makefile
. There
are separate targets for each of the phases:
make phase1
: Run only Phase 1 of code generation. Compile the output with a provided*_phase1.cpp
test file usingg++ -c
, so thatg++
does not attempt to link the resulting object file.make phase2
: Run Phases 1 and 2 of code generation. Compile the output with a provided*_phase2.cpp
test file usingg++ -c
.make phase3
: Run up to Phase 3 of code generation. Compile the output with a provided*_phase3.cpp
test file usingg++
, producing an executable. Run the resulting executable.make phase4
: Run all phases of code generation on tests that only require up to Phase 4. Compile the output usingg++
, producing an executable. Run the resulting executable with command-line arguments20 10 5 2
, save the output to a file with extension.run
, and compare against the expected output in the.run.correct
file.make phase5
: Run all phases of code generation on tests that require up to Phase 5. Compile the output usingg++
, producing an executable. Run the resulting executable with command-line arguments20 10 5 2
and compare the results to the expected output.
You may need to change the value of the PYTHON
and CXX
variables if the Python executable is not in the path or not called
python3
, or if you want to use a different C++ compiler. You can
do so from the command line:
$ make PYTHON=<your python here> CXX=<your C++ compiler here> ...
Test Cases
The following basic test cases are provided in the tests
directory, with output from running the resulting executable.
Test |
Description |
---|---|
|
Test default constructors for user-defined types. |
|
Test equality and inequality comparisons. |
|
Simple “hello world” example. |
|
Test accessing the |
|
Test literals and addition between strings and other types. |
|
Test accessing a field through a null reference. |
|
Complex example of a uC program. |
|
Test using types or functions before their declaration, which should be valid. |
These are only a limited set of test cases. You will need to write extensive test cases of your own to ensure your compiler is correct.
You will not be able to directly compile the generated C++ code from a
uC program until you complete Phase 4, since you will need to be
able to generate the uC main()
function. However, for each
test case, we provide phase-specific test files in the form of
<test>_phase{1,2,3}.cpp
. These files directly test the
functionality required up to the associated phase. For Phases 1 and 2,
they ensure that the generated declarations are correct. For Phase 3,
the test files determine whether or not type definitions are correct.
Use the provided Makefile
to run these tests, as described in
Testing Framework.
You may find it useful to examine the generated C++ code with macros expanded, so that you can reason about its correctness with respect to the C++ language. We recommend using a command such as the following to run the code through the C preprocessor:
$ cat defs.hpp tests/default.cpp | grep -v '#include' | g++ -E - > tests/default.E.cpp
This command works as follows:
It invokes the
cat
utility to print the contents ofdefs.hpp
(which contains the relevant macro definitions) andtests/default.cpp
(the generated code fortests/default.uc
) to standard output.It pipes the output through
grep -v
, which excludes lines that contain the given sequence. We strip out#include
lines so that only the code fromtests/default.cpp
gets preprocessed.It then invokes
g++ -E
to run the preprocessor. The-
command-line argument tellsg++
to run on the contents of standard input.Finally, the results are stored in
tests/default.E.cpp
, which you can then examine by hand.
We have included a Makefile
rule to run this command:
$ make tests/default.preprocess
cat defs.hpp tests/default.cpp | grep -v '#include' | g++ -E - > tests/default.E.cpp
You can replace default
with another test name to run the
corresponding generated output through the processor. Make sure that
you run the uC compiler first (either directly using python3
ucc.py -C <test file>
or through one of the Makefile
targets) to
generate the output first.
You may find it helpful to run the preprocessed output through an
automatic formatter before examining it. Either use your IDE for this,
or an external tool such as clang-format. (You can install it
via brew install clang-format
on MacOS if you have Homebrew, apt
install clang-format
on Ubuntu [you may need to use sudo
], npm
install -g clang-format
if you have npm, or even through pip via
pip3 install clang-format
.)
Phase 1: Generating Type Declarations
The semantics of uC allow types and functions to be used before their definition. In order to support this in the generated C++ code, forward declarations must be made for each user-defined type and function. In addition, forward declarations of types should be made before those of functions, since a function may name a user-defined type in its return or parameter types.
Thus, the first step is to generate forward declarations for user-defined types. As an example, consider the following uC type definition:
struct foo {
int x;
double y
};
This should result in a C++ forward declaration as follows:
struct UC_TYPEDEF(foo);
Since this is a forward declaration, this is not a definition for the resulting struct.
Modify gen_type_decls()
in ucbackend.py
so that it runs your
code for this phase.
Phase 2: Generating Function Declarations
The second step is to generate forward declarations for user-defined functions. As an example, consider the following uC function:
void main(string[] args) {
println("Hello world!");
}
This should result in a C++ forward declaration as follows:
UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args));
The return and parameter types should be mangled appropriately. You
will find the mangle()
method of BaseTypeNameNode
objects
useful for this purpose. The parameter names are optional in a forward
declaration, but if you choose to generate them, make sure to mangle
them with the UC_VAR
macro.
Modify gen_function_decls()
in ucbackend.py
to run your code
for this phase.
Phase 3: Generating Type Definitions
Full type definitions must appear before function definitions, since in C++, the contents of a class or struct are not accessible until after the definition. Thus, the next step is to generate definitions for each user-defined type.
Unlike in the previous phases, we are not providing full examples of generated code for this or future phases. The goal is to reason about what code is necessary to translate between the languages, using your knowledge of both uC and C++, rather than just pattern matching.
In creating an instance of a user-defined uC type, it is legal to provide a single argument for each field, or to provide no arguments at all. The latter results in default initialization, which initializes primitive types to 0, false, or an empty string, and references to user-defined types to null. Consider the following type:
struct bar {
int x;
foo f;
};
Both of the following initializations are valid:
bar b1 = new bar(3, new foo(4, 5));
bar b2 = new bar();
The former explicitly initializes each field to the corresponding
argument, so that b1.x
is 3 and b1.f
is a reference to the
newly created foo
object. The latter performs default
initialization, resulting in b2.x
being 0 and b2.f
null.
You will need to support both possibilities for initialization. Default initialization in uC is equivalent to value initialization in C++. You will need to generate code to value initialize fields in cases that require default initialization from uC.
User-defined types also must support equality comparisons. In C++20,
you only have to overload operator==()
, and you can default it –
the defaulted operator
does member-by-member comparisons. Use a signature like the following,
so that comparisons can be done on r-values, which bind to const
l-value references in C++:
struct UC_TYPEDEF(bar) {
UC_PRIMITIVE(boolean) operator==(const UC_TYPEDEF(bar) &rhs) const ...
};
Modify gen_type_defs()
in ucbackend.py
to invoke the code for
this phase.
Phase 4: Generating Function Definitions
The next step is to generate full definitions for each user-defined function.
Make sure to mangle the names of uC variables with UC_VAR
,
both in variable definitions as well as in variable and field
accesses.
Most uC statements and expressions have a one-to-one correspondence with C++ statements and expressions, and in those cases, you’ll find that you’ll be able to generate C++ code that is nearly identical to uC code. The semantics of most uC expressions, in terms of their order of operations and their types, are designed to be identical to their C++ counterparts.
For a string literal, use the s
suffix so that it is a
std::string
literal rather than a const char *
:
"Hello world!"s
For call nodes, mangle the function name using the UC_FUNCTION
macro.
For allocation nodes, you should make use of the library template
uc_construct<T>()
. You will need to explicitly provide the first
template argument, which should be the mangled name of the underlying
uC type, when calling it.
For field accesses, the receiver is always of uc_reference
type.
Since uc_reference
derives from shared_ptr
, it supports the
indirect member-access operator ->
, which you can use to access a
field. However, the receiver needs to be checked for null before it
can be dereferenced via ->
. Use uc_null_check()
(defined in
ref.hpp
) to do so:
uc_null_check(<receiver>, <position of AST node>)-><mangled field>
For array indexing, you will find the library template
uc_array_index()
useful. The position of the corresponding AST
node should be passed in as the last argument – it is used for
reporting runtime errors.
For unary and binary operations, you should place parentheses around the expression in order to preserve the precedence and associativity encoded in the structure of the AST. For example, consider the following uC code:
(3 - 4) * 5
This results in the AST structure in Figure 1.
The parenthesization is encoded in the structure of the AST itself, so
that 3 and 4 are grouped together under the MinusNode
. In order to
preserve this in the generated code, emit parentheses around every
binary and unary operation (that has a C++ counterpart):
((3 - 4) * 5)
In uC, addition is a polymorphic operation, as it can be applied to numerical types as well as strings. Thus, its implementation is deferred to Phase 5.
Use the uc_id()
function template in expr.hpp
to implement the
ID operator (#
).
For array push and pop operations, you will find the
uc_array_push()
and uc_array_pop()
library templates useful.
Modify gen_function_defs()
in ucbackend.py
to execute the code
for this phase.
Phase 5: Polymorphic Operations
The final compiler phase is to write implementations for the polymorphic operation of adding two items. Since we are not running the type computation from Project 4, we cannot resolve the operation within the compiler itself, as we do not know the types of the operands. Instead, we will rely on the target C++ compiler to resolve the operation for us.
The operands of an addition may be both of numerical type, both of
string type, or one of string type and the other of numerical or
boolean type. To support this, define overloads for uc_add()
in
expr.hpp
. Then generate code to call uc_add()
for an addition.
You may find std::to_string()
useful for converting numerical
types to strings. However, you cannot use it for booleans, which
should be converted to the strings true
or false
, not 1
and 0
. Instead, you will need to define specific uc_add()
overloads for booleans that perform the correct string conversion.
Do not repeat code if not necessary for the overloads in this phase.
Instead, use templates where possible. You should have no more than
six overloads for uc_add()
.
You may run into a case where the C++ compiler considers a call to be ambiguous because two or more overloads are equally applicable. In such a case, you can resolve the ambiguity by providing a more specialized overload that will be preferred over the ambiguous ones.
Phase 6 (Optional): Writing a uC Program
Implement a simulation of Conway’s Game of Life in uC. Your implementation should be on a finite grid. Edge cells have fewer neighbors but should otherwise follow the same rules as any other cell.
Complete the implementation of the simulate()
function in
life.uc
. You may define any helper functions and structs you need.
When printing the grid, you should first print a line consisting of
\(cols + 2\) dashes (-
). Each row should then start and end
with a pipe (|
), each live cell should be printed as an asterisk
(*
), and each dead cell should be printed as a space. Finally,
print another line consisting of \(cols + 2\) dashes followed by
another empty line.
We have provided a test case in the main()
function and the
expected output in life_test.out
. Compile and run your code with:
$ python3 ucc.py -C life_test.uc
$ g++ --std=c++20 -I. -o life.exe life.cpp
$ ./life.exe
Alternatively, you can use the Makefile
to compile and test
life.uc
:
$ make life
This phase is optional and will not be graded.
Testing and Evaluation
We have provided a small number of test cases in the distribution code. However, the given test cases only cover a small number of possible uC constructs, so you should write extensive tests of your own.
We will evaluate your compiler based on whether the generated code is
a valid C++ translation whose behavior matches that expected from the
uC source. Thus, we will run a valid uC source file through
your compiler to produce the output C++ code. For Phases 1, 2, and 3,
we will combine your output with our own C++ test code that relies on
the output being correct. For Phases 1 and 2, we will compile with
g++ -c
, without linking, to ensure that the declarations are
correct. For Phase 3, our test code will use assert statements to test
that your generated type definitions are correct. For Phases 4 and 5,
we will compile your generated code with g++
, and run it and check
that the output matches what is expected.
Your generated C++ code itself does not have to exactly match ours. However, its behavior must match the behavior of our C++ code.
We will not test your compiler with erroneous uC code.
For Phases 4 and 5, we will run your generated code through
valgrind
to ensure that fields are properly default initialized.
The provided Makefile
invokes valgrind
if it is present. This
can be disabled (e.g. if your valgrind
installation is broken) by
setting VALGRIND=""
:
$ make phase5 VALGRIND=""
Grading
The grade breakdown for this project is as follows:
20% checkpoint
80% final deadline autograded
We will not hand grade this project, nor will we grade your test cases. However, we are still requiring that you submit your test cases, as they are part of your solution.
You are required to adhere to the coding practices in the course style guide. We will use the automated tools listed there to evaluate your code. You can run the style checks yourself as described in the guide.
Submission
All code that you write for the interpreter must be placed in the
files ucbackend.py
, ucbase.py
, ucexpr.py
,
ucfunctions.py
, ucstmt.py
, uctypes.py
, or expr.hpp
. We
will test all seven files together, so you are free to change
interfaces that are internal to these files. You may not change any
part of the interface that is used by ucc.py
or ucfrontend.py
.
Submit all of ucbackend.py
, ucbase.py
, ucexpr.py
,
ucfunctions.py
, ucstmt.py
, uctypes.py
, expr.hpp
, and
any of your own test files to the autograder before the deadline. We
suggest including a README.txt
describing how to run your test
cases.
Frequently Asked Questions
Q: Are we supposed to output the same code as what’s in
default_phase1.cpp
, etc.?A: No. That’s a test file for Phase 1, and it gets compiled together with your output to produce a test executable for that phase. You’re supposed to output code as described in this project spec. For the
default.uc
test case in Phase 1, your output should be something like:struct UC_TYPEDEF(foo); struct UC_TYPEDEF(bar); struct UC_TYPEDEF(baz);
Q: Should we write all of our code for Phases 1-4 in
ucbackend.py
?A: No. Project 4 demonstrated the proper way to structure code that operates on an AST, and you should follow a similar organization in this project. Take a look at
ucfrontend.py
for reference. You are allowed to add methods to ASTNode as needed.Q: I’m getting compiler errors when my generated code is compiled with
g++
. How do I figure out what’s wrong?A: Try running the generated code through the preprocessor as described in Test Cases, which will expand the macros. The following example is for
default.uc
after Phase 1:$ cat defs.hpp tests/default.cpp | grep -v '#include' | g++ -E - # 1 "<stdin>" # 1 "<built-in>" 1 # 1 "<built-in>" 3 # 368 "<built-in>" 3 # 1 "<command line>" 1 # 1 "<built-in>" 2 # 1 "<stdin>" 2 # 46 "<stdin>" struct ucp_t_foo; struct ucp_t_bar; struct ucp_t_baz;