# Project 5: Code Generation¶

## Final deadlineMonday, Apr 18, 2022at 8pm ET¶

In this project, you will implement a code generator for uC, a small language in the C family. The main purpose of this project is to gain an understanding of metaprogramming, how one language can be translated into another, and to get practice with using macros and templates. A secondary goal is to gain experience working with an existing code base, with complex code written by someone else.

This project relies on the the first two phases of the semantic analyzer you implemented for uC in Project 4. We will only be testing this project with uC code that is semantically correct, so the later semantic analysis phases will be disabled. You will need to start with your Project 4 implementation as a basis, and we will only be providing distribution code for files that are different from or in addition to Project 4.

Since this project reuses code from Project 4, if you choose to work in a different partnership than for Project 4, you must start from a fresh copy of the Project 4 starter files. You will then need to reimplement the first two phases of Project 4 before proceeding with this project.

Please see the syllabus for partnership rules. As a reminder, you may not share any part of your solution outside of your partnership. This includes both code and test cases.

The project is divided into multiple phases that are described below.

## Reported Time to Complete the Project¶

The following is the time students reported they spent on the project in Winter 2021.

These data are for planning purposes only. We do not consider the exact time spent on the project to be reflective of anyone’s learning or ability. Rather, completing the project regardless of how much time it takes is what is important to achieve the learning goals of the project.

# Optional Checkpoint¶

The checkpoint consists of achieving at least 30% of the points on the public and private test cases before the checkpoint deadline. Your grade for the checkpoint will be computed as the better of:

• $$min(0.3, score) / 0.3$$, where $$score$$ is the fraction of autograded points earned by your best submission before the checkpoint deadline.

• $$finalScore$$, where $$finalScore$$ is the fraction of autograded points earned by your best submission before the final deadline.

Thus, completing the checkpoint is optional. However, doing it will work to your benefit, since you can guarantee full credit on the 20% of the project points dedicated to the checkpoint.

# Overview¶

In Project 4, you implemented a semantic analyzer for uC, which performed multiple analysis phases over a source program. Your main task in Project 5 is to implement the final code-generation phase, which will generate C++ code from an abstract syntax tree (AST). Since the compiler is generating C++ code rather than machine code, it is a source-to-source compiler.

As an example, consider the following uC program:

void main(string[] args)() {
println("Hello world!");
}


This can be translated into the following C++ code (which, modulo some minor spacing, is the actual code produced by the staff solution to this project):

#include "defs.h"
#include "ref.h"
#include "array.h"
#include "library.h"
#include "expr.h"

namespace uc {

// Forward type declarations

// Forward function declarations

UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args));

// Full type definitions

// Full function definitions

UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args)) {
{
UC_FUNCTION(println)("Hello world!"s);
}
}

} // namespace uc

int main(int argc, char **argv) {
uc::UC_ARRAY(uc::UC_PRIMITIVE(string)) args =
uc::uc_make_array_of<uc::UC_PRIMITIVE(string)>();
for (int i = 1; i < argc; i++) {
uc::uc_array_push(args, uc::UC_PRIMITIVE(string)(argv[i]));
}
uc::UC_FUNCTION(main)(args);
return 0;
}


First, there are some includes of library files written in C++, which define built-in uC functions and types as well as macros and templates to be used by the generated code. The generated code itself is placed in the uc namespace to avoid clashing with other code. Forward declarations of types and functions come first, allowing them to be used before their definition as required by uC. Then there are full type and function definitions. The generated code uses macros to mangle names into identifiers that will not clash with C++ names or with each other, in the case of names belonging to different categories. Finally, a C++ main() function converts command-line arguments into the format expected by uC and calls the uC main() function.

The compiler is run as follows to generate C++ code:

$python3 ucc.py -C <source file>  This parses the source file, runs the first two semantic analysis phases, and generates C++ code and saves it to a file. You can then compile the resulting C++ code as follows: $ g++ --std=c++17 -I. -o <executable> <generated file>


The -I. argument tells g++ to search the current directory for header files and is necessary if the generated file is not in the current directory.

For the example above, we can run:

$python3 ucc.py -C tests/hello.uc$ g++ --std=c++17 -I. -o hello.exe tests/hello.cpp
$./hello.exe Hello world!  The compiler implementation is divided into several Python and C++ files, described below. # Distribution Code¶ Use the following commands to download and unpack the distribution code: $ wget https://eecs390.github.io/project-uc/backend/starter-files.tar.gz


## Test Cases¶

The following basic test cases are provided in the tests directory, with output from running the resulting executable.

Test

Description

default.uc

Test default constructors for user-defined types.

equality.uc

Test equality and inequality comparisons.

hello.uc

Simple “hello world” example.

length_field.uc

Test accessing the length field of an array or an object.

literals.uc

Test literals and addition between strings and other types.

particle.uc

Complex example of a uC program.

use_before_decl.uc

Test using types or functions before their declaration, which should be valid.

These are only a limited set of test cases. You will need to write extensive test cases of your own to ensure your compiler is correct.

You will not be able to directly compile the generated C++ code from a uC program until you complete Phase 4, since you will need to be able to generate the uC main() function. However, for each test case, we provide phase-specific test files in the form of <test>_phase{1,2,3}.cpp. These files directly test the functionality required up to the associated phase. For Phases 1 and 2, they ensure that the generated declarations are correct. For Phase 3, the test files determine whether or not type definitions are correct. Use the provided Makefile to run these tests, as described in Testing Framework.

# Phase 1: Generating Type Declarations¶

The semantics of uC allow types and functions to be used before their definition. In order to support this in the generated C++ code, forward declarations must be made for each user-defined type and function. In addition, forward declarations of types should be made before those of functions, since a function may name a user-defined type in its return or parameter types.

Thus, the first step is to generate forward declarations for user-defined types. As an example, consider the following uC type definition:

struct foo(int x, float y);


This should result in a C++ forward declaration as follows:

struct UC_TYPEDEF(foo);


Since this is a forward declaration, this is not a definition for the resulting struct.

Modify gen_type_decls() in ucbackend.py so that it runs your code for this phase.

# Phase 2: Generating Function Declarations¶

The second step is to generate forward declarations for user-defined functions. As an example, consider the following uC function:

void main(string[] args)() {
println("Hello world!");
}


This should result in a C++ forward declaration as follows:

UC_PRIMITIVE(void)
UC_FUNCTION(main)(UC_ARRAY(UC_PRIMITIVE(string)) UC_VAR(args));


The return and parameter types should be mangled appropriately. You will find the mangle() method of Type objects useful for this purpose. The parameter names are optional in a forward declaration, but if you choose to generate them, make sure to mangle them with the UC_VAR macro.

Modify gen_function_decls() in ucbackend.py to run your code for this phase.

# Phase 3: Generating Type Definitions¶

Full type definitions must appear before function definitions, since in C++, the contents of a class or struct are not accessible until after the definition. Thus, the next step is to generate definitions for each user-defined type.

Unlike in the previous phases, we are not providing full examples of generated code for this or future phases. The goal is to reason about what code is necessary to translate between the languages, using your knowledge of both uC and C++, rather than just pattern matching.

In creating an instance of a user-defined uC type, it is legal to provide a single argument for each field, or to provide no arguments at all. The latter results in default initialization, which initializes primitive types to 0, false, or an empty string, and references to user-defined types to null. Consider the following type:

struct bar(int x, foo f);


Both of the following initializations are valid:

b1 = new bar(3, new foo(4, 5));
b2 = new bar();


The former explicitly initializes each field to the corresponding argument, so that b1.x is 3 and b1.f is a reference to the newly created foo object. The latter performs default initialization, resulting in b2.x being 0 and b2.f null.

You will need to support both possibilities for initialization. Default initialization in uC is equivalent to value initialization in C++. You will need to generate code to value initialize fields in cases that require default initialization from uC.

User-defined types also must support equality comparisons. You will need to overload operator==() and operator!=() to do member-by-member comparisons. Use a signature like the following, so that comparisons can be done on r-values, which bind to const l-value references in C++:

struct UC_TYPEDEF(bar) {
UC_PRIMITIVE(boolean) operator==(const UC_TYPEDEF(bar) &rhs) const {
...
}
...
};


Modify gen_type_defs() in ucbackend.py to invoke the code for this phase.

# Phase 4: Generating Function Definitions¶

The next step is to generate full definitions for each user-defined function.

Make sure to place declarations of local uC variables at the top of the body of a generated function, taking care to mangle them with UC_VAR. You do not need to initialize local variables, since uC specifies that their initial values are undefined.

Most uC statements and expressions have a one-to-one correspondence with C++ statements and expressions, and in those cases, you’ll find that you’ll be able to generate C++ code that is nearly identical to uC code. The semantics of most uC expressions, in terms of their order of operations and their types, are designed to be identical to their C++ counterparts.

For a string literal, use the s suffix so that it is a std::string literal rather than a const char *:

"Hello world!"s


For call nodes, the func attribute will not be available, since we cut off semantic analysis after the first two frontend phases. Instead, mangle the function name using the UC_FUNCTION macro.

For allocation nodes, you should make use of the library template uc_construct<T>(). You will need to explicitly provide the first template argument, which should be the mangled name of the underlying uC type, when calling it.

For field accesses, the receiver is always of uc_reference type. Since uc_reference derives from shared_ptr, it supports the indirect member-access operator ->, which you can use to access a field in the general case. However, for the specific case where the field name is length, the receiver may be of object or array type. We will handle that case in Phase 5.

For array indexing, you will find the library template uc_array_index() useful.

For unary and binary operations, you should place parentheses around the expression in order to preserve the precedence and associativity encoded in the structure of the AST. For example, consider the following uC code:

(3 - 4) * 5


This results in the AST structure in Figure 1.

Figure 1 AST structure that encodes associativity and precedence.

The parenthesization is encoded in the structure of the AST itself, so that 3 and 4 are grouped together under the MinusNode. In order to preserve this in the generated code, emit parentheses around every binary and unary operation (that has a C++ counterpart):

((3 - 4) * 5)


In uC, addition is a polymorphic operation, as it can be applied to numerical types as well as strings. Thus, its implementation is deferred to Phase 5.

Use the uc_id() function template in expr.h to implement the ID operator (#).

For array push and pop operations, you will find the uc_array_push() and uc_array_pop() library templates useful.

Modify gen_function_defs() in ucbackend.py to execute the code for this phase.

# Phase 5: Polymorphic Operations¶

The final compiler phase is to write implementations for two polymorphic operations, accessing the length field of a receiver and adding two items.

In a field access, the receiver may be of object or array type if the name of the field is length. In order to support this, we implement library overloads for uc_length_field() and rely on the C++ compiler’s overload resolution to select the correct one. We provide an overload for object types in the distribution file expr.h. Your task is to write the overload for array types. You should then generate code to call uc_length_field() when the field name is length:

args.length


This results in:

uc_length_field(UC_VAR(args))


An addition is also polymorphic, as the operands may be both of numerical type, both of string type, or one of string type and the other of numerical or boolean type. To support this, define overloads for uc_add() in expr.h. Then generate code to call uc_add() for an addition. You may find std::to_string() useful for converting numerical types to strings. However, you cannot use it for booleans, which should be converted to the strings true or false, not 1 and 0. Instead, you will need to define specific uc_add() overloads for booleans that perform the correct string conversion.

Do not repeat code if not necessary for the overloads in this phase. Instead, use templates where possible. For reference, our solution has two overloads for uc_length_field(), including the one provided in the distribution, and six for uc_add().

You may run into a case where the C++ compiler considers a call to be ambiguous because two or more overloads are equally applicable. In such a case, you can resolve the ambiguity by providing a more specialized overload that will be preferred over the ambiguous ones.

# Phase 6 (Optional): Writing a uC Program¶

Implement a simulation of Conway’s Game of Life in uC. Your implementation should be on a finite grid. Edge cells have fewer neighbors but should otherwise follow the same rules as any other cell.

Complete the implementation of the simulate() function in life.uc. You may define any helper functions and structs you need.

When printing the grid, you should first print a line consisting of $$cols + 2$$ dashes (-). Each row should then start and end with a pipe (|), each live cell should be printed as an asterisk (*), and each dead cell should be printed as a space. Finally, print another line consisting of $$cols + 2$$ dashes followed by another empty line.

We have provided a test case in the main() function and the expected output in life_test.out. Compile and run your code with:

$python3 ucc.py -C life_test.uc$ g++ --std=c++17 -I. -o life.exe life.cpp
$./life.exe  Alternatively, you can use the Makefile to compile and test life.uc: $ make life


This phase is optional and will not be graded.

# Testing and Evaluation¶

We have provided a small number of test cases in the distribution code. However, the given test cases only cover a small number of possible uC constructs, so you should write extensive tests of your own.

We will evaluate your compiler based on whether the generated code is a valid C++ translation whose behavior matches that expected from the uC source. Thus, we will run a valid uC source file through your compiler to produce the output C++ code. For Phases 1, 2, and 3, we will combine your output with our own C++ test code that relies on the output being correct. For Phases 1 and 2, we will compile with g++ -c, without linking, to ensure that the declarations are correct. For Phase 3, our test code will use assert statements to test that your generated type definitions are correct. For Phases 4 and 5, we will compile your generated code with g++, and run it and check that the output matches what is expected.

Your generated C++ code itself does not have to exactly match ours. However, its behavior must match the behavior of our C++ code.

We will not test your compiler with erroneous uC code.

For Phases 4 and 5, we will run your generated code through valgrind to ensure that fields are properly default initialized. The provided Makefile invokes valgrind if it is present. This can be disabled (e.g. if your valgrind installation is broken) by setting VALGRIND="":

\$ make phase5 VALGRIND=""


The grade breakdown for this project is as follows:

• 20% checkpoint

All code that you write for the interpreter must be placed in the files ucbackend.py, ucbase.py, ucexpr.py, ucfunctions.py, ucstmt.py, uctypes.py, or expr.h. We will test all seven files together, so you are free to change interfaces that are internal to these files. You may not change any part of the interface that is used by ucc.py or ucfrontend.py.
Submit all of ucbackend.py, ucbase.py, ucexpr.py, ucfunctions.py, ucstmt.py, uctypes.py, expr.h, and any of your own test files to the autograder before the deadline. We suggest including a README.txt describing how to run your test cases.