# Simulation and Test Benches

A significant portion of the language is dedicated to test benches and testing. In this chapter we will cover some commonly used techniques for writing efficient test benches for your hardware designs.

## 6.1 How SystemVerilog Simulator Works

Before we delve into the details of how to write a proper test bench, we need to establish a deep understanding of how a simulator works and how it schedules events. This will help us troubleshoot bugs and errors in the future.

A specification-compliant SystemVerilog simulator follows a discrete event execution model, where the simulation time advances with value updates. The hardware design is inherently parallel, where processes such as always_comb and always_ff are executed concurrently. Each time the value of a net/variable changes, we have an update event, and any processes that are sensitive to that event need to be evaluated as well, which is called an evaluation event. At each “timestamp”, the simulator first computes the update events, then runs the resulting evaluation events, and loops back to see whether these evaluations have triggered more update events.

The term for “timestamp” in SystemVerilog is simulation time. It can be transformed back to real time using the timescale compiler directive introduced earlier in the book. We use simulation time, or simply time throughout the entire chapter to avoid confusion.

Although the design and test bench are parallel by nature, most simulators are single-threaded and follow certain rules when evaluating the code to ensure the result is conceptually correct. Typically the simulator divides each time slot into multiple regions where events are scheduled in a predefined order. Within each region, the events can be scheduled arbitrarily, allowing the simulator to perform optimizations when it sees fit. Figure 12 shows how the time slot is divided into different regions and the execution flow between different regions.

PLI regions will be discussed in much more detail later in the book. For now it is enough to know there are regions reserved for third-party libraries that can be loaded into the simulator and can have direct access to the simulator state.

Fully covering each region requires lengthy detail, and readers are encouraged to read through the language LRM and even try to implement a simple interpreter-based simulator. We will focus on three major regions: the active events region, the inactive events region, and the NBA events region.

Generally speaking, any events (e.g. blocking assignments) specified in always_comb blocks and continuous assignments are evaluated in the active events region. The simulator continues to evaluate the events in the active events region in a loop until no events are left in the region. If there is an explicit zero-delay timing control, e.g. #0, in the process, the process is suspended and the following events are scheduled into the inactive events region. Again, the simulator runs in a loop to clear out the events in the inactive events region.

The NBA events region contains nonblocking assignment updates. It is only executed after the preceding active and inactive regions are cleared.
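The ordering of the three regions can be observed with a small experiment. The sketch below is illustrative only: the module name is made up, and it relies on $strobe, which is not discussed above and prints at the very end of the time slot, after the NBA region has run.

```systemverilog
module region_order_ex;
    logic b;

    initial begin
        // the update of b is scheduled into the NBA region
        b <= 1'b1;
        // still in the active region: b is expected to be x here
        $display("active region: b = %b", b);
        // #0 suspends the process and resumes it from the inactive region
        #0;
        // the NBA region has not run yet, so b is expected to still be x
        $display("after #0:      b = %b", b);
        // $strobe prints at the end of the time slot, after the NBA update
        $strobe("end of slot:   b = %b", b);
    end
endmodule
```

On a compliant simulator the two $display calls should report x and only the $strobe call should report 1, since the nonblocking update is applied after the active and inactive regions are empty.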

### 6.1.1 Simulation order

The SystemVerilog LRM guarantees a certain scheduling order. Any simulator that claims to be standard compliant must obey the following execution order:

1. Statements within a begin-end block shall be executed in lexical order, i.e., the order in which they appear in the source code.
2. NBAs shall be performed in the order in which the statements were executed.

To understand the second requirement, let’s consider the following example:


    logic a;
    initial begin
        a <= 0;
        a <= 1;
    end

At the end of the time slot, variable a is first assigned 0 and then 1, so its final value is 1.

As one might suspect, such ordering poses a hard restriction on reordering-related compiler optimizations. Simulator vendors typically employ different types of optimization that preserve the semantics, but not necessarily the actual ordering of execution. For instance, if no third-party entity is expected to observe the exact simulation order (e.g. a debugger that allows stepping through the code), the simulator can reorder the statements as long as they are side-effect free and the result matches the ordering semantics. This significantly speeds up the simulation but requires extra flags if users wish to debug and step through the code, e.g. the -line_debug flag in Xcelium. Verilator, on the other hand, only offers a reordered simulation order for the sake of performance. As a result, it is not standard compliant.

The SystemVerilog LRM, however, does not specify the order in which processes are evaluated. As a result, it is up to the simulator to decide which process to execute first. This introduces nondeterminism among simulators. Another source of nondeterminism comes from the fact that the simulator may suspend a process and place the partially completed events as pending events in the event region whenever it encounters a timing control statement. This typically happens in the test bench rather than in the RTL design, since synthesizable RTL disallows timing control except for always_ff.
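A minimal sketch of the first kind of nondeterminism is shown below; the module name is made up for illustration. Both initial processes are scheduled at time 0, and the LRM does not say which one runs first.

```systemverilog
module race_ex;
    logic a;

    // process 1: writes a at time 0
    initial a = 1'b1;

    // process 2: reads a at time 0
    // if this process runs first it sees x, otherwise it sees 1
    initial $display("a = %b", a);
endmodule
```

Either result is legal, so two compliant simulators (or the same simulator with different optimization settings) may disagree. Test benches that rely on such ordering should synchronize explicitly instead.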

## 6.2 Timing Controls

Timing is one of the most important factors to consider when writing a test bench. Should the signal be stable before the clock edge, or how long should the signal be valid for? What does delay mean? This section will cover various aspects of timing controls.

The compiler directive `timescale specifies the time unit and the precision used by the modules that follow it. Since different modules may have different timescales, the simulator needs to make a decision on how to represent simulation time. In most simulators, in fact in any simulator that supports the VPI standard (discussed later), simulation time is represented as an unsigned 64-bit integer, even though the RTL model may expect the time to be a real number. To do so, time is rounded off to the specified precision and then scaled to the simulation time units. Consider the following example:

    `timescale 1ns/10ps
    module A;
        logic a;
        initial begin
            #1.2 a = 1;
        end
    endmodule

    `timescale 1us/10ns
    module B;
        logic b;
        initial begin
            #3.4 b = 1;
        end
    endmodule

For all modules, 10ps is the finest precision, so 1 simulation time unit corresponds to 10ps. Before we convert every delay into simulation time, we first round the delay to the module’s precision. So 1.2 in module A becomes $$1.2ns = 120 \times 10ps$$, i.e. 120 10-picosecond units; 3.4 in module B becomes $$3.4us = 340 \times 10ns$$, i.e. 340 10-nanosecond units. Then we scale everything into simulation time. Hence 1.2 in module A becomes 120 simulation time units and 3.4 in module B becomes 340000 simulation time units, each of which is 10 picoseconds.

To obtain the simulation time, we can use $time, which can be printed out either via %d or %t in the $display function.
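As a small illustration, the sketch below prints the time after a fractional delay. The module name is made up, and $realtime is a standard system function that is not discussed above; it is included only to contrast with the integer value returned by $time.

```systemverilog
`timescale 1ns/10ps
module time_print_ex;
    initial begin
        #1.2;
        // $time returns an integer, rounded to the time unit of this module (1ns)
        $display("integer time:   %0d", $time);
        // $realtime keeps the fractional part; %t formats it as a time value
        $display("formatted time: %t", $realtime);
    end
endmodule
```

Because $time is rounded to the module's time unit, it loses the fractional 0.2ns here. When sub-unit delays matter in a message, $realtime with %t is usually the safer choice.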

The most common usage of timing control is setting the clock. A standard code style is shown below:

    module top;
        logic clk;

        initial clk = 0;
        always clk = #10 ~clk;

    endmodule

Notice that the clock changes its value every 10 units of time, hence the clock period is 20 units of time. Because this always block runs forever, we have to terminate the simulation with the builtin SystemVerilog task $finish, as shown below:

    initial begin
        // test bench logic
        $finish;
    end

To synchronize the values against the clock, we strongly discourage setting delays by hand, which is error-prone and reduces readability. Instead, we recommend using event controls (@) directly. Here is an example:

    initial begin
        input1 = 1;
        input2 = 2;

        @(posedge clk);

        input1 = 2;
        input2 = 3;

        @(posedge clk);
    end

In this way, we are guaranteed that signals input1 and input2 are set before the rising edge of the clock signal, regardless of the clock period! If you have checking/assertion logic, you can place it after the negative edge of the clock, assuming no synchronous logic in your design depends on the negative edge of the clock (dual-edge triggering typically appears only in some high-performance designs), as shown below:

    initial begin
        // input logic
        input1 = 1;
        @(posedge clk);
        @(negedge clk);
        // checking logic
        assert (output1 == 1);
        // input logic
        input1 = 2;
        @(posedge clk);
        @(negedge clk);
        // checking logic
        assert (output1 == 2);
        //...
    end

We will discuss more complex but reusable test bench design patterns later in the chapter.

### 6.2.1 Fork and Join

Because hardware is inherently concurrent, in many cases we want to have multiple threads performing tasks at the same time, either driving or checking different parts of the design. SystemVerilog offers fork and join semantics similar to those of software programming languages, e.g. std::thread in C++.

The general syntax for fork and join is shown below. Notice that each statement inside the fork join is an individual thread, so if you want complex logic, you need to enclose it in a begin-end block.

    fork
        // one statement (or begin-end block) per thread
    join

Here is a simple example to illustrate how to use fork and join:

    module fork_join_ex;
        initial begin
            fork
                #10 $display("Thread 1 finished at %t",$time);
                begin
                    #5 $display("Thread 2 finished at %t",$time);
                end
                #20 $display("Thread 3 finished at %t",$time);
            join
        end
    endmodule

Running the file (code/06/fork_join_ex.sv) with xrun, we get:

    Thread 2 finished at                    5
    Thread 1 finished at                   10
    Thread 3 finished at                   20

Notice that you can even have nested fork join, i.e. one thread can spawn multiple threads as well. Although the fork join semantics is similar to software programming languages, there are some properties we need to keep in mind:

1. All statements are executed concurrently, regardless of whether it is simulated on a single CPU core or not.
2. Timing controls are local to each fork block and are computed relative to the simulation time when entering the block.
3. It is always a good practice to name the fork block, especially when you’re creating variables inside, as shown below:

        fork
            begin: blk_1
                // logic
            end: blk_1
            begin: blk_2
                // logic
            end: blk_2
        join

4. Since fork and join are part of SystemVerilog’s timing control, a blocking fork-join is not allowed inside a function. You need to use a task instead.
5. Any objects declared inside the fork-join block are managed by the simulator, so we don’t need to worry about dangling references or memory leaks. However, they should be declared as automatic so that they are local to the block.
6. You cannot put fork-join inside always_comb.
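The second, third, and fourth properties can be seen together in a short sketch. The module, task, and block names below are made up for illustration.

```systemverilog
module fork_timing_ex;
    // fork-join lives in a task, since a blocking fork is not allowed in a function
    task automatic drive_and_check();
        fork
            begin: drive
                // delay is relative to the time the fork is entered
                #3 $display("drive finished at %0t", $time);
            end: drive
            begin: check
                #7 $display("check finished at %0t", $time);
            end: check
        join
    endtask

    initial begin
        #10;                 // the fork is entered at time 10
        drive_and_check();   // returns once both named blocks have finished
        $display("task returned at %0t", $time);
    end
endmodule
```

Since the fork is entered at time 10, the two blocks are expected to finish at times 13 and 17, and the task returns at time 17 because join waits for both.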

#### 6.2.1.1 Different Join Semantics

There are three different join keywords we can use in SystemVerilog and each has different semantics:

• join: this keyword blocks the execution until all the forked processes finish. This is similar to join() in software threads.
• join_any: this keyword blocks until any of the forked processes finishes. As a result, some processes may still be running when the execution of the main thread continues.
• join_none: this keyword does not block. The main thread continues immediately while the forked processes keep executing in the background.
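The difference between join_any and join_none can be sketched as follows. The module name is made up, and the example uses the standard wait fork statement, which is not covered above, to block until all child threads of the current process have finished.

```systemverilog
module join_variants_ex;
    initial begin
        fork
            #5  $display("fast thread finished at %0t", $time);
            #20 $display("slow thread finished at %0t", $time);
        join_any
        // resumes as soon as the fast thread is done; the slow one keeps running
        $display("join_any released at %0t", $time);

        // block until every remaining child thread has finished
        wait fork;
        $display("all threads finished at %0t", $time);

        fork
            #10 $display("background thread finished at %0t", $time);
        join_none
        // join_none does not block, so no simulation time has passed here
        $display("join_none released at %0t", $time);

        wait fork;
    end
endmodule
```

With these delays, join_any is expected to release at time 5, wait fork at time 20, and the join_none message should appear at time 20 as well, ten time units before the background thread finishes.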

## 6.3 Standard Data Structures

SystemVerilog introduces many common data structures to help designers build complex test logic. The interfaces of these data structures are heavily influenced by the C++ standard library. We will take a quick look at some commonly used data structures. Interested readers should refer to the LRM for more information. Keep in mind that none of the data structures introduced in this section are synthesizable, as with any construct discussed in this chapter.

### 6.3.1 Dynamic Array

Most arrays in SystemVerilog are fixed-size and their dimensions cannot be changed at run time. A dynamic array, as its name suggests, is an unpacked array whose size can be changed at run time. To declare a dynamic array, we can use the following syntax:

    // data_type name[];
    integer a[];
    logic[15:0] b[];

You can also combine it with other arrays, as shown below, which declares a fixed-size array of dynamic arrays.

    integer a[1:0][];

To initialize the dynamic array, we can use the keyword new with the targeted dimension:

    integer a[];
    a = new[10];

Keep in mind that even though we have allocated the dynamic array, its elements have not been assigned yet and hold the default value of their type. As a result, you can get x when reading the element values of a 4-state type such as integer.

To loop through the array, we can simply do

    integer a[];
    a = new[4];
    foreach (a[i]) begin
        $display("a[%0d] = %0d", i, a[i]);
    end

Notice that we implicitly create an index variable i with the foreach keyword.

Below is a list of methods associated with the dynamic array:

• size(): in addition to the standard system function $size(), the dynamic array has a method that returns the size of the array.
• delete(): clears all the elements, leaving an empty array.
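The following sketch puts allocation, iteration, and the two methods together. The module name is made up, and the resize form new[8](a), which copies the old contents into the newly allocated array, is standard syntax that is not covered above.

```systemverilog
module dyn_array_ex;
    // a 2-state int is used so that unassigned elements read 0 instead of x
    int a[];

    initial begin
        a = new[4];
        foreach (a[i]) a[i] = i * 10;

        // grow to 8 elements while keeping the first 4 values
        a = new[8](a);
        $display("size = %0d, a[3] = %0d, a[7] = %0d", a.size(), a[3], a[7]);

        // release the storage; the array becomes empty
        a.delete();
        $display("size after delete = %0d", a.size());
    end
endmodule
```

After the resize, a[3] still holds 30 while the new element a[7] holds the default value 0, and size() reports 0 after delete().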

### 6.3.2 Queue

A queue is SystemVerilog’s equivalent of vector in C++. To declare a queue, we can use the following syntax:

    // type name[$];
    string names[$];
    integer values[$];

Like normal arrays, a queue supports slicing operations:

• Like the usual slicing operator, the indexing is inclusive, that is, queue[a:b] returns b - a + 1 elements.
• If the slicing is out of range or malformed, e.g., queue[1:0], an empty queue is returned.
• If any 4-state value containing x or z is used for slicing, an empty queue is returned.

Looping through the queue is the same as looping through dynamic arrays:

    integer a[$];
    foreach (a[i]) begin
        $display("a[%0d] = %d", i, a[i]);
    end

Below is a list of methods associated with the queue:

• size(): in addition to the standard system function $size(), size() returns the size of the queue.
• delete(index): deletes the element at the given index; if index is not provided as a function argument, clears the queue.
• insert(index, value): inserts the value at the given index.
• push_back(value): puts the element at the end of the queue.
• pop_back(): removes and returns the last element of the queue. If the queue is empty, the default value for the data type is returned and a warning may be issued.
• push_front(value): puts the element at the front of the queue.
• pop_front(): removes and returns the first element of the queue. If the queue is empty, the default value for the data type is returned and a warning may be issued.
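A short sketch exercising these methods is shown below; the module and variable names are made up for illustration, and the comments track the queue contents after each call.

```systemverilog
module queue_ex;
    int q[$];
    int s[$];
    int front, back;

    initial begin
        q.push_back(2);      // q = {2}
        q.push_back(3);      // q = {2, 3}
        q.push_front(1);     // q = {1, 2, 3}
        q.insert(1, 9);      // q = {1, 9, 2, 3}
        q.delete(1);         // q = {1, 2, 3}

        // inclusive slice: indices 0 and 1, i.e. two elements
        s = q[0:1];
        $display("slice size = %0d", s.size());

        front = q.pop_front();   // returns 1, q = {2, 3}
        back  = q.pop_back();    // returns 3, q = {2}
        $display("front = %0d, back = %0d, size = %0d", front, back, q.size());
    end
endmodule
```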

### 6.3.3 Associative Array

An associative array is SystemVerilog’s equivalent of map containers in C++. The index expression can be any legal SystemVerilog type and the size of the container grows as more elements are inserted. To declare an associative array, we can use the following syntax:

    // data_type name [index_type]
    integer array1[string];
    logic[15:0] array2[ClassA]; // ClassA is a class
    // * implies any integral expression of any size
    // more details below
    logic array3[*];

SystemVerilog supports using * as a wildcard for index type with the following restrictions:

1. The index type must be an integral type, but indices can be of different sizes. The “true value” is used for indexing; that is, SystemVerilog needs to resolve two values with different sizes to the same index location if their values match.
2. 4-state values with x and z are illegal.
3. Non-integral index types/values are illegal and will result in an error.
4. Strings can be used, but will be cast to integral values.

To initialize the associative array when declaring it, we can use the following syntax. Note the apostrophe in front of the opening brace, which marks an assignment pattern:

    string map[integer] = '{0: "a", 1: "b"};

Similar to other data structures, we can loop through the associative array using the foreach keyword:

    string map[integer] = '{0: "a", 1: "b"};
    foreach (map[key]) begin
        string value = map[key];
    end
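As a further illustration, the sketch below uses a string-indexed associative array as a simple counter table. The module and variable names are made up, and the methods exists(), num(), and delete() are standard associative array methods that are not covered above.

```systemverilog
module assoc_ex;
    // number of observed operations, keyed by operation name
    int count[string];

    initial begin
        // entries are created on first assignment
        count["read"]  = 3;
        count["write"] = 5;

        // exists() checks for a key without creating an entry
        if (!count.exists("erase"))
            $display("no entry for erase");

        foreach (count[key])
            $display("%s -> %0d", key, count[key]);

        // remove a single entry, then report how many are left
        count.delete("read");
        $display("entries left: %0d", count.num());
    end
endmodule
```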

                xact.b = $random();
                this.gen2driver.put(xact);
            end
        endtask
    endclass

Notice that we have a public task main() that’s used to produce input transactions. This task will be called inside the test environment.

### 6.5.3 Driver Design

The role of the driver is to serialize the input stimulus onto the interface bus. Unlike the generator, it needs to understand the interface protocol our dut is using, in this case, a simple ready-valid handshake. It pulls the transaction from the mailbox used by the generator, and then drives the net, as shown in the code below (code/06/mult_driver.sv):

    class mult_driver;
        mailbox gen2driver;
        // virtual interface handle
        virtual mult_io_interface.driver driver;
        GeneratorXact xact;

        function new(mailbox gen2driver, virtual mult_io_interface.driver driver);
            this.gen2driver = gen2driver;
            this.driver = driver;
        endfunction

        task reset();
            // reset the driver interface
            wait (!driver.rst_n);
            driver.a = 0;
            driver.b = 0;
            driver.valid_in = 0;
            driver.ready_in = 0;
            wait (driver.rst_n);
        endtask

        // entry point
        task main();
            // loop forever
            // we are always ready to receive data
            driver.ready_in = 1'b1;
            forever begin
                this.gen2driver.get(xact);
                // drive the bus. need to make sure that the dut is ready
                // block until we have successfully put one transaction in
                while (1) begin
                    @(posedge driver.clk);
                    if (driver.ready_out) begin
                        // dut is ready
                        driver.a = xact.a;
                        driver.b = xact.b;
                        driver.valid_in = 1'b1;
                        break;
                    end
                    else begin
                        driver.valid_in = 1'b0;
                    end
                end
            end
        endtask
    endclass

Notice that in addition to the main() task, we have a reset task that’s responsible for initializing the dut. We also need to obey the ready-valid protocol, that is, we shall wait until the dut is ready, otherwise we will hold the pending transaction and wait. The driver does not need to know details such as the total number of transactions. All it does is take one transaction from the mailbox (if any), and then drive the interface.

Also notice that we use a new syntax using the keyword virtual to get the reference to the interface. Doing so allows us to directly set values on the interface as if the interface is an object.

### 6.5.4 Monitor Design

The monitor taps into the interface bus and de-serializes signals into the high-level transaction class. Similar to the driver, it needs to understand the interface protocol and then put the transaction object into a mailbox shared with the scoreboard. Because it usually takes multiple cycles to complete the data collection, the monitor typically has internal state to store information. Below is the monitor that listens to the interface ports and gathers data when the dut is ready (code/06/mult_monitor.sv):

    class mult_monitor;
        mailbox monitor2score;
        ScoreBoardXact xact;
        // virtual interface handle
        virtual mult_io_interface.monitor monitor;

        function new(mailbox mb, virtual mult_io_interface.monitor monitor);
            this.monitor2score = mb;
            this.monitor = monitor;
        endfunction

        // entry point
        task main();
            forever begin
                xact = new();
                @(posedge monitor.clk);
                wait (monitor.valid_in);
                // grab signals from the bus
                xact.a = monitor.a;
                xact.b = monitor.b;
                @(posedge monitor.clk);
                // wait until valid out is high
                wait (monitor.valid_out);
                // grab the output from the bus
                xact.lo = monitor.lo;
                xact.hi = monitor.hi;
                // put it into the mailbox
                monitor2score.put(xact);
            end
        endtask
    endclass

Similar to the driver, it uses a mailbox to interact with the scoreboard. It waits for the input valid signal to go high, then grabs the input signals a and b. It then waits until the valid output goes high, i.e. the dut has successfully computed the output, and grabs the outputs lo and hi. Once we have everything we need for the packet, we assemble the packet and put it into the mailbox. Notice that the monitor does not care about whether the output of the computation is correct or not!

### 6.5.5 Scoreboard Design

Once we have a complete packet, we can compare the output against the model we have. We can also figure out if there is any packet missing or malformed. The scoreboard typically interfaces with a high-level functional model that’s written in either C/C++ or SystemVerilog. We will cover how to interface with a C/C++ model later. For now we will simply compute the gold output in SystemVerilog. Below is the scoreboard for our multiplier (code/06/mult_scoreboard.sv):

    class mult_scoreboard;
        mailbox monitor2score;
        int num_xact;
        ScoreBoardXact xact;
        logic[31:0] lo, hi;

        function new(mailbox mb);
            this.monitor2score = mb;
            this.num_xact = 0;
        endfunction

        task main();
            forever begin
                monitor2score.get(xact);
                // assertion part
                // simplified
                this.num_xact++;
                {hi, lo} = xact.a * xact.b;
                assert (hi == xact.hi);
                assert (lo == xact.lo);
            end
        endtask
    endclass

Notice that we also keep track of the number of transactions, in case there is some protocol bug that drops transaction packets. Once we get the transaction from the mailbox, we simply compute the gold output and assert the result.

### 6.5.6 Test Environment Setup

Now that we have all the major components written, the next step is to set up the test environment. The role of the environment is to instantiate and run the test suites. Below is an example of a test environment (code/06/mult_env.sv):

    class mult_env;
        // instances
        mult_generator gen;
        mult_driver driver;
        mult_monitor monitor;
        mult_scoreboard scoreboard;

        // mailboxes
        mailbox gen2driver;
        mailbox monitor2score;

        function new(int num_xact, virtual mult_io_interface io);
            // initial mail box first
            this.gen2driver = new();
            this.monitor2score = new();

            this.gen = new(gen2driver, num_xact);
            this.driver = new(gen2driver, io.driver);
            this.monitor = new(monitor2score, io.monitor);
            this.scoreboard = new(monitor2score);
        endfunction

        task reset();
            this.driver.reset();
        endtask

        task test();
            fork
                gen.main();
                driver.main();
                monitor.main();
                scoreboard.main();
            join_any
        endtask

        task finish();
            wait(gen.num_xact == scoreboard.num_xact);
        endtask

        task run();
            reset();
            test();
            finish();
            $finish();
        endtask
    endclass

In the test environment, we instantiate the test components as well as the mailboxes. Notice that the constructor takes the full interface and uses modports when instantiating test components. The main test task, test(), uses fork so each component runs concurrently. We finish the test when the number of transactions received in the scoreboard equals the number generated by the generator. The entry task is run(), which first resets the dut, then calls test(), and eventually finish().

To use the test environment, we need the following test bench code (code/06/mult_top):

    module mult_top;

        // env
        mult_env env;
        // interface
        logic clk, rst_n;
        // num of xacts
        localparam num_xact = 42;

        mult_io_interface io(.*);
        // dut
        mult_ex dut (.clk(io.clk),
                     .rst_n(io.rst_n),
                     .a(io.a),
                     .b(io.b),
                     .hi(io.hi),
                     .lo(io.lo),
                     .valid_in(io.valid_in),
                     .valid_out(io.valid_out)
                     );

        // clocking
        initial clk = 0;
        always clk = #5 ~clk;

        // reset sequence
        initial begin
            rst_n = 1;
            #1;
            rst_n = 0;
            #1;
            rst_n = 1;
        end

        // start the test
        initial begin
            env = new(num_xact, io);
            env.run();
        end

        // in case of bug, terminate after certain times
        initial #(num_xact * 10 * 5) $finish;

    endmodule

The test bench top drives the clock as well as the reset signal. Notice that to avoid an infinite loop when we have a missing packet (in which case the end condition will never trigger), we set a termination condition based on the number of cycles run.
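The same safety net can also be written with the fork constructs from earlier in the chapter. The sketch below is one possible alternative, not the code used in the test bench above: the module name, the run_test() task (a stand-in for env.run() that never finishes), and the 1000-unit budget are all made up for illustration, and disable fork is a standard statement that is not covered above.

```systemverilog
module timeout_ex;
    // hypothetical stand-in for env.run(); it hangs forever on purpose
    task automatic run_test();
        forever #10;
    endtask

    initial begin
        fork
            // branch 1: the actual test
            run_test();
            // branch 2: the watchdog
            begin
                #1000;
                $display("watchdog expired at %0t", $time);
            end
        join_any
        // whichever branch is still running gets terminated here
        disable fork;
        $finish;
    end
endmodule
```

Compared with a separate initial block, this keeps the test and its time budget in one place, and the code after join_any can tell which branch ended first if it needs to report a failure.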