Overview

The **LOOPGEN** IP collection provides fast hardware architectures for implementing nested loop structures. The collection comprises three architectures (variants) that adhere to a common I/O interface: **HWLU**, a mixed-level structural/RTL architecture; **IXGENB**, a behavioral-level description; and **IXGENR**, a high-performance, pure RTL description.

LOOPGEN can be used for data-intensive processing on multi-dimensional data. Each update of the iteration vector is evaluated with zero-cycle overhead.

Functional description

Each core uses a single external clock source, connected to the CLK signal, and can be reset with the active-high RESET signal. INNERLOOP_END indicates that the innermost loop computations for the current iteration vector have completed. LOOP_COUNT and STRIDE provide the loop bound and step value for each loop. The loops are assumed to be numbered from 1 to NLP, with the NLP-th being the innermost one. The INDEX output vector holds the current iteration vector. DONE signals the end of the computation for the entire loop structure.

The LOOPGEN IP can be viewed as implementing the following C-like looping structure in hardware:

```c
for (i1 = 0; i1 < loop_count1; i1 += stride1) {
    for (i2 = 0; i2 < loop_count2; i2 += stride2) {
        /* ... intermediate loops i3 .. i(n-1) are nested in the same way ... */
        for (in = 0; in < loop_countn; in += striden) {
            // innermost loop computations
        }
    }
}
```
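The iteration-vector update implied by this loop nest can be sketched as a small software model. This is only an illustrative sketch, not the VHDL implementation: the type `loopgen_t`, the functions `loopgen_reset` and `loopgen_step`, and the fixed `NLP` value of 3 are hypothetical names chosen for the example. It assumes that each call to `loopgen_step` corresponds to one assertion of INNERLOOP_END, that all strides and loop counts are positive, and that the index vector returns to zero when DONE is raised (the source does not specify the INDEX value at that point).

```c
#include <assert.h>
#include <stdbool.h>

#define NLP 3 /* hypothetical number of nested loops for this sketch */

/* Software model of the loop generator state (names are illustrative). */
typedef struct {
    int  index[NLP]; /* models the INDEX output; index[NLP-1] is the innermost loop */
    bool done;       /* models the DONE output */
} loopgen_t;

/* Models the effect of RESET: all indices start at 0 and DONE is cleared. */
void loopgen_reset(loopgen_t *g)
{
    for (int k = 0; k < NLP; k++) {
        g->index[k] = 0;
    }
    g->done = false;
}

/* Models one INNERLOOP_END event: advance the innermost index by its stride
 * and carry into the outer loops whenever a loop bound is reached. */
void loopgen_step(loopgen_t *g, const int loop_count[NLP], const int stride[NLP])
{
    if (g->done) {
        return; /* nothing left to iterate */
    }
    for (int k = NLP - 1; k >= 0; k--) {
        assert(stride[k] > 0 && loop_count[k] > 0); /* assumption of this sketch */
        g->index[k] += stride[k];
        if (g->index[k] < loop_count[k]) {
            return; /* this loop still has iterations left */
        }
        g->index[k] = 0; /* wrap and carry into the next outer loop */
    }
    g->done = true; /* the outermost loop wrapped: whole nest finished */
}
```

With loop counts {2, 3, 4} and strides {1, 1, 2}, for instance, the model steps through (0,0,0), (0,0,2), (0,1,0), and so on, and raises `done` once the last iteration vector (1,2,2) has been consumed.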

**FEATURES**

- Three distinct architectures provided: mixed-level structural/RTL, pure RTL, and high-level/behavioral
- Support for any number of loops and datapath bitwidth (configured as generics)
- Compatible with the IEEE-1076 standard
- Uses the standard IEEE packages (numeric_std)
- Tested with large numbers of loops and large data bitwidths
- Simple block-level interface for bus-level integration to third-party designs

**DELIVERABLES**

- Documentation in ASCII text, PDF, and HTML formats
- Vendor-independent VHDL code for all architectures
- Configurable testbench
- Simulation scripts (GHDL, ModelSim) and synthesis scripts (Xilinx XST/ISE)

**Performance/QoR**

<table>
<thead>
<tr>
<th>IP architecture</th>
<th>Clock freq. (MHz)</th>
<th>Area (LUTs/registers)</th>
<th>Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>HWLU</td>
<td>216</td>
<td>326 (0%)/81 (0%)</td>
<td>4.62 ms</td>
</tr>
<tr>
<td>IXGENR</td>
<td>243</td>
<td>242 (0%)/81 (0%)</td>
<td>4.11 ms</td>
</tr>
<tr>
<td>IXGENB</td>
<td>201</td>
<td>409 (0%)/81 (0%)</td>
<td>4.96 ms</td>
</tr>
</tbody>
</table>
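As a rough cross-check of the figures above, the execution time can be converted into a clock-cycle count. The sketch below assumes that the clock frequency is given in MHz and that Time is the total run time of the benchmark loop nest; the function name `loopgen_cycles` is hypothetical.

```c
/* Approximate clock cycles for a run:
 * cycles = f [MHz] * 1e6 * t [ms] * 1e-3 = f * t * 1e3 */
double loopgen_cycles(double freq_mhz, double time_ms)
{
    return freq_mhz * time_ms * 1000.0;
}
```

Under these assumptions, all three rows come out at roughly one million cycles (for example, 216 × 4.62 × 1000 ≈ 998,000 for HWLU), which suggests that the variants differ in achievable clock frequency rather than in the number of cycles needed.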