# FPGA-Based Hardware Accelerators for 10/40 GigE TCP/IP and Other Protocols

Dr. Endric Schubert, Univ. Ulm / Missing Link Electronics
Ulrich Langenbach, Fraunhofer Heinrich-Hertz-Institute

#### We are

a Silicon Valley-based technology company with offices in Germany. We are a partner of leading electronic device and solution providers and have been enabling key innovators in the automotive, industrial, and test & measurement markets to build better Embedded Systems, faster.

#### **Our Mission is**

To develop and market technology solutions for Embedded Systems Realization via pre-validated IP and expert application support, and to combine off-the-shelf FPGA devices with Open-Source Software for dependable, configurable Embedded System platforms.

#### **Our Expertise is**

I/O connectivity and acceleration of data communication protocols, additionally opening up FPGA technology for analog applications, and the integration and optimization of Open Source Linux and Android software stacks on modern extensible processing architectures.



#### Network Processing at 10 GigE has a Huge Compute Burden ...

Rule of thumb: transporting 1 bit per second of network traffic needs about 1 Hz of CPU clock.

- 1 GigE → 1 CPU at 1 GHz
- 10 GigE → 4 CPUs at 2.5 GHz
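The rule of thumb above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes exactly 1 Hz of CPU clock per bit per second and perfect scaling across cores, both of which are simplifications; the function name `cores_needed` is purely illustrative.

```cpp
#include <cassert>
#include <cmath>

// Rule of thumb from the slide: about 1 Hz of CPU clock per 1 bit/s of traffic.
// Returns how many cores of the given clock rate are needed for a line rate,
// assuming (optimistically) that the load spreads perfectly across cores.
int cores_needed(double line_rate_gbps, double core_clock_ghz) {
    assert(line_rate_gbps > 0.0 && core_clock_ghz > 0.0);
    const double required_ghz = line_rate_gbps * 1.0;  // 1 Hz per bit/s
    return static_cast<int>(std::ceil(required_ghz / core_clock_ghz));
}
```

Under these assumptions, `cores_needed(10, 2.5)` gives 4, matching the figure above, while `cores_needed(25, 2.5)` gives 10 and `cores_needed(40, 2.5)` gives 16.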







#### ... and soon we will see 25 GigE





# **Design Choices for Network Processing in SoC FPGAs**

#### SoC FPGA as (yet) another computer

|         | Intel<br>i7-4770 | Xilinx<br>Zynq 7045              |
|---------|------------------|----------------------------------|
| Compute | ~100 GFLOPS      | 5 GFLOPS (PS)<br>778 GFLOPS (PL) |
| TDP     | 84 W             | <20 W (typ)                      |

The SoC FPGA has 4x more compute with ¼ the power dissipation!



[http://www.xilinx.com/products/technology/dsp.html]
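Taking the table at face value, and treating the "<20 W" figure as an upper bound, the ratios work out as follows. This is only arithmetic on the quoted peak numbers, not a benchmark:

$$
\frac{(5 + 778)\ \text{GFLOPS}}{100\ \text{GFLOPS}} \approx 7.8,
\qquad
\frac{20\ \text{W}}{84\ \text{W}} \approx 0.24 \approx \tfrac{1}{4}
$$

On these numbers the "4x more compute" claim is conservative, and peak compute per watt differs by roughly $\frac{783/20}{100/84} \approx 33$.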




#### Network Stack in RTL from Fraunhofer Heinrich-Hertz-Institute

• Brings full TCP/UDP/IP connectivity to FPGAs even when there is no CPU available. Accelerates CPUs by offloading TCP/UDP/IP processing into programmable logic.






#### **Network Protocol Acceleration Platform Architecture**





### **High-Level Synthesis Design Flow for SoC FPGA**

Input C/C++/SystemC into High-Level Synthesis to generate VHDL/Verilog code
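As an illustration of the kind of C++ input such a flow takes, here is a minimal sketch of the ones'-complement Internet checksum (RFC 1071), a typical protocol-processing kernel. It is not taken from the Fraunhofer stack; the function name is hypothetical, and the pragma is shown only as a comment so the code also builds with an ordinary host compiler.

```cpp
#include <cassert>
#include <cstdint>

// Ones'-complement Internet checksum (RFC 1071) over 16-bit words.
// Written in the restricted style HLS tools generally expect:
// fixed-width types, a simple counted loop, no dynamic memory, no recursion.
std::uint16_t internet_checksum(const std::uint16_t* words, int n_words) {
    assert(words != nullptr && n_words >= 0);
    std::uint32_t sum = 0;
    for (int i = 0; i < n_words; ++i) {
        // In Vivado HLS, a loop like this would typically carry
        //   #pragma HLS PIPELINE II=1
        // (left as a comment here so any host compiler accepts the file).
        sum += words[i];
    }
    // Fold the carries back into the low 16 bits.
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return static_cast<std::uint16_t>(~sum);
}
```

For the IPv4 header words `4500 0073 0000 4000 4011 0000 C0A8 0001 C0A8 00C7` (checksum field zeroed), the function returns `0xB861`; running it again over the header with that checksum filled in returns 0, which is how a receiver validates the header.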





## **Working Principles of High-Level Synthesis**

• Design automation runs scheduling and resource allocation to generate RTL code comprising data path plus state machines for control.
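To make the scheduling and resource-allocation step concrete, here is a toy resource-constrained list scheduler. It is a simplified sketch and not the algorithm of any particular HLS tool: every operation is assumed to take exactly one control step, and the `Op` struct and `schedule` function are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// One operation of the dataflow graph: its kind ('*', '+', ...) and the
// indices of the operations whose results it consumes.
struct Op {
    char kind;
    std::vector<int> deps;
};

// Assigns each operation to a control step (a state of the control FSM),
// subject to data dependencies and the number of functional units per kind.
// Assumes one step per operation, an acyclic graph, and >= 1 unit per kind.
std::vector<int> schedule(const std::vector<Op>& ops,
                          const std::map<char, int>& units) {
    for (const Op& op : ops) {
        assert(units.count(op.kind) == 1 && units.at(op.kind) >= 1);
    }
    std::vector<int> step(ops.size(), -1);  // -1 means "not yet scheduled"
    std::size_t done = 0;
    for (int c = 0; done < ops.size(); ++c) {
        std::map<char, int> used;  // functional units already busy in step c
        for (std::size_t i = 0; i < ops.size(); ++i) {
            if (step[i] != -1) continue;
            bool ready = true;  // all inputs produced in an earlier step?
            for (int d : ops[i].deps) {
                if (step[d] == -1 || step[d] >= c) ready = false;
            }
            if (!ready || used[ops[i].kind] >= units.at(ops[i].kind)) continue;
            step[i] = c;
            ++used[ops[i].kind];
            ++done;
        }
    }
    return step;
}
```

For `y = a*b + c*d + e` (two multiplies, then two adds), one multiplier and one adder yield the steps `{0, 1, 2, 3}`, while two multipliers shorten the schedule to `{0, 0, 1, 2}`. That is the basic area versus latency trade-off the tool explores; the resulting step numbers correspond to FSM states driving the data path.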





## **Benefits of High-Level Synthesis**

• Automatic performance optimization via parallelization at dataflow level

• Automatic interface synthesis and code generation for a variety of real-life HW/SW connectivity

Argument types: Variable (pass-by-value), Pointer Variable (pass-by-reference), Array (pass-by-reference), Reference Variable (pass-by-reference). I = input, IO = input/output, O = output.

| Bus Interfaces (AXI4 Stream, Lite, Master) | Interface Type | Variable I | Variable IO | Variable O | Pointer I | Pointer IO | Pointer O | Array I | Array IO | Array O | Reference I | Reference IO | Reference O |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|                   | ap_none       | D |   |   | D |   |   |   |   |   | D |   |   |
|                   | ap_stable     |   |   |   |   |   |   |   |   |   |   |   |   |
| $\Leftrightarrow$ | ap_ack        |   |   |   |   |   |   |   |   |   |   |   |   |
| $\Leftrightarrow$ | ap_vld        |   |   |   |   |   | D |   |   |   |   |   | D |
| $\Leftrightarrow$ | ap_ovld       |   |   |   |   | D |   |   |   |   |   | D |   |
| $\Leftrightarrow$ | ap_hs         |   |   |   |   |   |   |   |   |   |   |   |   |
|                   | ap_memory     |   |   |   |   |   |   | D | D | D |   |   |   |
| $\Leftrightarrow$ | ap_fifo       |   |   |   |   |   |   |   |   |   |   |   |   |
|                   | ap_bus        |   |   |   |   |   |   |   |   |   |   |   |   |
| $\Leftrightarrow$ | ap_ctrl_none  |   |   |   |   |   |   |   |   |   |   |   |   |
|                   | ap_ctrl_hs    |   |   | D |   |   |   |   |   |   |   |   |   |
|                   | ap_ctrl_chain |   |   |   |   |   |   |   |   |   |   |   |   |

Legend: D = default interface. The original figure additionally shades each cell as Supported Interface or Unsupported Interface.
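As a sketch of how the defaults marked "D" in the table map onto C++ arguments, consider the function below. The function `scale_block` and the constant `kBlockLen` are made-up examples; the interface names in the comments are read off the table, and the pragma (Vivado HLS syntax) is shown only as a comment to illustrate how a default could be overridden.

```cpp
#include <cassert>

const int kBlockLen = 8;  // hypothetical fixed block size

// Default interface per argument, following the "D" entries in the table:
//   gain   : variable, pass-by-value, input      -> ap_none
//   in     : array, input                        -> ap_memory
//   out    : array, output                       -> ap_memory
//   peak   : pointer, written only (output)      -> ap_vld
//   return : variable, output                    -> ap_ctrl_hs
int scale_block(int gain, const int in[kBlockLen], int out[kBlockLen],
                int* peak) {
    assert(in != nullptr && out != nullptr && peak != nullptr);
    // Example override, kept as a comment so a host compiler accepts the file:
    //   #pragma HLS INTERFACE ap_fifo port=in
    int sum = 0;
    int max = gain * in[0];
    for (int i = 0; i < kBlockLen; ++i) {
        out[i] = gain * in[i];
        sum += out[i];
        if (out[i] > max) max = out[i];
    }
    *peak = max;   // single write to the pointer argument
    return sum;
}
```

The same source therefore doubles as a software model: it can be called from a plain C++ test bench, while the interface choice decides which ports and handshakes the generated RTL exposes.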



## **Visualization and User Interaction in High-Level Synthesis Tool**

[Figure: screenshot of the High-Level Synthesis tool's Analysis perspective for the `ownErode` example. The Module Hierarchy pane lists BRAM, DSP, FF, and LUT usage plus latency and interval per module (`init`, `init_1`, `AXIvideo2Mat_32_1080_1920_16_s`, `Erode_16_16_1080_1920_s`, `Mat2AXIvideo_32_1080_1920_16_s`); the Performance pane shows operations against control steps C0 to C5; the Resource Profile pane breaks usage down into I/O ports, instances, memories, expressions, registers, FIFOs, and multiplexers.]



#### **Conclusion and References**

- Significant productivity increase for protocol-oriented or dataflow-based design blocks.
- Easy to adopt: known languages (C/C++) combined with a known tool chain.
- Add this to your bag of tricks!

- UG998 Introduction to FPGA Design Using High-Level Synthesis
- UG871 Vivado Design Suite Tutorial: High-Level Synthesis
- XAPP1209 Designing Protocol Processing Systems with Vivado High-Level Synthesis
- UG949 UltraFast Design Methodology Guide for the Vivado Design Suite



#### **Contact Information**

Dr. Endric Schubert

- Email: endric.schubert@uni-ulm.de, endric@MLEcorp.com
- Phone US: +1 (408) 320-6139
- Phone DE: +49 (731) 141149-66



