

# Digital Signal Processing with FPGAs

Emil Matus TU Dresden emil.matus@tu-dresden.de

Terascale Detector Workshop Dresden, March'19



#### Outline

- Digital filter:
  - System theory
  - Filter design
  - Implementation
  - FPGA HW design



# System Theory



#### DSP System using FPGA

• DSP - Digital Signal Processing: use 'digits' to represent signals and systems



- s(t) Analyzed time-continuous signal
- x(t) Band-limited time-continuous signal
- x(nT) Time-discrete signal with sampling period T
- q(nT) Quantisation noise

#### VODAFONE CHAIR

#### **Time-Continuous Signal**





**VODAFONE CHAIR** 

#### Antialiasing Low-Pass Filter







### Analog-Digital Conversion: Sampling + Quantisation



**VODAFONE CHAIR** 

Emil Matus – TU Dresden

14/03/2019

7

#### **DSP: System Function**



$$y(nT) = S\{x(nT)\} \equiv y(n) = S\{x(n)\}$$



#### Linear Time-Invariant (LTI) System

• LTI System -  $N^{th}$  order linear difference equation,  $b_k, a_k \in \mathbb{R}$ :

$$y(n) = \sum_{k=0}^{N-1} b_k x(n-k) - \sum_{k=1}^{N-1} a_k y(n-k)$$

• Time-delay:

$$x(n) \xrightarrow{k \text{-times delay}} x(n-k)$$

• Multiplication by constant:



• Addition:



#### **VODAFONE CHAIR**

#### Signal Flow Graph

• Example -  $2^{nd}$  order filter with N = 2:

$$y(n) = \underbrace{b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) - a_1 y(n-1) - a_2 y(n-2)}_{u(n)}$$

$$u(n)$$

$$v(n)$$

$$x(n-2) \xrightarrow{b_2} \\ x(n-1) \xrightarrow{b_1} \\ x(n) \xrightarrow{b_1} \\ y(n-1) \\ y(n) \\ y(n)$$



#### LTI System: Transfer Function

• Apply Z-transform to  $N^{th}$  order difference equation with  $b_k$ ,  $a_k \in \mathbb{R}$ :

$$x(n-k) \qquad \bullet \qquad z^{-k}X(z)$$
  

$$b_k x(n-k) \qquad \bullet \qquad b_k z^{-k}X(z)$$
  

$$u(n) + v(n) \qquad \bullet \qquad U(z) + V(z)$$

• Transfer function:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{b_0 + b_1 z^{-1} + \dots + b_{N-1} z^{-(N-1)}}{a_0 + a_1 z^{-1} + \dots + a_{N-1} z^{-(N-1)}} = \frac{b_0}{a_0} \frac{\prod_{k=0}^{N-1} (z - n_k)}{\prod_{k=0}^{N-1} (z - p_k)}$$
  
Filter coefficients  
$$b_k, a_k$$
  
Zeros/poles  
$$n_k, p_k$$

**VODAFONE CHAIR** 

#### FIR vs. IIR Filter

• IIR – Infinite Impulse Response:  $a_k \neq 0$ , for some k = 1, 2, ..., N - 1

$$H(z) = \frac{b_0 + b_1 z^{-1} + \dots + b_{N-1} z^{-(N-1)}}{a_0 + a_1 z^{-1} + \dots + a_{N-1} z^{-(N-1)}} = \sum_{k=0}^{\infty} h(n) z^{-k} \rightarrow \text{Impulse response } h(n)$$
  
FIR - Finite Impulse Response:  
 $a_0 = 1, a_k = 0, k = 1, 2, \dots N - 1$ :  
$$H(z) = \frac{b_0 + b_1 z^{-1} + \dots + b_{N-1} z^{-(N-1)}}{1} = \sum_{k=0}^{N-1} b_k z^{-k} \rightarrow b_k = h(k)$$

#### Frequency Response of LTI System

• Transform system function H(z) by substituting  $\mathbf{z} = e^{j\Omega}$ ,  $\Omega = \omega T$ :

$$H(e^{j\Omega}) = \frac{Y(z)}{X(z)} = \frac{b_0 + b_1 e^{-j\Omega} + b_2 e^{-j2\Omega} + \dots + b_{N-1} e^{-j(N-1)\Omega}}{a_0 + a_1 e^{-j\Omega} + a_2 e^{-j2\Omega} + \dots + a_{N-1} e^{-j(N-1)\Omega}}$$

• Amplitude spectrum:

$$|H(e^{j\Omega})| = \begin{vmatrix} b_0 + b_1 e^{-j\Omega} + \dots + b_{N-1} e^{-j(N-1)\Omega} \\ a_0 + a_1 e^{-j\Omega} + \dots + a_{N-1} e^{-j(N-1)\Omega} \end{vmatrix}$$
Normalized sampling frequency:  

$$\begin{array}{c} & & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & & \\ & &$$

### Tools for LTI System Analysis

- Based on knowledge of system function  $b_k$ ,  $a_k$
- MATLAB:

...

- y=filter(b, a, x)
- [H,W]=freqz(b, a)
- zplane(b, a)

- filter signal x
- spectral response
- zeros, poles locations



# **Digital Filter Design**



#### Filter Design 1

- Determine  $N \in \mathbb{N}$  and  $b_k$ ,  $a_k \in \mathbb{R}$  according to specific criteria :
  - Frequency response specification:  $\min_{b_k, a_k} \max_{\Omega \in S} |H(\Omega) D(\Omega)|$





#### Filter Design 2

- Determine  $N \in \mathbb{N}$  and  $b_k$ ,  $a_k \in \mathbb{R}$  according to specific criteria :
  - Maximize SNR for known signal: Matched filter

Least square error with respect to desired signal

Minimum (statistical) mean square error (Wiener filter)





#### VODAFONE CHAIR

#### Filter Design 3

- Determine  $N \in \mathbb{N}$  and  $b_k$ ,  $a_k \in \mathbb{R}$  according to specific criteria :
  - Model analog function e.g.

$$y(t) = \int_{-\infty}^{t} x(\tau) d\tau$$

$$y(t) = \frac{dx(t)}{dt}$$



...

#### FIR Filter Design: Fourier Series and Windowing

• Desired frequency response  $D(e^{j\Omega})$  e.g. perfect low pass



• Continuous periodic function approximated using Fourier series:

$$D(e^{j\Omega}) = \sum_{n=-\infty} d(n)e^{-j\Omega n}$$

$$d(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} D(e^{j\Omega}) e^{j\Omega n} d\Omega$$



#### FIR Filter Design: Fourier Series and Windowing

• Windowing:

$$h(n) = d(n).w(n) \quad \bullet \quad H(e^{j\Omega}) = D(e^{j\Omega}) * W(e^{j\Omega})$$

- Example:
  - $f_D = 1kHz$
  - $f_S = 4kHz$
  - w(n) Recktangle N=2
- MATLAB:
  - w=window(...);
  - b=fir1(N,W\_D,'window')



#### **VODAFONE CHAIR**

### IIR Filter Design: Analog prototype filter

- Exploit theory of analog filter desing and transform analog filter to digital
- Filter types:
  - Butterworth: buttord()
  - Tschebyscheff I.:
  - Tschebyscheff II.:
  - Cauer (elliptic):

- buttord()  $\rightarrow$  butter()
- cheb1ord()  $\rightarrow$ 
  - cheb2ord()  $\rightarrow$ ellipord()  $\rightarrow$

cheby1()cheby1()

ellip()

- Analog to digital filter transformation:
  - Impulse invariance method
  - Bilinear transformation

Stability and causality issues !!!

#### **VODAFONE CHAIR**

...

### Filter Design Tools

- MATLAB fdatool filter design and analysis tool IDE
- MATLAB signal processing toolbox:
  - fir1()
  - fir2()
  - firpm()

• ...



#### **VODAFONE CHAIR**



## **Ditital Filter Implementation**



### **Representation of Signal and Coefficient Values**

- Floating point numbers:
  - Mantisa a
  - Exponent *b*
  - Number value:

 $d = a \ge 2^{b}$ 

- Fixed point number number:

  - Integer part *i* bits
    Fraction part *f* bits
    W = i + f [bits]
  - Number value:

 $d = 2^{i} \cdot 2^{f}$ 

Floating vs. fixed point: Higher accuracy, resolution, power and resource consumption

Signed fix-point numbers W=4

| Binary |       |       |       | Interpreted value |                 |  |
|--------|-------|-------|-------|-------------------|-----------------|--|
| $d_3$  | $d_2$ | $d_1$ | $d_0$ | $d_3d_2d_1d_0.$   | $d_3d_2.d_1d_0$ |  |
| 0      | 0     | 0     | 0     | 0                 | 0.00            |  |
| 0      | 0     | 0     | 1     | 1                 | 0.25            |  |
| 0      | 0     | 1     | 0     | 2                 | 0.50            |  |
|        |       |       |       | •••               |                 |  |
| 0      | 1     | 1     | 1     | 7                 | 1.75            |  |
| 1      | 0     | 0     | 0     | -8                | -2.00           |  |
| 1      | 0     | 0     | 1     | -7                | -1.75           |  |
| 1      | 0     | 1     | 0     | -6                | -1.50           |  |
|        |       |       |       |                   |                 |  |
| 1      | 1     | 1     | 0     | -2                | -0.50           |  |
| 1      | 1     | 1     | 1     | -1                | -0.25           |  |

#### **VODAFONE CHAIR**

#### **Binary Arithmetic**

- **2's complement** of *W*-bit binary number *d*
- Simple addition, subtraction and multiplication for signed/unsigned numbers
- Example W = 4 bit subtraction 5 2 = 3:
  - Approach 1:  $(0101)_2 (0010)_2 = (0011)_2$
  - Approach 2: 5 2 = 5 + (-2)
  - 2's complement  $(-2): 2^4 |-2| = 14 = (1110)_2$  $(0101)_2 + (1110)_2 = (\text{\$}0011)_2$

#### **VODAFONE CHAIR**

### Signal Flow Graph & Critical Path

- Critical path (CP) maximum propagation delay of combinatorial logic  $T_{CP}$
- Consequence for maximum achievable clock frequency of digital circuit:

$$f_{clk} \le \frac{1}{T_{CP}}$$

• Example 2<sup>nd</sup> order system *N=2*:



 $T_{CP} = T_{MUL} + (N+1)T_{ADD}$  $T_{CP} \sim NT_{ADD} \quad !!!$ 

Proportional to system order N

### Pipelining Signal Flow Graph: Cut Sets

- **Pipelining** shorten CP by inserting and/or relocating registers in signal flow graph
- **Cut set method** pipelining approach preserving system function:
  - Cut graph edges using closed loop
  - Add delay z<sup>-1</sup> and speedup z<sup>+1</sup> (z<sup>-1</sup>, z<sup>+1</sup> = 1) elements at cut positions according to edge orientation



### Pipelining Signal Flow Graph: Cut Sets

- **Pipelining** shorten CP by inserting and/or relocating registers in signal flow graph
- **Cut set method** pipelining approach preserving system function:
  - Cut graph edges using closed loop
  - Add delay z<sup>-1</sup> and speedup z<sup>+1</sup> (z<sup>-1</sup>, z<sup>+1</sup> = 1) elements at cut positions according to edge orientation



**VODAFONE CHAIR** 



# Digital Filter HW Design



#### **FPGA Introduction**

- Field Programmable Gate Array inherently parallel processing architecture
- Software defined devices: (re-)programmed by firmware to achieve desired function
- 1980: Invented
- 1985: Xilinx first commercial FPGA XC2064 Logic + RAM
- 2019: FPGA scalable computing platform
  - 500MHz
  - Logic blocks + large RAM
  - DSP blocks
  - CPUs
  - High speed I/O + Interfaces (memory, ADC, DAC, USB, Ethernet, PCIe)
  - PLLs + DLLs
  - GPU + VPU

#### **VODAFONE CHAIR**

#### **RTL** Design

• Register-Transfer-Level design



### **RTL Design using Verilog**

• Define module interface:





#### **RTL Design using Verilog**

• Define: Registers, wires, combinatorial functions,



...

### FIR Filter Design using HDL Verilog

• Define module interface:



```
wire signed [15:0] b2 = 16'h3db6;
wire signed [15:0] b2 = 16'h0352;
```

assign yi=b0\*xn[0]+b1\*xn[1]+b2\*xn[2]+b3\*xn[3];

```
always @(posedge clock)
begin
    xn[0]<=x; xn[1]<=xn[0];
    xn[2]<=xn[1]; xn[3]<=xn[2];
    y<=v;
end</pre>
```

```
endmodule
```

VODAFONE CHAIR

### HDL Synthesis 1

- Example: 4-tap direct form FIR filter
  - Device: Xilinx Artix-7
  - Synthesis: Xilinx Vivado
- RTL elaboration of Verilog source code to produce platform independent schematics



### HDL Synthesis 2

- Example: 4-tap direct form FIR filter
- Synthesis: mapping logic functions to FPGA specific resources by meeting design constraints





#### VODAFONE CHAIR

#### FPGA Design Methodology

- Register-transfer-level (RTL): Hardware Description Language (HDL): Verilog, VHDL
- IP-core integration based design: pre-defined/customizable reusable units of logic
  - Vivado design suite: Xilinx logiCORE IP FIR Compiler
- High-level synthesis (HLS): C-language synthesis tool
- Model-based: High-level abstraction using Simulink/LabVIEW... blocks

#### **VODAFONE CHAIR**

#### Thank You!



#### Linear Time-Invariant (LTI) System

• LTI System representation by N<sup>th</sup> order linear difference equation:



**VODAFONE CHAIR** 

#### **DSP** Design

• Digital vs. Analog: more flexibility, lower power consumption, higher reliability, higher accuracy and much better scalability

#### Synthesis: 4-tap direct form FIR w/ DSP block

Product family:

Project part:

xc7a100tlcsg324-2L

Artix-7

| Name <sup>1</sup>      | Slice Registers | DSPs  | Bonded IOB | BUFGCTRL |
|------------------------|-----------------|-------|------------|----------|
|                        | (126800)        | (240) | (210)      | (32)     |
| direct_fir_filter_4tap | 32              | 4     | 49         | 1        |

| Resource | Estimation | Available | Utilization % |
|----------|------------|-----------|---------------|
| FF       | 32         | 126800    | 0.03          |
| DSP      | 4          | 240       | 1.67          |
| ю        | 49         | 210       | 23.33         |
| BUFG     | 1          | 32        | 3.13          |



#### Synthesis: 4-tap direct form FIR w/o DSP block

| Name 1                              | Slice LUTs | Slice Registers | Bonded IOB | BUFGCTRL |
|-------------------------------------|------------|-----------------|------------|----------|
|                                     | (63400)    | (126800)        | (210)      | (32)     |
| direct_fir_filter_without_dsp_block | 445        | 96              | 49         | 1        |

| Resource | Estimation | Available | Utilization % |
|----------|------------|-----------|---------------|
| LUT      | 445        | 63400     | 0.70          |
| FF       | 96         | 126800    | 0.08          |
| ю        | 49         | 210       | 23.33         |
| BUFG     | 1          | 32        | 3.13          |



