# 2nd MTCA Workshop for Industry and Research

# **MTCA.4 Front End Processing**

December 2013

# **Presentation**

INTERFACE CONCEPT is a European manufacturer of electronic embedded systems for the Industry, Aero, Telecommunication and Transportation markets.

#### Since 1987, IC has:

- Developed a wide range of innovative off-the-shelf (COTS) products
- Built a technical team recognized at the forefront of new technologies and fully available to customers
- Acquired global experience in the field of critical embedded applications
- Established strong technology and industrial partnerships

... allowing it to offer its customers the best solutions on the market.

# **Product Lines**

### **AGENDA**

- FPGAs: by far the highest ratio of processing power to power consumption for parallel computing
- Virtex-7 example
- FPGA high-speed transceivers
- ADCs
- VITA 57 used for high-speed converters
- IC-FEP-TCAa: the best that current technology can provide for MTCA.4 signal processing
- Reference Design

# FPGAs: by far the highest ratio of processing power to power consumption for parallel computing

# FPGA Technologies

- FPGAs achieve more computing speed per unit of power than CPUs, DSPs, and GPUs (typically ten times more on 16-bit integer operations, about 50 GOPS/W; see the main results of the National Science Foundation study published in the IEEE magazine Computing in Science and Engineering, Jan/Feb 2011)
- These capabilities are very useful in applications such as computer vision and radar
- IC has developed a full product family of FPGA processor modules that take advantage of the high-performance Xilinx Virtex-6/7 FPGAs
- Virtex-7 boasts twice the performance and half the power consumption of the Virtex-6

Source: a study financed by the National Science Foundation (Alan George, Herman Lam, and Greg Stitt, IEEE magazine Computing in Science and Engineering, Jan/Feb 2011).

[Figure panels: Configurable logic devices; Fixed logic devices]

### DSP48E1 Slice

- Artix-7 FPGAs offer up to a 5X performance-per-watt advantage compared to multi-core DSPs
- The TI TMS320C6678 multi-core DSP consumes an estimated 12.3 W to deliver a peak DSP performance of 320 GMAC/s, which normalizes to 26 GMAC/s per watt
- An Artix-7 FPGA can deliver up to 140 GMAC/s per watt

Figure 12: DSP Performance per Watt

http://www.xilinx.com/support/documentation/user\_guides/ug479\_7Series\_DSP48E1.pdf

# Virtex-7 Example

# Logic Elements (7 Series Xilinx)

#### **SLICES**

Every slice contains:

- Four logic-function generators (look-up tables)
- Eight storage elements
- Wide-function multiplexers
- Carry logic
- Storing data using distributed RAM (\*)
- Shifting data with 32-bit registers

#### DSP48E1 Slice

- 25 x 18 two's-complement multiplier
- 48-bit accumulator (synchronous counter)
- Power-saving pre-adder
- Single instruction multiple data (SIMD) arithmetic unit
- Optional logic unit

# FPGA Characteristics (Virtex-7)

#### Table 7: Virtex-7 FPGA Feature Summary

| Device<sup>(1)</sup> | Logic Cells | CLB Slices<sup>(2)</sup> | CLB Max Distributed RAM (Kb) | DSP Slices<sup>(3)</sup> | Block RAM 18 Kb<sup>(4)</sup> | Block RAM 36 Kb<sup>(4)</sup> | Block RAM Max (Kb)<sup>(4)</sup> | CMTs<sup>(5)</sup> | PCIe<sup>(6)</sup> | GTX | GTH | GTZ | XADC Blocks | Total I/O Banks<sup>(7)</sup> | Max User I/O<sup>(8)</sup> | SLRs<sup>(9)</sup> |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| XC7V585T | 582,720 | 91,050 | 6,938 | 1,260 | 1,590 | 795 | 28,620 | 18 | 3 | 36 | 0 | 0 | 1 | 17 | 850 | N/A |
| XC7V2000T | 1,954,560 | 305,400 | 21,550 | 2,160 | 2,584 | 1,292 | 46,512 | 24 | 4 | 36 | 0 | 0 | 1 | 24 | 1,200 | 4 |
| XC7VX330T | 326,400 | 51,000 | 4,388 | 1,120 | 1,500 | 750 | 27,000 | 14 | 2 | 0 | 28 | 0 | 1 | 14 | 700 | N/A |
| XC7VX415T | 412,160 | 64,400 | 6,525 | 2,160 | 1,760 | 880 | 31,680 | 12 | 2 | 0 | 48 | 0 | 1 | 12 | 600 | N/A |
| XC7VX485T | 485,760 | 75,900 | 8,175 | 2,800 | 2,060 | 1,030 | 37,080 | 14 | 4 | 56 | 0 | 0 | 1 | 14 | 700 | N/A |
| XC7VX550T | 554,240 | 86,600 | 8,725 | 2,880 | 2,360 | 1,180 | 42,480 | 20 | 2 | 0 | 80 | 0 | 1 | 16 | 600 | N/A |
| XC7VX690T | 693,120 | 108,300 | 10,888 | 3,600 | 2,940 | 1,470 | 52,920 | 20 | 3 | 0 | 80 | 0 | 1 | 20 | 1,000 | N/A |
| XC7VX980T | 979,200 | 153,000 | 13,838 | 3,600 | 3,000 | 1,500 | 54,000 | 18 | 3 | 0 | 72 | 0 | 1 | 18 | 880 | N/A |
| XC7VX1140T | 1,139,200 | 178,000 | 17,700 | 3,360 | 3,760 | 1,880 | 67,680 | 24 | 4 | 0 | 96 | 0 | 1 | 22 | 1,100 | 4 |
| XC7VH580T | 580,480 | 90,700 | 8,850 | 1,680 | 1,880 | 940 | 33,840 | 12 | 2 | 0 | 48 | 8 | 1 | 12 | 600 | 2 |
| XC7VH870T | 876,160 | 136,900 | 13,275 | 2,520 | 2,820 | 1,410 | 50,760 | 18 | 3 | 0 | 72 | 16 | 1 | 13 | 650 | 3 |

#### Notes:

1. EasyPath™-7 FPGAs are also available to provide a fast, simple, and risk-free solution for cost reducing Virtex-7 T and Virtex-7 XT FPGA designs.
2. Each 7 series FPGA slice contains four LUTs and eight flip-flops; only some slices can use their LUTs as distributed RAM or SRLs.
3. Each DSP slice contains a pre-adder, a 25 x 18 multiplier, an adder, and an accumulator.
4. Block RAMs are fundamentally 36 Kb in size; each block can also be used as two independent 18 Kb blocks.
5. Each CMT contains one MMCM and one PLL.
6. Virtex-7 T FPGA Interface Blocks for PCI Express support up to x8 Gen 2. Virtex-7 XT and Virtex-7 HT Interface Blocks for PCI Express support up to x8 Gen 3, with the exception of the XC7VX485T device, which supports x8 Gen 2.
7. Does not include configuration Bank 0.
8. This number does not include GTP, GTX, GTH, or GTZ transceivers.
9. Super logic regions (SLRs) are the constituent parts of FPGAs that use SSI technology. Virtex-7 HT devices use SSI technology to connect SLRs with 28.05 Gb/s transceivers.

# FPGAs high speed transceivers

# GTX / GTH (Virtex-7)

- The GTX transceivers support line rates between 500 Mb/s and 12.5 Gb/s
- Configurable and tightly integrated with the programmable logic resources

Figure 1-2: GTX Transceiver Quad Configuration

http://www.xilinx.com/support/documentation/data\_sheets/ds180\_7Series\_Overview.pdf

# Insertion loss

Differential insertion loss of sample Nelco 4000-13 traces, 16-inch and 26-inch:

| | 3 GHz | 5 GHz |
|---------------|--------|----------|
| 16 inch Nelco | ~7 dB | ~11.5 dB |
| 26 inch Nelco | ~12 dB | ~18 dB |

# Transmitter End Techniques

- Techniques used to combat insertion loss
- Transmitter-end technique: de-emphasis

[Figure: 11110000 binary pattern without de-emphasis (left) and with 6 dB post de-emphasis (right)]

# TX Emphasis Eye Pattern

[Figure: GTX transceiver 10 Gb/s eye diagram without (left) and with 2 dB post-tap de-emphasis (right)]

Peak-to-peak ISI jitter is reduced by more than a factor of two.

# Receiver End Techniques

- Continuous Time Linear Equalization (CTLE)
- Decision Feedback Equalizer (DFE)

# **ADCs**

# RF Sampling

#### **Benefits of RF-Sampling**

- A single direct RF-sampling ADC can replace an entire IF-sampling or ZIF-sampling subsystem of mixers, LO synthesizers, amplifiers, filters, and ADCs
- Reduction of bill of materials (BOM) cost, design time, board size,
weight, and power
- Analog frequency down-conversion function moved into the DSP, FPGA, or ASIC, where frequencies and bandwidths can be controlled digitally, enabling maximum system flexibility and re-configurability
- Example: TI ADC12D1600

# RF Sampling ADC12D1600

### **ADC Performance**

$$\begin{split} SNR &= 10\log_{10}(P_S/P_N) \\ SFDR &= 10\log_{10}(P_S/P_H) \\ THD &= 10\log_{10}(P_S/P_D) \\ SINAD &= 10\log_{10}\left(P_S/(P_D + P_N)\right) \\ ENOB &= (SINAD - 1.76)/6.02 \end{split}$$

- $P_S$: signal power (red)
- $P_N$: noise floor power (blue)
- $P_D$: power of harmonics 2-6 (black)
- $P_H$: power of the next highest spur (black)

[Figure: digital output in the frequency domain, fundamental at Fin]

In this plot, harmonic #3 would be $P_H$ in the SFDR calculation, since it is the largest non-fundamental spur.

#### Reference equations:

Best-case quantization SNR for an N-bit ADC:

$$SNR_Q = 6.02\,N + 1.76 \ \text{(dB)}$$

Total desired SNR:

$$SNR_T = -10\log_{10}\left(10^{-SNR_J/10} + 10^{-SNR_Q/10}\right)$$

SNR due to jitter:

$$SNR_J = -20\log_{10}\left(2\pi f_{in} T_J\right), \quad \text{where } T_J^2 = T_{J,ext}^2 + T_{J,cdc}^2 + T_{J,adc}^2$$

# Required external clock performance

In the original slide, cell colors distinguish user input data from output data.

| Parameter | Value | Unit | Comment |
|---|---|---|---|
| ADC theoretical resolution | 12 | bits | |
| Desired SNR (SNRT) | 58 | dB | |
| at input frequency (fin) | 700 | MHz | |
| Internal ADC jitter (TJ_adc) | 200 | fs | |
| Best-case SNR (SNRQ) | 74.00 | dB | |
| SNR due to jitter (SNRJ) | 58.11 | dB | |
| Total clock jitter (TJ) | 283 | fs RMS | ADC + clock buffer + external clock input |
| Clock buffer additive jitter (TJ_cdc) | 0 | fs RMS | Jitter added by the clock distribution circuit |
| External clock specification (TJ_ext) | 200 | fs RMS | Jitter from external clock input |

#### Phase Noise at 1300.0 MHz

- Good phase noise
performance (internal VCO: about -160 dBc/Hz floor)
- Integrated VCO for 1300 and 1500 MHz sampling clock synthesis
- Jitter cleaning on the reference input
- Simple distribution of the sampling clock directly to the ADCs

# IC-ADC-FMCc

# IC Signal Viewer

Additionally, *INTERFACE CONCEPT* has developed an application that performs an FFT on a digital data flow and calculates the most popular specifications for quantifying ADC dynamic performance: SINAD, ENOB, THD, SNR, SFDR, Ain SNRFS, av. bin noise, NSD.

IC Signal Viewer runs on a PC connected to the CPU via an Ethernet link. This application makes it possible to:

- ✓ Configure the ADC board
  - Channel selection, sampling frequency, number of samples
- ✓ Configure the DAC board (when preferred to external generators)
  - Channel selection, generated frequency, gain
- ✓ Launch signal generation by the DAC (optional)

and (one shot or continuously):

- ✓ Launch the acquisition of x samples by the ADC
- ✓ Download the digital samples from the ADC via the CPU
- ✓ Perform the FFT and calculate the specifications

Processing steps shown on the block diagram:

- 1 Signal acquisition
- 2 DMA transfer to the CPU
- 3 Acquisition & Processing

[Figure labels: Sampling Clock; Input Signal]

# **VITA 57**

# Architecture of the FEP boards

#### **Key Benefits of FMCs:**

- Data throughput: Support of individual signaling speeds up to 10 Gb/s
- Latency: Elimination of protocol overhead removes latency and ensures deterministic data delivery
- Design simplicity: Expertise in protocol standards such as PCI™, PCI Express®, or Serial RapidIO not required
- System overhead: Power consumption, IP core costs, engineering time, and material costs reduced through simplification of system design
- Design reuse: Whether using a custom in-house board design or a commercial off-the-shelf (COTS) mezzanine or carrier card, the FMC standard promotes the ability to retarget existing FPGA/carrier card designs to a new I/O.
All that is required is swapping out the FMC module and slightly adjusting the FPGA design.

# FMC Modules examples

- IC-ADC-FMCa unit: Quad 16-bit, 135 Msps
- IC-ADC-FMCb unit: Quad 14-bit, 400 Msps (or Quad 12-bit, 500 Msps)
- IC-ADC-FMCc unit: Quad 12-bit, 1.3 Gsps
- IC-ADA-FMCa unit: Dual DAC / Dual ADC 12-bit, 1 Gsps (TBC)
- IC-DAC-FMCa: Quad 16-bit, 800 Msps
- IC-DAC-FMCb: Quad 16-bit, 1 Gsps

**IO Modules:**

- IC-IO-FMCa: Dual QSFP+ ports

[Photos: IC-ADC-FMCa, IC-ADC-FMCb, IC-DAC-FMCa, IC-QSFP-FMCa]

# **IC-FEP-TCAa**

# IC-FEP-TCAa + 2 x IC-ADC-FMCc

- 8 x 1300 MSPS 12-bit channels that can all be strictly synchronized
- Low phase-noise ADC clocking system
- High DDR3 bandwidth: two memory banks, one with a 64-bit interface and one with a 40-bit interface, at 1600 MT/s, allowing safe real-time storage of the incoming samples on all channels
- Very high bandwidth on the backplane with four GTH x4 links
- Very high bandwidth on the Zone 3 connectors: 38 LVDS and one GTH x4

# Architecture of the FEP boards

IC's strategy is to work with the VPX/OpenVPX format, the only one that is proven and can sustain high-speed connections through the backplane over differential pairs.

- Use of the FMC standard (VITA 57) for maximum flexibility and reuse
- Virtex-6 and Virtex-7 FPGAs
- 6U and 3U VPX

# IC-FEP-VPX3c

- Compatible with the IC-FEP-VPX3b
- The VX485T version has GTX transceivers
- The VX690T version has GTH transceivers (faster) and is the baseline

# IC-FEP-VPX6a

- One QorIQ P2020 processor, up to 1.2 GHz, e500 v2 core, with:
  - 1 GB of DDR3 with ECC
  - up to 512 MB of NOR flash
  - up to 16 GB of NAND solid-state disk
- Two Xilinx Virtex-6 FPGAs (SX315T/SX475T or LX365T/LX550T), each offering:
  - two banks of DDR3: 40-bit wide, 1.25 GB each, 800 MT/s
  - one bank of SRAM: 18-bit wide, 9 MB, 600 MT/s
  - one SPI flash (16 MB)
  - one NOR flash (128 MB)
- One Spartan-6 LX45T (control node).
Management of the on-the-fly bitstream download to the two Virtex-6 FPGAs
- Very comprehensive interface:
  - 4 x PCIe x4 ports
  - GTX ports (from FPGAs)
  - differential pairs (from FPGAs)
  - 32 differential pairs (from each FMC IOs conne
  - general purpose IOs
  - 2 Ethernet ports (available as 1000BT or 1000E
  - 1 RS485 port, 2 USB 2.0 ports, one eUSB housing slot

# IC-FEP-VPX6b

# Signal Processing Reference Design

INTERFACE CONCEPT has developed a wide product range incorporating the latest technological innovations: Ethernet switches and IP routers, Intel and Freescale based SBCs, and IO boards (graphics, storage...).

FPGA boards intended for high-performance computing have been added to this product portfolio, and represent a market on which IC concentrates as a major international player.

- ✓ To facilitate the integration of these building blocks,
- ✓ To streamline developments, and
- ✓ To empower customers to concentrate their efforts on their most critical tasks, thus maximizing their added value,

IC provides a complete Signal Processing Reference Design with these building blocks.

# ADC Reference Design

# PCIe DMA Engine

See the PCIe DMA Engine Reference Design Quick Start Guide

# Example of RF: PCIe DMA Engine

#### **FMC Example**
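
# Worked Example: Clock Jitter Budget

The reference equations in the "ADC Performance" section and the "Required external clock performance" table can be checked numerically. The sketch below is an illustration only, not IC's tool: the function names are hypothetical, and it assumes the table's apparent inputs (12-bit resolution, 58 dB desired SNR, 700 MHz input frequency, 200 fs ADC jitter, 0 fs clock buffer jitter). It inverts the SNRT and SNRJ equations to find the jitter that the external clock may contribute.

```python
import math


def snr_quantization_db(bits):
    """Best-case quantization SNR: SNRQ = 6.02*N + 1.76 (dB)."""
    return 6.02 * bits + 1.76


def required_jitter_snr_db(snr_total_db, snr_q_db):
    """Solve SNRT = -10*log10(10^(-SNRJ/10) + 10^(-SNRQ/10)) for SNRJ."""
    noise_total = 10 ** (-snr_total_db / 10)
    noise_quant = 10 ** (-snr_q_db / 10)
    if noise_total <= noise_quant:
        raise ValueError("desired SNR is not reachable at this resolution")
    return -10 * math.log10(noise_total - noise_quant)


def total_jitter_s(snr_j_db, fin_hz):
    """Invert SNRJ = -20*log10(2*pi*fin*TJ) to get the total RMS jitter TJ."""
    return 10 ** (-snr_j_db / 20) / (2 * math.pi * fin_hz)


def external_jitter_s(tj_total_s, tj_adc_s, tj_cdc_s):
    """Remaining budget from TJ^2 = TJ_ext^2 + TJ_cdc^2 + TJ_adc^2."""
    remaining = tj_total_s ** 2 - tj_adc_s ** 2 - tj_cdc_s ** 2
    if remaining < 0:
        raise ValueError("ADC and clock buffer jitter exceed the total budget")
    return math.sqrt(remaining)


def clock_jitter_budget(bits, snr_total_db, fin_hz, tj_adc_s, tj_cdc_s):
    """Chain the four steps and return every intermediate value."""
    snr_q = snr_quantization_db(bits)
    snr_j = required_jitter_snr_db(snr_total_db, snr_q)
    tj_total = total_jitter_s(snr_j, fin_hz)
    tj_ext = external_jitter_s(tj_total, tj_adc_s, tj_cdc_s)
    return {
        "snr_q_db": snr_q,
        "snr_j_db": snr_j,
        "tj_total_s": tj_total,
        "tj_ext_s": tj_ext,
    }


if __name__ == "__main__":
    # Inputs taken from the table (assumed to be the user-input rows).
    budget = clock_jitter_budget(
        bits=12, snr_total_db=58.0, fin_hz=700e6,
        tj_adc_s=200e-15, tj_cdc_s=0.0,
    )
    print(f"SNRQ   = {budget['snr_q_db']:.2f} dB")
    print(f"SNRJ   = {budget['snr_j_db']:.2f} dB")
    print(f"TJ     = {budget['tj_total_s'] * 1e15:.0f} fs RMS")
    print(f"TJ_ext = {budget['tj_ext_s'] * 1e15:.0f} fs RMS")
```

With these inputs the sketch reproduces the table to rounding: SNRQ of 74.00 dB, SNRJ of 58.11 dB, a total jitter of about 283 fs RMS, and an external clock allowance of about 200 fs RMS. Raising the clock buffer jitter above 0 fs shrinks the external clock allowance accordingly.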