Kontron HPEC Proof-of-Concept delivers breakthrough I/O data bandwidth

High-Performance Embedded Computing (HPEC) has made tremendous leaps in performance and capability, and new implementations keep adding to those advances. Processors like the 3rd generation Intel® Core™ i7 are making HPEC a powerful and cost-effective solution, eliminating the compute performance bottleneck of previous generations. However, a major challenge for HPEC platforms remains: "How do I get more high-bandwidth data to the processors?" Applications such as radar, sonar, and other signal processing systems generate gigabytes of data that demand high GFLOPS compute power.
Editor’s Note: This article first appeared as a blog post on the Intel Embedded Community website, and was published by the Intel Intelligent Systems Alliance.

Kontron took a major step towards solving this problem with its HPEC Proof-of-Concept (PoC) platform, which delivers 40 Gigabit Ethernet (GbE)-equivalent data transfer rates via PCI Express 3.0 (Figure 1). The platform delivers breakthrough I/O data bandwidth in a small footprint for next-generation embedded radar/sonar applications. The PoC platform is based on VPX, which is known for its high performance, harsh environment capability, and small size. Designed as a complete system, the Kontron HPEC PoC integrates the 3U VPX SBC VX3044, the VPX PCIe Switch VX3905, and the high-end L2/L3 Ethernet switch VX3910 into one platform (Figure 2).

Figure 1: The Kontron HPEC PoC delivers 40 GbE-equivalent data rates over PCI Express 3.0 in a small footprint, VPX-based solution.

Figure 2: The VX3044 is a 3U VPX SBC integrated in the Kontron PoC for HPEC applications.

The Kontron PoC is unusual for its use of PCIe instead of 10 GbE, a popular serial fabric option for high-performance computing platforms. Ethernet has the benefit of widespread adoption and excellent software support for the TCP/IP protocol. However, some applications require the even higher throughput available from serial fabrics like PCIe 3.0, and those fabrics have been hindered by programming challenges and more limited support for communication protocols.

What sets the Kontron platform apart is the use of PCIe 3.0 to deliver 40 GbE-equivalent data transfer rates using common TCP/IP protocols. The combination of PCIe 3.0 and TCP/IP is achieved through Kontron’s VXFabric* middleware, which implements the TCP/IP protocol over the PCIe infrastructure to boost transmission bandwidth to nearly 40 GbE speeds (Figure 3). This allows the I/O data bandwidth to match the capabilities of the 3rd generation Intel Core i7 processors on the VPX blades while running a well-established transfer protocol. It minimizes software impact during system development and lets legacy applications move to the new platform with little or no modification, which makes it easier to fully utilize the processing potential of the 3rd generation Intel Core i7. Routing of the PCIe fabric is provided by the Kontron VX3905, one of the industry’s first PCIe 3.0 VPX switches. PCIe 3.0 roughly doubles the per-lane throughput compared to PCIe 2.0, providing a major performance boost.
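
The "roughly doubles" figure follows from the published PCIe signalling rates and line encodings. The arithmetic can be sketched in a few lines of Python. The x4 and x8 link widths below are illustrative assumptions only, since the lane count used in the PoC is not stated here, and the helper name `lane_gbps` is hypothetical.

```python
# Per-lane PCIe payload bandwidth from signalling rate and line encoding.
# Rates and encodings come from the public PCIe specifications,
# not from Kontron's material.

def lane_gbps(gigatransfers_per_s: float, payload_bits: int, total_bits: int) -> float:
    """Usable gigabits per second on one lane, in one direction."""
    return gigatransfers_per_s * payload_bits / total_bits

gen2 = lane_gbps(5.0, 8, 10)      # PCIe 2.0: 5 GT/s with 8b/10b encoding
gen3 = lane_gbps(8.0, 128, 130)   # PCIe 3.0: 8 GT/s with 128b/130b encoding

print(f"PCIe 2.0 lane: {gen2:.2f} Gbit/s ({gen2 / 8 * 1000:.0f} MB/s)")
print(f"PCIe 3.0 lane: {gen3:.2f} Gbit/s ({gen3 / 8 * 1000:.0f} MB/s)")
print(f"Gen3 / Gen2 ratio: {gen3 / gen2:.2f}x")

# Assumed link widths, for illustration only.
for lanes in (4, 8):
    print(f"x{lanes} Gen3 link: {lanes * gen3:.1f} Gbit/s before protocol overhead")
```

Most of the gain comes from the higher signalling rate. The rest comes from the leaner 128b/130b encoding, which spends far fewer bits on line coding than 8b/10b.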

Figure 3: Kontron's VXFabric middleware enables the TCP/IP protocol over PCI Express to facilitate 40 GbE-like bandwidth speeds.

With VXFabric, the use of standard communication protocols (TCP/IP or UDP/IP through the socket API) protects the application software investment. Legacy software can run today, and new software based on TCP/IP is assured of support for years to come. OEMs and developers gain an optimized Total Cost of Ownership (TCO) and a direct migration path from the applications they deploy today. VXFabric handles fast, low-latency peer-to-peer communication between compute nodes within a chassis. It can deliver up to 4.2 gigabytes per second (GBps) of data throughput between VPX boards in a rack over PCI Express.
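
Because the fabric is reached through the ordinary socket API, application code of the following shape should not need to know which physical link carries the data. This is a minimal sketch, not Kontron sample code. The loopback address `127.0.0.1` stands in for whatever IP address a peer board would have on the fabric (an assumption), and the helper name `receive_all` is hypothetical.

```python
import socket
import threading

def receive_all(server: socket.socket, result: list) -> None:
    """Accept one connection and collect everything the peer sends."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(65536)
            if not data:          # peer closed the connection
                break
            chunks.append(data)
    result.append(b"".join(chunks))

# Stand-in address: on a real system this would be the IP address of the
# peer board's fabric interface (assumption, not taken from the article).
HOST = "127.0.0.1"

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind((HOST, 0))            # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

received = []
receiver = threading.Thread(target=receive_all, args=(server, received))
receiver.start()

payload = bytes(range(256)) * 4096    # 1 MiB of sample data
with socket.create_connection((HOST, port)) as client:
    client.sendall(payload)           # closing the socket signals end of data

receiver.join()
server.close()
print(f"sent {len(payload)} bytes, received {len(received[0])} bytes")
```

Nothing in the sketch is specific to PCIe or Ethernet. Only the address would change, which is the portability argument made above.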

In addition to the high-performance interconnection between blades, the platform draws tremendous processing power from the 3rd generation Intel Core i7-3612QE processor (6M cache, 4 execution cores, 8 threads) and its integrated graphics core, the Intel® HD Graphics 4000. The Intel HD Graphics 4000 provides 16 graphics Execution Units (EUs) that improve 3D performance by as much as 2x. Built on a 22 nm process, the combination keeps power consumption low, and the integration enables even smaller packages, which helps increase the blade functional density that is so important to HPEC.

HPEC applications require high GFLOPS performance. This is achieved with the Core/GPU combination of the 3rd generation Intel Core i7-3612QE processor and Intel HD Graphics 4000 (Figure 4). This low-voltage processor has a very favorable GFLOPS/watt ratio thanks to Intel® Advanced Vector Extensions (Intel® AVX) technology.

Graphics computing offers unprecedented application performance by offloading compute-intensive portions of the application to the Intel HD Graphics 4000 execution units while the remainder of the code still runs on the CPU cores. The CPU cores are optimized for serial processing, while the graphics EUs are more efficient at parallel processing. Many radar and sonar applications can be broken down into serial and parallel algorithms that can take advantage of this combination. The graphics EUs provide a massively parallel processing subsystem that can work on many threads and large parallel data sets, boosting GFLOPS to high-performance levels.

Figure 4: Intel HD Graphics 4000 execution units add GFLOPS to HPEC applications.

Looking forward, the performance story will get even better with the 4th generation Intel® Core processor family. These chips introduce the Intel® Advanced Vector Extensions (Intel® AVX) 2.0 instruction set, which doubles peak floating-point throughput to enable a quad-core mobile-class processor to achieve up to 307 GFLOPS at 2.4 GHz. The graphics engine is also upgraded, offering another 352 GFLOPS of raw performance through OpenCL 1.2 programming – more than doubling overall compute potential – while adding only a few watts of power consumption.
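
The 307 GFLOPS figure can be reproduced with simple peak-rate arithmetic. The sketch below assumes single-precision math and two 256-bit fused multiply-add (FMA) units per core, a microarchitectural detail not stated in this article. The graphics figure is taken as quoted rather than derived.

```python
# Peak single-precision GFLOPS for the CPU cores of a quad-core part.
cores = 4
clock_ghz = 2.4
fma_units_per_core = 2   # assumption: two 256-bit FMA units per core
lanes_per_unit = 8       # 256-bit vector / 32-bit floats
flops_per_fma = 2        # one multiply plus one add per lane

flops_per_cycle = fma_units_per_core * lanes_per_unit * flops_per_fma
cpu_gflops = cores * clock_ghz * flops_per_cycle

gpu_gflops = 352.0       # quoted in the article, not derived here
total_gflops = cpu_gflops + gpu_gflops

print(f"FLOPs per cycle per core: {flops_per_cycle}")
print(f"CPU peak: {cpu_gflops:.1f} GFLOPS")
print(f"CPU + graphics: {total_gflops:.1f} GFLOPS "
      f"({total_gflops / cpu_gflops:.2f}x the CPU alone)")
```

These are theoretical peaks. Sustained throughput on real signal-processing kernels depends on memory bandwidth and on how well the code vectorizes.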

The Kontron PoC platform is also notable for integrating a wealth of hardware and software that simplifies development. The PoC platform includes a Linux distribution, diskless node support, and parallel workload management software. Also integrated are compilers, optimized FFT benchmark code samples, and a stress test application framework for benchmarking. Computer health management is enabled through a chassis management board kit, system-wide PBIT, and power/performance management at the system level.
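
To give a sense of what an FFT benchmark measures, here is a minimal pure-Python sketch. It is not the optimized sample code shipped with the PoC. It uses a textbook radix-2 FFT and the conventional 5·N·log2(N) operation-count estimate for a complex transform, and the function names `fft` and `benchmark` are hypothetical.

```python
import cmath
import math
import time

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out

def benchmark(n, repeats=5):
    """Time the FFT of a 3-cycle sine wave; return (spectrum, GFLOPS estimate)."""
    signal = [complex(math.sin(2 * math.pi * 3 * i / n), 0.0) for i in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        spectrum = fft(signal)
        best = min(best, time.perf_counter() - start)
    flops = 5 * n * math.log2(n)   # conventional estimate, not an exact count
    return spectrum, flops / best / 1e9

spectrum, gflops = benchmark(1024)
peak_bin = max(range(512), key=lambda k: abs(spectrum[k]))
print(f"peak at bin {peak_bin}, estimated rate {gflops:.4f} GFLOPS")
```

An interpreted implementation like this one runs orders of magnitude below the hardware peak. Optimized, vectorized FFT code exists to close that gap.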

The goal of the Kontron HPEC PoC is to help developers dramatically streamline the path from design to field deployment of next-generation radar and sonar systems, which are expected to make a tremendous jump in processing power and bandwidth.

The VPX-based PoC is a flexible design that can be optimized for the most demanding applications. The configurable PCI Express switch fabric interconnect can be routed in whatever way best suits the data transfer needs of an HPEC platform. Legacy application support with TCP/IP makes the platform even more attractive, leaving it to the imagination of designers to utilize the 10x increase in bandwidth. Based on mainstream IT technology (TCP/IP, PCIe, Intel® processors), the Kontron HPEC PoC was also developed to address the U.S. military’s smart procurement initiatives, which put more rapid and agile purchasing processes into place.

Kontron | us.kontron.com


Original text can be found at: opsy.st/IEuYJa