Images courtesy SRC Computers
Figure 1: In the Implicit+Explicit Architecture, Dense Logic Devices (DLDs) encompass a family of components that includes microprocessors, digital signal processors, and some ASICs. These processing elements are all implicitly controlled and typically are made up of fixed logic that is not altered by the user.
Figure 2: Systems can be built with a single MAP processor and microprocessor combination. When more flexibility is desired, Multi-Ported Common Memory accommodating up to three MAP processors, or Hi-Bar switches accommodating thousands of MAP processors, can be employed.
Figure 3: SRC servers that use the Hi-Bar crossbar switch interconnect can incorporate common memory nodes in addition to microprocessor and MAP nodes. Each of these common memory nodes contains an intelligent DMA controller and up to 16 GBytes of DDR-2 SDRAM.
Figure 4: The MAP processor used in this system was the most powerful SRC-6 MAP processor ever produced. It was coupled to an Intel Pentium microprocessor and used a Fedora Linux operating system.
Figure 5: The second airborne system in production is a 10-module system designed for payload bay 3 of the General Atomics Sky Warrior, but it is also usable in other, larger manned and unmanned platforms. It contains a dual Xeon motherboard, a Hi-Bar switch, 750 GBytes of removable encrypted storage, a 28 VDC power system, a thermal solution, and a mixture of up to 10 MAP processors or common memory nodes.
Figure 6: This system is being designed to withstand an operating temperature range of –50 °C to +50 °C and an altitude limit in excess of 25,000 feet, and it will meet shock and vibration requirements for single-engine aircraft weighing less than 12,500 pounds.
Figure 7: A grayscale (GS) pixel’s intensity is simply the pixel’s eight-bit numeric value, but for a color pixel the intensity information is distributed among the individual RGB values. To obtain the intensity value for an RGB pixel, each 24-bit RGB value is transformed from the RGB color space to the Hue-Saturation-Intensity (HSI) color space. The intensity values for all pixels in both frames are then histogrammed. From these two intensity histograms, a statistical Cumulative Distribution Function (CDF) is created for each frame and then normalized. A mapping function is created from these two normalized CDF arrays to map each original color pixel intensity value to a new intensity value such that the distribution of new intensity values matches the GS pixel intensity distribution. The original intensity values are re-mapped, and the new HSI image is transformed back into the RGB color space.
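The histogram-matching steps in the Figure 7 caption can be sketched in a few lines of Python. This is a minimal illustration that assumes 8-bit intensities held in plain lists. The function names `normalized_cdf` and `build_map` are hypothetical, and the rule used to build the map (pick the smallest grayscale level whose CDF reaches the color CDF) is one common choice, not necessarily the one used on the MAP processor.

```python
from itertools import accumulate


def normalized_cdf(intensities, bins=256):
    """Histogram 8-bit intensities and return the CDF scaled to [0, 1]."""
    hist = [0] * bins
    for value in intensities:
        hist[value] += 1
    cdf = list(accumulate(hist))          # running sum of the histogram
    total = cdf[-1]
    return [count / total for count in cdf]


def build_map(src_cdf, ref_cdf):
    """Map each source level to the smallest reference level whose CDF
    is at least the source CDF (assumed matching rule)."""
    mapping = []
    j = 0
    last = len(ref_cdf) - 1
    for s in src_cdf:                     # src_cdf is non-decreasing,
        while j < last and ref_cdf[j] < s:  # so j only ever moves forward
            j += 1
        mapping.append(j)
    return mapping


# Toy data: four color-pixel intensities and four grayscale intensities.
color_intensity = [10, 20, 30, 40]
gs_intensity = [100, 150, 200, 250]

mapping = build_map(normalized_cdf(color_intensity),
                    normalized_cdf(gs_intensity))
remapped = [mapping[v] for v in color_intensity]
print(remapped)
```

With this toy data, each color intensity is moved to the grayscale level that occupies the same position in its distribution.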
Figure 8: The MAP processor’s GCM Bank 0 acts as a frame buffer for the RGB image and GCM Bank 1 acts as a frame buffer for the GS image. In stage 0, two RGB and six GS pixel intensities are histogrammed in parallel every clock. The integer RGB intensity calculation is part of the RGB histogramming pipeline. After all pixel intensities for both frames are histogrammed, stage 1 calculates the CDF arrays for both histograms, for all histogram bins, in parallel. Stage 2 normalizes both CDF arrays in parallel, a single-precision floating-point (SPFP) calculation. Stage 3 uses both normalized CDF arrays to generate the histogram-matching Map array. Finally, stage 4 re-reads the RGB image data from GCM Bank 0, two RGB pixels per clock, and calculates the HSI pixel values. The two integer intensity values select two new intensity values from the Map array generated in stage 3. The two new intensity values are cast to SPFP and, together with the two SPFP pixel hue and saturation values, are converted back to the 24 bpp RGB color space and stored in GCM Bank 1.
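The stage 4 re-mapping step can be illustrated for a single pixel. The sketch below is not the MAP implementation, which converts through SPFP hue and saturation values. It substitutes a shortcut that relies on a property of the HSI model: with I = (R+G+B)/3, scaling R, G, and B by the same positive factor leaves hue and saturation unchanged and scales the intensity by that factor. The name `remap_pixel` and the clipping at 255 are assumptions, and clipping can alter saturation for very bright results.

```python
def remap_pixel(rgb, intensity_map):
    """Return an 8-bit RGB pixel whose HSI intensity is taken from
    intensity_map, keeping hue and saturation by uniform scaling."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        # Black has no defined hue; return the mapped gray level.
        new = intensity_map[0]
        return (new, new, new)
    old = total // 3                      # integer intensity, as in stage 0
    new = intensity_map[old]              # look up the matched intensity
    scale = 3 * new / total               # new intensity / exact old intensity
    return tuple(min(255, round(c * scale)) for c in rgb)


# Hypothetical maps for demonstration only.
identity_map = list(range(256))
doubling_map = [min(255, 2 * i) for i in range(256)]

unchanged = remap_pixel((30, 60, 90), identity_map)
brighter = remap_pixel((30, 60, 90), doubling_map)
print(unchanged, brighter)
```

In a full pipeline, `intensity_map` would be the Map array produced by the histogram-matching stage rather than these demonstration maps.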
Figure 9: The CPU normalized cross-correlation application is a single-threaded, serial implementation of the algorithm shown in Figure 8.
SRC Computers pioneered the breakthrough that was required to make DEL computing ready for prime time. We tightly coupled the hardware into a standards-based environment using a high-bandwidth, low-latency connection, and then made this inherently superior hardware performance accessible to the broadest possible range of Fortran and C application developers.
The MAP Processor
The patented MAP processor uses commodity reconfigurable components to accomplish control, user-defined compute, data prefetch, and data access functions. This compute capability is teamed with very high on- and off-module interconnect and memory bandwidths.
The MAP processor’s 16 logical banks of on-board SRAM memory provide 19.2 GBytes/sec of local memory bandwidth. In addition, the MAP processor contains two 1-GByte globally shared DDR-2 SDRAM banks. The processor is equipped with two input ports and two output ports, each sustaining a data payload bandwidth of 3.6 GBytes/sec. This allows it to simultaneously sustain two input and two output DMAs at an aggregate bandwidth of 14.4 GBytes/sec.
The MAP processor is a 5-inch-by-7-inch module typically housed in a 5.25-inch drive bay enclosure along with its cooling solution and power converters. Each MAP processor is powered by +12 V from a standard disk drive connector, which allows MAP processors to be incorporated easily into commodity PC enclosures. It also allows multiple MAP processors to be packaged densely into 2U-high rack-mount chassis for use in SRC high-performance servers, as well as into custom enclosures for rugged applications.
Systems can be built with a single MAP processor and microprocessor combination. When more flexibility is desired, Multi-Ported Common Memory accommodating up to three MAP processors, or SRC’s proprietary Hi-Bar® switches accommodating thousands of MAP processors, can be employed. Each Hi-Bar module supports 64-bit addressing and has 16 input and 16 output ports to connect to 16 nodes. Microprocessors, MAP processors, and Common Memory nodes can all be connected to a Hi-Bar switch in any configuration, as shown in Figure 3. Each input or output port sustains a yielded data payload of 3.6 GBytes/sec, for an aggregate yielded bisection data bandwidth of 57.6 GBytes/sec per 16 ports. Port-to-port latency is 180 ns, with Single Error Correction and Double Error Detection (SECDED) implemented on each port. Hi-Bar switches can also be interconnected in multi-tier configurations, allowing two tiers to support 256 nodes. When used in SRC server products, each Hi-Bar switch is housed in a 1U- or 2U-high, 19-inch-wide rack-mountable chassis, along with its power supplies and cooling solution. In custom enclosures, the Hi-Bar switch can be mounted in a variety of alternative configurations.
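The bandwidth figures quoted for the MAP processor and the Hi-Bar switch follow from the per-port rate, as the short check below shows. The variable names are illustrative. The per-bank SRAM figure is derived here by simple division and is not stated in the text. The two-tier node count is shown as 16 × 16, which matches the quoted 256 nodes, although the text does not describe the actual topology.

```python
PORT_GBYTES_PER_SEC = 3.6     # sustained payload per port (from the text)

# MAP processor: two input ports plus two output ports.
map_ports = 2 + 2
map_aggregate = map_ports * PORT_GBYTES_PER_SEC

# On-board SRAM: 19.2 GBytes/sec spread over 16 logical banks (derived).
sram_per_bank = 19.2 / 16

# Hi-Bar switch: 16 ports at the same per-port rate.
hibar_ports = 16
hibar_aggregate = hibar_ports * PORT_GBYTES_PER_SEC

# Two tiers of 16-port switches (assumed 16 x 16 arrangement).
two_tier_nodes = hibar_ports ** 2

print(f"MAP aggregate DMA bandwidth: {map_aggregate:.1f} GBytes/sec")
print(f"SRAM bandwidth per bank:     {sram_per_bank:.1f} GBytes/sec")
print(f"Hi-Bar aggregate bandwidth:  {hibar_aggregate:.1f} GBytes/sec")
print(f"Two-tier node count:         {two_tier_nodes}")
```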