GRFPU High-Performance Floating-Point Unit

The GRFPU is an IEEE-754 compliant floating-point unit supporting both single- and double-precision operands. The advanced design combines high throughput with low latency, providing up to 250 MFLOPS on a 0.13 um ASIC process. The host interface is clean and versatile, which simplifies integration with processor pipelines and DSPs. The accuracy and convergence of the FPU algorithms have been proven mathematically, and the implementation has been validated with more than 20 million test vectors. A brief datasheet can be downloaded from the download page.
  • IEEE-754 compliant, supporting all rounding modes and exceptions
  • Operations: add, subtract, multiply, divide, square-root, convert, compare, move, abs, negate
  • Data formats: single and double precision (32- and 64-bit floats)
  • Fully pipelined, 3 clock cycles latency for all operations except divide and square-root
  • Non-blocking parallel execution of divide and square-root operations
  • Clean and versatile interface
  • LEON FP Control unit available
  • Supports all SPARC V8 floating-point instructions
  • 250 MHz (250 MFLOPS) on a typical 0.13 um standard cell process using less than 100 kgates
  • 65 MHz (65 MFLOPS) on a Virtex-II FPGA using approximately 8,500 LUTs
  • Fault-tolerant (FT) version available
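The "250 MHz (250 MFLOPS)" figure follows from the pipeline accepting one operation per clock. The short Python sketch below shows how the sustained rate approaches the clock rate for a stream of independent, fully pipelined operations with the 3-cycle latency listed above. It assumes each operation counts as one floating-point operation and ignores divide, square root and data dependencies; the function name `sustained_mflops` is hypothetical.

```python
def sustained_mflops(n_ops: int, clock_mhz: float, latency: int = 3) -> float:
    """Sustained MFLOPS for n independent pipelined operations issued back to back.

    Assumption of this sketch: one operation issues per clock and each counts as
    one FLOP. The last operation issues in cycle n_ops - 1 and its result appears
    `latency` cycles later, so the whole stream takes (n_ops - 1) + latency cycles.
    """
    cycles = (n_ops - 1) + latency
    return n_ops * clock_mhz / cycles


# A single isolated operation only reaches a third of the peak rate,
# while a long stream converges towards the 250 MFLOPS peak.
print(sustained_mflops(1, 250.0))
print(sustained_mflops(1000, 250.0))
```

For long streams the pipeline fill time of two extra cycles becomes negligible, which is why the peak rating equals the clock frequency.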
Functional Description
The GRFPU performs operations on single- and double-precision floating-point operands. All operations are IEEE-754 compliant, with the exception of denormalized numbers, which are flushed to zero. All four IEEE-754 rounding modes and the detection of all exception conditions are fully supported.
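The flush-to-zero rule for denormalized numbers can be illustrated with a small Python sketch that maps any subnormal value to zero. It is an illustration of the value-level rule only: keeping the sign of the original value, and whether the GRFPU applies the rule to operands, results or both, are assumptions not stated here. Python floats are binary64, so the single-precision case only checks the binary32 threshold and does not round to binary32. The names `flush_to_zero` and `MIN_NORMAL` are hypothetical.

```python
import math
import sys

# Smallest positive normal values of IEEE-754 binary32 and binary64.
MIN_NORMAL = {"single": 2.0 ** -126, "double": sys.float_info.min}


def flush_to_zero(x: float, precision: str = "double") -> float:
    """Replace a denormalized (subnormal) value by zero.

    Assumption of this sketch: the zero keeps the sign of x.
    Zeros, normal numbers, infinities and NaNs are returned unchanged.
    """
    if x != 0.0 and math.isfinite(x) and abs(x) < MIN_NORMAL[precision]:
        return math.copysign(0.0, x)
    return x


print(flush_to_zero(5e-324))           # smallest binary64 subnormal becomes 0.0
print(flush_to_zero(1e-40, "single"))  # subnormal in binary32 becomes 0.0
print(flush_to_zero(1e-40))            # still a normal binary64 value, unchanged
```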


An FPU operation is started by providing the operands, opcode and rounding mode on a rising clock edge. The result and the exception flags are available three clock cycles later. The FPU is fully pipelined, so a new operation can be started every clock cycle. The only exceptions are the FDIV and FSQRT instructions, which require between 15 and 24 clock cycles to complete and are not pipelined. They are, however, computed in a separate non-blocking execution unit, so all other operations can proceed in parallel without stalling the FPU pipeline. The table below summarises the throughput and latency of the supported operations:

Operation                                          Throughput (cycles/op)   Latency (cycles)   Description
FADDS, FADDD, FSUBS, FSUBD, FMULS, FMULD, FSMULD   1                        3                  Add, subtract, multiply
FITOS, FITOD, FSTOI, FDTOI, FSTOD, FDTOS           1                        3                  Convert between integer, single- and double-precision formats
FCMPS, FCMPD, FCMPES, FCMPED                       1                        3                  Compare
FDIVS/FDIVD                                        15/16                    15/16              Divide (single/double)
FSQRTS/FSQRTD                                      23/24                    23/24              Square root (single/double)
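To illustrate how the non-blocking divide/square-root unit interacts with the pipelined operations, the Python sketch below computes issue and result cycles for a sequence of independent operations using the latencies from the table. It is a simplified model, not the GRFPU or GRFPC issue logic: it assumes at most one operation issues per cycle, that the divide/square-root unit accepts a new operation only after the previous one has completed, and that a divide or square root arriving while that unit is busy holds up issue. Data dependencies are ignored, and the function name `schedule` is hypothetical.

```python
from typing import List, Tuple

# Latencies in clock cycles, taken from the operation table.
PIPELINED_LATENCY = 3
DIVSQRT_LATENCY = {"FDIVS": 15, "FDIVD": 16, "FSQRTS": 23, "FSQRTD": 24}


def schedule(ops: List[str]) -> List[Tuple[str, int, int]]:
    """Return (op, issue_cycle, result_cycle) for ops issued in program order.

    Any opcode not in DIVSQRT_LATENCY is treated as a pipelined operation
    with a 3-cycle latency.
    """
    next_issue = 0      # earliest cycle in which the next op may issue
    divsqrt_free = 0    # first cycle in which the div/sqrt unit is idle
    timeline = []
    for op in ops:
        if op in DIVSQRT_LATENCY:
            # Non-pipelined unit: wait until the previous div/sqrt has finished.
            issue = max(next_issue, divsqrt_free)
            done = issue + DIVSQRT_LATENCY[op]
            divsqrt_free = done
        else:
            # Pipelined ops run alongside an active divide or square root.
            issue = next_issue
            done = issue + PIPELINED_LATENCY
        timeline.append((op, issue, done))
        next_issue = issue + 1
    return timeline


for entry in schedule(["FDIVD", "FADDD", "FMULD", "FDIVS"]):
    print(entry)
```

In this example the add and multiply issue in cycles 1 and 2 and complete in cycles 4 and 5 while the double-precision divide is still running; the single-precision divide cannot start until cycle 16, when the divide unit becomes free.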

Validation
The GRFPU core has been extensively validated with a large set of test vectors. Special test programs such as TestFloat, UCBTEST and IEEE CC754 have been used, as well as floating-point application software.

LEON FPU Control Unit
The GRFPU can be attached to LEON2 and LEON3 processors through the LEON FPU Control unit (GRFPC). The control unit receives SPARC FPU instructions (FPOPs) from the LEON integer unit and schedules them for execution by the FPU. FPOPs execute in parallel with integer instructions; the LEON pipeline is stalled only in case of operand or resource conflicts. The GRFPC also includes the FPU register file, the processor floating-point status register (FSR) and a deferred trap queue. The GRFPC is available for all versions of the LEON processor.
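An operand conflict of the kind mentioned above can be illustrated with a minimal read-after-write scoreboard. This is a hypothetical Python sketch, not the GRFPC design: the class `Scoreboard` and its `issue` method are invented for illustration, registers are plain integers, and only read-after-write operand conflicts are modelled (resource conflicts, double-precision register pairs and in-order issue constraints are left out).

```python
from typing import Dict, Iterable


class Scoreboard:
    """Hypothetical read-after-write tracker for FP registers."""

    def __init__(self) -> None:
        # FP register number -> cycle in which its pending result becomes available.
        self.ready_at: Dict[int, int] = {}

    def issue(self, cycle: int, dst: int, srcs: Iterable[int], latency: int) -> int:
        """Return the cycle in which the FPOP can actually issue (>= cycle).

        The FPOP stalls until every source register it reads is ready,
        then records when its own destination register will be written.
        """
        start = max([cycle] + [self.ready_at.get(r, 0) for r in srcs])
        self.ready_at[dst] = start + latency
        return start


sb = Scoreboard()
print(sb.issue(0, dst=2, srcs=[0, 1], latency=3))  # no conflict: issues in cycle 0
print(sb.issue(1, dst=4, srcs=[2, 3], latency=3))  # reads %f2, stalls until cycle 3
```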


The GRFPC requires approximately 4,000 LUTs on a Virtex-II FPGA or 20 kgates on a typical 0.13 um process.

Fault-tolerance
The fault-tolerant version of GRFPU and GRFPC includes SEU protection by design. The FPU register file is protected using (39,7) BCH coding, while all other registers are protected with TMR.
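Triple modular redundancy (TMR) keeps three copies of each register and takes a majority vote when the value is used, so an upset confined to one copy is outvoted. A minimal bitwise 2-of-3 voter is sketched below in Python. The function name `tmr_vote` is hypothetical, and the sketch shows only the voting logic, not how the fault-tolerant GRFPU/GRFPC implements its triplicated registers.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise 2-of-3 majority of three copies of a register value.

    Each result bit is 1 exactly when at least two of the three copies
    hold a 1 in that position, so a bit flip in a single copy is masked.
    """
    return (a & b) | (a & c) | (b & c)


# A single-event upset flips bit 2 in the third copy; the vote restores the value.
stored = 0b1010
print(bin(tmr_vote(stored, stored, stored ^ 0b0100)))  # prints 0b1010
```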

Documentation
The data sheet and white paper are available on the download page.

Availability
GRFPU and GRFPC are available immediately and licensed together. A pre-synthesized Virtex-II netlist is available for download.