The GRFPU is an IEEE-754 compliant floating-point unit, supporting both single and double precision
operands. The advanced design combines high throughput with low
latency, providing up to 250 MFLOPS on a 0.13 um ASIC process. The host
interface is clean and versatile, simplifying the interfacing to
processor pipelines and DSPs. The accuracy and convergence of the FPU
algorithms have been proven mathematically, and the implementation has
been validated with more than 20 million test vectors. A brief
datasheet can be downloaded here.
- IEEE-754 compliant, supporting all rounding modes and
exceptions
- Operations: add, subtract, multiply, divide, square-root,
convert, compare, move, abs, negate
- Data formats: single and double precision (32- and 64-bit
floats)
- Fully pipelined, 3 clock cycles latency for all operations
except divide and square-root
- Non-blocking parallel execution of divide and square-root
operations
- Clean and versatile interface
- LEON FP Control unit available
- Supports all SPARC V8 floating-point instructions
- 250 MHz (250 MFLOPS) on a typical 0.13 um standard cell
process using less than 100 kgates
- 65 MHz (65 MFLOPS) on a Virtex-II FPGA using approximately
8,500 LUTs
- Fault-tolerant (FT) version available
Functional Description
The GRFPU performs operations on single and
double precision floating-point operands. All operations are IEEE-754
compliant, with the exception of denormalized numbers, which are flushed
to zero. The four rounding modes specified by IEEE-754 and the detection
of exception conditions are fully supported.
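The flush-to-zero behaviour mentioned above can be illustrated on single precision bit patterns. This is a minimal sketch, not the GRFPU implementation: the function names are hypothetical, and it assumes that the sign bit is kept when a denormalized value is flushed, which the text above does not specify.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 single precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def flush_denormal_f32(bits: int) -> int:
    """Replace a denormalized single precision pattern with zero.

    Assumption: the sign bit is preserved (signed zero). All other
    patterns (zero, normal numbers, infinities, NaNs) pass through.
    """
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23-bit fraction
    if exponent == 0 and fraction != 0:
        return bits & 0x80000000     # keep only the sign bit
    return bits


# 1e-40 is below the smallest normal single precision value (about 1.18e-38),
# so it is stored as a denormalized number and is flushed by this model.
flushed = flush_denormal_f32(f32_bits(1e-40))
```

Software that relies on gradual underflow may see different results under such a scheme, so it is worth checking whether an application produces values in the denormalized range.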

An FPU operation is started by providing the operands, opcode and
rounding mode on a rising clock edge. The result and the exception
flags will be available three clocks later. The FPU is fully pipelined
and a new operation can be started every clock cycle. The only
exceptions are the FDIV and FSQRT instructions, which take between 15
and 24 clock cycles to complete and are not pipelined. They are,
however, executed in a separate non-blocking execution unit, allowing
all other operations to proceed in parallel without stalling the
FPU pipeline. The table below summarises the throughput and latency of
the supported operations:
| Operation | Throughput | Latency | Description |
|---|---|---|---|
| FADDS, FADDD, FSUBS, FSUBD, FMULS, FMULD, FSMULD | 1 | 3 | Add, subtract, multiply |
| FITOS, FITOD, FSTOI, FDTOI, FSTOD, FDTOS | 1 | 3 | Convert between floats and integers |
| FCMPS, FCMPD, FCMPES, FCMPED | 1 | 3 | Compare |
| FDIVS/FDIVD | 15/16 | 15/16 | Divide (single/double) |
| FSQRTS/FSQRTD | 23/24 | 23/24 | Square-root (single/double) |
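The timing described above can be explored with a toy scheduling model. This is only a sketch under stated assumptions, not a description of the actual hardware: operations are issued in order, at most one per clock cycle; there are no operand dependencies; and a second divide or square-root waits until the separate execution unit is free again. The function and variable names are hypothetical; the cycle counts come from the table.

```python
# Latencies in clock cycles, taken from the table above.
PIPELINED_LATENCY = 3
ITERATIVE_LATENCY = {"FDIVS": 15, "FDIVD": 16, "FSQRTS": 23, "FSQRTD": 24}


def schedule(ops):
    """Return a list of (opcode, issue_cycle, result_cycle) tuples.

    Assumptions: in-order issue, one operation issued per cycle,
    no operand dependencies between the operations.
    """
    cycle = 0       # next cycle in which an operation may be issued
    unit_free = 0   # first cycle in which the divide/sqrt unit is free
    timeline = []
    for op in ops:
        if op in ITERATIVE_LATENCY:
            # Divide and square-root are not pipelined: wait for the unit.
            issue = max(cycle, unit_free)
            done = issue + ITERATIVE_LATENCY[op]
            unit_free = done
        else:
            # Pipelined operations never wait for the divide/sqrt unit.
            issue = cycle
            done = issue + PIPELINED_LATENCY
        timeline.append((op, issue, done))
        cycle = issue + 1
    return timeline


for op, issue, done in schedule(["FADDS", "FDIVD", "FMULD", "FADDD"]):
    print(f"{op:6s} issued at cycle {issue:2d}, result at cycle {done:2d}")
```

In this model the FMULD and FADDD issued after the FDIVD deliver their results at cycles 5 and 6, well before the divide completes at cycle 17, which is the non-blocking behaviour the text describes.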
The GRFPU core has been extensively validated with a large
set of test vectors. Special test programs such as TestFloat, UCBTEST
and IEEE CC754 have been used, as well as floating-point based
application software.
The GRFPU can be attached to LEON2 and LEON3 processors through the LEON FPU Control
unit (GRFPC). The control unit receives SPARC FPU instructions (FPOP)
from the LEON integer unit, and schedules them for execution by the
FPU. The FPOPs are executed in parallel with integer
instructions; the LEON pipeline is stalled only in case of operand or
resource conflicts. The GRFPC also includes the FPU register file, the
processor floating-point status register (FSR) and a deferred trap
queue. The GRFPC is available for all versions of the LEON processor.

The GRFPC requires approximately 4,000 LUTs on a Virtex-II FPGA or 20
kgates on a typical 0.13 um process.
The fault-tolerant version of GRFPU and GRFPC includes SEU
protection by design. The FPU register file is protected using (39,7)
BCH coding, while all other registers are protected with TMR.
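The principle behind TMR (triple modular redundancy) can be sketched as a bitwise majority vote over three copies of a register. This is a generic illustration of the technique, not the GRFPU circuit; the function name and the example values are hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies of a register.

    Each output bit takes the value held by at least two of the three
    copies, so an upset confined to one copy of a bit is masked.
    """
    return (a & b) | (a & c) | (b & c)


stored = 0xDEADBEEF
upset = stored ^ (1 << 7)               # one copy suffers a single bit flip
recovered = tmr_vote(stored, upset, stored)
```

The vote still recovers the stored value when different bit positions are upset in different copies, because each bit position is voted on independently.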
Documentation
The data sheet and white paper are available on the download
page.
GRFPU and GRFPC are available immediately and licensed together. A pre-synthesized Virtex-II netlist is available for download.