Digital Core Design

The Power of Intellectual Property

DFPMU

Floating Point Coprocessor

The DFPMU is a floating point coprocessor designed to assist the CPU in floating point math computations. It directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It requires neither programming nor modifications to the main software: everything is done automatically during software compilation by the DFPMU C driver.

Our efficient coprocessor was designed to operate with DCD’s DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered in the package.
The DFPMU uses specialized CORDIC and standard algorithms to compute math functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value, change of sign, and the trigonometric functions sine, cosine, tangent and arctangent. It has built-in instructions for conversion from integer to floating point types and vice versa. The input number format complies with the IEEE-754 standard. Our solution supports single precision real numbers and 16-bit and 32-bit integers, and is prepared for use with 8-, 16- and 32-bit processors.
The DFPMU is a technology-independent design that can be implemented in a variety of process technologies.
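As an illustration of the "no modifications" claim, the sketch below is ordinary C that uses only float operators (+, -, *, comparisons). The function name and its parameters are hypothetical and are not part of any DCD driver. The assumption, based on the description above, is that code like this is compiled unchanged and the driver maps the float operations to the coprocessor.

```c
/* Ordinary C float code: weighted blend of two samples, clamped to +/- limit.
   Nothing here refers to the coprocessor. The name blend_clamped is
   hypothetical and chosen only for this illustration. */
float blend_clamped(float a, float b, float weight, float limit)
{
    /* float multiply, subtract and add */
    float mixed = a * weight + b * (1.0f - weight);

    /* float comparisons */
    if (mixed > limit)
        mixed = limit;
    if (mixed < -limit)
        mixed = -limit;

    return mixed;
}
```

Calling blend_clamped(2.0f, 4.0f, 0.5f, 10.0f) returns 3.0f on any conforming C implementation, with or without a coprocessor.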


Family summary

Design     Standard     Arithmetic   Trigonometric   Processor    Single      Double      8/16/32-bit   52-bit
           compliance   operations   operations      interfaces   precision   precision   integers      integers
DFPAU      IEEE-754     +            -               +            +           -           -             -
DFPMU      IEEE-754     +            +               +            +           -           +             -
DFPAU-DP   IEEE-754     +            -               +            +           +           +             +
DFPMU-DP   IEEE-754     +            +               +            +           +           +             +

Arithmetic operations: ADD, SUB, MUL, DIV, SQRT, COMP
Trigonometric operations: SIN, COS, TAN, ARCTAN
Processor interfaces: 8-, 16-, 32-bit

The main features of each member of the Arithmetic Coprocessors family are summarized in the table above. It gives a brief characterization of each member, helping you select the most suitable IP Core for your application.

Performance

Each core has been tested in a variety of FPGA and ASIC technologies. The implementation results are summarized below.

Implementation   Speed grade   LUTs/PFUs   Frequency [MHz]
ispXPGA          -5            5327/1393   42

DFPMU implementation results for LATTICE devices. 
All features have been included. 

Implementation   Speed grade   Slices   Frequency [MHz]
SPARTAN-3        -5            2330     47
SPARTAN-3E       -5            2930     80
SPARTAN-6        -3            1445     80
VIRTEX-II        -5            2330     77
VIRTEX-II pro    -7            2330     85
VIRTEX-4         -11           2930     103
VIRTEX-5         -3            1520     135

DFPMU implementation results for XILINX devices. 
All features have been included. 

Implementation   Speed grade   Logic Cells   Frequency [MHz]
STRATIX          -5            4460          108
CYCLONE          -6            4650          90
CYCLONE-II       -6            4520          96
STRATIX-II       -3            3300          168
STRATIX-IV       -2            3900          220

DFPMU implementation results for ALTERA devices.
All features have been included. 


Info

The tables and figures below illustrate the performance improvements of a system with the DFPMU for two typical CPUs.
The performance of the DFPMU floating point instructions has been compared to the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured between loading the input data into the work registers and storing the output result after the operation. The results are presented in the tables below.
Improvement has been computed as the number of clock cycles required by the CPU alone to compute an FP operation, divided by the number of clock cycles required by the system of CPU with DFPMU to compute the same operation:
Improvement = (clock cycles, CPU alone) / (clock cycles, CPU with DFPMU)
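The ratio described above can be sketched in C as follows. This is only an illustration of the arithmetic: the function name is hypothetical, and the zero-divisor guard is an added assumption rather than anything stated in the source.

```c
/* Improvement factor: cycles needed by the CPU alone divided by the
   cycles needed by the CPU with the coprocessor.
   Hypothetical helper, not part of any DCD software. */
double improvement(unsigned long cpu_only_cycles,
                   unsigned long cpu_with_dfpmu_cycles)
{
    /* Guard added for the sketch: a zero cycle count is not meaningful. */
    if (cpu_with_dfpmu_cycles == 0)
        return 0.0;

    return (double)cpu_only_cycles / (double)cpu_with_dfpmu_cycles;
}
```

For instance, invented counts of 1000 and 8 cycles would give an improvement of 125. These counts are made up for the example and are not measurements from this document.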

DP8051 BASED SYSTEM

The following table summarizes the performance of the DP8051+DFPMU compared to a standard 8051 microcontroller.

Device Improvement
80C51 1.0
DP8051 7.3
DP8051+DFPMU  162.0

 

Improvements of particular operations are presented below.

IEEE-754 FP Instruction Improvement
Addition 73
Subtraction 60
Multiplication 65
Division 182
Square Root 392
Sine 139
Cosine 144
Tangent 222
Arctangent 182
Average speed improvement: 162

32-BIT RISC BASED SYSTEM

The table below shows the performance improvements of a sample 32-bit RISC CPU with the DFPMU, compared to the same system without the DFPMU coprocessor.

Device Improvement
CPU 1.0
CPU+DFPMU (arithmetic) 7.5
CPU+DFPMU (trigonometric)  49.2
CPU+DFPMU (overall) 28.3

 

Improvements of particular operations are presented below.

IEEE-754 FP Instruction Improvement
Addition 6.4
Subtraction 6.5
Multiplication 5.1
Division 6.5
Square Root 12.9
Sine 40.8
Cosine 41.3
Tangent 65
Arctangent 49.6
Average speed improvement: 28.3
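The averages for the 32-bit RISC system can be re-derived from the per-instruction factors with a few lines of C. The sketch below copies the factors from the table above. The helper names are hypothetical. Treating the overall figure as the mean of the two group averages is an assumption that happens to reproduce the listed 28.3, since averaging all nine rows directly gives about 26.0 instead.

```c
#include <stddef.h>

/* Arithmetic mean of an array; returns 0.0 for an empty array. */
double mean(const double *values, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += values[i];
    return count ? sum / (double)count : 0.0;
}

/* Improvement factors copied from the 32-bit RISC table:
   addition, subtraction, multiplication, division, square root. */
const double arithmetic_gain[] = { 6.4, 6.5, 5.1, 6.5, 12.9 };

/* Sine, cosine, tangent, arctangent. */
const double trigonometric_gain[] = { 40.8, 41.3, 65.0, 49.6 };

/* ASSUMPTION: overall = mean of the two group averages. */
double overall_gain(void)
{
    double arithmetic = mean(arithmetic_gain, 5);       /* about 7.5  */
    double trigonometric = mean(trigonometric_gain, 4); /* about 49.2 */
    return (arithmetic + trigonometric) / 2.0;          /* about 28.3 */
}
```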

 

Key Features

  • Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
  • C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
  • No programming required
  • IEEE-754 Single precision real format support – float type
  • 16-bit word and 32-bit short integer formats supported – integer types
  • Flexible location of argument and result registers
  • Performs the following functions:
    • FADD, FSUB – addition, subtraction
    • FMUL, FDIV – multiplication, division
    • FSQRT – square root
    • FCHS, FABS – change of sign, absolute value
    • FXAM – examine input data
    • FUCOM – comparison
    • FSIN, FCOS – sine, cosine
    • FTAN – tangent
    • FATAN – arctangent
    • FILDW, FILD – 16-bit, 32-bit integer to float
    • FISTW, FIST – float to 16-bit, 32-bit integer
  • Built-in exception routines
  • Each exception indicator can be masked:
    • Loss of precision PE
    • Underflow result UE
    • Overflow result OE
    • Invalid operand IE
    • Division by zero ZE
    • Denormal operand DE
  • Fully synthesizable
  • Static synchronous design
  • Positive edge clocking and no internal tri-states
  • Scan test ready
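The six maskable exception indicators listed above can be modeled in C as bit flags. The source does not give the register layout, so the bit positions, the names and the convention that a set mask bit suppresses an indicator are all hypothetical and serve only to illustrate masking.

```c
#include <stdint.h>

/* Exception indicators named in the feature list.
   Bit positions are HYPOTHETICAL; the source does not specify them. */
enum dfpmu_exception {
    DFPMU_PE = 1u << 0, /* loss of precision */
    DFPMU_UE = 1u << 1, /* underflow result  */
    DFPMU_OE = 1u << 2, /* overflow result   */
    DFPMU_IE = 1u << 3, /* invalid operand   */
    DFPMU_ZE = 1u << 4, /* division by zero  */
    DFPMU_DE = 1u << 5  /* denormal operand  */
};

/* Returns the indicators that are raised in `status` and not suppressed
   by `mask` (ASSUMPTION: a set mask bit disables that indicator). */
uint8_t dfpmu_unmasked(uint8_t status, uint8_t mask)
{
    return (uint8_t)(status & (uint8_t)~mask);
}
```

With this convention, a status of DFPMU_ZE | DFPMU_PE and a mask of DFPMU_PE leaves only DFPMU_ZE visible.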

Applications

  • Math coprocessors
  • DSP algorithms
  • Embedded arithmetic coprocessor
  • Fast data processing & control

Symbol

Inputs:  datai (31:0), addr (4:0), cs, we
Outputs: datao (31:0), irq

Pins description

Pin              Type     Description
datai (31:0) ¹   input    Data bus input
addr (4:0) ²     input    Register address to read/write
cs               input    Chip select for read/write
we               input    Data write enable
datao (31:0) ¹   output   Data bus output
irq              output   Interrupt request indicator

Block Diagram

Exponent bus
The exponent data bus is a 17-bit wide bus used for transferring exponents between modules.
Mantissa bus
The mantissa data bus is a 70-bit wide internal bus used for transferring mantissas between modules.
Control bus
The control bus carries the control signals connected to each module. Main control is performed by the Control Unit.

Units

Align
It analyzes the input numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module.
Control Unit
It manages the execution of all instructions and the internal operations required to carry out a particular function.
Exponent
It performs operations on the exponent part of a number. The addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains exponent registers and work registers.

CORDIC
It performs trigonometric operations on the input data. The sine, cosine, tangent and arctangent operations are executed in this module. It contains three work registers.
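For readers unfamiliar with CORDIC, the following is a minimal textbook sketch of the rotation-mode algorithm for sine and cosine. It is not DCD's implementation: it uses double arithmetic and 16 iterations for readability, whereas hardware typically uses fixed-point shifts and adds. The function name is hypothetical. The three working variables x, y and z are a common formulation and are not claimed to correspond to the module's three work registers.

```c
#define CORDIC_ITERATIONS 16

/* atan(2^-i) in radians for i = 0..15 (the usual CORDIC lookup table). */
static const double cordic_atan[CORDIC_ITERATIONS] = {
    0.7853981633974483,     0.4636476090008061,
    0.24497866312686414,    0.12435499454676144,
    0.06241880999595735,    0.031239833430268277,
    0.015623728620476831,   0.007812341060101111,
    0.0039062301319669718,  0.0019531225164788188,
    0.0009765621895593195,  0.0004882812111948983,
    0.00024414062014936177, 0.00012207031189367021,
    0.00006103515617420877, 0.000030517578115526096
};

/* Reciprocal of the accumulated CORDIC gain (limit value). */
static const double cordic_scale = 0.6072529350088813;

/* Rotation mode: start from the vector (scale, 0) and rotate it by `angle`
   radians in ever smaller steps. Valid for |angle| <= pi/2. */
void cordic_sincos(double angle, double *sine, double *cosine)
{
    double x = cordic_scale;
    double y = 0.0;
    double z = angle;     /* remaining angle to rotate by */
    double step = 1.0;    /* 2^-i; a right shift in fixed-point hardware */

    for (int i = 0; i < CORDIC_ITERATIONS; i++) {
        double direction = (z >= 0.0) ? 1.0 : -1.0;
        double x_next = x - direction * y * step;
        double y_next = y + direction * x * step;

        z -= direction * cordic_atan[i];
        x = x_next;
        y = y_next;
        step *= 0.5;
    }

    *cosine = x;
    *sine = y;
}
```

With 16 iterations the results agree with the true sine and cosine to roughly four decimal places. The tangent can then be obtained as sine divided by cosine.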
Interface
It is the interface between an external device and the DFPMU internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors.

1 - The data bus can be configured as 8-, 16- or 32-bit, depending on the processor's bus size.
2 - The address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors.
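One way to read note 2 is that the coprocessor decodes a different slice of the address for each bus width. The sketch below extracts that slice from a byte address. The function name is hypothetical, and the interpretation of the bit ranges as the decoded address lines is an assumption drawn from the note, not a documented register map.

```c
#include <stdint.h>

/* Returns the address bits decoded for the given processor bus width,
   following the ranges in note 2, or -1 for an unsupported width.
   Hypothetical helper for illustration only. */
int dfpmu_register_index(uint32_t byte_address, unsigned bus_width_bits)
{
    switch (bus_width_bits) {
    case 8:
        return (int)(byte_address & 0x0Fu);         /* addr(3:0) */
    case 16:
        return (int)((byte_address >> 1) & 0x07u);  /* addr(3:1) */
    case 32:
        return (int)((byte_address >> 2) & 0x07u);  /* addr(4:2) */
    default:
        return -1;
    }
}
```

Under this reading, an 8-bit processor sees sixteen byte-wide locations, while 16-bit and 32-bit processors each see eight locations of their native width.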
Mantissa
It performs operations on the mantissa part of a number. The addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains mantissa registers and work registers.
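The exponent and mantissa parts handled by these modules correspond to the fields of the IEEE-754 single precision format: 1 sign bit, 8 exponent bits biased by 127, and 23 stored fraction bits. The sketch below splits a C float into those fields. It assumes that float is a 32-bit IEEE-754 value on the host, and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 single precision number. */
typedef struct {
    unsigned sign;      /* 1 bit: 0 = positive, 1 = negative      */
    unsigned exponent;  /* 8 bits, biased by 127                   */
    uint32_t mantissa;  /* 23 stored fraction bits (no hidden bit) */
} ieee754_single_t;

/* Splits a float into its raw fields.
   ASSUMPTION: float is 32-bit IEEE-754 on this platform. */
ieee754_single_t ieee754_unpack(float value)
{
    uint32_t bits;
    ieee754_single_t fields;

    /* memcpy avoids undefined behavior from pointer type punning. */
    memcpy(&bits, &value, sizeof bits);

    fields.sign = (unsigned)(bits >> 31);
    fields.exponent = (unsigned)((bits >> 23) & 0xFFu);
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}
```

For example, -6.5 equals -1.625 × 2^2, so it unpacks to sign 1, exponent 129 and mantissa 0x500000.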

Shifter
It performs mantissa shifting during normalization and denormalization operations. Information about the bits shifted out is stored for the rounding process.
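Keeping information about shifted-out bits is commonly done with a "sticky" flag that records whether any lost bit was nonzero. The sketch below shows that general technique on a 32-bit value. It is not a description of the DFPMU's internal 70-bit datapath, and the names are hypothetical.

```c
#include <stdint.h>

/* Result of a right shift plus a record of what was lost. */
typedef struct {
    uint32_t value;  /* mantissa after the shift                   */
    int sticky;      /* 1 if any shifted-out bit was set, else 0   */
} shifted_t;

/* Shifts `mantissa` right by `count` bits and reports whether any
   nonzero bits were shifted out, so a later rounding step can use it. */
shifted_t shift_right_sticky(uint32_t mantissa, unsigned count)
{
    shifted_t result;

    if (count == 0) {
        result.value = mantissa;
        result.sticky = 0;
    } else if (count >= 32) {
        /* Everything is shifted out; avoid an out-of-range shift. */
        result.value = 0;
        result.sticky = (mantissa != 0);
    } else {
        uint32_t lost = mantissa & ((UINT32_C(1) << count) - 1u);
        result.value = mantissa >> count;
        result.sticky = (lost != 0);
    }
    return result;
}
```

Shifting 0x15 (binary 10101) right by two bits gives 5 with the sticky flag set, while shifting 0x14 (binary 10100) gives 5 with the flag clear.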