# Fixed-point library

Fixed-point data types represent fractional numbers as integer values, with a fixed number of bits used for the integer part of the value and a fixed number of bits used for the fractional part. Fixed-point data types can have an arbitrary total number of bits and can be signed or unsigned. This software module only supports a total width of up to 32 bits. The fixed-point library provides a data type definition for the PS to simplify handling them.

This software module is not intended to do fixed-point math on the processor and does not provide functions for it! It is only intended to be used in the lowest software layer of IP-Core drivers to read from and write to the PL (i.e., functions that are named with _hw). The aim is to provide a clean way to interact with IP-Cores that use fixed-point data representation. Thus, all functions accept float values as their inputs and return float values; all fixed-point handling is done internally. The read and write functions check the value against the boundaries of the fixed-point data type. The data itself is always a 32-bit integer internally on the PS, which is not exposed to the user. The 32-bit limitation is due to the AXI data width of 32 bits.

• Data is passed and received from the software module as float values

• Data is stored as 32-bit integers on the processor internally (signed/unsigned).

• The integer is divided into fractional and integer bits

• The first $$N$$-bits are the fractional bits depending on the representation

• The following $$M$$-bits are the integer bits depending on the representation

• The fixed-point data is $$K=N+M$$ bits wide with $$K \leq 32$$

Fig. 82 shows how a 32-bit integer variable is split up. Note that this is only a mental model for the user; neither the processor nor the compiler knows about the split.
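The mental model can be sketched in C with a shift and a mask. This is only an illustration: the helper names are hypothetical and not part of the library, and the sketch assumes unsigned data with the $$N$$ fractional bits stored in the least significant bits.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of uz_fixedpoint): split a stored unsigned
   fixed-point integer into its two bit fields.
   Assumes fractional_bits < 32. */

/* The M integer bits sit above the N fractional bits. */
uint32_t example_integer_field(uint32_t stored, uint32_t fractional_bits)
{
    return stored >> fractional_bits;
}

/* Keep only the lowest N bits, which hold the fraction. */
uint32_t example_fraction_field(uint32_t stored, uint32_t fractional_bits)
{
    uint32_t mask = (1u << fractional_bits) - 1u;
    return stored & mask;
}
```

For example, with $$N=5$$ the value 3.25 is stored as 104: the integer field is 3 and the fraction field is 8, which corresponds to $$8 \cdot 2^{-5}=0.25$$.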

## Scaling and precision

The represented fixed-point number is scaled by a fixed scaling factor $$s$$. The software assumes binary scaling, that is, base 2 is used for scaling the fixed point value. The scaling factor $$s$$ is determined by the number of fractional bits $$N$$:

$s=2^{N}$

A floating-point value $$x_f$$ is converted to a fixed point number (stored integer) $$x_i$$ by:

$x_i = x_f \cdot s = x_f \cdot 2^{N}$

The inverse conversion from the stored integer $$x_i$$ to the floating-point representation $$x_f$$ is done by:

$x_f = x_i \cdot 2^{-N}$

Please note that there are conflicting definitions of fixed-point data types. The fixed-point data type used by Vitis HLS and Matlab HDL-Coder is defined as two’s complement. The definition used by this software module is consistent with Vitis HLS and Matlab HDL-Coder.

## Unsigned fixed point data

For unsigned fixed-point data, the smallest and largest values that the data type can represent are calculated by:

$\begin{split}min &= 0 \\ max &= 2^{M}-2^{-N}\end{split}$

Example: unsigned fixed point with 16 bits of which $$N=5$$ bits are used for the fraction and $$M=16-5=11$$ bits are used for the integer part. The smallest representable value of the unsigned fixed-point data type is zero ($$min=0$$). The largest number is:

$max=2^{M}-2^{-N}=2^{11}-2^{-5}=2048-0.03125=2047.96875$
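The upper limit can be checked with a few lines of C. The helper name is hypothetical; it only evaluates the formula above and is not the library's uz_fixedpoint_get_max_representable_value.

```c
#include <math.h>

/* Hypothetical helper: max = 2^M - 2^-N for unsigned fixed-point data.
   ldexpf(1.0f, n) evaluates 2^n. */
float example_unsigned_max(int integer_bits, int fractional_bits)
{
    return ldexpf(1.0f, integer_bits) - ldexpf(1.0f, -fractional_bits);
}
```

For $$M=11$$ and $$N=5$$ this evaluates to 2047.96875, which matches the example.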

## Signed fixed-point data

A signed fixed-point data type with $$M$$ integer bits and $$N$$ fractional bits can represent values in the range:

$\begin{split}min &=-2^{(M-1)} \\ max &=2^{(M-1)}-2^{-N}\end{split}$

Example: signed fixed point with 16 bits of which $$N=5$$ bits are used for the fraction and $$M=16-5=11$$ bits are used for the integer part. The precision of the data type is equal to the inverse of the scaling $$s^{-1}=2^{-N}=2^{-5}=0.03125$$. The smallest and largest representable numbers for this data type are:

$\begin{split}min &=-2^{M-1}=-2^{11-1}=-1024 \\ max &=2^{M-1}-2^{-N}=2^{11-1}-2^{-5}=1024-0.03125=1023.96875\end{split}$
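The signed limits and the precision follow the same pattern in code. The helper names are hypothetical and only evaluate the formulas of this section, with the sign bit counted as one of the $$M$$ integer bits.

```c
#include <math.h>

/* Hypothetical helper: precision = 2^-N. */
float example_precision(int fractional_bits)
{
    return ldexpf(1.0f, -fractional_bits);
}

/* Hypothetical helper: min = -2^(M-1) for signed fixed-point data. */
float example_signed_min(int integer_bits)
{
    return -ldexpf(1.0f, integer_bits - 1);
}

/* Hypothetical helper: max = 2^(M-1) - 2^-N for signed fixed-point data. */
float example_signed_max(int integer_bits, int fractional_bits)
{
    return ldexpf(1.0f, integer_bits - 1) - example_precision(fractional_bits);
}
```

For $$M=11$$ and $$N=5$$ the three helpers return 0.03125, -1024 and 1023.96875.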

## Rounding

Since all input and output functions of this software module use single-precision floating-point values (i.e., float), the values have to be rounded to be represented in the fixed-point data type. Note that floating-point values also introduce rounding errors and cannot represent every value exactly.

Given the limited precision of the fixed-point data type (determined by the number of fractional bits $$N$$), the floating-point value is rounded when converting to fixed-point precision. Common rounding methods are ceil, floor, round (to nearest), and trunc.

The software module always rounds towards the nearest integer!

## Conversion

As an example, the floating-point value $$x_f=2.9$$ is converted to a signed fixed-point data type with $$M=14$$ bits for the integer part and $$N=2$$ bits for the fraction, which yields the scaling factor $$s=2^{2}=4$$. The different rounding modes are shown here to highlight their importance; keep in mind that the software module always rounds to the nearest integer.

The stored integer is calculated by:

$\begin{split}x_i &=x_f \cdot 2^{N} \\ x_i &=2.9 \cdot 4 = 11.6\end{split}$

The result is rounded by a rounding function:

• ceil: $$x_i=12$$ ($$x_f=3.0$$)

• floor: $$x_i=11$$ ($$x_f=2.75$$)

• round: $$x_i=12$$ ($$x_f=3.0$$)

• trunc: $$x_i=11$$ ($$x_f=2.75$$)

To convert back to a floating-point value, the stored integer $$x_i$$ is multiplied by the inverse scaling factor:

$\begin{split}x_f &= x_i \cdot 2^{-N} \\ x_{f,ceil} &= 12 \cdot 2^{-2}=3.0 \\ x_{f,floor} &= 11 \cdot 2^{-2}=2.75 \\ x_{f,round} &= 12 \cdot 2^{-2}=3.0 \\ x_{f,trunc} &= 11 \cdot 2^{-2}=2.75\end{split}$

Note how the rounding method determines whether the absolute error is $$0.1$$ (ceil, round) or $$0.15$$ (floor, trunc).

## Write

Write a value that is a float in the processor to an IP-Core that expects signed fixed-point data with 3 integer and 4 fraction bits.

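    #include "uz_fixedpoint.h"

    struct uz_fixedpoint_definition_t def = {
        .is_signed = true,
        .fractional_bits = 4,
        .integer_bits = 3
    };
    float write_value = 1.0f;
    // TEST_ADDRESS is a placeholder for the AXI address of the IP-Core register
    uz_fixedpoint_axi_write(TEST_ADDRESS, write_value, def);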


## Read

Read a value from an IP-Core that is an unsigned fixed-point number with 10 integer bits and 2 fractional bits and pass it to the processor as a float.

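    #include "uz_fixedpoint.h"

    struct uz_fixedpoint_definition_t def = {
        .is_signed = false,
        .fractional_bits = 2,
        .integer_bits = 10
    };
    // Assumed call: uz_fixedpoint_axi_read is taken to mirror uz_fixedpoint_axi_write.
    // TEST_ADDRESS is a placeholder for the AXI address of the IP-Core register
    float read_value = uz_fixedpoint_axi_read(TEST_ADDRESS, def);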


## Reference

struct uz_fixedpoint_definition_t

Configuration struct for the fixed point data type.

Public Members

bool is_signed

Determines if the fixed point value is signed or unsigned

int32_t fractional_bits

Number of bits for the fraction

int32_t integer_bits

Number of bits for the integer part

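float uz_fixedpoint_axi_read(uint32_t memory_address, struct uz_fixedpoint_definition_t fixedpoint_definition)

Reads a fixed-point data type from AXI and converts it to float given the fixed-point definition.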

Parameters

• fixedpoint_definition – Definition of fixed-point data type

Returns

float

void uz_fixedpoint_axi_write(uint32_t memory_address, float data, struct uz_fixedpoint_definition_t fixedpoint_definition)

Converts the input data to a fixed-point data type, rounds to the nearest integer, and writes it to AXI.

Parameters

• data – Data that is written to AXI

• fixedpoint_definition – Definition of fixed-point data type

void uz_fixedpoint_check_limits(float data, struct uz_fixedpoint_definition_t fixedpoint_definition)

Checks that the data is within the min/max that is representable by the fixed-point data type.

Parameters

• data – Data that is checked

• fixedpoint_definition – Definition of fixed-point data type
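A range check of this kind could look like the following sketch. It is not the library's implementation: the struct is a local stand-in that mirrors the fields of uz_fixedpoint_definition_t, the function name is hypothetical, and it returns a bool instead of handling the violation itself.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in that mirrors the fields of uz_fixedpoint_definition_t. */
struct example_fixedpoint_definition {
    bool is_signed;
    int32_t fractional_bits;
    int32_t integer_bits;
};

/* Hypothetical helper: true if data lies within [min, max] of the data type. */
bool example_is_within_limits(float data, struct example_fixedpoint_definition def)
{
    float precision = ldexpf(1.0f, -def.fractional_bits);
    float min_value;
    float max_value;

    if (def.is_signed) {
        /* Signed: min = -2^(M-1), max = 2^(M-1) - 2^-N */
        min_value = -ldexpf(1.0f, def.integer_bits - 1);
        max_value = ldexpf(1.0f, def.integer_bits - 1) - precision;
    } else {
        /* Unsigned: min = 0, max = 2^M - 2^-N */
        min_value = 0.0f;
        max_value = ldexpf(1.0f, def.integer_bits) - precision;
    }
    return (data >= min_value) && (data <= max_value);
}
```

For a signed type with 3 integer bits and 4 fractional bits, the accepted range is -4.0 to 3.9375, so 1.0 passes and 4.0 does not.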

float uz_fixedpoint_get_precision(struct uz_fixedpoint_definition_t input)

Calculates the precision of the specified data type.

Parameters

• input – Definition of fixed-point data type

Returns

float

float uz_fixedpoint_get_max_representable_value(struct uz_fixedpoint_definition_t input)

Calculates the biggest representable value of the given data type definition.

Parameters

• input – Definition of fixed-point data type

Returns

float

float uz_fixedpoint_get_min_representable_value(struct uz_fixedpoint_definition_t input)

Calculates the smallest representable value of the given data type definition.

Parameters

• input – Definition of fixed-point data type

Returns

float