uz_nn#
The neural network software implementation follows the definitions outlined in Neural network. The module is based on the uz_nn_layer and the Matrix math module.
Features and limitations:
- Multilayer perceptron
- No recurrent connections
- Configurable number of outputs
- Configurable number of inputs
- Configurable number of neurons per layer
- Configurable number of hidden layers (min. 1, max. 9)
- Configurable activation function per layer
Software#
- The internal struct of the uz_nn object holds an array of pointers to the layers (see the sketch below).
- This array is always of length UZ_NN_MAX_LAYER, which is arbitrarily set to 10. The define can be changed, which changes the size of the array for all instances of uz_nn.
- This solution makes it possible to have a different number of layers for different instances of uz_nn.
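The following is a minimal sketch of what this internal struct might look like; the member names are illustrative assumptions, since the actual definition is private to the module source file:
struct uz_nn {
    uint32_t number_of_layer;               // number of layers in use by this instance
    uz_nn_layer_t *layer[UZ_NN_MAX_LAYER];  // fixed-size array of pointers to the layer objects
};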
- uz_nn_init uses the config struct of uz_nn_layer.
- Compared to other modules (see Software Development Guidelines), the initialization function (uz_nn_init) takes an array of the config structs of all layers instead of one config struct. This allows for individual configuration of each layer with a variable number of layers.
- The number of outputs is automatically determined based on the number of neurons in the last layer.
- The arrays that hold the actual data (weights, bias, outputs of each layer) have to be allocated manually (see Matrix math).
Initialization of config struct#
- Initialization of the network uses an array of config structs, one element per layer.
- Each element of the config array has to be initialized with designated initializers.
- The first element (zero index [0]!) of the config array configures the first hidden layer; the following elements define the subsequent hidden layers.
- The last element of the config array configures the output layer.
- Arrays for the data (weights, bias, output) have to be provided for each layer.
Tip
Use defines to set up the dimensions of the data arrays.
Dimensions of arrays#
Care has to be taken regarding the dimensions of the arrays that hold the weights, bias, and outputs of the layers:
- The array that holds the weights of the first hidden layer has to be of length NUMBER_OF_INPUTS * NUMBER_OF_NEURONS_IN_FIRST_HIDDEN_LAYER.
- The array that holds the weights of any other layer has to be of length NUMBER_OF_NEURONS_IN_PREVIOUS_LAYER * NUMBER_OF_NEURONS_IN_THIS_HIDDEN_LAYER.
- The length of the array that holds the bias of a hidden layer has to be equal to the number of neurons in the respective layer.
- The length of the array that holds the bias of the output layer has to be equal to the number of outputs.
- The array that holds the output of a layer has the same dimension as the array that holds the bias values.
Example initialization#
The following shows an example initialization of a uz_nn that implements the example network of Neural network and that is used in the unit test test_uz_nn_ff.
#define NUMBER_OF_INPUTS 2
#define NUMBER_OF_OUTPUTS 1
#define NUMBER_OF_NEURONS_IN_HIDDEN_LAYER 3
static float x[NUMBER_OF_INPUTS] = {1, 2};
static float w_1[NUMBER_OF_INPUTS * NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {1, 2, 3, 4, 5, 6};
static float b_1[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {1, 2, 3};
static float y_1[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {0};
static float w_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER * NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {-7, -8, -9, -10, -11, -12, 13, 14, -15};
static float b_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {4, 5, 6};
static float y_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {0};
static float w_3[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER * NUMBER_OF_OUTPUTS] = {16, 17, -18};
static float b_3[NUMBER_OF_OUTPUTS] = {7};
static float y_3[NUMBER_OF_OUTPUTS] = {0};
struct uz_nn_layer_config config[3] = {
[0] = {
.activation_function = activation_ReLU,
.number_of_neurons = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
.number_of_inputs = NUMBER_OF_INPUTS,
.length_of_weights = UZ_MATRIX_SIZE(w_1),
.length_of_bias = UZ_MATRIX_SIZE(b_1),
.length_of_output = UZ_MATRIX_SIZE(y_1),
.weights = w_1,
.bias = b_1,
.output = y_1},
[1] = {.activation_function = activation_ReLU,
.number_of_neurons = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
.number_of_inputs = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
.length_of_weights = UZ_MATRIX_SIZE(w_2),
.length_of_bias = UZ_MATRIX_SIZE(b_2),
.length_of_output = UZ_MATRIX_SIZE(y_2),
.weights = w_2,
.bias = b_2,
.output = y_2},
[2] = {.activation_function = activation_linear,
.number_of_neurons = NUMBER_OF_OUTPUTS,
.number_of_inputs = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
.length_of_weights = UZ_MATRIX_SIZE(w_3),
.length_of_bias = UZ_MATRIX_SIZE(b_3),
.length_of_output = UZ_MATRIX_SIZE(y_3),
.weights = w_3,
.bias = b_3,
.output = y_3}
};
void test_uz_nn_ff(void)
{
    struct uz_matrix_t input_matrix = {0};
    uz_matrix_t *input = uz_matrix_init(&input_matrix, x, UZ_MATRIX_SIZE(x), 1, 2); // 1x2 input row vector
    uz_nn_t *test = uz_nn_init(config, 3);
    uz_nn_ff(test, input); // feedforward pass, writes the layer outputs to y_1, y_2, y_3
    float expected_result_first_layer[3] = {10, 14, 18};
    float expected_result_second_layer[3] = {28, 23, 0};
    float expected_result_output_layer[1] = {846};
    TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_first_layer, y_1, UZ_MATRIX_SIZE(expected_result_first_layer));
    TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_second_layer, y_2, UZ_MATRIX_SIZE(expected_result_second_layer));
    TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_output_layer, y_3, UZ_MATRIX_SIZE(expected_result_output_layer));
    float expected_result = 846;
    uz_matrix_t *output = uz_nn_get_output_data(test);
    float result = uz_matrix_get_element_zero_based(output, 0, 0);
    TEST_ASSERT_EQUAL_FLOAT(expected_result, result);
}
The execution time of this network depends on the activation function used in the hidden layers:
- activation_ReLU: \(3.5 \mu s\)
- activation_sigmoid: \(5.5 \mu s\)
- activation_sigmoid2: \(6.5 \mu s\)
- activation_tanh: \(5.0 \mu s\)
Initialization of pretrained network#
To ease the declaration of the weight and bias arrays, initialization based on .csv data can be used, like so:
static float weights[]=
{
#include "weights.csv"
};
The weights have to be stored in a .csv file with comma as the separator.
Furthermore, for the weights, the first \(n\) elements correspond to the first row of the weight matrix, with \(n\) representing the number of neurons in the layer.
Effectively, the rows of the weight matrix are concatenated one after another (row-major order).
See Matrix math for details regarding the transformation of matrix to vector dimensions and Neural network regarding the dimension definition of the network.
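As an illustration, the 2x3 weight matrix w_1 = {1, 2, 3, 4, 5, 6} from the example above would be stored in a hypothetical weights.csv as:
1,2,3,
4,5,6
Since the file is included verbatim inside the initializer braces, this is equivalent to writing:
static float weights[] = {1, 2, 3, 4, 5, 6};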
Tip
Use the declaration and defines shown in the examples and unit tests and adjust them to specific networks.
Full example#
The following example is based on a basic Matlab example.
A network with 13 inputs, two hidden layers (50 neurons in the first, 20 neurons in the second), ReLU activation, and one output is trained on an existing data set.
Note that this example is not concerned with the accuracy of the network; it is just used to showcase the initialization of the network and as a test case.
The Matlab script uz_nn_full_example_script.m in ~/ultrazohm_sw/vitis/software/Baremetal/test/uz/uz_nn trains the network and writes the weights and bias to a .csv file.
Be aware that the Matlab neural network definition differs from the network definition used in Neural network; thus the data is transposed and reshaped before the write operation.
See the file test_uz_nn_full_example.c in ~/ultrazohm_sw/vitis/software/Baremetal/test/uz/uz_nn for the code.
Execution time on R5#
The following lists the expected execution time for different networks, with the feedforward calculation running in an otherwise empty (except for required code for system functions) ISR of the R5 processors (which takes 2.6 us without the feedforward calculation).
- 2 inputs, 1 output, 3 neurons, two hidden layers with ReLU: 5.0 us
- The same network executed ten times: 25.5 us (see the sketch after this list). (5.0 us - 2.6 us) * 10 + 2.6 us = 26.6 us, which is close to the measured 25.5 us, so the calculation actually happens ten times (the compiler does not optimize it away).
- 4 inputs, 8 outputs, 64 neurons, two hidden layers with ReLU: 89 us
- 4 inputs, 8 outputs, 64 neurons, one hidden layer with ReLU: 24.7 us
- 4 inputs, 8 outputs, 128 neurons, one hidden layer with ReLU: 44 us
- 7 inputs, 2 outputs, 100 neurons, ReLU: 30.2 us
- 5 inputs, 8 outputs, three hidden layers with 64 neurons, ReLU: 200 us
- 13 inputs, 1 output, one hidden layer with 20 neurons, ReLU: 11 us
- 13 inputs, 1 output, two hidden layers (50 neurons in the first, 20 neurons in the second hidden layer) with ReLU
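The "ten times" measurement above can be reproduced by simply repeating the feedforward call. The following is a minimal sketch under the assumption that the loop runs inside the ISR with the network and input from the example initialization; the actual benchmark code may differ:
// Hypothetical sketch: repeat the feedforward pass ten times inside the ISR
for (uint32_t i = 0; i < 10; i++)
{
    uz_nn_ff(test, input); // network and input from the example above
}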
Optimization#
All timing above was done with the -O2 flag.
Testing with -funroll-all-loops leads to worse performance (4 inputs, 8 outputs, 64 neurons, two hidden layers with ReLU takes 94 us with the flag compared to 89 us without); testing with -funroll-loops results in 92 us.
Most of the time in the program is spent on multiplying the inputs of a layer with the weight matrix (as expected).
Reference#
- uz_nn_t *uz_nn_init(struct uz_nn_layer_config config[UZ_NN_MAX_LAYER], uint32_t number_of_layer)#
Initialization of a neural network object.
- Parameters:
config – Array of config structs of length number_of_layer
number_of_layer – Number of layers including hidden layers and the output layer (but not the input layer)
- Returns:
uz_nn_t*
- uz_matrix_t *uz_nn_get_output_data(uz_nn_t const *const self)#
Returns a matrix of dimension 1 x outputs of the last forward pass.
- Parameters:
self –
- Returns:
uz_matrix_t*
- UZ_NN_MAX_LAYER#
Arbitrarily defined maximum number of layers for the module. Affects all instances of the module.