.. _uz_nn:

=====
uz_nn
=====

The neural network software implementation follows the definitions outlined in :ref:`neural_network`. The module is based on the :ref:`uz_nn_layer` and the :ref:`uz_matrix` module.

Features and limitations:

- Multilayer perceptron
- No recurrent connections
- Configurable number of inputs
- Configurable number of outputs
- Configurable number of neurons per layer
- Configurable number of hidden layers (min. 1, max. 9)
- Configurable activation function per layer

Software
========

- The internal struct of the uz_nn object holds an array of pointers to layers
- This array is always of length ``UZ_NN_MAX_LAYER``, which is arbitrarily set to 10
- The define can be changed, which changes the size of the array for all instances of uz_nn
- This solution makes it possible to have a different number of layers for different instances of uz_nn
- uz_nn_init uses the config struct of :ref:`uz_nn_layer`
- In contrast to other modules (see :ref:`software_development_guidelines`), the initialization function (``uz_nn_init``) takes an array of layer config structs, one element per layer, instead of a single config struct
- This allows for individual configuration of each layer with a variable number of layers
- The number of outputs is determined automatically by the number of neurons in the last layer
- The arrays that hold the actual data (weights, bias, outputs of each layer) have to be allocated manually (see :ref:`uz_matrix`)

Initialization of config struct
*******************************

- Initialization uses an array of config structs
- Each element of the array has to be initialized with designated initializers
- The first element (zero-based index ``[0]``!) of the config array configures the first hidden layer
- The following elements configure the subsequent hidden layers
- The last element of the config array configures the output layer
- Arrays for the data (weights, bias, output) have to be provided for each layer

.. tip:: Use defines to set up the dimensions of the data arrays

Dimensions of arrays
********************

- Care has to be taken regarding the dimensions of the arrays that hold the weights, bias, and outputs of each layer
- The array that holds the weights of the first hidden layer has to be of length ``NUMBER_OF_INPUTS`` * ``NUMBER_OF_NEURONS_IN_FIRST_HIDDEN_LAYER``
- The array that holds the weights of any other layer has to be of length ``NUMBER_OF_NEURONS_IN_PREVIOUS_LAYER`` * ``NUMBER_OF_NEURONS_IN_THIS_LAYER``
- The length of the array that holds the bias of a hidden layer has to be equal to the number of neurons in that layer
- The length of the array that holds the bias of the output layer has to be equal to the number of outputs
- The array that holds the output of a layer has the same dimension as the array that holds the bias values

Example initialization
**********************

The following shows an example initialization of a ``uz_nn`` that implements the example network of :ref:`neural_network` and that is used in the unit test ``test_uz_nn_ff``.
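The intermediate results that the test asserts can be checked by hand. The following calculation is a sketch that assumes the row-vector convention of :ref:`neural_network`, i.e., the input :math:`x` is a :math:`1 \times 2` row vector and the weight arrays are read as row-major matrices (first matrix row first):

.. math::

   y_1 &= \mathrm{ReLU}\left(\begin{bmatrix}1 & 2\end{bmatrix}\begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix} + \begin{bmatrix}1 & 2 & 3\end{bmatrix}\right) = \begin{bmatrix}10 & 14 & 18\end{bmatrix}\\
   y_2 &= \mathrm{ReLU}\left(y_1 W_2 + b_2\right) = \mathrm{ReLU}\left(\begin{bmatrix}24 & 18 & -528\end{bmatrix} + \begin{bmatrix}4 & 5 & 6\end{bmatrix}\right) = \begin{bmatrix}28 & 23 & 0\end{bmatrix}\\
   y_3 &= y_2 W_3 + b_3 = 28 \cdot 16 + 23 \cdot 17 + 0 \cdot (-18) + 7 = 846

These are exactly the values asserted in ``expected_result_first_layer``, ``expected_result_second_layer``, and ``expected_result_output_layer`` in the test below.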
.. code-block:: c
   :caption: Initialization of config struct for uz_nn and forward calculation

   #define NUMBER_OF_INPUTS 2
   #define NUMBER_OF_OUTPUTS 1
   #define NUMBER_OF_NEURONS_IN_HIDDEN_LAYER 3

   static float x[NUMBER_OF_INPUTS] = {1, 2};
   // First hidden layer: NUMBER_OF_INPUTS x NUMBER_OF_NEURONS_IN_HIDDEN_LAYER weights
   static float w_1[NUMBER_OF_INPUTS * NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {1, 2, 3, 4, 5, 6};
   static float b_1[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {1, 2, 3};
   static float y_1[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {0};
   // Second hidden layer: neurons x neurons weights
   static float w_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER * NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {-7, -8, -9, -10, -11, -12, 13, 14, -15};
   static float b_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {4, 5, 6};
   static float y_2[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER] = {0};
   // Output layer: neurons x outputs weights
   static float w_3[NUMBER_OF_NEURONS_IN_HIDDEN_LAYER * NUMBER_OF_OUTPUTS] = {16, 17, -18};
   static float b_3[NUMBER_OF_OUTPUTS] = {7};
   static float y_3[NUMBER_OF_OUTPUTS] = {0};

   struct uz_nn_layer_config config[3] = {
       [0] = {.activation_function = activation_ReLU,
              .number_of_neurons = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
              .number_of_inputs = NUMBER_OF_INPUTS,
              .length_of_weights = UZ_MATRIX_SIZE(w_1),
              .length_of_bias = UZ_MATRIX_SIZE(b_1),
              .length_of_output = UZ_MATRIX_SIZE(y_1),
              .weights = w_1,
              .bias = b_1,
              .output = y_1},
       [1] = {.activation_function = activation_ReLU,
              .number_of_neurons = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
              .number_of_inputs = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
              .length_of_weights = UZ_MATRIX_SIZE(w_2),
              .length_of_bias = UZ_MATRIX_SIZE(b_2),
              .length_of_output = UZ_MATRIX_SIZE(y_2),
              .weights = w_2,
              .bias = b_2,
              .output = y_2},
       [2] = {.activation_function = activation_linear,
              .number_of_neurons = NUMBER_OF_OUTPUTS,
              .number_of_inputs = NUMBER_OF_NEURONS_IN_HIDDEN_LAYER,
              .length_of_weights = UZ_MATRIX_SIZE(w_3),
              .length_of_bias = UZ_MATRIX_SIZE(b_3),
              .length_of_output = UZ_MATRIX_SIZE(y_3),
              .weights = w_3,
              .bias = b_3,
              .output = y_3}};

   void test_uz_nn_ff(void)
   {
       struct uz_matrix_t input_matrix = {0};
       uz_matrix_t *input = uz_matrix_init(&input_matrix, x, UZ_MATRIX_SIZE(x), 1, 2);
       uz_nn_t *test = uz_nn_init(config, 3);
       uz_nn_ff(test, input);
       float expected_result_first_layer[3] = {10, 14, 18};
       float expected_result_second_layer[3] = {28, 23, 0};
       float expected_result_output_layer[1] = {846};
       TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_first_layer, y_1, UZ_MATRIX_SIZE(expected_result_first_layer));
       TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_second_layer, y_2, UZ_MATRIX_SIZE(expected_result_second_layer));
       TEST_ASSERT_EQUAL_FLOAT_ARRAY(expected_result_output_layer, y_3, UZ_MATRIX_SIZE(expected_result_output_layer));
       float expected_result = 846;
       uz_matrix_t *output = uz_nn_get_output_data(test);
       float result = uz_matrix_get_element_zero_based(output, 0, 0);
       TEST_ASSERT_EQUAL_FLOAT(expected_result, result);
   }

The same network takes approximately the following execution times with different activation functions in the hidden layers:

- activation_ReLU: :math:`3.5 \mu s`
- activation_sigmoid: :math:`5.5 \mu s`
- activation_sigmoid2: :math:`6.5 \mu s`
- activation_tanh: :math:`5.0 \mu s`

Initialization of pretrained network
************************************

To ease the declaration of the weight and bias arrays, initialization based on ``.csv`` data can be used like so:

.. code-block:: c

   static float weights[] =
       {
           #include "weights.csv"
       };

The weights have to be stored in a ``.csv`` file with ``comma`` as separator. Furthermore, for the weights, the first :math:`n` elements correspond to the first row of the weight matrix, with :math:`n` representing the number of neurons in the layer. Effectively, the rows of the matrix are appended one after another, i.e., the matrix is flattened into the vector in row-major order.
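For illustration, the following sketch shows what a hypothetical ``weights.csv`` for a layer with two inputs and three neurons could look like; the resulting array equals ``w_1`` from the example initialization above. The file name and contents are assumptions for this example:

.. code-block:: c

   /* Hypothetical contents of weights.csv (comma as separator; each line
    * holds one row of the weight matrix, with a trailing comma on every
    * line except the last so that the include expands to a valid
    * initializer list):
    *
    *   1,2,3,
    *   4,5,6
    */
   static float weights[] =
       {
           #include "weights.csv"
       };
   // Expands to {1, 2, 3, 4, 5, 6}: the first n = 3 elements are the first
   // row of the weight matrix, followed by the second row (row-major order).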
See :ref:`uz_matrix` for details regarding the transformation of matrix to vector dimensions and :ref:`neural_network` regarding the dimension definition of the network.

.. tip:: Use the declaration and defines shown in the examples and unit tests and adjust them to the specific network.

Full example
============

The following example is based on a basic Matlab example. A network with 13 inputs, two hidden layers (50 neurons in the first, 20 neurons in the second), ReLU activation, and one output is trained on an existing data set. Note that this example is not concerned with the accuracy of the network; it is just used to showcase the initialization of the network and as a test case. The Matlab script ``uz_nn_full_example_script.m`` in ``~/ultrazohm_sw/vitis/software/Baremetal/test/uz/uz_nn`` trains the network and writes the weights and bias to a ``.csv`` file. Be aware that the Matlab neural network definition differs from the network definition used in :ref:`neural_network`; thus the data is transposed and reshaped before the write operation. See the file ``test_uz_nn_full_example.c`` in ``~/ultrazohm_sw/vitis/software/Baremetal/test/uz/uz_nn`` for the code.

Execution time on R5
====================

The following lists the execution times to expect for different networks, with the feedforward calculation running in the otherwise *empty* (except for required code for system functions) ISR of the R5 processors (the ISR takes 2.6 us without the feedforward calculation).

- 2 inputs, 1 output, 3 neurons, two hidden layers with ReLU takes 5.0 us
- 2 inputs, 1 output, 3 neurons, two hidden layers with ReLU, calculated ten times, takes 25.5 us
- (5.0 us - 2.6 us) * 10 + 2.6 us = 26.6 us, which is approximately the measured 25.5 us; this confirms that the calculation actually happens ten times (the compiler does not optimize it away)
- 4 inputs, 8 outputs, 64 neurons, two hidden layers with ReLU takes 89 us
- 4 inputs, 8 outputs, 64 neurons, one hidden layer with ReLU takes 24.7 us
- 4 inputs, 8 outputs, 128 neurons, one hidden layer with ReLU takes 44 us
- 7 inputs, 2 outputs, 100 neurons with ReLU takes 30.2 us
- 5 inputs, 8 outputs, three hidden layers with 64 neurons each and ReLU takes 200 us
- 13 inputs, 1 output, one hidden layer with 20 neurons and ReLU takes 11 us
- 13 inputs, 1 output, two hidden layers (50 neurons in the first, 20 neurons in the second hidden layer) with ReLU

Optimization
************

All timings above were measured with the ``-O2`` flag. Testing with ``-funroll-all-loops`` leads to worse performance (4 inputs, 8 outputs, 64 neurons, two hidden layers with ReLU takes 94 us with the flag compared to 89 us without). Testing with ``-funroll-loops`` results in 92 us. As expected, most time in the program is spent on multiplying the inputs of a layer with the weight matrix. See:

- https://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html
- https://stackoverflow.com/questions/24196076/is-gcc-loop-unrolling-flag-really-effective

Reference
=========

.. doxygentypedef:: uz_nn_t

.. doxygenfunction:: uz_nn_init

.. doxygenfunction:: uz_nn_get_output_data

.. doxygendefine:: UZ_NN_MAX_LAYER