Neural network#

The following describes the basic theory and definitions for multilayer perceptron networks, a class of feedforward neural networks. The software module uz_nn is based on the definitions on this page.

Network and dimension definition#

A neural network consists of an input layer, one or multiple hidden layers, and an output layer. Each layer has one or multiple neurons (also called perceptrons or nodes). A network with one input layer, one hidden layer, and one output layer has \(l=2\) layers (the hidden layer plus the output layer; the input layer is not counted). A network can have a different number of inputs and outputs, e.g., two inputs and one output. Each hidden layer has a defined number of neurons.

The weight connecting the first input \(x_1\) to the first neuron of the first hidden layer is called \(w^{(1)}_{11}\). The weight from the first input to the second neuron is \(w^{(1)}_{12}\), and the weight from the second input to the first neuron of the first hidden layer is \(w^{(1)}_{21}\). Generalized:

\[w^{(l)}_{i,j}\]

The index of the layer \(l\) is written as a superscript (\(w^{(l)}\)). Therefore, \(w^{(1)}\) is the complete matrix with all weights of the layer \(l=1\). The number of rows of the weight matrix is defined by the number of connections that end at each neuron of the layer. That is, the number of rows (\(m\)) is equal to the number of inputs for the first hidden layer (\(w^{(1)}\)), and for all other hidden layers (\(l>1\)) the number of rows (\(m\)) is equal to the number of neurons of the previous hidden layer (\(l-1\)). The number of columns (\(n\)) is equal to the number of neurons in layer \(l\). Each neuron in a hidden layer \(l\) has one connection to every neuron of the following layer \(l+1\) (fully connected).
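
As an illustration of this dimension rule, the following minimal NumPy sketch (not the uz_nn implementation) prints the weight and bias shapes for a network with two inputs, two hidden layers of three neurons each, and one output:

    import numpy as np

    # Layer sizes: 2 inputs, two hidden layers with 3 neurons each, 1 output neuron
    layer_sizes = [2, 3, 3, 1]

    # Weight matrix of layer l: m rows (inputs or neurons of layer l-1),
    # n columns (neurons of layer l); bias is a row vector with n entries
    weights = [np.zeros((m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros((1, n)) for n in layer_sizes[1:]]

    for l, (w, b) in enumerate(zip(weights, biases), start=1):
        print(f"layer {l}: w{l} has shape {w.shape}, b{l} has shape {b.shape}")
    # layer 1: w1 has shape (2, 3), b1 has shape (1, 3)
    # layer 2: w2 has shape (3, 3), b2 has shape (1, 3)
    # layer 3: w3 has shape (3, 1), b3 has shape (1, 1)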

The weight matrix has the following generic dimensions:

../../../_images/weights.svg

Fig. 261 Dimensions of weight matrix#

For each layer \(l\) there is a weight and a bias matrix. The matrix is numbered by the layer \(l\) to which the weights belong (= the layer to which the connections lead / where the arrows end). The first subscript \(i\) denotes the number (counted from top to bottom in the layer) of the starting neuron (or of the input). The second subscript \(j\) denotes the number of the neuron where the connection ends.

../../../_images/simple_nn_twolayer.svg

Fig. 262 Simple neural network with naming scheme of weights#

The weights and biases of the network in Fig. 262 are represented by the following equations.

\[\begin{split}w^{(1)}_{ij}=\left[ \begin{array}{rr} w_{11} & w_{12} \\ w_{21} & w_{22} \\ \end{array}\right]\end{split}\]
\[\begin{split}w^{(2)}_{ij}=\left[ \begin{array}{rr} w_{11} \\ w_{21} \\ \end{array}\right]\end{split}\]

The biases are not shown in Fig. 262 but are represented as follows:

\[\begin{split}b^{(1)}_{j}=\left[ \begin{array}{rr} b_{1} & b_{2} \\ \end{array}\right]\end{split}\]
\[\begin{split}b^{(2)}_{j}=\left[ \begin{array}{rr} b_{1}\\ \end{array}\right]\end{split}\]

Note

The notation across the literature and other software implementations (e.g., TensorFlow, PyTorch, Matlab) is not consistent. The definition of inputs, weights, and biases can be transposed and the calculation of \(s\) rearranged (\(w^T y\) instead of \(yw\)) without changing the function of the network. Whether the inputs are treated as a column or a row vector differs between software modules, see this article for an example.
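
The following short NumPy sketch (illustrative only, not tied to any of the mentioned frameworks) shows that the row-vector convention used on this page and the transposed column-vector convention produce the same numbers:

    import numpy as np

    w = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])   # weight matrix as defined on this page
    y_row = np.array([[1.0, 2.0]])    # input as row vector: s = y w
    y_col = y_row.T                   # input as column vector: s = w^T y

    s_row = y_row @ w
    s_col = w.T @ y_col

    # Both conventions yield the same values, only transposed
    assert np.allclose(s_row, s_col.T)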

Neurons#

../../../_images/neuron.svg

Fig. 263 First neuron \(j=1\) of layer \(l\) with inputs \(y^{l-1}\), weights \(w^l_{i,j}\), bias \(b^l_j\) and output \(y^l_j\) (definition according to [1])#

Neurons are the basic building block of neural networks. A neuron sums over its weighted input values as well as the bias and calculates the output with an arbitrary activation function \(\mathcal{F}(\cdot)\). The notation of this software module denotes the number of the layer with the superscript \(l\) for all parameters. The weight connecting the output \(y^{l-1}_i\) of the \(i\)-th neuron of the previous layer \(l-1\) with the input of the \(j\)-th neuron of the layer \(l\) is denoted by \(w^l_{i,j}\). The following equation calculates the dot product of the weight vector \(\boldsymbol{w}^l_j\) and the input vector \(\boldsymbol{y}^{l-1}\) of the \(j\)-th neuron of layer \(l\) with the length \(n\) and adds the bias \(b^l_j\) to yield the sum \(s^l_j\) of the neuron inputs.

\[s^l_j =\sum^n_{i=1} y^{l-1}_{i} w^l_{i,j} + b^l_j\]

The output value \(y^l_j\) of the neuron is calculated by the activation function for all hidden layers.

\[y^l_j = \mathcal{F}(s^l_j)\]
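
The two equations can be written as a minimal Python/NumPy sketch for a single neuron (an illustration only, not the uz_nn implementation; ReLU is used here as an example activation function):

    import numpy as np

    def relu(s):
        # Example activation function F(s) = max(0, s)
        return np.maximum(0.0, s)

    def neuron_output(y_prev, w_j, b_j, activation=relu):
        # s^l_j = sum_i y^(l-1)_i * w^l_ij + b^l_j
        s_j = np.dot(y_prev, w_j) + b_j
        # y^l_j = F(s^l_j)
        return activation(s_j)

    # Neuron j=1 of the first hidden layer of the example network below:
    y_prev = np.array([1.0, 2.0])  # network inputs x_1, x_2
    w_j = np.array([1.0, 4.0])     # first column of w^(1)
    b_j = 1.0                      # bias b^(1)_1
    print(neuron_output(y_prev, w_j, b_j))  # 10.0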

Network example#

MLPs are implemented with the following definition and representation of the neural network. The neural network has a number of layers, consisting of the input layer, the output layer, and the hidden layers (\(l\) layers in total, where the input layer is not counted). Each layer has a number of neurons.

../../../_images/nn_structure.svg

Fig. 264 Structure of a neural network#

The MLP shown in Fig. 264 has two inputs, two hidden layers with three neurons each, and one output. The input is defined as:

\[\begin{split}x &=y^{(0)}=\left[ \begin{array}{rr} x_{1} & x_{2} \\ \end{array}\right] \\ x &=y^{(0)}=\left[ \begin{array}{rr} 1 & 2 \\ \end{array}\right]\end{split}\]

The output is defined as:

\[\begin{split}y^{(3)}=\left[ \begin{array}{rr} y_{1} \\ \end{array}\right]\end{split}\]

The weight and bias matrices for each layer with example values are given in the following. For the first hidden layer:

\[\begin{split}w^{(1)} &=\left[ \begin{array}{rr} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ \end{array}\right] \\ w^{(1)} &=\left[ \begin{array}{rr} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{array}\right] \\ b^{(1)} &=\left[ \begin{array}{rr} b_1 & b_2 & b_3 \\ \end{array}\right] \\ b^{(1)} &=\left[ \begin{array}{rr} 1 & 2 & 3 \\ \end{array}\right]\end{split}\]

For the second hidden layer:

\[\begin{split}w^{(2)} &=\left[ \begin{array}{rr} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{array}\right] \\ w^{(2)} &=\left[ \begin{array}{rr} -7 & -8 & -9 \\ -10 & -11 & -12 \\ 13 & 14 & -15 \\ \end{array}\right] \\ b^{(2)} &=\left[ \begin{array}{rr} b_1 & b_2 & b_3 \\ \end{array}\right] \\ b^{(2)} &=\left[ \begin{array}{rr} 4 & 5 & 6 \\ \end{array}\right]\end{split}\]

For the output layer:

\[\begin{split}w^{(3)} &=\left[ \begin{array}{rr} w_{11} \\ w_{21} \\ w_{31} \end{array}\right] \\ w^{(3)} &=\left[ \begin{array}{rr} 16 \\ 17 \\ -18 \end{array}\right] \\ b^{(3)} &=\left[ \begin{array}{rr} b_1 \\ \end{array}\right] \\ b^{(3)} &=\left[ \begin{array}{rr} 7 \\ \end{array}\right]\end{split}\]

The activation function of the hidden layers is set to ReLU, the output activation function to linear. The following section calculates all steps and intermediate results in the network.
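
The hand calculation below can be reproduced with a few lines of NumPy (a sketch of the forward pass with the example values from above; it is not the uz_nn code itself):

    import numpy as np

    x = np.array([[1.0, 2.0]])          # input y^(0)

    w1 = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
    b1 = np.array([[1.0, 2.0, 3.0]])

    w2 = np.array([[ -7.0,  -8.0,  -9.0],
                   [-10.0, -11.0, -12.0],
                   [ 13.0,  14.0, -15.0]])
    b2 = np.array([[4.0, 5.0, 6.0]])

    w3 = np.array([[16.0], [17.0], [-18.0]])
    b3 = np.array([[7.0]])

    relu = lambda s: np.maximum(0.0, s)

    y1 = relu(x @ w1 + b1)    # first hidden layer:  [[10. 14. 18.]]
    y2 = relu(y1 @ w2 + b2)   # second hidden layer: [[28. 23.  0.]]
    y3 = y2 @ w3 + b3         # output layer (linear activation): [[846.]]
    print(y1, y2, y3)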

First layer#

\[\begin{split}\boldsymbol{x} \boldsymbol{w^{(1)}} + \boldsymbol{b^{(1)}} &= \boldsymbol{s^{(1)}} \\ \left[ \begin{array}{rr} 1 & 2 \\ \end{array}\right] \left[ \begin{array}{rr} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{array}\right] + \left[ \begin{array}{rr} 1 & 2 & 3 \\ \end{array}\right] &= \left[ \begin{array}{rr} 10 & 14 & 18 \\ \end{array}\right]\end{split}\]

Activation function:

\[\begin{split}y^{(1)} &= ReLU(\boldsymbol{s^{(1)}}) \\ y^{(1)} &= ReLU( \left[ \begin{array}{rr} 10 & 14 & 18 \\ \end{array}\right])\\ &= \left[ \begin{array}{rr} 10 & 14 & 18 \\ \end{array}\right]\end{split}\]

Second layer#

The input of the second hidden layer is the output of the first hidden layer \(y^{(1)}\):

\[\begin{split}\boldsymbol{y^{(1)}} \boldsymbol{w^{(2)}} + \boldsymbol{b^{(2)}} &= \boldsymbol{s^{(2)}} \\ \left[ \begin{array}{rr} 10 & 14 & 18 \\ \end{array}\right] \left[ \begin{array}{rr} -7 & -8 & -9 \\ -10 & -11 & -12 \\ 13 & 14 & -15 \\ \end{array}\right] + \left[ \begin{array}{rr} 4 & 5 & 6 \\ \end{array}\right] &= \left[ \begin{array}{rr} 28 & 23 & -522 \\ \end{array}\right]\end{split}\]

Activation function:

\[\begin{split}y^{(2)} &= ReLU(\boldsymbol{s^{(2)}}) \\ y^{(2)} &= ReLU( \left[ \begin{array}{rr} 28 & 23 & -522 \\ \end{array}\right])\\ &= \left[ \begin{array}{rr} 28 & 23 & 0 \\ \end{array}\right]\end{split}\]

Output layer#

The input of the output layer is the output of the second hidden layer \(y^{(2)}\):

\[\begin{split}\boldsymbol{y^{(2)}} \boldsymbol{w^{(3)}} + \boldsymbol{b^{(3)}} &= \boldsymbol{s^{(3)}} \\ \left[ \begin{array}{rr} 28 & 23 & 0 \\ \end{array}\right] \left[ \begin{array}{rr} 16 \\ 17 \\ -18 \end{array}\right] + \left[ \begin{array}{rr} 7 \\ \end{array}\right] &= \left[ \begin{array}{rr} 846 \\ \end{array}\right]\end{split}\]

Activation function:

\[\begin{split}y^{(3)} &= linear(\boldsymbol{s^{(3)}}) \\ y^{(3)} &= linear( \left[ \begin{array}{rr} 846 \\ \end{array}\right])\\ &= \left[ \begin{array}{rr} 846 \\ \end{array}\right]\end{split}\]
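
The layer-by-layer calculation generalizes to networks with an arbitrary number of layers. A possible sketch (again an illustration, not the uz_nn implementation) loops over the weight and bias matrices and applies the hidden-layer activation to all but the last layer:

    import numpy as np

    def forward(x, weights, biases, hidden_activation, output_activation):
        # Propagate the input through all layers; the last layer uses the output activation
        y = x
        for l, (w, b) in enumerate(zip(weights, biases), start=1):
            s = y @ w + b
            y = output_activation(s) if l == len(weights) else hidden_activation(s)
        return y

    relu = lambda s: np.maximum(0.0, s)
    linear = lambda s: s

    weights = [np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
               np.array([[-7.0, -8.0, -9.0], [-10.0, -11.0, -12.0], [13.0, 14.0, -15.0]]),
               np.array([[16.0], [17.0], [-18.0]])]
    biases = [np.array([[1.0, 2.0, 3.0]]),
              np.array([[4.0, 5.0, 6.0]]),
              np.array([[7.0]])]

    print(forward(np.array([[1.0, 2.0]]), weights, biases, relu, linear))  # [[846.]]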

Sources#

Software implementation#