
where $\{w_i\}$ and $b$ are parameters commonly referred to as the weights and bias. When a neural network is being optimised, the weights and biases are usually the “trainable parameters” that are varied. The way in which neurons are connected to each other is referred to as the architecture of the NN. One of the simplest and most common types of architecture is the feed-forward NN. A neural network is feed-forward when the connections between neurons do not form cycles. Neurons in a feed-forward NN can often be arranged into layers, where neurons in a given layer are only connected to neurons in the previous or the next layer. The first layer of a feed-forward NN is the input layer, in which the neurons receive external inputs (usually one neuron per input). The last layer is the output layer, which provides the external output(s). Any layers in between are called hidden layers. Two layers are fully connected if every neuron in one of the layers is connected to every neuron in the other layer. A feed-forward NN consisting of only fully connected layers is called a dense NN.

Neural networks can be trained for various purposes, and there are different methods to train them. Here, we will only discuss neural networks as classifiers trained via supervised learning. To train a neural network for classification, training samples are given to the NN. The true classes of the training samples are known, and each class is associated with a certain ideal output of the NN. For example, for a binary classification between signal and background, the signal can be associated with an output of “1” from a single output neuron, while the background can be associated with an output of “0”. The actual output of the neural network can then be compared with the expected output to calculate a value known as the loss (or cost). The loss is defined such that it decreases as the accuracy of the classifier increases. The parameters (often the weights and biases) of the NN are then varied to minimise the loss. A commonly chosen definition for the loss of a binary classifier is the binary cross-entropy:
\[
  H(y, \hat{y}) = -y \log \hat{y} - (1 - y) \log (1 - \hat{y}),
  \tag{A.4}
\]
where $y$ is the target output and $\hat{y}$ is the actual output of the classifier. Using this definition, the output of a trained classifier estimates the Bayesian a posteriori probability [136].

Iterative methods are often used for the optimisation of NNs (minimisation of the loss). In these methods, the training samples are usually randomly partitioned into smaller batches. In each iteration, the parameters of the NN are varied to reduce the loss of the NN evaluated on a single batch of samples. After iterating through each batch in the full training sample set, the training samples can be shuffled and randomly partitioned again into different batches for more iterations. Each cycle through the full training sample set is known as an epoch.
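As an illustration of the concepts above, the following is a minimal sketch of a dense feed-forward binary classifier, written with NumPy and trained by mini-batch gradient descent on the binary cross-entropy of Eq. (A.4). The network layout (two inputs, one hidden layer of eight neurons, a single sigmoid output neuron), the toy Gaussian “signal” and “background” samples, and all hyperparameters are arbitrary choices made for this example and are not the configuration used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # Eq. (A.4), averaged over the samples in a batch;
    # eps guards against log(0) for saturated outputs.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return np.mean(-y * np.log(y_hat) - (1.0 - y) * np.log(1.0 - y_hat))

# Toy training set: two input features per sample, with
# "signal" (y = 1) and "background" (y = 0) drawn from shifted Gaussians.
n = 2000
x = np.vstack([rng.normal(+1.0, 1.0, size=(n // 2, 2)),   # signal
               rng.normal(-1.0, 1.0, size=(n // 2, 2))])  # background
y = np.concatenate([np.ones(n // 2), np.zeros(n // 2)])

# Dense feed-forward NN: 2 inputs -> 8 hidden neurons -> 1 output neuron.
# The weights w and biases b are the trainable parameters.
w1 = rng.normal(0.0, 0.5, size=(2, 8)); b1 = np.zeros(8)
w2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(x):
    h = sigmoid(x @ w1 + b1)              # hidden layer
    y_hat = sigmoid(h @ w2 + b2).ravel()  # output neuron, value in (0, 1)
    return h, y_hat

learning_rate = 0.5
batch_size = 64

for epoch in range(20):            # one epoch = one pass over all training samples
    order = rng.permutation(n)     # reshuffle before repartitioning into batches
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        xb, yb = x[idx], y[idx]

        # Forward pass on this batch.
        h, y_hat = forward(xb)

        # Backward pass: gradients of the mean binary cross-entropy.
        # For a sigmoid output neuron, dL/dz_out simplifies to (y_hat - y) / m.
        m = len(yb)
        dz2 = ((y_hat - yb) / m)[:, None]   # (m, 1)
        dw2 = h.T @ dz2                     # (8, 1)
        db2 = dz2.sum(axis=0)               # (1,)
        dh = dz2 @ w2.T                     # (m, 8)
        dz1 = dh * h * (1.0 - h)            # sigmoid derivative
        dw1 = xb.T @ dz1                    # (2, 8)
        db1 = dz1.sum(axis=0)               # (8,)

        # Gradient-descent update of the trainable parameters.
        w1 -= learning_rate * dw1; b1 -= learning_rate * db1
        w2 -= learning_rate * dw2; b2 -= learning_rate * db2

    _, y_hat_all = forward(x)
    print(f"epoch {epoch + 1:2d}  loss = {binary_cross_entropy(y, y_hat_all):.4f}")
```

In practice such a training loop is rarely written by hand; libraries such as Keras or PyTorch provide equivalent dense layers, cross-entropy losses and mini-batch optimisers, but the underlying procedure is the one sketched here.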
