**What are neural networks?**

Neural networks, also called artificial neural networks (ANNs), are a subset of machine learning and integral to deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way biological neurons signal one another. ANNs rely on training data to learn and improve their accuracy over time; once their learning algorithms are fine-tuned, they can perform tasks like classifying and clustering data at high speed.

ANNs are composed of node layers: an input layer, one or more hidden layers, and an output layer. Each node has an associated weight and threshold. If the output of any individual node exceeds the specified threshold value, that node is activated and sends data to the next layer of the network; if it doesn't, no data is passed along.

**How do neural networks work?**

Think of each node as its own linear regression model, composed of input data, weights, a bias (also known as a threshold), and an output.

Once an input layer is determined, weights are assigned. These weights help determine the importance of any given variable, with larger ones contributing more significantly to the output than other inputs. All inputs are then multiplied by their corresponding weights and then summed. Afterward, the output is passed through an activation function. If that output exceeds a given threshold, it “fires” (or activates) the node, passing data to the next layer in the network. This results in the output of one node becoming the input of the next node. This process of passing data from one layer to the next defines this neural network as a feedforward network.

Let’s use binary values to break down what one single node might look like. We can apply this concept to a more tangible example, like whether you should go to the festival (Yes: 1, No: 0). The decision to go or not to go is our predicted outcome or y-hat. Let’s assume that there are three factors influencing your decision-making:

- How highly rated is the festival by other people? (Yes: 1, No: 0)
- Is it too crowded? (Yes: 0, No: 1)
- Is it raining outside today? (Yes: 0, No: 1)

Then, let’s assume the following, giving us the following inputs:

- X1 = 1; it is highly rated by other people
- X2 = 0; it is a bit crowded today
- X3 = 1; it isn’t raining outside today

Now, we need to assign some weights to determine its importance. Larger weights signify that particular variables are of greater importance to the decision or outcome.

- W1 = 5 because this festival only happens once a year for 3 days
- W2 = 2 because you’re used to the crowds
- W3 = 4 because you hate rainy weather

Finally, we’ll also assume a threshold value of 3, translating to a bias value of –3. With all the various inputs, we can start to plug values into the formula to get the desired output.

*Y-hat = (1 × 5) + (0 × 2) + (1 × 4) − 3 = 6*

If we use the activation function from the beginning of this section, we can determine that the output of this node would be 1 since 6 is greater than 0. In this instance, you would go to the festival; but if we adjust the weights or the threshold, we can achieve different outcomes from the model. When we observe one decision, like in the above example, we can see how a neural network could make increasingly complex decisions depending on the output of previous decisions or layers.
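The festival decision above can be sketched as a single perceptron node in Python (the function and variable names are illustrative, not from any particular library):

```python
# A single perceptron node for the festival example above.
def perceptron(inputs, weights, bias):
    # Weighted sum of inputs plus the bias (bias = -threshold)
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: "fire" (1) if the sum exceeds 0, stay silent (0) otherwise
    return 1 if total > 0 else 0

inputs = [1, 0, 1]    # highly rated; crowded; not raining
weights = [5, 2, 4]   # importance of each factor
bias = -3             # a threshold of 3 expressed as a bias of -3

decision = perceptron(inputs, weights, bias)  # (1*5) + (0*2) + (1*4) - 3 = 6 > 0
print(decision)  # → 1, i.e. go to the festival
```

Adjusting the weights or the bias changes the decision boundary, which is exactly the effect described above.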

In the example above, we used a perceptron to illustrate some of the mathematics at play, but most neural networks use sigmoid neurons, whose outputs fall between 0 and 1. Since neural networks behave similarly to decision trees, cascading data from one node to another, keeping values between 0 and 1 reduces the impact of any single variable's change on the output of any given node and, subsequently, on the output of the network as a whole.
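A sigmoid neuron replaces the perceptron's hard step with a smooth curve. A minimal sketch, reusing the festival inputs from above:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    # Same weighted sum as the perceptron, but a smooth activation
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# The same festival inputs now yield a graded confidence instead of a hard 0/1
print(sigmoid_neuron([1, 0, 1], [5, 2, 4], -3))  # ≈ 0.9975
```

A small change to one input now nudges the output slightly rather than flipping it outright.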

As we think about more practical use cases for neural networks, like image recognition or classification, we’ll leverage supervised learning, or labeled datasets, to train the algorithm. As we train the model, we’ll want to evaluate its accuracy using a cost (or loss) function. This is also known as the mean squared error (MSE). In the equation below,

*Cost = MSE = (1/2m) · Σᵢ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²*, where

- *i* represents the index of the sample,
- ŷ (y-hat) is the predicted outcome,
- *y* is the actual value, and
- *m* is the number of samples.
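The cost function can be computed directly in a few lines; a small sketch (the sample predictions and labels are invented for illustration):

```python
def mse(predictions, targets):
    # Mean squared error: average squared difference, halved by convention
    m = len(targets)
    return sum((y_hat - y) ** 2 for y_hat, y in zip(predictions, targets)) / (2 * m)

y_hat = [0.9, 0.2, 0.8]   # hypothetical model predictions
y     = [1.0, 0.0, 1.0]   # actual labels
print(mse(y_hat, y))       # ≈ 0.015
```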

The goal is to minimize the cost function, ensuring the best fit for any given observation. As the model adjusts its weights and bias, it uses the cost function to reach the point of convergence, better known as the **local minimum**. The algorithm alters its weights through gradient descent, which tells the model which direction to take to reduce errors (that is, to minimize the cost function). With each training example, the model's parameters adjust, gradually converging at the minimum.
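Gradient descent on a single weight can be sketched in a few lines (the toy data, initial weight, and learning rate are all invented for illustration):

```python
# Minimize MSE for a one-parameter model y_hat = w * x via gradient descent.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by a "true" weight of w = 2

w = 0.0                # initial guess
lr = 0.05              # learning rate: the size of each downhill step
for _ in range(200):
    m = len(xs)
    # Gradient of (1/2m) * sum((w*x - y)^2) with respect to w
    grad = sum((w * x - y) * x for x, y in zip(xs, ys)) / m
    w -= lr * grad      # step in the direction that reduces the error

print(round(w, 4))      # converges toward 2.0
```

Each iteration moves `w` a little further downhill on the cost surface, which is the convergence process described above.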

Most deep neural networks are feedforward, meaning data flows in one direction only, from input to output. However, you can also train your model through backpropagation, moving in the opposite direction, from output to input. Backpropagation lets us calculate and attribute the error associated with each neuron, allowing us to adjust and fit the model's parameters appropriately.
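Backpropagation for a tiny two-layer network can be sketched with the chain rule. All of the numbers below (training example, initial weights, learning rate) are invented for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One input -> one hidden sigmoid neuron -> one linear output neuron.
x, y = 1.0, 0.5          # a single training example (invented)
w1, w2 = 0.8, -0.4       # initial weights (invented)
lr = 0.1

for _ in range(500):
    # Forward pass: input to output
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    # Backward pass: propagate the error from output back to input
    d_out = y_hat - y            # dLoss/dy_hat for loss = 0.5 * (y_hat - y)^2
    grad_w2 = d_out * h          # chain rule through the output neuron
    d_hidden = d_out * w2        # portion of the error attributed to the hidden neuron
    grad_w1 = d_hidden * h * (1 - h) * x   # through the sigmoid's derivative
    w2 -= lr * grad_w2
    w1 -= lr * grad_w1

print(round(w2 * sigmoid(w1 * x), 3))  # prediction approaches the target 0.5
```

The `d_hidden` term is the "error attributed to each neuron" mentioned above: the output error, shared backward through the weight that connects the two layers.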

**5 Main types of neural networks**

**Feedforward Neural Network:** The simplest form of neural network, in which input data travels in one direction only, passing through artificial neural nodes and exiting through output nodes. Hidden layers may or may not be present, but input and output layers always are. Based on this, feedforward networks can be further classified as single-layered or multi-layered.

**Radial Basis Function Network:** Consists of an input vector followed by a layer of RBF neurons and an output layer with one node per category. Classification is performed by measuring the input's similarity to prototypes stored by the RBF neurons, where each prototype is an example from the training set.
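The similarity measurement in an RBF network can be sketched with a Gaussian kernel over stored prototypes (the prototypes, labels, and `gamma` width below are invented for illustration):

```python
import math

def rbf(x, prototype, gamma=1.0):
    # Gaussian similarity: 1.0 at the prototype, decaying with squared distance
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-gamma * dist_sq)

# Two prototype examples "stored" from a hypothetical training set
prototypes = {"cat": [0.9, 0.1], "dog": [0.1, 0.9]}

def classify(x):
    # Pick the category whose stored prototype the input is most similar to
    return max(prototypes, key=lambda label: rbf(x, prototypes[label]))

print(classify([0.8, 0.2]))  # → cat
```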

**Recurrent Neural Network:** Designed to save the output of a layer and feed it back to the input to help predict the layer's outcome. The first layer is typically a feedforward layer, followed by a recurrent layer in which a memory function remembers some of the information it had in the previous time step. Forward propagation stores the information required for future use. If the prediction is wrong, the learning rate is used to make small adjustments during backpropagation, so the network gradually works toward the right prediction.
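The memory function of an RNN can be sketched as a hidden state carried across time steps (the scalar weights and input sequence are invented for illustration; real RNNs use learned weight matrices):

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    # The new hidden state mixes the current input with the previous step's memory
    return math.tanh(w_x * x_t + w_h * h_prev + b)

h = 0.0                      # empty memory at the start of the sequence
for x_t in [1.0, 0.0, 1.0]:  # a short input sequence
    h = rnn_step(x_t, h)
print(round(h, 3))           # the final state reflects the whole sequence
```

Because `h` is fed back in at every step, each output depends on everything the network has seen so far, not just the current input.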

**Convolutional Neural Network:** CNNs arrange neurons in three dimensions instead of the standard two. The first layer is called the convolutional layer, and each neuron in it processes information from only a small region of the visual field. Input features are taken batch-wise, like a filter: the network understands the image in parts and repeats these operations many times to process the full image. Preprocessing typically converts the image from an RGB or HSI scale to grayscale; changes in pixel values then help detect edges, allowing images to be classified into different categories.
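The convolutional layer's small-patch processing can be sketched as a 2D kernel sliding over a grayscale image (the tiny image and kernel values are invented; the kernel is a simple vertical edge detector):

```python
def convolve2d(image, kernel):
    # Slide the kernel over every position where it fully fits ("valid" padding)
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Each output value only sees a small patch of the input image
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out

# A tiny grayscale "image" with a vertical edge down the middle
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]   # responds where pixel values change left-to-right

print(convolve2d(image, edge_kernel))  # → [[0, 2, 0], [0, 2, 0]]
```

The strong responses in the middle column are exactly the edge-detection behavior described above.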

**LSTM (Long Short-Term Memory):** LSTM networks are a type of RNN that uses special units in addition to standard ones. LSTM units have a 'memory cell' that can maintain information for long periods of time. A set of gates controls when information enters the memory, when it leaves as output, and when it's forgotten. There are three types of gates: the input gate, the output gate, and the forget gate. The input gate decides how much new information is stored in memory; the output gate regulates how much data is passed to the next layer; and the forget gate controls the rate at which stored memory decays. This architecture lets LSTMs learn longer-term dependencies.
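A single LSTM step with its three gates can be sketched as follows (all scalar weights are invented for illustration; real cells use vectors and learned weight matrices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev):
    # Gates computed from the current input and the previous hidden state
    # (the scalar weights below are invented for illustration)
    f = sigmoid(0.5 * x_t + 0.5 * h_prev)   # forget gate: how much old memory to keep
    i = sigmoid(0.6 * x_t + 0.4 * h_prev)   # input gate: how much new info to store
    o = sigmoid(0.7 * x_t + 0.3 * h_prev)   # output gate: how much memory to emit
    c_tilde = math.tanh(0.9 * x_t + 0.1 * h_prev)  # candidate memory content
    c_t = f * c_prev + i * c_tilde          # memory cell: forget some old, add some new
    h_t = o * math.tanh(c_t)                # hidden state passed to the next layer
    return h_t, c_t

h, c = 0.0, 0.0                # empty hidden state and memory cell
for x_t in [1.0, 0.5, -0.2]:   # a short input sequence
    h, c = lstm_step(x_t, h, c)
print(round(h, 3), round(c, 3))
```

Because the memory cell `c` is updated additively rather than overwritten, information can survive many time steps, which is what gives LSTMs their longer-term memory.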