What is deep learning?

In this tutorial I am going to teach you about deep learning, what makes it different from machine learning, and for what purposes you use deep learning algorithms. Before you proceed, I recommend you visit my previous tutorial about machine learning.

Introduction to deep learning

Deep learning is the result of many small technical improvements to machine learning.


When a neural network with deeper layers yields poorer performance, the reason is usually that the network was not properly trained.

What can we do with deep learning?

Deep learning is usually applied in three primary research areas:

  • Image recognition
  • Speech recognition, and
  • Natural language processing.

Training a neural network has three major steps.

  • First, it does a forward pass and makes a prediction.
  • Second, it compares the prediction to the ground truth using a loss function. The loss function outputs an error value which is an estimate of how poorly the network is performing.
  • Last, it uses that error value to do backpropagation, which calculates the gradients for each node in the network (see the sketch after this list).
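As a minimal sketch of these three steps (assuming PyTorch; the network, data, and sizes here are made up for illustration):

```python
import torch
import torch.nn as nn

# A tiny made-up network: 4 inputs -> 8 hidden -> 3 classes.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 4)           # dummy batch of 16 samples
y = torch.randint(0, 3, (16,))   # dummy ground-truth labels

prediction = model(x)            # step 1: forward pass
loss = loss_fn(prediction, y)    # step 2: compare prediction to ground truth
optimizer.zero_grad()
loss.backward()                  # step 3: backpropagation computes gradients
optimizer.step()                 # gradients are used to update the weights
```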

The training process of a deep neural network with the backpropagation algorithm faces the following three primary difficulties:

  • Vanishing gradient
  • Overfitting
  • Computational load

The vanishing gradient occurs in the training process when the output error fails to reach the nodes farther from the output layer, so the weights of those layers are barely updated.

The representative solution to the vanishing gradient is the use of the Rectified Linear Unit (ReLU) function as the activation function.


The ReLU function can be defined as follows:

f(x) = max(0, x), i.e. f(x) = x for x > 0 and f(x) = 0 otherwise.
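Here is a small sketch of the function and its gradient in NumPy (the function names are my own). The point is that the derivative is exactly 1 for positive inputs, so the error signal is not shrunk as it travels backward, unlike the sigmoid, whose derivative is at most 0.25:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): passes positive values through, zeroes out the rest
    return np.maximum(0, x)

def relu_derivative(x):
    # gradient is 1 where x > 0 and 0 elsewhere, so it never shrinks
    # the backward-flowing error the way the sigmoid's derivative does
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # [0. 0. 0. 1. 1.]
```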

Overfitting in deep learning

The reason that the deep neural network is especially vulnerable to overfitting is that the model becomes more complicated as it includes more hidden layers, and hence more weights.

The most representative solution is dropout, which trains only some randomly selected nodes rather than the entire network.

Using dropout, we randomly select nodes at a certain percentage and set their outputs to zero to deactivate them.


Dropout effectively prevents overfitting, as it continuously alters the nodes and weights in the training process.

Adequate dropout percentages are approximately 50% for hidden layers and 25% for the input layer.
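As a minimal sketch of the idea in NumPy (this is the common "inverted dropout" variant, which rescales the surviving outputs; the layer output here is made up):

```python
import numpy as np

def dropout(layer_output, rate=0.5):
    # keep each node with probability (1 - rate); zero out the rest
    mask = (np.random.rand(*layer_output.shape) > rate).astype(float)
    # rescale survivors so the expected total activation is unchanged
    return layer_output * mask / (1.0 - rate)

hidden = np.random.randn(8)       # pretend output of a hidden layer
print(dropout(hidden, rate=0.5))  # roughly half the nodes are zeroed
```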


Computational Load

The last challenge is the time needed to complete the training. The number of weights increases geometrically with the number of hidden layers, thus requiring more training data. This ultimately requires more calculations, and the more computations the neural network performs, the longer the training takes. This problem is a serious concern in the practical development of neural networks: if a deep neural network requires a month to train, we can only modify it about 12 times a year.

This trouble has been relieved to a considerable extent by the introduction of high-performance hardware, such as GPUs, and algorithms, such as batch normalization.
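As a rough sketch of the batch-normalization idea (heavily simplified: no learned scale and shift parameters and no running statistics, which real implementations include):

```python
import numpy as np

def batch_norm(batch, eps=1e-5):
    # normalize each feature to zero mean and unit variance over the batch,
    # which keeps activations in a well-behaved range and speeds up training
    mean = batch.mean(axis=0)
    var = batch.var(axis=0)
    return (batch - mean) / np.sqrt(var + eps)

activations = np.random.randn(32, 10) * 5 + 3   # a made-up batch
normalized = batch_norm(activations)
print(normalized.mean(axis=0).round(3))  # ~0 per feature
print(normalized.std(axis=0).round(3))   # ~1 per feature
```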

Convolutional Neural Network (CNN) in deep learning

A convolutional network architecture comprises a fixed set of layers designated for specialized functions. The most critical layers are as follows:

  • Convolutional layer (CONV)
  • Pooling layer (POOL)
  • Fully connected (FC) layer

Convolutional layer (CONV) in deep learning

This layer is responsible for holding the neurons in a three-dimensional format and is therefore responsible for a three-dimensional output. For example, consider an input volume with the dimensions 32 x 32 x 3. Each neuron is connected to a particular local region of the input. Along the depth, there can be many neurons; in this example there are five.


A CNN basically has two main parts: a feature extraction (learning) part and a classification part.


Convolution

The term convolution refers to the mathematical combination of two functions to produce a third function. It merges two sets of information. In the case of a CNN, the convolution is performed on the input data with the use of a filter or kernel (these terms are used interchangeably) to then produce a feature map.
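A minimal single-channel sketch in NumPy (a "valid" convolution with no padding and stride 1; the image and filter are made up; note that frameworks actually implement cross-correlation, the same sliding dot product without flipping the kernel):

```python
import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image; each output value is the dot
    # product of the kernel with the patch under it, giving a feature map
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    feature_map = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return feature_map

image = np.random.rand(6, 6)          # a made-up 6x6 input
kernel = np.array([[1., 0., -1.],     # a simple vertical-edge filter
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(convolve2d(image, kernel).shape)  # (4, 4) feature map
```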


Pooling layer (POOL)

The pooling layer is responsible for reducing the chances of overfitting by reducing the spatial size of the input volume. The reduction of the spatial size implies reducing the number of parameters and the amount of computation in the network.

The two common types of pooling are max pooling and average pooling.
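A sketch of 2 x 2 max pooling in NumPy (assuming the input height and width are divisible by the pool size); swapping .max for .mean gives average pooling:

```python
import numpy as np

def max_pool(feature_map, size=2):
    # split the map into non-overlapping size x size windows and keep
    # the maximum of each window, halving each spatial dimension
    h, w = feature_map.shape
    windows = feature_map.reshape(h // size, size, w // size, size)
    return windows.max(axis=(1, 3))

fm = np.arange(16.0).reshape(4, 4)
print(max_pool(fm))
# [[ 5.  7.]
#  [13. 15.]]
```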

Fully connected layer

The fully connected layer computes the class scores; the resulting dimension is a single 1 x 1 x 10 vector (if there are 10 class scores). It is a regular neural network layer that takes its input from the previous layer, computes the class scores, and outputs a 1-D array of size equal to the number of classes.
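As a rough sketch of that last step in NumPy (the 400-dimensional flattened input and the weight shapes are made up to match the 10-class example):

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())  # shift for numerical stability
    return e / e.sum()

features = np.random.randn(5, 5, 16).flatten()  # flatten to a 400-vector
W = np.random.randn(10, 400) * 0.01             # one row of weights per class
b = np.zeros(10)

class_scores = W @ features + b   # the fully connected layer's output
print(softmax(class_scores))      # 10 class probabilities summing to 1
```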


CNN Parameter Calculation


Parameters, in general, are the weights that a model learns during training. They are the weight matrices that contribute to the model's predictive power and are updated during the backpropagation process.

1. The first input layer has no parameters. You know why: it only passes the input forward.

2. The second CONV1 layer (filter shape = 5*5, stride = 1) has ((width of filter * height of filter * number of channels in the previous layer + 1) * number of filters) = ((5*5*3) + 1) * 8 = 608 parameters.

3. The third POOL1 layer has no parameters. You know why: pooling has no learnable weights.

4. The fourth CONV2 layer (filter shape = 5*5, stride = 1) has ((width of filter * height of filter * number of channels in the previous layer + 1) * number of filters) = ((5*5*8) + 1) * 16 = 3216 parameters.

5. The fifth POOL2 layer has no parameters. You know why.

6. The sixth FC3 layer has ((current layer nodes c * previous layer nodes p) + 1*c) = 120*400 + 1*120 = 48120 parameters.

7. The seventh FC4 layer has ((c * p) + 1*c) = 84*120 + 1*84 = 10164 parameters.

8. The eighth Softmax layer has ((c * p) + 1*c) = 10*84 + 1*10 = 850 parameters.
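The same arithmetic as a short Python check (the layer shapes are the ones used in the list above):

```python
def conv_params(kh, kw, in_channels, n_filters):
    # each filter has kh*kw*in_channels weights plus one bias
    return (kh * kw * in_channels + 1) * n_filters

def fc_params(current, previous):
    # current*previous weights plus one bias per current-layer node
    return current * previous + current

print(conv_params(5, 5, 3, 8))    # CONV1:   608
print(conv_params(5, 5, 8, 16))   # CONV2:   3216
print(fc_params(120, 400))        # FC3:     48120
print(fc_params(84, 120))         # FC4:     10164
print(fc_params(10, 84))          # Softmax: 850
```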


Backpropagation

The standard algorithm used to optimize the error and adjust the weight of each neuron in each hidden layer is called backpropagation.


Why use the backpropagation algorithm?

A neural network is trained using two passes:

  • forward and
  • backward.

At the end of the forward pass, the network error is calculated; it should be as small as possible.

If the current error is high, the network didn't learn properly from the data. This means the current set of weights isn't accurate enough to reduce the network error and make accurate predictions. As a result, we should update the network weights to reduce the error.

The backpropagation algorithm is one of the algorithms responsible for updating network weights with the objective of reducing the network error.
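As a minimal sketch of this idea (a single weight, squared-error loss, plain Python; all numbers are made up for illustration):

```python
x, target = 1.5, 0.8      # made-up input and ground truth
w = 0.2                   # initial weight
lr = 0.1                  # learning rate

for step in range(50):
    prediction = w * x                       # forward pass
    error = 0.5 * (prediction - target)**2   # loss (squared error)
    grad = (prediction - target) * x         # backpropagation: dError/dw
    w -= lr * grad                           # update weight to reduce error

print(round(w * x, 4))   # prediction is now close to the target 0.8
```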
