Tuesday, November 3, 2020

Deep Learning - Coursera

 


This course was an introduction to Neural Networks; I'll try to summarize it in as simple a manner as possible.

Logistic regression can be viewed as a simple, single layer neural network. Similarly, a neural network can be viewed as multiple layers of logistic regression.

The difference is that logistic regression can only detect linear patterns, i.e. it essentially fits a straight line (or plane) through the training dataset.

Neural networks can detect non-linear patterns. This is because each layer of the neural network has a non-linear activation function.
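
For reference, here is a quick NumPy sketch of a few standard non-linear activations (these are common examples, not code lifted from the course):

    import numpy as np

    # Standard non-linear activation functions (illustrative sketch)
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))  # squashes values into (0, 1)

    def tanh(z):
        return np.tanh(z)                # squashes values into (-1, 1)

    def relu(z):
        return np.maximum(0, z)          # zero for negative inputs, identity otherwise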

Logistic Regression with Gradient Descent

The goal of Logistic Regression is to train a model that can make predictions, more specifically True or False predictions.

The input is anything that can be represented as a matrix, say X. We then need a weight matrix W and a bias vector b such that:

a = W * X + b

When an activation function, say Ω, is applied to a, we get the prediction: rounding Ω(a) to an integer gives the True or False result.
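
Here is a minimal NumPy sketch of that prediction step, assuming the sigmoid function plays the role of Ω (the variable names and shapes are my own assumptions, not fixed by the course):

    import numpy as np

    def predict(W, b, X):
        """W: weights of shape (1, n_features), b: scalar bias,
        X: inputs of shape (n_features, n_examples)."""
        a = np.dot(W, X) + b               # linear step: a = W * X + b
        y_hat = 1.0 / (1.0 + np.exp(-a))   # activation Ω (sigmoid), gives values in (0, 1)
        return (y_hat > 0.5).astype(int)   # "rounding" to a True/False (1/0) prediction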


Gradient Descent Algorithm

 The training algorithm proceeds as follows: 

We take a large number of training examples, and we start with a zero matrix for W and a zero vector for b.

For each training example X and training result Y, 

  1. we calculate a = Ω(W * X + b)
  2. Next, we calculate the cost by comparing Y with a
  3. Next, we adjust W and b based on the cost (moving them in the direction that reduces it).
We do this until there is no difference in cost across iterations, i.e. there is no gradient left to descend.
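
A rough NumPy sketch of this loop for logistic regression, assuming a sigmoid activation and the cross-entropy cost (the learning rate and iteration count are placeholder values):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, Y, learning_rate=0.01, num_iterations=1000):
        """X: (n_features, m) inputs, Y: (1, m) labels of 0/1."""
        n, m = X.shape
        W = np.zeros((1, n))   # start with a zero matrix for W
        b = 0.0                # and a zero bias b
        for i in range(num_iterations):
            A = sigmoid(np.dot(W, X) + b)                              # step 1: a = Ω(W * X + b)
            cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # step 2: cost comparing Y vs. a
            dW = np.dot(A - Y, X.T) / m                                # step 3: gradients of the cost
            db = np.sum(A - Y) / m
            W -= learning_rate * dW                                    # adjust W and b based on the cost
            b -= learning_rate * db
        return W, b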


Neural Network 

The goal of a Neural Network is similar, but here the goal is to train multiple layers, and we start with randomly initialized weight matrices rather than zeros.

The neural network training algorithm is similar to the gradient descent algorithm above, but it works across multiple layers.

  1. As an equivalent of step 1 in logistic regression, we have "Forward Propagation", which computes the output layer by layer.
  2. Once again we calculate the cost by comparing the final layer's output with the training result.
  3. As an equivalent of step 3 in logistic regression, we have backward propagation, which adjusts the weights across all the layers.
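
Here is a toy sketch of that loop for a 2-layer network with a tanh hidden layer and a sigmoid output (the layer sizes, initialization scale, and variable names are illustrative assumptions, not the course's exact code):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_two_layer_net(X, Y, n_hidden=4, learning_rate=0.1, num_iterations=1000):
        """X: (n_features, m) inputs, Y: (1, m) labels of 0/1."""
        n_x, m = X.shape
        rng = np.random.default_rng(0)
        W1 = rng.standard_normal((n_hidden, n_x)) * 0.01   # random (not zero) initialization
        b1 = np.zeros((n_hidden, 1))
        W2 = rng.standard_normal((1, n_hidden)) * 0.01
        b2 = np.zeros((1, 1))
        for i in range(num_iterations):
            # 1. Forward propagation through both layers
            Z1 = np.dot(W1, X) + b1
            A1 = np.tanh(Z1)                      # non-linear activation in the hidden layer
            Z2 = np.dot(W2, A1) + b2
            A2 = sigmoid(Z2)                      # output layer
            # 2. Cost: compare the final layer's output with the training result
            cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))  # could be logged to watch convergence
            # 3. Backward propagation: adjust the weights across all layers
            dZ2 = A2 - Y
            dW2 = np.dot(dZ2, A1.T) / m
            db2 = np.sum(dZ2, axis=1, keepdims=True) / m
            dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)   # derivative of tanh
            dW1 = np.dot(dZ1, X.T) / m
            db1 = np.sum(dZ1, axis=1, keepdims=True) / m
            W1 -= learning_rate * dW1
            b1 -= learning_rate * db1
            W2 -= learning_rate * dW2
            b2 -= learning_rate * db2
        return W1, b1, W2, b2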


Other Stuff

Hyper-parameters

Things like the number of layers, the learning rate, etc. Tuning these for optimal performance is a course of its own.
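
In code, these usually end up as plain settings passed to the training function, something like the sketch below (the values are placeholders, not tuned recommendations):

    # Illustrative hyper-parameter settings (placeholder values)
    hyperparameters = {
        "num_layers": 2,         # number of layers in the network
        "hidden_units": 4,       # units per hidden layer
        "learning_rate": 0.01,   # step size for gradient descent
        "num_iterations": 1000,  # how many gradient descent steps to take
    }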

Vectorization

This is a computational optimization where we avoid explicit for-loops in the code and instead use Python and NumPy's built-in features, such as broadcasting.
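
A small sketch of the idea: computing W * X + b for all examples at once with broadcasting instead of an explicit Python for-loop (the shapes here are assumptions):

    import numpy as np

    W = np.random.randn(1, 5)      # weights for 5 features
    X = np.random.randn(5, 1000)   # 1000 training examples as columns
    b = 0.5

    # Loop version: one example at a time (slow in pure Python)
    a_loop = np.zeros((1, 1000))
    for i in range(1000):
        a_loop[:, i] = np.dot(W, X[:, i]) + b

    # Vectorized version: one matrix multiply; b is broadcast across all 1000 columns
    a_vec = np.dot(W, X) + b

    assert np.allclose(a_loop, a_vec)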

Learning Tip

Do the course with a friend; it makes it much easier and more fun.


