# Cost Function Optimization using Gradient Descent Algorithm

This topic explains how to optimize a cost function using the gradient descent algorithm.

What is a Cost Function?

A cost function is also called a loss function. It typically measures the error between a model's predictions and the actual values.

The objective expressed by the cost function differs from one machine learning algorithm to another:

• minimize mean squared error (linear regression)
• maximize the reward function (reinforcement learning)
• minimize Gini index/maximize information gain (decision tree classification)
• minimize cross entropy (logistic regression)
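As a rough illustration of two of the objectives listed above, the sketch below computes mean squared error and binary cross entropy with NumPy. The function names `mean_squared_error` and `binary_cross_entropy` and the sample values are assumptions made for this example, not part of the original code.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences (linear regression cost)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_pred - y_true) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood (logistic regression cost)."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so that log(0) never occurs
    p_pred = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Squared errors are 0, 0 and 4, so the mean is 4/3
mse_value = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Predicting 0.5 for every sample gives a loss of ln(2)
bce_value = binary_cross_entropy([1, 0], [0.5, 0.5])

print(mse_value, bce_value)
```

In both cases a lower value means the predictions are closer to the targets, which is why these costs are minimized.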

What is a Gradient and how does it differ from a derivative?

As per Wikipedia, “The gradient is a multi-variable generalization of the derivative. While a derivative can be defined on functions of a single variable, for functions of several variables, the gradient takes its place. The gradient is a vector-valued function, as opposed to a derivative, which is scalar-valued. Like the derivative, the gradient represents the slope of the tangent of the graph of the function.”
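To make the distinction concrete, the following sketch approximates a derivative and a gradient with central finite differences. The functions `f_single` and `f_multi` and the helper names are hypothetical, chosen only for this illustration.

```python
import numpy as np

def f_single(x):
    # Function of one variable: the derivative is the scalar 2x
    return x ** 2

def f_multi(v):
    # Function of two variables: the gradient is the vector (2x, 3)
    x, y = v
    return x ** 2 + 3 * y

def numerical_derivative(f, x, h=1e-6):
    # Central difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

def numerical_gradient(f, v, h=1e-6):
    # One central difference per input variable, collected into a vector
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

derivative_at_2 = numerical_derivative(f_single, 2.0)      # a single number, close to 4
gradient_at_2_1 = numerical_gradient(f_multi, [2.0, 1.0])  # a vector, close to [4, 3]

print(derivative_at_2, gradient_at_2_1)
```

The derivative is one number, while the gradient has one component per input variable.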

What is the relationship between the cost function and gradient descent?

A differentiable cost function can be minimized (or maximized) by following its gradient. The gradient vector points in the direction of steepest increase, and its magnitude is the slope of the function in that direction. Gradient descent therefore takes repeated small steps in the opposite direction to reduce the cost.

Below is an example that optimizes the linear regression cost function using the gradient descent algorithm.
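Before moving to regression, the idea can be sketched on a one-variable cost. The cost `(w - 3)^2`, the starting point, and the step size below are assumptions picked for illustration only.

```python
def cost(w):
    # Toy cost with its minimum at w = 3
    return (w - 3.0) ** 2

def cost_gradient(w):
    # Derivative of the toy cost
    return 2.0 * (w - 3.0)

def minimize(start, learning_rate=0.1, steps=100):
    w = start
    for _ in range(steps):
        # Step against the gradient to reduce the cost
        w -= learning_rate * cost_gradient(w)
    return w

w_min = minimize(start=0.0)
print(w_min, cost(w_min))
```

Each step shrinks the distance to the minimum by a constant factor (0.8 with this learning rate), so `w_min` ends up very close to 3.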

``````
# Import libraries for basic Python operations

import numpy as np
import matplotlib.pyplot as plt
# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
``````

Function to generate random data points around a straight line

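``````
def random():
    np.random.seed(1)  # fix the seed so the same numbers are generated on every run
    X = np.arange(50)
    delta = np.random.uniform(0, 15, size=(50,))  # uniform noise in [0, 15)
    y = .4 * X + 3 + delta  # line with slope 0.4 and intercept 3, plus noise
    return X, y
``````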

Data Normalization

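``````
def normalize(data):
    # Min-max scaling to the range [0, 1]; np.ptp returns max - min.
    # New arrays are created so the caller's data is not modified in place.
    data = data - np.min(data)
    data = data / np.ptp(data)
    return data
``````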
``````
def plot(X, y):
    plt.scatter(X, y)
    plt.xlabel('X', fontweight="bold", fontsize=15)
    plt.ylabel('y', fontweight="bold", fontsize=15)
    plt.title('Scatter Data', fontweight="bold", fontsize=20)
    plt.show()
``````

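``````
def gradient_descent(X, y):
    m_updated = 0
    b_updated = 0
    mse = []
    iterations = 500
    learning_rate = 0.01
    n = len(X)

    for i in range(iterations):
        y_pred = (X * m_updated) + b_updated

        error = y_pred - y

        mse.append(np.mean(np.square(error)))

        # Standard MSE gradient step (assumed form of the update):
        # partial derivatives of the MSE with respect to m and b
        m_gradient = (2 / n) * np.sum(error * X)
        b_gradient = (2 / n) * np.sum(error)

        # Move against the gradient, scaled by the learning rate
        m_updated = m_updated - learning_rate * m_gradient
        b_updated = b_updated - learning_rate * b_gradient

    return [m_updated, b_updated, mse]
``````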

The goal of gradient descent here is to minimize the cost function, which is the mean squared error.
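For the linear model `y_pred = m * X + b` used in this example, the mean squared error and its partial derivatives can be written out as follows. The symbols $J$ for the cost and $\alpha$ for the learning rate are notation assumed here, and the update rule is the standard gradient descent step rather than something stated in the original listing.

```latex
J(m, b) = \frac{1}{n} \sum_{i=1}^{n} \left( m x_i + b - y_i \right)^2

\frac{\partial J}{\partial m} = \frac{2}{n} \sum_{i=1}^{n} \left( m x_i + b - y_i \right) x_i
\qquad
\frac{\partial J}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} \left( m x_i + b - y_i \right)

m \leftarrow m - \alpha \, \frac{\partial J}{\partial m}
\qquad
b \leftarrow b - \alpha \, \frac{\partial J}{\partial b}
```

Here $m x_i + b - y_i$ is the per-sample error, so both derivatives are averages over the errors, with the slope derivative additionally weighted by $x_i$.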

``````
def plot_cost_function(mse):
    plt.plot(mse, label="Mean Square Error")
    plt.xlabel('Iteration', fontweight="bold", fontsize=15)
    plt.ylabel('MSE', fontweight="bold", fontsize=15)
    plt.title('Cost Function', fontweight="bold", fontsize=20)
    plt.legend()
    plt.show()
``````
``````
def plot_line(X, y, y_pred):
    plt.scatter(X, y, label="Actual_Data")
    plt.plot(X, y_pred, c='r', label="Predicted Line")
    plt.xlabel('X', fontweight="bold", fontsize=15)
    plt.ylabel('y', fontweight="bold", fontsize=15)
    plt.legend()
    plt.show()
``````
``````
if __name__ == "__main__":

    X, y = random()

    X = np.array(X).astype(np.float32)
    y = np.array(y).astype(np.float32)
    X = X.reshape(-1, 1)
    y = y.reshape(-1, 1)

    # Normalize the features
    X = normalize(X)

    plot(X, y)

    # Fit the slope and intercept, and collect the MSE per iteration
    m, b, mse = gradient_descent(X, y)

    plot_cost_function(mse)

    y_pred = m * X + b

    plot_line(X, y, y_pred)
``````
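One way to sanity-check a gradient descent fit is to compare it with the closed-form least squares line from `np.polyfit`. The sketch below is self-contained and uses its own synthetic data, a larger learning rate (0.5) and more iterations (2000) than the example above so that the loop converges fully; all of these values are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: feature already scaled to [0, 1], uniform noise added
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 50)
y = 20.0 * X + 3.0 + rng.uniform(0, 15, size=50)

m, b = 0.0, 0.0
learning_rate = 0.5
n = len(X)

for _ in range(2000):
    error = (m * X + b) - y
    # Gradient of the mean squared error with respect to m and b
    m -= learning_rate * (2.0 / n) * np.sum(error * X)
    b -= learning_rate * (2.0 / n) * np.sum(error)

# Closed-form least squares fit; coefficients are returned highest degree first
m_ref, b_ref = np.polyfit(X, y, deg=1)

print("gradient descent:", m, b)
print("np.polyfit:      ", m_ref, b_ref)
```

If the two pairs of values agree, the loop has reached the minimum of the mean squared error. With a small learning rate such as 0.01 and only 500 iterations, the fit may still be some distance from that minimum.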