Let’s Develop an Artificial Neural Network in 30 Lines of Code: a simple yet complete guide to applying an ANN for classification



I believe you are already aware of how neural networks work. If not, don’t worry, there are plenty of resources available on the web to get started with. Either way, I will briefly walk you through what a neural network is and how it learns.

Parts of a neuron

In this diagram, the dendrites are the receivers of the neuron, while the axon is the transmitter of the neuron’s signal.

What is a neuron?

In artificial intelligence, a neuron is a mathematical function that models the functioning of a biological neuron. Typically, a neuron computes the weighted sum of its inputs, and this sum is passed through a nonlinear function, also called an activation function, such as the sigmoid or ReLU.
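
To make this concrete, here is a minimal sketch of a single artificial neuron in plain numpy. The inputs, weights, and bias are made up purely for illustration; they are not part of our churn model.

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical inputs and weights for a neuron with 3 inputs
inputs  = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4,  0.7, -0.2])
bias    = 0.1

# weighted sum of the inputs, passed through a nonlinear activation
z = np.dot(weights, inputs) + bias
output = sigmoid(z)
print(output)  # a single value between 0 and 1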

Now, if we put this in a flow diagram, it will look something like this:

Simple neural network diagram

In reality, of course, we are going to have larger and more complex neural networks.

Multi-layer neural network

How does it learn?

The network processes data forward to produce a prediction, then propagates the error backward (a process known as backpropagation), adjusting its weights over and over to reduce the error/loss. Once further updates no longer improve on the preceding accuracy, the parameter settings are saved as the final weights. There are different methods to minimize the loss; one of them is gradient descent.

Gradient descent is an optimization algorithm often used for finding these weights; a short code sketch contrasting its variants follows the list below.

Types of Gradient Descent

1. Batch Gradient Descent: calculates the error for each example in the training dataset but only updates the model after all training examples have been evaluated. In other words, it takes the whole dataset and adjusts the weights once per pass, iteration after iteration.

Pros:

a) Fewer updates to the model mean this variant of gradient descent is more computationally efficient than stochastic gradient descent.

b) The decreased update frequency results in a more stable error gradient, which may result in more stable convergence.

Cons:

a.) However, a stable error gradient may result in premature convergence of the model to a less optimal set of parameters.

b.) It is typically implemented in such a way that the entire training set must be in memory and available to the algorithm. With respect to training speed, it may therefore become slow for large datasets.

2. Stochastic Gradient Descent: calculates the error and updates the model for each example in the training dataset.

In other words: one row at a time, adjusting the weights with each example. The noisy updates help the search escape local minima on its way toward the global minimum, and each individual update is fast.

Pros:

a.) This variant is simpler to understand and implement, especially for beginners.

b.) The frequent updates immediately give insight into the performance of the model and its rate of improvement.

c.) The increased update frequency (one row at a time) can result in faster learning on some problems.

Cons:

a.) However, updating the model so frequently is more computationally expensive than the other variants of gradient descent, especially when training models on large datasets.

b.) The frequent updates can result in a noisy gradient signal, which may cause the model parameters, and in turn the model error, to jump around.

3. Mini-Batch Gradient Descent: a variation of the gradient descent algorithm that splits the training set into small batches that are used to calculate the model error and update the model coefficients.

Mini-batch gradient descent seeks to find a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent.

Pros:

a.) The model update frequency is higher than batch gradient descent, which allows for more robust convergence and helps avoid local minima.

b.) The batch updates provide a computationally more efficient process than stochastic gradient descent.

c.) Batching provides the efficiency of not needing all the training data in memory while keeping the algorithm implementation efficient.

Cons:

a.) Mini-batch requires the configuration of an additional ‘mini-batch size’ hyperparameter for the learning algorithm.

b.) Error information must be accumulated across mini-batches of training examples, as in batch gradient descent, which adds some computational overhead.
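
Here is a minimal numpy sketch contrasting the three variants on a toy one-dimensional linear regression. This is deliberately not our churn model; the data, learning rate, and batch size are made up for illustration. Uncomment whichever update loop you want to try:

import numpy as np

# toy 1-D problem: y = 3x, so the ideal weight is 3
rng = np.random.default_rng(0)
X_toy = rng.random(100)
y_toy = 3 * X_toy

w, lr = 0.0, 0.1

def grad(w, xb, yb):
    # gradient of the mean squared error with respect to w
    return 2 * np.mean((w * xb - yb) * xb)

for epoch in range(50):
    # 1. Batch GD: one update per epoch, using all examples at once
    # w -= lr * grad(w, X_toy, y_toy)

    # 2. Stochastic GD: one update per training example
    # for i in range(len(X_toy)):
    #     w -= lr * grad(w, X_toy[i:i+1], y_toy[i:i+1])

    # 3. Mini-batch GD: one update per batch of 10 (the active variant here)
    for start in range(0, len(X_toy), 10):
        xb, yb = X_toy[start:start+10], y_toy[start:start+10]
        w -= lr * grad(w, xb, yb)

print(w)  # converges toward 3.0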

The most commonly used optimizer in deep learning is Adam, another optimization algorithm built on gradient descent.

Now that we have an idea of how neural networks work, let’s get started with a real-life example.

First, we will import the required libraries and the data.

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# then import the dataset
dataset = pd.read_csv('Churn_Modelling.csv')

If we view the dataset, we can see it contains variables that may have contributed to whether a customer stays or leaves, with the outcome labeled in the ‘Exited’ column as 0s and 1s.

Data set preview

Now we split the dataset into X and y, where X holds the independent variables and y the dependent variable. Since this is a classification problem where y is 1 or 0, we will apply an ANN for classification.

X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

We don’t have much to clean in the dataset, except a few transformations for the categorical variables ‘country’ and ‘gender’:

# Data cleaning and transformation
# Load the required libraries for encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# One-hot encode the country column (column index 1)
ct = ColumnTransformer([("Country", OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)

Remember: whenever a variable has more than two categories, say n = 5, one-hot encoding produces 5 columns, but you should keep one less than that, i.e. n - 1 = 4, to avoid redundancy. Each of the n columns holds a yes/no (1/0) answer, so the dropped column is already implied by the others: a row with 0 in all four kept columns must belong to the dropped category.

A two-category variable like gender (male/female) is simpler: converting it yields a single column of 0s and 1s, say 0 for male and 1 for female, so there is nothing extra to drop.

So, to avoid the dummy variable trap, we remove one of the country columns. It doesn’t matter which one, as long as it is one of the country columns and not a column outside them:

X = X[:, 1:]
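
To see the trap concretely, here is a toy illustration with hypothetical country values, separate from our dataset. Note that newer scikit-learn versions use sparse_output=False where older ones used sparse=False:

import numpy as np
from sklearn.preprocessing import OneHotEncoder

countries = np.array([['France'], ['Spain'], ['Germany'], ['Spain']])
# use sparse=False instead of sparse_output=False on older scikit-learn
onehot = OneHotEncoder(sparse_output=False).fit_transform(countries)
print(onehot)
# columns are France, Germany, Spain (alphabetical order):
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# drop the first column: a row of all zeros now implies 'France'
print(onehot[:, 1:])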

We do the same for male/female, but this time it’s simpler:

# Male/Female
labelencoder_X = LabelEncoder() # create the label encoder
X[:, 3] = labelencoder_X.fit_transform(X[:, 3]) # encode the gender column (index 3) in place as 0/1

We are almost ready to create the neural network. One more simple and super fast step: split the dataset into training and test sets so the ANN can learn and then be evaluated on unseen data. After that, we apply feature scaling to bring every feature’s magnitude into a small range, which reduces the workload of the ANN without compromising the original meaning of the data.

Scaling neither adds noise nor loses the original meaning of the data.
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
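
If you are curious what StandardScaler actually does, it is simply (x - mean) / standard deviation, computed column by column. A quick sketch with made-up numbers:

import numpy as np
from sklearn.preprocessing import StandardScaler

toy = np.array([[600.0], [40000.0], [850.0]])  # wildly different magnitudes
scaled = StandardScaler().fit_transform(toy)

manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)
print(np.allclose(scaled, manual))  # True: the same transformation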

# AND WE ARE DONE WITH THE DATA PREPARATION!!!

# LET'S START THE FUN PART: CREATING A NEURAL NETWORK!!!
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

A small note on Keras and TensorFlow, buzzwords that we hear all the time.

TensorFlow is an end-to-end open-source platform. It’s a comprehensive and flexible ecosystem of tools, libraries, and other resources that provide workflows with high-level APIs.

Keras, on the other hand, is a high-level neural networks library that runs on top of TensorFlow, CNTK, or Theano. Using Keras for deep learning allows developers to build neural networks easily without worrying much about the mathematical aspects of tensor algebra, numerical techniques, and optimization methods. Keras was developed with the objective of letting people write their own scripts without having to learn the backend in detail.

Let’s Get Back To The Track!

# Initializing the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

We add and connect layers using .add() and Dense with units = 6. Hmm, what does that 6 mean?

6 refers to the number of nodes/neurons in the layer. A common rule of thumb is to choose around half the number of columns (variables) in the dataset; here we have 11 columns, and since we can’t choose 5.5, we simply pick 5 or 6.

Note: choosing more nodes doesn’t automatically improve accuracy; too many unnecessary neurons simply add noise and complexity.

Next, we have kernel_initializer = 'uniform', which draws the starting weights from a uniform distribution; stochastic gradient descent, or any other optimizer like Adam, then refines them. What is an optimizer? We will get to that part in a few seconds.

activation = 'relu' stands for the rectified linear unit, the rectifier used to introduce non-linearity.

ReLU is linear for all positive values and zero for all negative values. The downside of being zero for all negative values is a problem called “dying ReLU”: a ReLU neuron is “dead” if it’s stuck on the negative side and always outputs 0. The dying problem is likely to occur when the learning rate is too high or there is a large negative bias. ‘Leaky ReLU’ and ‘ELU’ are good alternatives to try. Other variants include ReLU-6, Concatenated ReLU (CReLU), Exponential Linear Units (ELU, SELU), and Parametric ReLU.
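
Here is a quick sketch of ReLU next to the Leaky ReLU alternative mentioned above (plain numpy, illustrative only):

import numpy as np

def relu(z):
    # linear for positive inputs, exactly zero for negative ones
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # keeps a small slope (alpha) on the negative side,
    # which helps avoid the dying-ReLU problem
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))        # [0.  0.  0.  0.5 3. ]
print(leaky_relu(z))  # [-0.03  -0.005  0.     0.5    3.   ]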

The last one, input_dim, simply refers to the number of input columns (input dimensions), 11 in our case.

Congrats!

We have successfully created our first layer!

Next, we add a second hidden layer the same way we did above. The only difference is that we don’t need to specify input_dim, because the layer infers its input size from the output of the first layer.

# Adding the second hidden layer 
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

The 'sigmoid' activation squashes the output to a probability between 0 and 1 rather than an arbitrary number, which is exactly what we need for a classification problem. If the problem were regression, we would use 'relu' (or a linear activation) at the output instead; we will see that in “how to use ANN for regression” in my next chapter.
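
To see why sigmoid suits classification, here is a tiny sketch with illustrative values: any real-valued output is squashed into (0, 1), and we can then threshold it at 0.5, exactly as we will do with y_pred later on.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (-4.0, 0.0, 4.0):
    p = sigmoid(z)                  # a probability between 0 and 1
    print(z, round(p, 3), p > 0.5)  # threshold at 0.5 to get True/False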

#Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

This call configures the network for training; in other words, it wires together the pieces (optimizer, loss, metrics) that will be used to calculate the weights (settings) of the neural network.

optimizer = 'adam', just like stochastic gradient descent (SGD), searches for the optimal set of weights in the neural network, starting from the initial weights drawn by the kernel_initializer = 'uniform' we set a while ago.

loss = 'binary_crossentropy' is the function used to calculate the loss for a binary classification problem; for regression it is typically MSE/RMSE (root mean squared error).

metrics = ['accuracy'] simply tells Keras to report the accuracy of the model during training.

Done.


Congratulations! We have successfully built a neural network with an input layer, two hidden layers, and an output layer.

Now it’s time to fit the model to our dataset. batch_size = 10 refers to the number of samples to work through before updating the internal model parameters, while epochs refers to the number of times the learning algorithm works through the entire training dataset. For example, if the dataset has 10,000 rows, our 80/20 split leaves 8,000 training rows, and with batch_size = 10 each epoch performs 800 weight updates.

# Fitting the ANN to the Training set 
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
#TIME TO PREDICT 
#Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5) # because we need the output as True/False (1s & 0s)

Now we can compare the predicted values in y_pred with the actual values in y_test, or we can use a confusion matrix to calculate the total prediction accuracy.

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
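
From the confusion matrix we can read off the overall accuracy directly; cm here is the matrix computed just above:

# cm[0, 0] = true negatives,  cm[0, 1] = false positives
# cm[1, 0] = false negatives, cm[1, 1] = true positives
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()
print(accuracy)  # fraction of test customers classified correctly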

The first time around, the training and test sets (X_train, X_test, y_train, y_test) might be confusing.

View and compare X_train, X_test, y_train, and y_test. You will notice that X_train and X_test hold the independent variables, while y_train and y_test hold the dependent variable corresponding to X_train and X_test. We trained the model on X_train with its corresponding y_train, then made predictions on unseen data, X_test, and finally compared those predictions against the original answers, y_test. I hope that makes sense. Cool.

We have successfully created our artificial neural network for a classification problem.

The whole code will look something like this:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
#country column
ct = ColumnTransformer([("Country", OneHotEncoder(), [1])], remainder = 'passthrough')
X = ct.fit_transform(X)
#to avoid dummy variable trap
X = X[:, 1:]
# Male/Female
labelencoder_X = LabelEncoder()
X[:, 3] = labelencoder_X.fit_transform(X[:, 3])
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))
# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Stay tuned!
I hope you enjoyed this! Next up is ANN for regression.
Stay tuned, or ping me!
