**Table of Contents**

- What is a multi-layer perceptron?
- Multi-layer perceptron in Keras
- Video Tutorial

**1. What is a multi-layer perceptron?**

- A multi-layer perceptron (MLP) is a basic type of model used in deep learning. It is also known as an artificial neural network, and it is among the most useful types of neural network.
- Its neurons are arranged in multiple layers connected as a directed graph. A perceptron is a single-neuron model that was a precursor to larger neural networks. Neural networks are a field of study that investigates how simple models of the biological brain can be used to solve difficult computational tasks.
- The goal is to create robust algorithms and data structures that can be used to model difficult problems. An MLP is trained with a supervised learning technique called backpropagation.
- MLPs are used to solve regression and classification problems, with applications in speech and image processing, computer vision, time series prediction, and machine translation.
- In an MLP, neurons are arranged into networks. A row of neurons is called a layer, and one network can have multiple layers. Any network model starts with an input layer that takes input from the dataset. Layers after the input layer are called hidden layers.
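To make the layer structure concrete, here is a minimal sketch of a forward pass through a tiny network with two hidden layers, written in plain Python. The layer sizes, weights, and helper names (`dense`, `forward`) are made-up illustrative values, not taken from any real model.

```python
# Minimal forward pass through a small MLP, in plain Python.
# All sizes and weights below are hypothetical values chosen for illustration.

def relu(x):
    """ReLU activation: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every neuron sees every input."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(total))
    return outputs

# Input layer: 2 features -> hidden layer 1: 2 neurons
hidden1_w = [[1.0, -1.0], [0.5, 0.5]]
hidden1_b = [0.0, 0.0]
# Hidden layer 1 -> hidden layer 2: 2 neurons
hidden2_w = [[1.0, 1.0], [-1.0, 1.0]]
hidden2_b = [0.0, 0.0]
# Hidden layer 2 -> output layer: 1 neuron, no activation (linear)
output_w = [[1.0, 2.0]]
output_b = [0.5]

def forward(features):
    """Feed the features through both hidden layers, then the output layer."""
    h1 = dense(features, hidden1_w, hidden1_b, relu)
    h2 = dense(h1, hidden2_w, hidden2_b, relu)
    return dense(h2, output_w, output_b, lambda v: v)

print(forward([2.0, 1.0]))  # [4.0]
```

Each layer's output becomes the next layer's input, which is all that "multiple layers in a directed graph" means in practice.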

- Hidden layers are so called because they are not directly exposed to the input; a network with two hidden layers is a typical small example. Deep learning can refer to having many hidden layers in a network model. Adding hidden layers usually increases the model's capacity to learn complex patterns, though it also increases training cost and the risk of overfitting.
- The final layer is called the output layer, and it is responsible for the final outcome. The choice of activation function in the output layer strongly depends on the type of problem.
- A regression problem may have a single output neuron with no activation function. A binary classification problem may have a single output neuron and use a sigmoid activation function to output a value between zero and one.
- A multi-class classification problem may have multiple neurons in the final output layer, one for each class. In this scenario, a softmax activation function may be used to output a probability for each class.
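The three output-layer choices described above can be sketched in plain Python. The function names and sample scores are illustrative; Keras provides its own built-in versions of these activations.

```python
import math

def sigmoid(z):
    """Squash a raw score into (0, 1); used for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn one raw score per class into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

raw_score = 0.0
print(raw_score)           # regression: the raw value is the prediction
print(sigmoid(raw_score))  # binary classification: 0.5

class_scores = [2.0, 1.0, 0.1]  # hypothetical raw scores for three classes
probabilities = softmax(class_scores)
print(probabilities)       # the largest score gets the largest probability
```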

**Let's code in Jupyter Notebook:**

- To construct our first multi-layer perceptron, we first import the `Sequential` model API from Keras.
- We are using `Dense` and `Dropout` layers, so we import them from Keras as well.
- To split our dataset, we use the `train_test_split` function from scikit-learn's `model_selection` module, and `plot_model` helps us display our model.
- Finally, we import the `loadtxt` function from NumPy, which is used to load our dataset. Here we work with the Titanic dataset.
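The imports listed above might look like the following sketch. The exact module paths are an assumption: they match recent Keras releases, while older tutorials sometimes import `plot_model` from a different submodule.

```python
# Model API and layer types from Keras
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Utility for drawing the model architecture
# (rendering a diagram additionally needs pydot and Graphviz installed)
from keras.utils import plot_model

# Train/test splitting from scikit-learn
from sklearn.model_selection import train_test_split

# Loader for numeric, delimiter-separated text files from NumPy
from numpy import loadtxt
```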

**About Dataset**

The data is comma-separated and has a total of 891 samples of five variables. The first is the survival column, where zero means died and one means survived; this is our target variable. Next is passenger class, with three classes (1, 2, and 3), then sex, then age, and finally fare. These four are our predictors, which we use to predict the survival class (0 or 1).

- We import the dataset from our working directory and split it into inputs and output: `X` contains the four input columns (second to fifth), and `y` contains the output, the survival column. Next, we split the dataset into training and test sets using the `train_test_split` function, with the test size set to 0.3.
- 70% of the data goes into the training set and 30% into the test set. The final line displays the shapes of the training and test sets. Our inputs have four variables, so the model's input dimension is four.
- Now we construct a sequential model with dense and dropout layers. First comes a dense layer with 32 neurons. As this is the first layer, we have to specify the input dimension, which is 4.
- So the first hidden layer has 4 inputs and 32 outputs, and we use ReLU as its activation function. Next is another dense layer with 16 neurons, followed by a dropout layer with a rate of 0.2. Dropout is a technique used to prevent a model from overfitting; this layer randomly drops 20% of its inputs at each training step.
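The splitting and model-building steps above can be sketched as follows. Several things here are assumptions: the data is random placeholder values shaped like the dataset (not the real Titanic file), the sex column is assumed to be numerically encoded, the `random_state` value is arbitrary, and the second dense layer's ReLU activation is a guess since only the first layer's activation is stated. The input dimension of 4 is expressed with an `Input` layer; older code often passes `input_dim=4` to the first `Dense` layer instead.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

# Placeholder data: 891 rows, 5 numeric columns in the order
# survival, class, sex, age, fare. Random values, NOT real Titanic data.
# With the actual file you would load it with something like:
#   dataset = np.loadtxt("<your-file>.csv", delimiter=",")
rng = np.random.default_rng(0)
dataset = np.column_stack([
    rng.integers(0, 2, size=891),   # survival: 0 or 1 (target)
    rng.integers(1, 4, size=891),   # passenger class: 1, 2 or 3
    rng.integers(0, 2, size=891),   # sex, assumed to be encoded as 0/1
    rng.uniform(1, 80, size=891),   # age
    rng.uniform(5, 500, size=891),  # fare
]).astype("float32")

X = dataset[:, 1:5]  # second to fifth columns: the four predictors
y = dataset[:, 0]    # first column: survival

# 70% training, 30% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (623, 4) (268, 4)

# The layers described so far: 4 inputs -> 32 -> 16 -> dropout
model = Sequential([
    Input(shape=(4,)),             # input dimension of 4
    Dense(32, activation="relu"),  # first hidden layer
    Dense(16, activation="relu"),  # second hidden layer (activation assumed)
    Dropout(0.2),                  # drops 20% of its inputs during training
])
```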

- Finally, we have a dense output layer with a sigmoid activation function, as our target variable contains only zero and one.
- Then we compile our model. As this is a binary classification problem, we use binary cross-entropy as the loss function. We set Adam as the optimizer; it calculates exponential moving averages of the gradient and the squared gradient.
- Adam's parameters beta1 and beta2 control the decay rates of these moving averages. We also use accuracy as the metric.
- It is time to train our model on the training data with a batch size of 10. As our training set contains 623 samples, there are 63 batches per epoch: 62 batches of ten samples and a final batch of three.
- Finally, we evaluate the model on the test set. The evaluation function returns the loss and accuracy of the model, and the final line displays the result as a percentage. Our model's accuracy is 65.67%.
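Putting the steps above together, a compile, train, and evaluate run might look like this sketch. It uses random placeholder data with the same shape as the dataset, so the accuracy it prints will not match the 65.67% reported above. The number of epochs, the `random_state`, and the second hidden layer's activation are assumptions, as they are not stated in the walkthrough.

```python
import math
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, Input

# Random placeholder data shaped like the dataset described above
# (891 rows, 4 predictors, binary target). NOT real Titanic data.
rng = np.random.default_rng(0)
X = rng.random((891, 4)).astype("float32")
y = rng.integers(0, 2, size=891).astype("float32")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = Sequential([
    Input(shape=(4,)),
    Dense(32, activation="relu"),
    Dense(16, activation="relu"),    # activation assumed
    Dropout(0.2),
    Dense(1, activation="sigmoid"),  # single output between 0 and 1
])

# Binary cross-entropy loss, Adam optimizer, accuracy as the metric
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

batch_size = 10
batches_per_epoch = math.ceil(len(X_train) / batch_size)
print(batches_per_epoch)  # 63: 62 full batches plus one batch of 3

# epochs=5 is an assumed value chosen to keep the sketch fast
model.fit(X_train, y_train, epochs=5, batch_size=batch_size, verbose=0)

# evaluate returns the loss followed by each compiled metric
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Loss: %.4f, Accuracy: %.2f%%" % (loss, accuracy * 100))
```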