**Table of Contents**

- Model Compiling
- Model Training
- Model Evaluation and Prediction
- Implementation with Keras
- Video Tutorial

**1. Model Compiling**

- Deep learning analysis is not a single job; we have to go through a number of steps.
- For data selection, we can use built-in datasets or datasets from external sources.
- Next is data processing, which differs for sequence, text, and image data.
- After that comes model generation: we can choose the sequential or functional API and different layers to build a new model.
- Then comes model compilation. In this step, we select a problem-specific loss function and an optimizer for model training.
- In the training step, we fit our training data on the compiled model.
- Model evaluation then generates a prediction for each input and output pair and collects scores. This gives us an idea of how well we have modeled the data.
- Finally, we predict outcomes from our model. We have already worked with the first three steps in previous blogs.

- A Keras model provides the compile method to compile the model. It takes three main arguments. Here the loss function is set to binary cross-entropy, the optimizer to Adam, and the metrics to accuracy.
- We have many other options for loss functions, and we need to select one based on our problem. We use mean squared error or mean absolute error for regression problems. Hinge or binary cross-entropy is used for classification.
- We also have a good number of choices for the optimizer. The first one is SGD, the stochastic gradient descent optimizer, which includes support for momentum, learning rate, and Nesterov momentum.
- RMSprop maintains per-parameter learning rates that are adapted based on an average of recent magnitudes of the gradients for the weight.
- Adagrad, or adaptive gradient, is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller its learning rate becomes.
- Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates.
- Adam realizes the benefits of both Adagrad and RMSprop. It calculates an exponential moving average of the gradient and of the squared gradient, and the parameters beta_1 and beta_2 control the decay rates of these moving averages.
- Now, metrics. A metric function is similar to a loss function, except that the results from evaluating a metric are not used when training the model. Keras provides a number of built-in metrics, and we can choose any of them.
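
To make the three compile arguments more concrete, here is a minimal plain-Python sketch of what each one computes: a binary cross-entropy loss, an accuracy metric, and a single Adam update. It is an illustration only, not the Keras implementation. The function names, the toy labels and probabilities, and the quadratic objective are all assumptions made for this example, and the Adam defaults are simply commonly used values.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return hits / len(y_true)

def adam_step(param, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta_1 * m + (1.0 - beta_1) * grad         # moving average of the gradient
    v = beta_2 * v + (1.0 - beta_2) * grad * grad  # moving average of the squared gradient
    m_hat = m / (1.0 - beta_1 ** t)                # bias correction for the early steps
    v_hat = v / (1.0 - beta_2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy labels and predicted probabilities (made up for illustration).
y_true = [1, 0, 1, 1]
y_prob = [0.9, 0.2, 0.4, 0.8]
loss = binary_cross_entropy(y_true, y_prob)  # this value would drive training
acc = accuracy(y_true, y_prob)               # this value is only reported
print(loss, acc)

# Minimise f(w) = (w - 3)^2 with Adam; the gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2.0 * (w - 3.0), m, v, t, lr=0.05)
print(w)
```

The loss is what the optimizer tries to reduce, while the accuracy is computed only for reporting, which is the distinction between a loss function and a metric described above.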

**2. Model Training**

- Models are trained using the fit function. We have to specify the input and output, and the function takes two other important arguments: epochs and batch_size.
- In epochs, we specify the number of complete passes the model makes over the training data.
- The batch_size is the number of samples processed before the model is updated. Suppose we have a dataset with a hundred samples and we choose a batch size of five.
- It means the dataset will be divided into twenty batches, each with five samples.
- We set verbose to zero, which means training will run silently.
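
The batch arithmetic above can be checked with a short plain-Python sketch. The helper names `batch_slices` and `total_updates` are hypothetical and exist only for this illustration. The sketch assumes that the dataset is split in order and that a final, smaller batch is kept when the sample count is not a multiple of the batch size.

```python
import math

def batch_slices(n_samples, batch_size):
    """Return (start, stop) index pairs that cover the dataset in order."""
    return [(start, min(start + batch_size, n_samples))
            for start in range(0, n_samples, batch_size)]

def total_updates(n_samples, batch_size, epochs):
    """Number of weight updates: one per batch, repeated for every epoch."""
    return epochs * math.ceil(n_samples / batch_size)

batches = batch_slices(100, 5)
print(len(batches))               # 20
print(batches[0], batches[-1])    # (0, 5) (95, 100)
print(total_updates(100, 5, 3))   # 60
```

With a hundred samples and a batch size of five, each epoch performs twenty updates, so three epochs perform sixty.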

**3. Model Evaluation and Prediction**

- Keras provides the evaluate function, which evaluates the model. It returns the loss of the model and the accuracy of the model on the dataset.
- Finally, model prediction. Keras provides the predict method to get the predictions of the trained model. It returns only NumPy arrays of predictions.

**4. Implementation with Keras**

- We are going to construct a model, then compile and train our model. After that, we evaluate our model and finally predict outcomes using the same model.
- First, we import all the required methods, layers, and models from Keras as we use a sequential model.
- We load our diabetes dataset, which is available in our working directory.

- Our dataset contains 768 samples with nine columns. We split our dataset into inputs (the first eight features) and output (the final column).
- We construct our model using the sequential API. We use only two dense layers before the output layer to avoid complexity. In the final layer, we use the sigmoid activation function so the output will be in the range of zero to one.
- The model has three layers. In the first layer, the input and output dimensions are both eight. In the second layer, the input dimension is eight and the output dimension is two.
- In the final dense layer, the input dimension is two and the output dimension is one.
- Now we compile our model. As this is a classification problem, we use binary cross-entropy as the loss function.
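
As a quick check on the layer dimensions described above, the following plain-Python sketch counts the trainable parameters of each dense layer and shows how the sigmoid squashes any value into the range between zero and one. It assumes each dense layer has a bias term, and the helper names are made up for this example.

```python
import math

def dense_param_count(input_dim, output_dim):
    """Weights (input_dim * output_dim) plus one bias per output unit."""
    return input_dim * output_dim + output_dim

def sigmoid(x):
    """Logistic function; maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# (input_dim, output_dim) for the three dense layers: 8 -> 8 -> 2 -> 1
layer_dims = [(8, 8), (8, 2), (2, 1)]
counts = [dense_param_count(i, o) for i, o in layer_dims]
print(counts, sum(counts))   # [72, 18, 3] 93

# Large negative inputs approach 0, large positive inputs approach 1.
print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))
```

A model this small has only 93 trainable parameters, which helps explain why it is quick to train.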

- We define Adam as the optimizer and choose accuracy as the metric function.
- During training, our dataset will be divided into 77 batches of ten samples each, with the last batch holding the remaining eight samples. With verbose set to zero, training will run silently.
- With that, model training is done. Now we evaluate our model using the evaluate function, which takes the input and output data as arguments. This function returns the loss and accuracy of the model; the accuracy of our model is around 65 percent.
- If we add more layers to our model, the accuracy may increase. It's time to predict our outcomes using the input data.
- Now we compare our result with the expected result. We have taken only 10 samples as an example, and the results are shown in the image below.
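
The whole walkthrough can be sketched end to end as follows. This is a hedged reconstruction rather than the exact code behind the results above. The data here is a random stand-in with the same shape as the real dataset (768 rows, eight inputs, one binary output), so its accuracy will not match the figure quoted above. The hidden-layer activations, the number of epochs, and the 0.5 threshold are assumptions that the text does not specify.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Stand-in data: 768 rows, 8 input features plus a binary label in the last column.
# Replace this with the real dataset loaded from your working directory.
rng = np.random.default_rng(0)
features = rng.random((768, 8))
labels = (features.sum(axis=1) > 4.0).astype("float32")
dataset = np.column_stack([features, labels])

X = dataset[:, :8]   # first eight columns are the inputs
y = dataset[:, 8]    # final column is the output

model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(8, activation="relu"),     # 8 -> 8 (activation is an assumption)
    layers.Dense(2, activation="relu"),     # 8 -> 2 (activation is an assumption)
    layers.Dense(1, activation="sigmoid"),  # 2 -> 1, output between 0 and 1
])

model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# epochs=5 is an arbitrary choice for this sketch; batch_size and verbose follow the text.
model.fit(X, y, epochs=5, batch_size=10, verbose=0)

# evaluate returns the loss followed by each compiled metric.
loss, accuracy = model.evaluate(X, y, verbose=0)
print(f"loss={loss:.3f} accuracy={accuracy:.3f}")

# predict returns a NumPy array of probabilities with shape (10, 1).
probabilities = model.predict(X[:10], verbose=0)
predicted = (probabilities > 0.5).astype(int).ravel()
expected = y[:10].astype(int)
for p, e in zip(predicted, expected):
    print(f"predicted={p} expected={e}")
```

Thresholding the sigmoid output at 0.5 is one common way to turn probabilities into class labels so they can be compared with the expected values.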