Natural Language Processing scikit-learn Tutorial

  1. What is NLP
  2. Steps in NLP
  3. Applications of NLP
  4. Natural language processing example
  5. NLP Example
  6. Video Tutorial (2 videos)

1 What is NLP

  1. Natural language processing (NLP) is one of the most practical and widely used areas of computing. It is a subfield of computer science, information engineering and artificial intelligence.
  2. Text analytics and sentiment analysis are both part of natural language processing.
  3. Natural language processing can be used in a wide range of fields. It can be used for text analytics or summarization, and it has a very large scope in the chatbots being developed today.
  4. Chatbots are in high demand. NLP gives the computer the ability to understand natural language and to take commands through text input or voice commands.
  5. Similarly, in sentiment analysis the computer tries to understand human emotions.

2 Steps in NLP

Natural language processing mainly consists of five major steps:

a lexical analysis

Lexical analysis is the first step. The computer takes in the overall structure of the given text and breaks it down into paragraphs, sentences and words. This mirrors how we read a page ourselves: we first see paragraphs and sentences, and then the individual words.
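
As a rough illustration of this breakdown, here is a minimal sketch using only Python's standard re module. The splitting rules (blank lines between paragraphs, end punctuation between sentences) and the function name lexical_breakdown are simplifying assumptions for illustration, not a complete tokenizer.

```python
import re

def lexical_breakdown(text):
    """Split text into paragraphs, then sentences, then words."""
    result = []
    # Paragraphs: separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        sentences = []
        # Sentences: split after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            # Words: runs of letters, digits or apostrophes.
            words = re.findall(r"[A-Za-z0-9']+", sentence)
            if words:
                sentences.append(words)
        result.append(sentences)
    return result

sample = "The food was great. Would not go back!\n\nService was slow."
for i, paragraph in enumerate(lexical_breakdown(sample), start=1):
    print(f"Paragraph {i}:")
    for words in paragraph:
        print("  ", words)
```

Libraries such as NLTK provide more careful sentence and word tokenizers; the regular expressions here only show the idea.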

b syntactic analysis

Syntactic analysis, or parsing, means that once a sentence is given, the computer checks its grammar and makes sure that the sentence is grammatically and syntactically correct.

c semantic analysis

In semantic analysis the computer takes a specific word and looks up exactly what it means. With each additional step, we are able to make more sense of the text.

d discourse integration

  1. Discourse integration is a key step in understanding what is happening in a text. Sometimes the meaning of a word depends on the word that came before it. For example, "bad" is a negative term, but "not bad" is a positive one, or at least not a negative one.
  2. Here the meaning of "bad" depends on the word "not". In discourse integration the computer analyzes a text and works out the meaning of each sentence based on the text that comes before and after it.
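
The "not bad" case can be sketched with a toy scorer that looks at the previous word before deciding the polarity of the current one. The word lists and the function name polarity are made up for illustration; real systems use much richer context than a single preceding word.

```python
# Tiny hand-written lexicons, purely illustrative.
POSITIVE = {"good", "great", "tasty"}
NEGATIVE = {"bad", "slow", "bland"}
NEGATORS = {"not", "never", "no"}

def polarity(sentence):
    """Return a score > 0 for positive, < 0 for negative, 0 for neutral."""
    words = sentence.lower().split()
    score = 0
    for i, word in enumerate(words):
        if word in POSITIVE:
            value = 1
        elif word in NEGATIVE:
            value = -1
        else:
            continue
        # Flip the polarity when the previous word is a negator.
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

print(polarity("the food was bad"))      # negative score
print(polarity("the food was not bad"))  # positive score
```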

e pragmatic analysis 

  1. Pragmatic analysis tries to understand the real meaning of a sentence.
  2. Given a text, it draws out insight and checks whether the information is as direct as stated or whether it implies something else. A major practical example is sarcasm.

3 Applications of NLP

  1. There are some well-known applications of natural language processing. One of them is IBM Watson Natural Language Understanding, which provides an API for analyzing and classifying natural language text: it extracts entities and relationships and tries to convey as much information as possible about the text provided.
  2. Similarly, Dialogflow from Google allows users to build chatbots. Dialogflow is based entirely on natural language processing and intent-based processing: it takes the sentence the user provides, makes sense of it, and then replies. These are a few of the APIs available for natural language processing.

4 Natural language processing example

  1. Let's see how we can implement NLP ourselves. We start by importing the basic libraries, and then load a set of restaurant reviews, each labeled as positive or negative.
  2. The data is in a TSV file, which means the values are separated by tabs. Once we read the data into a data frame, each row is an example of natural language text together with its sentiment.
  1. The sentiment of each sentence is given in a second column.
  2. For example, 'a great touch' is a positive sentence and is represented with one, while 'would not go back' is a negative one and is represented with zero.
  3. That is the data: its shape is a thousand rows by two columns.
  4. Along with scikit-learn, we'll be using two other major libraries: regular expressions and NLTK.
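
A minimal sketch of reading such a file with pandas is shown here. The column names Review and Liked and the third sample row are assumptions for illustration, and an in-memory string stands in for the TSV file so the snippet runs on its own; with a real file you would pass its path to read_csv instead.

```python
import io
import pandas as pd

# Stand-in for the TSV file: tab-separated text with a header row.
# Column names are assumed; the real file may use different ones.
tsv_text = (
    "Review\tLiked\n"
    "A great touch.\t1\n"
    "Would not go back.\t0\n"
    "The service was slow.\t0\n"
)

# delimiter="\t" tells pandas the values are tab-separated;
# quoting=3 (csv.QUOTE_NONE) keeps quote characters inside reviews as plain text.
dataset = pd.read_csv(io.StringIO(tsv_text), delimiter="\t", quoting=3)

print(dataset.shape)   # (rows, columns)
print(dataset.head())
```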

5 NLP Example

An implementation of sentiment analysis for food reviews, representing negative reviews with zero and positive reviews with one.

  1. We have imported the necessary libraries and the data.
  2. The two other major libraries we use are regular expressions and the NLTK (Natural Language Toolkit) library.
  3. Then we download the stop words, which are words that do not add any information value to a sentence, for example "a", "the", "is", "are".
  4. We import the stop words from the NLTK corpus, along with the Porter stemmer.
  5. Since we have a thousand reviews in our data, we run a for loop from 0 to 1000.
  1. We use a regular expression to keep only the alphabetic characters. For example, a review might contain something like $11.99, which carries no useful information for us.
  2. Then we convert all of the text to lowercase, split it into words, and apply the Porter stemmer, which is a normalization step in natural language processing.
  3. Then we remove the stop words, join the remaining words back into a string, and add it to the corpus, which is a list that we build up.
  4. Then we need to convert these strings into vectors. That is done with the count vectorizer, which is available in scikit-learn's text feature extraction module (sklearn.feature_extraction.text).
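
The cleaning steps above can be sketched as follows. To keep the snippet runnable without downloading anything, a small hand-written stop word set stands in for NLTK's English stop word list (an assumption; the tutorial uses the list obtained through the NLTK download). The sample reviews and the name clean_review are illustrative.

```python
import re
from nltk.stem.porter import PorterStemmer

# Stand-in for nltk.corpus.stopwords.words("english"), which needs a download.
STOP_WORDS = {"a", "the", "is", "are", "this", "for", "it", "was", "and"}

stemmer = PorterStemmer()

def clean_review(text):
    # Keep only letters; everything else (digits, $, punctuation) becomes a space.
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    # Lowercase and split into individual words.
    words = letters_only.lower().split()
    # Drop stop words, then reduce each remaining word to its stem.
    stems = [stemmer.stem(w) for w in words if w not in STOP_WORDS]
    # Join back into one cleaned string.
    return " ".join(stems)

reviews = ["Wow... Loved this place for $11.99!", "The crust is not good."]
corpus = [clean_review(r) for r in reviews]
print(corpus)
```

Note that NLTK's full stop word list includes "not", so removing it can discard negation; whether to keep such words is a design choice worth checking on your own data.
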
  1. Each string is converted into a vector of word counts. We cap the number of features so that the vectors cover only the most frequent words.
  2. We call the count vectorizer and pass the maximum number of features that we want. Here we keep at most 1500 words.
  3. We get the vectors after fitting and transforming the data. That gives us the X and y values, which we then split into training and test sets.
  4. For splitting we use train_test_split from sklearn, with a test size of 20%, a training size of 80%, and a random state of 0.
  5. Once that is done, we use a Naive Bayes classifier, which we can import from the sklearn library as GaussianNB.
  6. We fit it on X_train and y_train. Gaussian Naive Bayes learns the classification from the labeled examples provided.
  7. Now the Naive Bayes classifier has been fitted to the training data, and all we have to do is predict values for the data given to the model.
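
Put together, the vectorizing, splitting and fitting steps might look like the sketch below. The ten-item corpus and its labels are made up so the snippet runs on its own; the parameter values (1500 features, 20% test size, random state 0) are the ones described above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Tiny made-up corpus of already-cleaned reviews with labels (1 = positive).
corpus = [
    "wow love place", "crust not good", "great touch", "would not go back",
    "food tasti", "servic slow", "love fri", "wait bad",
    "great price", "not recommend",
]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Bag-of-words counts, keeping at most the 1500 most frequent words.
cv = CountVectorizer(max_features=1500)
# GaussianNB needs a dense array, so convert the sparse matrix.
X = cv.fit_transform(corpus).toarray()

# 80% training, 20% test, fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

classifier = GaussianNB()
classifier.fit(X_train, y_train)

print(X.shape, len(X_train), len(X_test))
```
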
  1. Here we use classifier.predict to predict on the X test values. We store the results as y_pred.
  2. To evaluate the results, we import confusion_matrix from sklearn.metrics.
  1. It gives us a summary of the predictions: how many test points of each class were classified correctly or incorrectly.
  2. Let's see how much accuracy we obtained from the predicted values. For that we call the accuracy function from sklearn.metrics.
  3. The accuracy we obtained is 0.73, representing seventy-three percent accuracy of the y_pred values against y_test, which is a pretty good score.
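
To make the link between the confusion matrix and the accuracy score concrete, here is a small sketch with hand-written labels. The y_test and y_pred values below are made up for illustration and are not the tutorial's results.

```python
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical true labels and predictions (0 = negative, 1 = positive).
y_test = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy is the share of correct predictions: the diagonal over the total.
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
```

Here three of the five predictions are correct, so the accuracy is 0.6; the same figure can be read off the matrix as the sum of the diagonal divided by the sum of all entries.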

Video Tutorial-1

Video Tutorial-2
