applying filter using pandas

Table of Contents

  1. Equal to operator
  2. Not equal to operator
  3. Greater than (>)
  4. Less than (<) Greater than or equal to (>=)
  5. Less than or equal to (<=)
  6. Between two values
  7. Video Tutorial

The first step is to load the data file (stock_data.csv) using Pandas library as shown in figure 1.

Figure-1

The Stock_data.csv data set is shown in figure 2. As you can see it contains data about stocks.

Figure 2: The data set

1 Equal operator

Let’s think we want to print the records, where the Volume value is equal to 1351760. This can be done by using the equal to operator ==. As shown in figure 3, specify the data frame name and also the column name.

Figure-3

In the same way we can print the data row, where the value for the High is equal to 1334.500000. The figure 4, shows the printed records. Note that the value for the High column is equal to 1334.500000.

Figure 4: Printing the data row where the High is equal to 1334.5

2 Not equal operator

In this section let’s see how can we filter out the data, which are not equal to a specified value. As in the same way as SQL, the operator for the Not Equal to is <>. Also, := can be used as the not equal to operator.

Let’s think that we want to obtain the data set where the value for the High is not equal to 1334.5. This is shown in figure 5. Specify the value and the column name which the operation should be done as shown below.

stock_data[(stock_data.High<>1334.500000)]

Execute the code and it can be observed that the data rows where the High column values are not equal to the given value are printed.

Figure 5: Filtering the data where the High value is not equal to the specified value
Figure-6

In order to get the count of the printed values, you can use the count() function as shown in figure 6. It shows the column-wise count where the High value is not equal to 1334.5.

In the same way, in order to know the count of the particular filter (figure 7), specify it in the way given below:

stock_data[(stock_data.High<>1334.500000)].High.count()

Figure 7: The count of the particular column

3 Greater than (>)

In the same way, we can filter out the data by providing the greater than sign. Let’s think we want to filter out the records which have a higher value than 1402.5 in High column. Use the > sign as given below.

stock_data[(stock_data.High>1402.500000)]

Execute the code as shown in figure 8. The data rows where the records have greater than 1402.5 value at the High column are printed out.

Figure 8: Filtering out the records which are greater than the specified value

4 Less than (<)

In this section, let’s see how can we filter out the data that is lesser than the specified value. The <  is used as the operator.

Let’s think that we want to obtain the data set, where the value for the High is lesser than 1402.5 as shown in figure 9. Specify the value and the column name which the operation should be done as shown below.

stock_data[(stock_data.High< 1402.500000)]

Execute the code, and it can be observed that the data rows where the “High” column values are lesser than the given value are printed.

Figure 9: Filtering out records which are lesser than the given value

5 Greater than or equal to (>=)

The sign >= can be used to print out the records which are greater than or equal to a specified value. If the value of the left operand is greater than or equal to the value of the right operand, then the condition becomes true.

Let’s assume we want to print the records which are greater than or equal to 1402.5 in the High column. Then it should be typed as,

stock_data[(stock_data.High>= 1402.5)]

Refer the figure 10, it can be seen that records are printed out which are higher than or equal to 1402.5. This means records also contain the value 1402.5.

Figure 10: Using greater than or equal to sign

6. Less than or equal to (<=)

The sign <= can be used to print out the records which are greater than or equal to a specified value. If the value of the left operand is less than or equal to the value of the right operand, then the condition becomes true.

Let’s assume we want to print the records which are less than or equal to 1402.5 in the High column. Then it should be typed as,

stock_data[(stock_data.High<= 1402.5)]

Refer the figure 11, it can be seen that records are printed out which are lesser than or equal to 1402.5. This means records also contain the value 1402.5.

Figure 11: Using the less than or equal to sign

7 Between two values

In order to check whether a record is between the given two values, the between() function can be used along with the inclusive parameter.

Specify the two numbers and set the inclusive parameter as True. Let’s assume we want to print out the records for which the volume values are between the 19020290 and 3368314. The code should be,

stock_data[(stock_data.Volume).between(1920290,3368314, inclusive= True)]

As shown in figure 12, the records between the specified values can be obtained.

Figure-12

To put this into practice, let’s assume we want the records between the mean value of the Volume column and 3368314 of the Volume column. To do this,

  1. First, the mean value of the Volume column should be found using the mean() function.
  2. Then assign the found mean value to the “mean_volume” variable
  3. At last, specify the two values as we did in this section as given in figure 13.
Figure 13: Checking the records between two given values

Video Tutorial

Leave a Reply

Your email address will not be published. Required fields are marked *