Python pandas: Retrieve Count, Max, Min, Mean, Median, Mode, and Std

Table of Contents

  1. Introduction
  2. Count
    1. Getting the count of a particular column
    2. Getting the count of all columns
  3. Getting maximum or minimum
    1. Maximum
    2. Minimum
  4. Mean
  5. Median
  6. Mode
  7. Standard Deviation
  8. Getting the information
    1. describe function
    2. info function
  9. Video Tutorial

1 Introduction

In this session, the most common methods for obtaining statistics from a data set will be discussed. These methods are count, min, max, mean, median, mode, and standard deviation. The basic meanings of some of these methods are:

  1. Mean – Average value of given values
  2. Median – Middle value
  3. Mode – Most repeated value
  4. Standard Deviation – Subtract the mean from each value, square each difference, add up the squares, divide the sum by the number of values, and then take the square root
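Written as formulas for n values x_1, ..., x_n, the mean and the standard deviation described above look as follows. Here σ divides by n, exactly as in the definition, while the sample standard deviation s divides by n - 1 instead.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

For example, for the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared differences sum to 32, so σ = √(32/8) = 2 and s = √(32/7) ≈ 2.14.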

To start the practical, open JupyterLab and launch a Jupyter notebook.

Import pandas, read the CSV file “car_sales.csv” into a data frame, and display the data frame as shown in figure 1.

Figure 1: Reading the csv file
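A minimal sketch of the figure 1 step. Because the contents of car_sales.csv are not reproduced here, the code reads a small hypothetical CSV from a string with io.StringIO: the column names Make, Year, Quantity and Price follow the tutorial, but the values are invented. With the real file you would pass the file name to pd.read_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for car_sales.csv: the columns follow the tutorial,
# but the values are invented for illustration.
csv_text = """Make,Year,Quantity,Price
Audi,2007,2884,12090
Volvo,2007,1500,15500
BMW,2008,2200,12090
Ford,2009,3100,9800
"""

# With the real file this would be: car_sales = pd.read_csv("car_sales.csv")
car_sales = pd.read_csv(io.StringIO(csv_text))

# In a notebook, putting car_sales on the last line of a cell renders it as a table;
# print() shows the same data as plain text.
print(car_sales)
```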

2 Count

To find the number of records in the data set, the count() function can be used. It is called on the data frame, or on a column selected from it.

2.1 Getting the count of a particular column

The number of records in a particular column can be printed by selecting the column from the data frame and calling the count function on it, as shown in figure 2. For example, assume that the count of records in the Quantity column needs to be printed.

Figure 2: Getting the count of records in a particular column

Note that the count function does not take null values into account. To demonstrate this, delete a value in the Quantity column (here, the first cell of the column is deleted), re-import the file, and execute the code again. As shown in figure 3, the count is now 9, because the first cell is a null value.

Figure 3: Getting the count of a column that contains a null cell
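The steps from figures 2 and 3 can be sketched as follows, using a small hypothetical frame (invented values) in which np.nan stands in for the deleted Quantity cell. Comparing count() with len() shows that the missing value is skipped by count().

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; np.nan plays the role of the deleted Quantity cell.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Quantity": [np.nan, 1500, 2200, 3100],
})

# count() skips NaN, so only the three non-null Quantity values are counted.
quantity_count = car_sales["Quantity"].count()

# len() counts every row, including the one with the missing value.
row_count = len(car_sales)

print(quantity_count, row_count)
```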

2.2 Getting the count of all columns

In the previous section, the column name was specified. If it is omitted, the count of records is returned for each column. As shown in figure 4, the count function is called directly on the data frame.

Figure 4: Getting the count of all columns
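A short sketch of the column-wise count on a hypothetical frame (invented values, one missing Quantity cell). The result is a Series holding one non-null count per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing Quantity value.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [np.nan, 1500, 2200, 3100],
})

# Without a column name, count() returns the number of non-null values in each column.
counts = car_sales.count()
print(counts)
```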

3 Getting the maximum or minimum

3.1 Maximum

The max() function can be used to find out the maximum value in a column.

  1. First, let's find the maximum value in the Quantity column. (Change the first cell value back to 2884, since it was set to null in the previous section.) Select the column from the data frame and call the max function on it, as shown in figure 5.
  2. To find the maximum value of each column, remove the column name from the code and execute it. As shown in figure 5, it gives the maximum value of each column. The maximum of the Make column is Volvo because strings are compared alphabetically, and Volvo comes last from A to Z. Refer to figure 6 to observe the data set.

Figure 5: Use of the max function
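A sketch of both max calls on a small hypothetical frame (column names from the tutorial, values invented). The column-wise call includes the string column Make, whose maximum is the alphabetically last value.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Maximum of a single column.
quantity_max = car_sales["Quantity"].max()

# Without a column name, max() returns the maximum of every column.
# Strings are compared alphabetically, so "Volvo" is the maximum of Make.
column_max = car_sales.max()

print(quantity_max)
print(column_max)
```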

3.2 Minimum

The minimum is obtained the same way as the maximum, but with the min() function. Figure 7 shows the minimum of all the columns. Refer to figure 6 to verify that the printed values are correct.

Figure 6: The data set
Figure 7: Getting the minimum
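The same hypothetical frame (invented values) can illustrate min(). For the string column Make, the minimum is the alphabetically first value.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Minimum of one column, then the minimum of every column.
quantity_min = car_sales["Quantity"].min()
column_min = car_sales.min()

# "Audi" sorts first alphabetically, so it is the minimum of Make.
print(quantity_min)
print(column_min)
```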

4 Mean

The mean is the average value of a given set of values. It can be calculated with the mean() function. Like the functions discussed previously, it can be used to get the mean of a particular column or of all the columns.

  1. Assume that we need to calculate the mean of the Quantity column. First, specify the data frame (car_sales), then the column name (Quantity), and then call the mean function as shown in figure 8.

Figure 8: Getting the mean of the Quantity column

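To get the mean of each column, remove the column name from the above code and execute it as shown in figure 9. Observe that no mean is shown for the Make column: it contains strings, so it is skipped and the mean is computed only for the numeric columns. In pandas 2.0 and later this skipping is no longer automatic; mean() raises a TypeError on string columns unless numeric_only=True is passed.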

Figure 9: Getting the mean of the columns
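A sketch of both mean calls on a hypothetical frame (invented values). numeric_only=True is passed so that the string column Make is skipped explicitly, which keeps the column-wise call working on recent pandas versions as well.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Mean of one column: (2884 + 1500 + 2200 + 3100) / 4.
quantity_mean = car_sales["Quantity"].mean()

# Column-wise mean; numeric_only=True leaves out the string column Make.
column_means = car_sales.mean(numeric_only=True)

print(quantity_mean)
print(column_means)
```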

5 Median

The median is the middle value of a given data set. It can be calculated with the median() function: call it on the data frame, or on the column, whose median you want. As in the previous sections, this function can be used to find the median of a particular column or of all the columns (figure 10).

Figure 10: Getting the median of all columns
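A sketch on a hypothetical frame (invented values). With an even number of values there is no single middle value, so median() returns the average of the two middle ones. As with mean(), numeric_only=True skips the string column in the column-wise call.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Sorted Quantity is 1500, 2200, 2884, 3100, so the median is (2200 + 2884) / 2.
quantity_median = car_sales["Quantity"].median()

# Column-wise median of the numeric columns only.
column_medians = car_sales.median(numeric_only=True)

print(quantity_median)
print(column_medians)
```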

6 Mode

The mode is the most repeated value of a given data set. It can be obtained with the mode() function, either for a particular column or for all the columns.

To make the mode easy to see, first change several cell values in the Quantity column to 2884, as shown in figure 11.

Figure 11: Changing multiple cell values to 2884

Then find the mode of the Quantity column: specify the data frame, then the column, and finally call the mode() function as shown in figure 12. As you can see, the mode is 2884.

Figure 12: Getting the mode of the Quantity column
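A sketch of the figure 12 step on hypothetical Quantity values (invented, with 2884 repeated). Unlike max() or mean(), mode() always returns a Series rather than a single number, because several values can tie for most frequent.

```python
import pandas as pd

# Hypothetical Quantity values after several cells were changed to 2884.
car_sales = pd.DataFrame({"Quantity": [2884, 1500, 2884, 2884, 3100]})

# mode() returns a Series; here it holds a single value because 2884 is the only mode.
quantity_mode = car_sales["Quantity"].mode()
print(quantity_mode)
```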

For demonstration purposes, now apply the mode() function to all the columns as shown in figure 13 and execute it. The mode of the Year column is 2007; since it has no other mode, the remaining cells of that column show NaN. The Pct and Quantity columns have no repeated values, so all of their values are shown. The mode of the Price column is 12090, so it appears in the first cell and the other cells in that column are NaN.

Figure 13: Finding the mode of all columns
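The padding behaviour seen in figure 13 can be reproduced on a hypothetical frame (invented values). DataFrame.mode() returns as many rows as the column with the most modes needs, and fills the shorter columns with NaN.

```python
import pandas as pd

# Hypothetical sample data: Year and Price each have one repeated value,
# while every Quantity value is different.
car_sales = pd.DataFrame({
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
    "Price": [12090, 15500, 12090, 9800],
})

# Quantity has four tied values, so the result has four rows;
# Year and Price have one mode each, followed by NaN.
modes = car_sales.mode()
print(modes)
```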

Now assume that there are no repeated values in the Quantity column, and execute the code to calculate the mode of that column again. As shown in figure 14, it outputs all the values in the column: since every value occurs exactly once, they all tie as the most frequent value and there is no single mode.

Figure 14: Getting the mode of Quantity column

As another example, calculate the mode of the Pct column. As shown in figure 15, it also returns all the values in the column, because there is no mode.

Figure 15: Getting the mode of Pct column
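One way to confirm the "no mode" case before reading the output is Series.is_unique, which is True when no value repeats. The Pct values below are hypothetical.

```python
import pandas as pd

# Hypothetical Pct column in which no value repeats.
pct = pd.Series([0.12, 0.05, 0.30, 0.21], name="Pct")

# Every value occurs once, so all of them tie as "most frequent"
# and mode() returns every value in the column.
pct_modes = pct.mode()

print(pct.is_unique)
print(pct_modes.tolist())
```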

7 Standard Deviation

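The standard deviation is found with the std() function, which is used in the same way as the other functions. Note that by default std() divides by n - 1 rather than by the number of values n, which gives the sample standard deviation.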

To calculate the standard deviation of the Quantity column, specify the data frame, then the column name, and lastly call the std function, as shown in figure 16.

Figure 16: Getting the std of the Quantity column

The standard deviation of each column can be obtained by removing the column name, as shown in figure 17.

Figure 17: Obtaining the column-wise std
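A sketch on a hypothetical frame (invented values) showing the ddof parameter: the default ddof=1 divides by n - 1, while ddof=0 divides by n, the population standard deviation. numeric_only=True again skips the string column in the column-wise call.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Default ddof=1: sum of squared differences divided by n - 1 (sample std).
quantity_std = car_sales["Quantity"].std()

# ddof=0: divided by n instead (population std).
quantity_pstd = car_sales["Quantity"].std(ddof=0)

# Column-wise std of the numeric columns only.
column_std = car_sales.std(numeric_only=True)

print(quantity_std, quantity_pstd)
print(column_std)
```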

8 Getting the information

Two functions can be used to obtain a statistical or concise summary of the data frame: describe() and info().

8.1 describe function

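Rather than computing the mean, median, std, and so on column by column with the individual functions, the describe() function can be used. For each numeric column it summarizes the count, mean, standard deviation, minimum, 25th, 50th, and 75th percentiles (the 50th percentile is the median), and maximum, as shown in figure 18. It does not report the mode of numeric columns.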

Figure 18: The describe function
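A sketch of describe() on a hypothetical frame (invented values). By default only numeric columns are summarized; passing include="all" also summarizes the string column, where the top row gives its most frequent value and freq how often it occurs.

```python
import pandas as pd

# Hypothetical sample data: column names follow the tutorial, values are invented.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "Volvo", "Ford"],
    "Year": [2007, 2007, 2008, 2009],
    "Quantity": [2884, 1500, 2200, 3100],
})

# Numeric summary: count, mean, std, min, 25%, 50% (median), 75%, max.
desc = car_sales.describe()
print(desc)

# include="all" adds count, unique, top and freq rows for the string column Make.
desc_all = car_sales.describe(include="all")
print(desc_all)
```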

8.2 info function

The info() function gives a concise summary of the data frame: the index, the column data types, the number of non-null values in each column, and the memory usage, as shown in figure 19.

Figure 19: The info function
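A sketch of info() on a hypothetical frame (invented values, one missing Quantity cell). info() prints its report rather than returning it; passing a text buffer through the buf parameter captures the same report as a string.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data with one missing Quantity value.
car_sales = pd.DataFrame({
    "Make": ["Audi", "Volvo", "BMW", "Ford"],
    "Quantity": [np.nan, 1500, 2200, 3100],
})

# Prints the index range, each column's non-null count and dtype, and memory usage.
car_sales.info()

# The same report written to a buffer instead of the screen.
buffer = io.StringIO()
car_sales.info(buf=buffer)
report = buffer.getvalue()
```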

9 Video Tutorial
