Python pandas adding dropping and renaming columns in data frame

Table of Contents

  1. TO_CSV (Parameters)
    1. Writing data into a file
    2. Null Values Representation
    3. Export a Particular Column
  2. RENAME_COLUMNS (Parameters)
    1. Rename all columns
    2. Rename particular column
    3. Add new column
    4. Delete a column
  3. Video Tutorial

1 TO_CSV (Parameters)

Firstly, launch Jupyterlab and then open a Jupyter notebook. As discussed in previous sessions import pandas and read the csv file “car_sales” using read_csv function as shown in figure 1.

Figure-1

1.1 Writing data in to a file

 Now, let’s replace a value by a null value. This can be done using the na_values parameter as shown below.

na_values = [19.9]

The 19.9 values in the data set will be replaced by null values once the code is executed as shown in the figure 2.

Figure 2: Replacing 19.9 by a null value

Secondly, we will export the file using to_csv function as a new file named “modified_data.csv” which has the exact destination as the original file’s destination as shown below in figure 3.

Figure 3: Exporting the data in to a new file

Open the file modified_data.csv in the folder. It can be observed that the null value is not shown in the file. This is shown in figure 4.

Figure 4: The modified_data.csv file doesn’t show the null value

1.2 Null Values Representation

Any desired value can be passed as instead of a null value. This can be represented by a negative value or any value. Let’s assume -1 is passed instead of the null value. To achieve this purpose na_rep parameter can be used. The desired value should be given inside single quotations as shown below.

Na_rep = ‘-1’

After adding this to the code as shown in figure 5, execute the code and refresh the modified_data.csv file.

Figure 5: Code to represent the null value as -1 value

It can be observed that -1 is displayed instead where the null value should be, as shown in figure 6.

Figure 6:  -1 is displayed instead of the null value

As an example let’s replace multiple values with null values and then represent the null values by -1. To do this, first, pass the desired values to be replaced by the null value to the na_values parameter as shown below.

na_values= [19.9,12899,878,352]

Then execute the codes again as shown in figure 7.

Figure-7

Then refresh the modified_data.csv and observe that the -1 is displayed instead of that multiple values that were replaced by null values as shown in figure 8.

Figure 8 : Multiple null values are represented by -1

1.3  Export a Particular Column

Most of the time we don’t need all the columns to be exported. Particular columns can be exported by passing the column names to the parameter columns as a list. If Year and Price columns should be exported to the modified_data.csv file, the parameter should be,

Columns = [‘Year’, ‘Price’]

Then execute the following code in figure 9.

Figure 9: Exporting the Year and Price columns

The effect can be observed by re-opening the modified_data.csv file as shown in figure 10.

Figure 10: The required columns are displayed

2. RENAME_COLUMNS (Parameters)

First, let’s display all column names of our data frame. This can be achieved by the columns function. Say, the column names of the car_sales data frame should be displayed. Then the code should be as follows. Execute the code to display the column names as in figure 11.

car_sales.columns

Figure 11: Displaying column names

 2.1 Rename all columns

All the column names can be renamed by passing the desired column names to the columns function as a list as shown below. The original column names will be overwritten by the given column names.

car_sales.columns=[‘Year_of_Manf’, ‘Manufacturer’, ‘Sales_Qty’, ‘Market_Share’, ‘Price_Vehicle’]

In here, its assumed to rename the original columns by, Year_of_Manf, Manufacturer, Sales_Qty, Market_Share, Price_Vehicle. Then in order to display the data frame execute the above code with car_sales.columns code as shown in figure 12.

Figure 12: Renaming all columns

This can be also verified by executing the data frame as shown in figure 13. It can be observed that the column names are renamed.

Figure 13:  All columns are renames

2.2 Rename particular column

In this section, we are going to discuss renaming a particular column. This can be done using the rename function. Say, we need to rename the Year_of_Manf column and, the new column name should be Year. It can be done in the following way

car_sales.rename(columns={‘Year_of_Manf’: ’Year’}, inplace= True)

In order to overwrite the updated name, pass the value as True to the inplace parameter.

Figure 14:  Renaming a particular column

This can be verified by executing the data frame as shown in figure 15. Now the column name has become the Year. Previously it was Year_of_Manf.

Figure 15: Column name has changed to Year

 2.3 Add new column

Assume that a new column named “Revenue” should be added to the data set.

First, delete the highlighted part of the code as shown in figure 16 and execute the data frame, as we need the original data set for this part of the practical.

Figure 16: Deleting the highlighted part of the code

The Revenue should be generated by multiplying Quantity and Price. Hence, in order to generate the Revenue column values, the values in the Quantity column should be multiplied by values in the Price column. The code is shown in figure 17. The name of the new column should be given inside the square brackets and inside single quotations after the name of the data frame (car_sales). Then the column names which should be multiplied in order to get the Revenue column should be given.

Figure 17: Generating a new column named Revenue

Then execute the car_sales data frame. As shown in figure 18, the data set contains the newly added column:Revenue.

Figure 18: The newly added column: Revenue

2.4 Delete a column

In this section, we will look at how to delete a column. Assume that the Revenue column should be deleted. This can be done using del keyword. Simply put del keyword and then the data frame name and pass the column name as a list as shown below.

del car_sales[‘Revenue’]

Then execute this code. Then, execute the data frame to observe the change. This is shown in figure 19.

Figure 19: Deleting a column

Video Tutorial

Leave a Reply

Your email address will not be published. Required fields are marked *