Python pandas read csv and writing data to a flat file

Table of Contents

  1. Replace a value as a null
  2. Display only required columns
  3. Write the data into flat file
  4. Video Tutorial

1 Replace a value as a null

Launch Jupyterlab and open the Jupyter notebook. As shown in the figure 1, import the data file, and read the data frame using read_csv function.

Figure 1: Importing data in to the Jupyter notebook

Lets assume the 2884 values should be replaced by null values. This can be done by na_values parameter as shown below, na_values = [2884]

Add this parameter to the code and execute as shown in figure 2. It can be observed both 2884 values were replaced by null values. Hence, wherever this value is contained in the data set, it will be replaced

Figure 2: Replacing 2884 value by a null value

In order to replace a particular value, in a particular column, the same parameter can be used by passing a dictionary to it as shown below. Let’s assume that only the 2884 value in the Quantity column should be replaced by a null value. na_values= {‘Quantity’: [2884]}

Execute the code. As shown in figure 3, the 2884 value that was in Quantity column was replaced by a null value.

Figure 3: Replacing a particular value by a null value

Likewise, if the 2884 value in Pct column needs to be replaced by a null value, simply change the parameter to,

na_values= {‘Pct’: [2884]} as shown in figure 4

Figure-4

In the same way, any number of values can be replaced. As an example,na_values={‘Pct’:[2884,19.9]}

In this scenario both 2884 and 19.9 values in Pct column will be replaced by null values as shown in figure 5.

Figure 5: Replacing multiple values in a particular column
  1. Let’s take another example. Assume that 2884 and 19.9 values in Pct column and 870 values in Quantity should be replaced by null values. Then the parameter should be,

na_values={‘Pct’:[2884,19.9], ‘Quantity’:[870]}

Execute the code after changing the parameter as shown above. According to figure 6, it can be observed that the requirement is fulfilled.

Figure 6: Replacing multiple values in multiple columns by a null value

2 Display only required columns

The usecols parameter can be used to display a particular column. Let’s assume that the Year and Quantity columns should be displayed. The relevant column names should be passed as a dictionary to the parameter as shown below,

usecols=[‘Year’,’Quantity’]

Execute the code after changing the parameter as shown above. It can be observed in figure 7, the output only displays the Year and Quantity columns. In this way, reading all the columns can be avoided. Which improves the performance of the code.

Figure 7: Displaying only required columns

3 Write the data into flat file

  1. Exporting data into a file can be done by usingto_csv function. Please note that in order to read a csv file we used read_csv function whereas, to write csv file we use to_csv function. By using this parameter, we can write data into the same file or another file.

Assume we are writing the above-obtained data into a new file named modified_data.csv which is in the same path. This can be done by the code which is shown in figure 8.

Figure 8: Exporting data using to_csv function

Execute the code and go to the relevant folder as shown in figure 9. We can see that the file we created (modified_data.csv) is present. In order to verify this, open the file and it can be observed that the above-modified data is written into the file as shown in figure 10.

Figure 9: Newly created file presents in the folder
Figure-10

According to figure 10, there is an index column which shows the sequencing. If it shouldn’t be displayed, change the code by passing an additional parameter called index as shown below.

index= False

Add this to the code as shown in figure 11 and execute the code. Reload the data file, and it can be observed the index column is absent.

Figure 11:  Setting the index parameter to false

** Please note that by default the index parameter value is True. Which explains the display of index column at the beginning. Verify it by changing the index= True as shown in figure 12 and reloading the file as shown in figure 13.

Figure 12: Setting the index parameter to True
Figure-13

** Consider that, the default delimiter for csv is comma. This can be changed to pipe by, sep = ‘|’

As shown in figure 14. Reload the data file as in figure 15. It can be observed the values are now separated by pipes.

Figure 14: Changing the comma separation to pipe separation
Figure 15: The values are separated by pipes

Video Tutorial

Leave a Reply

Your email address will not be published. Required fields are marked *