python pandas Read csv parameters

Table of Contents

  1. How to Pass a Delimiter
  2. Skipping the First n Rows
  3. Changing the Column Names
  4. Skipping n Footer Rows
  5. Display n Number of Rows
  6. Display Blank Lines
  7. Index the Columns while Importing
  8. Managing Duplicate Column Names
  9. Converting Column to Dates
  10. Displaying Data Types of the Columns
  11. Video Tutorial ( 2 videos )

1 How to Pass a Delimiter

Firstly, launch the Jupyterlab and open the Jupyter notebook that we are done practicals in the last time. In order to pass the delimiter use delimiter =’,’ command as shown in figure 1. A comma has to be put here because this is a csv file.

Figure 1: The comma delimiter

As an example, when a text file is used, pipe (|)should be used instead of the comma. That is delimiter = ‘|’ ( Pipe ). This is shown in figure 2. As shown in figure 3, If the comma delimiter is used for a txt file, all the columns will take as a one record.

Figure 2: The pipe delimiter
Figure 3: Output when the comma delimiter is used for a text file3

2 How to Skip First n Rows

The first n rows can be skipped by using the “skiprows” parameter. If the first 3 rows should be skipped, type the parameter as skiprows=3. Then execute the code. In the output, it can be observed that the first 3 rows are skipped. This is shown in figure 4.

Figure 4: Skipping the first 3 rows

If the first 10 rows should be skipped, simply change the parameter to skiprows= 10, and then execute it.  You will get the output accordingly as shown in figure 5.

Figure 5: Skipping the first 10 rows

3 Changing Column Names

As shown in figure 5, when you skip rows, the record is taken as the header where the column names should be. The header should be  Year, Maker, sales_qty, and Pct.

This can be changed using the “names” parameter. The data should be passed to the “names” parameter as a list, which means names=[‘Year’,’Maker’,’sales_qty’, ‘Pct’] as shown in figure 6. After executing, it can be observed that the header contains the column names that we passed, not the records as previously.

Figure 6: Inserting the names parameter

4 Skipping n Footer Rows

In a situation where the bottom n number of records should be skipped, “skipfooter” parameter can be used. As an example, if the bottom 5 records should be skipped insert the skipfooter= 5 in to the command and execute it. This is shown in figure 7.

Figure 7: Skipping the last 5 rows using skipfooter parameter

In here, it gives us a warning that,python should be added as the engine because c engine doesn’t support the “skipfooter” parameter. Hence, make sure to specify engine=’python’ when “skipfooter” parameter is used as shown in figure 8.

Figure 8: Putting engine as python

** Please note that

If “skipfooter” or “skiprows” parameters were not used and still we pass column names, using the “names” parameter, the output will take the column names given by using “names” parameter. It will not take the column names from the original file.

Use of header parameter


The “header” parameter can be used to skip the first n rows. The default value of the header is 0. If we put header=1, it’s going to skip the first record as shown in figure 9.

Figure-9

5 Display n Number of Rows

In order to display the first n number of rows, “nrows” parameter can be used. As an example, if the first 5 rows should be displayed, nrows=5 should be added to the command as shown in figure 9. It can be observed that only the first 5 rows are displayed in the output.

Figure 10: Using nrows parameter to display n number of rows

6 Display Blank Lines

Launch Jupyterlab and open a Jupyter Notebook. Let’s start with importing Pandas and importing the txt file as we did in the previous session as shown in figure 11.

Figure 11: The Jupyter notebook from the previous session

There are no blank rows in our original txt file. Hence, for demonstration purposes add some blank lines in the txt file as shown in figure 12.

Figure-12
  1. Then again execute the code in the Jupyter Notebook. As you can see in figure 1, it doesn’t show the blank lines in the output.
  2. In order to display the blank lines, an additional parameter should be used as shown below.

Skip_blank_lines= False

By setting skip_blank_lines parameter to False, the blank lines can be displayed. As shown in figure 13, the blank cell values are displayed as “NaN”.

Figure 13: The output after setting the parameter

The default value of skip_blank_lines parameter is set to be true. This was the reason behind not displaying the blank lines at the beginning. Hence, to display the blank lines, set the parameter to false.

7 Index the Columns while Importing

  1. The index_col parameter can be used to index the required column. As an example, if you want the 0 th column to be the index column,index_col = 0 should be added to the code as shown in figure 14. It can be observed that, in the output the 0 th column is displayed as the index column.
Figure-14
  1. Likewise, other columns can be set as the index column as well. In figure 15, the column 1 is set as the index column. If the Pct column should be set as the index column as in figure 6, set index_col =3.
  2. If you put the index column as column 4, it will give an error message because, in here there is no 4th column.
Figure 15:  Setting the Year column as the index column

Likewise, other columns can be set as the index column as well. In figure 5, the column 1 is set as the index column. If the Pct column should be set as the index column as in figure 6, set index_col =3. If you put the index column as column 4, it will give an error message because, in here there is no 4th column

Figure 16: Setting column 1 as the index column
Figure 17: Setting column 2 as the index column
  1. Moreover, indexing can be done from the backward order as well as from the forward order. As an example if the Pct column should be set as the index column using backward indexing, Index_col = -1 the parameter value should be set to -1, whereas in forwarding indexing it is set as 3. This is shown in figure 17.
Figure 18: Setting Pct column as index column using backward indexing

By setting the value as -2,-3 and -4, the index column can be set as Quantity, Make, and year. Please note that, in backward indexing there is no 0th column. If the parameter is set to -5, it will give an error message because, there is no -5th column in this data set.

8 Managing Duplicate Column Names

Let us think that mistakenly we have put the same column name twice as shown in figure 8. The name Quantity has set for two columns. After executing the code, it can be observed that from figure 9Python has automatically detected this and by default, it will manage the column names as Quantity, Quantity 1.

Figure 19: The data set having the same two columns with the same name
Figure-20

Likewise, if Python detects there are 3 columns using the name as Quantity, it will automatically handle this by renaming the columns as Quantity, Quantity 1, and Quantity 2 as shown in figure 21

Figure 21: Reading a data set having 3 columns with the same name
  1. This can be also handled by passing a parameter as well as shown in figure 10, by setting the, mangle_dupe_col= True.
  2. Even if the parameter is not set to True, duplicate columns will be handled by Python by default. If the parameter is set to False, error message will be pop up saying that its not supported by Python.

9 Converting Column to Dates

  1. In order to convert strings to date data type, parse_date parameter can be used. Lets assume that the year column should be converted to dates. The column name which is needed to be converted should be typed inside the square bracket as shown below.
  2. parse_date = [‘Year’]
  3. Execute the code. It can be seen that the Python has taken 01-01 as the default values for the month and the date. In order to verify if this is converted to date data type, lets display the data types of each column.
Figure-22

10 Displaying Data Types of Columns

The data type object dtypes can be used to display the data type of each column. This can be done by executing the code: car_sales.dtypes

Figure 23 shows the output. It can be observed that, before converting to date data type, the data type of the Year column is a string. But after converting, it has become datetime64[ns] data type. The output also shows the data types of the other columns.

Figure 23: Data types of each column

Video Tutorial-1

 

Video Tutorial-2

Leave a Reply

Your email address will not be published. Required fields are marked *