How to Read Multiple Files Using Python Pandas

In this practical, we read multiple files using Pandas. Two data sets are used: employee_india.csv and employee_usa.csv. Each file contains 7 records.

Reading multiple files requires several packages: os, glob, pandas, and numpy. Import all of them into the Jupyter notebook.

First, define two variables: one for the file path and one for the list of files to read. The files_path variable specifies the path of the files; both files are in the Multiple_files folder. The read_files variable is assigned the result of the glob() function from the glob package, which is given the file path together with the extension of the files. The code is shown in figure 1.

Figure 1: Importing the packages and defining the variables
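Since the figure is an image, here is a minimal sketch of what this step could look like. The temporary folder and the single invented row per file are assumptions made only so the snippet runs on its own; in the practical, files_path would point at the real Multiple_files folder.

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Assumption: a throwaway folder stands in for the tutorial's Multiple_files folder.
files_path = os.path.join(tempfile.mkdtemp(), "Multiple_files")
os.makedirs(files_path)

# Assumption: one invented row per file, just so the pattern has something to match.
header = "ENAME,JOB,EMPNO,HIREDATE,COUNTRY,DEPTNO\n"
for name, country in [("employee_india.csv", "India"), ("employee_usa.csv", "USA")]:
    with open(os.path.join(files_path, name), "w") as handle:
        handle.write(header)
        handle.write(f"EXAMPLE,CLERK,1,2020-01-01,{country},10\n")

# glob() returns the paths that match the pattern: every .csv file in the folder.
read_files = glob.glob(os.path.join(files_path, "*.csv"))
print(read_files)
```

Note that glob() does not guarantee any particular order, so wrap the call in sorted() if the order of the files matters.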

A for loop is used to read the files.

First, initialize np_array_values as an empty list. Then, inside the for loop, use read_csv() to read each of the files in the file_path.

Then assign the data to the employee_data data frame.

Next, each file that is read should be added to np_array_values. Use the append() function to append employee_data to the np_array_values list, as shown in figure 2.

Print the file names so that we know which files have been read. Figure 2 also shows the files that were read.

Figure 2: Reading multiple files using a for loop
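The loop described above might look roughly like the sketch below. The two small data frames written to a temporary folder are invented placeholders, and the loop variable name file_path is an assumption; the list name np_array_values and the employee_data data frame follow the text.

```python
import glob
import os
import tempfile

import pandas as pd

# Assumption: two small invented CSV files in a temporary folder replace the real data.
files_path = tempfile.mkdtemp()
pd.DataFrame({"ENAME": ["A", "B"], "COUNTRY": ["India", "India"]}).to_csv(
    os.path.join(files_path, "employee_india.csv"), index=False)
pd.DataFrame({"ENAME": ["C", "D"], "COUNTRY": ["USA", "USA"]}).to_csv(
    os.path.join(files_path, "employee_usa.csv"), index=False)

read_files = sorted(glob.glob(os.path.join(files_path, "*.csv")))

np_array_values = []  # starts empty; one data frame is appended per file
for file_path in read_files:
    employee_data = pd.read_csv(file_path)  # read one CSV file
    np_array_values.append(employee_data)   # keep it in the list
    print(file_path)                        # show which file was read
```

After the loop, np_array_values holds one data frame per file, and employee_data still refers to the last file that was read.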

Then execute np_array_values. As shown in figure 3, all the records from the multiple files have been inserted into the list.

Figure-3

Next, convert this list into array format using the vstack() function in the numpy library. The array returned by vstack() is assigned to the merge_values variable, so merge_values is an array.

Then convert merge_values into a data frame using the DataFrame() function in Pandas, and execute employee_data as shown in figure 4.

Figure 4: Converting list to array and then to tabular form
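As a rough sketch of the conversion, the snippet below stacks a list of data frames with vstack() and wraps the result with DataFrame(). The two in-memory data frames and their values are invented stand-ins for the list built by the loop, and only three columns are used to keep it short.

```python
import numpy as np
import pandas as pd

# Assumption: two invented data frames stand in for the list built by the loop.
np_array_values = [
    pd.DataFrame({"ENAME": ["A", "B"], "EMPNO": [1, 2], "COUNTRY": ["India", "India"]}),
    pd.DataFrame({"ENAME": ["C", "D"], "EMPNO": [3, 4], "COUNTRY": ["USA", "USA"]}),
]

# vstack() stacks the rows of every data frame into one 2-D array.
merge_values = np.vstack(np_array_values)

# DataFrame() turns the array back into a table; column names are not carried over.
employee_data = pd.DataFrame(merge_values)
print(employee_data)
```

Because the array only holds the cell values, the resulting data frame gets default integer column labels (0, 1, 2, and so on).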

Observe the result in figure 5. It doesn’t have any headers, because the array returned by vstack() holds only the values and not the column names.

Figure 5: Data in tabular form

To add headers to the table, pass the column names as a list as shown in figure 6.

employee_data.columns = ['ENAME', 'JOB', 'EMPNO', 'HIREDATE', 'COUNTRY', 'DEPTNO']

Now the table has a header which contains the column names.

Figure-6
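A small, self-contained sketch of the header step follows. The two rows in merge_values are invented for illustration; the column names are the ones listed in the text.

```python
import numpy as np
import pandas as pd

# Assumption: two invented rows stand in for the stacked array from the earlier steps.
merge_values = np.array([
    ["A", "CLERK", 1, "2020-01-01", "India", 10],
    ["B", "MANAGER", 2, "2021-06-15", "USA", 20],
], dtype=object)

employee_data = pd.DataFrame(merge_values)

# Assign one name per column; the list length must match the number of columns.
employee_data.columns = ["ENAME", "JOB", "EMPNO", "HIREDATE", "COUNTRY", "DEPTNO"]
print(employee_data.columns.tolist())
```

If the list has more or fewer names than the table has columns, pandas raises a ValueError instead of assigning them.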

We can also print the merge_values variable, which is an array, as shown in figure 7.

Figure-7
