In this practical, we read multiple files using pandas. Two data sets are used: employee_india.csv and employee_usa.csv. Both files contain 7 records.
Reading multiple files requires four packages: os, glob, pandas and numpy. Import all of them into the Jupyter notebook.
First, we define two variables: one for the folder path and one for the list of files to read. The files_path variable specifies the path of the folder; both files are stored in the Multiple_files folder. The read_files variable holds the result of the glob() function from the glob package. This function is given the folder path together with the file extension, and it returns a list of the matching file paths. The code is shown in figure 1.
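A minimal sketch of this step. Because the real Multiple_files folder is not available here, the sketch creates a temporary stand-in folder holding two small hypothetical CSV files that share the practical's column names. It also assumes each file starts with a header row. The `*.csv` pattern matches every file ending in `.csv`. glob does not guarantee any particular order, so the result is sorted before printing.

```python
import glob
import os
import tempfile

# Hypothetical stand-in for the Multiple_files folder: a temporary directory
# holding two small CSV files with the same column layout as the practical's data.
files_path = os.path.join(tempfile.mkdtemp(), "Multiple_files")
os.makedirs(files_path)

header = "ENAME,JOB,EMPNO,HIREDATE,COUNTRY,DEPTNO\n"
samples = {
    "employee_india.csv": "ARUN,CLERK,7001,2020-01-15,INDIA,10\n",
    "employee_usa.csv": "JOHN,ANALYST,8001,2019-06-01,USA,20\n",
}
for name, row in samples.items():
    with open(os.path.join(files_path, name), "w") as f:
        f.write(header + row)

# glob.glob() returns a list of every path that matches the pattern,
# here all .csv files directly inside files_path.
read_files = glob.glob(os.path.join(files_path, "*.csv"))
print(sorted(os.path.basename(p) for p in read_files))
# ['employee_india.csv', 'employee_usa.csv']
```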
A for loop is used to read the files.
First, initialize np_array_values as an empty list. Then, inside the for loop, use read_csv() to read each file listed in read_files.
Assign the data from each file to the employee_data data frame.
Next, append each data frame that has been read to np_array_values using the append() function, as shown in figure 2.
Print each file name so that we can see which files have been read. Figure 2 also shows the files that were read.
Then evaluate np_array_values in a notebook cell. As shown in figure 3, the list now holds one data frame per file, so all the records from both files are in it.
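The loop described above can be sketched as follows. The sketch recreates two hypothetical sample files (two rows each, rather than the 7 records of the real files) and assumes each file begins with a header row, which read_csv() uses as column names by default. The sorted() call is an addition that only makes the printing order predictable.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical sample files standing in for the real Multiple_files folder.
files_path = tempfile.mkdtemp()
csv_text = {
    "employee_india.csv": "ENAME,JOB,EMPNO,HIREDATE,COUNTRY,DEPTNO\n"
                          "ARUN,CLERK,7001,2020-01-15,INDIA,10\n"
                          "PRIYA,MANAGER,7002,2018-03-20,INDIA,20\n",
    "employee_usa.csv": "ENAME,JOB,EMPNO,HIREDATE,COUNTRY,DEPTNO\n"
                        "JOHN,ANALYST,8001,2019-06-01,USA,20\n"
                        "MARY,CLERK,8002,2021-09-10,USA,30\n",
}
for name, text in csv_text.items():
    with open(os.path.join(files_path, name), "w") as f:
        f.write(text)

read_files = glob.glob(os.path.join(files_path, "*.csv"))

np_array_values = []  # empty list that collects one DataFrame per file
for file in sorted(read_files):  # sorted() only makes the order predictable
    employee_data = pd.read_csv(file)  # first line of each file becomes the header
    np_array_values.append(employee_data)
    print(os.path.basename(file))  # show which file was just read

print(np_array_values)  # a list holding two DataFrames
```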
Then convert this list into a single array using the vstack() function from the NumPy library, which stacks the data frames vertically, one below the other. The array returned by vstack() is assigned to the merge_values variable, so merge_values is a NumPy array.
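A minimal sketch of the vstack() step. It assumes two small hypothetical data frames in place of the real file contents, reusing the practical's column names. np.vstack() converts each data frame to a 2-D array of its values and stacks the rows, so two 2-row frames give one 4-row array.

```python
import numpy as np
import pandas as pd

columns = ["ENAME", "JOB", "EMPNO", "HIREDATE", "COUNTRY", "DEPTNO"]
# Hypothetical stand-ins for the DataFrames collected in np_array_values.
india = pd.DataFrame([["ARUN", "CLERK", 7001, "2020-01-15", "INDIA", 10],
                      ["PRIYA", "MANAGER", 7002, "2018-03-20", "INDIA", 20]],
                     columns=columns)
usa = pd.DataFrame([["JOHN", "ANALYST", 8001, "2019-06-01", "USA", 20],
                    ["MARY", "CLERK", 8002, "2021-09-10", "USA", 30]],
                   columns=columns)
np_array_values = [india, usa]

# Stack the values of every DataFrame row-wise into one 2-D array.
merge_values = np.vstack(np_array_values)
print(type(merge_values).__name__, merge_values.shape)  # ndarray (4, 6)
```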
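Next, convert merge_values into a data frame using the DataFrame() constructor in pandas, assign it to employee_data, and evaluate employee_data as shown in figure 4.
Observe the result in figure 5. It has no column headers: vstack() keeps only the values of each data frame, so the column names are lost and pandas labels the columns 0 to 5 by default.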
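The sketch below shows why the headers disappear. It builds a hypothetical object array shaped like the vstack() output described above. A bare NumPy array carries no column names, so pd.DataFrame() falls back to integer labels.

```python
import numpy as np
import pandas as pd

# Hypothetical merged array, shaped like the output of np.vstack().
merge_values = np.array([
    ["ARUN", "CLERK", 7001, "2020-01-15", "INDIA", 10],
    ["PRIYA", "MANAGER", 7002, "2018-03-20", "INDIA", 20],
    ["JOHN", "ANALYST", 8001, "2019-06-01", "USA", 20],
    ["MARY", "CLERK", 8002, "2021-09-10", "USA", 30],
], dtype=object)

# No column names in the array, so pandas uses the default labels 0..5.
employee_data = pd.DataFrame(merge_values)
print(list(employee_data.columns))  # [0, 1, 2, 3, 4, 5]
```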
To add headers to the table, assign the column names as a list to the columns attribute, as shown in figure 6.
employee_data.columns = ['ENAME', 'JOB', 'EMPNO', 'HIREDATE', 'COUNTRY', 'DEPTNO']
Now the table has a header row containing the column names.
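A short sketch of the renaming step, run on a hypothetical header-less data frame like the one in figure 5. Assigning a list to `.columns` renames the columns in order. The list must contain exactly one name per column; pandas raises a ValueError for a length mismatch and leaves the existing names unchanged.

```python
import pandas as pd

# Hypothetical header-less DataFrame with default integer column labels 0-5.
employee_data = pd.DataFrame([
    ["ARUN", "CLERK", 7001, "2020-01-15", "INDIA", 10],
    ["JOHN", "ANALYST", 8001, "2019-06-01", "USA", 20],
])

# Assigning a list to .columns renames every column, left to right.
employee_data.columns = ["ENAME", "JOB", "EMPNO", "HIREDATE", "COUNTRY", "DEPTNO"]
print(employee_data)

# A list with the wrong number of names is rejected.
try:
    employee_data.columns = ["ENAME", "JOB"]
except ValueError as err:
    print("ValueError:", err)
```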
We can also print the merge_values variable. As shown in figure 7, it is an array rather than a data frame.
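The sketch below inspects merge_values on hypothetical data. Because each row mixes strings and integers, the stacked array has the generic `object` dtype. For comparison only, and not the approach used in this practical, pandas' `pd.concat()` combines the data frames directly while keeping the column names and per-column dtypes.

```python
import numpy as np
import pandas as pd

columns = ["ENAME", "JOB", "EMPNO", "HIREDATE", "COUNTRY", "DEPTNO"]
# Hypothetical per-file DataFrames standing in for np_array_values.
np_array_values = [
    pd.DataFrame([["ARUN", "CLERK", 7001, "2020-01-15", "INDIA", 10]], columns=columns),
    pd.DataFrame([["JOHN", "ANALYST", 8001, "2019-06-01", "USA", 20]], columns=columns),
]

merge_values = np.vstack(np_array_values)
print(merge_values)
print(type(merge_values).__name__)  # ndarray
print(merge_values.dtype)           # object, since rows mix strings and integers

# Alternative for comparison: concat keeps column names and column dtypes.
combined = pd.concat(np_array_values, ignore_index=True)
print(list(combined.columns))
print(combined.dtypes)
```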