Table of Contents
- Google BigQuery console
- Using Python Pandas to write data to BigQuery
- Video Tutorial
In this practical, we are going to write data to Google BigQuery using Python Pandas with a single line of code.
Google BigQuery is a RESTful web service that enables interactive analysis of massive datasets, working in conjunction with Google Cloud Storage. Simply put, BigQuery is a data warehouse into which you can load data, run manipulations, and retrieve results.
In order to write data to or read data from BigQuery, the pandas-gbq package must be installed.
Open the Anaconda command prompt and run the following command to install it.
conda install pandas-gbq -c conda-forge
2. Google BigQuery Console
Go to the Google BigQuery console as shown in figure 1. In my console, I have the alexa_data, EMP_TGT, and stock_data tables under the SampleData schema.
We are going to create a table using Python and write it into BigQuery under the SampleData schema.
3. Using Python Pandas to write data to BigQuery
- Launch JupyterLab and open a Jupyter notebook. Then import pandas and the gbq module from pandas.io.
Import the Emp_tgt.csv data set and assign it to the employee_data data frame as shown in figure 2.
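A notebook cell for this step might look like the following. This is only a sketch: the contents of Emp_tgt.csv are not shown in the walkthrough, so the column names and values below are hypothetical stand-ins, and the code first creates a small sample file so the example is self-contained.

```python
import pandas as pd

# For illustration only: create a tiny stand-in for Emp_tgt.csv.
# The real file's columns are not shown in the walkthrough, so these
# column names and values are hypothetical.
sample = pd.DataFrame({
    'Empno': [1001, 1002, 1003],
    'Ename': ['Smith', 'Allen', 'Ward'],
    'Sal': [800, 1600, 1250],
})
sample.to_csv('Emp_tgt.csv', index=False)

# Import the data set and assign it to the employee_data data frame.
employee_data = pd.read_csv('Emp_tgt.csv')

# Inspect the first few rows to confirm the file loaded correctly.
print(employee_data.head())
```

In practice you would skip the file-creation step and simply call `pd.read_csv('Emp_tgt.csv')` on the file in your working directory.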
Now we need to create a table so that we can insert the data. For this we use the to_gbq() function, in which the parameters destination_table, project_id, and if_exists should be specified.
The destination table should be inside the SampleData schema in BigQuery, and the project ID should be given as shown in the BigQuery console. The if_exists parameter should be set to fail, because if a table with the same name already exists in BigQuery we don't want to write into it. The code is shown below.
employee_data.to_gbq(destination_table='SampleData.Employee_data', project_id='secondproject201206', if_exists='fail')
Execute the above code; it will take a few minutes. Then go to the Google BigQuery console and refresh it. Look inside the secondproject folder, under SampleData: our table has been written to it, as shown in figure 3.
To view the data inside the table, use the preview tab as shown in figure 4.
As an example, suppose the table already exists in Google BigQuery and we try to write the data again. In this scenario we get an error, because we set the if_exists parameter to fail.
In a situation where we have made some changes to the table and need to replace the table in BigQuery with the newly made one, set the if_exists parameter to replace as shown below.
employee_data.to_gbq(destination_table='SampleData.Employee_data', project_id='secondproject201206', if_exists='replace')
As an example, suppose we now have a new column named Deptno, as shown in figure 6.
Then re-execute the code to import the data file and write it to BigQuery. After executing, reload the BigQuery console: the previous data set has been successfully replaced by the new one. This is shown in figure 7.
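In the walkthrough the new column comes from the updated CSV file, but locally the same change can be sketched by adding the column to the data frame directly. The column values below are made up for illustration.

```python
import pandas as pd

# Hypothetical employee data frame, standing in for the one read
# from Emp_tgt.csv (the real columns are not shown in the walkthrough).
employee_data = pd.DataFrame({
    'Empno': [1001, 1002, 1003],
    'Ename': ['Smith', 'Allen', 'Ward'],
})

# Add the new Deptno column; these department numbers are made up.
employee_data['Deptno'] = [10, 20, 30]

print(employee_data)
```

Writing this frame back with if_exists='replace' would overwrite the BigQuery table with the new three-column version.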
- Let's assume we want to append new data to the existing table in BigQuery. To do this, we need to set the if_exists parameter to append.
The data to be appended is shown in figure 8. Import the data into the notebook and then run the following command to append it to the existing table.
employee_data.to_gbq(destination_table='SampleData.Employee_data', project_id='secondproject201206', if_exists='append')
Then execute the command. After executing, go to the BigQuery console and reload it. Check the table: the data has been appended to the existing table, as shown in figure 9.
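Conceptually, if_exists='append' leaves the rows already in the table in place and adds the new rows after them, much like a row-wise concatenation. A local sketch of that effect, with made-up rows standing in for the real data:

```python
import pandas as pd

# Rows already in the BigQuery table (hypothetical values).
existing = pd.DataFrame({
    'Empno': [1001, 1002],
    'Ename': ['Smith', 'Allen'],
})

# New rows to append (figure 8 in the walkthrough; values here are made up).
new_rows = pd.DataFrame({
    'Empno': [1003],
    'Ename': ['Ward'],
})

# Appending keeps the existing rows and adds the new ones,
# like concatenating the two frames row-wise.
combined = pd.concat([existing, new_rows], ignore_index=True)
print(combined)
```

The difference, of course, is that to_gbq() performs this server-side against the BigQuery table rather than in local memory.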