Table of Contents
- Google BigQuery console
- Using Python Pandas to write data to BigQuery
- Video Tutorial
In this practical, we are going to read data from Google Big Query using Python Pandas with a single line of code.
Google BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google storage. Simply put, BigQuery is a warehouse that you can load, do manipulations, and retrieve data.
In order to write or read data from BigQuery, a package should be installed.
Open the Anaconda command prompt and type the following command to install it.
conda install pandas- gbq –c conda-forge
2. Google BigQuery Console
Go to the Google BigQuery console as shown in figure 1. In my console I have alexa_data, EMP_TGT, stock_data tables under SampleData schema. We are going to read alexa_data from BigQuery.
In order to visualize the first 1000 records in alexa_data, write a query and run it at the new query as shown in figure 2.
3. Using Python Pandas to read data from BigQuery
- Launch Jupyterlab and open a Jupyter notebook. Then import pandas and gbq from the Pandas.io module.
- Then we need to specify the data set name we are going to read from BigQuery. Then the data set should be specified as we did in BigQuery console as the following way.
Query = “”” SELECT * FROM [secondproject-201206:SampleData.alexa_data] LIMIT 50”””
In here we have written a query to retrieve the first 50 records from alexa_data which is inside the second project, SampleData schema.
Note that we have passed the query as a variable (Query) in here.
3. Then in order to read the file from BigQuery, we can use read_gbq() function. We need to pass the query and the value for project_id in to this function as shown below.
alexa_data = gbq.read_gbq(Query, project_id = “secondproject-201206”)
4. Then execute the all codes as shown in figure 3. Now we can read the data from BigQuery.
- As another example, let’s read EMP_TGT table from the BigQuery. This can be done in the same way. First write a query at the console as shown below.
SELECT * FROM [secondproject-201206:SampleData.EMP_TGT] LIMIT 1000
Then pass this query in to the notebook as shown in figure 4. Execute the codes. This will take few minutes.