Deploying Containers with Docker, GCP Cloud Run and Flask-RESTful

Serving up an API to access data from Google BigQuery

[Image: Photo by Ian Taylor on Unsplash]

These days, data science practitioners find themselves relying on cloud platforms more and more, either for data storage, cloud computing, or a mix of both. This article will demonstrate how to leverage Cloud Run in GCP to access a dataset stored on Google BigQuery, apply a quick transformation, and present the result back to users through a Flask-RESTful API.

Intro — GCP Cloud Run, Containers vs VMs

Cloud Run is a service that allows you to construct and deploy containers that can be accessed via HTTP requests. Cloud Run is scalable and abstracts away infrastructure management so you can get things up and running quickly.

What is a container you ask? A simple way to think about containers is that they are similar to a Virtual Machine (VM), but much smaller in scale and scope.

With a VM, you typically have a virtual version of an entire OS running (such as a Windows PC running a Linux VM through something like VirtualBox). This Linux VM will typically have a GUI, a web browser, word-processing software, IDEs and a whole host of other software accompanying it.

With containers, however, you include only the minimal amount of software necessary to perform your desired task, which makes them compact, efficient, and easy to create, destroy and deploy on the fly. For example, the container in this article will have Python 3.8 installed and nothing else.

Cloud Run is well-suited to deploying stateless containers. For a good insight into stateful vs stateless containers, take a look at this article.

Getting Started

The public dataset on BigQuery we will be looking at is the bigquery-public-data.usa_names.usa_1910_2013 dataset:

[Image: Querying to get total counts of first names.]

Before getting started, you will need to create:

  1. a project on GCP, and
  2. a service account and a service account key.

For a guide on how to quickly do so, check out the BigQuery API docs. Once you have created a service account, you can create and download a .json service account key, which is used to authenticate you when trying to access BigQuery data through the BQ API.

Step 1: Google Cloud Shell

Access Google Cloud Shell and click ‘Open Editor’. You should see something like this:

[Image: Cloud Shell Editor, with 'Cloud Code' visible in the bottom left corner.]

Next click ‘Cloud Code’ in the bottom left corner:

This will bring up a menu of options. Select ‘New Application’:

Then from the following options select ‘Cloud Run Application’ → ‘Python (Flask) : Cloud Run’. This gives you the sample Flask-based ‘Hello World’ application for Cloud Run that we will build on to access our BigQuery dataset. You should now have something like this:

[Image: Hello World sample application]

The next steps will be changing the provided app.py and Dockerfile, as well as adding some code of our own to access BigQuery.
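Before changing anything, it helps to know what we are starting from. The generated app.py is roughly a minimal Flask 'Hello World' along the following lines (a sketch only; the exact contents of the template can vary between Cloud Code versions):

import os

from flask import Flask

app = Flask(__name__)


@app.route('/')
def hello():
    # The sample just returns a greeting on the root route.
    return 'Hello World!'


if __name__ == '__main__':
    server_port = os.environ.get('PORT', '8080')
    app.run(debug=True, port=server_port, host='0.0.0.0')

We will keep this overall structure (reading the port from the PORT environment variable and listening on 0.0.0.0) and swap the route handling for a Flask-RESTful resource.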

Step 2: Building the container with Docker

Start by editing the existing Dockerfile to specify how our container will be built. Replace its contents with:

# Python image to use.
FROM python:3.8

# Set the working directory to /app
WORKDIR /app

# Copy the requirements file used for dependencies
COPY requirements.txt .

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN pip install flask-restful
RUN pip install --upgrade google-cloud-bigquery
RUN pip install --upgrade gcloud
RUN pip install pandas

# Copy the rest of the working directory contents into the container at /app
COPY . .

# Run app.py when the container launches
ENTRYPOINT ["python", "app.py"]

This Dockerfile will:

  1. Build a container from the official Python 3.8 image
  2. Set the working directory of the container
  3. Install the packages in the existing requirements.txt file
  4. Install the extra packages we need (these could instead be added to the existing requirements.txt file, as shown in the sketch just after this list)
  5. Copy the other existing files in the working directory into the container’s working directory (this includes our service account key)
  6. Run app.py when the container launches
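For example, a consolidated requirements.txt might look something like the sketch below, assuming you keep whatever the generated sample already lists (such as Flask) and simply append the extra packages from the Dockerfile:

# ...existing entries from the generated sample (e.g. Flask) stay as they are
flask-restful
google-cloud-bigquery
gcloud
pandas

With that in place, the four extra RUN pip install lines in the Dockerfile could be dropped.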

Step 3: Starting the Flask app with app.py

Replace the code in the existing app.py with:

import os
import requests
import bq

from flask import Flask
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)


class QueryData(Resource):
    def get(self):
        return bq.run_()


api.add_resource(QueryData, '/')

if __name__ == '__main__':
    server_port = os.environ.get('PORT', '8080')
    app.run(debug=True, port=server_port, host='0.0.0.0')

Flask-RESTful uses Resource objects to define HTTP methods easily (see the docs for more info). Above, we define a Resource whose GET method returns the result of our bq.py script, which queries, sorts and returns the data. (It is possible to create a number of Resources and add them to the API using the .add_resource() method.)
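As an illustration of the pattern (purely hypothetical and not part of the app we deploy here), a second Resource could expose a single name's count under a parameterized route:

class NameCount(Resource):
    def get(self, name):
        # Hypothetical example: look up one name in the queried data.
        data = bq.run_()
        if name in data['Names']:
            idx = data['Names'].index(name)
            return {'Name': name, 'Count': data['Name Counts'][idx]}
        return {'message': f'{name} not found'}, 404


# Typed URL parameters are supported, e.g. GET /name/James
api.add_resource(NameCount, '/name/<string:name>')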

Step 4: BigQuery API Python code

Below is the code file that will access the bigquery-public-data.usa_names.usa_1910_2013 dataset:

bq.py

def run_():
    import os
    import pandas as pd

    from google.cloud import bigquery
    from google.oauth2 import service_account

    key_path = "./your_key.json"
    credentials = service_account.Credentials.from_service_account_file(
        key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )

    client = bigquery.Client(credentials=credentials)

    query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
    """
    query_job = client.query(query)

    counts = []
    names = []
    for row in query_job:
        names.append(row["name"])
        counts.append(row["total_people"])

    # Put names and name counts in a DataFrame and sort alphabetically,
    # to simulate operating on the data with a model.
    results = {'Names': names, 'Name Counts': counts}
    df = pd.DataFrame.from_dict(results)  # convert to DataFrame
    df = df.sort_values(by=['Names'])     # sort alphabetically
    df = df.to_dict(orient='list')        # convert to dictionary format

    return df

Add this code to a new file named bq.py in the same directory as app.py and the Dockerfile:

[Image: bq.py]

Breakdown of bq.py:

This section will allow us to authenticate and access BigQuery to fetch our data:

from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "./your_key.json"
credentials = service_account.Credentials.from_service_account_file(
    key_path, scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

client = bigquery.Client(credentials=credentials)

Note that key_path = “./your_key.json” must be changed to the name of the .json service account key you downloaded earlier. To import the key from your computer’s downloads folder into the Cloud Shell Editor, simply drag and drop the file into your browser window:

[Image: Service account key now in the Cloud Shell project.]

The next section contains the query for our desired data:

query = """
    SELECT name, SUM(number) as total_people
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name, state
    ORDER BY total_people DESC
    LIMIT 20
    """
query_job = client.query(query)

The remaining code builds the two data columns and sorts them according to the alphabetical order of the ‘Names’ column:

counts = []
names = []
for row in query_job:
    names.append(row["name"])
    counts.append(row["total_people"])

# Put names and name counts in a DataFrame and sort alphabetically,
# to simulate operating on the data with a model.
results = {'Names': names, 'Name Counts': counts}
df = pd.DataFrame.from_dict(results)  # convert to DataFrame
df = df.sort_values(by=['Names'])     # sort alphabetically
df = df.to_dict(orient='list')        # convert to dictionary format

return df

The data must be returned in a JSON-compatible format for use with Flask-RESTful, which is why we convert the DataFrame back to a dictionary before returning it.
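To see why that conversion matters, here is a quick standalone sketch (with made-up values) showing that a raw DataFrame cannot be serialized to JSON, while the dict-of-lists form can:

import json

import pandas as pd

# A tiny, made-up frame standing in for the query results.
df = pd.DataFrame({'Names': ['Amy', 'Bob'], 'Name Counts': [2, 1]})

# A DataFrame is not directly JSON serializable...
try:
    json.dumps(df)
except TypeError as err:
    print(err)  # Object of type DataFrame is not JSON serializable

# ...but the dict-of-lists produced by to_dict(orient='list') is,
# which is what Flask-RESTful will serialize in the response.
print(json.dumps(df.to_dict(orient='list')))
# {"Names": ["Amy", "Bob"], "Name Counts": [2, 1]}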

Final Step — Deploy Container with Cloud Run!

Finally, we can deploy our service to the web. Deploying with Cloud Run will generate a link that will allow access to the result of our data transformation.

Once again, select ‘Cloud Code’ in the bottom left corner of the Cloud Shell Editor. This time, click ‘Deploy to Cloud Run’:

Follow the steps to create a service: choose a region, allow unauthenticated invocations, and pick a build environment (either local or with Cloud Build). When you are ready, click Deploy! You can click ‘Show Detailed Logs’ to watch the build and deployment steps take place.

When the container has finished building, click the provided link:

This link opens a new tab to our final result — the queried BQ data with sorting applied!

[Image: The queried data, sorted alphabetically, displayed in the browser.]
The two data columns have been sorted according to the alphabetical order of the ‘Names’ column! While this is a simple operation, the potential for more complex workflows is clear: instead of sorting the data, you could apply a trained machine learning model and return its predictions through the API. Adding a nicer design to the page would also help readability.
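As a quick usage example, a client could consume the deployed endpoint and pull the JSON straight back into a DataFrame. The URL below is a placeholder; substitute the link Cloud Run generated for you:

import pandas as pd
import requests

# Placeholder: replace with the URL generated by your Cloud Run deployment.
SERVICE_URL = "https://your-service-abc123-uc.a.run.app/"

response = requests.get(SERVICE_URL, timeout=30)
response.raise_for_status()

# The endpoint returns the dict-of-lists built in bq.py,
# which converts straight back into a DataFrame.
df = pd.DataFrame(response.json())
print(df.head())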

As a side note, it is not considered best practice to leave service account keys sitting in storage; GCP offers alternative authentication methods. Also, keep an eye on billing for GCP products: the public BQ datasets can be queried up to 1 TB for free, but it is worth deactivating projects you are not using long term.
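For instance, one such alternative (shown here only as a hedged sketch, not something used in this walkthrough) is to rely on Application Default Credentials: when running on Cloud Run, the google-cloud-bigquery client can pick up the service's runtime service account automatically, so no key file needs to be copied into the container:

from google.cloud import bigquery

# With Application Default Credentials there is no key file to manage:
# on Cloud Run, this resolves to the service's runtime service account,
# which must be granted the appropriate BigQuery roles.
client = bigquery.Client()

query_job = client.query("SELECT 1 AS ok")
print(list(query_job))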

I hope this article was of use to you. If you liked this story, please consider following me on Medium.

Find me on LinkedIn: https://www.linkedin.com/in/mark-garvey/