Showing posts with label authentication. Show all posts

Tuesday, December 13, 2016

Formatting text with the Google Slides API

NOTE: The code covered in this post is also available in a video walkthrough.

Introduction

If you know something about public speaking, you're aware that the most effective presentations are those which have more images and less text. As a developer of applications that auto-generate slide decks, this is even more critical as you must ensure that your code creates the most compelling presentations possible for your users.

This means that any text featured in those slide decks must be more impactful. To that end, it's important you know how to format any text you do have. That's the exact subject of today's post, showing you how to format text in a variety of ways using Python and the Google Slides API.

The API is fairly new, so if you're unfamiliar with it, check out the launch post and take a peek at the API overview page to acclimate yourself to it first. You can also read related posts (and videos) explaining how to replace text & images with the API or how to generate slides from spreadsheet data. If you're ready-to-go, let's move on!

Using the Google Slides API

The demo script requires creating a new slide deck so you need the read-write scope for Slides:
  • 'https://www.googleapis.com/auth/presentations': Read-write access to Slides and Slides presentation properties
If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering project setup and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, assume you have a service endpoint to the API and have assigned it to the SLIDES variable.

Create deck & set up new slide for text formatting

A new slide deck can be created with SLIDES.presentations().create()—or alternatively with the Google Drive API which we won't do here. We'll name it, "Slides text formatting DEMO" and save its ID along with the IDs of the title and subtitle textboxes on the auto-created title slide:
DATA = {'title': 'Slides text formatting DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']
The title slide only has two elements on it, the title and subtitle textboxes, returned in that order, hence why we grab them at indexes 0 and 1 respectively. Now that we have a deck, let's add a slide that has a single (largish) textbox. The slide layout with that characteristic that works best for our demo is the "main point" template:



While we're at it, let's also add the title & subtitle on the title slide. Here's the snippet that builds and executes all three requests:
print('** Create "main point" layout slide & add titles')
reqs = [
  {'createSlide':
     {'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'}}},
  {'insertText':
     {'objectId': titleID, 'text': 'Formatting text'}},
  {'insertText':
     {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
slideID = rsp[0]['createSlide']['objectId']
The requests are sent in the order you see above, and responses come back in the same order. We don't care much about the 'insertText' directives, but we do want to get the ID of the newly-created slide. In the array of 3 returned responses, that slideID comes first.
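If you'd rather not depend on the slide-creation request being first in the batch, here's a minimal sketch that scans the replies instead. It assumes replies for requests that return nothing (like 'insertText') come back as empty dicts; the sample payload and the created_slide_ids() name are made up for illustration:

```python
def created_slide_ids(replies):
    """Return the objectId of every slide created in a batchUpdate reply list."""
    return [r['createSlide']['objectId'] for r in replies if 'createSlide' in r]

# Hypothetical reply list shaped like the one described above:
# one createSlide reply followed by two (assumed empty) insertText replies.
sample_replies = [{'createSlide': {'objectId': 'slide_123'}}, {}, {}]
print(created_slide_ids(sample_replies))
```

With the real call, you would pass in the list stored in rsp and take the first item of the result.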

Why do we need the slide ID? Well, since we're going to be using the one textbox on that slide, the only way to get the ID of that textbox is by doing a presentations().pages().get() call to fetch all the objects on that slide. Since there's only one "page element," the textbox in question, we make that call and save the first (and only) object's ID:
print('** Fetch "main point" slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=slideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']
Armed with the textbox ID, we're ready to add our text and format it!
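Grabbing elements by index works for these simple layouts, but a more defensive option is to look them up by placeholder type. The sketch below assumes each placeholder shape carries a shape.placeholder.type entry, as I understand the API reference to describe; the helper name and sample elements are hypothetical:

```python
def placeholder_ids(page_elements):
    """Map placeholder type -> objectId for the shapes on one page."""
    ids = {}
    for element in page_elements:
        # Non-shape elements (images, tables, ...) have no 'shape' key.
        placeholder = element.get('shape', {}).get('placeholder')
        if placeholder and 'type' in placeholder:
            # Keep the first element seen for each placeholder type.
            ids.setdefault(placeholder['type'], element['objectId'])
    return ids

# Hypothetical elements resembling a title slide plus a stray image.
sample = [
    {'objectId': 'p1', 'shape': {'placeholder': {'type': 'CENTERED_TITLE'}}},
    {'objectId': 'p2', 'shape': {'placeholder': {'type': 'SUBTITLE'}}},
    {'objectId': 'p3', 'image': {}},
]
ids = placeholder_ids(sample)
print(ids)
```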

Formatting text

The last part of the script starts by inserting seven (short) paragraphs of text, then formats different parts of that text in a variety of ways. Take a look here, then we'll discuss below:
reqs = [
    # add 7 paragraphs
    {'insertText': {
        'text': 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3',
        'objectId': textboxID,
    }},
    # shrink text from 48pt ("main point" textbox default) to 32pt
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontSize': {'magnitude': 32, 'unit': 'PT'}},
        'textRange': {'type': 'ALL'},
        'fields': 'fontSize',
    }},
    # change word 1 in para 1 ("Bold") to bold
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'bold': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 0, 'endIndex': 4},
        'fields': 'bold',
    }},
    # change word 1 in para 2 ("Ital") to italics
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'italic': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 7, 'endIndex': 11},
        'fields': 'italic'
    }},
    # change word 1 in para 7 ("Mono") to Courier New
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontFamily': 'Courier New'},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 36, 'endIndex': 40},
        'fields': 'fontFamily'
    }},
    # bulletize everything
    {'createParagraphBullets': {
        'objectId': textboxID,
        'textRange': {'type': 'ALL'},
    }},
]
After the text is inserted, the first operation this code performs is to change the font size of all the text inserted ('ALL' means to format the entire text range) to 32 pt. The main point layout specifies a default font size of 48 pt, so this request shrinks the text so that everything fits and doesn't wrap. The 'fields' parameter specifies that only the 'fontSize' attribute is affected by this command, meaning leave others such as the font type, color, etc., alone.
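Since 'fields' has to name exactly the attributes you're setting, one way to keep the two in sync is to derive the mask from the style dict itself. This is only a sketch for flat, top-level style attributes like the ones in this post; style_request() is a hypothetical helper, not part of the client library:

```python
def style_request(object_id, style, text_range=None):
    """Build an updateTextStyle request whose field mask matches `style`."""
    return {'updateTextStyle': {
        'objectId': object_id,
        'style': style,
        'textRange': text_range or {'type': 'ALL'},
        # Comma-separated list of the top-level attributes being set.
        'fields': ','.join(sorted(style)),
    }}

req = style_request('textbox_1', {'fontSize': {'magnitude': 32, 'unit': 'PT'}})
print(req['updateTextStyle']['fields'])
```

The 'textbox_1' ID is a placeholder; in the script you would pass textboxID.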

The next request bolds the first word of the first paragraph. Instead of 'ALL', the exact range for the first word is given. (NOTE: the end index is excluded from the range, so that's why it must be 4 instead of 3, or you're going to lose one character.) In this case, it's the "Bold" word from the first paragraph, "Bold 1". Again, 'fields' is present to indicate that only the 'bold' attribute should be affected by this request while everything else is left alone. The next directive is nearly identical except for italicizing the first word ("Ital") of the second paragraph ("Ital 2").
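Counting characters by hand is error-prone, so here's a small sketch that computes a 'FIXED_RANGE' for the first occurrence of a word. It assumes plain ASCII text like the demo string; as far as I know the API counts indexes in UTF-16 code units, so text with emoji or other non-BMP characters would need extra care. The word_range() helper is hypothetical:

```python
TEXT = 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3'

def word_range(text, word):
    """Return a FIXED_RANGE covering the first occurrence of `word`."""
    start = text.index(word)  # raises ValueError if the word is missing
    # endIndex is exclusive, hence start + len(word).
    return {'type': 'FIXED_RANGE',
            'startIndex': start,
            'endIndex': start + len(word)}

for word in ('Bold', 'Ital', 'Mono'):
    print(word, word_range(TEXT, word))
```

The ranges it produces for "Bold", "Ital", and "Mono" match the hardcoded ones in the requests above.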

After this we have a text style request to alter the font of the first word ("Mono") in the last paragraph ("Mono 3") to Courier New. The only other differences are that the style value is a font name rather than a Boolean flag and that 'fields' is now 'fontFamily'. Finally, bulletize all paragraphs. Another call to SLIDES.presentations().batchUpdate() and we're done.

Conclusion

If you run the script, you should get output that looks something like this, with each print() representing execution of key parts of the application:
$ python3 slides_format_text.py 
** Create new slide deck
** Create "main point" layout slide & add titles
** Fetch "main point" slide title (textbox) ID
** Insert text & perform various formatting operations
DONE
When the script has completed, you should have a new presentation with these slides:




Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!)—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/presentations',
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
SLIDES = discovery.build('slides', 'v1', http=creds.authorize(Http()))

print('** Create new slide deck')
DATA = {'title': 'Slides text formatting DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

print('** Create "main point" layout slide & add titles')
reqs = [
    {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'}}},
    {'insertText': {'objectId': titleID, 'text': 'Formatting text'}},
    {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
slideID = rsp[0]['createSlide']['objectId']

print('** Fetch "main point" slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=slideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']

print('** Insert text & perform various formatting operations')
reqs = [
    # add 7 paragraphs
    {'insertText': {
        'text': 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3',
        'objectId': textboxID,
    }},
    # shrink text from 48pt ("main point" textbox default) to 32pt
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontSize': {'magnitude': 32, 'unit': 'PT'}},
        'textRange': {'type': 'ALL'},
        'fields': 'fontSize',
    }},
    # change word 1 in para 1 ("Bold") to bold
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'bold': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 0, 'endIndex': 4},
        'fields': 'bold',
    }},
    # change word 1 in para 2 ("Ital") to italics
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'italic': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 7, 'endIndex': 11},
        'fields': 'italic'
    }},
    # change word 1 in para 7 ("Mono") to Courier New
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontFamily': 'Courier New'},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 36, 'endIndex': 40},
        'fields': 'fontFamily'
    }},
    # bulletize everything
    {'createParagraphBullets': {
        'objectId': textboxID,
        'textRange': {'type': 'ALL'},
    }},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API or integrate it into other apps for your own needs, whether that's a mobile frontend, a sysadmin script, or a server-side backend!

Wednesday, November 9, 2016

Replacing text & images with the Google Slides API with Python

NOTE: The code covered in this post is also available in a video walkthrough; however, the code here differs slightly, featuring some minor improvements to the code in the video.

Introduction

One of the critical things developers have not been able to do previously was access Google Slides presentations programmatically. To address this "shortfall," the Slides team pre-announced their first API a few months ago at Google I/O 2016—also see full announcement video (40+ mins). In early November, the G Suite product team officially launched the API, finally giving all developers access to build or edit Slides presentations from their applications.

In this post, I'll walk through a simple example featuring an existing Slides presentation template with a single slide. On this slide are placeholders for a presentation name and company logo, as illustrated below:

One of the obvious use cases that will come to mind is to take a presentation template replete with "variables" and placeholders, and auto-generate decks from the same source but created with different data for different customers. For example, here's what a "completed" slide would look like after the proxies have been replaced with "real data:"

Using the Google Slides API

We need to edit/write into a Google Slides presentation, meaning the read-write scope from all Slides API scopes below:
  • 'https://www.googleapis.com/auth/presentations': Read-write access to Slides and Slides presentation properties
  • 'https://www.googleapis.com/auth/presentations.readonly': View-only access to Slides presentations and properties
  • 'https://www.googleapis.com/auth/drive': Full access to users' files on Google Drive
Why is the Google Drive API scope listed above? Well, think of it this way: APIs like the Google Sheets and Slides APIs were created to perform spreadsheet and presentation operations. However, importing/exporting, copying, and sharing are all file-based operations, which is where the Drive API fits in. If you need a review of its scopes, check out the Drive auth scopes page in the docs. Copying a file requires the full Drive API scope, which is why it's listed above. If you're not going to copy any files and are only performing actions with the Slides API, you can of course leave it out.

Since we've fully covered the authorization boilerplate in earlier posts and videos, we're going to skip that here and jump right to the action.

Getting started

What are we doing in today's code sample? We start with a slide template file that has "variables" or placeholders for a title and an image. The application code will then replace these proxies with the actual desired text and image, with the goal being that this scaffolding will allow you to automatically generate multiple slide decks but "tweaked" with "real" data that gets substituted into each slide deck.

The title slide template file is TMPLFILE, and the image we're using as the company logo is the Google Slides product icon, stored in my Google Drive under the filename held in the IMG_FILE variable. Be sure to use your own image and template files! These definitions, plus the scopes used in this script, look like this:
IMG_FILE = 'google-slides.png'     # use your own!
TMPLFILE = 'title slide template'  # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
Skipping past most of the OAuth2 boilerplate, let's move ahead to creating the API service endpoints. The Drive API name is (of course) 'drive', currently at 'v3', while the Slides API is 'slides' at 'v1'. The following code creates a signed HTTP client, then shares it between a pair of calls to the apiclient.discovery.build() function to create the API service endpoints:
HTTP = creds.authorize(Http())
DRIVE =  discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

Copy template file

The first step of the "real" app is to find and copy the template file TMPLFILE. To do this, we'll use DRIVE.files().list() to query for the file, then grab the first match found. Then we'll use DRIVE.files().copy() to copy the file and name it 'Google Slides API template DEMO':
rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')
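The query above works for the demo filename, but it would break on a name containing a single quote, and indexing [0] raises an IndexError when nothing matches. Here's a hedged sketch of two small helpers (both names are my own, not part of the Drive client) that escape the name and fail with a clearer message:

```python
def name_query(name):
    """Build a Drive `q` string matching an exact filename."""
    # Escape backslashes first, then single quotes, per Drive query syntax.
    escaped = name.replace('\\', '\\\\').replace("'", "\\'")
    return "name='%s'" % escaped

def first_file(files, name):
    """Return the first match or raise a readable error if there is none."""
    if not files:
        raise LookupError('no Drive file named %r' % name)
    return files[0]

print(name_query('title slide template'))
print(name_query("Bob's deck"))
```

In the script, you would pass name_query(TMPLFILE) as the q argument and hand the returned 'files' list to first_file().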

Find image placeholder

Next, we'll ask the Slides API to get the data on the first (and only) slide in the deck. Specifically, we want the dimensions of the image placeholder. Later on, we will use those properties when replacing it with the company logo, so that it will be automatically resized and centered into the same spot as the image placeholder.
The SLIDES.presentations().get() method is used to read the presentation metadata. Returned is a payload consisting of everything in the presentation: the masters, layouts, and of course, the slides themselves. We only care about the slides, so we get that from the payload. And since there's only one slide, we grab it at index 0. Once we have the slide, we loop through all of the elements on that page and stop when we find the rectangle (image placeholder):
print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID
       ).execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break
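One caveat with the loop above: it assumes every page element is a shape, and if no rectangle exists, obj ends up pointing at the last element anyway. If your own template mixes in images or other element types, a more forgiving search could look like this sketch (find_shape() and the sample elements are hypothetical):

```python
def find_shape(page_elements, shape_type):
    """Return the first element whose shapeType matches, else None."""
    for element in page_elements:
        # .get() avoids a KeyError on elements that are not shapes.
        if element.get('shape', {}).get('shapeType') == shape_type:
            return element
    return None

# Hypothetical page: an image, a textbox, and the rectangle we want.
elements = [
    {'objectId': 'img1', 'image': {}},
    {'objectId': 'txt1', 'shape': {'shapeType': 'TEXT_BOX'}},
    {'objectId': 'rect1', 'shape': {'shapeType': 'RECTANGLE'}},
]
print(find_shape(elements, 'RECTANGLE'))
```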

Find image file

At this point, the obj variable points to that rectangle. What are we going to replace it with? The company logo, which we now query for using the Drive API:
print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token) 
The query code is similar to when we searched for the template file earlier. The trickiest thing about this snippet is that we need a full URL that points directly to the company logo. We use the DRIVE.files().get_media() method to create that request but don't execute it. Instead, we dig inside the request object itself and grab the file's URI and merge it with the current access token so what we're left with is a valid URL that the Slides API can use to read the image file and create it in the presentation.
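The string formatting above relies on the request URI already having a query string (so that "&" is the right separator). If you'd like something that doesn't depend on that, here's a sketch using the Standard Library's URL tools. It's Python 3 only as written, the helper name is mine, and the example.com URL and token are stand-ins:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_access_token(uri, token):
    """Append an access_token query parameter to a URI."""
    parts = urlsplit(uri)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(('access_token', token))
    # Rebuild the URI with the extended (and properly encoded) query string.
    return urlunsplit(parts._replace(query=urlencode(query)))

print(with_access_token('https://example.com/files/abc?alt=media', 'TOKEN'))
```

Keep in mind that a URL carrying an access token is sensitive for as long as the token is valid, so avoid logging or sharing it.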

Replace text and image

Back to the Slides API for the final steps: replace the title (text variable) with the desired text, add the company logo with the same size and transform as the image placeholder, and delete the image placeholder as it's no longer needed:
print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
Once all the requests have been created, send them to the Slides API then let the user know everything is done.
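Since the whole point of a template is to reuse it with different data, you may want to generate the 'replaceAllText' requests from a dictionary rather than writing each one out. A minimal sketch, assuming your template uses the same {{NAME}}-style variables as this post (the helper name and the second variable are made up):

```python
def text_replacements(values):
    """Turn {'NAME': 'text'} into a list of replaceAllText requests."""
    return [
        {'replaceAllText': {
            'containsText': {'text': '{{%s}}' % key},
            'replaceText': value,
        }}
        # Sorted so the request order is predictable.
        for key, value in sorted(values.items())
    ]

reqs = text_replacements({'NAME': 'Hello World!', 'DATE': 'Nov 2016'})
for req in reqs:
    print(req)
```

The resulting list can be extended with the 'createImage' and 'deleteObject' requests before sending everything in one batchUpdate() call.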

Conclusion

That's the entire script, just under 60 lines of code. If you watched the video, you may notice a few minor differences in the code. One is use of the fields parameter in the Slides API calls. They represent the use of field masks, which is a separate topic on its own. As you're learning the API now, it may cause unnecessary confusion, so it's okay to disregard them for now. The other difference is an improvement in the replaceAllText request—the old way in the video is now deprecated, so go with what we've replaced it with in this post.

If your template slide deck and image are in your Google Drive, and you've modified the filenames and run the script, you should get output that looks something like this:
$ python3 slides_template.py
** Copying template 'title slide template' as 'Google Slides API template DEMO'
** Get slide objects, search for image placeholder
** Searching for icon file
 - Found image 'google-slides.png'
** Replacing placeholder text and icon
DONE
Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be:
  • Get creds & build API service endpoints
  • Copy template file
  • Get image placeholder size & transform (for replacement image later)
  • Get secure URL for company logo
  • Build and send Slides API requests to...
    • Replace slide title variable with "Hello World!"
    • Create image with secure URL using placeholder size & transform
    • Delete image placeholder
Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

IMG_FILE = 'google-slides.png'      # use your own!
TMPLFILE = 'title slide template'   # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
DRIVE  = discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID,
        fields='slides').execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API or integrate it into other apps for your own needs, whether that's a mobile frontend, a sysadmin script, or a server-side backend!

Code challenge

Add more slides and/or text variables and modify the script to replace them too. EXTRA CREDIT: Change the image-based image placeholder to a text-based image placeholder, say a textbox with the text, "{{COMPANY_LOGO}}" and use the replaceAllShapesWithImage request to perform the image replacement. By making this one change, your code should be simplified from the image-based image replacement solution we used in this post.
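If you want a starting point for the extra credit, here's one possible shape for that request. Treat it as a sketch: the field names follow my reading of the Slides API reference ('imageUrl', 'replaceMethod', 'containsText'), the helper name is my own, and the example URL is a stand-in for the secure image URL built earlier:

```python
def logo_request(image_url, tag='{{COMPANY_LOGO}}'):
    """Build a request replacing every shape containing `tag` with an image."""
    return {'replaceAllShapesWithImage': {
        'imageUrl': image_url,
        # Scale and center the image inside the bounds of the replaced shape.
        'replaceMethod': 'CENTER_INSIDE',
        'containsText': {'text': tag, 'matchCase': True},
    }}

req = logo_request('https://example.com/logo.png')
print(req)
```

Because the image takes over the shape's position, the separate size/transform lookup and the 'deleteObject' request should no longer be needed.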

Tuesday, April 7, 2015

Google APIs: migrating from tools.run() to tools.run_flow()

Got AttributeError? As in: AttributeError: 'module' object has no attribute 'run'? Rename run() to run_flow(), and you'll be good-to-go. TL;DR: This mini-tutorial slash migration guide slash PSA (public service announcement) is aimed at Python developers who use the Google APIs Client Library (to access Google APIs from their applications), currently call oauth2client.tools.run(), are likely getting an exception (see Jan 2016 update below), and need to switch to oauth2client.tools.run_flow(), its replacement.

UPDATE (Aug 2016): The flags parameter in the run_flow() function became optional in Feb 2016, so I tweaked the blogpost to reflect that.

UPDATE (Jun 2016): Revised the code and cleaned up the dialog so there are no longer any instances of using run() function, significantly shortening this post.

UPDATE (Jan 2016): The tools.run() function itself was forcibly removed (without a fallback) in Aug 2015, so if you're using any release on or after that, any such calls from your code will throw an exception (AttributeError: 'module' object has no attribute 'run'). To fix this problem, continue reading.

Prelude

We're going to continue our look at accessing Google APIs from Python. In addition to the previous pair of posts (http://goo.gl/57Gufk and http://goo.gl/cdm3kZ), as part of my day job, I've been working on corresponding video content, some of which are tied specifically to posts on this blog.

In this follow-up, we're going to specifically address the sidebar in the previous post, where we bookmarked an item for future discussion where the future is now: in the oauth2client package, tools.run() has been deprecated in favor of tools.run_flow(). Note you need at least Python 2.7 or 3.3 to use the Google APIs Client Library. (If you didn't even know Python 3 was supported at all, then you need to see this post and this Quora Q&A.)

Replacing tools.run() with tools.run_flow()

Now let's convert the authorized access to Google APIs code from using tools.run() to tools.run_flow(). Here is the old snippet I'm talking about that needs upgrading:
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run(flow, store)

SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))
If you're using the latest Client Library (as of Feb 2016), all you need to do is change the tools.run() call to tools.run_flow(), as italicized below. Everything else stays exactly the same:
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
If you don't have the latest Client Library, then your update involves the extra steps of adding lines that import argparse and using it to get the flags argument needed by tools.run_flow() plus the actual change from tools.run(); all updates italicized below:
import argparse

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = # one or more scopes (str or iterable)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store, flags)

SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))

Command-line argument processing, or "Why argparse?"

Python has had several modules in the Standard Library that allow developers to process command-line arguments. The original one was getopt which mirrored the getopt() function from C. In Python 2.3, optparse was introduced, featuring more powerful processing capabilities. However, it was deprecated in 2.7 in favor of a similar module, argparse. (To find out more about their similarities, differences and rationale behind developing argparse, see PEP 389 and this argparse docs page.) For the purposes of using Google APIs, you're all set if using Python 2.7 as it's included in the Standard Library. Otherwise, Python 2.3-2.6 users can install it with: "pip install -U argparse".
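If the parents=[...] idiom is new to you, this self-contained sketch shows what it does: the child parser inherits every flag defined by the parent. To keep it runnable without oauth2client installed, I've swapped in a stand-in for tools.argparser that defines only one of its flags; the '--title' option is a hypothetical flag for your own app:

```python
import argparse

# Stand-in for oauth2client's tools.argparser (assumption: one flag only).
auth_parent = argparse.ArgumentParser(add_help=False)
auth_parent.add_argument('--noauth_local_webserver', action='store_true')

# Your own parser inherits the parent's flags and can add more.
parser = argparse.ArgumentParser(parents=[auth_parent])
parser.add_argument('--title', default='demo deck')  # hypothetical app flag

# Pass an explicit list here; parse_args() with no arguments reads sys.argv.
flags = parser.parse_args(['--title', 'Q3 review'])
print(flags.title, flags.noauth_local_webserver)
```

One practical consequence: if your script already has its own command-line options, build a single parser this way so both sets of flags are parsed together.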

Regardless of whether you need argparse, once you migrate to either snippet with tools.run_flow(), your application should go back to working the way it had before.

Thursday, November 6, 2014

Authorized Google API access from Python (part 2 of 2)

Listing your files with the Google Drive API

NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.

UPDATE (Mar 2020): You can build this application line-by-line with our codelab (self-paced, hands-on tutorial) introducing developers to G Suite APIs. The deprecated auth library comment from the previous update below is spelled out in more detail in the green sidebar towards the bottom of step/module 5 (Install the Google APIs Client Library for Python). Also, the code sample is now maintained in a GitHub repo which includes a port to the newer auth libraries so you have both versions to refer to.

UPDATE (Apr 2019): In order to have a closer relationship between the GCP and G Suite worlds of Google Cloud, all G Suite Python code samples have been updated, replacing some of the older G Suite API client libraries with their equivalents from GCP. NOTE: using the newer libraries requires more initial code/effort from the developer thus will seem "less Pythonic." However, we will leave the code sample here with the original client libraries (deprecated but not shutdown yet) to be consistent with the video.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

UPDATE (Jun 2016): Updated to Python 2.7 & 3.3+ and Drive API v3.

Introduction

In this final installment of a (currently) two-part series introducing Python developers to building on Google APIs, we'll extend from the simple API example from the first post (part 1) just over a month ago. Those first snippets showed some skeleton code and a short real working sample that demonstrate accessing a public (Google) API with an API key (that queried public Google+ posts). An API key, however, does not grant applications access to authorized data.

Authorized data, including user information such as personal files on Google Drive and YouTube playlists, requires additional security steps before access is granted. Sharing of and hardcoding credentials such as usernames and passwords is not only insecure, it's also a thing of the past. A more modern approach leverages token exchange, authenticated API calls, and standards such as OAuth2.

In this post, we'll demonstrate how to use Python to access authorized Google APIs using OAuth2, specifically listing the files (and folders) in your Google Drive. In order to better understand the example, we strongly recommend you check out the OAuth2 guides (general OAuth2 info, OAuth2 as it relates to Python and its client library) in the documentation to get started.

The docs describe the OAuth2 flow: making a request for authorized access, having the user grant access to your app, and obtaining a(n access) token with which to sign and make authorized API calls. The steps you need to take to get started begin nearly the same way as for simple API access. The process diverges when you arrive on the Credentials page when following the steps below.

Google API access

To get authorized access to Google APIs, follow these instructions (the first three of which are roughly the same as for simple API access):
  • Go to the Google Developers Console and login.
    • Use your Gmail or Google credentials; create an account if needed
  • Click "Create a Project" from pulldown under your username (at top)
    • Enter a Project Name (mutable, human-friendly string only used in the console)
    • Enter a Project ID (immutable, must be unique and not already taken)
  • Once project has been created, enable APIs you wish to use
  • Select "Credentials" in left-nav
    • Click "Create credentials" and select OAuth client ID
    • In the new dialog, select your application type — we're building a command-line script which is an "Installed application"
    • In the bottom part of that same dialog, specify the type of installed application; choose "Other" (cmd-line scripts are not web nor mobile)
    • Click "Create Client ID" to generate your credentials
  • Finally, click "Download JSON" to save the new credentials to your computer... perhaps choose a shorter name like "client_secret.json" or "client_id.json"
NOTE: Instructions from the previous blogpost were to get an API key. This time, in the steps above, we're creating and downloading OAuth2 credentials. You can also watch a video walkthrough of this app setup process of getting simple or authorized access credentials in the "DevConsole" here.

    Accessing Google APIs from Python

    In order to access authorized Google APIs from Python, you still need the Google APIs Client Library for Python, so in this case, do follow those installation instructions from part 1.

    We will again use googleapiclient.discovery.build(), which is required to create a service endpoint for interacting with an API, authorized or otherwise. However, for authorized data access, we need additional resources, namely the httplib2 and oauth2client packages. Here are the first five lines of the new boilerplate code for authorized access:

    from __future__ import print_function
    
    from googleapiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = # one or more scopes (strings)
    
    SCOPES is a critical variable: it represents the set of scopes of authorization an app wants to obtain (then access) on behalf of user(s). What does a scope look like?

    Each scope is a single string, specifically a URL. Here are some examples:
    • 'https://www.googleapis.com/auth/plus.me' — access your personal Google+ settings
    • 'https://www.googleapis.com/auth/drive.metadata.readonly' — read-only access to your Google Drive file or folder metadata
    • 'https://www.googleapis.com/auth/youtube' — access your YouTube playlists and other personal information
    You can request one or more scopes, given as a single space-delimited string of scopes or an iterable (list, generator expression, etc.) of strings.  If you were writing an app that accesses both your YouTube playlists as well as your Google+ profile information, your SCOPES variable could be either of the following:
    SCOPES = 'https://www.googleapis.com/auth/plus.me https://www.googleapis.com/auth/youtube'

    That is a single space-delimited string, which can get long enough to wrap in a regular-sized browser window; or it could be an easier-to-read, non-wrapped tuple:

    SCOPES = (
        'https://www.googleapis.com/auth/plus.me',
        'https://www.googleapis.com/auth/youtube',
    )
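Since both forms are accepted, it can help to see that they carry the same information. The following is only a minimal sketch: normalize_scopes() and scopes_to_string() are hypothetical helper names made up for this illustration, not part of any Google library, and the client library handles either form for you.

```python
def normalize_scopes(scopes):
    '''Return scopes as a tuple of strings, whichever form was given.'''
    if isinstance(scopes, str):
        # A space-delimited string: split it into individual scope URLs.
        return tuple(scopes.split())
    # Any other iterable (list, tuple, generator expression, ...).
    return tuple(scopes)


def scopes_to_string(scopes):
    '''Return scopes as one space-delimited string.'''
    return ' '.join(normalize_scopes(scopes))


AS_STRING = ('https://www.googleapis.com/auth/plus.me '
             'https://www.googleapis.com/auth/youtube')
AS_TUPLE = (
    'https://www.googleapis.com/auth/plus.me',
    'https://www.googleapis.com/auth/youtube',
)

# Both spellings describe the same two scopes.
SAME = normalize_scopes(AS_STRING) == normalize_scopes(AS_TUPLE)
```

Which form you pick is a matter of readability; the tuple tends to be easier to maintain once you request more than one scope.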

    Our example command-line script will just list the files on your Google Drive, so we only need the read-only Drive metadata scope, meaning our SCOPES variable will be just this:
    SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
    The next section of boilerplate represents the security code:
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)
    
    Once the user has authorized access to their personal data by your app, a special "access token" is given to your app. This precious resource must be stored somewhere local for the app to use. In our case, we'll store it in a file called "storage.json". The lines setting the store and creds variables are attempting to get a valid access token with which to make an authorized API call.

    If the credentials are missing or invalid, such as being expired, the authorization flow (using the client secret you downloaded along with a set of requested scopes) must be created (by client.flow_from_clientsecrets()) and executed (by tools.run_flow()) to ensure possession of valid credentials. The client_secret.json file is the credentials file you saved when you clicked "Download JSON" from the DevConsole after you've created your OAuth2 client ID.
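If the flow fails right away, a common culprit is the downloaded credentials file itself. Here is a small, hedged sketch of a sanity check you could run first. It assumes the file has a top-level "installed" (or "web") section containing the keys listed below, which is how the downloads I've seen are laid out; check_client_secret() is a hypothetical helper for illustration, not something the client library provides.

```python
import json

# Assumed layout: one top-level section named after the client type.
CLIENT_TYPES = ('installed', 'web')
# Keys this sketch expects inside that section (an assumption).
REQUIRED_KEYS = ('client_id', 'client_secret', 'auth_uri', 'token_uri')


def check_client_secret(path='client_secret.json'):
    '''Return the client type found in the file, or raise ValueError.'''
    with open(path) as f:
        data = json.load(f)
    for client_type in CLIENT_TYPES:
        if client_type in data:
            section = data[client_type]
            missing = [k for k in REQUIRED_KEYS if k not in section]
            if missing:
                raise ValueError('missing keys: ' + ', '.join(missing))
            return client_type
    raise ValueError('no "installed" or "web" section in ' + path)
```

For the command-line script in this post, you would expect check_client_secret() to return 'installed'.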

    If you don't have credentials at all, the user must explicitly grant permission. I'm sure you've all seen the OAuth2 dialog describing the type of access an app is requesting (remember those scopes?). Once the user clicks "Accept" to grant permission, a valid access token is returned and saved into the storage file (because you passed a handle to it when you called tools.run_flow()).

    Note: tools.run() deprecated in favor of tools.run_flow()
    You may have seen usage of the older tools.run() function, but it has been deprecated in favor of tools.run_flow(). We explain this in more detail in another blogpost specifically geared towards migration.

    Once the user grants access and valid credentials are saved, you can create one or more endpoints to the secure service(s) desired with googleapiclient.discovery.build(), just like with simple API access. Its call will look slightly different, mainly that you need to sign your HTTP requests with your credentials rather than passing an API key:

    DRIVE = discovery.build(API, VERSION, http=creds.authorize(Http()))

    In our example, we're going to list your files and folders in your Google Drive, so for API, use the string 'drive'. The API is currently on version 3 so use 'v3' for VERSION:

    DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

    If you want to get comfortable with OAuth2, what its flow is and how it works, we recommend that you experiment at the OAuth Playground. There you can choose from any number of APIs to access and experience first-hand how your app must be authorized to access personal data.

    Going back to our working example, once you have an established service endpoint, you can use the list() method of the files service to request the file data:

    files = DRIVE.files().list().execute().get('files', [])

    If there's any data to read, the response dict will contain an iterable of files that we can loop over (or default to an empty list so the loop doesn't fail), displaying file names and types:

    for f in files:
        print(f['name'], f['mimeType'])
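One thing to keep in mind: as far as I know, a single files().list() call returns only one page of results, so a large Drive won't be listed in full by the two lines above. Below is a minimal sketch of following the nextPageToken in each response until it runs out. The iter_files() name and its page_size argument are hypothetical, introduced only for this illustration; pageSize, pageToken and fields are parameters of files().list() in the v3 API.

```python
def iter_files(drive, page_size=100):
    '''Yield every file dict, following nextPageToken across pages.

    `drive` is a Drive v3 service endpoint, such as the DRIVE
    object created with discovery.build() in this post.
    '''
    token = None
    while True:
        resp = drive.files().list(
            pageSize=page_size,
            pageToken=token,   # None on the first request
            fields='nextPageToken, files(name, mimeType)',
        ).execute()
        for f in resp.get('files', []):
            yield f
        token = resp.get('nextPageToken')
        if not token:          # no more pages
            break
```

With this in place, the display loop would become "for f in iter_files(DRIVE):" with the same print() call in its body.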

    Conclusion

    To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for files().list(). For more information on what other operations you can execute with the Google Drive API, take a look at the reference docs and check out the companion video for this code sample. Don't forget the codelab and this sample's GitHub repo. That's it!

    Below is the entire script for your convenience:
    '''
    drive_list.py -- Google Drive API demo; maintained at:
        http://github.com/googlecodelabs/gsuite-apis-intro
    '''
    from __future__ import print_function
    
    from googleapiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)
    
    DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
    files = DRIVE.files().list().execute().get('files', [])
    for f in files:
        print(f['name'], f['mimeType'])
    
    When you run it, you should see pretty much what you'd expect, a list of file or folder names followed by their MIMEtypes — I named my script drive_list.py:
    $ python3 drive_list.py
    Google Maps demo application/vnd.google-apps.spreadsheet
    Overview of Google APIs - Sep 2014 application/vnd.google-apps.presentation
    tiresResearch.xls application/vnd.google-apps.spreadsheet
    6451_Core_Python_Schedule.doc application/vnd.google-apps.document
    out1.txt application/vnd.google-apps.document
    tiresResearch.xls application/vnd.ms-excel
    6451_Core_Python_Schedule.doc application/msword
    out1.txt text/plain
    Maps and Sheets demo application/vnd.google-apps.spreadsheet
    ProtoRPC Getting Started Guide application/vnd.google-apps.document
    gtaskqueue-1.0.2_public.tar.gz application/x-gzip
    Pull Queues application/vnd.google-apps.folder
    gtaskqueue-1.0.1_public.tar.gz application/x-gzip
    appengine-java-sdk.zip application/zip
    taskqueue.py text/x-python-script
    Google Apps Security Whitepaper 06/10/2010.pdf application/pdf
    
    Obviously your output will be different, depending on what files are in your Google Drive. But that's it... hope this is useful. You can now customize this code for your own needs and/or to access other Google APIs. Thanks for reading!

    EXTRA CREDIT: To test your skills, add functionality to this code that also displays the last modified timestamp, the file (byte)size, and perhaps shave the MIMEtype a bit as it's slightly harder to read in its entirety... perhaps take just the final path element? One last challenge: in the output above, we have both Microsoft Office documents as well as their auto-converted versions for Google Apps... perhaps only show the filename once and have a double-entry for the filetypes!
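If you'd like a nudge on the last two parts of that challenge, here is one possible starting point. It's just a sketch, not the only way to do it: short_type() and group_by_name() are hypothetical helper names, and the input is assumed to be the same list of dicts (with 'name' and 'mimeType' keys) that the script above prints.

```python
from collections import OrderedDict


def short_type(mime_type):
    '''Keep only the final path element, e.g. 'application/pdf' -> 'pdf'.'''
    return mime_type.rsplit('/', 1)[-1]


def group_by_name(files):
    '''Map each file name to the list of shortened types seen for it.'''
    grouped = OrderedDict()   # keeps first-seen order of names
    for f in files:
        grouped.setdefault(f['name'], []).append(short_type(f['mimeType']))
    return grouped


def show(files):
    '''Print each name once, followed by all of its types.'''
    for name, types in group_by_name(files).items():
        print(name, ', '.join(types))
```

The timestamp and byte size are left to you; my understanding is that they'd need to be requested explicitly through the fields parameter of files().list() before they show up in each file dict.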