Editor’s note: This is a guest post by Martin Böhringer, Co-Founder and CEO of Hojoki. -- Steve Bazyl
Hojoki integrates productivity cloud apps into one newsfeed and enables sharing and discussions on top of the feed. We’ve integrated 17 apps so far, and counting, so it’s safe to say that we’re API addicts. Now it’s time to share what we’ve learned about the Google Apps APIs!
Our initial reason for building Hojoki was the fragmentation we experienced across all of our cloud apps. And all those emails. Still, there was this feeling of “I don’t know what’s going on” in our distributed teamwork. So we decided to build something like a Google+ where streams are filled automatically by activities in the apps you use.
This leads to a comprehensive stream of everything that’s going on in your team combined with comments and microblogging. You can organize your stream into workspaces, which are basically places for discussions and collaboration with your team.
To build this, we first needed some kind of information about recent events. Since we wanted to aggregate similar activities, provide search and split the stream into workspaces, we also had to be able to relate events to unique objects like files, calendar entries and contacts.
Further, it’s crucial to know not only what has changed, but also who changed it. So providing unique identities is important for building federated feeds.
Google’s APIs share some basic architecture and structure, described in the Google Data Protocol. On top of that, application-specific APIs provide access to each application’s data. We use the Google Documents List, Google Calendar and Google Contacts APIs.
The basic call for Google Contacts, for example, looks like this:
https://www.google.com/m8/feeds/contacts/default/full
This responds with a complete list of your contacts. Once we have this list, all we have to do is ask for the delta against what we already know. For such use cases, Google’s APIs support query and sorting parameters, so we can set “orderby” to “lastmodified” and “updated-min” to the timestamp of our last call. This way we keep traffic low and get quick results by asking only for things we might have missed.
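For illustration, a delta request against the Contacts feed can be sketched with plain urllib2 (the timestamp and access token are placeholders; real code should handle OAuth and errors properly):

import urllib
import urllib2

# Ask only for contacts changed since our last poll, newest first.
params = urllib.urlencode({
    'orderby': 'lastmodified',
    'sortorder': 'descending',
    'updated-min': '2012-01-01T00:00:00Z',  # timestamp of our last call
})
request = urllib2.Request(
    'https://www.google.com/m8/feeds/contacts/default/full?' + params,
    headers={'Authorization': 'Bearer ACCESS_TOKEN',  # placeholder
             'GData-Version': '3.0'})
print urllib2.urlopen(request).read()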
If you want to develop using those APIs you should definitely have a look at the SDKs for them. We used the Google Docs SDK for an early prototype and loved it. Today, Hojoki uses its own generic connection handler for all our integrated systems so we don’t leverage the SDKs anymore.
If you’re into API development, you’ve probably already realized that our information needs don’t fit into many of the APIs out there. Most of the APIs are object centric. They can tell you what objects are included in a certain folder, but they can’t tell you which object in this folder has been changed recently. They just aren’t built with newsfeeds in mind.
Google Apps APIs support most of our information needs. Complete support of OAuth and very responsive APIs definitely make our lives easier.
However, the APIs are not built with Hojoki-like newsfeeds in mind. For example, ETags may change even when nothing has happened to an object, because of asynchronous processing on Google’s side (see Google’s comment on this). For us this means that, once we detect an altered ETag, in some cases we still have to check against our existing data whether there really was a relevant activity. Furthermore, we often have trouble with missing actors in our activities: for example, we know when somebody changed a calendar event, but there is currently no way to find out who it was.
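One way to keep that re-checking cheap is standard GData conditional retrieval, sketched here with placeholder names: send the last ETag we stored in an If-None-Match header, and only diff against our existing data when the server actually returns a changed entry.

import urllib2

def fetch_if_changed(url, last_etag, access_token):
    """Returns the entry body if its ETag changed, or None on 304."""
    request = urllib2.Request(url, headers={
        'Authorization': 'Bearer %s' % access_token,
        'GData-Version': '3.0',
        'If-None-Match': last_etag,  # ETag from our previous fetch
    })
    try:
        return urllib2.urlopen(request).read()
    except urllib2.HTTPError, e:
        if e.code == 304:  # not modified on the server
            return None
        raise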
Another issue is the classification of updates. Google’s APIs tell us that something changed with an object. But to build a nice newsfeed you also want to know what exactly has been changed. So you’re looking for a verb like created, updated, shared, commented, moved or deleted. While Hojoki calls itself an aggregator for activities, technically we’re primarily an activity detector.
You can think of Hojoki as a multi-layer platform. First of all, we build a complete overview of your metadata in each connected app. In Google Docs, this means files and collections: we retrieve the URI and name as well as some additional information (not the content itself). This information fills a graph-based data store (we use RDF, read more about it here).
At the moment, we subscribe to events in the integrated apps. Each detected event creates a changeset for the existing data graph. This changeset is an activity for our newsfeed and is related to the object representation it belongs to. This allows very flexible aggregation and filtering on the client side, as the following screenshot shows: you can filter the stream to a certain collection (“Analytics”), to the history of a single file, or to the Hojoki workspace the file was added to (“Hojoki Marketing”).
What’s really important in terms of such heavy API processing is to use asynchronous calls. We use the great Open Source project async-http-client for this task.
When I wrote that “we subscribe to events”, this is a very nice euphemism for “we poll every 30 seconds to see if something changed”. This is not really optimal, and we’d love to change it. If the Google Apps APIs supported a feed of user events modelled on a common standard like ActivityStrea.ms, combined with reliable ETags and maybe even a push API (e.g. webhooks), it would also make life easier for the many developers syncing local files with Google, and help reduce traffic on both sides.
Users of the Google Documents List API have traditionally had to perform an individual export operation for each document they want to export. This is quite inefficient, in terms of both time and bandwidth.
To improve latency for these operations, we have added the Archive Feed to the API. Archives allow users to export a large number of items at once, in a single ZIP archive, greatly increasing the efficiency of export operations. Additionally, users can receive emails about archives, and choose to download them from a link provided in the email.
This feature had a soft release earlier this year, and we think it’s now ready for prime time. For more information, please see the Archive Feed documentation.
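For illustration only, here is a rough sketch of what requesting an archive might look like at the protocol level. The feed URL and element names below are our reading of the Archive Feed documentation, not verified output; check the documentation before relying on them, and note that the resource ID and token are placeholders.

import urllib2

# Placeholder values: a real client supplies its own resource IDs and
# an OAuth access token.
ARCHIVE_FEED = 'https://docs.google.com/feeds/default/private/archive'
body = """<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'
       xmlns:docs='http://schemas.google.com/docs/2007'>
  <category scheme='http://schemas.google.com/g/2005#kind'
            term='http://schemas.google.com/docs/2007#archive'/>
  <docs:archiveResourceId>document:12345</docs:archiveResourceId>
</entry>"""
request = urllib2.Request(ARCHIVE_FEED, data=body, headers={
    'Authorization': 'Bearer ACCESS_TOKEN',
    'GData-Version': '3.0',
    'Content-Type': 'application/atom+xml',
})
print urllib2.urlopen(request).read()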
With the Google Documents List API, there are two ways to identify resources: typed and untyped resource identifiers. Typed resource identifiers prefix a string of characters with the resource type. Untyped resource identifiers are similar, but do not have a type prefix. For example:
drawing:0Aj01z0xcb9
0Aj01z0xcb9
Client applications often need one type of identifier or the other. For instance, some applications use untyped resource IDs to access spreadsheets using the Google Spreadsheets API. Automatically generated API URLs in the Documents List API use typed or untyped resource IDs in certain situations.
Having two types of resource IDs is something that we will resolve in a future version of the API. Meanwhile, we strongly recommend that instead of using resource identifiers, clients always use URLs provided in feeds and entries of the Google Documents List API. The only time that manual URL modification is required is to add special parameters to a URL given by the API, for instance to search for a resource by title.
For example, the API issues self links along with each entry. To request an entry again, simply GET the self link of the entry. We recommend against constructing the link manually, by inserting the entry’s resource ID into the link.
Common links on entries include the self link, the alternate (HTML) link, the edit link, and the edit-media link.
Accessing these links from a client library is simple. For instance, to retrieve the alternate link in Python, one uses:
resource = client.GetAllResources()[0]
print resource.GetHtmlLink()
More information on these links is available in the documentation. For any questions, please post in the forum.
import random
import time

def GetResourcesWithExponentialBackoff(client):
    """Gets all of the resources for the authorized user.

    Args:
      client: gdata.docs.client.DocsClient authorized for a user.

    Returns:
      gdata.docs.data.ResourceFeed representing the resources found,
      or None if the request never succeeded.
    """
    for n in range(0, 5):
        try:
            response = client.GetResources()
            return response
        except:
            # Back off 2^n seconds, plus up to a second of random jitter.
            time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
    print "There has been an error, the request never succeeded."
    return None
We are currently rolling out a change to the organization of existing resources in collections in Google Docs. This change is completely transparent to users of the Google Docs web user interface, but it is visible when using the Google Documents List API to make requests with the showroot=true query parameter or to query the contents of the root collection directly. In order to understand this change, first read how Google Docs organizes resources.
The change involves Google removing those resources from a user’s root collection that already exist within another collection accessible to the given user. That is, if “My Presentation” is currently in root and in the “My Talks” collection, after this change it will only exist in the “My Talks” collection.
We are making this change in order to make the organization of resources less confusing for API developers. This change allows clients to know that a resource either exists in root or in some collection under root. Clients can still retrieve all resources, regardless of which collections they’re in, using the resources feed.
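For instance, using the Python client library from the other examples in this series (a sketch; folder%3Aroot is the documented identifier of the root collection):

# Every resource the user can access, regardless of collection:
everything = client.GetAllResources()

# Every resource, with root membership reported explicitly:
feed = client.GetResources(show_root=True)

# Only what is organized directly under root:
root_contents = client.GetResources(
    uri='/feeds/default/private/full/folder%3Aroot/contents')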
The change is rolling out gradually to all Google Docs users over the next few months.
Developers with further questions about this change should post in the Google Documents List API forum.
Google Docs supports sharing collections and their contents with others. This allows multiple Google Docs resources to be shared at once, and for additional resources added to the collection later to be automatically shared.
Class.io, an EDU application on the Google Apps Marketplace, uses this technique. When a professor creates a new course, the application automatically creates a Google Docs collection for that course and shares it with all the students. This gives the students and professor a single place to go in Google Docs to access and manage all of their course files.
A collection is a Google Docs resource that contains other resources, typically behaving like a folder on a file system.
A collection resource is created by making an HTTP POST to the feed link with the category element’s term set to http://schemas.google.com/docs/2007#folder, for example:
<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <category scheme="http://schemas.google.com/g/2005#kind"
      term="http://schemas.google.com/docs/2007#folder"/>
  <title>Example Collection</title>
</entry>
To achieve the same thing using the Python client library, use the following code:
from gdata.docs.data import Resource

collection = Resource('folder')
collection.title.text = 'Example Collection'

# client is an authorized client
collection = client.create_resource(collection)
The new collection returned has a content element indicating the URL to use to add new resources to the collection. Resources are added by making HTTP POST requests to this URL.
<content src="https://docs.google.com/feeds/default/private/full/folder%3A134acd/contents" type="application/atom+xml;type=feed" />
This process is simplified in the client libraries. For example, in the Python client library, resources can be added to the new collection by passing the collection into the create_resource method for creating resources, or the move_resource method for moving an existing resource into the collection, like so:
# Create a new resource of document type in the collection
new_resource = Resource(type='document', title='New Document')
client.create_resource(new_resource, collection=collection)

# Move an existing resource
client.move_resource(existing_resource, collection=collection)
Once resources have been added to the collection, the collection can be shared using ACL entries. For example, to add the user user@example.com as a writer to the collection and every resource in the collection, the client creates and adds the ACL entry like so:
from gdata.acl.data import AclScope, AclRole
from gdata.docs.data import AclEntry

acl = AclEntry(
    scope=AclScope(value='user@example.com', type='user'),
    role=AclRole(value='writer')
)

client.add_acl_entry(collection, acl)
The collection and its contents are now shared, and this can be verified in the Google Docs user interface.
Note: if the application is adding more than one ACL entry, it is recommended to use batching to combine multiple ACL entries into a single request. For more information on this best practice, see the latest blog post on the topic.
The examples shown here are using the raw protocol or the Python client library. The Java client library also supports managing and sharing collections.
For more information on how to use collections, see the Google Documents List API documentation. You can also find assistance in the Google Documents List API forum.
There are a number of ways to add resources to your Google Documents List using the API. Most commonly, clients need to upload an existing resource, rather than create a new, empty one. Legacy clients may be doing this in an inefficient way. In this post, we’ll walk through why using resumable uploads makes your client more efficient.
The resumable upload process allows your client to send small segments of an upload over time, and confirm that each segment arrived intact. This has a number of advantages.
Since only one small segment of data is sent to the API at a time, clients can store less data in memory as they send data to the API. For example, consider a client uploading a PDF via a regular, non-resumable upload in a single request. The client typically reads a fixed-size chunk of the PDF into memory, say 100,000 bytes, sends it, and repeats until the whole request body has been transmitted. But that 100,000 bytes isn’t a customizable value in most client libraries. In some environments with limited memory, applications need to choose a custom chunk size that is either smaller or larger.
The resumable upload mechanism allows for a custom chunk size. That means that if your application only has 500KB of memory available, you can safely choose a chunk size of 256KB.
In the previous example, if any of the bytes fail to transmit, this non-resumable upload fails entirely. This often happens in mobile environments with unreliable connections. Uploading 99% of a file, failing, and restarting the entire upload creates a bad user experience. A better user experience is to resume and upload only the remaining 1%.
Traditional non-resumable uploads via HTTP have size limits depending on both the client and server systems. These limits are not applicable to resumable uploads with reasonable chunk sizes, as individual HTTP requests are sent for each chunk of a file. Since the Documents List API now supports file sizes up to 10GB, this is very important.
The Java, Python, Objective-C, and .NET Google Data API client libraries all include a mechanism by which you can initiate a resumable upload session. Examples of uploading a document with resumable upload using the client libraries are detailed in the documentation. Additionally, the new Documents List API Python client library now uses only the resumable upload mechanism. To use that version, make sure to follow these directions.
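As a sketch of what this looks like in the Python client library (this is illustrative, not the authoritative sample; the file name, MIME type and chunk size are arbitrary, and client is an authorized DocsClient as in the other examples):

import os

import gdata.client
import gdata.docs.data

f = open('big.pdf')
file_size = os.path.getsize(f.name)

# Send the file in 512KB chunks; if the connection drops, only the
# current chunk needs to be retried, not the whole file.
uploader = gdata.client.ResumableUploader(
    client, f, 'application/pdf', file_size,
    chunk_size=512 * 1024,
    desired_class=gdata.docs.data.Resource)
new_resource = uploader.UploadFile(
    '/feeds/upload/create-session/default/private/full',
    entry=gdata.docs.data.Resource(title='big.pdf'))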
We are announcing the deprecation of SWF export functionality for presentations from the Google Documents List API. We are taking this action due to the limited demand for this feature, and in order to focus engineering efforts on other aspects of the API.
Clients currently making the following request to the API are affected by this change.
https://docs.google.com/feeds/download/presentations/Export?docID=1234&exportFormat=swf
We recommend clients currently using SWF exports switch to PDF exports, using the appropriate exportFormat value.
https://docs.google.com/feeds/download/presentations/Export?docID=1234&exportFormat=pdf
We are disabling SWF exports in the coming weeks. Clients attempting to export presentations as SWF after the exports are disabled will receive an HTTP 400 response.
For more information on exporting presentations, see the Google Documents List API documentation. If you have any questions, feel free to reach out in the forums.
ACL (Access Control List) entries control who can access Google Docs resources. This allows more specific control over resource privacy or permissions.
Many types of applications need to grant document access for several users at once. As an example: when a new user is added to a project in the Manymoon project management application, every user on the project needs to be granted access to all attached Google docs. If there are 10 users on the project and 10 shared documents, this means the app would typically need to perform 100 HTTP requests -- a lot of overhead. With batching of ACL requests, the application can reduce the number of requests to one per document, resulting in a 10x savings.
A typical ACL entry for a single user is created by making an HTTP POST to the ACL link provided with each resource entry. The POST body looks something like this:
<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:gAcl='http://schemas.google.com/acl/2007'>
  <category scheme='http://schemas.google.com/g/2005#kind'
      term='http://schemas.google.com/acl/2007#accessRule'/>
  <gAcl:role value='writer'/>
  <gAcl:scope type='user' value='new_writer@example.com'/>
</entry>
In the Python client library, such an ACL entry is constructed like this:

from gdata.acl.data import AclScope, AclRole
from gdata.docs.data import AclEntry

acl = AclEntry(
    scope=AclScope(value='user@example.com', type='user'),
    role=AclRole(value='writer')
)
Instead of submitting the requests separately, multiple ACL operations for a resource can be combined into a single batch request. This is done by POSTing a feed of ACL entries. Each ACL entry in the feed must have a special batch:operation element, describing the type of operation to perform on the ACL entry. Valid operations are query, insert, update, and delete.
<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:gAcl='http://schemas.google.com/acl/2007'
    xmlns:batch='http://schemas.google.com/gdata/batch'>
  <category scheme='http://schemas.google.com/g/2005#kind'
      term='http://schemas.google.com/acl/2007#accessRule'/>
  <entry>
    <category scheme='http://schemas.google.com/g/2005#kind'
        term='http://schemas.google.com/acl/2007#accessRule'/>
    <gAcl:role value='reader'/>
    <gAcl:scope type='domain' value='example.com'/>
    <batch:operation type='insert'/>
  </entry>
  <entry>
    <category scheme='http://schemas.google.com/g/2005#kind'
        term='http://schemas.google.com/acl/2007#accessRule'/>
    <id>https://docs.google.com/feeds/default/private/full/document%3Adocument_id/acl/user%3Aold_writer%40example.com</id>
    <gAcl:role value='writer'/>
    <gAcl:scope type='user' value='new_writer@example.com'/>
    <batch:operation type='update'/>
  </entry>
</feed>
The following code represents the same operation in the Python client library:
from gdata.data import BatchOperation
from gdata.acl.data import AclScope, AclRole
from gdata.docs.data import AclEntry

acl1 = AclEntry(
    scope=AclScope(value='example.com', type='domain'),
    role=AclRole(value='reader'),
    batch_operation=BatchOperation(type='insert')
)

acl2 = client.get_acl_entry_by_self_link(
    ('https://docs.google.com/feeds/default/private/full/'
     'document%3Adocument_id/acl/user%3Aold_writer%40example.com'))
acl2.scope = AclScope(value='new_writer@example.com', type='user')
acl2.role = AclRole(value='writer')
acl2.batch_operation = BatchOperation(type='update')

entries = [acl1, acl2]
The feed of these entries can now be submitted together to apply to a resource:
results = client.batch_process_acl_entries(resource, entries)
The return value is an AclFeed, with a list of AclEntry elements for each operation, the status of which can be checked individually:
for result in results.entry:
    print result.title.text, result.batch_status.code
The examples shown here are using the raw protocol or the Python client library. The Java client library also supports batch operations on ACL entries.
For more information on how to use batch operations when managing ACLs, see the Google Documents List API documentation, and the Google Data APIs batch protocol reference guide. You can also find assistance in the Google Documents List API forum.
In March, we announced that we would start requiring clients to use SSL when making requests to the Google Documents List API, the Google Spreadsheets API, and the Google Sites API. This is part of our ongoing effort to increase the security of user data.
The time has come, and we are starting to roll out this requirement. On average, about 86% of requests to these APIs are already using SSL, so we expect there to be minimal migration required. The implementation will continue throughout September. If an application receives an HTTP 400 Bad Request response to a request, then it may be because the request was not made using HTTPS.
Clients that have not already started using SSL for all requests should do so immediately. This is as simple as upgrading to the latest version of the relevant API client library. Developers with questions should post in the API forums.
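With the gdata Python client library, for example, forcing HTTPS on every request is a one-line setting (a minimal sketch; the source string is just an arbitrary application identifier):

import gdata.docs.client

client = gdata.docs.client.DocsClient(source='yourCompany-yourApp-v1')
client.ssl = True  # issue every API request over HTTPS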
The Documents List API also provides a changes feed, which lists the resources that have changed for a user, each marked with a changestamp:

https://docs.google.com/feeds/default/private/changes

A response from the changes feed looks like this:
<?xml version="1.0" encoding="UTF-8"?><feed xmlns="http://www.w3.org/2005/Atom" xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:docs="http://schemas.google.com/docs/2007" xmlns:gd="http://schemas.google.com/g/2005" gd:etag="W/"DEEMQ3w8eyt7ImA9WhZUGUo.""> <docs:largestChangestamp>5635</docs:largestChangestamp> <link rel="next" type="application/atom+xml" href="...?start-index=5636"/> <entry gd:etag="W/"DUcMRHg5cCt7ImA9WhZUGUo.""> <category scheme="http://schemas.google.com/g/2005#kind" term="http://schemas.google.com/docs/2007#change" label="change"/> <title>Project tasks</title> ... <docs:changestamp value="5623"/> </entry> ...</feed>
# Patch the suds HTTP transport so it works on Google App Engine.
def u2open_appengine(self, u2request):
    tm = self.options.timeout
    url = self.u2opener()
    # socket.setdefaulttimeout(tm) can't call this on app engine
    if self.u2ver() < 2.6:
        return url.open(u2request)
    else:
        return url.open(u2request, timeout=tm)

transport.http.HttpTransport.u2open = u2open_appengine
# A suds cache implementation backed by App Engine's memcache.
class MemCache(cache.Cache):
    def __init__(self, duration=3600):
        self.duration = duration
        self.client = memcache.Client()

    def get(self, id):
        return self.client.get(str(id))

    def getf(self, id):
        return self.get(id)

    def put(self, id, object):
        self.client.set(str(id), object, self.duration)

    def putf(self, id, fp):
        self.put(id, fp)

    def purge(self, id):
        self.client.delete(str(id))

    def clear(self):
        self.client.flush_all()
docmail_client = client.Client(USERNAME, PASSWORD, SOURCE)
mailing = client.Mailing(name='test mailing')
mailing = docmail_client.create_mailing(mailing)
mailing = docmail_client.get_mailing('enter-your-mailing-guid')
docs_client = gdata.docs.client.DocsClient()
# authenticate client using oauth (see google docs documentation for example code)
file_content = docs_client.GetFileContent(uri=doc_url + '&exportFormat=rtf')
docmail_client.add_template_file(mailing.guid, file_content)
docs_client = gdata.docs.client.DocsClient()
file_content = docs_client.GetFileContent(uri=doc_url + '&exportFormat=csv')
docmail_client.add_mailing_list_file(mailing.guid, file_content)
docmail_client.process_mailing(mailing.guid, False, True)
docmail_client.process_mailing(mailing.guid, True, False)
Posted by Stuart Keeble & Gwyn Howell, Appogee