Google Open Source Blog

Introducing Data Transfer Project: an open source platform promoting universal data portability

Friday, July 20, 2018

In 2007, a small group of engineers in our Chicago office formed the Data Liberation Front, a team that believed consumers should have better tools to put their data where they want, when they want, and even move it to a different service. This idea, called “data portability,” gives people greater control of their information, and pushes us to develop great products because we know they can pack up and leave at any time.

In 2011, we launched Takeout, a new way for Google users to download or transfer a copy of the data they store or create in a variety of industry-standard formats. Since then, we've continued to invest in Takeout—we now call it Download Your Data—and today, our users can download a machine-readable copy of the data they have stored in 50+ Google products, with more on the way.

Now, we’re taking our commitment to portability a step further. In tandem with Microsoft, Twitter, and Facebook we’re announcing the Data Transfer Project, an open source initiative dedicated to developing tools that will enable consumers to transfer their data directly from one service to another, without needing to download and re-upload it. Download Your Data users can already do this; they can transfer their information directly to their Dropbox, Box, MS OneDrive, and Google Drive accounts today. With this project, the development of which we mentioned in our blog post about preparations for the GDPR, we’re looking forward to working with companies across the industry to bring this type of functionality to individuals across the web.

Our approach

The organizations involved with this project are developing tools that can convert any service's proprietary APIs to and from a small set of standardized data formats that can be used by anyone. This makes it possible to transfer data between any two providers using existing industry-standard infrastructure and authorization mechanisms, such as OAuth. So far, we have developed adapters for seven different service providers across five different types of consumer data; we think this demonstrates the viability of this approach to scale to a large number of use cases.

Consumers will benefit from improved flexibility and control over their data. They will be able to import their information into any participating service that offers compelling features—even brand new ones that could rely on powerful, cloud-based infrastructure rather than the consumers’ potentially limited bandwidth and capability to transfer files. Services will benefit as well, as they will be able to compete for users that can move their data more easily.

Protecting users’ data and keeping them in control

Data security and privacy are foundational to the design of the Data Transfer Project. Services must first agree to allow data transfer between them, and then they will require that individuals authenticate each account independently. All credentials and user data will be encrypted both in transit and at rest. The protocol uses a form of perfect forward secrecy where a new unique key is generated for each transfer. Additionally, the framework allows partners to support any authorization mechanism they choose. This enables partners to leverage their existing security infrastructure when authorizing accounts.

As it is an open source product, anyone can inspect the code to verify that data isn't being collected or used for profiling purposes. Tech savvy consumers are also free to download and run an instance of the framework themselves. Interested parties can learn more at the Data Transfer Project website, which explains the technical foundations behind the project and goes into greater detail on how it works.

How to get involved

It is very early days for the Data Transfer Project and we encourage the developer community to join us and help extend the platform to support many more data types, service providers, and hosting solutions.

The Data Transfer Project’s open source code can be found at datatransferproject.dev and you can learn more about Google’s approach to portability in our paper, where we describe our history with this topic and the values and principles that motivated us to invest in the Data Transfer Project. Our prototype already supports data transfer for several product verticals including: photos, mail, contacts, calendar, and tasks. These are enabled by existing, publicly available APIs from Google, Microsoft, Twitter, Flickr, Instagram, Remember the Milk, and Smugmug.

Data portability makes it easy for consumers to try new services and use the ones that they like best. We’re thrilled to help drive an initiative that incentivizes companies large and small to continue innovating across the internet. We’re just getting started and we’re looking forward to what comes next.

By Brian Willard, Software Engineer and Greg Fair, Product Manager

Googlers on the road: CLS and OSCON 2018

Friday, July 13, 2018

Next week a veritable who’s who of free and open source software luminaries, maintainers and developers will gather to celebrate the 20th annual OSCON and the 20th anniversary of the Open Source Definition. Naturally, the Google Open Source and Google Cloud teams will be there too!

Program chairs at OSCON 2017, left to right:
Rachel Roumeliotis, Kelsey Hightower, Scott Hanselman.
Photo used with permission from O'Reilly Media.

This year OSCON returns to Portland, Oregon and runs from July 16-19. As usual, it is preceded by the free-to-attend Community Leadership Summit on July 14-15.

If you’re curious about our outreach programs, our approach to open source, or any of the open source projects we’ve released, please find us! We’re eager to chat. You’ll find us and many other Googlers throughout the week on stage, in the expo hall, and at several special events that we’re running, including:

TensorFlow Day on Tuesday from 9am-5pm
Istio Day on Tuesday from 9am-5pm
Better Together Diversity Networking Lunch on Wednesday from 12:30-1:45pm
Kubernetes Anniversary Party at Cloud Native PDX Meetup on Wednesday from 7-9pm

Here’s a rundown of the sessions we’re hosting this year:

Sunday, July 15th (Community Leadership Summit)

11:45am Asking for time and/or money by Cat Allman

Monday, July 16th (Tutorials)

9:00am   Getting started with TensorFlow by Josh Gordon
1:30pm   Introduction to natural language processing with Python by Barbara Fusinska

Tuesday, July 17th (Tutorials)

9:00am   Istio Day opening remarks by Kelsey Hightower
9:00am   TensorFlow Day opening remarks by Edd Wilder-James
9:05am   Sailing to 1.0: Istio community update by April Nassi
9:05am   The state of TensorFlow by Sandeep Gupta
9:30am   Introduction to fairness in machine learning by Hallie Benjamin
9:55am   Farm to table: A TensorFlow story by Gunhan Gulsoy
11:00am Hassle-free, scalable machine learning with Kubeflow by Barbara Fusinska
11:05am Istio: Zero-trust communication security for production services by Samrat Ray, Tao Li, and Mak Ahmad
12:00pm Project Magenta: Machine learning for music and art by Sherol Chen
1:35pm   Istio à la carte by Daniel Ciruli

Wednesday, July 18th (Sessions)

9:00am   Wednesday opening welcome by Kelsey Hightower
11:50am Machine learning for continuous integration by Joseph Gregorio
1:45pm   Live-coding a beautiful, performant mobile app from scratch by Emily Fortuna and Matt Sullivan
2:35pm   Powering TensorFlow with big data using Apache Beam, Flink, and Spark by Holden Karau
5:25pm   Teaching the Next Generation to FLOSS by Josh Simmons

Thursday, July 19th (Sessions)

9:00am   Thursday opening welcome by Kelsey Hightower
9:40am   20 years later, open source is as important as ever by Sarah Novotny
11:50am Google’s approach to distributed systems observability by Jaana B. Dogan
2:35pm   gRPC versus REST: Let the battle begin with Alex Borysov
5:05pm   Shenzhen Go: A visual Go environment for everybody, even professionals by Josh Deprez

We look forward to seeing you and the rest of the community there!

By Josh Simmons, Google Open Source

Contributing to the AMP Project

Thursday, June 21, 2018

This is a guest post by Adam Silverstein who was recently recognized through the Google Open Source Peer Bonus Program for contributions to the AMP Project. We invited Adam to share about his work on our blog.

I started my web career building websites for small businesses on WordPress, so when I decided to begin contributing to open source, WordPress was a natural place to start.

Now I work at the digital agency 10up, where I am a part of our open source team. We build popular sites like FiveThirtyEight where having the best possible AMP experience is critical. However, bringing FiveThirtyEight’s AMP version up to parity with the site’s responsive mobile experience was challenging, in part because of advanced features that aren’t directly supported in AMP.

One of those unsupported features was MathML, a standard for displaying mathematical formulas on the web. To avoid a clumsy work around (amp-iframe) and improve our presentation of formulas, I proposed a native `amp-mathml` component which could display formulas inline. Contributing improvements “upstream” to open source projects – especially as we encounter friction in real-world projects – is a core value at 10up and important to the health of the web. I expected that I could leverage the same open source MathJax library we used on the responsive website for an AMP implementation. Contributing this component would strengthen my understanding of AMP’s internals while simultaneously improving a client site and enabling the open MathML standard on any AMP page. Win, win, win!

I started by opening an issue on Google’s amphtml repository, describing MathML and proposing a native `amp-mathml` component. Justin Ridgewell from the AMP team immediately responded to the issue and asked Ali Ghassemi to track it. I offered to help write the code and received an enthusiastic response, encouraging me and assuring me that the team would be available on GitHub and in Slack to answer any questions.

This warm welcome gave me the confidence to dive in, but ramp up was daunting. The build tools and coding standards were quite different from other projects I work on and setup required some editor reconfiguring and reflex retraining. Getting the unit test to run on my system required tracking down and installing some missing dependencies.

Fortunately, AMP’s project documentation is thorough, and Ali guided me through the implementation, pointing me to existing, similar samples in the project. I already knew how to use JavaScript to render formulas with MathJax – my challenge was building an AMP component that ran this code and displayed it inline.

After a few days of concerted effort, I built a proof of concept and opened a pull request. The real fun began as I refined the approach and wrote documentation with help from the team. The team’s active engagement helped the process move along rapidly. Amazingly, the pull request was merged one week later, and today amp-mathml is live in the wild. FiveThirtyEight is already using the new, native implementation.

From opening the issue all the way to the merge of my pull request, I was impressed by the support and encouragement I received. Ali and honeybadgerdontcare provided regular reviews and thorough suggestions on the pull request when I pushed iterations. Their engagement throughout the process made me and my work feel valued, and helped me stay motivated to continue working on the feature.

Adding MathML to AMP reminded me why I find so much joy and professional growth in contributing to open source projects. I have a better understanding of AMP from the inside out, and I was welcomed into the project’s community with wide open arms. I'm proud of my contribution, and ready to tackle new challenges after seeing its success!

By Adam Silverstein, AMP Project contributor