An open source font system for everyone

Thursday, October 6, 2016

Originally posted on the Google Developers Blog

A big challenge in sharing digital information around the world is “tofu”—the blank boxes that appear when a computer or website isn’t able to display text: ⯐. Tofu can create confusion, a breakdown in communication, and a poor user experience.

Five years ago we set out to address this problem via the Noto—aka “No more tofu”—font project. Today, Google’s open source Noto font family provides a beautiful and consistent digital type for every symbol in the Unicode standard, covering more than 800 languages and 110,000 characters.

A few samples of the 110,000+ characters covered by Noto fonts.

The Noto project started as a necessity for Google’s Android and Chrome OS operating systems. When we began, we did not realize the enormity of the challenge. It required design and technical testing in hundreds of languages, and expertise from specialists in specific scripts. In Arabic, for example, each character has four glyphs (i.e., shapes a character can take), and which one is used depends on the neighboring letters. In Indic languages, glyphs may be reordered or even split into two depending on the surrounding text.
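
To make the Arabic example concrete: Unicode even encodes these four positional shapes separately, in its “Arabic Presentation Forms-B” compatibility block, and a few lines of Python can enumerate them. This is illustrative only; a real font such as Noto selects shapes through OpenType shaping rules rather than through these compatibility code points.

    import unicodedata

    # The four positional shapes of ARABIC LETTER BEH, as encoded in the
    # Unicode "Arabic Presentation Forms-B" compatibility block.
    forms = {
        "isolated": "\uFE8F",
        "final": "\uFE90",
        "initial": "\uFE91",
        "medial": "\uFE92",
    }

    for position, char in forms.items():
        print(f"{position:>8}: U+{ord(char):04X} {unicodedata.name(char)}")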

The key to achieving this milestone has been partnering with experts in the field of type and font design, including Monotype, Adobe, and an amazing network of volunteer reviewers. Beyond “no more tofu” in the common languages used every day, Noto will be used to preserve the history and culture of rare languages through digitization. As new characters are introduced into the Unicode standard, Google will add these into the Noto font family.

Google has a deep commitment to openness and the accessibility and innovation that come with it. The full Noto font family, design source files, and the font building pipeline are available for free at the links below. In the spirit of sharing and communication across borders and cultures, please use and enjoy! 
By Xiangye Xiao and Bob Jung, Internationalization

Introducing Cartographer

Wednesday, October 5, 2016

We are happy to announce the open source release of Cartographer, a real-time simultaneous localization and mapping (SLAM) library in 2D and 3D with ROS support.

SLAM algorithms combine data from various sensors (e.g. LIDAR, IMU and cameras) to simultaneously compute the position of the sensor and a map of the sensor’s surroundings. For example, consider this approach to drawing a floor plan of your living room:
  • Grab a laser rangefinder, stand in the middle of the room, and draw an X on a piece of paper.
  • Measure the distance from where you’re standing to any wall.
  • Draw a line on the paper where the wall is and write down the distance between the X (your position) and the wall.
  • Measure the distance from where you’re standing to another wall and add it to the drawing as well.
  • Now, move to another part of the room.
  • Since the walls (hopefully) haven’t moved, you can measure your distance to the same two walls to determine your new position (a toy version of this step is sketched below).
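
As a toy version of that last step, suppose the map is already known: the two walls lie along x = 0 and y = 0, and the range readings are taken perpendicular to them, so the two distances simply are the position. A minimal sketch in Python (hypothetical values; real SLAM has to estimate the map at the same time and cope with noise and orientation):

    # Toy localization against a known map: one wall along x = 0, another
    # along y = 0. Perpendicular range readings to the two walls recover
    # the (x, y) position directly.
    def locate(dist_to_wall_x0, dist_to_wall_y0):
        return (dist_to_wall_x0, dist_to_wall_y0)

    # First spot in the room: the "X" on the paper.
    start = locate(2.0, 1.5)  # 2.0 m from one wall, 1.5 m from the other

    # After moving, re-measure the same two walls to find the new position.
    moved = locate(3.2, 2.7)

    print("start:", start, "-> moved to:", moved)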


SLAM is an essential component of autonomous platforms such as self-driving cars, automated forklifts in warehouses, robotic vacuum cleaners, and UAVs.

Cartographer builds globally consistent maps in real-time across a broad range of sensor configurations common in academia and industry. The following video is a demonstration of Cartographer’s real-time loop closure:


A detailed description of Cartographer’s 2D algorithms can be found in our ICRA 2016 paper.
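
Loop closure is what keeps the map globally consistent: when the sensor revisits a place it has already mapped, the mismatch between the old and new estimates of that place reveals the accumulated drift, which can be fed back to correct the trajectory. The Python sketch below shows only the bare idea, spreading the correction evenly along the path; Cartographer’s actual method, described in the paper above, is far more sophisticated.

    # Bare-bones loop closure idea (not Cartographer's algorithm): a robot
    # drives a loop and should end where it started, but odometry drift
    # leaves the final pose estimate offset. Spread that error linearly
    # back over the whole trajectory.
    def close_loop(poses, revisited):
        err_x = revisited[0] - poses[-1][0]
        err_y = revisited[1] - poses[-1][1]
        n = len(poses) - 1
        return [(x + err_x * i / n, y + err_y * i / n)
                for i, (x, y) in enumerate(poses)]

    # A drifted square loop that should close at (0, 0).
    drifted = [(0.0, 0.0), (1.0, 0.1), (1.1, 1.1), (0.2, 1.2), (0.3, 0.3)]
    corrected = close_loop(drifted, revisited=(0.0, 0.0))
    print(corrected[-1])  # back at (0.0, 0.0); earlier poses shifted in proportion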

Thanks to ROS integration and support from external contributors, Cartographer is ready to use on several robot platforms with ROS support.

At Google, Cartographer has enabled a range of applications from mapping museums and transit hubs to enabling new visualizations of famous buildings.

We recognize the value of high quality datasets to the research community. That’s why, thanks to cooperation with the Deutsches Museum (the largest tech museum in the world), we are also releasing three years of LIDAR and IMU data collected using our 2D and 3D mapping backpack platforms during the development and testing of Cartographer.


Our focus is on advancing and democratizing SLAM as a technology. Currently, Cartographer is heavily focused on LIDAR SLAM. Through continued development and community contributions, we hope to add support for more sensors and platforms, as well as new features such as lifelong mapping and localizing in a pre-existing map.

By Damon Kohler, Wolfgang Hess, and Holger Rapp, Google Engineering

Introducing the Open Images Dataset

Monday, October 3, 2016

Originally posted on the Google Research Blog

In the last few years, advances in machine learning have enabled computer vision to progress rapidly, from systems that automatically caption images to apps that create natural-language replies to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch, and the images are listed as having a Creative Commons Attribution license*.
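
Because the release consists of URLs plus annotations rather than image files, working with it starts from CSV metadata. Here is a minimal sketch of joining images to their machine-generated labels; the file and column names are hypothetical, and the dataset’s own documentation gives the exact schema.

    import csv
    from collections import defaultdict

    # Hypothetical file and column names; see the Open Images docs for the
    # actual schema.
    labels_by_image = defaultdict(list)
    with open("machine_labels.csv", newline="") as f:
        for row in csv.DictReader(f):  # e.g. ImageID, LabelName, Confidence
            if float(row["Confidence"]) >= 0.5:  # keep confident labels only
                labels_by_image[row["ImageID"]].append(row["LabelName"])

    with open("images.csv", newline="") as f:
        for row in csv.DictReader(f):  # e.g. ImageID, OriginalURL
            print(row["OriginalURL"], labels_by_image[row["ImageID"]])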

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:
Annotated images from the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images used under CC BY 2.0 license.
We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications, as well as for other things like DeepDream or artistic style transfer, which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images in the coming months, and with them the quality of the models that can be trained.
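
For a rough sense of what fine-tuning on these labels looks like, here is a minimal multi-label sketch using the Keras InceptionV3 application as a stand-in. This is not the authors’ training setup; NUM_CLASSES and the dataset pipeline are placeholders.

    import tensorflow as tf

    NUM_CLASSES = 6000  # placeholder: one output per Open Images label

    # ImageNet-pretrained InceptionV3 backbone without its classifier head.
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights="imagenet", pooling="avg")
    base.trainable = False  # train only the new head at first

    # Multi-label head: sigmoid rather than softmax, since each image
    # carries several labels (about 8 on average).
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

    # train_ds would be a tf.data.Dataset of (image, multi_hot_labels) pairs.
    # model.fit(train_ds, epochs=5)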

The dataset is a product of a collaboration between Google, CMU, and Cornell universities, and a number of research papers built on top of the Open Images dataset are in the works. It is our hope that datasets like Open Images and the recently released YouTube-8M will be useful tools for the machine learning community.

By Ivan Krasin and Tom Duerig, Software Engineers

* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.