Google Online Security Blog: privacy

How Hash-Based Safe Browsing Works in Google Chrome

August 8, 2022

There are various threats a user faces when browsing the web. Users may be tricked into sharing sensitive information like their passwords with a misleading or fake website, also called phishing. They may also be led into installing malicious software on their machines, called malware, which can collect personal data and also hold it for ransom. Google Chrome, henceforth called Chrome, enables its users to protect themselves from such threats on the internet. When Chrome users browse the web with Safe Browsing protections, Chrome uses the Safe Browsing service from Google to identify and ward off various threats. Safe Browsing works in different ways depending on the user's preferences. In the most common case, Chrome uses the privacy-conscious Update API (Application Programming Interface) from the Safe Browsing service. This API was developed with user privacy in mind and ensures Google gets as little information about the user's browsing history as possible. If the user has opted-in to "Enhanced Protection" (covered in an earlier post) or "Make Searches and Browsing Better", Chrome shares limited additional data with Safe Browsing only to further improve user protection. This post describes how Chrome implements the Update API, with appropriate pointers to the technical implementation and details about the privacy-conscious aspects of the Update API. This should be useful for users to understand how Safe Browsing protects them, and for interested developers to browse through and understand the implementation. We will cover the APIs used for Enhanced Protection users in a future post. Threats on the Internet When a user navigates to a webpage on the internet, their browser fetches objects hosted on the internet. These objects include the structure of the webpage (HTML), the styling (CSS), dynamic behavior in the browser (Javascript), images, downloads initiated by the navigation, and other webpages embedded in the main webpage. These objects, also called resources, have a web address which is called their URL (Uniform Resource Locator). Further, URLs may redirect to other URLs when being loaded. Each of these URLs can potentially host threats such as phishing websites, malware, unwanted downloads, malicious software, unfair billing practices, and more. Chrome with Safe Browsing checks all URLs, redirects or included resources, to identify such threats and protect users. Safe Browsing Lists Safe Browsing provides a list for each threat it protects users against on the internet. A full catalog of lists that are used in Chrome can be found by visiting chrome://safe-browsing/#tab-db-manager on desktop platforms. A list does not contain unsafe web addresses, also referred to as URLs, in entirety; it would be prohibitively expensive to keep all of them in a device’s limited memory. Instead it maps a URL, which can be very long, through a cryptographic hash function (SHA-256), to a unique fixed size string. This distinct fixed size string, called a hash, allows a list to be stored efficiently in limited memory. The Update API handles URLs only in the form of hashes and is also called hash-based API in this post. Further, a list does not store hashes in entirety either, as even that would be too memory intensive. Instead, barring a case where data is not shared with Google and the list is small, it contains prefixes of the hashes. We refer to the original hash as a full hash, and a hash prefix as a partial hash.

A list is updated following the Update API’s request frequency section. Chrome also follows a back-off mode in case of an unsuccessful response. These updates happen roughly every 30 minutes, following the minimum wait duration set by the server in the list update response. For those interested in browsing relevant source code, here’s where to look: Source Code

GetListInfos() contains all the lists, along with their associated threat types, the platforms they are used on, and their file names on disk.

HashPrefixMap shows how the lists are stored and maintained. They are grouped by the size of prefixes, and appended together to allow quick binary search based lookups.

How is hash-based URL lookup done As an example of a Safe Browsing list, let's say that we have one for malware, containing partial hashes of URLs known to host malware. These partial hashes are generally 4 bytes long, but for illustrative purposes, we show only 2 bytes. ['036b', '1a02', 'bac8', 'bb90'] Whenever Chrome needs to check the reputation of a resource with the Update API, for example when navigating to a URL, it does not share the raw URL (or any piece of it) with Safe Browsing to perform the lookup. Instead, Chrome uses full hashes of the URL (and some combinations) to look up the partial hashes in the locally maintained Safe Browsing list. Chrome sends only these matched partial hashes to the Safe Browsing service. This ensures that Chrome provides these protections while respecting the user’s privacy. This hash-based lookup happens in three steps in Chrome: Step 1: Generate URL Combinations and Full Hashes When Google blocks URLs that host potentially unsafe resources by placing them on a Safe Browsing list, the malicious actor can host the resource on a different URL. A malicious actor can cycle through various subdomains to generate new URLs. Safe Browsing uses host suffixes to identify malicious domains that host malware in their subdomains. Similarly, malicious actors can also cycle through various subpaths to generate new URLs. So Safe Browsing also uses path prefixes to identify websites that host malware at various subpaths. This prevents malicious actors from cycling through subdomains or paths for new malicious URLs, allowing robust and efficient identification of threats.

To incorporate these host suffixes and path prefixes, Chrome first computes the full hashes of the URL and some patterns derived from the URL. Following Safe Browsing API's URLs and Hashing specification, Chrome computes the full hashes of URL combinations by following these steps:

First, Chrome converts the URL into a canonical format, as defined in the specification.

Then, Chrome generates up to 5 host suffixes/variants for the URL.

Then, Chrome generates up to 6 path prefixes/variants for the URL.

Then, for the combined 30 host suffixes and path prefixes combinations, Chrome generates the full hash for each combination.

Source Code

V4LocalDatabaseManager::CheckBrowseURL is an example which performs a hash-based lookup.

V4ProtocolManagerUtil::UrlToFullHashes creates the various URL combinations for a URL, and computes their full hashes.

Example For instance, let's say that a user is trying to visit https://evil.example.com/blah#frag. The canonical url is https://evil.example.com/blah. The host suffixes to be tried are evil.example.com, and example.com. The path prefixes are / and /blah. The four combined URL combinations are evil.example.com/, evil.example.com/blah, example.com/, and example.com/blah. url_combinations = ["evil.example.com/", "evil.example.com/blah","example.com/", "example.com/blah"] full_hashes = ['1a02…28', 'bb90…9f', '7a9e…67', 'bac8…fa'] Step 2: Search Partial Hashes in Local Lists Chrome then checks the full hashes of the URL combinations against the locally maintained Safe Browsing lists. These lists, which contain partial hashes, do not provide a decisive malicious verdict, but can quickly identify if the URL is considered not malicious. If the full hash of the URL does not match any of the partial hashes from the local lists, the URL is considered safe and Chrome proceeds to load it. This happens for more than 99% of the URLs checked. Source Code

V4LocalDatabaseManager::GetPrefixMatches gets the matching partial hashes for the full hashes of the URL and its combinations.

Example Chrome finds that three full hashes 1a02…28, bb90…9f, and bac8…fa match local partial hashes. We note that this is for demonstration purposes, and a match here is rare.

Step 3: Fetch Matching Full Hashes Next, Chrome sends only the matching partial hash (not the full URL or any particular part of the URL, or even their full hashes), to the Safe Browsing service's fullHashes.find method. In response, it receives the full hashes of all malicious URLs for which the full hash begins with one of the partial hashes sent by Chrome. Chrome checks the fetched full hashes with the generated full hashes of the URL combinations. If any match is found, it identifies the URL with various threats and their severities inferred from the matched full hashes. Source Code

V4GetHashProtocolManager::GetFullHashes performs the lookup for the full hashes for the matched partial hashes.

Example Chrome sends the matched partial hashes 1a02, bb90, and bac8 to fetch the full hashes. The server returns full hashes that match these partial hashes, 1a02…28, bb90…ce, and bac8…01. Chrome finds that one of the full hashes matches with the full hash of the URL combination being checked, and identifies the malicious URL as hosting malware. Conclusion Safe Browsing protects Chrome users from various malicious threats on the internet. While providing these protections, Chrome faces challenges such as constraints in memory capacity, network bandwidth usage, and a dynamic threat landscape. Chrome is also mindful of the users’ privacy choices, and shares little data with Google. In a follow up post, we will cover the more advanced protections Chrome provides to its users who have opted in to “Enhanced Protection”.

#ShareTheMicInCyber: Brooke Pearson

March 11, 2021

Posted by Parisa Tabriz, Head of Chrome Product, Engineering and UX

In an effort to showcase the breadth and depth of Black+ contributions to security and privacy fields, we’ve launched a profile series that aims to elevate and celebrate the Black+ voices in security and privacy we have here at Google.

Brooke Pearson manages the Privacy Sandbox program at Google, and her team's mission is to, “Create a thriving web ecosystem that is respectful of users and private by default.” Brooke lives this mission and it is what makes her an invaluable asset to the Chrome team and Google.

In addition to her work advancing the fields of security and privacy, she is a fierce advocate for women in the workplace and for elevating the voices of her fellow Black+ practitioners in security and privacy. She has participated and supported the #ShareTheMicInCyber campaign since its inception.

Brooke is passionate about delivering privacy solutions that work and making browsing the web an inherently more private experience for users around the world.

Why do you work in security or privacy?

I work in security and privacy to protect people and their personal information. It’s that simple. Security and privacy are two issues that are core to shaping the future of technology and how we interact with each other over the Internet. The challenges are immense, and yet the ability to impact positive change is what drew me to the field.

Tell us a little bit about your career journey to Google

My career journey into privacy does not involve traditional educational training in the field. In fact, my background is in public policy and communications, but when I transitioned to the technology industry, I realized that the most pressing policy issues for companies like Google surround the nascent field of privacy and the growing field of security.

After I graduated from college at Azusa Pacific University, I was the recipient of a Fulbright scholarship to Macau, where I spent one year studying Chinese and teaching English. I then moved to Washington D.C. where I initially worked for the State Department while finishing my graduate degree in International Public Policy at George Washington University. I had an amazing experience in that role and it afforded me some incredible networking opportunities and the chance to travel the world, as I worked in Afghanistan and Central Asia.

After about five years in the public sector, I joined Facebook as a Program Manager for the Global Public Policy team, initially focused on social good programs like Safety Check and Charitable Giving. Over time, I could see that the security team at Facebook was focused on fighting the proliferation of misinformation, and this called to me as an area where I could put my expertise in communication and geopolitical policy to work. So I switched teams and I've been in the security and privacy field ever since, eventually for Uber and now with Google's Chrome team.

At Google, privacy and security are at the heart of everything we do. Chrome is tackling some of the world's biggest security and privacy problems, and everyday my work impacts billions of people around the world. Most days, that's pretty daunting, but every day it's humbling and inspiring.

What is your security or privacy "soapbox"?

If we want to encourage people to engage in more secure behavior, we have to make it easy to understand and easy to act on. Every day we strive to make our users safer with Google by implementing security and privacy controls that are effective and easy for our users to use and understand.

As a program manager, I’ve learned that it is almost always more effective to offer a carrot than a stick, when it comes to security and privacy hygiene. I encourage all of our users to visit our Safety Center to learn all the ways Google helps you stay safe online, every day.

If you are interested in following Brooke’s work here at Google and beyond, please follow her on Twitter @brookelenet. We will be bringing you more profiles over the coming weeks and we hope you will engage with and share these with your network.

If you are interested in participating or learning more about #ShareTheMicInCyber, click here.

#ShareTheMicInCyber: Rob Duhart

March 1, 2021

In an effort to showcase the breadth and depth of Black+ contributions to security and privacy fields, we’ve launched a series in support of #ShareTheMicInCyber that aims to elevate and celebrate the Black+ voices in security and privacy we have here at Google.
EC-CouncilInternational Consortium of Cybersecurity Professionals (ICMCP)

Why do you work in security or privacy?

I have been in the cyber world long enough to know how important it is for security and privacy to be top of mind and focus for organizations of all shapes and sizes. My passion lies in keeping users and Googlers safe. One of the main reasons I joined Google is its commitment to security and privacy.

Tell us a little bit about your career journey to Google...
I was fortunate to begin my cybersecurity career in the United States Government working at the Department of Energy, FBI, and the Intelligence Community. I transitioned to the private sector in 2017 and have been fortunate to lead talented security teams at Cardinal Health and Ford Motor Company.

My journey into cybersecurity was not traditional. I studied Political Science at Washington University in St. Louis, completed graduate education at George Mason University and Carnegie Mellon University. I honed my skills and expertise in this space through hands on experience and with the support of many amazing mentors. It has been the ride of a lifetime and I look forward to what is next.

To those thinking about making a career change or are just starting to get into security, my advice is don’t be afraid to ask for help.

What is your security or privacy "soapbox"?

At Google, we implement a model known as Federated Security, where our security teams partner across our Product Areas to enable security program maturity Google wide. Our Federated Security team believes in harnessing the power of relationship, engagement, and community to drive maturity into every product. Security and privacy are team sports – it takes business leaders and security leaders working together to secure and protect our digital and physical worlds.

If you are interested in following Rob’s work here at Google and beyond, please follow him on Twitter @RobDuhart. We will be bringing you more profiles over the coming weeks and we hope you will engage with and share these with your network.

If you are interested in participating or learning more about #ShareTheMicInCyber, click here.

Celebrating the influence and contributions of Black+ Security & Privacy Googlers

February 25, 2021

less than 12%cyber threats#ShareTheMicInCyber↓
#ShareTheMicInCyber: Camille Stewart#ShareTheMicInCybernationalInternational Foundation for Electoral Systems Girl SecurityWhy do you work in security or privacy? Tell us a little bit about your career journey to GoogleWhat is your security or privacy "soapbox"? penned a piece@CamilleEsq
If you are interested in participating or learning more about #ShareTheMicInCyber, click here.

Privacy-Preserving Smart Input with Gboard

October 7, 2020

Posted by Yang Lu, Software Engineer, Angana Ghosh, Group Product Manager, and Xu Liu, Director of Engineering, Gboard team Google Keyboard (a.k.a Gboard) has a critical mission to provide frictionless input on Android to empower users to communicate accurately and express themselves effortlessly. In order to accomplish this mission, Gboard must also protect users' private and sensitive data. Nothing users type is sent to Google servers. We recently launched privacy-preserving input by further advancing the latest federated technologies. In Android 11, Gboard also launched the contextual input suggestion experience by integrating on-device smarts into the user's daily communication in a privacy-preserving way.

Before Android 11, input suggestions were surfaced to users in several different places. In Android 11, Gboard launched a consistent and coordinated approach to access contextual input suggestions. For the first time, we've brought Smart Replies to the keyboard suggestions - powered by system intelligence running entirely on device. The smart input suggestions are rendered with a transparent layer on top of Gboard’s suggestion strip. This structure maintains the trust boundaries between the Android platform and Gboard, meaning sensitive personal content cannot be not accessed by Gboard. The suggestions are only sent to the app after the user taps to accept them. For instance, when a user receives the message “Have a virtual coffee at 5pm?” in Whatsapp, on-device system intelligence predicts smart text and emoji replies “Sounds great!” and “👍”. Android system intelligence can see the incoming message but Gboard cannot. In Android 11, these Smart Replies are rendered by the Android platform on Gboard’s suggestion strip as a transparent layer. The suggested reply is generated by the system intelligence. When the user taps the suggestion, Android platform sends it to the input field directly. If the user doesn't tap the suggestion, gBoard and the app cannot see it. In this way, Android and Gboard surface the best of Google smarts whilst keeping users' data private: none of their data goes to any app, including the keyboard, unless they've tapped a suggestion.

Additionally, federated learning has enabled Gboard to train intelligent input models across many devices while keeping everything individual users type on their device. Today, the emoji is as common as punctuation - and have become the way for our users to express themselves in messaging. Our users want a way to have fresh and diversified emojis to better express their thoughts in messaging apps. Recently, we launched new on-device transformer models that are fine-tuned with federated learning in Gboard, to produce more contextual emoji predictions for English, Spanish and Portuguese.

Furthermore, following the success of privacy-preserving machine learning techniques, Gboard continues to leverage federated analytics to understand how Gboard is used from decentralized data. What we've learned from privacy-preserving analysis has let us make better decisions in our product. When a user shares an emoji in a conversation, their phone keeps an ongoing count of which emojis are used. Later, when the phone is idle, plugged in, and connected to WiFi, Google’s federated analytics server invites the device to join a “round” of federated analytics data computation with hundreds of other participating phones. Every device involved in one round will compute the emoji share frequency, encrypt the result and send it a federated analytics server. Although the server can’t decrypt the data individually, the final tally of total emoji counts can be decrypted when combining encrypted data across devices. The aggregated data shows that the most popular emoji is 😂 in Whatsapp, 😭 in Roblox(gaming), and ✔ in Google Docs. Emoji 😷 moved up from 119th to 42nd in terms of frequency during COVID-19. Gboard always has a strong commitment to Google’s Privacy Principles. Gboard strives to build privacy-preserving effortless input products for users to freely express their thoughts in 900+ languages while safeguarding user data. We will keep pushing the state of the art in smart input technologies on Android while safeguarding user data. Stay tuned!

Security Blog

How Hash-Based Safe Browsing Works in Google Chrome

#ShareTheMicInCyber: Brooke Pearson

#ShareTheMicInCyber: Rob Duhart

Celebrating the influence and contributions of Black+ Security & Privacy Googlers

Privacy-Preserving Smart Input with Gboard

Labels

Archive

Feed