We have a new piece of the SEO puzzle from Google. They have introduced a new ranking signal in the form of machine learning and artificial intelligence called RankBrain.
Bloomberg broke the story that not only has Google introduced RankBrain, but that it has already been using it for some time in the search results.
We also reached out to Google, and they responded to The SEM Post with plenty of answers about how this works and what it impacts. Jack Clark, the author of the Bloomberg article, has also shared details beyond the article itself that give additional insight into the new AI.
What is RankBrain?
RankBrain is an artificial intelligence Google is using in order to serve better search results, particularly for the 15% of daily search queries that Google has never seen before.
The Bloomberg article describes it in simpler terms:
RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.
It is only one of the pieces of Google’s search algorithm, but it is a fairly significant one.
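To make the vector idea a little more concrete, here is a minimal sketch in Python. The phrases and the three-dimensional vectors are invented for illustration (real embeddings are learned from text and have far more dimensions), and nothing here is confirmed to be how RankBrain works. It only shows how a system could guess which known phrase is closest in meaning to a query it has not seen before, by comparing vectors with cosine similarity.

```python
import math

# Hypothetical, hand-made embeddings. Real systems learn these from text.
EMBEDDINGS = {
    "cheap flights": [0.9, 0.1, 0.0],
    "budget airfare": [0.8, 0.2, 0.1],
    "chocolate cake recipe": [0.0, 0.9, 0.3],
    "laptop repair": [0.1, 0.0, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vector, embeddings):
    """Return the known phrase whose vector points closest to the query."""
    return max(
        embeddings,
        key=lambda phrase: cosine_similarity(query_vector, embeddings[phrase]),
    )


# Assumed vector for a never-before-seen query such as "low cost plane tickets".
unseen_query = [0.7, 0.3, 0.1]
print(most_similar(unseen_query, EMBEDDINGS))
```

The unseen query shares no words with the phrase it is matched to. The match comes purely from the two vectors pointing in a similar direction.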
When It Went Live
The exact date this went live is not known, although it was earlier this year. Bloomberg reports it as being active for the past few months.
I asked Google and they did not have a more specific date to share, just that “it was rolled out gradually starting early in 2015.”
There is a big difference between “a few months” and “early in 2015” for those trying to see if any previous ranking changes we noticed could possibly be attributed to RankBrain. There have been multiple times we have noticed clear changes that Google has made to the core ranking algo, but we can’t be sure if any of these updates can be attributed to the RankBrain launch.
As Google doesn’t generally comment on core ranking changes, unless we get a more specific time frame, it will be difficult to narrow down which of the changes could be attributed to RankBrain and then reverse engineer the changes we saw.
That said, given that it is such a strong ranking signal, it is almost certain that one of the “updates” we saw was actually RankBrain rolling out.
Covers All Languages
This is not just for English queries. Google confirmed to The SEM Post that they are using RankBrain on all languages.
This is especially important because it shows RankBrain can be applied regardless of the language used.
Third Most Important Signal
According to the Bloomberg article, this is the third most important signal in the Google ranking algorithm, so that is pretty significant.
I also reached out to Google, and a Google spokesperson shared that “It is one of hundreds of signals, but a significant one.”
It also stands to reason that, because it does well with that 15% of never-before-seen search queries, it will be a larger ranking signal for those queries.
@glenngabe therefore it'd be logical to say that a higher proportion of the 15% of never-before-seen ones get a big signal from RankBrain
— Jack Clark (@jackclarkSF) October 26, 2015
The Google spokesperson also said “It’s especially helpful on long-tail queries, such as the 15% never seen before each day.”
And no, Google did not tell Clark the first or second most important signals, although many can make a pretty educated guess at this.
Used on a Large Set of Queries
One of the big misunderstandings is that RankBrain is only used on the 15% of new-to-Google search queries. But it is used far more widely.
While this technology is particularly good at results for that 15% of searches, it is being used on a large percentage of Google’s search queries.
@glenngabe It's invoked on a very large fraction of queries and is particularly good at dealing with the 15% per day that haven't been seen
— Jack Clark (@jackclarkSF) October 26, 2015
Not Restricted to Types of Search Queries
Some were wondering whether RankBrain might be more useful for certain types of searches, beyond the 15% that are new, and if Google would skew towards using RankBrain for those specific types of queries. But this is not the case.
The Google spokesperson said it is “not limited to any particular set of queries.”
It also means SEOs can’t reverse engineer it, to see if it is applied more for certain market areas or topics.
Not Continually Learning
Gary Illyes was asked on Twitter whether it is continually learning. It is not, so it doesn’t evolve with each search query on-the-fly.
@glenngabe it's not. The team was working on it for months and its effects are expectable, not assumable.
— Gary Illyes (@methode) October 26, 2015
RankBrain Updates
Clark shared on Twitter that it is periodically “re-trained”.
@methode @glenngabe yeah. It's periodically re-trained, but it's not learning on-the-fly.
— Jack Clark (@jackclarkSF) October 26, 2015
When I asked Google about the frequency of updates, they said they will update as needed. “We’ll keep experimenting with and testing new models, and we’ll make updates as we come up with models that do a better job than the existing one,” the Google spokesperson said. “That could be about refreshing the data or developing new neural net architectures.”
We could see this continue to evolve, but it will be hard to translate current or new RankBrain signals into SEO.
What Does This Mean for SEO?
Whenever Google makes a change to their algo, there is always a chorus of “SEO is dead” followed up by many articles about it. But Gary Illyes confirms that “SEO magic” still works.
@JohnJCurtis It's very much not. This was launched months ago, and your SEO magic still works.
— Gary Illyes (@methode) October 27, 2015
And while we have definitely seen algo changes, attributed to the usual “updates to one of our hundreds of ranking signals”, SEO is still alive and well even with the introduction of RankBrain.
Thought vectors
Here is where it gets really interesting. According to Jack Clark, it is “converting words and phrases into vectors.”
@ollieglass @ejlbell it's converting words and phrases into vectors. Closely related to Hinton's work on thought vectors.
— Jack Clark (@jackclarkSF) October 26, 2015
The Hinton referred to is Professor Geoff Hinton, who has many accomplishments in artificial neural networks. He is a professor at the University of Toronto and, since his company DNNresearch Inc was acquired by Google, a Distinguished Researcher for Google.
Word2vec Connection
Word2vec is the technology many are speculating is the basis for RankBrain. Clark posted additional comments on the Word2vec connection on Hacker News.
They wouldn’t explicitly confirm that it is word2vec, but everything we discussed indicated it’s likely doing something roughly equivalent to word2vec, and is also doing similar conversions for sequences which is likely connected to Sequence to Sequence learning (PDF: http://papers.nips.cc/paper/5346-sequence-to-sequence-learni…). It also links to Geoff Hinton’s stuff on Thought Vectors which implicitly involves word2vec.
When I asked Google if it was based on Word2vec, the Google spokesperson said “It’s related to word2vec in that it uses ’embeddings’ — looking at phrases in high-dimensional space to learn how they’re related to one another.”
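The “embeddings” idea in that answer can be sketched with simple co-occurrence counts. To be clear, this is not word2vec (which trains a small neural network) and it is not RankBrain. The tiny corpus, the stopword list, the window size and the function names are all made up for illustration. Each word is represented by a count of the words that appear near it, so words used in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict

# Toy corpus and stopword list, invented for this example.
CORPUS = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
    "investors bought the stock",
    "investors sold the stock",
]
STOPWORDS = {"the"}


def build_cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse vector counting its nearby words."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = build_cooccurrence_vectors(CORPUS)
print(cosine_similarity(vectors["cat"], vectors["dog"]))    # shared contexts
print(cosine_similarity(vectors["cat"], vectors["stock"]))  # no shared contexts
```

In this toy corpus “cat” and “dog” both appear next to “chased” and “ate”, so they score as related, while “cat” and “stock” share no context words at all.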
Converting Words and Phrases Into Vectors
From a technical standpoint, RankBrain is converting words and phrases into vectors, which can then be used for deep learning.
Hinton gave a keynote lecture on deep learning at The Royal Society that talks about these connections.
The implications of this for document processing are very important.
If we can convert a sentence into a vector that captures the meaning of the sentence, then Google can do much better searches. They can search based on what is being said in a document.
Also, if you can convert each sentence in a document into a vector, you can then take that sequence of vectors and try and model why you get this vector after you get these vectors. That’s called reasoning, that’s natural reasoning, and that was kind of the core of good old fashioned AI and something they could never do because natural reasoning is a complicated business, and logic isn’t a very good model of it.
Here we can say, well, look, if we can read every English document on the web, and turn each sentence into a thought vector, we’ve got plenty of data for training a system that can reason like people do. Now, you might not want to reason like people do on the web, but at least we can see what they would think.
So I think what is going to happen over the next few years is this ability to turn these sentences into thought vectors is going to rapidly change the level that we can understand documents.
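As a rough illustration of searching “based on what is being said”, here is a small Python sketch. It averages word vectors to get a sentence vector, which is a crude stand-in for the learned thought vectors Hinton describes and is an assumption of this example, not his method or Google’s. The two-dimensional word vectors and the documents are invented.

```python
import math

# Hypothetical 2-D word vectors: axis 0 is roughly "travel", axis 1 "cooking".
WORD_VECTORS = {
    "flight": [1.0, 0.0],
    "airline": [0.9, 0.1],
    "ticket": [0.8, 0.1],
    "recipe": [0.0, 1.0],
    "bake": [0.1, 0.9],
    "cake": [0.1, 0.8],
}


def sentence_vector(sentence):
    """Average the vectors of the known words in a sentence."""
    known = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine_similarity(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents):
    """Sort documents from most to least similar in meaning to the query."""
    query_vec = sentence_vector(query)
    return sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, sentence_vector(doc)),
        reverse=True,
    )


DOCUMENTS = ["bake a cake", "airline ticket deals"]
print(rank_documents("flight", DOCUMENTS))
```

The query “flight” does not appear in either document, yet the airline document ranks first because its vector is closest to the query’s.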
He also talks about the current scaling issues.
To understand at human levels, we are probably going to need human level resources, and we’ve got trillions of connection and the biggest neural net we run so far have at most a few billion connections, so we are a few orders of magnitude off still. But I’m sure the hardware people will help us out.
Possible RankBrain Patents
Bill Slawski has already written about the possible connections with RankBrain and patents, including one that specifically deals with how Google can replace search terms within a query. The patent is “Using concepts as contexts for query term substitutions,” filed in 2012 but published August 2015.
Slawski’s analysis, Investigating Google RankBrain and Query Term Substitutions, is well worth the read, as is the patent itself, for those looking to learn as much about RankBrain as possible.
Deep Diving Into Deep Learning
For those who really want to dive deeper into this, in addition to Hinton’s keynote lecture above, there are several papers related to deep learning and thought vectors.
Deep Learning, Nature, LeCun, Y., Bengio, Y. and Hinton, G. E. (PDF)
Distilling the knowledge in a neural network, Hinton, G. E., Vinyals, O., and Dean, J. (PDF)
Spam
This definitely raises the question of whether or not an AI is smarter at detecting spam, or if it can prevent itself from serving spam that the rest of Google’s core ranking algo fails to catch.
Game changer?
Is this a game changer? It definitely changes how Google sees and handles searches, even if it went largely unnoticed by the SEO community. I am sure Google will test dialing this up and down as a ranking signal, if it hasn’t already, especially as it continues to update with new data or models.
Update: Want to know what the industry thinks about RankBrain? 9 industry experts weigh in.
David Carley says
Fascinating, and not just for SEO. The idea of changing key-phrases [and not just the ~15%] into vectors could seriously enable the returning of far better search results down the line, and if that, as it seems, is Google’s ambition regarding RankBrain, then I’m all for it.
Paul Reilly says
Hi Jennifer,
Awesome post, definitely the best researched coverage of the RankBrain topic on the web.
Hope you don’t mind me adding my 2c here as well.
Word2Vec is only part of the picture in my view and if it’s not currently a ranking factor or part of spam detection or search quality, I’ve included my own views on RankBrain including wider A.I. considerations and implications for SEO here as well.
https://www.mediaskunkworks.com/iGB_Aff_54_DecJan_p28-31.pdf
(from Dec/Jan edition of iGaming Affiliate Magazine)
Technically minded readers may be interested in the latest research on Dependency Based Embeddings for Sentence Classification Tasks (Suresh Manandhar & Alexandros Komninos):
http://www-users.cs.york.ac.uk/~suresh/papers/dep_embeddings_naacl2016.pdf
Things move very fast in A.I.
Word2Vec, as originally implemented, was superseded 6 days ago (14 March 2016) by the above research.
Given that the training of such neural network based models requires large batch processing jobs, I suspect switching to a new model will be very easy and cost beneficial when implementing.
Additionally, I see no reason why Google wouldn’t just continue to swap in newer more effective algorithms as they emerge.
For the less mathematically gifted, here’s the animated GIF visualisation of Word2Vec in action.
https://twitter.com/paulreilly/status/711397493855145985
Thanks again for such a well researched piece on this topic.
Paul
Jennifer Slegg says
Thanks Paul!
Nikolay Stoyanov says
Great article Jennifer. Love the twitter compilation, so many rumors all over the place. Anyway, you mentioned RankBrain’s impact on spam. The way I see it, everything will remain the same, and the only ones that will profit are end users. Do you think that this will lead to some new SEO strategies as we learn more about the system?
Jennifer Slegg says
Algo changes always lead to new strategies. The papers around RankBrain are really interesting to read.