It is always interesting to see just how much of an impact Google’s quality raters have on the search results, especially when it comes to both the number of experiments Google conducts with their raters and the number of actual updates Google has made based on rater feedback.
Google has updated their statistics on how Google’s algorithms have changed over the past year, as well as on the impact of Google’s Search Quality Raters on the search results. The numbers were last updated in 2019 with statistics from 2018, and Google has now quietly updated them to reflect the changes made to their search results in 2019.
Overall Experiments and Launches
In 2019, Google ran 464,065 experiments, most of them involving Google’s quality raters, which resulted in 3,620 “improvements to Search.”
What is most interesting about this stat is that Google ran significantly fewer experiments last year than in 2018, yet those fewer experiments resulted in more actual improvements to the results. In 2018, they ran 654,680 experiments, which resulted in 3,234 improvements. In 2017, it was 200,000 experiments resulting in 2,400 updates.
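One rough way to compare the years is to divide launches by experiments. The short Python sketch below does that using only the figures quoted above; treating launches as a simple fraction of experiments is our own simplification, since Google doesn’t say how individual experiments map to launches.

```python
# Experiment and launch ("improvement") counts as quoted in this article.
stats = {
    2017: (200_000, 2_400),
    2018: (654_680, 3_234),
    2019: (464_065, 3_620),
}

def launch_rate(experiments: int, launches: int) -> float:
    """Return launches as a percentage of experiments run (a rough proxy only)."""
    return 100 * launches / experiments

for year, (experiments, launches) in sorted(stats.items()):
    rate = launch_rate(experiments, launches)
    print(f"{year}: {rate:.2f}% of experiments led to a launch")
```

By this measure, roughly 0.78% of 2019’s experiments led to a launch, compared with about 0.49% in 2018.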
It isn’t clear why Google ran significantly fewer experiments with the raters. Is Google relying more on AI, which reduces the need for experiments? Is it getting better at evaluating changes internally, so fewer experiments are needed? Or are Google’s experiments simply stronger and more likely to produce a positive change that actually gets launched? Even with fewer experiments, however, Google still made more updates to the search results than it did in 2018.
Search Quality Tests
When it comes to search quality tests, Google ran 383,605 of them with the raters in 2019. This is also a significant drop from 2018, which saw a whopping 595,429 tests. Quality tests are generally tests Google runs to check that their search results surface the highest quality results, looking at both the overall results page and the sites or pages appearing in the top spots.
Again, Google doesn’t offer any explanation for the change in the number of quality tests, or even note the difference.
Side-by-Side Experiments
One way quality raters evaluate the search results is through a dual-panel view in the rater’s hub, which displays the current search results on one side (which side is randomized) and the results with a potential search update applied on the other. Raters then evaluate both sides and rate which side they feel has the better search results.
As you can see from the image, Google doesn’t have their raters judge only the traditional “ten blue links,” but also the overall results, including search features.
In 2019, Google ran 62,937 side-by-side experiments, compared to 44,155 in 2018, an increase of roughly 43%. This suggests Google is now placing a higher priority on testing results side by side.
Live Traffic Experiments
Live traffic experiments are tested with real searchers. This is where we see things like different colors, borders and font sizes used in the search results on a very limited basis, so Google can see how the general public reacts to them. Google enables these features for a small percentage of searchers, “usually starting at 0.1%”. Some of these are so minor they usually go unnoticed, such as the 41 shades of blue Google tested several years ago.
In 2019, Google ran 17,523 live traffic experiments, up from only 15,096 in 2018.
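The three categories can be checked against the headline experiment totals. In the Python sketch below, the 2019 and 2018 category counts quoted in this article sum exactly to 464,065 and 654,680. That suggests, though Google doesn’t say so explicitly, that the headline figure is simply the sum of search quality tests, side-by-side experiments and live traffic experiments, so the overall decline is driven by fewer search quality tests, partly offset by increases in the other two categories.

```python
# Category counts for each year, as quoted in this article.
breakdown = {
    2018: {"search_quality_tests": 595_429, "side_by_side": 44_155, "live_traffic": 15_096},
    2019: {"search_quality_tests": 383_605, "side_by_side": 62_937, "live_traffic": 17_523},
}

# Headline "experiments" totals quoted earlier in the article.
reported_totals = {2018: 654_680, 2019: 464_065}

def category_sum(year: int) -> int:
    """Add up the three experiment categories for a given year."""
    return sum(breakdown[year].values())

for year in sorted(breakdown):
    total = category_sum(year)
    matches = total == reported_totals[year]
    print(f"{year}: categories sum to {total:,}; matches headline total: {matches}")
```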
Web Spam Messages
One related number was not updated for 2019: the number of messages Google sent to webmasters about spam issues on their sites. It still cites 180 million from 2018. However, in Google’s Spam Report for 2019, released this summer, Google says they sent 90 million messages to website owners, half the 2018 figure.
Last Thoughts
Google’s quality raters are still important to Google and the search results, since they give Google real feedback on proposed search result changes. The significant year-over-year change in the number of experiments is interesting, especially since it came with a higher number of actual updates compared to the previous year.
Google also seems to be running more live traffic experiments. However, it is hard to know how significant these live experiments are, since they most often involve minor changes to the look of the search results and sometimes search features, rather than the quality of the results on the page.
Google has not updated the number of quality raters they have. The only number we have been given is 10,000, but that number is several years old and not included in their How Search Works section.
It will be most interesting to see what impact COVID has on Google’s experiments when it updates its numbers next year for 2020.
It is also worth noting that Google hasn’t publicly updated their Quality Rater Guidelines since December 2019, so we are definitely due for a new version, but again, COVID is likely affecting the guidelines themselves as well.