Official Google Webmaster Central Blog: October 2009

Using RSS/Atom feeds to discover new URLs

Thursday, October 29, 2009

Written by Raymond Lo, Guhan Viswanathan, and Dave Weissman, Crawl and Indexing Team

Help us make the web better: An update on Rich Snippets

Monday, October 26, 2009

Written by Kavi Goel, Pravir Gupta, and Othar Hansson

Verifying a Blogger blog in Webmaster Tools

Thursday, October 22, 2009

This post is outdated. For the latest information on verifying your Blogger blog, please see our Webmaster Help Center article for Blogger.

Posted by Jonathan Simon, Webmaster Trends Analyst

One million YouTube views!

Wednesday, October 21, 2009

Posted by Michael Wyszomierski, Search Quality Team

Dealing with low-quality backlinks

Friday, October 16, 2009

If your site receives links that look similarly dodgy, don't be alarmed... read on!

Posted by Kaspar Szymanski, Search Quality Strategist, Dublin & Susan Moskwa, Webmaster Trends Analyst, Kirkland

Let's make the mobile web faster

Friday, October 16, 2009

(Cross-posted on the Google Code Blog)

Posted by Jeremy Weinstein, Google Webmaster

Managing your reputation through search results

Thursday, October 15, 2009

(Cross-posted on the Official Google Blog)

Think twice

Think twice before putting your personal information online.

Tackle it at the source

Try to remove it from the site where it's appearing:

If the content in question is on a site you own, easy — just remove it. It will naturally drop out of search results after we recrawl the page and discover the change.

It's also often easy to remove content from sites you don't own if you put it there, such as photos you've uploaded, or content on your profile page.

If you can't remove something yourself, you can contact the site's webmaster and ask them to remove the content or the page in question.

Proactively publish information

Create a Google profile. When people search for your name, Google can display a link to your Google profile in our search results and people can click through to see whatever information you choose to publish in your profile.

If a customer writes a negative review of your business, you could ask some of your other customers who are happy with your company to give a fuller picture of your business.

If a blogger is publishing unflattering photos of you, take some pictures you prefer and publish them in a blog post or two.

If a newspaper wrote an article about a court case that put you in a negative light, but which was subsequently ruled in your favor, you can ask them to update the article or publish a follow-up article about your exoneration. (This last one may seem far-fetched, but believe it or not, we've gotten multiple requests from people in this situation.)

Posted by Susan Moskwa, Webmaster Trends Analyst

Fetch as Googlebot and Malware details -- now in Webmaster Tools Labs!

Monday, October 12, 2009

Today we're launching two cool features:

Malware details
Fetch as Googlebot

Malware details (developed by Lucas Ballard)

Before today, you may have been relying on manual testing, our safe browsing API, and malware notifications to determine which pages on your site may be distributing malware. Sometimes finding the malicious code is extremely difficult, even when you do know which pages it was found on. Today we are happy to announce that we'll be providing snippets of code that exist on some of those pages that we consider to be malicious. We hope this additional information enables you to eliminate the malware on your site very quickly, and reduces the number of iterations many webmasters go through during the review process.
More information on this cool feature is available at our Online Security Blog.

Fetch as Googlebot (developed by Javier Tordable)

"What does Googlebot see when it accesses my page?" is a common question webmasters ask us on our forums and at conferences. Our keywords and HTML suggestions features help you understand the content we're extracting from your site, and any issues we may be running into at crawl and indexing time. However, we realized it was important to provide the ability for users to submit pages on their site and get real-time feedback on what Googlebot sees. This feature will help users a great deal when they re-implement their site with a new technology stack, find out that some of their pages have been hacked, or want to understand why they're not ranking for specific keywords.

We're pretty excited about this launch, and hope you are too. Let us know what you think!

Posted by Sagar Kamdar, Product Manager, Webmaster Tools

A proposal for making AJAX crawlable

Wednesday, October 07, 2009

Minimal changes are required as the website grows

Users and search engines see the same content (no cloaking)

Search engines can send users directly to the AJAX URL (not to a static copy)

Site owners have a way of verifying that their AJAX website is rendered correctly and thus that the crawler has access to all the content

Slightly modify the URL fragments for stateful AJAX pages
Stateful AJAX pages display the same content whenever accessed directly. These are pages that could be referred to in search results. Instead of a URL like http://example.com/page?query#state we would like to propose adding a token to make it possible to recognize these URLs: http://example.com/page?query#[FRAGMENTTOKEN]state . Based on a review of current URLs on the web, we propose using "!" (an exclamation point) as the token for this. The proposed URL that could be shown in search results would then be: http://example.com/page?query#!state .

Use a headless browser that outputs an HTML snapshot on your web server
The headless browser is used to access the AJAX page and generates HTML code based on the final state in the browser. Only specially tagged URLs are passed to the headless browser for processing. By doing this on the server side, the website owner is in control of the HTML code that is generated and can easily verify that all JavaScript is executed correctly. An example of such a browser is HtmlUnit, an open-source "GUI-less browser for Java programs".

Allow search engine crawlers to access these URLs by escaping the state
As URL fragments are never sent with requests to servers, it's necessary to slightly modify the URL used to access the page. At the same time, this tells the server to use the headless browser to generate HTML code instead of returning a page with JavaScript. Other, existing URLs (such as those used by the user) would be processed normally, bypassing the headless browser. We propose escaping the state information and adding it to the query parameters with a token. Using the previous example, one such URL would be http://example.com/page?query&[QUERYTOKEN]=state . Based on our analysis of current URLs on the web, we propose using "_escaped_fragment_" as the token. The proposed URL would then become http://example.com/page?query&_escaped_fragment_=state .

Show the original URL to users in the search results
To improve the user experience, it makes sense to refer users directly to the AJAX-based pages. This can be achieved by showing the original URL (such as http://example.com/page?query#!state from our example above) in the search results. Search engines can check that the indexable text returned to Googlebot is the same or a subset of the text that is returned to users.
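
The URL mapping described in the steps above can be sketched in Python. This is only an illustration of the proposal, not a reference implementation: the function names `to_crawler_url` and `to_user_url` are hypothetical, and treating "escaping the state" as percent-encoding is an assumption, since the text does not spell out the exact escaping rules.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

FRAGMENT_TOKEN = "!"                  # token proposed for the URL fragment
QUERY_TOKEN = "_escaped_fragment_"    # token proposed for the query string


def to_crawler_url(user_url):
    """Map a '#!state' URL to the '?_escaped_fragment_=state' form.

    URLs whose fragment does not start with the token are returned unchanged.
    """
    parts = urlsplit(user_url)
    if not parts.fragment.startswith(FRAGMENT_TOKEN):
        return user_url
    # Assumption: the state is escaped with percent-encoding.
    state = quote(parts.fragment[len(FRAGMENT_TOKEN):], safe="")
    separator = "&" if parts.query else ""
    query = f"{parts.query}{separator}{QUERY_TOKEN}={state}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def to_user_url(crawler_url):
    """Map a '?_escaped_fragment_=state' URL back to the '#!state' form."""
    parts = urlsplit(crawler_url)
    kept, state = [], None
    for piece in parts.query.split("&"):
        if piece.startswith(QUERY_TOKEN + "="):
            state = unquote(piece[len(QUERY_TOKEN) + 1:])
        elif piece:
            kept.append(piece)
    if state is None:
        return crawler_url
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       "&".join(kept), FRAGMENT_TOKEN + state))


if __name__ == "__main__":
    print(to_crawler_url("http://example.com/page?query#!state"))
    print(to_user_url("http://example.com/page?query&_escaped_fragment_=state"))
```

A server following the proposal could use the presence of the `_escaped_fragment_` parameter to decide whether a request should be handed to the headless browser or served normally.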

Original stateful URL: http://example.com/dictionary.html#AJAX
Proposed URL with the fragment token: http://example.com/dictionary.html#!AJAX
URL requested by crawlers: http://example.com/dictionary.html?_escaped_fragment_=AJAX
URL shown to users in search results: http://example.com/dictionary.html#!AJAX

Proposal by Katharina Probst, Bruce Johnson, Arup Mukherjee, Erik van der Poel and Li Xiao, Google
Blog post by John Mueller, Webmaster Trends Analyst, Google Zürich

Reunifying duplicate content on your website

Tuesday, October 06, 2009

Steps for dealing with duplicate content within your website

Recognize duplicate content on your website.
The first and most important step is to recognize duplicate content on your website. A simple way to do this is to take a unique text snippet from a page and to search for it, limiting the results to pages from your own website by using a site: query in Google. Multiple results for the same content show duplication you can investigate.

Determine your preferred URLs.
Before fixing duplicate content issues, you'll have to determine your preferred URL structure. Which URL would you prefer to use for that piece of content?

Be consistent within your website.
Once you've chosen your preferred URLs, make sure to use them in all possible locations within your website (including in your Sitemap file).

Apply 301 permanent redirects where necessary and possible.
If you can, redirect duplicate URLs to your preferred URLs using a 301 response code. This helps users and search engines find your preferred URLs should they visit the duplicate URLs. If your site is available on several domain names, pick one and use the 301 redirect appropriately from the others, making sure to forward to the right specific page, not just the root of the domain. If you support both www and non-www host names, pick one, use the preferred domain setting in Webmaster Tools, and redirect appropriately.

Implement the rel="canonical" link element on your pages where you can.
Where 301 redirects are not possible, the rel="canonical" link element can give us a better understanding of your site and of your preferred URLs. The use of this link element is also supported by major search engines such as Ask.com, Bing and Yahoo!.

Use the URL parameter handling tool in Google Webmaster Tools where possible.
If some or all of your website's duplicate content comes from URLs with query parameters, this tool can help you to notify us of important and irrelevant parameters within your URLs. More information about this tool can be found in our announcement blog post.
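
As a rough illustration of the 301 redirect and rel="canonical" steps above, the following Python sketch computes the redirect target for a request that arrives on a duplicate host name and builds the link element for a preferred URL. The host names and the helper names `redirect_for` and `canonical_link` are hypothetical and chosen only for this example; a real site would wire this logic into its own web server or framework.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: one preferred host and a few duplicate hosts.
PREFERRED_HOST = "www.example.com"
DUPLICATE_HOSTS = {"example.com", "example.net", "www.example.net"}


def redirect_for(url):
    """Return (301, target) if the URL is on a duplicate host, else None.

    The path and query are preserved, so visitors land on the right
    specific page rather than on the root of the preferred domain.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in DUPLICATE_HOSTS:
        return None
    target = urlunsplit((parts.scheme, PREFERRED_HOST,
                         parts.path, parts.query, ""))
    return 301, target


def canonical_link(preferred_url):
    """Build a rel="canonical" link element for the page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


if __name__ == "__main__":
    print(redirect_for("http://example.com/product.php?item=swedish-fish"))
    print(canonical_link("http://www.example.com/product.php?item=swedish-fish"))
```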

What about the robots.txt file?

We now recommend not blocking access to duplicate content on your website, whether with a robots.txt file or other methods.

Posted by John Mueller, Webmaster Trends Analyst, Google Zürich

New parameter handling tool helps with duplicate content issues

Monday, October 05, 2009

Duplicate content has been a hot topic among webmasters and our blog for over three years. One of our first posts on the subject came out in December of '06, and our most recent post was last week. Over the past three years, we've been providing tools and tips to help webmasters control which URLs we crawl and index, including a) use of 301 redirects, b) www vs. non-www preferred domain setting, c) change of address option, and d) rel="canonical".
We're happy to announce another feature to assist with managing duplicate content: parameter handling. Parameter handling allows you to view which parameters Google believes should be ignored or not ignored at crawl time, and to overwrite our suggestions if necessary.

Let's take our old example of a site selling Swedish fish. Imagine that your preferred version of the URL and its content looks like this:

http://www.example.com/product.php?item=swedish-fish

However, you may also serve the same content on different URLs depending on how the user navigates around your site, or your content management system may embed parameters such as sessionid:

http://www.example.com/product.php?item=swedish-fish&category=gummy-candy

http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678

With the "Parameter Handling" setting, you can now provide suggestions to our crawler to ignore the parameters category, trackingid, and sessionid. If we take your suggestion into account, the net result will be a more efficient crawl of your site, and fewer duplicate URLs.
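
To make the effect concrete, here is a small Python sketch of what ignoring those parameters amounts to: once category, trackingid, and sessionid are dropped, the example URLs above collapse to the preferred URL. The function name `strip_ignored` is hypothetical, and this only models the idea; it is not a description of how the crawler itself processes URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters suggested as ignorable in the example above.
IGNORED_PARAMETERS = {"category", "trackingid", "sessionid"}


def strip_ignored(url, ignored=IGNORED_PARAMETERS):
    """Return the URL with the ignored query parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in ignored]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


if __name__ == "__main__":
    urls = [
        "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy",
        "http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678",
    ]
    for url in urls:
        print(strip_ignored(url))
```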

Since we launched the feature, here are some popular questions that have come up:

Are the suggestions provided a hint or a directive?

Your suggestions are considered hints. We'll do our best to take them into account; however, there may be cases when the provided suggestions may do more harm than good for a site.

When do I use parameter handling vs rel="canonical"?

rel="canonical" is a great tool to manage duplicate content issues, and has had huge adoption. The differences between the two options are:

rel="canonical" has to be put on each page, whereas parameter handling is set at the host level
rel="canonical" is respected by many search engines, whereas parameter handling suggestions are only provided to Google

Use whichever option works best for you; it's fine to use both if you want to be very thorough.

As always, your feedback on our new feature is appreciated.

Posted by Tanya Gupta and Ningning Zhu, Software Engineers

Google Friend Connect: No more FTP... just get started!

Friday, October 02, 2009

Update: The described product or service is no longer available.

Posted by Mussie Shore, Product Manager, Google Friend Connect

Changes to website verification in Webmaster Tools

Thursday, October 01, 2009

Posted by Sean Harding, Software Engineer
